The era of nine-digit defects

These massive outages cause big consumer headaches and sometimes cost CTOs, CIOs and CEOs their jobs


Four hundred and forty million dollars flushed in 30 minutes at Knight Trading. More than $200 million in liabilities at Target Stores. More than £170 million lost at Royal Bank of Scotland. Delta and Southwest Airlines each down more than $150 million. The list continues to grow. We have entered the era of 9-digit defects.

If the trend toward multimillion-dollar defects, some reaching 9 digits, continues or, as I expect, accelerates, the status quo in IT will change, and not from inside the IT community.

When losses from IT malfunctions hit 5 or 6 digits, IT managers are at risk. When losses hit 7 or 8 digits, IT and line-of-business executives are at risk. When losses hit 9 digits, C-level jobs are at risk.

Most often these 9-digit fiascos result from software flaws inside a system. Three trends magnify the impact of software malfunctions, driving business liabilities toward 9 digits.

  1. With increasing digital transformation, a far greater slice of business operations from sales to delivery is integrated and controlled by software, thus rapidly spreading the effects of a malfunction across the value chain.
  2. Businesses are now systems of systems, expanding complexity exponentially and concealing the triggers for 9-digit losses in a thicket of cross-system interactions.
  3. Increased competition, especially online, has prioritized speed-to-business over operational risk and corrective maintenance costs, a huge gamble for systems not designed to expect and manage failures.

The havoc wreaked by 9-digit defects can potentially reverse the trend toward decentralization. Decentralization is key to many current theories and practices in organizational design. The growing deployment of agile methods in IT decentralizes control to development teams.

Nine-digit defects create crises. In crisis mode, organizations tend to centralize control. When the effects of a 9-digit defect hit the nightly news, executive management usually reacts with policies that increase their perceived governance over the frequency and impact of software malfunctions. However, reactive policies are too late, rarely address root causes, and usually fail to properly balance control with agility.

If disruptions or hacks become more prevalent and damaging to customers, public demands for regulation will increase. Regulatory bodies like the Payment Card Industry Security Standards Council or the Food and Drug Administration usually require compliance with specific practices before products or services can be certified or offered to the public. While some regulation is necessary, highly publicized crises frequently force regulatory bodies to overreact.

If the tipping point for greater regulation and centralized control has not been passed, avoiding it will require greater adherence to software best practices that move software development toward an engineering discipline. Example best practices include:

  1. Dedicating effort to identifying, fixing, or redeveloping defect-ridden components on the riskiest systems
  2. Providing system architects with more authority to enforce standards
  3. Requiring sufficient up-front architecting of business, mission, or device-critical systems
  4. Protecting effort buffers within development sprints dedicated to removing high-severity defects and technical debt
  5. Enforcing defined quality assurance processes prior to release
  6. Implementing operational analytics that identify external attacks or predict system malfunctions
  7. Ensuring developers know and practice secure and resilient coding practices
  8. Establishing quality targets specific to each critical system and certifying their quality levels through automated code analysis
  9. Increasing understanding within Boards and C-Levels of IT processes, risks, and governance

These practices will not guarantee that 9-digit defects are eliminated, but they will dramatically reduce their likelihood. Better architectural and coding practices can also implement the internal system safeguards that limit the damage from potentially devastating defects long before they spiral toward 9 digits. In the era of 9-digit defects, curtailing best practices to expedite a release is gambling that the next chamber is empty.
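To make the idea of an internal safeguard concrete, here is a minimal sketch of one common pattern: a loss-limit circuit breaker that halts an automated process once cumulative losses cross a cap. It is an illustration only, not a description of any system mentioned in this article. The names `LossLimitBreaker` and `run`, and the $1 million cap, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LossLimitBreaker:
    """Halts an automated process once cumulative losses reach a fixed cap."""

    max_loss: float               # hypothetical cap, in dollars
    cumulative_loss: float = 0.0  # running total of losses seen so far
    tripped: bool = False

    def record(self, pnl: float) -> None:
        """Record one operation's result: positive is gain, negative is loss."""
        if pnl < 0:
            self.cumulative_loss += -pnl
        if self.cumulative_loss >= self.max_loss:
            self.tripped = True

    def allow(self) -> bool:
        """Return True while the process may keep running."""
        return not self.tripped


def run(operations, breaker):
    """Apply results in order and stop as soon as the breaker has tripped."""
    executed = 0
    for pnl in operations:
        if not breaker.allow():
            break  # safeguard engaged: stop and wait for a human decision
        breaker.record(pnl)
        executed += 1
    return executed


# Assumed scenario: a faulty release loses $400,000 on every operation.
breaker = LossLimitBreaker(max_loss=1_000_000)
executed = run([-400_000.0] * 10, breaker)
```

In this assumed scenario the breaker trips on the third operation, so the remaining seven never run and the loss stops at $1.2 million rather than $4 million. A production safeguard would also need alerting and a controlled restart procedure, but the principle is the same: the system is designed to expect failure and to bound its cost.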

The business will always prioritize new features over the removal of technical debt, thus trading operational liabilities and maintenance costs for speed to market. In a few cases the cost of proven best practices and corrective maintenance is less than the financial benefit of speeding new functionality to market. However, that argument evaporates when a 9-digit, or even a 6-digit, defect detonates. IT costs cannot be reduced if levels of system dependability appropriate to each application are not achieved.

Nine-digit defects are a boardroom issue. C-Levels and Boards cannot understand how digital businesses operate without familiarity with IT. They cannot govern IT risk if they do not understand how it is created. Boards should be constructed with one or more members knowledgeable in IT and digital transformation. Otherwise, control over profitability may be lurking deep in a critical system.

Copyright © 2016 IDG Communications, Inc.
