We all like to think we learn from mistakes, whether our own or others’. So in theory, the more serious bloopers you know about, the less likely you are to be under the bright light of interrogation, explaining how you managed to screw up big-time. That’s why we put out an all-points bulletin to IT managers and vendors everywhere: For the good of humanity, tell us about the gotchas that have gotten you, so others can avoid them.
As it turns out, our many contributors to this article had a lot to say -- but precious little to say on record. Names may be withheld, but the lessons are still potent. We’ve distilled this glut of information down to the top 20 mistakes -- instances in which wrong decisions can lead to costly project overruns, business disasters, and in the worst cases, lost jobs. Read on, take notes, and avoid.
1. Botching your outsourcing strategy
Mistakes relating to outsourcing could easily fill our top 20 list on their own. There are two different flavors. The first is the sin of commission: outsourcing important IT functions to avoid the hard work of understanding them. Relinquishing those functions can make it hard to get simple things done.
The other mistake is to hold on to functions that could easily and effectively be outsourced, such as running your own messaging environment. IT organizations with an overt bias against outsourcing could be courting disaster. For example, one CTO we interviewed took over operations for a Manhattan-based online services company, only to discover that the Web-hosting infrastructure for all mission-critical and revenue-producing applications was in-house because the IT staff didn’t trust third-party operations. When the great blackout of August 2003 darkened parts of Manhattan for as long as 28 hours, the company’s UPS systems kept everything running for only a relatively short time -- while competitors at well-provisioned Web-hosting companies experienced no downtime.
2. Dismissing open source -- or bowing before it
For better or worse, many IT shops are susceptible to “religious” behavior -- a blind, unyielding devotion to a particular technology or platform. Nowhere is that more true than with open source.
On the one hand, the most conservative IT shops dismiss open source solutions as a matter of policy. That’s a big mistake: Taking an indefinite wait-and-see attitude toward open source means passing up proven, stable, and scalable low-cost solutions such as Linux, Apache, MySQL, and PHP. On the other hand, insisting on open source purity in your IT operation can delay progress, as developers are forced to cobble together inferior or unwieldy open source solutions when more appropriate commercial software solutions already exist.
Open source software is not inherently better than commercial software; it all depends on the problem to be solved and the maturity of the solution being considered.
3. Offshoring with blinders on
Any list of IT mistakes would be incomplete without a mention of offshoring. The experience of one vice president of operations provides an instructive cautionary tale. At his previous employer, the vice president opened a branch office in India for software development and encountered numerous surprises, many counter to conventional offshoring wisdom.
At the time, India had been experiencing an IT employment boom similar to that of Silicon Valley in the late ’90s. According to the vice president, the workforce was not stable as a result. Transportation difficulties and the importance of time with family in Indian culture meant that employees generally worked eight-hour days -- the concept of the Silicon Valley engineer who goes sleepless at release time was, well, foreign.
In the end, the cost of offshoring the branch office was only 20 percent less than the going rate in the United States, and for cultural reasons, far more face time than initially expected was needed to ensure the commitment U.S. management demanded -- which resulted in trips to India at least once per quarter. The vice president emphasized that offshoring can indeed work but said it’s a mistake to assume that managing offshore IT is in any way equivalent to managing local IT or that cost savings will be as dramatic as you might expect.
4. Discounting internal security threats
IT managers focusing on external threats can easily lull themselves into a false sense of security. According to Gartner, 70 percent of security incidents that incur actual losses are inside jobs, making the insider threat arguably the most critical one facing the enterprise.
Of course, not all insider threats are born of malicious intent. In September 2004, HFC Bank, one of the United Kingdom’s largest banks, sent to 2,600 customers an e-mail that, due to an internal operator error, made recipients’ e-mail addresses visible to everyone else on the list. The problem was compounded when customers’ out-of-office messages -- containing home and mobile phone numbers -- responded to the mailing.
Even malicious acts are often carried out using very little technical sophistication. In a joint study released this year by CERT and the Secret Service, 87 percent of insider security breaches were found to have been achieved using simple, legitimate user commands, suggesting that IT needs to be vigilant about granting only necessary privileges to end-users. Identity management with specific permissions can help.
5. Failing to secure a fluid perimeter
IT’s responsibility now extends to Starbucks and beyond. The increasing mobility of workers, combined with the proliferation of public wireless hotspots and broadband in the home, means that IT is now responsible for securing systems on networks it does not control. In this environment, solid security means implementing host-based firewalls that will provide some level of protection on an unsecured broadband connection at home or at sites with public Wi-Fi access.
If you’re an experienced IT manager, you might feel comfortable with the top-of-the-line firewall you purchased three years ago. You configure it to block all incoming traffic except port 25 for inbound e-mail, and your employees generally make outbound WAN connections to the Web via ports 80 and 443. This is a common approach, but in a more decentralized IT environment, centralized approaches to network security are no longer sufficient. By encrypting traffic on your internal LAN, you will better protect your network from insider threats and from intruders who might have hopped onto your network via rogue wireless access points.
6. Ignoring security for handhelds
Although even inexperienced IT managers recognize the need for username/password authentication on network resources and desktop and laptop PCs, most IT shops still seem to be in a “wild West” phase when it comes to handheld devices.
A CTO of a wireless software company tells us about a venture capitalist who lost his BlackBerry on a business trip while he was in the middle of closing a highly sensitive, confidential deal. The BlackBerry wasn’t password-protected, so even after the panicked venture capitalist contacted his IT department to have e-mail delivery to the device stopped, anyone who happened to pick up the lost BlackBerry could read e-mails already received.
In this case, the minor convenience of not requiring a password had major implications. Ignoring the security of easily lost devices, particularly those belonging to key executives who traffic in confidential information, is a recipe for disaster.
7. Promoting the wrong people
As CTO or CIO, rewarding your top technologist with a promotion to a management position might seem like the right thing to do. But when a technologist is not ready to give up constant, hands-on technology work in favor of more people-oriented management duties, it could be a mistake you’ll regret on many levels.
One vice president of IT painted a grim picture of such a decision: The promoted employee could be resented by former peers and might not like the new management duties, which could lead to poor performance. Even worse, the new manager might feel compelled to cling to the ill-fitting position because the old position might no longer be available.
Just such an experience put this particular vice president in the tough position of having to deal with a new manager’s performance problems, which led to a double whammy: A top technologist left the company, and the new manager still had to be fired.
Management training can help avoid such disasters. But use your gut. Either the aptitude is there, or it isn’t.
8. Mishandling change management
The former CTO of a computer equipment manufacturer describes one situation in which a talented, but perhaps overly ambitious, systems administrator decided to make seemingly simple changes to a set of critical servers during routine maintenance.
While this individual was making the changes, all of which had been agreed on and planned in advance, he decided on his own to upgrade BIND (Berkeley Internet Name Domain), the open source server software that powers mission-critical local DNS for many companies.
A few hours later, the entire business was at a standstill, as all DNS functions failed. Reversing the “one small change” took hours, and millions of dollars in revenue were likely lost as a result. The lesson is that even talented employees can cause major problems when they don’t follow change management procedures.
Remember, change management is cultural. It all starts at the top: If IT management cuts corners, so will IT staff.
9. Mismanaging software development
In his seminal book The Mythical Man-Month, Frederick Brooks posited that planning software-development projects based on per-unit “man-months” ultimately does not work due to the unique nature of software development.
Even if the building of software could be broken into easily managed, interchangeable time units, the vast productivity difference between the best coders and merely average ones means IT managers might get their best work out of fewer, but more talented, programmers doing their work in less time.
Henri Asseily, CTO of BizRate, tells us via e-mail, “The right individual will always create better and faster core software than a group of people [will]. Everyone in every industry talks the usual talk of, ‘We invest in people,’ or, ‘Our people are our greatest asset,’ but nowhere is it more important than in IT. Simply put, a great programmer is 100 times more valuable than a regular programmer.”
The mythical man-month has been part of software lore since Brooks’ book came out 30 years ago, but many IT managers still plan projects and staff them based on this disproved paradigm. Holding on to this method might lead a naïve IT manager to staff a project with the right number of people for a defined amount of work, but CTOs such as Asseily insist that getting quality people is most important.
“IT managers should devote most of their free time to [finding] the best people. Almost nothing else matters, really,” Asseily says.
10. Letting engineers do their own QA
Not allowing engineers to do their own QA is an axiom of software development, but for small software development teams, there is always the temptation to cut corners. In fact, sometimes management colludes with developers to enable the practice.
One CTO relates a situation in which a software development project was running significantly behind schedule and the lead developer had begun to do his own QA to try to speed up the release process. To make matters worse, the lead developer had planned a vacation that was approaching rapidly. A day before the vacation commenced, the developer pronounced all outstanding bugs resolved, and the system was released into production.
By the time the developer arrived at his tropical destination, the system was crashing to the point of being unusable. Many of the existing bugs had not been corrected because the developer had not tested thoroughly or formally. Allowing engineers to perform their own QA is akin to allowing defendants to be the judges and juries for their own trials.
11. Developing Web apps for IE only
Despite the fact that mission-critical applications continue their march onto the Web browser and that Windows continues to dominate the corporate desktop, Web developers should avoid the temptation to develop applications only for bug-ridden IE. IT shops that insist on using IE for Web applications should be prepared to deal with malicious code attacks such as JS.Scob.
First discovered in June 2004, JS.Scob was distributed via compromised IIS Web servers. The code itself quietly redirects customers of compromised sites to sites controlled by a Russian hacking group. There, unwitting IE users download a Trojan horse program that captures keystrokes and personal data. Although this might not sound like a threat to corporate IT, keep in mind that employees often use the same passwords across corporate and personal assets.
Many enterprises may not be able to avoid using IE. But if you make sure your key Web applications don’t depend on IE-only functionality, you’ll have an easier time switching to an alternative, such as Mozilla Firefox, if ongoing IE security holes become too burdensome and risky for your IT environment.
12. Relying on a single network performance metric
When it comes to network performance, there’s no single metric by which to judge network health. Douglas Smith, president of network analysis vendor Network Instruments, points out that it’s a mistake to think that network utilization can be quantified in a single way. When management asks for a single network utilization report, IT is typically sent scurrying for a single metric for network health that is ultimately impossible to define.
That said, certain aspects of a network, such as port utilization, link utilization, and client utilization, can and should be measured. In any scenario, successful network analysis means taking a step back and looking at the data in the context of your enterprise.
Network utilization requires judgment calls. If two ports on a switch are 90 percent utilized and the others are not utilized, do you consider your switch utilization to be 90 percent? It might be more appropriate to ask which application is causing those particular ports to reach 90 percent utilization. Understanding the big picture and analyzing utilization levels in context are the keys to getting a sense of your network’s health.
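To make the switch example concrete, here is a minimal Python sketch of how the same port readings produce very different "utilization" numbers depending on how they are summarized. The eight-port switch, the port names, and the 80 percent "busy" threshold are assumptions made up for illustration; they do not come from Network Instruments.

```python
from statistics import mean

def summarize_utilization(port_utilization, busy_threshold=80.0):
    """Return several views of per-port utilization (percent, 0-100).

    busy_threshold is an illustrative cutoff, not an industry standard.
    """
    values = list(port_utilization.values())
    return {
        "mean": mean(values),   # the "single number" management tends to ask for
        "peak": max(values),    # the worst port
        "busy_ports": sorted(
            port for port, used in port_utilization.items()
            if used >= busy_threshold
        ),
    }

# Hypothetical 8-port switch: two ports at 90 percent, six idle.
ports = {f"port{i}": 0.0 for i in range(1, 9)}
ports["port1"] = 90.0
ports["port2"] = 90.0

summary = summarize_utilization(ports)
print(summary)
```

Neither the average nor the peak answers the question on its own. The list of busy ports is what points you toward the application worth investigating.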
13. Throwing bandwidth at a network problem
One of the most common complaints addressed by IT is simple: The network is running slower than normal. The knee-jerk reaction is to add more capacity. This is the right solution in some cases but dead wrong in others. Without the proper analysis, upgrading capacity can be a costly, unwise decision. Network Instruments’ Smith likens this approach to saying, “I’m running low on closet space, and therefore I need a new house.”
Capacity aside, common root causes of slowdowns include unwanted traffic broadcasting over the network from old systems or apps, such as IPX traffic, or misconfigured or inefficient applications that spew streams of packets onto the network at inconvenient times.
According to Smith, one of Network Instruments’ banking customers was considering upgrading its WAN links due to complaints from tellers that systems were running slow. The IT team used a network analyzer to determine that increased traffic levels were being caused by a security app that ran a daily update at 3 p.m. When the IT team reconfigured this application to make updates at 3 a.m. instead, they were able to quickly improve traffic levels without making the costly WAN upgrade.
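The kind of analysis the bank's IT team performed can be sketched in a few lines of Python: group traffic by hour and by application, then see what dominates the busiest hour. The sample records, application names, and byte counts below are invented for illustration. A real network analyzer would supply this data, and its export format will differ.

```python
from collections import defaultdict

def bytes_by_hour_and_app(samples):
    """Total bytes per (hour, application).

    samples: iterable of (hour 0-23, application name, bytes) tuples,
    a hypothetical export format rather than any real analyzer's output.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for hour, app, nbytes in samples:
        totals[hour][app] += nbytes
    return totals

def busiest_hour(totals):
    """Return (hour with the most traffic, application sending the most in that hour)."""
    hour = max(totals, key=lambda h: sum(totals[h].values()))
    top_app = max(totals[hour], key=totals[hour].get)
    return hour, top_app

# Invented sample data: steady teller traffic plus one large daily job.
samples = [
    (10, "teller-app", 400),
    (14, "teller-app", 450),
    (15, "teller-app", 420),
    (15, "security-update", 3000),
    (16, "teller-app", 410),
]

hour, app = busiest_hour(bytes_by_hour_and_app(samples))
print(f"Busiest hour: {hour}:00, dominated by {app}")
```

If the top talker in the busiest hour is a scheduled job rather than the business application itself, rescheduling it is likely cheaper than buying bandwidth.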
14. Permitting weak passwords
In the Internet age, new threats such as worms and phishing tend to garner all the security attention, but the SANS Institute’s Top 20 Vulnerabilities list released in October points to a basic IT mistake: weak authentication or bad passwords (infoworld.com/2193). The most common password vulnerabilities include weak or nonexistent passwords; user accounts with widely known or physically displayed passwords (think Post-it Notes); administrative accounts with weak or widely known passwords; and weak or well-known password-hashing algorithms that are not well secured or are visible to anyone. Avoiding the weak authentication mistake boils down to simple IT blocking and tackling -- a clear, detailed, and consistently enforced password policy that proactively deals with the most exploited authentication weaknesses detailed in the SANS report.
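As a rough illustration of what "consistently enforced" can mean in practice, the Python sketch below checks a candidate password against a handful of rules. The specific rules, the 12-character minimum, and the tiny list of common passwords are assumptions chosen for the example. They are not taken from the SANS report, and a real policy should follow your own organization's requirements.

```python
import string

# Illustrative only: a real deployment would use a much larger blocklist.
COMMON_PASSWORDS = {"password", "123456", "letmein", "qwerty", "admin"}

def password_problems(password, username="", min_length=12):
    """Return a list of policy violations; an empty list means the password passes."""
    if not password:
        return ["empty password"]

    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if password.lower() in COMMON_PASSWORDS:
        problems.append("widely known password")
    if username and username.lower() in password.lower():
        problems.append("contains the username")

    # Count how many character classes appear in the password.
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    if sum(classes) < 3:
        problems.append("fewer than three character classes")
    return problems

for candidate in ("", "admin", "Tr4il-mix-Lantern!"):
    print(repr(candidate), password_problems(candidate, username="admin"))
```

A check like this covers only the first item on the list. Post-it Notes and poorly protected password hashes need process and configuration fixes, not a validator.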
15. Never sweating the small stuff
CTOs and CIOs like to talk about the strategic application of technology, but ignoring basic tactical issues can lead to simple but extremely costly mistakes. Missing a $30 domain name registration payment can be enough to grind your business to a halt. In one notorious example, last February a missed payment by The Washington Post knocked out employee e-mail for hours until the renewal was paid.
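Small stuff like renewal dates is easy to automate. The Python sketch below flags anything due within a warning window. The domain names, dates, and 30-day window are hypothetical, and the sketch assumes you already keep expiration dates in some inventory; it does not query a registrar.

```python
from datetime import date

def expiring_soon(renewals, today, warn_days=30):
    """Return (name, days_left) pairs due within warn_days, soonest first.

    renewals maps a name to its expiration date. Items that have already
    lapsed show up with a negative days_left.
    """
    due = []
    for name, expires in renewals.items():
        days_left = (expires - today).days
        if days_left <= warn_days:
            due.append((name, days_left))
    return sorted(due, key=lambda item: item[1])

# Hypothetical inventory of renewals.
renewals = {
    "example.com": date(2025, 3, 1),
    "example.org": date(2026, 1, 1),
}

print(expiring_soon(renewals, today=date(2025, 2, 14)))
```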
As datacenter environments become denser, even low-level facilities issues may demand scrutiny. On his Weblog, Sun Microsystems President Jonathan Schwartz quoted a CIO who responded to a “what keeps you up at night” question with, “I can no longer supply enough power to, or exhaust heat from [our datacenter]. I feel like I’m running hot plates, not computers.” A CIO who overlooks burning -- but not necessarily obvious -- issues such as these may soon be in search of another job.
16. Clinging to prior solutions
A common mistake for IT managers moving into a new position at a new company is to try to force solutions and approaches that worked at a prior job into a new environment with different business and technology considerations.
One current vice president of operations describes a new, low-cost open source environment he had to manage after working in a more traditional shop that relied on high-end Sun hardware and Oracle and Veritas software. The new startup company couldn’t afford the up-front cash required to set up a rock-solid environment based on commercial software, so they ran a LAMP (Linux, Apache, MySQL, PHP) architecture with an especially aggressive Linux implementation on 64-bit AMD Opteron machines. Gradually, the vice president realized that his old solutions wouldn’t work in the new environment from a technology or cost angle, so he changed his approach to fit the new reality, using none of the technologies from his prior job.
17. Falling behind on emerging technologies
Staying current can prevent a disaster. For instance, the emergence of inexpensive consumer wireless access points during the past few years has meant that anyone can create a wireless network -- a real problem for any reasonably structured corporate IT environment. A Network Instruments retail client, for example, was installing a WLAN to serve the needs of employees who measured warehouse inventory levels. Soon enough, management wanted access to the WLAN, and without asking for approval, some employees installed wireless access points at their desks.
Fortunately, the IT staff had implemented ways to check for rogue access points, and a WLAN channel scan with a network analyzer quickly showed there were more access points on the network than the administrator knew had been deployed. In this case, the IT staff recognized an emerging technology that might be stealthily introduced by employees and developed procedures to inventory the threat, thereby controlling it.
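At its core, the check described here is a comparison between what a scan sees and what IT knows it deployed. A minimal Python sketch, assuming access points are identified by MAC address and that both lists are available as plain strings (the addresses below are made up):

```python
def normalize_mac(mac):
    """Lowercase a MAC address and use colons so formats compare equal."""
    return mac.strip().lower().replace("-", ":")

def find_rogue_access_points(scanned, approved):
    """Return scanned access points that are not in the approved inventory."""
    approved_set = {normalize_mac(mac) for mac in approved}
    scanned_set = {normalize_mac(mac) for mac in scanned}
    return sorted(scanned_set - approved_set)

# Hypothetical inventory and scan results.
approved = ["00:11:22:33:44:01", "00:11:22:33:44:02"]
scanned = ["00-11-22-33-44-01", "00:11:22:33:44:02", "AA:BB:CC:DD:EE:0F"]

print(find_rogue_access_points(scanned, approved))
```

The comparison is trivial. The hard part, as the retail client's IT staff recognized, is running the scan regularly and keeping the approved inventory accurate.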
18. Underestimating PHP
IT managers who look only as far as J2EE and .Net when developing scalable Web apps are making a mistake by not taking a second look at scripting languages -- particularly PHP. This scripting language has been around for a decade now, and millions of Yahoo pages are served by PHP each day.
Discussion of PHP scalability reached a high-water mark in June, when the popular social-networking site Friendster finally beat nagging performance woes by migrating from J2EE to PHP. In a comment attached to a Weblog post about Friendster’s switch to PHP, Rasmus Lerdorf, inventor of PHP, explained the architectural secret of PHP’s capability of scaling: “Scalability is gained by using a shared-nothing architecture where you can scale horizontally infinitely.”
The stateless “shared-nothing” architecture of PHP means that each request is handled independently of all others, and simple horizontal scaling means adding more boxes. Any bottlenecks are limited to scaling a back-end database. Languages such as PHP might not be the right solution for everyone, but pre-emptively pushing scripting languages aside when there are proven scalability successes is a mistake.
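The shared-nothing idea can be illustrated with a toy model, written here in Python rather than PHP. Each request handler keeps no state of its own, so any number of identical workers give the same answers, and the only shared component is the back-end store. The class and function names are invented for this sketch and do not reflect how PHP or Friendster is actually implemented.

```python
import itertools

class BackendStore:
    """Stands in for the back-end database: the only shared state."""

    def __init__(self):
        self._visits = {}

    def increment(self, user):
        self._visits[user] = self._visits.get(user, 0) + 1
        return self._visits[user]

def handle_request(user, store):
    """Stateless handler: everything it needs arrives with the call."""
    count = store.increment(user)
    return f"hello {user}, visit {count}"

def serve(requests, worker_count, store):
    """Hand requests to identical workers round-robin; return (worker_id, response)."""
    workers = itertools.cycle(range(worker_count))
    return [(next(workers), handle_request(user, store)) for user in requests]

requests = ["ann", "bob", "ann"]
one_box = serve(requests, worker_count=1, store=BackendStore())
three_boxes = serve(requests, worker_count=3, store=BackendStore())

# The responses match no matter how many workers handled them.
print([body for _, body in one_box] == [body for _, body in three_boxes])
```

Because no worker remembers anything between requests, adding boxes changes only who does the work. Every call still goes through the store, which is why the database is where the bottleneck ends up.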
19. Violating the KISS principle
Doug Pierce, technical architect at Datavantage, says that violating the KISS (keep it simple, stupid) principle is a systemic problem for IT. Pierce says he has seen “hundreds of millions” of dollars wasted on implementing, failing to implement, or supporting solutions that are too complex for the problem at hand. According to Pierce, although complex technologies such as CORBA and EJB are right for some organizations, many of the organizations using such technologies are introducing unnecessary complexity.
This violation of the KISS principle directly contributes to many instances of project failures, high IT costs, unmaintainable systems, and bloated, low-quality, or insecure software. Pierce offers a quote from Antoine de Saint-Exupery as a philosophical guide for rooting out complexity in IT systems: “You know you’ve achieved perfection in design, not when you have nothing more to add, but when you have nothing more to take away.”
20. Being a slave to vendor marketing strategies
When it comes to network devices, databases, servers, and many other IT products, terms such as “enterprise” and “workgroup” are bandied about to distinguish products, but often those terms mean little when it comes to performance characteristics.
Quite often a product labeled as a “workgroup” product has more than enough capacity for enterprise use. The low cost of commodity hardware -- particularly when it comes to Intel-based servers -- means that clustering arrays of cheap, workgroup hardware into an enterprise configuration is often more redundant and scalable than buying more expensive enterprise servers, especially when it comes to Web apps.