This opening day was followed by three weeks of similar incidents. The systems were running at 80 V rather than the minimum allowed 100 V, which accounted for the high rate of power-supply failures. At night, the engineers would just hit the master power switch instead of shutting the systems down properly, which explained some of the disk problems they were having. On the software side, I discovered that the system was badly misconfigured, which caused performance problems.
But the biggest issues were with the IT and business managers, who tended to be closed-minded and resistant to change. The business managers tended to say, "There must not be any problems." The IT managers had minimal, if any, technical background, and many of the systems engineers were very junior employees. This mix made for a culture in which it was very hard to speak up and implement positive changes.
The managers' default method of dealing with a problem was to blame the vendor. It took another eight months before they took some responsibility for their own actions. And the main reason they started taking responsibility was a series of news articles about mismanagement at the exchange.
My takeaway from this long experience was to start by working with those at the top, not in the middle. When I started the assignment, I was working with middle managers who couldn't make policy decisions, and we didn't get anywhere. Even though working with the top IT managers was also a dead end for a while, at least I was dealing with people who could actually make the changes. Chipping away at the problems with the top managers got us further, and faster, than working at the midlevels would have.