One of my first big Solaris projects was to build and deploy a large-scale NIS+ infrastructure across a 10-city WAN. At the time, NIS was viewed as legacy and NIS+ was the way of the future. That might actually have been the case if NIS+ weren't so amazingly convoluted and difficult to administer. The decision was made to convert from NIS to NIS+ across the entire network, and each site was to get its own pair of NIS+ servers, linked to a master running at headquarters. I learned far more about NIS+ than any human should ever really know, and I built the whole thing using SPARCstation IPX boxes, the odd Mac Mini-looking machines Sun produced in the early nineties. They didn't have much horsepower or RAM, but for these purposes, they fit the bill.

My cube had 20 of these little boxes, stacked in columns of five, and I spent the better part of three weeks building, configuring, and labeling them for shipment to each site. When they got there, they'd be plugged in and, theoretically, Just Work. And they did. For nearly a month.

At that point, a new admin decided that he needed to rebuild the Kerberos keys on the master NIS+ server at HQ. Naturally, he didn't tell anyone about this, and of course, he didn't really need to do it, and he certainly didn't want the consequences: as soon as he did, the slaves at each office started failing when they tried to update their maps. Once I realized what was happening, the race was on to salvage the maps before they were lost entirely. I managed to pull raw copies of the maps from the last pair of slaves that hadn't yet tried to update, and I spent 24 straight hours rebuilding the whole authentication infrastructure. Good times.
Then there was the day that the first shipment of Sun E450s arrived. They were squat boxes that looked like they were made of Legos and packed a huge punch for the time. I recall conversing with a Sun engineer over dinner about the curious case of the multi-million-dollar Sun E10Ks that were inexplicably breaking at odd times, segfaulting all over the place for no rhyme or reason. After many months of troubleshooting and general puzzlement, some bright engineer covered the RAM with aluminum foil, and the problems went away. Apparently, the shielding wasn't sufficient and gamma rays were randomly flipping bits in active RAM. Talk about a non-obvious solution.
Burned into my memory is the 16-hour marathon I put in trying to salvage a Solstice DiskSuite software RAID array that had fallen prey to an overzealous and underequipped admin. Naturally, there were no backups. Naturally, this was a mission-critical Oracle database installation. I was finally able to reconstruct the databases, but it wasn't pretty.