Sooner or later you're sure to witness it with your own eyes (or you already have): that dreadful project/system/situation where something got SO screwed up you just can't believe it actually went the way it did.
Mismanagement? Misbudgeting? Misunderstanding? Just plain, silly ignorance? Name your cause; it sure happened (and sadly keeps happening a lot).
Describe it here, for amusement (although of a somewhat cynical kind) and, hopefully, learning.
Some rules:
- This is not the place for random (even if utterly devastating) admin errors, so please avoid "argh, I mistyped that rm -r" or "OMG I JUST COPIED THE CORRUPTED DATABASE OVER MY LAST GOOD BACKUP" (been there, done that); those stories are better told elsewhere. This is about "what sort of drugs, exactly, was the person who designed/implemented this system under the influence of?".
- One WTF per post, so they can get properly commented.
- Please post something you actually witnessed :-)
- If it was you who did it, it still qualifies :-)
I'll be adding some material soon; feel free to add your own, and please do :-)
I was called by a company I had never heard of before, which had been tasked with implementing an Exchange 2003 mail server for a customer and had no clue at all how to do it; nothing too strange, right? I work as a freelance consultant, so I'm perfectly fine doing the jobs you don't know how to do for you (and getting your money for it).
So I went to the customer site, and discovered something quite strange: every single server in the network was a domain controller; all 15 or so of them.
Then I discovered something even stranger: none of them was replicating properly with any of the others, the overall Active Directory behaviour could only be described as "erratic", users were experiencing just about every network issue you can imagine, and Exchange simply refused to install, with errors unknown to mankind.
So I had a look at the network configuration on the server, and I saw... it was using the ISP's public DNS servers. Then I looked at another server... same thing. Then I looked at a DC... same thing again. Then I asked... and it was officially confirmed: each and every computer on the network (about 1,500 of them) was using the ISP's DNS instead of a proper domain controller.
I proceeded to explain that DNS is quite critical for proper Active Directory operation, and was eventually able to reconstruct the back story.
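For the curious, here's a minimal sketch of the kind of check that makes this sort of misconfiguration obvious. It's Python, it assumes the third-party dnspython package, and the domain name and server addresses below are placeholders; the idea is simply to ask each candidate DNS server for the domain's _ldap._tcp.dc._msdcs SRV records, which is how AD clients locate their domain controllers.

```python
# Hypothetical sketch: can a given DNS server locate the AD domain controllers?
# Assumes the dnspython package; AD_DOMAIN and the addresses are placeholders.
import dns.resolver

AD_DOMAIN = "corp.example.com"
CANDIDATE_DNS = ["192.0.2.10", "8.8.8.8"]   # an internal DC vs. an ISP resolver

for server in CANDIDATE_DNS:
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [server]
    try:
        # AD publishes its domain controllers as SRV records under _msdcs.
        answers = resolver.resolve(f"_ldap._tcp.dc._msdcs.{AD_DOMAIN}", "SRV")
        dcs = ", ".join(str(rr.target) for rr in answers)
        print(f"{server}: OK, domain controllers: {dcs}")
    except Exception as exc:
        print(f"{server}: cannot locate any domain controller ({exc})")
```

The ISP's resolver has never heard of those records, so logons, replication and the Exchange setup never stood a chance.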
Once upon a time I had a client, a small business (10 people) running an electronic health record system. (Not a medical doctor.) I noticed one day that the backups had been failing. Upon testing, the tape drive turned out not to be working at all. I mentioned this to the owner, who said he was well aware that the drive was bad, but it was too expensive to replace.
Sure -- that's not very WTF.
The WTF is that he had his staff rotating the tape daily, taking it to a safety deposit box, and all that jazz, for the 6-9 months since the drive died.
"Don't tell the staff, it might worry them"
I was working as a sysadmin for a Big Government Agency (one of the main bodies of Italy's government), and had been managing their data center for some months. One evening my phone rings and my boss tells me something Very Bad is happening: a total power outage.
Ok, we have UPSes, right?
Yes, but they won't last long, so I'd better go there and shut everything down until power returns.
I go there, make my way through the dark corridors, arrive at the server room... and am greeted by what can only be described as pure hell. Literally. The room was so hot you could have baked cakes in it. UPS power was ok, but half of the servers had already shut down from overheating and the remaining ones were screaming in agony.
The reason?
Servers were on UPS power... air conditioning was not.
Email response from a Microsoft support engineer to a reported issue:
Gold!
HP ProLiant ML370 G3 fan failure... The fan sensors on the motherboard of this model tended to go bad after 5 years, and the server won't boot when it fails to detect the right combination of fans. I had to walk the customer through jump-starting the machine with a shop-vac (to get the fans spinning at boot), and this is how they kept the server running until I could arrive with a new system.
I used to be an accounting software consultant for Dac-Easy accounting. One time I was called into the main office of a local business and was told by the accountant that if I couldn't resolve why the program was full of accounting errors each weekend, they would have to find another application and consultant. Going through the administrative log files, I discovered all of the entries were usually made on Friday or Saturday nights. I then found out that the owner's wife was logging in to the accounting system computer from home using PC Anywhere and trying to balance the accounts with her checkbook after several glasses of wine. Once the numbers looked good, she'd log off.
Another customer, another horror story.
In the main post I talked about having erroneously overwritten a good backup with the corrupted database it was meant to replace; it happens :-(
So a restore from backup was needed. Luckily, there actually WAS a backup: it was done daily, on a central backup server with a Really Big tape library attached to it; this server managed backups for the whole company, was Really Expensive and had Real Backup Software installed on it.
So far, so good. We look up the backup job, load the proper tape, start the restore operation, the tape is loaded, the restore starts... and just nothing happens.
We try again, same thing.
We unload, reload, reboot, try restoring previous backups... nothing changes.
We assume some long operation is going on, and leave it running the whole night... next day, still nothing has changed.
Ok, time to call the Real Backup Software vendor's support... but it can't be done: it's Sunday. We try to look up the vendor's support site, but a special access code is required, and only one manager has it... the same manager who will be Really upset to discover the system is still down when he comes to work on Monday.
Another day of pain, and I discover the bug is well known and has been fixed by a vendor patch, which (obviously) nobody had bothered to apply. So I go to apply it... but it can't be done: management doesn't want to risk breaking anything unless the vendor confirms the patch can be safely applied; the fact that the backup server wasn't able to restore anything apparently didn't look "broken" enough to them.
Only after four whole days, various support calls and the vendor sending a support engineer onsite were we finally able to apply the patch and restore the backup; the backup server had NEVER been able to do restores, but nobody had ever tested it, so nobody had noticed.
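The moral writes itself: a backup you have never restored is not a backup. For illustration, here is a minimal sketch of a scheduled restore test in Python; restore_cmd is a hypothetical stand-in for whatever CLI your backup software actually provides, and the directories are placeholders.

```python
# Minimal sketch of a scheduled restore test. "restore_cmd" is a hypothetical
# stand-in for the backup software's CLI; SOURCE_DIR and SCRATCH_DIR are
# placeholders. The only point is that a restore gets exercised regularly.
import hashlib
import pathlib
import random
import subprocess
import sys

SOURCE_DIR = pathlib.Path("/data")            # directory covered by the backup
SCRATCH_DIR = pathlib.Path("/tmp/restore-test")
RESTORE_CMD = ["restore_cmd", "--latest"]     # hypothetical vendor CLI

def sha256(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def main() -> int:
    SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
    candidates = [p for p in SOURCE_DIR.rglob("*") if p.is_file()]
    if not candidates:
        print("Nothing to test")
        return 1
    sample = random.choice(candidates)
    restored = SCRATCH_DIR / sample.name
    # Ask the backup software to restore this one file into the scratch area.
    subprocess.run(RESTORE_CMD + [str(sample), str(restored)], check=True)
    if sha256(sample) != sha256(restored):
        print(f"RESTORE TEST FAILED for {sample}")
        return 1
    print(f"Restore test OK for {sample}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Comparing against the live copy assumes the file hasn't changed since the last backup run, so it's only a smoke test; but a restore path that has never worked fails on day one instead of four days into an outage.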
I lifted a computer I had received from our parent company's office and heard something rattling around inside. When I opened the case, I found it half full of Frosted Mini-Wheats. I figure a mouse was living in it or using it as a food cache. The likely entry point was the gap between the case and the DIN keyboard jack.
Not exactly what you asked for, but a definite WTF.
I knew someone who decided to reorganize the files on their computer by putting all the .exe's in one folder.
A network with ~60 (SIXTY) PCs.
A security-fanatic boss.
Some new switches with VLAN capabilities.
A "network reorganization plan" involving ~20 (TWENTY) VLANs.
Thanks to some unknown higher power, I left before any of this could actually get started...