Friday, January 5, 2007

The day I got a lot smarter

One sign of intelligence is the ability to learn from your mistakes. An even better sign is the ability to learn from someone else's mistakes. Unfortunately, we don't always have the luxury of watching someone else learn a valuable lesson, and we have to do it ourselves. But if we pay attention, sometimes we get to learn multiple lessons from one mistake. (Lucky us.)

Case in point: dealing with a crisis. I was managing a group of web developers, and the project lead on an integration with our largest client was going on vacation. He assured me his backup was fully trained and would be able to deal with any issues. He left on Friday, and we deployed some new code on Monday. Everything looked good.

On Wednesday at about 4 p.m., we got a call asking about an order. We couldn't find it in our system. From what we could tell, the branch that placed the order wasn't set up to use our system yet, so we shouldn't have the order. At 5 I let the backup go home for the day while I worked on writing up what we'd found. I sent an internal email explaining what I believed had happened. I said that I would call the client and explain why we didn't have the order, and that they should check their old system.

While double-checking the deployment plan, I discovered that the new branch actually was on our new system ... as of that Monday. That's part of what was included in the new code. That's when I got the shiver down my spine. By that time the backup, whose house was conveniently in a patch of bad cell coverage, was gone. The lead was on vacation. "Okay," I thought, "I've seen most of this code, in fact I've written a good bit of it. I can figure this out."

Stop laughing. It sounded good at the time.

To make a long story short (too late!), we hadn't been accepting orders from several branches for three days, but had been returning confirmations for them. It was somewhere around 3 a.m. when I finally thought I knew exactly how many orders we had dropped, though I hadn't found the actual bug in the code yet. I created a spreadsheet with the list of affected orders. At one point I used Excel's drag-to-copy feature to fill a range of cells with the branch number for a set of orders.

Did you know Excel will automatically increment a number if you drag to copy? Yes, I know it too. At 11:30 in the morning today I know it. At 3 a.m. that night I apparently didn't know that. So I sent it to the client with non-existent branch numbers that I didn't double-check. "Oops" apparently doesn't quite cover it.

The next morning on a conference call with the client, my boss, his boss, and several other people, we were going over the spreadsheet when someone noticed the problem. To me, it seemed obvious that it was a simple cut-and-paste error on the spreadsheet. But someone -- a co-worker, believe it or not -- decided to ask, "Are you sure? Because I don't see those other two branches on here either." After dumbly admitting that I didn't know anything about any other two branches, I ended the call so I could go figure out what was happening.

Now I had apparently demonstrated that I didn't actually know what was wrong, that I had no idea of the scope of it, and that I was trying to cover it up. Yay me. We called in the lead (who was spending his vacation at home doing renovations) and started going through the code. I finally found the cause of the error, and it accounted for exactly the list of dropped orders that I had sent out early that morning, apart from the cut-and-paste error. The "other two branches" turned out to be from the previous night's email, where I had specifically said those branches were not affected by the problem.

Within two hours, we had the code fixed and all the orders recovered. So everyone's happy, right? If you think so, then you haven't yet learned the lessons I did that day.

  1. No matter how urgently someone says they need an answer, the wrong answer won't help.

  2. If it looks like the wrong answer, it might as well be the wrong answer. This doesn't mean counter-intuitive answers can't be right. It means that presentation and the ability to support your conclusion count.

  3. If you didn't create the problem, always give the person who did the first chance to fix it.

  4. If someone knows more about a topic than you do, have them check your work.

  5. Don't make important decisions on too little sleep.

  6. Before making a presentation to a client, review the materials with your co-workers.

  7. Don't make important changes when key people are unavailable.

Looking at that list, I realize I already knew several of those lessons. So why did it take that incident to "learn" them? Because there's a difference between knowing something and believing it.
