On 31 Dec, I was busy most of the day working in the office - no New Year's bash for me; I was bashing a few issues in our first end-to-end connectivity test for the new application. The first end-to-end test is always painful. In the middle of all this bug bashing, a friend pointed me to a post about Microsoft Zunes just dying - by the thousands, all over the world. As I learnt more details of the issue, it seemed absurd to me that such a silly mistake as a leap year check could happen. Well, today when I saw the code here, I was laughing my ass off.
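For the curious, here is a minimal sketch of that kind of bug, in Python rather than the C of the original firmware. It assumes the widely reported shape of the problem: a loop that turns a day count into a year, and that never exits when the count is exactly 366 in a leap year. The names `is_leap_year`, `year_and_day` and `origin_year` are made up for illustration, and the buggy loop is shown only in a comment so the block stays runnable.

```python
# Buggy shape (as widely reported, paraphrased, not the actual firmware):
#
#   while days > 365:
#       if is_leap_year(year):
#           if days > 366:
#               days -= 366
#               year += 1
#           # days == 366 lands here: nothing changes, so the loop spins forever
#       else:
#           days -= 365
#           year += 1


def is_leap_year(year):
    """Gregorian leap year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def year_and_day(days, origin_year=1980):
    """Convert a 1-based day count (day 1 = 1 Jan of origin_year)
    into (year, day_of_year).

    Every pass through the loop either returns or consumes a whole
    year, so there is no value of `days` that leaves it stuck.
    """
    if days < 1:
        raise ValueError("days must be >= 1")
    year = origin_year
    while True:
        length = 366 if is_leap_year(year) else 365
        if days <= length:
            return year, days
        days -= length
        year += 1
```

The point of the rewrite is not cleverness: the loop body has exactly two outcomes, and both make progress. The last day of a leap year is just another value of `days` that satisfies `days <= length`.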
I see these bugs quite often in the work that I review, and when I point them out, the other guy can only laugh. Once I had this guy in my team who built a portal for us. On his last day on the contract he wrote a fix and deployed it to UAT. He left, and we never really bothered about the fix for a few weeks. Then one day we got a bug saying that the hyperlink for the error detail was failing. It seemed silly - but when I looked deeper I found that it was failing only when the link text was alphanumeric - it would work when the link text was only numeric, which was 98% of the time. I called the guy and asked him - he was a pal, so the talk was friendly. He laughed it off and told me which line to fix. I knew that already, but it seemed a sad state of affairs that something so silly and so obvious was left out - the guy could have paid a bit more attention and it would have saved us a bug.
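I am not reproducing the portal code here. As a sketch only, suppose a link handler treats the link text as an error ID and quietly assumes it is always a number. Everything in this block is hypothetical: the function name `parse_error_id`, the pattern, and the idea that the ID is passed around as text.

```python
import re

# Assumption for this sketch: a valid error ID is one or more ASCII
# letters or digits, e.g. "12345" or "ERR42".
_ERROR_ID = re.compile(r"[A-Za-z0-9]+")


def parse_error_id(link_text):
    """Validate link text and return it as an error ID string.

    The fragile version of this would be `int(link_text)`, which works
    for "12345" and raises ValueError for "ERR42".
    """
    text = link_text.strip()
    if not _ERROR_ID.fullmatch(text):
        raise ValueError(f"not a valid error ID: {link_text!r}")
    return text
```

The check is two lines. What matters is that the alphanumeric case was written down as a valid input at all, instead of being discovered by a user in UAT.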
In another case, we had a .NET application which did not have a dedicated support team - I was handling it alongside my other responsibilities. Most of the issues we got were due to lack of user training. So one evening, when I got a call from first-line support saying they could not export the data grid, I was inclined to push it back as a user error. But the pal on the other end was sure it was weird. So I went to look. I opened the grid, ran a query that returned about 100 rows and exported - OK. I ran a query that returned 0 rows and exported - I got a blank XML. Cool! Then he asked me to export 1, 2 and 3 rows. One and three rows worked fine, but 2 failed!!! It works for all n rows where n != 2! I still haven't figured out where it is going wrong!
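This is the kind of thing a small boundary test catches before first-line support does. The sketch below is in Python, not .NET, and has nothing to do with the real grid: `export_rows` is a stand-in exporter invented for illustration, and `check_export_sizes` simply exports 0, 1, 2, 3 and 100 rows, parses the XML back, and reports any size where the row count does not match.

```python
import xml.etree.ElementTree as ET


def export_rows(rows):
    """Hypothetical exporter: a list of dicts becomes an XML string."""
    root = ET.Element("rows")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(row_el, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def check_export_sizes(sizes=(0, 1, 2, 3, 100)):
    """Return the row counts for which the export does not round-trip."""
    failures = []
    for n in sizes:
        rows = [{"id": i, "name": f"item{i}"} for i in range(n)]
        try:
            parsed = ET.fromstring(export_rows(rows))
            if len(parsed.findall("row")) != n:
                failures.append(n)
        except Exception:
            # Any crash while exporting or parsing counts as a failure
            # for this size.
            failures.append(n)
    return failures
```

With this stand-in exporter the list of failures comes back empty. Pointed at an exporter with the bug above, the same loop would have returned `[2]`, which is a much nicer way to find out than a phone call in the evening.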
Is it so hard to put some checks and validations in your code? Is it so hard to figure out all the corner and conditional cases upfront? It's not hard, but it needs some effort - effort which might be easy but which we are too lazy to put in, or effort that gets neglected because we are running to meet an impossible deadline. Sometimes it is another, bigger issue that overshadows the main issue. Recently we did a major production rollout that impacted all users and broke compatibility. We had to lock out all users. The impact of failing to lock out all users, and of a user polluting the system with a wrong version of the code, was much, much greater than that of a wrong fix. So I spent 3 hours making a fix, testing it and releasing it to a few users in a controlled manner, and one week planning the lock-out strategy and fixing small leaks in our lock-out code. The fix ran for a week with those few users without any issue - so we were happy. We had issues in the release to DEV, but that was my Cruise Control scripts misbehaving. So three weeks and 2 pizza nights in the office later, we were there! Except that the fix had a small validation error that screwed up three users. In the shadow of the lock-out, and with the aim of avoiding a lock-out debacle, we screwed up the validation :(
Putting proper validations in code is not tough - if you make it a habit. Making it a habit is the pain.
Once you get into the habit, it's like riding a bike - you never forget.

