Brain farts

What happened?

In spite of plenty of advance warning from Twitter, we got caught by the Twitpocalypse bug.

For the 2.0.1 release, we had tested our software extensively. I actually wrote an emulation layer on top of the code that reads data from Twitter that added a large number to every ID read from Twitter. This testing uncovered several bugs which were fixed and incorporated into the MGTwitterEngine open source code.

Unfortunately, this testing didn’t take into account that the library we use to parse the data (YAJL) includes a range check that ensures signed values, which are allowed by the JSON specification, will fit into 32-bit storage. The “YAJL Error 3” is that library telling us that the range check failed.

In hindsight, I should have fed the parser some large unsigned integers. In the wonderful world of software development, we call this a brain fart.

How did we respond?

Unfortunately, the Twitpocalypse occurred on Friday evening, just as my wife and I were heading out for a high school graduation. I quickly shot off an email to Twitter asking a few questions (I had mistakenly thought that there were problems with data being returned by an authenticated connection.)

After a couple of hours of iPhone email and SMS, it was clear that we were going to need do a new release. We had a critical bug that affected thousands of users. And our fear was this submission would take longer than normal: many developers are submitting 3.0 updates.

Life officially sucked at this point, and the graduation ceremony was memorable for all the wrong reasons.

After getting a few hours of fitful rest, I opened up Xcode on Saturday morning and started looking for the problem. It took just a few minutes to find the bug and another few minutes to fix it. Our concern at this point was getting the update to users.

Apple saves the day

In spite of it being the middle of weekend after a busy week at WWDC, we were able to get in touch with Apple Developer Relations. Our contact was able to expedite the approval of the application.

To say that this was a relief would be the understatement of the century. The update for the free version started showing up in iTunes on Sunday evening. The paid version was updated on Tuesday morning.

What can we learn from this?

Developers are human. We make mistakes. It’s interesting to note that Loren Brichter and Buzz Anderson, fellow developers whose skills I hold in high regard, were also affected by the Twitpocalypse. It really does happen to the best of us.

It’s also interesting to see that Loren and Buzz reacted in the same way I did: by fixing the problem ASAP. In Loren’s case, that meant finding a hotel with WiFi in order to distribute his update. Buzz released a new beta in a matter of hours.

In my experience, these brain farts are problems with easy fixes. It’s something like checking for invalid bounds, getting a Boolean state wrong, or something else of that nature. It’s not a complex problem: it’s an oversight.

The problem, therefore, is not how fast we can react to fixing critical bugs, but how fast the App Store reviewers can react with an approval.

As a device that’s constantly connected to the Internet, the iPhone taps into a stream of data that is unpredictable. Data in “the cloud” can change at any time, and web applications naturally adapt to these changes with a quick deployment strategies. Those of us who are building client applications on top of this network infrastructure need the ability to adapt quickly, too.

Security issues are another area where a quick response is a requirement, not a luxury. If I discover something that puts a user’s private data at risk, it’s my responsibility to fix the problem as quickly as possible. Time spent in a review queue is time spent being exposed to a flaw.

At the same time, we shouldn’t blame Apple for these delays in reviewing applications. In the year that they have been selling our products, the App Store has been more successful than anyone imagined. It’s clear to me that reviewers and others involved in the approval process are overwhelmed by this success.

How can we fix it?

Fortunately, I think there’s a simple way to solve this problem for all developers selling products on the App Store. The inspiration for this solution will be obvious to anyone who’s used Apple’s Developer Technical Support (DTS.)

When you purchase an ADC membership, you are given a number of “incidents”. These DTS incidents can be used when you have a problem that can’t be solved through documentation, support forums or hours and hours of debugging. It’s for the hard stuff, and usually involves getting an engineer at Apple involved to understand and fix the issue.

As a developer, I’m very careful to use these incidents wisely: they are a last resort. There are some years where I don’t use them at all, those are the good years.

A similar system could be put in place for critical bug fixes on the App Store. If every developer was given one or two “prioritized reviews,” it would act as insurance for the brain farts. You’d have a way to raise a flag and say “I need special attention for a critical bug.”

If another developer has a critical bug, I have no problem with my review process for a feature release taking a little longer. And since prioritized reviews would be a scarce resource, they won’t be open for abuse because developers will think twice before using them.

Because it’s not a matter of if you have a brain fart that leads to a critical bug, it’s a matter of when.