Tuesday, October 22, 2013

inevitable failure

One of the main topics in the news lately has been the troubled roll-out of the healthcare.gov website, the main portal for people in the thirty-six states that did not create portals of their own.  Amid all of the grandstanding, excuses, and political showmanship from both the Right and the Left, one question keeps coming up in my mind: wasn't this to be expected with a major website roll-out?

I understand, reading the news reports, that most people have never been part of a major IT roll-out like the healthcare.gov website.  A significant minority of the population has, though, and the news reports are notably silent on the inevitability of these outages, which should be obvious to anyone who has taken part in such a roll-out.  Simply put, large IT projects with immovable release dates, extraordinary load requirements, and multiple complex inputs do not usually roll out successfully on the first try.  Many times I have seen roll-outs pushed back months due to unforeseen circumstances, and with a complex roll-out it is almost guaranteed that something unforeseen will occur.

As I read over the last year about the impending go-live date for the website, I was always inwardly thankful that I was not involved with what was obviously going to be a failed release.  The idea that major bugs would be fixed, security and load testing completed, and all of the unforeseeable issues that plague any roll-out resolved by a very public, unchangeable date was absurd.

While there will be calls for heads to roll, and many probably will, this whole thing smacks of a misunderstanding of how major websites are rolled out.  Few of the people sacked or called out publicly will have deserved it.  This was a failure in the planning stages of the project, and it will be the implementers who take the heat.  That is typically how the project blame game works.

If there is a lesson to be learned, it is that something like this should be phased in slowly, with changes made based on the lessons of each phase.  The target date should have been July so that necessary changes could be made when the roll-out inevitably failed.  Those running the project could have crowd-sourced the testing process, inviting individual volunteers to try to overload and break the system, then used that feedback to know which issues needed addressing.  Also, to reduce load, the thirty-six states could have been brought onto the portal week by week.
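To make that week-by-week idea concrete, here is a minimal sketch in Python of a phase-in gate.  The start date, the state groupings, and the function name are all hypothetical, invented purely for illustration; nothing here reflects an actual plan.

    from datetime import date, timedelta

    # Hypothetical week-by-week phase-in schedule (illustrative only).
    LAUNCH_START = date(2013, 7, 1)
    PHASE_IN_WEEKS = [
        ["TX", "FL"],        # week 0: start with a couple of large states
        ["OH", "GA", "NC"],  # week 1
        ["MI", "NJ", "VA"],  # week 2
        # ...remaining states added in later weeks
    ]

    def portal_open_for(state, today):
        """Return True if `state` has been phased onto the portal by `today`."""
        for week, group in enumerate(PHASE_IN_WEEKS):
            if state in group:
                return today >= LAUNCH_START + timedelta(weeks=week)
        return False  # state not yet scheduled

    # During week 1 (starting July 8), OH is live but MI is not.
    print(portal_open_for("OH", date(2013, 7, 9)))  # True
    print(portal_open_for("MI", date(2013, 7, 9)))  # False

The point of the design is that each week's load is a fraction of the full population, so whatever breaks in week one can be fixed before the next group of states comes online.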

That's just me being an armchair IT guy, though.

1 comment:

roamingwriter said...

Isn't it always the planning stages where things go wrong? I mean, someone at that stage should have said, no, wait, that will take a lot longer than that if we're going to test it, etc.?