All too often, we hear stories of software failing as IT systems become an increasingly prominent part of our everyday lives. In the following article, we investigate some of the most significant factors behind past examples of failed software projects.
If you’ve ever been involved in a software project, as either a product owner or a developer, I’m sure the question of why software projects fail has crossed your mind at least once. Maybe you’ve wondered what can go wrong in a software project or, more specifically, what the differences are between a successful and an unsuccessful one. Whether it’s down to unrealistic expectations, poor communication, or even inadequate planning, we need to consider the impact of a variety of intertwined factors in order to best approach this question.
Dr. Robert Charette, an internationally acknowledged authority on risk management and IT systems, claimed that software project failures have a lot in common with plane crashes. “Just as pilots never intend to crash, software developers don’t aim to fail. When a commercial plane crashes, investigators look at many factors, such as the weather, maintenance records, the pilot’s disposition and training, and cultural factors within the airline.” As you’re about to see, getting to the root cause of a failed software project means considering a variety of factors, including but not limited to technical issues, project management, the business environment, and organisational culture.
The migration that brought an entire bank to its knees
To illustrate how disastrous the consequences of a failed software project can be, I want to start by telling you about a botched IT systems upgrade by one of the UK’s mainstream retail banks. Later, it would be revealed that the resulting meltdown alone cost them over £330m and 80,000 customers - much of which was put down to a lack of testing!
Back in 2018, TSB Bank embarked on an ambitious migration project to separate its IT systems from those owned by its former parent, Lloyds Banking Group, following TSB’s acquisition by Spanish bank Sabadell. At the time, TSB’s new parent company was warned that the plan was high risk, and that the cost of executing it would likely exceed the £450m contributed by Lloyds.
The project pressed ahead regardless, and a hard-and-fast deadline was set. In just 18 months, the team had to develop the new system and migrate TSB over to it. To make matters worse, unrealistic expectations started to emerge straight away. The project was compared to migrations that Sabadell’s IT arm had previously carried out for small local banks in Spain. As you might have guessed, none of these small banks came close in scale to the sprawling legacy systems that the team was up against.
On top of everything else, the team tasked with the migration had neither full access to and control over Lloyds’ systems nor the requisite understanding of them. They were essentially going in half-blind with one hand tied behind their back. As I’m sure you can appreciate, each failure and omission at this point was only going to result in future setbacks at the very least, or at worst a catastrophic failure. Unfortunately for TSB, what resulted was no mere setback.
Eventually, Sabadell would announce the completion of TSB’s new IT system and the ‘successful’ migration of 5.4 million customers. However, it was only a matter of hours before reports started to emerge: over 1.9 million customers were locked out of their internet and mobile banking. Some of those customers were unable to access their accounts for up to a week. From the misallocation of funds among accounts to mortgages appearing triple their actual size, the failure of TSB’s software and migration project was now affecting its customers on a personal level. I don’t know about you, but I wouldn’t be staying with a bank that I could no longer trust to guarantee reliable access to my own money.
Following a full investigation into the meltdown of TSB’s IT system, a report found that Sabadell had ultimately “cut corners” when it came to critical testing. Whilst we may never know either the full extent or the exact cause of TSB’s failed project, it is safe to say that shortfalls in planning, communication and testing were all significant contributing factors.
What can we learn from the TSB fiasco?
If one thing’s for sure, it’s this: TEST, TEST, TEST! An independent report concluded that TSB lacked “common sense” and shifted to a new IT platform before it had been fully tested, but we must ask ourselves why there was a lack of testing in the first place. Not only did a lack of proper planning lead to little or no testing, but, scarily enough, testing is often the first thing abandoned in any software project under time pressure.
The initial choice of implementation approach is a critical decision for any IT transformation project, particularly when major software changes and new infrastructure are required, as in the case of TSB. Here, the decision was made to migrate the whole of its existing customer base over the course of one weekend, as a single event or ‘big bang’ migration. Whilst the core advantage of doing so is that it is faster, cheaper and less complex than a migration implemented in phases, it is clear that TSB did not give sufficient consideration to whether this was the right approach, or to the risks involved.
TSB sought to de-risk the main migration event through a number of transition events, or live proving, which migrated parts of the functionality ahead of the initial production release. However, the functionality put into use only represented a small part of the new platform. In effect, the live proving was not done at a scale that would allow TSB to identify the problems that would arise when its entire customer base was transitioned. Although live proving failed to sufficiently reduce the risk of the migration approach, that risk could still have been properly mitigated if the new platform had been subjected to rigorous testing first.
An ambitious (and unrealistic) timetable was set from the beginning, and TSB relied on Sabadell and its previous migration experience to deliver the new platform at speed. Functional testing of the new platform was significantly delayed due to defects in the software. Given the strict time constraints imposed, the majority of non-functional testing was then done in a highly compressed period of time. This mattered because the testing was needed to confirm that the new platform could operate at the service levels expected by TSB. Since it was not done properly, critical issues with the new platform were not identified (specifically around the configuration of two data centres), which contributed significantly to the problems experienced by TSB’s digital customers.
Critical success factors for software projects
So, having explored the TSB disaster, what can the failure of software projects be put down to? Professor Shamsul Sahibuddin from the University of Technology, Malaysia, alongside Dr. Mohd Hairul Nizam Nasir, conducted a study of the critical success factors for software projects to determine which were the most significant. Interestingly enough, they discovered that non-technical factors dominated those examined, coming in at 94%, as opposed to technical factors, which covered just 4%. Based on these findings alone, Tom DeMarco’s 20-year-old claim that “the success or failure of a software project is seldom due to technical issues” holds true to this day.
In reality, these technical issues are often easily alleviated with proper processes and people management. As we saw in the case of TSB Bank, the following non-technical factors are just some of the root causes of the critical technical issues ultimately encountered by its customers:
- Inadequate planning.
- Unrealistic project goals.
- Inaccurate estimates of required resources.
- Poor communication.
- Poor project management.
- Inadequate risk management measures.
When it comes to building software, we have identified the 10 biggest risks in software development. Mitigating these will help to reduce the likelihood of your project failing. As you can now appreciate, a large enough failure has the potential to jeopardise an entire organisation’s prospects.
Last updated: 24 June 2020