Reflecting on IT Risk

A Reflective Piece – A look at an independent review of the TSB IT platform migration incident back in April 2018

Slaughter and May’s report on the TSB failure was published in October 2019 and provides an independent review of events before, during and after the failed IT migration. For those who don’t know, TSB is a UK retail bank that provides customers with current accounts, loans, mortgages etc. The incident was widely reported in the UK press and placed under a high degree of scrutiny by UK regulators and government.

I don’t claim to know much more about this failure than what appeared in news clippings and what features in this report. However, I do find the observations in the review really interesting from a risk and resilience perspective. Several themes in the findings now feature as key areas for development across the risk management landscape. I think this serves as a useful case study to justify some of the work done in our space.

Anyway, here is a summary of what I read and my high-level thoughts/notes on this report.

The Report

It’s 262 pages containing 23 chapters, broken up into 3 main sections – wow!

  1. Acquisition by Sabadell subsidiary SABIS and the mobilisation of the TSB IT transformation
  2. Execution of transformation and its delay
  3. Re-plan exercise, go live and subsequent events

The first 10 pages provide a very useful executive summary.

The Event

Circa 5 million TSB customers were to be migrated to the SABIS platform on the 22nd of April 2018, when the platform became unstable and almost unusable. This event generated 10 times the usual number of complaints and 70 times the usual number of opportunistic fraud cases.

Imagine being a trader requiring cash flow, or a member of the public trying to pay a last-minute bill who, if they can’t, falls further into debt. Or imagine a vulnerable customer trying to access their cash to buy dinner that night. I imagine it is particularly stressful for customers to be denied access to their account and cash.

The Report Observations (As far as I can see)

  • TSB inherited a legacy IT platform from its relationship with Lloyds Banking Group and was then required to migrate to an entirely new IT banking platform.
  • Following the acquisition of TSB, an ambitious and unrealistic 17-month go-live timeline was initially set without detailed knowledge of the technical requirements. Functional testing overran the plan by 10 months, meaning non-functional testing only started at the point of the previous go-live date.
  • The board did not question why TSB would be “migration ready” four months after the previous go-live date, even with project streams still delayed by as much as 7 months! Furthermore, the re-plan date was announced publicly!
  • To meet new target dates, performance testing targets were reduced. Reporting on non-functional testing and outstanding defects was also limited and inaccurate.
  • Limited third-party governance was undertaken because the relationship between SABIS and TSB was akin to an intragroup arrangement.
  • Risk oversight and audit were inadequate and lacked a robust independent opinion.
  • Migrating the functionality and data of an entire bank to an almost entirely new IT platform over a single weekend was very risky. According to the report, the board did not request or receive any advice on the risks or the full range of implementation options.
  • The organisation’s approach to de-risking was a small piloted series of early cutovers, each representing a small part of the bank. Other protections, such as insulation from cost overruns and exit options, were also in place, which reduced the risk of failure of a single migration.

I’m glad that case studies such as this exist for the risk and resilience community because they provide a real-life example of what could go wrong. They also enable us to point to an example that supports the case for effective risk management and governance (however dull and time-consuming it may appear to management!).

Are we the storytellers?

I guess the obvious thing that springs to mind when reflecting on these observations is that the board didn’t actually receive all the right information. I have seen this before in another example, where a hospital board was not aware of the lack of resources and training within a particularly critical department. That organisation was, at the same time, experiencing a significant increase in mortality rates, but no one put the two together for leadership. These are different scenarios but the point is the same – if leadership don’t know, then what are they supposed to do about it?

Many mature, complex organisations have a very comprehensive board assurance framework, where leaders are informed by huge decks of information on a monthly basis about risk stuff. It would be naïve of us to truly believe that they read and understand every slide in every deck. Moreover, we could never expect them to challenge missing information. It is incumbent on risk and resilience professionals to find the most effective way to communicate the greatest risks to the organisation, regardless of what is in the standard reporting deck. Get your storytelling hats on, folks, and make the risk meaningful to management, because if they don’t get it, they won’t see it.

Is change the root of all risk? And do we need to communicate the upside?

I know it’s not the root of all risk, but sometimes it does feel like most major and emerging risks (whether realised or not) derive from some form of change to the business. This case study represents a major technology change that carried substantial risk. Therefore, one might suggest that any effective risk management programme should include change management controls in every possible area of the business, because change could well be the Achilles heel of the organisation. Having worked in transformation programmes, I have seen that any stage gate requiring approval or assurance before moving on to the next step is often perceived as a “blocker” or a hindrance to the progress of a project. This can sometimes create quite sensitive and difficult discussions. My experience so far is that opportunity wins against risk on almost all occasions. To begin with this felt wrong and counterintuitive given my studies and learning thus far, but I have now embraced the additional perspective, which weighs the opportunities of risk against the cost of doing business. I think this is a crucial part of the risk manager’s mindset as we balance the message to management about the risks being presented to the organisation.

Are organisations becoming just a brand that’s operated by Third Parties?

Third-party due diligence and oversight has become a popular theme in recent years. For example, the European Banking Authority released new guidelines in 2019 which went into great detail about how to manage the third parties operating within the financial services arena. The example above is a UK bank whose entire IT platform is moving to another organisation (albeit intragroup). The modern-day organisation often adopts a cloud-first strategy and chooses to work with products and services that are offered via SaaS solutions. It’s starting to look a lot like the organisation itself is nothing more than a brand with a thin veneer of operating management / relationship managers overseeing a vast array of third-party providers. Is the traditional organisation dead? I did bring this up with a very experienced supply chain manager not too long ago, and apparently for some organisations in some sectors this has been commonplace from as far back as the 90s. I wasn’t aware of this in financial services, although I can certainly see it as the direction of travel. Risk and resilience practitioners may need to factor this into their mindset when assessing a risk to the business.

Final Remarks

Using case study examples was as good for learning at university as it is for ongoing professional development. There is no denying that a lot appears to have gone wrong with the TSB example. The positive news is that they survived as an organisation, people kept their jobs and folks got their money – eventually.

I believe we are the storytellers. I believe we need to find the most effective way to communicate risk to the leadership. It needs to mean something to them so that they are empowered to make the right decisions.