RBS automates away £7 million in manual server provisioning tasks

The UK banking group has saved up to £7 million (nearly $9 million) over less than two years by eliminating a whole host of manual server provisioning processes


The Royal Bank of Scotland (RBS) Group – which will soon be rebranded as NatWest Group under new CEO Alison Rose – has been on a major technology modernization drive since its government bailout in the wake of the 2008 financial crisis.

A key part of that mission involves speeding up software delivery cycles for internal and customer-facing applications while cutting inefficient processes and costs. The main lever has been automating away manual server provisioning tasks, a practice that has picked up pace across the bank over the past three years.

Prior to this initiative, “releases weren’t quick enough,” admitted David Sandilands, infrastructure engineer at RBS, during a recent webinar with the devops tool vendor Puppet. “There were lots of projects with fixes and updates having to be applied manually on top of our build. The process was highly manual, with emails and desk drop-ins to ask for pull requests.”

Automation was the obvious answer. But to do that effectively, RBS first needed to embrace modern cloud infrastructure.

Like many financial services institutions, RBS has been steadily shifting away from physical servers to virtualized and cloud services – mostly private cloud for now, but increasingly some public cloud infrastructure-as-a-service from Amazon Web Services (AWS) and Microsoft Azure, depending on the workload. These new platforms have enabled RBS to adopt automation techniques that have saved as much as £7 million (nearly $9 million) since October 2018.

Where it all started

When Sandilands joined RBS in 2005 the bank was running a large Unix estate spanning some 600 physical servers. Engineers at the bank would develop a build and hand a list of requirements to the operations team, “kicking off a long period of development until it reached operations, where it was almost too late to review in a hugely meaningful way,” Sandilands explained to InfoWorld. “If they found things that weren’t exactly what they needed, that led to either a huge delay to the release or it would be pushed into the next release.”

The typical release cycle was marked by a lack of visibility metrics, manual enforcement of policies via shell scripts and checks, no redundancy or resiliency for build servers, and an expectation of regular build failures – so much so that a Word document detailed the issues that commonly occurred. Then, once a system was built, there were still weeks-long waits to go live as staff worked through rigorous, manual, fit-for-production checks.

Today most of the bank’s servers run on Red Hat Enterprise Linux (RHEL) and VMware. This shift from physical to virtual infrastructure – and increasingly some containerized workloads – has driven home the need for more automation tooling.

That shift has brought a whole host of new infrastructure-as-code tooling into the bank in recent years, from Ansible, Terraform, and vRealize to Concourse CI and Chef InSpec.

Take Puppet for configuration management as an example. The bank started small, experimenting with the open source version of Puppet eight years ago as Sandilands and his team looked to iron out major issues with the bank's release processes.
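Configuration management in Puppet is declarative: rather than scripting every change, engineers describe the desired end state of a server and the Puppet agent converges machines toward it on each run. The article does not describe RBS's actual manifests; the resource names and file content below are purely illustrative of the style of tooling involved:

```puppet
# Illustrative only: ensure a time-sync service is installed,
# configured, and running. Rerunning this manifest is safe
# because it describes state, not a sequence of steps.
package { 'ntp':
  ensure => installed,
}

file { '/etc/ntp.conf':
  ensure  => file,
  content => template('ntp/ntp.conf.erb'),
  require => Package['ntp'],
}

service { 'ntpd':
  ensure    => running,
  enable    => true,
  subscribe => File['/etc/ntp.conf'],
}
```

Because a manifest like this is idempotent and version-controlled, it replaces the emails, desk drop-ins, and ad hoc shell scripts the bank previously relied on to enforce server policy.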

All of these tools have “highlighted how scale has changed from physical to virtual to containers, but the surrounding change management processes still need to happen, which makes clear the case for automation,” Sandilands said.

Changing change management

As it turned out, however, getting the cautious change management team on board was a challenge. They had several concerns regarding the new tooling, much of it open source in the early days.

“The general discussion with them was about the whole pipeline and how all the tools work together to fully test and deliver reliable change,” he explained.

To assuage these concerns, the bank ran workshops and demonstrations workflow by workflow, tool by tool, to show that smaller, well-tested changes would in fact make deployments safer. This was paired with a drive towards peer-reviewed documentation, with Confluence pages and plenty of how-tos shared across the IT function.

“That marketing part is so critical,” he added. “We use Facebook Workplace to share demonstrations and run sessions, as well as regular meetings and workshops.”

Speaking about Puppet specifically, Sandilands said the change management team's biggest fear was that Puppet Enterprise could make changes to all servers at once.

“Deploying something as root creates a major fear for change management and there are huge estate-wide changes happening all at once,” he said.

“The other big fear was regular small changes, because all they heard was ‘more regular change,’ which there was a cultural fear around.”

Not to mention that automation always raises job security concerns, something Sandilands is keenly aware of. “The big question we got from project teams often was, ‘Will this mean we lose our jobs?’ No, we move up the stack,” he said.

Key metrics

The change of approach has helped the bank reduce its internal service level agreement (SLA) for completing all fit-for-production tests from two weeks, before the automation tooling was put in place, to three days. RBS aims to fully automate this process in the near future.
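The article does not detail what those fit-for-production checks look like, but a tool like Chef InSpec, which the bank lists among its tooling, turns such checks into versioned code that can run automatically against every build. A hedged sketch follows; the control names, titles, and policy values here are hypothetical, not RBS's actual standards:

```ruby
# Illustrative InSpec controls in the style of automated
# fit-for-production checks. Policy values are hypothetical.
control 'prod-ready-01' do
  impact 1.0
  title 'SSH must not permit root login'
  describe sshd_config do
    its('PermitRootLogin') { should eq 'no' }
  end
end

control 'prod-ready-02' do
  impact 0.7
  title 'Time synchronization service must be enabled'
  describe service('chronyd') do
    it { should be_enabled }
    it { should be_running }
  end
end
```

Checks expressed this way run in minutes rather than the weeks a manual review queue takes, which is what makes a three-day (and eventually fully automated) SLA plausible.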

The bank’s entire code base has also been simplified and streamlined to work on modern infrastructure: the Windows estate shrank from 25,099 lines of code to just 1,680, with full version control and change history. Jira tickets are also down sharply as builds are deployed in a more standardized way, from 201 issues arising from the last build under the old method to just four production-impacting issues more recently.

This shift has also seen the bank take a more flexible approach to hardware provisioning. Where before it was buying physical servers which might sit around for months on end or were under-utilized, now it charges application teams a daily rate to use a private cloud instance, which they can access via a catalog in ServiceNow. The bank was getting 450 IaaS server requests through that portal each month at last count, in September 2019.

This charge-back scheme directly incentivizes teams to cut their infrastructure costs by decommissioning hardware they don’t need and cutting out manual processes, freeing them to focus on their applications and get their hands on best-of-breed cloud-native tooling.
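The economics behind that incentive are simple to sketch. The figures below are invented for illustration (the article gives no prices), but they show why a daily rate exposes the true cost of under-utilized hardware in a way an upfront purchase never does:

```python
# Hypothetical figures: comparing the cost per useful day of an
# upfront hardware purchase against a pay-per-day cloud instance.

def physical_cost_per_day(upfront: float, days_actually_used: int) -> float:
    """Effective cost per useful day when hardware is bought outright."""
    return upfront / days_actually_used

def cloud_total_cost(daily_rate: float, days_used: int) -> float:
    """Total charge-back when a team pays only for the days it keeps the instance."""
    return daily_rate * days_used

# A £3,000 server used only 30 days in its first year:
per_day_physical = physical_cost_per_day(3000, 30)  # £100 per useful day
total_cloud = cloud_total_cost(12, 30)              # £360 at a £12/day rate
```

Under a model like this, a team that decommissions an idle instance sees the saving immediately on its own bill, rather than the cost disappearing into a shared hardware budget.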

To prove out the shift from physical to virtual, and from traditional development to devops, Sandilands and his team compile regular metrics packs around the bank’s software releases, which they share with management as PDF documents. In a large banking group, cost is always going to be a key metric for Sandilands and his team.

Showing savings – like that headline £7 million figure – and the return on investment allows Sandilands and his team to better justify their push for more automation and identify key areas for future investment.

“You have to be able to present back to management to explain what you are getting out of it,” he said.

His advice for other organizations making this shift? “Work out what is hurting you most or what would have the biggest return and focus on that.”

Copyright © 2020 IDG Communications, Inc.
