Success in Multicloud: Processes for High Availability

You Are Already Multicloud; How’s it Going?

In point of fact, most organizations have long been trivially multicloud. After all, about 97% have been using multiple SaaS providers for years. More recently, a majority (two-thirds, in the Nemertes 2019-2020 Cloud and Cybersecurity Research Study) have come to use IaaS and PaaS to deliver services in production. What is new this year is that the average organization now puts more than half of its IT workloads in external clouds. In fact, only 40% of workloads, on average, remain in the data center.

Multicloud = Multicomplicated

IT struggles to integrate external cloud services cleanly and seamlessly into its environment. It has to adjust policies, strategies, and roadmaps; job descriptions and team structures; and many processes. It also must adapt its monitoring and management toolset to show how its external cloud workloads are performing. Just as importantly, it needs tools, skills, and processes to help it speedily resolve issues with the workloads it runs in IaaS and PaaS. How quickly IT restores normal service delivery after an incident is a key metric for judging the organization’s success in multicloud.

Mean Time to Normal: Cloud Success Metric

Incidents that affect availability are nearly inevitable. IT needs to be able to restore normal service fast when they happen. This is a two-part process: figuring out what has gone wrong, and remedying the problem to restore normal service delivery.

In our study, we collected data on MTTI (mean time to identify a service problem) and MTTR (mean time to restore services once a problem is identified). From these we calculated participants’ MTTN (mean time to normal service delivery). We took this as the best metric for judging their success in delivering services from the cloud. And we saw that the top third of organizations are nearly 100 times faster than the bottom two-thirds: they restore service in 29 minutes versus more than 2,600 minutes. Though they have more incidents, they resolve them so much faster that overall they have much higher availability.
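To make the arithmetic concrete, here is a minimal Python sketch. It rests on assumptions that are not taken from the study: that MTTN is simply MTTI plus MTTR, that incidents do not overlap, and that availability is measured over a full year. The incident log and the yearly incident counts are hypothetical; only the 29-minute and 2,600-minute figures come from the text above.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60


@dataclass
class Incident:
    minutes_to_identify: float  # onset until the problem is identified
    minutes_to_restore: float   # identification until normal service resumes


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def mttn(incidents):
    """Mean time to normal, ASSUMED here to be MTTI + MTTR."""
    mtti = mean(i.minutes_to_identify for i in incidents)
    mttr = mean(i.minutes_to_restore for i in incidents)
    return mtti + mttr


def availability(incidents_per_year, mttn_minutes):
    """Fraction of the year with normal service (non-overlapping incidents assumed)."""
    downtime = incidents_per_year * mttn_minutes
    return 1 - downtime / MINUTES_PER_YEAR


# Hypothetical incident log for one team.
log = [Incident(10, 20), Incident(5, 25), Incident(12, 15)]
team_mttn = mttn(log)

# Hypothetical incident counts: the fast team has twice as many incidents.
fast = availability(incidents_per_year=24, mttn_minutes=29)
slow = availability(incidents_per_year=12, mttn_minutes=2600)

print(f"MTTN: {team_mttn:.0f} min")
print(f"fast team availability: {fast:.4%}")
print(f"slow team availability: {slow:.4%}")
```

Even with twice as many incidents in this made-up scenario, the fast team accumulates far less downtime over the year, which is the effect described above.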

Two Big Steps to Help Improve MTTN

While we saw a broad range of technologies, positions, and practices that are associated with improvements in MTTN, today I want to focus on two processes.

Build a Workload Placement Process (WPP)

A WPP allows IT to make consistent, careful decisions about where to place each workload: data center (DC), IaaS, PaaS, or SaaS. The decision weighs many factors, including performance needs, availability needs, audience location, risk factors, compliance requirements, staff skillsets and loading, and more. Organizations that have and use a WPP to decide what goes into the cloud have a 42% lower MTTN than those that do not.
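One way such a process could be made repeatable is a weighted scoring sheet. The Python sketch below is an illustration only, not the method used by study participants: the criteria weights, the 0-to-5 ratings, and the `blocked` parameter (for venues ruled out by a hard constraint such as a compliance requirement) are all hypothetical.

```python
# Hypothetical weights for a subset of the criteria named above (sum to 1.0).
WEIGHTS = {
    "performance": 0.25,
    "availability": 0.25,
    "compliance": 0.20,
    "risk": 0.15,
    "staff_skills": 0.15,
}


def score_venue(ratings):
    """Weighted sum of 0-5 ratings for a single venue."""
    return sum(WEIGHTS[criterion] * ratings[criterion] for criterion in WEIGHTS)


def place_workload(ratings_by_venue, blocked=()):
    """Return the highest-scoring venue that no hard constraint rules out."""
    candidates = {
        venue: score_venue(ratings)
        for venue, ratings in ratings_by_venue.items()
        if venue not in blocked
    }
    if not candidates:
        raise ValueError("no venue satisfies the hard constraints")
    return max(candidates, key=candidates.get)


# Hypothetical ratings for one workload (5 = best fit).
ratings = {
    "DC":   {"performance": 4, "availability": 3, "compliance": 5, "risk": 4, "staff_skills": 4},
    "IaaS": {"performance": 4, "availability": 4, "compliance": 3, "risk": 3, "staff_skills": 3},
    "PaaS": {"performance": 3, "availability": 4, "compliance": 3, "risk": 3, "staff_skills": 2},
    "SaaS": {"performance": 2, "availability": 5, "compliance": 2, "risk": 3, "staff_skills": 5},
}

print(place_workload(ratings))
print(place_workload(ratings, blocked=("DC",)))
```

The value of a sheet like this lies less in the exact numbers than in forcing the same questions to be asked, and the answers recorded, for every workload.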

Build a Cloud Workload On-boarding Process (CWOP)

A CWOP brings consistency to the process of bringing a new or migrating workload online in the cloud. It focuses on making sure IT understands the stakeholders involved and the contextual requirements of the job. Many of these will have been captured in the course of working through the WPP, and are now used to ensure a smooth launch and smooth operations. Organizations that have and use a CWOP to bring consistency to how things move into the cloud have a 35% lower MTTN than those that do not.

Knowing It is Half the Battle

These processes make for improved MTTN because they ensure that IT understands the work it is putting in the cloud. When IT knows more about the workloads, it is easier for staff to find and fix problems. Tools and staff also play a big role in ensuring success, but tying them together with solid processes is an indispensable step on the road to multicloud success.