What pops into your mind when you hear the term “cloud disaster management”? Do you picture natural disasters or human-created ones? Do you picture a devastating event like catastrophic floods or a terror attack, or something less dramatic, like a power outage or a DDoS attack? Is cloud disaster recovery a simple process of running through a checklist, flipping a few switches and making a few calls, or is it a Herculean task requiring the best minds at your company to come up with creative solutions on the fly?
When companies fail at cloud disaster management, it’s often because they fail at imagination. Either they assume disasters are too unpredictable to prepare for, or else they assume everything will go as planned — no matter what befalls them.
Most DR scenarios aren’t all that dramatic — you’re more likely to deal with a service outage than a major earthquake — but that doesn’t mean you can get complacent, or assume you won’t suffer a bigger catastrophe at some point. Organizations that take the time to figure out all the things that could go wrong are much more likely to survive a disaster than those that just purchase a cloud backup service and call it a day.
The State of Cloud Disaster Management
You’re Probably Not Ready For a Disaster
While the data shows that more companies are taking disaster recovery seriously, it’s likely not enough. According to Forrester Research and the Disaster Recovery Journal, 40% of organizations had a formal enterprise risk management program that reports to the board or the C-suite in 2015, and only 19% of organizations lacked any kind of risk management program. A full 77% of organizations had a plan for data and information tampering, and 53% had a customer privacy breach plan. The majority of those tested their information tampering and privacy breach plans at least once a year.
Those numbers are an improvement — but it’s clear that not all companies are taking disaster recovery seriously. 60% of organizations don’t have an enterprise-wide risk management program, and almost half of organizations aren’t ready for a customer privacy breach. And organizations tend to be too confident in the preparations they’ve made — especially around data loss. According to a 2015 survey, although 77% are “at least fairly confident” that they can recover from data loss, 68.8% actually test their DR solutions less than once a year.
The problem with cloud disaster management is that it’s not very important until suddenly, it is. Outages are costly, but most companies can weather a few hours offline without too much of a problem. However, a major incident can destroy an unprepared organization, sometimes literally overnight. That’s what happened to Code Spaces, a company offering services for developers. They were targeted by a hacker who took over their control panel and demanded ransom. When they tried to regain control, the hacker started deleting data at random. By the time they got their system back, the hacker had deleted so much data that they were forced to close.
And Code Spaces did everything right — at least on paper. They replicated their services, they backed up their data. Unfortunately, their backup could be controlled from the same panel as their primary system. Because they hadn’t fully understood the risks that design decision created, they lost everything.
Many Cloud DR Providers Aren’t Helping the Situation
Collecting rent is a good way to make money in IT, and there’s nothing wrong with that. If you take good care of your customers’ data and keep them happy, you’ll be able to make good profit off your investment, save your customers money, and provide better reliability than they’d achieve in-house.
But if you’re providing cloud DR, you need to earn your keep by testing your disaster management procedures, while making sure your team and your customers know what to do when something goes wrong. Many cloud disaster recovery providers simply don’t do that. They may provide reliable hosting, data replication and failover procedures that should work, but they don’t make sure they do work. From the customer perspective, it seems fine — they trust the company to host the software, and it looks like everything is ready to go. But without running actual simulations that test disaster preparedness, they never know what risks they’re taking until it’s too late.
Cloud Disaster Recovery Solutions Are Complex
There’s More to Weathering a Disaster Than Your IT
When people talk about DR, they’re often referring to the core services and infrastructure that replicate data and (in theory) can be used to roll over to a backup service in an emergency. But successful recovery takes more than a copy of your data existing somewhere: it means you can actually get your business up and online, and restore access for your stakeholders, within a defined maximum amount of lost time and data.
That means you need a trained team, able to execute the rollover and connect all your stakeholders. It also takes a lot of planning. You need to examine your business requirements for not just the rollover, but the entire disaster process. Setting recovery time objective (RTO) and recovery point objective (RPO) targets isn’t enough. You also need to look at the Maximum Tolerable Period of Disruption (MTPD): the longest your business can tolerate going from the moment a disaster is declared until normal operations resume.
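One way to make these three targets concrete is to score every DR drill against them. The following is a minimal sketch, not a prescribed method: the class names, field names, target values and timestamps are all hypothetical, and it assumes data loss can be approximated as the gap between the last successful replication and the moment the disaster is declared.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTargets:
    rto: timedelta   # max time to bring the backup service online
    rpo: timedelta   # max window of data the business can afford to lose
    mtpd: timedelta  # max time until normal operations resume


@dataclass
class DrillTimeline:
    last_replicated: datetime    # last point at which data was safely copied
    disaster_declared: datetime
    backup_online: datetime      # rollover finished, backup system reachable
    normal_operations: datetime  # every stakeholder is working again


def evaluate_drill(targets: RecoveryTargets, drill: DrillTimeline) -> dict[str, bool]:
    """Compare one drill's measured times against the agreed targets."""
    # Simplifying assumption: data loss is measured up to the declaration.
    data_lost = drill.disaster_declared - drill.last_replicated
    time_to_backup = drill.backup_online - drill.disaster_declared
    total_disruption = drill.normal_operations - drill.disaster_declared
    return {
        "rpo_met": data_lost <= targets.rpo,
        "rto_met": time_to_backup <= targets.rto,
        "mtpd_met": total_disruption <= targets.mtpd,
    }


# Illustrative values only: the rollover is fast, but stakeholders take
# three days to get back to work.
targets = RecoveryTargets(
    rto=timedelta(hours=4), rpo=timedelta(minutes=15), mtpd=timedelta(hours=48)
)
drill = DrillTimeline(
    last_replicated=datetime(2024, 1, 1, 8, 50),
    disaster_declared=datetime(2024, 1, 1, 9, 0),
    backup_online=datetime(2024, 1, 1, 12, 30),
    normal_operations=datetime(2024, 1, 4, 9, 0),
)
result = evaluate_drill(targets, drill)
```

In this made-up drill the RTO and RPO targets are both met while the MTPD is missed, which is the gap that RTO and RPO targets alone cannot reveal.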
To do that, you need to list out all your stakeholders, and examine what services they need, and how those services interact. Will your DR solution give access to your contractors? Will it automatically hook into your financial services provider? Does your cloud disaster management team have a method to get ahold of everyone and make sure they can successfully log in to the backup system? Otherwise, you could end up with a backup production landscape that no one can use for a week.
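The stakeholder review described above can be sketched as a small dependency check. This is an illustration under stated assumptions, not a real inventory tool: the stakeholder names, service names and dependency map are hypothetical placeholders, and a service counts as usable only if it and everything it depends on exists in the backup landscape.

```python
def is_usable(service, dependencies, restored, _seen=None):
    """A service is usable if it and all of its dependencies are restored."""
    seen = _seen if _seen is not None else set()
    if service not in restored:
        return False
    if service in seen:
        return True  # already being checked elsewhere in this walk
    seen.add(service)
    return all(
        is_usable(dep, dependencies, restored, seen)
        for dep in dependencies.get(service, ())
    )


def blocked_stakeholders(needs, dependencies, restored):
    """Map each stakeholder to the services they need but cannot use."""
    blocked = {}
    for stakeholder, services in needs.items():
        missing = sorted(
            s for s in services if not is_usable(s, dependencies, restored)
        )
        if missing:
            blocked[stakeholder] = missing
    return blocked


# Hypothetical inventory: who needs what, and what each service relies on.
needs = {
    "contractors": {"vpn", "ticketing"},
    "finance": {"erp", "payments_gateway"},
}
dependencies = {
    "erp": {"database"},
    "ticketing": {"sso"},
    "vpn": {"sso"},
}
# Services that actually exist in the backup landscape.
restored = {"erp", "database", "vpn", "sso", "ticketing"}

report = blocked_stakeholders(needs, dependencies, restored)
```

Here `report` flags the finance team, because the hypothetical payments gateway was never hooked into the backup site even though the ERP system itself came up.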
Cloud Disaster Management Needs to Go Beyond IT
IT issues like data center failure and service outages are at the center of DR for a good reason: they’re far and away the most common disasters businesses face. But while having a strong data center solution is key, there are plenty of other scenarios that can inflict catastrophic damage.
Earthquakes, fires, major power outages, super storms, and security breaches all create unpredictable risks for businesses that can’t be addressed by a backup system alone. Your disaster management approach does need to at least run through these scenarios, evaluate the risk to your particular business, and decide whether or not they’re worth addressing, and how to do it. Realistically, you can’t address everything that could possibly go wrong — that’s life — but you can (and should) create a good, risk-based model that mitigates the most serious threats.
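A risk-based model of this kind can start as simply as scoring each scenario by likelihood and impact, then drawing a line under the ones worth addressing. The sketch below assumes a common 1-to-5 scoring scale; the scenario scores and the threshold are placeholders chosen for illustration, not real estimates for any business.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    likelihood: int  # 1 (rare) to 5 (expected every year)
    impact: int      # 1 (minor) to 5 (business-ending)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def prioritize(scenarios, threshold):
    """Return scenarios at or above the threshold, most serious first."""
    ranked = sorted(scenarios, key=lambda s: s.score, reverse=True)
    return [s for s in ranked if s.score >= threshold]


# Placeholder scores: replace with your own assessment.
scenarios = [
    Scenario("service outage", likelihood=5, impact=2),
    Scenario("major earthquake", likelihood=1, impact=5),
    Scenario("security breach", likelihood=3, impact=5),
]
to_address = [s.name for s in prioritize(scenarios, threshold=8)]
```

With these placeholder numbers, the breach and the outage make the cut while the earthquake is consciously accepted as a risk. A score like this is a starting point for discussion rather than a substitute for judgment about your particular business.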
You Need a Complete Disaster Management Program
Sometimes the universe just has it out for you. Your data center goes down while key members of your team are sick with a bad flu. You suffer a power outage in your West Coast office while your East Coast team is struggling through a major storm. Your landscape is attacked while you’re updating your software, or is attacked repeatedly by the same hacker group, using different vectors.
We’re big believers that the right team and the right technology can lead you through a disaster. We use a cutting-edge, next-generation cloud DR solution, and practice regularly to make sure we’re prepared. But we also recognize that things don’t always go as planned. If you’re in cloud disaster management mode, it means something has already gone wrong, and there’s a risk of other things breaking. Having backup plans to your backup plans, and an experienced team who can stay cool, calm, and collected is crucial, because you never know exactly what situation you’re going to face.
When Disaster Strikes
Let’s imagine that a major storm or hurricane hits the region your head office and main data center are located in. If you have warning, you’re lucky — there’s time for some extra preparation. Your data center team can double-check backup generators and other equipment, get extra supplies in place in case your team is stranded inside, and position emergency equipment, such as pumps to mitigate possible flooding in any vulnerable areas.
They’ll need to secure the area to minimize the risks of equipment being knocked loose, and make sure the right staff are on hand or on call. If it’s going to be bad, your team may need to move mission-critical applications to a different facility, or even evacuate the data center in the most extreme cases.
And, of course, they’ll need to ensure they have a communication plan allowing them to coordinate their cloud DR solution with the backup center, even if cell service and connectivity go down. They’ll have to balance the costs and risks of rolling over before the storm hits with those of trying to wait it out.
Things aren’t going to be any easier for your home office. You need to make sure your staff stay safe, and that your business is prepared to minimize downtime during the disaster. You may need to close certain offices and temporarily change the chain of command to keep your business running.
Now imagine it’s a freak storm. It starts off as just an ordinary thunderstorm, and grows rapidly into torrential rain, with massive winds, continuous lightning and flooding throughout the region. There’s no time to batten down the data center or shuffle leadership — you have nothing but your cloud disaster management plan and the preparations you made ahead of time.
The first challenge is simply activating the disaster recovery plan. You need to convene leadership, declare a disaster, and activate the various stakeholders. This can be extremely challenging in a major storm. Cell service may be sporadic or down altogether. You may have team members who are stranded or unaccounted for. You might not be able to access facilities or resources you counted on, such as computer systems or log books with DR contact information. Hope you’ve got extra copies around.
Activating and coordinating the stakeholders gets more challenging as you go. Your communications leadership team needs to coordinate with all the parts of your business, which can include:
- Customer relations
Each of these leaders needs to be able to restore normal business operations for their own teams, from the board all the way down to ground-floor workers and customers. And in many cases, they’ll need to coordinate across teams as well. The board may need to coordinate with network managers to handle communications with key customers. HR, technical leadership and facilities teams may have to work together to ensure workers can get back to their jobs as soon as possible, and work out solutions to keep core processes running in the meantime.
To make it all work, you need to assume the worst. If a key member of the team isn’t available, or a system your DR solution depends on fails, you need a backup plan, no matter what. Furthermore, your cloud disaster recovery plan needs to spell out every phase of the process for everyone involved, from beginning to end. You won’t have time for misunderstandings or mistakes. Your whole DR team needs to work together to get the business running again.
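Assuming the worst can be rehearsed on paper before a drill. The sketch below walks a DR roster and reports who would take over each role if certain people were unreachable, along with any role left with no one at all. The roster layout (role mapped to an ordered list of people, primary first) and every name in it are hypothetical.

```python
def assign_roles(roster, unavailable):
    """Pick the first reachable person for each role, or None if nobody is."""
    return {
        role: next((p for p in people if p not in unavailable), None)
        for role, people in roster.items()
    }


def uncovered_roles(roster, unavailable):
    """List roles for which every listed person is unreachable."""
    assignments = assign_roles(roster, unavailable)
    return sorted(role for role, person in assignments.items() if person is None)


# Hypothetical roster: primary first, then alternates.
roster = {
    "incident commander": ["alice", "bob"],
    "rollover engineer": ["carol"],
    "customer communications": ["dave", "erin", "alice"],
}
# Who is stranded or out sick in this what-if scenario.
unavailable = {"alice", "carol"}

assignments = assign_roles(roster, unavailable)
gaps = uncovered_roles(roster, unavailable)
```

In this what-if, the single-person rollover role is the gap: the plan has no alternate for it, so losing one engineer stalls the whole recovery.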
In-House Cloud DR Solutions Rarely Work
Very large enterprises may have the expertise to plan, prepare and execute their own cloud disaster management strategy, but for most organizations it’s just not a reasonable goal. It’s too difficult to anticipate and realistically evaluate all the risks. It’s too complicated to plan for every possible thing that could go wrong. It’s too expensive to purchase or engineer all the backup systems, and fill all the roles you need to ensure business continuity in a worst-case scenario. And even if you could, without an outside opinion, it’s easy to miss a weakness or a flaw in your plan.
Cloud disaster management isn’t something you should figure out through trial and error. You need a partner with the knowledge, experience and infrastructure to get it right the first time, because you won’t get a second chance. Symmetry is an industry leader in cloud disaster recovery, with expertise that covers every part of your IT system, from application management, to cloud hosting, on down to the networking and hardware level. That means we understand everything that can go wrong in an enterprise landscape — and how to make sure it goes right.