Anyone can purchase or build a DR solution that works in theory, but staying prepared is the hard part. Between turnover, lack of testing, inadequate change management and other difficulties, it’s hard to ensure your DR system will really be ready for a worst-case scenario. Whether you partner with a cloud disaster recovery provider or handle it in-house, here’s what you need to know to stay safe.
Traditional Disaster Recovery Solutions Weren’t Worth Much
In the age of cloud disaster recovery, businesses are finally starting to give DR the attention it deserves, but this is a relatively recent development. Historically, companies have looked at DR as a pricey safeguard that was unlikely to yield ROI. Sure, it’s good to have backups in case everything goes down, but in many cases, that was as far as they would go. Businesses would run routine backups, ship the tapes offsite and cross their fingers. Although the first hot site was built in 1978, DR remained mostly a last resort for decades afterward. If the system went down for a couple of days, you’d eat the loss. If it went down for good, you’d go about the unpleasant business of restoring your landscape.
DR Solutions Weren’t Cost Effective
Until fairly recently, the choice was either to pay a fortune for hot DR or High Availability (HA), or to control costs with cold DR and get very little peace of mind in return. Businesses that needed a very short RTO (Recovery Time Objective: how long it takes to get systems running again after a disaster) and RPO (Recovery Point Objective: how much data, measured in minutes or hours, you lose in a disaster) could achieve them by running a hot DR site, which duplicated their entire landscape offsite. This delivered RTO and RPO of an hour or less, at the cost of doubling the budget for things like servers, networking, electricity, climate control, and maintenance.
HA performed even better by duplicating the landscape onsite. RPO was near instantaneous, and RTO was around 15 minutes, but this performance still came with a lot of costly extra infrastructure. Additionally, although HA can protect against common problems like malfunctioning power supplies and hard drive crashes, it can’t protect against disasters that damage the location. If your data center suffers a flood or a fire, it’s likely to take out your backup system along with your primary one.
Cold DR was really the only option that made sense for most businesses, but it provided very little assurance. In a disaster, you’d have to ship your most recent backup tapes to an offsite location that had only minimal infrastructure in place. You’d have to source and install servers, networking, and other equipment at great cost, then restore everything by hand. RTO was usually about a week, and RPO (which depends on how often backups were taken) could be considerably longer in some cases. And that’s assuming backups had been handled correctly, something many companies didn’t bother to test.
Cloud Disaster Recovery Has Made Things Better
With cloud DR, companies aren’t dependent on maintaining their own dedicated hardware for DR. Instead, they lease extra capacity in a virtualized cloud hosting landscape, which offers much greater flexibility at much lower cost. Companies can now choose from a far wider range of recovery objectives, with either Disaster Recovery as a Service (DRaaS) or in-house solutions.
For example, Symmetry’s Cloud Based Hot DR can provide RPO and RTO of 10 minutes. That’s comparable to HA performance at a much lower price point, and without the vulnerability of keeping your backup landscape in the same building. Our cold DR has an RPO of 24 hours and an RTO of 72 hours, far better than traditional cold DR at a lower cost. And if you do suffer a disaster, you don’t have to fly your team out or pay for all that hardware.
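To see how these numbers line up against your own targets, here is a minimal Python sketch that filters recovery options by a maximum acceptable RPO and RTO. The figures are the ones quoted in this article, treated as illustrative upper bounds (on-site HA’s "near instantaneous" RPO is entered as 0, and "an hour or less" as 60 minutes). The `RecoveryOption` class and `meeting_objectives` function are hypothetical helpers, not part of any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    rpo_minutes: float  # worst-case data loss, measured in minutes
    rto_minutes: float  # worst-case time to get systems running again


# Figures quoted in this article, used here only as illustrative upper bounds.
OPTIONS = [
    RecoveryOption("On-site HA", rpo_minutes=0, rto_minutes=15),
    RecoveryOption("Traditional hot DR site", rpo_minutes=60, rto_minutes=60),
    RecoveryOption("Cloud hot DR", rpo_minutes=10, rto_minutes=10),
    RecoveryOption("Cloud cold DR", rpo_minutes=24 * 60, rto_minutes=72 * 60),
]


def meeting_objectives(options, max_rpo_minutes, max_rto_minutes):
    """Return the names of options whose RPO and RTO both fit within the targets."""
    return [
        option.name
        for option in options
        if option.rpo_minutes <= max_rpo_minutes
        and option.rto_minutes <= max_rto_minutes
    ]


print(meeting_objectives(OPTIONS, max_rpo_minutes=30, max_rto_minutes=30))
# prints ['On-site HA', 'Cloud hot DR']
```

The filter only compares numbers. It does not capture the fact that on-site HA can’t survive a flood or fire that damages the building, so that trade-off still has to be weighed separately.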
Cloud disaster recovery, and DRaaS in particular, also has fewer unknowns. Cloud hosting is robust and redundant, with significantly higher uptime and better reliability than most onsite landscapes. A good cloud-based DR provider will have an automated infrastructure able to quickly provision new compute, and a highly trained around-the-clock team, prepared for worst-case scenarios in a way few onsite teams can match.
Companies Have Developed a More Mature Approach to Risk
It’s not just cloud disaster recovery that’s changing things. A range of changes in the business environment are making companies take DR and business continuity more seriously, and develop an approach that seeks to assess and minimize risk. Increasing compliance requirements are just one part of the story.
After the September 11th attacks, the SEC, the Federal Reserve, and the Office of the Comptroller of the Currency issued new guidelines for key financial institutions, mandating geographically dispersed backup sites relying on different infrastructure, clearly defined RPO and RTO goals, and regular testing. Other rule changes have increased the accountability of C-level executives for DR planning and imposed more detailed requirements for DR and business continuity. HIPAA is arguably even stricter, requiring detailed, stage-by-stage DR, including an emergency mode operation plan that allows medical providers to access medical data and keep it secure during a disaster.
Companies have also become more informed about risk. There’s better data allowing companies to estimate the reliability of IT solutions, as well as the costs, likelihood and consequences of downtime. Instead of a last resort for a worst-case scenario, DR is now a set of controls for a range of risks, from the rare (a major natural disaster) to the inevitable (an outage or loss of connectivity in some part of the enterprise). More and more, businesses are seeing cloud disaster recovery as an investment in their future, with clearly defined ROI.
Many Companies Still Struggle to Develop Strategic Cloud DR
It’s always easier to build and harness new technologies than to understand and prepare for their risks. And with the rate of technological change, this is truer than ever. An IT landscape is a complex web of interconnected systems, dependent on countless other systems to connect to customers, remote workers, business partners, and other stakeholders.
Accounting for all the things that could break, calculating the risks and consequences of each, and ensuring that your solution addresses those risks is incredibly challenging. The human element of DR can be even harder. Preparing for everyone to fulfil their role, creating contingency plans, interfacing with partners so that your backup plan meshes with theirs, and countless other challenges pose unknown risks for companies that have never suffered a major disaster. But ensuring your cloud disaster recovery plan stays current and effective may be the most challenging aspect of all.
How Companies Fail at Disaster Recovery
- Companies Fail to Set the Right Priorities: DR is a system to protect you against outages and other events that threaten business operations. To make that system effective, you need to put a lot of work into figuring out exactly what you’re trying to protect your business from, and how best to protect against it. When companies treat cloud disaster recovery as merely a type of infrastructure investment (something they buy instead of something they build and maintain), they risk setting the wrong priorities or missing key details needed to make cloud DR work.
That’s why you should perform a Risk Level Assessment to identify and prioritize events that could harm business operations before investing in cloud disaster recovery. These events can run the gamut from simple hardware failures, to natural disasters like hurricanes, to manmade disasters such as cyberattacks or fires.
For each type of disaster, score both its probability and its impact on a scale from 0 to 1, then multiply the two to get a risk rating. Next, create a recommended action for each, such as mitigating the risk or planning how to get back online if the disaster occurs. You may have to accept certain risks, for example those that are extremely unlikely or have a minimal impact on operations.
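Here is a minimal Python sketch of that risk rating, assuming both probability and impact are scored from 0 to 1. The thresholds that decide whether to accept a risk, plan recovery for it, or also mitigate it are hypothetical defaults, and the sample events and scores are purely illustrative; substitute the results of your own assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    event: str
    probability: float  # likelihood, scored from 0 to 1
    impact: float       # business impact, scored from 0 to 1

    def rating(self) -> float:
        """Risk rating = probability x impact, which also falls between 0 and 1."""
        for value in (self.probability, self.impact):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{self.event}: scores must be between 0 and 1")
        return self.probability * self.impact


def recommended_action(risk: Risk, accept_below: float = 0.05,
                       mitigate_at: float = 0.25) -> str:
    """Map a rating to an action; both thresholds are hypothetical defaults."""
    score = risk.rating()
    if score < accept_below:
        return "accept"
    if score >= mitigate_at:
        return "mitigate and plan recovery"
    return "plan recovery"


# Illustrative scores only.
risks = [
    Risk("Disk failure", probability=0.6, impact=0.2),
    Risk("Ransomware attack", probability=0.4, impact=0.9),
    Risk("Regional hurricane", probability=0.1, impact=0.9),
    Risk("Single laptop theft", probability=0.3, impact=0.05),
]
for r in sorted(risks, key=Risk.rating, reverse=True):
    print(f"{r.event:20} {r.rating():.3f}  {recommended_action(r)}")
```

Sorting by rating gives you a prioritized list to carry into the Business Impact Analysis; the low-rated laptop theft falls into the "accept" bucket, matching the kind of minimal-impact risk described above.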
Next, do a Business Impact Analysis for every risk you’ve decided to address. This starts by inventorying each piece in your landscape, and identifying its owners, users, function and components. From there, you can get down to the actual costs of downtime for each system. You’ll need to look at the full range of costs, including lost sales and productivity, breach of contract, fines, and restoration costs. Check out our Disaster Recovery Planning Workbook for more info about the process, and worksheets you can use to plan your recovery.
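To make the cost side of a Business Impact Analysis concrete, here is a rough Python sketch. It assumes downtime costs split into hourly costs (lost sales plus staff who can’t work) and one-time costs (fines, contract penalties, restoration). The `SystemImpact` class and every figure in it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SystemImpact:
    system: str
    owner: str
    lost_sales_per_hour: float
    idle_staff: int             # users who can't work while the system is down
    cost_per_staff_hour: float  # fully loaded hourly labor cost
    one_time_costs: dict = field(default_factory=dict)  # fines, penalties, restoration

    def hourly_cost(self) -> float:
        return self.lost_sales_per_hour + self.idle_staff * self.cost_per_staff_hour

    def downtime_cost(self, hours: float) -> float:
        """Estimated total cost of an outage lasting `hours`."""
        return self.hourly_cost() * hours + sum(self.one_time_costs.values())


# Hypothetical figures for illustration only.
erp = SystemImpact(
    system="ERP",
    owner="Finance",
    lost_sales_per_hour=5_000,
    idle_staff=40,
    cost_per_staff_hour=50,
    one_time_costs={"contract penalties": 20_000, "restoration": 15_000},
)
print(f"24-hour outage: ${erp.downtime_cost(24):,.0f}")  # prints 24-hour outage: $203,000
```

This is a simplification: in practice, fines and contract penalties often grow with the length of the outage, so a real analysis may need a cost curve rather than a flat one-time amount.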
- Companies Fail at Technical Implementation: A cloud disaster recovery implementation is like any other big IT project; it requires testing and refinement to ensure everything works properly. On a technical level, you and your provider need to make sure no mission-critical systems are left out, and that all stakeholders have the access they need once the DR process is executed. That requires your team to follow the same iterative process you see in any technical project implementation, refining DR at each step, until you’re testing the whole system in a way that simulates real-world conditions as closely as possible.
Although most cloud DR providers will test individual pieces of the landscape, many won’t execute a full test failover. This can cause them to miss problems, especially those that involve connectivity to external systems. Your business needs to be able to connect to your bank, payment processors, SaaS providers, and other stakeholders who aren’t technically part of your core landscape. Without comprehensive testing, a disaster could leave you with a backup system that can’t connect to the outside world, bringing your company to a standstill.
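One small piece of a full test failover is confirming that the DR environment can actually reach the external systems your business depends on. The Python sketch below is meant to be run from a host inside the DR environment; it only checks TCP reachability using the standard library’s `socket.create_connection`. The endpoint names and hostnames are hypothetical placeholders (the `.example` domain is reserved for documentation).

```python
import socket

# Hypothetical external dependencies; replace with the hosts your DR site must reach.
ENDPOINTS = {
    "bank file transfer": ("sftp.bank.example", 22),
    "payment processor API": ("api.payments.example", 443),
    "SaaS CRM": ("crm.saas.example", 443),
}


def check_endpoints(endpoints, timeout=3.0):
    """Attempt a TCP connection to each endpoint.

    Returns {name: None} for endpoints that accepted a connection and
    {name: error message} for those that failed (DNS, refusal, timeout).
    """
    results = {}
    for name, (host, port) in endpoints.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = None
        except OSError as exc:  # covers DNS errors, refusals and timeouts
            results[name] = str(exc) or type(exc).__name__
    return results
```

During a failover exercise you might call `check_endpoints(ENDPOINTS)` and flag every non-`None` entry. A successful TCP connection doesn’t prove an integration works end to end (credentials, certificates, IP allowlists and partner-side configuration can still fail), so treat this as a first smoke test rather than a substitute for testing each integration.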
- Companies Fail at Preparation: If you have a team member call in sick, you can usually cope. You might have to make a few calls, look up some information, or have someone stay late, but you’ll be able to figure out an ad hoc strategy to get the job done.
A disaster is very different. Your team has to work together under adverse conditions and extreme stress. That means everyone needs to know exactly what their job is when you declare a disaster. Unfortunately, many companies assume having a list of responsibilities is enough, and that everyone will just be able to look up the DR process and do their job.
That’s just not the case. Depending on the type of emergency, you may be missing several team members, have no power or damaged infrastructure, or even be in physical danger. It’s crucial to have multiple rounds of testing, until everyone knows the cloud disaster recovery process inside and out. You should also have contingency plans for each step, including alternate members for your DR team.
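Alternates are easiest to rehearse when they are written down explicitly. Here is a minimal Python sketch, assuming each DR role has an ordered list of candidates (primary first) and that one person covers only one role during a disaster. The roster, names and helper functions are hypothetical.

```python
# Hypothetical roster: each DR role lists its primary first, then ordered alternates.
ROSTER = {
    "Incident commander": ["Alice", "Bob", "Chen"],
    "Database recovery": ["Dana", "Eli"],
    "Communications": ["Farah", "Alice"],
}


def assign_roles(roster, available):
    """Give each role the first available candidate not already assigned elsewhere."""
    assignments, assigned = {}, set()
    for role, candidates in roster.items():
        person = next((p for p in candidates if p in available and p not in assigned), None)
        assignments[role] = person
        if person is not None:
            assigned.add(person)
    return assignments


def uncovered_roles(assignments):
    """Roles with nobody available: these need escalation or a deeper bench."""
    return [role for role, person in assignments.items() if person is None]


# Scenario: Bob, Dana and Farah can't be reached after the disaster.
plan = assign_roles(ROSTER, available={"Alice", "Chen", "Eli"})
print(plan)
print(uncovered_roles(plan))  # prints ['Communications']
```

Note the limitation this scenario exposes: a simple first-come pass leaves Communications empty even though Chen could lead and free Alice for communications. Running scenarios like this in advance is exactly how you discover that the roster needs more depth or a clearer fallback order.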
- Companies Fail at Change Management: In ERP, a change to one part of the landscape can affect everything else. That’s why installing an SAP update is so complicated: you need to test it and configure it so that it doesn’t create an adverse impact on other parts of your system. Unfortunately, many companies have a “set it and forget it” approach to DR. They don’t make sure their disaster recovery cloud is adjusted to match changes to their landscape. You either need to track these changes internally, or else partner with someone who can.
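A simple way to catch this kind of drift is to compare an inventory of production components and versions with the same inventory taken from the DR landscape. The Python sketch below assumes you can export both as name-to-version mappings (for example from a configuration management database); the component names and versions are hypothetical.

```python
def find_drift(production: dict, dr: dict) -> dict:
    """Compare {component: version} inventories from production and DR."""
    shared = production.keys() & dr.keys()
    return {
        # In production but never set up in DR: a recovery would leave a gap.
        "missing_in_dr": sorted(production.keys() - dr.keys()),
        # Present in both, but DR runs a different version or configuration.
        "version_mismatch": sorted(c for c in shared if production[c] != dr[c]),
        # Still in DR but retired from production: wasted cost or confusion.
        "only_in_dr": sorted(dr.keys() - production.keys()),
    }


# Hypothetical inventories, e.g. exported from a configuration database.
production = {"erp-app": "2024.2", "database": "19.21", "payment-interface": "v3"}
dr_site = {"erp-app": "2024.1", "database": "19.21", "legacy-ftp": "v1"}

for issue, components in find_drift(production, dr_site).items():
    if components:
        print(f"{issue}: {', '.join(components)}")
```

Running a check like this after every production change (or on a schedule) turns "set it and forget it" into a concrete list of items the DR landscape needs to catch up on.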
- Companies Fail at Follow Up: Change management needs to be part of a larger strategy of following up and maintaining your disaster recovery solution. You need regular testing, and coordination between DR and HR. When a member of your DR team leaves or is reassigned, or a new member joins your leadership team, you need to adjust your plan, perform retraining, and make sure everyone has the right contact info. Then, you may need to run the whole team through another disaster simulation to make sure the new member understands their job, and can work with the group under pressure.
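The follow-up steps above can be sketched as a small Python check that compares the current DR team with the team that took part in the last full simulation and flags anyone whose training is stale. The 180-day retraining interval is a hypothetical policy, not a standard, and the names and dates are illustrative.

```python
from datetime import date, timedelta


def follow_up_actions(current_team, last_sim_team, last_drill, today,
                      max_drill_age=timedelta(days=180)):
    """List follow-up tasks after DR team changes.

    current_team / last_sim_team: names on the team now vs. at the last full simulation.
    last_drill: {name: date of that person's last DR drill}; absent means never drilled.
    max_drill_age: hypothetical retraining interval.
    """
    joined = sorted(set(current_team) - set(last_sim_team))
    left = sorted(set(last_sim_team) - set(current_team))
    overdue = sorted(
        name for name in current_team
        if last_drill.get(name) is None or today - last_drill[name] > max_drill_age
    )
    team_changed = bool(joined or left)

    actions = []
    if team_changed:
        actions.append(
            f"Update DR plan and contact list (joined: {', '.join(joined) or 'none'}; "
            f"left: {', '.join(left) or 'none'})"
        )
    if overdue:
        actions.append(f"Retrain: {', '.join(overdue)}")
    if team_changed:
        actions.append("Schedule a full-team disaster simulation")
    return actions


for action in follow_up_actions(
    current_team=["Alice", "Bob", "Gus"],
    last_sim_team=["Alice", "Bob", "Dana"],
    last_drill={"Alice": date(2024, 1, 10), "Bob": date(2023, 3, 1)},
    today=date(2024, 3, 1),
):
    print(action)
```

Here Dana has left and Gus has joined, so the plan and contact list need updating, Gus (never drilled) and Bob (last drilled a year ago) need retraining, and the whole team should go through another simulation.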
You really need the right partner. For all but the largest companies, it’s not cost effective to do DR internally. Just doing a Risk Level Assessment and Business Impact Analysis will require specialized skills you probably don’t have in-house, and things get far more complex once you start setting up your system. Partnering with a cloud disaster recovery provider will give your business far better protection and quicker recovery at lower cost, no matter how many team members come or go, or how much your business changes.