Flipping through the news channels will show you that disaster can strike at any time. Extreme weather events, earthquakes, fires, large-scale power failures – these and other threatening scenarios can disrupt your IT along with your business. The field of Disaster Recovery (DR) has evolved to meet the challenge of ensuring quick recovery and continued operations in the event of a disaster. Now, the cloud has changed the nature and potential of DR. New advances in cloud disaster recovery require new practices. In addition, wholly new approaches like availability zones potentially upend the entire thought process on how best to protect a business from a disaster.
The evolution of disaster recovery in the cloud era
Traditional Information Security (InfoSec) thinking about disaster recovery consisted of three (or four) standard approaches to keeping IT functioning in the event of an outage. From an InfoSec perspective, the outage could be caused by anything. What mattered was the ability to bring applications and data back online in a timeframe that met the organisation’s expectations.
The three basic approaches were the ‘cold site’, the ‘warm site’ and the ‘hot site’. A cold site was simply a place, separate from the business’s location, where computer hardware and software sat on a shelf. If disaster struck, you could go to the cold site, set up the equipment, install the software, load backed-up data from tapes and make IT services available to users. In contrast, the warm site had the equipment and software installed and ready to go, but not running. The hot site had everything up and running as an exact replica of your live systems, so downtime before services were restored would be minimal.
Of the three, the hot site supported the shortest Recovery Time Objective (RTO), the target time within which a service must be restored. A fourth option was known as the ‘mirror site’, which operated in parallel with the main instance of whatever software and data it was protecting. It might be a fraction of a second behind, but it was essentially an identical twin of the IT infrastructure. You could fail over to a mirror site in seconds, ensuring nearly uninterrupted service. The mirror site, of course, was the costliest of the options, followed by hot, warm and cold in descending order of expense. The most urgent IT workloads were allocated to mirror or hot sites.
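The trade-off between the four site types can be sketched in a few lines of Python. This is only an illustration: the cost ordering (cold, warm, hot, mirror) comes from the description above, while the recovery times, the `DrTier` class and the `cheapest_tier_for` function are hypothetical placeholders rather than industry figures or an established API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrTier:
    name: str
    typical_recovery_hours: float  # hypothetical value, for illustration only
    relative_cost: int             # 1 = cheapest; only the ordering is from the text


# Ordered from cheapest to costliest, as described above.
TIERS = [
    DrTier("cold", 72.0, 1),
    DrTier("warm", 24.0, 2),
    DrTier("hot", 1.0, 3),
    DrTier("mirror", 0.01, 4),
]


def cheapest_tier_for(rto_hours: float) -> Optional[DrTier]:
    """Return the least expensive tier whose typical recovery time meets the RTO."""
    candidates = [t for t in TIERS if t.typical_recovery_hours <= rto_hours]
    if not candidates:
        return None  # no tier in this model can meet such a tight RTO
    return min(candidates, key=lambda t: t.relative_cost)


if __name__ == "__main__":
    for rto in (100.0, 48.0, 0.5):
        tier = cheapest_tier_for(rto)
        print(rto, tier.name if tier else "none")
```

In practice the recovery time for each tier would come from your own DR tests, not from a fixed table like this one.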
The cloud has transformed the traditional cold/warm/hot/mirror InfoSec DR paradigm. With essentially limitless compute and storage that can be provisioned rapidly, a public cloud provider can bring the cost of the previously high-ticket mirror site right down. It becomes far easier and less expensive to run twin versions of core applications in multiple locations – in parallel.
Even for less complex DR tasks, the cloud is a game changer. Data backup and replication, for example, are far simpler and less expensive in the cloud than they are in traditional on-premises DR architectures. These advances, however, do not change the fundamentals of DR. If anything, faster, cheaper cloud DR options have put more pressure on DR managers to determine which workloads deserve new, higher levels of protection and which do not.
Cloud disaster recovery strategy
Disaster recovery strategy does not change dramatically with the addition of the cloud, but there are new factors to be considered. Your cloud disaster recovery strategy must still be business-orientated, reflecting the priorities of the business and its operating model. For example, if SAP ERP requires the shortest RTO, then that DR strategy goal will remain unchanged even if the DR is taking place in the cloud.
Cloud DR strategy should, like its earlier versions, align with the broader business continuity plan. After all, IT is just one element of a business. When there is a disaster, a business must be able to function on all levels, not just in terms of IT. This includes people, facilities, record-keeping, supply chain, logistics and so forth. The business continuity plan will take all of these elements into account.
The new universe of cyber threats has also changed the DR equation. A cyber attack presents a new kind of disaster, one that is arguably more challenging to deal with than a natural event. While a hurricane might destroy a data centre, high winds won’t corrupt your data or hold it to ransom. The cloud disaster recovery strategy has to encompass security countermeasures that keep backup systems and data free from malware and protected against cyber threats.
Cloud availability zones
The cloud has created a new option for DR known as the cloud availability zone. Looking at it, you can see some familiar components of traditional DR. Yet, it’s also quite novel and innovative.
What is an availability zone?
An availability zone is a logical grouping of resources that is physically separated from other such groupings within a cloud region. In AWS’s case, each zone consists of one or more discrete data centres, and every region contains several zones. In the public cloud, distributing a set of resources, typically instances, across availability zones gives them a higher level of availability, which usually translates into a higher SLA from the provider. If there is an outage in one availability zone, the resources in the other zones are likely not to be affected – this is the primary goal of the availability zone.
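A rough way to see why physical separation raises availability is to calculate the chance that at least one of several separate locations is up. The sketch below rests on a strong assumption that is not made in the text above: that the locations fail independently, and that the application can actually run from whichever one survives. The function names and the 99% figure are illustrative, not provider numbers.

```python
def combined_availability(per_zone: float, zones: int) -> float:
    """Probability that at least one zone is up, ASSUMING independent failures."""
    if not 0.0 <= per_zone <= 1.0:
        raise ValueError("per_zone must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # All zones are down at once with probability (1 - per_zone) ** zones.
    return 1.0 - (1.0 - per_zone) ** zones


def downtime_hours_per_year(availability: float) -> float:
    """Expected unavailable hours in a 365-day year."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = combined_availability(0.99, n)  # 0.99 is an illustrative figure
        print(n, round(a, 6), round(downtime_hours_per_year(a), 3))
```

Real outages are often correlated (a regional network fault, a bad deployment), so a figure computed this way is best read as an upper bound, not a promise.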
If you distribute your instances across multiple availability zones and one instance fails, you can design your application so that an instance in another availability zone can handle requests. It is important to note that the secondary instances are not clones of the primary instances. It is up to the application to resume operation on the secondary instances, and ensure that connectivity to the app is properly architected so that processes and users can access it.
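The failover idea described above can be sketched as a small routing helper. Everything here is hypothetical: the zone names, the `pick_zone` function and the health-check callback stand in for whatever load balancer or health probe your platform provides. The sketch only chooses where to send a request; resuming application state on the secondary instance remains the application's job, as noted above.

```python
from typing import Callable, Optional, Sequence


def pick_zone(zones: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first zone, in preference order, that reports healthy."""
    for zone in zones:
        if is_healthy(zone):
            return zone
    return None  # every zone is unavailable; the caller must handle this case


if __name__ == "__main__":
    # Hypothetical health snapshot: the primary zone is down.
    health = {"zone-a": False, "zone-b": True}
    target = pick_zone(["zone-a", "zone-b"], lambda z: health.get(z, False))
    print(target)
```

Treating an unknown zone as unhealthy (the `False` default) is a deliberately cautious choice: it avoids routing traffic to a zone whose state has never been checked.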
Keeping SLAs and costs in focus
Setting up a DR availability zone through a cloud provider is not quite as simple as it looks. In working with companies that are building availability zones for their SAP landscapes, we have found that it’s essential to watch SLAs carefully. While most cloud platforms provide good performance, geographical distance and other factors such as the quality of compute resources and networks can negatively affect SLAs. Testing is critical to ensuring that you’re staying within your SLAs in an availability zone.
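One simple way to act on DR test results is to compare the measured recovery times with the target agreed with the business. The sketch below assumes you have recorded recovery durations in minutes from a series of failover tests; the function name, the report fields and the sample figures are hypothetical.

```python
from typing import Dict, Sequence, Union


def rto_test_report(
    measured_minutes: Sequence[float], rto_minutes: float
) -> Dict[str, Union[float, int, bool]]:
    """Summarise DR test runs against a recovery time target."""
    if not measured_minutes:
        raise ValueError("at least one test measurement is required")
    breaches = [m for m in measured_minutes if m > rto_minutes]
    return {
        "worst_minutes": max(measured_minutes),  # slowest recovery observed
        "breaches": len(breaches),               # runs that exceeded the target
        "within_rto": not breaches,              # True only if every run passed
    }


if __name__ == "__main__":
    # Hypothetical results from three failover tests against a 30-minute target.
    print(rto_test_report([12.0, 18.5, 31.0], rto_minutes=30.0))
```

Judging by the worst run instead of the average is intentional here: an SLA is usually breached by the single slow recovery, not by the typical one.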
Cost is an issue. Cloud resources are generally very economical, but the depth of compute and storage you need to duplicate an SAP landscape may result in a higher cloud bill than you were expecting. You will also have to arrange licensing of software in the availability zone.
The hybrid cloud complicates things, too. Few organisations run SAP entirely in the cloud yet. The majority of organisations that are running SAP in the cloud are doing so on a hybrid basis, mixing on-premises instances with cloud. To make cloud availability zones effective, you will have to replicate the on-premises instances in a separate cloud deployment. It takes a lot of work and DR testing to get it right.
Selecting a DR partner for SAP
We have extensive experience working with companies on implementing cloud disaster recovery programmes. We understand the nuances of availability zones and recommend them when they best fit your overall business and DR strategy.