Stuff happens in technology; it just does. How we are affected has as much to do with how we have prepared for the inevitable as it does with what we have done to avoid downtime. So what happens when the lights go out? Try to picture what happens when your enterprise systems go down. I would argue that it is a question of when, not if. What is the impact on your business when your SAP system goes down for an hour? A day? Longer? What happens next will likely depend on your Business Continuity (BC) plan. And at the heart of your BC plan is your tolerance for downtime, which brings us to the question: what are RTO and RPO?
What is an RTO?
We’ve discussed this before, but it’s worth getting into again in a bit more detail. RTO stands for “Recovery Time Objective.” The RTO is the amount of time that you are willing to let elapse before your backup systems come online. RTOs vary widely in the business world. For non-critical systems, RTOs might be measured in hours or even days. For core systems, RTOs could be a matter of seconds.
The RTO depends on how urgently you need the system to recover. In the case of SAP ERP, most businesses want systems to fail over to backup instances relatively quickly. It depends on the business, though. If your company ships containers every three weeks, then a 15-minute RTO probably won’t stress your operation very much. It will allow people to take a coffee break while the IT department scrambles to get the backup SAP landscape going.
Faster-paced, more transaction-intensive businesses generally require shorter RTOs. The most extreme examples are in the financial field, where seconds count. A stock trade or wire transfer that gets delayed by an outage can be a costly proposition. It may also affect the reputation of the firm in question.
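To make the idea concrete, here is a minimal sketch of checking a recovery against an RTO. The helper name `met_rto` and the sample times are illustrative assumptions, not part of any SAP tool.

```python
from datetime import datetime, timedelta

def met_rto(outage_start: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """Return True if service came back within the RTO (hypothetical helper)."""
    downtime = service_restored - outage_start
    return downtime <= rto

# Illustrative values only: an outage at 11:02 with service restored at 11:14.
outage = datetime(2024, 1, 15, 11, 2)
restored = datetime(2024, 1, 15, 11, 14)

# Twelve minutes of downtime fits a 15-minute RTO...
print(met_rto(outage, restored, timedelta(minutes=15)))
# ...but misses a 30-second RTO by a wide margin.
print(met_rto(outage, restored, timedelta(seconds=30)))
```

The same twelve-minute recovery passes for one business and fails for another, which is why the RTO has to be set per system rather than once for the whole company.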
What is an RPO?
The RPO is the “Recovery Point Objective.” It reflects the moment in time that you want to be able to recover when you start up your backup systems. To illustrate, let’s say Mr. Smith buys a product from you at 11:00AM and Mr. Jones buys a product at 11:01AM. Your SAP system records the transactions at their respective times. Then, imagine you have an outage at 11:02AM. How far back do you want to recover? Do you want to be able to recover Mr. Jones’ transaction from one minute earlier or Mr. Smith’s transaction from two minutes prior? Your RPO determines how far back, i.e. the point in time, that you want to be able to recover.
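The Smith and Jones example can be sketched in a few lines. This is an illustration only: the `recoverable` helper and the transaction dictionary are hypothetical, and the recovery point stands for the most recent moment your backup or replica captured.

```python
from datetime import datetime

# Transactions from the example above (the calendar date is arbitrary).
transactions = {
    "Smith": datetime(2024, 1, 15, 11, 0),
    "Jones": datetime(2024, 1, 15, 11, 1),
}

def recoverable(transactions: dict, recovery_point: datetime) -> list:
    """Customers whose transactions were recorded at or before the recovery point."""
    return sorted(name for name, ts in transactions.items() if ts <= recovery_point)

# If the last recoverable point is 11:00, only Mr. Smith's purchase survives.
print(recoverable(transactions, datetime(2024, 1, 15, 11, 0)))  # ['Smith']
# If it is 11:01, both purchases survive.
print(recoverable(transactions, datetime(2024, 1, 15, 11, 1)))  # ['Jones', 'Smith']
```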
Why RTO and RPO matter in Business Continuity
RTO and RPO are important elements of every BC plan. These are the specific metrics that drive your backup and recovery, availability and failover processes. And, as you might imagine, the shorter the RTO and RPO, the higher the cost of implementation. If you want an RTO measured in seconds, for example, you will need to architect accordingly. One common approach is known as a “mirror site.”
A mirror site is an identical copy of your existing SAP landscape running in a separate (hopefully safe) location. If you’re in New York, your mirror could be in Arizona. You can configure your SAP landscape to fail over to the mirror site in seconds, therefore enabling transactions to continue within the defined RTO.
Mirror sites are expensive because they present a two-fold challenge. You have to run them in perfect alignment with the primary system. That’s one operational chore. Worse, though, is that you have to keep the two systems in exact parallel, in administrative terms. If you update software in one, you have to update the other, and so forth. It’s a time and resource-intensive process.
Less costly options with slower RTOs include “warm” backup sites, which need to be booted up and then restored from database backups that have been replicated at periodic intervals during the day. For instance, if you replicate every hour, then you could lose up to one hour of data, so the best RPO you can commit to is one hour. Even cheaper are “cold sites,” where you have to install your SAP landscape from scratch. This might take days. But, depending on your business, it may not matter.
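The link between replication interval and RPO comes down to simple arithmetic. In the sketch below, the helper name `data_loss_window` is hypothetical, and it assumes replication fires on a fixed schedule counted from midnight and completes instantly. Real replication jobs take time to run and can fail, so treat the result as a best case.

```python
from datetime import datetime, timedelta

def data_loss_window(outage: datetime, interval: timedelta) -> timedelta:
    """Time between the last scheduled replication and the outage.

    Assumes replications fire at fixed intervals counted from midnight
    and complete instantly (a simplification).
    """
    midnight = outage.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = outage - midnight
    return elapsed % interval

hourly = timedelta(hours=1)

# Outage at 11:02 with hourly replication: the last copy is from 11:00.
print(data_loss_window(datetime(2024, 1, 15, 11, 2), hourly))   # 0:02:00
# Outage at 11:59: almost a full hour of work is lost.
print(data_loss_window(datetime(2024, 1, 15, 11, 59), hourly))  # 0:59:00
```

The loss window never reaches the full interval under these assumptions, which is why an hourly schedule corresponds to an RPO of up to one hour.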
DR Best Practices
There is a useful body of best-practice knowledge you can apply to your disaster recovery (DR) requirements. Learn from the good practices (and mistakes) of others. The following is a high-level overview. For more detail, we recommend our Disaster Recovery Planning Workbook.
Assign DR Priorities Using a Risk-Based Approach
Most of our clients have a multi-tiered approach to RTO and RPO. For their most critical systems, they invest in fast RTOs and short RPOs. For less essential systems, like file storage volumes, they settle for a less costly but slower RTO. The question that arises, of course, is which assets and threats deserve the highest level of DR investment and which do not?
A risk-based approach enables you to quantify the impact of threats and plan DR accordingly. There is a fair degree of subjectivity in this, but it allows you to differentiate between the likelihood and impact of more and less severe disaster scenarios. Here is a simplified version of the process. For each digital asset (e.g. your SAP ERP or email system):
- Identify the threat, i.e. anything that will disrupt business operations, such as a natural disaster or a cyber attack.
- Estimate the probability of the disaster occurring, i.e. the likelihood of the threat, expressed as a value between 0 and 1.
- Estimate the impact, assigning a rating between 0 and 1 that suggests how significantly the threat/event would affect your business operations.
- Assign a risk rating, the value derived by multiplying the Probability by the Impact of the particular threat.
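The steps above can be sketched as a small calculation. The `Threat` class is a hypothetical helper, and the threats and numbers below are made up purely for illustration; they are not taken from any real assessment.

```python
from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    system: str
    probability: float  # likelihood of the threat, between 0 and 1
    impact: float       # effect on business operations, between 0 and 1

    def __post_init__(self):
        # Reject values outside the 0 to 1 range used by this method.
        for label, value in (("probability", self.probability), ("impact", self.impact)):
            if not 0 <= value <= 1:
                raise ValueError(f"{label} must be between 0 and 1, got {value}")

    @property
    def risk_rating(self) -> float:
        """Risk rating = Probability x Impact."""
        return self.probability * self.impact

# Illustrative entries only.
threats = [
    Threat("Power outage", "File storage", probability=0.2, impact=0.3),
    Threat("Cyber attack", "SAP ERP", probability=0.3, impact=0.9),
]

# Rank from highest to lowest risk to see where DR investment matters most.
ranked = sorted(threats, key=lambda t: t.risk_rating, reverse=True)
for t in ranked:
    print(f"{t.system}: {t.name} -> risk rating {t.risk_rating:.2f}")
```

Sorting by the rating gives a rough priority order. Because the inputs are estimates, the ranking is a starting point for discussion rather than a precise measurement.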
As the table below shows, different systems have different risk ratings based on the impact of an outage. The higher the risk rating, the greater the need for a fast RTO and a short RPO.
| Threat | Affected System | Probability | Impact | Risk Rating | Implications for RTO/RPO |
Selecting Appropriate DR Technologies
After assessing risk ratings, it’s necessary to select technology or tools that provide a means to achieve the RTO, RPO and other aspects of the DR plan. There are many choices available. These include SAP Hypervisor replication, secondary data center backups, database replication, storage replication and so forth. Each technology solution for DR has its pros and cons. An experienced DR partner can help you figure out the optimal approach from business continuity and cost perspectives.
Consider Disaster Recovery-as-a-Service (DRaaS)
DR can be sufficiently complex that it may make sense to outsource it to a provider who can do the whole thing as a service. We can help in this regard. Our Disaster Recovery as a Service is just one part of a complete cloud and IT infrastructure solution. We work with our partners to allocate resources based on:
- RTO and RPO
- Current computing needs
- Future growth
- Special projects
An affordable monthly rate gives you access to the resources you use. At the same time, you benefit from the scalability of DRaaS in our managed cloud. In the event of a disaster, you can recover and scale up quickly, meeting predetermined RTOs and RPOs. This ensures business continuity without the costs of investing in DR infrastructure.