01 Sep Backup and Disaster Recovery (BDR) Planning
Backup and disaster recovery planning should be a core activity of EVERY organization, yet it is often overlooked. The usual rationale is that having systems and data backed up is all that is needed to ensure that an organization is protected. Backing up, however, does not address a number of critically important issues, such as:
- Having timely, reliable access to properly sized IT assets on which to recover,
- The potentially high cost of downtime during recovery (staffing costs, lost revenues, reputational costs),
- The effect on the business from loss of operational continuity during recovery.
Management often chooses not to plan and practice for disaster recovery, believing that a major disaster is so improbable as to be an immaterial consideration. Yet continuous access to current, reliable data, IT assets, and related facilities is now expected and is critical at all times. This dependency should lead any management team that has not addressed disaster recovery planning to seriously challenge that outdated mode of thinking.
Developing and putting in place a BDR plan, and having a ready, scaled site into which data could be recovered in the event of a disaster, is an important form of insurance. It is insurance that an organization hopes never to use but that, when needed, can mean the difference between staying in business and suffering a potentially unrecoverable failure.
The core of a BDR plan is the creation and documentation of processes and procedures to recover and protect organizational IT infrastructure: systems, data, connectivity and security.
A disaster recovery plan (DRP) is documented in written form. It specifies the backup and testing procedures to follow continuously in the normal course of business, as well as the exceptional sequence of steps to follow under the extreme stress of an actual disaster.
There are many benefits that can be obtained from drafting a disaster recovery plan, some of which are listed below:
- Guarantees the reliability of standby systems
- Minimizes need for critical decision-making during a disaster
- Reduces potential legal liabilities
- Provides important assurance to stakeholders that systems can be restored reliably and on a timely basis
- Lowers stress in an inherently stressful circumstance
- Includes thorough testing to ensure that it will work as expected when needed
Disaster recovery plans must include analyses of what systems and data need to be covered, who will be involved (internally and third parties), where the systems and data will be relocated, on what media the backups will be stored, and how often systems and data are actually backed up. Issues such as the following must be addressed:
- Are we backing up all the required data and directories?
- Are we backing up data at the required frequency (mirrored, hourly, daily)?
- Who has access to the backup data, and how do we reach them?
- Have we conducted a full realistic mock disaster?
- Are long term copies required, and if so, at what frequency?
- Is backup data stored locally, on tape, on NAS, in the cloud, or some combination?
- Do we have access to a temporary office location to which we can send key staff?
- Are the applications backed up, or just the data?
- Do we have access to computer assets in short order on which to recover our operation?
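The frequency question in the list above lends itself to an automated check. The following is a minimal sketch in Python, not a definitive implementation: the inventory layout, the data set names, the `MAX_AGE` table, and the `find_stale_backups` function are all hypothetical, and a real plan would read the last backup times from its own backup tooling. Mirrored data is left out because it needs a different kind of check.

```python
from datetime import datetime, timedelta

# Assumed mapping from a frequency label to the maximum acceptable backup age.
# The labels and limits are illustrative; use the ones your plan defines.
MAX_AGE = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
}


def find_stale_backups(inventory, now):
    """Return the sorted names of data sets whose last backup is too old.

    inventory maps a data set name to (frequency_label, last_backup_time).
    """
    stale = []
    for name, (frequency, last_backup) in inventory.items():
        # A backup is stale when its age exceeds the limit for its label.
        if now - last_backup > MAX_AGE[frequency]:
            stale.append(name)
    return sorted(stale)


# Hypothetical example data: one backup on schedule, one overdue.
example_now = datetime(2024, 9, 1, 12, 0)
example_inventory = {
    "orders-db": ("hourly", datetime(2024, 9, 1, 11, 30)),
    "file-share": ("daily", datetime(2024, 8, 30, 23, 0)),
}
print(find_stale_backups(example_inventory, example_now))
```

Running a check like this on a schedule, and alerting on any non-empty result, is one way to turn the question "are we backing up at the required frequency?" into something that is verified rather than assumed.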
A disaster recovery plan must also address how data will be restored, not only how it is backed up. In the unfortunate event of a disaster, organizations will need to know how long it will take to restore all systems and get operations back up and running. Organizations must also decide how long they can afford to be idle.
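A rough first estimate of restoration time can be made with simple arithmetic: the volume of data to restore divided by the sustained restore throughput, plus any fixed setup time. The sketch below assumes hypothetical figures and function names (`estimate_restore_hours`, `within_tolerance`). It ignores factors such as verification, application reconfiguration, and network contention, so the result is a lower bound to be confirmed by an actual test restore.

```python
def estimate_restore_hours(data_gb, throughput_mb_per_s, fixed_overhead_hours=0.0):
    """Estimate hours needed to restore data_gb at a sustained throughput.

    Uses decimal units (1 GB = 1000 MB). fixed_overhead_hours covers setup
    work such as provisioning hardware; it is an assumed input, not measured.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    transfer_seconds = (data_gb * 1000) / throughput_mb_per_s
    return transfer_seconds / 3600 + fixed_overhead_hours


def within_tolerance(estimated_hours, max_idle_hours):
    """Return True if the estimate fits within the tolerable idle time."""
    return estimated_hours <= max_idle_hours


# Hypothetical figures: 2 TB of data, 100 MB/s restore speed, 2 hours of setup.
hours = estimate_restore_hours(2000, 100, fixed_overhead_hours=2.0)
print(f"Estimated restore time: {hours:.1f} hours")
print("Fits an 8-hour tolerance:", within_tolerance(hours, 8))
```

With these assumed figures the transfer alone takes about 5.6 hours, so the total of roughly 7.6 hours only just fits an 8-hour tolerance. A margin that thin is a sign that throughput or data volume should be revisited before a disaster forces the issue.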
An effective disaster recovery plan will minimize business impact when disaster strikes and reliably lead an organization back to its full operating potential in the shortest feasible time frame.
A proper DRP must be more than just a set of well-intentioned, yet random, untested, uncoordinated backups. Unfortunately, the random approach is still by far the most common industry practice, and when disaster strikes, it is too late to take action.
Downtime results in lost revenue, frustrated customers and clients, and, perhaps most importantly, a deeply damaged reputation.