Automating DP, DR Creates Resilient Infrastructure
By Suresh Nair, VP and GM of sales and marketing for Asia Pacific, FalconStor Software
Data lies at the center of every company and is the most valuable asset of any business. Companies need, and want, to know that their data is fully secure and protected. If a company cannot access its data, employees are left twiddling their thumbs, productivity declines and the business grinds to a halt until the data is brought back online.
Protecting this data and ensuring that it is accessible 24 hours a day, seven days a week, year-round is a critical task for IT and data center managers. However, companies have to be concerned about more than just the data. They need to look at the applications, systems and servers in which the data resides. IT managers need to look at their data centers from the perspective of services. In short, companies must protect their entire infrastructure, not just the data that is associated with it.
Most people think of disasters as hurricanes, tsunamis or power grids failing. Yet the majority of data center disasters are not related to natural causes. It is human error and malicious acts that cause these incidents. Something as simple as someone missing their morning coffee can lead to flipping the wrong switch, which eventually cascades to the point that power is lost along the Eastern seaboard. Alternatively, a disaster could affect an individual rather than an entire region or company. For instance, an executive might update a PowerPoint presentation and press the “save” button rather than “save as,” overwriting the original work. This is when the ability to press a button and get that original document back proves its worth.
IT faces a challenge in providing these data protection and disaster recovery (DR) services for an entire infrastructure. The amount of data that needs protection is growing. The IT infrastructure is expanding in complexity with virtual and physical servers, cloud-based and hosted applications and network connectivity. IT is under pressure to handle all of these tasks, protect each service and do so with limited budgets and staff. The company’s executives understand these services are critical to their business operations, but are often unwilling to allocate the right budget to the IT department to protect them.
Disaster recovery is IT’s insurance policy
What executives need to understand is that data protection and DR plans are insurance policies. These policies do nothing to enhance the business’s bottom line, but they are a necessary function. It is similar to how houses have smoke alarms. These are an insurance policy to protect individuals and their families if a disaster were to occur. However, what happens if homeowners fail to change the batteries, and then an event occurs? Those individuals would wish they had spent the money up front on the new battery rather than facing the loss of their house and belongings. The same holds true for the data center. The IT manager needs the support – financial and otherwise – to implement the disaster recovery services needed to ensure a company remains operational after an event.
Every minute spent trying to recover data or get service up and running again is time not spent on business operations. Most companies state that they can only afford downtime of four hours or less at one time – and given the chart below on lost revenue based on publicly available data, it is understandable why this is the case. This data provides a glimpse into a baseline outage cost. Protecting data means nothing if it can’t be recovered. One of the greatest data center challenges today is ensuring a smooth recovery of operations after downtime. Downtime can be caused by data loss or corruption, equipment failure or a complete site outage after a loss of power or a natural disaster.
Data courtesy of Storage Area Network for Dummies by Chris Poelker and Alex Nikitin
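To make the four-hour tolerance concrete, the baseline cost of an outage can be approximated as hourly revenue multiplied by the length of the outage. The short Python sketch below illustrates that arithmetic only. The hourly revenue figure is a hypothetical placeholder, not a value from the chart, and the calculation assumes revenue accrues evenly over time and ignores reputational and recovery costs.

```python
def outage_cost(revenue_per_hour: float, downtime_minutes: float) -> float:
    """Baseline revenue lost during an outage.

    Assumes revenue accrues evenly over time; real losses also include
    productivity, reputation and recovery costs not modeled here.
    """
    if revenue_per_hour < 0 or downtime_minutes < 0:
        raise ValueError("inputs must be non-negative")
    return revenue_per_hour * downtime_minutes / 60.0


# Hypothetical figure for illustration only; substitute your own numbers.
HOURLY_REVENUE = 100_000.0

# Compare a recovery measured in minutes with the four-hour limit.
for minutes in (15, 60, 240):
    cost = outage_cost(HOURLY_REVENUE, minutes)
    print(f"{minutes:>3} min outage: ${cost:,.0f}")
```

Even with a rough input, the comparison shows why shortening recovery from hours to minutes matters to the business.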
In fact, a recent IDG Research Services report, “Quick Poll: Disaster Recovery Trends and Metrics,” shows the consequences of data loss within a company (see chart below). Loss of productivity is the biggest consequence, followed closely by harm to the company’s reputation, financial losses, loss of sensitive data and the cost of procuring technical assistance.
The way DR is currently handled within the data center does not work with the complexity of today’s infrastructure. As virtualization has changed the inherent structure of the data center, the way data and infrastructure are protected must change too. Single-point protection solutions such as replicating data to a single server, virtual tape libraries, physical tape or image-based backup do not work in these situations. These solutions also fail to take into account the services and the application infrastructure upon which this data resides. Companies that use these solutions have a DR plan that is only as good as the last backup or image that was taken, and they still need a plan to restore the applications and services themselves. These traditional backup methods consist of anywhere from one to hundreds of steps, each of which requires rebooting the servers, applications and infrastructure once it completes. If one step is missed along the way, which happens frequently due to simple human error, a complete restore can take twice as long.
What is automated disaster recovery?
With automated DR solutions, the IT manager’s job is simplified. Automated DR takes the complex steps required by traditional recovery methods and automates them. It understands the specific order, processes and procedures involved and applies them to the recovery. It eliminates failure and returns the company to normal operations within minutes rather than hours or days. Using automated DR is similar to buying a car. The car owner does not need to read the manual to understand how the engine works, only how to press a button or turn a key to start the car and drive it off the lot.
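The idea of a tool that knows the correct order of recovery steps can be illustrated with a small dependency graph. The Python sketch below is not how any particular DR product works. The step names and the `run_recovery` helper are hypothetical, and it assumes Python 3.9 or later for the standard `graphlib` module. It derives a valid order from declared dependencies and stops at the first failed step instead of continuing blindly.

```python
from graphlib import TopologicalSorter

# Hypothetical recovery plan: each step maps to the steps it depends on.
RECOVERY_PLAN = {
    "restore_storage": set(),
    "start_network": set(),
    "start_database": {"restore_storage", "start_network"},
    "start_app_server": {"start_database"},
    "verify_service": {"start_app_server"},
}


def recovery_order(plan):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(plan).static_order())


def run_recovery(plan, actions):
    """Run each step in dependency order; stop at the first failure."""
    completed = []
    for step in recovery_order(plan):
        if not actions[step]():
            raise RuntimeError(f"step {step!r} failed after {completed}")
        completed.append(step)
    return completed


# Stand-in actions that always succeed; real ones would call the
# storage, network and application tooling.
demo_actions = {step: (lambda: True) for step in RECOVERY_PLAN}
print(run_recovery(RECOVERY_PLAN, demo_actions))
```

Because the order is computed from the declared dependencies rather than remembered by an operator, a step cannot be skipped or run too early, which is the class of human error described above.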
Why do companies need automated DR? If you have ever watched the show “M*A*S*H,” you’ve seen the scene when the bomb goes off and the wounded start to arrive. Soldiers are brought into tents, and the nurses begin triage. It is similar when a server goes down. IT staff quickly assemble to begin triage and solve the problems. Maybe there is a manual with steps on how to recover the server, but something still goes horribly wrong, and there are four people with Cs and Vs in their titles hanging over the shoulder of the IT manager. With automated DR, the IT manager can go straight to the system, right-click on an icon and start restoration of the server. It removes the human element from the process. Automated DR is the insurance policy that IT invested in for events of all sizes and shapes.
However, automated DR does not take the place of the IT manager. The IT manager still needs to configure and continually test the disaster recovery plan. The IT manager is needed to create the plan and decide if and where the hot, warm and cold disaster recovery sites are located. For instance, a hot site, where all the data is continually replicated and running, can be located in the same data center as a failover location in the event of a server failure or lost file. A warm site that has some systems running on standby and receives the data perhaps once a day can be located in the same city. A cold site is a site that requires a physical start, but, being located across the country, has less of a chance of being affected by a large natural disaster.
Think back to 9/11. Many of the companies in Tower One replicated their data to Tower Two and thought this was an excellent disaster recovery plan. However, several of these firms ended up going out of business because important data was lost and never recovered. Other events such as Northeast snowstorms, hurricanes in the Southern U.S. and fires in the West are good examples of why geographically separated cold or warm sites are critical.
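The relationship between site tiers and the scope of an event can be written down as a simple selection rule. The Python sketch below is an illustration under stated assumptions. The site names, the three event scopes and the rule itself (use the nearest site that lies outside the affected area) are hypothetical simplifications, not a description of any product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Scope(IntEnum):
    """How far an event reaches (assumed categories)."""
    SERVER = 0    # single server failure or lost file
    BUILDING = 1  # the whole data center is unavailable
    REGION = 2    # large natural disaster


class Distance(IntEnum):
    """How far a DR site is from production."""
    SAME_DATA_CENTER = 0
    SAME_CITY = 1
    CROSS_COUNTRY = 2


@dataclass(frozen=True)
class DRSite:
    name: str
    tier: str  # "hot", "warm" or "cold"
    distance: Distance


# Hypothetical sites mirroring the hot, warm and cold examples above.
SITES = [
    DRSite("failover-rack", "hot", Distance.SAME_DATA_CENTER),
    DRSite("metro-standby", "warm", Distance.SAME_CITY),
    DRSite("remote-vault", "cold", Distance.CROSS_COUNTRY),
]


def pick_site(scope, sites=SITES):
    """Pick the nearest site that lies outside the affected area."""
    survivors = [s for s in sites if s.distance >= scope]
    if not survivors:
        raise LookupError(f"no DR site survives a {scope.name} event")
    return min(survivors, key=lambda s: s.distance)


for scope in Scope:
    site = pick_site(scope)
    print(f"{scope.name}: recover at {site.name} ({site.tier})")
```

Under this rule, a plan whose only replica sits next door to production has no surviving site once an event reaches beyond a single building, which is the Tower One and Tower Two problem in miniature.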
Benefits and drawbacks of automated DR
There are numerous benefits to automated DR, but the biggest advantage is that IT has a way to recover the system without human intervention. With a single click, the solution will take information from crashed servers A and B at location C, send it to location D and reinstall it on virtual machine E using the right computer language.
Another benefit is peace of mind. IT managers know that if an issue occurs, they won’t be woken up and dragged into the office at 2:00 a.m. for countless hours of programming. Rather, the IT manager presses a button from home and goes back to bed. An automated DR solution is imperative for the health of the IT manager. Consider one IT manager who is located in the Southeast. He is on three panic attack medications, as he is extremely nervous about his company having an IT failure. He is continually trying to get the company’s executives to expand his budget for an automated DR solution.
The third benefit is future-proofing of the IT environment. Regardless of the type of servers, hardware or connectivity in the future, the automated DR solution will work with that environment, ensuring the company remains protected.
The drawback to automated DR is the same as it is with all software: it is only as good as its implementation. If IT managers take the time to configure their software and then periodically update the configuration to reflect any changes, the system will run smoothly. The best way to ensure the automated DR solution works is to test it frequently. Most IT managers are leery about testing their DR plans, since finding a problem only creates more work for them. But with frequent testing, issues are found and resolved before the plan ever has to be put into effect. Also, automated DR allows for simple testing without impacting the production site or the data replication occurring between various sites. DR testing relates back to the smoke alarms. If the homeowner changes the batteries every six months, then when an event occurs, the family is protected and the house is saved.
Resuming operations at a DR site, whether planned (such as a scheduled site migration) or unplanned (such as an accidental event), requires careful preparation. The planning will take months, but the execution of the plan needs to occur within minutes. During these precious minutes, all of the teams involved will be under pressure to carry out their recovery procedures in a coordinated fashion. Anything could go wrong during the dozens, if not hundreds, of steps performed by the application, hardware, network and storage teams. A human error, a process flaw, a routing issue or almost anything else could delay the site recovery. For many, this herculean effort is, simply put, risky and unpredictable.
Automated DR encompasses all applications, services and data. It has all the steps tied in and is able to handle the hybrid environments of physical and virtual machines. It takes the human element out of DR and allows companies to recover systems within minutes rather than hours, thus minimizing the negative effects on the business.