{"id":28813,"date":"2020-03-17T01:00:45","date_gmt":"2020-03-17T05:00:45","guid":{"rendered":"https:\/\/centricconsulting.com\/?p=28813"},"modified":"2023-09-01T11:31:05","modified_gmt":"2023-09-01T15:31:05","slug":"disaster-recovery-in-the-cloud-part-two","status":"publish","type":"post","link":"https:\/\/centricconsulting.com\/blog\/disaster-recovery-in-the-cloud-part-two\/","title":{"rendered":"Disaster Recovery in the Cloud: Part Two"},"content":{"rendered":"
Part two in a series.<\/a><\/p>\n In part one<\/a> of this series, I outlined the prerequisites for building any good disaster recovery plan. The disaster recovery plan must meet the business recovery requirements, which we define as part of the business continuity planning process. Without this plan, you are throwing darts at a dartboard, not truly knowing what data and systems need to be recoverable at what recovery point objective to sustain the business and limit financial risk to the organization.<\/p>\n In part two of the series, I’ll walk you through four common disaster recovery blueprints that you can use to align your disaster recovery plan with the business requirements defined in your business continuity plan.<\/p>\nAWS Disaster Recovery Blueprints<\/h2>\n Assuming you deploy all IT applications within the Amazon Web Services<\/a> (AWS) cloud, let’s look at four disaster recovery options to support your disaster recovery plan, which must meet the defined recovery time objective<\/strong> (RTO) and recovery point objective<\/strong> (RPO). AWS enables you to cost-effectively operate your disaster recovery process using any of the following scenarios. These are just examples of possible approaches to meeting your DR needs. In some cases, variations or combinations of these approaches may be necessary to meet the RTO and RPO requirements outlined in your DR plan:<\/strong><\/p>\n1. Backup and Restore<\/h4>\n In traditional on-premises environments, companies back up data to tape and send it off-site regularly, or they back it up to a virtual tape library (VTL) and vault it to an alternate site. In the AWS cloud, the standard approach is to take snapshots of Elastic Block Store (EBS) volumes and backups of Amazon RDS databases and store them in Amazon S3, which is designed to provide 99.999999999 percent (11 9s) durability of objects over a given year.<\/p>\n Although this is the least expensive option, it also has the longest recovery time. 
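The backup half of this approach amounts to a couple of API calls. Below is a minimal sketch (not the author's code) in the style of a boto3 EC2 client: snapshot a volume in the primary region, then copy the snapshot to the DR region. The function and resource names are invented for illustration, and the clients are passed in as parameters so the logic is easy to exercise with stubs.

```python
def backup_volume_to_dr_region(ec2_primary, ec2_dr, volume_id, source_region):
    """Snapshot an EBS volume, then copy the snapshot to the DR region.

    `ec2_primary` / `ec2_dr` are EC2 clients for the two regions
    (e.g. boto3.client("ec2", region_name=...)). Returns the source
    and DR-region snapshot IDs.
    """
    # Point-in-time snapshot of the volume in the primary region.
    snap = ec2_primary.create_snapshot(
        VolumeId=volume_id,
        Description=f"DR backup of {volume_id}",
    )
    # Copy the snapshot into the DR region so it is restorable there.
    copy = ec2_dr.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snap["SnapshotId"],
        Description=f"DR copy of {snap['SnapshotId']}",
    )
    return snap["SnapshotId"], copy["SnapshotId"]
```

In practice you would schedule this (for example with Amazon Data Lifecycle Manager or a cron-triggered Lambda) rather than call it by hand.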
You will have to deploy new EC2 and RDS instances, restore the backup data, and configure the networking, security, database connectivity, and any other custom settings the application needs to function in the DR region. There are many commercial and open-source backup solutions that back up to Amazon S3.<\/p>\n <\/a><\/p>\n2. Pilot Light<\/h4>\n When using the pilot light method, the cost increases minimally, and the recovery time is shorter than with the backup-and-restore method. In the pilot light method, the core pieces of the system run and stay up to date in another region, where you will recover the remaining applications.<\/p>\n <\/a><\/p>\n A recommended configuration is pre-building your load balancers with domain name service (DNS) names already registered in Route 53 and having your databases up and running with replicated data from your primary site. Server images are periodically backed up to AMIs and replicated to the DR region. During a disaster, you deploy the critical applications from AMIs in your pre-staged pilot light region.<\/strong> Then, you can update DNS records to reference the resources running in the DR region.<\/p>\n The image below shows the cutover process.<\/p>\n <\/a><\/p>\n3. Warm Standby<\/h4>\n Warm standby builds on top of the pilot light architecture. It further reduces the recovery time during a disaster because all critical services always run at a scaled-down capacity.<\/strong> This allows an almost immediate failover (very low RTO and RPO) simply by updating DNS records to point to the warm standby resources, but at a significant increase in operational costs. 
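For both pilot light and warm standby, the failover itself is a Route 53 record update. A minimal sketch, assuming a boto3-style Route 53 client and its real `change_resource_record_sets` call (the hosted zone ID and record names here are hypothetical, and the client is injected so the logic is testable):

```python
def cut_over_dns(route53, hosted_zone_id, record_name, dr_endpoint, ttl=60):
    """Repoint an application's DNS record at the DR site.

    `route53` is a Route 53 client (e.g. boto3.client("route53")).
    UPSERT creates the record if absent or overwrites it if present.
    """
    change_batch = {
        "Comment": "Fail over to DR region",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "TTL": ttl,  # a short TTL keeps cutover delay low
                "ResourceRecords": [{"Value": dr_endpoint}],
            },
        }],
    }
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )
```

A short TTL matters here: clients cache DNS answers for the TTL, so it bounds how long traffic keeps flowing to the failed site after the update.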
After a cutover to the warm standby DR site, you then scale up the infrastructure to support the full production workload.<\/p>\n The diagram below depicts the configuration of a warm standby DR site with data replication.<\/p>\n <\/a><\/p>\n In case of a disaster, the cutover to the warm standby site looks like this:<\/p>\n <\/a><\/p>\n4. Hot Standby (Multi-Site)<\/h4>\n To have the lowest RTO and RPO, the full application stack runs across both the primary site and the DR site. In essence, you load-balance your application traffic across both sites using weighted DNS routing. Traffic goes to both sites at all times, databases are replicated, and all applications and configurations are maintained in a production-ready state at both sites.<\/p>\n When the system detects a disaster, traffic automatically routes to the surviving site. By using auto-scaling, services scale up to support the full load at the surviving site. With multi-site, you can achieve near-zero RTO and RPO, but this is your most expensive DR option.<\/p>\n Below is an example hot standby architecture, where I configured Route 53 to route a portion of traffic to each site.<\/p>\n <\/a><\/strong><\/p>\n The figure below shows the effect of a disaster in a multi-site configuration. All traffic is routed to the surviving site, and data is served from the replicated copy of the database. As traffic increases, auto-scaling deploys additional resources to support the increased load at this site.<\/p>\n
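The weighted DNS routing described above maps to Route 53 weighted record sets: each site gets a record with the same name and a `Weight`, and traffic is split in proportion to weight over total weight. A minimal sketch that builds such a change batch (endpoint names and weights are hypothetical; pass the result as the `ChangeBatch` of a `change_resource_record_sets` call):

```python
def weighted_change_batch(record_name, endpoints_with_weights, ttl=60):
    """Build a Route 53 change batch splitting traffic across sites by weight.

    `endpoints_with_weights` maps an endpoint DNS name to an integer weight;
    Route 53 sends each record traffic in proportion to weight / total.
    """
    changes = []
    for endpoint, weight in endpoints_with_weights.items():
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": endpoint,  # must be unique per weighted record
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": endpoint}],
            },
        })
    return {"Comment": "Active-active weighted routing", "Changes": changes}

# Example: a 50/50 split across the primary and DR sites.
batch = weighted_change_batch(
    "app.example.com",
    {"primary.example.com": 50, "dr.example.com": 50},
)
```

Setting one site's weight to 0 (or attaching Route 53 health checks to each record) is how traffic drains to the surviving site during a disaster.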