December 5, 2014

Building Highly Available Systems Using Amazon Web Services

High availability is a fundamental requirement when building software solutions in a cloud environment. Traditionally, high availability has been a costly affair, but with AWS you can combine a number of services to achieve high availability, or potentially even an “always available” scenario.

In this post, I’ll share some tips for attaining high availability for applications deployed on Amazon Web Services.

Before starting, let us see how the AWS Infrastructure is organized across the globe.

1. AWS is spread across the world in multiple geographical locations called Regions. Refer to DIAG1 – REGIONS below.
2. Within each Region, there are several isolated locations called Availability Zones. An Availability Zone can be thought of as a separate data center (or group of data centers) within a Region.




To build a highly available environment, the recommended approach is to deploy across multiple Availability Zones.

Consider a web application that needs to be highly available. A simple deployment model is to place an Elastic Load Balancer in front of a cluster of web application servers. Refer to DIAG2 – WEB TIER below.

One important point to note is: “The cluster should be spread across multiple Availability Zones. Because Availability Zones are fault-isolated data centers, this maximizes uptime for your web application tier.”
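To make the point about spreading the cluster concrete, here is a minimal sketch in plain Python. It does not call any AWS API; the zone names, server names, and both helper functions are hypothetical and only illustrate why an even spread leaves servers running when one zone fails.

```python
from itertools import cycle


def place_instances(count, zones):
    """Assign `count` servers to zones round-robin so the spread stays even."""
    placement = {zone: [] for zone in zones}
    zone_cycle = cycle(zones)
    for i in range(count):
        # Hypothetical server names, used only for illustration.
        placement[next(zone_cycle)].append(f"web-{i}")
    return placement


def surviving_servers(placement, failed_zone):
    """Return the servers still reachable when one zone is lost."""
    return [
        server
        for zone, servers in placement.items()
        if zone != failed_zone
        for server in servers
    ]


# Four web servers spread over two (hypothetical) Availability Zones.
placement = place_instances(4, ["zone-a", "zone-b"])

# Simulate losing one whole zone: half of the cluster keeps serving traffic.
remaining = surviving_servers(placement, "zone-a")
```

Had all four servers been placed in "zone-a", the same failure would have left nothing for the load balancer to route to.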



Let us extend the above scenario: the web application is now integrated with a database, so the availability of the whole application depends on the database. What if it goes down? That would severely impact the availability of the overall application.

A simple way to keep the database tier highly available is to set up a failover cluster for the database, which Amazon RDS supports natively through Multi-AZ deployment. A Multi-AZ deployment keeps a standby in a different Availability Zone and fails over to it if the primary becomes unavailable. This way we avoid a “Single Point of Failure” scenario.
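The effect of removing a single point of failure can be put in rough numbers. The sketch below is back-of-the-envelope arithmetic, not an AWS calculation: the 0.99 availability figures are made up for illustration, and the formulas assume that components fail independently and that failover is instantaneous, which real systems only approximate.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability of `copies` independent replicas where one is enough."""
    return 1 - (1 - availability) ** copies


# Hypothetical figure: each single server or database node is up 99% of the time.
NODE = 0.99

# Web tier: two servers in different zones behind a load balancer.
web_tier = redundant_availability(NODE, copies=2)

# The same web tier in front of a single database node versus a
# database with a standby in a second zone.
with_single_db = serial_availability([web_tier, NODE])
with_multi_az_db = serial_availability([web_tier, redundant_availability(NODE, copies=2)])
```

Under these assumptions the single database drags the whole application down to roughly 0.9899, while the redundant database brings it back to roughly 0.9998: the chain is only as available as its weakest non-redundant link.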

<DIAG3 – Database>


Some common points of failure and their mitigations:

Point of failure          Mitigation
DNS/Domain Services       Route 53
Load Balancer             Elastic Load Balancing
Web/Application Server    Auto Scaling
Database Servers          Redundant nodes or clustering
Authentication            Redundant nodes
Data Center Failures      Use multiple Availability Zones
Disaster                  Use multiple Regions


A loosely coupled architecture is more fault tolerant, because components are not directly dependent on each other and the failure of one component does not bring the whole system down. Some key techniques for achieving loose coupling are:

• Build the application from small, individual modules. Each module should be a black box and fairly independent of the others.
• Use queues to pass messages between these black-box components. AWS services such as Simple Workflow Service (SWF), Simple Queue Service (SQS), Simple Notification Service (SNS), and CloudWatch can help here.
• Decouple the components by putting a load balancer between clusters, using AWS services such as Elastic Load Balancing and Route 53.
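The queue-based decoupling described above can be sketched with Python's standard `queue` module standing in for a managed queue such as SQS. This is an in-process illustration only, not the SQS API; the `producer` and `consumer` functions and the order names are hypothetical.

```python
import queue


def producer(order_queue, orders):
    """Front-end component: hands work to the queue and moves on."""
    for order in orders:
        order_queue.put(order)


def consumer(order_queue):
    """Back-end component: drains whatever is waiting in the queue."""
    processed = []
    while True:
        try:
            order = order_queue.get_nowait()
        except queue.Empty:
            break  # Nothing left to process right now.
        processed.append(f"processed:{order}")
        order_queue.task_done()
    return processed


order_queue = queue.Queue()

# The producer only knows about the queue, never about the consumer.
producer(order_queue, ["order-1", "order-2"])

# The consumer could be down or restarting at this point; the messages
# simply wait in the queue until a consumer picks them up.
results = consumer(order_queue)
```

Because neither side calls the other directly, either component can fail, restart, or be scaled out without the other needing to change.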


Once we achieve a good level of loose coupling, we should plan for the failure of any individual component of the overall system.

We must accept that these components will go down sooner or later. Some key techniques for recovering from such failures are:

• Use AWS services such as Elastic Compute Cloud (EC2), Auto Scaling, CloudWatch, and Elastic Load Balancing to quickly start new EC2 nodes under high load, so that the overall application does not go down.
• Use Amazon Machine Images (AMIs), bootstrapping, and CloudFormation to quickly build a new environment.
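The recovery ideas above boil down to two decisions: how many nodes the fleet should have for the current load, and which nodes need replacing. The following plain-Python sketch shows that logic only. It is not the Auto Scaling or CloudWatch API; the function names, the CPU thresholds, and the fleet limits are all assumptions chosen for illustration.

```python
from itertools import count


def desired_capacity(current, cpu_percent, scale_out_at=70, scale_in_at=30,
                     minimum=2, maximum=10):
    """Pick a target fleet size from a load metric (thresholds are assumed)."""
    if cpu_percent > scale_out_at:
        return min(current + 1, maximum)
    if cpu_percent < scale_in_at:
        return max(current - 1, minimum)
    return current


def reconcile(fleet, target, launch):
    """Drop unhealthy nodes, then launch replacements until `target` is met."""
    healthy = [node for node in fleet if node["healthy"]]
    while len(healthy) < target:
        healthy.append(launch())
    return healthy


_ids = count(1)


def launch_from_image():
    """Stand-in for starting a node from a prebuilt machine image."""
    return {"name": f"replacement-{next(_ids)}", "healthy": True}


# Two nodes, one of which has failed its health check, under high load.
fleet = [
    {"name": "web-0", "healthy": True},
    {"name": "web-1", "healthy": False},
]
target = desired_capacity(current=len(fleet), cpu_percent=85)
fleet = reconcile(fleet, target, launch_from_image)
names = [node["name"] for node in fleet]
```

In this run the high CPU reading raises the target from two nodes to three, the failed node is dropped, and two replacements are launched. Starting replacements from a prebuilt image is what keeps this loop fast enough to matter during an outage.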

I hope these suggestions go a long way in helping you set up highly available systems on AWS.

I would appreciate your feedback!