As more organizations become comfortable with having their software products hosted in the cloud, developers need to learn how to build solutions that take advantage of what these hosting platforms offer, since they provide significant advantages over traditional server hosting. Amazon Web Services (AWS) is still by far the largest player in the space, but offerings from Microsoft (Azure) and Google (Google Cloud) are growing quickly. The offerings from these companies fall into a few high-level categories (nicknamed ‘AAS’ due to the prevalence of ‘as a service’ in their names), the most familiar of which is Infrastructure as a Service (IAAS).
This is what many people think of when it comes to cloud hosting. Its primary capability is virtual machines provisioned on demand, which lets companies change the size and/or number of servers they run as their needs change. It tends to be the entry point for many companies because it is the most similar to traditional server hosting: instead of 3 racks of servers in a data center, you have 15 virtual servers. The cloud hosting company is responsible for keeping the machine available and operating as configured. The customer, however, is responsible for basically everything past boot-up: the OS, the application server, software patches, configuration, and monitoring. Again, this is very similar to a physical hosting scenario. IAAS offerings can also provide additional capabilities, including load balancing and auto scaling.
Load balancing and auto scaling are what we will focus on today. Future blog posts will dive deeper into other areas, both within IAAS and in the other AASs. To build software products that take advantage of the elasticity of these cloud offerings, a few key principles need to be followed:
Gone are the endless hours spent chasing down why a specific server went screwy. When a server in an elastic group stops working, you kill it and spin up another. Don’t misunderstand me: you’ll probably still want to track down why the server went screwy, but you don’t have to suffer downtime or reduced capacity while you do. Take a snapshot of the machine for analysis, kill it, and spin up a fresh, clean instance to take its place.
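As a rough illustration of that workflow, here is a minimal Python sketch. `FakeCloud`, `launch`, `snapshot`, and `terminate` are hypothetical stand-ins for whatever your provider's SDK offers. The point is the order of operations: preserve the evidence, discard the instance, then restore capacity.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeCloud:
    """Hypothetical stand-in for a provider API; real SDK calls will differ."""
    instances: dict = field(default_factory=dict)   # instance id -> healthy flag
    snapshots: list = field(default_factory=list)   # ids kept for later analysis
    _ids: count = field(default_factory=lambda: count(1))

    def launch(self) -> str:
        iid = f"i-{next(self._ids)}"
        self.instances[iid] = True
        return iid

    def snapshot(self, iid: str) -> None:
        self.snapshots.append(iid)

    def terminate(self, iid: str) -> None:
        del self.instances[iid]


def replace_unhealthy(cloud: FakeCloud) -> list[tuple[str, str]]:
    """Snapshot, terminate, and replace every unhealthy instance."""
    replaced = []
    for iid, healthy in list(cloud.instances.items()):
        if not healthy:
            cloud.snapshot(iid)                      # keep evidence for root-cause analysis
            cloud.terminate(iid)                     # don't nurse it back to health
            replaced.append((iid, cloud.launch()))   # restore capacity right away
    return replaced
```

The analysis happens later, against the snapshot, so the pool never runs short while someone investigates.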
The natural follow-up to the above is that, to effectively “just spin up a new server,” provisioning must be automatic. You might base new servers on a “golden image,” run a startup script at provision time, or use tools like Chef and Puppet that help bootstrap a server (or a combination of the three). Whichever you choose, your elastic pool instances should need _no_ input from an administrator to start up and be ready to serve requests.
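The convergent, idempotent style that tools like Chef and Puppet use can be sketched in a few lines of Python. Each step checks whether it is already satisfied and acts only if it is not. A fresh instance can therefore bootstrap itself, and re-running the bootstrap is harmless. The steps and the dict standing in for machine state are hypothetical.

```python
from typing import Callable, NamedTuple


class Step(NamedTuple):
    name: str
    is_done: Callable[[dict], bool]   # inspects the current machine state
    apply: Callable[[dict], None]     # moves the machine to the desired state


def converge(state: dict, steps: list[Step]) -> list[str]:
    """Run each step only if needed; return the names of steps that changed something."""
    changed = []
    for step in steps:
        if not step.is_done(state):
            step.apply(state)
            changed.append(step.name)
    return changed


# Hypothetical steps; a real bootstrap would install packages, write config files, etc.
STEPS = [
    Step("install-app", lambda s: "app" in s, lambda s: s.update(app="1.0")),
    Step("write-config", lambda s: s.get("config") == "prod",
         lambda s: s.update(config="prod")),
    Step("start-service", lambda s: s.get("running", False),
         lambda s: s.update(running=True)),
]
```

A first run on an empty state applies all three steps. A second run applies none, which is what makes it safe to bake into an unattended startup script.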
The idea that your elastic farm can instantly scale up to handle any incoming traffic load is comforting. It is, however, an illusion. Depending on the size of the machine and the amount of work needed to fully provision it, it may take 10 minutes before a new server is up and responding to requests. That is (honestly) freakin-light-speed compared to the old purchasing, shipping, racking, stacking, configuring, and deploying steps. Even so, it may not be fast enough to handle sharp spikes in traffic. Some approaches that help mitigate this:
Here’s where most of the differences in application design occur. Certain practices that most application developers are used to need to change to get the maximum benefit. Some of the areas to rethink are:
To support true scalability, the application server tier should scale independently from user session management. As a rule of thumb, user sessions are the enemy of scalability: the more sessions you have, the more resources you need. Sometimes they cannot be avoided, though. At the simplest level, maintaining an authenticated user’s login involves a session. As we’ve seen above, servers are disposable; in a well-designed system, user sessions are not. The solution is to separate the session from the application server and stand up a separate session management system. Distributed caching systems (e.g. memcached) and databases (especially NoSQL) are highly optimized for retrieving objects by key, and they scale exceptionally well. It may seem counterintuitive to introduce a network component to manage something traditionally held in memory, but the benefits usually outweigh the costs. The application server holds the session object in memory only while it needs it. The system can also scale the number of concurrent users (sessions) and the application workload (CPU) independently. Most importantly, the application servers themselves become stateless, so any server can answer any request and servers can be added to or removed from the elastic pool seamlessly.
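Here is a minimal Python sketch of externalized sessions, assuming a simple key/value interface. `DictStore` is an in-process stand-in for a shared store such as memcached. `AppServer` is a hypothetical server that loads the session only for the request that needs it. No server keeps session data between requests, so any server can continue a user's session, including one launched a moment ago.

```python
import json
import secrets
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """The only operations the app server needs from the shared store."""
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class DictStore:
    """In-process stand-in for a shared store such as memcached (hypothetical)."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class AppServer:
    """Hypothetical stateless app server: no session data survives between requests."""
    def __init__(self, name: str, sessions: KeyValueStore) -> None:
        self.name = name
        self.sessions = sessions

    def login(self, user: str) -> str:
        sid = secrets.token_urlsafe(16)
        self.sessions.set(f"session:{sid}", json.dumps({"user": user, "cart": []}))
        return sid                           # only this opaque id is sent to the client

    def add_to_cart(self, sid: str, item: str) -> list[str]:
        raw = self.sessions.get(f"session:{sid}")
        if raw is None:
            raise KeyError("unknown or expired session")
        session = json.loads(raw)            # held in memory only for this request
        session["cart"].append(item)
        self.sessions.set(f"session:{sid}", json.dumps(session))
        return session["cart"]
```

A production version would also need expiry, since most caches support a TTL. It would also need a strategy for two servers updating the same session at once. This sketch handles neither.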
As implied above, make sure that any server can handle any request. Avoiding or minimizing what you store in the application or session context (or equivalent) goes a long way toward ensuring elasticity. The end user’s system can store some information as well, via cookies or local storage (e.g. in mobile apps). As always, however, avoid sending sensitive data, or data that can be manipulated to circumvent access restrictions (such as user IDs or product roles).
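If you do keep a value on the client, one common safeguard is to sign it with an HMAC so the server can detect tampering. This technique isn't covered in the text above and is offered here as an assumption. The sketch below uses only Python's standard library, and `SECRET_KEY` is a placeholder.

```python
import base64
import hashlib
import hmac
from typing import Optional

SECRET_KEY = b"replace-me"   # hypothetical placeholder; load from a secrets store in practice


def sign(value: str, key: bytes = SECRET_KEY) -> str:
    """Append an HMAC-SHA256 signature, producing 'value.signature'."""
    mac = hmac.new(key, value.encode(), hashlib.sha256).digest()
    return value + "." + base64.urlsafe_b64encode(mac).decode().rstrip("=")


def verify(cookie: str, key: bytes = SECRET_KEY) -> Optional[str]:
    """Return the original value if the signature matches, otherwise None."""
    value, sep, _ = cookie.rpartition(".")
    if not sep:
        return None                      # no signature present at all
    expected = sign(value, key)
    # constant-time comparison avoids leaking how many characters matched
    if hmac.compare_digest(expected.encode(), cookie.encode()):
        return value
    return None
```

Signing only detects changes. It does not hide the value, so it is no license to put sensitive data in a cookie, and access decisions such as roles should still be looked up server-side. Many web frameworks ship signed-cookie support, which is usually preferable to rolling your own.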
Whenever possible, do not use, or even reference, the local filesystem on the server. Not only can this tie a user to a specific server, it conflicts with the disposable-server principle. One area this affects significantly is application logging. It is very, very common to log activity to the local filesystem. Instead, send that activity to a centralized logging server cluster. You can do this in real time, or have separate processes monitor the local log file(s) and ‘ship’ them to the log server. Weigh the runtime performance impact against the risk of losing log data to choose the right approach for your product. It doesn’t have to be all-or-nothing, either: your audit log could be sent to your logging server in real time while your debug/info log is shipped asynchronously. Make sure every log message includes an identifier for the machine that generated it, so that log analysis remains possible after that server is gone.
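One way to get both behaviors with Python's standard `logging` module is sketched below. A filter stamps each record with the host name, and the audit logger writes straight to the central handler. The general application logger hands records to a `QueueHandler`, and a background `QueueListener` thread ships them. The concrete handler you pass as `central` is an assumption that depends on your logging service, for example a `logging.handlers.SysLogHandler` aimed at your log cluster.

```python
import logging
import logging.handlers
import queue
import socket


class HostnameFilter(logging.Filter):
    """Stamp every record with the machine that produced it."""
    hostname = socket.gethostname()

    def filter(self, record: logging.LogRecord) -> bool:
        record.hostname = self.hostname
        return True


FORMAT = "%(asctime)s %(hostname)s %(name)s %(levelname)s %(message)s"


def configure(central: logging.Handler) -> logging.handlers.QueueListener:
    """Wire an audit logger (synchronous) and an app logger (queued) to `central`."""
    central.setFormatter(logging.Formatter(FORMAT))
    central.addFilter(HostnameFilter())

    audit = logging.getLogger("audit")
    audit.setLevel(logging.INFO)
    audit.addHandler(central)        # caller waits until the record is handed off

    log_queue: queue.Queue = queue.Queue(-1)
    app = logging.getLogger("app")
    app.setLevel(logging.DEBUG)
    app.addHandler(logging.handlers.QueueHandler(log_queue))   # caller only enqueues

    listener = logging.handlers.QueueListener(log_queue, central)
    listener.start()                 # background thread ships queued records
    return listener
```

Call `listener.stop()` during shutdown so that records still in the queue are flushed before the instance is terminated.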
A corollary to servers being disposable is that, in the real world, servers fail. Your product infrastructure must be capable of surviving a server/device failure. Each tier of your product will need to address fault tolerance.
Web tier: clustered servers serving static assets from a shared storage location
Shared storage: clustered, redundant storage
Application Server: Following the above practices for auto scaling provides significant protection from server failures. Additionally, cloud hosting providers let you ensure that servers are launched in separate data centers and/or racks (Availability Sets in Azure, Availability Zones in AWS). Also consider a multi-region deployment if necessary.
Data tier: clustered or master/slave replication approaches provide significant failover protection
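You rarely write zone placement yourself. AWS Auto Scaling, for example, balances instances across the Availability Zones you configure. The arithmetic behind it is still worth seeing. This hypothetical sketch places instances round-robin across zones and counts how many survive the loss of one zone. The zone names are placeholders.

```python
from itertools import cycle, islice


def place(n_instances: int, zones: list[str]) -> list[str]:
    """Assign instances to zones round-robin, so zone sizes differ by at most one."""
    return list(islice(cycle(zones), n_instances))


def surviving(placement: list[str], lost_zone: str) -> int:
    """Instances still running if every instance in `lost_zone` goes down."""
    return sum(1 for zone in placement if zone != lost_zone)


ZONES = ["zone-a", "zone-b", "zone-c"]   # placeholder names
```

With 7 instances spread over three zones, losing any one zone leaves at least 4 running. With all 7 in a single zone, the same outage leaves none.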
Most cloud providers with elastic hosting pools let you set a ‘health check’ URL that determines when a machine is no longer valid or healthy. A commonly forgotten factor when establishing elastic server pools is making sure the check is meaningful, not merely that a URL exists for the infrastructure to call. The URL should execute test code that validates all aspects of the system. Confirm connectivity to all data stores and that the data is valid, and run code that exercises as many critical system routines as possible. Have it call your APIs. If something doesn’t work, you’ll want to know about it before a user does. Remember that just because your web server and application server are up and running doesn’t mean that your product is working. Configure your provider to notify you (or, at a minimum, to track) when these health checks fail. You’ll want to be actively notified if failures begin to form a trend.
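Here is a minimal sketch of a health endpoint that runs real checks rather than just returning 200, using Python's built-in `http.server`. `check_database` and `check_session_store` are hypothetical placeholders; each should raise an exception when its dependency is unusable. Any failure turns the response into a 503, which a load balancer configured to expect a 200 will treat as a failed check.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def run_checks(checks: dict[str, Callable[[], None]]) -> tuple[int, str]:
    """Run every check; any exception marks the node unhealthy (HTTP 503)."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:             # report the failure, don't crash the endpoint
            results[name] = f"failed: {exc}"
    healthy = all(v == "ok" for v in results.values())
    return (200 if healthy else 503), json.dumps(results, sort_keys=True)


def check_database() -> None:
    """Hypothetical: run a cheap query and validate the result; raise on failure."""


def check_session_store() -> None:
    """Hypothetical: write and read back a test key; raise on failure."""


CHECKS = {"database": check_database, "session_store": check_session_store}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = run_checks(CHECKS)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())
```

To serve it, run something like `HTTPServer(("", 8080), HealthHandler).serve_forever()` and point your provider's health check at `/health`. The port and path are arbitrary choices. The JSON body also tells whoever is paged which dependency failed.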
Cloud hosting services offer many different compute options. Some are CPU optimized, some are memory optimized, and some have incredible IOPS to the storage system. One of the primary drivers for moving to cloud computing is the ability to control your costs and capacity. If you design your workloads as discrete, stateless units, you can increase capacity by adding worker machines. If heavier workloads instead require an individual machine with larger capabilities (e.g. more memory), you may need to rework your product design. For example, if processing a data file takes multiple steps, you may run into trouble if you lock a single machine into handling every step for a given file. Splitting the file so it can be processed in parallel, or saving intermediate results onto queues, may allow more files to be processed concurrently. Some scaling solutions can grow and shrink the pool based on message queue size. They create additional worker nodes when work is backing up and shut them down as the backlog clears.
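Scaling on queue depth can be sketched as two small pieces. First, split a large job into chunks that any worker can take from a queue. Then size the worker pool so each worker has roughly a fixed amount of backlog. AWS, for example, documents a similar "backlog per instance" approach for scaling on SQS queues. The function names, chunk size, and bounds below are illustrative assumptions.

```python
import math
import queue


def enqueue_chunks(records: list[str], chunk_size: int, work: queue.Queue) -> int:
    """Split one large job into independent chunks; return how many were queued."""
    chunks = 0
    for start in range(0, len(records), chunk_size):
        work.put(records[start:start + chunk_size])   # any worker can take any chunk
        chunks += 1
    return chunks


def desired_workers(backlog: int, msgs_per_worker: int,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Size the pool so each worker has about `msgs_per_worker` queued messages."""
    target = math.ceil(backlog / msgs_per_worker)
    return max(min_workers, min(max_workers, target))
```

A 2,500-row file split into 100-row chunks yields 25 messages. At 5 messages per worker, that calls for 5 workers. The `max_workers` cap keeps a runaway backlog from becoming a runaway bill.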
While much of the focus is on scaling elastic pools up to handle increases in site traffic, it is critical to also set up downscaling rules, so you pay for the additional machines only as long as you need them. Because new instances take time to start, configure your rules for rapid increases but gradual decreases in pool size. Remove instances only when you’re sure you don’t need them.
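The "rapid increase, gradual decrease" rule can be expressed as a small state machine. This is a hypothetical sketch, not any provider's policy syntax. It jumps straight to the needed capacity on scale-out. On scale-in, it removes only one instance per `scale_in_cooldown` seconds of sustained lower demand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scaler:
    """Hypothetical asymmetric policy: scale out at once, scale in slowly."""
    min_size: int = 2
    max_size: int = 20
    scale_in_cooldown: float = 600.0     # seconds of lower demand before removing one
    size: int = 2
    _low_since: Optional[float] = None   # when demand first dropped below `size`

    def update(self, now: float, needed: int) -> int:
        needed = max(self.min_size, min(self.max_size, needed))
        if needed > self.size:
            self.size = needed           # jump straight to the required capacity
            self._low_since = None
        elif needed < self.size:
            if self._low_since is None:
                self._low_since = now    # start the quiet-period clock
            elif now - self._low_since >= self.scale_in_cooldown:
                self.size -= 1           # remove a single instance
                self._low_since = now    # and restart the clock
        else:
            self._low_since = None       # demand matches capacity; reset
        return self.size
```

After a spike to 10 instances, demand dropping to 3 removes one instance every 10 minutes. A new spike during that slow descent is absorbed immediately.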
Many of the capabilities that the major cloud providers offer as a service are themselves distributed, auto-scalable, and highly fault tolerant. Weigh the effort of building and running a component yourself against using the provider’s equivalent service. Standing up your own fleet of RabbitMQ servers, for example, will carry a higher total cost of management than using the provider’s queue service.
It is especially important to evaluate the database service offerings from cloud providers. Many software products are designed against a particular database vendor. Check whether your cloud provider offers that vendor as a service, or whether switching to an offered vendor is an option. These database services almost always come with replication and fault tolerance already configured. In my experience, no matter how “supported-out-of-the-box” it is, managing your own database cluster takes significant effort over time. Using the provider’s service may not be an option for your product, but it should definitely be evaluated. For Azure specifically, note that its hosted SQL Server offering is significantly less expensive than the equivalent in AWS. Also note that cloud-hosted database services may not offer 100% of the features of the standalone products (e.g. Azure SQL Server doesn’t support the built-in encryption functions).
Additionally, investigate the different pricing models the cloud provider offers. Many offer a per-hour discount in exchange for an up-front commitment. If you know the majority of your servers will be on 24/7, committing to one of these plans instead of paying on demand can yield savings of 20-40%. The AWS spot instance market deserves a specific mention. Essentially, AWS sells off its excess capacity for what the market is willing to pay. Spot instances are not a good fit for steady-state roles (e.g. a database). They work well, however, for responding to spikes in traffic or processing, and for intermittent batch processing of data. Discounts can be as steep as 90%. There are significant caveats, such as the machines being shut down automatically if the market price rises above your maximum. Even so, spot instances are a very cost-effective option for short-term instance needs.
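The savings are simple arithmetic, sketched below with a made-up on-demand rate of $0.20 per instance-hour. Real rates vary by provider, region, and instance type. Take 10 servers running all month (730 hours). A 30% commitment discount saves $438 of $1,460. A 90% spot discount turns a 20-server, 5-hour batch job from $20 into $2.

```python
HOURS_PER_MONTH = 730   # 8,760 hours per year / 12


def compute_cost(servers: int, hourly_rate: float, hours: float,
                 discount: float = 0.0) -> float:
    """Total cost of `servers` instances for `hours`, less a fractional `discount`."""
    return servers * hourly_rate * hours * (1 - discount)


RATE = 0.20  # hypothetical on-demand price per instance-hour

on_demand = compute_cost(10, RATE, HOURS_PER_MONTH)                  # about $1,460
committed = compute_cost(10, RATE, HOURS_PER_MONTH, discount=0.30)   # about $1,022
batch_on_spot = compute_cost(20, RATE, 5, discount=0.90)             # about $2
```

The commitment only pays off if the servers actually run most of the hours you commit to. That is why the text reserves it for servers that are on 24/7.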
In conclusion, there is more involved in getting your product ‘in the cloud’ than simply starting up a hosted virtual machine. Knowing all your options, and leveraging the behavior of elastic hosting can provide your product the scalability you need, at the lowest possible price over time.