January 11, 2018
The Silence of the Lambdas – 5 Anti-Patterns for AWS Lambda
It’s no secret that AWS is pushing its serverless offerings at every opportunity. Serverless containers, storage, NoSQL, and even relational databases abstract product software away from the infrastructure it runs on. At the core of AWS’ serverless landscape is Lambda, a Function-as-a-Service (FaaS) offering: it executes code packages in response to event-driven triggers such as HTTP calls, notification topics, S3 file drops, and even scheduled cron jobs.
Here at 3Pillar Global, we are using Lambda to build serverless products that span computer vision, data processing, and all kinds of product development, both for our customers and internally. Along the way, however, we have found a number of ‘gotchas’ that you should look out for as you adopt this new model of cloud computing. We’ve gathered them here in the hope that these pointers help you sidestep - or at least prepare for - the challenges you may run into with Lambda.
1. Building Lambdas Like Server-Full Code
When you’re writing code to run on traditional servers, you typically take advantage of server startup time and, when an end user is involved, pre-load wherever you can to minimize per-request execution time. In Lambda, your execution environment can effectively restart at any time, so you can - and will - pay and re-pay that startup cost. Focus on writing code that is streamlined and fast to answer, and do not load anything until it’s actually needed.
This is especially relevant if you’re using Java, where common practice is to load classes and wire up dependencies at startup. The problem is exacerbated by how easy it is to inadvertently pull in massive dependency trees when using frameworks like Spring. I am a huge Spring fan, but it is not well-suited to the Lambda container lifecycle. I also suggest being brutal about adding dependencies to your package.json, pom.xml, build.gradle, or NuGet files. If you’re hitting the Lambda deployment package limit (50 MB zipped for direct upload), analyze your dependency tree and consider adding explicit excludes.
That said, you can take advantage of container reuse by taking a singleton approach to expensive resources: use the resource if it’s there, but don’t assume it is, and initialize it if it’s not. There is also a ‘/tmp’ mount on all Lambda containers that you can use as a scratch area, but again, you cannot assume the same container will be used from one invocation to the next. As a last resort, you may use CloudWatch Events to periodically ‘ping’ your function to keep it hot. I would not recommend this unless it provides a significant benefit, and even then you cannot rely on it always working, so you would still need to handle hitting a ‘cold’ Lambda.
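Here is a minimal Python sketch of that lazy singleton idea. `_create_expensive_client` is a hypothetical stand-in for whatever is costly in your function (an SDK client, a database connection, a loaded model), and `needs_client` is a hypothetical event field. The module-level variable survives only as long as AWS reuses the container, so the code always checks before using it.

```python
import time

# Module-level cache. It survives between invocations only when AWS reuses
# this container, so never assume it is already populated.
_expensive_client = None


def _create_expensive_client():
    """Hypothetical stand-in for costly setup (SDK client, DB connection, model load)."""
    time.sleep(0.1)  # simulate slow initialization
    return {"created_at": time.time()}


def get_client():
    """Return the cached client, creating it on first use in this container."""
    global _expensive_client
    if _expensive_client is None:  # cold start, or first code path that needs it
        _expensive_client = _create_expensive_client()
    return _expensive_client


def handler(event, context):
    # Only pay the setup cost on code paths that actually need the resource.
    if event.get("needs_client"):
        client = get_client()
        return {"status": "ok", "client_created_at": client["created_at"]}
    return {"status": "ok"}
```

Requests that never touch the resource never pay for it, and warm containers reuse the one created by an earlier invocation.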
2. Ignoring Monitoring Services
Normally, you can investigate server logs for issues. Having no servers doesn’t mean having no logs, though: by default, function output is routed to CloudWatch Logs. These logs are organized into log streams, so finding the exact execution you’re looking for can be difficult. Make your life easier by passing a correlation ID along across API and Lambda invocations, and give each error a unique text identifier so its records are easy to find. For performance-related inquiries, the REPORT line that Lambda writes at the end of each invocation includes the duration, billed duration, and maximum memory used. AWS X-Ray adds tracing, including insight into container startup; use it to dig into the details of operation.
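A minimal sketch of correlation IDs and unique error codes in a Python handler. It assumes a simplified event shape (headers and `order_id` at the top level); the `X-Correlation-Id` header and the `ORD-0001` error code are hypothetical conventions, not anything Lambda defines. Each log line is a single JSON object, which keeps CloudWatch Logs searches straightforward.

```python
import json
import logging
import uuid

logger = logging.getLogger()
logger.setLevel(logging.INFO)

CORRELATION_HEADER = "X-Correlation-Id"  # hypothetical; pick one name and use it everywhere


def get_correlation_id(event):
    """Reuse the caller's correlation ID if present, otherwise start a new one."""
    headers = event.get("headers") or {}
    return headers.get(CORRELATION_HEADER) or str(uuid.uuid4())


def log(level, correlation_id, message, **fields):
    """Emit one JSON object per line so every record carries the correlation ID."""
    record = {"correlation_id": correlation_id, "message": message, **fields}
    logger.log(level, json.dumps(record))
    return record


def handler(event, context):
    correlation_id = get_correlation_id(event)
    log(logging.INFO, correlation_id, "request received")
    if "order_id" not in event:
        # A unique, greppable error code makes this record easy to find later.
        log(logging.ERROR, correlation_id, "missing order_id", error_code="ORD-0001")
        return {"statusCode": 400, "headers": {CORRELATION_HEADER: correlation_id}}
    log(logging.INFO, correlation_id, "processing order", order_id=event["order_id"])
    # Return the ID so clients and downstream calls can propagate it.
    return {"statusCode": 200, "headers": {CORRELATION_HEADER: correlation_id}}
```

Searching the log group for the correlation ID (or for `ORD-0001`) then pulls up every line for one request, regardless of which stream it landed in.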
While CloudWatch Logs will never run out of disk space like a server might, by default log events are retained indefinitely. To avoid an ever-increasing storage cost, change your retention policies so that logs expire automatically after an acceptable period.
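Retention is set per log group. This sketch assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) is passed in, and relies on Lambda’s default log group naming of `/aws/lambda/<function-name>`; the 30-day default is an arbitrary example. You may prefer to declare retention in your infrastructure templates instead.

```python
# Sketch: `logs_client` is assumed to be boto3.client("logs"); it is passed in
# so the function can be exercised without AWS credentials.
LAMBDA_LOG_GROUP_PREFIX = "/aws/lambda/"


def log_group_for(function_name):
    """Lambda writes to a log group named /aws/lambda/<function-name> by default."""
    return LAMBDA_LOG_GROUP_PREFIX + function_name


def set_log_retention(logs_client, function_name, days=30):
    """Make a function's logs expire after `days` instead of being kept forever."""
    # CloudWatch Logs accepts only specific retention values (1, 3, 5, 7, 14, 30, ...).
    group = log_group_for(function_name)
    logs_client.put_retention_policy(logGroupName=group, retentionInDays=days)
    return group


# Example (requires boto3 and AWS credentials):
#   import boto3
#   set_log_retention(boto3.client("logs"), "my-function", days=14)
```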
3. Doing it all Manually
AWS’ serverless options are quite extensive and growing by the week. When building a production solution backed by serverless technologies, you can easily get overwhelmed trying to define all the parts and wire them together. Thankfully, a number of tools have emerged to simplify putting it all together. AWS has two: the Serverless Application Model (SAM), and a Python-specific framework called ‘Chalice.’ The non-AWS options - Serverless.com, Apex, and Zappa - are similar in nature, and some of them (notably the Serverless Framework from Serverless.com) also offer multi-cloud support, since serverless is not just an AWS thing.
In any case, be sure to use these tools to define the secondary resources (e.g. IAM roles, S3 buckets, DynamoDB tables) that your services depend on. Given how easy it is to add these resources, it’s a great time to push ‘infrastructure as code’ if you haven’t already. Storing the definitions of your supporting resources in your source code repository, and eliminating manual deployment and configuration, greatly stabilizes your product operations.
4. Failure to Establish Standards & Conventions
AWS Lambda is very open in terms of configuration: any function that matches the right signature can be set as the function’s handler, and you can name things however you want. If you just rush in, you will likely end up with a rat’s nest of code that is incredibly hard to maintain and troubleshoot. I suggest establishing naming and environment conventions early - e.g. always name the handler’s file after the function and name the method ‘handler’ (or whatever pattern you prefer). Just define one and enforce it.
Since not all AWS resources support the concept of ‘environments,’ be sure to use naming conventions for things like S3 bucket names and DynamoDB table names. Then pass the Lambda code the environment it’s running in (for example, through an environment variable) so it can map everything together.
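A sketch of both conventions in Python. `APP_NAME`, the `STAGE` environment variable, and the `<app>-<stage>-<resource>` pattern are hypothetical choices; what matters is that there is exactly one rule and every function follows it. The handler string uses the Python runtime’s `module.function` format.

```python
import os

APP_NAME = "orders"  # hypothetical application prefix
STAGE_VARIABLE = "STAGE"  # hypothetical environment variable set at deploy time


def current_stage():
    """The environment (dev/test/prod) this function was deployed into."""
    return os.environ.get(STAGE_VARIABLE, "dev")


def resource_name(kind, stage=None):
    """Build names like 'orders-prod-uploads' (S3) or 'orders-prod-table' (DynamoDB)."""
    return f"{APP_NAME}-{stage or current_stage()}-{kind}"


def handler_path(function_name):
    """Convention: module named after the function, entry point always 'handler'."""
    return f"{function_name}.handler"
```

Inside a function, `TABLE_NAME = resource_name("table")` then resolves to the right table for whichever stage the deployment set.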
Also sit down and decide where to draw the lines between services, functions, code repositories, etc. I would start coarse-grained and split things as the product and code get more complex; it is far easier to split code than to merge it.
Lastly, one of the benefits of Lambda is its polyglot nature: you can write each function in a different supported language if desired. I would highly recommend keeping your product to as few languages as possible, but be open to using other languages when you need a particular library or capability (Java and Python come to mind here). Keep these the ‘exception’ rather than the rule to reduce cognitive overhead.
5. Abandoning Regular Best Practices Just Because It’s Serverless
There are many practices that people tend to drop simply because serverless code is deployed differently. That, combined with the learning curve of any new technology, causes many efforts to skip some incredibly important coding practices. Just because your code no longer runs on hardware you manage doesn’t make it immune to bugs. You should still apply the same rigor to source control, still perform code reviews, and still run static analysis on your code.
In fact, AWS provides many code release capabilities that are themselves serverless, including CodeCommit for Git repositories, CodeBuild for CI builds, CodeDeploy for deployments, and CodePipeline to orchestrate it all. You will also still need to write unit tests and run them at build time; lacking a server doesn’t lessen the value of testing. You can use the standard testing tools for your language of choice. A benefit of Function-as-a-Service is that it encourages the single responsibility principle, which lends itself well to testing. You can also create additional functions to serve as test harnesses and/or utilities.
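As an illustration, a single-responsibility Python handler can be tested with the standard `unittest` module and no AWS access at all. The order-total logic and the API Gateway-style `body` field are hypothetical; because the function is pure, the context argument can simply be `None`.

```python
import json
import unittest


def handler(event, context):
    """A single-responsibility function: total the line items in an order."""
    items = json.loads(event["body"])["items"]
    total = sum(item["price"] * item["quantity"] for item in items)
    return {"statusCode": 200, "body": json.dumps({"total": total})}


class HandlerTest(unittest.TestCase):
    def test_totals_line_items(self):
        event = {"body": json.dumps({"items": [
            {"price": 5, "quantity": 2},
            {"price": 3, "quantity": 1},
        ]})}
        response = handler(event, None)  # no real Lambda context needed
        self.assertEqual(response["statusCode"], 200)
        self.assertEqual(json.loads(response["body"])["total"], 13)

    def test_empty_order(self):
        event = {"body": json.dumps({"items": []})}
        self.assertEqual(json.loads(handler(event, None)["body"])["total"], 0)
```

Run it with `python -m unittest` like any other test module, for example as a step in a CodeBuild build.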
Lastly, there are a couple of ways to do ‘local’ development. The first is to use developer-specific environments and still deploy your code and functions to AWS. This has the benefit of running the code in the same environment it will be deployed to, but it has a few drawbacks: breakpoints are harder to manage, and there is a cost to deploying to AWS (not much, but it’s there). Add the clutter of an environment per team or per developer on top of ‘dev,’ ‘test,’ and ‘prod,’ and you can see that there is an upkeep cost. Fortunately, there are alternatives: AWS provides ‘SAM Local,’ Serverless.com supports local invocation of functions, and there’s even LocalStack, a robust collection of local emulations of many AWS services that can run as Docker containers. These tools let you run functions on a developer’s machine and debug efficiently without polluting your AWS account and/or Git repositories.
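When you just want a debugger, you can also call the handler in-process. This sketch assumes your code only touches the `aws_request_id`, `function_name`, and `get_remaining_time_in_millis()` members of the real Python context object; `FakeContext`, `handler`, and `invoke_locally` are hypothetical names for illustration.

```python
import json
import time


class FakeContext:
    """Minimal local stand-in for the Lambda context object.

    Mirrors only the members this sketch uses: aws_request_id, function_name,
    and get_remaining_time_in_millis().
    """

    def __init__(self, function_name="local-function", timeout_ms=3000):
        self.aws_request_id = "local-request-id"
        self.function_name = function_name
        self._deadline = time.monotonic() + timeout_ms / 1000

    def get_remaining_time_in_millis(self):
        # Counts down from the configured timeout, like the real method.
        return max(0, int((self._deadline - time.monotonic()) * 1000))


def handler(event, context):
    """Hypothetical function under development."""
    name = event.get("name", "world")
    return {"greeting": f"hello {name}", "request_id": context.aws_request_id}


def invoke_locally(event_json, context=None):
    """Run the handler in-process so ordinary breakpoints work."""
    return handler(json.loads(event_json), context or FakeContext())


if __name__ == "__main__":
    print(invoke_locally('{"name": "lambda"}'))
```

This trades fidelity for speed: nothing about IAM, event source wiring, or the real runtime is exercised, which is where SAM Local or LocalStack come in.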
Special Bonus Lambda Gotchas
Recursion is risky with no limits
A last warning: watch out for recursive execution of functions, whether intentional or not. In a traditional environment, an accidental infinite loop (the function triggers an event, which in turn triggers the function…) would max out your CPU. In serverless, you will have launched a “DoW attack” - a Denial of Wallet attack - on yourself, and your $10-$20 development bill can shoot up into the thousands with little warning. This is an anti-pattern for all event-driven models, but with AWS’ autoscaling capacity it can lead to a really awkward conversation with your Engineering VP or CFO. Ways to detect or prevent this include setting CloudWatch alarms on your total Lambda invocations and enabling billing alerts. If recursion really is necessary for your product, pass a recursion count between function calls (in the event object) and add a failsafe that aborts execution if it reaches a wildly unreasonable level - say 10,000.
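A sketch of that recursion-count failsafe in Python. The `recursion_depth` and `work_remaining` fields are hypothetical, and `invoke_next` stands in for whatever re-triggers the function (for example, a boto3 `lambda_client.invoke(..., InvocationType="Event")` call or a queue message); it is injected so the guard can be exercised locally. Call `process` from your handler.

```python
MAX_DEPTH = 10_000  # the "wildly unreasonable" ceiling from the text; tune per product
DEPTH_KEY = "recursion_depth"  # hypothetical event field carrying the count


class RecursionLimitExceeded(Exception):
    """Raised instead of triggering yet another invocation."""


def process(event, invoke_next):
    """Do one step of work, then re-trigger the function with depth + 1.

    `invoke_next` is a hypothetical callback; in AWS it might wrap boto3's
    lambda_client.invoke(FunctionName=..., InvocationType="Event", Payload=...).
    """
    depth = event.get(DEPTH_KEY, 0)
    if depth >= MAX_DEPTH:
        # Failsafe: stop the chain before it turns into a Denial of Wallet.
        raise RecursionLimitExceeded(f"aborting at recursion depth {depth}")

    remaining = event.get("work_remaining", 0)  # hypothetical field
    # ... one unit of real work would happen here ...
    if remaining > 0:
        invoke_next({**event, DEPTH_KEY: depth + 1, "work_remaining": remaining - 1})
    return depth
```

Raising makes the abort show up in Lambda’s error metrics. For asynchronous invocations, though, Lambda retries a failed event, which simply hits the same guard again; logging an error and returning without re-invoking is a reasonable alternative.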
Idempotence is Key
There’s a dirty little secret about Lambda execution: your function may be triggered multiple times for the same root event. Partly this is because many event sources deliver ‘at least once’ and may fire more than once; the other reason is that Lambda, under certain circumstances, retries execution of your code (for example, after an asynchronous invocation fails). Because of this, all of your Lambda code should be idempotent. While this is trivial for read operations, it can become significantly more complicated for writes. The ‘easiest’ way to handle it is to use the request ID that comes with the triggering event and, within your application logic, check whether that request ID has already been processed. If events are passed along, be sure to include the original source request ID in the payload of later events.
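A minimal sketch of request-ID deduplication. `InMemoryProcessedStore` is only a stand-in: a per-container set cannot catch a retry that lands on a different container, so a real implementation needs a shared store with an atomic insert-if-absent, such as a DynamoDB `put_item` with `ConditionExpression="attribute_not_exists(request_id)"`. The event fields (`source_request_id`, `account`, `amount`) are hypothetical.

```python
class InMemoryProcessedStore:
    """Stand-in for a shared, durable record of processed request IDs.

    A per-container set cannot see retries that land on another container;
    a real version needs a shared store with an atomic "insert if absent".
    """

    def __init__(self):
        self._seen = set()

    def mark_if_new(self, request_id):
        """Record the ID; return False if it had already been recorded."""
        if request_id in self._seen:
            return False
        self._seen.add(request_id)
        return True


STORE = InMemoryProcessedStore()
balances = {}  # hypothetical side effect that must not be applied twice


def apply_credit(event, store=STORE):
    request_id = event["source_request_id"]
    if not store.mark_if_new(request_id):
        return "duplicate-ignored"  # the same root event, delivered again
    balances[event["account"]] = balances.get(event["account"], 0) + event["amount"]
    return "applied"


def follow_up_event(event):
    """Build a downstream event that carries the original request ID forward."""
    return {
        "type": "credit-applied",
        "source_request_id": event["source_request_id"],
        "account": event["account"],
    }
```

This sketch records the ID before applying the write, so a crash in between means the retry is skipped; recording it afterwards risks a double write instead. Which failure mode is acceptable depends on the operation, and when the marker and the data live in the same database, a single transactional write that covers both sidesteps the dilemma.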
In closing, the future of product deployment will absolutely include serverless aspects - and on AWS, that means Lambda. Adopting these features opportunistically can deliver much of the promise of microservices and, if you do it right, as few of the drawbacks as possible.
Stay cloudy my friends.