Yasser Muwakki (Director of Digital Transformation) and Jon Holman (Senior AWS DevOps Engineer)
We would like to share with you how we rewrote a service that is part of a larger solution we implemented for one of our clients. The new version of the service maintains the same level of security and performance, increases availability and scalability, and reduces the yearly cost from $1,730 to approximately $4.
We did this by moving the service from AWS ECS Fargate to AWS's Functions as a Service (FaaS) offering, AWS Lambda. FaaS can be the most cost-effective way to utilize cloud computing resources. With typical cloud compute services, such as EC2 and Fargate, your service needs to be available for potential requests 24 hours a day; as long as it is running, you are being charged for it. This is where serverless FaaS sets itself apart. Your code is deployed to the cloud provider and associated with its configured events; until those events occur, it is not running and you are not paying for it. You pay only for the seconds your function actually runs to respond to those events. When the function completes, you stop paying, yet it remains ready to service new requests.
In case these great potential cost savings are not reason enough to consider moving to Lambda, it also brings increased availability and scalability.
For availability: when working with virtual servers or containers, if you want your solution to sustain the loss of an availability zone or data center, you need to create a fully functioning application stack in multiple availability zones. With FaaS or Lambda, your function is not locked into an availability zone; it spins up where there is capacity when it is needed, and this is seamless to you.
For scalability: if you want your application to support a peak of 10,000 concurrent users using containers or virtual servers, you need to configure auto-scaling groups with scaling rules that start enough servers or containers to handle that many concurrent users. With Lambda, your function is invoked in response to events, and more events simply mean more invocations; there is nothing special you need to do to pre-configure servers or containers to scale up.
In this blog, our specific use case is a small component in a public-cloud information exchange platform utilized by thousands of users throughout the United States, which leverages Appian for front-end workflow and case management and Alfresco for backend content and records management. The entire system is hosted in Amazon Web Services (AWS). We created Java-based custom microservices to support the integration between these two systems. Currently, all the custom microservices run as Docker containers orchestrated by AWS Elastic Container Service (ECS).
One of these microservices is called the token service. A token service’s role is to obfuscate Alfresco Node IDs by mapping them to random hexadecimal strings (called tokens). The token service is used by several dependent applications that all must authenticate to be authorized to use the endpoints. Each token has several permissions attached to it as well, which can restrict what the token can be used for, who can use it, what end-user IP address it can be used from, and how many times it can be used (default of single-use).
Token Service Features
- Allows systems integrating with Alfresco to obfuscate the Alfresco node to prevent hijacking of the URL or URL sharing
- Ensures that the only piece of information passed through the browser is a very large random number (the token)
- External systems can securely control, through the stored token, who the user is, which file they have access to, and what kind of access they have (Download, Preview, Online Edit)
- Prevents brute-force attacks with token parameters:
- Expiration date: token is only valid before this date
- Retries: maximum number of times a token can be reused. By default, the token is for one-time use
- Credentials: any call to the token service requires HTTP basic authentication
A sample Alfresco Node UUID: f2f42052-6e24-4c54-bc15-1e0bb6dce9f7
A sample token: 2c1d036e41b586309819ec80c9118038178741caa4ed7e6e978145d1bcda95fb
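As a sketch of how a token of this shape can be generated, Python's `secrets` module produces a cryptographically secure random hex string of the same length as the sample above (the function name here is illustrative, not the service's actual code):

```python
import secrets

def generate_token() -> str:
    """Return a 64-character random hexadecimal token (256 bits of entropy)."""
    # secrets.token_hex(32) draws 32 cryptographically secure random bytes
    # and encodes them as 64 lowercase hex characters, like the sample token.
    return secrets.token_hex(32)

token = generate_token()
print(token)  # e.g. 2c1d036e41b586309819ec80c9118038...
```

With 256 bits of entropy, guessing a valid token by brute force is infeasible, which is what makes the obfuscation effective.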
The following table summarizes the comparison between the current ECS-Fargate token service and the Lambda-based token implementation:

| | ECS-Fargate | Lambda |
|---|---|---|
| Cost | $433/year per instance; 4 instances = $1,730/year* | API Gateway $0.35 per month; Lambda free tier |
| Performance | 366 ms per call | 283 ms per call |
| Security | ECR security scan: 1 medium threat on a Python package | LambdaGuard and SonarQube: 100% pass, no security vulnerabilities |
| Reliability | 0.2% failure rate | 0% failure rate |
| Splunk Integration | Side-car container deployment in ECS | CloudWatch subscription with a Lambda function posting directly to Splunk |
| Resources | Docker containers, ECS, ALB | Lambda, API Gateway, CloudWatch |

*Does not include any Application Load Balancer costs.
As you can see, the Lambda implementation has a clear advantage with significantly lower costs. Let's describe in more detail the design and cost breakdown of the Fargate and Lambda implementation approaches.
ECS-Fargate Token Service Overview
The ECS-Fargate implementation has the following design specifications:
- Java code
- Docker container runs Tomcat
- Orchestrated via ECS
- Runs using Fargate (serverless)
- Runs 24 hours per day, 7 days per week
- Logs integrate with Splunk through a side-car container
- Fronted by an ALB (Application Load Balancer)
- 3 REST APIs:
  - Generate token
  - Use token
  - Version
- Fargate pricing:
  - $0.04048 per vCPU per hour
  - $0.004445 per GB per hour
To accommodate the number of requests per busy hour, the token service is required to run four containers 24 hours/day with the following ECS Task definition parameters:
- CPU: 1 vCPU
- Memory: 2 GB
- $0.04048 + (2 × $0.004445) = $0.04937 per hour; × 24 hours = $1.18488 per day per task
- × 30 days = $35.5464 per month per task
- 4 tasks (token containers) = $142.1856 per month
- Splunk sidecar container
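The per-task arithmetic above can be checked with a few lines of Python (the rates are the Fargate prices listed earlier; a 30-day month is assumed):

```python
# Fargate rates quoted above
VCPU_PER_HOUR = 0.04048   # $ per vCPU per hour
GB_PER_HOUR = 0.004445    # $ per GB of memory per hour

vcpus, memory_gb, tasks = 1, 2, 4   # task definition: 1 vCPU, 2 GB, 4 tasks

hourly_per_task = vcpus * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR
daily_per_task = hourly_per_task * 24
monthly_total = daily_per_task * 30 * tasks

print(f"${hourly_per_task:.5f}/hour per task")    # $0.04937
print(f"${daily_per_task:.5f}/day per task")      # $1.18488
print(f"${monthly_total:.4f}/month for 4 tasks")  # $142.1856
```

Over a 365-day year, a single always-on task comes to roughly $433, which is where the $1,730/year figure for four tasks comes from.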
AWS Lambda Overview
AWS Lambda lets you run code without provisioning or managing servers. The following are some advantages of using Lambda:
- Low cost: you pay only for the compute time you consume, no more idle servers waiting for requests
- Improved scalability: functions run in response to events; more events trigger more instances of the function
- Highly available: not locked into a single data center or availability zone
- Natively supports Python, Java, Go, PowerShell, Node.js, C#, and Ruby code
- Runs up to 15 minutes per execution
Lambda (FaaS) Token Service Architecture Diagram
Lambda Token Service overview
The new token service uses serverless technologies, specifically Functions as a Service (FaaS) via AWS Lambda. We chose to develop the Lambda functions in Python. The Lambda functions were created in AWS using the AWS SAM extensions to CloudFormation and attached to HTTP endpoints through API Gateway. The functions run on demand when those endpoints are invoked. To fulfill the requirement that all calls to the token service require basic authentication, we implemented an additional Lambda function that is associated with API Gateway as an authorizer function. The authorizer function's purpose is to allow or deny a request to an HTTP endpoint based on a set of criteria; in our case, validating HTTP basic authentication credentials. The authorizer function ultimately returns an IAM policy to API Gateway to allow or deny the request.
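A minimal sketch of such an authorizer in Python might look like the following. The hard-coded credential pair is a placeholder for illustration only (the real service would validate against a secure store such as AWS Secrets Manager), and the event fields shown assume a TOKEN-type REST API authorizer:

```python
import base64

# Hypothetical credentials for illustration; never hard-code these in practice.
VALID_CREDENTIALS = ("service-user", "service-password")

def lambda_handler(event, context):
    """API Gateway Lambda authorizer validating HTTP Basic credentials."""
    effect = "Deny"
    # A TOKEN authorizer passes the Authorization header as authorizationToken.
    header = event.get("authorizationToken", "")
    if header.startswith("Basic "):
        try:
            decoded = base64.b64decode(header[len("Basic "):]).decode("utf-8")
            user, _, password = decoded.partition(":")
            if (user, password) == VALID_CREDENTIALS:
                effect = "Allow"
        except Exception:
            pass  # malformed credentials fall through to Deny
    # The authorizer returns an IAM policy that API Gateway then enforces.
    return {
        "principalId": "token-service-client",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event.get("methodArn", "*"),
            }],
        },
    }
```

API Gateway caches the returned policy for a configurable TTL, so the authorizer does not have to run on every single request.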
For log management, our client had standardized using Splunk to aggregate and retain application logs. So, we created a solution to send the various token service log entries to Splunk without impacting performance. We achieved this by creating an additional AWS Lambda function called CloudWatch Logs to Splunk and subscribed this function to each AWS Lambda function’s CloudWatch Log Group. This function was then invoked whenever an entry was made to those CloudWatch Log Groups, which in turn sent that data to Splunk.
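The forwarding function can be sketched as follows. CloudWatch Logs subscriptions deliver their payload base64-encoded and gzipped; the Splunk HTTP Event Collector (HEC) URL and token below are placeholders, and a production version would read them from environment variables and add error handling and retries:

```python
import base64
import gzip
import json
import urllib.request

# Hypothetical Splunk HEC settings; real values would come from env variables.
SPLUNK_HEC_URL = "https://splunk.example.com:8088/services/collector"
SPLUNK_HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def decode_log_payload(awslogs_data: str) -> list:
    """CloudWatch Logs subscriptions deliver a base64-encoded, gzipped JSON payload."""
    raw = gzip.decompress(base64.b64decode(awslogs_data))
    return json.loads(raw)["logEvents"]

def lambda_handler(event, context):
    """Forward each subscribed CloudWatch log entry to Splunk HEC."""
    for log_event in decode_log_payload(event["awslogs"]["data"]):
        body = json.dumps({"event": log_event["message"]}).encode("utf-8")
        request = urllib.request.Request(
            SPLUNK_HEC_URL,
            data=body,
            headers={"Authorization": f"Splunk {SPLUNK_HEC_TOKEN}"},
        )
        urllib.request.urlopen(request)  # simplest case; batch and retry in production
```

Because this function runs asynchronously off the log groups, the token service's own request latency is unaffected by Splunk delivery.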
This table gives a summary of the 5 Lambda functions:

| Function | Trigger | Purpose |
|---|---|---|
| Generate Token | POST /tokenservice/services/api/token | Create tokens |
| Use Token | GET /tokenservice/services/api/use | Validate and use tokens |
| Version | GET /tokenservice/services/api/version | Version information |
| Authorizer | API Gateway authorizer | Validates HTTP Basic Auth |
| CloudWatch Logs to Splunk | Subscription to CloudWatch Log Group | Sends logs to Splunk |
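The core of the Use Token endpoint, enforcing the expiration date and retry count described earlier, can be sketched like this, with a plain dictionary standing in for the service's real token store (all names here are illustrative):

```python
from datetime import datetime, timezone

# Hypothetical in-memory store standing in for the real token database.
tokens = {}

def use_token(token: str, now: datetime = None):
    """Validate a token against its expiration date and remaining retries,
    consuming one use on success (tokens are single-use by default)."""
    now = now or datetime.now(timezone.utc)
    record = tokens.get(token)
    if record is None:
        return None                # unknown token
    if now > record["expires_at"]:
        return None                # past the expiration date
    if record["retries"] < 1:
        return None                # all allowed uses consumed
    record["retries"] -= 1         # consume one use
    return record["node_id"]       # the obfuscated Alfresco Node UUID
```

Seeding a single-use token and calling `use_token` twice returns the node ID once and `None` thereafter, which is the brute-force protection described in the feature list.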
In conducting high-concurrency performance tests, we confirmed that each Lambda function worked well with the minimum Lambda memory allocation of 128 MB. Most API calls responded in under 300 milliseconds, except for a handful of requests that took between 1 and 3 seconds due to Lambda cold starts. We estimate that the token service receives approximately 100,000 calls per month.
This table describes our calculations for the expected monthly costs of the new Lambda service.
Note: this is for an AWS account that has been established for over a year, so the “always free” free tier applies, but not the first “12 months free” free tier.
| AWS Usage | Monthly Amount | AWS Monthly Price |
|---|---|---|
| Lambda | 100,000 requests | Free (< 1M requests) |
| Lambda Compute | 100k ÷ (1024/128) = 12,500 GB-seconds | Free (< 400,000 GB-seconds) |
| API Gateway | 100,000 requests | $0.35 |
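The GB-seconds figure in the table follows from the minimum 128 MB allocation. Assuming roughly one billed second per request (a conservative figure given the mostly sub-300 ms responses observed), the arithmetic is:

```python
requests_per_month = 100_000
billed_seconds_per_request = 1.0   # assumption: ~1 s billed per call
memory_gb = 128 / 1024             # minimum Lambda memory allocation in GB

gb_seconds = requests_per_month * billed_seconds_per_request * memory_gb
print(gb_seconds)  # 12500.0 -- well under the 400,000 GB-s always-free tier
```

Even with this conservative duration estimate, the compute stays comfortably inside the always-free tier, leaving the $0.35 API Gateway charge as the only monthly cost.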
DevOps Approach and Lessons Learned
With any project, you want everything to be defined as code, i.e., Infrastructure as Code (IaC). Doing things by hand is not recommended: it is slow, error-prone, inconsistent, not scalable, and not repeatable. Defining everything in our project as code makes the deployment self-documenting, easily repeatable, ready to be moved into an automated pipeline, and easy to improve iteratively.
For this AWS Lambda project, we chose to use the AWS Serverless Application Model (SAM) Framework. AWS SAM is an extension of AWS CloudFormation that makes the building of serverless projects even more efficient. We then used AWS CodePipeline and AWS CodeBuild to create a Continuous Integration Continuous Deployment (CICD) pipeline for our project.
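As an illustration of the SAM approach, a function plus its API Gateway endpoint can be declared in just a few lines of template. The resource name, handler, and runtime below are hypothetical, not the project's actual template:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  GenerateTokenFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: generate_token.lambda_handler   # hypothetical module/handler
      Runtime: python3.9
      MemorySize: 128          # the minimum allocation the service runs on
      Events:
        GenerateToken:
          Type: Api            # SAM creates the API Gateway endpoint for us
          Properties:
            Path: /tokenservice/services/api/token
            Method: post
```

SAM expands this shorthand into full CloudFormation at deploy time, which is what makes it considerably more compact than writing the API Gateway and Lambda resources by hand.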
By rewriting the token service to utilize serverless technologies, we greatly reduced the cost of running the service. The benefits extended beyond cost: we simplified the architecture while increasing scalability and availability. We recommend considering AWS Lambda for any suitable use case, especially as AWS continues to invest in Lambda's capabilities to accommodate larger workloads and broader use cases.