Yasser Muwakki (Director of Digital Transformation) and Jon Holman (Senior AWS DevOps Engineer)
We would like to share how we rewrote a service that is part of a larger solution we implemented for one of our clients. The new version of the service maintains the same level of security and performance and increases the service’s availability and scalability, while reducing the yearly cost from $1,730 to approximately $4.
We did this by moving this service from AWS ECS Fargate to AWS’s Functions as a Service (FaaS) offering, AWS Lambda. FaaS is the most cost-effective way to utilize cloud computing resources. Using the typical cloud compute services, such as EC2 and Fargate, your service needs to be available for potential requests 24 hours a day. If it is running, you are being charged for it. This is where serverless FaaS sets itself apart. Your code is deployed to the cloud provider and associated with its configured events. It’s not running, and you are not paying for it until those events occur. You pay for the seconds that your function is actually running to respond to those events. When the function completes running, you no longer pay, and it is still ready to service new requests.
In case these great potential cost savings are not reason enough to consider moving to Lambda, it also brings increased availability and scalability.
For availability, when working with virtual servers or containers and you want your solution to be able to sustain the loss of an availability zone or data center, you need to create a fully functioning application stack in multiple availability zones. With FaaS or Lambda, your function is not locked into an availability zone; it spins up where there is capacity when it is needed and is seamless for you.
For scalability, if you want your application to support a peak of 10,000 concurrent users using containers or virtual servers, you need to configure auto-scaling groups with scaling rules to start enough servers or containers to handle that many concurrent users. With Lambda, your function is invoked in response to events, and more events mean more invocations. There is nothing special you need to do to pre-configure more servers or containers to scale up.
In this blog, our specific use case is a small component in a public-cloud information exchange platform used by thousands of users throughout the United States, which leverages Appian for front-end workflow and case management and Alfresco for backend content and records management. The entire system is hosted in Amazon Web Services (AWS). We created Java-based custom microservices to support the integration between these two systems. Currently, all the custom microservices run as Docker containers orchestrated by AWS Elastic Container Service (ECS).
One of these microservices is called the token service. A token service’s role is to obfuscate Alfresco Node IDs by mapping them to random hexadecimal strings (called tokens). The token service is used by several dependent applications that all must authenticate to be authorized to use the endpoints. Each token has several permissions attached to it as well, which can restrict what the token can be used for, who can use it, what end-user IP address it can be used from, and how many times it can be used (default of single-use).
Token Service Features
- Allows systems integrating with Alfresco to obfuscate the Alfresco node to prevent hijacking of the URL or URL sharing
- Ensures that the only piece of information passed through the browser is a very large random number (the token)
- External systems can securely control through the stored token who the user is, what file they have access to and what kind of access they have (Download, Preview, Online Edit)
- Prevents brute-force attacks with token parameters:
  - Expiration date: token is only valid before this date
  - Retries: maximum number of times a token can be reused. By default, the token is for one-time use
  - Credentials: any call to the token service requires HTTP basic authentication
A sample Alfresco Node UUID: f2f42052-6e24-4c54-bc15-1e0bb6dce9f7
A sample token: 2c1d036e41b586309819ec80c9118038178741caa4ed7e6e978145d1bcda95fb
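To make the token mechanics above more concrete, here is a minimal Python sketch of generating and redeeming a token. It is not the production implementation: the in-memory dictionary, the function names `generate_token` and `use_token`, and the default five-minute lifetime are all assumptions made for illustration. Only the behavior (random hexadecimal token, expiration, limited uses, optional IP restriction) comes from the description above.

```python
import secrets
import time

# Hypothetical in-memory store; a real service would persist tokens elsewhere.
_tokens = {}


def generate_token(node_id, ttl_seconds=300, max_uses=1, allowed_ip=None):
    """Map an Alfresco node ID to a random 64-character hexadecimal token."""
    token = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    _tokens[token] = {
        "node_id": node_id,
        "expires_at": time.time() + ttl_seconds,
        "remaining_uses": max_uses,  # single use unless stated otherwise
        "allowed_ip": allowed_ip,
    }
    return token


def use_token(token, client_ip=None, now=None):
    """Return the node ID if the token is still valid, otherwise None."""
    now = time.time() if now is None else now
    record = _tokens.get(token)
    if record is None:
        return None
    if now >= record["expires_at"] or record["remaining_uses"] <= 0:
        _tokens.pop(token, None)  # expired or exhausted tokens are discarded
        return None
    if record["allowed_ip"] is not None and client_ip != record["allowed_ip"]:
        return None
    record["remaining_uses"] -= 1
    return record["node_id"]
```

With the defaults, a token can be redeemed exactly once; a second call to `use_token` with the same token returns `None`.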
The following table summarizes the comparisons between the current ECS-Fargate token service versus the Lambda based token implementation:
| | ECS-Fargate token service | Lambda token service |
| Cost | $433/year per instance; 4 instances = $1,730/year (does not include any application load balancer costs) | API Gateway $0.35 per month; Lambda Free Tier |
| Performance | 366 ms per call | 283 ms per call |
| Security | ECR Security Scan: 1 medium threat on a Python package | LambdaGuard and SonarQube: 100% pass, no security vulnerabilities |
| Reliability | 0.2% failure rate | 0% failure rate |
| Logging | Side-car container deployment in ECS | CloudWatch subscriptions with Lambda function posting direct to Splunk |
| Technology | Docker containers, ECS, ALB | Lambda, API Gateway, CloudWatch |
As you can see, the Lambda implementation clearly has an advantage with significantly lower costs. Let’s describe in more detail the design and cost break-down of the Fargate vs Lambda implementation approaches.
ECS-Fargate Token Service Overview
The ECS-Fargate implementation has the following design specifications:
- Java code
- Docker container runs Tomcat
- Orchestrated via ECS
- Runs using Fargate (serverless)
- Runs 24 hours per day, 7 days per week
- Logs integrate with Splunk through a side-car container
- Fronted by ALB (Application Load Balancer)
- 3 REST APIs
  - Generate token
  - Use token
- Fargate pricing: $0.04048 per vCPU per hour
- Fargate pricing: $0.004445 per GB of memory per hour
To accommodate the number of requests per busy hour, the token service is required to run four containers 24 hours/day with the following ECS Task definition parameters:
- CPU: 1 vCPU
- Memory: 2 GB
- $0.04048 + (2 * $0.004445) = $0.04937 per hour, or $1.18488 per day, for 1 Task
- 30 days – $35.5464 per Task
- 4 Tasks (Token Containers) = $142.1856 per month
- Splunk sidecar container
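The arithmetic above can be reproduced with a few lines of Python. This is only a worked check of the figures quoted in this post; the function name `fargate_task_cost` is ours, and the yearly figure assumes a 365-day year, which is what the $433 per instance in the comparison table implies.

```python
from decimal import Decimal

# Fargate prices quoted above (USD).
VCPU_PER_HOUR = Decimal("0.04048")
GB_PER_HOUR = Decimal("0.004445")


def fargate_task_cost(vcpu, memory_gb, hours):
    """Cost in USD of one Fargate task running for the given number of hours."""
    return (vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR) * hours


hourly = fargate_task_cost(1, 2, 1)              # 1 vCPU, 2 GB, one hour
daily = fargate_task_cost(1, 2, 24)              # one task, one day
monthly_one_task = fargate_task_cost(1, 2, 24 * 30)
monthly_four_tasks = 4 * monthly_one_task
yearly_four_tasks = 4 * fargate_task_cost(1, 2, 24 * 365)  # assumes 365 days

print(f"hourly:          ${hourly}")
print(f"daily:           ${daily}")
print(f"30 days, 1 task: ${monthly_one_task}")
print(f"30 days, 4 tasks: ${monthly_four_tasks}")
print(f"1 year, 4 tasks: ${yearly_four_tasks}")
```

`Decimal` is used instead of floating point so the intermediate values match the figures in the list exactly.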
AWS Lambda Overview
AWS Lambda lets you run code without provisioning or managing servers. The following are some advantages of using Lambda:
- Low cost: you pay only for the compute time you consume, no more idle servers waiting for requests
- Improves scalability: functions are run in response to events, more events will trigger more instances of the functions
- Highly available: not locked into a single data center or availability zone
- Natively supports Python, Java, Go, PowerShell, Node.js, C#, and Ruby code
- Runs up to 15 minutes per execution
Lambda (FaaS) Token Service Architecture Diagram
Lambda Token Service overview
The new token service uses serverless technologies, more specifically Functions as a Service (FaaS) with AWS Lambda. We chose to develop the Lambda functions using Python. The Lambda functions were created in AWS using the AWS SAM extensions to CloudFormation; they were then attached to HTTP endpoints by API Gateway. The functions run on demand when those endpoints are invoked. To fulfill the requirement that all calls to the token service must require basic authentication, we implemented an additional Lambda function that is associated with API Gateway as an authorizer function. The authorizer function’s purpose is to allow or deny a request to an HTTP endpoint based on a set of criteria; in our case, validating a basic authentication credential. The authorizer function ultimately returns an IAM policy to the API Gateway to allow or deny the request.
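As an illustration of the authorizer pattern described above, here is a minimal Python sketch. It is not our production code. It assumes a token-type authorizer, where API Gateway passes the Authorization header value as `authorizationToken`, and the environment variable names and fallback demo credentials are hypothetical. A real deployment would load credentials from a secret store.

```python
import base64
import binascii
import hmac
import os

# Hypothetical configuration names; the fallback values exist only for the demo.
EXPECTED_USER = os.environ.get("TOKEN_SERVICE_USER", "demo-user")
EXPECTED_PASSWORD = os.environ.get("TOKEN_SERVICE_PASSWORD", "demo-password")


def parse_basic_auth(header_value):
    """Return (user, password) from a 'Basic ...' header, or None if malformed."""
    scheme, _, encoded = (header_value or "").partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    user, separator, password = decoded.partition(":")
    return (user, password) if separator else None


def build_policy(principal_id, effect, resource):
    """Build the IAM policy document that API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }


def handler(event, context):
    """Allow the request only when the basic authentication credentials match."""
    credentials = parse_basic_auth(event.get("authorizationToken"))
    allowed = False
    if credentials is not None:
        user, password = credentials
        # compare_digest avoids leaking information through timing differences.
        user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
        password_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
        allowed = user_ok and password_ok
    principal = credentials[0] if credentials else "anonymous"
    return build_policy(principal, "Allow" if allowed else "Deny", event["methodArn"])
```

Returning an explicit Deny policy, rather than raising an error, keeps the decision visible in the authorizer response.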
For log management, our client had standardized using Splunk to aggregate and retain application logs. So, we created a solution to send the various token service log entries to Splunk without impacting performance. We achieved this by creating an additional AWS Lambda function called CloudWatch Logs to Splunk and subscribed this function to each AWS Lambda function’s CloudWatch Log Group. This function was then invoked whenever an entry was made to those CloudWatch Log Groups, which in turn sent that data to Splunk.
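The log forwarding function can be sketched as follows. This is a simplified illustration rather than the function we deployed: CloudWatch Logs delivers subscription data as a base64-encoded, gzip-compressed JSON document, and the sketch decodes it and reshapes each log entry for the Splunk HTTP Event Collector (HEC). The environment variable names `SPLUNK_HEC_URL` and `SPLUNK_HEC_TOKEN` and the choice of event fields are assumptions.

```python
import base64
import gzip
import json
import os
import urllib.request


def decode_subscription_event(event):
    """Decode the base64 + gzip payload that CloudWatch Logs sends to Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def to_splunk_events(payload):
    """Convert CloudWatch log events into Splunk HEC event dictionaries."""
    return [
        {
            "time": log_event["timestamp"] / 1000,  # milliseconds -> seconds
            "source": payload["logGroup"],
            "event": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]


def send_to_splunk(events, hec_url, hec_token):
    """POST the events to a Splunk HEC endpoint as one batched request."""
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    request = urllib.request.Request(
        hec_url,
        data=body,
        headers={"Authorization": f"Splunk {hec_token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def handler(event, context):
    """Lambda entry point, invoked by the CloudWatch Log Group subscription."""
    payload = decode_subscription_event(event)
    events = to_splunk_events(payload)
    # Hypothetical configuration names for the Splunk endpoint and token.
    return send_to_splunk(
        events, os.environ["SPLUNK_HEC_URL"], os.environ["SPLUNK_HEC_TOKEN"]
    )
```

Because this function runs from the log subscription and not in the request path, forwarding logs adds no latency to the token service API calls.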
This table gives a summary of the 5 Lambda functions:
| Function | Trigger | Purpose |
| | | Validate and Use tokens |
| | API Gateway Authorizer | Validates HTTP Basic Auth |
| CloudWatch Logs to Splunk | Subscription to CloudWatch Log Group | Sends logs to Splunk |
In high-concurrency performance tests, we confirmed that each Lambda function worked well with the minimum Lambda memory allocation, 128 MB. Most API calls responded in under 300 milliseconds, except for a handful of requests that took between 1 and 3 seconds due to Lambda cold starts. We estimate that the token service receives 100,000 calls per month.
This table describes our calculations for the expected monthly costs of the new Lambda service.
Note: this is for an AWS account that has been established for over a year, so the “always free” free tier applies, but not the first “12 months free” free tier.
| | Usage | AWS Monthly Price |
| Lambda requests | | Free < 1M requests |
| Lambda compute | 100k / (1024/128) = 12,500 GB-seconds | Free < 400,000 GB-seconds |
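The estimate can be checked with a short Python calculation. Two values here are assumptions rather than figures stated outright in this post: the one second of billed time per call (implied by the 12,500 GB-second figure, and conservative given that most calls finish in under 300 ms) and the API Gateway price of $3.50 per million requests (inferred from $0.35 for 100,000 calls).

```python
from decimal import Decimal

CALLS_PER_MONTH = 100_000
MEMORY_MB = 128
ASSUMED_SECONDS_PER_CALL = 1  # assumption implied by the 12,500 GB-second figure

# "Always free" Lambda tier limits quoted in the table above.
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000

# Assumption inferred from the $0.35 per month quoted for 100,000 calls.
API_GATEWAY_PRICE_PER_MILLION = Decimal("3.50")

gb_seconds = Decimal(CALLS_PER_MONTH * ASSUMED_SECONDS_PER_CALL * MEMORY_MB) / 1024
within_free_tier = CALLS_PER_MONTH < FREE_REQUESTS and gb_seconds < FREE_GB_SECONDS

api_gateway_monthly = API_GATEWAY_PRICE_PER_MILLION * CALLS_PER_MONTH / 1_000_000
yearly_total = api_gateway_monthly * 12  # Lambda itself stays within the free tier

print(f"Lambda compute: {gb_seconds} GB-seconds (free tier: {within_free_tier})")
print(f"API Gateway:    ${api_gateway_monthly} per month")
print(f"Yearly total:   ${yearly_total}")
```

Under these assumptions the only billable item is API Gateway, which is where the figure of approximately $4 per year comes from.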
DevOps Approach and Lessons Learned
With any project, you want everything to be defined as code, that is, Infrastructure as Code (IaC). Doing things by hand is slow, error-prone, inconsistent, not scalable and not repeatable. Defining everything in our project as code makes it self-documenting, easily repeatable, ready to be moved into an automated pipeline and easy to improve iteratively.
For this AWS Lambda project, we chose to use the AWS Serverless Application Model (SAM) Framework. AWS SAM is an extension of AWS CloudFormation that makes building serverless projects even more efficient. We then used AWS CodePipeline and AWS CodeBuild to create a Continuous Integration/Continuous Deployment (CI/CD) pipeline for our project.
By rewriting the token service to use serverless technologies, we were able to greatly reduce the cost of running the service. The benefits extended beyond cost as well, as we simplified the architecture while increasing scalability and availability. We recommend considering AWS Lambda for any potential use case, especially as AWS continues to invest in Lambda’s capabilities to accommodate larger workloads and broader use cases.
The COVID-19 pandemic forced enterprises into a full work-from-home mode practically overnight. There was no time to test, evaluate and decide. We were thrown into this “new normal” where anything that can be done remotely is being done remotely.
“We’re being forced into the world’s largest work-from-home experiment and, so far, it hasn’t been easy for a lot of organizations to implement,” says Saikat Chatterjee, Senior Director at Gartner.
This “new normal” is testing every enterprise’s agility to its core. Two questions are key for every enterprise:
- Can we continue working?
- How do we ensure continuity of operations during the pandemic?
The stakes are high as enterprises struggle to adapt and minimize any losses in money, reputation, clients, employees, etc.
With significant growth of interest in remote work, enterprises are scrambling to find ways to adapt their existing legacy systems to enable their workforce to work from home. The chatter around work from home is higher than at any previous time. One study points out that “work from home” was a subject of 423 transcribed conversations in public companies.
Technology plays the leading role in this digital transformation drama. According to Gartner’s research, 54% of HR leaders indicate poor technology and infrastructure as the main barrier to efficient work from home implementation.
The challenges of legacy technology in a work from home setting
The first challenge that users of legacy systems face is their inability to provide work from home capabilities at an enterprise-wide scale. Simon Migliano, the head of research at Top10VPN.com once commented about enterprises’ abilities to provide VPN access to their entire workforce: “We know of at least one company whose VPN capacity is 8,000 users…Now, they have over five times as many employees trying to connect, with predictably frustrating results.”
According to an estimate by Rob Smith, another Gartner analyst, around one-third of enterprises were without the proper equipment and knowledge to work from home. Another one-third had no plan at all and had never planned ahead to create any kind of telecommuting strategy…
Many of the enterprises that didn’t proactively update their systems and corporate culture for the growing work from home trend found themselves in a challenging situation. Their IT departments needed to quickly adapt existing, usually outdated, on-premise software solutions that were never built to provide telecommuting capabilities.
These on-premise enterprise content management solutions were built with the preconception that there will be people close by to monitor and manage infrastructure, processes, documents, etc. Such proprietary systems needed the physical presence of people, so they could work efficiently, reliably, and securely.
Security is another challenge for enterprises using outdated systems. This is especially the case in regulated industries where workers go through security routines to get to their desks. Now, with the work from home culture, all those physical security measures account for nothing as enterprises are forced to grant remote access to sensitive data and processes without a tried and tested security setup.
For all these problems, “government, legal, insurance, banking and healthcare are all great examples,” says Sumir Karayi, CEO and founder of 1E. “Many companies and organizations in these industries are working on legacy systems and are using software that is not patched. Not only does this mean remote work is a security concern, but it makes working a negative, unproductive experience for the employee.”
That leads us to another challenge: workforce productivity. The loss of productivity in enterprises that were not prepared for remote work is due to a lack of modern technology and a lack of employee training. An outdated enterprise infrastructure coupled with an untrained workforce for a work from home setting means enterprises found themselves in unknown territory with no battle plan.
Employees in unprepared enterprises are now juggling work and life challenges without any prior training. We’ll be reviewing these workforce challenges in the next blog post. For the time being, we’ll keep our focus on the technology side of the problem.
Proprietary and on-premise solutions should be replaced with modern Software-as-a-Service (SaaS) solutions.
Why is SaaS the solution for enterprise management?
Secure data access from anywhere for the entire workforce is one of the key reasons why enterprises should consider cloud solutions. Picking the right SaaS solution, coupled with proper training, can lead to productivity growth and employee satisfaction. Consequently, this will result in improved customer satisfaction. Enterprises that have planned and prepared for a work from home culture can, in fact, see solid growth during this COVID-19 period.
In general, SaaS companies and providers are offering a variety of enterprise solutions, such as:
- enterprise content management (ECM),
- business process management (BPM),
- customer relationship management (CRM),
- document management,
- case management,
- payroll and billing processing,
- human resource management
Today, according to 451 Research, the cloud is the new mainstream, with approximately 90% of organizations surveyed using some type of cloud service. In 2019, around 60% of all workloads were running on some form of hosted cloud service. This represents a huge rise from 45% just a year earlier. Amazon Web Services (AWS) is the leading cloud vendor, with a 32% market share and annual growth of 41%.
According to the IDG survey, 89% of companies use some kind of SaaS service. In another study, Cisco was cited as stating that by 2021, approximately 75% of all cloud workloads and compute instances will be SaaS. The leading reasons for SaaS adoption across enterprises are:
- ability to work outside of the office (42%),
- ease of disaster recovery (38%),
- flexibility (37%),
- offloading IT support (36%)
The rate of SaaS adoption by enterprises has only accelerated because of the pandemic. At what rate, we’re yet to see. We should see a faster adoption pace as more enterprises move off of proprietary solutions to SaaS solutions simply because the old systems don’t provide the flexibility required by the new normal enforced by the COVID-19 pandemic.
SaaS benefits beyond productivity
SaaS is quickly becoming a reliable choice for private enterprises and government organizations. Enterprise executives are considering SaaS during this pandemic more than ever as they seek to adjust their organizations to a work from home culture and ensure staff efficiency.
It’s not only efficiency, though. According to the Rackspace study, from 1,300 surveyed companies, 88% of the enterprises with cloud services have experienced cost savings. Additionally, 56% reported an increase in profits.
Let’s look at a few other obvious benefits enterprises and agencies can gain from using SaaS solutions over the more traditional on-premise approach:
- Reduction of the hardware cost
When using SaaS, there is no need to maintain the existing server infrastructure on-premise. Cutting out the cost of hardware purchases and maintenance is especially important for fast-growing enterprises. New hardware can be bulky and expensive, and can demand special treatment and HVAC improvements. SaaS-based cloud solutions overcome these issues. Furthermore, the cost of repairing and replacing hardware components is now passed on to the vendors to worry about.
- Reduction of electricity and real estate expenses
The reduction of on-premise hardware will directly and positively affect the organization’s electricity and real estate expenses. Enterprises that adopt a SaaS solution for their ECM needs will free up real estate in their buildings, cut electricity costs to power that IT equipment and reduce their HVAC bills as there would be no need to climatize large rooms full of heat-producing equipment.
By offloading hardware concerns to the SaaS supplier, enterprises and agencies no longer need a considerable workforce to maintain those systems. Routine maintenance, patching, and hardware and software upgrades are all part of the SaaS offer, and this can add up to a considerable amount that companies save every month.
- Reduce the time and cost for implementation and training
The deployment of a new cloud-based solution is considerably faster than a conventional system implementation. While an on-premise solution can take months of work to set up the infrastructure, install the software and do internal training, a SaaS solution can be set up in a few hours. Of course, there will be exceptions to this, since some enterprises may need customization of an existing SaaS solution. But all customization would be handled by a much larger IT team that can deploy a solution much faster. Most SaaS providers already have ready-made libraries of support materials like manuals or how-to videos, which makes training a self-service task for employees.
- Flexible cost of ownership
The typical SaaS pricing model is pay-as-you-go. Sometimes, there will be an initial setup fee if customizations are needed. After the setup, organizations will be looking at a relatively small fee each month, depending on the size of their workforce and features/resources they use. This pricing model is flexible, and it offers different ways for organizations to reduce overhead:
- First, the enterprise does not face a steep annual license fee. With SaaS, there is no such thing as a software license. You rent the service and pay a small fee per user, per month.
- Second, this pay-as-you-go software can be canceled at any time, for example if a pandemic happens. Alternatively, monthly fees can be reduced accordingly if and when the enterprise decides to reduce the workforce. The flip side, of course, is the ease of scaling up the workforce. The only expense here would be the few minutes needed to create a new user profile for the employee, and they’d be ready to start using the software and the vendor-produced training materials.
- Third, enterprise SaaS solutions usually come with a utilization fee. For example, this means that enterprises won’t face fixed fees for data storage or computing power if they don’t use these services that much… This helps enterprises avoid fixed expenses of additional hardware that they would only use in peak periods. Cloud infrastructure providers, like AWS, are enabling SaaS providers to get the maximum value from server farms, and this benefit is clearly transferable to the end user, which in this case is the enterprise or organization using a SaaS solution.
This is clearly a very short list of extra benefits besides employee productivity and employee satisfaction improvements. As stated in the statistics above, enterprises and organizations are aware of the benefits of cloud-based solutions. This is why more and more of them are replacing proprietary and on-premise solutions with SaaS solutions. The pandemic only expedited this migration to the cloud as the work from home mode is now the only viable way for some organizations to continue operations.
The final step: SaaS-based Technology implementation
It used to be the case that the most prepared organizations in terms of software and hardware were the most resistant to cloud-based solutions. Their upfront investment made the cloud idea a bit redundant. But, times have changed. As Rick Holland, CISO and vice president of strategy at Digital Shadows, stated for Threatpost: “One of the unintended consequences of COVID-19 will likely be increased zero trust adoption that further embraces cloud services, eliminates VPNs, and enables employees to work from anywhere.”
Continuity of Operations must be provided. Enterprises will offload everything to the cloud: software, data storage, operations, processing power, user management, etc. Before the pandemic, Gartner estimated that 50% of government organizations across the US were using cloud solutions. This number will rise even faster during the pandemic as SaaS solutions are already built with remote work in mind.
The crucial step in implementing SaaS-based solutions is finding a reliable technology partner. This partner would ideally be a company that already has the know-how and routines in place to perform critical data migration and workflow creation processes using reliable technology… A company that is recognized for its expertise, reliability, and ability to work under pressure.
Armedia LLC is a CMMI Level 3 company that provides a niche focus in Enterprise Content Management (ECM) technical and advisory services. We are a proven provider in delivering modern, flexible, robust, and scalable solutions to federal/state/local government, as well as commercial enterprises. With 18 years of deep ECM experience, our skilled team with industry certifications has helped deploy hundreds of ECM solutions. In just the past year, Armedia has actively supported over 20 initiatives in the government and commercial market.
For more info, don’t hesitate to contact us. We’d love to hear your thoughts on the topic, and we’d appreciate you sharing this blog post on your social media.