Dynamic Content Caching with AWS Lambda, S3 and AWS CloudFront
Serving content quickly is essential for any website or application that wants to offer a good customer experience. If your website or application is hosted in the AWS Cloud, it is easy to serve content with low latency regardless of where the application is accessed from. AWS offers the CloudFront service for caching content in edge locations close to each user’s geographic location.
In this example we show how images can be retrieved and transformed with an AWS Lambda function and cached in CloudFront edge locations, and how to invalidate these caches when the cached image is updated in the origin server. This approach can be applied to any other type of content.
1. What is Amazon CloudFront?
Amazon CloudFront is a Content Delivery Network (CDN) web service offered by AWS (Amazon Web Services) that makes content available through multiple edge locations worldwide. At the time of writing, Amazon CloudFront offers 149 edge locations and 11 Regional Edge caches.
2. What is AWS Lambda?
AWS Lambda is an event-driven, serverless computing platform that runs code without provisioning or managing servers. AWS Lambda executes code (functions) based on event triggers. For example, a simple use case would be: every time an image is uploaded to an S3 bucket, a Lambda function is triggered to transform and resize the image. AWS Lambda functions can also be executed through an API built with AWS API Gateway.
3. What is Amazon S3?
Amazon S3 (Simple Storage Service) is a low-cost, secure, durable, highly available, and horizontally scalable object storage service where data can be stored, accessed, and easily backed up. You can store any amount of information from anywhere.
4. Fetch Images from Origin Server and Store in S3
Let’s say you have a Media server that hosts all images required by your website, and assume this server is able to trigger a notification when an existing image is updated or a new image is added. Using AWS Lambda, these images can be retrieved, transformed, resized and stored in an S3 bucket.
Figure 1 shows an AWS Lambda function that retrieves raw images from the Media server, transforms them and posts them to an S3 bucket. It also listens to a Message Queue for image updates posted by the Media Server and refreshes the images in S3.
5. Caching Content with Amazon CloudFront
Amazon CloudFront speeds up the delivery of your content through Edge caches. When a user accesses your website or application and requests content, the request is routed to the nearest CloudFront edge location. Only the first user experiences the latency of retrieving the content from the origin; all subsequent users of the same content retrieve it quickly, because it is cached in the edge location.
The following happens when a user requests content:
- CloudFront checks its cache for the requested object. If the requested object is found in the cache, it is returned.
- If the requested object is not found in the CloudFront cache:
  - The request is forwarded to the configured Origin Server. The Origin Server could be an S3 bucket, a custom HTTP server or any other server.
  - CloudFront caches the object returned from the Origin Server in the nearest edge location and returns it to the user.
Objects in CloudFront are cached for a configured TTL (Time-To-Live). Once the TTL has expired, the object is no longer served from the cache, as shown in Figure 2.
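The request flow and TTL expiry described in this section can be sketched with a small in-memory model. This is only an illustration, not CloudFront's implementation: the class name EdgeCacheSketch, the single fixed TTL, and the origin modeled as a plain function are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of a single edge location's cache.
final class EdgeCacheSketch {
    private static final class Entry {
        final String body;
        final long expiresAtSeconds;

        Entry(String body, long expiresAtSeconds) {
            this.body = body;
            this.expiresAtSeconds = expiresAtSeconds;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final long ttlSeconds;
    private final Function<String, String> origin; // stands in for the Origin Server
    private int originFetches = 0;

    EdgeCacheSketch(long ttlSeconds, Function<String, String> origin) {
        this.ttlSeconds = ttlSeconds;
        this.origin = origin;
    }

    // Returns the cached object if present and not expired; otherwise fetches it
    // from the origin, caches it for ttlSeconds, and returns it.
    String get(String path, long nowSeconds) {
        Entry entry = cache.get(path);
        if (entry != null && nowSeconds < entry.expiresAtSeconds) {
            return entry.body; // cache hit
        }
        String body = origin.apply(path); // cache miss or expired entry
        originFetches++;
        cache.put(path, new Entry(body, nowSeconds + ttlSeconds));
        return body;
    }

    int originFetches() {
        return originFetches;
    }
}
```

With a TTL of 60 seconds, two requests for the same path within that window reach the origin only once; a request at or after the expiry time triggers a second origin fetch.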
6. How to Configure CloudFront on AWS Console
To configure CloudFront, a CloudFront distribution must first be created. Choose a Web distribution for content distribution or an RTMP distribution for streaming media files. For this case, let’s select Web distribution.
6.1 Create CloudFront Distribution
Login to AWS Console -> Select CloudFront Service -> Create Distribution -> Select Web distribution -> Get Started
To use S3 as the Origin Server, select the already created S3 bucket for Origin Domain Name, and for Origin Path enter the directory path where the objects are stored. To use a custom HTTP server as the Origin Server, enter the DNS name of the custom origin server for Origin Domain Name, and for Origin Path enter the context path, as shown in Figure 3.
6.2 Cache Behavior Settings
If you want to allow only HTTPS access, enable Redirect HTTP to HTTPS. In Object Caching, if “Use Origin Cache Headers” is selected, CloudFront uses the Cache-Control max-age directive from the Origin Server response, unless s-maxage is also returned, in which case s-maxage is used.
If you select Customize (for Object Caching), you may configure Minimum TTL, Maximum TTL and Default TTL as shown in Figure 4.
- Minimum TTL is the minimum amount of time (in seconds) the object stays in the cache before CloudFront forwards the request to the Origin Server. The object is cached for at least this long, even if the Cache-Control header from the Origin Server has a lower value.
- Maximum TTL is the maximum amount of time (in seconds) the object stays in the cache before CloudFront forwards the request to the Origin Server. Maximum TTL is effective only when the Origin Server returns Cache-Control headers. The object is not cached longer than this, even if the Cache-Control header from the Origin Server has a higher value.
- Default TTL is effective only when the Origin Server does not return any Cache-Control headers.
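The interaction of the three TTL settings can be sketched as a small function. This is a simplified model of the three rules above only (it ignores s-maxage, Expires, no-cache and similar directives), and the names TtlSketch and effectiveTtl are hypothetical, chosen for this example.

```java
// Hypothetical helper modeling the Minimum / Maximum / Default TTL rules described above.
final class TtlSketch {
    // originMaxAgeSeconds is null when the origin returned no Cache-Control max-age.
    // All values are in seconds.
    static long effectiveTtl(Long originMaxAgeSeconds, long minTtl, long defaultTtl, long maxTtl) {
        if (originMaxAgeSeconds == null) {
            // No caching header from the origin: Default TTL applies.
            return defaultTtl;
        }
        // Header present: clamp the origin's value between Minimum TTL and Maximum TTL.
        return Math.max(minTtl, Math.min(originMaxAgeSeconds, maxTtl));
    }
}
```

For example, with a Minimum TTL of 60, a Default TTL of 86400 and a Maximum TTL of 31536000, an origin max-age of 30 is raised to 60, a max-age of 3600 is used unchanged, and a response without the header is cached for 86400 seconds.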
6.3 Query Parameter Whitelisting
When you want to cache images or content by only a few query parameters rather than by all of them, you can configure CloudFront to cache by selected query parameters through the Query String Whitelist, as shown in Figure 5, so that only those query parameters are included in the cache key. CloudFront still forwards all the query parameters to the Origin Server.
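The effect of a query string whitelist on the cache key can be illustrated as follows. This is a sketch under stated assumptions: the class CacheKeySketch and its method are hypothetical, and sorting the retained parameters is a choice made here to keep the key stable, not documented CloudFront behavior.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of building a cache key from whitelisted query parameters only.
final class CacheKeySketch {
    static String cacheKey(String path, String query, Set<String> whitelist) {
        // TreeMap keeps the retained parameters sorted by name (an assumption of this sketch).
        Map<String, String> kept = new TreeMap<>();
        if (query != null && !query.isEmpty()) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                String name = eq >= 0 ? pair.substring(0, eq) : pair;
                String value = eq >= 0 ? pair.substring(eq + 1) : "";
                if (whitelist.contains(name)) {
                    kept.put(name, value);
                }
            }
        }
        StringBuilder key = new StringBuilder(path);
        char separator = '?';
        for (Map.Entry<String, String> e : kept.entrySet()) {
            key.append(separator).append(e.getKey()).append('=').append(e.getValue());
            separator = '&';
        }
        return key.toString();
    }
}
```

With a whitelist of width and height, two requests for the same image that differ only in category or type map to the same cache key, so they share one cached object.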
Create an Alias record in Route53 that maps your CNAME to the CloudFront distribution URL, so that requests to the alias are routed to the CloudFront distribution and then to the origin.
7. Invalidating Cached Objects in CloudFront
Once objects are cached in CloudFront, they stay in the cache until the cache TTL expires. If an object is updated in the origin server, CloudFront has no way of knowing about it and continues to serve the stale object from its cache. To avoid serving stale objects, CloudFront allows cached objects to be invalidated in several ways.
According to the AWS documentation, you can have invalidation requests for up to 3,000 files per distribution in progress at one time, and each invalidation request can include up to 3,000 URLs. Note that after an invalidation is initiated, CloudFront takes a few minutes to remove the objects from all the edge locations, i.e. the result of the invalidation is not immediate. The time an invalidation takes depends on the number of object URLs included in the invalidation request.
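When many objects change at once, the list of paths may need to be split so that no single invalidation request exceeds the per-request limit mentioned above. The following is a minimal sketch; the class InvalidationBatchingSketch and the constant name are hypothetical. Splitting alone does not address the limit on files in progress per distribution, so the batches would still have to be submitted with that limit in mind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that splits invalidation paths into request-sized batches.
final class InvalidationBatchingSketch {
    // Per-request limit as quoted in the text above.
    static final int MAX_PATHS_PER_REQUEST = 3000;

    static List<List<String>> split(List<String> paths, int maxPerRequest) {
        if (maxPerRequest <= 0) {
            throw new IllegalArgumentException("maxPerRequest must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < paths.size(); start += maxPerRequest) {
            int end = Math.min(start + maxPerRequest, paths.size());
            // Copy the sublist so each batch is independent of the input list.
            batches.add(new ArrayList<>(paths.subList(start, end)));
        }
        return batches;
    }
}
```

Each resulting batch could then be sent as its own invalidation request.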
7.1 Invalidating Objects from AWS Console
Follow these steps to invalidate cached objects from the AWS Console:
- Sign in to the AWS Management Console and open the CloudFront console.
- Choose the distribution for which you want to invalidate files.
- Choose Distribution Settings and then choose the Invalidations tab.
- Choose Create Invalidation and enter the invalidation path, e.g. “/originserver/image/IMG54330080*”. This wildcard path invalidates all the images cached for the imageId “IMG54330080”.
7.2 Invalidating Objects using the CLI (Command Line Interface)
The following command creates an invalidation for the CloudFront distribution with the given distribution ID. Command syntax:
aws cloudfront create-invalidation --distribution-id <distribution-id> --paths <path> [<path> ...]
Example:
aws cloudfront create-invalidation --distribution-id K14EK9G5DZUEWO --paths "/originserver/image/IMG54330080*"
{
  "Location": "https://cloudfront.amazonaws.com/2019-01-25/distribution/K14EK9G5DZUEWO/invalidation/IUNZX941WYQR8",
  "Invalidation": {
    "Id": "IUNZX941WYQR8",
    "Status": "InProgress",
    "CreateTime": "2019-01-17T17:07:57.636Z",
    "InvalidationBatch": {
      "Paths": {
        "Quantity": 1,
        "Items": [
          "/originserver/image/IMG54330080*"
        ]
      },
      "CallerReference": "cli-324234242-463845"
    }
  }
}
7.3 Invalidating Objects through AWS SDK
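The following Java code snippet invalidates objects through the AWS SDK for Java.
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.cloudfront.AmazonCloudFrontClient;
import com.amazonaws.services.cloudfront.model.*;

AWSCredentials awsCredentials = new DefaultAWSCredentialsProviderChain().getCredentials();
AmazonCloudFrontClient client = new AmazonCloudFrontClient(awsCredentials);
// Quantity must equal the number of paths listed in withItems.
Paths invalidationPaths = new Paths().withItems("/originserver/image/IMG54330080*", "/image/path/imageA.jpg").withQuantity(2);
// The second argument is the caller reference, a value that identifies this request.
// A timestamp is appended so that each new request gets a unique reference.
InvalidationBatch invalidationBatch = new InvalidationBatch(invalidationPaths, "Asset_Image_Cache_" + System.currentTimeMillis());
CreateInvalidationRequest invalidation = new CreateInvalidationRequest("distributionID", invalidationBatch);
CreateInvalidationResult ret = client.createInvalidation(invalidation);
System.out.println("Invalidation result: " + ret);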
If you configured CloudFront to forward query parameters to your Origin Server, you must include the query parameters when invalidating the files, because the cache key was created with the query parameters included at the time of caching. For example:
http://cdn.originserver.com/image/IMG54330080?category=headshot&type=high-resolution&width=220&height=330&aspectRatio=3.5
Alternatively, you can use the wildcard * in the invalidation URL. Note that in this case all cache entries whose cache keys match the URL (with wildcard) are invalidated. For example:
http://cdn.originserver.com/image/IMG54330080*
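The difference between an exact invalidation path and a wildcard path can be sketched as a simple matching rule. This is a simplified model that handles only a trailing * wildcard; the class InvalidationPathSketch and its method are hypothetical names, and the matching is done against the path plus query string as it would appear in the cache key.

```java
// Hypothetical model of how an invalidation path selects cached entries.
final class InvalidationPathSketch {
    static boolean matches(String invalidationPath, String cachedPath) {
        if (invalidationPath.endsWith("*")) {
            // Trailing wildcard: every cached path with this prefix is selected.
            String prefix = invalidationPath.substring(0, invalidationPath.length() - 1);
            return cachedPath.startsWith(prefix);
        }
        // No wildcard: the path, including any query string, must match exactly.
        return invalidationPath.equals(cachedPath);
    }
}
```

Under this model, /image/IMG54330080* selects every cached variant of the image regardless of its query parameters, while /image/IMG54330080 without the wildcard does not select an entry that was cached with a query string.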
An AWS Lambda function can also be used to invalidate CloudFront objects when a new or updated object is uploaded to S3.
Figure 6 shows that when an updated object is uploaded to S3, it triggers a Lambda function, which invalidates the object in the CloudFront cache.
8. Summary
Amazon CloudFront is a powerful content caching service that quickly serves content to the users of your website or application, regardless of where the website or application is hosted and where the users are accessing it from. Using this service along with other AWS services such as Lambda and S3 helps create a dynamic cache, so that near-real-time data can be served to users quickly. It is also possible to invalidate the CloudFront object cache if required.