Dynamic Content Caching with AWS Lambda, S3 and AWS CloudFront

Serving content quickly is essential for any website or application that wants to offer a good customer experience. If your website or application is hosted in the AWS Cloud, it is easy to serve content with low latency regardless of where it is accessed from. AWS offers the CloudFront service for caching content in edge locations close to each user’s geographic location.

In this example we show how images can be retrieved and transformed with an AWS Lambda function and cached in CloudFront edge locations, and how to invalidate these cached copies when an image is updated on the origin server. The same approach can be applied to any other type of content.

1. What is Amazon CloudFront?

Amazon CloudFront is a Content Delivery Network (CDN) web service offered by AWS (Amazon Web Services) that makes content available through multiple edge locations worldwide. At the time of writing, Amazon CloudFront offers 149 edge locations and 11 Regional Edge Caches.

2. What is AWS Lambda?

AWS Lambda is an event-driven, serverless computing platform that runs code without provisioning or managing servers. AWS Lambda executes code (functions) in response to event triggers. A simple use case: every time an image is uploaded to an S3 bucket, a Lambda function can be triggered to transform and resize the image. AWS Lambda functions can also be executed through an API built with AWS API Gateway.

3. What is Amazon S3?

Amazon S3 (Simple Storage Service) is a low-cost, secure, durable, highly available and horizontally scalable object storage service where data can be stored, accessed and easily backed up. You can store any amount of data and access it from anywhere.

4. Fetch Images from Origin Server and Store in S3

Let’s say you have a Media server that hosts all the images required by your website, and assume this server is able to send a notification when an existing image is updated or a new image is added. Using AWS Lambda, these images can be retrieved, transformed, resized and stored in an S3 bucket.

Dynamic Content Caching - Fetch Image from Origin Server
Figure 1 : Fetch Image from Origin Server using Lambda 

Figure 1 shows an AWS Lambda function that retrieves raw images from the Media server, transforms them and posts them to an S3 bucket. It also listens to a message queue for image updates posted by the Media server and refreshes the corresponding images in S3.
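The resize step inside such a Lambda function could look roughly like the following. This is a minimal sketch using only the JDK imaging classes; the class name ImageResizer and the method resizeToWidth are hypothetical, and fetching the raw image from the Media server and uploading the result to S3 are deliberately left out.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class ImageResizer {

    // Scales the source image to the given width, keeping its aspect ratio.
    public static BufferedImage resizeToWidth(BufferedImage source, int targetWidth) {
        float scale = targetWidth / (float) source.getWidth();
        int targetHeight = Math.max(1, Math.round(source.getHeight() * scale));

        BufferedImage resized = new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = resized.createGraphics();
        try {
            // Bilinear interpolation gives smoother results than the default.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }
        return resized;
    }

    public static void main(String[] args) {
        // Stand-in for a raw image fetched from the Media server.
        BufferedImage raw = new BufferedImage(400, 200, BufferedImage.TYPE_INT_RGB);
        BufferedImage thumb = resizeToWidth(raw, 100);
        System.out.println(thumb.getWidth() + "x" + thumb.getHeight()); // prints 100x50
    }
}
```

In a real function the resized image would then be encoded (for example with ImageIO) and written to the S3 bucket.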

5. Caching Content with Amazon CloudFront

Amazon CloudFront speeds up the delivery of your content through edge caches. When a user accesses your website or application and requests content, the request is routed to the nearest CloudFront edge location. Only the first user of a given object experiences the latency of retrieving it from the origin; all subsequent users of the same content get it quickly because it is already cached in the edge location.

The following happens when a user requests content:

  • CloudFront checks its cache for the requested object. If the requested object is found in the cache, it is returned.
  • If the requested object is not found in the CloudFront cache:

    • The request is forwarded to the configured Origin Server. The Origin Server could be an S3 bucket, a custom HTTP server or any other server.
    • CloudFront caches the object returned from the Origin Server in the nearest edge location and returns it to the user.

Objects in CloudFront are cached for a configured TTL (Time-To-Live). Once the TTL has expired, CloudFront no longer serves the object straight from the cache and goes back to the Origin Server for it, as shown in Figure 2.

Dynamic Content Caching - Cache Image Content
Figure 2 : Cache Image Content with CloudFront
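The lookup flow described above can be illustrated with a small in-memory model. This is only a conceptual sketch, not how CloudFront is implemented: the class EdgeCacheSketch, the origin function and the injected clock are all assumptions made for illustration, and an expired object is simply fetched again from the origin.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

public class EdgeCacheSketch {

    // A cached object together with the time at which it expires.
    private static final class Entry {
        final String body;
        final long expiresAtMillis;

        Entry(String body, long expiresAtMillis) {
            this.body = body;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, String> origin; // stands in for the Origin Server
    private final long ttlMillis;
    private final LongSupplier clock;
    private int originFetches = 0;

    public EdgeCacheSketch(Function<String, String> origin, long ttlMillis, LongSupplier clock) {
        this.origin = origin;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    public String get(String path) {
        long now = clock.getAsLong();
        Entry entry = cache.get(path);
        if (entry != null && now < entry.expiresAtMillis) {
            return entry.body; // cache hit: served from the edge
        }
        // Cache miss or expired TTL: go to the origin and cache the response.
        originFetches++;
        String body = origin.apply(path);
        cache.put(path, new Entry(body, now + ttlMillis));
        return body;
    }

    public int originFetches() {
        return originFetches;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        EdgeCacheSketch edge = new EdgeCacheSketch(p -> "content of " + p, 1000L, () -> now[0]);
        edge.get("/image/IMG54330080"); // first user: fetched from the origin
        edge.get("/image/IMG54330080"); // second user: served from the cache
        now[0] = 1500L;                 // the TTL of 1000 ms has passed
        edge.get("/image/IMG54330080"); // fetched from the origin again
        System.out.println(edge.originFetches()); // prints 2
    }
}
```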

6. How to Configure CloudFront on AWS Console

To configure CloudFront, you first have to create a CloudFront distribution. Choose a Web distribution for content distribution and an RTMP distribution for streaming media files. For this case, let’s select Web distribution.

6.1 Create CloudFront Distribution

Login to AWS Console -> Select CloudFront Service -> Create Distribution -> Select Web distribution -> Get Started

To use S3 as the Origin Server, select the S3 bucket you have already created as the Origin Domain Name, and for Origin Path enter the directory path where the objects are stored. To use a custom HTTP server as the Origin Server, enter the DNS name of the custom origin server as the Origin Domain Name and the context path as the Origin Path, as shown in Figure 3.

Dynamic Content Caching -  Create CloudFront Distribution
Figure 3 : Create CloudFront Distribution from AWS Console

6.2 Cache Behavior Settings

If you want to allow only HTTPS access, enable Redirect HTTP to HTTPS. In Object Caching, if “Use Origin Cache Headers” is selected, CloudFront uses the Cache-Control header (max-age) from the Origin Server response unless “s-maxage” is also returned, in which case s-maxage is used.

If you select Customize (for Object Caching), you may configure Minimum TTL, Maximum TTL and Default TTL as shown in Figure 4.

  • Minimum TTL is the minimum amount of time (in seconds) the object stays in the cache before CloudFront forwards the request to the Origin Server. The object is cached for at least this long, even if the Cache-Control header from the Origin Server has a lower value.
  • Maximum TTL is the maximum amount of time (in seconds) the object stays in the cache before CloudFront forwards the request to the Origin Server. Maximum TTL is effective only when the Origin Server returns Cache-Control headers. The object is not cached longer than this, even if the Cache-Control header from the Origin Server has a higher value.
  • Default TTL is effective only when the Origin Server does not return any Cache-Control headers.

Dynamic Content Caching - Cache Behavior Settings
Figure 4 :  Cache Behavior Settings on CloudFront Distribution
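The way these three settings interact with the origin headers can be summarised in a few lines of code. This is a simplified sketch of the rules listed above, not CloudFront’s actual implementation: the class and method names are hypothetical, and directives such as no-cache, no-store and private, as well as the Expires header, are ignored.

```java
public class TtlRules {

    /**
     * Returns how long (in seconds) an object would stay cached under the rules above.
     *
     * @param maxAge  Cache-Control max-age from the origin, or null if absent
     * @param sMaxAge Cache-Control s-maxage from the origin, or null if absent
     */
    public static long effectiveTtlSeconds(Long maxAge, Long sMaxAge,
                                           long minTtl, long defaultTtl, long maxTtl) {
        // s-maxage takes precedence over max-age when both are present.
        Long originTtl = (sMaxAge != null) ? sMaxAge : maxAge;
        if (originTtl == null) {
            // No caching headers from the origin: Default TTL applies.
            return defaultTtl;
        }
        // Otherwise the origin value is kept within [Minimum TTL, Maximum TTL].
        return Math.max(minTtl, Math.min(originTtl, maxTtl));
    }

    public static void main(String[] args) {
        long min = 60, def = 3600, max = 86400;
        System.out.println(effectiveTtlSeconds(30L, null, min, def, max));     // prints 60
        System.out.println(effectiveTtlSeconds(600L, 120L, min, def, max));    // prints 120
        System.out.println(effectiveTtlSeconds(999999L, null, min, def, max)); // prints 86400
        System.out.println(effectiveTtlSeconds(null, null, min, def, max));    // prints 3600
    }
}
```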

6.3 Query Parameter Whitelisting

When you don’t want to cache images or other content based on all query parameters but only on a few of them, you can configure CloudFront to cache by selected query parameters through the Query String Whitelist, as shown in Figure 5, so that only those query parameters are included in the cache key. CloudFront still forwards all the query parameters to the Origin Server.

Dynamic Content Caching - Whitelist Query Parameters
Figure 5 : Whitelist Query Parameters in CloudFront
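To see why whitelisting improves the cache hit ratio, it helps to picture how a cache key could be derived from a request. The sketch below is purely conceptual: the class CacheKeySketch and the key format are assumptions for illustration and do not reflect CloudFront’s internal representation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CacheKeySketch {

    // Builds a key from the path plus only those query parameters that are whitelisted.
    public static String cacheKey(String path, String queryString, Set<String> whitelist) {
        List<String> kept = new ArrayList<>();
        if (queryString != null && !queryString.isEmpty()) {
            for (String pair : queryString.split("&")) {
                String name = pair.split("=", 2)[0];
                if (whitelist.contains(name)) {
                    kept.add(pair);
                }
            }
        }
        return kept.isEmpty() ? path : path + "?" + String.join("&", kept);
    }

    public static void main(String[] args) {
        Set<String> whitelist = new HashSet<>(Arrays.asList("width", "height"));
        // Both requests differ only in a parameter that is not whitelisted,
        // so they map to the same key and share one cached object.
        String a = cacheKey("/image/IMG54330080", "category=headshot&width=220&height=330", whitelist);
        String b = cacheKey("/image/IMG54330080", "category=banner&width=220&height=330", whitelist);
        System.out.println(a);           // prints /image/IMG54330080?width=220&height=330
        System.out.println(a.equals(b)); // prints true
    }
}
```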

Finally, create an Alias record (or a CNAME record) in Route 53 that maps your domain name to the CloudFront distribution URL, so that requests to your domain are routed to the CloudFront distribution and then on to the origin.

7. Invalidating Cached Objects in CloudFront

Once objects are cached in CloudFront, they stay in the cache until the cache TTL expires. If an object is updated on the origin server, CloudFront has no way of knowing about it and will continue to serve the stale object from its cache. To avoid serving stale objects, CloudFront lets you invalidate cached objects in several ways.

According to the AWS documentation, you can have invalidation requests for up to 3,000 files per distribution in progress at one time, and a single invalidation request can include up to 3,000 URLs. Note that after you initiate an invalidation, CloudFront takes a few minutes to remove the objects from all the edge locations, so the result is not visible immediately. The time an invalidation takes depends on the number of object URLs included in the request.

7.1 Invalidating Objects from AWS Console

Follow these steps to invalidate cached objects from the AWS Console:

  • Sign in to the AWS Management Console and open the CloudFront console.
  • Choose the distribution for which you want to invalidate files.
  • Choose Distribution Settings and then choose the Invalidations tab.
  • Choose Create Invalidation and enter the invalidation path, e.g. “/originserver/image/IMG54330080*”. This wildcard path removes all the images cached for the imageId “IMG54330080”.

7.2 Invalidating Objects using the CLI (Command Line Interface)

The following command creates an invalidation for the CloudFront distribution with the given distribution ID. Command syntax:

aws cloudfront create-invalidation --distribution-id <distribution-id> --paths <path> [<path> ...]

Example:

aws cloudfront create-invalidation --distribution-id K14EK9G5DZUEWO --paths "/originserver/image/IMG54330080*"

{
     "Location": "https://cloudfront.amazonaws.com/2019-01-25/distribution/K14EK9G5DZUEWO/invalidation/IUNZX941WYQR8",
     "Invalidation": {
         "Id": "IUNZX941WYQR8",
         "Status": "InProgress",
         "CreateTime": "2019-01-17T17:07:57.636Z",
         "InvalidationBatch": {
             "Paths": {
                 "Quantity": 1,
                 "Items": [
                     "/originserver/image/IMG54330080*"
                 ]
             },
             "CallerReference": "cli-324234242-463845"
         }
     }
 }

7.3 Invalidating Objects through AWS SDK

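The following Java code snippet invalidates objects through the AWS SDK for Java. The client builder picks up credentials from the default credentials provider chain.

import com.amazonaws.services.cloudfront.AmazonCloudFront;
import com.amazonaws.services.cloudfront.AmazonCloudFrontClientBuilder;
import com.amazonaws.services.cloudfront.model.*;

// CloudFront is a global service; its API endpoint lives in us-east-1.
AmazonCloudFront client = AmazonCloudFrontClientBuilder.standard().withRegion("us-east-1").build();

// Quantity must match the number of paths being invalidated.
Paths invalidationPaths = new Paths().withItems("/originserver/image/IMG54330080*", "/image/path/imageA.jpg").withQuantity(2);

// The caller reference must be unique for every invalidation request.
InvalidationBatch invalidationBatch = new InvalidationBatch(invalidationPaths, "Asset_Image_Cache-" + System.currentTimeMillis());

// Replace "distributionID" with the ID of your CloudFront distribution.
CreateInvalidationRequest invalidation = new CreateInvalidationRequest("distributionID", invalidationBatch);

CreateInvalidationResult result = client.createInvalidation(invalidation);

System.out.println("Invalidation result: " + result);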

If you configured CloudFront to forward query parameters to your Origin Server, you must include the query parameters when invalidating the files, because the cache key was created with those query parameters included. For example:

http://cdn.originserver.com/image/IMG54330080?category=headshot&type=high-resolution&width=220&height=330&aspectRatio=3.5 

Alternatively, you can use the wildcard * in the invalidation URL. Note that in this case all cache entries whose cache keys match the URL (with wildcard) will be invalidated. For example:

http://cdn.originserver.com/image/IMG54330080*
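The effect of a trailing wildcard can be pictured as a simple prefix match against the cached keys. The following sketch is a conceptual illustration only, with a hypothetical class name, and it handles just the case where * is the last character of the invalidation path.

```java
public class InvalidationPathMatcher {

    // Returns true if the cached key would be covered by the invalidation path.
    public static boolean matches(String invalidationPath, String cachedKey) {
        if (invalidationPath.endsWith("*")) {
            // A trailing wildcard covers every key that starts with the prefix.
            String prefix = invalidationPath.substring(0, invalidationPath.length() - 1);
            return cachedKey.startsWith(prefix);
        }
        // Without a wildcard the key, including any query string, must match exactly.
        return invalidationPath.equals(cachedKey);
    }

    public static void main(String[] args) {
        String cached = "/image/IMG54330080?category=headshot&width=220&height=330";
        System.out.println(matches("/image/IMG54330080*", cached)); // prints true
        System.out.println(matches("/image/IMG54330080", cached));  // prints false
    }
}
```

The second call shows why an invalidation without the query parameters (and without a wildcard) leaves the cached variant in place.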

An AWS Lambda function can also be used to invalidate CloudFront objects when a new or updated object is uploaded to S3.

Figure 6 shows how uploading an updated object to S3 triggers a Lambda function, which invalidates the object in the CloudFront cache.

Dynamic Content Caching - Invalidate CloudFront objects
Figure 6 : Invalidate CloudFront objects through Lambda function

8. Summary

Amazon CloudFront is a powerful content caching service for serving content quickly to the users of your website or application, regardless of where the website or application is hosted and where the users access it from. Using this service along with other AWS services such as Lambda and S3 helps create a dynamic cache, so near-real-time data can be served to users quickly. It’s also possible to invalidate the CloudFront object cache if required.

Pushpa Sekhara R Matli

Pushpa Sekhara R Matli is a hands-on Enterprise Architect with more than 15 years of IT experience in analyzing, planning, architecting, and implementing enterprise applications using a comprehensive approach. He has successfully architected, executed and implemented various highly scalable and highly available solutions for large complex projects in multiple domains such as Video, Finance and Logistics for Fortune 500 companies.