
19/02/2021 PDF Generation With AWS Lambda. Serverless Computing is a great way of… | by Anushka Rustagi | 1mg Technology | Medium

Anushka Rustagi
Oct 5, 2020 · 9 min read

PDF Generation With AWS Lambda

Serverless computing is a great way of utilising cloud resources. It enables services to scale at reduced cost, without the overhead of provisioning and managing servers. We have deployed one such service, a PDF generation service, on AWS Lambda, and this article discusses it. PDFs are generated in many domains to create reports, bills, invoices and so on, but the process can consume significant CPU and memory. To make it more efficient and scalable, we used a serverless approach for generating PDFs.

The serverless pattern encourages developing well-defined units of business logic without deciding up front how they are deployed or scaled. It frees the user from deployment concerns, billing is based on the execution of your programme, and capacity scales automatically with traffic.

Serverless comes in two main types. BaaS (Backend as a Service) incorporates third-party, cloud-hosted applications and services, used by applications such as single-page web apps or mobile apps. FaaS (Function as a Service) programmes run in stateless compute containers; they are triggered by events and last for only one invocation. AWS Lambda is one of the most popular FaaS platforms at present.

AWS Lambda follows a serverless architecture, also known as serverless computing or Function as a Service (FaaS).

To explain why we chose a serverless approach with AWS Lambda for PDF generation, we will take the example of generating a shipping invoice on each order delivery.

CPU Utilization: On each order delivery, the shipping invoice PDF is sent to the user. On average, the order service receives about 20-30K PDF generation requests per day. Generating each PDF on the backend service is a CPU-expensive task, which leads to high CPU consumption and high request latency. Under high load, PDF generation also caused service restarts. To decouple PDF generation from the backend server, we moved it to AWS Lambda. The PDF file is stored in an AWS S3 bucket and the file link is then shared with the user over email. In this way we distributed the load by shifting the process to AWS Lambda.

Fig 1.1 EC2 Backend Service CPU Utilisation Metric

After the PDF generation service was released on 28th April 2020, the maximum CPU utilisation of the backend server dropped by about 40%. This followed decoupling the PDF generation logic from the backend and generating PDFs through the PDF generation service on AWS Lambda.

Cost Effective: If we only wanted to distribute the load, this code could have been deployed on an EC2 instance instead of AWS Lambda, but here is why we didn't go for it.


In terms of scalability and cost, Lambda provided an efficient solution. AWS Lambda scales automatically, so the effort of scaling and managing EC2 instances is saved. Even the smallest EC2 instance, t2.nano, would be costlier for two reasons. First, we would need an ALB (Application Load Balancer) to balance load between instances, which adds to the cost. Second, traffic is not always evenly distributed, so we would need more EC2 instances than planned, and each instance holds on to the memory allocated to it. Lambda handles load balancing internally, so no extra cost is added while scaling. The following figures show the AWS Lambda pricing calculation, based on inputs observed in production metrics and the Lambda settings.

Fig 1.2 AWS Lambda Pricing Inputs

Fig 1.3 AWS Lambda Pricing Calculation


With ~0.6 million requests per month, an invocation duration of 2.5 sec each and 3008 MB of memory allocated, the final calculated amount is 76.50 USD.
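The arithmetic behind such an estimate can be sketched as follows. This is a rough illustration, not the calculator used for the figures above: the two rates are the commonly published on-demand prices for x86 Lambda and are an assumption here, the free tier is ignored, and `estimateLambdaCost` is a hypothetical helper name.

```javascript
// Assumed public on-demand rates (not taken from the article's figures).
const PRICE_PER_GB_SECOND = 0.0000166667; // USD per GB-second of compute
const PRICE_PER_MILLION_REQUESTS = 0.20;  // USD per one million requests

// Hypothetical helper: estimate the monthly cost, ignoring the free tier.
function estimateLambdaCost({ requests, durationSeconds, memoryMb }) {
  // Compute is billed as (seconds run) x (memory in GB).
  const gbSeconds = requests * durationSeconds * (memoryMb / 1024);
  const computeCost = gbSeconds * PRICE_PER_GB_SECOND;
  const requestCost = (requests / 1e6) * PRICE_PER_MILLION_REQUESTS;
  return { gbSeconds, computeCost, requestCost, total: computeCost + requestCost };
}

const estimate = estimateLambdaCost({
  requests: 600000,     // ~0.6 million requests per month
  durationSeconds: 2.5, // average invocation duration
  memoryMb: 3008,       // memory allocated to the function
});
console.log(estimate.total.toFixed(2));
```

With exactly 600,000 requests this comes to roughly 73.56 USD. The 76.50 USD quoted above is based on the exact inputs in Fig 1.2, which may differ slightly from the rounded values used here.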

Serverless: PDF generation usually runs as a background task in backend services. Under high load, a large number of background tasks accumulate and cause CPU spikes. This can be managed by queuing the tasks and running them at intervals to prevent the spikes. But overall CPU consumption is still high, and if a heavy PDF generation task is picked up, CPU utilization increases further. Since PDF generation can be decoupled from the backend server and used as a common utility, it can be deployed as a FaaS service. To achieve this we integrated it with API Gateway and S3 to serve requests from multiple services. Currently, multiple services at 1mg use the PDF generation service.

Fig 1.4 PDF Generation Service Flow Diagram

Generating a PDF follows these steps: take an HTML template as input, render it to produce an HTML string, and pass that string to the PDF library, which converts it into a PDF file. The code uses Jinja-style templates (via nunjucks) to render the HTML before it is converted to PDF. To keep the PDF generation service generic, it is templatized: each backend service shares the S3 location of its template and the dynamic data with the Lambda function through the API.

We have used the html-pdf library to convert HTML to PDF because:


1. We pass the HTML content to the library and get the PDF back as an in-memory buffer, instead of writing PDFs directly to disk as with libraries such as PDFKit, which increases I/O operations.

2. We used templates for PDF generation to make the process generic. Services pass their template's S3 location, which is used for generating PDFs. To create the PDF we render Jinja-style templates into an HTML string, which the html-pdf library accepts as input.

The library internally uses PhantomJS, a headless browser. This executable must be installed on your system. Follow the steps at https://phantomjs.org/download.html to install PhantomJS.

Prerequisites:

1. Node.js 12.x: AWS Lambda supports the Node.js 12 runtime

2. Install Serverless: module to deploy the service on AWS Lambda

npm i -g serverless

Layer:

A layer is a ZIP archive that lets a Lambda function pull in additional code. You can move runtime dependencies out of your function code by placing them in a layer. Lambda runtimes include paths in the /opt directory so that your function code has access to libraries that are included in layers.

service: executables-layer

provider:
  name: aws
  stage: ${opt:stage, 'dev'}
  region: ${opt:region, 'ap-south-1'}

layers:
  pdfGenerator:
    path: executables
    name: pdfGenerator-${self:provider.stage}
    description: Executable binaries required to convert html to pdf

resources:
  Outputs:
    PDFGeneratorLayerExport:
      Value:
        Ref: PDFGeneratorLayer
      Export:
        Name: PDFGeneratorLayerLayer-${self:provider.stage}

Structure your layer so that function code can access libraries without additional configuration. Because the layer was created beforehand, all the executables deployed in it are accessible from the function under /opt, for example /opt/phantomjs_linux-x86_64.

Code:

We will create a simple function that takes an HTML template file and dynamic data for rendering as input and converts them to a PDF. In this example we use the nunjucks template engine, which supports Jinja-style templates. The generated PDF file is uploaded to S3. We use the html-pdf library for converting HTML to PDF; for more information about its configuration, visit https://www.npmjs.com/package/html-pdf.

Let’s get started with the function.

mkdir pdfGenerator
cd pdfGenerator
touch handler.js

Write the following code in the handler.js file.

We import the following libraries for the PDF generation function. The aws-sdk library provides the S3 client.

import pdf from 'html-pdf'
import AWS from 'aws-sdk'
import nunjucks from 'nunjucks'

In the code, you can see that we set some environment variables before the function. It is very important to set these environment variables for the function to work properly. You can find more information about AWS Lambda environment variables in the AWS documentation.


process.env.PATH = `${process.env.PATH}:/opt`
process.env.FONTCONFIG_PATH = '/opt'
process.env.LD_LIBRARY_PATH = '/opt'

It is important to initialise the default values of all arguments up front, to avoid exceptions inside the function.

let OUT_PDF_OPTIONS = {"format": "Letter", "orientation": "landscape", "border": '15mm', "zoomFactor": "0.6"};

let PDF_UPLOAD_ARGS = {ContentType: 'application/pdf', ACL: 'public-read'};

let OUTPUT_PDF_NAME_POSTFIX = ".pdf"
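The handler in this article uses either these defaults or the caller's options, whichever is present. A possible alternative, shown here only as a sketch and not as the article's approach, is to merge the two so that a caller can override a single key while keeping the rest of the defaults. `resolveOptions` is a hypothetical helper name.

```javascript
// Defaults copied from the article.
const OUT_PDF_OPTIONS = { format: 'Letter', orientation: 'landscape', border: '15mm', zoomFactor: '0.6' };

// Hypothetical helper: returns a new object in which caller values win.
// The defaults object itself is never modified.
function resolveOptions(defaults, overrides) {
  return { ...defaults, ...(overrides || {}) };
}

// The caller only changes the orientation; everything else keeps its default.
const merged = resolveOptions(OUT_PDF_OPTIONS, { orientation: 'portrait' });
console.log(merged);
```

Returning a fresh object also means later code can add keys to the result without touching the shared defaults.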

One of the best practices for Lambda functions is to initialise client objects globally, so that they are not recreated on each invocation, which reduces the average invocation duration.

const s3 = new AWS.S3();
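The effect of module-scope initialisation can be illustrated without the AWS SDK. In this sketch `FakeClient` is a stand-in for a client such as `AWS.S3`, and `handler` is a hypothetical function; both are assumptions made so the example runs on its own.

```javascript
// Stand-in for an SDK client such as AWS.S3; counts how often it is constructed.
let constructed = 0;
class FakeClient {
  constructor() {
    constructed += 1;
  }
}

// Created once when the module is loaded (cold start) and reused afterwards.
const sharedClient = new FakeClient();

// Hypothetical handler: reuses the shared client on every invocation.
function handler(event) {
  return { client: sharedClient, event };
}

// Simulate three warm invocations in the same container.
const results = [1, 2, 3].map((n) => handler({ n }));
console.log(constructed); // 1
```

Had the client been created inside `handler`, the counter would have reached 3, one construction per invocation.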

Here we format the input data before it is used in the PDF generation process.

const transform_inputs = payload => {
  let template_dynamic_data = payload.template_dynamic_data
  let template_s3_bucket_details = payload.template_s3_bucket_details
  let pdf_s3_bucket_details = payload.pdf_s3_bucket_details
  let version = payload.version
  let resource_lock_id = payload.resource_lock_id
  return {
    'template_dynamic_data': template_dynamic_data,
    'template_s3_bucket_details': template_s3_bucket_details,
    'pdf_s3_bucket_details': pdf_s3_bucket_details,
    'version': version,
    'resource_lock_id': resource_lock_id
  }
}
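To make the expected input concrete, here is a sketch of a payload a backend service might send, using the field names read by the code in this article. The bucket names, keys and data values are made up, and `findMissingFields` is a hypothetical pre-check that is not part of the original function. Treating these four fields as required is an assumption based on what the handler reads.

```javascript
// Illustrative payload; bucket names, keys and data values are invented.
const examplePayload = {
  template_s3_bucket_details: {
    BUCKET_NAME: 'example-templates',
    OBJECT_KEY: 'invoices/shipping.html',
  },
  pdf_s3_bucket_details: {
    BUCKET_NAME: 'example-pdfs',
    PDF_FILE_INFO: {
      PATH: 'invoices/order-123.pdf',
      PDF_GENERATION_OPTION: { format: 'A4' },
      PDF_UPLOAD_EXTRA_ARGS: {},
    },
  },
  template_dynamic_data: { order_id: '123', customer_name: 'Jane Doe' },
  version: 1,
  resource_lock_id: 'order-123',
};

// Fields assumed to be required, expressed as paths into the payload.
const REQUIRED_PATHS = [
  ['template_s3_bucket_details', 'BUCKET_NAME'],
  ['template_s3_bucket_details', 'OBJECT_KEY'],
  ['pdf_s3_bucket_details', 'BUCKET_NAME'],
  ['pdf_s3_bucket_details', 'PDF_FILE_INFO', 'PATH'],
];

// Hypothetical pre-check: returns the dotted names of required fields that are absent.
function findMissingFields(payload) {
  return REQUIRED_PATHS
    .filter((path) => {
      // Walk the path; stop descending as soon as a level is missing.
      const value = path.reduce((node, key) => (node == null ? undefined : node[key]), payload);
      return value == null;
    })
    .map((path) => path.join('.'));
}

console.log(findMissingFields(examplePayload)); // []
console.log(findMissingFields({}));
```

A check like this lets the function answer with a 400 and a clear message instead of failing later with a TypeError on a missing object.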

PDF Generator function:


export const pdfGenerator = async event => {
  try {
    let payload = event
    let transform_payload = transform_inputs(payload);

    // Template bucket details
    let template_s3_bucket = transform_payload.template_s3_bucket_details.BUCKET_NAME
    let template_s3_key = transform_payload.template_s3_bucket_details.OBJECT_KEY

    // Bucket details for storing the generated PDF
    let pdf_bucket = transform_payload.pdf_s3_bucket_details.BUCKET_NAME;
    let pdf_file_info = transform_payload.pdf_s3_bucket_details.PDF_FILE_INFO;
    let pdf_file_path = pdf_file_info.PATH

    // Reject paths that carry an extension other than .pdf
    if (pdf_file_path && pdf_file_path.split('.').length > 1 &&
        (!pdf_file_path.endsWith(OUTPUT_PDF_NAME_POSTFIX))) {
      console.log('Incorrect pdf file extension')
      return {
        'statusCode': 400,
        'body': JSON.stringify({"message": "Incorrect pdf file extension"})
      }
    }
    // Append .pdf when the path has no extension
    if (pdf_file_path && pdf_file_path.split('.').length == 1) {
      pdf_file_path = pdf_file_path + OUTPUT_PDF_NAME_POSTFIX
    }
    let pdf_generation_options = pdf_file_info.PDF_GENERATION_OPTION;
    let pdf_upload_extra_args = pdf_file_info.PDF_UPLOAD_EXTRA_ARGS;

    // Dynamic data for rendering PDF
    let render_data = transform_payload.template_dynamic_data

    // Data for queuing purpose
    let version = transform_payload.version
    let resource_lock_id = transform_payload.resource_lock_id

    // Template object
    let Data = await s3.getObject({ Bucket: template_s3_bucket, Key: template_s3_key }).promise();

    // Body is a buffer, so convert it to a string before converting to PDF
    let html = Data.Body.toString();
    let template = nunjucks.compile(html);

    // Dynamic data rendered into the template
    let content = template.render(render_data);

    let options = OUT_PDF_OPTIONS;
    if (pdf_generation_options && Object.keys(pdf_generation_options).length) {
      options = pdf_generation_options;
    }

    // PDF generation
    let file = await exportHtmlToPdf(content, options);

    // PDF upload to S3 (copy the arguments so the shared defaults are not mutated)
    let upload_args = { ...PDF_UPLOAD_ARGS };
    if (pdf_upload_extra_args && Object.keys(pdf_upload_extra_args).length) {
      upload_args = { ...pdf_upload_extra_args }
    }
    upload_args.Bucket = pdf_bucket
    upload_args.Key = pdf_file_path
    upload_args.Body = file
    let file_upload_data = await s3.upload(upload_args).promise();

    // Response formatting
    let message = ''
    let url = ''
    let status
    if (file_upload_data.Location) {
      status = 200
      url = file_upload_data.Location
      message = 'PDF generated successfully'
    } else {
      status = 400
      url = ''
      message = 'Error in generating pdf'
    }
    let response_message = {
      'message': message,
      'version': version,
      'resource_lock_id': resource_lock_id
    }
    let body = {"message": response_message, "url": url}
    return {
      'statusCode': status,
      'body': JSON.stringify(body)
    }
  } catch (error) {
    // JSON.stringify(error) yields "{}" for Error objects, so return the message explicitly
    console.log(error)
    return {
      'statusCode': 500,
      'body': JSON.stringify({"message": error.message})
    }
  }
}
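The file-extension handling inside the handler can be read more easily as a small pure function. The sketch below mirrors that logic under the same rules (a path with a non-.pdf extension is rejected, a path without an extension gets .pdf appended). `normalizePdfPath` and its return shape are hypothetical and are not part of the original code.

```javascript
const PDF_EXTENSION = '.pdf';

// Hypothetical helper mirroring the handler's extension check.
// Returns { ok: true, path } on success or { ok: false, message } on rejection.
function normalizePdfPath(path) {
  // An empty or missing path is passed through unchanged, as in the handler.
  if (!path) return { ok: true, path };

  // Same test as the handler: any dot in the path counts as "has an extension".
  const hasExtension = path.split('.').length > 1;
  if (hasExtension && !path.endsWith(PDF_EXTENSION)) {
    return { ok: false, message: 'Incorrect pdf file extension' };
  }
  return { ok: true, path: hasExtension ? path : path + PDF_EXTENSION };
}

console.log(normalizePdfPath('invoices/order-1'));     // .pdf is appended
console.log(normalizePdfPath('invoices/order-1.pdf')); // accepted as is
console.log(normalizePdfPath('invoices/order-1.txt')); // rejected
```

Note that the dot test applies to the whole path, so a directory name containing a dot also counts as an extension; this matches the behaviour of the handler's check.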

exportHtmlToPdf function:


const exportHtmlToPdf = async (html, options) => {
  return new Promise((resolve, reject) => {
    // PhantomJS binary shipped in the layer
    options.phantomPath = "/opt/phantomjs_linux-x86_64";
    pdf.create(html, options).toBuffer((err, buffer) => {
      if (err) {
        console.log('Error in exportHtmlToPdf')
        reject(err)
      } else {
        resolve(buffer)
      }
    });
  })
}
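When this promise rejects, the error travels to the caller's catch block. One thing to watch when turning such an error into a response body is that `JSON.stringify` drops an Error's message, because `message` and `stack` are not enumerable properties. The sketch below shows the effect; `toErrorResponse` is a hypothetical helper, and the error text is only an example.

```javascript
// JSON.stringify skips an Error's non-enumerable properties (message, stack).
const failure = new Error('PhantomJS not found');
console.log(JSON.stringify(failure)); // {}

// Hypothetical helper: build a response that keeps the error message.
function toErrorResponse(error, statusCode = 500) {
  const message = error instanceof Error ? error.message : String(error);
  return { statusCode, body: JSON.stringify({ message }) };
}

const response = toErrorResponse(failure);
console.log(response.body); // {"message":"PhantomJS not found"}
```

Returning the message explicitly gives the calling service something to log or act on, rather than an empty object.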

Another important setting inside the exportHtmlToPdf function is phantomPath, which is set to /opt/phantomjs_linux-x86_64. This path must be correct; otherwise you will get an error saying PhantomJS was not found.

Our function is now ready. Let us now set up the serverless.yml file for deploying it.

touch serverless.yml

Use the following code in serverless.yml

service: pdfGenerator

provider:
  name: aws
  runtime: nodejs12.x
  stage: ${opt:stage, 'dev'}
  region: ${opt:region, 'ap-south-1'}
  environment:
    S3_BUCKET: file-upload-bucket

functions:
  pdfGenerator:
    handler: handler.pdfGenerator
    layers:
      - ${cf:executables-layer-${self:provider.stage}.PDFGeneratorLayerExport}

# serverless optimization
package:
  individually: true

custom:
  webpack:
    webpackConfig: ../webpack.config.js
    includeModules:
      forceExclude:
        - aws-sdk
      packagePath: ../package.json

plugins:
  - serverless-webpack
  - serverless-offline

We are now ready for deployment. Deploy the function with the following command:

sls deploy --stage dev

The function is now ready to use. After deployment, you will get an endpoint like https://xxxxxxxx.execute-api.ap-south-1.amazonaws.com/dev/api/pdfGenerator. The region may be different for you. Try invoking the API with any tool you like.

Production Metrics:

Fig 1.5 AWS Lambda•Invocations

Fig 1.6 AWS Lambda•Duration


Fig 1.7 AWS Lambda•Concurrency

Over time we have monitored the Lambda production metrics and refactored the PDF Generator Lambda function to improve the average invocation duration and achieve a high success rate. To further improve the efficiency of the Lambda function, we are integrating it with AWS SQS to support a queuing mechanism. Thanks for reading the article, and I hope it has helped you. Stay tuned for further articles from 1mg.

Thanks to Chitransha Mishra. 

