Mini Project Report
Build a Serverless Real-Time Data Processing App
Submitted by
Aniket Vilas Chinchole (R18BS005)
External Guide
Mr. Ramesh
School of Computer Science and Applications
REVA University, Bangalore
CERTIFICATE
This is to certify that the Minor project work entitled “Build A Serverless Real-Time Data Processing App” has been carried out by Aniket Vilas Chinchole (R18BS005) under my supervision and guidance and that this Minor project work has not formed the
basis for the award of any Degree / Diploma / Associateship / Fellowship or similar title to any candidate
of any University.
Place: Bangalore
Date:
DECLARATION
I, Aniket Vilas Chinchole (R18BS005), hereby declare that this Minor project work entitled
“Build A Serverless Real-Time Data Processing App” has been carried out by me under the
guidance of Prof. Deepa B G and Mr. Ramesh and that this Minor project work has not
formed the basis for the award of any Degree / Diploma / Associateship / Fellowship or similar
title to any candidate of any University.
Date
Countersigned by
ABSTRACT
In this mini project, we will learn how to build a serverless app to process real-time data streams. We
will build infrastructure for a fictional ride-sharing company.
In this case, we will enable operations personnel at a fictional Wild Rydes headquarters to monitor the
health and status of their unicorn fleet. Each unicorn is equipped with a sensor that reports its location
and vital signs. We will use AWS to build applications to process and visualize this data in real-time.
We will use AWS Lambda to process real-time streams, Amazon DynamoDB to persist records in a
NoSQL database, Amazon Kinesis Data Analytics to aggregate data, Amazon Kinesis Data Firehose to
archive the raw data to Amazon S3, and Amazon Athena to run ad-hoc queries against the raw data.
TABLE OF CONTENTS
CHAPTERS
1. INTRODUCTION
1.1 INTRODUCTION TO THE PROJECT
1.2 STATEMENT OF PROBLEM
2. SYSTEM ANALYSIS
2.1 EXISTING SYSTEM
2.2 PROPOSED SYSTEM
3. TOOLS AND TECHNOLOGIES
3.1 HARDWARE REQUIREMENTS
3.2 SOFTWARE REQUIREMENTS
4. SYSTEM DESIGN
4.1 ARCHITECTURE DIAGRAM
4.2 ALGORITHMS USED
5. IMPLEMENTATION
6. SNAPSHOTS
7. CONCLUSION
BIBLIOGRAPHY
CHAPTER 1
INTRODUCTION
Cloud computing has become very popular due to the multiple benefits it provides and is being
adopted by businesses worldwide. Flexibility to scale up or down as per business needs,
faster and more efficient disaster recovery, subscription-based models that reduce the high cost of
hardware, and flexible working for employees are some of the benefits of the cloud that attract
businesses. Similar to the cloud, data analytics is another crucial area that businesses are
exploring for their growth. The exponential rise in the amount of data available on the
internet is a result of the boom in the usage of social media, mobile apps, IoT devices, sensors
and so on. It has become imperative for organisations to analyse this data to gain insights into
their businesses and take appropriate action.
AWS provides a reliable platform for solving complex problems, on which cost-effective
infrastructure can be built with great ease. AWS offers a wide range of managed
services, including computing, storage, networking, databases, analytics, application services and
many more.
We have analysed multiple software solutions that perform analysis on data collected from
the market and provide information and suggestions to deliver a better customer
experience. These include trading applications providing stock prices, taxi companies providing the
locations of nearby taxis, journey-planning applications providing live updates on different
transport modes, and many more.
We have chosen a “server-less” platform, or server-less computing execution model, to
build the real-time data-processing app. The architecture is based on managed services provided by
AWS.
What is “Server-less”?
Server-less is a cloud-based execution model in which the cloud provider dynamically allocates
resources and runs the servers on the customer's behalf. It is a consumption-based model where
pricing is directly proportional to actual use. AWS takes complete ownership of the operational
responsibilities, eliminating infrastructure management and providing higher availability and uptime.
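As a small illustration (not part of the workshop itself), a Lambda handler in Node.js is simply a function that the platform invokes on demand; no server is provisioned or managed by the developer:

exports.handler = function(event, context, callback) {
  // AWS runs this code only when an event arrives and bills per invocation;
  // scaling, patching and availability are handled by the platform.
  console.log('Received event:', JSON.stringify(event));
  callback(null, 'processed');
};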
CHAPTER 2
SYSTEM ANALYSIS
2.1 EXISTING SYSTEM
This workshop is an interactive exercise that builds infrastructure to collect, process, and persist
data without using servers.
Agenda
• Overview of workshop scenario, modules, and the services we'll utilize
• Review of workshop prerequisites and tools
• Execution of four modules, each about thirty minutes in length
Calling all serverless developers! Wild Rydes (www.wildrydes.com), the world's leading unicorn
transportation startup, needs your help! The company's rydesharing network of unicorns has grown
to thousands, fulfilling hundreds of thousands of passenger rydes each day. Your mission is to collect,
store, process, and analyze data to track the real-time location and health of our unicorns. In this
workshop, learn how to build infrastructure to process data streams in real time using Amazon
Kinesis. Build serverless applications using Amazon Kinesis Analytics to aggregate and summarize
data and use AWS Lambda to store aggregated data in Amazon DynamoDB. Finally, use Amazon
Kinesis Firehose to build a data lake in Amazon S3, and use Amazon Athena to run ad-hoc queries
against it. Requirements: laptop, text editor, AWS account, and AWS Command Line Interface (CLI)
installed and configured.
CHAPTER 3
SYSTEM REQUIREMENTS
3.1 HARDWARE REQUIREMENTS:
• Laptop/Desktop
• Keyboard
• Mouse
• Internet connection (minimum 2 Mbps)
3.2 SOFTWARE REQUIREMENTS:
• Web browser
• Text editor
• AWS account
• AWS Command Line Interface (CLI), installed and configured
CHAPTER 4
SYSTEM DESIGN
4.1 ARCHITECTURE DIAGRAM
[Architecture diagram: unicorn sensor data flows into an Amazon Kinesis stream, is aggregated by Kinesis Data Analytics, processed by AWS Lambda into Amazon DynamoDB, and archived by Kinesis Data Firehose to Amazon S3, where it is queried with Amazon Athena.]
4.2 ALGORITHM USED
Overview
Serverless applications do not require you to provision, scale, or manage any servers. They
can be built for nearly any type of application or backend service, and everything
required to run and scale the application with high availability is handled for you.
Serverless architectures can be used for many types of applications. For example, you can
process transaction orders, analyze click streams, clean data, generate metrics, filter logs,
analyze social media, or perform IoT device data telemetry and metering.
We will use AWS to build applications to process and visualize this data in real-time. We
will use AWS Lambda to process real-time streams, Amazon DynamoDB to persist
records in a NoSQL database, Amazon Kinesis Data Analytics to aggregate data, Amazon
Kinesis Data Firehose to archive the raw data to Amazon S3, and Amazon Athena to run
ad-hoc queries against the raw data.
This workshop is broken up into four modules; each module must be completed before
proceeding to the next:
1. Build a data stream: create an Amazon Kinesis stream and write simulated unicorn sensor data to it.
2. Aggregate data: build a Kinesis Data Analytics application to read from the stream and
aggregate metrics like unicorn health and distance traveled each minute.
3. Process streaming data: use AWS Lambda to read the aggregated data and persist it to an Amazon DynamoDB table.
4. Store and query data: use Kinesis Data Firehose to archive the raw data to Amazon S3 and Amazon Athena to run ad-hoc queries against it.
Before starting the modules, sign in to the IAM console at https://console.aws.amazon.com/iam/ using
the AWS account email address and password (the AWS account root user).
In the navigation pane of the console, choose Users, and then choose Add user.
• Go to the AWS Management Console, select Services, then select Cloud9 under
Developer Tools.
CHAPTER 5
IMPLEMENTATION
Installation
• Go to the AWS Management Console, select Services, then select Kinesis under Analytics.
• Select Get started if prompted with an introductory screen.
• Enter wildrydes into Kinesis stream name and 1 into Number of shards, then select
Create Kinesis stream.
• Within 60 seconds, the Kinesis stream will be ACTIVE and ready to store real-time
streaming data.
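The same stream can also be created programmatically. A minimal sketch using the AWS SDK for JavaScript (assuming the aws-sdk v2 package and configured credentials; the region is an assumption):

var AWS = require('aws-sdk');
var kinesis = new AWS.Kinesis({ region: 'us-east-1' }); // assumed region

// Create the wildrydes stream with a single shard, mirroring the console steps above.
kinesis.createStream({ StreamName: 'wildrydes', ShardCount: 1 }, function(err) {
  if (err) return console.error('Failed to create stream:', err);
  // Wait until the stream becomes ACTIVE (usually well under 60 seconds).
  kinesis.waitFor('streamExists', { StreamName: 'wildrydes' }, function(err, data) {
    if (err) return console.error(err);
    console.log('Stream status:', data.StreamDescription.StreamStatus);
  });
});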
• Go to the AWS Management Console, select Services, then select Cognito under
Security, Identity & Compliance.
• Select Manage Identity Pools.
• Enter wildrydes into Identity pool name.
• Tick the Enable access to unauthenticated identities checkbox.
• Click Create Pool.
• Click Allow which will create authenticated and unauthenticated roles for your identity
pool.
• Select Go to Dashboard.
• Select Edit identity pool in the upper right-hand corner.
• Note the Identity pool ID for use in a later step.
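The unicorn dashboard uses this identity pool to obtain temporary AWS credentials in the browser before it reads the stream. A minimal sketch with the AWS SDK for JavaScript; the region and pool ID below are placeholders, to be replaced with the Identity pool ID noted above:

var AWS = require('aws-sdk');

AWS.config.region = 'us-east-1'; // placeholder region
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: 'us-east-1:00000000-0000-0000-0000-000000000000' // placeholder pool ID
});

// Exchange the unauthenticated Cognito identity for temporary credentials.
AWS.config.credentials.get(function(err) {
  if (err) return console.error('Could not obtain credentials:', err);
  console.log('Obtained temporary credentials for identity', AWS.config.credentials.identityId);
});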
• Stop the producer by pressing Control + C and notice that the messages stop and the unicorn
disappears from the dashboard after 30 seconds.
• Start the producer again and notice that the messages resume and the unicorn reappears.
• Hit the (+) button and click New Terminal to open a new terminal tab.
• Start another instance of the producer in the new tab, providing a specific unicorn name,
and notice data points for both unicorns in the consumer's output: ./producer -name
Bucephalus
• Check the dashboard and verify you see multiple unicorns.
• Go to the AWS Management Console, click Services then select Kinesis under
Analytics.
• Select Get started if prompted with an introductory screen.
• Select Create data stream.
• Enter wildrydes-summary into Kinesis stream name and 1 into Number of shards, then
select Create Kinesis stream.
• Within 60 seconds, your Kinesis stream will be ACTIVE and ready to store the real-time
streaming data.
• Switch to the tab where you have your Cloud9 environment open.
• Run the producer to start emitting sensor data to the stream.
• Go to the AWS Management Console, click Services then select Kinesis under
Analytics.
• Select Create analytics application.
• Enter wildrydes into the Application name and then select Create application.
• Select Connect streaming data.
• Select wildrydes from Kinesis stream
• Scroll down and click Discover schema, wait a moment, and ensure that the schema was
properly auto-discovered.
• Select Save and continue.
• Select Go to SQL editor. This will open up an interactive query session where we can
build a query on top of our real-time Amazon Kinesis stream.
• Select Yes, start application. It will take 30-90 seconds for your application to start.
• Copy and paste the workshop's SQL query, which aggregates per-minute metrics such as unicorn health and distance traveled, into the SQL editor:
• Select Save and run SQL. Each minute, you will see rows arrive containing the
aggregated data. Wait for the rows to arrive.
• Switch to the tab where you have your Cloud9 environment opened.
• Run the consumer to start reading sensor data from the stream: ./consumer -stream
wildrydes-summary
• Switch to the tab where you have your Cloud9 environment opened.
• Stop the producer by pressing Control + C and notice the messages stop.
• Start the producer again and notice the messages resume.
• Hit the (+) button and click New Terminal to open a new terminal tab.
• Start another instance of the producer in the new tab. Provide a specific unicorn name
and notice data points for both unicorns in the consumer's output.
• Go to the AWS Management Console, choose Services then select DynamoDB under
Database.
• Click Create table.
• Enter UnicornSensorData for the Table name.
• Enter Name for the Partition key and select String for the key type.
• Tick the Add sort key checkbox. Enter StatusTime for the Sort key and select String
for the key type.
• Leave the Use default settings box checked and choose Create.
• Scroll to the Table details section of your new table's properties and note the Amazon
Resource Name (ARN). You will use this in the next step.
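The same table can also be defined from code. A sketch with the AWS SDK for JavaScript showing the partition key (Name) and sort key (StatusTime) chosen above; the region and throughput values are assumptions:

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({ region: 'us-east-1' }); // assumed region

// UnicornSensorData keyed by unicorn Name (partition key) and StatusTime (sort key).
dynamodb.createTable({
  TableName: 'UnicornSensorData',
  AttributeDefinitions: [
    { AttributeName: 'Name', AttributeType: 'S' },
    { AttributeName: 'StatusTime', AttributeType: 'S' }
  ],
  KeySchema: [
    { AttributeName: 'Name', KeyType: 'HASH' },
    { AttributeName: 'StatusTime', KeyType: 'RANGE' }
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
}, function(err, data) {
  if (err) return console.error('Failed to create table:', err);
  console.log('Table ARN:', data.TableDescription.TableArn);
});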
• From the AWS Console, click on Services and then select IAM in the Security, Identity
& Compliance section.
• Select Policies from the left navigation and then click Create policy.
• Using the Visual editor, we're going to create an IAM policy to allow our Lambda
function access to the DynamoDB table created in the last section. To begin, select
Service, begin typing DynamoDB in Find a service, and click DynamoDB.
• Select Action, begin typing BatchWriteItem in Filter actions, and tick the BatchWriteItem
checkbox.
• In Region, enter the AWS Region in which you created the DynamoDB table in the
previous section, e.g.: us-east-1.
• In Account, enter your AWS Account ID, which is a twelve-digit number, e.g.:
123456789012. To find your AWS account ID number in the AWS Management
Console, click on Support in the navigation bar in the upper-right, and then click
Support Center. Your currently signed-in account ID appears in the upper-right corner below the
Support menu.
• In Table Name, enter UnicornSensorData.
• Select Add.
• Select Review policy.
• Enter WildRydesDynamoDBWritePolicy in the Name field.
• Select Create policy.
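With the example values used above (region us-east-1, account 123456789012, table UnicornSensorData), the resource ARN that this policy limits BatchWriteItem to would look like:

arn:aws:dynamodb:us-east-1:123456789012:table/UnicornSensorData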
• Select Roles from the left navigation and then select Create role.
• Select Next: Permissions.
• Begin typing AWSLambdaKinesisExecutionRole in the Filter text box and check the
box next to that role.
• Begin typing WildRydesDynamoDBWritePolicy in the Filter text box and check the
box next to that policy.
• Click Next: Review.
• Enter WildRydesStreamProcessorRole for the Role name.
• Click Create role.
• Go to the AWS Management Console, choose Services, then select Lambda under
Compute.
• Select Create a function.
• Enter WildRydesStreamProcessor in the Name field.
• Select Node.js 6.10 from Runtime.
• Select WildRydesStreamProcessorRole from the Existing role dropdown.
• Select Create function.
• Scroll down to the Function code section.
• Copy and paste the JavaScript code below into the code editor.
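The exact code from the workshop is not reproduced in this report. The sketch below, written for the Node.js runtime selected above, shows the kind of function used at this step: it decodes each Kinesis record and batch-writes it into the UnicornSensorData table. The field names (Distance, HealthPoints, MagicPoints) are illustrative assumptions, not necessarily the workshop's exact schema.

'use strict';

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB.DocumentClient();

var TABLE_NAME = 'UnicornSensorData';

exports.handler = function(event, context, callback) {
  // Each Kinesis record carries a base64-encoded JSON record from the stream.
  // This sketch assumes at most 25 records per batch (the BatchWriteItem limit).
  var requests = event.Records.map(function(record) {
    var reading = JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString('utf8'));
    return {
      PutRequest: {
        Item: {
          Name: reading.Name,             // partition key
          StatusTime: reading.StatusTime, // sort key
          Distance: reading.Distance,     // assumed field names
          HealthPoints: reading.HealthPoints,
          MagicPoints: reading.MagicPoints
        }
      }
    };
  });

  var params = { RequestItems: {} };
  params.RequestItems[TABLE_NAME] = requests;

  // Write the whole batch to the DynamoDB table and report success or failure.
  dynamodb.batchWrite(params, function(err) {
    callback(err);
  });
};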
• Run the producer to start emitting sensor data to the stream with a unicorn name:
./producer -name Rosinante
• Select the Monitor tab and explore the metrics available to monitor the function. Select
Jump to Logs to explore the function's log output.
• From the AWS Management Console select Services then select S3 under Storage.
• Select Create bucket, enter a globally unique Bucket name (e.g. wildrydes-data-johndoe), select Next twice, and then select Create bucket.
• From the AWS Management Console select Services then select Kinesis under
Analytics.
• Select Create delivery stream.
• Enter wildrydes into Delivery stream name.
• Select Kinesis stream as Source and select wildrydes as the source stream.
• Select Next.
• Leave Record transformation and Record format conversion disabled and select Next.
• Select Amazon S3 from Destination.
• Choose the bucket you created in the previous section (i.e. wildrydes-data-johndoe)
from S3 bucket.
• Select Next.
• Enter 60 into Buffer interval under S3 Buffer to set the frequency of S3 deliveries to once
per minute.
• Scroll down to the bottom of the page and select Create new or choose next to IAM role.
In the new tab, click Allow.
• Select Next. Review the delivery stream details and select Create delivery stream.
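Once the delivery stream is active, objects should start appearing in the bucket roughly once per minute, matching the 60-second buffer interval set above. A small sketch to spot-check the deliveries from code, assuming the example bucket name wildrydes-data-johndoe and an assumed region of us-east-1:

var AWS = require('aws-sdk');
var s3 = new AWS.S3({ region: 'us-east-1' }); // assumed region

// List the most recently delivered objects written by Kinesis Data Firehose.
s3.listObjectsV2({ Bucket: 'wildrydes-data-johndoe', MaxKeys: 10 }, function(err, data) {
  if (err) return console.error('Could not list bucket:', err);
  data.Contents.forEach(function(obj) {
    console.log(obj.LastModified, obj.Key, obj.Size + ' bytes');
  });
});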
5.5 Clean Up
• Clean up Amazon Athena.
• Clean up Kinesis Data Firehose.
• Clean up Amazon S3.
• Clean up AWS Lambda.
• Clean up Amazon DynamoDB.
• Clean up IAM.
CHAPTER 6
SNAPSHOTS
6.3 Create an Amazon Kinesis stream:
6.5 Create an identity pool for the unicorn dashboard:
6.7 View unicorn status on the dashboard:
6.9 Create an Amazon Kinesis Data Analytics application:
6.10 Create an Amazon DynamoDB table:
6.12 Create a Lambda function to process the stream:
6.13 Monitor the Lambda function:
6.15 Create an Amazon S3 bucket:
6.17 Create an Amazon Athena table:
6.19 Query the data files:
CHAPTER 7
CONCLUSION
Using AWS services, we were able to create a real-time data processing application based
on a serverless architecture, capable of accepting data through Kinesis data streams,
processing it through Kinesis Data Analytics, triggering a Lambda function and storing the
results in DynamoDB. The architecture can be reused for multiple data types from various data
sources and formats with minor modifications. We used managed services provided by AWS
throughout, which led to zero infrastructure management effort.
This capstone project has helped us build practical expertise in AWS services such as
Kinesis, Lambda, DynamoDB, Athena, S3 and Identity and Access Management, as well as in
serverless architecture and managed services. We have also learnt the Go programming language to
build pseudo data producer programs. The AWS CLI helped us connect our on-premises
machines with the cloud services.
BIBLIOGRAPHY
1. https://aws.amazon.com/getting-started/hands-on/build-serverless-real-time-data-processing-app-lambda-kinesis-s3-dynamodb-cognito-athena/
2. https://github.com/Rajpratik71/Distributed-Systems-Complete/blob/master/Build%20a%20Serverless%20Real-Time%20Data%20Processing%20App.md
3. https://blog.griddynamics.com/how-to-create-a-serverless-real-time-analytics-platform-a-case-study/