Analytics
‘Implementing Big Data’: Case Study
04 Implementation
05 AWS EMR
06 EMR Services
07 Challenges & Future
08 Conclusion
01 Paytm & Big Data
Trends in big data
• Transform the norms: you don't have to be big to use big data.
• Advance business intelligence.
• Boost productivity.
• Insights are generated via new AI technologies like machine learning and natural language processing.
• Minimize risk and fraud.
• Build stronger customer relationships.
• Security remains significant: with increasing amounts of data being produced, protection and security of sensitive and private information is crucial.
02 Company Profile
• Paytm, founded in 2010, is an Indian fintech and e-commerce giant.
• It employs cutting-edge technologies like AI, blockchain, cloud, NFC and machine learning.
• 400+ million registered users as of 2022.
Challenges of Legacy Data Pipeline
04 Implementation
Migration Strategies

Legacy pipeline
• Generated approximately 250K reports per day, consumed by Paytm executives and merchants.
• Analytical jobs took approximately 8–10 hours to complete, which often led to Service Level Agreement (SLA) breaches.

Need for migration
• Optimize hardware usage.
• Reduce analytical data processing time.

Implementation of migration
• Configured Spark jobs to analyse newly updated/inserted records.
• Implemented incremental processing to reduce scanning time and storage requirements.

Partnerships
By partnering with AWS, the Paytm Central Data Platform team created a modern data pipeline in a short amount of time. The new pipeline reduces analytical processing time, scales well, and generates high-quality daily reports for executive management and merchants.
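The incremental-processing idea above (analyse only newly updated/inserted records) can be sketched in plain Python with a "watermark": the latest modification time already processed. This is a minimal illustration under stated assumptions, not Paytm's code: the `Record` shape, the `updated_at` column, and the helper names are hypothetical. On EMR, a Spark job would typically apply the same filter when reading the source table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical row shape: business key, last-modified time, value.
    key: str
    updated_at: datetime
    amount: float


def incremental_batch(
    records: Iterable[Record], watermark: datetime
) -> Tuple[List[Record], datetime]:
    """Select rows inserted/updated after `watermark` and return the advanced watermark."""
    fresh = [r for r in records if r.updated_at > watermark]
    new_watermark = max((r.updated_at for r in fresh), default=watermark)
    return fresh, new_watermark


def upsert(state: Dict[str, float], batch: List[Record]) -> Dict[str, float]:
    """Apply a batch to the running state; a newer version of a key overwrites the old one."""
    for r in sorted(batch, key=lambda rec: rec.updated_at):
        state[r.key] = r.amount
    return state


def run_demo() -> Tuple[int, float]:
    def at(hour: int) -> datetime:
        return datetime(2022, 1, 1, hour)

    # Run 1: nothing has been processed yet, so every row is new.
    day1 = [Record("a", at(1), 10.0), Record("b", at(2), 20.0), Record("c", at(3), 30.0)]
    state: Dict[str, float] = {}
    batch, watermark = incremental_batch(day1, datetime.min)
    upsert(state, batch)

    # Run 2: the source also holds an update to "a" and a new key "d";
    # only these two rows pass the watermark filter, so old rows are not rescanned.
    day2 = day1 + [Record("a", at(4), 15.0), Record("d", at(5), 5.0)]
    batch, watermark = incremental_batch(day2, watermark)
    upsert(state, batch)
    return len(batch), sum(state.values())


if __name__ == "__main__":
    print(run_demo())  # (2, 70.0)
```

The second run touches two rows instead of five; the saving grows with table size, which is the point of incremental processing over full rescans.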
05 AWS EMR
06 EMR Services used by Paytm
Apache Spark For real-time and batch processing
Apache Mahout, TensorFlow, or other ML libraries For fraud detection, risk assessment,
and personalized user experiences.
Apache Flink or Spark Streaming To Monitor real-time transactions and detect anomalies
Apache Zeppelin, Tableau To represent key performance indicators, customer trends, and
business insights.
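To make "detect anomalies in real-time transactions" concrete, here is a single-process Python sketch of a rolling z-score check, the kind of windowed logic a Flink or Spark Streaming job might run per stream. It is an assumption for illustration only: the source does not describe Paytm's detection method, and the window size and threshold are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def flag_anomalies(
    amounts: Iterable[float], window: int = 20, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, amount) for transactions far outside the recent window.

    An amount is flagged when it lies more than `threshold` population standard
    deviations from the mean of the previous `window` amounts.
    """
    recent: Deque[float] = deque(maxlen=window)  # oldest values drop out automatically
    for i, amount in enumerate(amounts):
        if len(recent) == window:  # only score once the window is full
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(amount - mu) / sigma > threshold:
                yield i, amount
        # Flagged values still enter the window; see the note below.
        recent.append(amount)


if __name__ == "__main__":
    stream = [100.0, 102.0, 98.0, 101.0, 99.0] * 4 + [5000.0, 100.0]
    print(list(flag_anomalies(stream)))
```

Keeping flagged values in the window keeps the sketch simple, but a burst of outliers then widens the baseline; a production job might exclude them or use a more robust statistic.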
07 Challenges to benefits
AWS EMR Walk-through
Let's go to the AWS Console
STEPS
1. Log in to the AWS Console.
2. Navigate to the EMR service.
3. Click "Create cluster".
4. Configure the cluster:
a. Select the appropriate release label (EMR version).
b. Choose the applications.
c. Choose instance types (master & core nodes).
d. Configure cluster permissions.
e. Configure bootstrap actions or scripts.
f. Configure storage.
5. Configure network and security.
6. Submit and access the cluster.
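The console steps above map onto the request that boto3's EMR `run_job_flow` call accepts. The sketch below only builds that request as a Python dict, so it runs without AWS access; the cluster name, release label, subnet ID, S3 bootstrap path, and instance sizes are hypothetical placeholders, and the role names assume the default EMR roles already exist in the account.

```python
import json
from typing import Any, Dict, List


def build_cluster_request(
    name: str,
    release_label: str,
    applications: List[str],
    instance_type: str,
    core_count: int,
    subnet_id: str,
    bootstrap_script: str,
    ebs_size_gb: int = 64,
) -> Dict[str, Any]:
    """Assemble keyword arguments in the shape boto3's EMR run_job_flow expects."""
    # Step 4f: attach one EBS volume per instance.
    ebs = {
        "EbsBlockDeviceConfigs": [
            {"VolumeSpecification": {"VolumeType": "gp2", "SizeInGB": ebs_size_gb},
             "VolumesPerInstance": 1}
        ]
    }
    return {
        "Name": name,
        "ReleaseLabel": release_label,                             # step 4a
        "Applications": [{"Name": app} for app in applications],  # step 4b
        "Instances": {
            "InstanceGroups": [                                    # step 4c
                {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": 1,
                 "EbsConfiguration": ebs},
                {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": core_count,
                 "EbsConfiguration": ebs},
            ],
            "Ec2SubnetId": subnet_id,                              # step 5: network
            "KeepJobFlowAliveWhenNoSteps": True,                   # keep cluster up for access (step 6)
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",                      # step 4d: EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",                          # step 4d: EMR service role
        "BootstrapActions": [                                      # step 4e
            {"Name": "setup",
             "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []}}
        ],
        "VisibleToAllUsers": True,
    }


if __name__ == "__main__":
    params = build_cluster_request(
        name="demo-analytics-cluster",
        release_label="emr-6.15.0",
        applications=["Spark", "Zeppelin"],
        instance_type="m5.xlarge",
        core_count=2,
        subnet_id="subnet-0123456789abcdef0",
        bootstrap_script="s3://example-bucket/bootstrap/install-deps.sh",
    )
    print(json.dumps(params, indent=2))
    # To submit for real (requires boto3 and AWS credentials):
    # boto3.client("emr", region_name="us-east-2").run_job_flow(**params)
```

Building the request separately from the API call makes it easy to review or version-control the cluster definition before submitting it (step 6).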
Thanks!
Do you have any questions?