
Title: “AWS Data Pipeline”

Abstract:

The proposed data pipeline begins with Amazon S3 as the central data lake, providing secure and
scalable storage for diverse datasets. Raw data is ingested into S3 in its original form, which preserves
flexibility across file formats and data types.
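
For illustration, a minimal ingestion sketch using boto3 is shown below. The bucket name and object key are hypothetical placeholders; in practice, raw data would typically land under a dedicated prefix, kept separate from processed outputs.

    import boto3

    s3 = boto3.client("s3")

    # Upload a raw file into a "raw/" landing prefix so that each stage
    # of the pipeline has its own zone within the data lake.
    s3.upload_file(
        Filename="events-2024-01-15.csv",
        Bucket="example-datalake-bucket",   # hypothetical bucket name
        Key="raw/events/2024/01/15/events.csv",
    )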

AWS Glue is employed for Extract, Transform, Load (ETL) processes. Glue simplifies data preparation and
transformation tasks through automated schema discovery and dynamic ETL script generation. This
streamlines the workflow, allowing for efficient data cleansing, normalization, and enrichment.
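
A minimal Glue ETL script sketch (PySpark) is given below, assuming a crawler has already discovered the schema and populated the Data Catalog. The database, table, field, and S3 path names are all hypothetical.

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read raw data via the Glue Data Catalog (schema discovered by a crawler).
    raw = glue_context.create_dynamic_frame.from_catalog(
        database="example_datalake", table_name="raw_events"
    )

    # Cleanse and normalize: drop an unneeded field and standardize a name.
    cleaned = raw.drop_fields(["unused_column"]).rename_field("ts", "event_time")

    # Write the transformed data back to S3 in a columnar format.
    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://example-datalake-bucket/processed/events/"},
        format="parquet",
    )
    job.commit()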

Orchestrated ETL jobs within Glue move processed data back to S3 or into other storage targets such
as Amazon Redshift. Writing outputs to well-defined, versioned S3 locations keeps the data ready for
analysis while preserving traceability.
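
One way to orchestrate such jobs is a scheduled Glue trigger, sketched below with boto3; the trigger and job names are hypothetical.

    import boto3

    glue = boto3.client("glue")

    # Schedule the ETL job to run nightly; Glue also supports conditional
    # and on-demand triggers for chaining jobs into workflows.
    glue.create_trigger(
        Name="nightly-events-etl",             # hypothetical trigger name
        Type="SCHEDULED",
        Schedule="cron(0 2 * * ? *)",          # daily at 02:00 UTC
        Actions=[{"JobName": "example-events-etl-job"}],
        StartOnCreation=True,
    )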

For querying and analysis, Amazon Athena comes into play. Athena allows users to run SQL queries
directly on the data stored in S3, eliminating the need for complex data movement or pre-processing.
This serverless query service enables quick and cost-effective analysis, providing near real-time insights
without the need for managing infrastructure.
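
The sketch below shows how such a query might be issued programmatically with boto3. The database, table, and results bucket are hypothetical; Athena writes query results to the S3 location given in the request.

    import boto3

    athena = boto3.client("athena")

    # Run a SQL query directly against the processed data in S3.
    response = athena.start_query_execution(
        QueryString="""
            SELECT event_time, COUNT(*) AS events
            FROM processed_events
            GROUP BY event_time
            ORDER BY event_time
        """,
        QueryExecutionContext={"Database": "example_datalake"},
        ResultConfiguration={"OutputLocation": "s3://example-query-results/athena/"},
    )
    print(response["QueryExecutionId"])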

Moreover, the integration of AWS Glue DataBrew can enhance the pipeline by providing visual data
preparation capabilities, allowing data analysts and scientists to interactively explore, clean, and
transform data without writing code.
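
Although DataBrew recipes are built visually, a finished recipe job can still be triggered programmatically, as sketched below; the job name is a hypothetical placeholder.

    import boto3

    databrew = boto3.client("databrew")

    # Kick off a recipe job that was authored interactively in the
    # DataBrew console.
    run = databrew.start_job_run(Name="example-events-prep-job")
    print(run["RunId"])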

By leveraging this comprehensive AWS service stack, organizations can establish a resilient, scalable, and
cost-efficient data pipeline, addressing the challenges of handling large volumes of diverse data for
analytics and decision-making.
