
Data Engineer

Intro - What is Data Engineering?


Enter the data engineer. Before the data engineer steps in:
- Data is scattered across different sources
- Databases are not optimized for analyses
- Legacy code is causing corrupt data
Data engineer to the rescue!

Data engineers: making your life easier


- Gather data from different sources
- Optimize the database for analyses
- Remove corrupt data
The data scientist's life just got way easier!

Definition of job
An engineer that develops, constructs, tests, and maintains architectures such as databases and large-scale
processing systems.
- Processing large amounts of data
- Using clusters of machines

Data engineer vs Data scientist


Data Engineer                            | Data Scientist
-----------------------------------------|------------------------------------------
Develop scalable data architecture       | Mining data for patterns
Streamline data acquisition              | Statistical modelling
Set up processes to bring together data  | Predictive models using machine learning
Clean up corrupt data                    | Monitor business processes
Well versed in cloud technology          | Clean outliers in data

Intro - Tools of the data engineer


Databases
- Hold large amounts of data
- Some databases support applications
- Other databases are used for analyses (see the small example below)
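As a quick illustration, querying a database for analysis usually comes down to plain SQL. Below is a minimal sketch using Python's built-in sqlite3 module; the app.db file and the customers table are made up for illustration.

    # A minimal sketch: running an analytical query against a database.
    # The database file "app.db" and the "customers" table are hypothetical.
    import sqlite3

    # Connect to a (hypothetical) application database file
    connection = sqlite3.connect("app.db")

    # Analytical query: count customers per country
    cursor = connection.execute(
        "SELECT country, COUNT(*) AS n_customers "
        "FROM customers GROUP BY country"
    )
    for country, n_customers in cursor:
        print(country, n_customers)

    connection.close()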

Processing
- Clean data
- Aggregate data
- Join data
- The data engineer understands these abstractions (see the PySpark sketch below)
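To make the three steps concrete, here is a minimal PySpark sketch; the customers and orders data are invented for illustration, and in practice these DataFrames would come from real tables or files.

    # A minimal PySpark sketch of the three processing steps: clean, aggregate, join.
    # The example data (customers, orders) is made up for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("processing_sketch").getOrCreate()

    customers = spark.createDataFrame(
        [(1, "BE"), (2, "NL"), (3, None)], ["customer_id", "country"]
    )
    orders = spark.createDataFrame(
        [(1, 10.0), (1, 20.0), (2, 5.0)], ["customer_id", "amount"]
    )

    # Clean: drop rows with missing values
    customers_clean = customers.dropna()

    # Aggregate: total order amount per customer
    totals = orders.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))

    # Join: combine customer info with their totals
    result = customers_clean.join(totals, on="customer_id", how="left")
    result.show()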

Scheduling
- Plan jobs to run at specific intervals
- Resolve the dependency requirements of jobs (see the DAG sketch below)
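As a sketch of what scheduling looks like in practice, the hypothetical Airflow DAG below runs three placeholder tasks daily and declares their dependencies; exact import paths and parameter names vary between Airflow versions.

    # A minimal sketch of an Airflow DAG (Airflow 2.x style); task names are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    dag = DAG(
        dag_id="example_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",  # newer Airflow versions call this "schedule"
    )

    # Placeholder tasks; real pipelines would run extract/transform/load commands
    extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
    transform = BashOperator(task_id="transform", bash_command="echo transform", dag=dag)
    load = BashOperator(task_id="load", bash_command="echo load", dag=dag)

    # Dependencies: extract runs before transform, transform before load
    extract >> transform >> load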

Existing tools: example


- Databases: MySQL, PostgreSQL, etc.
- Processing: Spark, Hive, etc.
- Scheduling: Apache Airflow, Oozie, etc., or simple Bash tooling like cron

Intro - A data pipeline


To sum everything up, you can think of the data engineering pipeline as the diagram below. The pipeline extracts all data through connections with several databases, transforms it using a cluster computing framework like Spark, and loads it into an analytical database. Everything is scheduled to run in a specific order through a scheduling framework like Airflow. A small side note: the sources can also be external APIs or other file formats. We'll see this in the exercises.
                        Scheduling (Apache Airflow)
  ----------------------------------------------------------------------->

  SQL (Accounting)   --\
  SQL (Online Store) ----->  Processing (Apache Spark)  ----->  SQL (Analytics)
  NoSQL (Catalog)    --/
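The same pipeline can be sketched in a few lines of Python. The snippet below is only illustrative: the connection strings, table names, and columns (order_id, customer_id, amount) are placeholders, and the transform step could just as well run on Spark.

    # A minimal extract-transform-load sketch of the pipeline in the diagram,
    # using pandas and SQLAlchemy. All connection strings and tables are placeholders.
    import pandas as pd
    from sqlalchemy import create_engine

    # Extract: pull data from the source databases
    accounting = create_engine("postgresql://user:pass@accounting-host/accounting")
    store = create_engine("postgresql://user:pass@store-host/online_store")

    invoices = pd.read_sql("SELECT * FROM invoices", accounting)
    orders = pd.read_sql("SELECT * FROM orders", store)

    # Transform: join the sources and aggregate revenue per customer
    revenue = (
        orders.merge(invoices, on="order_id")
              .groupby("customer_id", as_index=False)["amount"].sum()
    )

    # Load: write the result into the analytical database
    analytics = create_engine("postgresql://user:pass@analytics-host/analytics")
    revenue.to_sql("customer_revenue", analytics, if_exists="replace", index=False)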

Intro - Cloud Providers


Data processing in the cloud
Clusters of machines are required
Problem: self-hosting a data center
- You cover electricity and maintenance costs
- Peaks vs. quiet moments: hard to optimize capacity
Solution: use the cloud

Data storage in the cloud


Reliability is required
Problem: self-hosting a data center
- Disaster will strike at some point
- You need replicas in different geographical locations
Solution: use the cloud

The big three: AWS, Azure & Google


AWS: 32% market share in 2018
Azure: 17% market share in 2018
Google: 10% market share in 2018

Storage
Upload files, e.g. storing product images
Services
- AWS S3
- Azure Blob Storage
- Google Cloud Storage
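For example, uploading a product image to object storage with AWS's boto3 client might look like the sketch below; the bucket name and file paths are placeholders.

    # A minimal sketch: uploading a file to cloud object storage with boto3 (AWS S3).
    # The bucket name and file paths below are placeholders, not real resources.
    import boto3

    s3 = boto3.client("s3")

    # Upload a local product image to an S3 bucket
    s3.upload_file(
        Filename="product_images/shirt.png",   # local file (hypothetical)
        Bucket="my-product-images-bucket",     # bucket name (hypothetical)
        Key="images/shirt.png",                # object key inside the bucket
    )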

Computation
Perform calculations, e.g. hosting a web server
Services
- AWS EC2
- Azure Virtual Machines
- Google Compute Engine
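Starting a virtual machine goes through the same kind of API. The sketch below launches a single hypothetical EC2 instance with boto3; the region and AMI ID are placeholders.

    # A minimal sketch: launching one virtual machine on AWS EC2 with boto3.
    # The region and AMI ID below are placeholders, not real values.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder machine image
        InstanceType="t3.micro",          # small, inexpensive instance type
        MinCount=1,
        MaxCount=1,
    )
    print(response["Instances"][0]["InstanceId"])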

Databases
Hold structured information
Services
- AWS RDS
- Azure SQL Database
- Google Cloud SQL
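Connecting to such a managed database works just like connecting to any other SQL database. The sketch below uses SQLAlchemy against a hypothetical PostgreSQL instance on AWS RDS; the hostname, credentials, and orders table are placeholders.

    # A minimal sketch: querying a managed cloud database (PostgreSQL on AWS RDS)
    # with SQLAlchemy. Requires a PostgreSQL driver such as psycopg2.
    # The hostname, credentials, and "orders" table are placeholders.
    from sqlalchemy import create_engine, text

    engine = create_engine(
        "postgresql://user:password@mydb.abc123.eu-west-1.rds.amazonaws.com:5432/sales"
    )

    with engine.connect() as conn:
        result = conn.execute(text("SELECT COUNT(*) FROM orders"))
        print(result.scalar())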
