DATA ENGINEERING FOR EVERYONE
Hadrien Lacroix
Content Developer at DataCamp
A general definition
Data processing: converting raw data into meaningful information
Remove unwanted data: no long-term need for testing feature data
Optimize memory, process and network costs: can't afford to store and stream files this big
Convert data from one type to another: convert songs from .flac to .ogg
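A minimal sketch of two of the tasks above, removing unwanted data and converting data from one type to another. The sample rows, the test_feature column name and the process function are hypothetical, chosen only for illustration; a real pipeline would read from files or a database instead of an inline string.

```python
import csv
import io
import json

# Hypothetical raw data: the "test_feature" column stands in for data
# that has no long-term use.
RAW_CSV = """song_id,title,test_feature
1,Intro,a
2,Outro,b
"""


def process(raw_csv: str, unwanted: set[str]) -> str:
    """Drop unwanted columns, then convert the rows from CSV to JSON."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    rows = [
        {key: value for key, value in row.items() if key not in unwanted}
        for row in reader
    ]
    return json.dumps(rows)


result = process(RAW_CSV, unwanted={"test_feature"})
print(result)
```

Dropping the column before converting keeps the output smaller, which is the same reasoning as the memory and network cost point above.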
What it consists of
Scheduling
Can apply to any task listed in data processing
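As a rough illustration of scheduling data processing tasks, the sketch below uses Python's standard sched module to run two tasks at set delays. The task names clean_data and convert_data are hypothetical placeholders, and the delays are shortened to fractions of a second; production pipelines would more likely rely on a dedicated scheduler.

```python
import sched
import time

completed = []  # records the order in which tasks actually ran


def clean_data():
    # Placeholder for a real "remove unwanted data" step.
    completed.append("clean_data")


def convert_data():
    # Placeholder for a real "convert data" step.
    completed.append("convert_data")


scheduler = sched.scheduler(time.monotonic, time.sleep)

# enter(delay_in_seconds, priority, action): tasks run in order of their
# scheduled time, not in the order they were registered.
scheduler.enter(0.02, 1, convert_data)
scheduler.enter(0.01, 1, clean_data)

scheduler.run()  # blocks until every scheduled task has run
print(completed)
```

Although convert_data is registered first, clean_data runs first because its delay is shorter.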
Parallel computing
Basis of modern data processing tools
Necessary:
Mainly because of memory
How it works:
Split tasks up into several smaller subtasks
Advantages
Extra processing power
Disadvantages
Moving data incurs a cost
Communication time
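The split-and-combine idea can be sketched as follows. This is an illustrative example only: the split and parallel_sum helpers are hypothetical, and a thread pool is used to keep the sketch short. For CPU-heavy work in Python, a process pool or a distributed framework would usually be the better fit. The splitting and the final combining step are where the communication cost mentioned above shows up.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_chunks):
    """Split a list into n_chunks roughly equal, contiguous pieces."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum(numbers, n_workers=4):
    """Sum a list by handing one chunk to each worker, then combining."""
    chunks = split(numbers, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker computes the sum of its own subtask.
        partial_sums = list(pool.map(sum, chunks))
    # Combining the partial results is the extra step parallelism adds.
    return sum(partial_sums)


total = parallel_sum(list(range(1, 101)))
print(total)  # 5050
```

For a task this small, the overhead of splitting and combining outweighs any gain; the pattern pays off only when each subtask is large enough.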
Cloud computing for data processing
Servers on premises:
Bought
Processing power unused at quieter times
Servers on the cloud:
Rented
The closer to the user the better
Actually, YOU are the champion!
How important it is
Parallel computing
Cloud computing