
DATA ENGINEERING ROADMAP

Prerequisites
DBMS and SQL
Python


Linux
Hands On | Hands On
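As a small hands-on starting point that combines the SQL and Python prerequisites, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, the sample rows, and the top_customers helper are made up for illustration; they do not come from any course linked here.

```python
import sqlite3


def top_customers(rows, limit=2):
    """Load (customer, amount) rows into an in-memory table and rank by total spend."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing written to disk
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM orders GROUP BY customer "
            "ORDER BY total DESC, customer LIMIT ?",
            (limit,),
        )
        return cur.fetchall()
    finally:
        conn.close()


# Hypothetical sample data
sample_orders = [("ana", 30.0), ("bo", 12.5), ("ana", 20.0), ("cy", 45.0)]
print(top_customers(sample_orders))  # [('ana', 50.0), ('cy', 45.0)]
```

The same GROUP BY / ORDER BY pattern carries over to warehouse engines later in the roadmap, although each engine has its own SQL dialect.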

Everything Data
Data warehouse

Course | Book

Also research related data concepts such as Data Lake, Data Mart, Data Fabric, Data Mesh, and Data Catalog.

Distributed Systems
Spark with Python

Course | Hands On

Also research the basics of Hadoop, Hive, Pig, and MPP (massively parallel processing) systems.
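To get a feel for the map, shuffle, and reduce idea behind Hadoop-style processing before touching a cluster, here is a rough single-machine sketch in plain Python. It is not the Spark or Hadoop API; the function names and the sample partitions are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(partition):
    """Emit a (word, 1) pair for every word in one partition of lines."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a cluster would do across the network."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the grouped values for each key into one count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions):
    # On a real cluster each partition would be mapped on a different worker.
    mapped = [pair for part in partitions for pair in map_phase(part)]
    return reduce_phase(shuffle(mapped))


# Hypothetical input split into two partitions
partitions = [["spark hive pig"], ["Spark hadoop", "hive spark"]]
counts = word_count(partitions)
print(counts)
```

In a distributed engine the partitions live on separate machines and the shuffle step moves data between them, which is usually the expensive part.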

Cloud

GCP or AWS or Azure

Consider certifications such as AWS Solutions Architect, AWS Big Data Specialty, GCP Professional Data Engineer, or Azure Fundamentals.

Must learn tools


Compute: Databricks and Snowflake
CI/CD: Jenkins and SonarQube
Orchestration: Airflow
Streaming: Kafka
Containers: Docker

Databricks substitutes can be AWS EMR or GCP Dataproc

Snowflake substitutes can be AWS Redshift or GCP BigQuery
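Orchestration tools such as Airflow model a pipeline as a DAG (directed acyclic graph) of tasks and run each task only after its upstream tasks finish. The sketch below shows that ordering idea with Python's standard graphlib module (Python 3.9+). It is not Airflow code; the task names and the run_pipeline helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(dependencies, tasks):
    """Run tasks in an order that respects their upstream dependencies.

    dependencies maps a task name to the set of tasks it depends on.
    tasks maps a task name to a zero-argument callable.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    log = [tasks[name]() for name in order]
    return order, log


# Hypothetical three-step ETL pipeline: extract -> transform -> load
dependencies = {"transform": {"extract"}, "load": {"transform"}}
tasks = {
    "extract": lambda: "extracted raw rows",
    "transform": lambda: "cleaned rows",
    "load": lambda: "loaded into warehouse",
}

order, log = run_pipeline(dependencies, tasks)
print(order)  # ['extract', 'transform', 'load']
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering.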


Projects

Batch processing
Real-time processing

Suggestion: create a free-tier account on the cloud platform you prefer (AWS, GCP, or Azure) and implement end-to-end projects.
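Before picking a project, it may help to see the difference between the two processing styles in miniature. In the sketch below, the batch version computes an average over a complete dataset at once, while the streaming version updates its result as each event arrives. The sensor readings and the RunningAverage class are made up for illustration and are not tied to any specific tool.

```python
def batch_average(readings):
    """Batch style: the whole dataset is available before processing starts."""
    return sum(readings) / len(readings)


class RunningAverage:
    """Streaming style: keep small state and update it per incoming event."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count


# Hypothetical sensor readings
readings = [10.0, 20.0, 30.0, 40.0]

stream = RunningAverage()
latest = None
for reading in readings:  # in a real system these would arrive over time
    latest = stream.update(reading)

print(batch_average(readings), latest)  # 25.0 25.0
```

Both approaches agree once all events have been seen; the streaming version simply has an answer available after every event.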

Additional Optional Topics


Data visualisation with Tableau or Power BI or Looker
Building, Training and Deploying ML models
ETL/ELT: Matillion or Talend

Containers: Kubernetes

Finally, become an SME (subject matter expert) in one of the topics above: pick any tool or topic and deep dive further!