
PG Diploma in Data Science


(Program Curriculum)
For Prep Sessions + Batch Start Dates:
Please refer to upgrad.com

Note: This curriculum is subject to change based on inputs from IIITB and Industry

TRACK COURSE MODULE NAME DESCRIPTION SESSION

PRE-PROGRAM
PREPARATORY CONTENT

Data Analysis in Excel
Taught by one of the most renowned data scientists in the country (S. Anand, CEO, Gramener), this module takes you from a beginner-level Excel user to an almost professional user.
Introduction to Excel
Data Analysis in Excel - I: Functions, Formulae, and Charts
Data Analysis in Excel - II: Pivots and Lookups

Analytics Problem Solving
This module covers concepts of the CRISP-DM framework for business problem-solving.
The CRISP-DM Framework - Business and Data Understanding
CRISP-DM Framework - Data Preparation, Modelling, Evaluation and Deployment

Data Analysis using SQL
Data in companies is definitely not stored in Excel sheets! Learn the fundamentals of databases and extract information from an RDBMS using the structured query language.
Basics of SQL: Data Retrieval, Compound Functions, Relational Operators, and Sorting
Advanced SQL: Aggregate Functions, Nested Queries, and Joins

Introduction to Python
Build a foundation for the most in-demand programming language of the 21st century.
Understanding the upGrad Coding Console
Data Structures in Python
Control Structure and Functions

Programming in Python
Learn how to approach and solve logical problems using programming.
Logic and Syntax Building
Data Structures: Lists, Strings, Dictionaries, and Stacks
Time Complexity
Searching and Sorting
Two Pointers
Recursion

Python for Data Science
Learn how to manipulate datasets in Python using Pandas, which is the most powerful library for data preparation and analysis.
Introduction to NumPy
Operations on NumPy Arrays
Introduction to Pandas
Getting and Cleaning Data

COURSE 1: DATA TOOLKIT

Visualization in Python
Humans are visual learners, and hence no task related to data is complete without visualisation. Learn to plot and interpret various graphs in Python and observe how they make data analysis and drawing insights easier.
Introduction to Data Visualization
Basics of Visualization: Plots, Subplots and their Functionalities
Plotting Data Distributions
Plotting Categorical and Time-Series Data

IMDb Movie Assignment
Reinforce the concepts learnt in data science through this rigorous assignment involving the past hundred years of movie data.
Problem Statement
Evaluation Rubric
Final Submission
Solution

CURRICULUM

Exploratory Data Analysis
Learn how to find and analyse the patterns in the data to draw actionable insights.
Data Sourcing
Data Cleaning
Univariate Analysis
Segmented Univariate
Bivariate Analysis
Derived Metrics

Inferential Statistics
Build a strong statistical foundation and learn how to 'infer' insights from a huge population using a small sample.
Basics of Probability
Discrete Probability Distributions
Continuous Probability Distributions
Central Limit Theorem

Hypothesis Testing
Understand how to formulate and validate hypotheses for a population to solve real-life business problems.
Concepts of Hypothesis Testing - I: Null and Alternate Hypothesis, Making a Decision, and Critical Value Method
Concepts of Hypothesis Testing - II: p-Value Method and Types of Errors
Industry Demonstration of Hypothesis Testing: Two-Sample Mean and Proportion Test, A/B Testing

Credit EDA Case Study
Solve a real industry problem through the concepts learnt in exploratory data analysis.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Introduction to Machine Learning and Linear Regression
Venture into the machine learning community by learning how one variable can be predicted using several other variables through a housing dataset where you will predict the prices of houses based on various factors.
Simple Linear Regression
Multiple Linear Regression
Industry Relevance of Linear Regression

Linear Regression Assignment
Build a model to understand the factors on which car prices vary and help a Chinese company enter the US car market.
Problem Statement
Evaluation Rubric
Final Submission
Solution

COURSE 2 - MACHINE LEARNING

Logistic Regression
Learn your first binary classification technique by determining which customers of a telecom operator are likely to churn and which are not, to help the business retain customers.
Univariate Logistic Regression
Multivariate Logistic Regression: Model Building and Evaluation
Logistic Regression: Industry Applications

Unsupervised Learning: Clustering
Learn how to group elements into different clusters when you don't have any pre-defined labels to segregate them, through K-means clustering, hierarchical clustering, and more.
Introduction to Clustering
K-Means Clustering
Hierarchical Clustering
Other Forms of Clustering: K-Mode, K-Prototype, DBSCAN

Business Problem Solving
Learn how to approach open-ended real-world problems using data as a lever to draw actionable insights.
Introduction to Business Problem Solving
Business Problem Solving: Practical Examples

Assignment: Unsupervised + Supervised
Apply the machine learning concepts learnt to solve a real-life predictive analytics problem.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Case Study: Lead Scoring
Help the Sales team of your company identify which leads are worth pursuing through this classification case study.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Tree Models
Learn how the human decision-making process can be replicated using a decision tree and other powerful ensemble algorithms.
Introduction to Decision Trees
Algorithms for Decision Tree Construction
Truncation and Pruning
Random Forests

Model Selection & General ML Techniques
Learn the pros and cons of simple and complex models and the different methods for quantifying model complexity, along with general machine learning techniques like feature engineering, model evaluation, and many more.
Principles of Model Selection
Model Evaluation
Model Selection: Best Practices

COURSE 3 - ADVANCED MACHINE LEARNING

Bagging and Boosting
Learn about ensemble modelling through bagging and boosting and understand how weak algorithms can be transformed into stronger ones.
Introduction to Boosting and AdaBoost
Gradient Boosting

Advanced Regression
In this module, take a more advanced look at regression models and learn the concepts related to regularization.
Generalized Linear Regression
Regularized Regression

Advanced Regression Assignment
Build a regularized regression model to understand the most important variables to predict the house prices in Australia.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Principal Component Analysis
Understand important concepts related to dimensionality reduction, the basic idea and the learning algorithm of PCA, and its practical applications on supervised and unsupervised problems.
Principal Component Analysis and Singular Value Decomposition
Principal Component Analysis in Python

SPECIALISATION 1: DEEP LEARNING

Time Series Analysis
In this module, you will learn how to analyse and forecast a series that varies with time.
Introduction to Time Series and its Components
Working with Stationary Time Series
End-to-End Analysis of Time Series

Telecom Churn Case Study
Solve the most crucial business problem for a leading telecom operator in India and Southeast Asia: predicting customer churn.
Problem Statement
Evaluation Rubric
Final Submission
Solution

COURSE 4 - DEEP LEARNING AND NEURAL NETWORKS

Introduction to Neural Networks
Learn the most sophisticated and cutting-edge technique in machine learning: Artificial Neural Networks, or ANNs.
Structure of Neural Networks
Feed Forward in Neural Networks
Backpropagation in Neural Networks
Modifications to Neural Networks
Hyperparameter Tuning in Neural Networks

Neural Networks Assignment
Build a neural network from scratch in NumPy to identify handwritten digits.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Convolutional Neural Networks - Introduction and Industry Applications
Build a neural network from scratch in NumPy to identify handwritten digits.
Introduction to Convolutional Neural Networks
Building CNNs with Python and Keras
Style Transfer and Object Detection
Industry Demonstration: Using CNNs with Flowers Images
Industry Demonstration: Using CNNs with X-Ray Images

Recurrent Neural Networks
Ever wondered what goes on behind machine translation, sentiment analysis, and speech recognition? Learn how RNNs help in these areas, which involve sequential data like text, speech, videos, and a lot more.
What Makes a Neural Network Recurrent
Variants of RNNs: Bidirectional RNNs and LSTMs
Building RNNs in Python

Gesture Recognition
Make a Smart TV system that lets the user control the TV with hand gestures instead of a remote control.
Two Architectures: 3D Convs and CNN-RNN Stack
Understanding Generators
Starter Code Walkthrough
Problem Statement and Final Submission

COURSE 5 - CAPSTONE PROJECT

Capstone Project
Choose from a range of real-world, industry-woven projects on advanced topics like Recommendation Systems, Fraud Detection, Emotion Detection from faces, Social Media Listening, and Speech Recognition, among many others.
An Overview of the Domain and Associated Concepts
Problem Statement
Evaluation Rubric
Mid Submission
Final Submission
Solution

Tree Models
Learn how the human decision-making process can be replicated using a decision tree and other powerful ensemble algorithms.
Introduction to Decision Trees
Algorithms for Decision Tree Construction
Truncation and Pruning
Random Forests

Model Selection & General ML Techniques
Learn the pros and cons of simple and complex models and the different methods for quantifying model complexity, along with general machine learning techniques like feature engineering, model evaluation, and many more.
Principles of Model Selection
Model Evaluation
Model Selection: Best Practices


COURSE 3 - ADVANCED MACHINE LEARNING

Bagging and Boosting
Learn about ensemble modelling through bagging and boosting and understand how weak algorithms can be transformed into stronger ones.
Introduction to Boosting and AdaBoost
Gradient Boosting

Advanced Regression
In this module, take a more advanced look at regression models and learn the concepts related to regularization.
Generalized Linear Regression
Regularized Regression

Advanced Regression Assignment
Build a regularized regression model to understand the most important variables to predict the house prices in Australia.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Principal Component Analysis
Understand important concepts related to dimensionality reduction, the basic idea and the learning algorithm of PCA, and its practical applications on supervised and unsupervised problems.
Principal Component Analysis and Singular Value Decomposition
Principal Component Analysis in Python

SPECIALISATION 2: NLP

Time Series Analysis
In this module, you will learn how to analyse and forecast a series that varies with time.
Introduction to Time Series and its Components
Working with Stationary Time Series
End-to-End Analysis of Time Series

Telecom Churn Case Study
Solve the most crucial business problem for a leading telecom operator in India and Southeast Asia: predicting customer churn.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Lexical Processing
Do you get annoyed by the constant spam in your mailbox? Wouldn't it be nice if we had a program to check your spelling? In this module, learn how to build a spell checker and spam detector using techniques like phonetic hashing, bag-of-words, TF-IDF, etc.
Introduction to Natural Language Processing
Basic Lexical Processing: Tokenization, Bag of Words, TF-IDF
Advanced Lexical Processing: Canonicalization, Phonetic Hashing, Spell Corrector, Pointwise Mutual Information

COURSE 4 - NATURAL LANGUAGE PROCESSING

Syntactic Processing
Learn how to analyse the syntax or the grammatical structure of sentences with the help of algorithms and techniques like HMMs, the Viterbi Algorithm, Named Entity Recognition (NER), etc.
Introduction to Syntactic Processing
Parsing
Information Extraction
Conditional Random Fields (CRF)

Syntactic Processing Assignment
Build a POS tagger for tagging unknown words using HMMs and a modified Viterbi algorithm.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Semantic Processing
Learn the most interesting area in the field of NLP and understand different techniques like word embeddings, LSA, and topic modelling to build an application that extracts opinions about socially relevant issues (such as demonetisation) on social media platforms.
Introduction to Semantic Processing
Distributional Semantics
Topic Modelling
Social Media Opinion Mining: Semantic Processing Case Study

Chatbot Case Study
Imagine if you could make a restaurant booking without opening Zomato. Build your own restaurant-search chatbot with the help of Rasa, an open source framework, and deploy it on Slack.
Building Chatbots with Rasa
NLP Course Project - Building a Chatbot: Problem Statement and Final Submission

COURSE 5 - CAPSTONE PROJECT

Capstone Project
Choose from a range of real-world, industry-woven projects on advanced topics like Recommendation Systems, Fraud Detection, Emotion Detection from faces, Social Media Listening, and Speech Recognition, among many others.
An Overview of the Domain and Associated Concepts
Problem Statement
Evaluation Rubric
Final Submission
Solution

Introduction to Databases
Learn how data is stored and which database is optimal to use in a particular scenario.
OLAP vs OLTP
Designing a database schema
Creating ER Diagram

COURSE 3 - SQL AND NOSQL DATABASES

Advanced SQL and Best Practices
Apply advanced SQL concepts like windowing and procedures to derive insights from data and answer pertinent business questions.
Stored functions and procedures in SQL
Conditional constructs in SQL
Query Optimisation

Schema Design and Retrieval Assignment
Design a schema with a predefined objective, create a database, and perform analysis.
Problem Statement
Evaluation Rubric
Final Submission
Solution

NoSQL Databases and Best Practices
Take your knowledge of query languages a step further by learning about MongoDB, a NoSQL database which is becoming more and more popular in the industry.
Introduction to NoSQL Database
Configuring MongoDB
CRUD and MongoDB Shell
Indexing and Aggregation
CouchDB/Redis/HBase/Neo4J

SPECIALISATION 3: BUSINESS INTELLIGENCE / DATA ANALYTICS

Introduction to Cloud and Hive
Understand the basics of cloud and learn about the Hive Query Language.
Understanding Big Data
Introduction to Apache Hive
Hive Data Models - Partitions and Buckets
File Formats in Apache Hive
Data Analysis using Apache Hive

SQL Case Study: Analysing Big Data in Retail
Understand how a project in the industry is taken up and solved through a comprehensive business case study.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Advanced Excel
Learn the advanced concepts in Excel and start to perform data analysis like a pro!
Advanced Functions in Excel
Slicers with Pivots
Map Visualization
Data Analysis Toolpak: Descriptive Summaries, Correlation, Regression

Visualisation using Tableau
Learn advanced visualisation techniques using the most in-demand visualisation tool in the industry.
Data Exploration in Tableau
Visualising and Analysing Data in Tableau
Advanced Visualisations using Tableau

Interactive Marketing Campaign Analysis
Apply the newfound Excel and Tableau skills to solve an exciting business assignment.
Problem Statement
Evaluation Rubric
Final Submission
Solution

COURSE 4 - NATURAL LANGUAGE PROCESSING

Visualisation using PowerBI
Take your visualisation game a step forward by understanding how to operate PowerBI.
PowerBI: Introduction and Setup
Visualising and Analysing Data in PowerBI
Data Transformations using PowerBI
Making Interactive Dashboards with PowerBI

Introduction to R and RShiny
Get a brief introduction to another popular data science language, learn how to manipulate dataframes in R, and learn to create attractive dashboards and web applications using RShiny.
Introduction to R
Data Structures and Programming Constructs in R
Dataframe Manipulation in R
Visualizations in R
Creating Web Applications using RShiny

Effective Communication Strategies, Formats, and Templates
Learn how to effectively strategise, communicate, and fine-grain your data analysis projects.
Introduction to MS Powerpoint
Identifying the Right Insights
Documenting Insights
Choosing the Right View for your Dashboard
Storytelling with Visualization

Presentations to Technical and Non-Technical Stakeholders
Understand how to optimally present your findings to technical and non-technical stakeholders and upgrade your storytelling skills.
Understanding the End Goal of your Presentation
Understanding your Stakeholder and Stakeholder Empathy
Levels of Details for Different Stakeholders - CXO/Leadership Vs Team Presentations
Do's and Don'ts of Making a Presentation
Case Study: Making a Presentation for Different Stakeholders

Business Case Study
Understand how a project in the industry is taken up and solved through a comprehensive business case study.
Problem Statement
Evaluation Rubric
Final Submission
Solution

COURSE 5 - CAPSTONE PROJECT

Capstone Project
Solve an end-to-end real-life industry problem from a wide variety of domains like Marketing, Retail, E-Commerce, Supply Chain, Healthcare, BFSI, and many more.
An Overview of the Domain and Associated Concepts
Problem Statement
Evaluation Rubric
Final Submission
Solution

COURSE 3 - ADVANCED MACHINE LEARNING

Tree Models
Learn how the human decision-making process can be replicated using a decision tree and other powerful ensemble algorithms.
Introduction to Decision Trees
Algorithms for Decision Tree Construction
Truncation and Pruning
Random Forests

Time Series Analysis
In this module, you will learn how to analyse and forecast a series that varies with time.
Introduction to Time Series and its Components
Working with Stationary Time Series
End-to-End Analysis of Time Series

Time Series Assignment
Apply the concepts learnt in time series to solve a forecasting problem.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Model Selection & General ML Techniques
Learn the pros and cons of simple and complex models and the different methods for quantifying model complexity, along with general machine learning techniques like feature engineering, model evaluation, and many more.
Principles of Model Selection
Model Evaluation
Model Selection: Best Practices

Telecom Churn Case Study
Solve the most crucial business problem for a leading telecom operator in India and Southeast Asia: predicting customer churn.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Advanced SQL and Best Practices
Apply advanced SQL concepts like windowing and procedures to derive insights from data and answer pertinent business questions.
Stored functions and procedures in SQL
Conditional constructs in SQL
Query Optimisation

SPECIALISATION 4: BUSINESS ANALYTICS

Advanced Excel
Learn the advanced concepts in Excel and start to perform data analysis like a pro!
Advanced Functions in Excel
Slicers with Pivots
Map Visualization
Data Analysis Toolpak: Descriptive Summaries, Correlation, Regression

Structured Problem Solving using Frameworks
Apply the newfound Excel and Tableau skills to solve an exciting business assignment.
Introduction to Structured Problem Solving
Understanding the Business Problem - I: Interviewing and Basic Frameworks
Understanding the Business Problem - II: 5W, 5WHYS, SPIN Framework, etc.

Hypothesis Formulation
The module will equip you with a stepwise process for understanding a business problem and building hypotheses around it.
Introduction to Hypothesis Formulation
Issue Tree Framework
Specialized Frameworks: 4Ps and 5C

Assignment
Apply your learnings from the course to solve a real-life business problem.
Problem Statement
Evaluation Rubric
Final Submission
Solution

COURSE 4 - BUSINESS REQUIREMENTS

Revenue and Operational Cost Modelling
Learn about evaluation models covering revenue and operational cost modelling.
Introduction to Cost Modelling
Modelling Frameworks
Revenue and Operational Cost Modelling in Various Domains

Introduction to Economics and Financial Concepts
Understand financial concepts such as revenue, cost of goods sold, profit, balance sheets, cash flow statements, and EBITDA. Also, get a brief introduction to economics concepts including supply and demand curves, cost curves, and a lot more!
Introduction to Financial Concepts
Revenue, Balance Sheets, Cash Flow Statements
EBITDA
Introduction to Economics
Supply-Demand Curves
Basic Econometrics

Effective Communication Strategies, Formats, and Templates
Learn how to effectively strategise, communicate, and fine-grain your data analysis projects.
Introduction to MS Powerpoint
Identifying the Right Insights
Documenting Insights
Choosing the Right View for your Dashboard
Storytelling with Visualization

Presentations to Technical and Non-Technical Stakeholders
Understand how to optimally present your findings to technical and non-technical stakeholders and upgrade your storytelling skills.
Understanding the End Goal of your Presentation
Understanding your Stakeholder and Stakeholder Empathy
Levels of Details for Different Stakeholders - CXO/Leadership Vs Team Presentations
Do's and Don'ts of Making a Presentation
Case Study: Making a Presentation for Different Stakeholders

Business Case Study
Understand how a project in the industry is taken up and solved through a comprehensive business case study.
Problem Statement
Evaluation Rubric
Final Submission
Solution

COURSE 5 - CAPSTONE PROJECT

Capstone Project
Solve an end-to-end real-life industry problem from a wide variety of domains like Marketing, Retail, E-Commerce, Supply Chain, Healthcare, BFSI, and many more.
An Overview of the Domain and Associated Concepts
Problem Statement
Evaluation Rubric
Final Submission
Solution

Introduction to Hadoop and MapReduce Programming
Understand the world of distributed data processing and storage with Hadoop. Learn to write MapReduce jobs in Python.
Concepts related to distributed computing
Hadoop Distributed File System
MapReduce Programming in Python

COURSE 3 - DATA ENGINEERING - I

Data Management and Relational Database Modelling
Understand the concepts of Data Management and learn to model data for a Relational Database.
Enterprise Data Management
Relational Database Modelling
Normal Forms and ER Diagrams

NoSQL Databases and Apache HBase
Learn the concepts of NoSQL databases. Understand the workings of Apache HBase.
Concepts of NoSQL Databases
Introduction to Apache HBase
HBase Python API
Comparison of NoSQL Databases

Data Warehousing (Optional)
Understand the intricacies behind designing a data warehouse and a data lake for your use case.
Introduction to Data Warehouse and Data Lakes
Designing Data Warehousing for an ETL Data Pipeline
Designing Data Lake for an ETL Data Pipeline

SPECIALISATION 5: DATA ENGINEERING

Data Ingestion with Apache Sqoop and Apache Flume
Get familiar with the challenges involved in data ingestion. Use Sqoop and Flume to ingest structured and unstructured data into Hadoop.
Introduction to Data Ingestion
Structured data ingestion with Sqoop
Unstructured data ingestion with Flume

Assignment (Optional)
Practice the concepts learnt so far with this comprehensive assignment.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Building and Querying Data Warehouse with Apache Hive
Manage and query a data warehouse with Apache Hive. Learn to write optimized HQL for large-scale data analysis.
Fundamentals of Apache Hive
Writing HQL for Data Analysis
Hive Query Optimization

Case Study: Ingestion & Warehousing
Make use of Sqoop, Flume, Hive and HBase to design an ETL data pipeline.
Introduction and Problem Statement
Grading Rubrics and Submission

Data Processing with PySpark
Get introduced to Apache Spark, a lightning-fast big data processing engine. Use PySpark to create large-scale data processing applications.
Introduction to Apache Spark
Apache Spark Architecture
PySpark APIs - RDDs, DataFrames, SQL
Spark Job optimization

Real-Time Data Streaming with Apache Kafka
Understand the producer-consumer architecture of Apache Kafka. Learn to set up a Kafka cluster for managing real-time data.
Fundamentals of Apache Kafka
Setting up Kafka Producer and Consumer
Kafka Connect API

COURSE 4 - DATA ENGINEERING - II

Real-Time Data Processing using Spark Streaming
Learn about the real-time data processing architecture of Apache Spark. Build Spark Streaming applications to process data in real-time.
Spark Streaming Architecture
Spark Streaming APIs
Building Stream Processing Application with Spark

Assignment (Optional)
Use Kafka and Spark to develop a real-time data processing application.
Problem Statement
Evaluation Rubric
Final Submission
Solution

Building Automated Data Pipelines with Oozie/Airflow
Automate your Data Pipeline with Apache Airflow.
Fundamentals of Oozie/Airflow
Workflow Management with Oozie/Airflow
Automating an entire Data Pipeline with Oozie/Airflow

Analytics using PySpark
Use PySpark to do EDA and Predictive Analysis of large datasets.
Exploratory Data Analysis with PySpark
Predictive Analysis with Spark MLlib

Case Study: Kafka, Spark Streaming and PySpark
Build an end-to-end real-time data processing application using Spark Streaming and Kafka.
Introduction and Problem Statement
Grading Rubrics and Submission

COURSE 5 - CAPSTONE PROJECT

Capstone Project
The capstone project will stitch all the components of data engineering together.
An Overview of the Domain and Associated Concepts
Problem Statement
Evaluation Rubric
Final Submission
Solution
