
Post Graduate Program in Data Science and Engineering (PGP-DSE)

A WALKTHROUGH

What's In Here
1. Program Overview
2. Learning Journey
3. Pedagogy
4. Projects
PROGRAM OVERVIEW
Great Lakes PG Program in Data Science & Engineering is a 5 month
full-time classroom program that prepares young graduates and early
career professionals (typically with 0-3 years of work experience) to
acquire relevant skills and jumpstart their careers in Data Science.

The program delivery follows a bootcamp model that is intensive,
exhaustive and blends the right amount of theory and application. The
program has 16 weeks of classroom learning sessions and 4 weeks of
project work, with close to 300 hours of classroom learning sessions
and lab hours. Throughout the duration of the program, the candidates
would also be doing multiple projects which would enable them to gain
confidence and exposure to industry-like problems. The program would
conclude with the Capstone Project (in the last 7-8 weeks) delivered
by Great Lakes faculty and industry professionals from the field of
analytics.

Learning from some of India’s top ranked faculty and a pedagogy that
emphasises classroom based learning through theory and application
makes the candidates industry ready and relevant for Data Science
roles once they graduate.
INTRODUCTION TO PROGRAMMING
Python
• Conditional statements
• Loops
• User defined functions
SQL
• Data querying
• Joins
• Aggregations

STATISTICS USING PYTHON
• Summary statistics
• Inferential statistics
• Hypothesis testing
• ANOVA

EXPLORATORY DATA ANALYSIS & VISUALIZATION
• Data cleansing
• Data manipulation
• Missing value treatment
• Outlier treatment
• Building dashboards using Tableau

MACHINE LEARNING
• Regression
• Supervised learning
• Unsupervised learning

DATA SCIENCE APPLICATIONS
• Ensemble techniques
• Time series forecasting
• Text mining & sentiment analysis

CERTIFIED DATA SCIENTIST
• Capstone project

Timeline: Program starts, 2nd week, 4th week, 8th week, 12th week, 16th week, 20th week

LEARNING OUTCOMES
PEDAGOGY
The highlight of the program is the “boot camp style delivery” through
a structured learning framework, which incorporates a lab session
every day during the class hours. This ensures that the students
practically get to apply what they learn each day in the class and also
work on a mini project at the end of every week. The program
includes regular learning interventions intended for revision and this
concludes with a hackathon.

PGP-DSE uses a combination of learning methods that include
classroom teaching, self-learning through videos and reading
materials, team-based problem solving, mini projects, hackathons
and sessions with industry experts. Classes are conducted on
weekdays and assisted by online webinars, discussions, and
assignments. The capstone project is a mandatory
application-oriented industry project undertaken by all candidates to
develop the acumen to solve real-life business problems in
collaboration with their mentors. Industry experts and Great Lakes
faculty mentor the students through the entire duration of the
capstone project.
MODULES: CORE – FOUNDATION

Course: Introduction to Programming in Python
• Introduction to Python and Jupyter notebook
• Syntax
• Data types
• Basic functions in Python
• Loops
• Conditional statements
• User defined functions
• Getting help

Course: SQL Queries on RDBMS
• An Introduction to Relational Database Concepts and SQL
• Accessing Data Servers, MySQL/RDBMS Concepts
• Retrieve data from Single Tables (use of SELECT Statement) and the power of WHERE and ORDER BY Clause
• Working with Aggregate functions, grouping and summarizing Records
• Retrieve and Transform data from multiple Tables using JOINS and Unions – Introduction to Views
• Writing Subqueries
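
The SQL topics listed for this course (SELECT with WHERE and ORDER BY, aggregate functions with grouping, JOINs and subqueries) can be tried out with a short sketch like the one below. It assumes Python's built-in sqlite3 module in place of the MySQL server the course names, purely so that it runs without any setup, and the customers and orders tables are hypothetical.

```python
import sqlite3

# In-memory SQLite database. The course names MySQL; SQLite is an assumption
# made here so the sketch runs without a server. Both tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES
        (1, 'Asha', 'Chennai'), (2, 'Ravi', 'Pune'), (3, 'Meera', 'Chennai');
    INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0);
""")

# Single-table retrieval: SELECT with WHERE and ORDER BY.
chennai_names = [row[0] for row in conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Chennai",))]

# JOIN with aggregate functions and GROUP BY: order count and total per customer.
totals = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Subquery: customers who have not placed any order.
no_orders = [row[0] for row in conn.execute(
    "SELECT name FROM customers "
    "WHERE id NOT IN (SELECT customer_id FROM orders)")]

conn.close()
```

The same statements should carry over to MySQL with little change, although data types and the parameter placeholder style differ between database drivers.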

Course: Exploratory Data Analysis using Python
• Data cleaning
• Missing value treatment
• Descriptive statistics (mean, median, mode, standard deviation, correlation, covariance)
• Numpy
• Pandas
• Slicing and indexing data frames
• Data visualization using Seaborn and Matplotlib
• Box plot
• Scatter plot
• Histogram

Course: Statistical Methods for Decision Making (SMDM)
• Probability (simple, conditional and joint probability which is application focussed)
• Binomial and Normal Distribution
• CLT, Construction of confidence intervals and setting up hypothesis
• 1 sample testing, 2 sample testing for mean
• Chi square analysis
• ANOVA
MODULES: CORE – MACHINE LEARNING

Course: Supervised Learning
• Linear regression
• Naive Bayes
• KNN
• CART
• Logistic regression

Course: Unsupervised Learning
• Clustering
• Decision trees
• PCA

Course: Ensemble Techniques
• Random Forest
• Bagging
• Boosting

MODULES: DATA SCIENCE APPLICATIONS


Course: Data Visualization using Tableau
• Introduction to Data Visualization
• Introduction to Tableau
• Basic charts and dashboard
• Descriptive Statistics, Dimensions and Measures
• Visual Analytics: Storytelling through data
• Dashboard design & principles

Course: Text mining and sentiment Analysis
• Text Classification
• Bag of words analysis
• Sentiment Analysis
• Topic modelling

TOOLS & TECHNOLOGIES


• Python • Tableau • SQL
PROJECTS

1 Course : EDA using Python

Project statement: Analysis of medical records of patients to identify respiratory disorders based on their medical history.

Project Description: Descriptive statistical analysis of a medical dataset to identify the distribution of data points in the
records of individuals' lung capacity. The analysis would revolve around summary statistics and graphical exploration of the data set
using box plots and bar charts in Python.
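
As a rough illustration of the summary statistics behind a box plot, the sketch below computes the quartiles and the usual 1.5 × IQR outlier fences with Python's standard statistics module. The lung capacity readings are hypothetical and are not taken from the project dataset.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical lung capacity readings; not the project's actual data.
lung_capacity = [6.5, 7.1, 5.8, 8.2, 9.0, 7.4, 6.9, 10.1, 8.8, 14.5]

# Quartiles as drawn in a box plot (statistics.quantiles defaults to the
# 'exclusive' interpolation method; other tools may differ slightly).
q1, q2, q3 = quantiles(lung_capacity, n=4)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are conventionally flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in lung_capacity if v < lower_fence or v > upper_fence]

summary = {
    "mean": mean(lung_capacity),
    "median": median(lung_capacity),
    "std_dev": stdev(lung_capacity),
    "q1": q1,
    "q3": q3,
    "outliers": outliers,
}
print(summary)
```

In the project itself the same figures would more likely come from Pandas, with Seaborn or Matplotlib used for the plots, since those are the libraries named in the course.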

2 Course : Statistical Methods for Decision Making

Project statement: Recommendation of CEO compensation based on inferential statistical techniques using Python.

Project Description: Identify a benchmark for CEO compensation (salary package) and recommend it to the HR department for hiring
a prospective CEO candidate. This recommendation is based on the analysis of salary details collected from a market survey comprising
around 200 companies. The recommendation would be based on the analysis involving summary & inferential statistics.

Project statement: User engagement analysis on board games

Project Description: A game design company in the board games category entered the market in 2015. They are
interested in analysing the amount of time gamers spend on each type of game. The project involves analysis using the "Two Sample
Test at 95% Confidence". This activity involves data cleaning and transformation as a primary step. It also involves visualization,
modelling, hypothesis testing and the power of the test.
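
A two-sample test at 95% confidence could look like the minimal sketch below. It assumes a large-sample z-test, which is only one possible reading of "Two Sample Test"; for small samples a t-test is the usual choice. The play times and the game types (strategy, party) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_z_test(sample_a, sample_b, confidence=0.95):
    """Two-sided large-sample test of H0: both population means are equal.

    Returns the z statistic, the p-value, and whether H0 is rejected
    at the given confidence level.
    """
    standard_error = sqrt(variance(sample_a) / len(sample_a)
                          + variance(sample_b) / len(sample_b))
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < 1 - confidence

# Hypothetical minutes played per session for two types of board game.
strategy = [62, 58, 65, 70, 61, 66, 59, 68, 64, 67]
party = [41, 45, 39, 48, 44, 42, 47, 40, 43, 46]

z, p_value, reject = two_sample_z_test(strategy, party)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

Rejecting the null hypothesis here would indicate that the mean time spent differs between the two game types at the chosen confidence level.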

3 Course : Data Visualization Using Tableau

Project statement: Analysis of territorial sales of motorcycle spare parts across the globe.

Project Description: This project involves the idea of understanding what kind of data needs to be subjected to visual analytics.
It deals with visualizing numeric, categorical, multivariable and multi-dimensional data. Dashboards, score cards and infographics
based visualization mediums would be used to visualize the time and space dimensional relationships between various measures
and categories in the data set.

4 Course : Supervised Learning (Regression)

Project statement: Analysis of medical records to predict cardiovascular issues in patients. This involves building a machine learning
model (based on regression techniques) to predict the outcome based on training data sets.

Project Description: The problem requires building a regression model based on analysis of different parameters like Age,
Blood Pressure, Cholesterol, etc. and then identifying whether a linear or logistic regression model needs to be used. The model also needs
fine-tuning in order to get the best prediction model and test statistic.
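
The choice between the two models generally turns on the target: a continuous outcome suggests linear regression, while a yes/no outcome suggests logistic regression. The brochure does not say which one the project settles on. The sketch below assumes a binary outcome and fits a one-feature logistic regression by plain gradient descent; the standardised cholesterol values and labels are hypothetical.

```python
from math import exp

def sigmoid(z):
    """Map a real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example under the current weights.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

# Hypothetical standardised cholesterol readings and outcomes
# (1 = cardiovascular issue recorded, 0 = none).
cholesterol = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
has_issue = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(cholesterol, has_issue)
risk_high = sigmoid(w * 2.0 + b)   # predicted probability at a high reading
risk_low = sigmoid(w * -2.0 + b)   # predicted probability at a low reading
print(f"w = {w:.2f}, b = {b:.2f}, high = {risk_high:.2f}, low = {risk_low:.2f}")
```

A library implementation would normally replace the hand-written loop in practice; the loop is shown only to make the fitting step visible.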

5 Course : Supervised Learning (CART)

Project statement: Development of a machine learning model (based on CART) to identify profitable market segments for
cross-selling personal loans of a bank.

Project Description: A bank has executed a campaign to cross-sell Personal Loans. As part of their Pilot Campaign, 20000 customers
were sent campaigns through email, SMS, and direct mail. They were given an offer of a Personal Loan at an attractive interest rate of
12%, with the processing fee waived if they responded within 1 month. 2512 customers expressed their interest and are marked as
Target = 1. Many demographic and behavioural variables would be provided.

A model using Supervised Learning Techniques needs to be built to identify profitable segments to target for cross-selling personal
loans. Necessary assumptions have to be made appropriately.
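
To give a feel for how a CART classifier chooses its splits, the sketch below computes the Gini impurity of the campaign data as a whole (2512 responders out of 20000 customers, from the description above) and the impurity reduction from one candidate split. The split itself is hypothetical; the brochure does not list the variables in the dataset.

```python
def gini(positives, total):
    """Gini impurity of a node with a binary target."""
    if total == 0:
        return 0.0
    p = positives / total
    return 2 * p * (1 - p)

def weighted_gini(left, right):
    """Size-weighted impurity of a split; each side is (positives, total)."""
    n = left[1] + right[1]
    return (left[1] * gini(*left) + right[1] * gini(*right)) / n

# Root node: figures from the project description.
root = (2512, 20000)
response_rate = root[0] / root[1]

# Hypothetical candidate split of the same 20000 customers into two segments.
left = (1800, 6000)    # segment with a high response rate
right = (712, 14000)   # segment with a low response rate

# CART prefers the split with the largest impurity reduction.
gain = gini(*root) - weighted_gini(left, right)
print(f"response rate = {response_rate:.4f}, "
      f"root Gini = {gini(*root):.4f}, gain = {gain:.4f}")
```

Segments with a response rate well above the overall rate are the natural candidates for targeting, subject to the profitability assumptions the project asks for.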

6 Course : Time Series forecasting

Project statement: Predicting the sales for the period of October 2017 to December 2018 on 2 different types of FMCG items, based on
the sales data obtained between the period of January 2002 and September 2017.

Project Description: The problem requires a fair bit of understanding of the 2 products in order to predict their demand and how it has
fared over time, seasonally. It also needs an exploration of unexplained deviation in demand in order to minimize it. The problem needs
to be solved using Time Series modelling in order to advise the store manager regarding product stocking details.
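
A simple baseline for a forecast of this shape is the seasonal naive method, which repeats the most recent year of observations. The sketch below is such a baseline and is not the modelling approach prescribed by the project. The period lengths follow from the project statement (189 monthly observations from January 2002 to September 2017, and a 15-month horizon from October 2017 to December 2018), while the sales figures are synthetic.

```python
def seasonal_naive_forecast(history, horizon, season_length=12):
    """Forecast each future period with the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]

# January 2002 to September 2017 is 15 full years plus 9 months.
n_history = 15 * 12 + 9
# October 2017 to December 2018 is 3 months plus one full year.
n_horizon = 3 + 12

# Synthetic monthly sales with a repeating yearly pattern (index 0 = January 2002).
history = [100 + 10 * (month % 12) for month in range(n_history)]

forecast = seasonal_naive_forecast(history, n_horizon)
print(forecast)
```

A fitted time series model would be judged against a baseline like this one; if it cannot beat the repeated-season forecast, the extra complexity is hard to justify.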

VER_19_10_2018
