
Unit 1 | Week (1 - 4)

Planning and Thinking Skills for Architecting Data Science Solutions
 10V's of data, Understanding Classification, Segmentation, Regression and Optimization (The
general tasks of a Data Scientist)
 Understanding Statistical (Discriminative and Generative), Non-Parametric (Instance Based and
Iterative) Models Graphically
 The Latest Trends: Sub-Space, Spectral, Kernel and Neural Networks

Unit 2 | Week (5 - 8)

Foundation Courses
 Data Analytics in Excel - foundation to dashboarding
 Visualization using Tableau
 Python / R Programming - coding structures, data handling, control structures, etc.

Unit 3 | Week (9 - 12)


Statistical modeling & EDA for Predictive Analytics

 Analytics Problem Solving - CRISP-DM Framework for business problem solving


 Probabilities, joint and conditional probabilities, simulations and estimations. Introduction to
Gaussian mixtures and anomaly detection
 Data types, basic probabilities, Probability distributions (Discrete and Continuous) - Bernoulli,
Binomial, Multinomial and Poisson distribution
 Describing the relationship between attributes: Covariance; Correlation; Chi-Square
 Special emphasis on Normal distribution; Central Limit Theorem
 Inferential statistics: t, F and chi-square tests
 Inferential statistics: How to learn about the population from a sample and vice versa; Sampling
distributions; Confidence Intervals, Hypothesis Testing.
 Case Study - Uber Supply Gap - summarize and visualize your solutions using Uber supply data.

Unit 4 | Week (13 - 16)

Data Pre-Processing

 Introduction to R/Python, Binning, Standardization, Normalization


 Type Conversion, Merging
 Normal Curves, Central Tendency and Outlier Detection
 Dimensionality Reduction: PCA, SVD approaches
 Handling Missing Values (K-NN, MI, Clustering etc.)
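
Illustrative sketch (not part of the original outline): the difference between the Standardization and Normalization topics above can be seen on a small list of numbers. The function names and the sample `ages` values are hypothetical, and the code uses only the Python standard library; in practice a library such as scikit-learn would typically be used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: every z-score is defined here as 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical sample data
print(standardize(ages))        # centered on 0, unit variance
print(min_max_normalize(ages))  # rescaled into [0, 1]
```

Standardization keeps the shape of the distribution but changes its location and scale, while min-max normalization bounds the values, which makes it more sensitive to outliers.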

Unit 5 | Week (17 - 20)

Data Visualization in R / Python

 Data Exploration - Histograms, Bar Chart, Box Plot, Line Graph, Scatter Plot
 Data Storytelling - The Science, ggplot, Bubble Charts with Multiple Dimensions, Gauge Charts,
Treemap, Heat Map and Motion Charts

Linear Regression
 Approach: Model Estimation, MLE & Error Function, Optimization through Gradient Descent for
finding parameters
 Constructing a Linear Regression, Diagnostics
 Interpretation and Applications
 Case Study 1 - Help a digital media company understand why their viewership is falling and
propose recommendations to increase viewership
 Case Study 2 - Create a model to understand the factors that influence car prices in the US.
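
Illustrative sketch (not part of the original outline): the "Optimization through Gradient Descent for finding parameters" topic above can be tried on a one-variable regression. The function name, learning rate, epoch count and toy data are assumptions chosen for the example; the error function is mean squared error.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training point under the current w, b.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, noise-free).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noise-free data the estimates approach the slope 2 and intercept 1 used to generate it; a learning rate that is too large would make the updates diverge instead.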


Decision Trees
 Rule Based Knowledge: Logic of Rules, Evaluating Rules, Rule Induction and Association
Rules.
 Construction of Decision Trees through Simplified Examples; Choosing the "Best" attribute at
each Non-Leaf node; Entropy; Information Gain, Gini Index, Chi Square, Regression Trees.
 Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical
Variables; other Measures of Randomness
 Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as Rules
 Oblique Decision Trees
 Case Study - Predict whether a customer will default on loan or not
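
Illustrative sketch (not part of the original outline): choosing the "best" attribute at a non-leaf node with Entropy and Information Gain, as listed above, can be worked through on a tiny table. The attribute names (`employed`, `owns_home`) and the four loan records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical loan records; the label says whether the customer defaulted.
rows = [
    {"employed": "yes", "owns_home": "yes"},
    {"employed": "yes", "owns_home": "no"},
    {"employed": "no", "owns_home": "yes"},
    {"employed": "no", "owns_home": "no"},
]
labels = ["no", "no", "yes", "yes"]

best = max(["employed", "owns_home"], key=lambda a: information_gain(rows, labels, a))
print(best)
```

Here `employed` separates the two classes perfectly, so it has the higher gain and would be chosen for the root split; `owns_home` leaves each branch as mixed as the original data.
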
Instance-Based Learning
 K-NN method, Wilson editing and triangulation
 K-NN in collaborative filtering, digit recognition

Ensembles
 Methods of Ensembling (Stacking, Mixture of Experts)
 Bagging and Random forest (Logic, Practical Applications)
 AdaBoost
 Gradient Boosting Machines

Unit 6 | Week (21 - 24)

Discriminative Statistical Models: Logistic Regression


 Why Linear Regression Fails for Classification, and the Logit Function
 Approach: Model Estimation, MLE & Error Function, Optimization through Gradient Descent for
finding parameters
 Constructing Logistic Regression, Diagnostics
 Interpretation and Applications
 Case Study 1 - Predict employee attrition in a large organization.
 Case Study 2 - Predict whether the customers will buy a life insurance policy using a large
insurer's past customer data.
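
Illustrative sketch (not part of the original outline): the logit function named above and its inverse, the logistic (sigmoid) function, show why a linear score is passed through a transformation before being read as a probability. The function names and the example weights are assumptions made for the illustration.

```python
from math import exp, log


def sigmoid(z):
    """Logistic function: maps any real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def logit(p):
    """Log-odds of a probability p in (0, 1); the inverse of sigmoid."""
    return log(p / (1.0 - p))


# A hypothetical linear score w * x + b can fall outside [0, 1] ...
w, b = 0.8, -1.0
for x in (-5.0, 0.0, 5.0):
    score = w * x + b
    # ... while the sigmoid of the same score is always a valid probability.
    print(x, score, sigmoid(score))
```

Logistic regression models the log-odds `logit(p)` as a linear function of the inputs, so the predicted `p` stays between 0 and 1 regardless of how large the inputs are.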

Time Series
 Regression on Time.
 Modeling Seasonality as Deviation
 Statistician's Approach: Components of a Time Series and Estimation Methods
 Smoothing: Moving Average, Weighted and Exponential Moving Averages
 Holt-Winters Method
 Box-Jenkins and ARIMA
 Case Study - Forecast gold prices using the past 30 years of data.
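
Illustrative sketch (not part of the original outline): the two simplest smoothing methods listed above, a moving average and simple exponential smoothing, written with the Python standard library only. The function names, the window size, the `alpha` value and the price series are hypothetical.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive values (trailing window)."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [series[0]]  # initialise with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


prices = [10, 20, 30, 20, 10]  # hypothetical series
print(moving_average(prices, window=3))
print(exponential_smoothing(prices, alpha=0.5))
```

A larger `alpha` makes the smoothed series react faster to recent values; the Holt-Winters method extends this idea with additional trend and seasonal components.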
