DATA SCIENCE COURSE CONTENT:

1. Descriptive Statistics and Probability Distributions:

• Introduction to Statistics
• Different Types of Variables
• Measures of Central Tendency with examples
• Measures of Dispersion
• Probability & Distributions
• Probability Basics
• Binomial Distribution and its properties
• Poisson Distribution and its properties
• Normal Distribution and its properties

2. Inferential Statistics and Testing of Hypothesis

• Sampling methods
• Different methods of estimation
• Testing of Hypothesis & Tests
• Analysis of Variance

3. Covariance & Correlation

->> Predictive Modeling Steps and Methodology with live example:
• Data Preparation
• Exploratory Data Analysis
• Model Development
• Model Validation
• Model Implementation

4. Supervised Techniques:
->> Multiple Linear Regression
• Linear Regression - Introduction - Applications
• Assumptions of Linear Regression
• Building Linear Regression Model
• Understanding standard metrics (Variable significance, R-square/Adjusted R-Square, Global hypothesis etc.)
• Validation of Linear Regression Models (Re-running vs. Scoring)
• Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
• Interpretation of Results - Business Validation - Implementation on new data
• Real-time case study from the Manufacturing and Telecom industries: estimating future revenue using the models
->> Logistic Regression - Introduction - Applications
• Linear Regression vs. Logistic Regression vs. Generalized Linear Models
• Building Logistic Regression Model
• Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification etc.)
• Validation of Logistic Regression Models (Re-running vs. Scoring)
• Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, drivers etc.)
• Interpretation of Results - Business Validation - Implementation on new data
• Real-time case study: predicting customer churn in the Banking and Retail industries
->> Partial Least Squares Regression
• Partial Least Squares Regression - Introduction - Applications
• Difference between Linear Regression and Partial Least Squares Regression
• Building PLS Model
• Understanding standard metrics (Variable significance, R-square/Adjusted R-Square, Global hypothesis etc.)
• Interpretation of Results - Business Validation - Implementation on new data
• Real-time example: identifying the key factors driving revenue

5. Variable Reduction Techniques

->> Factor Analysis
->> Principal Component Analysis
• Assumptions of PCA
• Working Mechanism of PCA
• Types of Rotations
• Standardization
• Positives and Negatives of PCA

6. Supervised Techniques: Classification

->> CHAID
->> CART
->> Difference between CHAID and CART
->> Random Forest
• Decision Tree vs. Random Forest
• Data Preparation
• Missing data imputation
• Outlier detection
• Handling imbalanced data
• Random Record selection
• Random Forest R parameters
• Random Variable selection
• Optimal number of variables selection
• Calculating Out-of-Bag (OOB) error rate
• Calculating Out-of-Bag predictions
->> A couple of real-time use cases from the Telecom and Retail industries: identifying churn.

7. Unsupervised Techniques:
->> Segmentation for Marketing Analysis
• Need for segmentation
• Criteria for segmentation
• Types of distances
• Clustering algorithms
• Hierarchical clustering
• K-means clustering
• Deciding number of clusters
• Case study
->> Business Rules Criteria
->> Real-time use case: identifying the most valuable revenue-generating customers.

8. Time Series Analysis:

->> Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
->> Basic Techniques
• Averages
• Smoothing etc.
->> Advanced Techniques
• AR Models
• ARIMA
• UCM
->> Hybrid Model
->> Understanding Forecasting Accuracy - MAPE, MAD, MSE etc.
->> A couple of use cases: forecasting future product sales
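
The forecasting accuracy measures named above (MAPE, MAD, MSE) can be sketched in a few lines. This is a minimal illustration in plain Python, not course material: the function name `forecast_errors` and the sales figures are hypothetical, and MAD is taken here in its forecasting sense, the mean absolute deviation of the forecast errors.

```python
def forecast_errors(actual, forecast):
    """Return MAPE (in percent), MAD and MSE for two equal-length sequences."""
    if len(actual) == 0 or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    # MAD: average size of the errors, in the units of the data.
    mad = sum(abs(e) for e in errors) / n
    # MSE: average squared error, which penalizes large misses more heavily.
    mse = sum(e * e for e in errors) / n
    # MAPE: average error as a percentage of the actual value.
    # It is undefined (division by zero) if any actual value is 0.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAPE": mape, "MAD": mad, "MSE": mse}


# Hypothetical sales figures, for illustration only.
actual = [100, 200, 400]
forecast = [110, 180, 400]
metrics = forecast_errors(actual, forecast)
print(metrics)
```

MAPE is scale-free, so it is convenient for comparing products with different sales volumes, while MAD and MSE stay in the units (or squared units) of the series.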

9. Text Analytics
->> Gathering text data from web and other sources
• Processing raw web data
• Collecting Twitter data with the Twitter API
->> Naïve Bayes Algorithm
• Assumptions of Naïve Bayes
• Processing of Text data
• Handling Standard and Text data
• Building Naïve Bayes Model
• Understanding standard model metrics
• Validation of the Models (Re-running vs. Scoring)
->> Sentiment analysis
• Goal Setting
• Text Preprocessing
• Parsing the content
• Text refinement
• Analysis and Scoring
->> Use case from the health care industry: identifying patient sentiment about a specified hospital using data extracted from Twitter.

10. Visualization Using Tableau:

->> Live connectivity from R to Tableau
->> Generating the Reports and Charts

Data Science Online Training Course Content


Module:1 – Descriptive & Inferential Statistics
1. Turning Data into Information
• Data Visualization
• Measures of Central Tendency
• Measures of Variability
• Measures of Shape
• Covariance, Correlation
• Using Software-Real Time Problems
2. Probability Distributions
• Probability Distributions: Discrete Random Variables
• Mean, Expected Value
• Binomial Random Variable
• Poisson Random Variable
• Continuous Random Variable
• Normal distribution
• Using Software-Real Time Problems
3. Sampling Distributions
• Central Limit Theorem
• Sampling Distributions for Sample Proportion, p-hat
• Sampling Distribution of the Sample Mean, x-bar
• Using Software-Real Time Problems
4. Confidence Intervals
• Statistical Inference
• Constructing confidence intervals to estimate a population Mean, Variance, Proportion
• Using Software-Real Time Problems
5. Hypothesis Testing
• Hypothesis Testing
• Type I and Type II Errors
• Decision Making in Hypothesis Testing
• Hypothesis Testing for a Mean, Variance, Proportion
• Power in Hypothesis Testing
• Using Software-Real Time Problems
6. Comparing Two Groups
• Comparing Two Groups
• Comparing Two Independent Means, Proportions
• Pairwise testing for Means
• Two Variances Test (F-Test)
• Using Software-Real Time Problems
7. Analysis of Variance (ANOVA)
• One-Way and Two-way ANOVA
• ANOVA Assumptions
• Multiple Comparisons (Tukey, Dunnett)
• Using Software-Real Time Problems
8. Association Between Categorical Variables
• Two Categorical Variables Relation
• Statistical Significance of Observed Relationship / Chi-Square Test
• Calculating the Chi-Square Test Statistic
• Contingency Table
• Using Software-Real Time Problems

Module:2 – Applied Regression Methods


1. Simple Linear Regression (SLR)
• Prerequisite Mathematics
• The Simple Linear Regression Model
• What is the Common Error Variance?
• The Coefficient of Determination
• Hypothesis Test for the Population Correlation Coefficient
• Using Software-Real Time Problems
2. SLR Model Evaluation
• Inference for the Population Intercept and Slope
• The Analysis of Variance (ANOVA) table and the F-test
• Equivalent linear relationship tests
• Decomposing the Error
• The Lack of Fit F-test
• Using Software-Real Time Problems
3. SLR Estimation & Prediction
• Confidence Interval for the Mean Response
• Prediction Interval for a New Response
• Using Software-Real Time Problems
4. SLR Model Assumptions
• Model Assumptions Diagnostics
• Using Software-Real Time Problems
5. Multiple Linear Regression (MLR)
• The Multiple Linear Regression Model
• Using Software-Real Time Problems
6. MLR Model Evaluation
• The General Linear Test
• Sequential (or Extra) Sums of Squares
• The Hypothesis Tests for the Slopes
• Partial R-squared
• Lack of Fit Testing in the Multiple Regression Setting
• Using Software-Real Time Problems
7. MLR Estimation, Prediction & Model Assumptions
• Confidence Interval for the Mean Response
• Prediction Interval for a New Response
• Model Assumptions Diagnostics
• Using Software-Real Time Problems
8. Categorical Predictors
• Coding Qualitative Variables
• Additive Effects
• Interaction Effects
• Using Software-Real Time Problems
9. Data Transformations
• Using Software-Real Time Problems
10. Model Building
• Forward Selection/Backward Elimination
• Stepwise Regression
• Adjusted R-Sq, Mallows' Cp, PRESS, AIC, BIC, SBC, AICC
• Outliers and Influential Data Points
• Cook's Distance/DFBETAS/DFFITS
• Using Software-Real Time Problems

Module:3 – Applied Time Series Analysis


1. Time Series Basics
• Overview
• ACF and AR(1) Model
2. MA Models, PACF
• Moving Average Models (MA models)
• PACF
• Using Software-Real Time Problems
3. ARIMA models
• Non-seasonal ARIMA
• Diagnostics
• Forecasting
• Using Software-Real Time Problems
4. Seasonal Models
• Seasonal ARIMA
• Identifying Seasonal Models
• Using Software-Real Time Problems
5. Smoothing and Decomposition Methods
• Decomposition Models
• Smoothing Time Series
• Using Software-Real Time Problems
6. Periodogram
• Periodogram
• Using Software-Real Time Problems
7. Regression with ARIMA errors; CCF; 2 Time Series
• Linear Regression Models with Autoregressive Errors
• CCF and Lagged Regressions
• Using Software-Real Time Problems

Module:4 – Machine Learning


1. Introduction
• Application Examples
• Supervised Learning
• Unsupervised Learning
2. Regression Shrinkage Methods
• Ridge Regression
• Lasso
• Using Software-Real Time Problems
3. Classification
• Logistic Regression
• Discriminant Analysis
• Nearest-Neighbor Methods
• Using Software-Real Time Problems
4. Tree-based Methods
• The Basics of Decision Trees
• Regression Trees
• Classification Trees
• Ensemble Methods
• Bagging, Boosting, Bootstrap, Random Forests
• Using Software-Real Time Problems
5. Neural Networks
• Introduction
• Single Layer Perceptron
• Multi-layer Perceptron
• Feed-Forward and Backpropagation
• Using Software-Real Time Problems
6. Support Vector Machine
• Support Vector Classifier
• Support Vector Machine
• SVMs with More than Two Classes
• Using Software-Real Time Problems
7. Dimension Reduction Methods
• Principal Components Regression (PCR)
• Partial Least Squares (PLS)
• Using Software-Real Time Problems
8. Association Rules
• Market Basket Analysis
• Using Software-Real Time Problems
Module:5 – SAS/R Programming
1. Base SAS
• Working with SAS program syntax
• Examining SAS data sets
• Accessing SAS libraries
• Producing Detail Reports
• Sorting and grouping report data
• Enhancing reports
• Formatting Data Values
• Creating user-defined formats
• Reading SAS Data Sets
• Customizing a SAS data set
• Handling missing data
• Manipulating Data
• Combining SAS Data Sets
• Creating Summary Reports
• Controlling Input and Output
• Summarizing Data
• Reading Raw Data Files
• Data Transformations
• Debugging Techniques
• Using the PUTLOG statement
• Processing Data Iteratively
• Restructuring a Data Set
• Creating and Maintaining Permanent Formats
2. SAS SQL
• Basic Queries
• Sub-Queries
• Joins (SQL)
• Operators
• Creating Tables and Views
• Managing Tables
3. SAS Macros
• Macro Variables
• Definitions
• Data Step and SQL Interfaces
4. R Programming
• RCMDR Package
• Rattle Package