
Certificate Course on Data Science

Become a Data Scientist and learn Statistical Analysis, Machine Learning, Predictive Analytics, and
much more.
• Get Trained by Trainers from ISB, IIT & IIM
• 184 Hours of Intensive Classroom & Online Sessions
• 2 Capstone Live Projects
• Receive Certificate from Technology Leader - IBM
• Job Placement Assistance

Data Science Course Programme Overview


This Data Science course using Python and R follows the CRISP-DM Project Management
methodology and opens with a preliminary introduction to it. Because data science relies heavily on statistical
analysis, the opening modules introduce Statistical Data Business Intelligence and Data Visualization
techniques. Students will grapple with Plots, Inferential Statistics, and various Probability Distributions in
these modules, with a brief exposition on Exploratory Data Analysis / Descriptive Analytics in between. The core
modules commence with a focus on Hypothesis Testing and the "4" must-know hypothesis tests. Succeeding modules
cover supervised data mining using Linear Regression and Ordinary Least Squares (OLS), followed by the use of
Multiple Linear Regression to build prediction models. The theory behind Lasso and Ridge Regression, Logistic
Regression, Multinomial Regression, and Advanced Regression for Count Data is discussed in the subsequent modules.

Separate modules are devoted to unsupervised data mining, where the techniques of Clustering, Dimension
Reduction, and Association Rules are elaborated. The nitty-gritty of Recommendation Engines and Network
Analytics is detailed in the following modules. Next come Machine Learning algorithms such as the k-NN
Classifier, Decision Tree and Random Forest, Ensemble Techniques (Bagging and Boosting), AdaBoost, and Extreme
Gradient Boosting. Text Mining, Natural Language Processing, Naive Bayes, Perceptron, and Multilayer Perceptron
are the focal points of the succeeding modules.

The fundamentals of Artificial Neural Networks (ANN), Deep Learning black-box techniques like CNN and RNN,
and the SVM kernel method also feature prominently. The concluding modules cover model-driven and
data-driven algorithms for Forecasting and Time Series Analysis.

What is Data Science?

Data science is an amalgam of methods derived from Statistics, Data Analysis, and Machine
Learning that are used to extract insights from, and analyze, huge volumes of structured and unstructured data.

Who is a Data Scientist?

A Data Scientist is a researcher who has to prepare huge volumes of big data for analysis, build
complex quantitative algorithms to organize and synthesize the information, and present the findings
with compelling visualizations to senior management.

A Data Scientist enhances business decision making by introducing greater speed and better direction
to the entire process.

A Data Scientist must be a person who loves playing with numbers and figures. A strong analytical
mindset coupled with solid industry knowledge is the skill set most desired in a data scientist. They
must possess above-average communication skills and be adept at communicating technical concepts
to non-technical people. Data Scientists need a strong foundation in Statistics, Mathematics, Linear
Algebra, Computer Programming, Data Warehousing, Data Mining, and Modeling to build winning
algorithms.

They must be proficient in tools such as Python, R, RStudio, Hadoop, MapReduce, Apache Spark,
Apache Pig, Java, NoSQL databases, Cloud Computing, Tableau, and SAS.

Data Science Training Learning Outcomes

The Data Science Course using Python and R commences with an introduction to statistics,
probability, Python and R programming, and Exploratory Data Analysis. Participants will engage with
the concepts of supervised data mining with Linear Regression and Predictive Modelling with
Multiple Linear Regression. Unsupervised data mining using Clustering, Dimension Reduction, and
Association Rules is also dealt with in detail. Dedicated modules cover scripting Machine Learning
algorithms, Neural Networks and Deep Learning black-box techniques, and SVM. Learn to perform
proactive forecasting and Time Series Analysis with algorithms scripted in Python and R at the best
data science training institute in India.

• Work with various data generation sources
• Perform Text Mining to generate Customer Sentiment Analysis
• Analyse structured and unstructured data using different tools and techniques
• Develop an understanding of Descriptive and Predictive Analytics
• Apply Data-driven, Machine Learning approaches for business decisions
• Build models for day-to-day applicability
• Perform Forecasting to take proactive business decisions
• Use Data Concepts to represent data for easy understanding

Data Science Certification Course Modules


This Data Science course espouses the CRISP-DM Project Management Methodology. A primer on
statistics, data visualization, plots, Inferential Statistics, and Probability Distributions is
contained in the opening modules of the course. The subsequent modules deal with Exploratory Data
Analysis, Hypothesis Testing, and supervised data mining with Linear Regression and OLS. The
following modules focus on the various regression models: Predictive Modeling with Multiple Linear
Regression, then the merits of Lasso and Ridge Regression, Logistic Regression, Multinomial
Regression, and Advanced Regression for Count Data. Unsupervised data mining is the fulcrum of three
modules, in which Clustering, Dimension Reduction, and Association Rules are elaborated in depth with
appropriate algorithms. The workings of Recommendation Engines and the key concepts of Network
Analytics are also detailed.

This Data Science Course in India focuses on Machine Learning algorithms like the k-NN Classifier,
Decision Tree and Random Forest, Ensemble Techniques (Bagging and Boosting), AdaBoost, Extreme
Gradient Boosting, and the Naive Bayes algorithm. Text Mining and Natural Language Processing also
feature in the course curriculum. The building blocks of Neural Networks (ANN), Deep Learning black-box
techniques like CNN and RNN, and the SVM kernel method are also described in great detail. The concluding
modules cover model-driven and data-driven algorithm development for Forecasting and Time Series
Analysis. This is the most comprehensive data science course from the best data science training
institute in India.

1. CRISP-DM - Project Management Methodology


2. Exploratory Data Analytics (EDA) / Descriptive Analytics
3. Statistical Data Business Intelligence and Data Visualization
4. Plots & Inferential Statistics
5. Probability Distributions (Continuous & Discrete)
6. Hypothesis Testing - The ‘4’ Must Know Hypothesis Tests
7. Data Mining Supervised Learning – Linear Regression, OLS
8. Predictive Modelling – Multiple Linear Regression
9. Lasso and Ridge Regressions
10. Logistic Regression – Binary Value Prediction, MLE
11. Multinomial Regression
12. Advanced Regression for Count Data
13. Machine Learning - k-NN Classifier
14. Decision Tree & Random Forest
15. Ensemble Techniques - Bagging and Boosting
16. AdaBoost & Extreme Gradient Boosting
17. Text Mining and Natural Language Processing (NLP)
18. Machine Learning Classifier Technique - Naive Bayes
19. Introduction to Perceptron and Multilayer Perceptron
20. Building Blocks of Neural Network - ANN
21. Deep Learning Primer
22. Kernel Method - SVM
23. Data Mining Unsupervised Learning – Clustering
24. Data Mining Unsupervised Learning - Dimension Reduction (PCA)
25. Data Mining Unsupervised Learning - Association Rules
26. Recommendation Engine
27. Network Analytics
28. Auto Machine Learning (Auto ML)
29. Survival Analytics
30. Forecasting/Time Series – Model-Driven Algorithms
31. Forecasting/Time Series - Data-Driven Algorithms

1. CRISP-DM - Project Management Methodology


Learn how data helps organizations make informed, data-driven decisions. Data is often called the new
oil: across industries and sectors, it keeps organizations ahead of the competition. Learn how Big Data
Analytics is applied in real time, and understand the need for analytics through a use case. Also, learn
about the best project management methodology for Data Mining, CRISP-DM, at a high level.

• All About 360DigiTMG & Innodatatics Inc., USA
• Dos and Don'ts as a participant
• Introduction to Big Data Analytics
• Data and its uses – a case study (Grocery store)
• Interactive marketing using data & IoT – A case study
• Course outline, road map, and takeaways from the course
• Stages of Analytics - Descriptive, Predictive, Prescriptive, etc.
• Cross-Industry Standard Process for Data Mining

2. Exploratory Data Analytics (EDA) / Descriptive Analytics


The Data Science project management methodology, CRISP-DM, is explained in finer detail in this module.
Learn about Data Collection, Data Cleansing, Data Preparation, Data Munging, Data Wrangling, etc. Learn
about the preliminary steps taken to explore the data, known as exploratory data analysis. In this module, you
are also introduced to statistical calculations used to derive information from data, and you will begin to
understand how to perform a descriptive analysis.

• Machine Learning project management methodology
• Data Collection - Surveys and Design of Experiments
• Data Types namely Continuous, Discrete, Categorical, Count, Qualitative, Quantitative, and how to identify
and apply them
• Further classification of data in terms of Nominal, Ordinal, Interval & Ratio types
• Balanced versus Imbalanced datasets
• Cross Sectional versus Time Series vs Panel / Longitudinal Data
• Batch Processing vs Real Time Processing
• Structured versus Unstructured vs Semi-Structured Data
• Big vs Not-Big Data
• Data Cleaning / Preparation - Outlier Analysis, Missing Values Imputation Techniques, Transformations,
Normalization / Standardization, Discretization
• Sampling techniques for handling Balanced vs. Imbalanced Datasets
• What is the Sampling Funnel, what are its components, and how is it applied?
i. Population
ii. Sampling frame
iii. Simple random sampling
iv. Sample
• Measures of Central Tendency & Dispersion
• Population
• Mean/Average, Median, Mode
• Variance, Standard Deviation, Range
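
The measures of central tendency and dispersion listed above can be illustrated with a minimal Python sketch (Python being one of the two course languages). It uses only the standard-library `statistics` module; the `sales` values are hypothetical.

```python
import statistics

# Hypothetical sample of daily sales counts (illustrative values only)
sales = [12, 15, 15, 18, 20, 22, 45]

summary = {
    "mean": statistics.mean(sales),                      # central tendency
    "median": statistics.median(sales),
    "mode": statistics.mode(sales),
    "sample_variance": statistics.variance(sales),       # divides by n - 1
    "sample_std_dev": statistics.stdev(sales),
    "population_variance": statistics.pvariance(sales),  # divides by n
    "range": max(sales) - min(sales),                    # dispersion
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the single large value (45) pulls the mean (21) above the median (18), which is one reason both measures are usually reported.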

3. Statistical Data Business Intelligence and Data Visualization


Learn about various statistical calculations used to capture business moments, enabling decision makers
to make data-driven decisions. You will learn about the distribution of the data and its shape using these
calculations. Learn to interpret information by representing data visually. Also learn about
Univariate, Bivariate, and Multivariate analysis.

• Measure of Skewness
• Measure of Kurtosis
• Spread of the Data
• Various graphical techniques to understand data
i. Bar Plot
ii. Histogram
iii. Boxplot
iv. Scatter Plot
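
A minimal sketch of moment-based skewness and excess kurtosis, assuming population (divide-by-n) moments. Libraries such as pandas apply small-sample corrections, so their values can differ slightly. The data lists are hypothetical.

```python
import statistics

def moment_skewness(data):
    """Moment-based (population) skewness: the mean of cubed z-scores."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum(((x - mu) / sigma) ** 3 for x in data) / len(data)

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum(((x - mu) / sigma) ** 4 for x in data) / len(data) - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]   # long tail to the right
print(moment_skewness(symmetric), moment_skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

A positive skewness indicates a long right tail. An excess kurtosis below 0 (here -1.3 for the evenly spread list) indicates lighter tails than a normal distribution.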

4. Plots & Inferential Statistics


Data Visualization helps you spot patterns or anomalies in the data easily; learn about various
graphical representations in this module. Understand the terms univariate and bivariate and the plots used
to analyze data in two dimensions. Understand how to draw conclusions about business problems using
calculations performed on sample data. You will learn concepts for dealing with the variation that arises
when analyzing different samples from the same population, using the central limit theorem.

• Line Chart
• Pair Plot
• Sample Statistics
• Population Parameters
• Inferential Statistics

5. Probability Distributions (Continuous & Discrete)


In this tutorial you will learn in detail about continuous probability distributions. Understand the properties of a
continuous random variable and of its distribution when it follows a normal distribution. To study the properties of a
continuous random variable, statisticians standardize it, so you will learn the properties of the standard normal
variable and its distribution. You will learn to check whether a continuous random variable follows a
normal distribution using a normal Q-Q plot. Learn the science behind estimating a value for a
population using sample data.

• Random Variable and its definition
• Probability & Probability Distribution
i. Continuous Probability Distribution / Probability Density Function
ii. Discrete Probability Distribution / Probability Mass Function
• Normal Distribution
• Standard Normal Distribution / Z distribution
• Z scores and the Z table
• QQ Plot / Quantile - Quantile plot
• Sampling Variation
• Central Limit Theorem
• Sample size calculator
• Confidence interval - concept
• Confidence interval with sigma
• T-distribution / Student's t-distribution
• Confidence interval
i. Population parameter with Standard deviation known
ii. Population parameter with Standard deviation not known
• A complete recap of Statistics
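
The z-score and the sigma-known confidence interval can be sketched with the standard library's `statistics.NormalDist`, which stands in for a printed Z table. The sample values and sigma = 10 are hypothetical. When sigma is unknown, the z critical value would be replaced by a t critical value with n - 1 degrees of freedom (for example from `scipy.stats.t.ppf`), which this sketch does not cover.

```python
import math
from statistics import NormalDist, fmean

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

def ci_sigma_known(sample, sigma, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    n = len(sample)
    x_bar = fmean(sample)
    # Two-sided critical value from the standard normal (about 1.96 for 95%)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

z = z_score(130, mu=100, sigma=15)
print(z, NormalDist().cdf(z))   # Z-table lookup: area to the left of z
print(ci_sigma_known([48, 52, 49, 51], sigma=10))
```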

6. Hypothesis Testing - The ‘4’ Must Know Hypothesis Tests


Learn to frame business statements by making assumptions. Understand how to test these
assumptions to make decisions about business problems. Learn about different types of Hypothesis tests
and their test statistics. You will learn the different conditions of the Hypothesis table, namely Null Hypothesis,
Alternative Hypothesis, Type I error, and Type II error. The prerequisites for conducting a Hypothesis test and
the interpretation of the results will be discussed in this module.

• Formulating a Hypothesis
• Choosing Null and Alternative Hypothesis
• Type I or Alpha Error and Type II or Beta Error
• Confidence Level, Significance Level, Power of Test
• Comparative study of sample proportions using Hypothesis testing
• 2 Sample t-test
• ANOVA
• 2 Proportion test
• Chi-Square test
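
Of the four tests listed, the 2 Proportion test is the easiest to sketch without extra libraries. The function below is a minimal two-sided, pooled z-test that uses `statistics.NormalDist` for the p-value. The conversion counts are hypothetical, and in practice libraries such as SciPy and statsmodels provide tested implementations of these tests.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical campaign: 120 of 1000 customers converted vs 90 of 1000
z, p = two_proportion_z_test(120, 1000, 90, 1000)
alpha = 0.05
print(f"z = {z:.3f}, p = {p:.4f}, reject H0: {p < alpha}")
```

With these counts z is about 2.19 and p is about 0.03, so H0 would be rejected at a 5% significance level.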

7. Data Mining Supervised Learning – Linear Regression, OLS


Supervised data mining is all about making predictions for an unknown dependent variable using
mathematical equations that explain its relationship with the independent variables. Revisit school math with
the equation of a straight line. Learn about the components of Linear Regression with the equation of the
regression line. Get introduced to Linear Regression analysis with a use case for predicting a continuous
dependent variable. Understand the ordinary least squares (OLS) technique.

• Scatter diagram
i. Correlation analysis
ii. Correlation coefficient
• Ordinary least squares
• Principles of regression
• Simple Linear Regression
• Exponential Regression, Logarithmic Regression, Quadratic or Polynomial Regression
• Confidence Interval versus Prediction Interval
• Heteroscedasticity / Unequal Variance
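
Ordinary least squares for a single predictor has a closed form: the slope is S_xy / S_xx, and the intercept makes the line pass through the point of means. Below is a minimal pure-Python sketch. Its hypothetical data lie exactly on y = 2x + 1.

```python
import math

def ols_simple(x, y):
    """Closed-form OLS estimates for y = b0 + b1 * x, plus Pearson's r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = s_xy / s_xx                   # slope minimizing the sum of squared residuals
    b0 = y_bar - b1 * x_bar            # line passes through (x_bar, y_bar)
    r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
    return b0, b1, r

# Hypothetical data: advertising spend (x) vs. sales (y)
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1, r = ols_simple(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} * x, r = {r:.3f}")
```

Real data add residual noise, so r would fall below 1. On Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept and can serve as a cross-check.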

8. Predictive Modelling – Multiple Linear Regression


Continuing the study of regression analysis, you will learn how to deal with multiple independent
variables affecting the dependent variable. Learn about the conditions and assumptions for performing linear
regression analysis and the workarounds used when they are not met. Understand the steps required to
evaluate the model and to improve prediction accuracy. You will be introduced to the
concepts of variance and bias.

• LINE assumption
i. Linearity
ii. Independence
iii. Normality
iv. Equal Variance / Homoscedasticity
• Collinearity (Variance Inflation Factor)
• Multiple Linear Regression
• Model Quality metrics
• Deletion Diagnostics

9. Lasso and Ridge Regressions


Learn about overfitting and underfitting in the prediction models you develop. We need to strike the
right balance between the two, so learn about the regularization techniques (L1 norm and L2
norm) used to reduce these conditions. The Lasso and Ridge regression techniques
are discussed in this module.

• Understanding Overfitting (Variance) vs. Underfitting (Bias)
• Generalization error and Regularization techniques
• Different Error functions or Loss functions or Cost functions
• Lasso Regression
• Ridge Regression
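
The difference between the L2 (Ridge) and L1 (Lasso) penalties is easiest to see with a single centered predictor and no intercept, where both problems have closed-form solutions. This is a simplified sketch under those assumptions. With several predictors, Lasso needs an iterative solver such as coordinate descent. Also note that scikit-learn's `Lasso` scales the squared-error term by 1/(2n), so its `alpha` is not numerically the same as `lam` here.

```python
def _sums(x, y):
    s_xy = sum(a * b for a, b in zip(x, y))
    s_xx = sum(a * a for a in x)
    return s_xy, s_xx

def ridge_coef(x, y, lam):
    """Minimizer of 0.5*sum((y - b*x)^2) + 0.5*lam*b^2 (L2 penalty)."""
    s_xy, s_xx = _sums(x, y)
    return s_xy / (s_xx + lam)

def lasso_coef(x, y, lam):
    """Minimizer of 0.5*sum((y - b*x)^2) + lam*|b| (L1 penalty): soft-thresholding."""
    s_xy, s_xx = _sums(x, y)
    if s_xy > lam:
        return (s_xy - lam) / s_xx
    if s_xy < -lam:
        return (s_xy + lam) / s_xx
    return 0.0   # the L1 penalty sets the coefficient exactly to zero

# Hypothetical centered data (mean 0), so no intercept is needed
x = [-2, -1, 0, 1, 2]
y = [-4, -2, 0, 2, 4]
for lam in (0, 10, 25):
    print(lam, ridge_coef(x, y, lam), lasso_coef(x, y, lam))
```

As lam grows, Ridge shrinks the slope toward zero but never reaches it (20/35 at lam = 25). Lasso, by contrast, sets the slope exactly to zero once lam exceeds S_xy = 20, which is why Lasso is also used for variable selection.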

10. Logistic Regression – Binary Value Prediction, MLE


You have learnt about predicting a continuous dependent variable. In this module, you will continue
to learn Regression techniques, now applied to predict attribute (categorical) data. Learn about the principles of the logistic
regression model, understand the sigmoid curve, and use a cutoff value to interpret the probable outcome
of the logistic regression model. Learn about the confusion matrix and its parameters to evaluate the
outcome of the prediction model. Also, learn about maximum likelihood estimation.

• Principles of Logistic regression
• Types of Logistic regression
• Assumption & Steps in Logistic regression
• Analysis of Simple logistic regression results
• Multiple Logistic regression
• Confusion matrix
i. False Positive, False Negative
ii. True Positive, True Negative
iii. Sensitivity, Recall, Specificity, F1
• Receiver operating characteristics curve (ROC curve)
• Precision Recall (P-R) curve
• Lift charts and Gain charts
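
A minimal sketch of the sigmoid, the probability cutoff, and the confusion-matrix metrics listed above. The scores and labels are hypothetical; in practice the scores would come from a fitted logistic regression.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score (log-odds) to a probability."""
    return 1 / (1 + math.exp(-z))

def confusion_metrics(y_true, probs, cutoff=0.5):
    """Classify with a probability cutoff, then summarize the confusion matrix."""
    y_pred = [1 if p >= cutoff else 0 for p in probs]
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "specificity": specificity, "F1": f1}

# Hypothetical model scores (log-odds) for six customers and their true labels
scores = [2.2, 1.4, -0.8, 0.4, -1.4, -2.2]
y_true = [1, 1, 1, 0, 0, 0]
probs = [sigmoid(s) for s in scores]
print(confusion_metrics(y_true, probs, cutoff=0.5))
```

Raising the cutoff to 0.65 removes the single false positive (precision becomes 1.0) while recall stays at 2/3. ROC and P-R curves summarize this trade-off across all cutoffs.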

11. Multinomial Regression


As an extension of logistic regression, the multinomial regression technique is used to predict a categorical
outcome with more than two classes. Understand the concept of multi-logit equations, baselining, and making classifications
using probability outcomes. Learn about handling multiple categories in output variables, including nominal as well as ordinal data.

• Logit and Log-Likelihood
• Category Baselining
• Modeling Nominal categorical data
• Handling Ordinal Categorical Data
• Interpreting the results of coefficient values
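
In baseline-category (multinomial) logit models, each non-baseline class k gets its own linear equation for log(P(k) / P(baseline)), and the class probabilities follow from these logits. The sketch below assumes the logits have already been computed from fitted coefficients. The category names and values are hypothetical.

```python
import math

def baseline_category_probs(logits, baseline):
    """Turn baseline-category logits into class probabilities.

    logits maps each non-baseline class k to eta_k = log(P(k) / P(baseline)),
    where eta_k would come from a fitted equation b0_k + b1_k * x + ...
    """
    denom = 1.0 + sum(math.exp(eta) for eta in logits.values())
    probs = {k: math.exp(eta) / denom for k, eta in logits.items()}
    probs[baseline] = 1.0 / denom
    return probs

# Hypothetical travel-mode example with "car" as the baseline category
probs = baseline_category_probs({"bus": math.log(2), "train": 0.0}, baseline="car")
prediction = max(probs, key=probs.get)   # classify to the most probable category
print(probs, prediction)
```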

12. Advanced Regression for Count Data


As part of this module you will learn further regression techniques used for predicting discrete data.
These regression techniques are used to analyze numeric data known as count data. Based on
discrete probability distributions, namely the Poisson and negative binomial distributions, the regression models try to
fit the data to these distributions. When excessive zeros exist in the dependent variable, zero-inflated
models are preferred; you will learn the types of zero-inflated models used to fit data with excessive zeros.

• Poisson Regression
• Poisson Regression with Offset
• Negative Binomial Regression
• Treatment of data with Excessive Zeros
i. Zero-inflated Poisson
ii. Zero-inflated Negative Binomial
iii. Hurdle Model
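
Poisson regression uses a log link, and an offset such as log(exposure) lets counts observed over different exposure periods share one rate model. This sketch only evaluates the model for hypothetical coefficients. Fitting them would typically be done with a GLM routine (for example statsmodels' `GLM` with a Poisson family), which is not shown here.

```python
import math

def poisson_mean(b0, b1, x, exposure):
    """Expected count with log link: log(mu) = b0 + b1*x + log(exposure).

    log(exposure) is the offset: a term whose coefficient is fixed at 1, so the
    model effectively describes a rate per unit of exposure.
    """
    return math.exp(b0 + b1 * x + math.log(exposure))

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson distribution with mean mu."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

# Hypothetical coefficients: claims per policy-year, policy held for 2 years
mu = poisson_mean(b0=-1.0, b1=0.3, x=2.0, exposure=2.0)
print(mu, [round(poisson_pmf(k, mu), 4) for k in range(4)])
```

Doubling the exposure doubles the expected count while leaving the rate per unit of exposure unchanged.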

13. Machine Learning - k-NN Classifier


The k-Nearest Neighbors (k-NN) algorithm is a distance-based machine learning algorithm. Learn to classify the dependent
variable using an appropriate k value. The k-NN classifier, also known as a lazy learner, is a very popular
algorithm and one of the easiest to apply.

• Deciding the K value
• Thumb rule in choosing the K value
• Building a KNN model by splitting the data
• Checking for Underfitting and Overfitting in KNN
• Generalization and Regularization Techniques to avoid overfitting in KNN
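
The k-NN classifier has no training step beyond storing the data, which is why it is called a lazy learner. Below is a minimal sketch using Euclidean distance (`math.dist`, Python 3.8+) and a majority vote. The points and labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, nearest first
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D training data with two classes
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (2, 2)), knn_predict(train_X, train_y, (7, 8)))
```

Because the distance mixes all features, features are usually normalized first. An odd k also avoids ties in two-class problems.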

14. Decision Tree & Random Forest


Decision Tree & Random Forest are among the most powerful classifier algorithms based on classification
rules. In this tutorial, you will learn to derive the rules for classifying the dependent variable by
constructing the best tree, using statistical measures to capture the information from each of the attributes.
Random Forest is an ensemble technique constructed from multiple Decision Trees, and the final outcome is
drawn by aggregating the results obtained from these trees.

• Elements of classification tree - Root node, Child Node, Leaf Node, etc.
• Greedy algorithm
• Measure of Entropy
• Attribute selection using Information gain
• Ensemble techniques - Stacking, Boosting and Bagging
• Decision Tree C5.0 and understanding various arguments
• Checking for Underfitting and Overfitting in Decision Tree
• Generalization and Regularization Techniques to avoid overfitting in Decision Tree
• Random Forest and understanding various arguments
• Checking for Underfitting and Overfitting in Random Forest
• Generalization and Regularization Techniques to avoid overfitting in Random Forest
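
Entropy and information gain drive the greedy attribute selection described above, and both fit in a few lines. In this sketch the loan-style labels and attributes are hypothetical: one attribute separates the classes perfectly, and the other carries no information.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Entropy reduction from splitting labels by an attribute's values."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Hypothetical loan data: target label and two candidate attributes
default = ["yes", "yes", "no", "no"]
income = ["low", "low", "high", "high"]         # separates the classes perfectly
region = ["north", "south", "north", "south"]   # carries no information
print(information_gain(default, income), information_gain(default, region))
```

A greedy tree-growing algorithm would split on `income` here (gain of 1.0 bit versus 0.0 for `region`) and then repeat the procedure inside each branch.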

15. Ensemble Techniques - Bagging and Boosting


Learn about improving the reliability and accuracy of decision tree models using ensemble techniques. Bagging
and Boosting are the go-to ensemble techniques. The parallel and sequential approaches
taken in Bagging and Boosting, respectively, are discussed in this module.

• Overfitting
• Underfitting
• Pruning
• Boosting
• Bagging or Bootstrap aggregating

16. AdaBoost & Extreme Gradient Boosting


The Boosting algorithms AdaBoost and Extreme Gradient Boosting are discussed as part of this
continuation module. You will also learn about stacking methods. Learn about these algorithms, which are
known for their high accuracy and have helped many aspiring data scientists win first place in
competitions such as Kaggle, CrowdAnalytix, etc.

• AdaBoost / Adaptive Boosting Algorithm
• Checking for Underfitting and Overfitting in AdaBoost
• Generalization and Regularization Techniques to avoid overfitting in AdaBoost
• Gradient Boosting Algorithm
• Checking for Underfitting and Overfitting in Gradient Boosting
• Generalization and Regularization Techniques to avoid overfitting in Gradient Boosting
• Extreme Gradient Boosting (XGB) Algorithm
• Checking for Underfitting and Overfitting in XGB
• Generalization and Regularization Techniques to avoid overfitting in XGB
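
Boosting trains learners sequentially and re-weights the data so that the next learner focuses on earlier mistakes. The sketch below shows one round of the classic discrete AdaBoost weight update. The labels and weak-learner predictions are hypothetical. Gradient boosting and XGBoost use a different, gradient-based update, which is not shown.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost weight update for labels in {-1, +1}.

    Assumes the weak learner's weighted error is strictly between 0 and 0.5.
    """
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    alpha = 0.5 * math.log((1 - err) / err)      # the weak learner's vote weight
    # Misclassified points (t * p = -1) are up-weighted, correct ones down-weighted
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]

# Hypothetical weak learner that misclassifies only the last of four samples
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]
alpha, new_weights = adaboost_round([0.25] * 4, y_true, y_pred)
print(round(alpha, 4), [round(w, 4) for w in new_weights])
```

After one round, the single misclassified sample carries half of the total weight (0.5), and each correctly classified sample carries 1/6.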

17. Text Mining and Natural Language Processing (NLP)

Learn to analyse unstructured textual data to derive meaningful insights. Understand the quirks of language
to perform data cleansing, extract features using a bag of words, and construct the document-term
matrix (DTM). Learn to understand the sentiment of customers from their feedback to take appropriate
actions. Advanced text mining concepts that help interpret the context of the raw text data will also be
discussed. Topic models using the LDA algorithm and emotion mining using lexicons are discussed as part of
the NLP module.

• Sources of data
• Bag of words
• Pre-processing, corpus, Document Term Matrix (DTM) & Term Document Matrix (TDM)
• Word Clouds
• Corpus level word clouds
i. Sentiment Analysis
ii. Positive Word clouds
iii. Negative word clouds
iv. Unigram, Bigram, Trigram
• Semantic network
• Clustering
• Extract user reviews of the product/services from Amazon, Snapdeal and TripAdvisor
• Install Libraries from Shell
• Extraction and text analytics in Python
• LDA / Latent Dirichlet Allocation
• Topic Modelling
• Sentiment Extraction
• Lexicons & Emotion Mining
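
A bag-of-words document term matrix (DTM) can be built with the standard library alone: tokenize, drop stop words, and count terms per document. The reviews and stop-word list in this sketch are hypothetical. The TDM is simply the transpose of this matrix, and dedicated text-mining libraries add stemming, lemmatization, and sparse storage.

```python
import re
from collections import Counter

def tokenize(text, stop_words=frozenset()):
    """Lower-case, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in stop_words]

def document_term_matrix(docs, stop_words=frozenset()):
    """Bag-of-words counts: one row per document, one column per vocabulary term."""
    counts = [Counter(tokenize(d, stop_words)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]

def ngrams(tokens, n):
    """Consecutive n-word sequences, e.g. n=2 gives bigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical product reviews
reviews = ["Great product, great price!", "Poor product and the price is high"]
vocab, dtm = document_term_matrix(reviews, stop_words={"and", "the", "is"})
print(vocab)
print(dtm)
print(ngrams(tokenize(reviews[0]), 2))
```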

18. Machine Learning Classifier Technique - Naive Bayes


Revisit Bayes' theorem to develop a classification technique for Machine Learning. In this tutorial you will
learn about joint probability and its applications. Learn how to predict whether an incoming email is spam
or ham (legitimate). Learn about Bayesian probability and its applications in solving complex business
problems.

• Probability – Recap
• Bayes Rule
• Naïve Bayes Classifier
• Text Classification using Naive Bayes
• Checking for Underfitting and Overfitting in Naive Bayes
• Generalization and Regularization Techniques to avoid overfitting in Naive Bayes
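
Below is a minimal multinomial Naive Bayes spam filter with Laplace (add-one) smoothing. It works in log space to avoid numeric underflow. The four training messages are hypothetical. Words never seen in training (such as "a") are simply smoothed rather than added to the vocabulary, which is a simplifying assumption.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = set().union(*word_counts.values())
    return class_counts, word_counts, vocab

def predict_naive_bayes(model, doc):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / n_docs)          # log P(class)
        denom = sum(word_counts[label].values()) + len(vocab)   # add-one smoothing
        for word in doc.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical tiny training set
docs = ["win cash prize now", "cheap prize win", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, "win a prize"), predict_naive_bayes(model, "meeting notes"))
```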

19. Introduction to Perceptron and Multilayer Perceptron


The Perceptron algorithm is based on a model of the biological brain. You will learn about the parameters used in the
perceptron algorithm, which is the foundation for developing more complex neural network models for AI applications.
Understand the application of perceptron algorithms to classify binary data in a linearly separable scenario.

• Neurons of a Biological Brain
• Artificial Neuron
• Perceptron
• Perceptron Algorithm
• Use case to classify linearly separable data
• Multilayer Perceptron to handle non-linear data
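
The following minimal sketch shows the perceptron learning rule with a step activation, trained on the logical AND function, which is linearly separable. The learning rate and epoch count are arbitrary choices for this example.

```python
def train_perceptron(X, y, lr=1.0, epochs=20):
    """Rosenblatt perceptron with a step activation; labels are 0 or 1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b   # weighted sum
            error = target - (1 if activation >= 0 else 0)            # -1, 0 or +1
            w = [wj + lr * error * xj for wj, xj in zip(w, xi)]       # changes only on mistakes
            b += lr * error
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

# Logical AND is linearly separable, so the perceptron converges on it
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

Trained on XOR labels ([0, 1, 1, 0]), the same function cannot classify all four points correctly, because no single straight line separates them. The Multilayer Perceptron addresses this limitation.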

20. Building Blocks of Neural Network - ANN


Neural Networks are black-box techniques used to build deep learning models. Learn the logic of training and
weight calculations using various parameters and their tuning. Understand the activation functions and
integration functions used in developing a neural network.

• Integration functions
• Activation functions
• Weights
• Bias
• Learning Rate (eta) - Shrinking Learning Rate, Decay Parameters
• Error functions - Entropy, Binary Cross Entropy, Categorical Cross Entropy, KL Divergence, etc.
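
A single artificial neuron combines an integration function (the weighted sum of its inputs plus a bias) with an activation function. Training then minimizes an error function such as binary cross entropy. Here is a minimal sketch with hypothetical weights:

```python
import math

def weighted_sum(inputs, weights, bias):
    """Integration function: combine inputs into a single net input."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean BCE loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

# A single hypothetical neuron: net input -> activation
z = weighted_sum([1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
print(z, sigmoid(z), relu(z), math.tanh(z))
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```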

21. Deep Learning Primer

• Artificial Neural Networks
• ANN Structure
• Error Surface
• Gradient Descent Algorithm
• Backward Propagation
• Network Topology
• Principles of Gradient Descent (Manual Calculation)
• Learning Rate (eta)
• Batch Gradient Descent
• Stochastic Gradient Descent
• Minibatch Stochastic Gradient Descent
• Optimization Methods: Adagrad, Adadelta, RMSprop, Adam
• Convolutional Neural Network (CNN)
i. ImageNet Challenge – Winning Architectures
ii. Parameter Explosion with MLPs
iii. Convolution Networks
• Recurrent Neural Network
i. Language Models
ii. Traditional Language Model
iii. Disadvantages of MLP
iv. Back Propagation Through Time
v. Long Short-Term Memory (LSTM)
vi. Gated Recurrent Unit (GRU)
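
Batch, stochastic, and minibatch gradient descent differ only in how many samples are used to estimate each gradient step. The sketch below fits a one-feature linear model by minimizing mean squared error, with `batch_size` switching between the three variants. The data, learning rate, and epoch count are hypothetical choices. Deep learning frameworks apply the same idea to network weights via backpropagation, with optimizers such as Adam adapting the step size.

```python
import random

def minibatch_gd(x, y, lr=0.05, epochs=500, batch_size=2, seed=0):
    """Fit y = w*x + b by minimizing mean squared error with minibatch gradient descent.

    batch_size=len(x) gives batch gradient descent; batch_size=1 gives SGD.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(x)))
    for _ in range(epochs):
        rng.shuffle(order)                          # new minibatches each epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            residuals = [(w * x[i] + b) - y[i] for i in batch]
            grad_w = 2 * sum(r * x[i] for r, i in zip(residuals, batch)) / len(batch)
            grad_b = 2 * sum(residuals) / len(batch)
            w -= lr * grad_w                        # step against the gradient
            b -= lr * grad_b
    return w, b

# Hypothetical noise-free data from y = 2x + 1
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [2 * xi + 1 for xi in x]
print(minibatch_gd(x, y, batch_size=len(x)))   # batch gradient descent
print(minibatch_gd(x, y, batch_size=2))        # minibatch SGD
```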

22. Kernel Method - SVM


• Support Vector Machines / Large-Margin / Max-Margin Classifier
• Hyperplanes
• Best Fit "boundary"
• Linear Support Vector Machine using Maximum Margin
• SVM for Noisy Data
• Non-Linear Space Classification
• Non-Linear Kernel Tricks
i. Linear Kernel
ii. Polynomial
iii. Sigmoid
iv. Gaussian RBF
• SVM for Multi-Class Classification
i. One vs. All
ii. One vs. One
• Directed Acyclic Graph (DAG) SVM
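
The kernels listed above are similarity functions on pairs of points. This sketch defines them directly and checks the kernel trick for the degree-2 polynomial kernel: (x . z)^2 equals the dot product of an explicit 3-dimensional feature map, yet it is computed without building that map. The parameter defaults (gamma, degree, c) are arbitrary. An SVM library such as scikit-learn's `SVC` takes the kernel as a parameter rather than requiring hand-written functions.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, c=1.0):
    return (linear_kernel(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF: exp(-gamma * ||x - z||^2); equals 1 when x == z."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def sigmoid_kernel(x, z, alpha=0.01, c=0.0):
    return math.tanh(alpha * linear_kernel(x, z) + c)

def phi_degree2(v):
    """Explicit feature map whose dot product equals (x . z)^2 for 2-D inputs."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))
# Kernel trick: the same value without building phi explicitly
print(polynomial_kernel(x, z, degree=2, c=0.0), linear_kernel(phi_degree2(x), phi_degree2(z)))
```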

23. Data Mining Unsupervised Learning – Clustering


Unsupervised data mining techniques are used as EDA techniques to derive insights from business
data. In this first module of unsupervised learning, get introduced to clustering algorithms. Learn about
different approaches for segregating data into homogeneous groups. Hierarchical clustering and K-means
clustering are the most commonly used clustering algorithms. Understand the different mathematical
approaches to perform data segregation. Also learn about variations of K-means clustering like K-medoids and
K-modes, and learn to handle large data sets using the CLARA technique.

• Hierarchical
• Supervised vs Unsupervised learning
• Data Mining Process
• Hierarchical Clustering / Agglomerative Clustering
• Dendrogram
• Measure of distance
• Numeric
i. Euclidean, Manhattan, Mahalanobis
• Categorical
i. Binary Euclidean
ii. Simple Matching Coefficient
iii. Jaccard's Coefficient
• Mixed
i. Gower's General Dissimilarity Coefficient
• Types of Linkages
i. Single Linkage / Nearest Neighbour
ii. Complete Linkage / Farthest Neighbour
iii. Average Linkage
iv. Centroid Linkage
• K-Means Clustering
i. Measurement metrics of clustering
   • Within Sum of Squares
   • Between Sum of Squares
   • Total Sum of Squares
ii. Choosing the ideal K value using Scree Plot / Elbow Curve
iii. Other Clustering Techniques
   i. K-Medians
   ii. K-Medoids
   iii. K-Modes
   iv. Clustering Large Application (CLARA)
   v. Partitioning Around Medoids (PAM)
   vi. Density-based spatial clustering of applications with noise (DBSCAN)
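
Below is a minimal sketch of K-means (Lloyd's algorithm). It also returns the within-cluster sum of squares used to draw the elbow curve. Initial centroids are sampled at random from the data, a simplifying assumption (library implementations typically use smarter seeding such as k-means++). The points are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # random initial centroids
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:             # converged: assignments are stable
            break
        centroids = new_centroids
    # Within-cluster sum of squares (WSS), the quantity plotted in an elbow curve
    wss = sum(math.dist(p, centroids[j]) ** 2 for j, cl in enumerate(clusters) for p in cl)
    return centroids, clusters, wss

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]   # hypothetical
for k in (1, 2, 3):
    print(k, round(kmeans(points, k)[2], 2))   # WSS for each k
```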

24. Data Mining Unsupervised Learning - Dimension Reduction (PCA)


Learn to handle high-dimensional data. Performance suffers when the data has a high number of dimensions,
and training machine learning models becomes very complex. As part of this module you will learn to apply
data reduction techniques without deleting any variables. Learn the advantages of dimension reduction
techniques. Also, learn about yet another technique called Factor Analysis.

• Why Dimension Reduction
• Advantages of PCA
• Calculation of PCA weights
• 2D Visualization using Principal components
• Basics of Matrix Algebra
• Factor Analysis
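
The first principal component is the leading eigenvector of the covariance matrix, and its weights project each observation onto a single new axis. This sketch finds it by power iteration in pure Python, assuming the starting vector is not orthogonal to that eigenvector. Real work would use an eigendecomposition or SVD routine such as `numpy.linalg.eigh` or scikit-learn's `PCA`. The data are hypothetical.

```python
import math

def first_principal_component(data, iters=100):
    """Leading eigenvector of the sample covariance matrix via power iteration.

    Assumes the all-ones starting vector is not orthogonal to that eigenvector.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]                  # re-normalize each step
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]  # PC1 scores
    return v, eigenvalue, scores

# Hypothetical 2-D data lying close to the line y = x
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
pc1, var1, scores = first_principal_component(data)
print(pc1, var1, scores)
```

The eigenvalue is the variance captured by the component, so dividing it by the total variance gives the share of information retained when only PC1 is kept.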

25. Data Mining Unsupervised Learning - Association Rules


Learn to measure the relationship between entities. Bundle offers are defined based on this measure of
dependency between products. Understand the metrics Support, Confidence, and Lift used to define
rules with the help of the Apriori algorithm. Learn the pros and cons of each of the metrics used in Association Rules.

• What is Market Basket / Affinity Analysis
• Measure of Association
i. Support
ii. Confidence
iii. Lift Ratio
• Apriori Algorithm
• Sequential Pattern Mining
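
Support, confidence, and lift for a single rule can be computed directly from the transactions. The baskets below are hypothetical. The Apriori algorithm's contribution is to avoid evaluating every possible itemset by discarding those below a minimum support.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)
    support = count_ac / n                  # P(A and C)
    confidence = count_ac / count_a         # P(C | A)
    lift = confidence / (count_c / n)       # P(C | A) / P(C); > 1 means positive association
    return support, confidence, lift

# Hypothetical grocery baskets
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(rule_metrics(baskets, {"bread"}, {"milk"}))
```

Here the rule bread -> milk has 67% confidence, yet its lift is below 1 (8/9) because milk appears in 75% of all baskets anyway. This is one of the pitfalls of relying on confidence alone.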

26. Recommendation Engine


Personalized recommendations in e-commerce are based on previous transactions.
Learn the science of making these recommendations by measuring the similarity between customers. The
various methods applied for collaborative filtering, their pros and cons, and the SVD method used by Netflix
for movie recommendations will be discussed as part of this module.

• User-based Collaborative Filtering
• A measure of distance/similarity between users
• Driver for Recommendation
• Computation Reduction Techniques
• Search based methods/Item to Item Collaborative Filtering
• SVD in recommendation
• The vulnerability of recommendation systems
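
Below is a minimal sketch of user-based collaborative filtering: cosine similarity between users, followed by a similarity-weighted average of other users' ratings. The ratings are hypothetical. Computing the similarity only over co-rated items is one of several common conventions.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two users' rating dicts over co-rated items."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

# Hypothetical movie ratings on a 1-5 scale
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob":   {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
print(round(predict_rating(ratings, "alice", "m3"), 2))
```

Alice's ratings follow Bob's pattern exactly, so her predicted rating for m3 (about 3.2) leans toward Bob's 4 rather than Carol's 2.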

27. Network Analytics


The study of a network with quantifiable values is known as network analytics. Vertices (nodes) and edges
(connections) make up a network; learn about the statistics used to calculate the value of each node in the
network. You will also learn about Google's PageRank algorithm as part of this module.

• Definition of a network (the LinkedIn analogy)
• The measure of Node strength in a Network
i. Degree centrality
ii. Closeness centrality
iii. Eigenvector centrality
iv. Adjacency matrix
v. Betweenness centrality
vi. Clustering coefficient
• Introduction to Google PageRank
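
Degree centrality and PageRank can both be computed without a graph library. The PageRank function below uses power iteration with a damping factor of 0.85 (a commonly used default). It spreads the rank of pages that have no outgoing links evenly across all pages. The graphs are hypothetical, and libraries such as NetworkX provide these and the other centrality measures listed.

```python
def degree_centrality(adjacency):
    """Share of the other nodes each node is directly connected to (undirected graph)."""
    n = len(adjacency)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}

def pagerank(links, damping=0.85, iters=100):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}     # random-jump share
        for v in nodes:
            targets = links[v] or nodes                 # dangling page: spread to all pages
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        rank = new
    return rank

# Hypothetical graphs
friends = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
web = {"A": ["C"], "B": ["C"], "C": ["A"]}
print(degree_centrality(friends))
print({page: round(score, 3) for page, score in pagerank(web).items()})
```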

28. Auto Machine Learning (Auto ML)


• AutoML Methods
• AutoML Systems
• AutoML on Cloud - AWS
i. Amazon SageMaker
ii. SageMaker Notebook Instance for Model Development, Training and Deployment
iii. XGBoost Classification Model
iv. Hyperparameter tuning jobs
• AutoML on Cloud - Azure
i. Workspace
ii. Environment
iii. Compute Instance
iv. Automatic Featurization
v. AutoML and ONNX
• AutoML on Cloud - GCP
i. AutoML Natural Language: Performing Document Classification
ii. Performing Sentiment Analysis using AutoML Natural Language API
iii. Cloud ML Engine and Its Components
iv. Training and Deploying Applications on Cloud ML Engine
v. Choosing the Right Cloud ML Engine for Training Jobs

29. Survival Analytics


The Kaplan-Meier method and life tables are used to estimate the time before an event occurs. Survival analysis
is about analyzing this duration, or time before the event. Real-world applications of survival analysis in
customer churn, medical sciences, and other sectors are discussed as part of this module. Learn how survival
analysis techniques can be used to understand the effect of features on the event using the Kaplan-Meier
survival plot.

• Examples of Survival Analysis
• Time to event
• Censoring
• Survival, Hazard, Cumulative Hazard Functions
• Introduction to Parametric and non-parametric functions
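
At each observed event time, the Kaplan-Meier estimator multiplies the running survival probability by the fraction of at-risk subjects who survive that time. Censored subjects leave the risk set without counting as events. Here is a minimal sketch with hypothetical churn durations (in months):

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve as a list of (time, S(t)) at each event time.

    events[i] is 1 if the event (e.g. churn) was observed, 0 if censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:    # group ties at time t
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk         # conditional survival past t
            curve.append((t, survival))
        at_risk -= deaths + censored                 # both leave the risk set after t
    return curve

# Hypothetical customer tenures in months (0 = still active, i.e. censored)
months = [2, 3, 3, 5, 6, 8]
churned = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(months, churned):
    print(t, round(s, 3))
```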

30. Forecasting/Time Series – Model-Driven Algorithms


Time series analysis is performed on data collected over time, where the response variable
is affected by time. Understand the time series components (Level, Trend, Seasonality, Noise) and methods
to identify them in time series data. This module introduces the different forecasting methods available for
estimating the response variable, depending on whether or not the past is expected to resemble the future.
In this first module of forecasting, you will learn the application of Model-based forecasting techniques.

• Introduction to time series data
• Steps to forecasting
• Components of time series data
• Scatter plot and Time Plot
• Lag Plot
• ACF - Auto-Correlation Function / Correlogram
• Visualization principles
• Naïve forecast methods
• Errors in the forecast and its metrics - ME, MAD, MSE, RMSE, MPE, MAPE
• Model-Based approaches
i. Linear Model
ii. Exponential Model
iii. Quadratic Model
iv. Additive Seasonality
v. Multiplicative Seasonality
• Model-Based approaches Continued
• AR (Auto-Regressive) model for errors
• Random walk
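
The naive forecast (the next value equals the last observed value) is a common benchmark, and the error metrics listed above compare any forecast with the actual values. Below is a minimal sketch with hypothetical monthly sales. Here MAD is taken to mean the mean absolute deviation of the forecast errors (also called MAE), and MPE and MAPE assume no actual value is zero.

```python
import math

def forecast_errors(actual, forecast):
    """Common accuracy metrics; each error is actual minus forecast."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "ME": sum(errors) / n,                                  # bias
        "MAD": sum(abs(e) for e in errors) / n,                 # mean absolute deviation of errors
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

# Hypothetical monthly sales; naive forecast = previous month's actual value
sales = [100, 110, 120, 130]
naive = sales[:-1]          # forecasts for months 2..4
print(forecast_errors(sales[1:], naive))
```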

31. Forecasting/Time Series - Data-Driven Algorithms


In this continuation module of forecasting, learn about data-driven forecasting techniques. Learn
about ARMA and ARIMA models, which combine model-based and data-driven techniques. Understand
smoothing techniques and their variations. Get introduced to the concepts of de-trending and
de-seasonalizing data to make it stationary. You will learn about seasonal index calculations, which are
used to re-seasonalize the results obtained from smoothing models.

• ARMA (Auto-Regressive Moving Average), Order p and q
• ARIMA (Auto-Regressive Integrated Moving Average), Order p, d, and q
• A data-driven approach to forecasting
• Smoothing techniques
i. Moving Average
ii. Exponential Smoothing
iii. Holt's / Double Exponential Smoothing
iv. Winters / Holt-Winters
• De-seasoning and de-trending
• Econometric Models
• Forecasting using Python
• Forecasting using R
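
The smoothing techniques listed above can be written as short recursions. The series and smoothing constants in this sketch are hypothetical. Winters' method would add a third, seasonal component, and libraries such as statsmodels offer fitted versions that also estimate the constants.

```python
def moving_average_forecast(series, window):
    """Each forecast is the mean of the previous `window` observations."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series) + 1)]

def simple_exponential_smoothing(series, alpha):
    """Level update: new level = alpha * observation + (1 - alpha) * old level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level                      # flat forecast for every future period

def holt_forecast(series, alpha, beta, horizon=1):
    """Holt's double exponential smoothing: a level plus a trend term."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend

sales = [10, 12, 14, 16]   # hypothetical series with a steady trend
print(moving_average_forecast(sales, 2))
print(simple_exponential_smoothing(sales, 0.5))
print(holt_forecast(sales, 0.5, 0.5))
```

On this steadily rising series, simple exponential smoothing lags behind (14.25 against a last value of 16), while Holt's trend term projects the next value as 18.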
