Become a Data Scientist and learn Statistical Analysis, Machine Learning, Predictive Analytics, and
much more.
Get Trained by Trainers from ISB, IIT & IIM
184 Hours of Intensive Classroom & Online Sessions
2 Capstone Live Projects
Receive Certificate from Technology Leader - IBM
Job Placement Assistance.
A separate module is devoted to Data Mining Unsupervised Learning, where the techniques of
Clustering, Dimension Reduction, and Association Rules are elaborated. Recommendation Engines and
Network Analytics are covered in detail in the following modules. Machine Learning algorithms
follow next: k-NN Classifier, Decision Tree and Random Forest, Ensemble Techniques (Bagging and
Boosting), AdaBoost, and Extreme Gradient Boosting. Text Mining, Natural Language Processing,
Naive Bayes, Perceptron, and Multilayer Perceptron are the focal points of the succeeding modules.
The fundamentals of Artificial Neural Networks (ANN), Deep Learning architectures such as CNN and
RNN, and Black Box techniques such as SVM also feature prominently. The concluding modules contain
model-driven and data-driven algorithms for Forecasting and Time Series Analysis.
Data Science is an amalgam of methods derived from Statistics, Data Analysis, and Machine
Learning that are used to extract and analyze huge volumes of structured and unstructured data.
A Data Scientist is a researcher who has to prepare huge volumes of big data for analysis, build
complex quantitative algorithms to organize and synthesize the information, and present the findings
with compelling visualizations to senior management.
A Data Scientist enhances business decision making by introducing greater speed and better direction
to the entire process.
A Data Scientist must be a person who loves working with numbers and figures. A strong analytical
mindset coupled with strong industry knowledge is the skill set most desired in a Data Scientist.
They must possess above-average communication skills and be adept at explaining technical concepts
to non-technical people. Data Scientists need a strong foundation in Statistics, Mathematics,
Linear Algebra, Computer Programming, Data Warehousing, Data Mining, and Modeling to build
winning algorithms.
They must be proficient in tools and technologies such as Python, R, RStudio, Hadoop, MapReduce,
Apache Spark, Apache Pig, Java, NoSQL databases, Cloud Computing, Tableau, and SAS.
Data Science Training Learning Outcomes
The Data Science Course using Python and R commences with an introduction to Statistics,
Probability, Python and R programming, and Exploratory Data Analysis. Participants will engage with
the concepts of Data Mining Supervised Learning with Linear Regression and Predictive Modelling with
Multiple Linear Regression techniques. Data Mining Unsupervised Learning using Clustering, Dimension
Reduction, and Association Rules is also dealt with in detail. A module is dedicated to scripting
Machine Learning algorithms and to Deep Learning and Neural Networks, along with Black Box
techniques such as SVM. Learn to perform proactive Forecasting and Time Series Analysis with
algorithms scripted in Python and R at the best data science training institute in India.
Work with various data generation sources
Perform Text Mining to generate Customer Sentiment Analysis
Analyse structured and unstructured data using different tools and techniques
Develop an understanding of Descriptive and Predictive Analytics
Apply Data-driven, Machine Learning approaches for business decisions
Build models for day-to-day applicability
Perform Forecasting to take proactive business decisions
Use Data Concepts to represent data for easy understanding
This Data Science Course in India focuses on Machine Learning algorithms like k-NN Classifier,
Decision Tree and Random Forest, Ensemble Techniques (Bagging and Boosting), AdaBoost, Extreme
Gradient Boosting, and the Naive Bayes algorithm. Text Mining and Natural Language Processing also
feature in the course curriculum. The building blocks of Artificial Neural Networks (ANN), Deep
Learning architectures like CNN and RNN, and Black Box techniques like SVM are also described in
great detail. The concluding modules include model-driven and data-driven algorithm development for
Forecasting and Time Series Analysis. This is the most comprehensive data science course from the
best data science training institute in India.
Measure of Skewness
Measure of Kurtosis
Spread of the Data
Various graphical techniques to understand data
i. Bar Plot
ii. Histogram
iii. Boxplot
iv. Scatter Plot
v. Line Chart
vi. Pair Plot
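As an illustration of the skewness and kurtosis measures listed above, here is a minimal sketch in Python using the moment-based definitions. The function names and the sample data are hypothetical and chosen only for this example; statistical packages may apply bias corrections that give slightly different values.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5 (zero for symmetric data)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (zero for a normal distribution)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0


# Hypothetical sample data for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

A positive skewness indicates a longer right tail, and a negative excess kurtosis indicates lighter tails than a normal distribution.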
Sample Statistics
Population Parameters
Inferential Statistics
Formulating a Hypothesis
Choosing Null and Alternative Hypothesis
Type I or Alpha Error and Type II or Beta Error
Confidence Level, Significance Level, Power of Test
Comparative study of sample proportions using Hypothesis testing
2 Sample t-test
ANOVA
2 Proportion test
Chi-Square test
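The 2 Proportion test named above can be sketched with the Python standard library alone. This is a minimal example assuming a pooled two-sided z-test; the function name and the counts are hypothetical, and the 0.05 significance level is only a common convention.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for H0: the two population proportions are equal."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both samples share one proportion, so pool them.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 of 100 successes in group A, 45 of 100 in group B.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
alpha = 0.05  # significance level = accepted probability of a Type I error
reject_null = p_value < alpha
print(z, p_value, reject_null)
```

Rejecting the null hypothesis when it is actually true is the Type I (alpha) error; failing to reject it when it is false is the Type II (beta) error.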
Scatter diagram
i. Correlation analysis
ii. Correlation coefficient
Ordinary least squares
Principles of regression
Simple Linear Regression
Exponential Regression, Logarithmic Regression, Quadratic or Polynomial Regression
Confidence Interval versus Prediction Interval
Heteroscedasticity / Unequal Variance
LINE assumption
i. Linearity
ii. Independence
iii. Normality
iv. Equal Variance / Homoscedasticity
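To make Ordinary Least Squares and Simple Linear Regression from the list above concrete, the following is a minimal sketch of the closed-form fit for one predictor. The function names and the data points are hypothetical; in practice a statistics library would normally be used.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data that lies close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_simple_ols(x, y)
print(intercept, slope, r_squared(x, y, intercept, slope))
```

The residuals from such a fit are what the LINE assumptions are checked against.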
Collinearity (Variance Inflation Factor)
Multiple Linear Regression
Model Quality metrics
Deletion Diagnostics
Poisson Regression
Poisson Regression with Offset
Negative Binomial Regression
Treatment of data with Excessive Zeros
i. Zero-inflated Poisson
ii. Zero-inflated Negative Binomial
iii. Hurdle Model
Elements of classification tree - Root node, Child Node, Leaf Node, etc.
Greedy algorithm
Measure of Entropy
Attribute selection using Information gain
Ensemble techniques - Stacking, Boosting and Bagging
Decision Tree C5.0 and understanding various arguments
Checking for Underfitting and Overfitting in Decision Tree
Generalization and Regularization Techniques to avoid overfitting in Decision Tree
Random Forest and understanding various arguments
Checking for Underfitting and Overfitting in Random Forest
Generalization and Regularization Techniques to avoid overfitting in Random Forest
Overfitting
Underfitting
Pruning
Boosting
Bagging or Bootstrap aggregating
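The Measure of Entropy and attribute selection using Information gain listed above can be illustrated with a short sketch. The function names and the toy weather data are hypothetical; a greedy tree builder would pick the attribute with the highest gain at each split.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy data: 'outlook' separates the classes perfectly, 'windy' does not.
outlook = ["sunny", "sunny", "rain", "rain"]
windy = ["yes", "no", "yes", "no"]
play = ["no", "no", "yes", "yes"]

print(information_gain(outlook, play))
print(information_gain(windy, play))
```

Here the split on outlook removes all uncertainty while the split on windy removes none, so a greedy algorithm would choose outlook for the root node.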
Sources of data
Bag of words
Pre-processing, corpus, Document Term Matrix (DTM) and Term Document Matrix (TDM)
Word Clouds
Corpus level word clouds
i. Sentiment Analysis
ii. Positive Word clouds
iii. Negative word clouds
iv. Unigram, Bigram, Trigram
Semantic network
Clustering
Extract user reviews of products/services from Amazon, Snapdeal and TripAdvisor
Install Libraries from Shell
Extraction and text analytics in Python
LDA / Latent Dirichlet Allocation
Topic Modelling
Sentiment Extraction
Lexicons & Emotion Mining
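As a small illustration of the Bag of words, Document Term Matrix, and Unigram/Bigram items above, here is a minimal standard-library sketch. The tokenizer, the function names, and the two review strings are hypothetical; real pipelines usually add stop-word removal and stemming.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return contiguous n-grams (n=1 unigram, n=2 bigram, n=3 trigram)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(documents):
    """Rows are documents, columns are vocabulary terms, cells are counts."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical review snippets for illustration only.
reviews = ["Great phone, great battery", "Battery life is poor"]
vocabulary, dtm = document_term_matrix(reviews)
bigrams = ngrams(tokenize(reviews[0]), 2)
print(vocabulary)
print(dtm)
print(bigrams)
```

Transposing the rows and columns of the DTM gives the Term Document Matrix.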
Probability – Recap
Bayes Rule
Naïve Bayes Classifier
Text Classification using Naive Bayes
Checking for Underfitting and Overfitting in Naive Bayes
Generalization and Regularization Techniques to avoid overfitting in Naive Bayes
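Bayes Rule and the "naive" independence assumption behind the Naive Bayes Classifier can be sketched for a two-class problem as follows. The function names and all probabilities are hypothetical values chosen for illustration, not estimates from real data.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes Rule for a binary class: P(class | evidence)."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


def naive_posterior(prior, likelihood_pairs):
    """Update on each feature in turn, treating features as conditionally independent."""
    p = prior
    for like_class, like_not_class in likelihood_pairs:
        p = posterior(p, like_class, like_not_class)
    return p


# Hypothetical numbers: P(spam) = 0.2, P(word | spam) = 0.5, P(word | not spam) = 0.05.
single_word = posterior(0.2, 0.5, 0.05)

# A second hypothetical word with P(word | spam) = 0.4 and P(word | not spam) = 0.1.
two_words = naive_posterior(0.2, [(0.5, 0.05), (0.4, 0.1)])
print(single_word, two_words)
```

Updating one feature at a time gives the same result as multiplying all the class-conditional likelihoods together, which is what the independence assumption allows.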
Integration functions
Activation functions
Weights
Bias
Learning Rate (eta) - Shrinking Learning Rate, Decay Parameters
Error functions - Entropy, Binary Cross Entropy, Categorical Cross Entropy, KL Divergence, etc.
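The building blocks listed above (integration function, activation function, weights, bias, and error function) can be put together in a minimal single-neuron sketch. It assumes the integration function is the usual weighted sum of inputs plus bias; the function names and the input values are hypothetical.

```python
from math import exp, log


def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def relu(z):
    """Rectified linear activation: zero for negative inputs."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias, activation):
    """Integrate (weighted sum plus bias), then apply the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(t * log(p) + (1 - t) * log(1 - p))
    return total / len(y_true)


# Hypothetical inputs and weights for illustration only.
output = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
loss = binary_cross_entropy([1, 0], [0.5, 0.5])
print(output, loss)
```

During training, the learning rate (eta) scales how far the weights and bias move in the direction that reduces this error.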
Supervised vs Unsupervised learning
Data Mining Process
Hierarchical Clustering / Agglomerative Clustering
Dendrogram
Measure of distance
Numeric
i. Euclidean, Manhattan, Mahalanobis
Categorical
i. Binary Euclidean
ii. Simple Matching Coefficient
iii. Jaccard's Coefficient
Mixed
i. Gower's General Dissimilarity Coefficient
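Several of the distance and similarity measures above are simple enough to sketch directly. The following minimal example covers Euclidean and Manhattan distance for numeric data and the Simple Matching and Jaccard coefficients for binary data. The function names are hypothetical, and returning 1.0 from the Jaccard function when both vectors are all zeros is a convention assumed here.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def simple_matching(a, b):
    """Share of positions where two binary vectors agree (1-1 and 0-0)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(a, b):
    """Share of 1-1 matches among positions where at least one vector is 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 1.0  # assumed convention for all-zero input


print(euclidean([0, 0], [3, 4]), manhattan([0, 0], [3, 4]))
print(simple_matching([1, 0, 0, 1], [1, 1, 0, 0]), jaccard([1, 0, 0, 1], [1, 1, 0, 0]))
```

Unlike the Simple Matching Coefficient, the Jaccard coefficient ignores 0-0 agreements, which matters when a shared absence carries little information.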
Types of Linkages
i. Single Linkage / Nearest Neighbour
ii. Complete Linkage / Farthest Neighbour
iii. Average Linkage
iv. Centroid Linkage
K-Means Clustering
i. Measurement metrics of clustering
Within-Cluster Sum of Squares
Between-Cluster Sum of Squares
Total Sum of Squares
ii. Choosing the ideal K value using Scree Plot / Elbow Curve
iii. Other Clustering Techniques
i. K-Medians
ii. K-Medoids
iii. K-Modes
iv. Clustering Large Applications (CLARA)
v. Partitioning Around Medoids (PAM)
vi. Density-based spatial clustering of applications with noise (DBSCAN)
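The K-Means measurement metrics listed above (within, between, and total sum of squares) can be computed for any given cluster assignment. The sketch below does not run K-Means itself; it assumes the labels are already known, and the function names, points, and labels are hypothetical.

```python
def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_of_squares(points, labels):
    """Return (within, between, total) sums of squares for a clustering."""
    overall = mean_point(points)
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        centroid = mean_point(members)
        within += sum(squared_distance(p, centroid) for p in members)
        between += len(members) * squared_distance(centroid, overall)
    total = sum(squared_distance(p, overall) for p in points)
    return within, between, total


# Hypothetical points forming two well-separated clusters.
points = [[1, 1], [1, 2], [8, 8], [9, 8]]
labels = [0, 0, 1, 1]
within, between, total = sum_of_squares(points, labels)
print(within, between, total)
```

The total equals the within plus the between sum of squares. Plotting the within value against K gives the Elbow Curve used to choose K.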