
CERTIFIED DATA SCIENTIST

PROGRAM SYLLABUS

Accredited by

CONTENTS
COURSE 1: DATA SCIENCE FOUNDATION
COURSE 2: PYTHON ESSENTIALS FOR DATA SCIENCE
COURSE 3: R LANGUAGE ESSENTIALS
COURSE 4: MATHS FOR DATA SCIENCE
COURSE 5: STATISTICS FOR DATA SCIENCE
COURSE 6: DATA PREPARATION WITH NUMPY & PANDAS
COURSE 7: VISUALIZATION WITH PYTHON
COURSE 8: MACHINE LEARNING ASSOCIATE
COURSE 9: ADVANCED MACHINE LEARNING
COURSE 10: SQL FOR DATA SCIENCE
COURSE 11: DEEP LEARNING – CNN BASICS
COURSE 12: TABLEAU ASSOCIATE
COURSE 13: ML MODEL DEPLOYMENT – FLASK API
COURSE 14: BIG DATA ESSENTIALS
COURSE 15: DATA SCIENCE PROJECT EXECUTION
PROGRAM COURSE BROCHURE AND CONTACTS

©2020 DataMites. All content in this document is copyrighted.
Reproducing any part of the content requires written permission from DataMites®, IABAC®.
COURSE 1: DATA SCIENCE FOUNDATION

MODULE 1: INTRODUCTION TO DATA SCIENCE
  WHAT IS DATA SCIENCE?
  EVOLUTION OF DATA SCIENCE
  DATA SCIENCE TERMINOLOGIES

MODULE 2: DATA SCIENCE VS BUSINESS ANALYTICS VS BIG DATA
  COMPARING VARIOUS RELATED DOMAINS WITH DATA SCIENCE

MODULE 3: CLASSIFICATION OF BUSINESS ANALYTICS
  DESCRIPTIVE ANALYTICS
  PREDICTIVE ANALYTICS
  DISCOVERY ANALYTICS AND PRESCRIPTIVE ANALYTICS

MODULE 4: DATA SCIENCE PROJECT WORKFLOW
  CRISP-DM FRAMEWORK
  DATA SCIENCE PROJECT WORKFLOW

MODULE 5: ROLES IN DATA SCIENCE
  INDUSTRY ROLES AND RESPONSIBILITIES

MODULE 6: APPLICATION OF DATA SCIENCE IN VARIOUS INDUSTRIES
  INDUSTRY ADOPTION: HEALTH CARE, FINANCE & BANKING, MANUFACTURING, RETAIL, LOGISTICS, HUMAN RESOURCES
  KEY USE CASES
  TRENDS IN DATA SCIENCE ADOPTION

COURSE 2: PYTHON ESSENTIALS FOR DATA SCIENCE

MODULE 1: PYTHON INSTALLATION AND SETUP
  INSTALLING ANACONDA ON A LOCAL MACHINE
  ADDITIONAL PACKAGE INSTALLATION
  INTRODUCTION TO JUPYTER NOTEBOOK
  JUPYTER NOTEBOOK KEYBOARD SHORTCUTS

MODULE 2: GOOGLE COLAB
  INTRODUCTION TO GOOGLE COLAB
  SELECTING THE RUNTIME ENVIRONMENT: GPU/TPU
  UPLOADING DATA/FILES IN COLAB
  LOADING GOOGLE DRIVE AS A FOLDER
  SHARING A COLAB NOTEBOOK

MODULE 3: PYTHON INTRODUCTION
  NATIVE DATA TYPES
  KEY PYTHON FUNCTIONS
  SLICING OPERATIONS
  IMPORTING PACKAGES: DATETIME
  PACKAGES, SUB-PACKAGES, METHODS & ATTRIBUTES

MODULE 4: PYTHON DATA STRUCTURES
  DATA STRUCTURES INTRODUCTION
  LISTS AND LIST OPERATIONS
  TUPLES
  SETS
  DICTIONARIES
  LOOPING THROUGH ITERABLE DATA SETS

MODULE 5: PYTHON CONTROL STATEMENTS
  IF CONDITIONAL STATEMENTS
  FOR LOOPS
  USER-DEFINED FUNCTIONS

Certified Data Scientist | Syllabus | ©2020 DataMites®


COURSE 3: R LANGUAGE ESSENTIALS

MODULE 1: R INTRODUCTION
  R INSTALLATION AND SETUP
  RSTUDIO (R DEVELOPMENT ENVIRONMENT)
  R LANGUAGE BASICS

MODULE 2: R DATA SCIENCE
  R DATA STRUCTURES
  R CONTROL STATEMENTS
  R DATA SCIENCE PACKAGES EXPLORATION
  PROJECT IN R

COURSE 4: MATH FOR MACHINE LEARNING

MODULE 1: EIGENVALUES AND EIGENVECTORS
  CALCULATING EIGENVALUES AND EIGENVECTORS
  EIGENDECOMPOSITION OF A MATRIX
  EIGENVECTORS: WHAT ARE THEY?

MODULE 2: LINEAR TRANSFORMATIONS AND MATRICES
  DETERMINANTS
  INVERSE
  RANK
  COLUMN AND NULL SPACE
  LINEAR TRANSFORMATIONS
  MATRICES: THE BASICS
  MATRIX OPERATIONS
  SYSTEMS OF LINEAR EQUATIONS

MODULE 3: MULTIVARIABLE CALCULUS
  CRITICAL POINTS
  MAXIMA AND MINIMA
  DIFFERENTIATION
  FUNCTIONS AND DERIVATIVES
  FUNCTIONS: PRIMER
  MULTIVARIABLE FUNCTIONS
  TAYLOR SERIES
  THE HESSIAN
  THE JACOBIAN
  VECTOR-VALUED FUNCTIONS

MODULE 4: VECTORS AND VECTOR SPACES
  DOT PRODUCT: EXAMPLE APPLICATION
  VECTOR-VALUED FUNCTIONS
  INTRODUCTION TO LINEAR ALGEBRA
  VECTOR OPERATIONS: THE DOT PRODUCT
  VECTORS: THE BASICS

COURSE 5: STATISTICS FOR DATA SCIENCE

MODULE 1: INTRODUCTION TO STATISTICS
  DESCRIPTIVE AND INFERENTIAL STATISTICS
  DEFINITIONS AND TERMS
  TYPES OF DATA

MODULE 2: HARNESSING DATA
  TYPES OF SAMPLING DATA
  SIMPLE RANDOM SAMPLING
  STRATIFIED SAMPLING

MODULE 3: EXPLORATORY ANALYSIS
  MEAN, MEDIAN AND MODE
  DATA VARIABILITY
  STANDARD DEVIATION
  Z-SCORE
  OUTLIERS

MODULE 4: DISTRIBUTIONS
  NORMAL DISTRIBUTION
  CENTRAL LIMIT THEOREM
  HISTOGRAM
  NORMALIZATION
  NORMALITY TESTS
  SKEWNESS
  KURTOSIS

MODULE 5
  HYPOTHESIS TESTING
    UNDERSTANDING HYPOTHESIS TESTING
    NULL AND ALTERNATE HYPOTHESES
    MAKING A DECISION
  CRITICAL VALUE METHOD
    HYPOTHESIS TESTING: CRITICAL VALUE METHOD
    CRITICAL VALUE METHOD: EXAMPLES
  HYPOTHESIS TESTING: P-VALUE METHOD
    P-VALUE METHOD
    P-VALUE METHOD: EXAMPLES
    TYPES OF ERRORS
  T-TESTS
    T DISTRIBUTION
    ONE-SAMPLE T-TEST
    INDEPENDENT AND RELATED (PAIRED) TWO-SAMPLE TESTS
    T-TEST HYPOTHESIS TESTING IN PYTHON
  ONE-WAY ANOVA TEST / F-TEST
    ANALYSIS OF VARIANCE (ANOVA) THEORY
    HYPOTHESIS TESTING WITH MORE THAN TWO GROUPS USING ANOVA
    INDUSTRY EXAMPLE
    F-TEST HYPOTHESIS TESTING IN PYTHON
  NON-PARAMETRIC HYPOTHESIS TESTING
    CHI-SQUARE TEST THEORY
    APPLICATION OF CHI-SQUARE IN PYTHON

MODULE 6: CORRELATION & REGRESSION
  DIRECT AND INDIRECT CORRELATION
  STRONG AND WEAK CORRELATION
  CALCULATING CORRELATION WITH PYTHON
  REGRESSION THEORY
  SIMPLE LINEAR REGRESSION WITH PYTHON

COURSE 6: DATA PREPARATION WITH NUMPY & PANDAS

MODULE 1: NUMPY (NUMERICAL PYTHON) PACKAGE
  INTRODUCTION
  NUMPY BASICS
  CREATING NUMPY ARRAYS
  STRUCTURE AND CONTENT OF ARRAYS
  SUBSET, SLICE, INDEX AND ITERATE THROUGH ARRAYS
  MULTIDIMENSIONAL ARRAYS
  PYTHON LISTS VS NUMPY ARRAYS

MODULE 2: OPERATIONS ON NUMPY ARRAYS
  BASIC OPERATIONS
  OPERATIONS ON ARRAYS
  BASIC LINEAR ALGEBRA OPERATIONS

MODULE 3: PANDAS (PANEL DATA) PACKAGE
  PANDAS BASICS
  INDEXING AND SELECTING DATA
  MERGE AND APPEND
  GROUPING AND SUMMARIZING DATAFRAMES
  LAMBDA FUNCTIONS & PIVOT TABLES

MODULE 4: DATA CLEANING (DATA MUNGING WITH PANDAS)
  PANDAS BASICS
  INDEXING AND SELECTING DATA
  MERGE AND APPEND
  GROUPING AND SUMMARIZING DATAFRAMES
  LAMBDA FUNCTIONS & PIVOT TABLES

COURSE 7: VISUALIZATION WITH PYTHON

MODULE 1: BASICS OF VISUALIZATION
  COMPONENTS OF A PLOT
  DATA VISUALIZATION TOOLKIT
  FUNCTIONALITIES OF PLOTS
  SUB-PLOTS

MODULE 2: PLOTTING CATEGORICAL AND TIME SERIES DATA
  INTRODUCTION
  PLOTTING AGGREGATE VALUES ACROSS CATEGORIES
  PLOTTING DISTRIBUTIONS ACROSS CATEGORIES
  BIVARIATE DISTRIBUTIONS: PLOTTING PAIRWISE RELATIONSHIPS
  VECTOR SPACES
  VECTORS: THE BASICS

MODULE 3: PLOTTING DATA DISTRIBUTIONS
  INTRODUCTION
  UNIVARIATE DISTRIBUTIONS
  UNIVARIATE DISTRIBUTIONS: RUG PLOTS

COURSE 8: MACHINE LEARNING ASSOCIATE

MODULE 1: MACHINE LEARNING INTRODUCTION
  WHAT IS ML? ML VS AI
  ML WORKFLOW
  STATISTICAL MODELING OF ML
  APPLICATIONS OF ML
  POPULAR ML ALGORITHMS

MODULE 2: MACHINE LEARNING ALGORITHMS
  CLUSTERING
  CLASSIFICATION AND REGRESSION
  SUPERVISED VS UNSUPERVISED
  CHOICE OF ML ALGORITHMS

MODULE 3
  SIMPLE LINEAR REGRESSION
    REGRESSION LINE
    BEST FIT LINE
    ASSUMPTIONS OF SIMPLE LINEAR REGRESSION
  LINEAR REGRESSION IN PYTHON
    READING AND UNDERSTANDING THE DATA
    HYPOTHESIS TESTING IN LINEAR REGRESSION
    BUILDING A LINEAR MODEL
    RESIDUAL ANALYSIS AND PREDICTIONS
    LINEAR REGRESSION USING SKLEARN
  MULTIPLE LINEAR REGRESSION
    SIMPLE VS MULTIPLE LINEAR REGRESSION
    MULTICOLLINEARITY
    DEALING WITH CATEGORICAL VARIABLES
    MODEL ASSESSMENT AND COMPARISON
    FEATURE SELECTION

MODULE 4
  LOGISTIC REGRESSION: BINARY CLASSIFIER
    INTRODUCTION: UNIVARIATE LOGISTIC REGRESSION
    BINARY CLASSIFICATION
    SIGMOID CURVE
    FINDING THE BEST FIT SIGMOID CURVE
    SUMMARY
  LOGISTIC REGRESSION: MODEL BUILDING
    MULTIVARIATE LOGISTIC REGRESSION
    DATA CLEANING AND PREPARATION
    BUILDING YOUR FIRST MODEL
    FEATURE ELIMINATION USING RFE
    CONFUSION MATRIX AND ACCURACY
    MANUAL FEATURE ELIMINATION
  LOGISTIC REGRESSION: MODEL EVALUATION
    METRICS BEYOND ACCURACY: SENSITIVITY & SPECIFICITY
    FINDING THE OPTIMAL THRESHOLD USING THE ROC CURVE
    METRICS BEYOND ACCURACY: PRECISION & RECALL


MODULE 5: SUPERVISED LEARNING: K-NEAREST NEIGHBOR (KNN) CLASSIFIER
  INTRODUCTION TO KNN
  HOW IT WORKS: THEORY
  PROS AND CONS OF KNN
  APPLICATIONS OF KNN
  MODEL BUILDING: KNN IN PYTHON SKLEARN
  EVALUATION: KNN MODEL

MODULE 6: K-MEANS CLUSTERING
  UNSUPERVISED LEARNING: CLUSTERING
    INTRODUCTION
    UNDERSTANDING CLUSTERING
    PRACTICAL EXAMPLE OF CLUSTERING: CUSTOMER SEGMENTATION
  K-MEANS ALGORITHM
    INTRODUCTION
    STEPS OF THE ALGORITHM
    K-MEANS AS COORDINATE DESCENT
    VISUALISING THE K-MEANS ALGORITHM
    PRACTICAL CONSIDERATIONS IN THE K-MEANS ALGORITHM
    CLUSTER TENDENCY
  CASE: IRIS DATASET CLUSTERING
    INTRODUCTION
    K-MEANS IN PYTHON
    IRIS DATA PREPARATION
    MAKING THE CLUSTERS

MODULE 7: HIERARCHICAL CLUSTERING
  INTRODUCTION
  HIERARCHICAL CLUSTERING ALGORITHM
  INTERPRETING THE DENDROGRAM

MODULE 8: UNSUPERVISED LEARNING: PRINCIPAL COMPONENT ANALYSIS (PCA)
  THE WHYS AND WHATS OF PCA
  BUILDING BLOCKS OF PCA
  ILLUSTRATION: FINDING PRINCIPAL COMPONENTS
  COMPREHENSION: CALCULATING THE PRINCIPAL COMPONENTS
  SINGULAR VALUE DECOMPOSITION (SVD)

COURSE 9: ADVANCED MACHINE LEARNING

MODULE 1: CLASSIFICATION AND REGRESSION TREE (CART) – DECISION TREE
  INTRODUCTION TO DECISION TREES
  INTERPRETING A DECISION TREE
  COMPREHENSION: DECISION TREE CLASSIFICATION IN PYTHON
  REGRESSION WITH DECISION TREES

MODULE 2: THEORY OF DECISION TREES
  INTRODUCTION
  CONCEPT OF HOMOGENEITY
  GINI INDEX
  ENTROPY AND INFORMATION GAIN
  COMPREHENSION: INFORMATION GAIN
  SPLITTING BY R-SQUARED

MODULE 3: DECISION TREE HYPERPARAMETER TUNING
  BUILDING DECISION TREES IN PYTHON
  CHOOSING TREE HYPERPARAMETERS IN PYTHON
  COMPREHENSION: HYPERPARAMETERS
  TREE TRUNCATION
  ADVANTAGES AND DISADVANTAGES OF TREE TRUNCATION

MODULE 4: RANDOM FOREST – ENSEMBLE TECHNIQUE
  INTRODUCTION
  ENSEMBLES
  COMPREHENSION: ENSEMBLES
  BAGGING
  CREATING A RANDOM FOREST
  COMPREHENSION: OOB (OUT-OF-BAG) ERROR
  RANDOM FORESTS LAB

MODULE 5
  NAÏVE BAYES: BAYES' THEOREM AND ALGORITHM BUILDING BLOCKS
    INTRODUCTION: NAIVE BAYES
    CONDITIONAL PROBABILITY AND ITS INTUITION
    BAYES' THEOREM
    NAIVE BAYES WITH ONE FEATURE
    CONDITIONAL INDEPENDENCE IN NAIVE BAYES
    DECIPHERING NAIVE BAYES
  NAÏVE BAYES: TEXT CLASSIFICATION, HAM VS SPAM CASE STUDY
    INTRODUCTION: NAIVE BAYES FOR TEXT CLASSIFICATION
    DOCUMENT CLASSIFIER PRE-PROCESSING STEPS
    DOCUMENT CLASSIFIER WORKED-OUT EXAMPLE
    LAPLACE SMOOTHING
    BUILDING A SPAM/HAM CLASSIFIER
    COMPREHENSION: NAIVE BAYES FOR TEXT CLASSIFICATION

MODULE 6: BOOSTING – INTRODUCTION, ADABOOST, GRADIENT BOOSTING, XGBOOST
  INTRODUCTION TO BOOSTING
  WEAK LEARNERS
  ADABOOST ALGORITHM
  ADABOOST DISTRIBUTION AND PARAMETER CALCULATION
  ADABOOST LAB
  UNDERSTANDING GRADIENT BOOSTING
  GRADIENT IN GRADIENT BOOSTING
  GRADIENT BOOSTING ALGORITHM
  XGBOOST
  KAGGLE PRACTICE EXERCISE

MODULE 7
  SUPPORT VECTOR MACHINE: THEORY
    INTRODUCTION TO SVM
    CONCEPT OF A HYPERPLANE IN 2D
    CONCEPT OF A HYPERPLANE IN 3D
    MAXIMAL MARGIN CLASSIFIER
    THE SOFT MARGIN CLASSIFIER
    THE SLACK VARIABLE
    NOTION OF SLACK VARIABLES
    COST OF MISCLASSIFICATION
    MAPPING NONLINEAR DATA TO LINEAR DATA
    FEATURE TRANSFORMATION
    THE KERNEL TRICK
  SVM: IMPLEMENTING SVM IN SKLEARN, CASE STUDY
    MODELING SVM WITH PYTHON SKLEARN
    MODEL EVALUATION

MODULE 8: ARTIFICIAL NEURAL NETWORK (ANN)
  INTRODUCTION TO ANN
  SIMPLE ANN
  HOW IT WORKS: BACKPROPAGATION ALGORITHM
  IMPLEMENTING ANN WITH PYTHON SKLEARN
  ANN MODELING AND EVALUATION
  COMPREHENSION

MODULE 9: ADVANCED ML CONCEPTS
  ADVANCED EVALUATION METRICS (THEORY): ROC_AUC, R2, PRECISION, RECALL, F1 SCORE, RMSE
  K-FOLD CROSS-VALIDATION
  GRID AND RANDOMIZED SEARCH CV IN SKLEARN
  IMBALANCED DATA SETS: SMOTE TECHNIQUE
  FEATURE SELECTION TECHNIQUES
  CHOOSING THE RIGHT ALGORITHMS

COURSE 10: SQL FOR DATA SCIENCE

MODULE 1: CONNECTING TO A DB
  INSTALLING SQL PACKAGES AND SQLALCHEMY
  PYMYSQL

MODULE 2: RDBMS (RELATIONAL DATABASE MANAGEMENT SYSTEM) BASICS
  BASICS OF SQL DB
  PRIMARY KEY
  FOREIGN KEY

MODULE 3: SELECT SQL COMMAND, WHERE CONDITION
  RETRIEVING DATA WITH THE SELECT SQL COMMAND INTO A PANDAS DATAFRAME
  WHERE CONDITION
  ORDER BY CLAUSE

MODULE 4: ADVANCED SQL
  AGGREGATE FUNCTIONS
  GROUP BY CLAUSE
  HAVING CLAUSE
  NESTED QUERIES
  INNER JOIN, OUTER JOINS, MULTI-JOIN

COURSE 11: DEEP LEARNING – CNN FOUNDATION

MODULE 1: INTRODUCTION TO DEEP LEARNING
  WHAT IS DEEP LEARNING?
  VARIOUS DEEP LEARNING MODELS IN PRACTICE AND APPLICATIONS

MODULE 2: INTRODUCTION TO IMAGE BASICS
  IMAGE RESOLUTION
  PIXELS
  IMAGE MANIPULATIONS WITH FILTERS

MODULE 3: CONVOLUTIONAL NEURAL NETWORK (CNN) INTRO
  CNN ESSENTIALS
  CNN ARCHITECTURE
  WORKFLOW OF IMAGE CLASSIFICATION WITH CNN

MODULE 4: KERAS–TENSORFLOW IMAGE CLASSIFICATION CASE STUDY
  CNN HANDS-ON APPLICATION FOR CLASSIFYING IMAGES OF CATS AND DOGS

COURSE 12: TABLEAU ASSOCIATE

MODULE 1: TABLEAU INTRODUCTION
  TABLEAU INTERFACE
  DIMENSIONS AND MEASURES
  FILTER SHELF

MODULE 2: CONNECTING TO DATA SOURCES
  CONNECTING TO SOURCES
  EXCEL
  DATABASE
  PDF

MODULE 3: VISUAL ANALYTICS
  CHARTS AND PLOTS WITH SUPERSTORE DATA

MODULE 4: FORECASTING
  FORECASTING TIME SERIES DATA
  FORECASTING SALES IN TIME PERIODS

COURSE 13: MACHINE LEARNING MODEL DEPLOYMENT – API

MODULE 1: BASICS OF APPLICATION PROGRAMMING INTERFACE (API)
  API BASICS
  LOOSELY COUPLED ARCHITECTURE

MODULE 2: INSTALLING FLASK AND FLASK CORS
  INSTALLING AND CONFIGURING FLASK
  CROSS-ORIGIN RESOURCE SHARING (CORS) WITH FLASK_CORS
  EXAMPLE: USING FLASK AS AN API SERVER

MODULE 3: END-TO-END ML PROJECT WITH API DEPLOYMENT
  COMPLETE PROJECT FLOW WITH API
  DEPLOYMENT AND ASSESSMENT THROUGH A WEBSITE

COURSE 14: BIG DATA ESSENTIALS

MODULE 1: BIG DATA INTRODUCTION
  WHAT IS BIG DATA?
  VARIOUS BIG DATA FRAMEWORKS

MODULE 2: HADOOP AND SPARK
  HADOOP INTRODUCTION
  SPARK
  BIG DATA FOR MACHINE LEARNING
  MANAGING BIG DATA IN DATA SCIENCE PROJECTS

COURSE 15: DATA SCIENCE PROJECT EXECUTION

MODULE 1: DATA SCIENCE PROJECT STRUCTURE
  CRISP-DM FRAMEWORK
  6-PHASE PROJECT EXECUTION
  ML USE CASE DEVELOPMENT

MODULE 2: BUSINESS ASPECTS
  PROJECT MANAGEMENT METHODOLOGY
  CHALLENGES AND PITFALLS

PROGRAM DETAILS
COURSE NAME: CERTIFIED DATA SCIENTIST
DURATION: 2 + 4 MONTHS
LEARNING MODE: LIVE ONLINE TRAINING

admissions@datamites.com
INDIA: +91 1800-313-3434
US: +1 628 228 6062
UK: +44 752 066 5626

PROGRAM SCHEDULE: http://datamites.net/program-schedule
ELIGIBILITY TEST: http://datamites.net/eligibility-test


DATA SCIENCE IS RATED AS THE TOP CAREER
HIGHEST PAID – RECESSION PROOF – MILLIONS OF JOBS

DataMites provides the most comprehensive and industry-aligned Data Science Program.

15,000+ LEARNERS | 55% AVG SALARY HIKE | 50+ ELITE TRAINERS | ₹75 LAKHS TOP SALARY


TAKE THE FIRST STEP TOWARDS A DATA SCIENCE CAREER
