
Data Science for Starters

About the Course


In this live training, you will explore how to get into Data Science from a non-mathematical
background. You will practice Python programming to build Data Science models and
visualizations, and you will replicate those steps in Astera DataPrep and Analytics Workbench.

This training will help you build a foundation in three elements of Data Science:
Statistics, Machine Learning, and ML Pipelines. We will cover the core Data Science process,
from Data Manipulation, Descriptive Analytics, and Visualization to Statistical Analysis,
Machine Learning models, and Regression Algorithms.

By the end of this training, you will have enough knowledge and hands-on experience with
Data Science tools to apply them to real-world problems.

Instructor 1: Ayesha Amjad
Instructor 2: Shehmeer Adil
Support: Danish Hudani

Timings and Duration


This is an 8-week training with 75-minute sessions twice a week.
Tentative: Jan23-Feb23

Course Outline

Week 1: Introduction
Session 1
- What is Data Science
- What is Analytics
- Types of Analytics
- Data Science vs ML vs AI
- Types of Machine Learning
- Statistical Analysis vs ML
- Big Data Analytics
Session 2
- Python/R pre-requisites
- Python vs R
- No-Code Applications in DS
- StatQuest and other resources
- Live Quiz 01
- Setting up Dataprep and Analytics Workbench

Week 2: Collecting and Exploring Data
Session 1
- Types of data collection (Cross-sectional/Time Series/Panel/Pooled)
- Data Sampling and its types
- Class Imbalance
- Data Assumptions and conditions
Session 2
- Correlation Analysis
- Contingency Tables
- Data Visualizations (line, bar, scatter)
- Intro to Kaggle and Anaconda

Week 3: Data Manipulation and Segmentation (Test/Train Splits)
Session 1
- Feature Engineering
- Feature Scaling
- Feature Selection
- Types of variables
- Interaction Variable
Session 2
- Dealing with Missing Values
- Dealing with Outliers
- Dealing with non-constant variance
- Dealing with multicollinearity
- Test/Train splits and IQR shuffling

Week 4: Regression Analysis (Statistical Analysis and Learning)
Session 1
- Linear Regression
- Optimization vs estimation algorithms (cost function vs iterative function)
- Gauss-Markov assumptions (Review)
- OLS vs WLS vs PLS vs GLS
- LR Validation and model diagnostics
Session 2
- Overfitting and underfitting
- Bias-Variance Trade-off
- Regularized Regression
- Industrial Practices for using regression models
- Hands-On Lab: Building a Regression Model
- Hands-On Activity: Evaluating Performance

Week 5: GLM and Predictive Modeling (Predictive Analysis)
Session 1
- Generalized Linear Regression
- Family and Link function
- Maximum Likelihood Estimation vs OLS
- Poisson Regression
- Logit/Probit Models
- Gamma Regression
Session 2
- Diagnostics Modeling vs Predictive Modeling (dependent variable as an interaction term, shipdamage data)
- Hands-On Activity: Building a GLM model

Week 6: Tree-Based ML Algorithms
Session 1
- Refresher on ML and DS workflow
- Decision Tree
- Gini vs Entropy vs Information Gain
- Confusion Matrix
- Understanding Precision, Accuracy, Recall, F-measure
- ROC-AUC curves
Session 2
- Random Forest
- Bootstrapping
- Model Validation – K-fold
- Hands-On Activity: Churn Prediction Model

Week 7: Clustering and Ensemble Modeling
Session 1
- Understanding Unsupervised Learning
- Hidden Patterns using ML
- Clustering Basics
- Association Rule-Based mining in retail industry
- K-Means and KNN
- Elbow Analysis
- Cluster Validation – Silhouette Coefficient
Session 2
- Introduction to Radar and pie charts
- Anomaly detection and its removal
- Bagging, Boosting
- Ensemble Methods
- Hands-On Activity: Social Media Clustering

Week 8: Recommender System and Final Project
Session 1
- Intro to recommender systems
- Collaborative Filtering vs Content-based filtering
- Recommendation Engine
- Hands-On Activity: Netflix Movie Recommendation
Session 2
- Review
- Discussion on Final Projects
