
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES

COURSE HANDOUT

Part A: Content Design


Course Title Foundations of Data Science
Course No(s) MBA ZG536/PDBA ZG536
Credit Units 4
Course Author Arindam Roy
Version No 1.0
Date 1 June 2020

Course Description
Introduction, Role of a Data Scientist, Statistics vs. Data Science, Fundamentals of Data Science,
Data Science process and life cycle, Exploratory Data Analysis, Data Engineering and shaping,
Overview of Data Science Techniques and Models, Introduction to Regression, Classification,
Shrinkage, Dimension Reduction, Tree-based models, Support Vector Machines, Unsupervised
learning, Choosing and evaluating models, Featurization, Overview of Neural Networks, Data
mining and pattern recognition techniques, Documentation, Deployment, and Presentation of
insights.

Course Objectives

No Objective

CO1 Get introduced to the field of Data Science, roles, process and challenges involved
therein

CO2 Explore and experience the steps involved in data preparation and exploratory data
analysis

CO3 Learn to select and apply appropriate analytics techniques for various scenarios, assess
model performance, and interpret the results of predictive models

CO4 Become familiar with the general deployment considerations for predictive models

CO5 Appreciate the importance of techniques such as data visualization and storytelling with
data for effectively presenting outcomes to stakeholders

Text Book(s)

No Author(s), Title, Edition, Publishing House


T1 Data Science for Business, By Foster Provost & Tom Fawcett, O’REILLY
T2 Applied Predictive Analytics, By Dean Abbott, WILEY
Reference Book(s) & other resources

No Author(s), Title, Edition, Publishing House


R1 Introduction to Data Mining, By Tan, Steinbach and Vipin Kumar, PEARSON
R2 Machine Learning using Python, Manaranjan Pradhan & U Dinesh Kumar, WILEY

Content Structure

No Title of the Module


M1 Data Science Foundations:
o Applications of Data Science
o Role and responsibilities of Data Scientists
o Comparing Data Science with other domains
o Challenges in the field of Data Science
o Data Science Process
o Data Scientists Toolbox

M2 Data Prep and Exploratory Data Analysis:
o Type of Data and data sets
o Data Quality
o Data Preprocessing
o Feature Creation
o Dimension Reduction
o Feature Selection
o Measures of Similarity and Dissimilarity
o Descriptive Analysis
o Data Visualizations

M3 Descriptive Modeling:
o Clustering
o Association Rules
o Principal Component Analysis
o Interpreting Descriptive models

M4 Predictive Modeling:
o Linear Regression
o Logistic Regression
o K-nearest neighbor
o Decision Tree
o Naïve Bayes
o Support Vector Machines
o Neural Networks
o Model Ensembles
o Assessing Predictive models

M5 Post-processing:
o General deployment considerations
o The Narrative - report / presentation structure
o Building narrative with Data
o Effective storytelling
Learning Outcomes:

No Learning Outcomes

LO1 Applications of Data Science and the process of Data Science project life cycle

LO2 Techniques and tools effective in addressing the data preprocessing and exploratory data
analysis stages

LO3 Applications of Descriptive and Predictive Data Analytics techniques

LO4 Hands-on experience of model building, evaluations and interpretations of results

LO5 Knowledge of post-processing involved in Data Science project including deployment
considerations, importance of effective storytelling

Part B: Contact Session Plan

Academic Term First Semester 2022-2023

Course Title Foundations of Data Science

Course No MBA ZG536 / PDBA ZG536

Lead Instructor Arindam Roy

Course Contents

Contact Sessions (#)   Contact Hours (#)   List of Topic Title (from content structure in Course Handout)   Text/Ref Book/external resource

Module 1 : Data Science Foundations

Session 1 (contact hours 1-2)
• Applications of Data Science
• Role and responsibilities of Data Scientists
• Comparing Data Science with other domains
• Challenges in the field of Data Science
Text/Ref: T2: Ch 1; T1: Ch 1, 2; R4: Ch 1; Additional Reading (AR); Classroom discussion

Session 2 (contact hours 3-4)
• Data Science Process
• Data Scientists Toolbox
Text/Ref: T1: Ch 1; T2: Ch 2; Classroom discussion

Module 2: Data Prep and Exploratory Data Analysis

Session 3 (contact hours 5-6)
• Type of Data and data sets
• Data Quality
• Data Preprocessing
Text/Ref: R1: Ch 2; T2: Ch 4

Session 4 (contact hours 7-8)
• Feature Creation
• Dimension Reduction
• Feature Selection
Text/Ref: T2: Ch 4; R1: Appendix; T1: Ch 2; AR

Session 5 (contact hours 9-10)
• Measures of Similarity and Dissimilarity
• Descriptive Analytics
• Data Visualizations
Text/Ref: R1: Ch 2; T2: Ch 3; R2: Ch 2; R1: Ch 3

Module 3 : Descriptive Modeling

Session 6 (contact hours 11-12)
• Clustering
  o Applications
  o Data prep for clustering
  o K-means algorithm
  o Hierarchical clustering algorithm
  o Standard cluster model interpretation
Text/Ref: T2: Ch 6, 7; T1: Ch 6

Session 7 (contact hours 13-14)
• Association Rules
  o Terminology
  o Parameter Settings
  o Item set and candidate rules generation
  o Apriori algorithm
  o Measures of interesting rules
  o Problems with Association rules
  o Collaborative filtering
Text/Ref: T2: Ch 5; R1: Ch 6; R4: Ch 9

Session 8 (contact hours 15-16)
• Principal Component Analysis
• Interpreting Descriptive models
• Mid semester course review
Text/Ref: T2: Ch 6, 7

Module 4: Predictive Modeling

Session 9 (contact hours 17-18)
• Linear Regression
  o Simple Linear regression
  o Model diagnostics
  Text/Ref: T2: Ch 8; R4: Ch 4
• Multiple Linear regression
  o Categorical encoding
  o Multi-collinearity and VIF
  o Residual analysis
  Text/Ref: T2: Ch 8; R4: Ch 4

Session 10 (contact hours 19-20)
• Logistic Regression
  o Classification overview
  o Binary classification
  o Gain chart and lift chart
  o Interpreting Logistic regression models
  o Practical considerations
  Text/Ref: T1: Ch 4; R4: Ch 5; T2: Ch 8

Session 11 (contact hours 21-22)
• K-nearest neighbor
  o k-NN learning algorithm
  o Distance metrics for k-NN
  o Practical considerations
  Text/Ref: T2: Ch 8; R4: Ch 6
• Naïve Bayes
  o Bayes theorem
  o The Naïve Bayes classifier
  o Interpreting Naïve Bayes classifier
  o Practical considerations
  Text/Ref: T2: Ch 8; R1: Ch 5

Session 12 (contact hours 23-24)
• Decision Tree
  o Decision tree landscape
  o Building decision trees
  o Decision tree splitting metrics
  o Decision tree Knobs and Options
  o Practical considerations
  Text/Ref: T2: Ch 8; R1: Ch 4

Session 13 (contact hours 25-26)
• Support Vector Machines
  o Maximum Margin Hyperplanes
  o Linear SVM
  Text/Ref: T1: Ch 4; R1: Ch 5
• Neural Networks
  o Building blocks
  o Network training
  o Neural network setting, pruning
  o Interpreting decision boundaries
  o Practical considerations
  Text/Ref: R1: Ch 5; T2: Ch 8

Session 14 (contact hours 27-28)
• Model Ensembles
  o Motivation for Ensembles
  o Bagging
  o Boosting
  o Random forests
  o Interpreting Model Ensembles
  Text/Ref: T2: Ch 10; R1: Ch 4
• Assessing Predictive models
  o Generalization
  o Model overfitting
  o Batch approach to Model assessment
  o Methods for comparing classifiers
  Text/Ref: T2: Ch 9; T1: Ch 4, 5

Module 5: Post-processing

Session 15 (contact hours 29-30)
• General deployment considerations
  o Deployment steps
• The Narrative
  o Report structure
  o Presentation structure
• Building narrative with Data
Text/Ref: T2: Ch 12; Classroom discussion

Session 16 (contact hours 31-32)
• Effective Storytelling with Data
• Course recap
Text/Ref: AR

# The above contact hours and topics can be adapted for non-specific and specific WILP
programs depending on the requirements and class interests.

Lab Details
Title Access URL
Lab Setup Instructions

Lab Capsules

Additional References

Select Topics and Case Studies from business for experiential learning

Topic No.   Select Topics in Syllabus for experiential learning   Access URL
1. Descriptive Analytics – Exploring the structured data R4 : Ch 2

2. Clustering Techniques – Grouping the data based on similarity R4 : Ch 7

3. Recommendation Techniques – Providing the suggestions R4 : Ch 9

4. Linear Regression Techniques – Predicting the numeric value R4 : Ch 4

5. Classification Problems – Providing the class labels R4 : Ch 5

6. Data Science with Cloud based services AWS docs


Evaluation Scheme
Legend: EC = Evaluation Component
No    Name                                  Type               Duration   Weight   Day, Date, Session, Time
EC1   Experiential Learning Assignment 1    Take Home-Online              25%      To be announced
      Experiential Learning Assignment 2
EC2   Mid-Semester Exam                     Open Book          2 hours    30%      Sunday, 25/09/2022 (FN)
EC3   Comprehensive Exam                    Open Book          2 hours    45%      Sunday, 27/11/2022 (FN)

Important Information
Syllabus for Mid-Semester Test (Open Book): Topics in Weeks 1-8
Syllabus for Comprehensive Exam (Open Book): All topics given in plan of study
Evaluation Guidelines:
1. EC-1 consists of two Assignments. Announcements regarding the same will be made in a
timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
3. For Open Book exams: Use of prescribed and reference text books, in original (not
photocopies), is permitted. Class notes/slides as reference material in filed or bound form are
permitted. However, loose sheets of paper will not be allowed. Use of calculators is
permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any
material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the
student should follow the procedure to apply for the Make-Up Test/Exam. The
genuineness of the reason for absence in the Regular Exam shall be assessed prior to
giving permission to appear for the Make-up Exam. Make-Up Test/Exam will be
conducted only at selected exam centers on the dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study
schedule as given in the course handout, attend the lectures, and take all the prescribed evaluation
components such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to
the evaluation scheme provided in the handout.
