
Course Outline

Workshop on Data Science and Analytics

Course Description
Internet-based digital businesses have grown rapidly in the last decade, and COVID-19 has given them further
impetus. As such businesses multiply, the amount of data they generate is growing by leaps and bounds; data
volumes are now measured in exabytes and yottabytes.¹ With so much data accumulating, a systematic, scientific
method for extracting knowledge from it is essential, and data science provides such a method. A data scientist
is expected to have skills in mathematics, statistics, machine learning, databases and other branches of computer
science, along with a good understanding of problem formulation, in order to build effective solutions. This
course introduces students to the rapidly growing field of data science and analytics and to its applications
across the functional areas of business. Students will be exposed to the main aspects of data science practice,
including data collection and integration, exploratory data analysis, predictive modelling, descriptive modelling
and solution presentation.

Learning Outcomes
On completing this course, students should be able to:
• Explain the concepts of data science and its components
• Develop the skill sets required of a data scientist
• Understand the applications of statistical tools
• Use the Python programming language for statistical modelling and analysis
• Carry out exploratory data analysis (EDA) in data science
• Apply machine learning algorithms (linear regression, logistic regression) for predictive modelling

¹ Tibi Puiu, “How big is a petabyte, exabyte or yottabyte? What’s the biggest byte for that matter?”, ZME Science,
accessed on April 20, 2020, https://www.zmescience.com/science/how-big-data-can-get/

Evaluation Methods

Students will be evaluated on the following basis:

• Class participation: 10%
• Quiz: 30%
• Individual assignment: 10%
• Group project: 50%

The following is a session-wise listing of contents; each session is 75 minutes long.

Session 1: Introduction (What is Data Science?)
- Why data science?
- Current landscape of perspectives
- Discussion of the skills required

Sessions 2-3: Data Science Toolkit
- Understanding Python as a programming environment
- Setting up the environment: Python and Jupyter Notebook
- First Python sheet

Sessions 4-5: Python Constructs
- Lists, dictionaries and tuples
- Strings
- Iterations
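As a small illustration of the Python constructs listed for sessions 4-5, the sketch below uses a list of tuples, a dictionary, string formatting and a loop together. The sales records and region names are made up for the example and are not taken from the course material.

```python
# Hypothetical sales records: a list of (region, amount) tuples.
records = [("North", 120), ("South", 80), ("North", 45), ("East", 60)]

# Iterate over the list, unpacking each tuple, and accumulate totals in a dict.
totals = {}
for region, amount in records:
    totals[region] = totals.get(region, 0) + amount

# Strings: build one summary line per region, sorted by region name.
lines = [f"{region}: {total}" for region, total in sorted(totals.items())]
report = "\n".join(lines)
print(report)
```

`dict.get(key, 0)` avoids a separate "is this key already present?" check, which is a common idiom when counting or summing by category.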

Sessions 6-7: Libraries and Packages
- NumPy and Pandas
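To give a flavour of what NumPy and Pandas add over plain Python, here is a minimal sketch: NumPy performs element-wise arithmetic on whole arrays, and Pandas wraps data in a labelled table that can be grouped and aggregated. The unit, price and region values are hypothetical.

```python
import numpy as np
import pandas as pd

# NumPy: vectorised arithmetic on whole arrays, no explicit loop.
units = np.array([3, 5, 2, 4])
prices = np.array([10.0, 8.0, 15.0, 12.5])
revenue = units * prices  # element-wise product

# Pandas: put the values in a labelled DataFrame and aggregate by group.
df = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "revenue": revenue,
})
by_region = df.groupby("region")["revenue"].sum()
print(by_region)
```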

Sessions 8-10: Exploratory Data Analysis (EDA)
- Basic tools of EDA: plots, graphs and summary statistics
- Reading data from various sources and platforms
- Data cleaning and missing value imputation
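The EDA steps above (reading data, summary statistics, and missing value imputation) can be sketched in Pandas as follows. This assumes a tiny hypothetical CSV held in memory through `io.StringIO` in place of a real file or platform export, and it uses median imputation as one simple choice among many.

```python
import io

import pandas as pd

# Stand-in for a CSV file: read_csv parses an in-memory text buffer
# the same way it would parse a file on disk.
raw = io.StringIO(
    "customer,age,spend\n"
    "A,34,120.0\n"
    "B,,80.0\n"
    "C,45,\n"
    "D,29,60.0\n"
)
df = pd.read_csv(raw)

# Summary statistics and a count of missing values per column.
print(df.describe())
print(df.isna().sum())

# Simple imputation: fill missing numeric values with each column's median.
numeric_cols = ["age", "spend"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
print(df)
```

The median is less sensitive to outliers than the mean. Whichever rule is used, it is worth recording which values were imputed before modelling.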

Session 11: Statistical Inference through statsmodels
- Populations and samples
- Statistical modelling, probability distributions and the concept of hypothesis testing (one-sample, two-tailed test)
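The logic of a one-sample, two-tailed test can be shown without any third-party package. In the session itself this would typically be done with statsmodels (for example, a one-sample t-test through `statsmodels.stats.weightstats.DescrStatsW(...).ttest_mean(...)`). The dependency-free sketch below instead uses a large-sample normal (z) approximation from Python's `statistics` module, so it is only a rough stand-in for a t-test on a sample this small. The sample values and the hypothesised mean are hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample; null hypothesis H0: population mean mu = 50.
sample = [52, 48, 55, 51, 49, 53, 50, 54]
mu0 = 50

n = len(sample)
x_bar = mean(sample)
se = stdev(sample) / n ** 0.5  # standard error from the sample std dev

# Test statistic and two-tailed p-value (normal approximation).
z = (x_bar - mu0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p = {p_value:.4f}: {decision}")
```

"Two-tailed" means deviations in either direction count as evidence against H0, which is why the tail probability is doubled.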

Sessions 12-13: Introduction to Machine Learning
- Algorithms and applications
- Supervised machine learning: simple linear regression and multiple linear regression
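Multiple linear regression fits y ≈ b0 + b1·x1 + b2·x2 by ordinary least squares. The sketch below uses NumPy's `np.linalg.lstsq` to show the mechanics. A statistics library such as statsmodels (`statsmodels.api.OLS`) would additionally report standard errors and p-values. The data are hypothetical and were constructed as y = 1 + 2·x1 + 3·x2 exactly, so the correct coefficients are known in advance.

```python
import numpy as np

# Hypothetical data with a known linear relationship y = 1 + 2*x1 + 3*x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix: a column of ones (intercept) followed by the predictors.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: choose beta to minimise ||X @ beta - y||^2.
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, b1, b2 = beta
print(f"y = {intercept:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")

# Simple linear regression is the special case with one predictor.
beta_simple, *_ = np.linalg.lstsq(X[:, :2], y, rcond=None)
```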

Session 14: Introduction to Classification Techniques and Related Steps
- Logistic regression (a classification technique)
- Data science ethics
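Logistic regression models the probability of a binary outcome as the sigmoid of a linear score and predicts class 1 when that probability is at least 0.5. In practice a library routine would be used, for example statsmodels' `Logit` or scikit-learn's `LogisticRegression`. The sketch below instead fits the model with plain gradient descent on the average log loss to expose the mechanics. The "hours studied / passed" data are hypothetical, and the learning rate and iteration count are arbitrary choices that happen to suit this toy data set.

```python
import numpy as np

def sigmoid(z):
    """Map any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=10_000):
    """Fit intercept + slope weights by gradient descent on mean log loss."""
    X_b = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(X_b.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X_b @ w)                 # predicted probabilities
        grad = X_b.T @ (p - y) / len(y)      # gradient of the mean log loss
        w -= lr * grad
    return w

def predict(w, X, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    X_b = np.column_stack([np.ones(len(X)), X])
    return (sigmoid(X_b @ w) >= threshold).astype(int)

# Hypothetical data: hours studied and whether the student passed.
hours = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
passed = np.array([0, 0, 0, 1, 1, 1])

w = fit_logistic(hours, passed)
print(predict(w, hours))
```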

Sessions 15-16: Recap of the Skill Sets
- Student presentations on projects

Recommended Textbooks

Python for Data Analysis, by Wes McKinney (O’Reilly)
Python for Everybody, by Charles Severance
Doing Data Science, by Rachel Schutt and Cathy O’Neil (O’Reilly)
Python Data Science Handbook, by Jake VanderPlas

Reference Books
Mastering Python for Data Science, by Samir Madhavan
Hands-On Data Analysis with NumPy and Pandas, by Curtis Miller
