
DATA SCIENCE WITH PYTHON

Introduction to Data Science with Python


This section introduces students to the Python statistical computing environment.
1

8 0
Viewing time: 01:11 hrs, 0 MCQ, 0 case study

An understanding of analytics and data mining concepts


Familiarity with some common terms in analytics
Familiarity with pandas, NumPy, Matplotlib and scikit-learn
An understanding of basic Python data structures such as lists, tuples and strings
Be able to use lambda functions, map() and filter()
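
As a minimal sketch of the last two outcomes, the snippet below applies lambda functions with map() and filter() to a list, and slices a string. The temperature values and variable names are hypothetical, chosen only for illustration.

```python
# Hypothetical sample data (not from the course material).
temperatures_c = [10, 20, 25, 30]

# map() applies a function to every element; this lambda converts Celsius to Fahrenheit.
temperatures_f = list(map(lambda c: c * 9 / 5 + 32, temperatures_c))

# filter() keeps only the elements for which the function returns True.
warm_days = list(filter(lambda c: c > 20, temperatures_c))

# Tuples are immutable sequences; strings support the same slicing syntax as lists.
point = (3, 4)
course = "Data Science"
first_word = course[:4]

print(temperatures_f)  # [50.0, 68.0, 77.0, 86.0]
print(warm_days)       # [25, 30]
print(first_word)      # Data
```

Both map() and filter() return lazy iterators in Python 3, which is why the results are wrapped in list() before printing.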
Scientific distributions used in Python for Data Science:
This class introduces students to the NumPy, pandas and Matplotlib libraries.

16
Viewing time: 3:19 hrs, 1 non-graded assignment

Understanding how to work with NumPy arrays, including tasks such as slicing
Introduction to pandas Data Frames and Series objects
Learning how to read different flat files into Python using pandas
Learning how to work with Web APIs, HTML and XML files
Learning how to run SQL queries in Python
Be able to use basic charting functions in pandas and Matplotlib
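
Several of the outcomes above can be sketched in a few lines: slicing a NumPy array, reading a flat file into a pandas DataFrame, and running a SQL query from Python. This is an illustrative sketch only. The CSV text, the table name `scores`, and the in-memory SQLite database are assumptions made for the example, not part of the course material.

```python
import io
import sqlite3

import numpy as np
import pandas as pd

# NumPy slicing on a small 3x4 array holding the values 0..11.
arr = np.arange(12).reshape(3, 4)
first_row = arr[0, :]        # every column of the first row
last_two_cols = arr[:, -2:]  # every row, last two columns

# Read a flat file with pandas; StringIO stands in for a CSV file on disk.
csv_text = "name,score\nasha,82\nben,67\ncara,91\n"
scores = pd.read_csv(io.StringIO(csv_text))
high_scores = scores[scores["score"] > 80]  # boolean-mask row selection

# Run a SQL query from Python against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
scores.to_sql("scores", conn, index=False)
avg = pd.read_sql_query("SELECT AVG(score) AS avg_score FROM scores", conn)
conn.close()

print(first_row)
print(high_scores)
print(avg)
```

A basic chart of the same data could then be drawn with `scores.plot(x="name", y="score", kind="bar")`, which uses Matplotlib under the hood and therefore needs it installed.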
Machine Learning
In this class, the basic thought process behind machine learning is introduced.
3

5
Viewing time: 0:50 hrs, no case study

Appreciation of the basic thought process behind machine learning tasks


Introduction to common Machine Learning tasks
Be able to build common machine learning models using the scikit-learn API
An understanding of common classification error metrics
Understand the use of a confusion matrix in the context of a classification task
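
To make the last two outcomes concrete, here is a small sketch that computes a confusion matrix and three common classification metrics with scikit-learn. The label vectors are hand-made, hypothetical values, not the output of a real model.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical true labels and predictions for a binary task (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For labels 0 and 1, rows are true classes and columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

print(cm)
print(accuracy, precision, recall)
```

Here the data contain 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so accuracy, precision and recall all come out to 0.75.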
Practical Applications of Machine Learning
In this class, the building of machine learning models is demonstrated.
4

10
Viewing time: 1:44 hrs, no case study

Understanding how to pre-process data to build a machine learning model


Be able to use one hot encoding to pre-process categorical variables
Be able to use pipelines to automate steps in model building
Understand the use of ensembles such as Random Forests
Understand the notion of in-sample and out-of-sample error through the bias-variance trade-off
Be able to interpret ROC curves and the AUC metric to compare the performance of different classifiers
Be able to handle multiclass problems using SVMs and Random Forests
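
The outcomes above can be tied together in one short sketch: a scikit-learn pipeline that one-hot encodes a categorical column, fits a Random Forest, and compares in-sample with out-of-sample AUC. The dataset is synthetic, and the column names (`city`, `income`) and the labelling rule are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data (an assumption): one categorical and one numeric feature.
rng = np.random.default_rng(0)
n_rows = 400
X = pd.DataFrame({
    "city": rng.choice(["A", "B", "C"], size=n_rows),
    "income": rng.normal(50, 10, size=n_rows),
})
# Made-up labelling rule: positive when income is above 50 or city is "A".
y = ((X["income"] > 50) | (X["city"] == "A")).astype(int)

# Hold out a test set so out-of-sample performance can be measured.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One-hot encode the categorical column; pass the numeric column through unchanged.
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"])],
    remainder="passthrough",
)

# The pipeline chains pre-processing and the model into a single estimator.
model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
model.fit(X_train, y_train)

# AUC is computed from predicted probabilities of the positive class.
auc_train = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
auc_test = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"in-sample AUC: {auc_train:.3f}, out-of-sample AUC: {auc_test:.3f}")
```

A large gap between the two AUC values would suggest overfitting (high variance). Because the encoder is fitted inside the pipeline, it only ever sees the training rows, which avoids leaking test information into pre-processing. For a multiclass target the same pipeline can be reused, though `roc_auc_score` then requires its `multi_class` argument to be set.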