
Savitribai Phule Pune University, Pune

Second Year Electronics / E&Tc Engineering (2019 Course)


204198: Data Analytics Laboratory
Teaching Scheme: PR: 02 hr/week
Credit: 01
Examination Scheme: OR: 25 Marks

Prerequisite Courses, if any: Good Python programming skills, fundamentals of statistics
Companion Course, if any: Nil
• Course Objectives:
1. To introduce students to the fundamentals of data science
2. To introduce students to various Python packages related to data science
3. To enable students to write Python programs related to data sequences using NumPy and Pandas
4. To enable students to write Python programs related to data frames using NumPy and Pandas

• Course Outcomes:
On completion of the course, students will be able to:
1. Install Python and Jupyter, and write programs using NumPy, Pandas, Matplotlib and Scikit-learn
2. Write programs related to 1D and 2D arrays
3. Write programs using data series
4. Write programs using data frames
5. Write programs to visualize output using various graphs and plots
6. Complete an end-to-end project related to data analytics

Guidelines for Instructor's Manual


This course introduces students to the basics of the Python programming environment for preliminary data science applications. It also introduces data manipulation and cleaning techniques using the popular Python libraries Pandas and Scikit-learn, and introduces the Series and DataFrame abstractions as the central data structures for data analysis.
Design a minimum of ten lab assignments based on the syllabus. The focus shall be on making students take tabular data, clean it, manipulate it, and run basic inferential statistical analyses.
Small real-life data sets are preferred for validating the assignments.

Guidelines for Student's Lab Journal


The student's Lab Journal can consist of assignments submitted as a soft copy or hard copy. In case of soft copy submission, only a printout of the first page needs to be kept in the Journal. It should include the following, as applicable:
Assignment No., Title of Assignment, Date of Performance, Date of Submission, Aims & Objectives, Theory, Description of data used, Results, Conclusion.

Guidelines for Lab /TW Assessment


The oral examination will be based on the work carried out by the student in the Lab course.
Suitable rubrics can be used by the internal & external examiner for assessment.
Guidelines for Laboratory Conduction
During each lab experiment, the following activities will be carried out:
1. The instructor will explain the aims & objectives of the assignment.
2. The instructor will explain the topics required to carry out the experiment.
3. The students will perform the hands-on work as per the Lab manual & web resources provided.
4. The students will show the results to the instructor.
• Note: If required, the teacher can conduct one additional lecture per week to explain theoretical aspects of data science and to demonstrate Python data science library functions.

Suggested Topics for Laboratory Experiments/Assignments

1. Introduction to data analytics and Python fundamentals:
   Understanding the Data
   Python Packages for Data Science
   Importing and Exporting Data in Python
   Getting Started Analyzing Data in Python
   Accessing Databases with Python
2. Data Visualization in Python:
   Matplotlib, Pandas, Seaborn: Scatterplot, Bar chart, Line chart, Histogram
   Other Graphs: Boxplot, Heatmap, Faceting, Pairplot
3. Data Wrangling:
   Pre-processing Data in Python
   Dealing with Missing Values in Python
   Data Formatting in Python
   Data Normalization in Python
   Binning in Python
   Turning categorical variables into quantitative variables in Python
4. Statistical Data Analysis:
   Probability
   Sampling & Sampling Distributions
   Hypothesis Testing
5. Exploratory Data Analysis:
   Descriptive Statistics
   Group By in Python
   Correlation
   Correlation - Statistics
   Analysis of Variance (ANOVA)
6. Model Development:
   Linear Regression and Multiple Linear Regression
   Model Evaluation using Visualization
   Polynomial Regression and Pipelines
   Measures for In-Sample Evaluation
   Prediction and Decision Making
Books & Other Resources:
Reference Books:
1. Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data, O'Reilly
2. Wes McKinney, Python for Data Analysis, 2nd Edition, O'Reilly
3. Joel Grus, Data Science from Scratch: First Principles with Python, O'Reilly
Web Resources:
1. https://swayam.gov.in/nd1_noc20_cs46/
2. https://www.coursera.org/learn/data-analysis-with-python
3. https://www.geeksforgeeks.org/python-for-data-science/
4. https://www.coursera.org/learn/python-data-analysis/home/welcome/
5. https://www.udemy.com/course/data-science-with-python-a-complete-guide-3-in-1/
INTRODUCTION TO DATA ANALYTICS AND PYTHON FUNDAMENTALS
Agenda
• What is Data Science?
• Data Science vs Data Analysis
• Why Python?
• Understanding Python Basics
• Python Packages for Data Science
What is Data Science?
Data science is the field of study that combines
• domain expertise
• programming skills
• knowledge of mathematics and statistics
to extract meaningful insights from data.


Data Science vs Data Analysis
Data Analysis
• curates meaningful insights from known data
• uses data to answer the questions presented to it
Data Science
• deals more with hypotheticals
• tries to predict the future and frames those predictions as new questions
Why Python?
• Cross-functional and interpreted
• Can streamline large, complex data sets
• Rapid application development
• Extensive library support
• Graphical options along with visualization tools
• An extended pack of analytics tools is available
• High readability; easy to learn
Python Packages for Data Science
Python Basics
• Data Types
• Variables
• Identifiers
• Operators
Control Structures
• for
• while
• if-else
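A minimal sketch of these basics in Python; the variable names and values below are illustrative only, not taken from the course material.

```python
# Basic built-in data types bound to variables (identifiers)
count = 10            # int
price = 99.5          # float
name = "sensor_A"     # str
is_active = True      # bool

print(type(count), type(price), type(name), type(is_active))

# Arithmetic and comparison operators
total = count * price          # 995.0
is_expensive = total > 500     # True

# for loop: iterate over a range of values
squares = []
for i in range(5):
    squares.append(i ** 2)     # ends as [0, 1, 4, 9, 16]

# while loop: repeat until the condition becomes false
n = 3
countdown = []
while n > 0:
    countdown.append(n)
    n -= 1                     # ends as [3, 2, 1]

# if-else: choose a branch based on a condition
if is_expensive:
    label = "high"
else:
    label = "low"

print(squares, countdown, label)
```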
Lists
• Creating lists
• Traversing
• Accessing elements
• Slicing
• Methods: append(), insert(), remove()
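The list operations above can be sketched as follows; the list `readings` and its values are hypothetical.

```python
# Creating a list
readings = [12, 15, 11]

# Traversing: visit each element in order
for value in readings:
    print(value)

# Accessing elements by index (0-based; negative indices count from the end)
first = readings[0]    # 12
last = readings[-1]    # 11

# Methods that modify the list in place
readings.append(20)        # [12, 15, 11, 20]
readings.insert(1, 14)     # [12, 14, 15, 11, 20]
readings.remove(11)        # removes the first 11 -> [12, 14, 15, 20]

# Slicing: list[start:stop] includes start and excludes stop
middle = readings[1:3]     # [14, 15]
print(readings, middle)
```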


Lists vs Tuples
List | Tuple
Mutable | Immutable
Uses more memory | Uses memory more efficiently
Slightly slower access | Slightly faster access
Cannot be used as a dictionary key | Can be used as a dictionary key (if all its elements are hashable)
Slicing possible | Slicing possible
Can be created empty: [] | Can be created empty: (); a single-element tuple needs a trailing comma: (5,)
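A short sketch of the differences in the table; the names `point_list`, `point_tuple` and `distances` are illustrative.

```python
point_list = [1, 2]
point_tuple = (1, 2)

point_list[0] = 10          # lists are mutable
try:
    point_tuple[0] = 10     # tuples are immutable: raises TypeError
except TypeError as err:
    print("TypeError:", err)

# A tuple of hashable items can be a dictionary key; a list cannot
distances = {point_tuple: 2.24}
try:
    distances[point_list] = 0.0
except TypeError as err:
    print("TypeError:", err)

# Empty and single-element tuples
empty = ()
single = (5,)       # the trailing comma makes it a tuple
not_a_tuple = (5)   # just the int 5 in parentheses
print(type(single), type(not_a_tuple))

# Both support slicing
print(point_list[:1], point_tuple[:1])
```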
Dictionary
• Stores data as {key: value} pairs
• Storage is based on hashing (a hash table)
• Provides fast access: looking up a value by its key takes constant time on average
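A minimal dictionary sketch, using hypothetical subject marks as data:

```python
# Keys map to values; lookup by key uses the key's hash
marks = {"physics": 78, "maths": 91}

marks["chemistry"] = 85          # insert a new key-value pair
marks["physics"] = 80            # update an existing key

print(marks["maths"])            # lookup by key -> 91
print(marks.get("biology", 0))   # default when the key is missing -> 0

# Traverse all key-value pairs
for subject, score in marks.items():
    print(subject, score)
```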


Functions
• improve reusability
• improve readability
• make debugging easier
• reduce development time
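A sketch of a small reusable function; `mean` here is a hypothetical helper written for illustration (the standard library already provides `statistics.mean`).

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)

# The same function is reused on different inputs
print(mean([2, 4, 6]))      # 4.0
print(mean([1.5, 2.5]))     # 2.0
```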
Packages
• A package is a collection of Python modules
• Commonly used examples: math and sys (standard-library modules), NumPy, Pandas and Matplotlib
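Modules and packages are brought into a program with `import`. A minimal sketch, assuming NumPy and Pandas are installed (the Anaconda distribution includes both); the aliases `np` and `pd` are common conventions, not requirements.

```python
import math   # standard-library module: mathematical functions
import sys    # standard-library module: interpreter information

import numpy as np    # third-party package, conventional alias np
import pandas as pd   # third-party package, conventional alias pd

print(math.sqrt(16))            # 4.0
print(sys.version_info.major)   # major Python version number
print(np.__version__, pd.__version__)
```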
NumPy Package
• Created in 2005 by Travis Oliphant
• Stands for Numerical Python
• A Python library for working with arrays
• Includes functions for linear algebra, Fourier transforms and matrices
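A sketch of the linear algebra, matrix and Fourier transform functions mentioned above; the system of equations and the sine wave are made-up examples.

```python
import numpy as np

# Linear algebra: solve the system 2x + y = 5, x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = np.linalg.solve(A, b)          # approximately [1, 3]

# Matrices: product with the inverse gives (approximately) the identity
identity = A @ np.linalg.inv(A)

# Fourier transform of one period of a sine wave sampled at 8 points
t = np.arange(8)
signal = np.sin(2 * np.pi * t / 8)
spectrum = np.abs(np.fft.fft(signal))   # energy only in frequency bins 1 and 7
print(x, spectrum.round(3))
```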
NumPy Array
A NumPy array is a grid of values, all of the same type, indexed by a tuple of nonnegative integers.
Rank of an array: the number of dimensions.
Shape of an array: a tuple of integers giving the size of the array along each dimension.
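Rank and shape can be read directly from an array's attributes; a minimal sketch with a hypothetical 2 x 3 array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim)    # rank (number of dimensions) -> 2
print(a.shape)   # size along each dimension -> (2, 3)
print(a.dtype)   # a single element type for the whole array (platform-dependent integer type)
```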
Indexing a 2D NumPy array
print(a[row, column])
Indices start at 0, so:
3rd row, 2nd column: a[2, 1]
2nd row only: a[1, :]
2nd column only: a[:, 1]


Pandas
• Created by Wes McKinney in 2008
• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"
• A Python library used for working with data sets
• Includes functions for analyzing, cleaning, exploring and manipulating data
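The two central Pandas structures, Series and DataFrame, can be sketched as follows; the city names and temperatures are made-up values.

```python
import pandas as pd

# A Series is a labelled one-dimensional array
temps = pd.Series([31.5, 29.0, 33.2], index=["Mon", "Tue", "Wed"])

# A DataFrame is a table of labelled columns (each column is a Series)
df = pd.DataFrame({
    "city": ["Pune", "Mumbai", "Nashik"],
    "temp_c": [31.5, 29.0, 33.2],
})

print(temps["Tue"])     # 29.0
print(df.shape)         # (3, 2)
print(df["temp_c"])     # selecting one column returns a Series
```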
What does Pandas Do?
Pandas answers questions about the data, such as:
• Is there a correlation between two or more columns?
• What is the average value?
• What is the maximum value?
• What is the minimum value?
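A sketch of answering these questions with Pandas, using a small hypothetical data set of hours studied and exam scores:

```python
import pandas as pd

# Hypothetical data: hours studied vs. exam score
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 70, 74],
})

print(df.corr())             # pairwise (Pearson) correlation between columns
print(df["score"].mean())    # average value -> 62.4
print(df["score"].max())     # maximum value -> 74
print(df["score"].min())     # minimum value -> 52
```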
Matplotlib
• A low-level graph plotting library
• Serves as a visualization utility
• Created by John D. Hunter
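A minimal Matplotlib sketch, assuming a script run outside Jupyter (hence the non-interactive `Agg` backend and `savefig`); the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")        # simple line chart with circle markers
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A minimal Matplotlib line chart")
fig.savefig("line_chart.png")    # in an interactive session, plt.show() instead
```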
What is Python

IDEs for writing code: Anaconda, Spyder, Jupyter


Latest version of Anaconda
2 SE ETC FDP 1/21/2021
Anaconda IDE Repository
• Install it on the D drive
• Anaconda Navigator
• Anaconda Prompt
• The name "Python" started with Monty Python: others kept up snake-themed names, but Guido van Rossum meant the TV show
• https://towardsdatascience.com/optimizing-jupyter-notebook-tips-tricks-and-nbextensions-26d75d502663
• All additional information that specifies how a particular notebook, cell or output is represented when converted is stored in the notebook metadata
• conda list (lists the packages installed in the active conda environment)

Data Visualization

Popular plotting libraries in Python

Matplotlib

Scatter Plot

Importing necessary libraries

Importing Data
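A sketch of importing the necessary library and importing/exporting data with Pandas; the CSV content and the file name `exported.csv` are hypothetical stand-ins for a real data set.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice pass a file path or URL,
# e.g. pd.read_csv("data.csv")
csv_text = """name,age,fare
Asha,22,7.25
Ravi,38,71.28
Meera,26,7.92
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())       # first rows of the DataFrame
print(df.dtypes)       # column types inferred by read_csv

# Exporting: write the DataFrame back out to a CSV file
df.to_csv("exported.csv", index=False)
```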

Scatter Plot

Scatter Plot: markers, x-ticks and x-tick labels
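A scatter-plot sketch showing markers, x-ticks and x-tick labels; the age and fare values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical data
age = [22, 38, 26, 35, 54, 2, 27]
fare = [7.3, 71.3, 7.9, 53.1, 51.9, 21.1, 11.1]

fig, ax = plt.subplots()
ax.scatter(age, fare, marker="^", color="tab:red")        # triangle markers
ax.set_xticks([0, 20, 40, 60])                            # x-tick positions
ax.set_xticklabels(["0", "20 yrs", "40 yrs", "60 yrs"])   # x-tick labels
ax.set_xlabel("Age")
ax.set_ylabel("Fare")
fig.savefig("scatter.png")
```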

Histogram

Histogram
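A histogram sketch; the ages are randomly generated for illustration, not real data.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical ages drawn from a normal distribution
rng = np.random.default_rng(seed=0)
ages = rng.normal(loc=30, scale=10, size=200)

fig, ax = plt.subplots()
# hist returns the count in each bin, the bin edges and the drawn bars
counts, bin_edges, patches = ax.hist(ages, bins=10, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Frequency")
fig.savefig("histogram.png")
```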

Summary

Scatter Plot

Frequency distribution of Age
A kernel density estimate (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions.
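To show what a KDE computes, here is a sketch of a one-dimensional Gaussian KDE written directly in NumPy, assuming a Gaussian kernel and the normal-reference rule-of-thumb bandwidth 1.06·σ·n^(-1/5). The ages are randomly generated, and `gaussian_kde_1d` is a hypothetical helper; in practice, seaborn's `kdeplot` or pandas' `Series.plot.kde` (which relies on SciPy) draw this curve directly.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `data` at `grid` points."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every observation
    u = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and divide by the bandwidth so the curve integrates to 1
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(seed=0)
ages = rng.normal(loc=30, scale=10, size=200)   # hypothetical ages

# Normal-reference rule of thumb for the bandwidth
h = 1.06 * ages.std(ddof=1) * len(ages) ** (-1 / 5)
grid = np.linspace(ages.min() - 3 * h, ages.max() + 3 * h, 500)
density = gaussian_kde_1d(ages, grid, h)
```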

KDE: distribution of the variable age

BOXPLOT

IQR (Interquartile Range)
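The IQR and the usual box-plot outlier rule can be sketched with NumPy; the sample values are hypothetical. `np.percentile` uses linear interpolation by default, so other tools may report slightly different quartiles for small samples.

```python
import numpy as np

# Hypothetical sample with one unusually large value
data = np.array([7, 9, 10, 11, 12, 13, 14, 15, 40])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                       # interquartile range

# Conventional box-plot limits: 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(q1, median, q3, iqr, outliers)
```

Matplotlib's `plt.boxplot(data)` applies the same 1.5 × IQR rule by default (`whis=1.5`) to decide how far the whiskers reach and which points are drawn as outliers.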

Python - Data Types
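In Pandas, each column has its own data type (dtype). A sketch of checking and converting types, using hypothetical price data stored as text:

```python
import pandas as pd

df = pd.DataFrame({
    "price": ["10.5", "20.0", "15.25"],   # numbers stored as text
    "qty": [1, 2, 3],
})
print(df.dtypes)          # price is a text (object) column, qty is an integer column

# Data formatting: convert the text column to a numeric type
df["price"] = df["price"].astype(float)
print(df.dtypes)          # price is now float64
print(df["price"].sum())  # 45.75
```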

Data Pre-processing

Learning Objectives

Simple Dataframe Operations
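A sketch of a few simple DataFrame operations on a hypothetical passenger table:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "age": [22, 38, 26, 35],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

print(df.head(2))                             # first two rows
print(df.describe())                          # summary statistics of numeric columns
older = df[df["age"] > 30]                    # boolean row filter
df["fare_per_year"] = df["fare"] / df["age"]  # new derived column
print(df[["name", "age"]])                    # select a subset of columns
```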

Missing Values
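A sketch of detecting missing values, assuming (hypothetically) that the raw data marks unknown entries with "?":

```python
import numpy as np
import pandas as pd

# Hypothetical data where "?" marks an unknown value
df = pd.DataFrame({
    "age": ["22", "?", "26", "35"],
    "city": ["Pune", "Mumbai", None, "Nashik"],
})

# Replace the placeholder with NaN so Pandas recognises it as missing
df = df.replace("?", np.nan)

print(df.isnull())          # True where a value is missing
print(df.isnull().sum())    # number of missing values per column
```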

How to drop missing values in Python
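A sketch of dropping missing values with `dropna`; the data are hypothetical. `dropna` returns a new DataFrame, so reassign the result (or pass `inplace=True`) to keep it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [22, np.nan, 26, 35],
    "price": [100.0, 250.0, np.nan, 180.0],
})

rows_dropped = df.dropna(axis=0)               # drop every row containing any NaN
age_rows_dropped = df.dropna(subset=["age"])   # drop only rows missing "age"
cols_dropped = df.dropna(axis=1)               # drop every column containing any NaN
```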

How to replace missing values in Python
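A sketch of replacing missing values with `fillna`, using the column mean for a numeric column and the most frequent value for a categorical column; the data are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [22, np.nan, 26, 35],
    "city": ["Pune", "Mumbai", np.nan, "Pune"],
})

# Numeric column: replace NaN with the column mean (mean() skips NaN)
mean_age = df["age"].mean()
df["age"] = df["age"].fillna(mean_age)

# Categorical column: replace NaN with the most frequent value
most_common_city = df["city"].mode()[0]     # "Pune"
df["city"] = df["city"].fillna(most_common_city)
```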
