
Page 1 of 19

INFORMATICS PRACTICES
(065)

Unit-1
Data Visualization &
Data Handling using Pandas
Project File
(Session: 2023-24)
For
Class 12th

Submitted By: Aashi Nagiya
Submitted To: Manish Agarwal

DATA VISUALIZING
Python Library
Matplotlib
Matplotlib is a comprehensive library for creating static, animated and interactive
visualizations in Python.

1. Develop publication-quality plots with just a few lines of code.
2. Use interactive figures that can zoom, pan or update.

We can customize plots and take full control of line styles, font properties and
axes properties, as well as export to a number of file formats and
interactive environments.

DATA VISUALIZATION

Data visualization is the discipline of trying to understand data by placing it
in a visual context.

The main goal of data visualization is to distill large datasets into visual
graphics that allow an easy understanding of complex relationships within the
data.

Data visualization plays an essential role in the representation of both small- and
large-scale data. It is especially useful for explaining the analysis of increasingly large
datasets.

PURPOSE OF DATA VISUALIZATION

Following are the purposes of Data Visualization:


1. Quick Action
2. Better Analysis
3. Finding Errors
4. Identifying Patterns
5. Understanding the Story

FEATURES OF MATPLOTLIB LIBRARY (DATA VISUALIZATION)


1. Drawing
2. Customization
3. Saving

DRAWING
Plots can be drawn from the passed data through specific functions.

CUSTOMIZATION
Plots can be customized as required by specifying properties such as
color, style, width, label, title and legend.

SAVING
After drawing and customization, plots can be saved for future use.
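
The three features above can be sketched in one short program. This is only a minimal illustration: the data, the file name monthly_sales.png and the temporary folder are assumptions made for the example, and the Agg backend is chosen so the script runs without opening a window.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display window is needed
import matplotlib.pyplot as plt

# Illustrative data (assumed for this sketch)
months = [1, 2, 3, 4, 5]
sales = [10, 14, 9, 17, 20]

# Drawing: pass the data to a plotting function
plt.plot(months, sales, color='blue', linestyle='--', linewidth=2, label='Sales')

# Customization: labels, title and legend
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales')
plt.legend()

# Saving: write the figure to a PNG file for future use
out_path = os.path.join(tempfile.gettempdir(), 'monthly_sales.png')
plt.savefig(out_path)
plt.close()
print('Saved to', out_path)
```

Replacing plt.savefig with plt.show() displays the same figure on screen when an interactive backend is in use.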

TYPES OF PLOTS (USING MATPLOTLIB)


Following are the types of plots using Matplotlib:
1. Line Plot
2. Bar Graph
3. Histogram

LINE PLOT
A line plot is a graph that shows the frequency of data occurring along a number
line.
A line plot is represented by a series of data points connected by straight lines.
When plotting a line graph we can define the grid, the x- and y-axis scales and labels, titles and
display options.

For Example (Program):

import matplotlib.pyplot as plt
# Pass percentage of two schools over five years
year=(2015,2016,2017,2018,2019)
bpspercentage=(90, 92, 94, 95, 97)
srirampercentage=(89, 90, 88, 85, 93)
# The keyword argument is spelled color (colour is not accepted)
plt.plot(year, bpspercentage, color='green')
plt.plot(year, srirampercentage, color='red')
# Show whole years on the x axis
plt.xticks(year)
plt.xlabel('year')
plt.ylabel('passpercentage')
plt.title('Students Board Exams Pass Percentage')
plt.show()

Output:

BAR GRAPH

A bar graph is drawn using rectangular bars to show how large each value is. The bars can be
horizontal or vertical.

A bar graph makes it easy to compare data between different groups at a glance. It represents
categories on one axis and a discrete value on the other. The goal of a bar graph is to show
the relationship between the two axes. Bar graphs can also show big changes in data over
time.

For Example (Program):

import numpy as np
import matplotlib.pyplot as plt
labels=('sandeep', 'chitra', 'mansi', 'mohit', 'ayush', 'shubh')
bpsper=(94,92,96,83,90,76)
# One bar position per student
index=np.arange(len(labels))
plt.bar(index, bpsper)
# Show the student names under the bars instead of 0, 1, 2, ...
plt.xticks(index, labels)
plt.xlabel('student name', fontsize=10)
plt.ylabel('student percentage', fontsize=11)
plt.show()
Output:

HISTOGRAM
A histogram is a graphical representation that organizes a group of data points into
user-specified ranges.

A histogram produces a visual representation of numerical data, showing the number of data
points that fall within each specified range of values.

A histogram is similar to a vertical bar graph but without gaps between the bars.
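
The counting step described above can be tried without drawing anything. The sketch below uses numpy.histogram on an assumed list of marks with assumed bin edges. It returns the number of values that fall in each range, which is the bar height a histogram would show.

```python
import numpy as np

# Illustrative marks (assumed data for this sketch)
marks = [12, 25, 37, 41, 48, 53, 55, 67, 72, 88]
bin_edges = [0, 20, 40, 60, 80, 100]

# counts[i] is the number of marks between bin_edges[i] and bin_edges[i+1]
counts, edges = np.histogram(marks, bins=bin_edges)

for low, high, count in zip(edges[:-1], edges[1:], counts):
    print(f'{low}-{high}: {count}')
```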

For Example (Program):

import matplotlib.pyplot as plt

# One sample value per bin; weights set the height of each bar
# (the keyword is weights, and hist() has no fontsize argument)
plt.hist([5, 15, 25, 35, 45, 55],
         bins=[0, 10, 20, 30, 40, 50, 60],
         weights=[12, 20, 15, 33, 6, 38],
         edgecolor='red')
plt.show()

# Histogram of raw values using the default bins
z=[10, 5, 8, 4, 2]
plt.hist(z)
plt.show()
Output:

DATA STRUCTURE
The two important data structures in Python Pandas are as follows:
1. Series
2. Data Frame
SERIES(S)
A Series is a one-dimensional array-like structure with homogeneous data. The axis
labels are collectively known as the index. A Series can store any type of data such
as integers, floats, strings, Python objects, and so on. It can be created using an array,
a dictionary or a constant value.

For Example:
The following series is a collection of integers.

SERIES
7 25 90 -1 136 76 5

1D

FEATURES OF SERIES
• Homogeneous Data
• Series Data / Values are Mutable
• Series Size is Immutable
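
These features can be checked with a short sketch. It reuses the integers from the example above. The use of pd.concat to obtain a longer Series is an illustrative choice, not the only way.

```python
import pandas as pd

s = pd.Series([7, 25, 90, -1, 136, 76, 5])

# Values are mutable: an existing element can be changed in place
s.iloc[0] = 100
print(s.iloc[0])

# Homogeneous data: every element shares one dtype
print(s.dtype)

# Changing a value does not change the length; here a longer Series
# is obtained by building a new object with pd.concat
bigger = pd.concat([s, pd.Series([42])], ignore_index=True)
print(len(s), len(bigger))
```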

DATA FRAMES (DF)


A Data Frame is like a two-dimensional array with heterogeneous data. DataFrames are
one of the most common data structures used in modern data analytics because they
are a flexible and intuitive way of storing and working with data.

For Example:

DATA FRAME (TABLE)

Roll no.   Student Name   Address   Age   E-mail
-          -              -         -     -
-          -              -         -     -
-          -              -         -     -
-          -              -         -     -

2D (rows along the Y axis, columns along the X axis)

FEATURES OF DATA FRAMES (DF)


• Heterogeneous Data
• Data Frame Values are Mutable
• Data Frame Size is Mutable
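
A small sketch of these features follows. The student names and e-mail addresses are made-up values used only for illustration.

```python
import pandas as pd

# Hypothetical student records
df = pd.DataFrame({'Roll no.': [1, 2],
                   'Student Name': ['Asha', 'Ravi'],
                   'Age': [17, 18]})

# Heterogeneous data: each column can have its own dtype
print(df.dtypes)

# Values are mutable: change one cell in place
df.loc[0, 'Age'] = 18

# Size is mutable: adding a column changes the shape of the same object
df['E-mail'] = ['asha@example.com', 'ravi@example.com']
print(df.shape)
```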

PANDAS SERIES
A Pandas Series is like a one-dimensional array, capable of holding any type of data
such as integers, floats, strings and Python objects.

SYNTAX (PANDAS SERIES)

pandas.Series(data, index, dtype, copy)

A Series can be created using:
• Array
• Dict
• Scalar Value

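Example 1
Create an Empty Pandas Series
import pandas as pd
# dtype is given explicitly because the default dtype of an empty
# Series depends on the pandas version
s=pd.Series(dtype='float64')
print(s)
Output:
Series([], dtype: float64)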

Example 2
import pandas as pd1
import numpy as np1
data=np1.array(['a', 'b', 'c', 'd'])
s=pd1.Series(data, index=[100, 110, 120, 130])
print(s)
Output:
100    a
110    b
120    c
130    d
dtype: object

Example 3
Create a Pandas Series from a Dict (without passing an Index)
import pandas as pd1
# The dictionary keys become the index labels
data={'a':0.0, 'b':1.0, 'c':2.0, 'd':3.0}
s=pd1.Series(data)
print(s)
Output:
a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64

Example 4
Create a Series from Scalar
import pandas as pd1
# The scalar value 5 is repeated for every index label
s=pd1.Series(5, index=[0, 1, 2, 3])
print(s)
Output:
0    5
1    5
2    5
3    5
dtype: int64

Example 5
Maths Operations with Pandas Series
1.
import pandas as pd1
s=pd1.Series([1, 2, 3])
m=pd1.Series([1, 2, 3])
n=s+m  # perform addition operation
print(n)
Output:
0    2
1    4
2    6
dtype: int64
2.
import pandas as pd1
s=pd1.Series([1, 2, 3])
m=pd1.Series([1, 2, 4])
n=s*m  # perform multiplication operation
print(n)
Output:
0     1
1     4
2    12
dtype: int64

PYTHON PANDAS HEAD AND TAIL FUNCTIONS

Head Function ( ):
head() returns the first n rows. The default number of elements to display is 5,
but we may pass a custom number.

For Example
import pandas as pd1
s=pd1.Series([1, 2, 3, 4, 5],index=['a', 'b', 'c', 'd', 'e'])
print(s.head(3))
Output:
a    1
b    2
c    3
dtype: int64

Tail Function ( ):
tail() returns the last n rows. The default number of elements to display
is 5, but we may pass a custom number.

For Example
import pandas as pd1
s=pd1.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])
print(s.tail(3))
Output:
d    4
e    5
f    6
dtype: int64
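
The default of 5 mentioned above can be seen with a Series that has more than five elements. The values 1 to 10 are an assumption made for this sketch.

```python
import pandas as pd

s = pd.Series(range(1, 11))   # values 1 to 10

first_five = s.head().tolist()   # no argument, so the first 5 values
last_five = s.tail().tolist()    # no argument, so the last 5 values
last_two = s.tail(2).tolist()    # custom number

print(first_five)
print(last_five)
print(last_two)
```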

ACCESS DATA FROM SERIES WITH STRING (LABEL)

For Example
import pandas as pd1
s=pd1.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])
# Use the string label of the element that is needed
print(s['d'])
Output:
4

ACCESS DATA FROM SERIES WITH SLICING

For Example
import pandas as pd1
s=pd1.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])
# Slice out the first three elements
print(s[:3])
Output:
a    1
b    2
c    3
dtype: int64
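
Label-based and position-based access can also be written explicitly with .loc and .iloc, which avoids confusion about whether a value in square brackets is a label or a position. The following is a minimal sketch on the same Series.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])

by_label = s.loc['d']                    # element with label 'd'
by_position = s.iloc[3]                  # element at position 3 (the fourth one)
first_three = s.iloc[:3].tolist()        # positions 0, 1 and 2
label_slice = s.loc['a':'c'].tolist()    # a label slice includes both ends

print(by_label, by_position)
print(first_three)
print(label_slice)
```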

EXAMPLE OF DATA FRAME

Example 1
Create an Empty Data Frame
import pandas as pd1
df1=pd1.DataFrame()
print(df1)
Output:
Empty DataFrame
Columns: []
Index: []

Example 2
Create a Data frame from List
import pandas as pd1
data1=[1, 2, 3, 4, 5]
df1=pd1.DataFrame(data1)
print(df1)
Output:
0
0 1
1 2
2 3
3 4
4 5

Example 3
Create a Data Frame from a Dict of ndarrays or Lists
import pandas as pd1
data1={'Name':['Mansi', 'Krishna', 'Sandeep', 'Aashi'], 'Age':[17, 17, 17, 18]}
df1=pd1.DataFrame(data1)
print(df1)
Output:
Name Age
0 Mansi 17
1 Krishna 17
2 Sandeep 17
3 Aashi 18

Example 4
Create a Data Frame from a List of Dicts
import pandas as pd1
data1=[{'x':1, 'y':2},{'x':10, 'y':12, 'z':15}]
df1=pd1.DataFrame(data1)
print(df1)
Output:
x y z
0 1 2 NaN
1 10 12 15.0

Example 5
Create a Data Frame from a Dict of Series
import pandas as pd1
d1={'One':pd1.Series([1, 2, 3], index=('a','b', 'c')),
'Two':pd1.Series([1, 2, 3, 4],index=('a','b', 'c', 'd'))}
# Each Series becomes a column; a missing label is filled with NaN
df1=pd1.DataFrame(d1)
print(df1)
Output:
   One  Two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

CSV FILES (COMMA SEPARATED VALUES)

• CSV stands for Comma Separated Values.
• It is a type of plain text file that uses specific structuring to arrange tabular data.
• It can contain only actual text data, that is, printable ASCII or Unicode
characters.
For Example
Create a Data Frame from CSV file
Or
Import Data Frame from CSV File

Suppose we have a CSV file named [BPS Product CSV] that contains the following data.

DATE        RATE      PRODUCT NAME   WEIGHT
27-2-2020   1016.50   Joystick       250
19-9-2022   119.00    Mouse          50
11-8-2023   565.50    Keyboard       300

import pandas as pd1
#read the BPS product CSV file into a data frame
data1=pd1.read_csv("BPS product.csv")
#preview the first line of the loaded data, with its column names
print(data1.head(1))
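
To try this without an existing file, the sketch below first writes a small CSV file and then reads it back. The temporary file name bps_product_example.csv and the comma-separated layout of the columns are assumptions made for this illustration.

```python
import os
import tempfile

import pandas as pd

# Write a small CSV file so that the example is self-contained
csv_text = (
    "DATE,RATE,PRODUCT NAME,WEIGHT\n"
    "27-2-2020,1016.50,Joystick,250\n"
    "19-9-2022,119.00,Mouse,50\n"
    "11-8-2023,565.50,Keyboard,300\n"
)
csv_path = os.path.join(tempfile.gettempdir(), "bps_product_example.csv")
with open(csv_path, "w") as f:
    f.write(csv_text)

# Read the file into a Data Frame; the first line supplies the column names
data1 = pd.read_csv(csv_path)
print(data1.head(1))   # preview the first row
print(data1.shape)     # (number of rows, number of columns)
```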

MANIPULATION OR UPDATION IN PYTHON PANDAS DATA FRAME

• Column Addition in Data Frame

import pandas as pd
df=pd.DataFrame({"A":[1, 2, 3], "B":[4, 5, 6]})
c=[7, 8, 9]
df["C"]=c
print(df)
Output:
A B C
0 1 4 7
1 2 5 8
2 3 6 9

• Column Deletion
#df1 is a Data Frame that has the columns 'One' and 'Two'
del df1['One']
#deleting the first column using the del statement
df1.pop('Two')
#deleting another column using the pop function

• Column Rename
import pandas as pd
df=pd.DataFrame({"A":[1, 2, 3], "B":[4, 5, 6]})
#rename returns a new Data Frame, so the result is assigned back
df=df.rename(columns={'A': 'a', 'B': 'b'})
print(df)
Output:
   a  b
0  1  4
1  2  5
2  3  6
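
The deletion and rename operations can be combined in one self-contained sketch. The column names A, B and C are assumptions for the example. Note that del and pop change the Data Frame in place, while rename returns a new one.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

del df["A"]              # removes column A in place
removed = df.pop("B")    # removes column B and returns it as a Series
print(removed.tolist())

renamed = df.rename(columns={"C": "c"})   # df itself keeps the old name
print(list(df.columns))
print(list(renamed.columns))
```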

PYTHON PANDAS DATA FRAME ROW SELECTION, ADDITION AND DELETION
import pandas as pd1
d1={'one':pd1.Series([1, 2, 3], index=['a', 'b', 'c']), 'two':pd1.Series([1, 2, 3, 4],
index=['a', 'b', 'c', 'd'])}
df1=pd1.DataFrame(d1)
print(df1.loc['b'])
Output:
one    2.0
two    2.0
Name: b, dtype: float64
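
The program above shows row selection. Row addition and deletion can be sketched as follows, using pd.concat and drop as one possible approach. The small Data Frame and the new row values are assumptions for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=['a', 'b', 'c'])

# Row addition: concatenate a one-row Data Frame that carries the new label
new_row = pd.DataFrame({'one': [10], 'two': [20]}, index=['d'])
df1 = pd.concat([df1, new_row])
print(df1)

# Row deletion: drop returns a new Data Frame without the given label
df1 = df1.drop('a')
print(df1)
```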

PANDAS DATA FRAME

Indexing in a Data Frame using the loc function or .loc[ ]

The loc function selects data by the labels of rows and columns.
.loc means location by label, that is, by the actual index values,
whereas .iloc means location by integer position.
For Example (Program):
import pandas as pd
import numpy as np
df=pd.DataFrame(np.random.randn(8,4),
index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
columns=['A', 'B', 'C', 'D'])
print(df.loc[:,'A'])
Output (the values are random, so they differ on every run):
a 0.503603
b -1.917947
c -0.002972
d 2.097685
e 0.570147
f -0.267406
g -0.526720
h -0.540640
Name: A, dtype: float64
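
The difference between .loc and .iloc is easier to see with fixed values. The small Data Frame below is an assumption made for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]}, index=['a', 'b', 'c'])

by_label = df.loc['b', 'A']                # row label 'b', column label 'A'
by_position = df.iloc[1, 0]                # second row, first column
column_a = df.loc[:, 'A'].tolist()         # whole column A, selected by label
first_two_b = df.iloc[:2, 1].tolist()      # first two rows of the second column

print(by_label, by_position)
print(column_a)
print(first_two_b)
```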
