
Gyan Ganga Institute of Technology and Sciences, Jabalpur

Computer Science and Engineering


CSE-Data Science, IV semester DS406

PYTHON FOR DATA SCIENCE

LAB MANUAL
List of Experiments

1. Write a python program to reverse a string.

2. Write a Python program to perform the following operations using lists:

a. append element in the list

b. compare two lists

c. convert list to dictionary

3. Write a program to transpose a table/pandas data frame.

4. Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10.

5. Write a Python program to perform the following operations on Data Frames:

a. Create two different Data Frames and perform merging operations on them.

b. Create two different Data Frames and perform grouping operations on them.

c. Create two different Data Frames and perform concatenating operations on them.

6. Write a Python program to check whether a regular expression pattern matches a string.

7. Create a sample dataset and apply the following aggregation functions to it:

1. mean(), median()
2. min(), max()
3. std(), var()
4. sum()

8. Write a Python program to get row-wise proportions using the crosstab() function.

9. Write a python program to display a bar chart of the popularity of programming languages.

10. Write a Python program to create a bar plot of scores by group and gender. Use multiple X
values on the same chart for men and women.

Sample Data:
Means (men) = (22, 30, 35, 35, 26)
Means (women) = (25, 32, 30, 35, 29)
Experiment 1

Aim: Write a python program to reverse a string.

def reverse_string(s):
    # build the result by walking from the last character to the first
    revstr = ""
    index = len(s)
    while index > 0:
        revstr = revstr + s[index - 1]
        index = index - 1
    return revstr

print(reverse_string("Welcome"))

Output:

emocleW
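Python strings also support slicing, which gives a shorter way to reverse a string. The sketch below shows two common idioms, `s[::-1]` and `"".join(reversed(s))`; the function names `reverse_by_slicing` and `reverse_by_reversed` are illustrative only.

```python
def reverse_by_slicing(s):
    # a slice step of -1 walks the string from the last character to the first
    return s[::-1]


def reverse_by_reversed(s):
    # reversed() yields the characters back to front; join() glues them into one string
    return "".join(reversed(s))


print(reverse_by_slicing("Welcome"))   # emocleW
print(reverse_by_reversed("Welcome"))  # emocleW
```

Both versions build the result in a single step, whereas the loop above creates a new string on every `revstr + ...` concatenation.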
Experiment 2

Aim: Write a Python program to perform the following operations using lists:

1. append element in the list
2. compare two lists
3. convert list to dictionary

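# appending an element to the list

list1 = [1, 2, 3, 4, 5]

# print the original elements, one per line
for i in list1:
    print(i)

list2 = ['a', 'b', 'c', 'd', 'e']

# append() adds list2 as a single (nested) element at the end of list1
list1.append(list2)

print(list1)

Output:

1
2
3
4
5
[1, 2, 3, 4, 5, ['a', 'b', 'c', 'd', 'e']]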
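Because `append()` adds its argument as one element, the result above contains a nested list. A short sketch contrasting it with `extend()` (adds each element individually) and `insert()` (adds at a chosen position); copies are used so each call starts from the same list.

```python
list1 = [1, 2, 3, 4, 5]
list2 = ['a', 'b', 'c', 'd', 'e']

nested = list1.copy()
nested.append(list2)      # list2 becomes ONE element, so the length is 6

flat = list1.copy()
flat.extend(list2)        # every element of list2 is added, so the length is 10

front = list1.copy()
front.insert(0, 0)        # insert(index, value) places 0 at position 0

print(nested)  # [1, 2, 3, 4, 5, ['a', 'b', 'c', 'd', 'e']]
print(flat)    # [1, 2, 3, 4, 5, 'a', 'b', 'c', 'd', 'e']
print(front)   # [0, 1, 2, 3, 4, 5]
```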


# comparing two lists (sorting first makes the comparison ignore element order)

l1 = [10, 20, 30, 40, 50]
l2 = [10, 20, 30, 50, 40, 70]
l3 = [50, 10, 30, 20, 40]

l1.sort()
l2.sort()
l3.sort()

if l1 == l3:
    print("The lists l1 and l3 are the same")
else:
    print("The lists l1 and l3 are not the same")

if l1 == l2:
    print("The lists l1 and l2 are the same")
else:
    print("The lists l1 and l2 are not the same")

Output:

The lists l1 and l3 are the same
The lists l1 and l2 are not the same
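Note that `sort()` reorders the original lists in place. An alternative sketch uses `collections.Counter` to compare the lists as multisets (same elements with the same counts, order ignored) without modifying them; the helper name `same_elements` is hypothetical.

```python
from collections import Counter


def same_elements(a, b):
    # Counter maps each element to how often it occurs, so duplicates are respected
    return Counter(a) == Counter(b)


l1 = [10, 20, 30, 40, 50]
l2 = [10, 20, 30, 50, 40, 70]
l3 = [50, 10, 30, 20, 40]

print(same_elements(l1, l3))  # True
print(same_elements(l1, l2))  # False
print(l3)                     # [50, 10, 30, 20, 40], the original order is kept
```

A plain `l1 == l3` without sorting would return False here, because `==` on lists also compares element order.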

# convert list to dictionary

index = [1, 2, 3]
languages = ['a', 'b', 'c']

# zip() pairs the i-th key with the i-th value
dictionary = {k: v for k, v in zip(index, languages)}

print(dictionary)

Output:

{1: 'a', 2: 'b', 3: 'c'}
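The comprehension can also be written with the built-in `dict()` constructor. The sketch below shows `dict(zip(...))`, `dict(enumerate(...))` (keys generated automatically) and `dict.fromkeys(...)` (one shared value for every key).

```python
index = [1, 2, 3]
languages = ['a', 'b', 'c']

d1 = dict(zip(index, languages))          # same result as the comprehension
d2 = dict(enumerate(languages, start=1))  # keys 1, 2, 3 are generated for us
d3 = dict.fromkeys(languages, 0)          # every key gets the same value 0

print(d1)  # {1: 'a', 2: 'b', 3: 'c'}
print(d2)  # {1: 'a', 2: 'b', 3: 'c'}
print(d3)  # {'a': 0, 'b': 0, 'c': 0}
```

`zip()` stops at the shorter list, so if the key and value lists differ in length, the extra items are silently dropped.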


Experiment 3

Aim: Write a program to transpose a table/pandas data frame.

import pandas as pd

d1 = {'c1': [2, 3], 'c2': [4, 5]}

df1 = pd.DataFrame(data=d1)
print(df1)

# .T swaps rows and columns (same as df1.transpose())
df1_transpose = df1.T
print(df1_transpose)

Output:

   c1  c2
0   2   4
1   3   5
    0  1
c1  2  3
c2  4  5
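One detail worth knowing: when the columns have different dtypes, each column of the transposed frame must hold values of several types, so pandas stores them as `object`. A small sketch with a made-up two-column frame (`name`, `score` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['x', 'y'], 'score': [2, 3]})
print(df.dtypes)         # name: object, score: int64

t = df.transpose()       # same as df.T
print(t.dtypes)          # both columns are object, since each mixes a str and an int

back = t.T               # transposing twice restores the original shape ...
print(back.dtypes)       # ... but 'score' is still object
print(back.infer_objects().dtypes)  # infer_objects() recovers int64 for 'score'
```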
Experiment 4

Aim: Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10.

import numpy as np
x = np.arange(2, 11).reshape(3,3)
print(x)

Output:
[[ 2  3  4]
 [ 5  6  7]
 [ 8  9 10]]
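`np.arange(2, 11)` excludes the stop value 11, so it yields exactly nine values (2 to 10) for the nine cells. The sketch below shows `reshape(3, -1)`, where NumPy infers the second dimension, an equivalent float version built with `np.linspace`, and what happens when the element count does not fit the shape.

```python
import numpy as np

x = np.arange(2, 11).reshape(3, -1)   # -1 lets NumPy infer the column count (9 / 3 = 3)
print(x.shape)                        # (3, 3)

y = np.linspace(2, 10, 9).reshape(3, 3)  # 9 evenly spaced floats from 2.0 to 10.0 inclusive
print(y.dtype)                           # float64

# np.arange(2, 10) has only 8 values (2..9), so it cannot fill a 3x3 matrix
try:
    np.arange(2, 10).reshape(3, 3)
except ValueError as err:
    print("reshape failed:", err)
```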
Experiment 5

Aim: Write a Python program to perform the following operations on Data Frames:

a. Create two different Data Frames and perform merging operations on them.

b. Create two different Data Frames and perform grouping operations on them.

c. Create two different Data Frames and perform concatenating operations on them.

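c. Concatenating Data Frames: pd.concat() stacks the Data Frames one below another (three frames are used here). Note that this is concatenation, not merging; merging (part a) joins Data Frames on common key columns with pd.merge().
import pandas as pd

df1 = pd.DataFrame(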
{
"A": ["A0", "A1", "A2", "A3"],
"B": ["B0", "B1", "B2", "B3"],
"C": ["C0", "C1", "C2", "C3"],
"D": ["D0", "D1", "D2", "D3"],
},
index=[0, 1, 2, 3],
)
df2 = pd.DataFrame(
{
"A": ["A4", "A5", "A6", "A7"],
"B": ["B4", "B5", "B6", "B7"],
"C": ["C4", "C5", "C6", "C7"],
"D": ["D4", "D5", "D6", "D7"],
},
index=[4, 5, 6, 7],
)
df3 = pd.DataFrame(
{
"A": ["A8", "A9", "A10", "A11"],
"B": ["B8", "B9", "B10", "B11"],
"C": ["C8", "C9", "C10", "C11"],
"D": ["D8", "D9", "D10", "D11"],
},
index=[8, 9, 10, 11],
)

frames = [df1, df2, df3]

result = pd.concat(frames)
print(result)
Output:

      A    B    C    D
0    A0   B0   C0   D0
1    A1   B1   C1   D1
2    A2   B2   C2   D2
3    A3   B3   C3   D3
4    A4   B4   C4   D4
5    A5   B5   C5   D5
6    A6   B6   C6   D6
7    A7   B7   C7   D7
8    A8   B8   C8   D8
9    A9   B9   C9   D9
10  A10  B10  C10  D10
11  A11  B11  C11  D11
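Concatenation only stacks frames; a merge matches rows by key. A minimal sketch of the merging operation with `pd.merge()`, using two small illustrative frames that share a `key` column (the data are made up for the example):

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"],
                     "A": ["A0", "A1", "A2", "A3"]})
right = pd.DataFrame({"key": ["K0", "K1", "K2", "K4"],
                      "B": ["B0", "B1", "B2", "B4"]})

# inner join (the default): only keys present in both frames, here K0, K1, K2
inner = pd.merge(left, right, on="key")

# outer join: every key from either frame; cells with no partner become NaN
outer = pd.merge(left, right, on="key", how="outer")

print(inner)
print(outer)
```

The `how` parameter also accepts `"left"` and `"right"`, which keep all keys from one side only.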

b. Create two different Data Frames and perform grouping operations on them.

Any groupby operation involves one or more of the following steps on the original object:
• Splitting the object into groups
• Applying a function to each group
• Combining the results
In many situations, we split the data into sets and apply some function to each subset. In the apply step, we can perform the following operations:
• Aggregation: computing a summary statistic for each group
• Transformation: performing a group-specific computation that returns one value per original row
• Filtration: discarding whole groups according to some condition
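The three kinds of apply operation listed above can be seen side by side in a small sketch. The data frame is a reduced, illustrative version of the IPL data used later in this experiment; aggregation uses `sum()`, transformation uses `transform()`, and filtration uses `filter()`.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils", "Kings", "Kings", "Royals"],
    "Points": [876, 789, 863, 673, 741, 756, 701],
})
grouped = df.groupby("Team")  # split

# aggregation: one summary value per group
totals = grouped["Points"].sum()

# transformation: one value per original row (distance from the group mean)
centred = grouped["Points"].transform(lambda s: s - s.mean())

# filtration: keep only the rows of groups with at least two entries
regular_teams = grouped.filter(lambda g: len(g) >= 2)

print(totals)
print(centred)
print(regular_teams)
```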

Example:

Split Data into Groups

A pandas object can be split into groups on any of its columns (or index levels). There are multiple ways to split an object, for example:

• obj.groupby('key')
• obj.groupby(['key1','key2'])
• obj.groupby(key, axis=1) (splits along the columns; deprecated in recent pandas versions)
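The list-of-keys form groups on every combination of the listed columns. A minimal sketch with a few made-up rows in the style of the IPL data:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Riders"],
    "Year": [2014, 2015, 2014, 2014],
    "Points": [876, 789, 863, 700],
})

# one group per (Team, Year) pair; the result has a two-level index
by_team_year = df.groupby(["Team", "Year"])["Points"].sum()
print(by_team_year)
```

Here the two Riders rows from 2014 fall into the same group, so the result has three entries.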

import pandas as pd

# note: 'kings' (lowercase) is a different value from 'Kings', so it forms its own group
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Year')

# iterating over a GroupBy object yields (group key, sub-DataFrame) pairs
for name, group in grouped:
    print(name)
    print(group)

Output:
2014
Team Rank Year Points
0 Riders 1 2014 876
2 Devils 2 2014 863
4 Kings 3 2014 741
9 Royals 4 2014 701
2015
Team Rank Year Points
1 Riders 2 2015 789
3 Devils 3 2015 673
5 kings 4 2015 812
10 Royals 1 2015 804
2016
Team Rank Year Points
6 Kings 1 2016 756
8 Riders 2 2016 694
2017
Team Rank Year Points
7 Kings 1 2017 788
11 Riders 2 2017 690

Using the get_group() method, we can select a single group.

# import the pandas library
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

grouped = df.groupby('Year')
print(grouped.get_group(2014))

Output:
Team Rank Year Points
0 Riders 1 2014 876
2 Devils 2 2014 863
4 Kings 3 2014 741
9 Royals 4 2014 701

Aggregations

An aggregation function returns a single aggregated value for each group. Once the GroupBy object is created, several aggregation operations can be performed on the grouped data.
The most direct one is aggregation via the aggregate() method or its equivalent alias agg():

# import the pandas library
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Aggregations using the agg() function
grouped = df.groupby('Year')
print(grouped['Points'].agg('mean'))

grouped = df.groupby('Team')
# number of values in each column of each group (grouped.size() gives one total per group)
print(grouped.count())

# pass a list of functions to aggregate with; the result is a DataFrame
print(grouped['Points'].agg(['sum', 'mean', 'std']))

# Transformation: standardise each column within its group (z-score scaled by 10)
grouped = df.groupby('Team')
score = lambda x: (x - x.mean()) / x.std() * 10
print(grouped.transform(score))

Output:

Year
2014    795.25
2015    769.50
2016    725.00
2017    739.00
Name: Points, dtype: float64
        Rank  Year  Points
Team
Devils     2     2       2
Kings      3     3       3
Riders     4     4       4
Royals     2     2       2
kings      1     1       1
         sum        mean         std
Team
Devils  1536  768.000000  134.350288
Kings   2285  761.666667   24.006943
Riders  3049  762.250000   88.567771
Royals  1505  752.500000   72.831998
kings    812  812.000000         NaN
         Rank       Year     Points
0  -15.000000 -11.618950  12.843272
1    5.000000  -3.872983   3.020286
2   -7.071068  -7.071068   7.071068
3    7.071068   7.071068  -7.071068
4   11.547005 -10.910895  -8.608621
5         NaN        NaN        NaN
6   -5.773503   2.182179  -2.360428
7   -5.773503   8.728716  10.969049
8    5.000000   3.872983  -7.705963
9    7.071068  -7.071068  -7.071068
10  -7.071068   7.071068   7.071068
11   5.000000  11.618950  -8.157595
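pandas also supports "named aggregation", where each output column is given a name together with a (column, function) pair. A minimal sketch on a reduced, illustrative subset of the IPL data; the output names `total`, `average` and `games` are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils", "Kings"],
    "Points": [876, 789, 863, 673, 741],
})

summary = df.groupby("Team").agg(
    total=("Points", "sum"),      # new column name = (source column, function)
    average=("Points", "mean"),
    games=("Points", "count"),
)
print(summary)
```

This avoids the generic `sum`/`mean`/`std` column names that a plain list of functions produces.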


Experiment 6

Aim: Write a Python program to check whether a regular expression pattern matches a string.

# the 're' module provides regular expression support
import re

search_string = "Hello-World"
pattern = "World"

# re.match() only looks for a match at the BEGINNING of the string
match = re.match(pattern, search_string)

# re.match() returns a match object on success and None otherwise
if match:
    print("regex matches: ", match.group())
else:
    print('pattern not found')

Output:

pattern not found
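The pattern is not found because "World" does not occur at the start of "Hello-World". The sketch below compares the three main matching functions of the `re` module on the same string.

```python
import re

text = "Hello-World"

# match(): the pattern must start at position 0
print(re.match("World", text))                        # None

# search(): scans the whole string for the first occurrence
print(re.search("World", text).group())               # World

# fullmatch(): the pattern must cover the entire string
print(re.fullmatch(r"Hello-\w+", text) is not None)   # True
```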

Experiment 7

Aim: Create a sample dataset and apply the following aggregation functions to it:

1. mean(), median()
2. min(), max()
3. std(), var()
4. sum()

import pandas as pd

raw_data = {'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
            'age': [20, 19, 22, 21],
            'favorite_color': ['blue', 'blue', 'yellow', 'green'],
            'grade': [88, 92, 95, 70]}

df = pd.DataFrame(raw_data, index=['Willard Morris', 'Al Jennings', 'Omar Mullins',
                                   'Spencer McDaniel'])

print(df)

# each call aggregates the 'grade' column to a single value
print(df['grade'].mean())
print(df['grade'].median())
print(df['grade'].sum())
print(df['grade'].var())   # sample variance (ddof=1)
print(df['grade'].std())   # sample standard deviation (ddof=1)
print(df['grade'].max())
print(df['grade'].min())

Output:

                              name  age favorite_color  grade
Willard Morris      Willard Morris   20           blue     88
Al Jennings            Al Jennings   19           blue     92
Omar Mullins          Omar Mullins   22         yellow     95
Spencer McDaniel  Spencer McDaniel   21          green     70
86.25
90.0
345
125.58333333333333
11.206396982676159
95
70
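The seven separate calls can also be combined: passing a list of function names to `agg()` returns all the statistics at once as a Series indexed by function name. A sketch using the same four grades:

```python
import pandas as pd

grades = pd.Series([88, 92, 95, 70], name="grade")

# one call, one labelled result per aggregation function
stats = grades.agg(["mean", "median", "min", "max", "std", "var", "sum"])
print(stats)
```

`df.describe()` is a related shortcut that reports count, mean, std, min, quartiles and max for every numeric column.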
Experiment 8

Aim: Write a Python program to get row-wise proportions using the crosstab() function.

import pandas as pd

# Create a DataFrame
d = {
    'Name': ['Alisa', 'Bobby', 'Cathrine', 'Alisa', 'Bobby', 'Cathrine',
             'Alisa', 'Bobby', 'Cathrine', 'Alisa', 'Bobby', 'Cathrine'],
    'Exam': ['Semester 1', 'Semester 1', 'Semester 1', 'Semester 1', 'Semester 1', 'Semester 1',
             'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2'],
    'Subject': ['Mathematics', 'Mathematics', 'Mathematics', 'Science', 'Science', 'Science',
                'Mathematics', 'Mathematics', 'Mathematics', 'Science', 'Science', 'Science'],
    'Result': ['Pass', 'Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Pass', 'Fail', 'Fail', 'Pass', 'Pass', 'Fail']}

df = pd.DataFrame(d, columns=['Name', 'Exam', 'Subject', 'Result'])

# Two-way frequency table (cross table) with row and column totals
my_crosstab = pd.crosstab(df.Subject, df.Result, margins=True)
print(my_crosstab)

# Rename the index and columns
my_crosstab.columns = ["Fail", "Pass", "rowtotal"]
my_crosstab.index = ["Mathematics", "Science", "coltotal"]

# Row-wise proportion: divide every cell by the total of its row
print(my_crosstab.div(my_crosstab["rowtotal"], axis=0))

Output:

Result       Fail  Pass  All
Subject
Mathematics     3     3    6
Science         2     4    6
All             5     7   12
                 Fail      Pass  rowtotal
Mathematics  0.500000  0.500000       1.0
Science      0.333333  0.666667       1.0
coltotal     0.416667  0.583333       1.0
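pandas can also compute the row proportions directly: `pd.crosstab()` accepts `normalize="index"`, which divides each cell by its row total. A sketch with the same Subject and Result values as this experiment (Name and Exam are omitted because they are not used in the table):

```python
import pandas as pd

df = pd.DataFrame({
    "Subject": ["Mathematics"] * 3 + ["Science"] * 3 + ["Mathematics"] * 3 + ["Science"] * 3,
    "Result": ["Pass", "Pass", "Fail", "Pass", "Fail", "Pass",
               "Pass", "Fail", "Fail", "Pass", "Pass", "Fail"],
})

# normalize="index" divides every cell by its row total, so each row sums to 1
row_props = pd.crosstab(df.Subject, df.Result, normalize="index")
print(row_props)
```

`normalize="columns"` gives column-wise proportions instead, and `normalize="all"` divides by the grand total.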

Experiment 9

Aim: Write a Python program to display a bar chart of the popularity of programming languages.

import matplotlib.pyplot as plt

x = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]
x_pos = list(range(len(x)))   # one bar position per language

plt.bar(x_pos, popularity, color=(0.4, 0.6, 0.8, 1.0), edgecolor='blue')

plt.xlabel("Languages")
plt.ylabel("Popularity")
plt.title("Popularity of Programming Language\n"
          "Worldwide, Oct 2017 compared to a year ago")
plt.xticks(x_pos, x)
# Turn on the minor ticks used by the minor grid
plt.minorticks_on()
# Customize the major grid
plt.grid(which='major', linestyle='-', linewidth=0.5, color='red')
# Customize the minor grid
plt.grid(which='minor', linestyle=':', linewidth=0.5, color='black')
plt.show()

Output:
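A possible variant, assuming a ranked view is wanted: a horizontal bar chart with the languages sorted by popularity, drawn with matplotlib's object-oriented API (`fig`, `ax`). The `Agg` backend line is an assumption added only so the sketch runs without a display; drop it and call `plt.show()` to view the chart interactively.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed in an interactive session
import matplotlib.pyplot as plt

languages = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]

# sort ascending so the most popular language ends up at the top of a horizontal chart
pairs = sorted(zip(popularity, languages))
values = [p for p, _ in pairs]
labels = [name for _, name in pairs]

fig, ax = plt.subplots()
bars = ax.barh(labels, values, color=(0.4, 0.6, 0.8, 1.0), edgecolor='blue')
ax.set_xlabel("Popularity")
ax.set_title("Popularity of Programming Languages")
fig.tight_layout()
```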

Experiment 10
Aim: Write a Python program to create a bar plot of scores by group and gender. Use
multiple X values on the same chart for men and women.

Sample Data:
Means (men) = (22, 30, 35, 35, 26)
Means (women) = (25, 32, 30, 35, 29)

import numpy as np
import matplotlib.pyplot as plt

# data to plot (the sample data given in the aim)
n_groups = 5
men_means = (22, 30, 35, 35, 26)
women_means = (25, 32, 30, 35, 29)

# create plot
fig, ax = plt.subplots()
index = np.arange(n_groups)
bar_width = 0.35
opacity = 0.8

# men's bars at each group position, women's bars shifted right by one bar width
rects1 = ax.bar(index, men_means, bar_width,
                alpha=opacity, color='g', label='Men')
rects2 = ax.bar(index + bar_width, women_means, bar_width,
                alpha=opacity, color='r', label='Women')

ax.set_xlabel('Group')
ax.set_ylabel('Scores')
ax.set_title('Scores by group and gender')
# centre each tick label between the men's and women's bars
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(('G1', 'G2', 'G3', 'G4', 'G5'))
ax.legend()

plt.tight_layout()
plt.show()

Output:
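If the scores are kept in a pandas DataFrame, `DataFrame.plot.bar()` draws the grouped bars automatically: one cluster per index label and one bar per column, so no manual offset arithmetic is needed. A sketch with the same sample data; the `Agg` backend line is an assumption so it runs without a display (call `plt.show()` instead in an interactive session).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

scores = pd.DataFrame(
    {"Men": [22, 30, 35, 35, 26], "Women": [25, 32, 30, 35, 29]},
    index=["G1", "G2", "G3", "G4", "G5"],
)

fig, ax = plt.subplots()
# one group of bars per row label, one bar per column (Men, Women)
scores.plot.bar(ax=ax, rot=0, color=["g", "r"], alpha=0.8)
ax.set_xlabel("Group")
ax.set_ylabel("Scores")
ax.set_title("Scores by group and gender")
fig.tight_layout()
```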
