LAB MANUAL
TOPIC 1:
INTRODUCTION TO PYTHON
TUPLES:
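A tuple is an ordered sequence of Python objects separated by commas, usually
enclosed in parentheses. Tuples are immutable: elements cannot be added,
removed or reassigned after creation.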
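As an illustration of the definition above, the following minimal sketch
creates two tuples, unpacks one, and shows that item assignment fails. The
variable names and values are made up for this example.

```python
# Illustrative values only; the names are not taken from the lab data.
point = (3, 4)
record = "S001", "B.Tech", 3      # parentheses are optional

x, y = point                      # unpacking into separate names
print(point[0], len(record))      # indexing and length

# Tuples are immutable: assigning to an element raises TypeError.
try:
    point[0] = 10
except TypeError as exc:
    error_message = str(exc)
    print("Cannot modify a tuple:", error_message)
```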
CODE:
OUTPUT:
LISTS:
Lists behave like dynamic arrays and may contain integers, strings or any
other Python objects, separated by commas and enclosed in square brackets.
Unlike tuples, lists are mutable, so they can be altered after creation.
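A minimal sketch of list mutability, assuming a made-up list of marks. It
appends, replaces, removes and sorts in place.

```python
# Illustrative values only.
marks = [72, 85, 90]
mixed = [1, "two", (3, 4)]   # a list may hold objects of different types

marks.append(64)             # add an element at the end
marks[0] = 75                # replace an element in place
marks.remove(85)             # remove the first matching value
marks.sort()                 # sort in place, ascending
print(marks)                 # [64, 75, 90]
```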
CODE:
OUTPUT:
DICTIONARY:
A dictionary is a collection of data values stored in key: value format. Keys
must be unique and hashable; values may be of any data type. Since Python 3.7,
dictionaries preserve insertion order.
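The sketch below shows the key: value format in use. The record and its
field names are hypothetical.

```python
# Hypothetical student record; keys are strings, values have mixed types.
student = {"name": "Asha", "sem": 3, "cgpa": 8.4}

student["course"] = "B.Tech"     # add a new key
student["sem"] = 4               # update an existing key
missing = student.get("address", "not recorded")  # default for absent key

for key, value in student.items():
    print(key, "->", value)
```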
CODE:
OUTPUT:
SERIES:
A Series is a one-dimensional labelled array from the pandas library, capable
of storing values of any data type. Each value is associated with an index
label.
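A minimal sketch of a Series with an explicit index, assuming pandas is
installed. The values and labels are made up.

```python
import pandas as pd

# Illustrative values with a custom index.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])        # access by index label
print(s.iloc[0])     # access by integer position
doubled = s * 2      # arithmetic applies element-wise and keeps the index

# Mixed value types are allowed; the Series then has dtype object.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```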
CODE:
OUTPUT:
TOPIC 2:
INTRODUCTION TO THE PANDAS PACKAGE
OUTPUT:
CODE: SELECTING ROWS
OUTPUT:
CODE:
OUTPUT:
TOPIC 3:
INTRODUCTION TO NUMPY
1 DIMENSIONAL ARRAY:
A 1D array is an ordered collection of elements, each of which can be accessed
individually by specifying its position with a single index value.
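A short sketch of single-index access on a NumPy 1D array, with made-up
values.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])   # illustrative values

print(a.ndim, a.shape)   # 1 (5,)
print(a[0])              # first element: 10
print(a[-1])             # last element: 50
print(a[1:4])            # slice of positions 1..3: [20 30 40]
```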
CODE:
OUTPUT:
2 DIMENSIONAL ARRAY:
A 2D array is organised in the form of a matrix and can be represented as a
collection of rows and columns. Each element is addressed by a row index and a
column index.
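A minimal sketch of row and column indexing on a 2D array. The 2 x 3 shape is
chosen only for illustration.

```python
import numpy as np

# Two rows, three columns: [[0 1 2], [3 4 5]]
m = np.arange(6).reshape(2, 3)

print(m.ndim, m.shape)   # 2 (2, 3)
print(m[1, 2])           # row 1, column 2: 5
print(m[0])              # first row: [0 1 2]
print(m[:, 1])           # second column: [1 4]
```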
CODE:
OUTPUT:
3 DIMENSIONAL ARRAY:
A 3D array is a multidimensional array that can be represented as a collection
of 2D arrays. Each element is addressed by three index values.
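The sketch below builds a 3D array as a stack of two 2D arrays. The shape
(2, 3, 4) is an arbitrary choice for illustration.

```python
import numpy as np

# Two 2D blocks, each with 3 rows and 4 columns.
t = np.arange(24).reshape(2, 3, 4)

print(t.ndim, t.shape)   # 3 (2, 3, 4)
print(t[0].shape)        # each block is a 2D array: (3, 4)
print(t[1, 2, 3])        # block 1, row 2, column 3: 23
```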
CODE:
OUTPUT:
TOPIC 4:
DATA MERGING
Data merging is the process of combining two data sets into one by aligning
the rows from each on common key columns or index values.
Points to be covered:
1. Merging two data frames
2. Default inner join
3. Left outer join
4. Right outer join
5. Outer join
6. Concatenation
7. Hierarchical index
8. Removing the index with ignore_index=True
CODE:
import pandas as pd
import numpy as np

# Example use case: student records split across two files, e.g.
# file1(name, age, address, mob no) and
# file2(name, sap id, sem, course, university, mob no)
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})
print("df1:\n", df1, "\n\ndf2:\n", df2)

# Merging two data frames: with no 'on' argument, merge joins on the
# column names the two frames share ('key' here).
df_merge = pd.merge(df1, df2)
print(df_merge)

# Each join below runs on the original df1 and df2. The result is stored
# in its own variable so that df1 is not overwritten between joins.

# Default Inner Join - use intersection of keys
df_inner = pd.merge(df1, df2, on='key')
print(df_inner)

# Left Outer Join - use keys from left object
df_left = pd.merge(df1, df2, how='left', on='key')
print(df_left)

# Right Outer Join - use keys from right object
df_right = pd.merge(df1, df2, how='right', on='key')
print(df_right)

# Outer Join - use union of both object keys
df_outer = pd.merge(df1, df2, how='outer', on='key')
print(df_outer)

# Merging a column of the left frame with the index of the right frame
left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
                      'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])
print("left1:\n", left1, "\n\nright1:\n", right1, "\n\nAfter Merge:\n")
df_index_merge = pd.merge(left1, right1, left_on='key', right_index=True,
                          how='outer')
print(df_index_merge)

# Concatenation
# file1(sap_id, name, sem, cgpa) file2(sap_id, name, sem, cgpa)
s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])
# Calling concat with these objects in a list glues together the values
# and indexes.
s_rows = pd.concat([s1, s2, s3])
print(s_rows)
# With axis=1 each Series becomes a column; missing labels become NaN.
s_cols = pd.concat([s1, s2, s3], axis=1)
print(s_cols)

# Same operation on DataFrames
df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])
print("df1:\n", df1, "\n\ndf2:\n", df2, "\n\nAfter Concat:\n")
df_keys = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])
print(df_keys)

# Hierarchical Index: 'names' labels the two column levels
df_named = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
                     names=['upper', 'lower'])
print(df_named)

# To remove Index use ignore_index=True
df_flat = pd.concat([df1, df2], ignore_index=True)
print(df_flat)
OUTPUT:
TOPIC 5:
DATA CLEANING
OUTPUT:
CODE: REMOVING DUPLICATES
OUTPUT:
TOPIC 6:
DATA TRANSFORMATION
OUTPUT:
TOPIC 7:
INTRODUCTION TO MATPLOTLIB
HISTOGRAM:
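A histogram is an approximate representation of the distribution of numerical
data. The values are grouped into bins, and the height of each bar shows how
many values fall into that bin.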
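A minimal histogram sketch, assuming matplotlib and NumPy are installed. The
data is synthetic (random draws from a normal distribution) and the bin count
of 10 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data for illustration only; the seed makes it repeatable.
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=200)

fig, ax = plt.subplots()
# hist returns the count per bin and the bin edges (one more edge than bins).
counts, bin_edges, _ = ax.hist(data, bins=10, edgecolor="black")
ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.set_title("Histogram of synthetic data")
# plt.show()  # uncomment to display the figure in an interactive session
```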
CODE:
OUTPUT:
STACK PLOT:
A stack plot draws a stacked area plot in which the data series are plotted
vertically on top of each other rather than overlapping one another.
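A minimal stack plot sketch with three made-up series of hours per day. The
series names and values are assumptions for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative data: hours spent per activity over five days.
days = [1, 2, 3, 4, 5]
sleeping = [7, 8, 6, 7, 8]
working = [8, 7, 9, 8, 6]
playing = [2, 3, 2, 3, 4]

fig, ax = plt.subplots()
# Each series is drawn on top of the previous one.
layers = ax.stackplot(days, sleeping, working, playing,
                      labels=["sleeping", "working", "playing"])
ax.set_xlabel("day")
ax.set_ylabel("hours")
ax.legend(loc="upper left")
# plt.show()  # uncomment to display the figure in an interactive session
```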
CODE:
OUTPUT:
LINE PLOT:
A line plot is a graph that displays data as points connected by straight line
segments, typically to show how a value changes along the x axis.
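A minimal line plot sketch using made-up x and y values.

```python
import matplotlib.pyplot as plt

# Illustrative data: y is the square of x.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
# plot returns a list of Line2D objects; one line is drawn here.
(line,) = ax.plot(x, y, marker="o", label="y = x squared")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
# plt.show()  # uncomment to display the figure in an interactive session
```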
CODE:
OUTPUT:
PIE PLOT:
A pie plot is a circular statistical graph divided into slices to illustrate
numerical proportions.
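A minimal pie plot sketch. The category labels and sizes are made up; the
autopct format string prints each slice's share as a whole-number percentage.

```python
import matplotlib.pyplot as plt

# Illustrative categories and sizes.
labels = ["A", "B", "C", "D"]
sizes = [40, 30, 20, 10]

fig, ax = plt.subplots()
# With autopct set, pie returns the wedges, the label texts and the
# percentage texts.
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct="%1.0f%%")
ax.set_title("Share per category")
# plt.show()  # uncomment to display the figure in an interactive session
```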
CODE:
OUTPUT:
SCATTER PLOT:
A scatter plot describes the relationship between two variables by drawing one
dot per observation, positioned by its x and y values.
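A minimal scatter plot sketch. The two variables (hours studied and score)
and their values are invented for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative paired observations.
hours = [1, 2, 3, 4, 5, 6]
scores = [35, 45, 50, 62, 70, 78]

fig, ax = plt.subplots()
points = ax.scatter(hours, scores)   # one dot per (hours, score) pair
ax.set_xlabel("hours studied")
ax.set_ylabel("score")
# plt.show()  # uncomment to display the figure in an interactive session
```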
CODE:
OUTPUT:
TOPIC 8:
CUSTOMIZING PLOT WITH MATPLOTLIB
Points to be covered:
1. Use of subplots
2. Grid histograms
3. Graph overlays using box-and-whisker plots
4. Proper labelling of graphs: x and y axis labels, title and legend
USE OF SUBPLOTS
subplots() is a function in the pyplot module of the matplotlib library that
creates a figure and a set of subplots in a single call. It is used to draw
multiple axes in one figure.
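A minimal sketch of a figure with two subplots side by side, assuming
matplotlib and NumPy are installed. The sine and cosine curves are
placeholder data.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)   # placeholder data

# One row, two columns: axes is an array holding two Axes objects.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

axes[0].plot(x, np.sin(x))
axes[0].set_title("sin(x)")

axes[1].plot(x, np.cos(x))
axes[1].set_title("cos(x)")

fig.tight_layout()   # avoid overlapping titles and tick labels
# plt.show()  # uncomment to display the figure in an interactive session
```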
CODE:
OUTPUT:
GRID HISTOGRAM
CODE:
OUTPUT:
BOX PLOT
A box plot is a way of displaying the distribution of data based on a
five-number summary: minimum, first quartile, median, third quartile and
maximum.
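The five-number summary can be computed directly with NumPy, as in this
minimal sketch. The data values are made up, and np.percentile is used with
its default linear interpolation.

```python
import numpy as np

# Illustrative data, already sorted for readability.
data = np.array([2, 4, 4, 5, 7, 9, 10, 12, 15])

# The 0th, 25th, 50th, 75th and 100th percentiles give the summary.
minimum, q1, median, q3, maximum = np.percentile(data, [0, 25, 50, 75, 100])
print(minimum, q1, median, q3, maximum)   # 2.0 4.0 7.0 10.0 15.0
```

Note that matplotlib's boxplot by default draws whiskers only as far as 1.5
times the interquartile range from the box and marks points beyond that as
outliers, so the whisker ends match the minimum and maximum only when there
are no such points.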
CODE:
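import matplotlib.pyplot as plt

# total_revenue is assumed to be defined earlier in the lab as a list of two
# revenue sequences, one per country, in the order of the tick labels below.
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111)
# notch takes a boolean, not the string 'True'; vert=False draws the boxes
# horizontally.
ax.boxplot(total_revenue, patch_artist=True, notch=True, vert=False)
ax.set_xlabel("revenue in exponential form")
ax.set_yticklabels(['mexico', 'australia'])
ax.set_ylabel('country')
plt.show()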
OUTPUT:
CODE:
OUTPUT:
TOPIC 9:
USING SEABORN FOR DATA VISUALIZATION
VISUALIZATION 2
CODE:
OUTPUT:
OUTPUT:
VISUALIZATION 4: CATEGORICAL DATA
CODE:
OUTPUT:
TOPIC 10:
COURSERA