AP19110010030 Assignment-1 Lab

23/08/2021 AP19110010030_Assignmnet-1 - Jupyter Notebook
Kilaru Sravan
AP19110010030
CSE-A
NumPy
NumPy stands for Numerical Python and it is the fundamental package for scientific computing in Python.
Why Use NumPy?

Python lists can serve the purpose of arrays, but they are slow to process for large amounts of numeric data.
Arrays are used very frequently in data science, where speed and memory use are very important.
NumPy arrays are stored in one contiguous block of memory, unlike lists, so programs can access and
manipulate them very efficiently.
NumPy provides efficient storage.
It also provides a better and faster way of handling data for processing.
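As a small illustration of the points above, the sketch below doubles the same values once with a plain Python list and once with a NumPy array. The variable names are made up for this example. It only shows that the array version works on all elements in one vectorized operation and that the array is stored contiguously; it does not measure speed.

```python
import numpy as np

# The same values as a Python list and as a NumPy array
values_list = [1, 12, 18, 111]
values_arr = np.array(values_list)

# A list needs an explicit loop (here a list comprehension)
doubled_list = [v * 2 for v in values_list]

# An array is processed in one vectorized operation
doubled_arr = values_arr * 2

print(doubled_list)
print(doubled_arr)

# True when the elements sit in one contiguous block of memory
print(values_arr.flags['C_CONTIGUOUS'])
```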
In [1]:
import numpy as np
#we are importing numpy as np
Creating NumPy Arrays

In [6]:
#Creation of 1D array
num1=np.array([1,12,18,111])
print('1D Array:',num1)
#Creation of 2D array
num2=np.array([[1,22,23,45],[31,14,995,91],[1,4,5,33]])
print('2D Array:',num2)
1D Array: [ 1 12 18 111]
2D Array: [[ 1 22 23 45]
[ 31 14 995 91]
[ 1 4 5 33]]
NumPy Array Shape
In [7]:
#SHAPE: The shape of an array is the number of elements in each dimension.
print('shape of 1d',num1.shape)
print('shape of 2d',num2.shape)
shape of 1d (4,)
shape of 2d (3, 4)
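The shape tuple can also be unpacked, and it is related to two other attributes, ndim and size. The following is a minimal sketch using the same 2D array as above; the names rows and cols are only chosen for this example.

```python
import numpy as np

num2 = np.array([[1, 22, 23, 45], [31, 14, 995, 91], [1, 4, 5, 33]])

# shape is a tuple with one entry per dimension
rows, cols = num2.shape

print('dimensions:', num2.ndim)   # length of the shape tuple
print('rows:', rows, 'columns:', cols)
print('elements:', num2.size)     # product of the shape entries
```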
In [8]:
arr = np.array([[1,12,18,111], [31,14,995,91]])

for x in arr:
    print(x) #each x is one row of the 2D array
[ 1 12 18 111]
[ 31 14 995 91]
Data Type(DTYPE)
In [10]:
#DTYPE: we can identify data type of the array

num1 = np.array([11,2,431,1])
print('data type of num 1',num1.dtype)
string = np.array(["ab","sravan","teju"])
print('data type of string',string.dtype)
float_arr = np.array([1.5,4.6]) #the values are floats, so the data type is float64
print('data type float',float_arr.dtype)
data type of num 1 int32
data type of string <U6
data type float float64
In [11]:
#Setting the data type while creating an array (astype() converts an array that already exists)

num1=np.array([1,21,41,1],dtype='float')
#the integer values are stored as floats
print('data type of num 1',num1.dtype)
data type of num 1 float64
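The cell above sets the data type while the array is being created. To convert an array that already exists, astype() can be used. The sketch below assumes a small integer array named int_arr (a name chosen for this example); astype() returns a new array and leaves the original unchanged.

```python
import numpy as np

int_arr = np.array([1, 21, 41, 1])

# astype returns a converted copy; int_arr itself keeps its integer type
float_arr = int_arr.astype('float64')

print('before:', int_arr, int_arr.dtype)
print('after :', float_arr, float_arr.dtype)
```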
Reshape
In [13]:
#Reshaping the given array

num=np.array([1,2,3,4,5,6,7,8])
re_shape=num.reshape(2,4)
#it will reshape the array into 2 rows and 4 columns
print('before reshaping array:',num,'\n','after reshaping :',re_shape)
num1=np.array([1,2,3,4])
re_shape1=num1.reshape(2,2)
before reshaping array: [1 2 3 4 5 6 7 8]
after reshaping : [[1 2 3 4]
[5 6 7 8]]
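Reshaping only works when the new shape holds exactly the same number of elements. The sketch below, under that assumption, lets NumPy work out one dimension with -1 and shows what happens when the element counts do not match.

```python
import numpy as np

num = np.arange(1, 9)          # 8 elements: 1 to 8

# -1 asks NumPy to work out that dimension from the element count
auto = num.reshape(2, -1)
print(auto.shape)

# 8 elements cannot fill a 3 x 3 array, so this raises ValueError
try:
    num.reshape(3, 3)
except ValueError as err:
    print('cannot reshape:', err)
```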
Zeros
In [14]:
#It returns an array filled with zeros, with the size or shape we ask for
arr = np.zeros(6)
print('single array',arr)
single array [0. 0. 0. 0. 0. 0.]
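np.zeros also accepts a shape tuple and a data type. This is a short sketch; the names grid and int_zeros are only used for illustration.

```python
import numpy as np

# A tuple gives a multi-dimensional array of zeros (2 rows, 3 columns)
grid = np.zeros((2, 3))
print(grid)

# The default type is float64; dtype changes it
int_zeros = np.zeros(4, dtype=int)
print(int_zeros)
```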
Random
In [16]:
# Random: randint gives random integers within a given range

rand_arr = np.random.randint(10,200,6)
print('random number from 10 to 200',rand_arr)
#it will give 6 random integers from 10 up to 199 (the upper bound 200 is excluded)
random number from 10 to 200 [197 95 63 199 22 192]
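Random output changes on every run, which makes results hard to reproduce. One possible way around this, sketched below, is a seeded generator from np.random.default_rng (available in recent NumPy versions). The seed value 42 is an arbitrary choice for this example.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# integers() excludes the upper bound, like randint
first = rng_a.integers(10, 200, size=6)
second = rng_b.integers(10, 200, size=6)

print(first)
print(second)
```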
Range
In [17]:
# Range: It will give us evenly spaced values in a particular range

x = np.arange(1,20)
print(x)
#it will give an array from 1 up to 19 (the stop value 20 is excluded)
y = np.arange(1,20,3)
print(y)
#it will give an array from 1 up to 19 with a step of 3
[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
[ 1 4 7 10 13 16 19]
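np.arange takes a step and leaves out the stop value. When the number of points is known instead of the step, np.linspace is an alternative that includes both ends. A brief sketch comparing the two:

```python
import numpy as np

# arange: start, stop (excluded), step
stepped = np.arange(1, 20, 3)

# linspace: start, stop (included), number of points; the result is float
spaced = np.linspace(1, 19, 7)

print(stepped)
print(spaced)
```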
Indexing
In [19]:
# Indexing: It is used to get an element at a given position (positions start at 0)

num = np.array([1,2,3,4,5])
print('my array : ',num)
#it will print all the elements
print('1st position',num[0])
print('2nd position',num[1])
#it will give elements based on index
my array : [1 2 3 4 5]
1st position 1
2nd position 2
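Indexing also works from the end of the array with negative positions, and on 2D arrays with one index per dimension. A small sketch reusing the arrays from earlier cells:

```python
import numpy as np

num = np.array([1, 2, 3, 4, 5])
num2 = np.array([[1, 22, 23, 45], [31, 14, 995, 91], [1, 4, 5, 33]])

# Negative positions count from the end
last = num[-1]

# For a 2D array the order is [row, column]
element = num2[1, 2]

print('last element:', last)
print('row 1, column 2:', element)
```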
Sorting
In [20]:
# Sorting: np.sort returns a sorted copy of the array (ascending order by default)

x = np.array([1,4,55,63,223,654,2,6,18,12])
print('after sort',np.sort(x))
#it will print elements after sorting
after sort [ 1 2 4 6 12 18 55 63 223 654]
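np.sort itself only sorts in ascending order. Two common follow-ups, sketched here with the same values, are reversing the result for descending order and using np.argsort to get the positions that would sort the array.

```python
import numpy as np

x = np.array([1, 4, 55, 63, 223, 654, 2, 6, 18, 12])

# Descending order: sort ascending, then reverse with a slice
descending = np.sort(x)[::-1]

# argsort returns the indices that put x in ascending order
order = np.argsort(x)

print(descending)
print(order)
print(x[order])   # same values as np.sort(x)
```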
Slicing
In [57]:
# Slicing: It is used to get a range of elements from the array

print(num[0:2]) #it will print elements from index 0 up to, but not including, index 2
print(num[2:]) #it will print elements from index 2 to the end
print(num[::-1]) #it will print array in reverse order
[1 2]
[3 4 5]
[5 4 3 2 1]
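A slice has the general form start:stop:step, and a 2D array takes one slice per dimension. The sketch below reuses the arrays from earlier cells to show both.

```python
import numpy as np

num = np.array([1, 2, 3, 4, 5])
num2 = np.array([[1, 22, 23, 45], [31, 14, 995, 91], [1, 4, 5, 33]])

# Every second element from index 1 up to (not including) index 5
every_second = num[1:5:2]

# First two rows, columns 1 and 2
block = num2[:2, 1:3]

print(every_second)
print(block)
```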
PANDAS
Pandas is an open-source data analysis library written in Python. It is not built into Python and has to be installed separately.
It provides rich and highly robust data operations.
We will be using Pandas heavily for data manipulation, visualisation, building machine learning models, etc.
Pandas implements a number of powerful data operations familiar to users of both database frameworks
and spreadsheet programs.
There are two main data structures in Pandas: Series and DataFrames.
The default way to store data is a DataFrame, so manipulating DataFrames quickly is probably the most
important skill for data analysis.
A Series is similar to a 1-D NumPy array and usually contains values of the same type (numeric, character,
datetime etc.).
A DataFrame is simply a table where each column is a pandas Series.
In [21]:
#importing pandas as pd
import pandas as pd
Series
In [22]:
# Series: A Pandas Series is like a column in a table.

a = [1,2,3,4,5]
b= pd.Series(a) #converting the list into a Series
print(b)
print(type(b)) #it will give the type of b, which is Series
0 1
1 2
2 3
3 4
4 5
dtype: int64
<class 'pandas.core.series.Series'>
Dataframes
In [24]:
#creating a pandas DataFrame from a dictionary of columns
df=pd.DataFrame({
    'name':['sravan','kilaru','teju'],
    'marks':[60,10,100]
})
df
#a bare name on the last line of a cell displays the DataFrame
Out[24]:
name marks
0 sravan 60
1 kilaru 10
2 teju 100
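Each column of a DataFrame is a Series, which can be checked directly. The sketch below rebuilds the same small DataFrame and pulls out the marks column; the variable name marks is chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['sravan', 'kilaru', 'teju'],
    'marks': [60, 10, 100],
})

# Selecting one column returns a Series
marks = df['marks']

print(type(marks))
print(df.shape)        # (rows, columns)
print(marks.max())     # Series methods work on the column
```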
Importing data and converting into different formats
In [25]:
# reading data which is in csv

df=pd.read_csv('food.csv')
df #prints the data in csv file
Out[25]:
     name            ingredients                                        diet            prep_time  cook_time  flavor_profile  course       state            region
0    Balu shahi      Maida flour, yogurt, oil, sugar                    vegetarian      45         25         sweet           dessert      West Bengal      East
1    Boondi          Gram flour, ghee, sugar                            vegetarian      80         30         sweet           dessert      Rajasthan        West
2    Gajar ka halwa  Carrots, milk, sugar, ghee, cashews, raisins       vegetarian      15         60         sweet           dessert      Punjab           North
3    Ghevar          Flour, ghee, kewra, milk, clarified butter, su...  vegetarian      15         30         sweet           dessert      Rajasthan        West
4    Gulab jamun     Milk powder, plain flour, baking powder, ghee,...  vegetarian      15         40         sweet           dessert      West Bengal      East
...  ...             ...                                                ...             ...        ...        ...             ...          ...              ...
250  Til Pitha       Glutinous rice, black sesame seeds, gur            vegetarian      5          30         sweet           dessert      Assam            North East
251  Bebinca         Coconut milk, egg yolks, clarified butter, all...  vegetarian      20         60         sweet           dessert      Goa              West
252  Shufta          Cottage cheese, dry dates, dried rose petals, ...  vegetarian      -1         -1         sweet           dessert      Jammu & Kashmir  North
253  Mawa Bati       Milk powder, dry fruits, arrowroot powder, all...  vegetarian      20         45         sweet           dessert      Madhya Pradesh   Central
254  Pinaca          Brown rice, fennel seeds, grated coconut, blac...  vegetarian      -1         -1         sweet           dessert      Goa              West
255 rows × 9 columns
Pandas - Analyzing DataFrames
Head: Gives first few rows of the data

In [27]:
df.head(6) #it will give us first 6 records

Out[27]:
     name            ingredients                                        diet            prep_time  cook_time  flavor_profile  course       state            region
0    Balu shahi      Maida flour, yogurt, oil, sugar                    vegetarian      45         25         sweet           dessert      West Bengal      East
1    Boondi          Gram flour, ghee, sugar                            vegetarian      80         30         sweet           dessert      Rajasthan        West
2    Gajar ka halwa  Carrots, milk, sugar, ghee, cashews, raisins       vegetarian      15         60         sweet           dessert      Punjab           North
3    Ghevar          Flour, ghee, kewra, milk, clarified butter, su...  vegetarian      15         30         sweet           dessert      Rajasthan        West
4    Gulab jamun     Milk powder, plain flour, baking powder, ghee,...  vegetarian      15         40         sweet           dessert      West Bengal      East
5    Imarti          Sugar syrup, lentil flour                          vegetarian      10         50         sweet           dessert      West Bengal      Ea...
Tail: Gives the last few rows of the data
In [28]:
df.tail(6) #it will give us last 6 records

Out[28]:
     name            ingredients                                        diet            prep_time  cook_time  flavor_profile  course       state            region
249  Shukto          Green beans, bitter gourd, ridge gourd, banana...  vegetarian      10         20         spicy           main course  West Bengal      E...
250  Til Pitha       Glutinous rice, black sesame seeds, gur            vegetarian      5          30         sweet           dessert      Assam            North East
251  Bebinca         Coconut milk, egg yolks, clarified butter, all...  vegetarian      20         60         sweet           dessert      Goa              West
252  Shufta          Cottage cheese, dry dates, dried rose petals, ...  vegetarian      -1         -1         sweet           dessert      Jammu & Kashmir  North
253  Mawa Bati       Milk powder, dry fruits, arrowroot powder, all...  vegetarian      20         45         sweet           dessert      Madhya Pradesh   Central
254  Pinaca          Brown rice, fennel seeds, grated coconut, blac...  vegetarian      -1         -1         sweet           dessert      Goa              West
Sample: It gives some random records in the data
In [29]:
df.sample(5) #it will give us random 5 records

Out[29]:
     name            ingredients                                        diet            prep_time  cook_time  flavor_profile  course       state
66   Chak Hao Kheer  Rice, milk, sugar, cardamom                        vegetarian      240        45         sweet           dessert      Manipur
240  Konir Dom       Aloo, tomatoes, mustard oil, bay leaf, cinnamo...  non vegetarian  -1         -1         spicy           main course  Assam
244  Pakhala         Curd, cooked rice, curry leaves, dry chilli        vegetarian      -1         -1         -1              main course  Odisha
206  Sukhdi          Whole wheat flour, gur, clarified butter           vegetarian      10         20         sweet           dessert      Maharashtra
200  Pav Bhaji       Pav bhaji masala, gobi, potatoes, green peas, ...  vegetarian      20         40         spicy           main course  Maharashtra
Describe: It gives us summary statistics for the numeric columns of the dataframe
In [30]:
df.describe()
Out[30]:
prep_time cook_time
count 255.000000 255.000000
mean 31.105882 34.529412
std 72.554409 48.265650
min -1.000000 -1.000000
25% 10.000000 20.000000
50% 10.000000 30.000000
75% 20.000000 40.000000
max 500.000000 720.000000
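The minimum of -1 in both columns suggests that -1 is used as a placeholder for an unknown time. That reading is an assumption, not something the dataset documents here. Under that assumption, the sketch below uses the last five rows shown earlier to compare the statistics before and after replacing -1 with NaN, which describe() skips.

```python
import numpy as np
import pandas as pd

# The last five rows of the dataset as displayed above
sample = pd.DataFrame({
    'name': ['Til Pitha', 'Bebinca', 'Shufta', 'Mawa Bati', 'Pinaca'],
    'prep_time': [5, 20, -1, 20, -1],
    'cook_time': [30, 60, -1, 45, -1],
})

times = sample[['prep_time', 'cook_time']]

# Assumption: -1 means "unknown", so treat it as a missing value
cleaned = times.replace(-1, np.nan)

print(times.describe())
print(cleaned.describe())   # count drops, min and mean are no longer pulled down by -1
```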
DType: It gives the data type of every column

In [31]:
df.dtypes
Out[31]:
name object
ingredients object
diet object
prep_time int64
cook_time int64
flavor_profile object
course object
state object
region object
dtype: object
Columns: Gives us all the column names

In [32]:
print(df.columns) #it will show all the columns in the data set
Index(['name', 'ingredients', 'diet', 'prep_time', 'cook_time',
'flavor_profile', 'course', 'state', 'region'],
dtype='object')
Index: It gives the row index (the row labels) of the dataframe
In [33]:
print(df.index)
RangeIndex(start=0, stop=255, step=1)
df[column_name]: It selects only the column with the given name
In [34]:
df['state']
Out[34]:
0 West Bengal
1 Rajasthan
2 Punjab
3 Rajasthan
4 West Bengal
...
250 Assam
251 Goa
252 Jammu & Kashmir
253 Madhya Pradesh
254 Goa
Name: state, Length: 255, dtype: object
sort_values: It helps us to sort the values
In [35]:
df.sort_values(['state'],ascending=False)
#with ascending=False the rows are sorted by the 'state' values in descending order (Z to A)
Out[35]:
     name            ingredients                                        diet            prep_time  cook_time  flavor_profile  course       state
0    Balu shahi      Maida flour, yogurt, oil, sugar                    vegetarian      45         25         sweet           dessert      West Bengal
224  Luchi           Maida, vegetable oil                               vegetarian      20         30         -1              main course  West Bengal
79   Chicken razala  Chicken, dahi, sesame seeds, garam masala powd...  non vegetarian  10         35         spicy           main course  West Bengal
84   Daal puri       Moong dal, garam masala powder, garlic, green ...  vegetarian      30         30         spicy           main course  West Bengal
36   Adhirasam       Rice flour, jaggery, ghee, vegetable oil, elachi   vegetarian      10         50         sweet           dessert      West Bengal
...  ...             ...                                                ...             ...        ...        ...             ...          ...
109  Pani puri       Kala chana, mashed potato, boondi, sev, lemon      vegetarian      15         2          spicy           snack        -1
162  Vada            Urad dal, ginger, curry leaves, green chilies,...  vegetarian      15         20         spicy           snack        -1
231  Brown Rice      Brown rice, soy sauce, olive oil                   vegetarian      15         25         -1              main course  -1
12   Nankhatai       Refined flour, besan, ghee, powdered sugar, yo...  vegetarian      20         30         sweet           dessert      -1
156  Sambar          Pigeon peas, eggplant, drumsticks, sambar powd...  vegetarian      20         45         spicy           main course  -1
In [36]:
df.sort_values(['name'],ascending=True)
#with ascending=True the rows are sorted by the 'name' values from smallest to largest (A to Z)
Out[36]:
     name                ingredients                                        diet            prep_time  cook_time  flavor_profile  course       state
36   Adhirasam           Rice flour, jaggery, ghee, vegetable oil, elachi   vegetarian      10         50         sweet           dessert      West Bengal
68   Aloo gobi           Cauliflower, potato, garam masala, turmeric, c...  vegetarian      10         20         spicy           main course  Pu...
70   Aloo matar          Potato, peas, chillies, ginger, garam masala, ...  vegetarian      5          40         spicy           main course  Pu...
71   Aloo methi          Potato, fenugreek leaves, chillies, salt, oil      vegetarian      10         40         bitter          main course  Pu...
72   Aloo shimla mirch   Potato, shimla mirch, garam masala, amchur pow...  vegetarian      10         40         spicy           main course  Pu...
...  ...                 ...                                                ...             ...        ...        ...             ...          ...
162  Vada                Urad dal, ginger, curry leaves, green chilies,...  vegetarian      15         20         spicy           snack        -1
210  Veg Kolhapuri       Gobi, potato, beans, khus khus, coconut            vegetarian      20         30         spicy           main course  Maharas...
121  Vegetable jalfrezi  Baby corn, french beans, garam masala, ginger,...  vegetarian      10         30         spicy           main course  Pu...
211  Vindaloo            Chicken, coconut oil, wine vinegar, ginger, gr...  non vegetarian  10         40         spicy           main course  Goa
166  Zunka               Gram flour, mustard, garlic, turmeric, red chilli  vegetarian      10         25         spicy           main course  Maharas...
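sort_values also accepts several columns, each with its own direction. The sketch below uses the first five rows shown earlier (only three of the columns) and sorts by state from A to Z and, within each state, by prep_time from largest to smallest.

```python
import pandas as pd

# First five rows of the dataset as displayed above (three columns only)
sample = pd.DataFrame({
    'name': ['Balu shahi', 'Boondi', 'Gajar ka halwa', 'Ghevar', 'Gulab jamun'],
    'prep_time': [45, 80, 15, 15, 15],
    'state': ['West Bengal', 'Rajasthan', 'Punjab', 'Rajasthan', 'West Bengal'],
})

# One ascending flag per sort column
sorted_sample = sample.sort_values(['state', 'prep_time'], ascending=[True, False])

print(sorted_sample)
```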
sort_index: It sorts by the labels along the given axis (axis=0 sorts by row labels, axis=1 sorts by column names)
In [37]:
df.sort_index(axis=1,ascending=False)
Out[37]:
     state            region      prep_time  name            ingredients                                        flavor_profile  diet        course   cook_time
0    West Bengal      East        45         Balu shahi      Maida flour, yogurt, oil, sugar                    sweet           vegetarian  dessert  25
1    Rajasthan        West        80         Boondi          Gram flour, ghee, sugar                            sweet           vegetarian  dessert  30
2    Punjab           North       15         Gajar ka halwa  Carrots, milk, sugar, ghee, cashews, raisins       sweet           vegetarian  dessert  60
3    Rajasthan        West        15         Ghevar          Flour, ghee, kewra, milk, clarified butter, su...  sweet           vegetarian  dessert  30
4    West Bengal      East        15         Gulab jamun     Milk powder, plain flour, baking powder, ghee,...  sweet           vegetarian  dessert  40
...  ...              ...         ...        ...             ...                                                ...             ...         ...      ...
250  Assam            North East  5          Til Pitha       Glutinous rice, black sesame seeds, gur            sweet           vegetarian  dessert  30
251  Goa              West        20         Bebinca         Coconut milk, egg yolks, clarified butter, all...  sweet           vegetarian  dessert  60
252  Jammu & Kashmir  North       -1         Shufta          Cottage cheese, dry dates, dried rose petals, ...  sweet           vegetarian  dessert  -1
253  Madhya Pradesh   Central     20         Mawa Bati       Milk powder, dry fruits, arrowroot powder, all...  sweet           vegetarian  dessert  45
254  Goa              West        -1         Pinaca          Brown rice, fennel seeds, grated coconut, blac...  sweet           vegetarian  dessert  -1
In [38]:
df.sort_index(axis=0,ascending=False)
Out[38]:
     name            ingredients                                        diet            prep_time  cook_time  flavor_profile  course       state            region
254  Pinaca          Brown rice, fennel seeds, grated coconut, blac...  vegetarian      -1         -1         sweet           dessert      Goa              West
253  Mawa Bati       Milk powder, dry fruits, arrowroot powder, all...  vegetarian      20         45         sweet           dessert      Madhya Pradesh   Central
252  Shufta          Cottage cheese, dry dates, dried rose petals, ...  vegetarian      -1         -1         sweet           dessert      Jammu & Kashmir  North
251  Bebinca         Coconut milk, egg yolks, clarified butter, all...  vegetarian      20         60         sweet           dessert      Goa              West
250  Til Pitha       Glutinous rice, black sesame seeds, gur            vegetarian      5          30         sweet           dessert      Assam            North East
...  ...             ...                                                ...             ...        ...        ...             ...          ...              ...
4    Gulab jamun     Milk powder, plain flour, baking powder, ghee,...  vegetarian      15         40         sweet           dessert      West Bengal      East
3    Ghevar          Flour, ghee, kewra, milk, clarified butter, su...  vegetarian      15         30         sweet           dessert      Rajasthan        West
2    Gajar ka halwa  Carrots, milk, sugar, ghee, cashews, raisins       vegetarian      15         60         sweet           dessert      Punjab           North
1    Boondi          Gram flour, ghee, sugar                            vegetarian      80         30         sweet           dessert      Rajasthan        West
0    Balu shahi      Maida flour, yogurt, oil, sugar                    vegetarian      45         25         sweet           dessert      West Bengal      East
Indexing: Passing a list of column names gives us only the columns which are specified

In [40]:
df[['state','name']]
Out[40]:
     state            name
0    West Bengal      Balu shahi
1    Rajasthan        Boondi
2    Punjab           Gajar ka halwa
3    Rajasthan        Ghevar
4    West Bengal      Gulab jamun
...  ...              ...
250  Assam            Til Pitha
251  Goa              Bebinca
252  Jammu & Kashmir  Shufta
253  Madhya Pradesh   Mawa Bati
254  Goa              Pinaca
Filtering: We can apply a condition to check whether each row satisfies it or not
The comparison returns boolean values, one for each row
In [41]:
df['state']=='Goa' #it returns a boolean Series: True for rows where the state is Goa, False otherwise
Out[41]:
0 False
1 False
2 False
3 False
4 False
...
250 False
251 True
252 False
253 False
254 True
Name: state, Length: 255, dtype: bool
In [43]:
df[df['state']=='Goa'] # It keeps only the rows for which the condition is True

Out[43]:
     name            ingredients                                        diet            prep_time  cook_time  flavor_profile  course       state            region
211  Vindaloo        Chicken, coconut oil, wine vinegar, ginger, gr...  non vegetarian  10         40         spicy           main course  Goa              Wes...
251  Bebinca         Coconut milk, egg yolks, clarified butter, all...  vegetarian      20         60         sweet           dessert      Goa              West
254  Pinaca          Brown rice, fennel seeds, grated coconut, blac...  vegetarian      -1         -1         sweet           dessert      Goa              West
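Several conditions can be combined into one filter. The sketch below uses four rows shown in the outputs above (three columns only) and keeps the vegetarian dishes from Goa. Each condition needs its own parentheses, and & is used in place of the Python keyword and.

```python
import pandas as pd

# Four rows taken from the outputs above (three columns only)
sample = pd.DataFrame({
    'name': ['Vindaloo', 'Til Pitha', 'Bebinca', 'Pinaca'],
    'diet': ['non vegetarian', 'vegetarian', 'vegetarian', 'vegetarian'],
    'state': ['Goa', 'Assam', 'Goa', 'Goa'],
})

# & means "and", | means "or"; each condition is wrapped in parentheses
mask = (sample['state'] == 'Goa') & (sample['diet'] == 'vegetarian')
goa_vegetarian = sample[mask]

print(goa_vegetarian)
```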
Checking for missing Values

In [44]:
#we use isnull to check whether the dataset has missing values or not
#if it returns True there are missing values, otherwise there are none
df.isnull().values.any()
Out[44]:
True
In [45]:
#To check missing values in a specific column we select it by its column name
df['state'].isnull().values.any()
Out[45]:
False
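The checks above only say whether anything is missing. To see how many values are missing in each column, isnull() can be followed by sum(). The small DataFrame below is made up for this example, with one region left empty on purpose; it is not taken from food.csv.

```python
import pandas as pd

# Hypothetical data: the region of the second row is deliberately missing
sample = pd.DataFrame({
    'name': ['Balu shahi', 'Boondi', 'Gajar ka halwa'],
    'state': ['West Bengal', 'Rajasthan', 'Punjab'],
    'region': ['East', None, 'North'],
})

# True counts as 1, so sum() gives the number of missing values per column
missing_per_column = sample.isnull().sum()

print(missing_per_column)
print(sample.isnull().values.any())
```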
MATPLOTLIB
Matplotlib is a widely used visualization library in Python for 2D plots of arrays.
Matplotlib is a multi-platform data visualization library built on NumPy arrays and designed to work with the
broader SciPy stack. It was introduced by John Hunter in the year 2002.
One of the greatest benefits of visualization is that it gives us visual access to huge amounts of data in
an easily digestible form.
Matplotlib supports several kinds of plots, such as line, bar, scatter and histogram plots.
In [46]:
#importing matplotlib
import matplotlib.pyplot as plt
In [48]:
import matplotlib.pyplot as plt

import numpy as np
x = np.array([0,1,2,3,4,5,6]) #x values
y = np.array([0,2,4,5,6,7,8]) #y values
plt.plot(x,y) #it will plot y against x as a line
plt.show() #it will show the graph
In [49]:
# Plotting markers only, without a connecting line

plt.plot(x, y, 'o')
plt.show()
In [50]:
plt.plot(x, y, '*')
plt.show()
Labels and Title

In [51]:
plt.plot(x,y,'o')
plt.xlabel("Day")
plt.ylabel("Items sold")
plt.show()
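The cell above sets the axis labels but no title. A possible version with a title is sketched below using the axes-based interface. The title text is invented for this example, and the Agg backend line is only there so the sketch also runs without a display; in a notebook it can be dropped and plt.show() used as before.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6])
y = np.array([0, 2, 4, 5, 6, 7, 8])

fig, ax = plt.subplots()
ax.plot(x, y, 'o')                   # markers only
ax.set_xlabel("Day")
ax.set_ylabel("Items sold")
ax.set_title("Items sold per day")   # hypothetical title text
```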
Scatter Plot
In [52]:
x = np.array([1,2,3,4,5])
y = np.array([10,3,40,50,60])
plt.scatter(x, y) #it will give us a scatter plot
plt.show()
Bar Graph
In [53]:
x = np.array(["Sravan", "Kilaru", "ababab", "teju"])

y = np.array([18,12,45,32])
plt.bar(x,y)
plt.show()
Pie Chart
In [54]:
y = np.array([11,33,56,43,18,34,12])
mylabels = ['a','b','c','d','e','f','g']
plt.pie(y, labels = mylabels, startangle = 90)
plt.show()
Box Plot
In [55]:
x = np.array([11,33,56,43,18,34,12])
plt.boxplot(x)
plt.show()
Histogram
In [56]:
x = np.random.randn(100)
plt.hist(x, histtype = "step")
plt.show()
