
University Institute of Engineering

Department of Computer Science & Engineering

 Type of cancer dataset

viii. print("type-->",type(cancerDataSet))
print(type(cancerDataSet))

Output viii:
type--> <class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
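The type check above can be tried on a small frame as well. This is a minimal sketch, assuming pandas is installed; `sample` is a hypothetical three-row stand-in for cancerDataSet and its values are made up.

```python
import pandas as pd

# Hypothetical stand-in for cancerDataSet (values are made up).
sample = pd.DataFrame({"Class": [0, 0, 1], "age": [5, 3, 4]})

print("type-->", type(sample))

# isinstance is the usual way to test the type inside code.
is_frame = isinstance(sample, pd.DataFrame)

# Selecting a single column by name gives a Series, not a DataFrame.
column_type = type(sample["age"])
print(is_frame, column_type)
```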

 Get the dimension of the dataset


ix. print("cancerDataSet.shape",cancerDataSet)
print(cancerDataSet.shape)
print("Rows     -->",cancerDataSet.shape[0])   ##axis 0 --row
print("columns  -->",cancerDataSet.shape[1])    ##column

Output ix:
cancerDataSet.shape      Class  age  menopause  tumor-size  inv-nodes  node-caps  deg-malig  \
0        0    5          1           1          1          2          1
1        0    5          4           4          5          7         10
2        0    3          1           1          1          2          2
3        0    6          8           8          1          3          4
4        0    4          1           1          3          2          1
..     ...  ...        ...         ...        ...        ...        ...
678      0    3          1           1          1          3          2
679      0    2          1           1          1          2          1
680      1    5         10          10          3          7          3
681      1    4          8           6          4          3          4
682      1    4          8           8          5          4          5

     breast  breast-quad  irradiat
0         3            1         1
1         3            2         1
2         3            1         1
3         3            7         1
4         3            1         1
..      ...          ...       ...
678       1            1         1
679       1            1         1
680       8           10         2
681      10            6         1
682      10            4         1

[683 rows x 10 columns]


(683, 10)
Rows --> 683
columns --> 10
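In step ix the first print passes cancerDataSet itself as the second argument, which is why the whole frame appears before the (683, 10) tuple. A minimal sketch of reading the dimensions, assuming pandas is installed; `sample` is a hypothetical stand-in with made-up values.

```python
import pandas as pd

# Hypothetical stand-in for cancerDataSet: 4 rows, 3 columns (values are made up).
sample = pd.DataFrame({
    "Class": [0, 0, 1, 1],
    "age": [5, 3, 4, 4],
    "breast": [3, 3, 10, 10],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = sample.shape
print("Rows     -->", n_rows)   # axis 0
print("columns  -->", n_cols)   # axis 1

# len() gives the row count; size gives rows * columns.
print(len(sample), sample.size)
```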
 Accessing data from dataset - Part 1 (using loc - Column Name) syntax-->loc[ROW,COL_Names_in_List]

x:
cancerDataSet.loc[:10,['Class', 'age']]

cancerDataSet.loc[10:100,['Class', 'age']]

cancerDataSet.loc[6:78,['Class', 'age']]

Output x:
     Class  age
6        0    1
7        0    2
8        0    2
9        0    4
10       0    1
...    ...  ...
74       0    1
75       0    5
76       0    3
77       0    2
78       0    2

73 rows × 2 columns
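With loc the row slice is label based and includes both end points, which is why loc[6:78, ...] returns 73 rows (labels 6 to 78). A minimal sketch on a hypothetical 10-row frame with made-up values, assuming pandas is installed:

```python
import pandas as pd

# Hypothetical stand-in for cancerDataSet with the default 0..9 index (values are made up).
sample = pd.DataFrame({
    "Class": [0] * 10,
    "age": [5, 5, 3, 6, 4, 8, 1, 2, 2, 4],
    "breast": [3] * 10,
})

# Label-based slice: rows labelled 6, 7 and 8 are all included.
subset = sample.loc[6:8, ["Class", "age"]]
print(subset)

# Number of rows for loc[start:stop] on a default index is stop - start + 1.
print(len(subset))
```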

 Accessing data from dataset - Part 2 (using iloc - Column position) syntax-->iloc[ROW,COL_Position]
xi. cancerDataSet.iloc[0:10,5:]
cancerDataSet.iloc[10:100,:-2]
cancerDataSet.iloc[20:30,1:5]
Output xi:
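A minimal sketch of the same kind of iloc slices on a small made-up frame (the values are hypothetical, not taken from cancerDataSet), assuming pandas is installed. Unlike loc, iloc works on integer positions and excludes the stop position, as Python slicing does.

```python
import pandas as pd

# Hypothetical stand-in: 6 rows, 5 columns (values are made up).
sample = pd.DataFrame({
    "Class": [0, 0, 0, 1, 1, 1],
    "age": [5, 5, 3, 6, 4, 2],
    "menopause": [1, 4, 1, 8, 1, 1],
    "breast": [3, 3, 3, 3, 1, 8],
    "irradiat": [1, 1, 1, 1, 1, 2],
})

first_rows_tail_cols = sample.iloc[0:3, 3:]  # rows 0-2, columns from position 3 onward
without_last_two = sample.iloc[2:5, :-2]     # rows 2-4, all but the last two columns
middle_block = sample.iloc[2:4, 1:3]         # rows 2-3, columns at positions 1 and 2

print(middle_block)
```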

 Get the mean of all the columns present in the dataset

xii: cancerDataSet.mean()
cancerDataSet.tail(50).mean()

Output xii:
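A minimal sketch of the two mean calls on a small made-up frame (the values are hypothetical, not taken from cancerDataSet), assuming pandas is installed. mean() returns one value per column, and tail(n) restricts the calculation to the last n rows.

```python
import pandas as pd

# Hypothetical stand-in for cancerDataSet (values are made up).
sample = pd.DataFrame({"Class": [0, 0, 1, 1], "age": [5, 3, 4, 8]})

# Column-wise mean over every row.
means = sample.mean()

# Column-wise mean over the last two rows only.
tail_means = sample.tail(2).mean()

print(means)
print(tail_means)
```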
 Get the maximum and minimum of each column in the dataset

xiii: cancerDataSet.max()

cancerDataSet.min()

Output xiii:
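A minimal sketch of max() and min() on a small made-up frame (the values are hypothetical, not taken from cancerDataSet), assuming pandas is installed. Both return a Series indexed by column name.

```python
import pandas as pd

# Hypothetical stand-in for cancerDataSet (values are made up).
sample = pd.DataFrame({
    "Class": [0, 0, 1, 1],
    "age": [5, 3, 4, 8],
    "tumor-size": [1, 10, 8, 6],
})

col_max = sample.max()  # largest value in each column
col_min = sample.min()  # smallest value in each column

print(col_max)
print(col_min)
```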

 Drop NA values (delete rows)

xiv: cancerDataSet.isnull().sum()
cancerDataSet.dropna()
cancerDataSet.dropna(axis = 'columns')

cancerDataSet.fillna(0)

Output xiv:
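A minimal sketch of the four calls on a small made-up frame with two missing values (the values are hypothetical, not taken from cancerDataSet), assuming pandas is installed. Each call returns a new object and leaves the original frame unchanged, so the result has to be assigned to a name to be kept.

```python
import pandas as pd

nan = float("nan")

# Hypothetical stand-in for cancerDataSet with two missing values (values are made up).
sample = pd.DataFrame({
    "Class": [0, 1, 0, 1],
    "age": [5.0, nan, 3.0, 4.0],
    "breast": [3.0, 10.0, nan, 1.0],
})

missing_per_column = sample.isnull().sum()  # count of NA values in each column
rows_kept = sample.dropna()                 # drop every row that contains an NA
cols_kept = sample.dropna(axis="columns")   # drop every column that contains an NA
filled = sample.fillna(0)                   # replace each NA with 0

print(missing_per_column)
print(rows_kept.shape, cols_kept.shape)
```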