
23/01/2024, 15:39 Data Preprocessing.ipynb - Colaboratory

import pandas as pd

Data Loading


df=pd.read_csv("/content/studentdataset (1)

Data Exploration


print(type(df))
<class 'pandas.core.frame.DataFrame'>

print(df.head(10))
ID class gender race GPA Maths English Science computer \
0 1141 A male 1 73.47 NaN 81 87 60
1 1142 A female 1 71.22 NaN 50 51 51
2 1143 A female 2 74.56 NaN 48 71 60
3 1144 A female 1 72.89 NaN 72 38 60
4 1145 A female 1 70.11 NaN 45 63 60
5 1146 A male 3 65.04 NaN 60 39 61
6 1147 A male 4 77.11 NaN 43 52 63
7 1148 A female 5 64.75 NaN 38 60 63
8 1149 B female 5 77.92 NaN 60 66 68
9 1150 A female 5 76.50 NaN 61 60 69

History from1 from2 from3 from4 y


0 NaN A A A 3 0
1 NaN B A A 2 0
2 NaN C A A 0 1
3 NaN D A A 0 0
4 NaN E A A 0 0
5 NaN F B C 0 0
6 NaN G A A 0 1
7 NaN H B C 0 0
8 80.0 I B A 0 0
9 NaN H B A 0 0

print(df.tail(10))
ID class gender race GPA Maths English Science computer \
95 1236 A female 1 87.63 82.0 81 97 97
96 1237 A male 2 91.74 94.0 100 96 97
97 1238 A male 1 91.14 98.0 90 98 97
98 1239 A male 1 90.31 84.0 82 99 97
99 1240 B male 1 88.10 87.0 70 95 97
100 1241 A female 1 88.34 87.0 83 92 98
101 1242 B male 1 89.84 98.0 77 95 98
102 1243 B male 1 88.82 83.0 80 91 98
103 1244 A male 1 86.60 92.0 82 91 99
104 1245 A male 1 93.71 93.0 97 99 100

History from1 from2 from3 from4 y


95 88.0 J B A 2 0
96 95.0 C B S 0 2
97 83.0 AA B A 0 1
98 89.0 P B A 0 2
99 91.0 AB B A 0 0
100 93.0 M B A 0 1
101 96.0 A B A 0 1
102 93.0 T B A 0 2
103 94.0 S B A 0 2
104 97.0 K B A 0 2

print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 105 non-null int64
1 class 105 non-null object
2 gender 105 non-null object
3 race 105 non-null int64
4 GPA 105 non-null float64
5 Maths 46 non-null float64
6 English 105 non-null int64
7 Science 105 non-null int64
8 computer 105 non-null int64
9 History 90 non-null float64
10 from1 105 non-null object
11 from2 105 non-null object
12 from3 105 non-null object
13 from4 105 non-null int64
14 y 105 non-null int64
dtypes: float64(3), int64(7), object(5)
memory usage: 12.4+ KB
None

print(df.describe())
ID race GPA Maths English Science \
count 105.000000 105.000000 105.000000 46.000000 105.000000 105.000000
mean 1193.000000 1.790476 82.957048 87.021739 71.961905 78.942857
std 30.454885 1.673867 6.053187 5.327034 12.197039 14.997326
min 1141.000000 1.000000 63.490000 80.000000 38.000000 17.000000
25% 1167.000000 1.000000 79.340000 82.000000 64.000000 71.000000
50% 1193.000000 1.000000 84.110000 87.000000 73.000000 83.000000
75% 1219.000000 1.000000 87.300000 92.000000 80.000000 91.000000
max 1245.000000 7.000000 93.710000 98.000000 100.000000 99.000000


computer History from4 y


count 105.000000 90.000000 105.000000 105.000000
mean 85.133333 87.011111 0.504762 0.714286
std 10.269509 6.336083 0.889293 0.828742
min 51.000000 75.000000 0.000000 0.000000
25% 80.000000 82.000000 0.000000 0.000000
50% 87.000000 87.500000 0.000000 0.000000
75% 92.000000 92.750000 0.000000 1.000000
max 100.000000 97.000000 3.000000 2.000000

print(df.dtypes)
ID int64
class object
gender object
race int64
GPA float64
Maths float64
English int64
Science int64
computer int64
History float64
from1 object
from2 object
from3 object
from4 int64
y int64
dtype: object

print(df.shape)
(105, 15)

Data Cleaning


df.isnull()


ID class gender race GPA Maths English Science computer History fr

0 False False False False False True False False False True Fa

1 False False False False False True False False False True Fa

2 False False False False False True False False False True Fa

3 False False False False False True False False False True Fa

4 False False False False False True False False False True Fa

... ... ... ... ... ... ... ... ... ... ...

100 False False False False False False False False False False Fa

101 False False False False False False False False False False Fa

102 False False False False False False False False False False Fa

103 False False False False False False False False False False Fa

104 False False False False False False False False False False Fa

105 rows × 15 columns

df.isnull().sum()
ID 0
class 0
gender 0
race 0
GPA 0
Maths 59
English 0
Science 0
computer 0
History 15
from1 0
from2 0
from3 0
from4 0
y 0
dtype: int64
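
The counts above can also be expressed as a share of all rows, which makes the scale of the problem easier to judge. A minimal sketch, assuming a small hypothetical frame (`sample`) with illustrative values in place of the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the student dataset; values are illustrative only.
sample = pd.DataFrame({
    "Maths": [np.nan, np.nan, 82.0, 94.0],
    "History": [np.nan, 80.0, 88.0, 95.0],
    "English": [81, 50, 81, 100],
})

# isnull() yields booleans, so mean() gives the fraction missing per column.
missing_pct = sample.isnull().mean() * 100
print(missing_pct.round(1))
```

Applied to the counts shown above, 59 of 105 Maths values (about 56%) and 15 of 105 History values (about 14%) are missing.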

df.columns
Index(['ID', 'class', 'gender', 'race', 'GPA', 'Maths', 'English', 'Science',
'computer', 'History', 'from1', 'from2', 'from3', 'from4', 'y'],
dtype='object')

df.columns=['ID', 'class', 'Gender', 'Race',


'computer', 'History', 'From1', 'From
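
The assignment above is truncated in this export. Judging by the `df.dtypes` output further down, the columns that change are `gender`, `race`, `from1` to `from4` and `y`. One way to apply the same renaming without retyping every column is `DataFrame.rename` with an explicit mapping. This is a sketch only; the frame `demo` and the dictionary `new_names` are hypothetical names used for illustration:

```python
import pandas as pd

# Hypothetical one-row frame carrying some of the original column names.
demo = pd.DataFrame(
    [[1141, "A", "male", 1, "A", "A", "A", 3, 0]],
    columns=["ID", "class", "gender", "race",
             "from1", "from2", "from3", "from4", "y"],
)

# Only the columns that change need to appear in the mapping.
new_names = {
    "gender": "Gender", "race": "Race",
    "from1": "From1", "from2": "From2",
    "from3": "From3", "from4": "From4",
    "y": "Y",
}
demo = demo.rename(columns=new_names)
print(list(demo.columns))
```

Unlike assigning a full list to `df.columns`, a mapping does not depend on column order and leaves unlisted columns untouched.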


df.head()
ID class Gender Race GPA Maths English Science computer History From1

0 1141 A male 1 73.47 NaN 81 87 60 NaN A

1 1142 A female 1 71.22 NaN 50 51 51 NaN B

2 1143 A female 2 74.56 NaN 48 71 60 NaN C

3 1144 A female 1 72.89 NaN 72 38 60 NaN D

4 1145 A female 1 70.11 NaN 45 63 60 NaN E

print(df['Maths'])
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
100 87.0
101 98.0
102 83.0
103 92.0
104 93.0
Name: Maths, Length: 105, dtype: float64

Mean=df['Maths'].mean()
print(Mean)
87.02173913043478

df['Maths']=df['Maths'].fillna(Mean)
df.head()
ID class Gender Race GPA Maths English Science computer History F

0 1141 A male 1 73.47 87.021739 81 87 60 NaN

1 1142 A female 1 71.22 87.021739 50 51 51 NaN

2 1143 A female 2 74.56 87.021739 48 71 60 NaN

3 1144 A female 1 72.89 87.021739 72 38 60 NaN

4 1145 A female 1 70.11 87.021739 45 63 60 NaN
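
Because 59 of the 105 Maths values are missing, mean filling gives more than half of the column the same value. A possible variation, shown here on a hypothetical frame (`scores`) rather than the real data, is to record which rows were missing before filling and to use the median, which is less sensitive to outliers. This is a swapped-in alternative, not the approach used in the notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical scores; NaN marks a missing value.
scores = pd.DataFrame({"Maths": [np.nan, 80.0, 90.0, np.nan, 100.0]})

# Keep a flag for rows that were originally missing, before filling them.
scores["Maths_missing"] = scores["Maths"].isnull()

# median() skips NaN by default, just like mean().
median = scores["Maths"].median()
scores["Maths"] = scores["Maths"].fillna(median)
print(scores)
```

The extra `Maths_missing` column (a hypothetical name) lets later analysis distinguish observed scores from imputed ones.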


Mean=df['History'].mean()
df['History']=df['History'].fillna(Mean)
df.head()
ID class Gender Race GPA Maths English Science computer History

0 1141 A male 1 73.47 87.021739 81 87 60 87.011111

1 1142 A female 1 71.22 87.021739 50 51 51 87.011111

2 1143 A female 2 74.56 87.021739 48 71 60 87.011111

3 1144 A female 1 72.89 87.021739 72 38 60 87.011111

4 1145 A female 1 70.11 87.021739 45 63 60 87.011111

df.dtypes
ID int64
class object
Gender object
Race int64
GPA float64
Maths float64
English int64
Science int64
computer int64
History float64
From1 object
From2 object
From3 object
From4 int64
Y int64
dtype: object

df['History']=df['History'].astype(int)
df['Maths']=df['Maths'].astype(int)
df.head()
ID class Gender Race GPA Maths English Science computer History From1

0 1141 A male 1 73.47 87 81 87 60 87 A

1 1142 A female 1 71.22 87 50 51 51 87 B

2 1143 A female 2 74.56 87 48 71 60 87 C

3 1144 A female 1 72.89 87 72 38 60 87 D

4 1145 A female 1 70.11 87 45 63 60 87 E
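
Note that `astype(int)` discards the fractional part rather than rounding, so a value such as 92.75 becomes 92. If rounding to the nearest integer is preferred, the values can be rounded before the cast. A minimal sketch on a hypothetical Series (`values`):

```python
import pandas as pd

# Hypothetical float scores.
values = pd.Series([87.021739, 87.6, 92.75])

truncated = values.astype(int)        # drops the fractional part
rounded = values.round().astype(int)  # rounds to nearest, then casts

print(truncated.tolist())
print(rounded.tolist())
```

For the filled means in this notebook (87.021739 and 87.011111) both approaches give 87, but the results differ for other values in the columns.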


df.dtypes
ID int64
class object
Gender object
Race int64
GPA float64
Maths int64
English int64
Science int64
computer int64
History int64
From1 object
From2 object
From3 object
From4 int64
Y int64
dtype: object

df=df.drop(['GPA'],axis=1)

df.head()
ID class Gender Race Maths English Science computer History From1 From2

0 1141 A male 1 87 81 87 60 87 A A

1 1142 A female 1 87 50 51 51 87 B A

2 1143 A female 2 87 48 71 60 87 C A

3 1144 A female 1 87 72 38 60 87 D A

4 1145 A female 1 87 45 63 60 87 E A
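
The same drop can be written with the `columns=` keyword, which some readers find clearer than `axis=1`. A small sketch on a hypothetical frame (`demo`):

```python
import pandas as pd

# Hypothetical frame with a GPA column to remove.
demo = pd.DataFrame({"ID": [1141], "GPA": [73.47], "Maths": [87]})

# columns=[...] is equivalent to passing the labels with axis=1.
demo = demo.drop(columns=["GPA"])
print(list(demo.columns))
```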
