[29]: df
[30]: df.to_csv("fruits_and_Vegetable.csv",index=False)
[31]: a1 = pd.read_csv('fruits_and_Vegetable.csv')
[32]: a1
7 pomegranate raddish
8 pineapple beetroot
9 mango cabbage
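The write-then-read round trip in cells [30] to [32] can be reproduced on a small frame without touching the disk. This is only a minimal sketch: the column names `fruits` and `vegetables` and the in-memory buffer are assumptions, since the cell that builds the original frame is not shown.

```python
import io
import pandas as pd

# Hypothetical stand-in for the notebook's fruit/vegetable frame.
df = pd.DataFrame({
    "fruits": ["pomegranate", "pineapple", "mango"],
    "vegetables": ["raddish", "beetroot", "cabbage"],
})

# index=False keeps the row labels out of the CSV, so reading it back
# does not create an extra "Unnamed: 0" column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

a1 = pd.read_csv(buffer)
```

Writing with the default `index=True` would store the row labels as an unnamed first column, which `read_csv` then returns as ordinary data.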
[ ]: #######################################################################################
[34]: df
cast country \
0 NaN United States
1 Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban… South Africa
2 Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi… NaN
3 NaN NaN
4 Mayur More, Jitendra Kumar, Ranjan Raj, Alam K… India
… … …
8802 Mark Ruffalo, Jake Gyllenhaal, Robert Downey J… United States
8803 NaN NaN
8804 Jesse Eisenberg, Woody Harrelson, Emma Stone, … United States
8805 Tim Allen, Courteney Cox, Chevy Chase, Kate Ma… United States
8806 Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan… India
4 September 24, 2021 2021 TV-MA 2 Seasons
… … … … …
8802 November 20, 2019 2007 R 158 min
8803 July 1, 2019 2018 TV-Y7 2 Seasons
8804 November 1, 2019 2009 R 88 min
8805 January 11, 2020 2006 PG 88 min
8806 March 2, 2019 2015 TV-14 111 min
listed_in \
0 Documentaries
1 International TV Shows, TV Dramas, TV Mysteries
2 Crime TV Shows, International TV Shows, TV Act…
3 Docuseries, Reality TV
4 International TV Shows, Romantic TV Shows, TV …
… …
8802 Cult Movies, Dramas, Thrillers
8803 Kids' TV, Korean TV Shows, TV Comedies
8804 Comedies, Horror Movies
8805 Children & Family Movies, Comedies
8806 Dramas, International Movies, Music & Musicals
description
0 As her father nears the end of his life, filmm…
1 After crossing paths at a party, a Cape Town t…
2 To protect his family from a powerful drug lor…
3 Feuds, flirtations and toilet talk go down amo…
4 In a city of coaching centers known to train I…
… …
8802 A political cartoonist, a crime reporter and a…
8803 While living alone in a spooky town, a young g…
8804 Looking to survive in a world taken over by zo…
8805 Dragged from civilian life, a former superhero…
8806 A scrappy but poor boy worms his way into a ty…
[35]: df.head()
cast country \
0 NaN United States
1 Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban… South Africa
2 Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi… NaN
3 NaN NaN
4 Mayur More, Jitendra Kumar, Ranjan Raj, Alam K… India
listed_in \
0 Documentaries
1 International TV Shows, TV Dramas, TV Mysteries
2 Crime TV Shows, International TV Shows, TV Act…
3 Docuseries, Reality TV
4 International TV Shows, Romantic TV Shows, TV …
description
0 As her father nears the end of his life, filmm…
1 After crossing paths at a party, a Cape Town t…
2 To protect his family from a powerful drug lor…
3 Feuds, flirtations and toilet talk go down amo…
4 In a city of coaching centers known to train I…
[36]: df.tail()
cast country \
8802 Mark Ruffalo, Jake Gyllenhaal, Robert Downey J… United States
8803 NaN NaN
8804 Jesse Eisenberg, Woody Harrelson, Emma Stone, … United States
8805 Tim Allen, Courteney Cox, Chevy Chase, Kate Ma… United States
8806 Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan… India
8806 March 2, 2019 2015 TV-14 111 min
listed_in \
8802 Cult Movies, Dramas, Thrillers
8803 Kids' TV, Korean TV Shows, TV Comedies
8804 Comedies, Horror Movies
8805 Children & Family Movies, Comedies
8806 Dramas, International Movies, Music & Musicals
description
8802 A political cartoonist, a crime reporter and a…
8803 While living alone in a spooky town, a young g…
8804 Looking to survive in a world taken over by zo…
8805 Dragged from civilian life, a former superhero…
8806 A scrappy but poor boy worms his way into a ty…
[38]: df.shape
[39]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 show_id 8807 non-null object
1 type 8807 non-null object
2 title 8807 non-null object
3 director 6173 non-null object
4 cast 7982 non-null object
5 country 7976 non-null object
6 date_added 8797 non-null object
7 release_year 8807 non-null int64
8 rating 8803 non-null object
9 duration 8804 non-null object
10 listed_in 8807 non-null object
11 description 8807 non-null object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB
[40]: df.describe()
[40]: release_year
count 8807.000000
mean 2014.180198
std 8.819312
min 1925.000000
25% 2013.000000
50% 2017.000000
75% 2019.000000
max 2021.000000
[41]: df.isnull()
[42]: df.isnull().sum()
[42]: show_id 0
type 0
title 0
director 2634
cast 825
country 831
date_added 10
release_year 0
rating 4
duration 3
listed_in 0
description 0
dtype: int64
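Raw counts are easier to judge next to the share of rows they represent; in the output above, `director` is missing in 2634 of 8807 rows, roughly 30%. The sketch below shows both views on a tiny hypothetical frame (the data is invented for illustration).

```python
import pandas as pd

# Small hypothetical frame with gaps, standing in for the real dataset.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", None, None, "Y"],
    "release_year": [2019, 2020, 2021, 2018],
})

missing_count = df.isnull().sum()          # number of missing entries per column
missing_share = df.isnull().mean() * 100   # percentage of rows missing per column
```

`isnull()` yields a frame of booleans, so `sum()` counts the `True` entries and `mean()` gives their fraction.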
[98]: #######################################################################################
[45]: df1.head()
[46]: df1.tail()
[47]: df1['mainroad']=='yes'
[47]: 0 True
1 True
2 True
3 True
4 True
…
540 True
541 False
542 True
543 False
544 True
Name: mainroad, Length: 545, dtype: bool
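The boolean Series produced by cell [47] is usually passed back into the frame to filter rows. A minimal sketch on a hypothetical miniature of the housing data (the values and the 8000 threshold are made up for illustration):

```python
import pandas as pd

# Hypothetical miniature of the housing data.
df1 = pd.DataFrame({
    "area": [7420, 8960, 3000, 1650],
    "mainroad": ["yes", "yes", "no", "no"],
})

mask = df1["mainroad"] == "yes"    # boolean Series, one entry per row
on_mainroad = df1[mask]            # keeps only the rows where the mask is True

# Conditions are combined with & (and) or | (or); each needs its own parentheses.
large_on_mainroad = df1[mask & (df1["area"] > 8000)]
```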
[48]: min(df1['area'])
[48]: 1650
[49]: max(df1['area'])
[49]: 16200
[50]: df1.isna()
furnishingstatus
0 False
1 False
2 False
3 False
4 False
.. …
540 False
541 False
542 False
543 False
544 False
[51]: df1.value_counts()
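Called on a whole DataFrame, `value_counts()` counts distinct full rows, which is rarely informative when most rows are unique. Counting a single column is often what is wanted. The sketch below assumes a `furnishingstatus` column with invented values:

```python
import pandas as pd

# Hypothetical values for one categorical column.
df1 = pd.DataFrame({
    "furnishingstatus": [
        "furnished", "semi-furnished", "furnished", "unfurnished", "furnished",
    ]
})

# Frequency of each category, most common first.
counts = df1["furnishingstatus"].value_counts()
```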
4 b) Reshaping, Filtering, Scaling, Merging the data and Handling the missing values in datasets.
5 Merging
[52]: house_price=pd.DataFrame()
house_area=pd.DataFrame()
[53]: house_price=pd.read_csv('Housing.csv')
house_price=house_price[['price','mainroad']]
house_price
[54]: house_area=pd.read_csv('Housing.csv')
house_area=house_area[['area','mainroad']]
house_area
[55]: # Merging the dataframe
house_data=house_price.merge(house_area, how='inner',on='mainroad')
house_data
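One caveat about the merge in cell [55]: `mainroad` appears to hold only yes/no values, so it is not a unique key. An inner merge on such a column pairs every row with every other row that shares the same value, and the result grows far beyond the original row count. The sketch below shows the effect on four hypothetical rows and, as one possible alternative, a merge on the row index.

```python
import pandas as pd

# Hypothetical four-row housing table.
housing = pd.DataFrame({
    "price": [100, 200, 300, 400],
    "area": [10, 20, 30, 40],
    "mainroad": ["yes", "yes", "yes", "no"],
})
house_price = housing[["price", "mainroad"]]
house_area = housing[["area", "mainroad"]]

# Non-unique key: each of the 3 "yes" rows matches all 3 "yes" rows (9 pairs),
# and the single "no" row matches itself (1 pair), giving 10 rows.
by_mainroad = house_price.merge(house_area, how="inner", on="mainroad")

# Merging on the row index keeps exactly one output row per house.
by_index = house_price.merge(
    house_area[["area"]], how="inner", left_index=True, right_index=True
)
```

Which key is appropriate depends on the data; the index works here only because both pieces come from the same file in the same row order.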
[56]: show_id 0
type 0
title 0
director 2634
cast 825
country 831
date_added 10
release_year 0
rating 4
duration 3
listed_in 0
description 0
dtype: int64
[ ]: df_trans1 = imputer.fit_transform(df)
# Unable to handle the missing values this way because the DataFrame contains categorical data
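The definition of `imputer` is not shown, but a mean-based strategy cannot be computed on text columns, which would explain the failure. One way around it, sketched here with plain pandas on invented data, is to fill numeric columns with the mean and text columns with the most frequent value. This is an illustrative alternative, not the notebook's method; scikit-learn's `SimpleImputer` offers a comparable `strategy="most_frequent"` option.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame with gaps.
df = pd.DataFrame({
    "country": ["India", None, "India", "United States"],
    "release_year": [2019.0, 2021.0, np.nan, 2020.0],
})

filled = df.copy()
for col in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[col]):
        # Numeric column: replace NaN with the column mean.
        filled[col] = filled[col].fillna(filled[col].mean())
    else:
        # Text column: replace missing entries with the most frequent value.
        filled[col] = filled[col].fillna(filled[col].mode().iloc[0])
```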
7 Label Encoding
[59]: from sklearn.preprocessing import LabelEncoder
[60]: df1.head()
[62]: new_house_data=pd.DataFrame()
new_house_data
[64]: new_house_data
[64]: area
0 232
1 260
2 268
3 237
4 232
.. …
540 39
541 15
542 72
543 35
544 90
[65]: new_house_data['price'] = df1['price']
new_house_data['bedrooms'] = df1['bedrooms']
new_house_data['stories'] = encoder.fit_transform(df1['stories'])
new_house_data['bathrooms'] = df1['bathrooms']
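The cell that creates `encoder` is not shown; presumably it is a `LabelEncoder` instance. As a minimal sketch of what `fit_transform` does, the example below encodes a hypothetical yes/no column and maps the codes back to the original labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column.
df1 = pd.DataFrame({"mainroad": ["yes", "no", "yes", "yes"]})

encoder = LabelEncoder()
# Classes are sorted before codes are assigned: "no" -> 0, "yes" -> 1.
codes = encoder.fit_transform(df1["mainroad"])

# inverse_transform recovers the original labels from the codes.
original = encoder.inverse_transform(codes)
```

When applied to a column that is already numeric, the encoder assigns each distinct value its position in sorted order, so the original magnitudes are not preserved.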
[66]: new_house_data
[68]: # All columns are now numeric, so the missing values can be handled.
[71]: house_NaN_handle
[72]: house_NaN_handle=pd.DataFrame(house_NaN_handle)
[73]: house_NaN_handle
[73]: 0 1 2 3 4
0 232.0 13300000.0 4.0 2.0 2.0
1 260.0 12250000.0 4.0 3.0 4.0
2 268.0 12250000.0 3.0 1.0 2.0
3 237.0 12215000.0 4.0 1.0 2.0
4 232.0 11410000.0 4.0 1.0 1.0
.. … … … … …
540 39.0 1820000.0 2.0 0.0 1.0
541 15.0 1767150.0 3.0 0.0 1.0
542 72.0 1750000.0 2.0 0.0 1.0
543 35.0 1750000.0 3.0 0.0 1.0
544 90.0 1750000.0 3.0 1.0 1.0
[74]: house_NaN_handle.isnull().sum()
[74]: 0 0
1 0
2 0
3 0
4 0
dtype: int64
[76]: housedata.shape
[76]: (545, 5)
[78]: X
[78]: 0 1 2 3
0 232.0 13300000.0 4.0 2.0
1 260.0 12250000.0 4.0 3.0
2 268.0 12250000.0 3.0 1.0
3 237.0 12215000.0 4.0 1.0
4 232.0 11410000.0 4.0 1.0
.. … … … …
540 39.0 1820000.0 2.0 0.0
541 15.0 1767150.0 3.0 0.0
542 72.0 1750000.0 2.0 0.0
543 35.0 1750000.0 3.0 0.0
544 90.0 1750000.0 3.0 1.0
[545 rows x 4 columns]
[81]: Y
[81]: 0
0 2.0
1 4.0
2 2.0
3 2.0
4 1.0
.. …
540 1.0
541 1.0
542 1.0
543 1.0
544 1.0
[85]: X_scaled
[85]: 0 1 2 3
0 0.819788 1.000000 0.6 0.666667
1 0.918728 0.909091 0.6 1.000000
2 0.946996 0.909091 0.4 0.333333
3 0.837456 0.906061 0.6 0.333333
4 0.819788 0.836364 0.6 0.333333
.. … … … …
540 0.137809 0.006061 0.2 0.000000
541 0.053004 0.001485 0.4 0.000000
542 0.254417 0.000000 0.2 0.000000
543 0.123675 0.000000 0.4 0.000000
544 0.318021 0.000000 0.4 0.333333
[545 rows x 4 columns]
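The values in `X_scaled` all lie between 0 and 1, which is consistent with min-max scaling, although the scaling cell itself is not shown. The sketch below applies the formula (x - min) / (max - min) column by column with plain pandas on invented data; it is an assumption that the notebook used an equivalent transformer such as scikit-learn's `MinMaxScaler`.

```python
import pandas as pd

# Hypothetical feature matrix.
X = pd.DataFrame({
    "area": [0.0, 50.0, 100.0],
    "bedrooms": [1.0, 3.0, 5.0],
})

# Min-max scaling: each column is mapped linearly onto [0, 1].
X_scaled = (X - X.min()) / (X.max() - X.min())
```

A column whose values are all identical has max equal to min, so this formula divides by zero for it; such columns need separate handling.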
[88]: X_train
[88]: 0 1 2 3
166 243.0 5320000.0 3.0 0.0
378 12.0 3640000.0 3.0 2.0
349 141.0 3780000.0 3.0 1.0
368 168.0 3675000.0 2.0 0.0
306 142.0 4165000.0 3.0 1.0
.. … … … …
299 220.0 4200000.0 3.0 0.0
534 39.0 2100000.0 4.0 1.0
493 95.0 2800000.0 3.0 0.0
527 2.0 2275000.0 2.0 0.0
168 115.0 5250000.0 4.0 1.0
[89]: X_train.shape
[89]: (381, 4)
[90]: X_test
[90]: 0 1 2 3
333 39.0 3920000.0 3.0 1.0
84 84.0 6510000.0 3.0 1.0
439 93.0 3255000.0 2.0 0.0
396 75.0 3500000.0 2.0 0.0
161 188.0 5460000.0 3.0 2.0
.. … … … …
117 80.0 5950000.0 4.0 1.0
314 102.0 4095000.0 2.0 1.0
340 160.0 3850000.0 5.0 1.0
444 46.0 3220000.0 3.0 1.0
307 107.0 4165000.0 3.0 1.0
[91]: X_test.shape
[91]: (164, 4)
[94]: Y_train
[94]: 0
166 1.0
378 1.0
349 1.0
368 1.0
306 1.0
.. …
299 1.0
534 1.0
493 1.0
527 1.0
168 1.0
[95]: Y_train.shape
[95]: (381, 1)
[96]: Y_test
[96]: 0
333 1.0
84 1.0
439 1.0
396 1.0
161 1.0
.. …
117 1.0
314 1.0
340 2.0
444 1.0
307 1.0
[97]: Y_test.shape
[97]: (164, 1)
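The shapes above (381 training rows and 164 test rows out of 545) are consistent with a 70/30 split, though the splitting cell is not shown. The sketch below reproduces those shapes with `train_test_split` on placeholder data; `test_size=0.3` is inferred from the shapes and `random_state=0` is an arbitrary assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the notebook's frame (545 rows, 5 columns).
data = pd.DataFrame(np.arange(545 * 5).reshape(545, 5))

X = data.iloc[:, :4]   # first four columns as features
Y = data.iloc[:, 4:]   # last column as target, kept 2-D so its shape is (n, 1)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the training and test sets on every run.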
[ ]: