
Riphah International University

Probability and Statistics Fall 2019

Dec 6 Assignment 2 Marks: 50


Due on Dec. 13

Consider the file imported_cars.csv uploaded with the assignment. This file contains
the following 26 attributes:

['symbols', 'losses', 'make', 'fuel_type', 'aspirator', 'doors', 'body_style',
 'wheels', 'engine_loc', 'wheel_base', 'length', 'width', 'height',
 'curb_weight', 'engine_type', 'cylinders', 'engine_size', 'fuel_system',
 'bore', 'stroke', 'compression', 'horse_power', 'peak_rpm', 'city_mpg',
 'highway_mpg', 'price']

The file does not contain a header row, i.e. it contains only records. You need to
assign the column names above to the data frame. After loading the file, do the following:
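Before starting on the tasks, the loading step can be sketched as follows. This is a minimal sketch, assuming pandas is used; the helper name load_cars, the constant COLUMNS, and the one-record sample are illustrative and not part of the assignment. Passing na_values="?" is also an assumption of this sketch: it converts the '?' placeholders to NaN at load time so that missing entries can later be counted with isna().sum().

```python
import io

import pandas as pd

# The 26 attribute names listed in the assignment, in file order.
COLUMNS = [
    "symbols", "losses", "make", "fuel_type", "aspirator", "doors",
    "body_style", "wheels", "engine_loc", "wheel_base", "length", "width",
    "height", "curb_weight", "engine_type", "cylinders", "engine_size",
    "fuel_system", "bore", "stroke", "compression", "horse_power",
    "peak_rpm", "city_mpg", "highway_mpg", "price",
]


def load_cars(source):
    """Read the header-less CSV and attach the column names.

    header=None tells pandas that the first line is a record, not a header.
    na_values="?" (an assumption of this sketch) turns '?' into NaN.
    """
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")


# Hypothetical one-record stand-in for imported_cars.csv.
sample = io.StringIO(
    "3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,"
    "48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495\n"
)
dataset = load_cars(sample)
print(dataset.shape)
print(dataset.isna().sum().sum())  # total number of missing entries
```

With the real file, the call would presumably be dataset = load_cars("imported_cars.csv"), with the path adjusted to wherever the file was saved.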

1. Some of the attributes contain null values (a null value in this dataset is
represented by ‘?’). Plot the attributes against the number of null entries in
them, i.e. a bar graph of the null-value count for each column that contains
null values.
2. Plot the number of cars against the makes, i.e. find all the unique make
values (audi, bmw, honda etc.) and plot each make against its count.
3. Plot the counts of fuel types, i.e. the number of vehicles for each fuel type.
4. Plot city_mpg against highway_mpg using a line or scatter plot.
5. Display the average price of each make of car, i.e. a bar graph with one bar
per make, where the bar height shows the mean price for that make.
6. Using seaborn's catplot, plot a bar graph that shows body_style against
average price for each fuel_type (use body_style on the x-axis, price on the
y-axis, and fuel_type as col).
7. Run the following code (assuming the name of your dataframe is dataset):

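import seaborn as sns
# horse_power and price must be numeric ('?' entries converted or dropped)
with sns.axes_style('white'):
    sns.jointplot(x="horse_power", y="price", data=dataset, kind='hex')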

Give an interpretation of this graph.


8. Apply multiple linear regression using “horse_power”, “highway_mpg”, and “losses”
as independent variables to predict price.
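
As one possible starting point for the regression task, the fit can be sketched with an ordinary least squares solve. This is a minimal sketch, not the required method: it uses numpy.linalg.lstsq rather than a dedicated regression library, the helper name fit_price_model is illustrative, and the small sample frame is made up (its prices were generated from known coefficients so that the result can be checked). Rows with a ‘?’ in any of the used columns are simply dropped here, which is an assumption; other ways of handling missing values are possible.

```python
import numpy as np
import pandas as pd


def fit_price_model(df, features=("horse_power", "highway_mpg", "losses")):
    """Fit price = intercept + sum(coef_i * feature_i) by least squares."""
    features = list(features)
    # Coerce to numbers ('?' becomes NaN), then drop incomplete rows.
    clean = df[features + ["price"]].apply(pd.to_numeric, errors="coerce").dropna()
    # Design matrix: a column of ones for the intercept, then the features.
    X = np.column_stack([np.ones(len(clean)), clean[features].to_numpy(dtype=float)])
    y = clean["price"].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept"] + features, coef))


# Made-up data: price = 1000 + 100*horse_power - 50*highway_mpg + 10*losses,
# except for the row with a missing 'losses' value, which gets dropped.
sample = pd.DataFrame({
    "horse_power": [100, 150, 90, 200, 120, 110],
    "highway_mpg": [30, 25, 35, 20, 28, 32],
    "losses": [120, 100, 150, 90, "?", 130],
    "price": [10700, 15750, 9750, 20900, 12000, 11700],
})
model = fit_price_model(sample)
for name, value in model.items():
    print(name, round(float(value), 3))
```

On the real data, the same call would be fit_price_model(dataset). The sign and size of each coefficient can then be discussed in the write-up.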
