FINAL PROJECT Payal

DIABETES PREDECTION
USING
MACHINE LEARNING IN PYTHON.
Submitting To,
Mr. Mayur Dev Sewak
General Manager, Operations
Eisystems Services.
&
Ms. Mallika Srivastava
Trainer, Data Science & Analytics Domain
Eisystems Services.
Submitted By,
PAYAL ROUL
CONTENT
SERIAL TITLE PAGE
NO. NO.
01 List of Figures 1
02 List of Tables 1
03 Abstract 2
04 Project Summary 3
05 Dataset 3
06 Objective of the Project 4
07 Detail of process 5
08 Apparatus 5
09 Code 6-12
10 Conclusion 12
11 Reference 12
LIST OF FIGURES
1. Linear regression Training dataset (Figure 1)

2. Linear regression Testing dataset (Figure 2)
3. Lasso regression Training dataset (Figure 3)
4. Lasso regression Testing dataset (Figure 4)
LIST OF TABLES
1. Car_Dataset Head (Table1)

2. Car_Dataset information (Table 2)
3. Car_dataset Head (After encoding) (Table 3)
4. Data and Target (X and Y) (Table 4&5)
ABSTRACT
The production of cars has been steadily increasing in the past decade, with over 70 million
passenger cars being produced in the year 2016. This has given rise to the used car market,
which on its own has become a booming industry. The recent advent of online portals has
facilitated the need for both the customer and the seller to be better informed about the
trends and patterns that determine the value of a used car in the market. Using Machine
Learning Algorithms such as Lasso Regression and Linear Regression, we will try to develop a
statistical model which will be able to predict the price of a used car, based on previous
consumer data and a given set of features. We will also be comparing the prediction
accuracy of these models to determine the optimal one.
PROJECT SUMMARY
 Data collection and processing
1. inspecting the first 5 rows of the dataframe
2. shape
3. Information of the dataset
4. checking the distribution of categorical data
5. Encoding the Categorical Data
 Data preparation
1. Splitting the data and Target
2. Splitting Training and Test data
 Model building and evaluation
1. Linear regression (Prediction and Visualization)
2. Lasso regression (Prediction and Visualization)
DATASET
This dataset contains information about used cars listed on www.cardekho.com available on
Kaggle.
This data can be used for a lot of purposes such as price prediction to exemplify the use of
linear regression in Machine Learning.
The columns in the given dataset is as follows:
 Car_Name
 Year
 Selling_Price
 Present_Price
 Kms_Driven
 Fuel_Type
 Seller_Type
 Transmission
 Owner
OBJECTIVE OF THE PROJECT
The objective of our project is to be able to predict the price of a used car given various
attributes (data) of that car. There is a saying that a car loses 10% of its value the moment
you drive it off a lot. Given, that I would expect that one of the main predictors is the
amount of miles driven in the car, since more driving wears down the car. Additionally, I
would expect the brand (make) of the car to also be a factor in the price of a used car, since
some brands of cars cost more and may be better made. I expect to encounter some issues
with multicolinearity since some aspects of cars may be highly correlated. For example,
larger cars will probably have larger engines and more doors. Larger engines are correlated
with more cylinders.
Deciding whether a used car is worth the posted price when you see listings online can be
difficult. Several factors, including mileage, make, model, year, etc. can influence the actual
worth of a car. From the perspective of a seller, it is also a dilemma to price a used car
appropriately. Based on existing data, the aim is to use machine learning algorithms to
develop models for predicting used car prices.
DETAIL OF PROCESS
 Encoding of categorical data:
In our dataset, there are categorical variables and to apply the ML models, we need
to transform these categorical variables into numerical variables.
 Train the data:

In this process, 90% of the data was split for the train data and 10% of the data was
taken as test data.
 ML Models:
o Linear Regression:
Linear regression is a linear approach to modelling the relationship between a
scalar response (or dependent variable) and one or more explanatory
variables (or independent variables). In linear regression, the relationships are
modelled using linear predictor functions whose unknown model parameters
are estimated from the data.
o Lasso Regression:
Lasso regression is a type of linear regression that uses shrinkage. Shrinkage is
where data values are shrunk towards a central point as mean. The lasso
procedure encourages simple, sparse models (i.e. models with fewer
parameters). The goal of lasso regression is to obtain the subset of predictors
that minimizes prediction error for a quantitative response variable. The lasso
does this by imposing a constraint on the model parameters that cause
regression coefficients for some variables to shrink toward zero.
 Visualize the Actual prices and Predicted prices
Using scatter plot to display values for actual and predicted price for a set of data.
APPARATUS
 Google Colab
 Dataset from Kaggle(Car data)
CODE
CONCLUSION
We tried predicting the car price using the various parameters that were provided in the
data about the car. We build linear regression model and lasso regression model to predict
car prices and saw that lasso regression model performed well at this data than linear
regression models.
REFERENCE
1. https://www.kaggle.com/nehalbirla/vehicle-dataset-from-cardekho?
select=car+data.csv
2.

FINAL PROJECT Payal

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

FINAL PROJECT Payal

Uploaded by

Copyright:

Available Formats

DIABETES PREDECTION

06 Objective of the Project 4

1. Linear regression Training dataset (Figure 1)

1. Car_Dataset Head (Table1)

 Train the data:

You might also like