
ANALYZING FACTORS INFLUENCING USED CAR PRICES:
A STATISTICAL STUDY OF THE TORONTO MARKET
AUGUST 2023

QUANTITATIVE
METHODS PROJECT
AUTHORS
HIBA (EMBA/03/26)
PANKAJ DUBEY (EMBA/03/11)
RAJAT SINGH (EMBA/03/33)
IIM AMRITSAR - QUANTITATIVE MANAGEMENT GROUP PROJECT (EMBA 2023-25)

ACKNOWLEDGMENT

All praises and gratitude are directed towards the Almighty God, the creator of the
universe, whose power and glory are manifested in the accomplishment of all things
good. His boundless mercy has bestowed upon us the potential, abilities, and the
opportunity to engage in the endeavor of this project.

Our profound appreciation extends to our esteemed educator, Professor Ankit Sharma,
whose unwavering guidance has illuminated every step of this project. It is without a
doubt that devoid of his sagacious mentorship, embarking upon this project would
have been a formidable task. May divine recompense be granted to him in accordance
with his deserving stature. Our gratitude further encompasses all individuals
intricately involved in the development of this project.

We express our sincere thankfulness to you, Sir, for elucidating complex notions and
enriching our understanding through the prism of your experience. The knowledge
imparted by you will undoubtedly navigate our future studies and professional
trajectories. Your consistent cooperation and invaluable assistance have not gone
unnoticed. We aspire that our endeavors have aligned with your expectations.

Concluding, we convey our heartfelt gratitude to our professor, with fervent prayers
accompanying our sentiments. It is through his erudition that we acquired the
knowledge propelling us to the successful culmination of this project.


OBJECTIVE

Abstract
This project investigates the intricate dynamics of the used car market through the
lens of business statistics. By applying an array of statistical tools and methodologies,
this study aims to extract meaningful insights from secondary data pertaining to the
automotive sector, specifically the used car market in Toronto, Ontario, Canada.
Through a comprehensive analysis of various factors affecting used car prices, the
project seeks to enrich participants' comprehension of statistical concepts and their
practical implications in real-world scenarios. The project adheres to academic
integrity principles, emphasizing originality and accurate referencing in all phases of
research and reporting.

Introduction
The focal point of this project lies in the application of business statistics concepts to
secondary data derived from the automotive industry. In line with the coursework, the
project's objective is to illustrate participants' prowess in utilizing statistical tools,
techniques, and methodologies to uncover actionable insights from real-world data.

The chosen business sector for this analysis is the global used car market, which
serves as a prime example of a complex and multifaceted industry. This sector
comprises a diverse range of vehicles catering to varying consumer preferences,
spanning from economical options to luxury choices. As an integral component of the
automotive industry, the used car market plays a crucial role in addressing consumer
transportation needs while fostering competition and innovation among businesses.

The backdrop of this study is the remarkable growth of the global used car market,
valued at $1.41 trillion in 2020 and projected to reach $2.48 trillion by 2028. This
expansion highlights the sector's significance and underscores the need for a deeper
understanding of its dynamics.

Research centered on the factors influencing used car prices can yield valuable
insights into the interplay of supply and demand, thereby contributing to an efficient
and stable market. A thorough comprehension of these dynamics has far-reaching
implications, ranging from affordable transportation options for consumers to
bolstering overall economic health. The interplay of factors such as financing
availability, competition levels among dealerships, and the condition of vehicles can
sway the prices consumers are willing to pay for used cars.

By delving into these nuances and other pertinent factors, policymakers and industry
stakeholders can devise strategies that foster a thriving and efficient used car market.
This may involve initiatives aimed at promoting transparency, ensuring fair
competition, and reducing entry barriers for emerging players. Moreover, investments
in technological advancements and infrastructure can enhance market accessibility and
efficiency.

This project narrows its focus to analyze the impact of two critical factors, year of
manufacture and kilometres run, on used car sale prices. Additional variables such as
fuel type, engine type, make, and passenger capacity are left out of the model, so
their effects are not controlled for; the analysis concentrates on the two factors
under scrutiny and their relationships with used car prices.

The used car market, characterized by its intricate and dynamic nature, constantly
engages buyers and sellers in pursuit of an advantageous position. Among the key
determinants of success in this domain is a profound understanding of a vehicle's
value. This study capitalizes on a dataset encompassing 18,647 used vehicles listed
for sale within a 25 km radius of downtown Toronto in 2023. By employing
econometric tools, the study endeavors to discern the pivotal factors significantly
shaping the price of a used vehicle.

In essence, this project endeavors to facilitate well-informed decisions for buyers and
sellers, enhancing the efficiency of the used car market. The intrinsic importance of
researching used car prices extends to its potential to cultivate favorable outcomes for
consumers, sellers, and the broader economy. Through a meticulous application of
statistical methodologies and a dedication to academic integrity, this project
contributes to a comprehensive understanding of the intricate world of used car
pricing dynamics.


List of Tables

Table 1 - Descriptive Analysis
Table 2 - Correlation Analysis
Table 3 - Regression Statistics
Table 4 - ANOVA Analysis
Table 5 - Regression Analysis


Description of all Statistical Tools


The following is a brief description of the statistical tools that we have used in our
project to analyze the data.

Measure of Central Tendency:


Measures of central tendency describe the center of a frequency distribution. There
are three such measures: the mean, the median, and the mode. We describe the
following two of them.

Mean:
The mean is also called the “Arithmetic Mean”. It is the most frequently used measure
of central tendency. It is obtained by dividing the sum of all values by the number of
values in the data of interest. The single value of the mean represents the whole
data. The sample mean is denoted by “x̄” and the population mean is denoted by “μ”.

Median:
The median is also an important measure of central tendency. For ungrouped data, the
median is the middle value of the ordered data, so it divides the ordered data into
two equal parts. It is calculated by first arranging the data in increasing order and
then finding the middle value of the arranged data.
If the number of values is odd, the median is the middle value of the arranged data;
if the number of values is even, the median is the average of the two middle values.
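
As a small illustration of the two definitions above, the following Python sketch computes a mean and a median with the standard library. The odometer readings are hypothetical values chosen for the example, not rows from the Toronto dataset.

```python
import statistics

# Hypothetical odometer readings in kilometres (illustrative values only).
kilometres = [12000, 48000, 52600, 90000, 151000]

# Mean: sum of all values divided by the number of values.
mean_km = statistics.mean(kilometres)

# Median: middle value of the sorted data (odd number of values here).
median_km = statistics.median(kilometres)

# With an even number of values, the median is the average of the two
# middle values of the sorted data.
median_even = statistics.median([10, 20, 30, 40])

print(mean_km, median_km, median_even)
```

The large reading of 151,000 km pulls the mean above the median, which is the same pattern the report later describes for right-skewed variables.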

Standard Deviation:
It is a frequently used measure of dispersion. The standard deviation describes how
far the values of the data deviate from their mean. A large standard deviation
indicates that the values are widely spread around the mean, and a small standard
deviation indicates that they lie close to the mean. It is obtained by taking the
square root of the variance. The variance of the population is denoted by “σ²” and the
variance of the sample is denoted by “s²”, so the standard deviation of the population
is denoted by “σ” and the sample standard deviation is denoted by “s”.

The measure is built on the deviation of each value from the mean. The sum of these
deviations is always equal to zero, that is, ∑(x − x̄) = 0 or ∑(x − μ) = 0, which is
why the deviations are squared before they are averaged.

The variance and the standard deviation can never be negative. If the data has no
variation in it, then the variance and the standard deviation are both 0. The standard
deviation is expressed in the same units as the data.
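
The relationship between deviations, variance, and standard deviation can be sketched in a few lines of Python. The data values are made up for illustration, and the sample formula with divisor n − 1 is assumed.

```python
import math

# Illustrative data (not from the dataset).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(values)
mean = sum(values) / n

# Deviations of each value from the mean; these always sum to zero.
deviations = [x - mean for x in values]
sum_of_deviations = sum(deviations)

# Sample variance: squared deviations averaged with divisor n - 1.
sample_variance = sum(d * d for d in deviations) / (n - 1)

# Sample standard deviation: square root of the variance,
# expressed in the same units as the data.
sample_std = math.sqrt(sample_variance)

print(sum_of_deviations, sample_variance, sample_std)
```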

Coefficient of Variation:

The coefficient of variation (CV) is defined as the ratio of the standard deviation “σ”
to the mean “μ”, that is, CV = σ / μ. A CV below 1 indicates low relative variability
and a CV above 1 indicates high relative variability. Because it expresses dispersion
per unit of the mean, it can be compared across variables measured in different units.
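
As a worked example, the sketch below applies CV = σ / μ to the means and standard deviations reported in the descriptive analysis of this report. The dictionary layout and variable names are assumptions made for the illustration.

```python
# (mean, standard deviation) pairs as reported in the descriptive analysis.
reported = {
    "Kilometres driven": (65777.05845, 63845.7028),
    "Price": (47450.53805, 53371.45345),
    "Year": (2018.713466, 4.003967039),
}

# Coefficient of variation = standard deviation / mean.
cv = {name: sd / mean for name, (mean, sd) in reported.items()}

for name, value in cv.items():
    print(f"{name}: CV = {value:.4f}")
```

On these figures Price has a CV above 1 and Kilometres driven a CV just below 1. The CV for Year is tiny, but it is not very informative because a calendar year has an arbitrary zero point.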

Skewness:
Skewness is a measure of the asymmetry of the probability distribution of a random
variable. Its value can be positive, negative, or undefined. A negative skewness
indicates that the tail on the left side is longer than the tail on the right side,
which means the data, including the median, is clustered to the right of the mean. A
positive skewness indicates that the tail on the right side is longer than the tail on
the left side, which means the data is clustered to the left of the mean. A skewness
of zero means the values are evenly distributed on both sides of the mean.
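
The sign convention described above can be checked with a minimal sketch. It uses the simple moment-based formula g1 = m3 / m2^1.5 without a small-sample correction, which is an assumption; the tool used to produce the tables in this report may apply an adjusted formula. The data are illustrative.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 10]            # long tail towards high values
left_tailed = [-x for x in right_tailed]   # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed), skewness(left_tailed), skewness(symmetric))
```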

Hypothesis Testing:
It is a method of making decisions using data, whether from a controlled experiment
or from an observational (uncontrolled) study. We perform such a test when we make a
decision about a population parameter on the basis of a sample statistic.

Two terms are used in this test: the “Null Hypothesis” and the “Alternative Hypothesis”.
The null hypothesis is a claim about the population parameter that is considered to be
true until the sample evidence shows otherwise.

The alternative hypothesis is the competing claim, which is accepted when the sample
evidence leads us to reject the null hypothesis.

ANOVA:
Analysis of variance (“ANOVA”) is used to test the null hypothesis that the means of
three or more populations are equal.
The hypotheses in an analysis of variance are normally stated as:
H₀: all three population means are equal
H₁: not all three population means are equal

Assumptions of ANOVA:
To use one-way ANOVA, the following assumptions must be fulfilled.
- The populations from which the samples are drawn are normally distributed.
- The populations from which the samples are drawn have the same variance (or
standard deviation).
- The samples are independent and random, drawn from the different populations.

In ANOVA we calculate two variance estimates: the variance between samples (MSB) and
the variance within samples (MSW). If the means of the populations under consideration
are all equal, then the variation between the means of the samples taken from those
populations will be small and MSB will be low; if the means of the populations are not
equal, MSB will be higher.

MSW gives the value indicating the variance within the data of samples taken from
the different populations. It is similar to the concept of the pooled standard deviation.
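
A minimal sketch of how MSB, MSW, and their ratio F are computed in a one-way ANOVA is given below. The three groups are invented for illustration; they are not taken from the used car data.

```python
# Three illustrative samples (hypothetical values).
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]

k = len(groups)                                # number of groups
n_total = sum(len(g) for g in groups)          # total number of observations
grand_mean = sum(sum(g) for g in groups) / n_total
group_means = [sum(g) / len(g) for g in groups]

# Between-sample sum of squares and mean square (MSB), df = k - 1.
ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
msb = ssb / (k - 1)

# Within-sample sum of squares and mean square (MSW), df = n_total - k.
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
msw = ssw / (n_total - k)

# F statistic: a large value suggests the population means are not all equal.
f_statistic = msb / msw
print(msb, msw, f_statistic)
```

Here the group means are far apart relative to the spread inside each group, so MSB is much larger than MSW and F is large.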

Multiple Regression Analysis:

It is the study of how a dependent variable depends on one or more independent
variables. For this purpose we use the following model:

Ŷ = a + b₁x₁ + b₂x₂ + … + bₙxₙ

Where:
Ŷ = dependent variable (Estimated Value of Y)
a = y-intercept (Constant)
b₁ = (Slope) per unit change in Ŷ due to the change in one unit of first independent
variable.
b₂ = (Slope) per unit change in Ŷ due to the change in one unit of second
independent variable.

The above model is often used for the future predictions of the important
components/variables of the organizations.
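
To make the model concrete, the following sketch estimates a, b₁, and b₂ for two independent variables by solving the least-squares normal equations with plain Python. The function name `fit_two_predictors` and the data are hypothetical; the data were generated from y = 1 + 2x₁ + 3x₂ so the recovered coefficients can be checked. This is an illustration of the technique, not the procedure used to produce the tables in this report.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares estimates [a, b1, b2] for y ≈ a + b1*x1 + b2*x2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]  # design matrix with intercept
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    aug = [xtx[i] + [xty[i]] for i in range(3)]   # augmented 3x4 matrix
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(a, b1, b2)
```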

Coefficient of Determination:
The coefficient of determination is used to answer the question of how good the
regression model is. It tells what proportion of the variation in the dependent
variable is explained by the independent variables used in the regression model.

It is denoted by “r²”. The value of r² always lies between 0 and 1. The closer r² is to
1, the better the model fits the data and the more confidence we can place in the
regression model. Its formula is:

r² = (SST − SSE) / SST

where SST is the total sum of squares and SSE is the sum of squared errors (residuals).
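
As a worked example, the formula can be applied to the total and residual sums of squares reported in the ANOVA table of this report. Because those figures are rounded in the report, the result is approximate.

```python
# Sums of squares as reported (rounded) in the ANOVA table of this report.
sst = 5.31134e13  # total sum of squares
sse = 4.50131e13  # residual (error) sum of squares

# Coefficient of determination: share of total variation explained by the model.
r_squared = (sst - sse) / sst
print(round(r_squared, 4))
```

The value agrees with the R Square of about 0.1525 reported in the regression statistics.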


Brief Description Of Variables


For the purpose of this project we have selected the following variables:

1. Year of Manufacturing (Independent Variable)
2. Kilometers Driven (Independent Variable)
3. Price (Dependent Variable)

Data
The dataset utilized for this study on used car sales was obtained from Kaggle. This
comprehensive dataset comprises information on 18,647 pre-owned vehicles available
for sale in the year 2023, situated within a 25-kilometer radius of downtown Toronto,
Ontario, Canada. The vehicles were found for sale on Autotrader.ca. The dataset
provides a detailed array of attributes, encompassing variables such as the year of
manufacture, make, model, distance covered in kilometers, body type, engine
specifications, transmission details, drivetrain characteristics, exterior and interior
colors, passenger capacity, number of doors, fuel type, city fuel economy, highway
fuel economy, price, and various other numeric features.

Of particular relevance to this study are the summary statistics associated with two
critical factors: the manufacturing year and the kilometers run by the vehicles. These
statistics serve as foundational elements in understanding the dynamics of the used car
market and form the basis for the subsequent analysis.

Reference:
Kaggle. (2023). Used Vehicles for Sale.
URL: [https://www.kaggle.com/datasets/farhanhossein/used-vehicles-for-sale?resource=download]


Descriptive Analysis

Table 1 - Descriptive Analysis

Interpretation:

Here's the interpretation and significance of each statistic in the descriptive
analysis for the variable "Kilometres driven":

1. Mean
The mean of 65777.05845 indicates the average distance (in kilometers) that the listed
cars have been driven. This provides insight into the central tendency of the distances
covered.

2. Standard Error
The standard error of 467.5490582 suggests the potential variation in the sample
mean compared to the true population mean. It indicates the degree of uncertainty in
the estimated mean.

3. Median
The median value of 52600 represents the middle value of the distances driven. It's
less influenced by extreme values and gives an idea of the central tendency.

4. Mode
The mode of 90 signifies that a distance of 90 kilometers is the most common among
the listed cars. This value appears most frequently in the dataset.


5. Standard Deviation
The standard deviation of 63845.7028 shows the typical amount of variation in the
distances driven from the mean. A larger standard deviation suggests greater
dispersion in the data.

6. Sample Variance
The sample variance of 4076273766 quantifies the spread of the squared differences
between each distance driven and the mean distance. It gives an idea of the overall
variability in the data.

7. Kurtosis
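A kurtosis value of 0.897338251, read as excess kurtosis (a normal distribution has an
excess kurtosis of 0), indicates a distribution that is somewhat more peaked and
heavier-tailed than a normal distribution. It suggests that the data contain slightly
more extreme values than a normal distribution would, although the departure is mild.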

8. Skewness
A positive skewness value of 1.042649855 suggests that the distribution of distances
driven is skewed to the right, implying a longer tail towards higher distances.

9. Range
The range of 480000 represents the difference between the maximum and minimum
distances driven in the dataset. It indicates the spread of distances covered.

10. Minimum and Maximum
The minimum of 0 and the maximum of 480000 show the range of distances within
the dataset.

11. Sum
The sum of 1226544809 is the total of all the individual distances driven.

12. Count
The count of 18647 represents the total number of distance values recorded in the
dataset.
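
The standard error of the mean is the standard deviation divided by the square root of the count. The short check below applies this to the standard deviations and count reported in this report for all three variables; the variable names are chosen for the illustration.

```python
import math

n = 18647  # count reported for every variable

# Standard deviations as reported in the descriptive analysis.
std_dev = {
    "Kilometres driven": 63845.7028,
    "Price": 53371.45345,
    "Year": 4.003967039,
}

# Standard error of the mean = s / sqrt(n).
std_error = {name: s / math.sqrt(n) for name, s in std_dev.items()}

for name, se in std_error.items():
    print(f"{name}: standard error = {se:.6f}")
```

The results reproduce the reported standard errors (about 467.549 for Kilometres driven, 390.845 for Price, and 0.0293 for Year).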

In summary, this descriptive analysis provides insights into the distribution,
variability, and characteristics of the "Kilometres driven" variable within the dataset.
It helps understand the central tendency of distances driven, the degree of variability,
and the presence of potential outliers or skewed data points.

Here's the interpretation and significance of each statistic in the descriptive
analysis for the variable "Price":

1. Mean
The mean of 47450.53805 represents the average listed price of the cars in the dataset.
This provides insight into the central tendency of the prices.

2. Standard Error
The standard error of 390.8449856 indicates the potential variation in the calculated
sample mean compared to the true population mean. It measures the uncertainty in the
estimated mean.

3. Median
The median value of 36995 is the middle value of the listed prices. It's less sensitive
to extreme values and gives an idea of the central tendency.

4. Mode
The mode of 28995 signifies that a listed price of $28,995 is the most common among
the cars. This price appears most frequently in the dataset.

5. Standard Deviation
The standard deviation of 53371.45345 indicates the typical amount of variation in
the listed prices from the mean. A larger standard deviation implies greater dispersion
in the data.

6. Sample Variance
The sample variance of 2848512043 quantifies the spread of the squared differences
between each listed price and the mean price. It gives an idea of the overall variability
in the data.

7. Kurtosis
A kurtosis value of 152.6867497 indicates a distribution with very heavy tails
compared to a normal distribution. This suggests a high presence of extreme values
(outliers) in the data.

8. Skewness
A positive skewness value of 9.266779384 suggests that the distribution of listed
prices is highly skewed to the right, indicating a very long tail towards higher prices.

9. Range
The range of 1697998 represents the difference between the maximum and minimum
listed prices in the dataset. It shows the spread of prices covered.

10. Minimum and Maximum
The minimum of $2,000 and the maximum of $1,699,998 show the range of prices
within the dataset.

11. Sum
The sum of 884810183 is the total of all the individual listed prices.


12. Count
The count of 18647 represents the total number of listed prices recorded in the
dataset.

In summary, this descriptive analysis provides insights into the distribution,
variability, and characteristics of the "Price" variable within the dataset. It helps
understand the central tendency of listed prices, the degree of variability, and the
presence of potential outliers or skewed data points. The high kurtosis and skewness
values indicate a distribution with extreme values and a heavy tail towards higher
prices.

Here's the interpretation and significance of each statistic in the descriptive
analysis for the variable "Year":

1. Mean
The mean of 2018.713466 indicates the average year of the listed cars is
approximately 2018. This provides a central point around which the years are
distributed.

2. Standard Error
The standard error of 0.029321488 suggests the variability that could exist in the
calculated sample mean compared to the true population mean. It's a measure of the
uncertainty in the estimated mean.

3. Median
The median value of 2019 represents the middle year in the dataset. It's less sensitive
to outliers compared to the mean and gives an idea of the central tendency.

4. Mode
The mode of 2023 signifies that the most common manufacturing year among the
listed cars is 2023. This is the year that appears most frequently in the dataset.

5. Standard Deviation
The standard deviation of 4.003967039 indicates the average amount of variation in
the manufacturing years from the mean. A larger standard deviation implies more
dispersion in the data.

6. Sample Variance
The sample variance of 16.03175205 quantifies the spread of the squared differences
between each year and the mean year. It provides insight into the overall variability in
the data.

7. Kurtosis
A kurtosis value of 14.26695235 suggests a distribution with heavier tails compared
to a normal distribution. This means that there might be more extreme values (outliers)
in the dataset.

8. Skewness
The negative skewness value of -2.101962642 indicates that the distribution of years
is skewed to the left, implying a longer tail towards earlier years.

9. Range
The range of 65 represents the difference between the maximum and minimum years
in the dataset, showing the spread of years covered.

10. Minimum and Maximum
The minimum of 1958 and the maximum of 2023 show the range of years within the
dataset.

11. Sum
The sum of 37642950 is the total of all the individual years listed.

12. Count
The count of 18647 represents the total number of years recorded in the dataset.

In summary, this descriptive analysis provides valuable insights into the distribution,
variability, and characteristics of the “Year” variable within the dataset. It aids in
understanding the central tendency of manufacturing years, the degree of variability,
and the presence of potential outliers or skewed data points.

Correlation Analysis

Table 2 - Correlation Analysis

Interpretation:
The Pearson correlation coefficients presented in the table reflect the relationships
between the variables Year, Kilometres driven, and Price. Here's the significance of
each correlation coefficient:


1. Year and Kilometres driven (-0.736)
The correlation coefficient between Year and Kilometres driven is -0.736. This value
indicates a moderate to strong negative linear relationship between these two
variables. In other words, as the manufacturing year of a car increases, the number of
kilometres driven tends to decrease, and vice versa. This correlation suggests that
newer cars generally tend to have fewer kilometres driven.

2. Year and Price (0.214)
The correlation coefficient between Year and Price is 0.214. This positive correlation
indicates a weak positive linear relationship between these two variables. It suggests
that, on average, as the manufacturing year of a car increases, the price also tends to
increase slightly. This could be due to factors like newer cars often having better
features or improved condition.

3. Kilometres driven and Price (-0.379)
The correlation coefficient between Kilometres driven and Price is -0.379. This
negative correlation indicates a moderate negative linear relationship between these
variables. As the number of kilometres driven increases, the price of the car tends to
decrease. This is a common trend in the used car market, where higher mileage
generally leads to lower prices.

In summary, the Pearson correlation coefficients provide insights into the
relationships between the variables. They help quantify the strength and direction of
the linear relationships. The negative correlation between Kilometres driven and Price
aligns with expectations, as does the negative correlation between Year and
Kilometres driven. The positive correlation between Year and Price, albeit weak,
suggests a tendency for newer cars to have slightly higher prices.
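
For reference, a Pearson correlation coefficient can be computed as sketched below. The function name `pearson_r` and the five year and kilometre pairs are hypothetical; they only illustrate the negative relationship between model year and distance driven, and do not reproduce the coefficients in Table 2.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical listings: newer cars with fewer kilometres driven.
year = [2015, 2017, 2019, 2021, 2023]
kilometres = [120000, 95000, 60000, 30000, 5000]

r_year_km = pearson_r(year, kilometres)
print(round(r_year_km, 3))
```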

Regression Statistics

Table 3 - Regression Statistics


Interpretation
Here's the interpretation and significance of the regression statistics in the context of
data:


1. Multiple R (Multiple Correlation Coefficient)
- The multiple correlation coefficient (R) measures the strength of the linear
relationship between the independent variables (Year and Kilometres Driven) taken
together and the dependent variable (Price). It is the correlation between the
observed and the predicted Price, so it is never negative and carries no information
about direction.
- In this case, the value of R is approximately 0.3905, indicating a weak to moderate
linear relationship.

2. R Square (Coefficient of Determination)
- The coefficient of determination (R Square) represents the proportion of the
variance in the dependent variable (Price) that is explained by the linear regression
model with the independent variables (Year and Kilometres Driven).
- Here, the R Square value is approximately 0.1525, suggesting that about 15.25%
of the variation in Price can be explained by the linear relationship with Year and
Kilometres Driven.

3. Adjusted R Square
- The adjusted R Square takes into account the number of independent variables and
the sample size, penalizing R Square for each predictor added to the model.
- The adjusted R Square value here is very similar to the R Square value, because
with only two predictors and 18,647 observations the adjustment is negligible.

4. Standard Error
- The standard error represents the average deviation of the observed values from
the predicted values by the regression model.
- In this case, the standard error is approximately 49136.0036, indicating the
average variability between the actual Price values and the values predicted by the
regression model.

5. Observations
- The number of observations refers to the total data points in the analysis. Here,
there are 18647 data points in the dataset.
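
The link between these statistics can be sketched from the figures in this report: Multiple R is the square root of R Square, and the adjusted R Square follows from the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1). The variable names are chosen for the illustration.

```python
import math

r_squared = 0.152509153  # R Square as reported in this report
n = 18647                # observations
k = 2                    # independent variables (Year, Kilometres Driven)

# Multiple R is the square root of R Square.
multiple_r = math.sqrt(r_squared)

# Adjusted R Square penalizes R Square for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(multiple_r, 4), round(adjusted_r_squared, 6))
```

With so many observations and only two predictors, the adjusted value differs from R Square only in the fourth decimal place.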

In summary, the regression statistics provide insights into the linear relationship
between the independent variables (Year and Kilometres Driven) and the dependent
variable (Price). The R Square value suggests that the variation in Price can be
partially explained by Year and Kilometres Driven, but the explanatory power of the
model is relatively modest at around 15.25%. The low R Square indicates that there
might be other factors not included in the model that influence Price. The standard
error gives an idea of the typical error between the predicted and actual Price
values. The multiple correlation coefficient (R) of about 0.39 suggests a weak to
moderate linear relationship between the independent variables and the dependent
variable.


ANOVA Analysis

Table 4 - ANOVA Analysis

Interpretation:
Here's the interpretation and significance of the ANOVA analysis results in the
context of data:

1. ANOVA Table
The ANOVA table summarizes the variance components and statistical measures
associated with the regression analysis.

2. df (Degrees of Freedom)
- Regression: 2 degrees of freedom are associated with the regression analysis,
indicating the number of independent variables being considered (Year and
Kilometres Driven).
- Residual: 18644 degrees of freedom correspond to the error or residual component,
that is, the number of data points minus the number of predictor variables minus one
(18647 − 2 − 1).
- Total: 18646 degrees of freedom, the number of data points minus one (18647 − 1).

3. SS (Sum of Squares)
- Regression: The sum of squares for the regression component is approximately
8.10027E+12. This value quantifies the variation explained by the regression model
using the independent variables (Year and Kilometres Driven) to predict the
dependent variable (Price).
- Residual: The sum of squares for the residual component is about 4.50131E+13. It
represents the unexplained variation or error, which is the variation that cannot be
accounted for by the regression model.
- Total: The sum of squares for the total is around 5.31134E+13, representing the total
variation in the dependent variable (Price).

4. MS (Mean Squares)
- Regression: The mean squares for the regression is approximately 4.05014E+12,
which is the sum of squares divided by the degrees of freedom associated with the
regression. It represents the average variation explained by the regression model.
- Residual: The mean squares for the residual is about 2414346850, calculated
similarly. It represents the average unexplained variation.

5. F (F-ratio or F-statistic)


- The F-ratio is calculated as the ratio of the mean squares of regression to the mean
squares of residuals. In this case, it is calculated as 1677.528828.
- A large F-ratio indicates that the variation explained by the regression model is
significantly larger than the unexplained variation, suggesting the model's
significance.

6. Significance F
- The Significance F value is the p-value associated with the F-ratio. It is reported
as 0 because it is too small to be displayed; it is not literally zero.
- A p-value this close to 0 suggests that the regression model is statistically
significant, meaning that at least one of the independent variables (Year and
Kilometres Driven) has a significant effect on the dependent variable (Price).
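
The mean squares and the F-ratio follow directly from the sums of squares and degrees of freedom listed above. The sketch below recomputes them from the figures in this report; because the sums of squares are rounded in the report, the results match the reported values only approximately.

```python
# Figures as reported (rounded) in the ANOVA table of this report.
ss_regression = 8.10027e12
ss_residual = 4.50131e13
df_regression = 2
df_residual = 18644

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F-ratio = explained mean square / unexplained mean square.
f_ratio = ms_regression / ms_residual

print(f"{ms_regression:.5e} {ms_residual:.5e} {f_ratio:.2f}")
```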

In summary, the ANOVA analysis indicates that the regression model, which
considers the independent variables (Year and Kilometres Driven) to explain the
variation in the dependent variable (Price), is highly significant. The very low
p-value suggests that at least one of the independent variables has a statistically
significant effect on the response variable. Statistical significance does not by
itself mean a large effect: the model still explains only about 15% of the variation
in Price. The results are consistent with the regression statistics, reinforcing the
significance of the relationships between the variables.

Regression Analysis

Table 5 - Regression Analysis

Interpretation:
Here's the interpretation of the regression model table:

1. Regression Model Table


The regression model table provides information about the coefficients, standard
errors, t-statistics, p-values, and confidence intervals for the intercept and the two
independent variables (Year and Kilometres Driven) in the regression analysis.


2. Intercept
- The intercept represents the estimated value of the dependent variable (Price) when
all independent variables are set to zero. In this context, it might not hold a practical
interpretation.
- The coefficient for the intercept is approximately 3854085.193.
- The standard error associated with the intercept is 268397.3624.
- The t-statistic is calculated as the coefficient divided by its standard error, resulting
in 14.35962395.
- The p-value (1.64207E-46) is extremely low, indicating that the intercept is
statistically significant.

3. Year
- The coefficient for the variable "Year" is -1872.540371. This coefficient represents
the change in the dependent variable (Price) for a one-unit change in the independent
variable "Year," while holding other variables constant.
- The standard error for the "Year" coefficient is 132.7547517.
- The t-statistic for "Year" is -14.10526062.
- The p-value (6.00251E-45) is very close to zero, indicating that the "Year" variable
is highly significant in explaining the variation in Price.
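- The negative coefficient means that, holding Kilometres Driven constant, each
additional model year (that is, a newer car) is associated with a price lower by about
$1,873. Note that a higher Year means a newer car, not an older one. This sign is the
opposite of the simple correlation between Year and Price (0.214), which may reflect
the strong correlation between Year and Kilometres Driven (-0.736) and the influence
of extreme prices, so the coefficient should be interpreted with caution.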

4. Kilometres Driven
- The coefficient for the variable "Kilometres Driven" is -0.40306138. This
coefficient represents the change in the dependent variable (Price) for each
additional kilometre driven, while holding Year constant.
- The standard error for the "Kilometres Driven" coefficient is 0.008325473.
- The t-statistic for "Kilometres Driven" is -48.41303292.
- The p-value is reported as 0, meaning it is too small for the software to display
rather than exactly zero; this indicates strong statistical significance.
- The negative coefficient implies that as Kilometres Driven increases, Price tends to
decrease, by roughly 0.40 per kilometre (about 4,031 per 10,000 kilometres).
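
The relationship between each coefficient, its standard error, and its t-statistic can be checked directly from the values reported above. The following is a minimal sketch using only those reported figures; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Coefficient, standard error, and reported t-statistic for each term,
# copied from the regression table interpretation above.
reported = {
    "Intercept": (3854085.193, 268397.3624, 14.35962395),
    "Year": (-1872.540371, 132.7547517, -14.10526062),
    "Kilometres Driven": (-0.40306138, 0.008325473, -48.41303292),
}


def t_statistic(coefficient, standard_error):
    """t-statistic for the null hypothesis that the true coefficient is zero."""
    return coefficient / standard_error


# Recompute each t-statistic as coefficient / standard error.
t_values = {name: t_statistic(c, se) for name, (c, se, _) in reported.items()}

for name, (_, _, t_reported) in reported.items():
    print(f"{name}: computed t = {t_values[name]:.4f}, reported t = {t_reported:.4f}")
```

The recomputed values should agree with the reported t-statistics up to rounding in the published coefficients and standard errors.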

5. Confidence Intervals
- The lower and upper confidence limits give a range of plausible values for the true
population parameter at the stated confidence level. For instance, the 95% confidence
interval for the intercept runs from 3328001.876 to 4380168.51.
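
As a rough check, a 95% confidence interval can be approximated as the coefficient plus or minus a critical value times the standard error. The sketch below assumes a large sample and therefore uses the normal critical value (about 1.96); regression software normally uses the t distribution with the residual degrees of freedom, so the reported limits are expected to be slightly wider. The function name is illustrative.

```python
from statistics import NormalDist


def confidence_interval(coefficient, standard_error, level=0.95):
    """Large-sample (normal approximation) confidence interval for a coefficient."""
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return coefficient - margin, coefficient + margin


# Intercept and its standard error as reported in the regression table.
lower, upper = confidence_interval(3854085.193, 268397.3624)
print(f"Approximate 95% CI for the intercept: {lower:,.0f} to {upper:,.0f}")
```

The approximation lands within a few tens of units of the reported limits (3328001.876 and 4380168.51), which is consistent with the small difference between the normal and t critical values in a large sample.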

Overall
The low p-values for both "Year" and "Kilometres Driven" indicate that each variable
has a statistically significant association with Price. The coefficients and their
signs give the direction and magnitude of these effects, and the t-statistics show how
precisely they are estimated. However, the R-squared value (0.152509153) shows that the
model explains only about 15.25% of the variability in the dependent variable, Price,
so most of the variation is due to factors not included in the model.
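
Taken together, the reported coefficients define the fitted equation Price = 3854085.193 - 1872.540371 × Year - 0.40306138 × Kilometres Driven. The sketch below evaluates that equation; the example inputs (Year 2018, 50,000 and 60,000 kilometres) are hypothetical and assume Year is recorded as a calendar year. Given the low R-squared, any single prediction should be read as a rough central estimate.

```python
# Coefficients as reported in the regression table.
INTERCEPT = 3854085.193
COEF_YEAR = -1872.540371
COEF_KM = -0.40306138


def predicted_price(year, kilometres):
    """Fitted Price from the reported linear model."""
    return INTERCEPT + COEF_YEAR * year + COEF_KM * kilometres


# Hypothetical example inputs, chosen only for illustration.
base = predicted_price(2018, 50_000)
more_km = predicted_price(2018, 60_000)

print(f"Predicted price at 50,000 km: {base:,.2f}")
print(f"Change in prediction for 10,000 extra km: {more_km - base:,.2f}")
```

Holding Year fixed, the difference between the two predictions equals 10,000 times the Kilometres Driven coefficient, which is how the coefficient is read as a per-kilometre effect.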


In summary, the regression model table indicates that both "Year" and "Kilometres
Driven" are statistically significant predictors of Price, and their coefficients provide
valuable insights into the relationships between these variables.

Pivotal Point of the Narrative


This study is centered around a meticulous investigation into the intricate dynamics of
the used car market using rigorous statistical analysis. The principal objective is to
discern the multifaceted influences that underlie the sale price of pre-owned vehicles.
The research harnesses an extensive dataset pertaining to variables encompassing the
car's vintage, mileage accrued, and the listed sale price within the Toronto market.

Commencing with an incisive descriptive analysis, this study furnishes pivotal
summary statistics for the variables Year, Kilometres Driven, and Price. These
statistics, encompassing measures such as the mean, median, standard deviation,
skewness, and kurtosis, efficaciously illuminate the distributional attributes and
central tendencies inherent in the dataset.

Moreover, the research embraces the intricacies of regression analysis, a methodical
inquiry into the interplay between the independent variables (Year and Kilometres
Driven) and the dependent variable (Price). The regression statistics, notably the
R-squared and adjusted R-squared values, quantify how much of the variability in the
dependent variable is explained by the independent variables. Concurrently, the
coefficients, standard errors, t-statistics, and p-values collectively furnish nuanced
insights into the statistical significance and direction of the relationships
elucidated.
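
The adjusted R-squared mentioned here penalises R-squared for the number of predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The sketch below applies that standard formula to the reported R-squared; the sample size is not stated in this section, so the value of n used is a hypothetical placeholder.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R-squared for a linear model with an intercept."""
    return 1 - (1 - r_squared) * (n_observations - 1) / (
        n_observations - n_predictors - 1
    )


R_SQUARED = 0.152509153  # reported in the regression statistics
N_ASSUMED = 10_000       # hypothetical sample size, for illustration only
K_PREDICTORS = 2         # Year and Kilometres Driven

adj = adjusted_r_squared(R_SQUARED, N_ASSUMED, K_PREDICTORS)
print(f"Adjusted R-squared (assuming n = {N_ASSUMED}): {adj:.6f}")
```

With only two predictors and a sample in the thousands, the adjustment is very small, so the adjusted value stays close to the reported R-squared.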

A further layer of depth is added through an Analysis of Variance (ANOVA). This
analytical tool evaluates the joint contribution of the independent variables in
explaining the variance of the dependent variable. The outcome of this analysis
confirms the statistical significance of the established relationships.
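
The overall F-statistic in a regression ANOVA table can be written in terms of R-squared as F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below uses that standard identity with the reported R-squared; the sample size is again a hypothetical placeholder, so the resulting F value is illustrative and is not the figure from the study's ANOVA table.

```python
def f_statistic_from_r_squared(r_squared, n_observations, n_predictors):
    """Overall F-statistic of a linear regression, expressed through R-squared."""
    explained = r_squared / n_predictors
    unexplained = (1 - r_squared) / (n_observations - n_predictors - 1)
    return explained / unexplained


R_SQUARED = 0.152509153  # reported in the regression statistics
N_ASSUMED = 10_000       # hypothetical sample size, for illustration only

f_value = f_statistic_from_r_squared(R_SQUARED, N_ASSUMED, 2)
print(f"Illustrative F-statistic (assuming n = {N_ASSUMED}): {f_value:.1f}")
```

The identity shows why a modest R-squared can still produce a very large F-statistic when the sample is large: the denominator shrinks as n grows.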

Conclusively, the nucleus of this study lies in the deconstruction of how the temporal
dimension of the vehicle (Year) and the accumulated mileage collectively influence
the intricate tapestry of sale prices in the domain of used cars. The conducted
statistical analyses culminate in a wealth of invaluable insights pertaining to the
substantive significance, directional alignment, and robustness of these relationships.
This nuanced comprehension equips stakeholders within the used car market with the
acumen to craft judicious decisions, bolster market efficiencies, and foster a robust
automotive economic ecosystem.


GRAPHICAL REPRESENTATIONS


REFERENCES
[1] Kaggle. (2023). Used Vehicles for Sale. URL:
https://www.kaggle.com/datasets/farhanhossein/used-vehicles-for-sale?resource=download

[2] Haan, M., and P. Kooreman, “Price Anomalies in the Used Car Market”,
retrieved from researchgate.net.

[3] Emons, W., and G. Sheldon (2002), “The Market for Used Cars: A New Test of
the Lemons Model”, CEPR Discussion Paper, DP 3360.
