
STA610 SAS PROGRAMMING

BACHELOR OF SCIENCE (HONS.) STATISTICS


FACULTY OF COMPUTER AND MATHEMATICAL SCIENCE
N4CS2415S

TITLE:
THE FACTORS AFFECTING CAR SALES IN NORTH AMERICA

PREPARED FOR
MADAM NORNADIAH MOHD RAZALI
PREPARED BY
NAME                                   MATRIC NUMBER
MUHAMMAD SAFWATULLAH BIN ABD HALIM     2019728011
NABIL SYAHMI BIN JUMAHADI              2019583637
KHAIRUL NIZAM BIN ZULJASRI             2019594065

SUBMISSION DATE: 3RD JULY 2020


TABLE OF CONTENTS

NO TOPIC
1 INTRODUCTION
1.1 INTRODUCTION OF TOPIC
1.2 DESCRIPTION OF DATA
1.3 RESEARCH OBJECTIVES
1.4 RESEARCH QUESTIONS
1.5 RESEARCH HYPOTHESIS
2 METHODOLOGY
2.1 STATISTICAL METHOD
2.2 SUMMARY OF STATISTICAL METHODS
2.3 SAS PROCEDURE
3 RESULT AND ANALYSIS
3.1 DATA MANAGEMENT
3.2 DESCRIPTIVE ANALYSIS
3.3 INFERENTIAL STATISTIC
4 CONCLUSION AND RECOMMENDATIONS
4.1 CONCLUSION
4.2 RECOMMENDATIONS
5 REFERENCES

LIST OF TABLES
NO TABLE
1 Description of Variables
2 Summary of Statistical Methods
3 SAS Procedures
CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION OF TOPIC

The automobile was first invented and perfected in Germany and France in the
late 1800s, though Americans quickly came to dominate the automotive industry in the
first half of the twentieth century. Henry Ford innovated mass production techniques
that became standard, and Ford, General Motors and Chrysler emerged as the “Big
Three” auto companies by the 1920s. Manufacturers funnelled their resources to the
military during World War II, and afterward automobile production in Europe and
Japan soared to meet growing demand. Once vital to the expansion of American urban
centres, the industry had become a shared global enterprise with the rise of Japan as the
leading automaker by 1980.

A car is a wheeled motor vehicle used for transportation. Most definitions of cars
say that they run primarily on roads, seat one to eight people, have four tires and mainly
transport people rather than goods. Over the decades, additional features and controls
have been added to vehicles, making them progressively more complex but also more
reliable and easier to operate. Most cars in use in the 2010s are propelled by an internal
combustion engine, fuelled by the combustion of fossil fuels. Electric cars, which were
invented early in the history of the car, became commercially available in the 2000s and
are predicted to cost less to buy than gasoline cars before 2025.

The personal benefits of cars include on-demand transportation, mobility,
independence and convenience. The societal benefits include economic benefits such
as job and wealth creation from the automotive industry, transportation provision,
societal well-being from leisure and travel opportunities, and revenue generation from
taxes. People’s ability to move flexibly from place to place has far-reaching
implications for the nature of societies. There are around 1 billion cars in use
worldwide. The numbers are increasing rapidly, especially in China, India and other
newly industrialized countries.

1.2 DESCRIPTION OF DATA

This data set consists of car sales records, which include the number of cars sold in
North America and the factors that influence the number of cars bought by customers in
North America. The population for this data set is all cars sold in North America, and the
sample is 100 types of car bought by customers in North America. The table below
describes all the variables used in this study.

Table 1: Description of Variables

Variable Name        Unit of the Variable          Type of Data
Manufacturer         -                             Qualitative Data
Vehicle type         -                             Qualitative Data
Engine size          Cubic centimetre (cc)         Quantitative Data
Horsepower           Horsepower (hp)               Quantitative Data
Fuel capacity        Gallon (gal)                  Quantitative Data
Fuel efficiency      Kilometre per litre (kmpl)    Quantitative Data
Sales in thousand    -                             Quantitative Data
Price in thousand    Dollar                        Quantitative Data

In this study, sales of car in thousand is the dependent variable. There are seven
independent variables, consisting of two qualitative variables and five quantitative
variables, that may influence the dependent variable.

1.3 RESEARCH OBJECTIVES

1) To identify the most suitable independent variables to be included in the model.
2) To study the relationship between each independent variable and the dependent
variable.

1.4 RESEARCH QUESTIONS

1) Which independent variables are significant enough to be included in the model?
2) Is there any relationship between the variables in the model?

1.5 RESEARCH HYPOTHESIS

1) There is a significant relationship between sales in thousand and price in thousand.
2) There is a significant relationship between sales in thousand and fuel efficiency.
3) There is a significant relationship between sales in thousand and fuel capacity.
4) There is a significant relationship between sales in thousand and horsepower.
5) There is a significant relationship between sales in thousand and engine size.
6) There is a significant relationship between sales in thousand and vehicle type.
7) There is a significant relationship between sales in thousand and manufacturer.

CHAPTER 2

METHODOLOGY

2.1 STATISTICAL METHOD

2.1.1 Multiple Linear Regression

Multiple linear regression is the most common form of linear regression analysis.
It is used to describe the relationship between one dependent variable and two
or more independent variables. Multiple linear regression assumes that the errors are
normally distributed with zero mean and constant variance. Another key assumption is
that the relationship between the dependent variable and the independent variables is
linear.

2.1.2 Stepwise Selection Method

The researcher can use the stepwise selection method to select only the significant
independent variables that contribute to the dependent variable. Stepwise regression is a
combination of the backward and forward methods that is carried out by an automatic
procedure. Variables are added to the model one at a time, and after each addition the
variables already in the model are checked to see whether they are still significant at the
tolerance level, which is 0.1. Any variable that is found to be insignificant is removed
from the model.

2.1.3 Independent Samples T-Test

The independent samples t-test is used to test whether there is a significant
difference between the means of two independent groups. The assumptions of this test are
as follows:

a) The dependent variable is normally distributed within each group.
b) The dependent variable is continuous.
c) The independent variable consists of two independent groups.
d) The variance is equal across the two groups.

2.1.4 Analysis of Variance (ANOVA)

To determine whether there are any statistically significant differences between the
means of three or more independent (unrelated) groups, the analysis of variance (ANOVA) is
used. ANOVA tests the following null and alternative hypotheses:

H0: μ1 = μ2 = μ3 = ... = μk

H1: At least one group mean differs from the others.

From the ANOVA table, if the p-value (significance value) < α = 0.05, the null
hypothesis is rejected in favour of the alternative hypothesis. This is interpreted to
mean that at least one of the group means is significantly different from the others.
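
As a rough illustration of how the ANOVA F statistic is built from the group means, the sketch below computes it by hand in Python (not SAS, which is the software used in this report). The three groups are hypothetical numbers chosen for the illustration and are not taken from the car sales data; the function name `one_way_anova_f` is also only illustrative.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three groups of three observations each
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

A large F value relative to the F distribution with (df_between, df_within) degrees of freedom gives a small p-value, which leads to rejecting the null hypothesis of equal means.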

2.2 SUMMARY OF STATISTICAL METHODS

The method of analysis for each variable and objective has been specified by the
researcher. Table 2 shows the method of analysis used for each objective.

Table 2: Summary of Statistical Methods

Objective               Dependent Variable   Independent Variable       Method of Analysis
To identify the         Sales of Car         X1 (Manufacturer)          Stepwise Selection Method
significant variable                         X2 (Vehicle Type)
to be used                                   X3 (Engine Size)
                                             X4 (Horsepower)
                                             X5 (Fuel Capacity)
                                             X6 (Fuel Efficiency)
                                             X7 (Price in thousand)
To study the            Sales of Car         X1 (Manufacturer)          Multiple Linear Regression
relationship between                         X2 (Vehicle Type)          Pearson Correlation Coefficient
variables                                    X3 (Engine Size)
                                             X4 (Horsepower)
                                             X5 (Fuel Capacity)
                                             X6 (Fuel Efficiency)
                                             X7 (Price in thousand)
2.3 SAS PROCEDURE

In this study, SAS 9.3 is used to achieve the objectives. The SAS procedures
and their purposes are summarized in Table 3.

Table 3: SAS Procedures

SAS PROCEDURE        PURPOSE
BY STATEMENT         To specify the sorting or grouping variable
FORMAT               To change the display format of the variable
KEEP STATEMENT       To keep selected variables in the new data set
LABEL                To assign a descriptive label to the variable
LIBNAME STATEMENT    To create a SAS library
MERGE STATEMENT      To merge multiple data sets into one
NOOBS OPTION         To suppress the observation number column
PLOTS OPTION         To create a histogram
PROC CONTENTS        To display the description of the data set
PROC CORR            To compute Pearson's correlation coefficient
PROC FREQ            To compute frequency counts
PROC MEANS           To compute summary statistics
PROC NPAR1WAY        To run the Kruskal-Wallis test
PROC PRINT           To print the contents of the data set
PROC REPORT          To create a report of the data set
PROC SORT            To sort the data set
PROC UNIVARIATE      To test for normality
RENAME               To rename variables
SET STATEMENT        To read an existing data set
TITLE STATEMENT      To assign a title

CHAPTER 3

RESULT AND ANALYSIS

3.1 DATA MANAGEMENT

3.1.1 Split data

The car sales data are split into several subsets according to manufacturer: Audi, BMW,
Chevrolet, Ford, Honda, Hyundai, Mercedes, Mitsubishi and Nissan.

3.1.2 Combine data

All the split subsets are then combined to obtain a data set that contains only the
selected manufacturers. The researcher splits and combines the data in order to reduce the
number of observations by eliminating the manufacturers that are not used in the study.

3.1.3 Drop data

The variable Model is dropped because the researcher does not intend to use it.
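
The split, combine and drop steps described in this section amount to keeping the rows for the selected manufacturers and removing one column. The sketch below shows that logic in Python rather than in the SAS data steps used in the report. The three rows, the column names and the manufacturer "OtherMaker" are hypothetical placeholders, not records from the actual data set.

```python
# Manufacturers retained in the study (from Section 3.1.1)
KEEP = {"Audi", "BMW", "Chevrolet", "Ford", "Honda",
        "Hyundai", "Mercedes", "Mitsubishi", "Nissan"}

# Hypothetical rows standing in for the imported car sales data
rows = [
    {"manufacturer": "Ford", "model": "Model A", "sales": 10.0},
    {"manufacturer": "OtherMaker", "model": "Model B", "sales": 20.0},
    {"manufacturer": "Honda", "model": "Model C", "sales": 30.0},
]


def subset_and_drop(data, keep, drop_column):
    """Keep rows whose manufacturer is in `keep`, then drop one column."""
    return [
        {name: value for name, value in row.items() if name != drop_column}
        for row in data
        if row["manufacturer"] in keep
    ]


cleaned = subset_and_drop(rows, KEEP, "model")
print(cleaned)
```

Filtering in one pass gives the same result as splitting by manufacturer and recombining, which is why the two steps can be treated as a single subsetting operation.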

[Figure: data set before and after dropping the variable]

3.2 DESCRIPTIVE ANALYSIS

The CONTENTS procedure for the car sales data set

Based on the output above, the data set is named car sales and is accessed from the WORK
library. The data set was imported from an Excel file. A total of 37 observations were
selected from the original data set, which contains 100 observations. Nine variables were
identified in the data set, and eight of them are used in the analysis.

3.2.1 Bar Chart of Average Sales by Vehicle Type

The figure above shows the bar chart of average sales in thousand by vehicle type. The chart
presents two vehicle types: car and passenger. Car has the higher average
sales in thousand compared to passenger.

3.2.2 Pie Chart of Total Sales by Manufacturer

The pie chart above shows total sales in thousand by manufacturer. Ford has the highest
sales with 1379.363, while Nissan has the lowest with 241.659. The second largest slice is
the "Other" category with 338.147, followed by Honda and Chevrolet with 285.743 and 258.258
respectively.

3.2.3 Summary Statistic of Sales by Manufacturer

From the summary statistics above, Figure 3.1.(iii) shows that the numbers of observations
for Audi, BMW, Chevrolet, Ford, Honda, Hyundai, Mercedes, Mitsubishi and Nissan are
3, 3, 6, 7, 3, 2, 5, 4 and 4 respectively. The mean sales for Audi is 13.5190000, BMW is
15.5016667, Chevrolet is 43.0430000, Ford is 197.0518571, Honda is 95.2476665, Hyundai is
48.0710000, Mercedes is 12.3214000, Mitsubishi is 23.3340000 and Nissan is 60.4147500,
which means that Ford has the highest mean. Audi, BMW, Mercedes and Mitsubishi have
minimum sales between 0 and 10 thousand and maximum sales between 19 and 43
thousand. Meanwhile, Chevrolet, Ford, Honda, Hyundai and Nissan have minimum sales
between 12 and 64 thousand and maximum sales between 65 and 541 thousand.
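
The group sizes and means reported here can be cross-checked against the pie chart totals in Section 3.2.2, since total sales equal the number of observations multiplied by the mean. The short Python sketch below (not part of the SAS output) does this arithmetic using only the figures quoted in this report. Treating the "Other" slice as the five manufacturers without their own slice is an assumption made for this check.

```python
# (number of observations, mean sales in thousand) as reported in Section 3.2.3
summary = {
    "Audi": (3, 13.5190000),
    "BMW": (3, 15.5016667),
    "Chevrolet": (6, 43.0430000),
    "Ford": (7, 197.0518571),
    "Honda": (3, 95.2476665),
    "Hyundai": (2, 48.0710000),
    "Mercedes": (5, 12.3214000),
    "Mitsubishi": (4, 23.3340000),
    "Nissan": (4, 60.4147500),
}

# Total sales per manufacturer = n * mean
totals = {name: n * avg for name, (n, avg) in summary.items()}
n_total = sum(n for n, _ in summary.values())

# Assumption: "Other" pools the manufacturers without their own pie slice
other = sum(totals[m] for m in ("Audi", "BMW", "Hyundai", "Mercedes", "Mitsubishi"))

for name, total in totals.items():
    print(f"{name:<11}{total:10.3f}")
print("Other      ", round(other, 3), " observations:", n_total)
```

Under that assumption the recomputed totals agree with the pie chart values for Ford, Honda, Chevrolet, Nissan and "Other", and the group sizes add up to the 37 observations reported in Section 3.2.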

3.3 INFERENTIAL STATISTIC

3.3.1 Pearson’s Correlation Coefficient

a) Price

H0: There is no significant relationship between sales in thousand and price in thousand
dollars.

H1: There is a significant relationship between sales in thousand and price in thousand
dollars.

The p-value is 0.0979, which is greater than the significance level of 0.05. Thus, we fail
to reject the null hypothesis and conclude that there is no significant relationship
between sales in thousand and price in thousand dollars. The table shows that the
Pearson’s correlation is -0.27622, which indicates a weak negative relationship
between sales in thousand and price in thousand dollars.

b) Horsepower

H0: There is no significant relationship between sales in thousand and horsepower.

H1: There is a significant relationship between sales in thousand and horsepower.

The p-value is 0.3782, which is greater than the significance level of 0.05. Thus, we fail
to reject the null hypothesis and conclude that there is no significant relationship
between sales in thousand and horsepower. The table shows that the Pearson’s
correlation is -0.14916, which indicates a weak negative relationship between sales in
thousand and horsepower.

c) Fuel efficiency

H0: There is no significant relationship between sales in thousand and fuel efficiency.

H1: There is a significant relationship between sales in thousand and fuel efficiency.

The p-value is 0.5447, which is greater than the significance level of 0.05. Thus, we fail
to reject the null hypothesis and conclude that there is no significant relationship
between sales in thousand and fuel efficiency. The table shows that the Pearson’s
correlation is -0.10285, which indicates a weak negative relationship between sales in
thousand and fuel efficiency.

d) Fuel capacity

H0: There is no significant relationship between sales in thousand and fuel capacity.

H1: There is a significant relationship between sales in thousand and fuel capacity.

The p-value is 0.4479, which is greater than the significance level of 0.05. Thus, we fail
to reject the null hypothesis and conclude that there is no significant relationship
between sales in thousand and fuel capacity. The table shows that the Pearson’s
correlation is 0.12866, which indicates a weak positive relationship between sales in
thousand and fuel capacity.

e) Engine size

H0: There is no significant relationship between sales in thousand and engine size.

H1: There is a significant relationship between sales in thousand and engine size.

The p-value is 0.6634, which is greater than the significance level of 0.05. Thus, we fail
to reject the null hypothesis and conclude that there is no significant relationship
between sales in thousand and engine size. The table shows that the Pearson’s
correlation is 0.07400, which indicates a weak positive relationship between sales in
thousand and engine size.
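
The five correlation tests above can be cross-checked by converting each Pearson coefficient r into a t statistic, t = r·sqrt(n − 2) / sqrt(1 − r²), and comparing it with the critical value of the t distribution. The Python sketch below (not the SAS PROC CORR output) assumes that all 37 observations were used for every correlation and uses 2.030 as the approximate two-sided 5% critical value for 35 degrees of freedom.

```python
import math

N = 37          # assumed number of observations behind each correlation
T_CRIT = 2.030  # approximate two-sided 5% critical value, t with 35 df

# Pearson correlations with sales in thousand, as reported in Section 3.3.1
correlations = {
    "price": -0.27622,
    "horsepower": -0.14916,
    "fuel efficiency": -0.10285,
    "fuel capacity": 0.12866,
    "engine size": 0.07400,
}


def t_from_r(r, n):
    """t statistic for testing H0: correlation = 0, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t_stats = {name: t_from_r(r, N) for name, r in correlations.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name:<16} t = {t:7.3f}  significant: {significant[name]}")
```

None of the t statistics exceeds the critical value in absolute terms, which agrees with the decision not to reject the null hypothesis in each of the five cases.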

3.3.2 Stepwise Selection Method

The regression model is:

𝑦̂ = 71.89645 + 150.82215 (Ford) – 44.91681(passenger)

The table shows that the p-value for Ford is less than 0.0001, which is below alpha (0.05).
Therefore, there is a significant relationship between Ford and sales. Meanwhile, the
p-value for passenger is 0.1325, which is more than alpha (0.05), so the relationship
between passenger and sales is not significant. The R-squared and adjusted R-squared are
0.4373 and 0.4042, respectively. The model is considered an acceptable fit since there is a
significant relationship between Ford and sales, although the R-squared shows that it
explains only about 44% of the variation in sales.
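
To show how the fitted equation and the R-squared values relate, the Python sketch below (not the SAS output) evaluates the regression equation for each combination of the two predictors and recomputes the adjusted R-squared. It assumes that Ford and passenger are 0/1 dummy variables, that the model was fitted on the 37 selected observations, and that it contains two predictors.

```python
def predict_sales(is_ford, is_passenger):
    """Predicted sales in thousand from the fitted stepwise model."""
    return 71.89645 + 150.82215 * is_ford - 44.91681 * is_passenger


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Assumed: n = 37 observations and p = 2 predictors (Ford, passenger)
adj = adjusted_r2(0.4373, n=37, p=2)

for is_ford in (0, 1):
    for is_passenger in (0, 1):
        print(is_ford, is_passenger, round(predict_sales(is_ford, is_passenger), 3))
print("adjusted R-squared:", round(adj, 4))
```

Under these assumptions the recomputed adjusted R-squared rounds to the reported 0.4042, and the coefficient for Ford can be read as the predicted increase in sales, in thousand, for a Ford relative to the other manufacturers with vehicle type held fixed.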

3.3.3 Kruskal Wallis Test

From the figure above, the p-value is 0.0026, which is lower than 0.05, so 𝐻0 is rejected.
In conclusion, there is a significant difference in sales among the manufacturers Audi,
BMW, Chevrolet, Ford, Honda, Hyundai, Mercedes, Mitsubishi and Nissan.
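
For readers unfamiliar with the Kruskal-Wallis test, the statistic is computed from the ranks of the pooled observations rather than from the raw values. The Python sketch below (not the SAS PROC NPAR1WAY output) computes the H statistic without the correction for ties, using hypothetical data rather than the car sales figures; the function name `kruskal_wallis_h` is illustrative.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using midranks, without tie correction."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)

    # Assign each distinct value the average of the ranks it occupies
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # Sum of (rank sum squared / group size) over the groups
    weighted = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical data: three groups with no overlap between them
groups_kw = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(groups_kw)
print(h_stat)
```

H is compared with a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups; a large H gives a small p-value and leads to rejecting the hypothesis that all groups have the same distribution.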

3.3.4 Independent Samples T-Test

From the table above, the variances are unequal because the p-value of the equality of
variances test is 0.0001, which is less than 0.05. Therefore, the Satterthwaite result is
used, and its p-value is 0.2245, which is higher than alpha (0.05). So, there is no
significant difference in sales between the vehicle types.
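
The Satterthwaite method is used when the two groups have unequal variances: the t statistic uses separate variance estimates, and the degrees of freedom are approximated from them. The Python sketch below (not the SAS PROC TTEST output) shows both calculations on two hypothetical samples that are not taken from the car sales data.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Unequal-variance t statistic and Satterthwaite degrees of freedom."""
    # Estimated variance of each sample mean
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)

    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t_stat, df


# Hypothetical samples with clearly different spreads
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, df = welch_t(group_a, group_b)
print(round(t_stat, 4), round(df, 4))
```

The resulting degrees of freedom are usually not a whole number and are smaller than the pooled value of n1 + n2 − 2, which makes the test more conservative when the variances differ.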

CHAPTER 4

CONCLUSION & RECOMMENDATIONS

4.1 CONCLUSION

For the first objective, which is to identify the significant variables to be used in this
study, we conclude from the stepwise selection method that only the manufacturer Ford has
a significant relationship with sales. The model is considered an acceptable fit since
there is a significant relationship between this variable and sales, with an R-squared of
0.4373 and an adjusted R-squared of 0.4042.

Next, we used Pearson’s correlation to determine the correlation between sales of
car and the five continuous independent variables. Based on the results, it can be
concluded that price, horsepower and fuel efficiency each have a weak negative
relationship with sales, while fuel capacity and engine size each have a weak positive
relationship with sales. None of these relationships is significant at the 0.05 level.

4.2 RECOMMENDATIONS

Other researchers can apply the following recommendations in future research on this
topic. To improve the study, researchers can include more variables that may affect sales
of car in order to determine which variables have the greatest effect. Researchers can
also increase the number of observations, since a larger sample broadens the range of
possible data and gives a better picture for analysis.

In addition, since this study was conducted in North America, where most people can
afford to buy a car, future researchers can carry out the study in another country or
region, such as India or China, to determine the variables that affect the dependent
variable there.

REFERENCES
