
QUESTION 1:

PART A: MINI PROJECT

1) PROJECT BACKGROUND
This report presents the results of a survey carried out to determine the brands of
motorcycle used or owned by UTeM students and how long they have been using them.
The project was conducted to complete the assignment for BENG 2143 (Engineering
Statistics). The scope of the survey was limited to UTeM students, and it involved 36
respondents. As required, the data for this report were collected by means of a
questionnaire.

2) METHODOLOGY
The survey for this project was conducted using “Google Form”. The link to the survey
was shared through the “WhatsApp” application from group to group in order to obtain
responses from different faculties across UTeM. To achieve the aim of this study, a set
of five questions was given to each respondent. The first question asked for the
student's faculty. The second asked whether the student has a motorcycle. The third
asked for the brand of motorcycle the student currently uses. The fourth asked for the
engine capacity (CC) of the motorcycle. The fifth asked how long the student has been
using the motorcycle.

SAMPLE OF QUESTIONNAIRE

Title: Brands of motorcycles used/owned by UTeM students

1) Faculty
☐ FKM
☐ FTKEE
☐ FTKMP
☐ FKE
☐ FKEKK
☐ FTMK
☐ FKP
☐ FPTT

2) Do you have a motorcycle?
☐ Yes (If you choose this answer, kindly proceed to the next question)
☐ No (If you choose this answer, the survey ends here. Thank you)

3) What brand of motorcycles that you used currently
☐ Honda
☐ Yamaha
☐ Modenas
☐ Other:

4) What is your motorcycle’s engine capacity (CC)
☐ 100 CC
☐ 125 CC
☐ 135 CC
☐ 150 CC
☐ 250 CC

5) How long have you been using that motorcycle?
☐ 0 – 1 years
☐ 1 – 2 years
☐ 2 – 3 years
☐ 3 – 4 years
☐ 4 – 5 years

3) DATA ANALYSIS AND RESULT

3.1 Qualitative Analysis

Question 3: The brand of motorcycle used by students in UTeM.

Figures 1 and 2 show the brands of motorcycle used or owned by UTeM students. The
charts are divided into four parts: Honda (blue), Yamaha (red), Modenas (green) and
others (purple). Thirty-five respondents answered this question. Yamaha is the most
common brand, with 15 respondents (43%), which is somewhat below half of the total.
Honda follows with 37% and Modenas with 14%. Only 2 respondents (6%), below a tenth
of the total, use other brands.
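The percentages quoted above follow directly from the respondent counts. The sketch below is a minimal illustration in Python, not part of the original Excel work. Only the counts for Yamaha (15) and other brands (2) are stated in the text; the counts of 13 for Honda and 5 for Modenas are assumptions inferred from the reported 37% and 14% of 35 respondents.

```python
# Respondent counts per brand. Yamaha (15) and Others (2) are stated in the
# report; Honda (13) and Modenas (5) are ASSUMED, inferred from 37% and 14%.
brand_counts = {"Honda": 13, "Yamaha": 15, "Modenas": 5, "Others": 2}


def brand_percentages(counts):
    """Return each brand's share of all respondents, rounded to a whole percent."""
    total = sum(counts.values())
    return {brand: round(100 * n / total) for brand, n in counts.items()}


percentages = brand_percentages(brand_counts)
for brand, pct in percentages.items():
    print(f"{brand}: {pct}%")
```

With these counts the shares come out as 37%, 43%, 14% and 6%, matching the values shown in Figure 1.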

[Pie chart: Honda 37%, Yamaha 43%, Modenas 14%, Others 6%]

Figure 1: Percentage of respondents by brand of motorcycle used by students in UTeM.

Figure 2: Number of respondents by brand of motorcycle used by students in UTeM.

3.2 Quantitative Analysis

Question 5: The duration for which students have used the motorcycle.

From the survey, the 1-2 years class has the highest frequency, thirteen, and the
cumulative frequency up to this class is sixty percent of the respondents. In
contrast, the 3-4 years class contains only two of the thirty-five respondents,
the lowest of all classes. The class width is one year.
The mean, mode and median were calculated in Excel. The mean is 2.0143 years,
the mode is 1.4545 years and the median is 1.7308 years.
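The three measures can be reproduced with the usual grouped-data formulas. The sketch below is a Python illustration rather than the Excel workbook used for the report. The class frequencies (8, 13, 7, 2, 5) are an assumption: they are taken as the differences between the cumulative frequencies plotted in the ogive (0, 8, 21, 28, 30, 35).

```python
# Class lower boundaries (years), frequencies and class width.
# ASSUMPTION: frequencies are the differences of the cumulative frequencies
# plotted in the ogive (0, 8, 21, 28, 30, 35).
lower_bounds = [0, 1, 2, 3, 4]
freqs = [8, 13, 7, 2, 5]
width = 1


def grouped_mean(lower_bounds, freqs, width):
    """Mean of grouped data using class midpoints."""
    mids = [lb + width / 2 for lb in lower_bounds]
    return sum(m * f for m, f in zip(mids, freqs)) / sum(freqs)


def grouped_median(lower_bounds, freqs, width):
    """Median by linear interpolation inside the median class."""
    half = sum(freqs) / 2
    cum = 0
    for lb, f in zip(lower_bounds, freqs):
        if cum + f >= half:
            return lb + (half - cum) / f * width
        cum += f


def grouped_mode(lower_bounds, freqs, width):
    """Mode by interpolation inside the modal class."""
    i = freqs.index(max(freqs))
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1, d2 = freqs[i] - before, freqs[i] - after
    return lower_bounds[i] + d1 / (d1 + d2) * width


mean = grouped_mean(lower_bounds, freqs, width)
median = grouped_median(lower_bounds, freqs, width)
mode = grouped_mode(lower_bounds, freqs, width)
print(round(mean, 4), round(median, 4), round(mode, 4))  # 2.0143 1.7308 1.4545
```

Under that assumption the results agree with the reported mean, median and mode to four decimal places.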

Table 1.0: Data on how long students have been using the motorcycle

Unlike normally distributed data, where all measures of central tendency (mean, median
and mode) equal each other, in negatively skewed data the measures are dispersed. The
general relationship between the central tendency measures in a negatively skewed
distribution may be expressed using the following inequality:
Mode > Median > Mean
Another important note about the measures of central tendency in negatively
skewed distributions is that the arithmetic mean is generally located on the left from the
peak of the distribution. Although the rules mentioned previously are considered to be the
general rules for negatively skewed distributions, you may encounter many exceptions in
real life that violate the rules.
The significant negative skewness of a distribution may not be suitable for
thorough statistical analysis. The high skewness of the data may lead to misleading results
from the statistical tests. Due to such a reason, negatively skewed data goes through the
transformation process to make it close to the normal distribution. The statistical tests are
usually run only when the transformation of the data is complete.

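∴ With $\hat{x}$ the mode, $\tilde{x}$ the median and $\bar{x}$ the mean, the survey gives
$\hat{x} < \tilde{x} < \bar{x}$ (1.4545 < 1.7308 < 2.0143). This is the reverse of the ordering
for a negatively skewed distribution, so the data is positively skewed.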

Figure 3: Negatively skewed distribution.

*For the frequency table, refer to Table 1.0.

The range is the difference between the highest value and the lowest value. In our
survey the range is five. The width of each class interval is one and the number of
class intervals is five, which is reasonable for the given data. Lastly, the modal
class interval is 1-2, because it contains the largest number of students.

Figure 4 shows the histogram of the quantitative data from the survey. The 1-2 years
class has the highest frequency, thirteen. In contrast, the 3-4 years class contains
only two of the thirty-five respondents, the lowest of all classes. The frequency of
the 2-3 years class is seven, which is more than that of the 3-4 years class. The
frequency of the 4-5 years class is slightly less than that of the 2-3 years class.

[Histogram: Frequency, f, against Years, for the classes 0-1, 1-2, 2-3, 3-4 and 4-5]

Figure 4: Histogram of quantitative data.

According to Figure 5, the number of students fluctuates with the duration of motorcycle
use. A total of 21 respondents have been using their motorcycle for between 0 and 2 years.
From year 2 the graph decreases until year 4, and the 3-4 years class is the lowest point.
A possible reason is that most UTeM students switch to another vehicle after 3 or 4 years
of study in UTeM.
The data contain outliers relative to a linear trend. If the trend line were linear, the
second point would be a little lower than the first point and the fourth point a little
higher than the fifth point, and the graph would show a perfect negative linear
correlation. Instead, the largest number of respondents have used their motorcycle for
1-2 years, while the number of users for 3-4 years is the smallest.

[Frequency polygon: Frequency, f, against Midpoint, m (0.5 to 5.5)]

Figure 5: Polygon of quantitative data

The first and third quartiles are descriptive statistics that are measurements of position in a
data set. Similar to how the median denotes the midway point of a data set, the first quartile
marks the quarter or 25% point. Approximately 25% of the data values are less than or equal
to the first quartile. The third quartile is similar, but for the upper 25% of data values. We
will look into these ideas in more detail in what follows.
The median is the most resistant to outliers. It marks the middle of the data in the sense that
half of the data is less than the median. We could calculate the median of the bottom half of
our data. One half of 50% is 25%. Thus half of half, or one quarter, of the data would be
below this. Since we are dealing with a quarter of the original set, this median of the bottom
half of the data is called the first quartile, and is denoted by Q1. The median of the top half,
which we will denote by Q3, also splits the data set into quarters. However, this number
marks off the top one quarter of the data. Thus three quarters of the data is below our number
Q3. This is why we call Q3 the third quartile.
Quartiles help to give us a fuller picture of our data set as a whole. The first and third
quartiles give us information about the internal structure of our data. The middle half of the
data falls between the first and third quartiles, and is centered about the median. The
difference between the first and third quartiles, called the interquartile range, shows how the
data is arranged about the median. A small interquartile range indicates data that is clumped
about the median. A larger interquartile range shows that the data is more spread out.
From the calculation using Excel, the first quartile is 1.0576 and the third quartile is
2.75. The median is 1.7308.
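The quartiles of grouped data can be estimated by the same interpolation used for the median. The sketch below is an illustrative Python version, not the Excel calculation itself. The class frequencies (8, 13, 7, 2, 5) are an assumption, taken as the differences between the cumulative frequencies plotted in the ogive, and `grouped_quantile` is a hypothetical helper name.

```python
# ASSUMPTION: class frequencies are the differences of the cumulative
# frequencies plotted in the ogive (0, 8, 21, 28, 30, 35).
lower_bounds = [0, 1, 2, 3, 4]
freqs = [8, 13, 7, 2, 5]
width = 1


def grouped_quantile(p, lower_bounds, freqs, width):
    """Estimate the p-th quantile (0 < p < 1) by interpolating in its class."""
    target = p * sum(freqs)
    cum = 0
    for lb, f in zip(lower_bounds, freqs):
        if f > 0 and cum + f >= target:
            return lb + (target - cum) / f * width
        cum += f
    return lower_bounds[-1] + width


q1 = grouped_quantile(0.25, lower_bounds, freqs, width)
q2 = grouped_quantile(0.50, lower_bounds, freqs, width)
q3 = grouped_quantile(0.75, lower_bounds, freqs, width)
iqr = q3 - q1
print(round(q1, 4), round(q2, 4), round(q3, 4), round(iqr, 4))
```

For these frequencies the first quartile falls in the 1-2 years class and the third quartile in the 2-3 years class, so the middle half of the respondents have used their motorcycle for roughly one to three years.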

[Ogive: Cumulative Frequency, cf, against Upper Boundary; plotted values 0, 8, 21, 28, 30
and 35 at upper boundaries 0, 1, 2, 3, 4 and 5]

Figure 6: Ogive of quantitative data

The variance is calculated using Equation 1 below.

$$S^2 = \frac{1}{\sum f - 1}\left[\sum m^2 f - \frac{\left(\sum m f\right)^2}{\sum f}\right] \qquad \text{(Equation 1)}$$

$S^2 = 1.7277$

∴ From Excel, $S^2 = 1.6784$

The standard deviation is calculated using Equation 2 below.

$$S = \sqrt{S^2} \qquad \text{(Equation 2)}$$

$S = 1.3144$

∴ From Excel, $S = 1.2955$

The values obtained from Excel differ from those obtained with the equations. The Excel
figures are reproduced by dividing the sum of squared deviations by Σf = 35 instead of
Σf − 1 = 34, which suggests that a population formula (such as VAR.P and STDEV.P) was
used in Excel, whereas Equation 1 is the sample formula. Rounding of intermediate values
and human error in the manual calculation may also contribute to small differences.
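Both pairs of values can be checked by computing the grouped variance with each divisor. The sketch below is a Python illustration under one assumption: the class frequencies (8, 13, 7, 2, 5) are the differences between the cumulative frequencies plotted in the ogive. It is not the spreadsheet used for the report.

```python
import math

# Class midpoints m and frequencies f.
# ASSUMPTION: frequencies are the differences of the cumulative frequencies
# plotted in the ogive (0, 8, 21, 28, 30, 35).
mids = [0.5, 1.5, 2.5, 3.5, 4.5]
freqs = [8, 13, 7, 2, 5]

n = sum(freqs)                                       # sum of f
sum_mf = sum(m * f for m, f in zip(mids, freqs))      # sum of m*f
sum_m2f = sum(m * m * f for m, f in zip(mids, freqs))  # sum of m^2*f

# Bracketed term of Equation 1: sum of squared deviations from the mean.
ss = sum_m2f - sum_mf ** 2 / n

sample_var = ss / (n - 1)   # Equation 1, divisor (sum f - 1)
population_var = ss / n     # same numerator, divisor (sum f)

print(round(sample_var, 4), round(math.sqrt(sample_var), 4))          # 1.7277 1.3144
print(round(population_var, 4), round(math.sqrt(population_var), 4))  # 1.6784 1.2955
```

The first line matches the values from Equation 1 and Equation 2, and the second line matches the values reported from Excel.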

CONCLUSION

Based on the analysis, it can be concluded that Yamaha has the largest number of users
among the brands surveyed. This may be because students tend to prefer its design and its
fuel economy. Its various features and accessories may also be one of the reasons students
prefer Yamaha motorcycles. Furthermore, Yamaha is regarded as more affordable than the
other brands, which may be another reason for the preference. In addition, we found that
1-2 years is the most frequent period for which students have used their motorcycle. This
may be because longer usage of a motorcycle could affect its efficiency. Moreover, Honda is
the second most preferred brand among the students, which may be due to the strength of
its brand and its association with quality and value.

QUESTION 2:
Table 1.0: ANOVA Table for data on six different algorithms.

H0: µ₁ = µ₂ = µ₃ = µ₄ = µ₅ = µ₆
H1: At least two of the means are not equal
α = 0.05
Test statistic (F): 4.032930009
Critical region:
P-value = 0.00444322 < 0.05, thus reject H0
Decision:
Reject H0 and conclude that the mean software development cost is not the same for all
six groups.
∴ The algorithms differ in their mean cost estimation accuracy.
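The decision rule applied above compares the P-value with the significance level. A minimal Python sketch of that rule follows, using the P-value and α reported for Table 1.0. The function name `anova_decision` is hypothetical and chosen only for illustration.

```python
def anova_decision(p_value, alpha):
    """Apply the P-value rule: reject H0 when the P-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Values reported for the six algorithms.
decision = anova_decision(p_value=0.00444322, alpha=0.05)
print(decision)  # reject H0
```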

Table 2.0: ANOVA Table for the data protopectin content

H0: µ₁ = µ₂ = µ₃ = µ₄

H1: At least two of the means are not equal

α= 0.75

Test statistic (F): 3.738971567

Critical region:

P-value= 0.020716189<0.75, thus reject H0

Decision:

Reject H0 and conclude that the mean protopectin content is not the same for all four
groups.

P-value=0.020716189

Table 3.0: Differences between specific storage times

Comparison (days)   Abs. diff   Critical range   Result
0-7                 231.6222    597.030905       no significant diff
0-14                189.1111    597.030905       no significant diff
0-21                641.9444    597.030905       significant diff
7-14                42.51111    597.030905       no significant diff
7-21                410.3222    597.030905       no significant diff
14-21               452.8333    597.030905       no significant diff

df = 4 , Den df = 32, Qc = 4.27

From the calculation, the degrees of freedom are four and the denominator degrees of
freedom are thirty-two. The value of Qc, obtained from the table in Appendix B, is 4.27.

Using Excel, we find a significant difference only between 0 and 21 days, where the
absolute difference is 641.9444. Since no other pair of storage times differs
significantly, we do not agree with the statement that protopectin content decreases as
storage time increases.
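The pairwise comparison can be sketched in a few lines of Python: each absolute difference between group means is compared with the critical range, and a pair is flagged only when the difference is larger. The values are those in Table 3.0, assuming the storage times are 0, 7, 14 and 21 days. The sketch illustrates the comparison step only, not the full Tukey-Kramer calculation done in Excel.

```python
# Critical range and absolute mean differences from Table 3.0.
# ASSUMPTION: the storage times compared are 0, 7, 14 and 21 days.
critical_range = 597.030905
abs_diffs = {
    (0, 7): 231.6222,
    (0, 14): 189.1111,
    (0, 21): 641.9444,
    (7, 14): 42.51111,
    (7, 21): 410.3222,
    (14, 21): 452.8333,
}


def significant_pairs(abs_diffs, critical_range):
    """Return the pairs whose absolute difference exceeds the critical range."""
    return [pair for pair, diff in abs_diffs.items() if diff > critical_range]


flagged = significant_pairs(abs_diffs, critical_range)
print(flagged)  # [(0, 21)]
```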

QUESTION 3:
a)

[Scatter plot: Blood pressure rise (mm Hg) against Sound pressure level (dB)]

Figure 1.0: Scatter graph of blood pressure rise versus sound pressure level.

Yes. It is reasonable to assume that y and x are linearly related, as the scatter graph
shows a positive linear correlation.

b) Using the CORREL function in Microsoft Excel, with the x values as array 1 and the y
values as array 2, the correlation coefficient is found to be 0.865019. As the value is
close to 1, there is a strong positive correlation.
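For readers without Excel, the Pearson correlation coefficient that CORREL returns can be computed from its definition. The Python sketch below uses a small hypothetical data set, since the assignment's x and y values are not reproduced in this report. The function name `pearson_r` and the demo values are assumptions for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL data for illustration only, not the assignment's data set.
x_demo = [60, 70, 80, 90, 100]
y_demo = [1, 2, 4, 5, 8]
r_demo = pearson_r(x_demo, y_demo)
print(round(r_demo, 4))

# Squaring the coefficient reported in the text gives the coefficient of
# determination.
r_reported = 0.865019
r_squared = r_reported ** 2
print(round(r_squared, 4))  # 0.7483
```

The squared value agrees with the R² = 0.7483 displayed on the fitted chart in part c).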

Figure 2.0: The CORREL function in Microsoft Excel.

c) Using the LINEST function in Microsoft Excel, the slope and intercept values are determined.

Figure 3.0: The slope and intercept values determined from the LINEST function.

First, the y values are entered in the known_y’s argument.

Figure 4.0: All y values entered in the known_y’s argument.

Second, the x values are entered in the known_x’s argument.

Figure 5.0: All x values entered in the known_x’s argument.

Then, TRUE is typed for the const and stats arguments to generate the regression
model.

Figure 6.0: TRUE entered for the const and stats arguments.

Finally, Ctrl + Shift + Enter are pressed simultaneously and the regression model
is generated.

Figure 7.0: The regression model generated from the LINEST function.

The linear regression line has the form y = bx + a, where b is the slope and a is the
y-intercept (the value of y when x is zero). The equation becomes y = 0.1743x – 10.1315.
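The slope and intercept that LINEST reports are the ordinary least-squares estimates, which can also be computed directly. The Python sketch below shows the calculation on a small hypothetical data set, because the assignment's data are not listed in this report. The function name `fit_line` and the demo values are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a of the line y = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a


# HYPOTHETICAL data lying exactly on y = 2x + 1, for illustration only.
x_demo = [0, 1, 2, 3, 4]
y_demo = [1, 3, 5, 7, 9]
b_demo, a_demo = fit_line(x_demo, y_demo)
print(b_demo, a_demo)  # 2.0 1.0
```

Applied to the assignment's x and y columns, the same calculation should give the slope and intercept shown in the LINEST output.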

[Scatter plot with fitted line: Blood pressure rise (mm Hg) against Sound pressure level
(dB); trend line y = 0.1743x - 10.132, R² = 0.7483]

Figure 8.0: Scatter graph with regression line for blood pressure rise versus sound
pressure level.

d) y = 0.1743x - 10.132
Where, x = 85
ŷ = 0.1743 (85) – 10.132
ŷ = 14.8155 – 10.132
ŷ = 4.6835
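The substitution above can be written as a small Python function, which makes it easy to predict the rise at other sound levels. The coefficients are those of the fitted line in part d). The function name `predict_rise` is hypothetical.

```python
# Coefficients of the fitted line used in part d).
SLOPE = 0.1743
INTERCEPT = -10.132


def predict_rise(sound_level_db):
    """Predicted blood pressure rise (mm Hg) at a given sound level (dB)."""
    return SLOPE * sound_level_db + INTERCEPT


y_hat = predict_rise(85)
print(round(y_hat, 4))
```

Predictions are only reliable for sound levels inside the range of the observed data.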

e) The Regression tool under Data Analysis is used to determine a 99% confidence
interval (CI) for the slope, B. First, the confidence level is set at 99%. Then, the
whole column of y is entered under Input Y Range.

Figure 9.0: The whole column of y entered under Input Y Range.

After that, the whole column of x is entered under Input X Range.

Figure 10.0: The whole column of x entered under Input X Range.

The summary output is shown below.

Figure 11.0: Summary output from the Regression tool.

Based on the summary output, the 99% confidence interval (CI) for the slope, B, is
(0.1057, 0.2429).
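A quick consistency check on the interval is sketched below in Python. A confidence interval for a slope is centred on the point estimate, so its midpoint should equal the fitted slope of 0.1743. The check also tests whether the interval excludes zero, which would mean the slope differs significantly from zero at the 1% level. The variable names are chosen for illustration.

```python
# Reported 99% confidence interval and fitted slope.
ci_low, ci_high = 0.1057, 0.2429
slope = 0.1743

midpoint = (ci_low + ci_high) / 2     # should equal the slope estimate
half_width = (ci_high - ci_low) / 2   # margin of error at 99% confidence
excludes_zero = ci_low > 0 or ci_high < 0

print(round(midpoint, 4), round(half_width, 4), excludes_zero)  # 0.1743 0.0686 True
```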

APPENDICES:
Appendix a

Figure 0.0: Data obtained from the survey
Appendix b
Studentized Range q Table (to find Qc)

Equation:

df = k, Den df = N − c, $\text{Critical range} = Q_c \sqrt{\dfrac{MSE}{n}}$

REFERENCES
1. Jalayer Academy, “Tukey Kramer Multiple Comparison Procedure and ANOVA with
Excel,” 9 Sep 2014. [Online].
Available: https://www.youtube.com/watch?v=jtRK8hCDPU0. [Accessed 7 May 2019]

2. Charles Zaiontz, “Real Statistics Using Excel,” 2012-2019. [Online].
Available: http://www.real-statistics.com/statistics-tables/studentized-range-q-table/.
[Accessed 7 May 2019]

3. Corporate Finance Institute, “Negatively Skewed Distribution,” 2015-2019. [Online].
Available: https://corporatefinanceinstitute.com/resources/knowledge/other/negatively-skewed-distribution/.
[Accessed 14 May 2019]

4. My Data Analysis Site, “Calculating the Interquartile Range in Excel,” 19 Feb 2018.
[Online]. Available: https://www.youtube.com/watch?v=JaoA38n0pqI&t=99s.
[Accessed 14 May 2019]
