
CENTRE FOR DIPLOMA STUDIES

PROBLEM-BASED LEARNING (PBL) PROJECT

COURSE CODE: DAC21903
COURSE NAME: STATISTICS
GROUP: 9
GROUP MEMBERS:
1. JUSTIN KOH TECK YUAN (AA201160)
2. MUHAMMAD FARIK BIN ABD HALIM (AA201036)
3. MUHAMMAD SYAZRIQ BIN MAZEHAN (AA200177)
4. THIVNASHDEWI A/P MUTHUSAMY (AA201002)
5. AWANG ALI ZAINUL ABIDIN BIN AWANG ALI RAHMAN (AA200641)
SECTION: 2
SEMESTER: SEMESTER 1 2021/2022
LECTURER’S NAME: DR SITI SAMAHANI BINTI SURADI
SUBMISSION DATE: 23 December 2021

TABLE OF CONTENTS

1. Introduction
1.1 FILA Table
1.2 Minutes of Meeting
2. Problem and Solution
3. Discussion and Conclusion
4. References
5. Link of Video

1.0 INTRODUCTION

Regression models describe the relationship between variables by fitting a line to the
observed data. Linear regression models use a straight line, while logistic and nonlinear regression
models use a curved line. Regression allows us to estimate how the dependent variable changes
as the independent variable(s) change.

Simple linear regression is a statistical method for summarising and studying the relationship
between two continuous (quantitative) variables. One variable, denoted x, is regarded as the
predictor, explanatory, or independent variable; the other, denoted y, is regarded as the response,
outcome, or dependent variable. The method is called "simple" because it involves only one
predictor variable. Simple linear regression can be used to find:
1. How strong the relationship between the two variables is (e.g., the relationship between
rainfall and soil erosion).
2. The value of the dependent variable at a certain value of the independent variable (e.g., the
amount of soil erosion at a certain level of rainfall).
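For reference, the model behind simple linear regression can be written in standard textbook notation (added here; the estimates β̂0 and β̂1 are the ones calculated later in this report):

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n \\
\hat{y} &= \hat{\beta}_0 + \hat{\beta}_1 x
\end{aligned}
```

Here εᵢ is a random error term, and β̂0 and β̂1 are the least squares estimates of the intercept β0 and the slope β1.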

There are two methods for finding the simple linear regression model: the graphical method
and the least squares method. In the graphical method, a scatter diagram is plotted and a line is
drawn through the points. A scatter plot is a two-dimensional Cartesian graph of the paired (xᵢ, yᵢ)
values, used to look for a relationship between them. The least squares method, on the other hand,
finds the line of best fit for a set of data points by minimising the sum of the squares of the vertical
offsets (residuals) of the points from the line. The fitted line is then evaluated statistically to
assess the relationship between the two variables.
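To make the least squares idea concrete, here is a minimal Python sketch (not part of the original project; the function names and the toy data are hypothetical). It computes the intercept and slope that minimise the sum of squared residuals, then confirms that nudging the slope away from that value increases the sum.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the line y = b0 + b1*x that minimises the
    sum of squared vertical residuals (ordinary least squares)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centred sum of squares of x and centred cross-products of x and y
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared differences between observed y and the line's prediction."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data (not the snack data)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
worse = sum_squared_residuals(xs, ys, b0, b1 + 0.1)  # a slightly steeper line
print(round(b0, 3), round(b1, 3))  # 2.2 0.6
print(best < worse)                # True
```

Any other line through the same points gives a larger sum of squared residuals, which is exactly what "least squares" means.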

In addition, the coefficient of determination, denoted R², measures how well the fitted model
explains the observed data: it is the proportion of the variation in the dependent variable that is
explained by the independent variable. It therefore serves as a guide to the accuracy of the model.
In simple linear regression, the coefficient of determination equals the square of the correlation
coefficient r between x and y, so it ranges from 0 to 1.
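The statement that R² equals r² in simple linear regression can be checked numerically. The sketch below (hypothetical helper name, toy data) computes R² as 1 − SSE/SST from the fitted line and, separately, as the square of Pearson's correlation coefficient r.

```python
import math


def r2_both_ways(xs, ys):
    """Return (R^2 from the fitted line, r^2 from the correlation coefficient)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares, SST
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Least squares line and its sum of squared errors, SSE
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

    r2_from_fit = 1 - sse / s_yy
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r2_from_fit, r ** 2


# Hypothetical toy data (not the snack data)
r2_fit, r2_corr = r2_both_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2_fit, 4), round(r2_corr, 4))  # 0.6 0.6
```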

1.1 FILA TABLE

PROBLEM FORMAT
Each group is required to collect data directly by recording information from the nutrition labels of a variety of snack
foods. For this project, you have to explore the relationship between fat contents per serving and calories per
serving. You may collect as many data points as you can, but the minimum is 25. Analyse the data and interpret the
result. Note that all students are required to record a short video while conducting the experiment and recording
the dependent variable data, for data validation.

i. Create a scatterplot of the data for fat contents and calories, and describe the relationship.
ii. In the scatterplot you made, what are the explanatory variable and the response variable? Explain briefly
why you would construct this problem using simple linear regression.
iii. Fit a simple linear regression. Write the fitted model using mathematical notation and interpret the slope
and intercept. Sketch the line.
iv. Find and interpret the value of R².
FILA TABLE

FACTS
1. Each group is required to collect data by recording information from the nutrition labels of a
variety of snack foods.
2. All students are required to record a short video while conducting the experiment.
3. All students should record the dependent variable data for data validation.

IDEAS
1. Explore the relationship between fat contents per serving and calories per serving.
2. The minimum amount of data is 25.
3. Analyse the data and interpret the result.

LEARNING ISSUES
1. How to create a scatterplot of the data and describe the relationship?
2. What are the explanatory variable and the response variable?
3. Why should this problem be constructed using simple linear regression?
4. How to write the fitted model using mathematical notation and interpret the slope and intercept?
5. How to find and interpret the value of R²?

RESOURCES NEEDED
1. Internet
2. Videos from YouTube
3. Statistics reference notes

1.2 MINUTES OF MEETING

DATE : 17th DECEMBER 2021
TIME : 10.00 p.m. - 10.30 p.m.
PLACE : Members' respective homes (online, via Google Meet)

ATTENDANCE
1. JUSTIN KOH TECK YUAN (AA201160)
2. AWANG ALI ZAINUL ABIDIN BIN AWANG ALI RAHMAN (AA200641)
3. MUHAMMAD FARIK BIN ABD HALIM (AA201036)
4. MUHAMMAD SYAZRIQ BIN MAZEHAN (AA200177)
5. THIVNASHDEWI A/P MUTHUSAMY (AA201002)

ACTIVITIES
1. Briefing on the project and collection of the required data (25 snacks).
2. Division of tasks equally among members to complete the PBL report:
a. JUSTIN KOH : Problem and solution part, and video editing
b. AWANG ALI ZAINUL : Introduction part
c. MUHAMMAD FARIK : FILA table and minutes of meeting
d. MUHAMMAD SYAZRIQ : Problem and solution part
e. THIVNASHDEWI : Discussion and conclusion part
3. Each group member presented their respective section.

Verified by,

…………………………………
(DR SITI SAMAHANI BINTI SURADI)
DATE:

2.0 PROBLEM AND SOLUTION

Each group is required to collect data directly by recording information from the nutrition labels of a
variety of snack foods. For this project, you have to explore the relationship between fat contents
per serving and calories per serving. You may collect as many data points as you can, but the
minimum is 25. Analyse the data and interpret the result.

No. Snack Name Fat contents, x Calories, y


1 Mentos Gum 0 5
2 Pretzels 1 171
3 Skittles Original 2 160
4 Nestle Honey Stars 3 385
5 Apollo Milk Chocolate Cream Wafer 3 63
6 Twisties 4 76
7 Popo Fish Muruku 5 68
8 M&M'S Chocolate Candies 5 140
9 Mamee Monster Noodle Snack 6 141
10 Koko krunch 6 198
11 DORITOS® Nacho Cheese Flavored Tortilla Chips 8 150
12 Cheez It 8 150
13 Hup Seng Plain Cream Crackers 9 160
14 Pringles Original 9 152
15 Snek Ku Mimi 9 156
16 LAY'S® Classic Potato Chips 10 160
17 CHEETOS® Crunchy Cheese Flavored Snacks 10 160
18 Fritos Classic Ranch Corn Chips 10 160
19 Nestle Kit Kat 12 230
20 Pocky stick 12 250
21 Hershey's Milk Chocolate Bar 13 220
22 Snickers bar 14 280
23 Ferrero Rocher Hazelnut Chocolates 16 230
24 Oreo Cookies 21 471
25 Snek Ku Tam Tam Crab Flavoured Snack 30 533

i. Create a scatterplot of this data between fat contents and calories and describe the
relationship.

[Figure: Scatterplot diagram of calories per serving, y, against fat contents per serving, x]

As the fat content per serving increases, the calories per serving also tend to increase. The plotted
points follow an upward-sloping, roughly straight-line pattern, which indicates a positive linear
relationship, although some points, such as Nestle Honey Stars (3, 385), lie well above the general trend.
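The strength of this linear pattern can be quantified with Pearson's correlation coefficient r, which the original analysis does not report. A minimal sketch, with the data transcribed from the snack table above (the function name is ours):

```python
import math

# Fat contents per serving (x) and calories per serving (y), transcribed from the table
fat = [0, 1, 2, 3, 3, 4, 5, 5, 6, 6, 8, 8, 9, 9, 9, 10, 10, 10, 12, 12, 13, 14, 16, 21, 30]
calories = [5, 171, 160, 385, 63, 76, 68, 140, 141, 198, 150, 150, 160, 152, 156,
            160, 160, 160, 230, 250, 220, 280, 230, 471, 533]


def pearson_r(xs, ys):
    """Pearson correlation coefficient using the computational (sum) formulas."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(fat, calories)
print(round(r, 4))  # 0.7784
```

A value of about 0.78 points to a fairly strong positive linear association; squaring it gives about 0.6059, the same R² obtained in part (iv).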

ii. In the scatterplot you made, what are the explanatory variable and the response variable?
Explain briefly why you would construct this problem using simple linear regression.

Based on our scatterplot, the explanatory variable is the fat content per serving, x, and the
response variable is the calories per serving, y. We constructed this problem using simple linear
regression because there is only one independent variable (fat content per serving, x) and one
dependent variable (calories per serving, y). The two variables appear to be correlated, since the
calories per serving tend to change as the fat content per serving changes. In addition, we want to
measure the strength of the relationship between the two variables.

iii. Fit a simple linear regression. Write the fitted model using mathematical notation and
interpret the slope and intercept. Sketch the line.

No. x y x² y² xy
1. 0 5 0 25 0
2. 1 171 1 29241 171
3. 2 160 4 25600 320
4. 3 385 9 148225 1155
5. 3 63 9 3969 189
6. 4 76 16 5776 304
7. 5 68 25 4624 340
8. 5 140 25 19600 700
9. 6 141 36 19881 846
10. 6 198 36 39204 1188
11. 8 150 64 22500 1200
12. 8 150 64 22500 1200
13. 9 160 81 25600 1440
14. 9 152 81 23104 1368
15. 9 156 81 24336 1404
16. 10 160 100 25600 1600
17. 10 160 100 25600 1600
18. 10 160 100 25600 1600
19. 12 230 144 52900 2760
20. 12 250 144 62500 3000
21. 13 220 169 48400 2860
22. 14 280 196 78400 3920
23. 16 230 256 52900 3680
24. 21 471 441 221841 9891
25. 30 533 900 284089 15990
∑ Total 226 4869 3082 1292015 58726
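The column totals in the table above can be double-checked with a short script. This is a verification sketch, with the data transcribed from the snack table:

```python
# Data transcribed from the snack table: x = fat contents, y = calories (per serving)
fat = [0, 1, 2, 3, 3, 4, 5, 5, 6, 6, 8, 8, 9, 9, 9, 10, 10, 10, 12, 12, 13, 14, 16, 21, 30]
calories = [5, 171, 160, 385, 63, 76, 68, 140, 141, 198, 150, 150, 160, 152, 156,
            160, 160, 160, 230, 250, 220, 280, 230, 471, 533]

n = len(fat)
totals = {
    "x": sum(fat),
    "y": sum(calories),
    "x^2": sum(x * x for x in fat),
    "y^2": sum(y * y for y in calories),
    "xy": sum(x * y for x, y in zip(fat, calories)),
}
print(n, totals)
# 25 {'x': 226, 'y': 4869, 'x^2': 3082, 'y^2': 1292015, 'xy': 58726}
```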

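$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 3082 - \frac{(226)^2}{25} = 1038.96$$

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 58726 - \frac{(226)(4869)}{25} = 14710.24$$

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{14710.24}{1038.96} = 14.15862016$$

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{25}(226) = 9.04$$

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{25}(4869) = 194.76$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 194.76 - 14.15862016(9.04) = 66.766$$

Model:

$$\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0 = 14.159x + 66.766$$

where β̂1 = 14.159 is the slope of the line and β̂0 = 66.766 is the y-intercept.

Slope: for every additional unit of fat content per serving, the predicted calories per serving
increase by about 14.159.
Intercept: a snack with zero fat content per serving is predicted to contain about 66.766 calories
per serving. The only fat-free snack in the sample, Mentos Gum, lists just 5 calories, so the
intercept should be interpreted with caution.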
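The hand calculation above can be reproduced from the column totals alone. A minimal Python sketch (variable names are ours), using exact fractions so that the intermediate values match the worked figures without floating-point noise:

```python
from fractions import Fraction

# Column totals from the table (n = 25 snacks)
n = 25
sum_x, sum_y = 226, 4869
sum_x2, sum_xy = 3082, 58726

# Computational formulas, kept exact with Fraction
s_xx = sum_x2 - Fraction(sum_x ** 2, n)        # S_xx
s_xy = sum_xy - Fraction(sum_x * sum_y, n)     # S_xy
x_bar = Fraction(sum_x, n)
y_bar = Fraction(sum_y, n)

b1 = s_xy / s_xx            # slope, beta-hat-1
b0 = y_bar - b1 * x_bar     # intercept, beta-hat-0

print(float(s_xx), float(s_xy), float(x_bar), float(y_bar))
# 1038.96 14710.24 9.04 194.76
print(round(float(b1), 3), round(float(b0), 3))
# 14.159 66.766
```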

[Figure: Scatterplot diagram of calories per serving, y, against fat contents per serving, x, with the fitted line y = 14.159x + 66.766 and R² = 0.6059]
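Once fitted, the model can be used to estimate calories for a given fat content. The sketch below (the function name is hypothetical) evaluates ŷ = 14.159x + 66.766 and rejects inputs outside the observed fat range of 0 to 30, since extrapolating beyond the data is unreliable.

```python
def predict_calories(fat_content, b0=66.766, b1=14.159, x_min=0, x_max=30):
    """Predicted calories per serving from the fitted line y-hat = b0 + b1*x.

    x_min and x_max default to the smallest and largest fat contents in the
    sample; values outside that range would be extrapolation.
    """
    if not x_min <= fat_content <= x_max:
        raise ValueError("fat content outside the observed range (extrapolation)")
    return b0 + b1 * fat_content


print(round(predict_calories(10), 3))  # 208.356
```

For example, the three snacks with a fat content of 10 in the table (LAY'S® Classic Potato Chips, CHEETOS® Crunchy Cheese Flavored Snacks and Fritos Classic Ranch Corn Chips) each list 160 calories, about 48 below the predicted value, which is consistent with an R² well below 1.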

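iv. Find and interpret the value of R².

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 58726 - \frac{(226)(4869)}{25} = 14710.24$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 3082 - \frac{(226)^2}{25} = 1038.96$$

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 1292015 - \frac{(4869)^2}{25} = 343728.56$$

$$R^2 = \frac{(S_{xy})^2}{S_{xx} S_{yy}} = \frac{(14710.24)^2}{1038.96 \times 343728.56} = 0.6059$$

R² = 0.6059, which indicates that about 60.59% of the variation in calories per serving (y) is
explained by its linear relationship with fat content per serving (x); the remaining 39.41% is due to
factors not captured by the model. In general, 0 ≤ R² ≤ 1, and the closer R² is to 1, the better the
dependent variable can be predicted from the independent variable.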
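The R² computation can likewise be reproduced from the column totals. A minimal sketch, again with exact fractions (variable names are ours):

```python
from fractions import Fraction

# Column totals from the table (n = 25 snacks)
n = 25
sum_x, sum_y = 226, 4869
sum_x2, sum_y2, sum_xy = 3082, 1292015, 58726

s_xx = sum_x2 - Fraction(sum_x ** 2, n)
s_yy = sum_y2 - Fraction(sum_y ** 2, n)
s_xy = sum_xy - Fraction(sum_x * sum_y, n)

r_squared = s_xy ** 2 / (s_xx * s_yy)   # coefficient of determination

print(float(s_yy))                      # 343728.56
print(round(float(r_squared), 4))       # 0.6059
print(f"{float(r_squared):.2%} of the variation in calories is explained by fat content")
```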

3.0 DISCUSSION AND CONCLUSION

Simple linear regression is a regression model that estimates the relationship between one
independent variable and one dependent variable using a straight line. Our group collected
nutrition-label data for 25 snack foods to investigate the relationship between fat content per
serving and calories per serving.

For question (i), we were required to create a scatterplot of fat contents against calories and
describe the relationship. Before plotting, we recorded a table of each item's name, fat content (x)
and calories (y). A correlation exists between two variables when one of them is related to the
other in some way, and a scatterplot (or scatter diagram) is a graph of the paired (x, y) sample data
with a horizontal x-axis and a vertical y-axis. From the scatterplot, we concluded that the calories
per serving tend to increase as the fat content per serving increases. The plotted points follow an
upward-sloping, roughly straight-line pattern, which indicates a positive linear relationship.

For question (ii), we were asked to identify the explanatory and response variables in the
scatterplot and to explain why simple linear regression is appropriate. The explanatory
(independent) variable is the fat content per serving, x, and the response (dependent) variable is
the calories per serving, y. Simple linear regression is appropriate because the problem involves
only one independent variable and one dependent variable.
Because the two variables are related, they are correlated, and we also wanted to measure the
strength of that association.

For question (iii), we had to fit a simple linear regression, write the fitted model in mathematical
notation, interpret the slope and intercept, and sketch the line. First, we drew up a table with the
columns x, y, $x^2$, $y^2$ and $xy$. We then calculated
$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{(\sum_{i=1}^{n} x_i)^2}{n}$, which gave $S_{xx} = 1038.96$.
Next, we calculated $S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{(\sum_{i=1}^{n} x_i)(\sum_{i=1}^{n} y_i)}{n}$,
which gave $S_{xy} = 14710.24$. The slope was found by dividing $S_{xy}$ by $S_{xx}$, giving
$\hat{\beta}_1 = 14.15862016$. The means were $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = 194.76$ and
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 9.04$. Finally, the y-intercept was calculated using
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, giving 66.766, so the fitted model is
$\hat{y} = 14.159x + 66.766$, and this line was sketched on the scatterplot.

For question (iv), we had to find and interpret the value of $R^2$. Using
$S_{xx} = 3082 - \frac{(226)^2}{25} = 1038.96$,
$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{(\sum_{i=1}^{n} y_i)^2}{n} = 1292015 - \frac{(4869)^2}{25} = 343728.56$
and $S_{xy} = 58726 - \frac{(226)(4869)}{25} = 14710.24$, we obtained
$R^2 = \frac{(S_{xy})^2}{S_{xx} S_{yy}} = 0.6059$, or 60.59%. Since $0 \le R^2 \le 1$, this value
measures the extent to which the dependent variable can be predicted from the independent variable:
about 60.59% of the variation in calories per serving is explained by fat content per serving. In a
nutshell, regression analysis is about determining how changes in the independent variable are
associated with changes in the dependent variable.

4.0 REFERENCES

REFERENCES FROM BOOKS

1. Bowerman, B. L., & Murphree, E. (2014). Regression analysis: Unified concepts, practical
applications, computer implementation. Business Expert Press.

2. Weisberg, S. (2013). Applied linear regression. John Wiley & Sons, Incorporated.

REFERENCES FROM WEBSITES

1. Kiernan, D. (2014, January 16). Chapter 7: Correlation and Simple Linear Regression.
Retrieved December 22, 2021, from Geneseo.edu website:
https://milnepublishing.geneseo.edu/natural-resources-biometrics/chapter/chapter-7-correlation-and-simple-linear-regression/

2. Bevans, R. (2020, February 19). An introduction to simple linear regression. Retrieved
December 22, 2021, from Scribbr website: https://www.scribbr.com/statistics/simple-linear-regression/

5.0 LINK OF VIDEO

https://youtu.be/Q8dQZg5ca1I
