Project Sta108 (Finalized) (Lasttt) PDF
GROUP : AS1204_M
Table of Contents
4.1 Report Summary
REFERENCES
APPENDIX
CHAPTER 1: INTRODUCTION
Cholera is an illness caused by infection of the intestine with the toxigenic bacterium Vibrio cholerae. The deadly effects of the disease result from a toxin that the bacteria produce in the small intestine. The toxin causes the body to secrete an enormous amount of water, leading to diarrhea and a rapid loss of fluids and salts. In Malaysia, 21535 cases were reported from 1971 until 2000, but only 388 deaths were caused by cholera in that period. This study was undertaken to analyse the relationship between the number of reported cases and the total death caused by cholera in Malaysia.
In this study, the number of reported cases is the independent variable, while the total death caused by cholera in Malaysia is the dependent (response) variable, because the total death depends on the number of reported cases. The data show a positive correlation of 0.7432. This value suggests a moderate relationship between the number of reported cases and the total death caused by cholera in Malaysia from 1971 until 2000: the higher the number of reported cases, the higher the total death.
1.2 Objectives of Study
1) To determine the relationship between the number of reported cases and the total death caused by cholera in Malaysia.
2) To identify the types of graph that are suitable for the data.
3) To find the values of the mean, standard deviation and interquartile range.
4) To determine the correlation and regression of the data.
The data for this study are easy to access because they are already available on the World Health Organisation (WHO) website. Using them saves time and money, since we did not need to collect the data ourselves. Secondary data of this kind are far cheaper to obtain than primary data, and the values can be obtained rapidly. The data are also regarded as stable and reliable, since they were compiled by expert researchers.
The limitation of this study is that no questions can be put to the original collectors to confirm the accuracy of the data, since the data were taken as published on the World Health Organisation (WHO) website. In addition, the data were collected by other researchers for their own purposes, so they may differ slightly from what our objectives require.
CHAPTER 2: METHODOLOGY
2.1.1 Population
The population in this study is the number of reported cases and the total death caused by cholera from 1971 to 2000 in all countries of the world.
2.1.2 Samples
The sample in this study is the number of reported cases and the total death caused by cholera in Malaysia from 1971 to 2000.
2.1.4 Variables
The variables used in this study are the number of reported cases and the total death caused by cholera in Malaysia from 1971 to 2000, with 30 observations for each variable. In statistics, a quantitative variable is either discrete or continuous. A continuous variable takes its values from measurement and can assume any value within an interval. That type does not apply here. Both variables in this study are discrete, because they are numerical counts, and a discrete variable is one whose values are countable.
Because the data were organised into a grouped frequency distribution, the histogram was chosen. As shown in Figure 1, the height of each bar represents the frequency of the class. The histogram uses the class frequency on the y-axis and the class boundaries on the x-axis.
Figure 1
Figure 2
Figure 3 below shows the scatter diagram. A scatter diagram displays the nature of the relationship between two quantitative variables, the dependent variable and the independent variable. It also shows which kind of correlation is present and how close the relationship between the two variables is. The possible patterns are positive correlation, negative correlation, no correlation, curvilinear correlation and perfect positive correlation. A positive correlation is identified when the dependent variable (y-axis) and the independent variable (x-axis) move in the same direction: as the values on the x-axis increase, the values on the y-axis also increase. A negative correlation is a relationship in which the two variables move in opposite directions: when the independent variable (x-axis) increases, the dependent variable (y-axis) decreases.
Figure 3
Based on Figure 3 above, the scatter diagram shows a positive relationship between the two variables: when the independent variable on the x-axis (number of reported cases) increases, the dependent variable on the y-axis (total death) also increases.
2.3 Numerical Technique
2.3.2 Mean
The mean is the average of the data. It is the total of all the observations divided by the number of observations. It can be calculated for both ungrouped and grouped data.
Ungrouped data:
\[ \bar{x} = \frac{\sum x}{n} \]
Grouped data:
\[ \bar{x} = \frac{\sum fx}{n} \]
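As an illustration of the two formulas, the short Python sketch below computes the mean of Total_Death in both ways. The values and frequencies are copied from the Total_Death frequency table in the appendix. The function names are our own, and treating a value-frequency table as "grouped" data is a simplification (with class intervals, x would normally be the class midpoint).

```python
# Total_Death values and their frequencies, copied from the appendix frequency table.
death_freq = {0: 4, 1: 3, 2: 2, 4: 2, 6: 1, 7: 1, 8: 2, 10: 1, 11: 1, 12: 1,
              13: 1, 14: 2, 17: 2, 18: 1, 19: 1, 27: 1, 32: 1, 38: 2, 64: 1}


def mean_ungrouped(values):
    """Mean of raw observations: sum(x) / n."""
    return sum(values) / len(values)


def mean_from_frequencies(freq):
    """Mean from a frequency table: sum(f * x) / n, where n = sum(f)."""
    n = sum(freq.values())
    return sum(f * x for x, f in freq.items()) / n


# Expand the frequency table back into the 30 raw observations.
deaths = [x for x, f in death_freq.items() for _ in range(f)]

mean_a = mean_ungrouped(deaths)
mean_b = mean_from_frequencies(death_freq)
print(round(mean_a, 2), round(mean_b, 2))  # prints 12.93 12.93
```

Both routes give the same value because the frequency table lists every distinct observation, so no information is lost by grouping.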
2.3.3 Median
The median is the middle value of the data once they are arranged in ascending order. It is interpreted as follows: 50% of the observations have a value less than the median, while the other 50% have a value greater than the median.
Ungrouped Data
Grouped Data
Steps to calculate:
iii. Refer to the position value in the cumulative frequency column to find the median class.
iv. Use the formula:
\[ \tilde{x} = L_m + \left[ \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right] \times c \]
Where,
n = sample size ($n = \sum f$)
$f_m$ = frequency of the median class
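The sketch below, in Python, compares the ungrouped median of the number of reported cases with the value given by the grouped-data formula. The 30 raw values are copied from the appendix frequency table. The classes of width 500 starting at 0 are our own choice for illustration and are not the classes used in the report's histogram.

```python
import statistics

# Number_Of_Reported_Cases, copied from the appendix frequency table (already sorted).
cases = [52, 53, 55, 67, 97, 110, 124, 246, 349, 369, 389, 393, 444, 469, 474,
         502, 506, 516, 534, 535, 864, 995, 1168, 1304, 1324, 1486, 1635,
         2071, 2195, 2209]

# Ungrouped median: average of the 15th and 16th ordered values when n = 30.
median_raw = statistics.median(cases)


def grouped_median(lower_boundaries, frequencies, class_width):
    """Median of grouped data: L_m + ((n/2 - F_before) / f_m) * c."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table is empty")
    cumulative = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= n / 2:  # this is the median class
            return lower + ((n / 2 - cumulative) / f) * class_width
        cumulative += f
    raise ValueError("median class not found")


# Hypothetical classes 0-500, 500-1000, ... with the counts of `cases` in each.
lower_boundaries = [0, 500, 1000, 1500, 2000]
frequencies = [15, 7, 4, 1, 3]
median_grouped = grouped_median(lower_boundaries, frequencies, 500)

print(median_raw, median_grouped)  # prints 488.0 500.0
```

The grouped value differs from the ungrouped one because the formula assumes that observations are spread evenly within the median class.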
2.3.4 Mode
The mode is the value that occurs most frequently in the data. It can be found for both ungrouped and grouped data. For ungrouped data:
ii. Find the mode by identifying the value that occurs most frequently in the set of data.
iv. For quantitative data, the mode can be determined from the histogram, where the modal class is the class interval with the highest frequency.
For grouped data, the mode is calculated with the formula:
\[ \hat{x} = L_{mo} + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] \times c \]
where,
$\Delta_1$ = modal class frequency − frequency of the class before the modal class
$\Delta_2$ = modal class frequency − frequency of the class after the modal class
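A minimal Python sketch of both approaches follows. The Total_Death frequencies come from the appendix. The grouped example reuses hypothetical classes of width 500 for the number of reported cases, and treats a missing neighbouring class as having frequency 0, which is our assumption.

```python
import statistics

# Total_Death values and frequencies, copied from the appendix frequency table.
death_freq = {0: 4, 1: 3, 2: 2, 4: 2, 6: 1, 7: 1, 8: 2, 10: 1, 11: 1, 12: 1,
              13: 1, 14: 2, 17: 2, 18: 1, 19: 1, 27: 1, 32: 1, 38: 2, 64: 1}
deaths = [x for x, f in death_freq.items() for _ in range(f)]

# Ungrouped mode: the most frequent value(s); multimode returns all ties.
modes_raw = statistics.multimode(deaths)


def grouped_mode(lower_boundaries, frequencies, class_width):
    """Mode of grouped data: L_mo + (d1 / (d1 + d2)) * c."""
    m = max(range(len(frequencies)), key=lambda i: frequencies[i])  # modal class
    before = frequencies[m - 1] if m > 0 else 0
    after = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    d1 = frequencies[m] - before
    d2 = frequencies[m] - after
    return lower_boundaries[m] + d1 / (d1 + d2) * class_width


# Hypothetical classes 0-500, 500-1000, ... for Number_Of_Reported_Cases.
mode_grouped = grouped_mode([0, 500, 1000, 1500, 2000], [15, 7, 4, 1, 3], 500)

print(modes_raw, round(mode_grouped, 2))
```

For Total_Death the only mode is 0, which agrees with the SPSS statistics table in the appendix.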
The data distribution is skewed to the left (negatively skewed) if mode > median > mean (or simply mean < median or mean < mode).
The data distribution is skewed to the right (positively skewed) if mode < median < mean (or simply mean > median or mean > mode).
Measures of location include the quartiles, which are calculated separately for ungrouped and grouped data. Quartiles describe the position of a value within a large set of numerical data. They are an extension of the median and are the most commonly used measures of non-central location. The quartiles divide the area under the frequency curve into four equal parts.
Ungrouped Data
There are three quartiles:
First Quartile / Lower Quartile ($Q_1$) - 25% of the data are less than the first quartile value and 75% of the data are more than the first quartile value. Its position is
\[ Q_1 = \frac{n+1}{4}\text{th value} \]
Second Quartile / Median ($Q_2$) - 50% of the data are less than the second quartile value and 50% of the data are more than the second quartile value. Its position is
\[ Q_2 = \frac{2(n+1)}{4}\text{th value} \]
Third Quartile / Upper Quartile ($Q_3$) - 75% of the data are less than the third quartile value and 25% of the data are more than the third quartile value. Its position is
\[ Q_3 = \frac{3(n+1)}{4}\text{th value} \]
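The position formulas can be sketched in Python as follows, using the 30 values of the number of reported cases from the appendix. When the position is not a whole number, this sketch interpolates linearly between the two neighbouring ordered values. That interpolation rule is our assumption, since the formulas above give only the position.

```python
# Number_Of_Reported_Cases, copied from the appendix frequency table.
cases = [52, 53, 55, 67, 97, 110, 124, 246, 349, 369, 389, 393, 444, 469, 474,
         502, 506, 516, 534, 535, 864, 995, 1168, 1304, 1324, 1486, 1635,
         2071, 2195, 2209]


def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at position k(n + 1)/4 in the ordered data."""
    data = sorted(values)
    n = len(data)
    position = k * (n + 1) / 4      # 1-based position, may be fractional
    whole = int(position)           # ordered value just below the position
    fraction = position - whole
    if whole < 1:
        return data[0]
    if whole >= n:
        return data[-1]
    # Interpolate between the whole-th and (whole + 1)-th ordered values.
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])


q1 = quartile(cases, 1)
q2 = quartile(cases, 2)
q3 = quartile(cases, 3)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # prints 215.5 488.0 1202.0 986.5
```

With n = 30 the positions are 7.75, 15.5 and 23.25, and the results match the percentiles 215.50, 488.00 and 1202.00 listed in the appendix.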
Grouped Data
For grouped data, the position of the quartiles is measured by the first and third quartiles, $Q_1$ and $Q_3$. They can be calculated from the frequency distribution table or read from the ogive.
Step 1: Obtain the cumulative frequencies and the positions of the quartiles.
Step 2: Locate the first and third quartiles using the positions $\frac{n}{4}$ and $\frac{3n}{4}$, then refer to the cumulative frequency column to identify the classes in which they lie. The values of $Q_1$ and $Q_3$ are determined within these classes.
Step 3: Find the first and third quartiles as follows:
\[ Q_1 = L_{Q_1} + \left[ \frac{\frac{n}{4} - f_{Q_1 - 1}}{f_{Q_1}} \right] \times C_{Q_1} \]
where
n = number of observations
$L_{Q_1}$ = lower boundary of the first quartile class
$f_{Q_1 - 1}$ = cumulative frequency before the first quartile class
$f_{Q_1}$ = frequency of the first quartile class
$C_{Q_1}$ = first quartile class size
\[ Q_3 = L_{Q_3} + \left[ \frac{\frac{3n}{4} - f_{Q_3 - 1}}{f_{Q_3}} \right] \times C_{Q_3} \]
where
n = number of observations
$L_{Q_3}$ = lower boundary of the third quartile class
$f_{Q_3 - 1}$ = cumulative frequency before the third quartile class
$f_{Q_3}$ = frequency of the third quartile class
$C_{Q_3}$ = third quartile class size
understand the spread or variability of a set of data about the mean. It gives additional information
to judge the reliability of the measure of central tendency and helps in comparing dispersion that
is present in various samples. The measures of dispersion discussed in this topic are the range, variance and standard deviation.
2.5.1 Range
In statistics, the simplest measure of dispersion is the range, which is the difference between the largest and the smallest values of the data. The range of the data distribution can therefore be obtained from these two values alone.
\[ \sigma^2 = \frac{1}{N} \sum (X - \mu)^2 \]
Where,
$\sigma^2$ = population variance
X = observation
N = total number of observations in the population
$\sum$ = sum of all values
$\mu$ = population mean
Grouped data
\[ \sigma^2 = \frac{1}{N} \sum fx^2 - \left( \frac{\sum fx}{N} \right)^2 \]
$\sigma^2$ = population variance
X = observation
N = total number of observations in the population
$\sum$ = sum of all values
$\mu$ = population mean
Ungrouped data
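\[ \sigma = \sqrt{\frac{1}{N} \sum (X - \mu)^2} \]
$\sigma$ = population standard deviation
X = observation
$\mu$ = population mean
Grouped data
\[ \sigma = \sqrt{\frac{1}{N} \sum fx^2 - \left( \frac{\sum fx}{N} \right)^2} \]
X = observation
$\mu$ = population mean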
Ungrouped data
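\[ S^2 = \frac{1}{n-1} \left( \sum x^2 - \frac{(\sum x)^2}{n} \right) \]
X = observation or value
Grouped data
\[ S^2 = \frac{1}{n-1} \left( \sum fx^2 - \frac{(\sum fx)^2}{n} \right) \]
The sample standard deviation S is the square root of the sample variance.
Ungrouped data
\[ S = \sqrt{\frac{1}{n-1} \left( \sum x^2 - \frac{(\sum x)^2}{n} \right)} \]
Grouped data
\[ S = \sqrt{\frac{1}{n-1} \left( \sum fx^2 - \frac{(\sum fx)^2}{n} \right)} \]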
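To show how the range and the ungrouped sample formulas are applied, the Python sketch below evaluates them for the number of reported cases, using the 30 values from the appendix frequency table. The function name is our own.

```python
import math

# Number_Of_Reported_Cases, copied from the appendix frequency table.
cases = [52, 53, 55, 67, 97, 110, 124, 246, 349, 369, 389, 393, 444, 469, 474,
         502, 506, 516, 534, 535, 864, 995, 1168, 1304, 1324, 1486, 1635,
         2071, 2195, 2209]


def sample_variance(values):
    """Computational formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data_range = max(cases) - min(cases)       # largest value minus smallest value
variance = sample_variance(cases)
std_dev = math.sqrt(variance)              # standard deviation = sqrt(variance)

print(data_range, round(variance, 2), round(std_dev, 3))
```

The results can be compared with the range (2157), variance (435357.730) and standard deviation (659.816) in the SPSS output in the appendix.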
If the frequency curve has a longer tail to the left, the distribution is known as negatively skewed.
Positively skewed distribution:
If the frequency curve has a longer tail to the right, the distribution is known as positively skewed.
The box-and-whisker plot is a useful graphical method that summarises the data using the minimum, maximum, first quartile, third quartile and median. It shows the shape of the data distribution and reveals whether there are any outliers in the data. The figure below shows box-and-whisker plots for various types of distribution.
Figure 4
Based on the figure above, the first plot shows a symmetrical (normal) distribution, where the right and left whiskers are the same length. The second plot shows a positively skewed distribution (skewed to the right), where the right whisker is longer than the left whisker. The last plot shows a negatively skewed distribution (skewed to the left), where the left whisker is longer than the right whisker.
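One common way to flag outliers from the box plot quantities is the 1.5 × IQR rule. The Python sketch below applies it to Total_Death, using the values from the appendix. The rule itself is an assumption on our part, since the report does not state which rule SPSS applied to the plots, and `statistics.quantiles` is used here with its default (n + 1)-based method.

```python
import statistics

# Total_Death values and frequencies, copied from the appendix frequency table.
death_freq = {0: 4, 1: 3, 2: 2, 4: 2, 6: 1, 7: 1, 8: 2, 10: 1, 11: 1, 12: 1,
              13: 1, 14: 2, 17: 2, 18: 1, 19: 1, 27: 1, 32: 1, 38: 2, 64: 1}
deaths = [x for x, f in death_freq.items() for _ in range(f)]

# Five-number summary used to draw the box plot.
q1, median, q3 = statistics.quantiles(deaths, n=4)
minimum, maximum = min(deaths), max(deaths)

# 1.5 x IQR fences (assumed rule): points outside them are flagged as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in deaths if x < lower_fence or x > upper_fence]

print(minimum, q1, median, q3, maximum)   # prints 0 1.75 9.0 17.25 64
print(lower_fence, upper_fence, outliers)  # prints -21.5 40.5 [64]
```

Under this rule the single year with 64 deaths lies beyond the upper fence, which is consistent with the long right whisker of a positively skewed distribution.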
i. If skewness = 0 (symmetrical)
ii. If skewness > 0 (skewed to the right)
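As a rough illustration of how a skewness value can be obtained and read against the rules above, the Python sketch below computes the adjusted Fisher-Pearson sample skewness for Total_Death. This report does not state which skewness formula was used, so the formula here is our assumption about what the statistics package reports.

```python
import statistics

# Total_Death values and frequencies, copied from the appendix frequency table.
death_freq = {0: 4, 1: 3, 2: 2, 4: 2, 6: 1, 7: 1, 8: 2, 10: 1, 11: 1, 12: 1,
              13: 1, 14: 2, 17: 2, 18: 1, 19: 1, 27: 1, 32: 1, 38: 2, 64: 1}
deaths = [x for x, f in death_freq.items() for _ in range(f)]


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)            # sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


skew_deaths = sample_skewness(deaths)

# Apply the rules: 0 is symmetrical, above 0 is skewed to the right.
if skew_deaths > 0:
    shape = "skewed to the right"
elif skew_deaths < 0:
    shape = "skewed to the left"
else:
    shape = "symmetrical"
print(round(skew_deaths, 3), shape)
```

The value can be compared with the skewness of 1.863 listed for Total_Death in the appendix.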
2.9 CORRELATION
Correlation analysis is used to analyse the relationship between two variables by measuring how closely the two data series are related. In particular, the correlation coefficient measures the direction and the extent of the linear association between two variables. There are several types of correlation coefficient, including the Pearson product-moment correlation coefficient, which is normally denoted by r. Pearson's correlation coefficient tells us two things about the relationship between two quantitative variables: the sign (- or +) of r identifies the direction of the relationship, and the magnitude of r describes its strength. The value of r lies between -1.0 and 1.0.
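\[ r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left[ \sum x_i^2 - \frac{(\sum x_i)^2}{n} \right] \left[ \sum y_i^2 - \frac{(\sum y_i)^2}{n} \right]}} \]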
r = Correlation coefficient
n = number of observation
x = independent variable
y = dependent variable
The value of r is always -1 ≤ r ≤ 1. A value of r greater than 0 indicates a positive linear association
between the two variables.
A value of r less than 0 indicates a negative linear association between the two
variables.
A value of r equal to 0 indicates no linear relation between the two variables.
|𝑟| =0 = No Correlation
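The computational formula for r can be sketched in Python as below. Only the 1996 to 2000 rows of Table 1 are used, so the value printed is for that five-year subset only and is not expected to equal the r = 0.743 obtained from all 30 observations. The function name is our own.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient using the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Rows for 1996-2000 from Table 1 (a five-year subset, for illustration only).
cases_96_00 = [1486, 389, 1304, 535, 124]
deaths_96_00 = [0, 4, 19, 0, 1]

r_subset = pearson_r(cases_96_00, deaths_96_00)
print(round(r_subset, 3))
```

The sign of the result gives the direction of the linear association and its magnitude gives the strength, as described above.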
2.9.2 Regression
The basic regression model consists of one independent variable and one dependent variable. The steps for studying the relationship between these two variables are:
1. Collect the data and construct a scatter plot. The purpose of the scatter plot, as indicated previously, is to determine the nature of the relationship. The possibilities include a positive linear relationship, a negative linear relationship, a curvilinear relationship, or no discernible relationship.
2. Compute the value of the correlation coefficient and test the significance of the relationship.
3. If the value of the correlation coefficient is significant, determine the equation of the regression line, which is the line of best fit for the data. (Note: Determining the regression line when r is not significant and then making predictions using the regression line are meaningless.) The purpose of the regression line is to enable the researcher to see the trend and to make predictions on the basis of the data.
The simple linear model can be stated as follows:
\[ y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
Where,
$X_i$ = a known constant, the value of the independent variable in the ith trial
\[ b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]
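The two estimators can be sketched in Python as follows. The small data set in the example is made up (points lying exactly on y = 1 + 2x) purely to check that the function recovers a known line. It is not data from this study.

```python
def least_squares(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    # Slope: (sum(xy) - sum(x)sum(y)/n) / (sum(x^2) - (sum(x))^2/n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: y-bar - b1 * x-bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Hypothetical check data lying exactly on the line y = 1 + 2x.
check_x = [0, 1, 2, 3]
check_y = [1, 3, 5, 7]

b0, b1 = least_squares(check_x, check_y)
print(b0, b1)  # prints 1.0 2.0
```

Applied to the 30 paired observations of this study, the same function would take the number of reported cases as x and total death as y.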
variability in Y can be explained by the fact that they are related to X. For the simple linear regression line of y on x, the coefficient of determination is the square of the correlation coefficient, r. Because of this, we can state that:
Coefficient of Determination,
\[ R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} \]
CHAPTER 3: RESULTS AND INTERPRETATION
Table 1: Number of Reported Cases and Total Death Caused by Cholera in Malaysia From Year 1971 To 2000
Year    Number of Reported Cases    Total Death
1996    1486                        0
1997    389                         4
1998    1304                        19
1999    535                         0
2000    124                         1
3.2 DESCRIPTIVE STATISTICS ANALYSIS
3.2.1 Histogram
Figure 5
The graph in Figure 5 shows a positively skewed data set. It represents the number of reported cases of cholera in Malaysia over the 30 years of observation from 1971 to 2000. Based on the histogram, the highest numbers of reported cases are about 2000 and above, and the lowest are about 50. The distribution is skewed to the right. The mean and standard deviation are 717.83 and 659.816 respectively.
3.2.2 Histogram
Figure 6
The graph in Figure 6 shows a positively skewed data set. It represents the total death caused by cholera in Malaysia from 1971 to 2000. Based on the histogram, the highest total death reported is about 60 and above, and the lowest is 0. The distribution is skewed to the right. The mean and standard deviation are 12.93 and 14.579 respectively.
3.2.3 Box Plot
Figure 7
Based on the box plot in Figure 7, the median number of reported cases of cholera from 1971 to 2000 is 488.00. The interquartile range is about 987 reported cases, which means that the middle 50% of the yearly observations in Malaysia lie between 215.50 and 1202.00 reported cases.
3.2.4 Box Plot
Figure 8
Based on the box plot in Figure 8, the median total death caused by cholera from 1971 to 2000 is 9.00. The interquartile range is about 16 deaths, which means that the middle 50% of the yearly observations in Malaysia lie between 1.75 and 17.25 deaths.
3.2.5 Descriptive
Figure 9
From the table above, the minimum and maximum values for the number of reported cases are 52 and 2209 respectively, and the mean and standard deviation are 717.83 and 659.816. For total death caused by cholera in Malaysia, the minimum value is 0 and the maximum value is 64, while the mean and standard deviation are 12.93 and 14.579.
3.3 CORRELATION AND REGRESSION
Figure 10
This scatter plot suggests a positive correlation between the number of reported cases and the total death caused by cholera in Malaysia from 1971 to 2000.
3.3.2 Correlation
Figure 11
The value of r = 0.743 suggests a moderate positive correlation between the number of reported cases and the total death caused by cholera in Malaysia from 1971 to 2000. That is, the higher the number of reported cases, the higher the total death due to this disease.
3.3.3 Regression
Figure 12
Figure 13
The coefficient of determination, R² = 0.552, means that 55.2% of the variability in total death can be explained by the number of reported cases. The remaining 44.8% of the variability in total death is unexplained.
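The value of R² can be checked against the ANOVA table in the appendix, as in the short Python sketch below. The sums of squares are copied from that table, and the variable names are our own.

```python
import math

# Sums of squares copied from the ANOVA table in the appendix.
ss_regression = 3404.534   # explained variation
ss_residual = 2759.333     # unexplained variation
ss_total = ss_regression + ss_residual

# R^2 = explained variation / total variation
r_squared = ss_regression / ss_total
unexplained = 1 - r_squared

# For simple linear regression r is the square root of R^2,
# taken as positive here because the fitted slope is positive.
r = math.sqrt(r_squared)

print(round(r_squared, 3), round(unexplained, 3), round(r, 3))  # prints 0.552 0.448 0.743
```

The explained and unexplained shares add up to 100% of the total variation in total death.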
Figure 14
Figure 15
The value of r = 0.743 suggests a moderate positive correlation between the number of reported cases and the total death caused by cholera in Malaysia from 1971 to 2000. That is, the higher the number of reported cases, the higher the total death due to this disease. The regression equation is ŷ = 1.146 + 0.016x. The slope of 0.016 means that for each additional reported case, the predicted total death increases by 0.016.
3.3.4 Fitting A Straight Line
Figure 16
If the number of reported cases increases by one, the predicted total death will increase by 0.016.
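The fitted line can be used for prediction as in the Python sketch below. The coefficients are the rounded values from the regression output, so predictions are approximate, and the input of 1000 reported cases is a hypothetical value chosen inside the observed range of 52 to 2209.

```python
# Rounded coefficients from the regression output: y-hat = 1.146 + 0.016 x
B0 = 1.146
B1 = 0.016


def predict_total_death(reported_cases):
    """Predicted total death for a given number of reported cases."""
    return B0 + B1 * reported_cases


# Hypothetical input within the observed range of reported cases.
prediction = predict_total_death(1000)

# One additional reported case changes the prediction by the slope B1.
step = predict_total_death(1001) - predict_total_death(1000)

print(round(prediction, 3), round(step, 3))  # prints 17.146 0.016
```

Predictions for values far outside the observed range should be treated with caution, because the line was fitted only to the 30 yearly observations.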
CHAPTER 4: CONCLUSION
From this study, it can be concluded that the relationship between the number of reported cases and the total death caused by cholera in Malaysia shows a positive correlation. The graph suitable for these data is the histogram. For total death, the mean is 12.93, the standard deviation is 14.579 and the interquartile range is 16. The correlation is 0.743, which suggests a moderate relationship between the number of reported cases and the total death caused by cholera in Malaysia from 1971 to 2000. The regression equation is ŷ = 1.146 + 0.016x. The slope of 0.016 means that for each additional reported case, the predicted total death increases by 0.016.
REFERENCES
2. Number of reported cases of cholera. (n.d.). Retrieved June 11, 2020, from https://www.who.int/data/gho/data/indicators/indicator-details/GHO/number-of-reported-cases-of-cholera
APPENDIX
Frequencies
Statistics
Number_Of_Reported_Cases Total_Death
N Valid 30 30
Missing 0 0
Frequency Table
Number_Of_Reported_Cases
Frequency Percent Valid Percent Cumulative Percent
Valid 52 1 3.3 3.3 3.3
53 1 3.3 3.3 6.7
55 1 3.3 3.3 10.0
67 1 3.3 3.3 13.3
97 1 3.3 3.3 16.7
110 1 3.3 3.3 20.0
124 1 3.3 3.3 23.3
246 1 3.3 3.3 26.7
349 1 3.3 3.3 30.0
369 1 3.3 3.3 33.3
389 1 3.3 3.3 36.7
393 1 3.3 3.3 40.0
444 1 3.3 3.3 43.3
469 1 3.3 3.3 46.7
474 1 3.3 3.3 50.0
502 1 3.3 3.3 53.3
506 1 3.3 3.3 56.7
516 1 3.3 3.3 60.0
534 1 3.3 3.3 63.3
535 1 3.3 3.3 66.7
864 1 3.3 3.3 70.0
995 1 3.3 3.3 73.3
1168 1 3.3 3.3 76.7
1304 1 3.3 3.3 80.0
1324 1 3.3 3.3 83.3
1486 1 3.3 3.3 86.7
1635 1 3.3 3.3 90.0
2071 1 3.3 3.3 93.3
2195 1 3.3 3.3 96.7
2209 1 3.3 3.3 100.0
Total 30 100.0 100.0
Total_Death
Frequency Percent Valid Percent Cumulative Percent
Valid 0 4 13.3 13.3 13.3
1 3 10.0 10.0 23.3
2 2 6.7 6.7 30.0
4 2 6.7 6.7 36.7
6 1 3.3 3.3 40.0
7 1 3.3 3.3 43.3
8 2 6.7 6.7 50.0
10 1 3.3 3.3 53.3
11 1 3.3 3.3 56.7
12 1 3.3 3.3 60.0
13 1 3.3 3.3 63.3
14 2 6.7 6.7 70.0
17 2 6.7 6.7 76.7
18 1 3.3 3.3 80.0
19 1 3.3 3.3 83.3
27 1 3.3 3.3 86.7
32 1 3.3 3.3 90.0
38 2 6.7 6.7 96.7
64 1 3.3 3.3 100.0
Total 30 100.0 100.0
FREQUENCIES VARIABLES=Number_Of_Reported_Cases Total_Death
/FORMAT=NOTABLE
/NTILES=4
/STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM MEAN MEDIAN MODE SKEWNESS
SESKEW
/HISTOGRAM NORMAL
/ORDER=ANALYSIS.
Frequencies
Statistics
Number_Of_Reported_Cases Total_Death
N Valid 30 30
Missing 0 0
Mean 717.83 12.93
Median 488.00 9.00
Mode 52a 0
Std. Deviation 659.816 14.579
Variance 435357.730 212.547
Skewness 1.105 1.863
Std. Error of Skewness .427 .427
Range 2157 64
Minimum 52 0
Maximum 2209 64
Percentiles 25 215.50 1.75
50 488.00 9.00
75 1202.00 17.25
a. Multiple modes exist. The smallest value is shown
Histogram
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Number_Of_Reported_Cases
MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Number_Of_Reported_Cases=col(source(s),
name("Number_Of_Reported_Cases"))
DATA: id=col(source(s), name("$CASENUM"), unit.category())
COORD: rect(dim(1), transpose())
GUIDE: axis(dim(1), label("Number_Of_Reported_Cases"))
GUIDE: text.title(label("1-D Boxplot of Number_Of_Reported_Cases"))
ELEMENT: schema(position(bin.quantile.letter(Number_Of_Reported_Cases)),
label(id))
END GPL.
GGraph
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Total_Death MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Total_Death=col(source(s), name("Total_Death"))
DATA: id=col(source(s), name("$CASENUM"), unit.category())
COORD: rect(dim(1), transpose())
GUIDE: axis(dim(1), label("Total_Death"))
GUIDE: text.title(label("1-D Boxplot of Total_Death"))
ELEMENT: schema(position(bin.quantile.letter(Total_Death)), label(id))
END GPL.
GGraph
Descriptives
Descriptive Statistics
N Minimum Maximum Mean Std. Deviation
Number_Of_Reported_Cases 30 52 2209 717.83 659.816
Total_Death 30 0 64 12.93 14.579
Valid N (listwise) 30
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Number_Of_Reported_Cases
Total_Death MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE
/FITLINE TOTAL=NO.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Number_Of_Reported_Cases=col(source(s),
name("Number_Of_Reported_Cases"))
DATA: Total_Death=col(source(s), name("Total_Death"))
GUIDE: axis(dim(1), label("Number_Of_Reported_Cases"))
GUIDE: axis(dim(2), label("Total_Death"))
GUIDE: text.title(label("Simple Scatter of Total_Death by
Number_Of_Reported_Cases"))
ELEMENT: point(position(Number_Of_Reported_Cases*Total_Death))
END GPL.
GGraph
CORRELATIONS
/VARIABLES=Number_Of_Reported_Cases Total_Death
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.
Correlations
Correlations
                                               Number_Of_Reported_Cases  Total_Death
Number_Of_Reported_Cases  Pearson Correlation  1                         .743**
                          Sig. (2-tailed)                                .000
                          N                    30                        30
Total_Death               Pearson Correlation  .743**                    1
                          Sig. (2-tailed)      .000
                          N                    30                        30
**. Correlation is significant at the 0.01 level (2-tailed).
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Total_Death
/METHOD=ENTER Number_Of_Reported_Cases.
Regression
Variables Entered/Removed (a)
Model  Variables Entered             Variables Removed  Method
1      Number_Of_Reported_Cases (b)  .                  Enter
a. Dependent Variable: Total_Death
b. All requested variables entered.
Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .743a  .552      .536               9.927
a. Predictors: (Constant), Number_Of_Reported_Cases
ANOVAa
Model Sum of Squares df Mean Square F Sig.
1 Regression 3404.534 1 3404.534 34.547 .000b
Residual 2759.333 28 98.548
Total 6163.867 29
a. Dependent Variable: Total_Death
b. Predictors: (Constant), Number_Of_Reported_Cases
Coefficients (a)
Model                        Unstandardized B  Std. Error  Standardized Beta  t      Sig.
1  (Constant)                1.146             2.703                          .424   .675
   Number_Of_Reported_Cases  .016              .003        .743               5.878  .000
a. Dependent Variable: Total_Death
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Number_Of_Reported_Cases
Total_Death MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE
/FITLINE TOTAL=YES.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Number_Of_Reported_Cases=col(source(s),
name("Number_Of_Reported_Cases"))
DATA: Total_Death=col(source(s), name("Total_Death"))
GUIDE: axis(dim(1), label("Number_Of_Reported_Cases"))
GUIDE: axis(dim(2), label("Total_Death"))
GUIDE: text.title(label("Simple Scatter with Fit Line of Total_Death by ",
"Number_Of_Reported_Cases"))
ELEMENT: point(position(Number_Of_Reported_Cases*Total_Death))
END GPL.
GGraph
EXAMINE VARIABLES=Number_Of_Reported_Cases Total_Death
/COMPARE GROUPS
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
Explore
Notes
Comments
Filter <none>
Weight <none>
Syntax EXAMINE
VARIABLES=Number_Of_Reported_Cases Total_Death
/COMPARE GROUPS
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
Cases
Number_Of_Reported_Cases 30 100.0% 0 0.0% 30 100.0%
Descriptives
Median 488.00
Variance 435357.730
Minimum 52
Maximum 2209
Range 2157
Median 9.00
Variance 212.547
Minimum 0
Maximum 64
Range 64
Interquartile Range 16
Number_Of_Reported_Cases
15.00 0 . 000001123333444
7.00 0 . 5555589
4.00 1 . 1334
1.00 1. 6
3.00 2 . 012
Total_Death
11.00 0 . 00001112244
4.00 0 . 6788
6.00 1 . 012344
4.00 1 . 7789
.00 2.
1.00 2. 7
1.00 3. 2
2.00 3 . 88
Stem width: 10