
HAWASSA UNIVERSITY

Faculty of Veterinary Medicine and Teaching Hospital

Department of

Research Methods and Applied Statistics (VetM 713) Assignment

by
1. Redwan Anwar (DVM) = R0008/14
2. Iya Halake (BVSc) = R0007/14
3. Demeke Hailu (DVM) = R0003/14

January, 2022
Hawassa, Ethiopia
1) A researcher wants to find out whether there is any relationship between the height of a son and the height of his father. He took a random sample of 8 fathers and their sons and recorded their heights in cm, as given in the table below.
Height of the father (X), cm   159  163  166  175  179  182  185  191
Height of his son (Y), cm      163  162  165  180  174  168  181  187

a. Fit a simple linear regression model representing the dependency of the son's height on his father's height.
Y = β0 + β1x + ε, where
• Y = height of the son (dependent variable)
• β0 = the intercept (constant, i.e. the value of Y when x = 0)
• β1 = the slope of the line (change in the son's height for a unit change in the father's height)
• x = height of the father (independent variable)
• ε = random error
b. Estimate the regression coefficients and interpret your results.
• Ŷ = a + bx. To estimate the regression coefficients (a and b), we first find the two means, x̄ and ȳ.
• x̄ = Ʃx/n = 1400/8 = 175, ȳ = Ʃy/n = 1380/8 = 172.5
• $b = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n} = \frac{242137 - (1400)(1380)/8}{245902 - 1400^2/8} = \frac{637}{902} = 0.706$
• a = ȳ − b·x̄ = 172.5 − (637/902)(175) = 48.914
• Ŷ = 48.914 + 0.706x
• Interpretation: the intercept (a = 48.914) is the predicted height of a son when the father's height is 0 cm. This lies far outside the observed range (159 to 191 cm), so it has no practical meaning on its own. The slope (b = 0.706) indicates that, on average, each 1 cm increase in the father's height is associated with a 0.706 cm increase in the son's height. The relationship is positive and statistically significant at the 5% level: using r = 0.853 from part (c), t = r√(n−2)/√(1−r²) ≈ 4.0 with n − 2 = 6 degrees of freedom, which exceeds the critical value t0.025(6) = 2.447.
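The hand calculation in part (b) can be checked with a short script. This is a minimal sketch using only the eight data pairs from the table; the function name `fit_line` and the variable names are illustrative choices, not part of the assignment.

```python
# Heights in cm, copied from the table in question 1.
father = [159, 163, 166, 175, 179, 182, 185, 191]
son = [163, 162, 165, 180, 174, 168, 181, 187]


def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Corrected sum of cross-products and corrected sum of squares of x.
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    b = sxy / sxx
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = fit_line(father, son)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Run as written, the script should agree with the values worked out by hand (a ≈ 48.914, b ≈ 0.706).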
c. Calculate the correlation coefficient and interpret the result.
$r = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\left(\sum x^2 - (\sum x)^2/n\right)\left(\sum y^2 - (\sum y)^2/n\right)}} = \frac{242137 - (1400)(1380)/8}{\sqrt{(245902 - 1400^2/8)(238668 - 1380^2/8)}} = \frac{637}{\sqrt{(902)(618)}} = 0.853$
• The relationship between the heights of fathers and their sons is direct (positive) and strong (r = 0.853).
d. Obtain the coefficient of determination and interpret the result.
• R Square = 0.728, obtained by squaring the simple linear correlation coefficient (0.853). This implies that 72.8% of the variation in the sons' heights is explained by the variation in their fathers' heights.
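Parts (c) and (d) can be reproduced in a few lines. The sketch below computes r and R Square from the raw data. The t statistic at the end is an addition that the worked solution does not compute; it is included only as one common way to check the significance of r, with the two-sided 5% critical value for 6 degrees of freedom (2.447) taken from standard t tables.

```python
import math

# Heights in cm, copied from the table in question 1.
father = [159, 163, 166, 175, 179, 182, 185, 191]
son = [163, 162, 165, 180, 174, 168, 181, 187]


def pearson_r(x, y):
    """Pearson correlation coefficient from corrected sums of squares."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    syy = sum(yi * yi for yi in y) - sum(y) ** 2 / n
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(father, son)
r_squared = r ** 2

# Assumed significance check (not in the worked solution): t test for r.
n = len(father)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
T_CRIT = 2.447  # t(0.025, 6) from standard tables

print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, t = {t_stat:.2f}")
print("significant at 5%:", abs(t_stat) > T_CRIT)
```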
2) Suppose one of the objectives of a given study is to identify the dependency of blood sugar level on exercise done (distance run) and to create a model that represents it. The following table contains data on aerobic exercise levels (running distance in km) and blood sugar levels for 10 different days.
Distance (km)          2.1  2.3  2.5  2.5  3.2  3.5  3.5  3.8  4.1  4.5
Blood sugar (mg/dL)    136  146  131  125  120  116  116  104   95   85

Summary results: $\sum_{i=1}^{10} x_i = 32$, $\sum_{i=1}^{10} y_i = 1174$, $\sum_{i=1}^{10} x_i y_i = 3624.6$, $\sum_{i=1}^{10} x_i^2 = 108.44$, $\sum_{i=1}^{10} y_i^2 = 140976$

Then, answer the following questions based on data given.


a. Write a simple linear regression model representing the dependency of blood sugar on exercise done (distance run) and explain its components.
Y = β0 + β1x + ε, where
• Y = blood sugar level in mg/dL (dependent variable)
• β0 = the intercept (constant, i.e. the value of Y when x = 0)
• β1 = the slope of the line (change in blood sugar for a unit, i.e. 1 km, change in distance run)
• x = exercise, measured as distance run in km (independent variable)
• ε = random error
b. Calculate the regression coefficients and write the fitted model by substituting the estimated regression parameters.
• Ŷ = a + bx. To estimate the regression coefficients (a and b), we first find the two means, x̄ and ȳ.
• x̄ = Ʃx/n = 32/10 = 3.2, ȳ = Ʃy/n = 1174/10 = 117.4
• $b = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n} = \frac{3624.6 - (32)(1174)/10}{108.44 - 32^2/10} = \frac{-132.2}{6.04} = -21.887$
• a = ȳ − b·x̄ = 117.4 − (−21.887 × 3.2) = 187.44
• Ŷ = a + bx = 187.44 + (−21.887)x, that is,
• Ŷ = 187.44 − 21.887x
c. Interpret the regression coefficients (type, magnitude and significance).
Interpretation: the intercept (a = 187.44) is the predicted blood sugar level, in mg/dL, when no distance is run (x = 0). This is an extrapolation below the observed range of 2.1 to 4.5 km. The slope (b = −21.887) is negative, so the relationship is inverse: on average, each additional 1 km run is associated with a decrease of 21.887 mg/dL in blood sugar level. The relationship is statistically significant at the 5% level: using r = −0.959 from part (e), t = r√(n−2)/√(1−r²) ≈ −9.5 with n − 2 = 8 degrees of freedom, and |t| exceeds the critical value t0.025(8) = 2.306.
d. Suppose a given individual has run a distance of 3 km. What would be his expected blood sugar level?
Ŷ = a + bx = 187.44 − 21.887(3) = 121.779 mg/dL
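The fit in part (b) and the prediction in part (d) can be sketched from the summary sums alone. The variable names and the helper `predict` are illustrative assumptions. Because the script keeps unrounded coefficients, its prediction may differ from the hand calculation in the third decimal place.

```python
# Summary sums given in question 2 (n = 10 observations).
n = 10
sum_x, sum_y = 32, 1174
sum_xy, sum_x2 = 3624.6, 108.44

# Corrected sums, then slope and intercept of the least-squares line.
sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n
slope = sxy / sxx
intercept = sum_y / n - slope * sum_x / n


def predict(distance_km):
    """Predicted blood sugar (mg/dL) for a given running distance in km."""
    return intercept + slope * distance_km


print(f"fitted line: y = {intercept:.2f} + ({slope:.3f}) x")
print(f"predicted blood sugar at 3 km: {predict(3):.2f} mg/dL")
```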
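e. Calculate the simple linear correlation coefficient and interpret the result (type, magnitude and significance).
$r = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\left(\sum x^2 - (\sum x)^2/n\right)\left(\sum y^2 - (\sum y)^2/n\right)}} = \frac{3624.6 - (32)(1174)/10}{\sqrt{(108.44 - 32^2/10)(140976 - 1174^2/10)}} = \frac{-132.2}{\sqrt{(6.04)(3148.4)}} = -0.959$
• The relationship between blood sugar level and exercise is inverse (negative) and strong (r = −0.959). It is statistically significant: r is significant when the p-value is less than 0.05, and here t = r√(n−2)/√(1−r²) ≈ −9.5 with 8 degrees of freedom, so |t| is well above the critical value t0.025(8) = 2.306 and the p-value is below 0.05.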
f. Obtain the coefficient of determination and interpret the result.
• R Square = 0.919, obtained by squaring the simple linear correlation coefficient (−0.959). This implies that 91.9% of the variation in blood sugar level is explained by the variation in exercise (distance run).
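Parts (e) and (f) can be checked from the summary sums in the same way. This is a sketch under the assumption that significance is judged with the usual t test for a correlation coefficient; the critical value 2.306 is the two-sided 5% point of t with 8 degrees of freedom from standard tables, hard-coded here rather than computed.

```python
import math

# Summary sums given in question 2 (n = 10 observations).
n = 10
sum_x, sum_y = 32, 1174
sum_xy, sum_x2, sum_y2 = 3624.6, 108.44, 140976

sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Assumed test: t = r * sqrt(n - 2) / sqrt(1 - r^2), compared with t(0.025, 8).
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
T_CRIT = 2.306
significant = abs(t_stat) > T_CRIT

print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, t = {t_stat:.2f}")
print("significant at 5%:", significant)
```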
3) Three different techniques, namely medication, exercise and special diet, are randomly assigned to individuals to test their effect in lowering blood pressure. After four weeks, the reduction in each person's blood pressure is recorded in the following table.
Medication Exercise Diet
11 6 7
12 8 9
9 3 12
15 1 9
14 2 5
Then:
a) Test at 5% level, whether there is significant difference in mean reduction of blood pressure among the
three techniques and write your conclusion.
        Medication     Exercise     Diet          Total
        11             6            7             24
        12             8            9             29
        9              3            12            24
        15             1            9             25
        14             2            5             21
Sum     61             20           42            123
Mean    61/5 = 12.2    20/5 = 4     42/5 = 8.4    123/15 = 8.2 (total mean)

Step 1: State the hypotheses. H0: mean 1 = mean 2 = mean 3
H1: at least one mean is different
Step 2: State the level of significance, α = 0.05
Step 3: The appropriate test statistic is the F-test
Step 4: Compute the test statistic F = MSB/MSW, with k = 3 groups, n = 5 observations per group and N = 15 observations in total.
• SSB = n × Ʃ(x̄i − x̄)² = 5 × [(12.2 − 8.2)² + (4 − 8.2)² + (8.4 − 8.2)²] = 5 × 33.68 = 168.4
• SSW = ƩƩ(xij − x̄i)² = (11−12.2)² + (12−12.2)² + (9−12.2)² + (15−12.2)² + (14−12.2)² + (6−4)² + (8−4)² + (3−4)² + (1−4)² + (2−4)² + (7−8.4)² + (9−8.4)² + (12−8.4)² + (9−8.4)² + (5−8.4)² = 22.8 + 34 + 27.2 = 84
• SST = SSB + SSW = 168.4 + 84 = 252.4
• MSB = SSB/(k − 1) = 168.4/(3 − 1) = 84.2
• MSW = SSW/(N − k) = 84/(15 − 3) = 7
• F = MSB/MSW = 84.2/7 = 12.029
ANOVA summary table
Source of variation    df    SS       MS      F
Between                2     168.4    84.2    12.029
Within                 12    84       7
Total                  14    252.4
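The sums of squares in the ANOVA table can be recomputed directly from the raw data. The following is a minimal sketch of a balanced one-way ANOVA using only the standard library; the function name `one_way_anova` and the dictionary layout are illustrative assumptions.

```python
# Reduction in blood pressure for each technique, from question 3.
groups = {
    "medication": [11, 12, 9, 15, 14],
    "exercise": [6, 8, 3, 1, 2],
    "diet": [7, 9, 12, 9, 5],
}


def one_way_anova(samples):
    """Return (SSB, SSW, MSB, MSW, F) for a list of samples."""
    all_values = [v for s in samples for v in s]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group size times squared deviation of each mean.
    ssb = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ssw = sum((v - sum(s) / len(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    msb = ssb / df_between
    msw = ssw / df_within
    return ssb, ssw, msb, msw, msb / msw


ssb, ssw, msb, msw, f_stat = one_way_anova(list(groups.values()))
print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}, MSB = {msb:.1f}, MSW = {msw:.1f}, F = {f_stat:.3f}")
```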

Step 5: Find the critical value
Fα(df1, df2) = F0.05(2, 12) = 3.89
Decision: we reject H0 since the calculated F is greater than Fα, i.e. 12.029 > 3.89.
Step 6: Conclusion: at the 5% level of significance, there is a significant difference in the mean reduction of blood pressure among the three techniques, i.e. at least one technique's mean reduction differs from the others.
b) If the difference is statistically significant, carry out individual tests to identify the source of the difference.
First, we find the least significant difference (LSD), where n = 5 is the number of observations per group:
LSD = $t_{\alpha/2,\,N-k}\sqrt{2\,MSW/n} = t_{0.025,\,12}\sqrt{2 \times 7/5}$ = 2.179 × 1.67332 = 3.646
Then we compare the absolute differences between the means with the LSD:
i) Difference 1: mean 1 − mean 2 = 12.2 − 4 = 8.2 (medication vs exercise), 8.2 > 3.646
ii) Difference 2: mean 1 − mean 3 = 12.2 − 8.4 = 3.8 (medication vs diet), 3.8 > 3.646
iii) Difference 3: mean 2 − mean 3 = 4 − 8.4 = −4.4, |−4.4| = 4.4 (exercise vs diet), 4.4 > 3.646
All three pairwise differences are greater than the least significant difference (3.646), so every pair of techniques differs significantly in mean reduction of blood pressure. Medication gives the largest mean reduction (12.2), followed by special diet (8.4) and exercise (4).
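The pairwise LSD comparison can be sketched as a loop over all pairs of group means. The group means, MSW and group size come from the ANOVA in part (a); the critical value 2.179 is the tabulated t(0.025, 12) used in the text and is hard-coded rather than computed. The names `means`, `lsd` and `results` are illustrative.

```python
import math
from itertools import combinations

# Values taken from the one-way ANOVA in part (a).
means = {"medication": 12.2, "exercise": 4.0, "diet": 8.4}
MSW = 7.0          # within-group mean square
N_PER_GROUP = 5    # observations per technique
T_CRIT = 2.179     # tabulated t(0.025, 12), as used in the text

lsd = T_CRIT * math.sqrt(2 * MSW / N_PER_GROUP)

# Map each pair of techniques to (absolute difference, exceeds LSD?).
results = {}
for g1, g2 in combinations(means, 2):
    diff = abs(means[g1] - means[g2])
    results[(g1, g2)] = (diff, diff > lsd)

print(f"LSD = {lsd:.3f}")
for (g1, g2), (diff, is_sig) in results.items():
    print(f"{g1} vs {g2}: |difference| = {diff:.1f}, significant: {is_sig}")
```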
4) Suppose the output of a multiple linear regression analysis using SPSS software is displayed below. The aim was to study the dependency of the consumption level of households on monthly income, family size and schooling cost of households.
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .890a .883 .857 .72502
a. Predictors: (Constant), monthly income, family size, and schooling cost of households

ANOVAa
Model            Sum of Squares    df    Mean Square    F         Sig.
1  Regression    60.601            2     30.30          33.290    .000b
   Residual      4.550             5     .910
   Total         65.156            7
a. Dependent Variable: consumption level of households
b. Predictors: (Constant), monthly income, family size, and schooling cost of households

Estimated Regression Coefficients
Model             B           Std. Error    t         Sig.
(Constant)        -453.604    505.411       -0.897    0.396
Monthly income    0.707       0.141         5.000     0.001
Family size       89.091      105.276       0.846     0.022
Schooling cost    -0.329      0.244         -1.347    0.041

Based on the output displayed in the tables above, answer the following questions.
a. Before interpreting the regression coefficients, test the adequacy of the model using the Model Summary (R-square, Table 1) and the ANOVA table (Table 2).
• The regression model is judged adequate when the coefficient of determination (R-square) is greater than 50% and the p-value of the F-test in the ANOVA table is less than 0.05. Here R-square = 0.883 × 100% = 88.3%, which is greater than 50%, and the p-value is reported as 0.000, which is less than 0.05. Therefore the regression model is adequate.
b. Write the multiple linear regression model representing the problem and explain each component.
Y = β0 + β1x1 + β2x2 + β3x3 + ε, where
• Y = consumption level of households (dependent variable)
• x1, x2, x3 = monthly income, family size and schooling cost, respectively (independent variables)
• β0 = the intercept, i.e. the expected consumption level when monthly income, family size and schooling cost are all zero
• β1 = the change in consumption level for a unit change in monthly income, holding family size and schooling cost constant
• β2 = the change in consumption level for a unit change in family size, holding monthly income and schooling cost constant
• β3 = the change in consumption level for a unit change in schooling cost, holding monthly income and family size constant
• ε = random error
c. Write the fitted model of MLR using (substituting) the regression coefficients displayed.
Ŷ = -453.604 + 0.707x1 + 89.091x2 -0.329x3
d. Interpret the regression coefficients (considering type, magnitude and significance) displayed in the output table.
• β0 = −453.604 is the intercept: the predicted consumption level when monthly income, family size and schooling cost are all zero. A negative consumption level has no practical meaning, and the constant is not statistically significant (p = 0.396 > 0.05).
• β1 = 0.707: monthly income is directly (positively) related to household consumption. Holding family size and schooling cost constant, a one-unit increase in monthly income raises expected consumption by 0.707 units. Monthly income is a significant variable (p = 0.001 < 0.05).
• β2 = 89.091: family size is directly (positively) related to household consumption. Holding the other variables constant, each additional family member raises expected consumption by 89.091 units. It is also significantly associated with consumption (p = 0.022 < 0.05).
• β3 = −0.329: schooling cost is inversely (negatively) related to household consumption. Holding the other variables constant, a one-unit increase in schooling cost lowers expected consumption by 0.329 units, a small change per unit. The variable is significant (p = 0.041 < 0.05).
e. Test the significance of the regression parameters.
• Solution: the significance of each regression parameter is judged from its p-value (the Sig. column). If the p-value is less than 0.05 the parameter is significant, otherwise it is not significant.
• β0 (constant) = not significant because 0.396 is > 0.05.
• β1 = significant because 0.001 is < 0.05.
• β2 = significant because 0.022 is < 0.05.
• β3 = significant because 0.041 is < 0.05.
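The decision rule in part (e) amounts to comparing each reported p-value with α = 0.05. A small sketch, using the Sig. values from the coefficients table; the dictionary names are illustrative.

```python
ALPHA = 0.05

# p-values (Sig. column) copied from the coefficients table in question 4.
p_values = {
    "(Constant)": 0.396,
    "Monthly income": 0.001,
    "Family size": 0.022,
    "Schooling cost": 0.041,
}

# A parameter is flagged as significant when its p-value is below ALPHA.
significant = {name: p < ALPHA for name, p in p_values.items()}

for name, is_sig in significant.items():
    verdict = "significant" if is_sig else "not significant"
    print(f"{name}: p = {p_values[name]:.3f} -> {verdict}")
```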
f. Suppose a given household with a monthly income and schooling cost of 8000 and 1200, respectively, has a family size of 5. What would be the expected monthly expenditure level of the household?
Solution: substitute into the fitted model equation, Ŷ = −453.604 + 0.707x1 + 89.091x2 − 0.329x3.
• Given x1 = 8000, x2 = 5, x3 = 1200,
• then Ŷ = −453.604 + 0.707 × 8000 + 89.091 × 5 − 0.329 × 1200 = −453.604 + 5656 + 445.455 − 394.8 = 5253.051.
• So the household is expected to spend 5253.051.
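The substitution in part (f) can be written as a small function so that other households can be evaluated the same way. The coefficients are those of the fitted model in part (c); the function and key names are illustrative assumptions.

```python
# Coefficients of the fitted model from part (c).
coefficients = {
    "constant": -453.604,
    "monthly_income": 0.707,
    "family_size": 89.091,
    "schooling_cost": -0.329,
}


def predict_consumption(monthly_income, family_size, schooling_cost):
    """Expected consumption level from the fitted multiple regression model."""
    return (
        coefficients["constant"]
        + coefficients["monthly_income"] * monthly_income
        + coefficients["family_size"] * family_size
        + coefficients["schooling_cost"] * schooling_cost
    )


expected = predict_consumption(monthly_income=8000, family_size=5, schooling_cost=1200)
print(f"expected consumption: {expected:.3f}")
```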
