
Macroeconomics (Econ 3033)

Department of Statistics
GROUP ASSIGNMENT:
Name ID No

1. Adisu Bantie        NSR/162/12
2. Abera Worku         NSR/101/12
3. Bikes Abebe         NSR/549/12
4. Amarch Wubetie      NSR/479/12
5. Wendie Aynalem      NSR/2426/12
6. Abebech Zewdu       NSR/065/12
7. Dessie Tenagne      NSR/796/12
8. Gedifew Getie       NSR/1091/12
9. Maerg Kiflie        NSR/1573/12
10. Elisa Afewerki     NSR/865/12
11. Leyikun Demisie    NSR/1495/12

Submission date: May 26, 2022

Submitted to Mr. Asfaw
Statistical Computing-I (Stat 3021), Project
Answer the following questions accordingly, analyze the data using SPSS software, and
prepare your report in Microsoft Word for the questions given as an assignment. Your report
must include results and interpretation of results. The deadline for submission of your report is May
25, 2022.
1. Consider the following two data sets, patient history and diagnosis data, registered separately
for a random sample of 4 patients at the emergency and diagnosis rooms, and then describe how to
merge the two data sets into a single combined SPSS data file.

Table 1: Patient history


ptid age sex weight
P1 52 M 67
P2 45 F 45
P3 25 M 58
P4 37 M 37
Table 2: Diagnosis data
ptid diseases Result Pulse rate
P1 Malaria positive 67
P4 Malaria negative 70
P2 Pneumonia positive 58
P3 Lung cancer positive 63
Merged file
• First sort Table 2 by ptid, keeping each patient's row intact, as shown below
ptid diseases Result Pulse rate
P1 Malaria positive 67
P2 Pneumonia positive 58
P3 Lung cancer positive 63
P4 Malaria negative 70

Merging Data Files

You can merge data from two files in two different ways. You can:

• Merge the active dataset with another open dataset or IBM SPSS Statistics data file containing
the same variables but different cases (Add Cases).
• Merge the active dataset with another open dataset or IBM SPSS Statistics data file
containing the same cases but different variables (Add Variables).

To Merge Files

1. From the menus choose:

Data > Merge Files

2. Select Add Variables.

First name the variables and sort the second table by ptid. Then enter each data set in the
SPSS Data Editor and save the two data sets under different file names with the .sav
extension. To merge the two files, follow the menu steps above.
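
The same "add variables" merge can be sketched outside SPSS. The block below is only an illustration in plain Python, not the SPSS procedure itself: the dictionary layout, the field names (disease, result, pulse_rate) and the helper merge_by_key are assumptions made for this sketch, while the values come from Tables 1 and 2.

```python
# Patient history (Table 1) and diagnosis data (Table 2), keyed by ptid.
history = {
    "P1": {"age": 52, "sex": "M", "weight": 67},
    "P2": {"age": 45, "sex": "F", "weight": 45},
    "P3": {"age": 25, "sex": "M", "weight": 58},
    "P4": {"age": 37, "sex": "M", "weight": 37},
}
diagnosis = {
    "P1": {"disease": "Malaria", "result": "positive", "pulse_rate": 67},
    "P4": {"disease": "Malaria", "result": "negative", "pulse_rate": 70},
    "P2": {"disease": "Pneumonia", "result": "positive", "pulse_rate": 58},
    "P3": {"disease": "Lung cancer", "result": "positive", "pulse_rate": 63},
}


def merge_by_key(left, right):
    """Combine the variables of two data sets for the cases they share."""
    merged = {}
    for key in sorted(left.keys() & right.keys()):
        # Each patient keeps the variables from both files in one row.
        merged[key] = {**left[key], **right[key]}
    return merged


merged = merge_by_key(history, diagnosis)
for ptid, row in merged.items():
    print(ptid, row)
```

Because the rows are matched on ptid rather than on position, each pulse rate stays with its own patient even though Table 2 is not in ptid order.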

2. Consider the following data from an investigation by a pharmaceutical company that wishes to know
whether an experimental drug being tested in its laboratories has any effect on systolic blood
pressure. Thirty randomly selected subjects were given the drug, and their systolic blood
pressures (in millimeters) were recorded as displayed below:
172, 140, 123, 130, 115, 148, 108, 129, 137, 161, 123, 152, 133, 128, 142, 176, 134, 143, 154, 134, 125, 110,
111, 143, 119, 162, 140, 132, 120, 160. Add a column to the data, called SBP_S, where you specify
(for each row) whether the systolic blood pressure is "normal" (< 130) or "at Risk" (>= 130), and
summarize it with appropriate statistics.

Recode into Different Variables

• The Recode into Different Variables dialog box allows you to reassign the values of
existing variables or collapse ranges of existing values into new values for a new
variable. For example, you could collapse salaries into a new variable containing
salary-range categories.
• You can recode numeric and string variables.
• You can recode numeric variables into string variables and vice versa.
• If you select multiple variables, they must all be the same type. You cannot recode
numeric and string variables together.

Steps

1. From the menus choose:

Transform > Recode into Different Variables...


2. Select the variables you want to recode. If you select multiple variables, they must be
the same type (numeric or string).
3. Enter an output (new) variable name for each new variable and click Change.
4. Click Old and New Values and specify how to recode values.

After recoding the blood pressure values into SBP_S (below 130 is "normal", 130 or above is "at Risk"), calculate the descriptive statistics by status.

Case Processing Summary
                                                              Cases
                                                  Included      Excluded      Total
                                                  N   Percent   N   Percent   N   Percent
blood pressure * systolic blood pressure          30  100.0%    0   0.0%      30  100.0%
of status

Report
blood pressure
systolic blood pressure
of status   Mean    N   Std. Deviation  Median  Std. Error of Mean  Sum
normal      119.18  11  7.291           120.00  2.198               1311
Risk        147.00  19  13.812          143.00  3.169               2793
Total       136.80  30  17.962          134.00  3.279               4104
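
As a cross-check of the recode and the report above, the same grouping can be sketched in plain Python. This is an illustration rather than the SPSS procedure; the function name classify and the dictionary layout are assumptions made for this sketch, and the data are the 30 readings from the question.

```python
from statistics import mean, median, stdev

sbp = [172, 140, 123, 130, 115, 148, 108, 129, 137, 161,
       123, 152, 133, 128, 142, 176, 134, 143, 154, 134,
       125, 110, 111, 143, 119, 162, 140, 132, 120, 160]


def classify(value, cutoff=130):
    """Recode a reading into the SBP_S categories used in the question."""
    return "normal" if value < cutoff else "at Risk"


# New column SBP_S, one label per row.
sbp_s = [classify(v) for v in sbp]

# Collect the readings of each category, then summarize them.
groups = {}
for value, label in zip(sbp, sbp_s):
    groups.setdefault(label, []).append(value)

summary = {
    label: {
        "n": len(values),
        "sum": sum(values),
        "mean": round(mean(values), 2),
        "median": median(values),
        "sd": round(stdev(values), 3),
    }
    for label, values in groups.items()
}

for label, stats in summary.items():
    print(label, stats)
```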
3. Consider the following data set on the strength of materials manufactured by two branches of
Company-Z: Branch-1: 1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20, 0.15, 0.49, 0.95,
0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38 and Branch-2: 2.37, 2.16, 14.82, 1.73, 41.04, 0.23,
1.32, 2.91, 39.41, 0.11, 27.44, 4.51, 0.51, 4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19.
a. Summarize data on the strength of materials by computing the appropriate summary measures

Descriptive Statistics
                      N   Range  Minimum  Maximum  Sum     Mean    Std. Deviation  Variance
strength of material  40  50.49  .08      50.57    239.27  5.9817  12.10669        146.572
Valid N (listwise)    40
b. Summarize data on the strength of materials by computing the appropriate summary measures
by branch and interpret your results.

Report
strength of material
branch   Mean    N   Std. Deviation  Minimum  Maximum  Sum     Std. Error of Mean  Variance
branch1  3.6070  20  11.16464        .08      50.57    72.14   2.49649             124.649
branch2  8.3565  20  12.81938        .11      41.04    167.13  2.86650             164.337
Total    5.9817  40  12.10669        .08      50.57    239.27  1.91424             146.572
Interpretation
• Branch 1 has 20 observations, with a maximum value of 50.57 and a minimum value of 0.08.
• Branch 2 has 20 observations, with a maximum value of 41.04 and a minimum value of 0.11.
• The standard deviation of branch 2 is 12.81938 and the standard deviation of branch 1 is 11.16464;
based on these values, branch 2 is more variable.
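
The by-branch figures in the report can be reproduced with a short sketch. It is written in plain Python as an illustration only; the helper describe and its dictionary keys are assumptions of this sketch. It also shows how the standard error of the mean is obtained from the standard deviation.

```python
from math import sqrt
from statistics import mean, stdev

branch1 = [1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20,
           0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38]
branch2 = [2.37, 2.16, 14.82, 1.73, 41.04, 0.23, 1.32, 2.91, 39.41, 0.11,
           27.44, 4.51, 0.51, 4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19]


def describe(values):
    """Return the summary measures shown in the Report table."""
    sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": sd,
        "se_mean": sd / sqrt(len(values)),  # standard error of the mean
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
    }


summary = {"branch1": describe(branch1), "branch2": describe(branch2)}
for name, stats in summary.items():
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

The standard error (2.49649 and 2.86650) describes the precision of each branch mean, while the standard deviation (11.16464 and 12.81938) describes the spread of the individual strengths, which is why the comparison of variability uses the standard deviation.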
c. Construct simple box-plot of strength of materials and interpret it

Interpretation
• The box plot suggests a right-skewed distribution. Observations 5, 23, 25, 29, 31 and 36
appear as outliers.
• The distribution deviates from normal because the median is not located at the center of
the box.
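
The outlier case numbers can be checked with the usual 1.5 x IQR fences. The sketch below is an illustration under two stated assumptions: the cases are numbered in the order listed in the question (branch 1 first, then branch 2), and the quartiles come from Python's statistics.quantiles, whose default rule may differ slightly from the hinges SPSS uses for its box plots.

```python
from statistics import quantiles

branch1 = [1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20,
           0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38]
branch2 = [2.37, 2.16, 14.82, 1.73, 41.04, 0.23, 1.32, 2.91, 39.41, 0.11,
           27.44, 4.51, 0.51, 4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19]
strength = branch1 + branch2  # cases 1-20 are branch 1, cases 21-40 are branch 2

# Quartiles and interquartile range of the pooled data.
q1, _, q3 = quantiles(strength, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Case numbers (1-based) whose values fall outside the fences.
outlier_cases = [case for case, value in enumerate(strength, start=1)
                 if value < lower_fence or value > upper_fence]
print(outlier_cases)
```

All flagged cases lie above the upper fence, which is consistent with the right-skewed shape described in the interpretation.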
d. Construct comparative box-plot of strength of materials by branches and interpret it.

Interpretation
• Both branch 1 and branch 2 appear to have right-skewed distributions, and outliers are
visible in the box plot output. The distributions of branch 1 and branch 2 deviate from
normal because the median is not located at the center of the box.
4. A demographer was interested to know the number and percentage of households by marital
status group, which is often classified as single (S), married (M), divorced (D), widowed (W), in
a certain town. The following data were obtained from the survey conducted: D S D D S W S D
S S D D W M M S D D D W M M S S W D M M D D W D D S S W D D S D S M W M W D S
W D W D M M S S D W W S S S W S D M M S S D S D M
a. Summarize the data by computing the appropriate summary statistics.

marital status group
                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid  divorced  24         33.3     33.3           33.3
       married   13         18.1     18.1           51.4
       single    21         29.2     29.2           80.6
       widowed   14         19.4     19.4           100.0
       Total     72         100.0    100.0
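
The percent and cumulative percent columns follow directly from the frequencies. The sketch below starts from the frequencies reported in the table above (not from the raw letters) and is only an illustration; the variable names are assumptions of this sketch.

```python
# Frequencies as reported in the SPSS frequency table.
counts = {"divorced": 24, "married": 13, "single": 21, "widowed": 14}

total = sum(counts.values())
cumulative = 0.0
table = []
for group, frequency in counts.items():
    percent = 100 * frequency / total
    cumulative += percent
    table.append((group, frequency, round(percent, 1), round(cumulative, 1)))

# The modal category is the most appropriate "average" for nominal data.
modal_group = max(counts, key=counts.get)

for row in table:
    print(row)
print("mode:", modal_group)
```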

b. Present the data using the appropriate graph

Interpretation
• Among the marital status groups, 33.33% of households are divorced, which is the most
frequent group, and 18.06% of households are married, which is the least frequent
group.
5. Consider the following data on average weekly milk yields (in gallons) of a herd of 100 cows

a. Compute the mean and variance average weekly milk yields (in gallons)

Descriptive Statistics
                                    N    Mean    Variance
average milk yields (in gallon)     100  16.850  19.078
a herd of 100 cows
Valid N (listwise)                  100

b. Display data by histogram and give your comment in relation to normal distribution
Interpretation of the result

Left Skewed Distribution: Mean < Median < Mode

In a left skewed distribution, the mean is less than the median.

Right Skewed Distribution: Mode < Median < Mean

In a right skewed distribution, the mean is greater than the median.

No Skew: Mean = Median = Mode

In a symmetrical distribution, the mean, median, and mode are all equal.

• The histogram above looks approximately symmetric and resembles a normal curve.
• Therefore, the data appear to follow a normal distribution.
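
The mean versus median rule of thumb above can be sketched as a small helper. The three samples are hypothetical values invented only to exercise the rule (the milk yield data are not reproduced here), and the tolerance of 5% of the range is likewise an assumption of this sketch. The rule is a heuristic and can fail for some distributions.

```python
from statistics import mean, median


def skew_direction(values, tolerance=0.05):
    """Guess the skew direction by comparing the mean with the median."""
    spread = max(values) - min(values)
    difference = mean(values) - median(values)
    # Treat small differences (relative to the range) as symmetric.
    if spread == 0 or abs(difference) <= tolerance * spread:
        return "approximately symmetric"
    return "right skewed" if difference > 0 else "left skewed"


# Hypothetical samples, for illustration only.
right_tail = [1, 2, 2, 3, 3, 4, 20]
left_tail = [1, 17, 18, 18, 19, 19, 20]
balanced = [14, 15, 16, 17, 18, 19, 20]

for sample in (right_tail, left_tail, balanced):
    print(sample, "->", skew_direction(sample))
```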
c. Display data by Box plot and give your comment in relation to normal distribution
Interpretation of the result
• The box plot does not show a normal distribution.
• The box plot appears left skewed, and it shows an outlier, because one value (89)
deviates from the rest of the observations.

6. Refer to the following data, which represent persons in a class and consist of their name, sex,
weight, height, and age

a. Create a data frame with the variables contained in the above data.

b. Construct scatter plot of height versus age and weight versus age;

Answer to b
Interpretation
• There is no linear relationship between the age and height of persons in the class, because the
points are scattered far apart and do not follow a straight line.

Interpretation
• The points are spread out and do not follow a straight line; hence the age and
weight of persons in the class have no linear relationship.

c. Construct scatter plot of height versus age but differentiated (grouped) by sex

Interpretation
• The height of males is generally greater than the height of females in the given data.

d. Construct a bar graph of sex group;

Interpretation
• The bar chart shows that the number of males is greater than the number of females in the class.

e. Construct a box-and-whisker plot of weight across the sex group;

Interpretation
• The box plot for males appears left skewed and the box plot for females appears right
skewed.

f. Construct a histogram to demonstrate the distribution of weight.

Interpretation
• The distribution is negatively (left) skewed because the tail is long to the left, and an
outlier is visible in the histogram.

g. Interpret each of these graphs.


The interpretation of each graph is given below the figure in each question and is summarized here.
b.
• There is no linear relationship between the age and height of persons in the class, because the
points are scattered far apart and do not follow a straight line.
• The points are spread out and do not follow a straight line; hence the age and
weight of persons in the class have no linear relationship.
c.
• The height of males is generally greater than the height of females in the given data.
d.
• The bar chart shows that the number of males is greater than the number of females in the class.
e.
• The box plot for males appears left skewed and the box plot for females appears right
skewed.
f.
• The distribution is negatively skewed because the tail is long to the left, and an outlier
is visible in the histogram.

h. Compute appropriate summary statistics for SEX of students


Frequencies

Statistics
SEX
N Valid 18
Missing 0
Mean .61
Median 1.00
Mode 1
Std. Deviation .502
Variance .252
Skewness -.498
Std. Error of Skewness .536
Kurtosis -1.987
Std. Error of Kurtosis 1.038
Sum 11
Percentiles 25 .00
50 1.00
75 1.00

sex of persons in class
               Frequency  Percent  Valid Percent  Cumulative Percent
Valid  female  7          38.9     38.9           38.9
       male    11         61.1     61.1           100.0
       Total   18         100.0    100.0
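
The numeric statistics reported for SEX can be read as properties of a 0/1 coded variable. The sketch below assumes the coding male = 1 and female = 0, which is inferred from Sum = 11 and Mode = 1 in the output rather than stated in the report. Under that coding, the mean is simply the proportion of males.

```python
from statistics import mean, mode, stdev, variance

# Assumed coding: 1 = male (11 persons), 0 = female (7 persons).
sex = [1] * 11 + [0] * 7

n = len(sex)
proportion_male = mean(sex)       # mean of a 0/1 variable = share of ones
sample_variance = variance(sex)   # n - 1 in the denominator, as in SPSS
sample_sd = stdev(sex)

print("N:", n, "Sum:", sum(sex), "Mode:", mode(sex))
print("Mean:", round(proportion_male, 2))
print("Variance:", round(sample_variance, 3), "Std. Deviation:", round(sample_sd, 3))
print("Percent male:", round(100 * proportion_male, 1))
```

For a categorical variable such as sex, the frequency table (38.9% female, 61.1% male) is usually the more informative summary; the mean and standard deviation only restate those proportions.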
i. Compute appropriate summary statistics for height of students

Descriptive Statistics
                            N   Minimum  Maximum  Mean   Std. Deviation
height of persons in class  18  62       75       69.06  3.523
Valid N (listwise)          18

j. Compute appropriate summary statistics for weight of students by gender

Case Processing Summary
                                                          Cases
                                              Included      Excluded      Total
                                              N   Percent   N   Percent   N   Percent
weight of persons in class * sex of persons   18  100.0%    0   0.0%      18  100.0%
in class

Report
weight of persons in class
SEX    Mean    N   Std. Deviation  Sum   Median  Skewness  Minimum  Maximum  Ra
F      123.29  7   13.889          863   124.00  -.941     98       139
M      161.64  11  10.902          1778  163.00  -.495     143      176
Total  146.72  18  22.541          2641  150.00  -.565     98       176

7. The following data describe a study in which the effects of two possible treatments for
hypertension were investigated. Thirty-two subjects suffering from hypertension were recruited
to the study, with sixteen being randomly allocated to each treatment. Blood pressure
measurements were made on each subject after treatment, leading to the following data

a. Compute summary statistics for Blood pressure of patients

Descriptive Statistics
                            N   Minimum  Maximum  Sum   Mean    Std. Deviation  Variance
blood pressure of patients  32  152      219      5838  182.44  18.893          356.964
Valid N (listwise)          32
b. Compute summary statistics for Blood pressure of patients by drug types

Report
blood pressure of patients
drug types  Mean    N   Std. Deviation  Sum   Minimum  Maximum  Variance
            206.00  1   .               206   206      206      .
drugA       173.50  16  14.180          2776  152      197      201.067
drugB       190.40  15  19.394          2856  159      219      376.114
Total       182.44  32  18.893          5838  152      219      356.964

c. Construct comparative box-plot for Blood pressure of patients by drug types


Interpretation
• The box plot for drug A appears roughly symmetric, and the box plot for drug B appears right skewed
because the part of the box above the median is larger than the part below the median.

8. Consider the following data on monthly totals of international airline passengers, 1957 to 1960
of country XZY, create time series plot of data
Year  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1957  315  301  356  348  355  422  465  467  404  347  305  336
1958  340  318  362  348  363  435  491  505  404  359  310  337
1959 342 406 396 420 472 548 559 463 407 362 405
1960 419 461 472 535 622 606 508 461 390 432 408
Model Statistics
                                      Number of    Model Fit statistics   Ljung-Box Q(18)         Number of
Model                                 Predictors   Stationary R-squared   Statistics  DF  Sig.    Outliers
data on monthly totals of             0            .353                   30.792      16  .014    0
international airline
passenger-Model_1

9. In an examination of corn plants grown on various soils, the concentrations of inorganic (x1) and
organic (x2) phosphorus in the soils and the phosphorus content (y) of corn grown in the soils
were measured for 17 soils. The results of this examination are shown below:
a. Fit a multiple linear regression using SPSS and identify the significant predictor of the
phosphorus content (y) of corn

Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .725a  .525      .457               12.251
a. Predictors: (Constant), organic, inorganic

ANOVAa
Model          Sum of Squares  df  Mean Square  F      Sig.
1  Regression  2325.179        2   1162.590     7.746  .005b
   Residual    2101.291        14  150.092
   Total       4426.471        16
a. Dependent Variable: phosphorus content(y)
b. Predictors: (Constant), organic, inorganic

Coefficientsa
               Unstandardized Coefficients  Standardized Coefficients
Model          B       Std. Error           Beta                       t      Sig.
1  (Constant)  66.465  9.850                                           6.748  .000
   inorganic   1.290   .343                 .756                       3.764  .002
   organic     -.111   .249                 -.090                      -.447  .662
a. Dependent Variable: phosphorus content(y)
Interpretation

• From the output, inorganic phosphorus (x1) is a significant predictor because its p-value
(0.002) < α (0.05), but organic phosphorus (x2) is not a significant predictor since its p-value
(0.662) > α (0.05). The model could therefore be refitted using only the significant predictor.
• The model is Phosphorus content (y) = β0 + β1(inorganic) + β2(organic), and the fitted equation is
Phosphorus content (y) = 66.465 + 1.290(inorganic) - 0.111(organic)
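
The fitted equation above can be turned into a small prediction helper. This is only a sketch: the function name predict_phosphorus and the soil values used in the example call are hypothetical, while the coefficients are those in the Coefficients table.

```python
# Unstandardized coefficients from the SPSS Coefficients table.
B0 = 66.465   # constant
B1 = 1.290    # inorganic phosphorus (x1)
B2 = -0.111   # organic phosphorus (x2)


def predict_phosphorus(inorganic, organic):
    """Predicted phosphorus content (y) for given soil concentrations."""
    return B0 + B1 * inorganic + B2 * organic


# Hypothetical soil: inorganic = 10, organic = 40 (values chosen for illustration).
example = predict_phosphorus(10, 40)
print(round(example, 3))
```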

b. Test significance of each regression coefficients and interpret it.

Coefficientsa
               Unstandardized Coefficients  Standardized Coefficients
Model          B       Std. Error           Beta                       t      Sig.
1  (Constant)  66.465  9.850                                           6.748  .000
   inorganic   1.290   .343                 .756                       3.764  .002
   organic     -.111   .249                 -.090                      -.447  .662
a. Dependent Variable: phosphorus content(y)
Interpretation
• The constant (β0) is significant because its p-value (< 0.001) < significance level α (0.05).
• Inorganic (β1) is significant because its p-value (0.002) < significance level α (0.05).
• Organic (β2) is not significant because its p-value (0.662) > significance level α (0.05).
• When inorganic phosphorus increases by one unit, the phosphorus content increases by
1.290 units on average, holding organic phosphorus constant.
• The constant (66.465) is the expected phosphorus content when both
independent variables are equal to zero.
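
Each t value in the table is the coefficient divided by its standard error, and it can be compared with the two-sided 5% critical value of the t distribution with 14 residual degrees of freedom (about 2.145). The sketch below repeats that arithmetic; small differences from the SPSS t values arise because the table entries are rounded.

```python
# (B, Std. Error) pairs taken from the Coefficients table.
coefficients = {
    "(Constant)": (66.465, 9.850),
    "inorganic": (1.290, 0.343),
    "organic": (-0.111, 0.249),
}

T_CRITICAL = 2.145  # two-sided 5% critical value of t with 14 degrees of freedom

t_stats = {name: b / se for name, (b, se) in coefficients.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name in coefficients:
    print(name, round(t_stats[name], 3), "significant" if significant[name] else "not significant")
```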

c. Test the significance of overall regression model.

ANOVAa
Model          Sum of Squares  df  Mean Square  F      Sig.
1  Regression  2325.179        2   1162.590     7.746  .005b
   Residual    2101.291        14  150.092
   Total       4426.471        16
a. Dependent Variable: phosphorus content(y)
b. Predictors: (Constant), organic, inorganic
Interpretation
• From the ANOVA table, the p-value (0.005) is less than the significance level α (0.05).
Because of this, we conclude that the overall regression model is significant.
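
The F statistic, R Square, Adjusted R Square and the standard error of the estimate are all functions of the sums of squares in the ANOVA table. The sketch below recomputes them as a check; the critical value 3.74 is the 5% point of F with 2 and 14 degrees of freedom.

```python
from math import sqrt

# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 2325.179, 2
ss_residual, df_residual = 2101.291, 14

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual
std_error_estimate = sqrt(ms_residual)

F_CRITICAL = 3.74  # 5% critical value of F(2, 14)

print("F =", round(f_stat, 3), "significant:", f_stat > F_CRITICAL)
print("R Square =", round(r_squared, 3))
print("Adjusted R Square =", round(adjusted_r_squared, 3))
print("Std. Error of the Estimate =", round(std_error_estimate, 3))
```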

d. Perform regression diagnostics like linearity, multicollinearity and normality of residuals

Coefficientsa
               Unstandardized Coefficients  Standardized Coefficients                 Collinearity Statistics
Model          B       Std. Error           Beta                       t      Sig.   Tolerance  VIF
1  (Constant)  66.465  9.850                                           6.748  .000
   inorganic   1.290   .343                 .756                       3.764  .002   .841       1.189
   organic     -.111   .249                 -.090                      -.447  .662   .841       1.189
a. Dependent Variable: phosphorus content(y)

Collinearity Diagnostics
                                               Variance Proportions
Model  Dimension  Eigenvalue  Condition Index  (Constant)  inorganic  organic
1      1          2.681       1.000            .01         .04        .01
       2          .275        3.120            .07         .89        .03
       3          .044        7.829            .92         .07        .96
a. Dependent Variable: phosphorus content(y)

Residuals Statisticsa
                      Minimum  Maximum  Mean   Std. Deviation  N
Predicted Value       61.10    99.38    76.18  12.055          17
Residual              -25.282  16.946   .000   11.460          17
Std. Predicted Value  -1.251   1.925    .000   1.000           17
Std. Residual         -2.064   1.383    .000   .935            17
a. Dependent Variable: phosphorus content(y)
Interpretation

• Variance inflation factor (VIF) = 1.189. This indicates that there is no multicollinearity
among the predictors because the VIF is less than 10.
• Tolerance = 0.841. This also indicates that there is no multicollinearity among the predictors because
the tolerance (0.841) is greater than 0.2.
• The residuals do not appear to be normally distributed because the points do not lie close to the fitted line in the
normal P-P plot.
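
Tolerance and VIF carry the same information, since VIF = 1 / Tolerance and Tolerance = 1 - R², where R² comes from regressing one predictor on the other. The short sketch below repeats that arithmetic with the reported tolerance; the cut-offs 10 and 0.2 are the rules of thumb used in the interpretation above.

```python
tolerance = 0.841  # reported for both inorganic and organic

vif = 1 / tolerance
# Share of one predictor's variance explained by the other predictor.
r_squared_between_predictors = 1 - tolerance

no_multicollinearity = vif < 10 and tolerance > 0.2

print("VIF =", round(vif, 3))
print("R squared between predictors =", round(r_squared_between_predictors, 3))
print("No multicollinearity concern:", no_multicollinearity)
```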

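e. Draw your conclusions in relation to the objective of the study

• The phosphorus content (y) of corn grown in the soils is significantly affected by the inorganic phosphorus
concentration of the soil, but not by the organic phosphorus concentration.
• Overall, 52.5% of the variation in phosphorus content is explained by the two independent variables (adjusted R Square = 45.7%).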

10. Consider an experiment on spring cabbage in which there were four treatments arranged in
four randomized blocks of four plots each. The experiment was to compare three sources of
nitrogen with a control treatment of no nitrogen on the yield of spring cabbage, there being one
replicate of the control in each block; the blocking factor was site. The data on the yields of spring
cabbage from the experiment were as follows.
Types of                  Blocks
Nitrogen sources          I     II    III   IV
Nitro-chalk               70.3  72.5  79.0  86.2
Sulphate ammonia          75.5  63    65.4  67.7
Nitrate                   85.2  80.5  83.6  92.3
Control                   35.7  39.6  45.5  50.5

a. Test of whether there is statistically significant difference on the average the spring cabbage
by sources of nitrogen.

Between-Subjects Factors
                                      N
Source of nitrogen  Control           4
                    Nitrate           4
                    nitro-chalk       4
                    sulphate ammonia  4

Tests of Between-Subjects Effects
Dependent Variable: average spring cabbage
Source           Type III Sum of Squares  df  Mean Square  F         Sig.
Corrected Model  4068.937a                3   1356.312     36.660    .000
Intercept        74597.266                1   74597.266    2016.290  .000
Nitrogen type    4068.937                 3   1356.312     36.660    .000
Error            443.968                  12  36.997
Total            79110.170                16
Corrected Total  4512.904                 15
a. R Squared = .902 (Adjusted R Squared = .877)
Interpretation
• There is a statistically significant difference in the average spring cabbage yield among the sources
of nitrogen,
• because the p-value (Sig. < 0.001) is less than the level of significance (α = 0.05).
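
The one-way sums of squares in the table above can be recomputed from the raw yields. The sketch below is an illustration in plain Python (the dictionary layout is an assumption of this sketch); it ignores the blocks, exactly as the output above does, and compares F with 3.49, the 5% critical value of F with 3 and 12 degrees of freedom.

```python
from statistics import mean

# Yields by nitrogen source, in block order I to IV.
yields = {
    "Nitro-chalk": [70.3, 72.5, 79.0, 86.2],
    "Sulphate ammonia": [75.5, 63.0, 65.4, 67.7],
    "Nitrate": [85.2, 80.5, 83.6, 92.3],
    "Control": [35.7, 39.6, 45.5, 50.5],
}

all_values = [value for group in yields.values() for value in group]
grand_mean = mean(all_values)

# Between-treatment and within-treatment sums of squares.
ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in yields.values())
ss_within = sum((value - mean(group)) ** 2 for group in yields.values() for value in group)

df_between = len(yields) - 1
df_within = len(all_values) - len(yields)
f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRITICAL = 3.49  # 5% critical value of F(3, 12)
print(round(ss_between, 3), round(ss_within, 3), round(f_stat, 3), f_stat > F_CRITICAL)
```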

b. If there is statistically significant difference on the average the spring cabbage by sources of
nitrogen, then conduct post ANOVA test to identify where the difference lies.
Multiple Comparisons
Dependent Variable: average spring cabbage
Tukey HSD
                                            Mean Difference                      95% Confidence Interval
(I) Source of nitrogen  (J) Source of nitrogen  (I-J)      Std. Error  Sig.  Lower Bound  Upper Bound
Control                 nitrate                 -42.575*   4.3010      .000  -55.344      -29.806
                        nitro-chalk             -34.175*   4.3010      .000  -46.944      -21.406
                        sulphate ammonia        -25.075*   4.3010      .000  -37.844      -12.306
Nitrate                 control                 42.575*    4.3010      .000  29.806       55.344
                        nitro-chalk             8.400      4.3010      .258  -4.369       21.169
                        sulphate ammonia        17.500*    4.3010      .007  4.731        30.269
nitro-chalk             control                 34.175*    4.3010      .000  21.406       46.944
                        nitrate                 -8.400     4.3010      .258  -21.169      4.369
                        sulphate ammonia        9.100      4.3010      .203  -3.669       21.869
sulphate ammonia        control                 25.075*    4.3010      .000  12.306       37.844
                        nitrate                 -17.500*   4.3010      .007  -30.269      -4.731
                        nitro-chalk             -9.100     4.3010      .203  -21.869      3.669
Based on observed means.
The error term is Mean Square (Error) = 36.997.
*. The mean difference is significant at the 0.05 level.

Interpretation

• The following pairs have a significant difference in means: control with nitrate, control with
nitro-chalk, control with sulphate ammonia, and nitrate with sulphate ammonia.
• The following pairs have no significant difference in means: nitrate with nitro-chalk, and
sulphate ammonia with nitro-chalk.

Multiple Comparisons
Dependent Variable: average spring cabbage
Dunnett t (2-sided)a
                                            Mean Difference                      95% Confidence Interval
(I) Source of nitrogen  (J) Source of nitrogen  (I-J)      Std. Error  Sig.  Lower Bound  Upper Bound
nitro-chalk             control                 34.175*    4.3010      .000  22.636       45.714
sulphate ammonia        control                 25.075*    4.3010      .000  13.536       36.614
nitrate                 control                 42.575*    4.3010      .000  31.036       54.114
Based on observed means.
The error term is Mean Square(Error) = 36.997.
*. The mean difference is significant at the 0.05 level.
a. Dunnett t-tests treat one group as a control, and compare all other groups against it.
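
The mean differences and the common standard error in both comparison tables can be recomputed from the treatment means and the error mean square. The sketch below covers only that arithmetic; the Sig. values and confidence limits need the studentized range (Tukey) or Dunnett distributions, which are not computed here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

yields = {
    "Control": [35.7, 39.6, 45.5, 50.5],
    "Nitrate": [85.2, 80.5, 83.6, 92.3],
    "Nitro-chalk": [70.3, 72.5, 79.0, 86.2],
    "Sulphate ammonia": [75.5, 63.0, 65.4, 67.7],
}

MS_ERROR = 36.997   # error mean square reported under the tables
REPLICATES = 4      # observations per nitrogen source

means = {source: mean(values) for source, values in yields.items()}

# Standard error of a difference between two treatment means.
se_difference = sqrt(2 * MS_ERROR / REPLICATES)

# Mean difference (I - J) for every pair of sources.
differences = {(i, j): means[i] - means[j] for i, j in combinations(means, 2)}

print("Std. Error:", round(se_difference, 4))
for (i, j), diff in differences.items():
    print(f"{i} - {j}: {diff:.3f}")
```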

c. Test whether the blocking was effective or not

Tests of Between-Subjects Effects
Dependent Variable: average spring cabbage
Source           Type III Sum of Squares  df  Mean Square  F        Sig.
Corrected Model  226.082a                 3   75.361       .211     .887
Intercept        74597.266                1   74597.266    208.818  .000
Blocks           226.082                  3   75.361       .211     .887
Error            4286.823                 12  357.235
Total            79110.170                16
Corrected Total  4512.904                 15
a. R Squared = .050 (Adjusted R Squared = -.187)
Interpretation of the result
• The blocks have no significant effect on the average spring cabbage yield, because the p-value (0.887) is greater than α = 0.05.
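
Because the experiment is a randomized block design, one alternative check is the additive two-way decomposition, in which both the nitrogen and the block sums of squares are removed from the error term. The sketch below is such a check, not the analysis reported above, so its F values differ from the one-way output; 3.86 is the 5% critical value of F with 3 and 9 degrees of freedom.

```python
from statistics import mean

# Rows are nitrogen sources; columns are blocks I to IV.
yields = {
    "Nitro-chalk": [70.3, 72.5, 79.0, 86.2],
    "Sulphate ammonia": [75.5, 63.0, 65.4, 67.7],
    "Nitrate": [85.2, 80.5, 83.6, 92.3],
    "Control": [35.7, 39.6, 45.5, 50.5],
}

n_treatments = len(yields)
n_blocks = len(next(iter(yields.values())))
all_values = [value for row in yields.values() for value in row]
grand_mean = mean(all_values)

ss_total = sum((value - grand_mean) ** 2 for value in all_values)
ss_treatment = n_blocks * sum((mean(row) - grand_mean) ** 2 for row in yields.values())
block_means = [mean(column) for column in zip(*yields.values())]
ss_block = n_treatments * sum((bm - grand_mean) ** 2 for bm in block_means)

# In the block design, the error is what remains after treatments and blocks.
ss_error = ss_total - ss_treatment - ss_block
df_error = (n_treatments - 1) * (n_blocks - 1)
ms_error = ss_error / df_error

f_treatment = (ss_treatment / (n_treatments - 1)) / ms_error
f_block = (ss_block / (n_blocks - 1)) / ms_error

F_CRITICAL = 3.86  # 5% critical value of F(3, 9)
print("F (nitrogen) =", round(f_treatment, 2), f_treatment > F_CRITICAL)
print("F (blocks)   =", round(f_block, 2), f_block > F_CRITICAL)
```

Under this decomposition the block F is larger than in the one-way table but still below the 5% critical value, so the conclusion that blocking was not significant at α = 0.05 is unchanged.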
11. Dermatologists in a hospital study patients with acute psoriasis, a skin disease. They would
like to know whether medication A is more effective in relieving the symptoms of psoriasis than
medication B. The data were retrospectively collected on 30 patients. The variables are gender
(M/F), age (in years), medication (A/B), and status of relief (1=relief, 0=no relief). The data are
as follows:
a. Read the data into SPSS and fit a binary logistic model. Write down the fitted model. Discuss
the significance of the predictor variables and the goodness of fit of the model. Use α = 0.05

Hosmer and Lemeshow Test


Step Chi-square df Sig.
1 5.273 7 .627
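Interpretation
• The Hosmer and Lemeshow test checks the goodness of fit of the model, with H0: the model fits
the data versus H1: the model does not fit the data. Since the p-value (0.627) is greater than the level of
significance (0.05), we do not reject H0, so there is no evidence of lack of fit.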

Classification Tablea
                                               Predicted
                                               relief status of patients  Percentage
Observed                                       no relief   relief         Correct
Step 1  relief status of patients  no relief   7           2              77.8
                                   relief      3           18             85.7
        Overall Percentage                                                83.3
a. The cut value is .500
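
The percentages in the classification table follow from the four cell counts. The sketch below repeats that arithmetic; the dictionary keys are named for this sketch only.

```python
# Cell counts keyed by (observed, predicted), from the classification table.
cells = {
    ("no relief", "no relief"): 7,
    ("no relief", "relief"): 2,
    ("relief", "no relief"): 3,
    ("relief", "relief"): 18,
}

total = sum(cells.values())
observed_no_relief = cells[("no relief", "no relief")] + cells[("no relief", "relief")]
observed_relief = cells[("relief", "no relief")] + cells[("relief", "relief")]

# Percentage of each observed group that the model classifies correctly.
correct_no_relief = 100 * cells[("no relief", "no relief")] / observed_no_relief
correct_relief = 100 * cells[("relief", "relief")] / observed_relief
overall = 100 * (cells[("no relief", "no relief")] + cells[("relief", "relief")]) / total

print(round(correct_no_relief, 1), round(correct_relief, 1), round(overall, 1))
```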

Variables in the Equation
                                 B       S.E.   Wald   df  Sig.  Exp(B)
Step 1a  gender of patients(1)   -3.171  1.471  4.648  1   .031  .042
         age of patients         .171    .087   3.885  1   .049  1.187
         medication type(1)      3.816   1.546  6.092  1   .014  45.441
         Constant                -3.628  2.471  2.156  1   .142  .027
a. Variable(s) entered on step 1: gender of patients, age of patients, medication type.

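b. Give interpretation of the estimated significant regression coefficients

Interpretation

• All coefficients except the constant (β0) are significant because their p-values (0.031 for gender, 0.049 for age and 0.014 for medication type) are less than α = 0.05; the constant has a p-value of 0.142.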
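
Using the estimates in the Variables in the Equation table, the fitted model can be written as logit(p) = -3.628 - 3.171 gender(1) + 0.171 age + 3.816 medication(1), where p is the probability of relief. The sketch below evaluates it and recomputes Exp(B). Which gender and which medication the indicators gender(1) and medication(1) stand for depends on the SPSS coding, which the report does not show, so the indicator arguments here are treated as plain 0/1 inputs. The function name and the example age of 40 are hypothetical.

```python
from math import exp

# B coefficients from the Variables in the Equation table.
coefficients = {
    "constant": -3.628,
    "gender(1)": -3.171,
    "age": 0.171,
    "medication(1)": 3.816,
}

# Exp(B): multiplicative change in the odds of relief per unit increase.
odds_ratios = {name: exp(b) for name, b in coefficients.items()}


def relief_probability(gender_indicator, age, medication_indicator):
    """Predicted probability of relief; indicators are 0 or 1 as coded by SPSS."""
    z = (coefficients["constant"]
         + coefficients["gender(1)"] * gender_indicator
         + coefficients["age"] * age
         + coefficients["medication(1)"] * medication_indicator)
    return 1 / (1 + exp(-z))


for name, ratio in odds_ratios.items():
    print(name, round(ratio, 3))

# Hypothetical 40-year-old patient with gender(1) = 0, for both medication codes.
print(round(relief_probability(0, 40, 1), 3), round(relief_probability(0, 40, 0), 3))
```

Read this way, each additional year of age multiplies the odds of relief by about 1.19, the category coded medication(1) = 1 has much higher odds of relief than the reference medication, and the category coded gender(1) = 1 has lower odds of relief than the reference gender. Small differences from the printed Exp(B) values come from rounding of B.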
