Department of Statistics
GROUP ASSIGNMENT:
Name                   ID No
1. Adisu Bantie        NSR/162/12
2. Abera Worku         NSR/101/12
3. Bikes Abebe         NSR/549/12
4. Amarch wubetie      NSR/479/12
5. Wendie Aynalem      NSR/2426/12
6. Abebech Zewdu       NSR/065/12
7. Dessie Tenagne      NSR/796/12
8. Gedifew Getie       NSR/1091/12
9. Maerg Kiflie        NSR/1573/12
10. Elisa Afewerki     NSR/865/12
11. Leyikun Demisie    NSR/1495/12
Submitted to Mr. Asfaw
Statistical Computing-I (Stat 3021), Project
Answer the following questions accordingly, analyze the data using SPSS software, and
prepare your report in Microsoft Word for the questions given as an assignment. Your report
must include the results and an interpretation of the results. The deadline for submission of your
report is May 25, 2022.
1. Consider the following two data sets, patient history and diagnosis data, registered separately
from a random sample of 4 patients at the emergency and diagnosis rooms, and then write how to
merge the two data sets into a single combined SPSS data file.
You can merge data from two files in two different ways. You can:
Merge the active dataset with another open dataset or IBM SPSS Statistics data file
containing the same variables but different cases (Add Cases).
Merge the active dataset with another open dataset or IBM SPSS Statistics data file
containing the same cases but different variables (Add Variables).
Because the two files here record the same four patients but different variables, Add Variables
is the appropriate option.
To merge files:
1. From the menus choose Data > Merge Files.
2. Select Add Variables.
First, name the variables and arrange the second table in the same sequence order as the first.
After this, enter each given data set in the SPSS data window under a different file name, and
save the two files with the extension .sav. To merge the two files, follow the steps above.
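The same "add variables" idea (same cases, different variables, matched on a key) can be sketched outside SPSS. The following is a minimal Python sketch, not the SPSS procedure itself. The patient tables are not reproduced in this report, so the key name patient_id and every value below are hypothetical and used only for illustration.

```python
# Hypothetical records: the original patient tables are not shown in the report,
# so the key name, field names, and values here are illustrative only.
history = [
    {"patient_id": 1, "age": 34, "sex": "F"},
    {"patient_id": 2, "age": 51, "sex": "M"},
    {"patient_id": 3, "age": 27, "sex": "F"},
    {"patient_id": 4, "age": 63, "sex": "M"},
]
diagnosis = [
    {"patient_id": 3, "diagnosis": "asthma"},
    {"patient_id": 1, "diagnosis": "fracture"},
    {"patient_id": 4, "diagnosis": "hypertension"},
    {"patient_id": 2, "diagnosis": "diabetes"},
]


def add_variables(left, right, key):
    """One-to-one merge on `key`: same cases, different variables."""
    lookup = {row[key]: row for row in right}
    merged = []
    # Sort by the key first, mirroring the advice to keep both files in the same order.
    for row in sorted(left, key=lambda r: r[key]):
        combined = dict(row)
        combined.update(lookup.get(row[key], {}))
        merged.append(combined)
    return merged


merged = add_variables(history, diagnosis, "patient_id")
for row in merged:
    print(row)
```

Each merged row carries the variables of both files for one patient, which is what the combined .sav file would contain.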
2. Consider the following data from an investigation by a pharmaceutical company that wishes to
know whether an experimental drug being tested in its laboratories has any effect on systolic blood
pressure. Thirty randomly selected subjects were given the drug, and their systolic blood
pressures (in millimeters) were recorded as displayed below:
172, 140, 123, 130, 115, 148, 108, 129, 137, 161, 123, 152, 133, 128, 142, 176, 134, 143, 154, 134,
125, 110, 111, 143, 119, 162, 140, 132, 120, 160.
Add a column to the data, called SBP_S, where you specify (for each row) whether the systolic
blood pressure is "normal" (< 130) or "at risk" (>= 130), and summarize it with appropriate
statistics.
The Recode into Different Variables dialog box allows you to reassign the values of
existing variables or collapse ranges of existing values into new values for a new
variable. For example, you could collapse salaries into a new variable containing
salary-range categories.
#Steps
Report
blood pressure
systolic blood pressure status   Mean     N    Std. Deviation   Median   Std. Error of Mean   Sum
normal                           119.18   11   7.291            120.00   2.198                1311
Risk                             147.00   19   13.812           143.00   3.169                2793
Total                            136.80   30   17.962           134.00   3.279                4104
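The recode and the grouped summary can be cross-checked outside SPSS. The sketch below is a minimal Python illustration using the thirty readings from the question; the function name recode and the dictionary layout are assumptions made for this example, not part of the SPSS dialog.

```python
from statistics import mean, median, stdev

sbp = [172, 140, 123, 130, 115, 148, 108, 129, 137, 161,
       123, 152, 133, 128, 142, 176, 134, 143, 154, 134,
       125, 110, 111, 143, 119, 162, 140, 132, 120, 160]


def recode(value, cutoff=130):
    """Collapse a reading into the two SBP_S categories from the question."""
    return "normal" if value < cutoff else "at risk"


sbp_s = [recode(v) for v in sbp]  # the new SBP_S column

# Group the readings by status, then add an overall "total" group.
groups = {}
for value, status in zip(sbp, sbp_s):
    groups.setdefault(status, []).append(value)
groups["total"] = list(sbp)

summary = {}
for status, values in groups.items():
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    summary[status] = {
        "n": n,
        "sum": sum(values),
        "mean": mean(values),
        "median": median(values),
        "sd": sd,
        "se": sd / n ** 0.5,  # standard error of the mean
    }

for status, s in summary.items():
    print(f"{status:8s} n={s['n']:2d} mean={s['mean']:.2f} sd={s['sd']:.3f} "
          f"median={s['median']:.2f} se={s['se']:.3f} sum={s['sum']}")
```

The counts, sums, means, and medians agree with the report: 11 normal readings summing to 1311 and 19 at-risk readings summing to 2793.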
3. Consider the following data set on the strength of materials manufactured at two branches of
Company-Z. Branch 1: 1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20, 0.15, 0.49, 0.95,
0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38. Branch 2: 2.37, 2.16, 14.82, 1.73, 41.04, 0.23,
1.32, 2.91, 39.41, 0.11, 27.44, 4.51, 0.51, 4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19.
a. Summarize data on the strength of materials by computing the appropriate summary measures
Descriptive Statistics
                       N    Range   Minimum   Maximum   Sum      Mean     Std. Deviation   Variance
strength of material   40   50.49   .08       50.57     239.27   5.9817   12.10669         146.572
Valid N (listwise)     40
b. Summarize data on the strength of materials by computing the appropriate summary measures
by branch and interpret your results.
Report
strength of material
branch    Mean     N    Std. Deviation   Minimum   Maximum   Sum      Std. Error of Mean   Variance
branch1   3.6070   20   11.16464         .08       50.57     72.14    2.49649              124.649
branch2   8.3565   20   12.81938         .11       41.04     167.13   2.86650              164.337
Total     5.9817   40   12.10669         .08       50.57     239.27   1.91424              146.572
Interpretation
Branch 1 has 20 observations, with a maximum of 50.57 and a minimum of 0.08.
Branch 2 has 20 observations, with a maximum of 41.04 and a minimum of 0.11.
The standard deviation of Branch 2 is 12.81938 and the standard deviation of Branch 1 is
11.16464; based on these values, Branch 2 is more variable.
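The by-branch figures can be cross-checked with a few lines of Python. This is a minimal sketch using the data listed in the question; the helper name summarize is an assumption for this example. It also shows how the standard error of the mean is derived from the standard deviation, which is why the two should not be confused.

```python
from statistics import mean, stdev, variance

branch1 = [1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20,
           0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38]
branch2 = [2.37, 2.16, 14.82, 1.73, 41.04, 0.23, 1.32, 2.91, 39.41, 0.11,
           27.44, 4.51, 0.51, 4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19]


def summarize(values):
    """Summary measures matching the columns of the SPSS Report table."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation
    return {
        "n": n,
        "sum": sum(values),
        "mean": mean(values),
        "sd": sd,
        "se": sd / n ** 0.5,  # standard error of the mean = sd / sqrt(n)
        "variance": variance(values),
        "min": min(values),
        "max": max(values),
    }


by_branch = {
    "branch1": summarize(branch1),
    "branch2": summarize(branch2),
    "total": summarize(branch1 + branch2),
}

for name, s in by_branch.items():
    print(f"{name}: mean={s['mean']:.4f} sd={s['sd']:.5f} se={s['se']:.5f} "
          f"min={s['min']} max={s['max']} sum={s['sum']:.2f}")
```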
c. Construct a simple box plot of the strength of materials and interpret it.
Interpretation
The box plot suggests a right-skewed distribution. It flags observations 5, 23, 25, 29, 31, and
36 as outliers.
The distribution deviates from normal because the median is not located at the center of the
box.
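The outlier flags in a box plot come from a simple fence rule. The sketch below is a minimal Python illustration, assuming hinges computed as the medians of the lower and upper halves of the sorted data and fences placed 1.5 box lengths beyond the hinges; the exact quartile method used by SPSS may differ slightly, so treat this as an approximation of the plot rather than a reproduction of it. Case numbers follow the order Branch 1 (1 to 20) then Branch 2 (21 to 40).

```python
from statistics import median

branch1 = [1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20,
           0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38]
branch2 = [2.37, 2.16, 14.82, 1.73, 41.04, 0.23, 1.32, 2.91, 39.41, 0.11,
           27.44, 4.51, 0.51, 4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19]
strength = branch1 + branch2

ordered = sorted(strength)
half = len(ordered) // 2  # n is even here, so the halves do not share the median
lower_hinge = median(ordered[:half])
upper_hinge = median(ordered[-half:])
box_length = upper_hinge - lower_hinge

# Assumed rule: points more than 1.5 box lengths beyond a hinge are outliers.
low_fence = lower_hinge - 1.5 * box_length
high_fence = upper_hinge + 1.5 * box_length

outlier_cases = [case for case, value in enumerate(strength, start=1)
                 if value < low_fence or value > high_fence]

print("median:", median(ordered))
print("hinges:", lower_hinge, upper_hinge)
print("outlier case numbers:", outlier_cases)
```

Under these assumptions the flagged case numbers are the same six observations named in the interpretation, all of them on the high side, which is consistent with a right-skewed distribution.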
d. Construct a comparative box plot of the strength of materials by branch and interpret it.
Interpretation
Both Branch 1 and Branch 2 appear right-skewed, and the box plot output shows outliers.
The distributions of Branch 1 and Branch 2 deviate from normal because the median is not
located at the center of the box.
4. A demographer was interested in the number and percentage of households by marital
status group, which is often classified as single (S), married (M), divorced (D), or widowed (W),
in a certain town. The following data were obtained from the survey conducted:
D S D D S W S D
S S D D W M M S D D D W M M S S W D M M D D W D D S S W D D S D S M W M W D S
W D W D M M S S D W W S S S W S D M M S S D S D M
a. Summarize the data by computing the appropriate summary statistics.
Interpretation
Among the marital status groups, 33.33% of households are divorced, which makes divorced
the most frequent category, and 18.06% of households are married, which is among the least
frequent categories.
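For categorical data such as marital status, the appropriate summary is a frequency table with percentages. A minimal Python sketch, using the survey codes listed in the question (the variable names are assumptions for this example):

```python
from collections import Counter

# Survey codes from the question, one letter per household.
codes = ("DSDDSWSD"
         "SSDDWMMSDDDWMMSSWDMMDDWDDSSWDDSDSMWMWDS"
         "WDWDMMSSDWWSSSWSDMMSSDSDM")

labels = {"S": "single", "M": "married", "D": "divorced", "W": "widowed"}

counts = Counter(codes)
total = sum(counts.values())
percent = {code: 100 * n / total for code, n in counts.items()}

# Print the frequency table, most frequent category first.
for code, n in counts.most_common():
    print(f"{labels[code]:9s} {n:3d} {percent[code]:6.2f}%")
print(f"{'total':9s} {total:3d} {100:6.2f}%")
```

The mode (the most frequent category) is the only average that is meaningful for a nominal variable like this one.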
5. Consider the following data on average weekly milk yields (in gallons) of a herd of 100 cows
a. Compute the mean and variance of the average weekly milk yields (in gallons).
Descriptive Statistics
                                                         N     Mean     Variance
average milk yields (in gallons) of a herd of 100 cows   100   16.850   19.078
Valid N (listwise)                                       100
b. Display the data with a histogram and comment on it in relation to the normal distribution.
Interpretation of the result
In a symmetrical distribution, the mean, median, and mode are all equal.
The histogram above appears symmetrical, so the data appear to follow a normal
distribution.
c. Display the data with a box plot and comment on it in relation to the normal distribution.
Interpretation of the result
The box plot does not show a normal distribution.
The box plot suggests a left-skewed distribution, and it shows an outlying observation (89)
that deviates from the rest of the observations.
6. Refer to the following data, which represent persons in a class and consist of their name, sex,
weight, height, and age.
b. Construct scatter plots of height versus age and weight versus age.
Answer
Interpretation
There is no linear relationship between the age and height of the persons in the class, because
the points are widely scattered and do not follow a straight line.
Interpretation
The points are widely scattered and do not follow a straight line; hence the age and
weight of the persons in the class have no linear relationship.
c. Construct a scatter plot of height versus age, differentiated (grouped) by sex.
Interpretation
In the given data, the heights of males are generally greater than the heights of females.
Statistics
SEX
N                        Valid     18
                         Missing   0
Mean                               .61
Median                             1.00
Mode                               1
Std. Deviation                     .502
Variance                           .252
Skewness                           -.498
Std. Error of Skewness             .536
Kurtosis                           -1.987
Std. Error of Kurtosis             1.038
Sum                                11
Percentiles              25        .00
                         50        1.00
                         75        1.00
Descriptive Statistics
                             N    Minimum   Maximum   Mean    Std. Deviation
height of persons in class   18   62        75        69.06   3.523
Valid N (listwise)           18
Report
7. The following data describe a study in which the effects of two possible treatments for
hypertension were investigated. Thirty-two subjects suffering from hypertension were recruited
to the study, with sixteen being randomly allocated to each treatment. Blood pressure
measurements were made on each subject after treatment, leading to the following data:
Descriptive Statistics
                             N    Minimum   Maximum   Sum    Mean     Std. Deviation   Variance
blood pressure of patients   32   152       219       5838   182.44   18.893           356.964
Valid N (listwise)           32
b. Compute summary statistics for the blood pressure of patients by drug type.
Report
blood pressure of patients
drug types   Mean     N    Std. Deviation   Sum    Minimum   Maximum   Variance
             206.00   1    .                206    206       206       .
drugA        173.50   16   14.180           2776   152       197       201.067
drugB        190.40   15   19.394           2856   159       219       376.114
Total        182.44   32   18.893           5838   152       219       356.964
8. Consider the following data on monthly totals of international airline passengers, 1957 to 1960
of country XZY, create time series plot of data
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 342 406 396 420 472 548 559 463 407 362 405
1960 419 461 472 535 622 606 508 461 390 432 408
Model Statistics
Model: data on monthly totals of international airline passenger-Model_1
Number of Predictors   Stationary R-squared (Model Fit statistics)   Ljung-Box Q(18) Statistics   DF   Sig.   Number of Outliers
0                      .353                                          30.792                       16   .014   0
9. In an examination of corn plants grown on various soils, the concentrations of inorganic (x1)
and organic (x2) phosphorus in the soils and the phosphorus content (y) of corn grown in those
soils were measured for 17 soils. The results of this examination are shown below:
a. Fit a multiple linear regression using SPSS and identify the significant predictor of the
phosphorus content (y) of corn.
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .725a   .525       .457                12.251
a. Predictors: (Constant), organic, inorganic
ANOVAa
Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   2325.179         2    1162.590      7.746   .005b
    Residual     2101.291         14   150.092
    Total        4426.471         16
a. Dependent Variable: phosphorus content(y)
b. Predictors: (Constant), organic, inorganic
Coefficientsa
                   Unstandardized Coefficients   Standardized Coefficients
Model              B        Std. Error           Beta                        t       Sig.
1   (Constant)     66.465   9.850                                            6.748   .000
    inorganic      1.290    .343                 .756                        3.764   .002
    organic        -.111    .249                 -.090                       -.447   .662
a. Dependent Variable: phosphorus content(y)
Interpretation
From the output, inorganic phosphorus (x1) is a significant predictor because its p-value
(0.002) < α (0.05), whereas organic phosphorus (x2) is not a significant predictor because its
p-value (0.662) > α (0.05). The regression model has the form
Phosphorus content (y) = β0 + β1(inorganic) + β2(organic)
and the fitted model is
Phosphorus content (y) = 66.465 + 1.290(inorganic) - 0.111(organic)
Interpretation
The constant (β0) is significant because its p-value (0.000) < significance level α (0.05).
The inorganic coefficient (β1) is significant because its p-value (0.002) < significance level α (0.05).
The organic coefficient (β2) is not significant because its p-value (0.662) > significance level α (0.05).
The constant (β0) and the inorganic coefficient (β1) are significant in the regression model, but
the organic coefficient (β2) is not.
When inorganic phosphorus increases by one unit, phosphorus content increases by 1.290 units
on average, holding organic phosphorus constant.
The constant (66.465) is the predicted phosphorus content when both independent variables
equal zero.
Interpretation
From the ANOVA table, the p-value (0.005) < significance level α (0.05), so we conclude
that the overall regression model is significant.
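The entries of the ANOVA and Model Summary tables are tied together by simple arithmetic, which can be checked directly. The sketch below is a minimal Python illustration using only the sums of squares and degrees of freedom reported above; the variable names are assumptions for this example, and the p-value itself is not recomputed because that would need an F distribution function from outside the standard library.

```python
from math import sqrt

# Values copied from the ANOVA table.
ss_regression = 2325.179
ss_residual = 2101.291
df_regression = 2
df_residual = 14
n = df_regression + df_residual + 1  # 17 soils

ms_regression = ss_regression / df_regression  # Mean Square (Regression)
ms_residual = ss_residual / df_residual        # Mean Square (Residual)
f_statistic = ms_regression / ms_residual

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
std_error_estimate = sqrt(ms_residual)

print(f"F                    = {f_statistic:.3f}")
print(f"R Square             = {r_square:.3f}")
print(f"Adjusted R Square    = {adjusted_r_square:.3f}")
print(f"Std. Error of Est.   = {std_error_estimate:.3f}")
```

These reproduce the reported F = 7.746, R Square = .525, Adjusted R Square = .457, and Std. Error of the Estimate = 12.251, up to rounding.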
Coefficientsa
                   Unstandardized Coefficients   Standardized Coefficients                   Collinearity Statistics
Model              B        Std. Error           Beta                        t       Sig.    Tolerance   VIF
1   (Constant)     66.465   9.850                                            6.748   .000
    inorganic      1.290    .343                 .756                        3.764   .002    .841        1.189
    organic        -.111    .249                 -.090                       -.447   .662    .841        1.189
a. Dependent Variable: phosphorus content(y)
Collinearity Diagnostics
                                                   Variance Proportions
Model   Dimension   Eigenvalue   Condition Index   (Constant)   inorganic   organic
1       1           2.681        1.000             .01          .04         .01
        2           .275         3.120             .07          .89         .03
        3           .044         7.829             .92          .07         .96
a. Dependent Variable: phosphorus content(y)
Residuals Statisticsa
                       Minimum   Maximum   Mean    Std. Deviation   N
Predicted Value        61.10     99.38     76.18   12.055           17
Residual               -25.282   16.946    .000    11.460           17
Std. Predicted Value   -1.251    1.925     .000    1.000            17
Std. Residual          -2.064    1.383     .000    .935             17
a. Dependent Variable: phosphorus content(y)
Interpretation
The phosphorus content (y) of corn grown in the soils is affected by the inorganic phosphorus
concentration of the soil.
Overall, the independent variables explain 52.5% of the variation in phosphorus content
(R Square = .525; Adjusted R Square = .457).
10. Consider an experiment on spring cabbage in which there were four treatments arranged in
four randomized blocks of four plots each. The experiment was to compare three sources of
nitrogen with a control treatment of no nitrogen on the yield of spring cabbage, there being one
replicate of the control in each block; the blocking factor was site. The data on the yield of
spring cabbage from the experiment were as follows.
Types of
Nitrogen sources Blocks
I II III IV
Nitro-chalk 70.3 72.5 79.0 86.2
a. Test whether there is a statistically significant difference in the average yield of spring
cabbage by source of nitrogen.
Between-Subjects Factors
                                        N
Source of nitrogen   Control            4
                     Nitrate            4
                     nitro-chalk        4
                     sulphate ammonia   4
b. If there is a statistically significant difference in the average yield of spring cabbage by
source of nitrogen, conduct a post-ANOVA (post hoc) test to identify where the difference lies.
Multiple Comparisons
Dependent Variable: average spring cabbage
Tukey HSD
                                                                                              95% Confidence Interval
(I) Source of nitrogen   (J) Source of nitrogen   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
Control                  nitrate                  -42.575*                4.3010       .000   -55.344       -29.806
                         nitro-chalk              -34.175*                4.3010       .000   -46.944       -21.406
                         sulphate ammonia         -25.075*                4.3010       .000   -37.844       -12.306
Nitrate                  control                  42.575*                 4.3010       .000   29.806        55.344
                         nitro-chalk              8.400                   4.3010       .258   -4.369        21.169
                         sulphate ammonia         17.500*                 4.3010       .007   4.731         30.269
nitro-chalk              control                  34.175*                 4.3010       .000   21.406        46.944
                         nitrate                  -8.400                  4.3010       .258   -21.169       4.369
                         sulphate ammonia         9.100                   4.3010       .203   -3.669        21.869
sulphate ammonia         control                  25.075*                 4.3010       .000   12.306        37.844
                         nitrate                  -17.500*                4.3010       .007   -30.269       -4.731
                         nitro-chalk              -9.100                  4.3010       .203   -21.869       3.669
Based on observed means.
The error term is Mean Square (Error) = 36.997.
*. The mean difference is significant at the 0.05 level.
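Two features of the Tukey table can be checked by hand. The standard error of each pairwise difference follows from the error mean square and the group size (4 plots per nitrogen source), and a difference is significant at the 0.05 level exactly when its 95% confidence interval excludes zero. A minimal Python sketch using values copied from the table (the variable names are assumptions for this example):

```python
from math import sqrt

mse = 36.997       # Mean Square (Error) from the table footnote
n_per_group = 4    # observations per nitrogen source

# Standard error of a difference between two group means with equal group sizes.
std_error = sqrt(2 * mse / n_per_group)
print(f"Std. Error = {std_error:.4f}")

# The six distinct pairs: (I, J, mean difference, lower bound, upper bound).
pairs = [
    ("control", "nitrate", -42.575, -55.344, -29.806),
    ("control", "nitro-chalk", -34.175, -46.944, -21.406),
    ("control", "sulphate ammonia", -25.075, -37.844, -12.306),
    ("nitrate", "nitro-chalk", 8.400, -4.369, 21.169),
    ("nitrate", "sulphate ammonia", 17.500, 4.731, 30.269),
    ("nitro-chalk", "sulphate ammonia", 9.100, -3.669, 21.869),
]

significant = []
for group_i, group_j, diff, lower, upper in pairs:
    excludes_zero = lower > 0 or upper < 0
    if excludes_zero:
        significant.append((group_i, group_j))
    flag = "significant" if excludes_zero else "not significant"
    print(f"{group_i} vs {group_j}: {diff:8.3f} [{lower:8.3f}, {upper:8.3f}] {flag}")
```

Read this way, every nitrogen source differs from the control, nitrate differs from sulphate ammonia, and nitro-chalk does not differ significantly from either nitrate or sulphate ammonia.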
Interpretation
Multiple Comparisons
Dependent Variable: average spring cabbage
Dunnett t (2-sided)a
                                                                                              95% Confidence Interval
(I) Source of nitrogen   (J) Source of nitrogen   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
nitro-chalk              control                  34.175*                 4.3010       .000   22.636        45.714
sulphate ammonia         control                  25.075*                 4.3010       .000   13.536        36.614
nitrate                  control                  42.575*                 4.3010       .000   31.036        54.114
Based on observed means.
The error term is Mean Square(Error) = 36.997.
*. The mean difference is significant at the 0.05 level.
a. Dunnett t-tests treat one group as a control, and compare all other groups against it.
Classification Tablea
                                                 Predicted
                                                 relief status of patients
Observed                                         no relief   relief   Percentage Correct
Step 1   relief status of patients   no relief   7           2        77.8
                                     relief      3           18       85.7
         Overall Percentage                                           83.3
a. The cut value is .500
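The percentages in the classification table follow directly from the four cell counts. A minimal Python sketch, with the counts copied from the table (the dictionary layout is an assumption for this example):

```python
# Rows are observed outcomes, columns are predicted outcomes.
table = {
    "no relief": {"no relief": 7, "relief": 2},
    "relief": {"no relief": 3, "relief": 18},
}

# Percentage correct within each observed category.
percent_correct = {}
for observed, predicted in table.items():
    row_total = sum(predicted.values())
    percent_correct[observed] = 100 * predicted[observed] / row_total

# Overall percentage: correctly classified cases over all cases.
correct = sum(table[label][label] for label in table)
total = sum(sum(predicted.values()) for predicted in table.values())
overall = 100 * correct / total

for observed, pct in percent_correct.items():
    print(f"{observed:9s} {pct:.1f}% correct")
print(f"overall   {overall:.1f}% correct")
```

So the model classifies 25 of the 30 patients correctly at the .500 cut value, and it is somewhat better at identifying patients with relief than patients without.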
Interpretation
All coefficients except the constant (β0) are significant because their p-values (Sig.) < α = 0.05.