
Biostatistics Midterm 1a 2014

I. Answer the questions pertaining to the data below. 28pts

Number of seeds per plant for Lupinus polyphyllus from an experimental biological population

Midpoints (number      Limits    Frequency    Relative     Cumulative
of seeds per plant)                           Frequency    Frequency
35                               1
40                               3
45                               6
50                               10
55                               7
60                               2
65                               1

1. Enter the class limits, relative and cumulative frequencies of the first 3 classes in the table. 6pts Label the units of
the frequency, cumulative frequency and relative frequency columns. 3pts (9pts total)
See the completed table after question 9.

2. Find the mean. Show your work. 4pts

1 pt for equation

3. Find the crude value of the median using the method demonstrated in class. (Show your work.) 4pts

1 pt for 15.5

4. Find the mode. 1pt


50

5. Describe the shape of the distribution of data using appropriate statistical jargon. 2pts

Essentially normal
6. Which of the measures of central tendency is most appropriate for this data? Explain why. Be specific. 4pts
Mean: for symmetrical data it has the least sampling error because it takes all the data into account equally.

7. On what scale is the variable number of seeds per plant measured? 2pts
ratio
8. Is the variable number of seeds per plant discrete or continuous? 2pts
discrete
9. Is this the shape (see #5) that you expect for the distribution based on the type of variable? Explain. 2pts

Although these are count data (Poisson), the mean is large enough for the distribution to be reasonably symmetric.

Midpoints (number      Limits    Frequency     Relative Frequency        Cumulative Frequency
of seeds per plant)              (# plants)    (proportion of plants)    (# plants)
35                               1             0.033                     1
40                               3             0.100                     4
45                               6             0.200                     10
50                               10
55                               7
60                               2
65                               1
Total                            30

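Calculation of mean:

f*x: 35, 120, 270, 500, 385, 120, 65
sum of f*x = 1495
mean = 1495/30 = 49.83

Calculation of median:

(n+1)/2 = 15.5, so the median lies between the 15th and 16th observations; both occur in the class with midpoint = 50, so the crude median = 50

mode = 50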
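
The grouped-data calculations above can be cross-checked with a short script. This is only a sketch: the variable names and the helper `class_of_rank` are illustrative and not part of the exam, and the median uses the crude "which class holds the middle observations" method rather than interpolation.

```python
# Grouped data from the table: class midpoints and their frequencies.
midpoints = [35, 40, 45, 50, 55, 60, 65]
freqs = [1, 3, 6, 10, 7, 2, 1]

n = sum(freqs)  # total number of plants

# Relative frequency = class frequency / n; cumulative = running total.
rel_freqs = [f / n for f in freqs]
cum_freqs = []
running = 0
for f in freqs:
    running += f
    cum_freqs.append(running)

# Grouped mean: sum of f*x divided by n.
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n


def class_of_rank(rank):
    """Return the midpoint of the class containing the rank-th ordered value."""
    for x, cum in zip(midpoints, cum_freqs):
        if rank <= cum:
            return x
    raise ValueError("rank exceeds sample size")


# Crude median: find the class holding the (n+1)/2-th observation.
# For even n that position falls between two ranks, so both are checked.
lower_rank = (n + 1) // 2
upper_rank = n - lower_rank + 1
median_classes = {class_of_rank(lower_rank), class_of_rank(upper_rank)}

# Mode: midpoint of the class with the highest frequency.
mode = midpoints[freqs.index(max(freqs))]

print(n, round(mean, 2), median_classes, mode)
```

Both middle ranks landing in one class is what justifies reporting a single crude median here.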

II. Some studies have associated polyphenols in cocoa with beneficial effects on functions such as weight control
and cognitive function. A particular brand of Dutched cocoa powder has a mean polyphenol content of 40mg/g
with a standard deviation of 11mg/g based on a sample of size 10. Answer the following pertaining to this
information. Include formulas. 30 pts


1. Calculate the standard error and variance. 4pts

standard error: $s_{\bar{X}} = s/\sqrt{n} = 11/\sqrt{10} = 3.48$

variance: $s^2 = 11^2 = 121$

2. What probability distribution (if any) is most likely to apply to the population of this type of data? 1pt
Normal population (t sampling dist assumes normal population)
3. Calculate the degrees of freedom 1pt
n – 1 = 10-1 =9

4. Calculate the 95 % confidence limits of the mean

$t_{0.05(2),\,9} = 2.262$
$40 \pm 2.262(3.48) = 40 \pm 7.87$, i.e. 32.13 to 47.87 mg/g
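
As a check on questions 1, 3 and 4, the standard error, variance, degrees of freedom and confidence limits can be recomputed in a few lines. This is a minimal sketch; the variable names are illustrative, and the critical value 2.262 is taken from the key (a t table for 9 df) rather than computed.

```python
import math

s = 11.0        # sample standard deviation, mg/g
n = 10          # sample size
mean = 40.0     # sample mean, mg/g
t_crit = 2.262  # two-tailed t for alpha = 0.05 with 9 df (table value from the key)

variance = s ** 2
std_error = s / math.sqrt(n)
df = n - 1

# Confidence limits: mean +/- t * standard error.
half_width = t_crit * std_error
lower, upper = mean - half_width, mean + half_width

print(f"SE = {std_error:.2f}, variance = {variance:.0f}, df = {df}")
print(f"95% CL: {lower:.2f} to {upper:.2f} mg/g")
```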
5. What is the advantage of calculating 95% rather than another % confidence limits? Be specific. 2pt

Compromise between accuracy (high percent) and precision (narrow limits)

6. Is the sample size used adequate? Explain and use a formula to calculate what you consider a good sample size. 5pts
No, the confidence limits are broad.

Assume you want d to be 5% of the mean (d = 0.05 × 40 = 2 mg/g) and use the crude approximation of t = 2:

$n = t^2 s^2 / d^2 = (2^2)(11^2)/2^2 = 121$
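
The sample-size answer can be reproduced numerically. The sketch below assumes the common approximation n = t²s²/d², which matches the key's result of 121; the variable names are illustrative.

```python
s = 11.0         # sample standard deviation, mg/g
mean = 40.0      # sample mean, mg/g
t_approx = 2.0   # crude approximation of t, as stated in the key

# Desired half-width of the confidence interval: 5% of the mean.
d = 0.05 * mean

# Approximate required sample size (assumed formula: n = t^2 * s^2 / d^2).
n_required = (t_approx ** 2) * (s ** 2) / d ** 2

print(f"d = {d:.1f} mg/g, required n is about {n_required:.0f}")
```

Because the current sample has n = 10, this suggests roughly a twelvefold increase would be needed to reach that precision.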

7. Assume you wish to test whether the level in the sample above is significantly less than the value in Dutched
chocolate which is known to be 96mg/g. 11pts total

a. Write the null and alternate hypothesis in symbols. 2pts


Ho: µ ≥ 96
Ha: µ < 96

b. Based on the 95% confidence limits (#4) will you fail to reject the null hypothesis? Explain. 2pts

Reject Ho: 96 lies far above the upper 95% confidence limit (40 + 7.87 = 47.87).

c. Sketch the sampling distribution and label the reject and fail to reject regions. Include the value of the level of
significance and the critical value in your sketch. 4pts Assume that µ1 = the sample mean and expand your
drawing on the last page to indicate the probability of a Type II error and power. Label using appropriate
symbols. 3pts

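[Sketch: t sampling distribution centered at µ0 = 96; the reject region is the left tail (α = 0.05) below t crit = -1.833, and the fail-to-reject region lies to its right.]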
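
The key reaches its decision from the confidence limits, but the same conclusion can be checked against the critical value in the sketch with a one-sample t statistic. This is a hedged illustration, not part of the key: the names are made up, and -1.833 is the one-tailed table value quoted above.

```python
import math

mean = 40.0      # sample mean, mg/g
s = 11.0         # sample standard deviation, mg/g
n = 10           # sample size
mu0 = 96.0       # hypothesized value under Ho, mg/g
t_crit = -1.833  # one-tailed critical t, alpha = 0.05, 9 df (from the key)

# One-sample t statistic: (sample mean - mu0) / standard error.
t_stat = (mean - mu0) / (s / math.sqrt(n))

# Left-tailed test: reject Ho when the statistic falls below the critical value.
reject_h0 = t_stat < t_crit

print(f"t = {t_stat:.2f}, reject Ho: {reject_h0}")
```

The statistic falls far inside the reject region, which agrees with the confidence-limit argument in 7b.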

III. What’s wrong here? A glaring mistake in either experimental design or analysis has been made in the
following experiments. The mistake is stated in the brief description; it does not involve a factor that is simply
omitted from the description. State the error and how to correct it. 8pts

1. A study was performed to test the hypothesis that plant growth is affected by small amounts of acidity. Twenty
similar small plots of herbaceous vegetation were randomly assigned to either the control (watering with pH=7) or the
experimental treatment (watering with a mixture of nitric and sulfuric acid with pH=6.3). After one season of growth the
biomass (dry mass in kg) of each plot was measured. The mean of the control was 6.8 and the experimental 8.9.
When compared statistically they found a two-tailed p of 0.06. They concluded that this reduction in pH did not
affect growth.

They are accepting Ho. They should say there is no evidence that it affects growth.

Some credit for saying n is too small, since the results are not conclusive at p = 0.06.

2. In a study of factors that may prevent heart attacks, investigators wished to determine whether lycopene (an
antioxidant found in tomatoes) increases blood vessel dilation, thus decreasing the chance of a heart attack. They
divided 60 patients with cardiovascular disease randomly into 2 groups: one took a pill with lycopene and the other a
placebo. After 2 months those that took the lycopene pill showed a statistically significant improvement in blood
vessel dilation. The investigators concluded that lycopene protects against heart attacks.

The conclusions are not justified. They should say that lycopene does significantly increase blood vessel dilation in individuals
with CV disease. Note there are 3 problems: they can't extrapolate from dilation to heart attacks, they can't speak of non-CV patients,
and the statement may have been a bit too strong since the significant result has not been repeated.

IV. Show the formulas (2pts), sketch (1pt) and calculate (2pts) the probability. Give the name of the distribution
you used (1pt). 12 pts total

1. Given that for a particular laboratory the mean C reactive protein (associated with inflammation) in normal
individuals is 1.5 mg/dl with a variance of .72 what is the probability of finding a normal individual with a value
between 1 and 3?

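Normal distribution: $Z = \dfrac{X - \mu}{\sigma}$

$Z_1 = \dfrac{1 - 1.5}{\sqrt{0.72}} = -0.59$, $Z_2 = \dfrac{3 - 1.5}{\sqrt{0.72}} = 1.77$

From the Z table, the tail areas are $p(Z_1) = 0.2776$ and $p(Z_2) = 0.0384$.

$P = 1 - 0.2776 - 0.0384 = 0.6840$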
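
The Z calculation above can be verified without a table using Python's standard library. This is a sketch with illustrative variable names; small differences from 0.6840 come from the key rounding Z to two decimals before the table lookup.

```python
import math
from statistics import NormalDist

mu = 1.5         # mean C reactive protein, mg/dl
variance = 0.72  # variance given in the question
sigma = math.sqrt(variance)

# Standardize the two bounds.
z1 = (1 - mu) / sigma
z2 = (3 - mu) / sigma

# Probability of a value between 1 and 3: difference of the two CDF values.
dist = NormalDist(mu, sigma)
p_between = dist.cdf(3) - dist.cdf(1)

print(f"Z1 = {z1:.2f}, Z2 = {z2:.2f}, P(1 < X < 3) = {p_between:.4f}")
```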

2. Given the mean percentage of sexually active Canadians that have had at least one infection with HPV is 75% what is
the probability of finding more than 13 that have been infected in a random sample of 15?

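Binomial distribution: p = 0.75, q = 0.25, n = 15, X > 13

$p(x) = \dfrac{n!}{x!\,(n - x)!}\, p^{x} q^{n - x}$

P(15) = 0.0134, P(14) = 0.067

$P(X \ge 14) = P(14) + P(15) \approx 0.080$

[Sketch: binomial distribution over X = 0, 1, 2, ..., 15; left skew.]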
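
The two binomial terms and their sum can be checked directly from the formula. This is a minimal sketch; `binom_pmf` is an illustrative helper, not something named in the exam.

```python
from math import comb


def binom_pmf(x, n, p):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 15, 0.75

p14 = binom_pmf(14, n, p)
p15 = binom_pmf(15, n, p)

# "More than 13" in a sample of 15 means 14 or 15.
p_more_than_13 = p14 + p15

print(f"P(14) = {p14:.4f}, P(15) = {p15:.4f}, P(X > 13) = {p_more_than_13:.4f}")
```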

V. Answer the following briefly. 18 pts

1. Why does the formula for the statistical variance have n-1 as its denominator? Explain in detail. 4pts

To eliminate bias, since the value computed over n is a biased underestimate: the numerator is based on the range of the
sample, which will always be no larger than the range of the population. Dividing by n-1 makes up for this. Why -1? It is
the degrees of freedom, the number of deviations that are independent of each other. Only n-1 are, because in calculating
the variance you have forced the sum of the deviations to zero around the sample mean, not the parametric mean; thus only
n-1 deviations are independent of one another.
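
The bias argument can be illustrated by enumerating every possible sample from a small population and averaging the two versions of the variance. This is a sketch under stated assumptions: the population (the faces of a die) and the sample size of 3 are hypothetical choices for illustration, and sampling is with replacement.

```python
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # hypothetical population, chosen for illustration
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # parametric variance

n = 3  # sample size
biased_vals, unbiased_vals = [], []

# Enumerate all ordered samples of size n drawn with replacement.
for sample in product(population, repeat=n):
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)  # sum of squares around the sample mean
    biased_vals.append(ss / n)
    unbiased_vals.append(ss / (n - 1))

mean_biased = sum(biased_vals) / len(biased_vals)
mean_unbiased = sum(unbiased_vals) / len(unbiased_vals)

print(f"sigma^2 = {sigma2:.4f}")
print(f"average of SS/n     = {mean_biased:.4f}")
print(f"average of SS/(n-1) = {mean_unbiased:.4f}")
```

Averaged over all samples, dividing by n falls short of the parametric variance by the factor (n-1)/n, while dividing by n-1 recovers it.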

2. If you could use either normal or t which would be better? Why? 2pts

Normal is better; it is based on the parameter σ rather than the estimate s, and thus has a narrower sampling distribution and more
power.

3. Under what circumstances would you use the t distribution rather than the Normal to calculate probabilities? 1pt

When you do not have the parametric standard deviation (σ), only s. Yes, you can use the Normal for large sample sizes, but as far as
I am concerned that means at least over 500.

4. Which statistic is best to estimate population dispersion? Give the name and symbol. 3pts

Standard deviation, s. Note it is a statistic, not a parameter.

5. List the properties of the sampling distribution of the mean. 3pts

Usually normally distributed

$\mu_{\bar{X}} = \mu$

$\sigma_{\bar{X}} = \sigma/\sqrt{n}$

6. Write the Z formula for sample means. 2pts

$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$

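7. Derive the formula actually used in one-sample tests to compare a sample mean with a given value, from #6 using #5.
3pts

Using these equalities:

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, so $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$; estimating $\sigma$ with $s$ gives $t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$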


VI. Based on measures made by ornithology classes over the previous 15 years, the mean number of perching bird
species in a particular woods has been 5.9. One year after spraying with herbicide three species were found.
Does this value indicate a decrease in the number of species? 16 pts

1. State your level of significance 1pt α = 0.05


2. State your hypotheses using symbols 2pts

Ho: µ≥ 5.9
H1: µ< 5.9

3. Find the p value. Indicate it in a sketch shading the appropriate area. 8pts

p(0) + p(1) + p(2) + p(3)

$p(X) = \dfrac{e^{-\lambda} \lambda^{X}}{X!}$ with λ = 5.9    3pts for formula

0.003 + 0.016 + 0.047 + 0.094 = 0.160    3pts

[Sketch: Poisson distribution over X = 0, 1, 2, 3, 4, 5, 6, ... ∞.]    2pts for graph. I gave this point to those who were using normal or t.

4. State your statistical conclusions. 2pts


0.160 > 0.05: fail to reject Ho
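
The p value and the decision can be reproduced from the Poisson formula. This is a minimal sketch; `poisson_pmf` and the variable names are illustrative, and λ = 5.9 is the long-term mean given in the question.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability of observing exactly x events when the mean is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 5.9      # long-term mean number of species
observed = 3   # species found after spraying
alpha = 0.05   # level of significance

# One-tailed p value: probability of 3 or fewer species under Ho.
terms = [poisson_pmf(x, lam) for x in range(observed + 1)]
p_value = sum(terms)

reject_h0 = p_value < alpha

print(f"p = {p_value:.3f}, reject Ho: {reject_h0}")
```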

5. Verbalize your conclusions. 3pts

The number of species is not significantly lower, or: we have no evidence that it is lower.


Do not say we have evidence it is not lower; that is accepting Ho.
