
Quiz, Lesson 1: Introduction to Statistics

Select the best answer for each question. When you are finished, click Submit Quiz.
1. For an asymmetric (or skewed) distribution, which of the following statistics is a good
measure for the middle of the data?

a. mean

b. median

c. either mean or median


2. Which of the following code examples correctly calculates descriptive statistics of
popcorn yield (Yield) for each level of the class variable (Type) in the data
set Statdata.Popcorn, as well as statistics for all levels combined?
The output should include the following statistics: sample size, mean, median,
standard deviation, variance, range, and interquartile range.

a. proc means data=statdata.popcorn
maxdec=2 fw=10
n mean median std var
range qrange;
class Type;
var Yield;
run;

b. proc means data=statdata.popcorn
maxdec=2 fw=10
printalltypes
n mean median std var
range qrange;
class Yield;
var Class;
run;

c. proc means data=statdata.popcorn
maxdec=2 fw=10
printalltypes
n mean median std var
range qrange;
class Type;
var Yield;
run;

d. proc means data=statdata.popcorn
maxdec=2 fw=10
printalltypes
n mean median std range
IQR;
class Type;
var Yield;
run;


3. Read the following statement about the central limit theorem and choose the
answer that contains the correct values for all of the missing fields:
The central limit theorem states that the distribution of sample __(1)__ is
approximately __(2)__, regardless of the distribution of the population data, as
long as the sample size is at least n = __(3)__.

a. means, skewed, 20

b. variance, equal, 30

c. means, normal, 30

d. proportions, equal, 10


4. Psychologists at a college want to know if students are sleeping more or less
than the recommended average of 8 hours a day.
Which of the following code choices correctly tests the null hypothesis?

a. proc univariate data=statdata.sleep mu0<>8;
var hours;
run;

b. proc univariate data=statdata.sleep;
var hours / mu0=8;
run;

c. proc univariate data=statdata.sleep;
var hours / mu0<>8;
run;

d. proc univariate data=statdata.sleep mu0=8;
var hours;
run;


5. How do you define the term power?

a. the measure of the ability of the statistical hypothesis test to reject the
null hypothesis when it is actually false

b. the probability of committing a Type I error

c. the probability of failing to reject the null hypothesis when it is actually
false


6. Select the choice that lists only continuous variables.

a. body temperature, number of children, gender, beverage size

b. age, body temperature, gas mileage, income

c. number of children, gender, gas mileage, income

d. gender, gas mileage, beverage size, income

7. Which of the following code choices creates a histogram for the
variable Speed from the data set SpeedTest with a normal curve overlay and a
box with the skewness and kurtosis statistics printed in the northeast corner?

a. proc univariate data=statdata.speedtest;
histogram Speed / normal(mu=est sigma=est);
inset skewness kurtosis;
run;

b. proc univariate data=statdata.speedtest;
histogram Speed / normal;
inset skewness kurtosis / position=ne;
run;

c. proc univariate data=statdata.speedtest;
histogram Speed / normal(mu=est sigma=est);
inset skewness kurtosis / position=ne;
run;

d. proc univariate data=statdata.speedtest;
histogram Speed / normal(skewness kurtosis);
run;


8. Select the statement below that incorrectly interprets a 95% confidence interval
(15.02, 15.04) for the population mean, if the sample mean is 15.03 ounces of
cereal.

a. You are 95% confident that the true average weight for a box of cereal
is between 15.02 and 15.04 ounces.

b. The probability is .95 that the true average weight is between 15.02
and 15.04 ounces.

c. In the long run, approximately 95% of the intervals calculated with this
procedure will capture the true average weight.


9. The shape of a normal distribution depends on the value of which two
parameters?

a. the mean x̄ and the standard deviation s

b. the standard deviation and the variance

c. the mean μ and the standard deviation σ

d. none of the above


10. The standard error of the mean is

a. used to calculate confidence intervals of the mean.

b. always normally distributed.

c. sometimes less than 0.

d. none of the above







































Quiz Feedback, Lesson 1: Introduction to Statistics

Your Score:
90%
Congratulations! Your score of 90% indicates that you've mastered the
topics in this lesson. If you'd like, check the feedback below and
select Review links for any sections containing topics you want to
review.



1. For an asymmetric (or skewed) distribution, which of the following statistics
is a good measure for the middle of the data?
a. mean
b. median
c. either mean or median

Your answer: b
Correct answer: b
The median is not affected by outliers and is less affected by the skewness.
The mean, on the other hand, averages in any outliers that might be in your
data.
Review: Measures of Location
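
The effect of an outlier on the mean can be checked with a small example. The sketch below is not part of the course material: the data set work.skewed and its ten Income values are made up for illustration. For these values the mean is 48.50, while the median is 26.50 and stays close to the bulk of the data.

```sas
/* Hypothetical right-skewed data: nine small values and one large outlier */
data work.skewed;
   input Income @@;
   datalines;
21 23 24 25 26 27 28 30 31 250
;
run;

/* The outlier pulls the mean upward; the median is unchanged by its size */
proc means data=work.skewed n mean median maxdec=2;
   var Income;
run;
```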


2. Which of the following code examples correctly calculates descriptive statistics of
popcorn yield (Yield) for each level of the class variable (Type) in the data
set Statdata.Popcorn, as well as statistics for all levels combined?
The output should include the following statistics: sample size, mean, median,
standard deviation, variance, range, and interquartile range.
a. proc means data=statdata.popcorn
maxdec=2 fw=10
n mean median std var
range qrange;
class Type;
var Yield;
run;
b. proc means data=statdata.popcorn
maxdec=2 fw=10
printalltypes
n mean median std var
range qrange;
class Yield;
var Class;
run;
c. proc means data=statdata.popcorn
maxdec=2 fw=10
printalltypes
n mean median std var
range qrange;
class Type;
var Yield;
run;
d. proc means data=statdata.popcorn
maxdec=2 fw=10
printalltypes
n mean median std range
IQR;
class Type;
var Yield;
run;

Your answer: c
Correct answer: c
The PROC MEANS statement must include the option PRINTALLTYPES in
order for SAS to display statistics for all requested combinations of class
variables, that is, for each level or occurrence of the variable and for all
occurrences combined. The statistics specified on the second line must
include the keywords N MEAN MEDIAN STD VAR RANGE QRANGE. The
code must specify Type as the class variable and Yield as the analysis
variable.
Review: The MEANS Procedure
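
To try the correct step without access to Statdata.Popcorn, a small stand-in data set is enough. In the sketch below, work.popcorn, its two Type levels, and the yield values are all invented for illustration. With PRINTALLTYPES, the output should contain one block for all observations combined and one block per level of Type.

```sas
/* Hypothetical stand-in for Statdata.Popcorn: two types, made-up yields */
data work.popcorn;
   input Type $ Yield @@;
   datalines;
Gourmet 9.1 Gourmet 8.7 Gourmet 9.4 Gourmet 8.9
Plain 7.8 Plain 8.2 Plain 7.5 Plain 8.0
;
run;

/* Same options as answer c, pointed at the stand-in data set */
proc means data=work.popcorn
           maxdec=2 fw=10
           printalltypes
           n mean median std var
           range qrange;
   class Type;
   var Yield;
run;
```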


3. Read the following statement about the central limit theorem and choose the
answer that contains the correct values for all of the missing fields:
The central limit theorem states that the distribution of sample __(1)__ is
approximately __(2)__, regardless of the distribution of the population data,
as long as the sample size is at least n = __(3)__.
a. means, skewed, 20
b. variance, equal, 30
c. means, normal, 30
d. proportions, equal, 10

Your answer: c
Correct answer: c
The central limit theorem states that the distribution of sample means is
approximately normal, regardless of the distribution of the population data, as
long as the sample size is at least n = 30.
Review: Normality and the Central Limit Theorem
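
A simulation is one way to see this behavior. The sketch below is an illustration only, not course code: it assumes an exponential population (strongly skewed), draws 1000 samples of size 30, and plots the sample means with a normal curve overlay. The data set names, the seed, and the number of samples are arbitrary choices, and n = 30 is the course's rule of thumb rather than a strict threshold.

```sas
/* Draw 1000 samples of n = 30 from a skewed (exponential) population */
data work.samples;
   call streaminit(12345);          /* arbitrary seed for reproducibility */
   do SampleID = 1 to 1000;
      do i = 1 to 30;
         x = rand('exponential');
         output;
      end;
   end;
run;

/* Compute the mean of each sample */
proc means data=work.samples noprint;
   by SampleID;
   var x;
   output out=work.sample_means mean=xbar;
run;

/* The histogram of sample means should look roughly bell-shaped */
proc univariate data=work.sample_means;
   var xbar;
   histogram xbar / normal(mu=est sigma=est);
run;
```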


4. Psychologists at a college want to know if students are sleeping more or less
than the recommended average of 8 hours a day.
Which of the following code choices correctly tests the null hypothesis?
a. proc univariate data=statdata.sleep mu0<>8;
var hours;
run;
b. proc univariate data=statdata.sleep;
var hours / mu0=8;
run;
c. proc univariate data=statdata.sleep;
var hours / mu0<>8;
run;
d. proc univariate data=statdata.sleep mu0=8;
var hours;
run;

Your answer: d
Correct answer: d
You specify the MU0= option as part of the PROC UNIVARIATE statement to
indicate the test value of the null hypothesis. The alternative hypothesis is
that μ is not equal to 8 hours, but this does not need to be specified in the
PROC UNIVARIATE code.
Review: Using PROC UNIVARIATE to Generate a t Statistic
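
The correct step can be tried on a small stand-in data set. In the sketch below, work.sleep and its ten Hours values are made up for illustration. The Tests for Location table in the output reports the Student's t statistic and a two-sided p-value for the null hypothesis that the mean equals 8.

```sas
/* Hypothetical stand-in for Statdata.Sleep: made-up hours of sleep */
data work.sleep;
   input Hours @@;
   datalines;
6.5 7.0 8.5 7.5 6.0 9.0 7.0 8.0 6.5 7.5
;
run;

/* MU0= goes on the PROC UNIVARIATE statement, not on the VAR statement */
proc univariate data=work.sleep mu0=8;
   var Hours;
run;
```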


5. How do you define the term power?
a. the measure of the ability of the statistical hypothesis test to reject the
null hypothesis when it is actually false
b. the probability of committing a Type I error
c. the probability of failing to reject the null hypothesis when it is actually
false

Your answer: a
Correct answer: a
Power is the ability of the statistical test to detect a true difference, or the ability to
successfully reject a false null hypothesis. The probability of committing a Type I
error is α. Failing to reject the null hypothesis when it is actually false is a
Type II error; its probability is β.
Review: Types of Errors and Power


6. Select the choice that lists only continuous variables.
a. body temperature, number of children, gender, beverage size
b. age, body temperature, gas mileage, income
c. number of children, gender, gas mileage, income
d. gender, gas mileage, beverage size, income

Your answer: b
Correct answer: b
The continuous variables are age, body temperature, gas mileage, and income.
Review: Types of Variables: Quantitative and Categorical


7. Which of the following code choices creates a histogram for the
variable Speed from the data set SpeedTest with a normal curve overlay and
a box with the skewness and kurtosis statistics printed in the northeast
corner?
a. proc univariate data=statdata.speedtest;
histogram Speed / normal(mu=est sigma=est);
inset skewness kurtosis;
run;
b. proc univariate data=statdata.speedtest;
histogram Speed / normal;
inset skewness kurtosis / position=ne;
run;
c. proc univariate data=statdata.speedtest;
histogram Speed / normal(mu=est sigma=est);
inset skewness kurtosis / position=ne;
run;
d. proc univariate data=statdata.speedtest;
histogram Speed / normal(skewness kurtosis);
run;

Your answer: c
Correct answer: c
In the HISTOGRAM statement, you specify the Speed variable and the NORMAL
option using estimates of the population mean and the population standard
deviation. In the INSET statement, you specify the keywords SKEWNESS and
KURTOSIS, as well as the POSITION=NE option.
Review: The UNIVARIATE Procedure


8. Select the statement below that incorrectly interprets a 95% confidence
interval (15.02, 15.04) for the population mean, if the sample mean is 15.03
ounces of cereal.
a. You are 95% confident that the true average weight for a box of cereal is
between 15.02 and 15.04 ounces.
b. The probability is .95 that the true average weight is between 15.02 and
15.04 ounces.
c. In the long run, approximately 95% of the intervals calculated with this
procedure will capture the true average weight.

Your answer: c
Correct answer: b
A 95% confidence interval means that you are 95% confident that the interval
contains the true population mean. If you sample repeatedly and calculate a
confidence interval for each sample mean, 95% of the time your confidence
interval will contain the true population mean. A confidence interval is not a
probability. When a confidence interval is calculated, the true mean is in the
interval or it is not. There is no probability associated with it.
Review: Confidence Intervals
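
As a worked check of the interval in this question, the usual t-based formula for a 95% confidence interval of the mean can be written out. The symbols s (sample standard deviation) and n (sample size) are not given in the quiz, so only the margin of error can be recovered from the stated limits.

```latex
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
\qquad\text{with}\qquad
\text{margin} = \frac{15.04 - 15.02}{2} = 0.01,
\qquad
15.03 \pm 0.01 = (15.02,\ 15.04)
```

The 95% describes how often this procedure captures the true mean over repeated samples, not the chance that this particular interval contains it.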


9. The shape of a normal distribution depends on the value of which two
parameters?
a. the mean x̄ and the standard deviation s
b. the standard deviation and the variance
c. the mean μ and the standard deviation σ
d. none of the above

Your answer: c
Correct answer: c
The shape of a normal distribution depends on the value of two parameters, the
mean μ and the standard deviation σ.
Review: Normal Distribution


10. The standard error of the mean is
a. used to calculate confidence intervals of the mean.
b. always normally distributed.
c. sometimes less than 0.
d. none of the above

Your answer: a
Correct answer: a
The standard error of the mean is part of the equation used to calculate a
confidence interval of the mean. It is not normally distributed, and it is never less
than 0.
Review: Point Estimators, Variability, and Standard Error, Interval Estimators
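
The link between the standard error and the confidence interval can be seen by requesting both in one step. The sketch below is illustrative only: work.cereal and its eight Weight values are made up. STDERR prints the standard error of the mean, and CLM prints the confidence limits that are built from it.

```sas
/* Hypothetical cereal box weights in ounces */
data work.cereal;
   input Weight @@;
   datalines;
15.01 15.04 15.03 15.02 15.05 15.03 15.02 15.04
;
run;

/* Standard error and 95% confidence limits for the mean */
proc means data=work.cereal n mean std stderr clm alpha=0.05 maxdec=4;
   var Weight;
run;
```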














Quiz, Lesson 2: Analysis of Variance (ANOVA)


Select the best answer for each question. When you are finished, click Submit Quiz.
1. Given this SAS output, is there sufficient evidence to reject the assumption of equal variances?

a. yes

b. no


2. Given this SAS output, is there sufficient evidence to reject the hypothesis of equal means?

a. yes

b. no


3. The manufacturer of a cereal company uses two different processes to package boxes of cereal.
He wants to be sure the two processes are putting the same amount of cereal in each box. He
plans to perform a two-sample t-test to determine whether the mean weight of cereal is
significantly different between the two processes. What type of test should he run?

a. an upper-tailed t-test

b. a two-sided t-test

c. a lower-tailed t-test


4. Which of the following is not an assumption you make when including a blocking factor in an
ANOVA randomized block design?

a. The treatments are randomly assigned within each block.

b. The errors are normally distributed.

c. The effects of the treatment factor are constant across the levels of the blocking
variable.

d. The observations are dependent.


5. When you perform ANOVA for a randomized block design, where do you indicate your blocking
variable to SAS?

a. PROC GLM statement

b. MODEL statement only

c. CLASS statement and MODEL statement

d. LSMEANS statement


6. If your blocking variable has a very small F-value in the ANOVA report, what would be a valid
next step?

a. Remove it from the MODEL statement and re-run the analysis.

b. Test an interaction term.

c. Report the F-value and plan a new study.


7. The Dunnett method compares all possible pairs of means, so it can only be used when you
make pairwise comparisons.

a. true

b. false


8. You can examine Levene's Test for Homogeneity to more formally test which of the following
assumptions?

a. the assumption of errors being normally distributed

b. the assumption of independent observations

c. the assumption of equal variances

d. the assumption of treatments being randomly assigned


9. When you perform a two-way ANOVA in SAS, which of the following statements correctly defines
the model that includes the interaction between the two main effect variables?

a. class Drug*Disease;

b. class Drug=Disease;

c. model Drug*Disease;

d. model Health=Drug Disease Drug*Disease;


10. This table shows output from a post hoc pairwise comparison in which you tested the significance
of a drug on patients' health for three different diseases. What conclusion can you make based
on this output?

a. The drug effect is significant when used in patients with disease Z.

b. The drug effect is significant when used in patients with diseases Y and Z.

c. The drug effect is not significant when used in patients with disease Z.


Quiz, Lesson 3: Regression


Select the best answer for each question. When you are finished, click Submit Quiz.
1. Based on this correlation matrix, what type of relationship do Performance and RunTime have?
Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

              Performance    RunTime        Age
Performance       1.00000   -0.82049   -0.71257
                              <.0001     <.0001
RunTime          -0.82049    1.00000    0.19523
                   <.0001                0.2926
Age              -0.71257    0.19523    1.00000
                   <.0001     0.2926

a. a fairly strong, positive linear relationship

b. a fairly strong, negative linear relationship

c. a fairly weak, positive linear relationship

d. a fairly weak, negative linear relationship


2. When Oxygen_Consumption is regressed on RunTime, Age, Run_Pulse,
and Maximum_Pulse, the parameter estimate for Age is -2.78. What does this mean?

a. For each year older, the predicted value of oxygen consumption is 2.78 greater.

b. For each year older, the predicted value of oxygen consumption is 2.78 lower.

c. For every 2.78 years older, oxygen consumption doubles.

d. For every 2.78 years younger, oxygen consumption doubles.


3. Given the information in this summary of variable selection, which stepwise selection method
was specified in the PROC REG step?

Step  Variable  Variable  Number   Partial   Model     C(p)    F Value  Pr > F
      Entered   Removed   Vars In  R-Square  R-Square
1     RunTime             1        0.7434    0.7434    3.3432  84.00    <.0001
2     Age                 2        0.0213    0.7647    2.8192  2.54     0.1222

a. FORWARD

b. BACKWARD

c. STEPWISE

d. can't tell from the information given


4. Here is a table of output statistics from PROC REG. If you sample a new value of the dependent
variable when Performance equals 55, what are the lower and upper prediction limits for this
newly sampled individual value?
Output Statistics

Obs  Name   Performance  Dependent  Predicted  Std Error     95% CL Mean       95% CL Predict    Residual
                         Variable   Value      Mean Predict
1    Jack   48           40.8400    44.9026    1.0190        42.0732  47.7319  37.4190  52.3861  -4.0626
2    Annie  43           45.1200    45.3793    1.3081        41.7475  49.0112  37.5570  53.2016  -0.2593
3    Kate   55           44.7500    44.2351    1.4885        40.1023  48.3678  36.1680  52.3021   0.5149
4    Carl   40           46.0800    45.6654    1.6493        41.0862  50.2446  37.3608  53.9700   0.4146
5    Don    58           44.6100    43.9490    1.8646        38.7719  49.1261  35.3003  52.5977   0.6610
6    Effie  45           47.9200    45.1886    1.1361        42.0343  48.3429  37.5763  52.8009   2.7314

a. 44.7500 and 44.2351

b. 40.1023 and 48.3678

c. 36.1680 and 52.3021

d. can't tell from the information given


5. Which of the following statements describes a positive linear relationship between two
variables?
   1. The more I eat, the less I want to exercise.
   2. The more salty snacks I eat, the more water I want to drink.
   3. No matter how much I exercise, I still weigh the same.

a. 1 only

b. 1 and 2

c. 2 only

d. 2 and 3

e. 3 only


6. What output does this program produce?
proc corr data=statdata.bodyfat2 nosimple
plots=matrix(nvar=all histogram);
var Age Weight Height;
run;

a. individual correlation plots and simple descriptive statistics

b. a scatter plot matrix only, with histograms along its diagonal

c. a table of correlations and a scatter plot matrix with histograms along its diagonal

d. can't tell from the information given


7. How many of the following models meet Mallows' Cp criterion for model selection?

Model  Number in  C(p)    R-Square  Variables in Model
Index  Model
1      7          5.8653  0.7445    Age Weight Neck Abdomen Thigh Forearm Wrist
2      8          5.8986  0.7466    Age Weight Neck Abdomen Hip Thigh Forearm Wrist
3      8          6.4929  0.7459    Age Weight Neck Abdomen Thigh Biceps Forearm Wrist
4      9          6.7834  0.7477    Age Weight Neck Abdomen Hip Thigh Biceps Forearm Wrist
5      7          6.9017  0.7434    Age Weight Neck Abdomen Biceps Forearm Wrist

a. 0

b. 1

c. 3

d. 5


8. In this PROC SCORE step, which option specifies the data set containing the parameter
estimates that are used to score observations?
proc score data=dataset1 score=dataset2
out=dataset3 type=parms;
var Performance;
run;

a. the DATA= option

b. the SCORE= option

c. the OUT= option


9. According to these parameter estimates, are any of the variables in the model statistically
significant in predicting or explaining the percentage of body fat?
Parameter Estimates

Variable   DF  Parameter  Standard  t Value  Pr > |t|
               Estimate   Error
Intercept  1   -20.98714  5.55433   -3.78    0.0002
Age        1     0.01226  0.02836    0.43    0.6658
Hip        1    -0.40163  0.09994   -4.02    <.0001
Abdomen    1     0.86123  0.06814   12.64    <.0001

a. no

b. yes, Age

c. yes, Hip and Abdomen

d. yes, Age, Hip, and Abdomen


10. What output does this program produce?
proc reg data=statdata.bodyfat2 plots(only)=(cp);
model PctBodyFat2 = Age Weight Height
Neck Chest Abdomen Hip Thigh
Knee Ankle Biceps Forearm Wrist
/ selection=cp best=15;
run;
quit;

a. only models that meet both Mallows' and Hocking's Cp criteria for model selection

b. the best 15 models that meet the criteria for the forward, backward, and stepwise
selection methods

c. a set of the best 15 candidate models according to the Cp statistic generated using the
all-possible regressions technique

d. can't tell from the information given




































Quiz Feedback, Lesson 2: Analysis of Variance (ANOVA)

Your Score:
90%
Congratulations! Your score of 90% indicates that you've mastered the
topics in this lesson. If you'd like, check the feedback below and
select Review links for any sections containing topics you want to
review.



1. Given this SAS output, is there sufficient evidence to reject the assumption of equal
variances?

a. yes
b. no

Your answer: b
Correct answer: b
The p-value of 0.2942 is greater than 0.05, so you fail to reject the null hypothesis and
conclude that the variances are equal.
Review: The GLM Procedure


2. Given this SAS output, is there sufficient evidence to reject the hypothesis of equal means?

a. yes
b. no

Your answer: a
Correct answer: a
The p-value of <.001 is less than 0.05, so you would reject the null hypothesis and conclude
that the means between the two groups are significantly different.
Review: Examining the Equal Variance t-Test and p-Values


3. The manufacturer of a cereal company uses two different processes to package boxes of
cereal. He wants to be sure the two processes are putting the same amount of cereal in each
box. He plans to perform a two-sample t-test to determine whether the mean weight of cereal
is significantly different between the two processes. What type of test should he run?
a. an upper-tailed t-test
b. a two-sided t-test
c. a lower-tailed t-test

Your answer: x
Correct answer: b
Because the cereal manufacturer is interested in determining whether the two processes
produce a different mean cereal weight, he needs to perform a two-sided t-test.
Review: Scenario: Comparing Group Means, Scenario: Testing for Differences on One Side

4. Which of the following is not an assumption you make when including a blocking factor in an
ANOVA randomized block design?
a. The treatments are randomly assigned within each block.
b. The errors are normally distributed.
c. The effects of the treatment factor are constant across the levels of the blocking
variable.
d. The observations are dependent.

Your answer: d
Correct answer: d
In an ANOVA model, you assume that the errors are normally distributed for each treatment,
the errors have equal variances across treatments, and the observations are independent.
When you add a blocking factor to your ANOVA model, you also assume that the treatments
are randomly assigned within each block and that the effects of the treatment are the same
within each block.
Review: More ANOVA Assumptions

5. When you perform ANOVA for a randomized block design, where do you indicate your
blocking variable to SAS?
a. PROC GLM statement
b. MODEL statement only
c. CLASS statement and MODEL statement
d. LSMEANS statement

Your answer: c
Correct answer: c
You list the blocking variable in the CLASS statement. You also specify the variables as
indicated in the ANOVA model, so you list the blocking variable in the MODEL statement.
Review: Performing ANOVA with Blocking
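
A randomized block analysis in PROC GLM might look like the sketch below. It is an illustration under stated assumptions, not course code: work.blocked, the variable names Block, Treatment, and Response, and all data values are hypothetical. The blocking variable appears in both the CLASS statement and the MODEL statement.

```sas
/* Hypothetical randomized block data: 3 treatments within each of 4 blocks */
data work.blocked;
   input Block $ Treatment $ Response @@;
   datalines;
B1 A 20 B1 B 24 B1 C 27
B2 A 18 B2 B 23 B2 C 25
B3 A 22 B3 B 26 B3 C 30
B4 A 19 B4 B 22 B4 C 26
;
run;

/* Block is listed as a classification variable and as a model effect */
proc glm data=work.blocked;
   class Treatment Block;
   model Response = Treatment Block;
run;
quit;
```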

6. If your blocking variable has a very small F-value in the ANOVA report, what would be a valid
next step?
a. Remove it from the MODEL statement and re-run the analysis.
b. Test an interaction term.
c. Report the F-value and plan a new study.

Your answer: c
Correct answer: c
If the F-value for the blocking variable is small, that is, if it's less than 1, then adding the
blocking factor was not helpful in your analysis. Because you collect the data based on the
blocking factor, your only choice is to report the F-value and plan a new study. The blocking
factor must be included in all ANOVA models that you calculate with the sample that you've
already collected.
Review: Performing ANOVA with Blocking

7. The Dunnett method compares all possible pairs of means, so it can only be used when you
make pairwise comparisons.
a. true
b. false

Your answer: b
Correct answer: b
The Tukey method and the pairwise t-tests are two methods you learned about that compare
all possible pairs of means, so they can only be used when you make pairwise comparisons.
The Dunnett method compares all categories to a control group.
Review: Dunnett's Multiple Comparison Method, Tukey's Multiple Comparison Method
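
The difference between the two methods shows up in how the comparisons are requested. The sketch below is illustrative only: work.trial, the Group and Score variables, the level name Control, and the data values are all hypothetical. The first LSMEANS statement compares each group to the control (Dunnett); the second compares every pair of groups (Tukey).

```sas
/* Hypothetical one-way layout with a control group and two drugs */
data work.trial;
   input Group $ Score @@;
   datalines;
Control 50 Control 52 Control 49 Control 51
DrugA 55 DrugA 57 DrugA 54 DrugA 56
DrugB 53 DrugB 52 DrugB 55 DrugB 54
;
run;

proc glm data=work.trial;
   class Group;
   model Score = Group;
   /* Dunnett: each non-control level versus the Control level */
   lsmeans Group / pdiff=control('Control') adjust=dunnett;
   /* Tukey: all pairwise comparisons */
   lsmeans Group / pdiff=all adjust=tukey;
run;
quit;
```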

8. You can examine Levene's Test for Homogeneity to more formally test which of the following
assumptions?
a. the assumption of errors being normally distributed
b. the assumption of independent observations
c. the assumption of equal variances
d. the assumption of treatments being randomly assigned

Your answer: c
Correct answer: c
You use Levene's Test for Homogeneity in PROC GLM to verify the assumption of equal
variances in a one-way ANOVA model.
Review: The GLM Procedure
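
Levene's test is requested through the HOVTEST= option of the MEANS statement in PROC GLM. The sketch below uses a hypothetical data set (work.groups, with made-up Group and Score values) and assumes a one-way model, which is the setting in which this option applies.

```sas
/* Hypothetical one-way data: three groups, made-up scores */
data work.groups;
   input Group $ Score @@;
   datalines;
A 50 A 52 A 49 A 51
B 55 B 59 B 52 B 56
C 53 C 48 C 57 C 54
;
run;

proc glm data=work.groups;
   class Group;
   model Score = Group;
   /* Levene's test of equal variances across the levels of Group */
   means Group / hovtest=levene;
run;
quit;
```

A large p-value in the Levene table means there is not enough evidence to reject equal variances.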

9. When you perform a two-way ANOVA in SAS, which of the following statements correctly
defines the model that includes the interaction between the two main effect variables?
a. class Drug*Disease;
b. class Drug=Disease;
c. model Drug*Disease;
d. model Health=Drug Disease Drug*Disease;

Your answer: d
Correct answer: d
In the MODEL statement, you first specify the main effect variables as they exist in the
two-way ANOVA model. You then define the interaction term by separating the two main effect
variables with an asterisk in the MODEL statement.
Review: Performing Two-Way ANOVA with Interactions, Applying the Two-Way ANOVA
Model
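
A complete two-way ANOVA step built around that MODEL statement might look like the sketch below. The data set work.drugs and its values are made up for illustration; only the variable names Drug, Disease, and Health come from the question.

```sas
/* Hypothetical two-way layout: 2 drugs by 2 diseases, 2 patients per cell */
data work.drugs;
   input Drug $ Disease $ Health @@;
   datalines;
A X 30 A X 32 A Y 25 A Y 27
B X 28 B X 29 B Y 35 B Y 36
;
run;

proc glm data=work.drugs;
   class Drug Disease;
   /* Main effects first, then the interaction term */
   model Health = Drug Disease Drug*Disease;
run;
quit;
```

PROC GLM also accepts the bar operator, so `model Health = Drug|Disease;` expands to the same three effects.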

10. This table shows output from a post hoc pairwise comparison in which you tested the
significance of a drug on patients' health for three different diseases. What conclusion can you
make based on this output?

a. The drug effect is significant when used in patients with disease Z.
b. The drug effect is significant when used in patients with diseases Y and Z.
c. The drug effect is not significant when used in patients with disease Z.

Your answer: c
Correct answer: c
The p-value for disease Z is 0.7815. Because this p-value is greater than your alpha of 0.05,
you fail to reject the null hypothesis and conclude that there is no significant effect
of Drug on Health for patients with disease Z.
Review: Performing a Post Hoc Pairwise Comparison


















Quiz Feedback, Lesson 3: Regression

Your Score:
60%
Your score of 60% indicates that you would benefit from reviewing
topics in this lesson. Check the feedback below and
select Review links for questions you missed. When you're ready,
take the quiz again.



1. Based on this correlation matrix, what type of relationship
do Performance and RunTime have?
Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

              Performance    RunTime        Age
Performance       1.00000   -0.82049   -0.71257
                              <.0001     <.0001
RunTime          -0.82049    1.00000    0.19523
                   <.0001                0.2926
Age              -0.71257    0.19523    1.00000
                   <.0001     0.2926

a. a fairly strong, positive linear relationship
b. a fairly strong, negative linear relationship
c. a fairly weak, positive linear relationship
d. a fairly weak, negative linear relationship

Your answer: b
Correct answer: b
The correlation coefficient for the relationship between Performance and RunTime is
-0.82049, which is negative. Its absolute value is close to 1, making it a relatively strong
relationship.
Review: Using Correlation to Measure Relationships between Continuous Variables

2. When Oxygen_Consumption is regressed on RunTime, Age, Run_Pulse,
and Maximum_Pulse, the parameter estimate for Age is -2.78. What does this mean?
a. For each year older, the predicted value of oxygen consumption is 2.78 greater.
b. For each year older, the predicted value of oxygen consumption is 2.78 lower.
c. For every 2.78 years older, oxygen consumption doubles.
d. For every 2.78 years younger, oxygen consumption doubles.
Your answer: d
Correct answer: b
The parameter estimate for Age is the average change in Oxygen_Consumption for a 1-unit
change in Age, with the other predictors held constant. In this case, the parameter estimate
is negative, so for each year older (a 1-unit change in Age), predicted oxygen consumption
decreases by 2.78 units.
Review: The Simple Linear Regression Model
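For context, a regression like the one in this question could be fit with a step along these lines. This is only a sketch: the data set name statdata.fitness is an assumption, while the variable names come from the question. The Age row of the resulting Parameter Estimates table is where a value such as -2.78 would appear.

```sas
/* Multiple linear regression (sketch).
   statdata.fitness is an assumed data set name. */
proc reg data=statdata.fitness;
   /* Each slope is the change in predicted Oxygen_Consumption for a
      1-unit change in that predictor, holding the others constant. */
   model Oxygen_Consumption = RunTime Age Run_Pulse Maximum_Pulse;
run;
quit;
```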

3. Given the information in this summary of variable selection, which stepwise selection method
was specified in the PROC REG step?
      Variable  Variable  Number   Partial   Model
Step  Entered   Removed   Vars In  R-Square  R-Square  C(p)    F Value  Pr > F
1     RunTime             1        0.7434    0.7434    3.3432  84.00    <.0001
2     Age                 2        0.0213    0.7647    2.8192  2.54     0.1222
a. FORWARD
b. BACKWARD
c. STEPWISE
d. can't tell from the information given
Your answer: d
Correct answer: c
The summary table contains both Variable Entered and Variable Removed columns. Of the
three types of stepwise selection (forward, backward, and stepwise), only stepwise selection
can both enter and remove variables. Therefore, STEPWISE must have been specified in the
PROC REG step.
Review: The Stepwise Selection Approach to Model Building, Specifying Stepwise Selection
Methods in SAS, The REG Procedure: Performing Stepwise Regression
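A step that requests stepwise selection might be sketched as follows. The data set name and the list of candidate predictors are assumptions for illustration; the entry and stay significance levels are written out explicitly at 0.15, which is the documented PROC REG default for STEPWISE.

```sas
/* Stepwise selection in PROC REG (sketch).
   statdata.fitness and the candidate predictor list are assumptions. */
proc reg data=statdata.fitness;
   model Oxygen_Consumption = RunTime Age Run_Pulse Maximum_Pulse
         / selection=stepwise
           slentry=0.15      /* p-value a variable needs in order to enter */
           slstay=0.15;      /* p-value a variable needs in order to stay  */
run;
quit;
```

With an entry level of 0.15, a variable such as Age with Pr > F of 0.1222 would still be allowed to enter, which is consistent with the summary table in the question.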

4. Here is a table of output statistics from PROC REG. If you sample a new value of the
dependent variable when Performance equals 55, what are the lower and upper prediction
limits for this newly sampled individual value?
Output Statistics
                         Dependent  Predicted  Std Error
Obs  Name   Performance  Variable   Value      Mean Predict  95% CL Mean       95% CL Predict    Residual
1    Jack   48           40.8400    44.9026    1.0190        42.0732  47.7319  37.4190  52.3861  -4.0626
2    Annie  43           45.1200    45.3793    1.3081        41.7475  49.0112  37.5570  53.2016  -0.2593
3    Kate   55           44.7500    44.2351    1.4885        40.1023  48.3678  36.1680  52.3021  0.5149
4    Carl   40           46.0800    45.6654    1.6493        41.0862  50.2446  37.3608  53.9700  0.4146
5    Don    58           44.6100    43.9490    1.8646        38.7719  49.1261  35.3003  52.5977  0.6610
6    Effie  45           47.9200    45.1886    1.1361        42.0343  48.3429  37.5763  52.8009  2.7314
a. 44.7500 and 44.2351
b. 40.1023 and 48.3678
c. 36.1680 and 52.3021
d. can't tell from the information given
Your answer: d
Correct answer: c
The CLI option, which displays the 95% CL Predict column in the Output Statistics table,
produces confidence limits for an individual predicted value. In this table, the third
observation, for Kate, contains the value 55 for Performance. Therefore, the values in
her 95% CL Predict column are the lower and upper confidence limits for a new individual
value at the same value of Performance. In contrast, the CLM option displays the values in
the 95% CL Mean column, which are the lower and upper confidence limits for a mean
predicted value for each observation.
Review: Specifying Confidence and Prediction Intervals in SAS, Viewing and Printing
Confidence Intervals and Prediction Intervals, The REG Procedure: Producing Predicted
Values
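An Output Statistics table with both sets of limits could be requested roughly as shown below. This is a sketch under assumptions: the data set name statdata.fitness and the dependent variable Oxygen_Consumption are not stated in the question, and the ID statement is used here only to print Name and Performance beside each row.

```sas
/* Requesting mean and individual limits in PROC REG (sketch).
   statdata.fitness and Oxygen_Consumption are assumed names. */
proc reg data=statdata.fitness;
   model Oxygen_Consumption = Performance
         / clm    /* 95% CL Mean: limits for the mean predicted value   */
           cli;   /* 95% CL Predict: limits for a new individual value  */
   id Name Performance;   /* identifying columns in the Output Statistics table */
run;
quit;
```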

5. Which of the following statements describes a positive linear relationship between two
variables?
1. The more I eat, the less I want to exercise.
2. The more salty snacks I eat, the more water I want to drink.
3. No matter how much I exercise, I still weigh the same.
a. 1 only
b. 1 and 2
c. 2 only
d. 2 and 3
e. 3 only
Your answer: c
Correct answer: c
In statement 2, the amount of salty snacks eaten and thirst have a positive linear relationship.
As the values of one variable (amount of salty snacks eaten) increase, the values of the other
variable (thirst) increase as well.
Review: Using Scatter Plots to Describe Relationships between Continuous Variables, Using
Correlation to Measure Relationships between Continuous Variables

6. What output does this program produce?
proc corr data=statdata.bodyfat2 nosimple
          plots=matrix(nvar=all histogram);
   var Age Weight Height;
run;
a. individual correlation plots and simple descriptive statistics
b. a scatter plot matrix only, with histograms along its diagonal
c. a table of correlations and a scatter plot matrix with histograms along its diagonal
d. can't tell from the information given
Your answer: b
Correct answer: c
By default, PROC CORR produces a table of correlations (which can be a correlation matrix,
depending on your program). The NOSIMPLE option suppresses printing of the simple
descriptive statistics for each variable, and PLOTS=MATRIX requests a scatter plot matrix
instead of individual scatter plots. The HISTOGRAM option displays histograms of the
variables in the VAR statement along the diagonal of the scatter plot matrix.
Review: Producing a Correlation Matrix and a Scatter Plot Matrix

7. How many of the following models meet Mallows' Cp criterion for model selection?

Model  Number in
Index  Model      C(p)    R-Square  Variables in Model
1      7          5.8653  0.7445    Age Weight Neck Abdomen Thigh Forearm Wrist
2      8          5.8986  0.7466    Age Weight Neck Abdomen Hip Thigh Forearm Wrist
3      8          6.4929  0.7459    Age Weight Neck Abdomen Thigh Biceps Forearm Wrist
4      9          6.7834  0.7477    Age Weight Neck Abdomen Hip Thigh Biceps Forearm Wrist
5      7          6.9017  0.7434    Age Weight Neck Abdomen Biceps Forearm Wrist

a. 0
b. 1
c. 3
d. 5

Your answer: d
Correct answer: d
In Mallows' Cp criterion, p equals the number of variables in the model plus 1 for the intercept.
Therefore, for these models, p equals 8, 9, or 10, depending on the number of terms in the
model. All the C(p) values are less than their respective p values, so all five models meet
Mallows' Cp criterion.
Review: Evaluating Models Using Mallows' Cp Statistic, Viewing Mallows' Cp Statistic in
PROC REG, The REG Procedure: Using the All-Possible Regressions Technique, The REG
Procedure: Using Automatic Model Selection
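The comparison in this feedback can be checked by hand or with a short DATA step such as the sketch below. The data set name cp_check and the variable names are made up for illustration; the values are copied from the table in the question.

```sas
/* Check Mallows' criterion (C(p) <= p) for each model in the table.
   cp_check, NumInModel, and MeetsMallows are hypothetical names. */
data cp_check;
   input ModelIndex NumInModel Cp;
   p = NumInModel + 1;           /* number of predictors plus 1 for the intercept */
   MeetsMallows = (Cp <= p);     /* 1 if the criterion is met, 0 otherwise */
   datalines;
1 7 5.8653
2 8 5.8986
3 8 6.4929
4 9 6.7834
5 7 6.9017
;
run;

proc print data=cp_check noobs;
run;
```

Each row compares C(p) with p = 8, 9, 9, 10, and 8 respectively, so MeetsMallows is 1 for all five models.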


8. In this PROC SCORE step, which option specifies the data set containing the parameter
estimates that are used to score observations?
proc score data=dataset1 score=dataset2
           out=dataset3 type=parms;
   var Performance;
run;
a. the DATA= option
b. the SCORE= option
c. the OUT= option
Your answer: b
Correct answer: b
The SCORE= option specifies the data set that contains the parameter estimates. PROC
SCORE reads the parameter estimates from this data set, scores the observations in the data
set that the DATA= option specifies, and writes the scored observations to the data set that
the OUT= option specifies.
Review: The SCORE Procedure: Scoring Predicted Values Using Parameter Estimates
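One way the three data sets might fit together is sketched below: PROC REG writes the parameter estimates with OUTEST=, and PROC SCORE then applies them to new observations. All data set names here (statdata.fitness, estimates, need_predictions, scored) and the dependent variable Oxygen_Consumption are assumptions for illustration.

```sas
/* Step 1: fit the model and save the parameter estimates (sketch). */
proc reg data=statdata.fitness noprint outest=estimates;
   model Oxygen_Consumption = Performance;
run;
quit;

/* Step 2: score new observations with the saved estimates. */
proc score data=need_predictions   /* observations to be scored        */
           score=estimates         /* parameter estimates from step 1  */
           out=scored              /* scored observations are written here */
           type=parms;             /* read the rows that hold regression parameters */
   var Performance;
run;
```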

9. According to these parameter estimates, are any of the variables in the model statistically
significant in predicting or explaining the percentage of body fat?
Parameter Estimates
               Parameter  Standard
Variable   DF  Estimate   Error     t Value  Pr > |t|
Intercept  1   -20.98714  5.55433   -3.78    0.0002
Age        1   0.01226    0.02836   0.43     0.6658
Hip        1   -0.40163   0.09994   -4.02    <.0001
Abdomen    1   0.86123    0.06814   12.64    <.0001
a. no
b. yes, Age
c. yes, Hip and Abdomen
d. yes, Age, Hip, and Abdomen
Your answer: c
Correct answer: c
Hip and Abdomen both have p-values lower than .05, so they are statistically significant in
predicting or explaining the variability of the percentage of body fat.
Review: Performing Simple Linear Regression, Analysis versus Prediction in Multiple
Regression, The REG Procedure: Performing Multiple Linear Regression

10. What output does this program produce?
proc reg data=statdata.bodyfat2 plots(only)=(cp);
   model PctBodyFat2 = Age Weight Height
                       Neck Chest Abdomen Hip Thigh
                       Knee Ankle Biceps Forearm Wrist
                       / selection=cp best=15;
run;
quit;
a. only models that meet both Mallows' and Hocking's Cp criteria for model selection
b. the best 15 models that meet the criteria for the forward, backward, and stepwise
selection methods
c. a set of the best 15 candidate models according to the Cp statistic generated using the
all-possible regressions technique
d. can't tell from the information given
Your answer: c
Correct answer: c
When you use the all-possible regressions technique, you specify RSQUARE, ADJRSQ, or
CP in the SELECTION= option to rank models. The BEST= option selects the specified
number of best models based on the SELECTION= statistic. If more than one statistic is
specified, the first statistic listed determines the ranking.
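As a hedged illustration of listing more than one statistic, the step below ranks models by R-square and also asks for adjusted R-square and C(p) to be reported for each model. The choice of BEST=10 is arbitrary; the data set and variable names are the ones used in the question.

```sas
/* All-possible regressions ranked by the first statistic listed (sketch). */
proc reg data=statdata.bodyfat2;
   model PctBodyFat2 = Age Weight Height
                       Neck Chest Abdomen Hip Thigh
                       Knee Ankle Biceps Forearm Wrist
                       / selection=rsquare   /* ranking statistic */
                         adjrsq cp           /* additional statistics to report */
                         best=10;            /* arbitrary number of models to keep */
run;
quit;
```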














Quiz, Lesson 3: Regression


Select the best answer for each question. When you are finished, click Submit Quiz.
1. Based on this correlation matrix, what type of relationship do Performance and RunTime have?
Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

              Performance     RunTime         Age
Performance       1.00000    -0.82049    -0.71257
                               <.0001      <.0001
RunTime          -0.82049     1.00000     0.19523
                   <.0001                  0.2926
Age              -0.71257     0.19523     1.00000
                   <.0001      0.2926

a. a fairly strong, positive linear relationship

b. a fairly strong, negative linear relationship

c. a fairly weak, positive linear relationship

d. a fairly weak, negative linear relationship

2. When Oxygen_Consumption is regressed on RunTime, Age, Run_Pulse,
and Maximum_Pulse, the parameter estimate for Age is -2.78. What does this mean?

a. For each year older, the predicted value of oxygen consumption is 2.78 greater.

b. For each year older, the predicted value of oxygen consumption is 2.78 lower.

c. For every 2.78 years older, oxygen consumption doubles.

d. For every 2.78 years younger, oxygen consumption doubles.

3. Given the information in this summary of variable selection, which stepwise selection method
was specified in the PROC REG step?
      Variable  Variable  Number   Partial   Model
Step  Entered   Removed   Vars In  R-Square  R-Square  C(p)    F Value  Pr > F
1     RunTime             1        0.7434    0.7434    3.3432  84.00    <.0001
2     Age                 2        0.0213    0.7647    2.8192  2.54     0.1222

a. FORWARD

b. BACKWARD

c. STEPWISE

d. can't tell from the information given

4. Here is a table of output statistics from PROC REG. If you sample a new value of the dependent
variable when Performance equals 55, what are the lower and upper prediction limits for this
newly sampled individual value?
Output Statistics
                         Dependent  Predicted  Std Error
Obs  Name   Performance  Variable   Value      Mean Predict  95% CL Mean       95% CL Predict    Residual
1    Jack   48           40.8400    44.9026    1.0190        42.0732  47.7319  37.4190  52.3861  -4.0626
2    Annie  43           45.1200    45.3793    1.3081        41.7475  49.0112  37.5570  53.2016  -0.2593
3    Kate   55           44.7500    44.2351    1.4885        40.1023  48.3678  36.1680  52.3021  0.5149
4    Carl   40           46.0800    45.6654    1.6493        41.0862  50.2446  37.3608  53.9700  0.4146
5    Don    58           44.6100    43.9490    1.8646        38.7719  49.1261  35.3003  52.5977  0.6610
6    Effie  45           47.9200    45.1886    1.1361        42.0343  48.3429  37.5763  52.8009  2.7314

a. 44.7500 and 44.2351

b. 40.1023 and 48.3678

c. 36.1680 and 52.3021

d. can't tell from the information given

5. Which of the following statements describes a positive linear relationship between two
variables?
1. The more I eat, the less I want to exercise.
2. The more salty snacks I eat, the more water I want to drink.
3. No matter how much I exercise, I still weigh the same.

a. 1 only

b. 1 and 2

c. 2 only

d. 2 and 3

e. 3 only

6. What output does this program produce?
proc corr data=statdata.bodyfat2 nosimple
          plots=matrix(nvar=all histogram);
   var Age Weight Height;
run;

a. individual correlation plots and simple descriptive statistics

b. a scatter plot matrix only, with histograms along its diagonal

c. a table of correlations and a scatter plot matrix with histograms along its diagonal

d. can't tell from the information given

7. How many of the following models meet Mallows' Cp criterion for model selection?

Model  Number in
Index  Model      C(p)    R-Square  Variables in Model
1      7          5.8653  0.7445    Age Weight Neck Abdomen Thigh Forearm Wrist
2      8          5.8986  0.7466    Age Weight Neck Abdomen Hip Thigh Forearm Wrist
3      8          6.4929  0.7459    Age Weight Neck Abdomen Thigh Biceps Forearm Wrist
4      9          6.7834  0.7477    Age Weight Neck Abdomen Hip Thigh Biceps Forearm Wrist
5      7          6.9017  0.7434    Age Weight Neck Abdomen Biceps Forearm Wrist


a. 0

b. 1

c. 3

d. 5


8. In this PROC SCORE step, which option specifies the data set containing the parameter
estimates that are used to score observations?
proc score data=dataset1 score=dataset2
           out=dataset3 type=parms;
   var Performance;
run;

a. the DATA= option

b. the SCORE= option

c. the OUT= option

9. According to these parameter estimates, are any of the variables in the model statistically
significant in predicting or explaining the percentage of body fat?
Parameter Estimates
               Parameter  Standard
Variable   DF  Estimate   Error     t Value  Pr > |t|
Intercept  1   -20.98714  5.55433   -3.78    0.0002
Age        1   0.01226    0.02836   0.43     0.6658
Hip        1   -0.40163   0.09994   -4.02    <.0001
Abdomen    1   0.86123    0.06814   12.64    <.0001

a. no

b. yes, Age

c. yes, Hip and Abdomen

d. yes, Age, Hip, and Abdomen

10. What output does this program produce?
proc reg data=statdata.bodyfat2 plots(only)=(cp);
   model PctBodyFat2 = Age Weight Height
                       Neck Chest Abdomen Hip Thigh
                       Knee Ankle Biceps Forearm Wrist
                       / selection=cp best=15;
run;
quit;

a. only models that meet both Mallows' and Hocking's Cp criteria for model selection

b. the best 15 models that meet the criteria for the forward, backward, and stepwise
selection methods

c. a set of the best 15 candidate models according to the Cp statistic generated using the
all-possible regressions technique

d. can't tell from the information given
