Zachary A. Lute
Kevin Griest
Statistics 302 Investigation 5
i. Due: At the beginning of class, Friday, February 22, 2013. No late assignments will be accepted.
ii. You may work with one other person from your class, handing in one report with both names.
iii. Responses to the questions should be legible, preferably Word-processed/typed; all graphs should be
created using computer software. Include graphs wherever requested.
iv. Remember to staple all the sheets you are turning in.
v. You will be graded on content, accuracy of content, and whether you were mindful of (iii) and (iv).
NOTE: You will need Minitab for this Investigation.
1) Percent Body Fat: Use the Minitab file Percent fat and age.mtw. This data file contains information on the following
variables, for 18 U.S. adults aged between 23 and 61.
Body fat percentage
Age (years)
Sex (Male or Female)
a) Create a scatterplot of body fat percentage vs. age. Describe the scatterplot.
[Figure: Scatterplot of Percent.Fat vs Age. Vertical axis: Percent.Fat (10 to 45); horizontal axis: Age (20 to 60).]

There appears to be a fairly strong, positive linear association between body fat
percentage and age. There appear to be outliers at age 23 with body fat of
approximately 28% and at approximately age 27 with body fat of approximately 5%.
There are also potential leverage points: two at age 23 and one at age 61.

b) Create a scatterplot of body fat percentage vs. age, separated by Sex. Describe the scatterplot. (Hint: See Pages 63-64 of
notes for how to make this scatterplot.) Include this scatterplot in your report.
[Figure: Scatterplot of Percent.Fat vs Age, grouped by Gender (F, M). Vertical axis: Percent.Fat (10 to 45); horizontal axis: Age (20 to 60).]

This scatterplot suggests that the regression lines for the two genders have different
slopes and different intercepts. The change in mean body fat percentage associated
with a one-year increase in age therefore differs between males and females. Males
have a steeper positive slope than females, so with a one-year increase in age, males
tend to see a larger increase in mean body fat percentage than females do.
c) Create an indicator variable Male = 1, if person is a male; 0, if female. Make sure that female is the baseline group.
d) Run a regression with body fat % vs. age and Male. Include the regression equation in the report.
Percent.Fat = 15.1 + 0.339 Age - 9.79 Male

e) Using the regression equation obtained in part (d) write 2 separate regression equations for males and females, describing
the relationship between body fat % and age.
Males (Male = 1): Percent.Fat = (15.1 - 9.79) + 0.339 Age = 5.31 + 0.339 Age
Females (Male = 0): Percent.Fat = 15.1 + 0.339 Age
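The step from the single equation in part (d) to one equation per sex can be checked with a short Python sketch. The coefficients are copied from part (d); the function names are hypothetical and chosen only for this illustration.

```python
# Coefficients copied from the fitted equation in part (d).
INTERCEPT = 15.1
AGE_SLOPE = 0.339
MALE_COEF = -9.79


def group_equation(male):
    """Return (intercept, slope) for one sex; male is 1 for males, 0 for females."""
    # Setting the indicator to 0 or 1 folds the Male coefficient into the intercept.
    return (round(INTERCEPT + MALE_COEF * male, 2), AGE_SLOPE)


def predict(age, male):
    """Estimated mean body fat percentage at the given age for the given sex."""
    intercept, slope = group_equation(male)
    return intercept + slope * age


male_eq = group_equation(1)
female_eq = group_equation(0)
print("Males:   Percent.Fat = %.2f + %.3f Age" % male_eq)
print("Females: Percent.Fat = %.2f + %.3f Age" % female_eq)
```

Because the model has no interaction term, both equations share the slope 0.339, and the two lines differ only by the Male coefficient at every age.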
f) Interpret the intercept of the regression equation in part (d), in the context of the data.
The intercept, 15.1, is the estimated mean body fat percentage of all similar women aged zero. This interpretation is
not meaningful because age zero lies far outside the range of the data (ages 23 to 61).
g) Interpret the coefficient of Male in the regression equation in part (d), in the context of the data.
For all similar men and women of the same age (if the adults were randomly selected), the estimated mean body fat
percentage is 9.79 percentage points lower for males than for females.
h) Interpret the coefficient of Age in the regression equation in part (d), in the context of the data.
After accounting for gender, the estimated increase in mean body fat percentage associated with a one-year increase in
age is 0.339 percentage points.
i) At an overall significance level of 0.05, identify the significant predictors of body fat percentage.
With two predictors, a Bonferroni adjustment at an overall significance level of 0.05 gives an individual significance
level of 0.05/2 = 0.025.
Age: p-value = 0.013 < 0.025. Therefore, age is a significant predictor of body fat percentage.
Male: p-value = 0.018 < 0.025. Therefore, gender is a significant predictor of body fat percentage.
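The Bonferroni comparison above can be reproduced with a minimal Python sketch. The p-values are the ones reported in this answer; the function name is hypothetical.

```python
OVERALL_ALPHA = 0.05
# p-values reported in part (i) for the two predictors.
P_VALUES = {"Age": 0.013, "Male": 0.018}


def bonferroni_significant(p_values, overall_alpha):
    """Split the overall level evenly across the tests and compare each p-value."""
    per_test_alpha = overall_alpha / len(p_values)
    decisions = {name: p < per_test_alpha for name, p in p_values.items()}
    return per_test_alpha, decisions


per_test_alpha, decisions = bonferroni_significant(P_VALUES, OVERALL_ALPHA)
print("Individual significance level:", per_test_alpha)
for name, significant in decisions.items():
    print(name, "significant" if significant else "not significant")
```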

2) Assets and Sales: Use the Minitab file AssetsSector.mtw. This data file contains information on the following
variables, for 79 randomly selected large American corporations.
Assets (in millions of dollars)
Sales (in millions of dollars)
Sector (the type of company: Banks, Manufacturing, Other, Tech.)
a) Create a scatterplot of assets (y, in millions of dollars) vs. sales (x, in millions of dollars).
[Figure: Scatterplot of Assets vs Sales. Vertical axis: Assets (0 to 60000); horizontal axis: Sales (0 to 50000).]

b) Transform assets to log10(assets) and sales to log10(sales). Then create a scatterplot of log10(assets) versus log10(sales).
(Minitab > Calc > use LOGTEN())
[Figure: Scatterplot of log(Assets) vs log(Sales). Vertical axis: log(Assets) (2.0 to 5.0); horizontal axis: log(Sales) (2.0 to 5.0).]

c) For which data (the untransformed or the transformed) do you think it would be better to use the regression methods
discussed in class? Why?
It would be better to use the regression methods we learned in class to analyze the transformed data. The
untransformed data appeared to have a somewhat fan-shaped pattern, which indicates a violation of the equal-variance
assumption. The log transformation removed the fanning, so the regression methods we have learned can now be
applied.
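One way to see why a log10 transformation can remove a fan shape is sketched below in Python. The numbers are hypothetical and are not taken from AssetsSector.mtw; they assume each company's assets lie between half and double its sales, so the spread in raw dollars grows with sales.

```python
import math

# Hypothetical sales figures (millions of dollars), NOT from AssetsSector.mtw.
sales = [100, 1000, 10000]

# Hypothetical assets at half and at double each sales value.
low_assets = [0.5 * s for s in sales]
high_assets = [2.0 * s for s in sales]

# Spread on the raw scale grows with sales (the fan shape).
raw_spread = [high - low for low, high in zip(low_assets, high_assets)]

# Spread on the log10 scale is the same at every sales level.
log_spread = [
    math.log10(high) - math.log10(low)
    for low, high in zip(low_assets, high_assets)
]

print("Raw spread:  ", raw_spread)
print("log10 spread:", [round(s, 3) for s in log_spread])
```

In this sketch the raw spread grows tenfold with each step in sales, while the log10 spread stays at log10(4) throughout, which is the equal-variance pattern the regression methods require.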
d) Run a regression of log10(assets) as the response variable versus log10(sales) as the explanatory variable. Call this Model 1.
Include a screenshot of the regression output.
e) Create a scatterplot of log10(assets) versus log10(sales), by Sector. (Minitab > Graph > Scatterplot > With
Regression and Groups > Y variables, X variables, Categorical variables = Sector.)
Do you think that there might be an interaction between Sector and log10(sales)? Explain. Be sure to include the graph in
your report.
f) Explain, as simply as you can, what it means for there to be an interaction between Sector and log10(sales).
g) Create indicator variables for Sector, such that the Other category is the baseline group. (Hint: There are 4 categories of
Sector; how many indicator variables do you need?)
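As a rough illustration of indicator coding with a baseline group, the Python sketch below builds 0/1 indicators for Sector with Other as the baseline. The exact category labels stored in the Minitab file are an assumption here, and the function name is hypothetical.

```python
# Assumed category labels; the exact spelling in AssetsSector.mtw may differ.
SECTORS = ["Banks", "Manufacturing", "Other", "Tech"]
BASELINE = "Other"


def sector_indicators(sector):
    """Return one 0/1 indicator for every sector except the baseline."""
    if sector not in SECTORS:
        raise ValueError("Unknown sector: %r" % sector)
    return {name: int(sector == name) for name in SECTORS if name != BASELINE}


for sector in SECTORS:
    print(sector, sector_indicators(sector))
```

A company in the baseline group has every indicator equal to 0, so the baseline needs no indicator of its own.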
h) Run a multiple regression of log10(assets) versus log10(sales) and indicator variables for Sector. Call this Model 2. Include a
screenshot of the regression output.
i) Use the regression output in part (h) to write separate regression equations for each sector.
j) For all companies with the same log10(sales), what is the estimated change in mean log10(assets) associated with being a
manufacturing company as opposed to a company from the Other category? Is it an increase or decrease?
k) Now run a regression with log10(assets) as the response, and the following as predictors:
Log10(sales); Indicators for sector; Interactions between log10(sales) and indicators for sector
Call this Model 3. Include a screenshot of the regression output in your report.
(NOTE: To create the interactions, use Minitab > Calc > Calculator and multiply each
indicator by log10(sales).)
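The multiplication described in the note can be sketched in Python as follows. The sector labels, the column names, and the sales value used in the example are assumptions made for illustration; they are not taken from the Minitab file.

```python
import math

# Assumed non-baseline sector labels (Other is the baseline group).
NON_BASELINE = ("Banks", "Manufacturing", "Tech")


def model3_row(sales, sector):
    """Build the Model 3 predictor values for one company."""
    log_sales = math.log10(sales)
    row = {"log_sales": log_sales}
    for name in NON_BASELINE:
        indicator = 1 if sector == name else 0
        row[name] = indicator
        # Interaction term: the indicator multiplied by log10(sales).
        row[name + "_x_log_sales"] = indicator * log_sales
    return row


# Hypothetical company with sales of 1000 (millions of dollars).
print(model3_row(1000, "Tech"))
print(model3_row(1000, "Other"))
```

Each interaction column equals log10(sales) for companies in that sector and 0 otherwise, which is what allows each sector to have its own slope.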
l) Use the regression output in part (k) to write separate regression equations for each sector.
