Name: Tue Hoang


Regression Project
1. Find at least 15 data points of interest that you suspect may be related in a linear fashion.
Describe what data you are choosing to use and your source below.
Interest: How consumers in the United States spend their money
Data: percentage of disposable personal income required to meet consumer loan payments and
mortgage payments for the year 2002
Source: The Federal Reserve Board (Household Debt-Service Burden, 2002)

2. Now, make a scatterplot in Fathom of your data.

3. Describe the scatterplot. From the scatterplot, does it appear that it is appropriate to use
linear regression? If so, continue. If not, try finding another attribute that will give you more
appropriate results.
Scatterplot description:
- Direction: Positive
- Form: Generally linear
- Strength: Weak, as the data points are widely scattered rather than clustered along a line
It seems appropriate to use linear regression because both variables are quantitative and the
general direction and form are sufficiently linear that a linear model could make sense.

4. Create dot plots, boxplots, and normal quartile plots for each set of data using Fathom.
Describe these plots thoroughly.
- Dot plot description: the data points for consumer debt cluster toward the right side of
the plot, while those for household debt cluster toward the left side. The middle values
for consumer debt and household debt are approximately 7.2 and 5.9, respectively.
- Boxplot description: For consumer debt, we can see that the median value lies at about 7.5,
with the interquartile range from 6.5 to 7.7. The minimum value is at 6.1, while the maximum
value is at 7.9. For household debt, we can see that the median value lies at about 5.9, with the
interquartile range from 5.8 to 6.1. The minimum value is at 5.6, while the maximum value is at
6.3.
- Normal quantile plot: For consumer debt, the normal quantile plot is roughly linear, with
slight deviations from the general straight-line trend and a few outlying points at the top
right corner. For household debt, the normal quantile plot is also roughly linear. Its
deviations are concentrated in the middle of the line, where the data points drift further
upward.
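One common way to check the boxplot readings above for outliers is the 1.5 x IQR rule. The sketch below is only an illustration: it uses the approximate quartiles, minimums, and maximums stated in the boxplot description, and the function names `iqr_fences` and `outside_fences` are hypothetical helpers, not part of Fathom.

```python
def iqr_fences(q1, q3):
    """Return (IQR, lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outside_fences(minimum, maximum, q1, q3):
    """True if the minimum or maximum lies beyond the 1.5 x IQR fences."""
    _, low, high = iqr_fences(q1, q3)
    return minimum < low or maximum > high


# Approximate values read from the boxplot description (not exact data).
consumer = iqr_fences(6.5, 7.7)    # quartiles for consumer debt
household = iqr_fences(5.8, 6.1)   # quartiles for household debt

consumer_flag = outside_fences(6.1, 7.9, 6.5, 7.7)
household_flag = outside_fences(5.6, 6.3, 5.8, 6.1)

print("Consumer IQR and fences:", consumer)
print("Household IQR and fences:", household)
print("Any extreme beyond fences:", consumer_flag, household_flag)
```

With these approximate readings, the minimum and maximum of both variables fall inside the fences, so this particular rule flags no outliers. Exact results depend on the actual data values.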

5. Use Fathom to create an equation of your linear model. (While the scatterplot is highlighted,
go to Graph > Least Squares Line.) Write the equation below.
Equation of Least Squares Line: Household_Debt = 0.0332 * Consumer_Debt + 5.704
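As a rough illustration of how this equation is used, the sketch below plugs Consumer_Debt values into the fitted line. The function name `predict_household_debt` is a hypothetical helper, and the inputs 6.1, 7.5, and 7.9 are the approximate minimum, median, and maximum read from the consumer debt boxplot, not exact data points.

```python
SLOPE = 0.0332      # from the least squares line above
INTERCEPT = 5.704   # from the least squares line above


def predict_household_debt(consumer_debt):
    """Predicted Household_Debt for a given Consumer_Debt value."""
    return SLOPE * consumer_debt + INTERCEPT


# Approximate min, median, and max of Consumer_Debt from the boxplot.
for x in (6.1, 7.5, 7.9):
    print(f"Consumer_Debt = {x}: predicted Household_Debt = "
          f"{predict_household_debt(x):.3f}")

# Raising Consumer_Debt by 1 changes the prediction by exactly the slope.
step = predict_household_debt(7.5) - predict_household_debt(6.5)
```

Because the slope is so small, the predictions barely change across the observed range of Consumer_Debt.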

6. What is the correlation?
Correlation: r = 0.1183

7. Interpret the slope of your model in this context.
The slope of 0.0332 in my model means that for every 1-percentage-point increase in Consumer
Debt, Household Debt is predicted to increase by 0.0332 percentage points on average.

8. Interpret the y-intercept of your model in this context.
The y-intercept of 5.704 in my model means that when Consumer Debt is 0, Household Debt
would be expected to be 5.704. Since the observed Consumer Debt values run only from about
6.1 to 7.9, this is an extrapolation.

9. Interpret the meaning of R squared in the context of this problem.
r^2 = 0.014, meaning that 0.014, or 1.4%, of the variation in Household Debt can be
accounted for by the linear model with Consumer Debt as the predictor.
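As a quick arithmetic check, R squared for a simple linear regression is the square of the correlation reported in question 6. The variable names below are just for illustration.

```python
r = 0.1183                 # correlation from question 6

r_squared = r ** 2         # proportion of variation explained
explained_pct = 100 * r_squared
unexplained_pct = 100 - explained_pct

print(f"r^2 = {r_squared:.3f}")
print(f"Explained: {explained_pct:.1f}%  Unexplained: {unexplained_pct:.1f}%")
```

Squaring 0.1183 gives about 0.014, which agrees with the value reported, and leaves roughly 98.6% of the variation unexplained by the line.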

10. Do you think a linear model is appropriate? Why or why not? (While the scatterplot is
highlighted, go to Graph > Make Residual Plot).
A linear model is appropriate here because the residual plot shows what is expected when a
linear model fits: the residuals form a shapeless blob around zero, with no curve or other pattern.
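For reference, each point in a residual plot is the observed value minus the value the line predicts. The sketch below shows that calculation. The function `residuals` is a hypothetical helper, and the two (x, y) pairs are made-up placeholders used only to show the arithmetic; they are not the Federal Reserve data.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for each (x, y) pair."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Placeholder pairs for illustration only (NOT the project's data).
example_x = [6.0, 7.0]
example_y = [5.9, 6.0]

# Slope and intercept taken from the least squares line in question 5.
example_residuals = residuals(example_x, example_y, 0.0332, 5.704)

for x, e in zip(example_x, example_residuals):
    print(f"x = {x}: residual = {e:+.4f}")
```

Plotting these residuals against x for the real data is what Fathom's residual plot does; a patternless scatter around zero supports a linear model.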

13. Find the mean, standard deviation, and five number summary of each data set by creating a
separate summary table for each and, while the summary is highlighted, selecting Summary > Add
Basic Statistics and Summary > Add 5 Number Summary.
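The same statistics can be cross-checked outside Fathom. The sketch below is one way to do it, under stated assumptions: quartiles are computed as medians of the lower and upper halves (excluding the middle value when the count is odd), which may differ slightly from the quartile method Fathom uses, and the standard deviation is the sample version. The function names and the sample list are hypothetical placeholders, not the project's data.

```python
from statistics import mean, median, stdev


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Skip the middle value when the number of values is odd.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


def basic_stats(values):
    """Return (mean, sample standard deviation)."""
    return mean(values), stdev(values)


# Placeholder numbers for illustration only (NOT the project's data).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print("Five number summary:", five_number_summary(sample))
print("Mean and SD:", basic_stats(sample))
```

Running the two functions once on each variable's values gives the numbers needed for the by-hand boxplots.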

14. Use the data you collected in #13 to create boxplots and the linear regression equation BY
HAND below. (These should agree with Fathom!)
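One standard way to build the regression equation by hand from summary statistics is slope = r * (s_y / s_x) and intercept = y_mean - slope * x_mean. The sketch below applies those formulas. The function name `line_from_summary` is a hypothetical helper, and the numbers in the example call are placeholders chosen for easy arithmetic, not the project's statistics.

```python
def line_from_summary(r, x_mean, x_sd, y_mean, y_sd):
    """Return (slope, intercept) of the least squares line.

    slope     = r * (y_sd / x_sd)
    intercept = y_mean - slope * x_mean
    """
    if x_sd == 0:
        raise ValueError("x_sd must be nonzero")
    slope = r * (y_sd / x_sd)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Placeholder summary statistics for illustration only.
slope, intercept = line_from_summary(r=0.5, x_mean=10, x_sd=2,
                                     y_mean=20, y_sd=4)
print(f"y = {slope} * x + {intercept}")
```

Substituting the correlation from question 6 and the means and standard deviations from question 13 should reproduce the slope and intercept Fathom reports, up to rounding.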






















Your final grade will be based on a complete Fathom Document that you will email me, as
well as written responses to the above questions and the boxplots and equations in #14.
