Regression Project

1. Find at least 15 data points of interest that you suspect may be related in a linear fashion. Describe what data you are choosing to use and your source below.

Interest: How consumers in the United States spend their money
Data: Percentage of disposable personal income required to meet consumer loan payments and mortgage payments for the year 2002
Source: The Federal Reserve Board (Household Debt-Service Burden, 2002)
2. Now, make a scatterplot in Fathom of your data.
3. Describe the scatterplot. From the scatterplot, does it appear that it is appropriate to use linear regression? If so, continue. If not, try finding another attribute that will give you more appropriate results.

Scatterplot description:
- Direction: Positive
- Form: Generally linear
- Strength: Weak, as the data points are widely scattered rather than clustered tightly around a line

It seems appropriate to use linear regression because both variables are quantitative and the overall direction and form are linear enough for a linear model to make sense.
4. Create dot plots, boxplots, and normal quantile plots for each set of data using Fathom. Describe these plots thoroughly.
- Dot plot description: The data points for consumer debt are concentrated toward the right of the plot, while those for household debt are concentrated toward the left. The middle values for consumer debt and household debt are approximately 7.2 and 5.9, respectively.
- Boxplot description: For consumer debt, the median is about 7.5, with the interquartile range running from 6.5 to 7.7. The minimum value is 6.1 and the maximum is 7.9. For household debt, the median is about 5.9, with the interquartile range running from 5.8 to 6.1. The minimum value is 5.6 and the maximum is 6.3.
- Normal quantile plot description: For consumer debt, the normal quantile plot is roughly linear, with slight deviations from the straight-line trend; a few points at the upper right pull away from the line. For household debt, the normal quantile plot is also roughly linear. Most of the deviation occurs in the middle of the plot, where the points drift upward away from the line.
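A normal quantile plot pairs each sorted data value with the z-score it would be expected to have if the data came from a normal distribution. The sketch below computes those pairs with Python's standard `statistics.NormalDist`. The function name and the (i - 0.5)/n plotting position are illustrative assumptions; Fathom may use a slightly different convention.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (z, x) pairs for a normal quantile plot.

    Each sorted data value x is paired with the standard normal quantile z
    at plotting position (i - 0.5) / n. This plotting-position convention
    is an assumption; other software may use a slightly different one.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]
```

If the (z, x) points lie close to a straight line, the data are roughly normal; systematic bends away from the line are the "deviations" described above.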
5. Use Fathom to create an equation of your linear model. (While the scatterplot is selected, choose Graph > Least Squares Line.) Write the equation below.

Equation of Least Squares Line: Household_Debt = 0.0332 * Consumer_Debt + 5.704
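The least-squares line Fathom draws can be reproduced from the raw data with the standard formulas slope = Sxy / Sxx and intercept = ȳ - slope · x̄. This is a minimal sketch, assuming the x-values are Consumer_Debt and the y-values are Household_Debt; the function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return slope, intercept
```

Applied to the project's data pairs, the result should match Fathom's equation up to rounding.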
6. What is the correlation?

Correlation: r = 0.1183
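The correlation r measures the strength and direction of the linear association. A minimal sketch of the Pearson formula, r = Sxy / sqrt(Sxx · Syy), follows; the function name is hypothetical. On Python 3.10 or later, `statistics.correlation` computes the same quantity.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

A value close to 0, such as 0.1183, indicates a weak positive linear association, consistent with the "weak" strength noted for the scatterplot.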
7. Interpret the slope of your model in this context.

The slope of 0.0332 means that for each 1-percentage-point increase in Consumer_Debt, the model predicts Household_Debt to increase by about 0.0332 percentage points.
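As a worked check of this interpretation, compare the model's predictions at two Consumer_Debt values one percentage point apart (6.5 and 7.5, chosen only for illustration because they lie inside the observed range):

```latex
\begin{aligned}
\hat{y}(6.5) &= 0.0332(6.5) + 5.704 = 5.9198\\
\hat{y}(7.5) &= 0.0332(7.5) + 5.704 = 5.9530\\
\hat{y}(7.5) - \hat{y}(6.5) &= 0.0332
\end{aligned}
```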
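8. Interpret the y-intercept of your model in this context.

The y-intercept of 5.704 means that if Consumer_Debt were 0%, the model would predict Household_Debt to be 5.704%. Because the observed Consumer_Debt values range only from about 6.1 to 7.9, a value of 0 lies far outside the data, so this intercept is an extrapolation with no practical meaning here.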
9. Interpret the meaning of R squared in the context of this problem.

r² = 0.014, meaning that only about 1.4% of the variation in Household_Debt is explained by its linear relationship with Consumer_Debt.
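The value of r² follows directly from the correlation in #6 (for a least-squares line with a single predictor, r² also equals 1 - SSE/SST):

```latex
\begin{aligned}
r^2 &= (0.1183)^2 \approx 0.0140\\
1 - r^2 &\approx 0.986
\end{aligned}
```

So roughly 98.6% of the variation in Household_Debt is left unexplained by Consumer_Debt.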
10. Do you think a linear model is appropriate? Why or why not? (While the scatterplot is selected, choose Graph > Make Residual Plot.)

A linear model is appropriate in form: the residual plot shows what a linear fit should produce, a patternless, blob-shaped scatter of points around zero with no curve or fan shape. However, with r² = 0.014, the model explains very little of the variation in Household_Debt, so its predictions are weak.
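Each point in the residual plot is an observed value minus the value the line predicts. The sketch below computes residuals from the fitted equation in #5; the function names are hypothetical, and the coefficients are the rounded values Fathom reported, so results may differ from Fathom's residuals in the last digits.

```python
# Rounded coefficients of the Fathom least-squares line from #5
SLOPE = 0.0332
INTERCEPT = 5.704


def predict_household_debt(consumer_debt):
    """Predicted Household_Debt from the fitted line."""
    return SLOPE * consumer_debt + INTERCEPT


def residuals(consumer, household):
    """Residual = observed - predicted, one per (Consumer_Debt, Household_Debt) pair."""
    return [y - predict_household_debt(x) for x, y in zip(consumer, household)]
```

Plotting these residuals against Consumer_Debt reproduces the residual plot; a random scatter around zero is what supports the linear form.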
13. Find the mean, standard deviation, and five-number summary of each data set by creating a separate summary table for each and, while the summary table is selected, choosing Summary > Add Basic Statistics and Summary > Add 5 Number Summary.
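The summary-table values can be double-checked with Python's standard `statistics` module. This sketch uses the sample standard deviation (n - 1 in the denominator) and a median-of-halves rule for the quartiles; both are assumptions about the convention in use, since textbooks and software (possibly including Fathom) compute quartiles differently. The function names are hypothetical.

```python
from statistics import mean, median, stdev


def five_number_summary(data):
    """(min, Q1, median, Q3, max) using the median-of-halves rule.

    For odd n, the overall median is left out of both halves.
    Requires at least 2 values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + (n % 2):]
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])


def basic_statistics(data):
    """Mean and sample standard deviation (n - 1 in the denominator)."""
    return mean(data), stdev(data)
```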
14. Use the data you collected in #13 to create boxplots and the linear regression equation BY HAND below. (These should agree with Fathom!)
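For the by-hand work, the least-squares coefficients can be computed from the summary statistics, and the boxplot's outlier fences from the quartiles. Below, x is Consumer_Debt, y is Household_Debt, s denotes a sample standard deviation, r is the correlation from #6, and the quartiles are the approximate values read off the Fathom boxplots in #4:

```latex
\begin{aligned}
b &= r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x}\\
\text{fences} &= Q_1 - 1.5\,\mathrm{IQR},\quad Q_3 + 1.5\,\mathrm{IQR}\\
\text{consumer debt: } & 6.5 - 1.5(1.2) = 4.7,\quad 7.7 + 1.5(1.2) = 9.5\\
\text{household debt: } & 5.8 - 1.5(0.3) = 5.35,\quad 6.1 + 1.5(0.3) = 6.55
\end{aligned}
```

Both observed ranges (6.1 to 7.9 and 5.6 to 6.3) fall inside these fences, so by the 1.5 × IQR rule neither data set has outliers.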
Your final grade will be based on a complete Fathom Document that you will email me, as well as written responses to the above questions and the boxplots and equations in #14.