For this semester, we are going to use the concepts in Chapter 10 (Correlation

and Linear Regression) in a statistical study. Your project should follow the

steps below, and details of how you performed each step should be included

in your final report.

1. Find a statistical question to answer.

In this case, it would be about determining whether a correlation exists between two sets of data such

that one (y) depends on the other (x). For example, a statistical question would

be: is there a correlation between age (x) and, say, cholesterol level (y)? Note

that cholesterol level (y) depends on age (x), not the other way around.

(Remember that correlation DOES NOT NECESSARILY mean causation).

Clearly state your question and elaborate on what correlation you are going to

analyze. A project is worthless if the readers have no clue what it is that you are

trying to find and analyze.

2. Come up with your own hypothesis.

This is where you are making a guess/conclusion about what you are likely to

find. For example, a hypothesis for the age vs. cholesterol question may be that the older a

person, the higher the cholesterol. Your analysis of the data will either confirm or

deny your early conclusion, i.e., hypothesis.

3. Collect data.

Since we are going to use Chapter 4 material exclusively, make sure that the

data that you choose are, generally speaking, linearly related. Remember you

are going to be analyzing the data to find if they are linearly correlated and, if so,

come up with a linear regression line that best fits the data. The equation of the

line (y=mx+b) would then be the model for the data, in that you can plug in any x

value in that equation to find the y. Therefore, if the data (x vs. y) are, say,

nonlinear, trying to do linear analysis would be foolish. So, if you plot your data

in a scatterplot and the data do not look linear, find a different question to answer

for which data are somewhat linear.

Make sure you have enough data. If you are going to make some conclusion

about the whole American population and all you have is 20 data points, it would be

hard to impress/convince anyone with your analysis.

Give the source of your data.

Use tables and charts. Use Chapter 3 knowledge to find/calculate 1-variable

summaries, i.e., shape, outliers, center (mean), and variability (standard deviation) of

the data. You will do this for the dataset x and dataset y separately.

You will need to include the raw data and the 1-variable summaries for the x and

y datasets in the main report or in an appendix.
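
As one possible way to produce the 1-variable summaries, here is a minimal Python sketch. It assumes the 1.5*IQR rule for flagging outliers (that rule is an assumption here; use whatever rule your Chapter 3 notes give). The function name one_variable_summary and the sample ages are made up purely for illustration. Shape is not computed; you would still judge it from a histogram.

```python
import statistics


def one_variable_summary(values):
    """Return center, spread, and 1.5*IQR outliers for one dataset.

    Assumption: outliers are values outside the 1.5*IQR fences.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Hypothetical x data (ages); replace with your own collected data.
ages = [23, 31, 35, 42, 47, 51, 58, 64]
for name, value in one_variable_summary(ages).items():
    print(name, value)
```

Run the same function once on the x dataset and once on the y dataset, then copy the results into your summary table.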

5. Plot x vs. y and calculate the linear correlation coefficient, r.

Compare your calculated r to the critical value of r from Table II in the Appendix of your

textbook.

If the absolute value of your calculated r is greater than the Table II r, then a significant correlation exists

in your data set, and you are okay to proceed with linear regression analysis to

come up with the best fit line and an equation (y=mx+b) for the line. Finding the

equation means calculating the slope m and the intercept b.

Show the chart with all the points, best-fit line, and the equation.
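
If you want to check your hand or calculator work, the calculation in this step can be sketched in Python as follows. This is only an illustration using the usual least-squares formulas; the function names linear_fit and is_significant and the sample data are made up. The critical value is not built in: you must look it up in Table II for your own sample size and pass it in yourself.

```python
import math


def linear_fit(xs, ys):
    """Return (r, m, b): correlation coefficient, slope, and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)  # linear correlation coefficient
    m = sxy / sxx                   # slope of the best-fit line
    b = mean_y - m * mean_x         # intercept of the best-fit line
    return r, m, b


def is_significant(r, critical_r):
    """Compare |r| with a critical value that YOU look up in Table II."""
    return abs(r) > critical_r


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, m, b = linear_fit(xs, ys)
print(f"r = {r:.3f}, y = {m:.3f}x + {b:.3f}")
```

Note that the function fails with a division by zero if all x values (or all y values) are identical, since no line can be fitted in that case.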

6. Plot the residual (observed y - predicted y) vs. x.

Here observed y is the raw data that you collected. Predicted y is calculated

from the best-fit line equation where you plug in your x and calculate your y.

Before we can conclusively say that the data (x vs. y) are linearly correlated, and

before you can use the best-fit line equation as your model of the data to make

predictions, you need to satisfy two more conditions:

A. If the residual vs. x plot shows a distinct pattern, then the data are NOT

linearly related.

B. If the residual vs. x plot shows the spread of the residuals increasing or

decreasing, then the data are NOT linearly related.

Include in your report the residual vs. x data tables as well as the graphs.
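
The residual table described in this step could be built with a short Python sketch like the one below. The function name residual_table, the sample data, and the slope and intercept values are all hypothetical; use the m and b from your own best-fit line.

```python
def residual_table(xs, ys, m, b):
    """Return rows of (x, observed y, predicted y, residual)."""
    rows = []
    for x, observed in zip(xs, ys):
        predicted = m * x + b            # from the best-fit line equation
        residual = observed - predicted  # observed y - predicted y
        rows.append((x, observed, predicted, residual))
    return rows


# Hypothetical data and line, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
for x, observed, predicted, residual in residual_table(xs, ys, m=0.6, b=2.2):
    print(f"{x:>4} {observed:>6} {predicted:>8.2f} {residual:>8.2f}")
```

Plotting the last column against the first gives the residual vs. x graph; the code only prepares the numbers and does not judge the pattern or the spread for you.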

7. Check to see if the linear model assumption is valid.

A. If all 3 criteria in Steps 5 and 6 are met, then linear correlation exists

between the data and you can use the best-fit line as a model of the data to

make predictions.

B. If ANY of the 3 criteria is NOT met, then the linear model assumption is

wrong and the equation is no good. In that case, your best prediction of y is

simply the average y (from your data), no matter what your x is.

State in detail your final conclusion about the data in terms of whether a linear relation

exists. If it does, then give the best-fit line equation with r. Also, discuss how your

final conclusion compares with your initial conclusion, i.e., your hypothesis about the

data.

8. Make a few predictions.

Remember again, if your linear model assumption is valid, then you can just plug

in an x value in the best-fit line equation to predict y. However, if your linear

model assumption is NOT valid, then your predicted y should just be the average y

since the linear regression equation is not representative of the data.

Make sure you do not make predictions outside your data range, as your linear

regression equation is only good within the data range that you used to come up

with the equation.
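
The prediction rules in this step can be summarized in a small Python sketch. The function name predict and its parameters are made up for illustration; model_valid stands for your own conclusion from Step 7, and m and b are the slope and intercept from your own best-fit line.

```python
def predict(x, xs, ys, m, b, model_valid):
    """Predict y for a given x, following the rules in this step."""
    if not model_valid:
        # Linear model rejected: fall back to the average y.
        return sum(ys) / len(ys)
    if not (min(xs) <= x <= max(xs)):
        # The equation is only good within the observed x range.
        raise ValueError("x is outside the range of the collected data")
    return m * x + b


# Hypothetical data and line, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(predict(3, xs, ys, m=0.6, b=2.2, model_valid=True))
print(predict(3, xs, ys, m=0.6, b=2.2, model_valid=False))
```

Raising an error for an out-of-range x is a design choice in this sketch; it simply makes it hard to extrapolate by accident.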

9. Afterthought.

In this section, you explain/analyze what you did wrong (if anything!) in your study.

Ask yourself questions like: did you just do convenience or voluntary-response

sampling to collect your data? Did your study suffer from too few data points?

Are you misrepresenting the data? Is your analysis correct? Does your

conclusion make sense?

Also write what you could have done to make your study more worthwhile and

useful to the reader. This part can also be in your reflective writing in the e-portfolio.

This project counts for 5% of your grade. The term project (a final report) is due on May

1. Please make sure you put in time and effort commensurate with that in mind

to produce a really professional, thorough, well-organized report. And finally, it is

a team project, so make sure everyone contributes and learns from the

experience. Good luck.

e-Portfolio

The last thing to do is to create an e-portfolio website (use WordPress, Weebly, or

Yola).

Then you need to upload the final version of the term project to your e-portfolio

website. It is your responsibility to make sure you get a copy of the report so you

can upload it.

You also need to write a personal reflection essay on your experience of doing

this project as a team. This is your personal opinion/reflection essay, as opposed

to what will be in Section 9 (Afterthought) of your report, where the whole team

reflected on the project.

You need to give me the web address with the uploaded final report and personal

reflection essay. Make sure the site is not password-protected.

