
HP Prime Graphing Calculator

Anscombe's Quartet

Learn more about HP Prime:


http://www.hp-prime.com

Who?

Anscombe's Quartet refers to four data sets devised by Francis Anscombe
(1918-2001) in 1973. In this activity, the data is already loaded into
an app called Anscombe. In the early years of computing, Anscombe already
argued that computers should be used not only for statistical
calculations but also for graphs.

• HP Prime functionality introduced: using the Numeric, Symbolic, and
Plot views of the Statistics 2Var app

• Statistics content: analyzing patterns in scatter plots; correlation and
linearity; least-squares regression lines; residual plots, outliers, and
influential points

Different, or not?

Load the AnscombeQuartet app (available at www.hp-prime.com) on all
calculators.

Launch the app and view the data sets in the Numeric view with the Num
key. Column C1 holds the x-values shared by the first three data sets;
C2, C3 and C4 hold their respective y-values. The fourth data set is in
columns C5 and C6. Open the Symbolic view with the Symb key and make sure
that only the checkbox for S1 (the first statistical plot) is selected.
Then press View and choose Autoscale to see the point cloud for the first
data set. Go back to the Symbolic view and select only the second plot.
Look at all four point clouds separately in this way. Always use
automatic scaling so that each point cloud is displayed correctly on the
screen.

You will see four very different point clouds in the four different data
sets. Now once again plot all four together with the checkmarks for S1,
S2, S3 and S4.

Go to the Numeric view with the Num key and tap STATS. The summarized
statistical values of the four data sets are shown.

The statistical values are:

• n      the number of data points.
• r      the correlation coefficient between the independent and
         dependent variables.
• R²     the square of the correlation coefficient.
• sCOV   the sample covariance.
• σCOV   the population covariance.
• ∑XY    the sum of the individual products of X and Y.

Tap X to look at the summary statistics for the independent variable, as
shown on the screen on the right.

In these two screens you will notice that many of the values are the same
for all four data sets, even though the point clouds look very different.
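
For readers who want to check these values away from the calculator, the summary statistics can be sketched in Python. This is not part of the HP Prime activity: the helper name `summary` is hypothetical, and the sketch assumes that columns C1 to C6 of the app hold the values published by Anscombe in 1973, in the order shown.

```python
from math import sqrt

# Assumption: these are the values published by Anscombe (1973); the app's
# columns C1..C6 are assumed to hold the same numbers in this order.
C1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
C2 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
C3 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
C4 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
C5 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
C6 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def summary(xs, ys):
    """Two-variable summary statistics (hypothetical helper, not an HP API)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    return {
        "n": n,                    # number of data points
        "r": r,                    # correlation coefficient
        "R2": r * r,               # square of the correlation coefficient
        "sCOV": sxy / (n - 1),     # sample covariance
        "sigmaCOV": sxy / n,       # population covariance
        "sumXY": sum(x * y for x, y in zip(xs, ys)),
    }


datasets = {"S1": (C1, C2), "S2": (C1, C3), "S3": (C1, C4), "S4": (C5, C6)}
for name, (xs, ys) in datasets.items():
    stats = summary(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four printed rows should be nearly identical, which mirrors what the STATS screens show.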

It's time to look at the plots together with their best-fit lines.
Ensure that all four plots are checked, then press View and choose
Autoscale to get a graph with appropriate settings. Press Menu and check
Fit (the small dot behind it can be switched on and off). You would
expect to see all four point clouds and all four regression lines, but
something is not right: only one regression line is visible, the one
that belongs to the blue point cloud. Why do we not see the other three?
The answer follows a little further on.

Let's adjust the data a little. Go to the Home screen with the Home key
and type C2+1 ▶ C2 (see the screen on the right). Make similar
adjustments for C3 and C4, so that the dependent variable is increased by
1 in the first statistical plot S1, by 2 in S2 and by 3 in S3.

Now open the plot again with the Plot key and behold: we see all four
calculated regression lines, which run parallel to each other!

So that is why only the blue line of the first statistical plot showed:
the other three lines were calculated as well, but they lay underneath
the blue line, because all four point clouds have the same regression
line. Even though the four point clouds are very different, they all
yield the same best-fit line. This is no coincidence: Anscombe
deliberately chose the values so that, despite their differences, they
give the same result under a standard linear regression.
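
The effect of adding a constant to a column can also be checked with a small Python sketch. It is an illustration only: `fit_line` is a hypothetical helper that implements the usual least-squares formulas, and the data are assumed to be the published Anscombe values for the first data set. Adding 1 to every y-value, as C2+1 does on the calculator, should raise the intercept by 1 and leave the slope unchanged, which is why the shifted lines are parallel.

```python
# Assumption: published Anscombe (1973) values for the first data set,
# taken to correspond to columns C1 and C2 in the app.
C1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
C2 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def fit_line(xs, ys):
    """Least-squares slope and intercept (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope_before, intercept_before = fit_line(C1, C2)

# Mirror the calculator step: add 1 to every value of the dependent variable.
C2_shifted = [y + 1 for y in C2]
slope_after, intercept_after = fit_line(C1, C2_shifted)

print("before:", round(slope_before, 3), round(intercept_before, 3))
print("after: ", round(slope_after, 3), round(intercept_after, 3))
```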

Undo the changes: go to the Home screen with the Home key and type
C2-1 ▶ C2. Undo the changes for C3 and C4 in a similar way, so that the
original situation is restored.

Open the Symbolic view with the Symb key to view the different plots
separately. Leave only S1 checked and look at the plot with the
regression line. Place the cursor on the line with U. Use and to show
the equation: S1=0.5*X+3 (approx.). We can also look at the residual
values; we store them in C7 so that we can plot them as well.

Open the Home screen with the Home key and enter: Resid(S1) ▶ C7. The
Resid command (for the residual values of the regression calculation) can
be found via: b, choose , 1Var.Statistics and option 3 Resid.

Open the Symbolic view with the Symb key and set S2 to use C1 and C7, as
shown in the screen on the right. Keep only S2 checked by unchecking the
other plots.

Then press View and choose Autoscale to see the graph of the residual
values. Is a linear fit to the data in C1 and C2 appropriate?

In the same way, check the residuals for the plots of C1 and C3, C1 and
C4, and C5 and C6. Investigate whether linear regression is the
appropriate type of fit in each case.
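
As a cross-check outside the calculator, the residuals can be sketched in Python as the observed y-value minus the value predicted by the least-squares line. The helpers `fit_line` and `residuals` are hypothetical names, not HP Prime commands, and the data are assumed to be the published Anscombe values for the first two data sets. A residual column that scatters without pattern supports a linear fit; a systematic pattern, such as a curve, speaks against it.

```python
# Assumption: published Anscombe (1973) values, taken to correspond to
# columns C1 (shared x), C2 (first y) and C3 (second y) in the app.
C1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
C2 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
C3 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def fit_line(xs, ys):
    """Least-squares slope and intercept (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Observed y minus the y predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


res_s1 = residuals(C1, C2)
res_s2 = residuals(C1, C3)

# Print the residuals ordered by x so that any pattern is easy to spot.
print(" x   resid S1   resid S2")
for x, r1, r2 in sorted(zip(C1, res_s1, res_s2)):
    print(f"{x:2d}   {r1:8.2f}   {r2:8.2f}")
```

Under these assumptions, the second column should show negative residuals at both ends of the x-range and positive ones in the middle, a curved pattern that a straight line cannot capture.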
