


WORKBOOK
to accompany

STATISTICAL REASONING
IN PSYCHOLOGY AND EDUCATION
Second Edition

Edward W. Minium

prepared by
Gordon Bear
Ramapo College of New Jersey

John Wiley & Sons, Inc.


New York • Santa Barbara • Chichester • Brisbane • Toronto
Copyright © 1978 by John Wiley & Sons, Inc.

Reproduction or translation of any part of this work beyond that permitted by Sections 107 or 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to John Wiley & Sons, Inc.

ISBN 0 471 03663 1


Printed in the United States of America

10 9 8 7 6 5 4 3 2 1
Acknowledgments

I am indebted to many fine people for important contributions to this workbook. Thank you to:

Ed Minium, whose text has made statistics so easy for me to teach and so
easy for my students to learn, and who also provided a careful critique of
the first three chapters of the workbook;

Jack Burton, my editor, who supplied much valuable guidance and understood
when the work expanded to fill the time available;

Bob Abelson, Barry Collins, and Fred Sheffield, who were my own conscientious instructors in the subject of statistics;

my teaching assistants at the University of Wisconsin, who worked diligently with me and provided helpful advice for improving my instruction;

my many students, from whom I have learned a great deal about the teaching
of statistics and other matters;

John Harsh, who offered to share with me the fruits of his labor on his
mastery-plan workbook;

Bob Worsham, who contributed the data used in the homework for Chapter 9;

the Faculty Research Committee and the Academic Administration of Ramapo College, who reduced my teaching duties to facilitate my work on this project;

Joe Fontanazza, who found the fantastic machine on which I typed both the
rough and the final drafts;

the classical-music stations of New York City, which nourished my spirit through many long nights of typing;

Cartoon Cat, who provided companionship through those nights.

I happily dedicate this work to the women in my life.

A PERSONAL STATEMENT from the AUTHOR of this WORKBOOK

I want to help you learn statistics and earn a good grade in this course.

You're off to a good start, because your text is the book by my colleague
Ed Minium, and it's an outstanding one. I taught from the first edition of
this book ten times: seven times at the University of Wisconsin at Madison,
where my classes had 60 to 90 students, and three times at Ramapo College of
New Jersey, where my classes had 10 to 30 students. At both schools the book
proved to be excellent for developing a thorough understanding of statistics,
and now the second edition is even better. Many other texts supply only the
bare facts, but Dr. Minium gives you much more. He shows you the overall
structure into which the facts fit, and he adds details that provide insights
into matters treated only superficially in other texts.

The purpose of my workbook is to help you master Dr. Minium's text. You'll
find here:

•Do-it-yourself summaries that direct your attention to the important points
•Maps showing you how the various concepts in statistics connect with each other
•Exercises in which you can practice your newly-learned skills
•Drills to help you become fluent in reading symbols and understanding them
•Tricks for remembering things
•Special help with the most difficult matters
•Examples of statistics at work in psychology, education, and the world at large

But please note: You will not be able to "cram" successfully by reading only
the summaries here. The summaries cannot substitute for the text itself. And
the exercises I offer you cannot substitute for those in the text either. In
fact, I have generally constructed exercises that differ markedly from the ones
in the text, to provide you with other kinds of opportunities for learning.

To get the most out of your course in statistics, attend class regularly,
study your text carefully, do the problems there—and then come to this workbook
and let me help you get really good at this stuff. I've tried hard to make this
book something special, and I'd be delighted to know that you used it.

TABLE OF CONTENTS

Author's Statement v
Are You Worried about the Mathematics in this Course? 1
Tips on Buying a Calculator 2
Chapter 1 Introduction 3
Chapter 2 Preliminary Concepts 9
Chapter 3 Frequency Distributions 19
Chapter 4 Graphic Representation 33
Chapter 5 Central Tendency 39
Chapter 6 Variability 47
Chapter 7 The Normal Curve 63
Chapter 8 Derived Scores 71
Chapter 9 Correlation 79
Chapter 10 Factors Influencing the Correlation Coefficient 87
Chapter 11 Regression and Prediction 91
Chapter 12 Interpretive Aspects of Correlation and Regression 97
Chapter 13 The Basis of Statistical Inference 103
Chapter 14 The Basis of Statistical Inference: Further Considerations 113
Chapter 15 Testing Hypotheses About Single Means: Normal Curve Model 119
Chapter 16 Further Considerations in Hypothesis Testing 129
Chapter 17 Testing Hypotheses About Two Means: Normal Curve Model 137
Chapter 18 Estimation of μ and μX − μY 155

Chapter 19 Inference about Means and the t Distribution 161


Chapter 20 Inference about Pearson Correlation Coefficients 169
Chapter 21 Some Aspects of Experimental Design 177
Chapter 22 Elementary Analysis of Variance 187
Chapter 23 Inference about Frequencies 207
Chapter 24 Some Order Statistics (Mostly) 219
Answers 227
Homework 246
ARE YOU WORRIED ABOUT THE MATHEMATICS IN THIS COURSE?

As of this writing, I have led over 600 students through an introductory
course in statistics. Almost all of them, I'm sure, were initially worried
about the mathematics facing them. One young woman even dreamed of being
attacked by numbers on the night before her first class with me. But 95% of
these students passed the course with a grade of C or better, in spite of my
high standards, and the person who had the nightmare earned a strong A. She
also acquired good self-confidence about mathematics. You can be a success

too.

Consider this:

•Your text emphasizes the logic of statistics, not the theorems, formulas, and proofs that mathematicians work with. The title of the
book is "Statistical Reasoning in Psychology and Education," and it's
reasoning, not mathematics, that's important here.

•The mathematics employed in the text is only simple algebra. You covered this in high school, and if you need to relearn it, you can
do so easily. Appendix A in the text will help.

Furthermore, look what you've got going for you:

•Your text is an exceptionally good book. As I noted above, it presents more than the bare facts. It also provides the big picture,
so you can see how the facts fit together, and the fine details, so
you can gain insight into the facts.

•This workbook offers summaries of the text, exercises to help you in various ways, and interesting examples of the things you're learning.

•Your instructor (and your teaching assistant, if you have one) will
go over the material in the book and answer any questions you have.

Moreover:

•This course itself provides a leisurely review of mathematics. The math in the course begins with counting (tallying up observations). It goes on to proportions and percentages, and more complex matters come up only later. So you can gradually relearn whatever you're uncertain of—and you'll be relearning it in a context that makes it vivid and useful.

The mathematics in this course is thus fully within your comprehension. If you've got some time to devote to the course, you can learn absolutely everything in your text and feel really good about it.

TIPS ON BUYING A CALCULATOR

A miniature calculator would be a good investment for this course, and you'll probably find other uses for it too.

You don't need anything fancy. You'll have no use for the special features of the "scientific" calculators that are meant to replace a slide rule—no use for the keys for pi (π), logarithms, exponentiation, or the trigonometric functions sine, cosine, tangent, and cotangent. (I told you this course doesn't require fancy mathematics.) You do want the following:

•an add-on memory, which permits you to add a number to another number
already in storage. The key for doing so is usually labeled M+. A machine
with add-on memory usually also has keys labeled M-, MR or RM, and MC or
CM, in which case it's said to have a four-function or four-key memory.

•automatic constant for multiplication (for which there's usually no special key), or a key for squaring a number, labeled x². Automatic constant works like this: To square a number (to multiply it by itself), you first enter the number and then hit the x key. Instead of entering the number again, though, you just press the = key. See if this works on any calculator you're trying: 3 x = should get the calculator to read 9. A key labeled x² is even better.
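If you're curious how that trick works inside the machine, here's a toy model written in the Python programming language (a modern aid, not something this course requires, and real calculator firmware certainly varies):

```python
# Toy model of a calculator with "automatic constant" for multiplication.
# After you enter a number and press the x key, pressing = with no second
# number multiplies the display by the remembered constant.
class Calculator:
    def __init__(self):
        self.display = 0
        self.constant = None  # the factor remembered for repeated =

    def enter(self, number):
        self.display = number

    def press_multiply(self):
        self.constant = self.display  # remember the displayed number

    def press_equals(self):
        if self.constant is not None:
            self.display = self.display * self.constant
        return self.display

calc = Calculator()
calc.enter(3)
calc.press_multiply()       # the 3 becomes the constant factor
print(calc.press_equals())  # 3 x = squares the number: prints 9
```

The point of the trick is simply that the machine remembers the first factor, so the = key can reuse it.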

You should also look for a machine with:

•positive-action keys, which click or change in amount of resistance to the touch when they work. Some machines give you no feedback on whether a given key has functioned, and you wind up constantly checking the display, which is a nuisance.

•keys that are big enough for you to press easily and accurately. Too much
miniaturization is a liability.

•a square-root key. It'll be labeled √ or √x.

•a display that you can read easily from a variety of angles.

Be a smart shopper:

•Compare models, guarantees, and prices, and try several stores.

•Ask about the stores' policies on defective merchandise. What will they do if your purchase malfunctions after you get it home? The store should agree that if it breaks within 30 days, they will replace it with a new machine from their own stock, rather than sending the old one to the factory for repair. Ask the salesperson to write "30-day exchange" on your receipt and sign it.

You should be able to get what you want for less than $20.

CHAPTER 1

INTRODUCTION

Here's a list of the sections that make up Chapter 1 in your textbook. If you'd like to keep track of your progress in this course, there's space to write in such things as "Okay," "Reread," or "Ask about this."

___ 1.1 Descriptive Statistics

___ 1.2 Inferential Statistics

___ 1.3 Relationship and Prediction

___ 1.4 Kinds of Statisticians

___ 1.5 For Whom Is This Book Intended?

___ 1.6 The Role of Applied Statistics

___ 1.7 More about the Road Ahead

___ 1.8 Dirty Words about Statistics

___ 1.9 Some Tips on Studying Statistics

Now here's a list of the problems and exercises at the end of Chapter 1.
Again, you can keep track of your progress by checking off those you did and
noting which ones you answered correctly, which ones you want to ask your
instructor or your teaching assistant about, and the like.

1 ___

2 ___

3 ___

4 ___

SUMMARY

To construct a helpful summary of the first chapter in your text, write in the appropriate term or phrase where a blank appears below, and cross out the incorrect wording where you're offered a choice. Following each blank and each choice is a number in brackets that tells you which section of the text has the answer. You might want to treat this as a test, though, and see how well you can do without looking in the book.

What Is (Are?) Statistics?

In ordinary speech, the term statistics refers to facts involving numbers, as in the expression "unemployment statistics." The word is plural in this sense ("The statistics are due to be released soon"). In Ch. 1, however, the text uses the term in a different sense, a sense in which the term is singular. In this sense, statistics is a field within the discipline of mathematics, a field comparable to algebra, geometry, and trigonometry.

As a field of mathematics, statistics consists of techniques for solving problems. The techniques fall into two categories: descriptive statistics and inferential statistics. The primary function of descriptive statistics is to provide meaningful and convenient techniques for ________ [For the answer, see the last sentence of Section 1.1 in the text]. As for inferential statistics, the object of these procedures is to draw an ________ about conditions which exist in a larger set of observations from study of ________ [1.2]. This branch of statistics is also known as ________ statistics [1.2].

Who Uses Statistical Techniques?

Those who work with statistics might be divided into four classes: (1) those who need to know statistics in order to appreciate ________, (2) those who must select and apply statistical treatment in the course of ________, (3) professional ________, and (4) ________ statisticians [1.4]. The main interest of those in the first two classes is in statistics itself / their own subject matter [Cross out the incorrect wording]. We might think of them as amateur / professional statisticians [1.4].

The professional (practicing) statistician acts as a ________ in the process of research by assisting those with ________ questions



(the first two kinds of persons) in finding statistical techniques and applying them to the evidence they gather [1.4].

These three kinds of persons are more interested in applications of statistical techniques than in the theory behind them. In contrast, the fourth kind of person, the mathematical statistician, is primarily interested in the theory. The text is concerned with applied / theoretical statistics, and it is directed to the prospective amateur / professional statistician [1.5].

(For a "map" of the information reviewed here, see p. 6 of this workbook.)

How Do Statistical Techniques Figure into Research?

It is important to distinguish between substantive matters and statistical matters. An investigation typically begins with a ________ question, which is a question of ________ [1.6]. The statistical work comes later and begins with a ________ question, the answer to which is expected to throw light on the ________ question [1.6]. A statistical question differs from a substantive question in that it concerns a statistical (that is, numerical) property of the data. Upon applying a statistical procedure, one arrives at a ________ conclusion, which like the corresponding question concerns a ________ (numerical) property of the data [1.6]. Finally, a ________ conclusion is drawn. This conclusion derives entirely / only partly from the statistical conclusion [1.6]. Thus statistical procedures are tools that enable a researcher to move from a substantive question to a substantive conclusion.

(For a map of this information, see p. 7 below.)

Is It True What They Say About Statistics?

Here are five accusations about the field of applied statistics. Each contains some truth, but not the whole truth. No one is trying to tell you what opinions to adopt, but the author of your text and I would like you to see both sides of each issue. So fill in a counterargument for each accusation. Those in the text appear in Section 1.8.

1. Statistics is a dry field. ________

2. Statistical techniques are depersonalizing, because they deal with groups of people and not with individuals. ________

3. Statistical techniques give misleading results. ________

4. Statistical techniques dictate the kind of research that is done. ("The statistical tail wags the substantive dog.") ________

5. Statistics is too mathematical a field for anyone but an expert to understand. ________

MAP of STATISTICS and THOSE INVOLVED WITH IT

Statistics (as a field of mathematics)
    is developed by → Mathematical Statisticians
    is applied to substantive questions by → Professional Statisticians,
        who serve as consultants to → Amateur Statisticians
    consists of → Descriptive Statistics and Inferential Statistics

Amateur Statisticians consist of:
    those who need to understand statistics in order to understand reports of research in their own fields
    those who use statistical techniques in their own research

MAP of the ROLE of STATISTICS in RESEARCH

Research
    begins with → Substantive Question
    often requires the use of → Statistical Techniques,
        of which we ask → Statistical Question
        and which yield → Statistical Conclusion
            (contributes to but does not fully determine the substantive conclusion)
    ends with → Substantive Conclusion

Statistics in Action

DATA-LOVING JAPANESE REJOICE on STATISTICS DAY

TOKYO, Oct. 27—This month, the 10th one of the year, the 72-year-old Takeo Fukuda, this nation's 13th post-war Prime Minister, leads the 113 million citizens of Japan in marking the fourth anniversary of a very special event in the official life of Japan—Statistics Day.

Japan, a 2,600-year-old nation that consists of 3,937 islands covering 145,267 square miles, was a relative latecomer in the official compilation of numbers used throughout the world today to portray national characteristics.

But there is likely no nation that [now] ranks higher in its collective
passion for statistics.

In Japan, statistics are the subject of a holiday, local and national conventions, awards ceremonies and nationwide statistical collection and graph-drawing contests.
"This year," said Yoshiharu Takahashi, a Government statistician, "we have
almost 30,000 entries. Actually, we had 29,836."

"In a modern society," noted Mr. Takahashi, "statistics have become a


necessity." In addition to the obvious statistical categories, the central
Government now compiles figures on such things as the success rate of the
artificial incubation of chicken eggs, the number of railroad cars produced,
the volume of mail from overseas, the size of children's monthly allowances,
the number of baseball gloves imported, and the frequency of tootbrush usage.

Four years ago, however, the Government began to notice a statistical decline in the cooperation rate of its citizens, many of whom were apparently unconvinced of the numbers' necessity. Thus National Statistics Day was established.
This year's national theme is "Statistics are the beacon for our happy life." Entries in the statistical graph contest were screened three times by judges, who gave first prize this year to the work of five 7-year-olds. Their graph creation, titled "Mom, play with us more often," was the result of a survey of 32 classmates on the frequency that mothers play with their offspring and the reasons given for not doing so (the most often heard excuse: "I'm just too busy").

Tomorrow, 2,500 Government employees involved with statistics will gather in the city of Fukui for Japan's main statistical rally. The highlight will be an address by Prof. Takashi Iga on "The kinds of statistics needed for the economy of the future."

But there is one figure that won't be included: Officials do not yet keep
statistics on the number of statistics they keep. "We don't know," says Mr.
Takahashi, "they are countless."

[Copyright 1977 by The New York Times Company. Reprinted by permission. The
actual date of Statistics Day is October 18.]

Note the sense in which the word statistics is used in this article (in
the sixth paragraph, for example). This is not the kind you're studying in
this course. Any 7-year-old can comprehend the kind the article talks about.
What you're studying is techniques for describing this kind and for drawing
proper inferences from them.
CHAPTER 2

PRELIMINARY CONCEPTS

Here's a list of the sections that make up Chapter 2 in the text. As with Chapter 1, if you'd like to keep track of your progress, there's space to write in things like "Okay," "Reread," or "Ask about this."

___ 2.1 Populations and Samples

___ 2.2 Random Samples

___ 2.3 The Several Meanings of "Statistics"

___ 2.4 Variables and Constants

___ 2.5 Discrete and Continuous Variables

___ 2.6 Accuracy in a Set of Observations

___ 2.7 Levels of Measurement

___ 2.8 Levels of Measurement and Problems of Statistical Treatment

Here's a list of the problems and exercises at the end of Chapter 2. To keep track of your progress, check off those you did and note such things as whether you answered a question correctly or need to ask for help with it.

1 ___   2 ___

3 ___   4 ___

5 ___   6 ___

7 ___   8 ___

9 ___   10 ___

11 ___   12 ___

13 ___   14 ___

SUMMARY

Populations, Samples, and Random Sampling

The term population is used in two senses in the field of statistics. In one sense, it refers to the people (or whatever) the investigator is studying, and the term designates the group about which the investigator wishes to ________ [2.1]. The term sample designates a ________ of a population [2.1]. In the second sense, the term population refers to the ________ set of ________ [2.1]. A sample again is a part of a population. A single observation or measurement (a sample of size one) is called an ________ [2.1]. Of the two definitions of population, the one that usually works better in statistics is the first / second [2.1].

If a sample is drawn in a certain way from its parent population, it is a random sample. For a sample to be random, it must have been selected in such a way that every ________ in the ________ had an equal opportunity of being included in the sample [2.2].

Random samples have two important properties. First, suppose we draw several random samples of the same size from a given population. It is highly unlikely that we would get exactly the same collection of elements in each sample. Thus the characteristics of the sample as a whole will change / stay the same from sample to sample [2.2]. This phenomenon is called sampling variation.

The second important property of random samples is the effect of the size of the sample on the amount of variability that occurs among different samples of the same size. The larger the samples, the more / the less the variation in the characteristics of the samples [2.2]. This fact is of great importance in statistical description / inference because it means that large samples will provide us with a more / less precise estimate of what is true about the population than we can expect from small samples [2.2].
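If you have access to a computer, both properties can be seen in a short simulation written in the Python language (my illustration, not material from the text; the population of scores here is made up):

```python
import random
import statistics

random.seed(1)  # fixed seed so the demonstration is repeatable

# A made-up population of 10,000 test scores.
population = [random.gauss(100, 15) for _ in range(10_000)]

def sample_means(sample_size, n_samples=200):
    """Draw many random samples of one size; return the mean of each one."""
    return [statistics.mean(random.sample(population, sample_size))
            for _ in range(n_samples)]

# Sampling variation: random samples of the same size differ from one another,
# so their means differ too.
small = sample_means(10)    # means of 200 samples of size 10
large = sample_means(400)   # means of 200 samples of size 400

# The spread among the sample means shrinks as the samples get larger.
print("spread of means for n=10: ", round(statistics.stdev(small), 2))
print("spread of means for n=400:", round(statistics.stdev(large), 2))
```

With the larger samples, the means cluster much more tightly around what is true of the population, which is exactly why inference from large samples is more precise.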

More Meanings for the Term Statistics

In Section 2.3 the text lists four meanings for the term statistics. In the first two senses the word is singular and refers to branches of mathematics.

Statistics in the first sense is ________ statistics, which is the science of organizing, describing, and analyzing bodies of quantitative (numerical) data; this is the kind of statistics that consists of descriptive and inferential techniques. Statistics in the second sense is statistical ________, the branch of mathematics, owing much to the theory of ________, that provides the theory behind the descriptive and the inferential techniques.

The third meaning of the word listed in Section 2.3 is the meaning that the word takes in common usage. The word has already been defined in this sense on p. 4 of this workbook as facts involving numbers. The text defines the word in this sense as a set of ________, such as averages. In this sense we may use the word in the singular and speak of a statistic, meaning a single numerical fact or a single index. The fourth meaning of the word statistics is a refinement of the third. In this sense the word is again plural, but it refers not just to any old numerical facts, but to indices that describe a sample. In this sense, then, a statistic is a characteristic of a sample. The comparable word for a characteristic of a population is ________.

Constants and Various Kinds of Variables

Suppose we have decided to study a certain group of people, such as the students at a particular college. The characteristic that defines this group—the college they attend—does not vary from person to person. So far as our study goes, it is not possible for this characteristic to have other than a single value, and it is referred to as a constant / variable [2.4].

Other characteristics, like the sex of a subject, can vary from one person to another within the group we are studying. A characteristic such as this which may take on different values is a constant / variable [2.4].

A variable may be either qualitative or quantitative. A qualitative variable consists of discrete categories that differ in quality, not in quantity. Such a variable is also called a ________ variable [2.7, Paragraph 2], and an example is a person's sex, which can be either male or female. The two categories, male and female, differ in kind but not in degree. To designate one person as male and another as female is to say that they differ, but the designations do not say that one is more of something than the other.

A quantitative variable, on the other hand, consists of values that do differ in quantity. An example is the number of siblings (the number of brothers or sisters) that a person has, which can vary from 0 up to 10 or more. Another example is a person's height. In each case, people who differ in where they stand on the variable possess different quantities of it.

These two examples, number of siblings and height, illustrate the two types of quantitative variable. Number of siblings can take only certain values, namely 0, 1, 2, 3, and so on. No person can have 0.6 or 1.2 siblings. Values of such variables are stepwise in nature, and such variables are said to be discrete / continuous [2.5]. In contrast, height can take any value within the range of possible heights. A person can be exactly five feet tall, or just a little more than five feet, or just a little more than that, and so on. Whereas a discrete variable has gaps in its scale, this kind of variable has none, and it is known as a ________ variable [2.5, Paragraph 2].

A qualitative variable is always a discrete one.

Accuracy in Measurement

Suppose we have a quantitative variable that is discrete, such as number of siblings. If we record the value of such a variable with no error, we have recorded an exact number. Numbers lacking this kind of accuracy are known as ________ numbers [2.6]. In working with a quantitative, discrete variable, such numbers arise if our method of collecting data has ________ the potential accuracy in the discrete values [2.6]. This happens when we estimate a value or round it.

Now suppose the variable we're working with is quantitative and continuous, such as height. Even though a variable is continuous in theory, the process of measurement always reduces it to a ________ one [2.5, Paragraph 4]. Thus any measurement of a continuous variable must be treated as an exact / approximate number [2.6, Paragraph 2].
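A small numerical sketch (hypothetical heights, and again my own modern illustration rather than anything in the text) shows how measuring reduces a continuous variable to a discrete one and why the recorded numbers are approximate:

```python
# Four hypothetical "true" heights in inches: continuous, infinitely fine.
true_heights = [64.231, 64.287, 64.243, 65.013]

# Any measuring procedure reports only to some nearest unit, here a tenth
# of an inch, so the recorded values fall on a discrete, stepwise scale.
recorded = [round(h, 1) for h in true_heights]
print(recorded)  # [64.2, 64.3, 64.2, 65.0] -- distinct true values collapse

# Each recorded number is therefore approximate: a recorded 64.2 stands for
# any true height between 64.15 and 64.25, not for one exact value.
```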

Levels of Measurement

Measurement is the process of assigning a number to a subject (or to whatever a researcher is studying) so as to indicate the value of some variable that characterizes the subject. There are three techniques of measurement: categorizing, ranking, and scoring. (This information is not in the text but should help clarify it.)

Categorizing is the technique of measurement used for a qualitative (categorical) variable (such as sex). In categorizing, a number is assigned to a subject to indicate the category into which the subject falls. (The number 1 might be used to indicate the category "male" and the number 2, the category "female.") The several categories of a qualitative variable ("male" and "female" for the variable sex) are said to constitute a(n) nominal / ordinal / interval / ratio scale [2.7]. All observations placed in the same category are considered to be ________ [2.7]. Numbers do not actually have to be used here, but if numbers are used to identify the categories of a given nominal scale, the numbers are simply a substitute for the ________ of the categories and serve only for purposes of ________ [2.7].

Ranking and scoring are techniques of measurement used for quantitative variables (such as competence as a workman). In ranking, a number from the series 1, 2, 3, and so on is assigned to a subject to indicate the subject's position relative to others in the magnitude of the variable of interest. The subject with the greatest magnitude (the subject with the greatest merit as a workman) is ranked 1; the subject with the second greatest magnitude is ranked 2; and so on. In this type of measurement, the numbers form a(n) nominal / ordinal / interval / ratio scale [2.7]. The basic relation expressed in a series of numbers used in this way is that of ________ [2.7]. (The 1, for example, says that the workman given this rank is greater in competence than the workman given the rank 2.) However, nothing is implied about the ________ of the difference between adjacent steps on the scale. (The difference in merit between the man ranked first and the man ranked second may be large or small, and this difference is not necessarily the same as that between the man ranked second and the one ranked third.) Further, nothing is implied about the ________ of whatever variable is being assessed [2.7]. (All workmen could be excellent, or all could be quite ordinary; the numbers do not indicate which.)

In scoring, a number is assigned to a subject to indicate where the subject stands on the variable of interest, without regard for where anyone else stands. The number is a true score that indicates how much of the variable the subject is thought to possess, and the difference between one score and another is meaningful. (Workmen might be scored for competence on a scale from -5 to +5, for example, with 0 indicating an average degree of competence.) If the possible scores form an interval scale, a given numerical interval along the scale (a difference of 1 point, say) is considered to represent the same difference / varying differences in the characteristic being measured irrespective of the ________ of that interval along the measurement scale [2.7, Paragraph 4]. (The difference of 1 point between the scores of -5 and -4 is considered to represent the same difference in competence as the difference of 1 point between the scores of, say, 0 and +1 or +4 and +5.)

When measurement is at this level, the level of the interval scale, one may talk meaningfully about the ________ between intervals [2.7, Paragraph 5]. (The interval of 2 points between +3 and +5 represents twice as big a difference in competence as the interval of 1 point between +3 and +4, for example, because 2 ÷ 1 = 2.) Nevertheless, it is not possible to speak meaningfully about a ratio between two ________ [2.7]. (It is not meaningful to assert that a workman rated at +3 has three times the competence of a workman rated at +1, even though 3 ÷ 1 = 3.) The reason is that the ________ point is arbitrarily determined, and does not imply an absence of the characteristic being assessed. (A rating of 0 does not indicate a total absence of competence in the example offered here.)
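A little arithmetic (set out here in Python, using the hypothetical -5 to +5 competence ratings from the example above) makes the contrast concrete: differences on an interval scale survive a change of zero point, but ratios do not.

```python
# Hypothetical competence ratings on the -5 to +5 interval scale above.
ratings = {"Smith": 3, "Jones": 1}

# Recode the same scale to run from 0 to 10 (just a shift of the zero point;
# no one's competence changes).
shifted = {name: r + 5 for name, r in ratings.items()}

# Differences survive the change of zero point...
diff_original = ratings["Smith"] - ratings["Jones"]   # 3 - 1 = 2
diff_shifted = shifted["Smith"] - shifted["Jones"]    # 8 - 6 = 2, unchanged

# ...but ratios do not, which is why "three times as competent" is
# meaningless unless zero marks a true absence of the characteristic.
ratio_original = ratings["Smith"] / ratings["Jones"]  # 3 / 1 = 3.0
ratio_shifted = shifted["Smith"] / shifted["Jones"]   # 8 / 6 = 1.33...

print(diff_original == diff_shifted)    # True
print(ratio_original == ratio_shifted)  # False
```

On a ratio scale, where zero is a true absence, no such recoding is legitimate, and that is precisely what makes the ratio meaningful there.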

If the numbers available for scoring form a ratio scale, they possess all

the desirable features of the interval scale, and in addition the ratio be¬

tween ffyftCc .,_ becomes meaningful [Table 2.1]. On a ratio scale, the

number zero indicates a true absence of the characteristic being assessed.

(Competence as a workman might be measured as the number of faults one can

find and repair in a widget rigged with a dozen malfunctions, for example.

A score of zero would then indicate an absence of competence of this kind,

and a man scoring 3 could be meaningfully said to be three times as competent

as a man scoring 1, since 3 ÷ 1 = 3.)


Preliminary Concepts 15

The Effect of Level of Measurement on Statistical Treatment

The level of measurement at which a variable is assessed limits the kind


of statistical treatment applicable to the observations on that variable. For
example, if a variable is measured at the nominal level, by simply categorizing
the subjects, it is not meaningful to find an average. (If some subjects are
categorized as male and each designated as a "1," while others are categorized
as female and each designated as a "2," it would be silly to find the aver¬
age of all the l's and 2's and say that the subjects' sex averaged out to be
1.4 or whatever.)

This point is obvious. A more subtle point is that numbers are often used
in psychology and education that look like they fall on an interval or even a
ratio scale, but we cannot be sure that they really do. Some authorities ad¬
vise the use of statistical techniques appropriate for ordinal scales in such
cases. But as Dr. Minium implies at the end of Section 2.8 in his text, the
weight of the evidence suggests that in most situations, it is okay to treat
the ambiguous numbers as though they came from an interval or even from a ratio
scale.
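To see the silliness in code: here is a tiny Python sketch. The 1-and-2 coding follows the example above; the 7-and-3 coding is an equally arbitrary alternative we made up. Notice that the "average" changes completely when the arbitrary codes change:

```python
# Nominal codes are arbitrary labels, so arithmetic on them is meaningless.
sexes = ["male", "male", "female", "male", "female"]

coding_a = {"male": 1, "female": 2}   # the coding used in the example above
coding_b = {"male": 7, "female": 3}   # an equally arbitrary alternative coding

mean_a = sum(coding_a[s] for s in sexes) / len(sexes)
mean_b = sum(coding_b[s] for s in sexes) / len(sexes)

print(mean_a)   # 1.4 -- the "average sex" under coding A
print(mean_b)   # 5.4 -- a different "average" under coding B, same people
```

The people have not changed, only the labels, yet the two "averages" disagree, which is exactly why averaging nominal data is meaningless.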
16 Chapter 2

MNEMONIC* TIP

A statistic (in the fourth sense of the word listed in Section 2.3) is a
characteristic of a sample, and a parameter is a characteristic of a popula¬
tion. You can easily remember the distinction by noting that statistic and
sample both begin with an S, while parameter and population both begin with
a P.
*Mnemonic ("nem-ON-ik"): pertaining to memory.

EXERCISES*

You are one of those bright young people who's making a lot of money these
days by conducting telephone polls for politicians. Governor Grassroots hires
you to tell her what proportion of the adults in her state approve of her work
as the state's chief executive. You get your staff busy making telephone calls
around the state, and they ask each adult they reach, "Do you think the present
governor is doing a good job?" You yourself make the first call, which is an¬
swered by John Q. Public, who says "Yes." In all, your firm completes 500 such
quickie interviews.

In this example, what is the population of interest to you? Define it in


the two senses of the word noted in Section 2.1 of the text.

1. _______________________________

2. _______________________________

What is the sample that you have drawn? Again define it in two senses.

3. _

4. _______________________________

How large is your sample? 5._ Mr. Public (or his answer to your question)

is technically known as what? 6. _______________________________


7. Is the

state in which your respondents live a constant or a variable? _


Now suppose your staff asks each adult not only the question about the
governor's performance but also his or her age, sex, and political affiliation,
if any. Sex is recorded as "male" or "female," age as the number of the latest
birthday, and political affiliation as "Democrat," "Independent," or "Republican."
Describe these variables by filling in the following table.

*The answers to these and all other exercises here appear at the back of
this book, beginning on p. 227.

Qualitative or Discrete or Measured on Nominal, Ordinal,


Quantitative? Continuous? Interval, or Ratio Scale?

Sex 8. 9. 10.

Age 11. 12. 13.

Affiliation 14. 15. 16.

17. Now we come to an ambiguity. Suppose the answers to the question about
the governor's performance are recorded as "Yes," "No," and "No Opinion."
Should the answers be treated as categories making up a nominal scale for the
measurement of a qualitative variable? Why or why not?

18. Here's another question for which the answer is not clear cut. Marketing
researchers often ask their respondents to rate a product by saying "Excellent,"
"Good," "Fair," "Poor," or "Bad," and they score these answers as 5, 4, 3, 2,
and 1, respectively. This scoring is a way of measuring favorability (or un-
favorability) of opinion toward the product. What level of measurement is this?
CHAPTER 3

FREQUENCY DISTRIBUTIONS
In case you're just starting now to use this workbook: the blank lines on
this page are explained on the title pages for Chapters 1 and 2.

_ 3.1 The Nature of a Score

3.2 A Question of Organizing Data

3.3 Grouped Scores

3.4 Characteristics of Class Intervals

3.5 Constructing a Grouped Data Frequency Distribution

3.6 Grouping Error

3.7 The Relative Frequency Distribution

3.8 The Cumulative Frequency Distribution

3.9 Centiles and Centile Ranks

3.10 Computation of Centiles from Grouped Data

3.11 Computation of Centile Rank

PROBLEMS and EXERCISES

1  2  3
4  5  6
7  8  9
10 11 12
13 14 15
16 17 18
19 20 21
22 23 24


SUMMARY

The Frequency Distribution

Ch. 3 presents the basic technique of descriptive statistics, which is the


frequency distribution. The chapter shows how to apply this technique to a
collection of scores on a quantitative variable. The collection may be either
a sample or a population.

To construct a frequency distribution (with no grouping of scores), locate

the _est and the _est score values [3.2]. Then list all possible

score values / only the scores that actually occurred, including these two

extremes, in ascending / descending order down the page [3.2]. Finally, add

a second column to the right of the first one. In this column, list for each

score value its frequency of occurrence (the number of times it occurred).

Frequency is abbreviated Freq in Tables 3.2 and 3.3 and thereafter symbolized

with the letter _, as in Table 3.4.

The term distribution is appropriate for the collection of scores when it


is cast into a table like this, because the table shows how the scores are
distributed over the range of possible values.
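If you'd like to check your work by computer, the construction just described can be sketched in a few lines of Python (the scores here are hypothetical):

```python
from collections import Counter

scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 4]  # hypothetical data

counts = Counter(scores)  # frequency of occurrence of each score value

# List all possible score values from the highest down to the lowest,
# with the frequency (f) of each beside it -- zero frequencies included.
for value in range(max(scores), min(scores) - 1, -1):
    print(value, counts.get(value, 0))
```

The printed pairs are exactly the two columns of an ungrouped frequency distribution: possible score values in descending order, each with its f.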

Grouping Scores

Sometimes it is helpful to group the scores before making a frequency dis¬

tribution. Grouping makes it easier to display the data and grasp the essen¬

tial facts they contain. In grouping, the various possible scores are collected

into a number of class __________ [3.3]. Here are some rules for making

these groupings:

1. A set of class intervals should be mutually exclusive. That is, inter¬

vals should be chosen so that _______________________________ [3.4].

2. It is / is not important that all intervals be of the same width [3.4].

3. The intervals should be continuous throughout the distribution.

4. The interval containing the highest score should be placed at the top /

bottom of the column listing the class intervals [3.4].

5. For most work, there should be not fewer than _____ class intervals and

not more than _____ [3.4].
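The grouping rules above can be sketched in Python; the scores, the interval width of 5, and the starting point are all hypothetical choices of ours:

```python
def grouped_frequencies(scores, width, start):
    """Group scores into class intervals of equal width (score limits),
    listed from the highest interval down to the lowest.
    `start` is the lower score limit of the bottommost interval."""
    top = max(scores)
    intervals = []
    low = start
    while low <= top:
        high = low + width - 1
        f = sum(1 for s in scores if low <= s <= high)  # mutually exclusive bins
        intervals.append(((low, high), f))
        low += width
    intervals.reverse()   # highest interval at the top of the column
    return intervals

scores = [3, 7, 8, 12, 13, 14, 17, 21, 22, 26]   # hypothetical data
for (low, high), f in grouped_frequencies(scores, width=5, start=3):
    print(f"{low:2d} - {high:2d}   {f}")
```

Because the intervals are continuous, of equal width, and mutually exclusive, every score lands in exactly one row, so the f values must sum to the number of scores.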

In a grouped frequency distribution (as in an ungrouped one), the total

number of cases in the distribution is found by summing the several values of



_, and is symbolized _ if the distribution is considered as a sample,

or by _ if it is a population [3.5]. All the examples here and in the text

assume that the collection of scores is a sample, so the symbol _ is used.

Disadvantages of Grouping

A grouped frequency distribution does not contain all the information that
the corresponding ungrouped one does, because within a given class interval,
one cannot tell exactly where the scores fell. Because this information is
not given, a problem called grouping error can arise. It is often necessary
to make calculations using the numbers in a grouped frequency distribution
(this very chapter discusses calculations of centile points and centile ranks,
for example), but in doing so, one cannot tell exactly where each score fell
along the scale of possible values, and so it is necessary to make an assump¬
tion about where the scores occurred. Sometimes it is assumed that the scores
within a given class interval are distributed evenly throughout the interval,
and sometimes it is assumed that they fell in any way such that the midpoint
of the interval is the average of the scores in that interval.

To the extent that such an assumption is false, the calculations based on

it will be in error. Other things being equal, the narrower the class inter¬

val width, the more / less the potentiality for grouping error [3.6].

A set of raw scores results / does not result in a unique set of grouped

scores [3.3]. That is, there is more than one way to construct a grouped fre¬

quency distribution for a given set of scores. This state of affairs is a

second liability for the grouped frequency distribution, because a researcher,

knowingly or unknowingly, may select a grouping that is misleading.

Relative Frequencies

It is often helpful to "cook" raw frequencies by transforming them into

relative frequencies, which are frequencies relative to the total number of

scores. There are two kinds of relative frequency, proportion and percentage.

A frequency expressed as a proportion of the total is called a proportional

frequency and is symbolized _ [Table 3.5]. To calculate a propor¬

tional frequency, divide the raw frequency by the total number of cases in

the distribution, or in symbols, calculate __ [3.7]. A frequency ex¬

pressed as a percentage of the total number of cases is called a percentage

frequency and is symbolized _ [Table 3.5]. To calculate a percentage

frequency, multiply the proportional frequency by _ [3.7].
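In code, the two conversions are one division and one multiplication; a sketch with hypothetical f values:

```python
freqs = [3, 3, 2, 1, 3]            # hypothetical raw f values for five intervals
n = sum(freqs)                     # total number of cases: 12

prop_f = [f / n for f in freqs]    # proportional frequency: f divided by n
pct_f = [100 * p for p in prop_f]  # percentage frequency: proportion times 100

print(pct_f[0])                    # 25.0 -- a raw f of 3 out of 12 cases
print(round(sum(pct_f)))           # 100 -- the percentages account for all cases
```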



The Meaning of a Score

Closely related to the frequency distribution is another kind of table


called a cumulative frequency distribution. To understand such a table, it
is necessary to understand a convention often adopted in descriptive statis¬
tics when the data on hand are scores on a quantitative variable. By con¬
vention, the variable is assumed to be continuous, even though the measure¬
ment of the variable yielded discrete data. A given score is then taken to
represent a range of values on the underlying continuum. A score of 18 items
correct on an algebra test, for example, is not treated as though it were a
highly precise measurement of exactly 18.0000.... Instead it is assumed to
represent a score somewhere between 17.5 and 18.5 on the underlying continuum.
The figures 17.5 and 18.5 are the limits of the score that is called 18, and
they are one-half of an integer below 18 and one-half of an integer above it.

In general, the limits of a score are considered to extend from one-half

of the smallest _ below the value of the score to

one-half _______________________________ [3.1].

When scores are grouped into class intervals, the limits of a class inter¬
val can be given as score limits or as exact limits. The score limits are
merely the lowest and the highest raw scores that fall into the class inter¬
val. The exact limits are the lower limit of the lowest score and the upper
limit of the highest score. (See Section 3.4, Principle 6.)

The difference between the upper exact limit and the lower exact limit is

the width of the class interval and is symbolized with the letter _____ [3.5].
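A minimal sketch of this convention in Python (whole-number scores assumed, so the unit of measurement is 1):

```python
def exact_limits(low_score, high_score, unit=1):
    """Exact limits of a score or class interval: one-half of the smallest
    unit of measurement below the lowest score and above the highest."""
    return (low_score - unit / 2, high_score + unit / 2)

print(exact_limits(18, 18))   # (17.5, 18.5) -- the single score "18"
low, high = exact_limits(13, 17)
print(high - low)             # 5.0 -- the width of the interval 13-17
```

The same function handles a single score (low and high the same) and a class interval (low and high its score limits), since a score is just a class interval one unit wide.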

The Cumulative Frequency Distribution

In a cumulative frequency distribution (such as Table 3.6), the important

numbers in the left-hand column are the upper exact limits of the class inter¬

vals. For each class interval, a cumulative frequency distribution shows how

many cases lie above / below the upper exact limit [3.8]. The number of

cases below a given upper exact limit is the cumulative frequency for that

limit, and it is symbolized __________ [3.8]. The cumulative frequencies are

entered starting at the top / bottom [3.8].

A cumulative frequency, like an uncumulated one, can be expressed relative

to the total number of cases, n. Again, the relative figure can be a propor¬

tion or a percentage. A cumulative frequency expressed as a proportion of the

total is called a proportional cumulative frequency and is symbolized _____

[Table 3.6]. To compute one, divide the raw cumulative frequency by _____ [3.8].

The proportional cumulative frequency for the upper exact limit of the topmost

interval will always equal 1.00 (as Table 3.6 shows). A cumulative fre¬

quency expressed as a percentage of the total number of cases is called a

cumulative percentage frequency and is symbolized __________ [Table 3.6]. It

is calculated by multiplying the corresponding proportional cumulative fre¬

quency by _____ [3.8]. The cumulative percentage frequency for the upper ex¬

act limit of the topmost class interval will always equal 100.
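The cumulative columns can be computed mechanically. Here is a Python sketch using hypothetical f values of 3, 3, 2, 1, and 3, listed with the highest interval first as in the tables of this chapter:

```python
freqs = [3, 3, 2, 1, 3]   # hypothetical f values, highest class interval first
n = sum(freqs)            # 12 cases in all

# Accumulate from the bottom interval up: the cum f at an interval's upper
# exact limit counts the scores in that interval plus all those below it.
cum_f = []
running = 0
for f in reversed(freqs):
    running += f
    cum_f.append(running)
cum_f.reverse()           # back to highest-interval-first order

prop_cum_f = [c / n for c in cum_f]
cum_pct_f = [100 * p for p in prop_cum_f]

print(cum_f)              # [12, 9, 6, 4, 3]
print(cum_pct_f[0])       # 100.0 -- the topmost cum %f is always 100
```

Because every score lies below the upper exact limit of the topmost interval, the topmost cum f is n, the topmost proportional cum f is 1.00, and the topmost cum %f is 100.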

[The summary continues after the following exercises.]

EXERCISES

Here's a table of the kind we've been reviewing. Never mind what the
scores mean; that's irrelevant for now. Just think about the internal logic
of the table. I have given you the top value in the cum f column and four
figures in the column of cum %f's. From these numbers you can determine all
the missing information. Just recall what you know about the topmost value
in the cum f column, and remember how you go about finding the cum %f's.

Give this problem a good, honest try. You'll feel really proud if you
figure it out for yourself. If you need help, though, it's available at the
back of the book.

Score Limits   Exact Limits   f   Prop.f   %f   cum f   Prop.cum f   cum %f

23 - 27        22.5 - 27.5   __   ____    __    12      ____         ____

18 - 22        17.5 - 22.5   __   ____    __    __      ____           75

13 - 17        12.5 - 17.5   __   ____    __    __      ____           50

 8 - 12         7.5 - 12.5   __   ____    __    __      ____           33

 3 - 7          2.5 -  7.5   __   ____    __    __      ____           25

If you think you can't figure the table out, look at the hint below and then
try again.

Hint: The topmost cum f value = n. If n = 12, then the cum f for the bottommost
interval is 25% of 12 = 3, and 3 is also the f value for the bottommost interval.

Here are two more problems of the same kind, to help you understand the
connections among the different numbers in tables like these.

Score Limits   Exact Limits   f   Prop. f   %f   cum f   Prop. cum f   cum %f

23 - 27        ____ - ____   __   ____     __    12      1.00          ____

18 - 22        ____ - ____   __   ____     __    __      ____           67

13 - 17        ____ - ____   __   ____     __    __      ____           50

 8 - 12        ____ - ____   __   ____     __    __      ____           33

 3 - 7         ____ - ____   __   ____     __    __      ____          ____
Score Limits   Exact Limits      f   Prop. f   %f   cum f   Prop. cum f   cum %f

496 - 505     495.5 - 505.5    __   ____     __    __      ____          ____

486 - 495     _____ - _____    __   ____     __    __       .67          ____

476 - 485     _____ - _____    __   ____     __    __      ____          ____

466 - 475     _____ - _____    __   ____     __    __       .33          ____

Here are some principles that describe the connections among the various
parts of tables like these. Note how they're illustrated in the tables above
(when the numbers are correctly filled in).

1. The sum of the f values = n (or N) = the top number in the cum f column.

2. The top number in the Prop. cum f column is 1.00, and the top number in
the cum %f column is 100.

3. All figures in the column of f values and cum f values must be whole
numbers (0, 1, 2, 3, and so on), because these figures are counts, counts
of the numbers of scores of various kinds.

4. The figures in the Prop. f and Prop, cum f columns, as proportions,


must be between 0 and 1.

5. The figures in the %f and cum %f columns, as percentages, must be


between 0 and 100.

6. A given figure in a cumulative column is the sum of two numbers: the


cumulative figure immediately below and the corresponding uncumulated value
in its row.
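These rules can be turned into a small checking routine; a Python sketch (the function and the example values, a filling-in consistent with n = 12, are ours):

```python
def check_table(f, cum_f, n):
    """Verify the consistency rules for a cumulative frequency table;
    both columns are listed with the highest interval first."""
    assert sum(f) == n == cum_f[0]                                  # rule 1
    assert all(isinstance(x, int) and x >= 0 for x in f + cum_f)    # rule 3
    for i in range(len(f) - 1):
        assert cum_f[i] == cum_f[i + 1] + f[i]                      # rule 6
    assert cum_f[-1] == f[-1]   # the bottom cum f is just that row's own f
    return True

# One consistent filling-in of a table like the exercises above (n = 12).
print(check_table(f=[3, 3, 2, 1, 3], cum_f=[12, 9, 6, 4, 3], n=12))  # True
```

If any rule is violated the routine raises an error instead of returning True, so it can be used to check a completed exercise table.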

MAP of the FREQUENCY and CUMULATIVE FREQUENCY DISTRIBUTIONS

A Collection of scores
is well described by-—» a Frequency Distribution
on a Quantitative Variable

lists all lists can be recast


possible scores into

Uncumulated Frequencies
with or without with which scores Cumulative
occurred Frequency

Grouping

can cause
collects scores
into

Grouping
Error
Class Intervals

should be
J Raw Numbers Relative Numbers

mutually exclusive

• continuous

in order from high down to low

•between 10 and 20 in number

Total Number of Cases

SUMMARY, Continued

Centile Points and Centile Ranks


The information in a table giving the cumulative percentage distribution
for a given set of scores is often presented in terms of what are called centile
points and centile ranks. (Sometimes the words percentile point and

percentile rank are used instead.) A centile point (sometimes called just a
centile) is a point along the scale of possible scores (which is assumed to
be a continuum), and it falls among the numbers shown on the left side of the
table. (It may be helpful to label the left-most column of Table 3.6 "Centile
Points.") A centile point is named by specifying the percentage of scores in
the distribution that fall below it. Thus the 96th centile point in a given
distribution is a certain point along the scale of scores, namely that point
that cuts off the bottom 96% of the distribution. The symbol for the 96th

centile point is _____ [3.9, Paragraph 2]. Because a centile point is a point

along the scale of scores, it may have any value that a score may have. The

96th centile point in Table 3.6 is _____ [3.9, Paragraph 2].

A centile rank, in contrast, is a percentage, a percentage of the total

number of cases in the distribution. Centile ranks fall among the cumulative

percentage frequencies shown on the right side of the cumulative distribution.

(It may be helpful to label the right-most column of Table 3.6 "Centile Ranks.")

As percentages, centile ranks may take values only between _____ and _____ [3.9].

The centile rank for the score 90.5 in Table 3.6 is 96.

[The summary concludes after the following exercises and examples.]

EXERCISES

To practice using the concepts of centile point arid centile rank, label
the left-most column of Table 3.6 "Centile Points" and the right-most column
"Centile Ranks," if you haven't done so already. Note again that the value
of 90.5 on the left goes with the cum %f of 96 on the right. In terms of
centile points and centile ranks, 90.5 has a centile rank of 96, and the 96th
centile point is 90.5. Now answer these questions about Table 3.6.

1. The centile rank of 93.5 is _____.

2. The centile rank of 69.5 is _____.

3. The 84th centile point is _____. 4. The 14th centile point is _____.

5. C28 is _____. 6. C52 is _____.

7. C50 falls between _____ and _____.

8. C99 falls between _____ and _____. 9. C90 falls between _____ and _____.

10. The centile rank of 77.0 falls between _____ and _____.

11. The centile rank of 82.0 falls between _____ and _____.

12. The centile rank of 92.0 falls between _____ and _____.

Statistics in Action

CENTILE POINTS and CENTILE RANKS of SPECIAL INTEREST

In the early 1960s, researchers working for the federal government ap¬
proached about 400 young men and 500 young women, asked them to take off their
shoes and step on a scale, and measured their height and weight. Let's focus
on just the heights for the men. The 400-odd heights were cast into a cumu¬
lative percentage frequency distribution like Table 3.6, and certain centile
points were found, namely the 1st, 5th, 10th, 20th, 30th, and so on up to the
90th, 95th, and 99th. Instead of publishing a table of the kind you're now
familiar with, though, the National Center for Health Statistics reported just
these centiles. They're shown on the left below.

SELECTED CENTILE POINTS from the DISTRIBUTIONS of


HEIGHTS and WEIGHTS of AMERICAN MEN and WOMEN
AGED 18-24 YEARS

Men's Height Women's Height Men's Weight* Women's Weight*


(Inches) (Inches) (Pounds) (Pounds)
Centile Centile Centile Centile Centile Centile Centile Centile
Point Rank Point Rank Point Rank Point Rank

74.8 99 69.3 99 231 99 218 99

73.1 95 67.9 95 214 95 170 95

72.4 90 66.8 90 193 90 157 90

70.9 80 65.9 80 180 80 145 80

70.1 70 65.0 70 171 70 137 70

69.3 60 64.5 60 164 60 131 60

68.6 50 63.9 50 157 50 126 50

67.9 40 63.0 40 151 40 122 40

67.1 30 62.3 30 145 30 117 30

66.5 20 61.6 20 140 20 111 20

65.4 10 60.7 10 131 10 104 10

64.3 5 60.0 5 124 5 99 5

62.6 1 58.4 1 115 1 91 1

*Weight includes some clothing. Nude weight about 2 lb. less.

To practice using the concepts of interest now, ask yourself what the
50th centile point (C50) for the men's heights is. A centile point is a
point along the scale of scores, remember, so here it will be a height.
The 50th centile point is that score that has 50% of the scores below it.
The table tells you that it's 68.6", so we now know that half the men in
this sample were under 68.6" in height.

A man who stands six feet tall (72.0") has what centile rank in compar¬
ison to these other fellows? Asking for a centile rank is asking for a
percentage, so the answer must be between 0 and 100. The table shows that
the centile rank is somewhat less than 90. For a height of 72.4, which is
the closest we can get to 72.0 using the entries in the table, the centile
rank is exactly 90, and a shorter height would have a smaller centile rank,
of course. So a six-footer is taller than almost 90% of the men in the
sample.
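When the rank you want falls between two tabled centiles, straight-line interpolation between the neighboring entries gives a rough estimate. A Python sketch using the men's-height column of the table above (the function is our own, not part of the published report):

```python
# Selected centile points for men's heights, copied from the table above.
mens_height = [(62.6, 1), (64.3, 5), (65.4, 10), (66.5, 20), (67.1, 30),
               (67.9, 40), (68.6, 50), (69.3, 60), (70.1, 70), (70.9, 80),
               (72.4, 90), (73.1, 95), (74.8, 99)]

def approx_rank(height, table):
    """Rough centile rank by straight-line interpolation between tabled points."""
    for (h1, r1), (h2, r2) in zip(table, table[1:]):
        if h1 <= height <= h2:
            return r1 + (height - h1) / (h2 - h1) * (r2 - r1)
    return None  # outside the tabled range

print(round(approx_rank(72.0, mens_height), 1))  # 87.3 -- "somewhat less than 90"
```

The estimate assumes heights are spread evenly between neighboring tabled points, the same kind of assumption the chapter makes for scores within a class interval.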

Now do these problems on your own:

*****

_____ 1. How tall do you have to be, at minimum, to exceed the height
of the shortest 20% of the men in this sample?

_____ 2. Is the answer to Question 1 a centile point or a centile rank?

_____ 3. Suppose you're six-one. What percentage of this sample was
taller than you?

_____ 4. Is the answer to Question 3 a centile point or a centile rank?

_____ 5. C60 = ?

_____ 6. Is C60 a centile point or a centile rank?

_____ 7. The middle 20% of the scores in any distribution run from C40
to C60. Half of the 20%, namely 10%, lie between C40 and C50,
which is the score in the very middle. The other half of the
20% lie between C50 and C60. In this distribution, the middle
20% of the scores lie between which two heights?

_____ 8. The middle 90% of the scores lie between which two values?

_____ 9. How many men in this sample were between 65.4" and 66.5" tall?

_____ 10. How many men were between 70.1" and 73.1" tall?

Enough on male heights. The National Center for Health Statistics treated
the heights of the women, the weights of the men, and the weights of the women
in the way they treated the data you just examined in detail. The other
three distributions are tabled above, and the set of four will enable you to
compare your height and weight with the measurements for people of your own
sex (and, for most students, for people of their own age). The samples were
large (for men, n was about 411; for women, n was about 534—the researchers
did not supply the exact figures), and they were drawn to be representative
of the noninstitutionalized population of 18- to 24-year-olds in the lower
48 states. Thus you can be quite confident that the centile ranks shown in
these tables are close to the figures for the full populations.


Students often ask at this point about the correlation between height and
weight. Common observation tells us that taller people tend to be heavier,
so there is in fact a correlation between height and weight for human beings.
These four tables do not, however, provide any evidence on the correlation
for either sex. So far as we can tell from the tables, the 50% of the men
who are shorter than 68.6" could be the same 50% who are heavier than 157 lb.
Ch. 9 will show you what kind of table provides evidence for the existence of
a correlation between one variable and another.

[The tables above were derived from Weight, Height, and Selected Body
Dimensions of Adults, National Center for Health Statistics Series 11, No.
8 (Washington: U.S. Government Printing Office, 1965).]

SUMMARY, Concluded

Look at Table 3.6 again. You should have labeled the left-most column

"Centile Points" and the right-most column "Centile Ranks." Sometimes we

find ourselves wondering about a centile point whose centile rank is not one

of the cumulative percentage frequencies shown on the right side of such a

table. (In Table 3.6, for example, we might wonder where the 50th centile

point falls.) Such a centile point will not be one of the upper exact limits

shown on the left side of the table, and the table does not directly give

the value of such a centile point. It can be estimated, though, using a

procedure called linear ____________________ [3.10, footnote]. This pro¬

cedure rests on the assumption that the scores in a given class interval are

____________________ throughout the interval [3.10, Paragraph 2]. (The next

section of the workbook offers help in understanding how to compute a centile

point in this way.)

Similarly, we sometimes find ourselves wondering about the centile rank

of a score that is not one of the upper exact limits shown on the left side

of a cumulative percentage frequency distribution like Table 3.6. The cen¬

tile rank for such a score will not be found on the right in the column of

cumulative percentage frequencies. (In Table 3.6, for example, we might

wonder what the centile rank for a score of 75 is.) It can also be estimated,

though, again using the procedure of linear interpolation. The assumption

about how the scores are divided within a given class interval is the same

as / different from the assumption made for computation of a centile point [3.11].
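A sketch of that estimate as a short Python function; the grouped distribution is hypothetical, and the function builds in the stated assumption that scores are spread evenly within each interval:

```python
def centile_rank(score, intervals):
    """Estimate the centile rank of `score` by linear interpolation.
    `intervals` holds (lower_exact_limit, upper_exact_limit, f) triples,
    bottom interval first; scores are assumed to be distributed evenly
    within each interval."""
    n = sum(f for _, _, f in intervals)
    cum = 0
    for low, high, f in intervals:
        if score <= high:
            fraction = (score - low) / (high - low)  # how far up into the interval
            return 100 * (cum + fraction * f) / n
        cum += f
    return 100.0

data = [(2.5, 7.5, 3), (7.5, 12.5, 1), (12.5, 17.5, 2),
        (17.5, 22.5, 3), (22.5, 27.5, 3)]
print(round(centile_rank(10.0, data), 1))  # 29.2
```

The score 10.0 sits halfway up the 7.5-12.5 interval, so it is credited with the 3 scores below that interval plus half of the interval's single score: 3.5 of 12 cases, a rank of about 29.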



SPECIAL HELP with the COMPUTATION of CENTILE POINTS

Here's a detailed version of the procedure outlined on p. 38 of the text.


The six steps listed here correspond to those in the text. The symbol C_X is
used to designate the centile point with a centile rank of X.

1. Find the class interval in which the desired centile point falls. To
do so, determine the number of cases that constitute X% of the whole. This
number will be the cum f for C_X, and it is given by the formula

    cum f for C_X = (X/100)(total number of cases)

The desired class interval is the one such that

    cum f for lower exact limit < cum f for C_X < cum f for upper exact limit

For example, to find C_50 for the data in Table 3.7 of the text, find cum f
for C_50, which is (50/100)(80) = 40. The desired class interval, we can now
tell, has the score limits 73 and 75 and the exact limits 72.5 and 75.5 (the
exact limits are not shown in the table), because

    cum f for 72.5 (namely 32) < cum f for C_50 (namely 40) < cum f for 75.5 (namely 44)

2. Note again what the cum f for the lower exact limit is. In the example,
it's 32.

3. Determine the number of additional scores that together with this cum f
will equal the cum f for C_X. The formula is simple:

    # of additional scores = cum f for C_X - cum f for lower exact limit

In the example, the number of additional scores needed to equal 40 is 40 - 32 = 8.

4. Note the f (not the cum f) for the interval and assume that this number
of scores is distributed evenly throughout the interval. In the example, f =
12, and we thus assume that the bottom score in the interval is in the bottom
twelfth of the interval, the next-to-the-bottom score is in the next-to-the-
bottom twelfth, and so on.

5. Find the distance up into the interval that (on the stated assumption)
is occupied by the additional scores needed to equal the cum f for C_X.

    a. This distance will be a certain fraction of the width of the interval,
and the fraction is given by the formula

    fraction of interval = (# of additional scores needed to equal the cum f for C_X) / (f for the interval)
Frequency Distributions 31

In the example, the fraction is 8/12; the 8 is from Step 3 and the 12 from
Step 4. On the assumption that the 12 scores are distributed evenly through¬
out the interval, the bottom 8 are in the bottom 8/12 of the interval.

b. To find the desired distance into the interval, the distance occupied
by those additional scores needed to equal the cum f for C_X, multiply the frac¬
tion of the interval just computed by the width of the interval. The width is
simply:

width of interval = upper exact limit - lower exact limit

In the example, the width is 75.5 - 72.5 = 3.0 units. Multiplying the fraction
by the width, we have 8/12 times 3.0 units = 2.0 units, and we have thus deter¬
mined that on our assumption, the bottom 8/12 of the interval is the bottom 2
units.

    c. The general formula for the desired distance is:

    desired distance up into interval = (# of additional scores needed to equal cum f for C_X / f for the interval) × (width of interval)

6. Add the distance found in the preceding step to the lower exact limit
of the interval in which you are working. This addition determines the point
along the scale of scores that cuts off a) those scores lying below the lower
exact limit plus b) the additional scores within the interval needed to equal
the cum f for C_X. This point is C_X, the point along the scale of scores below
which X% of the cases fall. In the example, we have 72.5 + 2.0 = 74.5 = the
50th centile point.

Check to be sure your answer is within the interval that you located in
Step 1.

In general,

    C_X = lower exact limit of interval + a certain distance up into the interval (as found in Step 5c)

or

    C_X = lower exact limit of interval + a certain fraction times the width of the interval

or

    C_X = lower exact limit of interval + (# of additional scores needed to equal cum f for C_X / f for the interval) × (width of interval)

In the last formula, the # of additional scores is given by the equation in
Step 3, and the width is given by the equation in Step 5b.
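The six steps can be collected into one short Python function. Only the key figures of the worked example (the interval 72.5-75.5 with f = 12, the cum f of 32 below it, and n = 80) come from the text; the surrounding intervals are hypothetical filler chosen to reproduce those figures:

```python
def centile_point(x, intervals):
    """C_X by linear interpolation.  `intervals` holds
    (lower_exact_limit, upper_exact_limit, f) triples, bottom interval first."""
    n = sum(f for _, _, f in intervals)
    target = (x / 100) * n               # Step 1: the cum f for C_X
    cum = 0
    for low, high, f in intervals:
        if f > 0 and cum + f >= target:  # C_X falls in this interval
            extra = target - cum         # Step 3: additional scores needed
            width = high - low           # Step 5b: width of the interval
            return low + extra * width / f   # Steps 5-6: fraction of width
        cum += f
    return intervals[-1][1]

# Only 72.5-75.5 (f = 12), the cum f of 32 below it, and n = 80 come from
# the worked example; the other intervals are hypothetical filler.
data = [(66.5, 69.5, 14), (69.5, 72.5, 18), (72.5, 75.5, 12), (75.5, 78.5, 36)]
print(centile_point(50, data))  # 74.5 -- the 50th centile point of the example
```

The target cum f is 40, the loop stops at the interval whose cumulative count first reaches 40, and 8 of that interval's 12 scores are needed, so the answer lies 8/12 of the 3-unit width above 72.5, just as in the worked example.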

The examples offered in Section 3.10 and the problems and exercises for Ch.
3 provide material for practicing the procedure spelled out above. Remember
that this workbook does not attempt to replace those problems and exercises.
CHAPTER 4

GRAPHIC REPRESENTATION

The purpose of the blank lines below is explained on the title pages for
Chapters 1 and 2 of this workbook.

4.1 Introduction

4.2 The Histogram

4.3 The Bar Diagram

4.4 The Frequency Polygon

4.5 The Cumulative Percentage Curve

4.6 Graphic Solution for Centiles and Centile Ranks

4.7 Comparison of Different Distributions

4.8 Histogram versus Frequency Polygon

4.9 The Underlying Distribution and Sampling Variation

4.10 The Mythical Graph

4.11 Possible Shapes of Frequency Distributions

PROBLEMS and EXERCISES

1  2  3
4  5  6
7  8  9
10 11 12
13 14 15
16 17


SUMMARY

The Histogram

Look at Table 4.1 in the text. This table presents a grouped _

distribution for a collection of scores on a certain kind of intelligence test

[4.2]. All the information in this table can also be presented in a graph

called a _, which is shown in Figure 4.1.

A graph like this has two axes. The horizontal one is also called the

abscissa / ordinate or the X/Y axis, and the vertical one is called the

abscissa / ordinate or the X/Y axis [4.2]. It is customary to represent

_ along the horizontal axis, and

_ along the vertical axis [4.2].

The histogram consists of a series of rectangles, one for each class


interval. The width of a rectangle indicates the width of the corresponding
class interval. The left edge of the rectangle rises from the point along the
horizontal axis that represents the lower exact limit of the class interval,
and the right edge of the rectangle rises from the point that represents the
upper exact limit. The height of a rectangle in Figure 4.1 indicates the raw
frequency of the scores in the corresponding class interval.

It is possible for the height of the rectangles in a histogram to indicate


not raw frequency but relative frequency of scores; it is necessary only to
relabel the vertical axis. For the sample of scores shown in Table 4.1 and
Figure 4.1, n = 100. Thus the raw frequency of 20 for the interval from 99.5
to 109.5 is a relative frequency of 20/100, which is .20 or 20%. In order to
show percentage frequency, the vertical axis of Figure 4.1 would only have to
say 20% where it now says 20, and similarly for the other numbers on the axis.
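In code, the relabeling is just a division by n; a sketch (the seven f values are hypothetical, except that they sum to the n of 100 and include the f of 20 discussed above):

```python
raw_heights = [5, 12, 20, 25, 20, 12, 6]   # hypothetical f values; n = 100
n = sum(raw_heights)

# Same bars, but heights read as percentage frequencies instead of raw f's.
pct_heights = [100 * f / n for f in raw_heights]
print(pct_heights[2])   # 20.0 -- a raw frequency of 20 among 100 cases is 20%
```

Because every bar is divided by the same n, the shape of the histogram is unchanged; only the labels on the vertical axis differ.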

The Bar Diagram

The bar diagram is used to graph categorical / quantitative data, and it

is similar to the _, except that space is inserted between the

_ [4.3].

The Frequency Polygon

Data that can be graphed as a histogram can also be represented by a

frequency polygon. In this type of graph, a point is plotted above the _

_ of each class interval at a height representing the _

of scores in that interval [4.4]. These points are then connected with

straight / curved lines [4.4]. (The histogram in Figure 4.1 can be easily
turned into a frequency polygon by making a dot in the middle of the top of
each rectangle and then connecting the dots.)
Graphic Representation 35

If nothing further is done, the zig-zaggy line will not touch the horizontal

axis at either end, but it is conventional practice to bring it down to the axis

on both sides. To do so, identify the two ___________ falling immediately

outside those end class intervals containing scores. The _________

of these intervals are plotted at __ frequency, and these two points are

connected to the graph [4.4].

As with the histogram, the vertical axis of a frequency polygon can show

either raw frequency or frequency, depending on how it is labeled

[4.4] .

Choosing Between a Histogram and a Frequency Polygon

The information in any frequency distribution for a quantitative variable

can be graphed as either a histogram or a frequency polygon. Which is better?

In general, the _ is better [4.8, Paragraph 1]. It is

especially valuable when two or more distributions are to be compared. The

general public, however, seems to find it a little easier to understand the

[4.8, Paragraph 2]. And the area in the bars of a _

is directly representative of relative _______ [4.8]. One hundred percent

of the area in the bars represents 100 percent of the scores in the distribution.

The percentage of the total area that falls in any given rectangle

represents the same percentage of the scores. In Figure 4.1, for example, the

bar over the interval from 99.5 to 109.5 has 20% of the total area in all the

bars, and it thus indicates that 20% of the scores fall in this interval. This

relationship between percentage of area and percentage of scores is only

approximately / exactly true in the frequency polygon [4.8].

Shapes of Frequency Distributions

Every collection of scores on a quantitative variable has a fundamental

characteristic called its shape. The shape is determined most easily by look¬

ing at the literal shape of the histogram or frequency polygon graphing the

distribution of scores. Some commonly occurring shapes appear in Figure 4.11.

The bimodal, normal, and rectangular distributions are all symmetrical / asymmetrical,

while the J-shaped and skewed distributions are symmetrical / asymmetrical [4.11].

The Graph of a Distribution?

There is no such thing as the graph of a given set of data. The same set

of raw scores may be grouped in different ways / only one way [4.10]. And a

graph can be squat or slender depending on the relative scale of the two axes.

Sampling Variation

Frequency distributions resulting from a very large number of scores often

exhibit a pronounced regularity / irregularity of shape [4.9]. But when a

sample is drawn from such a population, the shape of the sample is likely to

be more irregular. In general, the fewer the cases, the greater / less the

irregularity of the shape of the sample [4.9]. This is an illustration of the

principle that a larger sample will usually resemble the population more closely

than a smaller sample will.
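You can see this principle for yourself with a small simulation. The sketch below (Python; the population, sample sizes, and bins are arbitrary choices, not from the text) tallies crude histogram counts for a small and a large sample drawn from a normal population; the small sample's counts are typically much bumpier.

```python
import random

random.seed(1)  # fixed seed so the demonstration is repeatable

def histogram_counts(scores, bins):
    """Count how many scores fall in each [low, high) class interval."""
    return [sum(low <= s < high for s in scores) for (low, high) in bins]

bins = [(b, b + 1) for b in range(-3, 3)]            # six unit-wide intervals
small = [random.gauss(0, 1) for _ in range(20)]      # a small sample
large = [random.gauss(0, 1) for _ in range(2000)]    # a large sample

small_counts = histogram_counts(small, bins)
large_counts = histogram_counts(large, bins)
```

The large sample's counts rise smoothly toward the middle intervals and fall away at the extremes, as the normal population does; the small sample's shape is far less regular.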

The Cumulative Percentage Curve

Look at Table 4.2 on p. 49. The right-hand column shows the

percentage frequency distribution for a collection of 80 scores [4.5]. Such a

distribution can also be presented in a graph like Figure 4.4, which is called

a cumulative percentage frequency _, or __ [Figure 4.4, caption].

As with the histogram and the frequency polygon, the horizontal axis shows the

various possible scores, but now the vertical axis shows cumulated frequencies

(raw or relative) rather than uncumulated ones.

In a table like 4.2, remember that a given cumulative percentage (say 40.0

on the cum %f scale) corresponds to the upper exact limit of the class interval

across from it (in this case, to the upper exact limit of the interval 70 - 72,

which is 72.5). Therefore in constructing cumulative percentage curves, a

given cumulative frequency is plotted at the midpoint / upper exact limit of

the corresponding class interval [4.5].

A cumulative percentage curve can be used to find the centile rank for a

given score, or the centile point for a given centile rank. Such graphic determination

of centile points or centile ranks will / will not yield the same

result as that given by the computational procedures outlined in the previous

chapter [4.6]. Connecting the points on the cumulative curve with straight

lines is the graphic equivalent of ______________________ [4.6].
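Reading a value off the straight-line segments of a cumulative curve amounts to linear interpolation between the plotted points. A sketch (Python; the upper exact limits and cumulative percentages below are hypothetical, not those of Table 4.2):

```python
def centile_rank(score, upper_limits, cum_pcts):
    """Linearly interpolate the centile rank of `score` between the two
    surrounding (upper exact limit, cumulative %) points on the curve."""
    points = list(zip(upper_limits, cum_pcts))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= score <= x1:
            return y0 + (score - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("score lies outside the plotted curve")

limits = [69.5, 72.5, 75.5]   # upper exact limits of successive intervals
cum = [25.0, 40.0, 70.0]      # cumulative percentages plotted at those limits
```

A score of 72.5 (an upper exact limit) reads off directly as a centile rank of 40; a score of 71.0, halfway through its interval, interpolates to 32.5.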



EXERCISES

To develop your understanding of the concept of the shape of a distribution, try guessing the shape of each distribution described below. Name the shape, choosing from the terms introduced in Section 4.11 and used in the figure of the same number. If you guess that a distribution is skewed or J-shaped, specify whether the long "tail" is on the left or the right.

1. Consider the millions of Americans who earned money


in the form of wages or a salary last year. What is the shape of the distribution of their incomes?

2. Suppose that 523 college seniors take a 50-item arithmetic test intended for sixth graders. What shape will the distribution of the seniors' scores have?

3. In beautiful New Jersey, a person must be at least


17 years old to possess a driver's license. Consider the population of New
Jerseyites who currently hold a license; some are just 17, but most are older.
If we determine for each such person the age at which she or he first acquired
a license, what will be the shape of the distribution of ages?

If you're having trouble with these, try this strategy: Visualize the
table listing the frequency distribution. In its simplest form, it will look
like Table 3.2 or 3.3 on p. 29 of the text, with just two columns. Think what
the numbers in the left-hand column would be. Then guess the pattern in the
frequencies in the right-hand column. Where, for example, would the large
frequency counts fall—at the top, in the middle, or at the bottom of the column? Where would the small counts fall? Finally, visualize the translation
of the frequency distribution into a histogram or a frequency polygon, and
choose the appropriate word for describing its shape.

4. Suppose we assemble all the college professors in


North America who have one and only one child in grade school, and each professor brings his or her child. We time each of them, parents and children alike, while they read the article on Japan's Statistics Day reprinted earlier in this workbook, and for each we determine the reading rate in words per minute. What shape will the distribution of reading rates take?

5. What is the shape of the distribution of the weights


of all students enrolled in statistics courses around the world this semester?

6. What is the shape of the distribution of these students' heights?
7. Suppose that 523 sixth-graders take a 50-item spelling
test intended for college seniors. What shape will the distribution of the
sixth-graders' scores have?

8. Two hundred basketball players are assembled, 100


chosen at random from the population of professionals and 100 chosen at random
from the population of high-school teams. What shape will the distribution of

200 heights take?



Statistics in Action

GAS SHORTAGE ISN'T AFFECTING SPEED

Madison, Wis., Aug. 1, 1973--Gasoline shortages notwithstanding, motorists


are still traveling about as fast as ever, according to the 1973 biennial traffic speed study conducted by the State Department of Transportation.

The study found that average speed records for all types of vehicles on
two-lane rural roads, both night and day, have actually increased over the last
study in 1971. The only indication that people may be heeding pleas to slow
down and conserve gasoline are slightly slower speeds for passenger cars on
interstate highways.

The biennial speed studies were conducted at 25 points on the state trunk
system in May, 1973, to establish average speeds for various types of vehicles,
and to determine trends as well.

Average speeds are somewhat misleading. A more accurate reflection of how


fast traffic moves is the 85th percentile speed, which has long been accepted
by traffic engineers as representative of "reasonable" speeds. The 85th percentile speed is the speed below which 85 percent of vehicles move.

Average and 85th percentile speeds are as follows, with comparative 1971
figures in parentheses:

Interstate, daytime (speed limit 70): Wisconsin passenger cars, 85th percentile speed 74.1 (74.3), average speed 67.9 (68.4). Out-of-state passenger
cars, 85th percentile speed 74.9 (75.0), average speed 69.3 (69.6).

Interstate, nighttime (speed limit 60): All passenger cars, 85th percentile
speed 68.7 (70.7), average speed 62.8 (64.2).

Rural, non-interstate, daytime (speed limit 65): Wisconsin passenger cars,


85th percentile speed 68.1 (66.5), average speed 59.7 (58.5). Out-of-state
passenger cars, 85th percentile speed 69.5 (68.7), average speed 61.1 (61.0).

Rural, non-interstate, nighttime (speed limit 55): All passenger cars,


85th percentile speed 62.3 (61.5), average speed 55.1 (54.7).

Traffic speeds have been steadily increasing since studies began in 1938-1939, except for the 1942-45 war years, when there was a uniform 35 mph limit.

Average speed in 1938 was 49.6 with the 85th percentile speed at 61.5.
There was no speed limit.

The lowest speeds recorded were in 1942, when average speed was 37.1 mph
and the 85th percentile speed was 42.9 mph. In 1950 the average speed was
50.9 mph and the 85th percentile speed was 59.9.

[Excerpted by permission from The Capital Times of August 7, 1973, p. 20.]


The average speed and the 85th percentile speed are used here to summarize
the location of an entire distribution. The next chapter discusses this matter
in detail and notes that there are several kinds of average. (The article does
not state which one the traffic engineers used.)
CHAPTER 5

CENTRAL TENDENCY

5.1 Introduction

5.2 Some Statistical Symbolism

5.3 The Mode

5.4 The Median

5.5 The Arithmetic Mean

5.6 Effect of Score Transformations

5.7 Properties of the Mean

5.8 Properties of the Median

5.9 Properties of the Mode

5.10 Symmetry and Otherwise

5.11 The Mean of Combined Subgroups

5.12 Properties of the Measures of


Central Tendency: Summary

PROBLEMS and EXERCISES

1 2 3

4 5 6

7 8 9

10 11 12

13 14 15

16 17 18

19 20 21

22 23

SUMMARY

Look at the distribution of scores shown in Table 5.1 on p. 66 of the text.


You can see at a glance that the scores range from the 40s up to the 90s. The
column of frequency counts indicates, however, that most of the scores fall not
at either of the extremes, but in the middle of the distribution, in the 60s
and 70s. This is a typical pattern for a collection of scores on a quantitative
variable, whether the scores are a sample or a population: the scores tend to
cluster around some central location. A measure of central tendency is a number that points to this central location, telling you about how large the scores
in general run.

There are three measures of central tendency in common use: the _,

the _______, and the arithmetic _______ [5.1]. Any one of these can properly

be called the average of the scores; the term average is thus vague.

The Mode

If the scores in the collection have been left ungrouped, the mode is the

score that occurs with the greatest _ [5.3]. In grouped data, the

mode is taken as the _ of the class interval that contains the

greatest / smallest number of scores [5.3]. The symbol for the mode is _ [5.3].

The Median

If there are only a few scores in a collection (in which case they would

naturally be left ungrouped), the median is usually defined informally. The

scores are arranged from high down to low, and if there is an odd number, the

median is defined as the score in the middle. When there is an even number of

scores, there is no one middle score, and the median is taken as the point ___way

between the two scores that bracket the middle position [5.4].

The formal definition of the median is C50, the 50th centile point, which

is the point along the scale of scores below which _% of the scores fall [5.4].

For a large number of scores that have been grouped, the median can be calcu¬

lated like any other centile point, using interpolation.

The symbol for the median is _ [5.4].

The Mean

The mean is the result of summing all the scores and then dividing the sum

by the _______ of scores [5.5]. Strictly speaking, this procedure defines

the arithmetic mean (there are other measures of central tendency also called

means). The symbol for the mean of a sample of scores collectively called x

is , and the symbol for the mean of a population is _ [5.5]. The latter

symbol is the lower-case Greek letter _____ [5.5].

When the scores in a set are to be summed, as in finding the mean, the

capital Greek letter _____, Σ, indicates that this operation is to be performed

[5.2]. ΣX should be read "_____ of X" [5.2]. Using this symbol,

the mean of a population called X is defined as follows:

μ = _____ [Formula 5.1a]

The mean of a sample called X is defined as:

X̄ = _____ [Formula 5.1b]

(Be sure you used N and n correctly in these formulas.)
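The defining formula is the same arithmetic whether the collection is a population (divide by N) or a sample (divide by n): sum the scores, then divide by how many there are. A one-line sketch in Python:

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by their number."""
    return sum(scores) / len(scores)
```

For example, mean([2, 4, 6]) is 12/3 = 4.0.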

Properties of the Mean

Many important properties of the mean can be understood with the aid of a
physical analogy. Picture the scores in a distribution as weights arranged
along a weightless plank, as in Figure 5.1. The plank will balance (become
level) at a certain point, and that point is the mean of the scores.

Here is a basic property of the mean (not noted in the text) that you can
easily understand by thinking of the mean as the balance point of the distribution: The mean always falls somewhere between the lowest and the highest score
in the distribution (unless all scores have the same value). In the physical
analogy, the distribution always balances at a point somewhere between the left-most and the right-most weights.

You can also understand how the mean is sensitive to the exact location of
each score. The balance point is sensitive to the exact location of each weight,
and if a given weight is moved (corresponding to a change in value), the balance point will change. Note especially that an extremely low score pulls the mean way down, just as a weight well to the left of the others pulls the balance point way over to the left. Similarly, an extremely high score pulls the
mean way up, just as a weight well to the right of the others pulls the balance
point way over to the right.

Suppose you subtract the mean from each score in the distribution. The
result for a given score is called the score's deviation from the mean. A
score below the mean will have a negative deviation, and a score above the mean
will have a positive deviation. If you compute the mean and the deviations
correctly, the sum of the deviations, taking into account the fact that some
are negative and some positive, will always be zero. This is the numerical way
of saying that the mean is the balance point of the scores, and it is illustrated in Figure 5.1.
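You can verify the zero-sum property with any set of scores. The sketch below (Python) uses the six scores from the worked table in the special-help section of Chapter 6 of this workbook, whose mean is 18.0:

```python
def deviations(scores):
    """Deviation scores: each raw score minus the mean of the set."""
    m = sum(scores) / len(scores)
    return [x - m for x in scores]

devs = deviations([23, 22, 19, 17, 16, 11])  # mean is 108/6 = 18.0
total = sum(devs)                            # positives and negatives cancel
```

The deviations come out +5, +4, +1, -1, -2, and -7, and their sum is 0: the mean balances the distribution.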

Suppose you add a certain number (say 10) to each score in a distribution.

What will happen to the mean? In the physical analogy, this is like sliding

each weight to the right by a certain amount (10 units in the example). The

balance point will obviously move right along with the weights (to a point 10

units to the right in the example). In terms of numbers, if some constant

amount is added to each score in a distribution, the entire distribution is

shifted up / down by that amount, and the mean will be _______________

[5.6]. Similarly, if a constant is subtracted from each score,

the mean will be ___ [5.6]. Other measures of

central tendency discussed in this chapter are / are not affected in the same

way [5.6].

Unfortunately, the analogy between the mean and the balance point does not

help you understand what happens when you multiply or divide every score in a

distribution by a constant (because it is hard to picture what happens to the

weights as a result of the multiplication or the division). Multiplying each

score by a constant _______________ the mean by the amount of the constant,

and dividing each score by a constant has the effect of_the mean

by that amount [5.6].
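Both effects are easy to confirm numerically. A sketch (Python; the scores are made up for illustration):

```python
def mean(scores):
    return sum(scores) / len(scores)

scores = [4, 6, 10, 20]              # hypothetical scores; mean = 10.0
shifted = [x + 10 for x in scores]   # add a constant: mean rises by 10
scaled = [x * 3 for x in scores]     # multiply by a constant: mean is tripled
```

The shifted scores have a mean of 20.0 and the scaled scores a mean of 30.0, just as Section 5.6 says.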

Properties of the Median

The median responds to how many scores lie below (or above) it, and also

to / but not to how far away the scores may be [5.8]. Thus the median is

more / less sensitive than the mean to the presence of a few extreme scores

[5.8]. This fact means that in distributions that are strongly asymmetrical

(or skewed), the median / mean may be the better choice if it is desired to

represent the bulk of the scores and not give undue weight to the relatively

few deviant ones [5.8].
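A quick demonstration of this resistance (Python; statistics.mean and statistics.median are standard-library functions, and the scores are hypothetical):

```python
from statistics import mean, median

scores = [10, 12, 14, 16, 18]          # a tidy symmetric set; median 14
with_outlier = [10, 12, 14, 16, 180]   # one wildly extreme high score

# The median ignores how far the extreme score lies from the rest;
# the mean is dragged upward by it.
```

Replacing 18 with 180 leaves the median at 14 but pulls the mean far above it.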

Properties of the Mode

For a given distribution, there is only one mean, just as there is only one

point where the weights representing the scores would balance. Likewise, there

is only one median, because there is only one point that divides the top half

of the scores from the bottom half. There may / may not be more than one

mode [5.9]. The mode is the only measure that can be used for data that have

the character of a(n) nominal / ordinal / interval / ratio scale [5.9].



Sampling Fluctuation and Inferential Statistics

Suppose there is a population of scores, and you draw a random sample of a


certain size from it. The sample will turn out to have a particular mean, a
particular median, and a particular mode (maybe more than one mode). Now draw
another random sample of the same size. The mean, median, and mode(s) of the
second sample are likely to differ from their counterparts in the first sample.
Draw a third sample of the same size, and continue drawing samples of this size.
The means will fluctuate from sample to sample, and so will the medians, and so
will the modes.

The _______s will vary least among themselves [5.7, final paragraph], and

the _______ stands second in ability to resist the influence of sampling

fluctuation [5.8]. This state of affairs makes the mean the most useful measure

of central tendency in inferential statistics—the most useful, that is, in

techniques for drawing inferences about a population from a sample. Further¬

more, the inferential techniques require various calculations based on a measure

of central tendency, and of the three measures, the _ is amenable to arithmetic

and algebraic manipulation in a way that the other measures are not [5.7].
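A simulation can make the first claim concrete. The sketch below (Python; the population, sample size, number of samples, and seed are all arbitrary choices, not from the text) draws many same-size samples from a roughly normal population and compares how much the sample means and sample medians fluctuate:

```python
import random
from statistics import mean, median, pstdev

random.seed(42)  # fixed seed so the run is repeatable
population = [random.gauss(100, 15) for _ in range(10_000)]

means, medians = [], []
for _ in range(200):                       # draw 200 samples of size 50
    sample = random.sample(population, 50)
    means.append(mean(sample))
    medians.append(median(sample))

spread_of_means = pstdev(means)      # how much the means fluctuate
spread_of_medians = pstdev(medians)  # how much the medians fluctuate
```

For a normal-shaped population the means fluctuate noticeably less than the medians, which is one reason the mean is preferred in inferential work.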

Relative Locations of the Three Measures of Central Tendency

In distributions that are perfectly symmetrical, mean, median, and (if the

distribution is unimodal) mode will yield the same / different values [5.10].

If the mean and median have different values, the distribution cannot be / may

or may not be symmetrical [5.10]. The more skewed, or lopsided, the distribu¬

tion is, the greater / lesser the discrepancy between these two measures [5.10].

In a smooth negatively skewed distribution, the _______ has the highest

score value [5.10]. The _ has been specially affected by the fewer but

relatively extreme scores in the tail, and thus has the lowest value [5.10].

In a positively skewed distribution, the same / opposite situation obtains


[5.10].

EXERCISES

Which measure of central tendency is:

_ 1. The only one suitable for qualitative (nominal or categorical) data?

_ 2. Impossible to compute if the distribution is open-ended?



_ 3. Sensitive to the exact value of each score?

_ 4. The most resistant to sampling fluctuation?

_____ 5. Responsive only to the number of scores above it or below it, not
to their exact locations?

_ 6. Most widely used in advanced statistical procedures?

_ 7. The least resistant to sampling fluctuation?

__ 8. Least useful for advanced statistical procedures?

_ 9. The balance point of the distribution?

_ 10. The point about which the sum of negative deviations equals the sum
of positive deviations?

_ 11. Lowest in value when the distribution is positively skewed?

_ 12. Sometimes not a unique point in the distribution?

_ 13. Lowest in value when the distribution is negatively skewed?

_ 14. Most sensitive to extremely low or extremely high scores?

_ 15. The point along the scale of scores that divides the upper half of
the scores from the lower half?

__ 16. Highest in value when the distribution is negatively skewed?

_ 17. The measure that best reflects the total value of the scores?

_ 18. Highest in value when the distribution is positively skewed?

SYMBOLISM DRILL
Yes, this is a drill such as you did many of back when you learned your
alphabet and multiplication table. And like those it'll be good for you. Fill
in the blanks. Answers appear on p. 231.

Symbol    Pronunciation    Meaning

1. _____  "little en"      Number of scores in a _________
2. _____  "big en"         Number of scores in a _________
3. _____  "eks"            A score, or the set of scores, collectively
4. Σ      ___________      Result of summing quantities of some kind
5. X̄      ___________      ΣX/n; the mean of a _________
6. μ      ___________      ΣX/N; the mean of a _________
7. Mdn    ___________      C50 (may also be defined informally)
8. Mo     ___________      Score or midpoint of class interval with largest f
CHAPTER 6
VARIABILITY

__ 6.1 Introduction

_ 6.2 The Range

_ 6.3 The Semiinterquartile Range

6.4 Deviation Scores

6.5 Deviational Measures: The Variance

_ 6.6 Deviational Measures: The Standard


Deviation

6.7 Calculation of the Standard Deviation:


Raw Score Method

6.8 Score Transformations and Measures of


Variability

6.9 Calculation of the Standard Deviation:


Grouped Scores

6.10 Properties of the Standard Deviation

6.11 Properties of the Semiinterquartile


Range

6.12 Properties of the Range

6.13 Measures of Variability and the Normal


Distribution

6.14 Comparing Means of Two Distributions

6.15 Properties of the Measures of Variability:


Summary

The list of problems and exercises appears on the next page.


1 2 3

4 5 6

7 8 9

10 11 12

13 14 15

16 17 18

SUMMARY

Measures of Variability

Every set of scores has three important properties: a shape, a central


tendency, and a variability. Chapters 3 and 4 introduced terms for describing
various possible shapes, and Chapter 5 presented three measures of central
tendency. This chapter now offers four measures of variability.

The Purpose of Measures of Variability

Measures of variability express quantitatively (in numerical terms) the

extent to which the scores in a set _

_ [6.1]. A measure of variability specifies / does not specify

how far any particular score diverges from the center of the group; rather it

is a summary figure that describes the spread of the entire set of scores [6.1,

Paragraph 3 on p. 82]. Furthermore, a measure of variability provides / does not

provide information about the level of performance (the central tendency), and

it gives / does not give a clue as to the shape of the distribution [6.1, Para¬

graph 3 on p. 82].

The Range

The simplest measure of variability is the range, which is the difference

between the __ and the _ score [6.2] . Note that the

range is a distance, whereas a measure of central tendency is a location. All

the other measures of variability are also distances, except for the variance,

which is the square of a distance.

There is no special symbol for the range.



The Semiinterquartile Range

Every distribution has three quartile points, which are the three score

points which divide the distribution into four parts, each containing an equal

number of cases. These points, symbolized Q1, Q2, and Q3, are C25, C50, and

C75, respectively [6.3]. The semiinterquartile range is defined as one-half

the distance between the _______ and the _______ quartile points [6.3].

The symbol for the semiinterquartile range is ___ [6.3], and the formula that

defines it is:
____________________ [Formula 6.1a or b]

Note that the distance from Q1 to Q3 is the range of the middle 50% of the

scores. Thus the semiinterquartile range is half the range of the middle 50%

of the scores. It may also be thought of as the mean distance between the

mode / median / mean and the two outer quartile points [6.3].
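A sketch of the computation (Python). The centile routine below locates centile points in a small ungrouped set by linear interpolation between the ordered scores; this is one common convention, not identical to the text's interpolation within grouped class intervals, so its results may differ slightly from Chapter 3's method.

```python
def centile(scores, pct):
    """Locate a centile point by linear interpolation between the
    ordered scores (one common convention for ungrouped data)."""
    s = sorted(scores)
    pos = pct / 100 * (len(s) - 1)   # fractional position in the ordered list
    lo = int(pos)
    frac = pos - lo
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + frac * (s[hi] - s[lo])

def semi_iqr(scores):
    """Q = (Q3 - Q1) / 2, half the range of the middle 50% of the scores."""
    return (centile(scores, 75) - centile(scores, 25)) / 2
```

For the scores 1 through 11, this gives Q1 = 3.5 and Q3 = 8.5, so Q = 2.5.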

Deviation Scores

The other two measures of variability use the mean of the distribution as a

reference point and indicate how far the scores lie from the mean, on the average.

The distance between a given score and the mean is called the deviation score

for that raw score, and it is found by subtracting the mean from the raw score.

If a raw score is symbolized X, the deviation score is symbolized x, and the

formula that defines a deviation score is:

x = (X − μ) for scores in a population

and

x = ( ___ − ___ ) for scores in a sample [6.4].

Warning: It is now highly important to distinguish raw scores, symbolized

with a capital X, from deviation scores, symbolized with a little x. Whenever

you see an X or an x, note carefully whether it's a capital or a small letter.

And when you write the symbols, you must make them clearly different. I suggest

that you make your capital letter large with straight lines, like this: X, and

your small letter small with one hooked line, like this: x.

The Variance

A third measure of variability, the variance, is based on the squares of the


deviation scores. The square of a number is the result of multiplying that number by itself. Thus the square of 2, symbolized 2², is 2 × 2, or 4.

The variance is defined as the mean of the _________ of the raw /

deviation scores [6.5]. The symbol for the variance of a population is _;

the foreign letter there is the lower-case Greek letter _ [6.5, Line 2],

and the symbol is read "sigma squared." The symbol for the variance of a sam¬

ple is _, which is read "es squared" [6.5]. The formulas that define the

variance in the two cases are:

Variance of a population: ___ = Σ( ___ − ___ )²/___  or  ______ [Formula 6.2a]

Variance of a sample: ___ = Σ( ___ − ___ )²/___  or  ______ [Formula 6.2b]

The variance is a most important measure which finds its greatest use in

inferential / descriptive statistics [6.5]. At the descriptive level, it has a

fatal flaw: its calculated value is expressed in terms of _ units of

measurement [6.5]. We can correct this flaw, though, by taking the square root

of the variance. (To find the square root of a number, figure out what value

has to be squared to equal that number. Thus the square root of 4, symbolized

√4, is 2, because 2² = 4.)

The Standard Deviation

The square root of the variance is called the standard deviation, and it

serves as a fourth measure of variability. The symbol for the standard devia¬

tion of a population is _, which is read "sigma," and the symbol for the

standard deviation of a sample is _, which is read "es." The defining

formulas, in terms of the deviation scores x, are:

Standard deviation of a population: ___ = ______ [Formula 6.3a]

Standard deviation of a sample: ___ = ______ [Formula 6.3b]

(Special help in understanding the variance and the standard deviation is


offered below.)

Sensitivity to the Scores in a Distribution

The standard deviation, like the mean / median / mode, is sensitive to the

exact location of every _______ in a distribution [6.10]. In particular,

the standard deviation is more / less sensitive than the semiinterquartile



range to the presence or absence of scores which lie at the extremes of the

distribution [6.10]. Because of this characteristic sensitivity, the standard

deviation is / may not be the best choice among measures of variability when

the distribution contains a few very extreme scores, or when the distribution is

badly _ [6.10]. In contrast, if a distribution is badly skewed or if

it contains a few very extreme scores, the semiinterquartile range will / will

not respond to the presence of such scores, and will / but will not give them

undue weight [6.11].

Only the two outermost scores of a distribution affect the range. It is

thus highly / not very sensitive to the total condition of the distribution

[6.12].

Effects of Transformations of the Scores in a Distribution

Think of the scores in a distribution as a set of weights on a plank, as in


Figure 5.1. If we add a constant (say 10) to each score, this is the same as
shifting each weight to the right by the amount of the constant (10 units). The
whole distribution thus slides to the right, and the measures of central ten¬
dency change (they each increase by 10), but the variability among the scores—
where they are in relation to each other, or where they are in relation to their
mean—does not change.

Thus adding a constant to each score in the distribution, or subtracting a

constant from each score, which is equivalent to sliding all the weights to the

left in one big clump, affects / does not affect the measures of variability

[6.8]. When scores are multiplied or divided by a constant, the range, the

semiinterquartile range, and the standard deviation are _

_ by that same constant [6.8].
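Both rules can be checked directly. The sketch below (Python) applies them to the six scores of the worked table in the special-help section later in this chapter, whose standard deviation is 4.0; statistics.pstdev computes a population-style standard deviation (dividing by n):

```python
from statistics import pstdev

scores = [11, 16, 17, 19, 22, 23]    # worked-example scores; S = 4.0
shifted = [x + 10 for x in scores]   # adding a constant: S unchanged
scaled = [x * 2 for x in scores]     # multiplying by 2: S doubles
```

The shifted scores still have S = 4.0, while the scaled scores have S = 8.0.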

Comparing Means of Two Distributions

There are many occasions on which a researcher will wish to compare the mean

of one distribution with the mean of another. The researcher will subtract one

mean from another, but the difference so obtained usually has little meaning

without an adequate frame of reference by which to judge. That frame of refer¬

ence can be provided by comparing the difference to the ___

of the variable [6.14]. As a general guideline, a difference of .1 standard

deviation is negligible / moderate , and a difference of .5 standard deviation

is moderate / of some importance [6.14, p. 96].
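Expressing a difference between two means in standard-deviation units is a one-line calculation. A sketch (Python; the means and standard deviation below are made-up numbers):

```python
def diff_in_sd_units(mean_a, mean_b, sd):
    """Difference between two means, expressed in units of the
    variable's standard deviation."""
    return (mean_a - mean_b) / sd

# Hypothetical example: means of 105.0 and 97.5 on a scale whose
# standard deviation is 15.0 differ by half a standard deviation.
d = diff_in_sd_units(105.0, 97.5, 15.0)
```

Here d comes out 0.5, a difference "of some importance" by the guideline above.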



MAP of MEASURES of VARIABILITY

[A chart appeared here mapping the measures of variability discussed in this chapter: the range, the semiinterquartile range, the variance, and the standard deviation.]

SPECIAL HELP with the VARIANCE and the STANDARD DEVIATION

To understand how the variance and the standard deviation function as


measures of the variability among the scores in a distribution, remember that
they are based on the deviations between the scores and the mean.

The mean will always fall somewhere between the lowest score in the distri¬
bution and the highest one, as noted on p. 41 of this workbook, so some scores
will be below the mean, and some will be above it. For an illustration, look
at the left-most column in the table below. This is not a frequency distribu¬
tion, just a listing of the scores in order from high down to low. Check the
computation of the mean, and note that it falls between the low of 11 and the
high of 23.

Raw Score, X    Deviation Score, x    Squared Deviation Score, x²

23              +5                    25
22              +4                    16
19              +1                     1
17              -1                     1
16              -2                     4
11              -7                    49

ΣX = 108        Σx = 0                Σx² = 96

Mean: X̄ = ΣX/n = 108/6 = 18.0

Variance: S² = Σx²/n = 96/6 = 16.0

Standard deviation: S = √16.0 = 4.0
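The same computation, written as a step-by-step sketch in Python (the root mean squared deviation: deviations, their squares, the mean square, then the square root):

```python
from math import sqrt

scores = [23, 22, 19, 17, 16, 11]
m = sum(scores) / len(scores)                      # mean = 108/6 = 18.0
devs = [x - m for x in scores]                     # deviation scores
variance = sum(d * d for d in devs) / len(scores)  # S squared = 96/6 = 16.0
sd = sqrt(variance)                                # S = 4.0
```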

Where are the raw scores in relation to the mean? The deviation scores in
the second column tell you. Raw scores below the mean have negative deviations;
raw scores above the mean have positive deviations; scores right at the mean
would have deviations of zero. The larger the deviation score, ignoring its
sign, the farther the corresponding raw score lies from the mean. Thus the raw
score of 11 is the farthest from the mean, because its deviation of 7 (really
-7) is the greatest of the deviation scores.

Now: the standard deviation of a distribution is just what its name says.
It is a standard, or typical, deviation. How big do the deviation scores for
the distribution run (ignoring their signs)? The standard deviation tells you
how big a typical deviation score is. In fact, the standard deviation is a
kind of average of the deviation scores (when their signs are ignored). It must
fall somewhere between the smallest deviation and the largest one, and the bigger
the deviations are, in general, the bigger the standard deviation will be.

It is this latter feature that makes the standard deviation a measure of


the variability among the raw scores. What would happen if the raw scores in
the left-hand column of the table were more widely scattered about their mean?
What would happen, that is, if the scores above the mean became even larger in
value and thus moved farther above the mean, while the scores below the mean
became even smaller and thus moved farther below it? The deviations from the
mean would increase, of course, and so would any average of the deviation scores.
In particular, the standard deviation, as an average, would increase. Thus an
increase in the scatter among the raw scores would produce an increase in the
standard deviation.

The formula for the standard deviation is easy to remember if you recall
that it is the root mean squared deviation, as the text tells you on p. 86.
That is, it is the square root of the mean of the squared deviation scores.
So to compute the standard deviation, you have to find the deviation scores,
then the squares of the deviation scores, then the mean of the squared devia¬
tions, and then the square root of this mean. These calculations are shown in
this order in the other columns of the table above.

For this distribution, the standard deviation, S, turns out to be 4.0.


Think again what this number means. The standard deviation is a standard devia¬
tion—that is, a typical deviation. So for this distribution, a typical amount
by which the raw scores deviate from their mean is 4 units. To be a bit more
precise, an average of the deviation scores is 4.0, and thus some deviation scores
must be less than 4.0 while others must be more than 4.0. Note how these things
are true in the table. The deviation scores in the second column (ignoring their
signs) run from 7 and 5 down through 4 and 2 to 1. The value for the standard
deviation, 4.0, does indeed fall between the 7 and the 1, and it is a pretty
good average for those numbers.

Note that the standard deviation is calculated by doing two operations and
then undoing them, which gets you back to something comparable to what you began
with. You start with deviation scores, and you do two things to them: square
them, and then sum the squares. Then you undo these things: divide by the num¬
ber of cases, which undoes the summing; and take the square root, which undoes
the squaring. The result, the standard deviation, is comparable to the devia¬
tion scores with which you began: it is a standard, or typical, deviation.

Look again now at the computations in the table above. On the way to the
standard deviation in the lower right-hand corner, the variance turned up, for
the variance is nothing more than the mean of the squared deviation scores.
Remembering that it's a mean will help you tell if you have a reasonable value
when you've computed a variance. As the mean of the squared deviation scores,
it must fall somewhere between the largest squared deviation and the smallest
one. In the table, note that 16.0 does fall between the high of 49 and the low
of 1.
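If you have a computer handy, you can check the calculations in the table with a few lines of code. Here is a sketch in Python (my addition, not Minium's; the variable names are invented):

```python
# Check the worked example: scores 23, 22, 19, 17, 16, 11 (p. 53).
scores = [23, 22, 19, 17, 16, 11]
n = len(scores)

mean = sum(scores) / n                    # X-bar = sum(X)/n = 108/6 = 18.0
deviations = [x - mean for x in scores]   # x = X - X-bar
squared = [d ** 2 for d in deviations]    # the squared deviation scores

variance = sum(squared) / n               # S squared = 96/6 = 16.0
sd = variance ** 0.5                      # S = square root of 16.0 = 4.0

print(mean, variance, sd)                 # 18.0 16.0 4.0
print(sum(deviations))                    # 0.0: deviations always sum to zero
```

Note that the value 4.0 falls between the smallest deviation (1, ignoring signs) and the largest (7), just as the text says a standard deviation must.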
Variability 55

EXERCISES

To practice conceptualizing the variance and the standard deviation in the
terms introduced above, fill in the tables below, which are like the example
you just went over. THINK WHAT YOU ARE DOING! Here's a list of points to note;
do indeed note how these generalizations are true in each table.

The Mean (X̄): As an average of the raw scores, indicates about how large they
run. Takes an intermediate value, with at least one raw score larger and at
least one raw score smaller. Serves as the reference point for the deviation
scores (xs).

Deviation Scores (xs): Indicate where the corresponding raw score (X) lies in
relation to the mean—whether the raw score is above (+) or below (-) the mean,
and how far from the mean the raw score falls. Taking their signs into account,
Σx = 0 (see Section 5.7 in the text).

Variance (S²): Is the mean of the squared deviation scores (x²s). As an average
of the squared deviation scores, indicates about how large they run, and takes
an intermediate value, with at least one squared deviation larger and at least
one squared deviation smaller.

Standard Deviation (S): Is a typical value for the deviation scores (ignoring
their signs); serves as an average for the deviation scores (xs), indicating
about how large they run. Takes an intermediate value, with at least one
deviation larger and at least one deviation smaller. Is calculated by doing
two operations (squaring the deviation scores and summing them) and then un¬
doing the operations (dividing the sum by n or N and taking the square root
of the results), which gets you back to something comparable to what you started
with—something comparable to the deviation scores.

Raw Score, X    Deviation Score, x    Squared Deviation Score, x²

     13

     12

      9

      7

      6

      1

ΣX =                Σx =              Σx² =

n =

Mean = X̄ = ΣX/n =        /       =    S² = Σx²/n =        /       =

                                      S = √       =
56 Chapter 6

Your standard deviation for the little distribution in that table should
have been 4.0, the same as the value for the example on p. 53. Why do the two
distributions have the same standard deviation? Yes, it's because they have
the same amount of variability, but why, exactly, do the two standard deviations
work out to be the same number?

Raw Score, X    Deviation Score, x    Squared Deviation Score, x²

     14

     13

     10

     10

      8

      5

ΣX =                Σx =              Σx² =

n =

Mean = X̄ = ΣX/n =        /       =    S² = Σx²/n =        /       =

                                      S = √       =

Note how the generalizations on the preceding page apply to these tables.

Raw Score, X    Deviation Score, x    Squared Deviation Score, x²

     14

      9

      9

      9

ΣX =                Σx =              Σx² =

n =

Mean = X̄ = ΣX/n =        /       =    S² = Σx²/n =        /       =

                                      S = √       =
Variability 57

Raw Score, X    Deviation Score, x    Squared Deviation Score, x²

      7

      7

      5

      4

      4

      3

      3

ΣX =                Σx =              Σx² =

n =

Mean = X̄ = ΣX/n =        /       =    S² = Σx²/n =        /       =

                                      S = √       =

MORE EXERCISES

As Minium's Second Law of Statistics implies, computing variances and
standard deviations from deviation scores is usually quite laborious, because
in real life the mean of a distribution is usually some ridiculously un-whole
number like 15.632, and the deviation scores are also decimal fractions. For¬
tunately, there are ways to compute the right answers without fussing with
deviation scores, ways that don't require anything but the raw scores. The
basic formula is:

    Σx² = ΣX² - (ΣX)²/n

(Read this as "The sum of little-x-squared equals the sum of big-X-squared
minus the-sum-of-big-X-the-quantity-squared over n.") Thus whenever you need
Σx², you can substitute the expression on the right of the equals sign. To
practice using this formula and to see that it is trustworthy, compute
Σx² = ΣX² - (ΣX)²/n for each of the distributions above, and look to see if
the result is the same as what you got for Σx². The raw scores are relisted
below for each distribution, and the computations for the data in the table
used as the example on p. 53 have been worked.
58 Chapter 6

Squared Raw Score, X²    Raw Score, X       To make it clear that X² differs from
                                            x², the column of X² values is listed
        529                   23            on the left side of the raw scores.

        484                   22
                                            Σx² = ΣX² - (ΣX)²/n
        361                   19
                                                = 2040 - (108)²/6
        289                   17
                                                = 2040 - 11664/6
        256                   16
                                                = 2040 - 1944
        121                   11
                                                = 96 = the value computed
ΣX² = 2040    ΣX = 108    n = 6                        directly on p. 53
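If you would like a machine check of the shortcut, the following Python sketch (mine, not the text's) computes Σx² both ways for the example data and confirms they agree:

```python
# Sum of squared deviations two ways, for the scores on p. 53.
scores = [23, 22, 19, 17, 16, 11]
n = len(scores)
mean = sum(scores) / n

# Deviation-score method: square each deviation from the mean and sum.
direct = sum((x - mean) ** 2 for x in scores)

# Raw-score formula: sum of X squared, minus (sum of X) squared over n.
shortcut = sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

print(direct, shortcut)   # 96.0 96.0
```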

Squared Raw Score, X²    Raw Score, X

                              13            Σx² = ΣX² - (ΣX)²/n
                              12
                               9                =        - (      )²/
                               7
                               6                =        -
                               1
                                                =        should = the value
ΣX² =         ΣX =        n =                            computed directly
                                                         on p. 55

Squared Raw Score, X²    Raw Score, X

                              14            Σx² = ΣX² - (ΣX)²/n
                              13
                              10                =        - (      )²/
                              10
                               8                =        -
                               5
                                                =        should = the value
ΣX² =         ΣX =        n =                            computed directly
                                                         on p. 56
Variability 59

Squared Raw Score, X²    Raw Score, X

                              14            Σx² = ΣX² - (ΣX)²/n
                               9
                               9                =        - (      )²/
                               9
                                                =        -

                                                =        should = the value
ΣX² =         ΣX =        n =                            computed directly
                                                         on p. 56

Squared Raw Score, X²    Raw Score, X

                               7            Σx² = ΣX² - (ΣX)²/n
                               7
                               5                =        - (      )²/
                               4
                               4                =        -
                               3
                               3                =        should = the value
                                                         computed directly
ΣX² =         ΣX =        n =                            on p. 57

-------------------------------- MNEMONIC TIP --------------------------------

That formula you practiced using in the exercise above is worth memorizing.
Write it out several times, taking care to distinguish x from X and ΣX² from
(ΣX)². Note again that the formula is read "The sum of little-x-squared equals
the sum of big-X-squared minus the-sum-of-big-X-the-quantity-squared over n."
Rehearse these words as well as the symbols to which they correspond.
60 Chapter 6

STILL MORE EXERCISES

Of the three measures of variability useful in descriptive statistics,


namely the range, the semiinterquartile range, and the standard deviation,
which one is:

_ 1. Defined in terms of the scores' deviations from the mean?

_ 2. Unresponsive to the location of scores in the middle of the
     distribution?
_ 3. Poorest in resistance to sampling fluctuation?

_ 4. Best in resistance to sampling fluctuation?

_ 5. Most useful in inferential statistics?

_ 6. Most sensitive to extremely high or extremely low scores?

_ 7. Particularly useful with open-ended distributions?

_ 8. Determined by only two scores?

SYMBOLISM DRILL

This drill includes some review. Fill in the blanks.

Symbol      Pronunciation         Meaning

 1  ___     "little en"           Number of scores in a _________

 2  ___     "big en"              Number of scores in a _________

 3  X       "eks" or "big eks"    A raw score, or the set of raw scores

 4  ___     _________             Result of summing quantities of some kind

 5  X̄       _________             ΣX/n; the mean of a _________

 6  ___     _________             ΣX/N; the mean of a _________

 9  ___     "little eks"          X - X̄ or X - μ; deviation score

10  ___     "cue"                 (C75 - C25)/2; semiinterquartile range

11  σ²      "sigma squared"       Σx²/N; variance of a population

12  ___     "sigma"               √(Σx²/N); _________ of a _________

13  S²      "es squared"          Σx²/n; _________ of a _________

14  ___     "es"                  √(Σx²/n); _________ of a _________
Variability 61

---------------------------- Statistics in Use ----------------------------

DESCRIPTIVE STATISTICS in USE

In 1973, the U. S. Consumer Product Safety Commission received reports


indicating that spray adhesives damage chromosomes and can cause a pregnant
woman to bear a deformed child. The Commission banned the sale of such adhe¬
sives in August of that year and warned all persons who had come into contact
with them, especially pregnant women, to consult a physician for a chromosome
study.

Two researchers subsequently mailed a questionnaire to all the Americans


listed in a directory of professionals who do diagnostic cytogenetics and
genetic counseling. Only five of these individuals failed to respond. The
questionnaire asked for the number of persons who had consulted the respondent
about exposure to spray adhesives and the number whose chromosomes the respon¬
dent had actually studied. The researchers reported the following data:

Number of People Consulting Respondent
about Exposure to Spray Adhesive            f

        None                               52
        1 - 5                              68     Six additional
        6 - 10                             31     respondents re-
        11 - 15                             8     ported "some"
        16 - 20                             8     inquiries.
        21 - 25                             2
        Over 25                             7
                                          ---
                                          176

Range 0 - 200. X̄ = 6.81 inquiries per respondent. C25 = 0, C50 = 3.2, C75 = 7.6.

Number of Persons Whose Chromosomes
were Studied by Respondent                  f

        None                               49
        1 - 5                              58     Two additional
        6 - 10                             13     respondents re-
        11 - 15                             4     ported "some"
        16 - 20                             2     studies.
        21 - 25                             1
        Over 25                             1
                                          ---
                                          128

Range 0 - 44. X̄ = 2.97 studies per respondent. C25 = 0, C50 = 2.0, C75 = 4.2.

As you can see, the researchers did not follow standard practice in con¬
structing their grouped frequency distributions: they have too few class inter¬
vals, the intervals are not of uniform width, and one interval is open-ended.
Nevertheless, these data illustrate a number of techniques of descriptive sta¬
tistics and offer you a chance to review much of what you've learned so far.
62 Chapter 6

1. What is the shape of the first distribution? _

2. What is the shape of the second distribution? _

3. What is the median of the first distribution?

4. What is the median of the second distribution?

5. Why is the mean higher than the median in the first distribution?

6. Why is the mean higher than the median in the second distribution?

7. What is Q for the first distribution? _

8. What is Q for the second distribution? _

9. The researchers actually reported the values of Q for these distributions;
these are the only times I've seen this done. Why is Q better than S as a measure
of variability for these distributions?

10. For the 176 respondents who reported an exact number, what was the total
number of people who consulted with them about the effects of exposure to spray
adhesives?

11. For the 128 respondents who reported an exact number, what was the total
number of people whose chromosomes were studied?

12. The researchers' goal was to determine the impact of the ban and the
warning on the genetic counselors in the U. S. who do diagnostic cytogenetics.
What was the population of interest to them?

13. Why did the researchers employ only descriptive statistics with no infer¬
ential techniques?

Important note: The ban was withdrawn in six months when the purported cor¬
relations between exposure to spray adhesive and chromosomal damage or birth de¬
fects could not be confirmed, and no toxicity could be demonstrated for the ad¬
hesives. In fact, investigators who reexamined the slides that had initially
been believed to show chromosome damage in exposed individuals did not agree with
the original interpretation.

Less important note: Don't be upset if you computed C50 and C75 for the two
distributions and failed to get the figures the researchers reported. I can't
derive the figures from the numbers in the tables either. The researchers must
have computed the quartiles from the ungrouped data.

Reference: E. B. Hook & K. M. Healy, "Consequences of a Nationwide Ban on


Spray Adhesives Alleged to be Human Teratogens and Mutagens," Science, 1976, 191,
566 - 567.
CHAPTER 7

THE NORMAL CURVE

7.1 Introduction

7.2 The Nature of the Normal Curve

7.3 Historical Aspects of the Normal Curve

7.4 The Normal Curve as a Model for Real Variables

7.5 The Normal Curve as a Model for Sampling Distributions

7.6 Standard Scores (z Scores) and the Normal Curve

7.7 Finding Areas when the Score is Known

7.8 Finding Scores when the Area is Known

PROBLEMS and EXERCISES

63
64 Chapter 7

SUMMARY

The Normal Curve as a Model for the Shape of a Distribution

The normal curve is a model, or representation, of one feature of a distri¬

bution of scores—its shape. Some distributions have a shape for which the

normal curve is a poor model. Furthermore, the normal curve best describes

a finite number / an infinity of observations that are on a continuous / dis¬

crete scale of measurement, while in reality recorded observations are dis¬

crete/continuous and finite in number [7.4, p. 111]. Nevertheless, there are

many real distributions whose shape is well represented by the normal curve.

The normal curve also functions well as a model for many distributions of

sample _, even if it is not a good model for the raw observations

that make up the samples [7.5]. Suppose, for example, that we draw a very great

number of random samples from some population. (All the samples should be of

the same size.) We compute the mean of each sample, which is a statistic char¬

acterizing the sample, and we cast these statistics into a distribution. It

would be found that the shape of the distribution of this large number of means

tends to approximate the _ curve [7.5].

The normal curve, as a model for the shape of a distribution, does not
specify the central tendency of that distribution (it does not specify the mean,
for example), nor does the model specify the variability (it does not specify
the standard deviation, for example). Thus different distributions can all
conform to the shape called normal, even though the distributions differ in
their mean, in their standard deviation, or in both. (See Figure 7.1 for an
example.)

Characteristics of the Normal Curve

Exactly what is the shape of the normal curve? It is symmetrical / asymmet¬

rical and unimodal / bimodal [7.2]. (It is thus a specific kind of bell

shape.) Going away from the middle toward either end, the curve gets closer

and closer to the horizontal axis, and it eventually / but it never actually

touches it [7.2, p. 107]. The curve is continuous / discrete [7.2].

Areas under various segments of the normal curve are of special interest.

The interval from one standard deviation below the mean to one standard devia¬

tion above it (the interval from μ - 1σ to μ + 1σ) contains about two-thirds

(68%) of the total area. In any distribution whose shape is well modeled by
The Normal Curve 65

the normal distribution, then, about two-thirds (68%) of the scores will have a

value falling in the interval from μ - 1σ to μ + 1σ. The interval μ ± 2σ con¬

tains about _% of the area, and thus about _% of the scores fall into this

interval [6.13]. The interval μ ± 3σ contains almost all the area (99.7% of it),

and thus almost all the scores (99.7% of them) fall into this interval. In gen¬

eral, relative area under the curve equals _________ of

cases in the distribution [7.6, Paragraph 1; see also the last paragraph of

Section 4.8 and p. 35 of this workbook].
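For readers with a computer, Python's standard math library can reproduce these areas from the error function. This sketch is my addition (the function name is invented), and it assumes a perfectly normal distribution:

```python
import math

def area_within(k):
    # Proportion of a normal distribution lying within k standard
    # deviations of the mean, computed from the error function:
    # P(mu - k*sigma < X < mu + k*sigma) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

print(round(area_within(1), 4))   # 0.6827: about two-thirds
print(round(area_within(2), 4))   # 0.9545: about 95%
print(round(area_within(3), 4))   # 0.9973: almost all the area
```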

Standard Scores (z Scores)

If a large number of people representative of the general population are

given a certain kind of IQ test, namely the Wechsler Adult Intelligence Scale

(the WAIS), the distribution of their scores will have a mean about 100 and a

standard deviation about 15. Back when the College Entrance Examination Board

Test (the CEEB test, also called the Scholastic Aptitude Test, or SAT) was first

constructed, a large number of high-school seniors would have earned scores with

a mean about 500 and a standard deviation about 100 on both the verbal and the

quantitative parts. (The means for the CEEB scores are somewhat lower now.)

Given these states of affairs, there is an important way in which an IQ score

of 100 from the first distribution corresponds to a CEEB score of 500 from the

second distribution: each falls at the mean of its distribution. There is also

an important way in which an IQ score of 115 corresponds to a CEEB score of 600:

each falls one standard deviation above the mean of the distribution from which

the score comes. Similarly, an IQ of 85 and a CEEB score of 400 are each one

standard deviation below the mean. An IQ of 130 corresponds to a CEEB score of

; an IQ of 145 corresponds to a CEEB score of _; an IQ of _ cor¬

responds to a CEEB score of 300; and an IQ of corresponds to a CEEB score

of 200 [Look ahead to Figure 8.4 on p. 132 for the answers].

There is a convenient way of saying where a raw score falls within its dis¬

tribution, a way that makes clear the sort of correspondences noted above. The

raw score is converted to a standard score. A standard score, or _ score,

states the position of the raw score in relation to the _ of the distri¬

bution, using the of the distribution as the unit of


66 Chapter 7

measurement [7.6]. Thus an IQ of 115 in the distribution described above has a

z score of +1, because it is 1 standard deviation (1 15-point unit) above the

mean (which is 100). A CEEB score of 600 in the other distribution also has a

z score of +1, because it too lies one standard deviation above the mean, but

here the standard deviation and the mean are those characterizing the distribu¬

tion of CEEB scores, namely 100 and 500, respectively. The z score for an IQ of

85 or a CEEB score of 400 is _, and the z score for an IQ of 130 or a CEEB

score of 700 is _ [Figure 8.4 on p. 132]. The IQ score with a z value of +3

is _, and the CEEB score with a z value of -3 is _ [Figure 8.4] .

Standard scores can easily be computed in your head if the numbers involved

are simple, like those in the examples above. If you need to use a formula to

find a z score, it will be one of these:

z score in a population called x: z = - [Formula 7.1]

z score in a sample called X: z = - [Formula 7.2]

The two distributions in the examples above would both be close to normal
in their shape, as Figure 8.4 shows, but z scores are useful in describing raw
scores in a distribution with any kind of shape.
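The arithmetic behind these correspondences is a single line. Here is a small Python sketch (my own illustration, not the text's; the function name is invented):

```python
def z_score(raw, mean, sd):
    # z = (X - mean) / sd: the raw score's distance from the mean,
    # expressed in standard-deviation units.
    return (raw - mean) / sd

# WAIS IQ: mean 100, SD 15.  CEEB: mean 500, SD 100 (original norms).
print(z_score(115, 100, 15))    # 1.0: one SD above the mean
print(z_score(600, 500, 100))   # 1.0: the same relative standing
print(z_score(85, 100, 15))     # -1.0: one SD below the mean
```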

Properties of z Scores

If all the scores in any given distribution are converted to z values, the

mean of the z scores is always _, and the standard deviation of the z scores

is always _ [7.6]. A distribution of z scores has a normal shape / whatever

shape is characteristic of the set of raw scores from which they were derived

[7.6].

ADVICE

The exercises offered in the text for this chapter are especially important.
Be sure to do at least some of the parts of each one. As for the other chapters,
the exercises provided here (following the map of the new concepts) do not dupli¬
cate those in the text.
The Normal Curve 67

MAP of z SCORES and the NORMAL DISTRIBUTION

z SCORES

    indicate where corresponding raw scores lie in relation to the mean of
    the distribution, using the standard deviation of the distribution as
    the unit of measurement

    for a complete distribution of raw scores, have a mean of zero and a
    standard deviation of one

    have the shape of the raw scores (not necessarily normal)

    are helpful in solving problems involving areas under the NORMAL CURVE

NORMAL CURVE

    has a particular bell shape, with 68% of the area in the interval μ ± 1σ,
    95% of the area in the interval μ ± 2σ, and 99.7% of the area in the
    interval μ ± 3σ

    provides a model for the shape of certain distributions of raw scores
    and certain distributions of sample statistics

    does not specify the distribution's central tendency or variability


68 Chapter 7

EXERCISES

Here's a collection of scores listed in order from high down to low. (This
is not a frequency distribution.) Take them to be a sample called X. Compute
the mean, find the deviation scores, and then calculate the standard deviation
from the squares of the deviation scores. These are the sorts of exercises I
offered you in the preceding chapter.

What's new here is this: Find the z score corresponding to each raw score,
and check to see that the mean of the z scores, z̄ ("zee bar"), is zero. Then
compute the standard deviation of the z scores using a procedure like that by
which you found the standard deviation of the raw scores. This requires treat¬
ing the z scores as though they were raw scores and finding the deviation scores
for them, using their own mean, zero, as a reference point.

All the figures in the table work out to be simple (but not necessarily
whole) numbers, and the standard deviation of the z scores should be exactly one,
of course.

Raw       Deviation     Squared        z Score    Deviation Score    Squared Deviation
Score     Score         Deviation                 for z Score        Score for z Score
                        Score

X         x = (X - X̄)   x²             z = x/S    (z - z̄)            (z - z̄)²

13

13

ΣX =      Σx =          Σx² =          Σz =       Σ(z - z̄) =         Σ(z - z̄)² =

n =       S² = Σx²/n                   n =        Sz² = Σ(z - z̄)²/n

X̄ = ΣX/n =       /                     z̄ = Σz/n =       /

S = √                                  Sz = √
The Normal Curve 69

Note how the generalizations on p. 55 of this workbook apply to this table.


The generalizations enable you to determine whether your calculations are pro¬
ducing reasonable values. Is your value for X̄, for example, somewhere between
the highest raw score and the lowest one? S² is also a mean and thus must fall
somewhere between the largest x² value and the smallest x² value. The mean and
the variance of the z scores can be checked in analogous ways.

If you'd like to do more problems of this kind, use the distributions on pp.
53 and 55 - 59 above.

SYMBOLISM DRILL

Yes, another one. Repetition is what makes a drill a drill and a good tactic
for learning the sort of material in the table below.

Symbol      Pronunciation         Meaning

    ___     _________             Number of scores in a population

    ___     _________             Number of scores in a sample

    ___     "eks" or "big eks"    A score, or the set of scores

    ___     "little eks"          X - μ or X - X̄; _________ score

    ___     _________             Result of summing quantities of some kind

    ___     "eks bar"             ΣX/n; the mean of a _________

    ___     "mew"                 ΣX/N; the mean of a _________

11  ___     _________             Σx²/N; _________ of a _________

12  ___     _________             √(Σx²/N); _________ of a _________

13  ___     _________             Σx²/n; _________ of a _________

14  ___     _________             √(Σx²/n); _________ of a _________

15  ___     "zee"                 x/σ or x/S; z score


CHAPTER 8

DERIVED SCORES

8.1 The Need for Derived Scores

8.2 Standard Scores: The z Score

8.3 Standard Scores: Other Varieties

8.4 Translating Raw Scores to Standard Scores

8.5 Standard Scores as Linear Transformations of Raw Scores

8.6 Centile Scores

8.7 Comparability of Scores

8.8 Normalized Standard Scores: T Scores and Stanines

8.9 Combining Measures from Different Distributions

PROBLEMS and EXERCISES

 1 _____     2 _____

 3 _____     4 _____

 5 _____     6 _____

 7 _____     8 _____

 9 _____    10 _____

11 _____    12 _____

13 _____    14 _____

71
72 Chapter 8

SUMMARY

The Need for Derived Scores

In psychological and educational measurement, a raw score, by itself, is

typically interpretable / uninterpretable [8.1, Paragraph 2]. To determine

whether a given score indicates a good performance or a poor one, then, a frame

of reference is needed. A distribution of scores obtained by a group with

known characteristics provides such a frame of reference. The group is called

a _ group, and the distribution of their scores on the test provides

the test _ [p. 124, footnote].

Types of Derived Scores

A number that indicates where a raw score stands in relation to other raw

scores is called a derived score. There are two major kinds of derived scores:

those like the z score that preserve the proportional relation of interscore

__, and those like the centile rank that do not [8.1, final para¬

graph].

The z and Other Standard Scores

As noted in Chapter 7, transforming all raw scores in a distribution to z

scores changes the mean to _ and the standard deviation to _, and

it also changes/but it does not change the shape of the distribution [8.2].

It is in this sense (leaving the shape unchanged) that z scores preserve the

proportional relation among the distances between the raw scores.

Chapter 7 used the term standard score to refer to z scores, but there are

other types of derived scores that are also standard. Transformation of an

entire distribution of raw scores to a given type of standard score yields a

distribution with a fixed mean and a fixed standard deviation; this is what is

"standard" about the derived score.

For example, in WWII the Army transformed raw scores on its General Classi¬

fication Test into a type of standard score with a mean of and a standard

deviation of _ [8.3, Table]. And as this workbook noted in the summary for

Chapter 7, raw scores on the Wechsler Intelligence Scale are transformed into

derived scores called IQs with a mean of and a standard deviation of


Derived Scores 73

while transformations of raw scores on the CEEB test originally had a mean of

_ and a standard deviation of [8.3, Table].

The fundamental characteristic of all of these types of standard scores is

that they, like the z score, locate a raw score by stating how many _

__ it lies above or below the _ of the distribution [8.3].

Another important property of these standard scores is that they, like the z

score, preserve the _ of the original distribution of raw scores [8.3].

All of the standard scores considered so far, z scores and the others, are

transformations of the raw scores, and in each case the transforma¬

tion can be expressed by the equation of a _ line [8.5]. A linear

transformation is any transformation of scores performed by adding, subtracting,

multiplying, or dividing by ________ [8.5, first sentence].
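Such a linear transformation can be sketched in a couple of lines of Python (my addition, not the text's; the function name is invented). The sketch converts a raw score to a z score and then rescales it; the target means and standard deviations are the CEEB figures given earlier and the T-score scale (mean 50, SD 10) discussed in Section 8.8:

```python
def to_standard(raw, mean, sd, new_mean, new_sd):
    # A linear transformation: express the raw score as a z score,
    # then rescale to the new mean and standard deviation.
    z = (raw - mean) / sd
    return new_mean + z * new_sd

# An IQ of 130 (mean 100, SD 15) lies two SDs above the mean, so:
print(to_standard(130, 100, 15, 500, 100))   # 700.0 on the CEEB scale
print(to_standard(130, 100, 15, 50, 10))     # 70.0 as a T score
```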

Centile Scores

Centile ranks (which this chapter occasionally calls centile scores) are

also derived scores. Like the standard scores described so far, the centile

rank of a raw score describes its_relative to other scores in a

distribution [8.6]. The centile rank does this by indicating what percentage

of the scores fall below it. Centile ranks have a major disadvantage, however:

Changes in raw scores are / are not ordinarily reflected by proportionate

changes in centile rank [8.6]. When one centile rank is higher than another,

the corresponding raw score of the one is higher than that of the other, but we

do not know by how much. Changes in raw score are accompanied by proportionate

changes in centile rank only when the shape of the distribution of scores is

[8.6], as in the bottom half of Figure 8.3.
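The definition of a centile rank can be put in code as well. This Python sketch is my own (the data and function name are invented), and it ignores the usual refinement of counting half of the scores tied with the given score:

```python
def centile_rank(score, distribution):
    # Percentage of the scores in the distribution that fall below the
    # given score (simplified: tied scores are not split).
    below = sum(1 for x in distribution if x < score)
    return 100 * below / len(distribution)

data = [2, 4, 4, 5, 7, 7, 9, 10]    # hypothetical raw scores
print(centile_rank(7, data))        # 50.0: half the scores fall below 7
print(centile_rank(10, data))       # 87.5
```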

Comparison of Standard Scores from Different Distributions

Under some circumstances, a standard score from one distribution may be

meaningfully compared with a standard score of the same type from another dis¬

tribution. One element necessary for appropriate comparison is that the refer¬

ence groups (norm groups) used to generate the standard scores are _

(that is, similar) [8.7]. Furthermore, standard scores should be used for com¬

paring scores from two different distributions only if the two distributions
74 Chapter 8

have roughly the same _ [8.7]. If the distributions have different

shapes, there are two possible solutions. One possibility is to use

_; they are independent of the shape of the original distributions

[8.7, p. 134]. Another solution is to convert the distributions of raw scores

to distributions of derived scores that have identical _, as well as

identical _ and _ [8.7]. Such transformations

are considered immediately below.

Normalized Standard Scores

Transformations that yield "normalized standard scores" are transformations

that produce a standard mean and a standard standard deviation, but in addition

the process of transformation alters the shape of the original distribution of

raw scores so that it follows the _ curve [8.8, Sentence 2]. Trans¬

formation of raw scores to T scores, for example, produces a distribution with

a mean of _, a standard deviation of _, and one that is normally distri¬

buted [8.8]. Transforming raw scores to _________ produces a distribution

with a mean of 5, a standard deviation almost 2, and again a shape that is nor¬

mal [8.8, Paragraph 3].

Combining Measures from Different Distributions

Suppose an instructor has scores for students on a quiz, a midterm exam, and

a final. To compute the final grades, the instructor might simply sum each of

the three scores for each student and then decide which sums are worth an A,

which sums are worth a B, and so on. A naive person would think that this pro¬

cedure of simply summing the three scores makes them count equally in determin¬

ing the total and thus the final grade. When several scores are summed, each

one does / does not, however, count equally in determining the total [8.9, Par¬

agraph 2]. If the several scores are independent (i.e., if the size of a per¬

son's score on one variable is / is not predictive of the size of that person's

score on another variable), then the contribution of each score to the total is

proportional to the magnitude of the _ of the distri¬

bution from which it came [8.9].

This situation can be rectified by assuring that all of the test distribu¬

tions from which scores are to be added to form the composite have the same
Derived Scores 75

, and one way to accomplish this is to transform

scores from the several distributions to __ scores [8.9]. The

transformed scores may then be summed if equal weight is desired, or multiplied

by desired weights and summed.

This is not the end of the problem, however, because it will work only if

the several scores are independent, and usually they are not. The lack of in¬

dependence is not a problem when only _ measures are to be combined, but

it is a problem when there are more than _ [8.9, p. 138]. With more than

, the procedure recommended above ensures / does not ensure that the

weights assigned will result in the intended relative importance of the contri¬

bution of each to the whole. Nevertheless, it is better to follow the procedure

outlined above than to allow the scores to be weighted by the amount of _

inherent in the distribution of each [8.9].
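To see concretely why raw scores are weighted by their variability, and how standardizing first equalizes the contributions, consider this Python sketch (my own illustration; the data are invented):

```python
# Two hypothetical tests for four students: a quiz with little spread
# and a final with a lot of spread.
def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return (sum((x - m) ** 2 for x in v) / len(v)) ** 0.5

def z_scores(v):
    m, s = mean(v), sd(v)
    return [(x - m) / s for x in v]

quiz = [8, 10, 12, 10]       # SD about 1.4
final = [40, 80, 60, 60]     # SD about 14.1: ten times the spread

# Summed as raw scores, the final dominates the composite.  Summed as
# z scores, each test contributes with the same standard deviation (1).
raw_totals = [q + f for q, f in zip(quiz, final)]
z_totals = [q + f for q, f in zip(z_scores(quiz), z_scores(final))]
print(raw_totals)
print([round(t, 2) for t in z_totals])
```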

EXERCISES

To practice locating scores relative to the mean of their distribution,


using their standard deviation as the unit of distance, fill in the missing
values in the table below. Note that the four scores on a line would be truly
equivalent only if the distributions from which they come have similar shapes.
You should be able to do most of the calculations for the first four columns
in your head.

TABLE OF EQUIVALENT SCORES

          Score where    Score where     Score where    Centile Rank if
z Score   μ=100, σ=15    μ=500, σ=100    μ=50, σ=10     Shape is Normal

  +3          145

                                                             97.7

                              650

                                                             50.0

  -1

                              250

                                              20
76 Chapter 8

MAP of DERIVED SCORES

RAW SCORE

    is typically uninterpretable by itself in psychology or education

    is given meaning by a DERIVED SCORE

DERIVED SCORE

    indicates where the raw score stands in relation to Other Raw Scores,
    which constitute the Test Norms

    may be the kind that preserves the shape of the distribution of raw
    scores and thus preserves the proportional relation of interscore
    distances, or the kind that does not preserve the shape of the
    distribution of raw scores and thus alters the proportional relation
    of interscore distances

*These figures hold only for the original norm group. The CEEB scores
earned by high-school seniors in recent years have had lower means and
varying standard deviations.

SYMBOLISM DRILL

     Symbol   Pronunciation   Meaning

 2   N                        Number of scores in a population

 1   n                        Number of scores in a _____

 3   X                        A score, or the set of _____ scores

 9   x                        X − μ or X − X̄; _____ score

 4   Σ                        _____

 6   μ                        Σ__/__; the mean of a _____

 5   X̄                       Σ__/__; the mean of a _____

11   σ²                       Σ__/N; _____ of a _____

12   σ                        √(Σ__/N); _____ of a _____

13   S²                       Σ__/n; _____ of a _____

14   S                        √(Σ__/n); _____ of a _____

15   z                        __/σ or __/S; _____ score

Statistics in Action
Defining Mental Retardation

A psychologist who has conducted important research on mental retardation
recently discussed the issue of how to define retardation. Writing in the
journal Science, Edward Zigler of Yale University notes that in 1959, the Amer-
ican Association on Mental Deficiency, the leading professional organization
concerned with the retarded in America, defined the retarded as those with an
IQ score one or more standard deviations below the mean.

_____ 1. What IQ score is one standard deviation below the mean? (The IQ
tests for children have a standard deviation that is the same as
that for the WAIS or very close to it.)

_____ 2. What percentage of the population was defined as retarded by this
criterion?

But many professionals now accept an IQ of 70 as the dividing line, Zigler says.

_____ 3. How many standard deviations below the mean does an IQ of 70 lie?

_____ 4. By this criterion, what percentage of the population is retarded?

Zigler argues that retardation is a "two-tiered phenomenon" in which the
"mildly retarded," those with IQ between 50 and 70, must be distinguished from
the "more severely retarded," those with IQ less than 50. "Statements or views
concerning one tier are pretty much inapplicable to the other," Zigler believes.

_____ 5. How many standard deviations below the mean does an IQ of 50 lie?

_____ 6. What percentage of the population is "mildly retarded" according to
Zigler's definition?

_____ 7. What percentage of the population is "severely retarded"?

Zigler suggests that people with IQ between 50 and 70 who are currently
called retarded merely represent "the lower portion of the normal distribution
of intelligence" and are "thus an integral part of the normal population." In
this conception, these people are like those whom we regard as short. Most of
the people in each classification came to be what they are through the usual gene-
tic and environmental processes, which produce a normal distribution of IQ scores
in the one case and a normal distribution of heights in the other, both of them
just naturally including some scores well below the mean. The truly retarded,
in Zigler's view, are like the dwarfs and midgets who fall far below the mean
in height as a result of processes that are clearly abnormal. Unfortunately,
he notes, there is no emotionally neutral word comparable to "short" to describe
the lower end of the naturally occurring distribution of IQ scores. The word
"retarded" is pejorative, and to apply it to all people with an IQ between 50
and 70 is unfair and misleading, Zigler believes, just as it is unfair and mis-
leading to refer to all adults who are, say, five-two or less, as dwarfs or
midgets.

_____ 8. What proportion of the young adult men in America are below five
feet, two inches in height? (See the table on p. 27 of this book.)

_____ 9. What proportion of the young women are below five-two?

IQ is not the only criterion that is used in diagnosing retardation. The
person's social competence and the age at which the abnormalities began are also
taken into consideration by many professionals. Social competence, or the abil-
ity to meet the demands of everyday living, is not adequately defined by an IQ
score, Zigler says, and the exact relation between intelligence and social com-
petence is unclear. There is great need for a measure of social competence
that can be used throughout the lifespan as IQ tests can.

Source: Edward Zigler, review of The Mentally Retarded and Society, Science,
1977, 196, 1192-1194.
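One way to check your answers to questions like those above is to compute normal-curve areas directly. The sketch below assumes IQ scores are normally distributed with mean 100 and standard deviation 15, the convention the passage relies on; the helper name is mine.

```python
import math

def pct_below(iq, mu=100.0, sd=15.0):
    """Percentage of a normal(mu, sd) population scoring below the given IQ."""
    z = (iq - mu) / sd
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(pct_below(85), 1))   # ~15.9% fall one SD or more below the mean
print(round(pct_below(70), 1))   # ~2.3% fall below IQ 70
print(round(pct_below(50), 3))   # ~0.043% fall below IQ 50
```

Subtracting the last two figures gives the share of the "mildly retarded" band (IQ 50 to 70) under Zigler's two-tier scheme.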


CHAPTER 9
CORRELATION

9.1 Measurement of Association and


Prediction

9.2 Some Historical Matters

9.3 Association: A Matter of Degree

9.4 A Measure of Correlation

9.5 Computation of r: Raw Score Method

9.6 Score Transformations and the


Correlation Coefficient

9.7 The Scatter Diagram

9.8 Constructing a Scatter Diagram

9.9 The Correlation Coefficient: Some


Cautions and a Preview of Some As¬
pects of Its Interpretation

9.10 Other Ways of Measuring Association

PROBLEMS and EXERCISES


SUMMARY

In the cases considered thus far in your study of statistics, there has
been only one score for a given subject, a score indicating the value of just
one variable (the subject's height, for example). When two scores are avail-
able for each subject, one score for each of two variables (the subject's height
and the subject's weight, say), it is possible to determine the correlation
between the two variables for the subjects on hand. Except in the final sec-
tion, Chapter 9 is concerned with the correlation between variables that are
each quantitative and continuous (such as height and weight).

Correlation: A Matter of Direction

There is no correlation between two continuous quantitative variables if

high scores on one variable tend to be associated with both high and low

scores on the other variable, and if low scores on the one variable also tend

to be associated with both high and low scores on the other. Diagram 6 of

Figure 9.1 on p. 145 illustrates a case of a correlation very close to zero.

Such a graph is called a _____ [Figure 9.1, Caption].

If there is a correlation between two continuous quantitative variables,

it will be either positive or negative. In the case of a positive correlation,
high scores on one variable tend to be associated only with high scores on the
other variable, while low scores on the one variable tend to be associated
only with low scores on the other. There is a positive correlation between
height and weight among human beings, in that people with a high score on
height (tall people) tend to have a high score on weight (tend to be heavy),
while people with a low score on height (short people) tend to have a low score
on weight (tend to be light). Positive correlations are illustrated in Dia-
grams 1 - 5 of Figure 9.1. Note that in these diagrams the data points fall in
a swarm running from lower left to upper right.

In the case of a negative correlation, high scores on one variable tend to

be associated only with low scores on the other variable, and low scores on
the one tend to be associated only with high scores on the other. There is a
negative correlation between height and normal sleeping time per night among
human beings, in that people with a high score on height (adults) tend to have
a low score on normal sleeping time (tend to sleep relatively few hours), while
people with a low score on height (children) tend to have a high score on nor-
mal sleeping time (tend to sleep relatively many hours). Negative correlations
are illustrated in Diagrams 7 and 8 of Figure 9.1. Note that in these diagrams
the data points fall in a swarm running from upper left to lower right.

Correlation: A Matter of Degree

The distinction between a positive and a negative correlation is a distinc-
tion in the direction of the correlation. Correlations also vary in degree.

In the case of a perfect correlation, all data points in a diagram like those
in Figure 9.1 fall on a straight line. (If the direction is positive, the line
slopes from lower left to upper right; if the direction is negative, the line
slopes the other way.) In the case of less than perfect correlations, the data
points swarm more or less closely about a straight line; the farther away, the
lower the degree of the correlation (whether the direction is positive or
negative).

Pearson's r as a Measure of Correlation

The direction and the degree of correlation between two continuous quantita-

tive variables is indicated by a number called the Pearson product-moment corre-

lation coefficient. The Greek letter ρ, pronounced "_____," stands for the

population value, and _____ stands for a sample value [9.3]. In this chapter

the symbol r is used consistently, but the principles and procedures for

calculation described here apply equally to samples and to populations [9.3].

The sign of the coefficient may be positive or negative. A positive value

of r indicates a positive correlation, which, as noted above, occurs when there

is a tendency for high values of one variable (X) to be associated with high /

low values of the other variable (Y), and low values of the one to be associ-

ated with high / low values of the other [9.3, Paragraph 2]. A negative value

of r indicates a negative correlation, which occurs when high values of X are

associated with high / low values of Y, and vice versa [9.3]. The sign of the

coefficient indicates the direction / degree of the association [9.3].

When no relationship exists, its value is _____ [9.3]. When a perfect

relationship exists, its value is _____, plus or minus [9.3].

degree of relationship is represented by an intermediate value of r. Note that

when r = +1.00 or -1.00, every point lies close to / exactly on a straight

line [9.3, p. 145]. This means that if we know the value of X, we can predict

the value of Y with only a little / without any error [9.3].

Pearson's r is defined in the text in terms of deviation scores:

    r = Σxy / (n · Sx · Sy)        [Formula 9.1]

In this formula, x = (X − X̄), y = (Y − Ȳ), n = the number of pairs of scores,
Σxy is the sum of the products of the paired deviation scores, and Sx, Sy
are the _____ of the two distributions [9.4].

To understand how this formula works, visualize a scatter diagram like

Figure 9.2. This diagram is divided into four quadrants by two lines, one

located at the _____ of X and one at the _____ of Y [9.4, p. 147].

Points located to the right of the vertical line are therefore characterized

by positive / negative values of x and those to the left by positive / negative

values of x [9.4]. Those points lying above / below the horizontal line are

characterized by positive values of y, and those above / below by negative

values of y [9.4].

For any point, the xy product may be positive or negative,

depending on the sign of x and the sign of y. The xy products will be positive

for points falling in quadrants _____ and _____ and will be negative for points

falling in quadrants _____ and _____ [9.4].

On examination of the deviation score formula for r, it is apparent that

the _____ of Σxy determines whether the coefficient will

be negative, zero, or positive [9.4]. The correlation coefficient will be

_____ when the sum of the negative xy products from quadrants II and IV

equals the sum of the positive products from quadrants I and III; the coeffic-

ient will be negative when the contributions from quadrants _____ and _____ exceed

those from quadrants _____ and _____; and the coefficient will be positive when the

reverse is true [9.4].

reverse is true [9.4]. The greater the predominance of the sum of products

bearing one sign over those bearing the other, the greater the magnitude /

direction of the coefficient [9.4].
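The deviation-score computation can be sketched in a few lines of Python. The function name and sample data below are mine, and S is computed with n in the denominator, as in the symbolism drills.

```python
import math

def pearson_r(X, Y):
    """Pearson r via the deviation-score formula: r = sum(xy) / (n * Sx * Sy)."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    xs = [v - mx for v in X]          # deviation scores x = X - Xbar
    ys = [v - my for v in Y]          # deviation scores y = Y - Ybar
    sx = math.sqrt(sum(v * v for v in xs) / n)
    sy = math.sqrt(sum(v * v for v in ys) / n)
    return sum(a * b for a, b in zip(xs, ys)) / (n * sx * sy)

print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 2))   # 1.0: perfect positive
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 2))   # -1.0: perfect negative
```

Notice how the sign of Σxy (the numerator) alone decides the sign of r, just as the quadrant argument above says.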

Effects of Transforming Scores on the Correlation Coefficient

How will the correlation be affected if a constant is added to or subtracted

from each X score before obtaining the correlation between that variable and Y?

What is the effect of multiplying or dividing X by a constant? The correlation

between the altered variable and the remaining variable changes / remains just

as it was [9.6]. It is also possible to transform X in one way and Y in another,

and the value of the coefficient changes / remains unaltered [9.6]. As long

as the translation of either (or both) variable(s) is a _____ one, the

correlation will be unaltered [9.6].
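This invariance is easy to demonstrate numerically. The sketch below (invented data, function name mine) computes r before and after transforming X and Y; multiplying by a negative constant would flip the sign of r, so the demonstration uses positive constants.

```python
import math

def r(X, Y):
    """Pearson r from raw scores (sums of deviation products and squares)."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in X))
    sy = math.sqrt(sum((b - my) ** 2 for b in Y))
    return sxy / (sx * sy)

X = [2, 5, 6, 9, 13]
Y = [1, 4, 3, 8, 9]
base = r(X, Y)
shifted = r([x + 100 for x in X], Y)                 # add a constant to every X
scaled = r([3 * x for x in X], [y / 2 for y in Y])   # multiply X and divide Y by constants
print(round(base, 4), round(shifted, 4), round(scaled, 4))  # all three agree
```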



Cautions Concerning the Correlation Coefficient

1. Does a correlation coefficient of, say, +.50 mean that there is 50%

association between the two variables? The degree of association is / is not

ordinarily interpretable in direct proportion to the magnitude of the coeffic-

ient [9.9].

2. If two variables are substantially correlated, must one be, at least in

part, the cause of the other? This is / is not so. Mere association is

insufficient / sufficient to claim a causal relation between two variables [9.9].

3. The strength of the association between two variables depends, among

other things, on the nature of the measurement of the two variables as well as

on the kind of subjects studied. It is / is not possible, then, to speak of

the correlation between two variables without taking these factors into con-

sideration [9.9].

4. Pearsonian correlation is based on the _____ line of best fit

to the bivariate distribution. Although a _____ line is reasonably

considered to be the line of best fit in many situations, sometimes it is not.

When it is not, the strength of association is likely to be underestimated /

overestimated / accurately estimated by Pearson r [9.9].

5. The correlation coefficient is affected by the range of talent, or vari-

ability, characterizing the measurements of the two variables. In general, the

smaller the range of talent in X and/or Y, the higher / lower the correlation

coefficient, other things being equal [9.9].

6. The correlation coefficient is / is not subject to sampling variation

[9.9]. Depending on the characteristics of a particular sample, the obtained

coefficient may be higher or lower than it would be in a different sample from

the same population.

EXERCISES

For each case described below, use your common sense and your new understanding
of correlation to determine whether the correlation between the two

variables is zero, positive, or negative; and if positive or negative, whether


the degree of correlation is low, medium, high, or perfect.

_____ 1. Among the world's population of human beings with both
legs intact, what is the correlation between the length of a person's left leg
and the length of his or her right leg?

_____ 2. Among grade-school children, what is the correlation
between the length of a child's nose and the number of words in his or her
spoken vocabulary? Remember that the children come from grades one through six.

_____ 3. Among teenagers, what is the correlation between the
number of freckles on a person's face and the person's IQ?

_____ 4. Among married couples in North America, what is the
correlation between the age of the husband and the age of the wife?

_____ 5. Playboy magazine, following a suggestion offered by Plato
2300 years ago, once proposed that men should marry about the age of 30 and
choose a bride about the age of 20. Suppose that North America follows this
advice for 100 years. After the 100 years, married couples are surveyed; some
are relatively young newly-weds, and some are just celebrating their golden
anniversary. Among these couples, what would be the correlation between the
age of the husband and the age of the wife?

_____ 6. What is the correlation between the height of the New
York Yankees baseball players and the height of the New York Jets football
players?

SYMBOLISM DRILL

     Symbol   Pronunciation        Meaning

 1   _____    "little en"          Number of scores in a _____

 2   _____    "big en"             _____

 3   _____                         A raw score, or the set of raw scores

 9   _____    "little eks"         __ − μ or __ − X̄; _____ score

 4   _____    "the sum of"         _____

 6   _____                         ΣX/N; the mean of a _____

 5   _____                         ΣX/n; the mean of a _____

11   _____                         Σx²/N; _____ of a _____

13   _____                         Σx²/n; _____ of a _____

12   _____                         √(Σx²/N); _____ of a _____

14   _____                         √(Σx²/n); _____ of a _____

17   _____    "rho"                Pearson correlation coefficient for a pop'n

16   _____    "ar"                 Pearson correlation coefficient for a sample

15   _____                         x/σ or x/S; _____ score



MAP of CONCEPTS in CHAPTER 9

A Bivariate Distribution consists of pairs of values of X and Y and may be
graphed as a Scatter Diagram. The diagram may reveal:

- Association between high scores on X and high scores on Y, and between low
  scores on X and low scores on Y, which is described as a Positive
  correlation between X and Y;

- No tendency for high scores on X to be associated with either high or low
  scores on Y, and no tendency for low scores on X to be associated with
  either high or low scores on Y, which is described as a Zero correlation
  between X and Y;

- Association between high scores on X and low scores on Y, and between low
  scores on X and high scores on Y, which is described as a Negative
  correlation between X and Y.

A positive or negative correlation permits prediction of standing in Y from
standing in X with better than chance accuracy.

These three cases correspond to positive, zero, and negative values of the
Pearson Correlation Coefficient, which is symbolized ρ for a population and
r for a sample. The coefficient is appropriate as a measure of correlation
only if the relation between X and Y is linear. It indicates by its magnitude
(ignoring its sign) the strength of the linear association between X and Y,
which is shown graphically by the extent to which scores cluster around the
straight line of best fit on the scatter diagram.
CHAPTER 10

FACTORS INFLUENCING THE CORRELATION


COEFFICIENT

10.1 Correlation and Causation

10.2 Linearity of Regression

10.3 Homoscedasticity

10.4 The Correlation Coefficient in


Discontinuous Distributions

10.5 Shape of the Distributions and the


Correlation Coefficient

10.6 Random Variation and the Correlation


Coefficient

10.7 The Correlation Coefficient and the


Circumstances under Which It Was
Obtained

10.8 Other Factors Influencing the


Correlation Coefficient

PROBLEMS and EXERCISES

1 _____

2 _____

3 _____

4 _____

5 _____

6 _____

7 _____


SUMMARY

Correlation and Causation

The fact that X and Y vary together is a necessary and also / but not a

sufficient condition for one to make a statement about a causal relationship

between the two variables [10.1]. That is, evidence that two variables vary

together is / is not necessarily evidence of causation [10.1]. Figure 10.1

shows four of the possibilities that may occur when two variables are correla-

ted, and in only two of these cases (the first and second) does one of the

variables have a causal effect on the other.

Linearity of Regression

In scatter diagrams such as those shown on pp. 144-145 and 165 of the text,

it is helpful to fit a straight line to the swarm of data points. (The next

chapter tells how to do this precisely.) In general, the more closely the

scores hug the straight line of best fit, the higher / lower the value of r

[10.2]. When r is 0, the scores scatter as widely about the line as possible,

and when r is _____ (plus or minus), the scores hug the line as closely as possi-

ble, since they all fall exactly on the line [10.2]. One meaning of this prin-

ciple is that prediction of Y from knowledge of X can be made with greater accu-

racy when the correlation is high / low than when it is high / low [10.2].

But in a given set of data, a straight line may or may not reasonably

describe the relationship between the two variables. When a straight line is

appropriate, X and Y are said to be _____ related [10.2]. More form-

ally, the data are said to exhibit the property of linearity of _____

[10.2].
What happens when X and Y are not linearly related? When the correlation

is other than zero and the relationship is nonlinear, as in Figure 10.3, Pearson

r will underestimate / overestimate / correctly estimate the degree of associa-

tion [10.2].

Homoscedasticity

When the variability in Y is the same throughout the values of X, as in

the left side of Figure 10.4, the bivariate distribution is said to exhibit the

property of homoscedasticity. Since r is a function of the degree to which the

points hug the straight line of best fit, the value obtained for it in such a

case has a meaning of general significance: r describes the closeness of associ-

ation of X with Y irrespective of the specific value of _____ (or _____) [10.3].

When homoscedasticity does not obtain, as on the right side of Figure 10.4, r

will reflect the average degree to which the scores hug the line, but this aver-

age will properly characterize the degree of relationship for only some values

of X and Y. Thus the meaning to be attached to r will / will not depend on

whether the hypothesis of homoscedasticity is appropriate to the data [10.3].

Discontinuous Distributions

What will happen if distributions that normally would be continuous have been

rendered discontinuous because of the exclusion of cases with intermediate values

on one variable? A sample constituted in this manner will / will not yield a

correlation coefficient different from what would be obtained in drawing a sample

so that all elements of the population had an opportunity to be selected [10.4].

Usually, discontinuity results in a coefficient higher than / the same as / lower

than otherwise [10.4].

Shape of the Distributions

If the correlation coefficient is to be calculated purely as a descriptive

measure, there is a / no requirement that the distributions be normal in shape

[10.5]. Obtaining the coefficient is often only the first step in analysis,

though. When additional steps are undertaken, the assumption frequently must

be made that X and Y are _____ distributed [10.5].

Random Variation

When a correlation coefficient has been obtained from a particular set of

observations, it represents / does not represent the correlation between the

two variables; another sample will yield the same / a somewhat different value

[10.6]. In general, large / small samples yield values of r that are similar

from sample to sample, and thus the value obtained from a large / small sample

will probably be close to the population value [10.6]. For very large / small

samples, r is quite unstable from sample to sample, and its value can / cannot

be depended upon to lie close to the population value [10.6].



The Circumstances Under Which the Correlation Coefficient Was Obtained

The influence of random sampling variation is the only / but one reason

why the obtained correlation coefficient is not the coefficient between the two

variables under study [10.7]. The degree of association between two variables

depends on (1) how the two variables were measured, (2) who the subjects were,

and (3) under what circumstances the variables operate. If any of these fac¬

tors changes, the extent of the assocation may also change. Consequently, it

is of utmost importance that a correlation coefficient be interpreted in the

light of the particular conditions by which it was obtained.

SYMBOLISM DRILL

     Symbol   Pronunciation   Meaning

 2   N                        _____

 1   n                        _____

 3   X                        _____

 9   x                        X − __ or X − __; _____ score

 4   Σ                        _____

 6   μ                        Σ__/__; the _____ of a _____

 5   X̄                       Σ__/__; the _____ of a _____

14   S                        √(Σ__/__); _____ of a _____

11   σ²                       Σ__/__; _____ of a _____

13   S²                       Σ__/__; _____ of a _____

12   σ                        √(Σ__/__); _____ of a _____

15   z                        __/σ or __/S; _____ score

16   r                        Pearson cor'n coef't for a _____

17   ρ                        Pearson cor'n coef't for a _____
CHAPTER 11

REGRESSION AND PREDICTION


11.1 The Problem of Prediction

__ 11.2 The Criterion of Best Fit

11.3 The Regression Equation: Standard


Score Form

11.4 The Regression Equation: Raw Score


Form

11.5 Error of Prediction: The Standard


Error of Estimate

__ 11.6 An Alternate (and Preferred)

Formula for S_YX

__ 11.7 Error in Estimating Y from X

11.8 Cautions Concerning Estimation of


Predictive Error

PROBLEMS and EXERCISES

1 _____

2 _____

3 _____

4 _____

5 _____

6 _____

7 _____

8 _____

9 _____


SUMMARY

When the correlation coefficient for two variables is zero, knowledge of X

is of some / no help in predicting Y [11.1, Paragraph 1]. When the coefficient

is other than zero, whether positive or negative, knowledge of X permits pre-

diction of Y with better than chance accuracy, and if the coefficient is ±1, we

can predict Y with almost perfect / perfect accuracy [11.1]. The problem of

prediction is thus closely related to the topic of correlation.

In the situations considered in this chapter, a linear (straight-line) rela-

tionship between X and Y is a reasonable assumption, and the correlation coeffic-

ient called r is thus a good measure of the correlation between X and Y. The

chapter shows how to use the value of r and certain other facts to find the

straight line of best fit to the Y values. Such a line is called a _____

line [11.1, Paragraph 4], and it will correspond to an equation called a regress-

ion equation. This line or its equation is used to predict the value of Y for

a given value of X.

The Criterion of Best Fit

How shall we judge which of all possible straight lines is the one that

best fits the values of Y on hand and permits the best prediction of unknown Y

values? The criterion in use was first proposed by Karl Pearson, the person

who invented the correlation coefficient called r. Karl Pearson's solution to

this problem was to apply the _____ criterion [11.2]. The

_____ criterion calls for the straight line to be laid down in

such a manner that the _____ of the squares of the discrepancies between the

actual and the predicted values of Y is as small as possible [11.2]. One impor-

tant property of the least-squares solution is that the location of the regress-

ion line and the value of the correlation coefficient will fluctuate less / more

under the influence of random sampling than would occur if another criterion

were used [11.2].

The Regression Line as a Mean

The regression line is a "running mean," a line that tells us the mean, or

expected value of _____, for a particular value of _____ [11.2, p. 180]. Ȳ is the

mean of all Y values in the set, whereas Y' ("Y prime," Y predicted from the

regression line) is an estimate of the mean of _____ given the condition that

_____ has a particular value [11.2, p. 180].

The Regression Equation

The straight line of best fit to the Y values, which is called the regress-

ion line and is used for predicting unknown Y values from particular values of

X, corresponds to an equation that takes a particular value of X and predicts

a value of Y. Such an equation is called a regression equation; strictly speak-

ing, it is the equation for the regression of _____ on _____ [Equation 11.1 or

11.2]. The equation may be written in terms of standard scores (z scores) or

raw scores.

In terms of standard scores, the equation is very simple:

    z'_Y = _____        [Equation 11.1]

where z'_Y is the predicted standard score value of _____ [11.3]. This form of

the equation makes it easy to see how two important generalizations are true.

First, suppose the value from which prediction was made was the mean of X.

Since the z-score equivalent of the mean is _____, the predicted standard score

value of Y is zero, or in other words, the _____ of Y [11.3]. This prediction

will be the same, irrespective of the value of r / hold only for certain values

of r [11.3]. Second, if r = 0, then the predicted standard score value of Y

will always be _____ [11.3]. In raw score terms, if the correlation is zero,

the predicted value of Y is the _____ of Y no matter what value of X is used

to predict Y [11.3].

In terms of raw scores, the regression equation is complicated:

    Y' = ( _____ ) X − ( _____ ) X̄ + Ȳ        [Equation 11.2]

Note, however, that the expression inside the first pair of parentheses is the

same as that inside the second pair.
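A sketch of raw-score prediction, with the slope written as r(S_Y/S_X); the data and function name below are mine, for illustration only.

```python
import math

def regression_predict(X, Y, x_new):
    """Predict Y' for x_new with the raw-score regression equation:
    Y' = r * (Sy / Sx) * (x_new - Xbar) + Ybar."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in X) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in Y) / n)
    r = sum((a - mx) * (b - my) for a, b in zip(X, Y)) / (n * sx * sy)
    return r * (sy / sx) * (x_new - mx) + my

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
print(regression_predict(X, Y, 3))  # at X = Xbar the prediction is Ybar = 4.0
```

This confirms the standard-score generalization above: predicting from the mean of X always yields the mean of Y, whatever the value of r.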

Measuring Errors of Prediction

A value of Y predicted from a given value of X is only an estimate of the

mean value of Y for persons with the given score on X. If the correlation

between X and Y is low / high, considerable variation of actual values about

the predicted value may be expected [11.5]. If the correlation is low / high,

the actual values will cluster more closely about the predicted value [11.5].

Only when the correlation is _____ will the actual values regularly and

precisely equal the predicted values [11.5].

The magnitude of the errors of prediction is measured by a quantity called

the standard error of estimate of Y on X, which is symbolized _____ [Equation

11.3]. The standard error of estimate is a kind of _____;

it is the _____ of the distribution of obtained Y scores

about the predicted Y score [11.5]. The value of S_YX ranges from _____ when

the correlation is perfect to S_Y when there is no correlation at all [11.5, p.

186]. The simplest formula for S_YX expresses it in terms of r and S_Y:

    S_YX = _____        [Equation 11.4]
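Minium's Equation 11.4 expresses S_YX in terms of r and S_Y; the sketch below uses the standard form S_YX = S_Y·√(1 − r²) (a check on your fill-in answer) and verifies the two endpoints just described.

```python
import math

def std_error_of_estimate(r, sy):
    """Standard error of estimate of Y on X: S_YX = S_Y * sqrt(1 - r^2)."""
    return sy * math.sqrt(1 - r ** 2)

sy = 10.0
print(std_error_of_estimate(1.0, sy))            # 0.0: perfect correlation, no error
print(std_error_of_estimate(0.0, sy))            # 10.0: no correlation, S_YX = S_Y
print(round(std_error_of_estimate(0.5, sy), 2))  # 8.66: r = .50 reduces error only modestly
```

The third line is worth pondering: even a correlation of .50 shrinks the standard error of prediction by less than 14 percent.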

Predicting a Distribution of Y Values

With the help of S_YX one can predict not only the mean score on Y for cases

with a given score on X (which is what Y' is), but also the entire distribution

of Y scores for cases with that score on X. The procedure for doing so is

described in Section 11.7. Correct application of this procedure requires that

several assumptions be satisfied. First, since the regression equation is used

to obtain the predicted value of Y, we must assume that a _____ line

is the line of best fit, or this predicted value may be too high or too low

[11.8]. Second, S_YX is taken as the standard deviation of the distribution of

obtained Y scores about Y', irrespective of the value of _____ from which the

prediction has been made [11.8]. It is therefore necessary to assume that

variability is the same from column to column (the assumption of _____)

[11.8]. Third, the procedure depends on the assumption that the

distribution of obtained Y scores (for a particular value of X) is _____

[11.8].

EXERCISES

To understand what you are doing in using the regression equation to predict
a Y score given an X score, and to check to see if the prediction you are making
is a reasonable one, it is helpful to think: Where is X in relation to its mean
X̄? And where is Y predicted to be in relation to its mean Ȳ? Where Y is pre-
dicted to be in relation to Ȳ depends on a) where X is in relation to X̄, and on
b) whether r is positive, zero, or negative. You can figure out for yourself
exactly how this works by using the standard-score form of the regression equa-
tion (see p. 118).

For example, suppose you want to predict the score on Y for a case whose X
value is above the mean of X. Suppose further that the correlation between X
and Y is positive. The regression equation says that z'_Y = r·z_X, and here r will
be some positive number, while z_X will also be positive, because a raw score
above its mean has a positive z score. The product of two positive numbers is
also positive; thus the equation predicts that this case will have a positive
z score on variable Y. A positive z score indicates a raw score above the mean
of Y. Thus if X > X̄ and if r is positive, Y' > Ȳ.

This generalization is indicated in the upper-left corner of the table below.

Using reasoning like that illustrated above, fill in the other cells of the table,
in each case putting the symbol >, =, or < between the Y' and the Ȳ. Remember:

A raw score above the mean has a positive z score.
A raw score at the mean has a z score of zero.
A raw score below the mean has a negative z score.

The product of two positive numbers is positive.
The product of two negative numbers is positive.
The product of a positive number and a negative number is negative.

            If r is positive   If r is zero   If r is negative

If X > X̄      Y' > Ȳ            Y' __ Ȳ        Y' __ Ȳ

If X = X̄      Y' __ Ȳ           Y' __ Ȳ        Y' __ Ȳ

If X < X̄      Y' __ Ȳ           Y' __ Ȳ        Y' __ Ȳ
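The sign logic behind this table can be made mechanical: in z'_Y = r·z_X, only the signs of r and z_X matter. The small sketch below (function name mine) prints the completed pattern, which you can use to check your entries.

```python
def predicted_side_of_mean(r_sign, x_side):
    """Return '>', '=', or '<' for Y' relative to Ybar, given the sign of r
    and the sign of (X - Xbar): in z'_Y = r * z_X, the product's sign decides."""
    product = r_sign * x_side
    return ">" if product > 0 else ("<" if product < 0 else "=")

# rows: X above / at / below its mean; columns: r positive / zero / negative
for x_side in (+1, 0, -1):
    print([predicted_side_of_mean(r_sign, x_side) for r_sign in (+1, 0, -1)])
# ['>', '=', '<']
# ['=', '=', '=']
# ['<', '=', '>']
```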

SYMBOLISM DRILL

     Symbol   Pronunciation        Meaning

 1   _____                         Number of scores in a sample

 2   _____                         Number of scores in a population

 3   _____                         A raw score, or the set of raw scores

 4   _____                         Result of summing quantities of some kind

 5   _____                         ΣX/n; the mean of a _____

 6   _____                         ΣX/N; the mean of a _____

 9   _____                         X − X̄ or X − μ; deviation score

11   _____                         Σx²/N; _____ of a _____

12   _____                         √(Σx²/N); _____ of a _____

13   _____                         Σx²/n; _____ of a _____

14   _____                         √(Σx²/n); _____ of a _____

15   _____                         x/σ or x/S; _____ score

16   _____                         Pearson correlation coefficient for a sample

17   _____                         Pearson correlation coefficient for a pop'n

18   _____    "wi prime"           Predicted raw score on Y

19   _____    "zee prime sub wi"   Predicted z score on Y

20   _____    "es sub wi eks"      Standard error of estimate of Y on X


CHAPTER 12

INTERPRETIVE ASPECTS
OF CORRELATION AND REGRESSION

12.1 Factors Influencing r: Range of Talent
12.2 Factors Influencing r: Heterogeneity of Samples
12.3 Interpretation of r: The Regression Equation (I)
12.4 Interpretation of r: The Regression Equation (II)
12.5 Regression Problems in Research
12.6 An Apparent Paradox in Regression
12.7 Interpretation of r: k, the Coefficient of Alienation
12.8 Interpretation of r: r², the Coefficient of Determination
12.9 Interpretation of r: Proportion of Correct Placements
12.10 Association and Prediction: Summary

PROBLEMS and EXERCISES

 3 ____    4 ____
 5 ____    6 ____
 7 ____    8 ____
 9 ____   10 ____
11 ____   12 ____
13 ____   14 ____
15 ____   16 ____
17 ____   18 ____
19 ____   20 ____
21 ____


SUMMARY

Chapter 12 offers a wealth of details that provide important insights into some subtle aspects of correlation and prediction.

Range of Talent

The variability among the scores on a given variable, as measured by the range or the standard deviation of the distribution, affects the value of r when r is used as an indicator of the correlation between this variable and any other. In a given situation, restriction of range may take place in X, in Y, or in both. The value of r will be smaller / larger in those situations in which the range of either X or Y (or both) is less, other things being equal [12.1]. This means that there is no such thing as the correlation between two variables, and that the value obtained must be interpreted in the light of the ________ of the two variables in the circumstances in which it was obtained [12.1]. Other things being equal, the greater the restriction of range in X and/or Y, the higher / lower the correlation coefficient [12.1].
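The effect of restriction of range can be seen in a small simulation. Everything below is invented for illustration (the 0.7 slope, the noise level, and the cutoff are arbitrary choices); r is computed from its defining formula:

```python
# Restricting the range of X lowers r, other things being equal.
import random

random.seed(1)

def pearson_r(pairs):
    """Pearson r from its deviation-score definition."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

# A moderately strong linear relation plus noise.
data = [(x, 0.7 * x + random.gauss(0, 0.7))
        for x in (random.gauss(0, 1) for _ in range(2000))]

r_full = pearson_r(data)
r_restricted = pearson_r([(x, y) for x, y in data if x > 0.5])  # keep only high X
print(r_full, r_restricted)   # the restricted-range r comes out noticeably smaller
```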

Heterogeneity of Samples

On the left side of Figure 12.2 (p. 200) is a scatter diagram showing the correlation between a pair of variables in two different samples. Within each sample, the correlation is positive and moderately strong. If r is computed for the combined data, however, it will be smaller (though still positive). Why? When the data are pooled, the scores no longer hug the ________ line (which now must be a line lying amidst the two distributions) as closely as they do in either distribution considered by itself [12.2].

In the particular case illustrated, the distributions differed, between samples, in the mean of ____ but not in the mean of ____ [12.2]. Other types of differences are quite possible. The second illustration in Figure 12.2 shows a situation in which a second sample differs in that both the mean of X and the mean of Y are higher than in the first sample. In this case, the correlation will be greater / smaller among the pooled data than among the separate samples [12.2].
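Both cases in Figure 12.2 can be imitated with made-up samples. The numbers below are hypothetical, chosen only so that each sample by itself shows a moderate positive r:

```python
# Pooling two samples can lower or raise r, depending on how their means differ.
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(2)
x1 = [random.gauss(0, 1) for _ in range(500)]
y1 = [0.6 * x + random.gauss(0, 0.8) for x in x1]   # sample 1: moderate positive r
r_within = pearson_r(x1, y1)

# First case: the second sample differs only in the mean of X.
x2 = [x + 4 for x in x1]
r_pooled_1 = pearson_r(x1 + x2, y1 + y1)            # pooled r drops

# Second case: the second sample is higher in BOTH means.
y2 = [y + 4 for y in y1]
r_pooled_2 = pearson_r(x1 + x2, y1 + y2)            # pooled r rises

print(r_within, r_pooled_1, r_pooled_2)
```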

The Role of r in the Regression Equation

The role of r in the regression equation is easiest to understand when the regression equation is cast in standard score form. In this case, the correlation coefficient is the ________ of the regression line [12.3]. One interpretation of the correlation coefficient, therefore, is that it states the amount of increase in ____ that accompanies unit increase in ____ when both measures are expressed in standard score units [12.3]. It indicates how much of a ________ ________ Y will increase for an increase of one standard deviation in X [12.3].

When the regression equation for predicting Y from X is stated in raw score form, its slope is r( ____ / ____ ) rather than r [12.3]. This expression is known as the regression ________ [12.3]. It states the amount of increase in ____ that accompanies unit increase in ____ when both measures are expressed in raw score terms [12.3].

Regression on the Mean

The correlation between intelligence of parents and intelligence of offspring has a value close to +.50. If parental intelligence is two standard deviations above the mean, the predicted intelligence of their offspring is ____ standard deviation above the mean [12.4]. On the other hand, if parents' intelligence is two standard deviations below the mean, the predicted intelligence of their offspring is only one standard deviation above / below the mean [12.4]. To put it in other words, bright parents will tend to have children who are brighter / duller than average, but not as bright as they, and dull parents will tend to have children who are dull / bright, but not as dull as their parents [12.4]. Remember that the predicted value is to be thought of as an average value.

This phenomenon is called regression on the mean. The fact of regression on the mean is, of course, characteristic of any relationship in which the correlation is less than perfect / zero [12.4]. The more extreme the value from which prediction is made, the greater / lesser the amount of regression toward the mean [12.4]. The higher the value of r, the less / greater the amount of regression [12.4].
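The arithmetic behind the parent-offspring example is just the standard-score regression equation with r = +.50 (a quick sketch, not from the text):

```python
# Regression on the mean: the predicted z score is always closer to zero
# than zX whenever the correlation is less than perfect.
def predicted_z(r, z_x):
    """Standard-score form of the regression equation: z'Y = r * zX."""
    return r * z_x

r = 0.50                          # approximate parent-offspring correlation
print(predicted_z(r, +2.0))       # parents 2 SDs above the mean -> prediction +1 SD
print(predicted_z(r, -2.0))       # parents 2 SDs below the mean -> prediction -1 SD
print(predicted_z(r, +3.0))       # more extreme parents -> more absolute regression
```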

Regression on the mean is frequently a problem in research. When subjects are selected because of their extreme position (either high or low) on one variable, we expect their position on a correlated variable to be in the same / opposite direction, but more / less extreme [12.5]. The two variables could be test and retest on the same measure (as in the excellent example offered in the first paragraph of Section 12.5), or they could be two different variables. Again, it should be noted that the amount of regression will depend on the size / direction of the correlation between the two variables [12.5].

If parents with extreme characteristics tend to have offspring with characteristics less extreme than themselves, how is it that, after a few generations, we do not find everybody at the center? The answer is that regression of predicted values toward the mean is accompanied by variation of obtained values about the predicted values, and the greater the degree of regression toward the mean, the greater / lesser the amount of variation [12.6]. Specifically, Y', the predicted value of Y, is only the predicted ________ of Y for those who obtain a particular score in X. The obtained Y values corresponding to that value of X will be distributed about Y' with a standard deviation equal to ________ [12.6]. The lower the value of the correlation coefficient, the greater the value of ________, so that the greater the degree of regression on the mean, the greater / lesser the variation of obtained Y values about their predicted values [12.6].

The Coefficient of Alienation

The measure of variability of scores about the regression line, or as we might call it, the measure of error of prediction, is given by the standard error of estimate, SYX. The maximum possible error of prediction occurs when r = ____, in which case SYX = ____ [12.7]. The ratio of SYX to ____ gives the proportion of the maximum possible predictive error that characterizes the present predictive circumstances [12.7], and this ratio turns out to equal √(1 - r²). This quantity is symbolized by the letter ____, and it is called the coefficient of ________ [12.7].

When the value of k is close to unity (its maximum value), the magnitude of predictive error is close to its maximum / minimum [12.7]. On the other hand, when the value of k is close to zero, most / little of the possible error of prediction has been eliminated [12.7]. The first two columns of Table 12.1 on p. 208 show how k changes as r changes and make it clear that a given change in the magnitude of a correlation coefficient has greater consequences when the correlation is high / low than when it is high / low [12.7].
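The first two columns of Table 12.1 can be regenerated from the relation k = √(1 - r²); the particular r values below are arbitrary:

```python
# The coefficient of alienation: the proportion of the maximum possible
# predictive error that remains at a given r.
import math

def coefficient_of_alienation(r):
    return math.sqrt(1 - r ** 2)

for r in (0.00, 0.25, 0.50, 0.75, 0.90, 1.00):
    print(f"r = {r:.2f}   k = {coefficient_of_alienation(r):.3f}")
# k falls slowly at first: even r = .50 still leaves about 87% of the
# maximum possible error, which is why moderate r's are sobering.
```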



The Coefficient of Determination

In a bivariate distribution, total variation in Y may be thought of as having two component parts: variation in Y that is associated with or attributable to changes in ____, and variation in Y that is inherent in Y, and hence independent of changes in ____ [12.8]. r² gives the proportion of Y variance that is associated with changes in ____, and it is called the coefficient of ________ [12.8]. The middle column of Table 12.1 on p. 208 shows that the proportion of Y variance accounted for by variation in ____ increases more slowly / rapidly than does the magnitude of the correlation coefficient [12.8].
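That r² equals the proportion of Y variance attributable to X can be checked numerically on invented data: the variance of the predicted values Y', taken as a share of the total Y variance, reproduces r² exactly:

```python
# Numerical check that r^2 is the proportion of Y variance associated with X.
import random

random.seed(3)
xs = [random.gauss(0, 1) for _ in range(5000)]
ys = [0.8 * x + random.gauss(0, 0.6) for x in xs]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
r = sxy / (sxx * syy) ** 0.5

# Variation in Y attributable to X: the variance of the predicted values Y'.
slope = sxy / sxx
preds = [my + slope * (x - mx) for x in xs]
var_pred = sum((p - my) ** 2 for p in preds) / n
var_y = syy / n
print(r ** 2, var_pred / var_y)   # these two quantities agree
```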

Proportion of Correct Placements

The interpretation of the meaning of r according to k or r² is strongly / certainly not very encouraging as to the value of r's of moderate magnitude [12.9]. A somewhat more cheerful outlook may be had by considering the proportion of correct placements that occur when the regression equation is used to predict "success." Assume a normal bivariate distribution, that success is defined as scoring above the median on the criterion variable (Y), and that those who are selected as potentially successful are those who score above the median on the predictor variable (X). The last two columns of Table 12.1 on p. 208 present, for selected values of r, the proportion of correct placements in excess of chance and the proportion of improvement in correct placements relative to the chance proportion of .50.
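For a bivariate normal distribution, the proportion of correct placements under this scheme has a closed form, 1/2 + arcsin(r)/π. That quadrant-probability formula is a standard result assumed here (it is not given in the workbook), but it reproduces the flavor of the table's last columns:

```python
# Proportion of correct placements when "selected" means above the median on X
# and "success" means above the median on Y, for a bivariate normal with correlation r.
import math

def proportion_correct_placements(r):
    return 0.5 + math.asin(r) / math.pi

for r in (0.00, 0.25, 0.50, 0.75, 1.00):
    p = proportion_correct_placements(r)
    print(f"r = {r:.2f}   correct = {p:.3f}   in excess of chance = {p - 0.5:.3f}")
# Even r = .50 places two-thirds of the cases correctly -- a more cheerful
# reading of a moderate r than k or r^2 suggests.
```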

SYMBOLISM DRILL

Symbol    Pronunciation         Meaning

 3  X                           ________
 4  Σ                           ________
 6  μ                           ΣX/N; the ________ of a ________
 5  X̄                           Σx/n; the ________ of a ________
 9  x                           X - ____ or x - ____; ________ score
11  σ²                          Σ___/___; ________ of a ________
12  σ                           √(Σ___/___); ________ of a ________
13  S²                          Σ___/___; ________ of a ________
14  S                           √(Σ___/___); ________ of a ________
15  z                           ___/σ or ___/S; ________ score
16  r                           Pearson cor'l'n coef't for a ________
17  ρ                           Pearson cor'l'n coef't for a ________
18  Y'    "wi prime"            Predicted ________ score on ____
19  z'Y   "zee prime sub wi"    Predicted ________ score on ____
20  SYX   "es sub wi eks"       ________
CHAPTER 13

THE BASIS OF STATISTICAL INFERENCE

13.1 Introduction: The Plan of Study
13.2 A Problem in Inference: Testing Hypotheses
13.3 A Problem in Inference: Estimation
13.4 Basic Issues in Inference
13.5 Probability and Random Sampling: An Introduction
13.6 The Random Sampling Distribution of Means: An Introduction
13.7 Characteristics of the Random Sampling Distribution of Means
13.8 The Sampling Distribution as Probability Distribution
13.9 The Fundamentals of Inference and the Random Sampling Distribution of Means: A Review
13.10 Putting the Sampling Distribution of Means to Use

PROBLEMS and EXERCISES

 7 ____    8 ____
 9 ____   10 ____

SUMMARY

The Two Types of Inferential Procedures

A basic aim of statistical inference is to form a conclusion about a characteristic of a ________ from study of a ________ taken from that ________ [13.2]. That characteristic may be a proportion, a mean, a median, a standard deviation, a correlation coefficient, or any other of a number of statistical ________ [13.3]. Inference may also be concerned with the difference between ________ with regard to a given parameter [13.3]. There are two types of inferential procedures: ________ testing and ________ [13.2].

In ________ testing, we have a particular value in mind; we hypothesize that the value we have in mind characterizes the population / sample of observations [13.3]. To evaluate a hypothesis, we will ask what type of sample results one would expect to obtain if the hypothesis were correct / incorrect [13.2, next to last paragraph on p. 218]. If the sample outcome is not in accord with what one would expect, we will accept / reject the hypothesis [13.2].

In estimation, on the other hand, no particular sample / population value need be stated. Rather, the question is, what is the sample / population value [13.3]? To answer the question, a sample is drawn and studied and an inference is made about the parameter of interest.

Random Sampling Distributions

The fundamental fact of sampling is that the value of the characteristic we are studying will vary / stay the same from sample to sample [13.4]. The key to any problem in statistical inference is to discover what sample values will occur in repeated sampling, and with what relative frequency. A distribution composed of values (such as the mean) characterizing samples of some one particular size drawn repeatedly from the same population is known as a sampling distribution. We must be able to describe it completely if we are to say what would happen when samples of that size are drawn from that population.

There is just one basic method of sampling that permits sampling distributions to be known. This is to draw probability samples--samples for which the probability of inclusion in the sample of each ________ of the population is known [13.4]. One kind of probability sample is the random sample, which is a sample so drawn that each possible ________ of that size has an equal probability of being selected [13.5].

It is the method of selection, and not the particular sample outcome, that defines a sample as random. If we were given a population and a sample from that population, it would be possible / impossible to say whether the sample was random without knowing the method by which it was selected [13.5].

One important characteristic of a random sample is that every ________ in the population has an equal probability of inclusion in the sample [13.5].
in the population has an equal probability of inclusion in the sample [13.5].

The Random Sampling Distribution of Means

Although inference could be about any one of a number of characteristics of


a population (parameters), this chapter and the next several chapters focus on
inference about means, and thus the random sampling distribution of means is of
special concern. There is one such distribution for every population and every
sample size. Such a distribution can be hypothetically generated as follows:
From the population of interest, draw a random sample of a given size, and find
the mean of the sample. Replace the sample, "stir well," and draw another random
sample of the same size, now finding the mean of this one. Replace this sample,
and continue this procedure indefinitely. The infinitely large collection of
means of samples of the given size drawn at random from the population of interest
is the random sampling distribution of means for this case.*

There is not just one random sampling distribution corresponding to a given population, but a family of such distributions, one for each possible sample ________ [13.7, p. 224].

A random sampling distribution of means, like any other distribution, is completely defined by specifying its mean, standard deviation, and shape. The mean of any random sampling distribution of means is the same as the mean of the ________ of scores [13.7]. The symbol for the mean of a random sampling distribution of means is ________ [Formula 13.1], which is read "mew sub eks bar."

*Actually, there are two random sampling distributions for each such case: one for sampling with replacement and one for sampling without replacement. This is a nuance hinted at in the second footnote on p. 223 and treated fully in Section 14.5.

The standard deviation of a random sampling distribution of means is called the standard ________ of the mean and is symbolized ________ [Formula 13.2], which is read "sigma sub eks bar." It is computed as follows:

    ________ = ________   [Formula 13.2]

The formula shows that (a) means vary more / less than scores do (when sample size is at least two), (b) means vary more / less where scores vary less, and (c) means vary more / less when sample size is greater [13.7].
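A small sketch of those three facts, using the σ/√n form of the standard error (the form given in the symbolism drill at the end of the chapter):

```python
# sigma_xbar = sigma / sqrt(n): how the standard error of the mean behaves.
import math

def standard_error(sigma, n):
    return sigma / math.sqrt(n)

sigma = 10.0
print(standard_error(sigma, 4))     # 5.0  -- means vary less than scores (n >= 2)
print(standard_error(5.0, 4))       # 2.5  -- means vary less where scores vary less
print(standard_error(sigma, 100))   # 1.0  -- means vary less when n is greater
```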

With regard to the shape of the distribution, if the population of scores is normally distributed, the sampling distribution of means will also / not be normally distributed [13.7]. If the population is not normally distributed, the Central Limit Theorem informs us that the random sampling distribution of means tends toward a normal distribution irrespective of the shape of the population of observations sampled and that the approximation to the normal distribution becomes increasingly close with increase / decrease in sample size [13.7]. Thus even when the population of scores differs substantially from a normal distribution, the sampling distribution of means may be treated as though it were normally distributed when sample size is reasonably small / large [13.7].

A random sampling distribution of means may be used to tell what proportion of sample means would fall between certain limits, or alternatively, to give the probability of drawing such a mean in random sampling, as in the examples worked in Sections 13.8 and 13.10.

[If you feel a need for a summary of the material on probability, look ahead
to the first two paragraphs on p. 114 of this workbook.]

MAP of RANDOM SAMPLING DISTRIBUTION of MEANS

A Random Sampling Distribution of Means

consists of:
- a certain mean, symbolized μX̄, equal to μ
- a certain standard deviation, symbolized σX̄, called the standard error of the mean, equal to σ/√n
- a certain shape, not necessarily normal, that is
  - normal if the population is normal in shape
  - an approximation to the normal one, even if the population is not normal, with a larger n bringing a closer approximation (as described by the Central Limit Theorem)

and can be used to compute:
- the probability that a single sample of size n will have a mean that falls between certain limits

EXERCISES

To gain some insight into the random sampling distribution of means, turn
to p. 547 of the text. There are 2500 single digits on this page, which we can
take to be a population. What are the characteristics of this population?

1. We are assured that the digits were chosen at random, so each of the ten
possible values (1 through 9 plus 0) occurs about 1/10 of the time in this pop¬
ulation. Let's assume the figure is exactly 1/10 for each digit; the assumption
can't be far wrong. (You're welcome to make a frequency distribution to check
the assumption; it shouldn't take more than a week.) On this assumption, then,
each value has a frequency of 1/10 of 2500, or 250, and the shape of the dis¬
tribution is thus rectangular.

2. The mean of the population is the same as the mean of a distribution consisting of just one occurrence of each of those ten digits. To see that this is true, think of the scores as weights on a plank. The balance point for the full population, which would appear as 10 stacks of weights each 250 weights high, would be the same as the balance point for the bottom layer of those stacks. So the mean works out to be (you figure it out):

3. The standard deviation of the population can be calculated using the same logic used in figuring the mean: that for the whole distribution is the same as that for a distribution consisting of one occurrence each of those ten values. (This logic works only for a rectangular distribution, please note.) So the standard deviation works out as:

Now, let's approximate the random sampling distribution of means for the
case in which samples of size two are drawn from this population. To draw a
first sample of size two, we should pick two digits in such a way that all
samples of size two have the same chance of occurrence. That's tough to do.
For present purposes, it will be sufficient to close your eyes and put the point
of a pencil down somewhere in the table. Take the digit closest to the point
as the first element of the sample, and take the digit to the right of this one
(or to the left of it, or above it, or below it—whatever you want) as the second element. Record the mean of these two numbers on the next page. Now repeat this procedure to get another sample of size two, and continue in this way until you have 20 samples. (To generate the real sampling distribution, you should continue forever, but then you'd never finish this course.) If you get tired of closing your eyes each time, just read off pairs of digits starting any old place on the page; that'll be good enough.

Sample   Mean of    Sample   Mean of    Sample   Mean of    Sample   Mean of
Number   Sample     Number   Sample     Number   Sample     Number   Sample

  1 ____    6 ____   11 ____   16 ____
  2 ____    7 ____   12 ____   17 ____
  3 ____    8 ____   13 ____   18 ____
  4 ____    9 ____   14 ____   19 ____
  5 ____   10 ____   15 ____   20 ____

Putting these 20 means together now will get you a rough approximation of
the random sampling distribution of the mean for samples of size two selected
from that population of 2500 digits.

1. In the space below, make at least a rough histogram of your distribution of 20 sample means. You should find that the shape is not rectangular like that of the population, but something more bell-shaped, as the Central Limit Theorem says. The width of your class intervals should be small, by the way, maybe half a point. Convenient class intervals are -0.25 to +0.25, 0.25 to 0.75, 0.75 to 1.25, 1.25 to 1.75, and so on.

2. Calculate the mean of your distribution of 20 sample means. It should fall fairly close to the mean of the population. If you had the real sampling distribution, its mean would be exactly the same as the mean of the population.

3. Calculate the standard deviation of the 20 sample means. This is an approximation to the standard deviation of the entire sampling distribution, which is called its standard error and which would equal σ/√n where n = 2. Calculate the standard error too. Note that the standard deviation of your 20 means is (almost certainly) less than the standard deviation of the raw scores in the population.

And now, to gain even more insight, draw 20 samples all of some size greater
than two. Ten is a convenient size, because the division involved in finding
the mean of a sample is then simple. Again record the mean of each sample.

Sample   Mean of    Sample   Mean of    Sample   Mean of    Sample   Mean of
Number   Sample     Number   Sample     Number   Sample     Number   Sample

  1 ____    6 ____   11 ____   16 ____
  2 ____    7 ____   12 ____   17 ____
  3 ____    8 ____   13 ____   18 ____
  4 ____    9 ____   14 ____   19 ____
  5 ____   10 ____   15 ____   20 ____

Again make at least a rough histogram to get the shape of the distribution of these 20 sample means. It should be even closer to normal than the shape of the distribution for the case in which sample size was only two. Also calculate the mean of your collection of means and the standard deviation. The latter should be less than the standard deviation of the raw scores in the population again, and less than the standard deviation of the distribution for the case in which sample size was two. Finally, calculate the theoretical standard error of the mean for samples of whatever size you used this second time.
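The whole exercise can also be run in code. The sketch below substitutes Python's random-number generator for the digit table on p. 547 and, as a simplifying assumption, samples with replacement:

```python
# Approximate the random sampling distribution of means for the digit population.
import math
import random

random.seed(4)
population = list(range(10)) * 250          # 2500 digits, each value appearing 250 times
N = len(population)
mu = sum(population) / N                    # population mean
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

def mean_of_random_sample(n):
    return sum(random.choices(population, k=n)) / n

def sd(values):
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

means_2 = [mean_of_random_sample(2) for _ in range(10000)]
means_10 = [mean_of_random_sample(10) for _ in range(10000)]

print(mu, sigma)                             # 4.5 and about 2.872
print(sd(means_2), sigma / math.sqrt(2))     # observed vs theoretical standard error, n = 2
print(sd(means_10), sigma / math.sqrt(10))   # smaller still for n = 10
```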

SYMBOLISM DRILL

Symbol    Pronunciation            Meaning

 1  ____                           Number of scores in a sample
 2  ____                           Number of scores in a population
 3  ____                           A raw score, or the set of raw scores
 4  ____                           Result of summing quantities of some kind
 6  ____                           ΣX/N; the mean of a ________
 5  ____                           Σx/n; the mean of a ________
 9  ____                           X - X̄ or x - μ; deviation score
12  ____                           √(Σx²/N); ________ of a ________
11  ____                           Σx²/N; ________ of a ________
13  ____                           Σx²/n; ________ of a ________
14  ____                           √(Σx²/n); ________ of a ________
15  ____                           x/σ or x/S; ________ score
17  ____                           Pearson correlation coefficient for a pop'n
16  ____                           Pearson correlation coefficient for a sample
18  ____                           Predicted raw score on Y
19  ____                           Predicted z score on Y
20  ____                           Standard error of estimate of Y on X
21  ____  "mew sub eks bar"        Mean of sampling distribution of means
22  ____  "sigma sub eks bar"      Standard error of the mean; σ/√n


CHAPTER 14

THE BASIS OF STATISTICAL INFERENCE:
FURTHER CONSIDERATIONS

14.1 Introduction
14.2 Another Way of Looking at Probability
14.3 Two Theorems in Probability
14.4 More About Random Sampling
14.5 Two Sampling Plans
14.6 The Random Sampling Distribution of Means: An Alternative Approach
14.7 Using a Table of Random Numbers

PROBLEMS and EXERCISES

 1 ____
 2 ____
 3 ____
 4 ____
10 ____


SUMMARY

Probability, Empirical and Theoretical

Questions of probability arise when a repeatable event occurs and gives rise to one of two or more possible outcomes. Examples are flipping a coin, which gives rise to the outcome "heads" or the outcome "tails," and drawing a card from a deck of playing cards, which gives rise to one of 52 possible outcomes. The occurrence of such an event is called a trial. What now do we mean when we speak of the probability of an outcome of some kind turning up on a given trial? The previous chapter defined the probability of an outcome in terms of what happens when trials are repeated over and over again indefinitely: The probability of the occurrence of A on a single trial is the ________ of trials characterized by A in an infinite series of trials, when each trial is conducted in a like manner [13.5, p. 221].

It is impossible to put this definition into practice, of course, because an infinite series of trials can never be obtained. The best we can do is to repeat the event of interest some finite number of times and compute the proportion of trials characterized by A. This gives us what the present chapter calls an empirical probability. An empirical probability is but an estimate of the true value, and confidence is to be placed in it according to the ________ of observations (trials) on which it is based [14.2, next-to-last paragraph].

If we know that the several possible outcomes of an event are equally likely, we don't have to fuss with empirical probabilities. Instead we can find a probability of interest by using the following theorem:

    Given a population of possible outcomes, each of which is equally likely to occur, the probability of occurrence on a single trial of an outcome characterized by A is equal to the ________ of outcomes yielding A, divided by the total number of ________ ________ [14.2].

In a roll of a fair die, for example, the six different faces are equally likely to turn up. Thus the probability of rolling a face with an odd number of spots is 3/6, or 1/2, because 3 faces yield this characteristic (the faces with 1, 3, and 5 spots), and there are a total of 6 faces. A probability like this computed from the theorem stated above is called a theoretical probability.



The Probability of This OR That

Questions about probability sometimes take the form: what is the probability of this OR that happening when some event occurs? Such a question can be readily answered by using the addition theorem of probability, but only if the outcomes of interest (the "this" and the "that") are mutually exclusive. Outcomes are mutually exclusive when the occurrence of one ________ the possibility of the occurrence of any of the others [14.3]. Another way to say this is that two outcomes are mutually exclusive if they cannot occur on the same trial. In drawing a card from a deck of playing cards, for example, the outcomes King and Queen are mutually exclusive, because no card can be both a King and a Queen. But the outcomes King and Club are not mutually exclusive, because a card can be both a King and a Club. According to the addition theorem:

    The probability of occurrence of any one of several particular outcomes is the sum / product of their individual probabilities, provided that they are ________ [14.3].

The Probability of This AND That

Other questions about probability take the form: what is the probability of this AND that happening? Such a question can be readily answered by using the multiplication theorem of probability, but only if the event that might generate the "this" and the event that might generate the "that" are independent. Independence of events means that the outcome of one event must have some / no influence on and in some / no way be related to the outcome of the other event [14.3]. According to the multiplication theorem:

    The probability of several particular outcomes occurring jointly is the sum / product of their separate probabilities, provided that the events that generate these outcomes are ________ [14.3].

The multiplication theorem applies only to situations where two or more ________ are considered together, as in the tossing of two coins or the result of tossing one coin twice [14.3].
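Both theorems can be illustrated with exact fractions. The card and dice examples are the ones used above; the King-or-Club line uses the general addition rule (subtracting the overlap), which the theorem as stated does not cover:

```python
# Addition theorem (mutually exclusive) and multiplication theorem (independent).
from fractions import Fraction

# P(King OR Queen) on one draw: the outcomes are mutually exclusive, so add.
p_king_or_queen = Fraction(4, 52) + Fraction(4, 52)                     # 2/13

# King and Club are NOT mutually exclusive; simple addition would double-count
# the King of Clubs, so the overlap must be subtracted.
p_king_or_club = Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)   # 4/13

# P(odd AND odd) when two fair dice are tossed: independent events, so multiply.
p_both_odd = Fraction(3, 6) * Fraction(3, 6)                            # 1/4

print(p_king_or_queen, p_king_or_club, p_both_odd)
```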

Random Sampling

In the previous chapter, a random sample was defined as a sample so drawn that each possible ________ of that size has an equal probability of selection [14.4]. If a sample is drawn in this way, it is necessarily true that every ________ in the population has an equal chance of being selected [14.4]. The reverse is / is not true; giving equal probability to the elements does / does not necessarily result in equal probability for samples [14.4].

Sampling With and Without Replacement

Although there is only one way to define a random sample, there are two sampling plans that yield a random sample. One plan is sampling without replacement. The characteristic of this method is that an / no element may appear more than once in a sample [14.5]. The other plan is sampling with replacement. Under this plan it is possible / impossible to draw a sample in which the same element appears more than once [14.5]. Both of these plans can satisfy the condition of random sampling, but certain sample outcomes possible when sampling with / without replacement are not possible under the other method [14.5].

There are three characteristics of a sampling distribution that define it completely: mean, standard deviation, and shape. The first and last of these are affected / unaffected by choice of sampling plan [14.5]. The standard deviation of the sampling distribution is smaller / larger when sampling without replacement [14.5]. The formula given in Chapter 13 for the standard deviation (for the "standard error of the mean") is strictly correct only if sampling is with replacement. Despite the fact that most sampling in behavioral science is done without replacement, we typically use the Chapter-13 formula for the standard error of the mean. No / Considerable harm is done in the usual case, where the sample may be thought to be substantially smaller than 5% of the population [14.5].
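A sketch of the difference between the two plans. The finite-population correction factor sqrt((N - n)/(N - 1)) is an assumption here (the workbook only says the without-replacement error is smaller), but it is the standard adjustment:

```python
# Standard error of the mean under the two sampling plans.
import math

def se_with_replacement(sigma, n):
    return sigma / math.sqrt(n)

def se_without_replacement(sigma, n, N):
    # Finite-population correction: shrinks toward 0 as n approaches N.
    return se_with_replacement(sigma, n) * math.sqrt((N - n) / (N - 1))

sigma, n = 15.0, 50
print(se_with_replacement(sigma, n))
print(se_without_replacement(sigma, n, 100000))  # n far under 5% of N: nearly identical
print(se_without_replacement(sigma, n, 200))     # n is 25% of N: clearly smaller
```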

Another Way to Conceptualize the Random Sampling Distribution of Means

In the previous chapter, the random sampling distribution of means was conceived as the result of a(n) large / infinite series of sampling trials [14.6]. Another view is possible: the random sampling distribution of means is the relative frequency distribution of means obtained from n / all possible samples of a given size that could be formed from a given population [14.6]. This definition holds whether sampling is done with or without ________ [14.6].

When the sampling distribution is defined in this way, it is possible to generate an entire such distribution for a population of finite size. An example appears on pp. 240-243. Among the important insights this example offers is the point that random sampling results in equal probability of occurrence of any possible sample / sample mean, not in equal probability of occurrence of any possible sample / sample mean [14.6].

SYMBOLISM DRILL

Symbol    Pronunciation    Meaning

 4  Σ                      ________
 5  ____                   Σx/n; the ________ of a ________
 6  ____                   ΣX/N; the ________ of a ________
 9  x                      ____ - ____ or ____ - ____; ________ score
12  σ                      √(Σ___/___); ________ of a ________
11  σ²                     Σ___/___; ________ of a ________
14  S                      √(Σ___/___); ________ of a ________
13  S²                     Σ___/___; ________ of a ________
15  z                      ___/σ or ___/S; ________ score
16  r                      ________
17  ρ                      ________
18  Y'                     ________
19  z'Y                    ________
20  SYX                    ________
21  μX̄                     ________
22  ____                   Standard ________ of the ________; σ/√n
CHAPTER 15

TESTING HYPOTHESES ABOUT SINGLE MEANS:
NORMAL CURVE MODEL

15.1 Introduction
15.2 Testing an Hypothesis about a Single Mean
15.3 Generality of the Procedure for Hypothesis Testing
15.4 Estimating the Standard Error of the Mean When σ Is Unknown
15.5 Captain Baker's Problem: A Test about μ
15.6 Captain Baker's Problem: Conclusion
15.7 Directional and Nondirectional Alternative Hypotheses
15.8 Summary of Steps in Testing an Hypothesis about a Mean
15.9 Reading Research Reports in Behavioral Science
15.10 Review of Assumptions in Inference about Single Means
15.11 Problems in Selecting a Random Sample and in Drawing Conclusions

PROBLEMS and EXERCISES

 1 ____    2 ____    3 ____
 4 ____    5 ____    6 ____
 7 ____    8 ____    9 ____
10 ____   11 ____   12 ____
13 ____   14 ____


SUMMARY

This chapter introduces the logic of hypothesis testing and the procedure for testing a hypothesis about the mean of a single population. The procedure, strictly speaking, requires knowledge of the standard deviation of the population, but this will rarely be known, so it must usually be estimated from the sample on hand. Naturally, substituting an estimate for the real thing introduces some error, but the larger the sample the smaller / larger the error [15.1]. So, if sample size is large enough (n greater than 40 or so), the error will be small enough so that the procedure described here is satisfactory. For this reason, the procedure is sometimes known as the large-sample method. It takes the normal curve as a model for a certain sampling distribution.

Estimating the Population's Standard Deviation and the Standard Error of the Mean

To test hypotheses about means, it will be necessary to calculate the standard error of the mean. This is symbolized ______ and is equal to ______ [15.4, unnumbered formula]. This formula requires knowledge of σ, the ________ ________ of the population / sample, but in actual practice σ will be unknown, and it must be estimated from the sample. One would think that ______, the sample standard deviation, would be the proper estimate, but it proves to be (on the average) a bit too small / large [15.4].

A statistic called s ("little es") provides a better estimate of σ. Its formula is:

    s = ______ [Formula 15.1]

The defining formula for S (which should now be read "big es") is ______, so it is clear that s differs only in that the divisor is ______ rather than n [15.4]. The change in the divisor makes s a bit larger / smaller than S [15.4].

[It is now highly important to distinguish "big S" from "little s." One way to do it in writing is to print the capital letter and print it large, while making the small letter small and of the script variety.]

Substituting s for σ in the formula for the standard error of the mean yields the working formula for estimating the standard error. The estimate of the standard error is symbolized s_X̄, to distinguish it from σ_X̄. The formula for the estimate is:

    s_X̄ = ______ [Formula 15.2]

Substituting s for σ in making this estimate takes care of bias that would be introduced if S had been used, but different samples will always yield the same estimate / still yield different estimates, and so the constant / variable error introduced by substituting an estimate for the true value remains [15.4]. Procedures described here make / do not make allowance for this error, so we must remember to use them only when samples are large / small enough [15.4].
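[As a quick illustration of these two formulas, the sketch below computes s with the divisor n − 1 and then the estimated standard error s_X̄ = s/√n. The sample values are hypothetical, not from the text.]

```python
from math import sqrt

# Hypothetical sample of n = 8 scores (not from the text).
scores = [31, 28, 35, 30, 26, 33, 29, 32]
n = len(scores)
xbar = sum(scores) / n  # sample mean = 30.5

# s estimates the population standard deviation sigma (Formula 15.1):
# note the divisor n - 1 rather than n.
s = sqrt(sum((x - xbar) ** 2 for x in scores) / (n - 1))

# s_xbar estimates the standard error of the mean (Formula 15.2).
s_xbar = s / sqrt(n)

print(round(s, 3), round(s_xbar, 3))  # prints 2.878 1.018
```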

Stating Hypotheses

In testing an hypothesis about the mean of a population, a researcher states that the mean has a certain specific value (e.g., μ = 30). Such a statement is symbolized H₀ ("aitch null"). H₀ is called the ________ hypothesis [15.5]; it is the hypothesis that the researcher will test and will decide to accept or reject.

The researcher must also state an alternative hypothesis, symbolized H_A ("aitch sub ay"). This alternative may be nondirectional or directional. A nondirectional hypothesis states that the population mean does not equal the value specified by the null hypothesis (e.g., μ ≠ 30), without saying whether the mean is less than the value specified by the null or greater. Use of a nondirectional alternative allows the investigator to reject the null / alternative hypothesis if the evidence points with sufficient strength to the possibility that μ is greater than the value hypothesized (by H₀), or to the possibility that it is less [15.7].

A directional alternative hypothesis takes one of two forms, stating either that the population mean is less than the value specified by the null hypothesis (e.g., μ < 30) or that the population mean is greater than this value (e.g., μ > 30). A directional alternative hypothesis is appropriate when it is only of interest to learn that the true value of μ differs from the hypothesized value (the value specified by the null hypothesis) in a particular direction / either direction [15.7].

One must choose, therefore, between a directional alternative and a nondirectional one. The choice should be determined by the rationale that gave rise to the study, and should be made before / after the data are gathered [15.7].

When a nondirectional alternative hypothesis is stated, the resulting test is referred to as a one-tailed / two-tailed test, because H₀ will be rejected if the obtained sample mean is located in an extreme position in just one / either tail of the sampling distribution [15.7]. Similarly, a directional alternative leads to a one-tailed / two-tailed test [15.7].

The Level of Significance

In testing a null hypothesis, one draws a sample at random from the population of interest and determines the mean of the sample. If the sample mean is so different from what is expected when H₀ is true that its appearance would be unlikely, H₀ should be accepted / rejected [15.5, Paragraph 2]. What degree of rarity of occurrence is so great that it seems better to reject the null hypothesis than to accept it? Common research practice is to reject H₀ if the sample mean is so deviant that its probability of occurrence in random sampling is .05 or less, or alternatively, ______ or less [15.5]. Such a criterion is called the level of ________________ and is symbolized by the Greek letter α, which is pronounced "________" [15.5].

Picturing the Sampling Distribution Implied by H₀

What sample means would occur if H₀ were true? If it were true, the random sampling distribution of means (for whatever sample size is used) would center on the value specified by the null hypothesis, because the mean of a random sampling distribution of means, μ_X̄, is equal to the mean of the population of raw scores, μ. The value that the null hypothesis specifies is symbolized μ_hyp, so in general, we may say μ_X̄ = μ_hyp if H₀ is true.

One can thus draw a picture of the random sampling distribution of means (for whatever sample size is used) on the assumption that H₀ is true. Such a picture appears in Figures 15.2 and 15.3 (which are essentially the same), 15.4, and 15.5. Each pictured distribution is centered on μ_hyp, which takes the value 30 for the example used in this chapter.

Regions of Acceptance and Rejection

The sampling distribution of means that would occur if H₀ were true is divided into a region of acceptance and one or two regions of rejection. If the obtained sample mean falls within a region of acceptance, the null hypothesis is accepted; if it falls within a region of rejection, the null hypothesis is rejected.

The region of acceptance always covers the central portion of the distribution and always includes ______, as Figures 15.2 through 15.5 show.

There are two regions of rejection, one in each tail, if the alternative hypothesis is nondirectional, as in Figures 15.2, 15.3, and 15.4. There is just one region of rejection, located in just one of the tails, if the alternative hypothesis is directional. If the alternative hypothesis specifies a value for μ less than that named by the null, the region of rejection lies in the left (lower) tail, as in the left half of Figure 15.5; if the alternative specifies a value for μ greater than that named by the null, the region of rejection lies in the right (upper) tail, as in the right half of Figure 15.5.

In every case, the total area of the region or regions of rejection is equal to α, the level of significance. If there are two regions, each has half of this amount (half of .05 in Figures 15.2 and 15.3, half of .01 in Figure 15.4).

Using z Scores in Hypothesis Testing

The base line in the pictures of the random sampling distribution of means implied by the null hypothesis is divided into regions of acceptance and rejection by one or two z scores. These z scores are found by consulting Table C in Appendix F and are called critical values, symbolized z_crit.

To determine whether the obtained sample mean falls within the region of acceptance or the region of rejection, it is necessary to express the sample mean as a z score. In general, a z score has the form

    z = (score − mean of distribution) / (standard deviation of distribution)

In the distribution of sample means, the ____________ is the score, the hypothesized population mean is the ________, and the standard ________ of the mean is the standard deviation [15.6, Paragraph 2]. Consequently, the location of the sample mean is expressed by:

    z = ______ [p. 256, unnumbered formula]

Now since σ is unknown, σ_X̄ is also unknown, and the sample mean can / cannot be expressed as a true z [15.6]. The true z can be estimated, though, if s_X̄ is substituted for σ_X̄. From now on a z calculated by substituting such an estimate will be called an approximate z and symbolized ______ [15.6]. The formula for "z" ("zee quotes") is:

    "z" = ______ [Formula 15.4]

Concluding a Test

To conclude the test of a null hypothesis about the mean of a single population, the approximate z score that locates the obtained sample mean within the random sampling distribution of means implied by the null hypothesis is compared with the critical z value or values. If the obtained sample mean, as indicated by the approximate z, falls in the region of rejection, the null hypothesis is rejected and the alternative hypothesis is accepted. If the obtained sample mean, as indicated by its approximate z, falls in the region of acceptance, the null hypothesis is accepted. The decision to "accept" H₀ does not mean that it is likely that H₀ is true / false, but only that it could be true / false [15.6, p. 257]. For this reason, some statisticians prefer to say "fail to ________" or "________" rather than "accept" [15.7, footnote]. Also for this reason, if the null hypothesis is accepted, the alternative hypothesis remains a plausible possibility and cannot be rejected.
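[The whole large-sample procedure can be condensed into a few lines of code. The sketch below is my own illustration with hypothetical numbers (not Captain Baker's data); Python's statistics.NormalDist plays the role of Table C in supplying the critical value.]

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical test (not from the text): H0 says mu = 30, two-tailed, alpha = .05.
mu_hyp, alpha = 30.0, 0.05
xbar, s, n = 31.5, 6.0, 64        # obtained sample mean, estimate s, sample size

s_xbar = s / sqrt(n)              # estimated standard error of the mean (Formula 15.2)
z = (xbar - mu_hyp) / s_xbar      # approximate z, "z" (Formula 15.4)

# Two-tailed critical value from the normal curve (stands in for Table C).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

decision = "reject H0" if abs(z) >= z_crit else "retain H0"
print(round(z, 2), round(z_crit, 2), decision)  # prints 2.0 1.96 reject H0
```

[Here "z" = 2.0 falls beyond the critical value 1.96, so the sample mean lies in a region of rejection.]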

Statistical Jargon

When the outcome of a statistical test is reported in research literature, it is common to see statements about "significance." What does it mean to say that "the outcome was significant at the 5% level"? This usually means that a null hypothesis and a(n) ________________ hypothesis were formulated, the decision criterion was α = ______, and the evidence from the sample led to acceptance / rejection of the null hypothesis [15.9]. Similarly, the words "not significant" (sometimes abbreviated n.s.) imply that the null hypothesis could / could not be rejected [15.9]. When a report simply says "not significant" without stating the value of α, it is probably safe to assume that it was ______ [15.9].

The use of the word significant in connection with statistical outcomes is unfortunate. In common English it implies "important," but in statistics it means only that the sample value was / was not within the limits of sampling variation expected under the null hypothesis [15.9]. Whether the difference between what is hypothesized and what is true is large enough to be important is the same thing / another matter [15.9].

Problems in Selecting a Random Sample and Consequences Thereof

The ideal way to conduct statistical inference is to define carefully the target population, identify each ________ of the population and assign it an identification number, and to draw a ________ sample by use of a table of random numbers [15.11]. In behavioral science, almost always / only occasionally do real problems permit these specifications to be met [15.11]. We need to keep in mind that, strictly speaking, it is possible to generalize the inferential outcome only to a ____________ from which the sample may be considered to be a ________ sample [15.11]. Many research reports give the impression that their conclusions are general ones, but a little probing will likely reveal that the conclusions apply only to subjects who are of a particular sex or age or who are otherwise a sharply limited group.

In Section 1.6, the distinction was made between a statistical conclusion and a ____________ one (one about the subject matter) [15.11]. The statistical conclusion says something about a parameter of the population, such as μ. But the ____________ conclusion says something about the meaning of the study for psychology, or education, or some other discipline [15.11]. Drawing a statistical conclusion can be done as an automatic process once statistics is learned. However, moving from the substantive question to an appropriate ____________ question, and finally from a ____________ conclusion to a substantive one, requires the highest qualities of knowledge and judgment on the part of the investigator [15.11].

[If something in this chapter remains unclear to you, the next chapter may help. It presents some of the details of the logic and the procedures introduced here. Don't be afraid to look ahead and search the next chapter for more information on anything you're still puzzled about.]

MAP of LARGE-SAMPLE PROCEDURE for TESTING a HYPOTHESIS about a SINGLE MEAN

[The original page presents this as a concept map; its links are summarized below.]

Null Hypothesis: is symbolized H₀; states the value of μ_hyp; is accepted if the obtained sample mean falls in the region of acceptance; is rejected if the obtained sample mean falls in the region of rejection.

Random Sampling Distribution of Means (for samples of whatever size is used): is centered, if H₀ is true, on μ_hyp; is divided into a region of acceptance and a region of rejection; has standard error estimated by s_X̄ = s/√n, where s = √(Σx²/(n−1)).

Region of rejection: has area equal to α, the level of significance; appears in one or both tails of the distribution, depending on the alternative hypothesis.

Alternative Hypothesis: is symbolized H_A; may be nondirectional, requiring a two-tailed test of H₀, or directional, requiring a one-tailed test of H₀.

SYMBOLISM DRILL

Symbol   Pronunciation             Meaning

 4  Σ                              Result of summing quantities of some kind
 5  X̄                              ΣX/__; the mean of a sample
 6  μ                              ΣX/__; the mean of a population
 9  x                              X − X̄ or X − μ; ________ score
11  σ²                             Σ__/__; variance of a population
12  σ                              √(Σ__/__); standard deviation of a population
13  S²   "big es squared"          Σ__/__; variance of a sample
14  S    "big es"                  √(Σ__/__); standard deviation of a sample
15  z                              x/σ or x/S; ________ score
16  r                              Pearson correlation coefficient for a sample
17  ρ                              Pearson correlation coefficient for a population
18  Y′                             Predicted raw score on Y
19  z′_Y                           Predicted z score on Y
20  s_YX                           Standard error of estimate of Y on X
21  μ_X̄                            Mean of sampling distribution of means
22  σ_X̄                            Standard error of the mean; σ/√n
23  s    "little es"               Estimate of σ; √(Σx²/(n−1))
24  s_X̄  "little es sub eks bar"   Estimate of σ_X̄; s/√n
25  H₀   "aitch null"              Null hypothesis
26  H_A  "aitch sub ay"            Alternative hypothesis
27  μ_hyp  "mew sub hype"          Value of μ stated in null hypothesis
28  "z"  "zee quotes"              Approximate z score with denominator estimated
29  z_crit  "zee crit"             Critical value of z
CHAPTER 16

FURTHER CONSIDERATIONS IN HYPOTHESIS TESTING

16.1  Introduction
16.2  Statement of the Hypothesis
16.3  Choice of H_A: One-Tailed and Two-Tailed Tests
16.4  The Criterion for Acceptance or Rejection of H₀
16.5  The Statistical Decision
16.6  A Statistically Significant Difference versus a Practically Important Difference
16.7  Error in Hypothesis Testing
16.8  Decision Criterion or Index of Rarity?
16.9  Multiple Tests
16.10 The Problem of Bias in Estimating σ²

PROBLEMS and EXERCISES

1 ____   2 ____
3 ____   4 ____
5 ____   6 ____
7 ____   8 ____
9 ____   10 ____
11 ____  12 ____
13 ____  14 ____

SUMMARY

This chapter spells out some of the details of the logic and the procedures introduced in the previous chapter.

The Null Hypothesis

There are three important points to note concerning the null hypothesis, H₀. (a) H₀ is always a statement about the population ________ (or difference between two or more ________ if more than one population is involved) [16.2]. (b) H₀ is expressed in terms of a point value / range [16.2]; that is, it states only one particular value for the population parameter of interest. (c) The decision to accept or reject "the hypothesis" always has reference to H₀ / H_A [16.2]. It is H₀ / H_A that is the subject of the test [16.2].

The term "null hypothesis" makes little sense in the case of an hypothesis about the mean of a single population, but it is appropriate in the case of a hypothesis about the relationship between the mean of a first population and the mean of a second one. Here the hypothesis typically states that there is no difference between the two population means, and the word null means zero.

The Alternative Hypothesis

The alternative hypothesis, H_A, may be nondirectional or directional. When the alternative hypothesis is nondirectional, a one-tailed / two-tailed test results, and it is possible to detect a discrepancy between the true value and the hypothesized value of the parameter irrespective of the direction / for only one direction of the discrepancy [16.3]. A directional alternative hypothesis is appropriate when there is no / some practical difference in meaning between finding that the null hypothesis is true and finding that a difference exists in a direction opposite to that stated in the directional alternative hypothesis [16.3]. A directional alternative results in a one-tailed test. The decision to use a one-tailed alternative must always flow from the logic of the substantive / statistical question [16.3]. The time to decide on the nature of the alternative hypothesis is therefore at the beginning / end of the study, before / after the data are collected [16.3].

The Level of Significance as a Level of Risk

The decision to accept or reject the null / alternative hypothesis is dependent on the criterion of rarity of occurrence adopted, commonly known as the ________ of significance (α, "alpha") [16.4]. This quantity determines the extent to which we are taking a certain kind of risk in testing an hypothesis. Suppose α is set at .05. When the null hypothesis is true, ____% of the sample means will nevertheless lead us to say that it is false [16.4]. So when we decide to adopt α = .05, we are really saying that we will accept a probability of .05 that the null hypothesis will be accepted / rejected when it is really true [16.4].

To reduce the risk, we may set α at a lower / higher level [16.4]. In this case, we run a substantial risk of accepting the null hypothesis when it is true / false [16.4].

For general use, α = .05 and α = .01 make quite good sense. They tend to give reasonable assurance that the null hypothesis will not be rejected unless it really should be. At the same time, they are not so stringent as to raise unnecessarily the likelihood of accepting true / false null hypotheses [16.4].

Whatever the level of significance adopted, the decision should be made in advance / after the data are in [16.4].

The Statistical Decision and Its Meaning

The statistical decision is the decision about the null hypothesis. A decision to reject means that we do not believe the mean of the population to be what the null says it is. Moreover, the lower / higher the probability of obtaining a sample mean of the kind that occurred when the null hypothesis is true, the greater the confidence we have in the correctness of our decision to reject the hypothesis [16.5].

On the other hand, accepting the null hypothesis means / does not mean that we believe the hypothesis to be true [16.5]. Rather, this decision merely reflects the fact that we do not have sufficient evidence to accept / reject the null hypothesis [16.5]. To put it another way, the decision to accept means simply that the hypothesis is a tenable one. Certain other hypotheses that might have been stated would also have been accepted if subjected to the same test.

In short, rejecting the null hypothesis means that it does not seem reasonable to believe that it is true / false, but accepting the null hypothesis merely means that we believe that the hypothesis could / must be true [16.5]. It does not mean that it must / could be true, or even that it is probably true, for there would be many other hypotheses that if tested with the same sample data would also be accepted [16.5].

Statistically Significant or Practically Important?

To test a null hypothesis about the mean of a single population, we calculate an estimated z score, "z" ("approximate z"), that indicates where the obtained sample mean falls within the sampling distribution of means that would occur if the null hypothesis were true. If "z" is large enough (in the sense of being far away from zero, either above it or below it), we will reject the null hypothesis. (If the test is one-tailed, "z" must fall on the appropriate side of the sampling distribution, of course.)

Now the magnitude of "z" depends both on the quantity in the numerator and on the quantity in the denominator of the ratio:

    "z" = (X̄ − μ_hyp) / (s/√n)

Other things being equal, if sample size n is very large, the denominator, s/√n, will be quite large / small [16.6]. In this event, a relatively large / small discrepancy between X̄ and μ_hyp may produce a value of "z" large enough to lead us to reject the null hypothesis [16.6]. In cases of this kind, we may have a result that is "statistically significant" but in which the difference between μ_true and μ_hyp is so small / large as to be unimportant [16.6].

The end product of statistical inference is a conclusion about descriptors, such as μ. Therefore, the simplest remedy is to return to them to evaluate the importance of a "significant" outcome. Look at how much difference exists between μ_hyp and the sample mean obtained. Is it of a size that matters?

What about the statistical test when sample size is small? In this case, the standard error of the mean will be relatively large / small, and it will be difficult / easy to discover that the null hypothesis is false, if indeed it is, unless the difference between μ_true and μ_hyp is quite large / small [16.6].

Errors in Hypothesis Testing

The statistical conclusion in hypothesis testing is the decision to accept or reject the null hypothesis. Either decision may be in error; thus there are two types of errors one can make.

A Type I error is committed when H₀ is rejected and in fact it is true / false [16.7]. The probability of committing a Type I error is α, the level of significance. The possibility of committing a Type I error exists only in situations where the null hypothesis is true / false [16.7]. If the null hypothesis is true / false, it is impossible to commit this error [16.7].

A Type II error is committed when H₀ is accepted and in fact it is true / false [16.7]. The Greek letter ____ (beta, pronounced "bayta") is used to indicate this probability [16.7]. The possibility of committing a Type II error exists only in situations where the null hypothesis is true / false [16.7]. If the null hypothesis is true / false, it is impossible to commit this kind of error [16.7].
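[β can be computed once a specific true mean is assumed. The sketch below goes a step beyond this chapter and uses entirely hypothetical numbers of my own: if H₀ says μ = 30 but the true mean is actually 32, β is the area of the acceptance region under the sampling distribution that is really centered on 32.]

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical scenario (not from the text): H0: mu = 30, two-tailed alpha = .05,
# sigma = 6 treated as known, n = 36, and suppose the true mean is actually 32.
mu_hyp, mu_true, sigma, n, alpha = 30.0, 32.0, 6.0, 36, 0.05

se = sigma / sqrt(n)                          # standard error of the mean = 1.0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96

# Type II error: probability the sample mean lands inside the acceptance region
# (mu_hyp +/- z_crit * se) when the sampling distribution centers on mu_true.
true_dist = NormalDist(mu_true, se)
beta = true_dist.cdf(mu_hyp + z_crit * se) - true_dist.cdf(mu_hyp - z_crit * se)

print(round(beta, 3))  # roughly .48: accepting H0 here would often be a Type II error
```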

Misuses of α

Some researchers evaluate the outcome of the test of a null hypothesis by showing the probability of obtaining a value as discrepant as the one obtained if the null hypothesis were true. For a given outcome they might report, say, that "p < .03." This probability statement is an expression of the rarity of the sample outcome if the null were true and nothing more. It can / cannot be interpreted as the value of α [16.8], which is a statement of the risk the researcher is willing to take in rejecting a null hypothesis.

Multiple Tests

Suppose that several hypothesis tests are conducted using the same level of significance, say .05. For each test taken individually, the probability of a Type I error is ______, but taken as a group, the probability that at least one from among the several will prove to be a false positive is greater / less than .05 and continues rising / falling as more tests are made [16.9].
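[For tests that are independent of one another, the group-wise risk can be computed directly: the chance that at least one of k tests commits a Type I error is 1 − (1 − α)^k. The independence assumption is mine for the sake of the sketch; the text says only that the probability rises.]

```python
# Probability of at least one false positive among k independent tests,
# each conducted at alpha = .05.
alpha = 0.05
for k in (1, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k
    print(k, round(p_any, 3))  # rises from .05 (k=1) to about .64 (k=20)
```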

Bias in Estimating σ²

The standard error of the mean, symbolized σ_X̄, is computable from the formula ______. When σ is unknown, as it usually is, it must be estimated from a sample. Intuition suggests substituting S, the sample standard deviation, as an estimate, but S tends to be a little too small / large [16.10].

The basic problem is that the sample variance, S², is a biased estimator of the population variance, σ². When an estimator is unbiased, the ______ of the estimates made from all possible samples equals the value of the parameter estimated [16.10]. But the mean value of S², calculated from all possible samples of any given size that could be drawn from a given population, is a little smaller than σ².

The formula for the sample variance is:

    S² = ______ [p. 278]

The tendency toward underestimation will be corrected if Σ(X − X̄)² is divided by ______ rather than by n [16.10]. This change produces an unbiased estimate of the population variance, symbolized s² ("little es squared"). Taking the square root of the formula, we have an estimate of the standard deviation of the population, symbolized s:

    s = ______ [p. 278]

If s is then substituted for σ in the formula for the standard error of the mean, we have an estimate of the standard error called s_X̄:

    s_X̄ = ______ [p. 279]

Although the correction introduced in estimating the standard error of the mean makes for a better estimate on the average, we should recognize that any particular sample will probably yield an estimate too ______ or too ______ [16.10].
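[The claim that S² is biased while s² is not can be checked exhaustively on a miniature population of my own invention: enumerate all possible samples (drawn with replacement), average each estimator over them, and compare with σ².]

```python
from itertools import product

# Tiny hypothetical population (not from the text).
population = [1, 2, 3, 4, 5]
N = len(population)
mu = sum(population) / N                                  # 3.0
sigma2 = sum((x - mu) ** 2 for x in population) / N       # 2.0

n = 2
# All equally likely samples of size n, sampling with replacement.
samples = list(product(population, repeat=n))

def var(sample, divisor):
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor

mean_S2 = sum(var(s, n) for s in samples) / len(samples)      # divisor n: biased
mean_s2 = sum(var(s, n - 1) for s in samples) / len(samples)  # divisor n-1: unbiased

print(sigma2, mean_S2, mean_s2)  # 2.0 1.0 2.0 -- S2 underestimates on the average
```

[Averaged over every possible sample, S² comes out too small, while s² recovers σ² exactly.]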

SYMBOLISM DRILL

Symbol   Pronunciation      Meaning

 4  Σ                       ____________________________________
 5  X̄                       Σ__/__; the ________ of a ________
 6  μ                       Σ__/__; the ________ of a ________
 9  x                       X − __ or X − __; ________ score
11  σ²                      Σ__/__; ________ of a ________
12  σ                       √(Σ__/__); ________ of a ________
13  S²                      Σ__/__; ________ of a ________
14  S                       √(Σ__/__); ________ of a ________
15  z                       __/σ or __/S; ________ score
16  r                       ____________________________________
21  μ_X̄                     ____________________________________
22  σ_X̄                     ________ ________ of the ________; σ/√n
23  s    ____________       Estimate of σ; ______
24  s_X̄  ____________       Estimate of ______; __/√n
25  H₀   ____________       ____________________________________
26  H_A  ____________       ____________________________________
27  μ_hyp  ____________     ____________________________________
28  "z"  ____________       ____________________________________
29  z_crit  ____________    ____________________________________
30  α                       Risk of Type __ error; level of ________
31  β    "bayta"            Risk of Type __ error
32  s²   "little es squared"  Estimate of σ²; Σx²/(n−1)
33  μ_true                  True value of μ
CHAPTER 17

TESTING HYPOTHESES ABOUT TWO MEANS: NORMAL CURVE MODEL

17.1  Introduction
17.2  The Random Sampling Distribution of the Differences between Two Sample Means
17.3  An Illustration of the Sampling Distribution of Differences between Means
17.4  Properties of the Sampling Distribution of Differences between Means
17.5  Testing the Hypothesis of No Difference between Two Independent Means: the Vitamin A Experiment
17.6  The Conduct of a One-Tailed Test
17.7  Sample Size in Inference about Two Means
17.8  Randomization as Experimental Control
17.9  Comparing Means of Dependent Samples
17.10 Testing the Hypothesis of No Difference between Two Dependent Means
17.11 An Alternative Approach to the Problem of Two Dependent Means
17.12 Some Comments on the Use of Dependent Samples
17.13 Assumptions in Inference about Two Means

PROBLEMS and EXERCISES

1 ____   2 ____
3 ____   4 ____
5 ____   6 ____
7 ____   8 ____
9 ____   10 ____
11 ____  12 ____
SUMMARY

Chapter 17 describes the method for testing a hypothesis about the relation between the mean of a first population and the mean of a second population. Scores from the first population are called X, those from the second population are called Y, and the null hypothesis usually states that the two populations have the same mean, which is to say that the difference between the population means is zero. In symbols, H₀ says that μ_X − μ_Y = 0. This hypothesis is appropriate for many studies in which a variable is measured under two different conditions. In particular, this is the appropriate null hypothesis for an experiment in which an experimental condition and a control condition are established and a sample of scores on some variable is collected in each condition.

The Random Sampling Distribution of the Differences between Two Sample Means

As in the case of testing a hypothesis about the mean of a single population, we must ask whether the data on hand would be likely or unlikely to have turned up, were the null hypothesis true. Here the data on hand are a pair of samples. One came from the population of scores called X; it has a certain size, symbolized n_X, and a certain mean, symbolized X̄. The other sample came from the population of scores called Y; its size is symbolized n_Y and its mean Ȳ. Ideally, each sample was selected at random from its parent population.

The difference between the two sample means, (X̄ − Ȳ), is the statistic on which we focus. We ask whether the obtained difference is likely or unlikely to have occurred if the null hypothesis were true. To answer this question, we must consult a sampling distribution, in this case the random sampling distribution of differences between two sample means for samples of the sizes we drew and for the two populations from which we drew them.

This distribution could be generated, hypothetically, as follows: One sample (of size n_X) is drawn at random from the population of X scores, and another (of size n_Y) is drawn from the population of Y scores. The ______ of each is computed, and the difference between these two ______ obtained and recorded [17.2]. Let the samples be returned to their respective populations and a second pair of samples be selected in the same way / in a different way [17.2]. The sample from population X must have size n_X again, and the sample from population Y must have size n_Y again. Again the two ______ are calculated and the difference between them noted and recorded [17.2]. If this procedure is repeated indefinitely, the differences [values of (X̄ − Ȳ) thus generated] form the random sampling distribution of differences between two sample ______ [17.2]. This distribution is specific to the sample sizes employed and to the populations sampled; that is, the characteristics of the sampling distribution would change if either sample size were changed or if either population were changed.

Three characteristics completely define any distribution: ________, ________, and ________ [17.4]. The mean of the random sampling distribution of differences between pairs of sample means, μ_(X̄−Ȳ) ("mew sub eks-bar-minus-wi-bar"), is the same as the ____________ between the two population means, μ_X − μ_Y [17.4]. This is true regardless of the sample sizes and regardless of the shapes of the populations. For cases in which the null hypothesis says that μ_X − μ_Y = 0, then, the random sampling distribution of differences between pairs of sample means will be centered on zero if the null is true.
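[This generation scheme can be carried out exhaustively for two miniature populations of my own invention: pair every possible X sample with every possible Y sample, record X̄ − Ȳ each time, and the mean of all the differences comes out to μ_X − μ_Y.]

```python
from itertools import product

# Two tiny hypothetical populations (not from the text).
pop_x, pop_y = [2, 4, 6], [1, 3, 5]
mu_x = sum(pop_x) / len(pop_x)   # 4.0
mu_y = sum(pop_y) / len(pop_y)   # 3.0

n_x = n_y = 2
diffs = []
for sx in product(pop_x, repeat=n_x):        # every equally likely X sample
    for sy in product(pop_y, repeat=n_y):    # ...paired with every Y sample
        diffs.append(sum(sx) / n_x - sum(sy) / n_y)

mean_of_diffs = sum(diffs) / len(diffs)
print(len(diffs), mean_of_diffs)  # 81 differences; their mean is 1.0 = mu_x - mu_y
```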

As for shape, the sampling distribution of differences will be normally distributed when the two populations are ____________ [17.4]. Even when the two populations are not normal, the sampling distribution tends toward normal, and with bigger sample sizes, the sampling distribution becomes closer and closer to normal in shape.

The standard deviation of the sampling distribution of differences between two means is called the standard ______ of the ________ between two sample ______, and its symbol is σ_(X̄−Ȳ) ("sigma sub eks-bar-minus-wi-bar") [17.4]. Its value depends on whether the samples are independent or not.

Independent random samples exist when the selection of elements comprising the sample of Y scores is in some / no way influenced by the selection of elements comprising the sample of X scores, and vice versa [17.4]. In ordinary random selection from two populations, this would / would not be true [17.4].

For this case, the standard error of the difference between two means behaves

in accord with the following formula:

°X'-Y = [Formula 17.1]

Formula 17.1 requires the standard error of the mean of X and of Y, and these,
in turn, require that ____ and ____ be known [17.4]. As usual, in practice,
the two population standard deviations must be estimated from the samples.
Substituting in estimates of σ_X̄ and σ_Ȳ produces an estimate of σ_X̄−Ȳ that
is symbolized s_X̄−Ȳ ("little es sub eks-bar-minus-wi-bar"). The formula is:

s_X̄−Ȳ = ____________________ [Formula 17.2a]

Since s_X̄ = s_X/√n_X and s_Ȳ = s_Y/√n_Y, in practice the formula works out to:

s_X̄−Ȳ = ____________________ [Formula 17.2b]

Estimating σ_X̄−Ȳ introduces a degree of error, of course. If the size of each
sample equals or exceeds ____, the error will be small enough that the
procedures described here will be acceptable if not entirely correct [17.4].
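A numerical sketch of this estimate with invented scores: the estimate combines
the two estimated standard errors of the mean, s_X/√n_X and s_Y/√n_Y, as the
square root of the sum of their squares. Note that `statistics.variance` uses
n − 1 in the denominator, as s² requires.

```python
import math
import statistics

# Invented data for two independent samples.
X = [28, 26, 31, 24, 27, 30, 25, 29]
Y = [22, 25, 21, 26, 23, 24, 20, 27]

# Estimated standard error of the difference between two independent means.
se_diff = math.sqrt(statistics.variance(X) / len(X) +
                    statistics.variance(Y) / len(Y))
print(se_diff)   # sqrt(6/8 + 6/8) = sqrt(1.5), about 1.22 for these scores
```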

There are two basic ways in which dependent samples may be generated: (1)
the same subjects are used for both conditions of the study, and (2) different
subjects are used, but they are ____________ on some variable related to
performance on the variable being observed [17.9]. When samples are dependent,
the standard error of the difference between means must take account of the
____________ induced by the existing relationship between the samples [17.9].
The standard error is:

σ_X̄−Ȳ = ____________________ [Formula 17.4]

When the parameters are unknown, the formula that estimates σ_X̄−Ȳ is:

s_X̄−Ȳ = ____________________ [Formula 17.5]
Testing Hypotheses about Two Means: Normal Curve Model 141

Again, error will be introduced in substituting an estimate of σ_X̄−Ȳ for its
true value. Evaluation of the approximate z that is obtained by the procedures
described in this chapter according to the characteristics of the normal curve
will generally be acceptable, although not entirely accurate, when the number
of pairs of scores / total number of scores = 40 or more [17.9].

The Alternative Hypothesis

The alternative hypothesis, H_A, may take one of three forms. The
nondirectional form says that the two populations of interest do not have the
same mean, which is to say that the difference between the population means is
not zero. In symbols, this form of H_A says that μ_X − μ_Y ≠ 0. This form
gives rise to a two-tailed test in which the region of rejection is divided
between the two tails of the sampling distribution of differences between
sample means, as in Figure 17.2 on p. 294.

The other possibilities for the alternative hypothesis are directional forms
stating either that the X population has a greater mean than the Y population
or vice versa. In symbols, these forms say either that μ_X − μ_Y > 0 or that
μ_X − μ_Y < 0. These forms give rise to a one-tailed test in which the region
of rejection is located entirely in one tail of the sampling distribution of
differences between sample means. Figure 17.3 on p. 294 shows the appropriate
picture for a directional alternative of the first kind; for a directional
alternative of the second kind, the region of rejection would be located in the
left-hand tail.

No matter what the form of the alternative hypothesis, the region or regions
of rejection have an area equal to a, the level of significance for the test.

Locating (X̄ − Ȳ) within the Sampling Distribution

As noted above, if the null hypothesis of no difference between the two
population means is true, the random sampling distribution of differences
between sample means is centered on zero. How deviant is the obtained sample
difference from the hypothesized population difference of zero? To answer, the
obtained difference must be expressed in the form of a z score, where z =
(score − mean of distribution)/(standard deviation of distribution). In the
sampling distribution

of differences between sample means, our obtained difference, (X̄ − Ȳ), is the
____________, the hypothesized difference between the population means is the
____________, and s_X̄−Ȳ is the estimated ____________ [17.5].
As with problems involving single means, we have an approximate z rather than a
true z, because an estimate of the standard deviation is substituted for the
____________ value [17.5]. The use of the symbol "z" will continue to remind us

of this. The formula for the location of (X̄ − Ȳ) in the sampling distribution,
expressed as an approximate z, is:

"z" = ____________________ [Formula 17.3]

Here (μ_X − μ_Y)_hyp ("mew sub eks minus mew sub wi, the quantity hype") is the
value of (μ_X − μ_Y) stated in the null hypothesis, which is usually zero.
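Put together, the whole two-independent-means test is only a few lines. The
scores below are invented, and the .05 two-tailed critical value 1.96 is the
familiar normal-curve figure:

```python
import math
import statistics

# Invented independent samples; H0 says mu_X - mu_Y = 0.
X = [28, 26, 31, 24, 27, 30, 25, 29]
Y = [22, 25, 21, 26, 23, 24, 20, 27]
hyp_diff = 0.0

# Estimated standard error of the difference (independent samples).
se = math.sqrt(statistics.variance(X) / len(X) +
               statistics.variance(Y) / len(Y))

# Approximate z: (obtained difference - hypothesized difference) / SE.
z = (statistics.mean(X) - statistics.mean(Y) - hyp_diff) / se

# Two-tailed test at alpha = .05: reject H0 when |"z"| is 1.96 or more.
print(z, abs(z) >= 1.96)
```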

A Way to Simplify Things in the Case of Dependent Means

Calculating "z" is laborious in the case of dependent samples because of the
work involved in finding the correlation r_XY, which is required for the
estimate of the standard error of the difference between the means, s_X̄−Ȳ. An
alternative method that saves computational work while giving an identical
answer is available.

Consider the hypothesis that μ_X − μ_Y = 0. If the hypothesis is true, then
it is also true that the mean of the population of differences between paired
values of X and Y is ____ [17.11]. [An exercise to provide insight into this
state of affairs appears below.] If the difference between an X score and its
paired Y score is designated by D, the initial hypothesis may be restated:
H₀: ____ = ____ [17.11]. The alternative method requires that we find D̄, the
mean of the population / sample set of difference scores, and inquire whether
it differs significantly from the hypothesized mean of the population / sample
of difference scores, μ_D [17.11]. In this method the two-sample problem is
reduced to a one-sample problem exactly like that treated in the previous two
chapters.

To locate the obtained mean of the sample of difference scores, D̄, within
the sampling distribution of quantities of this kind, we must calculate an
approximate z score as follows:

"z" = ____________________ [Formula 17.6]

The Effect of Sample Sizes in the Case of Independent Means

When inference concerns two independent sample means, the samples may be /
must be of different size [17.7]. However, if σ_X and σ_Y are equal, the total
of the sample sizes (n_X + n_Y) is used most efficiently when n_X __ n_Y
[17.7]. This will result in a larger / smaller value for σ_X̄−Ȳ than otherwise
[17.7]. The advantage of a larger / smaller σ_X̄−Ȳ is that, if there is a
difference between μ_X and μ_Y, the probability of claiming it (rejecting ____)
is increased [17.7].

The point just noted has to do with the relative size of the two samples.
What about the absolute magnitude of sample size? Other things being equal,
large samples increase / decrease the probability of detecting a difference
when a difference exists [17.7].

Randomization as Experimental Control

Comparisons between two or more groups may be divided into two categories:
those in which the investigator can assign to each subject any particular treat¬
ment condition, and those in which the investigator cannot. In a study of the
first kind, it is possible for the investigator to assign treatment condition
at random to the subjects, and to do so has important advantages.

The primary experimental (as opposed to statistical) benefit of randomization
lies in the chance (and therefore impartial) assignment of extraneous
influences among the groups to be compared. (An extraneous influence is one
other than the treatment, and thus one whose effects the investigator would not
wish to entangle with any effect the treatment might have.) Those who are
likely to do well have more / just as much chance of being assigned to one
treatment group than / as they have to another, and the same / opposite is true
of those who are likely to do poorly [17.8]. The beauty of randomization is
that it affords this type of experimental control over extraneous influences
whether or not they are known by the experimenter to exist. Random assignment
of subjects to treatment groups guarantees / does not guarantee equality
[17.8]. But randomization tends to produce equality, and that tendency
increases in strength as sample size increases / decreases [17.8].
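That tendency toward equality can be demonstrated with invented numbers: give a
pool of subjects a known extraneous score (an "aptitude" here), split the pool
at random into two groups, and watch the typical gap between the group means
shrink as the groups grow.

```python
import random
import statistics

random.seed(2)

def mean_gap(n_per_group):
    # Invented extraneous scores for a pool of 2n subjects.
    pool = [random.gauss(100, 15) for _ in range(2 * n_per_group)]
    random.shuffle(pool)                          # random assignment
    g1, g2 = pool[:n_per_group], pool[n_per_group:]
    return abs(statistics.mean(g1) - statistics.mean(g2))

# Average gap between group means over many randomizations.
small = statistics.mean(mean_gap(10) for _ in range(500))
large = statistics.mean(mean_gap(250) for _ in range(500))
print(small, large)   # the typical gap is smaller with the larger groups
```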

Inspection of the outcome of randomization sometimes tempts the experimenter
to exchange a few subjects from group to group before proceeding with the
treatment in order to obtain groups more nearly alike. Such a move improves
things / leads to disaster [17.8]. The standard error formulas are based on the
assumption of randomization, and casual adjustment of this kind makes them more
appropriate / inappropriate [17.8].



Aspects of the Use of Dependent Samples

Samples (or means) can be dependent for one of two reasons, as noted above:
because one group of subjects appeared in one sample and another group in the
other sample, but the subjects were matched in pairs, one member of each pair
from each sample; or because the same subjects were used in both conditions. In
both cases, randomization can be used to advantage.

With matched subjects, the benefit of randomization as control can be achieved
by assigning treatment condition ____________ to the members of each pair,
taking care to do so independently for each pair of subjects [17.12]. The
problem is more complicated when the same subjects are used for both treatment
conditions. Here, random assignment would mean deciding randomly which
treatment the subject will receive ________ and which will be given ________
[17.12]. This will create some problems when the first treatment experience
changes the subject in some way so that she or he performs differently under
the second treatment. Practice effect and fatigue are two possible influences
that might affect a subject's second performance.

From a statistical standpoint, there can be an advantage in electing to use
paired observations rather than independent random samples, when a choice is
available. Pairing observations makes possible elimination of an extraneous
source of variation. The effect of doing so is to reduce / increase the
influence of random variation on the differences between means. The standard
error measures this factor. The effect of reducing the standard error of the
difference by pairing is the same as reducing it by increasing sample ______
[17.12]. The less the error (as measured by the standard error), the less /
more likely it is to mask a true difference between the means of the two
populations [17.12]. To put it more formally, a reduction in σ_X̄−Ȳ reduces
the probability of committing a Type I / II error [17.12].

The reduction of σ_X̄−Ȳ induced by pairing observations depends on the value
of the ____________ induced by pairing [17.12]. In general, when pairing is on
the basis of a variable importantly related to performance of the subjects, the
correlation will be higher / lower than otherwise, and the reduction in σ_X̄−Ȳ
will consequently be lesser / greater [17.12].



CAUTIONS CONCERNING CONFUSABLE QUANTITIES

1. μ_(X̄−Ȳ) ≠ μ_X̄ − Ȳ
("Mu sub eks bar minus wi bar does not equal Mu sub eks bar Minus Wibar.")

The quantity on the left here is the mean of a population; that's what the
μ indicates. The population is composed of numbers derived by taking the mean
of a sample of scores called X and subtracting from this mean the mean of a
sample of scores called Y; that's what the subscript X̄−Ȳ indicates. The
numbers described by the subscript are differences between sample means, then,
and the expression μ_(X̄−Ȳ) designates the mean of a sampling distribution of
such quantities.
The first paragraph on p. 139 of this workbook describes how such a sampling
distribution could be generated.

The quantity on the right above is a difference, the difference between (a)
the mean (μ_X̄) of a population, the elements of which are means (X̄) of
samples of scores called X, and (b) the mean (Ȳ) of a single sample of scores
called Y.
You will have no occasion to deal with this bizarre expression in this course,
and probably no occasion to deal with the expression at any other time in your
life, even if you become a professional statistician. Be sure you don't confuse
it with the expression on the left above.

2. σ_(X̄−Ȳ) ≠ σ_X̄ − Ȳ

("Sigma sub eks bar minus wi bar does not equal Sigma sub eks bar Minus
Wibar.")

The quantity on the left this time is the standard deviation of a population;
that's what the σ indicates. The population is composed of numbers derived by
taking the mean of a sample of scores called X and subtracting from this mean
the mean of a sample of scores called Y; that's what the subscript X̄−Ȳ
indicates. The numbers described by the subscript are differences between
means, then, just as in the expression μ_(X̄−Ȳ). σ_(X̄−Ȳ) is the standard
deviation of a sampling distribution of differences between means, and it has
the special name "standard error."

The quantity on the right, in contrast, is a difference, the difference
between (a) the standard deviation (σ_X̄) of a population, the elements of
which are means (X̄) of samples of scores called X, and (b) the mean (Ȳ) of a
single sample of scores called Y. This is another bizarre expression that will
never arise in this course, unless you think it or write it by mistake. Please
don't.

3. The distinction between independent and dependent variables ≠ the
distinction between independent and dependent means.

An independent variable is a variable manipulated by a researcher to see


whether it has some effect on another variable; it is a factor that might have
some influence on this other variable. The text calls it a treatment. In the
experiment described on pp. 285-286 of the text, the independent variable is the
quantity of Vitamin A in the subject's diet, and it is tested as a factor that

might affect the subject's visual acuity under dim light. The variable that
might be influenced by the independent one is called a dependent variable. It
is not manipulated; rather it is left free to vary, and it is measured for each
subject in each condition of an experiment. In the one described in the text,
the dependent variable is visual acuity under dim light.

This distinction has nothing to do with the distinction between independent


means and dependent means, and the similarity in terminology is just an
unfortunate coincidence. The means that may be independent or dependent are
means of
scores on a dependent variable. What determines whether the means are indepen¬
dent or dependent is the researcher's procedure in collecting the data.

If the researcher tested one sample of subjects in one condition of the


experiment and a different sample of subjects in the other condition, without
doing anything more special than this, the mean of the scores for the first
condition is independent of the mean of the scores for the second condition
(and
vice versa). The two samples may be of different sizes in this case, and even
if they are of the same size, there is no logical way to pair a score from one
sample with a score from the other sample.

If the researcher tested each subject first in one condition and then in the
other condition, though, the mean of the scores for the one condition and the
mean of the scores for the other condition are dependent. Or if the researcher
picked a first subject, looked around for a second one who matched the first in
some way (in visual acuity under normal light for the example on p. 297), flipped
a coin to determine which member of this pair went into which condition, and
continued in this way, matching each subject for one condition with a subject
for the other condition, again the mean of the scores for the one condition and
the mean of the scores for the other condition are dependent. When means are
dependent, the samples they characterize must be of the same size, and there is
a logical way to pair each score in one sample with a score from the other sample.

Whether means are independent or dependent, though, and if dependent, whether


they derive from the repeated-measurement or the matching procedure, the scores
contributing to the means are measurements of a dependent variable, and the
researcher is looking to see whether an independent variable (or treatment) is
associated with a statistically significant difference between the means.

In distinguishing independent and dependent means, the text sometimes talks


about independent or dependent samples instead. The terminology is equivalent.

WHY DOES μ_D = 0 IF μ_X − μ_Y = 0?

In the bottom paragraph on p. 299, the text zips you through the point that
μ_D = 0 if μ_X − μ_Y = 0. If you found that point unclear, this exercise
should help.

Dependent means arise when there is some logical way to pair each score in
one condition of a study with a score from the other condition of the study.
Such pairings are shown in Table 17.5 on p. 300 of the text. Here we are asked
to imagine that 20 subjects were chosen at random from some population and given
a preliminary test to determine their reaction time to a white light. Ten pairs
of subjects were then formed on the basis of these reaction times. Within each
pair, the two subjects were equal in the speed of their reaction to the white
light, but some pairs were relatively slow while others were relatively fast.
The reaction times on which the pairings were done do not appear in Table 17.5,
though, and they do not enter into any of the statistical calculations for the
study.

The researcher then flipped a coin or did the equivalent to assign the members
of each pair at random to one condition or the other of the experiment. The
procedure might have gone like this: Take a pair of subjects; call one of them
A
and the other B. If the coin comes up heads, Subject A is tested with the green
light and Subject B with the red light; if the coin comes up tails, it's vice
versa for the subjects.

Reaction times to the colored lights for the ten pairs of subjects are shown
in Table 17.5 in the columns headed X and Y. Each score is the time in
milliseconds (thousandths of a second) to respond to a light. The light was
green for
one member of each pair, whose score was subsequently called X, and red for the
other member of each pair, whose score was called Y.

Note that X̄ = 27.4 while Ȳ = 26.8 milliseconds.

Onward to Table 17.6 now. Here the same ten pairs of subjects are listed
in the same order, from Pair #1 down to Pair #10, along with their scores, X or
Y, again. But this time the difference, called D, between the X score and the Y
score for each pair is included. Check the column of D values: 3 = 28 - 25; -1 =
26 - 27; and so on. In general, D = X - Y.

What is D̄? It works out to be +.6, as the upper right corner of the table
indicates. (Check the computation.) So what? Well, X̄ − Ȳ = 27.4 − 26.8 = +.6
too. This is an instance of the generalization that where difference scores D
are computed as X − Y, D̄ = X̄ − Ȳ.

Now construct another such instance yourself. Fill in the missing numbers in
the table below. It's been arranged so that here the mean of the difference
scores works out to be zero.

Pair    X    Y    D = X − Y
 1      6    4        2
 2      5    6
 3      7    3
 4      5    5
 5      3    6
 6      4    6

ΣX = ____    X̄ = ____
ΣY = ____    Ȳ = ____
X̄ − Ȳ = ____
ΣD = ____    D̄ = ____

Does D̄ = X̄ − Ȳ? If not, you made one or more mistakes.

If you did your computations correctly, your table will indicate that
D̄ = X̄ − Ȳ.

The generalization the text states on p. 299 is just this statement expressed
in terms of population parameters. Corresponding to D̄, the mean of a sample of
difference scores called D, is μ_D. Corresponding to X̄ and Ȳ are μ_X and μ_Y.
In general, μ_D = μ_X − μ_Y. If μ_X − μ_Y = 0, then μ_D = 0 too.

This is an eternal truth such as mathematicians call a theorem, and it comes


out pretty in words: The mean of the difference scores equals the difference
between the means.
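The theorem can also be checked mechanically on the exercise table above:

```python
import statistics

# The paired scores from the exercise table above.
X = [6, 5, 7, 5, 3, 4]
Y = [4, 6, 3, 5, 6, 6]
D = [x - y for x, y in zip(X, Y)]

# The mean of the difference scores equals the difference between the means.
print(statistics.mean(D), statistics.mean(X) - statistics.mean(Y))
```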

You yourself can prove that this statement is an eternal truth; all it takes
is a very little high-school algebra. If you want to try—and you'll feel good
if you figure out the proof for yourself—the notes below will get you started.

Proof that μ_D = μ_X − μ_Y

In general, a table of the kind under consideration here has the following
form, where N is the total number of pairs of X and Y scores:

Pair    X     Y     D = X − Y

 1      X1    Y1       D1
 2      X2    Y2       D2
 3      X3    Y3       D3
 .      .     .        .
 .      .     .        .
 N      XN    YN       DN

To prove that μ_D = μ_X − μ_Y, start as follows:

μ_D = (D1 + D2 + D3 + . . . + DN) / N

    = [(X1 − Y1) + (X2 − Y2) + (X3 − Y3) + . . . + (XN − YN)] / N

To complete the proof, change that last expression on the right of the = sign
until you have μ_X − μ_Y. You may or may not need all four of the additional
lines there, or you may need more than four; there's more than one way to do
the proof.

If you get stuck, consult the hint in the middle of this page.

[Hint, printed upside down in the original: Try working backward from your
goal, which is μ_X − μ_Y. What can you turn this expression into? If a good
try at this tactic doesn't work, there's one more hint available on the bottom
of the next page.]

SYMBOLISM DRILL

     Symbol    Pronunciation              Meaning

 4   Σ         "the sum of"
 5   X̄         "eks bar"                  ΣX/__ ; the ______ of a ______
 6   μ         "mew"                      ΣX/__ ; the ______ of a ______
 9   x         "little eks"               X − __ ; ________ score
11   σ²        "sigma squared"            Σx²/__ ; ________ of a ________
13   S²        "big es squared"           Σx²/__ ; ________ of a ________
14   S         "big es"                   √(Σx²/__) ; ________ of a ________
12   σ         "sigma"                    √(Σx²/__) ; ________ of a ________
23   s         "little es"                Estimate of __ ; √[Σx²/(______)]
32   s²        "little es squared"        Estimate of __ ; Σx²/(______)
16   r         "ar"                       ________________
17   ρ         "rho"                      ________________
21   μ_X̄       "mew sub eks bar"          ______ of ____________ of ______
22   σ_X̄       "sigma sub eks bar"        ________ of the ________
24   s_X̄       "little es sub eks bar"    Estimate of ____ ; s/√__
25   H₀        "aitch null"               ________________
26   H_A       "aitch sub ay"             ________________
27   μ_hyp     "mew hype"                 Value of __ stated in ________
33   μ_true    "mew true"                 ________ value of __
28   "z"       "zee quotes"               Approximate __ score with
                                          estimated ________

[Final hint for the proof starting on p. 148, printed upside down in the
original: μ_X = (X1 + X2 + X3 + . . . + XN)/N, and similarly for μ_Y.]

     Symbol          Pronunciation                Meaning

29   z_crit          "zee crit"                   ________________
30   α               "alpha"                      Risk of Type __ error;
                                                  level of ____________
31   β               "bayta"                      Risk of Type __ error
34   μ_X̄−Ȳ           "mew sub eks bar minus       ______ of ____________ of
                     wi bar"                      ____________ between ________
35   (μ_X − μ_Y)_hyp "mew sub eks minus mew       Value of ____________ stated
                     sub wi, the quantity hype"   in null hypothesis
36   σ_X̄−Ȳ           "sigma sub eks bar minus     ________ of the ____________
                     wi bar"                      between two ________
37   s_X̄−Ȳ           "little es sub eks bar       Estimate of ________
                     minus wi bar"
38   D               "dee"                        X − Y ; ________ score

ANNALS of EGREGIOUS* EXAMPLES

A businessman came to see me recently for advice on the analysis of some data
he'd collected. He ran a marketing-research firm, and he'd conducted two
studies testing consumer reaction to several varieties of frozen food. In both
studies, his subjects were shoppers who were approached in public places such
as malls and parking lots. The subjects were asked to taste one or more
varieties of a food (which had been cooked, of course) and to report a judgment
of "bad," "poor," "fair," "good," or "excellent." In accord with the procedure
described in Question 18 on p. 17 of this workbook, the fellow had translated
these judgments into the numbers 1 through 5.

What is of interest here is a subtle difference between the two studies. To


simplify things a bit, let us suppose that only two varieties of a food were
compared in each study. In the first one, there were a total of 200 subjects.
Half
of them had tasted one variety and half the other variety; a given subject had
tasted only one, so there were a total of 200 scores.

*Egregious ("eh-gree-juss"): conspicuously bad.



You know how to analyze data of this kind now. (A fine accomplishment, no?)
Say how you would do it.

1. What statistics would you calculate to describe the data?

2. What inferential procedure would you apply? Say whether you would test
a hypothesis about a single population mean or about two population means, and
if the latter, whether the sample means are independent or dependent. State your
null hypothesis and your alternative hypothesis, choosing between a one-tailed
and a two-tailed test. List the calculations you would have to do.

In the second study, there were only 100 subjects, but each subject had tasted
two varieties of a food and judged both of them, so each subject contributed
two scores to the data, and there were again a total of 200 scores.

You also know how to analyze data of this kind. Again outline how you would
do it.

3. What statistics would you calculate to describe the data?

4. What inferential procedure would you apply? Spell out the details as for
Question 2 above.

Now we come to the reason why this example is egregious. In his first study,
the businessman had collected 200 judgments, 100 recorded on one page of a
notebook for one variety of a food, and the other 100 recorded on a second page
for a second variety. The fellow had cast each sample of 100 scores into a
frequency distribution, producing two tables looking like this (the frequencies
are hypothetical):

Judgments of Variety A Judgments of Variety B

Score f Score f

5 23 5 17
4 55 4 40
3 12 3 23
2 6 2 15
1 4 1 5

Σf = 100                      Σf = 100

The mean and the other descriptive statistics required for each sample were easy
to calculate from these tables. (The text covers these techniques on pp. 66 and
90.) All this is fine and dandy for the first study.

But this is exactly how the businessman had preserved his data for the second
study too. Two tables of this sort were all that he had to go on.

5. Something is wrong here. What is it?

The data from the second study, then, could not be properly analyzed. The
businessman could hardly believe it; the difference between the procedures he'd
followed in the two studies seemed so slight.

Statistical Moral: Plan the statistical techniques you'll use on your data
BEFORE you collect the data.

The businessman confessed to me that he had already written his report on the
foods for the company that was considering marketing them. In the report he had
simply asserted that the variety with the highest mean rating in each study was
significantly higher than the others tested in that study, but he didn't really
know this to be so, not even for the first study.

Moral Moral: A course in statistics helps to preserve one's honesty.


CHAPTER 18   ESTIMATION OF μ AND μ_X − μ_Y

18.1 Introduction

18.2 The Problem of Estimation

18.3 Interval Estimates of μ

18.4 An Interval Estimate of μ_X − μ_Y

18.5 Evaluating an Interval Estimate

18.6 Sample Size Required for Estimates
of μ and μ_X − μ_Y

18.7 The Relation between Interval


Estimation and Hypothesis Testing

18.8 The Merit of Interval Estimation

PROBLEMS and EXERCISES

1 ________    2 ________

3 ________    4 ________

5 ________    6 ________

7 ________    8 ________

9 ________    10 ________

11 ________   12 ________

13 ________   14 ________

15 ________   16 ________

17 ________   18 ________

19 ________   20 ________

SUMMARY

The techniques of inferential statistics permit us to reach conclusions


about entire populations on the basis of samples drawn from those populations.
These techniques fall into two categories: hypothesis testing and estimation.
Previous chapters have treated hypothesis testing for cases where sample size
is large enough to permit the use of the normal-curve model. The present
chapter deals with estimation in cases where sample size is again large and the
normal-curve model thus still appropriate.

Point Estimates vs. Interval Estimates

Sometimes it is required to state a single value as an estimate of the
population value. Such estimates are called ________ estimates [18.2].
________ estimates alone are made reluctantly, because they may be
considerably in error [18.2]. ____________ estimates are more practical when
conditions permit [18.2]. In ____________ estimation, limits are set within
which it appears reasonable that the population ________ lies [18.2].

Other things being equal, if wide limits are set, the likelihood that the
limits will include the population value is low / high, and if narrow limits
are set, there is greater / lesser risk of being wrong [18.2]. Because the
option exists of setting wider or narrower limits, any statement of limits must
be accompanied by an indication of the degree of ____________ that the
population parameter falls within the limits [18.2]. The limits themselves are
usually referred to as a ____________ interval and the statement of degree of
confidence as a ____________ coefficient [18.2].

Interval Estimates of y

To construct an interval estimate of a population mean μ, one begins with
the mean of a sample drawn (ideally at random) from that population. A certain
quantity is added to the sample mean to set the upper limit of the interval,
and the same quantity is subtracted from the mean to set the lower limit. The
quantity that is added and subtracted is the product of two values, a certain
z score and, if it is known, the standard error of the mean for samples of
whatever size was drawn. In symbols, the limits are:

X̄ ± z_p σ_X̄

where X̄ is the sample / population mean, obtained by sampling;



σ_X̄ is the ________ ________ of the ________; and z_p is the magnitude
of z for which the probability is __ of obtaining a value so deviant or more
so (in either direction) [18.3]. As usual, most frequently ____ must be
substituted as an estimate of σ_X̄ [18.3]. When n ≥ ____, little error will be
introduced by substituting ____ for ____ [18.3].

Once the specific limits are established for a given set of data, the
interval thus obtained either does or does not cover __ [18.3]. The probability
is, at this stage, either __ or __ that the interval covers __; we do not know
which [18.3]. Consequently, it is usual to substitute the term ____________
for probability in speaking of a specific interval [18.3].

What does it mean to say that we are, say, "95% confident"? We do not know
whether the particular interval covers __, but when intervals are constructed
according to the rule, __ of every 100 of them (on the average) will include
__ [18.3]. Remember that it is the interval / μ that varies from estimate
to estimate, and not the value of __ [18.3].

For a given confidence coefficient, a small sample results in a narrow /
wide confidence interval, and a large sample in a narrower / wider one [18.3].
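A sketch of the rule in code, with invented scores, a 95% confidence
coefficient (so z_p = 1.96), and s substituted for the unknown σ as usual; a
real large-sample application would want n at or above the text's cutoff.

```python
import math
import statistics

# Invented sample of scores.
scores = [102, 98, 110, 95, 105, 99, 108, 101, 97, 104,
          100, 96, 109, 103, 107, 94, 106, 98, 102, 100]
n = len(scores)

x_bar = statistics.mean(scores)
s_xbar = statistics.stdev(scores) / math.sqrt(n)   # estimate of sigma_X-bar
z_p = 1.96                                         # C = .95, so p = .05

lower, upper = x_bar - z_p * s_xbar, x_bar + z_p * s_xbar
print(lower, upper)   # the 95% confidence limits for mu
```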

Interval Estimates of μ_X − μ_Y

An interval estimate of the difference between two population means can be
constructed by following the rule:

(X̄ − Ȳ) ± z_p σ_X̄−Ȳ

where (X̄ − Ȳ) is the difference between the two sample ________; σ_X̄−Ȳ is
the ________ ________ of the ____________ between two ________;
z_p is the magnitude of z for which, in the normal distribution, the
probability is __ of obtaining a value so deviant or more so (in either
direction); and p = 1 − C, where C is the ____________ [18.4].

Once again, the procedure is dependent on knowledge of σ_X̄ and σ_Ȳ (and ρ, in
the case of dependent / independent samples), which are the values needed to
obtain σ_X̄−Ȳ [18.4]. If only sample estimates are available (the usual case),
____ must be substituted for σ_X̄−Ȳ [18.4]. When the size of each of the two
samples equals or exceeds ____ for dependent / independent samples, or when the
number of pairs of scores equals or exceeds ____ when samples are dependent /
independent, the error in such a substitution is usually tolerable [18.4].

Once again, for a given confidence coefficient, a small / large sample
results in a wide confidence interval and a small / large sample in a narrower
one [18.4].
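The corresponding computation for two independent samples, again with invented
data and a 95% confidence coefficient:

```python
import math
import statistics

# Invented independent samples.
X = [31, 27, 34, 29, 33, 28, 30, 32, 26, 35]
Y = [25, 29, 24, 28, 23, 27, 26, 22, 30, 21]

diff = statistics.mean(X) - statistics.mean(Y)
se = math.sqrt(statistics.variance(X) / len(X) +
               statistics.variance(Y) / len(Y))   # substitute for the unknown SE
z_p = 1.96                                        # 95% confidence

print(diff - z_p * se, diff + z_p * se)   # limits for mu_X - mu_Y
```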

Interpreting an Interval Estimate

Is a given interval wide or narrow? If we are not familiar with the variable
under study, we cannot say. In such a case, one way to add meaning is to
interpret the interval limits in terms of number of ____________
____________ of the variable rather than in terms of raw-score points [18.5,
Paragraph 2]. One advantage of expressing the outcome of an interval estimate
in this way is that it compensates for the fact that the importance of a given
interscore distance depends on the size of the ____________ of the
variable [18.5].

When confidence limits are expressed this way, we need to keep in mind that
the width of the limits must still be considered in the light of the value of
the confidence ____________ employed, just as we do when the limits are
expressed in score points [18.5].

Determining Sample Size for an Estimate of a Given Width

Sometimes we wish an interval estimate to be of a certain width. After


choosing our confidence coefficient, we can estimate the size of the one or two
samples required to hold the estimate to the desired width—if we can estimate
the standard deviation of the one or two populations of interest. The details
are available in Section 18.6.

Interval Estimation vs. Hypothesis Testing

Interval estimation and hypothesis testing are two sides of the same coin.
For most population parameters or differences between two parameters, an
interval estimate contains all values of H₀ that would be accepted / rejected
had they been tested using α = 1 − C [18.7]. But estimation has some important
advantages in many cases:

1. The final quantitative output of an interval estimate is a statement about
the ____________ or ____________s concerned [18.8]. In hypothesis testing, the
statement is about a derived score, such as z or t, or about a probability, P.

Estimation of μ and μ_X − μ_Y 159

In either form of inference, the question is about the parameter(s). A confidence
interval is thus a(n) indirect / direct answer to the question, whereas hypothesis
testing focuses on a derived variable [18.8].

2. An interval estimate straightforwardly exhibits the influence of random
sampling variation. But in hypothesis testing, the magnitude of the derived
variable depends on two factors: the difference between what was hypothesized
and what is ________, and the amount of sampling variation present, which is a
function of sample size [18.8].

3. Hypothesis testing is subject to an important confusion between a statistically
____________ difference and an important difference [18.8], but this problem
essentially disappears with interval estimation.

4. Interval estimation avoids the error of thinking that "accepting H₀ /
H_A" means that H₀ / H_A is true or probably true [18.8]. Interval estimation
can be applied in most situations to inquire whether the population is characterized
by a particular parameter value, and if used in this way, the interval
makes plain all of the values that might characterize the parameter, including,
possibly, the value inquired about.

5. Since the null hypothesis is a point / range hypothesis, it is unreasonable
to believe that it could be exactly true in any practical encounter. Interval
estimation is therefore more / less realistic [18.8].

SYMBOLISM DRILL

No.  Symbol            Pronunciation   Meaning

 1   n                                 Number of scores in a ________
 2   N                                 Number of scores in a ________
 3   X                                 ____________________
 4   Σ                                 ____________________
 5   X̄                                 Σ___/___; the ______ of a ________
 6   μ                                 Σ___/___; the ______ of a ________
 9   x                                 ___ − ___ or ___ − ___; ________ score
11   σ²                                Σ___²/___; ________ of a ________
12   σ                                 √(Σ___²/___); ________ of a ________
13   S²                                Σ___²/___; ________ of a ________
14   S                                 √(Σ___²/___); ________ of a ________
32   s²                                Estimate of ____; Σx²/(______)
23   s                                 Estimate of ____; √(Σx²/(______))
22   σ_X̄                               ________ ________ of the ________; ___/√___
24   s_X̄                               Estimate of ____; ___/√___
36   σ_X̄−Ȳ                             ________ ________ of the ____________ between ____ ________
37   s_X̄−Ȳ                             Estimate of ________
25   H₀                                ____________________
26   H_A                               ____________________
21   μ_X̄                               ______ of ________ ____________ of ______
27   μ_hyp                             Value of ___ stated in ________ ____________
34   μ_X̄−Ȳ                             ______ of ________ ____________ of ____________ between ____ ________
35   (μ_X − μ_Y)_hyp                   Value of ________ stated in ________ ____________
38   D                                 ___ − ___; ____________ score
39   C                 "see"           ____________ coefficient
17   ρ                                 ____________________
16   r                                 ____________________
CHAPTER 19

INFERENCE ABOUT MEANS AND THE


t DISTRIBUTION
19.1  Introduction
19.2  Inference about a Single Mean when σ Is Known and when It Is Not
19.3  Characteristics of Student's Distribution of t
19.4  Degrees of Freedom and Student's Distribution
19.5  Using Student's Distribution of t
19.6  Application of the Distribution of t to Problems of Inference about Means
19.7  Testing an Hypothesis about a Single Mean
19.8  Testing an Hypothesis about the Difference between Two Independent Means
19.9  Testing Hypotheses about Two Independent Means: An Example
19.10 Testing an Hypothesis about Two Dependent Means
19.11 Interval Estimates of μ
19.12 Interval Estimates of μ_X − μ_Y
19.13 Further Comments on Interval Estimation
19.14 Assumptions Associated with Inference about Means

PROBLEMS and EXERCISES

1 ___    2 ___
3 ___    4 ___
5 ___    6 ___
7 ___    8 ___
9 ___    10 ___
11 ___   12 ___
13 ___   14 ___
15 ___   16 ___
17 ___   18 ___

SUMMARY

In previous chapters (15, 17, and 18), the text presented techniques for
making inferences about population means. Each technique was described first
in an ideal form in which the appropriate standard error could be calculated
from known population parameters. A modification necessary for practical use
was then introduced, because in practical use the standard error must be estimated
from the sample or samples on hand. Estimation introduces a degree of
error that makes the normal curve the wrong model for the distribution of the
"z" statistic, but the error is tolerable when sample size is large. The present
chapter describes a modification of the modification that is necessary
when sample size is small.

The procedures described in this chapter are sometimes called small-sample
procedures, which might lead one to think that the basic issue is one of sample
size. This is not so. The fundamental issue is whether the formulas for the
standard errors are based on population ____________ or on sample estimates
of those ____________ [19.1]. If based on population ____________, the
procedures of chapters 15, 17, and 18 / this chapter are exactly correct
irrespective of sample size [19.1]. If based on estimates of population
parameters, the procedures of chapters 15, 17, and 18 / this chapter are
exactly correct [19.1].

t = "z" ≠ z

To test an hypothesis about the mean of a single population, we draw a random
sample from the population, find its mean, and ask whether the obtained value
would be likely or unlikely to occur if the hypothesis about the population were
true. To determine the likelihood of the obtained value, we consult the sampling
distribution of the mean for samples of whatever size we drew. This distribution
could be generated by combining means of repeated random samples of the given size.

To locate our obtained sample mean within the sampling distribution that
would occur if the null hypothesis were true, we would like to calculate a z
score according to the formula:

    z = (X̄ − μ) / σ_X̄

If the assumptions of Section 15.10 are satisfied, values of z will be ________
distributed as we move from random sample to random sample, but the
values of μ and σ_X̄ will remain fixed [19.2]. Consequently, values of z
may be considered to be formed as follows:

    z = [(normally distributed variable) − ( constant / variable )] / ( constant / variable )   [19.2]

Subtracting a constant from each score in a normal distribution changes / does
not change the shape of the distribution, and so / nor does dividing by a
constant [19.2]. Consequently, z will be normally distributed when ___ is normally
distributed, and the normal distribution is therefore the correct distribution
to which to refer z for evaluation [19.2].

In practice, though, we can only calculate an approximate z, "z", in which
an estimate is substituted for the true value of the denominator. So in reality,
if we were to draw repeated random samples from the population of interest, not
only would values of X̄ vary, but so would the estimates of σ_X̄. The resulting
statistic, "z", may be considered to be formed as follows:

    "z" = [(normally distributed variable) − (constant)] / (variable)

Because of the presence of the variable quantity in the denominator, this statistic
does not follow the normal distribution, though it is close to normal when
sample size is large. The distribution it does follow is called Student's distribution,
and the statistic itself is called t. "z" was invented by the author
of the text as a temporary name for pedagogical purposes.
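A small simulation makes the point concrete. Everything below is illustrative, not from the text: the population values μ = 100 and σ = 15, the sample size n = 5, and the number of repetitions are all invented. With the estimated standard error in the denominator, "z" lands outside ±1.96 noticeably more often than the 5% that the normal curve would promise.

```python
import random
from math import sqrt

# Draw many samples of n = 5 from a normal population, compute
# "z" = (X̄ − μ) / (s/√n) using the ESTIMATED standard error, and count how
# often it falls beyond ±1.96 — the cutoffs that bound 5% for a true z.
random.seed(1)
mu, sigma, n, reps = 100.0, 15.0, 5, 20000
beyond = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # estimate of σ
    if abs((xbar - mu) / (s / sqrt(n))) > 1.96:
        beyond += 1
print(beyond / reps)  # noticeably more than .05: the tails of "z" are fatter
```

The excess over .05 is exactly what the heavier tails of Student's t for df = 4 predict; with a much larger n the proportion would shrink back toward .05.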

Beginners sometimes think that it is the sampling distribution of means that
becomes nonnormal when σ_X̄ must be estimated from the sample. This is not so.
If the assumptions of Section 15.10 are met, X̄ will be normally distributed
regardless of sample size [19.2]. However, the position of X̄ is not evaluated
directly; rather it is the value of the statistic ___ = (X̄ − μ)/s_X̄ [19.2]. And
although z is normally distributed, the resulting value of t is not, because the
denominator is a variable.

Characteristics of the Family of t Distributions

Student's distribution of t is not a single distribution but rather a
________ of distributions [19.3]. The exact shape of a particular member of that
family depends on sample size, or, more accurately, on the number of ____________
of freedom (df), a quantity closely related to sample size [19.3]. In general,
the number of ____________ of freedom corresponds to the number of observations
that are completely ________ to vary [19.4]. One might at first suppose that
this would be the same as the number of scores in the sample (or samples), but
often conditions exist that impose restrictions so that the number of ____________
of freedom is smaller / larger [19.4]. For example, the number of ____________
of freedom in a problem involving the calculation of s is ______ [19.4, p. 332].


When the number of degrees of ____________ is ____________ (df = ∞),
the distribution of Student's t is exactly the same as that of normally distributed
z [19.3]. As the number of degrees of ____________ decreases, the characteristics
of the t distribution begin to depart from those of normally distributed
z [19.3]. When samples are large / small , the values of s_X̄ will be close to
that of ________, and t will be much like z [19.3, Paragraph 1]. Its distribution
is, consequently, very nearly normal. When sample size is large / small , the
values of s_X̄ will vary substantially about ________ [19.3]. The distribution of t
will then depart significantly from that of normally distributed z.

When the number of degrees of ____________ is less than ____________,
the theoretical distribution of t and the normal distribution of z are alike in
some ways, and different in others [19.3]. They are alike in that both distributions
have a mean of ___, are symmetrical / asymmetrical , and are unimodal /
bimodal [19.3]. The two distributions differ in that the distribution of t is
more / less leptokurtic than the normal distribution (a leptokurtic curve has a
lesser / greater concentration of area in the center and in the tails than does a
normal curve), has a smaller / larger standard deviation (remember that σ_z = ___),
and depends on the number of ____________ of freedom [19.3].

Putting t to Work

When sample size is so small that the procedures presented in the previous
chapters do not work accurately, the same procedures can still be followed with
these changes: (a) "z" should be called t, because t is the conventional name.
(b) In hypothesis testing, an obtained value of t should be evaluated not with
reference to the normal distribution, but with reference to the distribution of
t for whatever degrees of freedom are involved. For inferences regarding a single
population mean, df = n − 1. For inferences regarding two population means, df =
(n_X − 1) + (n_Y − 1) in the case of independent means and df = 1 less than the
number of pairs of scores in the case of dependent means. (c) In interval estimation,
to determine the value of t_p (the quantity analogous to z_p), not the normal
distribution but the t distribution for the appropriate degrees of freedom should
be used. The rules about degrees of freedom just cited also apply here. (d) For
both hypothesis testing and estimation in the case of two independent means, the
standard error of the difference between two means should be calculated as noted
below.
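Steps (a) and (b) for a single mean can be sketched as a short program. The ten scores and the hypothesized mean of 100 are invented for illustration; the critical value 2.262 quoted in the comment is the familiar tabled two-tailed t at α = .05 for df = 9 (the kind of entry the text's Table D supplies).

```python
from math import sqrt

def one_sample_t(scores, mu_hyp):
    """t = (X̄ − μ_hyp) / (s/√n), to be evaluated with df = n − 1."""
    n = len(scores)
    xbar = sum(scores) / n
    s = sqrt(sum((x - xbar) ** 2 for x in scores) / (n - 1))  # s estimates σ
    return (xbar - mu_hyp) / (s / sqrt(n)), n - 1

scores = [102, 98, 110, 105, 95, 108, 112, 99, 104, 107]  # invented data
t, df = one_sample_t(scores, mu_hyp=100)
# Compare |t| with the tabled critical value for df = 9 (2.262 at α = .05,
# two-tailed), not with 1.96 from the normal curve.
print(round(t, 2), df)
```

Here t ≈ 2.30 exceeds 2.262, so H₀: μ = 100 would be rejected at the .05 level; with a t between 1.96 and 2.262, the normal curve and Student's distribution would have disagreed.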

Pooling Variance Estimates When Dealing with Two Independent Means

To test the difference between two independent means, the procedure introduced
in Chapter 17 calls for the computation of an approximate z as follows:

    "z" = ________________________    [19.8, first formula]

The denominator here, s_X̄−Ȳ = √(s_X̄² + s_Ȳ²) = √((s_X²/n_X) + (s_Y²/n_Y)).

When sample size is relatively small / large , "z" is very nearly normally
distributed [19.8]. As sample size increases / decreases , its distribution departs
from the normal, but, unfortunately, neither is it distributed exactly as
Student's t [19.8], except when n_X = n_Y.

The eminent British statistician, Ronald A. Fisher, showed that a slightly
different approach to the computation of s_X̄−Ȳ results in a statistic that is
distributed as Student's t. The change that Fisher introduced was, in effect,
to assume that ___ = ___ [19.8]. This is often called the assumption of
homogeneity of ____________ [19.8]. Under the assumption, s_X² and s_Y² are
estimates of the same population variance. If this is so, then rather than make
two separate estimates, each based on a small sample, it is preferable to combine
the information from both samples and make a single ____________ estimate of the
population variance [19.8]. This estimate is called s_p² and is calculated as:

    s_p² = ________________________    [Formula 19.2]

This quantity may be substituted for ___ and for ___ in the formula for
s_X̄−Ȳ [19.8]. Some algebraic manipulation simplifies the formula to
s_X̄−Ȳ = √(s_p²(1/n_X + 1/n_Y)). The same formula should be used in setting
a confidence interval for the difference between two population means.
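Fisher's pooled approach can be sketched as follows. The two sets of eight scores are invented for illustration; the sketch pools the two sums of squares over the combined degrees of freedom (the standard form of s_p²) and then uses the simplified formula s_X̄−Ȳ = √(s_p²(1/n_X + 1/n_Y)) given just above.

```python
from math import sqrt

def pooled_t(x, y):
    """t for two independent means using the pooled variance estimate s_p²."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)        # Σx² for the first sample
    ssy = sum((v - my) ** 2 for v in y)        # Σy² for the second sample
    sp2 = (ssx + ssy) / ((nx - 1) + (ny - 1))  # single pooled estimate
    se = sqrt(sp2 * (1 / nx + 1 / ny))         # s_X̄−Ȳ
    return (mx - my) / se, (nx - 1) + (ny - 1)

x = [14, 11, 16, 12, 13, 15, 10, 14]  # invented "treatment" scores
y = [10, 9, 13, 8, 11, 12, 9, 10]     # invented "control" scores
t, df = pooled_t(x, y)
print(round(t, 2), df)  # evaluate against Student's t with df = 14
```

With equal n's, as here, the pooled formula gives the same standard error as the Chapter 17 formula; the two differ only when n_X ≠ n_Y.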

Is the assumption of homogeneity of variance usually justified? Experience
suggests that the assumption of homogeneity of variance appears to be reasonably
satisfied in only a few / many cases [19.14]. Furthermore, violation of the
assumption makes less disturbance when samples are large / small than when they
are large / small [19.14]. As a rule of thumb, it might be hazarded that moderate
departure from homogeneity of variance will have little effect when each
sample consists of ______ or more observations [19.14].

The problem created by heterogeneity of variance is minimized / maximized
when the two samples are chosen to be of equal size [19.14].

SYMBOLISM DRILL

No.  Symbol    Pronunciation   Meaning

 2   ______                    Number of scores in a population
 1   ______                    Number of scores in a sample
 3   ______                    A raw score, or the set of raw scores
 4   ______                    Result of summing quantities of some kind
 6   ______                    ΣX/N; the ______ of a ________
 5   ______                    ΣX/n; the ______ of a ________
 9   ______                    X − X̄ or X − μ; ________ score
11   ______                    Σx²/N; ________ of a ________
12   ______                    √(Σx²/N); ________ of a ________
13   ______                    Σx²/n; ________ of a ________
14   ______                    √(Σx²/n); ________ of a ________
32   ______                    Estimate of σ²; Σx²/(n−1)
23   ______                    Estimate of σ; √(Σx²/(n−1))
22   ______                    Standard error of the mean; σ/√n
24   ______                    Estimate of σ_X̄; s/√n
36   ______                    Standard error of the difference between two means
37   ______                    Estimate of σ_X̄−Ȳ
25   ______                    Null hypothesis
26   ______                    Alternative hypothesis
21   ______                    Mean of sampling distribution of means
27   ______                    Value of μ stated in null hypothesis
33   ______                    True value of μ
34   ______                    Mean of sampling distribution of differences between two means
35   ______                    Value of μ_X − μ_Y stated in null hypothesis
28   ______                    Approximate z score with denominator estimated
29   ______                    Critical value of z
40   ______    "tee"           Conventional name for "z"
38   ______                    X − Y; difference score
39   ______                    Confidence coefficient
30   ______                    Level of significance; risk of Type I error
31   ______                    Risk of Type II error
41   ______    "dee ef"        Degrees of freedom

MAP of t and RELATED CONCEPTS

t . . .

- is the conventional name for the quantity hitherto in this text called "z"

- designates [ X̄, D, or (X̄ − Ȳ) ] minus the mean of the sampling distribution
  of quantities of the given kind, divided by an estimate of the standard error
  of this sampling distribution
  (for quantities of the kind (X̄ − Ȳ), that estimate must be made by pooling
  information from the two samples, assuming that σ_X = σ_Y, when the samples
  are independent)

- is distributed under random sampling (if the assumptions reviewed in Section
  19.14 are correct) as a member of the family of Student's distributions of t;
  the sampling distribution has
  - a mean of zero
  - a shape that is unimodal, symmetrical, and leptokurtic in comparison to the
    normal distribution, but closer to normal for a larger number of ____________
  - a standard deviation greater than one, but smaller and closer to one for a
    larger number of degrees of freedom

(The number of degrees of freedom is the number of scores that are completely
free to vary.)
CHAPTER 20

INFERENCE ABOUT PEARSON CORRELATION COEFFICIENTS

20.1 Introduction
20.2 The Random Sampling Distribution of r
20.3 Testing the Hypothesis that ρ = 0
20.4 Fisher's z′ Transformation
20.5 Estimating ρ
20.6 Testing the Hypothesis of No Difference between ρ₁ and ρ₂: Independent Samples
20.7 Testing the Hypothesis of No Difference between ρ₁ and ρ₂: Dependent Samples
20.8 Concluding Comments

PROBLEMS and EXERCISES

SUMMARY

Like other statistics, the Pearsonian correlation coefficient would vary
from sample to sample were samples to be drawn repeatedly from a given population.
Thus the value characterizing a sample cannot be taken as the correlation
coefficient for the parent population, and it might not come even close to the
population value. If only a sample is available, as is usually the case, the
best we can do to learn the population value is to employ one of the techniques of
inferential statistics, hypothesis testing or estimation. These are the subject
of this chapter.

The Random Sampling Distribution of r

Consider a population of pairs of scores (X and Y) that form a bivariate
distribution. The Pearsonian correlation coefficient, calculated from the complete
set of paired scores, is ___ [20.2]. When the coefficient is calculated
from a sample, it is ___ [20.2]. If we draw a sample of given size at random
from the population, calculate r, return the sample to the population, and repeat
this operation indefinitely, the multitude of sample r's will form the ________
sampling ____________ of r for samples of the particular size [20.2]. As
we might expect, the values of r will vary more / less from sample to sample
when sample size is large [20.2].

If the sample values of r formed a normal distribution, we could proceed to
solve problems of inference by methods already familiar. Unfortunately, the
sampling distribution of r is not normal in shape. When ρ = ___, the sampling
distribution of r is symmetrical and nearly normal [20.2]. But, when ρ has a value
other than ___, the sampling distribution is skewed [20.2]. Because the
distribution of sample r's is not normal, alternative solutions must be sought
to provide a practical frame for inference. For some problems, the ________
distribution affords an appropriate model [20.2].

Testing the Hypothesis that ρ = 0

The t distribution (or rather the family of t distributions) is usable in
testing the hypothesis that ρ = 0, because a simple combination of the value of
r characterizing a given sample and the number of cases in the sample, n, has a
sampling distribution that is that for Student's t with n − 2 degrees of freedom
when ρ = 0. The combination is:

    t = ________________________    [Formula 20.1]

where r is the sample / population coefficient and n is the number of ________
of scores in the sample / population [20.3].

To test the hypothesis that ρ = 0, we calculate t according to Formula 20.1
and evaluate it, according to the ____________ level adopted and ________
degrees of freedom, with reference to the values of t found in Table D
of Appendix F [20.3]. The method can be used for a variety of levels of significance
and for one- or two-tailed tests.
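Formula 20.1 is left above for you to copy from the text; the sketch below assumes the standard form of that combination, t = r√(n − 2)/√(1 − r²). The values r = .50 and n = 27 are invented for illustration.

```python
from math import sqrt

def t_from_r(r, n):
    """Standard conversion of a sample r to t, evaluated with n − 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

t = t_from_r(r=0.50, n=27)
print(round(t, 2))  # compare with the tabled t for df = 25
```

Notice that for a fixed r, t grows with √n, which is why even a small r becomes "significant" once the sample is large enough.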


What if a given r turns out to be significantly different from zero (that is,

what if a given r permits one to reject the null hypothesis that p = 0)? In

terms of the research question posed, the finding of a significant r is only a

preliminary. The next question is whether the correlation is - enough

to be of practical or theoretical use [20.3], With small / large samples, an

unusefully small r may prove to be "statistically significant" [20.3]. Although re¬

searchers frequently draw their inferences from a value of r by testing the null

that p = 0, constructing an interval estimate of the population value offers a

number of advantages. To do so requires transforming values of r to quantities

called z' ("zee prime").

From r to z′

The transformation to z′ is accomplished via a formula invented by R. A.
Fisher, and it yields a quantity with two desirable properties:

1. The sampling distribution of z′ is approximately ____________ irrespective
of the value of ___ [20.4].

2. The standard error of z′, unlike the standard error of r, is essentially
independent of the value of ___ [20.4].

Because z′ has these properties, we can transform a value of r to z′ and then
apply inferential techniques to the z′ that use the convenient normal-curve model.
But the outcomes of the inferential techniques will still apply to the correlation
coefficients of interest.

In using the z′ transformation, reasonable results will obtain unless sample
size (n) is very large / small or ρ is very low / high [20.4, last paragraph].
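The transformation itself is easy to compute: Fisher's z′ = ½ ln[(1 + r)/(1 − r)], which is the inverse hyperbolic tangent of r, so Python supplies it directly. The r of .80 below is an invented example; the hyperbolic tangent carries a z′ back to r, the conversion the text's Table F performs by lookup.

```python
from math import atanh, tanh

r = 0.80
z_prime = atanh(r)           # Fisher's z′ = ½·ln((1 + r)/(1 − r))
print(round(z_prime, 3))     # 1.099
print(round(tanh(z_prime), 2))  # back to 0.8
```

Note that z′ stretches the scale near ±1: r's of .10 and .20 are about .10 apart in z′, while r's of .80 and .90 are about .37 apart, which is what makes the sampling distribution of z′ nearly normal even when ρ is far from zero.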

Estimating ρ

Rather than testing the hypothesis that ρ is some specific value (usually
zero), it may be desirable to ask within what limits the population coefficient
may be found. A confidence interval may be constructed by translating the sample
r to z′, and following the rule:

    z′ ± ________________________    [Formula 20.3]

where z′ is the value of z′ corresponding to the sample ___; z_p is the magnitude
of z for which the probability is ___ of obtaining a value so deviant or
more so (in either direction); p is (1 − C), where C is the ____________
____________; and σ_z′ is the standard ________ of Fisher's z′ [20.5].

The formula for the standard ________ of z′ is:

    σ_z′ = ________________________    [Formula 20.4]

Application of the above rule will result in a lower limit and an upper limit,
both expressed in terms of z′. These must then be converted to ___ by means of
Table F in Appendix F [20.5].
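Formulas 20.3 and 20.4 are left blank above for you to fill in; the sketch below assumes their standard forms, z′ ± z_p·σ_z′ with σ_z′ = 1/√(n − 3), and uses the hyperbolic tangent in place of Table F. The values r = .60 and n = 28 are invented for illustration.

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist

def rho_interval(r, n, confidence=0.95):
    """CI for ρ: transform r to z′, step out z_p·σ_z′ on each side with
    σ_z′ = 1/√(n − 3), then translate both limits back to the r scale."""
    z_prime = atanh(r)
    z_p = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = 1 / sqrt(n - 3)
    return tanh(z_prime - z_p * se), tanh(z_prime + z_p * se)

lo, hi = rho_interval(r=0.60, n=28)
print(round(lo, 2), round(hi, 2))
```

Because the transformation back to r compresses the upper side more than the lower, the resulting interval is not symmetrical about the sample r, which is exactly the skewness of the sampling distribution of r showing through.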

Testing the Hypothesis of No Difference between ρ₁ and ρ₂

Given a sample from each of two bivariate populations, 1 and 2, a satisfactory
test of the hypothesis that ρ₁ = ρ₂ is available only if the samples (and populations)
are independent. That is, there must be no logical way to pair either set
of scores in one sample with either set of scores in the other sample. The hypothesis
is tested with the same procedure used for testing a hypothesis about two
means, except that the test statistic is a z score computed as follows:

    z = ________________________    [Formula 20.5]

The denominator here is the standard error of the difference between two values of
z′, and its formula is:

    σ_z′₁−z′₂ = ________________________    [Formula 20.6]
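Formulas 20.5 and 20.6 are likewise left blank for you to fill in; this sketch assumes their standard forms, z = (z′₁ − z′₂)/σ_z′₁−z′₂ with σ_z′₁−z′₂ = √(1/(n₁ − 3) + 1/(n₂ − 3)). The two r's and the sample sizes are invented for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist

def compare_r(r1, n1, r2, n2):
    """z test of H0: ρ1 = ρ2 for two INDEPENDENT samples, via Fisher's z′."""
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # assumed standard form of σ_z′1−z′2
    z = (atanh(r1) - atanh(r2)) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

z, p = compare_r(r1=0.70, n1=53, r2=0.40, n2=53)
print(round(z, 2), round(p, 3))
```

Because the standard error of z′ depends only on n, two r's must be compared on the z′ scale, never by subtracting the raw r's.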

The Assumption of Bivariate Normality

No assumption about the ________ of the bivariate distribution is required
when the correlation coefficient is used purely as a descriptive index [20.8].
However, all of the procedures for inference about coefficients described in this
chapter are based on the assumption that the population of pairs of scores forms
a ________ bivariate distribution [20.8]. This implies that X is ____________
distributed, Y is ____________ distributed, and that the relation between X and
Y is ____________ [20.8]. If we are not dealing with a normal bivariate population,
then these procedures for inference must be considered to yield only approximate
results.

SYMBOLISM DRILL

No.  Symbol            Pronunciation   Meaning

 1   n                                 Number of scores in a ________
 2   N                                 Number of scores in a ________
 5   X̄                                 Σ___/___; the ______ of a ________
 6   μ                                 Σ___/___; the ______ of a ________
 9   x                                 ___ − ___ or ___ − ___; ________ score
13   S²                                Σ___²/___; ________ of a ________
12   σ                                 √(Σ___²/___); ________ of a ________
11   σ²                                Σ___²/___; ________ of a ________
14   S                                 √(Σ___²/___); ________ of a ________
23   s                                 Estimate of ____; ____________
32   s²                                Estimate of ____; ____________
22   σ_X̄                               ________ ________ of the ________; ___/√___
24   s_X̄                               ________ of ________; ___/√___
36   σ_X̄−Ȳ                             ____________________
37   s_X̄−Ȳ                             ____________________
21   μ_X̄                               ______ of ________ ____________ of ________
27   μ_hyp                             Value of ___ stated in ________ ____________
34   μ_X̄−Ȳ                             ____________________
35   (μ_X − μ_Y)_hyp                   Value of ________ stated in ________ ____________
25   H₀                                ____________________
26   H_A                               ____________________
15   z                                 (value − mean)/(standard deviation)
28   "z"                               ____________________
29   z_crit                            ________ value of ___
30   α                                 ________ of ____________
31   β                                 ________ of ____________
38   D                                 ____________ score
16   r                                 ____________________
17   ρ                                 ____________________
42   σ_r                               ____________________
43   z′                                Fisher's transformation of ___
44   σ_z′₁−z′₂                         ________ ________ of the ____________ between two independent z′'s
39   C                                 ____________________
40   t                                 Conventional name for ________
CHAPTER 21

SOME ASPECTS OF EXPERIMENTAL DESIGN

21.1  Introduction
21.2  Type I Error and Type II Error
21.3  The Power of a Test
21.4  Factors Affecting Type II Error: Discrepancy between the True Mean and the Hypothesized Mean
21.5  Factors Affecting Type II Error: Sample Size
21.6  Factors Affecting Type II Error: (1) Variability of the Measure; (2) Dependent Samples
21.7  Factors Affecting Type II Error: Choice of Level of Significance (α)
21.8  Factors Affecting Type II Error: One-Tailed versus Two-Tailed Tests
21.9  Summary of Factors Affecting Type II Error
21.10 Calculating the Probability of Committing a Type II Error
21.11 Estimating Sample Size for Tests of Hypotheses about Means
21.12 Some Implications of Table 21.1 and Table 21.2
21.13 The Experiment versus the In Situ Study
21.14 Hazards of the Dependent Samples Design
21.15 The Steps of an Investigation

PROBLEMS and EXERCISES

1 ___    2 ___
3 ___    4 ___
5 ___    6 ___
7 ___    8 ___
9 ___    10 ___
11 ___   12 ___

SUMMARY

A researcher faces not only statistical problems but also substantive ones,
and the two types of problem are often interrelated. The present chapter treats
some such interrelationships.

Errors in Hypothesis Testing and the Power of a Test

In testing a null hypothesis, a researcher may go wrong in either of two ways:
by committing a Type I error, or by committing a Type II error. The probability
of committing each is defined as follows:

    For a Type I error,  α = Pr(rejecting ___ | ___ is true / false );
    For a Type II error, β = Pr(accepting ___ | ___ is true / false ) [21.2].

Thus a Type II error is committed when a false null hypothesis is accepted. The
opposite occurs when a false null hypothesis is rejected. Since the probability
of the former is β, the probability of the latter is [21.3]:

    (1 − β) = Pr(rejecting H₀ | H₀ is true / false ) [21.3]

In other words, (1 − β) is the probability of claiming a significant difference
when a true difference really exists. The probability of doing so, (1 − β), is
called the ________ of the test [21.3].

There are a number of factors that affect β, and these are listed below.
Since β and power are complementary, it must be remembered that any condition
that decreases β increases / decreases the power of the test, and vice versa [21.3].

Factors Affecting Type II Error

1. The greater the discrepancy between μ_true and μ_hyp, the greater / less
the probability of falsely accepting the hypothesis [21.4]. This generalization
applies to a test of a hypothesis about the mean of a single population. For
hypotheses about the difference between the mean of a first population and the
mean of a second, the greater the discrepancy between the true difference and
the hypothesized difference (which is usually zero), the less the probability of
falsely accepting the hypothesis.

2. Other things being equal, the larger / smaller the size of the sample(s),
the lower the probability of committing a Type II error [21.5].

3. Increase in sample size reduces the risk of Type II error by reason of
its action in reducing the standard error of the mean. Since the standard error
of the mean is σ/√n, another way to make it smaller is to increase / reduce the
size of σ [21.6]. σ is the standard deviation of the set of measures, and it reflects
not only variation attributable to the factors of interest, but also variation
attributable to extraneous and irrelevant sources. Any source of extraneous
variation tends to increase / decrease σ over what it would be otherwise, so an
effective effort to eliminate such sources will tend to increase / decrease σ
and thus augment / reduce β [21.6]. In comparing means of two groups, the
independent / dependent sample design makes it possible to reduce the standard error
of the ____________ by controlling the influence of extraneous variables [21.6].

4. β is also related to the choice of α. In general, reducing the risk of
a Type I error increases / decreases the risk of committing a Type II error
[21.7]. The primary consideration in selecting α should be the logic of the experiment.
But unthinking conservatism in minimizing α will have an unnecessarily
adverse influence on ___ [21.7].

5. Other things being equal, the probability of committing a Type II error is
greater / less for a one-tailed test than for a two-tailed test [21.8].
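The interplay of these factors can be made concrete by computing the power of a two-tailed z test (the known-σ case). The function and all of the numbers below are invented illustrations, not entries from the text's Tables 21.1 and 21.2.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed_z(mu_true, mu_hyp, sigma, n, alpha=0.05):
    """Power of a two-tailed z test of H0: μ = μ_hyp when the mean is really
    μ_true; β is the complement, β = 1 − power."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu_hyp) / (sigma / sqrt(n))  # discrepancy in SE units
    # Probability that "reject" happens in either tail, given the true mean:
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

for n in (25, 100):
    print(n, round(power_two_tailed_z(105, 100, 15, n), 2))
```

With these numbers, power rises from about .38 at n = 25 to about .92 at n = 100, illustrating factor 2; raising σ, shrinking the discrepancy, or tightening α would each push power back down, as factors 1, 3, and 4 describe.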

Estimating Sample Size for Testing Hypotheses about Means

In testing hypotheses about means, convenience suggests the desirability of
a large / small sample, but accuracy suggests a large / small one [21.11]. How
large a sample is really needed? To answer this question, we must first decide
what magnitude of discrepancy between the ____________ value and the ____________
value of the parameter is so great that, if one of this size or larger existed,
we would want to be reasonably certain of discovering it [21.11]. The decision
as to just how big a discrepancy between the parameter's hypothesized value and
its true value is important is fundamentally a statistical / substantive question
and not a statistical / substantive one [21.11]. If we can specify (1) this
discrepancy and (2) the risk (β) we are willing to take of overlooking a discrepancy
of that magnitude, then we can estimate the size of the sample or samples we will
need for our research.

Section 21.11 offers two tables for estimating sample size. Close examination
of the tables reveals some important points about the design of research:

1. If it is satisfactory to discover a discrepancy only when it is large,
more / fewer cases are required [21.12].

2. If it is important to discover a discrepancy as small as one-quarter of a
standard deviation of the variable measured, sample size must be large / small
[21.12].

3. If it is acceptable to increase the risk of a Type II error, larger /
smaller sample size is needed [21.12].

4. If α is set at .01 rather than .05, a larger / smaller sample will be
required to maintain the same level of protection for β [21.12].

5. If a one-tailed test is appropriate, a larger / smaller sample will be
required to maintain the same level of protection for β [21.12].

6. If the problem involves the difference between two means rather than a
hypothesis about a single mean, approximately ________ as many cases will be
required in each of the two samples to achieve the same level of protection
against committing a Type II error [21.12].

Control in Experimentation

In the classic model of the experiment, all variables are controlled except
the one subject to inquiry. The variable to be studied is manipulated, and the
effect on the variable under observation is examined. The variable subject to
manipulation is called the independent / dependent variable, and that under
observation is called the independent / dependent variable [21.13, first paragraph].

In the basic two-group experiment, control may be achieved in a number of
ways:

1. One fundamental technique is to hold the condition of a possible interfering
factor constant for every ____________ in the study [21.13]. But there
is an important price to be paid for seeking control in this manner: the tighter
the control developed by holding many conditions constant, the more limited the
____________ of the outcome [21.13].

2. The ____________-groups design (see Section 17.12) equates subjects in
the two groups on some characteristic, rather than holding the characteristic
constant for all subjects [21.13].

3. Randomization provides another most important source of control. Random
assignment of treatment achieves control over differences that subjects may bring
to the study but still limits / without limiting generalization in the way that
would be done by holding these variables constant [21.13]. Although random
assignment of treatment conditions to subjects is a powerful experimental tool in
controlling extraneous factors, it is no cure-all. It can take care of potentially
interfering subject variables (characteristics of the subjects that might influence
the dependent variable). But it cannot control certain other types of extraneous
influence, namely those factors that vary along with the treatment from
one condition to the other. Thus conducting the study as an experiment with random
assignment of treatment conditions can be a great help in interpreting the
meaning of the outcome of the statistical test, and it means / but it does not
mean that the answer to the substantive question posed in the beginning is
automatically provided by the statistical conclusion [21.13].

Limitations of In Situ Studies

Many independent variables of potential interest are not subject to manipulation
by the experimenter. Some are unmanipulable for ethical reasons. Other
variables are unmanipulable because they are ____________ characteristics of
the organism [21.13]. To study the effect of differences in such a variable,
we identify subpopulations possessing the desired differences and compare samples
from the subpopulations.



Studies in which the element of manipulation of the independent variable is absent can still / cannot be called experiments [21.13]. The text refers to them as in situ studies. The important difference between an experiment and an in situ study is that in the latter, a significant degree of control is lost. This loss of control makes it more / less difficult to interpret the outcome of such studies [21.13].

The loss of control arises because when individuals are selected according to differences that they possess in the variable we wish to investigate, they inevitably bring with them associated differences in other dimensions. If differences in these extraneous dimensions are related to the dependent variable, we may well find that the different "treatment" groups are significantly different with regard to the dependent variable, but the origins of these differences may be so entangled that it is extremely difficult or even hopeless to sort them out. In short, it is most difficult to develop statements of causal / correlational relationship in studies of this type [21.13].

But this is not to say that in situ studies are worthless.

Hazards of the Dependent-Samples Design

The dependent-samples design can be used to good effect when it is of the variety in which matched pairs of subjects are formed with treatment conditions randomly assigned to the two members of each pair. But trouble arises in the other three versions of this design:

1. When repeated measurements are made on the same subjects, it is possible that exposure to the first treatment condition will change the subject in some way that affects his or her performance under the treatment condition assigned second. An influence of this sort is called a(n) ____ effect [21.14]. When such an effect is present and it can be assumed that the influence of one treatment upon the other is the same as that of the other upon the one, the outcome of the experiment may be interpretable, if treatment condition has been assigned ____ with regard to the order of treatment [21.14].

However, the disturbing order effect will introduce an additional source of variation in each set of scores, according to the magnitude of its influence. This tends to increase / decrease the standard error and consequently to increase / decrease the power of the test [21.14].

Some Aspects of Experimental Design 183

Furthermore, if the influence of one treatment upon the other is not the same as that of the other upon the one, ____ will be introduced as well as unwanted variation [21.14]. The outcome then becomes difficult or impossible to interpret.
2. If the design utilizes repeated observations on the same subject but assignment of the treatment condition is not random with regard to order, we are in less / even graver difficulty [21.14]. Any order effect will bias the comparison. Studies of this type may also be subject to another source of bias: the regression effect. If subjects are selected because of their extreme scores on some measure, we expect remeasurement on the same variable to yield scores closer to / farther from the mean [21.14].

3. A third troublesome case is that in which the two groups consist of different subjects matched on an extraneous but related variable, and assignment of treatment condition to members of a matched pair is nonrandom. These conditions are likely to arise when studying the effect of a nonmanipulable variable in intact groups. Such investigations fall in the category of in situ studies and are susceptible to all of the usual difficulties of such studies plus several additional hazards:
(a) Matching may increase / reduce, and therefore obscure, the influence of other important variables associated with the variable on which matching took place [21.14].

(b) When the two intact populations differ widely on the variable on which matching is done, it may be possible to form matched pairs only by using subjects who are unusual relative to others of their own kind. Under these conditions, any conclusion reached will be generalizable only to peculiarly constituted subgroups of the two target ____ [21.14].

(c) When subjects are selected for pairing because of their extreme scores on the matching variable, a regression effect may be expected. This is likely to occur in studying two intact populations that differ widely / slightly on the matching variable [21.14].
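The regression effect mentioned in hazard 2 and in point (c) can be demonstrated by simulation: observed scores are true scores plus measurement error, subjects are selected for extreme first scores, and their second scores fall back toward the mean. This is a modern Python sketch, not part of the text; all numbers are invented.

```python
import random
from statistics import mean

random.seed(7)
true_scores = [random.gauss(100, 15) for _ in range(5000)]
test1 = [t + random.gauss(0, 10) for t in true_scores]  # first measurement
test2 = [t + random.gauss(0, 10) for t in true_scores]  # independent remeasurement

# Select subjects with extremely high scores on the first measurement.
chosen = [i for i, score in enumerate(test1) if score > 130]
mean_first = mean(test1[i] for i in chosen)
mean_second = mean(test2[i] for i in chosen)

# The selected group's mean on remeasurement regresses toward 100,
# even though nothing about the subjects has changed.
```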



SYMBOLISM DRILL

 No.  Symbol  Pronunciation  Meaning

  1   ____    ____________   Number of scores in a sample
  2   ____    ____________   Number of scores in a population
  3   ____    ____________   A raw score, or the set of raw scores
  4   ____    ____________   Result of summing quantities of some kind
  5   ____    ____________   The mean of a sample
  6   ____    ____________   The mean of a population
  9   ____    ____________   Deviation score
 11   ____    ____________   Variance of a population
 12   ____    ____________   Standard deviation of a population
 13   ____    ____________   Variance of a sample
 14   ____    ____________   Standard deviation of a sample
 32   ____    ____________   Estimate of σ²
 23   ____    ____________   Estimate of σ
 22   ____    ____________   Standard error of the mean
 24   ____    ____________   Estimate of σX̄
 36   ____    ____________   Standard error of the difference between two means
 37   ____    ____________   Estimate of the standard error of the difference between two means
 25   ____    ____________   Null hypothesis
 26   ____    ____________   Alternative hypothesis
 21   ____    ____________   Mean of sampling distribution of means
 27   ____    ____________   Value of μ stated in null hypothesis
 33   ____    ____________   True value of μ
 34   ____    ____________   Mean of sampling distribution of differences between two means
 35   ____    ____________   Value of μX - μY stated in null hypothesis
 15   ____    ____________   (value - mean)/(standard deviation)
 28   ____    ____________   Approximate z score with denominator estimated
 29   ____    ____________   Critical value of z
 30   ____    ____________   Level of significance; risk of Type I error
 31   ____    ____________   Risk of Type II error
 38   ____    ____________   X - Y; difference score
 16   ____    ____________   Pearson correlation coefficient for a sample
 17   ____    ____________   Pearson correlation coefficient for a pop'n
 42   ____    ____________   Standard error of r
 43   ____    ____________   Fisher's transformation of r
 44   ____    ____________   Standard error of z'
 45   ____    ____________   Standard error of the difference between two independent z''s
 39   ____    ____________   Confidence coefficient
 40   ____    ____________   Conventional name for "z"
 41   ____    ____________   Degrees of freedom
ELEMENTARY ANALYSIS OF VARIANCE

____ 22.1  Introduction
____ 22.2  One-Way Analysis of Variance: The Hypothesis
____ 22.3  The Effect of Differential Treatment on Subgroup Means
____ 22.4  Measures of Variation: Three Sources
____ 22.5  Within-Groups and Among-Groups Variance Estimates
____ 22.6  Partition of Sums of Squares and Degrees of Freedom
____ 22.7  Raw Score Formulas for Analysis of Variance
____ 22.8  The F Distribution
____ 22.9  Comparing sW² and sA² according to the F Test
____ 22.10 Review of Assumptions
____ 22.11 Two-Way Analysis of Variance
____ 22.12 The Problem of Unequal Numbers of Scores
____ 22.13 A Problem in Two-Way Analysis of Variance
____ 22.14 Partition of the Sum of Squares for Two-Way ANOVA
____ 22.15 Degrees of Freedom in Two-Way Analysis of Variance
____ 22.16 Completing the Analysis
____ 22.17 Studying the Outcome of Two-Way ANOVA
____ 22.18 Interaction and the Interpretation of Main Effects
____ 22.19 Alternatives to the General F Test for a Treatment Effect
____ 22.20 Constructing a Comparison
____ 22.21 Standard Error of a Comparison
____ 22.22 Evaluating a Planned Comparison
____ 22.23 Constructing Independent Comparisons
____ 22.24 Evaluating a Post Hoc Comparison
PROBLEMS and EXERCISES

 1 ____________________
 2 ____________________
 3 ____________________
 4 ____________________
 5 ____________________
 6 ____________________
 7 ____________________
 8 ____________________
 9 ____________________
10 ____________________
11 ____________________
12 ____________________
13 ____________________
14 ____________________
15 ____________________
16 ____________________
17 ____________________

SUMMARY

Analysis of variance is a technique of inference applicable to many studies in which quantitative data are collected in two or more conditions. The text describes two varieties, "one-way" and "two-way."

One-Way Analysis of Variance

Terminology

In analysis of variance, an independent variable is known as a ____, and the varied conditions of an independent variable are known as ____ of that ____ [22.11, fourth sentence]. One-way analysis of variance is appropriate for a study in which there is just one treatment. The number of conditions (levels) of that treatment is symbolized k, and k may be 2 or some larger number. Subjects are assigned independently and ____ to the k treatment conditions [22.2, second paragraph]. The conditions are identified as D, E, F, and so on, and the individuals subjected to a given condition, in reality (those in the sample actually studied) or hypothetically (those in the parent population), are called a subgroup.

The Hypotheses

If the different treatment applied to the subgroups has no differential effect on the variable under observation, then we may expect these subgroup population means to be ____ [22.2]. To inquire as to whether variation in treatment made a difference, we therefore test the null hypothesis:

H0: ____________________

against the alternative that they are ____ in some way [22.2]. In testing the hypothesis of no difference between two means, a distinction was made between directional and nondirectional null / alternative hypotheses [22.2]. Such a distinction still / no longer makes sense when the number of subgroups exceeds two [22.2]. In the multigroup analysis of variance, H0 may be false in only one way / in any one of a number of ways [22.2].

In spite of this difference, one-way analysis of variance is very closely related to the t test of the difference between two dependent / independent means [22.2, second paragraph]. In fact, the outcome of analysis of variance applied to the special case of two subgroups is identical with that of the t test. Like the t test, the analysis of variance is suited to samples only of large size / of any size [22.2].

The General Formula for an Unbiased Estimate of a Population Variance

True to its name, analysis of variance is concerned with the ____ as a measure of variability [22.5].* An unbiased estimate of a population variance is made by calculating the sum of the ____ of the deviation of each score from the sample / population mean, and dividing by the number of degrees of ____ associated with that sum of squares [22.5], which will be one less than the number of scores. This general relationship can be summarized by the equation:

    s² = ________

where the letters ____ stand for "sum of squares (of deviation scores)" [22.5].
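To see the relationship s² = SS/df in numbers, here is a small Python check, not part of the text; the scores are invented, and Python's statistics.variance applies the same n - 1 divisor.

```python
from statistics import mean, variance

scores = [4, 7, 5, 9, 10]  # invented sample data

x_bar = mean(scores)
SS = sum((x - x_bar) ** 2 for x in scores)  # sum of squares of deviation scores
df = len(scores) - 1                        # degrees of freedom: one less than n
s2 = SS / df                                # unbiased estimate of the population variance

# statistics.variance uses the same n - 1 divisor.
assert s2 == variance(scores)
```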

Homogeneity of Variance and the Within-Groups Estimate of σ²

The analysis makes the assumption that the several subgroup populations all have the same variance, which is symbolized σ². An estimate of σ² could be made from any one of the subgroup samples by taking the sum of ____ of the deviations of scores in that group from the subgroup / grand mean and dividing by the appropriate number of degrees of ____ [22.5]. However, on the assumption that the subgroup population variances are the same for all subgroups (the assumption of ____ of variance), a better estimate may be made by combining information from these several subgroup samples [22.5]. Such an estimate may be made by pooling the sums of squares of deviation scores from the several subgroups and dividing by the sum of the degrees of freedom characterizing each of the subgroups. This estimate is called the within-groups / among-groups variance estimate; we shall symbolize it by ____ [22.5]. The formula is:

    sW² = [Σ(XD - X̄D)² + Σ(XE - X̄E)² + ...] / [(nD - 1) + (nE - 1) + ...]   [Formula 22.1]

where XD is a score in subgroup sample D, etc.; X̄D is the ____ of subgroup sample D, etc.; and nD is the ____ of elements in subgroup D, etc. [22.5]. The numerator of this expression is called the within-groups / among-groups sum of ____ (SSW) and the denominator the within-groups / among-groups degrees of ____ (dfW) [22.5].

In practice, sW² is more easily computed by Formula 22.5 on p. 399.

*If the difficult material in this chapter has not totally destroyed your sense of humor, you should have recognized this question as analogous to Groucho Marx's favorite on You Bet Your Life: "Who is buried in Grant's tomb?" But the proper answer to Groucho's question is not the obvious one, for the tomb actually holds both Grant and Grant's wife. Similarly, the quantity that is analyzed (that is, decomposed) in the analysis of variance is not exactly a variance; it is the sum of the squared deviation scores that contributes to an estimate of the population variance derived from the total set of scores on hand, as the text explains in Section 22.6.
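The pooling described by Formula 22.1 can be sketched in Python. This is an illustration of mine, not the text's; the subgroup scores are invented, and unequal ns are fine.

```python
from statistics import mean

# Invented scores for three subgroup samples D, E, and F.
groups = [[3, 5, 4], [6, 8, 7, 7], [5, 6, 7]]

SS_W = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)  # pooled sum of squares
df_W = sum(len(g) - 1 for g in groups)                          # pooled degrees of freedom
s2_W = SS_W / df_W  # within-groups variance estimate (Formula 22.1)
```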

The Among-Groups Estimate of σ²

If the null hypothesis is true (that is, if there is no treatment effect), it is possible to derive another estimate of σ², independent of the within-groups estimate, solely from the means of the k subgroup samples.

First, these means are treated as though they were raw scores, and the variance of the population from which they came (the population of subgroup sample means) is estimated in the usual fashion: find the mean of all the "scores" (which here is the mean of the sample means, and this will be the same as the grand mean of all raw scores); for each "score," find its deviation from this mean (here, find the deviation between each sample mean and the grand mean); square each deviation; sum the squares of the deviations; and divide the sum by one less than the number of "scores" (which is k - 1 here). The symbols describing these operations are:

    Σ(X̄ - X̿)² / (k - 1)

The k over the summation sign indicates that there are k quantities to be summed; each has the form (X̄ - X̿)².

Now the variance estimated in this way is the variance of the population of subgroup sample means, not the variance that we assume to characterize each of the subgroup populations of raw scores. But from the former we can estimate the latter, using the following reasoning:

An estimate of the standard error of a mean, sX̄, is computed from the formula sX̄ = s/√n. Squaring both sides of this equation we have: ________ [22.5, paragraph 4]. Solving this equation for s², it reads:

    s² = ________ [Section 22.5]

But sX̄² is the quantity that we just computed; that is, it is the estimate of the variance of the population of subgroup sample means that we derived from the means themselves. We computed it directly here, whereas earlier in this course we always computed it by following the formula s²/n. So we can take our value for sX̄² and multiply it by n to get an estimate of the variance that we assume to be common to each population of raw scores.
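The key relation in this reasoning, that the variance of sample means is the variance of the raw scores divided by n, can be checked by simulation. This is an illustrative sketch of mine, not the text's; the population, sample size, and number of replications are arbitrary choices.

```python
import random
from statistics import mean, pvariance

random.seed(1)
population = [random.gauss(50, 10) for _ in range(100_000)]
sigma2 = pvariance(population)  # true variance of this population, near 100

n = 25
sample_means = [mean(random.sample(population, n)) for _ in range(4000)]
var_of_means = pvariance(sample_means)

# The variance of the sample means comes out close to sigma2 / n,
# so multiplying it by n recovers an estimate of sigma2.
recovered = n * var_of_means
```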

This estimate is called the within-groups / among-groups estimate and is symbolized by ____ [22.5, paragraph 4]. When the subgroup samples are of equal sizes, the formula for this estimate, in deviation score form, is:

    sA² = SSA/dfA = nΣ(X̄ - X̿)² / (k - 1)   [Formula 22.2]

where ____ is the mean of a subgroup sample; ____ is the mean of the combined distribution of scores (____ mean), k is the number of ____, and n is the number of scores in each subgroup sample [22.5]. The numerator of this formula is called the within-groups / among-groups sum of ____ (SSA), and the denominator is called the within-groups / among-groups degrees of ____ (dfA) [22.5].

When the subgroup samples are of unequal sizes, a slightly different formula is required:

    sA² = Σni(X̄i - X̿)² / (k - 1)   [Formula 22.3]

where ____ is the number of scores in the ith subgroup sample and ____ is the mean of the ith subgroup sample [22.5].

In practice, SSA is more easily computed by Formula 22.6 on p. 399.
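A numerical sketch of the among-groups estimate for the equal-n case of Formula 22.2, with invented subgroup samples; the Python is mine, not the text's.

```python
from statistics import mean

# Invented subgroup samples of equal size n = 4.
groups = [[3, 5, 4, 6], [6, 8, 7, 7], [5, 6, 7, 6]]
n = len(groups[0])
k = len(groups)

grand_mean = mean(x for g in groups for x in g)
SS_A = n * sum((mean(g) - grand_mean) ** 2 for g in groups)  # among-groups sum of squares
df_A = k - 1                                                 # among-groups degrees of freedom
s2_A = SS_A / df_A  # among-groups variance estimate (Formula 22.2)
```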

If there is no treatment effect, subgroup sample means will tend to cluster about ____ as predicted by the standard error of the mean, and sA² will be an unbiased estimate of inherent variation, σ² [22.5, p. 396]. It will, therefore, estimate the same quantity as that estimated by ____ [22.5, p. 396]. On the other hand, if there is a treatment effect, the sum of squares of the deviations of X̄ about X̿ will tend to be larger / smaller, and sA² will tend to be larger / smaller than sW² [22.5, p. 396].


Comparing sA² and sW²

To compare the two estimates of σ², sA² and sW², we form them into a quantity called an F ratio: F = sA²/sW². If the null hypothesis is true, which means that there is no treatment effect, the top and the bottom of this ratio will have about the same value, so the value of F will be about one. But if the null is false and there is a treatment effect, sA² will tend to be larger than sW², as noted in the paragraph just above, and now the value of F will tend to be greater than one.

But even if the null hypothesis were true, it would be possible for F to be greater than one, even considerably greater, just because of sampling variation. The best we can do is to determine whether a given value of F is likely or unlikely to occur should the null hypothesis be true.

We thus need to know the sampling distribution of F when the null is true. This depends on the number of degrees of freedom associated with sA² and on the number of degrees of freedom associated with sW². Table H in the back of the text shows selected values from the various members of the family of F distributions. Hypothetically, the values of F that make up a sampling distribution could be generated by repeatedly replicating a given experiment: from the same populations, draw samples, each of whatever size was originally used, and compute sA², sW², and their ratio, F, for each replication. The null hypothesis must remain true throughout the replications.

To test the hypothesis that μD = μE = μF, and so on for all k subgroup populations, we must compare the calculated value of F with the values of F that would occur through random sampling if the hypothesis were true / false [22.9]. Now if sA² is always placed in the numerator / denominator of F, as is customary, the hypothesis of equality of subgroup population means will be rejected only if the calculated value of F is larger / smaller than expected [22.9]. Consequently, the region of rejection is placed entirely in the upper / lower tail of the F distribution [22.9].
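Putting the two estimates together, here is a complete one-way analysis on invented data; the Python is mine, not the text's. The critical value 4.26 is the approximate tabled 5% point of F for 2 and 9 degrees of freedom (consult Table H for exact values), and the sketch also checks the partition SStotal = SSA + SSW described in Section 22.6.

```python
from statistics import mean

groups = [[3, 5, 4, 6], [6, 8, 7, 7], [5, 6, 7, 6]]  # invented data: k = 3, n = 4 each
k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = mean(x for g in groups for x in g)

SS_W = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
df_W = n_total - k
SS_A = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
df_A = k - 1

F = (SS_A / df_A) / (SS_W / df_W)

# Partition of the total sum of squares (Section 22.6): SS_total = SS_A + SS_W.
SS_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

# Upper tail only: reject H0 when F exceeds the tabled critical value,
# about 4.26 for df = 2 and 9 at the .05 level.
reject = F > 4.26
```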

Assumptions of the Analysis of Variance

For the procedure presented here to be entirely correct, several assumptions must be satisfied:

1. The subgroup populations are ____ distributed [22.10].

2. Samples are drawn ____ [22.10].

3. Selection of elements comprising any subgroup sample is ____ of selection of elements of any other subgroup sample [22.10].

4. The ____ of the several subgroup populations are the same for all subgroups (homogeneity of ____) [22.10].

As with the t test for independent means, moderate departure from conditions specified in the first and fourth requirements will not unduly disturb the outcome of the test. Resistance to such disturbance is enhanced when sample size rises / falls [22.10].

If there is a choice, it is desirable to select samples of unequal / equal sizes for the subgroups [22.10].

[The SUMMARY continues after the following section.]

SPECIAL HELP with ONE-WAY ANALYSIS of VARIANCE

If you're now lost in the thicket of details, it may help to walk out onto a hill to look at the big picture. You can then go down into the details again when you see where they fit in.

Overview of the One-Way Analysis of Variance

There are k conditions (k = 2 or more), called D, E, F, and so on, each representing one level of a certain independent variable, which is called a treatment. (The levels might be different doses of a drug or different methods of instruction.) In condition D we have scores on some dependent variable for a sample of nD subjects, and their mean is X̄D; in condition E we have scores on the dependent variable for a sample of nE subjects, and their mean is X̄E; and so on.

Each sample is drawn at random from its parent population, which is the hypothetical set of scores for all individuals who could have been subjected to the given condition. The samples are independent of one another, in that each subject is observed in only one condition, and there is no logical way to pair the scores in any condition with the scores in any other condition. The n's may thus be unequal.

If the treatment has influenced the dependent variable, at least one population would have a mean different from that of the other populations, so our substantive question turns into a statistical one: Is it plausible that all populations have the same mean?

1. H0: μD = μE = μF, and so on. As usual, H0 is a statement about the population parameters.

HA: H0 is false. The null can be false in more than one way, depending on which population means differ, so the distinction between one-tailed and two-tailed tests no longer makes sense.

2. Choose a level of significance, α, to separate the values of the test statistic we will regard as unlikely from those we will regard as likely.



3. Make two special assumptions:

(a) σD² = σE² = σF², and so on. The number that is assumed here to be the variance common to the populations is called σ², with no subscript; σ² measures the variation inherent in each population of scores.

(b) All populations are normally distributed.

If these assumptions are only moderately violated, the outcome of the test of the null hypothesis will not be seriously wrong.

4. Look at the samples on hand, and compute a number called F. This is a statistic characterizing the samples. When H0 is true, F tends to be about 1.0, but when H0 is false, F tends to be greater than 1.0.

5. Assume that H0 is true and ask whether it would then be likely or unlikely for an F value of the obtained size (or larger) to occur. "Unlikely" means the probability is less than the α level; "likely" means the probability is greater than α. Determining whether the obtained F is likely or unlikely when the null is true requires consulting the sampling distribution of F when the null is true. This distribution could be generated, hypothetically, by replicating our study an indefinitely large number of times when the null is true. nD, nE, and so on must remain the same. A value of F is computed for each replication; it will vary from replication to replication but will be about 1.0 most of the time.

6a. If the obtained F value is a likely one (that is, if its probability when the null is true exceeds α), accept the null hypothesis, in the sense of failing to reject it. It must be retained as a possibility, though it is not necessarily likely to be true.

6b. If the obtained F value is an unlikely one (that is, if its probability when the null is true is less than α), reject the null hypothesis in favor of the alternative. The F value is said to be significant in this case. It is then appropriate to snoop around in the data to locate the differences among the population means. (The techniques for snooping are fancy tests such as the comparisons treated in Sections 22.19 through 22.24.)

Note that a significant F value provides only some evidence that the treatment influenced the dependent variable; the statistics alone cannot tell us whether this is so.
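The hypothetical replication scheme of Step 5 can be carried out literally on a computer, a luxury the text does not assume. The simulation below is my own sketch: all three samples are drawn from one normal population, so H0 is true by construction; the population, sample sizes, and seed are arbitrary, and 3.35 is the approximate tabled 5% point of F for 2 and 27 degrees of freedom.

```python
import random
from statistics import mean

def one_way_F(groups):
    """F ratio: among-groups estimate over within-groups estimate."""
    k = len(groups)
    grand = mean(x for g in groups for x in g)
    ss_a = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_w = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_a / (k - 1)) / (ss_w / sum(len(g) - 1 for g in groups))

random.seed(3)
# Replicate the study 2000 times with H0 true: all k = 3 populations identical.
fs = [one_way_F([[random.gauss(0, 1) for _ in range(10)] for _ in range(3)])
      for _ in range(2000)]

avg_f = mean(fs)                                   # hovers near 1 when H0 is true
frac_beyond = sum(f > 3.35 for f in fs) / len(fs)  # about .05 land beyond the 5% point
```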

Details of the F Statistic

If you now see the big picture clearly, you're ready for a review of the details of the F statistic mentioned in Step 4 above:

4a. An F value is a ratio, the result of dividing one number by another. The top and the bottom of the ratio are both estimates of σ², the inherent variance assumed in Step 3a above. The bottom estimate is called the within-groups estimate, sW²; the top estimate is called the among-groups estimate, sA².

    F = (among-groups estimate of σ²) / (within-groups estimate of σ²) = sA²/sW²

4b. The estimate on the bottom of the ratio takes each of the k samples in turn and looks within it for information about the variability of the scores in the population from which it was drawn. Since each sample's parent population has the same variance, according to the assumption in Step 3a above, an especially good estimate of this variance can be made by pooling the information from within sample D, the information from within sample E, and so on. This pooling is done as a fancy kind of averaging: information about the variability of population D, which is derived from the data within sample D, is averaged with information about the variability of population E derived from within sample E, and so on. The number thus produced is not influenced by the dispersion among the samples; in particular, it is not influenced by the variation among the X̄s, because it never compares the X̄s.

    F = (among-groups estimate of σ²) / (fancy average of information about σ² taken from within each of the samples)

4c. The estimate on the top of the F ratio completely ignores the information about σ² that is available within each of the samples; it does not consider the values of the individual scores (the Xs). Rather it looks only at the variability among the samples, taking the sample means as measures of the locations of the samples.

From the variability among the sample means, it figures an estimate of the variance of the population of such means. This variance is not σ²; rather it is something analogous to σX̄, the standard error of a sampling distribution composed of sample means. In fact, the estimate of the variance of the population of sample means can be properly symbolized sX̄².

Now, when the null hypothesis is true, sX̄² reflects only the variation inherent in each of the populations, σ². Think this through. If the null is true, all populations have the same mean. We are assuming, moreover, that each has the same inherent variation among its scores, variation measured by the number σ², and that each is normally distributed. So if the null is true, we are, in effect, drawing each sample from the same population. Why, then, should the various samples turn out to have different means? Only because of the variation among the scores in the population. This fact, the fact that when the null is true sX̄² reflects only the variation inherent in a population of scores, means that we can fuss with sX̄² a bit and turn it into an estimate of the inherent variation, an estimate of σ².

But we must remember that the estimate will be predicated on the assumption that the null hypothesis is true. If the null is false, the several populations don't all have the same mean; at least one is different from the others. A sample mean will tend to fall where its parent population mean falls, of course, so when the null is false the spread among the sample means will reflect not only the inherent variation in a population of scores, but also the spread among the population means. The estimate of σ² derived from the sample means will then probably be too large. It will tend to exceed the within-groups estimate, and the F ratio will yield an F greater than 1.0.
Details of the Within-Groups Estimate of σ²

Back down in the thicket, here are some details of Section 4b above. (If you need orientation, go back to the big picture, to the Overview, and reread Section 4b first.) The within-groups estimate of σ² is a fancy average of quantities of the familiar kind Σx²/(n - 1).
The variance of a complete set of scores is the mean squared deviation, remember. (Check the last paragraph on p. 54 of this workbook.) A mean is the sum of some values divided by the number of values, and Σx²/N (or n) is thus the mean of a collection of squared deviation scores, x²s.

To estimate the variance of a population from a sample drawn from that population, though, we divide the sum of the squared deviation scores not by n but by n - 1. (See Section 19.4 on this matter.) The estimate is symbolized s² (as distinct from S², the variance of the sample itself) and it estimates σ² (the true variance of the population).


Now Chapter 22 gives the formula for s² in a novel form and introduces some special terms to describe its parts. First, x is written as (X - X̄). For sample D this is specifically (XD - X̄D), for sample E (XE - X̄E), and so on. (To distinguish the means of the several samples, each carries a subscript, just in case they're not all equal.) Second, the term sum of squares, abbreviated SS, is used for a quantity of the kind Σ(X - X̄)². The squares whose sum this expression is talking about are really squares of deviations, of course: the squares of the deviations between each raw score X and its mean, X̄. Third, the n - 1 on the bottom of the formula Σ(X - X̄)²/(n - 1) is described as the degrees of freedom for this estimate of the variance of the population. Degrees of freedom is abbreviated df, and the matter is explained in Sections 19.4 and 19.8 (starting with the last sentence on p. 336 in the latter section).
Finally, the averaging: From sample D we estimate σD², its parent population's variance, by computing Σ(XD - X̄D)²/(nD - 1); from sample E we estimate σE² by computing Σ(XE - X̄E)²/(nE - 1); and so on for each sample. Each of the estimates uses only information from within a single sample, note again; only the raw scores within the sample, their mean, and their number go into the estimate. The estimates are then averaged in a fancy way: (a) The tops of the estimates, the quantities of the kind Σ(X - X̄)², are added together. (b) The bottoms of the estimates, the quantities of the kind (n - 1), are added together. (c) The sum of the tops is divided by the sum of the bottoms. These operations are summarized in Formula 22.1 on p. 395 of the text, which also reveals that the sum of the tops is called the within-groups sum of squares, SSW, while the sum of the bottoms is called the within-groups degrees of freedom, dfW.

Though the text doesn't say so, SSW = SSD + SSE + SSF, and so on, while dfW = dfD + dfE + dfF, and so on.

For an example of these calculations, consult the example on p. 397 of the text. In the arithmetic for SSW, pay attention to the square brackets, the "[" and the "]", which enclose the operations symbolized Σ(X - X̄)² for each sample.

Details of the Among-Groups Estimate of σ²

If you're disoriented again, go back to the big picture on p. 194 and reread through Section 4c on p. 196, but skip Section 4b this time. Following are some details of 4c.

As noted in the next-to-the-last paragraph on the previous page, to estimate the variance of a population, we compute from a sample a quantity called s² whose formula is Σ(X - X̄)²/(n - 1). What if the elements of the population are not raw scores (not Xs), but sample means, X̄s? We can still use the formula. Replace X with X̄ and call the mean of the X̄s X̿ ("eks double-bar"). Replace n, the number of X̄s, with k, the number of sample means. The formula becomes Σ(X̄ - X̿)²/(k - 1), which appears in the middle of p. 395. This is the formula for what was called sX̄² in Section 4c above.
X

If the null hypothesis in the analysis of variance is true, sX̄² reflects only the variation inherent in each population of scores, only σ², as we saw in Section 4c. How, then, could we estimate σ² from sX̄²? Remember the formula for the standard error of the mean: σX̄ = σ/√n. When we have not σ but an estimate of it, s, the formula becomes: sX̄ = s/√n. Squaring both sides turns the equation into one concerned with variances rather than standard deviations or errors. Squaring tells us that an estimate of the variance of a population of X̄s can be had by dividing an estimate of the variance of the population of raw scores by the sample size. That is, the estimate of the variance of the raw scores must be scaled down through division by n. We already have an estimate of the variance of a population of X̄s; this is sX̄². So to derive an estimate of the variance of the population of raw scores, we have to scale sX̄² up by multiplying it by n.


And when the ns of the several samples are equal, that's all there is to the formula for the among-groups estimate of σ²: n, the common sample size, is used as a multiplier for Σ(X̄ - X̿)²/(k - 1), that is, as a multiplier for our friend sX̄². This gives Formula 22.2 on p. 395.

When the ns are not all equal, a slight modification is required, as incorporated in Formula 22.3. The example on p. 397 is a case of unequal ns.

In either case, the product of sample size and Σ(X̄ - X̿)² is called the among-groups sum of squares, SSA, and the rest of the formula, k - 1, is the among-groups degrees of freedom, dfA.

Disoriented again? Back to the big picture on p. 194. It's always there if you get lost in details, but at least you should now see where the details fit.

SUMMARY Continued

TWO-WAY ANALYSIS of VARIANCE

Two-way analysis of variance is applicable to a study in which two different treatments are varied simultaneously, with each combination of a level of one treatment and a level of the other applied to a different sample of scores. Three substantive questions arise: Did the first treatment influence the variable under observation? Did the second treatment influence that variable? And did the two treatments interact in influencing it?

The question about interaction is new. In general, the question of interaction between two treatments may be phrased this way: Whatever the difference among the several levels of one treatment, is it the same for each of the ____ of the other ____? [22.11, p. 406]

Variance Estimates

Each of the substantive questions gives rise to a statistical question. For the first, the statistical question is the plausibility of a null hypothesis asserting that μC1 = μC2 = μC3, and so on, where C1 names the treatment level represented by the first column in the table, C2 names the treatment level represented by the second column, and so on. For the second substantive question, the statistical question is the plausibility of another null hypothesis, this one asserting that μR1 = μR2 = μR3, and so on, where R1 names the treatment level represented by the first row, R2 names the treatment level represented by the second row, and so on. For the third substantive question, the statistical question is the plausibility of a third null hypothesis, and this one asserts that there is no interaction between the two treatments in the pattern of the population means for the individual cells of the table.

Each null hypothesis is tested by computing an F statistic, as in the one-way


analysis of variance, and each F is again formed by dividing one variance estimate
by another. For the three Fs, four variance estimates are needed.

1. s_WC² (within-cells estimate), derived from the variation among the scores
in the first cell, the variation among the scores in the second cell, and so forth.
This measure is of interest because it is free from the influence of possible
differences between columns (column ________), possible differences between
rows (row ________), and also any interaction effect, if present [22.13]. It
therefore measures only inherent variation, and is analogous to ________ in one-way
ANOVA [22.13].
2. s_C² (________ estimate), derived from the differences between column
____________ [22.13]. If the null hypothesis about the population values of the
columns is correct, variation among column means (X̄_C1, X̄_C2, and so on) will be
affected only by inherent variation. Under these circumstances, s_C² will estimate
the same quantity estimated by ________ [22.13]. If the null is false, s_C² will
tend to be larger than otherwise. It is therefore analogous to ________ in one-way
ANOVA [22.13].

3. s_R² (________ estimate), derived from the differences between row
____________ [22.13]. If the null hypothesis about the population values of the
rows is correct, variation among row means (X̄_R1, X̄_R2, and so on) will be
affected only by inherent variation. Under these circumstances, s_R² will estimate
the same quantity estimated by s_WC². If the null is false, s_R² will tend to be
larger than otherwise. It is therefore just like ________, except that it is
sensitive to row effect rather than to column effect [22.13].

4. s_RxC² (____________ estimate), derived from the discrepancy between
the means of the several columns / rows / cells and the values predicted for
each on the assumption of no interaction at the population / sample level
[22.13]. If there is no interaction, s_RxC² will be responsive only to ________
variation and will estimate the same quantity estimated by ________ [22.13].
If interaction is present, s_RxC² will respond to it and will therefore tend to
be larger / smaller [22.13].


Each of the three variance estimates s_C², s_R², and s_RxC² is responsive (1) to
the presence of the effect for which it is named (________ effect, ________
effect, and ________ effect), and (2) to ________ variation [22.13]. s_WC²,
on the other hand, is responsive only to inherent variation [22.13]. When the
first three estimates are at hand, Fs may be formed by placing each in turn in
the numerator / denominator and s_WC² in the numerator / denominator [22.13]. A
significantly small / large F will then serve as an indicator of the presence
of the effect specially associated with the kind of estimate placed in the
numerator / denominator [22.13].

Formulas for computing the variance estimates appear in Section 22.14.

Degrees of Freedom for the Variance Estimates


Let C equal the number of columns, R equal the number of rows, and n_WC equal
the number of scores within each cell (the ns for the cells are assumed to be
equal). Since there are C deviations involved in the computation of s_C², df_C =
________ [22.15]; similarly, df_R = ________. In computing s_WC², we consider the
deviation of each score in a cell from the cell mean. Consequently, each cell
contributes ________ degrees of freedom, and df_WC = ________ [22.15]. Finally,
df_RxC = ________ [22.15].
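The degrees-of-freedom bookkeeping can be checked with a short sketch (the design sizes are hypothetical): the component dfs for columns, rows, interaction, and within cells must account for every degree of freedom in the data.

```python
# Degrees of freedom in a two-way ANOVA with equal cell ns.
# Hypothetical design: C = 3 columns, R = 2 rows, n_wc = 5 scores per cell.
C, R, n_wc = 3, 2, 5

df_c = C - 1                      # columns
df_r = R - 1                      # rows
df_rxc = (R - 1) * (C - 1)        # interaction
df_wc = R * C * (n_wc - 1)        # within cells: each cell contributes n_wc - 1
df_t = R * C * n_wc - 1           # total

# The component dfs sum to the total df.
assert df_c + df_r + df_rxc + df_wc == df_t
```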

Main Effects and Interaction


The column effect and the row effect are called ________ effects [22.18]. The
interpretation of a significant main effect is clear when there is no significant
interaction. However, when significant ____________ is present, the
meaning of tests for main effects may be clouded [22.18]. It is particularly
important to study the several values of the column / row / cell means when the
test for interaction is significant [22.18].



COMPARISONS

The F test for a treatment (the test for the one treatment in a one-way

analysis of variance or the test for either treatment in a two-way) examines

the hypothesis of no difference among population means for all subgroups of

the treatment. Often it is more interesting to inquire about a certain pattern

among the subgroup means than to ask the one overall question, though. Some¬

times the logic of the study will suggest the particular comparisons to be made,

and if so we will know in advance what comparisons would interest us. Compar¬

isons chosen this way are called planned / post hoc comparisons [22.19]. On

other occasions, comparisons come to our attention only on inspection of the

data. Such comparisons are known as planned / post hoc comparisons [22.19].

The same / A different strategy is desirable for examining post hoc compari-
sons as / than for evaluating planned comparisons [22.19]. The way the compar-
ison is constructed is the same for both planned and post hoc comparisons; the
difference is in the way the comparison is ________ [22.19].

Constructing a Comparison

A comparison is constructed from the means of the subgroup samples in such


a way that:

1. In a comparison, two / two or more quantities are contrasted with each


other [22.20];

2. each term in the comparison is multiplied / divided by a coefficient


[22.20]; and

3. the total of the positive coefficients equals / exceeds the total of the

negative coefficients [22.20].

In general, a comparison, K, may be expressed as:

    K = a_1X̄_1 + a_2X̄_2 + ... + a_kX̄_k        [Formula 22.14]

where a_1, a_2, etc., are the ________ for the several ________
of the treatment, and ____ is the number of levels of the particular treatment
[22.20]. If some levels are not included in the comparison, the coefficients of
the means of these subgroups are assigned the value ________ [22.20].
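As a concrete illustration of the three rules above (the subgroup means and coefficients below are hypothetical, invented for the sketch):

```python
# A comparison K among subgroup means.
means = [10.0, 12.0, 15.0, 17.0]        # hypothetical subgroup means

# Contrast the first two levels against the last two: the positive
# coefficients total +1 and the negative coefficients total -1.
coeffs = [0.5, 0.5, -0.5, -0.5]
assert sum(c for c in coeffs if c > 0) == -sum(c for c in coeffs if c < 0)

# Each mean is multiplied by its coefficient; the terms are then summed.
K = sum(a * m for a, m in zip(coeffs, means))
```

Here K estimates how far the average of the first two population means falls from the average of the last two.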



Constructing Independent Comparisons


If possible, it is highly desirable to construct comparisons that are inde¬

pendent of each other. From among the k means of the levels of a treatment one

may construct a set of ________ comparisons that are nearly mutually independent

[22.23]. When subgroup sample size is equal, it can be done as follows.

1. Construct the first comparison using two or more / all levels of the

treatment [22.23].
2. The second comparison must be constructed wholly from subgroups that fall

on one side of the first comparison. Again, use two or more / all available

subgroups [22.23].
3. Construct the third comparison by applying the procedure of step 2 to the

comparison just obtained.


iI j comparisons
Comparisons constructed this way are called ------

[22.23]. Adequacy of the procedure rests on reasonable approximation to the


. distributed with equal
assumptions that subgroup populations are ---
, , r?2 23]. Moderate depar-
__ and that subgroup n s are ____ L^z* J

ture from these conditions will not be crucial.
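The splitting procedure can be sketched for k = 4 levels with equal ns. The check used below, that two comparisons are independent when the products of their corresponding coefficients sum to zero, is the standard orthogonality condition for equal ns, stated here as an assumption rather than taken from the text:

```python
# An orthogonal set of k - 1 = 3 comparisons for k = 4 levels, built by
# repeatedly splitting one side of the previous comparison.
contrasts = [
    [1, 1, -1, -1],   # step 1: levels 1-2 versus levels 3-4
    [1, -1, 0, 0],    # step 2: within the left side of comparison 1
    [0, 0, 1, -1],    # step 3: within the right side of comparison 1
]

# With equal subgroup ns, each pair of comparisons should be independent:
# the products of their corresponding coefficients sum to zero.
for i in range(len(contrasts)):
    for j in range(i + 1, len(contrasts)):
        assert sum(a * b for a, b in zip(contrasts[i], contrasts[j])) == 0
```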

Estimating the Standard Error of a Comparison


To evaluate a comparison, we need to estimate its standard error, which is
symbolized s_K. The formula for the calculation is:

    s_K = √[ s²_error (a_A²/n_A + a_B²/n_B + ...) ]        [Formula 22.15]

where s²_error is the variance estimate that would constitute the numerator /
denominator of the overall F test (________ in one-way ANOVA; ________ in
two-way ANOVA); a_A is the coefficient of the ________ of subgroup A, etc.; and
________ is the number of cases in subgroup sample A, etc. [22.21]. The number
of degrees of freedom associated with this estimated standard error is the
number associated with ____________ [22.21].
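A sketch of the Formula 22.15 calculation (the variance estimate, coefficients, and ns below are all hypothetical):

```python
import math

# Estimated standard error of a comparison, following the form of
# Formula 22.15: s_K = sqrt( s2_error * sum(a_j**2 / n_j) ).
def comparison_se(s2_error, coeffs, ns):
    """s2_error: error variance estimate from the overall F test;
    coeffs: comparison coefficients; ns: subgroup sample sizes."""
    return math.sqrt(s2_error * sum(a * a / n for a, n in zip(coeffs, ns)))

# A simple pairwise comparison (coefficients 1 and -1) of two subgroups
# of 10 cases each, with an error variance estimate of 4.0; the third
# subgroup has coefficient 0 and so drops out.
s_k = comparison_se(4.0, [1, -1, 0], [10, 10, 10])
```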

Evaluating a Comparison

A comparison, planned or otherwise, may be evaluated by ____________________
or by ____________________ [22.22].

If interval estimation is chosen, the procedure for a planned comparison is
entirely analogous to estimation of μ_X - μ_Y. The limits of the confidence inter-
val are given by the rule K ± t·s_K, and this is parallel to the rule in Formula
19.7 on p. 343 of the text: K is like (X̄ - Ȳ), and s_K is like s_X̄-Ȳ.

For post hoc comparisons, the text offers a procedure that, strictly speaking,
is applicable only in a situation where a preliminary overall F test for the
treatment has shown significance. If such a test has shown significance, then
there exists at least one comparison for which the null hypothesis will be
rejected / accepted at the same level of significance [22.24, p. 421]. As many
comparisons as are desired may be made, whether independent or not. The price
for such flexibility is that each comparison yields narrower / wider limits than
if it had been planned [22.24, p. 421].

SYMBOLISM DRILL

This drill is confined to the symbols used in the analysis of variance.


Pronunciation is given only where it's not obvious.

ONE-WAY ANALYSIS of VARIANCE

Symbol                  Meaning

1   D, E, F, ...        The several subcategories of a ________
2   X_D, X_E, X_F, ...  Scores in the several subcategories
3   X̄_D, X̄_E, X̄_F, ... Means of samples / populations in the subcategories
4   μ_D, μ_E, μ_F, ...  Means of samples / populations in the subcategories
5   X̿                   ________ mean; mean of all scores (pronounced "eks
                        double-bar")
6   s_A²                ________-groups estimate of σ²
7   SS                  ________ of ________ (sum of squares of deviations
                        from a mean)
8   SS_W                ________-groups ________ __ ________
9   df_W                ________-groups ________ __ ________
10  s_W²                ________-groups estimate of ________
11  SS_A                ________-groups ________ __ ________
12  df_A                ________-groups ________ __ ________
13  SS_T                ________ ________ __ ________
14  df_T                ________ degrees of ________
15  F                   s_A² / ________
TWO-WAY ANALYSIS of VARIANCE

16  X_Ci                Score in the ith ________ of the ________
17  X_Ri                Score in the ith ________ of the ________
18  X̄_Ci                ________ of the sample / population in the ith
                        ________ of the ________
19  X̄_Ri                ________ of the sample / population in the ith
                        ________ of the ________
20  μ_Ci                ________ of the sample / population in the ith
                        ________ of the ________
21  μ_Ri                ________ of the sample / population in the ith
                        ________ of the ________
22  s_WC²               ________-________ estimate of σ²
23  s_C²                ________ estimate of ________
24  s_R²                ________ estimate of ________
25  s_RxC²              ________ estimate of ________
26  SS_C                ________ ________ for ________
27  SS_R                ________ ________ for ________
28  SS_RxC              ________ ________ for ________
29  SS_WC               ________ ________ within ________
30  SS_T                ________ sum of ________
31  df_C                ________ for ________
32  df_R                ________ for ________
33  df_RxC              ________ for ________
34  df_WC               ________ within ________
35  df_T                ________ degrees of ________
36  F                   ________ / ________  or  ________ / ________  or
                        ________ / ________

COMPARISONS

37  K                   Sample / population value of a comparison
38  κ                   Sample / population value of a comparison (pronounced
                        "kappa")
39  s_K                 Estimate of ________ error of a ________
40  s²_error            s_W² or s_WC²
41  F'                  Critical value of ________ for a Scheffé comparison
ANNALS of EGREGIOUS EXAMPLES , Continued

Look back at the description of the marketing survey on p. 151 of this
workbook, and note that in his first study, the researcher really tested five
versions of that frozen foodstuff he had been hired to evaluate.

1. What inferential technique should he have used on the full set of data
from the first study?

In the second study, he again tested five versions of a product, but this time
each subject tasted and rated all five versions.

2. Is the inferential technique appropriate for the first study also
appropriate for the second? Why or why not?
INFERENCE ABOUT FREQUENCIES
23.1 Introduction
23.2 A Problem in Discrepancy between Expected and Obtained Frequencies
23.3 Chi-Square (χ²) as a Measure of Discrepancy between Expected and Obtained
     Frequencies
23.4 The Logic of the Chi-Square Test
23.5 Chi-Square and Degrees of Freedom
23.6 The Random Sampling Distribution of Chi-Square
23.7 Assumptions in the Use of the Theoretical Distribution of Chi-Square
23.8 The Alternative Hypothesis
23.9 Chi-Square and the 1 x C Table
23.10 The 1 x 2 Table and the Correction for Discontinuity
23.11 Small Expected Frequencies and the Chi-Square Test
23.12 Contingency Tables and the Hypothesis of Independence
23.13 The Hypothesis of Independence as a Hypothesis about Proportions
23.14 Finding Expected Frequencies in a Contingency Table
23.15 Calculation of χ² and Determination of Significance in a Contingency Table
23.16 Interpretation of the Outcome of a Chi-Square Test
23.17 The 2 x 2 Contingency Table
23.18 Interval Estimates about Proportions
23.19 Other Applications of Chi-Square
PROBLEMS and EXERCISES

1  ________________________________
2  ________________________________
3  ________________________________
4  ________________________________
5  ________________________________
6  ________________________________
7  ________________________________
8  ________________________________
9  ________________________________
10 ________________________________
11 ________________________________
12 ________________________________
13 ________________________________

SUMMARY

Way back in Chapter 2, a distinction was made between qualitative variables
and quantitative ones. A qualitative variable consists of a set of categories
that differ in quality, in kind, and not in quantity (not in degree).

All of the descriptive and inferential techniques presented so far in the
course, with the exception of the bar diagram, are for observations on quanti-
tative variables. Such observations are numerical scores, and it is scores
that are summarized by the frequency distributions of Chapter 3, by the histo-
gram and frequency polygon and cumulative frequency curves, and by the measures
of central tendency and the standard deviation of Chapters 5 and 6. And it is
scores that are treated inferentially by the techniques involving z, t,
Fisher's z', and F.

In Chapter 23, we return to qualitative variables. In the simplest case,
there is one such variable, and each subject is classified into one of the
categories that make up the variable. The data are summarized by counting up
the number of subjects classified into each category. The counts are frequen-
cies, and the list of categories with their associated frequencies forms a
frequency distribution, like the one on p. 428 of the text. The frequency
distribution could be graphed as a bar diagram.

In a more complicated case, there are two qualitative variables, and each
subject is simultaneously classified into one category from one variable and
one category from another variable. Again the data are summarized as frequen-
cies, here the frequencies with which subjects fall into the various combina-
tions of categories. An example is the table on p. 438.

In either case, whether there is one variable or two, the frequencies may
be converted to proportions, and it is the corresponding proportions in the
parent populations that are of interest.

To draw inferences from frequency data, the text introduces a new statistic,
chi-square. Chi-square is like t and F in that (a) it may be used to compare
two or more samples, (b) it has a known theoretical distribution, (c) it has a
different distribution for each different number of degrees of freedom, and
(d) it may be used to test a null hypothesis or to estimate a population param-
eter or a difference between two parameters. But the numbers that enter into
the computation of chi-square are frequencies, whereas t (like its large-sample
counterpart z) and F are computed from means and standard deviations or
variances.

Although chi-square was developed for qualitative variables, it can be used


for quantitative ones if the several class intervals that divide up the scale
of measurement for such a variable are treated as discrete categories.

The CASE of ONE VARIABLE


The Null Hypothesis

The simplest case to which the chi-square statistic can be applied is that

in which frequency counts are available for the categories of a single variable.

Problems of this class are sometimes said to be characterized by a 1 x C ("one
by C") table, where C is the number of ________ or class ________
[23.9]. In the 1 x C table, χ² may be used to test whether the relative fre-
quencies characterizing the several categories or class intervals of a sample /
population frequency distribution are in accord with the set of such values
hypothesized to be characteristic of the sample / population distribution [23.9].
In any such problem, the hypothesized relative frequency of occurrence in each
category or class interval is dictated by the statistical / substantive hypoth-
esis of interest [23.9]. The hypothesized proportions must / need not be equal
[23.9].

The Alternative Hypothesis

H_A, the alternative hypothesis, is simply that the null hypothesis is untrue
in some (any) way. Note that the distinction between a directional test and a
nondirectional one, encountered earlier, is still / not pertinent here [23.8],
with the exception described below.

Expected Frequencies

To conduct the test, we must generate expected frequencies, and these are
obtained for each category by multiplying the proportion hypothesized to charac-
terize that category in the population by the sample ________ [23.4, p. 430]. An
expected frequency is the ________ of the obtained frequencies that would occur
on infinite repetitions of an experiment such as the one actually done when the
null hypothesis is true / false and sampling is ________ [23.4, p. 430].

Computing Chi-Square

The chi-square statistic, χ², provides a measure of the discrepancy between
expected and obtained frequencies. Its basic formula, suited to this task, is:

    χ² = Σ (f_o - f_e)² / f_e        [Formula 23.1]

where ________ is the expected frequency and ________ is the obtained frequency,
and summation is over the number of ________ characterizing a given
problem [23.3]. Examination of the formula reveals several points of interest
about χ²:

1. χ² cannot be positive / zero / negative, since all discrepancies are
squared; both positive and negative discrepancies make a positive / negative
contribution to the value of χ² [23.3].

2. χ² will be ________ only in the unusual event that each obtained frequen-
cy exactly equals the corresponding expected frequency [23.3].

3. Other things being equal, the larger / smaller the discrepancy between
the f_e's and their corresponding f_o's, the larger χ² will be [23.3].

4. But it is not the size of the discrepancy alone that accounts for a con-
tribution to the value of χ²; it is the size of the discrepancy relative to the
magnitude of the expected / obtained frequency [23.3].

5. The value of χ² depends on the number of ________ involved in its
calculation [23.3]. The method of evaluating χ² must therefore take this factor
into account. This is done by considering the number of ________ __ ________
(df) associated with the particular χ² [23.3].
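A small sketch of Formula 23.1 for a one-variable problem (the frequencies are hypothetical: 60 rolls of a die, with equal proportions hypothesized for the six faces):

```python
# Chi-square for a 1 x C table: X2 = sum of (f_o - f_e)**2 / f_e.
f_o = [8, 12, 9, 11, 6, 14]        # obtained frequencies (hypothetical)
n = sum(f_o)
f_e = [n * (1 / 6)] * 6            # hypothesized proportion times sample size

# Each cell contributes its squared discrepancy relative to f_e.
chi2 = sum((o - e) ** 2 / e for o, e in zip(f_o, f_e))
df = len(f_o) - 1                  # C - 1 degrees of freedom
```

The resulting χ² is then referred to the tabled distribution for C - 1 degrees of freedom.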

Degrees of Freedom for Chi-Square

The concept of degrees of freedom was encountered first in Chapter 19 in
connection with the t statistic and then again in connection with F. In those
settings, df proved to be a function of sample ________ [23.5]. However, in
problems with frequency data, the number of degrees of freedom is determined by
the number of (f_o - f_e) discrepancies that are ____________ of each
other, and are therefore "free to vary" [23.5]. In general, the number of de-
grees of freedom for problems of the one-variable type will be C - 1, where C
is the number of ________ involved [23.5].
212 Chapter 23

The Logic of the Chi-Square Test

As noted above, an expected frequency is the ________ of the obtained fre-
quencies that would occur on infinite repetitions of an experiment when the null
hypothesis is true / false and sampling is ________ [23.4, p. 430]. When
the hypothesis is true, the several obtained frequencies will vary from their
corresponding ________ frequencies only according to the influence of
random sampling fluctuation [23.4]. The calculated value of χ² will be smaller
when agreement between f_o's and f_e's is good / poor and larger when it is not
[23.4].

When the hypothesized f_e's are not the true ones, the set of discrepancies
between f_o and f_e will tend to be larger / smaller than otherwise, and, conse-
quently, so will the calculated value of χ² [23.4]. To test the hypothesis, we
must learn what calculated values of χ² would occur under random sampling when
the hypothesis is true / false [23.4]. Then we will compare the calculated χ²
from our particular sample with this distribution of values. If it is so large
that such a value would rarely occur when the hypothesis is true, the hypothesis
will be accepted / rejected [23.4].

The Random Sampling Distribution of Chi-Square

When the hypothesis to be tested is true, and when the conditions noted below
obtain, the sampling distribution formed by the values of χ² calculated from re-
peated random samples closely follows a known theoretical distribution. Actually,
there is a family of sampling distributions of χ², each member corresponding to
a given number of ________ __ ________ [23.6]. It is useful to know
that the ________ of any member of the family of chi-square distributions is al-
ways the same as the number of degrees of freedom associated with that particular
distribution [23.6].

The conditions that must obtain for a sampling distribution of χ² to follow
a known theoretical distribution are those stated in these assumptions:

1. It is assumed that the sample drawn is a ________ sample from the
population about which inference is to be made [23.7].

2. It is assumed that observations are ________ [23.7]. The set
of observations will not be completely ________ when their number
is less than / equals / exceeds the number of subjects [23.7].

3. It is assumed that, in repeated experiments, observed frequencies will
be ________ distributed about expected frequencies. With random sam-
pling this tends to be true. There are two important ways in which this assump-
tion may be violated:

(a) When f_e is small / large, the distribution of f_o's about f_e tends to
be positively skewed [23.7].

(b) The theoretical distribution of chi-square is smooth and continuous.
On the other hand, the obtained values of χ² actually form a ________
distribution [23.7]. Comparing the obtained values of χ², which form a
________ series, with the continuous distribution of theoretical values may re-
sult in a degree of error [23.7]. The importance of this discrepancy is, for-
tunately, minimal unless both n and df are small / large [23.7]. A correction
exists when df = ________ [23.7].

Special Considerations in the Case of a 1 x 2 Table

When the variable under study consists of only two categories, the data fall
into a 1 x 2 table, and the number of degrees of freedom is equal to one. In
the special circumstance that df = 1, a correction may be applied to compensate
for the error involved in comparing calculated values of χ², which form a dis-
continuous / continuous distribution, with the theoretical tabled values of χ²,
which form a discontinuous / continuous distribution [23.10]. This correction
is known as Yates' correction, or the correction for ____________,
and consists of reducing the discrepancies between f_o and f_e by ________ before
squaring [23.10].

A second special consideration in this case is the availability of a one-
tailed test. When df = 1 (only), χ² = z². We may therefore calculate z = ________
and compare that value with the critical one-tailed value of normally distributed
z [23.8, next-to-last paragraph]. The null hypothesis should be rejected only
for differences in the direction specified in the null / alternative hypothesis
[23.8].

A third special consideration is the size of the sample. When df = 1, both
f_e's ought to equal or exceed ________ [23.10].


214 Chapter 23

Fourth, note that this test may be conceived as a test about a single
____________ [23.10, next-to-last paragraph]. The null hypothesis in
this conception states the value of the proportion in the population of interest;
the population value is symbolized P and the sample value p. The test of a
single proportion is conceptually analogous to the test of a single ________
[23.10]. The difference is simply in the test statistic involved; i.e., X̄ or p,
whose statistical significance is assessed with t or χ², respectively.

Finally, just as with single means, it is possible to construct an interval
estimate of ________ [23.10]. The procedure is explained on p. 446.
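The text's own interval procedure appears on its p. 446; as a sketch only, and stated as an assumption rather than as the text's method, the familiar large-sample normal-approximation interval for a single proportion can be computed like this:

```python
import math

# Large-sample interval estimate for a population proportion P, using
# the normal approximation: p +/- z * sqrt(p * (1 - p) / n).
def proportion_interval(p, n, z=1.96):      # z = 1.96 for 95% confidence
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical sample: p = .60 based on n = 100 cases.
lo, hi = proportion_interval(0.6, 100)
```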

The CASE of TWO VARIABLES

The Null Hypothesis

So far, we have considered the application of chi-square to the one-variable
case. It also has important application to the analysis of ____________
frequency distributions [23.12]. Here there are two variables of interest, and
each subject is simultaneously classified into one category of one variable and
one category of the other variable. The resulting frequency counts are cast
into a matrix like Table 23.3 on p. 438. Bivariate frequency distributions of
the type illustrated in Table 23.3 are known as ____________ tables
[23.12]. In many ways, such a table is similar to the bivariate frequency dis-
tributions encountered in the study of correlation (see Chapter 9). Indeed, the
major difference is that here the two variables are both qualitative / quantitative
variables rather than qualitative / quantitative variables [23.12].

From such a table we may inquire what cell frequencies would be expected if
the two variables are independent of each other in the sample / population
[23.12]. Then, chi-square may be used to compare the obtained cell frequencies
with those expected under the hypothesis of independence. If the f_o - f_e dis-
crepancies are small / large, χ² will be small, suggesting that the two variables
of classification could be independent [23.12] in the population. Conversely,
a small / large χ² will point toward a contingent relationship [23.12] in the
population.

An alternative conception of the null hypothesis is possible. In general,

the hypothesis of independence in a contingency table (at the population level)

is equivalent to hypothesizing that the proportionate distribution of frequen¬

cies in the population for any row is the same for all ________, or that in
the population the proportionate distribution of frequencies for any column is
the same for all ________ [23.13]. So again the null hypothesis to be
tested by χ² may be thought of as one concerning proportions.

The Alternative Hypothesis

No matter how the null hypothesis is conceived, the alternative hypothesis


states simply that the null is false, and the distinction between one-tailed and
two-tailed tests does not apply—unless each variable consists of only two cate¬

gories, as noted below.

Calculating Expected Frequencies

Frequencies expected under the hypothesis of independence (at the population

level) in a contingency table may be calculated as follows:

1. Find the column proportions by dividing each column total by the grand

total (n). The sum of these proportions should always be ________ [23.14].

2. Multiply each row total by these column proportions; the result in each
instance is the expected cell frequency (f_e) for cells in that row. Keep the
result to one decimal place.

3. Check to see that the total of the expected frequencies in any row or in
any column equals that of the ________ frequencies [23.14].

The same result could be obtained by finding the row proportions and multiply-
ing by ________ totals [23.14].
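The steps above can be sketched as follows (the frequencies are hypothetical); multiplying each row total by the column proportions is the same as computing row total times column total divided by n for each cell:

```python
# Expected frequencies under independence in an R x C contingency table.
table = [[10, 20, 30],
         [20, 30, 10]]          # hypothetical obtained frequencies

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# f_e for each cell = row total * column proportion = row * col / n.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Check: expected row totals match the obtained row totals.
for row, rt in zip(expected, row_totals):
    assert abs(sum(row) - rt) < 1e-9

df = (len(table) - 1) * (len(table[0]) - 1)    # (R - 1)(C - 1)
```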

Degrees of Freedom

To figure degrees of freedom in a contingency table, we consider that the

column totals and the row totals are fixed / free and ask how many cell frequen¬

cies are free to vary [23.15]. In general, for an R x C contingency table, their

number (and therefore the number of degrees of freedom) is ____________, where

C is the number of columns and R is the number of rows [23.15].

The Logic of the Chi-Square Test


If the null hypothesis of independence of classification is true at the pop-
ulation level, we should expect that random sampling will produce obtained
values of χ² that are in accord with the tabled distribution of that statistic
for the appropriate number of degrees of freedom. If the hypothesis is false
in any way, the calculated value of χ² will tend to be smaller / larger than
otherwise [23.15]. As before, then (as for the case of a single variable), the
region of rejection is placed in the lower / upper tail of the tabled distri-
bution [23.15].

Interpreting a Chi-Square Test

Since the contingency table is analogous to the correlation table, it might
be thought that χ², like r, provides a measure of strength of association. Al-
though χ² may be converted into such a measure, it does not, by itself, serve
this function. The purpose of the chi-square test as applied to a contingency
table is to examine the hypothesis of ____________ between the two
variables at the population level [23.15]. Consequently, it is more nearly an-
alogous to the test of the hypothesis, in a correlation table, that the true
correlation is ________ [23.15].

Remember that a significant outcome of the chi-square test is directly ap-
plicable to any row or column / only to the data taken as a whole [23.16]. The
χ² which we obtain is inseparably a function of the R x C contributions (one
from each cell) composing it. We cannot say for sure whether one group is re-
sponsible for the finding of significance or whether all are involved.

We should also remember that when small / large samples are involved, pro-
portionately small differences may be responsible for statistically significant
differences [23.16]. Paying attention to proportions rather than frequencies will
help curb undue excitement upon obtaining a significant outcome in a small /
large sample [23.16].

Special Considerations in the Case of a 2 x 2 Table

The 2 x 2 table affords ________ degree(s) of freedom [23.17]. Consequently,
Yates' correction is applicable. We may, if we wish, proceed to treat this table
in exactly the same way as afforded an R x C table, except that each (f_o - f_e)
discrepancy would be reduced by ________ before ________ing [23.17]. A special
formula is available, however, that incorporates the correction and reduces
computational labor. (See p. 444.)

A second special consideration in this case is the availability of a one-
tailed test. The alternative hypothesis may state not just that the null is
false, but that it is false in a certain one of two ways. The procedure for a
one-tailed test in the case of a 1 x 2 table is also applicable here.

A third special consideration is again the size of the sample. Because df
= 1, all f_e's should equal or exceed 5.

Fourth, just as the 1 x 2 chi-square table is related to testing a hypothe-
sis about a single ____________, so is the 2 x 2 table related to the
test of the difference between two ____________ from dependent / inde-
pendent samples [23.17].

Finally, it is possible to construct an interval estimate of the difference
between the two proportions involved in a 2 x 2 table, as explained on p. 447.
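A sketch of the 2 x 2 calculation done the long way, with Yates' correction applied cell by cell rather than through the special computational formula on p. 444 (the frequencies are hypothetical):

```python
# Chi-square for a 2 x 2 table with Yates' correction: each |f_o - f_e|
# discrepancy is reduced by 0.5 before squaring.
def chi2_yates(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n   # expected frequency
            total += (abs(f_o - f_e) - 0.5) ** 2 / f_e
    return total

chi2 = chi2_yates([[20, 10],
                   [10, 20]])
# With df = 1, chi2 is compared against the tabled critical value
# (3.84 at the .05 level).
```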

Statistics in Action

The EFFECT of ENVIRONMENTAL NOISE on HELPFULNESS

In 1975 two psychologists reported a clever field experiment in which they
had unobtrusively observed reactions to a stranger who dropped a stack of books.
The incident was staged out-of-doors in a complex of student apartments on a
university campus, and the subjects were men who happened to walk into the
scenario alone. The stranger wore a cast on his arm, and he was carrying the
books from a car to an apartment when he spilled them. The subject was six feet
away at the time.

The dependent variable was a simple, qualitative one: did the subject help
the stranger retrieve his books, or didn't he? The independent variable was the
environmental noise: for half the subjects, it was at its normal level, about
50 decibels (db), and for the other half it was raised to 87 db by a confederate
of the experimenters who ran a gasoline-powered lawn mower 25 feet from the
point where the stranger dropped the books.

The researchers reported the following results:

              50 db   87 db

Help?  Yes      16       3
       No        4      17

1. Describe the data in terms of proportions.

2. Is this a case of one variable or two variables?

3. The chi-square test on these data can be conceived as a test for an
association between two variables or as a test for a difference between two
proportions.

(a) State the null hypothesis for a test for an association.

(b) State a two-tailed alternative hypothesis for this null.

(c) State the null hypothesis for a test for a difference between two
proportions.

(d) State a two-tailed alternative to this null.

4. Compute chi-square and draw a conclusion about the null hypothesis, using
the .05 level of significance.

The researchers conducted a parallel experiment in which the stranger did not
wear a cast on his arm, and here the noise level did not significantly affect the
proportion of subjects who helped him, which was 20% at 50 db and 10% at 87 db.
The effect of noise on helpfulness is thus not a simple one; sometimes it matters
and sometimes it doesn't.

Reference: K. E. Mathews, Jr., & L. K. Canon, "Environmental Noise Level as
a Determinant of Helping Behavior," Journal of Personality and Social Psychology,
1975, 32, 571-577.
SOME ORDER STATISTICS (MOSTLY)

24.1 Introduction

24.2 Ties in Rank

24.3 Spearman's Rank Order Correlation Coefficient

24.4 Test of Location for Two Independent Groups: The Mann-Whitney Test

24.5 Test of Location Among Several Independent Groups: The Kruskal-Wallis Test

24.6 Test of Location for Two Dependent Groups: The Sign Test

24.7 Test of Location for Two Dependent Groups: Wilcoxon's Signed Ranks Test

PROBLEMS and EXERCISES

1 ____   2 ____   3 ____   4 ____   5 ____   6 ____
7 ____   8 ____   9 ____   10 ____  11 ____  12 ____
13 ____  14 ____  15 ____  16 ____  17 ____

SUMMARY

In Chapter 24 the text returns to quantitative variables and describes some


alternatives to the techniques previously presented for dealing with observations
on such variables. In the realm of descriptive statistics there is an alterna¬
tive to Pearson's r, and in the realm of inferential statistics there are alter¬
natives to the t-test for two dependent groups, the t-test for two independent
groups, and the one-way analysis of variance.

The t-tests and the one-way analysis of variance are very efficient, in the

sense of providing high power for a given sample size, and they are the tech¬

niques of choice for quantitative variables. But to be absolutely correct the

techniques require that certain assumptions hold true for the distributions of

scores in the populations from which the available data came. For example,

_ of subgroup populations and homogeneity of _

are assumed for the t-test of the difference between _ of dependent /

independent samples and also for the one-way analysis of variance [24.1]. The

tests are quite "robust" against violation of such assumptions, in that they

yield results close to correct when the assumptions are wrong. However, a prob¬

lem can arise when the distributional assumptions are materially violated and

sample size is _ [24.1].

The alternative techniques presented in this chapter require less restrictive


conditions. The alternatives, however, especially the Sign Test, are somewhat
less efficient than the standard techniques when the assumptions necessary for
the latter are fully met.

All but the Sign Test among the techniques described in this chapter require

that the data be in the form of ranks, so if scores are on hand they must be

rank-ordered. Once ranks are available, though, the techniques that require them

are easy to use, and the Sign Test is exceptionally simple. Thus the techniques

of this chapter might be given special consideration when:

1. the data as gathered are already in the form of _____ [24.1];

2. there is substantial reason to believe that the _

assumptions required for the more efficient techniques may be violated and when

sample size is _ [24.1];

3. rapidity of analysis and ease of computation are special considerations.

Dealing with Ties in Rank

A problem that arises quite frequently in translating scores to ranks is that

of identical scores that therefore cause ties in rank. Most rank-order proce-

dures are based on the assumption that the underlying measure is discrete / con-

tinuous and that therefore theoretically there are no / only a few ties [24.2].

There are various ways to deal with ties in rank. A simple and reasonably satis-

factory scheme is to assign each of the consecutive scores in a tie the _____

of the ranks that would be available to them [24.2]. This procedure usually has

little or no effect on the _____ of the entire sample but tends to reduce the

_____ [24.2]. Fortunately, the disturbance created in the statistic

being calculated is usually slight unless perhaps as many as a _____ of all

scores are involved in ties [24.2].
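The midrank scheme just described is easy to automate. The sketch below is an illustration only (the function name and data are made up), but it carries out exactly the rule in the text: tied scores each get the average of the ranks that would otherwise be available to them.

```python
def midranks(scores):
    """Assign ranks (1 = lowest), giving each run of tied scores
    the average of the ranks those scores would occupy."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of scores tied with position i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2.0   # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

print(midranks([5, 7, 7, 8]))   # the two 7's share ranks 2 and 3 -> [1.0, 2.5, 2.5, 4.0]
```

Note that the two tied 7's receive 2.5 each, so the sum of all ranks is still 1 + 2 + 3 + 4 = 10, just as it would be without ties.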

Spearman's Rank-Order Correlation Coefficient

Spearman's correlation coefficient, symbolized r_s, is closely related to the

_____ correlation coefficient [24.3]. In fact, if the paired scores

are both in the form of ranks (and there are no ties in rank), calculation of

r_s and _____ will yield similar / identical outcomes [24.3]. r_s is also

used on occasion when both sets of measures are in score form. In this case,

each set of measures is translated into rank form, assigning _____ to the lowest

score, _____ to the next lowest, etc. [24.3]. When would one do this? Sometimes

the scale properties of the measures appear doubtful, as explained in Sections

2.7 and 2.8. If it can be concluded that what matters is that one score is higher

than another and that how much higher is not really important, translating scores

to _____ will be suitable [24.3].

The formula for the Spearman rank correlation coefficient is:

    r_s = ____________________   [Formula 24.1]

where D is the _____ between a pair of scores / ranks and n is

the number of _____ of scores / ranks [24.3].

Exact procedures have been developed for testing the hypothesis of no correla-

tion in the population sampled for very small samples, but good results may be

had for n >= _____ by finding the critical values required for significance for df

= _____ in Table E of Appendix F [24.3]. This is the same table used to

determine significance of Pearson _____ [24.3].
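The claim that r_s and Pearson's r give identical outcomes on untied ranks can be checked directly. The sketch below assumes the standard Spearman formula, r_s = 1 - 6*sum(D^2)/(n(n^2 - 1)); the ranks themselves are made up for illustration.

```python
import math

def spearman_rs(rank_x, rank_y):
    """Spearman's r_s via the standard formula, assuming no ties."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rx = [1, 2, 3, 4, 5]          # hypothetical ranks, no ties
ry = [2, 1, 4, 3, 5]
print(spearman_rs(rx, ry))    # 0.8
print(pearson_r(rx, ry))      # identical: 0.8
```

Here sum(D^2) = 4, so r_s = 1 - 24/120 = .80, and Pearson's r computed on the same ranks agrees exactly.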



The Mann-Whitney Test

This test is an alternative to the t-test of the difference between means

of two dependent / independent samples [24.4]. The null hypothesis states

the identity of the two population distributions (the entire distributions)

rather than the identity of just the two means, medians, or whatever measure of

central tendency is used. Nevertheless, if the two population distributions are

of even moderately similar shape and variability, the Mann-Whitney is an excellent

test of _____ [24.4]. Since the test is on ranks, the

most closely corresponding measure of central tendency is the _____ [24.4].

The procedure for conducting the test is as follows:

1. Label the two groups X and Y; if one group contains fewer cases than the

other, it must be labeled X / Y [24.4].

2. Combine all scores into one distribution of n_X + n_Y cases. Then assign

the rank of 1 to the lowest / highest score, 2 to the next lowest / highest

score, etc., until all scores are ranked [24.4].

3. Find ΣR_X, the sum of the _____ of all scores in the X / Y distribu-
tion [24.4].

The remainder of the procedure requires Table J in Appendix F and depends on

the alternative hypothesis.

The fundamental assumptions for this test are _____ sampling with /

without replacement and only a few / no ties in rank [24.4]. A moderate number

of tied ranks does not substantially disturb the sense of the outcome.
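The three numbered steps can be sketched in code. The data below are hypothetical, midranks handle any ties, and the Table J look-up that completes the test is not shown.

```python
def sum_ranks_x(x_scores, y_scores):
    """Steps 1-3 of the Mann-Whitney procedure: pool the two groups,
    rank from lowest (rank 1) to highest with midranks for ties, and
    return the sum of the ranks belonging to the X (smaller) group."""
    pooled = [(s, 'X') for s in x_scores] + [(s, 'Y') for s in y_scores]
    pooled.sort(key=lambda p: p[0])
    total_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j + 1) / 2.0   # average of ranks i+1 .. j+1
        total_x += midrank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 'X')
        i = j + 1
    return total_x

# X must be the smaller group
print(sum_ranks_x([3, 5, 6], [7, 8, 9, 10]))   # X holds ranks 1, 2, 3 -> 6.0
```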

The Kruskal-Wallis Test

This test is an alternative to the one-way / two-way analysis of variance

[24.5]. It may be thought of as an extension of the _____ test

to more than two groups [24.5]. Like the _____ test (and like

the one- and two-way analysis of variance procedures described in Chapter 22), it

is for dependent / independent groups [24.5]. The null hypothesis states the

identity of the several population distributions (the entire distributions again)

rather than the identity of just some particular measure of _____

for the several populations [24.5]. Under ordinary circumstances,

however, it is a good test for location. When examining a significant outcome,

the mean / median / mode is probably the best descriptive statistic to use [24.5].

The test statistic for the Kruskal-Wallis procedure is called H and is

computed via Formula 24.2 on p. 461. With three groups and 4 or more cases per

group, the _____ distribution may be used to evaluate H and will

give good approximate results [24.5]. With more than three groups, some groups

can have as few as 2 or 3 cases. Compare H_calc with tabled values of _____

(using Table G in Appendix F) for df = _____, where k is the number of _____

[24.5]. The region of rejection lies in the upper / lower tail of the _____

distribution [24.5].

As with the _____ test for two independent groups, the

effect of ties in rank is not great unless there are many of them [24.5]. As-

sumptions for the Kruskal-Wallis test are the same as for the _____

test: random sampling with / without replacement and only a few / no ties in

rank [24.5].
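Formula 24.2 itself is not reproduced in this workbook, but in its standard form it is H = [12/(N(N + 1))] * sum(R_j^2 / n_j) - 3(N + 1), where R_j and n_j are the rank sum and size of group j; that standard form is assumed in the sketch below, and the data are made up.

```python
def kruskal_wallis_h(groups):
    """H in its standard form: (12 / (N(N+1))) * sum(R_j^2 / n_j) - 3(N+1),
    where R_j is the rank sum of group j.  Assumes no tied scores."""
    pooled = sorted((score, g) for g, group in enumerate(groups) for score in group)
    rank_sums = [0.0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    n_total = len(pooled)
    return 12.0 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n_total + 1)

# three hypothetical groups that do not overlap at all
h = kruskal_wallis_h([[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9]])
print(round(h, 2))   # 7.2; compare to chi-square with df = k - 1 = 2
```

With k = 3 groups, df = 2 and the chi-square critical value at the .05 level is 5.99, so this H would be significant.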

The Sign Test


The Sign Test and Wilcoxon's Signed Ranks Test are commonly used to test for

a difference in location for two dependent / independent groups [24.6]. For

the Sign Test, difference scores are calculated as though one were going to do

a t-test for dependent means using the procedure of Section 17.11, but all posi-

tive difference scores are assigned the symbol "+", and all negative difference

scores are assigned the symbol "-". Under the null hypothesis, we would expect

that there would be as many "pluses" as "minuses" in the sample within the limits

of sampling fluctuation, and we can compute a chi-square to test the null:

    χ² = ____________________   [p. 464, 1st formula]

This statistic has 1 df.

In conducting a test according to these principles, it will occasionally

occur that some of the differences will be _____ and cannot, therefore, be

categorized as + or - [24.6]. This dilemma may be solved in one of several ways.

Probably the simplest is to ignore such cases, reduce _____ accordingly, and pro-

ceed with the test on the remaining values [24.6].
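The chi-square itself is left for you to copy from p. 464 of the text, but the bookkeeping can be sketched. The formula used below is a common continuity-corrected form, assumed here for illustration rather than taken from the text, and the difference scores are made up.

```python
def sign_test_chi_square(diffs):
    """Sign Test evaluated by chi-square (1 df).  The formula here is a
    common continuity-corrected form, assumed for illustration:
    (|n_plus - n_minus| - 1)^2 / (n_plus + n_minus).
    Zero differences are simply ignored, as suggested above."""
    n_plus = sum(1 for d in diffs if d > 0)
    n_minus = sum(1 for d in diffs if d < 0)
    return (abs(n_plus - n_minus) - 1) ** 2 / (n_plus + n_minus)

# hypothetical X - Y difference scores; note the one zero difference
diffs = [2, 1, 3, 1, 2, 4, 1, 2, -1, 3, 1, 2, -2, 1, 0, 2]
print(sign_test_chi_square(diffs))   # (|13 - 2| - 1)^2 / 15 = 6.67 or so
```

Dropping the zero reduces the count from 16 pairs to 15, exactly as the paragraph above prescribes.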



Evaluating the Sign Test by χ² will give reasonable accuracy for _____ or

more pairs of scores [24.6].

The assumptions required for this test are that the X - Y differences have

been randomly drawn from the _____ of difference scores and that

sampling is with / without replacement [24.6]. A third assumption is that no

difference is exactly _____ [24.6]. With regard to the last assumption,

using the method described above will be reasonably satisfactory provided the

number of _____ is small [24.6].

Wilcoxon's Signed Ranks Test

The Wilcoxon test is an alternative to the _____ Test (and also to the

t test of the difference between two dependent / independent means) [24.7].

It is more sensitive than the _____ Test, but it demands an assumption that

we may not be willing to make [24.7]: we must assume that differences between

pairs of scores can be placed in rank order. For the test itself, the assump-

tions are random assignment of treatment condition to members of a _____

(and independent assignment among different _____), no differences of _____,

and only a few / no ties in rank [24.7, p. 467].

To conduct the test, compute difference scores in the usual way. Then dis-

regard the sign / size of the differences obtained, and supply ranks to the

absolute magnitude of the differences, assigning a rank of 1 to the smallest /

largest of the differences, 2 to the next smallest / largest, etc. [24.7, p.

466]. Next, resupply the appropriate _____ of the differences to the ranks

of the differences. The test statistic will be the _____ of the ranks with a

negative sign or the _____ of the ranks with a positive sign, whichever is

smaller / larger [24.7]. The rest of the procedure requires Table K in Appendix

F and depends on the alternative hypothesis.

We can see immediately why this test is more sensitive than the Sign Test.

That test responded only to the size / direction of the difference between a

pair of scores, whereas the Wilcoxon test uses additional information about the

size / direction of the difference [24.7].
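The whole procedure up to the table look-up can be sketched as follows; the difference scores are hypothetical, and the statistic is taken (as is conventional) to be the smaller of the two sums of like-signed ranks.

```python
def wilcoxon_t(diffs):
    """Wilcoxon's Signed Ranks statistic: rank the absolute differences
    (1 = smallest), reattach the signs, and take the smaller of the two
    sums of like-signed ranks.  Assumes no zero differences and no ties."""
    ordered = sorted(diffs, key=abs)
    pos = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    neg = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return min(pos, neg)

print(wilcoxon_t([4, -1, 6, 3, -2, 7]))   # negative ranks 1 and 2 -> 3
```

Notice how much information the signs of the small differences carry here: the two negative differences are also the two smallest in absolute size, which is what drives the statistic down.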
ANSWERS to QUESTIONS in THIS WORKBOOK

CHAPTER 2

1. All adults in the state.

2. Their answers to the question asked in the poll.

3. The 500 people interviewed.

4. The 500 answers to the question asked in the poll.

5. 500, of course.

6. An element.
7. A constant so far as your survey is concerned. In a nationwide survey,
state of residence would be a variable.

8. Qualitative. 9. Discrete. 10. Nominal.

11. Quantitative. 12. In theory, continuous; in practice, discrete. 13. Ratio.

14. Qualitative. 15. Discrete. 16. Nominal.

17. Most researchers would say yes, but not all. "Yes" indicates more of
something than "No" does, namely more approval, and so the answers attempt
to rank-order the respondents for the magnitude of their approval. It is
certainly not an interval or a ratio scale; those who disagree would say it
is measured on a nominal scale.

18. Not nominal, because the variable is quantitative: higher scores indicate
greater favorability. Not ratio, because zero does not indicate
the absence of the quantity of interest. That leaves interval, but we can't tell
whether the interval between scores means the same thing all along the scale. For example,
if Person A gets a score of 1 and B a 2, they differ by 1 unit on
the scale. If C gets a score of 4 and D a 5, they also differ by 1 unit.
But we cannot tell whether the difference in favorability between A and B is the
same as the difference in favorability between C and D. So we don't know exactly
what kind of scale we have. As Section 2.8 says, it probably
does little harm to treat scores like these as though they formed interval
or even ratio scales without going far wrong.


CHAPTER 3

Page 23:

Score Limits   Exact Limits    f   Prop.f   %f   cum f   Prop.cum f   cum %f

                               3    .25     25    12       1.00        100
                               3    .25     25     9        .75         75
                               2    .17     17     6        .50         50
                               1    .08      8     4        .33         33
                               3    .25     25     3        .25         25
                              __   ____    ___
                              12   1.00    100

Page 24:

Score Limits   Exact Limits    f   Prop.f   %f   cum f   Prop.cum f   cum %f

 23 - 27       22.5 - 27.5     4    .33     33    12       1.00        100
 18 - 22       17.5 - 22.5     2    .17     17     8        .67         67
 13 - 17       12.5 - 17.5     2    .17     17     6        .50         50
  8 - 12        7.5 - 12.5     0    .00      0     4        .33         33
  3 -  7        2.5 -  7.5     4    .33     33     4        .33         33
                              __   ____    ___
                              12   1.00    100

Score Limits   Exact Limits     f   Prop.f   %f   cum f   Prop.cum f   cum %f

496 - 505     495.5 - 505.5     5    .33     33    15       1.00        100
486 - 495     485.5 - 495.5     3    .20     20    10        .67         67
476 - 485     475.5 - 485.5     2    .13     13     7        .46         46
466 - 475     465.5 - 475.5     5    .33     33     5        .33         33
                               __   ____    ___
                               15    .99     99

In this latter table, the sums of the proportionate and percentage frequencies
are not quite what they should be because of rounding error.
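The bookkeeping behind these tables is mechanical; the sketch below rebuilds the derived columns of the first Page 24 table from its frequencies alone (cumulating from the bottom interval upward, as the tables do).

```python
def freq_table(freqs):
    """Given class frequencies listed from the top interval down, return
    (f, prop f, %f, cum f, prop cum f, cum %f) rows in the same order.
    Cumulative figures accumulate from the bottom interval upward."""
    n = sum(freqs)
    cum = []
    running = 0
    for f in reversed(freqs):
        running += f
        cum.append(running)
    cum.reverse()
    return [(f, round(f / n, 2), round(100 * f / n),
             c, round(c / n, 2), round(100 * c / n)) for f, c in zip(freqs, cum)]

for row in freq_table([4, 2, 2, 0, 4]):   # frequencies from the 22.5 - 27.5 table
    print(row)
```

The rounding in the last table of proportions (which summed to .99) would show up here too; rounding each class proportion separately need not yield a total of exactly 1.00.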

CHAPTER 3 , continued

Page 26:

1. 98 2. 14

3. 87.5 4. 69.5

5. 75.5 6. 81.5 7. 78.5 and 81.5

8. 84.5 and 87.5 9. 87.5 and 90.5

10. 28 and 40

11. 52 and 64

12. 96 and 98

Page 28:

1. 66.5 inches.

2. A centile point.

3. Six-one = six feet + one inch = 72 inches + 1 inch = 73 inches. The 95th
centile point is 73.1 inches. Thus 5% of the men are over 73.1 inches, and so
a bit more than 5% are over 73 inches even.

4. Neither. The answer is indeed a percentage, but it's not the percentage of
cases falling below a given point along the scale of scores.

5. 69.3 inches.

6. A centile point (a score).

7. 67.9 inches and 69.3 inches.

8. 64.3 inches and 73.1 inches (which are C5 and C95, respectively).

9. 10%, or about 41 of the 411 or so. The table indicates that 10% were below
65.4 inches in height, and 20% were below 66.5. So going up the scale of heights
from 65.4 to 66.5 raised the cumulative percentage from 10 to 20, getting us an
additional 10% of the cases in the interval in question.

10. 25%, or about 103 of the 411 or so men. The logic behind this answer is
the same as that for Question 9.

CHAPTER 4

Page 37:

1. Skewed, with the tail on the right.

2. Skewed, with the tail on the left. Maybe J-shaped, even. It will be J-
shaped if the maximum score, 50, is the score that occurs most often (and is thus
the mode, as you will learn in the next chapter). Note that the size of the
group, 523, is irrelevant to the shape.

3. J-shaped, with the tail on the right.

4. Bimodal, but not necessarily symmetrical.

5. Normal, or at least unimodal and bell-shaped, very close to symmetrical.

6. Normal, or again at least unimodal and bell-shaped, very close to symmet-
rical.

7. Skewed, with the tail on the right. If the test is extremely difficult
for the sixth-graders, the shape might even be a backwards J.

8. Bimodal, but not necessarily symmetrical.

CHAPTER 5
Pages 43-44:

1. The mode.

2. The mean, because the sum of all values is unknown.

3. The mean, because a change in the value of any score will change the sum.

4. The mean.

5. The median.

6. The mean.

7. The mode.

8. The mode.

9. The mean.

10. The mean.

11. The mode.

12. The mode.

13. The mean.

14. The mean.

15. The median.

16. The mode.

17. The mean.

18. The mean.

On the pages now following is the table of answers for all symbolism drills
in this workbook.
Answers for Symbolism Drills 231

ANSWERS for SYMBOLISM DRILLS

     Symbol            Pronunciation                       Meaning

 1   n                 "little en"                         Number of scores in a sample

 2   N                 "big en"                            Number of scores in a population

 3   X                 "eks" or "big eks"                  A raw score, or the set of raw scores

 4   Σ                 "the sum of"                        Result of summing quantities of some kind

 5   X̄                 "eks bar"                           ΣX/n; the mean of a sample

 6   μ                 "mew"                               ΣX/N; the mean of a population

 7   Mdn               "median"                            C50 (may also be defined informally)

 8   Mo                "mode"                              Score or midpoint of interval with largest f

 9   x                 "little eks"                        X - μ or X - X̄; deviation score

10   Q                 "cue"                               (C75 - C25)/2; semiinterquartile range

11   σ²                "sigma squared"                     Σx²/N; variance of a population

12   σ                 "sigma"                             √(Σx²/N); standard deviation of a population

13   S²                "es squared"                        Σx²/n; variance of a sample

14   S                 "es"                                √(Σx²/n); standard deviation of a sample

15   z                 "zee"                               x/σ or x/S; z score.
                                                           In general, (value - mean)/(standard deviation)

16   r                 "ar"                                Pearson correlation coefficient for a sample

17   ρ                 "rho"                               Pearson correlation coefficient for a population

18   Y'                "wi prime"                          Predicted raw score on Y

19   z'_Y              "zee prime sub wi"                  Predicted z score on Y

20   s_YX              "es sub wi eks"                     Standard error of estimate of Y on X

21   μ_X̄               "mew sub eks bar"                   Mean of sampling distribution of means

22   σ_X̄               "sigma sub eks bar"                 Standard error of the mean; σ/√n

23   s                 "little es"                         Estimate of σ; √(Σx²/(n - 1))

24   s_X̄               "little es sub eks bar"             Estimate of σ_X̄; s/√n

25   H₀                "aitch null"                        Null hypothesis

26   H_A               "aitch sub ay"                      Alternative hypothesis

27   μ_hyp             "mew hype"                          Value of μ stated in null hypothesis

28   "z"               "zee quotes"                        Approximate z score with denominator estimated

29   z_crit            "zee crit"                          Critical value of z

30   α                 "alpha"                             Risk of Type I error; level of significance

31   β                 "bayta"                             Risk of Type II error

32   s²                "little es squared"                 Estimate of σ²; Σx²/(n - 1)

33   μ_true            "mew true"                          True value of μ

34   μ_X̄-Ȳ             "mew sub eks bar minus wi bar"      Mean of sampling distribution of differences
                                                           between means

35   (μ_X - μ_Y)_hyp   "mew sub eks minus mew sub wi,      Value of μ_X - μ_Y stated in null hypothesis
                        the quantity, hype"

36   σ_X̄-Ȳ             "sigma sub eks bar minus wi bar"    Standard error of the difference between
                                                           two means

37   s_X̄-Ȳ             "little es sub eks bar minus        Estimate of σ_X̄-Ȳ
                        wi bar"

38   D                 "dee"                               X - Y; difference score

39   C                 "see"                               Confidence coefficient

40   t                 "tee"                               Conventional name for "z"

41   df                "dee ef"                            Degrees of freedom

42   σ_r               "sigma sub ar"                      Standard error of r

43   z'                "zee prime"                         Fisher's transformation of r

44   σ_z'              "sigma sub zee prime"               Standard error of z'

45   σ_z'₁-z'₂         "sigma sub zee prime sub one        Standard error of the difference between two
                        minus zee prime sub two"           independent z''s

CHAPTER 6

Page 55:

ΣX = 48, n = 6, X̄ = 8.0. Σx = 0. Σx² = 96, S² = 16.0, S = 4.0.

Page 56, First Paragraph:

The standard deviations of the distributions tabled on pp. 53 and 55 work
out to be the same because the deviation scores for the distributions are the
same—which is to say that where the raw scores lie in relation to their mean
is the same for the two distributions.

Page 56, Top Table:

ΣX = 60, n = 6, X̄ = 10.0. Σx = 0. Σx² = 54, S² = 9.0, S = 3.0.

Page 56, Bottom Table:

ΣX = 50, n = 5, X̄ = 10.0. Σx = 0. Σx² = 20, S² = 4.0, S = 2.0.

Page 57:

ΣX = 48, n = 12, X̄ = 4.0. Σx = 0. Σx² = 48, S² = 4.0, S = 2.0.

Page 58, Middle Table:

ΣX² = 480.

Page 58, Bottom Table:

ΣX² = 654.

Page 59, Top Table:

ΣX² = 520.

Page 59, Middle Table:

ΣX² = 240.

CHAPTER 6, continued

Page 60:

1. The standard deviation.

2. The range.

3. The range.

4. The standard deviation.

5. The standard deviation.

6 . The standard deviation.

7. The semiinterquartile range.

8. The range.

Symbolism Drill:

See p. 231.

Page 62:

1. Skewed to the right.

2. Skewed to the right.

3. 3.2.

4. 2.0.

5. The very large scores pull the mean up. See the next-to-the-last paragraph
on p. 41 of this workbook.

6. For the same reason as in the first distribution.

7. (7.6 - 0)/2 = 3.8.

8. (4.2 - 0)/2 = 2.1.

9. Because the distributions are so highly skewed, S is misleadingly large.
See p. 91 of the text.

10. Since X̄ = ΣX/n, ΣX = nX̄. Here nX̄ = (176)(6.81) = 1198.56. The total must
have been a whole number and could have been either 1198 or 1199; both figures
round to 6.81 when divided by 176.

11. Again we must compute nX̄. Here the figures are (128)(2.97) = 380.16,
which rounds to 380.

12. All those genetic counselors in the U. S. who do diagnostic cytogenetics,


or, speaking more formally, their answers to each item on the questionnaire. (In
the more formal conception, there is one population for each item.)

13. The sample(s) included almost all elements of the population(s).



CHAPTER 7

Page 68:
Raw       Deviation   Squared Deviation             Deviation      Squared Deviation
Score     Score       Score                z Score  for z Score    for z Score

 13        +5          25                   +1.25    +1.25          1.5625
 13        +5          25                   +1.25    +1.25          1.5625
  9        +1           1                   +0.25    +0.25          0.0625
  6        -2           4                   -0.50    -0.50          0.2500
  4        -4          16                   -1.00    -1.00          1.0000
  3        -5          25                   -1.25    -1.25          1.5625

ΣX = 48    Σx = 0      Σx² = 96    Σz = 0.00    Σ(z - z̄) = 0.00    Σ(z - z̄)² = 6.0000

n = 6                  S² = Σx²/n           n = 6               S_z² = Σ(z - z̄)²/n
X̄ = ΣX/n                  = 96/6            z̄ = Σz/n                 = 6.0000/6
  = 48/6                  = 16.0              = 0.00/6               = 1.0000
  = 8.0                S = √16.0              = 0.00              S_z = √1.0000
                         = 4.0                                       = 1.0000
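A by-product worth checking: any set of raw scores, converted to z scores this way, ends up with mean 0 and standard deviation 1. The sketch below uses raw scores consistent with the sums in the table (assumed for illustration).

```python
import math

def z_scores(scores):
    """Convert raw scores to z scores: z = (X - mean) / S,
    with S the standard deviation sqrt(sum of x^2 / n)."""
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / s for x in scores]

z = z_scores([13, 13, 9, 6, 4, 3])    # raw scores consistent with the table's sums
print([round(v, 2) for v in z])       # [1.25, 1.25, 0.25, -0.5, -1.0, -1.25]
print(round(sum(z) / len(z), 10))     # mean of the z scores: 0.0
```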

Page 69:
For the answers to the symbolism drill, see p. 231

CHAPTER 8
Page 75

TABLE OF EQUIVALENT SCORES

z Score   Score where    Score where     Score where   Centile Rank if
          μ=100, σ=15    μ=500, σ=100    μ=50, σ=10    Shape is Normal

 +3          145            800             80              99.9
 +2          130            700             70              97.7
 +1.5        122.5          650             65              93.3
  0          100            500             50              50.0
 -1           85            400             40              15.9
 -2.5         62.5          250             25               0.6
 -3           55            200             20               0.1
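Every entry in the body of the table comes from the single relation score = mean + z × SD; a two-line sketch:

```python
def equivalent_score(z, mean, sd):
    """Convert a z score to the equivalent score on a scale
    with the given mean and standard deviation."""
    return mean + z * sd

print(equivalent_score(1.5, 100, 15))    # 122.5, as in the mean-100 column
print(equivalent_score(-2.5, 500, 100))  # 250.0, as in the mean-500 column
```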

Page 77:

For the answers to the symbolism drill, see p. 231

Pages 77-78:

1. 100 - 15 = 85. See p. 132 of the text for a beautiful illustration.


2. About 16%.

3. 70 - 100 = -30, which is 15 x -2. Therefore 70 is 2 standard deviations


below the mean.

4. About 2%.

5. 50 - 100 = -50, which is 15 x -3.33. Therefore 50 is 3.33 standard devi-
ations below the mean.

6. According to Table B of Appendix F, 0.05% of the cases lie beyond a z score
of 3.30 (which is the closest we can get to 3.33 in the table). Coming in toward
the mean to a z of -2.00 (corresponding to an IQ of 70), we find that 2.28% of
the cases lie beyond it. That leaves 2.28 - 0.05 = 2.23% of the cases in the
interval between z = -3.30 and z = -2.00. So about 2% of the population is
mildly retarded by Zigler's definition.

7. About 0.05% (actually somewhat less). This is about 1 person in 2000.

8. Five-two = 5 x 12 + 2 = 60 + 2 = 62 inches. For men, the first centile
point is 62.6 inches. Thus less than 1% of the men are below 62 inches in height.
9. Between 20 and 30%.

CHAPTER 9

Pages 83-84:

1. Positive and high, surely very close to perfect.

2. Positive and probably at least moderate. The older children will have
both longer noses and larger vocabularies. The examples in the first two ques-
tions here show that two variables can be correlated even though neither has
any influence on the other.

3. Almost certainly zero.

4. Positive and probably high. Note that instead of two scores for a single
subject, here we have two scores for a pair of subjects (a couple). The couple
is thus the equivalent of a single subject, in that it is the unit on which the

two variables are measured.

5. Still positive and high. The change in social custom would not influence
the relationship between the two variables; it would only raise the scores on
one variable (husband's age) relative to the scores on the other variable (wife's
age). A couple with a high score on husband's age would still tend to have a
high score on wife's age, and a couple with a low score on husband's age would
still tend to have a low score on wife's age.

6. The question is nonsensical, because there are no pairs of scores here.
There is no logical way to pair a baseball player's height with a football
player's height; the two teams probably have different numbers of players, for
one thing. So the concept of correlation simply does not apply to a situation
like this.

Life is full of questions like this one, by the way, questions to which
the proper answer is, "That's a stupid question." Stay alert for them.

Symbolism Drill:

See p. 231.

CHAPTER 10

Symbolism Drill:

See p. 231.

CHAPTER 11

Page 95:

             If r is positive   If r is zero   If r is negative

If X > X̄        Y' > Ȳ             Y' = Ȳ          Y' < Ȳ

If X = X̄        Y' = Ȳ             Y' = Ȳ          Y' = Ȳ

If X < X̄        Y' < Ȳ             Y' = Ȳ          Y' > Ȳ

CHAPTER 11, continued

Page 96:

For the answers to the symbolism drill, see p. 231.

CHAPTER 12

Pages 101-102:

For the answers to the symbolism drill, see p. 231.

CHAPTER 13

Page 108:

μ = ΣX/N = (1+2+3+4+5+6+7+8+9+0)/10 = 45/10 = 4.5.

3. Let's calculate Σx². As noted on p. 87 of the text, Σx² = ΣX² - (ΣX)²/N.
Here we have Σx² = 285 - 45²/10 = 285 - 2025/10 = 285 - 202.5 = 82.5. Then σ =
√(Σx²/N) = √(82.5/10) = √8.25 = 2.87.

Page 110:

3. The standard error of the mean, which is the standard deviation for the
real sampling distribution and not just your approximation to it, is 2.87/√2 =
2.87/1.41 = 2.03.

For n = 10, the standard error of the mean is 2.87/√10 = 2.87/3.16 = 0.91.
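Both standard errors can be recomputed from scratch, since the population here is just the digits 0 through 9:

```python
import math

digits = list(range(10))                  # the population: digits 0 through 9
n_pop = len(digits)
mu = sum(digits) / n_pop                  # 4.5
sigma = math.sqrt(sum((x - mu) ** 2 for x in digits) / n_pop)   # sqrt(8.25) = 2.87

for n in (2, 10):
    # standard error of the mean: sigma / sqrt(n)
    print(n, round(sigma / math.sqrt(n), 2))
```

The loop prints 2.03 for n = 2 and 0.91 for n = 10, matching the figures above.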
Pages 111-112:

For the answers to the symbolism drill, see p. 231.

CHAPTER 14
Page 117:

For the answers to the symbolism drill, see p. 231.

CHAPTER 15
Page 127:

For the answers to the symbolism drill, see pp. 231-232.

CHAPTER 16
Page 135:

For the answers to the symbolism drill, see pp. 231-232.

CHAPTER 17
Page 147:

ΣX = 30, X̄ = 5.0; ΣY = 30, Ȳ = 5.0; ΣD = 0, D̄ = 0.0.



CHAPTER 17, continued

Pages 148-149:

One way (out of several) to complete the proof is this:

μ_D = (X₁ + X₂ + X₃ + ... + X_N - Y₁ - Y₂ - Y₃ - ... - Y_N) / N

    = [(X₁ + X₂ + X₃ + ... + X_N) - (Y₁ + Y₂ + Y₃ + ... + Y_N)] / N

    = (ΣX - ΣY) / N

    = ΣX/N - ΣY/N

    = μ_X - μ_Y
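A quick numeric check of the identity μ_D = μ_X - μ_Y, with made-up paired scores:

```python
def mean(v):
    return sum(v) / len(v)

x = [7, 4, 9, 6, 4]                # hypothetical paired scores
y = [5, 2, 8, 1, 4]
d = [a - b for a, b in zip(x, y)]  # difference scores D = X - Y

print(mean(d))             # 2.0
print(mean(x) - mean(y))   # also 2.0: the mean of the differences equals
                           # the difference of the means
```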

Pages 150-151:

For the answers to the symbolism drill, see pp. 231-232.

Pages 152-153:

1. Calling the two samples X and Y, you should compute X̄, Ȳ, S_X, S_Y, and
X̄ - Ȳ. The latter should be compared to the average of S_X and S_Y, as Section
6.14 of the text, starting on p. 95, tells you, so you can get some idea of how
large the difference between the sample means is.

2. You should test a hypothesis about the difference between two population
means. The sample means are independent. The null should state that the differ-
ence between the two population means of interest is zero, and the alternative
should say that it is not zero, which is the two-tailed case. You will have
to choose an α level, estimate σ_X̄-Ȳ, and calculate a "z".

3. You should proceed as in the first study (see Question 1), and in addition
you should calculate the correlation coefficient for the two sets of scores.

4. This is a case of dependent means, so you have your choice of the pro-
cedures described in Sections 17.10 and 17.11 of the text. The null hypothesis
should again declare no difference between the two population means, and the
alternative should again be two-tailed. You would have to calculate s_X̄-Ȳ or s_D̄
and a "z" again.

5. In the second study, each score for Variety A is paired with a score for
Variety B, but the two tables of data do not indicate the pairings, making it
impossible to compute s_X̄-Ȳ or the difference scores and s_D̄.

CHAPTERS 18 - 21
Pages 159-160, 166-167, 173-175, and 184-185:

For the answers to the symbolism drills, see pp. 231-233.

CHAPTER 22
Pages 204-206:

1. treatment

3. samples

4. populations

5. Grand

6. Within

7. Sum of squares

8. Within-groups sum of squares

9. Within-groups degrees of freedom

10. Among-groups estimate of a2

11. Among-groups sum of squares

12. Among-groups degrees of freedom

13. Total sum of squares

14. Total degrees of freedom

15.

16. column

17. row

18. Mean of the sample in the ith column

19. Mean of the sample in the ith row

20. Mean of the population in the ith column


21. Mean of the population in the ith row

22. Within-cells estimate of a2


23. Column estimate of a2
24. Row estimate of a2
25. Interaction estimate of a2
26. Sum of squares for columns

CHAPTER 22, continued

27. Sum of squares for rows

28. Sum of squares for interaction

29. Sum of squares within cells

30. Total sum of squares

31. Degrees of freedom for columns

32. Degrees of freedom for rows

33. Degrees of freedom for interaction

34. Degrees of freedom within cells

35. Total degrees of freedom

36. s_C²/s_WC² or s_R²/s_WC² or s_RxC²/s_WC²

37. Sample

38. Population

39. standard error of a comparison

40. sw2 or swc2

41. F

Page 206:

1. A one-way analysis of variance of the kind described in Chapter 22.

2. No, because the samples are dependent. There is a kind of one-way analysis
of variance suitable for the data in such a case, but like the t-test for depen¬
dent means, it requires knowledge of how the scores in one sample line up with
the scores in the other sample or samples. This was the information that the
businessman failed to record.

CHAPTER 23

Page 218:

1. Among the subjects tested at 50 db, 80% helped the stranger. Among those
tested at 87 db, though, only 15% helped. [80% = 16/(16+4), and 15% = 3/(3+17).]

2. Two variables.

3. (a) In the population sampled, there is no association between noise level
and helping vs. just walking by. (b) In the population sampled, there is such
an association. (c) The proportion of subjects helping in the population that
could be tested at 50 db equals the proportion helping in the population that
could be tested at 87 db. (d) Those two proportions are not equal.

4. χ² = 14.44 with Yates' correction. df = 1, and the critical value of χ²
at α = .05 is 3.84. Ergo the null hypothesis can be rejected and the alternative
accepted.
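The value in Answer 4 can be verified step by step: expected frequencies come from the row and column totals, and each cell contributes (|O - E| - 0.5)²/E under Yates' correction.

```python
def yates_chi_square(table):
    """Chi-square for a 2 x 2 table with Yates' correction:
    sum over cells of (|O - E| - 0.5)^2 / E, where E is the
    expected frequency (row total x column total / n)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (abs(observed - expected) - 0.5) ** 2 / expected
    return chi2

helping = [[16, 3],   # helped:      50 db, 87 db
           [4, 17]]   # didn't help: 50 db, 87 db
print(round(yates_chi_square(helping), 2))   # 14.44, matching Answer 4
```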
245

HOMEWORK

On the following pages is homework, one double-sided page for each chapter
but the first. The answers to the homework problems appear only in the instruc-
tor's manual for the text.

Most of the problems in the homework are modeled after ones appearing in
the text or in this workbook, to encourage you to do those in the text and the
workbook for practice.

The space for your name is at the bottom of the second side of the homework
pages, you will note, and you should write your name there upside down. The
person who checks your work is thus unlikely to know who you are until she or
he has finished the checking, and there will then be no question of bias in the
checking.
Homework for Chapter 2 247

See the comments on p. 245 of the workbook before beginning.

Suppose you're studying the effects of violent television programs on first-


grade boys. You run an experiment with two conditions. In your experimental
condition, 20 first-grade boys view some typical Saturday-morning fare with
plenty of violence, and in your control condition, another 20 first-grade boys
view something equally exciting but free of violence, like a series of races.
Each subject watches one program or the other, individually, and then goes out
to a playground. You determine the number of aggressive acts and the number of
altruistic acts each child commits over the first 30 minutes outside.

Say whether each of the following items is a population, a sample, an element,

a parameter , or a statistic m

1. The 20 boys in the experimental condition.    Pop Samp El Par Stat

2. The 20 altruism scores in the control condition.    Pop Samp El Par Stat

3. The number of aggression scores in the experimental condition, which is 20.    Pop Samp El Par Stat

4. ... group described in Question 1.    Pop Samp El Par Stat

5. The 17th child you tested in the control condition.    Pop Samp El Par Stat

6. The altruism scores you would have obtained had you tested all possible first-grade boys in the control condition.    Pop Samp El Par Stat

7. The number of aggressive acts committed by the child described in Question 5.    Pop Samp El Par Stat

8. The average altruism score for the 20 children in the control condition.    Pop Samp El Par Stat

9. The 20 aggression scores in the experimental condition.    Pop Samp El Par Stat

10. The average of the scores described in Question 6.    Pop Samp El Par Stat

Now say whether each of the following is a constant or a variable in your study, and if a variable, whether it is discrete or continuous.

11. ...    Cons Dis Var Cont Var

12. ...    Cons Dis Var Cont Var

13. Inclination to be altruistic.    Cons Dis Var Cont Var

14. ...    Cons Dis Var Cont Var

15. ...    Cons Dis Var Cont Var

16. ...    Cons Dis Var Cont Var


248 Homework for Chapter 2

In a naturalistic study of aggression among children, you simply observe


children interacting on a playground. Your subjects include boys and girls
from several grades in several schools. For each of the measurement procedures
described below, say what level of measurement you're working at.

17. To specify a child's sex, you write "0" for a boy and "1" for a girl.    Nominal Ordinal Interval Ratio

18. To specify a child's sex, you write "1" for a boy and "2" for a girl.    Nominal Ordinal Interval Ratio

19. To specify a child's sex, you write "M" for a boy and "F" for a girl.*    Nominal Ordinal Interval Ratio

20. To measure a child's physical maturity (in an admittedly crude way), you call the tallest child "1," the second tallest "2," the third tallest "3," and so on.    Nominal Ordinal Interval Ratio

21. To measure a child's sociability in the setting under observation, you determine the total time the child spends interacting with other children.    Nominal Ordinal Interval Ratio

22. To measure a child's inclination to commit aggression at this time in


this place, you again count the number of aggressive acts the child performs
over a 30-minute interval. What level of measurement obtains here? The answer
may or may not be one of the four levels discussed in the text and the workbook.
Justify your answer.

*Did you hear about the girl who received a report card with "F" written in after the word "sex"? "'F' in sex!" she cried. "I didn't even know I was taking it."
Think about the level of measurement involved in grading on the scale A-B-C-D-F.

Name ______________________
Homework for Chapter 3 249

Write the exact limits for the scores listed below.

Lower Limit    Upper Limit

_______    _______    1. The weight of a 97-pound weakling measured to the nearest pound.

_______    _______    2. The weight of a 44-kilogram weakling measured to the nearest kilogram.

_______    _______    3. The weight of a 100-pound weakling measured to the nearest 10 pounds.

_______    _______    4. The weight of a 45-kilogram weakling measured to the nearest 5 kilograms.

_______    _______    5. A time of 9.9 seconds for a hundred-yard dash timed to the nearest tenth of a second.

_______    _______    6. A distance of 100 yards for a 9.9-second dash measured (very crudely) to the nearest 100 yards.

The question logically next is the one on the back of this page, but it
wouldn't fit on this side. You may wish to do it now.

In the table of selected centile points from the distribution of heights for

women on p. 27 of the workbook:

_ 7. What percentage of the women were shorter than five feet even?

_ 8. What percentage were between five feet even and five-three?

_ 9. The middle 40% of the distribution lies between what two values?

_ 10. What percentage of the women were over 65 inches in height?

_ 11. C80 = ?

_ 12. How short can you be and still be taller than half the women in
this sample?

_ 13. Is the answer to Question 7 a centile point or a centile rank?

_ 14. Is the answer to Question 8 a centile point or a centile rank?

_ 15. Is the answer to Question 9 a centile point or a centile rank?

__ 16. Is the answer to Question 10 a centile point or a centile rank?

17. Is the answer to Question 11 a centile point or a centile rank?

In the table that answers Question 13 on p. 502 of your text:

18. What is the centile rank of a score of 79.5?

19. What is the 82nd centile point?

_ 20. What is C6?

21. The topmost 8% of the scores lie above what value?


250 Homework for Chapter 3

In a recent semester, 40 students who had enrolled in a statistics course


took the algebra test in Appendix A of your text. The test was scored as the
number of items answered correctly, and with 50 items the maximum score was
thus 50. The following jumble of scores resulted. Bring some order to this
chaos by grouping the data into class intervals with a width of 3. The topmost
interval should be 48-50. Cast the grouped data into the table below, giving
the proportions to 3 decimal places and the percentages to 1. You may or may
not need all the lines in the table.

45 44 47 29 28 37 41 34 34 50 47 34 36 17 43 42
40 23 36 28 22 28 33 25 38 49 21 43 25 44 15 37
35 18 32 41 32 42 41 35
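Though this workbook predates personal computing, a reader today can check the hand tally with a few lines of Python. The sketch below assumes only the interval scheme specified above (width 3, lowest interval 15-17, topmost interval 48-50):

```python
from collections import Counter

scores = [45, 44, 47, 29, 28, 37, 41, 34, 34, 50, 47, 34, 36, 17, 43, 42,
          40, 23, 36, 28, 22, 28, 33, 25, 38, 49, 21, 43, 25, 44, 15, 37,
          35, 18, 32, 41, 32, 42, 41, 35]

# Intervals of width 3 with 48-50 on top: 15-17, 18-20, ..., 48-50.
tally = Counter((score - 15) // 3 for score in scores)
n = len(scores)

for k in sorted(tally, reverse=True):          # topmost interval first
    low, high = 15 + 3 * k, 17 + 3 * k
    f = tally[k]
    print(f"{low}-{high}: f = {f:2d}, proportion = {f/n:.3f}, percent = {100*f/n:.1f}")
```

Running it reproduces the grouped frequency distribution the table asks for, with proportions to 3 decimal places and percentages to 1.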

Those are real data, so if you took the algebra test, you can meaningfully compare your score with them. You might want to compute your centile rank in the distribution. Also, if you won't be getting this page back before you have to do the homework for the next chapter, you should make a copy of the table.

In the table that answers Question 13 on p. 502 of the text, what are the following centile points and centile ranks? Use the procedures of Sections 3.10 and 3.11 in your computations.

_ The 16th centile point.    _ The 90th centile point.

_ The centile rank for 64.5.    _ The centile rank for 68.0.

Name ______________________
Homework for Chapter 4 251

Here are those 40 scores on the algebra test again. On this side of the page, make a frequency polygon showing the distribution grouped into class intervals 3 units wide with the topmost interval 48-50. (This will be the frequency polygon corresponding to the grouped frequency distribution that you made in the homework for Chapter 3.) Your vertical axis should show raw frequencies.

Be neat, and plan ahead so your graph is as large as possible.

45 44 47 29 28 37 41 34 34 50 47 34 36 17 43 42
40 23 36 28 22 28 33 25 38 49 21 43 25 44 15 37
35 18 32 41 32 42 41 35
252 Homework for Chapter 4

Now make a cumulative percentage-frequency curve for the data on the other side of this page, again grouping the scores into class intervals 3 units wide with 48-50 on top. Your vertical axis should show cumulative percentage frequencies.

Again, be neat, and make your graph as large as possible. Turn the page sideways if you like.

Name ______________________
Homework for Chapter 5 253

Here are the scores on the algebra test again.

45 44 47 29 28 37 41 34 34 50 47 34 36 17 43 42
40 23 36 28 22 28 33 25 38 49 21 43 25 44 15 37
35 18 32 41 32 42 41 35

First, leave the data ungrouped. In the space to the left below, arrange the
scores in order of magnitude, as in Table 3.2 of the text, and show your work in

answering the following questions.


1. Identify the mode of these scores.

2. Find the median, defining it informally.

3. Calculate the mean to one decimal place.

Now suppose the test had been scored as the number of items not answered correctly. The student with the score of 45 would then have earned a 5, for example. This change in scoring is equivalent to subtracting 50 from each score and calling the resulting negative numbers positive.

4. What would the modal score be for this other way of scoring the test?

5. What would the median (defined


informally) be?

6. How did you figure out Question 5?

7. What would the mean be? 8. How did you figure out Question 7?
254 Homework for Chapter 5

Now let the data (as originally collected) be grouped into class intervals
of width 3 with 48 - 50 on top. (This is the way you've been grouping the data
in previous homework.) In answering the questions below, don't bother to show
the grouped frequency distribution again, but do show your computations.

__ 9. What is the mode of the grouped data?

10. What is the median (defined formally as C50)?

11. What is the mean?

In the 1970 census, American women over age 45, who had presumably completed any childbearing they were going to do, reported the number of children they had borne. Some said none (about 6% had never married and about 10% of those who did marry had remained childless), some said one, some said two, and so on. The mean over all the women in this population was about 2.6 (W. Petersen, Population, 3rd ed., New York: Macmillan, 1975, p. 533).

What would the mean have been if the following events had happened?

- 12. Each woman had one additional child. (Those who had borne none
in reality would hypothetically have had one.)

_ 13. Each woman had two times as many children as she actually did. (Those who had really borne none would hypothetically have had 2 × 0 = 0 still.)

- 14. Each woman had first two times as many children as she actually
did, and then one additional one.

- 15. Each woman had first one child more than she actually did, and
then enough more to double the resulting number.
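The rules behind Questions 12-15 (adding a constant to every score shifts the mean by that constant; multiplying every score by a constant multiplies the mean by it) can be verified on any small set of numbers. The counts below are hypothetical, not the census data:

```python
def mean(xs):
    return sum(xs) / len(xs)

children = [0, 1, 2, 3, 4]       # hypothetical counts, not the census figures
m = mean(children)

assert mean([x + 1 for x in children]) == m + 1              # Question 12's rule
assert mean([2 * x for x in children]) == 2 * m              # Question 13's rule
assert mean([2 * x + 1 for x in children]) == 2 * m + 1      # Question 14's rule
assert mean([2 * (x + 1) for x in children]) == 2 * (m + 1)  # Question 15's rule
```

Once the rules check out on toy data, you can apply them to the reported mean of 2.6 without touching the individual scores.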

Name ______________________
Homework for Chapter 6 255

Once again, the scores on the algebra test.

45 44 47 29 28 37 41 34 34 50 47 34 36 17 43 42
40 23 36 28 22 28 33 25 38 49 21 43 25 44 15 37
35 18 32 41 32 42 41 35

Leave the data ungrouped. In the space to the left below, show your work in answering the following questions.

2. What is ZX2? (If you're using a


calculator, there is no need to
show your work for this item or
the next one.)

3. What is Zx?

4. What is (Zx)2?
5. What is Zx2?

6. What is S2?

7. What is S?

Now suppose, as you did in the homework for the last chapter, that the test had been scored as the number of items not answered correctly, a change equivalent to subtracting 50 from each score and calling the resulting negative numbers positive.

Yes No 8. Would ΣX² change?

Yes No 9. Would ΣX change?

Yes No 10. Would (ΣX)² change?

Yes No 11. Would n change?

Yes No 12. Would Σx² change?

Yes No 13. Would S² change?

Yes No 14. Would S change?

15. Explain your answers to Questions 12, 13, and 14.


256 Homework for Chapter 6

Let the data now be grouped in the familiar way, into class intervals of width 3 with 48-50 on top. Show your computations for the questions below, but don't bother to copy in the grouped frequency distribution.

_ 16. What is Σx²?

_ 17. What is S²?

_ 18. What is S?

In the fall of 1977, the Educational Testing Service reported that 54,903 people had taken the Graduate Record Exam, and that their scores on those items measuring verbal aptitude had a mean of 503 and a standard deviation of 126. What would the new mean and the new standard deviation be if the following silly operations were performed on each of those 54,903 scores?

New Mean    New Standard Deviation

19. 10 points are added to each score.

20. Each score is multiplied by 2.

21. 10 points are added to each score, and the resulting


value is then multiplied by 2.

22. Each score is multiplied by 2, and 10 points are then


added to the result.

23. Each score is divided by 3.

24. 50 points are subtracted from each score.

25. Each score is divided by 3, and 50 points are then subtracted from the result.

26. 50 points are subtracted from each score, and the resulting value is then divided by 3.
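The effect of these operations on both the mean and the standard deviation can be confirmed on a small hypothetical score set (the rules, not these particular numbers, are the point):

```python
import math
import statistics

# Hypothetical scores, not the 54,903 GRE scores.
scores = [480, 500, 520, 560]
mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)       # population SD, like the text's S

# Question 21's operation: add 10 to each score, then multiply by 2.
transformed = [2 * (x + 10) for x in scores]

# Adding a constant shifts the mean but leaves the SD alone;
# multiplying by a constant multiplies both.
assert statistics.mean(transformed) == 2 * (mu + 10)
assert math.isclose(statistics.pstdev(transformed), 2 * sigma)
```

The same two rules, applied in the right order, answer all of Questions 19-26 for the reported mean of 503 and standard deviation of 126.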

Name ______________________
Homework for Chapter 7 257

These exercises will probably strike you as repetitious and tedious. Do them carefully, though, because the principles you'll be learning and practicing are essential for the understanding of important and interesting matters coming up in this course.

Answer these questions to 4 decimal places (e.g., .1234). If a collection


of scores is normally distributed, what proportion fall...

1. above z — +0.50?

2. above z = +1.50?

3. below z = -2.50?

4. below z = -3.50?

5. above z = -0.50?

6. above z = -1.50?

7. below z = +2.50?

8. below z = +3.50?

9. between z = +0.60 and z = +1.20?

10. between z = -1.80 and z = -2.40?

11. between z = -1.80 and z = +0.60?

12. between z = -2.40 and z = +1.20?

13. outside the limits z = +0.40 and z = +0.80?

14. outside the limits z = -1.20 and z = -1.60?

15. outside the limits z = -0.40 and z = +0.40?

16. outside the limits z = -0.80 and z = +0.80?

IQ scores for the general public on the Wechsler Adult Intelligence Scale (the WAIS) are normally distributed with a mean of 100 and a standard deviation of 15. Answer the following questions to 2 decimal places (e.g., 12.34). What percentage of the general (adult) public has a Wechsler IQ...

17. below 70?

18. below 85?

19. above 115?

20. above 145?

21. below 130?

22. above 85?

23. between 85 and 115?

24. between 70 and 130?

25. between 55 and 145?
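Where the workbook sends you to the normal-curve table in the text, a modern reader can check answers of this kind with Python's statistics.NormalDist. The example below uses IQ limits the questions above do not ask about, so your answers stay your own:

```python
from statistics import NormalDist

wais = NormalDist(mu=100, sigma=15)

# Proportion of the adult public with a Wechsler IQ between 90 and 110
# (limits not among the questions above):
p_90_to_110 = wais.cdf(110) - wais.cdf(90)

# Expressed as a percentage to 2 decimal places, as the questions request:
print(round(100 * p_90_to_110, 2))
```

The cdf method returns the proportion of the distribution lying below a given score, so "above" questions use 1 - cdf and "between" questions use a difference of two cdf values.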


258 Homework for Chapter 7

Answer these questions to 2 decimal places (e.g., 1.23). If a collection


of scores is normally distributed, what z score...

26. divides the upper 10% of the scores from the remainder?

27. divides the upper 20% of the scores from the remainder?

28. divides the lower 25% of the scores from the remainder?

29. divides the lower 40% of the scores from the remainder?

30. divides the upper 60% of the scores from the remainder?

31. divides the upper 70% of the scores from the remainder?

32. divides the lower 80% of the scores from the remainder?

33. divides the lower 90% of the scores from the remainder?

Answer these questions to 2 decimal places again. If a collection of scores


is normally distributed, what z score limits identify...

___________ 34. the central 90% of the scores?

_ 35. the central 80% of the scores?

__ 36. the outermost 70% of the scores?

37. the outermost 60% of the scores?

Answer these questions to the nearest whole number (e.g., 123). If a distri¬
bution of scores on a standardized aptitude test is normal in shape with a mean
of 500 and a standard deviation of 100, what is the raw score (not the z score)...

_ 38. below which 50% of the scores fall?

__ 39. below which 75% of the scores fall?

40. above which 85% of the scores fall?

41. above which 95% of the scores fall?

Again answer to the nearest whole number. In the distribution described just
above, what are the raw scores (not the z scores)...

42. that enclose the central 50% of the scores?
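Questions 26-42 run the table in the other direction, from a proportion back to a z score or raw score; NormalDist's inv_cdf does the same. Again the example values are ones the questions above do not use:

```python
from statistics import NormalDist

test_dist = NormalDist(mu=500, sigma=100)

# The raw score below which 55% of the scores fall (the 55th centile point,
# a value not asked for above):
c55 = test_dist.inv_cdf(0.55)

# The z-score limits enclosing the central 95% of any normal distribution:
lo, hi = NormalDist().inv_cdf(0.025), NormalDist().inv_cdf(0.975)

print(round(c55), round(lo, 2), round(hi, 2))
```

For a "central P%" question, put half of the remaining (100 - P)% in each tail before looking up the limits, as the code does with 0.025 and 0.975.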

Name ______________________
Homework for Chapter 8 259

Fill in the missing values in the table below, noting that the four scores
on a given line would be truly equivalent only if the distributions from which
they came had similar shapes. This exercise is modeled after the one in the
workbook on p. 75.

Where the answer is not a whole number, give it to one decimal place (e.g.,
123.4)—except that you should give the centile ranks in the right-hand column
to two decimal places (e.g., 12.34).

TABLE of EQUIVALENT SCORES

Score where    Score where    Score where    Score where    Centile Rank if
μ=100, σ=15    μ=100, σ=10    μ=500, σ=100   μ=80, σ=20     Shape is Normal

850

125

100

54.50

50.00

94

84

300

10
260 Homework for Chapter 8

In the fall of 1977, a college senior took the Graduate Record Examination
and received the following information from the Educational Testing Service,
which constructs and scores this instrument:
Quantitative Verbal Analytic
Aptitude Aptitude Aptitude
Student's own score: 440 560 585
Mean score for all who took the test: 525 503 513
Standard deviation for all who took the test: 133 126 129
Her centile rank in this group: 25 66 63

_ 1. What is her z score (to 2 decimal places) on the quantitative part?

_ 2. If the quantitative-aptitude scores of all who took the test (N = 54,903) had been normally distributed, what would her centile rank have been (to the nearest whole number)?

3. What is her z score (to 2 decimal places) on the verbal part?

4. If the verbal-aptitude scores of all who took the test had been
normally distributed, what would her centile rank have been (to the
nearest whole number)?

5. What is her z score (to 2 decimal places) on the analytic part?

6. If the analytic-aptitude scores of all who took the test had been
normally distributed, what would her centile rank have been (to the
nearest whole number)?
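The two-step conversion these questions call for (score to z, then z to a centile rank under normality) can be sketched in a few lines. The figures below are hypothetical, not the student's:

```python
from statistics import NormalDist

def z_score(score, mean, sd):
    return (score - mean) / sd

# Hypothetical figures: a score of 600 on a part with mean 500 and SD 100.
z = z_score(600, 500, 100)

# Under normality, the centile rank is the cumulative proportion of the
# normal curve below that z, expressed as a percentage:
rank_if_normal = 100 * NormalDist().cdf(z)
print(round(z, 2), round(rank_if_normal))
```

Substituting the student's score, the group mean, and the group standard deviation for each subtest gives the answers to Questions 1-6.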

Now for each subtest compare the student's actual centile rank with the one she would have earned in a normal distribution. The comparison doesn't provide conclusive evidence, but it does permit an informed guess about the actual shape of the distribution of scores. If the discrepancy between her actual centile rank and the one she would have earned in a normal distribution is small, we have little evidence against the most plausible hypothesis, which is that the true shape is normal. If the discrepancy is large, we do have some good evidence against the hypothesis of normality, and we can tell whether the shape is skewed left or skewed right. So for each subtest, indicate your conclusion about the shape of the distribution of scores. If you infer a skew, spell out your reasoning about the direction of the skew.

7. Quantitative aptitude:

8. Verbal aptitude:

9. Analytic aptitude:

Name ______________________
Homework for Chapter 9 261

In the fall of 1977, Ramapo College offered a statistics course (taught by someone other than the author of your workbook and using a text other than yours) in which the students were tested with a total of 500 multiple-choice items over the semester. Eleven students completed all the work for the course, and here for ten of them are their scores on the first examination, expressed as a percentage correct out of the 39 items on that exam, along with their percentage correct out of the semester's total of 500 items. The student at the median on the first test was dropped from the table to reduce the n to 10 and thus simplify the calculations you will be asked to do.

Student    % Correct on Exam 1    % Correct over Semester

A          38                     75
B          54                     65
C          62                     94
D          67                     81
E          67                     84
F          72                     93
G          77                     90
H          77                     93
I          82                     90
J          85                     95

How closely is performance on the first test related to performance over the
entire semester? To begin to answer this question, make a scatter plot of the
data in the space below. Do it neatly and as large as possible.
262 Homework for Chapter 9

To provide a more precise answer to the question about the relationship between performance on the first exam and performance over the entire semester, compute the Pearsonian correlation coefficient for the data on the other side. Use the raw-score method illustrated on p. 150 of the text, and find the means and standard deviations of the variables while you're at it.

Don't bother to list the individual values of X² and Y², but do show your other work. Give all values that are not whole numbers to three decimal places (as 1.234, e.g.). This is more than the number that would usually be reported for a mean or a standard deviation or a correlation coefficient, but you'll need the extra accuracy for future work with these data.

ΣX = _____    ΣY = _____

ΣX² = _____    ΣY² = _____

Σx² = _____    Σy² = _____

X̄ = _____    Ȳ = _____

Sx = _____    Sy = _____

ΣXY = _____

Σxy = _____

r = _____

Now that you've found the means, go back to your scatter plot and add lines that show the locations of the means, as in the figure on p. 148 of your text.

_____ Which students' data points lie in the first quadrant? List the letters that identify the students.

_____ Which students' data points lie in the second quadrant?

_____ Which students' data points lie in the third quadrant?

_____ Which lie in the fourth quadrant?
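If you have access to a computer, the raw-score method can be checked with a short program. The function below is a sketch of the formula from p. 150; it is applied here to hypothetical check data with a known answer rather than to the homework scores, so those answers remain yours to compute:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Raw-score formula: r = Σxy / sqrt(Σx² · Σy²), with Σxy = ΣXY - ΣXΣY/n, etc."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    return sxy / sqrt(sxx * syy)

# Check data: a perfect positive linear relation should give r = 1,
# and reversing one variable should give r = -1.
xs = [1, 2, 3, 4, 5]
assert abs(pearson_r(xs, [2 * x + 3 for x in xs]) - 1.0) < 1e-9
assert abs(pearson_r(xs, [-x for x in xs]) + 1.0) < 1e-9
```

Once the function passes those checks, feeding it the ten pairs from the table will verify your hand computation.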

Name ______________________
Homework for Chapter 10 263

1. In doing the homework for Chapter 9, you computed a correlation coefficient to describe the relationship between a student's performance on the first exam in a certain statistics course and the student's performance over the entire semester. Is it sensible to describe this number as the correlation coefficient for initial performance and total performance in a statistics course? Why or why not?

2. Suppose you classify all the registered voters in the U. S. by their age.
You group all the 18-year-olds together, all the 19-year-olds together, and so
on. Would the Pearsonian correlation coefficient do a good job of describing
the relationship between a group's age and the proportion of the people in that
group who actually voted in a given election? Why or why not?

The following data (taken from an almanac) indicate the sort of numbers you'd
be working with. These are estimates of the national turn-out for the 1972
presidential election, which was the first such election in which citizens under
21 could vote.

Age Bracket    % of Those Registered Who Voted

18 - 20        48.3
21 - 24        50.7
25 - 29        57.8
30 - 34        61.9
35 - 44        66.3
45 - 54        70.9
55 - 64        70.7
65 - 74        68.1
75 & +         55.6
264 Homework for Chapter 10

3. Thanks to your competence at statistics, you've been hired by a company that constructs and sells tests to public-school systems. Your job is to develop a test to compete with a widely used instrument for measuring grade-school children's reading ability. You work up some items that seem promising, put them together into a test that has the advantage of taking less class time to administer, and then check to see how closely scores on your test correlate with scores on the competing test. The results are disappointing. In a sample of first-graders, the Pearsonian correlation coefficient is only .32, and in a sample of sixth-graders the coefficient is only .39. You'd like to be able to say in advertising the test that it yields scores closely correlated with the instrument now in common use. How can you squeeze a higher correlation coefficient out of your data? There's a way to do it without changing any of the students' scores. (This tactic would still be unethical, though.)

4. Have you ever wondered just how the amount of studying a person does on a given subject relates to the person's mastery of that subject? Suppose you questioned a variety of your classmates, asking each: a) How much time did you spend in studying for whatever objective examination you took most recently, and b) what percentage of the items on the exam did you get correct? Imagine that the Pearsonian correlation coefficient for these two variables turns out to be negative in your sample. Say there are 15 people in the sample, and the value of r is -.32. Would you be tempted to reduce your studying time in the expectation that your grades would increase? Name at least two reasons why your finding (r = -.32) provides only very weak evidence that more studying time causes exam performance to deteriorate.

Name ______________________
Homework for Chapter 11 265

Here again are the data from the homework for Chapter 9.

Student    % Correct on Exam 1    % Correct over Semester

A          38                     75
B          54                     65
C          62                     94
D          67                     81
E          67                     84
F          72                     93
G          77                     90
H          77                     93
I          82                     90
J          85                     95

Use the data to determine the regression equation for predicting percentage correct over the entire semester from percentage correct on the first exam. You'll need the means and standard deviations of the two variables, which you found in the homework for Chapter 9. Show your work.

Y' = ______________________

Copy this equation for use in the next chapter's homework.


Now make another scatter plot of the data, as you did for Chapter 9, but this
time add the regression line to it.
266 Homework for Chapter 11

The student who was at the median on the first exam (and who was omitted from the table) earned a score of 69% correct on that exam. Use your regression equation to predict this person's score over the entire semester (to 2 decimal places). The actual figure was 86% correct. Show your work below.

Now use the regression equation to predict performance over the entire semester for the 10 students who contributed data to the table. Fill in the table below, which parallels the one on p. 186 of the text. Give the values of Y' to 2 decimal places. Remember that Σ(Y - Y') should be 0, but it may be a little off because of rounding error.

Student    X     Y     Y'    (Y - Y')    (Y - Y')²

A          38    75
B          54    65
C          62    94
D          67    81
E          67    84
F          72    93
G          77    90
H          77    93
I          82    90
J          85    95

Compute Syx to 2 decimal places, doing it directly as √(Σ(Y - Y')²/n). Show your work:

Now compute Syx to 2 decimal places from the formula on p. 187, again showing your work. Did you get the same value?
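A computer check of the whole procedure (fit the line, predict, then take the standard error of estimate directly from the residuals) can be sketched as below. The data here are hypothetical points lying exactly on a line, chosen so the expected results are known, rather than the homework scores:

```python
from math import sqrt

def regression(xs, ys):
    """Least-squares line Y' = a + bX, via b = Σxy/Σx²."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def s_yx(xs, ys):
    """Standard error of estimate, computed directly as sqrt(Σ(Y - Y')²/n)."""
    a, b = regression(xs, ys)
    n = len(xs)
    return sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n)

# Hypothetical check data: points exactly on Y = 1 + 2X, so Syx should be 0.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = regression(xs, ys)
```

Run on the homework data, regression and s_yx will reproduce the equation and the Syx value you computed by hand, up to rounding.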

Name ______________________
Homework for Chapter 12 267

1. Look back at the regression equation for the data presented in the homework for Chapters 9 and 11. What is the regression coefficient for those data?

2. Interpret the regression coefficient in the manner described on the


bottom of p. 202 in the text. Remember that you are talking about percentage
points, because the scores indicate percentage correct on examinations.

3. Compute k to two decimal places for the data on initial performance and total performance in that statistics course.

4. Does k indicate a strong relationship between those two variables or a weak one? Interpret k in the manner described on p. 208, by specifying the reduction in the errors of prediction relative to the case in which the correlation is zero.

5. Compute the coefficient of determination to two decimal places


for the data on initial performance and total performance in the statistics

course.

6. Interpret the coefficient of determination in the manner described at the bottom of p. 210.
268 Homework for Chapter 12

7. Recall the test of reading ability you were (hypothetically) constructing for Question 3 of the homework for Chapter 10. The test yielded scores that correlated only weakly with another test presumably measuring the same thing when you looked at a sample of first-graders and again when you looked at a sample of sixth-graders. Suppose you now collect data on a good number of children in each grade from the first through the sixth. Do you expect the correlation for the entire group to be larger than the values for just the first-graders or just the sixth-graders, or do you expect the correlation for the entire group to be smaller? Justify your answer, and note that it again has some bearing on the ethics of evaluating and advertising tests.

8. That company you're working for now develops a set of materials for teaching reading. (There's a huge market for this kind of thing.) To persuade potential customers that the materials work, it is necessary to try them out, collecting data before and after students use the materials. Now the company has to decide what kind of sample to study: children who are already well above average for their age in reading ability, children who are average or close to it for their age, or children who are well below average for their age. There is an unethical choice you could make here that would virtually guarantee that the mean reading-ability score of the children in the sample would increase from before the use of your company's materials to afterwards, even if the materials were ineffective. Which choice is this, and why will this sample's mean almost certainly rise from the pretest to the posttest no matter how poor the instructional materials are?

Name ______________________
Homework for Chapter 13 269

Suppose you're conducting research on something like errors in social


judgment, something that might be influenced by your subjects' intelligence.
You're accordingly worried about getting a sample of subjects who, as a group,
are unusually bright or unusually dull. You know that IQ scores for the general
adult public on the Wechsler Adult Intelligence Scale (the WAIS) are normally
distributed with a mean of 100 and a standard deviation of 15. If your subjects
will be a truly random sample of this population, you can correctly predict the
likelihood of getting a group whose mean IQ lies more than, say, 5 or 10 points
away from the population mean of 100. So, onward to the problems below, the
answers to which will tell you whether it's realistic to worry about such things
as getting a sample whose mean IQ is below 90 or over 110.

Answer these questions to 4 decimal places (as .1234, e.g.), and show your work.

First, suppose your sample size is going to be 9. What is the probability


that those 9 people will have IQ scores whose mean is...

1. over 105?

2. under 90?

3. more than 5 points away from 100 in either direction (i.e., more than 100 + 5 or less than 100 - 5)?

4. more than half a standard deviation away from the population mean? (The standard deviation this question refers to is that of the population.)
5. within one standard deviation of the population mean?


270 Homework for Chapter 13

Now suppose you almost triple your sample size to 25. Again answer to 4 decimal places, and show your work. What is the probability that the 25 people in your sample will have IQs with a mean...

6. over 105?

7. under 90?

8. more than 5 points away from the population mean?

9. more than half a standard deviation away from the population mean?

10. within one standard deviation of the population mean?

11. Suppose your sample will be quite large, with 100 persons. To 4 decimal places, what is the probability that the 100 people will have Wechsler IQs whose mean lies within 1 measly point of the population mean? You may find the answer surprisingly large.
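The computations in Questions 1-11 all rest on one fact: the sampling distribution of the mean has standard deviation σ/√n. That can be sketched, and checked, with NormalDist; the probability computed below uses a cutoff (103) that none of the questions above ask about:

```python
from statistics import NormalDist

mu, sigma, n = 100, 15, 25
sigma_m = sigma / n ** 0.5            # standard error of the mean, σ/√n

sampling_dist = NormalDist(mu, sigma_m)

# Probability that a random sample of 25 people has a mean IQ over 103
# (a cutoff not used in the questions above):
p_over_103 = 1 - sampling_dist.cdf(103)
print(round(p_over_103, 4))
```

Changing n from 9 to 25 to 100 shrinks sigma_m, which is exactly why the probabilities change so much across the three parts of this page.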

12. With a sample as large as 100, it doesn't much matter whether the distri¬
bution of IQs is normal in the population. Even if it departs considerably from
normality, we can still be quite confident that the answer to Question 11 is
correct. Why is this? There are three "magic words" that name the reason, and
the briefest possible answer to this question (an answer that is still entirely
correct, though) requires no more than those three little words.

Name ______________________
Homework for Chapter 14 271

An honest die is one whose six faces turn up with equal probability. The "other" way of looking at probability described in Section 14.2 of the text thus applies to it.

In answering these questions, assume that the dice are honest, and give the requested probabilities both as common fractions reduced as far as possible (as 1/6, e.g.) and as decimal fractions to four decimal places (as .1234, e.g.). You'll do best if you first translate each question into an OR question, an AND question, or a combination of the two, whichever is appropriate. (See p. 115 of the workbook.) Show your work below each of the questions.

With a pair of honest dice, what is the probability of rolling...

_ 1. a 5 on a certain one of the dice (call it Die #1) and a 6 on the other (on Die #2)?

2. a 6 on Die #1 and a 5 on Die #2?

3. a 5 on one die (it doesn't matter which) and a 6 on


the other?

4. a 6 on one die (it doesn't matter which) and a 6 on the

other?

5. "doubles" (the same number on both dice)?

To draw a sample of some given size at random is to draw it in such a way that all possible samples of that size are equally likely. In drawing a card (a sample of size 1) at random from a deck of playing cards, then, the other way of looking at probability described in Section 14.2 applies. Answer these questions as you did the series above.

If you draw a card at random from a standard deck of 52, what is the probability that you will get...

6. the Queen of Hearts?

7. a queen (of any kind)?


272 Homework for Chapter 14

8. a heart (of any kind)?

9. a queen OR a heart?

Now suppose you draw a first card at random, look at it, replace it, and draw again at random. This is sampling (drawing a sample of size 2) with replacement, and it is equivalent to drawing the first card from one deck and the second card from a second deck.

What is the probability that you will get...

10. the Queen of Hearts on both draws?

11. a queen (of any kind) on both draws?

12. a heart (of any kind) on both draws?

13. a queen on the first draw AND a heart on the second?

14. The LaMaze method is a technique of prepared childbirth permitting a laboring woman to participate actively in the delivery of her child, perhaps obviating the need for analgesia or anesthesia. In the LaMaze classes attended by the wife of the author of your workbook, there were seven women enrolled, and ... of them gave birth to a girl rather than a boy. Is this a rare occurrence? Assume that the probability of any one's bearing a girl is 1/2. (Actually the probability of a boy is slightly greater than the probability of a girl, about .51 or .52. More boys than girls are conceived, and even though males are more likely to die during gestation, they still predominate slightly at delivery time.)

15. In answering Question 14


what assumption (other than the one about the
probability of a woman's bearing a girl) did you have to make?
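Assuming the event in Question 14 is that all seven women bore girls, and that the seven births are independent with probability 1/2 of a girl each, the answer is simply (1/2)⁷. A two-line check:

```python
# Seven independent births, each with P(girl) = 1/2, all girls:
p_girl = 0.5
p_all_seven = p_girl ** 7
print(p_all_seven)  # 0.0078125, i.e. 1 chance in 128
```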

Homework for Chapter 15 273

Under the "personalized system of instruction" (PSI), material to be learned
is divided into small units, students work on it at their own pace, and they take
the exam on a given unit only when they think they're ready for it. They must
pass the exam at a high level before going ahead to the next unit, but they are
allowed several tries for each unit (taking a different exam each time, of course).

Three psychologists at Southwest Minnesota State College recently reported a
comparison of PSI with the traditional mode of instruction for introductory
psychology (R. C. Riedel, B. Harney, & W. LaFief, "Unit Test Scores in PSI
versus Traditional Classes in Beginning Psychology," Teaching of Psychology,
1976, 3, 76-78). In the fall quarter of an academic year, they used the method
with a criterion of 16 out of 20 correct (80%) as the passing score for the test
on each of the ten units into which they divided their course. They did not
lecture but merely made themselves available at certain times to administer the
exams to whichever students were seeking to take one. In the winter quarter the
psychologists did lecture and administered the same tests, one for each unit,
every fourth class period, with no opportunity for students to retake an exam.

The psychologists expected that the students in the PSI course would generally
make their initial try at the exam on a given unit without being fully prepared,
so that most would not meet the criterion of 16 correct on the first try. In
fact, in the students' first tries at the exam on the very first unit of the
course, they earned a mean of 17.93 correct (which is almost 90%). The psychol-
ogists reported the n and the standard deviation for this group: 64 and 2.91,
respectively. Thus you can determine whether it is plausible that those 64 scores
are a random sample from a population whose mean is only 16. Test the appropriate
hypothesis, using the .05 level of significance and doing a two-tailed test.
Carry all calculations to 3 decimal places, and round your final answers to 2,
but if you need an answer for a later calculation, use the 3-place version.

1. H0 in symbols:    2. Ha in symbols:

3. α:    4. X̄:

5. The value for the standard deviation given above is S. Compute s using the
formula s = √((S²)(n)/(n − 1)):

6. s_X̄:    7. z_crit:

8. "z"_calc:    9. Decision on H0: Accept  Reject

in Question 9 mean in substantive


10. What does the statistical decision
for the substantive question of whether
terms? That is, what are its implications
to meet the criterion of 16 correct on
the PSI students were generally unprepared
their first tries at the exam?
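If you'd like to verify your arithmetic for questions 1 through 9, the whole z test can be sketched in a few lines of Python. This is only a checking aid, not a substitute for showing your work.

```python
import math

# PSI first-unit data from the problem: mean 17.93, S = 2.91, n = 64.
x_bar, S, n, mu_0 = 17.93, 2.91, 64, 16

# s from S, as in the workbook formula s = sqrt(S**2 * n / (n - 1)).
s = math.sqrt(S**2 * n / (n - 1))
s_mean = s / math.sqrt(n)          # standard error of the mean
z = (x_bar - mu_0) / s_mean

print(round(s, 3), round(s_mean, 3), round(z, 2))
# |z| far exceeds the two-tailed .05 critical value of 1.96,
# so H0: mu = 16 is rejected.
```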

11. If you had conducted that test at the .01 level of significance, would
your decision on the null hypothesis have been different?
Yes  No

12. If you had conducted the test at the .05 level of significance but had
done a one-tailed version in which the alternative hypothesis stated that μ < 16,
would your decision on the null hypothesis have been different?
Yes  No

13. If you had conducted the test at the .01 level of significance and had
done a one-tailed version in which the alternative hypothesis stated that μ < 16,
would your decision on the null hypothesis have been different?
Yes  No

In the lecture course, the mean score on the first exam was only 13.41 (67%
correct). Is it plausible that the scores in this group are a random sample from
a population whose mean is as large as 16? The n for the group was 61, and the
standard deviation (S) was 4.07. Do a two-tailed test at the .05 level of sig-
nificance again.

14. H0 in symbols:    15. Ha in symbols:

16. α:    17. X̄:    18. s:

19. s_X̄:

20. z_crit:    21. "z"_calc:    22. Decision on H0: Accept  Reject

Over the remaining nine units of the course, the PSI students earned a mean
above 16 on their first tries at every unit except one. The mean of their first
tries at the eighth unit was only 15.39 (n = 62, S = 2.78). Is it plausible that
those scores were a random sample from a population whose mean was only 16?
Do the appropriate two-tailed test at the .05 level of significance.

23. s:    24. s_X̄:

25. "z"_calc:    26. Decision on H0: Accept  Reject

27. If you had conducted the test (still two-tailed) at the .10 level of
significance, what would z_crit have been?

28. Would your decision on H0 have been different?  Yes  No

You may be interested to know that over all 10 units of the course, even on
their first tries the PSI students earned a mean score higher than the mean for
the students in the lecture version. In the homework for Chapter 17 you will
have a chance to determine whether the differences are statistically significant.

Homework for Chapter 16 275

1. Recall from the homework for Chapter 15 the comparison of PSI with a
traditional lecture course in elementary psychology at Southwest Minnesota
State College. In their first tries at the exam on the first unit of the course,
the 64 PSI students earned a mean of 17.93 correct out of the 20 items on the
exam, with a standard deviation of 2.91. The figure 16 out of 20 was of special
interest here, because 16 or better was required for going on to the next unit.
Evaluate the difference between 17.93 and 16.00 following the procedure suggested
in Section 16.6 of the text: determine how many standard deviations' worth the
difference is, and give the answer to two decimal places.

2. Is this difference "negligible" or "of some importance" according to the
standards proposed on p. 96 of the text?

3. The 61 students in the lecture course earned a mean of only 13.41
on the first exam, with a standard deviation of 4.07. How many standard devia-
tions' worth is the difference between 13.41 and 16.00?

4. How important is this difference according to the standards on p. 96?

5. The poorest performance for the PSI students came in their first tries
at the exam on the eighth unit, where they earned a mean of 15.39 with a
standard deviation of 2.78. How many standard deviations' worth is the
difference between 15.39 and 16.00?

6. How important is this difference according to the standards on p. 96?

7. The lecture students' best performance came on the third unit, where they
earned a mean of 16.60 with a standard deviation of 2.89. How many standard
deviations' worth is the difference between 16.60 and 16 even?

8. How important is this difference according to the standards on p. 96?
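All four of the standard-deviation comparisons in questions 1 through 8 follow the same recipe, (mean − 16)/S, so they can be checked together with this short sketch (a checking aid only):

```python
# Difference between each observed mean and the 16-point criterion,
# expressed in standard-deviation units (the procedure of Section 16.6).
cases = {
    "PSI unit 1":     (17.93, 2.91),
    "Lecture exam 1": (13.41, 4.07),
    "PSI unit 8":     (15.39, 2.78),
    "Lecture unit 3": (16.60, 2.89),
}
for label, (mean, sd) in cases.items():
    print(label, round((mean - 16.0) / sd, 2))
```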

It is common practice in the behavioral sciences for researchers to conduct a
one-tailed test if their hypothesis specifies the direction in which μ_true lies
from μ_hyp. Suppose the psychologists at Southwest Minnesota State College had
followed this practice in examining the data on the PSI students' first tries at
the exam on the first unit of their course. They expected that the students
would not generally have prepared well enough to earn a score of 16 or better.
Their null hypothesis would have said that μ = 16.

9. What would their alternative hypothesis have said?

10. Is there any level of significance that would have permitted the psychol-
ogists to discover that μ_true lies above 16? If so, what? If not, why not?

11. Given that the psychologists were interested in discovering the
population mean to be either above 16 or below it, what should their
alternative hypothesis have said?

12. Suppose you test the null hypothesis that the population mean is 16 for
a sample of PSI scores, and you end up rejecting the null. The sample mean was
18, let's say, and the n was 60. You did a one-tailed test at the .01 level of
significance, and a friend who is naive about statistics asks, "Does your result
mean we can be 99% confident that the population mean is above 16?" Explain to
your friend the logic behind your hypothesis test, and say how the figure 99%
enters into things. Work the figures 16, 18, and 60 into your explanation too.

Answer this question carefully! Do a rough draft on another page first.


Homework for Chapter 17 277

Recall again the comparison between PSI and a conventional lecture course in
elementary psychology. The instructors of the course reported the following data
for scores on the first exam:

          PSI     Lecture
X̄       17.93     13.41
S         2.91      4.07
n           64        61

Is it plausible that the two groups are random samples from populations with
identical means? Test the appropriate hypothesis at the .01 level of signifi-
cance, doing a two-tailed test. Carry your calculations to 3 decimal places, and
round your final answers to 2, but if you need an earlier answer for a calcula-
tion, use the figure with 3 decimal places. Show your work. If you need to com-
pute s, the formula is s = √((S²)(n)/(n − 1)).
1. These samples are (circle one): independent dependent

2. H0 in symbols:    3. Ha in symbols:

4. α:    5. X̄ − Ȳ:    6. s_X̄:

7. s_Ȳ:    8. s_(X̄−Ȳ):

9. z_crit:    10. "z"_calc:

11. Decision on H0: Accept  Reject

12. What are the implications of your statistical conclusion in Question 11
for the substantive question of whether the PSI students' first tries at the
tests would not be as good as the lecture students' performance? (That was the
psychologists' expectation, remember.)
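For checking questions 2 through 11, here is a sketch of the two-sample z test; the standard error of the difference is the square root of the sum of the two squared standard errors of the means.

```python
import math

def s_from_S(S, n):
    # The workbook's conversion from S (n denominator) to s (n - 1 denominator).
    return math.sqrt(S**2 * n / (n - 1))

x1, S1, n1 = 17.93, 2.91, 64   # PSI
x2, S2, n2 = 13.41, 4.07, 61   # Lecture

se_diff = math.sqrt(s_from_S(S1, n1)**2 / n1 + s_from_S(S2, n2)**2 / n2)
z = (x1 - x2) / se_diff
print(round(se_diff, 3), round(z, 2))
# z far exceeds the two-tailed .01 critical value of 2.58:
# reject H0 of equal population means.
```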

The mean for the PSI students' first attempts at each of the remaining nine
exams was higher than the corresponding mean for the lecture students. The data
on the third unit are of interest because that was the one on which the lecture
students did the best and the only unit on which they earned a mean over 16.
The data are as follows:

          PSI     Lecture
X̄       17.06     16.60
S         2.01      2.89
n           64        72

Again, determine whether it is plausible that the parent populations have the
same mean, doing a two-tailed test at the .01 level of significance.

13. These samples are: independent  dependent    14. X̄ − Ȳ:

15. s_X̄:    16. s_Ȳ:

17. s_(X̄−Ȳ):    18. "z"_calc:

19. Decision on H0: Accept  Reject

Also of interest are the data on the eighth unit, because this was the one
on which the PSI students did least well, and the only one on which the mean of
their first tries at the exam was under 16. The PSI class still outperformed
the lecture class, though:
          PSI     Lecture
X̄       15.39     14.27
S         2.78      3.34
n           62        61

Do a two-tailed test at the .01 level of significance again.


20. These samples are: independent  dependent    21. X̄ − Ȳ:

22. s_X̄:    23. s_Ȳ:

24. s_(X̄−Ȳ):    25. "z"_calc:

26. Decision on H0: Accept  Reject

27. Which, if any, of the above three tests would have yielded a different
conclusion about the null hypothesis if it had been conducted at the .05 level
of significance?  None  1st  2nd  3rd

28. Suppose you're wondering whether the PSI students' performance on their
first tries on the first unit differed significantly from their performance on
their first tries at the tenth unit (X̄ = 16.49). If you were to do a two-tailed
test of the hypothesis that the parent populations for the two samples of scores
had identical means, using the .01 level of significance again, could you follow
the procedure and use the formulas that you employed for the three problems
above? Why or why not?

29. If you could not follow the same procedure and use the same formulas, say
what you would have to do differently.

Homework for Chapter 18 279

Here are the data again for the first-unit comparison of PSI with a lecture
course covering the same material:

          PSI     Lecture
X̄       17.93     13.41
S         2.91      4.07
n           64        61

In answering the following questions, carry your calculations to 3 decimal
places and round the final answers to 2, but if you need an earlier answer for a
calculation, use the figure with 3 decimal places. Show your work, as usual.
The formula, once more, for computing s from S is s = √((S²)(n)/(n − 1)).

Determine the 95% confidence limits for the PSI population's mean.

1. X̄

2. z_p

3. s_X̄

4. Lower limit

5. Upper limit

6. d₁

7. Say in words what d₁ means.

Suppose we wanted to increase the precision of this estimate, so that the
full width of the interval is only 1 point on the scale of raw scores (which are
numbers of items correct out of 20). Estimate the required sample size. Remember
that it must be a whole number.

8. w

9. s

10. z_p

11. Required n
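The confidence limits and the required-n estimate can be checked with the sketch below; the sample-size step just solves z_p·s/√n = w/2 for n and rounds up to a whole number.

```python
import math

x_bar, S, n, z_p = 17.93, 2.91, 64, 1.96   # z_p for 95% confidence

s = math.sqrt(S**2 * n / (n - 1))          # s from S
s_mean = s / math.sqrt(n)
lower, upper = x_bar - z_p * s_mean, x_bar + z_p * s_mean
print(round(lower, 2), round(upper, 2))    # 95% confidence limits

# Sample size for a full interval width w = 1 point:
# solve z_p * s / sqrt(n) = w / 2 for n, then round up.
w = 1.0
n_needed = math.ceil((2 * z_p * s / w) ** 2)
print(n_needed)
```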

Determine the 99% confidence limits for the lecture population's mean.

12. X̄

13. z_p

14. s_X̄

15. Lower limit

16. Upper limit

17. d₁

18. Say in words what d₁ means in this case.

Now find the 95% confidence limits for the difference between the PSI popu-
lation's mean and the lecture population's mean.

19. X̄ − Ȳ

20. z_p

21. s_(X̄−Ȳ)

22. Lower limit

23. Upper limit

24. s_av

25. d₂

26. Say in words what d₂ means in this case.

Suppose you wanted the 95% confidence limits for the problem above to be only
1 point wide on the scale of raw scores. Estimate the required size for each
sample (which must be a whole number, of course).

27. w

28. s_av

29. z_p

30. Required n per sample

Homework for Chapter 19 281

Here are the data from the homework for Chapter 9 again.

Student   % Correct on Exam 1   % Correct over Semester

A                38                        75
B                54                        65
C                62                        94
D                67                        81
E                67                        84
F                72                        93
G                77                        90
H                77                        93
I                82                        90
J                85                        95

In answering the following questions, carry your computations to 3 decimal
places and report answers to 2, but if you need an answer for a later computa-
tion, use the 3-place version. Show your work.

If the students had known the answer to 75% of the questions over the entire
semester, on the average, their mean percentage correct would have been 81.25.
The extra 6.25 percentage points would have come from their guessing correctly
on a quarter of the 25% of the items they didn't know. (With 4-choice items,
the probability of a correct guess is .25.) Is it plausible that the mean of
the population of percentage-correct scores for the entire semester is 81.25?
Do a two-tailed test of the appropriate hypothesis at the .05 level of signifi-
cance. (Don't be confused because you previously called these scores Y.)

1. H0 in symbols:    2. Ha in symbols:

3. α:    4. X̄:    5. s:

6. s_X̄:    7. df:    8. t_crit:

9. t_calc:    10. Decision on H0: Accept  Reject

Now estimate the population mean for the total-performance scores by finding

the 99% confidence limits.

11. t_p

12. Lower limit

13. Upper limit
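The t test and the 99% confidence interval above can be checked with the sketch below; the tabled constant 3.250 (the t for df = 9 enclosing 99% of the distribution) comes from a standard t table.

```python
import math

# Percentage-correct-over-semester scores from the table above.
scores = [75, 65, 94, 81, 84, 93, 90, 93, 90, 95]
n = len(scores)
mean = sum(scores) / n
s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))  # n - 1 denominator
s_mean = s / math.sqrt(n)

# One-sample t test of H0: mu = 81.25, with df = n - 1 = 9.
t = (mean - 81.25) / s_mean
print(round(mean, 2), round(s, 3), round(t, 2))

# 99% confidence limits using the tabled t_p = 3.250 for df = 9.
lower, upper = mean - 3.250 * s_mean, mean + 3.250 * s_mean
print(round(lower, 2), round(upper, 2))
```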



Did the students' total-performance percentages significantly exceed their
initial-performance percentages? Test the difference between the mean percentage
correct on Exam 1 and the mean percentage correct over the semester, using the
method of difference scores. Show the difference scores in the table on the
other side of this page. Use the .01 level of significance, and even though it's
not appropriate, let the alternative hypothesis indicate a higher mean for the
population of total-performance scores. Be sure to state the null and the alter-
native hypotheses in terms of difference scores, though.

14. H0 in symbols:    15. Ha in symbols:

16. α:    17. D̄:    18. s_D:

19. s_D̄:    20. df:    21. t_crit:

22. t_calc:    23. Decision on H0: Accept  Reject

Here again are the data for the comparison of PSI with a conventional lecture
course in elementary psychology:
          PSI     Lecture
X̄       17.93     13.41
S         2.91      4.07
n           64        61

Test the difference between the two sample means, doing a two-tailed test at the
.01 level of significance and using the t statistic. It will be interesting to
compare this test with the one you did on the same data for Chapter 17.

24. H0 in symbols:    25. Ha in symbols:

26. α:    27. X̄ − Ȳ:    28. Σx²:

29. Σy²:    30. s_(X̄−Ȳ):

31. df:

32. t_crit:    33. t_calc:    34. Decision on H0: Accept  Reject

Finally, compute the 90% confidence limits for the difference between the two
population means.

35. t_p

36. Lower limit

37. Upper limit
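For checking the t test on these data: because the workbook's S uses an n denominator, the within-group sum of squares is simply n·S², which makes the pooled-variance t easy to reproduce in a sketch.

```python
import math

x1, S1, n1 = 17.93, 2.91, 64   # PSI
x2, S2, n2 = 13.41, 4.07, 61   # Lecture

# With S defined using an n denominator, each within-group sum of
# squares is n * S**2.
ss1, ss2 = n1 * S1**2, n2 * S2**2
df = n1 + n2 - 2
s2_pooled = (ss1 + ss2) / df
se_diff = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
t = (x1 - x2) / se_diff
print(df, round(se_diff, 3), round(t, 2))
# With df = 123, the two-tailed .01 critical t is about 2.62; H0 is rejected,
# in agreement with the z test from Chapter 17.
```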


Homework for Chapter 20 283

Look back at the bivariate distribution presented in the homework for


Chapters 9, 11, and 19. Test the hypothesis that the correlation coefficient
is zero in the full population from which those scores may be considered a
random sample. Make the test two-tailed, and use the .01 level of significance.
Except for the computation of r, show your work.

1. H0 in symbols

2. Ha in symbols

3. n

4. r

5. t

6. df

7. t_crit

8. Decision on H0

9. According to Table E in Appendix F, what values of r are required for
significance at the .01 level in a two-tailed test?

Now compute the 95% confidence limits for the population value, again show-
ing your work.

10. z′

11. z

12. σ_z′

13. Lower limit expressed as z′

14. Upper limit expressed as z′

15. Lower limit expressed as a correlation coefficient

16. Upper limit expressed as a correlation coefficient

17. Are the limits equidistant from the sample value? If not, which limit

is closer?
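For checking this page, the sketch below computes r from the Chapter 9 data, applies the t test for H0: ρ = 0, and builds the 95% interval via Fisher's z′ transformation. (Python's math.atanh and math.tanh are exactly the z′ transformation and its inverse.)

```python
import math

exam1 =    [38, 54, 62, 67, 67, 72, 77, 77, 82, 85]
semester = [75, 65, 94, 81, 84, 93, 90, 93, 90, 95]
n = len(exam1)

mx, my = sum(exam1) / n, sum(semester) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(exam1, semester))
sxx = sum((x - mx) ** 2 for x in exam1)
syy = sum((y - my) ** 2 for y in semester)
r = sxy / math.sqrt(sxx * syy)

# t test of H0: rho = 0, with df = n - 2.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
print(round(r, 2), round(t, 2))

# 95% limits via Fisher's z'; sigma_z' = 1 / sqrt(n - 3).
z_prime = math.atanh(r)
sigma = 1 / math.sqrt(n - 3)
lo, hi = math.tanh(z_prime - 1.96 * sigma), math.tanh(z_prime + 1.96 * sigma)
print(round(lo, 2), round(hi, 2))  # note the limits are not equidistant from r
```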

Many institutions of higher education make a systematic effort to survey


their students' opinions of the instruction the institutions provide. Some
schools publish the average rating of each instructor in each course that she
or he teaches, and students use these averages in choosing their curricula,
on the presumption that an instructor whose average rating for a given course
is high will receive another high average when teaching that course again. But
how valid is this presumption?

Relevant data have been gathered at Queens College of the City University
of New York (L. H. Seiler, L. D. Weybright, & D. J. Stang, "How Useful Are
Published Evaluation Ratings to Students Selecting Courses and Instructors?"
Teaching of Psychology, 1977, 4, 174-177). Five different Pearsonian correla-
tion coefficients are available, each describing the relationship between the
mean rating an instructor received for a given course and the mean that instruc-
tor received for the same course one year later. The n's for the correlations
range from 99 to 183, and the r's range from .58 to .65. Even the largest of
these, which happens to be the one based on the biggest n and is thus the best
single estimate of the true correlation, is disappointingly small.

18. What is the coefficient of alienation for these data? (See
Section 12.7.)

19. What is the coefficient of determination for these data? (See
Section 12.8.)

One might expect the correlation to be higher if an instructor's two offer-
ings of a given course came not a year apart but in successive semesters. The
only available data for this case were collected at another institution: r = .67
for 45 combinations of an instructor and a course. Does this figure differ sig-
nificantly from r = .65 for 183 combinations of instructor and course? Test the
appropriate hypothesis at the .05 level of significance, using a two-tailed
alternative. As usual, carry computations to 3 decimal places and round to 2,
and show your work.

20. H0 in symbols

21. Ha in symbols

22. z′₁

23. z′₂

24.

25.

26. z_crit

27. Decision on H0

Homework for Chapter 21 285

Look back at the first page of homework for Chapter 15. Note that on their
first tries at the exam on the first unit of the PSI course, the 64 students
correctly answered a mean of almost 18 out of the 20 items, which turned out to
be significantly greater (in the statistical sense) than 16, the minimum needed
for proceeding to the next unit.

To get some idea of the power of the test you did there, assume that the
standard deviation of the scores in the population is 2.40. (The article from
which this example is drawn reports the standard deviation for 10 samples of
first tries at an exam in that PSI course, one for each of the 10 units into
which the course was divided, and the mean of the 10 standard deviations is 2.42.)
Following the procedure illustrated in Section 21.10, determine β for the test
you did (a two-tailed test at the .05 level of significance using a sample of
size 64) on the assumption that the true mean was 17, just one point higher than
the hypothesized mean. To show your work, construct a neat, carefully labeled
diagram like Figure 21.5 on p. 371. Do a rough draft first on another sheet of
paper.

1. What is β (to 4 decimal places)?

2. Say in words what this figure means for this particular case.

3. What is the power of the test (to 4 decimal places)?

4. Say in words what this figure means for this particular case.
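The β computation can be reproduced numerically: find the acceptance region under H0: μ = 16, then ask how much probability the true (μ = 17) sampling distribution of the mean puts inside that region. A sketch for checking:

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

mu_hyp, mu_true, sigma, n, z_crit = 16.0, 17.0, 2.40, 64, 1.96
se = sigma / math.sqrt(n)                  # 0.30

# Acceptance region for the two-tailed .05 test of H0: mu = 16.
low, high = mu_hyp - z_crit * se, mu_hyp + z_crit * se

# beta = probability the sample mean falls in the acceptance region
# when the true mean is 17.
beta = Phi((high - mu_true) / se) - Phi((low - mu_true) / se)
print(round(beta, 4), round(1 - beta, 4))  # beta and power
```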

In a study conducted on a street corner with a traffic light, an experimenter


stood on the curb waiting for the light to turn from red to green. When another
pedestrian walked up, the experimenter turned and gave the person either a quick
glance or a prolonged stare. Regardless of the sex of the experimenter or the
sex of the subject, those who had been stared at tended to cross the street faster,
once the light changed, than those who had received only a glance. The mean
crossing time for 66 subjects in the stare condition was 11.1 seconds, while the
mean for 62 subjects in the glance condition was 12.2 seconds, and this difference
proved to be highly significant. The standard deviation of the population of
crossing times was estimated to be 1.00 seconds (really) in both conditions.
(This study was the work of P. C. Ellsworth, J. M. Carlsmith, and A. Henson, who
reported it in their article "The Stare as a Stimulus to Flight in Human Subjects"
in the Journal of Personality and Social Psychology, 1972, 21, 302-311.)

It seems safe to assume that the researchers conducted a two-tailed test of
the null hypothesis that the population means were equal. One wonders how small
a difference between the population means they could have discovered in this
procedure with samples of the sizes they employed. There is no one answer, of
course; rather, the smaller the difference, the lower the probability of their
discovering it. But if the probability of discovery is specified, one can estimate
the minimum difference whose discovery carried this probability. Do so for the
probabilities named below, following the procedure described in the next-to-the-
last paragraph of Section 21.11. (Only approximate answers are possible. Give
them in seconds, not in standard deviations.)

5. What was the minimum difference between the population means
that was discoverable with a probability of .80? (This is
the difference for which the risk of missing it was .20.)

6. What was the minimum difference that was discoverable with
a probability of 90%? (This is the difference for which the
risk of wrongly accepting the null was 10%.)

7. What was the minimum difference whose probability of dis-
covery was fully 95%? (This is the difference for which the
risk of a Type II error was only .05.)

Homework for Chapter 22 287

To become comfortable with one-way analysis of variance, and to gain insight


into its workings, it is helpful to do an analysis with very simple numbers. Such
numbers are supplied below, and you will find that most of the quantities derived
from them (means, sums of squares, and the like) also turn out to be simple. In
doing the analysis of these figures, you will see that you are asked first to use
the definitional formulas of Section 22.5 (illustrated in Section 22.6) and then
the raw-score formulas of Section 22.7, which should produce the same results.

X_D   X_D − X̄_D   (X_D − X̄_D)²    X_E   X_E − X̄_E   (X_E − X̄_E)²    X_F   X_F − X̄_F   (X_F − X̄_F)²

 21                                 16                                 12

 21                                 15                                 10

 20                                 15                                 10

 19                                 15                                 10

 19                                 14                                  8

Σ =

1. X̄_D:    2. X̄_E:    3. X̄_F:    Note that the
means are very widely dispersed, whereas within each subgroup the scores cluster
tightly about their mean.

4. Σ(X_D − X̄_D)²

5. Σ(X_E − X̄_E)²

6. Σ(X_F − X̄_F)²

These are the quantities that could logically be called SS_D, SS_E, and SS_F,
as noted on p. 198 of the workbook. To show your work in computing them via
the definitional formulas, fill in the table above.

7. SS_W:    8. df_W:    9. s²_W:

Now compute SS_W via the raw-score formula:

10. Σx²:

11. The other term in the formula:

12. SS_W:

Onward to s²_A. First use the formula on the bottom of p. 395, showing your
work in computing the numerator of the fraction by filling in the table below.

X̄       X̄ − X̿       (X̄ − X̿)²

X̄_D
X̄_E
X̄_F
Σ =

13. Σ(X̄ − X̿)²:    14. n:    15. SS_A:    16. df_A = k − 1:    17. s²_A:
288 Homework for Chapter 22

Next compute SSA via the raw-score formula on p. 399.

18. The term in square brackets in Formula 22.6

19. The other term in the formula, the one subtracted from the first

20. SS_A as computed from Formula 22.6. Compare with the previous result.

Now complete the analysis:

21. F_calc:

22. F_crit for α = .05:

23. Decision on null hypothesis stating equality of population means:
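Since the scores themselves are given, the whole one-way analysis can be checked by machine. Note that SS_W + SS_A reproduces SS_T, which is the point of the extra-insight questions further on. (F_crit for 2 and 12 df at α = .05 is about 3.89.)

```python
groups = {
    "D": [21, 21, 20, 19, 19],
    "E": [16, 15, 15, 15, 14],
    "F": [12, 10, 10, 10, 8],
}
k = len(groups)          # 3 groups
n = 5                    # scores per group
N = k * n
all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / N

# Within-groups, among-groups, and total sums of squares (definitional formulas).
ss_w = sum(sum((x - sum(g) / n) ** 2 for x in g) for g in groups.values())
ss_a = n * sum((sum(g) / n - grand_mean) ** 2 for g in groups.values())
ss_t = sum((x - grand_mean) ** 2 for x in all_scores)

s2_w = ss_w / (N - k)    # df_W = 12
s2_a = ss_a / (k - 1)    # df_A = 2
F = s2_a / s2_w
print(ss_w, ss_a, ss_t, round(F, 2))
```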

For a bit of extra insight, finally, compute SS_T first via the definitional
formula on p. 397 and then via the raw-score formula on p. 399. To use the defi-
nitional formula, fill in the table below, which lists all the scores.

X       X − X̿       (X − X̿)²

24. SS_T from the table, which should = SS_W + SS_A:

25. ΣX², where summation is over all scores, from Formula 22.7:

26. The other term in Formula 22.7:

27. SS_T from Formula 22.7:

Suppose an experimenter assigns 45 men at random to receive a placebo, a small
dose of caffeine, or a large dose, and then determines their reaction time in an
apparatus simulating the braking of an automobile. The n's are equal for the
three treatment levels. The experimenter then repeats this procedure with 45
women. The resulting data can be studied in a two-way analysis of variance. One
variable is dosage of drug, and the other is sex of subject.

28. df for dosage of drug:

29. df for sex of subject:

30. df for interaction:

31. df_wc:

32. F_crit for dosage at α = .05:

33. F_crit for sex at α = .05:

34. F_crit for interaction at α = .05:
Homework for Chapter 23 289

Want to convince people that you can read their minds? Try this demonstra-
tion. Ask a good-sized group of people each to think of a number between six
and ten, inclusive. Each person should make his or her choice individually and
keep it private. Then request that the group think their numbers "at" you, and
announce that you will receive, via telepathy, the number that "comes through"
most strongly, which will be the modal choice. Pretend to receive their thoughts,
and state with confidence that the "loudest" number is seven. You will have an
excellent chance of being correct. Why? Consider the following data, which are
the results of asking 207 introductory-psychology students to choose a number
from six to ten. (The data were reported by Philip Zimbardo in the instructor's
manual for the ninth edition of his text Psychology & Life.)

Choice f
six 24

seven 112

eight 33

nine 25

ten 13

Does it appear plausible, in light of these data, that people in our contem-
porary society make those five possible choices in equal proportions? Test the
appropriate hypothesis at the .05 level of significance.

1. State the null hypothesis in words.

2. State the alternative hypothesis in words.

3. Compute χ², showing your work in the table above. χ² =

4. df =    5. χ²_crit =    6. Decision on H0: Accept  Reject
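Questions 3 through 6 can be checked with this sketch; under the null hypothesis each of the five choices has the same expected frequency, 207/5 = 41.4.

```python
observed = [24, 112, 33, 25, 13]     # six, seven, eight, nine, ten
N = sum(observed)                    # 207
expected = N / len(observed)         # 41.4 under H0 of equal proportions

chi2 = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1
print(round(chi2, 2), df)
# chi-square critical value for df = 4 at the .05 level is 9.49;
# equal choice proportions are untenable.
```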

7. Is it plausible that the proportion of people who choose seven in what-
ever population Zimbardo randomly sampled is one-half? Compute the appropriate
χ², showing your work in the table below.

Choice f

seven 112

other 95

8. df =    9. χ²_crit for α = .05 =    10. Decision on H0: Accept  Reject

In 1975, three psychologists at Purdue University reported a study in which


undergraduate students, 28 men and 34 women, were asked to play an electronic
dart game. Each subject was offered a choice between two versions, one in which
the score depended on the player's skill and one in which the score depended on
luck. The psychologists reported their data in the following table:

                  Males   Females
Choice   Luck       6        22
         Skill     22        12
                   28        34

11. Conceptualizing this study as a test of the difference between two propor-
tions, state the appropriate null hypothesis in words.

12. State a two-tailed alternative in words.

13. χ² =    14. df =    15. χ²_crit for α = .05 =

16. State your decision on the null hypothesis and interpret your finding,
specifying the direction of the difference, if any, between the two sexes. Use
the remaining space to copy in (neatly) your computation of χ².

The study is the work of Kay Deaux, Leonard White, and Elizabeth Farris:
"Skill versus Luck: Field and Laboratory Studies of Male and Female Preferences,"
Journal of Personality and Social Psychology, 1975, 32, 629-636.
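For checking questions 13 through 16, the shortcut formula for a 2 × 2 table, χ² = N(AD − BC)²/[(A+B)(C+D)(A+C)(B+D)], avoids computing expected frequencies. The sketch below uses it without Yates's correction for continuity, so your answer may differ slightly if your text applies the correction.

```python
# Observed frequencies: rows = choice (luck, skill), columns = (males, females).
a, b = 6, 22     # luck
c, d = 22, 12    # skill
N = a + b + c + d

# Shortcut chi-square for a 2x2 table (no correction for continuity).
chi2 = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(round(chi2, 2))
# With df = 1, the .05 critical value is 3.84: the sexes differed,
# women choosing the luck version more often than men did.
```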

Homework for Chapter 24 291

1. Here are the data from the homework for Chapter 9 again. Compute Spearman's
rank order correlation coefficient for the two variables, showing all your work.

Student % Correct on Exam 1 % Correct over Semester

A 38 75

B 54 65

C 62 94

D 67 81

E 67 84

F 72 93

G 77 90

H 77 93

I 82 90

J 85 95

You may be interested to compare r_s with r. Are they close?
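Question 1 can be checked with the sketch below, which assigns average ranks to ties (the two 67s, the two 77s, and so on) and then applies r_s = 1 − 6Σd²/[n(n² − 1)].

```python
def ranks(values):
    """Average ranks, handling ties (e.g. two 67s share rank 4.5)."""
    out = []
    for v in values:
        below = sum(1 for u in values if u < v)
        equal = sum(1 for u in values if u == v)
        out.append(below + (equal + 1) / 2)
    return out

exam1 =    [38, 54, 62, 67, 67, 72, 77, 77, 82, 85]
semester = [75, 65, 94, 81, 84, 93, 90, 93, 90, 95]
rx, ry = ranks(exam1), ranks(semester)

n = len(exam1)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs = 1 - 6 * d2 / (n * (n**2 - 1))
print(d2, round(rs, 2))
```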

2. Compute χ² for a Sign Test of the difference between the two samples of
scores in the table above. Show your work in the space below.

3. Recall that with 1 df, √χ² = z, and z is comparable to t. Compute z for the
Sign Test. You may be interested to compare it with the value of t that you
found in the homework for Chapter 19, at the top of p. 282.

Now do Wilcoxon's Signed Ranks Test for the data on the other side. If you
think a bit, you'll see that it's not necessary to find the difference scores.
Show your work in the space below.

4. W+

5. W_

6. Is the test statistic W+ or W_?

7. Critical value of the test statistic for a two-tailed test at the
.01 level of significance

8. Decision on H0

In a replication of the study described in the homework for Chapter 21 (p.
286), a female experimenter directed a stare or just a glance at a pedestrian
waiting for a traffic light to change from red to green. A second experimenter
standing across the street timed the subject as she or he crossed the intersec-
tion after the light changed. Crossing times were recorded to the nearest half
second, and the following (real) data resulted:

Stare:  6    7    7.5  7.5  8    8    8    8    9    12.5

Glance: 8.5  8.5  9    9.5  10   10   10   10.5 11   11

Test the difference between the two samples with the Mann-Whitney procedure,
using a two-tailed alternative and the .05 level of significance. Call the
stare condition X, and show your work in the table above.
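The rank sum ΣR_X can be checked by ranking all 20 times together, giving tied times the average of the ranks they occupy; a sketch:

```python
stare  = [6, 7, 7.5, 7.5, 8, 8, 8, 8, 9, 12.5]
glance = [8.5, 8.5, 9, 9.5, 10, 10, 10, 10.5, 11, 11]
combined = stare + glance

def avg_rank(v, values):
    """Rank of v in values, with ties given the average of their ranks."""
    below = sum(1 for u in values if u < v)
    equal = sum(1 for u in values if u == v)
    return below + (equal + 1) / 2

sum_rx = sum(avg_rank(v, combined) for v in stare)
print(sum_rx)
```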

9. ΣR_X

10. Range of critical values

11. Decision on H0

