
DESCRIPTIVE VS. SAMPLING STATISTICS

The fundamental distinction between descriptive and sampling statistics
is essentially as follows: In sampling statistics we study populations in terms
of the data of samples. In other words, the data of a part are used as the
basis for investigating or studying the whole. In descriptive statistics, on the
other hand, no distinction between part and whole is made; the data obtained
in a study are treated as if they constitute a whole. A census characteristically
presents data for the methods of descriptive statistics, since, by definition, a
census is a set of observations or measurements made for all members of a
group or population. A sample, by definition, characteristically presents data
for the problems and methods of analytical statistics. In both cases, however,
the initial task in the statistical treatment of results is the reduction of data.
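
The contrast between the two points of view may be illustrated by a brief sketch in Python. The population of ages, the function names, and the sample size are hypothetical and chosen only for illustration; the sketch assumes simple random sampling, which the text itself does not specify.

```python
import random
from statistics import mean

# Hypothetical population: the ages of every pupil in a small school.
population = [6, 7, 7, 8, 8, 8, 9, 9, 10, 10, 11, 12]

def describe(data):
    """Descriptive view: the data in hand are treated as the whole."""
    return {"n": len(data), "mean": mean(data)}

def estimate_from_sample(data, k, seed=0):
    """Sampling view: the data of a part are used to study the whole."""
    rng = random.Random(seed)        # fixed seed so the sketch is repeatable
    sample = rng.sample(data, k)     # simple random sample without replacement
    return {"n": len(sample), "estimated_mean": mean(sample)}

census = describe(population)                          # all members observed
sample_result = estimate_from_sample(population, k=4)  # a part only
```

The census mean is a plain description of the group; the sample mean is only an estimate of it and will generally differ from one sample to the next.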

The Reduction of Data


One of the fundamental purposes served by statistical method is the reduc-
tion of data. What, then, do we mean by the reduction of data? It consists
in the organization and summarization of data into forms that can be readily
perceived and understood. Consider the schedule for information or data
shown in Fig. 1:2, part of the school record for a child.

Fig. 1:2. A Schedule


Case No. ________________ Date of Record ________________
Name ________________________________________ Sex ________________
      (Surname)           (First)
Address ____________________________________ School ________________
Birth date ________________ Place ________________ Grade ________________
Father's name ______________________________ Occupation ________________
Mother's name ______________________________ Occupation ________________
Brothers __________________________ Sisters __________________________
Mental Age ________________ I.Q. ________________ Test ________________
Examiner ___________________________________ Date ________________

Educational Achievement Record:

Area          Rating          Test          Date

Whether or not data are obtained in conjunction with the plan of an
experiment or in conjunction with the plan of an educational system (or other
agency) for maintaining relevant records, it should be obvious that a sys-
tematic schedule for recording the data is a labor-saving device. When possi-
ble, it is most convenient to arrange the data of a case on a card, the size of
which will of course depend upon the amount of information to be recorded.
A card 3 by 5 inches is about as small as can be easily manipulated; only
rarely is a card or sheet larger than 9 by 11 inches needed. The procedure of
recording is facilitated if the schedule is printed with appropriate descriptive
terms for the various categories of data or information to be entered.
12 INTRODUCTION TO STATISTICS

In recent years the problem of recording and handling great masses of data
has been met by the development of the punch card, on which can be recorded
by machine any type of information that can be coded by a number system,
as well as original data which are numerical. To handle the coded data of
such cards, sorting machines and various kinds of tabulators have been
developed (cf. Chapter 2, Section C).
A schedule for a child's school record is generally useful in two ways. (1) It
is valuable for the teacher, psychologist, etc., who works with the individual
pupil and attempts to iron out problems of that child's adjustment to the
school or other situations. This is a non-statistical, individual use of such
information. (2) It is valuable for statistical purposes, which means that it
is useful as one case in hundreds or thousands of such records which, when
considered en masse or according to relevant groupings, may provide valuable
information in the planning, financing, and management of an educational
system. When the purpose is statistical, a given child's schedule is more
appropriately signified by a convenient number rather than by his name,
since the investigator is no longer dealing with the individual child but
with one case in a group or mass of statistical information. Such results as
are obtained in the statistical treatment of the data will apply to the group
as a whole rather than to an individual case.
It should be apparent that it is impossible adequately to interpret the
records of hundreds or thousands of such schedules of information unless the
data are somehow classified and summarized. An investigator or research
worker is thus faced with the very practical problem of reducing a great bulk
of data to a form that will be more readily perceived and understood. The
procedures to be adopted for such classification and summarization depend
specifically upon the purposes of the investigation or inquiry. In general,
however, the statistical procedures that may be used include only a few
alternatives; they will be described in detail in the following chapters on
methods of descriptive statistics. Here it is emphasized that descriptive
statistical procedures serve a need which arises as soon as the observations
or data of any survey or inquiry become at all sizable or bulky. For, as
R. A. Fisher says, "No human mind is capable of grasping in its entirety the
meaning of any considerable quantity of numerical data." * The statistical
methods used for the reduction of data are of three kinds:
1. Graphic methods
2. Computational methods yielding numerical measures
3. Tabular methods
The aim, then, of descriptive statistics is the reduction of data so that
the results of observation and measurement may be (1) made more immedi-
ately meaningful, and (2) presented in a form that will make interpretation
* R. A. Fisher, Statistical Methods for Research Workers, Oliver & Boyd, London, 7th ed.,
1938, p. 6.
and comparison of results easy and unambiguous. In the light of the preced-
ing discussion, descriptive statistics can now be defined as the organization
and summarization of collections of numerical data, including data arrived
at by the simple method of enumerating instances. Descriptive statistics
consists in the reduction of groups or masses of data by means of tables,
graphs, and numerical measures such as percentages or proportions, averages,
measures of deviation or dispersion, coefficients of correlation, etc.
That the methods of descriptive statistics are essential to the methods of
analytical or sampling statistics is apparent. Whether the data are of a census
or of a sample, the first step in their treatment consists in their appropriate
reduction or simplification.
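
A minimal sketch in Python may make the three kinds of reduction concrete: a frequency table (tabular), a mean and standard deviation (computational), and a crude bar chart drawn in text (graphic). The scores and all function names are hypothetical; the use of the population standard deviation as the measure of dispersion is an assumption made for the illustration.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical collection of ten ratings on a five-point scale.
scores = [3, 5, 4, 3, 5, 5, 2, 4, 5, 3]

def frequency_table(data):
    """Tabular reduction: (value, count, percentage) rows in ascending order."""
    counts = Counter(data)
    n = len(data)
    return [(value, counts[value], 100.0 * counts[value] / n)
            for value in sorted(counts)]

def text_bar_chart(table):
    """Graphic reduction: one row of '#' marks per value."""
    return [f"{value}: " + "#" * count for value, count, _ in table]

table = frequency_table(scores)
# Computational reduction: an average and a measure of dispersion.
summary = {"mean": mean(scores), "pstdev": pstdev(scores)}

for row in text_bar_chart(table):
    print(row)
```

Ten separate observations are thereby reduced to four table rows and two summary figures, which is the sense of "reduction" intended above.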

C. THE NATURE OF STATISTICAL DATA


In general, statistical data are of two kinds. They are derived either from
variables or from non-variables. Consider, for example, the kinds of statistical
information collected about human beings. Census data provide us with
information concerning the incidence or number of people by geographical
areas, their ages, distribution with regard to sex, etc. Psychologists and
sociologists bring together many kinds of information concerning human
behavior and intelligence. The statistical data of the latter investigations
consist of various psychological measurements, the frequency of different
behavior patterns, scores from questionnaires, interest inventories, etc. Some
of these data are variable and others are non-variable. Let us see what the
distinction between them is.

Non-Variable Data
The incidence of the two sexes in a population provides a common example
of non-variable data. People are either male or female. Sex is a non-variable
attribute. A person can be categorized as either one or the other. Further-
more, no order is inherent in the arrangement of these two categories; that
is, there is no basis in measurement for putting the male class first and the
female class second, or vice versa. A non-variable attribute is thus one that
exists with respect to distinct categories rather than with respect to a par-
ticular degree.
Non-variable data are often referred to as the data of categories. Categorical
data are generally obtained simply by the enumeration of instances that
occur, or that are observed to exist, with respect to the classes or categories
under consideration.
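
The enumeration of instances amounts to nothing more than a count per category, as the following Python sketch suggests. The records and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical observations of a non-variable attribute.
records = ["male", "female", "female", "male", "female"]

def enumerate_categories(observations):
    """Count the instances observed in each category."""
    return dict(Counter(observations))

counts = enumerate_categories(records)
```

The result is a set of counts keyed by category; nothing in the data dictates which category should be listed first.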
Variable Data
In contrast to categorical data, variable data represent quantitative differ-
ences (variation) in the manifestation of a property or trait or attribute. Thus,
the age and height of persons are examples of attributes that are variables.
The essential characteristics of a statistical variable are as follows: (1) The
attribute being studied is capable of quantitative differentiation (at least,
theoretically); (2) the data differentiated have order inherent in their nature,
an order ranging from least to most. Thus, the age of individuals is infinitely
variable (within the age range of human beings), inasmuch as age is susceptible
to quantitative differentiation in terms of years, months, and days. Further-
more, a collection of age data can readily be brought together with respect
to the order inherent in them, namely, an order that ranges from least age
to most age.
Although sex was seen to be an attribute which yields non-variable rather
than variable data, it should be observed that the ratio of males to females
provides a measure, the sex ratio, which is a variable attribute. The sex
ratio is an index that may vary in size for different calendar periods or places.
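
A short Python sketch shows how counts of a non-variable attribute give rise to a variable index. The convention of expressing the ratio as males per 100 females is an assumption adopted here for illustration, and the yearly counts are hypothetical.

```python
def sex_ratio(males, females):
    """Males per 100 females (an assumed convention for expressing the ratio)."""
    if females <= 0:
        raise ValueError("need at least one female to form the ratio")
    return 100.0 * males / females

# Hypothetical counts (males, females) for two calendar periods.
counts_by_year = {1930: (510, 500), 1940: (495, 500)}
ratios = {year: sex_ratio(m, f) for year, (m, f) in counts_by_year.items()}
```

Each count is categorical in origin, yet the resulting ratios can be ordered from least to most and differ in degree from one period to another.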

The Treatment of Statistical Data


Most of the methods of statistics have been developed for the treatment
and analysis of variable data. This is because variation has, historically, been
practically synonymous with the concept of statistical phenomena. We have
already seen that Quetelet and Galton pioneered in the nineteenth century
in the development of statistical methods. By and large, the methods they
were responsible for were concerned with the variations characteristic of
human beings and other natural phenomena. We saw that Bowditch, study-
ing the growth of children, was faced with the problem of somehow relating
two variable attributes, height and weight, but that it remained for Galton
to develop a method of determining the correlation between the measure-
ments of two such variables. Furthermore, the statisticians of the nineteenth
century were inclined to consider the data of variables as forming a distribu-
tion of measures similar to the normal probability curve. For as large samples
of data of various attributes or characteristics of man and other biological and
social phenomena were observed and measured, the distributions of the col-
lections of data obtained for a given attribute were often found to approach
the form of this curve. The normal probability curve thus came to epitomize
a fundamental property of a variable. Nevertheless, not all variable attri-
butes yield distributions of this form. On the other hand, categorical data
do not yield distributions of any kind. This is the case because the essence
of any distribution is an ordered series of measures ranging from the least
degree of the attribute observed or measured, to the most degree. Variable data
are often referred to as the data of variates, and categorical data as the data
of non-variates.
In consequence of the kinds of statistical problems arising during the nine-
teenth century and the early part of the twentieth, the bulk of the methods
of statistics developed have been for the treatment of variable data. However,
non-variable data are also important and accordingly some special methods
have been devised for their treatment and analysis.* These methods are
especially relevant to many market research investigations, as well as to
studies in social psychology and sociology. In Chapters 2-4, we shall present
the basic statistical methods for non-variables as developed for problems of
descriptive statistics, and in Chapters 5-9, the fundamental methods that
have been developed for the descriptive treatment of variables. However,
the distinction between these two sets of methods is not always sharp. The
data of variables are sometimes treated by methods developed for categorical
data. For example, in order to determine whether a particular aptitude test
is satisfactory, the criterion therefor may be taken simply in terms of successful
and nonsuccessful performance. Obviously, performance is itself a variable
attribute. However, we often lack satisfactory methods for quantitatively
differentiating degrees of success or non-success and we obtain, at best, broad
non-quantitative distinctions or differentiations of such attributes.

The Mathematical and Logical Implications of a Variable: A Series
The essence of a statistical variable resides in the two properties already
mentioned: (1) the capacity of a characteristic or attribute to be quantitatively
differentiated (by some process of measurement or observation), and (2) the
presence of an inherent order in the data. When the statistical data of an
investigation satisfy these two conditions, they yield a series, or scale, of
measures. Such a series of measures ranges in numerical size from least to
highest values. The concept of a series is thus implied by the order inherent
in the quantitatively differentiated data of a variable. Some variables, how-
ever, can be studied and ordered into a series, but not quantitatively differ-
entiated. Thus, the social interests of a group of people can be rated as "above
average," "average," and "below average" (yielding a series with three broad
classes), although there may not be available a satisfactory process of measure-
ment that will yield quantitative differentiations of varying degrees of the
attribute social interest.
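
The case of data that can be ordered but not quantitatively differentiated may be sketched in Python as follows. The three class labels are those of the text; the ratings themselves and the function name are hypothetical.

```python
# The three broad classes, listed from least to most.
RATING_ORDER = ["below average", "average", "above average"]

def sort_ratings(ratings):
    """Arrange ratings into a series using only their rank, not a measured size."""
    rank = {label: position for position, label in enumerate(RATING_ORDER)}
    return sorted(ratings, key=lambda label: rank[label])

# Hypothetical ratings of social interest for four people.
observed = ["average", "above average", "below average", "average"]
series = sort_ratings(observed)
```

The rank positions serve only to order the classes; they say nothing about how much greater "above average" is than "average".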

Continuous vs. Discontinuous Series


Even though the data of a variable may satisfy the two properties of quan-
titative differentiation and order, there is a third property characteristic of
such data that gives rise to a distinction among variables themselves. The
data of variables may form either a continuous series of measures or a dis-
continuous series.
A continuous series of measures is one that, by definition, is theoretically
susceptible to numerical subdivisions of any degree of fineness. A series of

* G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Griffin,
London, 12th ed., 1940, Chaps. 1-6.
age data is theoretically capable of such subdivisions. In practice, we may
have no need to differentiate ages to finer degrees than years or months, but
theoretically finer subdivisions in days or hours, etc., could be made. Such
data thus form a continuous series, or continuum, that ranges from the least
observed value to the highest observed value. This is the case, even though
each subdivision in a continuum may not actually have an empirical datum.
On the other hand, the data of some variables do not, either in fact or the-
oretically, form a continuum, or continuous series of measurements. A collec-
tion of statistical information indicating the number of listeners per radio
program, or the number of children per family, will yield data that satisfy
the two basic properties of a variable, namely, quantitative differentiation
and order. Such variable data may be arranged in a series ranging from the
least number of listeners per radio program to the greatest number of lis-
teners. However, it is obvious that only integral values can occur; there are
no fractions of radio listeners. A distribution of such data thus yields a series
that is non-continuous. There are real gaps between the integral values lying
within the limits of the series. Such discontinuous series are often referred to
as discrete, and the data of such a series are sometimes called discrete data.
However, the latter terms are likely to be misleading, because discrete data
are often confused with categorical data. It should be clear, however, that
discrete data, as just defined, are the data of a variable rather than of a non-
variable. Categorical data of a non-variable do not have an inherent order
such that they can be arranged in a series from least to most.
In statistical practice the data of a discontinuous series are usually treated
as if they formed a continuous series. Thus the average number of children
per family is usually calculated to a fractional value, as for example, 3.5,
despite the fact that such a value is an obvious abstraction. An average is
useful because, for a collection of such data, it indicates that the typical
number of children per family is midway between three and four children.
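
The point may be verified with a few hypothetical families. In the Python sketch below, the counts are invented so that the average comes to the value 3.5 used in the text.

```python
from statistics import mean

# Hypothetical counts of children in six families (integral values only).
children_per_family = [2, 3, 4, 5, 3, 4]

average = mean(children_per_family)   # 21 children / 6 families

# The average is a fractional value although no family can have it.
is_observed_value = average in children_per_family
```

Every datum is a whole number, yet the average falls in a gap of the series, which is why it must be read as an abstraction describing the collection rather than any one family.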

Exact and Approximate Measures


The data of statistical investigations are obtained by various methods.
In general, however, the methods may be divided into two classes. (1) A
great deal of statistical information is obtained by the simple method of
enumerating or counting instances. (2) Statistical data are also obtained by
a process of observation and measurement that is more complex than the
method of simple enumeration.
The method of simple enumeration always yields an integral value. Such a
value is an exact measure except for the possibility of errors in making a count.
Categorical data are usually obtained by counting "noses," but the data of
some variables, such as the number of children per family, are also obtained
in the same way.
On the other hand, the data of variables are often approximations. They are
usually obtained by methods that yield estimates of location or position in a
continuous series of values. Most of the measurements in the physical sciences
are approximations obtained by well-defined methods of observation and
measurement. Although they are approximations, they have, from a prac-
tical point of view at least, very small margins of error; in fact, oftentimes the
errors are so small that they can be neglected. In psychology and the social
sciences, a test score is an example of an approximate measure. It is usually
obtained by a method of observation and measurement that provides an esti-
mate of a person's position in a series or scale of test scores.
By definition, an approximate measure is one theoretically capable of
greater exactness if the methods of measurement are continually refined.
It is apparent that a continuous series of numbers is implied by the concept
of an approximate measure. The fact that a statistical datum may be an
approximation rather than an exact number is not, however, to be inter-
preted as thereby belittling its significance or usefulness. On the contrary,
the difference between exact and approximate measures is a difference that
results from the methods used in obtaining them. As just indicated, the
method of the enumeration of instances (basic to all statistical data of censuses,
many market research investigations, etc.) yields numbers or measurements
which are exact in the sense that they are the result of a count. Nevertheless,
it is to be observed that so far as a research investigation may consist of a
sample drawn from a population of instances (as is characteristic of most
market research studies), the count made of a sample is necessarily treated
as an approximation of the population. Even though the count of the sample
may be an exact number per se, from the point of view of its use as an estimate
of a population value it is an approximation.
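
The double character of a sample count, exact in itself but approximate as an estimate, can be sketched in Python. The figures, the function name, and the simple proportional scaling are assumptions made for illustration; the text prescribes no particular method of estimation.

```python
def estimate_population_count(sample_count, sample_size, population_size):
    """Scale an exact sample count up to an estimate for the population."""
    if sample_size <= 0 or not 0 <= sample_count <= sample_size:
        raise ValueError("sample_count must lie between 0 and sample_size")
    return sample_count * population_size / sample_size

# Hypothetical market study: 20 of 50 sampled households own a radio.
sample_count = 20                      # exact: the result of a count
estimate = estimate_population_count(sample_count, 50, 1000)
```

The number 20 is exact as a count of the sample; the figure derived for the 1,000 households of the population is an approximation, since another sample of 50 might well have yielded 18 or 23.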
Similarly, the initial measurements obtained from many psychology tests
are based upon a count. Thus, a vocabulary test score may be simply an
enumeration of the number of correctly defined words in a list. Originally the
vocabulary test score is simply an enumeration of correct responses, and
from this point of view it is an exact number. However, as an estimate of a
person's vocabulary ability, it is an approximation. This is true because the
particular list of words used for the vocabulary test is only a sample of the
test material that could be used for such a purpose. Since all psychological
tests necessarily employ but a sample of test material, the measurements of
ability yielded by a test are always approximations and never exact measure-
ments. All such measures are estimates of people's positions in a series or
scale of test performance. All such estimates are approximations.

EXERCISES
1. In what sense is statistics a form of applied mathematics?
2. What are the implications of Quetelet's work for the development of descriptive
and sampling statistics?
3. State the different ways in which the concept "statistics" is employed.
4. What is the essential difference between descriptive and sampling statistics?


5. What is meant by the reduction of data?
6. What different kinds of methods are utilized for the reduction of data?
7. Distinguish between a non-variable and a variable attribute.
8. Distinguish between a continuous and a discontinuous variable.
9. What is the difference between exact and approximate measures?
CHAPTER 2

The Reduction and Organization of Categorical Data

A. INTRODUCTION
In this chapter we shall present some of the elementary but at the same time
indispensable statistical methods for the treatment of the non-variate type
of data often obtained in psychology, anthropology, sociology, and related
fields. The data of non-variable attributes, of categories, their collection and
statistical treatment are of basic importance to the research worker, even
though a majority of research problems yield variate data.
From the point of view of the practical problems of research the initial
task to be dealt with is classification, or division, of large masses of non-
variate data. The logic of classification and division is essential to a sound
use of methods for their reduction and comparison. Just as the psychologist
and related scientists need to know the logic of measurement underlying the
treatment of the data of variables, so they also need to know how to handle
masses of non-variate data which first need to be classified and the results
then described through the use of appropriate statistical techniques.
We shall consider first the problem of classification. Then methods for the
reduction of such data to a useful form will be presented. Basically, these
methods are simply tabulation and enumeration. Methods for the comparison
of such data will be developed in Chapter 3. These methods consist chiefly
in the calculation of ratios or rates, such as percentages. Finally, in Chapter 4
we shall present methods for the correlation of categorical data.

B. THE CLASSIFICATION AND ENUMERATION OF ATTRIBUTES


Categorical data are enumerated instances of attributes or qualities of
objects or individuals that are taken as existing or not existing, rather than
as existing to some degree. Hence, categorical data are derived from non-
variable attributes, rather than from attributes or qualities that are variable.

Dichotomous and Polytomous Classifications of Attributes


Dichotomous Classification
We saw in the preceding chapter that the sex of human beings constitutes a
non-variable attribute. People can be identified as either MALE or NOT-MALE.
This division is a dichotomous, mutually exclusive differentiation of a qualita-
tive attribute. That is to say, it is a twofold classification of human beings
with respect to an attribute (quality or trait) that can be differentiated quali-
tatively (MALE-KIND and NOT-MALE-KIND), but not quantitatively. It is a
division such that a person can be identified as being MALE (the presence of
the attribute in question) or NOT-MALE (the absence of the attribute) but not
both (the two categories are mutually exclusive).
In the case of dichotomous divisions there thus should be two distinct,
mutually exclusive classes or categories, as in the case of the attribute of
SEX-KIND. The negative class, NOT-MALE, is of course usually identified by
the positive descriptive term FEMALE. Although a positive term for the
negative class is not always available for dichotomous classifications, the
positive description of a class is empirically more satisfactory than the nega-
tive, provided no ambiguity results. *
Persons can also be divided into one or the other of the two following
mutually exclusive categories: (1) the BLIND (total absence of vision), and
(2) the NOT-BLIND, despite the fact that the latter class is variable, in that
acuity of vision varies from little to much. Similarly, people can be divided
into the LIGHT-HAIRED and the NOT-LIGHT-HAIRED. Here, however, the dif-
ferentiation of the characteristics for each category is not so easy because of
(1) the many variations in hair color, and (2) the problem of establishing
satisfactory objective criteria for the appropriate identification of borderline
cases. The extremes in hair color would, of course, be easy to identify and
enumerate, but persons with in-between shades would be more difficult to
classify. In any event, the line of division for an attribute of this kind would
be arbitrary, whereas the distinction between the BLIND and the NOT-BLIND is
not arbitrary.
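
A dichotomous division of the kind just described, A and not-A, may be sketched in Python. The records, the acuity scale (0.0 standing for total absence of vision), and the function names are hypothetical and serve only to illustrate the BLIND and NOT-BLIND example.

```python
def dichotomize(items, has_attribute):
    """Split items into two mutually exclusive classes: A and not-A."""
    present = [item for item in items if has_attribute(item)]
    absent = [item for item in items if not has_attribute(item)]
    return present, absent

# Hypothetical cases; acuity 0.0 is taken to mean total absence of vision.
people = [
    {"case": 1, "acuity": 0.0},
    {"case": 2, "acuity": 0.4},
    {"case": 3, "acuity": 1.0},
    {"case": 4, "acuity": 0.0},
]

def is_blind(person):
    return person["acuity"] == 0.0

blind, not_blind = dichotomize(people, is_blind)
counts = {"BLIND": len(blind), "NOT-BLIND": len(not_blind)}
```

Because the second class is defined as the negation of the first, every case falls into exactly one class, even though acuity varies within the NOT-BLIND class.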

Polytomous Classification
Eye color is again an attribute that can be divided into two categories, the
BLUE-EYED and the NOT-BLUE-EYED. This time, however, the dichotomy it-
self is arbitrary. Eye color is an attribute which may, for research purposes,
be more usefully differentiated into more than two categories. In fact, so far
as it can be correlated with variations in degree of pigmentation, human eye
color may be considered as a variable attribute. But at the present time there
are no entirely satisfactory empirical methods for dealing quantitatively with
this attribute. The usual method for field and laboratory purposes in psy-
chology and anthropology consists in using a set of artificial eyes differing
in pigmentation. By a matching technique (a person's eye color being com-
pared with the colors and shades of the artificial eyes until the best match
is obtained), the color and lightness of an individual's eyes are identified

* Technically, dichotomous division is restricted by logicians to a positive statement of
the differentia and its negative: A and not-A.
