
Gignac, G. E. (2019). How2statsbook (Online Edition 1). Perth, Australia: Author.

1 Concepts and Terminology


“Teachers can open the door, but you must walk through it yourself.”
Chinese Proverb

Contents
Introduction
Why Do We Do Statistics?
    1: Because we do not have access to the population of interest
    2: To quantify the magnitude of an effect
    3: To test assumptions associated with a statistical analysis
    4: To catch the attention of babes
Independent and Dependent Variables
Levels of Measurement
Hypothesis Testing
Summary
References

Introduction
The purpose of this chapter is to help you familiarize yourself with some of the key concepts
and terms commonly considered and used by applied researchers. First, I provide an answer to
the age-old question, ‘Why do we do statistics?’. Virtually everything included in this textbook
can be categorized into the four answers I provide to that question. I then describe the
important distinction between independent and dependent variables. Next, I cover the four
levels of measurement, essential knowledge for the application of statistics. Finally, I describe
the basic terminology associated with hypothesis testing. This chapter is not long. However, on
a per gram basis, it is possibly the most powerful.

Why Do We Do Statistics?
First, what is statistics (Watch Video 1.1: Why do we do statistics?)? Statistics is an
applied mathematical discipline concerned with summarizing, interpreting, and making
decisions through the analysis of data. In simple terms, people conduct statistics to answer empirical
questions with numbers (i.e. quantitatively). An empirical question is one which can be studied,
and potentially answered, with observable information. In the context of statistics, the
observable information must be quantifiable (i.e., measured). In more detailed and operational
terms, statistical analyses are conducted for four primary reasons:

(1) Because one does not have access to the population of interest.
(2) To quantify the magnitude of an effect.
(3) To test assumptions associated with a statistical analysis.
(4) To catch the attention of babes.

In the next section of this chapter, I describe each reason in more detail.

1: Because we do not have access to the population of interest


The lives of typical scientists would be much easier if they had access to their
populations of interest (Watch Video 1.2: Because we do not have access to the population). In
statistics, a population includes all of the cases associated with a defined group. As of 2019, the
human population is estimated to be 7.7 billion. A lot of research conducted around the world
is relevant to humans. However, no scientific study can ever be expected to include as
participants the entire human population. It would be too expensive and time consuming to
identify and test 7.7 billion people. Instead, researchers use a much smaller group of cases as a
representation of the entire population. Such a group of cases is known as a sample.
To repeat, researchers almost invariably use samples as representations of a particular
population of interest, rather than the actual population. Consequently, statistical analyses need
to be conducted, in order to estimate the chances that one may be fooled into thinking that
something is happening within the population of interest, when it actually is not. As a general
statement, scientists do not want to make fools of themselves. Consequently, they tend to take
seriously the estimation of the chances that their data may be suggesting something that is not
really true. Researchers estimate those chances with something known as a p-value, where p
stands for probability. Statistical analyses which include p-values are known more broadly as
inferential statistics, because researchers wish to infer their results to the entire population of
interest. Nearly the entirety of this textbook is relevant to inferential statistics, so the process
of estimating and interpreting p-values will come up often.
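
To make the idea a little more concrete, the short sketch below shows one way a correlation and its p-value might be estimated. This sketch is not part of the textbook's own materials: it assumes Python with the scipy library (scipy.stats.pearsonr), and the attendance and exam numbers are invented purely for illustration.

    # A minimal sketch (invented data): estimating a p-value for the association
    # between two variables measured on a small sample.
    from scipy import stats

    lectures_attended = [2, 5, 7, 9, 10, 12, 4, 8, 11, 6]
    final_exam_marks = [55, 61, 64, 72, 70, 81, 58, 69, 77, 60]

    r, p_value = stats.pearsonr(lectures_attended, final_exam_marks)

    print(f"Sample correlation: r = {r:.2f}")
    print(f"p-value: p = {p_value:.4f}")
    # A small p-value suggests that an association this strong would be unlikely
    # to arise by chance alone if there were truly no association in the population.

The smaller the p-value, the less plausible it is that the sample result reflects nothing more than the luck of the draw.
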
In a small number of instances, the entire population of interest is actually accessible to
a researcher. For example, I could calculate the correspondence between student lecture
attendance and student performance on the final exam in the statistics unit I teach. Previous
research suggests that students who attend more of the lectures in a unit do better on the final
exam (e.g., Gatherer & Manning, 1998). My third year unit tends to have about 300 students. If
I were interested only in the students that take my unit, the 300 students would represent the
entire population of interest. Technically, the values I would calculate from the 300 students
would be known as parameters, rather than statistics. Parameters are values calculated from
data that represent the entire population of interest.
From a scientific perspective, it would not be particularly interesting to know the
correspondence between lecture attendance and final exam performance for just one unit, in
one university, in one city. Researchers and laypeople alike want to know about findings that
have much broader implications than my classroom. Specifically, people want to know if there
is an association between lecture attendance and final exam performance for all university
students, generally. Unfortunately, if I needed to get population level results (i.e., parameters),
technically, I would have to get a hold of data from the millions and millions of students from
around the world who are currently enrolled in a university, which is obviously unrealistic.
This is where the genius of statistics comes in. Theoretically, a researcher does not have
to get a hold of the entire population of interest, in order to get results that can be inferred to
the population of interest. Instead, all the researcher has to do is get a random sample of cases
from the population of interest. A random sample is a group of cases selected from the
population of interest, where each case had an equal chance of being selected into the group.
From one perspective, a random sample is a bit of a statistical miracle. It is a relatively
inexpensive, but potentially accurate, substitute for a population. In practice, however, a
genuinely random sample is as unobtainable as the population itself.
Consider, for example, the lecture attendance and final exam performance study. In
order to create a random sample, one would have to get a list of all of the enrolled university
students from around the world (millions) and select at random, say, 200 names from that list.
Then, obtain the lecture attendance and final exam performance data from those 200 students.
Finally, statistical analyses could be conducted on those data. Strictly speaking, this is what one
would have to do, in order to infer the statistical results to the population of interest. Such
results would be known as inferential statistics. In actuality, no researcher ever does the above.
First of all, it would take a huge amount of time just to get the names of all of the enrolled
university students worldwide. Secondly, simply because a student’s name may get selected as
one of the 200 does not mean the student will actually participate in the study. Ethically, they
can’t be forced to do so (sigh…). Thus, although random sampling is an amazing strategy in
theory, in practice, it is not feasible. Consequently, virtually all scientific studies suffer from the
same limitation: they neither have data from the entire population, nor do they have a truly
random sample of the population of interest.
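
Although a genuinely random sample of all university students cannot be obtained, the logic of random sampling can be illustrated with a simulation. The sketch below is not from the textbook: it assumes Python with the numpy library, and it simply fabricates a 'population' of one million students so that a population parameter can be compared with a random-sample statistic.

    # A sketch of random sampling in principle (the 'population' is simulated,
    # because the real worldwide roster of students is unobtainable).
    import numpy as np

    rng = np.random.default_rng(42)

    # Simulate a population of 1,000,000 students: lecture attendance (0-12) and
    # exam marks that are partly related to attendance.
    attendance = rng.integers(0, 13, size=1_000_000)
    marks = 50 + 2 * attendance + rng.normal(0, 10, size=1_000_000)

    # Parameter: calculated from the entire (simulated) population.
    parameter_r = np.corrcoef(attendance, marks)[0, 1]

    # Statistic: calculated from a random sample of 200 cases, where each case
    # had an equal chance of being selected.
    idx = rng.choice(attendance.size, size=200, replace=False)
    sample_r = np.corrcoef(attendance[idx], marks[idx])[0, 1]

    print(f"Population parameter: r = {parameter_r:.3f}")
    print(f"Random-sample statistic: r = {sample_r:.3f}")  # close, but not identical

Run the sketch a few times with different seeds and the sample statistic will bounce around the population parameter; inferential statistics exist to quantify that bouncing.
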
Instead, in practice, researchers use convenience samples. A convenience sample
consists of cases who are readily accessible. In the above example, if I were to conduct a study
on the association between lecture attendance and final exam performance, based only on
students in my own unit who volunteer to participate, I would be using a convenience sample.
A very large percentage of researchers across many scientific disciplines use convenience
samples. However, researchers apply statistical analyses upon those data, as if they were
derived from a random sample. No one can say with any precision what the consequences are
of using convenience samples, rather than random samples. It is probably safe to say that the
consequences are not positive for the accuracy of the obtained results. I see no solutions to the
challenge of obtaining random samples forthcoming any time soon. Arguably, scientific
disciplines based on inferential statistics do appear to advance, despite the very frequent use of
convenience samples. For example, penicillin was discovered to work, based on research with
convenience samples, not random samples. So, researchers proceed as if their convenience
samples represent random samples and hope for the best.

2: To quantify the magnitude of an effect


The second reason statistics may be conducted is to quantify the magnitude of an effect
(Watch Video 1.3: Estimate magnitude of effect). Consequently, once a researcher estimates
the p-value associated with a statistical analysis, it is important to calculate the magnitude of
the effect. For example, a researcher might find a statistical result that is unlikely to be due to chance
(i.e., something really is going on in the population), but the association or difference might be really small –
so small that it is inconsequential and/or uninteresting.
Even if a researcher had access to the population of interest (i.e., didn’t have to use
samples and estimate p-values), a researcher would nonetheless be wise to conduct statistical
analyses (or parameter analyses, more accurately) to quantify the magnitude of one or more
effects. Consider the lecture attendance and final exam performance study, for example. It is
conceivable that I could conduct an attendance study based on my own students, because I
might be interested only in my own students. I have access to the population of interest, in this
case. Consequently, I would not have to estimate a p-value associated with any of the analyses,
if I have data from all of my students. However, I would still want to calculate the direction and
strength of the effect. Does attending more lectures correspond to obtaining higher marks on
the final exam? Just how many more marks can a student expect to achieve if s/he attends one
more lecture than average? These are very important questions to answer in the process of
conducting a statistical analysis, whether with samples or populations of data.
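
As a rough illustration (again, not from the textbook's own materials, and based on invented numbers), the sketch below quantifies both the direction and the magnitude of the attendance-exam effect: a correlation for the strength of the association, and a regression slope for the expected gain in marks per additional lecture attended. It assumes Python with the numpy library.

    # A minimal sketch (invented data): quantifying the direction and magnitude
    # of the effect, rather than just its p-value.
    import numpy as np

    lectures_attended = np.array([2, 5, 7, 9, 10, 12, 4, 8, 11, 6])
    final_exam_marks = np.array([55, 61, 64, 72, 70, 81, 58, 69, 77, 60])

    r = np.corrcoef(lectures_attended, final_exam_marks)[0, 1]
    slope, intercept = np.polyfit(lectures_attended, final_exam_marks, 1)

    print(f"Direction and strength of the association: r = {r:.2f}")
    print(f"Expected gain per additional lecture attended: {slope:.1f} marks")
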

3: To test assumptions associated with a statistical analysis


The third reason researchers conduct statistics is relevant to testing statistical
assumptions (Watch Video 1.4: Test assumptions). Virtually all statistical analyses have
assumptions that need to be met, in order for the results derived from the analyses to be
accurate. These assumptions are often tested with statistical analyses in their own right. That is,
there are statistical analyses that have been developed to determine whether the assumptions
associated with particular analyses have been satisfied. Unfortunately, there is a substantial
amount of misunderstanding and misinformation about the assumptions associated with
various statistical analyses. Consequently, various sections of the chapters throughout this
textbook attempt to “set the record straight”. An argument can be made that the assumptions
associated with a statistical analysis should be tested, prior to conducting the statistical analysis
of primary interest. However, in this textbook, I tend to test the assumptions after the primary
analyses have been conducted, if only because assumption testing is, typically, much less
interesting. In plain language, I do not want to lose readers, before I even get to the heart of the
analysis!
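
As one small example of what assumption testing can look like in practice (a sketch only, not the textbook's own procedure, and using invented data), normality is an assumption associated with many of the analyses covered in later chapters, and the Shapiro-Wilk test is one statistical test commonly used to evaluate it. The sketch assumes Python with the scipy library.

    # A brief illustration: testing one common assumption (normality) with a
    # statistical test in its own right.
    from scipy import stats

    final_exam_marks = [55, 61, 64, 72, 70, 81, 58, 69, 77, 60]  # invented data

    w_stat, p_value = stats.shapiro(final_exam_marks)

    print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")
    # A p-value above the conventional .05 threshold fails to reject the assumption
    # that these scores were sampled from a normally distributed population.
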

4: To catch the attention of babes


It may be good attention, or it may be bad attention. Either way, you will certainly draw
some eyes when you use words like ‘heteroscedasticity’ and ‘wild bootstrap’ at parties (Watch
Video 1.5: To catch the attention of babes).

Independent and Dependent Variables


Researchers make the distinction between independent and dependent variables
(Watch Video 1.6: What are independent and dependent variables?). A variable is an attribute
to which values can be assigned. An independent variable is otherwise known as a predictor
variable. Strictly speaking, an independent variable is controlled by the researcher. For example,
a researcher may determine which participants in a study receive a treatment and which
participants do not. However, the term independent variable is typically used more
loosely than its strict definition implies. In the lecture attendance and final exam performance study I
mentioned above, lecture attendance would be considered the independent variable, because
it clearly occurs before final exam performance. Stated alternatively, it would be impossible for
final exam performance to have an effect on lecture attendance. Thus, in such a study, final
exam performance cannot be the predictor. Instead, it is the dependent variable.
A dependent variable is typically considered an outcome variable. Thus, the
independent variable is usually considered a predictor of the dependent variable. Stated
alternatively, values associated with a dependent variable may be theorized to depend upon, at
least in part, the values of an independent variable. For example, final exam performance may
be theorized to depend upon lecture attendance, not the other way around. For this reason,
final exam performance would be considered the dependent variable and lecture attendance
would be considered the independent variable.
In some cases, an analysis will be based on two variables for which it is not entirely clear
which variable is the independent variable and which variable is the dependent variable. For
example, consider an analysis based on the association between employment status and self-esteem. In
such an example, it is not clear which variable may have an effect on which. Does employment
affect self-esteem, or does self-esteem affect employment status? Or perhaps both variables
impact each other? In cases where it is not clear which variable may influence which, researchers
will often, nonetheless, refer to one variable as the independent variable and another as the
dependent variable. Often, the independent variable will be the variable of greater interest. For
example, a researcher may develop a theory about self-esteem, which would make it the
independent variable in a study which also included employment as a variable.
An understanding of which variable is an independent variable and which variable is a
dependent variable helps researchers communicate with each other. Additionally, knowledge
of independent and dependent variables has implications for determining which
statistical analysis should be applied to a set of data. At a higher level of conceptualization,
independent and dependent variables can be measured on either a continuous scale or a
discrete scale. Theoretically, a continuous scale consists of data that can be measured in
infinitely small units. For example, brain size in grams or distance travelled in kilometers. By
contrast, discrete measurement is essentially based on count data that cannot be broken down
into increasingly smaller units. For example, people’s biological sex or the country in which
people live. Some statistical analyses can be performed on continuously measured variables or
discretely measured variables, or a combination of the two. Consequently, it is important to
have a complete understanding of levels of measurement.
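
The sketch below (illustrative only, assuming Python with the pandas library and using invented values) lays the chapter's running example out as data: lecture attendance as a discretely measured independent variable and final exam performance as a continuously measured dependent variable.

    # An illustrative sketch: an independent variable measured discretely (counts)
    # and a dependent variable measured continuously.
    import pandas as pd

    df = pd.DataFrame({
        "lectures_attended": [2, 5, 7, 9, 12],              # IV: discrete counts
        "final_exam_mark": [55.5, 61.0, 64.5, 72.0, 81.5],  # DV: continuous
    })

    print(df.dtypes)  # int64 for the discrete counts, float64 for the continuous marks
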

Levels of Measurement
In statistics, there are four conventionally considered levels of measurement: (1)
nominal, (2) ordinal, (3) interval, and (4) ratio (Stevens, 1946). The importance of understanding
levels of measurement cannot be overstated. Your ability to discern the level of measurement
associated with a variable will, in large part, determine the type of statistical analysis that can
be conducted on the data. Additionally, understanding the strengths and weaknesses associated
with each level of measurement will help you design more effective empirical investigations
(Watch Video 1.7: Nominal, ordinal, interval, & ratio).¹

¹ I purposely spoke in a dry, monotone voice for this video, in order to help out the dry jokes I made along the way.

Nominal is arguably the least informative level of measurement. Nominal measurement
consists of a set of categories that have different verbal labels, or names. Nominal measurement
is really more qualitative in nature, rather than quantitative. Thus, there is a qualitative
distinction between the categories, rather than a quantitative one. An example variable
measured on a nominal scale is biological sex. That is, a researcher may measure biological sex
in such a way that the participants nominate themselves as either ‘male’, ‘female’ or ‘other’.
Although males may be coded ‘1’, females coded ‘2’ and ‘other’ coded ‘3’, there are no widely
recognized quantitative distinctions between males, females and other. They are purely
categorical distinctions. Another example of a nominally measured variable is ‘country of
residence’, where participants respond to such categories as: Argentina, Australia, Canada,
England, Netherlands, New Zealand, and USA, for example. Again, there is no quantitative
distinction between the categories, as far as one’s country of residence is concerned.² A final
example of a question measured on a nominal scale is, “Do you eat meat? Yes or no.” Again, in this
case, there is no quantitative distinction between the two responses (yes or no) that can be
divided into smaller units. For this reason, the variable is nominal in nature. A variable measured with only
two categories (e.g., Male/Female; Yes/No) is sometimes referred to as dichotomous.

² It is true that these countries could be distinguished quantitatively based on population; however, the question is simply relevant to the country in which the person resides.

The ordinal level of measurement is also categorical in nature; however, there is an ordered
nature to the categories. Consequently, ordinal measurement is more informative than nominal measurement.
Theoretically, ordinal categories have different sizes or degrees; however, the quantitative
difference between the categories cannot be specified precisely. Two commonly used types of ordinal
scales are ranking scales and rating scales.
Ranking consists of applying numbers to objects to represent an ordering across a
dimension of interest. Once ranked, the relationship between the objects is such that a higher
or lower rank denotes more or less of the attribute of interest. For example, a panel of wine
connoisseurs could be asked to rank order the quality of 10 wines, whereby a rank of 1 denotes
the best tasting wine and a rank of 10 denotes the worst tasting wine. In this case, the attribute
of interest is quality of taste. Thus, a wine ranked 6 would be considered better tasting than a
wine ranked 7. Similarly, a wine ranked 2 would be considered better tasting than a wine ranked
3. Although the numerical difference between the ranks is equal to 1 in both cases (7 – 6 = 1; 3
– 2 = 1), it would be unjustifiable to suggest that the difference in the quality of the wines was
equal. That is, it is possible that the wine ranked 2 was much, much better than the wine
ranked 3, whereas the difference between the wines ranked 6 and 7 might
be really small. It is for this reason that the ordinal level of measurement is not considered fully
informative.
Another type of commonly used ordinal measurement is the rating scale. There are
many different sorts of rating scales, and they are commonly used in surveys. What rating scales
have in common is that the participant selects a number along a scale that ranges
between two or more numerical values. Typically, a larger selected number represents a larger
amount of the attribute of interest. One of the most commonly used rating scales is the Likert
scale. In a Likert scale, respondents indicate their level of agreement with a particular
statement. For example, the item ‘I enjoy learning about statistics,’ could be responded to on
the following 5-point Likert scale: 1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 =
Strongly Agree. Such a scale would be considered ordinal in nature.
An interval scale is more informative than an ordinal scale, because equal numerical
differences between values on an interval scale reflect equal differences in
magnitude. Thus, an interval scale is more continuous in nature than an ordinal scale. Examples
of interval scales include degrees Celsius, Gregorian calendar time, IQ scores, and scholastic
achievement scores. What these four scales have in common is that there is no absolute or
meaningful zero point. Zero degrees Celsius is simply the freezing point of water: it is not the
total absence of heat. Gregorian calendar time is counted from the traditional date of the
birth of Christ; thus, the start of the Gregorian calendar does not denote the beginning of time.
An IQ score of zero is impossible.
However, the difference between an IQ score of 100 and 115 is the same as the difference
between an IQ score of 115 and 130. Additionally, the difference between 10 and 20 degrees
Celsius is the same as the difference between 20 and 30 degrees Celsius. However, it would be
inappropriate to suggest that someone with an IQ of 140 is twice as smart as someone with an
IQ of 70. Similarly, 10 degrees Celsius is not twice as warm as 5 degrees Celsius. Thus, degrees
Celsius and IQ scores are not the most fully informative scales. Instead, they are interval in
nature.
The ratio scale is the most fully informative level of measurement. The only difference
between a ratio scale and an interval scale is that a ratio scale has a true zero point. Thus, a
score of zero implies the total absence of the attribute of interest. The implication of a true zero
point is that one can express the relationship between two values as a ratio. That is, you can say
that someone possesses twice as much of an attribute as someone else. For example, a memory
scale may be based on the quantity of numbers that can be recalled over a short period of time.
A person can fail to recall even a single number, which would imply that they have no short-
term memory for digits. Furthermore, someone who can recall 8 digits has twice the memory
for digits of someone who can recall only 4 digits. Other examples of ratio scales include
height, weight, reaction time, and number of likes for a YouTube video.
As mentioned above, knowledge of levels of measurement has implications for which
type of statistical analysis may be applied to a particular set of data. In practice, researchers do
not make a distinction between interval and ratio scales, as the same statistical analyses can be
applied to interval and ratio data. Consequently, researchers may refer to their data as
measured on an ‘interval/ratio’ scale, when it may actually be only interval. Larger implications
reveal themselves when the data are measured on an ordinal scale, or a nominal scale. Thus, it
is important to evaluate a variable with respect to its level of measurement across interval/ratio,
ordinal, and nominal. As I describe in chapter 16 of this textbook, not all ordinal data are equally
informative. Consequently, they should arguably be treated differently when it comes to
statistical analyses. To foreshadow my position, some ordinal scales can be treated as if they
were measured on an interval/ratio scale.
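
To tie the four levels together, here is a small sketch (not part of the textbook's materials; it assumes Python with the pandas library, and the variables and values are invented) showing how nominal, ordinal, and interval/ratio variables might be represented, and which kinds of summaries make sense for each.

    # A sketch of the four levels of measurement using pandas data types.
    import pandas as pd

    df = pd.DataFrame({
        # Nominal: categories with no order.
        "country": pd.Categorical(["Australia", "Canada", "USA", "Australia"]),
        # Ordinal: ordered categories; the distances between them are not defined.
        "agreement": pd.Categorical(
            ["Disagree", "Neutral", "Agree", "Strongly Agree"],
            categories=["Strongly Disagree", "Disagree", "Neutral",
                        "Agree", "Strongly Agree"],
            ordered=True,
        ),
        # Interval: equal differences, but no true zero point.
        "iq_score": [100, 115, 130, 85],
        # Ratio: equal differences and a true zero point.
        "digits_recalled": [4, 8, 6, 0],
    })

    print(df["country"].value_counts())  # counting categories is about all that suits nominal data
    print(df["agreement"].min())         # ordering is meaningful for ordinal data
    print(df["digits_recalled"].mean())  # means and ratios are meaningful for ratio data
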

Hypothesis Testing
The word ‘hypothesis’ is used commonly by researchers. A hypothesis is a statement
that reflects a position on the nature of an effect, or the absence thereof. There are two types
of hypotheses: null and alternative. A null hypothesis (H0) is a formal statement that represents
the total absence of an effect. For example, with respect to the lecture attendance and final
exam performance study I described above, the corresponding null hypothesis is (Watch Video
1.8: What is a hypothesis?):

H0: There is no association between lecture attendance and final exam performance.

In practice, researchers hope to reject the null hypothesis.³ That is, researchers do not (usually)
collect data, spend time entering the data into a spreadsheet, and then analyze it, in the hopes that they
will fail to reject the null hypothesis. Essentially, researchers state the null hypothesis and hope
to find evidence that the null hypothesis is wrong. Many researchers and statisticians get
uptight, when they come across a statement that essentially says that the null hypothesis has
been supported. The argument here is that, strictly speaking, the null hypothesis is never
supported. Instead, one only fails to find empirical evidence to reject it. There are good reasons
for such a position on the grounds of statistical power, a topic discussed in another chapter. In
practice, the decision about whether the null hypothesis can be rejected is based on an estimated p-value.
Many of the chapters in this textbook will include analyses relevant to the estimation of
p-values across a number of statistics.

³ Of course, a proper, well-behaved scientist should not have hopes about hypotheses. Instead, a scientist should be satisfied with the process and outcomes of discovering the nature of reality, irrespective of whether a null hypothesis is rejected or not.

In contrast to the null hypothesis, the alternative hypothesis is a statement that
represents an association, broadly defined, between two or more variables. Alternative
hypotheses tend to be more interesting. To find empirical evidence in favor of an alternative
hypothesis can potentially advance a scientific field, not to mention advance the career of a
scientist or fill the pockets of a quantitative finance trader. With respect to the lecture
attendance and final exam performance study, the corresponding alternative hypothesis is:

H1: There is an association between lecture attendance and final exam performance.

In practice, a university lecturer or professor might take the time to collect data on lecture
attendance and final exam performance in the hopes of obtaining evidence that there is a
positive association between lecture attendance and final exam performance. That is, higher
levels of lecture attendance are associated with higher final exam marks. If statistical evidence
were obtained to suggest that the null hypothesis should be rejected, it necessarily implies that
the alternative hypothesis should be accepted. Throughout this textbook, a large number of
hypotheses will be tested with a variety of statistics, in order to demonstrate their application.
For the most part, the examples I use are based on data that correspond to the hypotheses and
results that were tested and reported in published studies. So, if you conduct the analyses
yourself (I show you how), you will be retracing the footsteps of real scientists!
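
Pulling the pieces together, the sketch below (not from the textbook; invented data, assuming Python with the scipy library) tests H0 of no association between lecture attendance and final exam performance and makes the reject / fail-to-reject decision against the conventional .05 threshold.

    # A minimal sketch: deciding between H0 and H1 on the basis of a p-value.
    from scipy import stats

    lectures_attended = [2, 5, 7, 9, 10, 12, 4, 8, 11, 6]
    final_exam_marks = [55, 61, 64, 72, 70, 81, 58, 69, 77, 60]

    r, p_value = stats.pearsonr(lectures_attended, final_exam_marks)

    alpha = 0.05  # conventional significance level, discussed in later chapters
    if p_value < alpha:
        print(f"r = {r:.2f}, p = {p_value:.4f}: reject H0 in favor of H1.")
    else:
        print(f"r = {r:.2f}, p = {p_value:.4f}: fail to reject H0.")
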

Summary
The question of why statistics are conducted was answered in this chapter. Statistics are
conducted to infer results from samples to populations, to quantify the direction and magnitude
of an effect, and to test one or more assumptions associated with another statistical analysis (I
made a joke about getting attention). The concept of inferential statistics was introduced, as
were the four levels of measurement: nominal, ordinal, interval, and ratio. Finally, null and
alternative hypotheses were described in the context of inferential statistics.
A large number of inferential statistical analyses will be covered throughout this
textbook. Prior to introducing some basic inferential statistical analyses, it would be useful to
become familiar with a category of statistics known as descriptive statistics. Descriptive
statistics are different to inferential statistics in that descriptive statistics are not really relevant
to testing hypotheses. Instead, descriptive statistics are typically conducted in order to get a
general sense of the nature of the data (Watch Video 1.9: Summary – Chapter 1).

References
Gatherer, D., & Manning, F. C. (1998). Correlation of examination performance with lecture
attendance: a comparative study of first-year biological sciences
undergraduates. Biochemical Education, 26(2), 121-123.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
