1

Introduction to Statistics

INTRODUCTION
Welcome to the world of statistics! You are about to encounter numbers, tables, names, graphs,
probabilities, and trends; in other words, everything that statistics is about.

The module will teach you what descriptive statistics is all about. Statistics is an orderly science;
hence it can be understood easily. A conceptual understanding of the statistical procedures used in
nursing as well as the computational skills to carry out these procedures is given in this module. At the
end of the module, some activities and exercises are given. Please do the activities and answer the
questions because they will enhance your mastery of the lesson. Approach this module with an open
and positive mind. You will like statistics because it is a very useful course.

OBJECTIVES

At the end of this module, you will be able to:

1. Discuss the science of statistics;
2. Explain the fundamental elements of statistics;
3. Explain the role of statistics in critical thinking in health-related situations.

1.1 THE SCIENCE OF STATISTICS

Statistics is the science of data. It is a meaningful and useful science whose scope of application
to nursing and other health sciences, to government, to business, and to other physical and
biopsychosocial sciences is very broad. What about you? What comes to mind when you think of
statistics? Does it bring to mind unemployment figures, election returns, or basketball scores?
Or is it simply a graduate course requirement you have to complete?

Statistics is logical. It has a key role in critical thinking in the classroom, in the hospital, on the job, or
in everyday life. Thus, the time you spend in studying the subject will repay you in many ways later.
Each of us has a built-in system of reference that helps us make decisions. We also have a
built-in set of prejudices that may affect our decisions. One definite advantage of statistics is that it can
help us make decisions without prejudice. Moreover, statistics can be used for making decisions when
faced with uncertainties. For example, suppose you want to estimate the proportion of the nurses
enrolled in this course who will finish the course on time. You would need statistics to
predict the number of those who will finish versus those who will not.

The general prerequisite for statistical decision-making is the gathering of numerical facts or
information. Procedures for evaluating numerical data, together with rules of inference, are prime
topics in the study of statistics.

In this line of work, statisticians are trained in collecting, evaluating, and drawing conclusions from
numerical information. More importantly, statisticians determine what information is relevant in a
given problem and whether the conclusions drawn from the study are to be trusted.

Statistical methods by themselves have no power to work miracles; however, these methods can help
us make some decisions. Furthermore, the statistical results should be interpreted by one who
understands not only the methods but also the subject matter, especially the conceptual or theoretical
framework to which statistics have been applied.

Thus, statistics is the science of data that involves collecting, classifying, summarizing, organizing,
analyzing, and interpreting numerical information or data.

1.2 THE FUNDAMENTAL ELEMENTS OF STATISTICS

1.2.1 Population and Sample

Statistical methods are useful for studying, analyzing, and learning about populations. A population is
a set of units, such as people, objects, transactions, or events, that we are interested in studying. For
example, populations may include:

1. People
1.1 all Filipino women working in foreign countries
1.2 all registered nurses in the Philippines
1.3 everyone who is enrolled in nursing in the WCC Antipolo.

2. Objects
2.1 all theses and dissertations done in 1998
2.2 all stores selling Filipino products
2.3 all shoes manufactured in Marikina

3. Transactions
3.1 all memos of agreement signed by the WCC Antipolo administration in 1998
3.2 all sales of Jollibee foods delivered to the WCC College of Nursing from Antipolo
branch in January-February 1999
3.3 all promotions of the WCC Antipolo faculty in 1997

4. Events
4.1 all victims of fireworks accidents brought to PGH emergency room in December 1998
and January 1999
4.2 all birthday celebrations of graduating students in April 1999
4.3 all births registered at all Manila hospitals on February 14, 1999

In the above examples, you will notice that each set includes all the units in the population.

1.2.2 Variables and Sample

According to McClane and Sincich (1997), it is possible to measure a characteristic for every unit in
the population if the population you wish to study is small. For example, if you are measuring the
high school GPA of all incoming first year students at WCC Antipolo, it is feasible to obtain these
data. When we measure a characteristic for every unit of a population, the result is a census of the
population.

Oftentimes it is not feasible to study the entire population. For instance, how would you measure the
weight and height of every 5-year-old boy in the Philippines? For such a population, conducting a
census would be prohibitively time-consuming and very costly. A reasonable alternative is to select
and study a subset or a portion of the population.

A sample is a subset of a population. It is a finite number of units selected from the population. Thus,
a sample is simply a part of the population. But not every sample is representative of a population. To
be representative, the sample must be selected randomly. A random sample is determined
completely by chance. According to Brase and Brase (1983), in simple random sampling every
member or unit of the population has an equal probability or chance of being included in the sample.

For example, instead of polling all 139,000 registered nurses in the Philippines regarding who they
voted for during the 1998 presidential election, a pollster can just randomly select a sample of 1,000
registered nurses to represent all the registered nurses in the Philippines.
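
The selection described above can be sketched in a few lines of Python. This is only an illustration: it assumes each of the 139,000 registered nurses has been given a hypothetical ID number from 1 to 139,000, and the seed value is arbitrary and serves only to make the run repeatable.

```python
import random

POPULATION_SIZE = 139_000  # registered nurses, from the example in the text
SAMPLE_SIZE = 1_000        # size of the pollster's sample

def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Return sorted, distinct ID numbers chosen completely by chance."""
    rng = random.Random(seed)
    # Hypothetical IDs 1..population_size stand in for the real list of nurses.
    # rng.sample draws without replacement, so every ID has the same chance
    # of being included and no ID can be picked twice.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

sample_ids = draw_simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=1998)
print(len(sample_ids), "nurses selected; first five IDs:", sample_ids[:5])
```

Only the nurses whose ID numbers were drawn would be polled; the rest of the population is left out of the study.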

In studying a population, we focus on one or more characteristics or properties of the units in the
population. Such characteristics are called variables.

A variable is a characteristic or property of an individual population or sample unit. For example,
we may be interested in the variables age, gender, and number of years of education of the
unemployed residents of Manila. The name variable is derived from the fact that any particular
characteristic may vary among the units in the population or sample.

Let us have some examples.

Example 1
A PhD student in Nursing investigated the number of children per household in Quezon City.
A sample of 500 households in Quezon City was randomly selected to determine the number of
children per family.
a. Describe the population
b. Describe the sample
c. Describe the variable of interest

Solution

a. The population of interest is all the households in Quezon City.
b. The sample includes the 500 households randomly selected by the investigator.
c. The number of children per household is the variable of interest.

Example 2 (adapted from McClane & Sincich, 1997)

“Cola wars” is the popular term for the intense competition between Coca Cola and
Pepsi Cola displayed in their marketing campaigns. Their campaigns have featured movie and
television stars, rock videos, athletic endorsements, and claims of consumer preference based
on taste tests. Suppose, as part of a Pepsi marketing campaign, 1,000 cola consumers are
given a blind taste test (i.e., a taste test in which the two brand names are disguised). Each
consumer is asked to state a preference between brand A and brand B.
a. Describe the population
b. Describe the sample
c. Describe the variable of interest
Solution
a. The population of interest is the collection or set of all cola consumers.
b. The sample is the 1,000 consumers selected from the population of all cola
consumers.
c. The characteristic that Pepsi wants to measure is the consumer’s cola preference.

1.2.3 Measurement

Statistics can be applied in the analysis of a variable if the variable can be represented numerically. We
do this through the process of measurement. Measurement is the process we use to assign numbers
to variables of individual population units. For example, we can measure the teaching performance of
a faculty member by asking all his/her students to rate his/her performance on a scale from 1 to 10. Or,
we can measure research assistants’ ages by simply asking them their actual age. To gather data for a
variable, we can use either quantitative measurements or qualitative measurements.

Quantitative measurements use a naturally occurring numerical scale to describe the size of a
particular data.

Examples:
1. The temperature (in degrees Celsius) at which 20 pieces of heat-resistant plastic begin to
melt.
2. The current unemployment rate (measured as a percentage) for each province and city of
the Philippines.
3. The scores of a sample of 150 medical school applicants on the NMAT administered
nationwide.
4. The number of master’s students who successfully finished the degree over a ten-year period.

Qualitative measurements involve classification of observation into categories.

Examples:
1. The political party affiliation (Lakas NUCD, Laban, Peoples’ Party, Masang Makabayan,
or Independent) of 100 voters from Parañaque.
2. The academic status (pass or fail) on the comprehensive exam of 20 doctoral students.
3. The size of the refrigerators (big, medium, small) rented by each of a sample of 30 transient
boarders.
4. A taste tester’s ranking (best, worst, average) of four brands of salad dressing for a panel of
10 testers.

After the variables of interest for every unit in the sample or population are measured, the data are
analyzed either by descriptive or inferential statistical methods.

Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set and to
summarize the information in a convenient form.

Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other
generalizations about a population. In this unit, we will focus only on descriptive statistics.

Let us now pause for some activities and exercises. Compare your responses with the answers given at
the end of this module. Do not skip these exercise questions; they are important.

1.3 ROLE OF STATISTICS IN CRITICAL THINKING

As evidenced by the media today, there is a need to evaluate the flood of information reaching our
homes. Each day the media present us with published results on economic, health, social, and other
concerns. The growth in data collection associated with scientific phenomena, business operations,
and government activities (quality control, statistical auditing, forecasting, etc.) has been remarkable
in the 1990s. This scenario demands that each of us develop a discerning sense, an ability to
use rational thought to interpret the meaning of data. This ability can help us make intelligent
decisions, inferences, and generalizations, that is, to think critically. This is possible with the use of statistics.

Statistical thinking involves applying rational thought to assess data and the inferences made from
them critically.

Are you still with me? Let us pause and do some activities.

1.4 SUMMATION NOTATION
In statistics, it is necessary to work with sums of numerical values. To express these, we make use of
standard notation. Let us consider the exam scores of Bertha Pila on 9 statistics exams.

Exam 1 – 88    Exam 4 – 55    Exam 7 – 78
Exam 2 – 6     Exam 5 – 28    Exam 8 – 64
Exam 3 – 46    Exam 6 – 9     Exam 9 – 16

In mathematical notation, letter X denotes a score in a data set. From Bertha’s scores, we have the
following data:

X1 = score on Exam 1 = 88
X2 = score on Exam 2 = 6
X3 = score on Exam 3 = 46
X4 = score on Exam 4 = 55
X5 = score on Exam 5 = 28
X6 = score on Exam 6 = 9
X7 = score on Exam 7 = 78
X8 = score on Exam 8 = 64
X9 = score on Exam 9 = 16

The numbers 1 to 9 written beside the Xs are called subscripts. They represent the first to the 9th
observed score in a given data set. In this case, $X_1$ represents Bertha’s score on the first exam while $X_9$
represents her score on the ninth exam. In general, $X_i$ denotes the ith value in a data set. Using this
notation, the sum of Bertha’s exam scores can be expressed symbolically as:

$X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 + X_9$

But instead of writing down all these Xs, we can simply express this sum as $\sum_{i=1}^{9} X_i$, where the
symbol $\sum$ (Greek capital letter “sigma”) is the summation notation used in statistics. Thus,
$\sum_{i=1}^{9} X_i$ tells us to get the sum of the first, second, third, and so on up to the ninth values.

In statistics, we always compute the total sum and not a partial sum, so $\sum_{i=1}^{9} X_i$ can be further
simplified to $\sum X$, which means “summation of all the scores” in a data set.

Applying this now to Bertha’s exam scores:

$\sum_{i=1}^{9} X_i = X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 + X_9$
= 88 + 6 + 46 + 55 + 28 + 9 + 78 + 64 + 16
= 390
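
The same total can be checked with a short Python sketch. The variable names are illustrative only; the scores are Bertha's nine exam scores listed above.

```python
# Bertha's nine exam scores, in order X1 through X9
scores = [88, 6, 46, 55, 28, 9, 78, 64, 16]

# Total sum: every score from the first to the ninth is added
total = sum(scores)

# A partial sum, for contrast: only the first three scores (X1 + X2 + X3)
first_three = sum(scores[:3])

print("Sum of all scores:", total)
print("Sum of the first three scores:", first_three)
```

Python lists are indexed from 0, so `scores[0]` holds the first score and `scores[8]` holds the ninth.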

Some Rules of Summation

Rule 1: $\sum XY$ is not equal to $\sum X \sum Y$

Example:
X        Y        XY
1        4        4
2        5        10
3        6        18
$\sum X = 6$    $\sum Y = 15$    $\sum XY = 32$

Steps:
• Multiply each X value by its paired Y value
• Get $\sum XY$, $\sum X$, and $\sum Y$
• Check if $\sum XY$ is equal to $\sum X \sum Y$

Check: is $\sum XY = \sum X \sum Y$?
32 = (6)(15)?
32 ≠ 90
Therefore, $\sum XY \neq \sum X \sum Y$

Rule 2: $\sum (X + C)$ is not equal to $\sum X + C$, where C is a constant

Example: Let C = 5
X        X + 5
6        11
7        12
8        13
$\sum X = 21$    $\sum (X + 5) = 36$

Steps:
• Add 5 to each X value
• Get $\sum X$ and $\sum (X + 5)$
• Check if $\sum (X + 5) = \sum X + C$

Check: is $\sum (X + C) = \sum X + C$?
36 = 21 + 5?
36 ≠ 26
Therefore, $\sum (X + C) \neq \sum X + C$
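
As a supplement to the example (this step is not part of the original rule statement), it may help to see where the gap between 36 and 26 comes from. The constant is added once for every score, so it is counted N times in the sum rather than once. With N = 3 scores and C = 5:

$$\sum (X + C) = \sum X + NC = 21 + (3)(5) = 36$$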

Rule 3: $(\sum X)^2$ is not equal to $\sum X^2$

Example:
X        $X^2$
2        4
4        16
6        36
$\sum X = 12$    $\sum X^2 = 56$

Steps:
• Multiply each X value by itself
• Get $\sum X$ and $\sum X^2$
• Check if $(\sum X)^2 = \sum X^2$

Check: is $(\sum X)^2 = \sum X^2$?
$(12)^2$ = 56?
(12)(12) = 56?
144 ≠ 56
Therefore, $(\sum X)^2 \neq \sum X^2$
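
The three rules can also be checked numerically. The following Python sketch simply repeats the three examples above; the variable names are illustrative.

```python
# Rule 1: the sum of products is not the product of sums
X1, Y1 = [1, 2, 3], [4, 5, 6]
sum_of_products = sum(x * y for x, y in zip(X1, Y1))  # sum of XY
product_of_sums = sum(X1) * sum(Y1)                   # (sum of X)(sum of Y)

# Rule 2: adding a constant to every score is not the same as adding it once
X2, C = [6, 7, 8], 5
sum_with_constant = sum(x + C for x in X2)  # sum of (X + C)
sum_plus_constant = sum(X2) + C             # (sum of X) + C

# Rule 3: the square of the sum is not the sum of the squares
X3 = [2, 4, 6]
square_of_sum = sum(X3) ** 2                # (sum of X) squared
sum_of_squares = sum(x ** 2 for x in X3)    # sum of X squared

print("Rule 1:", sum_of_products, "vs", product_of_sums)
print("Rule 2:", sum_with_constant, "vs", sum_plus_constant)
print("Rule 3:", square_of_sum, "vs", sum_of_squares)
```

Each printed pair shows two different numbers, which is exactly what the rules state.
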
SUMMARY

In this module, we saw that statistics is the study of how to collect, organize, analyze, and interpret
numerical information. We investigated some types of problems where statistics can be used. In these
situations, we saw examples of populations and samples. It is important to remember that the main role
of inferential statistics is to draw conclusions about a population based on information obtained from a
sample, whereas the main role of descriptive statistics is to present or summarize a large mass of data
in a manageable form. We also saw the elements of statistics and, finally, the
role of statistics in critical thinking. With all this, let us cultivate a liking for this course. We shall
learn more as we study the other modules. Keep up the good work of reading your modules. Statistics
is a skill; you will soon have it.

2
Frequency Distributions

INTRODUCTION
The initial step in the descriptive process, that is, describing the data and the cases that are represented by
those data, is the organization of otherwise disorganized information and the condensation of
otherwise unmanageably large quantities of information.

The large mass of data may be organized by creating a frequency distribution table containing the
following components: frequency, percentage, cumulative frequency, and cumulative percentage. This
module discusses first the ungrouped frequency distributions and later, the grouped.

OBJECTIVES

At the end of this module study, you will:

1. Be familiar with the organization of data according to:
a. Frequency
b. Percentage
c. Cumulative frequency
d. Cumulative percentage
2. Organize a given set of data using the different types of frequency distributions
3. Discuss the significance of the results obtained from the ungrouped and grouped frequency
distributions.

2.1 UNGROUPED FREQUENCY DISTRIBUTION


Basically, frequency distributions show in tabular form the number of times each score or category appears
in a data set. Scores in their original form are called raw scores or raw data. Raw scores are usually
not arranged in any particular order, thus making it difficult for readers to see clearly the features of
the data. See, for example, Table 2.1, which lists the raw scores of 40 master’s students in their statistics
final examination for their N-298 class in UP Manila. These scores are not arranged in any particular
order, making it hard to examine clearly how well the students performed as a group, or how varied the
scores are from one student to the next.

TABLE 2.1 Raw Scores on the Statistics Final Examination of Masters’ Students

81 94 90 80 87 80 85 95
83 92 87 70 96 76 87 89
86 79 75 83 84 75 81 81
81 84 70 78 96 94 88 78
80 77 93 87 77 78 79 72

Table 2.2, on the other hand, presents another version of the data in Table 2.1. Notice that the final
examination scores are now arranged in order from highest to lowest in the first column, labeled X.
Frequencies are then listed in the second column, labeled f, showing how many students received each
listed score. When data are organized this way, we can see at a glance that the scores ranged from a
low of 70 to a high of 96, or that four students had a score of 81 and another four had a score of 87.
Such a presentation is called an ungrouped frequency distribution. Ungrouped frequency distributions
begin the process of organizing the data into a meaningful form. You can incorporate in the ungrouped
frequency distribution table columns for raw score (X), frequency (f), percentage (%), cumulative
frequency (cf), and cumulative percentage (c%).

2.1.1 Frequencies

To determine the frequencies of the scores in the data set, first arrange the raw scores in ascending or
descending order (as shown in Table 2.2). Then, under the f column, indicate the number of times
each score appeared in the data set (see Table 2.1). Notice that the sum of all the frequency values ($\sum f$)
is equal to N, the total number of observations or scores in the data set.

TABLE 2.2 Ungrouped Frequency Distribution of the Statistics Final Examination Scores of 40
Master’s Students

X     f    %       cf    cf (from top)    c%
96    2    5.0     40    2                100.0
95    1    2.5     38    3                95.0
94    2    5.0     37    5                92.5
93    1    2.5     35    6                87.5
92    1    2.5     34    7                85.0
91    0    0.0     33    7                82.5
90    1    2.5     33    8                82.5
89    1    2.5     32    9                80.0
88    1    2.5     31    10               77.5
87    4    10.0    30    14               75.0
86    1    2.5     26    15               65.0
85    1    2.5     25    16               62.5
84    2    5.0     24    18               60.0
83    2    5.0     22    20               55.0
82    0    0.0     20    20               50.0
81    4    10.0    20    24               50.0
80    3    7.5     16    27               40.0
79    2    5.0     13    29               32.5
78    3    7.5     11    32               27.5
77    2    5.0     8     34               20.0
76    1    2.5     6     35               15.0
75    2    5.0     5     37               12.5
74    0    0.0     3     37               7.5
73    0    0.0     3     37               7.5
72    1    2.5     3     38               7.5
71    0    0.0     2     38               5.0
70    2    5.0     2     40               5.0
$\sum f = N = 40$
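
The tallying step can be sketched in Python as follows. This is one possible way to count, using the standard library's `Counter`; the scores are the 40 raw scores from Table 2.1, and the variable names are illustrative.

```python
from collections import Counter

# The 40 raw scores from Table 2.1, row by row
raw_scores = [
    81, 94, 90, 80, 87, 80, 85, 95,
    83, 92, 87, 70, 96, 76, 87, 89,
    86, 79, 75, 83, 84, 75, 81, 81,
    81, 84, 70, 78, 96, 94, 88, 78,
    80, 77, 93, 87, 77, 78, 79, 72,
]

# Count how many times each score appears
frequency = Counter(raw_scores)
N = len(raw_scores)

# List every possible score from highest to lowest, as in Table 2.2.
# A Counter returns 0 for scores that never occurred (for example, 91).
for score in range(max(raw_scores), min(raw_scores) - 1, -1):
    print(score, frequency[score])

print("Sum of f =", sum(frequency.values()), "and N =", N)
```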

2.1.2 Percentages

The percentage associated with each score can be computed using this equation:

$$\text{Percentage (\%)} = \frac{f}{N} \times 100$$

where f = each score’s frequency of occurrence
N = total number of scores in the distribution

Percentages have one advantage over frequencies. It is often easier to compare two or more
percentages than frequencies. This is particularly true in instances when 2 or more different
distributions have different sample sizes.
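
As a worked illustration using values already listed in Table 2.2, the score 87 has a frequency of f = 4 out of N = 40 scores:

$$\text{Percentage (\%)} = \frac{f}{N} \times 100 = \frac{4}{40} \times 100 = 10.0$$

So 10.0% of the students scored exactly 87, which agrees with the % column for that row.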

2.1.3 Cumulative Frequencies


Cumulative frequencies show the number of cases scoring at or below each listed score. Cumulative
frequencies are determined by adding the frequency listed for a given score and the frequencies listed
for all lower scores.
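
The running total can be sketched in Python with `itertools.accumulate`. To keep the illustration short, it uses only the six lowest scores of Table 2.2 (70 to 75) and their frequencies; the variable names are illustrative.

```python
from itertools import accumulate

# The six lowest scores in Table 2.2 and their frequencies, lowest score first
scores_ascending = [70, 71, 72, 73, 74, 75]
frequencies = [2, 0, 1, 0, 0, 2]

# accumulate() keeps a running total, so each entry is the number of
# cases scoring at or below the matching score
cumulative = list(accumulate(frequencies))

for score, f, cf in zip(scores_ascending, frequencies, cumulative):
    print(score, f, cf)
```

The running totals come out as 2, 2, 3, 3, 3, 5, the same values found in the cf column for scores 70 to 75.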

2.1.4 Cumulative Percentages


Cumulative frequencies become useful when they are converted to cumulative percentages.
A cumulative percentage shows the percentage of cases scoring at or below each score. Each of these
percentages represents the percentile rank of a particular score. The percentile rank is useful for
determining quickly the relative locations of individual scores. Thus, a score’s percentile rank tells us
how high or how low, how good or how bad a given score is by locating this score relative to the other
scores that were obtained.

The cumulative percentage for any given score is computed using this equation:

$$c\% = \frac{cf}{N} \times 100$$

where cf = the cumulative frequency listed for a score
N = total number of scores in the distribution
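
As a worked illustration using values already listed in Table 2.2, the score 87 has a cumulative frequency of cf = 30 out of N = 40 scores:

$$c\% = \frac{cf}{N} \times 100 = \frac{30}{40} \times 100 = 75.0$$

So 75.0% of the students scored 87 or lower, which gives a score of 87 a percentile rank of 75.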

2.2 GROUPED FREQUENCY DISTRIBUTIONS


It is very tedious to list all individual scores in an ungrouped frequency distribution table when you
have a large number of scores. It is better to present the scores in groups or intervals, thus creating a
grouped frequency distribution table. This table also consists of columns for frequencies, percentages,
cumulative frequencies, and cumulative percentages.

To construct a grouped frequency distribution for the data set in Table 2.1, do the following steps:

1. Find the range (R).
   $R = \text{highest score} - \text{lowest score} + 1$
   Example: R = 96 – 70 + 1 = 27

2. Determine the class width (i) by dividing the range by the desired number of class intervals.
   $i = \frac{R}{\text{number of class intervals}}$
   Example: i = 27 / 6 = 4.5 or 5
   a. If the series contains fewer than 50 cases, 10 classes or fewer are just enough.
   b. If the series contains 50 to 100 cases, 10 to 15 classes are just enough.
   c. If the series contains more than 100 cases, 15 or more classes are good.

3. List the class intervals, making sure that the lowest and highest scores of the data set are
   included in the bottom and top class intervals, respectively.
   Example:
   95-99    (96, the highest score, is included in this interval)
   90-94
   85-89
   80-84
   75-79
   70-74    (70, the lowest score, is included in this interval)
   Note:
   a. All class intervals must have the same class width.
   b. For the bottom class interval, start with a score or number that is a multiple of the class width.

4. Determine f, %, cf, and c%.
   Example: See Table 2.3.

TABLE 2.3 Grouped Frequency Distribution of the Statistics Final Exam Scores of 40 Nursing Masters’
Students

Class Interval f % cf c%
95-99 3 7.5 40 100.0
90-94 5 12.5 37 92.5
85-89 8 20.0 32 80.0
80-84 11 27.5 24 60.0
75-79 10 25.0 13 32.5
70-74 3 7.5 3 7.5
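
The four steps can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function names are hypothetical, the class width is rounded up to a whole number (an assumption that matches the choice of 5 over 4.5 in the worked example), and the scores are the 40 raw scores from Table 2.1.

```python
import math

# The 40 raw scores from Table 2.1, row by row
raw_scores = [
    81, 94, 90, 80, 87, 80, 85, 95,
    83, 92, 87, 70, 96, 76, 87, 89,
    86, 79, 75, 83, 84, 75, 81, 81,
    81, 84, 70, 78, 96, 94, 88, 78,
    80, 77, 93, 87, 77, 78, 79, 72,
]

def class_width(scores, n_classes):
    """Steps 1 and 2: range, then range divided by the number of classes."""
    score_range = max(scores) - min(scores) + 1
    # Assumption: round up so the width is a whole number
    return math.ceil(score_range / n_classes)

def grouped_distribution(scores, width):
    """Steps 3 and 4: build (interval, f, %, cf, c%) rows, highest interval first."""
    n = len(scores)
    # The bottom interval starts at a multiple of the class width
    lower = (min(scores) // width) * width
    rows, cf = [], 0
    while lower <= max(scores):
        upper = lower + width - 1
        f = sum(1 for x in scores if lower <= x <= upper)
        cf += f  # cases scoring at or below this interval
        rows.append((f"{lower}-{upper}", f, 100 * f / n, cf, 100 * cf / n))
        lower += width
    return rows[::-1]

width = class_width(raw_scores, 6)
table = grouped_distribution(raw_scores, width)
for interval, f, pct, cf, cpct in table:
    print(f"{interval:<7}{f:>3}{pct:>7.1f}{cf:>4}{cpct:>7.1f}")
```

Run on the data from Table 2.1 with six classes, the sketch reproduces the rows of Table 2.3.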

In comparing Table 2.2 with Table 2.3, we see that the grouped frequency distribution table has
class intervals while the ungrouped table has none. Furthermore, grouped frequency distributions provide a
simpler, more economical description of the data than do ungrouped frequency distributions. By
combining several scores into one class interval, grouped frequency distributions reduce the total
amount of information that must be digested by the reader.

Again, take a look at the class intervals in Table 2.3. Each class interval is bounded by numbers called
real limits or exact limits. For example, the lower and upper exact limits of the class interval 85-89
are 84.5 and 89.5, respectively. Furthermore, each class interval can be represented by one value,
the midpoint. A midpoint is the middle value in a class interval. For the class
interval 80-84, the midpoint is 82.
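
Both quantities can be computed directly from the stated limits of an interval. The sketch below uses hypothetical function names and assumes that scores are whole numbers, so that each exact limit lies half a unit beyond the stated limit.

```python
def exact_limits(lower, upper, unit=1):
    """Return the lower and upper exact (real) limits of a class interval.

    Assumption: scores are recorded to the nearest `unit` (1 for whole
    numbers), so each exact limit sits half a unit beyond the stated limit.
    """
    half = unit / 2
    return lower - half, upper + half

def midpoint(lower, upper):
    """Return the middle value of a class interval."""
    return (lower + upper) / 2

print(exact_limits(85, 89))  # exact limits of the interval 85-89
print(midpoint(80, 84))      # midpoint of the interval 80-84
```

For the interval 85-89 the exact limits are 84.5 and 89.5, and for the interval 80-84 the midpoint is 82, as stated in the text.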

SUMMARY

This module showed you the importance of arranging data and presenting them in distribution tables
that show the frequency, percentage, cumulative frequency, and cumulative percentage.

One application of a frequency distribution is that it can give us an idea of how many students
performed below a given passing score. It can give us the picture of how well or how badly a student
performed in a class relative to the scores of the other students.

In the succeeding modules, you will see more of this frequency distribution theme presented in
graphs, histograms, and other position measures. I wish to encourage you to go on; statistics is not
really hard because it is a science of order and logic.

So, until next time, keep on doing the activities because they will build your statistical skills.
