
Module 1: The Nature of Statistics

Students' Intended Learning Outcomes:


At the end of the week, the pre-service teacher (PST) should be able to:
1. discuss the contributions of different statisticians and mathematicians to the continuous
improvement of statistical knowledge and concepts
2. differentiate between
a. Descriptive and Inferential Statistics
b. Population and Sample
3. identify the types of data and the level of measurement of each variable
4. describe the different data collection methods and sampling techniques
5. create a tree diagram illustrating types of data and their levels of measurement
6. explain how statistics can be used and misused
7. point out the importance of using digital technology in statistics

Introduction:

Probability and statistics are inseparable twins. Whether you have already done your
experiment or are still planning to carry it out, one question is always hanging there: “What is the chance of
success?”

Probability and statistics are the fields of mathematics concerned with the laws
governing random events, including the gathering, analysis, interpretation, and display of numerical data.
Probability has its beginnings in the study of gambling and insurance in the 17th century, and it is now a
vital instrument of both the social and the natural sciences. Statistics, however, has its origins in census counts
taken thousands of years ago (Porter, 2020).

I. Description and History of Statistical Science

A. Brief Historical Development of Statistics

The word statistics is derived from the Latin word "status" or the Italian word "statista",
both of which relate to the "political state" or "government". In the past, rulers and kings
used statistics to gather information on land, farming, trade, and the populations of their states
in order to evaluate their military capability, wealth, fiscal resources, and other government
issues. Thus, statistics is closely linked with the administrative affairs of a state.

In the seventeenth and eighteenth centuries, gamblers asked mathematicians to
develop principles that would improve their chances of winning at cards and dice. The two most
noted mathematicians involved in this first major study of probability were
Bernoulli and DeMoivre. In the 1730s, DeMoivre developed the equation for the normal curve.
Important work on probability was conducted in the first two decades of the nineteenth century
by two other mathematicians, Laplace and Gauss. Their work applied probability
principles to astronomy. Gauss introduced the theory of errors in the physical sciences at the end of
the eighteenth century.

Through the eighteenth century, statistics was mathematical, political, and governmental. In
the early nineteenth century, the famous Belgian statistician Quetelet applied statistics to the
investigation of social and educational problems. Beyond any doubt, Francis Galton had the
greatest effect on the introduction and use of statistics in the social sciences. Galton contributed
to the fields of heredity and eugenics, psychology, anthropology, and statistics. Our present
understanding of correlation, the measure of agreement between two variables, is credited to him.
The mathematician Pearson collaborated with Galton in later years and was instrumental in
developing many of the correlation and regression formulas that are in use today. Among Galton's
contributions was the development of centiles, or percentiles.

Although the importance of statistics was already strongly felt, its tremendous growth came in the
twentieth century. During this period, many new theories and applications in various disciplines were
introduced. Through the contributions of renowned statisticians, several theories and methods were
developed, among them Probability Theory, Sampling Theory, Statistical Inference, Design
of Experiments, Correlation and Regression Methods, and Time Series and Forecasting Techniques.

In the early 1900s, statistics and statisticians were not given much importance, but over the years,
owing to advances in technology, statistics widened its scope and gained attention in all fields of
science and management. It is pertinent to note that the continued growth of statistics is closely
associated with information technology. As a result, several new interdisciplinary fields have emerged,
such as Data Mining, Data Warehousing, Geographic Information Systems, and Artificial Intelligence.
Nowadays, statistics is applied in highly technical fields such as
Bioinformatics, Signal Processing, Telecommunications, Engineering, Medicine, Criminology, Ecology,
and others.

B. Definition of Statistics

The word statistics has several meanings.

• In the first place, it is a plural noun that describes a collection of numerical data,
such as employment statistics, accident statistics, population statistics, births and deaths,
income and expenditure, or exports and imports. It is in this sense that the word
'statistics' is used by a layperson or a newspaper.

- The word 'statistics' is defined by Professor Secrist as follows: "By statistics we
mean aggregate of facts, affected to a marked extent by multiplicity of
causes, numerically expressed, enumerated or estimated according to
reasonable standard of accuracy, collected in a systematic manner for
a predetermined purpose and placed in relation to each other."
• Secondly, the word statistics, as a singular noun, is used to describe a branch of
applied mathematics whose purpose is to provide methods of dealing with collections
of data and extracting information from them in compact form by tabulating,
summarizing, and analyzing the numerical data or set of observations.

- The word 'statistics' is defined by Croxton and Cowden as follows: "The collection,
presentation, analysis and interpretation of the numerical data."

• Statistics is the branch of science that deals with (1) collecting, (2) organizing,
(3) summarizing, and (4) analyzing data, and (5) making inferences, decisions, and predictions
about a population based on the data of a sample.

II. Population and Sample

• A population is a group, or set, of objects or individuals that share a certain property or
characteristic; it is the entire group of interest to be studied. It is the entire set of
individuals or objects of interest, or the measurements obtained from all individuals or objects
of interest.
• A smaller, representative part, or subset, of the population is called a sample. It is a
portion, or part, of the population of interest.

Figure 1. Illustration of population and sample

Here are some examples of these concepts:

Example 1. The students officially enrolled in classes at the City of Malabon University form a
population, since no other students share this property.

Example 2. Consider the students enrolled in a particular class and choose, at random, a
committee of five students. This committee is a sample of that population.

Note: The elements in a population, or in a sample, are called observations, measurements, scores or just
data.

III. Descriptive and Inferential Statistics

There are two types of statistics: descriptive and inferential.

a. Descriptive Statistics

Descriptive statistics consists of methods for organizing, summarizing, and presenting data in an informative way.


Use descriptive statistics to summarize and graph the data for a group that you choose. This
process allows you to understand that specific set of observations.

Descriptive statistics describe a sample. You simply take a group that you’re interested in,
record data about the group members, and then use summary statistics and graphs to present
the group properties. With descriptive statistics, there is no uncertainty because you are
describing only the people or items that you actually measure. You’re not trying to infer
properties about a larger population.

The process involves taking a potentially large number of data points in the sample and
reducing them to a few meaningful summary values and graphs. This procedure gives us more
insight into the data, and lets us visualize it, far better than poring over row upon row of raw
numbers.

Example: Consider the scores of 30 students on a test.

66.21   73.58   76.62   81.28   86.13
66.98   73.69   76.69   81.39   86.81
67.77   74.32   78.09   81.98   87.37
68.75   75.35   79.05   83.01   91.80
72.14   76.23   79.56   83.61   94.89
73.11   76.55   80.34   85.73   96.53

Using descriptive statistics, we can present the test scores in graphical form and summarize them
with statistics such as the following:

Statistic           Class value
Mean                79.19
Range               66.21 – 96.53
Proportion >= 70    86.67%

These results indicate that the mean score of this class is 79.19. The scores range from
66.21 to 96.53, and the distribution is roughly symmetric around the mean. A score of at
least 70 on the test is acceptable, and the data show that 86.7% of the students (26 of 30) have
acceptable scores.

Collectively, this information gives us a pretty good picture of this specific class. There is
no uncertainty surrounding these statistics because we gathered the scores for everyone in the
class. However, we can’t take these results and extrapolate to a larger population of students.
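The summary values in the table above can be recomputed from the 30 listed scores with plain Python. This is a minimal sketch. The variable names and the `PASSING_SCORE` constant are illustrative, and the cutoff of 70 is the one given in the text. Rounded to two decimals, the mean of the listed scores is 79.19, and 26 of the 30 scores are at least 70.

```python
# Minimal sketch: descriptive statistics for the 30 test scores listed above.
scores = [
    66.21, 73.58, 76.62, 81.28, 86.13,
    66.98, 73.69, 76.69, 81.39, 86.81,
    67.77, 74.32, 78.09, 81.98, 87.37,
    68.75, 75.35, 79.05, 83.01, 91.80,
    72.14, 76.23, 79.56, 83.61, 94.89,
    73.11, 76.55, 80.34, 85.73, 96.53,
]

PASSING_SCORE = 70  # "a score of at least 70 on the test is acceptable"

n = len(scores)
mean = sum(scores) / n
lowest, highest = min(scores), max(scores)
acceptable = sum(1 for s in scores if s >= PASSING_SCORE)  # count of acceptable scores
proportion = acceptable / n

print(f"Number of students: {n}")                               # Number of students: 30
print(f"Mean: {mean:.2f}")                                      # Mean: 79.19
print(f"Range: {lowest:.2f} - {highest:.2f}")                   # Range: 66.21 - 96.53
print(f"Proportion >= {PASSING_SCORE}: {proportion:.2%}")       # Proportion >= 70: 86.67%
```

Every value describes only this class. None of the numbers is an estimate for a larger group.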

Elements of a Descriptive Statistical Problem


1. Define the population (or sample) of interest
2. Select the variables that are going to be investigated
3. Select the tables, graphs, or numerical summary tools
4. Identify patterns in the data

b. Inferential Statistics

Inferential statistics consists of methods used to determine something about a population on the
basis of a sample. Inferential statistics takes data from a sample and makes inferences about the
larger population from which the sample was drawn. Because the goal of inferential statistics is
to draw conclusions from a sample and generalize them to a population, we need to have
confidence that our sample accurately reflects the population. This requirement affects our
process. At a broad level, we must do the following:
1. Define the population we are studying.
2. Draw a representative sample from that population.
3. Use analyses that incorporate the sampling error.
We must make sure that the sample mirrors the population on average, and this can be done using
random sampling, which gives us confidence that the sample represents the population.
Random sampling produces statistics, such as the mean, that are not systematically too high or too
low. Using a random sample, we can generalize from the sample to the broader population.
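A small simulation can show that random sampling does not systematically overestimate or underestimate the mean. This is a sketch under assumed conditions. The population (the integers 1 to 1,000), the sample size of 30, the 2,000 repetitions, and the fixed seed are all arbitrary choices for illustration.

```python
import random
import statistics

# Hypothetical population for illustration: the integers 1 to 1,000.
population = list(range(1, 1001))
population_mean = statistics.mean(population)  # 500.5


def average_of_sample_means(pop, sample_size, repetitions, seed=1):
    """Draw many simple random samples and return the average of their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample_means = [
        statistics.mean(rng.sample(pop, sample_size)) for _ in range(repetitions)
    ]
    return statistics.mean(sample_means)


estimate = average_of_sample_means(population, sample_size=30, repetitions=2000)
print(f"Population mean: {population_mean}")
print(f"Average of 2,000 sample means: {estimate:.2f}")
```

Any single sample mean may be too high or too low. The average over many random samples, however, lands close to the population mean of 500.5.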

The most common methodologies in inferential statistics are hypothesis tests, confidence
intervals, and regression analysis.

Elements of an Inferential Statistical Problem


1. Define the population of interest
2. Select the variables that are going to be investigated
3. Select a sample of the population units
4. Run statistical tests on the sample
5. Generalize the results to your population and draw
conclusions

c. Differences between Descriptive and Inferential Statistics


As you can see, the difference between descriptive and inferential statistics lies in the
process as much as in the statistics that you report.

For descriptive statistics, we choose a group that we want to describe and then measure
all subjects in that group. The statistical summary describes this group with complete certainty
(outside of measurement error).

For inferential statistics, we need to define the population and then devise a sampling
plan that produces a representative sample. The statistical results incorporate the uncertainty
that is inherent in using a sample to understand an entire population. The sample size becomes a
vital characteristic. The law of large numbers states that as the sample size grows, sample
statistics (e.g., the sample mean) converge to the population value.
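The law of large numbers can be illustrated the same way. This sketch assumes a hypothetical population: the outcomes of rolling a fair six-sided die, whose mean is 3.5. The sample sizes and the seed are arbitrary.

```python
import random


def sample_means_by_size(sizes, seed=7):
    """Return the mean of simulated fair-die rolls for each sample size."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = {}
    for n in sizes:
        rolls = [rng.randint(1, 6) for _ in range(n)]  # hypothetical population: a fair die
        means[n] = sum(rolls) / n
    return means


POPULATION_MEAN = 3.5  # (1 + 2 + 3 + 4 + 5 + 6) / 6

for n, mean in sample_means_by_size([10, 100, 1_000, 100_000]).items():
    print(f"n = {n:>7,}: sample mean = {mean:.3f}, "
          f"distance from {POPULATION_MEAN} = {abs(mean - POPULATION_MEAN):.3f}")
```

The distance from 3.5 typically shrinks as n grows, although it does not necessarily shrink at every step.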

A study using descriptive statistics is simpler to perform. However, if you need evidence
that an effect or relationship between variables exists in an entire population rather than only
your sample, you need to use inferential statistics.

d. Variables and Types of Data


The collection of data that are relevant to the problem being studied is commonly the
most difficult, expensive, and time-consuming part of the entire research project. Statistical
data are usually obtained by counting or measuring items.

• Primary data are collected specifically for the analysis desired.
• Secondary data have already been compiled and are available for statistical analysis.
• A variable is an item of interest that can take on many different values.
• A constant has a fixed numerical value.
Most data can be put into the following categories:

1. Qualitative Data - data are measurements that each fall into one of several categories (hair
color, ethnic group, and other attributes of the population).

Qualitative data are generally described by words or letters. They are not as widely used as
quantitative data because many numerical techniques do not apply to qualitative data.
For example, it does not make sense to find an average hair color or blood type.

Qualitative data can be separated into two subgroups:
• dichotomic, if the data take the form of a word with two options (e.g., gender: male or female)
• polynomic, if the data take the form of a word with more than two options (e.g., education:
primary school, secondary school, or university)

2. Quantitative Data - data are observations that are measured on a numerical scale
(distance traveled to college, number of children in a family, etc.)

Quantitative data are always numbers and are the result of counting or measuring attributes
of a population.

Quantitative data can be separated into two subgroups:
• discrete, if the data result from counting (e.g., the number of students of a given ethnic group
in a class, the number of books on a shelf)
• continuous, if the data result from measuring (e.g., distance traveled, weight of luggage)

e. Numerical Scales of Measurement Used in the Study of Variables (Stevens, 1946)

1. Nominal Scale of Measurement

• Nominal – consists of categories, in each of which the number of respective observations is
recorded. The categories are in no logical order and have no particular relationship. The
categories are said to be mutually exclusive, since an individual, object, or measurement can
be included in only one of them.
• It is the simplest and least sophisticated form of classification. It is sometimes called a
categorical scale, and its data are called categorical data.
• Data assume no natural ordering, and nominal scales have no numerical value.
• It is largely used to measure qualitative characteristics such as eye color, hair color, gender,
nationality, or even lifestyle groups (e.g., singles, young married, retired).
• No mathematical relation of comparative magnitude exists between any two categories.

Example of Nominal Data

Eye Color   Number of Men   Percentage (%)
Blue         60              30
Brown        80              40
Green        30              15
Gray         20              10
Hazel        10               5
TOTAL       200             100
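Nominal categories have no order and no numerical value. The usual summaries are therefore counts, percentages, and the mode (the most frequent category). The sketch below uses the eye-color counts from the table above and `collections.Counter` from the standard library. The output layout is only illustrative.

```python
from collections import Counter

# Counts from the eye-color table above (200 men).
eye_color_counts = Counter({"Blue": 60, "Brown": 80, "Green": 30, "Gray": 20, "Hazel": 10})

total = sum(eye_color_counts.values())
for color, count in eye_color_counts.items():
    # Percentage of the total for each category
    print(f"{color:<6} {count:>4} {count / total:>6.0%}")
print(f"{'TOTAL':<6} {total:>4} {1:>6.0%}")

# For nominal data the mode is meaningful; an "average eye color" is not.
mode_color, mode_count = eye_color_counts.most_common(1)[0]
print(f"Mode: {mode_color} ({mode_count} men)")  # Mode: Brown (80 men)
```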

Examples of Survey Questionnaire Items That Use Nominal Data

a. What is the teacher's gender?
___________ Male
___________ Female

b. Describe your current employment status.
___________ Full time
___________ Part time
___________ Not currently employed
___________ Retired

c. Indicate your current marital status.
___________ Single
___________ Married
___________ Widowed
___________ Separated

2. Ordinal Scale of Measurement

• Ordinal – contains more information. It consists of distinct categories in which order is implied:
values in one category are larger or smaller than values in other categories (e.g., ratings of
excellent, good, fair, poor).
• An ordinal scale not only classifies subjects but also ranks them in terms of the degree to which
they possess a characteristic of interest.
• The ordinal scale assumes a relation of comparative magnitude (greater than, less than) among
the categories or scale points involved.
Example: Observing the "body language" of students attending a class lecture, an
observer may decide that student B shows greater interest in what is being said than
student A, and that student C shows greater interest than student B.
• A good example is a rating scale.
• It provides information about relative magnitude, but it does not provide information about the
degree to which observed entities differ from one another.

Example of Ordinal Data

Rail travelers might be asked to give their views on the quality of the MRT service
according to a scale of 1 – 5, where:
1 = very poor
2 = poor
3 = adequate
4 = good
5 = very good

Rating      Number of MRT Travelers   Percentage (%)
Very Poor    30                        10
Poor         50                        16.67
Adequate    100                        33.33
Good         80                        26.67
Very Good    40                        13.33
TOTAL       300                       100
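Ordinal data add order, so a median category becomes meaningful. Differences between categories still are not. The sketch below uses the MRT counts from the table above. The helper `median_category` is a hypothetical name introduced for illustration, and it assumes the categories are listed from the lowest to the highest rating.

```python
# Counts from the MRT rating table above, listed from lowest to highest rating.
ratings = [
    ("Very Poor", 30),
    ("Poor", 50),
    ("Adequate", 100),
    ("Good", 80),
    ("Very Good", 40),
]

total = sum(count for _, count in ratings)

# Percentage of travelers in each category
for label, count in ratings:
    print(f"{label:<10} {count:>4} {count / total * 100:>7.2f}")


def median_category(ordered_counts):
    """Return the category that contains the middle response.

    If the two middle responses fall in different categories,
    this simple sketch returns the higher one.
    """
    n = sum(count for _, count in ordered_counts)
    middle = (n + 1) / 2  # position of the median response
    cumulative = 0
    for label, count in ordered_counts:
        cumulative += count
        if cumulative >= middle:
            return label


print("Median rating:", median_category(ratings))  # Median rating: Adequate
```

Of the 300 responses, the 150th and 151st both fall within "Adequate": the cumulative counts are 30, 80, and then 180. Adequate is therefore the median rating.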

Examples of Survey Questionnaire Items That Use the Ordinal Scale

a. How much education have you completed?
_________ College graduate
_________ With some units in college
_________ High school graduate
_________ Never finished high school

b. What is your latest student evaluation?
_________ Outstanding
_________ Very satisfactory
_________ Satisfactory
_________ Poor

c. How often during the last month did you find yourself tardy?
_________ Always
_________ Very often
_________ Fairly often
_________ Sometimes
_________ Almost never

3. Interval Scale of Measurement

• Interval – a set of numerical measurements in which the distance between numbers is of a
known, constant size.
• An interval scale has all the characteristics of the nominal and ordinal scales, but in
addition, it is based upon predetermined equal intervals.
• Achievement tests, aptitude tests, and intelligence tests represent interval scales.
• When scores have equal intervals, it is assumed, for example, that the difference between a
score of 30 and a score of 40 is essentially the same as the difference between a score of 50
and a score of 60.
• This scale does not have a true zero point. Although a zero point is often used in interval scale
measurement, the designation does not mean the total absence of the thing measured.
• For example, suppose an IQ test produces scores ranging from 0 to 200. A score of 0 does not
indicate the absence of intelligence; it only indicates the lowest level of performance possible
on that particular test, and a score of 200 represents the highest level.
• Scores resulting from the administration of an interval scale can be added and subtracted but not
multiplied or divided. For instance, an achievement test score of 90 is 45 points higher than a
score of 45, but we cannot say that a person scoring 90 knows twice as much as a person scoring
45.
• Other examples of interval scales are measures of trait anxiety and level of satisfaction.

4. Ratio Scale of Measurement

• Ratio – consists of numerical measurements in which the distance between numbers is of a known,
constant size; in addition, there is a non-arbitrary zero point.
• A ratio scale represents the highest, most precise level of measurement.
• The ratio scale of measurement has a true zero point in addition to equal intervals
between its points.
• Weight, height, time, distance, and speed are examples of ratio scales.
• With a true zero point, we can say that a man who is 5'4" (64 inches) tall is about twice as tall as
a child who is 2'7" (31 inches) tall.
It is the most powerful of the four measurement scales.
• A statistic appropriate for a lower level of measurement may be applied to data representing a
higher level of measurement. A statistic appropriate for ordinal data may be used with interval data,
since interval data possess all the characteristics of ordinal data and more. The reverse, however,
is not true. A statistic appropriate for interval data cannot be applied to ordinal data, since such a
statistic requires equal intervals (Downie & Heath, 1983).
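The rule that a lower-level statistic may be used at a higher level, but not the reverse, can be expressed as a simple ordering of the four scales. The mapping below from each statistic to its lowest appropriate level is an illustrative assumption, not a list taken from this module. It follows the common textbook summary of Stevens' scales: the mode for nominal data, the median and percentiles for ordinal data, the mean and standard deviation for interval data, and the coefficient of variation for ratio data.

```python
# Levels of measurement in increasing order (Stevens, 1946).
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Lowest level at which each statistic is commonly considered appropriate.
# This mapping is an illustrative assumption, not an exhaustive list.
MINIMUM_LEVEL = {
    "mode": "nominal",
    "median": "ordinal",
    "percentile": "ordinal",
    "mean": "interval",
    "standard deviation": "interval",
    "coefficient of variation": "ratio",
}


def is_appropriate(statistic, level):
    """A statistic may be used at its minimum level or at any higher level."""
    return LEVELS.index(level) >= LEVELS.index(MINIMUM_LEVEL[statistic])


print(is_appropriate("median", "interval"))  # True: an ordinal statistic on interval data
print(is_appropriate("mean", "ordinal"))     # False: the mean requires equal intervals
```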

f. Data Collection and Sampling Techniques

A sample should have the same characteristics as the population it is representing.

Sampling can be:
• with replacement: a member of the population may be chosen more than once
(e.g., picking a candy from a bowl and returning it before the next pick)
• without replacement: a member of the population may be chosen only once (e.g., drawing lottery
tickets)
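Python's standard `random` module provides both kinds of draws. `choices` samples with replacement, and `sample` samples without replacement. The candy colors and the seed below are hypothetical.

```python
import random

rng = random.Random(2021)  # fixed seed so the example is repeatable
bowl = ["red", "blue", "green", "yellow", "orange"]  # hypothetical candies

# With replacement: each pick goes back into the bowl, so a color can repeat.
# Drawing 8 items from 5 guarantees at least one repeat.
with_replacement = rng.choices(bowl, k=8)

# Without replacement: a picked item is not returned, so each appears at most once.
without_replacement = rng.sample(bowl, k=3)

print("With replacement:   ", with_replacement)
print("Without replacement:", without_replacement)
```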

Sampling methods can be:

• probability or random sampling (each member of the population has a known, nonzero chance
of being selected; in simple random sampling, every member has an equal chance)
• non-probability or non-random sampling

The sampling process itself causes sampling errors. For example, the sample may not be
large enough, or it may not be representative of the population.

Factors not related to the sampling process cause non-sampling errors. A defective counting
device can cause a non-sampling error.

1. Random Sampling or Probability Sampling

a. Simple Random Sampling or Lottery Sampling - select the sample so that each member of the
population has an equal chance of being selected.

b. Systematic Random Sampling - select a starting point and then select every kth element
in the population.

c. Stratified Sampling - subdivide the population into subgroups (strata) whose members share the
same characteristic, then draw a sample from each stratum.

d. Cluster Sampling - divide the population into sections (or clusters), randomly select
some of those clusters, and choose all members of the selected clusters.
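The four probability sampling methods can be sketched with the standard `random` module. The population of 100 students split into four sections is hypothetical. For brevity, the same `section` field serves as the strata in stratified sampling and as the clusters in cluster sampling. In a real study, the strata and clusters would be chosen to fit the study design.

```python
import random

rng = random.Random(42)  # fixed seed so the example is repeatable
# Hypothetical population: 100 students, each assigned to section A, B, C, or D.
population = [{"id": i, "section": "ABCD"[i % 4]} for i in range(1, 101)]


def simple_random_sample(pop, n):
    # Every member has the same chance of being selected.
    return rng.sample(pop, n)


def systematic_sample(pop, n):
    # Interval k, a random start among the first k members, then every kth member.
    k = len(pop) // n
    start = rng.randrange(k)
    return pop[start::k][:n]


def stratified_sample(pop, key, n_per_stratum):
    # Group members who share a characteristic, then sample from each stratum.
    strata = {}
    for member in pop:
        strata.setdefault(member[key], []).append(member)
    return [m for group in strata.values() for m in rng.sample(group, n_per_stratum)]


def cluster_sample(pop, key, n_clusters):
    # Randomly choose whole clusters and keep every member of the chosen clusters.
    clusters = sorted({member[key] for member in pop})
    chosen = set(rng.sample(clusters, n_clusters))
    return [m for m in pop if m[key] in chosen]


srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, 10)
stratified = stratified_sample(population, "section", 3)
clustered = cluster_sample(population, "section", 2)
print(len(srs), len(systematic), len(stratified), len(clustered))  # 10 10 12 50
```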

2. Non-Random Sampling or Non-Probability Sampling

a. Convenience Sampling – use results that are readily available

b. Judgment Sampling - In this case, the person taking the sample has direct or indirect control
over which items are selected for the sample. An expert selects a representative sample according
to his own subjective judgment.

c. Quota Sampling - The main concern in quota sampling is to come up with the desired number
of samples no matter how they are selected. In this method, the decision maker requires the
sample to contain a certain number of items with a given characteristic. Many political polls are,
in part, quota sampling.

d. Volunteer Sampling – the sample consists essentially of volunteers.

e. Haphazard/Incidental Sampling – samples are selected purely by chance; that is, whoever is
available at the time and place the data is to be collected.

f. Purposive Sampling – The researcher selects those who can best help or give information
based on his own judgment. Subjects are not randomly selected.

The steps involved in sampling include:


1. Identify the target population
2. Identify the subject or respondent population
3. Specify the criteria for subject or respondent selection
4. Specify the sampling design
5. Recruit the subjects

Determining the Sample Size

Slovin's formula is a very general equation used when you know (or can estimate) the size of the
population but have no idea how the population behaves. The formula is:

$$n = \frac{N}{1 + N e^{2}}$$

where
n = sample size
N = population size
e = margin of error

Note that this is the least accurate formula and, as such, the least ideal. You should only use
this if circumstances prevent you from determining an appropriate standard deviation and/or
confidence level (thereby preventing you from determining your z-score, as well).

Example 1: Calculate the necessary survey size for a population of 240, allowing for a 4% margin
of error.

Solution: Given N = 240 and e = 0.04,

$$
\begin{aligned}
n &= \frac{N}{1 + N e^{2}} \\
  &= \frac{240}{1 + 240(0.04)^{2}} \\
  &= \frac{240}{1 + 240(0.0016)} \\
  &= \frac{240}{1 + 0.384} \\
  &= \frac{240}{1.384} \\
  &\approx 173.41 \quad \text{(final answer)}
\end{aligned}
$$
Example 2. From a population of 10,000 clients with tuberculosis, a researcher selected a
sample with a margin of error of 5%. What is the desired sample size to be considered for the
research?
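Slovin's formula translates directly into a one-line function. The sketch rounds the result up with `math.ceil`; this is an assumption about common practice. A sample must contain a whole number of respondents, and rounding down would give a slightly larger margin of error than the one specified. The function reproduces Example 1 and also works out Example 2.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula: n = N / (1 + N * e**2)."""
    return population_size / (1 + population_size * margin_of_error ** 2)


# Example 1: N = 240, e = 0.04
n1 = slovin_sample_size(240, 0.04)
print(f"{n1:.2f} -> {math.ceil(n1)} respondents")  # 173.41 -> 174 respondents

# Example 2: N = 10,000, e = 0.05
n2 = slovin_sample_size(10_000, 0.05)
print(f"{n2:.2f} -> {math.ceil(n2)} respondents")  # 384.62 -> 385 respondents
```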

g. Uses and Misuses of Statistics

Why study statistics?


1. Data are everywhere
2. Statistical techniques are used to make many decisions that affect our lives
3. No matter what your career, you will make professional decisions that involve data. An
understanding of statistical methods will help you make these decisions effectively

h. The Use of Computers and Calculators

Activity 1. Basic Concepts

A. Classify the variables as Qualitative or Quantitative


1. Nation of origin
2. Number of friends
3. Eye color
4. Grams of sugar in meal
5. Number of left turns you made while driving home today.
6. The value of a car
7. Your mobile phone number
8. Your student ID number
9. The number of media men and women killed in the last 3 years
10. The number of gold medals won by the countries that participated in the Tokyo 2020 Olympics

B. Classify the quantitative variables as discrete or continuous.


1. The distance from your house to school
2. The time to run a marathon
3. The number of questions you will get wrong on a multiple-choice test
4. The number of hours spent in an online class
5. The number of seats in a classroom
6. The amount of gas in the tank of a car
7. The number of COVID-19 Delta variant cases in NCR
8. The number of hours a doctor stays in the hospital
9. The amount of money spent on vaccines
10. The number of those who recovered from COVID-19

C. Determine the sample size.


1. A group of 1,000 city government employees needs to be surveyed to find out
which tools are best suited to their jobs. For this survey a margin of error of 0.05
is considered sufficiently accurate. Using Slovin’s formula, find the required
sample survey size.
2. Use Slovin's formula to find out what sample of a population of 1,000 people you need to
take for a survey on their soda preferences. Use a confidence level of 95%.

3. A retailer is interested in knowing how many of its customers bought an item after viewing
its website on a certain day. Given that the website has, on average, 10,000 views per day,
determine the sample size of customers that the retailer has to monitor at a 95%
confidence level with a 5% margin of error.

References

Downie & Heath. Basic Statistical Methods, Fifth Edition. Harper and Row, Publishers, Inc., 1983.

Porter, Theodore M. "Probability and statistics." Encyclopedia Britannica, 3 Feb. 2020,
https://www.britannica.com/science/probability. Accessed 10 August 2021.

https://www.brainkart.com/article/Origin-and-Growth-of-Statistics_35037/

https://statisticsbyjim.com/jim_frost/
