
Definition of Statistics

- Statistics is a scientific method of collecting, organizing, summarizing, presenting,
analyzing and interpreting data. Valid conclusions and reasonable decisions are
drawn on the basis of such analysis.
- Statistics (in plural form) refer to any set of quantitative data or classified numerical

Historical Overview of Statistics

The term statistics came from the Latin word “statista” which means state. Statistics in
the early days was widely used for the purposes of governing the state, such as figures on
the geographical areas conquered and the number of soldiers killed on the battlefield, and for
purposes of taxation.

Statistics developed as a science due to man’s propensity for gambling. Mathematicians
were consulted by gamblers to explain the laws of chance concerning the occurrences of events
in games of chance, which led to the early development of probability, on which statistics

Individuals who shaped statistics today:

- Achenwall
    The first to introduce the word “statistiks” in a preface to a statistical
- Zimmerman and Sinclair
    Introduced and popularized the name “statistics” in their books.
- Girolamo Cardano
    An Italian mathematician who wrote “Liber de Ludo Aleae”, where the
    first study of the principles of probability appeared.
- Blaise Pascal
    Worked on the “Game of Points” that marked the beginning of the
    mathematics of probability.
- De Moivre
    Discovered the equation for the normal distribution.
- Adolphe Quetelet
    A Belgian astronomer who applied the theory of probability to
    anthropology, psychology, and education.
- Francis Galton
    Developed the use of percentiles and worked with Charles Darwin in the
    application of statistics to heredity and correlation theory.
- Karl Pearson
    Worked with Galton to develop regression and correlation theory and
    sampling theory.
- Ronald Fisher
    Introduced the F test (Fisher’s test) used in the analysis of variance.

Two Phases of Statistics:

1. Descriptive Statistics seeks only to describe and analyze a given group without
drawing conclusions or inferences about a larger group.
2. Inferential Statistics seeks to draw conclusions or inferences about the larger
group based on a sample (a subset of the larger group).

Definition of Terms:

It is important to know some terminologies that will be used in the study of statistics.

1. Data and Information

Data is a set of observations, values and elements under investigation.
Information is data that has been collected and processed into meaningful form.

1.1 – Qualitative Data and Quantitative Data

Qualitative Data refers to categorical information or attributes that
can be classified by some criterion.
Quantitative Data refers to numerical information.

1.1.1 – Discrete Data and Continuous Data

Discrete Data are obtained by observing values of a discrete variable.
Continuous Data are obtained by observing values of a continuous variable.

2. Constant and Variable

Constant is an attribute that remains the same or does not vary.
Variable is a characteristic that varies from one person or thing to another.
2.1 – Qualitative Variable and Quantitative Variable
Qualitative Variable is a non-numerically valued variable.
Quantitative Variable is a numerically valued variable.

2.1.1 – Discrete Variable and Continuous Variable

Discrete Variable is a quantitative variable whose possible
values form a finite (or countably infinite) set of numbers.
Continuous Variable is a quantitative variable whose possible
values form some interval of numbers.

3. Population and Sample

Population is the collection of all individuals, objects, items, places, events or data under
consideration in a statistical study.
Sample is the portion or representative part of the population chosen for study.

4. Measurement
It is a process of assigning values or scores to persons or objects.

4.1 Nominal scale assigns numbers or other symbols to persons or objects to be used
mainly for identification and classification purposes.

e.g. gender, religion, socio-economic status.

4.2 Ordinal scale places measurements into categories, each category indicating a different
level of some attribute that is being measured. Categories can be ordered, but the
distance between categories is undetermined.

e.g. academic ranks, school ranks.

4.3 Interval scale is one in which the distance between any two different numbers on the scale is of
known size. It does not have a true (meaningful) zero point. A zero point is a
point that indicates the absence of what we are measuring.

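e.g. temperature in degrees Celsius or Fahrenheit. (Kelvin temperature and the results of counting and measurement have a true zero point and so belong to the ratio scale.)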


Sampling refers to the method of selecting a portion from the population under study.

Types of Sampling Techniques

1. Probability sampling allows every unit of the population the chance of being included
in the study.

1.1 Simple random sampling is the process of selecting a sample giving each
sampling unit an equal chance of being included in
the sample. This is the most commonly used method
and basic to all sampling designs. This is the most
suitable method for homogeneous groups.
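
Simple random sampling can be sketched in a few lines of code. This is only an illustration: the population of units numbered 1 to 50, the sample size of 10, and the function name `simple_random_sample` are assumptions made for the example, and a seeded random number generator stands in for the table of random numbers.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Select n units without replacement, each unit equally likely."""
    rng = random.Random(seed)      # the seed only makes the draw repeatable
    return rng.sample(list(units), n)

# Hypothetical population: units numbered 1 to 50
population = range(1, 51)
sample = simple_random_sample(population, 10, seed=1)
print(sorted(sample))
```

Because the draw is made without replacement, no unit can appear in the sample twice.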

1.2 Systematic sampling with a random start is a method of selecting a sample
by taking every k-th unit from the ordered population,
the first unit being selected at random. k is called the
sampling interval.

i. Number the units of the population consecutively from 1 to N.
ii. Determine the sampling interval (k) by the formula:
k = N/n where N = population size
n = sample size
iii. Use the table of random numbers to choose r. r is the first unit of
the sample. The formula for obtaining the sample size (n) is
Slovin’s Formula:
n = N / (1 + N e^2) where n = sample size
N = population size
e = margin of error
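
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the sample size from Slovin's formula is rounded up to a whole unit, the sampling interval k is taken as the integer part of N/n, the random start r is drawn between 1 and k, and a seeded generator replaces the table of random numbers. The function names and the population of 1000 units are hypothetical.

```python
import math
import random

def slovin_sample_size(N, e):
    """Slovin's formula n = N / (1 + N * e**2), rounded up (an assumption)."""
    return math.ceil(N / (1 + N * e ** 2))

def systematic_sample(units, n, seed=None):
    """Take every k-th unit after a random start r, with k = N // n."""
    units = list(units)
    N = len(units)
    k = N // n                     # sampling interval (integer part of N/n)
    rng = random.Random(seed)
    r = rng.randint(1, k)          # random start, assumed to lie in 1..k
    # 1-based positions r, r + k, r + 2k, ... until n units are taken
    return [units[r - 1 + i * k] for i in range(n)]

N = 1000                           # hypothetical population size
n = slovin_sample_size(N, 0.05)    # 286 for N = 1000 and e = 0.05
sample = systematic_sample(range(1, N + 1), n, seed=7)
print(n, sample[:5])
```

When N/n is not a whole number, as here, taking the integer part means the last few units of the population can never be reached; other conventions for handling the remainder exist.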

1.3 Stratified sampling is used if the population is made up of groups or items
which are heterogeneous with respect to the characteristics
under study. The population should be classified or
stratified into a number of more or less homogeneous
subpopulations or strata before sampling is done.
Stratified random sampling consists of selecting a simple random
sample from each of the subpopulations into which the
population has been classified.
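
A short sketch of stratified random sampling follows. The strata (year levels), the number of units taken from each stratum, and the function name `stratified_sample` are all hypothetical; the notes do not say how the sample should be allocated among strata, so the sizes are simply supplied by the user.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample from each stratum.

    strata: dict mapping stratum name -> list of units
    sizes:  dict mapping stratum name -> number of units to draw
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name])
            for name, units in strata.items()}

# Hypothetical strata: students numbered 1..100, grouped by year level
strata = {
    "first year": list(range(1, 41)),
    "second year": list(range(41, 71)),
    "third year": list(range(71, 101)),
}
sizes = {"first year": 4, "second year": 3, "third year": 3}
sample = stratified_sample(strata, sizes, seed=3)
for name, units in sample.items():
    print(name, sorted(units))
```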

1.4 Cluster sampling is a method of selecting a sample of distinct groups or
clusters of smaller units called elements. The sample clusters may
be chosen by random sampling or by systematic sampling with a
random start.
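
As a rough sketch, cluster sampling selects whole groups rather than individual elements. The example assumes the clusters are chosen by simple random sampling and that every element of a chosen cluster enters the sample (one-stage cluster sampling). The cluster names, elements, and function name are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly select m whole clusters; all their elements are included."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # sample the cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters (e.g. city blocks) and their elements (households)
clusters = {
    "Block 1": ["a1", "a2", "a3"],
    "Block 2": ["b1", "b2"],
    "Block 3": ["c1", "c2", "c3", "c4"],
    "Block 4": ["d1", "d2"],
}
chosen = cluster_sample(clusters, 2, seed=5)
print(chosen)
```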

1.5 Multi-stage sampling is done in stages. The selection of the sample is
accomplished in two or more stages. The population is first
divided into a number of first-stage or primary units, from which a
sample is drawn. Within the sampled first-stage units, a sample
of second-stage or secondary units is drawn.
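
A two-stage version of this procedure might look like the sketch below. It assumes simple random sampling at both stages and the same number of secondary units from each selected primary unit; the town names, household labels, and function name are hypothetical.

```python
import random

def two_stage_sample(primary_units, n_primary, n_secondary, seed=None):
    """Stage 1: sample primary units. Stage 2: sample secondary units
    within each selected primary unit."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(primary_units), n_primary)
    return {p: rng.sample(primary_units[p], n_secondary) for p in selected}

# Hypothetical primary units (towns) with secondary units (households)
towns = {
    "Town A": ["A1", "A2", "A3", "A4", "A5"],
    "Town B": ["B1", "B2", "B3", "B4", "B5", "B6"],
    "Town C": ["C1", "C2", "C3", "C4", "C5"],
}
sample = two_stage_sample(towns, n_primary=2, n_secondary=2, seed=11)
print(sample)
```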

2. Non-probability sampling selects the sample in such a way that not all the units of the
population are given the chance of being selected; some have no chance at all.

2.1 Purposive sampling selects the sample based on the pre-selected


2.2 Quota sampling chooses the sample based on the required number or
percentage of the population, the selection of which is not based
on randomization.

2.3 Convenience sampling selects the sample that can be easily picked and made
part of the group since the population is infinite.

*** Non-random samples can be described but cannot be used for
making conclusions or inferences.

Gathered available facts/data from published or unpublished sources should be
accurate, timely, complete, and relevant to the problem.

Sources of Data

1. Primary data are obtained from published or unpublished materials by the
researchers themselves. These are gathered from an original source or are
based on first-hand experience.

e.g. diaries, autobiographies and first person accounts

2. Secondary data are obtained from existing documents or published or unpublished
reports by people or organizations other than the original collector.

e.g. newspapers, magazines, biographies, published books

Methods of Data Collection

1. Survey Method
Data is obtained by asking people either directly (interview) or indirectly
(questionnaire) through the use of a schedule (a set of questions).

1.1 Interview is a person-to-person exchange of data between the one supplying the data
(interviewee) and the one soliciting the data (interviewer) that is most
appropriate for revealing data on complex, emotionally laden topics or
sentiments underlying an expressed opinion.

e.g. focused interview, clinical interview, non-directive interview

1.1.1 Facilitates the clarification of some questions and answers.

1.1.2 Allows the observation of the interviewee’s reactions and facial expressions in response to
some of the questions.
1.1.3 Interviewer may deliberately or unintentionally influence the interviewee’s

1.2 Questionnaire elicits responses by way of a set of questions that is usually
mailed (by postal or electronic mail).
1.2.1 Confidential data are usually collected by questionnaire
1.2.2 Respondent can accomplish the questionnaire at his most convenient time.
1.2.3 Covers wide geographical area.

Types of Questions:

1. Fixed alternative questions limit responses to stated alternatives. It is very easy to


e.g. Do you want to study abroad?

O Yes O No

2. Open-ended questions permit free response by merely raising the issue without
providing any instruction for the respondent’s reply.

e.g. How do you describe your school?

Characteristics of a Good Questionnaire:

1. Questions must be simple and clear in order to obtain accurate information. Good
questions result in a greater degree of precision. A question like “How much do you
drink?” is not clear to respondents because it may have several meanings.

2. Questions must be objective. Consider a question like “Why do you like to study in UST?”
Such a question must be phrased in such a way as not to put the answer into the subject’s

3. Questions must always state the precise units in order to facilitate the presentation
of data.

4. Questions must, as much as possible, be fixed alternative.

5. Questions must be organized in a logical manner.

6. Questions must include the essential information only.

2. Observation Method
Data pertaining to behaviors of an individual or a group of individuals during the
occurrence of a particular event/situations are best obtained through observation. This method
is limited to the time of occurrence of the event.

Types of Observations
2.1 Participant observation: the observer joins the group as a participating member,
actively or passively.
2.2 Non-participant observation: the observer stays outside of the group, whether his
presence is known or unknown.

3. Experimental Method
A method designed for collecting data under controlled conditions that usually
establishes a causal relationship.

4. Use of Records Method

The data are obtained through registration, such as of births, deaths and cars, as required by some
laws, ordinances or policies.


Frequency distribution is the method of organizing and summarizing statistical data in
tabular form.

Classes are the categories for grouping data.

Class frequency is the number of observations falling under a class.

Class limits are the end numbers of the classes.

e.g. 110-115
110 is the lower class limit (lcl)
115 is the upper class limit (ucl)

Class boundaries are the true or real class limits.

e.g. 110-115
Note: for a discrete variable, subtract 0.5 from the lower class limit and add 0.5 to the upper class limit.
For a continuous variable, the adjustment depends on the number of decimal places.

Class size is also called the size of the class interval. It is obtained by getting the difference
between successive upper/lower class limits/boundaries.
Class mark is the midpoint of the class interval.

CM = (ucl + lcl) / 2
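
The definitions above can be checked on the class 110-115 with a short sketch. It assumes whole-number data, so half a unit (0.5) is subtracted from the lower class limit and added to the upper class limit; the `unit` parameter and the function names are illustrative, not part of the notes.

```python
def class_boundaries(lcl, ucl, unit=1):
    """True class limits: move each limit outward by half a unit."""
    half = unit / 2
    return lcl - half, ucl + half

def class_mark(lcl, ucl):
    """Midpoint of the class interval: CM = (ucl + lcl) / 2."""
    return (ucl + lcl) / 2

lower, upper = class_boundaries(110, 115)
print(lower, upper)           # 109.5 115.5
print(upper - lower)          # class size: 6.0
print(class_mark(110, 115))   # 112.5
```

For data recorded to one decimal place, `unit=0.1` would be the corresponding choice under the same assumption.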

Steps in constructing a Frequency Distribution

1. Determine the range, which is the difference between the highest and lowest values.

2. Determine the adequate number of class intervals.

a. The number of classes should not be smaller than 6 nor greater than 16 (6 ≤ K ≤ 16):
not so many that many classes are empty, and not so few that observations are
lumped together and too much information is lost.
b. Each observation should fall into one and only one class interval. Sturges’ approximation is
only a guideline, not an inflexible rule.
K = no. of classes (approximate)
K = 1 + 3.322 log n where n = number of observations (log to base 10)

3. Determine the size of the class interval (sci)

sci = R/k round off sci to the nearest odd number depending on the number of

4. Determine a number less than or equal to the lowest score divisible by the size of the class

5. List all class limits and class boundaries.

6. Tally the frequency for each class.

7. Get the sum of the frequency column and check against the total number of observations.
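
The seven steps can be sketched in code as below. This is one possible reading, not a definitive procedure: the scores are hypothetical whole numbers, Sturges' approximation is written with the usual constant 3.322 and base-10 logarithm, the number of classes is kept within 6 to 16, and the class size is simply rounded up to a whole number because the rounding rule in step 3 is incomplete in these notes.

```python
import math

def frequency_distribution(data, unit=1):
    """Return rows of (lcl, ucl, lower boundary, upper boundary, frequency)."""
    n = len(data)
    low, high = min(data), max(data)
    R = high - low                                  # step 1: range
    K = round(1 + 3.322 * math.log10(n))            # step 2: Sturges' approximation
    K = min(16, max(6, K))                          # keep K within 6..16
    size = max(1, math.ceil(R / K))                 # step 3: class size (rounded up)
    lcl = (low // size) * size                      # step 4: multiple of size <= lowest score
    table = []
    while lcl <= high:                              # steps 5 and 6
        ucl = lcl + size - unit
        freq = sum(1 for x in data if lcl <= x <= ucl)
        table.append((lcl, ucl, lcl - unit / 2, ucl + unit / 2, freq))
        lcl += size
    assert sum(row[4] for row in table) == n        # step 7: check the total
    return table

# Hypothetical scores of 20 students
scores = [52, 55, 58, 61, 63, 64, 67, 68, 70, 71,
          73, 74, 75, 77, 78, 80, 82, 85, 88, 93]
for row in frequency_distribution(scores):
    print(row)
```

Because the class limits do not overlap, each score is tallied in one and only one class.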

Data must be presented in the most understandable form that shows significant

Three Ways of Data Presentation

1. Textual Form is summarizing the data in paragraph form. It is the simplest and the
most appropriate approach when there are only a few numbers to
be presented. When a large amount of quantitative data is included in the
text or paragraph, the presentation becomes almost

2. Tabular Form is arranging and presenting data in rows and columns so that the
reader may easily compare and analyze. This method facilitates
the comparison of various figures under the different categories.

3. Graphical Form is presenting values or relationships in pictorial form. Charts or
graphs are extremely useful and effective in quickly presenting
large amounts of information. They are often more effective and
attractive than other methods of data presentation.

3.1 Bar Graph. This consists of bars or heavy lines of equal width, either all
vertical or all horizontal. The length of the bars represents the
magnitude of the quantities being compared.

3.2 Line Graph. This graph shows the relationship between two or more sets
of quantities and is usually used to highlight the effect of time in a
given data.

3.3 Pie Chart. This is used to represent quantities that make up a whole. The
diagram is a circle cut into sectors, with the size
of each sector indicating the proportion of each component.

Definition of Terms:

- Experiment is the process by which an observation or measurement is obtained.
- Event is the outcome of an experiment. It is a collection of one or more simple events.
- Sample space is the set of all possible outcomes of an experiment.
- Probability is the tool that allows the statistician to use sample information to make
inferences about or describe the population from which the sample was drawn. The
probability of an event is the numerical measure of the likelihood or degree of
predictability that the event will occur.

The Empirical Probability of an Event A is defined as

P(A) = nA / n = (number of times A occurred) / (number of trials run)
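
As a small illustration of the definition, the empirical probability is the count of trials in which A occurred divided by the number of trials run. The record of ten coin tosses and the function name are hypothetical.

```python
from fractions import Fraction

def empirical_probability(outcomes, event):
    """P(A) = nA / n, where event(o) is True when A occurred on a trial."""
    n = len(outcomes)                                # number of trials run
    n_A = sum(1 for o in outcomes if event(o))       # number of times A occurred
    return Fraction(n_A, n)

# Hypothetical record of 10 coin tosses; A = "the toss shows heads"
tosses = list("HTHHTHTTHH")
p_heads = empirical_probability(tosses, lambda o: o == "H")
print(p_heads, float(p_heads))   # 3/5 0.6
```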


1. For every event A, 0 ≤ P(A) ≤ 1; that is, the probability of any event is a real number between 0
and 1 inclusive.
2. P(S) = 1 and P(Ø) = 0
3. If A1, A2, A3, ..., An are mutually exclusive events (mutually disjoint sets), then:

P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + P(A3) + ... + P(An)

This method is easy to employ when the sample space S… that are equally likely
or equiprobable.

A researcher studied the relationship between the salary of a working woman with
school-aged children and the number of children she had.

                 2 or fewer children    More than 2 children
High Salary             .13                     .02
Medium Salary           .20                     .10
Low Salary              .30                     .25

Let A denote the event that a working woman has 2 or fewer children.
Let B denote the event that a working woman has a low salary.

1. What is P(A)? = ______________________________

2. What is P(A ∪ B)? = ____________________________________

3. What is P(A ∩ B)? = _________________________________

4. Find P(B | A) = _________________________________
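
One way to work the four questions is sketched below, using the joint probabilities from the table, where A is "2 or fewer children" and B is "low salary". The sketch relies on the standard addition rule P(A ∪ B) = P(A) + P(B) - P(A ∩ B) and the definition of conditional probability P(B | A) = P(A ∩ B) / P(A), neither of which is stated in these notes. Exact fractions are used to avoid rounding, and the dictionary layout is only an illustrative choice.

```python
from fractions import Fraction

# Joint probabilities from the table: (salary, number of children) -> probability
joint = {
    ("high", "2 or fewer"): Fraction(13, 100), ("high", "more than 2"): Fraction(2, 100),
    ("medium", "2 or fewer"): Fraction(20, 100), ("medium", "more than 2"): Fraction(10, 100),
    ("low", "2 or fewer"): Fraction(30, 100), ("low", "more than 2"): Fraction(25, 100),
}

# A: 2 or fewer children; B: low salary
P_A = sum(p for (salary, kids), p in joint.items() if kids == "2 or fewer")
P_B = sum(p for (salary, kids), p in joint.items() if salary == "low")
P_A_and_B = joint[("low", "2 or fewer")]
P_A_or_B = P_A + P_B - P_A_and_B          # addition rule
P_B_given_A = P_A_and_B / P_A             # conditional probability

print(P_A)          # 63/100
print(P_A_or_B)     # 22/25
print(P_A_and_B)    # 3/10
print(P_B_given_A)  # 10/21
```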


A permutation is an arrangement of objects in a definite order.

nPr = n! / (n - r)!

A combination is a selection of objects without regard to order.

nCr = n! / [r! (n - r)!]
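
Both counts can be computed directly from factorials, using the standard formulas nPr = n! / (n - r)! and nCr = n! / [r! (n - r)!]. The function names and the values n = 5, r = 3 are chosen only for illustration.

```python
import math

def permutations(n, r):
    """Number of ordered arrangements of r objects chosen from n."""
    return math.factorial(n) // math.factorial(n - r)

def combinations(n, r):
    """Number of selections of r objects from n, ignoring order."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(permutations(5, 3))   # 60
print(combinations(5, 3))   # 10
```

Each combination of 3 objects can be arranged in 3! = 6 orders, which is why the permutation count is six times the combination count here.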

Prepared By: Doxa Dave Rotap
B.S. Microbiology 2013
