
Basic Concepts in Statistics

By: Judy Ann I. Caminoc
Statistics is a field of mathematics that deals with the
Collection, Organization, Analysis, and Interpretation of
quantitative data.

Collection of data is the process of gathering relevant
information from the population. Organization of data refers
to the systematic arrangement of data into tables, graphs, or
charts so that logical and statistical conclusions can easily
be derived from the collected information. Analysis of data
refers to the process of deducing relevant information from
the given data so that a numerical description can be
formulated. Interpretation of data is about deriving
conclusions from the data that have been analyzed. It also
involves making predictions or forecasts about large groups
based on data gathered from small groups.
Collect: survey, test, interview, observe, experiment, register
Organize: tables, graphs, texts
Analyze: numerical analysis (“most”, “how many percent”, “least”)
Interpret: give the meaning/implication of the findings, conclude
Two fields of statistics
1. Descriptive Statistics consists of the collection,
organization, summarization, and presentation of data.
- Here, the statistician tries to describe a given
2. Inferential Statistics is another area of Statistics
concerned with drawing conclusions about a large
group of data, called the population, based on selected
elements of that population, known as a sample.
- It involves the analysis and interpretation of data.
- Here, the statistician tries to make inferences from
samples to the population. This area also makes use of the
concept of probability.
Measures of central tendency

A measure of central tendency, or measure of central
location, is a summary measure that describes a whole
set of data with a single quantity representing the
middle or center of its distribution, that is, the value
around which the data cluster. In short, it is a measure
that tells where the center of a data set is located.
The most commonly used measures of central
tendency are the mean, median, and mode.

The mean, also called the “average” or
“arithmetic average”, is the most commonly used
measure of central tendency. It is said to be the most
reliable measure of central tendency and to have the least
probable error, but it does not supply information about
the homogeneity of the distribution.
Ungrouped data

A.  Simple Mean

Getting the simple mean means that we are giving
equal weight to each value in the data set.
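To compute the mean of ungrouped data, we use the
formula $\bar{x} = \frac{\sum x}{n}$, where $x$ is each value
in the data set and $n$ is the number of values.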
1. The ages of five contestants in a Statistics Quiz Bee are
the following:
18, 17, 18, 19, and 18. Find their average age.
2. Six employees are working as call center agents. Their
salaries are as follows:
23,500, 24,300, 25,800, 23,900, 24,100, and 24,950. What
is the average salary of the employees?
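The two exercises above can be checked with a short Python sketch. The function name `simple_mean` and the variable names are illustrative choices, not part of the original slides.

```python
def simple_mean(values):
    """Return the simple (unweighted) mean: sum of the values divided by n."""
    if not values:
        raise ValueError("the data set must contain at least one value")
    return sum(values) / len(values)


# Exercise 1: ages of the five contestants
ages = [18, 17, 18, 19, 18]
average_age = simple_mean(ages)

# Exercise 2: salaries of the six call center agents
salaries = [23500, 24300, 25800, 23900, 24100, 24950]
average_salary = simple_mean(salaries)

print(average_age)
print(average_salary)
```

Every value enters the sum exactly once, which is what "equal weight" means here.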
B.  Weighted Mean
Weighted mean is a mean calculated by giving values in a
data set more influence according to some attribute of the
data. It is an average in which each quantity to be averaged
is assigned a weight, and these weights determine the
relative importance of each quantity in the average.
A weight is equivalent to having that many like items
with the same value involved in the average.
The formula for the weighted mean is
$\bar{x}_w = \frac{\sum wx}{\sum w}$, where $w$ is the weight
of each value and $x$ is the matching value.
1. Xandra bought different fruits for New Year. She bought 3
apples at 10 each, 5 ponkans at 5 each, 3 pears at 15
each, and 4 pieces of chicho at 25 each. What is the average
price per fruit that Xandra bought?
2. At MJR fitness and health society, 60% of the members
are women and 40% are men. What is the average age of
all members if the average age of the women is 35 and the
average age of the men is 30?
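A minimal Python sketch of the weighted mean, applied to the two exercises above. The function name `weighted_mean` is an illustrative choice. In the second exercise the membership percentages (60 and 40) are used directly as weights, which assumes they describe the whole membership.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for matching values x and weights w."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Exercise 1: the price of each fruit, weighted by the number of pieces bought
prices = [10, 5, 15, 25]
pieces = [3, 5, 3, 4]
average_price = weighted_mean(prices, pieces)

# Exercise 2: average ages, weighted by the share of members (in percent)
group_ages = [35, 30]
group_shares = [60, 40]
average_member_age = weighted_mean(group_ages, group_shares)

print(round(average_price, 2))
print(average_member_age)
```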
Grouped data

One method that can be used to find the mean of grouped
data is the class mark or midpoint method.
Class Mark or Midpoint Method
In this method, the class mark of each interval has to be
known, and it is then multiplied by the corresponding
frequency of that class interval. The formula for the mean
using this method is
$\bar{x} = \frac{\sum fX}{n}$
where $f$ = frequency
$X$ = class mark
$n$ = total number of observations
1. Consider the frequency distribution below:

CI      f
75-79   5
70-74   7
65-69   8
60-64   10
55-59   8
50-54   9
45-49   5
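The class mark method can be sketched in Python as follows. This sketch treats the second column of the table as the frequency of each class interval (an assumption, since the values are not cumulative) and takes the class mark as the midpoint of the lower and upper class limits. The function name `grouped_mean` is illustrative.

```python
def grouped_mean(classes):
    """Mean of grouped data by the class mark (midpoint) method.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    # Multiply each class mark by its frequency, then add the products.
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fx / n


distribution = [
    (75, 79, 5),
    (70, 74, 7),
    (65, 69, 8),
    (60, 64, 10),
    (55, 59, 8),
    (50, 54, 9),
    (45, 49, 5),
]

total_observations = sum(f for _, _, f in distribution)
mean_value = grouped_mean(distribution)
print(total_observations, round(mean_value, 2))
```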
The median is defined as the middle value/observation in an
ordered list of numbers; it falls in the middle-most position
of the whole data set.

The median of ungrouped data is determined by first arranging
the numbers in order from lowest to highest or vice versa. If
there is an odd number of values, the median is the
middle-most value, with the same number of values below and
above it. If there is an even number of values in the list,
the middle pair must be identified, added together, and
divided by two to find the median. The median can serve as an
appropriate average because it is not pulled toward extreme
values.
1. A college professor at a certain university assigns Statistics
practice problems to be worked on online. Students must use a secret
code to access the problems, and the log-in and log-out times are
automatically recorded for the professor. At the end of the week, the
professor examines the amount of time each student spent solving
the assigned problems. Find the median. The data are provided below
in minutes.
15 28 25 48 22 34 39 44 43 49 34 22 33 27 25 22 30
2. The typing speeds per minute of ten stenographers are as follows:
Stenographer   1   2   3   4   5   6   7   8   9  10
Speed        121 110 120 119 112 121 118 115 107 115
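The procedure for the median of ungrouped data can be sketched in Python as follows. The function name `median_ungrouped` is an illustrative choice; the data are taken from the two exercises above.

```python
def median_ungrouped(values):
    """Sort the values, then take the middle one (odd count)
    or the average of the middle pair (even count)."""
    if not values:
        raise ValueError("the data set must contain at least one value")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Exercise 1: minutes spent by 17 students (odd number of values)
minutes = [15, 28, 25, 48, 22, 34, 39, 44, 43, 49, 34, 22, 33, 27, 25, 22, 30]
median_minutes = median_ungrouped(minutes)

# Exercise 2: typing speeds of 10 stenographers (even number of values)
speeds = [121, 110, 120, 119, 112, 121, 118, 115, 107, 115]
median_speed = median_ungrouped(speeds)

print(median_minutes)
print(median_speed)
```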
Grouped data

The formula for the median of grouped data is as follows:

$\text{Median} = L + \left(\frac{\frac{n}{2} - cf}{f}\right) i$

where: $L$ = exact lower class boundary of the median class
$n$ = total number of observations
$cf$ = less-than cumulative frequency below the median class
$i$ = class size
$f$ = frequency of the median class

a. Compute the less-than cumulative frequency ($<cf$) of the data.
b. Determine the median class by computing the value of $\frac{n}{2}$.
c. Locate the computed value of $\frac{n}{2}$ in the $<cf$ column (it
must fall within one of the $<cf$ values). The interval corresponding
to this value is the median class.
d. Look at the frequency $f$ corresponding to the median class. Then
get the $cf$ before the median class.
e. Subtract the $cf$ from $\frac{n}{2}$.
f. Divide the answer in step e by the frequency of the median class.
g. Multiply the answer in step f by the value of $i$. To determine the
value of $i$, subtract the lower limit from the upper limit in any of
the class intervals, then add one.
h. Add the answer in step g to the exact lower limit ($L$) of the
median class. The answer in this step is the median value of the
data set.
1. The record of 21 people in a 100m race is summarized in
the given frequency table:

Time (in seconds)   Frequency
51-55               2
56-60               7
61-65               8
66-70               4
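Steps a to h can be sketched in Python for the race data above. Two assumptions are made here: the class limits are whole numbers, so the exact lower boundary is taken as the lower limit minus 0.5, and the classes are listed from lowest to highest. The function name `grouped_median` is illustrative.

```python
def grouped_median(classes):
    """Median of grouped data.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples
    in ascending order, with whole-number class limits.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cf_before = 0  # less-than cumulative frequency below the current class
    for lower, upper, f in classes:
        if cf_before + f >= half:
            exact_lower = lower - 0.5      # exact lower boundary (assumed)
            class_size = upper - lower + 1  # step g: upper - lower + 1
            return exact_lower + (half - cf_before) / f * class_size
        cf_before += f
    raise ValueError("median class not found")


race_times = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
median_time = grouped_median(race_times)
print(median_time)
```

For this table, n/2 = 10.5 falls in the class 61-65, whose cumulative frequency reaches 17, so that class is the median class and the cumulative frequency before it is 9.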
The mode is the number/value/observation in a data set that
appears the most number of times. If no number in the list is
repeated, then there is no mode for the list. However, it is
also possible to have more than one mode for the same
distribution of data (bi-modal, tri-modal, or multi-modal).

To find the mode of ungrouped data, find the frequency
of each number/value/observation in the given data set.
Then, choose the number/value/observation having the
highest frequency as the mode.
MODE = number/value/observation with the highest frequency
1. The typing speeds per minute of ten stenographers are as
follows:
Stenographer   1   2   3   4   5   6   7   8   9  10
Speed        121 110 120 119 112 121 118 115 107 115
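A minimal Python sketch of finding the mode of ungrouped data, applied to the stenographer speeds above. The function name `modes` is illustrative. It returns every value tied for the highest frequency, and an empty list when no value repeats.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, in ascending order.

    An empty list means no value is repeated, so there is no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


speeds = [121, 110, 120, 119, 112, 121, 118, 115, 107, 115]
speed_modes = modes(speeds)
print(speed_modes)
```

Both 115 and 121 appear twice, so this data set is bi-modal.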
Grouped data

To find the mode of grouped data, we use the formula

$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) i$

where: the modal class is the class with the highest frequency
$L$ = exact lower limit of the modal class
$d_1$ = difference between the frequency of the modal
class and the frequency above it
$d_2$ = difference between the frequency of the modal class
and the frequency below it
$i$ = class size
Steps:

a. Identify the modal class by determining the interval with
the highest frequency.
b. Determine the exact lower limit ($L$) of the modal class.
c. Calculate $d_1$ and $d_2$.
d. Determine the value of $i$ by subtracting the lower limit
from the upper limit in any of the class intervals, then add
one.
e. Substitute the values in the formula.
1. The record of 21 people in a 100m race is summarized in
the given frequency table:

Time (in seconds)   Frequency
51-55               2
56-60               7
61-65               8
66-70               4
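The steps for the mode of grouped data can be sketched in Python for the race data above. Several assumptions are made: the classes are listed from lowest to highest, the "frequency above" the modal class is read as the frequency of the row just before it in this table, the "frequency below" as the row just after it, a missing neighbor counts as frequency 0, and the exact lower limit is the lower class limit minus 0.5. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(classes):
    """Mode of grouped data.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples
    in ascending order, with whole-number class limits.
    """
    frequencies = [f for _, _, f in classes]
    m = frequencies.index(max(frequencies))  # first class with the highest frequency
    lower, upper, f_modal = classes[m]
    f_before = frequencies[m - 1] if m > 0 else 0
    f_after = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    if d1 + d2 == 0:
        raise ValueError("the formula is undefined when d1 + d2 is zero")
    exact_lower = lower - 0.5       # exact lower limit (assumed)
    class_size = upper - lower + 1  # step d: upper - lower + 1
    return exact_lower + d1 / (d1 + d2) * class_size


race_times = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
mode_time = grouped_mode(race_times)
print(mode_time)
```

Here the modal class is 61-65 with frequency 8, so the two differences are 8 - 7 = 1 and 8 - 4 = 4, and the class size is 5.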
- Sampling is concerned with the selection of a subset of
individuals from within a statistical population to estimate
characteristics of the whole population.

- It is the process or method of drawing a definite number
of individuals, cases, or observations from a particular
universe; selecting part of a total group for investigation.
(Mildred Parton)
- A sampling method is a procedure for selecting sample members
from a population.
- Probability sampling is a sampling technique in which samples
from a larger population are chosen using a method based on the
theory of probability.
- Random sampling is the process of selecting a sample in such a
way that all individuals in the defined population have an equal
and independent chance of being selected for the sample.

1. Simple Random Sampling

2. Systematic Random Sampling
3. Stratified Random Sampling
4. Cluster Sampling
5. Multi-stage Sampling
Simple random sampling

- Simple random sampling is the most basic type of probability
sampling for selecting a specified number of units from a
population. It is a procedure in which every possible sample of a
certain size within a population has a known and equal
probability of being chosen as the study sample.
Example: Imagine that a researcher wants to understand more
about the career goals of instructors at a single university. Let's
say that the university has roughly 350 instructors. These 350
instructors are our population (N). Each of the 350 instructors is
known as a unit. To select a sample (n) of instructors from
this population of 350 instructors, we could choose to use simple
random sampling.

To create a simple random sample, there are six steps: (a)
identify or define the population; (b) determine the desired
sample size; (c) list the population; (d) assign numbers to the
units; (e) find random numbers; and (f) select your sample.
1. Step One: Identify the population.
For example, let us define the population as all 350 instructors in a
single university.

2. Step Two: Determine the desired sample size.

Out of this population let’s imagine that we choose a sample size of
126 instructors. The sample is expressed as n.

3. Step Three: List the Population.

Prepare a list of the members of the given population and number
the members consecutively from zero up to the last member. For our
example, the names of the instructors comprising the population will be
numbered 000, 001, 002, etc., up to 349.
4. Step Four: Assign numbers to the units.

We will use a table of random numbers and point at any number
(without looking at the table). For this number, look only at the
appropriate number of digits.

Since the population in our example is numbered from 000 to 349
(three-digit numbers), we look only at the last three digits of the first
number we point at. If this number corresponds to the number assigned
to any of the individuals in the population, then that individual is in
the sample. For example, if the number is 127, then the member
assigned 127 will be in the sample. If the number selected is 427, it is
ignored because the highest number in the population list is 349.
5. Step Five: Find Random numbers.

From this number, you can read the table vertically (upward or
downward), horizontally (leftward or rightward), or diagonally.
Record those numbers which are in the population list and discard
those which are not. Continuing with our example, we record the
numbers between 000 and 349 and discard any that appear
a second or subsequent time. We stop selecting random
numbers when we have 126 such numbers.

6. Step Six: Select your sample.

Finally, we select which of the 350 instructors will be invited to
take part in the research.
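The six steps can be imitated in Python. This is only a sketch: a pseudo-random number generator stands in for the printed table of random numbers, and the function name `simple_random_sample` and the seed value are illustrative assumptions. As in the steps above, readings outside the population list and repeated readings are discarded.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct unit numbers from 0 .. population_size - 1."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)
    # Number of digits to read, e.g. 3 digits for units numbered 000 to 349.
    digits = len(str(population_size - 1))
    selected = []
    seen = set()
    while len(selected) < sample_size:
        reading = rng.randrange(10 ** digits)  # stands in for one table reading
        if reading >= population_size:
            continue  # not in the population list (e.g. 427): discard
        if reading in seen:
            continue  # appeared before: discard
        seen.add(reading)
        selected.append(reading)
    return selected


# 350 instructors numbered 000 to 349, desired sample size n = 126
sample_ids = simple_random_sample(350, 126, seed=2024)
print(len(sample_ids))
```

In everyday Python code, `random.sample(range(350), 126)` draws such a sample directly; the loop above is written out only to mirror the table-reading procedure.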
2. Systematic Random Sampling
- The first element is selected at random, and the
other elements in the sample are selected
systematically by taking every kth element from
the random start, where k is the sampling interval.
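A minimal Python sketch of systematic random sampling. The population of 350 numbered units and the sample size of 50 are hypothetical figures chosen for illustration, and computing the sampling interval as k = N // n is a common convention assumed here. The function name `systematic_sample` is illustrative.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start within the first k units, then every kth unit."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)
    k = population_size // sample_size  # sampling interval (assumed convention)
    start = rng.randrange(k)            # random start among units 0 .. k - 1
    return [start + j * k for j in range(sample_size)]


# Hypothetical example: 350 numbered units, sample of 50, so k = 7
systematic_ids = systematic_sample(350, 50, seed=7)
print(systematic_ids[:5])
```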
3. Stratified Random Sampling
- It is a process done by dividing the population into
strata or categories and drawing the members at
random in proportion to the size of each stratum or sub-group.
- A stratified random sample is one obtained by separating
the population elements into non-overlapping groups, called
strata, and then selecting a simple random sample from each stratum.
- The following example illustrates a situation in which stratified
random sampling may be appropriate.

An advertising firm, interested in determining how much to
emphasize television advertising in a certain county, decides to
conduct a sample survey to estimate the average number of hours
each week that households within the county watch television. The
county contains two towns, A and B, and a rural area. Town A is built
around a factory, and most households contain factory workers with
school-age children. Town B is an exclusive suburb of a city in a
neighboring county and contains older residents with few children at
home. There are 155 households in town A, 62 in town B, and 93 in
the rural area. Discuss the merits of using stratified random
sampling in this situation.
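One way to carry out stratified random sampling for the three strata above is proportional allocation, sketched below in Python. The total sample size of 40 households is a hypothetical figure (the example does not state one), and the function names are illustrative. Rounding each share can, in general, make the allocations add up to slightly more or less than the target, so the result should be checked.

```python
import random


def proportional_allocation(strata_sizes, total_sample):
    """Give each stratum a share of the sample proportional to its size."""
    population = sum(strata_sizes.values())
    return {
        name: round(total_sample * size / population)
        for name, size in strata_sizes.items()
    }


def stratified_sample(strata_sizes, allocation, seed=None):
    """Draw a simple random sample of household numbers within each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(range(strata_sizes[name]), allocation[name])
        for name in strata_sizes
    }


households = {"Town A": 155, "Town B": 62, "Rural area": 93}
allocation = proportional_allocation(households, total_sample=40)  # 40 is hypothetical
chosen = stratified_sample(households, allocation, seed=11)
print(allocation)
```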
4. Cluster Sampling
- It is a suitable procedure if the population is spread out
over a wide geographical area.
- The population is divided into non-overlapping
groups or clusters consisting of one or more elements,
and then a sample of clusters is selected.
Example of cluster sampling:
Suppose all the households of a city are to be
studied. Suppose also that the sample to be used
is 20 percent. The city is divided into clusters, the
number and the sizes of which are decided upon
by the researcher. Suppose there are 40 clusters
or blocks, 20 percent of which is 8 clusters. The
8 clusters are to be selected either by simple
random sampling or by systematic random sampling.
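The cluster example can be sketched in Python, here using simple random sampling to pick the clusters. Numbering the blocks 1 to 40 and the function name `select_clusters` are illustrative assumptions.

```python
import random


def select_clusters(number_of_clusters, percent, seed=None):
    """Select the given percentage of clusters by simple random sampling."""
    rng = random.Random(seed)
    how_many = number_of_clusters * percent // 100
    cluster_ids = range(1, number_of_clusters + 1)  # blocks numbered 1 .. 40
    return sorted(rng.sample(cluster_ids, how_many))


# 40 clusters, 20 percent of which is 8 clusters
selected_clusters = select_clusters(40, 20, seed=3)
print(selected_clusters)
```

Every household in the selected blocks would then be included in the study.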
5. Multi-stage Sampling
- It is a more complex sampling technique: a
combination of several of the sampling techniques that we
have discussed.
- It is used especially when the subjects of an
investigation are scattered over a large
geographical area.
This can be done in two or more stages. This method
involves the following steps:
1. Divide the population into strata. Say, the 5 colleges of
2. Divide each stratum into clusters. Say, the courses
offered in each college.
3. Draw a sample from each cluster using simple
random sampling or systematic sampling.
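The three steps can be sketched in Python. Everything in the data below is hypothetical: the college names, course names, student labels, and the choice of 5 students per course are placeholders for illustration only, and the function name `multistage_sample` is an illustrative choice.

```python
import random


def multistage_sample(population, per_cluster, seed=None):
    """Draw a simple random sample from every cluster of every stratum.

    `population` maps stratum -> cluster -> list of members.
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, clusters in population.items():
        for cluster, members in clusters.items():
            size = min(per_cluster, len(members))
            sample[(stratum, cluster)] = rng.sample(members, size)
    return sample


# Hypothetical strata (colleges) and clusters (courses)
population = {
    "College A": {
        "Course 1": [f"A1-{i}" for i in range(30)],
        "Course 2": [f"A2-{i}" for i in range(25)],
    },
    "College B": {
        "Course 3": [f"B3-{i}" for i in range(40)],
    },
}

picked = multistage_sample(population, per_cluster=5, seed=1)
print(sorted(picked))
```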
Slovin’s formula

- It is used to calculate the sample size (n) given the
population size (N) and a margin of error (e).
- It is a formula used with random sampling techniques to
estimate the sample size.
It is computed as:

$n = \frac{N}{1 + Ne^2}$

where $n$ = number of samples
$N$ = total population
$e$ = margin of error
A researcher plans to conduct a survey. If the population of High City is
1,000, find the sample size if the margin of error is
a. 25%
b. 10%
c. 5%
d. 8%
e. 2%
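The exercise can be worked through with a short Python sketch of Slovin's formula, n = N / (1 + N e^2). Rounding the result up to the next whole number is a common convention assumed here, and the function name `slovin_sample_size` is illustrative.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Sample size from Slovin's formula, rounded up (assumed convention)."""
    if population_size <= 0:
        raise ValueError("population size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin of error must be a proportion between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Margins of error from items a to e, in percent
margins_percent = [25, 10, 5, 8, 2]
sample_sizes = {
    percent: slovin_sample_size(1000, percent / 100)
    for percent in margins_percent
}

for percent, n in sample_sizes.items():
    print(f"e = {percent}%: n = {n}")
```

A smaller margin of error requires a larger sample from the same population of 1,000.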
