INTRODUCTION
Statistics is the study of data, from its form to its relevance to daily lives. Data is
everywhere. It is observable or measurable. With the advancement of technology every
day, data can be accessed anywhere and by anyone. When data is correct, valid analysis
and interpretation can be generated to produce valuable information. There are many
classifications of data. Different kinds of data are collected, analyzed, and interpreted.
Being able to differentiate them is the first thing that must be considered when organizing
data.
OBJECTIVES
DISCUSSION PROPER
Statistics
Statistics is a form of mathematical analysis that uses quantified models, representations,
and synopses for a given set of experimental data or real-life studies. It studies
methodologies to gather, review, analyze, and draw conclusions from data.
For instance, according to The World Factbook, published by the Central Intelligence
Agency (CIA), in 2015 there were approximately 105 males for every 100 females between
the ages of 15 and 24. However, in the category of people 65 years old and older, there
were approximately 79 men for every 100 women.
Data
What is data?
Data Collection
Data collection is the process of preparing and collecting data.
The purpose of data collection is to obtain information to keep on record, to make
decisions about important issues, and to pass information on to others.
Data constitute the foundation of statistical analysis and interpretation. Hence, the first
step in statistical work is to obtain data. Data can be obtained from three important
sources.
It’s best to include categories that cover all possible answers and are
mutually exclusive. There should be no overlap between response items.
Ordinal variables include categories that can be ranked. Consider how wide
or narrow a range you’ll include in your response items, and their relevance to your
respondents.
Likert-type questions collect ordinal data using rating scales with five or
seven points.
When you have four or more Likert-type questions, you can treat the
composite data as quantitative data on an interval scale. Intelligence tests,
psychological scales, and personality inventories use multiple Likert-type
questions to collect interval data.
With interval or ratio data, you can apply strong statistical hypothesis tests
to address your research aims.
Open-ended questions
They require more time and effort from respondents, which may deter them
from completing the questionnaire.
OBSERVATIONS
Quantitative data is expressed in numbers and graphs and is analysed through statistical
methods.
SECONDARY DATA - research data that has previously been gathered and can be
accessed by researchers. The term contrasts with primary data, which is data collected
directly from its source.
Classification of Data
The process of arranging data into homogeneous groups or classes according to some
common characteristics present in the data is called classification.
For Example: When letters are sorted in a post office, they are classified
according to cities and further arranged according to streets.
Bases of Classification
Qualitative Base:
The data are classified according to some quality or attribute, such as sex, religion,
literacy, or intelligence.
Quantitative Base:
The data are classified by quantitative characteristics, such as heights, weights, ages,
or income.
Geographical Base:
The data are classified by geographical regions or locations, such as states, provinces,
cities, or countries.
Types of Classification
1. One-way Classification:
If we classify observed data by a single characteristic, this type of classification
is known as one-way classification.
For Example: The population of the world may be classified by religion as Muslims, Christians,
etc.
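2. Two-way Classification:
If we classify observed data by two characteristics at a time, this type of classification
is known as two-way classification.
For Example: The population of the world may be classified by religion and sex.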
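The difference between classifying by a single characteristic and by two characteristics at once can be sketched in a few lines of Python. The records below are hypothetical and invented only for illustration; they are not data from this module.

```python
from collections import Counter

# Hypothetical records, invented only for illustration: (religion, sex).
records = [
    ("Muslim", "Female"),
    ("Christian", "Male"),
    ("Muslim", "Male"),
    ("Christian", "Female"),
    ("Christian", "Female"),
]

# Classification by a single characteristic: tally by religion only.
one_way = Counter(religion for religion, _sex in records)

# Classification by two characteristics: tally by the (religion, sex) pair.
two_way = Counter(records)

print(one_way)
print(two_way)
```

Each tally covers every record exactly once, so the classes are mutually exclusive and together account for the whole data set.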
In mathematics and statistics, the arithmetic mean (/ærɪθˈmɛtɪk ˈmiːn/, stress on first
and third syllables of "arithmetic"), or simply the mean or the average (when the context
is clear), is the sum of a collection of numbers divided by the count of numbers in the
collection.
In addition to mathematics and statistics, the arithmetic mean is used frequently in many
diverse fields such as economics, anthropology and history, and it is used in almost every
academic field to some extent. For example, per capita income is the arithmetic average
income of a nation's population.
While the arithmetic mean is often used to report central tendencies, it is not a robust
statistic, meaning that it is greatly influenced by outliers (values that are very much larger
or smaller than most of the values). Notably, for skewed distributions, such as the
distribution of income for which a few people's incomes are substantially greater than
most people's, the arithmetic mean may not coincide with one's notion of "middle", and
robust statistics, such as the median, may provide a better description of central tendency.
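The effect of an outlier on the mean can be seen with a small experiment. This is only a sketch: the income figures are hypothetical (in thousands) and were made up for illustration, and the functions come from Python's standard statistics module.

```python
from statistics import mean, median

# Hypothetical incomes in thousands, invented only for illustration.
incomes = [30, 32, 35, 38, 40]

# The same incomes with one very large value (an outlier) added.
incomes_with_outlier = incomes + [400]

print("without outlier:", mean(incomes), median(incomes))
print("with outlier:   ", mean(incomes_with_outlier), median(incomes_with_outlier))
```

Adding the single large value moves the mean from 35 to roughly 95.8, while the median only moves from 35 to 36.5.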
The arithmetic mean is the most commonly used and readily understood measure of
central tendency in a data set. In statistics, the term average refers to any of the measures
of central tendency. The arithmetic mean of a set of observed data is defined as being
equal to the sum of the numerical values of each and every observation, divided by the
total number of observations. Symbolically, if we have a data set consisting of the values
a_1, a_2, …, a_n, then the arithmetic mean x̄ is defined by the formula:
$$\bar{x} = \frac{a_1 + a_2 + a_3 + \cdots + a_n}{n}$$
If the data set is a statistical population (i.e., consists of every possible observation and
not just a subset of them), then the mean of that population is called the population mean,
and denoted by the Greek letter µ.
Examples:
1. The marks obtained by 6 students in a class test are 20, 22, 24, 26, 28, 30. Find the
mean.
Solution:
x̄ = (20 + 22 + 24 + 26 + 28 + 30)/6 = 150/6 = 25
Therefore, mean = 25
2. The arithmetic mean of the 14 observations 26, 12, 14, 15, x, 17, 9, 11, 18, 16, 28, 20, 22, 8
is 17. Find the missing observation x.
Solution:
Given 14 observations are: 26, 12, 14, 15, x, 17, 9, 11, 18, 16, 28, 20, 22, 8
Arithmetic mean = 17
We know that,
Arithmetic mean = Sum of observations/Total number of observations
The 13 known observations add up to 216, so
17 = (216 + x)/14
17 × 14 = 216 + x
216 + x = 238
x = 238 - 216
x = 22
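Both worked examples above can be checked with a short Python sketch. The function names arithmetic_mean and missing_observation are illustrative choices, not part of any library, and the sketch assumes a non-empty list of numbers.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


def missing_observation(known_values, target_mean):
    """Solve target_mean = (sum(known_values) + x) / n for x.

    n counts the missing observation as well as the known ones.
    """
    n = len(known_values) + 1
    return target_mean * n - sum(known_values)


# Example 1: marks of 6 students.
marks = [20, 22, 24, 26, 28, 30]
print(arithmetic_mean(marks))  # 25.0

# Example 2: the 13 known observations; the 14th is x.
known = [26, 12, 14, 15, 17, 9, 11, 18, 16, 28, 20, 22, 8]
print(missing_observation(known, 17))  # 22
```

Putting the recovered value back into the list and recomputing the mean gives 17 again, which is a quick way to verify the answer.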
Median
The arithmetic mean may be contrasted with the median. The median is defined such that
no more than half the values are larger than, and no more than half are smaller than, the
median. If elements in the data increase arithmetically, when placed in some order, then
the median and arithmetic average are equal. For example, consider the data sample 1, 2,
3, 4. The average is 2.5, as is the median. However, when we consider a sample that
cannot be arranged so as to increase arithmetically, such as 1, 2, 4, 8, 16, the median and
arithmetic average can differ significantly. In this case, the arithmetic average is 6.2, while
the median is 4. In general, the average value can vary significantly from most values in
the sample, and can be larger or smaller than most of them.
There are applications of this phenomenon in many fields. For example, since the 1980s,
the median income in the United States has increased more slowly than the arithmetic
average of income.
Examples:
Solution:
The list 4, 8, 1, 14, 9, 21, 12 contains 7 numbers. The median of a list with an odd number
of entries is found by ranking the numbers and finding the middle number. Ranking the
numbers from the smallest number to largest gives
1, 4, 8, 9, 12, 14, 21
The middle number is 9. Thus 9 is the median.
The list 46, 23, 92, 89, 77, 108 contains 6 numbers. The median of a list of data with an
even number of entries is found by ranking the numbers and computing the mean of the
two middle numbers. Ranking the numbers from the smallest to largest gives
23, 46, 77, 89, 92, 108
The two middle numbers are 77 and 89. The mean of 77 and 89 is 83. Thus 83 is the median
of the data.
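The ranking procedure for odd and even list lengths can be written as a small Python function. This is a minimal sketch that assumes a non-empty list of numbers; the name median is chosen here for illustration (Python's statistics module offers a ready-made statistics.median as well).

```python
def median(values):
    """Rank the values, then return the middle one (odd count)
    or the mean of the two middle ones (even count)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


print(median([4, 8, 1, 14, 9, 21, 12]))   # 9
print(median([46, 23, 92, 89, 77, 108]))  # 83.0
print(median([1, 2, 4, 8, 16]))           # 4
```

The last call uses the sample 1, 2, 4, 8, 16 from the discussion above, whose median (4) differs from its arithmetic average (6.2).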
Mode
The mode of a list of numbers is the number that occurs most frequently. In statistics,
data can be distributed in various ways. The most often cited distribution is the classic
normal (bell-curve) distribution. In this, and some other distributions, the mean
(average) value falls at the mid-point, which is also the peak frequency of observed
values. For such a distribution, the mean, median, and mode are all the same value. This
means that this value is the average value, the middle value, and also the mode (the most
frequently occurring value in the data).
Mode is most useful as a measure of central tendency when examining categorical data,
such as models of cars or flavors of soda, for which neither a mathematical average nor a
median value based on ordering can be calculated.
Examples:
1. For example, in the following list of numbers, 16 is the mode since it appears more times
in the set than any other number:
• 3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48
2. A set of numbers can have more than one mode (this is known as bimodal if there are
two modes) if there are multiple numbers that occur with equal frequency, and more
times than the others in the set.
• 3, 3, 3, 9, 16, 16, 16, 27, 37, 48
In the above example, both the number 3 and the number 16 are modes as they each occur
three times and no other number occurs more often.
3. If no number in a set of numbers occurs more than once, that set has no mode:
• 3, 6, 9, 16, 27, 37, 48
A set of numbers with two modes is bimodal, a set of numbers with three modes is
trimodal, and any set of numbers with more than one mode is multimodal.
Solution:
a. In the list 18, 15, 21, 16, 15, 14, 15, 21, the number 15 occurs more often than the other
numbers. Thus 15 is the mode.
b. Each number in the list 2, 5, 8, 9, 11, 4, 7, 23 occurs only once. Because no number
occurs more than the others, there is no mode.
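The mode rules above (one mode, several modes, or no mode) can be sketched in Python by counting how often each value occurs. The function name modes is an illustrative choice, and the sketch assumes a non-empty list. It follows the convention used in this section that a list in which no value repeats has no mode; note that Python's built-in statistics.multimode uses a different convention and returns every value in that case.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest count, in increasing order.

    Returns an empty list when no value occurs more than once,
    matching the "no mode" convention used in the text.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48]))  # [16]
print(modes([3, 3, 3, 9, 16, 16, 16, 27, 37, 48]))      # [3, 16]
print(modes([3, 6, 9, 16, 27, 37, 48]))                 # []
print(modes([18, 15, 21, 16, 15, 14, 15, 21]))          # [15]
```

A result with two entries corresponds to a bimodal list, and an empty result to a list with no mode.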
Weighted Average
A weighted average, or weighted mean, is an average in which some data points count
more heavily than others, in that they are given more weight in the calculation. For
example, the arithmetic mean of 3 and 5 is (3 + 5)/2 = 4, or equivalently (1/2 × 3) + (1/2 × 5) = 4.
In contrast, a weighted mean in which the first number receives, for example, twice as
much weight as the second (perhaps because it is assumed to appear twice as often in the
general population from which these numbers were sampled) would be calculated as
(2/3 × 3) + (1/3 × 5) = 11/3. Here the weights, which necessarily sum to the value one, are 2/3
and 1/3, the former being twice the latter. The arithmetic mean (sometimes called the
"un-weighted average" or "equally weighted average") can be interpreted as a special case
of a weighted average in which all the weights are equal to each other (equal to 1/2 in the
above example, and equal to 1/n in a situation with n numbers being averaged).
In some cases, you might want a number to have more weight. In that case, you’ll want to
find the weighted mean. To find the weighted mean:
Example:
You take three 100-point exams in your statistics class and score 80, 80 and 95. The last
exam is much easier than the first two, so your professor has given it less weight. The
weights for the three exams are:
The weighted mean is relatively easy to find. But in some cases, the weights might not add
up to 1. In those cases, you’ll need to use the weighted mean formula. The only difference
between the formula and the steps above is that you divide by the sum of all the weights.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i}$$
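The weighted mean formula can be sketched in Python as follows. The function name weighted_mean is an illustrative choice, and exact fractions are used only so that results such as 11/3 print without rounding. The example reuses the numbers 3 and 5 from the start of this section, because the weights for the exam example are not listed here.

```python
from fractions import Fraction


def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    total = sum(Fraction(x) * Fraction(w) for x, w in zip(values, weights))
    return total / sum(Fraction(w) for w in weights)


# Weights 2 and 1 do not add up to 1; dividing by their sum handles that.
print(weighted_mean([3, 5], [2, 1]))  # 11/3

# The same weights rescaled to add up to 1 give the same answer.
print(weighted_mean([3, 5], [Fraction(2, 3), Fraction(1, 3)]))  # 11/3

# Equal weights reduce to the ordinary arithmetic mean.
print(weighted_mean([3, 5], [1, 1]))  # 4
```

Because the formula divides by the sum of the weights, only the ratio between the weights matters, not their scale.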
While the mean and median tell you about the center of your observations, they say nothing
about the 'spread' of the numbers.
Example:
Suppose two machines produce nails which are on average 10 inches long. A sample of 11
nails is selected from each machine.
Machine A: 6, 8, 8, 10, 10, 10, 10, 10, 12, 12, 14
Machine B: 6, 6, 6, 8, 8, 10, 12, 12, 14, 14, 14
In both cases, the mean is 10, indeed. However, the first machine seems to be the better
one, since most nails are close to 10 inches. Therefore:
We must find additional numbers indicating the 'spread' of the data.
The Range
The easiest measure of the data spread is the range. It is simply the highest data value
minus the lowest data value (we have seen the range before). In the above example, the
range is the same for both data sets, namely 14 - 6 = 8. While useful, the range is too
crude a measure of variability.
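A quick Python check, using the nail samples listed above, shows why the range cannot tell the two machines apart. The function name data_range is an illustrative choice.

```python
machine_a = [6, 8, 8, 10, 10, 10, 10, 10, 12, 12, 14]
machine_b = [6, 6, 6, 8, 8, 10, 12, 12, 14, 14, 14]


def data_range(values):
    """Highest data value minus lowest data value."""
    return max(values) - min(values)


print(data_range(machine_a))  # 8
print(data_range(machine_b))  # 8
```

Only the two extreme values enter the calculation, so the nine values in between have no influence on the result.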
The Variance
We want to find out how much the data points are spread around the mean. To do that,
we could find the difference between each data point and the mean, and average these
differences. However, we want to measure the differences to the mean regardless of the
sign (positive or negative difference). Therefore, we could find the absolute value of the
difference between each data point and the mean, and average that. But for theoretical
reasons an absolute value function is not easy to deal with, so one chooses a square function
instead (which also neutralizes signs). Finally, for yet other theoretical reasons we shall
use not the sample size n to compute an average, but instead n - 1.
Hence, we will use this formula to compute the data spread, or variance:
Variance = add up the squares of (Data points - mean), then divide that sum by (n - 1)
There are two symbols for the variance, just as for the mean:
• σ² is the variance for a population
• s² is the variance for a sample
We had to use two formulas because one involves the population mean, the other the
sample mean. Practically, however, the formula is the same. It is useful to compute the
variance at least once "by hand" before we show how to use Excel to accomplish the same
feat quickly and easily.
Here is the table that this procedure produces for the above sample of nails from machine
A and B:
Machine A:
x      (x − x̄)    (x − x̄)²
6      -4          16
8      -2          4
8      -2          4
10     0           0
10     0           0
10     0           0
10     0           0
10     0           0
12     2           4
12     2           4
14     4           16
Therefore, the variance for machine A is: (16 + 4 + 4 + 0 + 0 + 0 + 0 + 0 + 4 + 4 + 16) /
10 = 48 / 10 = 4.8
Machine B:
x      (x − x̄)    (x − x̄)²
6      -4          16
6      -4          16
6      -4          16
8      -2          4
8      -2          4
10     0           0
12     2           4
12     2           4
14     4           16
14     4           16
14     4           16
Therefore, the variance for machine B is: (16 + 16 + 16 + 4 + 4 + 0 + 4 + 4 + 16 + 16 + 16)
/ 10 = 112 / 10 = 11.2
In other words, the variance, or spread around the mean, for machine A is 4.8, while
machine B has a variance (spread) of 11.2. That means that machine A, as a rule,
produces nails that stay close to the average nail length.
Machine B, on the other hand, produces nails with more variability than machine A.
Therefore, machine A would be much preferred over machine B.
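The hand calculation for both machines can be reproduced with a short Python sketch. The function name sample_variance is an illustrative choice; it divides by n - 1, as in the formula above. Python's standard statistics.variance also computes the sample variance and is printed alongside as a cross-check.

```python
import statistics

machine_a = [6, 8, 8, 10, 10, 10, 10, 10, 12, 12, 14]
machine_b = [6, 6, 6, 8, 8, 10, 12, 12, 14, 14, 14]


def sample_variance(values):
    """Add up the squares of (data point - mean), then divide by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


# Hand-rolled result next to the standard library result for each machine.
print(sample_variance(machine_a), statistics.variance(machine_a))
print(sample_variance(machine_b), statistics.variance(machine_b))
```

Both approaches should agree with the tables: about 4.8 for machine A and about 11.2 for machine B.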
Note: The unit of the variance is the square of the original unit; hence, it is not the best
number (considering units). Therefore, one introduces an additional number, called the
standard deviation, which is the square root of the variance:
Standard deviation = square root of the variance
Example:
Consider the sample data 6, 7, 5, 3, 4. Compute the standard deviation for that data.
To compute the standard deviation, we must first compute the mean, then the variance,
and finally we can take the square root to obtain the standard deviation. In this case we
do not need to create a table since there are so few numbers:
• Computing the mean:
$$\bar{x} = \frac{6 + 7 + 5 + 3 + 4}{5} = 5$$
• Computing the variance:
$$s^2 = \frac{(6-5)^2 + (7-5)^2 + (5-5)^2 + (3-5)^2 + (4-5)^2}{5-1} = \frac{10}{4} = 2.5$$
• Standard deviation:
$$s = \sqrt{2.5} \approx 1.58$$
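The three steps of this example (mean, variance, square root) can be followed in a few lines of Python. This is a minimal sketch of the same calculation, using only the standard math module; the variable names are illustrative.

```python
import math

data = [6, 7, 5, 3, 4]

# Step 1: the mean.
mean = sum(data) / len(data)

# Step 2: the sample variance (divide by n - 1).
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# Step 3: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 2))  # 5.0 2.5 1.58
```

Unlike the variance, the standard deviation is expressed in the same unit as the original data.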
SUMMARY
Statistics is used in every aspect of life, such as in data science, robotics, business, sports,
weather forecasting, and much more.
ISUI-CvE-Mod
Revision: 02
Effectivity: August 1, 2020