Statistics- is the mathematical science that involves the collection, presentation, analysis and interpretation of
data and the methods utilized therein.
Statistical Methods- are the procedures used in collecting, summarizing, and analyzing data.
Two Major Areas of Statistics
1. Descriptive Statistics- comprises statistical methods dealing with collection, tabulation and
summarization of data, so as to present meaningful information. This method can either be graphical
or computational.
2. Inferential Statistics- the technique by which decisions about a statistical population are made based only
on a sample having been observed or a judgment having been obtained. This kind of statistics is
concerned more with generalizing information or making inferences about a population.
Population- is the totality of the observations with which a statistician is concerned. The observation could
refer to anything of interest, such as persons, animals or objects.
Size of the population – is defined as the number of observations in the population.
Sample- is a subset of a population.
Kinds of Data
1. Continuous Data- these arise from the measurement of a continuous variable.
Examples: weights of 100 students, school achievement, I.Q., heights of children.
2. Discrete Data- these are characterized by gaps in which no real values may be obtained. They are made up of
items whose values have been obtained by counting.
Examples: school enrollment, number of pencils, etc.
Variable- a characteristic or phenomenon that may take on different values.
Examples: weight, I.Q., and sex.
Steps in a Statistical Investigation:
1. Collection
2. Presentation
3. Analysis
4. Interpretation
Methods of Collection of Data
1. Direct or interview method. This is a personal communication with the individual you want to interview.
2. Indirect or questionnaires method. This is done by sending questionnaires to the person from whom
you would like to get the information.
3. Registration. This method utilizes existing records.
4. Observation. This can be done directly or indirectly.
5. Experiment. This is done by making or conducting scientific inquiry.
Methods of Presentation of Data
1. Frequency Distribution
2. Graph
3. Figure
4. Chart
PRESENTATION OF DATA
I. Frequency Distribution
Definition of terms
Frequency Distribution- the tabulation of scores or measures grouped into class intervals.
Raw Data- are collected data which have not been organized numerically.
Array- is an arrangement of raw numerical data in ascending or descending order.
Class frequency- the number of observations belonging to a class interval, or the
number of items within a category.
Class interval- this is the grouping or category defined by a lower limit and an upper limit.
Class limits- the end values of the class interval (lower and upper class limits).
Class boundaries or true limits- more precise expressions of the class limits, extending each limit
by half a unit (0.5 when the limits are whole numbers). Each boundary lies midway between the
upper class limit of one interval and the lower class limit of the next interval.
Class mark or midpoint- this is the midpoint of the class interval and is obtained by adding the
lower and upper limits and dividing by 2.
Class size or width- this is the width of class interval and is the difference between the lower and
upper class boundaries.
Example: The frequency distribution for the final grades in Eng’g Statistics.
Class No.   Class Interval   Class Frequency   Class Boundaries   Class Mark
1           70-74            4                 69.5-74.5          72
2           75-79            8                 74.5-79.5          77
3           80-84            12                79.5-84.5          82
4           85-89            10                84.5-89.5          87
5           90-94            6                 89.5-94.5          92
N = 40 and class width = 5
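The derived columns of the table can be recomputed from the class limits alone. The sketch below is one possible way to do it in Python, assuming whole-number grades so that each boundary sits half a unit beyond its limit; the function name `describe_classes` and the `unit` parameter are illustrative and not part of the original notes.

```python
# Class limits and frequencies copied from the final-grades table.
limits = [(70, 74), (75, 79), (80, 84), (85, 89), (90, 94)]
freqs = [4, 8, 12, 10, 6]

def describe_classes(limits, unit=1):
    """Return (boundaries, class mark, width) for each class interval.

    `unit` is the assumed measurement precision of the data (1 for
    whole-number grades), so each boundary is half a unit past the limit.
    """
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - unit / 2
        upper_boundary = upper + unit / 2
        mark = (lower + upper) / 2          # class mark: average of the limits
        rows.append(((lower_boundary, upper_boundary), mark,
                     upper_boundary - lower_boundary))
    return rows

classes = describe_classes(limits)
total = sum(freqs)                          # N, the sum of the class frequencies
for (bounds, mark, width), f in zip(classes, freqs):
    print(bounds, mark, width, f)
print("N =", total)
```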
General Principles:
1. Graphing is done by using the Cartesian coordinate plane.
2. The vertical line is the y-axis or ordinate, the horizontal line is the x-axis or abscissa.
3. These two lines intersect at a common point called the point of origin or point of intersection.
4. Distances measured along the x-axis to the right of 0 are positive; to the left of 0, the distances measured
are negative. The distances measured above 0 are positive and the distances measured below 0 are
negative.
A. Histogram
A graph composed of a series of rectangles, constructed with the class intervals as the
bases and the frequencies as the heights.
Steps in Preparing a Histogram
1. Prepare the x and y-axis
2. Mark x and y scale, x representing the scores and y, the frequencies.
3. The bases of the bars are plotted on the x-axis where the width of the base corresponds to the real limits
or class boundaries of the class interval. The center of the base falls on the midpoint of the class interval.
[Figure: histogram of the final grades. x-axis: score, marked at the class boundaries 69.5, 74.5, 79.5, 84.5, 89.5, 94.5; y-axis: frequency]
B. Frequency Polygon
This is a line graph of class frequencies plotted against class marks. It is made by connecting the
midpoints of the rectangular tops in a histogram.
[Figure: frequency polygon of the final grades. x-axis: score (midpoint), marked at 67, 72, 77, 82, 87, 92, 97; y-axis: class frequency]
[Figure: class frequency (scale 0 to 45) plotted against score at the class boundaries 69.5 to 94.5]
[Figure: frequency plotted against score, with the x-axis marked at the class limits 70-74, 75-79, 80-84, 85-89, 90-94]
E. Pie Chart. A pie chart displays the absolute or relative frequencies of the intervals as sectors of a circle.
Each sector in a pie chart corresponds to a class interval. The ratio of the area of each sector to the area of the
circle is equal to the relative frequency of the class.
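Because a sector's area is proportional to its central angle, each class takes a share of the 360 degrees equal to its relative frequency. A minimal sketch for the final-grades table follows; the variable names are illustrative.

```python
labels = ["70-74", "75-79", "80-84", "85-89", "90-94"]
freqs = [4, 8, 12, 10, 6]

total = sum(freqs)
relative = [f / total for f in freqs]        # relative frequency of each class
angles = [360 * f / total for f in freqs]    # central angle of each sector

for label, r, a in zip(labels, relative, angles):
    print(f"{label}: relative frequency {r:.3f}, sector angle {a:.1f} degrees")
```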
POPULATION PARAMETER
The population parameter or briefly parameter is a numerical quantity that describes some
characteristic of a population. Parameters are normally represented by Greek letters. The most common
parameters are the population mean (μ) and variance (σ²).
SAMPLE STATISTIC
The sample statistic or briefly statistic is a quantitative value that is calculated from the observations
in a sample. They are usually represented by lowercase English letters with other symbols. The sample mean
(x̄) and sample variance (s²) are two of the common statistics derived from samples.
MEASURES OF LOCATION
I. PERCENTILE
Percentiles are values that divide the data into one hundred equal parts.
Formula:
\[ P_p = L + \left( \frac{\frac{pN}{100} - F_p}{f_p} \right) \times C \]
Where:
L = lower class boundary of the class interval in which pN/100 lies.
Fp = cumulative frequency of the intervals below L.
fp = frequency of the interval in which pN/100 falls.
C = class interval size or class width.
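Example: For the final-grades frequency distribution above (N = 40), find:
a. P25: pN/100 = 25(40)/100 = 10, which falls in the interval 75-79, so
P25 = 74.5 + ((10 - 4)/8) x 5 = 78.25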
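The interpolation used for P25 can be sketched as a small Python function. This is an illustration under stated assumptions, not the notes' own procedure: the name `grouped_percentile` is hypothetical, all classes are assumed to share one width, and when pN/100 lands exactly on a cumulative frequency the lower class is used (both choices give the same value).

```python
def grouped_percentile(p, lower_bounds, freqs, width):
    """Interpolated p-th percentile of a grouped frequency distribution."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    n = sum(freqs)
    target = p * n / 100      # the position pN/100
    below = 0                 # Fp: cumulative frequency below the current class
    for lb, f in zip(lower_bounds, freqs):
        if f > 0 and below + f >= target:
            # L + ((pN/100 - Fp) / fp) * C
            return lb + (target - below) / f * width
        below += f
    raise ValueError("the frequency table is empty")

# Final-grades table: lower class boundaries, frequencies, class width 5.
lower_bounds = [69.5, 74.5, 79.5, 84.5, 89.5]
freqs = [4, 8, 12, 10, 6]
p25 = grouped_percentile(25, lower_bounds, freqs, 5)
print(p25)  # 78.25
```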
II. DECILE
Deciles are values that divide the data into ten equal parts.
Formula:
\[ D_d = L + \left( \frac{\frac{dN}{10} - F_d}{f_d} \right) \times C \]
Where:
L = lower class boundary of the class interval in which dN/10 lies.
Fd = cumulative frequency of the intervals below L.
fd = frequency of the interval in which dN/10 falls.
C = class interval size or class width.
Example:
Find:
a. D2
b. D5
III. QUARTILE
Quartiles are values that divide the data into four equal parts.
Finding the quartile from the grouped data.
Formula:
\[ Q_q = L + \left( \frac{\frac{qN}{4} - F_q}{f_q} \right) \times C \]
Where:
L = lower class boundary of the class interval upon which qN/4 lies.
Fq = cumulative frequency of the intervals below L.
fq = frequency of the interval in which qN/4 falls.
C = class interval size or class width.
Example:
Find:
a. Q2
b. Q3
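A short derivation, added here as an aid and not taken from the original notes, shows that deciles and quartiles are special cases of percentiles: the positions dN/10 and qN/4 can each be rewritten in the form pN/100, so the same interpolation applies.

```latex
\[
\frac{dN}{10} = \frac{(10d)\,N}{100} \;\Longrightarrow\; D_d = P_{10d},
\qquad
\frac{qN}{4} = \frac{(25q)\,N}{100} \;\Longrightarrow\; Q_q = P_{25q}
\]
```

For instance, D5, Q2 and P50 all locate the same point, which is the median.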
Finding the mean from Grouped Data
Method I
Formula:
\[ \bar{x} = \frac{\sum f \cdot CM}{N} \]
Where:
f = class frequency
CM = class mark
N = Sum of the frequencies
Method II
Formula:
\[ \bar{x} = AM + \left[ \frac{\sum f d}{N} \right] \times C \]
Where:
AM = assumed mean
C = class size
d = deviation of the class mark from the assumed mean, in units of class size: d = (CM - AM)/C
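Both mean formulas can be checked against each other on the final-grades table. The sketch below assumes that d counts class widths away from the assumed mean, d = (CM - AM)/C, and picks AM = 82 arbitrarily; any class mark would give the same result.

```python
marks = [72, 77, 82, 87, 92]      # class marks (CM)
freqs = [4, 8, 12, 10, 6]         # class frequencies (f)
width = 5                         # class size (C)
n = sum(freqs)                    # N, the sum of the frequencies

# Weighted average of the class marks: sum(f * CM) / N.
mean_direct = sum(f * cm for f, cm in zip(freqs, marks)) / n

# Assumed-mean formula: AM + (sum(f * d) / N) * C.
assumed_mean = 82                 # arbitrary choice of AM
d = [(cm - assumed_mean) / width for cm in marks]
mean_coded = assumed_mean + sum(f * di for f, di in zip(freqs, d)) / n * width

print(mean_direct, mean_coded)
```

The two results agree (82.75 for this table) because the assumed-mean form only shifts and rescales the class marks before averaging.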
MEDIAN
The median is the point on the scale of scores below which half of the scores lie and above which the other
half of the scores lie.
When to use the Median:
1. When the exact midpoint of the distribution is wanted, the 50% point.
2. When there are extreme scores which would markedly affect the mean. Extreme scores do not disturb the
median.
3. When it is desired that certain scores should influence the central tendency, but all that is known about
them is that they are above or below the median.
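Finding the Median from Ungrouped Data
1. When N is odd, the median is the middle score of the ordered data.
2. When N is even, the median is the average of the two middle scores.
Example: 30 20 15 12 10 9 8 5 3 1
N = 10, so the median is the average of the 5th and 6th scores: (10 + 9)/2 = 9.5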
Finding the median from Grouped Data
Formula:
\[ Md = L + \left( \frac{\frac{N}{2} - F_2}{f_2} \right) \times C \]
Where:
L = Lower class boundary of the interval where the median lies.
N = Number of scores
F2 = Cumulative frequency up to the class immediately preceding the median class
f2 = frequency of the median class
C = class size
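As an illustration, the grouped-median formula can be applied to the final-grades table from the frequency distribution section. This is a sketch with illustrative variable names; it assumes the median class is the first class whose cumulative frequency reaches N/2.

```python
lower_bounds = [69.5, 74.5, 79.5, 84.5, 89.5]   # L for each class
freqs = [4, 8, 12, 10, 6]
width = 5                                       # C

n = sum(freqs)
half = n / 2                                    # N/2
cumulative = 0                                  # F2: frequency below the current class
for lb, f in zip(lower_bounds, freqs):
    if cumulative + f >= half:                  # this is the median class
        median = lb + (half - cumulative) / f * width
        break
    cumulative += f

print(round(median, 2))
```

Here N/2 = 20 falls in the class 80-84, so L = 79.5, F2 = 12 and f2 = 12.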
MODE
Mode is that single measure or score which occurs most frequently. When data are grouped into
frequency distribution, the crude mode is usually taken to be the midpoint of that interval which contains the
largest frequency.
When to use the Mode.
1. When a quick and approximate measure of central tendency is all that is wanted.
2. When the measure of central tendency should be the most typical value.
Finding the Mode from Ungrouped Data
Example:
1. A set of numbers 1 2 4 5 6 7 8 8 8 9 10, the mode is 8.
2. A set of numbers 1 2 4 5 6 7 8 9 10, no mode.
3. A set of numbers 1 2 6 6 6 7 8 8 8 9 10, the modes are 6 and 8 (bimodal).
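The three cases above can be reproduced with a small counting routine. The function `modes` is a hypothetical helper; it adopts the convention used in the second example, assumed here to hold in general, that a set in which every value occurs equally often has no mode.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes, or [] when no value stands out."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    # Assumed convention: if all distinct values tie, report "no mode".
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 4, 5, 6, 7, 8, 8, 8, 9, 10]))   # [8]
print(modes([1, 2, 4, 5, 6, 7, 8, 9, 10]))         # []
print(modes([1, 2, 6, 6, 6, 7, 8, 8, 8, 9, 10]))   # [6, 8]
```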
Finding the Mode from Grouped Data
Formula:
\[ Mo = L + \left[ \frac{d_1}{d_1 + d_2} \right] \times C \]
Where:
L = exact lower class boundary of the modal class.
d1 = difference between the frequency of the modal class and the
frequency of the class below it.
d2 = difference between the frequency of the modal class and the
frequency of the class above it.
C = class width.
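A sketch of the grouped-mode computation on the final-grades table follows. It assumes that "the class below" means the neighbouring class with lower scores and "the class above" the neighbouring class with higher scores, and that a missing neighbour counts as frequency 0; these readings are assumptions, not statements from the notes.

```python
lower_bounds = [69.5, 74.5, 79.5, 84.5, 89.5]
freqs = [4, 8, 12, 10, 6]
width = 5

i = freqs.index(max(freqs))                         # modal class (first one if tied)
before = freqs[i - 1] if i > 0 else 0               # neighbour with lower scores
after = freqs[i + 1] if i < len(freqs) - 1 else 0   # neighbour with higher scores
d1 = freqs[i] - before
d2 = freqs[i] - after

# Mo = L + (d1 / (d1 + d2)) * C; undefined if d1 + d2 == 0 (flat neighbours).
mode = lower_bounds[i] + d1 / (d1 + d2) * width
print(round(mode, 2))
```

The modal class is 80-84, so d1 = 12 - 8 = 4 and d2 = 12 - 10 = 2.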
MEASURES OF DISPERSION/VARIATION
A measure of dispersion is a method of measuring the degree by which numerical data or values tend
to spread from or cluster about the central point or average.
The most common measures of dispersion are the following:
1. Range
2. Quartile Deviation
3. Average Deviation
4. Standard Deviation
5. Variance
RANGE
The Range for Ungrouped Data:
The range of a set of ungrouped data is the difference between the highest score (HS) and the lowest score (LS).
\[ R = HS - LS \]
R = 23 – 11 = 12
The Range for Grouped Data
The range for grouped data is generally defined as the difference between the upper boundary of the
highest class interval and the lower boundary of the lowest class interval.
\[ R = U_{bh} - L_{bl} \]
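As a worked illustration (not part of the original notes), take the final-grades frequency distribution, whose highest class interval is 90-94 and whose lowest is 70-74. The upper boundary of the highest class is 94.5 and the lower boundary of the lowest class is 69.5, so:

```latex
\[
R = U_{bh} - L_{bl} = 94.5 - 69.5 = 25
\]
```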
QUARTILE DEVIATION
It is also called the semi-interquartile range. It measures the amount of dispersion present in the middle
50% of the values. Hence, the equation is given by
\[ Q.D. = \frac{Q_3 - Q_1}{2} \]
Where:
Q.D. = the quartile deviation
Q1 = the first quartile
Q3 = the third quartile
AVERAGE DEVIATION
\[ A.D. = \frac{\sum |x - \bar{x}|}{n} \] (sample)
Where:
x= the individual value
𝑥̅ = sample mean
n = total number of observations
\[ A.D. = \frac{\sum |x - \mu|}{n} \] (population)
Where:
μ = population mean
From the Grouped Data
\[ A.D. = \frac{\sum f \cdot |CM - \bar{x}|}{n} \] (sample)
\[ A.D. = \frac{\sum f \cdot |CM - \mu|}{n} \] (population)
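The ungrouped and grouped forms of the average deviation can be sketched as two short Python functions. The function names are illustrative; the ungrouped example uses ten illustrative scores, and the grouped example reuses the class marks and frequencies of the final-grades table.

```python
def average_deviation(values):
    """Mean of the absolute deviations from the mean (ungrouped data)."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

def grouped_average_deviation(marks, freqs):
    """Frequency-weighted mean absolute deviation of the class marks."""
    n = sum(freqs)
    mean = sum(f * cm for f, cm in zip(freqs, marks)) / n
    return sum(f * abs(cm - mean) for f, cm in zip(freqs, marks)) / n

scores = [30, 40, 50, 55, 45, 80, 40, 63, 70, 25]   # illustrative values
ad = average_deviation(scores)
grouped_ad = grouped_average_deviation([72, 77, 82, 87, 92], [4, 8, 12, 10, 6])
print(round(ad, 2), round(grouped_ad, 2))
```

For the ten scores the mean is 49.8 and the absolute deviations sum to 138, giving A.D. = 13.8; for the grouped table the weighted absolute deviations sum to 196 over n = 40, giving 4.9.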
STANDARD DEVIATION
The standard deviation of a set of n numbers is denoted by s and is defined by
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \] (sample)
Thus s is the root mean square of the deviations from the mean or, as it is sometimes called, the root
mean square deviation.
For a given population, the process for finding the standard deviation is given by the formula.
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \] (population)
For a sample in a frequency distribution:
\[ s = \sqrt{\frac{\sum f (CM - \bar{x})^2}{n}} \]
Where:
CM = class mark
𝑥̅ = sample mean
f = class frequency
n = total frequency
For easier computation, the above formula can be converted to the form
\[ s = \sqrt{\frac{\sum f \cdot CM^2}{n} - \left[ \frac{\sum f \cdot CM}{n} \right]^2} \]
To compute for the standard deviation of a population in a frequency distribution, use any of the
following:
\[ \sigma = \sqrt{\frac{\sum f (CM - \mu)^2}{N}} \]
\[ \sigma = \sqrt{\frac{\sum f \cdot CM^2}{N} - \left[ \frac{\sum f \cdot CM}{N} \right]^2} \]
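The definitional form and the computational form of the grouped standard deviation should give the same value. The sketch below checks this on the final-grades table, dividing by the total frequency as in the formulas above; variable names are illustrative.

```python
import math

marks = [72, 77, 82, 87, 92]     # class marks (CM)
freqs = [4, 8, 12, 10, 6]        # class frequencies (f)
n = sum(freqs)
mean = sum(f * cm for f, cm in zip(freqs, marks)) / n

# Definition: root mean square of the deviations from the mean.
s_definition = math.sqrt(
    sum(f * (cm - mean) ** 2 for f, cm in zip(freqs, marks)) / n
)

# Computational form: mean of the squares minus the square of the mean.
mean_of_squares = sum(f * cm ** 2 for f, cm in zip(freqs, marks)) / n
s_shortcut = math.sqrt(mean_of_squares - mean ** 2)

print(round(s_definition, 4), round(s_shortcut, 4))
```

The computational form avoids subtracting the mean from every class mark, which is why it is easier by hand, although with very large values it can lose precision to rounding.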
THE VARIANCE
The variance of a set of data is defined as the square of the standard deviation. Thus s² and σ² would
represent the sample variance and population variance respectively.
SKEWNESS AND KURTOSIS
Skewness is the degree of asymmetry, or departure from symmetry, of a distribution.
If the frequency curve of a distribution has a longer “tail” to the right of the central maximum than
to the left, the distribution is said to be skewed to the right or to have positive skewness.
+SK: mean > median
If the reverse is true, it is said to be skewed to the left or to have negative skewness.
-SK: mean < median
If neither tail of the curve is longer than the other, the distribution is said to be symmetrical (zero
skewness). In a symmetrical distribution, the mean, median and mode are all equal:
𝑥̅ = Md = Mo
For skewed distributions the mean tends to lie on the same side of the mode as the longer tail. Thus a
measure of the asymmetry is supplied by the difference ( Mean-Mode). This can be made dimensionless on
division by a measure of dispersion, such as standard deviation, leading to the definition
\[ \text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} \]
\[ SK = \frac{\bar{x} - Mo}{s} \]
To avoid use of the mode, we can employ an empirical formula and define
\[ \text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} \]
\[ SK = \frac{3(\bar{x} - Md)}{s} \]
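Both skewness coefficients are simple ratios, so a sketch is short. The function names and the summary values below are hypothetical, chosen only to illustrate the sign convention: a mean above the mode or median gives a positive coefficient (skewed to the right).

```python
def skewness_from_mode(mean, mode, sd):
    """(mean - mode) / standard deviation."""
    return (mean - mode) / sd

def skewness_from_median(mean, median, sd):
    """3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

# Hypothetical summary values, not taken from the notes.
sk1 = skewness_from_mode(mean=50.0, mode=44.0, sd=12.0)
sk2 = skewness_from_median(mean=50.0, median=48.0, sd=12.0)
print(sk1, sk2)  # 0.5 0.5
```

The two coefficients agree here because the hypothetical values satisfy mean - mode = 3(mean - median), the empirical relation on which the second formula rests.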
KURTOSIS
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution.
A measure of kurtosis based on both quartiles and percentiles is given by
\[ K = \frac{Q.D.}{P_{90} - P_{10}} \]
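To illustrate, the coefficient K can be computed for the final-grades table by interpolating the required percentiles from grouped data. This is a sketch: `grouped_percentile` is a hypothetical helper that assumes equal class widths, and Q1 and Q3 are taken as P25 and P75.

```python
def grouped_percentile(p, lower_bounds, freqs, width):
    """Interpolated p-th percentile of a grouped frequency distribution."""
    target, below = p * sum(freqs) / 100, 0
    for lb, f in zip(lower_bounds, freqs):
        if f > 0 and below + f >= target:
            return lb + (target - below) / f * width
        below += f
    raise ValueError("p is above 100 or the table is empty")

# Final-grades table: lower class boundaries, frequencies, class width.
table = ([69.5, 74.5, 79.5, 84.5, 89.5], [4, 8, 12, 10, 6], 5)
q1, q3 = grouped_percentile(25, *table), grouped_percentile(75, *table)
p10, p90 = grouped_percentile(10, *table), grouped_percentile(90, *table)

quartile_deviation = (q3 - q1) / 2
k = quartile_deviation / (p90 - p10)
print(round(k, 4))
```

For comparison, this coefficient is about 0.263 for a normal distribution, so the value obtained for a data set is usually read against that benchmark.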
A. Leptokurtic
– a distribution having a relatively high peak.
B. Platykurtic
- a distribution which is flat-topped
C. Mesokurtic
- a distribution which is not very high peaked or very flat-topped.
SOLUTIONS:
Problem 1- 5
For the given data:
30 40 50 55 45 80 40 63 70 25
In Ascending Order:
25 30 40 40 45 50 55 63 70 80
Calculator technique:
To set the mode to Stat and Single-variable:
MODE→ 3:STAT → 1:1-VAR
To Input the data:
SHIFT →1(STAT) → 2:DATA
then INPUT the values (1ST Column)
Problem 6 – 8
For the given Frequency Distribution
Interval Freq. Class mark
10 – 19 6 14.5
20 – 29 7 24.5
30 – 39 8 34.5
40 – 49 10 44.5
50 – 59 9 54.5
s = 13.865 (computed with divisor n - 1; dividing by n, as in the formulas above, gives s = 13.691)
8. Find the variance
A. 192.24
B. 54.0225
C. 64
D. 1600
Variance = s² = (13.865)² = 192.24, so the answer is A.
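The values in this solution can be checked by expanding the table into individual class marks and using Python's standard `statistics` module. This is a verification sketch, not the method used in the notes: `stdev` and `variance` divide by n - 1, while `pstdev` divides by n.

```python
import statistics

marks = [14.5, 24.5, 34.5, 44.5, 54.5]   # class marks from the table
freqs = [6, 7, 8, 10, 9]                 # class frequencies

# Expand the table: repeat each class mark as many times as its frequency.
expanded = [cm for cm, f in zip(marks, freqs) for _ in range(f)]

s_sample = statistics.stdev(expanded)        # divisor n - 1
s_population = statistics.pstdev(expanded)   # divisor n
variance = statistics.variance(expanded)     # divisor n - 1

print(round(s_sample, 3), round(s_population, 3), round(variance, 2))
```

The rounded results are 13.865, 13.691 and 192.24, which confirms that the stated s and variance use the n - 1 divisor.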