ppt3 Descriptive2 (23-24)

DESCRIPTIVE
STATISTICS
FREQUENCY DISTRIBUTION
A frequency table lists categories (or classes) of
scores, along with counts (or frequencies) of the
number of scores that fall into each category.
The frequency for a particular class is the number of
original scores that fall into that class.
Types of frequency distribution:
1. Categorical
2. Ungrouped
3. Grouped
Categorical or Qualitative Frequency Distributions
A categorical frequency distribution represents data that can be
placed in specific categories, such as gender, hair color or
religious affiliation.
Example: The blood types of 25 blood donors are given below.
Summarize the data using a frequency distribution.
AB B A O B
O B O A O
B O B B B
A O AB AB O
A B AB O A
Represent the blood types as classes and the number of
occurrences of each blood type as frequencies. The
table (distribution) below summarizes the data.
Class (Blood type)   Frequency
A                     5
B                     8
O                     8
AB                    4
Total                25
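The tally above can be reproduced with a short Python sketch. It assumes the 25 blood types are typed in exactly as listed in the example; the variable names are illustrative only.

```python
from collections import Counter

# Blood types of the 25 donors, row by row as listed in the example.
blood_types = (
    "AB B A O B "
    "O B O A O "
    "B O B B B "
    "A O AB AB O "
    "A B AB O A"
).split()

# Counter maps each class (blood type) to its frequency.
frequency = Counter(blood_types)

print("Class  Frequency")
for blood_type in ("A", "B", "O", "AB"):
    print(f"{blood_type:<7}{frequency[blood_type]:>3}")
print(f"{'Total':<7}{sum(frequency.values()):>3}")
```

Each class is simply a distinct category, so no ordering or grouping rule is needed for categorical data.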
Quantitative Frequency Distribution – Ungrouped
An ungrouped frequency distribution simply lists the data
values with the corresponding number of times or frequency
count with which each value occurs.
Example: The following data represent the number of defects
observed each day over a 25-day period for a manufacturing
process. Summarize the information with a frequency
distribution.
Day 1 2 3 4 5 6 7 8 9 10 11 12 13
Defects 10 10 6 12 6 9 16 20 11 10 11 11 9
Day 14 15 16 17 18 19 20 21 22 23 24 25
Defects 12 11 7 10 11 14 21 12 6 10 11 6
Solution: The frequency distribution for the number of
defects is shown below:
Class (Defects) Frequency f

6 4
7 1
9 2
10 5
11 6
12 3
14 1
16 1
20 1
21 1
Total 25
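As a cross-check, the same ungrouped distribution can be built in Python. This is a minimal sketch that assumes the 25 daily defect counts are entered in day order; the names are illustrative.

```python
from collections import Counter

# Defects observed on days 1 to 25, in day order.
defects = [10, 10, 6, 12, 6, 9, 16, 20, 11, 10, 11, 11, 9,
           12, 11, 7, 10, 11, 14, 21, 12, 6, 10, 11, 6]

# Count how often each distinct value occurs, then list the values in ascending order.
frequency = Counter(defects)
table = sorted(frequency.items())

print("Class (Defects)  Frequency f")
for value, f in table:
    print(f"{value:>15}  {f:>11}")
print(f"{'Total':>15}  {sum(frequency.values()):>11}")
```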
Quantitative Frequency Distribution – Grouped
The table below shows IQ scores for two classes.
82 96 105 115 131

87 97 106 119 131
89 98 107 120 140
93 102 109 127 131
93 103 111 128 185
Frequency Distribution
FREQUENCY DISTRIBUTION OF IQ FOR TWO CLASSES

IQ FREQUENCY
80-89 3
90-99 5
100-109 6
110-119 3
120-129 3
140-149 2
150 AND OVER 1
Quantitative Frequency Distribution – Grouped
(RF = relative frequency, <CF = less than cumulative frequency, >CF = greater than cumulative frequency)
CLASS      TALLY                        FREQUENCY   CLASS       CLASS   RF      <CF   >CF
INTERVAL                                            BOUNDARY    MARK
5-9        IIII                              4      4.5-9.5       7     0.04      4   100
10-14      IIII-III                          8      9.5-14.5     12     0.08     12    96
15-19      IIII-IIII-IIII-II                17      14.5-19.5    17     0.17     29    88
20-24      IIII-IIII-IIII-IIII-IIII-I       26      19.5-24.5    22     0.26     55    71
25-29      IIII-IIII-IIII-IIII              20      24.5-29.5    27     0.20     75    45
30-34      IIII-IIII-IIII                   15      29.5-34.5    32     0.15     90    25
35-39      IIII-IIII                        10      34.5-39.5    37     0.10    100    10

Lower class limits are the smallest numbers that can
actually belong to the different classes.
Upper class limits are the largest numbers that can actually
belong to the different classes.
CLASS INTERVAL   LOWER CLASS LIMIT   UPPER CLASS LIMIT
5-9                      5                   9
10-14                   10                  14
15-19                   15                  19
20-24                   20                  24
25-29                   25                  29
30-34                   30                  34
35-39                   35                  39
Class boundaries are the numbers used to separate classes, but
without the gaps created by the class limits. They are obtained by
increasing the upper class limits and decreasing the lower class
limits by the same amount so that there are no gaps between
consecutive classes. The amount to be added or subtracted is
one-half the difference between the upper limit of one class and the
lower limit of the following class.
CLASS INTERVAL   FREQUENCY   CLASS BOUNDARY
5-9                   4      4.5-9.5
10-14                 8      9.5-14.5
15-19                17      14.5-19.5
20-24                26      19.5-24.5
25-29                20      24.5-29.5
30-34                15      29.5-34.5
35-39                10      34.5-39.5
Class boundary for the 1st class interval:
1) ½(lower limit of the following class – upper limit) = ½(10 – 9) = 0.5
2) Add to the upper limit: 9 + 0.5 = 9.5 (upper class boundary)
   Subtract from the lower limit: 5 – 0.5 = 4.5 (lower class boundary)
   Class boundary: 4.5-9.5
Class marks are the midpoints of the classes. They can be
found by adding the lower and upper class limits and dividing by 2.
$$\text{class mark} = \frac{\text{upper class limit} + \text{lower class limit}}{2} = \frac{5 + 9}{2} = 7$$
CLASS INTERVAL   FREQUENCY   CLASS BOUNDARY   CLASS MARK
5-9                   4      4.5-9.5               7
10-14                 8      9.5-14.5             12
15-19                17      14.5-19.5            17
20-24                26      19.5-24.5            22
25-29                20      24.5-29.5            27
30-34                15      29.5-34.5            32
35-39                10      34.5-39.5            37
Class width or Class size is the difference between two
consecutive lower class limits or two consecutive lower
class boundaries.
Class width / size
= difference of two consecutive upper or lower class limits = 10 – 5 = 5
= difference of two consecutive upper or lower class boundaries = 14.5 – 9.5 = 5
= difference of the upper and lower class boundaries of one class = 9.5 – 4.5 = 5
CLASS INTERVAL   FREQUENCY   CLASS BOUNDARY   CLASS MARK
5-9                   4      4.5-9.5               7
10-14                 8      9.5-14.5             12
15-19                17      14.5-19.5            17
20-24                26      19.5-24.5            22
25-29                20      24.5-29.5            27
30-34                15      29.5-34.5            32
35-39                10      34.5-39.5            37
Relative Frequency is the ratio of the class frequency to the total
frequency.
$$RF = \frac{\text{frequency}}{\text{total frequencies}} = \frac{4}{100} = 0.04$$
CLASS INTERVAL   FREQUENCY   RELATIVE FREQUENCY
5-9                   4      0.04
10-14                 8      0.08
15-19                17      0.17
20-24                26      0.26
25-29                20      0.20
30-34                15      0.15
35-39                10      0.10
Total               100
Cumulative Frequency is the accumulated frequency that is less than (<) or greater than (>) a stated
value. We obtain the greater than cumulative frequency (>CF) if the frequencies are summed from the
bottom up to find the number of observations greater than a specified lower class boundary. The less
than cumulative frequency (<CF) is constructed if the frequencies are summed from the top down to find
the number of observations less than a particular upper class boundary.
CLASS INTERVAL   TALLY                        FREQUENCY   <CF              >CF
5-9              IIII                              4        4              100
10-14            IIII-III                          8       12 (4+8)         96
15-19            IIII-IIII-IIII-II                17       29 (4+8+17)      88
20-24            IIII-IIII-IIII-IIII-IIII-I       26       55               71
25-29            IIII-IIII-IIII-IIII              20       75               45 (10+15+20)
30-34            IIII-IIII-IIII                   15       90               25 (10+15)
35-39            IIII-IIII                        10      100               10
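The derived columns of the grouped table (class boundary, class mark, class width, RF, <CF and >CF) follow mechanically from the class limits and frequencies. The sketch below is one way to compute them in Python; it assumes equal-width classes with the same gap between every pair of consecutive classes, and all names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the grouped table.
limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
freqs = [4, 8, 17, 26, 20, 15, 10]

n = sum(freqs)

# Half the gap between one class's upper limit and the next class's lower limit.
half_gap = (limits[1][0] - limits[0][1]) / 2           # (10 - 9) / 2 = 0.5

boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in limits]
marks = [(lo + hi) / 2 for lo, hi in limits]           # class midpoints
width = limits[1][0] - limits[0][0]                    # consecutive lower limits
rel_freq = [f / n for f in freqs]

# <CF sums from the top down; >CF sums from the bottom up.
less_cf = list(accumulate(freqs))
greater_cf = list(accumulate(reversed(freqs)))[::-1]

for row in zip(limits, freqs, boundaries, marks, rel_freq, less_cf, greater_cf):
    print(row)
```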

STEM AND LEAF PLOTS
Another simple way to display the distribution of a
quantitative data set is the stem and leaf plot. This
procedure was introduced by Tukey and is one of the
primary tools of exploratory data analysis. A stem and
leaf diagram consists of a series of horizontal rows of
numbers. The number used to label a row is called a
stem, and the remaining numbers in the row are called
leaves.
The table below shows IQ scores for two classes.
82 96 105 115 131
87 97 106 119 131
89 98 107 120 140
93 102 109 127 131
93 103 111 128 185
Stem and Leaf Plot of IQ for Two Classes
Stem | Leaf
   8 | 279
   9 | 3678
  10 | 235679
  11 | 159
  12 | 078
  13 | 1
  14 | 0
  15 |
  16 | 2
Steps:
1. Divide each measurement into two parts: the stem and
the leaf.
2. List the stems in a column, with a vertical line to their
right.
3. For each measurement, record the leaf portion in the
same row as its corresponding stem.
4. Order the leaves from the lowest to highest in each
stem.
5. Provide a key to your stem and leaf coding so that the
reader can recreate the actual measurements if
necessary.
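The steps above can be sketched in Python. This is an illustrative helper, not a standard library routine: it assumes two-digit whole numbers, so the tens digit is the stem and the units digit is the leaf. The sample values are the 15 practice times (in minutes) used in the central tendency examples of this deck.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves}, using tens digit as stem and units digit as leaf."""
    rows = defaultdict(list)
    for v in sorted(values):            # sorting first keeps each row's leaves in order
        stem, leaf = divmod(v, 10)      # step 1: split into stem and leaf
        rows[stem].append(leaf)         # step 3: record the leaf next to its stem
    return dict(rows)

minutes = [15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39]
plot = stem_and_leaf(minutes)

# Step 2: list every stem in a column with a vertical line to its right.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")

# Step 5: provide a key.
print("Key: 2 | 5 means 25")
```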
Sometimes the available stem choices result in a plot
that contains too few stems and a large number of
leaves within each stem. In this situation, you can
stretch the stems by dividing each one into several
lines, depending on the leaf values assigned to them.
Stems are usually divided in one of two ways:
- Into two lines, with leaves 0-4 in the first line
  and leaves 5-9 in the second line.
- Into five lines, with leaves 0-1, 2-3, 4-5, 6-7,
  and 8-9 in the five lines respectively.
STEM AND LEAF DIAGRAM (1 line)
STEM       LEAF                       FREQUENCY
1 (0-9)    99                              2
2 (0-9)    01123445555566777778899        23
3 (0-9)    00114                           5
STEM AND LEAF DIAGRAM (2 lines)
STEM       LEAF                       FREQUENCY
1 (5-9)    99                              2
2 (0-4)    0112344                         7
2 (5-9)    5555566777778899               16
3 (0-4)    00114                           5
STEM AND LEAF DIAGRAM (5 lines)
STEM       LEAF                       FREQUENCY
1 (8-9)    99                              2
2 (0-1)    011                             3
2 (2-3)    23                              2
2 (4-5)    4455555                         7
2 (6-7)    6677777                         7
2 (8-9)    8899                            4
3 (0-1)    0011                            4
3 (2-3)                                    0
3 (4-5)    4                               1
GRAPHICAL REPRESENTATION OF FREQUENCY DISTRIBUTION
a. HISTOGRAM
The histogram is a set of vertical bars having their bases
on the horizontal axis, centered on the class marks.
The width of each bar corresponds to the class width and the
height corresponds to the class frequency.
A histogram differs from a bar chart in that the bases of each
bar are the class boundaries rather than the class limits.
[Figure: Histogram of IQ Scores for Two Classes. Horizontal axis: IQ; vertical axis: Frequency.]

[Figure: FREQUENCY HISTOGRAM. Horizontal axis: CLASS BOUNDARY (4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5); vertical axis: FREQUENCY (bar heights 4, 8, 17, 26, 20, 15, 10).]
b. FREQUENCY POLYGON
The frequency polygon is a modification of the histogram: it is a line graph in
which the class frequencies are plotted against the class marks. To close the
polygon, an extra class mark at each end must be added.
The frequency polygon can also be obtained by connecting the midpoints of the
tops of the rectangles in the histogram.
[Figure: Frequency polygon. Horizontal axis: CLASS MARK; vertical axis: FREQUENCY.]
c. OGIVES
A line graph showing the cumulative frequency of a distribution is called an ogive. For the “less than”
ogive, the “less than” cumulative frequencies are plotted against the upper class boundaries. For the
“greater than” ogive, the “greater than” cumulative frequencies are plotted directly above the lower
class boundaries. These graphs are useful in estimating the number of observations that are less than
or more than a specified value.
CLASS BOUNDARY   CLASS MARK   RF     <CF   >CF
4.5-9.5               7       0.04     4   100
9.5-14.5             12       0.08    12    96
14.5-19.5            17       0.17    29    88
19.5-24.5            22       0.26    55    71
24.5-29.5            27       0.20    75    45
29.5-34.5            32       0.15    90    25
34.5-39.5            37       0.10   100    10
[Figure: Ogive (<OG ...). Horizontal axis: class boundary (4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5); vertical axis: cumulative frequency (0 to 120).]

Other shape: SKEWED TO THE RIGHT
[Figure: right-skewed distribution. Horizontal axis: CLASS BOUNDARY; vertical axis: FREQUENCY (0 to 80).]
Other shape: SKEWED TO THE LEFT
[Figure: left-skewed distribution. Horizontal axis: CLASS BOUNDARY (0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5); vertical axis: FREQUENCY (0 to 80).]
Other shape: PLATYKURTIC
[Figure: platykurtic distribution. Horizontal axis: CLASS BOUNDARY; vertical axis: FREQUENCY.]
Other shape: LEPTOKURTIC
[Figure: leptokurtic distribution. Horizontal axis: CLASS BOUNDARY; vertical axis: FREQUENCY.]
DESCRIPTIVE STATISTICS
- Measures of Location
- Measures of Variability
- Measures of Shape
MEASURES OF CENTRAL TENDENCY (location)
A measure of central tendency gives a single value that acts as a
representative average of the values of all the outcomes of your
experiment. Three parameters that measure the center of the
distribution in some sense are of interest. These parameters are called the
population mean, the population median and the population mode.
Central Tendency refers to the Middle of the Distribution
A. THE MEAN
For Ungrouped Data:
Let x1, x2, x3, ..., xn be n observations of a random variable X. The sample mean,
denoted by $\bar{x}$, is the arithmetic average of these values. That is,
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
for population mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
for sample mean
For Grouped Data
$$\mu \text{ or } \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
Where: fi is the frequency of class interval i
xi is the class midpoint of class interval i
B. THE MEDIAN
For Ungrouped Data:
Let x1, x2, x3, ..., xn be sample observations arranged in order from smallest to
largest. The sample median for this collection is given by the middle observation if n is odd.
If n is even, the sample median is the average of the two middle observations.
$$\tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \dfrac{x_{(n/2)} + x_{(n/2)+1}}{2} & \text{if } n \text{ is even} \end{cases}$$
For Grouped Data:
When the data are grouped into a frequency distribution, the median is
obtained by finding the cell that has the middle number and then
interpolating within the cell.
$$\tilde{x} = Lb_i + \frac{\frac{n}{2} - {<}cf_{i-1}}{f_i}\,(\text{class size})$$
$$\tilde{x} = Ub_i - \frac{\frac{n}{2} - {>}cf_{i-1}}{f_i}\,(\text{class size})$$
where:
Lbi = lower class boundary of the interpolated interval
Ubi = upper class boundary of the interpolated interval
<cfi-1 = less than cumulative frequency of the class before the interpolated
interval
>cfi-1 = greater than cumulative frequency of the class before the
interpolated interval, counting from the bottom up (the next higher class)
fi = frequency of the interpolated interval
i = interpolated interval
n = number of data points
C. THE MODE
The last measure of central tendency is the mode. For a finite
population, the population mode is the value of X that occurs most
often. The mode of a sample is the value that occurs most often in
the sample. The drawback to this measure is that there might not be
a unique mode. There might be no single number that occurs more
often than any other. For this reason, the mode is not a particularly
useful descriptive measure.
When the data are grouped into a frequency distribution, the midpoint
of the cell with the highest frequency is the mode, since this point
represents the highest point (greatest frequency).
For Grouped Data:
$$\text{Mode} = L_B + \frac{d_1}{d_1 + d_2}\,(\text{class size})$$
L_B = lower boundary of the modal class
Modal class = the category containing the highest frequency
d_1 = difference between the frequency of the modal class and the frequency above it when the
scores are arranged from lowest to highest
d_2 = difference between the frequency of the modal class and the frequency below it when the
scores are arranged from lowest to highest
EXAMPLES:
1. A high school teacher at a small private school assigns trigonometry practice problems to be
worked via the net. Students must use a password to access the problems and the times of log-in
and log-off are automatically recorded for the teacher. At the end of the week, the teacher
examines the amount of time each student spent working the assigned problems. The data are
provided below in minutes.
Data 15 28 25 48 22 43 49 34 22
33 27 25 22 20 39
Mean:
$$\bar{x} = \frac{15 + 28 + 25 + 48 + 22 + 43 + 49 + 34 + 22 + 33 + 27 + 25 + 22 + 20 + 39}{15} = \frac{452}{15} = 30.13$$
Median:
Sorted data: 15 20 22 22 22 25 25 27 28 33 34 39 43 48 49
$\tilde{x}$ = 27 (the 8th of the 15 sorted observations)
Mode:
$\hat{x}$ = 22
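The three measures for this data set can be checked with Python's standard `statistics` module. This is only a verification sketch; `multimode` is used so that ties for the most frequent value would all be reported.

```python
import statistics

# Minutes spent on the practice problems by the 15 students.
minutes = [15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39]

mean = statistics.mean(minutes)          # sum of the values divided by n
median = statistics.median(minutes)      # middle value of the sorted data (n is odd)
modes = statistics.multimode(minutes)    # every value tied for the highest count

print(f"mean = {mean:.2f}, median = {median}, mode(s) = {modes}")
```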
2. The number of television viewing hours per household
and the prime viewing times are two factors that affect
television advertising income. A random sample of 30
households in a particular viewing area produced the
following estimates of viewing hours per household.
3.0 6.0 7.0 15.0 12.0 6.1
6.5 8.0 4.0 5.0 6.0 7.3
5.0 12.0 1.0 3.5 3.0 5.4
7.5 5.0 10.0 8.0 3.5 8.3
9.0 2.0 6.5 1.0 5.0 8.5
Find the mean, median and mode

3. The frequency table below represents the final examination scores for
a statistics course. Find the population mean, the population
median and the population mode.
Class Interval   Frequency   Class mark   Cumulative Frequency (<CF)
10 – 19               3        14.5           3
20 – 29               2        24.5           5
30 – 39               3        34.5           8
40 – 49               4        44.5          12
50 – 59               5        54.5          17
60 – 69              11        64.5          28
70 – 79              14        74.5          42
80 – 89              14        84.5          56
90 – 99               4        94.5          60
$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i}$$
$$\text{Mean} = \frac{(3)(14.5) + (2)(24.5) + (3)(34.5) + (4)(44.5) + (5)(54.5) + (11)(64.5) + (14)(74.5) + (14)(84.5) + (4)(94.5)}{3 + 2 + 3 + 4 + 5 + 11 + 14 + 14 + 4} = \frac{3960}{60}$$
Mean = 66
$$\text{Median} = Lb + \frac{\frac{n}{2} - {<}cf_{i-1}}{f_i}\,(\text{class size})$$
$$\text{Median} = 69.5 + \frac{\frac{60}{2} - 28}{14}\,(10)$$
Median = 70.93
Mode = class mark with the highest frequency
Mode = 74.5 and 84.5 (two modal classes, each with frequency 14)
Using the formula for grouped data:
Modal Class 1 (70 – 79): d1 = 14 – 11 = 3, d2 = 14 – 14 = 0
$$\text{Mode} = 69.5 + \frac{3}{3 + 0}\,(10) = 79.5$$
Modal Class 2 (80 – 89): d1 = 14 – 14 = 0, d2 = 14 – 4 = 10
$$\text{Mode} = 79.5 + \frac{0}{0 + 10}\,(10) = 79.5$$
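The grouped-data formulas used in this example can be sketched as small Python functions. The function names are illustrative, and the sketch assumes equal class widths and lower class boundaries listed from lowest to highest. When a modal class sits between two classes of the same frequency (d1 + d2 = 0) the interpolation formula is undefined, so the sketch falls back to the class midpoint as an assumption.

```python
def grouped_mean(lower_bounds, freqs, width):
    """Sum of f_i * x_i over the sum of f_i, with x_i the class mark."""
    marks = [lb + width / 2 for lb in lower_bounds]
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

def grouped_median(lower_bounds, freqs, width):
    """Lb + ((n/2 - <cf of the previous class) / f) * class size."""
    half = sum(freqs) / 2
    cf_before = 0
    for lb, f in zip(lower_bounds, freqs):
        if cf_before + f >= half:        # this class contains the middle observation
            return lb + (half - cf_before) / f * width
        cf_before += f

def grouped_modes(lower_bounds, freqs, width):
    """Lb + (d1 / (d1 + d2)) * class size for every class with the highest frequency."""
    top = max(freqs)
    padded = [0] + list(freqs) + [0]     # treat missing neighbours as frequency 0
    modes = []
    for i, (lb, f) in enumerate(zip(lower_bounds, freqs)):
        if f == top:
            d1 = f - padded[i]           # modal frequency minus the preceding class
            d2 = f - padded[i + 2]       # modal frequency minus the following class
            if d1 + d2 == 0:             # assumption: fall back to the class midpoint
                modes.append(lb + width / 2)
            else:
                modes.append(lb + d1 / (d1 + d2) * width)
    return modes

# Final examination table: lower class boundaries, frequencies, class size 10.
lower_bounds = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]

mean = grouped_mean(lower_bounds, freqs, 10)
median = grouped_median(lower_bounds, freqs, 10)
modes = grouped_modes(lower_bounds, freqs, 10)
print(mean, round(median, 2), modes)
```

Both modal classes give the same interpolated value here because they share the boundary 79.5.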
4. Find the sample mean, sample median and sample mode
CLASS INTERVAL   FREQUENCY   CLASS BOUNDARY   CLASS MARK   <CF   >CF
5-9                   4      4.5-9.5               7          4   100
10-14                 8      9.5-14.5             12         12    96
15-19                17      14.5-19.5            17         29    88
20-24                26      19.5-24.5            22         55    71
25-29                20      24.5-29.5            27         75    45
30-34                15      29.5-34.5            32         90    25
35-39                10      34.5-39.5            37        100    10

Summary of when to use the mean, median and mode
Type of Variable                Best measure of central tendency
Nominal                         Mode
Ordinal                         Median
Interval/Ratio (not skewed)     Mean
Interval/Ratio (skewed)         Median

Piece of Advice
MEASURES OF VARIABILITY
Refers to the extent of scatter or
dispersion around the zone of central tendency
Variability is about the Spread
A. RANGE
One measure of variation is the range, which has the advantage of being
very easy to compute. The range, R, of a set of n measurements is defined as the
difference between the largest and smallest measurements.
Formula:
Range = Highest score – Lowest Score or R = (H – L)
B. VARIANCE and STANDARD DEVIATION
The variance of a population of N measurements is defined to be the average of the
squares of the deviations of the measurements about their mean μ. The
population variance is denoted by σ² and is given by the formula
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
For ungrouped data
$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{\sum_{i=1}^{k} f_i}$$
For grouped data
The variance of a sample of n measurements is defined to be the sum of the
squared deviations of the measurements about their mean $\bar{x}$ divided by (n-1).
The sample variance is denoted by s² and is given by the formula
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
For ungrouped data
$$s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\left(\sum_{i=1}^{k} f_i\right) - 1}$$
For grouped data
The standard deviation, in essence, represents the “average amount of
variability” in a set of measures, using the mean as a reference point. Strictly
speaking, the standard deviation is the positive square root of the average of
the squared deviations about the mean, or the positive square root of the
variance. The standard deviation is basically a measure of how far each score,
on the average, is from the mean.
1. A high school teacher at a small private school assigns trigonometry practice problems to be
worked via the net. Students must use a password to access the problems and the times of log-in
and log-off are automatically recorded for the teacher. At the end of the week, the teacher
examines the amount of time each student spent working the assigned problems. The data are
provided below in minutes.
Data 15 28 25 48 22 43 49 34
22 33 27 25 22 20 39
Calculate the range, variance and standard deviation.
Range = HV – LV
= 49 – 15
= 34
$$s^2 \text{ (sample variance)} = \frac{\sum (x - \bar{x})^2}{n - 1}$$
$$s^2 = \frac{(15-30.13)^2 + (28-30.13)^2 + (25-30.13)^2 + (48-30.13)^2 + (22-30.13)^2 + (43-30.13)^2 + (49-30.13)^2 + (34-30.13)^2 + (22-30.13)^2 + (33-30.13)^2 + (27-30.13)^2 + (25-30.13)^2 + (22-30.13)^2 + (20-30.13)^2 + (39-30.13)^2}{15 - 1}$$
= 109.9809524
$$s = \sqrt{s^2} \text{ (sample standard deviation)}$$
= 10.48718038
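A quick check of these three measures in Python, using the standard `statistics` module (its `variance` and `stdev` functions divide by n - 1, which matches the sample formulas). The variable names are illustrative.

```python
import statistics

# Minutes spent on the practice problems by the 15 students.
minutes = [15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39]

data_range = max(minutes) - min(minutes)         # highest value minus lowest value
sample_variance = statistics.variance(minutes)   # squared deviations summed, divided by n - 1
sample_sd = statistics.stdev(minutes)            # positive square root of the sample variance

print(data_range, sample_variance, sample_sd)
```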
2. The frequency table below represents the final examination scores for
a statistics course. Find the population range, population variance and
population standard deviation.
Class Interval   Frequency   Class mark   Cumulative Frequency
10 – 19               3        14.5           3
20 – 29               2        24.5           5
30 – 39               3        34.5           8
40 – 49               4        44.5          12
50 – 59               5        54.5          17
60 – 69              11        64.5          28
70 – 79              14        74.5          42
80 – 89              14        84.5          56
90 – 99               4        94.5          60
Range = Highest Upper Class Boundary - Smallest Lower Class Boundary
= 99.5 – 9.5
= 90
$$\sigma^2 = \frac{\sum f (x - \mu)^2}{\sum f}$$
$$\sigma^2 = \frac{3(14.5-66)^2 + 2(24.5-66)^2 + 3(34.5-66)^2 + 4(44.5-66)^2 + 5(54.5-66)^2 + 11(64.5-66)^2 + 14(74.5-66)^2 + 14(84.5-66)^2 + 4(94.5-66)^2}{60}$$
= 432.75
σ = 20.80264406 or 20.80
Fx-991E plus
1. MODE, 3:STAT, 1: 1-VAR
2. INPUT DATA
3. AC
4. SHIFT, 1-STAT, 4:VAR
Choices:
1: n (# of data points)
2: mean (sample/population mean)
3: population standard deviation
4: sample standard deviation
Fx-991EX
1. MAIN MENU: PRESS 6
2. SELECT 1: 1-VARIABLE
3. INPUT DATA
4. PRESS OPTN, 3: (1-VARIABLE CALC)
SHIFT, MODE, REPLAY(DOWN), 4:STAT

FREQUENCY
1: ON
2: OFF
Influence of Distribution Shape
Measures of Shape
- refer to the visual characteristics of a certain
  distribution.
- knowledge of the shape of the distribution can
  help in concluding whether the distribution is
  normal or not
Two (2) Principal Measures of Shape
SKEWNESS
KURTOSIS
Measures of Shape
Skewness
refers to the symmetry of a distribution. A distribution
which is not symmetric with respect to its mean can be
termed either positively skewed or negatively skewed.
Kurtosis
refers to the flatness or peakedness of a particular
distribution.
Skewness
$$SK = \frac{\sum \left[ (X_i - \mu)/\sigma \right]^3}{N}$$
where:
Xi - individual reading
σ - standard deviation
μ - mean
N - population size
SK = 0: Symmetric (Normal)
SK > 0: Positively Skewed
SK < 0: Negatively Skewed
Skewness relating to central tendency
negative skew: The left tail is longer than the right tail. It
has relatively few low values. The distribution is said to
be left-skewed or "skewed to the left". Example
(observations): 1, 1000, 1001, 1002, 1003.
positive skew: The right tail is longer than the left tail. It has
relatively few high values. The distribution is said to be
right-skewed or "skewed to the right". Example
(observations): 1, 2, 3, 4, 100.
The skewness for a normal distribution is zero, and any
symmetric data should have a skewness near zero.
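The sign of SK for the two example data sets can be confirmed with a short Python sketch of the population skewness formula (the mean of the cubed standardized deviations). The helper name and the extra symmetric data set are illustrative assumptions.

```python
import statistics

def skewness(values):
    """Population skewness: sum of ((x - mu) / sigma)^3 divided by N."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)    # population standard deviation
    return sum(((x - mu) / sigma) ** 3 for x in values) / len(values)

left_skewed = [1, 1000, 1001, 1002, 1003]    # one unusually low value
right_skewed = [1, 2, 3, 4, 100]             # one unusually high value
symmetric = [1, 2, 3, 4, 5]                  # illustrative symmetric data

for name, data in [("left", left_skewed), ("right", right_skewed), ("symmetric", symmetric)]:
    print(f"{name:<10} SK = {skewness(data):+.3f}")
```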
Kurtosis
$$k = \frac{\sum \left[ (X_i - \mu)/\sigma \right]^4}{N}$$
where:
Xi - individual reading
σ - standard deviation
μ - mean
N - population size
k = 3: MesoKurtic (Normal)
k > 3: LeptoKurtic
k < 3: PlatyKurtic
A platykurtic data set has a flatter peak around its mean, which causes thin
tails within the distribution. The flatness results from the data being
less concentrated around its mean, due to large variations within
observations.
Mesokurtic is a term used in a statistical context where the kurtosis of a
distribution is similar, or identical, to the kurtosis of a normally
distributed data set.
Leptokurtic distributions have higher peaks around the mean compared to
normal distributions, which leads to thick tails on both sides. These
peaks result from the data being highly concentrated around the mean,
due to lower variations within observations.
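The kurtosis formula can be sketched the same way. The two data sets below are hypothetical and were chosen only to land on either side of k = 3; they do not come from the slides.

```python
import statistics

def kurtosis(values):
    """Population kurtosis: sum of ((x - mu) / sigma)^4 divided by N."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)    # population standard deviation
    return sum(((x - mu) / sigma) ** 4 for x in values) / len(values)

# Hypothetical data: evenly spread values (flat shape).
flat = [1, 2, 3, 4, 5, 6]
# Hypothetical data: most values at the mean, plus two far-out values (peaked, heavy tails).
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]

print(f"flat:   k = {kurtosis(flat):.3f}")     # below 3, platykurtic
print(f"peaked: k = {kurtosis(peaked):.3f}")   # above 3, leptokurtic
```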
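SUMMARY:
CENTRAL TENDENCY
UNGROUPED DATA
MEAN:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
MEDIAN:
$$\tilde{x} = x_{(n+1)/2} \ \text{(ODD)} \qquad \tilde{x} = \frac{x_{(n/2)} + x_{(n/2)+1}}{2} \ \text{(EVEN)}$$
MODE: value of X that occurs most often
GROUPED DATA
MEAN:
$$\mu \text{ or } \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
MEDIAN:
$$\tilde{x} = Lb_i + \frac{\frac{n}{2} - {<}cf_{i-1}}{f_i}\,(CS)$$
MODE: the midpoint of the cell with the highest frequency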
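VARIABILITY
UNGROUPED DATA
RANGE: R = HV - LV
VARIANCE:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \ \text{(Population variance)} \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \ \text{(Sample variance)}$$
STANDARD DEVIATION:
$$\sigma = \sqrt{\sigma^2} \qquad s = \sqrt{s^2}$$
GROUPED DATA
RANGE: R = Highest Upper Class Boundary – Lowest Lower Class Boundary
VARIANCE:
$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{\sum_{i=1}^{k} f_i} \ \text{(Population variance)} \qquad s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\left(\sum_{i=1}^{k} f_i\right) - 1} \ \text{(Sample variance)}$$
STANDARD DEVIATION:
$$\sigma = \sqrt{\sigma^2} \qquad s = \sqrt{s^2}$$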
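SUMMARY:
SHAPE
SKEWNESS
SK = 0, normal
SK > 0, positively skewed
SK < 0, negatively skewed
UNGROUPED DATA
$$SK = \frac{\sum_{i=1}^{N} (x_i - \mu)^3}{N \sigma^3} \ \text{(POPULATION)} \qquad SK = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3} \ \text{(SAMPLE)}$$
GROUPED DATA
$$SK = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^3}{\left(\sum_{i=1}^{k} f_i\right) \sigma^3} \ \text{(POPULATION)} \qquad SK = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^3}{\left(\sum_{i=1}^{k} f_i\right) s^3} \ \text{(SAMPLE)}$$
KURTOSIS
K = 3, mesokurtic
K > 3, leptokurtic
K < 3, platykurtic
UNGROUPED DATA
$$K = \frac{\sum_{i=1}^{N} (x_i - \mu)^4}{N \sigma^4} \ \text{(POPULATION)} \qquad K = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} \ \text{(SAMPLE)}$$
GROUPED DATA
$$K = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^4}{\left(\sum_{i=1}^{k} f_i\right) \sigma^4} \ \text{(POPULATION)} \qquad K = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^4}{\left(\sum_{i=1}^{k} f_i\right) s^4} \ \text{(SAMPLE)}$$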
PRACTICAL SIGNIFICANCE OF THE
STANDARD DEVIATION
A. TCHEBYSHEFF’S (CHEBYSHEV) THEOREM

Tchebysheff’s theorem applies to any set of measurements
and can be used to describe either a sample or a
population. The idea involved in this theorem is illustrated
below. An interval is constructed by measuring a distance kσ
on either side of the mean μ. Note that the theorem is true
for any number we choose for k, as long as it is greater than or
equal to 1. Then at least 1 – (1/k²) of the total number of n
measurements lie within the constructed interval.
The theorem states that:
- At least none of the measurements lie in the
  interval μ-σ to μ+σ.
- At least ¾ of the measurements lie in the
  interval μ-2σ to μ+2σ.
- At least 8/9 of the measurements lie in the
  interval μ-3σ to μ+3σ.
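The three statements above are the bound 1 – (1/k²) evaluated at k = 1, 2 and 3. A minimal Python sketch, with an illustrative function name, makes the pattern explicit:

```python
def chebyshev_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the theorem is stated for k >= 1")
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(f"k = {k}: at least {chebyshev_bound(k):.4f} of the measurements")
```

The bound holds for any distribution, which is why it is much weaker than the percentages given by a bell-shaped model.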
B. EMPIRICAL RULE
Another rule helpful in interpreting a value for a
standard deviation is the Empirical rule, which
applies to a data set having a distribution that is
approximately bell-shaped. The empirical rule is
often stated in abbreviated form, sometimes
called the 68-95-99.7 rule.
[Figure: bell-shaped curve illustrating the 68-95-99.7 rule; the outer bands on each side are labeled 2.35% and 0.15%.]
Examples:
1. Let X be the number of screws delivered to a box by an automatic filling
device. Assume µ = 1,000, σ² = 25. There are problems with too many
(giving away free product) or too few (potentially irritated customers)
screws in a box.
a) How many σ-units to the right of µ is 1009?
b) What X value is 2.6 σ-units to the left of µ?
c) Use Chebyshev’s inequality to find a bound on P[994 < X < 1006].
Solution (σ = √25 = 5):
a) Limit(X) = µ ± kσ
$$k = \frac{\text{limit} - \mu}{\sigma} = \frac{1009 - 1000}{5} = 1.8$$
Then, X = 1009 is 1.8 standard deviations to the right of µ.
b) Limit(X) = µ ± kσ
X = 1000 – 2.6(5) = 987
Then, X = 987 is 2.6 standard deviations to the left of µ.
c) To solve for k:
$$k = \frac{\text{limit} - \mu}{\sigma} = \frac{994 - 1000}{5} = -1.2$$
Area = 1 – 1/k²
Area = 1 – 1/(-1.2)²
= 0.3056
P(994 < X < 1006) ≥ 0.3056
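The arithmetic of this example can be traced in Python. The variable names are illustrative; the only inputs are µ = 1000 and σ² = 25 from the problem statement.

```python
import math

mu = 1000
sigma = math.sqrt(25)                # sigma squared is 25, so sigma is 5

# a) number of sigma-units between 1009 and the mean
k_a = (1009 - mu) / sigma

# b) the X value 2.6 sigma-units to the left of the mean
x_b = mu - 2.6 * sigma

# c) Chebyshev bound for 994 < X < 1006, an interval of mu +/- k*sigma
k_c = (1006 - mu) / sigma
bound_c = 1 - 1 / k_c ** 2

print(k_a, x_b, k_c, round(bound_c, 4))
```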
2. The mean life of a certain brand of auto batteries is 44
months with a standard deviation of three months. Assume
that the lives of all auto batteries of this brand have a
bell-shaped distribution. Using the empirical rule, find the
percentage of auto batteries of this brand that have a life
of
a. 41 to 47 months b. 41 to 50 months c. 35 to 53 months
[Figure: bell-shaped curve over 35, 38, 41, 44, 47, 50 and 53 months; areas of the bands from left to right: 0.0015, 0.0235, 0.135, 0.34, 0.34, 0.135, 0.0235, 0.0015.]
a) About 68% (0.34 + 0.34) of battery lives fall between 41 and 47 months.
b) About 81.5% (0.34 + 0.34 + 0.135) of battery lives fall between 41 and 50 months.
c) About 99.7% of battery lives fall between 35 and 53 months.
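The band-adding used in these answers can be sketched in Python. The helper below is illustrative: it assumes both endpoints sit exactly a whole number of standard deviations from the mean (between -3 and +3) and uses the rounded band areas shown in the figure.

```python
# Approximate area of each one-sigma band of a bell-shaped curve,
# from mu - 3*sigma up to mu + 3*sigma (values taken from the figure).
BANDS = [0.0235, 0.135, 0.34, 0.34, 0.135, 0.0235]

def empirical_share(mu, sigma, low, high):
    """Approximate share of values between low and high under the empirical rule."""
    start = round((low - mu) / sigma) + 3    # index of the first band to include
    stop = round((high - mu) / sigma) + 3    # index just past the last band
    return sum(BANDS[start:stop])

mu, sigma = 44, 3
for low, high in [(41, 47), (41, 50), (35, 53)]:
    share = empirical_share(mu, sigma, low, high)
    print(f"{low} to {high} months: about {share * 100:.1f}%")
```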
