PART II
Q. A. 1
Prepared By
Kunal Mojidra
Unit 1, Part II, QA 1, KUNAL
UNGROUPED VERSUS GROUPED DATA
Ungrouped data
• have not been summarized in any way
• are also called raw data
Grouped data
• have been organized into a frequency
distribution
EXAMPLE OF UNGROUPED DATA
42 30 53 50 52 30 55 49 61 74
26 58 40 40 28 36 30 33 31 37
32 37 30 32 23 32 58 43 30 29
34 50 47 31 35 26 64 46 40 43
57 30 49 40 25 50 52 32 60 54
Ages of a Sample of Doctors in Gandhinagar
FREQUENCY DISTRIBUTION OF AGES OF A
SAMPLE OF DOCTORS IN GANDHINAGAR
Class Interval | Frequency
20-under 30 | 6
30-under 40 | 18
40-under 50 | 11
50-under 60 | 11
60-under 70 | 3
70-under 80 | 1
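The frequency table above can be tallied from the raw ages with a short count. This is a minimal Python sketch, not the author's method. It assumes half-open classes, so "20-under 30" means 20 ≤ age < 30, starting at 20 with width 10. The helper name `frequency_distribution` is illustrative.

```python
ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

def frequency_distribution(values, start, width, num_classes):
    """Count values in half-open classes [start, start + width), ... ("a-under b")."""
    counts = [0] * num_classes
    for v in values:
        index = (v - start) // width          # which class the value falls in
        if 0 <= index < num_classes:
            counts[index] += 1
    return counts

freqs = frequency_distribution(ages, start=20, width=10, num_classes=6)
for k, f in enumerate(freqs):
    lower = 20 + 10 * k
    print(f"{lower}-under {lower + 10}: {f}")
```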
DATA RANGE
Smallest = 23, Largest = 74
$$\text{Range} = \text{Largest} - \text{Smallest} = 74 - 23 = 51$$
NUMBER OF CLASSES AND CLASS WIDTH
The number of classes should be between 5 and 15.
• Fewer than 5 classes cause excessive
summarization.
• More than 15 classes leave too much detail.
Class Width
• Divide the range by the number of classes for an
approximate class width
• Round up to a convenient number
$$\text{Approximate Class Width} = \frac{\text{Range}}{\text{No. of Classes}} = \frac{51}{6} = 8.5$$
$$\text{Class Width} = 10$$
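The range and class-width arithmetic can be checked in a few lines. This sketch interprets "round up to a convenient number" as rounding up to the next multiple of 5. That is an assumption: other conventions, such as the next whole number or the next multiple of 10, are equally reasonable.

```python
import math

ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

num_classes = 6
data_range = max(ages) - min(ages)          # largest - smallest
approx_width = data_range / num_classes     # approximate class width
# Assumed meaning of "convenient number": the next multiple of 5.
class_width = math.ceil(approx_width / 5) * 5

print(data_range, approx_width, class_width)  # 51 8.5 10
```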
CLASS MIDPOINT
$$\text{Class Midpoint} = \frac{\text{beginning class endpoint} + \text{ending class endpoint}}{2} = \frac{30 + 40}{2} = 35$$
$$\text{Class Midpoint} = \text{class beginning point} + \frac{1}{2}(\text{class width}) = 30 + \frac{1}{2}(10) = 35$$
RELATIVE AND CUMULATIVE FREQUENCY
Relative frequency is the proportion of the
total frequency that is in any given class interval
in a frequency distribution.
Relative frequency is the individual class
frequency divided by the total frequency.
The cumulative frequency is a running total
of frequencies through the classes of a frequency
distribution.
The cumulative frequency for each class interval
is the frequency for that class interval added to
the preceding cumulative total.
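Relative and cumulative frequencies for the doctors' ages table can be computed directly from the class frequencies. This minimal sketch uses only the standard library; `itertools.accumulate` provides the running total.

```python
from itertools import accumulate

classes = ["20-under 30", "30-under 40", "40-under 50",
           "50-under 60", "60-under 70", "70-under 80"]
freqs = [6, 18, 11, 11, 3, 1]

total = sum(freqs)
relative = [f / total for f in freqs]      # class frequency / total frequency
cumulative = list(accumulate(freqs))       # running total through the classes

for c, f, r, cf in zip(classes, freqs, relative, cumulative):
    print(f"{c:<12} {f:>3} {r:>6.2f} {cf:>4}")
```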
EXAMPLE – 1
QUANTITATIVE DATA GRAPHS
One of the most effective mechanisms for presenting
data in a form meaningful to decision makers is
graphical depiction.
Through graphs and charts, the decision maker can
often get an overall picture of the data and reach some
useful conclusions merely by studying the chart or
graph.
Converting data to graphics can be creative and artful.
Often the most difficult step in this process is to reduce
important and sometimes expensive data to a graphic
picture that is both clear and concise and yet consistent
with the message of the original data.
One of the most important uses of graphical depiction
in statistics is to help the researcher determine the
shape of a distribution.
QUANTITATIVE DATA GRAPHS
Quantitative data graphs are plotted along a
numerical scale, and qualitative graphs are
plotted using nonnumerical categories.
Quantitative data graphs:
• histogram
• frequency polygon
• ogive
• dot plot
• stem-and-leaf plot
EXAMPLE 2
HISTOGRAMS
One of the more widely used types of graphs for
quantitative data is the histogram.
A histogram is a series of contiguous bars or
rectangles that represent the frequency of data in
given class intervals.
If the class intervals used along the horizontal
axis are equal, then the height of each bar
represents the frequency of values in that class
interval.
HISTOGRAM
If the class intervals are unequal, then the areas
of the bars (rectangles) can be used for relative
comparisons of class frequencies.
A histogram is a useful tool for differentiating the
frequencies of class intervals.
A quick glance at a histogram reveals which class
intervals produce the highest frequency totals.
HISTOGRAM
FREQUENCY POLYGONS
A frequency polygon, like the histogram, is a
graphical display of class frequencies.
However, instead of using bars or rectangles like
a histogram, in a frequency polygon each class
frequency is plotted as a dot at the class
midpoint, and the dots are connected by a series
of line segments.
Construction of a frequency polygon begins by
scaling class midpoints along the horizontal axis
and the frequency scale along the vertical axis.
A dot is plotted for the associated frequency value
at each class midpoint.
Connecting these midpoint dots completes the
graph.
FREQUENCY POLYGONS
OGIVES
An ogive (ojive) is a cumulative frequency polygon.
Construction begins by labeling the x-axis with the
class endpoints and the y-axis with the frequencies.
However, the use of cumulative frequency values
requires that the scale along the y-axis be large
enough to include the frequency total.
A dot of zero frequency is plotted at the beginning of
the first class, and construction proceeds by
marking a dot at the end of each class interval for
the cumulative value.
Connecting the dots then completes the ogive.
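This sketch computes the ogive's plotting points for the doctors' ages table. It places a zero at the start of the first class, then the cumulative frequency at the end of each class. Only the coordinates are computed; drawing them is left to whatever plotting tool is at hand.

```python
from itertools import accumulate

start, width = 20, 10
freqs = [6, 18, 11, 11, 3, 1]

# Zero-frequency dot at the beginning of the first class, then one dot at
# the end of each class interval carrying the cumulative frequency.
x = [start + k * width for k in range(len(freqs) + 1)]
y = [0] + list(accumulate(freqs))
ogive_points = list(zip(x, y))
print(ogive_points)
```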
OGIVES
DOT PLOTS
A relatively simple statistical chart that is generally used
to display continuous, quantitative data is the dot plot.
In a dot plot, each data value is plotted along the
horizontal axis and is represented on the chart by a dot.
If multiple data points have the same value, the dots
will stack up vertically.
Dot plots can be especially useful for observing the
overall shape of the distribution of data points along with
identifying data values or intervals for which there are
groupings and gaps in the data.
STEM-AND-LEAF PLOT
Another way to organize raw data into groups
besides using a frequency distribution is a
stem-and-leaf plot.
This technique is simple and provides a unique
view of the data.
A stem-and-leaf plot is constructed by separating
the digits for each number of the data into two
groups, a stem and a leaf.
The leftmost digits are the stem and consist of the
higher-valued digits.
The rightmost digits are the leaves and contain
the lower values.
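This is a minimal stem-and-leaf sketch for the doctors' ages. It assumes two-digit values, so the stem is the tens digit and the leaf is the units digit. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):              # sorting keeps leaves in order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```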
STEM-AND-LEAF PLOT
QUALITATIVE GRAPHS
In contrast to quantitative data graphs that are
plotted along a numerical scale, qualitative
graphs are plotted using nonnumerical
categories.
PIE CHARTS
A pie chart is a circular depiction of data where the area of the
whole pie represents 100% of the data and slices of the pie
represent a percentage breakdown of the sublevels.
Pie charts show the relative magnitudes of the parts to the
whole.
They are widely used in business, particularly to depict such
things as budget categories, market share, and time/resource
allocations.
However, the use of pie charts is minimized in the sciences and
technology because pie charts can lead to less accurate
judgments than are possible with other types of graphs.
Generally, it is more difficult for the viewer to interpret the
relative size of angles in a pie chart than to judge the length of
rectangles in a bar chart.
EXAMPLE – 3
[Figure: pie chart, Leading Petroleum Refining Companies]
BAR GRAPHS
Another widely used qualitative data graphing
technique is the bar graph or bar chart.
A bar graph or chart contains two or more
categories along one axis and a series of bars, one
for each category, along the other axis.
Typically, the length of the bar represents the
magnitude of the measure (amount, frequency,
money, percentage, etc.) for each category.
The bar graph is qualitative because the
categories are nonnumerical, and it may be
either horizontal or vertical.
BAR GRAPHS
In Excel, horizontal bar graphs are referred to as
bar charts, and vertical bar graphs are referred
to as column charts.
A bar graph generally is constructed from the
same type of data that is used to produce a pie
chart.
However, an advantage of using a bar graph over
a pie chart for a given set of data is that, for
categories that are close in value, it is considered
easier to see the difference in the bars of a bar
graph than to discriminate between pie slices.
EXAMPLE – 4
[Figure: bar graph, How Much is Spent on Back to College Shopping by the Average Student]
PARETO CHARTS
A third type of qualitative data graph is a Pareto chart,
which could be viewed as a particular application of the
bar graph.
An important concept and movement in business is
total quality management.
One of the important aspects of total quality
management is the constant search for causes of
problems in products and processes.
A graphical technique for displaying problem causes is
Pareto analysis.
Pareto analysis is a quantitative tallying of the number
and types of defects that occur with a product or
service.
Analysts use this tally to produce a vertical bar chart
that displays the most common types of defects, ranked
in order of occurrence from left to right. The bar chart
is called a Pareto chart.
EXAMPLE – 6
Suppose the number of electric motors being
rejected by inspectors for a company has been
increasing.
Company officials examine the records of several
hundred of the motors in which at least one defect
was found to determine which defects occurred
most frequently.
They find that 40% of the defects involved poor
wiring, 30% involved a short in the coil, 25%
involved a defective plug, and 5% involved
cessation of bearings.
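The Pareto ordering for this example can be sketched as a sort by frequency followed by a running total. Pareto charts often draw the cumulative percentages as a line above the bars. The dictionary keys are shortened labels for the defect types listed above.

```python
from itertools import accumulate

defects = {
    "Poor wiring": 40,
    "Short in coil": 30,
    "Defective plug": 25,
    "Cessation of bearings": 5,
}

# Rank causes from most to least frequent (left to right on the chart).
ranked = sorted(defects.items(), key=lambda item: item[1], reverse=True)
cumulative = list(accumulate(pct for _, pct in ranked))

for (cause, pct), cum in zip(ranked, cumulative):
    print(f"{cause:<22} {pct:>3}%  cumulative {cum:>3}%")
```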
PARETO CHARTS
GRAPHICAL DEPICTION OF TWO
VARIABLE NUMERICAL DATA:
SCATTER PLOTS
Many times in business research it is important to
explore the relationship between two numerical
variables.
A graphical mechanism for examining the
relationship between two numerical variables is
the scatter plot (or scatter diagram).
A scatter plot is a two-dimensional graph of
pairs of points from two numerical variables.
SCATTER PLOT
Registered Vehicles (1000's) | Petrol Sales (1000's of Liters)
5 | 60
15 | 120
9 | 90
15 | 140
7 | 60
[Figure: scatter plot of Petrol Sales (vertical axis, 0 to 200) against Registered Vehicles (horizontal axis, 0 to 20)]
MEASURES OF CENTRAL TENDENCY:
UNGROUPED DATA
One type of measure that is used to describe a set
of data is the measure of central tendency.
Measures of central tendency yield information
about the center, or middle part, of a group of
numbers.
Measures of central tendency do not focus on the
span of the data set or how far values are from
the middle numbers.
The measures of central tendency presented here for
ungrouped data are the mode, the median, the
mean, percentiles, and quartiles.
MODE
The mode is the most frequently occurring value
in a set of data.
In the case of a tie for the most frequently
occurring value, two modes are listed. Then the
data are said to be bimodal.
Data sets with more than two modes are referred
to as multimodal.
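Python's standard library can return every mode at once with `statistics.multimode` (Python 3.8+). This sketch applies it to the doctors' ages and to a small hypothetical list that is bimodal.

```python
from statistics import multimode

ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

print(multimode(ages))              # most frequent age(s) in the sample
print(multimode([1, 2, 2, 3, 3]))   # hypothetical bimodal data: two modes
```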
MEDIAN
The median is the middle value in an ordered array of
numbers.
For an array with an odd number of terms, the
median is the middle number.
For an array with an even number of terms, the
median is the average of the two middle numbers.
A disadvantage of the median is that not all the
information from the numbers is used.
For example, in a set of house asking prices, the
specific asking price of the most expensive house
does not really enter into the computation of the median.
The level of data measurement must be at least
ordinal for a median to be meaningful.
MEDIAN
The following steps are used to determine the median.
STEP 1. Arrange the observations in an ordered data
array.
STEP 2. For an odd number of terms, find the middle
term of the ordered array. It is the median.
STEP 3. For an even number of terms, find the
average of the middle two terms. This average is the
median.
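The three steps translate directly into a short function. This illustrative sketch is checked against `statistics.median`. It uses two data sets that appear elsewhere in this unit: 5, 9, 16, 17, 18 (odd count) and 14, 12, 19, 23, 5, 13, 28, 17 (even count).

```python
import statistics

def median(values):
    ordered = sorted(values)                        # STEP 1: ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # STEP 2: odd -> middle term
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # STEP 3: even -> average of middle two

odd_data = [5, 9, 16, 17, 18]
even_data = [14, 12, 19, 23, 5, 13, 28, 17]
print(median(odd_data), median(even_data))          # 16 15.5
assert median(even_data) == statistics.median(even_data)
```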
MEAN
The arithmetic mean is the average of a group of
numbers and is computed by summing all
numbers and dividing by the number of numbers.
Because the arithmetic mean is so widely
used, most statisticians refer to it simply as the
mean.
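A one-line sketch of the arithmetic mean, using the data set 5, 9, 16, 17, 18 from the variability examples in this unit:

```python
data = [5, 9, 16, 17, 18]
mean = sum(data) / len(data)   # sum of all numbers / number of numbers
print(mean)                    # 13.0
```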
EXAMPLE – 7
MEAN
The mean is affected by each and every value,
which is an advantage.
The mean uses all the data, and each data item
influences the mean.
It is also a disadvantage because extremely large
or small values can cause the mean to be pulled
toward the extreme value.
The mean is the most commonly used measure of
central tendency because it uses each data item
in its computation, it is a familiar measure, and
it has mathematical properties that make it
attractive to use in inferential statistical analysis.
VARIABILITY
[Figure: no variability in cash flow vs. variability in cash flow, each shown around its mean]
PERCENTILES
Percentiles are measures of central tendency that divide
a group of data into 100 parts.
There are 99 percentiles because it takes 99 dividers to
separate a group of data into 100 parts.
The nth percentile is the value such that at least n
percent of the data are below that value and at most
(100 − n) percent are above that value.
Example: the 90th percentile indicates that at least 90% of
the data lie below it, and at most 10% of the data lie
above it.
The median and the 50th percentile have the same
value.
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
STEPS IN DETERMINING THE LOCATION
OF A PERCENTILE
EXAMPLE – 8
Determine the 30th percentile of the following
eight numbers: 14, 12, 19, 23, 5, 13, 28, 17.
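The steps slide above did not survive extraction, so this sketch assumes a common textbook location rule. Compute i = (P/100)·n on the ordered data. If i is a whole number, the Pth percentile is the average of the values in positions i and i + 1; otherwise, round i up and take the value in that position. Other software, such as spreadsheet or NumPy percentile functions, may use different interpolation rules and give slightly different answers.

```python
import math

def percentile(data, p):
    """Pth percentile (0 < p < 100) under the assumed location rule.

    i = (p / 100) * n, with 1-based positions in the ordered array:
    - if i is a whole number, average the values at positions i and i + 1;
    - otherwise round i up and take the value at that position.
    """
    ordered = sorted(data)                   # ordered array
    pn = p * len(ordered)                    # 100 * i (exact for integer p)
    if pn % 100 == 0:                        # i is a whole number
        i = pn // 100
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(pn / 100) - 1]  # round i up

numbers = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(numbers, 30))   # i = 2.4 -> position 3 -> 13
```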
QUARTILES
Measures of central tendency that divide a group of
data into four subgroups
Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile
Q1 is equal to the 25th percentile
Q2 is located at 50th percentile and equals the
median
Q3 is equal to the 75th percentile
Quartile values are not necessarily members of the
data set
QUARTILES
EXAMPLE – 9
EXAMPLE – 10
EXAMPLE – 11
EXAMPLE – 12
MEASURES OF VARIABILITY:
UNGROUPED DATA
Measures of variability describe the spread or the
dispersion of a set of data.
Common Measures of Variability
Range
Interquartile Range
Mean Absolute Deviation
Variance
Standard Deviation
Z scores
Coefficient of Variation
RANGE
The difference between the largest and the smallest
values in a set of data
Simple to compute
Ignores all data points except the two extremes
Example:
35 37 37 39 40 40 41 41 43 43 43 43
44 44 44 44 44 45 45 46 46 46 46 48
$$\text{Range} = \text{Largest} - \text{Smallest} = 48 - 35 = 13$$
INTERQUARTILE RANGE
Range of values between the first and third quartiles
Range of the “middle half”
Less influenced by extremes
$$\text{Interquartile Range} = Q_3 - Q_1$$
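This sketch computes the range and interquartile range for the 24 values in the range example. It reuses an assumed percentile location rule: i = (P/100)·n; average positions i and i + 1 when i is whole, otherwise round up. Under that rule Q1 = 40.5 and Q3 = 45, so the IQR is 4.5. Other quartile conventions can give slightly different values.

```python
import math

data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

def percentile(values, p):
    """Assumed location rule: i = (p/100)n; average positions i, i+1 if whole, else round up."""
    ordered = sorted(values)
    pn = p * len(ordered)
    if pn % 100 == 0:                         # whole-number location
        i = pn // 100
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(pn / 100) - 1]   # otherwise round up

data_range = max(data) - min(data)
q1, q3 = percentile(data, 25), percentile(data, 75)
iqr = q3 - q1
print(data_range, q1, q3, iqr)    # 13 40.5 45.0 4.5
```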
DEVIATION FROM THE MEAN
Data set: 5, 9, 16, 17, 18
Mean:
$$\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$$
Deviations from the mean: −8, −4, +3, +4, +5
[Figure: number line from 0 to 20 showing each value's deviation from µ = 13]
MEAN ABSOLUTE DEVIATION
Average of the absolute deviations from the mean
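$$\begin{array}{r|r|r}
X & X - \mu & |X - \mu| \\ \hline
5 & -8 & +8 \\
9 & -4 & +4 \\
16 & +3 & +3 \\
17 & +4 & +4 \\
18 & +5 & +5 \\ \hline
 & 0 & 24
\end{array}$$
$$\text{M.A.D.} = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$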
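This sketch reproduces the deviation and M.A.D. arithmetic for the data set 5, 9, 16, 17, 18. The raw deviations always sum to zero, which is why their absolute values are averaged instead.

```python
data = [5, 9, 16, 17, 18]
mu = sum(data) / len(data)                         # 13.0
deviations = [x - mu for x in data]                # -8, -4, +3, +4, +5
mad = sum(abs(d) for d in deviations) / len(data)  # mean absolute deviation
print(deviations, sum(deviations), mad)
```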
POPULATION VARIANCE
Average of the squared deviations from the
arithmetic mean
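$$\begin{array}{r|r|r}
X & X - \mu & (X - \mu)^2 \\ \hline
5 & -8 & 64 \\
9 & -4 & 16 \\
16 & +3 & 9 \\
17 & +4 & 16 \\
18 & +5 & 25 \\ \hline
 & 0 & 130
\end{array}$$
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$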
POPULATION STANDARD DEVIATION
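Square root of the variance
$$\begin{array}{r|r|r}
X & X - \mu & (X - \mu)^2 \\ \hline
5 & -8 & 64 \\
9 & -4 & 16 \\
16 & +3 & 9 \\
17 & +4 & 16 \\
18 & +5 & 25 \\ \hline
 & 0 & 130
\end{array}$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{130}{5}} = \sqrt{26.0} = 5.1$$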
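This sketch computes the population variance and standard deviation for the same five values from the definition. It cross-checks the results against `statistics.pvariance` and `statistics.pstdev` from Python's standard library.

```python
import math
import statistics

data = [5, 9, 16, 17, 18]
N = len(data)
mu = sum(data) / N
variance = sum((x - mu) ** 2 for x in data) / N   # sigma^2 = sum((X - mu)^2) / N
sigma = math.sqrt(variance)                        # sigma = sqrt(sigma^2)
print(variance, round(sigma, 1))                   # 26.0 5.1

# Cross-check with the standard library's population versions.
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(sigma, statistics.pstdev(data))
```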
SAMPLE VARIANCE
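Sum of the squared deviations from the sample mean,
divided by n − 1
$$\begin{array}{r|r|r}
X & X - \bar{X} & (X - \bar{X})^2 \\ \hline
2{,}398 & 625 & 390{,}625 \\
1{,}844 & 71 & 5{,}041 \\
1{,}539 & -234 & 54{,}756 \\
1{,}311 & -462 & 213{,}444 \\ \hline
7{,}092 & 0 & 663{,}866
\end{array}$$
$$\bar{X} = \frac{7{,}092}{4} = 1{,}773$$
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$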
SAMPLE STANDARD DEVIATION
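Square root of the sample variance
$$\begin{array}{r|r|r}
X & X - \bar{X} & (X - \bar{X})^2 \\ \hline
2{,}398 & 625 & 390{,}625 \\
1{,}844 & 71 & 5{,}041 \\
1{,}539 & -234 & 54{,}756 \\
1{,}311 & -462 & 213{,}444 \\ \hline
7{,}092 & 0 & 663{,}866
\end{array}$$
$$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{221{,}288.67} = 470.41$$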
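For a sample, the sum of squared deviations is divided by n − 1. This sketch uses the four sample values above and cross-checks against `statistics.variance` and `statistics.stdev`, which use the n − 1 divisor.

```python
import math
import statistics

sample = [2398, 1844, 1539, 1311]
n = len(sample)
x_bar = sum(sample) / n                                # 1773.0
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)   # divide by n - 1
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))                       # 221288.67 470.41

assert math.isclose(s2, statistics.variance(sample))
assert math.isclose(s, statistics.stdev(sample))
```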
USES OF STANDARD DEVIATION
Indicator of financial risk
Quality Control
construction of quality control charts
process capability studies
Comparing populations
household incomes in two cities
employee absenteeism at two plants
STANDARD DEVIATION AS AN
INDICATOR OF FINANCIAL RISK
Annualized Rate of Return
Financial Security | µ | σ
A | 15% | 3%
B | 15% | 7%
EMPIRICAL RULE
Data are normally distributed (or approximately
normal)
Distance from the Mean | Percentage of Values Falling Within Distance
µ ± 1σ | 68
µ ± 2σ | 95
µ ± 3σ | 99.7
EXAMPLE 13
A company produces a lightweight valve that is
specified to weigh 1365 grams.
Unfortunately, because of imperfections in the
manufacturing process, not all of the valves produced
weigh exactly 1365 grams.
In fact, the weights of the valves produced are
normally distributed with a mean weight of 1365
grams and a standard deviation of 294 grams.
Within what range of weights would approximately
95% of the valve weights fall?
Approximately 16% of the weights would be more
than what value?
Approximately 0.15% of the weights would be less
than what value?
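The three questions in Example 13 reduce to empirical-rule arithmetic. This sketch relies on the normal distribution's symmetry, so the percentage outside µ ± kσ is split evenly between the two tails. For example, (100 − 68)/2 = 16% lie above µ + σ.

```python
mu, sigma = 1365, 294

# About 95% of values lie within mu +/- 2 sigma.
low_95, high_95 = mu - 2 * sigma, mu + 2 * sigma

# 68% lie within mu +/- 1 sigma, so (100 - 68) / 2 = 16% lie above mu + 1 sigma.
above_16_pct = mu + sigma

# 99.7% lie within mu +/- 3 sigma, so (100 - 99.7) / 2 = 0.15% lie below mu - 3 sigma.
below_015_pct = mu - 3 * sigma

print(low_95, high_95, above_16_pct, below_015_pct)   # 777 1953 1659 483
```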
CHEBYSHEV’S THEOREM
Applies to all distributions
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2} \quad \text{for } k > 1$$
CHEBYSHEV’S THEOREM
Applies to all distributions
Number of Standard Deviations | Distance from the Mean | Minimum Proportion of Values Falling Within Distance
k = 2 | µ ± 2σ | 1 − 1/2² = 0.75
k = 3 | µ ± 3σ | 1 − 1/3² = 0.89
k = 4 | µ ± 4σ | 1 − 1/4² = 0.94
EXAMPLE – 14
In the computing industry the average age of
professional employees tends to be younger than in
many other business professions.
Suppose the average age of a professional employed
by a particular computer firm is 28 with a standard
deviation of 6 years.
A histogram of professional employee ages with this
firm reveals that the data are not normally
distributed but rather are amassed in the 20s and
that few workers are over 40.
Apply Chebyshev’s theorem to determine the range
of ages within which at least 80% of the
workers’ ages would fall.
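Chebyshev's bound is solved for k by setting 1 − 1/k² equal to the required proportion. A sketch for Example 14 (µ = 28, σ = 6, at least 80%):

```python
import math

mu, sigma = 28, 6
proportion = 0.80

# Solve 1 - 1/k^2 = proportion for k.
k = math.sqrt(1 / (1 - proportion))          # sqrt(5), about 2.236
low, high = mu - k * sigma, mu + k * sigma   # about 14.58 to 41.42
print(round(k, 3), round(low, 2), round(high, 2))
```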
COEFFICIENT OF VARIATION
Ratio of the standard deviation to the
mean, expressed as a percentage
Measurement of relative dispersion
$$\text{C.V.} = \frac{\sigma}{\mu}(100)$$
EXAMPLE – 15
Five weeks of average prices for stock A are 57,
68, 64, 71, and 62. Compute a coefficient of
variation.
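This sketch computes the coefficient of variation for Example 15. It treats the five weekly prices as a population, so σ uses the N divisor via `statistics.pstdev`. That choice is an assumption; using the sample standard deviation with an n − 1 divisor would give a somewhat larger C.V.

```python
import statistics

prices = [57, 68, 64, 71, 62]
mu = statistics.mean(prices)          # 64.4
sigma = statistics.pstdev(prices)     # population standard deviation, about 4.84
cv = sigma / mu * 100                 # C.V. = (sigma / mu) * 100
print(round(mu, 1), round(sigma, 2), round(cv, 2))
```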
Z SCORE
A z score represents the number of standard
deviations a value (x) is above or below the mean
of a set of numbers when the data are normally
distributed.
Using z scores allows translation of a value’s raw
distance from the mean into units of standard
deviations:
$$z = \frac{x - \mu}{\sigma}$$
If a z score is negative, the raw value (x) is below
the mean. If the z score is positive,
the raw value (x) is above the mean.
EXAMPLE – 16
For a data set that is normally distributed with a
mean of 50 and a standard deviation of
10, determine the z score for a value of 70.
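A minimal z-score helper, assuming the definition z = (x − µ)/σ. The function name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(70, 50, 10))   # 2.0
```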
EMPIRICAL RULE IN FORM OF Z SCORE
Between z = −1.00 and z = +1.00 are
approximately 68% of the values.
Between z = −2.00 and z = +2.00 are
approximately 95% of the values.
Between z = −3.00 and z = +3.00 are
approximately 99.7% of the values.
MEASURES OF CENTRAL TENDENCY
AND VARIABILITY: GROUPED DATA
Measures of Central Tendency
Mean
Median
Mode
Measures of Variability
Variance
Standard Deviation
MEAN OF GROUPED DATA
Weighted average of class midpoints
Class frequencies are the weights
$$\mu = \frac{\sum f_i M_i}{\sum f_i} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$$
EXAMPLE 17
Class Interval | Frequency | Class Midpoint | fM
20-under 30 | 6 | 25 | 150
30-under 40 | 18 | 35 | 630
40-under 50 | 11 | 45 | 495
50-under 60 | 11 | 55 | 605
60-under 70 | 3 | 65 | 195
70-under 80 | 1 | 75 | 75
Total | 50 | | 2150
$$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$$
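The grouped-mean table above can be reproduced as a weighted average of class midpoints, with the class frequencies as weights:

```python
freqs = [6, 18, 11, 11, 3, 1]
midpoints = [25, 35, 45, 55, 65, 75]

sum_fm = sum(f * m for f, m in zip(freqs, midpoints))   # sum of fM
N = sum(freqs)                                          # total frequency
mu = sum_fm / N
print(sum_fm, N, mu)    # 2150 50 43.0
```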
MEDIAN OF GROUPED DATA
$$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W)$$
Where:
L = the lower limit of the median class
cf_p = cumulative frequency of class preceding the median class
f_med = frequency of the median class
W = width of the median class
N = total of frequencies
EXAMPLE – 18
Class Interval | Frequency | Cumulative Frequency
20-under 30 | 6 | 6
30-under 40 | 18 | 24
40-under 50 | 11 | 35
50-under 60 | 11 | 46
60-under 70 | 3 | 49
70-under 80 | 1 | 50
N = 50
$$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909$$
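This sketch applies the grouped-median formula to Example 18. It takes the median class to be the first class whose cumulative frequency reaches N/2. That selection rule is an assumption, though it matches the 40-under 50 class used above.

```python
from itertools import accumulate

lower_limits = [20, 30, 40, 50, 60, 70]
freqs = [6, 18, 11, 11, 3, 1]
width = 10

N = sum(freqs)
cum = list(accumulate(freqs))
# Median class: first class whose cumulative frequency reaches N/2.
m = next(i for i, cf in enumerate(cum) if cf >= N / 2)
cf_p = cum[m - 1] if m > 0 else 0      # cumulative frequency before the median class
median = lower_limits[m] + (N / 2 - cf_p) / freqs[m] * width
print(lower_limits[m], round(median, 3))   # 40 40.909
```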
MODE OF GROUPED DATA
Midpoint of the modal class
Modal class has the greatest frequency
Class Interval | Frequency
20-under 30 | 6
30-under 40 | 18
40-under 50 | 11
50-under 60 | 11
60-under 70 | 3
70-under 80 | 1
$$\text{Mode} = \frac{30 + 40}{2} = 35$$
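This sketch picks the modal class and returns its midpoint. If two classes tied for the greatest frequency, this version would simply report the first one; that is a simplifying assumption.

```python
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [6, 18, 11, 11, 3, 1]

modal = max(range(len(freqs)), key=lambda i: freqs[i])   # index of greatest frequency
low, high = classes[modal]
mode = (low + high) / 2                                  # midpoint of the modal class
print(mode)   # 35.0
```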
EXAMPLE – 19
For the following grouped data, find the mean, mode,
and median.
VARIANCE AND STANDARD DEVIATION
OF GROUPED DATA – POPULATION
EXAMPLE – 20
For the following population data, find the
variance and standard deviation.
AS PER ORIGINAL FORMULA
AS PER COMPUTATIONAL FORMULA
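The data table for Example 20 did not survive extraction. As an illustration only, this sketch applies the population formulas to the doctors' ages frequency table instead. It assumes the usual grouped-data forms: the original formula σ² = Σf(M − µ)²/N and the computational formula σ² = [ΣfM² − (ΣfM)²/N]/N, which give the same result.

```python
import math

freqs = [6, 18, 11, 11, 3, 1]
midpoints = [25, 35, 45, 55, 65, 75]

N = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
mu = sum_fm / N                                            # 43.0

# Original (definitional) formula: sigma^2 = sum f (M - mu)^2 / N
var_original = sum(f * (m - mu) ** 2 for f, m in zip(freqs, midpoints)) / N

# Computational formula: sigma^2 = [sum f M^2 - (sum f M)^2 / N] / N
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))
var_computational = (sum_fm2 - sum_fm ** 2 / N) / N

sigma = math.sqrt(var_original)
print(var_original, var_computational, sigma)   # 144.0 144.0 12.0
```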
VARIANCE AND STANDARD DEVIATION
OF GROUPED DATA – SAMPLE
EXAMPLE – 21
Compute the mean, median, mode, variance, and
standard deviation for the following sample data.
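Example 21's data table is likewise missing. As an illustration only, this sketch treats the doctors' ages frequency table as a sample and divides the grouped sum of squares Σf(M − x̄)² by n − 1:

```python
import math

freqs = [6, 18, 11, 11, 3, 1]
midpoints = [25, 35, 45, 55, 65, 75]

n = sum(freqs)
x_bar = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Sample version: divide the grouped sum of squares by n - 1 instead of N.
ss = sum(f * (m - x_bar) ** 2 for f, m in zip(freqs, midpoints))
s2 = ss / (n - 1)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))
```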
MEASURES OF SHAPE
Skewness
Absence of symmetry
Extreme values in one side of a distribution
Kurtosis
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal shape
Platykurtic: flat and spread out
Box and Whisker Plots
Graphic display of a distribution
Reveals skewness
SKEWNESS
[Figure: negatively skewed, symmetric (not skewed), and positively skewed distributions]
SKEWNESS
[Figure: positions of the mean, median, and mode in each distribution]
Negatively skewed: Mean < Median < Mode
Symmetric (not skewed): Mean = Median = Mode
Positively skewed: Mode < Median < Mean
COEFFICIENT OF SKEWNESS
Summary measure for skewness
If S < 0, the distribution is negatively skewed
(skewed to the left).
If S = 0, the distribution is symmetric (not skewed).
If S > 0, the distribution is positively skewed
(skewed to the right).
$$S = \frac{3(\mu - M_d)}{\sigma}$$
COEFFICIENT OF SKEWNESS
$$\mu_1 = 23,\ M_{d_1} = 26,\ \sigma_1 = 12.3: \quad S_1 = \frac{3(\mu_1 - M_{d_1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$$
$$\mu_2 = 26,\ M_{d_2} = 26,\ \sigma_2 = 12.3: \quad S_2 = \frac{3(\mu_2 - M_{d_2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$$
$$\mu_3 = 29,\ M_{d_3} = 26,\ \sigma_3 = 12.3: \quad S_3 = \frac{3(\mu_3 - M_{d_3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$$
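This sketch applies the coefficient of skewness S = 3(µ − Md)/σ to the three cases above, with Md = 26 and σ = 12.3 throughout:

```python
def pearson_skewness(mu, median, sigma):
    """Coefficient of skewness S = 3 (mu - Md) / sigma."""
    return 3 * (mu - median) / sigma

for mu in (23, 26, 29):
    print(mu, round(pearson_skewness(mu, 26, 12.3), 2))
```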
KURTOSIS
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal in shape
Platykurtic: flat and spread out
[Figure: leptokurtic, mesokurtic, and platykurtic distributions]
BOX AND WHISKER PLOT
A box-and-whisker plot, sometimes called a box
plot, is a diagram that utilizes the upper and
lower quartiles along with the median and the
two most extreme values to depict a distribution
graphically.
Five specific values are used:
Median, Q2
First quartile, Q1
Third quartile, Q3
Minimum value in the data set
Maximum value in the data set
BOX AND WHISKER PLOT
Inner Fences
IQR = Q3 − Q1
Lower inner fence = Q1 − 1.5 IQR
Upper inner fence = Q3 + 1.5 IQR
Outer Fences
Lower outer fence = Q1 − 3.0 IQR
Upper outer fence = Q3 + 3.0 IQR
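This sketch computes the fences for the 24 values from the range example. It uses an assumed percentile location rule for Q1 and Q3: i = (P/100)·n; average positions i and i + 1 when i is whole, otherwise round up. Under a common convention, points beyond the inner fences are mild outliers and points beyond the outer fences are extreme outliers. None of these 24 values falls outside either set of fences.

```python
import math

def percentile(values, p):
    """Assumed location rule: i = (p/100)n; average positions i, i+1 if whole, else round up."""
    ordered = sorted(values)
    pn = p * len(ordered)
    if pn % 100 == 0:
        i = pn // 100
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(pn / 100) - 1]

data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

q1, q3 = percentile(data, 25), percentile(data, 75)
iqr = q3 - q1
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # outer fences

# Mild: beyond an inner fence but within the outer fences; extreme: beyond an outer fence.
mild = [x for x in data if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
extreme = [x for x in data if x < outer[0] or x > outer[1]]
print(inner, outer, mild, extreme)   # (33.75, 51.75) (27.0, 58.5) [] []
```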
BOX AND WHISKER PLOT
[Figure: box-and-whisker plot marking the minimum, Q1, Q2, Q3, and maximum]
SKEWNESS: BOX AND WHISKER PLOTS, AND
COEFFICIENT OF SKEWNESS
[Figure: box-and-whisker plots for negatively skewed (S < 0), symmetric (not skewed), and positively skewed (S > 0) distributions]