
CHAPTER 2:

Visual Description of Data


to accompany

Introduction to Business Statistics


fourth edition, by Ronald M. Weiers
Chapter 2 - Learning Objectives
• Convert raw data into a data array.
• Construct:
– a frequency distribution.
– a relative frequency distribution.
– a cumulative relative frequency distribution.
• Construct a stem-and-leaf diagram.
• Visually represent data by using graphs
and charts.
Chapter 2 - Key Terms

• Data array
– An orderly presentation of data in either
ascending or descending numerical order.
• Frequency Distribution
– A table that represents the data in classes
and that shows the number of observations
falling in each class.
Chapter 2 - Key Terms
• Frequency Distribution
– Class – Each category of the frequency distribution
– Frequency - The number of data values falling within
each class.
– Class limits - The boundaries for each class. These
determine which data values are assigned to that class.
– Class interval – The width of each class.

Approximate class width = (largest value in raw data − smallest
value in raw data) ÷ number of classes desired
– Class mark - Midpoint of each class
Sturges’ rule

• How to set the approximate number of
classes to begin constructing a frequency
distribution.
\[ k = 1 + 3.322(\log_{10} n) \]
where k = approximate number of classes to use and
n = the number of observations in the data set .
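As a minimal sketch, Sturges’ rule can be evaluated directly. The function name `sturges_classes` is an illustrative choice, and n = 50 is used only as a sample input.

```python
import math

def sturges_classes(n: int) -> float:
    """Approximate number of classes: k = 1 + 3.322 * log10(n)."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return 1 + 3.322 * math.log10(n)

k = sturges_classes(50)
print(round(k, 3))  # 6.644
print(round(k))     # 7
```

The rule gives only a starting point; the final number of classes can be adjusted so that the class interval is a convenient value.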
How to Construct a
Frequency Distribution
1. Number of classes
Choose an approximate number of classes for your data.
Sturges’ rule can help.
2. Estimate the class interval
Divide the approximate number of classes (from Step 1) into
the range of your data to find the approximate class interval,
where the range is defined as the largest data value minus
the smallest data value.
3. Determine the class interval
Round the estimate (from Step 2) to a convenient value.
How to Construct a
Frequency Distribution, cont.
4. Lower Class Limit
Determine the lower class limit for the first class by
selecting a convenient number that is smaller than the
lowest data value.
5. Class Limits
Determine the other class limits by repeatedly adding the
class width (from Step 3) to the prior class limit, starting
with the lower class limit (from Step 4).
6. Define the classes
Use the sequence of class limits to define the classes.
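The six steps can be sketched in code as follows. The helper names and the small data set are hypothetical, chosen only for illustration; the "convenient" values in Steps 3 and 4 remain a judgment call and are entered by hand.

```python
def estimated_interval(data, k):
    """Step 2: range of the data divided by the approximate number of classes."""
    return (max(data) - min(data)) / k

def build_classes(lower_limit, width, k):
    """Steps 5 and 6: add the width repeatedly to define k classes (lo up to hi)."""
    return [(lower_limit + i * width, lower_limit + (i + 1) * width)
            for i in range(k)]

# Hypothetical illustrative values, not taken from the text.
data = [12, 15, 21, 27, 33, 38, 44, 49]
k = 4                                   # Step 1: chosen number of classes
estimate = estimated_interval(data, k)  # Step 2: (49 - 12) / 4
width = 10                              # Step 3: estimate rounded to a convenient value
lower = 10                              # Step 4: convenient number below the minimum (12)
classes = build_classes(lower, width, k)

print(estimate)  # 9.25
print(classes)   # [(10, 20), (20, 30), (30, 40), (40, 50)]
```

It is worth checking that the last class reaches past the largest data value; if it does not, add a class or pick a wider interval.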
Converting to a Relative
Frequency Distribution
1. Retain the same classes defined in the
frequency distribution.
2. Sum the total number of observations
across all classes of the frequency
distribution.
3. Divide the frequency for each class by the
total number of observations, forming the
percentage of data values in each class.
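A short sketch of this conversion, assuming the class frequencies are already available as a list. The function name and the sample counts are hypothetical.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("there must be at least one observation")
    return [f / total for f in frequencies]

counts = [2, 5, 3]  # hypothetical class frequencies
rel = relative_frequencies(counts)
print(rel)  # [0.2, 0.5, 0.3]
```

The relative frequencies should sum to 1 (or 100 percent), apart from floating-point rounding.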
Forming a Cumulative Relative
Frequency Distribution
1. List the number of observations in the lowest
class.
2. Add the frequency of the lowest class to the
frequency of the second class. Record that
cumulative sum for the second class.
3. Continue to add the prior cumulative sum to the
frequency for that class, so that the cumulative
sum for the final class is the total number of
observations in the data set.
Forming a Cumulative Relative
Frequency Distribution, cont.
4. Divide the accumulated frequencies for each class
by the total number of observations -- giving you
the percent of all observations that occurred up to
and including that class.

• An Alternative: Accrue the relative frequencies
for each class instead of the raw frequencies.
Then you don’t have to divide by the total to get
percentages.
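Both routes can be sketched side by side. The function names and sample counts below are hypothetical; `itertools.accumulate` from the Python standard library produces the running sums.

```python
from itertools import accumulate

def cumulative_relative(frequencies):
    """Steps 1-4: accumulate the raw frequencies, then divide by the total."""
    total = sum(frequencies)
    cumulative = list(accumulate(frequencies))
    return [c / total for c in cumulative]

def cumulative_relative_alt(frequencies):
    """Alternative: accumulate the relative frequencies directly."""
    total = sum(frequencies)
    return list(accumulate(f / total for f in frequencies))

counts = [2, 5, 3]  # hypothetical class frequencies
print(cumulative_relative(counts))  # [0.2, 0.7, 1.0]
```

The two functions agree up to floating-point rounding, and in both cases the value for the final class is 1.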
Example: Problem 2.53
• The average daily cost to community hospitals for
patient stays during 1993 for each of the 50 U.S. states
is given in the next table.
– a) Arrange these into a data array.
– b) Construct a stem-and-leaf display.
– *) Approximately how many classes would be appropriate
for these data? [*not in textbook]
– c & d) Construct a frequency distribution. State interval
width and class mark.
– e) Construct a histogram, a relative frequency distribution,
and a cumulative relative frequency distribution.
Problem 2.53 - The Data
AL $775 HI 823 MA 1,036 NM 1,046 SD 506
AK 1,136 ID 659 MI 902 NY 784 TN 859
AZ 1,091 IL 917 MN 652 NC 763 TX 1,010
AR 678 IN 898 MS 555 ND 507 UT 1,081
CA 1,221 IA 612 MO 863 OH 940 VT 676
CO 961 KS 666 MT 482 OK 797 VA 830
CT 1,058 KY 703 NE 626 OR 1,052 WA 1,143
DE 1,024 LA 875 NV 900 PA 861 WV 701
FL 960 ME 738 NH 976 RI 885 WI 744
GA 775 MD 889 NJ 829 SC 838 WY 537
Problem 2.53 - (a) Data Array
CA 1,221 TX 1,010 RI 885 NY 784 KS 666
WA 1,143 NH 976 LA 875 AL 775 ID 659
AK 1,136 CO 961 MO 863 GA 775 MN 652
AZ 1,091 FL 960 PA 861 NC 763 NE 626
UT 1,081 OH 940 TN 859 WI 744 IA 612
CT 1,058 IL 917 SC 838 ME 738 MS 555
OR 1,052 MI 902 VA 830 KY 703 WY 537
NM 1,046 NV 900 NJ 829 WV 701 ND 507
MA 1,036 IN 898 HI 823 AR 678 SD 506
DE 1,024 MD 889 OK 797 VT 676 MT 482
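A data array is simply the raw data in sorted order. The sketch below uses only the first five states from the table to keep it short; the variable names are illustrative.

```python
# First five entries of the Problem 2.53 table (state: average daily cost in dollars).
costs = {"AL": 775, "AK": 1136, "AZ": 1091, "AR": 678, "CA": 1221}

# Ascending array of the values alone.
ascending = sorted(costs.values())

# Descending array that keeps the state labels, as in the slide.
descending = sorted(costs.items(), key=lambda item: item[1], reverse=True)

print(ascending)   # [678, 775, 1091, 1136, 1221]
print(descending)  # [('CA', 1221), ('AK', 1136), ('AZ', 1091), ('AL', 775), ('AR', 678)]
```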
Problem 2.53 - (b)
The Stem-and-Leaf Display
Stem-and-Leaf Display N = 50
Stem Unit: 100, Leaf Unit: 1

1 12 21
2 11 43, 36
8 10 91, 81, 58, 52, 46, 36, 24, 10
7 9 76, 61, 60, 40, 17, 02, 00
(11) 8 98, 89, 85, 75, 63, 61, 59, 38, 30, 29, 23
9 7 97, 84, 75, 75, 63, 44, 38, 03, 01
7 6 78, 76, 66, 59, 52, 26, 12
4 5 55, 37, 07, 06
1 4 82

Range: $482 - $1,221
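A display like the one above can be sketched as follows, assuming the hundreds form the stem and the remaining two digits form the leaf. The function name and output layout are illustrative; stems with no observations would be skipped by this sketch, which does not matter here because every stem from 4 to 12 occurs.

```python
from collections import defaultdict

# Average daily costs for the 50 states, in the order of the Problem 2.53 table.
COSTS = [
    775, 1136, 1091, 678, 1221, 961, 1058, 1024, 960, 775,
    823, 659, 917, 898, 612, 666, 703, 875, 738, 889,
    1036, 902, 652, 555, 863, 482, 626, 900, 976, 829,
    1046, 784, 763, 507, 940, 797, 1052, 861, 885, 838,
    506, 859, 1010, 1081, 676, 830, 1143, 701, 744, 537,
]

def stem_and_leaf(values, stem_unit=100):
    """Return (count, stem, leaves) rows, highest stem first."""
    groups = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, stem_unit)
        groups[stem].append(leaf)
    rows = []
    for stem in sorted(groups, reverse=True):
        leaves = sorted(groups[stem], reverse=True)
        rows.append((len(leaves), stem, leaves))
    return rows

for count, stem, leaves in stem_and_leaf(COSTS):
    print(f"{count:>3} {stem:>3} | " + ", ".join(f"{leaf:02d}" for leaf in leaves))
```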


Problem 2.53 - Continued

• To approximate the number of classes
we should use in creating the frequency
distribution, use Sturges’ Rule with n = 50:
\[ k = 1 + 3.322(\log_{10} n) = 1 + 3.322(\log_{10} 50) = 1 + 3.322(1.69897) = 1 + 5.644 = 6.644 \approx 7 \]
Sturges’ rule suggests we use
approximately 7 classes.
Constructing the Frequency
Distribution
• Step 1. Number of classes
– Sturges’ Rule: approximately 7 classes.
The range is: $1,221 – $482 = $739
$739/7 ≈ $106 and $739/8 ≈ $92
• Steps 2 & 3. The Class Interval
– So, if we use 8 classes, we can make each
class $100 wide.
Constructing the Frequency
Distribution
• Step 4. The Lower Class Limit
– If we start at $450, we can cover the range in 8 classes,
each class $100 in width.
The first class : $450 up to $550
• Steps 5 & 6. Setting Class Limits
$450 up to $550 $850 up to $950
$550 up to $650 $950 up to $1,050
$650 up to $750 $1,050 up to $1,150
$750 up to $850 $1,150 up to $1,250
Problem 2.53 - (c) & (d)
Average daily cost Number Class mark
$450 – under $550 4 $500
$550 – under $650 3 $600
$650 – under $750 9 $700
$750 – under $850 9 $800
$850 – under $950 11 $900
$950 – under $1,050 7 $1,000
$1,050 – under $1,150 6 $1,100
$1,150 – under $1,250 1 $1,200
Interval width: $100
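The counts and class marks in this table can be checked with a short sketch. It assumes the classes chosen above (first lower limit $450, width $100, 8 classes, each running from the lower limit up to but not including the upper limit); the function and constant names are illustrative.

```python
# Average daily costs for the 50 states, in the order of the Problem 2.53 table.
COSTS = [
    775, 1136, 1091, 678, 1221, 961, 1058, 1024, 960, 775,
    823, 659, 917, 898, 612, 666, 703, 875, 738, 889,
    1036, 902, 652, 555, 863, 482, 626, 900, 976, 829,
    1046, 784, 763, 507, 940, 797, 1052, 861, 885, 838,
    506, 859, 1010, 1081, 676, 830, 1143, 701, 744, 537,
]
LOWER, WIDTH, K = 450, 100, 8

def frequency_table(values, lower, width, k):
    """Return (lower limit, upper limit, frequency, class mark) for each class."""
    counts = [0] * k
    for value in values:
        index = (value - lower) // width  # class "lower - under upper"
        if not 0 <= index < k:
            raise ValueError(f"{value} falls outside the defined classes")
        counts[index] += 1
    table = []
    for i in range(k):
        lo = lower + i * width
        hi = lo + width
        table.append((lo, hi, counts[i], (lo + hi) / 2))  # class mark = midpoint
    return table

table = frequency_table(COSTS, LOWER, WIDTH, K)
for lo, hi, count, mark in table:
    print(f"${lo:,} - under ${hi:,}  {count:>2}  ${mark:,.0f}")
```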
Problem 2.53 - (e) The
Histogram
[Figure: histogram of average daily cost; horizontal axis labeled 500 to 1200 in steps of 100, vertical axis: frequency (0 to 12).]
Problem 2.53 - The Relative
Frequency Distribution
Average daily cost Number Rel. Freq.
$450 – under $550 4 4/50 = .08
$550 – under $650 3 3/50 = .06
$650 – under $750 9 9/50 = .18
$750 – under $850 9 9/50 = .18
$850 – under $950 11 11/50 = .22
$950 – under $1,050 7 7/50 = .14
$1,050 – under $1,150 6 6/50 = .12
$1,150 – under $1,250 1 1/50 = .02
Problem 2.53 - (e) The Percentage
Polygon
[Figure: percentage polygon; horizontal axis: average daily cost (0 to 1400), vertical axis: relative frequency (0 to 0.25).]
Problem 2.53 - The Cumulative
Frequency Distribution
Average daily cost Number Cum. Freq.
$450 – under $550 4 4
$550 – under $650 3 7
$650 – under $750 9 16
$750 – under $850 9 25
$850 – under $950 11 36
$950 – under $1,050 7 43
$1,050 – under $1,150 6 49
$1,150 – under $1,250 1 50
The Scatter Diagram
• A scatter diagram is a two-dimensional plot of
data representing values of two quantitative
variables.
• x, the independent variable, on the horizontal axis
• y, the dependent variable, on the vertical axis
• Four ways in which two variables can be
related:
1. Direct
2. Inverse
3. Curvilinear
4. No relationship
Problem 2.53 - The Cumulative
Relative Frequency Distribution
Average daily cost Cum.Freq. Cum.Rel.Freq.
$450 – under $550 4 4/50 = .08
$550 – under $650 7 7/50 = .14
$650 – under $750 16 16/50 = .32
$750 – under $850 25 25/50 = .50
$850 – under $950 36 36/50 = .72
$950 – under $1,050 43 43/50 = .86
$1,050 – under $1,150 49 49/50 = .98
$1,150 – under $1,250 50 50/50 = 1.00
Problem 2.53 - (e) The Percentage
Ogive (Less Than)
[Figure: less-than ogive; horizontal axis: average daily cost (0 to 1200), vertical axis scaled 0 to 50.]
An Example: Problem 2.38
• For 6 local offices of a large tax preparation
firm, the following data describe x = service
revenues and y = expenses for supplies,
freight, postage, etc.
• Draw a scatter diagram representing the data.
Does there appear to be any relationship
between the variables? If so, is the relationship
direct or inverse?
Problem 2.38, continued
Scatter Plot with Trend Line

[Figure: scatter plot with trend line; x = Service Revenue (thous), 200.0 to 600.0; y = Office Expenses (thous), 15.0 to 25.0.]

There appears to be a direct relationship between
the service revenue and the office expenses incurred.
