CH2 Organizing Data

“In God we trust. All others must bring data.”

– W. Edwards Deming, statistician, professor, author, lecturer, and consultant.
Raw Data
- Data recorded in the sequence in which they are collected, before they are processed
or ranked, are called raw data.
Frequency Distributions
- In statistics, a frequency distribution is a list, table, or graph that displays
the frequency of various outcomes in a sample.
- Each entry in the table contains the frequency or count of the occurrences of values
within a group or interval.
I. Fundamentals in Organizing Data

a. Proportions
- Measures the fraction of the total group that is associated with each score
- Called relative frequencies because they describe the frequency ( f ) in
relation to the total number (N)
- proportion: $p = \dfrac{f}{N}$
b. Percentages
- Expresses relative frequency out of 100
- Can be included as a separate column in a frequency distribution table
- percentage: $p(100) = \dfrac{f}{N}(100)$
The ability to quickly and comfortably convert between fractions (proportions), decimal fractions (relative
frequencies), and percentages is fundamental to success in this course. Some students struggle to
reconcile the fact that although these are three distinct forms, they all point to the same “deep” meaning.
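As a quick illustration of that equivalence, the sketch below expresses one relative frequency in all three forms. The function name `describe` is a made-up helper for this example, not something from the course materials.

```python
from fractions import Fraction

def describe(f, n):
    """Return one relative frequency as a fraction, a decimal, and a percentage."""
    proportion = Fraction(f, n)      # fraction form, reduced automatically
    decimal = f / n                  # decimal form (relative frequency)
    percentage = decimal * 100       # percentage form
    return proportion, round(decimal, 3), round(percentage, 1)

# Example: a score that occurs f = 6 times in a group of N = 30
print(describe(6, 30))   # (Fraction(1, 5), 0.2, 20.0)
```

The three values 1/5, 0.2, and 20% are the same quantity written three ways.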
1 / JMSG
Psychological Statistics
II. FREQUENCY DISTRIBUTION FOR QUALITATIVE DATA
a. Discrete Frequency Distribution (or Ungrouped Data): presents the data of
a discrete variable in tabular form by listing all possible values that the
variable can take along with the corresponding frequencies.
Example:
A sample of 30 employees from large companies was selected, and these employees were asked
how stressful their jobs were. The responses of these employees are recorded next where very
represents very stressful, somewhat means somewhat stressful, and none stands for not stressful
at all.
Somewhat   None       Somewhat   Very       Very       None
Very       Somewhat   Somewhat   Very       Somewhat   Somewhat
Very       Somewhat   None       Very       None       Somewhat
Somewhat   Very       Somewhat   Somewhat   Very       None
Somewhat   Very       Very       Somewhat   None       Somewhat
I. TABULAR
STEPS
1. To make the frequency distribution table, first write the categories in one column
Table 1. Frequency Distribution of Stress on Job

Stress on Job   Tally   Frequency (f)
Very
Somewhat
None
                        ∑f =
2. Next, tally the responses in each category (from the results above). For example,
“Very” appears ten times in the list, so put ten tally marks next to it:

Stress on Job   Tally            Frequency (f)
Very            ||||||||||
Somewhat        ||||||||||||||
None            ||||||
                                 ∑f =
3. Finally, count up the tally marks and write the frequency in the final column. The
frequency of a category is just the total number of tally marks for it.

Stress on Job   Tally            Frequency (f)
Very            ||||||||||       10
Somewhat        ||||||||||||||   14
None            ||||||           6
                                 ∑f = 30
4. Calculate the relative frequency (Rf) of each category and its percentage (%Rf).

$$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$$

Stress on Job   Tally            Frequency (f)   Rf              %Rf
Very            ||||||||||       10              10/30 = .333    .333(100) = 33.3
Somewhat        ||||||||||||||   14              14/30 = .467    .467(100) = 46.7
None            ||||||           6               6/30 = .200     .200(100) = 20.0
                                 ∑f = 30         ∑Rf = 1         ∑% = 100
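The four tabular steps can be sketched in a few lines of Python. This is only one possible way to do the counting; the helper name `frequency_table` is an assumption made for this example, and the responses are the 30 answers listed in the example.

```python
from collections import Counter

# The 30 responses from the example, row by row.
responses = (
    "Somewhat None Somewhat Very Very None "
    "Very Somewhat Somewhat Very Somewhat Somewhat "
    "Very Somewhat None Very None Somewhat "
    "Somewhat Very Somewhat Somewhat Very None "
    "Somewhat Very Very Somewhat None Somewhat"
).split()

def frequency_table(data, categories):
    """Return rows of (category, f, Rf, %Rf) in the given category order."""
    counts = Counter(data)                      # step 2: tally each category
    total = sum(counts[c] for c in categories)  # sum of all frequencies
    rows = []
    for c in categories:
        f = counts[c]                           # step 3: frequency
        rf = f / total                          # step 4: relative frequency
        rows.append((c, f, round(rf, 3), round(rf * 100, 1)))
    return rows

table = frequency_table(responses, ["Very", "Somewhat", "None"])
for category, f, rf, pct in table:
    print(f"{category:<10}{f:>4}{rf:>8.3f}{pct:>7.1f}")
```

The printed rows should agree with Table 1: frequencies of 10, 14, and 6 out of 30.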
II. GRAPHICAL
A. Bar Graph
- A graph made of bars whose heights represent the frequencies of the
respective categories.
- Bar graphs are used to compare things between different groups or to track changes
over time. However, when trying to measure change over time, bar graphs
are best when the changes are larger.
- It consists of rectangular bars of equal width.
- The space between any two consecutive bars must be the same.
- Bars can be drawn either vertically or horizontally, but normally we use
vertical bars.
- The height of a bar represents the frequency of the corresponding
observation.
- Steps in construction of bar graphs/column graph:
1. On a graph, draw two lines perpendicular to each other,
intersecting at 0.
2. The horizontal line is x-axis and vertical line is y-axis.
3. Along the horizontal axis, choose the uniform width of bars and
uniform gap between the bars and write the names of the data
items whose values are to be marked.
4. Along the vertical axis, choose a suitable scale in order to
determine the heights of the bars for the given values. (Frequency
is taken along y-axis).
5. Calculate the heights of the bars according to the scale chosen and
draw the bars.
[Figure: bar graph of Stress on Job. Horizontal axis: Stress on Job (Very, Somewhat, None); vertical axis: Frequency, scaled from 0 to 16.]
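For a quick check without graph paper, the same idea can be sketched as a text bar graph. This sketch draws horizontal bars (one symbol per unit of frequency) rather than the vertical bars described in the steps; `text_bar_graph` is a hypothetical helper written for this example.

```python
def text_bar_graph(freqs, symbol="#"):
    """Return one line per category: label, a bar of length f, then f itself."""
    width = max(len(label) for label in freqs)   # pad labels to equal width
    lines = []
    for label, f in freqs.items():
        lines.append(f"{label:<{width}} | {symbol * f} {f}")
    return lines

stress = {"Very": 10, "Somewhat": 14, "None": 6}
for line in text_bar_graph(stress):
    print(line)
```

Every bar uses the same symbol and the same spacing, which mirrors the requirement that bars have equal width and equal gaps.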
B. Pie Chart
- A circle divided into portions that represent the relative frequencies or
percentages of a population or a sample belonging to different categories.
- Pie charts are best used when you are trying to compare parts of a whole. They do
not show changes over time.
- Steps in construction of pie chart:
1. Calculate the central angle for each component, using the formula
$$\text{Central angle} = \frac{\text{Percentage value of the component}}{100} \cdot 360^\circ$$

Stress on Job   Frequency (f)   Rf      Angle Size
Very            10              .333    360(.333) = 119.88
Somewhat        14              .467    360(.467) = 168.12
None            6               .200    360(.200) = 72.00
                ∑f = 30         ∑Rf = 1
2. Draw a circle of convenient radius.
3. Within this circle, draw a horizontal radius.
4. Draw radius making central angle of first component with
horizontal radius; this sector represents the first component. From
this radius, draw next radius with central angle of second
component; this sector represents second component and so on,
until we exhaust all components.
5. Shade each sector differently and mark the component it
represents.
6. Give the heading for each component.
[Figure: pie chart of Stress on Job. Very, 33.3%; Somewhat, 46.7%; None, 20%.]
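The central-angle calculation can be checked with a short sketch. It works from the exact fraction f/N rather than the rounded Rf, so the angles come out as 120, 168, and 72 degrees instead of 119.88 and 168.12; the small differences are purely rounding. The function name `central_angles` is an assumption for this example.

```python
from fractions import Fraction

def central_angles(freqs):
    """Map each category to its central angle in degrees: (f / N) * 360."""
    total = sum(freqs.values())
    return {label: float(Fraction(f, total) * 360) for label, f in freqs.items()}

angles = central_angles({"Very": 10, "Somewhat": 14, "None": 6})
print(angles)  # {'Very': 120.0, 'Somewhat': 168.0, 'None': 72.0}
```

Because the relative frequencies sum to 1, the angles always sum to 360 degrees.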
III. FREQUENCY DISTRIBUTION FOR QUANTITATIVE DATA
- If the number of categories is very large, they are combined (grouped) to make
the table easier to understand.
- However, information is lost when categories are grouped:
  - Individual scores cannot be retrieved.
  - The wider the grouping interval, the more information is lost.
A. “Rules” for Constructing Grouped Frequency Distributions

- Requirements (Mandatory Guidelines)
  - All intervals must be the same width.
  - Make the bottom (low) score in each interval a multiple of the interval
width.
- “Rules of Thumb” (Suggested Guidelines)
  - Ten or fewer class intervals is typical (but use good judgment for the
specific situation).
  - Choose a “simple” number for interval width.
B. Constructing frequency distributions for continuous variables requires understanding
that a score represents an interval
- A given “score” could have been any value within the score’s real limits.
- The recorded value was rounded off to the middle value between the score’s real limits.
- Individuals with the same recorded score probably differed slightly in their actual performance.
- Consequently, grouping several scores actually requires grouping several intervals.
- The distance between the apparent limits of a (grouped) class interval is always one unit smaller
than the distance between its real limits. For example, the class 5–6 has apparent limits 5 and 6
(a distance of 1) and real limits 4.5 and 6.5 (a distance of 2).
I. TABULAR
Given: Scores on Quizzes
Steps
1. Calculate the range of the scores; Range = Highest – Lowest.
Range = 19 – 5
= 14
2. Choose the number of class intervals using the rule of thumb below, then determine the
class width by using the following formula:
$$\text{class width} = \frac{\text{range}}{\text{number of class intervals}}$$
Rule of thumb for the number of class intervals
Sample size   Intervals
<50           5–7
50–100        8–10
101–250       11–15
>250          16–20
Assume: number of class intervals = 8
class width = 14 / 8 = 1.75
3. If this formula yields a decimal value, round the value to the nearest whole number.
1.75 ≈ 2
4. List all the class intervals, placing the interval containing the smallest value at the top.
Quiz score
5–6
7–8
9 – 10
11 – 12
13 – 14
15 – 16
17 – 18
19 – 20
5. Tally the raw scores into appropriate class intervals.
6. Add the tallies for each interval to obtain the interval frequency (f).
Quiz score   f
5–6          4
7–8          8
9–10         9
11–12        13
13–14        7
15–16        6
17–18        2
19–20        1
             ∑f = 50
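Steps 1 to 6 can be sketched as follows. The raw quiz scores are not listed in these notes, so the `sample` list below is hypothetical data chosen only so that the lowest score is 5 and the highest is 19, as in the example; the function names are also assumptions. The sketch assumes whole-number scores and starts the first interval at the lowest score, as the example does.

```python
import math

def class_intervals(scores, n_classes):
    """Return (lower, upper) apparent limits for whole-number scores."""
    low, high = min(scores), max(scores)
    score_range = high - low                                     # step 1
    # Steps 2-3: width = range / number of classes, rounded half up (at least 1).
    width = max(1, math.floor(score_range / n_classes + 0.5))
    intervals = []
    lower = low
    while lower <= high:                                         # step 4
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals

def tally(scores, intervals):
    """Steps 5-6: count the scores that fall inside each interval (inclusive)."""
    return [sum(lo <= s <= hi for s in scores) for lo, hi in intervals]

# Hypothetical scores, NOT the quiz data behind the table above.
sample = [5, 6, 8, 8, 11, 12, 12, 15, 19]
intervals = class_intervals(sample, 8)
print(intervals)                 # (5, 6), (7, 8), ..., (19, 20)
print(tally(sample, intervals))  # [2, 2, 0, 3, 0, 1, 0, 1]
```

Because the width is rounded, the number of intervals actually produced can differ slightly from the number requested; here it happens to be exactly 8.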
II. Graphical
Steps
1. Construct the class boundaries. Class boundaries separate one class in a grouped frequency
distribution from another. The boundaries have one more decimal place than the raw data and
therefore do not appear in the data. There is no gap between the upper boundary of one
class and the lower boundary of the next class. The lower class boundary is found by
subtracting 0.5 units from the lower class limit and the upper class boundary is found by
adding 0.5 units to the upper class limit.
Quiz score   f    Class Boundary
5–6          4    4.5 – 6.5
7–8          8    6.5 – 8.5
9–10         9    8.5 – 10.5
11–12        13   10.5 – 12.5
13–14        7    12.5 – 14.5
15–16        6    14.5 – 16.5
17–18        2    16.5 – 18.5
19–20        1    18.5 – 20.5
             ∑f = 50
2. Calculate the class mark or midpoint
$$\text{Class midpoint (or mark)} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$
Quiz score   f    Class Boundary   Class midpoint
5–6          4    4.5 – 6.5        5.5
7–8          8    6.5 – 8.5        7.5
9–10         9    8.5 – 10.5       9.5
11–12        13   10.5 – 12.5      11.5
13–14        7    12.5 – 14.5      13.5
15–16        6    14.5 – 16.5      15.5
17–18        2    16.5 – 18.5      17.5
19–20        1    18.5 – 20.5      19.5
             ∑f = 50
3. Calculate the Rf and %Rf

Quiz score   f    Class Boundary   Class midpoint   Rf     %Rf
5–6          4    4.5 – 6.5        5.5              0.08   8
7–8          8    6.5 – 8.5        7.5              0.16   16
9–10         9    8.5 – 10.5       9.5              0.18   18
11–12        13   10.5 – 12.5      11.5             0.26   26
13–14        7    12.5 – 14.5      13.5             0.14   14
15–16        6    14.5 – 16.5      15.5             0.12   12
17–18        2    16.5 – 18.5      17.5             0.04   4
19–20        1    18.5 – 20.5      19.5             0.02   2
             ∑f = 50                                ∑Rf = 1   ∑% = 100
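The three columns added in steps 1 to 3 follow mechanically from the class limits and frequencies, as this sketch shows. It uses the class limits and frequencies from the quiz-score table; the function name `summarize` and the `unit` parameter (the measurement unit, 1 for whole-number scores) are assumptions for this example.

```python
# (lower limit, upper limit, f) for each class in the quiz-score table
classes = [(5, 6, 4), (7, 8, 8), (9, 10, 9), (11, 12, 13),
           (13, 14, 7), (15, 16, 6), (17, 18, 2), (19, 20, 1)]

def summarize(classes, unit=1):
    """Add class boundaries, midpoint, Rf and %Rf to each class."""
    n = sum(f for _, _, f in classes)
    rows = []
    for lower, upper, f in classes:
        rows.append({
            "class": (lower, upper),
            "f": f,
            # half a unit below the lower limit, half a unit above the upper limit
            "boundaries": (lower - unit / 2, upper + unit / 2),
            "midpoint": (lower + upper) / 2,
            "Rf": round(f / n, 2),
            "%Rf": round(100 * f / n),
        })
    return rows

rows = summarize(classes)
for r in rows:
    print(r["class"], r["f"], r["boundaries"], r["midpoint"], r["Rf"], r["%Rf"])
```

A useful self-check is that the %Rf column sums to 100 and that each upper boundary equals the next class's lower boundary.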
4. Graph it
A. Histogram
- A histogram is a graph in which classes are marked on the horizontal axis
and the frequencies, relative frequencies, or percentages are marked on the
vertical axis.
- The frequencies, relative frequencies, or percentages are represented by
the heights of the bars.
- In a histogram, the bars are drawn adjacent to each other.
- Requires numeric scores (interval or ratio)
- Represents all scores from the minimum through the maximum observed data values
- Includes all scores with a frequency of zero
- Draw bars above each score (interval)
  - Height of bar corresponds to frequency
  - Width of bar corresponds to the score’s real limits (or one-half score
unit above/below discrete scores)
[Figure: histogram of the quiz scores. Horizontal axis: quiz score classes 5–6 through 19–20; vertical axis: frequency (f).]
Figure 1. Histogram of Quiz Score of Students
B. Block Histogram
- A histogram can be made a “block” histogram
- Create a bar of the correct height by drawing a stack of blocks
- Each block represents one case. Therefore, block histograms show the
frequency count in each bar
C. Polygon
- A graph formed by joining the midpoints of the tops of successive bars in
a histogram with straight lines
- List all numeric scores on the X-axis
  - Include those with a frequency of f = 0
- Draw a dot above the center of each interval
  - Height of dot corresponds to frequency
  - Connect the dots with a continuous line
  - Close the polygon with lines to the Y = 0 point
- Can also be used with grouped frequency distribution data
[Figure: frequency polygon of the quiz scores. Horizontal axis: quiz score classes 5–6 through 19–20; vertical axis: frequency (f), scaled from 0 to 14.]
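The points of a frequency polygon can be listed with a short sketch. It assumes equally wide classes and closes the polygon at the midpoints of the empty classes on either side (3.5 and 21.5 here), which is a common convention but not one these notes spell out; `polygon_points` is a hypothetical helper.

```python
def polygon_points(midpoints, freqs):
    """Return (x, y) points for a frequency polygon, closed at f = 0 on both ends."""
    step = midpoints[1] - midpoints[0]   # class width, assumed equal for all classes
    xs = [midpoints[0] - step] + list(midpoints) + [midpoints[-1] + step]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

midpoints = [5.5, 7.5, 9.5, 11.5, 13.5, 15.5, 17.5, 19.5]
freqs = [4, 8, 9, 13, 7, 6, 2, 1]
points = polygon_points(midpoints, freqs)
print(points[0], points[-1])  # (3.5, 0) (21.5, 0)
```

Joining these points in order with straight lines gives the polygon; the highest point sits above the 11–12 class.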
SHAPES OF HISTOGRAMS
- A histogram summarizes important characteristics of the data:
1. What is the shape of the distribution?
2. Where is the middle of the distribution?
3. How wide is the distribution?
- Shapes of distributions
  - Unimodal distribution
    - a single value is most frequent
  - Bimodal (or multimodal)
    - two (or more) most frequently occurring values
    - may indicate relevant subgroups
  - Symmetric
    - the right side is a mirror image of the left
  - Skewed (asymmetric)
    - a few extreme values
    - Positively skewed: right tail longer
    - Negatively skewed: left tail longer
D. Stem-and-Leaf
- In a stem-and-leaf display of quantitative data, each value is divided into
two portions: a stem and a leaf. The leaves for each stem are shown
separately in a display.
Example 1: The following are the scores of 30 college students on a statistics test:
Steps:
1. To construct a stem-and-leaf display for these scores, we split each
score into two parts. The first part contains the first digit, which is
called the stem. The second part contains the second digit, which is
called the leaf.
2. After we have listed the stems, we read the leaves for all scores
and record them next to the corresponding stems on the right side
of the vertical line.
5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8
Figure 1. Stem-and-leaf display of test scores.
5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
Figure 2. Ranked stem-and-leaf display of test scores.
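The two steps can be sketched in Python. The `scores` list below is read back off the unranked display (stem 5 with leaves 2, 0, 7 gives 52, 50, 57, and so on), so the order across stems is an assumption; `stem_and_leaf` is a hypothetical helper for two-digit scores.

```python
from collections import defaultdict

def stem_and_leaf(scores, ranked=False):
    """Split two-digit scores into a tens-digit stem and a units-digit leaf."""
    leaves = defaultdict(list)
    for s in scores:
        stem, leaf = divmod(s, 10)       # e.g. 75 -> stem 7, leaf 5
        leaves[stem].append(leaf)
    lines = []
    for stem in sorted(leaves):
        row = sorted(leaves[stem]) if ranked else leaves[stem]
        lines.append(f"{stem} | " + " ".join(str(leaf) for leaf in row))
    return lines

# Scores reconstructed from the unranked display.
scores = [52, 50, 57, 65, 69, 61, 68, 64,
          75, 79, 71, 72, 76, 79, 77, 71, 72,
          80, 87, 81, 86, 83, 84, 87,
          96, 93, 95, 92, 92, 98]

for line in stem_and_leaf(scores, ranked=True):
    print(line)
```

With `ranked=False` the leaves stay in the order the scores were read; with `ranked=True` each row is sorted, as in the ranked display.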
Example 2: The following data are monthly rents paid by a sample of 30 households
selected from a small city.
880 1081 721 1075 1023 775 1235 750 965 960
1210 985 1231 932 850 825 1000 915 1191 1035
1151 630 1175 952 1100 1140 750 1140 1370 1280
 6 | 30
 7 | 75 50 21 50
 8 | 80 25 50
 9 | 32 52 15 60 85 65
10 | 23 81 35 75 00
11 | 91 51 40 75 40 00
12 | 10 31 35 80
13 | 70
Figure 1. Stem-and-leaf display of rents.
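When the values have three or four digits, the leaf can hold the last two digits and the stem the rest. The sketch below applies that split to the 30 rents; `stem_and_leaf_hundreds` is a hypothetical helper. Leaves appear in the order the rents are listed, so the order within a row may differ from the display above even though each row holds the same leaves.

```python
def stem_and_leaf_hundreds(values):
    """Use the hundreds (and thousands) digits as the stem, the last two digits as the leaf."""
    stems = {}
    for v in values:
        stem, leaf = divmod(v, 100)                      # e.g. 1081 -> stem 10, leaf 81
        stems.setdefault(stem, []).append(f"{leaf:02d}") # keep leading zero, e.g. "00"
    return [f"{stem:>2} | " + " ".join(stems[stem]) for stem in sorted(stems)]

rents = [880, 1081, 721, 1075, 1023, 775, 1235, 750, 965, 960,
         1210, 985, 1231, 932, 850, 825, 1000, 915, 1191, 1035,
         1151, 630, 1175, 952, 1100, 1140, 750, 1140, 1370, 1280]

display = stem_and_leaf_hundreds(rents)
for line in display:
    print(line)
```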
Example 3: The following stem-and-leaf display is prepared for the number of hours that
25 students spent working on computers during the last month.
0 | 6
1 | 1 7 9
2 | 2 6
3 | 2 4 7 8
4 | 1 5 6 9 9
5 | 3 6 8
6 | 2 4 4 5 7
7 |
8 | 5 6
Prepare a new stem-and-leaf display by grouping the stems. In the grouped display, an asterisk
separates the leaves of one stem from the leaves of the next.
0–2 | 6 * 1 7 9 * 2 6
3–5 | 2 4 7 8 * 1 5 6 9 9 * 3 6 8
6–8 | 2 4 4 5 7 * * 5 6
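Grouping stems can be sketched as follows, using the hours display from Example 3 and groups of three stems. The function name `group_stems` and the dictionary layout are assumptions for this example; an asterisk marks the boundary between the leaves of adjacent stems, so an empty stem shows up as two asterisks in a row.

```python
def group_stems(display, size):
    """display maps stem -> list of leaves; combine `size` consecutive stems per row."""
    stems = sorted(display)
    lines = []
    for i in range(0, len(stems), size):
        chunk = stems[i:i + size]
        tokens = []
        for j, s in enumerate(chunk):
            if j > 0:
                tokens.append("*")       # separator between adjacent stems
            tokens.extend(str(leaf) for leaf in display[s])
        lines.append(f"{chunk[0]}–{chunk[-1]} | " + " ".join(tokens))
    return lines

hours = {0: [6], 1: [1, 7, 9], 2: [2, 6],
         3: [2, 4, 7, 8], 4: [1, 5, 6, 9, 9], 5: [3, 6, 8],
         6: [2, 4, 4, 5, 7], 7: [], 8: [5, 6]}

grouped = group_stems(hours, 3)
for line in grouped:
    print(line)
```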