Professional Documents
Culture Documents
1) Raw Data
Obtain a printout of the raw data for all the variables. Raw data
resembles a matrix, with the variable names heading the columns, and the
information for each case or record displayed across the rows.
Example: Raw data for a study of injuries among county workers (first 10
cases)
It is difficult to tell what is going on with each variable in this data set.
Raw data is difficult to grasp, especially with large number of cases or records.
Univariate descriptive statistics can summarize large quantities of numerical
data and reveal patterns in the raw data. In order to present the information in
a more organized format, start with univariate descriptive statistics for each
variable.
For example, the variable Severity of Injury:
Severity of Injury
3
4
6
4
5
9
3
2
9
3
2) Frequency Distribution
Obtain a frequency distribution of the data for the variable. This is done by identifying the
lowest and highest values of the variable, and then putting all the values of the variable in order from
lowest to highest. Next, count the number of appearance of each value of the variable. This is a count of
the frequency with which each value occurs in the data set. For example, for the variable "Severity of
Injury," the values range from 2 to 9.
Severity of Number of Injuries with
Injury this Severity
2 1
3 3
4 2
5 1
6 1
9 2
Total 10
3) Grouped Data
The severity of injury ratings can be collapsed into just a few categories or groups. Grouped data
usually has from 3 to 7 groups. There should be no groups with a frequency of zero (for example, there
are no injuries with a severity rating of 7 or 8). One way to construct groups is to have equal class
intervals (e.g., 1-3, 4-6, 7-9). Another way to construct groups is to have about equal numbers of
observations in each group. Remember that class intervals must be both mutually exclusive and
exhaustive.
Cumulative frequency distributions include a third column in the table (this can be done with
either simple frequency distributions or with grouped data):
Severity of Number of Cumulative
Injury Injuries frequency
2 1 1
3 3 4
4 2 6
5 1 7
6 1 8
9 2 10
A cumulative frequency distribution can answer questions such as, how
many of the injuries were at level 5 or lower? Answer=7
Frequencies can also be presented in the form of percentage distributions and cumulative
percentages.
Severity of Percent of Cumulative
Injury Injuries percentages
2 10 10
3 30 40
4 20 50
5 10 70
6 10 80
9 20 100
Graphing is a way of visually presenting the data. Many people can grasp
the information presented in a graph better than in a text format. The purpose
of graphing is to:
1. present the data
2. summarize the data
3. enhance textual descriptions
4. describe and explore the data
5. make comparisons easy
6. avoid distortion
7. provoke thought about the data
Bar Graphs
Bar graphs are used to display the frequency distributions for variables
measured at the nominal and ordinal levels. Bar graphs use the same width for
all the bars on the graph, and there is space between the bars. Label the parts
of the graph, including the title, the left (Y) or vertical axis, the right (X) or
horizontal axis, and the bar labels.
Bar graphs can also be rotated so that the bars are parallel to the horizontal
orientation of the page. For example,
HISTOGRAM
A histogram is a chart that is similar to a bar chart, but it is used for
interval and ratio level variables. With a histogram, the width of the bar is
important, since it is the total area under the bar that represents the
proportion of the phenomenon accounted for by each category. The bars convey
the relationship of one group or class of the variable to the other(s).
For example, in the case of the counties and employee injuries, we might
have information on the rate of injury according to the number of workers in
each county in State X.
If we group the injury rates into three groups, then a low rate of injury
would be 0.0-1.9 injuries per 1,000 workers; moderate would be 2.0-3.9; and
high would be 4.0 and above (in this case, up to 5.9). This could be graphed as
follows:
FREQUENCY POLYGON
A cumulative frequency polygon is used to display the cumulative
distribution of values for a variable.
PIE CHART
Other ways to look at the sub-groups or classes within one variable is by
the relation of each sub-group or class to the whole. This can be calculated
with a proportion. A proportion is obtained by dividing the frequency of
observations counted for one group or class (written as f) by the total number
of observations counted for the variable (written as N).
Many health statistics are expressed as rates, for example, the birth rate
is the number of births per some population, such as number of births per
1,000 women.