Chapter 2 – Tabular and Graphical Methods

A. Categorical variable – consists of observations that represent labels or names.

Summarize the data with a frequency distribution:

• Group the data into categories and record the number of observations that fall into each category.
• The relative frequency for each category is the proportion of observations in that category.
• Multiply the proportions by 100 to get percentages.

Example: Myers-Briggs assessment personality types for 1,000 employees at a technology firm.

Bar Chart – depicts the frequency or relative frequency for each category of the categorical variable.

• Series of either horizontal or vertical bars
• Bar lengths are proportional to the values they depict

Pie Chart – a segmented circle whose segments portray the relative frequencies of the categories of a qualitative (categorical) variable.

Cautions when constructing or reading a chart:

• The vertical axis should not have a very high value as its upper limit (this compresses the data and hides differences).
• The vertical axis should not be stretched (this exaggerates differences).

Contingency Table – used to examine the relationship between two categorical variables.

Stacked Column Chart – used to visualize more than one categorical variable.

• Graphically shows the contingency table
• Allows for the comparison of composition within each category
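
As a minimal sketch of the tabular methods for categorical variables (a frequency distribution with relative frequencies and percentages, plus a contingency table for two categorical variables), the Python below uses only the standard library. The function names, the category labels, and the ten sample observations are assumptions made up for illustration; they are not the 1,000-employee data from the example.

```python
from collections import Counter

def frequency_distribution(observations):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(observations)
    n = len(observations)
    return {cat: (freq, freq / n, 100 * freq / n) for cat, freq in counts.items()}

def contingency_table(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(zip(rows, cols))

# Made-up sample for illustration only (not the 1,000-employee data).
personality = ["Analyst", "Diplomat", "Sentinel", "Sentinel", "Explorer",
               "Analyst", "Sentinel", "Diplomat", "Sentinel", "Analyst"]
department = ["R&D", "Sales", "Sales", "R&D", "Sales",
              "R&D", "Sales", "Sales", "R&D", "R&D"]

dist = frequency_distribution(personality)
table = contingency_table(personality, department)

# One line per category: frequency, relative frequency, percentage.
for category, (freq, rel, pct) in sorted(dist.items()):
    print(f"{category:<10} {freq:>3} {rel:>6.2f} {pct:>6.1f}%")

# One line per cell of the contingency table that has at least one observation.
for (ptype, dept), count in sorted(table.items()):
    print(f"{ptype:<10} {dept:<6} {count}")
```

A bar chart or pie chart would plot the first table; a stacked column chart would plot the second, with one column per personality type split by department.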
B. Numerical Variable – each observation represents a meaningful amount or count.

Use a frequency distribution to summarize a numerical variable.

• Instead of categories, we construct a series of intervals or classes.
• The data are more manageable using a frequency distribution, but some detail is lost.
• We have to make decisions about the number of intervals and the width of each interval.

Example: house prices in Punta Gorda

• Suppose we are going to have 6 intervals.
• The maximum is 649 and the minimum is 125.
• As a starting point, the width of each interval could be: (649 – 125) / 6 = 87.33
• This would not give limits that are easily recognizable, so we use 100.
• There are 3 other items to compute:

Relative frequency: the proportion (or fraction) of observations that fall into each interval.

Cumulative frequency: the number of observations that fall below the upper limit of a particular interval.

Cumulative relative frequency: the proportion (or fraction) of observations that fall below the upper limit of a particular interval.

Histogram – the counterpart to the vertical bar chart used for a categorical variable. It graphically depicts a frequency distribution for a numerical variable.

• A series of rectangles
• Mark off the interval limits along the horizontal axis
• The height of each bar represents the frequency or relative frequency for each interval
• No gaps between bars/intervals

A histogram allows us to quickly see where most of the observations tend to cluster. It indicates the spread and shape of the variable.

• Symmetric: mirror image of itself on both sides of its center
• Skewed: positive (elongated right tail) or negative (elongated left tail)
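
The house-price calculation and the three derived quantities can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used to build the original table: the ten prices are made up (only the minimum 125 and maximum 649 come from the example), the first interval is assumed to start at 100, and each interval is assumed to include its upper limit (lo < x <= hi).

```python
import math

def suggested_width(minimum, maximum, k):
    """Starting-point interval width: range divided by the number of intervals."""
    return (maximum - minimum) / k

def frequency_table(values, lower, width, k):
    """Tabulate k intervals (lo, hi] starting at `lower`.

    Each row is (lo, hi, frequency, relative frequency,
    cumulative frequency, cumulative relative frequency).
    """
    counts = [0] * k
    for v in values:
        # Assumed convention: a value belongs to the interval with lo < v <= hi.
        idx = math.ceil((v - lower) / width) - 1
        if not 0 <= idx < k:
            raise ValueError(f"{v} falls outside the intervals")
        counts[idx] += 1
    n = len(values)
    rows, cumulative = [], 0
    for i, freq in enumerate(counts):
        cumulative += freq
        lo = lower + i * width
        rows.append((lo, lo + width, freq, freq / n, cumulative, cumulative / n))
    return rows

# Made-up prices for illustration; only the min (125) and max (649) match the example.
prices = [125, 180, 210, 250, 275, 330, 340, 410, 520, 649]

print(round(suggested_width(min(prices), max(prices), 6), 2))  # 87.33, rounded up to 100 below

rows = frequency_table(prices, lower=100, width=100, k=6)
for lo, hi, freq, rel, cum, cum_rel in rows:
    # The run of '#' characters is a crude text histogram of the frequencies.
    print(f"{lo:>3}-{hi:<3} {freq:>2} {rel:>5.2f} {cum:>3} {cum_rel:>5.2f} {'#' * freq}")
```

The last row's cumulative frequency equals the number of observations and its cumulative relative frequency equals 1, which is a quick check that no value was dropped.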
Scatter Plot – used to examine the relationship between two numerical variables.

• Determine if two numerical variables are related in some systematic way
• Each point represents a pair of observations of the two variables
• Refer to one variable as x (x-axis) and the other as y (y-axis)
• A scatterplot with a categorical variable modifies a basic scatterplot: it incorporates a categorical variable in addition to the two numerical variables.
• Encode the categorical variable with color.
• Giving each category a distinct hue makes it easy to show each point's category.

Line Chart – displays a numerical variable as a series of consecutive observations connected by a line.

• A line chart is especially useful for tracking changes or trends over time.
• It is also easy for us to identify any major changes that happened in the past on a line chart.
• When multiple lines are plotted in the same chart, we can compare these observations on one or more dimensions.

Example: monthly stock prices for Apple and Merck
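
For the scatterplot with a categorical variable, the color-encoding step can be sketched as a mapping from category to hue that is applied to every point. The function name, the palette strings, and the sample points below are assumptions for illustration; a plotting library would then draw each point in its assigned color.

```python
from itertools import cycle

def assign_colors(categories, palette=("tab:blue", "tab:orange", "tab:green", "tab:red")):
    """Map each distinct category to one color, in order of first appearance.

    The palette names are placeholders; colors repeat if there are more
    categories than palette entries.
    """
    colors = {}
    palette_cycle = cycle(palette)
    for cat in categories:
        if cat not in colors:
            colors[cat] = next(palette_cycle)
    return colors

# Made-up (x, y, category) observations for illustration only.
points = [(1.0, 2.1, "A"), (2.0, 3.9, "B"), (3.0, 6.2, "A"), (4.0, 8.1, "C")]

color_of = assign_colors(cat for _, _, cat in points)
colored_points = [(x, y, color_of[cat]) for x, y, cat in points]

for x, y, color in colored_points:
    print(x, y, color)
```

Points that share a category share a color, so clusters by category become visible on top of the x-y relationship.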
