Chapter 3
What did you find out from your reading about the Titanic data? (Especially once the data were organized in a table?)
Frequency Table: Records the totals and the category names.
Relative Frequency Table: Gives the percents for each category.
Distribution: Names the categories and tells how frequently each occurs.
Area Principle: The area on a graph should correspond to the magnitude of the value it represents.
Bar Chart: Displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.
Pie Chart: Shows the whole group of cases as a circle, cut into slices where each slice is proportional to the fraction of the whole in each category.
Often, we want to investigate whether there's a relationship between two categorical variables. For example, our authors want to determine whether there's a relationship between the kind of ticket a passenger held and the passenger's chance of survival. One method for investigating this relationship is a two-way table called a contingency table.
                      Class
Survival    First   Second   Third   Crew   Total
Alive         203      118     178    212     711
Dead          122      167     528    673    1490
Total         325      285     706    885    2201
The frequency distribution of one of the variables is called its marginal distribution.
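As a rough illustration of the idea, the marginal distributions can be computed by summing the cell counts across the other variable. The sketch below uses the Titanic counts from these notes; the names `counts`, `class_totals`, and `survival_totals` are made up for this example.

```python
# Cell counts from the Titanic contingency table in these notes.
counts = {
    "First":  {"Alive": 203, "Dead": 122},
    "Second": {"Alive": 118, "Dead": 167},
    "Third":  {"Alive": 178, "Dead": 528},
    "Crew":   {"Alive": 212, "Dead": 673},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of Class: total each class, ignoring survival.
class_totals = {cls: sum(row.values()) for cls, row in counts.items()}

# Marginal distribution of Survival: total each outcome, ignoring class.
survival_totals = {}
for row in counts.values():
    for outcome, n in row.items():
        survival_totals[outcome] = survival_totals.get(outcome, 0) + n

# Print each marginal distribution as counts and percents of all cases.
for name, totals in (("Class", class_totals), ("Survival", survival_totals)):
    print(name)
    for category, n in totals.items():
        print(f"  {category:<8}{n:>6}{100 * n / grand_total:>8.1f}%")
```

The totals land in the margins of the table, which is where the name "marginal" comes from.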
                                      Class
Survival                  First   Second   Third    Crew    Total
Alive    Count              203      118     178     212      711
         % of Row         28.6%    16.6%   25.0%   29.8%     100%
         % of Column      62.5%    41.4%   25.2%   24.0%    32.3%
         % of Table        9.2%     5.4%    8.1%    9.6%    32.3%
Dead     Count              122      167     528     673     1490
         % of Row          8.2%    11.2%   35.4%   45.2%     100%
         % of Column      37.5%    58.6%   74.8%   76.0%    67.7%
         % of Table        5.6%     7.6%   24.0%   30.6%    67.7%
Total    Count              325      285     706     885     2201
         % of Row         14.8%    12.9%   32.1%   40.2%     100%
         % of Column       100%     100%    100%    100%     100%
         % of Table       14.8%    12.9%   32.1%   40.2%     100%
                                      Class
Survival                  First   Second   Third    Crew    Total
Alive    Count              203      118     178     212      711
         % of Column      62.5%    41.4%   25.2%   24.0%    32.3%
Dead     Count              122      167     528     673     1490
         % of Column      37.5%    58.6%   74.8%   76.0%    67.7%
Total    Count              325      285     706     885     2201
         % of Column       100%     100%    100%    100%     100%
Percent of what?
What percent of the survivors were in second class?
What percent were second-class passengers who survived?
What percent of the second-class passengers survived?
Asking "percent of what?" will help you to know the Who and whether to use row, column, or table percentages.
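One way to see the difference between these three questions is that the numerator is the same each time and only the denominator changes. The sketch below works them out from the Titanic counts in these notes. The variable and function names are invented for illustration, and the row/column labels assume survival is in the rows and class is in the columns.

```python
# The same cell count is the numerator in all three questions.
alive_second = 118   # second-class passengers who survived

# Three different "wholes" give three different percentages.
survivors = 711      # everyone who survived (row total)
second_class = 285   # everyone in second class (column total)
everyone = 2201      # everyone on board (table total)


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


pct_of_survivors = pct(alive_second, survivors)        # row percentage
pct_of_everyone = pct(alive_second, everyone)          # table percentage
pct_of_second_class = pct(alive_second, second_class)  # column percentage

print(f"Of the survivors, {pct_of_survivors}% were in second class.")
print(f"Of everyone, {pct_of_everyone}% were second-class survivors.")
print(f"Of second class, {pct_of_second_class}% survived.")
```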
Conditional Distributions
A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable. For example, we could look at the conditional distribution of ticket class, conditional on having survived:
                      Class
Survival    First   Second   Third    Crew   Total
Alive         203      118     178     212     711
            28.6%    16.6%   25.0%   29.8%    100%
One way to compare the conditional distributions for survival by class is to look at a new type of bar chart called a segmented bar chart. A segmented bar chart treats each bar as the whole and divides it proportionally into segments corresponding to the percentage in each group.
[Figure: segmented bar chart. Horizontal axis: Survival (Alive, Dead); vertical axis: percent.]
Does it appear that survival may have depended on class? Do you think there is an association between these variables?
Independence
In a contingency table, when the distribution of one variable is the same for all categories of another, we say that the variables are independent. We'll see a way to check for independence formally later in the book. For now, we'll just compare the distributions.
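The informal comparison can be sketched in a few lines: compute the conditional distribution of survival within each class and see whether the percentages look alike. This is only an eyeball check under the counts given in these notes, not a formal test, and the names used are illustrative.

```python
# Cell counts from the Titanic contingency table in these notes.
counts = {
    "First":  {"Alive": 203, "Dead": 122},
    "Second": {"Alive": 118, "Dead": 167},
    "Third":  {"Alive": 178, "Dead": 528},
    "Crew":   {"Alive": 212, "Dead": 673},
}

# Conditional distribution of survival within each class:
# the fraction of that class who survived.
survival_rate = {
    cls: row["Alive"] / (row["Alive"] + row["Dead"])
    for cls, row in counts.items()
}

# Overall survival rate (the marginal distribution), for reference.
total_alive = sum(row["Alive"] for row in counts.values())
total_all = sum(row["Alive"] + row["Dead"] for row in counts.values())
overall_rate = total_alive / total_all

for cls, rate in survival_rate.items():
    print(f"{cls:<8}{100 * rate:5.1f}% alive")
print(f"{'Overall':<8}{100 * overall_rate:5.1f}% alive")

# If the variables were independent, every class would show roughly the
# overall rate. A large gap between classes suggests an association.
spread = max(survival_rate.values()) - min(survival_rate.values())
print(f"Gap between highest and lowest class: {100 * spread:.1f} points")
```

How large a gap counts as "different" is a judgment call at this stage; the formal check comes later in the book.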
Let's investigate our own data and see what questions we can answer.
Don't violate the area principle.
Keep it honest.
Don't confuse similar-sounding percentages.
Don't forget to look at the variables separately, too.
Be sure to use enough individuals.
Don't overstate your case.
Don't use unfair or silly averages.
Consider this
It's the last inning of an important game. Your team is a run down with the bases loaded and two outs. The pitcher is due up, so you'll be sending in a pinch-hitter. There are 2 batters available on the bench. Whom should you send in to bat?
Player    Overall       Vs. LHP      Vs. RHP
A         33 for 103    28 for 81    5 for 22
B         45 for 151    12 for 32    33 for 119
Simpson's paradox occurs when someone uses unfair averaging over different groups.
The moral of Simpson's paradox is to be careful when you average across different levels of a second variable. It's always better to compare percentages or other averages within each level of the other variable. The overall average may be misleading.
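To see the paradox in the pinch-hitter numbers, the averages can be worked out both overall and within each type of pitcher. The sketch below uses the hits and at-bats from the table in these notes; the data layout and names are just one possible arrangement.

```python
# (hits, at-bats) for each player against left- and right-handed pitchers.
records = {
    "A": {"LHP": (28, 81), "RHP": (5, 22)},
    "B": {"LHP": (12, 32), "RHP": (33, 119)},
}

overall = {}     # batting average pooled over both kinds of pitcher
by_pitcher = {}  # batting average within each kind of pitcher

for player, splits in records.items():
    total_hits = sum(hits for hits, _ in splits.values())
    total_at_bats = sum(at_bats for _, at_bats in splits.values())
    overall[player] = total_hits / total_at_bats
    by_pitcher[player] = {
        kind: hits / at_bats for kind, (hits, at_bats) in splits.items()
    }

for player in records:
    print(
        f"Player {player}: overall {overall[player]:.3f}, "
        f"vs. LHP {by_pitcher[player]['LHP']:.3f}, "
        f"vs. RHP {by_pitcher[player]['RHP']:.3f}"
    )
```

Player A has the higher overall average, yet Player B has the higher average against both left-handed and right-handed pitchers. A's overall figure is pulled up because most of A's at-bats came against left-handers, where both players hit better.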
Homework
Pgs. 21-26: Choose 1 from numbers 1-4, then complete problems 5, 7, 12, 16, 22-24, 26, and 31.
Read the Investigative Task.