usually categorical variables. Suppose that we have two variables, sex (male or female) and handedness (right- or left-handed). We observe the values of both variables in a random sample of 100 people. Then a contingency table can be used to express the relationship between these two variables, as follows:
right-handed left-handed TOTAL
The figures in the right-hand column and the bottom row are called marginal totals, and the figure in the bottom right-hand corner is the grand total. The table allows us to see at a glance that the proportion of men who are right-handed is about the same as the proportion of women who are right-handed. However, the two proportions are not identical, and the statistical significance of the difference between them can be tested with Pearson's chi-square test, a G-test, Fisher's exact test or Barnard's test, provided the entries in the table represent a random sample from the population contemplated in the null hypothesis.

If the proportions of individuals in the different columns vary between rows (and, therefore, vice versa), we say that the table shows contingency between the two variables. If there is no contingency, we say that the two variables are independent.

The example above is the simplest kind of contingency table, in which each variable has only two levels; this is called a 2 × 2 contingency table. In principle, any number of rows and columns may be used. There may also be more than two variables, but higher-order contingency tables are hard to represent on paper. The relationship between ordinal variables, or between ordinal and categorical variables, may also be represented in contingency tables, though this is less often done since the distributions of ordinal variables can be summarised efficiently by the median.

The degree of association between the two variables can be assessed by a number of coefficients: the simplest is the phi coefficient defined by
$$\varphi = \sqrt{\frac{\chi^2}{N}},$$

where χ² is derived from the Pearson test and N is the grand total number of observations. φ varies from 0 (corresponding to no association between the variables) to 1 (complete association). This coefficient can only be used for 2 × 2 tables. Alternatives include the tetrachoric correlation coefficient (also only useful for 2 × 2 tables), the contingency coefficient C and Cramér's V.

The tetrachoric correlation coefficient is essentially the Pearson product-moment correlation coefficient between hypothetical row and column variables with Normal distributions that would reproduce the observed contingency table if they were divided into two categories in the appropriate proportions. It should not be confused with the Pearson product-moment correlation coefficient computed by assigning values 0 and 1 to the cells. In tables with more than two levels for each variable, an analogous quantity is called the polychoric correlation coefficient.

The formulae for the other coefficients are:

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}, \qquad V = \sqrt{\frac{\chi^2}{N(k-1)}},$$

k being the number of rows or the number of columns, whichever is less. C suffers from the disadvantage that it does not reach a maximum of 1 with complete association in asymmetrical tables (those where the numbers of rows and columns are not equal). C can be adjusted so that it reaches a maximum of 1 when there is complete association in a table of any number of rows and columns by dividing it by $\sqrt{(k-1)/k}$.

The term contingency table was first used by Karl Pearson in "On the Theory of Contingency and its Relation to Association and Normal Correlation" in Drapers' Company Research Memoirs (1904) Biometric Series I.
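As a rough illustration, the coefficients can be computed from a table of counts in a few lines of Python. This is a minimal sketch, not a reference implementation: it assumes the standard definitions φ = sqrt(χ²/N), C = sqrt(χ²/(N + χ²)) and V = sqrt(χ²/(N(k − 1))), applies no continuity correction, and assumes every expected count is non-zero. The function names are made up for this example, and the counts are those of the car-phone table (Table 1) that appears later in this document.

```python
from math import sqrt


def pearson_chi2(table):
    """Return (chi-square statistic, grand total) for a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def association_coefficients(table):
    """Compute phi, C, adjusted C and Cramer's V (assumed standard formulas)."""
    chi2, n = pearson_chi2(table)
    k = min(len(table), len(table[0]))  # rows or columns, whichever is less
    c = sqrt(chi2 / (n + chi2))
    return {
        "chi2": chi2,
        "phi": sqrt(chi2 / n),  # only meaningful for 2 x 2 tables
        "C": c,
        "C_adjusted": c / sqrt((k - 1) / k),
        "V": sqrt(chi2 / (n * (k - 1))),
    }


# Rows: car phone user / not a user; columns: violation / no violation.
car_phone = [[25, 280], [45, 405]]
coeffs = association_coefficients(car_phone)
for name, value in coeffs.items():
    print(f"{name} = {value:.4f}")
```

For a 2 × 2 table k − 1 = 1, so V coincides with φ; the small values obtained here suggest only a weak association between car phone use and speeding violations in that sample.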
Contingency Table: A contingency table is a tabular representation of categorical data. A contingency table usually shows frequencies for particular combinations of values of two discrete random variables X and Y. Each cell in the table represents a mutually exclusive combination of X-Y values.

For example, consider a sample of N=200 beer-drinkers. For each drinker we have information on sex (variable X, taking on 2 possible values: "Male" and "Female") and preferred category of beer (variable Y, taking on 3 possible values: "Light", "Regular", "Dark"). A contingency table for these data might look like the following:

          Light   Regular   Dark   Total
Male        20       40      50     110
Female      50       20      20      90
Total       70       60      70     200

This is a two-way 2x3 contingency table (i.e. two rows and three columns).

Sometimes three-way (and more) contingency tables are used. Suppose the beer-drinkers data, besides sex and preference, are also stratified by age group. The third discrete variable Z ("Age") in this case might, for example, take on 4 values: "<25", "25-40", "40-65", ">65". In this case we would have a three-way 2x3x4 contingency table, equivalent to 4 two-way 2x3 contingency tables (one 2x3 table for each of the 4 age-groups).

See also: Contingency tables analysis

A contingency table provides a different way of calculating probabilities. The table helps in determining conditional probabilities quite easily. The table displays sample values in relation to two different variables that may be dependent or contingent on one another. Later on, we will use contingency tables again, but in another manner.

Example 1
Suppose a study of speeding violations and drivers who use car phones produced the following fictional data:
                        Speeding violation    No speeding violation    Total
                        in the last year      in the last year
Car phone user                  25                    280               305
Not a car phone user            45                    405               450
Total                           70                    685               755

Table 1

The total number of people in the sample is 755. The row totals are 305 and 450. The column totals are 70 and 685. Notice that 305+450=755 and 70+685=755.

Calculate the following probabilities using the table.

Problem 1
P(person is a car phone user) =
Solution
(number of car phone users) / (total number in study) = 305/755

Problem 2
P(person had no violation in the last year) =
Solution
(number that had no violation) / (total number in study) = 685/755
Problem 3
P(person had no violation in the last year AND was a car phone user) =
Solution
280/755

Problem 4
P(person is a car phone user OR person had no violation in the last year) =
Solution
305/755 + 685/755 - 280/755 = 710/755

Problem 5
P(person is a car phone user GIVEN person had a violation in the last year) =
Solution
25/70

Problem 6
P(person had no violation last year GIVEN person was not a car phone user) =
Solution
405/450

Example 2
The following table shows a random sample of 100 hikers and the areas of hiking preferred:

Sex       The Coastline   Near Lakes and Streams   On Mountain Peaks   Total
Female         18                  16                     ___            45
Male           ___                 ___                    14             55
Total          ___                 41                     ___            ___

Table 2: Hiking Area Preference

Problem 1
Complete the table.
Solution

Sex       The Coastline   Near Lakes and Streams   On Mountain Peaks   Total
Female         18                  16                     11             45
Male           16                  25                     14             55
Total          34                  41                     25            100

Table 3: Hiking Area Preference
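Both examples above can be checked mechanically. The following is a minimal sketch in Python, using exact fractions so that the results can be compared with the worked solutions. The dictionary layout, the helper name `n` and the label strings are choices made for this illustration only; the counts are taken from Table 1 and Table 2.

```python
from fractions import Fraction

# Table 1 counts, keyed by (phone status, violation status).
counts = {
    ("user", "violation"): 25,
    ("user", "none"): 280,
    ("non-user", "violation"): 45,
    ("non-user", "none"): 405,
}
total = sum(counts.values())  # grand total of the study


def n(phone=None, violation=None):
    """Add up the cells matching the given labels; None matches any label."""
    return sum(
        count
        for (p, v), count in counts.items()
        if phone in (None, p) and violation in (None, v)
    )


# Problems 1-3: marginal and joint probabilities divide by the grand total.
p_user = Fraction(n(phone="user"), total)
p_none = Fraction(n(violation="none"), total)
p_both = Fraction(n("user", "none"), total)
# Problem 4: P(A or B) = P(A) + P(B) - P(A and B).
p_either = p_user + p_none - p_both
# Problems 5-6: conditional probabilities divide by the total of the given group.
p_user_given_violation = Fraction(n("user", "violation"), n(violation="violation"))
p_none_given_non_user = Fraction(n("non-user", "none"), n(phone="non-user"))

# Table 2: fill each blank from the known entries and marginal totals.
female_total, male_total = 45, 55
female_coast, female_lakes = 18, 16
male_peaks = 14
lakes_total = 41

female_peaks = female_total - female_coast - female_lakes
male_lakes = lakes_total - female_lakes
male_coast = male_total - male_lakes - male_peaks
coast_total = female_coast + male_coast
peaks_total = female_peaks + male_peaks
grand_total = female_total + male_total

print("P(user or no violation) =", p_either)
print("Female:", female_coast, female_lakes, female_peaks, female_total)
print("Male:  ", male_coast, male_lakes, male_peaks, male_total)
print("Total: ", coast_total, lakes_total, peaks_total, grand_total)
```

Note that `Fraction` prints values in lowest terms, so 710/755 is displayed in reduced form; it is still equal to the fraction given in the solution.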
Quantitative and qualitative data are two types of data.

Qualitative data
Qualitative, or categorical: measurement expressed not in terms of numbers, but rather by means of a natural language description. In statistics it is often used interchangeably with "categorical" data. For example:
favourite colour = "blue"
height = "tall"

Although we may have categories, the categories may have a structure to them. When there is not a natural ordering of the categories, we call these nominal categories. Examples might be gender, race, religion, or sport. When the categories may be ordered, these are called ordinal variables. Categorical variables that judge size (small, medium, large, etc.) are ordinal variables. Attitudes (strongly disagree, disagree, neutral, agree, strongly agree) are also ordinal variables; however, we may not know which value is the best or worst of these issues. Note that the distance between these categories is not something we can measure.

Quantitative data
Quantitative, or numerical: measurement expressed not by means of a natural language description, but rather in terms of numbers. However, not all numbers are continuous and measurable. A social security number, for example, is a number, but it is not something that one can add or subtract. For example:
favourite colour = "450 nm"
height = "1.8 m"

Quantitative data always are associated with a scale measure. Probably the most common scale type is the ratio scale. Observations of this type are on a scale that has a meaningful zero value and also an equidistant measure (i.e., the difference between 10 and 20 is the same as the difference between 100 and 110). For example, a 10-year-old girl is twice as old as a 5-year-old girl. Since you can measure zero years, time is a ratio-scale variable. Money is another common ratio-scale quantitative measure. Observations that you count are usually ratio-scale (e.g., number of widgets).

A more general quantitative measure is the interval scale. Interval scales also have an equidistant measure. However, the doubling principle breaks down in this scale. A temperature of 50 degrees Celsius is not "half as hot" as a temperature of 100, but a difference of 10 degrees indicates the same difference in temperature anywhere along the scale. The Kelvin temperature scale, however, constitutes a ratio scale because on the Kelvin scale zero indicates absolute zero in temperature, the complete absence of heat. So one can say, for example, that 200 degrees Kelvin is twice as hot as 100 degrees Kelvin.
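The difference between interval and ratio scales can be illustrated with a short calculation. This is only a sketch: the helper `celsius_to_kelvin` is written for this example, and it uses the usual offset of 273.15 between the two temperature scales.

```python
from math import isclose


def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvins (offset of 273.15)."""
    return celsius + 273.15


# Ratios are not meaningful on the Celsius (interval) scale: 100 looks like
# "twice" 50, but the ratio changes once both values are put on a scale
# with a true zero.
ratio_celsius = 100 / 50
ratio_kelvin = celsius_to_kelvin(100) / celsius_to_kelvin(50)

# Differences, however, are preserved: a 10-degree step is the same size
# anywhere along the scale, and on either scale.
step_low = celsius_to_kelvin(30) - celsius_to_kelvin(20)
step_high = celsius_to_kelvin(90) - celsius_to_kelvin(80)

print(f"ratio on the Celsius scale: {ratio_celsius:.3f}")
print(f"ratio on the Kelvin scale:  {ratio_kelvin:.3f}")
print("equal steps:", isclose(step_low, step_high))
```

The ratio computed in kelvins is well below 2, which is why 100 degrees Celsius cannot be described as "twice as hot" as 50 degrees Celsius.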