CATEGORICAL DATA ANALYSIS

by: Novianto Budi Kurniawan
WHAT IS CATEGORICAL DATA?
Categorical data is a collection of information that is divided into groups.
Types of Categorical Data:
1. Nominal
Data used to name (label) variables without providing any numerical value (“labelled” or “named” data).
2. Ordinal
Data with a set order or scale to it.

Categorical data can be coded with numerical values (such as “1” indicating Yes and “2” indicating No), but those numbers have no mathematical meaning: one can neither add them together nor subtract them from each other.
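A minimal Python sketch of this distinction, using hypothetical survey answers and the food-texture scale (Very soft < Soft < Hard < Very hard). The names `CODES`, `TEXTURE_ORDER` and `is_harder` are illustrative assumptions, not part of the course material: counting cases per category is meaningful, arithmetic on the codes is not, and only ordinal categories support order comparisons.

```python
from collections import Counter

# Survey answers coded 1 = Yes, 2 = No: the codes are labels, not quantities.
CODES = {1: "Yes", 2: "No"}
answers = [1, 2, 2, 1, 1]  # hypothetical responses

# Meaningful for nominal data: count the cases in each category.
counts = Counter(CODES[a] for a in answers)

# Computable but meaningless: the "mean code" is not an answer anyone gave.
mean_code = sum(answers) / len(answers)

# Ordinal data: the categories have a set order, stored here as a ranked list.
TEXTURE_ORDER = ["Very soft", "Soft", "Hard", "Very hard"]


def is_harder(a, b):
    """Order comparisons are meaningful for ordinal categories."""
    return TEXTURE_ORDER.index(a) > TEXTURE_ORDER.index(b)


print(counts)                     # Counter({'Yes': 3, 'No': 2})
print(mean_code)                  # 1.4
print(is_harder("Hard", "Soft"))  # True
```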

WHAT IS CATEGORICAL DATA
The measurement scale for the response consists of a number of categories.

Variable        Measurement Scale
Education       High school, Undergraduate, Graduate, etc.
Mortality       Dead, Alive
Food texture    Very soft, Soft, Hard, Very hard
Gender          Male, Female

CATEGORICAL DATA ANALYSIS
• Independent (explanatory) variable is categorical (nominal or ordinal)
• Dependent (response) variable is categorical (nominal or ordinal)
• Source: data collection (meta-analysis, census, survey, observations, etc.)
• Special cases: 2x2 (each variable has 2 levels)
– Nominal/Nominal
– Nominal/Ordinal
– Ordinal/Ordinal

• Contingency Tables

CONTINGENCY TABLES
• A table showing the distribution of one variable in rows and another in
columns, used to study the association between the two variables.
• Tables representing all combinations of levels of explanatory and
response variables
• The numbers in the table are counts of the cases in each cell

• The row and column totals are called marginal counts
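To make the idea of cells and marginal counts concrete, here is a small Python sketch that builds a contingency table from raw cases. The function name `crosstab` and the five cases are hypothetical, chosen only for illustration; each case is one (row category, column category) pair.

```python
from collections import Counter


def crosstab(cases):
    """Build a contingency table from (row_category, column_category) pairs.

    Returns the cell counts plus the marginal counts (row and column totals)
    and the total number of cases.
    """
    cells = Counter(cases)
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    # Counter returns 0 for an empty cell without adding it to the table.
    row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}
    col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}
    return cells, row_totals, col_totals, sum(cells.values())


# Hypothetical raw data: one (outcome, group) pair per case.
cases = [
    ("Dead", "control"), ("Alive", "control"),
    ("Alive", "antiserum"), ("Alive", "antiserum"), ("Dead", "antiserum"),
]
cells, row_totals, col_totals, n = crosstab(cases)
print(row_totals)  # {'Alive': 3, 'Dead': 2}
print(col_totals)  # {'antiserum': 3, 'control': 2}
print(n)           # 5
```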

ANALYSIS OF CONTINGENCY TABLES
Tables as a technique of data description
What can a contingency table tell us?
• Comparison between groups
• Mutual relationship between 2 (or more) variables
• Explanatory variable – groups (typically based on demographics, exposure, or treatment)
• Response variable – outcome (typically presence or absence of a characteristic)

DISPLAYING CONTINGENCY TABLES

ANALYSIS OF CONTINGENCY TABLES
Bivariate analysis of categorical variables

Relationship of two categorical variables → comparison of sub-groups

(effect of independent variable on dependent variable)

Cross-tabulation

TWO-WAY CONTINGENCY TABLE ANALYSIS
Figure: a 2×2 table, annotated with the marginal frequencies (the univariate frequency distribution for each variable) and the total number of cases. Source: [Lamser, Růžička 1970: 260]

TWO-WAY CONTINGENCY TABLE
Example:
A sample of 124 mice was divided into two groups: 84 received a standard dose of pathogenic bacteria followed by an antiserum, and a control group of 40 did not receive the antiserum. After 3 weeks, the numbers of dead and alive mice in each group were counted.
            Antiserum   Control   Total
Dead               19        18      37
Alive              65        22      87
Total              84        40     124

            Antiserum   Control
% Dead             23        45

Is there an association between mortality and treatment?
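The "% Dead" row can be reproduced with a few lines of Python. This sketch only restates the counts given in the example; the variable names are illustrative.

```python
# Mice example: (number dead, group size) for each treatment group.
groups = {"antiserum": (19, 84), "control": (18, 40)}

# Percentage dead within each treatment group.
pct_dead = {g: 100 * dead / size for g, (dead, size) in groups.items()}

# Difference in mortality between the two groups, in percentage points.
difference = pct_dead["control"] - pct_dead["antiserum"]

print({g: round(p) for g, p in pct_dead.items()})  # {'antiserum': 23, 'control': 45}
print(round(difference, 1))                         # 22.4
```

Mortality in the control group is roughly 22 percentage points higher, which is the comparison the question about association invites.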

CONSTRUCTION OF CONTINGENCY TABLES
Step 1: Determine which variable is independent and which is dependent.
Step 2: Calculate percentages within the categories of the independent variable.
Step 3: Compare percentages for one of the categories of the dependent variable.

CONSTRUCTION OF CONTINGENCY TABLES
INDEPENDENT (explanatory) variable: Gender
DEPENDENT (outcome) variable: Satisfaction

Satisfaction          Men   Women   Total
1 (not satisfied)       5       2       7
2 (moderate)            5       1       6
3 (satisfied)           2       6       8
Total                  12       9      21

Frequently we have the dependent variable on the left, in rows, and the independent (explanatory or predictor) variable in columns → column percent.
CONSTRUCTION OF CONTINGENCY TABLES
Configuration of contingency table → column percent:
Within each category of the independent variable we show the complete (100 %) distribution of the dependent variable.

INDEPENDENT (explanatory) variable: Gender
DEPENDENT (outcome) variable: Satisfaction

Satisfaction          Men           Women         Total
1 (not satisfied)     42 % (5)      22 % (2)      33 % (7)
2 (moderate)          42 % (5)      11 % (1)      29 % (6)
3 (satisfied)         17 % (2)      67 % (6)      38 % (8)
Total                100 % (12)    100 % (9)     100 % (21)
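The three construction steps can be sketched in Python for the gender and satisfaction counts. The function name `column_percentages` is a hypothetical helper. It assumes the table is stored as rows (dependent variable) mapping to columns (independent variable).

```python
# Step 1: gender is the independent variable (columns), satisfaction the
# dependent variable (rows). Counts are taken from the table above.
counts = {
    "1 (not satisfied)": {"Men": 5, "Women": 2},
    "2 (moderate)":      {"Men": 5, "Women": 1},
    "3 (satisfied)":     {"Men": 2, "Women": 6},
}


def column_percentages(table):
    """Step 2: percentage each cell within its column (independent-variable category)."""
    cols = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in cols}
    return {
        r: {c: 100 * row[c] / col_totals[c] for c in cols}
        for r, row in table.items()
    }


pct = column_percentages(counts)

# Step 3: compare one category of the dependent variable across the columns.
satisfied_gap = pct["3 (satisfied)"]["Women"] - pct["3 (satisfied)"]["Men"]

for row, cells in pct.items():
    print(row, {c: round(p) for c, p in cells.items()})
print(round(satisfied_gap))
```

Each column sums to 100 % before rounding. The rounded figures can add up to 99 % or 101 %.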

CONSTRUCTION OF CONTINGENCY TABLE
Illogical configuration of cross-tabulation (row percent)

                      Gender
Satisfaction          Men         Women       Total
1 (not satisfied)     5 (71 %)    2 (29 %)    7 (100 %)
2 (moderate)          5 (83 %)    1 (17 %)    6 (100 %)
3 (satisfied)         2 (25 %)    6 (75 %)    8 (100 %)
Total                 12          9           21 (100 %)

Beliefs can’t influence gender!
TWO-WAY CONTINGENCY TABLE ANALYSIS
A two-way (2 x 2) contingency table analysis evaluates whether a statistical relationship exists between two variables.
CONSTRUCTION OF CONTINGENCY TABLE
Table 1. Relationship between Educational Level and Performance on Civil Service Examination

                                      Education
Performance on Civil      High School     More than
Service Examination       or Less         High School     Total
Low                       100             200             300
High                      150             800             950
Total                     250             1,000           1,250
CONSTRUCTION OF CONTINGENCY TABLE
Percentage Distribution for Data of Table 1

                                      Education
Performance on Civil      High School or Less          More than High School
Service Examination
Low                       (100 ÷ 250) x 100% = 40%     (200 ÷ 1,000) x 100% = 20%
High                      (150 ÷ 250) x 100% = 60%     (800 ÷ 1,000) x 100% = 80%
Total                     100% (n = 250)               100% (n = 1,000)
CONSTRUCTION OF CONTINGENCY TABLE
Percentage Distribution for Data of Table 1

                                      Education
Performance on Civil      High School     More than
Service Examination       or Less         High School
Low                       40%             20%
High                      60%             80%
Total                     100%            100%
                          (n = 250)       (n = 1,000)

What can you conclude regarding the relationship between Education and Performance on Civil Service Examination?
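One way to approach this question is the percentage difference. The Python sketch below percentages Table 1 within each education group and subtracts the shares of high performers. The dictionary layout and names are illustrative assumptions.

```python
# Table 1 counts: performance (rows) by education (columns).
table1 = {
    "Low":  {"High school or less": 100, "More than high school": 200},
    "High": {"High school or less": 150, "More than high school": 800},
}
groups = ["High school or less", "More than high school"]

# Column totals, then the percentage of high performers in each education group.
col_totals = {g: sum(table1[row][g] for row in table1) for g in groups}
pct_high = {g: 100 * table1["High"][g] / col_totals[g] for g in groups}

# Percentage difference: more educated group minus less educated group.
difference = pct_high["More than high school"] - pct_high["High school or less"]
print(col_totals, pct_high, difference)
```

The result is 80% versus 60%, a difference of 20 percentage points in favour of the more educated group.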
ANALYSIS OF CONTINGENCY TABLES
Contingency tables: larger tables
• The fourfold (2×2) table can be generalized to r × c tables (rows × columns), e.g. 2×3 or 3×3.
• When interpreting the table, it is important whether one or both variables are nominal or ordinal.
• Categorical variables can in principle be:
– dichotomous → 0/1 (e.g. voted/did not vote)
– multinomial → more than 2 nominal categories
(e.g. study programme: HiSo-daily / HiSo-distant / Management&Superv.)
– ordinal → the categories are ranked
(e.g. Education: 1. Elementary, 2. Vocational training, 3. Secondary w/t diploma, 4. University)
• This distinction determines how we interpret the results (%) and which coefficient of association/correlation we can use.
ANALYSIS OF CONTINGENCY TABLES
A manager would expect income to lead to job satisfaction: the higher the income, the higher the expected job satisfaction. Using the following data, do you agree?

Table 2. Cross-Tabulation of Income and Job Satisfaction
                        Income
Job Satisfaction    Low    Medium    High    Total
Low                 100      30        10      140
Medium               60      80        15      155
High                 40      40        50      130
Total               200     150        75      425
ANALYSIS OF CONTINGENCY TABLES - ORDINAL
Percentage Cross-Tabulation of Income-Job Satisfaction Relationship
                        Income
Job Satisfaction    Low          Medium       High
Low                 50%          20%          13%
Medium              30%          53%          20%
High                20%          27%          67%
Total               100%         100%         100%
                    (n = 200)    (n = 150)    (n = 75)
ANALYSIS OF CONTINGENCY TABLES
Income-Job Satisfaction Relationship
• For this purpose, set aside the intermediate categories of the independent (and dependent) variable: this results in a clearer understanding and interpretation of the table.
• Compare the percentage of those with low income who have high job satisfaction (20%) with the percentage of those with high income who have high job satisfaction (67%).
• Alternatively, compare the percentage of those with low income who express low job satisfaction (50%) with the percentage of those with high income who express low job satisfaction (13%).
• These comparisons show that those with high income indicated high job satisfaction more often than did those with low income (by 47 percentage points) and, conversely, that those with low income indicated low job satisfaction more often than did their counterparts with high income (by 37 percentage points).
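The two comparisons can be checked with a short Python sketch over the Table 2 counts. The helper `pct` is a hypothetical name. As on the slide, the percentages are rounded to whole numbers before they are subtracted.

```python
# Table 2 counts: job satisfaction (rows) by income (columns).
table2 = {
    "Low":    {"Low": 100, "Medium": 30, "High": 10},
    "Medium": {"Low": 60,  "Medium": 80, "High": 15},
    "High":   {"Low": 40,  "Medium": 40, "High": 50},
}
income_levels = ["Low", "Medium", "High"]
col_totals = {inc: sum(row[inc] for row in table2.values()) for inc in income_levels}


def pct(satisfaction, income):
    """Percentage of an income group reporting a given satisfaction level."""
    return 100 * table2[satisfaction][income] / col_totals[income]


# Compare only the extreme income categories (the intermediate one is set aside);
# values are rounded to whole percents first, as on the slide.
high_sat_gap = round(pct("High", "High")) - round(pct("High", "Low"))  # 67 - 20
low_sat_gap = round(pct("Low", "Low")) - round(pct("Low", "High"))     # 50 - 13
print(high_sat_gap, low_sat_gap)
```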

ANALYSIS OF CONTINGENCY TABLES - ORDINAL
A disgruntled official working in the personnel department is disturbed by the level of incompetence she perceives in the leadership of the organization. She is convinced that incompetence rises to the top, and she shares this belief with you, her coworker, over lunch. She asks you to help her substantiate her claim. Using the following data, do you agree with her judgement?

Table 3. Cross-Tabulation of Competence and Hierarchy
                     Competence
Hierarchy      Low    Medium    High    Total
Low            113      60        27      200
Medium          31      91        38      160
High             8       8        24       40
Total          152     159        89      400
ANALYSIS OF CONTINGENCY TABLES - ORDINAL
Percentage Cross-Tabulation of Competence-Hierarchy Relationship
                     Competence
Hierarchy      Low          Medium       High
Low            74%          38%          30%
Medium         21%          57%          43%
High           5%           5%           27%
Total          100%         100%         100%
               (n = 152)    (n = 159)    (n = 89)
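To test the claim that "incompetence rises to the top", the relevant comparison is the share of each competence group that reaches the high hierarchy level. This Python sketch recomputes that row of the percentage table from the counts. The names are illustrative.

```python
# Competence-hierarchy counts: hierarchy (rows) by competence (columns).
table3 = {
    "Low":    {"Low": 113, "Medium": 60, "High": 27},
    "Medium": {"Low": 31,  "Medium": 91, "High": 38},
    "High":   {"Low": 8,   "Medium": 8,  "High": 24},
}
levels = ["Low", "Medium", "High"]

# Column totals: the number of employees at each competence level.
col_totals = {c: sum(table3[r][c] for r in levels) for c in levels}

# Percentage of each competence group found at the top (High) hierarchy level.
top_share = {c: 100 * table3["High"][c] / col_totals[c] for c in levels}
print({c: round(p) for c, p in top_share.items()})
```

The high-competence group reaches the top far more often (about 27%) than the low-competence group (about 5%).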
TWO-WAY CONTINGENCY TABLE ANALYSIS

CASE STUDY
STATISTICAL CONTROL TABLE ANALYSIS
When a pair of variables were found to be associated statistically, we inevitably assumed that they were, in fact, related, in the sense that changes in one could be expected to lead to changes in the other. Conversely, when the variables were not associated statistically, we
STATISTICAL CONTROL TABLE ANALYSIS
Control table analysis is a technique for the multivariate (three or more variable) analysis of nominal and ordinal variables. Control table analysis is used to determine how a third, “control” variable may affect the association between an independent
STATISTICAL CONTROL TABLE ANALYSIS
The procedure by which the researcher controls for the effect of a third variable on a bivariate relationship is deceptively simple. He or she examines the relationship between the original two variables within each of the categories of the control variable and compares the
Controlling for a Third Variable
Step 1: Partition the sample according to the categories of the control variable.
Step 2: Prepare the cross-tabulation between the original two variables for each of the subsamples defined by the control variable in Step 1.
Step 3: Interpret the cross-tabulations obtained for each of the categories of the control variable.
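The three steps can be sketched in Python. The counts below are hypothetical and chosen only to show how a bivariate association can disappear once a control variable is introduced. They are not the values of the case study's tables. The helpers `control_tables` and `pass_rate` are illustrative names.

```python
from collections import Counter, defaultdict


def control_tables(records, x, y, control):
    """Steps 1 and 2: partition the cases by the control variable and
    cross-tabulate x against y within each subsample.

    Returns {control_value: Counter({(x_value, y_value): count})}.
    """
    partitions = defaultdict(Counter)
    for rec in records:
        partitions[rec[control]][(rec[x], rec[y])] += 1
    return dict(partitions)


def pass_rate(table, x_value):
    """Step 3 helper: percentage of cases with this x value that passed."""
    passed = table[(x_value, "Pass")]
    return 100 * passed / (passed + table[(x_value, "Fail")])


# HYPOTHETICAL counts chosen only to illustrate a spurious relationship.
cells = [
    ("Yes", "Graduate", "Pass", 30), ("Yes", "Graduate", "Fail", 10),
    ("No", "Graduate", "Pass", 15), ("No", "Graduate", "Fail", 5),
    ("Yes", "Non-graduate", "Pass", 5), ("Yes", "Non-graduate", "Fail", 15),
    ("No", "Non-graduate", "Pass", 10), ("No", "Non-graduate", "Fail", 30),
]
records = [
    {"contact": c, "education": e, "result": r}
    for c, e, r, n in cells
    for _ in range(n)
]

# Original bivariate table (no control): contact appears to matter.
overall = Counter((rec["contact"], rec["result"]) for rec in records)
# Controlled tables: within each education level, the difference vanishes.
by_education = control_tables(records, "contact", "result", "education")

for label, table in [("All", overall), *by_education.items()]:
    print(label, round(pass_rate(table, "Yes")), round(pass_rate(table, "No")))
```

Overall, applicants with contact pass more often in these made-up data (58% versus 42%). Within each education level, the pass rates are identical.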

STATISTICAL CONTROL TABLE ANALYSIS
Example:
The Daily News, one of the leading newspapers in Central City,
has recently published a series of troubling articles accusing the
Central city government of favouritism in testing and hiring job
applicants. The articles charge Central city officials with giving
hiring preference to those whom they know, rather than to the
most qualified applicants. One article quotes an unsuccessful job
candidate: “Unless you know someone in Central city hall,
you’re not going to get a pass on the civil service
examination. And without the pass, you don’t make it
onto the hire list. Check the list—most of the people on it
have friends in city government. The key is to know
STATISTICAL CONTROL TABLE ANALYSIS
Example (cont …):
To do so, he draws a random sample of 335 job applicants for analysis from the Central city central personnel department. He begins by cross-tabulating whether the applicant knew someone in Central city government (previous contact) with the information on whether she or he passed the civil service exam (test performance). Table 3 displays the cross-tabulation.

Table 3. Cross-Tabulation of Test Performance and Prior Contact
                      Prior Contact
Test Performance    No     Yes    Total
Fail                70      70      140
Pass                60     135      195
Total              130     205      335
STATISTICAL CONTROL TABLE ANALYSIS
Example (cont …):
Percentage Cross-Tabulation of Test Performance and Prior Contact
                      Prior Contact
Test Performance    No           Yes
Fail                54%          34%
Pass                46%          66%
Total               100%         100%
                    (n = 130)    (n = 205)

What can you conclude from this cross-tabulation?
Is the charge aired in the newspaper articles true?
STATISTICAL CONTROL TABLE ANALYSIS
Example (cont …):
Like the mayor (and you), the staff member is disturbed by the results of the cross-tabulation. He, however, suspects that a third variable may be responsible for this surprising relationship: education. He reasons that level of education will certainly affect performance on the civil service exam. He feels that the only reason why those with previous contact appear to fare better on the test is that they are more likely to have completed higher levels of education.
He intends to introduce education (college graduate; not college graduate) into the analysis. The anticipated
STATISTICAL CONTROL TABLE ANALYSIS
Example (cont …):
STATISTICAL CONTROL TABLE ANALYSIS
Table 3.A.
STATISTICAL CONTROL TABLE ANALYSIS
Table 3.B.
STATISTICAL CONTROL TABLE ANALYSIS
Example (cont …):
Although prior contact with a Central city official appeared to affect test performance in the original bivariate cross-tabulation, the introduction of the control variable (education) made the relationship disappear. Thus, in this example, contact is not a cause of test performance. Instead, previous contact is a spurious variable: one that initially appears to be related to the dependent variable but whose effect
STATISTICAL CONTROL TABLE ANALYSIS
Example (cont …):
Table 3.B. also demonstrates that, regardless of prior contact with a Central city official, the percentage of college graduates failing the examination (25%) is much smaller than the percentage of nongraduates who fail (67%). This finding indicates that it is education that leads to test performance. For both those who have and those who have not had prior contact, the higher the education, the better the performance on
STATISTICAL CONTROL TABLE ANALYSIS

EXERCISES
MEASURES OF ASSOCIATION
Measures of association are statistics whose magnitude and sign (positive or negative) indicate the extent and direction of the relationship between two variables in a cross-tabulation. In contrast to the percentage difference, measures of association are calculated on the basis of, and take into account, all data in the contingency table. These statistics are designed to indicate where an actual relationship falls on the
MEASURES OF ASSOCIATION
Four conventions:
1. If the relationship between the two variables is perfect, the measure equals +1.0 (positive relationship) or -1.0 (negative relationship).
2. If there is no relationship between the two variables, the measure equals 0.0.
3. The sign of the measure indicates the direction of the relationship.
4. The stronger the relationship between the two variables, the greater the magnitude of the
MEASURES OF ASSOCIATION
An Ordinal Measure of Association: Gamma
Gamma is based on the number of concordant pairs of cases versus the number of discordant pairs in the table. Gamma compares each pair of respondents on their answers to each of the variables.
The concordant pairs demonstrate support for a positive relationship, whereas the discordant pairs show support for a negative relationship.
MEASURES OF ASSOCIATION
Concordant pairs vs Discordant pairs
• A pair of observations is concordant if the subject who is higher on one variable is also higher on the other variable.
• A pair of observations is discordant if the subject who is higher on one variable is lower on the other variable.
• To count the concordant and discordant pairs, the data are treated as ordinal, so the variables should be ordinal for this to be appropriate.
• The numbers of concordant and discordant pairs are also used in calculating Kendall's tau, which measures the association between two ordinal variables.
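Counting the pairs by hand becomes tedious beyond a 2×2 table. A minimal Python sketch of one straightforward approach follows. It assumes rows and columns are both ordered from low to high. It excludes pairs tied on either variable, as gamma does, and returns Goodman and Kruskal's gamma, (C - D) / (C + D). The function names `pair_counts` and `gamma` are illustrative.

```python
def pair_counts(table):
    """Count concordant and discordant pairs in an ordinal cross-tabulation.

    table[i][j] is the count in row i, column j; rows and columns are both
    ordered from low to high. Pairs tied on either variable are not counted.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # cases higher on the row variable
                for m in range(n_cols):
                    if m > j:                   # also higher on the column variable
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                 # lower on the column variable
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


def gamma(table):
    """Goodman and Kruskal's gamma: (C - D) / (C + D)."""
    c, d = pair_counts(table)
    return (c - d) / (c + d)


# Table 4: seniority (rows Low, High) by education (columns Low, High).
table4 = [[20, 10],
          [5, 15]]
print(pair_counts(table4), round(gamma(table4), 2))  # (300, 50) 0.71

# Competence-hierarchy table: hierarchy rows, competence columns, both Low..High.
competence = [[113, 60, 27],
              [31, 91, 38],
              [8, 8, 24]]
print(pair_counts(competence), round(gamma(competence), 2))
```

For the 2×2 education and seniority table, this reduces to 20 × 15 concordant and 10 × 5 discordant pairs. For the competence-hierarchy table, the sketch gives 25,089 concordant and 7,402 discordant pairs, so gamma ≈ 0.54.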
EXERCISE 1
Example:
Table 4. Cross-Tabulation of Education and Seniority
                  Education
Seniority      Low    High    Total
Low             20      10       30
High             5      15       20
Total           25      25       50

1. Calculate the magnitude and the direction of the relationship between Education and Seniority!
2. What conclusion can you draw from this data?
MEASURES OF ASSOCIATION
Example (cont…):
In Table 4, the concordant pairs combine the (Low seniority, Low education) cell, 20, with the (High seniority, High education) cell, 15. The discordant pairs combine the (Low seniority, High education) cell, 10, with the (High seniority, Low education) cell, 5.
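MEASURES OF ASSOCIATION
Example (cont…):
Gamma = (C - D) / (C + D), where C is the number of concordant pairs and D is the number of discordant pairs.
Number of concordant pairs = 20 x 15 = 300 pairs.
Number of discordant pairs = 10 x 5 = 50 pairs.
Gamma = (300 - 50) / (300 + 50) = 250 / 350 ≈ 0.71
This value of gamma indicates a relatively strong positive relationship between education and seniority.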
MEASURES OF ASSOCIATION
Example (cont…):
Percentaged Cross-Tabulation of Education and Seniority
                  Education
Seniority      Low         High
Low            80%         40%
High           20%         60%
Total          100%        100%
               (n = 25)    (n = 25)
As education increased, employees were more likely to have high seniority, by a percentage difference of 60% - 20% = 40 percentage points.
MEASURES OF ASSOCIATION
Example:
Table 5. Cross-Tabulation of Gender and Contribution
                     Gender
Contribution    Male    Female    Total
No               100       125      225
Yes              200        75      275
Total            300       200      500

1. Calculate the magnitude and the direction of the relationship between Gender and Contribution!
2. What conclusion can you draw from this data?
EXERCISE 2

1. Calculate Gamma!
2. What conclusion can you draw from this data?
THANK YOU AND
SEE YOU IN THE NEXT SESSION

04:10
9/28/20
