
SOCIAL STATISTICS

Roger Penn is a Professor in the Department of Sociology as well as the Centre for
Applied Statistics at Lancaster University. His main research interests lie in the field
of economic sociology, as well as the application of social statistics to sociological
data. He has written books for Oxford University Press and Cambridge University
Press, as well as publishing in leading journals including Sociology, Sociological
Review, and British Journal of Sociology.

Damon Berridge lectures in the Centre for Applied Statistics at Lancaster Uni-
versity. He also serves as Head of Postgraduate Teaching in the Department of
Mathematics and Statistics and as Director of the National Statistical Consultancy
Service which offers statistical advice to UK social scientists. His work focuses on
modelling binary and ordinal recurrent events through random effects models.
SAGE BENCHMARKS IN SOCIAL RESEARCH

SOCIAL STATISTICS

VOLUME I
The Statistical Analysis of Aggregate Categorical Data

Edited by
Roger Penn and Damon Berridge

Los Angeles | London | New Delhi | Singapore | Washington DC

Introduction and editorial arrangement © Roger Penn & Damon Berridge 2010

First published 2010

Apart from any fair dealing for the purposes of research or private study,
or criticism or review, as permitted under the Copyright, Designs and
Patents Act, 1988, this publication may be reproduced, stored or transmitted
in any form, or by any means, only with the prior permission in writing of
the publishers, or in the case of reprographic reproduction, in accordance
with the terms of licences issued by the Copyright Licensing Agency. Enquiries
concerning reproduction outside those terms should be sent to the publishers.

Every effort has been made to trace and acknowledge all the copyright owners
of the material reprinted herein. However, if any copyright owners have not
been located and contacted at the time of publication, the publishers will be
pleased to make the necessary arrangements at the first opportunity.

SAGE Publications Ltd


1 Oliver’s Yard
55 City Road
London EC1Y 1SP

SAGE Publications Inc.


2455 Teller Road
Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd


B 1/I 1, Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd


33 Pekin Street #02-01
Far East Square
Singapore 048763

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN: 978-1-84787-356-9 (set of four volumes)

Library of Congress Control Number: 2009922857

Typeset by Star Compugraphics Private Limited, Delhi


Printed on paper from sustainable resources
Printed by MPG Books Group, Bodmin, Cornwall

Contents

Appendix of Sources xi
Editors’ Introduction: Social Statistics xxi
Roger Penn and Damon Berridge

Volume I: The Statistical Analysis of Aggregate Categorical Data
1. On the Influence of Previous Vaccination in Cases of Smallpox 1
W.R. Macdonell
2. Tables for Testing the Goodness of Fit of Theory to Observation 9
W. Palin Elderton
3. On the Inheritance in Coat-Colour of Thoroughbred Horses
(Grandsire and Grandchildren) 19
N. Blanchard
4. Notes on the Theory of Association of Attributes in Statistics 25
G. Udny Yule
5. A Further Study of Statistics Relating to Vaccination and Smallpox 41
W.R. Macdonell
6. The Law of Ancestral Heredity 51
Karl Pearson
7. On a Property Which Holds Good for All Groupings of a Normal
Distribution of Frequency for Two Variables, with Applications
to the Study of Contingency-Tables for the Inheritance
of Unmeasured Qualities 81
G. Udny Yule
8. On the Influence of Bias and of Personal Equation in Statistics
of Ill-defined Qualities: An Experimental Study 93
G. Udny Yule
9. Reply to Certain Criticisms of Mr G. U. Yule 97
Karl Pearson
10. On a New Method of Determining Correlation, When One Variable
is Given by Alternative and the Other by Multiple Categories 107
Karl Pearson
11. On the General Theory of Multiple Contingency with Special
Reference to Partial Contingency 119
Karl Pearson
12. On the Interpretation of χ2 from Contingency Tables,
and the Calculation of P 137
R.A. Fisher
13. On the Application of the χ2 Method to Association and
Contingency Tables, with Experimental Illustrations 145
G. Udny Yule
14. Statistical Tests of Agreement between Observation and Hypothesis 155
R.A. Fisher
15. Recent Improvements in Statistical Inference 165
Harold Hotelling
16. Contingency Tables Involving Small Numbers and the χ2 Test 175
F. Yates
17. Contingency Table Interactions 193
M.S. Bartlett
18. Calculation of Chi-Square for Complex Contingency Tables 199
H.W. Norton
19. George Udny Yule 1871–1951 207
Frank Yates
20. On the Hypothesis of No “Interaction” in a Multi-Way
Contingency Table 223
S.N. Roy and Marvin A. Kastenbaum
21. On Description of Differential Association 233
David Gold
22. Calculation of Chi-Square to Test the No Three-Factor
Interaction Hypothesis 239
Marvin A. Kastenbaum and Donald E. Lamphiear
23. Tests of Significance for 2 × 2 Contingency Tables 249
F. Yates

Volume II: Statistical Modelling of Categorical Data
24. Measures of Association for Cross Classifications 1
Leo A. Goodman and William H. Kruskal
25. Interactions in Multi-Factor Contingency Tables 31
J.N. Darroch
26. Maximum Likelihood in Three-Way Contingency Tables 49
M.W. Birch
27. The Detection of Partial Association, II: The General Case 67
M.W. Birch
28. How to Ransack Social Mobility Tables and Other Kinds
of Cross-Classification Tables 85
Leo A. Goodman
29. A General Model for the Analysis of Surveys 125
Leo A. Goodman
30. Log Linear Models for Contingency Tables: A Generalization
of Classical Least Squares 173
J.A. Nelder
31. A Structural Model of the Mobility Table 183
Robert M. Hauser
32. How Destination Depends on Origin in the Occupational
Mobility Table 215
Otis Dudley Duncan
33. Criteria for Determining Whether Certain Categories in a
Cross-Classification Table Should Be Combined,
with Special Reference to Occupational Categories
in an Occupational Mobility Table 225
Leo A. Goodman
34. Structural Transformations in the British Class Structure: A Log
Linear Analysis of Marital Endogamy in Rochdale 1856–1964 263
R.D. Penn and D.C. Dawkins
35. Latent Structure Analysis of a Set of Multidimensional
Contingency Tables 283
Clifford C. Clogg and Leo A. Goodman
36. Exchange, Structure, and Symmetry in Occupational Mobility 305
Michael E. Sobel, Michael Hout and Otis Dudley Duncan
37. Canonical Analysis of Contingency Tables by Maximum Likelihood 317
Zvi Gilula and Shelby J. Haberman
38. More Universalism, Less Structural Mobility: The American
Occupational Structure in the 1980s 339
Michael Hout
39. The Analysis of Multivariate Contingency Tables by Restricted
Canonical and Restricted Association Models 375
Zvi Gilula and Shelby J. Haberman
40. Semiparametric Estimation in the Rasch Model and Related
Exponential Response Models, Including a Simple Latent Class
Model for Item Analysis 401
Bruce Lindsay, Clifford C. Clogg and John Grego
41. The Impact of Sociological Methodology on Statistical Methodology 425
Clifford C. Clogg
42. The Effects of Family Disruption on Social Mobility 449
Timothy J. Biblarz and Adrian E. Raftery

Volume III: The Analysis of Longitudinal Data
43. Statistical Inference about Markov Chains 1
T.W. Anderson and Leo A. Goodman
44. Statistical Methods for the Mover-Stayer Model 25
Leo A. Goodman
45. A Stochastic Model of Social Mobility 57
Robert McGinnis
46. Maximum Likelihood from Incomplete Data via the EM Algorithm 73
A.P. Dempster, N.M. Laird and D.B. Rubin
47. Dummy Endogenous Variables in a Simultaneous Equation System 127
James J. Heckman
48. Sample Selection Bias as a Specification Error 161
James J. Heckman
49. Statistical Models for Discrete Panel Data 171
James J. Heckman
50. The Incidental Parameters Problem and the Problem of Initial
Conditions in Estimating a Discrete Time-Discrete Data
Stochastic Process 227
James J. Heckman
51. Control for Omitted Variables in the Analysis of Panel and Other
Longitudinal Data 243
Richard B. Davies and Robert Crouchley
52. The Relationship between a Husband’s Unemployment
and His Wife’s Participation in the Labour Force 263
Richard B. Davies, Peter Elias and Roger Penn
53. Models for Sample Selection Bias 291
Christopher Winship and Robert D. Mare
54. An Approximate Generalized Linear Model with Random Effects
for Informative Missing Data 315
Dean Follmann and Margaret Wu
55. Modeling the Drop-Out Mechanism in Repeated-Measures Studies 339
Roderick J.A. Little
56. Pseudo-Likelihood for Combined Selection and Pattern-Mixture
Models for Incomplete Data 361
Geert Molenberghs, Bart Michiels and Michael G. Kenward
57. Mixed Effects Logistic Regression Models for Longitudinal
Binary Response Data with Informative Drop-Out 377
Thomas R. Ten Have, Allen R. Kunselman, Erik P. Pulkstenis
and J. Richard Landis
58. Random Coefficient Models for Binary Longitudinal Responses
with Attrition 401
Marco Alfò and Murray Aitkin
59. A Transitional Model for Longitudinal Binary Data Subject
to Nonignorable Missing Data 419
Paul S. Albert
60. Missing Covariates in Longitudinal Data with Informative
Dropouts: Bias Analysis and Inference 433
Jason Roy and Xihong Lin
61. Simple Solutions to the Initial Conditions Problem in Dynamic,
Nonlinear Panel Data Models with Unobserved Heterogeneity 453
Jeffrey M. Wooldridge

Volume IV: The Statistical Modelling of Ordinal Data and Multi-Level Analysis
62. Identification and Estimation of Age-Period-Cohort Models
in the Analysis of Discrete Archival Data 1
Stephen E. Fienberg and William M. Mason
63. Regression Models for Ordinal Data [With Discussion] 55
Peter McCullagh
64. Social Background and School Continuation Decisions 103
Robert D. Mare
65. Measures of Nominal-Ordinal Association 125
Alan Agresti
66. Change and Stability in Educational Stratification 139
Robert D. Mare
67. Regression and Ordered Categorical Variables 161
J.A. Anderson
68. Statistical Modelling Issues in School Effectiveness Studies 201
M. Aitkin and N. Longford
69. Linear Modelling with Clustered Observations: An Illustrative
Example of Earnings in the Engineering Industry 259
R.B. Davies, A.M. Martin and R. Penn
70. A Random-Effects Model for Ordered Categorical Data 279
Robert Crouchley
71. The Analysis of Longitudinal Ordinal Data with Nonrandom
Drop-Out 299
G. Molenberghs, M.G. Kenward and E. Lesaffre
72. Analysis of Nonrandomly Censored Ordered Categorical
Longitudinal Data from Analgesic Trials 313
Lewis B. Sheiner, Stuart L. Beal and Adrian Dunne
73. Modeling Repeated Measures with Monotonic Ordinal Responses
and Misclassification, with Applications to Studying Maturation 333
Paul S. Albert, Sally A. Hunsberger and Frank M. Biro
74. A Hierarchical Latent Variable Model for Ordinal Data from
a Customer Satisfaction Survey with “No Answer” Responses 351
Eric T. Bradlow and Alan M. Zaslavsky
75. Analysis of Longitudinal Substance Use Outcomes Using
Ordinal Random – Effects Regression Models 371
Donald Hedeker and Robin J. Mermelstein
76. Mixed Effects Logistic Regression Models for Longitudinal
Ordinal Functional Response Data with Multiple-Cause
Drop-Out from the Longitudinal Study of Aging 391
Thomas R. Ten Have, Michael E. Miller, Beth A. Reboussin
and Margaret K. James
77. Modeling Clustered Ordered Categorical Data: A Survey 407
Alan Agresti and Ranjini Natarajan
78. Joint Regression and Association Modeling of Longitudinal
Ordinal Data 443
Anders Ekholm, Jukka Jokinen, John W. McDonald
and Peter W.F. Smith
79. Generalized Linear Mixed Models for Ordered Responses
in Complex Multilevel Structures: Effects Beneath the School
or College in Education 461
Antony Fielding and Min Yang
80. A Mixed-Effects Regression Model for Longitudinal
Multivariate Ordinal Data 493
Li C. Liu and Donald Hedeker
81. Modelling Trajectories through the Educational System
in North West England 509
Roger Penn and Damon Berridge
82. The Dynamics of Perception: Modelling Subjective Wellbeing
in a Short Panel 533
Stephen Pudney

Appendix of Sources

All articles and chapters have been reproduced exactly as they were first published,
including textual cross-references to material in the original source.

Grateful acknowledgement is made to the following sources for permission to
reproduce material in this book.

1. ‘On the Influence of Previous Vaccination in Cases of Smallpox’,


W.R. Macdonell
Biometrika, 1(3) (1902): 375–383.

2. ‘Tables for Testing the Goodness of Fit of Theory to Observation’,


W. Palin Elderton
Biometrika, 1(2) (1902): 155–163.

3. ‘On the Inheritance in Coat-Colour of Thoroughbred Horses (Grandsire and


Grandchildren)’, N. Blanchard
Biometrika, 1(3) (1902): 361–364.

4. ‘Notes on the Theory of Association of Attributes in Statistics’, G. Udny Yule


Biometrika, 2(2) (1903): 121–134.

5. ‘A Further Study of Statistics Relating to Vaccination and Smallpox’,


W.R. Macdonell
Biometrika, 2(2) (1903): 135–144.

6. ‘The Law of Ancestral Heredity’, Karl Pearson


Biometrika, 2(2) (1903): 211–236.

7. ‘On a Property Which Holds Good for All Groupings of a Normal


Distribution of Frequency for Two Variables, with Applications to the Study
of Contingency-Tables for the Inheritance of Unmeasured Qualities’,
G. Udny Yule
Proceedings of the Royal Society of London. Series A, Containing Papers of a
Mathematical and Physical Character, 77(517) (1906): 324–336.

8. ‘On the Influence of Bias and of Personal Equation in Statistics of
Ill-defined Qualities: An Experimental Study’, G. Udny Yule
Proceedings of the Royal Society of London. Series A, Containing Papers of
a Mathematical and Physical Character, 77(517) (1906): 337–339.

9. ‘Reply to Certain Criticisms of Mr G. U. Yule’, Karl Pearson


Biometrika, 5(4) (1907): 470–476.

10. ‘On a New Method of Determining Correlation, When One Variable is


Given by Alternative and the Other by Multiple Categories’, Karl Pearson
Biometrika, 7(3) (1910): 248–257.

11. ‘On the General Theory of Multiple Contingency with Special Reference to
Partial Contingency’, Karl Pearson
Biometrika, 11(3) (1916): 145–158.

12. ‘On the Interpretation of χ2 from Contingency Tables, and the


Calculation of P’, R.A. Fisher
Journal of the Royal Statistical Society, 85(1) (1922): 87–94.

13. ‘On the Application of the χ2 Method to Association and Contingency


Tables, with Experimental Illustrations’, G. Udny Yule
Journal of the Royal Statistical Society, 85(1) (1922): 95–104.

14. ‘Statistical Tests of Agreement between Observation and Hypothesis’,


R.A. Fisher
Economica, 8 (1923): 139–147.

15. ‘Recent Improvements in Statistical Inference’, Harold Hotelling


Journal of the American Statistical Association, Supplement: Proceedings of
the American Statistical Association, 26(173) (1931): 79–87.

16. ‘Contingency Tables Involving Small Numbers and the χ2 Test’, F. Yates
Supplement to the Journal of the Royal Statistical Society, 1(2) (1934):
217–235.

17. ‘Contingency Table Interactions’, M.S. Bartlett


Supplement to the Journal of the Royal Statistical Society, 2(2) (1935):
248–252.

18. ‘Calculation of Chi-Square for Complex Contingency Tables’, H.W. Norton
Journal of the American Statistical Association, 40(230) (1945): 251–258.

19. ‘George Udny Yule 1871–1951’, Frank Yates


Obituary Notices of Fellows of the Royal Society, 8(21) (1952): 309–323.

20. ‘On the Hypothesis of No “Interaction” in a Multi-Way Contingency Table’,


S.N. Roy and Marvin A. Kastenbaum
The Annals of Mathematical Statistics, 27(3) (1956): 749–757.

21. ‘On Description of Differential Association’, David Gold


American Sociological Review, 22(4) (1957): 448–450.

22. ‘Calculation of Chi-Square to Test the No Three-Factor Interaction


Hypothesis’, Marvin A. Kastenbaum and Donald E. Lamphiear
Biometrics, 15(1) (1959): 107–115.

23. ‘Tests of Significance for 2 × 2 Contingency Tables’, F. Yates


Journal of the Royal Statistical Society. Series A (General), 147(3) (1984):
426–463.

24. ‘Measures of Association for Cross Classifications’, Leo A. Goodman


and William H. Kruskal
Journal of the American Statistical Association, 49(268) (1954): 732–764.

25. ‘Interactions in Multi-Factor Contingency Tables’, J.N. Darroch


Journal of the Royal Statistical Society. Series B (Methodological), 24(1) (1962):
251–263.

26. ‘Maximum Likelihood in Three-Way Contingency Tables’, M.W. Birch


Journal of the Royal Statistical Society. Series B (Methodological), 25(1) (1963):
220–233.

27. ‘The Detection of Partial Association, II: The General Case’, M.W. Birch
Journal of the Royal Statistical Society. Series B (Methodological), 27(1) (1965):
111–124.

28. ‘How to Ransack Social Mobility Tables and Other Kinds of
Cross-Classification Tables’, Leo A. Goodman
The American Journal of Sociology, 75(1) (1969): 1–40.

29. ‘A General Model for the Analysis of Surveys’, Leo A. Goodman


The American Journal of Sociology, 77(6) (1972): 1035–1086.

30. ‘Log Linear Models for Contingency Tables: A Generalization of Classical


Least Squares’, J.A. Nelder
Applied Statistics, 23(3) (1974): 323–329.

31. ‘A Structural Model of the Mobility Table’, Robert M. Hauser


Social Forces, 56(3) (1978): 919–953.

32. ‘How Destination Depends on Origin in the Occupational Mobility Table’,


Otis Dudley Duncan
The American Journal of Sociology, 84(4) (1979): 793–803.

33. ‘Criteria for Determining Whether Certain Categories in a


Cross-Classification Table Should Be Combined, with Special Reference
to Occupational Categories in an Occupational Mobility Table’,
Leo A. Goodman
The American Journal of Sociology, 87(3) (1981): 612–650.

34. ‘Structural Transformations in the British Class Structure: A Log Linear


Analysis of Marital Endogamy in Rochdale 1856–1964’, R.D. Penn
and D.C. Dawkins
Sociology, 17(4) (1983): 506–522.

35. ‘Latent Structure Analysis of a Set of Multidimensional Contingency


Tables’, Clifford C. Clogg and Leo A. Goodman
Journal of the American Statistical Association, 79(388) (1984): 762–771.

36. ‘Exchange, Structure, and Symmetry in Occupational Mobility’,


Michael E. Sobel, Michael Hout and Otis Dudley Duncan
The American Journal of Sociology, 91(2) (1985): 359–372.

37. ‘Canonical Analysis of Contingency Tables by Maximum Likelihood’,


Zvi Gilula and Shelby J. Haberman
Journal of the American Statistical Association, 81(395) (1986): 780–788.

38. ‘More Universalism, Less Structural Mobility: The American Occupational
Structure in the 1980s’, Michael Hout
The American Journal of Sociology, 93(6) (1988): 1358–1400.

39. ‘The Analysis of Multivariate Contingency Tables by Restricted Canonical


and Restricted Association Models’, Zvi Gilula and Shelby J. Haberman
Journal of the American Statistical Association, 83(403) (1988): 760–771.

40. ‘Semiparametric Estimation in the Rasch Model and Related Exponential


Response Models, Including a Simple Latent Class Model for Item Analysis’,
Bruce Lindsay, Clifford C. Clogg and John Grego
Journal of the American Statistical Association, 86(413) (1991): 96–107.

41. ‘The Impact of Sociological Methodology on Statistical Methodology’,


Clifford C. Clogg
Statistical Science, 7(2) (1992): 183–196.

42. ‘The Effects of Family Disruption on Social Mobility’, Timothy J. Biblarz


and Adrian E. Raftery
American Sociological Review, 58(1) (1993): 97–109.

43. ‘Statistical Inference about Markov Chains’, T.W. Anderson


and Leo A. Goodman
The Annals of Mathematical Statistics, 28(1) (1957): 89–110.

44. ‘Statistical Methods for the Mover-Stayer Model’, Leo A. Goodman


Journal of the American Statistical Association, 56(296) (1961): 841–868.

45. ‘A Stochastic Model of Social Mobility’, Robert McGinnis


American Sociological Review, 33(5) (1968): 712–722.

46. ‘Maximum Likelihood from Incomplete Data via the EM Algorithm’,


A.P. Dempster, N.M. Laird and D.B. Rubin
Journal of the Royal Statistical Society. Series B (Methodological), 39(1)
(1977): 1–38.

47. ‘Dummy Endogenous Variables in a Simultaneous Equation System’,


James J. Heckman
Econometrica, 46(4) (1978): 931–959.

48. ‘Sample Selection Bias as a Specification Error’, James J. Heckman
Econometrica, 47(1) (1979): 153–161.

49. ‘Statistical Models for Discrete Panel Data’, James J. Heckman


Charles F. Manski and Daniel McFadden (eds), Structural Analysis of
Discrete Data with Econometric Applications (Cambridge, Mass.: The MIT
Press, 1981), pp. 114–178.

50. ‘The Incidental Parameters Problem and the Problem of Initial Conditions
in Estimating a Discrete Time-Discrete Data Stochastic Process’,
James J. Heckman
Charles F. Manski and Daniel McFadden (eds), Structural Analysis of
Discrete Data with Econometric Applications (Cambridge, Mass.: The MIT
Press, 1981), pp. 179–195.

51. ‘Control for Omitted Variables in the Analysis of Panel and Other
Longitudinal Data’, Richard B. Davies and Robert Crouchley
Geographical Analysis, 17(1) (1985): 1–15.

52. ‘The Relationship between a Husband’s Unemployment and His Wife’s


Participation in the Labour Force’, Richard B. Davies, Peter Elias
and Roger Penn
Oxford Bulletin of Economics and Statistics, 54(2) (1992): 145–171.

53. ‘Models for Sample Selection Bias’, Christopher Winship and Robert D. Mare
Annual Review of Sociology, 18 (1992): 327–350.

54. ‘An Approximate Generalized Linear Model with Random Effects for
Informative Missing Data’, Dean Follmann and Margaret Wu
Biometrics, 51(1) (1995): 151–168.

55. ‘Modeling the Drop-Out Mechanism in Repeated-Measures Studies’,


Roderick J.A. Little
Journal of the American Statistical Association, 90(431) (1995): 1112–1121.

56. ‘Pseudo-Likelihood for Combined Selection and Pattern-Mixture Models for


Incomplete Data’, Geert Molenberghs, Bart Michiels and Michael G. Kenward
Biometrical Journal, 40(5) (1998): 557–572.

57. ‘Mixed Effects Logistic Regression Models for Longitudinal Binary Response
Data with Informative Drop-Out’, Thomas R. Ten Have, Allen R. Kunselman,
Erik P. Pulkstenis and J. Richard Landis
Biometrics, 54(1) (1998): 367–383.

58. ‘Random Coefficient Models for Binary Longitudinal Responses with


Attrition’, Marco Alfò and Murray Aitkin
Statistics and Computing, 10(4) (2000): 279–287.

59. ‘A Transitional Model for Longitudinal Binary Data Subject to Nonignorable


Missing Data’, Paul S. Albert
Biometrics, 56(2) (2000): 602–608.

60. ‘Missing Covariates in Longitudinal Data with Informative Dropouts:


Bias Analysis and Inference’, Jason Roy and Xihong Lin
Biometrics, 61 (2005): 837–846.

61. ‘Simple Solutions to the Initial Conditions Problem in Dynamic, Nonlinear


Panel Data Models with Unobserved Heterogeneity’, Jeffrey M. Wooldridge
Journal of Applied Econometrics, 20 (2005): 39–54.

62. ‘Identification and Estimation of Age-Period-Cohort Models in the Analysis


of Discrete Archival Data’, Stephen E. Fienberg and William M. Mason
Sociological Methodology, 10 (1979): 1–67.

63. ‘Regression Models for Ordinal Data [With Discussion]’, Peter McCullagh
Journal of the Royal Statistical Society. Series B (Methodological), 42(2)
(1980): 109–142.

64. ‘Social Background and School Continuation Decisions’, Robert D. Mare


Journal of the American Statistical Association, 75(370) (1980): 295–305.

65. ‘Measures of Nominal-Ordinal Association’, Alan Agresti


Journal of the American Statistical Association, 76(375) (1981): 524–529.

66. ‘Change and Stability in Educational Stratification’, Robert D. Mare


American Sociological Review, 46(1) (1981): 72–87.

67. ‘Regression and Ordered Categorical Variables’, J.A. Anderson
Journal of the Royal Statistical Society. Series B (Methodological), 46(1)
(1984): 1–30.

68. ‘Statistical Modelling Issues in School Effectiveness Studies’, M. Aitkin


and N. Longford
Journal of the Royal Statistical Society. Series A (General), 149(1)
(1986): 1–42.

69. ‘Linear Modelling with Clustered Observations: An Illustrative Example of


Earnings in the Engineering Industry’, R.B. Davies, A.M. Martin and R. Penn
Environment and Planning A, 20 (1988): 1069–1084.

70. ‘A Random-Effects Model for Ordered Categorical Data’, Robert Crouchley


Journal of the American Statistical Association, 90(430) (1995): 489–498.

71. ‘The Analysis of Longitudinal Ordinal Data with Nonrandom Drop-Out’,


G. Molenberghs, M.G. Kenward and E. Lesaffre
Biometrika, 84(1) (1997): 33–44.

72. ‘Analysis of Nonrandomly Censored Ordered Categorical Longitudinal Data


from Analgesic Trials’, Lewis B. Sheiner, Stuart L. Beal and Adrian Dunne
Journal of the American Statistical Association, 92(440) (1997): 1235–1244.

73. ‘Modeling Repeated Measures with Monotonic Ordinal Responses and


Misclassification, with Applications to Studying Maturation’, Paul S. Albert,
Sally A. Hunsberger and Frank M. Biro
Journal of the American Statistical Association, 92(440) (1997): 1304–1311.

74. ‘A Hierarchical Latent Variable Model for Ordinal Data from a Customer
Satisfaction Survey with “No Answer” Responses’, Eric T. Bradlow
and Alan M. Zaslavsky
Journal of the American Statistical Association, 94(445) (1999): 43–52.

75. ‘Analysis of Longitudinal Substance Use Outcomes Using Ordinal


Random – Effects Regression Models’, Donald Hedeker and
Robin J. Mermelstein
Addiction, 95(3) (2000): S381–S394.

76. ‘Mixed Effects Logistic Regression Models for Longitudinal Ordinal
Functional Response Data with Multiple-Cause Drop-Out from the
Longitudinal Study of Aging’, Thomas R. Ten Have, Michael E. Miller,
Beth A. Reboussin and Margaret K. James
Biometrics, 56(1) (2000): 279–287.

77. ‘Modeling Clustered Ordered Categorical Data: A Survey’, Alan Agresti


and Ranjini Natarajan
International Statistical Review/Revue Internationale de Statistique,
69(3) (2001): 345–371.

78. ‘Joint Regression and Association Modeling of Longitudinal Ordinal Data’,


Anders Ekholm, Jukka Jokinen, John W. McDonald and Peter W.F. Smith
Biometrics, 59 (2003): 795–803.

79. ‘Generalized Linear Mixed Models for Ordered Responses in Complex


Multilevel Structures: Effects Beneath the School or College in Education’,
Antony Fielding and Min Yang
Journal of the Royal Statistical Society. Series A. (Statistics in Society),
168(1) (2005): 159–183.

80. ‘A Mixed-Effects Regression Model for Longitudinal Multivariate Ordinal


Data’, Li C. Liu and Donald Hedeker
Biometrics, 62 (2006): 261–268.

81. ‘Modelling Trajectories through the Educational System in North West


England’, Roger Penn and Damon Berridge
Education Economics, 16(4) (2008): 411–431.

82. ‘The Dynamics of Perception: Modelling Subjective Wellbeing in a Short


Panel’, Stephen Pudney
Journal of the Royal Statistical Society. Series A. (Statistics in Society), 171(1)
(2008): 21–40.

Editors’ Introduction:
Social Statistics
Roger Penn and Damon Berridge

This collection of articles on social statistics provides a broadly chronological
account of the creation and development of statistical methods relevant
to the social sciences and, in particular, to sociology. Sociological research
has posed serious challenges for statistical analysis historically because socio-
logical data are not generally continuous in nature1. Statistical methods such as
linear regression models and analysis of variance (ANOVA) developed rapidly
after Galton’s breakthroughs in the 1870s and 1880s which culminated in his
book, Natural Inheritance (Galton, 1889). However, these methods were based
upon assumptions of continuous data, with an underlying normal distribution
and were therefore inappropriate for the analysis of much sociological data.
New methods had to be created and developed. These four volumes chart
these innovations over the last 100 years or so.
Sociological data are often categorical in nature. They may be binary, nominal
or ordinal. Binary data (such as ‘employed’/‘unemployed’) have an underlying
binomial distribution which can be analysed using logistic and probit models.
Nominal data (such as occupational status categories) have an underlying
multinomial distribution which can be examined by either log-linear modelling
or multinomial logit modelling. Ordinal responses such as those found in the
many Likert2 items (Likert, 1932) used in survey research can be assessed using
proportional odds or cumulative logit models. The generation of these styles
of modelling has been a dominant theme during the last 50 years.
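
As a concrete illustration (our addition, not part of the original text), the sketch below fits one model from each of these three families with statsmodels in Python; the data frame and all variable names are hypothetical synthetic data.

```python
# Minimal sketch, assuming synthetic data: logit for a binary response,
# multinomial logit for a nominal response, and a proportional-odds
# (cumulative logit) model for an ordinal response.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),       # continuous predictor
    "employed": rng.integers(0, 2, n),    # binary response
    "occ_status": rng.integers(0, 3, n),  # nominal response, 3 categories
    "attitude": rng.integers(1, 6, n),    # ordinal Likert item, 1-5
})

# Binary response: logistic regression (smf.probit would give a probit model).
logit_fit = smf.logit("employed ~ age", data=df).fit(disp=False)

# Nominal response: multinomial logit.
mnl_fit = smf.mnlogit("occ_status ~ age", data=df).fit(disp=False)

# Ordinal response: proportional-odds (cumulative logit) model.
df["attitude"] = df["attitude"].astype(pd.CategoricalDtype(ordered=True))
ordinal_fit = OrderedModel(df["attitude"], df[["age"]],
                           distr="logit").fit(method="bfgs", disp=False)

print(logit_fit.params, mnl_fit.params, ordinal_fit.params, sep="\n\n")
```
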
The early literature on the analysis of categorical data witnessed a succes-
sion of attempts aimed at quantifying the nature of association between two
binary variables. The debate between Pearson and Yule became particularly
heated. At the heart of their exchanges was a debate about whether categor-
ical variables could be regarded as discrete representations of continuous
distributions (Pearson’s point of view), or whether binary variables such as
(‘vaccinated’/‘not vaccinated’) were inherently discrete (Yule’s perspective).
Both points of view had some merit. For variables such as religion and ethnicity,
the classification is fixed and no underlying continuous distribution is ap-
parent. For other variables, for example social class, the assumption of an
underlying continuum is more plausible. Indeed, such an assumption would
prove to be integral to the development of later models such as the cumulative
logit (see Volume 4).
Pearson (1900) introduced a goodness-of-fit test which constituted the
starting point for a subsequent revolution in social statistics. This would
become known as Pearson’s chi-squared statistic. Pearson was motivated to
develop this test in order to determine whether possible outcomes on a Monte
Carlo roulette wheel were equally likely. The chi-squared test statistic had neat
mathematical properties3 which permitted the evaluation of whether prob-
abilities in a multinomial distribution equated to certain prior values. It allowed
for the attachment of probabilities to the likelihood of responses occurring
by chance (at random). The probability of results occurring less than 1 in
20 times became the standard criterion for measuring statistical significance
within the social sciences. This continues today. Indeed, a great deal of routine
sociological practice remains locked at this level of descriptive statistical
analysis established in the early part of the twentieth century.
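
For readers wanting the formula (our addition; standard textbook material rather than a quotation from Pearson), the goodness-of-fit statistic compares observed counts O_i with the counts expected under the hypothesised multinomial probabilities p_i:

```latex
\chi^{2} \;=\; \sum_{i=1}^{k} \frac{(O_i - E_i)^{2}}{E_i}, \qquad E_i = n\,p_i .
```

Under the null hypothesis this statistic is approximately chi-squared distributed, and the ‘1 in 20’ convention corresponds to rejecting the hypothesis when the tail probability falls below 0.05.
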
These developments closely paralleled the emergence of the notion of
hypothesis testing which involved the generation of a ‘research hypothesis’ and
an isomorphic ‘null hypothesis’ of no association. The research hypothesis itself
was often derived from emerging sociological theory. Traditionally, this was
seen as an a priori starting point for data analysis but later it was also linked
to a posteriori inductive approaches such as ‘grounded theory’ (see Glaser and
Strauss, 1967). The latter has now become incorporated into contemporary
social statistics in the form of ‘data mining’ (see, for example, Hand, 2001).
Other attempts at measuring association originated in the study of public
health (see Macdonell, 1902; 1903) and genetics (see Blanchard, 1902). Their
analyses of the nature of statistical association began with the simplest of
cases: the 2 × 2 contingency table (see Macdonell, 1902; Blanchard, 1902). It
developed subsequently into the analysis of the more general 2 × c table (see
Macdonell, 1903) and culminated in the r × c case (see Yule 1906a; 1906b)4.
The presentation of innovative methods in social statistics moved during
the inter-war period from biological and medical journals into mainstream
statistical journals on both sides of the Atlantic, notably the Journal of the Royal
Statistical Society (JRSS) and the Journal of the American Statistical Association
(JASA). These two journals have remained pivotal for the dissemination of
new methods within social statistics and quantitative sociology. The 1930s
and 1940s witnessed the further extension of the chi-squared test to handle
higher dimensional tables (see Bartlett, 1935; Norton, 1945).
The analysis of complex data structures, such as those encountered in the
study of processes of social mobility, required the development of new methods
that could permit the estimation of the simultaneous effects of a wide range
of explanatory variables. Some of these would be categorical in form (such as
social class, gender and ethnicity) but others would be continuous (such as age
and income). There was, therefore, a pressing need to develop methods that
permitted the simultaneous control for categorical and continuous variables,
and which also considered likely interactions amongst the explanatory vari-
ables themselves. These issues were resolved within a variety of modelling
frameworks (see Volumes 2, 3 and 4).
The practical solutions to these statistical problems also relied upon the
parallel emergence of computers with sufficient power to fit models. It also
required the development of appropriate software to facilitate their widespread
adoption. Initially, social statisticians wrote their own software but the late
1960s saw the emergence of SPSS (Statistical Package for the Social Sciences)
as a standard software package available to social scientists. The combina-
tion of SPSS software and IBM mainframe computers underpinned the rapid
development of the incorporation of new methods of statistical analysis within
the social sciences in the 1970s and 1980s, particularly in the United States.
Indeed, during this period, key contributions were overwhelmingly published
in US statistics and sociology journals.
The main difficulty inherent in much of the traditional analysis of con-
tingency tables was the problem of chronology. Social scientists wanted to be
able to examine causal processes such as whether unemployment led to ill
health. However, a contingency table that cross-tabulated ‘employment’/
‘unemployment’ by ‘good health’/‘poor health’ could not demonstrate such
a relationship since a statistically significant correlation between unemploy-
ment and illness could equally be the result of illness leading to unemployment.
Indeed, a putative correlation could be the result of both processes operating
simultaneously.
Longitudinal data offered a solution to this impasse. In the example above,
a longitudinal research design producing simultaneous data over time on states
of health and on employment status would permit the disentangling of the two
processes, by the measurement of the two relative sets of effects. However,
longitudinal analysis brought its own problems that also needed new solu-
tions (see Volumes 3 and 4).

Volume 1: The Fundamentals of Descriptive Social Statistics

The earliest articles in this volume laid down the fundamental principles for
the exploratory analysis of categorical data. A classic example of such data
was social class as measured by occupational groups (see Macdonell, 1903:
Table IX, p.138) [1.5]. Social class often comprised several occupational cat-
egories, but Macdonell reduced it to a binary variable comprising two
categories, ‘higher class’/‘lower class’.
The relationship between two binary variables could be summarised using
a 2 × 2 contingency table or crosstabulation. The higher and lower classes
of fathers were cross-classified by the higher and lower classes of their sons
(see Macdonell (1902, 1903) [1.1, 1.5] and Blanchard (1902) [1.3]5).
The relationship between two binary ‘attributes’6 (or variables) was
expressed in terms of a correlation or an association (see Yule, 1903) [1.4].
Association was defined in terms of the presence of dependence, or equiva-
lently, the absence of independence, between two binary variables (see Yule,
1903) [1.4]. The correlation or association displayed in a single 2 × 2 table
was encapsulated in a single summary statistic. Early attempts at developing
such a statistic included the ‘coefficient of correlation’ (see Macdonell, 1903;
Pearson et al., 1903; Yule, 1906a) [1.5, 1.6, 1.7] and the ‘coefficient ratio’
(see Pearson, 1910) [1.10].

1.1 Data Comprising More Than Two Variables

Even in these early studies, there was an appreciation that processes were
rarely bivariate in nature, and could not be summarised adequately using a
single 2 × 2 table. More often, there was a complex series of inter-relationships
between more than two binary variables. This was initially resolved by the con-
struction of a series of 2 × 2 tables (see Macdonell (1902, 1903) [1.1, 1.5]).
The ideas underlying the theory of complete independence were gen-
eralised from two variables to more than two (see Yule, 1903) [1.4]. However,
modelling approaches, better able than hypothesis testing to handle such
multivariate data, would not start to be developed until the second half of
the twentieth century (see Volume 2 onwards).

1.2 Variables Comprising More Than Two Categories

As time passed, there was a recognition that two categories were insuffi-
cient to capture much social scientific data. In the early research on heredity,
‘unmeasured characters’ (or categorical variables) such as eye-colour were
categorised into more than two categories: these included light/medium/dark
hues (see Yule, 1906b: pp.337–8) [1.8] or, with greater resolution, eight tints
(see Yule, 1906a: p.330) [1.7].
There was an appreciation that collapsing such multi-categorical data into
two categories (see Macdonell, 1903, p.139) [1.5] was problematic for sev-
eral reasons. The first involved the loss of potentially useful information. The
second involved the problem that different ways of dichotomising groupings
led to different crosstabulations7 and resulted in different coefficients of cor-
relation (see Macdonell, 1903) [1.5].
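
A small numerical illustration of the second point (synthetic counts, not Macdonell’s data): collapsing a 3 × 3 table at different cut points produces different 2 × 2 tables and hence different values of the phi coefficient, one simple measure of 2 × 2 correlation.

```python
# Synthetic demonstration: two dichotomisations of the same 3 x 3 table
# give different phi coefficients.
import numpy as np

table3 = np.array([[60, 10,  5],
                   [20, 30, 25],
                   [10, 20, 70]], dtype=float)   # made-up 3 x 3 counts

def phi_from_cut(t, cut):
    """Collapse categories 0..cut-1 vs cut.. for rows and columns, return phi."""
    a = t[:cut, :cut].sum()
    b = t[:cut, cut:].sum()
    c = t[cut:, :cut].sum()
    d = t[cut:, cut:].sum()
    return (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(phi_from_cut(table3, 1))   # cut after the first category  -> about 0.60
print(phi_from_cut(table3, 2))   # cut after the second category -> 0.50
```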

1.3 A Typology of Categorical Data

As a general rule, different types of data require different types of analysis.
This rule applies to the analysis of categorical data. Multi-categorical vari-
ables can be classified into two types: ordered (or ordinal) and unordered
(or nominal).
In the earlier example on heredity, Yule (1906a) [1.7] wished to examine
the association between the heights of fathers and their sons. He started his
analysis by categorising the fathers’ and sons’ heights into three categories,
thereby creating two new ordered variables, each comprising three cat-
egories. These ordered variables were cross-classified to produce a 3 × 3 table.
However, he subsequently analysed this 3 × 3 table by means of a series of
2 × 2 tables. Unfortunately, this served to increase the complexity of the prob-
lem rather than to resolve the underlying issue.

1.4 Chi-Squared Test

Several statistics such as Yule’s Q (see Yule, 1900, 1912) were developed to
quantify strength of association. However, no formal test of statistical signifi-
cance was attached to them. Pearson (1907) concluded: ‘I cannot therefore
accept Mr Yule’s test as very likely to be helpful in measuring deviation from
Gaussian…’ (pp. 470–1). To rectify this situation, Pearson (1907) [1.9] himself
calculated expected frequencies assuming a Gaussian (normal) distribution
and applied the chi-squared test to a 3 × 3 table that examined the transmis-
sion of eye colour from father to son (pp. 471–2).
Subsequently, Pearson (1916) [1.11] applied the chi-squared test statistic
for testing the null hypothesis of no association. He referred the chi-squared test
statistic to the table of critical values for the chi-squared distribution presented
earlier by Palin Elderton (1902) [1.2]. In a later article, Fisher (1922) [1.12]
clarified the use of the chi-squared test for contingency tables. In particular, he
stated that one should take, not the number of cells but the number of degrees
of freedom, to produce more accurate p-values.
Yule (1922) [1.13] also applied the chi-squared test to compare observed
and expected frequencies. He applied the test to data from a series of physical
experiments. Fisher (1923) [1.14] then used the chi-squared test to com-
pare data observed from an experiment with two members of the family of
chi-squared distributions (p.145) in order to assess which of the two displayed
the better fit. A correction to the chi-squared test for 2 × 2 tables with small
cell counts was proposed subsequently by Yates (1934) [1.16].
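
To make these points concrete (our summary, in standard notation): for an r × c contingency table with cell counts n_ij, row totals n_i+, column totals n_+j and grand total n, the expected counts under independence and the test statistic are

```latex
\hat{E}_{ij} = \frac{n_{i+}\,n_{+j}}{n}, \qquad
\chi^{2} = \sum_{i=1}^{r}\sum_{j=1}^{c}
           \frac{(n_{ij} - \hat{E}_{ij})^{2}}{\hat{E}_{ij}} .
```

Fisher’s point is that this statistic should be referred to a chi-squared distribution with (r − 1)(c − 1) degrees of freedom rather than rc − 1; Yates’ correction for 2 × 2 tables with small counts replaces each (n_ij − Ê_ij)² with (|n_ij − Ê_ij| − ½)².
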
Hotelling (1931) [1.15] provided a useful summary of developments at
that time, including the chi-squared test and Fisher’s ‘likelihood’. In the dis-
cussion of his article, Shewhart made an observation which still stands today:
‘The logic of discovery is beginning to take on a new form, and today, the
subject of statistical inference is of interest to all scientists.’

1.5 Interactions

In a 2 × 2 table comprising two variables A and B, the first order interaction is
denoted by ‘A × B’. This indicates how A varies with B. The main effects comprise
the marginal distributions of A and of B. Once the two main effects of A and
B have been taken into account, the first order interaction can be tested. This
principle was extended to higher dimensional tables. As Bartlett (1935) [1.17]
explained, ‘For a 2 × 2 × 2 table … the only additional problem to consider…
is the testing of the second order interaction (A × B × C).’
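
In the log-linear notation used later in these volumes (our addition), the saturated model for a 2 × 2 table may be written

```latex
\log m_{ij} \;=\; u + u^{A}_{i} + u^{B}_{j} + u^{AB}_{ij} ,
```

where the u^A and u^B terms are the main effects and u^{AB} is the first order interaction; a 2 × 2 × 2 table adds u^C, the three two-factor terms and the second order interaction u^{ABC} to which Bartlett refers.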

1.6 High Dimensional Tables

Norton (1945) [1.18] presented a scheme of successive approximations for
2N × R tables8. He made an important point which, even today, is sometimes
difficult to convey to undergraduates in social statistics and social science:
‘… a main effect may appear to be nil even though one of the interactions is
clearly significant’ (p.251)!
Yates (1952) [1.19] provided a concise summary of developments in
social statistics by the middle of the twentieth century.
Kastenbaum and Lamphiear (1959) [1.22] developed an iterative pro-
cedure to estimate parameters of a multinomial distribution in a (r – 1) ×
(s – 1) × (t – 1) crosstabulation and calculated the corresponding chi-squared
test statistic. They emphasised the importance of being able to implement
their methods using the latest technology: ‘It is the purpose of this paper to
demonstrate a technique which… is particularly well suited for modern high-
speed computers’ (p.107).
Volume 1 concludes with a wide-ranging article by Yates (1984) [1.23],
which summarised the developments covered in the volume. He provided an
excellent review of early work (Section 3). He also reviewed the chi-squared test,
including criticisms of the exact test and continuity correction (Section 11).

Volume 2: The Development of Statistical Modelling


In Volume 1, we examined how the analysis of contingency tables focused on
the testing of the null hypothesis of no association using the chi-squared test. In
Volume 2, we explore how methods increased in sophistication as researchers
started to examine issues such as social mobility (see Goodman, 1969 [2.5];
Hauser, 1978 [2.8]; Duncan, 1979 [2.9]; Penn and Dawkins, 1983 [2.11];
Hout, 1988 [2.15]; Biblarz and Raftery, 1993 [2.19]) and occupational mobility
(see Duncan, 1979 [2.9]; Sobel et al., 1985 [2.13]). Social and occupational
mobility, together with marital endogamy, were areas in which statisticians
and social scientists interacted fruitfully during the 1970s and 1980s.
In Volume 2, we witness the transition from hypothesis testing to statistical
modelling. Goodman and Kruskal (1954) [2.1] reviewed traditional measures
of association and declared, ‘… we propose the construction of probabilistic
models…’ (p. 735). Existing methods for handling two-way contingency tables
were adapted to analyse three-way cross-tabulations. Darroch (1962) [2.2]
proposed a likelihood ratio test for the hypothesis of no second-order inter-
action (A × B × C), and used Kastenbaum and Lamphiear’s (1959) [1.22] earlier
2 × 3 × 5 example. Darroch (1962) [2.2] also extended his approach to handle
four-way tables (A × B × C × D).
Birch (1963) [2.3] showed how interactions could be defined as cer-
tain linear combinations of logarithms of expected frequencies which could
be estimated using maximum likelihood (ML). These ML estimates were
shown to be identical for a range of different sampling methods (Section 2).
He also demonstrated that these ML estimates had three elegant mathematical
properties:

(i) likelihood equations had a unique solution at which the likelihood was maximised;
(ii) ML estimates were the same under a wide variety of sampling conditions;
(iii) ML estimates were sufficient statistics9.

These results were generalised to many-way contingency tables (Section 5).


Birch (1965) [2.4] subsequently presented an early form of the log-linear
model (see, for example Nelder, 1974 [2.7]; Penn and Dawkins, 1983 [2.11]
later in Volume 2) involving three variables I, J and K. He demonstrated that
the hypothesis of a common cross-product ratio between variables I and J
across all levels of variable K was, in its general form, equivalent to Roy and
Kastenbaum’s (1956) hypothesis of no three-factor interaction.

2.1 Social Mobility

Goodman (1969) [2.5] presented methods for examining two-way social
mobility tables. Social class was classified into three categories: ‘upper’ (U),
‘middle’ (M) and ‘lower’ (L). He divided each 3 × 3 table into a series of 2 × 2
sub-tables, for example, father’s status (U or M) versus son’s status (U or M).
He defined the interaction in each 2 × 2 sub-table in terms of an ‘odds ratio’.
For example, the odds ratio of having ‘destination’ M rather than ‘destination’
U was defined for an individual whose origin is M, relative to an individual
whose origin is U. If there were no difference in odds between these two indi-
viduals, then the odds ratio would take a value of 1 or, equivalently, the log
odds ratio would take a value of 0. Goodman denoted the log odds ratio by G.
By deriving a standard error S of G, he used the Z score (G/S) to test whether
G was significantly different from zero.
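
A hypothetical worked example of this calculation (the counts below are invented, and the standard error used is the usual large-sample formula for a log odds ratio):

```python
# Log odds ratio G and Z score G/S for a single 2 x 2 mobility sub-table.
import numpy as np
from scipy.stats import norm

# rows: father's status (M, U); columns: son's status (M, U) -- made-up counts
table = np.array([[120.0, 40.0],
                  [ 30.0, 90.0]])

odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
G = np.log(odds_ratio)               # log odds ratio
S = np.sqrt((1.0 / table).sum())     # large-sample standard error of G
Z = G / S
p_value = 2 * norm.sf(abs(Z))        # two-sided test of G = 0
print(f"OR = {odds_ratio:.2f}, G = {G:.2f}, S = {S:.2f}, Z = {Z:.2f}, p = {p_value:.3g}")
```
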
The limitations of such bivariate analyses necessitated the development of
multivariate techniques which could combine hypothesis testing and statistical
modelling. The advantages of statistical modelling over hypothesis testing
were threefold:

(i) It allowed for the examination of the joint effects of a set of n explana-
tory variables {X1, X2, …, Xn} on response variable Y simultaneously.
(ii) It permitted control of a set of secondary explanatory variables {X1,
X2, …, Xn–1} by including them in the model first, before assessing the
relative significance of the factor of primary interest, Xn.
(iii) It enabled the assessment of the statistical significance of both main
effects and interactions between explanatory variables.
xxviii Editors’ Introduction

2.2 Major Methodological Developments in the 1970s

2.2.1 Generalised Linear Models


A major advance in data analysis came with the formalisation of the family of
generalised linear models by Nelder and Wedderburn (1972). One member
of this family, with Poisson error distribution and log link function, defined
the log-linear model. Goodman (1972) [2.6] applied log-linear models to the
analysis of a 2⁴ table of counts. The goodness of fit between nested models
was tested using chi-squared and likelihood ratio tests. Nelder (1974) [2.7]
subsequently used a log-linear model to relate the log count (or frequency)
to a set of explanatory variables. This early work by Nelder would lead to the
development of the statistical software package Generalized Linear Interactive
Modelling, GLIM (see Francis et al., 1993) which has been a cornerstone for
the development of complex statistical modelling in the social sciences.
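
The link between log-linear models and generalised linear models can be sketched in a few lines of Python (our illustration with made-up counts; statsmodels stands in here for GLIM, which is no longer in common use):

```python
# A log-linear model is a GLM with Poisson errors and a log link: here the
# independence model and the saturated model for a hypothetical 2 x 2 table.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

cells = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a2"],
    "B": ["b1", "b2", "b1", "b2"],
    "count": [25, 15, 10, 30],        # made-up cell counts
})

indep = smf.glm("count ~ A + B", data=cells,
                family=sm.families.Poisson()).fit()   # log mu = u + A + B
sat = smf.glm("count ~ A * B", data=cells,
              family=sm.families.Poisson()).fit()     # adds the A:B interaction

# Comparing deviances gives the likelihood-ratio test of no association (1 df).
print(indep.deviance - sat.deviance)
```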

2.2.2 EM Algorithm
In their highly influential article, Dempster et al. (1977) [3.4] developed the
EM algorithm, an iterative procedure comprising an expectation (E) step
followed by a maximisation (M) step. The simplicity and generality of the
associated theory would provide the inspiration for much later work (see
Volumes 3 and 4). When the underlying data follow the exponential family,
whose maximum likelihood estimates are easily computed, each M step is simi-
larly easily computed. Dempster et al. (1977) [3.4] provided many examples
including ‘missing data situations, applications to grouped, censored or trun-
cated data, finite mixture models, variance component estimation, hyperpar-
ameter estimation, iteratively reweighted least squares and factor analysis’.
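
A toy version of the algorithm (our own example, not one of Dempster et al.’s): EM for a two-component Poisson mixture, in which the E step computes each observation’s posterior probability of belonging to the first component and the M step re-estimates the mixing proportion and the two means by weighted maximum likelihood.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
y = np.concatenate([rng.poisson(2, 300), rng.poisson(9, 200)])  # synthetic counts

pi, lam = 0.5, np.array([1.0, 5.0])                 # starting values
for _ in range(200):
    # E step: responsibility of component 1 for each observation
    w1 = pi * poisson.pmf(y, lam[0])
    w2 = (1 - pi) * poisson.pmf(y, lam[1])
    r = w1 / (w1 + w2)
    # M step: weighted ML updates of the mixing proportion and the means
    pi = r.mean()
    lam = np.array([np.average(y, weights=r), np.average(y, weights=1 - r)])

print(pi, lam)   # close to the generating values 0.6 and (2, 9)
```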

2.3 Occupational Mobility

The exploration of occupational mobility provided the basis for several
methodological developments. Hauser (1978) [2.8] proposed a multiplica-
tive model for an observed crosstabulation of respondents’ occupations and
their fathers’ occupations at an earlier time. The model expressed the expected
frequencies as the product of four elements: an overall effect, a row effect, a
column effect and an interaction effect. The cells in the mobility table were
assigned to K mutually exclusive and exhaustive subsets, and each of these
subsets shared a common interaction parameter. Thus, aside from total, row
and column effects, each expected frequency was determined by only one par-
ameter which reflected the level of mobility or immobility in that cell, relative
to that in other cells in the table.
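
In symbols (our paraphrase of the model just described), the expected frequency in cell (i, j) is

```latex
E_{ij} \;=\; \alpha \,\beta_{i}\, \gamma_{j}\, \delta_{k(i,j)} ,
```

where α is the overall effect, β_i and γ_j are the row and column effects, and δ_k is the single interaction parameter shared by every cell assigned to subset k(i, j); taking logarithms shows this to be a log-linear model with a constrained interaction structure.
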
Duncan (1979) [2.9] analysed occupational class in terms of eight group-
ings (p.794) and fitted a series of five different models to the resulting 8 × 8
occupational mobility table. He used likelihood ratios to compare ‘nested’ or
‘hierarchical’ models (see Figure 1, p. 799).
In their own study of occupational mobility, Sobel et al. (1985) [2.13]
made the distinction between ‘reciprocated’ and ‘unreciprocated’ mobility.
They matched concepts of structure and exchange to parameters of a quasi-
symmetry model. Exchange or ‘reciprocated’ mobility was defined as the part
of the mobility process that resulted from equal flows between pairs of occu-
pational categories. Structural mobility was defined as the effect of marginal
heterogeneity operating uniformly on such origins.
Whilst these articles developed the traditional interest in mobility tables,
with the emphasis on the distinction between diagonal and off-diagonal cells,
they suffered from an increasing propensity to generate complex and arcane
explanations. This was resolved by the application of log-linear analysis. Penn
and Dawkins (1983) examined matrices of marital endogamy to assess the
extent to which men and women from different social classes intermarried in
Britain between 1850 and 1964. The concinnity of their analyses exemplified
the principle of parsimony and cut through much of the unnecessary complexity
within earlier approaches.
Biblarz and Raftery (1993) [2.19] tested the hypothesis that family dis-
ruption affected subsequent occupational outcomes. They applied a log-linear
model to a four-way table of counts, cross-classified by father’s occupation, son’s
occupation, race (‘White’/‘Black’) and family type (‘intact’/‘nonintact’).

2.4 Canonical Analysis

Canonical analysis has often been employed instead of log-linear modelling
to analyse the relationship between two categorical variables. However, until
the mid-1980s, canonical analysis had taken place on an ad hoc basis. Gilula
and Haberman (1986) [2.14] positioned canonical analysis within a more
formal theoretical framework. They examined models that placed non-trivial
restrictions on the values of canonical parameters so that a parsimonious
description of association could be obtained. Parameter estimation was by
maximum likelihood, and approximate confidence intervals were derived.
Adequacy of models was assessed by using chi-squared tests. The resulting
models could be used to determine the appropriateness of latent class analysis.
They could also be used to determine whether a set of canonical scores had
specified patterns. Comparisons with a log-linear parameterisation of the cell
probabilities revealed that canonical analysis, which used interpretations based
on regression and correlation, was an effective alternative to log-linear param-
eterisations based on cross-product ratios.
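
One common form of such a model (our addition, in the notation usual for canonical association models) expresses the cell probabilities of a two-way table as

```latex
\pi_{ij} \;=\; \pi_{i+}\,\pi_{+j}\Bigl(1 + \sum_{m=1}^{M} \rho_{m}\,\mu_{im}\,\nu_{jm}\Bigr),
```

where the ρ_m are canonical correlations and μ_im, ν_jm are row and column scores; the restrictions discussed above are constraints placed on these parameters.
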
Gilula and Haberman (1988) [2.16] extended this approach from two-way
tables to multivariate scenarios, but these were – somewhat paradoxically –
reduced to two dimensions. Responses were treated by them as a single cat-
egorical variable, whilst the explanatory variables were likewise reduced to
a single factor. In this way, a multi-way table was reduced to a two-way array
to which traditional canonical and association models could be applied. How-
ever, this rush to simplification was at the expense of much of the information
inherent in the data under scrutiny.

2.5 Latent Variable/Latent Class Analysis

An alternative form of analysis involved the concept of latent classes which
underpinned the processes under observation. Clogg and Goodman (1984)
[2.12] used four dichotomous response items to measure attitudes towards
science and scientific careers. Responses to these items were cross-classified
by ‘intelligence’ (high IQ or low IQ). They proposed a latent structure analysis
through the introduction of two (or more) latent classes. Item responses were
assumed to be conditionally independent of each other, given such latent class
membership. Clogg and Goodman (1984) [2.12] showed how their alternative
formulation led directly to the familiar log-linear model.
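
Written out (our notation), the latent class model assumes

```latex
P(Y_{1}=y_{1},\dots,Y_{J}=y_{J}) \;=\;
\sum_{t=1}^{T} \pi_{t} \prod_{j=1}^{J} P(Y_{j}=y_{j}\mid C=t),
```

where C is the unobserved class with prior probabilities π_t and the J item responses are conditionally independent within each class.
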
Lindsay et al. (1991) [2.17] fitted a range of models to educational testing
data. These models included the Rasch model which was defined as a func-
tion of item difficulty and subject ability parameters, and the T-class mixture
model in which the subject ability parameters were treated as random effects.
In the latter model, respondents were assumed to be drawn from a popula-
tion with T latent classes. This model was extended by maximising the mix-
ture likelihood over all possible latent class structures which were modelled
non-parametrically. The final class of models described by Lindsay et al.
(1991) [2.17] were conventional models in which, within each latent class,
response probabilities were independent with unknown and item-specific
success probabilities.
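
For reference (our addition), the Rasch model gives the probability that subject i answers item j correctly as

```latex
P(Y_{ij}=1) \;=\; \frac{\exp(\theta_{i}-\beta_{j})}{1+\exp(\theta_{i}-\beta_{j})},
```

with subject ability θ_i and item difficulty β_j; treating the abilities as draws from a discrete distribution with T support points yields the T-class mixture version discussed above.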

Volume 3: Statistical Modelling of Longitudinal Data


Much of the classical work in social statistics in the first half of the twentieth
century focused on the analysis of cross-sectional data. This became institu-
tionalised in the routine use of crosstabulations and associated chi-squared
tests to analyse synchronic data. However, social science has had a longstand-
ing interest in diachronic processes of social change. Such concerns permeated
the early development of sociological theory as seen in the works of Marx,
Durkheim and Weber (see Giddens, 1971).
An early reference to the concept of a ‘model’ applied to longitudinal
data was made by Anderson and Goodman (1957) [3.1] (see Section A: State
Dependence below). Heckman (1978) [3.5] subsequently formulated and
estimated simultaneous equation models which included both discrete and con-
tinuous endogenous variables. His models relied on the idea that discrete
Editors’ Introduction xxxi

endogenous variables were generated by continuous latent variables crossing


thresholds, a concept that dated back to Pearson (1900). Heckman explained
how dummy endogenous variables served two distinct roles: firstly, as proxies
for unobserved latent variables (‘spurious effects’) and secondly, as direct
shifters of behaviour (‘true effects’). He demonstrated that these two roles
needed to be carefully distinguished.
In his seminal 1981 article, Heckman (1981a) [3.7] addressed a variety of
issues which arose specifically in the analysis of longitudinal categorical data
and which set the agenda for subsequent methodological developments in the
field over the next 30 years. He formulated a general dynamic model for dis-
crete panel data that could be used to analyse the structure of ‘discrete choices’
(categorical outcomes or responses) over time. He generated a rich group of
stochastic processes based upon time-discrete outcomes, including Markov
models (see Section A: State Dependence) and renewal processes. This family
of models was flexible enough to accommodate time-varying explanatory vari-
ables, general correlation structures for random effects and complex inter-
relationships among ‘decisions’ (outcomes) taken at different times.

A. State Dependence

State dependence is present in a dynamic social process if the probability of
an individual being in a given state at time [t+1] depends on the state that
same individual occupied at time [t]. A classic example of such state depend-
ence was the axiom of cumulative inertia proposed by McGinnis (1968) [3.3]:
the longer an individual spends in a given state such as any particular social
class, the less likely such an individual would be to leave that state (p.716).
Indeed, some individuals would always, or almost always, remain in the
same state regardless of duration. The existence of such individuals, known
as ‘stayers’, in a longitudinal dataset presents special problems analytically
(see Section C below).
Anderson and Goodman (1957) [3.1] generated maximum likelihood esti-
mates for transition probabilities in a Markov chain. Chi-squared and likelihood
ratio tests were derived for testing whether the transition probabilities of a
first order chain were constant, and when the transition probabilities were
constant, whether they had specified numerical values.
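
By way of illustration only (this is our own sketch in Python, not code from the
reprinted article), the maximum likelihood estimator of the transition probabilities of
a first-order Markov chain is simply the matrix of observed transition counts divided
by its row totals:

    import numpy as np

    def transition_mle(sequences, n_states):
        # counts[i, j] = number of observed moves from state i to state j
        counts = np.zeros((n_states, n_states))
        for seq in sequences:
            for a, b in zip(seq[:-1], seq[1:]):
                counts[a, b] += 1
        row_totals = counts.sum(axis=1, keepdims=True)
        # MLE: p_hat[i, j] = n_ij / n_i+ (rows with no observed moves are left at zero)
        return np.divide(counts, row_totals,
                         out=np.zeros_like(counts), where=row_totals > 0)

    # Three hypothetical panels observed over four waves (states coded 0/1)
    panels = [[0, 0, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1]]
    print(transition_mle(panels, n_states=2))

For the hypothetical panels above, the estimate of the probability of moving from
state 0 to state 1 is simply the number of observed 0-to-1 moves divided by the total
number of moves out of state 0.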
Heckman’s (1981a) [3.7] method could be used to address the longstand-
ing problem of distinguishing between true and false contagion (Bates and
Neyman, 1951), or in the language of Heckman (1981a) [3.7], distinguishing
between ‘spurious’ and ‘true’ state dependence. In other words, state depend-
ence may exist through serial correlation in the observables generating the
event (‘spurious state dependence’) or because past experience of the event
affects subsequent behaviour (‘structural state dependence’) (p.167).
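
In outline (our notation, not Heckman's own), the kind of dynamic binary model that
underlies this distinction can be sketched as

    y_{it}^{*} = \mathbf{x}_{it}'\boldsymbol{\beta} + \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}, \qquad y_{it} = 1 \text{ if } y_{it}^{*} > 0,

where \gamma captures structural (true) state dependence, whilst the individual-specific
random effect \alpha_i, if left uncontrolled, induces serial correlation that can
masquerade as state dependence (spurious state dependence).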
Davies et al. (1992) [3.10] handled state dependence by adding a dummy
variable (previous state) to their model (for more details, see Section B below),
rather than conditioning on previous state10.

B. Residual Heterogeneity

A central problem in the social sciences involves the existence of unmeasured
or, indeed, unmeasurable explanatory variables. If such factors are at work, this
will be evident in the structure of residuals to any fitted models. Appropriate
diagnostic tests to identify such potential variables were initially identified
by Heckman (1981a) [3.7], and further developed by Davies and Crouchley
(1985) [3.9] and Winship and Mare (1992) [3.11].
Heckman (1979) [3.6] argued that sample selection bias can also be understood as a
form of ‘omitted variable’ bias within a regression analysis.
He proposed the incorporation of an explicit individual-specific error term
(‘random effect’) within the modelling framework. This random effect was
assumed to follow a normal distribution.
Davies and Crouchley (1985) [3.9] showed that uncontrolled heterogeneity
can lead to bias in the estimation of coefficients for any exogenous variables
included in the model, and can result in extremely misleading estimates of
standard errors. They developed random effects models in order to control for
heterogeneity, thereby minimising such bias. These models, which included
Markovian structure, were initially applied to British election data (see Davies
and Crouchley, 1985). Subsequently, Davies et al. (1992) [3.10] used an early
version of SABRE to fit a logistic-normal random effects model to employment
data in Britain.
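
To illustrate the logistic-normal idea (our own sketch, not the SABRE implementation),
the likelihood contribution of one individual integrates the conditional logistic
likelihood over a normally distributed random intercept; this integral can be
approximated by Gauss-Hermite quadrature:

    import numpy as np

    def logistic_normal_loglik_i(y, X, beta, sigma, n_points=20):
        # probabilists' Gauss-Hermite rule: integrates f(z) exp(-z^2 / 2) dz
        nodes, weights = np.polynomial.hermite_e.hermegauss(n_points)
        weights = weights / np.sqrt(2.0 * np.pi)   # rescale so the weights sum to 1 under N(0, 1)
        lik = 0.0
        for z, w in zip(nodes, weights):
            alpha = sigma * z                       # value of the random intercept at this node
            p = 1.0 / (1.0 + np.exp(-(X @ beta + alpha)))
            lik += w * np.prod(p ** y * (1.0 - p) ** (1 - y))
        return np.log(lik)

    # Hypothetical data for one individual observed at four waves
    X_i = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    y_i = np.array([0, 0, 1, 1])
    print(logistic_normal_loglik_i(y_i, X_i, beta=np.array([-1.0, 0.5]), sigma=1.2))

The data here are hypothetical; in practice the individual contributions are summed
over the sample and maximised jointly with respect to the regression coefficients and
the random effect standard deviation.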

C. Movers and Stayers

The mover-stayer model, a generalisation of the Markov chain model, assumed
there are two types of individuals in the population under scrutiny: the ‘stayer’
who had a propensity to remain in the same state (such as ‘employed’) during
the entire period of study, and the ‘mover’ whose tendency to change state
(i.e. become ‘unemployed’) over time could be described by a Markov chain
with a transition probability matrix.
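In the classic formulation (our notation), the t-step transition matrix observed for the
whole population mixes the two types:

    \mathbf{P}^{(t)} = \mathbf{S} + (\mathbf{I} - \mathbf{S}) \mathbf{M}^{t},

where \mathbf{S} is a diagonal matrix containing the proportion of stayers in each state,
\mathbf{M} is the movers' one-step transition matrix and \mathbf{I} is the identity matrix.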
The transition probability matrix for movers and the proportion of
stayers were unknown parameters. Goodman (1961) [3.2] presented various
estimators of these parameters and compared their accuracy, as well as
describing tests of several hypotheses concerning the mover-stayer model.
Davies and Crouchley (1985) [3.9] argued that stayers could be handled
within a random effects framework by adding a ‘spike’ (or concentration) of
stayers to the appropriate mixing distribution. They handled stayers in states 1
and 0 by supplementing a normal mixing distribution with mass points at
plus and minus infinity.

D. Initial Conditions

Before a statistical model can be fitted to event history data, the social pro-
cess under scrutiny needs to be appropriately specified. In much social science
research, this initial conditions problem is treated casually. A naive assumption is
often made: that the initial conditions, or the relevant pre-sample history of the
social process, are exogenous. This assumption is valid
only if the responses that generate the process are serially independent or if
a genuinely new process fortuitously starts at the beginning of the observa-
tion period.
Alfo and Aitkin (2000) [3.16] presented a solution to the initial conditions
problem. They modelled the initial response separately whilst second and sub-
sequent responses were allowed to depend on previous response(s).
Wooldridge (2005) [3.19] proposed modelling the mixing distribution,
conditional on the initial response and any exogenous explanatory variables.
For the binary response model with a lagged response, Arellano and Carrasco
(2003) proposed a maximum likelihood estimator conditional on the initial
response, where the mixing distribution was assumed to be discrete. In the
binary case, Wooldridge’s approach was more flexible and computationally
much simpler than that of Arellano and Carrasco in three respects. Firstly, the
response probability could take either the probit or the logit form. Secondly,
strictly exogenous explanatory variables could be incorporated easily, along
with a lagged response variable. Thirdly, standard random effects software
could be used to estimate the resulting parameters.
Specifying a mixing distribution conditional on the initial condition has
several advantages. Firstly, the mixing distribution can be chosen flexibly
and regarded as an alternative approximation to that of Heckman (1981b)
[3.8]. Secondly, in several cases, including the binary probit model, a mixing
distribution can be chosen that leads to straightforward estimation using
standard software.
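
Schematically (our notation), the Wooldridge device specifies the random effect
conditionally on the initial response and the observed covariate history, for example

    \alpha_i \mid y_{i0}, \bar{\mathbf{x}}_i \sim N(a_0 + a_1 y_{i0} + \bar{\mathbf{x}}_i' \mathbf{a}_2, \; \sigma_a^2),

so that the likelihood of y_{i1}, \ldots, y_{iT} given (y_{i0}, \bar{\mathbf{x}}_i) takes
the form of a standard random effects probit (or logit) and can be maximised with
standard software.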

E. Dropout

Dropout or ‘attrition’ occurs at random when each respondent in a panel survey
is equally likely to drop out at any given wave. When dropout is correlated with
the social process under scrutiny, dropout or attrition is said to be ‘informative’
about that process. The most efficient and effective way of handling dropout
is to model the response and dropout processes simultaneously (see Little,
1995 [3.13]).
Joint models for response and dropout can be classified into two broad
types: selection models and pattern-mixture models. The difference between the two
lies in the manner in which the joint distribution of the responses and the dropout
mechanism is factorised.
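Writing \mathbf{y} for the vector of responses, \mathbf{r} for the dropout indicators
and \mathbf{x} for the covariates, the two factorisations can be expressed compactly as

    \text{selection:} \quad f(\mathbf{y}, \mathbf{r} \mid \mathbf{x}) = f(\mathbf{y} \mid \mathbf{x}) \, f(\mathbf{r} \mid \mathbf{y}, \mathbf{x}); \qquad \text{pattern-mixture:} \quad f(\mathbf{y}, \mathbf{r} \mid \mathbf{x}) = f(\mathbf{y} \mid \mathbf{r}, \mathbf{x}) \, f(\mathbf{r} \mid \mathbf{x}).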
A major problem encountered in the development of such joint models
was that results were often sensitive to assumptions about the nature of
dropout itself. Winship and Mare (1992) [3.11] provided a development of
Heckman’s (1979) [3.6] two-stage estimator, and outlined a number of semi-
and non-parametric approaches to estimating selection models that relied
on weaker assumptions about the nature of the dropout process. Little (1995)
[3.13] suggested that sensitivity analyses should be performed in order to
assess the effect on inferences of alternative assumptions about the dropout
process itself.
Follmann and Wu (1995) [3.12] proposed a separate set of models for
continuous responses with binary dropout processes which were linked by
a common random parameter. An approximation to the response model was
conditioned on dropout, thereby precluding the need to specify an explicit
dropout model. This was extended by Ten Have et al. (1998) [3.15] to handle
longitudinal binary responses subject to informative dropout by the use of a
shared parameter model with logistic link. Molenberghs et al. (1998) [3.14]
developed pseudo-likelihood solutions for the same problem. Albert (2000)
[3.17] subsequently developed a transition model to analyse longitudinal
binary data subject to non-ignorable missingness. Roy and Lin (2005) [3.18]
considered the situation in which dropout was informative and both outcome
and time-varying covariates were missing at the time of dropout. Both of
these approaches used the EM algorithm developed some 25 years earlier by
Dempster et al. (1977) [3.4].

Volume 4: Statistical Modelling of Ordinal Categorical Data

In the fourth and final volume, the issues raised in the general context of
event history data, and more specifically repeated binary data, are extended to
the modelling of ordinal categorical data. This area has been at the forefront
of recent developments in the linkage between social scientists and social
statisticians.
Ordinal data are widespread in the social sciences but have tradition-
ally been neglected. Conventional statistical analyses have generally involved
collapsing ordinal responses into binary outcomes11. Such an approach loses
much of the resolution inherent within such ordinal data. Another strategy,
often adopted by researchers, has been to apply a scoring system to ordered cat-
egories. The resulting variable is treated as continuous, and linear regression
models which assume the normal distribution are routinely used. However, this strategy
makes assumptions that ordinal data rarely satisfy: in particular, it ignores the fact
that such data are seldom normally distributed.
Agresti (1981) [4.4] presented measures for summarising the strength of
association between an ordered categorical variable and a nominal variable.
For a 2 × c table, these measures included Somers’ d, discrete analogues of
the Mann-Whitney test and Goodman and Kruskal’s (1954) [2.1] γ. For the
r × c case, Agresti (1981) [4.4] constructed two generalised measures that
were expressed in terms of event probabilities concerning two types of pairs
of observations, including an alternative representation of Freeman’s (1965)
ϑ index.
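Of the measures just listed, Goodman and Kruskal's γ, for example, is defined from the
numbers of concordant (C) and discordant (D) pairs of observations as

    \gamma = \frac{C - D}{C + D},

ranging from -1 to +1 and equalling zero when concordant and discordant pairs are
equally frequent.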
Agresti (1981) [4.4] also outlined two models: firstly, a log-linear model
which assumed that a set of ordered scores could be assigned to the response
categories; secondly, a cumulative logit model which did not require such a
scoring system.
The cumulative logit model had been described earlier by McCullagh
(1980) [4.2]. Within the cumulative logit model, the original ordinal response
was represented by a series of binary responses. For example, a Likert item
such as ‘Strongly Agree’ (SA), ‘Agree’ (A), ‘Neither Agree Nor Disagree’ (NAND),
‘Disagree’ (D), ‘Strongly Disagree’ (SD) would be represented by the four binary
responses: (i) SA vs. A to SD; (ii) SA, A vs. NAND to SD; (iii) SA to NAND vs.
D, SD; (iv) SA to D vs. SD (see Berridge and Penn, 2009, for an illustration
of this approach). The model intercepts were allowed to vary between these
four partitions. The effects of explanatory variables were assumed to remain
constant across all partitions12.
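
In symbols, the proportional odds form of the cumulative logit model is
logit P(Y \leq s \mid \mathbf{x}) = \theta_s - \mathbf{x}'\boldsymbol{\beta} for each
partition s. The following sketch (ours, using simulated data and hypothetical variable
names) shows how such a model might be fitted with the OrderedModel class available in
recent versions of the open-source statsmodels library:

    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(0)
    n = 500
    age = rng.uniform(18, 75, n)
    female = rng.integers(0, 2, n)
    # hypothetical latent attitude: lower values correspond to stronger agreement
    latent = 0.02 * age - 0.4 * female + rng.logistic(size=n)
    # five ordered categories, 0 = 'Strongly Agree' ... 4 = 'Strongly Disagree'
    codes = np.digitize(latent, bins=[-1.0, 0.0, 1.0, 2.0])
    y = pd.Series(pd.Categorical(codes, categories=[0, 1, 2, 3, 4], ordered=True))

    X = pd.DataFrame({"age": age, "female": female})
    model = OrderedModel(y, X, distr="logit")     # cumulative logit (proportional odds)
    result = model.fit(method="bfgs", disp=False)
    print(result.summary())

The fitted output reports a single coefficient per explanatory variable together with
the estimated cutpoints, reflecting the proportional odds assumption that the effects
remain constant across partitions.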
McCullagh (1980) [4.2] suggested other models that did not require con-
tinuous scores. These included the complementary log-log model, a discrete-
data version of Cox’s proportional hazards model (see Cox, 1972), and the
adjacent-logit model. He also made a distinction between symmetric and
asymmetric models. A symmetric model was one in which the results were
invariant to a reversal in the order of the categories. The cumulative logit
(or proportional odds) model was an example of such a symmetric model,
and remains the natural starting point when analysing a Likert item.
Asymmetric (or sequential) responses were appropriately analysed using
models such as the continuation ratio. This was originally developed for
the analysis of educational data (see Fienberg and Mason, 1979; Mare, 1980,
1981) [4.1, 4.3, 4.5].
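In its most common form (our notation), the continuation ratio model conditions on
having reached a given category:

    \text{logit}\, P(Y = s \mid Y \geq s, \mathbf{x}) = \theta_s + \mathbf{x}'\boldsymbol{\beta}, \qquad s = 1, \ldots, c - 1,

which maps naturally onto a sequence of educational transitions: given that a pupil has
reached stage s, what is the probability of stopping there rather than continuing?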
The multi-level model for ordinal data, like the continuation ratio model,
also originated in the area of educational research, and was closely related to
the random effects and variance component models introduced in Volume 3.
We have not provided an extensive review of multi-level models here13. How-
ever, to illustrate their applicability for modelling ordinal data, we have
included in Volume 4 a selection of influential articles within which multi-
level models have been applied to data on educational attainment (see Aitkin
and Longford, 1986; Fielding and Yang, 2005; Penn and Berridge, 2008)
[4.7, 4.18, 4.20] and earnings (see Davies et al., 1988) [4.8].
Aitkin and Longford (1986) [4.7], in their general analysis of clustered
educational data, argued for the use of variance component or ‘random par-
ameter’ models. They applied a range of linear models to data on 907 pupils
in 18 schools from one Local Education Authority.
Fielding and Yang (2005) [4.18] fitted a series of multi-level cumulative
logit models to data on ‘A’-level grades14, cross-classified by student and teach-
ing group within a number of educational institutions.
Their initial models were defined at three levels:

Level 1: a set of explanatory variables to explain ‘A’-level grades, lodged within
the cross-classification of a particular student and teaching group at
level 2, within a particular institution at level 3.
Level 2: independent teaching group and student random effects.
Level 3: an institution-specific random effect.
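
In generic form (our notation, suppressing the details of the cross-classification),
such a three-level cumulative logit model can be written as

    \text{logit}\, P(Y_i \leq s) = \theta_s - (\mathbf{x}_i'\boldsymbol{\beta} + u_{\text{student}(i)} + u_{\text{group}(i)} + v_{\text{institution}(i)}),

with independent, normally distributed random effects for students and teaching groups
at level 2 and for institutions at level 3.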

To incorporate teacher effects, Fielding and Yang examined a subset of data
on a small number of further education colleges. They used a fixed effects
specification for such colleges by incorporating dummy variables into a set of
explanatory factors. The resulting models were defined at two levels:

Level 1: as per the initial model, but without the link to a particular insti-
tution at level 3.
Level 2(a): independent teaching group and student random effects, as per
the initial model.
Level 2(b): a set of teacher-specific explanatory variables and teacher-specific
random effects, both weighted according to each teacher’s share
of the complete teaching timetable of a teaching group over the
two years of its provision.

Penn and Berridge (2008) [4.20] analysed a wide range of school-level and
locality-level factors that affect each of the three stages in a young adult’s
educational trajectory in England: GCSE results, subsequent path taken at
age 16 and ‘A’-level results. By applying three-level models to data collected as
part of the EFFNATIS15 project, they found no evidence of any locality-level
effects. Furthermore, none of the factors conventionally considered to affect
educational attainment such as gender, social class and ethnicity had a con-
sistent effect across all three stages. Rather, each factor had a contingent effect
at specific points within the overall trajectory of educational outcomes.
Earlier, Davies et al. (1988) [4.8] presented a linear model that was also
appropriate for examining such hierarchical data. Their model had conven-
tional regression, variance component and random coefficient models as
special cases and was fitted using the software VARCL (Longford, 1990). The
effectiveness of the model was demonstrated through an analysis of earnings
in the British engineering industry. Particular emphasis was placed by them
on the interpretation of parameter estimates and residuals.
During the 1980s, extensions to earlier ordinal models were developed.
Anderson (1984) [4.6] developed the stereotype model. The choice between
stereotype models was made empirically on the basis of comparative model
fit. This was deemed particularly important for ‘assessed, ordered categorical
response variables’, where it was not obvious a priori whether the ordering was
relevant to the regression relationship. Model simplification was assessed in
terms of whether the response categories were distinguishable with respect
to the vector of explanatory factors.
Research subsequently progressed from the analysis of cross-sectional
ordinal data to the modelling of repeated or clustered ordinal data. Exactly the
same issues that arose in the analysis of longitudinal categorical data, outlined
in Volume 3, needed to be addressed in the context of repeated or clustered
ordinal data.

A. State Dependence

Albert et al. (1997) [4.12] proposed a class of latent Markov chain models
for analysing repeated ordinal responses with medical diagnostic misclassifi-
cation. They modelled the underlying monotonic response and the misclassi-
fication processes separately, and proposed an EM algorithm (see Dempster
et al., 1977 [3.4]) that allowed for ordinal parameterisations, time-dependent
covariates and randomly missing data.
Pudney (2008) [4.21] subsequently developed a simulated maximum
likelihood approach for the estimation of dynamic linear models, where the
continuous response variable was observed through the prism of an ordinal
scale. He applied the latent auto-regression (LAR) model to BHPS data on
households’ perceptions of their financial well-being, and demonstrated the
superior fit of the LAR model relative to the state dependence (SD) model.

B. Residual Heterogeneity

Crouchley (1995) [4.9] developed a random effects model for multivariate and
grouped (clustered) ordered categorical data. He preferred the complementary
log-log link function to other link functions such as the logit, probit and log-log
(McCullagh and Nelder, 1989). The complementary log-log link function had
the advantage that it provided a closed-form expression for the likelihood of the
model, unconditional on the random effect over a wide range of distributions.
This likelihood was computed without recourse to numerical integration or
Gaussian quadrature. Crouchley assumed that the distribution for the random
effects belonged to the Hougaard (1986) family of distributions16.
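
The link in question can be written (our notation) as

    \log\{ -\log(1 - P(Y \leq s \mid \mathbf{x}, u)) \} = \theta_s + \mathbf{x}'\boldsymbol{\beta} + u,

and its attraction in this setting is that, for the mixing distributions considered,
integrating the conditional likelihood over the random effect u yields a closed-form
marginal likelihood.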

C. Movers and Stayers

Ekholm et al. (2003) [4.17] modelled the probabilities of particular sequences
of responses (‘path probabilities’). In order to specify the association between
responses measured for the same ‘generic unit’ or individual, they used
measures of association that were ratios of moments17 (‘dependence ratios’).
The two-way dependence ratio (of order two) between any two responses
from the same individual was defined as the ratio of the moment of order two
relative to the product of the respective moments of order one. For meaningful
association, a strong structure had to be imposed on the dependence ratios by
deriving them from a vector of association parameters18. In Ekholm et al.’s as-
sociation model, dependence ratios were expressed as explicit functions of
association parameters, explanatory variables and time. An explicit expres-
sion for an individual’s path probability was then derived which permitted the
calculation of that individual’s contribution to the overall likelihood.
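
In symbols (our notation), the second-order dependence ratio for two responses Y_1 and
Y_2 from the same individual is

    \tau_{12} = \frac{E(Y_1 Y_2)}{E(Y_1) \, E(Y_2)},

which equals one under independence and exceeds one when the responses are positively
associated.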

D. Initial Conditions

Pudney (2008) [4.21] indicated that, in the state dependence (SD) model,
there are two alternative approaches for dealing with random effects. Heckman
(1981b) [3.8] specified an approximation to the distribution of the initial
response, conditional on strictly exogenous explanatory variables and on ran-
dom effects. He then derived the distribution of subsequent responses, con-
ditional on the initial response, explanatory variables and random effects, by
using sequential conditioning. The alternative approach, used by Wooldridge
(2005) [4.19], was to specify the mixing distribution, conditional on the initial
response and exogenous explanatory variables (see Volume 3).
Wooldridge’s (2005) [4.19] binary probit model can be extended in a
straightforward manner to fit a dynamic ordered probit model. If the response
comprises ordered categories, then an ordered probit model can be specified
with lagged response indicator variables and strictly exogenous explanatory
variables. The observed value of the ordered response variable is determined
by a latent variable falling in a particular interval, where the cutpoints must
be estimated, as in the cumulative logit model (see McCullagh, 1980 [4.2]).
If the mixing distribution, conditional on both the initial response and strictly
exogenous explanatory variables, is specified as having a homoscedastic
normal distribution, then standard random effects ordered probit software
can be used.
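
Put schematically (our notation), such a dynamic ordered probit takes the form

    y_{it}^{*} = \boldsymbol{\gamma}' \mathbf{d}(y_{i,t-1}) + \mathbf{x}_{it}'\boldsymbol{\beta} + \alpha_i + \varepsilon_{it}, \qquad y_{it} = s \text{ if } \mu_{s-1} < y_{it}^{*} \leq \mu_s,

where \mathbf{d}(y_{i,t-1}) is the vector of lagged response indicators, \varepsilon_{it}
is standard normal and the cutpoints \mu_1 < \mu_2 < \ldots are estimated alongside the
regression parameters; the mixing distribution for \alpha_i is specified conditionally
on y_{i0} and the strictly exogenous explanatory variables, as described above.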
The Wooldridge approach has advantages in SD models but is less attrac-
tive in LAR models, where the latent response of interest is not observable and
cannot be conditioned upon. For this reason, Pudney (2008) [4.21] used the
Heckman treatment of initial conditions to model longitudinal panel data.

E. Dropout

In recent years, there has been considerable work examining the issue of
repeated ordinal data subject to dropout or ‘attrition’. Some approaches have
made strong assumptions about the nature of dropout. One of these was the
population-averaged approach of Molenberghs et al. (1997) [4.10]. They
developed a multivariate Dale model combined with logistic regression for
dropout. The association between observations was measured by a global
odds ratio. Response and dropout processes were modelled as conditionally
independent given complete data, resulting in a likelihood that could be esti-
mated reliably using the EM algorithm. They cautioned that the interpretation of the
results of any such model depended upon the specific assumptions made, and recommended
that a parallel sensitivity analysis should be performed.
There is an important difference between the analyses of Molenberghs
et al. (1997) [4.10] and those of Ekholm et al. (2003) [4.17]. The association
parameters used by Molenberghs et al. involved global odds ratios which
varied over time but which did not discriminate between movers and stayers.
Ekholm et al. concentrated attention on the characteristics of ‘paths travelled’
(or the sequences of responses). They examined individual paths by speci-
fying meaningful models in terms of local, rather than global, association
measures.
An alternative strategy for handling ordinal data subject to dropout involved
subject-specific approaches. The random effects model for ordinal data
developed by Crouchley (1995) [4.9] has been extended in a variety of ways.
Sheiner et al. (1997) [4.11] developed a cumulative logit model which was
conditional upon subject-specific random effects, and an empirical model for
dropout (‘censoring’) which was conditional upon observed responses and
subject-specific random effects. Following the marginal likelihood approach
proposed by Davies and Crouchley (1985) [3.9], random effects were assumed
to be normally distributed and then integrated out. Hedeker and Mermelstein
(2000) [4.14] suggested an alternative random effects model which allowed
individual changes over time to be estimated.
Other subject-specific approaches have made fewer assumptions about
the nature of dropout. Ten Have et al. (2000) [4.15] followed the shared par-
ameter approach of Follmann and Wu (1995) [3.12] by proposing a mixed
effects logistic model, in which the random effects included random intercepts
and random slopes (see Davies et al. (1988) [4.8]). The ordinal outcome and
‘survival’ (or, conversely, dropout) processes were assumed to share a com-
mon random effects structure. Dropout was also assumed to be conditionally
independent of outcome, given such underlying random effects. Once again,
a parallel sensitivity analysis was suggested.
Agresti and Natarajan (2001) [4.16] presented a review of Bayesian and
non-Bayesian methods for clustered (or repeated) ordered categorical data.
Their review concentrated on two classes of model. The first involved marginal
(or population-averaged) models, including generalised estimating equa-
tions. The second entailed cluster-specific (or subject-specific) models which
included the maximum likelihood estimation of random effects models as well as a
Bayesian approach based on Markov Chain Monte Carlo (MCMC) methods.
In the context of (ordinal) item response theory (IRT), Bradlow and Zaslavsky
(1999) [4.13] developed a Bayesian hierarchical analysis comprising three
levels19. Inferences were made using samples from the posterior distributions
in order to compute point and interval estimates. Samples were obtained from
three different MCMC samplers: data augmentation, a Metropolis step and
the Griddy-Gibbs sampler.
The IRT theme was continued in a non-Bayesian context by Liu and
Hedeker (2006) [4.19]. They proposed a random effects model that allowed
for three-level multivariate ordinal outcomes and accommodated multiple
random subject effects. This approach allowed different item factor loadings
(item discrimination parameters) to be estimated for multiple outcomes. The
explanatory variables in their models were not required to follow the assump-
tion of proportional odds.

Concluding Remarks
In this series, we have demonstrated how the discipline of social statistics
has evolved over the last century or more. In recent years, the trend has been
towards increasingly rigorous formulation of research hypotheses, larger and
more detailed datasets, more complex statistical models to match such data,
and a more sophisticated level of statistical analysis in the major international
social science journals.
The advent of e-social science and high performance computing has opened
up the way to easy replication of results, and has helped to produce standards
of scientific rigour within the social sciences which are comparable to those
in the natural and physical sciences.
There has been a long and distinguished track record of interdisciplinary
collaborative research between statisticians and social scientists. There remain
considerable opportunities for continued innovation at the interface between
statistics and social science. We hope, therefore, that this present series will
inspire a new generation of social statisticians to develop innovative methods
of direct use to their social scientific colleagues.

Notes
1. Continuous measures are common in the natural sciences. They include such variables
as height, weight and Body Mass Index (BMI). In the social sciences, they feature in
psychology (IQ) and economics (income, profits, GNP).
2. Likert items measure attitudes and allow respondents to choose between a range
of options ordered in terms of ‘strongly agree’, ‘agree’, ‘neither agree nor disagree’,
‘disagree’ and ‘strongly disagree’. A Likert scale is the sum of responses across several
Likert items.
3. Larger departures of observed frequencies from expected frequencies produce greater chi-
squared values for a fixed sample size n. The p-value of the test is the null probability
that the chi-squared test statistic takes a value at least as large as the observed value.
4. The relationships in such tables produced a range of indices of association such as
Yule’s Q, Goodman and Kruskal’s tau, and lambda (see Blalock, 1972, Chapter 15).
5. Blanchard’s article actually compared the characteristics of horses: specifically ‘grandsires’
and ‘grandchildren’.
6. Whilst much of the conceptual foundation of contemporary social statistics was
developed during this period, the terminology used varied amongst authors. This is
signalled by our use of quotations. Wherever necessary, we have provided current
terms as appropriate.
7. This issue was taken up once again by Penn and Dawkins (see Volume 3) in their log-
linear analysis of marital class endogamy. Their analysis revealed that dichotomising
their seven occupational groupings into a ‘middle class’/‘working class’ divide produced
a gross distortion of the underlying social processes.
8. N refers to the number of variables comprising two categories. R refers to a single
variable comprising more than two categories.
9. For example, suppose the distribution of a variable X is characterised by a single unknown
parameter, θ. In many estimation problems, the information contained within a sample of
observations {x1, …, xn} from this distribution can be summarised by a function of the
sample values (often the sample mean), which provides as much information about θ as the
sample itself. Such a function is said to be sufficient for estimation purposes
(see Mood, Graybill and Boes, 1974, pp. 299–300).
10. This became possible in later versions of the statistical software package, Software for
the Analysis of Binary Recurrent Events, SABRE (http://www.sabre.lancs.ac.uk/).
11. See, for example, Scott et al. (1998).
12. This is known as the proportional odds assumption (cumulative logit models fitted
under this assumption are known as proportional odds models).
13. See the Sage series on multi-level modelling edited by Skrondal and Rabe-Hesketh
(2009).
14. These are assumed to be ordinal.
15. The Effectiveness of Integration Strategies in Europe. This was funded by the European
Union’s Fourth Framework. The wider results of the project are presented in Penn and
Lambert (2009).
16. Three well-known members of this family are the gamma, the inverse Gaussian and
the positive stable law distributions (Aalen, 1988).
17. For realisations Y1, Y2, … of a variable Y, the first moment is defined as the expectation
of Y1, the second moment is the expectation of Y1 × Y2, etc.
18. See the ‘manifest classes’ or interaction parameters of Hauser (1978) [2.8].
19. Probabilities of individual responses to items were modelled as a function of person-
and item-specific parameters θ1 and general parameters θ2 (regression coefficients
corresponding to the covariates of interest); that is, [Y|θ1, θ2]. The distribution of
[θ1|θ2] was modelled and the prior distributions [θ2] were specified.

Additional References
Aalen, O.O. (1988) ‘Heterogeneity in survival analysis’, Statistics in Medicine, 7, 1121–1137.
Agresti, A. (1990) Categorical Data Analysis, New York: Wiley.
Arellano, M. and Carrasco, R. (2003) ‘Binary choice panel data models with predeter-
mined variables’, Journal of Econometrics, 115, 125–157.
Bates, G. and Neyman, J. (1951) ‘Contributions to the Theory of Accident Proneness II:
True or False Contagion’, University of California Publications in Statistics, 1, 215–253.
Berridge, D., Penn, R. and Ganjali, M. (2009) ‘Changing attitudes to gender roles: A longi-
tudinal analysis of ordinal response data from the British Household Panel Study’,
International Sociology, 24(3), 346–367.
Blalock, H.M. (1972) Social Statistics (2nd edn.), New York: McGraw-Hill.
Cox, D.R. (1972) ‘Regression models and life-tables’, Journal of the Royal Statistical Society,
Series B, 34, 187–220.
Davies, R.B. and Crouchley, R. (1985) ‘The determinants of party loyalty: a disaggregate
analysis of panel data from the 1974 and 1979 general elections in England’, Political
Geography Quarterly, 4, 4, 307–320.
Francis, B., Green, M. and Payne, C. (1993) The GLIM System: Release 4 Manual, Oxford:
Clarendon Press.
Freeman, L.C. (1965) Elementary Applied Statistics, New York: Wiley.
Galton, F. (1889) Natural Inheritance, London: Macmillan.
Giddens, A. (1971) Capitalism and Modern Social Theory, Cambridge: Cambridge University
Press.
Glaser, B. and Strauss, A. (1967) The Discovery of Grounded Theory, Chicago: Aldine.
Hand, D. (2001) Principles of Data Mining, Cambridge, MA: MIT Press.
Hougaard, P. (1986) ‘Survival models for heterogeneous populations derived from stable
distributions’, Biometrika, 73, 387–396.
Likert, R. (1932) ‘A technique for the measurement of attitudes’, Archives of Psychology,
140, 1–55.
Longford, N.T. (1990) VARCL: Software for Variance Component Analysis of Data with Nested
Random Effects (Maximum Likelihood), Technical Report, Educational Testing Service,
Princeton, NJ.
McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models (2nd edn.), London:
Chapman and Hall.
Mood, A.M., Graybill, F.A. and Boes, D.C. (1974) Introduction to the Theory of Statistics
(3rd edn.), Singapore: McGraw-Hill.
Nelder, J.A. and Wedderburn, R.W.M. (1972) ‘Generalized Linear Models’, Journal of the
Royal Statistical Society, Series A, 135, 370–384.
Pearson, K. (1900) ‘On a criterion that a given system of deviations from the probable in
the case of a correlated system of variables is such that it can be reasonably supposed
to have arisen from random sampling’, Philosophical Magazine, Series 5, 50: 157–175
(Reprinted 1948 in K. Pearson’s Early Statistical Papers, ed. by E.S. Pearson, Cambridge:
Cambridge University Press).
Penn, R. and Berridge, D. (2008) ‘Modelling Trajectories through the Educational System
in North West England’, Education Economics, 16, 4, 411–431.
Penn, R. and Lambert, P. (2009) Children of International Migrants in Europe, London:
Palgrave Macmillan.
Scott, J., Braun, M. and Alwyn, D. (1998) ‘Partner, parent, worker: family and gender roles’,
in R. Jowell, J. Curtice, A. Park, L. Brook, K. Thomson and C. Bryson (eds) British and
European Social Attitudes: The 15th Report: How Britain Differs, Aldershot: Ashgate.
Skrondal, A. and Rabe-Hesketh, S. (2009) Multilevel Modelling, London: Sage.
Yule, G.U. (1900) ‘On the association of attributes in statistics’, Philosophical Transactions
of the Royal Society of London, A194, 257–319.
Yule, G.U. (1912) ‘On the methods of measuring association between two attributes
(with discussion)’, Journal of the Royal Statistical Society, 75, 579–642.
