Lambert
1) Data Management through e-Social Science
DAMES – www.dames.org.uk
ESRC Node funded 2008-2011
Aim: Useful social science provisions
Specialist data topics – occupations; education qualifications;
ethnicity; social care; health
Mainstream packages and accessible resources
Aim: To exploit/engage with existing DM resources
In social science – e.g. ESDS, CESSDA
In e-Science – e.g. OGSA-DAI; OMII
To us ‘Data management’ means…
‘the tasks associated with linking related data resources, with
coding and re-coding data in a consistent manner, and with
accessing related data resources and combining them within the
process of analysis’ […DAMES Node..]
Some components…
Manipulating data
Recoding categories / ‘operationalising’ variables
Linking data
Linking related data (e.g. longitudinal studies)
combining / enhancing data (e.g. linking micro- and macro-data)
Secure access to data
Linking data with different levels of access permission
Detailed access to micro-data cf. access restrictions
Harmonisation standards
Approaches to linking ‘concepts’ and ‘measures’ (‘indicators’)
Recommendations on particular ‘variable constructions’
Cleaning data
‘missing values’; implausible responses; extreme values
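A minimal sketch of the cleaning step, assuming survey-style missing-value codes of -9 and -7 (as in the recoding example) and a hypothetical plausible range for age. The function names and the data are illustrative only.

```python
# Assumed missing-value codes, modelled on "-9 Missing or wild" and
# "-7 Proxy respondent"; other surveys will use different codes.
MISSING_CODES = {-9, -7}


def clean_value(value, low=None, high=None):
    """Return None for missing codes or values outside [low, high]."""
    if value in MISSING_CODES:
        return None
    if low is not None and value < low:
        return None
    if high is not None and value > high:
        return None
    return value


def clean_column(values, low=None, high=None):
    """Clean a list of raw values and count how many were set to missing."""
    cleaned = [clean_value(v, low, high) for v in values]
    n_missing = sum(1 for v in cleaned if v is None)
    return cleaned, n_missing


# Illustrative data only (not taken from any survey file).
raw_ages = [34, -9, 51, 212, -7, 16]
ages, n_missing = clean_column(raw_ages, low=0, high=120)
```

Returning the count of values set to missing makes it easy to record what the cleaning step changed.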
Example – recoding data
Count: highest educational qualification (rows) by educ4 (columns)
educ4 codes: -9 = missing; 1 = Degree; 2 = Diploma; 3 = Higher school or vocational; 4 = School level or below

Highest educational qualification      -9      1      2      3      4  Total
-9 Missing or wild                    323      0      0      0      0    323
-7 Proxy respondent                   982      0      0      0      0    982
1 Higher Degree                         0    425      0      0      0    425
2 First Degree                          0   1597      0      0      0   1597
3 Teaching QF                           0      0    340      0      0    340
4 Other Higher QF                       0      0   3434      0      0   3434
5 Nursing QF                            0      0    161      0      0    161
6 GCE A Levels                          0      0      0   1811      0   1811
7 GCE O Levels or Equiv                 0      0      0      0   2518   2518
8 Commercial QF, No O Levels            0      0      0    331      0    331
9 CSE Grade 2-5, Scot Grade 4-5         0      0      0      0    421    421
10 Apprenticeship                       0      0      0    257      0    257
11 Other QF                           102      0      0      0      0    102
12 No QF                                0      0      0      0   2787   2787
13 Still At School No QF              138      0      0      0      0    138
Total                                1545   2022   3935   2399   5726  15627
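The recode shown in the crosstab above can be sketched as a simple lookup. The mapping is read off the non-zero cells of the table. The names `RECODE_TO_EDUC4` and `recode_educ4` are hypothetical, and sending unlisted codes to -9 is an assumption.

```python
from collections import Counter

# Source category -> educ4 category, read off the crosstab.
RECODE_TO_EDUC4 = {
    -9: -9, -7: -9,        # missing / proxy respondent -> missing
    1: 1, 2: 1,            # higher degree, first degree -> Degree
    3: 2, 4: 2, 5: 2,      # teaching, other higher, nursing -> Diploma
    6: 3, 8: 3, 10: 3,     # A levels, commercial, apprenticeship -> Higher school or vocational
    7: 4, 9: 4, 12: 4,     # O levels, CSE, no qualification -> School level or below
    11: -9, 13: -9,        # other QF, still at school -> missing
}


def recode_educ4(code):
    """Map a source qualification code to educ4; unknown codes become -9 (assumption)."""
    return RECODE_TO_EDUC4.get(code, -9)


# Illustrative values only.
sample = [1, 2, 4, 6, 7, 12, 13, -7]
educ4 = [recode_educ4(c) for c in sample]
counts = Counter(educ4)
```

Keeping the mapping in one dictionary means the recode can be documented and re-applied consistently across files.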
Example – Linking data
Linking via ‘ojbsoc00’:
c1-5 = original data / c6 = derived from data / c7 = derived from www.camsis.stir.ac.uk
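A minimal sketch of this kind of link, assuming an external lookup table keyed on the occupational code held in ‘ojbsoc00’. The codes and scores below are invented for illustration and are not real CAMSIS values. `attach_scores` is a hypothetical helper.

```python
# Hypothetical lookup: occupational unit group -> stratification score.
# Values are invented for illustration only.
camsis_lookup = {1132: 62.5, 2314: 70.1, 9233: 28.4}

# Illustrative survey records; -9 stands for a missing occupation code.
people = [
    {"pid": 1, "ojbsoc00": 2314},
    {"pid": 2, "ojbsoc00": 9233},
    {"pid": 3, "ojbsoc00": -9},
]


def attach_scores(records, lookup, key="ojbsoc00", new_var="camsis"):
    """Many-to-one link: add the looked-up score to a copy of each record.

    Records whose code is not in the lookup get None for the new variable.
    """
    linked = []
    for rec in records:
        out = dict(rec)                      # leave the original data untouched
        out[new_var] = lookup.get(rec.get(key))
        linked.append(out)
    return linked


linked = attach_scores(people, camsis_lookup)
```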
Matching files (‘deterministic’)
Complex data (complex research) is distributed across
different files. In surveys, use key linking variables for...
One-to-one matching
SPSS: match files /file="file1.sav" /file="file2.sav" /by=pid.
Stata: merge pid using file2.dta (Stata 11 and later: merge 1:1 pid using file2.dta)
Many-to-Many matches
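For comparison with the SPSS and Stata commands above, a one-to-one deterministic match on ‘pid’ might be sketched as follows. This is an illustration under stated assumptions, not an equivalent of either package: `match_one_to_one` and the `source` variable are hypothetical, and the data are invented.

```python
def match_one_to_one(file1, file2, by="pid"):
    """Outer-join two lists of records on a key that is unique in each file."""

    def index(records):
        idx = {}
        for rec in records:
            k = rec[by]
            if k in idx:
                raise ValueError(f"duplicate key {k!r}: not a one-to-one match")
            idx[k] = rec
        return idx

    left, right = index(file1), index(file2)
    merged = []
    for k in sorted(set(left) | set(right)):
        row = {}
        row.update(right.get(k, {}))
        row.update(left.get(k, {}))   # on name clashes, file1 values are kept
        if k in left and k in right:
            row["source"] = "both"
        elif k in left:
            row["source"] = "file1 only"
        else:
            row["source"] = "file2 only"
        merged.append(row)
    return merged


# Illustrative data only.
wave_a = [{"pid": 1, "age": 30}, {"pid": 2, "age": 45}]
wave_b = [{"pid": 2, "income": 200}, {"pid": 3, "income": 150}]
merged = match_one_to_one(wave_a, wave_b)
```

Raising an error on duplicate keys makes the one-to-one assumption explicit rather than silently producing a many-to-many result.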
1) Variable constructions
o Coding and re-coding values
2) Linking datasets
o Internal and external linkages
..plus the centrality of keeping clear records of DM activities
Reproducible (for self)
Replicable (for all)
Paper trail for whole lifecycle
Cf. Dale 2006; Freese 2007
In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata)
Syntax examples: www.longitudinal.stir.ac.uk
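Outside SPSS/Stata syntax files, the same paper-trail idea could be sketched as a small log that records each DM step as it is applied. The `DMLog` class is hypothetical and the data are invented for illustration.

```python
import json


class DMLog:
    """Record each data management step with a description and case counts."""

    def __init__(self):
        self.steps = []

    def apply(self, description, func, data):
        """Run func on data and append an annotated entry to the log."""
        result = func(data)
        self.steps.append({
            "step": len(self.steps) + 1,
            "description": description,
            "n_in": len(data),
            "n_out": len(result),
        })
        return result

    def to_json(self):
        """Serialise the log so it can be stored alongside the data."""
        return json.dumps(self.steps, indent=2)


# Illustrative data only.
log = DMLog()
records = [{"pid": 1, "age": 34}, {"pid": 2, "age": None}, {"pid": 3, "age": 51}]
complete = log.apply(
    "Drop cases with missing age",
    lambda rows: [r for r in rows if r["age"] is not None],
    records,
)
```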
Principal DAMES services
(current status)
GEMDE – Grid Enabled Specialist Data Environments
GEODE – Occupational data
Data curation tool
Data fusion tool
2. Linking DAMES and e-Stat
The high-level vision is to embed data management functionality, and encourage its uptake, within e-Stat’s modelling capabilities
Data oriented review
Applied research perspective
Range of data resources
Accessing and documenting data resource
options
The implementation for e-Stat
This is mostly a blank space…
…and we’ve not hitherto used Python
Est store demo here
Appendix items
Model 1:
[Diagram: two panels, ‘Data file specification’ and ‘Variable manipulation & analysis’. Labels shown: BHPS wave A individuals file; BHPS wave B individuals; Wave C; SOC; RGSC; Gender; Age; Age (yrs) bands; Current job; Spouse; CAMSIS; Analytical; Graphics]
Text interface – DAMES most common commands:
-> usedataset{UKDA_5151}
-> usedatafile{individuals wave A}
-> matchdata{individuals wave A;individuals wave B; link variable=pid; format=wide}
Commands invoking other packages:
-> SPSS{match files file="aindresp.sav" /file="bindresp.sav" /by=pid}
-> SPSS{fre var=ajbrgsc}
-> Stata{recode ageb 16/30=1 31/50=2 *=.}
-> R{..}
-> Stata{do $path2\part1_analysis.do}
Invoked manually or in response to manipulating graphs
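A minimal sketch of how command lines of the form `-> name{...}` might be parsed in Python. This is not the DAMES or e-Stat implementation. The split into "native" and "external" commands, the semicolon-separated arguments, and all names below are assumptions made for illustration.

```python
import re

# Assumed shape of a command line: "-> name{body}".
COMMAND_RE = re.compile(r"^\s*->\s*(?P<name>\w+)\{(?P<body>.*)\}\s*$", re.DOTALL)

# Assumed grouping of the commands that appear on the slide.
NATIVE = {"usedataset", "usedatafile", "matchdata"}
EXTERNAL = {"SPSS", "Stata", "R"}


def parse_command(line):
    """Parse one command line into a small dictionary."""
    m = COMMAND_RE.match(line)
    if m is None:
        raise ValueError(f"not a command line: {line!r}")
    name, body = m.group("name"), m.group("body").strip()
    if name in EXTERNAL:
        # Pass the package's own syntax through untouched.
        return {"kind": "external", "package": name, "syntax": body}
    if name in NATIVE:
        args = [part.strip() for part in body.split(";") if part.strip()]
        return {"kind": "native", "command": name, "args": args}
    raise ValueError(f"unknown command: {name!r}")


match_cmd = parse_command(
    "-> matchdata{individuals wave A;individuals wave B; link variable=pid; format=wide}"
)
stata_cmd = parse_command("-> Stata{recode ageb 16/30=1 31/50=2 *=.}")
```

Passing external syntax through unchanged keeps the interface thin: the existing SPSS, Stata or R syntax remains the record of what was done.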
‘The significance of data management for social survey
research’
(see http://www.esds.ac.uk/news/eventdetail.asp?id=2151)
Some provocative examples for the UK…
Social mobility is increasing, not decreasing!
− Popularity of controversial findings associated with Blanden et al (2004)
− Contradicted by wider ranging datasets and/or better measures of stratification position
− DM: researchers ought to be able to more easily access wider data and better variables
Comment – growing interest in data management..?
Historically, references covering DM were few and far between
• Dale, A., Arber, S., & Procter, M. (1988). Doing Secondary Analysis. London:
Unwin Hyman Ltd.
Recently, there’s been a small burst of relevant references
• Levesque, R., & SPSS Inc. (2008). Programming and Data Management for SPSS
Statistics 17.0. Chicago, IL: SPSS Inc.
• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC
Press.
• Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test
Ideas. New York: Jossey-Bass.
• http://www.esds.ac.uk/support/onlineguides.asp
• http://www.longitudinal.stir.ac.uk/
..and growing interest re. ‘documentation for replication’
• Dale, A. (2006). Quality Issues with Survey Research. International Journal of
Social Research Methodology, 9(2), 143-158.
• Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not
Sociology? Sociological Methods and Research, 36(2), 153-172.
E-Science and Data Management
E-Science isn’t essential to good DM, but it has the capacity to improve
and support the conduct of DM…
1) Concern with standards setting
in communication and enhancement of data
2) Linking distributed/heterogeneous/dynamic data
Coordinating disparate resources; interrogating live resources
3) Contribution of metadata
Tools/standards for variable harmonisation and standardisation
4) Linking data subject to different security levels