
An update on household statistics from administrative data

Pete Jones
Administrative Data Census Project
Office for National Statistics
Administrative Data Census Project

• Aiming to replicate as many census outputs as possible using administrative
data and surveys in the run-up to 2021
• Deliver an Administrative Data Census to run in parallel with the
2021 Census
• Size and characteristics of the population
• Number and characteristics of households
• Housing characteristics
• Publishing Research Outputs for household statistics
• Estimates for number of households (‘occupied addresses’)
• Household size
• Household composition
• Aim to produce at low levels of geography
• Development of new ONS surveys to support household statistics
Definition of ‘household’
• 2011 Census defines households as…
“one person living alone, or a group of people (not necessarily related)
living at the same address who share cooking facilities and share a living
room or sitting room or dining area”.
• Household definition evolves but can be targeted in the design of
censuses and surveys
• Administrative data is based on person level interactions with government
departments
• Individuals registering at addresses
• Household members update information at different times
• Limited information about relationships between residents
‘Occupied addresses’ from admin data

• Population spine constructed by linking person records
• Address information on person records is linked to the Address Register
• A usual resident Statistical Population Dataset (SPD) is drawn from
the population spine
• Occupied addresses (UPRNs) identifiable on the SPD
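
As a minimal sketch of this last step, assume the SPD is available as person records that each carry an optional UPRN. The field layout and the sample values below are hypothetical, not the ONS schema. Occupied addresses and the number of residents at each can then be counted like this:

```python
from collections import Counter

# Hypothetical SPD extract: one (person_id, uprn) pair per usual resident.
# A uprn of None stands for a person who could not be assigned to an address.
spd_records = [
    ("p1", "100001"),
    ("p2", "100001"),
    ("p3", "100002"),
    ("p4", None),
    ("p5", "100003"),
    ("p6", "100003"),
    ("p7", "100003"),
]


def occupied_addresses(records):
    """Return a Counter mapping each occupied UPRN to its number of residents."""
    return Counter(uprn for _, uprn in records if uprn is not None)


residents_per_uprn = occupied_addresses(spd_records)

# Number of occupied addresses (the household proxy).
n_occupied = len(residents_per_uprn)

# Distribution of address sizes: size -> number of addresses of that size.
size_distribution = Counter(residents_per_uprn.values())
```

People without a UPRN drop out of the count entirely, which is one of the sources of undercount discussed in these slides.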
Limitations of UPRN as household proxy

• Definitional differences - not always a one-to-one relationship between
addresses and households
• UPRN assignment - Not all individuals included on the SPD can be
assigned to a UPRN
• Complex residential addresses – Addresses with ‘parent’ and ‘child’
UPRN hierarchies
• Exclusion rules for usual residence – Rules used to determine usual
residence may incorrectly exclude some households from the SPD
• Together, these limitations result in notable undercounts when comparing
occupied addresses on the SPD with 2011 Census household estimates

Comparison of SPD occupied addresses with Census households

[Figure not reproduced: SPD occupied addresses compared with 2011 Census households, England and Wales]

Simulating adjustments with a Population Coverage Survey (PCS)
• Population Coverage Survey to run alongside administrative data
• 1% sample of postcodes spread across England and Wales
• Partly used to adjust for missing addresses on the frame
• Simulated the potential use of a coverage survey to adjust for
undercounts in occupied UPRNs

[Figure not reproduced: sample design with LSOAs and OAs as sampling units. End sample: 1% of OAs in England and Wales]
(NB: PSU – Primary Sampling Unit, SSU – Secondary Sampling Unit)
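
The adjusted results later in this deck are labelled "DSE adjusted". As a rough illustration only, the sketch below applies the textbook dual-system (Lincoln-Petersen) estimator to occupied addresses in the sampled areas. The function name and all counts are invented for the example, and the calculation actually used for the local authority estimates may differ.

```python
def dual_system_estimate(n_admin, n_survey, n_matched):
    """Textbook dual-system (Lincoln-Petersen) estimate of the true total.

    n_admin   -- occupied addresses found on the admin-based SPD
    n_survey  -- occupied addresses found by the coverage survey
    n_matched -- occupied addresses found by both sources
    """
    if n_matched <= 0:
        raise ValueError("need at least one matched address")
    return n_admin * n_survey / n_matched


# Invented counts for the sampled areas of one hypothetical local authority.
n_admin, n_survey, n_matched = 940, 960, 920
dse_total = dual_system_estimate(n_admin, n_survey, n_matched)

# Ratio of the estimated true total to the admin count. A factor like this
# could then scale the authority's full admin-based count of occupied addresses.
adjustment_factor = dse_total / n_admin
```

The estimator assumes the two sources are independent and that matching between them is accurate; both assumptions matter in practice.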
Calculations for local authority estimates

Results: national level

                            Admin data unadjusted   Admin data DSE adjusted
Estimate                               21,980,124                23,433,814
Difference from 2011 Census            -1,385,920                    67,770
Relative bias (%)                           -5.93                      0.29
RSE (%)                                       N/A                      0.26
RRMSE (%)                                     N/A                      0.39
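
The relative bias figures follow directly from the estimates and differences: both columns imply a 2011 Census total of 23,366,044 households. The RRMSE is also consistent with the usual decomposition RRMSE² = relative bias² + RSE², which is assumed here rather than stated on the slide. A short check of the arithmetic:

```python
import math

admin_unadjusted = 21_980_124
admin_dse = 23_433_814
diff_unadjusted = -1_385_920
diff_dse = 67_770

# Both columns imply the same 2011 Census household total.
census = admin_unadjusted - diff_unadjusted
assert census == admin_dse - diff_dse


def relative_bias_pct(estimate, truth):
    """Relative bias of an estimate against the benchmark, in percent."""
    return 100 * (estimate - truth) / truth


rb_unadjusted = relative_bias_pct(admin_unadjusted, census)
rb_dse = relative_bias_pct(admin_dse, census)

# Assumed decomposition: RRMSE^2 = relative bias^2 + RSE^2 (RSE = 0.26 from the table).
rrmse_dse = math.sqrt(rb_dse ** 2 + 0.26 ** 2)

print(round(rb_unadjusted, 2), round(rb_dse, 2), round(rrmse_dse, 2))
# prints: -5.93 0.29 0.39
```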

Results: LA level

10 LAs with largest bias
[Map not reproduced. Labelled local authorities: City of London, Liverpool, Rutland, Lambeth, Southwark, Gwynedd, Colchester, Shepway, Isles of Scilly, Hastings]
Compared to February Research Output

Admin data (unadjusted)                      Admin data DSE adjusted
LA name                  Relative bias (%)   LA name           Relative bias (%)
City of London                       -50.6   City of London                -23.7
Kensington and Chelsea               -34.6   Gwynedd                        -8.8
Westminster                          -31.5   Hastings                       -6.0
Islington                            -22.2   Southwark                       5.5
Gwynedd                              -21.4   Lambeth                         5.2
Isles of Scilly                      -19.6   Rutland                         5.0
Hammersmith and Fulham               -18.7   Liverpool                      -4.7
Camden                               -17.4   Isles of Scilly                -4.5
Tower Hamlets                        -16.8   Colchester                      4.2
Wandsworth                           -16.0   Shepway                        -4.1

Household Composition using Admin Data

• Household composition relies on the availability of
relationships data
• Sources currently available:
1. Housing Benefit
• Partner ID available where applicable

2. National Benefits Database
• Partner ID available for State Pension claimants

3. Child Benefit data
• Contains a National Insurance ID for one of the parents
• High coverage of dependent children
• Eligible up to age 16, then up to 19 if in approved education
or training
Algorithm

1. Single person households – one person in UPRN
2. Student – all people have HESA record
3. Lone parent families:

[Diagram not reproduced. Surviving labels: persons 1 (Smith), 2 (Smith) and 3; Age > 18 years; Parent ID; 18; 16]

Couple families

• Aim to capture all couples: married, cohabiting, opposite sex,
same sex
• If no partner ID available from benefits data, can only use age
gap

[Diagram not reproduced. Surviving labels: persons 1 and 2; Age ≤ 12 years; Partner ID; 18; 16]

Couple families

4. Couple families:
[Diagram not reproduced. Surviving labels: persons 1 to 4, with persons 2 and 3 named Smith; Couple; Age > 18 years; Parent ID; 18; 16]

Other households

Contain more than one family
[Two diagrams not reproduced. First: persons 1 to 5, age gap > 50 years, captioned "More than two generations". Second: persons 1 to 3, age gap < 15 years, captioned "Person 3 too old to be child of 1 or 2".]
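
The rule order on the preceding slides (single, student, lone parent, couple, other) can be sketched as a small classifier. This is a simplified illustration, not the ONS algorithm: the `Person` fields are hypothetical, children are identified only through a parent ID, and the 12-year couple age gap is read off a diagram label and treated here as an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    pid: int
    age: int
    has_hesa_record: bool = False
    partner_id: Optional[int] = None  # hypothetical link from benefits data
    parent_id: Optional[int] = None   # hypothetical link from Child Benefit data


def is_couple(a: Person, b: Person, max_gap: int) -> bool:
    """Couple if a partner ID links the two people, else fall back to the age gap."""
    if a.partner_id == b.pid or b.partner_id == a.pid:
        return True
    return abs(a.age - b.age) <= max_gap


def classify_household(people: List[Person], max_couple_gap: int = 12) -> str:
    if not people:
        raise ValueError("a household needs at least one person")
    if len(people) == 1:
        return "Single"
    if all(p.has_hesa_record for p in people):
        return "Student"
    ids = {p.pid for p in people}
    # "Children" are people whose parent ID points at someone in the household.
    children = [p for p in people if p.parent_id in ids]
    adults = [p for p in people if p.parent_id not in ids]
    adult_ids = {p.pid for p in adults}
    # Every child must link to an adult, otherwise there are 3+ generations.
    children_ok = all(c.parent_id in adult_ids for c in children)
    if len(adults) == 1 and children and children_ok:
        return "Lone parent"
    if (len(adults) == 2 and children_ok
            and is_couple(adults[0], adults[1], max_couple_gap)):
        return "Couple"
    return "Other"


lone_parent = [Person(1, 40), Person(2, 16, parent_id=1), Person(3, 12, parent_id=1)]
couple = [Person(1, 42), Person(2, 40), Person(3, 16, parent_id=2)]
three_generations = [Person(1, 70), Person(2, 40, parent_id=1), Person(3, 10, parent_id=2)]
```

Anything that does not fit the first four rules, such as the three-generation example, falls through to "Other".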
Results
[Bar chart not reproduced: % of households (axis 0 to 60) by household type (Single, Student, Couple, Lone parent, Other, Missing), Census versus SPD]

Imputation of couple relationships

• Testing imputation method to estimate missing relationships,
similar to Austrian approach
• Annual Population Survey (APS) used as donor dataset to
impute couple and non-couple relationships
• Enables us to capture full distribution of age range between
couples
• Method currently matches donors on
• Age (oldest person) rounded to nearest 5
• Age difference – Matches records where age gap is the same or 1 year
higher or lower
• Sexes of the two people
• Median of > 30 donors per group with same matching values
• About 25% of relationships are imputed as couples
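
The donor matching described above can be sketched as a simple random hot-deck. The matching variables follow the bullets (age of the oldest person rounded to the nearest 5, age gap within 1 year, sexes of the two people), but the donor records, the function names and the choice to draw one donor at random are assumptions made for this example.

```python
import random
from collections import defaultdict


def matching_key(age_a, sex_a, age_b, sex_b):
    """Return (age band of oldest person, sorted sexes, age gap) for a pair."""
    oldest = max(age_a, age_b)
    age_band = 5 * ((oldest + 2) // 5)  # round to the nearest 5
    gap = abs(age_a - age_b)
    sexes = tuple(sorted((sex_a, sex_b)))
    return age_band, sexes, gap


def build_donor_index(donors):
    """Group donor pairs by (age band, sexes); keep each pair's gap and status."""
    index = defaultdict(list)
    for age_a, sex_a, age_b, sex_b, is_couple in donors:
        band, sexes, gap = matching_key(age_a, sex_a, age_b, sex_b)
        index[(band, sexes)].append((gap, is_couple))
    return index


def impute_is_couple(index, age_a, sex_a, age_b, sex_b, rng):
    """Draw couple status from a random donor whose age gap is within 1 year.

    Returns None when no donor matches.
    """
    band, sexes, gap = matching_key(age_a, sex_a, age_b, sex_b)
    candidates = [status for g, status in index.get((band, sexes), [])
                  if abs(g - gap) <= 1]
    if not candidates:
        return None
    return rng.choice(candidates)


# Invented donor pairs: (age, sex, age, sex, is_couple).
donors = [
    (42, "F", 44, "M", True),
    (41, "M", 43, "F", True),
    (45, "F", 44, "F", False),
    (43, "M", 20, "F", False),
]
donor_index = build_donor_index(donors)
rng = random.Random(0)
```

Because the status is drawn from donors rather than fixed by a threshold, pairs with unusual age gaps can still be imputed as couples in proportion to how often they occur among donors.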

Results

[Bar chart not reproduced: % of households (axis 0 to 60) by household type (Single, Student, Couple, Lone parent, Other), Census versus SPD]

• SPD V2.0 underestimates 2 person households and
overestimates larger households
• Expect differences in ‘Couple’ and ‘Other’ household types
Detailed results
[Bar chart not reproduced: percentage of households (axis 0 to 20), Census versus SPD, for the household types below]
Single person: Aged 65 and over; Other
One family: All aged 65 and over
Couple: No children; Dependent children; All children non-dependent
Lone parent: Dependent children; All children non-dependent
Student
Other: With dependent children; All aged 65 and over; Other

Household size
2011 Census vs. 2011 SPD

[Scatter plots not reproduced: one panel per household size, 2011 Census (y axis) against SPD v2 2011 estimates (x axis), each annotated with y=x and a fitted intercept and slope]

Household size   Intercept   Slope
1 person            -0.006   1.011
2 people             0.025   1.01
3 people            -0.007   0.946
4 people            -0.004   1.037
5+ people           -0.002   0.893

Household size
2011 Census vs. SPREE

[Scatter plots not reproduced: one panel per household size, 2011 Census (y axis) against SPREE estimates (x axis), each annotated with y=x and a fitted intercept and slope]

Household size   Intercept   Slope
1 person             0.007   1.049
2 people             0.001   0.964
3 people            -0.007   0.994
4 people            -0.003   0.989
5+ people            0.001   1.045
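
SPREE stands for Structure Preserving Estimator. Estimators of this family are commonly computed by iterative proportional fitting (IPF): a table with a plausible association structure is rescaled until its margins match more reliable totals. The sketch below is a generic IPF on invented numbers; which table and which margins are used in this work is not stated on the slides, so every value here is an assumption.

```python
def ipf(table, row_targets, col_targets, iterations=50):
    """Rescale a 2-D table so its row and column sums match the targets."""
    fitted = [row[:] for row in table]
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(fitted[i])
            if total > 0:
                fitted[i] = [value * target / total for value in fitted[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in fitted)
            if total > 0:
                for row in fitted:
                    row[j] *= target / total
    return fitted


# Invented admin-based counts: 2 areas (rows) x 3 household-size classes (columns).
admin_table = [
    [30.0, 50.0, 20.0],
    [40.0, 40.0, 20.0],
]
area_totals = [110.0, 90.0]      # assumed reliable household totals per area
size_totals = [75.0, 85.0, 40.0]  # assumed reliable totals per size class

fitted = ipf(admin_table, area_totals, size_totals)
```

The fitted table keeps the interaction pattern of the admin counts (the "structure") while agreeing with both sets of totals, which must sum to the same grand total for the procedure to converge.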
Examples at local authority level
SPD¹ difference from census percentages versus SPREE² adjustment, 2011

[Chart not reproduced: Kensington and Chelsea; differences (axis -6 to 4) by household size (1, 2, 3, 4, 5 plus); series SPD¹ - Census and SPREE² - Census]

Some geographies are affected by certain missingness, e.g. armed forces data, so may need to be treated differently.

If an area is extremely different from the national distribution, it may be harder to estimate using those distributions.

Source: Office for National Statistics
Notes: 1. SPD - Statistical Population Dataset
2. SPREE - Structure Preserving Estimator
Summary

• More detailed research on relationship between UPRN
and households
• 2011 Census data
• Working with local authorities

• Consider how surveys can be targeted to meet specific
challenge of definitions
• A UPRN to household adjustment?
• Sample design for SPREE/GSPREE
• Survey design for imputation methods

• More research outputs – cross tabulations
• More detailed research on register based countries