
Safely plotting continuous variables on a map

Peter-Paul de Wolf and Edwin de Jonge


September 26-28, 2018
Contents

Introduction

Disclosure risk

Risk measure

Example

Discussion

Introduction
NSIs often publish geographical output:
– Traditionally using tables (counts, magnitudes, …)
– Using administrative regions, defined by e.g.
- property (fire brigade area)
- size of ground area (grid)
- number of units (NUTS)

– Aggregated data plotted on a map
- e.g. electricity consumption of enterprises
- electricity consumption of private dwellings

Introduction
Plotting a continuous variable on a map
Continuous in value
Not necessarily continuous in location

Running example: mean energy consumption of enterprises


Using administrative regions and zooming

Some Notation

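Population $\mathcal{U} = \{\mathbf{r}_1, \dots, \mathbf{r}_N\}$ with $\mathbf{r}_i = (x_i, y_i)$
Measurements $g_1, \dots, g_N$
Spatial distribution $g(\mathbf{r}) : \mathbb{R}^2 \to \mathbb{R}$
Total (energy consumption) in area $\mathcal{A}$

$$G(\mathcal{A}) = \int_{\mathcal{A}} g(\mathbf{r})\, d\mathbf{r}$$

Spatial or unit mean (energy consumption) in area $\mathcal{A}$

$$\bar{G}(\mathcal{A}) = \frac{G(\mathcal{A})}{\|\mathcal{A}\|} \quad \text{or} \quad \bar{G}(\mathcal{A}) = \frac{G(\mathcal{A})}{\sum_{i} \mathbb{1}(\mathbf{r}_i \in \mathcal{A})}$$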
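A minimal Python sketch of the total and the two means above. It assumes (our assumption, not stated on the slide) that $g(\mathbf{r})$ places a point mass $g_i$ at each unit location $\mathbf{r}_i$, so the integral over $\mathcal{A}$ reduces to a sum over the units inside $\mathcal{A}$, and that areas are axis-aligned rectangles. The names `Unit`, `Rect`, `total`, `spatial_mean` and `unit_mean` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Unit:
    x: float  # location x_i
    y: float  # location y_i
    g: float  # measurement g_i, e.g. energy consumption


@dataclass(frozen=True)
class Rect:
    """Hypothetical rectangular area A = [x0, x1) x [y0, y1)."""
    x0: float
    x1: float
    y0: float
    y1: float

    def contains(self, u: Unit) -> bool:
        return self.x0 <= u.x < self.x1 and self.y0 <= u.y < self.y1

    def area(self) -> float:
        # ||A||, the surface of the area
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def total(units: Iterable[Unit], a: Rect) -> float:
    """G(A): with point masses at the unit locations the integral is a sum."""
    return sum(u.g for u in units if a.contains(u))


def spatial_mean(units: Iterable[Unit], a: Rect) -> float:
    """G(A) / ||A||: total per unit of surface."""
    return total(units, a) / a.area()


def unit_mean(units: list, a: Rect) -> float:
    """G(A) divided by the number of units located in A."""
    n = sum(1 for u in units if a.contains(u))
    if n == 0:
        raise ValueError("unit mean undefined: no units in area")
    return total(units, a) / n


units = [Unit(10, 10, 120.0), Unit(40, 30, 80.0), Unit(260, 90, 500.0)]
cell = Rect(0, 100, 0, 100)
print(total(units, cell))         # 200.0
print(spatial_mean(units, cell))  # 0.02
print(unit_mean(units, cell))     # 100.0
```

The spatial mean depends on the surface of $\mathcal{A}$ (including empty parts), whereas the unit mean only depends on the units actually located in $\mathcal{A}$.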

Disclosure scenario

– The location of a population unit is very identifying


– Zooming in can pinpoint the exact location of a population unit
– The spatial distribution is that of a sensitive variable
– Spatial characteristics of the population imply identifiability

Attacker scenario
Determine ‘hot-spots’
Zoom in on the region of interest
Link the value of the spatial distribution to individuals

Disclosure scenario

Sub-scenarios:
External attacker
The attacker is not a population unit located in the hot-spot area
and can only use information from the spatial distribution

Internal attacker
The attacker is a population unit located in the hot-spot area and
can additionally use its own information to derive more accurate
information about another unit in that area

Risk measure

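Risk measure based on the 𝑝%-rule as a function of a table cell

Adjust the risk measure to be based on the 𝑝%-rule as a function of an area

$$
\mathrm{RC}(\mathcal{A}; p) =
\begin{cases}
0, & \text{if } \mathcal{A} \cap \mathcal{U} = \emptyset \\
(1 + p/100)\, g_{(1)} - G(\mathcal{A}), & \text{if } \mathcal{A} \cap \mathcal{U} = \{\mathbf{r}_j\} \\
(1 + p/100)\, g_{(1)} + g_{(2)} - G(\mathcal{A}), & \text{otherwise}
\end{cases}
$$

where

$$g_{(1)} = \max(g_i : \mathbf{r}_i \in \mathcal{A}) \quad \text{and} \quad g_{(2)} = \max(g_i : \mathbf{r}_i \in \mathcal{A},\ \mathbf{r}_i \neq \mathbf{r}_{(1)})$$

with $\mathbf{r}_{(1)}$ the location of the unit with the largest value in $\mathcal{A}$.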
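The three cases of $\mathrm{RC}(\mathcal{A}; p)$ translate directly into code. This is a sketch under the same point-mass assumption as before, so $G(\mathcal{A})$ is the sum of the contributions of the units in $\mathcal{A}$; `risk_rc` is a hypothetical name. Following the usual $p$%-rule convention, a positive value marks the area as unsafe.

```python
def risk_rc(values, p):
    """p%-rule risk RC(A; p) for the measurements g_i of the units located in A.

    values: contributions of the units in A (assumed non-negative)
    p: the p of the p%-rule, e.g. 10
    """
    if len(values) == 0:
        return 0.0  # A contains no units: nothing to disclose
    total = sum(values)  # G(A) under the point-mass assumption
    ordered = sorted(values, reverse=True)
    g1 = ordered[0]  # g_(1), the largest contribution
    if len(ordered) == 1:
        # a single unit: G(A) equals its own value, so RC = (p/100) * g_(1) > 0
        return (1 + p / 100) * g1 - total
    g2 = ordered[1]  # g_(2), largest among the other units (ties allowed)
    return (1 + p / 100) * g1 + g2 - total


print(risk_rc([100.0, 5.0, 3.0, 2.0], 10) > 0)   # True: dominant contributor, unsafe
print(risk_rc([40.0, 35.0, 30.0, 25.0], 10) > 0)  # False: safe
```

The "otherwise" case is the classical condition $G(\mathcal{A}) - g_{(1)} - g_{(2)} < (p/100)\, g_{(1)}$ rewritten: the second-largest contributor can estimate $g_{(1)}$ to within $p$% when the remaining units add too little.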

Risk measure
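Which area 𝒜 should be used to determine the risk?

Is there a natural area associated with unit 𝑗?

Note:
For any unit 𝑗 we can find an area $\mathcal{A}^*$ such that

$$\frac{|G(\mathcal{A}^*) - g_j|}{g_j} < \frac{p}{100}$$

i.e. an attacker who is free to choose the area can always estimate $g_j$ to within 𝑝%.

Practical approach
Calculate $\mathrm{RC}(\mathcal{A}; p)$ on the lowest level to which zooming is allowed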
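One way to apply this practical approach is to take square grid cells at the finest allowed resolution as the areas and count the cells with $\mathrm{RC} > 0$. The sketch below assumes point data given as hypothetical `(x, y, g)` tuples and uses the non-empty cells as denominator of the share of unsafe cells; both are our choices, since the slides do not specify them.

```python
import math
from collections import defaultdict


def rc(values, p):
    """p%-rule risk RC for the contributions in one cell (RC > 0 means unsafe)."""
    if not values:
        return 0.0
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    # with a single unit the g_(2) term is absent
    second = ordered[1] if len(ordered) > 1 else 0.0
    return (1 + p / 100) * ordered[0] + second - total


def unsafe_fraction(points, cell_size, p):
    """Share of non-empty square grid cells with RC > 0.

    points: iterable of (x, y, g) tuples (hypothetical input format)
    cell_size: side of the grid cells, i.e. the finest zoom level allowed
    """
    cells = defaultdict(list)
    for x, y, g in points:
        # index of the grid cell that contains (x, y)
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(g)
    if not cells:
        return 0.0
    unsafe = sum(1 for values in cells.values() if rc(values, p) > 0)
    return unsafe / len(cells)


points = [(10, 10, 100.0), (20, 30, 5.0),
          (150, 40, 40.0), (160, 60, 35.0), (170, 20, 30.0)]
print(unsafe_fraction(points, 100, 10))  # 0.5
print(unsafe_fraction(points, 200, 10))  # 0.0
```

In this toy example the finer grid isolates the dominant unit together with one small unit, which is unsafe; the coarser grid pools all units into one safe cell.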

Example: risk vs resolution

Mean energy consumption of enterprises

[Figure: percentage of unsafe cells (0% to 100%) versus resolution (250 m, 500 m, 750 m, 1000 m) for 𝑝% criterion 5%, 10% and 15%. Left panel: external attacker; right panel: internal attacker.]

Example

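Kernel density estimator $\hat{g}(\mathbf{r})$ of the spatial distribution, and

$$\hat{G}(\mathcal{A}) = \int_{\mathcal{A}} \hat{g}(\mathbf{r})\, d\mathbf{r}$$

Consequently, it is possible that

$$\hat{G}(\mathcal{A}) < g_{(1)}$$

or

$$\hat{G}(\mathcal{A}) > 0 \quad \text{while} \quad \mathcal{A} \cap \mathcal{U} = \emptyset$$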
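Both effects can be reproduced with a small sketch. The slides do not fix the kernel, so we assume an isotropic Gaussian kernel with bandwidth `h`, i.e. $\hat{g}(\mathbf{r}) = \sum_i g_i K_h(\mathbf{r} - \mathbf{r}_i)$. For a rectangle the integral $\hat{G}(\mathcal{A})$ then factorises per axis into differences of normal CDFs. `kde_total` and the rectangle tuple format are hypothetical.

```python
import math


def _norm_cdf(z):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_total(points, rect, h):
    """Estimated total G_hat(A) for a Gaussian-kernel smoothed distribution.

    points: iterable of (x, y, g) tuples
    rect: (x0, x1, y0, y1), the rectangle A = [x0, x1] x [y0, y1]
    h: kernel bandwidth (standard deviation of the Gaussian kernel)
    """
    x0, x1, y0, y1 = rect
    total = 0.0
    for x, y, g in points:
        # share of unit i's kernel mass that falls inside A, per axis
        px = _norm_cdf((x1 - x) / h) - _norm_cdf((x0 - x) / h)
        py = _norm_cdf((y1 - y) / h) - _norm_cdf((y0 - y) / h)
        total += g * px * py
    return total


points = [(50.0, 50.0, 100.0)]      # one unit with g = 100 in the middle of a cell
cell = (0.0, 100.0, 0.0, 100.0)
empty = (100.0, 200.0, 0.0, 100.0)  # neighbouring cell without units
print(kde_total(points, cell, 50.0) < 100.0)  # True: mass leaks out of the cell
print(kde_total(points, empty, 50.0) > 0.0)   # True: no units, yet a positive total
```

Because each kernel integrates to one over the plane, smoothing only redistributes the total $\sum_i g_i$ over space; a larger bandwidth moves more of it out of the cell that contains the unit.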

Example

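Use ‘estimated largest and second largest’:

$$
\widehat{\mathrm{RC}}(\mathcal{A}; p) =
\begin{cases}
0, & \text{if } \mathcal{A} \cap \mathcal{U} = \emptyset \\
(1 + p/100)\, \hat{g}_{(1)} - \hat{G}(\mathcal{A}), & \text{if } \mathcal{A} \cap \mathcal{U} = \{\mathbf{r}_j\} \\
(1 + p/100)\, \hat{g}_{(1)} + \hat{g}_{(2)} - \hat{G}(\mathcal{A}), & \text{otherwise}
\end{cases}
$$

Alternatively, use

$$
\widehat{\mathrm{RC}}(\mathcal{A}; p) =
\begin{cases}
0, & \text{if } \mathcal{A} \cap \mathcal{U} = \emptyset \\
(p/100)\, g_{(1)} - |\hat{G}(\mathcal{A}) - g_{(1)}|, & \text{if } \mathcal{A} \cap \mathcal{U} = \{\mathbf{r}_j\} \\
(p/100)\, g_{(1)} - |\hat{G}(\mathcal{A}) - g_{(1)} - g_{(2)}|, & \text{otherwise}
\end{cases}
$$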
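A sketch of both smoothed variants follows. How the estimated largest and second largest values $\hat{g}_{(1)}, \hat{g}_{(2)}$ are derived is not specified on the slide, so the first function simply takes them as inputs. The function names and argument lists are hypothetical; $\hat{G}(\mathcal{A})$ would come from a kernel smoother such as a Gaussian KDE.

```python
def rc_hat_estimated(n_units, g1_hat, g2_hat, G_hat, p):
    """First variant: p%-rule with estimated largest and second largest values.

    n_units: number of units located in A
    g1_hat, g2_hat: estimated largest and second largest contributions (given)
    G_hat: smoothed total over A
    """
    if n_units == 0:
        return 0.0  # empty areas are safe, even when G_hat > 0
    if n_units == 1:
        return (1 + p / 100) * g1_hat - G_hat
    return (1 + p / 100) * g1_hat + g2_hat - G_hat


def rc_hat_alternative(values, G_hat, p):
    """Alternative: is the smoothed total within p% of the true contribution(s)?

    values: true contributions of the units located in A
    G_hat: smoothed total over A
    """
    if len(values) == 0:
        return 0.0
    ordered = sorted(values, reverse=True)
    g1 = ordered[0]
    if len(ordered) == 1:
        # G_hat can be read as an estimate of the single unit's value
        return (p / 100) * g1 - abs(G_hat - g1)
    g2 = ordered[1]
    return (p / 100) * g1 - abs(G_hat - g1 - g2)


print(rc_hat_alternative([100.0], 95.0, 10) > 0)  # True: smoothing leaves G_hat close to g
print(rc_hat_alternative([100.0], 46.6, 10) > 0)  # False: enough mass smoothed away
```

The absolute value in the alternative is needed because, unlike $G(\mathcal{A})$, the smoothed total $\hat{G}(\mathcal{A})$ may fall below the contributions of the units in $\mathcal{A}$.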

Example: spatial smoothing

Example: risk vs bandwidth

𝑝%-rule with 𝑝 = 10

[Figure: percentage of unsafe cells (0% to 100%) versus kde bandwidth (0 m to 1500 m) for resolutions 100 m, 200 m, 300 m, 400 m, 500 m and PC6 / zip code. Left panel: external attacker; right panel: internal attacker.]

Discussion

– Smoothing intrinsically provides some disclosure protection.


– Choose the bandwidth dynamically (varying over space) to protect properly?
– The risk measure RC needs to be adjusted when smoothing is used.
– Location risk: how to define the ‘natural area’?
– Other disclosure scenarios?

Discussion

Wherever you go, there you are…


Buckaroo Banzai

Location: identifying or sensitive?
