Presentation PSD 2018

Safely plotting continuous variables on a map
Peter-Paul de Wolf and Edwin de Jonge

September 26-28, 2018
Contents
Introduction
Disclosure risk
Risk measure
Example
Discussion
Introduction
NSIs (national statistical institutes) often publish geographical output:
– Traditionally using tables (counts, magnitudes, …)
– Using administrative regions, defined by e.g.
- property (fire brigade area)
- size of ground area (grid)
- number of units (NUTS)
Introduction
– Aggregated data plotted on map
- e.g. Electricity consumption enterprises
- Electricity consumption private dwellings
Introduction
Plotting a continuous variable on a map
Continuous in value
Not necessarily continuous in location
Running example: mean energy consumption of enterprises

Using administrative regions and zooming
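Some Notation
Population $\mathcal{U} = \{r_1, \dots, r_N\}$ with $r_i = (x_i, y_i)$
Measurements $g_1, \dots, g_N$
Spatial distribution $g(r) : \mathbb{R}^2 \to \mathbb{R}$
Total (energy consumption) in area 𝒜
$$G(\mathcal{A}) = \int_{\mathcal{A}} g(r)\, dr$$
Spatial or unit mean (energy consumption) in area 𝒜
$$\bar{G}(\mathcal{A}) = \frac{G(\mathcal{A})}{\|\mathcal{A}\|} \quad\text{or}\quad \bar{G}(\mathcal{A}) = \frac{G(\mathcal{A})}{\sum_{i} \mathbb{1}(r_i \in \mathcal{A})}$$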
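The notation above can be made concrete with a small sketch. It assumes, as an illustration only, that the spatial distribution places all of a unit's value at its location, so that the integral over an area reduces to a sum over the units inside it. The rectangle representation of an area and the function names are hypothetical, not taken from the slides.

```python
from typing import List, Tuple

Unit = Tuple[float, float, float]          # (x, y, g): location and measurement
Area = Tuple[float, float, float, float]   # (x0, y0, x1, y1): axis-aligned rectangle


def in_area(x: float, y: float, area: Area) -> bool:
    """True if the point lies in the half-open rectangle [x0, x1) x [y0, y1)."""
    x0, y0, x1, y1 = area
    return x0 <= x < x1 and y0 <= y < y1


def total(units: List[Unit], area: Area) -> float:
    """G(A): sum of the measurements of all units located in the area."""
    return sum(g for x, y, g in units if in_area(x, y, area))


def spatial_mean(units: List[Unit], area: Area) -> float:
    """G(A) divided by the surface of the area."""
    x0, y0, x1, y1 = area
    return total(units, area) / ((x1 - x0) * (y1 - y0))


def unit_mean(units: List[Unit], area: Area) -> float:
    """G(A) divided by the number of units in the area (0.0 if it is empty)."""
    n = sum(1 for x, y, _ in units if in_area(x, y, area))
    return total(units, area) / n if n else 0.0


units = [(10.0, 10.0, 100.0), (20.0, 30.0, 50.0), (150.0, 150.0, 7.0)]
cell = (0.0, 0.0, 100.0, 100.0)
print(total(units, cell), spatial_mean(units, cell), unit_mean(units, cell))
```

The two means answer different questions: the spatial mean is consumption per unit of surface, the unit mean is consumption per enterprise in the area.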
Disclosure scenario
– Location of a population unit is highly identifying
– Zooming in can pinpoint the exact location of a population unit
– The spatial distribution describes a sensitive variable
– Spatial characteristics of the population imply identifiability
Attacker scenario
Determine ‘hot-spots’
Zoom in on region of interest
Link value of spatial distribution to individuals
Disclosure scenario
Sub-scenarios:
External attacker
Attacker is not a population unit with location in area of hot-spot and can only use information from the spatial distribution
Internal attacker
Attacker is a population unit with location in area of hot-spot and can additionally use its own information to derive more accurate information about another unit in that area
Risk measure
Risk measure based on 𝑝%-rule as function of table cell
Adjust risk measure to be based on 𝑝%-rule as function of area
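$$\mathrm{RC}(\mathcal{A}; p) =
\begin{cases}
0, & \text{if } \mathcal{A} \cap \mathcal{U} = \emptyset \\
(1 + p/100)\, g_{(1)} - G(\mathcal{A}), & \text{if } \mathcal{A} \cap \mathcal{U} = \{r_{(1)}\} \text{ (a single unit)} \\
(1 + p/100)\, g_{(1)} + g_{(2)} - G(\mathcal{A}), & \text{otherwise}
\end{cases}$$
where
$$g_{(1)} = \max(g_i : r_i \in \mathcal{A}) \quad\text{and}\quad g_{(2)} = \max(g_i : r_i \in \mathcal{A} \text{ and } r_i \neq r_{(1)})$$
with $r_{(1)}$ the location of the unit that attains $g_{(1)}$.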
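A minimal sketch of this area-based risk measure follows. It takes the list of measurements of the units located in an area and again assumes that the total of the area is the sum of those measurements. The function name `rc` is hypothetical. Reading a positive value as "unsafe" follows the usual form of the 𝑝%-rule and is an assumption here, since the slide does not state the sign convention.

```python
from typing import Sequence


def rc(values: Sequence[float], p: float) -> float:
    """Area-based p%-rule risk measure for the units located in one area.

    values: measurements g_i of the units in the area (may be empty).
    p:      percentage used in the p%-rule.
    Assumption: the area total G(A) is the plain sum of the values.
    """
    if not values:
        return 0.0                                # empty area
    ordered = sorted(values, reverse=True)
    g1 = ordered[0]                               # largest contribution
    area_total = sum(ordered)
    if len(ordered) == 1:
        return (1 + p / 100) * g1 - area_total    # single unit in the area
    g2 = ordered[1]                               # second largest contribution
    return (1 + p / 100) * g1 + g2 - area_total


for contributions in ([], [100.0], [100.0, 50.0, 5.0], [100.0, 50.0, 40.0]):
    print(contributions, rc(contributions, p=10))
```

With 𝑝 = 10, a single unit of size 100 gives a value of about 10 (positive), the contributions 100, 50, 5 give about 5 (positive), and 100, 50, 40 give about −30 (negative).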
Risk measure
Which area 𝒜 to be used to determine risk?
Is there a natural area associated with unit 𝑗?
Note:
For any unit 𝑗 we can find an area 𝒜∗ such that
$$\frac{|G(\mathcal{A}^*) - g_j|}{g_j} < \frac{p}{100}$$
(for example, an area small enough to contain only unit 𝑗)
Practical approach
Calculate RC (𝒜; 𝑝) on the lowest level to which zooming is allowed
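The practical approach can be sketched as follows: bin the units into square grid cells at the finest resolution to which zooming is allowed, evaluate the risk measure per cell, and report the share of cells with a positive value. Everything below is an illustrative assumption rather than the authors' implementation: the helper names, the use of square cells, the choice to count only occupied cells in the denominator, and the reading of a positive value as unsafe.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Unit = Tuple[float, float, float]  # (x, y, g)


def rc(values: Sequence[float], p: float) -> float:
    """Area-based p%-rule risk measure; the area total is the sum of values."""
    if not values:
        return 0.0
    ordered = sorted(values, reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0.0
    return (1 + p / 100) * ordered[0] + second - sum(ordered)


def percent_unsafe(units: List[Unit], cell_size: float, p: float) -> float:
    """Percentage of occupied square grid cells with a positive risk value."""
    cells: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for x, y, g in units:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(g)
    if not cells:
        return 0.0
    unsafe = sum(1 for values in cells.values() if rc(values, p) > 0)
    return 100.0 * unsafe / len(cells)


units = [(10.0, 10.0, 100.0), (20.0, 20.0, 50.0),
         (30.0, 30.0, 40.0), (260.0, 10.0, 80.0)]
for size in (25.0, 250.0, 1000.0):
    print(size, percent_unsafe(units, size, p=10))
```

On this toy data the share of unsafe cells drops as the cells get larger, which is the qualitative pattern the resolution example explores.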
Example: risk vs resolution
Mean energy consumption of enterprises
[Figure: two panels, "% unsafe cells, external attacker" and "% unsafe cells, internal attacker". Horizontal axis: resolution (250m to 1000m). Vertical axis: % unsafe cells (0% to 100%). One curve per p% criterion: 5%, 10%, 15%.]
Example
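Kernel density estimator of spatial distribution $\hat{g}(r)$ and
$$\hat{G}(\mathcal{A}) = \int_{\mathcal{A}} \hat{g}(r)\, dr$$
Consequently possible that
$$\hat{G}(\mathcal{A}) < g_{(1)}$$
with $g_{(1)}$ the largest contribution in $\mathcal{A}$, or
$$\hat{G}(\mathcal{A}) > 0 \text{ while } \mathcal{A} \cap \mathcal{U} = \emptyset$$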
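Both effects are easy to reproduce in a toy setting. The sketch below assumes a Gaussian product kernel with a single bandwidth `h`, weighted by each unit's measurement; the slides do not say which kernel or weighting was used, so this choice and the function names are illustrative assumptions. For this kernel the integral over a rectangle has a closed form in terms of the normal distribution function.

```python
import math
from typing import List, Tuple

Unit = Tuple[float, float, float]          # (x, y, g)
Area = Tuple[float, float, float, float]   # (x0, y0, x1, y1)


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_total(units: List[Unit], area: Area, h: float) -> float:
    """Integral over the rectangle of a Gaussian-kernel smoothed distribution.

    Each unit spreads its value g around its location with standard
    deviation h in both directions (assumed kernel, not from the slides).
    """
    x0, y0, x1, y1 = area
    result = 0.0
    for x, y, g in units:
        mass_x = phi((x1 - x) / h) - phi((x0 - x) / h)
        mass_y = phi((y1 - y) / h) - phi((y0 - y) / h)
        result += g * mass_x * mass_y   # share of g that falls in the area
    return result


units = [(50.0, 50.0, 100.0)]          # one unit with value 100
own_cell = (0.0, 0.0, 100.0, 100.0)    # contains the unit
empty_cell = (100.0, 0.0, 200.0, 100.0)  # contains no unit
print(smoothed_total(units, own_cell, h=100.0))    # below the unit's value of 100
print(smoothed_total(units, empty_cell, h=100.0))  # positive despite being empty
```

Part of the unit's value leaks out of its own cell and into neighbouring cells, which is why a risk measure that relies on the true total no longer applies directly.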
Example
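Use ‘estimated largest and second largest’, $\hat{g}_{(1)}$ and $\hat{g}_{(2)}$:
$$\widehat{\mathrm{RC}}(\mathcal{A}; p) =
\begin{cases}
0, & \text{if } \mathcal{A} \cap \mathcal{U} = \emptyset \\
(1 + p/100)\, \hat{g}_{(1)} - \hat{G}(\mathcal{A}), & \text{if } \mathcal{A} \cap \mathcal{U} = \{r_{(1)}\} \text{ (a single unit)} \\
(1 + p/100)\, \hat{g}_{(1)} + \hat{g}_{(2)} - \hat{G}(\mathcal{A}), & \text{otherwise}
\end{cases}$$
Alternatively, use the true largest and second largest, $g_{(1)}$ and $g_{(2)}$:
$$\widehat{\mathrm{RC}}^{*}(\mathcal{A}; p) =
\begin{cases}
0, & \text{if } \mathcal{A} \cap \mathcal{U} = \emptyset \\
(p/100)\, g_{(1)} - |\hat{G}(\mathcal{A}) - g_{(1)}|, & \text{if } \mathcal{A} \cap \mathcal{U} = \{r_{(1)}\} \text{ (a single unit)} \\
(p/100)\, g_{(1)} - |\hat{G}(\mathcal{A}) - g_{(1)} - g_{(2)}|, & \text{otherwise}
\end{cases}$$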
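The alternative measure can be sketched directly, because it only needs the true contributions in an area and the smoothed total of that area. The function name `rc_star` is hypothetical, and reading a positive value as "unsafe" is an assumption based on the usual form of the 𝑝%-rule. The single-unit case is handled by setting the second largest contribution to zero, which gives the same expression as the middle case of the formula.

```python
from typing import Sequence


def rc_star(values: Sequence[float], smoothed_total: float, p: float) -> float:
    """Alternative risk measure that compares a smoothed total to true values.

    values:         true measurements g_i of the units in the area.
    smoothed_total: estimated total of the area after smoothing.
    p:              percentage used in the p%-rule.
    """
    if not values:
        return 0.0                                  # empty area
    ordered = sorted(values, reverse=True)
    g1 = ordered[0]                                 # largest contribution
    g2 = ordered[1] if len(ordered) > 1 else 0.0    # second largest, if any
    # Positive when the smoothed total minus g2 lies within p% of g1.
    return (p / 100) * g1 - abs(smoothed_total - g1 - g2)


print(rc_star([100.0], 100.0, p=10))        # no smoothing: positive
print(rc_star([100.0], 15.0, p=10))         # heavy smoothing: negative
print(rc_star([100.0, 50.0], 152.0, p=10))  # total close to g1 + g2: positive
```

The value turns negative once smoothing moves the estimated total further than 𝑝% of the largest contribution away from the sum of the two largest contributions.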
Example: spatial smoothing
Example: risk vs bandwidth
𝑝%-rule with 𝑝 = 10
[Figure: two panels, "% unsafe cells, external attacker" and "% unsafe cells, internal attacker". Horizontal axis: kde bandwidth (0m to 1500m). Vertical axis: % unsafe cells (0% to 100%). One curve per resolution: 100m, 200m, 300m, 400m, 500m, PC6 / zip code.]
Discussion
– Smoothing intrinsically provides some disclosure protection.

– Choose bandwidth dynamically (spatial) to protect properly?
– Adjustment of the risk measure RC when using smoothing is needed.
– Location risk: how to define ‘natural area’?
– Other disclosure scenarios?
Discussion
Wherever you go, there you are…

Buckaroo Banzai
Location: identifying or sensitive?