
The application of controlled rounding for tabular data with particular reference to the Tau-Argus software


Philip Lowthian, Giovanni Merola, Office for National Statistics
This paper describes the Controlled Rounding Program implemented as a disclosure control
tool in the ONS. Controlled rounding is a method for rounding the cells of a statistical table
that maintains its additive structure. This method is appropriate for the disclosure control of
frequency tables in which low values are considered disclosive; it can be used as an
alternative to suppression. The idea was proposed a long time ago, but the program developed
for the ONS is the first that can actually round large multidimensional tables in a
controlled manner.
In this paper we first give a brief description of the controlled rounding algorithm, then
discuss some of its properties for the protection of disclosive tables. Finally, we describe
how it is embedded in a more general package for the statistical disclosure control of tables,
Tau-Argus, which is distributed by the ONS.

1. Introduction
Rounding techniques involve the replacement of the original data by multiples of a given rounding
base. Rounding (e.g. to the nearest integer) has been used in science for presentational purposes for
many centuries. Random and deterministic rounding have been used as confidentiality protection
tools by national statistics institutes in tabular data for decades (see e.g. Nargundkar and Saveland
[11], Ryan [15], Willenborg and de Waal [16]). Unfortunately, naive rounding frequently destroys
additivity in tables. Controlled rounding has the desirable feature that the rounded tables are
additive, i.e. the values in the marginal cells coincide with those calculated by adding the relevant
interior cells. As a general numerical technique, controlled (or matrix) rounding is not new:
solutions were provided by Bacharach [1] as early as 1966. Controlled rounding was, however,
developed and promoted as a serious technique for official statistics by Cox and coworkers in the
1980s (Cox and Ernst [3], Causey et al. [2], Cox [4]). Further work on computational aspects was
done by other workers (e.g. Kelly, Golden, Assad and Baker [8,9,10]). One difficulty discovered by
Causey et al. [2] was the existence of three-dimensional tables for which classical (zero-restricted)
controlled rounding was impossible. In zero-restricted controlled rounding, cell entries that are
already a multiple of the rounding base are not changed, and other entries can only move to an
adjacent multiple of the rounding base. Fischetti and Salazar-González [6] overcame this problem
for three- and four-dimensional tables by slightly relaxing the zero-restriction. However, until very
recently (Salazar-González [12,13]), controlled rounding was not practical as a standard
technique for confidentiality protection of tables in official statistics, primarily because of the
difficulty of finding a control-rounded version of the sets of linked tables that occur in official
statistics. Salazar-González et al. [14] describe both controlled rounding theory and practice.
Section 2 introduces the concept of controlled rounding methodology with descriptions of both
the zero-restricted and non-restricted approaches. Other rounding methods are also discussed.
Section 3 describes the protection against disclosure which the different forms of rounding provide.
Finally, Section 4 introduces the Tau-Argus program and briefly shows how to carry out controlled
rounding within this package.

2. Controlled Rounding Methodology


A statistical table is a collection of cells classified with respect to the categories of one or more
variables. For example, the cells can be classified by sex and age of the respondents. Typically, the
external cells of a table show marginal totals, that is, sub-totals of the internal cells. Furthermore, in
some cases the categories of a variable can be grouped in wider classes; for example, a
table showing respondents classified by Ward may also give the totals by Local Authority District
(LAD) in which the Wards are nested. In this case the variable is called hierarchical. A simple
example of a frequency table with hierarchies is given in Table 1.
                 Marital status
Geography    Single   Married   Total
00ACFX            6         2       8
00ACFY            7         0       7
00AC             13         2      15
00AWFY            4         5       9
00AWFZ            7         2       9
00AW             11         7      18
Total            24         9      33

Table 1: Frequency table with hierarchical geography: Wards nested in LADs (geographies with 4-character
codes) by marital status. The LAD rows (00AC, 00AW), the Total row and the Total column are sums of
other cell values.

For our purposes it is convenient to represent a table in vector form, i.e. to list the cells (including the
marginals) one after the other in a column vector. Let $a$ be such a vector; then the additive structure of
the table can be represented by the equation
$$Ma = 0,$$
where $M$ is a matrix of coefficients (0, 1 or -1) that describe the additive relationships among the
cells of the table. For example, one row of $M$ will have ones for the elements corresponding to the
internal cells of the first row of the table, minus one for the element corresponding to the total for the
first Ward, and zero for the other elements. This row expresses the fact that the first value in the last
column is the sum of the other elements in the first row. The matrix corresponding to a given table is
uniquely defined, regardless of the values in the cells. That is to say, if another population is classified
by the same variables, the corresponding table must satisfy the same additivity constraints.
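
As an illustration, the constraint matrix for Table 1 can be assembled cell by cell. The following is a minimal Python sketch, not part of the CRP; the helper names (`index`, `constraint`) and the row-by-row ordering of the vector are assumptions made for this example only.

```python
# Table 1 as a grid: one list per geography, columns Single, Married, Total.
ROWS = ["00ACFX", "00ACFY", "00AC", "00AWFY", "00AWFZ", "00AW", "Total"]
COLS = ["Single", "Married", "Total"]
TABLE_1 = [[6, 2, 8], [7, 0, 7], [13, 2, 15],
           [4, 5, 9], [7, 2, 9], [11, 7, 18], [24, 9, 33]]


def index(row, col):
    """Position of a cell in the vector a (cells listed row by row)."""
    return ROWS.index(row) * len(COLS) + COLS.index(col)


def constraint(parts, total):
    """One row of M: +1 for each summed cell, -1 for the cell holding their sum."""
    coeffs = [0] * (len(ROWS) * len(COLS))
    for cell in parts:
        coeffs[index(*cell)] = 1
    coeffs[index(*total)] = -1
    return coeffs


M = []
for r in ROWS:  # Single + Married = Total within every geography
    M.append(constraint([(r, "Single"), (r, "Married")], (r, "Total")))
for c in COLS:  # Wards sum to their LAD, LADs sum to the overall total
    M.append(constraint([("00ACFX", c), ("00ACFY", c)], ("00AC", c)))
    M.append(constraint([("00AWFY", c), ("00AWFZ", c)], ("00AW", c)))
    M.append(constraint([("00AC", c), ("00AW", c)], ("Total", c)))

a = [value for row in TABLE_1 for value in row]
residual = [sum(m * x for m, x in zip(row, a)) for row in M]  # this is M a
print(all(r == 0 for r in residual))
```

The same `M` applies to any table with this layout, whatever its cell values, which is the point made in the paragraph above.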

2.1 Controlled Rounding


The process of controlled rounding consists of finding a table, $y$, whose elements are multiples of
a specified base, $b$, and satisfy the constraints $My = 0$. Developing an algorithm that finds feasible
solutions for this problem is not easy. Even assuming that it can be done, for a given table and a given
base there may be no controlled rounded solution, or there may be more than one.
Furthermore, in SDC practice, data protectors may require that the rounded table satisfies additional
constraints, for example, that a particular cell frequency is larger than a specified value. The
Controlled Rounding Program (CRP, developed for the ONS by Dr. J. J. Salazar-González; see footnote 1)
is based on sophisticated optimization techniques and computes solutions according to the following criteria:
a) each rounded value is a multiple of the base adjacent to the original value;
b) the rounded values, $y_i$, must satisfy given bounds, $lb_{y_i} \le y_i$ and $y_i \le ub_{y_i}$;
c) the rounded table, $y$, satisfies $My = 0$;
d) if more than one solution satisfying a) to c) exists, the solution chosen is the one that
minimizes the distance function $d(a, y) = \sum_i w_i |a_i - y_i|$, where the $w_i$ are given
weights.
By criterion a) the original values can be rounded either up or down to one of the adjacent multiples
of the base, and the values that are already multiples of the base remain unchanged. Criterion b)
allows external information to be incorporated in the computation of the solution. Criterion c) makes
the solution a controlled one. Criterion d) gives an objective for choosing among different solutions.
A rounded solution that respects criterion a) above is called a zero-restricted controlled rounding. Such a
solution that also respects criteria b) and c) cannot always be found. In this case, in CRP criterion a)
can be substituted by:
a') if a solution satisfying criteria a)-c) above cannot be found, criterion a) is relaxed, allowing
one or more rounded values to be multiples of the base up to K steps distant from one of the
adjacent ones.
This criterion, then, allows values to jump to multiples of the base that are not next to them; values
that were already multiples of the base can jump to a multiple of the base that is not farther than K
steps away. A solution satisfying criteria a'), b)-d) is simply called a non-restricted controlled rounding.
Table 2 shows possible outcomes of controlled rounding for different choices of the number of
steps (denoted by K) allowed. We have implicitly assumed that the values are bounded to be
nonnegative.

Footnote 1: Juan José Salazar is Professor of Optimization Methods at the University of La Laguna,
Tenerife, Spain. He is also the author of the optimal suppression routine used for SDC at the ONS.

Base 5
             Zero-restricted   Unrestricted
Original     K=0               K=1            K=2
7            5,10              0,5,10,15      0,5,10,15,20
10           10                5,10,15        0,5,10,15,20
13           10,15             5,10,15,20     0,5,10,15,20,25

Table 2: Possible outcomes of controlled rounding. The first column gives the original values, the subsequent
columns the possible values for zero-restricted solutions and for unrestricted solutions with K equal to 1
and 2, respectively. The solutions are restricted to be nonnegative.
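
The entries of Table 2 follow directly from criteria a) and a'). As a minimal Python sketch (the function name `candidate_values` and its `steps` parameter are hypothetical, chosen for this example), the admissible rounded values of a single cell can be listed as follows:

```python
def candidate_values(value, base, steps=0):
    """Multiples of `base` a nonnegative frequency may be rounded to.

    steps=0 gives the zero-restricted case (criterion a); steps=K > 0 also
    allows multiples up to K steps beyond the adjacent ones (criterion a').
    """
    low = value // base * base                      # adjacent multiple below (or equal)
    high = low if low == value else low + base      # adjacent multiple above (or equal)
    start = max(0, low - steps * base)              # rounded values stay nonnegative
    return list(range(start, high + steps * base + 1, base))


for original in (7, 10, 13):
    print(original, [candidate_values(original, 5, k) for k in (0, 1, 2)])
```

This only describes what each cell is allowed to become; choosing one candidate per cell so that the whole table stays additive is the hard part of the problem.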

Table 3 shows a zero-restricted solution in base 3 for Table 1 above. This solution was obtained
without imposing any constraints on the rounded values other than the non-negativity ones.
                 Marital status
Geography    Single   Married   Total
00ACFX            6         3       9
00ACFY            6         0       6
00AC             12         3      15
00AWFY            3         6       9
00AWFZ            9         0       9
00AW             12         6      18
Total            24         9      33

Table 3: Zero-restricted solution in base 3 for Table 1 above. The values corresponding to the LADs
(geographies with 4-character codes) are the sums of the Wards they contain, and the marginal totals also
respect the additivity constraints.
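
The properties claimed for Table 3 can be checked mechanically. The sketch below is only a verification aid with hypothetical function names; it assumes unit weights ($w_i = 1$) in the distance function of criterion d), which the text does not specify.

```python
BASE = 3
# Rows: 00ACFX, 00ACFY, 00AC, 00AWFY, 00AWFZ, 00AW, Total; columns: Single, Married, Total.
TABLE_1 = [[6, 2, 8], [7, 0, 7], [13, 2, 15], [4, 5, 9], [7, 2, 9], [11, 7, 18], [24, 9, 33]]
TABLE_3 = [[6, 3, 9], [6, 0, 6], [12, 3, 15], [3, 6, 9], [9, 0, 9], [12, 6, 18], [24, 9, 33]]


def is_additive(t):
    """Criterion c): every marginal equals the sum of the cells it covers."""
    rows_ok = all(single + married == total for single, married, total in t)
    cols_ok = all(
        t[0][c] + t[1][c] == t[2][c]        # Wards of 00AC
        and t[3][c] + t[4][c] == t[5][c]    # Wards of 00AW
        and t[2][c] + t[5][c] == t[6][c]    # LADs add up to the overall total
        for c in range(3)
    )
    return rows_ok and cols_ok


def is_zero_restricted(original, rounded, base):
    """Criterion a): each rounded cell is a multiple of base less than base away."""
    pairs = [(a, y) for ro, rr in zip(original, rounded) for a, y in zip(ro, rr)]
    return all(y % base == 0 and abs(y - a) < base for a, y in pairs)


def distance(original, rounded):
    """Distance of criterion d) with all weights assumed equal to 1."""
    return sum(abs(a - y) for ro, rr in zip(original, rounded) for a, y in zip(ro, rr))


print(is_additive(TABLE_3), is_zero_restricted(TABLE_1, TABLE_3, BASE),
      distance(TABLE_1, TABLE_3))
```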

2.2 Other rounding methods

There exist other methods for rounding tables. Conventional and random rounding are also used in
SDC. In these methods, unlike in controlled rounding, cells are rounded independently
and the rounded table is not necessarily additive. We now briefly describe these methods.

Conventional rounding
In conventional rounding values are rounded to the nearest multiple of the base. This method is well
known and we only give an example of it, in Table 4.

Base 5
Original   Rounded value
7          5
8          10
10         10

Table 4: Original values and their conventionally rounded values in base 5.
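
A one-line sketch of conventional rounding for nonnegative integer frequencies is given below. The function name is hypothetical, and the treatment of ties for even bases (rounded up) is an assumption of this sketch; it does not arise for odd bases such as 5.

```python
def conventional_round(value, base):
    """Round a nonnegative integer to the nearest multiple of `base`.

    Adding half the base before truncating moves values in the upper half of
    an interval to the next multiple; for even bases a tie is rounded up.
    """
    return (value + base // 2) // base * base


print([conventional_round(v, 5) for v in (7, 8, 10)])
```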

Random Rounding
In random rounding, values are rounded up or down to one of the nearest multiples of the base
according to given probabilities. One variant of this method, often used in SDC, is unbiased random
rounding. In this method the probabilities are assigned so that the expected rounded value is the
original value. Table 5 shows the probabilities used in unbiased random rounding for some
values to be rounded in base 5.

Original   Rounded value   Probability
5          5               1
6          5               4/5
6          10              1/5
7          5               3/5
7          10              2/5
9          5               1/5
9          10              4/5

Table 5: Possible values and corresponding probabilities for numbers rounded with unbiased random rounding.
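
The probabilities in Table 5 can be derived from the residue of the original value with respect to the base. The following Python sketch uses hypothetical function names and exact fractions so that the unbiasedness condition can be checked without rounding error.

```python
import random
from fractions import Fraction


def unbiased_rounding_distribution(value, base):
    """Map each possible rounded value to its probability."""
    low = value // base * base
    residue = value - low
    if residue == 0:                       # already a multiple: never changed
        return {low: Fraction(1)}
    # The closer the value is to the upper multiple, the likelier it rounds up.
    return {low: Fraction(base - residue, base), low + base: Fraction(residue, base)}


def round_randomly(value, base, rng=random):
    """Draw one rounded value from the unbiased distribution."""
    dist = unbiased_rounding_distribution(value, base)
    outcomes = list(dist)
    weights = [float(p) for p in dist.values()]
    return rng.choices(outcomes, weights=weights)[0]


for original in (5, 6, 7, 9):
    dist = unbiased_rounding_distribution(original, 5)
    expected = sum(rounded * p for rounded, p in dist.items())
    print(original, dist, expected == original)
```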

3. Protection against disclosure provided by rounding


The protection provided by rounding can be measured by the uncertainty about the true values
that it creates. Therefore, we take the width of the interval containing the possible true values for
a given rounded value, called the existence interval, as the measure of protection. We assume that the
rounding base and the number of jumps allowed are released together with the rounded table.
Furthermore, we assume that it is known that the original values are frequencies (hence nonnegative
integers).
We denote by $z$ and $a$ the original and the rounded values, respectively, and by $b$ the
rounding base (in this section $a$ therefore no longer stands for the vector of original cell values
used in Section 2).

Zero-restricted rounding
A user of the data can compute the following existence intervals for the true value, $z$:

$$z \in \begin{cases} [0,\ b-1] & \text{if } a = 0, \\ [a-b+1,\ a+b-1] & \text{if } a > 0. \end{cases}$$

For example, if the rounding base is b=5 and the rounded value is a=0, a user can determine that the
original value is between 0 and 4. If the rounded value is not 0, then users can determine that the
true value is within plus or minus 4 units of the published value. Hence, the width of the
existence interval is

$$\begin{cases} b-1 & \text{if } a = 0, \\ 2(b-1) & \text{if } a > 0. \end{cases}$$

So, if the rounding base is 5, the protection will be 4 units if the rounded value is equal to zero and
8 units otherwise.

Unrestricted controlled rounding


As mentioned before, it is assumed that the number of steps allowed, K, is released together with
the rounded table. Then we can generalise the above formulae to unrestricted controlled rounding:
an intruder can compute the following existence intervals for the true value:

$$z \in \begin{cases} [0,\ a+(K+1)b-1] & \text{if } a \le Kb, \\ [a-(K+1)b+1,\ a+(K+1)b-1] & \text{if } a > Kb. \end{cases}$$

For example, assume that for controlled rounding with b=5 and K=1, a rounded value is a=15; then
a user can determine that $z \in [6, 24]$. From the existence intervals above, it is easy to see that the
width of the existence interval is:

$$\begin{cases} a+(K+1)b-1 & \text{if } a \le Kb, \\ 2[(K+1)b-1] & \text{if } a > Kb. \end{cases}$$
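
Both sets of formulae (the zero-restricted case corresponds to K=0) can be written as a single clipped interval. The sketch below is an illustration with a hypothetical function name, not part of any released software.

```python
def existence_interval(rounded, base, steps=0):
    """Interval of true frequencies compatible with a published rounded value.

    steps=0 is zero-restricted rounding; steps=K is unrestricted rounding
    with at most K extra steps. The lower end is clipped at 0 because the
    original values are known to be nonnegative.
    """
    reach = (steps + 1) * base - 1
    return max(0, rounded - reach), rounded + reach


def protection_width(rounded, base, steps=0):
    low, high = existence_interval(rounded, base, steps)
    return high - low


print(existence_interval(0, 5), existence_interval(5, 5), existence_interval(15, 5, steps=1))
print(protection_width(0, 5), protection_width(5, 5))
```

Clipping at zero reproduces the first case of each formula, since $a - (K+1)b + 1 \le 0$ exactly when the multiple $a$ satisfies $a \le Kb$.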

Conventional rounding
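In conventional rounding users know that the true value has been rounded to the nearest multiple of
the base (see footnote 2). Hence, the existence intervals that can be computed are:

$$z \in \begin{cases} [0,\ \lfloor b/2 \rfloor] & \text{if } a = 0, \\ [a-\lfloor b/2 \rfloor,\ a+\lfloor b/2 \rfloor] & \text{if } a > 0, \end{cases}$$

where the symbol $\lfloor \cdot \rfloor$ denotes the floor value. For example, if the base is b=5 and a=0,
then a user can determine that $z \in [0, 2]$; if a=5, then a user can determine that $z \in [3, 7]$.
The width of the existence interval is therefore

$$\begin{cases} \lfloor b/2 \rfloor & \text{if } a = 0, \\ b-1 & \text{if } a > 0. \end{cases}$$

From the expressions above it is clear that conventional rounding gives less protection than
controlled rounding. Broadly speaking, we could say that it is half of that of controlled rounding.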
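
The "half" comparison can be made concrete with a small brute-force check. This sketch assumes an odd base, as the text does, and uses hypothetical function names; `preimage` simply enumerates the frequencies that conventional rounding maps to a given published value.

```python
def conventional_interval(rounded, base):
    """Existence interval under conventional rounding (odd base assumed)."""
    half = base // 2
    return max(0, rounded - half), rounded + half


def preimage(rounded, base, limit=1000):
    """All frequencies below `limit` that conventional rounding maps to `rounded`."""
    return [z for z in range(limit) if (z + base // 2) // base * base == rounded]


base = 5
for published in (0, 5, 10):
    low, high = conventional_interval(published, base)
    zero_restricted_width = (base - 1) if published == 0 else 2 * (base - 1)
    print(published, preimage(published, base), high - low, zero_restricted_width)
```

For base 5 the conventional widths are 2 and 4, against 4 and 8 for zero-restricted controlled rounding.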

Random Rounding
The protection given by random rounding is the same as that given by zero-restricted controlled
rounding. However, as discussed in the next subsection, the actual protection given by this
method can be lower than the theoretical protection discussed here.
Footnote 2: This definition needs to be made more specific when the base is even; in this case, usually,
values equidistant from two multiples of the base are rounded up. However, for simplicity, for conventional
rounding we assume that the base is odd. The computations for even bases can be easily derived.

Reduction of the protection due to unpicking


As mentioned above, of the three rounding methods presented only controlled rounding gives
additive rounded tables. Non-additive rounded tables are, in general, not well liked by users.
Furthermore, the differences between totals and rounded values can be used by malicious users to
unpick values, reducing the protection of the table. Consider, for example, Table 6, which was
obtained by applying simple random rounding to Table 1. Several marginal cells do not satisfy the
additivity constraints. For instance, the rounded total number of respondents in LAD
00AC, 15, is not equal to the sum of the rounded numbers of respondents in Wards 00ACFX and
00ACFY, which are 6 for both.
Base=3       Single   Married   Total
00ACFX            6         0       6
00ACFY            6         0       6
00AC             15         0      15
00AWFY            6         3       9
00AWFZ            6         3       9
00AW              9         6      18
Total            24         9      33

Table 6: Simple random rounding of Table 1 above. Several marginal cells are not equal to the sum of the
corresponding internal cells.

From the rounded values, 6, it can be inferred that the numbers of respondents in the two Wards are
between 4 and 8, while from the rounded value, 15, of the LAD, it can be inferred that the total number of
respondents in that area is between 13 and 17. However, if in one Ward there were 4 respondents, the
total in the LAD could be at most 12. Hence, in each Ward there must be at least 5 respondents.
Considering the respondents in the LAD, they cannot be 17, because the sum of the respondents in
the Wards can be at most 16. In this way we have managed to reduce the protection of the cells by one
unit. Iterating the procedure over all cells of the table, it might be possible to reduce the
protection further; in some cases it is possible to recover some of the values exactly.
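
The reasoning above is a simple interval-tightening loop. The following sketch reproduces it for one total and its components; the function name `unpick` and its interface are hypothetical, and a real attack would apply the same step to every additive relation in the table.

```python
def unpick(part_intervals, total_interval):
    """Tighten existence intervals using the fact that the parts sum to the total."""
    parts = [tuple(p) for p in part_intervals]
    total = tuple(total_interval)
    while True:
        sum_lo = sum(lo for lo, _ in parts)
        sum_hi = sum(hi for _, hi in parts)
        # The total cannot lie outside the range spanned by the sum of the parts.
        new_total = (max(total[0], sum_lo), min(total[1], sum_hi))
        # Each part must leave room for the others to reach the total.
        new_parts = [
            (max(lo, new_total[0] - (sum_hi - hi)), min(hi, new_total[1] - (sum_lo - lo)))
            for lo, hi in parts
        ]
        if new_total[0] > new_total[1] or any(lo > hi for lo, hi in new_parts):
            raise ValueError("inconsistent intervals")
        if new_total == total and new_parts == parts:
            return parts, total
        parts, total = new_parts, new_total


# Wards 00ACFX and 00ACFY (published 6, base 3) and LAD 00AC (published 15).
wards, lad = unpick([(4, 8), (4, 8)], (13, 17))
print(wards, lad)
```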
One of the advantages of controlled rounding is that it cannot be unpicked because there are no
additive inconsistencies to be exploited. Hence, the actual protection achieved is the same as the
theoretical protection given above, which is not always the case for other types of rounding.

4. Controlled rounding in Tau-Argus


Computational Implementation
We have used the computer program implemented in Salazar-González [13]. It is a branch-and-bound
algorithm, a typical scheme for finding an optimum solution to an integer linear programming
problem (Wolsey [17]). The branching phase consists of fixing $y_i$ to be rounded either down or up to the
nearest multiple of the rounding base. To avoid having to evaluate the consequences of doing this
for all cells, there is a bounding phase of the algorithm. This consists of solving, after each
branching step, the linear programming relaxation of the mathematical problem, i.e. a non-integer
simplification of the original integer program, which provides a lower bound for the objective
function (the distance function of criterion d) in Section 2.1). A further ingredient to speed up the
process is a heuristic phase that provides an upper bound of the objective function at each branching
step. This enables unproductive branches to be discarded (when the lower bound is greater than or
equal to the upper bound), thus saving computing time. It also provides a lower and an upper bound on
the objective function at each iteration, thus allowing an assessment of the quality of the best
solution found so far, which is sometimes very useful if the iterative search is taking a very long
time and the process is aborted.
Another important ingredient of an efficient branch-and-bound algorithm is a pre-processing
stage, in which the size of the program is reduced by removing redundant variables. For example, in
this case, when frequencies are already multiples of the rounding base, they can be removed from
the mathematical model. Redundant equations can also be removed. This algorithm was
implemented in the standard C programming language. Computing the lower bound of the objective
function required the use of a professional linear programming solver, and for this purpose we
used Xpress 14 (Dash Optimization [5]). Specifying the structure My=0 (and the frequencies, a, for
any set of linked tables, and other input parameters) in an appropriate form for transfer to the
controlled rounding routines is not trivial in practice. The complete algorithm has been incorporated
within Tau-Argus (Hundepool [7]), the general purpose disclosure control package for statistical tables
(available from http://neon.vb.cbs.nl/casc/). Tau-Argus has been developed to produce aggregate
tables from input microdata, and to store the necessary metadata to guide various disclosure control
routines, including that for controlled rounding.
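
The branching, bounding and pre-processing ideas described above can be illustrated on Table 1 with a deliberately simplified sketch. It is not the CRP code: the linear programming relaxation is swapped for a much cruder lower bound (the cheapest possible rounding of each interior cell not yet fixed), there is no heuristic phase, unit weights are assumed, only the zero-restricted problem in base 3 is treated, and all names are hypothetical.

```python
BASE = 3
WARDS = [6, 2, 7, 0, 4, 5, 7, 2]  # Table 1 interior cells, Ward by Ward (Single, Married)


def expand(cells):
    """Add Ward totals, the two LAD rows and the overall total row to 8 interior cells."""
    rows = [[cells[i], cells[i + 1], cells[i] + cells[i + 1]] for i in range(0, 8, 2)]
    lad_ac = [x + y for x, y in zip(rows[0], rows[1])]
    lad_aw = [x + y for x, y in zip(rows[2], rows[3])]
    total = [x + y for x, y in zip(lad_ac, lad_aw)]
    return [v for row in rows + [lad_ac, lad_aw, total] for v in row]


ORIGINAL = expand(WARDS)


def options(value):
    """Adjacent multiples of BASE; a single option for cells that need no rounding."""
    low = value // BASE * BASE
    return [low] if low == value else [low, low + BASE]


def branch_and_bound():
    best = {"cost": float("inf"), "table": None}
    cheapest = [min(abs(o - v) for o in options(v)) for v in WARDS]

    def visit(fixed, cost):
        k = len(fixed)
        # Bounding: cost so far plus the cheapest completion of the interior cells.
        if cost + sum(cheapest[k:]) >= best["cost"]:
            return
        if k == len(WARDS):
            table = expand(fixed)
            # Marginals are sums of multiples; they must also stay adjacent.
            if all(abs(y - a) < BASE for y, a in zip(table, ORIGINAL)):
                total_cost = sum(abs(y - a) for y, a in zip(table, ORIGINAL))
                if total_cost < best["cost"]:
                    best["cost"], best["table"] = total_cost, table
            return
        for option in options(WARDS[k]):  # branching: round this cell down or up
            visit(fixed + [option], cost + abs(option - WARDS[k]))

    visit([], 0)
    return best["cost"], best["table"]


cost, table = branch_and_bound()
print(cost, table)
```

Because only interior cells are branched on and the marginals are recomputed from them, additivity holds by construction; the feasibility test only has to confirm that each marginal remains adjacent to its original value.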
Using Tau-Argus for controlled rounding
The user can import a table or create a table within Tau-Argus. Once the table has been created it
can be viewed by the user. In the frequency table example shown in Table 7, a minimum
frequency of three was defined; the cell failing this rule is highlighted.

Table 7: The Tau-Argus view table window prior to rounding.

By choosing the rounding option the following window is obtained.



Table 8: The options available after controlled rounding has been chosen.

Once the OK button is clicked the table is rounded, as can be seen in Table 9.

Table 9: The rounded table.

5. Conclusions
Controlled rounding as an SDC tool presents several advantages, the main ones being:
1) it offers the best real protection of the data among the existing rounding methods;
2) cell values add up to the marginal totals;
3) it can be applied to hierarchical tables, meaning that tables at low geographical levels can
also be protected;
4) the rounded table is as near as possible to the original table.
For the above reasons, the ONS decided to offer CRP as one of its SDC tools. It is embedded in a
wider-scope package for SDC, Tau-Argus.

References
1. Bacharach, M.: Matrix Rounding Problem. Management Science 9 (1966) 732-742
2. Causey, B.D., Cox, L.H., Ernst, L.R.: Applications of Transportation Theory to Statistical Problems. Journal of the American
Statistical Association 80 (1985) 903-909
3. Cox, L.H., Ernst, L.R.: Controlled Rounding. INFOR 20 (1982) 423-432
4. Cox, L.H.: A Constructive Procedure for Unbiased Controlled Rounding. Journal of the American Statistical Association 82
(1987) 520-524
5. Dash Optimization (2003). See web site http://www.dashoptimization.com
6. Fischetti, M., Salazar, J.J.: Computational Experience with the Controlled Rounding Problem in Statistical Disclosure Control.
Journal of Official Statistics 14/4 (1998) 553-565
7. Hundepool, A.: The CASC Project. In: Domingo-Ferrer, J. (ed.): Inference Control in Statistical Databases: From Theory to
Practice. Lecture Notes in Computer Science, Vol. 2316. Springer-Verlag (2002)
8. Kelly, J.P., Golden, B.L., Assad, A.A.: Using Simulated Annealing to Solve Controlled Rounding Problems. ORSA Journal on
Computing 2 (1990) 174-185
9. Kelly, J.P., Golden, B.L., Assad, A.A., Baker, E.K.: Controlled Rounding of Tabular Data. Operations Research 38 (1990) 760-772
10. Kelly, J.P., Golden, B.L., Assad, A.A.: Large-Scale Controlled Rounding Using TABU Search with Strategic Oscillation.
Annals of Operations Research 41 (1993) 69-84
11. Nargundkar, M.S., Saveland, W.: Random Rounding: A Means of Preventing Disclosure of Information About Individual
Respondents in Aggregate Data. A.S.A. Annual Meeting Proceedings of the Social Statistics Section (1972) 382-385
12. Salazar-González, J.J.: A Unified Framework for Different Methodologies in Statistical Disclosure Protection. Technical paper,
University of La Laguna, Tenerife, Spain (2002)
13. Salazar-González, J.J.: Controlled Rounding and Cell Perturbation: Statistical Disclosure Limitation Methods for Tabular Data.
Technical paper, University of La Laguna, Tenerife, Spain (2002)
14. Salazar-González, J.J., Young, C., Lowthian, P., Merola, G., Bond, S., Brown, D.: Getting the Best Results in Controlled
Rounding with the Least Effort. Proceedings of the CASC Project Final Conference, Barcelona. Springer-Verlag (2004)
15. Ryan, M.P.: Random Rounding and Chi-Squared Analysis. The New Zealand Statistician 16 (1981) 16-25
16. Willenborg, L.C.R.J., de Waal, T.: Elements of Statistical Disclosure Control. Lecture Notes in Statistics, Vol. 155.
Springer-Verlag (2001)
17. Wolsey, L.A.: Integer Programming. Wiley-Interscience (1998)
