Professional Documents
Culture Documents
Data Security PDF
Data Security PDF
The rising abuse of computers and increasing threat to personal privacy through data banks
have stimulated much interest m the techmcal safeguards for data. There are four kinds of
safeguards, each related to but distract from the others. Access controls regulate which
users may enter the system and subsequently whmh data sets an active user may read or
wrote. Flow controls regulate the dissemination of values among the data sets accessible to
a user. Inference controls protect statistical databases by preventing questioners from
deducing confidential information by posing carefully designed sequences of statistical
queries and correlating the responses. Statlstmal data banks are much less secure than most
people beheve. Data encryption attempts to prevent unauthorized disclosure of confidential
information in transit or m storage. This paper describes the general nature of controls of
each type, the kinds of problems they can and cannot solve, and their inherent limitations
and weaknesses. The paper is intended for a general audience with little background in the
area.
Keywords and Phrases" security, data security, protection, access controls, information
flow, confidentiality, statistical database, statistical inference, cryptography, capabilities,
public-key encryption, Data Encryptmn Standard, security classes
CR Categortes. 4.35, 4.33, 1.3
Permission to copy without fee all or part of this material is granted provided t h a t the copies are not made or
dmtributed for dtrect commercial advantage, the ACM copyright notice and the title of the publication and its
date appear, and notice is given that copymg is by permission of the Association for Computing Machinery. To
copy otherwise, or to republish, requires a fee and/or specific permission.
1979 ACM 0010-4892/79/0900-0227 $00.75
i FILE X
READ X I FILE X
SMITH
FILE Y
FILE Z
~COPY~TO~I F,LE
i
JONES
JONES
FIGURE la. A C C E S S . Jones can be permitted to read FIGURE lb. F L O W . D e n i e d access to file Y, S m i t h
file Y and write m fileX; he has no access to fde Z. gets confederate Jones to make a copy; flow controls
could prevent this.
FIGURE lC. INFERENCE. A questioner used prior FIGURE ld, ENCRYPTION. Jones illicitly obtains a
knowledge to deduce confidential information from copy of file X; but Its encrypted contents a r e mean-
a statmtical summary; inference controls could pre- ingless to him.
vent this.
0
O
O f o
0
o (HOW MANY BROWN- 0
0
O | EYED, F[VE-FOOT- o
| BLONDES HAVE o
o
Oo L.H,ODENMOLES? oo
0
READ X
MEDICAL
INFORMATION
FILE FILE X
;
~ Iol - 16oo I
, i |
K Iol
'
|
B i L
, i
R I 3 t - - B+L-
I0 C
"1
. 7 t
"")";" Rw .:! K
CAPABILITY LISTS
ments by displacements (line numbers). move segments between main and second-
The command "Read D" refers to the D t h ary memory merely by updating the de-
line of the segment--that is, memory ad- scriptors. Because the keys do not change,
dress B + D. A reference is valid only if the no program is affected by relocations of
displacement is in range--that is, if 0 _ D segments in the memory hierarchy.
<:L. Figure 4 shows the final step in the
Figure 3 shows that descriptors of all scheme: the capability lists are associated
memory segments may be kept in a descrip- with programs. The access code of a capa-
tor table. Each descriptor is identified by a bility specifies one or more kinds of access:
unique key K. Now a program refers to a read (R), write (W), execute (E), or call (C).
segment by specifying the displacement Read access permits a program to copy a
and the key--thus "Read K, D" refers to value out of a memory segment; write ac-
the D t h word in the segment whose key is cess permits a program to store a value in
K. Each descriptor is stored with a "pres- a memory segment; execute access permits
ence bit" that tells whether the associated a processor to fetch instructions from a
segment is in the main store; if it is not, as segment; and call access permits a program
for key 7 in Figure 3, an attempted refer- to execute a procedure call instruction with
ence will trigger a "missing segment fault" that segment as the target. Execute access
that will cause the operating system to sus- is valuable for restricting access to privi-
pend the program and load the segment. leged operations. A program actually refers
This scheme confers considerable flexibility to a segment by a segment number S and
because the operating system can freely a displacement D. Thus "Read S, D " refers
Jones
Cox e-- - -
Smith
i,5ol ! _l_q
Name Code
Jones RW
Smith R
DISK
FILE
AUTHORIZATION
STORE
LIST
makes the ideal of a separate capability list supposedly correct and trustworthy super-
for every program very difficult to achieve. visor programs can manipulate capabili-
Radical changes in computer architecture, ties and segments without restriction
designed to bridge the "semantic gap" be- [WILK68]. This difficulty is ameliorated
tween concepts in the programming lan- somewhat in MULTICS, which has a linear
guage and concepts in the machine lan- hierarchy of supervisor states, called rings,
guage, may be needed to overcome this that confer successively greater privilege;
difficulty. Myers's S W A R D machine the user can operate some untrusted
[MYER78] and Gehringer's "typed mem- subprograms in rings of low privilege
ory" [GEHR79] point in the right direction. [SCHR72]. Contrary to the principle of least
Another serious problem with existing privilege, systems based on supervisor
systems is excessive privilege vested in the states permit programs in higher rings to
operating system. A supervisor mode of run with much more privilege than they
operation takes over when the user's pro- require for their tasks. There is no efficient
gram calls any operating system program. way for two cooperating subprograms to
The supervisor mode overrides all or most have nonoverlapping sets of privileges.
of the storage protection mechanisms. The The supervisor mode problem is an in-
C o m n . t i n t ~ , ~ . t ~ v ~ V n | I | N o 3 S e n t e m b e r 1979
Data Security 239
Q: How many patients have these char- TABLE 1. POLITICAL CONTRIBUTIONDATABASE
acteristics:
Name Sex Occupation Contribution
Male ($)
Age 45-50 Abel M Journalist 3000
Married Baker M Journalist 500
Two children Charles M Entrepreneur 1
Harvard law degree Darwin F Journalist 5000
Engel F Scientist 1000
Bank vice-president Fenwick M Scientist 20000
Took drugs for depression Gary F Doctor 2000
Hart M Lawyer 10000
This query will respond w i t h " 1" if Fenwick
has taken drugs for depression and "0"
otherwise.
The principle of this compromise is sim- B) such that queries for the formulas A and
ple. The questioner finds a formula C whose (A AND NOT B) are both answerable.
query set count is 1. He can then discover Schl6rer called the formula T ffi (A AND
whether the individual thus isolated has NOT B) the tracker o f / . Table 1 shows a
any other characteristic X by asking, "How database recording n ffi 8 secret political
many individuals satisfy C AND X?." (The contributions. Suppose that k ffi 2; then
response ' T ' indicates that X is character- responses are given only to queries applying
istic of the individual and "0" indicates to at least two but not more than six indi-
not.) This attack was first reported by Hoff- viduals. Suppose further that the ques-
man and Miller [HOFF70]. tioner believes that C ffi (JOURNALIST
It might seem that this compromise could AND FEMALE) uniquely identifies Dar-
be prevented by a m i n i m u m q u e r y set con- win. The minimum query set control would
trol: prevent direct questioning about Darwin.
The dialogue below shows how Darwin's
Do not respond to queries for which there contribution can be deduced by using as
are fewer than k or more than n - k tracker the formula T ffi (JOURNALIST
records in the query set, where n is the AND NOT FEMALE) ffi (JOURNALIST
total number of records in the database. AND MALE).
The positive integer k in this control is a Q: How many persons are JOURNAL-
design parameter specifying the smallest IST?
allowable size of a query set. If the query A: 3
language permits complementation, a max- Q: How many persons are JOURNAL-
imum size n - k of the query set must also IST AND MALE?
be enforced, for otherwise the questioner A: 2
could pose his queries relative to the com-
plement (NOT C) of the desired character- By subtracting the second response from
istics (C). the first, the questioner verifies that
Unfortunately, this control is ineffective. (JOURNALIST AND FEMALE) identifies
Schlorer showed that compromises may be only one individual (Darwin). The ques-
possible even for k near n / 2 by the tech- tioner continues with
nique of a "tracker" [ScHL75, SCHL79]. The Q: What was the total of contributions
basic idea is to pad small query sets with by all persons who are JOURNAL-
enough extra records to make them an- IST?
swerable, then subtract out the effect of the A: $8500
extra records. Schl6rer called the formula Q: What was the total of contributions
identifying the extra records the t r a c k e r by all persons who are JOURNAL-
because the questioner can use it to "track IST AND MALE?
down" additional characteristics of an in- A: $3500
dividual.
Suppose that a questioner, who knows Since Darwin is the only female journalist,
from external sources that an individual I her contribution is the difference between
is characterized by the logical formula C, is the response of the second query and the
able to express C in the form C ffi (A AND response of the first ($5000).
-1'
Encipher
Decipher
Decipher
Encipher
~ M
Secure Channel
FIGURE 6. Encryptmn using one key (traditional view).
by computer time or by dollars spent, would guard for data stored in, or in transit
be required to compromise a particular sys- through, media whose security cannot be
tem. This would enable designers either to guaranteed by these other controls. With
raise the cost of compromise beyond a spe- the help of a secret key (K) the sensitive
cific threshold, or to declare that the de- plaintext (M) is scrambled into unintelligi-
sired level of security could not be imple- ble ciphertext (M K) before being put in the
mented with the given cost constraints. insecure medium.
The query functions of many commercial In a traditional cryptosystem, illustrated
database systems seem to release much in Figure 6, there is a slow-speed secure
more information about confidential rec- channel by which the sender can inform the
ords than many people have suspected. receiver of the key K used to encode the
Without proper controls, databases subject message. The message itself, transmitted at
to simple compromises will be the rule high speed through the insecure medium, is
rather than the exception. When combined secure as long as the key is secret. Simmons
with minimum query set restrictions, ran- calls this symmetric encryption because the
dom sample queries appear to offer the best same key is used at both ends of the channel
defense. [SIMM79]. The code is broken if an intruder
A final defense against snooping is threat can deduce the key by analyzing the ciph-
monitoring--inspection of logs or audit ertexts. Keys are changed regularly, usually
trails for unusual patterns of queries, espe- more often than the time in which the
cially many queries for the same records cleverest intruder is believed capable of
[HoFF70]. Although it does not attempt to locating the key systematically.
control the flow of information through The code will be unbreakable if the key
query programs, monitoring threatens ex- is a sequence of random bits as long as the
posure of illegal activity. message (pseudorandom generation will
not do!); each key bit specifies whether the
corresponding message bit is comple-
4. CRYPTOGRAPHIC CONTROLS
mented or not. With such a key, called a
Access, flow, and inference controls may one-time pad, each bit of ciphertext is
not prevent accidental or malicious disclo- purely random and uncorrelated with all
sures of sensitive data. None of these con- other bits of ciphertext. Practical crypto-
trols helps if an operator leaves a listing of systems are based on keys that are much
the password file on a desk, if confidential shorter than the messages; because the in-
data are moved off-line during backup or truder may know the enciphering and de-
maintenance, if transmissions are tapped or ciphering algorithms, the security of these
played back, or if hardware and software cryptosystems depends on the secrecy of
are faulty. Encryption is a common safe- the keys and the computational difficulty
List of
Secret Keys
A sA
B Se
AI
A to KG" ( A , ( I , B) SA)
KG to A: (I, K, (K,A)SS) SA
A to B: (K, A) Ss
FIGURE 7. Protocol for allocating A and B a common key K for exchanging messages.
Insecure Chonnel
s~
Encipher Decipher M
Decgpher Encipher
SA
t
COMPUTER A COMPUTER B
FIGURE 8. Public-key encryption.