J. R. Statist. Soc. A (1995)
158, Part 1, pp. 21-53
Inference in Forensic Identification
By DAVID J. BALDING and PETER DONNELLY
Queen Mary and Westfield College, London, UK
SUMMARY
The problem of quantifying the weight of evidence in forensic identification is addressed. The essence of the problem is abstracted in a simple paradigm, the analysis of which yields valuable insights and highlights important distinctions. A special case of this analysis gives a resolution of the so-called island problem. The paradigm is extended to assess the effects of several features which may be important in practical situations, such as possible alibis, heterogeneous populations of potential suspects and informative protocols for finding suspects. The analyses of the paper clarify several issues pertaining to the weight of evidence associated with matching deoxyribonucleic acid (DNA) profiles and raise some new concerns. In addition, established concerns regarding the incorrect interpretation of probabilistic evidence by juries are discussed in the DNA profile context.
Keywords: BAYES RULE; DEOXYRIBONUCLEIC ACID PROFILES; FORENSIC IDENTIFICATION; ISLAND PROBLEM; WEIGHT OF EVIDENCE
1. INTRODUCTION
A general problem in forensic identification arises when a suspect is observed to have a particular rare trait, or combination of traits, also known to be possessed by the criminal. The problem is to quantify the evidential strength, for the suspect's guilt, of such an observation. For example, the trait might be a physical characteristic described by an eye-witness, a deoxyribonucleic acid (DNA) profile or perhaps a weapon of the type used in the crime. An infamous case focusing on such evidence is People versus Collins (California Reporter, 1968). Here, a robbery was committed by a black man with a moustache and a beard and a blonde woman with a pony-tail, and the getaway car was yellow with an off-white top. A couple were later charged with the crime, essentially the only evidence being that they and their car fitted this description. In the course of the trial and subsequent appeal, several distinct statistical analyses of the weight which should be given to the evidence were presented.
In Sections 2 and 3 we consider these issues in the context of a simplified paradigm. The original version of this, the so-called 'island problem', has been the subject of several different analyses, with distinct solutions. We describe and resolve a formulation of the problem which is more general and more realistic than that usually considered. Along with additional insights, the general formulation raises some important practical issues which seem not to have previously been appreciated.
The first important practical consideration, introduced in Section 4, concerns positive correlations in trait possession. The effect of any such correlations is to reduce the
2.1. Method 1
Yellin (1979) noted that the criminal is known to have T and the number X of
$$
P(G) = E\left(\frac{1}{1+X}\right) = \sum_{x=0}^{N} \frac{1}{1+x}\binom{N}{x} p^{x}(1-p)^{N-x} = \frac{1-(1-p)^{N+1}}{(N+1)p}. \qquad (1)
$$
2.2. Method 2
Kingston (1965) and Cullison (1969) argued that initially the total number Z of T-bearers, innocent or not, has the binomial distribution with parameters N + 1 and p. After noting that the criminal has T, we know that the total number of T-bearers is non-zero and hence
2.3. Method 3
Lindley (see Eggleston (1983), p. 244) proposed in correspondence the use of Bayes's rule. The likelihood that the suspect and criminal both have T is p under hypothesis G and p^2 under G^c, 'not G'. The prior probabilities of G and G^c are respectively 1/(N + 1) and N/(N + 1). Using the odds form of the Bayes rule, the posterior odds ratio against G is
$$
O(G) = \frac{p^{2}}{p} \times \frac{N/(N+1)}{1/(N+1)} = Np, \qquad (3)
$$
and hence
$$
P(G) = \frac{1}{1+Np}. \qquad (4)
$$
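As an informal check on the Bayes calculation, the following sketch simulates the original island problem and compares the empirical frequency of guilt, among trials in which the suspect is observed to have T, with 1/(1 + Np). The function names, the seed and the values N = 10 and p = 0.1 are illustrative assumptions rather than anything taken from the paper. The simulation also assumes that the suspect is drawn uniformly and independently of the criminal's identity.

```python
import random


def simulate_posterior_guilt(n_others, p, trials, seed=0):
    """Estimate P(G | suspect has T) in the original island problem.

    Assumed setup (illustrative): n_others + 1 islanders, the criminal is
    uniformly distributed and has T, every other islander has T independently
    with probability p, and the suspect is drawn uniformly and independently
    of the criminal's identity.
    """
    rng = random.Random(seed)
    size = n_others + 1
    matches = 0  # trials in which the suspect is observed to have T
    guilty = 0   # of those, trials in which the suspect is the criminal
    for _ in range(trials):
        criminal = rng.randrange(size)
        suspect = rng.randrange(size)
        if suspect == criminal:
            has_trait = True               # the criminal is known to have T
        else:
            has_trait = rng.random() < p   # an innocent islander has T w.p. p
        if has_trait:
            matches += 1
            guilty += suspect == criminal
    return guilty / matches


def bayes_posterior_guilt(n_others, p):
    """Posterior probability of guilt 1 / (1 + N p), as in equation (4)."""
    return 1.0 / (1.0 + n_others * p)


if __name__ == "__main__":
    # Hypothetical values chosen so that matches are frequent enough to count.
    n_others, p = 10, 0.1
    print(simulate_posterior_guilt(n_others, p, trials=200_000))
    print(bayes_posterior_guilt(n_others, p))
```

With enough trials the two printed values should agree to within sampling error. The simulation conditions only on the suspect having T, which is the event on which the Bayes argument conditions.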
Eggleston gave a numerical example in which N = 100 and p = 0.004. In this case the three formulae for P(G) give 0.826, 0.903 and 0.714 respectively. When N is large,
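The three approaches can be evaluated side by side for Eggleston's values. The sketch below is only illustrative: the function names are invented here, and `method2` encodes an assumed reading of Method 2 as the expectation of 1/Z given Z > 0, with Z binomial with parameters N + 1 and p, since the corresponding formula is not reproduced above. A floating-point evaluation should land within a few thousandths of the quoted figures.

```python
from math import comb


def method1(n, p):
    """E[1/(1+X)] with X ~ Binomial(n, p), in the closed form of equation (1)."""
    return (1 - (1 - p) ** (n + 1)) / ((n + 1) * p)


def method1_by_sum(n, p):
    """The same expectation summed term by term, as a cross-check."""
    return sum(
        comb(n, x) * p**x * (1 - p) ** (n - x) / (1 + x) for x in range(n + 1)
    )


def method2(n, p):
    """Assumed reading of Method 2: E[1/Z | Z > 0], Z ~ Binomial(n + 1, p)."""
    m = n + 1
    numerator = sum(
        comb(m, z) * p**z * (1 - p) ** (m - z) / z for z in range(1, m + 1)
    )
    return numerator / (1 - (1 - p) ** m)


def method3(n, p):
    """Bayes's rule with a uniform prior: 1 / (1 + N p), equation (4)."""
    return 1 / (1 + n * p)


if __name__ == "__main__":
    n, p = 100, 0.004  # Eggleston's numerical example
    for name, f in [("Method 1", method1), ("Method 2", method2), ("Method 3", method3)]:
        print(f"{name}: {f(n, p):.3f}")
```

For these values Method 3 gives the smallest probability of guilt and Method 2 the largest.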
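3.1. Non-sequential Case
Let S and C denote the random variables whose values are the labels (or 'names') of the suspect and the criminal respectively and let s denote the observed value of S. The posterior odds against G are
$$
\frac{P(G^{c} \mid x_C = x_s = 1, S = s)}{P(G \mid x_C = x_s = 1, S = s)}
= \frac{P(x_C = x_s = 1, S = s \mid C \neq s)\,P(C \neq s)}{P(x_C = x_s = 1, S = s \mid C = s)\,P(C = s)}
= \frac{P(x_C = x_s = 1 \mid C \neq s, S = s)\,P(C \neq s \mid S = s)}{P(x_s = 1 \mid C = S = s)\,P(C = s \mid S = s)}. \qquad (5)
$$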
Equation (5) is very general. In particular, it does not require that the number of possible perpetrators N be known. It also applies in settings in which there is uncertainty about whether or not the criminal has T. Note that there is an implicit conditioning on other evidence in the probabilities in equation (5) and equations which follow from it.
We now make two simplifying assumptions.
Assumption 1. The probability that any particular individual is the criminal is unchanged by knowledge of the label of the suspect: P(C = i | S = s) = P(C = i) for i = 0, 1, ..., N.
Assumption 2. The T-status of all the individuals is independent of the identities of the suspect and the criminal: (x_0, x_1, ..., x_N) is independent of (S, C).
These assumptions are implicit in the original island problem. Assumption 1, with i = s, implies that the court's belief in the guilt of a particular individual is not changed by the fact that the individual has been identified as the suspect. A disinterested observer may give inferential weight to this fact, e.g. on the basis of confidence in the law enforcement authorities. As Dawid (1994) argues, it is inappropriate for a court to modify its belief in the guilt of a particular individual simply because that individual has been brought to trial. In addition, assumption 1 with i ≠ s requires that the identity of the suspect does not provide information about the guilt or innocence of other individuals. This will be reasonable in many cases. In Section 5.5 we describe situations in which it may be inappropriate.
Assumption 2 implies that the knowledge that a particular pair are suspect and criminal does not affect the distribution of the T-status in the population. This will not hold for certain types of trait, as discussed in Section 5.2. Its validity will also depend on the background knowledge available to the court. For example, search protocols based on a characteristic correlated with T may invalidate this assumption; see Section 5.5.
In view of assumptions 1 and 2, we shall in the remainder of this subsection suppress the explicit conditioning on the event [S = s]. It now follows from equation (5) that
We refer to the P(C = i) as 'prior' probabilities of guilt, but note that they are
or, equivalently,
$$
P(G \mid x_C = x_s = 1) = \frac{1}{1 + \sum_{i \neq s} P(x_i = 1 \mid x_s = 1)}.
$$
3.2. Sequential Case
Now suppose that a search has been conducted among individuals in the population until one, the suspect, is found to have T. Suppose that the labels, in order, of the individuals searched are i_0, i_1, ..., i_k. The order of the search may be fixed or random. We introduce the notation
The posterior odds against G after the search are
In the original island problem equation (13) becomes
$$
O(G) = (N - k)p.
$$
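The effect of a search in the original island problem can be illustrated by simulation. In the sketch below, k is a name assumed here for the number of individuals searched and found not to have T before the suspect is reached. Those k individuals are excluded, so N - k alternative perpetrators remain and the odds against guilt are (N - k)p under the island assumptions. The function names, the seed and the values N = 10 and p = 0.1 are hypothetical.

```python
import random
from collections import Counter


def simulate_sequential_search(n_others, p, trials, seed=0):
    """Estimate P(G | k non-bearers were searched before the suspect).

    Assumed setup (illustrative): the criminal has T and occupies a uniformly
    random position in the search order; every other islander has T
    independently with probability p; the search stops at the first T-bearer.
    """
    rng = random.Random(seed)
    size = n_others + 1
    found = Counter()   # found[k]: searches that stopped after k non-bearers
    guilty = Counter()  # guilty[k]: of those, searches that stopped at the criminal
    for _ in range(trials):
        criminal_pos = rng.randrange(size)
        for k in range(size):
            if k == criminal_pos:      # the criminal always has T
                found[k] += 1
                guilty[k] += 1
                break
            if rng.random() < p:       # an innocent T-bearer ends the search
                found[k] += 1
                break
    return {k: guilty[k] / found[k] for k in found}


def sequential_posterior_guilt(n_others, k, p):
    """1 / (1 + (N - k) p): posterior guilt after k exclusions."""
    return 1.0 / (1.0 + (n_others - k) * p)


if __name__ == "__main__":
    estimates = simulate_sequential_search(10, 0.1, trials=200_000)
    for k in range(4):
        print(k, round(estimates[k], 3), round(sequential_posterior_guilt(10, k, 0.1), 3))
```

Each printed row pairs the simulated frequency with the closed form. The posterior probability of guilt rises with k because every excluded individual removes one alternative perpetrator.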
$$
P(x_{i_1} = 1 \mid x_s = 1) = P(x_{i_2} = 1 \mid x_s = 1), \qquad i_1, i_2 \in S_j,\; j = 1, 2, \ldots, K. \qquad (22)
$$
We shall write x^(j) for a generic x_i, i ∈ S_j, so that, for example, P(x^(j) = 1 | x_s = 1) denotes the common value of the conditional probability, given x_s = 1, of T-possession by individual i ∈ S_j:
$$
P(x^{(j)} = 1 \mid x_s = 1) \equiv P(x_i = 1 \mid x_s = 1), \qquad i \in S_j.
$$
From equation (8) it then follows that
$$
O(G) = \sum_{j=1}^{K} N_j\, P(x^{(j)} = 1 \mid x_s = 1), \qquad (23)
$$
in which N_j = |S_j|.
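Equation (23) is a weighted sum over subpopulations, which the following sketch evaluates. The group sizes and conditional match probabilities are hypothetical values chosen only for illustration, and the function names are assumptions. The conversion from odds to probability uses P(G) = 1/(1 + O(G)).

```python
from math import fsum


def odds_against_guilt(groups):
    """Sum of N_j * P(x^(j) = 1 | x_s = 1) over subpopulations, as in (23).

    `groups` is an iterable of (N_j, conditional match probability) pairs.
    """
    return fsum(size * match_prob for size, match_prob in groups)


def posterior_guilt(odds_against):
    """Convert posterior odds against G into the posterior probability of G."""
    return 1.0 / (1.0 + odds_against)


if __name__ == "__main__":
    # Hypothetical population: a few close relatives of the suspect, a larger
    # group sharing the suspect's subpopulation, and everyone else.
    groups = [(2, 0.25), (20, 0.01), (978, 0.0001)]
    odds = odds_against_guilt(groups)
    print(odds, posterior_guilt(odds))
```

In this made-up example the two relatives contribute more to the odds against guilt than the 978 unrelated individuals together. A single group of size N with common match probability p recovers the island value Np.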
Let θ denote the probability that an individual chosen uniformly randomly in the population (excluding the suspect) has T. An alternative analysis of equation (8) gives
have been such that physical strength would have facilitated its commission. In the extreme case, T may be essential to commit the crime and hence the criminal must have it, e.g. if T is 'had access to a weapon of the type used to commit the crime'. We shall use the term necessary to refer to traits such that P(x_i = 1 | C = i) = 1; otherwise the trait is contingent.
If assumption 2 is violated in this way then the argument leading to equation (8) is invalid. In some cases equation (5) may still be simplified. An example involving a necessary trait will be discussed in Section 5.4.
5.4. Possible Alibis
Now consider the situation in which individuals may be able to prove that they did not visit the scene of the crime at the time that it was committed. We shall use the term alibi to denote any valid proof of innocence other than non-possession of T. Note that this definition differs slightly from common usage. In this sense, non-possession of an alibi is a necessary trait: the criminal cannot have a valid proof of innocence. We introduce additional indicator random variables, ξ_i, i = 0, 1, ..., N, where ξ_i = 1 if the ith individual does not have an alibi, otherwise ξ_i = 0. Note that possession of T and non-possession of an alibi can be regarded as possession of a single compound trait and equation (5) gives
$$
O(G) = \frac{P(x_C = \xi_C = x_s = \xi_s = 1 \mid C \neq s, S = s)\,P(C \neq s \mid S = s)}{P(x_s = \xi_s = 1 \mid C = S = s)\,P(C = s \mid S = s)}.
$$
If assumption 1 and the uniform prior remain valid then
Under the additional assumptions that T-possession is independent of (S, C), the alibi status of any individual other than the criminal is independent of (S, C) and alibi status is conditionally independent of T-possession given (S, C), this becomes
$$
O(G) = \sum_{i \neq s} P(\xi_s = 1)\,P(x_i = 1 \mid x_s = 1).
$$
< var(p').
Since p' is small, the above inequality will be almost exact. Also,
$$
P(x_s = 1) = E(p').
$$
Thus,
ACKNOWLEDGEMENTS
We thank Professor A. P. Dawid for first making us aware of the original island problem. He, Professor D. V. Lindley, Dr N. G. Polson and Dr R. A. Nichols also provided helpful comments on an early draft of this paper. The work was supported in part by the UK Science and Engineering Research Council under grants GR/F98727, GR/G11101 and B/AF/1255.
REFERENCES
Balding, D. J. (1994) Estimating products in forensic identification using DNA profiles. To be published.
Balding, D. J. and Nichols, R. A. (1994) DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forens. Sci. Int., 64, 125-140.
Berger, J. O. and Wolpert, R. L. (1988) The Likelihood Principle. Hayward: Institute of Mathematical Statistics.
Berry, D. A., Evett, I. W. and Pinchin, R. (1992) Statistical inference in crime investigations using deoxyribonucleic acid profiling (with discussion). Appl. Statist., 41, 499-531.
Billingsley, P. (1979) Probability and Measure. New York: Wiley.
Brookfield, J. (1992) Law and probabilities. Nature, 355, 207-208.
Buffery, C., Burridge, F., Greenhalgh, M., Jones, S. and Willot, G. (1991) Allele frequency distributions of four variable number tandem repeat (VNTR) loci in the London area. Forens. Sci. Int., 52, 53-64.
California Reporter (1968) People versus Collins. Calif. Rep., 66, 497.
Chakraborty, R. and Kidd, K. K. (1991) The utility of DNA typing in forensic work. Science, 254, 1735-1739.
Cullison, A. D. (1969) Identification by probabilities and trial by arithmetic (a lesson for beginners in how to be wrong with greater precision). Houston Law Rev., 6, 471.
Dawid, A. P. (1994) The island problem: coherent use of identification evidence. In Aspects of Uncertainty: a Tribute to D. V. Lindley (eds P. R. Freeman and A. F. M. Smith), ch. 11. Chichester: Wiley.
Donnelly, P. (1992) Discussion on Statistical inference in crime investigations using deoxyribonucleic acid profiling (by D. A. Berry, I. W. Evett and R. Pinchin). Appl. Statist., 41, 524-525.
Donnelly, P. (1994) The non-independence of matches at different loci in DNA profiles: quantifying the effect of close relatives on the match probability. Preprint.
Eggleston, R. (1983) Evidence, Proof and Probability, 2nd edn. London: Weidenfeld and Nicolson.
Seymour Geisser (University of Minnesota, Minneapolis): I shall restrict my comments to DNA profiling in the USA. The suspect and crime scene profiles will consist of probes on 2-5 loci and an arbitrary
analysis is generally useful are not supported in the paper. We define the following. E_1 = e_1: analysis of the scene sample reveals characteristics e_1; E_2 = e_2: analysis of the suspect sample reveals characteristics e_2; F_1 = f_1: the frequency of the characteristics observed in the scene sample among the population defined by what is known of the criminal is f_1; F_2 = f_2: the frequency of the characteristics observed in the suspect sample among the population defined by what is known of the suspect is f_2; I is all our other knowledge.
Conditioning on the suspect's profile, the strength of the evidence is measured by the likelihood ratio
$$
\frac{P(E_1 = e_1 \mid E_2 = e_2, F_1 = f_1, F_2 = f_2, C \neq s, I)}{P(E_1 = e_1 \mid E_2 = e_2, F_1 = f_1, F_2 = f_2, C = s, I)}.
$$
Thus, it is correct that the evidential value of a match is reduced by a factor y, but it is crucial to realize that the correct prior is for the guilt of the group not the guilt of the individual. If we make the further assumption that each individual in the population is equally likely to be guilty before the evidence is
definitive, identical twins excepted. There must be some point, short of a complete sequence, at which the amount of DNA information is sufficient to imply identity beyond reasonable doubt. At that stage there will be no need for the analysis presented here or the alternative analyses given by many other researchers. The North Carolina State Bureau of Investigation, for example, now has eight variable number tandem repeat loci in use. Even with the unrealistic levels of population substructure advocated by the authors, the chance that two unrelated individuals in a population share the same 16-allele profile is vanishingly small, and even for full sibs the chance is only 1 in very many thousands.
The authors replied later, in writing, as follows.
Several discussants (Evett, Curnow, Morton, Brookfield) complain that our model is 'highly idealized' and 'unrealistic' and hence not generally useful. The original island problem is unrealistic. In contrast, the generalized island problem, which is the basis for our analysis, involves effectively no restrictions. The joint distribution of trait possession is arbitrary. Concern that it may be unrealistic to assume a known population of possible perpetrators is misplaced. This can always be achieved by defining the 'island' to be the entire planet, so that its population is guaranteed to include the criminal and N is the population of the world. Of course, a uniform prior is unlikely to be appropriate in this case, but the general analysis does not require a uniform prior and can thus be applied to any 'real world' situation. For example, in the non-sequential case equation (5) is quite general, as noted in the paper, as is the more helpful equation (6) under assumptions 1 and 2.
In the context of a criminal trial, the central question is whether or not the particular individual on trial is guilty. Failure to focus on the correct question may lie behind some of the confusion, both in the literature and in the discussion, over the appropriate analysis when a suspect is identified through a database search. If the one individual who matches is on trial, neither the posterior probability that someone in the database is guilty (Sudbury, but note that his G differs from ours) nor the probability of at least one trait bearer in the database (Morton) is directly relevant. In this case, the evidence against the suspect is that she or he has the trait and that other particular individuals do not. Its correct quantification is described in Section 3.3. The evidential weight against the individual on trial is not reduced by the database search. No assumption about (Morton) or parameter for (Curnow) the probability that the criminal is in the database is needed for the likelihood ratio. Of course, prior probabilities for the suspect, and individuals not in the database, are needed for the posterior probability of the suspect's guilt. Database searches may give rise to cases in which extremely small prior probabilities for the suspect's guilt are appropriate. (Note that the prior will depend on the information about the suspect and the crime. It is unlikely to depend on the size of the database searched.) Harding's example of 'arbitrary arrest' also illustrates the importance of knowing to which question a 'solution' applies.
Preece is correct in pointing out that agreement over what counts as evidence is necessary for productive discussion. The paper focuses on the setting of a criminal trial in which the 'evidence' effectively consists of the information put before the jury at trial. Legal debate over the admissibility of possible evidence is thus not relevant. Brookfield suggests that a jury might give weight to a belief that the forensic scientist in the case has a non-trivial prior probability of the suspect's guilt, leading to their adoption of a prior probability which is not extremely small. Such reasoning by the jury would be wrong, and akin to giving evidential weight to the fact that the suspect is on trial. In general, the forensic scientist's belief will be based on some information which is presented at trial and some which is not. The effect of giving weight to their belief will be to double-count the information presented and/or to allow evidence which is not presented, some of which may be inadmissible, to influence the verdict. Brookfield's argument is phrased in terms of using the data to decide whether or not a particular prior is 'correct'. This is misconceived, but his implication here, and his argument elsewhere (Brookfield, 1994b), is that extremely small priors are irrelevant to current criminal cases. We disagree, both in principle and on the basis of some knowledge of particular cases.
Several discussants have highlighted practical errors in the DNA context. Our experience bears out Koehler's concern that forensic scientists may be prone to errors in statistical reasoning. His documentation (Koehler, 1993b) of errors in court makes depressing reading. Thompson and Geisser both point to the importance of possible laboratory errors. Since the paper was presented, the UK Court of Appeal has quashed a second conviction based primarily on DNA evidence and ordered a retrial, in the light of concerns about anomalies in the experimental evidence. Cooke's example, in which a forensic scientist was never told that the suspect's father was a codefendant, is extremely worrying.