STATE OF NEW YORK
ST. LAWRENCE COUNTY COURT

PEOPLE OF THE STATE OF NEW YORK,

- against -

ORAL NICHOLAS HILLARY,
Defendant.

NOTICE OF MOTION TO PRECLUDE
Indictment No. 2015-15
Hon. Felix J. Catena

PLEASE TAKE NOTICE, that upon the annexed affirmation of EARL S. WARD and the accompanying memorandum of law, the undersigned will move the County Court of St. Lawrence County on the 1st day of July, 2016, at 9:30 a.m., or as soon thereafter as counsel may be heard, for an Order granting the following relief:

1. Precluding the prosecution from offering expert testimony as to the use of, or any results produced by, the forensic software tool STRmix because the use of this software for probabilistic genotyping is not generally accepted in the relevant scientific and legal communities as required by Frye v. United States, 293 F. 1013 (D.C. Cir. 1923); or, in the alternative, granting a pre-trial Frye hearing on the issues; or, in the alternative,

2. Precluding the prosecution from calling an expert witness to testify on their direct case regarding any conclusion reached by the use of STRmix on the ground that the prosecution cannot lay a foundation for the introduction of the evidence; specifically, that the application of STRmix exceeds the limits of validation, rendering it unreliable, see People v. Wesley, 83 N.Y.2d 417 (1994) and Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 447 (2006); or, in the alternative, directing that a hearing be held on the reliability of any proposed testimony about the aforementioned. See People v. Wesley, 83 N.Y.2d 417.

DATED: New York, New York
May 31, 2016

Respectfully yours,
Earl S. Ward
Counsel to Mr. Hillary

TO: The Hon. Mary Rain
Office of the District Attorney, St. Lawrence County

Clerk of Court
St. Lawrence County

STATE OF NEW YORK
ST. LAWRENCE COUNTY COURT

PEOPLE OF THE STATE OF NEW YORK,

- against -

ORAL NICHOLAS HILLARY,
Defendant.

AFFIRMATION
Indictment No. 2015-15
Hon. Felix J. Catena

EARL S. WARD, an attorney admitted to practice in the courts of this State, hereby affirms, under penalty of perjury, that the following is true, except for those statements made upon information and belief, which are believed to be true:

1. I am counsel of record for ORAL NICHOLAS HILLARY. This Affirmation is made upon information and belief, based upon review of the record in this matter and all prior proceedings to date.

2. This Affirmation is made in support of Mr. Hillary’s motion to preclude the prosecution from offering expert testimony regarding any DNA results obtained using the forensic software program STRmix. In the alternative, Mr. Hillary requests a Frye hearing on the issues raised.

RELEVANT FACTUAL BACKGROUND

A. Background of Case

3. On October 24, 2011, the Potsdam Police Department received a call from Marissa Vogel, a neighbor of Garrett Phillips, stating that she heard moans and the word “help” coming from Garrett Phillips’s apartment at 100 Market Street (referred to hereinafter as the Crime Scene).

4. Officer Mark Wentworth of the Potsdam Police Department arrived at the Crime Scene at approximately 5:16 p.m.

5. When Officer Wentworth knocked on the door of the apartment, he heard what sounded like someone quietly walking around.

6. At 5:24 p.m., Officer Wentworth reported that he still heard someone walking around the apartment.

7. Shortly after 5:37 p.m., Officer Wentworth entered the apartment with the landlord, Rick Dumas, and found Garrett Phillips unconscious in the bedroom.

8. There was no one else in the apartment.

9. Garrett Phillips was pronounced dead at 7:18 p.m. on the evening of October 24, 2011.

10. The death was ruled a homicide, as there was evidence that Garrett Phillips had been killed by strangulation.

11. The defendant Oral “Nick” Hillary became a suspect, although the district attorney at the time would decline to prosecute him. Mr. Hillary had had an intimate relationship with Tandy Cyrus, the mother of Garrett Phillips.
They had briefly lived together, but had broken up. Ms. Cyrus also had had an intimate relationship with Deputy John Jones, who visited the Crime Scene and participated in the initial investigation.

12. In seeking to identify the killer or killers and to rule out Mr. Hillary as a suspect, law enforcement performed an investigation, including the collection of dozens of DNA samples taken from multiple areas of the Crime Scene, the body and clothing of Garrett Phillips, the interior of Mr. Hillary’s car, clothing and other items seized from Mr. Hillary’s home, and a pseudo-exemplar sample of Mr. Hillary’s own DNA profile.

13. Law enforcement hypothesized that the killer or killers had escaped from the Crime Scene by exiting a window in the rear of the apartment, dropping to a lower roof, and then landing in the backyard. DNA samples were collected that tracked this possible avenue of egress. The New York State Police (“NYSP”) crime lab processed an enormous number of samples from the Crime Scene.
For example, the lab attempted to extract human DNA from “Material removed from crack in tile,” “Hair from tile,” “Swabs from window and screen,” “Swab from exterior wood sill,” “Swab from exterior stone sill,” “Swabs from the window and screen,” “Small black fuzz,” “Swabs from the rear upper and lower left regions of the shorts,” “Swabs from the rear upper and lower right regions of the shorts,” “Swabs from the rear left shoulder region of the shirt,” “Swabs from the outside rear right shoulder region of the shirt,” “Swabs from the left outside rear back/side region of the shirt,” “Swabs from the right outside rear back/side region of the shirt,” “Swabs from the left outside rear arm region of the shirt,” “Swabs from the right outside rear arm region of the shirt,” “Swabs from the inside/outside rear waistband of the underwear,” “Swabs from the side (~18") of the roof tile,” “Swabs from the side (~21") of the roof tile,” “Swabs from the inside of the roof tile, (straight edge side),” “Swabs from the inside of the roof tile, (broken edge side),” “Swabs from the edges of the roof tile, (misc. pieces),” “Swabs from the underneath of the roof tile, (misc. pieces),” “Swab from the button,” “Swab from the threads,” “Swabs from the outside bottom region of the sock,” “Swabs from the outside top region of the sock,” “Swab from the outside ankle elastic region of the sock,” “Swabs from the outside bottom region of the sock,” “Swabs from the outside top region of the sock,” “Swab from the ankle elastic region of the sock,” “Paint scrapings - scuff mark,” “Blood stained cutting from the shorts,” “Blood stained cutting from the long sleeve shirt,” “Cutting from the fluorescent area of the long sleeve shirt,” “Swabs from the outside neck region of the long sleeve shirt,” “Swabs from the inside neck region of the long sleeve shirt,” “Cutting from the fluorescent area of the long sleeve shirt,” “Blood stained cutting from the long sleeve shirt,” “Cutting from the fluorescent area of the black long sleeve jersey,” “Cutting from the fluorescent area of the black long sleeve jersey,” “Swab from the ripped area in the front of the black long sleeve jersey,” “Swabs of the ripped neck region of the black long sleeve jersey,” “Swabs from the lower front arm region of the black long sleeve jersey,” “Swabs from lower rear arm region of the black long sleeve jersey,” “Oral swabs from Garrett J. Phillips,” “Fingernail scrapings from the right hand of Garrett J. Phillips,” “Fingernail scrapings from the left hand of Garrett J. Phillips,” “Blood stained swab from the "nail left thumb" of Garrett J. Phillips,” “Blood stained swab from the "nail left thumb" of Garrett J. Phillips,” “Blood stained swabs from the "skin anterior neck" of Garrett J. Phillips,” “Blood stained swabs from the "skin anterior neck" of Garrett J. Phillips,” “Swab from outside palm (leather) of left glove,” “Swab from outside finger tips of left glove,” “Swab from outside wrist region of left glove,” “Swab from fluorescent area on fingertip of left glove,” “Swab from top side of finger tips of left glove,” “Swabs from inside left glove,” “Swab from left front door handle,” “Swab from left front door arm rest pocket,” “Swab from left front seatbelt clip,” “Swab from shifting lever,” “Swab from steering wheel,” “Swab from exterior wood sill,” “Swabs from inside blind,” “Swabs from outside blind,” “Swab from screen range from ~0-10 in.,” “Swab from screen range from ~10-20 in.,” “Swab from screen range from ~20-30 in.,” “Swab from screen range from ~30-40 in.,” “Blood stained Swab from ... bedroom - upper part of trim molding,” “Blood stained Swab from master bedroom - middle part of trim molding,” “Blood stained Swab from ... bedroom - lower part of trim molding,” “Swab from door lock knob,” “Swab from door lock plate,” “Swabs from sleeve of sweatshirt,” “Swabs from sleeve of sweatshirt,” “Swabs from hood region of sweatshirt,” “Cutting of fluorescent area of sweatshirt,” “Cutting of fluorescent area of sweatshirt,” “Cutting of fluorescent area of sweatshirt,” “Blood stained cutting from the envelope,” “Swab from inside of left bra cup,” “Swab from outside of left bra cup,” “Swab from inside of right bra cup,” “Swab from outside of right bra cup,” “Swab from inside grey bracelet,” “Swab from outside grey bracelet,” “Swab from inside blue bracelet,” “Swab from outside blue bracelet,” “Swabs from lower left leg region of Adidas sweatpants,” “Swabs from lower right leg region of Adidas sweatpants,” “Swabs from outside of left glove (palm side),” “Swabs from outside of left glove (backside),” “Cutting of fluorescent area on left glove,” “Cutting of fluorescent area on left glove,” “Swabs from outside of right glove (palm side),” “Swabs from outside of right glove (backside),” “Cutting of fluorescent area on right glove,” “Swabs from lower left leg of Admiral sweatpants,” “Swabs from lower right leg of Admiral sweatpants,” “Cutting of fluorescent area on Admiral sweatpants,” “Cutting of fluorescent area on Admiral sweatpants,” “Human hair from Admiral sweatpants,” “Swabs from lower left leg of Lotto sweatpants,” “Swabs from lower right leg of Lotto sweatpants,” “Cutting of fluorescent area on Lotto sweatpants,” “Blood stained swab from the left inside chin and jaw area of the EMT collar,” “Blood stained swabs from the inside back of the neck area of the EMT collar,” “Blood stained cutting from the velcro strap of the EMT collar,” “Blood stained swabs from the right inside chin and jaw area of the EMT collar,” “Swabs from the inside throat area of the EMT collar,” “Swabs from the inside back of the neck area of the EMT collar,” “Blood stained swabs from the outside back of the neck area of the EMT collar,” “Blood stained swabs from the outside chin and jaw area of the EMT collar,” “Swabs from bottom base of blind,” “Swabs from "inside" blind slats - 1 to 17,” “Swabs from "outside" blind slats - 1 to 17,” “Swabs from screen - lower corner,” “Swabs from screen - upper lower corner,” “Swabs from screen - upper corner - near break,” “Swabs from screen - opposite side corner,” “Swabs of latent lift from lower left interior window frame,” and “Swabs of latent lift from lower left interior window frame.”

14. The profiles of Nick Hillary and Garrett Phillips were compared against all samples where comparisons could be made.

15. A definite trend emerged in this DNA evidence as reported: Mr. Hillary was excluded from all samples at the Crime Scene where comparisons could be made. Meanwhile, Garrett Phillips was excluded from samples taken from Mr. Hillary’s vehicle and from the items seized from Mr. Hillary’s home, in samples where comparisons could be made.
The one exception to this evidentiary trend was a report by the NYSP lab that the DNA mixture profile from the fingernail scrapings from the left hand of Garrett Phillips was consistent with DNA from Mr. Phillips as the major contributor, admixed with DNA from at least one additional donor. The lab further reported that, due to insufficient genetic information, Mr. Hillary could neither be included nor excluded as a possible contributor of DNA to the left hand fingernail scrapings mixture.

16. Beginning in 2013, the NYSP lab reached out to Dr. Mark Perlin of Cybergenetics, Inc. in Pittsburgh, PA. Founded by Dr. Perlin, Cybergenetics is the proprietor of TrueAllele, a probabilistic genotyping software program that has been admitted into more courts in the United States than any other DNA expert system. TrueAllele is a comprehensive software system that utilizes the electronic data files from the lab’s genetic analyzing instrument. The program is based upon the premise that more rather than less data should be used when assigning a probability in a DNA forensic comparison.

17. TrueAllele attempts to take into account all of the data and all possibilities of what the data represents. For each comparison, TrueAllele generates a likelihood ratio (“LR”), called a match statistic, that is similar to the random match probability used in conventional DNA testing of an individual profile.

18. At the behest of the prosecution, the NYSP requested that Dr. Perlin compare the left hand fingernail scrapings mixture against the DNA profiles of Garrett Phillips and Mr. Hillary.

19. Using TrueAllele, Dr. Perlin performed the comparison. The result was a match statistic with no statistical support. In other words, the statistic generated by TrueAllele was consistent with the human expert interpretation that the evidence was inconclusive as to whether Mr. Hillary’s profile was included in the mixture from the left hand fingernail scrapings.

20. After Dr. Perlin informed the NYSP in 2013 (and again in 2014) that the TrueAllele comparison provided no statistical support for a match with Mr. Hillary, the prosecution declined to order a report from Cybergenetics, or to employ Dr. Perlin in any other work in this case.

21. In 2014, after running on a campaign to find the killer of Garrett Phillips, the Office of District Attorney Mary Rain indicted Mr. Hillary. The indictment was later dismissed for prosecutorial misconduct in the grand jury. Onondaga County District Attorney William Fitzpatrick also entered the case as a special prosecutor.

22. The St. Lawrence County District Attorney’s Office re-indicted Mr. Hillary in 2015.

23. DA Fitzpatrick reached out to Dr. John Buckleton, a forensic scientist from New Zealand who had helped develop STRmix at the Institute of Environmental Science and Research (“ESR”).

24. The NYSP subsequently sent ESR electronic raw data files, and a scientist at the ESR ran the data in STRmix.

25. There is no indication in the materials provided to the defense that the NYSP ever performed an internal validation study for the use of STRmix on casework samples developed at the NYSP lab.

26. There have been a series of draft affidavits from Dr. Buckleton on the left hand fingernail scrapings. The most recent affidavit was dated April 14, 2016.

27. The LR statistics for the left hand fingernail scrapings have ranged from ten million to ten thousand, depending on how a phenomenon called “stutter” is treated. The most recent LR statistic was generated after the adoption of a forward stutter model. STRmix now reports the LR to be roughly 300,000 as to Mr. Hillary.

B. Conventional DNA Analysis

28. Understanding probabilistic genotyping software like TrueAllele and STRmix requires an understanding of general principles of DNA testing. DNA is a molecule containing genetic material that codes for the unique physical characteristics of human beings.
An individual inherits half of his DNA from his mother and half from his father. Each person’s DNA is unique, with the exception of identical twins.

29. DNA is composed of four chemicals called nucleotides, or bases: adenine (“A”), cytosine (“C”), guanine (“G”), and thymine (“T”). These bases pair together in the following way: A with T; C with G. These pairs repeat in varying lengths and form the rungs on the double helix that constitutes the DNA molecule. The double helix is wound very tightly into a chromosome.

30. A “gene” refers to a sequence of base pairs along a given portion of the DNA double helix which codes for a certain trait. Different genes are located in different places, or loci, along a chromosome. An allele is one of several alternative forms of a gene that occurs at the same position on a specific chromosome. In other words, an allele is a variation in the number of times the base pairs of DNA repeat at a particular locus on a particular chromosome. This number of repeats varies among humans, who have two alleles at each locus of each chromosome, inheriting one allele from each parent. Modern forensic analysis therefore focuses on Short Tandem Repeats (STRs), i.e., the number of times the base pairs repeat at a variety of loci along a person’s chromosomes. By measuring and comparing the number of repeats at given loci, an analyst can distinguish one individual from another.

31. Currently, in developing a DNA profile, the NYSP examines fifteen loci, plus the gender-determining locus, “Amelogenin.” Numbers are used to represent which alleles are present at each locus. Each person has two alleles at each locus (one from each parent). If a person inherits the same allele at a locus from each parent, the person is “homozygous” at that locus; if the inherited alleles are different, the person is “heterozygous” at the locus.

32. Forensic DNA analysis is essentially a six-step process.
First, DNA is extracted from the evidence, e.g., the swabs or scrapings taken from the physical evidence. In the second step, quantitation, the analyst measures the amount of DNA present in the sample being tested. The third step, amplification, involves polymerase chain reaction (PCR), a process of heating and cooling that makes millions of reproductions of DNA so that the DNA sample becomes more robust and more easily analyzed. In the fourth step, after the DNA is amplified, a process known as electrophoresis separates the STR fragments by size. The electrophoresis results appear as a series of peaks on a graph, known as an electropherogram. In the fifth step, once the electropherogram is generated, the analyst reviews it and draws conclusions about the DNA sample, with a view toward developing a DNA profile. In the sixth and final step, the analyst compares a forensic DNA profile with a known DNA profile, and draws additional conclusions.

33. DNA testing and typing can be complicated by the existence of what are called stochastic effects. These are random fluctuations in testing results that can adversely influence DNA profile interpretation and usually occur in low level samples. These stochastic effects include: stutter, which is a peak that is typically one repeat unit less in size than a true allele, but is not itself a true allele; “drop-in,” which is an allele that is not from a contributor to an evidence sample, usually due to low levels of contamination; and “drop-out,” which is the failure to detect an allele that is actually present in a sample, due to small amounts of genetic starting material.
Furthermore, in complex mixtures, additional stochastic effects can complicate accurate interpretation. These are: peak height imbalance (disparities of height between two peaks from the same contributor); machine noise (background noise captured by the capillary electrophoresis machine); degradation (random breaks in the DNA molecules); and preferential and locus-specific amplification (more ready amplification of some DNA types versus others).

34. A DNA profile on a piece of evidence or from a crime scene can be a single source, meaning coming from a single contributor, or a mixture. DNA mixtures arise when two or more individuals contribute DNA to a sample. An analyst can tell that a sample contains a mixture because more than two allele peaks will appear at one or more loci. In standard DNA analysis, the peak heights of one contributor may stand out, and thus readily distinguish their alleles from those of the one or more other contributors. Once an analyst determines the number of contributors to a mixture sample, they can determine the genotype of the contributors by grouping together the alleles with similar peak heights.

C. Uninterpretable Mixtures and Probabilistic Genotyping

35. Low level DNA samples and DNA mixtures of two or more contributors pose a problem to DNA forensic analysts. In standard analysis, the peak heights of one contributor may stand out, and thus the analyst can readily distinguish his alleles from those of the one or more other contributors. But it is often the case, especially with relatively small contributors seen in high sensitivity analysis, that the sample contains a soup from which each individual’s alleles cannot be separated out and placed in a profile. In the past, analysts dealt with this challenge by calculating statistics concerning the probability of inclusion. But these statistics were general in nature and continue to be the subject of much controversy. See William C. Thompson, Laurence D. Mueller, and Dan E. Krane, Forensic DNA Statistics: Still Controversial in Some Cases, THE CHAMPION, December 2012, 12-23.

36. Probabilistic genotyping software programs are designed to calculate a statistic for contributors to such mixtures when one could not be determined in the past. These programs use biological modeling, statistical theory, computer algorithms, and probability distributions to calculate likelihood ratios (LRs). The LR is the statistic calculated by these probabilistic programs, and it reflects the relative probability of a particular finding under alternative theories about its origin. Id. In forensic DNA analysis, the LR can be stated as follows: the profile is x times more likely if the prosecutor’s hypothesis is true than if the defendant’s hypothesis is true. The prosecutor’s hypothesis typically is that the defendant and a certain number of other unknown, unrelated contributors contributed to the mixture, while the defendant’s hypothesis (which, disturbingly, is not provided by the defendant, but by the operator of the software) is that the same total number of unrelated people were the contributors.

D. What is STRmix?

37. STRmix, the probabilistic genotyping software at issue in this case, does not rely solely on the science of DNA typing and interpretation. Instead, it also uses computer science algorithms to perform “complex” mathematical and statistical calculations. See Jo-Anne Bright, Duncan Taylor, et al., “Developmental validation of STRmix, expert software for the interpretation of forensic DNA profiles,” Forensic Science Int’l: Genetics (accepted manuscript), 2016, p. 227 (hereinafter “Developmental validation of STRmix”). Indeed, STRmix is a software program; it is not used for any of the other steps of DNA analysis described above. See generally, id.
It is only after a DNA analyst in a laboratory performs the regular steps of developing a DNA profile from a sample that STRmix adds an additional step, performed not in a lab by a trained scientist but by a person sitting at a computer screen, running a complex computer software program. This program is designed to answer the classic question in forensic DNA interpretation: what are the profiles of the contributors to this mixture? But STRmix answers this old question in a new, novel and unique way.

38. The backbone of the STRmix software system is a computing algorithm called the Markov Chain Monte Carlo (MCMC) method of calculating probable outcomes. Id. at 233. The implementation of the MCMC algorithm in STRmix utilizes statistical models to simulate hypothetical true alleles while incorporating stochastic effects. Id. It then assesses those simulated alleles and draws conclusions about what is true DNA as opposed to artifacts in a sample. Id. Based on those conclusions, the likelihood ratio is then generated as a further statistical assumption. MCMC is used because an exponentially enormous number of combinations of assumptions and outcomes arise from any mixed sample. It would be practically impossible to do such a calculation without a computer running sophisticated software.

39. The MCMC method has shortcomings, however. First, it does not report results for all of the calculations performed. Instead, it discards a certain number of calculations made at the beginning of the analysis. The discarded data is called “burn-ins.” Id. at 234. These “burn-ins” could contain probative DNA, but are discarded because, theoretically, they are expected to be less likely to lead to probative results. Id. at 234.

40. The remaining data simulations are all calculated toward the ultimate result.
Each simulation is called a “random walk,” whereby the algorithm running on the software is programmed to randomly make an assumption about the data which, in turn, leads to the ultimate conclusions. Id. at 235. In order to generate results, this random simulation is performed thousands, or even millions, of times for a single mixture sample. This randomness is the key to the application of the MCMC algorithm to probabilistic genotyping. By randomly simulating a supposedly sufficient subset of all possible outcomes, MCMC can generate an informed probability distribution for the genotypes of the individuals whose DNA together composes a mixed DNA sample. Critics of STRmix have expressed concern, however, that by not considering all of the data, the correct answer to the likelihood ratio question could be overlooked or miscalculated. The probability distribution of the MCMC algorithm could compound this inaccuracy by overly weighting an inaccurate or incorrect answer. If the path that the “random walk” takes, or the models that guide it, are imprecise or poorly defined, then the probabilities produced by STRmix could not only be inaccurate themselves but also, by the nature of the MCMC algorithm, not reproducible.

41. The developers of STRmix acknowledge in fact that “[t]he results of no two analyses will be completely the same using a stochastic system like MCMC. This is a phenomenon that is relatively new to forensic DNA interpretation, which up until this point has always had the luxury of, at least theoretically, completely reproducible interpretation results.” “Developmental validation of STRmix,” p. 233.

Disagreement Amongst Forensic Scientists About STRmix and Other Similar Software Programs

42. There is no agreement within the forensic community about which probabilistic software programs or methods to employ when analyzing low template DNA or complex mixture samples. To date, there are eight different probabilistic genotyping software programs in the country.
In New York State alone, courts have considered three different probabilistic software programs that use different methodologies in their analysis. These programs are STRmix, the Forensic Statistical Tool (FST), and TrueAllele. They vary in the manner in which they collect data, the necessary assumptions they make to perform their statistical calculations, and the actual underlying mathematical principles used to make these calculations.

43. STRmix has thus far been accepted by only one trial court as passing the Frye standard in New York State. See People v. Bullard-Daniel, Ind. No. 2015-88 (Niagara Co. March 10, 2016). STRmix was developed by the Institute for Environmental Science and Research (ESR) in New Zealand. The program relies on analysts to collect the data by reviewing the electropherograms (epgs) developed in a case and discarding the peaks below the lab’s analytic threshold. See Duncan Taylor, Jo-Anne Bright, and John Buckleton, The Interpretation of Single Source and Mixed DNA Profiles, Forensic Sci. Int’l: Genetics 7 519 (2013). Artifacts like pull-up and forward stutter are also removed manually. Id. Most probabilistic software programs assume that allele “drop-out” occurs at a predictable rate, but differ on how to determine that rate. “Drop-out” is a stochastic effect that occurs when an allele is not seen in a given DNA profile, though the analyst would have expected to see the allele. The “drop-out” rate is an important assumption these programs make to calculate a statistic. STRmix calculates the “drop-out” rate in a sample based on peak height variances. Peak height variability in the observed profile is analyzed having regard to the peak height variability experienced in the laboratory generally. STRmix utilizes MCMC algorithms to perform the required statistical calculation.

44. The Forensic Statistical Tool (FST) is another probabilistic software program used in New York State that has been the subject of Frye hearings. See, e.g., People v. Rodriguez, N.Y. Co. Ind. No. 5471/2009 (Sup. Ct. N.Y. Co. October 24, 2013) (FST passes the Frye standard). But see People v. Collins, 49 Misc.3d 595 (Kings Co. Sup. Ct. 2015) (Dwyer, J.) (FST fails the Frye test). It was developed in-house at the Office of Chief Medical Examiner (OCME) of the City of New York by Dr. Theresa Caragine and Dr. Adele Mitchell and is used in all complex mixture cases in New York City. Similar to STRmix, FST relies on analysts to collect the data used in the calculations and analysis. An analyst reviews the electropherogram and determines whether alleles are present at each locus by utilizing the lab’s analytic threshold. The analyst inputs this information, along with a known suspect profile, into the FST software. The analyst then sets the parameters for running the program, including whether the mixture contains two or three contributors. The software then outputs a “Forensic Statistic Comparison Report,” summarizing the data that was input and indicating the resultant likelihood ratio.

45. FST differs from STRmix in how it calculates the LR. Unlike STRmix, FST does not use MCMC algorithms in making these calculations, instead relying only on Bayesian statistics. Bayesian statistics describe the probability of an event, based on conditions that might be related to the event. See John Butler, Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers 459 (Second Ed., Elsevier Academic Press 2005). FST also calculates the “drop-out” rate differently than STRmix. FST calculates the allelic “drop-out” rate based on the quantitation values of given DNA samples, rather than peak height variation.

46. TrueAllele is the third program that has been the subject of a Frye hearing, but even it differs significantly from STRmix and FST in the manner in which it collects, interprets, and calculates the data. See People v. Wakefield, 47 Misc.3d 850 (Sup. Ct. Schenectady Co. Feb. 9, 2015).
TrueAllele was developed by Cybergeneties of Pittsburgh, PA under the direction of Dr. Mark Perlin, TrueAllele is a fully continuous probabilistic approach that analyzes the epgs and considers the genotypes at every locus of each contributor, taking into consideration the mixture weights of the contributors, the DNA template mass, polymerase chain reaction (PCR) stutter, relative amplification, DNA degradation, and the uncertainties of all these variables. 47. Unlike FST and STRmix, TrueAllele does not rely on an analyst’s interpretation of what constitutes a true allele by using analytical thresholds dictated by laboratory protocol in order to collect its data. True Allele instead, considers all the data present in the sample, even those peaks below the lab’s analytic threshold, In essence, the calculations made by TrueAllele are based upon more information than used by FST and STRmix. Unlike FST, TrueAllele accounts for “drop-out” rates as a function of peak heights and peak height ratios seen in the sample rather than based on the quantity of DNA in the sample. Like STRmix, but unlike FST, it uses MCMC algorithms to calculate likelihood function that compares genotypes relative to a population and computes a match LR. See Wakefield, 47 Misc.3d at 859, E. The Standards of the Computer Science Community Have Been Ignored by STRmix. 48. While new to forensic biology, the use of the MCMC algorithm processing has been long-used by computer scientists. Programs using MCMC, like all other commercial software programs, universally undergo a rigorous validation process within the computer science community before being accepted by that community for publie use. STRmix’s creators have avoided the validation process of the computer science community, however, focusing exclusively on the forensic biology component of the program. 49. 
As set forth in the annexed affidavit of Nathaniel Adams, the failure to demonstrate that the development of the STRmix software system was in accordance with software engineering industry standards is devastating to the reliability of its results. Mr. Adams is a Systems Engineer at Forensic Bioinformatic Services, Inc. in Fairborn, Ohio. His duties include analyzing electronic data generated during the course of forensic DNA testing; reviewing case materials; reviewing laboratory protocols; and performing calculations of statistical weights, including custom simulations for casework and research. See Affirmation of Nathaniel Adams (annexed hereto) (Adams Aff.), ¶ 1.

50. As Mr. Adams points out, the likelihood ratio calculation done by STRmix has no “ground truth.” In other words, the likelihood ratio is totally dependent upon assumptions made when modeling stochastic processes, see supra. It provides no benchmark from sample to sample or test to test. Therefore, Mr. Adams observes, “we must base our confidence in the program on two factors: (a) The appropriateness of the models used. This factor is generally within the domain of biologists and statisticians; and (b) The degree of fidelity with which these models have been translated from theory/concept to source code for execution as a software program. This factor is generally within the domain of software developers/engineers.” Adams Aff., ¶ 6.

51. For a software system to be considered validated by the software engineering community, it must be tested and validated using relevant industry standards. See Adams Aff. ¶¶ 7-8. Standards for software development and validation are published by several internationally recognized organizations within that community, including the Institute of Electrical and Electronics Engineers (IEEE), the ISO, and the Association for Computing Machinery. See Adams Aff. ¶ 7.

52. IEEE has published verification and validation guidelines for new software coming on the market.
See Adams Aff. ¶ 17. These systems are to be tested to make sure they perform as expected in a general sense, and also to make sure that there are no latent or dormant “bugs” in the system which could appear in later uses to devastating effect. See Queensland authorities confirm ‘miscode’ affects DNA evidence in criminal cases, supra. In addition to making the software available for independent testing, the maker of new software should also provide documentation concerning internal testing and validation. For software to pass validation and verification in accordance with IEEE and similar standards, the creator of the software must demonstrate that it has gone through this rigorous process. See Adams Aff. ¶¶ 12-13, and attached guide to “The Software Development Process.”

53. Validation and verification are essential to the reliability of probabilistic genotyping systems like STRmix. As the geneticists Christopher D. Steele and David J. Balding point out, although probabilistic genotyping has promise, before it can be used by forensic biologists, the underlying computer science must be validated:

Laboratory procedures to measure a physical quantity such as a concentration can be validated by showing that the measured concentration consistently lays within an acceptable range of error relative to the true concentration. Such validation is infeasible for software aimed at computing an LR because it has no underlying true value (no equivalent to a true concentration exists). The LR expresses our uncertainty about an unknown event and depends on modeling assumptions that cannot be precisely verified in the context of noisy CSP [crime scene profile] data. Some progress can be made in evaluating the validity and performance of software. Courts need these kinds of evaluations to have confidence in the results of software-based forensic analyses.
Open source software is highly desirable in the court environment because openness to scrutiny by any interested party is an invaluable source of bug reports and suggestions for improvement.

C.D. Steele and D. J. Balding, “Statistical evaluation of forensic DNA profile evidence,” Annu. Rev. Stat. Its Appl., vol. 1, pp. 361-384, 2014. See also Adams Aff. ¶ 5.

54. Here, STRmix is neither “open source” nor “open to scrutiny.” Although STRmix has internally validated procedures by biologists and statisticians, and has endeavored to follow the forensic DNA guidelines outlined by SWGDAM, it has not been demonstrated to have undergone any validation process described by IEEE, ISO or the Association for Computing Machinery. See Adams Aff. ¶ 11. The result is that source code errors have called into question STRmix statistics used to obtain criminal convictions. See Queensland authorities confirm ‘miscode’ affects DNA evidence in criminal cases, The Courier Mail, Mar. 21, 2014.*

* Available at http://www.couriermail.com.au/news/queensland/queensland-authorities-confirm-miscode-affects-dna-evidence-in-criminal-cases/news-story/833c580d3#1c59039efdladefSSaf92b

MEMORANDUM OF LAW

POINT I

STRMIX IS NOT GENERALLY ACCEPTED; EVIDENCE RELATED TO RESULTS OBTAINED THROUGH ITS USE SHOULD BE PRECLUDED UNDER FRYE; IN THE ALTERNATIVE, THERE SHOULD BE A HEARING

A. The Applicable Law

New York courts have adopted the test set forth in Frye for the admission of scientific evidence. Wesley, 83 N.Y.2d at 422-23 (citing Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923)). The Frye test poses the elemental question of “whether the accepted techniques, when properly performed, generate results accepted as reliable within the scientific community generally.” People v. LeGrand, 8 N.Y.3d 449, 457 (2007) (internal quotations omitted). As the court in Frye explained:

Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define.
Somewhere in this twilight zone the evidential force of the principle must be recognized, and while courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.

293 F. at 1013-14. Unanimous endorsement by the scientific community is not required, but there must be “general acceptance in the relevant scientific community that a technique or procedure is capable of being performed reliably.” Wesley, 83 N.Y.2d at 435 (Kaye, C. J., concurring). As Judge Kaye admonished in her concurring opinion in Wesley:

It is not for a court to take pioneering risks on promising new scientific techniques, because premature admission both prejudices litigants and short-circuits debate necessary to determine the accuracy of a technique. Premature acceptance of “revolutionary” forensic techniques has led to wrongful convictions.

Id. at 437, n. 4.

General acceptance of novel scientific evidence may be demonstrated through expert testimony, judicial opinions, and/or scientific and legal writings. Lahey v. Kelley, 71 N.Y.2d 135, 144 (1987); People v. Middleton, 54 N.Y.2d 42, 49-50 (1981). The determination of whether a scientific principle or technique is generally accepted in the relevant scientific community “emphasizes counting scientists’ votes, rather than . . . verifying the soundness of a scientific solution.” Wesley, 83 N.Y.2d at 432 (Kaye, C. J., concurring) (emphasis added); see LeGrand, 8 N.Y.3d at 457 (same). Thus, “Frye is not concerned with the reliability of a certain expert's conclusions, but instead with whether the experts’ deductions are based on principles that are sufficiently established to have gained general acceptance as reliable.” Nonnon v. City of New York, 32 A.D.3d 91, 103 (1st Dept. 2006) (internal quotations and citations omitted).
The proponent of the disputed evidence shoulders the burden of proving general acceptance in the relevant scientific community. People v. Rosado, 25 Misc. 3d 380, 384 (Sup. Ct. Bronx Co. 2009) (citing Zito v. Zabarsky, 28 A.D.3d 422 (2d Dept. 2006)).

B. STRmix Does Not Produce Results That Are Generally Accepted As Reliable Within the Relevant Scientific Community

i. Probabilistic Genotyping Was Not Contemplated by Wesley; It is Not “Generally Accepted” by the Forensic Biology or DNA Analysis Community

To be sure, traditional DNA testing and “visual comparison” have been admissible under Frye in New York since 1993. See Wesley, supra. However, the testing here, using novel probabilistic genotyping software, is far from traditional, and its admissibility is anything but well-settled. The present posture of probabilistic genotyping in the forensic science field is nothing like what was presented in Wesley, and, thus, that case does not answer the Frye question here.

First, when considering whether forensic DNA analysis was admissible, the Wesley court was presented with only one method of testing. Here, there are numerous methods of probabilistic genotyping used in different labs, none of which has the backing of a general consensus of the forensic science community. Although the technique was novel at the time, the court in Wesley was not presented with alternate methods of creating and comparing single source profiles. In this case, however, within New York State alone, labs use three different probabilistic genotyping software programs, which collect, analyze, and calculate the data from the DNA sample in different ways. There is no agreement within the forensic community which method or combination of methods is best to carry out this type of testing and which provides the most accurate conclusions. Without such an agreement, it cannot be said that STRmix meets the Frye standard.
Second, Wesley never contemplated a DNA analysis method as complex and opaque as STRmix’s probabilistic genotyping method. In Wesley, the DNA method of comparison that was accepted by the court was a side-by-side visual comparison of dark bands on two DNA print patterns to see if they match. Once a match was found, the frequency with which a specific allele occurs within a specific population was determined. See Wesley, supra. STRmix and other probabilistic genotyping software programs go well beyond this. These programs use complicated statistical models and algorithms to calculate the presence of alleles that are not seen in a sample and predict the existence of genotypes in a mixture sample that cannot be determined by looking at an electropherogram. They apply mathematical principles of modeling that, although they may have value in certain fields, push the boundaries of acceptance in the field of forensic DNA analysis. See e.g. People v. Collins, 49 Misc.3d 595 (Kings Co. Sup. Ct. 2015) (Dwyer, J.) (“Further, the fact that FST software is not open to the public, or to defense counsel, is the basis of a more general objection”). Given that the techniques used by STRmix go beyond the techniques accepted by the court in Wesley, Wesley provides little support in determining whether STRmix meets the Frye standard.

ii. Since STRmix Combines DNA Analysis with Computer Science Principles, it Cannot be Considered Generally Accepted As It Has Disregarded Computer Science Standards

To be admissible under Frye, evidence must be “generally accepted” not just in any scientific community, but in the “relevant” scientific community. Wesley, 83 N.Y.2d at 423. “The thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.” Frye, 293 F. at 1014 (emphasis supplied). In Wesley, the Court found that standard DNA testing passed the Frye test.
The Court described that testing as a multi-step process performed in a scientific laboratory by forensic biologists. Wesley, at 423-44. At the conclusion of the testing, the forensic scientist - a live person - made a visual comparison of the DNA on the sample to that on the suspect in order to determine whether there was a “match.” Id. at 425. The Court found, based on the hearing record developed below, that this manner of DNA testing was generally accepted within the community of forensic biologists and DNA analysts. Wesley made no mention of probabilistic genotyping or the use of computer software to make this analysis. That Court did not consider whether the same community that accepted “visual” DNA comparison also could do so for a comparison that had nothing to do with DNA analysis and everything to do with computer code.

The only New York State decision applying Wesley in denying a Frye challenge to STRmix focused solely on the acceptance of STRmix within the “community of forensic DNA analysis.” See People v. Bullard-Daniel, supra, at *15. In that case, the Court held that because a number of forensic DNA committees and governing bodies considered probabilistic genotyping software like STRmix reliable, then it was “generally accepted” under Frye. Id. Although the Court also mentioned the use of MCMC and the underlying algorithm of STRmix as being considered by these DNA scientists, it overlooked the most salient issue: that forensic biologists and even population geneticists are not computer scientists. While they may be able to expound on their respective communities’ acceptance of a program like STRmix, these scientists cannot say anything about whether experts in computer algorithm and software engineering would draw the same conclusion.
In fact, given STRmix’s complete failure to go through any software validation as prescribed by IEEE, STRmix would not be accepted within the community for which it, as a computer program, is best suited: the computer science community. It is inappropriate for forensic scientists with little or no formal training or experience in software engineering or development to claim that their software program has been validated from a software engineering perspective.

The importance of validating a computer program and having it accepted by computer scientists before use in court is as obvious as it is essential. It is obvious because STRmix, by its own admission, is a computer program. Computer program experts, thus, are the appropriate scientists in the “particular field” to determine whether STRmix is accepted or rejected. Meanwhile, it is essential that the computer programming is validated by computer scientists so that any “bugs” in the system can be caught and corrected before it is used in forensic casework.

The need for validation can be demonstrated with an analogy. Microsoft Word is used to write, among other things, legal briefs. Lawyers research and write those briefs. The content of those briefs is reviewed by other lawyers, by adversaries, and by the Court. The content of those briefs can be considered “legal writing,” and all within the legal community likely could identify the type of law discussed in the brief. The law and its application, thus, would be akin to being “generally accepted.” On a more basic level, the spelling of words and basic grammar are known to all lawyers and can be accepted as “ground truths” that govern such writings. But no lawyer (unless she also is a software engineer) could actually say anything meaningful about the use of Microsoft Word to write those briefs.
Lawyers could observe that Microsoft Word occasionally crashes, bugs emerge, and files of hard work can be corrupted or destroyed. The lawyer knows when this happens: an error message appears, or a cursor stops moving. But a lawyer cannot say whether those bugs represent a critical error in Microsoft Word's code. A lawyer cannot make any evaluation of Microsoft Word’s method of processing typed data. Indeed, a lawyer who is not also a computer scientist could say nothing about how well Microsoft Word works and whether it works in a way that is like other word processors or critically different. Similarly, a lawyer can know when a basic “ground truth” is violated, like a misspelled word.

Asking a forensic biologist or DNA analyst to accept STRmix as a valid software is no different than asking a lawyer to accept Microsoft Word as one. Neither has any meaningful way of knowing whether the respective program is actually good and solid engineering, or has fatally flawed code. In fact, the forensic scientist using STRmix is in an even worse position than the attorney to evaluate her software because she wouldn’t even have a set of “ground truths” to rely upon to monitor the reliability of the program in real time or even years down the road. An attorney would immediately know that Microsoft Word didn’t pick up a simple spelling error; a forensic scientist would never know if the software miscalculated the likelihood ratio for a given mixture.

Accordingly, because forensic biologists work in a totally different field, with totally different standards and processes for validation, they are not the “particular” scientific community needed to demonstrate general acceptance of STRmix.
Computer scientists are that “particular” group, and they have not yet weighed in.

POINT II

THIS COURT SHOULD PRECLUDE THE STRMIX RESULTS BECAUSE THE METHOD WAS UNRELIABLY APPLIED TO THE EVIDENCE SAMPLE IN THIS CASE

A. Introduction

Even if this Court finds that STRmix methodology is generally accepted as reliable in the scientific community, that methodology was not reliably applied in this case. This Court should preclude the STRmix results.

A forensic method must be validated before it is used in casework. Validation demonstrates that a method is reliable. It also establishes the limits of the method and the conditions under which the method will produce reliable results. The application of STRmix to the fingernail scrapings sample 550C2 exceeded the limits established in validation. The program was used in this case in a mixture ratio that was much more extreme than any ratio tested in its validation. STRmix calculated the percentages of the major and minor contributors to the fingernail scrapings sample to be 99 and 1, respectively, when the sample was analyzed in the November 24, 2015 and December 3, 2015 analysis runs of STRmix. See Nov. 24, 2015 STRmix Summary of Input Data Report at 1; Dec. 3, 2015 Summary of Data Report at 1. When STRmix was run on December 23, 2015, and then again on April 4, 2016, the program calculated the percentages as 100 to 0. See Dec. 23, 2015 STRmix Summary of Input Data Report at 1; Apr. 4, 2016 STRmix Summary of Input Data Report at 1. Yet, STRmix has not been validated for use with such extreme mixture ratios. Although the overall amount of DNA in the 550C2 sample is robust (it was tested in 1, 1.5, 2 nanogram (ng) amounts²), that portion contributed by the minor contributor is very small, and would be termed “low-template,” “low-level” or trace DNA. Dr. Buckleton characterizes the minor contributor to 550C2 as “at a very low level.” Apr. 14, 2016 Buckleton Affidavit.
Even if STRmix can be reliably applied to samples where the contributors are present in high template amounts, it is unreliable for low level components at such extreme ratios. DNA present in such small amounts exhibits stochastic, or random, effects, including peak height fluctuation, drop out, and drop in. The DNA peaks from such low level contributors may be indistinguishable from stutter, a fact which complicated the interpretation in this case. Although STRmix purports to account for these stochastic effects, it does not do so reliably when applied to a sample with characteristics like 550C2, as illustrated by the chronology of testing and reporting of results by ESR.

ESR’s interpretation of this sample has evolved over the course of months, leading to different results. When one of the STRmix developers first looked at the data, she observed that “[t]hese are at the very edge of suitability of any interpretation method.” Nov. 3, 2015 email from Jo Bright to John Buckleton, William Fitzpatrick et al. Despite this, ESR decided it would run the sample in STRmix. After ESR already ran and generated one result with STRmix, in November, 2015, ESR requested data from NYSP to use in its interpretation, which the NYSP did not have. Nevertheless, ESR ran STRmix again in December 2015, reporting the results in Dr. Buckleton’s December 7, 2015 affidavit.

² For points of reference, one human cell has 6.6 picograms (pg) of DNA in the nucleus. There are a thousand pg in a nanogram (ng). One definition of low copy number DNA is the testing of DNA in amounts under 100-200 pg.

After this second set of results was generated, ESR then determined it would need to implement a new model in its calculations, one which accounted for a phenomenon known as forward stutter, discussed below. ESR tried to do this with data NYSP provided, but resorted to reporting two statistics which differed by three orders of magnitude instead.
Finally, after STRmix had been run several times and the resulting statistics reported in Buckleton’s affidavits, ESR applied a new forward stutter model and produced an entirely new statistic. This constant revision and admitted failure to adequately model the observed data at the outset at the very least warrants careful inquiry at a hearing before this Court can be satisfied that the STRmix results should be admitted.

B. Legal Standard

It is fundamental that a court has the responsibility to exclude irrelevant or unreliable evidence from the case. “It is incumbent upon the proponent of expert scientific testimony to lay a proper foundation establishing that the processes and methods employed by the expert in formulating his or her opinions adhere to the accepted standards of reliability within the field.” People v. Hyatt, 2001 N.Y. Slip Op. 50115(U) citing People v. Wilson, 133 A.D.2d 179 (N.Y. App. Div. 1987). When dealing with a methodology that is generally accepted, “[t]he focus moves from the general reliability concerns of Frye to the specific reliability of the procedures followed to generate the evidence proffered and whether they can establish a foundation for the reception of the evidence at trial.” Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 447 (2006) citing People v. Wesley, 83 N.Y.2d 417, 429 (1994); People v. Middleton, 54 N.Y.2d at 45; People v. Wesley, 83 N.Y.2d at 428 (testimony must not only demonstrate the acceptance of forensic DNA profiling evidence by the relevant scientific community and its reliability, but must also demonstrate the “admissibility of the specific evidence - - i.e., the trial foundation - - how the sample was acquired, whether the chain of custody was preserved, and how the tests were made”).

In Parker v.
Mobil Oil Corp., supra, a toxic tort case, the Court of Appeals had to answer the question “as to whether the methodologies employed by [plaintiff's] experts lead to a reliable result--specifically, whether they provided a reliable causation opinion without using a dose-response relationship and without quantifying Parker's exposure.” Id. at 447. Concluding that in that case there was not a novel methodology at issue necessitating a Frye inquiry, the Court described the issue before it “as more akin to whether there is an appropriate foundation for the experts’ opinions, rather than whether the opinions are admissible under Frye.” Id. See also Wesley, supra, J. Kaye concurring, FN2 at 436 (“Our cases have always required a foundational inquiry before scientific evidence can be admitted (see, e.g., People v. Middleton, 54 N.Y.2d, at 45, 444 N.Y.S.2d 581, 429 N.E.2d 100, supra ), even after a particular technique has passed out of the “twilight zone” of “novel” evidence that is the subject of Frye and is judicially noticed as reliable ( see, People v. Knight, 72 N.Y.2d 481, 487, 534 N.Y.S.2d 353, 530 N.E.2d 1273 [radar speed detection]; People v. Campbell, 73 N.Y.2d 481, 485, 541 N.Y.S.2d 756, 539 N.E.2d 584 [blood alcohol content test]; People v. Mertz, 68 N.Y.2d 136, 148, 506 N.Y.S.2d 290, 497 N.E.2d 657 [same]; People v. Freeland, 68 N.Y.2d 699, 701, 506 N.Y.S.2d 306, 497 N.E.2d 673 [same]; Pereira v. Pereira, 35 N.Y.2d 301, 307, 361 N.Y.S.2d 148, 319 N.E.2d 413 [polygraph test used for investigative purposes] ).”). See also People v. Seda, 139 Misc.2d 834, 841 (N.Y. Cty. 1988) (Carey, J.) (“Dr. Shaler’s testimony also revealed that contrary to the requirements of the laboratory manual he had devised for electrophoretic analysis, he had failed to record any of the parameters of the analysis he performed inasmuch as he acted as his own ‘quality control’ and, in the event of any irregularities, would have repeated the analysis”); People v.
Castro, 144 Misc.2d 956, 976 (Bronx Cty. 1989) (Sheindlin, J.) (finding DNA forensic identification technique was generally accepted, but that a hearing was necessary to determine whether the lab had conducted scientifically acceptable tests, finding after a pretrial hearing, that the “testing laboratory failed in several major respects to use the generally accepted scientific techniques and experiments for obtaining reliable results” and ruling the evidence inadmissible).

If this Court denies Mr. Hillary's Frye challenge, it must then answer this foundational inquiry. Should this Court find that STRmix satisfies the Frye standard, the defense respectfully requests that this Court conduct an inquiry into the reliability of the evidence and conduct a hearing to answer this foundational question, see Wesley, supra, J. Kaye concurring, fn2 at 436 (“Frye hearing and foundational inquiry may proceed simultaneously....”).

C. STRmix has not been adequately validated for use with mixture ratios as extreme as those in this case

1. Extreme mixture ratios present interpretational challenges

A mixture ratio is defined as “the relative ratio of the DNA contributions of multiple individuals to a mixed DNA typing result, as determined by the use of quantitative peak height information (SWGDAM 2010).” JOHN BUTLER, ADVANCED TOPICS IN FORENSIC DNA TYPING: INTERPRETATION, p. 136 Elsevier (2014). As STRmix reported in this case, the relative contributions may also be expressed as percentages in the overall sample mixture (e.g., 99 percent and 1 percent). See id.

The amount of DNA each individual contributes to the mixture affects the ability to detect all of the contributors to the mixture. A “minor component of a mixture is usually not detectable for mixture ratios below the 5% level or 1:20.” Furthermore, when a “minor component is at a low level it is subject to stochastic effects...” which significantly impact interpretation and statistical weighting. Id.
at 137.

Critically, it can be difficult in a mixed sample to distinguish between testing artifacts and a low level peak from a minor contributor. Stutter is an artifact of DNA testing. Stutter appears on an electropherogram as a small peak to the immediate left (or sometimes immediate right) of the true allele. When it occurs at one repeat less than the true allele, it is known as “-4 stutter.” When a stutter peak appears to the right of a true allele it is known as “+4 stutter.” To illustrate this, consider the images below. Stutter will often appear in a profile like:

[image not reproduced]

The small blip to the left of the larger peak is stutter in a conventionally typed profile. Compare with two loci from the 550C2 sample in this case:

[image not reproduced]

Stutter complicates interpretation in different ways. Stutter and a real allele may also overlap. Stutter can appear to be a real allele in a complex mixture with peaks present at a range of heights. This is particularly true when the minor component alleles are present at a height in the same range as an expected stutter peak. “When minor alleles have peak heights that are similar in amount to stutters of major alleles, then these stutter peaks and minor alleles are indistinguishable and may need to be accounted for in the interpretation of the profile...”[citations omitted]. JOHN BUTLER, ADVANCED TOPICS IN FORENSIC DNA TYPING: INTERPRETATION, p. 319 Elsevier (2015). This is the case for the sample at issue here. Dr. Buckleton in describing the need for a forward stutter model agrees: “When a peak is in a forward stutter position it is sometimes difficult to ascertain whether it is allelic or an artifact.” Apr. 14, 2016 Buckleton Aff. See also Bruce Budowle, et al., Mixture Interpretation: Defining the Relevant Features for Guidelines for the Assessment of Mixed DNA Profiles in Forensic Casework, J. Forensic Sci., July 2009, Vol. 54, No. 4.
Therefore, the extreme mixture ratio in this case means it is difficult to determine whether a peak is truly from a minor contributor or is a stutter peak. In fact, Dr. Buckleton requested data from the New York State Police to model forward stutter. As Dr. Buckleton stated in his affidavits, six alleles attributed to Mr. Hillary are in the forward stutter position.

2. The mixture proportions/ratios are extreme and STRmix has not been adequately validated for use on them

The mixture proportions in this case were determined by STRmix to be 99% and 1%, respectively, when the sample was analyzed in the November 24, 2015 and December 3, 2015 analysis runs of STRmix. See Nov. 24, 2015 STRmix Summary of Input Data Report at 1; Dec. 3, 2015 Summary of Data Report at 1. When STRmix was run on December 23, 2015, and then again on April 4, 2016, the program calculated the percentages as 100% to 0%.

Before a method may be used in a forensic laboratory, it must be validated to ensure it is reliable. See FBI Quality Assurance Standards (QAS) 8.1, 8.2 and 8.3, available at http://www.fbi.gov/about-us/lab/biometric-analysis/codis/qas_testlabs. Last visited May 26, 2016. There are two types of validation, developmental and internal. Developmental validation is defined under the FBI Quality Assurance Guidelines as “the acquisition of test data and determination of conditions and limitations of a new or novel DNA methodology for use on forensic samples.” Internal validation is defined as “an accumulation of test data within the laboratory to demonstrate that established methods and procedures perform as expected in the laboratory.” FBI QAS, supra.
SWGDAM, a group of American and Canadian forensic scientists representing labs at the local, state and national level, defines a developmental validation of a probabilistic genotyping software system as “the acquisition of test data to verify the functionality of the system, the accuracy of statistical calculations and other results, the appropriateness of analytical and statistical parameters, and the determination of limitations... Developmental validation should also demonstrate any known or potential limitations of the system.” SWGDAM Guidelines for the Validation of Probabilistic Genotyping Systems, 2015, available at http://media.wix.com/ugd/4344b0_22776006b67e4a32a5ffc04fe3b56515.pdf, last visited 5/14/2016.

Developmental validation should include the testing of various mixture proportions in order to evaluate the system’s sensitivity, which measures how well the method can detect a known contributor, and the system’s specificity, which measures how well a method excludes a non-contributor. These studies should be conducted “over a broad variety of evidentiary typing results (to include mixtures and low-level DNA quantities).” Id. at 5-6.

In its developmental validation of STRmix, ESR tested a range of template amounts, number of contributors, and ratios of contributors. This is described in the STRmix V.2.4 User Manual. For two person mixtures, STRmix was tested at amounts of DNA from 100-500pg with ratios of 1:1 to 5:1 (when the prosecutor’s hypothesis was true, i.e. a sensitivity study). See STRmix V.2.4 User Manual, p. 110; Duncan Taylor, Jo-Anne Bright, John Buckleton, The interpretation of single source and mixed DNA profiles, Forensic Sci. Int'l: Genetics 7, 516-528 at 524 (2013).

More extreme ratios than 5:1 were tested several years ago. An experiment was conducted with an unknown, earlier version of STRmix than that used in this case.
The most extreme proportion STRmix calculated for two-person mixtures in that study is .09 to .91, or 9 to 91, which is less extreme than the proportions STRmix assigned in this case. Only three samples, each of a different template, were used at this ratio, and amplified in triplicate.* Duncan Taylor, Using continuous DNA interpretation methods to revisit likelihood ratio behavior, Forensic Sci. Int’l: Genetics 11, 144-153 at 145 (2014). This is insufficient to demonstrate that STRmix can generate reliable statistics for a sample at the ratio present in this case. Moreover, the NYSP lab did not conduct an internal validation of STRmix, so the lab did not test how STRmix would perform on mixture ratios this extreme. Not every conceivable mixture ratio can be tested; that would be impracticable and unnecessary. What is necessary is to establish bounds or limits. For instance, if a lab tests 20:1 and 10:1 mixtures, it may not be necessary to test 17:1 samples, because that ratio falls within a range that has been tested.

* Amplified in triplicate means that the 3 samples each underwent the amplification stage of DNA testing 3 times, so that there were a total of 9 tests.

The issue of whether STRmix can be applied to such extreme ratios appears to be one of first impression for New York courts. No such challenge appears to have been raised in Bullard-Daniel, supra.

D. Confidence in STRmix’s ability to correctly interpret this type of sample should be questioned

It is clear that this sample presented a challenge to STRmix and that this Court cannot be assured that the answer reached by STRmix is accurate or was produced by reliable means. ESR has already generated three different likelihood ratios, using different models. Modeling should be done in the developmental stage, not during the application of the method to an actual case. This is a function of the inability of STRmix to explain the observed data well.
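The bounds argument made above can be illustrated with a short sketch. It is illustrative only and is not part of any STRmix validation study: the function names are hypothetical, and the only figures used are those cited in this memorandum (the 9-to-91 proportion reported as the most extreme two-person mixture in the cited study, the 1:1 lower end of the developmental validation range, and the 99%/1% and 100%/0% proportions assigned in this case).

```python
from fractions import Fraction


def major_to_minor_ratio(major_pct, minor_pct):
    """Return major:minor as an exact Fraction, or None if the minor share is zero."""
    if minor_pct == 0:
        return None  # a 100%/0% split has no finite ratio
    return Fraction(major_pct, minor_pct)


def within_tested_bounds(ratio, lowest_tested, highest_tested):
    """True only when the ratio is finite and lies inside the tested range."""
    if ratio is None:
        return False
    return lowest_tested <= ratio <= highest_tested


# Hypothetical lab example from the text: tests at 10:1 and 20:1 cover 17:1.
lab_example_covered = within_tested_bounds(Fraction(17), Fraction(10), Fraction(20))

# Figures cited in the text: most extreme tested proportion is 9 to 91 (about 10.1:1).
most_extreme_tested = major_to_minor_ratio(91, 9)
case_ratio_nov_dec = major_to_minor_ratio(99, 1)   # 99% / 1% runs
case_ratio_later = major_to_minor_ratio(100, 0)    # 100% / 0% runs

case_covered = within_tested_bounds(case_ratio_nov_dec, Fraction(1), most_extreme_tested)
later_covered = within_tested_bounds(case_ratio_later, Fraction(1), most_extreme_tested)

if __name__ == "__main__":
    print("17:1 covered by 10:1 to 20:1 tests:", lab_example_covered)
    print("99:1 covered by 1:1 to 91:9 tests:", case_covered)
    print("100%/0% covered:", later_covered)
```

Under these assumptions, a 17:1 mixture falls inside a tested range of 10:1 to 20:1, a 99:1 mixture falls outside a range whose upper limit is roughly 10:1, and a 100%/0% split yields no finite ratio to compare at all.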
This continued revision also raises the possibility of subjectivity and bias in the interpretation. Critically, NYSP did not conduct an internal validation study in which NYSP equipment, protocols, personnel and data were used in the testing of STRmix.

First, parameters incorporated by STRmix on which it relies to generate results did not come from the New York State Police lab. On December 4, 2015, Dr. Bright emailed Meegan Fitzpatrick, a scientist at NYSP, and requested drop-in data because that function had been turned off, but during technical review (review of the results by another scientist at ESR) this was questioned. Drop-in is the detection of sporadic alleles in a sample, the origin of which cannot be said to come from the crime scene sample; in other words, drop-in is sporadic contamination. As the STRmix V.2.4 User Manual states, it’s like “alleles snowing from the ceiling.” Id. at 146.

In response to Dr. Bright’s request, Meegan Fitzpatrick stated, “We currently do not have any data on drop in. We do not have a low copy number protocol or validation in house to provide any data.” Yet, for STRmix, “[d]rop-in parameters are defined individually for a specific laboratory and are determined as part of the implementation process for STRmix within that laboratory.” ESR requested the drop-in data and NYSP could not provide it. In fact, other data used in modeling parameters incorporated into the STRmix statistical analysis did not come from the NYSP at all: instead, ESR used either the default settings ESR had developed or took them from a lab in Toronto using the same genetic analyzer and kit. See Apr. 14, 2016 Buckleton Affidavit (“Our standard operating settings and parameters for Identifiler 3500 data were applied except that values for saturation, allele and stutter variance and locus specific amplification efficiencies.
These were taken from a dataset validated using Identifiler Plus data analyzed on a 3500 CE instrument undertaken at the Toronto Centre of Forensic Sciences.”)

Second, certain data that were provided proved difficult to model adequately. After already running STRmix on this sample, Dr. Buckleton developed a forward stutter model, critical in a case where six of the eleven peaks corresponding to Mr. Hillary’s alleles are in forward stutter positions in a mixture with an extreme ratio. To this end, on December 15, 2015, Dr. Buckleton requested from Julie Pizziketti, Director of Biological Science at the NYSP, “100 single source [samples] with all stutter filters off” because he “hit a bit of a snag at TR for your case. I need to firm up on forward stutter. Many of your peaks are in forward stutter positions.” Dr. Buckleton received this data:

    In order to inform this we analyzed a set of data provided by the New York State Police. This analysis has not proven highly successful and this is because I did not specify the required data well. I have ended up with 136 useful data points of which only 2 show forward stutters...

Jan. 3 Draft affidavit, Dec. 23 Affidavit. When attempting to develop or evaluate a model describing forward stutter, a sample size of two is insufficient. Again, NYSP did not conduct an internal validation where STRmix would be tested in that lab using those protocols, equipment, personnel, and data.

ESR should have anticipated these issues from the start: as Dr. Jo-Anne Bright, one of the developers of STRmix and the scientist who ran the analyses in this case, communicated to the NYSP lab director and her colleagues after reviewing the electropherograms of sample 550C2, “[t]hese are at the very edge c ity of any interpretation method.” Nov. 3, 2014 email from Jo Bright to John Buckleton, William Fitzpatrick et al. Yet the decision was made to push the envelope anyway and test it, even without an adequate model in place.
After results were generated, ESR went back and attempted to compensate for the lack of adequate modeling. More studies would need to be performed to ensure STRmix is reliable for use on mixtures with such extreme ratios as those present in this case. Therefore, this Court has no basis on which to conclude that the application of STRmix in this particular case was reliable and must preclude the evidence.

POINT III

THE STRMIX RESULTS SHOULD ALSO BE PRECLUDED BECAUSE THEIR PROBATIVE VALUE IS SUBSTANTIALLY OUTWEIGHED BY THE DANGER THEY WILL PREJUDICE THE DEFENDANT

Even if this Court finds that testimony concerning the STRmix results is admissible under the Frye/Wesley standard, it should still exclude the evidence because its probative value is substantially outweighed by the danger it will prejudice the defendant and mislead the jury.

“Evidence is relevant if it has any tendency or reason to prove the existence of any material fact, i.e., it makes the determination of the action more probable or less probable than it would be without the evidence.” People v. Scarola, 525 N.E.2d 728, 732 (1988). The general rule is that evidence which tends to prove a material fact in a case is admissible unless precluded by an evidentiary rule. People v. Wilder, 93 N.Y.2d 352 (1999); People v. Buie, 86 N.Y.2d 501, 509 (1995). As such, “[n]ot all relevant evidence is admissible as of right ... Even where technically relevant evidence is admissible, it may still be excluded by the trial court in the exercise of its discretion if its probative value is substantially outweighed by the danger that it will unfairly prejudice the other side or mislead the jury.” Id.

Here, testimony concerning the STRmix results will confuse and mislead the jury. There are three STRmix results, and it is reasonable to assume that the prosecution will elicit ESR’s account of the reasons behind those differences, which will involve the presentation of highly technical testimony.
The risk that the jury will be bewildered by the different modeling behind the results and how they relate to one another is great. Alternatively, this Court should hold a hearing to determine whether the probative value of this evidence is substantially outweighed by the danger that it will unfairly prejudice the defendant and mislead the jury.

WHEREFORE, this Court should preclude the prosecution from introducing any evidence about or produced via the STRmix software in this case or, in the alternative, order a Frye hearing, and grant such other relief as this Court deems just and proper.

TO:
THE HON. MARY RAIN
District Attorney, St. Lawrence County

Clerk of the Court
St. Lawrence County

Earl S. Ward
600 Fifth Avenue, 10th Floor
New York, NY 10020
(212) 763-5000

Declaration of Nathaniel Adams

1. I have a Bachelor of Science degree with a major in Computer Science from Wright State University (Dayton, Ohio). I am enrolled in the Graduate School at Wright State University, pursuing a Master of Science degree in Computer Science. I am employed as a Systems Engineer at Forensic Bioinformatic Services, Inc. in Fairborn, Ohio. My duties include analyzing electronic data generated during the course of forensic DNA testing; reviewing case materials; reviewing laboratory protocols; and performing calculations of statistical weights, including custom simulations for casework and research. I actively use, develop, and maintain a number of software programs to assist with these efforts. I have been involved in several reviews of probabilistic genotyping analyses in criminal cases, including STRmix™. In 2014 I attended a week-long workshop on interpreting forensic DNA mixtures, including a day-long session on STRmix™.
In January 2016, I was retained in a criminal case unrelated to NY v Hillary and inspected the source code of STRmix™. Due to a non-disclosure agreement that I signed, I am not allowed to discuss the findings of my code inspection of STRmix™ outside of that particular case.

2. As an employee of Forensic Bioinformatics, I have had the opportunity to examine the scientific literature directly relating to the STRmix™ program and the application of STRmix™ to certain criminal cases internationally.

3. For purposes of this Declaration, I am restricting my comments to any public evidence as to whether STRmix™ has adhered to specific industry standards and practices recognized and used in the field of software development and engineering for validation of software systems.

4. Professor David Balding describes difficulties in assessing likelihood ratio (LR) calculations for low template DNA [LTDNA] samples in his article “Evaluation of mixed-source, low-template DNA profiles in forensic science” (D. J. Balding. Proc. Natl. Acad. Sci. U. S. A. July 2013. 110(30):12241-6. Available at: http://www.pnas.org/content/110/30/12241.full):

    There is no “gold standard” test of an LR calculation for LTDNA profiles. Likelihoods reflect uncertainty, and even when the profiles of the true contributors are known in an artificial simulation, this does not tell us what is the appropriate level of uncertainty justified by a given observation affected by stochastic phenomena. Likelihoods depend on modeling assumptions, and there can be no “true” statistical model for a phenomenon as complex as an LTDNA profile.

5. The difficulty in properly validating software like STRmix™ is described in C. D. Steele and D. J. Balding, “Statistical evaluation of forensic DNA profile evidence,” Annu. Rev. Stat. Its Appl., vol.
1, 2014, Section 5.1, “Quality of Results”:

    Laboratory procedures to measure a physical quantity such as a concentration can be validated by showing that the measured concentration consistently lies within an acceptable range of error relative to the true concentration. Such validation is infeasible for software aimed at computing an LR [likelihood ratio] because it has no underlying true value (no equivalent to a true concentration exists). The LR expresses our uncertainty about an unknown event and depends on modeling assumptions that cannot be precisely verified in the context of noisy [crime scene profile] data. Some progress can be made in evaluating the validity and performance of software. Courts need these kinds of evaluations to have confidence in the results of software-based forensic analyses. Open source software is highly desirable in the court environment because openness to scrutiny by any interested party is an invaluable source of bug reports and suggestions for improvement.

6. In comparing a suspect’s DNA profile to an evidence sample, STRmix™ generates a likelihood ratio (LR). Because no “ground truth” LR value exists against which STRmix™ results can be compared for any mixture, we must base our confidence in the program on two factors:

a. The appropriateness of the models used. This factor is generally within the domain of biologists and statisticians.

b. The degree of fidelity with which these models have been translated from theory/concept to source code for execution as a software program. This factor is generally within the domain of software developers/engineers.

7. Industry practices as well as specific standards exist for the development and validation of software systems in order to determine their fitness for purpose from a software engineering perspective.
For example, the Institute of Electrical and Electronics Engineers (IEEE), the International Organization for Standardization (ISO), and the Association for Computing Machinery (ACM) have promulgated standards for software engineers to utilize during the software development process described in Appendix A. As of May 26, 2016, IEEE’s collection of “Systems and Software Engineering” standards (available at: https://standards.ieee.org/cgi-bin/lp_index?type=standard&coll_name=software_and_systems_engineering&status=active) contains 132 active standards, including “730-2014 - Software Quality Assurance Processes,” “982.1-2005 - Standard Dictionary of Measures of the Software Aspects of Dependability,” “1012-2016 - IEEE Approved Draft Standard for System, Software and Hardware Verification and Validation,” and “29119-1-2013 - Software and systems engineering - Software testing - (Parts 1-4).”

8. I have not seen sufficient documentation demonstrating STRmix™’s fidelity to its intended use, i.e., that it has been rigorously tested, validated, or verified using current software engineering practices described in standards such as those above. I have not seen formal descriptions of its intended use or demonstrations that its actual operations adhere to its intended use. Examples of materials important for evaluating software systems in this manner include, but are not limited to:

a. Formal software requirements and specifications documents;

b. The source code, including code utilized for testing purposes;

c. The software test plan describing the testing that is or should be conducted;

d. The software test report describing the results of the testing conducted; and

e. Logs pertaining to maintenance of the code; version changes; user change requests; error/bug reports; and installation or performance issues.

9.
I have observed few formal references to industry standards or even general practices in relation to the development and use of many probabilistic genotyping software systems, including STRmix™, such as the standards mentioned above or the process described in Appendix A.

10. When claiming that a software system has been validated or verified as operating correctly, such claims should be made in the context of “how, by whom, and by what standard?” If claims of validation and verification of a software system are made in accordance with a specific, formal standard, such claims should be demonstrable by way of a citation of that standard as well as supporting documentation generated during the course of development, testing, and validation/verification of that software system. These materials should be readily available to the software developer because they are important components of the software development process. For the purposes of independent validation/verification, these materials could be audited by an outside group of experts, ideally involving software developers, biologists, and statisticians.

11. I am aware of no software engineering standards specific to the field of probabilistic genotyping. I am not aware of any recommendations made by regulatory or guidance bodies in the field of forensic DNA that the development or use of probabilistic genotyping software systems adhere to specific, formal software engineering standards.

12. Advocacy for increased transparency of software in the greater scientific community has been repeatedly made. Examples include:

    Requiring that source code be made available upon publication would also be expected to yield substantial benefits—including improved code quality, reduced errors, increased reproducibility, and greater efficiency through code reuse and sharing. Achieving this would bring disclosure and publication requirements for computer codes in line with other types of scientific data and materials. (A.
Morin, et al. “Shining Light into Black Boxes.” Science. 2012 April 13; 336(6078): 159-160. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4203337/)

    Our view is that we have reached the point that, with some exceptions, anything less than release of actual source code is an indefensible approach for any scientific results that depend on computation, because not releasing such code raises needless, and needlessly confusing, roadblocks to reproducibility. (D.C. Ince, et al. “The case for open computer programs.” Nature. 2012 February 22; 482(7386): 485-488. Available at: http://www.nature.com/nature/journal/v482/n7386/full/nature10836.html)

13. While knowledge of a software system’s source code is an important component of an independent validation and verification, the body of software engineering materials generated during the course of development is important to contextualize the source code (the software implementation) in light of the intended use of the program, i.e., the formal software requirements and specifications.

14. I understand that several developers of probabilistic genotyping software systems, including the developers of STRmix™, are not interested in open source (publicly available) distributions of their programs. I have been and continue to be willing to assist with reviews of source code and associated software engineering materials under non-disclosure agreements or protective orders.

I declare the above is true and correct under the penalty of perjury under the laws of the State of New York, executed this 27th day of May, 2016, at Fairborn, Ohio.

Nathaniel Adams

Appendix A

The Software Development Process*

The Software Development Process is composed of a series of stages, which are generally divided into:

1. Requirements - What the program should do, generally written in English and including visuals where applicable.

2.
Specification - What the program should do, generally written in a combination of technical English and mathematical notation, with diagrams where applicable. A precise, technical description of the Requirements.

3. Design - How the program should perform the tasks specified in the Requirements and Specification documents, generally written in a combination of technical English, pseudocode**, and diagrams, where applicable. This is the structure of the program with descriptions of how its components fit together.

4. Implementation - Translation of the design to a programming language.

5. Testing - Verification that the program produced by the Implementation stage adheres to the tasks described in Requirements and Specifications.

6. Maintenance - Upkeep of the program, including fixes for errors (“bugs”) as well as additions of new features or revisions of current features.

The Software Development Process is a cyclical process: non-trivial programs invariably require revisiting earlier stages of the process. There are many good reasons for revisiting earlier stages of the process: when prior assumptions turn out to be invalid or incomplete; requirements change; technologies change; users request a change; a process can be improved; or performance can be improved.

* Software development process: “the process by which user needs are translated into a software product. NOTE The process involves translating user needs into software requirements, transforming the software requirements into design, implementing the design in code, testing the code, and sometimes, installing and checking out the software for operational use. These activities may overlap or be performed iteratively.” (ISO/IEC/IEEE, “Systems and software engineering - Vocabulary,” ISO/IEC/IEEE 24765:2010(E), vol. 2010.)

** Pseudocode: “A notation resembling a simplified programming language, used in program design; esp.
(a) one that is translated by a computer into a programming language; (b) one consisting of expressions in natural language syntactically structured like a programming language, used to represent programs that are later written by a programmer.” (Oxford English Dictionary; accessed March 2016)

Figure A1 - The Software Design Process. (N.D. Adams and D.E. Krane. “Black boxes and due process: Transparency in expert software systems.” Oral presentation. American Academy of Forensic Sciences annual meeting, 2016. Las Vegas, Nevada, USA.)
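As a minimal sketch of the Testing stage described in this Appendix, the following shows a specification written as a function contract, together with a handful of checks that the implementation adheres to it. The function, its name, and the test cases are hypothetical and chosen only for illustration. The sketch uses the general textbook definition of a likelihood ratio, LR = P(E | Hp) / P(E | Hd), and does not represent how STRmix™ or any other probabilistic genotyping program computes its results.

```python
def likelihood_ratio(p_evidence_given_hp, p_evidence_given_hd):
    """Hypothetical specification: LR = P(E | Hp) / P(E | Hd).

    Both inputs must be probabilities, and P(E | Hd) must be greater than zero.
    """
    if not 0.0 <= p_evidence_given_hp <= 1.0:
        raise ValueError("P(E | Hp) must lie between 0 and 1")
    if not 0.0 < p_evidence_given_hd <= 1.0:
        raise ValueError("P(E | Hd) must be greater than 0 and at most 1")
    return p_evidence_given_hp / p_evidence_given_hd


def run_specification_tests():
    """Check the implementation against cases worked out by hand from the specification."""
    results = {}
    results["equal probabilities give an LR of 1"] = likelihood_ratio(0.5, 0.5) == 1.0
    results["hand-computed case 0.5 / 0.25"] = likelihood_ratio(0.5, 0.25) == 2.0
    try:
        likelihood_ratio(0.5, 0.0)
        results["zero denominator is rejected"] = False
    except ValueError:
        results["zero denominator is rejected"] = True
    return results


if __name__ == "__main__":
    for description, passed in run_specification_tests().items():
        print("PASS" if passed else "FAIL", "-", description)
```

A test report of the kind listed among the materials in paragraph 8 would record, for each such check, what was tested, against which requirement, and with what result.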