Received: 2 December 2020 Revised: 28 April 2021 Accepted: 17 May 2021
DOI: 10.1002/wfs2.1434

PERSPECTIVE

Explainable artificial intelligence for digital forensics

Stuart W. Hall¹ | Amin Sakzad¹ | Kim-Kwang Raymond Choo²

¹Department of Software Systems and Cybersecurity, Faculty of Information Technology, Monash University, Melbourne, Australia
²Department of Information Systems and Cyber Security, University of Texas at San Antonio, San Antonio, Texas, USA

Correspondence
Amin Sakzad, Department of Software Systems and Cybersecurity, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia.
Email: amin.sakzad@monash.edu

Edited by: Sara Belkin, Executive Editor

Abstract
EXplainable artificial intelligence (XAI) is an emerging research area relating to the creation of machine learning algorithms from which explanations for outputs are provided. In many fields, such as law enforcement, it is necessary that decisions made by and with the assistance of artificial intelligence (AI)-based tools can be justified and explained to a human. We seek to explore the potential of XAI to further enhance the triage and analysis of digital forensic evidence, using examples of the current state of the art as a starting point. This opinion letter will discuss both practical and novel ideas as well as controversial points for leveraging XAI to improve the efficacy of digital forensic (DF) analysis and to extract forensically sound pieces of evidence (also known as artifacts) that could be used to assist investigations and potentially in a court of law.

This article is categorized under:
Digital and Multimedia Science > Artificial Intelligence
Digital and Multimedia Science > Cybercrime Investigation

KEYWORDS
artificial intelligence, digital artifacts, digital evidence, digital forensics, EXplainable artificial intelligence

1 | INTRODUCTION TO XAI

A machine learning algorithm typically runs two distinct operations for learning and classification (Burrell, 2016).
Learners are generally “trained” by being provided input data already classified with an appropriate output. Then,
based on the learned data, the algorithm can produce classifications for new inputs. Machine learning comes in differ-
ent types, including supervised and unsupervised learning, reinforcement learning and neural networks. Supervised
learning is where the learning occurs when the system is fed precategorized or labeled training data. Unsupervised
learning is learning that occurs without labeled input data, allowing the algorithm to find its own patterns within the
data. These two types of learning can also be used in conjunction with each other. Reinforcement learning is another type of machine learning, in which feedback is provided to the system during the training process. One of the key prob-
lems extant in machine learning is the inability of human users to understand exactly how a model came to a determi-
nation. EXplainable artificial intelligence (XAI) seeks to address this “black box” problem, where machine learning
algorithms often make decisions through methods which cannot be explained in specifics.
Doran et al. (2017) outline four categories of XAI based on the interpretability present in a particular XAI model, namely: opaque systems, interpretable systems, comprehensible systems, and truly explainable systems (see also Table 1).

Abbreviations: DF, digital forensics; XAI, eXplainable artificial intelligence.


TABLE 1 Categorization of XAI models based on their level of interpretability, with brief descriptions

Interpretability        Features
Opaque (black-box)      No insight into the algorithm; provides only output data
Interpretable           Formulas/methodologies known; transparent in nature; provides only output data
Comprehensible          No insight into the algorithm; provides textual/feature-based references; justifies decision making
Truly explainable       Contextualizes input data; justifies output in terms recognizable to a human; user-centric by design

Opaque systems are those in which the mechanisms/algorithms/approaches mapping input data to output data are not visible to the user (Doran et al., 2017). These systems are often closed-source and proprietary in nature, developed by software manufacturers to meet clients' specific needs. Opacity in AI introduces additional challenges when informing decisions in the public sector, where there is a greater requirement for transparent and justified decision making. If transparency in a system cannot be provided, then post-hoc explanations can be used to "justify the outputs in terms of rationalisations of the systems workings rather than trying to show the actual workings" (Preece et al., 2018). Existing instances of AI's use in digital forensics are generally opaque in nature and require validation by a user (see Section 2.2). Opaque AI assists in evidence discovery rather than interpretation. For example, a file may be flagged as being of interest, but a digital forensic (DF) examiner must still examine the file's contents/metadata to determine the validity of the AI's classification. Furthermore, no reasoning for the AI's classification is provided.
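As a concrete illustration of such post-hoc rationalization, the following minimal sketch (our own, not drawn from the cited works) uses the LIME library to rationalize a single prediction of an opaque classifier. The model, feature names, and class labels are invented stand-ins for a vendor's proprietary tool.

```python
# Minimal sketch: post-hoc rationalization of an opaque classifier with LIME.
# The model, feature names, and labels are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X = rng.random((200, 3))                       # synthetic file-level features
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)      # synthetic "of interest" label

opaque_model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=["entropy", "size_kb", "keyword_hits"],  # hypothetical
    class_names=["benign", "of_interest"],
)
# Rationalize one prediction without exposing the model's internals.
exp = explainer.explain_instance(X[0], opaque_model.predict_proba, num_features=3)
print(exp.as_list())   # [(feature condition, weight), ...]
```

The explanation justifies the output after the fact; it does not reveal the model's actual workings, which is precisely the limitation Preece et al. (2018) describe.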
In an interpretable system, the user can not only see, but also study and understand how inputs are mathematically mapped to outputs (Doran et al., 2017). Arrieta et al. (2020) further differentiate between interpretability ("the ability to explain or to provide meaning in terms understandable to a human") and understandability, for which transparency and interpretability are necessary. They also acknowledge the audience of an AI's output as a variable affecting its explainability. This is a significant factor that must be considered for models implemented within the DF field, where technical information must be communicated to individuals with different levels of technical background knowledge. Not only does the DF field require that AI be transparent, but also that its decision making and reasoning be accurate, comprehensible to courts, and explainable during court proceedings. Interpretability can be provided through methods including, but not limited to, linear/logistic regression, generalized linear models, generalized additive models, decision trees, decision rules, and the RuleFit algorithm (Molnar, 2020).
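As an illustration of one interpretable method from this list, the sketch below fits a shallow decision tree whose learned rules can be printed and read directly; the feature names are hypothetical and the data synthetic.

```python
# Minimal sketch: a shallow decision tree, one of the interpretable methods
# listed above. Its input-to-output mapping can be printed and read directly.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.random((300, 2))                 # hypothetical artifact features
y = (X[:, 1] > 0.6).astype(int)          # synthetic relevance label

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["last_access_days", "match_score"]))
```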
A comprehensible system emits symbols along with its output that allow a user to relate inputs to their corresponding output properties (Doran et al., 2017). In these systems, different detected features of the input data act as keys to specific values that are added to the outputs, presenting users with pointers to features detected within the data. These systems thus provide a form of commentary on their output, in contrast to the bare output given by an opaque system. This commentary can take the form of confidence scores, which build trust, or of contrastive/counterfactual information, which provides insight into a model (Chari et al., 2020). For a system to be explainable in a manner suitable for the DF field by being self-evidential, it must be both interpretable and comprehensible. Justification must be given for the system's output, and those justifications must follow from the inputted data. When investigating applications of AI systems in DF, Costantini et al. (2019) found that, with suitable tools using new methods for visualizing and explaining the results of computed answers (such as argumentation schemes), AI could be used "to explain the conclusions (and their proof) in a transparent, comprehensible and justified way." It follows that the DF field requires a truly explainable AI system to address issues including wait times, case load, and high data volume.
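A minimal sketch of such a comprehensible output, in which a prediction is accompanied by a confidence score and symbols naming the detected features that drove it, might look as follows; the feature names and model are hypothetical.

```python
# Minimal sketch: a "comprehensible" output pairing the classification with a
# confidence score and symbols naming the features that drove it.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["exif_gps_present", "filename_keyword", "hash_db_match"]  # hypothetical

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(400, 3)).astype(float)   # synthetic binary features
y = (X[:, 2] == 1).astype(int)                        # synthetic label

model = LogisticRegression().fit(X, y)

def comprehensible_output(x):
    conf = model.predict_proba([x])[0, 1]
    contributions = model.coef_[0] * x               # per-feature contribution
    symbols = [f for f, c in zip(FEATURES, contributions) if c > 0]
    return {"of_interest": bool(conf > 0.5),
            "confidence": round(float(conf), 2),
            "detected_features": symbols}

print(comprehensible_output(np.array([1.0, 0.0, 1.0])))
```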
A truly explainable AI system formulates a line of reasoning that explains the decision-making process of a model
using human-understandable features of the input data (Doran et al., 2017). Truly explainable AI makes use of knowledge bases which contextualize analyzed data within the system and thus provide a mechanism through which individual findings can be deconstructed, in various forms, to provide a justification for the model's interpretation that logically follows from the provided input data. Wang et al. (2019) emphasize that such models must be user-centric, highlighting that prior studies have failed to justify specific implementations of their lines of reasoning, while providing background on the links between human reasoning and XAI systems. Further to this, systems must prioritize the needs of end users (e.g., different user groups within a specific industry, or different stakeholders within the legal system) to be trusted. Developers of these systems should work with the individuals who will employ them, or else risk creating systems whose responses are too abstruse or jargon-laden to be interpretable (Miller et al., 2017).
Finally, if the output of a model can be validated through the use of logically consistent and self-generated justifica-
tions, the need for the transparency of the decision-making model to the user is reduced, but not eliminated, which
could potentially allow software providers to protect proprietary technologies.
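The following toy sketch illustrates, under our own simplifying assumptions, what a self-generated justification drawn from a knowledge base might look like; the knowledge base entries and artifact names are hypothetical.

```python
# Illustrative-only sketch: a finding is contextualized against a small
# knowledge base and rendered as a line of reasoning a layperson could follow.
# The knowledge base entries and artifact names are hypothetical.
KNOWLEDGE_BASE = {
    "prefetch_entry": "Windows creates a prefetch entry when a program is run",
    "lnk_file": "Windows creates a .lnk shortcut when a file is opened",
}

def justify(finding: str, artifact: str, timestamp: str) -> str:
    background = KNOWLEDGE_BASE[finding]
    return (f"Conclusion: {artifact} was accessed at {timestamp}. "
            f"Reasoning: {background}; such an entry referencing {artifact} "
            f"and bearing that timestamp was found on the device.")

print(justify("lnk_file", "report.docx", "2021-03-04 09:12"))
```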

2 | CURRENT AI FOR DF

In what follows, we first position the state of AI in DF and then list some DF suites that already leverage AI in their
systems.

2.1 | The state of digital forensics

It was acknowledged in the 2009 U.S. National Academy of Sciences report Strengthening Forensic Science in the United States: A Path Forward (National Research Council, 2009) that the DF discipline had "undergone a rapid maturation process" in the decade prior to the report's publication. The report identified three issues undermining the standardization of DF analysis: (1) no standardized certification process for DF practitioners, (2) DF examination being treated as an investigative, rather than forensic, process, and (3) the large variance in the training and experience of DF practitioners. It also suggested that the minimum qualification for DF practitioners should be a Bachelor of Science, a Bachelor of Computer Science, or a Master of Science degree, and recommended that forensic laboratories be removed from the administrative control of law enforcement agencies or prosecutors' offices in the United States. However, the report's recommendations are yet to be widely adopted or accepted into the mainstream (Kafadar, 2019). In 2020, the UK Forensic Capability Network partnered with the National Police Chiefs' Council and the Association of Police and Crime Commissioners to produce the UK Digital Forensic Science Strategy (Forensic Capability Network, Transforming Forensics, National Police Chiefs' Council, Association of Police and Crime Commissioners, 2020), a strong indicator that these groups view DF as an investigative and forensic process within the purview of law enforcement. The strategy identified three core challenges facing DF (i.e., volume of data, complexity of devices, and legitimacy of forensic processes) and seven "pressing issues."
The latter are lack of support services (e.g., DF administration, IT support, etc.), a fragile commercial marketplace, limited
strategic engagement with academic and industry partners, recruitment and retention, lack of DF science awareness
within policing, limited embedded quality (e.g., standardization, qualification, and methodology validation), and handling
of legacy data (retention of reports, case data, forensic images, etc.). Acknowledging that efficient and effective integration
of data is a core function of DF, the authors stated that “implementing improved technology for search and review,
including… use of artificial intelligence (AI) and machine learning (ML) techniques” must be “prioritized” (Forensic
Capability Network, Transforming Forensics, National Police Chiefs' Council, Association of Police and Crime
Commissioners, 2020, p. 27), and acknowledged AI as a key emerging area where law-enforcement could collaborate with
industry and academia (Forensic Capability Network, Transforming Forensics, National Police Chiefs' Council, Associa-
tion of Police and Crime Commissioners, 2020, p. 28).
Another report, prepared by the U.S. President's Council of Advisors on Science and Technology (PCAST) (National District Attorneys Association [NDAA], 2016), explores issues within modern DF practice. This report found that forensic disciplines in the U.S. lacked a "quality culture" (NDAA, 2016, p. 33) and that nearly all crime laboratories were closely tied to the prosecution of cases. The applicability of some key aspects of this report to XAI comes from the background it provides on scientific validity in U.S. courts and on the scientific requirements for forensic processes: thresholds which XAI systems must meet if they are to be accepted as part of the DF process. The authors highlight that the U.S. Supreme Court held that "the admissibility of scientific expert testimony depended on its scientific reliability" (NDAA, 2016, p. 41). The authors also stressed
that forensic processes need to stand up to scientific scrutiny by being repeatable, reproducible, and accurate. Ideally, any
methodology should be objective with “precise definitions” of “feature identification, comparison and matching”
(NDAA, 2016, p. 48). Since this is not possible, and “the black box in the examiner's head cannot be examined directly for
its foundational basis in science,” the authors stress the need for empirical study of examiners' performance in “black-box
studies” to determine a likelihood of error based on the generalized performance of the forensic analyst cohort
(NDAA, 2016, p. 49). A truly explainable AI with a rationalizable “black-box” could provide greater confidence in subjec-
tive forensic methodologies, by being both transparent in process and providing justified and logical reasoning.

2.2 | AI-based DF suites

If any of these changes are going to happen, it will almost certainly be over the long term. Examination of three key DF software providers' current AI offerings gives insight into how far the industry still must come. Magnet Forensics, manufacturer of the Magnet Axiom forensic examination tool, implemented machine learning in its Magnet.AI module,
which categorizes user chat and image files from evidence items into specific classifications (child abuse material [CAM], drugs, weapons, etc.). Evidence presented by this tool is tagged for review by a DF examiner; no further information regarding the case is generated or analyzed (Magnet Forensics, 2017). The Griffeye (formerly NetClean) software suite is reportedly used by over 4000 law enforcement agencies worldwide (Griffeye, 2020) and features a free add-on module called Griffeye Brain that undertakes the same kind of categorization of images, videos, and audio files as Magnet Axiom. Cellebrite's Physical Analyzer software (formerly UFED Physical Analyzer) has recently integrated AI into its analytic processes (Carmeli, 2020). Figure 1 represents the state-of-the-art AI-based vendor implementation model.

FIGURE 1 AI in DF: state-of-the-art in vendor implementations (a pretrained opaque AI system tags suspect data from digital forensics data; a DF examiner validates the tags and generates reports for an organization such as a court)
Each of these approaches to AI seeks to reduce the need for human interaction with the data, not eliminate it. It is hard to envision a scenario in which the human element could be completely removed from the DF process; a more realistic approach would be to move from human-in-the-loop to human-on-the-loop in this context. In particular, each of Magnet, Cellebrite, and Griffeye's uses of AI (Carmeli, 2020; Griffeye, 2020; Magnet Forensics, 2017) is via proprietary data analytics undertaken through their individual forensic platforms, with results that may be replicated by other tools. Regardless of their accuracy, the items tagged still require validation by a user/examiner, who assesses their relevance to the case and then reports on them as necessary. Although completely removing human interaction from the DF process is a worthy aspiration to explore, such a move might not be welcomed by the legal system and society at large. Hence, where in the process the human could be removed under a human-on-the-loop model remains to be considered in an independent study.

3 | FUTURE OF AI FOR DF

Going beyond the automation present in specific tools, Du et al. (2020) summarize the state of the art for IT forensics using AI, addressing the applicability of AI to data discovery/recovery, device triage, network traffic analysis, encrypted data forensics, timeline/event reconstruction, and multimedia forensics. Each of these areas will have variable utility and applicability to different types of DF examiners, whether enterprise or individual. One example is how AI/ML can help triage disturbing material, such as child abuse imagery. In such a framework, XAI can be exploited to assist a DF examiner by, for example, classifying images while protecting the examiner's mental health, so that they do not need to view so many disturbing images. Instead, they can look at similarity scores/parameters of an image, or read specific predetermined explanations of images (generated by the XAI/ML algorithm) without looking at the images themselves.
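One plausible mechanism for such similarity scores, sketched below under our own assumptions (the text does not prescribe a technique), is perceptual hashing: an image is scored by its distance to known-contraband hashes so the examiner can triage by number rather than by viewing. The file path and hash value are placeholders, and the imagehash library is assumed to be available.

```python
# Sketch: score an image's similarity to a known-contraband hash list so an
# examiner can triage by distance rather than by viewing the image.
# Paths and hash values are placeholders; requires Pillow and imagehash.
from PIL import Image
import imagehash

# Placeholder 64-bit perceptual hashes standing in for a reference database.
KNOWN_HASHES = [imagehash.hex_to_hash("ffd8e0c0b0a09080")]

def similarity_report(path, threshold=8):
    h = imagehash.phash(Image.open(path))
    best = min(h - known for known in KNOWN_HASHES)   # Hamming distance
    return {"file": path, "min_distance": best, "flagged": best <= threshold}

print(similarity_report("evidence/IMG_0042.jpg"))     # hypothetical path
```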
From a law enforcement perspective, XAI may, at first glance, appear to have limited use in assisting the investigative process. While emphasis is theoretically placed on law enforcement DF examiners not displaying any bias and producing impartial reports, most DF analysis is completed after a suspect has been arrested, when evidence against the individual in question is already believed to exist. Investigators in these cases want DF specialists to extract data from devices into an easily reportable format for presentation to the courts, something which could potentially be automated by XAI. Despite the real challenges that exist for the integration of XAI into the DF field, XAI presents
opportunities to deal with old problems that are becoming increasingly challenging to address. Increases in the volume
of data required for analysis, the complexity of offenses committed using IT infrastructure, and diversity of evidence
items have all made the reporting of valuable DF insights more time-consuming and more reliant on resource-intensive
data processing (Du et al., 2020). In what follows, we explore the opportunities presented by XAI to deal with problems
currently extant in DF from a law enforcement perspective.

3.1 | Changes to DF examiners' roles/tasks

If XAI becomes widespread in the DF field, the role of a DF examiner will almost certainly change as well. The integration of XAI would change how digital evidence is presented in legal proceedings, where the XAI's output and methodology may need to be explained, or detailed documentation provided (Raaijmakers, 2019). This would bring at least three subtle changes to: (1) the access privileges a DF examiner needs to AI training data, (2) the qualifications needed for a DF examiner to be employed by law enforcement, and (3) the way a DF examiner treats/examines a public or a private case. In what follows, we explain these changes one by one:

1. Access to training data for XAI models can be challenging. In a law enforcement context, in which XAI could be implemented, systems would need to be trained on data similar to the user data likely to be of interest (e.g., images depicting CAM), which is subject to its own legal controls, though some data sources already exist. Access to training materials and data for contextual databases is further complicated by the controls that exist on the use of personal data held by law enforcement agencies; often, specific regulations cover how this information can be used. Depending on the training data used, justifications provided by XAI systems could also be biased by features of the training data that are not representative of the data being examined, or by decisions made by software developers during the design process.
2. Another change will occur in the skills, practices, and understanding of DF examiners, who will not only be required to understand the IT systems they are examining, but also the XAI systems that are making predictions on their behalf. In other words, the greater introduction of AI/ML in DF requires that DF examiners and other relevant lab members and managers have basic AI/ML skills, qualifications, and understanding. It may be impractical for law enforcement lab managers to recruit skilled DF examiners who also have relevant AI/ML expertise, a challenge compounded by competition for the same pool of talent with other, often better paid, industry sectors. DF examiners also need to be fast learners, due to the fast-paced nature of consumer technology (e.g., newer versions of operating systems, mobile devices).
3. As with most information/operational technology requirements, there is a difference in user needs between user
groups from the public and private sectors in DF. Some of these differences have been touched on already, such as
the conflict that exists between a company's desire to protect their proprietary “black-box” AI and the public sector's
strong emphasis on transparency. In the private sector, DF is often undertaken on data coming from additional
sources such as network devices which cannot be taken offline. It is imperative that business is not disrupted while
analysis is underway as it is not possible for evidence items to sit in storage for months awaiting the availability of a
DF investigator. In the public sector, where evidence items may be held under lawful seizure, addressing an individ-
ual's concerns regarding the seizure of their property is usually at the investigating DF officer's discretion.

3.2 | Case management

Currently, DF case management is a challenging and unwieldy process, with complications arising from the prioritiza-
tion of case work, the difficulty of forensic analysis on individual property items, the capacity for data recovery, the
availability of DF examiner and investigator resources, and more. XAI could potentially assist DF examiners in
the triaging of cases, allowing for the intelligent allocation of resources. XAI may also assist in determining whether DF analysis is necessary to the outcome of a case. To clarify, it is up to the courts (and not the DF analysis) to determine the outcome of a case. Instead, XAI tools can help to determine the intrinsic forensic value of an artifact and the likelihood that such evidence would be convincing to a court of law, and hence its inclusion in or exclusion from the report to the court.
Besides bringing advanced security and data analytics features to DF science with respect to case management, XAI/ML can provide data processing, classification, and clustering. More specifically, XAI/ML can improve the processing capabilities of a DF case manager by automating repeated data accesses and actions, and these automated actions will improve over time as more training data become available. Another point is that DF examiners are already overwhelmed by the significant data volumes from which relevant
artifacts must be extracted in a forensically sound manner. The XAI/ML capability for reading through data and recognizing context would allow a DF examiner to take the art of artifact extraction to a whole new level, saving considerable time and effort in managing a DF case.
In addition to all of the above, and based on information from different sources within a case or across case types, it is very likely that XAI/ML can assist in developing effective models that predict the location of specific items/artifacts for further data analysis. Just as DF examiners do, criminals make use of different tools/methodologies depending on the type of their offending, and using XAI/ML to examine training datasets from certain types of case will no doubt reveal commonalities that can be leveraged. As an example, it is widely known among DF examiners that illicit material is commonly found in folders named "new folder" on suspects' devices. Such a simple yet repeatedly observed fact across multiple different cases can be leveraged by the developed XAI model to prioritize the search locations on a digital device.
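A minimal sketch of this prioritization idea follows: search locations are ranked by how often they yielded artifacts in past cases of the same type. The paths and hit counts are invented for illustration.

```python
# Sketch: rank search locations by how often they yielded artifacts in past
# cases of the same type. Paths and hit counts are invented for illustration.
from collections import Counter

past_hit_locations = Counter({
    r"Users\*\Desktop\New folder": 41,   # the "new folder" pattern noted above
    r"Users\*\Downloads": 27,
    r"Users\*\AppData\Local\Temp": 9,
})

def prioritized_paths():
    """Return candidate locations, most frequently productive first."""
    return [path for path, _ in past_hit_locations.most_common()]

print(prioritized_paths())
```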

3.3 | Wait times

There is no doubt that wait and processing times for DF services continue to increase as devices and tools increase in
sophistication. Quick and Choo (2014) outlined a potential model for data reduction, the Digital Forensic Data Reduction framework, to address wait and processing times via the use of predefined data capture based on features derived from data mining. By using a smaller data set, analysis could be completed more quickly, having more impact on active investigations in the short term. Some tools, such as the Kroll Artifact Parser and Extractor (KAPE), seek to address the increasing data load by using manually preconfigured file masks to copy specific data from target devices, forensic images, or virtual machines, creating small triage images of key files from which forensically sound insights can be found (Zimmerman, 2020). While this can speed up the processing of these files, it does not eliminate the data volume problem. Files retrieved using tools like KAPE still require analysis by a DF examiner and reporting in a meaningful format. XAI could potentially make these processes smarter and more streamlined.
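The sketch below illustrates the general idea of KAPE-style targeted collection, copying only files matching preconfigured masks into a small triage set; the masks and paths are illustrative and are not KAPE's actual target definitions.

```python
# Sketch of KAPE-style targeted collection: copy only files matching
# preconfigured masks into a small triage set. Masks and paths are
# illustrative, not KAPE's actual target definitions.
import shutil
from pathlib import Path

TARGET_MASKS = ["**/*.lnk", "**/NTUSER.DAT", "**/Web Data"]   # hypothetical set

def triage_copy(source_root: str, dest_root: str) -> None:
    src, dst = Path(source_root), Path(dest_root)
    for mask in TARGET_MASKS:
        for f in src.glob(mask):
            if not f.is_file():
                continue
            out = dst / f.relative_to(src)
            out.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, out)          # copy2 preserves timestamps

triage_copy("/mnt/evidence_image", "/cases/1234/triage")   # hypothetical paths
```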

3.4 | XAI for investigative support: Data review and reporting times

If XAI is the solution to longstanding challenges in the DF field, there is a chance that its capability could be pushed even further beyond post-hoc data extraction and processing, becoming a powerful tool for investigative data analysis. In the future, XAI could become so sophisticated that it could be run over data from a target device,
and, in conjunction with case data from other evidence items and background information on those involved, including
open-source and closed-source information, provide a structured report with insights about the relationships between
people, places, times/dates and objects of relevance to the case. The remainder of this paper focuses on a potential
implementation of such a model: XAI for Investigative Support. The steps of this model are summarized in Figure 2.

3.4.1 | Setting scope of data to be examined

When undertaking a DF examination, different evidence items will have differing methods and availability for data
extraction. When using the theoretical XAI for Investigative Support model, individuals would first identify which evi-
dence items are to be processed. This could be decided based on the quality of the data, the centrality of the suspect to
the case, or operational experience. There may be pressure on DF examiners to include all items seized in the workflow,
but as with any investigation, there is a need to prioritize the targets of analysis. The child of a drug dealer is unlikely to have information relevant to their parent's criminal empire on their iPad. By removing such items, one could decrease the overall processing time in the model while also improving the accuracy of its output.

3.4.2 | Collect data of likely relevance

The processing of information through AI models is heavily resource intensive. Just as tools such as KAPE provide the
opportunity for DF examiners to speed up data acquisition and commence examinations more quickly, so too could
they be used for reducing the volume of data to be processed by the XAI models. As well as retrieving target data, tools
must also be able to prepare those data for processing by the XAI system. As with any forensic analysis, the processing of data should comply with relevant legislation, with collection undertaken under the requisite authority.

FIGURE 2 XAI for investigative support

3.4.3 | Provide data to a trained model

The XAI model used must be able to distinguish between different types of data and interpret those data meaningfully (e.g., the EXIF data located within a JPEG file should tell the model something different than a shortcut file does). In
examining this step of the model, it would be useful to think of a sample case. If one considers a child exploitation case,
EXIF data may suggest that CAM was manufactured at a specific location. Secondary source data such as police records
could inform the XAI who lives at the location. Wi-Fi connection records may suggest another suspect connected to
Wi-Fi at that address and browser data or chats could suggest that meetings have been occurring. All these insights
might usually take many days, weeks or months for a DF investigator to discern, but an XAI model should be able to
track people, places, times/dates and objects with greater speed and accuracy. The XAI model will then interpret corre-
lations between all these data points. Processing could even go beyond simple relationships to analyze the context in
which individual files (such as videos or images) exist on devices. AI has already been used for probative analysis of
images and videos to detect the presence of “deep fakes” (Maras & Alexandrou, 2019). For the XAI system at this point
of an investigation, transparency is optional. This is because the investigator and DF analyst are not using the output of
the model as the basis for charges to proceed to court. Instead, the XAI can act in a similar manner to the tools currently utilized by DF software providers, relying on validation by DF examiners, until there is an appropriate level of
confidence in the model's performance, or be used as a tool for identifying further avenues of enquiry rather than as evidence in and of itself.
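As a concrete illustration of the EXIF step in the sample case above, the sketch below pulls the capture time and GPS block from a JPEG with the Pillow library; the file path is hypothetical, and a production pipeline would of course do far more.

```python
# Sketch: extract the EXIF fields the hypothetical model would consume
# (capture time and GPS block) from a JPEG using Pillow.
from PIL import Image, ExifTags

def exif_features(path):
    exif = Image.open(path).getexif()
    gps_ifd = exif.get_ifd(0x8825)        # 0x8825 is the GPSInfo IFD pointer
    return {
        "datetime": exif.get(0x0132),     # 0x0132 is the DateTime tag
        "gps": {ExifTags.GPSTAGS.get(k, k): v for k, v in gps_ifd.items()},
    }

print(exif_features("evidence/IMG_0042.jpg"))   # hypothetical path
```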

3.4.4 | Receive automated report

The output of the XAI model should be an automated report detailing insights into the relevance of items examined
(and by extension, their owners) to the case. This report should reference specific data points as indicators of its asser-
tions and provide levels of confidence in its analysis. Analysis should include insights into key people, places, times/
dates and objects, and any relationships it deems relevant based on training data. Visualization of the output will help
interpretation and increase explainability for investigators.
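A minimal sketch of the report structure envisioned here, with each insight carrying a confidence value and references to its supporting data points, might look as follows; all field names and values are hypothetical.

```python
# Sketch: a structured report in which every insight carries a confidence
# value and references to the data points supporting it. All values invented.
from dataclasses import dataclass, field

@dataclass
class Insight:
    claim: str                    # e.g., a person-place-time relationship
    confidence: float             # model confidence in [0, 1]
    supporting_data: list = field(default_factory=list)   # artifact references

report = [
    Insight("Device owner present at location L on 2021-03-04", 0.87,
            ["IMG_0042.jpg EXIF GPS", "Wi-Fi join log entry 118"]),
]
for insight in report:
    print(f"{insight.claim} (confidence {insight.confidence}): "
          f"{insight.supporting_data}")
```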

4 | CASE STUDY

The following is an example of how our proposed XAI for Investigative Support model could potentially be applied to a real-world case in a law enforcement agency's DF lab. The case itself is fictional, but shares features with real cases the first author has assisted in.

4.1 | The lab

DF specialists at a law enforcement agency work in a centralized lab, which operates under the purview of the central Crime Department rather than within a forensics branch or external forensic agency. The lab accepts
submissions for jobs for forensic work through a “managed job” process, which is designed specifically to minimize DF
specialist time spent on producing statements and attending court. First, an investigation's informant submits a request
for forensic analysis, for which a priority rating is assigned. Following a waiting period based on prioritization, the
informant delivers the property to the DF lab for forensic analysis, and briefs the DF specialist on the case particulars,
including files or data types specifically of interest (if any). DF specialists are often working on multiple jobs at once, so
property may be received and stored prior to analysis. Once work on a job commences, the DF specialist gains access to the device and processes it into software appropriate for viewing the extracted data (e.g., Griffeye Analyze for CAM categorization), either via live preview with a storage device connected to a forensic bridge, or after imaging the
device if time allows. The “managed job” requires DF specialists to “facilitate access” to devices on behalf of investiga-
tions' informants. It is the informants who review extracted data of interest and determine that of evidential value.
Files/data identified by the informant are then provided in forensic reporting, with the original files (if they can be pro-
vided) and a report of relevant metadata (e.g., File System information). Some cases (e.g., cases involving CAM) require
informants to categorize all images and videos extracted from a device and representative CAM files to be rendered in a
printed, evidential report to be produced by the DF specialist. Some informants prefer not to attend viewings for specific
evidence items, in which case portable cases are produced using the in-built functionality of different forensic suites to
allow for informants to create their own reports at their own offices. Due to limited capacity for the retention of case
files and forensic images, often only reports provided to informants are kept. No standard procedures exist for data
retention.
Simple jobs involving mobile devices can also be processed remotely, as the central lab provides extraction tools to
regional police sites where trained sworn police members can attempt to extract data and produce reports without a DF
specialist. These regional sites are embedded specifically with the sexual offenses and child abuse investigation teams,
which provided the bulk of regional forensic analysis work in the past. Training is managed in-house and provided
biannually to detectives who request it.
DF specialists are available to attend warrants with detectives when required to help with the acquisition of digital
evidence upon request from a team within the centralized crime department. Most warrants are not attended, with digi-
tal evidence being collected and handled by detectives, who receive basic evidence handling training as part of their
detective qualification. Memory acquisition is never completed for desktop computer or laptop devices found running
at scenes unless a DF specialist is present. Mobile devices are often "hand-scrolled" by detectives if seized while turned on. Detectives' training states that mobile devices should be placed into flight mode and have their SIM card removed
prior to processing as evidence, though this does not always occur. As a result, some devices are remotely wiped prior
to examination of the data present.
The lab also has a system in place to deal with more complicated DF cases. When digital evidence produced by the
informant is challenged in court, the informant can request additional forensic analysis by a DF specialist. The evidence
is then returned to the lab. Each property item is photographed and imaged before forensic analysis takes place. The forensic analysis is designed to answer questions raised in court and is unique for each case. Expected wait times for these
jobs from request submission to the delivery of a report can be up to 12 months. There are no standardized procedures
for reporting, with different DF specialists presenting their evidence in a statement with annexes, or a report format.
The DF specialist is then available to attend court and provide evidence relating to the findings of their forensic
analysis.

4.2 | Case details

Detectives from a team combating the exploitation of children received a notification from the National Center for
Missing and Exploited Children (NCMEC) CyberTipline that an individual using a Facebook account uploaded CAM to
the platform. The notification showed the account was accessed from an IP owned by the VPN client, SurfShark. Analy-
sis of HTTP headers (User agent) provided in the report showed the account was accessed from a PC running Windows
10  64 and a Samsung Galaxy S9 (SM-G960F) mobile phone. The recovery mobile phone number for the nominated
account was an Australian mobile phone number. Investigators ran checks with the user's telecommunications pro-
vider, and identified the owner of the account, Suspect A. Checks of police databases showed Suspect A was an
Australian Driver's license holder from which his residential address was identified.
Officers executed a commonwealth warrant under section 3 of the Commonwealth Crimes Act (Federal Register of Legislation, 2020), an evidentiary warrant where entry is made prior to having conclusive evidence of illicit activities.
Digital forensic analysts attended to assist officers in the triaging of evidence (usually undertaken manually with foren-
sic software), both so that relevant items could be seized, and so that sufficient evidence could be found to charge and
remand Suspect A. Items that could not be preliminarily examined on scene were moved under section 3K (Federal
Register of Legislation, 2020) to be assessed within 30 days for evidentiary value after which they were either returned
or seized. CAM was found on Suspect A's personal mobile device and laptop, which were seized, and he was charged
accordingly. Additional evidence items were moved for further examination at the digital forensics lab.
Examination of 3K items took precedence due to the limited length of time they could be held. Following the con-
clusion of this process, the seized items were stored until the pending analysis request was actioned subject to staffing
resources. Closer examination of Suspect A's mobile phone revealed that he was using multiple social media accounts
across various platforms, and multiple encrypted messaging applications, to receive and distribute CAM to a network of
other individuals online. Located aliases were then checked against police and open-source data, further investigations
were opened, and, eventually, additional warrants were executed.

4.3 | Potential outcomes

The XAI for Investigative Support model could reduce the time taken to action additional investigations in a case such
as this one from months to days, or even hours. First, Suspect A's property is triaged at the scene. Items of evidentiary
value and those that could not be assessed at the scene are moved/seized and taken to the forensics lab for analysis.
Each item is pre-processed to extract relevant data/artifacts via a method for data reduction, for example, KAPE acquisi-
tion. This targeted acquisition should be configured to capture relevant data based on features of the individual case
and devices being analyzed. These data, along with data from secondary sources, including but not limited to open-source data, social media data, police holdings, and so on, would be given to the XAI system, which presents to investigators a
detailed and referenced report that identifies where CAM has been shared and who the participants in those communi-
cations were in terms of selectors/identifiers and enriched identification data related to those identifiers (e.g., social
media account profile pictures, police records, license photos, or even their proximity to people under the age of
18 based on census data). This data could help in the prioritization and execution of future investigations by being a
powerful tool for analysis.

5 | CONCLUSION

We have provided context to the types of XAI currently available, and insights into their applicability in the DF field. Cur-
rent implementations of AI in the DF field are opaque in nature, requiring validation by a DF examiner, and unlikely to
be explainable in a court of law. The admissibility in court of forensic analysis completed by truly explainable AI cannot
be determined at this time, as this would require an analysis of existing legislation such as those relating to evidence acts.
However, by thinking of XAI not as a replacement for a DF examiner, but as a powerful tool to aid in investigation, case
administration, and prioritization, XAI could be leveraged effectively and legally to greatly enhance the DF field into the
future. As suggested by one of the reviewers, one potential future research agenda is to conduct an empirical study to evaluate the effectiveness of XAI at various stages of DF investigations in a real-world lab.
With the introduction of AI (or XAI) into DF, one also has to bear in mind the risk associated with adversarial machine learning, where the AI/XAI models are trained using tainted/biased data. In other words, how can we know whether the XAI models used in investigations can be trusted? In addition, while detecting deepfakes is a topic of ongoing interest (Akhtar et al., 2018; Cohen et al., 2020; Li et al., 2020), how confident are we that the evidence presented in court, including evidence obtained using XAI approaches, is not a deepfake (text, audio, or video artifact)? Relatedly, the robustness of AI-based models for DF was studied by Aditya et al. (2018), who implemented a domain-independent adversary testing framework and tested the security robustness of black-box AIs. The development and analysis of such frameworks for XAI is also a promising research direction.
On a slightly controversial note, there have been attempts to leverage AI in designing vulnerability identification and exploitation approaches. For example, Gao et al. (2021) presented an AI-based vulnerability search approach for cross-platform IoT binaries. Hence, a follow-up question is whether we can leverage such AI/XAI-based techniques to help DF investigators identify and exploit vulnerabilities in computing devices or networks to gain access to evidence.

A C K N O WL E D G M E N T S
The authors thank the editor and the anonymous reviewers for providing constructive and generous feedback. Despite
their invaluable assistance, any errors remaining in this paper are solely attributed to the authors. The views and opin-
ions expressed in this article are those of the authors alone and not the organizations with whom the authors are or
have been associated. In addition, certain commercial entities, equipment, or materials may be identified in this docu-
ment in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply
recommendation or endorsement by the organizations with whom the authors are or have been associated.

CONFLICT OF INTEREST
Dr. Kim-Kwang Raymond Choo is an Editor of the journal and was excluded from the peer-review process and all edi-
torial decisions related to the publication of this article.

A U T H O R C ON T R I B U T I O NS
Stuart Hall: Conceptualization; formal analysis; investigation; methodology; project administration; resources; valida-
tion; visualization; writing-original draft; writing-review & editing. Amin Sakzad: Conceptualization; methodology;
project administration; supervision; validation; visualization; writing-review & editing. Kim-Kwang Raymond Choo:
Conceptualization; methodology; supervision; validation; writing-review & editing.

DATA AVAILABILITY STATEMENT


Data sharing is not applicable to this article as no new data were created or analyzed in this study.

ORCID
Amin Sakzad https://orcid.org/0000-0003-4569-3384
Kim-Kwang Raymond Choo https://orcid.org/0000-0001-9208-5336

R EF E RE N C E S
Aditya, K., Grzonkowski, S., & Le-Khac, N.-A. (2018). Enabling trust in deep learning models: A digital forensics case study. In 2018 17th IEEE
International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big
Data Science and Engineering (Trustcom/Bigdatase) (pp. 1250–1255). IEEE Computer Society. https://doi.org/10.1109/TrustCom/
BigDataSE.2018.00172

Akhtar, N., Liu, J., & Mian, A. (2018). Defense against universal adversarial perturbations. In 2018 IEEE Conference on Computer Vision and
Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018 (pp. 3389–3398). IEEE Computer Society.
Arrieta, A. B., Diaz-Rodríguez, N., Ser, J. D., Bennetot, A., Tabik, S., Barbado, A., … Herrera, F. (2020). Explainable artificial intelligence
(XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115.
Burrell, J. (2016). How the machine “thinks”: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1),
2053951715622512.
Carmeli, K. (2020). How image classification delivers a new examination experience in cellebrite physical analyzer. Retrieved from https://
www.cellebrite.com/en/blog/how-image-classification-delivers-a-new-examination-experience-in-cellebrite-physical-analyzer/
Chari, S., Gruen, D. M., Seneviratne, O., & McGuinness, D. L. (2020). Foundations of explainable knowledge-enabled systems. Knowledge
Graphs for eXplainable Artificial Intelligence: Foundations, Applications and Challenges, 47, 23–48.
Cohen, G., Sapiro, G., & Giryes, R. (2020). Detecting adversarial samples using influence functions and nearest neighbors. In 2020 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020 (pp. 14441–14450). IEEE.
Costantini, S., Gasperis, G., & Olivieri, R. (2019). Digital forensics and investigations meet artificial intelligence. Annals of Mathematics and
Artificial Intelligence, 86(1–3), 193–229.
Doran, D., Schulz, S., & Besold, T. R. (2017). What does explainable AI really mean? A new conceptualization of perspectives. Proceedings of
the First International Workshop on Comprehensibility and Explanation in AI and ML 2017 co-located with 16th International Confer-
ence of the Italian Association for Artificial Intelligence (AI*IA 2017), Bari, Italy, CEUR Workshop Proceedings, 2071.
Du, X., Hargreaves, C., Sheppard, J., Anda, F., Sayakkara, A., Le-Khac, N.-A., & Scanlon, M. (2020). Sok: Exploring the state of the art and
the future potential of artificial intelligence in digital forensic investigation. In Proceedings of the 15th International Conference on Avail-
ability, Reliability and Security. Association for Computing Machinery.
Federal Register of Legislation. (2020). Crimes Act 1914. Retrieved from https://www.legislation.gov.au/Details/C2020C00253
Forensic Capability Network, Transforming Forensics, National Police Chiefs' Council, Association of Police and Crime Commissioners.
(2020). Digital forensic science strategy. Retrieved from https://www.fcn.police.uk/sites/default/files/2020-07/Digital%20Forensic%
20Science%20Strategy%20EMAIL%20VERSION%20ONLY.pdf
Gao, J., Yang, X., Jiang, Y., Song, H., Choo, K. R., & Sun, J. (2021). Semantic learning based cross-platform binary vulnerability search for iot
devices. IEEE Transactions on Industrial Informatics, 17(2), 971–979.
Griffeye. (2020). Multi-object detection—Now in Griffeye Brain. Retrieved from https://www.griffeye.com/multi-object-detection-now-in-
griffeye-brain/
Kafadar, K. (2019). Statistics and the impact of the 2009 NAS report. Duke LJ Online, 69, 6.
Li, S., Zhu, S., Paul, S., Roy-Chowdhury, A. K., Song, C., Krishnamurthy, S. V., … Chan, K. S. (2020). Connecting the dots: Detecting adversarial perturbations using context inconsistency. In A. Vedaldi, H. Bischof, T. Brox, & J. Frahm (Eds.), Computer vision—ECCV 2020—16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIII (Vol. 12368, pp. 396–413). Springer.
Magnet Forensics. (2017). Introducing Magnet.AI: Putting machine learning to work for forensics. Retrieved from https://www.magnetforensics.com/blog/introducing-magnet-ai-putting-machine-learning-work-forensics/
Maras, M.-H., & Alexandrou, A. (2019). Determining authenticity of video evidence in the age of artificial intelligence and in the wake of
deepfake videos. The International Journal of Evidence & Proof, 23(3), 255–262.
Miller, T., Howe, P., & Sonenberg, L. (2017). Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences.
Molnar, C. (2020). Interpretable machine learning a guide for making black box models explainable. Retrieved from https://christophm.github.
io/interpretable-ml-book/
National District Attorneys Association (NDAA). (2016). Forensic science in criminal courts: Ensuring scientific validity of feature-comparison methods.
Retrieved from https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf
National Research Council. (2009). Strengthening forensic science in the United States: A path forward. National Academies Press.
Preece, A., Ashelford, R., Armstrong, H., & Braines, D. (2018). Hows and whys of artificial intelligence for public sector decisions: Explana-
tion and evaluation. Available at arXiv:1810.02689 (Last Accessed 10 June 2021).
Quick, D., & Choo, K.-K. R. (2014). Data reduction and data mining framework for digital forensic evidence: Storage, intelligence, review and archive. Retrieved from https://www.aic.gov.au/publications/tandi/tandi480
Raaijmakers, S. (2019). Artificial intelligence for law enforcement: Challenges and opportunities. IEEE Security Privacy, 17(5), 74–77. https://
doi.org/10.1109/MSEC.2019.2925649
Wang, D., Yang, Q., Abdul, A., & Lim, B. Y. (2019). Designing theory-driven user-centric explainable AI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1–15). Association for Computing Machinery.
Zimmerman, E. (2020). Introducing KAPE—Kroll artifact parser and extractor. Retrieved from https://www.kroll.com/en/insights/
publications/cyber/kroll-artifact-parser-extractor-kape

How to cite this article: Hall, S. W., Sakzad, A., & Choo, K.-K.R. (2022). Explainable artificial intelligence for
digital forensics. Wiley Interdisciplinary Reviews: Forensic Science, 4(2), e1434. https://doi.org/10.1002/wfs2.1434
