

Big data and audit evidence 1, 2

Helen Brown-Liburd

Assistant Professor of Accounting

Rutgers Business School

hbliburd@business.rutgers.edu

and

Miklos A. Vasarhelyi

KPMG Distinguished Professor of AIS

Rutgers Business School

miklosv@andromeda.rutgers.edu

1 This editorial aims to create a dialogue regarding research to advance audit thinking in light of the new, evolving data environment. A Special Topic issue of JETA is being planned for 2017.
2 The authors thank Pei Li, Jun Dai, and Fei Qi Huang for their comments, as well as participants at workshops at Itau Unibanco and the Rutgers Business School. The help of Ms. Qiao Li and Sophia (Ting) Sun, as well as the suggestions of JP Krahel, was very much appreciated.

Big data and audit evidence

Helen Brown-Liburd and Miklos A. Vasarhelyi

Introduction
The traditional view of evidence

“Audit evidence is all the information, whether obtained from audit procedures or other sources, that is used by the auditor in arriving at the conclusions on which the auditor's opinion is based. Audit evidence consists of both information that supports and corroborates management's assertions regarding the financial statements or internal control over financial reporting and information that contradicts such assertions.” (PCAOB, AS 15)

Recent years have brought a substantively different technological environment for organizational assurance processes. The advent of a very different set of business processes supporting the modern business organization has provided radically new tools, a new data environment, and a new set of deep problems. As such, the traditional view of audit evidence may no longer be sufficient, and the audit profession and regulators must be mindful of the impact that a more advanced technological environment is likely to have on certain traditional forms of audit evidence.

While business processes are progressively incorporating Big Data (Vasarhelyi, Kogan and Tuttle, 2015),
both the measurement of business (accounting) and the assurance of this measurement (auditing) have
yet to take advantage of these innovations and integrate new possibilities and threats into their rules
and regulations. These new emerging technologies can substantively change the environment and
practice of accounting and auditing.

Technology has widened the distance between data and its users, creating a rich and complex production environment as well as an increased need for verification of the accuracy of these processes. A variety of threats has followed the capability enhancements provided by technology (MIT Technology Review, 2015). At the same time, the new data environment provides the possibility of greatly enhanced assurance capabilities that will substantially change auditing. This note focuses on the new data environment and its potential to enhance and transform the nature, usage, and decision processes related to audit evidence.

Three key questions will be examined:

• What forms of evidence are emerging from the new data environment?
• How can this evidence be integrated into the traditional audit process?
• How should the assurance process conceptually change?

The emerging data environment


Linking business processes to external data is drastically changing the data environment. First, organizations are using a series of different cloud arrangements (Weinman, 2013; Wei, 2014; Du & Cong, 2015) for virtual data location. This type of arrangement allows for seamless, ubiquitous support of corporate data and facilitates better interfacing with its "buffer (intermediate, boundary) zones" and with the exogenous Big Data environment. The actual physical location of data is irrelevant to its classification as either 1) corporate data, 2) buffer data, or 3) exogenous Big Data.

Automatic data collection


Traditional data capture and preparation provided substantive support for business information systems, at the large cost of manual capture. During that technological period, data was mainly prepared on punched cards and paper tape and then stored on magnetic tapes. Due to labor intensiveness, process imprecision, and costly storage, data stores were limited in size. With the advent of data scanners, the process continued to be mainly manual, but some degree of automation was achieved in data capture, and linkages between traditional data and purchase baskets3 were established. An entirely new set of questions became possible, as described in Figure 1. Later on, Web data (including click information, URLs, and referring links) provided further data linkages and a substantially larger volume. Again, there were substantive increases in data volume and storage, and unstructured, automatically captured data (URLs, click paths, identification labels) were integrated into the environment. New types of analytic questions arose as click paths provided a dynamic view of customer behavior, but in a less deterministic and more stochastic manner.

Figure 1: The Big Data environment (adapted from Vasarhelyi, Kogan, & Tuttle, 2015)

3 Purchase baskets comprise the merchandise a particular client purchased in one shopping trip.

The next level, both endogenous and exogenous, is the Internet of Things (IoT) (Kopetz, 2011; Weinman, 2014; Chui, Löffler, & Roberts, 2010). Goldman Sachs predicts that 28 billion "things" will be connected to the Internet by 2020, as opposed to the 6 billion mobile Internet items of the 2000s. However, this figure does not seem to incorporate RFID chips (Shepard, 2005), which are being embedded into a large number of inventory items that will thereby become Web-enabled, nor many other applications of these chips. RFID chips are devices that "reflect" identification information, but the rapid decrease in chip costs means that their capabilities could become more active and even stronger elements of the audit information chain. These chips, associated with connection devices, will allow for the development of "e-tracks" that will reflect logs of items available and eventually allow more intelligent information to be incorporated.

Organizations will eventually embed chips into their inventory and fixed assets, use mobile trackers on
equipment and employees, and have smart devices in most of their facilities, managing access, location,
environmental parameters, and dynamic behavior. These measurements could be real and active parts of
the corporate information system and will raise many privacy and security concerns. The IoT adds
another layer to the expanding data network that will serve both for management and the assurance
functions. The IoT can be exogenous and endogenous, with interacting elements located inside, at the
boundaries, and outside the organization. While detail and precision are of great import in the day-to-day operations of organizations, for the assurance function, and in terms of audit evidence, a high-level view may in certain cases be of more value.

Intermediary (boundary) data


In addition to the elements that link the external to the internal environment, a layer of information is often found that is external to the organization's traditional information system but an integral part of its information processes. This intermediate layer includes a large set of externally captured information that is retained briefly for rapid scrutiny. For example, video from hundreds of cameras may be scrutinized using threat detection (Weidemann et al., 2005) or face recognition (Jafri & Arabnia, 2009) software and selectively retained based on given parameters.

This intermediate environment, with large volumes of data, variable timing, and optional registration, is highly contingent on the predicted use of the data. Organizations will capture data and place it in intermediate storage for examination, filtering, and selective retention. The same data source could be measured on an annual basis (e.g., one full inventory scan) or every hundredth of a second (frequent scans to immediately identify withdrawal of inventory). Such frequency of data generation, filtering, and retention decisions is contingent on the predicted application of the data and the potential value of more frequent information. Over time, new applications emerge that may require more frequent data. For example, while one measurement of inventory can suffice for year-end verification, hourly measurements might serve to verify the dynamics of inventory usage, relate these to employee movement, and serve to segment predicted income into very small time intervals.

Intermediary data does present some challenges for organizations. For example, internal data (e.g., e-mails, employee tracking, camera images) is, under current US law, available to employers without privacy restrictions. External data, on the other hand, may or may not be available for analysis.

Big and intermediate data evidence sources


External data sources continue to expand in terms of both content and interconnectedness. The digital domain encompasses audio, video, and textual media, with a growing amount of sensor data (e-tracking) being generated and partially captured by measurement and storage devices. These domains feature a set of information characteristics4 such as: permanence (transitory to permanent), privacy (private to public), level of aggregation (fine to coarse), security (secure to open), accuracy (exact to incorrect), timing (past to real-time), etc.

Overall, the emerging data environment has to be evaluated in light of its impact on the sufficiency,
competence, and reliability of audit evidence. While traditional evidence tends to be mainly archival and
internal5, the evidence typically extracted from the external environment is more probabilistic in nature
and must be considered in light of the characteristics of information. A new body of knowledge must be
created to understand this information and the emerging limitations of the traditional audit model. This
note raises some issues, discusses some technologies, and provides some examples to stimulate
research into new sets of evidence and their impact on audit judgment.

The development of applications to manage multiple operational processes helps connect (exogenous) Big Data (McAfee and Brynjolfsson, 2012) to corporate measurement, management, and assurance processes. Extending the above example, cameras in the corporate parking lot (boundary), in the streets surrounding a facility (external), and in the stores (internal) can be used to gather a corpus of visual information brought into some temporary storage for short-term usage. This temporary storage is a "boundary area" that collects information, applies applications, and feeds a small subset of information to main corporate stores, ERPs, etc. The aforementioned video feeds may use face recognition software to identify employees, frequent customers, or undesirables. Selections from these comparisons are then fed into client files, customer support systems, or the security apparatus for warning, recording, or action. Applications that interpret external Big Data feeds and link them to the organizational information systems are called bridges. These bridges can bring important data for operations and continuous monitoring as well as substantive evidential matter to support a new set of assurance processes.
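To make the notion of a bridge more concrete, the following is a minimal sketch, in Python, of a hypothetical bridge that filters a boundary-zone stream of face-recognition events and forwards only selected matches toward corporate systems. The event fields, watch lists, and routing targets are illustrative assumptions, not an implementation used by any vendor or audit firm.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class CameraEvent:
    """One face-recognition hit produced in the boundary (buffer) zone."""
    camera_id: str        # e.g., "parking-lot-03"
    timestamp: str        # ISO-8601 capture time
    person_id: str        # identifier returned by the recognition software
    confidence: float     # recognition confidence, 0.0-1.0

# Illustrative watch lists; in practice these would come from HR and security systems.
EMPLOYEES = {"E-1001", "E-1002"}
FLAGGED = {"X-9001"}

def bridge(events: Iterable[CameraEvent], min_confidence: float = 0.90) -> List[dict]:
    """Filter the boundary-zone stream and keep only items worth forwarding."""
    forwarded = []
    for ev in events:
        if ev.confidence < min_confidence:
            continue                      # discard low-confidence hits in the buffer zone
        if ev.person_id in FLAGGED:
            route = "security_alert"      # feed the security apparatus
        elif ev.person_id in EMPLOYEES:
            route = "attendance_log"      # feed HR / internal control systems
        else:
            continue                      # unknown faces are not retained
        forwarded.append({"route": route, "camera": ev.camera_id,
                          "time": ev.timestamp, "person": ev.person_id})
    return forwarded

if __name__ == "__main__":
    sample = [CameraEvent("parking-lot-03", "2015-08-25T08:01:00", "E-1001", 0.97),
              CameraEvent("store-entrance", "2015-08-25T08:02:30", "X-9001", 0.95),
              CameraEvent("store-entrance", "2015-08-25T08:03:10", "U-0000", 0.62)]
    for record in bridge(sample):
        print(record)   # only the selected subset reaches corporate stores
```

The point of the sketch is the selective-retention decision: most of the boundary-zone stream is discarded, and only a small, parameter-driven subset becomes evidence inside the corporate systems.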

Evidence Considerations in an evolving data environment


An audit ecosystem to administer a progressively automated audit
Dai (2014) has proposed integrating the Audit Data Standard (Zhang et al., 2012) and related apps into the audit of the future (Figure 2). This audit includes a risk assessment platform generating an automated audit plan with a set of assertions, a recommender system choosing apps, results being analyzed by routines, and process software generating internal and external audit reports. At all steps of the process, software agents would be working and generating forms of evidence (Papazoglou, 2001).6

4 Information theory (e.g. Shannon and Weaver, 1949) and measurement theory (Mock, 1976; Romero et al, 2012)
can be used in the conceptualization of relationships within and among data environments.
5 On the other hand, direct observation and confirmations transcend organizational boundaries (e.g., bank confirmations), and this multi-entity feature is explored as evidence.
6 There are many types of software agents, such as application agents, personal (or interface) agents, general business activity agents (including information brokering agents and negotiation and contracting agents), and system-level support agents (planning and scheduling agents, interoperation agents, business transaction agents, and security agents).

Figure 2: Evolving view of an automated system using the Audit Data Standard (ADS) (from Dai, 2014)
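As a rough illustration of the orchestration Dai (2014) envisions, the sketch below strings together placeholder stages of such a pipeline: risk assessment, plan generation, app recommendation, app execution, and reporting. The stage names, risk scores, and app catalog are hypothetical stand-ins, not the ADS-based components themselves.

```python
# A minimal, hypothetical sketch of an automated audit pipeline in the spirit of
# Figure 2: risk assessment -> audit plan -> recommended apps -> results -> report.

RISK_SCORES = {"revenue": 0.8, "inventory": 0.6, "payroll": 0.2}   # assumed risk assessment output
APP_CATALOG = {                                                     # assumed recommender mapping
    "revenue": ["duplicate_invoice_test", "cutoff_test"],
    "inventory": ["rfid_count_reconciliation"],
    "payroll": ["ghost_employee_test"],
}

def build_audit_plan(risk_scores, threshold=0.5):
    """Select the business processes whose assessed risk exceeds the threshold."""
    return [process for process, score in risk_scores.items() if score >= threshold]

def recommend_apps(plan):
    """Pick audit apps for each planned process (stand-in for a recommender system)."""
    return {process: APP_CATALOG.get(process, []) for process in plan}

def run_apps(recommendations):
    """Pretend to execute each app and collect an exception count as 'evidence'."""
    return {app: {"exceptions": 0} for apps in recommendations.values() for app in apps}

def report(results):
    """Summarize results into a simple internal report."""
    return {"apps_run": len(results),
            "total_exceptions": sum(r["exceptions"] for r in results.values())}

if __name__ == "__main__":
    plan = build_audit_plan(RISK_SCORES)
    recs = recommend_apps(plan)
    print(report(run_apps(recs)))   # e.g., {'apps_run': 3, 'total_exceptions': 0}
```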

What forms of evidence are emerging from the Big Data environment?
This section examines, on a more detailed level, some data generated by devices such as RFID chips and
GPS localizers, illustrating the enriched set of management processes that can be developed through
their utilization.

Audit evidence then and now


Audit standards largely provide guidance related to the traditional forms of audit evidence [e.g., evidence generated by company or external documents] (AS No. 15, PCAOB 2010; SAS 122, AICPA 2012; ISA 500, IAASB 2009) and evidence considerations in an electronic environment [e.g., information transmitted, processed, maintained, or accessed electronically] (SAS 122, AICPA 2012; SAS 109, AICPA 2006). However, these standards do not sufficiently address the nature of the evidential matter that will be necessary in the more complex and advanced technological environment. Auditing standards require auditors to gather audit evidence that is sufficient, competent, and reliable to support their audit opinion, but the characteristics used to define sufficient, competent, and reliable audit evidence may not be adequate. Table 1 summarizes the attributes required of audit evidence in the standards and issues that should be considered in a more complex Big Data environment.

Evidence characteristic: Consideration in a Big Data environment
• Difficulty of alteration: external Big Data is not under the business's control
• Credibility: data capture and preprocessing must be verified
• Completeness: external data is practically infinite and not always accessible
• Evidence of approvals: data is external
• Ease of use: new automatic methods are being developed for this purpose
• Clarity: external Big Data tends to be stochastic

Table 1: Attributes of Big Data evidence

As a result, it is important to evaluate how technology can be utilized to ensure that the attributes
defined in the standards are met. Several relevant points to consider are:

1) Sufficiency (quantity) may not be the primary issue. Because new technology will allow auditors
to examine 100% of the population, the shift in focus will most likely relate to timely accessibility
of the relevant data and the auditor’s use of various data analytic tools to analyze and interpret
the data in a more meaningful and effective way.

2) Appropriateness (quality). Relevance and reliability are key issues, and the traditional approaches for their evaluation may not apply. Relevance most likely will be determined by judgment, as it is today. However, such judgment will be subject to evaluation through formalization, as many tests will be formalized into computer procedures that do not currently exist. Typically, automated data extraction and utilization by formal models create a much higher level of reliability than manual processes.

3) The sources and types of evidence are new, and how this evidence may complement or replace traditional evidence must be better understood by researchers and the profession.

Incorporating modern technology to obtain enhanced audit evidence


Incorporating advanced technology into the audit process will undoubtedly raise questions concerning the implications of a less transparent audit trail (e.g., traceable paper documents may not exist). Although the traditional manual audit trail has become rare, computer processes can create logs with reasonable facility, and these can be collected and processed in many ways not previously possible. Process mining techniques (Jans, Alles and Vasarhelyi, 2014) can be used to build relevant audit logs and create a plethora of tests that would be impractical to apply manually. For example, each transaction can have its path evaluated and rated in terms of suspicion; each transaction's approvals can be evaluated in terms of segregation of duties (SOD) and rank; the networks of people dealing with transactions can be traced and tracked; etc.

Another question to consider is how traditional audit procedures should change to adapt to technology. Audit procedures address assertions. Since assertions are driven by financial reporting standards, they are unlikely to change, and auditors will still be required to establish audit objectives and design their audit procedures to address these assertions. The change will instead be driven by how technology impacts the nature, extent, and timing of audit procedures performed. For example, most audit objectives and assertions will be formalized and programmed into repetitive apps to be applied within an automated audit that will implement a formal audit plan with elements to be repeated at predetermined times or continuously.

Finally, the risk-based nature of the audit process will need to consider the audit risks associated with obtaining sufficient, competent, and reliable audit evidence in the Big Data era. With the advent of Big Data, the risk assessment process potentially becomes even more complex because the unstructured nature of Big Data increases ambiguity. On the other hand, full population testing may reduce the risk associated with certain items to zero. Thus, the challenge that auditors face involves how they can derive value from the increased amount of information they are exposed to and how to ensure that audit judgments and decisions are based on quality information that is relevant and trustworthy. The use of more sophisticated audit tools can assist auditors by automating the collection, formatting, and mapping of key audit objectives and procedures. For example, these audit tools will be highly structured due to the formalization of the audit plan, with pre-selected apps processing data at predetermined times, covering a measured range of known risks, and using mathematically (or judgmentally) derived algorithms. Uncovered or unexpected evidence or judgments will then be manually evaluated and the human approach captured and integrated into the existing system. A feedback system evaluating outcomes in the short and long term will be used to evaluate audit system performance over time.

Nature of the new type of evidence


Some key technical questions must be raised in the context of audit processes and audit evidence in the modern information age: first, the role of automation; second, the recency (timeliness) of information; and third, how these tie to the traditional view of evidence.

The essentiality of automation


The processes of the advanced information environment are automation-based, very data-heavy, and must be synchronized. Imagining all, or even a small percentage, of telephone calls, messages on the Internet, etc., being manually processed is absurd. The same applies to audit processes, matching of data streams, exception analytics, and reporting. However, the top level of the decision schema in the current technological environment must still be manual, and this equally applies to the feedback loop applicable to formal system improvements. Schemas such as those discussed earlier by Dai (2014) and Kozlowski and Vasarhelyi (2014) will serve to bring evidence together and anchor the top-level, non-formalizable audit decision process.

Capturing evidence every nanosecond


The frequency of the assurance process has been discussed extensively in the literature (Vasarhelyi and Halper, 1991; Chiu et al., 2014). Although strong arguments have been made for a continuous audit in general, the consensus has been that assurance must be performed, at most, at the "pulse" of the system. Managers given the choice of employing very frequent runs of audit applications tend to be comfortable with a monthly approach7. However, if the automatic data traces discussed in this note materialize, and if the economics of data usage through apps come to dominate the choice of frequency, a new scenario may evolve.

The discrepancy with the traditional view of evidence


An entirely new and different set of evidence is evolving. This evidence is so strong that it will pressure regulators and practitioners to bring it into consideration. It will, however, also create pressure to reconsider traditional audit concepts such as materiality, independence, and methods of judgment. This evidence may be based mainly on data streams primarily used for operations and continuous monitoring. Under the current view, their operation could be seen as a form of additional controls or an integration of controls. Furthermore, the requirements of automation will force the formalization of evidence evaluation processes that are not available today.

Predictive evidence
Continuous audit research (Chiu, Liu, and Vasarhelyi, 2014) focuses on a model of monitoring business processes through selected metrics, the comparison of these metrics with standards (models) of performance and acceptable variance, and the issuance of alarms (alerts) when an allowable variance is exceeded. Once alarms are issued, an assessment of their nature will lead to an "audit by exception."

7 Taylor, P. Presentation at the Transformative Technology workshop of the AAA SET section in Chicago, August 2015.

Recent research has focused on improving comparison models to establish more reasonable standards for comparison (Kuenkaikaew, 2013; Kogan et al., 2014). Furthermore, the need for, say, predicting fourth-quarter levels analytically (time series or cross-sectional) and accepting reported figures if discrepancies are small has become greater, as Sarbanes-Oxley provides a much narrower time frame in which to issue the annual opinion.

Stochastic evidence
Audit evidence derived from external data tends to be more independent but less tailored for the actual
decision process. It is reasonable to expect that much of this evidence will be stochastic and
probabilistic, and statistical methods will have to be built for its usage.

How can these types of evidence be integrated into the traditional audit process?
Figure 3 is a flowchart of key business processes (Vasarhelyi & Greenstein, 2003; Romero et al., 2012). These processes are regularly examined in the annual audit and are being changed by evolving technology. As an illustration of the changes in the business measurement and assurance processes (Alles & Vasarhelyi, 2006), Figure 3 shows arrows linking B2C and B2B markets to the Property, Plant and Equipment (PP&E) process. Current quotes and sales of equipment are used for the valuation of this account and can also be used for the valuation of inventory. This is both a shift from the traditional cost basis towards current market (fair) value and a way to provide assurance (if this new method of accounting is adopted) on inventory and PP&E. Dotted arrows link sources of external data to other processes, indicating some form of validation that may be used as modern evidence.

Figure 3: Modern Business Measurement and Modern Assurance using external sources of evidence

Among these relationships, examples can be drawn that are further discussed later in this note:

• Security recordings of arrivals and departures of trucks from parking lots for assuring inventory changes
• Telephone records, associated with e-mails, to validate sales, ordering, and discrepancy determinations
• Examination of video streams on network TV to confirm that ads were actually placed. These can be linked to variations in orders/sales to validate the ad efficiency promised by ad agencies and marketing strategies
• GPS tracks of truck trajectories to validate deliveries and pickups (a hedged sketch follows this list). This can also support sales validation, purchase validation, efficient usage of trucks, etc. RFID logs of items loaded in trucks would allow detailed measures of content.
• Sentiment analysis of social media postings to determine the frequency and needs of customer assistance, repairs, and potential reputational risk. Content analysis of this same media for fault determination in manufactured parts.
• Evidence from the Internet of Things (IoT) on energy and facility usage, individual movement and health, and many other indices that can be used in a confirmatory or predictive mode.
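As a hedged illustration of the GPS example above, the following sketch matches recorded truck stops against expected delivery locations: a delivery is treated as corroborated when some logged stop falls within a chosen radius of the delivery address. The coordinates, the 200-meter radius, and the record layouts are invented for illustration.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical data: booked deliveries vs. GPS stops logged by the truck tracker.
DELIVERIES = [{"order": "SO-501", "lat": 40.7411, "lon": -74.1776},
              {"order": "SO-502", "lat": 40.7350, "lon": -74.1720}]
GPS_STOPS = [{"lat": 40.7413, "lon": -74.1774, "time": "2015-08-25T10:12"},
             {"lat": 40.6900, "lon": -74.2100, "time": "2015-08-25T11:05"}]

def validate_deliveries(deliveries, stops, radius_m=200.0):
    """Mark a delivery as corroborated if some GPS stop falls within the radius."""
    results = []
    for d in deliveries:
        matched = any(haversine_m(d["lat"], d["lon"], s["lat"], s["lon"]) <= radius_m
                      for s in stops)
        results.append({"order": d["order"], "corroborated": matched})
    return results

if __name__ == "__main__":
    print(validate_deliveries(DELIVERIES, GPS_STOPS))
    # SO-501 has a nearby stop; SO-502 does not and would become an audit exception.
```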

The new sources of information and their linkage to business processes, as illustrated above, can offer an enormous set of confirmatory and predictive evidence as well as tools for control and continuous monitoring. The transition is expensive and behaviorally challenging, as it requires changes in behavior, management processes, and statutes.

The economics of digital innovation are different, mostly because processes will require less manual
labor and fixed costs will exceed variable costs in the development of applications to provide information
and support. The challenges are enormous but the benefits are such that an evolution is very likely.

These challenges include: the incompleteness of external data, stochastic relationships among Big Data variables and internal business processes, the transition from the current to the future state, anachronistic standards, the need (or lack of need) for recent data, the multi-party multi-agent problem, data relating one-to-many and many-to-many artifacts, etc.

How should the assurance process conceptually change?


This section examines, on a more detailed level, structural changes, the nature of data streams, and linkages among the different data environments.

Incorporating Automatic Sensing


The usage of GPS devices (Kaplan and Hegarty, 2005) for localization and RFID chips for identification opens a new chapter in automatic assurance and process management. Traditionally, inventory verification is performed at the end of the period by physical inventory counts. If RFID chips are embedded into each inventory item, automatic measurement would follow (Figure 4). The incremental cost of performing this measurement is close to zero, but overly frequent scans can result in cumbersome data volumes and, subsequently, difficult manipulation and storage. Although this section focuses on these two artifacts, utilization of the IoT in general offers enormous potential.
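A minimal sketch of how such RFID scans might be turned into a physical count and compared with the book balance follows. The tag identifiers, the book quantities, and the single-reader setting are assumptions for illustration, not a description of any particular RFID system.

```python
from collections import Counter

# Hypothetical output of one RFID reader pass: each read is (tag id, SKU).
RFID_READS = [("TAG-0001", "SKU-A"), ("TAG-0002", "SKU-A"), ("TAG-0002", "SKU-A"),
              ("TAG-0003", "SKU-B")]

BOOK_QUANTITIES = {"SKU-A": 2, "SKU-B": 2}   # assumed perpetual-inventory balances

def physical_count(reads):
    """Deduplicate tag reads (a tag may be read several times) and count items per SKU."""
    unique_tags = {tag: sku for tag, sku in reads}
    return Counter(unique_tags.values())

def reconcile(reads, book):
    """Compare the scanned count with the book balance and report differences."""
    counted = physical_count(reads)
    return {sku: {"book": qty, "counted": counted.get(sku, 0),
                  "difference": counted.get(sku, 0) - qty}
            for sku, qty in book.items()}

if __name__ == "__main__":
    for sku, result in reconcile(RFID_READS, BOOK_QUANTITIES).items():
        print(sku, result)   # SKU-B shows a shortage of one unit to be investigated
```

Run hourly or continuously, the same reconciliation becomes a stream of inventory evidence rather than a single year-end count.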

Linking to other corporate measurement processes


The representation of entity wealth and operations has evolved over the years into the current set of financial statements for external consumption, but also into a much larger and more complex set of business reporting platforms such as ERPs. These incorporate dozens of processes and thousands of controls and reports, typically built around a relational database system and linked to a set of legacy software, and they typically store and use structured data. The new Big Data and electronic tracking logs will more likely be unstructured and stochastic in the information they provide.

Linking with inventory arrivals


Although taking inventory counts every millisecond may not be useful, monitoring inventory arrivals may serve to manage and verify part of the P2P process (Figure 4). If this process is extended with GPS and partnered with suppliers, it can verify supplier provisioning and so on. This external linkage with relevant processes of business partners can potentially represent a major change in business and assurance process thinking. Vasarhelyi (2003) proposed, prior to many of these technical developments, peer matching (confirmatory extranets) between cooperating parties as a substantively improved confirmation process (Dull, Tegarden, and Scheifler, 2006).

Figure 4: Linking Inventory RFID measures to everything

Linking inventory departures with the sales process


Although taking inventory counts every millisecond may not appear useful, monitoring inventory movement may serve to connect with cash registers, receivables posting, out-shipping processes, etc. This may also serve to manage and verify another part of the P2P process (Figure 4). Extending this verification to large clients, tax collection entities, and service providers will blur the currently tight boundaries of the business process. Industrial processes that manufacture continuously are often automated, with links and sensors connected to most elements. Business processes are still reasonably manual, with strong human components, although the most manual repetitive processes are being rapidly automated (Monga, 2015).

The digital trace of inventory measurements can be linked to sales records, electronic cash registers, supermarket checkouts, and theft detectors at store exits (Figure 4). Depending on the accuracy and completeness of these links, they can be seen as part of a larger and more complex corporate measurement, control, and assurance ecosystem, or even as just one large system. Eventually, much of these management control and assurance processes will focus on exception examination and diagnostics, while the core processes will be performed automatically.

• The value of more frequent automatic data collection depends on the applications developed to use this data.

It must be noted that the development and utilization of these close-to-the-event data flows depend on a series of technologies and applications being "piggybacked" (Vasarhelyi, 2015) and on the intrinsic characteristics and liabilities (threats) brought in by this process.

Automatic confirmations are only one example of the disruptive technological change eventually
affecting assurance.

• Much of the extant concern in assurance is with population integrity and computational correctness. If this integrity is assured by a series of automatic, close-to-the-event control processes that self-verify, the emphasis and focus of the audit will change.

The matching/integration problem of progressively adding new "sensing" data streams (Hoogduin, Yoon, and Zhang, 2015) is the general concern faced by automatic confirmations. Some processes are totally within the system and can be fully engineered, while others depend on measuring and aggregating streams of data outside the controlled environment, over which very little influence is exerted.

The multi-party problem


A corporation has a large set of frequent partners among its suppliers, customers, service providers, government entities, and regulators. Setting up automatic confirmations for regular high-volume partners (Figure 4), with peer-to-peer confirmatory pinging, is of mutual benefit and will provide the basis for improved processes. Establishing these relationships for one-time (e.g., single-sale) or seldom-performed transactions poses a different problem and is subject to different economics. Some type of extensible protocol, with adaptable characteristics, that can identify types of relationships and benefit from the open transport nature of the Internet has to be developed. The lower end of the transaction stream, such as an individual buyer, may not have the same storage and information processing capabilities as a corporation, but still has confirmatory streams from secondary sources such as credit card records of purchases, carrier receipts from deliveries, etc. It is naïve not to expect certain root transactions, such as cash transactions in a store, to be non-confirmable.

XML was developed as a protocol to allow for multiple forms (extensions) of communication interchange standards, typically tied to an industry (e.g., XBRL), but it conceivably can be adapted for use in the above multi-party problem. XML's goal was to create interoperability between processes, and it has been applied in multiple industries. Tagging can also be used to carry confirmatory flags for different types of assurance functions.
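To illustrate how such tagging might carry confirmatory flags, the sketch below builds and parses a small, invented XML confirmation message using Python's standard library. The element names and message structure are purely illustrative assumptions and do not follow any existing interchange standard such as XBRL.

```python
import xml.etree.ElementTree as ET

def build_confirmation_request(sender, receiver, doc_id, amount):
    """Build a minimal, hypothetical XML confirmation request."""
    root = ET.Element("confirmationRequest")
    ET.SubElement(root, "sender").text = sender
    ET.SubElement(root, "receiver").text = receiver
    ET.SubElement(root, "document").text = doc_id
    ET.SubElement(root, "amount", currency="USD").text = f"{amount:.2f}"
    return ET.tostring(root, encoding="unicode")

def answer_confirmation(request_xml, own_records):
    """Parse the request and flag whether the receiver's records agree."""
    root = ET.fromstring(request_xml)
    doc_id = root.findtext("document")
    amount = float(root.findtext("amount"))
    confirmed = abs(own_records.get(doc_id, 0.0) - amount) < 0.01
    return {"document": doc_id, "confirmed": confirmed}

if __name__ == "__main__":
    request = build_confirmation_request("ABC Company", "First Bank", "STMT-2015-12", 250000.00)
    print(request)
    print(answer_confirmation(request, {"STMT-2015-12": 250000.00}))
```

An extensible schema of this kind could, in principle, accommodate both the regular high-volume partners and the occasional, one-time counterparties discussed above.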

In addition to their potential value as confirmatory evidence of the performance of business processes,
the utilization of these interlinked processes portends opportunities for management through
continuous monitoring of product, money and information flow.

Connecting to the external environment (closed and open loops)


Figure 4 also illustrates the environment with multiple suppliers and customers, some of which are too small or unwilling to participate in a closed loop with the organization being considered. For example, ABC Company opens a relationship with a bank (or a supplier, or a customer). The contractual agreement includes a protocol definition of mutual pinging for confirmations and access to defined files, data streams, and protocols, desired not only for electronic confirmations but also for expanded digital stream cooperation with access to ordering, storage, logistics, and other streams of data. In this case, a closed loop of verification and information can be derived not only at the individual transaction level but also at the level of summaries and aggregate numbers. A closed loop of information is established, and improved supply chain accuracy is obtained with joint processes of error correction and process management.

This type of tight internal-external linkage can already be found in processes such as supplier-managed inventory and automatic reordering in inventory optimization. However, there is always serious hesitation to allow third parties to observe or change the organization's internal data streams.

If this closed loop is not obtained, organizations may be able to resort to secondary observation outside the boundaries of the corresponding partners. Typically, these measurement streams are much less accurate than direct connectivity, but they may still be useful for many purposes. Direct measurement at a detailed level offers great benefits. Assurance verification processes may not need to exist if the operations are automatically monitored. The analogy is that early IT audits were concerned with hardware computational errors, and this concern progressively disappeared due to inherent hardware controls.

Auditors may be able to rely on operational controls and remedial processes without having to monitor certain data flows. The level of aggregation of auditor control and action, the circumstances in which auditors should be part of the transaction stream, and the principles, regulations, and methods of audit in this environment are still open questions for research. An interesting additional set of questions relates to the types of anomalies, frauds, and serious operational risk issues that can arise in this environment. A third set of questions, applicable in most technological environments, concerns which types of technology utilization also emerge as threats to be incorporated into audit concerns.

Figure 4 also includes three sources of Big Data: e-mails, social media, and security videos. All of these sources may be from inside or outside the system. The consideration that arises with the usage of external Big Data is its availability. Users of social media have become more selective in their sharing habits, and social media providers, pressured by public opinion, have improved privacy for their users. However, there are still enormous pools of public data that social media companies provide to users and data assemblers for summarization and analysis. Third-party vendors (e.g., doubleclick.com) assemble private and public datasets to provide better pictures of Internet commerce usage and trends. Important ethical, legal, and logistic issues arise. For example, "the right to be forgotten" (Eugen & Marius, 2013) raises issues regarding the retention of public data.

Is it really evidence?
This note has presented several issues and a progressive view of the IT environment, permeated by a discussion of evidence and audit procedures. These raise a series of questions about audit evidence. The following are statements related to new or slightly different forms of audit evidence:

• No anomalies were extracted from the transaction stream
• Top key risk indicators were stable over the current period
• Our records and those of third parties match in 98% of the instances and in 94% of values
• Systems have been down for 2.5% of the time
• The auditor's predictive model is 3% below management's numbers
• Our Watson-based inference system (Wallace-Wells, 2015) rates the client as 97% likely to have its financial statements fairly stated

Research is needed to determine how these forms of evidence can be incorporated into the current auditing framework, how this framework should be changed, the evolutionary approach to change from the current to the future model, and, finally, the roles, competencies, and functions of the human auditor in this future environment.

A parallel question that must be raised concerns the audit itself, in light of a large set of interlinked, mutually trusting systems producing consensus numbers. What is its value added? What additional functions must it provide? What will the evolutionary path be?

Conclusions
The concept and nature of audit evidence are changing due to the emergence of Big Data, digital evidence, and the electronic traces enabled by RFID, GPS, and IoT recording. The consequence of these developments is a progressive overlap between management, management control, continuous monitoring, and continuous auditing functions, which must be somewhat re-conceptualized. Auditing systems will likely be complete ecosystems, with layer upon layer of data treatments and largely automated processes. The upper layers will comprise the complex set of decisions that lead to a final opinion or a final assessment of a complex process. These upper layers of assurance decisions will be constantly monitored and will serve to add improvements and create better man-machine performance in assurance.

Evidence will be exponentially expanded in volume, and analytics will serve to summarize and explain its meaning. The analytics to be used will vary with the context, but will generally be strongly based on automatically sensed data, will be stochastic, and will constantly be scrutinized for effectiveness.

Limitations
This note is speculative in nature, aiming to expose forthcoming technological data environments and their interpretation. Its main benefit is to initially raise issues and potentially discuss factors relating to these issues. The main purpose is to call attention to research questions and motivate researchers to embark on their investigation. The main limitation of this note is that the best it can hope for is to be directionally correct and to stimulate thinking and research on a wide gamut of related topics.

Some research issues


Evidence in the future audit will draw on different fields of knowledge, ranging from computer science to microelectronics, industrial engineering, statistics, and auditing. Some of the interesting issues raised throughout this paper include:

• the level of aggregation of auditor control and action
• the circumstances in which auditors should create monitoring transaction streams
• the principles, regulations, and methods of audit in this projected environment
• the types of new anomalies, frauds, and serious operational risk issues that can arise in this environment
• the types of technology utilization that emerge as threats to be incorporated into audit concerns, and what those threats are
• methods for the formalization of evidence evaluation processes
• the role and location of analytics in the audit process
• how the extant audit can evolve to a Big Data enriched method of assurance
• what the assurance products of the auditor will be in this environment
• whether the current audit profession will evolve to be the main provider of these products

References
AICPA. 2015. Continuous Auditing and Audit Analytics. Monograph.

American Institute of Certified Public Accountants (AICPA). 2006. Statement on Auditing Standards No.
109, AU Section 314: Understanding the Entity and its Environment and Assessing the Risks of
Material Misstatement. New York.

American Institute of Certified Public Accountants (AICPA). 2012. Statement on Auditing Standards No.
122, AU-C Section 500: Audit Evidence. New York.

Alles, M. G., Kogan, A., and Vasarhelyi, M. A. 2002. Feasibility and economics of continuous assurance.
Auditing: A Journal of Practice & Theory, 21(1), 125-138.

Alles, M. G., A. Kogan, and M. A. Vasarhelyi. 2003. "Black Box Logging and Tertiary Monitoring of Continuous Assurance Systems." Information Systems Control Journal 1: 37-39.

Alles, M. G., and Gray, G. L. 2012. A relative cost framework of demand for external assurance of XBRL
filings. Journal of Information Systems, 26(1), 103-126.

Alles, M. G., G. Brennan, A. Kogan, and M. A. Vasarhelyi. 2006. Continuous Monitoring of Business Process Controls: A Pilot Implementation of a Continuous Auditing System at Siemens. International Journal of Accounting Information Systems 7 (2): 137-161.

Chiu, V., Q. Liu, and M. A. Vasarhelyi. 2014. The Development and Intellectual Structure of Continuous
Auditing Research. Journal of Accounting Literature. 33 (1-2): 37-57.

Chui, M., Löffler, M. and Roberts, M. The Internet of Things, McKinsey, March 2010.

Dai, J. A Recommender model for audit apps using the audit data standard, Dissertation in progress,
Rutgers Business School, 2014.

Eugen, C., and Marius, C. 2013. Right to be forgotten. Anales Universitatis Apulensis Series
Jurisprudentia, (16).

Hoogduin, L., K. Yoon, and L. Zhang. 2015. Integrating different forms of data for audit evidence: markets
research becoming relevant to assurance, Accounting Horizons, 29 (2): 431-438.

International Auditing and Assurance Standards Board (IAASB). 2009. International Standard of Auditing
No. 500: Audit Evidence. New York.

Jafri, R., and Arabnia, H. R. 2009. A survey of face recognition techniques. Journal of Information
Processing Systems, 5(2), 41-68.

Jans, M. J., Alles, M., and Vasarhelyi, M. A. 2010. Process mining of event logs in auditing: Opportunities
and challenges. Available at SSRN 2488737.

Kaplan, E., & Hegarty, C. (Eds.). 2005. Understanding GPS: principles and applications. Artech House.

Kogan, A., M. G. Alles,, M. A. Vasarhelyi, and J. Wu. 2014. Design and evaluation of a continuous data
level auditing system. Auditing: A Journal of Practice & Theory, 33(4), 221-245.

Kopetz, H. 2011. Internet of things. In Real-Time Systems, Springer, US: 307-323.

Kozlovski, S. and M. A. Vasarhelyi. 2014. An Audit Ecosystem: A Starting Point with Definitions,
Attributes and Agents, working paper, CarLab, Rutgers Business School, Newark, NJ.

Kuenkaikaew, S. 2013. Predictive audit analytics: evolving to a new era. PhD Dissertation, Rutgers
Business School, Newark, NJ.

Moffitt, K. C., and M. A. Vasarhelyi. 2013. AIS in an Age of Big Data. Journal of Information Systems, 27 (2): 1-19.

McAfee, A., and E. Brynjolfsson. 2012. Big Data: the management revolution. Harvard Business Review,
October 2012, 60–66.

Mock, T. J. 1976. Measurement and accounting information criteria (No. 13). American Accounting
Association.

MIT Technology Review. 2015. The 20 Most Infamous Cyberattacks of the 21st Century. August 25, 2015. http://www.technologyreview.com/view/540786/the-20-most-infamous-cyberattacks-of-the-21st-century-part-i/?utm_campaign=newsletters&utm_source=newsletter-daily-all&utm_medium=email&utm_content=20150825

Monga, V. The new bookkeeper is a Robot, Wall Street Journal, May 5, 2015.

Papazoglou, M. P. 2001. Agent-oriented technology in support of e-business. Communications of the ACM, 44(4), 71-77.

Public Company Accounting Oversight Board (PCAOB). 2010. Audit Evidence. PCAOB Auditing Standards.
Washington, D.C.: PCAOB.

Romero, S., G. Gal, T. J. Mock, and M. A. Vasarhelyi. 2012. A measurement theory perspective on business measurement. Journal of Emerging Technologies in Accounting, 9(1), 1-24.

Shannon C.E. and W. Weaver. 1949. The Mathematical Theory of Information. University of Illinois Press,
Urbana, Illinois.

Shepard, S. (2005). RFID: radio frequency identification. McGraw Hill Professional.

Vasarhelyi, M. A., A. Kogan, and, B. Tuttle. 2015. Big Data in accounting: An overview. Accounting
Horizons, 29 (2):381-396.

Vasarhelyi, M.A. The new scenario of business processes and applications on the digital world, Working
Paper, CarLab, 2015.

Vasarhelyi, M. A., and M. Greenstein. 2003. Underlying principles of the electronization of business: A
research agenda. International Journal of Accounting Information Systems, 4(1), 1-25.

Vasarhelyi, M., and M. Alles. 2006. The Galileo Disclosure Model (GDM): reengineering Business
Reporting through using new technology and a demand driven process perspective to radically
transform the reporting environment for the 21st century. http://raw.rutgers.edu/gdl/Galileo

Vasarhelyi, M. A. 2003. Confirmatory Extranets: a methodology of automatic confirmations. Rutgers Accounting Research Center.

Vasarhelyi M.A., M. G. Alles, and K. T. Williams. 2010. Continuous Assurance for the Now Economy. A
Thought Leadership Paper for the Institute of Chartered Accountants in Australia.

Vasarhelyi, M.A. and F. B. Halper. 1991. The continuous audit of online systems. Auditing: A Journal of
Practice and Theory, 10 (1): 110-125.

Weidemann, A., Fournier, G. R., Forand, L., & Mathieu, P. (2005, May). In harbor underwater threat
detection/identification using active imaging. In Defense and Security (pp. 59-70). International
Society for Optics and Photonics.

Weinman, J. Cloudonomics: The Business Value of Cloud Computing. John Wiley and Sons. Kindle Edition,
2013.

Wei, J. 2014. How wearables intersect with the cloud and the internet of things: Considerations for the
developers of wearables. Consumer Electronics Magazine, IEEE, 3 (3): 53-56.

Zhang, L., A. Pawlicki, D. McQuilken, and W. Titera. 2012. The AICPA Assurance Services Executive Committee Emerging Assurance Technologies Task Force: The Audit Data Standards (ADS) Initiative. Journal of Information Systems.

Wallace-Wells, B. As Jeopardy Robot Watson Grows Up, How Afraid of It Should We Be? New York
Magazine, May 18, 2015.
