Research Proposal

CONTINUITY EQUATIONS IN CONTINUOUS ASSURANCE

ANR: 201386

Pre-master Accounting

Supervisor : Prof. Dr. W.F.J. Buijink

2014

Abstract

Continuous assurance is a methodology to provide assurance on financial data on a near

real-time basis. One of the fundamental elements of continuous assurance is continuous data

auditing in which the integrity of the data provided by the client is tested. Continuity

equations can be used to evidence assertions regarding data integrity. In order to do so, data

is tested by predicting subsequent values based on a fitting model. In total there are three

models: the simultaneous equations model, the vector autoregressive model and the restricted

vector autoregressive model. I propose to test these models and compare them on the aspect

of anomaly detection capability.

I. Introduction

Continuous assurance has been a subject of interest for auditors and financial professionals

for the last three decades. However, this field of research took off only after Vasarhelyi et al.

(2004) published a widely accepted conceptual framework for continuous assurance. In the

following years additional studies were performed in this field, but most of these studies were

focused on refining the theoretical framework and developing new and innovative analysis

methods. Comparison of existing analysis models was not yet in scope. This proposal focuses

on the comparison of the anomaly detection capability of existing models of continuity

equations.

Conventional audit procedures focus on time-consuming manual testing of a fixed number of randomly selected supporting documents, like invoices or inventory counts. By introducing more advanced audit procedures from the continuous assurance domain, like continuity equations, substantive testing can in theory be performed more efficiently and effectively. The level of assurance can improve while time consumption is reduced.

However, all these audit procedures from the continuous assurance domain are fairly new

and remain mostly untested in the real world. This research intends to investigate one of these

procedures, continuity equations, on a more detailed level. By using continuity equations,

business processes could be tested by detecting anomalies in one or more of the steps within

these processes. The audit procedures or manual testing can then be narrowed down to the

detected anomalies.

Efficient performance of anomaly detection could lead to a paradigm shift in the field of

auditing. Instead of sampling evidence randomly from the population, the level of assurance

can be improved by inspecting exceptions only: audit by exception.

II. Continuous assurance

The Canadian Institute of Chartered Accountants (1999) provides a definition of continuous

assurance: "Continuous auditing [or continuous assurance] is a methodology that enables

independent auditors to provide written assurance on a subject matter using a series of

auditors' reports issued simultaneously with, or a short period of time after, the occurrence of

events underlying the subject matter." The emphasis of continuous assurance is on reducing

the lag between preparing a report and subsequently providing assurance on the matters

reported.

In order to be able to provide assurance on a near real-time basis, the auditors have to rely

heavily on automated testing. Vasarhelyi et al. (2004; 2010) have defined three elements of

continuous assurance and continuous monitoring: Continuous Control Monitoring (CCM),

Continuous Data Auditing (CDA) and Continuous Risk Monitoring and Assessment (CRMA).

CCM can be compared to interim testing of procedures in the conventional audit framework

and CDA can be compared to final testing focusing more on data than procedures. These two

elements combined can be used to provide sufficient assurance. CRMA can be used as an

additional part of the control framework, but is not essential for providing assurance. CDA

verifies the integrity of the data flowing through the information system. The data provided

by the client is the basis for all testing procedures, so data assurance forms an essential part of

continuous assurance. Continuity equations can be used as a tool from the CDA sub-domain

to evidence management assertions focusing on data integrity.

Continuity equations

Continuity equations have been a fundamental part of classical physics since the eighteenth

century. These equations describe the transport of a quantity, while simultaneously ensuring

conservation of this quantity (like mass and/or energy). Accordingly, similar relations can be

defined for the transport of quantities within a system in the financial domain. The movement

of reported quantities, e.g. ordered kilograms or invoiced units, between steps in the key

business processes can be described with continuity equations.

The term continuity equations was coined in 1991, when Vasarhelyi and Halper (1991)

modeled the flow of billing data at AT&T. Although Vasarhelyi and Halper proposed

continuity equations more than 20 years ago, little research has been performed on the

application in practice and implementation of a decent continuity equations model.

In most businesses the flow of goods is the most important basis for revenue recognition.

As such, the flow of goods can be used to provide evidence for the completeness, timeliness

and accuracy of the reported revenue. If the continuity equations hold for a specific business

process, one can assert that there are no leakages from the transaction flow, i.e. the integrity

of the flow of goods can be asserted. Therefore, continuity equations provide a method to

evidence the integrity of the basis for revenue recognition, which makes them a valuable tool

in continuous assurance.

Continuity equations are based on historical data of quantities in the separate steps of

business processes. For example, the sales cycle can be modeled as three separate steps:

receiving the order from the customer, shipping goods to the customer and invoicing for the

ordered and shipped goods. The quantity of goods ordered today will show up in the invoicing step a certain number of days later. The daily flow of goods between these steps can therefore be described by a quantity and a lag. This proposal will focus on the sales cycle consisting of the three previously defined process steps.

Previous research by Leitch and Chen (2003), Kogan et al. (2010) and Alles et al. (2005)

has resulted in three models of continuity equations: the simultaneous equations model

(SEM), vector autoregressive model (VAR) and the restricted vector autoregressive model

(RVAR).

Simultaneous Equations Model

Leitch and Chen (2003) proposed a first model of continuity equations in the field of

assurance: the Simultaneous Equations Model (SEM). When applied to the sales cycle this

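model can be represented as Equation (1), where $SO_t$, $GS_t$ and $IS_t$ denote the quantities ordered, shipped and invoiced on day $t$. Each step in the sales cycle is simultaneously dependent on historic quantities from the previous step. These historic quantities are represented with a lag $\delta$ in each step. This model simplifies the sales cycle by assuming that a single fixed lag separates each pair of consecutive steps.

$$
\begin{aligned}
GS_t &= \alpha \, SO_{t-\delta_1} + \varepsilon_{GS,t} \\
IS_t &= \beta \, GS_{t-\delta_2} + \varepsilon_{IS,t}
\end{aligned}
\tag{1}
$$

The coefficients of this model are estimated by OLS linear regression, optimizing for the overall $R^2$ of the model.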

Leitch and Chen tested the application of SEM on monthly data of financial statements.

They found that SEM outperformed other more conventional models of analytical

procedures.
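
As an illustration of the estimation step, a single SEM equation with one fixed lag can be fitted by ordinary least squares. The sketch below is written in Python and is not the systemfit-based R implementation proposed here; the helper name `fit_fixed_lag`, the intercept term and the toy quantities are assumptions made for this example only.

```python
from statistics import mean


def fit_fixed_lag(predictor, response, lag):
    """OLS fit of response[t] = a + b * predictor[t - lag].

    Returns (intercept a, slope b, R^2). Hypothetical helper for illustration.
    """
    # Pair every response value with the predictor value observed `lag` days earlier.
    x = predictor[:len(predictor) - lag]
    y = response[lag:]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Toy data (assumed): 90% of each day's orders ship exactly two days later.
orders = [120.0, 80.0, 150.0, 60.0, 200.0, 90.0, 110.0, 170.0]
shipments = [0.0, 0.0] + [0.9 * q for q in orders[:-2]]

a, b, r2 = fit_fixed_lag(orders, shipments, lag=2)
print(a, b, r2)
```

With the correct lag the slope recovers the shipped share of the ordered quantity; with a wrong lag the fit degrades, which is why the fixed-lag assumption matters for this model.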

Basic Vector Autoregressive model

Alles et al. (2005) introduced another model: the basic Vector Autoregressive (VAR)

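model. This model for the sales cycle can be represented as Equation (2). In this model $SO_t$, $GS_t$ and $IS_t$ are the quantities ordered, shipped and invoiced at time $t$, the $a$ terms are regression coefficients, the $\varepsilon$ terms are error terms and $p$ is the maximum lag considered.

$$
\begin{aligned}
SO_t &= c_1 + \sum_{i=1}^{p} \left( a_{11}^{(i)} SO_{t-i} + a_{12}^{(i)} GS_{t-i} + a_{13}^{(i)} IS_{t-i} \right) + \varepsilon_{SO,t} \\
GS_t &= c_2 + \sum_{i=1}^{p} \left( a_{21}^{(i)} SO_{t-i} + a_{22}^{(i)} GS_{t-i} + a_{23}^{(i)} IS_{t-i} \right) + \varepsilon_{GS,t} \\
IS_t &= c_3 + \sum_{i=1}^{p} \left( a_{31}^{(i)} SO_{t-i} + a_{32}^{(i)} GS_{t-i} + a_{33}^{(i)} IS_{t-i} \right) + \varepsilon_{IS,t}
\end{aligned}
\tag{2}
$$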

Each of these sub-equations models a predictor for the reported quantities in a specific step

in the business process. As previously defined, the quantities are related to quantities in the

other process steps by a time delay (lag). For example, if orders are shipped in exactly one

day, without exception, and invoicing is performed simultaneously with shipping, the

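resulting predictors can be defined as Equation (3), where $SO_t$, $GS_t$ and $IS_t$ denote the quantities ordered, shipped and invoiced on day $t$.

$$
\widehat{GS}_t = SO_{t-1}, \qquad \widehat{IS}_t = SO_{t-1}
\tag{3}
$$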

The VAR model is estimated by OLS linear regression, optimizing for the overall $R^2$ by trying different lags for the process steps. Only the maximum expected lag is provided to the algorithm, which then tries to find the best fitting model by iterating through all lag possibilities up to the maximum expected lag. The exact lags do not have to be known prior to modeling, as the best fitting lags are determined while modeling.
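
The lag search can be illustrated for a single predictor. The sketch below, in Python rather than the R code built on the vars package, fits one regression per candidate lag on a common sample and keeps the lag with the highest R-squared. A full VAR includes all lags of all variables at once, so this is a deliberate simplification; `best_lag` and the toy quantities are assumptions.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of the simple OLS regression y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)


def best_lag(predictor, response, max_lag):
    """Try every lag from 1 to max_lag and return (best lag, R^2 per lag)."""
    scores = {}
    for lag in range(1, max_lag + 1):
        # Use the same response window for every lag so the fits are comparable.
        y = response[max_lag:]
        x = predictor[max_lag - lag:len(predictor) - lag]
        scores[lag] = r_squared(x, y)
    return max(scores, key=scores.get), scores


# Toy data (assumed): every order ships exactly three days later.
orders = [120.0, 80.0, 150.0, 60.0, 200.0, 90.0, 110.0,
          170.0, 40.0, 130.0, 95.0, 160.0, 70.0, 185.0]
shipments = [0.0, 0.0, 0.0] + orders[:-3]

lag, scores = best_lag(orders, shipments, max_lag=5)
print(lag, scores)
```

Only the maximum lag is supplied by the caller; the best fitting lag falls out of the comparison, which mirrors the idea that exact lags need not be known in advance.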

One can easily understand that it is not always trivial to determine lags prior to the

modeling process, e.g. lags in the purchasing cycle are highly dependent on the policies and

processes at third parties. Therefore, the VAR model can be a powerful tool for modeling

continuity equations when exact lags cannot be predefined easily.

Contrary to the SEM model, the VAR model does not assume that there is a singular fixed

lag between steps. All lags up to a maximum are considered in the model. This can

result in an extensive estimated model with many terms. Therefore, most VAR models are represented

using matrix notation.

Restricted Vector Autoregressive model

Kogan et al. (2010) have shown in their studies that the VAR model shows outstanding

accuracy. More importantly, they showed that the Restricted VAR (RVAR) model resulted in

better accuracy. With a MAPE (mean absolute percentage error) of 0.3374 on the test set it

outscored even several other models, i.e. SEM and VAR type of models. Only the Bayesian

VAR model performed better when taking only the MAPE into account, but it also resulted in

a larger standard deviation for the absolute percentage error. Therefore, the Bayesian VAR

model is not considered viable for auditing purposes. The RVAR model was found to be one

of the best models for continuity equations.
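
For reference, the two accuracy figures compared above, the MAPE and the standard deviation of the absolute percentage error, can be computed as in the following Python sketch. Errors are expressed as fractions here, and skipping observations with an actual value of zero is an assumption of this example, not something the cited studies specify.

```python
from statistics import mean, stdev


def absolute_percentage_errors(actual, predicted):
    """Per-observation errors |predicted - actual| / |actual| (zero actuals skipped)."""
    return [abs(p - a) / abs(a) for a, p in zip(actual, predicted) if a != 0]


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.10 means 10%)."""
    return mean(absolute_percentage_errors(actual, predicted))


# Toy values (assumed), not taken from the data set.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

errors = absolute_percentage_errors(actual, predicted)
print(mape(actual, predicted), stdev(errors))
```

A model with a low MAPE but a widely spread absolute percentage error produces less dependable individual predictions, which is the reason given above for setting the Bayesian VAR model aside.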

The RVAR model is obtained by excluding insignificant coefficients from the VAR model. For example, if the mean lag between order and shipping is less than a month, shipment terms with longer lags are not significant and thus excluded from the model. This method iterates the modeling process per equation by removing all coefficients with $|t|$-statistics below a predefined threshold, as explained in Figure 1. Kogan et al. (2010) identify a suitable threshold for this exclusion step.

[Flowchart: from Start, the data and threshold feed an initial model estimation; parameters with a t-statistic below the threshold are excluded and the model is re-estimated; the loop repeats until all t-statistics are above the threshold, which yields the final model.]

Figure 1. RVAR modeling process. The initial VAR model is restricted by excluding parameters with a t-statistic below a predefined threshold. The model is re-estimated, followed by the next exclusion iteration, until all parameters satisfy the t-statistic requirement.

The RVAR model usually results in less extensive and more accurate estimated models due

to the restriction to significant terms only.
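
The exclusion loop of Figure 1 can be sketched for a single equation as follows. This is a Python illustration using only the standard library, not the R implementation built on the vars package; the regression has no intercept, the threshold of 2.0 is a placeholder rather than the value reported by Kogan et al. (2010), and all function names and data are assumptions.

```python
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_stats(columns, y):
    """OLS without intercept; returns (coefficients, t-statistics)."""
    k, n = len(columns), len(y)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(beta[j] * columns[j][t] for j in range(k)) for t, yi in enumerate(y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t_stats = [beta[i] / (sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, t_stats


def restrict(columns, y, threshold):
    """Drop terms with |t| below threshold and re-estimate until all remaining terms pass."""
    kept = list(range(len(columns)))
    while kept:
        beta, t_stats = ols_t_stats([columns[i] for i in kept], y)
        weak = [i for i, t in zip(kept, t_stats) if abs(t) < threshold]
        if not weak:
            return kept, beta, t_stats
        kept = [i for i in kept if i not in weak]
    return [], [], []


# Toy data (assumed): y depends on the first regressor only; the other two are noise.
rng = random.Random(1)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * a + rng.gauss(0, 0.5) for a in x1]

kept, beta, t_stats = restrict([x1, x2, x3], y, threshold=2.0)
print(kept, beta, t_stats)
```

In the RVAR setting the regressors would be the lagged SO, GS and IS terms of one sub-equation, and the loop would be run once per equation.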

Research question

In total three different models of continuity equations are used in the field of continuous

assurance. Auditors rely on the accuracy and anomaly detection capability of these models to

provide assurance on the data. This leads to my research question:

Which of the existing models of continuity equations in continuous auditing has the best

anomaly detection capability?

III. Method

Data

The proposed base model for the sales cycle is based on three different quantities: the

ordered quantity, the quantity of goods shipped and the quantity invoiced. These three

variables can be provided by most ERP systems on a daily basis.

Data is provided by a Dutch wholesaler in technical supplies. This company uses an off-the-shelf solution of Microsoft Dynamics AX 2009. The data was extracted from separately

generated reports containing transaction quantities for each of the process steps by merging

the columns by date, as presented in Figure 2.

[Diagram: the tables SalesOrders, Shipments and Invoices, each with Date (PK) and Quantity, are joined into SalesData with Date (PK, FK1, FK2, FK3), SO, GS and IS.]

Figure 2. Data model consisting of daily aggregates for three different stages in the sales cycle: ordered quantity (SO), quantity of goods shipped to customer (GS) and quantity invoiced (IS), combined by date via a SQL join clause. The date serves as the primary and foreign key of the data sources involved.
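
The merge by date can be sketched as follows. This Python example assumes an inner join (the join type is not stated in the source), ISO-formatted date strings and made-up quantities; `merge_by_date` and `to_csv` are hypothetical helpers, not part of the extraction actually performed.

```python
import csv
import io


def merge_by_date(orders, shipments, invoices):
    """Inner-join three {date: quantity} mappings on date, sorted chronologically."""
    shared = sorted(set(orders) & set(shipments) & set(invoices))
    return [(d, orders[d], shipments[d], invoices[d]) for d in shared]


def to_csv(rows):
    """Serialize merged rows to CSV text with the four fields date, SO, GS, IS."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "SO", "GS", "IS"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up daily aggregates; ISO dates sort chronologically as plain strings.
orders = {"2007-02-01": 100, "2007-02-02": 80}
shipments = {"2007-02-01": 90, "2007-02-02": 85, "2007-02-03": 10}
invoices = {"2007-02-01": 90, "2007-02-02": 85}

rows = merge_by_date(orders, shipments, invoices)
print(to_csv(rows))
```

With an inner join, a date that is missing from any one of the three reports drops out of the merged data, so days without activity in one step would need to be reported as zero to be retained.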

The data reflects actual day-to-day transaction quantities of February 2007 up to November

2007, excluding Sundays and holidays during which the company was closed for business.

Saturdays are still included, because sometimes high priority orders are shipped on Saturdays.

The resulting data is exported as a CSV file to be imported by the model implementations

in R. The CSV file consists of four data fields, i.e. date, the quantities ordered, quantities

shipped and quantities invoiced. More detailed information about the data can be found in

Appendix A.

Panel A

Variable

Sales orders (SO)

Goods shipped (GS)

Invoices sent (IS)

n

264

264

264

Mean

Std.Dev. 25th Pct. Median 75th Pct.

66,845

60,676

38,384

62,548

83,122

62,068

46,099

42,295

63,326

40,865

60,211

47,237

78,393

60,745

81,303

Panel B

Pearson correlations

        SO       GS       IS
SO      1.000    0.600*   0.588*
GS               1.000    0.960*
IS                        1.000

Table 1. A: sample characteristics of the data set consisting of 264 observations of actual day-to-day transaction quantities in sales orders, goods shipped and invoices sent. B: Pearson correlations between the quantity variables.

Table 1 and Figure 3 present descriptive statistics about the three quantity fields in the data

set. The Pearson correlations show that the GS and IS variables are strongly related. This is

fully in line with the notion that invoices are generated at the same time as the goods are

shipped most of the time. Furthermore, the charts clearly show less activity on Saturdays

compared to weekdays. On Saturdays only priority orders and over-the-counter sales are

handled.

The data is split into two separate parts. The first part will be used as a training set to estimate the model parameters for all three models. The second part will be used as a test set. After estimation, the models will be tested by generating predictions for the test set.
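
A minimal Python sketch of such a split is given below. It assumes the split is chronological, which is the usual choice for models with lagged terms but is not stated explicitly here, and the `train_fraction` value in the example is a placeholder rather than the share used in this proposal.

```python
def split_chronologically(rows, train_fraction):
    """Split time-ordered rows into (training set, test set) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# 264 daily observations, represented here by their index only.
observations = list(range(264))
train, test = split_chronologically(observations, train_fraction=0.5)
print(len(train), len(test))
```

Keeping the observations in time order ensures that every prediction for the test set relies only on data that precedes it.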

Figure 3. Plot of daily aggregates for three different stages in the sales cycle: ordered quantity (SO), quantity of

goods shipped to customer (GS) and quantity invoiced (IS) as provided in the data set.

The models will be implemented in R, a widely used language for statistical computing and data analysis. Rudimentary implementations of these models are already available in the form of R packages.

The SEM model is implemented in four stages: data collection, pre-processing, modeling

and prediction. The code is based on the systemfit package, which has been developed and

published by Arne Henningsen and Jeff D. Hamann and is available via CRAN (Henningsen & Hamann, 2007).

The VAR and RVAR models are also implemented in four stages: data collection, pre-processing, modeling and prediction. The code is centered around the vars package, which has been developed and published by Bernhard Pfaff and Matthieu Stigler and is available via CRAN (Pfaff & Im Taunus, 2007; Pfaff, 2008; Pfaff, 2008). The package includes several functions for modeling VARs, testing the VARs and presenting the results.

Testing of the models

After the model parameters have been estimated on the training set, the resulting models

are tested. Anomaly detection capability is tested by counting false negatives or Type II

errors in the model predictions based on a slightly modified test set. Type I errors or false

positives are not in scope, due to the lack of negative effects on the level of assurance.

The test set is altered by increasing the quantities in five randomly selected observations by

100%. These altered observations serve as injected anomalies in the test set. The test set,

including the seeded anomalies, is then processed by the model implementation and

anomalies are reported.
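
The seeding and counting steps can be sketched as follows. This Python example is not the R test procedure of Appendix C; in particular, the detection rule (flag an observation when it deviates from the prediction by more than a fixed `tolerance`) is an assumption made for illustration, as are the function names and the toy data.

```python
import random


def seed_anomalies(values, count, rng):
    """Return a copy of values with `count` randomly chosen entries doubled (+100%)."""
    seeded = list(values)
    positions = rng.sample(range(len(values)), count)
    for pos in positions:
        seeded[pos] = 2 * seeded[pos]
    return seeded, set(positions)


def count_false_negatives(observed, predicted, seeded_positions, tolerance):
    """Count seeded anomalies whose deviation from the prediction stays within tolerance."""
    flagged = {i for i, (o, p) in enumerate(zip(observed, predicted))
               if abs(o - p) > tolerance}
    return len(seeded_positions - flagged)


# Toy test set (assumed) with a model that predicts the unaltered values perfectly.
values = [50.0 + i for i in range(20)]
rng = random.Random(42)
seeded, positions = seed_anomalies(values, 5, rng)

missed_strict = count_false_negatives(seeded, values, positions, tolerance=10.0)
missed_loose = count_false_negatives(seeded, values, positions, tolerance=1000.0)
print(missed_strict, missed_loose)
```

The two calls show how the count of Type II errors depends on the detection rule: a tight tolerance catches every seeded anomaly in this toy setting, while an overly loose one misses all five.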

In order to improve randomness and reduce the apparent selection bias the testing is

repeated 1,000 times, while randomly selecting five observations to be altered by 100% in the

original test set for every repetition. The mean number of Type II errors found serves as the

test statistic for comparison purposes. These means are compared using a dependent t-test.

The test procedure, as implemented in R, can be found in Appendix C.
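
The comparison of two models by a dependent t-test can be sketched as below. The Python example assumes that both models are run on the same seeded test set in every repetition, so that their miss counts form pairs; the six counts per model are made up for illustration, whereas the proposed procedure would yield 1,000 pairs.

```python
from statistics import mean, stdev


def paired_t_statistic(sample_a, sample_b):
    """t statistic of the dependent (paired) t-test on two equally long samples."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    standard_error = stdev(diffs) / len(diffs) ** 0.5
    return mean(diffs) / standard_error


# Made-up Type II error counts per repetition for two hypothetical models.
misses_model_a = [1, 0, 2, 1, 1, 0]
misses_model_b = [2, 1, 2, 2, 1, 1]

t_value = paired_t_statistic(misses_model_a, misses_model_b)
degrees_of_freedom = len(misses_model_a) - 1
print(mean(misses_model_a), mean(misses_model_b), t_value, degrees_of_freedom)
```

A negative t statistic indicates that the first model misses fewer seeded anomalies on average; its significance is judged against the t distribution with the stated degrees of freedom.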

IV. Expected results

After testing I expect to find the RVAR model to be the superior model in terms of anomaly detection capability. The SEM model will probably underperform due to the oversimplification of the sales cycle steps and the accompanying lag terms. I expect most companies to have two or more lag terms associated with the largest part of the flow of goods. The data provider for the proposed tests, for example, provides next-day delivery for some items, which are shipped separately. The ordered quantity can thus be considered as two or more flows, each with its own lag.

In theory the RVAR model should also outperform the basic VAR model purely based on statistical properties. In both the RVAR and VAR model multiple lag terms are considered and included in the model. This should result in better performance than the SEM model. The RVAR model can be considered an improved version of the basic VAR model due to the exclusion of statistically insignificant terms. Even though the algorithm for estimating the RVAR model on real data is simple and elegant, it could result in a suboptimal estimation. Estimating anomaly detection performance and accuracy prior to running the estimation algorithm is even more difficult.

V. Limitations

The research focuses on Type II errors only, since only false negatives (failing to identify

an anomaly when one exists) influence the level of assurance. The level of assurance is the

most important factor in acceptance of the models used. If the models are not considered

reliable, auditors will not be able to use them. Therefore, actual errors must not pass the

test undiscovered.

However, Type I errors also influence the audit procedure. The detection of false positives

can lead to an increase in audit activities, since all detected anomalies have to be tested

manually. Even though Type I errors are not in scope, the models can only be accepted if the

number of false positives stays below a certain limit.

Data

The data used in this research is provided by a single entity and for a single year only.

Therefore, conclusions and results are only applicable to the data provider and cannot be generalized. In order to be able to generalize the results and conclusions, the proposed methods need to be used on data provided by multiple entities. Furthermore, reliability will be improved by testing data from subsequent years. Since the data is provided by a single entity, selection bias may also occur. In addition, the data set contains noise: pre-existing anomalies might exist in the data set.

REFERENCES

Canadian Institute of Chartered Accountants (CICA). (1999). Continuous Auditing. Toronto, ON, Canada.

Alles, M., Kogan, A., Vasarhelyi, M., & Wu, J. (2005). Continuity Equations in Continuous

Auditing: Detecting Anomalies in Business Processes.

Dzeng, S. (1994). A Comparison of Analytical Procedures Expectation Models Using Both

Aggregate and Disaggregate Data. Auditing: A Journal of Practice & Theory,

13(Fall), 1-24.

Henningsen, A., & Hamann, J. D. (2007). systemfit: A Package for Estimating Systems of

Simultaneous Equations in R. Journal of Statistical Software, 23(4), 1-40.

Kogan, A., Alles, M. G., Vasarhelyi, M. A., & Wu, J. (2010). Analytical Procedures for

Continuous Data Level Auditing: Continuity Equations.

Leitch, R. A., & Chen, Y. (2003). The effectiveness of expectation models in recognizing

error patterns and generating and eliminating hypotheses while conducting analytical

procedures. Auditing: A Journal of Practice & Theory, 22(2), 147-170.

Pfaff, B. (2008). VAR, SVAR and SVEC models: Implementation within R package vars.

Journal of Statistical Software, 27(4), 1-32.

Pfaff, B. (2008). vars: VAR Modelling. R package version, 1-3.

Pfaff, B., & Im Taunus, K. (2007). Using the vars package.

Vasarhelyi, M. A., & Halper, F. B. (1991). The continuous audit of online systems. Auditing:

A Journal of Practice & Theory, 10(1), 110-125.

Vasarhelyi, M. A., Alles, M. G., & Kogan, A. (2004). Principles of analytic monitoring for

continuous assurance. Journal of Emerging Technologies in Accounting, 1(1), 1-21.

Vasarhelyi, M. A., Alles, M., & Williams, K. T. (2010). Continuous assurance for the now

economy. Institute of Chartered Accountants in Australia Sydney, Australia.

Appendix A. Data

The data is provided by a Dutch wholesaler in technical supplies and contains daily

aggregates of the three separate steps in the sales cycle.

Figure 2. Data model consisting of daily aggregates for three different stages in the sales cycle: ordered quantity (SO), quantity of goods shipped to customer (GS) and quantity invoiced (IS), combined by date via a SQL join clause. The date serves as the primary and foreign key of the data sources involved.

Appendix B.

Appendix C. Test algorithm
