ACCT 3130

Chapter 3: Performing the Test Plan and Analyzing the Results

Learning Objective 3-1: What are the four categories of Data Analytics?

4 Main Categories of Data Analytics


 Descriptive analytics are procedures that summarize existing data to determine what has
happened in the past.
 Diagnostic analytics are procedures that explore the current data to determine why
something has happened the way it has, typically comparing the data to a benchmark.
 Predictive analytics are procedures used to generate a model that
can be used to determine what is likely to happen in the future.
 Prescriptive analytics are procedures that model data to enable
recommendations for what should be done in the future.
Descriptive Analytics Examples:
 Summary statistics describe a set of data in terms of their location
(mean, median), range (standard deviation, minimum, maximum), shape (quartile), and size
(count).
 Data reduction or filtering is used to reduce the number of observations to focus on
relevant items (that is, highest cost, highest risk, largest impact, etc.). It does this by taking a
large set of data (perhaps the population) and reducing it to a smaller set that has the vast
majority of the critical information of the larger set.
Diagnostic Analytics Examples:
 Profiling identifies the “typical” behavior of an individual, group, or population by compiling
summary statistics about the data (including mean, standard deviations, etc.) and comparing
individuals to the population.
 Clustering helps identify groups (or clusters) of individuals (such as customers) that share
common underlying characteristics—in other words, identifying groups of similar data
elements and the underlying drivers of those groups.
 Similarity matching is a grouping technique used to identify similar individuals based on
data known about them.
 Co-occurrence grouping discovers associations between individuals based on common
events, such as transactions they are involved in.
Predictive Analytics Examples:
 Regression estimates or predicts the numerical value of a dependent variable based on the
slope and intercept of a line and the value of an independent variable.
 Classification predicts a class or category for a new observation based on the manual
identification of classes from previous observations.
 Link prediction predicts a relationship between two data items, such as members of a social
media platform.
Prescriptive Analytics Examples:
 Decision support systems are rule-based systems that gather data and recommend actions
based on the input.
 Machine learning and artificial intelligence are learning models or intelligent agents that
adapt to new external data to recommend a course of action.

Learning Objective 3-2: What are some descriptive analytics approaches, including summary statistics
and data reduction?

Descriptive analytics help summarize what has happened in the past.


 A financial accountant would sum all of the sales transactions within a period to calculate
the value for Sales Revenue that appears on the income statement.
 An analyst would count the number of records in a data extract to ensure the data are
complete before running a more complex analysis.
 An auditor would filter data to limit the scope to transactions that represent the highest
risk.
In all these cases, basic analysis provides an understanding of what has happened in the past to
help decision makers achieve good results and correct poor results.
Summary Statistics
 Summary statistics describe the location, spread, shape, and dependence of a set of
observations.
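As a rough illustration of these measures, the sketch below computes location, spread, shape, and size for a small list of invoice amounts using Python's standard `statistics` module. The data and the `summarize` helper are hypothetical and not taken from the chapter; dependence (for example, correlation between two variables) is left out to keep the sketch short.

```python
import statistics

# Hypothetical invoice amounts (illustrative data, not from the chapter).
invoice_amounts = [120.0, 250.0, 250.0, 310.0, 480.0, 1500.0]

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),                  # size
        "mean": statistics.mean(values),       # location
        "median": statistics.median(values),   # location
        "stdev": statistics.stdev(values),     # spread (sample standard deviation)
        "min": min(values),                    # spread
        "max": max(values),                    # spread
        "q1": q1,                              # shape
        "q3": q3,                              # shape
    }

summary = summarize(invoice_amounts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the mean (485.0) sits well above the median (280.0), which hints that one large invoice is pulling the average upward.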

Data reduction involves the following steps:


1. Identify the attribute you would like to reduce or focus on.
2. Filter the results.
3. Interpret the results.
4. Follow up on results.
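The four steps can be sketched in a few lines of Python. The transactions, the `amount` field, and the 10,000 threshold below are illustrative assumptions, not values from the chapter.

```python
# Hypothetical transactions; the field names are assumptions for illustration.
transactions = [
    {"id": 1, "amount": 150.0},
    {"id": 2, "amount": 12000.0},
    {"id": 3, "amount": 900.0},
    {"id": 4, "amount": 48000.0},
    {"id": 5, "amount": 75.0},
    {"id": 6, "amount": 15500.0},
]

def reduce_to_largest(records, attribute, threshold):
    """Keep records whose attribute is at or above the threshold, largest first."""
    kept = [r for r in records if r[attribute] >= threshold]
    return sorted(kept, key=lambda r: r[attribute], reverse=True)

# Step 1: the attribute to focus on is "amount".
# Step 2: filter the results.
high_value = reduce_to_largest(transactions, "amount", 10_000)

# Step 3: interpret the results, e.g. how much of the total value is covered.
total = sum(r["amount"] for r in transactions)
coverage = sum(r["amount"] for r in high_value) / total
print([r["id"] for r in high_value], f"{coverage:.1%} of total value")

# Step 4: following up on each remaining item is manual work outside this sketch.
```

In this made-up data, half of the records carry almost all of the dollar value, which is the situation in which data reduction pays off.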
Fuzzy matching locates approximate matches
 Useful for identifying relationships in imperfect data.
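One possible way to sketch fuzzy matching is with the similarity ratio in Python's standard `difflib` module. The vendor names, the `best_match` helper, and the 0.8 cutoff are assumptions for illustration; other similarity measures, such as edit distance, could be used instead.

```python
import difflib

# Hypothetical vendor master list (illustrative names only).
vendor_master = ["Acme Supply Co.", "Northwind Traders", "Globex Corporation"]

def best_match(name, candidates, cutoff=0.8):
    """Return the closest candidate by similarity ratio, or None if none is close enough."""
    # Compare in lowercase so that capitalization differences are ignored.
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None

# Imperfect names as they might appear on invoices.
for raw in ["Acme Supply Co", "Nothwind Traders", "Initech LLC"]:
    print(raw, "->", best_match(raw, vendor_master))
```

A lower cutoff finds more approximate matches but also produces more false matches, so the threshold usually needs tuning against known examples.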

Learning Objective 3-3: How does the diagnostic approach to Data Analytics work, including profiling
and clustering?

Diagnostic Analytics
 Diagnostic analytics provide insight into why things happened or how individual data values
relate to the general population.
Profiling Compares an Individual to the Population
 Profiling is done primarily using structured data—data that are stored in a database or
spreadsheet and are readily searchable.
 Profiling is used to discover patterns of behavior. For example, the higher a customer's
Z-score (the farther away from the mean), the more likely that customer is to have a
delayed shipment.
Profiling Relies on Gathering Summary Statistics and Identifying Outliers
1. Identify the objects or activity you want to profile.
2. Determine the types of profiling you want to perform.
3. Set boundaries or thresholds for the activity.
4. Interpret the results and monitor the activity and/or generate a list of exceptions.
5. Follow up on exceptions.
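A minimal sketch of these profiling steps, assuming the activity being profiled is days-to-ship per customer and the threshold is a Z-score of 2 in absolute value. The customer IDs, the data, and the threshold are hypothetical.

```python
import statistics

# Step 1: the activity to profile (hypothetical days-to-ship per customer).
days_to_ship = {
    "C001": 3, "C002": 4, "C003": 3, "C004": 5,
    "C005": 4, "C006": 15, "C007": 4, "C008": 3,
}

def z_scores(values_by_key):
    """Step 2: compare each individual to the population using a Z-score."""
    values = list(values_by_key.values())
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return {key: (value - mean) / stdev for key, value in values_by_key.items()}

def exceptions(values_by_key, threshold=2.0):
    """Steps 3 and 4: apply the threshold and list the individuals that exceed it."""
    scores = z_scores(values_by_key)
    return sorted(key for key, z in scores.items() if abs(z) > threshold)

flagged = exceptions(days_to_ship)
print(flagged)  # Step 5: these exceptions are the ones to follow up on.
```

Only customer C006 is flagged here, because its shipping time is more than two standard deviations from the mean of the group.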
Examples of Profiling
 In the continuous audit, an auditor may use Benford’s Law to evaluate the frequency
distribution of the first digits from a large set of numerical data.
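Benford's Law expects a first digit d to appear with frequency log10(1 + 1/d), so a 1 leads about 30.1% of the time and a 9 about 4.6%. The sketch below compares observed first-digit frequencies with those expectations; the helper names and the sample amounts are assumptions for illustration, and a real test would use a much larger dataset.

```python
import math
from collections import Counter

def benford_expected():
    """Expected first-digit frequencies under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(amount):
    """Leading nonzero digit of a nonzero number."""
    # In scientific notation the first character is the leading digit.
    return int(f"{abs(amount):.15e}"[0])

def first_digit_frequencies(amounts):
    """Observed share of each first digit, ignoring zero amounts."""
    counts = Counter(first_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nonzero amounts to analyze")
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Tiny hypothetical sample, far too small for a meaningful Benford test.
observed = first_digit_frequencies([120, 135, 980, 0, 210])
expected = benford_expected()
for d in range(1, 10):
    print(d, f"observed={observed[d]:.2f}", f"expected={expected[d]:.3f}")
```

Digits whose observed frequency is far from the expected one are candidates for follow-up, not proof of a problem.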
How Do You Perform Clustering?
 Clustering is used to identify groups of similar data elements and the underlying drivers of
those groups.
 Clustering algorithms calculate the distances between observations and group the elements
that are closest together.
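The chapter does not name a specific algorithm, so the sketch below uses k-means, one common distance-based clustering method, as an assumed example. The claims data (amount in thousands, days until payment) and the starting centroids are hypothetical.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Group points around the nearest centroid, then recenter, until stable."""
    for _ in range(max_iter):
        # Assign each point to the centroid at minimum Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Move each centroid to the mean of its members (keep it if it has none).
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical claims: (amount in thousands, days until payment).
claims = [(1.0, 30.0), (1.2, 28.0), (0.9, 32.0), (9.5, 2.0), (10.0, 3.0), (9.8, 1.0)]
labels, centers = k_means(claims, [claims[0], claims[3]])
print(labels)
print(centers)
```

The second cluster (large claims paid within days) is the kind of group an auditor might examine more closely. In practice the features are usually scaled first so that no single attribute dominates the distance.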
Examples of Clustering
 Internal auditors can use clustering to identify groups of transactions that may indicate risk
or fraud in insurance or other payments.

Learning Objective 3-4: When do you use predictive analytics, including regression and classification?

Regression Allows the Accountant to Develop Models to Predict Expected Outcomes


1. Identify the variables that might predict an outcome.
2. Determine the functional form of the relationship.
3. Identify the parameters of the model.
Dependent variable = f(independent variables)
Examples of Regression
 In managerial accounting, regression may predict employee turnover:
Employee turnover = f(current professional salaries, health of the economy [GDP],
salaries offered by other accounting firms or by corporate accounting, etc.)
 In auditing, regression may be used to determine the appropriateness of allowance
accounts:
Allowance for loan losses amount = f(current aged loans, loan type, customer loan
history, collections success)
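To make the three regression steps concrete, the sketch below assumes a linear functional form with a single independent variable and estimates the slope and intercept by ordinary least squares. The loan figures are invented for illustration, and a real allowance model would include several independent variables.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = intercept + slope * x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Step 1: independent variable = aged loan balance, dependent = loan losses
# (hypothetical figures, both in thousands).
aged_loans = [100.0, 200.0, 300.0, 400.0]
loan_losses = [6.0, 13.0, 16.0, 23.0]

# Step 2: a straight line is the assumed functional form.
# Step 3: estimate the parameters of the model.
slope, intercept = fit_line(aged_loans, loan_losses)

def predict(x):
    """Expected loan losses for a given aged loan balance."""
    return intercept + slope * x

print(f"slope={slope:.3f}, intercept={intercept:.2f}, predict(500)={predict(500):.1f}")
```

An auditor could compare the predicted amount with the recorded allowance and investigate large differences.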
The Goal of Classification is to Predict Whether an Individual Will Belong to One Class or Another.
1. Identify the classes you wish to predict.
2. Manually classify an existing set of records.
3. Select a set of classification models.
4. Divide your data into training and testing sets.
5. Generate your model.
6. Interpret the results and select the “best” model.
Classification
 Training data are existing data that have been manually evaluated and assigned a class.
 Test data are existing data used to evaluate the model.
 Decision trees are used to divide data into smaller groups.
 Decision boundaries mark the split between one class and another.
 Pruning removes branches from a decision tree to avoid overfitting the model.
 Linear classifiers are useful for ranking items rather than simply predicting class probability.
 These are useful for determining the really important values, such as valuable customers, or
which transactions are most likely fraudulent.
 Support vector machine is a discriminating classifier that is defined by a separating
hyperplane that works first to find the widest margin (or biggest pipe) and then works to
find the middle line.
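The classification steps can be sketched with the simplest possible decision tree: a single split on one attribute, whose threshold acts as the decision boundary. The labeled records, the slicing used for the training/test split, and the helper names are assumptions for illustration; a random split is more typical in practice.

```python
# Steps 1 and 2: classes are fraud / not fraud, assigned manually
# (hypothetical (amount, is_fraud) records).
records = [
    (50, False), (900, True), (80, False), (1100, True), (120, False), (1500, True),
    (60, False), (400, True), (200, False), (700, True), (1000, True),
]

# Step 4: divide the data into training and testing sets.
train, test = records[:6], records[6:]

def accuracy(threshold, data):
    """Share of records where 'amount > threshold' agrees with the label."""
    return sum((amount > threshold) == label for amount, label in data) / len(data)

def fit_stump(data):
    """Step 5: choose the split point with the best accuracy on the training data."""
    values = sorted({amount for amount, _ in data})
    # Candidate boundaries are midpoints between neighboring observed amounts.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: accuracy(t, data))

boundary = fit_stump(train)

# Step 6: interpret the results on data the model has not seen.
train_accuracy = accuracy(boundary, train)
test_accuracy = accuracy(boundary, test)
print(boundary, train_accuracy, test_accuracy)
```

The boundary separates the training data perfectly but misclassifies one test record, which is why the test data, not the training data, are used to judge the model.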
How to Evaluate Classifiers?


 Try to avoid overfitting, that is, building models that fit the training data too closely. Such
models are actually pretty bad at predicting a future observation.
 Look for the sweet spot where we maximize accuracy on the testing data.
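The following sketch contrasts an overfit model with a simpler one. The "memorizer" copies the label of the closest training record, so it reproduces the training data exactly, noise included; the simple model uses a single cutoff. All records are hypothetical, and one training label, (130, True), is deliberately noisy.

```python
# Hypothetical (amount, is_fraud) records; (130, True) is a noisy training label.
train = [(50, False), (80, False), (130, True), (150, False), (200, False),
         (900, True), (1100, True), (1500, True)]
test = [(60, False), (125, False), (138, False), (220, False),
        (700, True), (1050, True)]

def memorizer(x):
    """Overly complex model: copy the label of the closest training amount."""
    return min(train, key=lambda record: abs(record[0] - x))[1]

def fit_cutoff(data):
    """Simple model: the single cutoff with the most correct training labels."""
    values = sorted(x for x, _ in data)
    cutoffs = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(cutoffs, key=lambda t: sum((x > t) == y for x, y in data))

cutoff = fit_cutoff(train)

def simple(x):
    return x > cutoff

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

# (training accuracy, testing accuracy) for each model.
results = {
    "memorizer": (accuracy(memorizer, train), accuracy(memorizer, test)),
    "simple": (accuracy(simple, train), accuracy(simple, test)),
}
print(results)
```

The memorizer is perfect on the training data but worse on the test data, while the simple cutoff gives up some training accuracy and predicts the test records better.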

Learning Objective 3-5: What are prescriptive analytics, including machine learning and artificial
intelligence?

What Do We Do Next?
 Once other diagnostic and predictive analyses have been performed, the decision process
can be aided by rules-based decision support systems or machine learning models, or the
results can be added to an existing artificial intelligence model to improve future predictions.
Decision Support Systems Use Rules to Guide the Accountant
 The rules are derived from past behavior to help guide the accountant through a process.
 For example, the classification of leases is based on evaluating several rules.
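A rules-based decision support system for lease classification might look like the sketch below. The rules are a simplified version loosely modeled on the lessee criteria in U.S. GAAP (ASC 842); the `Lease` fields, the `classify_lease` helper, and the 75% and 90% thresholds are assumptions for illustration, not figures from the chapter or a substitute for the standard.

```python
from dataclasses import dataclass

@dataclass
class Lease:
    transfers_ownership: bool
    purchase_option_reasonably_certain: bool
    lease_term_years: float
    asset_economic_life_years: float
    pv_lease_payments: float
    asset_fair_value: float
    specialized_asset: bool

def classify_lease(lease, term_threshold=0.75, value_threshold=0.90):
    """Return (classification, reasons) using simplified, assumed rules."""
    reasons = []
    if lease.transfers_ownership:
        reasons.append("ownership transfers to the lessee")
    if lease.purchase_option_reasonably_certain:
        reasons.append("purchase option is reasonably certain to be exercised")
    if lease.lease_term_years / lease.asset_economic_life_years >= term_threshold:
        reasons.append("lease term covers a major part of the asset's economic life")
    if lease.pv_lease_payments / lease.asset_fair_value >= value_threshold:
        reasons.append("present value of payments is substantially all of fair value")
    if lease.specialized_asset:
        reasons.append("asset has no alternative use to the lessor")
    # Meeting any one rule is enough to recommend finance lease treatment.
    return ("finance" if reasons else "operating"), reasons

# Hypothetical leases.
equipment = Lease(False, False, 8, 10, 40_000, 100_000, False)
office = Lease(False, False, 3, 20, 15_000, 100_000, False)
print(classify_lease(equipment))
print(classify_lease(office))
```

Returning the reasons along with the recommendation lets the accountant see which rule drove the result and apply judgment before accepting it.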
Machine Learning Learns from Past Data to Predict Better Outcomes
 What machine learning and artificial intelligence models have in common is the use of
algorithms and statistical models to generate a previously unknown model that relies on
patterns and inferences.
 For most applications of artificial intelligence models, companies will outsource the
underlying system to providers like Microsoft, Amazon, or Google rather than develop it
themselves.
 These providers have large datasets to create more accurate prediction and
recommendation engines.
Summary
 In this chapter, we addressed the third step of the IMPACT cycle model: the “P” for
“performing test plan.” That is, how are we going to test or analyze the data to address a
problem we are facing?
 We identified descriptive analytics that help describe what happened with the data,
including summary statistics, and data reduction and filtering.
 We provided examples of diagnostic analytics that help users identify relationships in the
data that uncover why certain events happen through profiling, clustering, similarity
matching, and co-occurrence grouping.
 We explained examples of predictive analytics and introduced some data mining concepts
related to regression, classification, and link prediction that can help predict future events or
values.
 We discussed prescriptive analytics, including decision support systems and artificial
intelligence, and provided some examples of how these systems can make recommendations
for future actions.
 We introduced some specific models and terminology related to these tools, including
Benford’s law, test and training data, decision trees and boundaries, linear classifiers, and
support vector machines.
 We identified cases where models that overfit existing data are not very accurate at
predicting the future.
 We presented some classification terminology—including test and training data, decision
trees and boundaries, linear classifiers, and support vector machines—and talked about the
perils of under- and overfitting the training data and their consequences in predictions using
the test data.

Multiple Choice Questions

1. ___________ is a set of data used to assess the degree and strength of a predicted relationship.
A. Training data
B. Unstructured data
C. Structured data
D. Test data
2. Data that are organized and reside in a fixed field within a record or a file. Such data are generally
contained in a relational database or spreadsheet and are readily searchable by search
algorithms. The term matching this definition is:
A. Training data
B. Unstructured data
C. Structured data
D. Test data
3. An observation about the frequency of leading digits in many real-life sets of numerical data is
called:
A. Leading digits hypothesis
B. Moore’s law
C. Benford’s law
D. Clustering
4. Which approach to data analytics attempts to predict a relationship between two data items?
A. Similarity matching
B. Classification
C. Link prediction
D. Co-occurrence grouping
5. In general, the more complex the model, the greater the chance of:
A. Overfitting the data
B. Underfitting the data
C. Pruning the data
D. A more accurate prediction of the data
6. In general, the simpler the model, the greater the chance of:
A. Overfitting the data
B. Underfitting the data
C. Pruning the data
D. The need to reduce the amount of data considered
7. ________ is a discriminating classifier that is defined by a separating hyperplane that works first
to find the widest margin (or biggest pipe) and then works to find the middle line.
A. Linear classifier
B. Support vector machine
C. Decision tree
D. Multiple regression
8. ________ mark the split between one class and another.
A. Decision trees
B. Identified questions
C. Decision boundaries
D. Linear classifiers
9. Models associated with regression and classification data approaches have all of these
important parts except:
A. Identifying which variables (we’ll call these independent variables) might help predict an
outcome (we’ll call this the dependent variable)
B. The functional form of the relationship (linear, nonlinear, etc.)
C. The numeric parameters of the model (detailing the relative weights of each of the
variables associated with the prediction)
D. Test data
10. Which approach to data analytics attempts to assign each unit in a population to one of a
small set of classes to which the unit belongs?
A. Classification
B. Regression
C. Similarity matching
D. Co-occurrence grouping