Learning Objective 3-1: What are the four categories of Data Analytics?
Learning Objective 3-2: What are some descriptive analytics approaches, including summary statistics
and data reduction?
Learning Objective 3-3: How does the diagnostic approach to Data Analytics work, including profiling
and clustering?
Diagnostic Analytics
Diagnostic analytics provide insight into why things happened or how individual data values
relate to the general population.
Profiling Compares an Individual to the Population
Profiling is done primarily using structured data—data that are stored in a database or
spreadsheet and are readily searchable.
Profiling is used to discover patterns of behavior. In this example, the higher the Z-score
(farther away from the mean), the more likely a customer will have a delayed shipment
(blue circle).
Profiling Relies on Gathering Summary Statistics and Identifying Outliers
1. Identify the objects or activity you want to profile.
2. Determine the types of profiling you want to perform.
3. Set boundaries or thresholds for the activity.
4. Interpret the results and monitor the activity and/or generate a list of exceptions.
5. Follow up on exceptions.
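The profiling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer IDs, the shipping-day figures, and the cutoff of 2 standard deviations are hypothetical and are not taken from the chapter. The Z-score is computed as (value - mean) / standard deviation, using the sample standard deviation.

```python
from statistics import mean, stdev

def z_scores(values):
    """Z-score of each value: (value - mean) / sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]

def flag_exceptions(records, threshold=2.0):
    """Return (label, value, z) for records whose |Z-score| exceeds the threshold."""
    values = [value for _, value in records]
    scores = z_scores(values)
    return [
        (label, value, z)
        for (label, value), z in zip(records, scores)
        if abs(z) > threshold
    ]

# Step 1: the object being profiled is days-to-ship per customer (hypothetical data).
shipping_days = [
    ("C01", 3), ("C02", 4), ("C03", 3), ("C04", 5), ("C05", 4),
    ("C06", 3), ("C07", 4), ("C08", 21), ("C09", 5), ("C10", 4),
]

# Steps 2-3: profile with Z-scores and set an assumed threshold of 2.
exceptions = flag_exceptions(shipping_days, threshold=2.0)

# Steps 4-5: the exception list is what an analyst would review and follow up on.
for label, value, z in exceptions:
    print(f"{label}: {value} days (Z = {z:.2f})")
```

The threshold is a judgment call: a lower cutoff produces a longer exception list to follow up on, and a higher one risks missing unusual activity.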
ACCT 3130
Examples of Profiling
In the continuous audit, an auditor may use Benford’s Law to evaluate the frequency
distribution of the first digits from a large set of numerical data.
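As a rough illustration of the Benford's Law test, the expected share of leading digit d is log10(1 + 1/d), which can be compared with the observed share in a dataset. The sketch below is not an audit procedure from the chapter. The `amounts` list is a hypothetical stand-in (powers of 2, a sequence known to follow the pattern closely), and the function names are made up for this example.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit d under Benford's Law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading nonzero digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])

def observed_frequencies(amounts):
    """Observed share of each leading digit, ignoring zero amounts."""
    digits = [first_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

# Hypothetical stand-in for a large set of transaction amounts.
amounts = [2 ** k for k in range(1, 101)]

expected = benford_expected()
observed = observed_frequencies(amounts)
for d in range(1, 10):
    gap = observed[d] - expected[d]
    print(f"digit {d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}, gap {gap:+.3f}")
```

Digits whose observed share departs noticeably from the expected share are candidates for a closer look; how large a gap matters is left to the auditor's judgment here.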
How Do You Perform Clustering?
Clustering is used to identify groups of similar data elements and the underlying drivers of
those groups.
Clustering algorithms calculate the distances between observations and group together those
elements that are closest to one another.
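One common way to group observations by distance is k-means, shown here as a minimal sketch. The chapter does not name a specific algorithm, so treat k-means as an assumed choice for illustration. The payment data (amount, days to pay) and the starting centroids are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def k_means(points, centroids, max_iter=100):
    """Assign points to the nearest centroid, then move each centroid to the
    mean of its members; repeat until the centroids stop changing."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical payments: (amount, days to pay).
payments = [(100, 2), (110, 3), (95, 1), (900, 30), (950, 28), (1000, 35)]

# Assumed starting centroids: one small payment and one large payment.
labels, centroids = k_means(payments, [payments[0], payments[3]])
print(labels)
print(centroids)
```

Because distance is computed on raw values, the variable with the larger scale (the amount) dominates the grouping; in practice the variables are often standardized first.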
Examples of Clustering
Internal auditors can use clustering to identify groups of transactions that may indicate risk
or fraud in insurance or other payments.
Learning Objective 3-4: When do you use predictive analytics, including regression and classification?
Learning Objective 3-5: What are prescriptive analytics, including machine learning and artificial
intelligence?
What Do We Do Next?
Once diagnostic and predictive analyses have been performed, the decision process can be
aided by rules-based decision support systems or machine learning models, or the results can
be added to an existing artificial intelligence model to improve future predictions.
Decision Support Systems Use Rules to Guide the Accountant
The rules are derived from past behavior to help guide the accountant through a process.
For example, the classification of leases is based on evaluating several rules.
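A rules-based decision support check for lease classification might look like the sketch below. The five rules are modeled on the lessee finance-lease criteria in US GAAP (ASC 842), which is an assumption about which rules the chapter has in mind. The 75% and 90% cutoffs are commonly used benchmarks rather than bright lines, and the `Lease` fields and sample figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Lease:
    transfers_ownership: bool
    purchase_option_reasonably_certain: bool
    lease_term_years: float
    asset_economic_life_years: float
    pv_lease_payments: float
    asset_fair_value: float
    specialized_asset: bool

def triggered_rules(lease, term_threshold=0.75, value_threshold=0.90):
    """Return the names of the rules this lease meets (thresholds are assumptions)."""
    rules = {
        "ownership transfers to lessee": lease.transfers_ownership,
        "purchase option reasonably certain": lease.purchase_option_reasonably_certain,
        "term covers major part of economic life":
            lease.lease_term_years / lease.asset_economic_life_years >= term_threshold,
        "payments cover substantially all of fair value":
            lease.pv_lease_payments / lease.asset_fair_value >= value_threshold,
        "asset is specialized": lease.specialized_asset,
    }
    return [name for name, met in rules.items() if met]

def classify(lease):
    """Finance lease if any rule is met, otherwise operating lease."""
    return "finance" if triggered_rules(lease) else "operating"

# Hypothetical leases for illustration.
equipment = Lease(False, False, 8, 10, 60_000, 100_000, False)
office = Lease(False, False, 3, 10, 20_000, 100_000, False)

print(classify(equipment), triggered_rules(equipment))
print(classify(office), triggered_rules(office))
```

Returning the list of triggered rules, not only the final label, lets the accountant see which rule drove the classification.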
Machine Learning Learns from Past Data to Predict Better Outcomes
What these all have in common is the use of algorithms and statistical models to generate a
previously unknown model that relies on patterns and inferences.
For most applications of artificial intelligence models, companies will outsource the
underlying system to companies like Microsoft, Amazon, or Google rather than develop it
themselves.
These companies have large datasets to create more accurate prediction and
recommendation engines.
Summary
In this chapter, we addressed the third step of the IMPACT cycle model: the “P” for
“performing test plan.” That is, how are we going to test or analyze the data to address a
problem we are facing?
We identified descriptive analytics that help describe what happened with the data,
including summary statistics, and data reduction and filtering.
We provided examples of diagnostic analytics that help users identify relationships in the
data that uncover why certain events happen through profiling, clustering, similarity
matching, and co-occurrence grouping.
We explained examples of predictive analytics and introduced some data mining concepts
related to regression, classification, and link prediction that can help predict future events or
values.
We discussed prescriptive analytics, including decision support systems and artificial
intelligence, and provided some examples of how these systems can make recommendations
for future actions.
We introduced some specific models and terminology related to these tools, including
Benford’s law, test and training data, decision trees and boundaries, linear classifiers, and
support vector machines.
We identified cases where models that overfit existing data are not very accurate at
predicting the future.
We presented some classification terminology—including test and training data, decision
trees and boundaries, linear classifiers, and support vector machines—and talked about the
perils of under- and overfitting the training data and their consequences in predictions using
the test data.
1. ___________ is a set of data used to assess the degree and strength of a predicted relationship.
A. Training data
B. Unstructured data
C. Structured data
D. Test data
2. Data that are organized and reside in a fixed field within a record or a file. Such data are generally
contained in a relational database or spreadsheet and are readily searchable by search
algorithms. The term matching this definition is:
A. Training data
B. Unstructured data
C. Structured data
D. Test data
3. An observation about the frequency of leading digits in many real-life sets of numerical data is
called:
A. Leading digits hypothesis
B. Moore’s law
C. Benford’s law
D. Clustering
4. Which approach to data analytics attempts to predict a relationship between two data items?
A. Similarity matching
B. Classification
C. Link prediction
D. Co-occurrence grouping
5. In general, the more complex the model, the greater the chance of:
A. Overfitting the data
B. Underfitting the data
C. Pruning the data
D. A more accurate prediction of the data
6. In general, the simpler the model, the greater the chance of:
A. Overfitting the data
B. Underfitting the data
C. Pruning the data
D. The need to reduce the amount of data considered
7. ________ is a discriminating classifier that is defined by a separating hyperplane that works first
to find the widest margin (or biggest pipe) and then works to find the middle line.
A. Linear classifier
B. Support vector machine
C. Decision tree
D. Multiple regression
8. ________ mark the split between one class and another.
A. Decision trees
B. Identified questions
C. Decision boundaries
D. Linear classifiers
9. Models associated with regression and classification data approaches have all of the
following important parts except:
A. Identifying which variables (we’ll call these independent variables) might help predict an
outcome (we’ll call this the dependent variable)
B. The functional form of the relationship (linear, nonlinear, etc.)
C. The numeric parameters of the model (detailing the relative weights of each of the
variables associated with the prediction)
D. Test data
10. Which approach to data analytics attempts to assign each unit in a population to one of a small
set of classes to which the unit belongs?
A. Classification
B. Regression
C. Similarity matching
D. Co-occurrence grouping