Data Structures and CAATTs

Data Structures and CAATTs for Data Extraction
Acknowledgement
These slides have been adapted from:
IT Auditing, Hall
Learning Objectives
• Understand the components of data structures and how these
are used to achieve data-processing operations.
• Be familiar with structures used in flat-file systems, including
sequential, indexed, hashing, and pointer structures.
• Be familiar with relational database structures and the
principles of normalization.
• Understand the features, advantages, and disadvantages of the
embedded audit module approach to data extraction.
• Know the capabilities and primary features of generalized audit
software.
• Become familiar with the more commonly used features of ACL.
DATA STRUCTURES
 Organization
 Access method
[Figure: components of data structures. Index access methods (index file plus data file): sequential, ISAM, random. Non-index access methods: hashing, pointers. Data file organization: sequential or random.]
FILE PROCESSING OPERATIONS [Table 8-1]
1. Retrieve a record by key
2. Insert a record
3. Update a record
4. Read a file
5. Find next record
6. Scan a file
7. Delete a record
DATA STRUCTURES
 Flat file structures
 Sequential structure [Figure 8-1]
• All records lie in contiguous storage spaces in a specified sequence (by key field)
• Sequential files are simple and easy to process
• The application reads from the beginning, in sequence
• Inefficient if only a small portion of the file is being processed
• Does not permit accessing a record directly
• Efficient: 4, 5; sometimes 3
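A minimal Python sketch of a sequential file, assuming records are held in a list sorted by a key field (the `Record` layout and the `SequentialFile` class are hypothetical). It shows why operations 4 and 5 are cheap while operation 1 must read from the beginning of the file.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: int
    data: str


class SequentialFile:
    """Records kept in contiguous order, sorted by key (hypothetical layout)."""

    def __init__(self, records):
        self.records = sorted(records, key=lambda r: r.key)
        self.position = 0  # current read position for "find next"

    def read_file(self):
        # Operation 4: read the entire file in key sequence (efficient).
        return list(self.records)

    def find_next(self):
        # Operation 5: return the record after the current position (efficient).
        if self.position >= len(self.records):
            return None
        record = self.records[self.position]
        self.position += 1
        return record

    def retrieve(self, key):
        # Operation 1: no direct access, so read from the beginning until found.
        reads = 0
        for record in self.records:
            reads += 1
            if record.key == key:
                return record, reads
            if record.key > key:  # file is sorted, so the key cannot appear later
                break
        return None, reads
```

Retrieving the last key in a three-record file costs three reads; in a file of a million records, the same lookup could cost a million.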
DATA STRUCTURES

 Indexed structure
• In addition to the data file, a separate index file is maintained
• The index contains the physical address in the data file of each indexed record
DATA STRUCTURES

 Indexed random file [Figure 8-2]
• Records are created without regard to physical proximity to other related records
• The index file itself may be organized sequentially or randomly
• Random indexes are easier to maintain; sequential indexes are more difficult
• Advantage over sequential files: rapid searches
• Other advantages: processing of individual records, efficient use of disk storage
• Efficient: 1, 2, 3, 7
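A sketch of an indexed random file, assuming the data file is a Python list in which records are appended in arrival order (no physical ordering by key) and the index is a dict mapping each key to the record's position, standing in for a physical address. The class and method names are illustrative only.

```python
class IndexedRandomFile:
    def __init__(self):
        self.data = []   # data file: records stored in arrival order
        self.index = {}  # index file: key -> position ("address") in the data file

    def insert(self, key, record):
        # Operation 2: store the record wherever there is room and note its address.
        if key in self.index:
            raise KeyError(f"duplicate key {key}")
        self.data.append(record)
        self.index[key] = len(self.data) - 1

    def retrieve(self, key):
        # Operation 1: one index lookup, then direct access to the data file.
        address = self.index.get(key)
        return None if address is None else self.data[address]

    def update(self, key, record):
        # Operation 3: overwrite in place; the address does not change.
        self.data[self.index[key]] = record

    def delete(self, key):
        # Operation 7: drop the index entry; this sketch just marks the slot empty.
        address = self.index.pop(key)
        self.data[address] = None

    def read_in_key_order(self):
        # Reading the whole file in key order means following the index entry by
        # entry and jumping around the data file, so operation 4 is not efficient.
        return [self.data[self.index[k]] for k in sorted(self.index)]
```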
DATA STRUCTURES

 Indexed Sequential Access Method (ISAM)
 Used for large files that require routine batch processing and a moderate degree of individual record processing
 Used for files that span multiple disk cylinders
 Uses a number of indexes with summarized content
 Access time for a single record is slower than for indexed sequential or indexed random files
 Disadvantage: does not perform record insertions efficiently; an insertion requires physical relocation of all records beyond that point (SOS)
 Has 3 physical components: indexes, prime data storage area, overflow area [Figure 8-4]
 A search may have to examine the index, the prime data area, and the overflow area, slowing down access time
 ISAM files are reorganized by integrating overflow records into the prime data area and then reconstructing the indexes
 Very efficient: 4, 5, 6
 Moderately efficient: 1, 3
 Inefficient: 2, 7
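The three ISAM components can be sketched in Python as below. This is a simplified model under stated assumptions, not a real ISAM implementation: the prime area is a list of fixed-size sorted blocks, the index keeps only the highest key of each block (summarized content), and each block has its own overflow chain. The block size and all names are hypothetical, and the initial load is assumed to be non-empty.

```python
import bisect


class ISAMFile:
    """Sketch of ISAM's three components: index, prime data area, overflow area."""

    def __init__(self, records, block_size=3):
        # Assumes `records` is a non-empty list of (key, data) pairs with unique keys.
        self.block_size = block_size
        self._load(sorted(records))

    def _load(self, sorted_records):
        # Prime data area: fixed blocks of (key, data) pairs in key order.
        self.prime = [sorted_records[i:i + self.block_size]
                      for i in range(0, len(sorted_records), self.block_size)]
        # Index: highest key in each block, not one entry per record.
        self.index = [block[-1][0] for block in self.prime]
        # Overflow area: one chain per prime block for records added after loading.
        self.overflow = [[] for _ in self.prime]

    def _block_for(self, key):
        # Search the index for the first block whose highest key is >= key;
        # keys beyond the last block are assigned to the last block.
        i = bisect.bisect_left(self.index, key)
        return min(i, len(self.prime) - 1)

    def insert(self, key, data):
        # The prime area is not rewritten; new records go to the overflow area.
        self.overflow[self._block_for(key)].append((key, data))

    def retrieve(self, key):
        # May have to search the index, the prime block, and the overflow chain.
        b = self._block_for(key)
        for k, d in self.prime[b] + self.overflow[b]:
            if k == key:
                return d
        return None

    def reorganize(self):
        # Integrate overflow records into the prime area, then rebuild the index.
        merged = [rec for block in self.prime for rec in block]
        merged += [rec for chain in self.overflow for rec in chain]
        self._load(sorted(merged))
```

Until `reorganize()` runs, every lookup for a recently inserted key pays for a scan of the overflow chain as well as the prime block, which is the access-time penalty the slide describes.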
EVOLUTION OF ORG./ACCESS METHODS
[Figure: timeline, 1960 to 1990, of legacy systems and DBMS; scales: efficient vs. inefficient, access single records vs. access entire files]
HASHING STRUCTURE
 Employs an algorithm to convert the primary key into a physical record storage address
 No separate index necessary
 Advantage: access speed
 Disadvantages
• Inefficient use of storage
• Different keys may generate the same address (a collision)
 Efficient: 1, 2, 3, 6
 Inefficient: 4, 5, 7
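A hashing structure can be sketched with the division/remainder method, one common hashing technique (assumed here; the slide does not name an algorithm). The key modulo the number of addresses gives the storage address, colliding keys share a chain at the same address, and addresses no key maps to illustrate the wasted storage. The file size of 11 is hypothetical.

```python
class HashedFile:
    """Records placed at addresses computed from their key (division/remainder)."""

    def __init__(self, num_addresses=11):  # hypothetical size; a prime spreads keys better
        self.num_addresses = num_addresses
        self.slots = [[] for _ in range(num_addresses)]  # each address holds a chain

    def address(self, key):
        # Convert the primary key directly into a storage address; no index needed.
        return key % self.num_addresses

    def insert(self, key, data):
        # Different keys can hash to the same address (a collision); they share a chain.
        self.slots[self.address(key)].append((key, data))

    def retrieve(self, key):
        # One address calculation, then a short search of that address's chain.
        for k, d in self.slots[self.address(key)]:
            if k == key:
                return d
        return None

    def unused_addresses(self):
        # Storage reserved for addresses that no key maps to is wasted.
        return sum(1 for chain in self.slots if not chain)
```

With keys 15, 26, and 7 in an 11-address file, 15 and 26 collide at address 4, and 9 of the 11 addresses remain empty.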
POINTER STRUCTURE
 Stores the address (pointer) of a related record in a field of each data record [Figure 8-6]
 Records are stored randomly
 Pointers provide connections between records
 Pointers may also link records between files [Figure 8-7]
 Types of pointers [Figure 8-8]:
 Physical address: actual disk storage location
• Advantage: access speed
• Disadvantage: if the related record moves, the pointer must be changed; without a logical reference, a lost pointer means the referenced record is lost
 Relative address: relative position in the file (e.g., the 135th record)
• Must be manipulated to convert it to a physical address
 Logical address: primary key of the related record
• Key value is converted to a physical address by hashing
 Efficient: 1, 2, 3, 6
 Inefficient: 4, 5, 7
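A sketch of a pointer structure using relative addresses, assuming a hypothetical invoice file held as a list (a record's position in the list is its relative address) and a customer record that points to its first invoice. Each invoice carries a pointer field to the same customer's next invoice, so the chain can be followed even though the records are stored in no particular order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvoiceRecord:
    invoice_no: int
    amount: float
    next_rel: Optional[int]  # relative address of this customer's next invoice


# Hypothetical invoice file: records sit in no particular order.
invoice_file = [
    InvoiceRecord(1003, 75.0, None),
    InvoiceRecord(1001, 120.0, 2),
    InvoiceRecord(1002, 40.0, 0),
]
# Pointer from each customer record into the invoice file.
customer_first_invoice = {"C1": 1}


def invoices_for(customer, first_pointer, file):
    """Follow the pointer chain starting at the customer's first invoice."""
    rel = first_pointer.get(customer)
    chain = []
    while rel is not None:
        # A real system would convert the relative address to a physical address.
        record = file[rel]
        chain.append(record.invoice_no)
        rel = record.next_rel
    return chain
```

If a record were moved within the list without its predecessor's `next_rel` being updated, the chain would break at that point, which is the pointer-loss risk noted on the slide.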
DATABASE STRUCTURES
 Hierarchical and network structures [Figure 8-9]
 Use explicit linkages between records to establish relationships
 Figure 8-9 is an M:N (many-to-many) example
 Relational structure
 Uses implicit linkages between records to establish relationships: foreign keys and primary keys
Relational database: “tables” of rows and columns
Relational records: “foreign keys” in one record establish relationships to related records in other files
[Figure: related CUSTOMERS, INVOICES, and INVENTORY tables]
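The implicit linkage of the relational model can be sketched with plain Python dicts standing in for the CUSTOMERS, INVOICES, and INVENTORY tables. The column names and values are hypothetical. No pointer is stored anywhere; an invoice is connected to its customer and part only because it carries their primary key values as foreign keys.

```python
# Hypothetical tables; dict keys act as primary keys.
customers = {
    "C1": {"name": "Smith Co", "credit_limit": 5000},
    "C2": {"name": "Jones Ltd", "credit_limit": 2000},
}
inventory = {
    "P7": {"description": "Bolt", "unit_price": 0.50},
    "P9": {"description": "Nut", "unit_price": 0.20},
}
# cust_num and part_num are foreign keys into the two tables above.
invoices = [
    {"invoice_no": 101, "cust_num": "C1", "part_num": "P7", "qty": 100},
    {"invoice_no": 102, "cust_num": "C2", "part_num": "P9", "qty": 50},
    {"invoice_no": 103, "cust_num": "C1", "part_num": "P9", "qty": 10},
]


def invoice_detail(inv):
    # The relationship is implied by matching key values, not by a stored address.
    cust = customers[inv["cust_num"]]
    part = inventory[inv["part_num"]]
    return (inv["invoice_no"], cust["name"], part["description"],
            inv["qty"] * part["unit_price"])


def invoices_of(cust_num):
    # Reverse direction: scan INVOICES for rows carrying the customer's key.
    return [inv["invoice_no"] for inv in invoices if inv["cust_num"] == cust_num]
```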
DATABASE STRUCTURES
 User views
 The data a particular user needs to perform his or her assigned tasks
 A single view, or a view designed without user input, leads to problems in meeting the diverse needs of the enterprise
 Trend today: capture data in sufficient detail and diversity to sustain multiple user views
 User views MUST be consolidated into a single “logical view,” or schema
DATABASE STRUCTURES
 Creating views
 Design the output reports, documents, and input screens needed by users or groups
 Physical documents help the designer understand relationships among the data
• 3 user views: Table 8-2, Figure 8-12, Table 8-3
 Then apply normalization principles to the conceptual user views to design the database tables
DATABASE STRUCTURES
 Importance of data normalization
 Critical to the success of a DBMS
 Produces an effective design for grouping data
 Several levels: 1NF, 2NF, 3NF, etc.
 Un-normalized data suffers from:
• Insertion anomalies
• Deletion anomalies
• Update anomalies
 One or more of these anomalies will exist in tables that are below 3NF
DATABASE STRUCTURES
 Normalization process
 Un-normalized data [Table 8-4]
 Tables are free of the 3 anomalies if:
• All non-key attributes are dependent on the primary key
• There are no partial dependencies (on part of the primary key)
• There are no transitive dependencies; non-key attributes are not dependent on other non-key attributes
 “Split” tables are linked via embedded “foreign keys”
 Examples of normalized database tables: Figures 8-13, 8-14
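The split can be illustrated with a small Python sketch. The un-normalized invoice lines below are hypothetical: their primary key is invoice plus part, yet the customer attributes depend only on the invoice (a partial dependency) and, through it, on the customer number (a transitive dependency). Splitting into customer, invoice, and line-item tables linked by embedded foreign keys removes both. The sketch assumes the un-normalized data is internally consistent (one name and address per customer).

```python
# Hypothetical un-normalized invoice lines (primary key: invoice + part).
unnormalized = [
    {"invoice": 101, "part": "P7", "qty": 2, "cust_num": "C1", "cust_name": "Smith", "cust_addr": "1 Elm St"},
    {"invoice": 101, "part": "P9", "qty": 1, "cust_num": "C1", "cust_name": "Smith", "cust_addr": "1 Elm St"},
    {"invoice": 102, "part": "P7", "qty": 5, "cust_num": "C1", "cust_name": "Smith", "cust_addr": "1 Elm St"},
    {"invoice": 103, "part": "P9", "qty": 4, "cust_num": "C2", "cust_name": "Jones", "cust_addr": "9 Oak Ave"},
]


def split_into_3nf(rows):
    """Split so every non-key attribute depends only on its own table's key."""
    customers = {}   # cust_num -> name, address (stored once per customer)
    invoices = {}    # invoice -> cust_num (embedded foreign key)
    line_items = {}  # (invoice, part) -> qty
    for r in rows:
        customers[r["cust_num"]] = {"name": r["cust_name"], "addr": r["cust_addr"]}
        invoices[r["invoice"]] = r["cust_num"]
        line_items[(r["invoice"], r["part"])] = r["qty"]
    return customers, invoices, line_items


def rows_touched_by_address_change(rows, cust_num):
    # Update anomaly: the un-normalized table repeats the address on every line.
    return sum(1 for r in rows if r["cust_num"] == cust_num)
```

Changing customer C1's address touches three rows in the un-normalized table but one entry after the split. Likewise, deleting invoice 103's only line from the un-normalized table would also erase everything known about customer C2 (a deletion anomaly), whereas the split keeps C2 in its own table.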
DATABASE STRUCTURES
 Creating physical tables
 So far the tables exist only on paper
 Next, create the physical files and populate them with data
 Physical views can be produced from the DBMS
 Query function
 Allows users to create customized lists from the database
 Users stipulate, using English-like commands, which tables, records, fields, and filtering criteria are needed to produce the desired list
 The result is a virtual table derived from the actual database tables
 SQL
• SELECT, FROM, WHERE [Figure 8-16]
• De facto standard query language
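A short example using Python's built-in `sqlite3` module and an in-memory database. The tables, columns, and data are hypothetical. It creates two physical tables, populates them, and defines a view that combines SELECT, FROM, and WHERE; querying the view returns a virtual table derived from the base tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for illustration
conn.executescript("""
    CREATE TABLE customers (cust_num TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (
        invoice_no INTEGER PRIMARY KEY,
        -- declared foreign key; SQLite enforces it only when PRAGMA foreign_keys is on
        cust_num   TEXT REFERENCES customers(cust_num),
        amount     REAL
    );
    INSERT INTO customers VALUES ('C1', 'Smith Co'), ('C2', 'Jones Ltd');
    INSERT INTO invoices VALUES (101, 'C1', 1200.0), (102, 'C2', 300.0), (103, 'C1', 80.0);

    -- A view is a virtual table derived from the base tables.
    CREATE VIEW large_invoices AS
        SELECT i.invoice_no, c.name, i.amount
        FROM invoices AS i JOIN customers AS c ON i.cust_num = c.cust_num
        WHERE i.amount > 100;
""")

rows = conn.execute(
    "SELECT invoice_no, name, amount FROM large_invoices ORDER BY invoice_no"
).fetchall()
print(rows)
```

Each user group could be given its own view of this kind, which is one way the separate user views described earlier can be served from a single set of tables.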
DATABASE STRUCTURES
 Auditors and data normalization
 Database normalization is a technical matter that is usually the responsibility of systems professionals.
 The subject has implications for internal control that make it a concern of auditors as well.
 Although most auditors will never be responsible for normalizing an organization’s databases, they should understand the process and be able to determine whether a table is properly normalized.
 To extract data from tables to perform audit procedures, the auditor first needs to know how the data are structured.
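One practical test an auditor might run on extracted data is to check whether an attribute really is determined by another. The sketch below, with hypothetical column names, reports every determinant value that maps to more than one dependent value. A non-empty result shows the dependency is broken in the data, a typical symptom of an update anomaly in a poorly normalized table. An empty result only means no counterexample was found; it does not prove the table is properly normalized.

```python
from collections import defaultdict


def dependency_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical extract: the same customer number carries two different addresses.
extract = [
    {"cust_num": "C1", "cust_addr": "1 Elm St"},
    {"cust_num": "C1", "cust_addr": "1 Elm Street"},
    {"cust_num": "C2", "cust_addr": "9 Oak Ave"},
]
problems = dependency_violations(extract, "cust_num", "cust_addr")
```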
EMBEDDED AUDIT MODULE
 Identifies important transactions live, while they are being processed, and extracts them [Figure 8-18]
 Examples
 Errors
 Fraud
 Compliance
• SAS 78, SAS 94, SAS 99 / SOX
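The EAM idea can be sketched as a screening routine called from inside the application's own processing logic. The selection criteria (a materiality threshold and a non-positive-amount check), the transaction layout, and the in-memory audit file are all hypothetical; a real EAM would write to a protected file that only the auditor controls.

```python
MATERIALITY_THRESHOLD = 10_000.00  # hypothetical auditor-set parameter

audit_file = []  # stands in for the EAM's protected audit file


def embedded_audit_module(transaction):
    """Screen a live transaction and copy ones of audit interest to the audit file."""
    reasons = []
    if transaction["amount"] >= MATERIALITY_THRESHOLD:
        reasons.append("material amount")
    if transaction["amount"] <= 0:
        reasons.append("non-positive amount")
    if reasons:
        audit_file.append({**transaction, "reasons": reasons})


def post_sale(ledger, transaction):
    # Normal application processing; the EAM runs on every transaction as it
    # is processed, which is also where its performance cost comes from.
    embedded_audit_module(transaction)
    ledger.append(transaction)
```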

EMBEDDED AUDIT MODULE
 Disadvantages:
 Operational efficiency: can decrease performance, especially if testing is extensive
 Verifying EAM integrity: difficult in environments with a high level of program maintenance
 Status: increasing need for, demand for, and usage of COA/EAM/CA
GENERALIZED AUDIT SOFTWARE
 Brief history
 Most widely used CAATT [Figure 8-19]
 Usages include:
1) Footing and balancing entire files or selected data items (e.g., extending inventory)
2) Selecting and reporting detail data
3) Selecting stratified statistical samples from data files
4) Formatting results into audit reports (auto work papers!)
5) Printing confirmations
6) Screening / filtering data
7) Comparing multiple files for differences
8) Recalculating values in data
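Several of these GAS usages can be sketched in a few lines of Python. This only illustrates the logic, not how any GAS product implements it. The inventory file, its field names, and the control total are hypothetical. The example also shows why footing alone is not enough: the file foots to the control total, yet recalculating the extended values exposes an error.

```python
# Hypothetical inventory file: quantity, unit cost, and extended value as recorded.
inventory = [
    {"item": "A1", "qty": 10, "unit_cost": 2.50, "recorded_value": 25.00},
    {"item": "B2", "qty": 4, "unit_cost": 12.00, "recorded_value": 48.00},
    {"item": "C3", "qty": 7, "unit_cost": 3.00, "recorded_value": 24.00},  # should be 21.00
]
control_total = 97.00  # hypothetical total reported by the client's system


def foot(records, field):
    # Footing: add up a column for the whole file.
    return round(sum(r[field] for r in records), 2)


def recalculation_exceptions(records, tolerance=0.005):
    # Recalculating (extending) values: qty * unit_cost versus the recorded value.
    return [r["item"] for r in records
            if abs(r["qty"] * r["unit_cost"] - r["recorded_value"]) > tolerance]


def screen(records, predicate):
    # Screening/filtering: keep only records that meet an audit criterion.
    return [r["item"] for r in records if predicate(r)]


def compare_files(old, new, key):
    # Comparing two files: keys present in one file but not the other.
    old_keys = {r[key] for r in old}
    new_keys = {r[key] for r in new}
    return sorted(old_keys - new_keys), sorted(new_keys - old_keys)
```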

 Popular because:
1. GAS software is easy to use and requires little computer background
2. Many products are platform independent and work on mainframes and PCs
3. Auditors can perform tests independently of IT staff
4. GAS can be used to audit the data currently stored in most file structures and formats
 Simple structures [Figure 8-27]
 Complex structures [Figures 8-28, 8-29]
 Auditing issues:
 Auditor must sometimes rely on IT personnel to produce files/data
 Risk that data integrity is compromised by extraction procedures
 Auditors skilled in programming are better prepared to avoid these pitfalls
ACL
 ACL is a proprietary version of GAS
 Leader in the industry
 Designed as an auditor-friendly meta-language (i.e., contains commonly used auditor tests)
 Access to data is generally easy through its ODBC interface
ACL
 Input file definition
 Customizing a view [Figure 8-32]
 Filtering data [Figures 8-34, 8-35]
 Stratifying data [Figure 8-36]
 Statistical analysis
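ACL's own commands are not reproduced here. As a language-neutral illustration, the Python sketch below shows the logic behind stratifying data and then drawing a stratified sample. The interval boundaries, the sample size per stratum, and the amounts are hypothetical, and amounts outside the boundaries are simply ignored in this sketch.

```python
import random


def stratify(amounts, boundaries):
    """Count and total the amounts falling in each interval [low, high)."""
    strata = [{"low": lo, "high": hi, "count": 0, "total": 0.0, "items": []}
              for lo, hi in zip(boundaries, boundaries[1:])]
    for amount in amounts:
        for s in strata:
            if s["low"] <= amount < s["high"]:
                s["count"] += 1
                s["total"] += amount
                s["items"].append(amount)
                break  # amounts outside every interval are ignored
    return strata


def stratified_sample(strata, per_stratum, seed=0):
    # Draw up to `per_stratum` items at random from each stratum.
    rng = random.Random(seed)
    return {(s["low"], s["high"]): rng.sample(s["items"], min(per_stratum, len(s["items"])))
            for s in strata}


amounts = [50, 120, 480, 900, 1500, 2500, 75]  # hypothetical invoice amounts
strata = stratify(amounts, [0, 100, 1000, 5000])
sample = stratified_sample(strata, per_stratum=2)
```

Sampling within each stratum, rather than from the file as a whole, ensures that the few large-value items are represented even when most records are small.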
IT Auditing & Assurance, 2e, Hall & Singleton
