You are on page 1of 52

Chapter 8:

Data Structures and CAATTs


for Data Extraction

IT Auditing, Hall, 4e

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Learning Objectives
o Understand the components of data structures and how these are used
to achieve data-processing operations.
o Be familiar with structures used in flat-file systems, including sequential,
indexes, hashing, and pointer structures.
o Be familiar with relational database structures and the principle of
normalization.
o Understand the features, advantages, and disadvantages of the
embedded audit module approach to data extraction.
o Know the capabilities and primary features of generalized audit
software.
o Become familiar with the more commonly used features of audit
command language.
© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Data Structures

o Two fundamental components:


o Organization: The way records are physically arranged on the
secondary storage device.
o Access Method: Technique used to locate records and to navigate
through the database or file.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Flat File Structures

o Sequential structure (sequential access method):

o All records in contiguous storage spaces in specified sequence


(key field).
o Sequential files are simple & easy to process.
o Application reads from beginning in sequence.
o Approach is not efficient when only a small portion of file being
processed.
o Does not permit accessing a record directly.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Sequential Storage and Access
Method

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Flat File Structures

o Indexed random file:


o Records created without regard to physical proximity to other
related records.
o Physical organization of index file may be sequential or random.
o Random indexes are easier to maintain, in terms of adding
records, than sequential.
o Sequential index can be searched rapidly.
o Principal advantages of indexed random files are in processing
individual records and efficient usage of disk storage.
o Sequential files are more efficient for operations that involve
processing a large portion of a file.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Indexed Random File Structure

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Flat File Structures

o Virtual Storage Access Method (VSAM):


o Large files requiring routine batch processing.
o Moderate degree of individual record processing.
o Can be searched sequentially for efficient batch processing.
o Disadvantage: Does not perform record insertions efficiently –
requires physical relocation of all records beyond that point.
o Has three physical components: indexes, prime data storage area,
overflow area.
o Might have to search index, prime data area, and overflow area –
slowing down access time.
o Integrating overflow records into prime data area, then
reconstructing indexes reorganizes ISAM files.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
VSAM Used for Direct Access

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Inserting a Record into a VSAM File

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Hashing Structure

o Employs algorithm to convert primary key into physical


record storage address with no separate index necessary.
o Advantage is access speed.
o Disadvantages:
o Inefficient use of storage.
o Different keys may create same address (called a collision)
which slows down access to records.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Hashing Technique with Pointer to
Relocate Collision Record

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Pointer Structure

o Stores the address (pointer) of related record in a field with


each data record.
o Pointers provide connections between records and may be used
to link records between files.
o Physical address pointer contains actual disk storage
location which allows direct access to the record.
o Advantage: Access speed.
o Disadvantages: If related record moves, pointer must be
changed. With no logical relationship to records they identify, if
pointer is lost or destroyed, record it references is also lost.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Pointer Structure

o Relative address pointer contains relative position of a


record in the file which must be manipulated to convert to
physical address.
o Logical key pointer contains primary key of related
record.
o Key value is converted by hashing to physical address.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
A Linked-List File

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Pointers Within and Between Files

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Types of Pointers

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Hierarchical & Network
Database Structures

o Early models employed flat-file techniques and new


proprietary database structures.
o Major difference between approaches is degree of process
integration and data sharing that can be achieved.
o Database models designed to support existing flat-file
systems while allowing a move to new levels of data
integration.
o A many-to-many association is illustrated on the next slide.
o Link files may also contain accounting data.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Link File in a Many-to-Many
Relationship

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Relational Database
Structure, Concepts and
Terminology
o Based on indexed sequential file structure which facilitates
direct access to individual records and batch processing of
the file.
o Multiple indexes can create an inverted list or cross-
reference allowing even more flexible access to data.
o Represents data in two-dimensional tables and supports
relational algebra functions of restrict, project, and join.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Indexed Sequential File

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
The Relational Algebra Functions
Restrict, Project, and Join

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Relational Database
Concepts

o Entity is anything the organization wants to capture data


about.
o Data model is the blueprint for creating the physical database.
o Entity relationship diagram (ER) is the graphical
representation used to depict the model.
o Occurrence used to describe number of instances or records
pertaining to a specify entity.
o Attributes are data elements that define an entity.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use .
Relational Database
Concepts
o Association:
o Represented by a line connecting two entities.
o Described by a verb, such as ships, requests, or receives.
o Cardinality is the degree of association between two
entities.
o The number of possible occurrences in one table that are
associated with a single occurrence in a related table.
o Four basic forms: zero or one (0,1), one and only one (1,1),
zero or many (0,M) and one or many (1,M).

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Examples of Entity Associations

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Physical Database Tables

o Properly designed tables possess the following four


characteristics:
o Value of at least one attribute in each occurrence (row) must
be unique. This is the primary key.
o All attribute values in any column must be of the same class.
o Each column in a given table must be uniquely named.
o Tables must conform to the rules of normalization, which
means free from structural dependencies.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use .
Linkages Between Relational
Tables

o Logically related tables need to be physically connected to


achieve associations described in the data model.
o Accomplished by using a foreign key.
o Nature of association between related tables

determines method used to assign foreign keys.


o With foreign keys, programs can be written to navigate

among database tables to provide needed information.


o User view is the set of data a particular user sees.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Linkages Between Relational Tables

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Database Anomalies

o Improperly normalized tables can cause problems that


restrict or deny user access.
o Negative operational symptoms called anomalies.
o To be free of anomalies, tables must be normalized to the
third normal form (3NF).

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Database Anomalies

o Three types of anomalies:


o Update anomaly results from data redundancy in an
unnormalized table. Modification on an attribute must be made in
each of the rows in which the attribute appears.
o Insertion anomaly occurs when a new item cannot be added to
the table because there is no association with the primary key.
o Deletion anomaly involves unintentional deletion of data. When
an attribute item used by only one entity is deleted, all information
about that attribute item is lost.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Normalizing Tables

o Anomalies are symptoms of dependencies: repeating groups,


partial dependencies and transitive dependencies.
o Normalization process identifies and removes dependencies
resulting in tables in which:
o All nonkey (data) attributes in table are dependent on primary key.
o All nonkey attributes are independent of other nonkey attributes.
o If any attribute violates conditions, it needs to be removed,
placed in a separate table and assigned an appropriate key.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Normalized Database Tables

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Data Normalization

o When unnormalized tables are split into multiple 3NF tables,


need to be linked together via foreign keys so data can be
related and made accessible to users.
o Systems professionals usually responsible for data
normalization but internal control implications make it of
concern to auditors.
o Auditor should have an understanding of the process and be able
to determine if table is properly normalized.
o Auditor needs to know how data are structured to extract data and
perform audit procedures.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Six Phases of Database
Design (View Modeling)
o Identify organization’s primary entities.
o Construct data model showing entity associations (cardinality).
o Add primary keys and attributes to the model.
o Assign primary keys to all entities in the model to uniquely identify
records.
o Every attribute should appear in one or more user views.
o Normalize data model and add foreign keys.
o Remove repeating groups, partial and transitive dependencies
and assign foreign keys to link tables.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a
license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Data Model Showing Entity
Associations

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use .
Data Model Showing Keys and
Attributes

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Normalized Data Model

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Six Phases of Database
Design (View Modeling)

o Construct the physical database.


o Create physical tables and populate them with data.
o May take many months in a large installation.
o Prepare the physical user views.
o Normalized tables should support all required views of system
users.
o Personal user queries limits access to authorized data only.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use .
Normalized Tables

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Normalized Tables

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Embedded Audit Module
(EAM)
o Objective to identify important transactions live while they are
being processed and extract copies in real time.
o May be used to monitor controls on an ongoing basis as
required by SAS 109.
o Two significant disadvantages:
o Operational efficiency: Decrease performance from a user’s point
of view, especially if testing is extensive.
o Verifying EAM integrity: May not be a viable audit technique in
environments with a high level of program maintenance.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Embedded Audit Module
Technique

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Generalized Audit Software
(GAS)

o Most widely used CAATT for IS auditing.


o Footing and balancing entire files or selected items.
o Selecting and reporting detailed file data.
o Selecting stratified statistical samples from data files.
o Formatting results of tests into audit reports.
o Printing confirmations in standard or special wording.
o Screening and selectively including or excluding data items.
o Comparing multiple files for identifying differences.
o Recalculating data fields.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Generalized Audit Software

o Popular because:
o GAS software is easy to use and requires little computer
background.
o Many products are platform independent, works on
mainframes and PCs.
o Auditors can perform tests independently of IT staff.
o GAS can be used to audit the data currently being stored in
most file structures and formats.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Using GAS to Access Simple
Structures

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Using GAS to Access
Simple Structures
o May pose problems as not all GAS products capable of
accessing every type of file structure.
o Auditing issues:
o Auditor must sometime rely on IT personnel to produce
files/data.
o Risk that data integrity is compromised by extraction
procedures.
o Auditors skilled in programming better prepared to avoid these
pitfalls.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Using GAS to Access Simple
Structures

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
ACL Software

o ACL is an industry leader of GAS software designed as an


auditor-friendly meta-language (i.e., contains commonly
used auditor tests).
o Text Figures 8.31 – 8.36 highlight ACL’s more commonly
used features: data definition, customizing a view, filtering
data, stratifying data and statistical analysis.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Normalizing Tables in a
Relational Database
APPENDIX

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Steps in the Normalization
Process

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Normalized Tables in a Relational
Database

o Design the user view as an output report.


o Represent the view as a single table that contains all of the
view attributes.
o Remove repeating group data.
o Multiple values for a particular attribute in a specific record.
o Remove partial dependencies.
o Occurs when one or more nonkey attributes are dependent on
only part of the primary key rather than the whole key.
o Remove transitive dependencies.
o Occurs where nonkey attributes are dependent on another
nonkey attribute and independent of table’s primary key.
© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Linkages Between Normalized
Tables

© 2016 Cengage Learning®. May not be scanned, copied or duplicated or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.

You might also like