
CHAPTER 13 SUMMARY

 The Need For Data Collection And Storage. Data that results from any transaction must be collected and stored for
many reasons.
1. To complete transactions from beginning to end. For a sale, this may involve taking the order, pulling items
from the warehouse, shipping items, billing the customer, collecting the cash, and crediting the customer
account for payment.
2. To follow-up with customers or vendors and to expedite future transactions. For example, if the company
stores name, address, and other details about a customer, it need not reenter that data when the customer
places future orders.
3. To create accounting reports and financial statements.
4. To provide feedback to management so they can effectively and efficiently manage.
Accounting data from transactions is in the form of structured data. Structured data is the type of data that easily fits
into rows and columns. Companies also collect unstructured data. Unstructured data is data that does not easily fit
into rows and columns of fixed length. An example of unstructured data would be the free-form text in customer
reviews of products. This chapter describes the typical storage and processing techniques used in organizations to
manage the mountain of structured data resulting from transactions.
 Storing And Accessing Data. The storage of data and the way in which that data is used are closely interrelated.
Data that will be needed quickly and frequently must be stored in a manner that allows frequent and quick access.
Conversely, the way data is stored determines how easily and quickly it can be accessed and used.
o The Data Hierarchy. The computer data hierarchy is character, field, record, file, and database.
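The hierarchy can be illustrated with a minimal sketch; the names and values below are hypothetical:

```python
# A minimal sketch of the data hierarchy; names and values are hypothetical.
field = "Jones"                                      # a field is a group of characters
record = {"customer_id": 101, "last_name": "Jones"}  # a record is a group of related fields
customer_file = [record,                             # a file is a group of related records
                 {"customer_id": 102, "last_name": "Smith"}]
database = {"customers": customer_file}              # a database is a collection of files
```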
o Data Storage Media. In the early days of computers, files were stored on magnetic tape. Files may be
organized for sequential access, random access, or the indexed sequential access method (ISAM).
 Data Processing Techniques. Processing can be accomplished via batch processing, online processing, or online,
real-time processing. When determining whether batch or real-time processing is appropriate, systems professionals
must consider response time, efficiency, complexity, control, and storage media. Batch systems have slow response
times because the transactions are not processed until the whole group is ready to be processed. Real-time systems
have fast response times because transactions are processed as they are entered. Batch processing is more efficient
for a large volume of similar transactions. Batch processing is also easier to control, while real-time systems
involve more internal control complexities.
 Databases. A database is a collection of data stored on the computer in a form that allows the data to be easily
accessed, retrieved, manipulated, and stored. The term database usually implies a shared database within the
organization. If data is not in a shared database, two problems arise. First, data redundancy occurs when the same
data is stored in more than one file rather than in a shared database; because of this redundancy, adding records,
deleting records, and editing or changing records is more likely to cause errors in the data. Second, concurrency
means that all of the multiple instances of the same data are exactly the same; updating data that is not in a shared
database leads to concurrency problems because some copies may be changed while others are not. In a shared
database, the data is stored only once and any and all changes to a record are immediately available to those who
share the data. Adding records, deleting records, and editing records are less likely to cause erroneous data when
that data is stored only once and shared. The database management system (DBMS) is software that manages the
database and controls the access and use of data by individual users and applications. The DBMS determines which
parts of the database can be read or modified by individuals or processes. Records in a database have relationships
to one another. There are three types of relationships: one-to-one, one-to-many, and many-to-many.
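The three relationship types can be sketched as database tables; all table and column names below are hypothetical examples, not from the text:

```python
import sqlite3

# A sketch of the three relationship types as tables; all table and column
# names are hypothetical.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# One-to-one: each employee has at most one parking space (UNIQUE enforces it).
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE parking (employee_id INTEGER UNIQUE "
            "REFERENCES employee(id), space TEXT)")

# One-to-many: one customer may place many orders.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
            "customer_id INTEGER REFERENCES customer(id))")

# Many-to-many: an order lists many items and an item appears on many orders;
# a junction table resolves this into two one-to-many relationships.
cur.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, descr TEXT)")
cur.execute("CREATE TABLE order_item (order_id INTEGER REFERENCES orders(id), "
            "item_id INTEGER REFERENCES item(id))")

cur.execute("INSERT INTO customer VALUES (1, 'Jones')")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1)])
```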
o The History Of Databases. The earliest databases were flat file databases: two-dimensional tables stored in
text format in sequential files. Such files are not an efficient way to access and use single records.
Databases later evolved into hierarchical databases, which define relationships between records using
an inverted tree structure. These relationships are called parent-child and they represent one-to-many
relationships. Hierarchical databases are efficient in processing large volumes of transactions, but they do
not allow for easy retrieval of records except those within an explicit linkage. This means that hierarchical
databases are not flexible enough to allow various kinds of inquiries of the data. A relational database stores
data in two-dimensional tables that are joined in many ways to represent many different kinds of
relationships in the data. Relational databases are built with many tables, with relationships between tables.
The tables are flexible enough to answer an unlimited number of queries.
o The Need For Normalized Data. To obtain this flexibility, the tables within a relational database must be
designed according to specific rules. The process of converting data into tables that meet the definition of a
relational database is called data normalization. Most relational databases are in third normal form, which
means they meet the first three rules of data normalization. The first three rules of data normalization are as
follows:
1. Eliminate repeating groups. This rule requires that any related attributes (columns) that would be
repeated in several rows be put in a separate table. For example, an order table is kept separate from
an order details table; if these were not separate tables, basic information about the order, such as
customer ID and ship date, would have to be repeated for each item ordered.
2. Eliminate redundant data. Data that depends on only part of a multipart key is moved to a separate
table so it is stored only once.
3. Eliminate columns not dependent on the primary key. Attributes that do not describe the table's key
are moved to a separate table.
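Rule 1 can be sketched with the order example; the table names, columns, and values below are illustrative assumptions:

```python
import sqlite3

# A sketch of rule 1 (eliminate repeating groups) using the order / order
# details example; table names, columns, and values are hypothetical.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# The repeating group (line items) lives in its own table, so order-level
# facts such as customer_id and ship_date are stored only once per order.
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
            "customer_id INTEGER, ship_date TEXT)")
cur.execute("CREATE TABLE order_details (order_id INTEGER "
            "REFERENCES orders(order_id), item_id INTEGER, quantity INTEGER)")

cur.execute("INSERT INTO orders VALUES (1, 42, '2024-01-15')")
cur.executemany("INSERT INTO order_details VALUES (?, ?, ?)",
                [(1, 101, 2), (1, 102, 5), (1, 103, 1)])

# Three line items, but the order header appears only once.
items = cur.execute("SELECT COUNT(*) FROM order_details "
                    "WHERE order_id = 1").fetchone()[0]
```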
o Trade-Offs In Database Storage. While the relational database is very flexible for queries, it is not the most
efficient way to store data that will be used in other ways. When the intended use is processing a large
volume of transactions, the quickest way to access and process records is the hierarchical model. But the
hierarchical model is not flexible for querying. Thus, transaction processing efficiency is traded off against
query flexibility.
 The Use Of A Data Warehouse To Analyze Data. A data warehouse is an integrated collection of enterprise-wide
data that includes 5 to 10 years of non-volatile data, and it is used to support management in decision making and
planning. The data warehouse can be better understood by comparing it to the operational database. The
operational database contains the data that is continually updated as transactions are processed. Periodically, new
data is uploaded to the data warehouse from the operational database, but other than this updating process, the
data in the data warehouse does not change.
o Build The Data Warehouse. The data in the data warehouse must support users’ needs and it must be
standardized across the enterprise. Rather than collect and incorporate all of the available data into the data
warehouse, it is important to include only data that meets user needs. Management, accounting, finance,
production and distribution functions will be using this data warehouse to budget, plan, forecast, and
analyze profitability.
o Identify The Data. The data in the data warehouse must be data that provides the right kind of information
to these user groups. To determine data that should be in a data warehouse it is important to examine user
needs and high-impact processes (HIPs). HIPs are the processes that are critically important and that must
be executed correctly if the organization is to survive and thrive. By identifying and examining both HIPs and
user data needs, the set of data needed in the data warehouse can be determined.
o Standardize The Data. The data in the data warehouse will come from many different processes and sub-
units across the enterprise. Different applications within the enterprise might use the same information but
store it in different formats. Most companies do not feel that they can afford the time or effort to rewrite
source code in these older, legacy systems to make the formats consistent. Rather than change existing
systems, it is easier to standardize the data as it is placed in the data warehouse.
o Cleanse Or Scrub The Data. Since the data in a data warehouse is likely to come from many different
sources within the enterprise, there are likely to be errors and inconsistencies in the data. To the extent
possible, the data should be cleansed to remove or fix errors or problems in the data.
o Upload The Data. Data from each of the HIP systems must be uploaded to the data warehouse. Also, on a
regular basis, new data should be uploaded to the data warehouse. Between the dates that data is
uploaded, the data warehouse is static and it does not change.
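The warehouse-load steps above can be sketched end to end; every field name, coding scheme, and value below is a hypothetical example:

```python
# A simplified sketch of the warehouse-load steps: standardize the coding
# schemes, cleanse bad rows, then upload. All field names and values are
# hypothetical.
operational = [
    {"cust": " JONES ", "state": "Ohio", "sales": "100"},
    {"cust": "smith",   "state": "OH",   "sales": "n/a"},  # unusable value
]

STATE_CODES = {"Ohio": "OH", "OH": "OH"}  # one standard coding scheme

def cleanse(row):
    """Standardize formats and drop rows whose values cannot be fixed."""
    try:
        sales = float(row["sales"])
    except ValueError:
        return None  # scrub rows that cannot be repaired
    return {"cust": row["cust"].strip().title(),
            "state": STATE_CODES.get(row["state"], row["state"]),
            "sales": sales}

# "Upload": only standardized, cleansed rows reach the data warehouse.
warehouse = [r for r in (cleanse(row) for row in operational) if r is not None]
```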
 Data Analysis Tools. The purpose of a data warehouse is to give managers a rich source of data that they can query
and examine for trends and patterns. Data in the data warehouse is analyzed by the use of data mining and
analytical processing.
o Data Mining. Data mining is the process of searching for identifiable patterns in data that can be used to
predict future behavior. Although data mining can serve many predictive purposes, its most popular use is to
predict the future buying behavior of customers. If businesses are able to more accurately predict customer
buying behavior, they can plan appropriately to produce, distribute, and sell the right products to customers
at the right time. Software must be used to search for trends and patterns in the data. The general term for
these software tools is online analytical processing, or OLAP.
o OLAP. OLAP is a set of software tools that allow online analysis of the data within a data warehouse. The
analytical methods in OLAP usually include:
1. Drill down. This is the successive expansion of data into more detail, going from high-level data to
successively lower levels of data. For example, if a person is examining sales, drill down would involve
examining sales for the year, then by month, then by week or day. This examination of successive levels
of detail is drill down.
2. Consolidation or roll-up. This is the aggregation or collection of similar data. It is the opposite of drill
down in that consolidation takes detailed data and summarizes it into larger groups.
3. Pivoting or rotating data. This is examining data from different perspectives. As an example, sales of
beer can be examined by time (months), by store type (convenience store or liquor store), by container
type (cans or bottles), etc.
4. Time series analysis to identify trends. This is the comparison of figures such as sales over several
successive time periods.
5. Exception reports. These present variances from expectations.
6. What-if simulations. This is used to understand interactions between different parts of the business.
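Consolidation and pivoting can be sketched over a few detail rows; the sales records and dimensions below are hypothetical:

```python
from collections import defaultdict

# A toy sketch of consolidation (roll-up) and pivoting over hypothetical
# beer-sales detail records.
sales = [
    {"month": "Jan", "store": "convenience", "amount": 100},
    {"month": "Jan", "store": "liquor",      "amount": 200},
    {"month": "Feb", "store": "convenience", "amount": 150},
]

def roll_up(rows, dimension):
    """Consolidation: summarize detail rows along one dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["amount"]
    return dict(totals)

by_month = roll_up(sales, "month")  # drill down is the reverse path
by_store = roll_up(sales, "store")  # pivoting: same data, another perspective
```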
 Distributed Data Processing. Data can be stored in a central location, or it can be distributed across various
locations. Similarly, the processing of data and transactions can occur only in a central location or be distributed
across the various locations. In the early days of computing, data processing and databases were stored and
maintained in a central location. This is called centralized processing and centralized databases. However, in
today's IT environment, most processing and databases are distributed. In distributed data processing (DDP) and
distributed databases (DDB), the processing and the databases are dispersed to different locations of the
organization. A distributed database is actually a collection of smaller databases dispersed across several
computers on a computer network.
o DDP and DDB. Distributing the processing and data offers many advantages. These advantages are:
1. Reduced hardware cost. Distributed systems use networks of smaller computers rather
than a single mainframe computer. This configuration is much less costly to purchase
and maintain.
2. Improved Responsiveness. Access is faster since data can be located at the site of the
greatest demand for that data. Processing speed is improved since the processing
workload is spread over several computers.
3. Easier incremental growth. As the organization grows or requires additional computing
resources, new sites can be added quickly and easily. Adding smaller, networked
computers is easier and less costly than adding a new mainframe computer.
4. Increased user control and user involvement. If data and processing are distributed
locally, the local users have more control over the data. This control also allows users
to be more involved in the maintenance of the data and users are therefore more
satisfied.
5. Automatic integrated backup. When data and processing are distributed across several
computers, the failure of any single site is not as harmful. Other computers within the
network can take on extra processing or data storage to make up for the loss of any
single site.

However, it is important to recognize that there are also disadvantages to the use of DDP and
DDB. These disadvantages are increased difficulty of managing, controlling, and
maintaining the integrity of the data.
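A distributed database can be sketched as records dispersed across several node databases; the node layout and keys below are hypothetical:

```python
# A toy sketch of a distributed database: records are dispersed across
# several small node databases by hashing the key. The node layout and
# keys are hypothetical.
NODES = {0: {}, 1: {}, 2: {}}  # three small databases on a network

def node_for(key):
    """Locate the site that holds (or will hold) this key."""
    return hash(key) % len(NODES)

def put(key, value):
    NODES[node_for(key)][key] = value

def get(key):
    return NODES[node_for(key)].get(key)

put("cust-101", {"name": "Jones"})
put("cust-102", {"name": "Smith"})
```

In a real system, other nodes would also hold replicas so that the failure of one site is not harmful, which is the backup advantage described above.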
o Cloud-Based Databases. Cloud-based database services are a fast-growing area of IT. Many of the largest
computer-related companies are providers of cloud-based database services. The cloud provider generally
provides not only the data storage space but also the software tools to manage and control the database. The
customer company must have at least enough IT infrastructure to access and use the data stored in cloud
storage. The use of cloud data storage results in the same advantages described in chapter 2: scalability,
expanded access, reduced infrastructure, and cost savings. A company that stores part or all of its database
in the cloud must recognize the risks of Database as a Service (DaaS). As discussed in chapter 4, a user of
cloud-based services is dependent on the security, availability, processing integrity, and confidentiality
controls of the provider.
 IT Controls For Data And Databases. Data is a valuable resource that must be protected with good internal
controls. IT general controls assist in preventing unauthorized access and in assuring adequate backup. It is
important to use authentication and hacking controls such as log-in procedures, passwords, security tokens,
biometric controls, firewalls, encryption, intrusion detection, and vulnerability assessment. In addition to these
control procedures, the database management system (DBMS) must be set up so that each authorized user has a
limited view (schema) of the database. Business continuity planning, data backup procedures, and disaster recovery
planning can help assure adequate backup of databases. To ensure integrity (completeness and accuracy) of data in
the database, IT application controls should be used. These controls are input, processing, and output controls such
as data validation, control totals and reconciliation, and reports that are analyzed by managers.
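Two of the application controls named above can be sketched briefly; the transactions and control total below are hypothetical:

```python
# A sketch of two IT application controls: input validation and a batch
# control-total reconciliation. Transactions and totals are hypothetical.
batch = [{"account": "1001", "amount": 250.00},
         {"account": "1002", "amount": 175.50}]
control_total = 425.50  # computed independently when the batch was prepared

def valid(txn):
    """Input control: reject records with malformed fields."""
    return txn["account"].isdigit() and isinstance(txn["amount"], float)

all_valid = all(valid(t) for t in batch)
# Processing control: the processed total must reconcile to the control total.
reconciled = sum(t["amount"] for t in batch) == control_total
```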
 Ethical Issues Related To Data. There are many ethical issues related to the collection, storage, and protection of
data in databases. Companies collect and store a wealth of information about customers in their databases.
o Ethical Obligations Of The Company. All companies collect data from customers that may be private, non-
sharable data. The sensitivity and privacy of that data depends on the nature of the business and the type of
services or products sold. Companies have an ethical obligation to handle confidential data with due care.
Online companies that sell via websites have an even higher duty to maintain customer privacy and
confidentiality. The AICPA Trust Services Principles have an entire section devoted to online privacy. The
privacy framework lists ten privacy practices that should be adhered to by online companies.
o Ethical Obligations Of The Employee. Within organizations, many employees must have access to private
data about clients and customers. These employees have an ethical obligation to avoid misuse of any private
or personal data about customers. There are no specific IT controls that would prevent authorized
employees from disclosing private information, but having and enforcing a code of ethics within the
organization can reduce the chances of such disclosure. Proper IT control procedures such as log-in
procedures, passwords, smart cards, biometric controls, encryption of data, and firewalls can help reduce
unauthorized access by employees.
o Ethical Obligations Of The Customer. Customers have an obligation to provide accurate and complete
information to companies that they deal with when the requests for such data are legitimate business needs.
In addition to this obligation, customers in some cases may have access to company data that should be kept
confidential.

14. How does data differ from information?
Data consists of the basic facts collected from transactions. Information is data that has been manipulated by
summarizing, categorizing, or analyzing to make that data useful to a decision maker.

41. Think of a database that would be needed at a professional service firm to maintain the contact list of
clients or patients at the office of a CPA, attorney or medical doctor. Identify the fields likely to be used in
this database. If you were constructing this database, how many spaces would you allow for each field?
The fields and suggested sizes that would usually be needed are: last name (24), first name (24), middle initial or name
(24), address line 1 (50), address line 2 (50), apartment number (12), city (24), state (2), zip code (9), home telephone
number (10), office telephone number (10), mobile telephone number (10). Some field sizes could be larger or smaller. Of
course, field sizes for fields such as zip code and phone number are more certain. It is important that the field size be
slightly larger than the longest item to appear in that field. In the case of items for which we know the size precisely, the
field size can be set accordingly. For example, zip codes will never include more than 9 digits. These fields represent a
minimal set of necessary fields. Depending on the type of firm, there may be many other fields that would be needed. For
example, a doctor's office would need emergency contact information, while a CPA firm would not need this information.
In addition, many other fields would need to be designed if the company desired more than contact information.
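Enforcing field sizes like these can be sketched in a few lines; the widths below echo the suggestions above and are only examples:

```python
# A sketch of enforcing suggested field sizes; the widths echo the answer
# above and are only suggestions, not fixed requirements.
FIELD_SIZES = {"last_name": 24, "first_name": 24, "state": 2, "zip_code": 9,
               "home_phone": 10}

def fits(record):
    """Return True if every value fits within its field's defined width."""
    return all(len(value) <= FIELD_SIZES[field]
               for field, value in record.items())

ok = fits({"last_name": "Jones", "state": "OH", "zip_code": "441011234"})
bad = fits({"state": "Ohio"})  # 4 characters in a 2-character field
```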

59. (SO 12) Describe the ethical obligations of companies to their online customers.
A company must put processes and safeguards into place to protect the privacy and confidentiality of customer data.
The ten privacy practices described by the AICPA Trust Services Principles are a good source of the guidelines a company
should follow. A summary of those principles would be to notify customers of the data that will be collected, how it
will be used and retained, how the customer may edit the data, and how the customer may consent, or not consent, to
such use of the data.

47. (SO 3) Differentiate between batch processing and real-time processing. What are the advantages and
disadvantages of each form of data processing? Which form is more likely to be used by a doctor's office in
preparing the monthly patient bills?
Batch processing occurs when similar transactions are grouped into a batch and that batch is processed as a group. The
alternative to batch processing is real-time processing. Real-time processing occurs when transactions are processed as
soon as they are entered. Real-time processing is interactive because the transaction is processed immediately. The
advantages of batch processing are that it is an efficient way to process a large volume of like transactions; it is less
complex than real-time systems; it is easier to control and to maintain an audit trail; and the data can be stored on less
complex, sequential storage. The major disadvantage of batch processing is the slow response time. Balances are not
updated in real time, and therefore management does not have current information at all times. The major advantage of
real-time processing is the rapid response time. Since balances are updated in real time, management always has current
information. The disadvantages of real-time processing are that it is less efficient for processing large volumes of like
transactions; it is more complex than batch systems; it is more difficult to control and maintain an audit trail; and data
must be stored in random access databases. Monthly processing of patient bills could be batch processing, because there
would be a high volume of like transactions at month-end.
