
Chapter 6

Data Management

Need for Data Management
• Content needs to be organized in order to be able to
retrieve it in an efficient manner
• Useful information must be delivered to the right person
at the right time in the right format. It must be accurate
and reliable.
• Almost all the data that we provide or access is stored in
databases

File Organization Terms and Concepts

• Database: Group of related files
• File: Group of records of same type
• Record: Group of related fields
• Field: Group of characters representing text or
a number
• Byte: Group of 8 bits that makes up a
character
• Bit: A single binary digit, 0 or 1
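
The hierarchy above can be sketched in Python, building up from a bit to a database. The student records, field names, and values are made up for illustration only.

```python
# Bit -> byte -> field -> record -> file -> database, using made-up data.

# A byte is 8 bits; in ASCII, one byte encodes one character.
bits = "01000001"
byte_value = int(bits, 2)      # the 8 bits read as a number
character = chr(byte_value)    # the character that number encodes

# A field is a group of characters (text or a number).
name_field = "Ana"

# A record is a group of related fields describing one student.
record = {"student_id": 1001, "name": name_field, "course": "IS101"}

# A file is a group of records of the same type.
student_file = [
    record,
    {"student_id": 1002, "name": "Ben", "course": "IS101"},
]

# A database is a group of related files.
database = {
    "students": student_file,
    "courses": [{"course": "IS101", "title": "Info Systems"}],
}

print(character, len(student_file), list(database))
```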

Traditional File Processing

Data Files               Application programs     Users
Financial Info           Bursary programs         Queries/Reports
Student Personal Info    Admissions programs      Queries/Reports
Course-Grades Info       Transcript programs      Queries/Reports
Copyright © 2018 Pearson Education Ltd.
Problems with the Traditional File Environment (1 of 2)

• Files maintained separately by different departments


• Data redundancy
• The presence of duplicate data in multiple data files so that the
same data are stored in more than one place or location

• Data inconsistency
• Errors can arise during data entry.
• Data updated in one location may not be updated in the other
location.
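
A minimal sketch of how redundancy leads to inconsistency, assuming two departments each keep their own copy of a student's details. The file names, fields, and values are hypothetical.

```python
# Two departments keep their own copy of the same student's address
# (data redundancy). All names and values here are hypothetical.
bursary_file = {1001: {"name": "Ana Cruz", "address": "12 Hill Rd"}}
admissions_file = {1001: {"name": "Ana Cruz", "address": "12 Hill Rd"}}

# The student moves, but only the bursary office records the change.
bursary_file[1001]["address"] = "7 Lake Ave"


def find_inconsistencies(file_a, file_b):
    """Return (student_id, field, value_a, value_b) for every mismatch."""
    mismatches = []
    for student_id in file_a.keys() & file_b.keys():
        for field, value_a in file_a[student_id].items():
            value_b = file_b[student_id].get(field)
            if value_a != value_b:
                mismatches.append((student_id, field, value_a, value_b))
    return mismatches


print(find_inconsistencies(bursary_file, admissions_file))
```

With a single shared copy of the address there would be nothing to compare, which is the point of centralizing the data.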

Problems with the Traditional File Environment (2 of 2)
• Program-data dependence
• Changes made to a program may require restructuring the data it uses.
Similarly, changes made to the data may require changes in program code.
• Lack of flexibility
• Cannot generate reports that were not pre-defined and pre-programmed.
• Poor security
• Individual departments responsible for security of their system
• Lack of data sharing
• Silos of data make it extremely difficult to access data across applications.
Hence, it cannot respond to information requirements in a timely fashion

Database Management Systems (1 of 2)
• Database
• A collection of data organized to serve
many applications efficiently by centralizing
the data and controlling redundant data.
• Database management system (DBMS)
– Software for creating, storing, organizing, and
accessing data in a database.
– Interfaces between applications and data
files

Database Management Systems (2 of 2)
• Solves problems of traditional file environment
• Reduced redundancy
• Program data independence
• Flexibility
• Shared data
• Better security
• User authentication using user names and passwords
• Access rights based on roles and responsibilities of users

Human Resources Database with Multiple Views
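
One way to picture a single database serving multiple views is with SQL views, sketched here using Python's built-in sqlite3 module. The employee table, its columns, and the two view names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT,
        salary      REAL,
        health_plan TEXT
    )""")
conn.execute("INSERT INTO employee VALUES (1, 'Ana Cruz', 52000, 'Plan A')")

# Each department sees only the columns it needs from the same stored data.
conn.execute(
    "CREATE VIEW payroll_view AS SELECT employee_id, name, salary FROM employee")
conn.execute(
    "CREATE VIEW benefits_view AS SELECT employee_id, name, health_plan FROM employee")

payroll = conn.execute("SELECT * FROM payroll_view").fetchall()
benefits = conn.execute("SELECT * FROM benefits_view").fetchall()
print(payroll)
print(benefits)
```

The data is stored once; the views are just different windows onto it.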

Relational DBMS (1 of 3)
• Popular relational DBMS today:
• MS Access (for PCs), Oracle,
MySQL, MS SQL Server
• Represents data as two-
dimensional tables or relations
• Each table contains data on an
entity and its attributes
• Entity: Person, place, thing on
which we store information
• Attribute: Each characteristic, or
quality, describing entity

Relational DBMS (2 of 3)
• Table: grid of columns and
rows
• Records (rows): Collection of
attribute values for one instance of the entity
• Fields (columns): Each represents
one attribute of the entity
• When a user enters data
into a form, the data goes
into tables

Relational DBMS (3 of 3)
• Primary key: Field used to
uniquely identify each
record
• Foreign key: Key field used
in second table as look-up
field to identify records from
original table
• Relationships: Used to link
tables to one another via
the primary and foreign keys
• Enables retrieval of data
across several tables
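
The key concepts above can be sketched with Python's built-in sqlite3 module. The supplier and part tables and their contents are hypothetical; the point is how the foreign key in one table refers to the primary key of another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE supplier (
        supplier_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE part (
        part_id     INTEGER PRIMARY KEY,
        part_name   TEXT NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)
    )""")

conn.executemany("INSERT INTO supplier VALUES (?, ?)",
                 [(1, "Acme"), (2, "Bolt Co")])
conn.executemany("INSERT INTO part VALUES (?, ?, ?)",
                 [(10, "Nut", 1), (11, "Bolt", 2), (12, "Washer", 1)])

# The relationship lets one query retrieve data from both tables.
rows = conn.execute("""
    SELECT part.part_name, supplier.name
    FROM part
    JOIN supplier ON part.supplier_id = supplier.supplier_id
    ORDER BY part.part_id
""").fetchall()
print(rows)

# A part that points at a supplier that does not exist is rejected.
try:
    conn.execute("INSERT INTO part VALUES (13, 'Screw', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```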

Capabilities of Database Management Systems
• Data definition
• Data dictionary
• Data manipulation language
• Querying
• Structured Query Language
(SQL)
• Report generation
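
A rough sketch of these capabilities, using Python's built-in sqlite3 module as a stand-in DBMS. The student table and its values are invented, and SQLite's `PRAGMA table_info` is used here as a simple substitute for a full data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: specify the structure of the database content.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")

# Data dictionary: look up the stored definition of each column.
dictionary = [(col[1], col[2])
              for col in conn.execute("PRAGMA table_info(student)")]

# Data manipulation: add and change data with SQL.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Ana", 3.6), (2, "Ben", 2.9), (3, "Cy", 3.2)])
conn.execute("UPDATE student SET gpa = 3.0 WHERE student_id = 2")

# Querying: select the rows and columns that match a condition.
top_students = conn.execute(
    "SELECT name, gpa FROM student WHERE gpa >= 3.2 ORDER BY gpa DESC"
).fetchall()

# Report generation: present the query result in a readable layout.
report = "\n".join(f"{name:<10}{gpa:>5.1f}" for name, gpa in top_students)
print(dictionary)
print(report)
```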

Example of an SQL Query, and Access Query

MS Access Query wizard

The Challenge of Big Data
• Big data
• Massive sets of unstructured, semi-structured and structured data from
transactions, web traffic, social media, sensors, and so on
• Characterized by:
• Large volume, wide variety, high velocity
• Can reveal more patterns, relationships and anomalies than smaller
data sets
• Volumes too great for typical DBMS
• Petabytes, exabytes of data
• Requires new tools and technologies to manage and analyze
Business Intelligence Infrastructure (1 of 4)
• Array of tools for obtaining information from separate systems and
from big data
• Data warehouse
• Data marts
• Hadoop
• In-memory computing
• Analytical tools

Business Intelligence Infrastructure (2 of 4)
• Data warehouse
– Stores current and historical structured data from many core operational
transaction systems
– Consolidates and standardizes information for use across enterprise, but data
cannot be altered
– Provides analysis and reporting tools
– Response times can be slow for very large volumes of data
• Data marts
– Subset of data warehouse
– Typically focus on single subject or line of business
– Lower costs, faster response times

Business Intelligence Infrastructure (3 of 4)
• Hadoop
• Handles structured, semi-structured and unstructured data
• Enables distributed parallel processing of big data across inexpensive
computers
• Capable of handling larger volumes than data warehouses
• Processing distributed across a network of inexpensive servers
• Used by Expedia to analyze transaction data, user interaction data on the
website, and marketing and advertising expenditure logs to monitor the
success of their marketing campaigns.
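
The idea of splitting work into chunks, processing each chunk independently, and combining the partial results can be sketched in plain Python. This is not the Hadoop API; it runs on one machine, and the log lines are invented.

```python
from collections import Counter
from functools import reduce

# Hypothetical log lines. In a real cluster each chunk would sit on a
# different server and be processed there.
chunks = [
    ["search hotel", "book flight"],
    ["search flight", "search hotel"],
    ["book hotel"],
]


def map_chunk(lines):
    """Map step: count actions within one chunk, independently of the others."""
    return Counter(line.split()[0] for line in lines)


def merge(total, partial):
    """Reduce step: combine partial counts into one result."""
    total.update(partial)
    return total


# Each call to map_chunk needs only its own chunk, so the calls could
# run in parallel on separate machines.
partial_counts = [map_chunk(chunk) for chunk in chunks]
totals = reduce(merge, partial_counts, Counter())
print(dict(totals))
```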

Business Intelligence Infrastructure (4 of 4)
• In-memory computing
• Use of distributed RAM to store big data across several computers in a
network
• Uses the computer's main memory (RAM) for data storage to avoid delays in
retrieving data from disk storage
• Can reduce hours/days of processing to seconds

Analytical Tools: Relationships, Patterns, Trends (1 of 5)

• Tools for consolidating, analyzing, and providing access to vast
amounts of data to help users make better business decisions
• Multidimensional data analysis (OLAP)
• Data mining
• Text mining
• Web mining

Analytical Tools: Online Analytical Processing (OLAP) (2 of 5)
• Supports multidimensional data analysis
• Viewing data using multiple dimensions
• Each aspect of information (product, pricing,
cost, region, time period) is a different
dimension
• Example: Compare the Actual Sales for Nuts
and Bolts across the East and West regions.
• OLAP enables rapid, online answers to ad
hoc queries
• MS Excel is well suited for
multidimensional analysis
• Use of Pivot Tables to view information in
different ways
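
A small sketch of viewing the same data along different dimensions, in the spirit of a pivot table. The sales figures are made up, and the `pivot` helper is a hypothetical function written for this example.

```python
from collections import defaultdict

# Made-up sales rows: (product, region, actual_sales).
sales = [
    ("Nuts", "East", 50), ("Nuts", "West", 60),
    ("Bolts", "East", 80), ("Bolts", "West", 40),
    ("Nuts", "East", 20),
]


def pivot(rows, row_dim, col_dim, value):
    """Sum `value` for each (row_dim, col_dim) pair, like a pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[value]
    return {key: dict(cols) for key, cols in table.items()}


# Same data, two views: products down the side, or regions down the side.
by_product = pivot(sales, row_dim=0, col_dim=1, value=2)
by_region = pivot(sales, row_dim=1, col_dim=0, value=2)
print(by_product)
print(by_region)
```

Swapping the row and column dimensions answers a different question without changing the underlying data.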

Pivot Table Reports

Analytical Tools: Data Mining (3 of 5)
• OLAP
• Summarizes data along various dimensions
• Provides results to specific questions
• What is the number of credit card holders, by region, for the years 2019
and 2020?
• Data mining
• Finds hidden patterns, relationships in datasets
• Infers rules to predict future behavior
• What are the characteristics of credit card holders and which bank
customers are likely to apply for a credit card in the next six months?
• Example: customer buying patterns

Number of Credit Card holders (in thousands)
Region   State           2015   2016   ...   2019   2020
East     New York        ...    ...    ...   45     56
West     California      ...    ...    ...   123    167
East     Massachusetts   ...    ...    ...   76     65
South    Georgia         ...    ...    ...   23     78
South    Texas           ...    ...    ...   87     95
West     Oregon          ...    ...    ...   56     83
South    Georgia         ...    ...    ...   68     82

Analytical Tools: Text Mining (4 of 5)
• It is estimated that 80% of all enterprise data consist of
unstructured and semi-structured data.
• Extracts information from textual documents in large semi-
structured and unstructured data sets (blog posts, customer
sentiments expressed on social media and online forums, emails,
survey responses etc.)
• Finds hidden patterns and trends in data
• Sentiment analysis software
• Uncovers tone and emotion in text
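
A toy illustration of sentiment analysis, assuming a tiny hand-made word list. Real sentiment analysis software relies on much larger lexicons or trained models; the posts below are invented.

```python
# Tiny hand-made word lists, for illustration only.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "rude", "hate"}


def sentiment(text):
    """Label text by counting positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "Love the new app, support was helpful!",
    "Checkout is slow and the search is broken.",
    "Ordered on Monday.",
]
labels = [sentiment(post) for post in posts]
print(labels)
```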

Analytical Tools: Web Mining (5 of 5)
• Discovery and analysis of useful patterns and information from semi-structured
and unstructured data on the web
• Example: Google Trends and Twitter Trends show the popularity of searches and topics over
time
• Can be used by businesses to monitor product trends and location of prospective customers.

• Web content mining
• Extracts useful information from web pages
• Used by Google to index web pages for its search engine listings
• Web structure mining
• Extracts information regarding the structure of a website
• Uses quantity and quality of inbound and outbound links to determine the quality of a website
• Web usage mining (web analytics)
• Collection and analysis of data to understand and optimize web usage
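
As a rough sketch of web structure mining, the snippet below counts inbound links per page in a small, invented link graph. It looks only at the quantity of inbound links, not their quality, so it is a simplification of what a search engine would do.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
links = {
    "home.html": ["products.html", "about.html"],
    "blog.html": ["products.html", "home.html"],
    "about.html": ["products.html"],
}

# Count inbound links per page as a crude signal of importance.
inbound = Counter(target for targets in links.values() for target in targets)
ranking = inbound.most_common()
print(ranking)
```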
Static Web Pages
• Static web pages present the
same data to all users; fixed,
unchanging data unless changed
by the web master
• Web page retrieved from web
server.
• No database involved

Databases and the Web
• Many companies use the web to make some internal databases available to customers or
partners
• Browser connects to the database via company servers
• Changes made in the database will be reflected automatically on the web pages
• Allow users to search for and find information on a web database, using a web interface and
a search mechanism
• Web pages are generated on the fly and display different information to different users
based on their search criteria.
• Examples:
o Shopping sites
o YouTube
o Google Search
o Social media sites
o Content Management sites
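
A minimal sketch of a page generated on the fly from a database, using Python's built-in sqlite3 and html modules. The product table, prices, and the `render_search_page` function are hypothetical; a real site would sit behind a web server and application server.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("Steel nut", 0.10), ("Steel bolt", 0.25), ("Brass bolt", 0.40)])


def render_search_page(search_term):
    """Build an HTML page from whatever the database holds right now."""
    rows = conn.execute(
        "SELECT name, price FROM product WHERE name LIKE ? ORDER BY name",
        (f"%{search_term}%",),
    ).fetchall()
    items = "".join(
        f"<li>{escape(name)}: ${price:.2f}</li>" for name, price in rows)
    return f"<h1>Results for {escape(search_term)}</h1><ul>{items}</ul>"


before = render_search_page("bolt")
conn.execute("UPDATE product SET price = 0.30 WHERE name = 'Steel bolt'")
after = render_search_page("bolt")  # new price appears without editing any page
print(before)
print(after)
```

Different search terms produce different pages, and a change in the database shows up the next time the page is generated.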

Linking internal databases to the web


Establishing an Information Policy
• Firm’s rules, procedures, roles for sharing, managing, standardizing
data
• Data administration
• Establishes policies and procedures to manage data
• Data governance
• Deals with policies and processes for managing availability, usability, integrity,
and security of data, especially regarding government regulations
• Database administration
• Creating and maintaining database

Ensuring Data Quality
• Before a new database is in place, a firm must:
• Identify and correct faulty data
• Establish better routines for editing data once the database is in operation
• Data quality audit
• Survey of the accuracy and level of completeness of the data in an
information system
• Data cleansing (data scrubbing)
• Activities to identify and correct errors and inconsistencies in data
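
A simple sketch of a data quality audit and a cleansing step, assuming a handful of invented customer records. The `STATE_CODES` lookup and both helper functions are hypothetical and cover only the problems shown here.

```python
# Hypothetical customer records with typical quality problems.
records = [
    {"id": 1, "name": " ana cruz ", "state": "ny"},
    {"id": 2, "name": "Ben Lee", "state": "California"},
    {"id": 3, "name": "", "state": "NY"},
]
STATE_CODES = {"california": "CA", "new york": "NY"}  # assumed lookup table


def completeness(rows, field):
    """Audit: share of rows where `field` is filled in."""
    filled = sum(1 for row in rows if row[field].strip())
    return filled / len(rows)


def cleanse(row):
    """Cleansing: trim spaces, fix capitalization, standardize state codes."""
    state = row["state"].strip()
    return {
        "id": row["id"],
        "name": row["name"].strip().title(),
        "state": STATE_CODES.get(state.lower(), state.upper()),
    }


name_completeness = completeness(records, "name")
cleaned = [cleanse(row) for row in records]
print(round(name_completeness, 2))
print(cleaned)
```

Cleansing fixes formatting inconsistencies, but the missing name in record 3 still has to be found and corrected at the source.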
