Data Management
Need for Data Management
• Content must be organized so that it can be retrieved
efficiently
• Useful information must be delivered to the right person
at the right time in the right format. It must be accurate
and reliable.
• Almost all the data that we provide or access is stored in
databases
File Organization Terms and Concepts
Traditional File Processing
[Figure: Users, Application programs, Data Files]
Problems with the Traditional File Environment (1 of 2)
• Data inconsistency
• Errors can arise during data entry.
• Data updated in one location may not be updated in the other
location.
Problems with the Traditional File Environment (2 of 2)
• Program-data dependence
• Changes made to a program may require the data it uses to be restructured.
Similarly, changes made to the data may require changes in program code.
• Lack of flexibility
• Not capable of generating reports that were not defined and programmed in
advance.
• Poor security
• Each department is responsible for the security of its own system
• Lack of data sharing
• Silos of data make it extremely difficult to access data across applications,
so the organization cannot respond to information requirements in a timely fashion
Database Management Systems (1 of 2)
• Database
• A collection of data organized to serve
many applications efficiently by centralizing
the data and controlling redundant data.
• Database management system (DBMS)
– Software that allows for the creation of a
database.
– Interfaces between applications and data
files
Database Management Systems (2 of 2)
• Solves problems of traditional file environment
• Reduced redundancy
• Program-data independence
• Flexibility
• Shared data
• Better security
• User authentication using user names and passwords
• Access rights based on roles and responsibilities of users
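The two security bullets above can be illustrated with a minimal sketch. It is not how any particular DBMS implements security: the role names, the rights, the users, and the helper functions are all hypothetical, and the password hashing uses Python's standard `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac
import os

# Hypothetical roles and the actions each role may perform.
ROLE_RIGHTS = {
    "hr_clerk": {"read_employee"},
    "hr_manager": {"read_employee", "update_salary"},
}

def hash_password(password, salt):
    """Derive a hash so the plain password itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def add_user(users, name, password, role):
    """Store a salted password hash and a role for a new user."""
    salt = os.urandom(16)
    users[name] = {"salt": salt, "hash": hash_password(password, salt), "role": role}

def authenticate(users, name, password):
    """Check a user name and password against the stored hash."""
    user = users.get(name)
    if user is None:
        return False
    return hmac.compare_digest(user["hash"], hash_password(password, user["salt"]))

def is_allowed(users, name, action):
    """Check whether the user's role grants the requested action."""
    return action in ROLE_RIGHTS.get(users[name]["role"], set())

# Invented users for illustration.
users = {}
add_user(users, "dana", "s3cret!", "hr_clerk")
add_user(users, "lee", "pa55word", "hr_manager")
```

In this sketch authentication answers "who is this user?" and the role lookup answers "what may this user do?", which mirrors the two bullets.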
Human Resources Database with Multiple Views
Copyright © 2018 Pearson Education Ltd.
Relational DBMS (1 of 3)
• Popular relational DBMS today:
• MS Access (for PCs), Oracle,
MySQL, MS SQL Server
• Represents data as two-
dimensional tables or relations
• Each table contains data on an
entity and its attributes
• Entity: Person, place, or thing on
which we store information
• Attribute: A characteristic, or
quality, describing the entity
Relational DBMS (2 of 3)
• Table: grid of columns and
rows
• Records (rows): Collection of
attribute values for one instance of the entity
• Fields (columns): Each represents
an attribute of the entity
• When a user enters data
into a form, the data goes
into tables
Relational DBMS (3 of 3)
• Primary key: Field used to
uniquely identify each
record
• Foreign key: Key field used
in second table as look-up
field to identify records from
original table
• Relationships: Used to link
tables to one another via
the primary and foreign keys
• Enables retrieval of data
across several tables
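As a rough illustration of primary keys, foreign keys, and relationships, the sketch below uses Python's built-in `sqlite3` module. The `supplier` and `part` tables and all sample rows are invented for this example and are not taken from the slides.

```python
import sqlite3

# Hypothetical tables and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE supplier (
        supplier_id INTEGER PRIMARY KEY,   -- primary key: unique for each record
        name        TEXT NOT NULL
    );
    CREATE TABLE part (
        part_id     INTEGER PRIMARY KEY,
        part_name   TEXT NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)  -- foreign key
    );
    INSERT INTO supplier VALUES (1, 'Acme Fasteners'), (2, 'Bolt Depot');
    INSERT INTO part VALUES (10, 'Hex nut', 1), (11, 'Carriage bolt', 2), (12, 'Washer', 1);
""")

# The relationship lets a single query retrieve data from both tables.
rows = conn.execute("""
    SELECT part.part_name, supplier.name
    FROM part
    JOIN supplier ON part.supplier_id = supplier.supplier_id
    ORDER BY part.part_id
""").fetchall()

# A part that points at a supplier that does not exist is rejected.
try:
    conn.execute("INSERT INTO part VALUES (13, 'Rivet', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here `supplier_id` is the primary key of `supplier` and appears again in `part` as the look-up (foreign key) field, so each part row can be traced back to exactly one supplier row.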
Capabilities of Database Management Systems
• Data definition
• Data dictionary
• Data manipulation language
• Querying
• Structured Query Language
(SQL)
• Report generation
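The capabilities listed above can be sketched with Python's built-in `sqlite3` module. The `employee` table and its rows are assumptions made for illustration, and `PRAGMA table_info` is SQLite's own way of exposing field definitions, used here as a stand-in for a data dictionary; other DBMS products expose this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: declare the structure of a table.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)

# Data dictionary: the DBMS keeps its own record of each field's name and type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

# Data manipulation: add and change records (invented sample data).
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000.0), ("Ben", "Sales", 48000.0), ("Chen", "IT", 61000.0)],
)
conn.execute("UPDATE employee SET salary = salary + 1000 WHERE name = 'Ben'")

# Querying: an SQL SELECT statement retrieves the records that match a condition.
sales_staff = conn.execute(
    "SELECT name, salary FROM employee WHERE dept = 'Sales' ORDER BY name"
).fetchall()

# Report generation: summarize the data, here head count and average salary per department.
report = conn.execute(
    "SELECT dept, COUNT(*), AVG(salary) FROM employee GROUP BY dept ORDER BY dept"
).fetchall()
```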
Example of an SQL Query and an Access Query
MS Access Query wizard
The Challenge of Big Data
• Big data
• Massive sets of unstructured, semi-structured and structured data from
transactions, web traffic, social media, sensors, and so on
• Characterized by:
• Large volume, wide variety, high velocity
• Can reveal more patterns, relationships and anomalies than smaller
data sets
• Volumes too great for typical DBMS
• Petabytes, exabytes of data
• Requires new tools and technologies to manage and analyze
Business Intelligence Infrastructure (1 of 4)
• Array of tools for obtaining information from separate systems and
from big data
• Data warehouse
• Data marts
• Hadoop
• In-memory computing
• Analytical tools
Business Intelligence Infrastructure (2 of 4)
• Data warehouse
– Stores current and historical structured data from many core operational
transaction systems
– Consolidates and standardizes information for use across enterprise, but data
cannot be altered
– Provides analysis and reporting tools
– Response times can be slow for very large volumes of data
• Data marts
– Subset of data warehouse
– Typically focus on single subject or line of business
– Lower costs, faster response times
Business Intelligence Infrastructure (3 of 4)
• Hadoop
• Handles structured, semi-structured and unstructured data
• Enables distributed parallel processing of big data across inexpensive
computers
• Capable of handling larger volumes than data warehouses
• Processing distributed across a network of inexpensive servers
• Used by Expedia to analyze transaction data, user interaction data on the
website, and marketing and advertising expenditure logs to monitor the
success of their marketing campaigns.
Business Intelligence Infrastructure (4 of 4)
• In-memory computing
• Use of distributed RAM to store big data across several computers in a
network
• Uses the computer's main memory (RAM) for data storage to avoid delays in
retrieving data from disk storage
• Can reduce hours/days of processing to seconds
Analytical Tools: Relationships, Patterns, Trends (1 of 5)
Analytical tools: Online Analytical Processing (OLAP) (2 of 5)
• Supports multidimensional data analysis
• Viewing data using multiple dimensions
• Each aspect of information (product, pricing,
cost, region, time period) is a different
dimension
• Example: Compare the Actual Sales for Nuts
and Bolts across the East and West regions.
• OLAP enables rapid, online answers to ad
hoc queries
• MS Excel is well suited for
multidimensional analysis
• Use of Pivot Tables to view information in
different ways
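The idea of viewing the same data along different dimensions can be sketched in plain Python. The sales figures and the `pivot` helper below are invented for illustration; a spreadsheet pivot table or an OLAP tool performs the same kind of summing at much larger scale.

```python
from collections import defaultdict

# Invented sales records: (product, region, actual_sales).
sales = [
    ("Nuts", "East", 50), ("Nuts", "West", 60),
    ("Bolts", "East", 40), ("Bolts", "West", 55),
    ("Nuts", "East", 10), ("Washers", "West", 20),
]

def pivot(records, row_dim, col_dim):
    """Sum actual sales into a {row value: {column value: total}} table."""
    position = {"product": 0, "region": 1}
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        row_value = record[position[row_dim]]
        col_value = record[position[col_dim]]
        table[row_value][col_value] += record[2]
    return {row: dict(cols) for row, cols in table.items()}

# The same data viewed two ways: products down the side, or regions down the side.
by_product = pivot(sales, "product", "region")
by_region = pivot(sales, "region", "product")

# Comparing Nuts and Bolts across the East and West regions:
comparison = {p: by_product[p] for p in ("Nuts", "Bolts")}
```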
Pivot Table Reports
Analytical Tools: Data Mining (3 of 5)
• OLAP
• Summarizes data along various dimensions
• Provides results to specific questions
[Table: Number of Credit Card holders (in thousands) by Region and State, with columns for 2015, 2016, ..., 2019, 2020]
Analytical Tools: Text Mining (4 of 5)
• It is estimated that 80% of all enterprise data consist of
unstructured and semi-structured data.
• Extracts information from textual documents in large semi-
structured and unstructured data sets (blog posts, customer
sentiments expressed on social media and online forums, emails,
survey responses, etc.)
• Finds hidden patterns and trends in data
• Sentiment analysis software
• Uncovers tone and emotion in text
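A very rough sketch of the idea behind sentiment analysis is to count positive and negative words. The word lists and the `sentiment` function below are assumptions made for illustration only; real sentiment analysis software relies on far richer language models than a fixed word list.

```python
import re

# Tiny hand-made word lists, assumed for illustration only.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "rude", "terrible"}

def sentiment(text):
    """Label text by counting listed positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(word in POSITIVE for word in words) - sum(word in NEGATIVE for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Invented customer comments of the kind found in reviews or social media posts.
comments = [
    "Love the new app, support was fast and helpful!",
    "Delivery was slow and the box arrived broken.",
    "The order arrived on Tuesday.",
]
labels = [sentiment(comment) for comment in comments]
```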
Analytical Tools: Web Mining (5 of 5)
• Discovery and analysis of useful patterns and information from semi-structured
and unstructured data on the web
• Example: Google Trends and Twitter Trends show the popularity of searches and topics over
time
• Can be used by businesses to monitor product trends and location of prospective customers.
Databases and the Web
• Many companies use the web to make some internal databases available to customers or
partners
• Browser connects to the database via company servers
• Changes made in the database will be reflected automatically on the web pages
• Allow users to search for and find information on a web database, using a web interface and
a search mechanism
• Web pages are generated on the fly and display different information to different users
based on their search criteria.
• Examples:
o Shopping sites
o YouTube
o Google Search
o Social media sites
o Content Management sites
Ensuring Data Quality
• Before a new database is in place, a firm must:
• Identify and correct faulty data
• Establish better routines for editing data once the database is in operation
• Data quality audit
• Survey of the accuracy and level of completeness of the data in an
information system
• Data cleansing (data scrubbing)
• Activities to identify and correct errors and inconsistencies in data
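A small sketch of what a data quality audit and data cleansing might look like in code follows. The customer records, field names, and the `clean` and `audit` helpers are hypothetical; real cleansing tools apply many more rules than the three shown here.

```python
# Invented customer records with typical quality problems:
# stray spaces, inconsistent letter case, a missing value, and a duplicate.
records = [
    {"id": 1, "name": "  Maria Lopez", "email": "Maria.Lopez@Example.com", "state": "ny"},
    {"id": 2, "name": "Maria Lopez", "email": "maria.lopez@example.com ", "state": "NY"},
    {"id": 3, "name": "Tom  Reed", "email": "", "state": "CA"},
]

def clean(record):
    """Cleansing: put each field into one consistent format."""
    return {
        "id": record["id"],
        "name": " ".join(record["name"].split()),   # collapse stray whitespace
        "email": record["email"].strip().lower(),   # emails compared in lower case
        "state": record["state"].strip().upper(),   # state codes in upper case
    }

def audit(cleaned):
    """Audit: list ids with a missing email and ids repeating an earlier email."""
    missing, duplicates, seen = [], [], set()
    for record in cleaned:
        if not record["email"]:
            missing.append(record["id"])
        elif record["email"] in seen:
            duplicates.append(record["id"])
        else:
            seen.add(record["email"])
    return missing, duplicates

cleaned = [clean(record) for record in records]
missing, duplicates = audit(cleaned)
```

Note that the duplicate only becomes visible after cleansing, because the two email addresses differ in case and trailing whitespace in the raw data.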