
Sistemas de Información para la Toma de Decisiones
(Information Systems for Decision Making)

Topic 10 - Chapter 11
Data Management:
Warehousing, Analyzing,
Mining & Visualization
Learning Objectives
!  Recognize the importance of data, their managerial issues,
and their life cycle.
!  Describe the sources of data, their collection, and quality
issues.
!  Relate data management to multimedia and document
management.
!  Explain the operation of data warehousing and its role in
decision support.

MTI. Carlos J. Duarte Camacho Sistemas de Información para la Toma de Decisiones. Tema 10 Diapositiva 2
Learning Objectives (cont.)
!  Understand the data access and analysis problem and the data
mining and online analytical processing solutions.

!  Describe data presentation methods and explain geographical
information systems, visual simulations, and virtual reality as
decision support tools.

!  Discuss the role and provide examples of marketing databases.

!  Recognize the role of the Web in data management.

Case: Sears & Data Warehouses
Problem:
!  Sears was caught by surprise in the 1980s as shoppers defected to
specialty stores and discount mass merchandisers.
Solution:
!  Sears constructed a single sales information data warehouse, replacing
18 old databases which were packed with redundant, conflicting &
obsolete data.
!  By 2001, Sears made the following Web initiatives:
"  e-Commerce home improvement center
"  B2B supply exchange for the retail industry
"  Online Toy catalog and much more

Results:
!  The ability to monitor sales by item per store enables Sears to create a
sharp local market focus.
!  Data monitoring of Web-based sales helps Sears plan its marketing and
Web advertising.
!  Response time to queries has dropped from days to minutes.
!  The data warehouse offers Sears employees a tool for making better
decisions.
!  Sears retailing profits have climbed more than 20% annually since the
data warehouse was implemented.

Difficulties of Managing Data
"  The amount of data increases exponentially.
"  Data are scattered throughout organizations and are collected by
many individuals using several methods and devices.
"  Only small portions of an organization’s data are relevant for any
specific decision.
"  An ever-increasing amount of external data needs to be considered
in making organizational decisions.
"  Data are frequently stored in several servers and locations in an
organization.

Difficulties of Managing Data (cont.)
"  Raw data may be stored in different computing systems, databases,
formats, and human and computer languages.
"  Legal requirements relating to data differ among countries and
change frequently.
"  Selecting data management tools can be a major problem because of
the huge number of products available.
"  Data security, quality, and integrity are critical yet are easily
jeopardized.

Data Life Cycle

Data Sources & Collection
Internal Data. An organization's internal data are about people,
products, services, and processes.
Personal Data. IS users or other corporate employees may
document their own expertise by creating personal data.
External Data. There are many sources for external data, ranging
from commercial databases to sensors and satellites.
The Internet & Commercial Database Services. Some external
data flow to an organization through electronic data interchange
(EDI), through other company-to-company channels, or through
the Internet.

Data Quality

Data Quality (DQ) is an extremely important issue since quality
determines the data's usefulness as well as the quality of the
decisions based on the data.

Data Quality Problems
(Strong et al., 1997)

Intrinsic DQ: Accuracy, objectivity, believability, and reputation.

Contextual DQ: Relevancy, value added, timeliness, completeness,
amount of data.

Accessibility DQ: Accessibility and access security.

Representation DQ: Interpretability, ease of understanding, concise
representation, consistent representation.
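
Two of the contextual dimensions above, completeness and timeliness, lend themselves to simple automated checks. The sketch below is only an illustration: the records, field names, and the one-year staleness threshold are all assumptions, not part of the Strong et al. framework.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "name": "Ana", "city": "Boston", "updated": date(2001, 3, 1)},
    {"id": 2, "name": None, "city": "Austin", "updated": date(1995, 6, 15)},
    {"id": 3, "name": "Luis", "city": None, "updated": date(2001, 1, 20)},
]

def completeness(rows, fields):
    """Share of field values that are present (completeness)."""
    total = len(rows) * len(fields)
    filled = sum(1 for r in rows for f in fields if r.get(f) is not None)
    return filled / total if total else 1.0

def stale_ids(rows, as_of, max_age_days):
    """Ids of records last updated more than max_age_days ago (timeliness)."""
    return [r["id"] for r in rows if (as_of - r["updated"]).days > max_age_days]

score = completeness(records, ["name", "city"])
old = stale_ids(records, as_of=date(2001, 6, 1), max_age_days=365)
print(f"completeness={score:.2f}, stale={old}")
```

Intrinsic dimensions such as believability or reputation are harder to score mechanically and usually need human judgment.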

Object-Oriented Databases
!  The object-oriented database is the most widely used of the
newest methods of data organization, especially for Web
applications.

!  An object-oriented database is a part of the object-oriented
paradigm, which also includes object-oriented programming,
operating systems, and modeling.

!  Object-oriented databases are sometimes referred to as
multimedia databases and are managed by special multimedia
database management systems.

Document Management
Document Management is the automated control of electronic
documents, page images, spreadsheets, word processing
documents, and complex, compound documents through their
entire life cycle within an organization, from initial creation to final
archiving.
Benefits of Document Management:
"  Greater control over production, storage, and distribution of documents
"  Greater efficiency in the reuse of information
"  Control of a document through a workflow process
"  Reduction of product cycle times

Case: United Services Automobile Association (USAA)
Problem:
!  The USAA is a large insurance company in Texas that serves over 2
million officers. In the 1980s, the company experienced extreme
delays in data retrieval and searches.
Solution:
!  Using an environment called Automated Insurance Environment,
USAA has been transformed into a completely paperless company.
Results:
!  The system reduces the cost of storing documents, improves
customer service, and improves productivity of employees.
!  USAA now saves $70,500,000 for the 10,000,000 documents handled
annually.

Data Processing
Data processing in organizations can be viewed either as
transactional or analytical.

!  Transactional:
"  The data in transaction processing systems (TPS) are organized
mainly in a hierarchical structure and are centrally processed.
"  Databases and processing systems are known as operational
systems.

!  Analytical:
"  Analytical processing involves analysis of accumulated data,
mainly by end-users.
"  Includes DSS, EIS, Web applications, and other end-user
activities.

Delivery Systems

A good data delivery system should be able to support:
#  Easy data access by the
end-users themselves.
#  A quick decision-making
process.
#  Accurate and effective
decision making.
#  Flexible decision making.

Data Warehouses
!  The purpose of a data warehouse is to establish a data
repository that makes operational data accessible in a form
readily acceptable for analytical processing activities (e.g.,
decision support, EIS).

!  Data warehouses include a companion called metadata, meaning
data about data.

Major Benefits of Data Warehouses:
(1) The ability to reach data quickly, as they are located in one
place.
(2) The ability to reach data easily, frequently by end-users
themselves, using Web browsers.

Characteristics of Data Warehouses
1)  Organization. Data are organized by detailed subjects.
2)  Consistency. Data in different operational databases may be
encoded differently. In the warehouse they will be coded in a
consistent manner.
3)  Time variant. The data are kept for 5 to 10 years so they can be
used for trends, forecasting, and comparisons over time.
4)  Non-volatile. Once entered into the warehouse, data are not
updated.
5)  Relational. The data warehouse uses a relational structure.
6)  Client/server. The data warehouse uses a client/server architecture
to give end users easy access to its data.
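
The consistency and non-volatility characteristics can be illustrated with a small sketch. Everything in it is hypothetical: the two source systems ("billing" and "crm"), their gender codes, and the append-only list standing in for warehouse storage are assumptions chosen only to show the idea of recoding on load.

```python
# Hypothetical gender codes used by two operational systems.
CODE_MAPS = {
    "billing": {"M": "male", "F": "female"},
    "crm": {"1": "male", "0": "female"},
}

def to_warehouse(source, row):
    """Translate a source row into the warehouse's single encoding."""
    return {
        "customer_id": row["customer_id"],
        "gender": CODE_MAPS[source][row["gender"]],
        "source": source,
    }

# Rows are only appended, never updated in place, which mimics the
# non-volatile characteristic in a very simplified way.
warehouse = []
warehouse.append(to_warehouse("billing", {"customer_id": 7, "gender": "F"}))
warehouse.append(to_warehouse("crm", {"customer_id": 8, "gender": "0"}))
print([r["gender"] for r in warehouse])
```

Both rows end up with the same value for the same fact, even though the operational systems encoded it differently.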

Data Warehouse Suitability
Data warehousing is most appropriate for organizations in which
some of the following apply.
"  Large amounts of data need to be accessed by end-users.
"  The operational data are stored in different systems.
"  An information-based approach to management is in use.
"  There is a large, diverse customer base.
"  The same data are represented differently in different
systems.
"  Data are stored in highly technical formats that are difficult to
decipher.
"  Extensive end-user computing is performed.

Data Marts
A data mart is a lower-cost, scaled-down version of a data warehouse,
an alternative used by many firms. Data marts are small warehouses
designed for a strategic business unit (SBU) or a department.
Two major types of Data Marts:
1) Replicated (dependent) Data Marts. In such cases one can
replicate functional subsets of the data warehouse in smaller
databases.
2) Stand-Alone Data Marts. A company can have one or more
independent data marts without having a data warehouse.
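
A replicated (dependent) data mart can be pictured as a departmental subset copied out of the warehouse. The following sketch uses Python's built-in sqlite3 module with an in-memory database; the table names, columns, and the "toys" department are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE warehouse_sales (dept TEXT, item TEXT, amount REAL)")
con.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("toys", "kite", 12.0), ("tools", "drill", 80.0), ("toys", "puzzle", 9.5)],
)

# Replicate only the toys department's subset into its own small table.
con.execute(
    "CREATE TABLE mart_toys AS "
    "SELECT item, amount FROM warehouse_sales WHERE dept = 'toys'"
)
mart_rows = con.execute(
    "SELECT item, amount FROM mart_toys ORDER BY item"
).fetchall()
print(mart_rows)
```

A stand-alone data mart would look the same from the department's side, but it would be loaded directly from operational sources rather than from a warehouse table.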

Knowledge Discovery in Databases (KDD)
!  KDD is the process of extracting useful knowledge from volumes
of data.
!  It is the subject of extensive research.
!  KDD’s objective is to identify valid, novel, potentially useful, and
ultimately understandable patterns in data.
!  KDD is useful because it is supported by three technologies that
are now sufficiently mature:
"  Massive data collection
"  Powerful multiprocessor computers
"  Data mining algorithms

Evolution of KDD
Stages in the Evolution of Knowledge Discovery

Evolutionary Stage | Business Question | Enabling Technologies | Characteristics
Data Collection (1960s) | What was my total revenue in the last five years? | Computers, tapes, disks | Retrospective, static data delivery
Data Access (1980s) | What were unit sales in New England last March? | Relational databases (RDBMS), structured query language (SQL) | Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (early 1990s) | Drill down to Boston? | Online analytic processing (OLAP), multidimensional databases, data warehouses | Retrospective, dynamic data delivery at multiple levels
Intelligent Data Mining (late 1990s) | What's likely to happen to Boston unit sales next month? Why? | Advanced algorithms, multiprocessor computers, massive databases | Prospective, proactive information delivery
Source: Courtesy of Accrue Software.
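
The business questions for the data access and data warehousing stages can be sketched as two queries. This is a minimal illustration using Python's built-in sqlite3 module; the sales table, its columns, and the figures are all made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, city TEXT, month TEXT, units INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("New England", "Boston", "2000-03", 120),
        ("New England", "Hartford", "2000-03", 45),
        ("New England", "Boston", "2000-04", 130),
        ("Midwest", "Chicago", "2000-03", 200),
    ],
)

# Data access stage: unit sales in New England for one month, via SQL.
region_units = con.execute(
    "SELECT SUM(units) FROM sales WHERE region = ? AND month = ?",
    ("New England", "2000-03"),
).fetchone()[0]

# Warehousing/OLAP stage: drill down from the region to its cities.
by_city = con.execute(
    "SELECT city, SUM(units) FROM sales WHERE region = ? AND month = ? "
    "GROUP BY city ORDER BY city",
    ("New England", "2000-03"),
).fetchall()
print(region_units, by_city)
```

The data mining stage asks a forward-looking question ("what is likely to happen next month?"), which needs a predictive model rather than a query and is not attempted here.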

Tools & Techniques of KDD
!  Ad-hoc queries allow users to request, in real time,
information from the computer that is not available in the
periodic reports. Such answers are needed to expedite
decision making.

!  Online analytical processing (OLAP) refers to such end-user
activities as DSS modeling using spreadsheets and
graphics, which are done online.

!  Ready-made Web-based analysis. Many vendors provide
ready-made analytical tools, mostly in finance, marketing, and
operations.

Data Mining
!  Data mining derives its name from the similarities
between searching for valuable business information in a
large database and mining a mountain for valuable ore.
!  Data mining technology can generate new business
opportunities by providing these capabilities:
"  Automated prediction of trends and behaviors. Data
mining automates the process of finding predictive
information in large databases.
"  Automated discovery of previously unknown patterns.
Data mining tools identify previously hidden patterns in
one step.

Applications of Data Mining
Data mining is currently being used in the following areas:

"  Retailing & Sales
"  Banking
"  Manufacturing & Production
"  Brokerage & Securities trading
"  Computer hardware & software
"  Insurance
"  Police work
"  Government & Defense
"  Airlines
"  Health care
"  Broadcasting
"  Marketing

Text & Web Mining
!  Text mining is the application of data mining to non-
structured or less structured text files.
!  Text mining helps organizations to do the following:
!  Find the “hidden” content of documents, including additional
useful relationships.
!  Group documents by common themes.
!  Web Mining refers to mining tools used to analyze a large
amount of data on the Web, such as what customers are
doing on the Web—that is, to analyze clickstream data.
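
As a rough illustration of clickstream analysis, the sketch below counts page-to-page moves within each visitor session and reports the most frequent one. The session ids and page paths are invented, and real Web mining tools work on far larger server logs with much richer models.

```python
from collections import Counter

# Hypothetical clickstream: (session id, page) pairs in time order.
clicks = [
    ("s1", "/home"), ("s1", "/toys"), ("s1", "/cart"),
    ("s2", "/home"), ("s2", "/toys"),
    ("s3", "/home"), ("s3", "/tools"), ("s3", "/cart"),
]

def transitions(events):
    """Count page-to-page moves within each session."""
    last_page = {}
    counts = Counter()
    for session, page in events:
        if session in last_page:
            counts[(last_page[session], page)] += 1
        last_page[session] = page
    return counts

top = transitions(clicks).most_common(1)
print(top)
```

Here the most common move is from the home page to the toys page, the kind of pattern a marketer might use to decide what to feature on the home page.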

Data Visualization
Data visualization refers to the
presentation of data by
technologies such as digital
images, geographical
information systems,
graphical user interfaces,
multidimensional tables and
graphs, virtual reality, three-
dimensional presentations,
and animation.

CASE: Data Visualization Helps Haworth
Problem:
!  Haworth Corporation, a major office furniture manufacturer, has
maintained a competitive edge by offering customization.
!  But many customers are unable to visualize the 21 million
potential product combinations.
Solution:
!  Computer visualization software enables sales representatives
with laptops to show customers exactly what they are ordering.
Results:
!  Reduction in time spent between sales reps and CAD operators,
& increased customer satisfaction with quicker delivery.

Multidimensionality
!  Modern data and information may have several dimensions.
"  e.g., management may be interested in examining sales figures in
a certain city by product, by time period, by salesperson, and by
store.

!  It is important to provide the user with a technology that
allows him or her to add, replace, or change dimensions
quickly and easily in a table and/or graphical presentation.

!  The technology of slicing, dicing, and similar manipulations is
called multidimensionality.
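
Slicing and changing dimensions can be sketched in a few lines of plain Python. The fact records, dimension names, and helper functions (`slice_`, `rollup`) below are hypothetical and only illustrate the idea; a multidimensional database would precompute and index such views.

```python
from collections import defaultdict

# Hypothetical sales facts: three dimensions and one measure (units).
facts = [
    {"city": "Boston", "product": "desk", "quarter": "Q1", "units": 10},
    {"city": "Boston", "product": "chair", "quarter": "Q1", "units": 25},
    {"city": "Boston", "product": "desk", "quarter": "Q2", "units": 14},
    {"city": "Austin", "product": "desk", "quarter": "Q1", "units": 7},
]

def slice_(rows, **fixed):
    """Keep only rows matching the fixed dimension values (a slice)."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

def rollup(rows, *dims):
    """Total the measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["units"]
    return dict(totals)

boston = slice_(facts, city="Boston")
by_product = rollup(boston, "product")   # view the slice by product
by_quarter = rollup(boston, "quarter")   # swap the dimension to time
print(by_product, by_quarter)
```

Swapping `"product"` for `"quarter"` in the call is the code-level analogue of a user replacing one dimension with another in a table.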

Multidimensionality
Three factors are considered in multidimensionality:

Examples of dimensions: Products, salespeople, market segments,
business units, geographical locations, distribution channels,
countries, industries.

Examples of measures: Money, sales volume, head count,
inventory profit, actual versus forecasted results.

Examples of time: Daily, weekly, monthly, quarterly, yearly.

Advantages of Multidimensionality
#  Data can be presented and navigated with relative ease.
#  Multidimensional databases are easier to maintain.
#  Multidimensional databases are significantly faster than
relational databases as a result of the additional dimensions
and the anticipation of how the data will be accessed by
users.

Geographic Information Systems (GIS)
!  A geographical information system (GIS) is a computer-based
system for capturing, storing, checking, integrating,
manipulating, and displaying data using digitized maps.
–  Every record or digital object has an identified geographical location.
!  Banks are using GIS for plotting the following:
–  Branch and ATM locations
–  Customer demographics
–  Volume and traffic patterns of business activities
–  Geographical area served by each branch
–  Market potential for banking activities
–  Strengths and weaknesses against the competition
–  Branch performance
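
Because every GIS record carries a geographical location, spatial questions such as "which branch is closest to this customer?" reduce to distance calculations. The sketch below uses the haversine great-circle formula; the branch names and coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Hypothetical branch records; each carries its own coordinates.
branches = {
    "Downtown": (42.3555, -71.0565),
    "Cambridge": (42.3736, -71.1097),
}

def nearest_branch(lat, lon):
    """Name of the branch closest to the given customer location."""
    return min(branches, key=lambda name: haversine_km(lat, lon, *branches[name]))

print(nearest_branch(42.3601, -71.0589))
```

A real GIS would add map layers, spatial indexes, and demographic overlays on top of this kind of calculation.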

Geographic Information Systems (GIS)
!  GIS Software varies in its capabilities, from simple computerized
mapping systems to enterprise-wide tools for decision support data
analysis.
!  GIS Data are available from a wide variety of sources. Government
sources (via the Internet and CD-ROM) provide some data, while
vendors provide diversified commercial data as well.
!  GIS & Decision Making. The graphical format of GIS makes it easy
for managers to visualize the data & make decisions.
!  GIS and the Internet or intranet. Most major GIS software
vendors are providing Web access, such as embedded browsers, or
a Web/Internet/intranet server that hooks directly into their software.
!  Emerging GIS Applications. 

Visual Interactive Modeling (VIM)
!  Visual interactive modeling (VIM) uses computer graphic
displays to represent the impact of different management
decisions on goals such as profit or market share.
–  A VIM can be used both for supporting decisions & training.
–  It can represent a static or a dynamic system.

!  Visual interactive simulation (VIS) is one of the most
developed areas in VIM.
–  It is a decision simulation in which the end-user watches
the progress of the simulation model in an animated form
using graphics terminals.

Virtual Reality
!  Virtual reality (VR) is interactive, computer-generated, three-
dimensional graphics delivered to the user through a head-
mounted display.
!  VR applications to date have been used to support decision
making indirectly.
–  Boeing has developed a virtual aircraft mock-up to test designs.
–  At Volvo, VR is used to test virtual cars in virtual accidents.
!  Data visualization helps financial decision makers by using
visual, spatial & aural immersion virtual systems.
–  Some stock brokerages have a VR application in which users surf over
a landscape of stock futures, with color, hue, and intensity.

Marketing Transaction Database
!  The marketing transaction database (MTD) combines many of the
characteristics of static databases and marketing data sources
into a new database that allows marketers to engage in real-time
personalization and target every interaction with customers.
!  The MTD provides dynamic, or interactive, functions not available
with traditional types of marketing databases.
"  Exchanging information allows marketers to refine their understanding of
each customer continuously.
!  Data mining, data warehousing, and MTDs are delivered on the
Internet and intranets.

Implementation Examples
The following examples illustrate how companies use data mining and
warehousing to support the new marketing approaches:
#  Alamo Rent-a-Car discovered that German tourists liked bigger cars.
So now, when Alamo advertises its rental business in Germany, the
ads include information about its larger models.

#  Au Bon Pain Company discovered that they were not selling as much
cream cheese as planned. When they analyzed point-of-sale data, they
found that customers preferred small, one-serving packaging.

#  AT&T and MCI sift through terabytes of customer phone data to fine-
tune marketing campaigns and determine new discount calling plans.

CASE: Data Mining Powers Walmart
!  Wal-Mart's formula for success owes much to the company's
multimillion-dollar investment in data warehousing.
!  The systems house data on point of sale, inventory, products in
transit, market statistics, customer demographics, finance, product
returns, and supplier performance.
–  The data are used for three broad areas of decision support:
•  analyzing trends
•  managing inventory
•  understanding customers
!  The data warehouse is available over an extranet to store managers
and suppliers.
–  In 2001, 5,000 users made over 35,000 database queries each day.

Web-based Data Management Systems
!  Business intelligence activities – from data acquisition, through
warehousing, to mining – can be performed with Web tools or are
interrelated with Web technologies and e-Commerce.
!  e-Commerce software vendors are providing Web tools that connect
the data warehouse with EC ordering and cataloging systems.
–  e.g. Tradelink, a product of Hitachi
!  Data warehousing and decision support vendors are connecting their
products with Web technologies and EC.
–  e.g. Comshare's DecisionWeb, Brio's Brio One, Web Intelligence from
Business Objects, and Cognos's DataMerchant.

Corporate Portals

Web-based Data Acquisition & Agents

Web-based Data Acquisition

!  Traditional data acquisition has become a pervasive element in
today's business environment.

!  This acquisition includes both the recording of information
from online surveys and questionnaires, and direct
measurements taken in the manufacturing environment.

Intelligent Data Warehouse

!  The amount of data in the data warehouse can be very large.

!  While the organization of data is done in a way that permits easy
search, it still may be useful to have a search engine for
specific applications.

Managerial Issues
"  Cost–benefit issues & justification. A cost–benefit
analysis must be undertaken before any commitment to new
technologies.

"  Where to store data physically. Should data be
distributed close to their sources? Or should data be
centralized for easier control?

"  Legal issues. Data mining gives rise to a variety of
legal issues.

"  The legacy data problem. What should be done with masses
of information already stored in a variety of formats, often
known as the legacy data acquisition problem?

Managerial Issues (cont.)
"  Disaster recovery. How well can an organization's business
processes recover after an information system disaster?

"  Internal or external? Should a firm store & maintain its
databases internally or externally?

"  Data security and ethics. Are the company's competitive data
safe from external snooping or sabotage?

"  Ethics. Should people have to pay for use of online data?

"  Privacy. Collecting data in a warehouse and conducting data
mining may result in the invasion of privacy.

"  Data purging. When is it beneficial to “clean house” and
purge information systems of obsolete or non-cost-effective
data?

"  Data delivery. A problem regarding how to move data
efficiently around an enterprise also exists.
