
Database Management Systems
Time to Think
Databases and Business Decision Making
Tools for Business Intelligence
Time to Think
How is the data organized?
File organization concepts
–Database: Group of related files

–File: Group of records of same type

–Record: Group of related fields

–Field: Group of characters as word(s) or number

•Describes an entity (person, place, or thing on which we store information)

•Attribute: Each characteristic, or quality, describing entity

–E.g., Attributes Date or Grade belong to entity COURSE


DBMS
Issues with traditional Data Organization

The Data Hierarchy

Attributes of an entity
A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to form a record. Related records can be collected to form a file, and related files can be organized into a database.
DBMS: Issues with traditional Data Organization

◼ Data redundancy and inconsistency

◼ Program-data dependence

◼ Lack of flexibility

◼ Poor security

◼ Lack of data sharing and availability
DBMS
Database and DBMS

◼ Database
◼ Collection of data organized to serve many applications by
centralizing data and controlling redundant data

◼ Database management system


◼ Software that permits an organization to centralize data, manage it
efficiently, and provide access to the stored data by application
programs
◼ Interfaces between application programs and physical data files
◼ Separates logical and physical views of data
◼ Solves problems of traditional file environment
◼ Controls redundancy
◼ Eliminates inconsistency
◼ Uncouples programs and data
◼ Enables organization to centrally manage data and data security
DBMS: RDBMS Systems

◼ Relational DBMS
◼ Represent data as two-dimensional tables called relations or files
◼ Each table contains data on entity and attributes

◼ Table: grid of columns and rows


◼ Rows (tuples): Records for different entities
◼ Fields (columns): Each represents an attribute of the entity
◼ Key field: Field used to uniquely identify each record
◼ Primary key: The key field that serves as the unique identifier for every record in the table
◼ Foreign key: Primary key of one table used in a second table as a look-up field to
identify records from the original table
DBMS: RDBMS Systems

A relational database organizes data in the form of two-dimensional tables. Illustrated here are tables for the entities SUPPLIER and PART showing how they represent each entity and its attributes. Supplier Number is a primary key for the SUPPLIER table and a foreign key for the PART table.
A DBMS has three important capabilities:

◼ 1) Data definition is the capability to specify the structure of the content of the database. It is used to create database tables and define the characteristics of the fields in each table.

◼ 2) The data dictionary stores definitions of data elements and their characteristics.

◼ 3) The data manipulation language is used to add, change, delete, and retrieve data in the database.
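
The three capabilities above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: SQLite stands in for a full DBMS, and the `supplier` table, its column names, and the rows are hypothetical.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# 1) Data definition: create a table and define its fields.
conn.execute(
    "CREATE TABLE supplier ("
    " supplier_number INTEGER PRIMARY KEY,"
    " supplier_name TEXT NOT NULL)"
)

# 2) Data dictionary: SQLite can report the stored field definitions.
#    Each PRAGMA table_info row is (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(supplier)")]

# 3) Data manipulation: add, change, delete, and retrieve data.
conn.execute("INSERT INTO supplier VALUES (8259, 'Hypothetical Supplier A')")
conn.execute("INSERT INTO supplier VALUES (8261, 'Hypothetical Supplier B')")
conn.execute(
    "UPDATE supplier SET supplier_name = 'Renamed Supplier' "
    "WHERE supplier_number = 8261"
)
conn.execute("DELETE FROM supplier WHERE supplier_number = 8259")
rows = conn.execute("SELECT supplier_number, supplier_name FROM supplier").fetchall()
```

Here `columns` holds the field names and types recorded by the database, and `rows` holds whatever remains after the insert, update, and delete statements.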
DBMS: RDBMS Systems

Operations of a Relational DBMS

◼ Three basic operations used to develop useful sets of data


◼ SELECT: Creates a subset consisting of all records that meet stated
criteria
◼ JOIN: Combines relational tables to provide user with more
information than available in individual tables
◼ PROJECT: Creates subset of columns in table, creating tables with
only the information specified
Evolution of DBMS
RDBMS Systems

The select, project, and join operations enable data from two different tables
to be combined and only selected attributes to be displayed.
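
A minimal sketch of the three operations in plain Python, assuming tables are held as lists of dictionaries. Only the SUPPLIER and PART table names and the Supplier Number key come from the slides; the rows, the other column names, and the helper functions `select`, `join`, and `project` are hypothetical.

```python
# Hypothetical rows; only the table and key names come from the slides.
suppliers = [
    {"supplier_number": 1, "supplier_name": "Supplier A"},
    {"supplier_number": 2, "supplier_name": "Supplier B"},
]
parts = [
    {"part_number": 137, "part_name": "Door latch", "supplier_number": 1},
    {"part_number": 150, "part_name": "Door seal", "supplier_number": 2},
    {"part_number": 152, "part_name": "Compressor", "supplier_number": 1},
]

def select(rows, predicate):
    """SELECT: keep only the records that meet the stated criteria."""
    return [row for row in rows if predicate(row)]

def join(left, right, key):
    """JOIN: combine records from two tables that share a key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def project(rows, columns):
    """PROJECT: keep only the named columns."""
    return [{c: row[c] for c in columns} for row in rows]

wanted = select(parts, lambda p: p["part_number"] in (137, 150))
combined = join(wanted, suppliers, "supplier_number")
result = project(combined, ["part_number", "part_name", "supplier_name"])
```

`result` contains one row per selected part, carrying the part fields plus the supplier name found through the foreign key.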
Designing Databases
–To design a database, you must understand the relationships among the data,

–the type of data that will be maintained in the database,

–how the data will be used, and how the organization will need to change to
manage data from a company-wide perspective.

–Conceptual (logical) design: Abstract model from business perspective

–Physical design: How database is arranged on direct-access storage devices

–Normalization

•Streamlining complex groupings of data to minimize redundant data elements and awkward many-to-many relationships.

•Most efficient way to group data elements to meet business requirements and the needs of application programs

–Referential integrity

•Rules to ensure that relationships between coupled tables remain consistent

–Entity-relationship diagram

•Used by database designers to document the data model

•Illustrates relationships between entities
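
One way to see referential integrity at work is a small `sqlite3` sketch, assuming SQLite as a stand-in DBMS. The `supplier` and `part` tables mirror the entities used earlier in the deck, but the column names and rows are hypothetical. Supplier data is stored once and referenced by key, and the database refuses a part that points at a supplier that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT)"
)
conn.execute(
    "CREATE TABLE part ("
    " part_number INTEGER PRIMARY KEY,"
    " part_name TEXT,"
    " supplier_number INTEGER REFERENCES supplier(supplier_number))"
)

conn.execute("INSERT INTO supplier VALUES (1, 'Hypothetical Supplier')")
conn.execute("INSERT INTO part VALUES (137, 'Door latch', 1)")  # supplier 1 exists

try:
    # Supplier 99 does not exist, so this row would break the relationship.
    conn.execute("INSERT INTO part VALUES (150, 'Door seal', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

part_count = conn.execute("SELECT COUNT(*) FROM part").fetchone()[0]
```

After the run, `rejected` is true and only the valid part row is stored.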


THE CHALLENGE OF BIG DATA
•Big data: data sets whose volume is beyond the ability of a typical DBMS to capture, store, and analyze.

•Billions to trillions of records, all from different sources.

•Businesses are interested in big data because they can reveal more patterns and interesting anomalies than smaller data sets, with the potential to provide new insights into customer behavior, weather patterns, financial market activity, or other phenomena.

•To derive business value from these data, organizations need new technologies and tools capable of managing and analyzing non-traditional data along with their traditional enterprise data.
Databases and Business Decision Making
◼ Operational databases are built to record current transactions, so information about trends and changes across the entire company is hard to obtain from any single database

◼ Functional silos in an organization prevent data connectivity between its various departments
Databases and Business Decision Making
◼ Data warehouse:
◼ Stores current and historical data from many core operational
transaction systems
◼ Consolidates and standardizes information for use across
enterprise, but data cannot be altered
◼ Data warehouse system will provide query, analysis, and
reporting tools

◼ Data marts:
◼ Subset of data warehouse
◼ Summarized or highly focused portion of firm’s data for use by
specific population of users
◼ Typically focuses on single subject or line of business
Hadoop
–An open source software framework managed by the Apache Software Foundation

–Handles unstructured and semi-structured data in vast quantities, as well as structured data.

–Enables distributed parallel processing of huge amounts of data across inexpensive computers (“servers”).

–Breaks a big data problem down into sub-problems, distributes them among up to thousands of inexpensive computer processing nodes, and then combines the result into a smaller data set that is easier to analyze.

–Hadoop consists of several key services:

•Hadoop Distributed File System (HDFS) for data storage.

•MapReduce for high-performance parallel data processing

–Facebook announced that the data gathered in its warehouse grows by roughly half a PB per day (1 PB is 1000⁵ bytes).
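
The split, process, and combine idea behind MapReduce can be sketched in plain Python. This is not the Hadoop API: it runs in a single process, the text chunks stand in for data held on separate nodes, and the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine each key's values into one total."""
    return {key: sum(values) for key, values in grouped.items()}

# Each chunk stands in for data held on a separate node.
chunks = ["big data big insight", "data beats opinion", "big questions"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(pairs))
```

In a real cluster the map calls would run in parallel on many machines; here they run one after another, which keeps the logic visible but gives no speed-up.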
In-Memory Computing

–Another way of facilitating big data analysis.

–Relies primarily on a computer’s main memory (RAM) for data storage. (Conventional DBMS use disk storage systems.)
Databases and Business Decision Making
Tools for Business Intelligence

◼ Tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions

◼ E.g., Harrah’s Entertainment analyzes customers to develop gambling profiles and identify most profitable customers

◼ Principal tools include:

◼ Software for database query and reporting
◼ Online analytical processing (OLAP)
◼ Data mining
Tools for Business Intelligence

A series of analytical tools works with data stored in databases to find patterns and insights for helping managers and employees make better decisions to improve organizational performance.
Tools for Business Intelligence
Online analytical processing (OLAP)

◼ Supports multidimensional data analysis


◼ Viewing data using multiple dimensions
◼ Each aspect of information (product, pricing, cost, region, time
period) is different dimension
◼ E.g., how many washers sold in East in June compared with other
regions?

◼ OLAP enables rapid, online answers to ad hoc queries
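
The washer question above can be sketched as a small multidimensional roll-up in plain Python. The sales figures, the `DIMENSIONS` tuple, and the `rollup` helper are hypothetical; a real OLAP tool would typically work from precomputed cubes rather than scanning rows like this.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, month, units sold).
sales = [
    ("washer", "East", "June", 120),
    ("washer", "West", "June", 95),
    ("washer", "East", "July", 80),
    ("dryer", "East", "June", 60),
    ("washer", "South", "June", 70),
]
DIMENSIONS = ("product", "region", "month")

def rollup(facts, by, **filters):
    """Total units along the `by` dimension, restricted by `filters`."""
    totals = defaultdict(int)
    for *dims, units in facts:
        row = dict(zip(DIMENSIONS, dims))
        if all(row[name] == value for name, value in filters.items()):
            totals[row[by]] += units
    return dict(totals)

# Washers sold in June, compared across regions.
washers_in_june = rollup(sales, by="region", product="washer", month="June")
# The same facts viewed along a different dimension.
washers_by_month = rollup(sales, by="month", product="washer")
```

Changing the `by` argument corresponds to looking at a different face of the data, while the keyword filters fix the other dimensions.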


Tools for Business Intelligence

The view that is showing is product versus region. If you rotate the cube 90 degrees, the face that will show is product versus actual and projected sales. If you rotate the cube 90 degrees again, you will see region versus actual and projected sales. Other views are possible.

Multi-dimensional Data Model


Tools for Business Intelligence
Data Mining

◼ More discovery driven than OLAP

◼ Finds hidden patterns, relationships in large databases and infers rules to predict future behavior

◼ E.g., Finding patterns in customer data for one-to-one marketing campaigns or to identify profitable customers.
Tools for Business Intelligence
Data Mining-Types of Information Available

◼ Associations: Occurrences linked to a single event


◼ E.g., When corn chips are purchased, a cola drink is purchased
65% of the time

◼ Sequences: Events are linked over time


◼ E.g., If a house is purchased, a new refrigerator will be purchased
within two weeks 65% of the time

◼ Classification: Recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules
◼ E.g., Helps discover the kinds of customers who are likely to leave
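
The association example above ("cola is purchased 65% of the time") is a statement about rule confidence. A minimal sketch of that calculation follows, assuming each purchase is stored as a set of items. The baskets and the `confidence` helper are hypothetical, and the resulting figure reflects only this toy data, not the 65% quoted in the slide.

```python
# Hypothetical shopping baskets; each set is one purchase.
baskets = [
    {"corn chips", "cola"},
    {"corn chips", "cola", "salsa"},
    {"corn chips", "salsa"},
    {"cola", "bread"},
    {"corn chips", "cola", "bread"},
]

def confidence(transactions, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)

chips_then_cola = confidence(baskets, "corn chips", "cola")
```

Four baskets contain corn chips and three of those also contain cola, so `chips_then_cola` is 0.75 for this data.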
Tools for Business Intelligence
Data Mining-Types of Information Available

◼ Clustering: Similar to classification but groups are not defined in advance
◼ Discovers groupings within data, such as finding affinity groups for bank
cards
◼ Categorizing database into groups of customers based on
demographics and types of personal investments

◼ Forecasting: Uses a series of existing values to forecast what other values will do
◼ E.g., Finding patterns in data to help managers estimate the future
value of continuous variables, such as sales figures
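
Forecasting a continuous variable from existing values can be sketched with a straight-line trend fitted by least squares. This is one simple technique chosen for illustration, not the method any particular data mining product uses; the sales figures and the `linear_trend` and `forecast` helpers are hypothetical, and at least two values are needed.

```python
def linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    variance = sum((t - mean_t) ** 2 for t in range(n))
    slope = covariance / variance
    return mean_y - slope * mean_t, slope

def forecast(values, steps_ahead=1):
    """Extend the fitted line `steps_ahead` periods past the last value."""
    intercept, slope = linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)

monthly_sales = [100, 110, 120, 130]  # hypothetical figures
next_month = forecast(monthly_sales)
```

Because the sample series rises by exactly 10 each month, the fitted line continues that pattern and `next_month` comes out at 140.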
Tools for Business Intelligence
Predictive Analysis and Text Mining

◼ Predictive analysis
◼ Uses data mining techniques, historical data, and assumptions
about future conditions to predict outcomes of events
◼ E.g., Probability a customer will respond to an offer or purchase
a specific product

◼ Text mining
◼ Extracts key elements from large unstructured data sets (e.g.,
stored e-mails)
Data as a service model: Gartner hype cycle
