
CPS 216: Advanced Database Systems

Shivnath Babu
Fall 2007
Outline for Today

• What this class is about: Data management


• What we will cover in this class
• Logistics

What does a Database System mean to you?


(Hint: What are they used for? Give examples)
Data Management

[Figure: a User/Application sends queries to the DataBase Management System (DBMS), which manages the Data]


Example: At a Company
Query 1: Is there an employee named “Nemo”?
Query 2: What is “Nemo’s” salary?
Query 3: How many departments are there in the company?
Query 4: What is the name of “Nemo’s” department?
Query 5: How many employees are there in the “Accounts” department?

Employee
ID    Name    DeptID    Salary    …
10    Nemo    12        120K      …
20    Dory    156       79K       …
40    Gill    89        76K       …
52    Ray     34        85K       …
…     …       …         …         …

Department
ID     Name         …
12     IT           …
34     Accounts     …
89     HR           …
156    Marketing    …
…      …            …
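The five queries above could be written in SQL roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in DBMS; the table and column names (employee, department, dept_id, salary), the helper scalar, and the choice to store salaries as whole dollars are assumptions made for illustration, and only the four rows shown on the slide are loaded.

```python
import sqlite3

# In-memory database holding only the rows visible on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY, name TEXT,
        dept_id INTEGER REFERENCES department(id), salary INTEGER);
    INSERT INTO department VALUES
        (12, 'IT'), (34, 'Accounts'), (89, 'HR'), (156, 'Marketing');
    INSERT INTO employee VALUES
        (10, 'Nemo', 12, 120000), (20, 'Dory', 156, 79000),
        (40, 'Gill', 89, 76000), (52, 'Ray', 34, 85000);
""")

def scalar(sql, *params):
    """Run a query and return the first column of its first row."""
    return conn.execute(sql, params).fetchone()[0]

# Query 1: is there an employee named Nemo? (1 means yes, 0 means no)
q1 = scalar("SELECT EXISTS(SELECT 1 FROM employee WHERE name = ?)", "Nemo")
# Query 2: what is Nemo's salary?
q2 = scalar("SELECT salary FROM employee WHERE name = ?", "Nemo")
# Query 3: how many departments are there?
q3 = scalar("SELECT COUNT(*) FROM department")
# Query 4: what is the name of Nemo's department?
q4 = scalar("""SELECT d.name FROM employee e
               JOIN department d ON e.dept_id = d.id
               WHERE e.name = ?""", "Nemo")
# Query 5: how many employees are in the Accounts department?
q5 = scalar("""SELECT COUNT(*) FROM employee e
               JOIN department d ON e.dept_id = d.id
               WHERE d.name = ?""", "Accounts")

print(q1, q2, q3, q4, q5)
```

Each query states what is wanted, not how to find it; choosing how to scan and join the tables is left to the DBMS.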
DataBase Management System (DBMS)

[Figure: a high-level query Q goes into the DBMS and an answer comes out. The DBMS translates Q into the best execution plan for current conditions and runs that plan over the Data.]
Example: Store that Sells Cars
Query: Owners of Honda Accords who are <= 23 years old

Result:
Make     Model     OwnerID    ID     Name    Age
Honda    Accord    12         12     Nemo    22
Honda    Accord    156        156    Dory    21

Plan:
Join (Cars.OwnerID = Owners.ID)
    Filter (Make = Honda and Model = Accord), applied to Cars
    Filter (Age <= 23), applied to Owners

Cars
Make      Model     OwnerID
Honda     Accord    12
Toyota    Camry     34
Mini      Cooper    89
Honda     Accord    156
…         …         …

Owners
ID     Name    Age
12     Nemo    22
34     Ray     42
89     Gill    36
156    Dory    21
…      …       …
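The plan on this slide (filter each table first, then join) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names filter_rows and hash_join are hypothetical, and a hash join is used as one possible join method since the slide does not say which one the DBMS would pick.

```python
# The rows shown on the slide, as lists of dictionaries.
cars = [
    {"make": "Honda", "model": "Accord", "owner_id": 12},
    {"make": "Toyota", "model": "Camry", "owner_id": 34},
    {"make": "Mini", "model": "Cooper", "owner_id": 89},
    {"make": "Honda", "model": "Accord", "owner_id": 156},
]
owners = [
    {"id": 12, "name": "Nemo", "age": 22},
    {"id": 34, "name": "Ray", "age": 42},
    {"id": 89, "name": "Gill", "age": 36},
    {"id": 156, "name": "Dory", "age": 21},
]

def filter_rows(rows, predicate):
    """Filter operator: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def hash_join(left, right, left_key, right_key):
    """Join operator: match rows where left[left_key] == right[right_key]."""
    index = {}
    for r in right:                      # build a hash index on the right input
        index.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]

# Both filters run before the join, so the join sees fewer rows.
accords = filter_rows(cars, lambda c: c["make"] == "Honda" and c["model"] == "Accord")
young = filter_rows(owners, lambda o: o["age"] <= 23)
result = hash_join(accords, young, "owner_id", "id")

for row in result:
    print(row["make"], row["model"], row["owner_id"], row["id"], row["name"], row["age"])
```

Joining first and filtering afterwards would give the same answer but would process more rows; picking between such equivalent plans is the optimizer's job.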
DataBase Management System (DBMS)

[Figure: a high-level query Q goes into the DBMS and an answer comes out. The DBMS translates Q into the best execution plan for current conditions and runs that plan over the Data. It also keeps data safe and correct despite failures, concurrent updates, online processing, etc.]
DBMS is multi-user
• Example
    get account balance from database;
    if balance > amount of withdrawal then
        balance = balance - amount of withdrawal;
        dispense cash;
        store new balance into database;
• Homer at ATM1 withdraws $100
• Marge at ATM2 withdraws $50
• Initial balance = $400, final balance = ?
– Should be $250 no matter who goes first
Final balance = $250

Homer withdraws $100:
read balize;
Final balance = $300

Homer withdraws $100:                      Marge withdraws $50:

read balance;                     $400
                                           read balance;                     $400
                                           if balance > amount then
                                           balance = balance - amount;       $350
                                           write balance;                    $350
if balance > amount then
balance = balance - amount;       $300
write balance;                    $300
Final balance = $350

Homer withdraws $100:                      Marge withdraws $50:

read balance;                     $400
                                           read balance;                     $400
if balance > amount then
balance = balance - amount;       $300
write balance;                    $300
                                           if balance > amount then
                                           balance = balance - amount;       $350
                                           write balance;                    $350
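The three outcomes ($250, $300, $350) can be reproduced by replaying each interleaving step by step. The sketch below is a small simulation, not a DBMS; the function names run and txn and the step labels "read", "compute", and "write" are hypothetical names chosen for this illustration.

```python
def run(schedule, initial=400):
    """Replay a list of (who, step) pairs against one shared stored balance."""
    amounts = {"Homer": 100, "Marge": 50}
    db = {"balance": initial}   # the balance stored in the database
    local = {}                  # each ATM's private copy of the balance
    for who, step in schedule:
        if step == "read":
            local[who] = db["balance"]
        elif step == "compute":
            if local[who] > amounts[who]:
                local[who] -= amounts[who]
        elif step == "write":
            db["balance"] = local[who]
    return db["balance"]

def txn(who):
    """The three steps of one withdrawal, run back to back."""
    return [(who, "read"), (who, "compute"), (who, "write")]

# Homer finishes completely before Marge starts.
serial = txn("Homer") + txn("Marge")
# Both read $400; Marge writes first, then Homer overwrites her update.
marge_writes_first = [("Homer", "read"), ("Marge", "read"),
                      ("Marge", "compute"), ("Marge", "write"),
                      ("Homer", "compute"), ("Homer", "write")]
# Both read $400; Homer writes first, then Marge overwrites his update.
homer_writes_first = [("Homer", "read"), ("Marge", "read"),
                      ("Homer", "compute"), ("Homer", "write"),
                      ("Marge", "compute"), ("Marge", "write")]

print(run(serial), run(marge_writes_first), run(homer_writes_first))
```

In both bad interleavings one withdrawal is computed from a stale read, so the later write silently discards the earlier one.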
Concurrency control in DBMS
• Similar to concurrent programming problems
– But data is not all in main memory
• Appears similar to file system concurrent access?
– Approach taken by MySQL initially; now MySQL offers better alternatives
• But want to control at much finer granularity
• Or else one withdrawal would lock up all accounts!
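To illustrate what finer granularity means, here is a minimal sketch that keeps one lock per account rather than one lock for all accounts. It uses Python threads and in-memory dictionaries purely for illustration; the account names and the withdraw function are hypothetical, and a real DBMS lock manager is considerably more involved.

```python
import threading

# Hypothetical accounts; "joint" plays the role of Homer and Marge's account.
balances = {"joint": 400, "other": 1000}

# One lock per account (fine granularity) instead of a single lock over all accounts.
locks = {account: threading.Lock() for account in balances}

def withdraw(account, amount):
    """Read, check, and write the balance while holding only this account's lock."""
    with locks[account]:
        balance = balances[account]
        if balance > amount:
            balances[account] = balance - amount
            return True    # cash would be dispensed here
        return False

threads = [
    threading.Thread(target=withdraw, args=("joint", 100)),   # Homer
    threading.Thread(target=withdraw, args=("joint", 50)),    # Marge
    threading.Thread(target=withdraw, args=("other", 200)),   # unrelated customer
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balances)
```

The two withdrawals on the joint account are forced to run one after the other, so the balance ends at $250 in either order, while the withdrawal on the other account never waits for them.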
Recovery in DBMS
• Example: balance transfer
    decrement the balance of account X by $100;
    increment the balance of account Y by $100;
• Scenario 1: Power goes out after the first instruction
• Scenario 2: DBMS buffers and updates data in memory (for efficiency); before the updates are written back to disk, power goes out
• Log updates; undo/redo during recovery
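The idea of logging updates and then undoing or redoing them can be sketched as follows. This is a simplified illustration under stated assumptions: the log record layout, the recover function, the transaction name T1, and the starting balances ($500 in X, $200 in Y) are all hypothetical, and the sketch assumes every log record reaches stable storage before the data it describes.

```python
def recover(disk, log):
    """Bring `disk` to a consistent state using an undo/redo log.

    Log records are ("update", txn, key, old_value, new_value) or ("commit", txn).
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Redo: reapply, in log order, every update made by a committed transaction.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, _, new = rec
            disk[key] = new
    # Undo: roll back, in reverse log order, updates of uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old, _ = rec
            disk[key] = old
    return disk

# Scenario 1: power fails after the decrement reached disk; T1 never committed.
disk1 = {"X": 400, "Y": 200}
log1 = [("update", "T1", "X", 500, 400)]
state1 = recover(disk1, log1)

# Scenario 2: T1 committed, but its buffered updates never reached disk.
disk2 = {"X": 500, "Y": 200}
log2 = [("update", "T1", "X", 500, 400),
        ("update", "T1", "Y", 200, 300),
        ("commit", "T1")]
state2 = recover(disk2, log2)

print(state1, state2)
```

In scenario 1 the half-finished transfer is undone; in scenario 2 the committed transfer is redone. Either way the total across X and Y stays at $700.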
Summary of modern DBMS features
• Persistent storage of data
• Logical data model; declarative queries and updates → physical data independence
• Multi-user concurrent access
• Safety from system failures
• Performance, performance, performance
– Massive amounts of data (terabytes to petabytes)
– High throughput (thousands to millions of transactions per minute)
– High availability (≥ 99.999% uptime)
Modern DBMS Architecture
Applications
    ↓ SQL
DBMS
    Parser
        ↓ Logical query plan
    Query Optimizer
        ↓ Physical query plan
    Query Executor
        ↓ Access method API calls
    Storage Manager
    ↓ Storage system API calls / File system API calls
OS
    ↓
Disk(s)
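One way to see the optimizer stage of this pipeline is to ask a real engine which plan it chose. The sketch below uses SQLite through Python's sqlite3 module as a convenient stand-in; SQLite is not the system described in these slides and its internals differ in detail, and the table definitions are assumptions based on the car store example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE cars (make TEXT, model TEXT, owner_id INTEGER);
""")

sql = """SELECT o.name FROM cars c JOIN owners o ON c.owner_id = o.id
         WHERE c.make = 'Honda' AND c.model = 'Accord' AND o.age <= 23"""

# EXPLAIN QUERY PLAN makes SQLite parse and optimize the statement and report
# the plan it chose, without executing the query itself.
plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
for row in plan:
    print(row[-1])   # the last column is the human-readable plan step
```

The exact wording of the output depends on the SQLite version, so no expected output is shown here.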
Course Outline
• 40% of the class is about core DBMS concepts
– Query execution, query optimization, transactions, recovery, etc.
– Textbook material
• 60% of the class is on “what is happening today in data management”
– New developments on textbook material
– Data streams
– Web search – Google, Yahoo!
– Data integration (structured data + unstructured data)
– Data mining
– Unsolved challenges
Using a Traditional DBMS
[Figure: the User/Application sends each Query to the DBMS and receives a Result; a Loader loads data into stored tables (Table R, Table S, …)]
New Approach for Data Streams

[Figure: the User/Application registers a Continuous Query (Standing Query) with the Stream Query Processor, which reads the input streams and returns the Result]
Example Continuous (Standing) Queries
• Web
– Amazon’s best sellers over last hour
• Network Intrusion Detection
– Track HTTP packets with destination address matching a prefix in given table and content matching “*\.ida”
• Finance
– Monitor NASDAQ stocks between $20 and $200 that have moved down more than 2% in the last 20 minutes
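The finance example can be sketched as a standing query that is re-evaluated on every incoming tick. This is a minimal illustration, not a stream query processor: the class DropMonitor, the ticker symbol "ABCD", and the tick format (symbol, timestamp in seconds, price) are hypothetical, and "moved down more than 2%" is read here as a drop relative to the oldest price still inside the 20-minute window, which is only one possible interpretation.

```python
from collections import defaultdict, deque

WINDOW = 20 * 60   # 20 minutes, in seconds

class DropMonitor:
    """Standing query: report a symbol priced $20 to $200 that fell more than 2% in the window."""

    def __init__(self):
        self.history = defaultdict(deque)   # symbol -> deque of (timestamp, price)

    def on_tick(self, symbol, timestamp, price):
        window = self.history[symbol]
        window.append((timestamp, price))
        # Expire ticks older than the window; the newest tick always remains.
        while window[0][0] < timestamp - WINDOW:
            window.popleft()
        if not (20 <= price <= 200):
            return None
        oldest_price = window[0][1]
        drop = (oldest_price - price) / oldest_price
        return (symbol, round(drop * 100, 2)) if drop > 0.02 else None

monitor = DropMonitor()
ticks = [("ABCD", 0, 100.0), ("ABCD", 600, 99.0),
         ("ABCD", 900, 97.5), ("ABCD", 2400, 97.0)]
alerts = [a for a in (monitor.on_tick(*t) for t in ticks) if a]
print(alerts)
```

Unlike a traditional query, the query stays registered and the data flows past it; only the third tick, a 2.5% drop within 15 minutes, produces an alert.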
New Challenges in DBMSs

[Figure: a high-level query Q goes into the DBMS and an answer comes out; the Data grows from TeraBytes → PetaBytes and includes XML such as:]
<CD>
  <TITLE>Empire B.</TITLE>
  <ARTIST>Bob Dylan</ARTIST>
  <COUNTRY>USA</COUNTRY>
  <COMPANY>Columbia</COMPANY>
  <PRICE>10.90</PRICE>
</CD>
Course Logistics

• Reference: Database Systems: The Complete Book, by H. Garcia-Molina, J. D. Ullman, and J. Widom
• Web site: http://www.cs.duke.edu/courses/fall07/cps216
• Grading:
– Project 40%
– Homework Assignments 20%
– Midterm 20%
– Final 20%
Summary: Data Management is Important
• Core aspect of most sciences and engineering today
• Core need in industry
• Cool mix of theory and systems
• Chances are you will find something interesting even if your primary interest is elsewhere
