Database Systems - Introduction

ORIE 480

IT framework
• Develop a concept for representing & solving the problem.
• Implement the solution to the problem.
• Analyze and interpret the solution results.
• Now, another kind of modeling, very widely used: represent an organization and its activities in a database.

Overview of Database Management
• What is:
– Data?
• Raw facts that are described, observed, or measured.

– Information?
• Data that has been organized or prepared

– Knowledge?
• Data/Information/Rules that are used for actual decision making

– Database?
• A collection of inter-related data.

– Database Management System?
• A collection of programs that define/manipulate/maintain databases.
From "Database Management Systems, Lecture Notebook," 3rd Edition, Il-Yeol Song, McGraw-Hill.

Why a database system?
• The great majority of real-world computer applications are associated with databases.
• Alternatives (file systems) are unattractive.
– Some reasons:
• Data-program dependence
• Redundancy
• Inconsistency
• Lack of security
• Have to write your own
• If a query has not been foreseen, an expert programmer is needed

Data Models
• An abstraction of the data and their associations
• Different Models:
– Hierarchical
– Network
– Relational
– Object-oriented

• Different software implementations for each.
– Why?
• History, needs, and requirements.

– What is the most popular?
• Relational systems!

RDBMSs
(Relational Database Management Systems)

• Commercial/Open Source software implementations.
– Non-Microsoft/Commercial
• ORACLE, DB2
– Microsoft
• SQL Server, Access
– Open Source
• MySQL
– Pure Python
• Gadfly

RDBMS' Future/Stability

Here to stay?
Absolutely!

• Relational DBMS with SQL is the commercial de facto standard
• Constantly being improved. Hot research area in Computer Science

But…

• Hierarchical and network systems still in use: performance, cost of conversion
• Object-oriented DBMS: thought to be on the way big time; SQL will likely stay in some form

What are the components of a relational database?
• Database
• Table / Relation
• Tuple / Record
• Attribute

From “Online Tutorial,” IST Solutions Institute, Penn State University

Relational database tables restrictions
• Columns contain the SAME type of data
– columns are static (once defined)
– columns are named, e.g., not A, but SSN
• One value per cell
• One (or more) column(s) contains a unique value for each row
– PRIMARY KEY
• Rows have the same size (# of columns)
– order of rows is unimportant
– rows are dynamic (when new data is input)
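
The restrictions above can be tried out in a few lines. This is a minimal sketch, assuming Python's built-in sqlite3 module rather than any of the products named in this course; the employee table and its columns are hypothetical, chosen only to echo the SSN example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Columns are named and declared once, when the table is defined.
# (Hypothetical table, for illustration only.)
conn.execute(
    """
    CREATE TABLE employee (
        ssn  TEXT PRIMARY KEY,   -- unique value for each row
        name TEXT NOT NULL
    )
    """
)

# Rows are dynamic: they are added as new data arrives.
conn.execute(
    "INSERT INTO employee (ssn, name) VALUES (?, ?)", ("111-22-3333", "Smith")
)

# A second row with the same primary key value is rejected.
try:
    conn.execute(
        "INSERT INTO employee (ssn, name) VALUES (?, ?)", ("111-22-3333", "Jones")
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_rejected, row_count)
```

The primary key is what lets the database refuse the second insert; a plain file or spreadsheet would accept both rows.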

Levels of Abstraction & Data Independence
• Levels of Independence
– Logical
– Physical
• Levels of abstraction, from top to bottom:
– External Schema 1, External Schema 2
– Conceptual Schema
– Physical Schema
– Disk
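
One way to picture physical data independence is with a view standing in for an external schema. The sketch below assumes Python's built-in sqlite3 module; the table, view, and index names are hypothetical, and the two sample rows are only there to give the query something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Conceptual schema: the base table (column names are assumptions).
conn.execute(
    "CREATE TABLE supplier (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT)"
)
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?, ?)",
    [("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris")],
)

# External schema: one user's view of the data, hiding the status column.
conn.execute("CREATE VIEW supplier_directory AS SELECT sname, city FROM supplier")

query = "SELECT sname, city FROM supplier_directory ORDER BY sname"
before = conn.execute(query).fetchall()

# Physical change: add an index. The query above does not have to change.
conn.execute("CREATE INDEX idx_supplier_city ON supplier (city)")
after = conn.execute(query).fetchall()

print(before == after)
print(after)
```

The user of supplier_directory sees the same answer before and after the storage-level change, which is the point of separating the levels.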

Transaction Management
• A transaction is a logical unit of access to a DBMS
– It’s a unit: no such thing as half a transaction!
– Concurrency and recovery from system crashes
– Locks, write-ahead logs, and checkpoints
• Banking, stock market transactions, airline reservations, etc.
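
The "no half a transaction" idea can be sketched with a bank transfer. This is an illustration only, assuming Python's built-in sqlite3 module; the account table, the transfer helper, and the balances are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # no overdrafts
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A", "B", 30)       # succeeds: A=70, B=80
failed = transfer(conn, "A", "B", 500)  # debit would overdraw A, so the credit to B is undone
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to B has already been applied when the debit fails, yet B ends up unchanged: the rollback removes the half-finished work.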

Two Components of an RDBMS
• Data Definition Language (DDL)
– allows you to define the metadata
• Data Manipulation Language (DML)
– allows you to insert/edit/delete data
– allows you to ask questions about the data
– two major formats:
• Query by Example (QBE)
• Structured Query Language (SQL)
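
To make the DDL/DML split concrete, here is a small sketch in SQL, run through Python's built-in sqlite3 module (an assumption; any SQL-based RDBMS would do). The part table and its column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the metadata (what tables and columns exist).
conn.execute(
    "CREATE TABLE part (pno TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER)"
)

# DML: insert, edit, and delete data.
conn.execute("INSERT INTO part VALUES ('P1', 'Nut', 'Red', 12)")
conn.execute("INSERT INTO part VALUES ('P2', 'Bolt', 'Green', 17)")
conn.execute("UPDATE part SET color = 'Blue' WHERE pno = 'P2'")
conn.execute("DELETE FROM part WHERE pno = 'P1'")

# DML: ask a question about the data.
remaining = conn.execute("SELECT pno, pname, color FROM part").fetchall()

# The metadata itself can be queried too (sqlite_master is SQLite's catalog).
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(remaining, tables)
```

CREATE TABLE changes the description of the data, while INSERT, UPDATE, DELETE, and SELECT work on the data itself.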

Example: The parts-suppliers database
• Three tables:
– S (suppliers)
– P (parts)
– SP (shipments)

Supplier table
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

• Concepts:
– Table (and relation)
– Tuple (or record)
– Field, attribute (slot in a record; column in a table)
– Domain (set of permitted values for an attribute)
– Data types (integer, character, date, ...)
– Data atomicity (one data item per field)
– Key, key field(s) (uniquely identify a record)

Parts Table
P#   PNAME   COLOR   WEIGHT
P1   Nut     Red     12
P2   Bolt    Green   17
P3   Screw   Blue    17
P4   Screw   Red     14
P5   Cam     Blue    12
P6   Cog     Red     19

• Concepts:
– Ordering on the rows?
– Weight = 17. 17 what? Pounds? Ounces? Tons? Kilograms?
– Why more than one table?
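
On the ordering question: the rows of a table carry no built-in order, so a query has to ask for one. A minimal sketch, assuming Python's built-in sqlite3 module and hypothetical column names (pno for P#); the weight unit stays unspecified, as in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE part (pno TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?, ?)",
    [
        ("P1", "Nut", "Red", 12),
        ("P2", "Bolt", "Green", 17),
        ("P3", "Screw", "Blue", 17),
        ("P4", "Screw", "Red", 14),
        ("P5", "Cam", "Blue", 12),
        ("P6", "Cog", "Red", 19),
    ],
)

# Lightest first; ties on weight are broken by part number.
by_weight = [row[0] for row in conn.execute("SELECT pno FROM part ORDER BY weight, pno")]

# Heaviest first, from the same stored rows.
heaviest_first = [
    row[0] for row in conn.execute("SELECT pno FROM part ORDER BY weight DESC, pno")
]
print(by_weight)
print(heaviest_first)
```

The same stored rows yield either order on request, which is why the order in which they were typed in does not matter.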

Shipment Table
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300

• Concepts:
– Two-field key: S#-P#
– Why more than one table?
– How do we answer queries that rely on data in more than one table?
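
Queries that span several tables are usually answered with a join on the shared key fields. The sketch below loads the three tables and asks two such questions. It assumes Python's built-in sqlite3 module rather than Access, and the lowercase names (s, p, sp, sno, pno) are stand-ins for S, P, SP, S#, and P#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(
    """
    CREATE TABLE s  (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
    CREATE TABLE p  (pno TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER);
    CREATE TABLE sp (
        sno TEXT REFERENCES s (sno),  -- declared link (SQLite enforces it only if enabled)
        pno TEXT REFERENCES p (pno),
        qty INTEGER,
        PRIMARY KEY (sno, pno)        -- two-field key
    );
    """
)
conn.executemany("INSERT INTO s VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
])
conn.executemany("INSERT INTO p VALUES (?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12), ("P2", "Bolt", "Green", 17),
    ("P3", "Screw", "Blue", 17), ("P4", "Screw", "Red", 14),
    ("P5", "Cam", "Blue", 12), ("P6", "Cog", "Red", 19),
])
conn.executemany("INSERT INTO sp VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P6", 100),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
    ("S4", "P2", 200), ("S4", "P4", 300),
])

# Which suppliers ship at least one red part? Needs all three tables.
red_part_suppliers = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT s.sname
    FROM s
    JOIN sp ON sp.sno = s.sno
    JOIN p  ON p.pno = sp.pno
    WHERE p.color = 'Red'
    ORDER BY s.sname
    """
)]

# Total quantity shipped per supplier. Needs S and SP.
total_qty = dict(conn.execute(
    "SELECT s.sname, SUM(sp.qty) FROM s JOIN sp ON sp.sno = s.sno GROUP BY s.sname"
))
print(red_part_suppliers)
print(total_qty)
```

Adams (S5) has no shipments and so does not appear in total_qty. Keeping suppliers in their own table is what lets the database remember S5 at all, which is one answer to "why more than one table?"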

Representing the Suppliers/Parts/Shipments
• Use Access

– let's see…

Our focus in 480:
• Be able to design and implement “simple” databases (pursue your managerial insights)
• Be a “power user”: design and execute complicated queries.

• Know when to use a database…

Spreadsheets vs. simple Databases
Spreadsheets:
1. Start typing in data immediately
2. Little control of access
3. Can vary data types, format, etc.
4. Little control of the cells' contents
5. Not great output features, i.e., only print the entire sheet

Simple databases:
1. Have to design the database first
2. Can limit access
3. All values of an attribute must be the same type
4. Total control of an attribute's contents
5. Great output features: labels, etc., for only certain rows

ToDo
• Read: Course pack pages 3-23
• Enroll in CourseInfo
• Remember to show up for your lab section