
Introduction To This Class
INFS 614 Professor Smith

MITRE


Let’s Go Over the Syllabus ....
• Welcome!
• Your experience with databases?
• About me:
  - database scientist at MITRE since ’93
  - current interests: bioscientific databases, security/privacy, data discovery and sharing
  - contact information:
    • email (best): kps@mitre.org
    • office phone: 703-983-6115 (endure the voicemail)
• Texts:
  - Database Management Systems, 3rd ed., Raghu Ramakrishnan and Johannes Gehrke, McGraw-Hill
  - Oracle9i Programming, 4th edition, Rajshekhar Sunderraman, Addison-Wesley, ISBN 0-321-19498-5


Satisfaction of Prerequisites
• Prerequisites (strictly enforced):
  - INFS-501 (Discrete mathematics)
  - INFS-515 (Computer architecture/organization)
  - INFS-590 (Program design and data structures)
• Specifically:
  - Good background in discrete mathematics (e.g., set theory, mathematical logic, relations and functions)
  - Programming (good knowledge of C, C++, or Java)
  - Data structures and algorithms, computer architecture, and operating systems

Satisfaction of Prerequisites (Cont.)
• Consult your letter of acceptance. It specifies your status with respect to these foundation courses. For each course, one of the following must hold:
  - You were waived from the course (the evidence should be either in the acceptance letter or in a subsequent official document).
  - You took the course and received a grade of B or better.
• Questions? Contact Ryan in the dept office at 993-1640 (Room 330 in S&T II).


Submission and Grading
• Late submissions are not accepted.
  - On time means: before lecture begins on the due date
  - Your homework must run properly under the Oracle system in the labs.
• Grading is based on your performance on:
  - homework assignments (20%)
  - midterm exam (35%)
  - final exam (45%)
• The period of performance ends with the final.
• Final grades consider:
  - a) absolute standards (e.g., did you learn the material?)
  - b) class rank

Your GTA and Course Administration
• GTA
  - TBA
  - Office hour: TBA
• The course will be administered via a website:
  - ise.gmu.edu/~kps/INFS614
  - please read it at least once per week!


Honor Code System
• GMU Honor Code
  - www.ise.gmu.edu/Honor.html
• For this class:
  - Homework and exams require individual work.
  - Study groups are encouraged, but homework solutions and write-ups must be individual.
  - Exams: in-class, individual effort, closed book
• Satisfaction of prerequisites:
  - Honor code issue


Cheating: 5 Good Reasons Not To ...
1) It won’t help you.
2) It will hurt you.
3) It could really hurt you.
4) Cheating is stealing from your classmates.
5) Cheating is wrong.


Semester Overview and Pace
Week  Date  Topic                     Text
1     1/24  Introduction              1
2     1/31  The ER Model              2
3     2/7   The Relational Model      3
4     2/14  Relational Algebra        4.1-4.2
5     2/21  Relational Calculus       4.3
6     2/28  Midterm Review
7     3/6   Midterm Examination
8     3/13  (Spring Break Holiday)
9     3/20  SQL: Basics               5.1-5.3
10    3/27  SQL: Nested Queries       5.4
11    4/3   SQL: Aggregate Queries    5.5
12    4/10  Topic (TBA)
13    4/17  Functional Dependencies   19.1-19.3
14    4/24  Normalization             19.4-19.6
15    5/1   Review
16    5/8   Final Examination

HW assigned / HW due: homeworks 1a, 1b, 1c, 2, 3, 4, 5 (week-by-week alignment not recoverable here)


Database Management Systems: Lecture 1 - Introduction
INFS 614 Professor Smith


Why Are Databases Important?
• Databases are everywhere
  - Travelocity is a layer around SABRE, an industry-wide airline reservation database
  - The “Deep Web” is > 75 times the size of the Internet
  - Your cell phone has a small database
  - Increasingly: databases are key components of bigger systems
  - Much useful enterprise “knowledge” exists in data models
• Lots of jobs require, or can be done much better with, a good understanding of databases
• Databases can be very interesting
  - “bottom up” look at many types of systems; if you understand the data architecture, you have an intimate understanding of the whole system (and vice versa)
  - you meet interesting people!
    • aircraft designers, narcotics agents, bioscientists, movie reviewers
• Many interesting research areas


What is a Database? A DBMS?
• A database is a collection of data managed over a period of time
  - big: all US airline reservations, sales records at Walmart
  - small: a personal Christmas card list, a recipe file
  - specialized: medical images, terrorist info, a hybrid auto design
  - you are in a student database (or you need to be soon...)
• A DBMS is a Database Management System
  - a specialized software tool making it much easier to manage databases
  - over 40 years of active research and engineering have helped develop the modern DBMS
  - this is a huge business (billions/year)
• Sometimes “database” is used to mean DBMS
  - “Oracle sells a popular database”
  - leads to misunderstanding if you ask “Do you have a database?”


Example Database: A University Database
• Information about a university environment
  • Entities:
    • Students
    • Faculty
    • Courses
    • Classrooms
  • Relationships:
    • Students enroll in courses
    • Faculty teach courses
    • Courses use a classroom
    • Courses are prerequisite to other courses
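
One way these entities and relationships might be laid out as tables is sketched below with Python's built-in sqlite3 module. Every table and column name is an assumption made for illustration, as is the simplification that each course has one instructor and one classroom; the slide itself does not prescribe a schema.

```python
import sqlite3

# Hypothetical schema: all names and columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE Students   (sid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Faculty    (fid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Classrooms (room TEXT PRIMARY KEY);
-- "Faculty teach courses" and "Courses use a classroom" become foreign keys
-- (assuming one instructor and one room per course).
CREATE TABLE Courses (
    cid  TEXT PRIMARY KEY,
    fid  INTEGER REFERENCES Faculty(fid),
    room TEXT REFERENCES Classrooms(room)
);
-- "Students enroll in courses" is many-to-many, so it gets its own table.
CREATE TABLE Enrolled (
    sid INTEGER REFERENCES Students(sid),
    cid TEXT REFERENCES Courses(cid),
    PRIMARY KEY (sid, cid)
);
-- "Courses are prerequisite to other courses" relates Courses to itself.
CREATE TABLE Prerequisite (
    cid    TEXT REFERENCES Courses(cid),
    prereq TEXT REFERENCES Courses(cid),
    PRIMARY KEY (cid, prereq)
);
"""


def build_university_db():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


print(table_names(build_university_db()))
```

Each entity gets a table of its own; each relationship becomes either a foreign key or a separate linking table, depending on whether it is many-to-many.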

DBMS’s: Why Use A DBMS?
• Well .... what if you don’t?
• Let’s say you have 500 GB of corporate data on sales, employees, departments, etc., and you want to generate custom reports.
• Writing applications on a simple file system has many drawbacks:
  - disk management: report application can’t just use main memory, must shuttle data back and forth to disk as you use it
    • OS provides primitives, but can be a lot of work
  - scaling: linear access is too slow, need generic indexing
  - data integrity: if you delete a department, how do you ensure its employees are deleted?
  - security: OS security is probably not good enough
  - queries: you probably need to write a new program for each question you need to ask about the stored data
  - transactions: you may need to code up protections for concurrent users and against system crashes
• --->> This is a lot of work!!
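
To make the data integrity point concrete, here is a minimal sketch of how a DBMS can take over the "delete a department, delete its employees" rule. It uses Python's sqlite3 module; the Departments and Employees tables and their contents are hypothetical, and cascading deletion is only one possible policy (a real design might instead forbid the delete).

```python
import sqlite3


def delete_department_demo():
    """Delete a department and report which employees remain."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Departments (did INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Employees (
            eid  INTEGER PRIMARY KEY,
            name TEXT,
            did  INTEGER REFERENCES Departments(did) ON DELETE CASCADE
        );
        INSERT INTO Departments VALUES (1, 'Sales'), (2, 'Research');
        INSERT INTO Employees VALUES (10, 'Ann', 1), (11, 'Bob', 1), (12, 'Cy', 2);
    """)
    # One statement; the DBMS removes the Sales employees on its own.
    conn.execute("DELETE FROM Departments WHERE did = 1")
    rows = conn.execute("SELECT name FROM Employees ORDER BY eid")
    return [name for (name,) in rows]


print(delete_department_demo())
```

With plain files, every program that deletes a department would have to remember to clean up the employee records itself.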


Standard Services A DBMS Offers
• Schematic data modeling
  - disciplined design
• A universal query language (SQL)
  - declarative access
  - interoperability
  - physical data independence
• Automated query optimization
• Efficient access to terabytes through indexing
• Automated data integrity enforcement
• Security (access controls, roles, views, audit, authentication ...)
• “Crash-proof” atomic transactions
• Data persistence between executions
• Concurrency control
• Transparent data distribution, parallelization
• APIs (e.g., JDBC) and a rich set of development tools
• Data export into XML
• Sophisticated administration and tuning tools
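
As a small illustration of one service from this list, atomic transactions, the sketch below moves money between two rows so that either both updates take effect or neither does. It uses Python's sqlite3 module; the Accounts table, its CHECK constraint, and the function names are assumptions made for the example, and it demonstrates atomicity only, not crash recovery.

```python
import sqlite3


def make_accounts():
    """Build a hypothetical two-account table that forbids negative balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Accounts (id INTEGER PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO Accounts VALUES (1, 100), (2, 50)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit leaves work that must be undone.
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM Accounts ORDER BY id").fetchall()


conn = make_accounts()
print(transfer(conn, 1, 2, 30), balances(conn))
print(transfer(conn, 1, 2, 500), balances(conn))
```

The second transfer overdraws account 1, so the DBMS rejects the debit and the rollback also undoes the credit that had already been applied.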


Where is the “Break-Even Point”? Case Study: Small Preschool Survey
• Problem: given a preschool survey with 79 records, generate averages and create reports
• Excel is great for reports, but queries have to be hardcoded into spreadsheet pages
• Using SQL, I can generate averages with incredible flexibility, but I first have to create a database and load in the data.
• It was worth it (to me); I used a (free) open source DBMS, then copied the answers into Excel for the reports.
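
The "create a database, load the data, then ask for averages" workflow might look roughly like the sketch below. It uses Python's sqlite3 module rather than whichever DBMS was used for the case study; the Survey table, its columns, and the four sample rows are invented for illustration and are not the actual survey data.

```python
import sqlite3


def average_by_question(rows):
    """Load (classroom, question, score) rows and average the scores per question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Survey (classroom TEXT, question TEXT, score INTEGER)")
    conn.executemany("INSERT INTO Survey VALUES (?, ?, ?)", rows)
    query = (
        "SELECT question, AVG(score) FROM Survey "
        "GROUP BY question ORDER BY question"
    )
    return conn.execute(query).fetchall()


# Made-up sample records standing in for the real survey.
sample = [("A", "q1", 4), ("A", "q2", 2), ("B", "q1", 5), ("B", "q2", 3)]
print(average_by_question(sample))
```

The flexibility comes from the query text: grouping by classroom instead of question, or adding a WHERE clause, is a one-line change rather than a new spreadsheet page.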


Structure of a DBMS
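[Figure: layered DBMS architecture]
Queries enter the DBMS and pass down through these layers:
  Query Interface
  Query Optimization & Execution
  Relational Operators
  File Access Methods
  Buffer Management
  Disk Management
The Disk sits below the DBMS. Transaction Mgmt and Security are shown as further DBMS components.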

People Who Interact With Databases
[Figure: roles arranged around the DBMS]
• Application Programmers
• Data Providers
• End Users
• DBAs
• DBMS Designers & Implementors
• Information System Architects
• Database Researchers


Course Overview
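[Figure: course topics, with the numbers shown in the diagram]
  1  Entity Relationship Design
  2  SQL: DML (queries), DDL (relational design)
  3  Relational Algebra
  4  Relational Calculus
  6  Normalization & Dependency Theory
(The diagram also carries a label 5 whose topic is not recoverable here.)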

Data Modeling
Databases implement a model of pertinent aspects of the world


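[Figure: a database as a model of the “real world”]
A data model provides: 1) Data Structures 2) Operations 3) Validity Constraints
“Real World”: a warehouse; operation “store”; limit “no room left”
Database: a warehouse representation; operation “insert”; outcome “insert denied”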
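
The warehouse picture can be sketched in a few lines: a table as the data structure, an insert as the operation, and a rule that denies the insert when there is no room left. The example uses Python's sqlite3 module; the capacity of 2, the Warehouse table, and the trigger name are assumptions chosen only for the demonstration.

```python
import sqlite3

CAPACITY = 2  # assumed warehouse size, chosen only for the demonstration


def make_warehouse():
    conn = sqlite3.connect(":memory:")
    # 1) Data structure: a table stands in for the warehouse.
    # 3) Validity constraint: a trigger refuses inserts once the table is full.
    conn.executescript(f"""
        CREATE TABLE Warehouse (item TEXT);
        CREATE TRIGGER no_room_left BEFORE INSERT ON Warehouse
        BEGIN
            SELECT RAISE(ABORT, 'insert denied')
            WHERE (SELECT COUNT(*) FROM Warehouse) >= {CAPACITY};
        END;
    """)
    return conn


def store(conn, item):
    """2) Operation: try to insert an item and report what happened."""
    try:
        conn.execute("INSERT INTO Warehouse VALUES (?)", (item,))
        return "stored"
    except sqlite3.DatabaseError:
        return "insert denied"


warehouse = make_warehouse()
for item in ["crate", "barrel", "pallet"]:
    print(item, "->", store(warehouse, item))
```

The database says "insert denied" at the same point where the real warehouse would say "no room left", which is what it means for the database to model the world faithfully.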


A Brief History of Databases

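[Figure: timeline of data models. Vertical axis: Data Model; horizontal axis: Time, 1960 to 2010, with “the present” marked. Models from earliest to latest: hierarchical, network, relational; markers at 1960, 1970, 1980.]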


Relational Data Model: An Example
• Given 3 relations (tables) of data: Pilots, Flights, Aircraft
  - Pilots.name = Flights.pilot_name
  - Flights.aircraft_id = Aircraft.id
• Which pilots have flown prop-jets? (In SQL)

    SELECT DISTINCT Pilots.name
    FROM Pilots, Flights, Aircraft
    WHERE Pilots.name = Flights.pilot_name
      AND Flights.aircraft_id = Aircraft.id
      AND Aircraft.type = 'prop-jets'
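
To try the prop-jet query end to end, the sketch below runs it on a tiny made-up database with Python's sqlite3 module. The column lists beyond those named in the join conditions, and all of the sample rows, are assumptions for illustration.

```python
import sqlite3

QUERY = """
SELECT DISTINCT Pilots.name
FROM Pilots, Flights, Aircraft
WHERE Pilots.name = Flights.pilot_name
  AND Flights.aircraft_id = Aircraft.id
  AND Aircraft.type = 'prop-jets'
"""


def pilots_who_flew_prop_jets():
    """Run the query against a small hypothetical data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Pilots   (name TEXT PRIMARY KEY);
        CREATE TABLE Aircraft (id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE Flights  (pilot_name TEXT, aircraft_id INTEGER);
        INSERT INTO Pilots   VALUES ('Ames'), ('Baker'), ('Cole');
        INSERT INTO Aircraft VALUES (1, 'prop-jets'), (2, 'jets');
        INSERT INTO Flights  VALUES ('Ames', 1), ('Ames', 1), ('Baker', 2), ('Cole', 1);
    """)
    # Sort in Python so the result order does not depend on the chosen plan.
    return sorted(name for (name,) in conn.execute(QUERY))


print(pilots_who_flew_prop_jets())
```

Ames flew the prop-jet twice but appears once because of DISTINCT; Baker only flew the jet and is filtered out. Note that the query says what is wanted, not how to compute it.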

Initial Query Execution Plan
[Figure: initial plan, read bottom-up; tuple counts in parentheses]
  scan Pilots (50) and scan Flights (10,000,000)
  join (10,000,000)
  scan Aircraft (2000)
  join (10,000,000)
  select (only prop-jets - 0.1%) (10,000)
  project (10)
  answer (the distinct pilot names)
Database: Pilots (50), Flights (10,000,000), Aircraft (2000)
Total tuples processed: 30,012,060


Query Optimization: Improved Plan
[Figure: improved plan, read bottom-up; tuple counts in parentheses]
  select on Aircraft (only prop-jets - 0.1%) (2)
  indexed retrieval from Flights (10,000)
  join (10,000)
  scan Pilots (50)
  join (10,000)
  project (10)
  answer (only distinct pilots’ names)
Database: Pilots (50), Flights (10,000,000), Aircraft (2000)
Total tuples processed: 30,062
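
The two totals can be checked by adding up the tuple counts attached to each operator. The short Python sketch below does that arithmetic; the assignment of each count to a particular operator is one reading of the two plan diagrams and should be treated as an assumption, although the sums match the totals stated on the slides.

```python
# Tuple counts per operator, as read off the two plan diagrams (assumed mapping).
initial_plan = {
    "scan Pilots": 50,
    "scan Flights": 10_000_000,
    "scan Aircraft": 2_000,
    "join Pilots with Flights": 10_000_000,
    "join with Aircraft": 10_000_000,
    "select prop-jets": 10_000,
    "project names": 10,
}
improved_plan = {
    "select prop-jet aircraft": 2,
    "indexed retrieval from Flights": 10_000,
    "join Aircraft with Flights": 10_000,
    "scan Pilots": 50,
    "join with Pilots": 10_000,
    "project names": 10,
}


def total_tuples(plan):
    """Sum the tuples handled by every operator in a plan."""
    return sum(plan.values())


initial_total = total_tuples(initial_plan)
improved_total = total_tuples(improved_plan)
print(initial_total, improved_total, initial_total / improved_total)
```

Both plans return the same answer; pushing the selection down and using an index cuts the work by roughly a factor of a thousand.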


A Brief History of Databases (Continued)
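[Figure: timeline of data models, extended. Vertical axis: Data Model; horizontal axis: Time, 1960 to 2010, with “the present” marked. Models from earliest to latest: hierarchical, network, relational, object, object-relational; markers at 1960, 1970, 1980, 1990, 2000.]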


ORDBMS’s
• Key issue addressed:
  - “Impedance mismatch”: mismatch between data model in a database and in its application languages
  - relations vs. objects/classes (e.g., in C++ or Java)
• Added ORDBMS features:
  - More powerful type definition
    • e.g. “point”, “polyhedron”
  - Type-specific methods, predicates
    • “union”, “contains”
  - Inheritance in the schema
    • convex polyhedra
  - User definable infrastructure
    • indices, operators, optimization techniques
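
The impedance mismatch is easiest to see in the glue code an application needs when the database only knows flat rows but the program wants objects with methods. The sketch below shows that translation layer in Python with sqlite3; the Points table, the Point class, and its method are hypothetical, and the example illustrates the problem that ORDBMS features aim at rather than any ORDBMS feature itself.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Point:
    """Application-side type: data plus behavior."""
    x: float
    y: float

    def distance_to_origin(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5


def make_points_db():
    """Database side: a point is just two numeric columns in a row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
    conn.executemany("INSERT INTO Points VALUES (?, ?, ?)", [(1, 3, 4), (2, 0, 1)])
    return conn


def load_points(conn):
    """Translate flat rows into Point objects: the glue the mismatch forces."""
    rows = conn.execute("SELECT x, y FROM Points ORDER BY id")
    return [Point(x, y) for (x, y) in rows]


points = load_points(make_points_db())
print([p.distance_to_origin() for p in points])
```

In a purely relational system the method lives only in the application; an ORDBMS lets a type such as “point” and its predicates be defined inside the database, so queries can use them directly.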


A Brief History of Databases (recent events .... )
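[Figure: timeline of data models, with recent events. Vertical axis: Data Model; horizontal axis: Time, 1960 to 2010, with “the present” marked. Models from earliest to latest: hierarchical, network, relational, object, object-relational, native xml; markers at 1960, 1970, 1980, 1990, 2000, and two additional markers.]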


Important Events (Somewhat) Recently
• Open Source DBMSs
  - MySQL - 1995
  - PostgreSQL - 1996
• Semi-structured (e.g., XML) Databases
  - LORE - 1996
  - Apache Xindice - 2001
• Currently, nearly all DBMSs generate XML from tables
