
Introduction to Relational Databases


RDBMS
• Most popular database system.
• Simple and sound theoretical basis.
• Developed by E. F. Codd in the early 1970s.
• The model is based on tables, rows and columns, and the manipulation of the data stored within them.
• A relational database is a collection of these tables.
• First commercial system: Multics Relational Data Store (MRDS), first sold in 1976.
• Has overtaken the hierarchical and network models.
• Main feature: a single database can be spread across several tables.
• Examples include Oracle, IBM's DB2, Sybase, MySQL and Microsoft Access.

Differences between DBMS and RDBMS
DBMS
• Data is stored in a single large table
• A single record modification affects the whole database
RDBMS (Codd 1980)
• Database is 'broken down' into smaller pieces
• Changes do not affect the entire database

RDBMS Advantages
• Increases the sharing of data and speeds up development of new applications
• Supports a simple data structure, namely tables or relations
• Limits redundancy or replication of data
• Better integrity, as data inconsistencies are avoided by storing data in one place
• Provides physical data independence, so users do not have to be aware of underlying objects
• Offers logical database independence: data can be viewed in different ways by different users
• Expandability is relatively easy to achieve by adding new views of the data as they are required
• Supports one-off queries using SQL or another appropriate language
• Better backup and recovery procedures
• Provides multiple interfaces
• Solves many problems created by other data models
• The ability to handle simple data types efficiently
• Multiple users can access the data, which is not possible in a DBMS

Relational Database
• Relational Database Management System (RDBMS)
  – Consists of a number of tables and a single schema (definition of tables and attributes)
  – Students (sid, name, login, age, gpa)
      Students identifies the table
      sid, name, login, age, gpa identify attributes
      sid is primary key

An Example Table
• Students (sid: string, name: string, login: string, age: integer, gpa: real)

  sid    name     login          age  gpa
  50000  Dave     dave@cs        19   3.3
  53666  Jones    jones@cs       18   3.4
  53688  Smith    smith@ee       18   3.2
  53650  Smith    smith@math     19   3.8
  53831  Madayan  madayan@music  11   1.8
  53832  Guldu    guldu@music    12   2.0
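The schema and rows above can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the mapping of string/integer/real to the SQL types TEXT/INTEGER/REAL is an assumption, and the extra student "Eve" is made up to show what happens when a primary key value is reused.

```python
import sqlite3

# In-memory database; the SQL column types are an assumed mapping of the
# slide's string / integer / real attribute types.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students (
        sid   TEXT PRIMARY KEY,
        name  TEXT,
        login TEXT,
        age   INTEGER,
        gpa   REAL
    )
    """
)

rows = [
    ("50000", "Dave", "dave@cs", 19, 3.3),
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
    ("53831", "Madayan", "madayan@music", 11, 1.8),
    ("53832", "Guldu", "guldu@music", 12, 2.0),
]
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", rows)

# "Eve" is a hypothetical row that reuses sid 50000; the primary key
# constraint should reject it.
try:
    conn.execute("INSERT INTO Students VALUES ('50000', 'Eve', 'eve@cs', 20, 3.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(duplicate_rejected, student_count)
```

Two students share the name Smith, which is why name alone could not serve as the key here.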

Another example: Courses
• Courses (cid, instructor, quarter, dept)

  cid          instructor  quarter    dept
  Carnatic101  Jane        Fall 06    Music
  Reggae203    Bob         Summer 06  Music
  Topology101  Mary        Spring 06  Math
  History105   Alice       Fall 06    History

Keys
• Primary key – minimal subset of fields that is a unique identifier for a tuple (an unordered set of known values with names)
  – sid is primary key for Students
  – cid is primary key for Courses
• Foreign key – connections between tables
  – Courses (cid, instructor, quarter, dept)
  – Students (sid, name, login, age, gpa)
  – How do we express which students take each course?

Many to many relationships
• In general, need a new table Enrolled(cid, grade, studid)
• studid is a foreign key that references sid in the Student table

  Enrolled
  cid          grade  studid
  Carnatic101  C      53831
  Reggae203    B      53832
  Topology112  A      53650
  History105   B      53666

  Student
  sid    name     login
  50000  Dave     dave@cs
  53666  Jones    jones@cs
  53688  Smith    smith@ee
  53650  Smith    smith@math
  53831  Madayan  madayan@music
  53832  Guldu    guldu@music
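As a rough illustration of the foreign key from Enrolled.studid to Students.sid, the sketch below uses Python's sqlite3 module. It assumes SQLite's behaviour of enforcing foreign keys only after `PRAGMA foreign_keys = ON`; the sid 99999 is a made-up value that does not appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only checks foreign keys once this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT)")
conn.execute(
    """
    CREATE TABLE Enrolled (
        cid    TEXT,
        grade  TEXT,
        studid TEXT REFERENCES Students(sid)
    )
    """
)

conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [
        ("50000", "Dave", "dave@cs"),
        ("53666", "Jones", "jones@cs"),
        ("53688", "Smith", "smith@ee"),
        ("53650", "Smith", "smith@math"),
        ("53831", "Madayan", "madayan@music"),
        ("53832", "Guldu", "guldu@music"),
    ],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("Carnatic101", "C", "53831"),
        ("Reggae203", "B", "53832"),
        ("Topology112", "A", "53650"),
        ("History105", "B", "53666"),
    ],
)

# Hypothetical enrollment for sid 99999, which is not in Students:
# the foreign key should make this insert fail.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('History105', 'A', '99999')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

enrolled_count = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(dangling_rejected, enrolled_count)
```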

Relational Algebra
• Collection of operators for specifying queries
• Query describes a step-by-step procedure for computing the answer (i.e., operational)
• Each operator accepts one or two relations as input and returns a relation as output
• A relational algebra expression is composed of multiple operators

Basic operators
• Selection – return rows that meet some condition
• Projection – return column values
• Union
• Cross product
• Difference
• Other operators can be defined in terms of basic operators

Example Schema (simplified)
• Courses (cid, instructor, quarter, dept)
• Students (sid, name, gpa)
• Enrolled (cid, grade, studid)

Selection
Select students with gpa higher than 3.3 from S1: σ_{gpa>3.3}(S1)

  S1
  sid    name     gpa
  50000  Dave     3.3
  53666  Jones    3.4
  53688  Smith    3.2
  53650  Smith    3.8
  53831  Madayan  1.8
  53832  Guldu    2.0

  Result
  sid    name   gpa
  53666  Jones  3.4
  53650  Smith  3.8

Projection
Project name and gpa of all students in S1: π_{name, gpa}(S1)

  S1
  sid    name     gpa
  50000  Dave     3.3
  53666  Jones    3.4
  53688  Smith    3.2
  53650  Smith    3.8
  53831  Madayan  1.8
  53832  Guldu    2.0

  Result
  name     gpa
  Dave     3.3
  Jones    3.4
  Smith    3.2
  Smith    3.8
  Madayan  1.8
  Guldu    2.0

Combine Selection and Projection
• Project name and gpa of students in S1 with gpa higher than 3.3: π_{name, gpa}(σ_{gpa>3.3}(S1))

  S1
  sid    name     gpa
  50000  Dave     3.3
  53666  Jones    3.4
  53688  Smith    3.2
  53650  Smith    3.8
  53831  Madayan  1.8
  53832  Guldu    2.0

  Result
  name   gpa
  Jones  3.4
  Smith  3.8
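Selection and projection, and their composition, can be sketched in a few lines of plain Python. The function names `select` and `project` and the list-of-dicts representation of S1 are choices made for this illustration, not part of the slides.

```python
# S1 from the slides, represented as a list of rows (dicts).
S1 = [
    {"sid": "50000", "name": "Dave", "gpa": 3.3},
    {"sid": "53666", "name": "Jones", "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "gpa": 3.8},
    {"sid": "53831", "name": "Madayan", "gpa": 1.8},
    {"sid": "53832", "name": "Guldu", "gpa": 2.0},
]


def select(relation, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection (pi): keep the named columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# pi_{name, gpa}(sigma_{gpa > 3.3}(S1))
high_gpa = project(select(S1, lambda r: r["gpa"] > 3.3), ["name", "gpa"])
print(high_gpa)
```

Because each operator takes a relation and returns a relation, the output of `select` can be passed straight into `project`.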

Set Operations
• Union (R ∪ S)
  – All tuples in R or S (or both)
  – R and S must have the same number of fields
  – Corresponding fields must have the same domains
• Intersection (R ∩ S)
  – All tuples in both R and S
• Set difference (R – S)
  – Tuples in R and not S

Set Operations (continued)
• Cross product or Cartesian product (R × S)
  – All fields in R followed by all fields in S
  – One tuple (r, s) for each pair of tuples r ∈ R, s ∈ S

Example: Intersection

  S1
  sid    name     gpa
  50000  Dave     3.3
  53666  Jones    3.4
  53688  Smith    3.2
  53650  Smith    3.8
  53831  Madayan  1.8
  53832  Guldu    2.0

  S2
  sid    name   gpa
  53666  Jones  3.4
  53688  Smith  3.2
  53700  Tom    3.5
  53777  Jerry  2.8
  53832  Guldu  2.0

  S1 ∩ S2 =
  sid    name   gpa
  53666  Jones  3.4
  53688  Smith  3.2
  53832  Guldu  2.0
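The set operations map directly onto Python's built-in set type when each tuple is stored as a Python tuple. The sketch below uses the S1 and S2 data from the example; the variable names are chosen only for illustration.

```python
from itertools import product

# Each relation is a set of (sid, name, gpa) tuples with the same fields.
S1 = {
    ("50000", "Dave", 3.3),
    ("53666", "Jones", 3.4),
    ("53688", "Smith", 3.2),
    ("53650", "Smith", 3.8),
    ("53831", "Madayan", 1.8),
    ("53832", "Guldu", 2.0),
}
S2 = {
    ("53666", "Jones", 3.4),
    ("53688", "Smith", 3.2),
    ("53700", "Tom", 3.5),
    ("53777", "Jerry", 2.8),
    ("53832", "Guldu", 2.0),
}

union = S1 | S2          # tuples in S1 or S2 (or both)
intersection = S1 & S2   # tuples in both S1 and S2
difference = S1 - S2     # tuples in S1 and not in S2

# Cross product: one combined tuple (r followed by s) per pair of tuples.
cross = {r + s for r, s in product(S1, S2)}

print(sorted(intersection))
print(len(union), len(difference), len(cross))
```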

Joins
• Combine information from two or more tables
• Example: students enrolled in courses: S1 ⋈_{S1.sid=E.studid} E

  S1
  sid    name     gpa
  50000  Dave     3.3
  53666  Jones    3.4
  53688  Smith    3.2
  53650  Smith    3.8
  53831  Madayan  1.8
  53832  Guldu    2.0

  E
  cid          grade  studid
  Carnatic101  C      53831
  Reggae203    B      53832
  Topology112  A      53650
  History105   B      53666

Joins (continued)
Result of joining S1 and E on S1.sid = E.studid:

  cid          grade  studid  sid    name     gpa
  History105   B      53666   53666  Jones    3.4
  Topology112  A      53650   53650  Smith    3.8
  Carnatic101  C      53831   53831  Madayan  1.8
  Reggae203    B      53832   53832  Guldu    2.0
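One way to see that a join can be defined from the basic operators is to write it as a cross product followed by a selection. The sketch below does that in plain Python; `theta_join`, `Student` and `Enrollment` are names invented for this example, and the result lists the S1 fields before the E fields.

```python
from collections import namedtuple

Student = namedtuple("Student", "sid name gpa")
Enrollment = namedtuple("Enrollment", "cid grade studid")

S1 = [
    Student("50000", "Dave", 3.3),
    Student("53666", "Jones", 3.4),
    Student("53688", "Smith", 3.2),
    Student("53650", "Smith", 3.8),
    Student("53831", "Madayan", 1.8),
    Student("53832", "Guldu", 2.0),
]
E = [
    Enrollment("Carnatic101", "C", "53831"),
    Enrollment("Reggae203", "B", "53832"),
    Enrollment("Topology112", "A", "53650"),
    Enrollment("History105", "B", "53666"),
]


def theta_join(left, right, condition):
    """Join = cross product of left and right, filtered by the condition."""
    return [l + r for l in left for r in right if condition(l, r)]


joined = theta_join(S1, E, lambda s, e: s.sid == e.studid)
for row in joined:
    print(row)
```

Dave and the Smith with sid 53688 have no matching row in E, so they do not appear in the result.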

Relational Algebra Summary
• Algebras are useful to manipulate data types (relations in this case)
• Set-oriented
• Brings some clarity to what needs to be done
• Opportunities for optimization
  – May have different expressions that do the same thing
• We will see examples of algebras for other types of data in this course

Views
• Virtual table defined on base tables by a query
  – Single or multiple tables
• Security – "hide" certain attributes from users
  – Show students in each course but hide their grades
• Ease of use – expression that is more intuitively obvious to the user
• Views can be materialized to improve query performance

Views
• Suppose we often need names of students who got a 'B' in some course:

    CREATE VIEW B_Students(name, sid, course)
    AS SELECT S.name, S.sid, E.cid
       FROM Students S, Enrolled E
       WHERE S.sid = E.studid AND E.grade = 'B'

    name   sid    course
    Jones  53666  History105
    Guldu  53832  Reggae203
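The view can be tried out with Python's sqlite3 module. This is a sketch under a few assumptions: column names are given with `AS` aliases instead of a column list after the view name, the join is written with `JOIN ... ON`, and the final `UPDATE` is a made-up change added only to show that the view is recomputed from the base tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")
conn.execute("CREATE TABLE Enrolled (cid TEXT, grade TEXT, studid TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [
        ("50000", "Dave", 3.3),
        ("53666", "Jones", 3.4),
        ("53688", "Smith", 3.2),
        ("53650", "Smith", 3.8),
        ("53831", "Madayan", 1.8),
        ("53832", "Guldu", 2.0),
    ],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("Carnatic101", "C", "53831"),
        ("Reggae203", "B", "53832"),
        ("Topology112", "A", "53650"),
        ("History105", "B", "53666"),
    ],
)

# The view stores the query, not the rows.
conn.execute(
    """
    CREATE VIEW B_Students AS
    SELECT S.name AS name, S.sid AS sid, E.cid AS course
    FROM Students S JOIN Enrolled E ON S.sid = E.studid
    WHERE E.grade = 'B'
    """
)

query = "SELECT name, sid, course FROM B_Students ORDER BY sid"
b_students = conn.execute(query).fetchall()
print(b_students)

# Hypothetical grade change: the view reflects it without being redefined.
conn.execute("UPDATE Enrolled SET grade = 'B' WHERE cid = 'Topology112'")
b_students_after = conn.execute(query).fetchall()
print(b_students_after)
```

Users who are given access only to B_Students never see the grade column itself, which is the "hide certain attributes" use of views.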

Indexes
• Idea: speed up access to desired data
• "Find all students with gpa > 3.3"
• May need to scan entire table
• Index consists of a set of entries pointing to locations of each search key

Types of Indexes
• Clustered vs. Unclustered
  – Clustered: ordering of data records same as ordering of data entries in the index
  – Unclustered: data records in different order from index
• Primary vs. Secondary
  – Primary – index on fields that include primary key
  – Secondary – other indexes

Example: Clustered Index
• Sorted by sid
• Index entries on sid: 50000, 53600, 53800

  sid    name     gpa
  50000  Dave     3.3
  53650  Smith    3.8
  53666  Jones    3.4
  53688  Smith    3.2
  53831  Madayan  1.8
  53832  Guldu    2.0

Example: Unclustered Index
• Sorted by sid
• Index on gpa
• Index entries on gpa: 1.8, 2.0, 3.2, 3.3, 3.4, 3.8

  sid    name     gpa
  50000  Dave     3.3
  53650  Smith    3.8
  53666  Jones    3.4
  53688  Smith    3.2
  53831  Madayan  1.8
  53832  Guldu    2.0
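The idea of an unclustered index can be mimicked in plain Python: the records stay sorted by sid, while a separate sorted list of (gpa, position) entries points back into them. This is a toy model of the concept, not how a real database engine lays out its index; all names are invented for the example.

```python
from bisect import bisect_right

# Data records, stored in sid order (as in the slide).
records = [
    ("50000", "Dave", 3.3),
    ("53650", "Smith", 3.8),
    ("53666", "Jones", 3.4),
    ("53688", "Smith", 3.2),
    ("53831", "Madayan", 1.8),
    ("53832", "Guldu", 2.0),
]

# Unclustered index on gpa: entries sorted by gpa, each pointing to the
# position of its record. The record order is unrelated to the index order.
gpa_index = sorted((gpa, pos) for pos, (_sid, _name, gpa) in enumerate(records))
gpa_keys = [gpa for gpa, _pos in gpa_index]


def students_with_gpa_above(threshold):
    """Use the index: binary search for the first key above the threshold."""
    start = bisect_right(gpa_keys, threshold)
    return [records[pos] for _gpa, pos in gpa_index[start:]]


def scan_with_gpa_above(threshold):
    """Without an index: look at every record."""
    return [rec for rec in records if rec[2] > threshold]


via_index = students_with_gpa_above(3.3)
via_scan = scan_with_gpa_above(3.3)
print(via_index)
print(via_scan)
```

Both functions return the same students, but the indexed lookup comes back in gpa order while the scan comes back in sid order, and only the scan has to touch every record.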

Comments on Indexes
• Indexes can significantly speed up query execution
• But inserts become more costly
• May have high storage overhead
• Need to choose attributes to index wisely!
  – What queries are run most frequently?
  – What queries could benefit most from an index?
• Preview of things to come: SDSS

Summary: Why are RDBMS useful?
• Data independence – provides abstract view of the data, without details of storage
• Efficient data access – uses techniques to store and retrieve data efficiently
• Reduced application development time – many important functions already supported
• Centralized data administration
• Data Integrity and Security
• Concurrency control and recovery