
Lecture 1: Course Overview

Introduction to Database Models


95-703 Database Management (Fall 2021, Section A)

Xiaoying Tu
About me…
o Xiaoying Tu
• Phonetically: Shiau-ing Too
• Or simply: X.T.
o Post-doc Teaching Fellow (Dietrich & Heinz Colleges)
o For the past 35 years of my life:

© Chinatour.net

o Research: How IT innovations (digitization + internet) disrupted film and music industries
o Teaching:
• Instructor for Database Management and Data Mining
• Recitation leader & teaching assistant for master’s courses at Heinz
Your Turn…
Course Logistics – Important Documents
o Course Syllabus
• Assessment components
• Late submission policy + how late passes work
• Participation + “Two-strikes” rule
• Policy on collaboration and cheating
• … and many more

o Course Schedule
• Lecture topics and dates
• Suggested reading
• Submission deadlines

o Read the syllabus and schedule in detail


• You are responsible for knowing everything in there
• Post any questions to Canvas discussion board
IMPORTANT!!

o You must only attend the classes and office hours of Section A. Do
not switch to the other sections of 95-703 on an ad-hoc basis.

o If you are not registered for Section A, you must attend the section
which you are registered for.
Course Logistics – Student Evaluation
Assessment Component                            Grade Percentage
Homework Assignments (4)                        20%
SQL Assignments (3)                             20%
Project                                         20%
Final Exam                                      40%
Participation Bonus (see details in syllabus)   1 bonus point


Course Logistics – Communications
o Canvas Announcement
o Canvas Discussion Board
• Post questions under appropriate threads
• Do not attach your solution file
o Office Hours
• To be posted to Canvas by end of first week
o Email
• Always send to ALL teaching team members (me + both TAs)
Course Logistics - Computing
o All assignments are to be submitted digitally to Canvas
o We strongly encourage you to TYPE your assignment submissions!
• Easy for you to edit / modify
• Easy for us to read
o Database Design (Entity-Relationship Diagram): one of the following
• LucidChart
• EdrawMax
• Microsoft Visio: download from CMU Computing
• Microsoft Powerpoint: download Office 365 from CMU Computing
o Database Query (SQL)
• Oracle Database Express Edition (XE) 18c
▪ Installation guide: Document; Video
• Mac users need to install Virtual Machine before installing Oracle
▪ Instructions to be posted on Canvas
o Contact Heinz Computing (heinz-computing@andrew.cmu.edu) for technical support.
Course Logistics – Health Advisory
o In order to attend class meetings in person, all students are expected to abide by all behaviors
indicated in A Tartan’s Responsibility, including any timely updates based on the current conditions.
o In terms of specific classroom expectations, whenever the requirement to wear a facial covering is
in effect on campus, students are expected to wear a facial covering throughout class. Note:
the requirement to wear a facial covering is in effect for the start of the Fall 2021 semester. If you do
not wear a facial covering to class, I will ask you to put one on (and if you don’t have one with you, I
will direct you to a distribution location on campus, see https://www.cmu.edu/coronavirus/health-and-wellness/facial-covering.html).
If you do not comply, you will be asked to leave the
classroom and be referred to the Office of Community Standards and Integrity for follow up,
which could include student conduct action. Finally, please note that sanitizing wipes should be
available in our classroom for those who wish to use them.
o tl;dr:

Mask. On. Please.


Course Structure – A Roadmap
[Figure: roadmap linking customer needs, user stories (requirements), the conceptual, logical, and physical models, and the database]
1. Relational Database Design
• Constraints: entity, referential, check
• Anomalies, functional dependencies, normalization
• Conceptual -> logical -> physical model
2. Querying the Relational Model
• Tables, rows, columns, keys
• SQL: SELECT, FROM, WHERE, GROUP BY, HAVING, JOIN, subqueries, analytic functions, …
• SQL operations (CRUD)
3. Additional Topics
• Views, indexes, data integration, …
Agenda
o The age of data
o What does a database provide?

o Database system architecture

o Data models

o The relational model


The Data Revolution
Data is the new cyber-currency;
companies rely on it to optimize customer experience and drive sales —
hackers target and monetize the same data.
… Imperva

Data is the new oil.


It’s valuable, but if unrefined it cannot really be used.
It has to be changed into gas, plastic, chemicals, etc.
to create a valuable entity that drives profitable activity;
so must data be broken down, analyzed for it to have value.
… Clive Humby
Example: The Process of Grocery Checkout
Actions                   Data Manipulation
Scan loyalty card         Access customer information
                          (personal information + purchase history)
For each item, scan UPC   1. Retrieve current price of item
                          2. Calculate any discounts
                          3. Print line item
Pay final bill            1. Charge amount to credit card
                          2. Print final bill
                          3. Adjust current stock amounts
                          4. Identify items that need to be re-stocked if
                             current quantity falls below threshold
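The data manipulation behind a checkout can be sketched in a few lines of Python. This is only an illustration: the `prices` and `stock` dictionaries, the UPC codes, and `REORDER_THRESHOLD` are made-up stand-ins for database tables, and discounts and printing are left out.

```python
# Hypothetical in-memory stand-ins for the store's database tables.
prices = {"0001": 2.50, "0002": 4.00}   # UPC -> current price
stock = {"0001": 10, "0002": 3}         # UPC -> quantity on hand
REORDER_THRESHOLD = 3                   # assumed restocking threshold


def checkout(scanned_upcs):
    """Return (total bill, UPCs that need re-stocking) for one checkout."""
    total = 0.0
    for upc in scanned_upcs:
        total += prices[upc]   # retrieve current price of item
        stock[upc] -= 1        # adjust current stock amount
    # Identify items whose quantity has fallen below the threshold.
    restock = sorted(u for u in set(scanned_upcs) if stock[u] < REORDER_THRESHOLD)
    return total, restock


total, restock = checkout(["0001", "0002"])
print(total, restock)
```

Even this toy version reads and writes several pieces of shared data per customer, which is the kind of work a DBMS is meant to handle reliably.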
What does a database provide?

What functionality does a DBMS provide?

[Figure: layers labeled DBMS, persistence, files]
Example: A Simple File System
What’s wrong with File Systems?
o Hard to query
• Need to write dedicated program for even the simplest report
o Structural Dependence
• Once you change a file structure, all reports built upon it need modification.
o Data Dependence
• Changes in the characteristics of data (e.g. changing a field from integer to decimal)
require changes in all the programs that access the file.
o Data Redundancy
• Because files are isolated, the same information is repeated in multiple files
o Lack of security
• Issues with data sharing, control of access…
What does a database provide?

o Usability
o Concurrency
o Reliability
o Efficiency
o Scalability
o Availability
o Security

[Figure: layers labeled DBMS, persistence, files]
Client Server Architecture

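[Figure: a client sends a request to a server; the server sends back a response]

[Figure: end users (Registrar, Accounting, Students) connect to the database system, made up of application and web server(s) and database server(s)]
o Application & Web Server(s)
• Application programs, including data entry forms and reports
- Registrar
- Accounting
o Database Server(s)
• Database Management System (DBMS)
• Data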
Data Models
o A data model represents data structures and their characteristics, relations,
constraints, transformations, and other constructs with the purpose of
supporting a specific problem domain.
o Four ways data is related: linear, network, set, hierarchical
Tables: One Data Structure to Rule them All …
• In Relational Databases there is only one data structure: a table made up of at least 1 column and
zero or more rows
• Tables are relations, which in turn are sets. Hence the order of the rows in a table is immaterial
• Example tables shown on the slide (column headings only):
▪ Students (ID, Name, GPA)
▪ Daily_Show_Guests (Year, Title, Date, Profession, Name)
▪ Baby_Names (State, Gender, Year, fname, number)

[Photo: Ted Codd (Turing Award 1981)]
Why study Relational Databases?
Lecture 2: The Relational Model
95-703 Database Management (Fall 2021, Section A)

Xiaoying Tu
Agenda
o Tables vs. (Mathematical) Relations
• Why the name “relational” database?
o Keys
• All sorts of keys, but most importantly:
▪ Primary keys
▪ Foreign keys
o Anomalies
o Integrity Constraints
Recall: The central object in a relational database is a table
[Figure: Customers table]

A table is a logical data representation of an entity set (a set of entities).
What is an entity?
o Entity
• Something that has a distinct, separate existence
• Can be a person, thing, concept, event, or place
• Entities are objects of interest to the organization
▪ E.g. Travel agency: tours, customers, and sales

o What are some entities of interest for a university?


A table is composed of rows and columns
Attribute, Column, or Field
Tuple, Row, Record

Cardinality: # of rows in a table

Degree: # of columns in a table


Properties of a database table
o Each cell contains exactly one
atomic value.
• Customer’s address is divided
into 4 columns: street, city,
state, zip. The business needs to
be able to access zip, city and
state independently.
• Phone is not further divided
since the business does not
need to work with any subset of
phone number.
Properties of a database table
o Column names are distinct
o Each row is distinct; there are no
duplicate rows
o Column values come from the
same domain.
Domains
o Domain - the set of allowable values
from which actual column values are
drawn.
• Purpose: maintains the integrity of
the data.
o Every column is defined on exactly
one domain (has one set of allowable
values).
o At a minimum, the domain of every
column is enforced with an associated
data type.
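A domain can be narrower than a data type. The sketch below uses Python's built-in sqlite3 module as a stand-in for the course's Oracle database; the GPA range of 0.0 to 4.0 is an assumption made for illustration, enforced with a CHECK constraint on top of the column's data type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        ID   INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        GPA  REAL CHECK (GPA BETWEEN 0.0 AND 4.0)  -- assumed domain for GPA
    )
""")
conn.execute("INSERT INTO Students VALUES (1, 'Jack', 3.7)")  # inside the domain

try:
    conn.execute("INSERT INTO Students VALUES (2, 'Jill', 5.2)")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(rejected, row_count)
```

The out-of-range value never reaches the table, so every stored GPA is guaranteed to come from the declared set of allowable values.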
Mathematical Relations
o Consider two sets $D_1 = \{2, 4\}$, $D_2 = \{1, 3, 5\}$
o Cartesian Product of $D_1$ and $D_2$, denoted as $D_1 \times D_2$, is the set of all
ordered pairs such that:
• First element comes from $D_1$
• Second element comes from $D_2$
$D_1 \times D_2 = \{(2,1), (2,3), (2,5), (4,1), (4,3), (4,5)\}$
o Any subset of $D_1 \times D_2$ is a relation, for example:
• $R = \{(2,1), (4,1)\}$
• $R = \{(x, y) \mid x \in D_1, y \in D_2, y = 1\}$
• $S = \{(x, y) \mid x \in D_1, y \in D_2, x = 2y\} = \{(2,1)\}$
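The same sets can be checked mechanically. A minimal sketch using Python sets and `itertools.product`, with variable names chosen to mirror the slide:

```python
from itertools import product

D1 = {2, 4}
D2 = {1, 3, 5}

# Cartesian product: every ordered pair (x, y) with x from D1 and y from D2.
cartesian = set(product(D1, D2))

# Two relations, each defined as a subset of the Cartesian product.
R = {(x, y) for (x, y) in cartesian if y == 1}
S = {(x, y) for (x, y) in cartesian if x == 2 * y}

print(sorted(cartesian))
print(sorted(R), sorted(S))
```

Because `R` and `S` are built by filtering `cartesian`, each is a subset of it, which is what makes it a relation.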
Mathematical Relations
o Extending the idea to $n$ sets: $D_1, D_2, \ldots, D_n$
o Their Cartesian Product is defined as:
$\prod_{i=1}^{n} D_i = D_1 \times D_2 \times \cdots \times D_n = \{(d_1, d_2, \ldots, d_n) \mid d_1 \in D_1, d_2 \in D_2, \ldots, d_n \in D_n\}$
o Any set of $n$-tuples from this Cartesian product is a relation on the $n$
sets.
o The $D_i$ sets are also called domains, from which we choose values.
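The n-set case works the same way. In the sketch below the three domains and the sample relation are invented for illustration; the point is that the product has one tuple per combination of values, and any set of those tuples is a relation.

```python
from itertools import product
from math import prod

# Three hypothetical domains (n = 3).
D = [{1, 2}, {"a", "b", "c"}, {"x", "y"}]

# All 3-tuples (d1, d2, d3) with each d_i drawn from D[i].
cartesian = set(product(*D))

# Any set of tuples drawn from the product is a relation on the three domains.
relation = {(1, "a", "x"), (2, "c", "y")}

print(len(cartesian), relation <= cartesian)
```

The size of the product is the product of the domain sizes (here 2 × 3 × 2), which is why a table normally holds only a small subset of it.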
Relations vs. Database Tables
o Relations can be interpreted as database tables:
• Each column draws value from an underlying domain
• Each row represents a tuple
o Relation schema: a named relation defined by a set of attribute and
domain name pairs
• i.e. A table containing a set of attributes with their respective domains
o Relational database schema: a collection of relation schemas (table
schemas), each with a distinct name.
• i.e. The representation of all the tables in your database
Properties of Relations
o The relation has a name that is distinct from all other relation names
in the relational schema
o Each cell of the relation contains exactly one atomic (single) value

o Each attribute has a distinct name

o The values of an attribute are all from the same domain

o Each tuple is distinct; there are no duplicate tuples

o The order of attributes has no significance

o The order of tuples has no significance


Keys – the “key” to uniqueness
o Super Key
• Any set of attributes that uniquely identifies a row in a table
o Candidate Key
• A minimal super key: if we remove an attribute from a candidate key it will cease to uniquely
identify rows
o Primary Key
• From amongst the candidate keys the one we choose to identify rows in the table
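The difference between super keys and candidate keys can be explored on sample data. The sketch below tests every attribute combination of a small Student table; the helper name `is_superkey` is made up for illustration. One caveat: a key is a property of the table's design, covering all rows it could ever hold, so uniqueness in one sample is only a hint. Name comes out as a candidate key here only because no two sample students share a name.

```python
from itertools import combinations

columns = ("ID", "Name", "AdvisorID")
rows = [
    (1, "Jack", "A3"),
    (2, "Jill", "A3"),
    (3, "Pat", "A3"),
    (4, "George", "A5"),
]


def is_superkey(attrs):
    """True if the chosen attributes take distinct values in every sample row."""
    idx = [columns.index(a) for a in attrs]
    projected = [tuple(r[i] for i in idx) for r in rows]
    return len(set(projected)) == len(rows)


# Every attribute combination that uniquely identifies the sample rows.
superkeys = [
    set(c)
    for n in range(1, len(columns) + 1)
    for c in combinations(columns, n)
    if is_superkey(c)
]

# Candidate keys are the minimal super keys: no proper subset is also a super key.
candidate_keys = [k for k in superkeys if not any(other < k for other in superkeys)]

print(superkeys)
print(candidate_keys)
```

Choosing ID rather than Name as the primary key reflects knowledge about the domain (two students may share a name), not anything visible in these four rows.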
There’s another key…
o Foreign Key
• A column in a table that refers to (i.e. matches a value of) the PK of another table

Student                        Advisor
ID  Name    AdvisorID          AID  Name    Office
1   Jack    A3                 A3   Correy  WH 5403
2   Jill    A3                 A5   Gary    PH 226C
3   Pat     A3
4   George  A5

… but why not just store everything in a single table?


Update Anomaly
Redundant data (rows 1-3):
ID  Name    AdvisorID  Name    Office
1   Jack    A3         Correy  WH 5403
2   Jill    A3         Correy  WH 5403
3   Pat     A3         Correy  WH 5403
4   George  A5         Gary    PH 226C

Anomalous data (row 2):
ID  Name    AdvisorID  Name    Office
1   Jack    A3         Correy  WH 5403
2   Jill    A3         Correy  DH 3142
3   Pat     A3         Correy  WH 5403
4   George  A5         Gary    PH 226C
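A short Python sketch of how the anomalous row arises, using a list of dictionaries as a stand-in for the single combined table (the key names `AdvisorName` and `Office` are chosen here for clarity): updating one copy of the advisor's office leaves the other copies stale.

```python
combined = [
    {"ID": 1, "Name": "Jack", "AdvisorID": "A3", "AdvisorName": "Correy", "Office": "WH 5403"},
    {"ID": 2, "Name": "Jill", "AdvisorID": "A3", "AdvisorName": "Correy", "Office": "WH 5403"},
    {"ID": 3, "Name": "Pat", "AdvisorID": "A3", "AdvisorName": "Correy", "Office": "WH 5403"},
    {"ID": 4, "Name": "George", "AdvisorID": "A5", "AdvisorName": "Gary", "Office": "PH 226C"},
]

# Correy's office changes, but only Jill's row is updated.
for row in combined:
    if row["Name"] == "Jill":
        row["Office"] = "DH 3142"

# The same advisor now has two different offices on record.
offices_for_a3 = {row["Office"] for row in combined if row["AdvisorID"] == "A3"}
print(offices_for_a3)
```

Nothing in the table says which of the two offices is correct, which is what makes the data anomalous rather than merely redundant.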
INSERTION Anomaly
ID  Name    AdvisorID  Name    Office
1   Jack    A3         Correy  WH 5403
2   Jill    A3         Correy  WH 5403
3   Pat     A3         Correy  WH 5403
4   George  A5         Gary    PH 226C
            A10        Logan   HBH 2102   <- What to put in here as PK?
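The insertion problem can be reproduced with Python's built-in sqlite3 module (a stand-in for Oracle). The table name `StudentAdvisor` and its column types are assumptions for illustration; the student ID is declared as a non-null primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One combined table; the student ID is the primary key and may not be null.
conn.execute("""
    CREATE TABLE StudentAdvisor (
        ID          TEXT NOT NULL PRIMARY KEY,
        Name        TEXT,
        AdvisorID   TEXT,
        AdvisorName TEXT,
        Office      TEXT
    )
""")
conn.execute("INSERT INTO StudentAdvisor VALUES ('4', 'George', 'A5', 'Gary', 'PH 226C')")

try:
    # Logan has no students yet, so there is no student ID to use as the PK.
    conn.execute("INSERT INTO StudentAdvisor VALUES (NULL, NULL, 'A10', 'Logan', 'HBH 2102')")
    inserted = True
except sqlite3.IntegrityError:
    inserted = False

advisors = {r[0] for r in conn.execute("SELECT AdvisorName FROM StudentAdvisor")}
print(inserted, advisors)
```

The new advisor cannot be recorded until at least one student is assigned to them, even though the advisor's own data is complete.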
DELETION Anomaly
ID  Name    AdvisorID  Name    Office
1   Jack    A3         Correy  WH 5403
2   Jill    A3         Correy  WH 5403
3   Pat     A3         Correy  WH 5403
4   George  A5         Gary    PH 226C    <- if this row is deleted, information about Gary is lost!
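The loss can be seen directly by deleting George's row from a Python copy of the combined table. The tuple layout below is an assumption: (ID, student name, advisor ID, advisor name, office).

```python
combined = [
    (1, "Jack", "A3", "Correy", "WH 5403"),
    (2, "Jill", "A3", "Correy", "WH 5403"),
    (3, "Pat", "A3", "Correy", "WH 5403"),
    (4, "George", "A5", "Gary", "PH 226C"),
]

advisors_before = {row[3] for row in combined}

# George leaves, so his row is deleted.
after_delete = [row for row in combined if row[1] != "George"]
advisors_after = {row[3] for row in after_delete}

# Advisors who were on record before the delete but are not afterwards.
lost = advisors_before - advisors_after
print(lost)
```

Gary's name and office were stored only in George's row, so removing one student silently removed an advisor as well.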
Solution: Split the Table
Student                                   Advisor
ID  Name    AdvisorID (Foreign Key)       AID (Primary Key)  Name    Office
1   Jack    A3                            A3                 Correy  WH 5403
2   Jill    A3                            A5                 Gary    PH 226C
3   Pat     A3
4   George  A5

This foreign key and primary key match represents the logical relationship between the two entities.
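A sketch of the split design using Python's built-in sqlite3 module (a stand-in for Oracle; the column types are assumptions). It replays the three problem cases: an office update, a new advisor with no students, and the deletion of George.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Advisor (
        AID    TEXT PRIMARY KEY,
        Name   TEXT,
        Office TEXT
    );
    CREATE TABLE Student (
        ID        INTEGER PRIMARY KEY,
        Name      TEXT,
        AdvisorID TEXT REFERENCES Advisor(AID)
    );
    INSERT INTO Advisor VALUES ('A3', 'Correy', 'WH 5403'), ('A5', 'Gary', 'PH 226C');
    INSERT INTO Student VALUES (1, 'Jack', 'A3'), (2, 'Jill', 'A3'),
                               (3, 'Pat', 'A3'), (4, 'George', 'A5');
""")

# Update: the office is stored once, so one UPDATE changes it for every advisee.
conn.execute("UPDATE Advisor SET Office = 'DH 3142' WHERE AID = 'A3'")
offices = conn.execute("""
    SELECT DISTINCT a.Office
    FROM Student s JOIN Advisor a ON s.AdvisorID = a.AID
    WHERE a.AID = 'A3'
""").fetchall()

# Insert: an advisor can be added before any student is assigned.
conn.execute("INSERT INTO Advisor VALUES ('A10', 'Logan', 'HBH 2102')")

# Delete: removing George does not remove Gary.
conn.execute("DELETE FROM Student WHERE ID = 4")
gary = conn.execute("SELECT Name, Office FROM Advisor WHERE AID = 'A5'").fetchone()
advisor_count = conn.execute("SELECT COUNT(*) FROM Advisor").fetchone()[0]
print(offices, gary, advisor_count)
```

The join on `s.AdvisorID = a.AID` is how the FK-PK match is used at query time to put the two tables back together.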
Logical Relationships

o Example – at the travel agency a logical relationship “assigned” exists between the entities Customer and Salesperson
▪ A customer is assigned one salesperson.
▪ Each salesperson is assigned many customers.

o Logical relationships that exist between the entities are important for the
organization to capture along with the data itself.
Integrity Constraints
o Null
• A special value representing “unknown” or “blank” or “absence of a value”
• Not the same as zero numeric value or text strings consisting of spaces

o Restrictions that ensure data accuracy


• Attribute Integrity (a.k.a. domain constraint): all values of an attribute must
come from its underlying domain
• Other general constraints: enterprise-specific rules
• Entity Integrity: next slide…
• Referential Integrity: next slide…
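The point that Null is not zero and not a string of spaces can be checked with a one-line query. This sketch uses Python's built-in sqlite3 module, so it shows SQLite's behavior; other DBMSs, including Oracle, may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Compare NULL with itself, then test whether 0 and a string of spaces count as NULL.
null_equals_null, null_is_null, zero_is_null, spaces_is_null = conn.execute(
    "SELECT NULL = NULL, NULL IS NULL, 0 IS NULL, '   ' IS NULL"
).fetchone()

print(null_equals_null, null_is_null, zero_is_null, spaces_is_null)
```

`NULL = NULL` evaluates to unknown (returned to Python as `None`), not true, which is why SQL provides `IS NULL` for testing absence of a value.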
Entity Integrity & Referential Integrity
o Entity Integrity: Each row must have a unique, non-null primary key value
• Enforced using Primary Keys
• Avoids duplicate records

o Referential Integrity: If a foreign key exists in a relation, either the
foreign key value must match a primary key value of some tuple in its
home relation or the foreign key value must be null.
• Enforced using PK-FK pairs
• Avoids dangling references
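Both rules can be seen in action with Python's built-in sqlite3 module, used here as a stand-in for Oracle. The Course and Student tables and the helper `accepted` are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Course (cid INTEGER PRIMARY KEY, title TEXT)")
conn.execute("""
    CREATE TABLE Student (
        name TEXT PRIMARY KEY,
        cid  INTEGER REFERENCES Course(cid)
    )
""")
conn.execute("INSERT INTO Course VALUES (67262, 'Stats')")


def accepted(sql):
    """Run one statement; return False if an integrity constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Entity integrity: a second row with the same primary key is rejected.
duplicate_pk = accepted("INSERT INTO Course VALUES (67262, 'Python')")

# Referential integrity: the FK must match an existing PK value or be null.
valid_fk = accepted("INSERT INTO Student VALUES ('A', 67262)")
dangling_fk = accepted("INSERT INTO Student VALUES ('B', 99999)")
null_fk = accepted("INSERT INTO Student VALUES ('C', NULL)")

print(duplicate_pk, valid_fk, dangling_fk, null_fk)
```

The null foreign key is accepted: it records that student C has no course on file, which is different from pointing at a course that does not exist.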
Referential Integrity Example
Which of the following operations could cause a referential integrity violation, i.e. dangling
references (assume the constraints are somehow not properly enforced in DB)?

Student             Course
     FK (cid)       PK (cid)
A    67262          67262     Stats
B    67112          67112     Python

Operations            Violation?
Insert into Student
Delete from Student
Update Student
Insert into Course
Delete from Course
Update Course
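One way to work through the question is to simulate each operation without enforcement and then look for dangling references. In the sketch below the two tables are plain dictionaries, and the concrete values used for each operation (such as cid 99999) are hypothetical, picked to provoke a violation wherever one is possible.

```python
course = {67262: "Stats", 67112: "Python"}   # PK cid -> title
student = {"A": 67262, "B": 67112}           # student -> FK cid


def dangling(students, courses):
    """Students whose non-null FK matches no PK in Course."""
    return sorted(
        name for name, cid in students.items()
        if cid is not None and cid not in courses
    )


def causes_violation(operation):
    """Apply one operation to copies of both tables and check for dangling FKs."""
    s, c = dict(student), dict(course)
    operation(s, c)
    return bool(dangling(s, c))


results = {
    "Insert into Student": causes_violation(lambda s, c: s.update({"C": 99999})),
    "Delete from Student": causes_violation(lambda s, c: s.pop("A")),
    "Update Student": causes_violation(lambda s, c: s.update({"A": 99999})),
    "Insert into Course": causes_violation(lambda s, c: c.update({67999: "SQL"})),
    "Delete from Course": causes_violation(lambda s, c: c.pop(67262)),
    "Update Course": causes_violation(lambda s, c: c.update({67000: c.pop(67262)})),
}

for operation, violation in results.items():
    print(operation, violation)
```

The pattern is that a violation can only come from adding or changing a foreign key value, or from removing or changing a primary key value that something still refers to.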
Acknowledgements
o Some of the lecture notes for this class feature content borrowed with or without modification from
the following sources:

o 95-703 Lecture Recordings (Prof. Janusz Szczypula)


o 90-728 Lecture Notes (Prof. Karyn Moore, Xiaoying Tu)
o 67-262 Lecture Notes (Prof. Raja Sooriamurthi)
o Casteel, J., “Oracle 12c: SQL,” Cengage Learning, 2016
o Connolly, T. and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and
Management,” 6th edition, Addison-Wesley, 2015
o Coronel, C. and S. Morris, “Database Systems: Design, Implementation, & Management,” 12th edition,
Cengage Learning, 2017
o Hoffer, J. A., R. Venkataraman, and Heikki Topi, “Modern Database Management,” 11th edition, Prentice
Hall, 2012
o Price, J., “Oracle Database 12c: SQL,” McGraw Hill, 2014
o McClave, Benson, and Sincich, “Statistics for Business and Economics,” 13th edition
