Introduction to Database 1

Chapter 1
A DBMS (database management system) is specialized software for efficiently managing large amounts of
mostly structured data. A DBMS provides
• a data model, which specifies the logical structure of the data, called the schema
• a high-level query language
• an efficient persistent storage system
• transaction management, where a transaction is an atomic unit of work

Data Model
...is a collection of conceptual tools for describing
• data
• relationships among various data
• data semantics
• consistency constraints
It provides users with an abstract view of the data. There are data models like
• relational model
• entity-relationship data model, which is mainly for database design
• object-based data models
• semi-structured data model
et cetera

Relational Model

Instances and Schemas


A schema is a description of the structure of the database.
• logical schema: the logical structure of the database
    conceptual schema, e.g., ER model, UML
    implementation schema, e.g., relational model
• physical schema, e.g., storage id, format, index
An instance is the actual content of the database at a particular point in time.

Levels of Abstraction
Physical, logical, and view level

Data Independence
ability to make physical or logical level changes without affecting application programs

DDL: Data Definition Language


...provides facilities to define the relation schema, e.g.
create table instructor (
    ID char(5),
    name varchar(20),
    dept_name varchar(20),
    salary numeric(8,2)
)
It provides facilities to specify integrity constraints
• primary key, unique, not null, foreign key constraints
• domain constraints
It provides facilities for authorization
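To see the integrity constraints do something, here is a minimal sketch that runs the instructor DDL through Python's built-in sqlite3 module. The choice of SQLite, the sample rows, the `try_insert` helper, and the `CHECK (salary > 0)` domain constraint are all assumptions made for illustration; other systems report violations differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instructor (
        ID        CHAR(5) PRIMARY KEY,
        name      VARCHAR(20) NOT NULL,
        dept_name VARCHAR(20),
        salary    NUMERIC(8,2) CHECK (salary > 0)  -- hypothetical domain constraint
    )
""")
conn.execute("INSERT INTO instructor VALUES ('10101', 'Kim', 'Music', 65000)")

def try_insert(row):
    """Return None on success, or the constraint error message on failure."""
    try:
        conn.execute("INSERT INTO instructor VALUES (?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

duplicate_key = try_insert(("10101", "Lee", "Physics", 90000))  # primary key violated
missing_name = try_insert(("12121", None, "Physics", 90000))    # NOT NULL violated
bad_salary = try_insert(("15151", "Park", "Music", -1))         # CHECK violated
row_count = conn.execute("SELECT COUNT(*) FROM instructor").fetchone()[0]
```

All three inserts are rejected, so only the first row remains in the table.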

DML: Data Manipulation Language


...is the "query language"! There are procedural and declarative DMLs. SQL (Structured Query Language)
is a declarative DML. On its own it is not Turing-machine equivalent, so it is usually embedded in some host language.
......

Chapter 2
Terms
• A relation schema R(A1, ..., An) gives the name of a relation and its attributes, like the header of a table.
    logical design of a database
• A relation instance r(R) defined over schema R is a set of rows
    a snapshot of the data in the database at a given instant in time
• A tuple is an element of a relation, aka a row in a table
• A binary relation over sets X and Y is a subset of the Cartesian product X × Y
• A relation is a set of tuples

Keys
• A key K is a subset of the attributes: K ⊆ R = {A1, ..., An}
• K is a superkey if the values for K uniquely identify a tuple in every possible relation instance r(R)
• A candidate key is a minimal superkey, i.e., no proper subset of it is a superkey
• One of the candidate keys is designated to be the primary key
• A foreign key refers to another relation's primary key.

Referential integrity constraint: a value of an attribute in a relation must also appear as a value of an
attribute in another relation.
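The key definitions can be tried out on a concrete instance. The sketch below is an illustration only: the helper names and the three sample rows are made up. Note the caveat that a superkey must hold for every possible instance, so a check against one instance can refute a key but never prove it.

```python
from itertools import combinations

def is_superkey_on(rows, attrs):
    """True if no two rows of this one instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys_on(rows, schema):
    """Minimal attribute sets that are unique on this one instance."""
    found = []
    for size in range(1, len(schema) + 1):
        for attrs in combinations(schema, size):
            if any(set(k) <= set(attrs) for k in found):
                continue  # contains a smaller key already, so it is not minimal
            if is_superkey_on(rows, attrs):
                found.append(attrs)
    return found

schema = ("ID", "name", "dept_name")
instructor = [
    {"ID": "1", "name": "Kim", "dept_name": "Physics"},
    {"ID": "2", "name": "Kim", "dept_name": "Music"},
    {"ID": "3", "name": "Lee", "dept_name": "Music"},
]
keys = candidate_keys_on(instructor, schema)
```

On this instance the result is `("ID",)` and `("name", "dept_name")`. The second one is an accident of the sample data: add another Kim in Music and it stops being unique, which is why keys are declared from the meaning of the data, not discovered from one snapshot.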

Relational Algebra
Basic Operators
• selection σ
    σ_p(r), where p is the selection predicate
    the relation consisting of the rows of r that satisfy the predicate
• projection Π
    Π_{A1,...,Ak}(r), where the Ai are attribute names
    the relation of the k listed columns
    since relations are sets, duplicate rows are removed
• cartesian product ×
    r × s holds one tuple for each possible pair of a tuple from r and a tuple from s
    If an attribute of the same name exists in both relations, we need to distinguish them.
    theta join: r ⋈_θ s = σ_θ(r × s)
• union ∪, difference −
    relations must have the same arity, i.e., same number of attributes
    ...and the attribute domains must be compatible
• rename ρ
    ρ_x(E), where E is a relational algebra expression
    The result of E, by default, does not have a name to refer to it by.
    Can also be used as ρ_{x(A1,...,Ak)}(E) to rename the attributes as well

These are fundamental operators of relational algebra, which cannot be written in terms of others.
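A small Python sketch of the basic operators, assuming a relation is modelled as a set of tuples and each tuple as a frozenset of (attribute, value) pairs. All function names and sample data are hypothetical, and rename is simplified to prefixing every attribute with the relation name x. Union and difference need no code: they are Python's `|` and `-` on sets.

```python
def relation(*rows):
    """Build a relation from dict rows; a set, so duplicates vanish."""
    return {frozenset(row.items()) for row in rows}

def select(r, pred):
    """sigma_p(r): keep the tuples that satisfy the predicate."""
    return {t for t in r if pred(dict(t))}

def project(r, attrs):
    """Pi_attrs(r): keep the listed attributes; duplicates collapse."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in r}

def rename(r, x):
    """rho_x(r), simplified: prefix each attribute with x."""
    return {frozenset((f"{x}.{a}", v) for a, v in t) for t in r}

def product(r, s):
    """r x s: every pairing; assumes attribute names are disjoint."""
    return {t | u for t in r for u in s}

def theta_join(r, s, pred):
    """Theta join defined as selection over the product."""
    return select(product(r, s), pred)

instructor = relation({"ID": 1, "name": "Kim"}, {"ID": 2, "name": "Lee"})
teaches = relation({"ID": 1, "course": "DB"}, {"ID": 1, "course": "OS"})

# Both relations have an ID attribute, so rename first to tell them apart.
i, t = rename(instructor, "i"), rename(teaches, "t")
pairs = product(i, t)
joined = theta_join(i, t, lambda row: row["i.ID"] == row["t.ID"])
names = project(joined, ["i.name"])
```

The product has 4 tuples, the join keeps 2, and the projection returns a single tuple for Kim because the two joined rows collapse into one.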

Extended Operators
• duplicate-elimination δ
• extended projection π_{B+C→X}
• aggregation AVG, SUM, etc
• grouping γ
    general form: grouping_attributes γ aggregation (r)
    e.g. dept_name γ avg(salary) as avg_salary (instructor)
    ...but do we need to project this: Π_{dept_name,avg_salary}()?
• sorting τ
• outer join ⟕ / ⟖ / ⟗ (left / right / full)
• assignment ←
    A query can be written as a sequential program consisting of a series of assignments.
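The grouping example, dept_name γ avg(salary) as avg_salary (instructor), can be sketched in plain Python. The function name and the sample rows are assumptions. Under the usual textbook definition, the result of γ holds only the grouping attributes and the aggregated values, which is what this sketch produces, so the extra projection asked about above should not be needed.

```python
from collections import defaultdict

def group_avg(rows, group_attr, value_attr, out_attr):
    """Group rows by group_attr and average value_attr within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_attr]].append(row[value_attr])
    return [
        {group_attr: g, out_attr: sum(vals) / len(vals)}
        for g, vals in sorted(groups.items())
    ]

instructor = [
    {"name": "Kim", "dept_name": "Music", "salary": 40000},
    {"name": "Lee", "dept_name": "Physics", "salary": 90000},
    {"name": "Park", "dept_name": "Physics", "salary": 70000},
]
result = group_avg(instructor, "dept_name", "salary", "avg_salary")
```

Each output row has exactly two attributes, dept_name and avg_salary; the name attribute is gone because it is neither grouped on nor aggregated.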

Chapter 3
DDL in SQL
We have data types like:
• char(n): fixed-length string
• varchar(n): string with maximum length n
• int: integer, machine-dependent
• smallint: small integer, machine-dependent
• numeric(p, d): aka decimal, fixed point number with p digits in total, d of them after the decimal point
    e.g. numeric(3,2) can store 3.24
• real, double precision: floating point, machine-dependent
We create tables like:
CREATE TABLE table_name (
attr0_name attr0_type,
attr1_name attr1_type NOT NULL,
...
attrn_name attrn_type,
PRIMARY KEY (attrp0_name, ..., attrpm_name),
FOREIGN KEY (attrf0_name, ..., attrfk_name) REFERENCES other_table
)
We alter tables like:
ALTER TABLE r
ADD attr_name attr_type
or:
ALTER TABLE r
DROP attr_name
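A quick way to watch CREATE TABLE and ALTER TABLE take effect is to run them in SQLite from Python. This is a sketch under assumptions: the student and department tables are made up, and `PRAGMA table_info` is SQLite-specific, not standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name VARCHAR(20) PRIMARY KEY)")
conn.execute("""
    CREATE TABLE student (
        ID        CHAR(5),
        name      VARCHAR(20) NOT NULL,
        dept_name VARCHAR(20),
        PRIMARY KEY (ID),
        FOREIGN KEY (dept_name) REFERENCES department
    )
""")
conn.execute("INSERT INTO student VALUES ('00001', 'Kim', NULL)")

def columns(table):
    # Second field of each PRAGMA table_info row is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

before = columns("student")
conn.execute("ALTER TABLE student ADD gpa NUMERIC(3,2)")
after = columns("student")

# Rows that existed before the ALTER get NULL for the new attribute.
gpa_of_existing_row = conn.execute("SELECT gpa FROM student").fetchone()[0]
```

The column list grows by one, and the pre-existing row reads back `None` (NULL) for gpa.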
We drop tables like:
... ROBERT');
DROP TABLE students; --
I bet you are already familiar with SELECT FROM WHERE. Remember to use the DISTINCT or ALL keyword
to explicitly eliminate or keep duplicate rows.

String Operations
We have the LIKE operator, with % matching zero or more of any characters and _ matching any single
character, plus a specifiable escape character via the ESCAPE clause. MySQL is not case sensitive by default, even for LIKE.
We can also concatenate with the || operator, convert case with the UPPER() and LOWER() functions, get the
length or extract a substring with functions such as LEN() and SUBSTRING() (the exact names vary by system), etc.
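The string operations can be tried in SQLite via Python. Treat this as a sketch: SQLite spells the length and substring functions LENGTH() and SUBSTR(), its LIKE returns 1 or 0, and by default it is also case-insensitive for ASCII letters. The `scalar` helper and the literals are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql):
    """Run a one-row, one-column query and return its single value."""
    return conn.execute(sql).fetchone()[0]

starts_with = scalar("SELECT 'database' LIKE 'data%'")            # % matches 'base'
one_char = scalar("SELECT 'cat' LIKE 'c_t'")                      # _ matches one character
literal_pct = scalar(r"SELECT '100%' LIKE '100\%' ESCAPE '\'")    # escaped % is literal
not_wildcard = scalar(r"SELECT '1000' LIKE '100\%' ESCAPE '\'")   # so '1000' does not match
ignores_case = scalar("SELECT 'DATA' LIKE 'data'")                # SQLite default for ASCII

joined = scalar("SELECT 'data' || 'base'")
shout = scalar("SELECT UPPER('sql')")
length = scalar("SELECT LENGTH('database')")
middle = scalar("SELECT SUBSTR('database', 5, 4)")                # positions start at 1
```

The ESCAPE pair is the interesting one: with the backslash declared as escape character, `100\%` only matches a string that really ends in a percent sign.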

Clauses
WHERE is a clause that filters individual tuples.
HAVING applies to each group, while WHERE applies to each tuple before groups are formed.
ORDER BY orders the tuples of the result by one or more attributes. You can specify ASC (default)
or DESC per attribute, like:
ORDER BY dept_name ASC, gpa DESC
GROUP BY is a clause to be used with aggregate functions. The result can only contain aggregate values
and/or the grouping attribute(s).
WITH is a clause to define a temporary relation, called a common table expression (CTE), that is
available only to the query in which it appears.
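One query can exercise all of these clauses together. The sketch below assumes a made-up student table with name, dept_name, and gpa, run in SQLite through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, dept_name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("Ann", "Music", 3.75), ("Bob", "Music", 1.5),
    ("Cho", "Physics", 3.5), ("Dan", "Physics", 3.75),
    ("Eve", "History", 3.0), ("Fay", "History", 2.0),
])

rows = conn.execute("""
    WITH passing AS (              -- CTE: exists only for this query
        SELECT * FROM student
        WHERE gpa >= 2.0           -- WHERE drops tuples before grouping
    )
    SELECT dept_name, COUNT(*) AS n, AVG(gpa) AS avg_gpa
    FROM passing
    GROUP BY dept_name
    HAVING COUNT(*) >= 2           -- HAVING drops whole groups
    ORDER BY avg_gpa DESC, dept_name ASC
""").fetchall()
```

WHERE removes Bob first, which leaves Music with a single student, and HAVING then removes the Music group as a whole. Physics and History survive, ordered by average GPA.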

Clause Predicates
BETWEEN operator like:
WHERE gpa BETWEEN 2.7 AND 3.7
Row constructor, like:
SELECT name, course_id
FROM instructor, teaches
WHERE (instructor.ID, dept_name) = (teaches.ID, 'History')
...honestly not seeing much point here.
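As for the point of the row constructor: it is shorthand for a conjunction of equalities, nothing more. A sketch in SQLite (which accepts row values in reasonably recent versions) with made-up sample rows shows both forms returning the same result, and that BETWEEN includes its endpoints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT);
    CREATE TABLE teaches (ID TEXT, course_id TEXT);
    INSERT INTO instructor VALUES ('1', 'Kim', 'History'), ('2', 'Lee', 'Music');
    INSERT INTO teaches VALUES ('1', 'HIS-101'), ('2', 'MU-199');
""")

row_form = conn.execute("""
    SELECT name, course_id FROM instructor, teaches
    WHERE (instructor.ID, dept_name) = (teaches.ID, 'History')
""").fetchall()

and_form = conn.execute("""
    SELECT name, course_id FROM instructor, teaches
    WHERE instructor.ID = teaches.ID AND dept_name = 'History'
""").fetchall()

# BETWEEN is inclusive on both ends: 3.7 counts as "between 2.7 and 3.7".
upper_bound_included = conn.execute("SELECT 3.7 BETWEEN 2.7 AND 3.7").fetchone()[0]
```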

Set Operations
UNION, INTERSECT, and EXCEPT (set difference) eliminate duplicates by default; add the ALL
keyword to keep them.
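A sketch of the set operations in SQLite via Python, using two made-up single-column tables. One limitation to keep in mind: SQLite accepts ALL only with UNION, so INTERSECT ALL and EXCEPT ALL are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fall (course_id TEXT);
    CREATE TABLE spring (course_id TEXT);
    INSERT INTO fall VALUES ('DB'), ('OS'), ('OS');
    INSERT INTO spring VALUES ('OS'), ('AI');
""")

def ids(sql):
    """Run a one-column query and return its values sorted."""
    return sorted(row[0] for row in conn.execute(sql))

union = ids("SELECT course_id FROM fall UNION SELECT course_id FROM spring")
union_all = ids("SELECT course_id FROM fall UNION ALL SELECT course_id FROM spring")
intersect = ids("SELECT course_id FROM fall INTERSECT SELECT course_id FROM spring")
difference = ids("SELECT course_id FROM fall EXCEPT SELECT course_id FROM spring")
```

UNION yields three distinct courses, while UNION ALL keeps all five input rows, including 'OS' three times.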

Subqueries
In the following SQL query:
SELECT a1, a2, ..., an
FROM r1, r2, ..., rm
WHERE p
ai can be replaced by a subquery that generates a single value, aka a scalar subquery
ri can be replaced by any valid subquery
p can be replaced with an expression of the form "attribute <operation> (subquery)"
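The three substitution points can each be shown with one query. This is a sketch in SQLite via Python; the instructor rows and the `dept_avg` alias are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER);
    INSERT INTO instructor VALUES
        ('Kim', 'Music', 40000), ('Lee', 'Physics', 90000), ('Park', 'Physics', 80000);
""")

# In place of a_i: a scalar subquery yields one value per output row.
in_select = conn.execute("""
    SELECT name, salary - (SELECT AVG(salary) FROM instructor) AS diff
    FROM instructor ORDER BY name
""").fetchall()

# In place of r_i: a subquery used as a derived relation in FROM.
in_from = conn.execute("""
    SELECT dept_name, avg_salary
    FROM (SELECT dept_name, AVG(salary) AS avg_salary
          FROM instructor GROUP BY dept_name) AS dept_avg
    WHERE avg_salary > 50000
""").fetchall()

# Inside p: attribute <operation> (subquery).
in_where = conn.execute("""
    SELECT name FROM instructor
    WHERE salary > (SELECT AVG(salary) FROM instructor)
    ORDER BY name
""").fetchall()
```

The overall average here is 70000, so Lee and Park are above it and only Physics clears the 50000 threshold on department averages.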

Set Comparison Operator


• SOME/ALL: the comparison holds true for at least one/every row of the subquery
• EXISTS: the subquery is not empty
• UNIQUE: the subquery contains no duplicate rows
    ...but is there even an SQL variant actually supporting this?
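The meaning of these operators is easy to pin down in plain Python, treating a subquery result as a list of values. The function names are hypothetical, and the sketch ignores NULLs, which in real SQL bring three-valued logic into play.

```python
import operator

def some(value, op, subquery_rows):
    """value <op> SOME (subquery): holds for at least one row."""
    return any(op(value, row) for row in subquery_rows)

def all_(value, op, subquery_rows):
    """value <op> ALL (subquery): holds for every row."""
    return all(op(value, row) for row in subquery_rows)

def exists(subquery_rows):
    """EXISTS (subquery): the result is not empty."""
    return len(subquery_rows) > 0

def unique(subquery_rows):
    """UNIQUE (subquery): the result has no duplicate rows."""
    return len(subquery_rows) == len(set(subquery_rows))

physics_salaries = [90000, 80000]
beats_someone = some(85000, operator.gt, physics_salaries)
beats_everyone = all_(85000, operator.gt, physics_salaries)

# Edge case: over an empty subquery, ALL is true and SOME is false.
all_over_empty = all_(1, operator.gt, [])
some_over_empty = some(1, operator.gt, [])
```

The empty-subquery case is the one that tends to surprise: "greater than all of nothing" is true.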

Modification of the Database


INSERT, DELETE, and UPDATE, like...

INSERT INTO takes (attribute, names, optionally)
VALUES (attribute, values, in_order);

INSERT INTO takes
SELECT studentID, courseID, 'Spring'
FROM student, course
WHERE ...;

UPDATE instructor
SET salary = salary * 1.05
WHERE ...;

UPDATE instructor
SET salary = CASE
WHEN salary <= 100000 THEN salary * 1.05
ELSE salary * 1.03
END;

DELETE FROM instructor; -- no clause: delete everyone!
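Here is a runnable sketch of the same statements in SQLite via Python. The archive table and the two sample rows are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (name TEXT, salary REAL);
    CREATE TABLE archive (name TEXT, salary REAL);
    INSERT INTO instructor (name, salary) VALUES ('Kim', 100000), ('Lee', 200000);

    -- Insert the result of a query.
    INSERT INTO archive SELECT name, salary FROM instructor;

    -- One pass with CASE: each row gets exactly one raise.
    UPDATE instructor
    SET salary = CASE
        WHEN salary <= 100000 THEN salary * 1.05
        ELSE salary * 1.03
    END;
""")

raised = dict(conn.execute("SELECT name, salary FROM instructor"))

conn.execute("DELETE FROM instructor WHERE salary > 200000")  # only matching rows
remaining = [row[0] for row in conn.execute("SELECT name FROM instructor")]

conn.execute("DELETE FROM instructor")                        # no WHERE: every row
left_after_delete = conn.execute("SELECT COUNT(*) FROM instructor").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM archive").fetchone()[0]
```

The CASE form matters because two separate UPDATE statements would depend on their order: a row raised over the 100000 boundary by the first one could be raised again by the second.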

Chapter 4
Join
Joins are either inner or left/right/full outer, depending on how unmatched (dangling) tuples are
treated.
The join condition can be given with ON predicate, NATURAL, or USING (attrs).

student NATURAL JOIN takes; -- joins on all attributes common to both, here studentID

instructor JOIN teaches USING(courseID); -- every instructor who ever opened the course

That was inner join, the default way. Outer join, in addition to the result of inner join, keeps tuples
that have no match. There are LEFT/RIGHT/FULL OUTER JOIN to keep unmatched tuples from
left/right/both operand relations.
There is JOIN ... ON predicate, which is about the same as WHERE.
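A sketch comparing the join forms in SQLite via Python, with a made-up student who has no matching row in takes. Only LEFT OUTER JOIN is shown because older SQLite versions do not support RIGHT and FULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (studentID TEXT, name TEXT);
    CREATE TABLE takes (studentID TEXT, courseID TEXT);
    INSERT INTO student VALUES ('s1', 'Ann'), ('s2', 'Bob');
    INSERT INTO takes VALUES ('s1', 'DB');
""")

def q(sql):
    return conn.execute(sql).fetchall()

natural = q("SELECT name, courseID FROM student NATURAL JOIN takes")
using = q("SELECT name, courseID FROM student JOIN takes USING (studentID)")
on = q("""SELECT name, courseID FROM student
          JOIN takes ON student.studentID = takes.studentID""")

# The outer join keeps Bob, the dangling tuple, padded with NULL.
left = q("""SELECT name, courseID FROM student
            LEFT OUTER JOIN takes USING (studentID) ORDER BY name""")
```

The three inner forms agree and drop Bob; the left outer join returns him with `None` for courseID.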

View
A view is a virtual relation, i.e., only its defining query is stored, not its result.
CREATE VIEW viewname AS (query);

But can we modify a view to modify the stored relation(s)? Most SQL implementations allow updates only
on simple views, where
• FROM: only one relation
• SELECT: only attribute names of the relation
    no expressions, aggregates, or DISTINCT specification
    must contain every not-null attribute
• no GROUP BY or HAVING clause

Then there is the materialized view, a view that is physically stored. How do we keep it up-to-date?
• Periodic reconstruction: unacceptable for applications requiring up-to-date data.
• Incremental maintenance: only recompute the parts that are affected by changes to the underlying base tables.
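The difference between a virtual view and a stored copy shows up as soon as the base table changes. SQLite has no materialized views, so in this sketch an ordinary table filled by a query stands in for one; the table names and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER);
    INSERT INTO instructor VALUES ('Kim', 'Music', 40000), ('Lee', 'Physics', 90000);

    -- Virtual: only the query is stored.
    CREATE VIEW faculty AS SELECT name, dept_name FROM instructor;

    -- Stand-in for a materialized view: the result is stored.
    CREATE TABLE faculty_snapshot AS SELECT name, dept_name FROM instructor;

    INSERT INTO instructor VALUES ('Park', 'Physics', 80000);
""")

def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

view_rows = count("faculty")                # re-evaluated on every use
stale_rows = count("faculty_snapshot")      # still the old contents

# Reconstruction from scratch brings the stored copy up to date.
conn.executescript("""
    DELETE FROM faculty_snapshot;
    INSERT INTO faculty_snapshot SELECT name, dept_name FROM instructor;
""")
fresh_rows = count("faculty_snapshot")
```

The view sees the new row immediately, while the stored copy stays stale until it is rebuilt. Incremental maintenance would instead add only the row for Park.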

Transaction
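• A: atomicity, all of a transaction's changes take effect or none do
• C: consistency, a transaction takes the database from one consistent state to another
• I: isolation, concurrent transactions do not see each other's intermediate results
• D: durability, committed changes survive failures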
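Atomicity is the easiest of the four to demonstrate. The sketch below uses a hypothetical account table in SQLite; a CHECK constraint makes the second update of a transfer fail, and the connection's context manager (which commits on success and rolls back on an exception) undoes the first update as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        name    TEXT PRIMARY KEY,
        balance INTEGER CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount from src to dst as one unit; True if it committed."""
    try:
        with conn:  # commit on success, rollback if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

first = transfer("A", "B", 60)    # fits: both updates commit
second = transfer("A", "B", 60)   # A would go negative: the credit to B is undone too
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed second transfer, B does not keep the extra 60 even though its update ran first.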

Integrity Constraints
On a single relation: NOT NULL, PRIMARY KEY, UNIQUE (declares a superkey), CHECK (predicate)
Referential integrity: e.g., if a row in the instructor table has value "Biology" for attribute deptName,
then there must exist a row in the department table whose deptName is "Biology".
Foreign key constraint: every value in R.A must appear in the primary key of S; A is called a foreign key.
NULL is allowed unless the attribute is declared NOT NULL.
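The Biology example can be checked directly. This is a sketch in SQLite via Python; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and that the `insert_ok` helper and the sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE department (deptName TEXT PRIMARY KEY);
    CREATE TABLE instructor (
        name     TEXT NOT NULL,
        deptName TEXT,
        FOREIGN KEY (deptName) REFERENCES department (deptName)
    );
    INSERT INTO department VALUES ('Biology');
""")

def insert_ok(name, dept):
    """Try to insert an instructor; False if a constraint rejects the row."""
    try:
        conn.execute("INSERT INTO instructor VALUES (?, ?)", (name, dept))
        return True
    except sqlite3.IntegrityError:
        return False

matching = insert_ok("Kim", "Biology")   # the referenced row exists
null_fk = insert_ok("Lee", None)         # NULL passes, deptName is not NOT NULL
dangling = insert_ok("Park", "Alchemy")  # no such department: rejected
```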
