
Record: a collection of related data elements, typically in a fixed number and sequence (usually the rows)

Data Element: a defined unit of data (usually the columns)
• STUDENT stores data on each student
• COURSE stores data on each course
• SECTION stores data on each section of a course
• GRADE_REPORT stores the grades that students receive in the various sections
• PREREQUISITE stores the prerequisites of each course

What can we do with a database?
• Query: a request for data from a database, which can cause some data to be retrieved.
• Transaction: an executing program or process that includes database accesses, such as reading or updating database records.

In short, we can manipulate the data through all the different forms of querying and updating.

Database Management System

A DBMS is a general-purpose software system that facilitates the process of defining, constructing, manipulating, and sharing databases among various users and applications.
A DBMS provides efficient, reliable, convenient, and safe multi-user storage of, and access to, data.

What a DBMS provides:
• Defining a database: specifying the data types, structures, and constraints of the data to be stored in the database. The DBMS stores the database definition in the database catalog; this stored definition is called meta-data.
• Constructing the database: storing the data on some storage medium that is controlled by the DBMS.
• Manipulating a database: functions such as querying the database to retrieve specific data and updating the database to reflect changes.
• Sharing a database: allowing multiple users and programs to access the database simultaneously.
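As a rough illustration of defining, constructing, manipulating, and reading back meta-data, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is only a convenient stand-in for a DBMS; the STUDENT columns and rows are made up for the example, and `sqlite_master` is SQLite's own catalog table.

```python
import sqlite3

# An in-memory SQLite database stands in for "the database".
conn = sqlite3.connect(":memory:")

# Defining: data types, structure, and constraints.
conn.execute("""
    CREATE TABLE STUDENT (
        Student_number INTEGER PRIMARY KEY,
        Name           VARCHAR(30) NOT NULL,
        Major          CHAR(4)
    )
""")

# Constructing: store data on storage controlled by the DBMS.
conn.executemany(
    "INSERT INTO STUDENT (Student_number, Name, Major) VALUES (?, ?, ?)",
    [(17, "Smith", "CS"), (8, "Brown", "CS")],
)

# Manipulating: update, then query.
conn.execute("UPDATE STUDENT SET Major = 'MATH' WHERE Student_number = 17")
rows = conn.execute(
    "SELECT Name, Major FROM STUDENT ORDER BY Student_number"
).fetchall()
print(rows)  # [('Brown', 'CS'), ('Smith', 'MATH')]

# Meta-data: the DBMS keeps the table definition in its catalog.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'STUDENT'"
).fetchall()
print(catalog)  # [('table', 'STUDENT')]
```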
Advantages of using a DBMS
• Redundancy control: one copy of the data is easier to manage than many copies
Database system
Database + DBMS = Database system
Properties of a DBMS
A database system contains not only the data itself but also a catalog with information on the file structures, the type and storage format of each data item, and the constraints on the data.
Types of DBMSs
DBMS can be classified based on:
• Number of users:
o Single-user systems: one user at a time
o Multi-user systems: many users at the same time
• Number of locations where the database is kept:
o Centralised: a single site
o Distributed: multiple locations
- Homogeneous: the same DBMS software at all sites
- Heterogeneous: different DBMS software at each site
• Data model on which the DBMS is based:
o Relational DBMS
o Hierarchical DBMS
o Network DBMS
o Object DBMS
o Object-relational DBMS
o XML DBMS

Data Model
• Abstraction: suppressing details of data organization and storage, and highlighting the essential
features for an improved understanding of data.
• Model: an abstraction of a real-world object or event
• Useful in understanding complexities of the real-world environment
Example: a blueprint vs. the actual product
• Data abstraction - different users can look at the data at their desired level of detail
• Data model - means to achieve data abstraction
• Basic operations - insert, delete, modify, retrieve
• Dynamic behaviour - the database designer can specify valid user-defined operations that are
allowed on the database objects.
• Data model: a collection of concepts that can be used to describe the structure of a database.
• Structure of a database: the data types, relationships, and constraints that apply to the data.
High-level (or conceptual) data models provide concepts that are close to the way
many users perceive data.

Example 1: Entity-Relationship model, with Entity (e.g. Star), Attribute (e.g. colour) and Relationship (e.g. Is_on_label)

Implementation (representational) data models provide concepts that fall between high-level and low-level data models and are used by commercial DBMS implementations.
Example 2: Relational, Network and Hierarchical data models

Low-level (or physical) data models provide concepts that describe the details of how data is stored on computer storage, such as disks.
Examples: record formats, record orderings, access paths, indexes, ...
Database Schema
Schema:
• database structure
• data types
• constraints on the database
Schema Diagram: An illustrative display of a database schema:

Schema Construct: a component (or an object) of the schema, such as STUDENT, COURSE.
Three-Schema Architecture
The schema can be defined at three levels:
External schema describes the various user views. Typically
uses the same data model as the conceptual schema.
Conceptual schema describes the structure and constraints
for the whole database.
Typically uses conceptual data model.

Internal schema describes physical storage structures and access paths.
Typically uses a physical data model.

Database State
• Empty State: when the database has no data.
This is when we define a new database, and specify its database schema to the DBMS.
• Initial State: when the database is first loaded with data.
Every time the database is updated, we get another database state, called the current state.
• Valid State: a state that satisfies the structure and constraints specified in the schema.
The DBMS refers to the schema whenever it needs to.
The database schema changes infrequently, whereas the database state changes every time the database is updated. The DBMS stores the descriptions of the schema constructs and constraints (called meta-data) in the DBMS catalog.
Database Design
Requirements Collection and Analysis:
• Data requirements: what kind of data is needed
o database designers interview users to understand their data requirements.
• Functional requirements: what operations will be performed

The main steps of the database design process:


Entity-Relationship (ER)

• The ER model is a popular high-level conceptual data model, frequently used for the conceptual design of database applications.
• ER diagrams: the diagrammatic notation associated with the ER model.
• The ER model describes data by focusing on the mini-world, not on the database or on storage details.
Main constructs in the ER model:
• Entity: represents a “thing” in the mini-world with an independent existence
Example: person, music, computer, address, employer, movie, product, student, book, etc.
• Relationships: associations among entities
• Attributes: a property of an entity, entity type or relationship type.
Example: name of a student, salary of an employee, color of a book, location of a
house, ...
Each attribute has a value set (or data type) such as integer, string, date, ...
Common Data Types
Name               Syntax          Description
Integer            Integer         32-bit signed integer values from -2^31 to 2^31 - 1
Double             Double          64-bit floating-point values of approximate precision
Numeric            Numeric(p,s)    Exact number with p digits in total, s of them after the decimal point
Character          Char(x)         Textual string of exactly x characters
Varying Character  Varchar(x)      Textual string of at most x characters
Date               Date            Stores year, month, and day
Time               Time            Stores hour, minute, and second values
• Entity Type: a collection (or set) of entities that have the same attributes.
• Attributes of an entity type are represented with ovals and are attached to their
entity type by straight lines.
• Entity Set: the collection of all entities of a particular entity type in the database at a
point in time. Usually referred to using the same name as the entity type
Example: EMPLOYEE refers to both entity type and the set of all employee entities
(entity set).

Types of Attributes:
(a) Simple or Composite
• Composite attribute can be composed of several components.
Example: FullName ( FirstName, LastName )
Example: a street address can be decomposed into street number, street name, and apartment number
• Simple (atomic) attribute cannot be divided into smaller parts.
Example: Social Security Number (SSN) or gender

(b) Single-valued or Multi-valued
• Single-valued attribute has a single value for a particular entity;
Example: Age is a single-valued attribute of a person.
• Multi-valued attribute can have a set of values for the same entity.
Example: Color of a CAR or Degrees of a STUDENT.
Denoted as { Color } or { Degrees }
• Multivalued attributes are displayed in double ovals.

Color = { White, Red }

• Multivalued attributes may have lower and upper bounds that constrain how many values can be assigned to the attribute for a single entity

(c) Stored or Derived
• Sometimes two (or more) attribute values are related.
Example: the Age and Birth_date attributes of a person.
• Age attribute is called a Derived attribute and is said to be derivable from the
Birth_date attribute, which is called a Stored attribute.
o Stored: date_of_birth = 29.12.1979
o Stored: current_date = 26.08.2021
o Derived: age = f(current_date, date_of_birth)
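A minimal sketch of the derivation f, assuming dates are represented with Python's `datetime.date` (the function name `age` is illustrative): only date_of_birth is stored, and age is recomputed from it and the current date whenever it is needed.

```python
from datetime import date


def age(date_of_birth: date, current_date: date) -> int:
    """Derived attribute: age = f(current_date, date_of_birth)."""
    years = current_date.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (current_date.month, current_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# The stored values from the example above (29.12.1979 and 26.08.2021).
print(age(date(1979, 12, 29), date(2021, 8, 26)))  # 41
```

Storing only the birth date avoids keeping an Age value that silently goes out of date.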
Key Attributes
• Key Attribute: one or more attributes of an entity type whose values are distinct for
each individual entity in the entity set.
• Key attributes are shown underlined inside the oval.

Example: an EMPLOYEE entity type can be identified with the key attribute of Ssn
(Social Security number).
Domain of Attribute
• Value Sets (Domains) of Attributes specify which values the attributes can take
Example 1: Gender takes values from { female, male }
Example 2: Age can only take integers from 16 to 70
• Data types: Boolean, integer, string, date, time, ...
• An attribute A of entity set E whose value set is V can be defined as a function
• The function maps E to the power set P(V) of V:
• Mathematically: A : E -> P(V)
where A is the attribute, E is the entity set, V is the value set of A, and P(V) is the power set of V (the set of all subsets of V)
• The value of Attribute A for Entity e is A(e)
Example: name(e) = John
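The function view can be mimicked with a Python dictionary that maps each entity to a set of values; returning a set is what makes the codomain P(V) rather than V. The entities car1 to car3 and the value set of Color are invented for this sketch, and a single value is modelled as a one-element set.

```python
# Value set (domain) V of the attribute Color.
V_COLOR = frozenset({"White", "Red", "Black", "Blue"})

# Attribute Color as a function E -> P(V): each entity maps to a subset of V.
color = {
    "car1": frozenset({"White", "Red"}),  # multivalued: Color = { White, Red }
    "car2": frozenset({"Black"}),         # a single value, still a subset of V
}


def respects_domain(attribute, value_set):
    """True if A(e) is a subset of V, i.e. an element of P(V), for every entity e."""
    return all(values <= value_set for values in attribute.values())


print(respects_domain(color, V_COLOR))  # True
color["car3"] = frozenset({"Green"})    # Green is not in V
print(respects_domain(color, V_COLOR))  # False
```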
Relationship
In ER diagrams, Relationship Types are displayed as diamond-shaped boxes,
connected by straight lines to participating entity types.
Example: EMPLOYEE works_for DEPARTMENT

• Relationship Type R among n entity types E1, E2, ..., En defines a set of relationships among entities from these entity types.
• Entities e1, e2, ..., en (each ej taken from Ej) participate in the relationship instance ri = (e1, e2, ..., en).

Cardinality Ratio
Main types of constraints on binary relationships:
(a) Cardinality Ratio: specifies the maximum number of relationship instances that an entity can participate in.
Example: DEPARTMENT : EMPLOYEE (in works_for) is of cardinality ratio 1:N
• Possible cardinality ratios for binary relationship types:
o 1:1
o 1:N
o N:1
o M:N
• A line can also be associated with a minimum and maximum cardinality, presented in the form (min, max)
• (min, max) notation means that each entity e in the entity set E must participate in at least min and at most max relationship instances.
• “1” Entity Type A can relate to “1..N” Entity Type B
Example: “1” DEPARTMENT can employ “1..N” EMPLOYEEs
• Note: in the (min, max) notation, the values are written on the opposite side of the relationship compared to the 1:N cardinality-ratio notation.
• Note: * indicates “no limit”
• A min value of 1 indicates total participation
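One way to make (min, max) concrete is to count how many relationship instances each entity takes part in and compare the count with the bounds. In this Python sketch the helper `satisfies_min_max`, the entity names, and the WORKS_FOR pairs are all hypothetical, and `None` plays the role of * (no limit). A single state can only show that the constraint holds for that state.

```python
from collections import Counter


def satisfies_min_max(entities, instances, position, min_card, max_card=None):
    """Check that every entity appears at `position` in at least min_card
    and at most max_card relationship instances (None means no limit)."""
    counts = Counter(r[position] for r in instances)
    for e in entities:
        n = counts[e]  # Counter returns 0 for entities that never appear
        if n < min_card or (max_card is not None and n > max_card):
            return False
    return True


employees = {"Smith", "Wong", "Zelaya"}
departments = {"Research", "Administration"}
# WORKS_FOR relationship instances as (employee, department) pairs.
works_for = {("Smith", "Research"), ("Wong", "Research"), ("Zelaya", "Administration")}

# EMPLOYEE side (1, 1): each employee works for exactly one department.
print(satisfies_min_max(employees, works_for, 0, 1, 1))  # True
# DEPARTMENT side (1, *): each department employs at least one employee.
print(satisfies_min_max(departments, works_for, 1, 1))   # True
```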

Participation Constraint
• Main types of constraints on binary relationships:
(b) Participation Constraint: specifies the minimum number of relationship instances that each entity must participate in.
• It is also called the Minimum Cardinality Constraint
Participation constraints can be:
(a) Total (Existence dependency) is displayed as a double line connecting the
participating entity type to the relationship.
means that every entity in the entity set must participate in at least one relationship instance in the relationship set. It specifies whether the existence of an entity depends on its being related to another entity via the relationship type.
Example: if every EMPLOYEE must work for a DEPARTMENT, then,
participation of EMPLOYEE in works_for is a Total participation.

(b) Partial is represented by a single line
means that some entities may participate in a relationship in the relationship set, but not necessarily all.
Example: only some of the employee entities are related to a department entity via MANAGES, not necessarily all.

Strong Entity Type

• An entity type that has a key attribute of its own is called a Strong Entity Type.
• Its key attribute is shown underlined inside the oval.

Weak Entity Type

• An entity type that does not have a key attribute of its own is called a Weak Entity Type.
• A weak entity type is represented by a double-lined box.
Identifying Relationship of a Weak Entity Type is represented by double line diamond.

Example: suppose CUSTOMER never receives two LOANs at the same Date. So Date can be
the partial key for LOAN.
Partial Key is underlined with a dashed line.

• Owner (or identifying) Entity Type: the entity type that a weak entity type is related to; a weak entity is identified by its relationship to an owner entity in combination with some of its own attribute values (its partial key).
• A weak entity cannot be identified without its owner entity.

Degree of a relationship type
• Degree of a relationship type is the number of participating entity types.
Example: the relationship works_for is of degree 2.
A relationship type of degree 2 is called Binary; a relationship type of degree 3 is called Ternary.

Enhanced Entity-Relationship Model (EER)
Derived Attribute: two (or more) attribute values are related.
Example: Age and Birth_date attributes of a person.
Stored: date_of_birth = 09.09.2000
Derived: age = 20

Recursive Relationship
• Each entity type that participates in a relationship type plays a particular Role in the relationship.
• A relationship between two entities of the same entity type (in different roles) is called a Recursive Relationship.
Example: EMPLOYEE participates in SUPERVISION:
- in the role of supervisor (boss)
- in the role of supervisee

Relationship Attributes
Relationship types can also have attributes, similar to those of entity types.
Example: Start_date attribute of the manage relationship type.
Attributes of 1:1 or 1:N relationship types can be migrated to one of the participating entity types.
Example: the Start_date attribute for the manage relationship can become an attribute of EMPLOYEE (or DEPARTMENT).

Ternary to Binary
Example: SUPPLY as a ternary relationship (s, j, p)
Example: SUPPLY as binary relationships (s, j), (j, p), (s, p)
A ternary relationship type represents different information than three binary relationship types do.
Example: SUPPLY represented as a weak entity type.
SUPPLY is identified by the combination of its three owner entities, i.e., SUPPLIER, PART, and PROJECT.
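The claim that three binary relationships carry different information can be checked on a tiny invented SUPPLY instance: split the triples into their three pairwise projections, recombine the pairs, and compare with the original triples. This Python sketch uses plain sets; s, j, p stand for supplier, project, and part identifiers.

```python
# SUPPLY as a ternary relationship: (supplier, project, part) triples.
supply = {("s1", "j1", "p2"), ("s1", "j2", "p1"), ("s2", "j1", "p1")}

# The three binary relationships keep only pairs.
sj = {(s, j) for s, j, p in supply}
jp = {(j, p) for s, j, p in supply}
sp = {(s, p) for s, j, p in supply}

# Recombine: every triple whose three pairs all appear in the binaries.
rebuilt = {
    (s, j, p)
    for s, j in sj
    for j2, p in jp
    if j == j2 and (s, p) in sp
}

print(rebuilt - supply)  # {('s1', 'j1', 'p1')}: a triple that was never supplied
```

The extra triple shows that the three binary relationships cannot tell which supplier supplied which part to which project.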

Enhanced Entity Relationship
EER = ER + Subclass - Superclass
Category - Union type
Specialization - Generalization
Attribute - Relationship Inheritance

Class & Subclasses
Suppose an entity type has subtypes that the database designer is interested in representing explicitly. An Enhanced ER diagram can represent such relationships.

Subclass vs Superclass
• An entity type A is a Subclass (Subtype) of an entity type B if and only if the entity set of A is always a subset of the entity set of B.
• We say that B is a Superclass (Supertype) of A.
• An entity cannot exist in the database merely by being a member of a subclass; it must also be a member of the superclass.

Inheritance
• Entities in a subclass inherit attributes from the superclass.
• A subclass can have attributes that the superclass does not.

Specialization
Example: the process of defining Army and Civilian subclasses of a Person superclass.

Specialization vs Generalisation
Specialization allows us to:
- define a set of subclasses of an entity type
- establish specific attributes for each subclass
- establish specific relationship types between subclasses and other entity types
Generalization: the process of suppressing the differences among several entity types, identifying their common features, and generalizing them into a single superclass.
- Generalization is a bottom-up conceptual synthesis
- Generalization is the inverse process of Specialization

Example: CAR and TRUCK have several common attributes, such as Price, Vehicle_id, and License_plate_no.
Example: they can be generalized into a superclass entity type VEHICLE.
Example: CAR and TRUCK are specializations of VEHICLE; VEHICLE is the generalization of CAR and TRUCK.

Constraints on Specialization or Generalization:
(a) Predicate-defined (or condition-defined): we place a condition on the value of an attribute of the superclass that exactly determines which entities belong to each subclass.
Example: we can specify the condition Job_type = “engineer” for the ENGINEER subclass.
Attribute-defined: when the membership conditions of all subclasses are on the same attribute of the superclass (here, Job_type).
(b) User-defined: there is no condition for determining membership in the subclasses; membership is specified individually for each entity by the user.
(c) Disjointness: specifies that an entity is a member of at most one of the subclasses in the specialization (the subclasses are disjoint sets).
• Disjointness is displayed by a d in a circle.
No disjointness (overlapping): if the subclasses are not disjoint, their entity sets may overlap, so an entity may be a member of more than one subclass.
• Overlapping is displayed by an o in a circle.
(d) Completeness (or totalness): total specialization specifies that every entity in the superclass must be a member of at least one subclass.
• It is displayed by a double line connecting the superclass to the circle.
No completeness (no totalness): partial specialization allows an entity not to belong to any of the subclasses.
• It is displayed by a single line connecting the superclass to the circle.
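Disjointness and completeness are set conditions on the subclass entity sets. The Python sketch below (the function `classify_specialization` and the VEHICLE members are hypothetical) reports which symbol (d or o) and which line style (total or partial) would fit one given state, assuming every subclass set is a subset of the superclass set.

```python
def classify_specialization(superclass, subclasses):
    """Return (disjointness, completeness) for one state of a specialization.

    superclass: set of entities; subclasses: dict of name -> set of entities.
    Assumes every subclass set is a subset of the superclass set.
    """
    members = list(subclasses.values())
    # 'd' if no entity is in two subclasses, otherwise 'o' (overlapping).
    disjoint = all(
        not (members[i] & members[j])
        for i in range(len(members))
        for j in range(i + 1, len(members))
    )
    # Total if every superclass entity belongs to at least one subclass.
    total = superclass <= set().union(*members)
    return ("d" if disjoint else "o", "total" if total else "partial")


vehicle = {"v1", "v2", "v3"}
print(classify_specialization(vehicle, {"CAR": {"v1"}, "TRUCK": {"v2", "v3"}}))
# ('d', 'total')
print(classify_specialization(vehicle, {"CAR": {"v1", "v2"}, "TRUCK": {"v2"}}))
# ('o', 'partial')
```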

Specialization Hierarchy and Lattice:
• Specialization hierarchy: every subclass participates as a subclass in only one class/subclass relationship, so each subclass has only one parent (a tree structure).
• Specialization lattice: a subclass can be a subclass in more than one class/subclass relationship.

Shared Subclass
A shared subclass is a class that has more than one superclass. A shared subclass inherits the attributes from all its superclasses and thus represents multiple inheritance.

UNION
A union (or category) is a collection of entities from different entity types.
A UNION is displayed by a circle with the ∪ symbol.

Relational Data Model (based on logic; all data is represented as tuples grouped into relations)
• Relational models represent the database as a collection of relations.
• Each relation resembles a table of values: each row in the table represents a collection of related data values

When we refer to a Relational Database, we implicitly include both:
• Database Schema &
• Current state

Notation
A relational schema R of degree n is R(A1 , ..., An)
• Uppercase letters R, Q, S denote relation names
• Lowercase letters q, r, s denote relation states
• Lowercase letters t, u, v denote tuples
• Relation (or relation state) r of the relation schema R(A1, ..., An) also denoted by
r(R), is a set of n-tuples
r = { t1, ..., tm }

• A relation state r(R) is also called extension
• A relation schema R(A1, ..., An) is also called intension
• Domain D of attribute Ai is a set of atomic values.
• Atomic value is indivisible in formal relational model.
• Domain D is represented by Dom(Ai)
Examples:
Norwegian personal numbers: 11 digit integer
Department_name: strings of length 2

Relational Model
• Several attributes can have the same domain, while the attribute names indicate the different roles the domain plays.
Example: in a relation, Home_phone, Office_phone, and Mobile_phone can share the same phone-number domain, with each name indicating its role (home, office, or mobile phone).
Notation
Relation.Attribute (e.g. STUDENT.Name) qualifies an attribute name with the name of its relation
• An n-tuple t in r(R) is denoted t = <v1, ..., vn>
• t[Ai], t.Ai and t[i] refer to the value vi for Ai in t

• Alternative definition: a tuple can be considered as a set of (<attribute>, <value>) pairs, where each attribute name appears with its value.
Example: t1 and t2 are identical:
t1 = < ( Name, Dick Davidson ), ( Ssn, 422-11-2320 ), ... , ( Age, 25 ) >
t2 = < ( Ssn, 422-11-2320 ), ( Age, 25 ), ... , ( Name, Dick Davidson ) >
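Python's `frozenset` gives a direct model of this alternative definition, because sets ignore order. The sketch keeps only the three attributes shown and drops the elided ones (the "..."); `value_of` is an illustrative helper for t[Ai].

```python
# A tuple viewed as a set of (attribute, value) pairs: attribute order is irrelevant.
t1 = frozenset({("Name", "Dick Davidson"), ("Ssn", "422-11-2320"), ("Age", 25)})
t2 = frozenset({("Ssn", "422-11-2320"), ("Age", 25), ("Name", "Dick Davidson")})
print(t1 == t2)  # True


def value_of(t, attribute):
    """t[Ai]: look up the value paired with an attribute name."""
    return dict(t)[attribute]


print(value_of(t1, "Ssn"))  # 422-11-2320

# Ordered n-tuples with the same values in a different order are not equal.
print(("Dick Davidson", "422-11-2320", 25) == ("422-11-2320", 25, "Dick Davidson"))  # False
```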
• Each value in a tuple is atomic (not divisible into components); hence, composite and multivalued attributes are not allowed.
• Multivalued attributes can be represented by separate relations.
• Composite attributes can be represented by their simple component attributes.
• Interpretation (meaning) of a relation: each tuple in the relation can be interpreted as a fact.
Example:
fact: there is a STUDENT whose Name is Benjamin Bayer, Age is 19, and so on.
Some values in a tuple may be NULL.
Categories of Constraints:
1. Inherent model-based constraints (implicit constraints): the characteristics of relations are the inherent constraints of the relational model.
Example: the constraint that a relation cannot have duplicate tuples is an inherent constraint.
2. Schema-based constraints (explicit constraints): constraints that can be expressed in the schema of the relational model via the DDL (Data Definition Language).
3. Application-based constraints (semantic constraints or business rules): relate to the meaning and behavior of attributes. They are checked within the application programs that perform database updates, as they are difficult to express and enforce within the data model.
4. Data dependencies (data design): include functional dependencies and multivalued dependencies. They are used mainly for testing the “goodness” of the design of a relational database and are utilized in a process called normalization.

Schema-based constraints
(a) Domain constraints: specify that all attribute values must be atomic within each tuple
and obtained from their respective domain.
Data types associated with domains typically include standard numeric data types for integers and real numbers, characters, Booleans, strings, date, time, timestamp, and even money.
(b) Constraints on NULLs
(c) Entity integrity constraints
(d) Referential integrity constraints
(e) Key constraints: no two tuples can have the same combination of values for all their attributes.
A relation is defined as a set of tuples, and all elements of a set are distinct; hence, all tuples in a relation must also be distinct.
Superkey (SK): a set of attributes whose combination of values is unique for every tuple.
Superkey of the relation schema R(A1, ..., An): a set {B1, ..., Bk} ⊆ {A1, ..., An} such that for any relation state r(R) and any two distinct tuples t, t' in r(R):
t[B1, ..., Bk] ≠ t'[B1, ..., Bk]
Example: for the tuples t1, t2, t3 of a state, t1[Ssn, Name], t2[Ssn, Name], and t3[Ssn, Name] are pairwise distinct.
Key (K): a superkey such that removing any attribute A from K leaves a set of attributes that is no longer a superkey (a minimal superkey).
If a relation schema has more than one key, they are all called candidate keys; the one chosen to identify tuples is called the Primary Key.
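The superkey and key definitions translate into short Python checks over a relation state represented as a list of dictionaries (the relation `student` and its values are made up). One limitation: the definition quantifies over every valid state, so checking a single state can only rule a superkey out, never prove it. The {Name, Age} result below is a key of this sample state only, by coincidence of the data.

```python
from itertools import combinations


def is_superkey(relation, attrs):
    """True if no two tuples in this relation state agree on all of attrs."""
    seen = set()
    for t in relation:
        values = tuple(t[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


def is_key(relation, attrs):
    """A key is a superkey from which no single attribute can be removed."""
    if not is_superkey(relation, attrs):
        return False
    return all(
        not is_superkey(relation, [a for a in attrs if a != removed])
        for removed in attrs
    )


def candidate_keys(relation, attributes):
    """All minimal superkeys of this state, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(relation, combo):
                keys.append(combo)
    return keys


student = [
    {"Ssn": "111", "Name": "Ann", "Age": 19},
    {"Ssn": "222", "Name": "Bob", "Age": 19},
    {"Ssn": "333", "Name": "Ann", "Age": 21},
]
print(is_superkey(student, ["Ssn", "Name"]))  # True
print(is_key(student, ["Ssn", "Name"]))       # False: Ssn alone suffices
print(candidate_keys(student, ["Ssn", "Name", "Age"]))
# [('Ssn',), ('Name', 'Age')]
```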
Key Constraints
• It is common to designate one of the candidate keys as the Primary Key, shown underlined.
• This is the candidate key whose values are used to identify tuples in the relation.

• Entity Integrity Constraint states that no Primary Key value can be NULL.
• This is because the Primary Key value is used to identify individual tuples in a
relation and with a NULL value, we cannot identify some tuples.
• Relational Database Schema S is a set of relational schemas S = {R1, R2, ..., Rn} and
a set of Integrity Constraints (IC).

• A relational database state DB of S is a set of relation states DB = {r1, r2, ..., rn} such that each ri is a state of Ri and the relation states satisfy the Integrity Constraints (IC).
• Referential integrity constraint states that a tuple in one relation that refers to
another relation must refer to an existing tuple in that relation (consistency)
• Let R1 and R2 be relational schemas. Let FK be a selection of attributes from R1, and
let PK be the primary key in R2.

FK is a Foreign Key in R1 referring to R2 if:
(a) Each attribute in FK matches an attribute in PK, and vice versa (they necessarily have the same domains).
(b) The value of FK in any tuple t1 of the current state r1(R1) either is NULL or occurs as the value of PK in some tuple t2 of the current state r2(R2), so that t1[FK] = t2[PK].
• Referential Integrity is achieved by ensuring that all references go via foreign keys.
• We specify in the schema which references there are and what is required for these
to be foreign keys.
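A minimal Python sketch of condition (b), with relation states as lists of dictionaries and `None` standing for NULL. The helper name and the sample EMPLOYEE/DEPARTMENT rows are hypothetical; for a composite foreign key, the sketch treats the reference as satisfied if any component is NULL, which is one possible convention.

```python
def referential_integrity_holds(r1, fk, r2, pk):
    """Check that t1[FK] is NULL or equals t2[PK] for some t2 in r2."""
    pk_values = {tuple(t2[a] for a in pk) for t2 in r2}
    for t1 in r1:
        fk_value = tuple(t1[a] for a in fk)
        if any(v is None for v in fk_value):
            continue  # a NULL foreign key does not reference anything
        if fk_value not in pk_values:
            return False
    return True


department = [{"Dnumber": 1}, {"Dnumber": 4}, {"Dnumber": 5}]
employee = [
    {"Ssn": "123456789", "Dno": 5},
    {"Ssn": "999887777", "Dno": 4},
    {"Ssn": "677678989", "Dno": None},
]
print(referential_integrity_holds(employee, ["Dno"], department, ["Dnumber"]))  # True

employee.append({"Ssn": "111111111", "Dno": 7})  # no DEPARTMENT with Dnumber = 7
print(referential_integrity_holds(employee, ["Dno"], department, ["Dnumber"]))  # False
```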
Insert
• Insert operation provides a list of attribute values for a new tuple t that is to be
inserted into a relation R.
• Insert can violate:
1. Domain constraints when giving an attribute value that is not in the domain for
that attribute
2. Key constraints when giving an already existing key value
3. Entity integrity: when any part of the primary key of the inserted tuple is NULL
4. Referential integrity: when the value of a foreign key in the inserted tuple does not exist in the referenced relation
• If an insertion violates a constraint, the default is to reject it.
• However, it would be useful if the DBMS could tell the user why the insertion was rejected.
• Another option is to attempt to correct the reason for rejecting the insertion, but this is typically not done for violations caused by Insert.

Operations:
Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windy, Katy, TX’, F, 28000, NULL, 4> into EMPLOYEE.
Result: accepted. This insertion satisfies all constraints.

Insert <‘Alicia’, ‘J’, ‘Zelaya’, ‘999887777’, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, ‘987654321’, 4> into EMPLOYEE.
Result: rejected. This insertion violates the key constraint, since another tuple with the same Ssn already exists in the EMPLOYEE relation.

Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, NULL, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, NULL, 4> into EMPLOYEE.
Result: rejected. This insertion violates the entity integrity constraint (NULL for the primary key Ssn).
Correction: the DBMS could ask the user to provide a value for Ssn, and could then accept the insertion if a valid Ssn value is provided.

Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windswept, Katy, TX’, F, 28000, ‘987654321’, 7> into EMPLOYEE.
Result: rejected. This is a violation of the referential integrity constraint specified on Dno in EMPLOYEE, as no referenced tuple exists in DEPARTMENT with Dnumber = 7.
Correction: the DBMS could either ask the user to change the value of Dno to a valid value (or NULL), or insert a DEPARTMENT tuple with Dnumber = 7.
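These outcomes, plus a domain violation, can be reproduced with Python's built-in `sqlite3` module. This is a sketch under simplifying assumptions: EMPLOYEE is cut down to four columns, and the Ssn values 123456780 and 12345 are made up (the referential-integrity attempt uses a fresh Ssn so that only the intended constraint fails). SQLite has no CREATE DOMAIN, so a CHECK constraint stands in for the Ssn domain; SQLite also needs `PRAGMA foreign_keys = ON` to enforce foreign keys, and NOT NULL on Ssn, because it otherwise allows NULL in a non-integer PRIMARY KEY column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY);
    CREATE TABLE EMPLOYEE (
        Ssn    CHAR(9) NOT NULL PRIMARY KEY CHECK (length(Ssn) = 9),
        Fname  VARCHAR(15),
        Salary INTEGER,
        Dno    INTEGER REFERENCES DEPARTMENT(Dnumber)
    );
    INSERT INTO DEPARTMENT VALUES (1), (4), (5);
    INSERT INTO EMPLOYEE VALUES ('999887777', 'Alicia', 25000, 4);
""")


def try_insert(row):
    """Attempt one insert and report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


attempts = [
    ("677678989", "Cecilia", 28000, 4),  # satisfies all constraints
    ("999887777", "Alicia", 28000, 4),   # key constraint: Ssn already exists
    (None, "Cecilia", 28000, 4),         # entity integrity: NULL primary key
    ("123456780", "Cecilia", 28000, 7),  # referential integrity: no Dnumber = 7
    ("12345", "Bob", 30000, 5),          # domain (approximated by CHECK)
]
results = [try_insert(row) for row in attempts]
for row, outcome in zip(attempts, results):
    print(row[0], outcome)
```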

Delete
• Delete is an operation that is used to remove tuples from a relation.
• Delete can violate only referential integrity.

Example: Delete the WORKS_ON tuple with Essn = ‘999887777’ and Pno = 10.
Result: accepted. Exactly one tuple is deleted.

Example: Delete the EMPLOYEE tuple with Ssn = ‘999887777’.
Result: not acceptable, because there are tuples in WORKS_ON that refer to this tuple. Hence, if the tuple in EMPLOYEE is deleted, referential integrity violations will result.
Correction: the DBMS can automatically delete the offending tuples from WORKS_ON with Essn = ‘999887777’.

Possible responses from the DBMS when a tuple t2 that is referenced by a tuple t1 is deleted:
• Reject the delete and leave the database unchanged
• Delete t2 and replace the reference values in t1 with NULL
• Delete both t2 and t1 (and any tuples that reference t1)

Example: Delete the EMPLOYEE tuple with Ssn = ‘333445555’.
Result: not acceptable. This deletion results in even worse referential integrity violations, because the tuple involved is referenced by tuples from the EMPLOYEE, DEPARTMENT, WORKS_ON, and DEPENDENT relations.
Correction: the DBMS may delete all tuples from WORKS_ON and DEPENDENT with Essn = ‘333445555’. Tuples in EMPLOYEE with Super_ssn = ‘333445555’ and the tuple in DEPARTMENT with Mgr_ssn = ‘333445555’ can have their Super_ssn/Mgr_ssn changed to valid values or to NULL.

Update
• Update (or Modify) is an operation used to change the values of one or more attributes in a tuple (or tuples) of some relation R.
• It is necessary to specify a condition on the attributes of the relation to select the tuple (or tuples) to be modified.
Let's check some examples of Update operations.

Example: Update the Salary of the EMPLOYEE tuple with Ssn = ‘999887777’ to 28000.
Result: accepted.

Example: Update the Dno of the EMPLOYEE tuple with Ssn = ‘999887777’ to 1.
Result: accepted.

Example: Update the Dno of the EMPLOYEE tuple with Ssn = ‘999887777’ to 7.
Result: not acceptable, because it violates referential integrity.

Example: Update the Ssn of the EMPLOYEE tuple with Ssn = ‘999887777’ to ‘987654321’.
Result: not acceptable. It violates the primary key constraint by repeating a value that already exists as a primary key in another tuple, and it violates referential integrity constraints because other relations refer to the existing value of Ssn.

• Updating an attribute that is neither part of a primary key nor part of a foreign key usually causes no problems.
• When a foreign key is modified, it must be verified that the new value refers to an existing tuple or is set to NULL.
• The DBMS needs to check and confirm that the new value is of the correct data type and domain.
• Modifying a primary key value is similar to deleting one tuple and inserting another in its place, because we use the primary key to identify tuples.

Types of Collections
Creating Domains
• Numeric
• Character-string
• Bit-string
• Boolean
• Date
• Timestamp
• Interval

CREATE DOMAIN Ssn_type AS CHAR(9);


Numeric data types include:
• Integer numbers of various sizes (INT and SMALLINT)
• Floating-point (real) numbers of various precision (FLOAT and DOUBLE PRECISION)
Character-string data types are:
• Fixed length, represented by CHAR(n), where n is the number of characters (0 to 255 in MySQL).
• Varying length, represented by VARCHAR(n), where n is the maximum number of characters (0 to 65535 in MySQL).
Date and time data types (ranges as in MySQL):
• DATE has the format YYYY-MM-DD (‘1000-01-01’ to ‘9999-12-31’)
• TIME has the format hh:mm:ss (‘-838:59:59’ to ‘838:59:59’)
• TIMESTAMP has the format YYYY-MM-DD hh:mm:ss (‘1970-01-01 00:00:01’ to ‘2038-01-19 03:14:07’ UTC)
• DATE, TIME, and TIMESTAMP can be considered a special type of string.
• They can generally be used in string comparisons by being cast (or coerced or converted) into the equivalent strings.
Auto Increment
• Sometimes we would like a unique number to be generated automatically when a new record is inserted into a table.
• This is usually the primary key, which we would like to be created automatically every time a new record is inserted.
IF NOT EXISTS
• IF NOT EXISTS checks whether the table already exists in the database.
• If it does, the statement is ignored and the new table is not created.
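Both features can be tried with Python's built-in `sqlite3` module. Keyword spellings differ between systems: SQLite writes AUTOINCREMENT on an INTEGER PRIMARY KEY column, while MySQL uses AUTO_INCREMENT. The PROJECT table below is a simplified, hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

ddl = """
    CREATE TABLE IF NOT EXISTS PROJECT (
        Pnumber INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated automatically
        Pname   VARCHAR(15) NOT NULL
    )
"""
conn.execute(ddl)
conn.execute("INSERT INTO PROJECT (Pname) VALUES ('ProductX')")
conn.execute("INSERT INTO PROJECT (Pname) VALUES ('ProductY')")

# Running the same statement again is ignored: the table already exists.
conn.execute(ddl)

projects = conn.execute("SELECT Pnumber, Pname FROM PROJECT ORDER BY Pnumber").fetchall()
print(projects)  # [(1, 'ProductX'), (2, 'ProductY')]
```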

Operations
Operations of the relational database:
• Retrievals
• Updates
Constraints Revisited
• Operations sometimes violate the referential integrity constraint.
• Referential integrity is specified via the FOREIGN KEY clause.
• The default action that SQL takes for an integrity violation is to reject the operation that would cause the violation, which is known as the RESTRICT option.
However, the designer can specify an alternative action to be taken by attaching a referential triggered action to any foreign key constraint. The options include SET NULL, CASCADE, and SET DEFAULT.
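A sketch of the default rejection and two triggered actions, using Python's built-in `sqlite3` module with cut-down EMPLOYEE, WORKS_ON, and DEPENDENT tables (the columns and Ssn values are simplified for the example). In SQLite, a foreign key without an ON DELETE clause defaults to NO ACTION, which for the immediate constraints used here has the same effect as RESTRICT: the offending statement is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce FKs
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        Ssn       CHAR(9) PRIMARY KEY,
        Super_ssn CHAR(9) REFERENCES EMPLOYEE(Ssn) ON DELETE SET NULL
    );
    CREATE TABLE WORKS_ON (
        Essn CHAR(9) REFERENCES EMPLOYEE(Ssn) ON DELETE CASCADE,
        Pno  INTEGER,
        PRIMARY KEY (Essn, Pno)
    );
    CREATE TABLE DEPENDENT (
        Essn           CHAR(9) REFERENCES EMPLOYEE(Ssn),  -- no triggered action
        Dependent_name VARCHAR(15),
        PRIMARY KEY (Essn, Dependent_name)
    );
    INSERT INTO EMPLOYEE VALUES ('333445555', NULL), ('987654321', '333445555');
    INSERT INTO WORKS_ON VALUES ('987654321', 10), ('987654321', 30);
    INSERT INTO DEPENDENT VALUES ('333445555', 'Alice');
""")


def delete_employee(ssn):
    """Try to delete one EMPLOYEE tuple; the whole delete is undone on failure."""
    try:
        with conn:
            conn.execute("DELETE FROM EMPLOYEE WHERE Ssn = ?", (ssn,))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


first = delete_employee("333445555")   # DEPENDENT still refers to it
print(first)                           # rejected

with conn:
    conn.execute("DELETE FROM DEPENDENT WHERE Essn = '333445555'")
second = delete_employee("333445555")
# SET NULL: the supervisee's Super_ssn is replaced by NULL.
supervisor = conn.execute(
    "SELECT Super_ssn FROM EMPLOYEE WHERE Ssn = '987654321'"
).fetchone()
print(second, supervisor)              # accepted (None,)

third = delete_employee("987654321")
# CASCADE: the WORKS_ON tuples that referenced it are deleted too.
remaining = conn.execute("SELECT COUNT(*) FROM WORKS_ON").fetchone()[0]
print(third, remaining)                # accepted 0
```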

From EER to Relational Model

Database Design
This lecture focuses on the logical database design process, which is also known as data model mapping.
EER = ER + Subclass/Superclass + Specialization/Generalization + Category (union type)

Mapping algorithm for ER and EER:
• 7 steps: ER model constructs → relations mapping
• 2 steps: EER model constructs → relations mapping

Mapping algorithm: 7 steps for ER + 2 steps for EER:
• Step 1: Strong entity
• Step 2: Weak entity
• Step 3: 1:1 relationship
• Step 4: 1:N relationship
• Step 5: M:N relationship
• Step 6: Multivalued attributes
• Step 7: n-ary relationship
• Step 8: Specialization/generalization
• Step 9: Union types (categories)
Relational Algebra
• It provides a foundation for relational model operations.
• It is used as a basis for implementing and optimizing queries in the query processing and optimization modules of relational DBMSs.
• Its concepts are incorporated into SQL, the standard query language for relational DBMSs.

SELECT
SELECT σ is used to choose a subset of the tuples from a relation that satisfy a selection condition.
Example:
$\sigma_{\text{Salary} > 30000}(\text{EMPLOYEE})$
Equivalent to:
SELECT *
FROM EMPLOYEE
WHERE Salary > 30000;

Question:
If $|\sigma_c(R)|$ is the number of tuples in the resulting relation, then $|\sigma_c(R)| \le |R|$.
PROJECT
• SELECT σ is used to retrieve rows; PROJECT π is used to retrieve columns.
• PROJECT π is used to choose a subset of the attribute values (columns) from all the tuples of a relation.
Example:
$\pi_{\text{Sex},\,\text{Salary}}(\text{EMPLOYEE})$
Equivalent to:
SELECT DISTINCT Sex, Salary
FROM EMPLOYEE;

Nested SELECTs:
$\sigma_{c_1}(\sigma_{c_2}(\cdots(\sigma_{c_n}(R))\cdots)) = \sigma_{c_1 \text{ AND } c_2 \text{ AND } \cdots \text{ AND } c_n}(R)$

Nested PROJECTs: if the first attribute list is a subset of the second, which is a subset of the third, and so on ($S_1 \subseteq S_2 \subseteq \cdots \subseteq S_n$), then:
$\pi_{S_1}(\pi_{S_2}(\cdots(\pi_{S_n}(R))\cdots)) = \pi_{S_1}(R)$

Question: which statement is correct?
$\sigma_{c_1}(\sigma_{c_2}(R)) = \sigma_{c_2}(\sigma_{c_1}(R))$, so σ is commutative.
The following is not correct in general:
$\pi_{S_1}(\pi_{S_2}(R)) = \pi_{S_2}(\pi_{S_1}(R))$, so π is not commutative! But:
$\pi_{S_1}(\pi_{S_2}(R)) = \pi_{S_2}(\pi_{S_1}(R))$ iff $S_1 = S_2$.
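The rules above can be checked on a toy relation. In this sketch a relation is a Python set of tuples, and each tuple is a frozenset of (attribute, value) pairs, so duplicates disappear as they do under set semantics. The helper names and the sample rows are hypothetical.

```python
# A relation is a set of tuples; a tuple is a frozenset of (attribute, value) pairs.
def relation(rows):
    return {frozenset(row.items()) for row in rows}

def select(cond, R):
    """σ_cond(R): the tuples of R that satisfy cond."""
    return {t for t in R if cond(dict(t))}

def project(attrs, R):
    """π_attrs(R): keep only the listed attributes; duplicate tuples are removed."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in R}

# Hypothetical sample data
EMPLOYEE = relation([
    {"Ssn": 1, "Sex": "M", "Salary": 30000, "Dno": 5},
    {"Ssn": 2, "Sex": "M", "Salary": 40000, "Dno": 5},
    {"Ssn": 3, "Sex": "F", "Salary": 25000, "Dno": 4},
    {"Ssn": 4, "Sex": "M", "Salary": 30000, "Dno": 4},
])

c1 = lambda t: t["Salary"] > 30000
c2 = lambda t: t["Dno"] == 5

# σ is commutative, and a cascade of σ equals one σ with AND-ed conditions
assert select(c1, select(c2, EMPLOYEE)) == select(c2, select(c1, EMPLOYEE))
assert select(c1, select(c2, EMPLOYEE)) == select(lambda t: c1(t) and c2(t), EMPLOYEE)

# A cascade of π with nested attribute lists collapses to the outermost π
assert project({"Sex"}, project({"Sex", "Salary"}, EMPLOYEE)) == project({"Sex"}, EMPLOYEE)

# π removes duplicates, like SELECT DISTINCT: two employees share (M, 30000)
print(len(EMPLOYEE), len(project({"Sex", "Salary"}, EMPLOYEE)))  # 4 3
```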

RENAME
Example: Find the first name, last name, and salary of all employees in department 5.
We must apply a SELECT and a PROJECT operation. We have two options:
(a) write the operations as a single relational algebra expression by nesting them:
$\pi_{\text{Fname},\,\text{Lname},\,\text{Salary}}(\sigma_{\text{Dno}=5}(\text{EMPLOYEE}))$
(b) apply one operation at a time and create intermediate result relations.
Option (b) requires giving temporary names to the relations that hold the intermediate results. RENAME operations can be used for that.
RENAME ρ is used to rename a relation name or the attribute names, or both.
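Here is a sketch of both options on hypothetical data, using the same set-of-frozensets model of a relation. The intermediate name DEP5_EMPS and the new attribute names are illustrative choices.

```python
def relation(rows):
    # each tuple: a frozenset of (attribute, value) pairs; the relation: a set
    return {frozenset(r.items()) for r in rows}

def select(cond, R):
    return {t for t in R if cond(dict(t))}

def project(attrs, R):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in R}

def rename(mapping, R):
    """ρ on attributes: rename according to {old_name: new_name}."""
    return {frozenset((mapping.get(a, a), v) for a, v in t) for t in R}

# Hypothetical sample data
EMPLOYEE = relation([
    {"Fname": "Ann", "Lname": "Lee", "Salary": 30000, "Dno": 5},
    {"Fname": "Bob", "Lname": "Ray", "Salary": 38000, "Dno": 5},
    {"Fname": "Cid", "Lname": "Moe", "Salary": 25000, "Dno": 4},
])

# Option (a): a single nested expression
nested = project({"Fname", "Lname", "Salary"}, select(lambda t: t["Dno"] == 5, EMPLOYEE))

# Option (b): one operation at a time, naming each intermediate result
DEP5_EMPS = select(lambda t: t["Dno"] == 5, EMPLOYEE)
RESULT = project({"Fname", "Lname", "Salary"}, DEP5_EMPS)
# ρ can also give the attributes of the result new names
RESULT = rename({"Fname": "First_name", "Lname": "Last_name"}, RESULT)

print(sorted(dict(t)["First_name"] for t in RESULT))  # ['Ann', 'Bob']
```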

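Union Operation
• Binary operator, same as in sets.
• Union: denoted R ∪ S, includes all the unique tuples that are in R, in S, or in both R and S.
Exercise: use the UNION operation in relational algebra to find the SSNs of the employees in department 5 and of their supervisors.

Answer
$\text{DEP5\_EMPS} \leftarrow \sigma_{\text{Dno}=5}(\text{EMPLOYEE})$
$\text{RESULT1} \leftarrow \pi_{\text{Ssn}}(\text{DEP5\_EMPS})$
$\text{RESULT2}(\text{Ssn}) \leftarrow \pi_{\text{Super\_ssn}}(\text{DEP5\_EMPS})$
$\text{RESULT} \leftarrow \text{RESULT1} \cup \text{RESULT2}$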
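One way to work the exercise on a small hypothetical EMPLOYEE relation with attributes (Ssn, Super_ssn, Dno): select department 5, project the employee SSNs and the supervisor SSNs, and take the union. Renaming Super_ssn to Ssn makes the two relations union-compatible. Here each single-attribute relation is modelled as a plain Python set of values.

```python
# Hypothetical EMPLOYEE rows: (Ssn, Super_ssn, Dno)
EMPLOYEE = [
    ("111", "222", 5),
    ("222", "999", 5),
    ("333", "222", 4),
    ("999", None, 1),
]

DEP5_EMPS = [e for e in EMPLOYEE if e[2] == 5]   # σ Dno=5 (EMPLOYEE)
RESULT1 = {e[0] for e in DEP5_EMPS}               # π Ssn (DEP5_EMPS)
RESULT2 = {e[1] for e in DEP5_EMPS}               # π Super_ssn, renamed to Ssn
RESULT = RESULT1 | RESULT2                        # union: '222' appears only once

print(sorted(RESULT))  # ['111', '222', '999']
```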

Cartesian Product
Cartesian Product (or Cross Join) is denoted R × S. It includes all the tuples that have a tuple from R as their first element and a tuple from S as their second element.
Example:
Let R(A1, ..., An) and S(B1, ..., Bm). Then R × S is a relation Q(A1, ..., An, B1, ..., Bm) with |R| · |S| tuples.

Difference Operation
• Binary operator, same as in sets.
• Set Difference (Minus): denoted R − S, includes all the unique tuples that are in R but not in S.
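A short sketch of both operators follows. Tuples are modelled as positional Python tuples, and relations as sets of them; this representation and the sample values are assumptions for illustration.

```python
from itertools import product

def cartesian(R, S):
    """R × S: concatenate every tuple of R with every tuple of S."""
    return {r + s for r, s in product(R, S)}

def difference(R, S):
    """R − S: tuples in R but not in S (R and S must be union-compatible)."""
    return {t for t in R if t not in S}

R = {("a1",), ("a2",)}                  # R(A1): n = 1, |R| = 2
S = {("b1", 1), ("b2", 2), ("b3", 3)}   # S(B1, B2): m = 2, |S| = 3

Q = cartesian(R, S)
print(len(Q))        # 6 = |R| · |S|
print(sorted(Q)[0])  # ('a1', 'b1', 1): degree n + m = 3
print(sorted(difference({(1,), (2,), (3,)}, {(2,)})))  # [(1,), (3,)]
```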

Complete Set of Operations
{σ, π, ∪, ρ, −, ×} is a complete set of operators. This means that all other operators can be expressed by using {σ, π, ∪, ρ, −, ×}.
Example:
R ∩ S ≡ (R ∪ S) − ((R − S) ∪ (S − R))

JOIN operation
• JOIN ⋈ is used to combine related tuples from two relations into one single (longer) tuple, and it allows us to process relationships among relations.
• In JOIN ⋈, only the pairs of tuples from R and S that satisfy the join condition are returned; in R × S, all pairs of tuples are returned.
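The two claims above can be checked on small sets. The first is that ∩ can be written with ∪ and − only. The second is that a join keeps exactly the pairs of the Cartesian product that satisfy the join condition. Relations are sets of positional tuples, and the EMPLOYEE/DEPARTMENT data are hypothetical.

```python
from itertools import product

def cartesian(R, S):
    return {r + s for r, s in product(R, S)}

def theta_join(R, S, cond):
    """R ⋈_cond S: only the combined tuples that satisfy the join condition."""
    return {r + s for r, s in product(R, S) if cond(r, s)}

def intersection_from_complete_set(R, S):
    """R ∩ S written with ∪ and − only: (R ∪ S) − ((R − S) ∪ (S − R))."""
    return (R | S) - ((R - S) | (S - R))

R, S = {(1,), (2,), (3,)}, {(2,), (3,), (4,)}
print(intersection_from_complete_set(R, S) == R & S)  # True

# Hypothetical EMPLOYEE(Name, Dno) and DEPARTMENT(Dnumber, Dname)
EMPLOYEE = {("Ann", 5), ("Bob", 4), ("Cid", 5)}
DEPARTMENT = {(5, "Research"), (4, "Admin")}
same_dept = lambda e, d: e[1] == d[0]

joined = theta_join(EMPLOYEE, DEPARTMENT, same_dept)
print(len(cartesian(EMPLOYEE, DEPARTMENT)), len(joined))  # 6 3
# The join equals a selection applied to the cross product (Dno = Dnumber)
print(joined == {t for t in cartesian(EMPLOYEE, DEPARTMENT) if t[1] == t[2]})  # True
```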

EQUIJOIN
EQUIJOIN is the most common use of JOIN. It involves join conditions with equality comparisons (=) only.

NATURAL JOIN
NATURAL JOIN is used when two relations have an attribute with the same name (and domain).
The definition of NATURAL JOIN requires that the two join attributes have the same name in both relations.
If that is not the case, a renaming operation is applied first.
Then you can perform the NATURAL JOIN.
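Here is a minimal sketch of "rename first, then NATURAL JOIN", with relations as lists of dicts. The DEPARTMENT and PROJECT rows are hypothetical. The helper assumes every row of a relation has the same attribute names. If the relations shared no attribute names, this function would return their Cartesian product.

```python
def rename(mapping, rows):
    """ρ on attributes: {old_name: new_name}."""
    return [{mapping.get(a, a): v for a, v in row.items()} for row in rows]

def natural_join(R, S):
    """Join on all attributes with the same name; each shared attribute appears once.
    Assumes every row of a relation has the same attribute names."""
    if not R or not S:
        return []
    common = set(R[0]) & set(S[0])
    return [{**r, **s} for r in R for s in S
            if all(r[a] == s[a] for a in common)]

# Hypothetical data: the join attributes are named differently at first
DEPARTMENT = [{"Dname": "Research", "Dnumber": 5}, {"Dname": "Admin", "Dnumber": 4}]
PROJECT = [{"Pname": "ProductX", "Dnum": 5},
           {"Pname": "Benefits", "Dnum": 4},
           {"Pname": "ProductY", "Dnum": 5}]

DEPT = rename({"Dnumber": "Dnum"}, DEPARTMENT)  # rename first ...
result = natural_join(PROJECT, DEPT)             # ... then NATURAL JOIN on Dnum
for row in result:
    print(row)
```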


DIVISION Operation
• DIVISION ÷ is applied to relations R(X) and S(Z), where the attributes of S are a subset of the attributes of R: R(X) ÷ S(Z). This means that Z ⊆ X.
• Let Y be the set of attributes of R that are not attributes of S (Y = X − Z).
• The DIVISION operation results in a relation T(Y) that includes a tuple t if tuples $t_R$ appear in R with $t_R[Y] = t$ and with $t_R[Z] = t_S$ for every tuple $t_S$ in S.

Example: retrieve the SSNs of the employees who work on all the projects that 'John Smith' works on.
1. Retrieve the list of project numbers that 'John Smith' works on:
$\text{SMITH} \leftarrow \sigma_{\text{Fname}=\text{'John'} \text{ AND } \text{Lname}=\text{'Smith'}}(\text{EMPLOYEE})$
$\text{SMITH\_PNOS} \leftarrow \pi_{\text{Pno}}(\text{WORKS\_ON} \bowtie_{\text{Essn}=\text{Ssn}} \text{SMITH})$
2. Create an intermediate relation that includes tuples <Essn, Pno>.
3. It pairs each employee whose Ssn is Essn with the number Pno of a project that the employee works on:
$\text{SSN\_PNOS} \leftarrow \pi_{\text{Essn},\,\text{Pno}}(\text{WORKS\_ON})$
4. Apply the DIVISION operation to the two relations:
$\text{SSNS}(\text{Ssn}) \leftarrow \text{SSN\_PNOS} \div \text{SMITH\_PNOS}$

Intersection Operation
• Binary operator, same as in sets.
• Intersection: denoted R ∩ S, includes all the unique tuples that are in both R and S.
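The division step can be sketched in Python on hypothetical (Essn, Pno) data. The first function follows the definition directly. The second uses the standard rewriting of ÷ in terms of π, × and −, which shows that DIVISION is not needed in the complete set of operators. R is modelled as a set of (y, z) pairs and S as a set of z values; that representation is an assumption.

```python
def divide(R, S):
    """R(Y, Z) ÷ S(Z), with R given as (y, z) pairs and S as a set of z values:
    every y that R pairs with *all* z in S."""
    ys = {y for y, _ in R}
    return {y for y in ys if all((y, z) in R for z in S)}

def divide_with_basic_ops(R, S):
    """The same result using only π, × and −:
    T1 = π_Y(R);  T2 = π_Y((T1 × S) − R);  result = T1 − T2."""
    T1 = {y for y, _ in R}
    T2 = {y for (y, z) in {(y, z) for y in T1 for z in S} - R}
    return T1 - T2

# Hypothetical data in the shape of the example: (Essn, Pno)
SSN_PNOS = {("111", 1), ("111", 2), ("222", 1), ("333", 1), ("333", 2), ("333", 3)}
SMITH_PNOS = {1, 2}

print(sorted(divide(SSN_PNOS, SMITH_PNOS)))  # ['111', '333']
```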

Additional Operations
Generalised Projection: extends the projection operation by allowing functions of attributes to be included in the projection list.
Example: EMPLOYEE(Ssn, Salary, Deduction, Years_service)

Aggregate functions: used in basic statistical queries that aggregate data from the database tuples. Example: see Fig. 2.
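The sketch below shows a generalised projection over EMPLOYEE(Ssn, Salary, Deduction, Years_service), followed by simple aggregates. The three derived attributes and their formulas (Net_salary, Bonus, Tax) are illustrative assumptions, and the rows are hypothetical.

```python
# Hypothetical EMPLOYEE(Ssn, Salary, Deduction, Years_service) rows
EMPLOYEE = [
    {"Ssn": "111", "Salary": 40000, "Deduction": 3000, "Years_service": 5},
    {"Ssn": "222", "Salary": 30000, "Deduction": 2000, "Years_service": 2},
    {"Ssn": "333", "Salary": 50000, "Deduction": 4000, "Years_service": 10},
]

# Generalised projection: the projection list may contain functions of attributes.
# The three formulas below are illustrative assumptions.
REPORT = [
    {
        "Ssn": e["Ssn"],
        "Net_salary": e["Salary"] - e["Deduction"],
        "Bonus": 2000 * e["Years_service"],
        "Tax": 0.25 * e["Salary"],
    }
    for e in EMPLOYEE
]
print(REPORT[0])  # {'Ssn': '111', 'Net_salary': 37000, 'Bonus': 10000, 'Tax': 10000.0}

# Aggregate functions summarise a whole set of tuples into single values.
salaries = [e["Salary"] for e in EMPLOYEE]
stats = {
    "COUNT": len(salaries),
    "SUM": sum(salaries),
    "AVG": sum(salaries) / len(salaries),
    "MAX": max(salaries),
    "MIN": min(salaries),
}
print(stats)  # {'COUNT': 3, 'SUM': 120000, 'AVG': 40000.0, 'MAX': 50000, 'MIN': 30000}
```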

Good Design
Goals of the design activity: information preservation with minimum redundancy.

1st Normal Form
• All rows must be unique (no duplicates).
• Each cell must contain a single value (not a list).
• Each value should be atomic (it cannot be split down further).

2nd Normal Form
• The database must be in 1st Normal Form.
• No partial dependency: all non-prime attributes should be fully functionally dependent on the candidate key.

Example:
The key for the relation is {Code, Manager} and the functional dependencies are:
{Code, Manager} --> Type
Type --> Users
Users --> City
Code --> City, Type
Manager --> StartYear
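Partial dependencies can be found mechanically with the attribute closure algorithm. The sketch computes the closure of every proper subset of the key. Nothing determines Code or Manager, so every key must contain both, and {Code, Manager} is therefore the only candidate key here. The non-prime attributes are then the remaining four.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

ATTRS = {"Code", "Manager", "Type", "Users", "City", "StartYear"}
FDS = [
    ({"Code", "Manager"}, {"Type"}),
    ({"Type"}, {"Users"}),
    ({"Users"}, {"City"}),
    ({"Code"}, {"City", "Type"}),
    ({"Manager"}, {"StartYear"}),
]
KEY = {"Code", "Manager"}

def partial_dependencies(key, attrs, fds):
    """Non-prime attributes determined by a proper subset of the key (2NF violations).
    Assumes `key` is the only candidate key, so non-prime = attrs - key."""
    non_prime = attrs - key
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = closure(part, fds) & non_prime
            if dependents:
                found[part] = dependents
    return found

print(closure(KEY, FDS) == ATTRS)  # True: {Code, Manager} is a key
print(partial_dependencies(KEY, ATTRS, FDS))
```

Both Code and Manager alone determine non-prime attributes, so this relation is not in 2NF.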
3rd Normal Form
• The database must be in 1st and 2nd Normal Form.
• No transitive dependency: all fields must be determinable only by the primary/composite key, not by other (non-key) attributes.
• If a non-prime attribute depends on another non-prime attribute, the relation is not in 3rd Normal Form.

Boyce-Codd Normal Form
• The database must be in 1st, 2nd and 3rd Normal Form.
• For any dependency A → B, A should be a superkey.
• A cannot be a non-prime attribute while B is a prime attribute in A → B.

3NF
Schema R is in Third Normal Form (3NF) if it is in 2NF and, for every nontrivial dependency of the form X → {A}, at least one of the following holds:
(i) X is a superkey
(ii) A is a prime attribute

Boyce-Codd Normal Form
Schema R is in Boyce-Codd Normal Form (BCNF) if it is in 3NF and, whenever a nontrivial dependency X → {A} holds in R, X is a superkey of R.

But this schema is in 3NF since Lastn is a prime attribute.

Boyce-Codd Normal Form
More particularly, is it appropriate to remove the last name from STUDENT?
Is it acceptable to have to look up two tables each time you want to get the last name of a student?
Advantage: a possibly more natural normalisation to BCNF.
Disadvantage: StudentNr is redundant, and thus it wastes space.

Example: Doctor (Doctor_ID, Patient#, Date, Diagnosis, Treat_code, Charge)
Suppose the key for the relation is Doctor_ID and the functional dependencies are:
Doctor_ID --> Patient#
Doctor_ID --> Date
Doctor_ID --> Diagnosis
Doctor_ID --> Treat_code
Doctor_ID --> Charge
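Here is a sketch of a BCNF test for this example: a nontrivial FD violates BCNF when the closure of its left-hand side is not the whole relation. For deciding whether a single relation is in BCNF, checking only the given dependencies is enough. The extra dependency Treat_code → Charge below is a hypothetical addition, used only to show a violation.

```python
def closure(attrs, fds):
    """Attribute closure X+ under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Nontrivial FDs X -> A whose left-hand side X is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attrs]

DOCTOR = {"Doctor_ID", "Patient#", "Date", "Diagnosis", "Treat_code", "Charge"}
FDS = [({"Doctor_ID"}, {a}) for a in ("Patient#", "Date", "Diagnosis", "Treat_code", "Charge")]

print(bcnf_violations(DOCTOR, FDS))  # []: every determinant is the key Doctor_ID

# Hypothetical extra dependency: each treatment code has a fixed charge
extra = FDS + [({"Treat_code"}, {"Charge"})]
print(bcnf_violations(DOCTOR, extra))  # [({'Treat_code'}, {'Charge'})]
```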

Database in Python
Database communication in Python includes four stages:

1. Creating a connection object
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="yourusername",
    password="yourpassword"
)

print(mydb)

2. Creating a cursor object
# adding more code after the connection variable to create a DB
mycursor = mydb.cursor()
mycursor.execute("CREATE DATABASE mydatabase")

Create a table named "customers":
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="yourusername",
    password="yourpassword",
    database="mydatabase"
)

mycursor = mydb.cursor()
mycursor.execute("CREATE TABLE customers (name VARCHAR(255), address VARCHAR(255))")

3. Interacting with the database
Insert after the mycursor variable:
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)

mydb.commit()

print(mycursor.rowcount, "record inserted.")


SQL Transaction
The ACID properties are the basis for studying transaction management.
ACID: Atomicity, Consistency, Isolation, and Durability.
• Atomicity means all or nothing: either all statements of a transaction succeed or none of them take effect. You can group SQL statements into one logical unit, and if any query fails, the whole transaction fails.
• Consistency ensures that the database remains in a consistent state after a transaction is performed.
• Isolation ensures that a transaction is isolated from other transactions.
• Durability means that once a transaction has been committed, it persists in the database regardless of power loss, errors, or system restarts.

Python MySQL Connector provides the following methods to manage database transactions:
• commit: sends a COMMIT statement to the MySQL server for the transaction.
• rollback: reverts the changes made by the transaction.
• Once the program has finished executing the queries that make changes, call the commit() method:
connection.commit()
• If a transaction fails to execute and you want to undo all your changes, call the rollback() method:
connection.rollback()
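The next sketch shows atomicity with commit() and rollback(). It uses sqlite3 in place of MySQL, and the account table is hypothetical. A transfer performs two UPDATEs. If the second one violates a CHECK constraint, rollback() also undoes the first, so no partial transfer survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money as one logical unit: both UPDATEs take effect, or neither does."""
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.commit()
        return True
    except sqlite3.IntegrityError:  # CHECK constraint failed on the second UPDATE
        conn.rollback()             # undo the first UPDATE as well
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

print(transfer(conn, 1, 2, 30), balances(conn))   # True {1: 70, 2: 80}
print(transfer(conn, 2, 1, 500), balances(conn))  # False {1: 70, 2: 80}
```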
Insert Operation
We can insert values into the created table:
sql = "INSERT INTO customers " \
      " (name, address) " \
      " VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)
mydb.commit()

Update Operation
We can update values of the tuples within a table:
sql = "UPDATE customers " \
      "SET address = 'Canyon 123' " \
      "WHERE address = 'Valley 345'"
mycursor.execute(sql)
mydb.commit()

Delete Operation
We can delete rows existing within a table:
mycursor = mydb.cursor()
sql = "DELETE FROM customers " \
      "WHERE address = 'Mountain 21'"
mycursor.execute(sql)
mydb.commit()
print(mycursor.rowcount, "record(s) deleted")

Memory
• Memory is used to store information within a computer, either programs or data.
• The memory hierarchy is organised by considering:
o Cost per storage unit
o Access speed
o Reliability
• The memory hierarchy aims to keep the data that is:
o Most accessed at the top (so it can be accessed quickly)
o Least accessed at the bottom
• Main memory and cache memory are referred to as internal memory, located on the main board. This memory communicates directly with the CPU.
• Secondary and tertiary memory are referred to as external (or auxiliary) memory, because they are not located on the main board; they are used, for example, for back-up purposes.
Memory Hierarchy
Primary storage level:
[Cache] Cache is fast-access memory (static RAM) used by the CPU to speed up the execution of program instructions.
Advantage: it is very fast memory and needs the least access time.
Disadvantage: its very limited capacity and high cost.

[Main memory] Main memory (DRAM: dynamic RAM) provides the main work area for the CPU for keeping program instructions and data.
Advantage: its relatively low cost, which continues to decrease.
Disadvantage: its volatility and lower speed compared with cache.

[Flash memory] Flash memory is high-density, high-performance memory using electrically erasable programmable read-only memory (EEPROM) technology.
Advantage: fast access speed.
Disadvantage: an entire block of data must be erased and written over simultaneously.

[Magnetic disk] A magnetic disk is a type of memory that uses a flat disc covered with a magnetic coating to store programs or files.
Advantage: it is less expensive than RAM and can store large amounts of data.
Disadvantage: data access is slower than with main memory.

[Optical disk] Optical drives are a form of removable optical storage; an example is the DVD (Digital Versatile Disc).
Advantage: it allows mass replication.
Disadvantage: the data is accessed sequentially.

[Magnetic tapes] Magnetic tapes are a type of magnetic memory where one side of the tape is coated with magnetic material.
Advantage: very low cost, which makes them ideal for database backup.
Disadvantage: they are slow, and it is difficult to update data on them.

Is it possible to keep the entire database in main memory?
Yes. This is called a main memory database, and it is used for real-time applications that require extremely fast response times. A backup of the database is still kept on disk.
Example: telephony switching, where routing and line information is stored in main memory.

Memory Access
• Random access: every memory location can be accessed directly rather than in sequence.
o Access time is independent of the location or the previous access.
o Example: RAM
• Sequential access: start at the beginning and read through in order.
o Access time depends on the location of the data and the previous location.
o Example: tape
• Direct access: individual blocks have unique addresses.
o Access is by jumping to the vicinity and then searching sequentially.
o Access time depends on the location of the data and the previous location.
o Example: hard disk

Hard Disk Drive
One of the main storage media for databases is the HDD. Let's take a closer look at the HDD and how data is stored on it.

Block
• A block is the unit of data transfer between disk and memory.
• A block is a logical division of the disk into equal-sized portions, as determined by the operating system.
• Physical read and write operations occur block-wise between the disk and primary memory.

Efficient Data Access
o In real-world databases, the data files are extremely large and may contain billions of data records stored in blocks.
o In such databases, techniques for efficient data access are necessary to bring data into main memory. We now look at some of these techniques.
o Techniques for efficiently accessing data on HDDs:
o Proper organisation of data on disk: given the structure and organisation of data on disk, it is advantageous to keep related data in contiguous blocks.
o This avoids unnecessary movement of the read/write arm and the related seek times.
Blocks in Disk
Common techniques for allocating blocks on disk:
• Contiguous (or consecutive) allocation: file blocks are allocated to consecutive disk blocks. This makes reading the whole file very fast, but it makes expanding the file difficult.
• Linked allocation: each file block contains a pointer to the next file block. This makes it easy to expand the file but slow to read the whole file.
• Indexed allocation: one or more index blocks contain pointers to the actual file blocks. This makes reading faster, but the index blocks cost extra time and space.

Efficient Data Access


Techniques for efficiently accessing data on HDDs:
o Use of log disks to temporarily hold writes: A single disk may be assigned to just one
function. All blocks to be written can go to that disk sequentially, thus eliminating
any seek time.
o This works much faster than doing the writes to a file at random locations, which
requires a seek for each write.
o Proper scheduling of I/O requests: If it is necessary to read several blocks from disk,
total access time can be minimized by scheduling them so that the arm moves only
in one direction and picks up the blocks along its movement.
o Reading data ahead of request: To minimize seek times, whenever a block is read
into the buffer, blocks from the rest of the track can also be read.
This works well for applications that are likely to need consecutive blocks.
o Buffering: deals with the mismatch in speed between the CPU and a slower electromechanical device such as an HDD.
Buffering is done in memory so that new data can be held in a buffer while old data
is processed by an application.
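The scheduling idea above (move the arm in one direction and pick up requested blocks along the way) is often called the elevator or SCAN algorithm. That name, and modelling requests as track numbers, are assumptions of this sketch. It compares the total arm movement of serving requests in arrival order with one upward sweep followed by one downward sweep.

```python
def elevator_order(requests, head):
    """Serve requested tracks sweeping upwards from the head, then downwards (SCAN)."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down

def head_movement(order, head):
    """Total number of tracks the arm crosses when serving `order`."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total

requests = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical track numbers
start = 53
print(head_movement(requests, start))                         # 640 (arrival order)
print(head_movement(elevator_order(requests, start), start))  # 299
```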
Buffer
o Buffer: part of main memory that is available to receive blocks or pages of data from
disk.
o Buffer Manager: is a software component of a DBMS that responds to requests for
data and decides what buffer to use and what pages to replace in the buffer to
accommodate the newly requested blocks.
o In nearly all computer systems, a buffer is a reserved area of main memory that holds one disk block.
o Read: data is transferred from the disk into the buffer. Write: data is transferred from the buffer to the disk.
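The notes leave open which pages the buffer manager replaces. This toy sketch assumes a least-recently-used (LRU) policy, which is one common choice but not stated above. The dict "disk" is hypothetical, and writing modified (dirty) pages back to disk is left out.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer manager with a fixed number of frames and LRU replacement."""

    def __init__(self, n_frames, disk):
        self.n_frames = n_frames
        self.disk = disk              # hypothetical disk: {block_id: data}
        self.frames = OrderedDict()   # block_id -> data, least recently used first
        self.disk_reads = 0

    def get_block(self, block_id):
        if block_id in self.frames:             # hit: no disk access needed
            self.frames.move_to_end(block_id)   # now the most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.n_frames:   # miss with all frames in use:
            self.frames.popitem(last=False)     # evict the least recently used block
        self.disk_reads += 1                    # read the block from disk into a frame
        self.frames[block_id] = self.disk[block_id]
        return self.frames[block_id]

disk = {i: f"block {i}" for i in range(5)}
bm = BufferManager(n_frames=2, disk=disk)
for b in [0, 1, 0, 2, 1]:
    bm.get_block(b)
print(bm.disk_reads, list(bm.frames))  # 4 [2, 1]
```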

The most common buffering techniques:
To understand them, we need to know how processes are run. Processes can run in an interleaved way or in a parallel way.
Process
o Processes A and B run concurrently in an interleaved fashion.
o Processes C and D run concurrently in a parallel fashion.

Double Buffer
o Double buffering: a technique for gaining efficiency when performing I/O operations between the disk and main memory.
o The I/O operation is done in one buffer area while the data in another buffer is being processed.
o Using multiple buffers can increase efficiency by allowing the CPU to process data in one buffer while the disk transmits new data to another buffer.
o While one buffer is being read or written, the CPU can process data in the other buffer.
o This is possible because an independent disk I/O processor can transfer a data block between memory and disk in parallel with CPU processing.
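Here is a small simulation of double buffering. A Python thread stands in for the independent I/O processor, and lists stand in for disk blocks; both are assumptions for illustration. Two pairs of semaphores ensure the reader never overwrites a buffer the "CPU" is still processing.

```python
import threading

def double_buffered_sum(blocks):
    """Sum the values in `blocks` using two buffers: a reader thread (standing in for
    the I/O processor) fills one buffer while the main thread processes the other."""
    buffers = [None, None]
    filled = [threading.Semaphore(0), threading.Semaphore(0)]  # buffer holds new data
    free = [threading.Semaphore(1), threading.Semaphore(1)]    # buffer may be overwritten

    def reader():
        for i, block in enumerate(blocks):
            b = i % 2                  # alternate between buffer 0 and buffer 1
            free[b].acquire()          # wait until the CPU has finished with buffer b
            buffers[b] = list(block)   # "transfer" the block from disk into buffer b
            filled[b].release()

    io = threading.Thread(target=reader)
    io.start()
    total = 0
    for i in range(len(blocks)):
        b = i % 2
        filled[b].acquire()            # wait until buffer b has been filled
        total += sum(buffers[b])       # process it while the other buffer is filled
        free[b].release()
    io.join()
    return total

print(double_buffered_sum([[1, 2], [3], [4, 5, 6]]))  # 21
```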
Database Storage
o Databases typically store large amounts of data that must persist over long periods of time; hence the data is often referred to as persistent data.
o In contrast, transient data persists only for a limited time during program execution.
Records
o Data in a database is stored in the form of records.
o A record consists of a number of fields containing data values.

Files
A file is a sequence of records
o Fixed-length records file: every record in the file has exactly the same size in bytes
o Variable-length records file: the file records are of the same type, but one or more
of the fields are of varying size

Variable-length records occur when:


o Mixed file: The file contains records of different record types
o Repeating field: the records are of the same record type, but one or more fields may
have multiple values for individual records
o Optional fields: the records are of the same record type, but one or more of the
fields are optional
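The difference between fixed-length and variable-length records can be seen at the byte level with Python's struct module. The field layout is an assumption for illustration: Ssn as 9 characters, Name as 20 characters in the fixed-length version, and Salary as a 4-byte integer. The variable-length version stores the name with a 2-byte length prefix. Note that struct silently truncates a name longer than 20 bytes in the fixed layout.

```python
import struct

# Assumed fixed-length layout: Ssn CHAR(9), Name CHAR(20), Salary 4-byte int.
FIXED = struct.Struct("<9s20si")  # "<": no padding, so the size is 9 + 20 + 4 = 33 bytes

def pack_fixed(ssn, name, salary):
    # struct pads short strings with b"\x00" up to the declared length
    return FIXED.pack(ssn.encode(), name.encode(), salary)

def unpack_fixed(data):
    ssn, name, salary = FIXED.unpack(data)
    return ssn.decode(), name.rstrip(b"\x00").decode(), salary

def pack_variable(ssn, name, salary):
    """Variable-length variant: the name is stored with a 2-byte length prefix."""
    n = name.encode()
    return struct.pack("<9sH", ssn.encode(), len(n)) + n + struct.pack("<i", salary)

short = pack_fixed("123456789", "Ann", 30000)
long_ = pack_fixed("987654321", "Bartholomew", 40000)
print(len(short), len(long_))  # 33 33: every record has the same size
print(len(pack_variable("123456789", "Ann", 30000)),
      len(pack_variable("987654321", "Bartholomew", 40000)))  # 18 26
print(unpack_fixed(short))     # ('123456789', 'Ann', 30000)
```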

Record Blocking
• Spanned organisation: one record may be stored in several blocks.
• Unspanned organisation: no "spill-over" of records across blocks.
o Block size B [bytes]
o Record size R [bytes]
o Blocking factor bfr: if B ≥ R, then
$bfr = \lfloor B / R \rfloor$ [records per block]

How can we calculate the unused space in each block?
$B - (bfr \cdot R)$ [bytes]
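The two formulas translate directly into code. The block and record sizes used below are hypothetical.

```python
def blocking_factor(B, R):
    """bfr = floor(B / R) records per block (unspanned organisation, B >= R)."""
    if B < R:
        raise ValueError("record larger than block; a spanned organisation is needed")
    return B // R

def unused_space(B, R):
    """Bytes left over in each block: B - bfr * R."""
    return B - blocking_factor(B, R) * R

print(blocking_factor(512, 100), unused_space(512, 100))  # 5 12
```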
Disk Access Time
Disk access time consists of:
o Seek time: time for the disk heads to move to the correct cylinder
o Rotational delay: time for the desired block to rotate under the disk head
o Transfer time: time to read/write the data in the block (= time for the disk to rotate over the block)
Disk Access Time = Seek time + Rotational delay + Transfer time
Example:
Seek time = 40 [msec]
Rotational delay = 10 [msec]
Block transfer time = 1 [msec]
The time to locate and transfer one single block is:
Disk Access Time = 40 + 10 + 1 = 51 [msec]

Typical average values:
Seek time (on average) ≈ 5 ms
Rotational delay (on average) ≈ 4.2 ms (for 7200 RPM, i.e. half a revolution)
Here, the average seek time is roughly the time needed to move the heads across half of the cylinders.

For sequential access to the next block on the same track:
Seek time = 0 (the data is on the same track)
Rotational delay = 0 (the data is in the next block on the track)
This is ≈ 10 times faster than random disk access!
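Here is a small calculator for the formulas above. It is idealised: sequential reads are assumed to need no further seeks or rotational delays after the first block, so track and cylinder switches are ignored. The 100-block example reuses the 40/10/1 ms figures.

```python
def avg_rotational_delay_ms(rpm):
    """On average the block is half a revolution away: 0.5 * 60000 / rpm ms."""
    return 0.5 * 60_000 / rpm

def access_time_ms(seek, rot_delay, transfer):
    return seek + rot_delay + transfer

def random_read_ms(n_blocks, seek, rot_delay, transfer):
    """Every block needs its own seek and rotational delay."""
    return n_blocks * access_time_ms(seek, rot_delay, transfer)

def sequential_read_ms(n_blocks, seek, rot_delay, transfer):
    """Consecutive blocks: one seek and one rotational delay, then transfers only."""
    return seek + rot_delay + n_blocks * transfer

print(access_time_ms(40, 10, 1))                # 51
print(round(avg_rotational_delay_ms(7200), 1))  # 4.2
print(random_read_ms(100, 40, 10, 1), sequential_read_ms(100, 40, 10, 1))  # 5100 150
```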

Memory
Memory is used to store information within a computer, either programs or data.
A memory system is a collection of various forms of memory, constructed in a hierarchy.
The memory hierarchy is organised by considering:
- Cost per storage unit
- Access speed
- Reliability

Operations on Files
Operations for locating and accessing file records vary from system to system:
o Open
o Reset
o Find (or locate)
o Read (or get)
o FindNext
o Delete
o Modify
o Insert
o Close
File Header
The file header (or file descriptor) contains metadata about the file that is needed to access the file records.
The file header includes information to determine the disk addresses of the file blocks.
The file header may also include:
o field lengths and field types
o the order of fields within (fixed-length) records
o separator characters and record type

Operations on Files
This is nearly the same for different data types, even BLOBs.
A BLOB (Binary Large Object) is a data item that consists of a large unstructured object, representing, for example, images, digitised video or audio streams, or free text.
BLOBs are stored separately from the record, with a pointer directed at them.
But how do we search for and find records? Searching for a record on disk involves:
(a) copying one or a few blocks into main memory,
(b) checking the file header to find the record,
(c) doing a linear search if the file header does not help.

Organising Files
Methods for organising the records of a file so that retrieval and update are optimised:

Heap files
o The simplest organisation, where records are placed in the file in the order in which they are inserted.
o Hence, new records are inserted at the end of the file.
o This organisation is called a heap (or pile) file.
Advantage:
Inserting a new record is efficient.
Disadvantages:
Searching for a record is expensive, since it involves a linear search through the file block by block.
Deletion is expensive.

Sorted files
We can physically order the records of a file based on the values of one of the fields (the ordering field). This leads to an ordered file.
If the ordering field is also a key field of the file, it is guaranteed to have a unique value in each record, and the field is then called the ordering key.
Advantages:
o reading the records in order of the ordering field becomes efficient
o finding the next record from the current one usually requires no additional block accesses
o a search condition based on the value of the ordering key field results in faster access with binary search
Disadvantages:
o there is no benefit when searching by non-ordering fields
o inserting and deleting are expensive (in time)
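The benefit of ordering can be measured in block reads. This sketch uses a hypothetical file of 100 blocks with 10 records each. A linear scan (all a heap file allows) checks blocks one by one. A binary search over the blocks of a file sorted on the ordering key needs about log2(100) block reads.

```python
def heap_search(blocks, key):
    """Heap (unordered) file: linear search block by block.
    Returns (found, number_of_block_reads)."""
    for reads, block in enumerate(blocks, start=1):
        if key in block:
            return True, reads
    return False, len(blocks)


def sorted_search(blocks, key):
    """Sorted file on the ordering key: binary search over the blocks.
    Each block is a sorted list; returns (found, number_of_block_reads)."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        reads += 1                 # one block access per probe
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads
    return False, reads


# Hypothetical file: 100 blocks of 10 records each, keys 0..999 in order
blocks = [list(range(start, start + 10)) for start in range(0, 1000, 10)]
print(heap_search(blocks, 995))    # (True, 100)
print(sorted_search(blocks, 995))  # (True, 7)
```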

Hashing
Another type of primary file organisation is based on hashing, which provides very fast access to records under certain search conditions. This organisation is usually called a hash file.
Hashing is a method for mapping digital data of arbitrary size to data of a fixed size.
In the context of database storage, two forms of hashing exist:
§ Internal hashing
§ External hashing
Hash table: files are organised into an array of m "slots", each containing one record. The address of each slot corresponds to an index of the array.
A hash function is used to transform the hash field value into a slot number between 0 and m − 1.

Alternative Methods
Folding: applying an arithmetic or logical function to different portions of the hash field value to get a hash address.
Example: if m = 1000, how do we store 235469? Split it into '235' and '469', reverse the second part to '964', and compute (235 + 964) mod 1000 = 199.

Collision
A collision occurs when the hash field value of a record being inserted hashes to an address that already contains a different record. In this situation, we must insert the new record in another position, since that address is occupied.
A perfect hash function never creates collisions.
Collision resolution: the process of finding another position when a collision occurs.
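The folding example can be reproduced with a small function. It assumes the key is split into 3-digit parts and that every second part is reversed, which matches the '469' → '964' step in the example.

```python
def fold_hash(key, m=1000, part_len=3):
    """Folding as in the example: split the digits into parts of part_len digits,
    reverse every second part, add the parts, and take the sum mod m."""
    digits = str(key)
    parts = [digits[i:i + part_len] for i in range(0, len(digits), part_len)]
    total = sum(int(p[::-1] if i % 2 else p) for i, p in enumerate(parts))
    return total % m

print(fold_hash(235469))  # 199  ('235' + '964' = 1199, and 1199 mod 1000 = 199)
```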

Other Hash Methods


Examples of collision resolution methods:
• Chaining :
various “overflow” locations are kept,
with a pointer added to each record
location

• Open addressing :
checking the subsequent positions in
order until an empty slot is found

• Multiple hashing :
a second (and third...) hash is applied to
the results of the first hash
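Here are sketches of the first two collision-resolution methods, with the mod hash function and a prime m = 7. Chaining keeps a list per slot, while open addressing (here linear probing) checks the following slots in order. Deletion is omitted; with open addressing it would need special "deleted" markers.

```python
class ChainedHashTable:
    """Chaining: each slot keeps a list (chain) of the records that hash to it."""

    def __init__(self, m=7):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def insert(self, key, value):
        chain = self.slots[key % self.m]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # replace the existing record
                return
        chain.append((key, value))

    def get(self, key):
        for k, v in self.slots[key % self.m]:
            if k == key:
                return v
        return None


class OpenAddressingHashTable:
    """Open addressing (linear probing): on a collision, try the next slots in order."""

    def __init__(self, m=7):
        self.m = m
        self.slots = [None] * m

    def insert(self, key, value):
        start = key % self.m
        for step in range(self.m):
            i = (start + step) % self.m
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return i                  # slot actually used
        raise OverflowError("hash table is full")

    def get(self, key):
        start = key % self.m
        for step in range(self.m):
            i = (start + step) % self.m
            if self.slots[i] is None:     # empty slot: the key cannot be further on
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


open_t = OpenAddressingHashTable()
print(open_t.insert(10, "a"), open_t.insert(17, "b"))  # 3 4: 17 collides with 10
chained = ChainedHashTable()
chained.insert(10, "a")
chained.insert(17, "b")
print(chained.slots[3])  # [(10, 'a'), (17, 'b')]
```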
Good Hash Function
• Distribute records uniformly over the
address space to minimise collisions,
while not leaving many empty spaces
• It is best to keep the hash table 70% to 90% full.
• Choose a prime number for m; it distributes the hash addresses better over the address space when the mod hashing function is used.
External Hashing
To suit the characteristics of disk storage, the target address space is made of buckets, each of which holds multiple records.
The hashing function maps a key into a relative bucket number, rather than assigning an absolute block address to the bucket.
Hashing for disk files is called external hashing.

Organising Files on Disk
• A cluster is a number of blocks that are consecutive on the storage medium.
• A bucket is either one disk block or a cluster of contiguous disk blocks.

Dynamic Hashing
The hashing scheme described so far, for example h(K) = K mod m, is called static hashing, because a fixed number of buckets m is allocated.
We are fixing the address space, which can be a serious drawback for dynamic files.
Example: if the number of records increases a lot, many collisions will result, and retrieval will be slowed down by the long lists of overflow records.
We need a way to make m a variable.
What if h(K) does not produce the number of the row in the table, but a binary number?
What if we use this binary number as the row (bucket) number?
• Internal nodes have two pointers:
o the left pointer corresponds to a 0 bit (in the hashed address)
o the right pointer corresponds to a 1 bit
• Leaf nodes hold a pointer to a bucket with records.

Extendible Hash Tables
• Extendible hashing uses an array of $2^d$ bucket addresses (called the directory), where d is the global depth of the directory.
• The integer value of the first (high-order) d bits of a hash value is used as an index into the array to determine a directory entry.
• The address in that entry determines the bucket in which the corresponding records are stored.
• The local depth d′ specifies the number of bits on which the bucket contents are based.
• When d = d′ and a bucket overflows, the number of entries in the directory doubles.
• Halving occurs if d > d′ for all the buckets after some deletions.
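Here is a minimal sketch of extendible hashing insertion and lookup, following the rules above. The directory is indexed by the first d bits of the hash value, and it doubles only when an overflowing bucket has d′ = d. Otherwise only the bucket splits. Several choices are assumptions for illustration: the hash value is the key's low 8 bits, each bucket holds 2 records, and deletion and directory halving are left out.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth  # d′
        self.capacity = capacity
        self.keys = []


class ExtendibleHashTable:
    """Directory of 2**d bucket pointers, indexed by the first (high-order) d bits
    of the hash value. Assumption: hash value = key mod 2**HASH_BITS."""

    HASH_BITS = 8

    def __init__(self, capacity=2):
        self.global_depth = 0           # d
        self.capacity = capacity
        self.directory = [Bucket(0, capacity)]

    def _hash(self, key):
        return key % (1 << self.HASH_BITS)

    def _index(self, key):
        return self._hash(key) >> (self.HASH_BITS - self.global_depth)

    def find(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            return
        if len(bucket.keys) < bucket.capacity:
            bucket.keys.append(key)
            return
        # Overflow. If d == d′, the directory must double first.
        if bucket.local_depth == self.global_depth:
            if self.global_depth == self.HASH_BITS:
                raise OverflowError("no hash bits left to split on")
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)                # retry; the target bucket may split again

    def _split(self, bucket):
        depth = bucket.local_depth + 1
        zero, one = Bucket(depth, self.capacity), Bucket(depth, self.capacity)
        for k in bucket.keys:           # redistribute on the next hash bit
            bit = (self._hash(k) >> (self.HASH_BITS - depth)) & 1
            (one if bit else zero).keys.append(k)
        for i, b in enumerate(self.directory):
            if b is bucket:             # repoint the entries that shared the old bucket
                bit = (i >> (self.global_depth - depth)) & 1
                self.directory[i] = one if bit else zero


table = ExtendibleHashTable(capacity=2)
for k in (0, 128, 64, 32):              # 00000000, 10000000, 01000000, 00100000
    table.insert(k)
print(table.global_depth, len(table.directory))  # 2 4
print(table.directory[2] is table.directory[3])  # True: one bucket (d′ = 1) serves '10' and '11'
```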

SQL SELECT
SELECT column1, column2, ...
FROM table_name;
The following statement selects the "CustomerName" and "City" columns from the "Customers" table:
SELECT CustomerName, City FROM Customers;

SELECT * FROM table_name;
The following statement selects all the columns from the "Customers" table:
SELECT * FROM Customers;

SELECT DISTINCT
The SELECT DISTINCT statement is used to return only distinct (different) values. Inside a table, a column often contains many duplicate values, and sometimes you only want to list the different (distinct) values.
SELECT DISTINCT column1, column2, ...
FROM table_name;
The following statement selects only the DISTINCT values from the "Country" column in the "Customers" table:
SELECT DISTINCT Country FROM Customers;
The following statement lists the number of different (distinct) customer countries:
SELECT COUNT(DISTINCT Country) FROM Customers;

SELECT Example Without DISTINCT
The following statement selects all values (including the duplicates) from the "Country" column in the "Customers" table:
SELECT Country FROM Customers;
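The three queries can be compared on a tiny hypothetical Customers table using sqlite3. ORDER BY is added to the DISTINCT query only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, City TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [   # hypothetical rows
    ("Alfreds", "Berlin", "Germany"),
    ("Ana", "México D.F.", "Mexico"),
    ("Antonio", "México D.F.", "Mexico"),
    ("Around", "London", "UK"),
])

all_countries = [r[0] for r in conn.execute("SELECT Country FROM Customers")]
distinct = [r[0] for r in conn.execute("SELECT DISTINCT Country FROM Customers ORDER BY Country")]
n_distinct = conn.execute("SELECT COUNT(DISTINCT Country) FROM Customers").fetchone()[0]

print(len(all_countries))  # 4: duplicates included
print(distinct)            # ['Germany', 'Mexico', 'UK']
print(n_distinct)          # 3
```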

The SQL WHERE Clause
The WHERE clause is used to filter records. It is used to extract only those records that fulfill a specified condition.
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example:
SELECT * FROM Customers
WHERE Country='Mexico';

Text Fields vs. Numeric Fields
SQL requires single quotes around text values (most database systems will also allow double quotes). However, numeric fields should not be enclosed in quotes:
SELECT * FROM Customers
WHERE CustomerID=1;

The SQL AND, OR and NOT Operators
AND syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;
The following statement selects all fields from "Customers" where country is "Germany" AND city is "Berlin":
SELECT * FROM Customers
WHERE Country='Germany' AND City='Berlin';

OR syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;
The following statement selects all fields from "Customers" where city is "Berlin" OR "München":
SELECT * FROM Customers
WHERE City='Berlin' OR City='München';
The following statement selects all fields from "Customers" where country is "Germany" OR "Spain":
SELECT * FROM Customers
WHERE Country='Germany' OR Country='Spain';

NOT syntax:
SELECT column1, column2, ...
FROM table_name
WHERE NOT condition;
The following statement selects all fields from "Customers" where country is NOT "Germany":
SELECT * FROM Customers
WHERE NOT Country='Germany';

Combining AND, OR and NOT
The following statement selects all fields from "Customers" where country is "Germany" AND city is either "Berlin" OR "München" (use parentheses to form complex expressions):
SELECT * FROM Customers
WHERE Country='Germany' AND (City='Berlin' OR City='München');
The following statement selects all fields from "Customers" where country is NOT "Germany" and NOT "USA":
SELECT * FROM Customers
WHERE NOT Country='Germany' AND NOT Country='USA';
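The parentheses in the combined example matter because AND binds more tightly than OR. The sqlite3 sketch below runs the query with and without them on made-up rows. The helper pastes the WHERE text into the SQL string, which is acceptable here only because the conditions are fixed literals, not user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, City TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [   # made-up rows
    ("A", "Berlin", "Germany"),
    ("B", "Hamburg", "Germany"),
    ("C", "München", "Germany"),
    ("D", "München", "USA"),
    ("E", "Madrid", "Spain"),
])

def names(where):
    sql = f"SELECT CustomerName FROM Customers WHERE {where} ORDER BY CustomerName"
    return [r[0] for r in conn.execute(sql)]

print(names("Country='Germany' AND (City='Berlin' OR City='München')"))  # ['A', 'C']
# Without parentheses AND binds tighter: (Germany AND Berlin) OR München
print(names("Country='Germany' AND City='Berlin' OR City='München'"))    # ['A', 'C', 'D']
print(names("NOT Country='Germany' AND NOT Country='USA'"))              # ['E']
```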

ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
The following statement selects all customers from the "Customers" table, sorted by the "Country" column:
SELECT * FROM Customers
ORDER BY Country;
The following statement selects all customers from the "Customers" table, sorted DESCENDING by the "Country" column:
SELECT * FROM Customers
ORDER BY Country DESC;

ORDER BY Several Columns Example
The following statement selects all customers from the "Customers" table, sorted ascending by the "Country" column and descending by the "CustomerName" column:
SELECT * FROM Customers
ORDER BY Country ASC, CustomerName DESC;
The following statement selects all customers from the "Customers" table, sorted by the "Country" and the "CustomerName" column. This means that it orders by Country, but if some rows have the same Country, it orders them by CustomerName:
SELECT * FROM Customers
ORDER BY Country, CustomerName;
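A quick sqlite3 check of a mixed sort order on hypothetical rows: Country is sorted ascending, and rows with the same Country are sorted by CustomerName in descending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [   # hypothetical rows
    ("Alfreds", "Germany"), ("Around", "UK"),
    ("Blauer", "Germany"), ("Berglunds", "Sweden"),
])

rows = conn.execute(
    "SELECT CustomerName, Country FROM Customers "
    "ORDER BY Country ASC, CustomerName DESC").fetchall()
print(rows)
# [('Blauer', 'Germany'), ('Alfreds', 'Germany'), ('Berglunds', 'Sweden'), ('Around', 'UK')]
```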

The SQL INSERT INTO Statement
The INSERT INTO statement is used to insert new records in a table. It is possible to write the INSERT INTO statement in two ways:
1. Specify both the column names and the values to be inserted:
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
2. If you are adding values for all the columns of the table, you do not need to specify the column names in the SQL query. However, make sure the order of the values is the same as the order of the columns in the table. Here, the INSERT INTO syntax would be as follows:
INSERT INTO table_name
VALUES (value1, value2, value3, ...);

The following statement inserts a new record in the "Customers" table:
INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');

It is also possible to insert data only in specific columns. The following statement inserts a new record, but only inserts data in the "CustomerName", "City", and "Country" columns (CustomerID will be filled in automatically):
INSERT INTO Customers (CustomerName, City, Country)
VALUES ('Cardinal', 'Stavanger', 'Norway');
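Both INSERT forms can be run in sqlite3. This sketch assumes CustomerID is an auto-increment key (SQLite syntax). It shows that columns left out of the column list are stored as NULL, which sqlite3 returns as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Customers (
    CustomerID   INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerName TEXT, ContactName TEXT, Address TEXT,
    City TEXT, PostalCode TEXT, Country TEXT)""")

# Column list with every data column (CustomerID is still generated)
conn.execute("""INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)
                VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway')""")
# Only some columns: the others get NULL (None in Python)
conn.execute("""INSERT INTO Customers (CustomerName, City, Country)
                VALUES ('Cardinal', 'Stavanger', 'Norway')""")

rows = conn.execute("SELECT CustomerID, CustomerName, ContactName, City "
                    "FROM Customers ORDER BY CustomerID").fetchall()
for row in rows:
    print(row)
# (1, 'Cardinal', 'Tom B. Erichsen', 'Stavanger')
# (2, 'Cardinal', None, 'Stavanger')
```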

NULL VALUE
A field with a NULL value is a field with no value. If a field in a table is optional, it is possible to insert a new record or update a record without adding a value to this field. Then, the field will be saved with a NULL value.

It is not possible to test for NULL values with comparison operators, such as =, <, or <>.

We have to use the IS NULL and IS NOT NULL operators instead.
SELECT column_names
FROM table_name
WHERE column_name IS NULL;

SELECT column_names
FROM table_name
WHERE column_name IS NOT NULL;

The following statement lists all customers with a value in the "Address" field:
SELECT CustomerName, ContactName, Address
FROM Customers
WHERE Address IS NOT NULL;
The following statement lists all customers with a NULL value in the "Address" field:
SELECT CustomerName, ContactName, Address
FROM Customers
WHERE Address IS NULL;
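The sqlite3 sketch below, on hypothetical rows, shows why `= NULL` cannot be used. A comparison with NULL yields NULL rather than true, so the `=` version matches nothing, while IS NULL and IS NOT NULL behave as described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Address TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("Alfreds", "Obere Str. 57"), ("Ana", None)])  # hypothetical rows

def names(where):
    return [r[0] for r in conn.execute(f"SELECT CustomerName FROM Customers WHERE {where}")]

print(names("Address = NULL"))       # []: the comparison is NULL, never true
print(names("Address IS NULL"))      # ['Ana']
print(names("Address IS NOT NULL"))  # ['Alfreds']
```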

The SQL UPDATE Statement


The UPDATE statement is used to modify the existing records in a table.
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

The following updates the first customer (CustomerID = 1) with a new contact person and a new city:
UPDATE Customers
SET ContactName = 'Alfred Schmidt', City = 'Frankfurt'
WHERE CustomerID = 1;

UPDATE Multiple Records
It is the WHERE clause that determines how many records will be updated. The following updates the ContactName to "Juan" for all records where Country is "Mexico":
UPDATE Customers
SET ContactName = 'Juan'
WHERE Country = 'Mexico';

Be careful when updating records. If you omit the WHERE clause, ALL records will be updated:
UPDATE Customers
SET ContactName = 'Juan';
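The effect of the WHERE clause on an UPDATE can be measured with the driver's row count. This is a sketch with Python's sqlite3 module; the table and its three rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, ContactName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers (ContactName, Country) VALUES (?, ?)",
    [("Anna", "Germany"), ("Ben", "Mexico"), ("Carla", "Mexico")],
)

# With a WHERE clause only matching rows change; rowcount reports how many.
cur = conn.execute("UPDATE Customers SET ContactName = 'Juan' WHERE Country = 'Mexico'")
updated_with_where = cur.rowcount

# Without a WHERE clause every row in the table is updated.
cur = conn.execute("UPDATE Customers SET ContactName = 'Juan'")
updated_without_where = cur.rowcount

names_after = conn.execute("SELECT DISTINCT ContactName FROM Customers").fetchall()
print(updated_with_where, updated_without_where, names_after)  # 2 3 [('Juan',)]
```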

The SQL DELETE Statement


The DELETE statement is used to delete existing records in a table.
DELETE FROM table_name WHERE condition;

The following SQL statement deletes the customer "Alfreds Futterkiste" from the "Customers" table:
DELETE FROM Customers WHERE CustomerName='Alfreds Futterkiste';

It is possible to delete all rows in a table without deleting the table. This means that the table structure, attributes, and indexes will be intact:
DELETE FROM table_name;

The following SQL statement deletes all rows in the "Customers" table, without deleting the table:
DELETE FROM Customers;
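To see that a DELETE without WHERE empties the table but keeps it, here is a small sketch using Python's sqlite3 module. The one-column table is an assumption, and `sqlite_master` is SQLite's own catalog table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?)", [("Alfreds Futterkiste",), ("Cardinal",)])

# Targeted delete: only the matching row is removed.
conn.execute("DELETE FROM Customers WHERE CustomerName = 'Alfreds Futterkiste'")
remaining = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]

# Delete without WHERE: all rows go, but the table definition survives.
conn.execute("DELETE FROM Customers")
after_delete_all = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
table_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'Customers'"
).fetchone()[0] == 1

print(remaining, after_delete_all, table_exists)  # 1 0 True
```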
The SQL MIN() and MAX() Functions
The MIN() function returns the smallest value of the selected column, and the MAX() function returns the largest value.
SELECT MIN(column_name)
FROM table_name
WHERE condition;

SELECT MAX(column_name)
FROM table_name
WHERE condition;

The following finds the price of the cheapest product:
SELECT MIN(Price) AS SmallestPrice
FROM Products;

The following finds the price of the most expensive product:
SELECT MAX(Price) AS LargestPrice
FROM Products;
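The two queries can be run side by side in a short sketch with Python's sqlite3 module; the Products rows and prices are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("Tea", 18.0), ("Coffee", 19.0), ("Syrup", 10.0)],
)

# Each aggregate collapses the whole column into a single value.
smallest = conn.execute("SELECT MIN(Price) AS SmallestPrice FROM Products").fetchone()[0]
largest = conn.execute("SELECT MAX(Price) AS LargestPrice FROM Products").fetchone()[0]
print(smallest, largest)  # 10.0 19.0
```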

The SQL COUNT(), AVG() and SUM() Functions


The COUNT() function returns the number of rows that match a specified criterion.
SELECT COUNT(column_name)
FROM table_name
WHERE condition;

The following finds the number of products:
SELECT COUNT(ProductID)
FROM Products;
[NULL values are not counted. COUNT(*) counts every row, including rows with NULLs.]

The AVG() function returns the average value of a numeric column.
SELECT AVG(column_name)
FROM table_name
WHERE condition;

The following finds the average price of all products:
SELECT AVG(Price)
FROM Products;
[NULL values are ignored.]

The SUM() function returns the total sum of a numeric column.
SELECT SUM(column_name)
FROM table_name
WHERE condition;

The following finds the sum of the "Quantity" fields in the "OrderDetails" table:
SELECT SUM(Quantity)
FROM OrderDetails;
[NULL values are ignored.]
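The bracketed notes about NULL can be checked with a sketch in Python's sqlite3 module. The "OrderDetails" rows are hypothetical, with one Quantity deliberately left NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderDetails (OrderDetailID INTEGER PRIMARY KEY, Quantity INTEGER)")
# Sample rows; one Quantity is NULL to show how the aggregates treat it.
conn.executemany(
    "INSERT INTO OrderDetails (Quantity) VALUES (?)",
    [(12,), (10,), (None,), (5,)],
)

count_all, count_qty, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(Quantity), SUM(Quantity), AVG(Quantity) FROM OrderDetails"
).fetchone()
print(count_all, count_qty, total, average)  # 4 3 27 9.0
```

AVG() divides by the 3 non-NULL values (27 / 3 = 9.0), not by the 4 rows, which would give 6.75.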

The SQL IN Operator


The IN operator lets you specify multiple values in a WHERE clause; it is a shorthand for multiple OR conditions.
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);

The list of values can also come from a subquery:
SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);

The following selects all customers that are located in "Germany", "France" or "UK":
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK');

The following selects all customers that are NOT located in "Germany", "France" or "UK":
SELECT * FROM Customers
WHERE Country NOT IN ('Germany', 'France', 'UK');

The following selects all customers that are from the same countries as the suppliers:
SELECT * FROM Customers
WHERE Country IN (SELECT Country FROM Suppliers);
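The three IN variants can be compared on toy data in a sketch with Python's sqlite3 module. The customer and supplier rows are made-up assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.execute("CREATE TABLE Suppliers (SupplierName TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("A", "Germany"), ("B", "Mexico"), ("C", "UK"), ("D", "Spain")])
conn.executemany("INSERT INTO Suppliers VALUES (?, ?)", [("S1", "Spain"), ("S2", "UK")])

def names(sql):
    """Return the selected customer names in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))

in_list = names("SELECT CustomerName FROM Customers "
                "WHERE Country IN ('Germany', 'France', 'UK')")
not_in_list = names("SELECT CustomerName FROM Customers "
                    "WHERE Country NOT IN ('Germany', 'France', 'UK')")
in_subquery = names("SELECT CustomerName FROM Customers "
                    "WHERE Country IN (SELECT Country FROM Suppliers)")
print(in_list, not_in_list, in_subquery)  # ['A', 'C'] ['B', 'D'] ['C', 'D']
```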
SQL Aliases
SQL aliases are used to give a table, or a column in a table, a temporary name. Aliases are
often used to make column names more readable. An alias only exists for the duration of
that query.

Column alias syntax:
SELECT column_name AS alias_name
FROM table_name;

Table alias syntax:
SELECT column_name(s)
FROM table_name AS alias_name;

The following creates two aliases, one for the CustomerID column and one for the CustomerName column:
SELECT CustomerID AS ID, CustomerName AS Customer
FROM Customers;

The following creates two aliases, one for the CustomerName column and one for the ContactName column. Note: an alias name that contains spaces must be enclosed in double quotation marks or square brackets:
SELECT CustomerName AS Customer, ContactName AS [Contact Person]
FROM Customers;

The following creates an alias named "Address" that combines four columns (Address, PostalCode, City and Country), using SQL Server's + string concatenation:
SELECT CustomerName, Address + ', ' + PostalCode + ' ' + City + ', ' + Country AS Address
FROM Customers;

To get the SQL statement above to work in MySQL, use the following:
SELECT CustomerName, CONCAT(Address, ', ', PostalCode, ', ', City, ', ', Country) AS Address
FROM Customers;
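SQLite uses yet another concatenation operator, the standard `||`. The following sketch with Python's sqlite3 module shows that aliases, including a double-quoted one containing a space, become the result's column names. The single sample row reuses the "Cardinal" values from the INSERT example; the table layout is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, ContactName TEXT, Address TEXT, "
             "PostalCode TEXT, City TEXT, Country TEXT)")
conn.execute("INSERT INTO Customers VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', "
             "'4006', 'Stavanger', 'Norway')")

cur = conn.execute(
    'SELECT CustomerName AS Customer, ContactName AS "Contact Person", '
    "Address || ', ' || PostalCode || ' ' || City || ', ' || Country AS Address "
    "FROM Customers"
)
column_names = [d[0] for d in cur.description]  # aliases appear as the result column names
row = cur.fetchone()
print(column_names)  # ['Customer', 'Contact Person', 'Address']
print(row)           # ('Cardinal', 'Tom B. Erichsen', 'Skagen 21, 4006 Stavanger, Norway')
```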

SQL JOIN
(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from
the right table
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from
the left table
FULL (OUTER) JOIN: Returns all records from both tables, matching rows from the left and right table where possible
INNER JOIN syntax:
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;

The following selects all orders with customer information:
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
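An INNER JOIN drops rows that have no partner in the other table. This sketch with Python's sqlite3 module shows that on hypothetical data where one order refers to a customer that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Alfreds'), (2, 'Ana'), (3, 'Antonio');
    -- Order 12 points at customer 99, who does not exist.
    INSERT INTO Orders VALUES (10, 1), (11, 3), (12, 99);
""")

inner = conn.execute("""
    SELECT Orders.OrderID, Customers.CustomerName
    FROM Orders
    INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()
print(inner)  # [(10, 'Alfreds'), (11, 'Antonio')]
```

Order 12 and customer Ana (who has no orders) are both missing from the result, because an inner join keeps only matched pairs.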

The following selects all orders with customer and shipper information by joining three tables:
SELECT Orders.OrderID, Customers.CustomerName, Shippers.ShipperName
FROM ((Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID)
INNER JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID);

The LEFT JOIN keyword returns all records from the left table (table1), and the matching records from the right table (table2). If there is no match, the columns from the right side are NULL.
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;

The following selects all customers, and any orders they might have:
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;

The RIGHT JOIN keyword returns all records from the right table (table2), and the matching records from the left table (table1). If there is no match, the columns from the left side are NULL.
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;
The following returns all employees, and any orders they might have placed:
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;

The FULL OUTER JOIN keyword returns all records from both the left (table1) and right (table2) tables: matching rows are combined, and rows without a match get NULLs for the other table's columns.
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;
The following selects all customers, and all orders:
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;

A self join is a regular join, but the table is joined with itself. T1 and T2 are two different aliases for the same table:
SELECT column_name(s)
FROM table1 T1, table1 T2
WHERE condition;

The following matches customers that are from the same city:
SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;
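The LEFT JOIN and the self join can both be tried on one small dataset in a sketch with Python's sqlite3 module. The customers, cities and orders are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Alfreds', 'Berlin'), (2, 'Ana', 'London'),
                                 (3, 'Antonio', 'London');
    INSERT INTO Orders VALUES (10, 1), (11, 3);
""")

# LEFT JOIN: every customer appears; Ana has no order, so her OrderID is NULL (None).
left = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerName
""").fetchall()

# Self join: the same table under aliases A and B, pairing customers in the same city.
same_city = conn.execute("""
    SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2, A.City
    FROM Customers A, Customers B
    WHERE A.CustomerID <> B.CustomerID
    AND A.City = B.City
    ORDER BY A.City, A.CustomerName
""").fetchall()
print(left)       # [('Alfreds', 10), ('Ana', None), ('Antonio', 11)]
print(same_city)  # [('Ana', 'Antonio', 'London'), ('Antonio', 'Ana', 'London')]
```

With `<>` each matching pair appears twice, once in each order; using `A.CustomerID < B.CustomerID` instead would keep only one row per pair.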
The SQL UNION Operator
The UNION operator combines the result sets of two or more SELECT statements. Every SELECT statement within a UNION must have the same number of columns, the columns must have compatible data types, and they must be in the same order.
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL:
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;

The following returns the cities (only distinct values) from both the "Customers" and the "Suppliers" table:
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;

The following returns the cities (duplicate values also) from both the "Customers" and the "Suppliers" table:
SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers
ORDER BY City;

The following returns the German cities (only distinct values) from both the "Customers" and the "Suppliers" table:
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;

The following returns the German cities (duplicate values also) from both the "Customers" and the "Suppliers" table:
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION ALL
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;

The following lists all customers and suppliers, with a "Type" column showing where each row came from:
SELECT 'Customer' AS Type, ContactName, City, Country
FROM Customers
UNION
SELECT 'Supplier', ContactName, City, Country
FROM Suppliers;
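The effect of UNION versus UNION ALL on duplicates can be seen in a sketch with Python's sqlite3 module. The city values are made-up sample data; the operator is inserted into the query text by a small helper for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (City TEXT);
    CREATE TABLE Suppliers (City TEXT);
    INSERT INTO Customers VALUES ('Berlin'), ('London'), ('London');
    INSERT INTO Suppliers VALUES ('London'), ('Tokyo');
""")

def cities(op):
    """Combine both City columns with the given set operator (UNION or UNION ALL)."""
    sql = f"SELECT City FROM Customers {op} SELECT City FROM Suppliers ORDER BY City"
    return [row[0] for row in conn.execute(sql)]

distinct_cities = cities("UNION")
all_cities = cities("UNION ALL")
print(distinct_cities)  # ['Berlin', 'London', 'Tokyo']
print(all_cities)       # ['Berlin', 'London', 'London', 'London', 'Tokyo']
```

The final ORDER BY sorts the combined result, not just the second SELECT.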

The SQL GROUP BY Statement


The GROUP BY statement groups rows that have the same values into summary rows, such as "the number of customers in each country". It is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to compute one result per group.
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

The following lists the number of customers in each country:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

The following lists the number of customers in each country, sorted high to low:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;

The following lists the number of orders sent by each shipper:
SELECT Shippers.ShipperName, COUNT(Orders.OrderID) AS NumberOfOrders
FROM Orders
LEFT JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID
GROUP BY ShipperName;
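The "sorted high to low" query can be run in a sketch with Python's sqlite3 module; the six customers and their countries are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers (Country) VALUES (?)",
    [("Germany",), ("Mexico",), ("Germany",), ("UK",), ("Germany",), ("Mexico",)],
)

# One output row per distinct Country, largest group first.
per_country = conn.execute("""
    SELECT COUNT(CustomerID), Country
    FROM Customers
    GROUP BY Country
    ORDER BY COUNT(CustomerID) DESC
""").fetchall()
print(per_country)  # [(3, 'Germany'), (2, 'Mexico'), (1, 'UK')]
```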

SQL SELECT INTO Statement


The SELECT INTO statement copies data from one table into a new table. It is supported by SQL Server and MS Access (the IN 'Backup.mdb' example below uses an Access database file); MySQL does not support SELECT ... INTO newtable, and CREATE TABLE ... AS SELECT can be used there instead.

Copy all columns into a new table:
SELECT *
INTO newtable [IN externaldb]
FROM oldtable
WHERE condition;

Copy only some columns into a new table:
SELECT column1, column2, column3, ...
INTO newtable [IN externaldb]
FROM oldtable
WHERE condition;

The following creates a backup copy of Customers:
SELECT * INTO CustomersBackup2017
FROM Customers;

The following uses the IN clause to copy the table into a new table in another database:
SELECT * INTO CustomersBackup2017 IN 'Backup.mdb'
FROM Customers;

The following copies only a few columns into a new table:
SELECT CustomerName, ContactName INTO CustomersBackup2017
FROM Customers;

The following copies only the German customers into a new table:
SELECT * INTO CustomersGermany
FROM Customers
WHERE Country = 'Germany';

The following copies data from more than one table into a new table:
SELECT Customers.CustomerName, Orders.OrderID
INTO CustomersOrderBackup2017
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

The SQL INSERT INTO SELECT Statement


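The INSERT INTO SELECT statement copies data from one table and inserts it into an existing table. The data types of the source and target columns must be compatible, and records already in the target table are left unaffected.

Copy all columns from one table to another table:
INSERT INTO table2
SELECT * FROM table1
WHERE condition;

Copy only some columns from one table into another table:
INSERT INTO table2 (column1, column2, column3, ...)
SELECT column1, column2, column3, ...
FROM table1
WHERE condition;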
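Here is a sketch of the column-list form using Python's sqlite3 module, copying matching suppliers into an existing "Customers" table. Both table layouts and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Suppliers (SupplierName TEXT, City TEXT, Country TEXT);
    CREATE TABLE Customers (CustomerName TEXT, City TEXT, Country TEXT);
    INSERT INTO Suppliers VALUES ('Exotic Liquid', 'London', 'UK'),
                                 ('Tokyo Traders', 'Tokyo', 'Japan');
    INSERT INTO Customers VALUES ('Cardinal', 'Stavanger', 'Norway');
""")

# Copy selected columns of the matching Suppliers rows into the existing Customers table.
cur = conn.execute(
    "INSERT INTO Customers (CustomerName, City, Country) "
    "SELECT SupplierName, City, Country FROM Suppliers "
    "WHERE Country = 'UK'"
)
copied = cur.rowcount
total = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
print(copied, total)  # 1 2
```

The existing "Cardinal" row is untouched; only the one UK supplier is appended.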

DATABASES
The CREATE statement can be used to create schemas, tables, domains, and other objects. The CREATE DATABASE statement creates a new SQL database:
CREATE DATABASE databasename;

The DROP DATABASE statement is used to drop an existing SQL database:
DROP DATABASE databasename;


The CREATE TABLE statement is used to create a new table in a database:
CREATE TABLE Persons (
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
);

A copy of an existing table can also be created using CREATE TABLE. The new table gets the same column definitions, and all columns or only specific columns can be selected. If you create a new table from an existing table, the new table is filled with the existing values from the old table:
CREATE TABLE new_table_name AS
SELECT column1, column2,...
FROM existing_table_name
WHERE ....;
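CREATE TABLE ... AS SELECT is also how SQLite builds a table from a query, since it has no SELECT INTO. The following sketch uses Python's sqlite3 module; the source table and rows are hypothetical, and `PRAGMA table_info` is SQLite's way to list a table's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerName TEXT, ContactName TEXT, Country TEXT);
    INSERT INTO Customers VALUES ('Cardinal', 'Tom B. Erichsen', 'Norway'),
                                 ('Sample GmbH', 'Anna', 'Germany');
""")

# The new table takes the selected columns and is filled with the matching rows.
conn.execute("""
    CREATE TABLE CustomersGermany AS
    SELECT CustomerName, Country
    FROM Customers
    WHERE Country = 'Germany'
""")
columns = [row[1] for row in conn.execute("PRAGMA table_info(CustomersGermany)")]
rows = conn.execute("SELECT * FROM CustomersGermany").fetchall()
print(columns, rows)  # ['CustomerName', 'Country'] [('Sample GmbH', 'Germany')]
```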
The following ensures that the "ID", "LastName", and "FirstName" columns will NOT accept NULL values when the "Persons" table is created:
CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255) NOT NULL,
    Age int
);

The following creates a UNIQUE constraint on the "ID" column when the "Persons" table is created:
CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    UNIQUE (ID)
);
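Constraint violations surface as errors that the application can catch. The following sketch creates the UNIQUE variant of "Persons" in SQLite through Python's sqlite3 module and tries three inserts. The `rejected` helper and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        ID int NOT NULL,
        LastName varchar(255) NOT NULL,
        FirstName varchar(255),
        Age int,
        UNIQUE (ID)
    )
""")
conn.execute("INSERT INTO Persons (ID, LastName) VALUES (1, 'Hansen')")

def rejected(sql):
    """Return True if the database refuses the statement with a constraint error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

null_last_name = rejected("INSERT INTO Persons (ID, LastName) VALUES (2, NULL)")   # NOT NULL
duplicate_id = rejected("INSERT INTO Persons (ID, LastName) VALUES (1, 'Svendson')")  # UNIQUE
valid_row = rejected("INSERT INTO Persons (ID, LastName) VALUES (2, 'Svendson')")     # accepted
print(null_last_name, duplicate_id, valid_row)  # True True False
```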
The following creates a CHECK constraint on the "Age" column when the "Persons" table is created. The CHECK constraint ensures that the age of a person must be 18 or older.
MySQL:
CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    CHECK (Age>=18)
);
SQL, with the constraint written inline on the column:
CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int CHECK (Age>=18)
);

The following creates a FOREIGN KEY on the "PersonID" column when the "Orders" table is created.
MySQL:
CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
);
To give the FOREIGN KEY constraint a name, use CONSTRAINT:
CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    CONSTRAINT FK_PersonOrder FOREIGN KEY (PersonID)
    REFERENCES Persons(PersonID)
);
SQL Server:
CREATE TABLE Orders (
    OrderID int NOT NULL PRIMARY KEY,
    OrderNumber int NOT NULL,
    PersonID int FOREIGN KEY REFERENCES Persons(PersonID)
);

PRIMARY KEY:
The following creates a PRIMARY KEY on the "ID" column when the "Persons" table is created:
CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    PRIMARY KEY (ID)
);
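The CHECK, PRIMARY KEY and named FOREIGN KEY constraints can be exercised together in a sketch with Python's sqlite3 module. Assumptions: a trimmed "Persons" table keyed on PersonID (so the foreign key has something to reference), the hypothetical `rejected` helper, and made-up rows. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default
conn.executescript("""
    CREATE TABLE Persons (
        PersonID int NOT NULL,
        LastName varchar(255) NOT NULL,
        Age int CHECK (Age >= 18),
        PRIMARY KEY (PersonID)
    );
    CREATE TABLE Orders (
        OrderID int NOT NULL,
        OrderNumber int NOT NULL,
        PersonID int,
        PRIMARY KEY (OrderID),
        CONSTRAINT FK_PersonOrder FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
    );
    INSERT INTO Persons VALUES (1, 'Hansen', 30);
""")

def rejected(sql):
    """Return True if the database refuses the statement with a constraint error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

too_young = rejected("INSERT INTO Persons VALUES (2, 'Svendson', 16)")  # CHECK fails
missing_person = rejected("INSERT INTO Orders VALUES (10, 77895, 99)")  # no person 99
valid_order = rejected("INSERT INTO Orders VALUES (11, 44678, 1)")      # person 1 exists
print(too_young, missing_person, valid_order)  # True True False
```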

SQL Commands (MySQL)

• List the databases that have been created: SHOW DATABASES;
• Check which database is currently in use: SELECT DATABASE();
• List the columns of a table: SHOW COLUMNS FROM TableName;
  This is equivalent to: DESC TableName;
