Professional Documents
Culture Documents
Adbs 1
Adbs 1
Christos Tjortjis
2017
International Hellenic University
School of Science & Technology
MSc in Data Science, MSc in ICT Systems, MSc in Mobile &
Web Computing
Contact details
Christos Tjortjis
Assistant Professor in Decision Support & Knowledge
Discovery Systems
Director for the MSc in Data Science, MSc in ICT
Systems, MSc in Communications & Cyber Security
Office B1.4
email: c.tjortjis@ihu.edu.gr
Telephone: +30 2310 807576
Course aim
• The course aims to familiarise students with
contemporary database systems, as well as
emerging database technologies.
• It discusses basic aspects of DB techniques
and exposes a set of tools and technologies
that can be used along with “core” DB systems.
• Students are expected to engage in practical
database system design through coursework.
The emphasis is on general concepts and
theoretical foundations.
Learning outcomes
• Develop the logical model of a relational database.
• Use essential SQL tools to program DB systems.
• Understand advanced concepts of DM management
and architecture.
• Organize, store and process data efficiently, using
contemporary technologies such as Data Warehouses
and XML.
• Understand and apply various emerging technologies,
including Business Intelligence and Data Mining,
OLAP, Information Retrieval, and Search engines.
• Understand and utilise knowledge extracted from large
volumes of data.
Course content & organization
• ER model, relational model
• SQL
• Indexing
• Hashing
• Data warehousing and OLAP
• Data Mining
• Information Retrieval
• Web Search
• XML basic and advanced concepts
Assessment
• Homework
– 30%
• Submission deadline 24/1/18 at 5 p.m.
• Written final exam
– 70%
Reading
• Silberschatz A., Korth H., and Sudarshan S., (2011),
Database System Concepts, 6th Ed., McGraw-Hill.
– http://db-book.com/
• Ramakrishnan R, Gehrke J. (2002), Database
Management Systems, 3rd ed., McGraw-Hill.
– http://pages.cs.wisc.edu/~dbbook/
• Elmasri R., Navathe S. B., (2016), Fundamentals of
Database Systems, 7th Ed., Pearson.
– https://www.pearson.com/us/higher-education/program/Elmasri-Fundamentals-
of-Database-Systems-7th-Edition/PGM189052.html
– Class notes
Additional References
• Garcia-Molina H., Ullman J., and Widom J., (2009),
Database Systems: The Complete Book, 2nd ed., Pearson.
– http://infolab.stanford.edu/~ullman/dscb.html
Advanced Database Systems
Introduction
based on: Silberschatz A., Korth H., and Sudarshan S., (2011), Database System Concepts,
Christos Tjortjis
2017
International Hellenic University
School of Science & Technology
MSc in Data Science, MSc in ICT Systems, MSc in Mobile &
Web Computing
Database Management System
(DBMS)
• A DBMS contains information about a particular enterprise
– Collection of interrelated data
– Set of programs to access the data
– Environment that is convenient and efficient to use
• Database Applications:
– Banking: transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Online retailers: order tracking, customized recommendations
– Manufacturing: production, inventory, orders, supply chain
– Human resources: employee records, salaries, tax deductions
• Databases can be very large.
• Databases touch all aspects of our lives
University Database Example
Rows
Database
DB System Internals
DB Architecture
The architecture of a DB system is greatly
influenced by the underlying computer system
on which the DB is running:
• Centralized
• Client-server
• Parallel (multi-processor)
• Distributed
Client-server architectures
History of DB Systems
• 1950s and early 1960s:
– Data processing using magnetic tapes for storage
• Tapes provided only sequential access
– Punched cards for input
• Late 1960s and 1970s:
– Hard disks allowed direct access to data
– Network and hierarchical data models in widespread use
– Ted Codd defines the relational data model
• Would win the ACM Turing Award for this work
• IBM Research begins System R prototype
• UC Berkeley begins Ingres prototype
– High-performance (for the era) transaction processing
• 1980s:
– Research relational prototypes evolve into commercial
systems
• SQL becomes industrial standard
– Parallel and distributed DB systems
– Object-oriented DB systems
• 1990s:
– Large decision support and data-mining applications
– Large multi-terabyte data warehouses
– Emergence of Web commerce
• Early 2000s:
– XML and XQuery standards
– Automated DB administration
• Later 2000s:
– Giant data storage systems
• Google BigTable, Yahoo PNuts, Amazon, ..
Revision questions
1. List four applications you have used that most likely
employed a database system to store persistent data.
• If a dept. has >1 instructor, the building name and budget get repeated
multiple times.
– Updates to the building name and budget may get performed on some of the copies but not
others, resulting in an inconsistent state where it is not clear what is the actual building
name and budget of a dept.
• A dept. needs to have at least one instructor in order for building and budget
information to be included in the table.
– Nulls can be used when there is no instructor, but null values are rather difficult to handle.
• If all instructors in a dept. are deleted, the building and budget information
are also lost.
– Ideally, we would like to have the dept. information in the DB irrespective of whether it has
an associated instructor or not, without resorting to null values..
Advanced Database Systems
ER model
based on: Silberschatz A., Korth H., and Sudarshan S., (2011), Database System Concepts,
Christos Tjortjis
2017
International Hellenic University
School of Science & Technology
tuples
(or rows)
Attribute Types
• The set of allowed values for each attribute is
called the domain of the attribute
• Attribute values are (normally) required to be
atomic; that is, indivisible
• The special value null is a member of every
domain
• The null value causes complications in the
definition of many operations
Relation Schema and Instance
• A1, A2, …, An are attributes
• R = (A1, A2, …, An ) is a relation schema
Example:
instructor = (ID, name, dept_name, salary)
• Formally, given sets D1, D2, …. Dn a relation r is a
subset of
D1 x D2 x … x Dn
Thus, a relation is a set of n-tuples (a1, a2, …, an)
where each ai Di
The current values (relation instance) of a relation are
specified by a table
An element t of r is a tuple, represented by a row in a table
Relations are Unordered
Order of tuples is irrelevant (tuples may be stored in
an arbitrary order)
Example: instructor relation with unordered tuples
Database
• A database consists of multiple relations
• Information about an enterprise is broken up into parts
instructor, student, advisor
• Bad design:
univ (instructor -ID, name, dept_name, salary, student_Id, ..)
results in
– repetition of information (e.g., two students have the same
instructor)
– the need for null values (e.g., represent a student with no
advisor)
• Normalization theory deals with how to design “good”
relational schemas
Keys
• Let K R
• K is a superkey of R if values for K are sufficient to identify a
unique tuple of each possible relation r(R)
– Example: {ID} and {ID,name} are both superkeys of instructor.
• Superkey K is a candidate key if K is minimal
Example: {ID} is a candidate key for Instructor
• One of the candidate keys is selected to be the primary key.
– which one?
• Foreign key constraint: Value in one relation must appear in
another
– Referencing relation
– Referenced relation
Schema Diagram for University DB
Relational Query Languages
• Procedural vs. non-procedural, or declarative
• “Pure” languages:
– Relational algebra
– Tuple relational calculus
– Domain relational calculus
• Relational operators
Selection of tuples
Relation r
Select A and C
Projection
Π A, C (r)
Joining two relations – Cartesian
Product
Relations r, s:
r x s:
Union of two relations
• Relations r, s:
r s:
Set difference of two relations
• Relations r, s:
r – s:
Set Intersection of two relations
• Relation r, s:
• rs
Joining two relations – Natural Join
• Let r and s be relations on schemas R and S
respectively.
Then, the “natural join” of relations R and S is
a relation on schema R S obtained as
follows:
– Consider each pair of tuples tr from r and ts from s.
– If tr and ts have the same value on each of the
attributes in R S, add a tuple t to the result,
where
• t has the same value as tr on r
• t has the same value as ts on s
Natural Join Example
• Relations r, s:
Natural Join
r s
Figure in-2.1