CS 564
Lecture #1
1
Yes. This is the Room for CS 564
• We moved from Humanities 1111
• All future lectures/discussions will be in this
room
• Please sit a bit closer to the screen, so that I
don’t have to shout
2
A Bit about Myself
Born in Vietnam
Grew up in a fishing village
as HaiAn Doan
3
Vietnam Hungary US
High school in Vietnam
Undergrad in Hungary
– had a lot of beers
– learned seven languages
– Hungarian, English, C, C++, Ada, Pascal, PL/I
4
Wisconsin Seattle Illinois Wisconsin
Masters at Wisconsin-Milwaukee
Ph.D. at Washington-Seattle
– where I failed to take “CS 564”
5
Random Comments from Students
• Take instruction seriously, … gave lots of really
excellent dating advice
• All in-class examples revolve around beer
• His accent is very annoying …
• His accent is great. It’s so hard to understand
that I’m forced to concentrate in lectures …
• His accent is a bonus feature of the class.
Prepared me to work in Silicon Valley
• I now love databases …When I own Oracle, I
will pay you back.
6
What is this Course about?
• Numerous applications must deal with a lot of
data
[Figure: App 1 and App 2 alongside DB 1, DB 2, and DB 3]
8
Questions
• What form should the data be in?
– way back in the 1970s, people suggested storing data in
tables
– so each database is a set of tables
Students
ID  First Name  Last Name
1   Barack      Obama
2   George      Bush

Addresses
ID  City           State
1   Washington DC  Washington DC
2   Dallas         TX
9
Questions
• What form should the data be in?
– each table can be thought of as a relation in the
mathematical sense
– so such a database is referred to as a relational DB
Students
ID  First Name  Last Name
1   Barack      Obama
2   George      Bush

Addresses
ID  City           State
1   Washington DC  Washington DC
2   Dallas         TX
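
The two tables can be written down in an actual RDBMS. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The choice of SQLite, the column names (first_name, last_name, city, state), and the reading of the Addresses ID as the student's ID are assumptions made for illustration; the slides do not prescribe them.

```python
import sqlite3

# In-memory database: nothing is written to disk (an illustrative choice).
conn = sqlite3.connect(":memory:")

# Each table is a relation: a set of rows over named columns.
conn.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
conn.execute("CREATE TABLE Addresses (id INTEGER PRIMARY KEY, city TEXT, state TEXT)")

conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [(1, "Barack", "Obama"), (2, "George", "Bush")])
conn.executemany("INSERT INTO Addresses VALUES (?, ?, ?)",
                 [(1, "Washington DC", "Washington DC"), (2, "Dallas", "TX")])

# Combine the two relations on their shared ID column (assumed to match).
rows = conn.execute(
    "SELECT s.first_name, s.last_name, a.city "
    "FROM Students s JOIN Addresses a ON s.id = a.id "
    "ORDER BY s.id"
).fetchall()
print(rows)
```

Keeping the data in two tables and combining them only when needed is the basic pattern the rest of the course builds on.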
10
So the management system is called
a relational database management
system (or RDBMS for short)
[Figure: App 1 and App 2 alongside DB 1, DB 2, and DB 3]
11
Since the 1970s, RDBMSs have been
studied intensively, and have taken
over the world
• It is now a cornerstone of the modern world
• Powering virtually all data-intensive apps
• A $20B industry
• Bought island in Hawaii
• Since then, new types of data have emerged
– that are not well suited to being modeled as
tables
12
• New types of database management systems
have also emerged
– e.g., NoSQL systems
• But RDBMSs remain foundational and
pervasive, and will be so in the future
• This class focuses on RDBMSs
– we will learn how to design a relational database
– how to store it in an RDBMS
– how to use an RDBMS
– look into the internals of RDBMS
13
• Lessons that you learn in this class will carry
over to newer types of database management
systems
• You will learn fundamentals of managing a
large amount of data
– critical as the world is becoming increasingly data
centric
• Good for you when you go applying for a job
– many jobs require knowing how to use RDBMSs
• It’s fun
14
• If you are interested in more data management
stuff
– CS 764: gory details about RDBMSs
– CS 784: newer types of data and how to manage them
(beyond RDBMSs)
15
Course Logistics
16
Prerequisite
• Must have data structure and algorithm background
– CS 367 is a must; CS 537 might be useful
• For the project
– a lot of programming will be required
– in a high-level language of your own choosing (or rather your
team’s choosing)
– could be Java, C, C++, Perl, Python, etc.
– must know how to build a Web-based application or be willing to
learn
17
Textbook
– There is no ideal textbook, unfortunately
– Database Management Systems, by R. Ramakrishnan
and J. Gehrke, third edition
– Database Systems: The Complete Book, by Garcia-
Molina, Ullman and Widom, second edition
18
Course Format
• For all students
– two 75-min lectures / week
– project: programming, 4-5 stages, may include some
basic homework questions
– a midterm and a final exam
• Attending lectures on Wed/Fri is important
• We also use the Mon slots occasionally for
make-up lectures
• So if you can’t make Monday 2:25-3:15, do not
take the class
19
• In fact, for next week I’m traveling on W and
F
• So we will have a make-up lecture on Monday,
Jan 26
20
Lectures
• Lecture slides in ppt format will be posted
shortly before or after the lecture
– they are meant to complement the lectures
• Many issues discussed in the lectures will be
covered in the exams
– hence try to attend lectures regularly
• Will not cover ALL materials on the slides
– attending lectures will tell you which is covered and
which is not
21
Project
• Select an application that needs a database
• Build a database application from start to
finish
• Significant amount of programming
• Will be done in stages
– you will submit some work at the end of each stage
• May have to show a demo at semester end
22
Project Groups
• Project will be done in groups of 3-4 students
– a lot of work, difficult to design so that
one person can do all
– learn how to work in a group: valuable skills
– groups are like broccoli, they are good for you
• Try to form groups as soon as possible
– can start by posting requests on Piazza
• There will be a deadline later for forming groups
• If you have not formed groups by then
– we will help assign you to groups
23
More on Grouping
• All group members receive the same grade
• If someone drops out, the rest pick up the
work
24
Exams
• Midterm & final
– will be announced shortly
– check dates and make sure no conflict!
• There may be some brief review before each
exam
• If you have conflicts
– do let us know in advance
• The Uncle problem
25
Tentative Grading Breakdown
• Midterm: 25%
• Final: 35%
• Project: 40%
• Will attempt to grade on an absolute scale as
much as possible
– not on a curve
26
Contacting the staff ...
27
Staff & Office Hours
• Instructor: AnHai Doan
• TAs:
– Avinaash Gupta
– Harneet Singh
• See class homepage for office hours, contact
information
28
Communications
• class homepage
– www.cs.wisc.edu/~anhai/courses/564-sp15
• mailing list: compsci564-1-s15@lists.wisc.edu
– vitally important!
– make sure to check it regularly for new announcements
• Piazza: will be set up shortly
• If you have a question/problem
– talk to people in your group first
– post your question on Piazza
– email TA
– go to office hours to talk to TA or instructor
29
Now onto database studies ...
30
At the Beginning
• A program typically consists of code + data
• E.g., need to sort 1000 numbers
– 2, 4, 6, 8, 1, 13, 9, ...
• Store these numbers in an array
• Write some code to sort
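
As a minimal sketch of "code + data" for this example: the list below holds only the numbers that the slide actually shows (the full 1000 are not given), and Python's built-in sorted function stands in for "some code to sort".

```python
# The data: a small array of numbers (only the values listed on the slide).
numbers = [2, 4, 6, 8, 1, 13, 9]

# The code: sort the array in memory.
sorted_numbers = sorted(numbers)
print(sorted_numbers)
```

Here the data lives only inside the running program, which is exactly what stops working once the data must outlive the program or be shared between applications.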
32
An Illustration
[Figure: App 1 and App 2 alongside DB 1, DB 2, and DB 3]
33
Another Motivating Example
• Suppose we want to store, manipulate, and
query information about:
– students
– courses
– professors
– who takes what, who teaches what
34
Application Requirements
• store the data for a long period of time
– large amounts (100s of GB)
– protect against crashes
– protect against unauthorized use
• allow users to query/update:
– who teaches “CS 367”
– enroll “Mary” in “CS 564”
35
• allow several (100s, 1000s) users to access the
data simultaneously
• allow administrators to change the schema
– add information about TAs
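
The query, update, and schema-change requirements can be sketched against a small database. This is only an illustration using Python's sqlite3 module; the table layouts (Teaches, Takes, TAs) and the row for "Prof. Smith" are hypothetical and not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: who teaches what, who takes what.
conn.execute("CREATE TABLE Teaches (professor TEXT, course TEXT)")
conn.execute("CREATE TABLE Takes (student TEXT, course TEXT)")
conn.execute("INSERT INTO Teaches VALUES ('Prof. Smith', 'CS 367')")  # made-up row

# Query: who teaches "CS 367"?
teachers = [r[0] for r in conn.execute(
    "SELECT professor FROM Teaches WHERE course = ?", ("CS 367",))]

# Update: enroll "Mary" in "CS 564".
conn.execute("INSERT INTO Takes VALUES (?, ?)", ("Mary", "CS 564"))
conn.commit()

# Schema change: add information about TAs without touching existing tables.
conn.execute("CREATE TABLE TAs (name TEXT, course TEXT)")

enrolled = conn.execute("SELECT student, course FROM Takes").fetchall()
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(teachers, enrolled, tables)
```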
36
Trying Without a DBMS
• Why Direct Implementation Won’t Work:
• Storing data: file system is limited
– file size is limited to 4 GB (on 32-bit machines)
– when the system crashes we may lose data
– password-based authorization insufficient
• Query/update:
– need to write a new C++/Java program for every new
query
– need to worry about performance
37
• Concurrency: limited protection
– need to worry about interfering with other users
– need to offer different views to different users (e.g.
registrar, students, professors)
• Schema change:
– entails changing file formats
– need to rewrite virtually all applications
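
To make the query and schema-change problems concrete, here is a sketch of the direct approach: data kept in a flat file and one hand-written function per query. The file layout (professor,course) and the names are hypothetical, and an in-memory string stands in for a file on disk.

```python
import csv
import io

# Hypothetical flat file standing in for a file on disk: professor,course
TEACHES_FILE = "Smith,CS 367\nJones,CS 564\n"

def who_teaches(course):
    """Hand-written 'query': scan every line and compare the second column."""
    reader = csv.reader(io.StringIO(TEACHES_FILE))
    return [row[0] for row in reader if row[1] == course]

print(who_teaches("CS 367"))

# If the file format changes (say a 'quarter' column is inserted first),
# the hard-coded positions row[0] and row[1] silently mean something else,
# so every program like this one must be rewritten.
NEW_FORMAT = "fall,Smith,CS 367\n"
new_row = next(csv.reader(io.StringIO(NEW_FORMAT)))
print(new_row[1])  # position 1 now holds the professor, not the course
```

Every new question about the data needs another function like who_teaches, and none of them survive a change to the file layout.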
38
What Can a DBMS Do for Us?
• Data Definition Language - DDL
• Data Manipulation Language - DML
– query language
• Storage management
• Transaction Management
– concurrency control
– recovery
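
A small sketch of how these pieces look in practice, using Python's sqlite3 module. The Takes table and its UNIQUE constraint are illustrative assumptions. The point is only to show one DDL statement, some DML, and a transaction that is rolled back as a unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema (table and constraint are illustrative).
conn.execute("CREATE TABLE Takes (student TEXT, course TEXT, UNIQUE (student, course))")

# DML: insert a row and make it permanent.
conn.execute("INSERT INTO Takes VALUES ('Mary', 'CS 564')")
conn.commit()

# Transaction management: either both inserts below happen or neither does.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO Takes VALUES ('Mary', 'CS 367')")
        conn.execute("INSERT INTO Takes VALUES ('Mary', 'CS 564')")  # duplicate: fails
except sqlite3.IntegrityError:
    pass  # the first insert of this transaction was rolled back too

# DML (query language): read the data back.
courses = [r[0] for r in conn.execute(
    "SELECT course FROM Takes WHERE student = 'Mary'")]
print(courses)
```

After the failed transaction only the originally committed row remains, which is the all-or-nothing behavior that transaction management provides.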
[Diagram labels: Professor, Teaches, Advises, quarter]
Students:
SSN          Name     Category
123-45-6789  Charles  undergrad
234-56-7890  Dan      grad
…            …

Takes:
SSN          CID
123-45-6789  CSE444
123-45-6789  CSE444
234-56-7890  CSE142
…

Courses:
CID     Name               Quarter
CSE444  Databases          fall
CSE541  Operating systems  winter
select C.name
from Students S, Takes T, Courses C
where S.name = 'Mary' and
      S.ssn = T.ssn and T.cid = C.cid
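
The query can be run as written against a small database. The sketch below uses Python's sqlite3 module and the sample rows from the tables. Because the sample data contains no student named Mary, two rows marked as hypothetical are added so that the query returns something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (ssn TEXT, name TEXT, category TEXT);
    CREATE TABLE Takes (ssn TEXT, cid TEXT);
    CREATE TABLE Courses (cid TEXT, name TEXT, quarter TEXT);
    INSERT INTO Students VALUES ('123-45-6789', 'Charles', 'undergrad'),
                                ('234-56-7890', 'Dan', 'grad'),
                                ('345-67-8901', 'Mary', 'undergrad');  -- hypothetical row
    INSERT INTO Takes VALUES ('123-45-6789', 'CSE444'),
                             ('234-56-7890', 'CSE142'),
                             ('345-67-8901', 'CSE444');                -- hypothetical row
    INSERT INTO Courses VALUES ('CSE444', 'Databases', 'fall'),
                               ('CSE541', 'Operating systems', 'winter');
""")

# The declarative query: it says what is wanted, not how to compute it.
result = conn.execute("""
    SELECT C.name
    FROM Students S, Takes T, Courses C
    WHERE S.name = 'Mary' AND S.ssn = T.ssn AND T.cid = C.cid
""").fetchall()
print(result)
```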
44
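Query Optimization
Goal: turn a declarative SQL query into an imperative query execution plan

Declarative SQL query:
select C.name
from Students S, Takes T, Courses C
where S.name = 'Mary' and
      S.ssn = T.ssn and T.cid = C.cid

Imperative query execution plan:
[Plan diagram; surviving labels: sname, cid=cid, name='Mary']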
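
To show what "imperative" means here, the following sketch hand-executes one possible plan for the query in plain Python: select, join, join, project. The operator order, the nested-loop and hash-lookup choices, and the two 'Mary' rows are assumptions for illustration. A real optimizer picks among many such plans, and this is not necessarily the plan drawn on the slide.

```python
# Sample rows as plain Python tuples; the 'Mary' rows are hypothetical.
students = [("123-45-6789", "Charles", "undergrad"),
            ("234-56-7890", "Dan", "grad"),
            ("345-67-8901", "Mary", "undergrad")]
takes = [("123-45-6789", "CSE444"), ("234-56-7890", "CSE142"),
         ("345-67-8901", "CSE444")]
courses = [("CSE444", "Databases", "fall"),
           ("CSE541", "Operating systems", "winter")]

# Step 1: selection, keep only students named Mary.
marys = [s for s in students if s[1] == "Mary"]

# Step 2: join with Takes on ssn (nested loops).
mary_takes = [(s, t) for s in marys for t in takes if s[0] == t[0]]

# Step 3: join with Courses on cid, using a hash lookup keyed by cid.
course_by_cid = {c[0]: c for c in courses}
joined = [(s, t, course_by_cid[t[1]])
          for s, t in mary_takes if t[1] in course_by_cid]

# Step 4: projection, keep only the course name.
plan_result = [c[1] for _, _, c in joined]
print(plan_result)
```

Filtering for Mary first keeps the intermediate results small; joining all three tables before filtering would give the same answer with more work, which is the kind of trade-off query optimization weighs.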
47