
CSE 412 Database Management

Lecture 9
SQL
Jia Zou
Arizona State University

Practice
Problem 15 (2 points): Return the names of all parts that are supplied from both the nation with s_nationkey 6 and the nation with s_nationkey 12.

Main Ideas:
1. First, link each supplying relationship (PartSupp) to the nation key of its supplier by joining PartSupp with Supplier. Temporarily call the resulting table PartSuppNationkey:
PartSupp(ps_partkey, ps_suppkey, …) ⨝ ps_suppkey = s_suppkey Supplier(s_suppkey, s_nationkey, …)
→ PartSuppNationkey(ps_partkey, ps_suppkey, s_nationkey, …)
2. The problem can be solved by a self join over PartSuppNationkey. Why? Each row carries the nation of a single supplier, so no single row can show that a part comes from both nations; we have to pair up two rows that refer to the same part:
PartSuppNationkey (PSN1) ⨝ PSN1.ps_partkey = PSN2.ps_partkey PartSuppNationkey (PSN2)
For example, if PartSuppNationkey contains

ps_partkey | ps_suppkey | s_nationkey
123        | 5          | 6
123        | 81         | 12

then the output of the self join includes the rows

ps_partkey | PSN1.ps_suppkey | PSN1.s_nationkey | PSN2.ps_suppkey | PSN2.s_nationkey
123        | 5               | 6                | 81              | 12
123        | 81              | 12               | 5               | 6

(as well as the rows that pair each tuple with itself).
3. A selection on the output,
σ PSN1.s_nationkey = 6 ∧ PSN2.s_nationkey = 12,
keeps only the parts supplied from both nations (the first row above). Joining the result with Part on ps_partkey = p_partkey then gives the requested part names.

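A minimal runnable sketch of steps 1 to 3, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for PostgreSQL. The table and column names follow the schema above; the rows and part names are hypothetical. Part 456 is supplied only from nation 6, so only part 123 should come back.

```python
import sqlite3

# SQLite stand-in for PostgreSQL; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Part(p_partkey INTEGER PRIMARY KEY, p_name TEXT);
CREATE TABLE Supplier(s_suppkey INTEGER PRIMARY KEY, s_nationkey INTEGER);
CREATE TABLE PartSupp(ps_partkey INTEGER, ps_suppkey INTEGER);
INSERT INTO Part VALUES (123, 'bolt'), (456, 'nut');
INSERT INTO Supplier VALUES (5, 6), (81, 12), (7, 6);
INSERT INTO PartSupp VALUES (123, 5), (123, 81), (456, 5), (456, 7);
""")

query = """
WITH PartSuppNationkey AS (            -- step 1: attach each supplier's nation
    SELECT ps_partkey, ps_suppkey, s_nationkey
    FROM PartSupp JOIN Supplier ON ps_suppkey = s_suppkey
)
SELECT DISTINCT p_name
FROM PartSuppNationkey AS PSN1
JOIN PartSuppNationkey AS PSN2
  ON PSN1.ps_partkey = PSN2.ps_partkey  -- step 2: self join on the part
JOIN Part ON p_partkey = PSN1.ps_partkey
WHERE PSN1.s_nationkey = 6 AND PSN2.s_nationkey = 12  -- step 3: selection
"""
names = [name for (name,) in conn.execute(query)]
print(names)  # ['bolt']
```

DISTINCT is used because a part with several suppliers in each nation would otherwise appear once per matching pair.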
Review
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries

Grouping
• SELECT … FROM … WHERE …
GROUP BY list_of_columns

• Example: compute average popularity for each age


SELECT age, AVG(pop)
FROM User
GROUP BY age
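The grouping query can be tried out with a small sketch. It assumes SQLite via Python's sqlite3 module and a hypothetical User table with made-up rows. In PostgreSQL, USER is a reserved word, so a real table with this name would have to be quoted as "User". SQLite accepts it unquoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample users: two aged 10, two aged 8
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
INSERT INTO User VALUES
  (142, 'Bart', 10, 0.75), (123, 'Milhouse', 10, 0.25),
  (857, 'Lisa', 8, 0.5), (456, 'Ralph', 8, 0.25);
""")
# One output row per distinct age; AVG is computed within each group
rows = conn.execute(
    "SELECT age, AVG(pop) FROM User GROUP BY age ORDER BY age"
).fetchall()
print(rows)  # [(8, 0.375), (10, 0.5)]
```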

Having
• Used to filter groups based on the group properties (e.g., aggregate
values, GROUP BY column values)
• SELECT … FROM … WHERE … GROUP BY …
HAVING condition;

The difference between HAVING and WHERE
• 1. Use HAVING to filter groups based on group properties (e.g., aggregate
values, GROUP BY column values)
• 2. Use WHERE to filter tuples based on any attributes of the table(s) in
the FROM clause
• 3. If the attribute being filtered is the GROUP BY attribute, either HAVING
or WHERE works (WHERE removes rows before grouping; HAVING removes whole groups afterwards)
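The three rules can be checked side by side in a short sketch. It assumes SQLite via Python's sqlite3 and hypothetical User rows. A filter on the GROUP BY column gives the same answer either way. A filter on an aggregate can only go in HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
INSERT INTO User VALUES
  (142, 'Bart', 10, 0.75), (123, 'Milhouse', 10, 0.25),
  (857, 'Lisa', 8, 0.5), (456, 'Ralph', 8, 0.25);
""")
# Rule 3: filtering on the GROUP BY column works with WHERE or with HAVING
via_where = conn.execute(
    "SELECT age, COUNT(*) FROM User WHERE age > 8 GROUP BY age").fetchall()
via_having = conn.execute(
    "SELECT age, COUNT(*) FROM User GROUP BY age HAVING age > 8").fetchall()
# Rule 1: filtering on an aggregate value needs HAVING
popular_ages = conn.execute(
    "SELECT age FROM User GROUP BY age HAVING AVG(pop) > 0.4").fetchall()
print(via_where, via_having, popular_ages)  # [(10, 2)] [(10, 2)] [(10,)]
```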

Table subqueries
• Use query result as a table
• In set operations, FROM clauses, SELECT list, WHERE clause etc.
• A way to “nest” queries
• Example: names of users who poked others but never got poked
SELECT DISTINCT name
FROM User, ((SELECT uid1 AS uid FROM Poke)
            EXCEPT
            (SELECT uid2 AS uid FROM Poke)) AS T
WHERE User.uid = T.uid;
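The EXCEPT subquery is shown running below. This sketch assumes SQLite via Python's sqlite3, with hypothetical User and Poke rows where Poke(uid1, uid2) means uid1 poked uid2. The parentheses around each operand of EXCEPT are dropped for SQLite. Lisa pokes but is never poked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Poke(uid1 INTEGER, uid2 INTEGER);  -- uid1 poked uid2
INSERT INTO User VALUES (142, 'Bart'), (123, 'Milhouse'), (857, 'Lisa');
INSERT INTO Poke VALUES (142, 123), (123, 142), (857, 123);
""")
# The subquery result T is used exactly like a table in FROM
names = [n for (n,) in conn.execute("""
    SELECT DISTINCT name
    FROM User, (SELECT uid1 AS uid FROM Poke
                EXCEPT
                SELECT uid2 AS uid FROM Poke) AS T
    WHERE User.uid = T.uid""")]
print(names)  # ['Lisa']
```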

IN subqueries
• x IN (subquery) checks if x is in the result of subquery
• Example: users at the same age as (some) Bart
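The example query is not in this text. One natural IN formulation is sketched here, assuming SQLite via Python's sqlite3 and hypothetical users. Note that Bart himself is part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO User VALUES (142, 'Bart', 10), (123, 'Milhouse', 10),
                        (857, 'Lisa', 8), (456, 'Ralph', 9);
""")
# age IN (...) is true if age equals any age returned by the subquery
same_age = [n for (n,) in conn.execute("""
    SELECT name FROM User
    WHERE age IN (SELECT age FROM User WHERE name = 'Bart')
    ORDER BY name""")]
print(same_age)  # ['Bart', 'Milhouse']
```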

Exists subqueries
• Exists (subquery) checks if the result of subquery is non-empty
• Example: users at the same age as (some) Bart

• This happens to be a correlated subquery: a subquery that references
tuple variables in surrounding queries

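The slide's query image is missing. A plausible correlated EXISTS version is sketched below, assuming SQLite via Python's sqlite3 and hypothetical users. The inner query mentions u.age from the outer query, so it is conceptually re-evaluated for each outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO User VALUES (142, 'Bart', 10), (123, 'Milhouse', 10),
                        (857, 'Lisa', 8), (456, 'Ralph', 9);
""")
# Correlated: the subquery refers to u, a tuple variable of the outer query
same_age = [n for (n,) in conn.execute("""
    SELECT name FROM User AS u
    WHERE EXISTS (SELECT * FROM User AS b
                  WHERE b.name = 'Bart' AND b.age = u.age)
    ORDER BY name""")]
print(same_age)  # ['Bart', 'Milhouse']
```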
More examples
• Which users are the most popular?
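The slide's query did not survive extraction. In standard SQL, one formulation is SELECT * FROM User WHERE pop >= ALL (SELECT pop FROM User). SQLite does not support ALL/SOME comparisons. This sketch therefore assumes SQLite via Python's sqlite3 and uses an equivalent NOT EXISTS (equivalent when pop has no NULLs), plus a MAX scalar subquery, on hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, pop REAL);
INSERT INTO User VALUES (142, 'Bart', 0.75), (123, 'Milhouse', 0.25),
                        (857, 'Lisa', 0.5), (456, 'Ralph', 0.25);
""")
# Same meaning as pop >= ALL (...): nobody is strictly more popular than u
via_not_exists = [n for (n,) in conn.execute("""
    SELECT name FROM User AS u
    WHERE NOT EXISTS (SELECT * FROM User AS v WHERE v.pop > u.pop)
    ORDER BY name""")]
# Alternative: compare against the maximum, computed by a scalar subquery
via_max = [n for (n,) in conn.execute("""
    SELECT name FROM User
    WHERE pop = (SELECT MAX(pop) FROM User)
    ORDER BY name""")]
print(via_not_exists, via_max)  # ['Bart'] ['Bart']
```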

More examples
• Which users are not the least popular?
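• SELECT *
FROM User
WHERE pop > SOME (SELECT pop FROM User);
• Note the strict >: with >=, every user (with a non-NULL pop) would qualify, because each user's pop is at least his or her own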
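The > SOME query can be checked with a short sketch. It assumes SQLite via Python's sqlite3 and hypothetical rows. SQLite lacks SOME, so the sketch rewrites pop > SOME (...) as an EXISTS that looks for someone strictly less popular. It also shows that the >= variant returns everybody.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, pop REAL);
INSERT INTO User VALUES (142, 'Bart', 0.75), (123, 'Milhouse', 0.25),
                        (857, 'Lisa', 0.5), (456, 'Ralph', 0.25);
""")
# pop > SOME (...): some user v is strictly less popular than u
not_least = [n for (n,) in conn.execute("""
    SELECT name FROM User AS u
    WHERE EXISTS (SELECT * FROM User AS v WHERE v.pop < u.pop)
    ORDER BY name""")]
# pop >= SOME (...): always satisfied, since v = u itself qualifies
everyone = [n for (n,) in conn.execute("""
    SELECT name FROM User AS u
    WHERE EXISTS (SELECT * FROM User AS v WHERE v.pop <= u.pop)
    ORDER BY name""")]
print(not_least)  # ['Bart', 'Lisa']
print(everyone)   # ['Bart', 'Lisa', 'Milhouse', 'Ralph']
```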

More examples
• Which part has the highest retail price? (Return the partkey)
• A first query works, but it is slow… Is there a better approach?
• A faster query runs at the millisecond level
• It uses a scalar subquery!

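The slides' two queries are not in this text. One common scalar-subquery formulation is sketched here: the maximum price is computed once and each part is compared with that single value. It assumes SQLite via Python's sqlite3, a reduced Part table, and hypothetical prices, so it says nothing about the timings quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Part(p_partkey INTEGER PRIMARY KEY, p_retailprice REAL);
INSERT INTO Part VALUES (1, 901.0), (2, 1500.5), (3, 1500.5), (4, 99.0);
""")
# The scalar subquery yields one value: the highest retail price
top = [k for (k,) in conn.execute("""
    SELECT p_partkey FROM Part
    WHERE p_retailprice = (SELECT MAX(p_retailprice) FROM Part)
    ORDER BY p_partkey""")]
print(top)  # [2, 3]: ties are all returned
```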
Scalar subqueries
• A query that returns a single row (with a single column) can be used as a value in WHERE, SELECT,
etc.
• Example: users at the same age as Bart
• Runtime error if the subquery returns more than one row
• Under what condition will this error never occur?
• What if the subquery returns no rows?
• The answer is treated as the special value NULL; a comparison with NULL evaluates to UNKNOWN, so the row is not returned
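The no-rows case is shown below. This sketch assumes SQLite via Python's sqlite3 and hypothetical users. PostgreSQL raises an error when a scalar subquery returns more than one row, but SQLite silently uses the first row, so only the no-rows case is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO User VALUES (142, 'Bart', 10), (123, 'Milhouse', 10),
                        (857, 'Lisa', 8);
""")

def same_age_as(name):
    # The scalar subquery yields that user's age, or NULL if nobody matches
    return [n for (n,) in conn.execute(
        "SELECT name FROM User "
        "WHERE age = (SELECT age FROM User WHERE name = ?) ORDER BY name",
        (name,))]

print(same_age_as('Bart'))    # ['Bart', 'Milhouse']
print(same_age_as('Nobody'))  # []: age = NULL is UNKNOWN for every row
```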
Scalar subqueries Example
• Return the names of customers whose account balance (c_acctbal) is
above average (larger than the average account balance of all customers)

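One way to write this, as a sketch, is to compare each balance with a scalar subquery computing the average. It assumes SQLite via Python's sqlite3, a Customer table reduced to three columns, and hypothetical balances (average 300).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer(c_custkey INTEGER PRIMARY KEY, c_name TEXT, c_acctbal REAL);
INSERT INTO Customer VALUES (1, 'Customer#1', 100.0), (2, 'Customer#2', 300.0),
                            (3, 'Customer#3', 500.0);
""")
# The scalar subquery returns the single average balance
above = [n for (n,) in conn.execute("""
    SELECT c_name FROM Customer
    WHERE c_acctbal > (SELECT AVG(c_acctbal) FROM Customer)""")]
print(above)  # ['Customer#3']
```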
Scalar subqueries Example
• For each customer, return his/her name and the difference between his/her
account balance and the average account balance of all customers.
• Output columns: CName, AccountBal - AVGBal

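A scalar subquery can also appear in the SELECT list, as this sketch shows. It assumes SQLite via Python's sqlite3 and the same kind of hypothetical Customer rows (average 300). The column alias diff is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer(c_custkey INTEGER PRIMARY KEY, c_name TEXT, c_acctbal REAL);
INSERT INTO Customer VALUES (1, 'Customer#1', 100.0), (2, 'Customer#2', 300.0),
                            (3, 'Customer#3', 500.0);
""")
# The subquery produces one value that is subtracted from every row's balance
diffs = conn.execute("""
    SELECT c_name, c_acctbal - (SELECT AVG(c_acctbal) FROM Customer) AS diff
    FROM Customer
    ORDER BY c_name""").fetchall()
print(diffs)  # [('Customer#1', -200.0), ('Customer#2', 0.0), ('Customer#3', 200.0)]
```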
Select in WITH
• The basic value of SELECT in WITH is to break down complicated
queries into simpler parts. An example is:

More Examples
• For each customer, returns his/her name and the difference between
his account balance and the average account balance of all
customers.
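The same question can be broken into a named part with WITH. This sketch assumes SQLite via Python's sqlite3 and hypothetical Customer rows. The helper name AvgBal and its column value are illustrative choices, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer(c_custkey INTEGER PRIMARY KEY, c_name TEXT, c_acctbal REAL);
INSERT INTO Customer VALUES (1, 'Customer#1', 100.0), (2, 'Customer#2', 300.0),
                            (3, 'Customer#3', 500.0);
""")
# AvgBal is a one-row, one-column temporary table defined by WITH
diffs = conn.execute("""
    WITH AvgBal(value) AS (SELECT AVG(c_acctbal) FROM Customer)
    SELECT c_name, c_acctbal - AvgBal.value AS diff
    FROM Customer, AvgBal
    ORDER BY c_name""").fetchall()
print(diffs)  # [('Customer#1', -200.0), ('Customer#2', 0.0), ('Customer#3', 200.0)]
```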

Today’s Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages

INSERT
• Insert one row
• INSERT INTO Member VALUES (789, 'dps');
• User 789 joins Dead Putting Society
• Insert the result of a query
• INSERT INTO Member
(SELECT uid, 'dps'
FROM User
WHERE uid NOT IN
(SELECT uid FROM Member WHERE gid = 'dps'));
• Everybody joins Dead Putting Society!
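Both forms of INSERT are sketched below. The sketch assumes SQLite via Python's sqlite3 and hypothetical User/Member rows. The outer parentheses around the SELECT are dropped for SQLite. The NOT IN condition keeps user 789 from being inserted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Member(uid INTEGER, gid TEXT);
INSERT INTO User VALUES (142, 'Bart'), (123, 'Milhouse'), (789, 'Nelson');
""")
# Insert one row
conn.execute("INSERT INTO Member VALUES (789, 'dps')")
# Insert the result of a query: every user not yet in 'dps'
conn.execute("""
    INSERT INTO Member
    SELECT uid, 'dps' FROM User
    WHERE uid NOT IN (SELECT uid FROM Member WHERE gid = 'dps')""")
dps = sorted(uid for (uid,) in conn.execute(
    "SELECT uid FROM Member WHERE gid = 'dps'"))
print(dps)  # [123, 142, 789]
```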

Update
• Example: User 142 changes name to “Barney”
• UPDATE User
SET name = 'Barney'
WHERE uid = 142;
• Example: We are all popular!
• UPDATE User
SET pop = (SELECT AVG(pop) FROM User);
• But won't updating every row cause the average pop to change?
➢ The subquery is always computed over the old table

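The "old table" rule can be observed directly, as a sketch assuming SQLite via Python's sqlite3 and hypothetical pops 0.25, 0.5 and 0.75 (average 0.5). If the subquery were re-evaluated after each row changed, later rows would get values other than 0.5. The final check confirms that they do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, pop REAL);
INSERT INTO User VALUES (142, 'Bart', 0.25), (123, 'Milhouse', 0.5),
                        (857, 'Lisa', 0.75);
""")
conn.execute("UPDATE User SET name = 'Barney' WHERE uid = 142")
# The average is taken over the table as it was before the UPDATE
conn.execute("UPDATE User SET pop = (SELECT AVG(pop) FROM User)")
pops = [p for (p,) in conn.execute("SELECT pop FROM User ORDER BY uid")]
name_142 = conn.execute("SELECT name FROM User WHERE uid = 142").fetchone()[0]
print(pops, name_142)  # [0.5, 0.5, 0.5] Barney
```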
Delete
• Delete everything from a table
• DELETE FROM Member;
• Delete according to a WHERE condition
• Example: User 789 leaves Dead Putting Society
• DELETE FROM Member
WHERE uid = 789 AND gid = 'dps';
• Example: Users under age 18 must be removed from United Nuclear
Workers
• DELETE FROM Member
WHERE uid IN
(SELECT uid FROM User WHERE age < 18)
AND gid = 'nuk';
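Both DELETE examples are run in the sketch below. It assumes SQLite via Python's sqlite3 and hypothetical rows. After both deletions, only the adult member of 'nuk' remains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE Member(uid INTEGER, gid TEXT);
INSERT INTO User VALUES (142, 'Bart', 10), (789, 'Nelson', 10), (857, 'Homer', 39);
INSERT INTO Member VALUES (789, 'dps'), (142, 'nuk'), (857, 'nuk');
""")
# Delete according to a simple WHERE condition
conn.execute("DELETE FROM Member WHERE uid = 789 AND gid = 'dps'")
# Delete according to a condition with a subquery: minors leave 'nuk'
conn.execute("""
    DELETE FROM Member
    WHERE uid IN (SELECT uid FROM User WHERE age < 18)
      AND gid = 'nuk'""")
remaining = conn.execute("SELECT uid, gid FROM Member").fetchall()
print(remaining)  # [(857, 'nuk')]
```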

Import from csv file or text file
• Much faster than executing a large batch of insertion statements.

More Practice
• Please return the name of all suppliers who supply the largest number of parts

• SubQuery 1: count the distinct parts supplied by each supplier
WITH supplier_numParts(supplierName, numParts) AS
(SELECT s_name, count(distinct p_name) FROM Part, Supplier,
PartSupp WHERE p_partkey = ps_partkey AND s_suppkey =
ps_suppkey GROUP BY s_suppkey, s_name)
• The GROUP BY is required: without it, the count is taken over the whole join instead of per supplier

• SubQuery 2: find the largest count
maxNumParts(value) AS (SELECT MAX(numParts) FROM
supplier_numParts)

• Final Query (the two WITH definitions are separated by a comma)
WITH supplier_numParts(supplierName, numParts) AS
(SELECT s_name, count(distinct p_name) FROM Part, Supplier, PartSupp
WHERE p_partkey = ps_partkey AND s_suppkey = ps_suppkey
GROUP BY s_suppkey, s_name),
maxNumParts(value) AS (SELECT MAX(numParts) FROM
supplier_numParts)
SELECT supplierName
FROM supplier_numParts, maxNumParts
WHERE numParts = maxNumParts.value;

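The corrected final query can be run end to end in a sketch. It assumes SQLite via Python's sqlite3 and hypothetical rows. Suppliers 10 and 20 each supply two parts and tie for the maximum, and supplier 30 supplies one. ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Part(p_partkey INTEGER PRIMARY KEY, p_name TEXT);
CREATE TABLE Supplier(s_suppkey INTEGER PRIMARY KEY, s_name TEXT);
CREATE TABLE PartSupp(ps_partkey INTEGER, ps_suppkey INTEGER);
INSERT INTO Part VALUES (1, 'bolt'), (2, 'nut'), (3, 'gear');
INSERT INTO Supplier VALUES (10, 'Supplier#10'), (20, 'Supplier#20'),
                            (30, 'Supplier#30');
INSERT INTO PartSupp VALUES (1, 10), (2, 10), (1, 20), (3, 20), (2, 30);
""")
query = """
WITH supplier_numParts(supplierName, numParts) AS (
    SELECT s_name, COUNT(DISTINCT p_name)
    FROM Part, Supplier, PartSupp
    WHERE p_partkey = ps_partkey AND s_suppkey = ps_suppkey
    GROUP BY s_suppkey, s_name          -- one count per supplier
),
maxNumParts(value) AS (                 -- note the comma before this CTE
    SELECT MAX(numParts) FROM supplier_numParts
)
SELECT supplierName
FROM supplier_numParts, maxNumParts
WHERE numParts = maxNumParts.value
ORDER BY supplierName
"""
top = [name for (name,) in conn.execute(query)]
print(top)  # ['Supplier#10', 'Supplier#20']
```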
Intermediate and Advanced
SQL
Starting from Lecture 13 in Part 2 of the course

Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages
• Views
Views
• A view is like a “virtual” table
• Defined by a query, which describes how to compute the view contents on
the fly
• DBMS stores the view definition query instead of view contents
• Can be used in queries just like a regular table

Creating and dropping views
• Example: members of Jessica’s Circle
• CREATE VIEW JessicaCircle AS
SELECT * FROM User
WHERE uid IN (SELECT uid FROM Member
WHERE gid = 'jes');
• Tables used in defining a view are called “base tables”
• User and Member above
• To drop a view
• DROP VIEW JessicaCircle;

Using views in queries
• Example: find the average popularity of members in Jessica’s Circle
• SELECT AVG(pop)
FROM JessicaCircle;
• To process the query, replace the reference to the view by its definition
• SELECT AVG(pop)
FROM (SELECT * FROM User
WHERE uid IN
(SELECT uid FROM Member WHERE gid = 'jes'))
AS JessicaCircle;
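The sketch below shows that querying the view and querying its expanded definition give the same answer. It also shows that the view reflects later changes to its base tables, because only the definition is stored. It assumes SQLite via Python's sqlite3 and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, pop REAL);
CREATE TABLE Member(uid INTEGER, gid TEXT);
INSERT INTO User VALUES (142, 'Bart', 0.75), (123, 'Milhouse', 0.25),
                        (857, 'Lisa', 0.5);
INSERT INTO Member VALUES (142, 'jes'), (123, 'jes'), (857, 'dps');
CREATE VIEW JessicaCircle AS
  SELECT * FROM User WHERE uid IN (SELECT uid FROM Member WHERE gid = 'jes');
""")
via_view = conn.execute("SELECT AVG(pop) FROM JessicaCircle").fetchone()[0]
via_expansion = conn.execute("""
    SELECT AVG(pop) FROM (SELECT * FROM User
                          WHERE uid IN (SELECT uid FROM Member WHERE gid = 'jes'))
    AS JessicaCircle""").fetchone()[0]
size_before = conn.execute("SELECT COUNT(*) FROM JessicaCircle").fetchone()[0]
# The view is recomputed on the fly, so a new base-table row shows up at once
conn.execute("INSERT INTO Member VALUES (857, 'jes')")
size_after = conn.execute("SELECT COUNT(*) FROM JessicaCircle").fetchone()[0]
conn.execute("DROP VIEW JessicaCircle")
print(via_view, via_expansion, size_before, size_after)  # 0.5 0.5 2 3
```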
Why use views
• To hide data from users
• To hide complexity from users
• Logical data independence
• If applications deal with views, we can change the underlying schema without
affecting applications
• Recall physical data independence: change the physical organization of data
without affecting applications
• To provide a uniform interface for different implementations or
sources
➢ Real database applications use tons of views
Modifying views
• Does it even make sense, since views are virtual?
• It does make sense if we want users to really see views as tables
• Goal: modify the base tables such that the modification would appear
to have been accomplished on the view

A simple case
• CREATE VIEW UserPop AS
SELECT uid, pop FROM User;

DELETE FROM UserPop WHERE uid = 123;

translates to:

DELETE FROM User WHERE uid = 123;

An impossible case
• CREATE VIEW PopularUser AS
SELECT uid, pop FROM User
WHERE pop >= 0.8;

INSERT INTO PopularUser
VALUES (987, 0.3);
• No matter what we do on User, the inserted row will not be in
PopularUser

A case with too many possibilities
• CREATE VIEW AveragePop(pop) AS
SELECT AVG(pop) FROM User;
• Note that you can rename columns in view definition

UPDATE AveragePop SET pop = 0.5;
• Set everybody's pop to 0.5?
• Adjust everybody's pop by the same amount?
• Just lower Jessica's pop?

SQL92 updatable views
• More or less just single-table selection queries
• No join
• No aggregation
• No subqueries
• Arguably somewhat restrictive
• Still might get it wrong in some cases
• See the slide titled “An impossible case”
• Adding WITH CHECK OPTION to the end of the view definition makes the
DBMS reject such modifications

Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages
• Views
• Integrity Constraints
Primary Keys
• Single-column primary key:

• Multi-column primary key:
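The slide's examples are missing. This sketch shows both kinds of primary key on hypothetical Student and Enrolled tables, assuming SQLite via Python's sqlite3. The composite key lets a student enroll in several courses but not twice in the same one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Single-column primary key (hypothetical table)
CREATE TABLE Student(sid INTEGER PRIMARY KEY, name TEXT);
-- Multi-column primary key: (sid, cid) must be unique as a pair
CREATE TABLE Enrolled(sid INTEGER, cid TEXT, grade TEXT,
                      PRIMARY KEY (sid, cid));
INSERT INTO Student VALUES (1, 'Ann');
INSERT INTO Enrolled VALUES (1, 'CSE412', 'A');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_student = rejected("INSERT INTO Student VALUES (1, 'Bob')")
dup_enroll = rejected("INSERT INTO Enrolled VALUES (1, 'CSE412', 'B')")
other_course = rejected("INSERT INTO Enrolled VALUES (1, 'CSE310', 'B')")
print(dup_student, dup_enroll, other_course)  # True True False
```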

Foreign Key References
• Single-column reference:

• Multi-column reference:

Foreign Key References
• You can define what happens to the referencing rows when the parent table is
modified:
• CASCADE (delete or update the referencing rows along with the parent row)
• NO ACTION
• SET NULL
• SET DEFAULT

Foreign Key References
• Delete/update the enrollment information when a student is changed:
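The slide's example did not survive extraction, so here is a sketch of cascading updates and deletes on hypothetical Student and Enrolled tables, assuming SQLite via Python's sqlite3. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite: FK checks are off by default
conn.executescript("""
CREATE TABLE Student(sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Enrolled(
    sid INTEGER REFERENCES Student(sid)
        ON DELETE CASCADE ON UPDATE CASCADE,
    cid TEXT,
    PRIMARY KEY (sid, cid));
INSERT INTO Student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Enrolled VALUES (1, 'CSE412'), (2, 'CSE412'), (2, 'CSE310');
""")
conn.execute("UPDATE Student SET sid = 3 WHERE sid = 1")  # Ann's rows follow
conn.execute("DELETE FROM Student WHERE sid = 2")         # Bob's rows vanish
rows = conn.execute("SELECT sid, cid FROM Enrolled").fetchall()

try:  # a reference to a nonexistent student is refused
    conn.execute("INSERT INTO Enrolled VALUES (99, 'CSE412')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
print(rows, dangling_rejected)  # [(3, 'CSE412')] True
```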

Value Constraints
• Ensure one-and-only-one value exists:

• Make sure a value is not null:
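The slide's examples are missing. Reading the first bullet as a UNIQUE constraint (no two rows share the value) and the second as NOT NULL, this sketch shows both on a hypothetical User table with an assumed email column, using SQLite via Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(
    uid INTEGER PRIMARY KEY,
    email TEXT UNIQUE,      -- no two rows may share an email
    name TEXT NOT NULL);    -- every row must have a name
INSERT INTO User VALUES (1, 'bart@example.com', 'Bart');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_email = rejected("INSERT INTO User VALUES (2, 'bart@example.com', 'Lisa')")
null_name = rejected("INSERT INTO User VALUES (3, 'lisa@example.com', NULL)")
valid_row = rejected("INSERT INTO User VALUES (4, 'lisa@example.com', 'Lisa')")
print(dup_email, null_name, valid_row)  # True True False
```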

General assertion
• CREATE ASSERTION assertion_name
CHECK assertion_condition;
• assertion_condition is checked for each modification that could potentially
violate it
• Example: Member.uid references User.uid
• CREATE ASSERTION MemberUserRefIntegrity
CHECK (NOT EXISTS
(SELECT * FROM Member
WHERE uid NOT IN
(SELECT uid FROM User)));
➢ In SQL3, but not all (perhaps no) DBMSs support it

Tuple- and Attribute- based Checks
• Associated with a single table
• Only checked when a tuple/attribute is inserted/updated
• Reject if condition evaluates to FALSE
• TRUE and UNKNOWN are fine
• Examples:
• CREATE TABLE User(... age INTEGER CHECK(age IS NULL OR age > 0), ...);
• CREATE TABLE Member
(uid INTEGER NOT NULL,
CHECK(uid IN (SELECT uid FROM User)), ...);
• Is it a referential integrity constraint?
• Not quite; not checked when User is modified

Example

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric,
    CHECK (price > 0),
    discounted_price numeric,
    CHECK (discounted_price > 0),
    CHECK (price > discounted_price)
);
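This sketch exercises the products table, assuming SQLite via Python's sqlite3. The single-column checks are written as column constraints here, which has the same effect. The NULL row shows the earlier rule that an UNKNOWN check result is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CHECK (price > 0),
    discounted_price numeric CHECK (discounted_price > 0),
    CHECK (price > discounted_price)
);
""")

def accepted(row):
    """Try to insert a row; True if all CHECK constraints let it through."""
    try:
        conn.execute("INSERT INTO products VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

normal = accepted((1, 'pen', 10, 8))          # all checks TRUE
too_high = accepted((2, 'ink', 10, 12))       # price > discounted_price is FALSE
no_discount = accepted((3, 'pad', 10, None))  # comparisons with NULL are UNKNOWN
negative = accepted((4, 'cap', -1, None))     # price > 0 is FALSE
print(normal, too_high, no_discount, negative)  # True False True False
```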

Integrity Constraints

https://www.postgresql.org/docs/12/ddl-constraints.html

Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages
• Views
• Integrity Constraints
• Indexes

Indexes
• An index is an auxiliary persistent data structure
• Search tree (e.g., B+-tree), lookup table (e.g., hash table), etc.
➢ More on indexes later in this course!
• An index on R.A can speed up accesses of the form
• R.A = value
• R.A > value (sometimes; depending on the index type)
• An index on (R.A1, …, R.An) can speed up
• $R.A_1 = \mathit{value}_1 \land \dots \land R.A_n = \mathit{value}_n$
• $(R.A_1, \dots, R.A_n) > (\mathit{value}_1, \dots, \mathit{value}_n)$ (again depends)
➢ Ordering of index columns is important: is an index on (R.A, R.B)
equivalent to one on (R.B, R.A)?
➢ How about an index on R.A plus another on R.B?
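You can ask the DBMS whether it would use an index. This sketch assumes SQLite via Python's sqlite3 and a hypothetical table R with an index on (A, B), and reads the detail text of EXPLAIN QUERY PLAN (its wording differs slightly across SQLite versions). A predicate on the leading column A can use the index. A predicate on B alone typically falls back to a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R(A INTEGER, B INTEGER, C TEXT);
CREATE INDEX idx_ab ON R(A, B);   -- column order matters
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail
    return " ".join(str(row[-1]) for row in
                    conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_a = plan("SELECT * FROM R WHERE A = 1")             # leading column
plan_ab = plan("SELECT * FROM R WHERE A = 1 AND B = 2")  # both columns
plan_b = plan("SELECT * FROM R WHERE B = 2")             # second column only
print(plan_a, plan_ab, plan_b, sep="\n")
```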

Examples of using indexes
• SELECT * FROM User WHERE name = 'Bart';
• Without an index on User.name: must scan the entire table if we store User as a flat
file of unordered rows
• With index: go "directly" to rows with name = 'Bart'
• SELECT * FROM User, Member
WHERE User.uid = Member.uid
AND Member.gid = 'jes';
• With an index on Member.gid or (gid, uid): find relevant Member rows directly
• With an index on User.uid: for each relevant Member row, directly look up User rows
with matching uid
• Without it: for each Member row, scan the entire User table for matching uid
• Sorting could help

Creating and dropping indexes in SQL
CREATE [UNIQUE] INDEX indexname ON
tablename (columnname1, …, columnnameN);
• With UNIQUE, the DBMS will also enforce that
(columnname1, …, columnnameN) is a key of tablename
DROP INDEX indexname;

• Typically, the DBMS will automatically create indexes for PRIMARY KEY
and UNIQUE constraint declarations
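This sketch shows both points, assuming SQLite via Python's sqlite3 and hypothetical Member and Tag tables. A UNIQUE constraint makes the DBMS create an index automatically. A UNIQUE index rejects duplicates until it is dropped. SQLite names its automatic indexes sqlite_autoindex_<table>_<n>.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Member(uid INTEGER, gid TEXT, UNIQUE (uid, gid));
CREATE TABLE Tag(label TEXT);
CREATE UNIQUE INDEX tag_label ON Tag(label);
""")
# Indexes the DBMS created on its own for the UNIQUE constraint
auto = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'Member'")]

conn.execute("INSERT INTO Tag VALUES ('sql')")
try:
    conn.execute("INSERT INTO Tag VALUES ('sql')")  # violates the UNIQUE index
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("DROP INDEX tag_label")
conn.execute("INSERT INTO Tag VALUES ('sql')")      # allowed once it is gone
copies = conn.execute("SELECT COUNT(*) FROM Tag").fetchone()[0]
print(auto, duplicate_rejected, copies)  # ['sqlite_autoindex_Member_1'] True 2
```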

Choosing indexes to create
• More indexes = better performance?
• Indexes take space
• Indexes need to be maintained when data is updated
• Indexes have one more level of indirection
➢ Optimal index selection depends on both the query and update workload
and the size of tables
• Automatic index selection is now featured in some commercial DBMSs

Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages
• Views
• Integrity Constraints
• Indexes
• Discretionary Access Control

GRANT Command
• GRANT privileges ON object TO users [WITH GRANT OPTION]
• The following privileges can be specified:
• SELECT: Can read all columns (including those added later via the ALTER TABLE command).
• INSERT(col-name): Can insert tuples with non-null or non-default values in this column.
• INSERT (without a column name) means the same right with respect to all columns.
• UPDATE(col-name): similar to INSERT
• DELETE: Can delete tuples.
• REFERENCES(col-name): Can define foreign keys (in other tables) that refer to this column.
• Object can be a table or a view
• Users can be individual users or roles
• If a user has a privilege with the GRANT OPTION, the user can pass the privilege on to other
users (with or without passing on the GRANT OPTION).
• Only the owner can execute CREATE, ALTER, and DROP.
REVOKE Command
• REVOKE privileges ON object FROM users [CASCADE]
• When a privilege is revoked from X with CASCADE specified, it is
also revoked from all users who got it solely from X.
Examples: GRANT and REVOKE of Privileges
• GRANT INSERT, SELECT ON Sailors TO Horatio
• Horatio can query Sailors or insert tuples into it.
• GRANT DELETE ON Sailors TO Yuppy WITH GRANT OPTION
• Yuppy can delete tuples, and also authorize others to do so.
• GRANT UPDATE (rating) ON Sailors TO Dustin
• Dustin can update (only) the rating field of Sailors tuples.
• GRANT SELECT ON ActiveSailors TO Guppy, Yuppy
• This does NOT allow the 'uppies to query Sailors directly!
• REVOKE SELECT ON Sailors FROM Yuppy CASCADE;
• This will revoke the authorization for querying Sailors from Yuppy and all users who
got this privilege solely from Yuppy
Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages
• Views
• Integrity Constraints
• Indexes
• Discretionary Access Control
• Programming Interfaces

Working with SQL through an API
• E.g.: Python psycopg2, JDBC, ODBC (C/C++/VB)
• All based on the SQL/CLI (Call-Level Interface) standard
• The application program sends SQL commands to the DBMS at
runtime
• Responses/results are converted to objects in the application
program

• psycopg2: https://pypi.org/project/psycopg2/

Example API: Python psycopg2

More psycopg2 examples
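The psycopg2 examples are not in this text. psycopg2 and Python's built-in sqlite3 both follow the Python DB-API 2.0 (PEP 249) pattern: connect, get a cursor, execute, fetch, commit. This sketch therefore uses sqlite3 as a runnable stand-in with a hypothetical User table. With psycopg2, the connection would come from psycopg2.connect(...) and placeholders would be written %s instead of ?.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # psycopg2: psycopg2.connect("dbname=...")
cur = conn.cursor()
cur.execute("CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, pop REAL)")
# Parameters are passed separately from the SQL text (psycopg2 uses %s)
cur.executemany("INSERT INTO User VALUES (?, ?, ?)",
                [(142, 'Bart', 0.75), (857, 'Lisa', 0.5)])
conn.commit()

cur.execute("SELECT name, pop FROM User WHERE pop > ? ORDER BY name", (0.1,))
rows = cur.fetchall()                # results arrive as a list of tuples
for name, pop in rows:
    print(f"{name}: {pop}")
cur.close()
conn.close()
```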

Prepared statements: motivation

• Every time we send an SQL string to the DBMS, it must perform parsing,
semantic analysis, optimization, compilation, and finally execution
• A typical application issues many queries with a small number of patterns
(with different parameter values)
• Can we reduce this overhead?

Prepared statements: example

• The DBMS performs parsing, semantic analysis, optimization, and compilation
only once, when it "prepares" the statement
• At execution time, the DBMS only needs to check parameter types and validate
the compiled plan
• Most other APIs have better support for prepared statements than psycopg2
• E.g., they would provide a cur.prepare() method
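PostgreSQL itself offers SQL-level PREPARE and EXECUTE statements. As a runnable stand-in, this sketch uses Python's sqlite3, which keeps a per-connection cache of compiled statements (its size is the cached_statements argument of sqlite3.connect). Re-issuing the same SQL text with new parameter values can therefore reuse the compiled statement. The User table and the FIND_BY_AGE pattern are hypothetical.

```python
import sqlite3

# Keep up to 128 compiled statements around for this connection
conn = sqlite3.connect(":memory:", cached_statements=128)
conn.execute("CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
# executemany binds many parameter tuples to one compiled INSERT
conn.executemany("INSERT INTO User VALUES (?, ?, ?)",
                 [(142, 'Bart', 10), (857, 'Lisa', 8), (456, 'Ralph', 8)])

# One query pattern, many parameter values: only the bound value changes
FIND_BY_AGE = "SELECT name FROM User WHERE age = ? ORDER BY name"
by_age = {age: [n for (n,) in conn.execute(FIND_BY_AGE, (age,))]
          for age in (8, 10)}
print(by_age)  # {8: ['Lisa', 'Ralph'], 10: ['Bart']}
```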
“Exploits of a mom”

• The school probably had something like:
cur.execute("SELECT * FROM Students " +
            "WHERE (name = '" + name + "')")
where name is a string input by user
• Called an SQL injection attack
SQL comments
• https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS

SQL Injection

Guarding against SQL injection
• Escape certain characters in a user input string, to ensure that it
remains a single string
• E.g., ', which would terminate a string in SQL, must be replaced by '' (two
single quotes in a row) within the input string
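• Luckily, most APIs provide ways to "sanitize" input automatically (if
you use them properly)
• E.g., pass parameter values in psycopg2 through %s placeholders
Unsafe:
sql_query = "SELECT * FROM Students WHERE name = '{}'".format(user_input)
cur.execute(sql_query)
Safe:
sql_query = 'SELECT * FROM Students WHERE name = %s'
cur.execute(sql_query, (user_input,))
• Placeholders bind values only, not identifiers such as table names; psycopg2 composes identifiers safely through psycopg2.sql.Identifier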
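The sketch below contrasts the two styles on a crafted input, assuming SQLite via Python's sqlite3 (placeholder ? instead of psycopg2's %s) and a hypothetical Students table. Pasted into the SQL text, the input closes the string literal and adds an always-true condition. Bound as a parameter, it is just an odd name that matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students(name TEXT, grade TEXT);
INSERT INTO Students VALUES ('Alice', 'A'), ('Bob', 'B');
""")

def find_unsafe(name):
    # UNSAFE: user input becomes part of the SQL text itself
    return conn.execute(
        "SELECT * FROM Students WHERE (name = '" + name + "')").fetchall()

def find_safe(name):
    # SAFE: the driver binds the value; it can never end the string literal
    return conn.execute(
        "SELECT * FROM Students WHERE (name = ?)", (name,)).fetchall()

attack = "x') OR ('1' = '1"
print(find_unsafe(attack))  # every row: WHERE (name = 'x') OR ('1' = '1')
print(find_safe(attack))    # []
```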