Overview
Data modeling review
Relational algebra
SQL
From model to schema
Limitations
What am I not telling you about?
database normalization
object-based approaches to database design
object-relational mapping
.... too much more to mention ....
Ask questions!
Example Problem
A system to automate the tracking and documentation of plasmid construction
Terminology:
fragment: a length of double-stranded DNA
plasmid: a circular fragment
recipe: a series of manipulations of the DNA to produce a new plasmid with cDNA of interest inserted
ACL: access control list
Needs:
Data processing -- convert raw data into results
Visualization -- a way to visualize the results
Data storage -- store the results (and perhaps the raw data)
Data Modeling
The FIRST Step
Structured way to understand the data semantics
Independent of underlying platform
Way to communicate with team members (including users)
Excellent (minimal?) documentation
Example: ER Diagrams
ER Diagrams
Types of Databases
Flat-file
Hierarchical
Network
Relational
Object
Object-Relational
Flat-File Databases
No database-enforced (or provided) linkage between records
Excellent for small or special-purpose databases
Might include support for single or multiple indexes
Major feature: ease of use (FileMaker, Access)
Major drawback: scalability & flexibility
e.g.:
ndbm
Berkeley DB (Sleepycat DB)
vi, grep, sed
FileMaker
Access
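A minimal sketch of the flat-file idea, using Python's standard `dbm.dumb` module as a stand-in for ndbm (an assumption; the slides show no code for this). Each record is found by a single key, and nothing in the database links one record to another. The file name `genes` and the stored values are illustrative only.

```python
import dbm.dumb
import os
import tempfile


def flat_file_demo():
    """Store two records by key, then read them back."""
    # Illustrative location; any writable path would do.
    path = os.path.join(tempfile.mkdtemp(), "genes")

    # "c" opens the database, creating it if needed.
    with dbm.dumb.open(path, "c") as db:
        db["G1"] = "AMP"
        db["G2"] = "TET"

    # Keys and values come back as bytes; the key is the only index.
    with dbm.dumb.open(path, "r") as db:
        name = db["G1"].decode()
        keys = sorted(key.decode() for key in db.keys())
    return name, keys


result = flat_file_demo()
```

Any relationship between records (for example, which fragment carries which gene) would have to be maintained by the application itself.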
Hierarchical Databases
Database provides explicit master-detail support
Ideal for many business applications
Restricted to a strict hierarchy
Queries down the hierarchy are very efficient
Any other queries are very expensive
e.g.
IMS
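The cost asymmetry above can be sketched with nested Python dictionaries standing in for a strict hierarchy. This is only an analogy, not how IMS works internally, and the recipe, fragment, and gene identifiers are made up for illustration.

```python
# Hypothetical hierarchy: recipe -> fragment -> list of genes.
hierarchy = {
    "R1": {"F1": ["G1", "G2"], "F2": ["G3"]},
    "R2": {"F3": ["G1"]},
}


def genes_in_fragment(recipe, fragment):
    """Query down the hierarchy: two direct lookups."""
    return hierarchy[recipe][fragment]


def fragments_with_gene(gene):
    """Query against the hierarchy: every branch must be scanned."""
    return sorted(
        fragment
        for fragments in hierarchy.values()
        for fragment, genes in fragments.items()
        if gene in genes
    )


down = genes_in_fragment("R1", "F1")
across = fragments_with_gene("G1")
```

Note that gene `G1` has to be stored under two different fragments, which is the duplication a strict hierarchy forces on many-to-many data.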
Networked Databases
Fragment and Gene have a many-to-many relationship
Not represented well by hierarchical databases
Based on set theory
Database provides explicit linkage support
Very significant design costs
Queries along the connection path are very efficient
Any other queries are very expensive
e.g.
CODASYL
Relational Databases
Foundation of most production databases
Based on relational calculus and relational algebra
Allows ad-hoc query capability across record types
Supports a standard query language (SQL)
Can support either hierarchical or network models
Attributes are limited to basic types
e.g.
SQLite
MySQL
Derby
Oracle
DB2
Relational Databases
Based on relational views (tables)
Associations are based on data values, not expressed linkages
All data is expressed in tables
Terminology:
Rows are called tuples
Columns (attributes) are of a common domain (type)
ER → Relational Schema
First, combine any entities with a one-to-one relationship
Next define tables for our entities:
Note that we've added a new attribute to serve as the primary key for each entity
ER → Relational Schema
Now define tables for relationships, adding attributes for the associations:
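The original slides showed these tables as figures. As a rough sketch of the two steps, here is how the Fragment and Gene entities and their many-to-many relationship might be laid out using Python's `sqlite3` module. The table name `FRAGMENT_GENE`, the column names, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity tables, each with an added primary key
    CREATE TABLE GENE (ID char(16) PRIMARY KEY, NAME varchar(20));
    CREATE TABLE FRAGMENT (ID char(16) PRIMARY KEY, NAME varchar(20));

    -- Relationship table: one row per (fragment, gene) association
    CREATE TABLE FRAGMENT_GENE (
        FRAG_ID char(16) REFERENCES FRAGMENT (ID),
        GENE_ID char(16) REFERENCES GENE (ID),
        PRIMARY KEY (FRAG_ID, GENE_ID)
    );

    INSERT INTO GENE VALUES ('G1', 'AMP'), ('G2', 'TET');
    INSERT INTO FRAGMENT VALUES ('F1', 'frag one'), ('F2', 'frag two');
    INSERT INTO FRAGMENT_GENE VALUES ('F1', 'G1'), ('F1', 'G2'), ('F2', 'G1');
""")

# Genes carried by fragment F1, found by matching key values.
genes_in_f1 = [row[0] for row in conn.execute(
    "SELECT GENE.NAME FROM GENE "
    "JOIN FRAGMENT_GENE ON FRAGMENT_GENE.GENE_ID = GENE.ID "
    "WHERE FRAGMENT_GENE.FRAG_ID = 'F1' ORDER BY GENE.NAME")]

# The reverse question uses the same relationship table.
fragments_with_g1 = [row[0] for row in conn.execute(
    "SELECT FRAG_ID FROM FRAGMENT_GENE "
    "WHERE GENE_ID = 'G1' ORDER BY FRAG_ID")]
```

Both directions of the relationship are answered from the same table, which is what the hierarchical model could not do cheaply.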
Relational Algebra: Selection
Selection
o Selection of tuples based on Boolean criteria
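Selection can be sketched in plain Python by treating a relation as a list of tuples and the Boolean criterion as a function. The `select` helper below is hypothetical; the gene rows reuse the values that appear in the sqlite3 examples later in these slides.

```python
from collections import namedtuple

Gene = namedtuple("Gene", ["id", "name", "protein", "start"])

genes = [
    Gene("G1", "AMP", "MAKK...", -5),
    Gene("G2", "TET", "MYAK...", -10),
    Gene("G3", "NGF", "MYAK...", -1),
]


def select(relation, predicate):
    """Return the tuples of the relation for which predicate is true."""
    return [row for row in relation if predicate(row)]


# Keep only the genes whose START is below -1.
selected = select(genes, lambda gene: gene.start < -1)
selected_ids = [gene.id for gene in selected]
```

Selection never changes the columns of a tuple; it only decides which tuples are kept.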
SQL - CREATE
Creates database objects (databases, tables, indices)
SYNOPSIS:
CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
......
);
CREATE INDEX index_name
ON table_name (column_name)
SQL - CREATE
Examples:
CREATE TABLE "GENE"
(
ID char(16),
NAME varchar(20),
PROTEIN varchar,
START int
);
CREATE TABLE "PRODUCES"
(
RCP char(16),
FRAG char(16),
DATE date
);
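As a quick check, the GENE table can be created from Python with the `sqlite3` module and an in-memory database. The index name `GENE_NAME_IDX` is an assumption added to illustrate CREATE INDEX; it does not appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create the table; note there is no comma after the last column.
conn.execute("""
    CREATE TABLE GENE
    (
        ID char(16),
        NAME varchar(20),
        PROTEIN varchar,
        START int
    )
""")

# Hypothetical index to speed up lookups by gene name.
conn.execute("CREATE INDEX GENE_NAME_IDX ON GENE (NAME)")

# SQLite records every created object in its sqlite_master catalog.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
```

A trailing comma before the closing parenthesis is a syntax error in SQLite, so it is worth checking the last column of each definition.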
SQL - INSERT
Inserts data into a table row
SYNOPSIS:
INSERT INTO "tablename" (first_column,...last_column)
VALUES(first_value,...last_value);
Example:
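INSERT INTO GENE (ID, NAME, PROTEIN, START)
VALUES ('G1','AMP','MAKK...',-5);
SQL - SELECT
SYNOPSIS:
SELECT [ALL | DISTINCT] column1[,column2]
FROM table1[,table2]
[WHERE "conditions"]
[GROUP BY "column-list"]
[HAVING "conditions"]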
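A small sketch of how the optional WHERE, GROUP BY, and HAVING clauses combine, run through Python's `sqlite3` module against an in-memory GENE table. The rows are the ones used in the sqlite3 examples later in these slides; the query itself is an illustration, not one taken from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GENE (ID char(16), NAME varchar(20), PROTEIN varchar, START int)"
)
conn.executemany(
    "INSERT INTO GENE (ID, NAME, PROTEIN, START) VALUES (?, ?, ?, ?)",
    [
        ("G1", "AMP", "MAKK...", -5),
        ("G2", "TET", "MYAK...", -10),
        ("G3", "NGF", "MYAK...", -1),
    ],
)

# WHERE filters rows, GROUP BY collects them per protein,
# HAVING then filters the groups.
shared = conn.execute("""
    SELECT PROTEIN, COUNT(*)
    FROM GENE
    WHERE START < 0
    GROUP BY PROTEIN
    HAVING COUNT(*) > 1
""").fetchall()
```

WHERE is applied before grouping and HAVING after it, which is why the count can only be tested in HAVING.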
SELECT - Selection
Selection
Selection of tuples based on Boolean criteria
SELECT - Projection
Projection
o Selection of attributes
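In SQL, projection is the column list after SELECT and selection is the WHERE clause. The sketch below combines the two using Python's `sqlite3` module; the specific query and the ORDER BY clause (added only to make the row order predictable) are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GENE (ID char(16), NAME varchar(20), PROTEIN varchar, START int)"
)
conn.executemany(
    "INSERT INTO GENE VALUES (?, ?, ?, ?)",
    [
        ("G1", "AMP", "MAKK...", -5),
        ("G2", "TET", "MYAK...", -10),
        ("G3", "NGF", "MYAK...", -1),
    ],
)

# Projection: keep only NAME and START.
# Selection: keep only rows with START below -1.
projected = conn.execute(
    "SELECT NAME, START FROM GENE WHERE START < -1 ORDER BY ID"
).fetchall()
```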
Where:
o join_predicate is an equality for an Equijoin, or a comparison for any other join
SQL - INNER JOIN Examples
Examples:
R1|F1|1985-09-09|r1|r1,cr|scooter
R1|F2|1985-09-09|r1|r1,cr|scooter
R2|F3|1985-10-05|r2|r2.cr|ckw
R1|F1|1985-09-09|R1|r1|r1,cr|scooter
R1|F2|1985-09-09|R1|r1|r1,cr|scooter
R2|F3|1985-10-05|R2|r2|r2.cr|ckw
R1|F1|1985-09-09|r1|r1,cr|scooter
R1|F2|1985-09-09|r1|r1,cr|scooter
R2|F3|1985-10-05|r2|r2.cr|ckw
R1|F1|1985-09-09|R1|r1|r1,cr|scooter
R1|F2|1985-09-09|R1|r1|r1,cr|scooter
R2|F3|1985-10-05|R2|r2|r2.cr|ckw
SELECT * FROM RECIPE LEFT OUTER JOIN PRODUCES ON
PRODUCES.RCP=RECIPE.RCP;
R1|r1|r1,cr|scooter|R1|F1|1985-09-09
R1|r1|r1,cr|scooter|R1|F2|1985-09-09
R2|r2|r2.cr|ckw|R2|F3|1985-10-05
R3|r3|r3.cr|rst|||
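To make the difference between the two join types concrete, the sketch below rebuilds RECIPE and PRODUCES in an in-memory SQLite database from the rows shown in the output above. Only the `RCP` column name appears in the slides; `NAME`, `FILE`, and `OWNER` are assumed names for the remaining RECIPE columns, and the ORDER BY clause is added to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RECIPE (RCP char(16), NAME varchar, FILE varchar, OWNER varchar)"
)
conn.execute("CREATE TABLE PRODUCES (RCP char(16), FRAG char(16), DATE date)")
conn.executemany(
    "INSERT INTO RECIPE VALUES (?, ?, ?, ?)",
    [
        ("R1", "r1", "r1,cr", "scooter"),
        ("R2", "r2", "r2.cr", "ckw"),
        ("R3", "r3", "r3.cr", "rst"),
    ],
)
conn.executemany(
    "INSERT INTO PRODUCES VALUES (?, ?, ?)",
    [
        ("R1", "F1", "1985-09-09"),
        ("R1", "F2", "1985-09-09"),
        ("R2", "F3", "1985-10-05"),
    ],
)

# INNER JOIN keeps only recipes that have a matching PRODUCES row.
inner = conn.execute(
    "SELECT * FROM RECIPE INNER JOIN PRODUCES ON PRODUCES.RCP = RECIPE.RCP "
    "ORDER BY RECIPE.RCP, PRODUCES.FRAG"
).fetchall()

# LEFT OUTER JOIN keeps every recipe, padding missing matches with NULL.
outer = conn.execute(
    "SELECT * FROM RECIPE LEFT OUTER JOIN PRODUCES ON PRODUCES.RCP = RECIPE.RCP "
    "ORDER BY RECIPE.RCP, PRODUCES.FRAG"
).fetchall()
```

Recipe R3 has produced no fragment, so it appears only in the outer join, with `None` in the PRODUCES columns.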
Note that we're again doing an implicit Equijoin. The explicit syntax would be:
CREATE TABLE TEMP2 AS
SQL - UPDATE
Updates data in a database
SYNOPSIS:
UPDATE tablename
SET columnname="newvalue"[,nextcolumn="newvalue2"...]
Example:
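The original example did not survive on this slide. As a stand-in, here is a minimal sketch using Python's `sqlite3` module and the GENE rows from the sqlite3 examples later in these slides. The WHERE clause is standard SQL for limiting which rows are changed; without it, every row in the table is updated. The new START value is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GENE (ID char(16), NAME varchar(20), PROTEIN varchar, START int)"
)
conn.executemany(
    "INSERT INTO GENE VALUES (?, ?, ?, ?)",
    [("G1", "AMP", "MAKK...", -5), ("G2", "TET", "MYAK...", -10)],
)

# Change START for one gene only; the ? placeholders are filled in order.
cursor = conn.execute("UPDATE GENE SET START = ? WHERE ID = ?", (-7, "G1"))
rows_changed = cursor.rowcount
conn.commit()

starts = conn.execute("SELECT ID, START FROM GENE ORDER BY ID").fetchall()
```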
SQL - References
Good intro tutorial:
On-line SQL Tutorial
o http://www.sqlcourse.com/
On-line SQL Tutorial 2
o http://www.sqlcourse2.com/
Another good intro:
A Gentle Introduction to SQL
o http://sqlzoo.net/
A useful reference site:
W3 Schools SQL Tutorial
o http://www.w3schools.com/sql/
Also worth a look
Software carpentry lecture on Relational Databases
o http://plato.cgl.ucsf.edu/Outreach/bmi219/slides/swc/lec/db.html
Object-Relational Databases
Essentially an extension of the relational database model
Preserves the tabular (relational) organization of the data
Allows developers to define more complex data types (User Defined Types, UDTs)
No support for encapsulation or inheritance
Some support for methods is provided (User Defined Functions, UDFs)
SQL object extensions already standardized (SQL3)
e.g.
PostgreSQL
Oracle
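To give a feel for what a User Defined Function looks like from the caller's side, the sketch below registers a Python function with SQLite through `sqlite3`'s `create_function` and calls it from SQL. SQLite is not one of the object-relational systems listed above, so this is only an analogy; the `GC_CONTENT` function, the `SEQ` column, and the sequences are invented for illustration.

```python
import sqlite3


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


conn = sqlite3.connect(":memory:")

# Register the Python function under an SQL name, taking one argument.
conn.create_function("GC_CONTENT", 1, gc_content)

conn.execute("CREATE TABLE FRAGMENT (ID char(16), SEQ varchar)")
conn.executemany(
    "INSERT INTO FRAGMENT VALUES (?, ?)",
    [("F1", "GGCC"), ("F2", "ATGC")],
)

# The function can now be used inside a query like a built-in.
gc_by_fragment = conn.execute(
    "SELECT ID, GC_CONTENT(SEQ) FROM FRAGMENT ORDER BY ID"
).fetchall()
```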
Object Databases
Provides persistent storage of objects
Most useful in conjunction with object-based applications
Primarily a programmer's tool, although vendors are providing SQL3 and ODBC interfaces
e.g.
Objectivity
Types of Databases
Questions?
Recommended Reading:
Date, C.J. An Introduction to Database Systems. Reading, Mass.: Addison-Wesley (1981)
Codd, E.F. A Relational Model of Data for Large Shared Data Banks. CACM 13, No. 6 (June 1970)
Uses of Databases
...or why do I [should you] care about this stuff?
Questions?
Database Access with Python
SQL provides a way to interact with a relational database...
... but how do I access my database programmatically?
Lots of ways, but we're going to discuss sqlite3
sqlite3:
Provides access to SQLite from Python scripts
Simple (maybe too simple...)
Basic idea is to execute SQL commands and return the response as a Python list
Installed on plato
sqlite3 Example
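#!/usr/bin/env python3
import sys
import sqlite3

try:
    # "plasmid.db" is a placeholder; the original file name was not preserved.
    # The GENE table is assumed to exist already.
    conn = sqlite3.connect("plasmid.db")
    cursor = conn.cursor()
    # Execute an SQL statement -- can be pretty much any SQL
    cursor.execute("SELECT NAME, PROTEIN FROM GENE")
    # fetchall() returns the result as a list of tuples, one per row
    rows = cursor.fetchall()
    for row in rows:
        print("%s, %s" % (row[0], row[1]))
    cursor.close()
    conn.commit()
    conn.close()
except sqlite3.Error as e:
    print("Error: %s" % e)
    sys.exit(1)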
AMP, MAKK...
TET, MYAK...
NGF, MYAK...
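import sys
import sqlite3

try:
    # "plasmid.db" is a placeholder; the original file name was not preserved
    conn = sqlite3.connect("plasmid.db")
    cursor = conn.cursor()
    # Create the table (no comma after the last column)
    cursor.execute("""
        CREATE TABLE GENE (
            ID char(16),
            NAME varchar(20),
            PROTEIN longtext,
            START int
        )
    """)
    cursor.execute("""
        INSERT INTO GENE (ID, NAME, PROTEIN, START) VALUES
        ('G1','AMP','MAKK...',-5)
    """)
    cursor.execute("""
        INSERT INTO GENE (ID, NAME, PROTEIN, START) VALUES
        ('G2','TET','MYAK...',-10)
    """)
    cursor.execute("""
        INSERT INTO GENE (ID, NAME, PROTEIN, START) VALUES
        ('G3','NGF','MYAK...',-1)
    """)
    # Option 1: fetch the rows one at a time
    cursor.execute("SELECT NAME, PROTEIN FROM GENE")
    while True:
        row = cursor.fetchone()
        if row is None:
            break
        print("%s, %s" % (row[0], row[1]))
    # Option 2: re-run the query and fetch all rows at once
    cursor.execute("SELECT NAME, PROTEIN FROM GENE")
    rows = cursor.fetchall()
    for row in rows:
        print("%s, %s" % (row[0], row[1]))
    cursor.close()
    conn.commit()
    conn.close()
except sqlite3.Error as e:
    print("Error: %s" % e)
    sys.exit(1)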
SQLite3 Use
sqlite3 provides a good low-level interface
For most uses, probably want to wrap low-level SQL commands in Python objects
In the above example, a GENE might be an object
Might have methods to fetch (SELECT) or save (INSERT) a GENE
Provides some insulation from underlying SQL implementation
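One way the wrapping idea might look is sketched below. The `Gene` class, its `save` and `fetch` methods, and the `create_table` helper are all hypothetical names chosen for illustration; they are not part of `sqlite3`, and the slides do not prescribe this design.

```python
import sqlite3


class Gene:
    """Thin wrapper that hides the SQL for one row of the GENE table."""

    def __init__(self, gene_id, name, protein, start):
        self.gene_id = gene_id
        self.name = name
        self.protein = protein
        self.start = start

    @staticmethod
    def create_table(conn):
        conn.execute(
            "CREATE TABLE IF NOT EXISTS GENE "
            "(ID char(16), NAME varchar(20), PROTEIN longtext, START int)"
        )

    def save(self, conn):
        """INSERT this gene as a new row."""
        conn.execute(
            "INSERT INTO GENE (ID, NAME, PROTEIN, START) VALUES (?, ?, ?, ?)",
            (self.gene_id, self.name, self.protein, self.start),
        )
        conn.commit()

    @classmethod
    def fetch(cls, conn, gene_id):
        """SELECT one gene by ID; return None if it is not found."""
        row = conn.execute(
            "SELECT ID, NAME, PROTEIN, START FROM GENE WHERE ID = ?",
            (gene_id,),
        ).fetchone()
        return cls(*row) if row else None


conn = sqlite3.connect(":memory:")
Gene.create_table(conn)
Gene("G1", "AMP", "MAKK...", -5).save(conn)
found = Gene.fetch(conn, "G1")
missing = Gene.fetch(conn, "G9")
```

Code that uses `Gene` never sees a SQL string, so a later change to the table layout or the database engine would be confined to this one class.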