

By the awesome people in Jenna’s tutorials
These notes are merged from multiple groups summarizing the same chapters
Topics missing from this booklet
Week 1: Introduction to Databases and Transactions

SEMESTER 1, 2014

Week 2: Conceptual DB Design (ER diagrams)

Conceptual design
- A technique for understanding and capturing business information requirements graphically.
- Facilitates planning, operation and maintenance of various data resources.

Entities
- Entity: a person, place, object, event, or concept about which you want to gather and store data. It must be distinguishable from other entities. E.g. John Doe, unit COMP5138, account 4711.
- Entity type (entity set): a collection of entities that share common properties or characteristics, e.g. student, courses, account (represented by a rectangle in an E-R diagram). NOTE: entity sets need not be disjoint.
- Attribute: describes one aspect of an entity type, e.g. people have a name and an address.

Relationships
- Relationship: relates two or more entities; the number of entities is also known as the degree of the relationship. E.g. John is enrolled in INFO2120.
- Relationship type (relationship set): a set of similar relationships, e.g. Student (entity type) related to UnitOfStudy (entity type) by EnrolledIn (relationship type).
- Distinction: a relation (relational model) is a set of tuples; a relationship (E-R model) describes a relationship between entities. Both entity sets and relationship sets (E-R model) may be represented as relations (in the relational model).

Schema of relationship types
- The combination of the primary keys of the participating entity types forms a superkey of a relationship.
- Relationship set schema: relationship name, role names, relationship attributes and their types, key.

Constraints
- Key constraint: if, for a particular participant entity type, each entity participates in at most one relationship, the corresponding role is a key of the relationship type. E.g. the employee role is unique in WorksIn.
- Participation constraint: if every entity participates in at least one relationship, a participation constraint holds. A participation constraint of entity type E having role ρ in relationship type R states that for every e in E there is an r in R such that ρ(r) = e. (Representation in an E-R diagram: thick line.)
- Cardinality constraints: a generalisation of key and participation constraints. A cardinality constraint for the participation of an entity set E in a relationship R specifies how often an entity of set E participates in R at least (minimum cardinality) and at most (maximum cardinality).

Weak entities
- An entity type that does not have a primary key of its own, e.g. a child of a parent, a payment of a loan.
- The primary key of a weak entity type is formed by the primary key of the strong entity type(s) on which the weak entity type is existence dependent, plus the weak entity type's discriminator.

Constraints on ISA hierarchies
- Overlap constraints. Disjoint: an entity can belong to only one lower-level entity set. Overlapping: an entity can belong to more than one lower-level entity set.
- Covering constraints. Total: an entity must belong to one of the lower-level entity sets. Partial (the default): an entity need not belong to one of the lower-level entity sets.

Week 3: The Relational Data Model (NULLs, keys, referential integrity)

The relational model of data is based on the mathematical concept of a relation. The strength of the relational approach to data management comes from its simple way of structuring data.

Data Model vs. Schema
- Data model: a collection of concepts for describing data.
- Schema: a description of a particular collection of data at some abstraction level, using a given data model.
- The relational data model is the most widely used model today.

Definition of Relation
- A relation is a named, two-dimensional table of data: it consists of rows (records) and columns (attributes or fields).

Relation schema vs. relation instance
- A relation R has a relation schema: it specifies the name of the relation and the name and data type of each attribute.
- A relation instance: a set of tuples (table) for a schema.

Creating and Deleting Relations in SQL
- Creation of a table (relation): create table name (list of columns)
- Deletion of a table (relation): drop table name

Base data types of SQL
- SMALLINT/INTEGER/BIGINT: integer values
- DECIMAL/NUMERIC: fixed-point number
- FLOAT(p)/REAL: floating-point number (with precision p)
- CHAR/VARCHAR/CLOB: alphanumerical character string types

Null 'value'
- An RDBMS allows a special entry NULL in a column to represent facts that are not relevant, or not yet known.
- Pro: NULL is useful because using an ordinary value with a special meaning does not always work.
- Con: NULL causes complications in the definition of many operations.

Modifying relations using SQL
- Insertion of new data into a table: insert into table (list of columns) values (list of expressions)
- Updating of tuples in a table: update table set column = expression {, column = expression}
- Deletion of tuples from a table: delete from table [where search_condition]

Relational database
- Data structure: a relational database is a set of relations with tuples and fields; a simple and consistent structure.
- Data manipulation: powerful operators to manipulate the data stored in relations.
- Data integrity: facilities to specify a variety of rules to maintain the integrity of data when it is manipulated.

Integrity constraints
- An integrity constraint (IC) is a condition that must be true for any instance of the database. A legal instance of a relation is one that satisfies all specified ICs.
- Non-null columns: one domain constraint is to insist that no value in a given column can be null.

Relational keys
- Primary keys are unique, minimal identifiers in a relation. Example: CONSTRAINT Student_PK primary key (Sid)
- In an SQL-based RDBMS, it is possible to insert a row where every attribute has the same value as an existing row (unless a key is declared).
- Foreign keys are identifiers that enable a dependent relation to refer to its parent relation (i.e. referential integrity). Example: FOREIGN KEY (lecturer) references Lecturer (empid)

Mapping E-R diagrams into relations
- Each entity type becomes a relation; simple attributes map directly to the relation. Composite attributes are flattened out by creating a separate field for each component attribute.
- Weak entity type: becomes a separate relation with a foreign key taken from the superior entity.
- Mapping of relationship types:
  - Many to many: create a new relation with the primary keys of the two entity types as its primary key.
  - One to many: the primary key on the one side becomes a foreign key on the many side.
  - One to one: the primary key on the mandatory side becomes a foreign key on the optional side.
  - Relationship attributes: become fields of the dependent relation or of the new relation, respectively.

Relational views
- A view is a virtual relation.
- Syntax: create view name as <query expression>, where <query expression> is any legal query expression (it can even combine multiple relations).
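
The statements above can be tried out with a small sketch. It assumes Python's standard-library sqlite3 module as the DBMS (the notes do not name one); the Lecturer/UnitOfStudy columns other than empid and lecturer, the view name Teaching, and the helper try_insert are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

conn.executescript("""
CREATE TABLE Lecturer (
    empid INTEGER,
    name  VARCHAR(40) NOT NULL,
    room  VARCHAR(10),                -- may be NULL: not yet known
    CONSTRAINT Lecturer_PK PRIMARY KEY (empid)
);
CREATE TABLE UnitOfStudy (
    uos_code CHAR(8) PRIMARY KEY,
    lecturer INTEGER,
    FOREIGN KEY (lecturer) REFERENCES Lecturer (empid)
);
CREATE VIEW Teaching AS
    SELECT u.uos_code, l.name
    FROM UnitOfStudy u JOIN Lecturer l ON u.lecturer = l.empid;
""")

conn.execute("INSERT INTO Lecturer (empid, name, room) VALUES (1, 'Smith', NULL)")
conn.execute("INSERT INTO UnitOfStudy (uos_code, lecturer) VALUES ('INFO2120', 1)")

def try_insert(sql):
    """Return True if the statement is accepted, False if an IC rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Same primary key value twice: violates the key constraint.
duplicate_key_ok = try_insert("INSERT INTO Lecturer (empid, name) VALUES (1, 'Jones')")
# Lecturer 99 does not exist: violates referential integrity.
dangling_fk_ok = try_insert("INSERT INTO UnitOfStudy VALUES ('COMP5138', 99)")
view_rows = conn.execute("SELECT * FROM Teaching").fetchall()
print(duplicate_key_ok, dangling_fk_ok, view_rows)
```

Both rejected inserts leave the instance legal, and the view is evaluated from the base relations each time it is queried.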

Week 4A: Introduction to Declarative Querying - Relational Algebra

1. Set operations
- Union ( ∪ ): tuples in relation 1 OR in relation 2.
  o Example: R ∪ S
  o Definition: R ∪ S = { t | t ∈ R ∨ t ∈ S }
- Intersection ( ∩ ): tuples in relation 1 AND in relation 2.
  o Example: R ∩ S
  o Definition: R ∩ S = { t | t ∈ R ∧ t ∈ S }
- Difference ( - ): tuples in relation 1, but not in relation 2.
  o Example: R - S
  o Definition: R - S = { t | t ∈ R ∧ t ∉ S }
Important: R and S must have the same schema. R and S have the same arity (same number of fields), and corresponding fields must have the same names and domains.

2. Operations that remove parts of a relation
- Selection ( σ ): selects a subset of rows from a relation.
  o Example: σ country='AUS' (Student)
- Projection ( π ): deletes unwanted columns from a relation.
  o Example: π name, country (Student)

3. Operations that combine tuples from two relations
- Cross-product ( × ): allows us to fully combine two relations. Also called the Cartesian product.
- Join ( ⋈ ): combines matching tuples from two relations.
  o Example: … ⋈ family_name=last_name Lecturer
- Natural join ( ⋈ ): equijoin on all common fields.
  o Example: R ⋈ S
  o Result schema similar to cross-product, but only one copy of the fields for which equality is specified.

4. A schema-level 'rename' operation
- Rename ( ρ ): allows us to rename one field to another name.
  o Example: ρ Classlist(2 -> cid, 4 -> uos_code) ( Enrolled × UnitOfStudy )

Six basic operations
We can distinguish between basic and derived RA operations:
1. Projection ( π )
2. Cross-product ( × )
3. Union ( ∪ )
4. Set Difference ( - )
5. Selection ( σ )
6. Rename ( ρ )
Additional (derived) operations: intersection, join, division.
  o Not essential, but VERY useful (cf. join)

Composition and equivalence rules
Commutation rules
1. πA( σp( R ) ) = σp( πA( R ) ) (if p only references attributes in A)
2. R ⋈ S = S ⋈ R
Association rule
1. R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T
Idempotence rules
1. πA( πB( R ) ) = πA( R ) if A ⊆ B
2. σp1( σp2( R ) ) = σp1 ∧ p2( R )
Distribution rules
1. πA( R ∪ S ) = πA( R ) ∪ πA( S )
2. σP( R ∪ S ) = σP( R ) ∪ σP( S )
3. σP( R ⋈ S ) = σP( R ) ⋈ S if P only references R
4. πA,B( R ⋈ S ) = πA( R ) ⋈ πB( S ) if join-attr. in (A ∩ B)
5. R ⋈ ( S ∪ T ) = ( R ⋈ S ) ∪ ( R ⋈ T )
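
To make the operators and one of the equivalence rules concrete, here is a minimal sketch that models a relation as a list of Python dicts. The function names (select, project, natural_join, as_set) and the sample tuples are illustrative assumptions, not part of the course material.

```python
def select(rel, pred):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """pi: keep only the named columns and drop duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r, s):
    """Natural join: equijoin on all common fields, one copy of each."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

def as_set(rel):
    """Order-insensitive view of a relation, for comparing results."""
    return {tuple(sorted(t.items())) for t in rel}

student = [
    {"sid": 1, "name": "Ann", "country": "AUS"},
    {"sid": 2, "name": "Bob", "country": "NZ"},
]
enrolled = [
    {"sid": 1, "uos_code": "INFO2120"},
    {"sid": 2, "uos_code": "INFO2120"},
    {"sid": 1, "uos_code": "COMP5138"},
]

is_aus = lambda t: t["country"] == "AUS"

# Distribution rule 3: sigma_P(R join S) = sigma_P(R) join S,
# valid here because the predicate only references Student.
lhs = select(natural_join(student, enrolled), is_aus)
rhs = natural_join(select(student, is_aus), enrolled)
names = project(lhs, ["name"])
print(as_set(lhs) == as_set(rhs), names)
```

Pushing the selection below the join (the right-hand side) joins fewer tuples, which is why a query optimiser would prefer that form.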

Week 4B: Introduction to SQL (+ Joins)

1. SQL statements and clauses
- DDL (Data Definition Language): create, drop, or alter the relation schema; specify integrity constraints: PK, FK, NULL/NOT NULL constraints.
- DML (Data Manipulation Language): query, insert, delete and modify information in the DB: INSERT INTO, UPDATE, DELETE FROM.
- DCL (Data Control Language): control the DB, like administering privileges and users.
- SELECT: lists the columns (and expressions) that should be returned from the query. DISTINCT removes duplicates; * = all; can have +, -, *, / as arithmetic operators; || = concatenate; AS renames relations and attributes.
- FROM: indicates the table(s) from which data will be obtained; lists the relations involved in the query.
- WHERE: indicates the conditions to include a tuple in the result. Comparison operators: =, >, >=, <, <=, != (or <>). Combine with AND, OR, and NOT. BETWEEN allows a range query. LIKE is used for string matching: % = any substring, _ = any character.
- GROUP BY: indicates the categorization of tuples.
- HAVING: indicates the conditions to include a category.
- ORDER BY: sorts the result according to the specified criteria: ASC (default), DESC.
- Date and Time: 4 types: DATE, TIME, TIMESTAMP, INTERVAL. Can use CURRENT_DATE, CURRENT_TIME as constants. Normal time-order comparisons apply: =, >, <, <=, >=. Main operations: EXTRACT( component FROM date ), DATE string, +/- INTERVAL.

2. Join
You can join two or more tables using attribute conditions. Types of join: NATURAL JOIN, INNER JOIN and OUTER JOIN.
  R NATURAL JOIN S
  R INNER JOIN S ON <join condition>
  R INNER JOIN S USING (<list of attributes>)
  R LEFT OUTER JOIN S
  R RIGHT OUTER JOIN S
  R FULL OUTER JOIN S

3. NULL values and three-valued logic
Three-valued logic uses three different result values for logical expressions: TRUE if a condition holds, FALSE if a condition does not hold, and UNKNOWN if a comparison includes a NULL. Three-valued logic is needed because of possible NULL values in databases, and because a logical condition needs all values to be known in order to be decidable.

4. Set operators
The set operations union, intersect, and except (Oracle: minus) operate on relations and correspond to the relational algebra operations ∪, ∩ and -.
Example: (select customer_name from depositor) union (select customer_name from borrower)
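
A short sketch of how NULL and outer joins behave in practice, assuming Python's standard-library sqlite3 module as the DBMS. The Student and Enrolled rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Enrolled (sid INTEGER, uos_code TEXT, mark INTEGER);
INSERT INTO Student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cho');
INSERT INTO Enrolled VALUES (1, 'INFO2120', 80), (2, 'INFO2120', NULL);
""")

# A comparison with NULL is UNKNOWN, which sqlite3 reports as None.
unknown = conn.execute("SELECT NULL = NULL").fetchone()[0]

# WHERE keeps only rows whose condition is TRUE, so the NULL mark matches
# neither 'mark >= 50' nor 'mark < 50'.
passed = conn.execute("SELECT sid FROM Enrolled WHERE mark >= 50").fetchall()
failed = conn.execute("SELECT sid FROM Enrolled WHERE mark < 50").fetchall()
no_mark = conn.execute("SELECT sid FROM Enrolled WHERE mark IS NULL").fetchall()

# LEFT OUTER JOIN keeps students without a matching Enrolled row,
# padding the missing side with NULL.
outer = conn.execute("""
    SELECT S.name, E.uos_code
    FROM Student S LEFT OUTER JOIN Enrolled E ON S.sid = E.sid
    ORDER BY S.sid
""").fetchall()
print(unknown, passed, failed, no_mark, outer)
```

Note that Bob's row appears in neither the passed nor the failed result: only IS NULL finds it.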

Week 5: Nested Subqueries, Grouping, and Relational Division

Nested Subqueries
- A subquery is a SELECT-FROM-WHERE expression that is nested within another query.
- Common use: set membership, set comparisons and set cardinality.

Non-correlated subqueries
- Don't depend on data from the outer query.
- Execute once for the entire outer query.

Correlated subqueries
- Make use of data from the outer query.
- Execute once for each row of the outer query.
- Can use the EXISTS operator.

IN (non-correlated subquery)
- A comparison operator that compares a value v with a set/multiset of values V, and evaluates to true if v is one of the elements in V.
    SELECT sid, name FROM Student
    WHERE sid IN ( SELECT E.sid FROM Enrolled E WHERE E.uos_code = 'INFO2120' )

EXISTS (correlated subquery)
- Used to check whether the result of a correlated nested query is empty (contains no tuples) or not.
- The following checks for each student S whether there is at least one entry in the Enrolled table for that student in INFO2120:
    SELECT sid, name FROM Student S
    WHERE EXISTS ( SELECT * FROM Enrolled E WHERE E.sid = S.sid AND uos_code = 'INFO2120' )

Grouping
- A group is a set of tuples that have the same value for all attributes in the grouping list.
- NOTE: an attribute in the SELECT clause must be in the GROUP BY clause as well (unless it only appears inside an aggregate function).
- SYNTAX: it must follow this order:
    SELECT target-list FROM relation-list WHERE qualification GROUP BY grouping-list HAVING group-qualification
- EXAMPLE: What was the average mark of each course?
    SELECT uos_code as unit_of_study, AVG (mark) FROM Assessment GROUP BY uos_code

Relational Division
- Definition: given R (a1, …, an, b1, …, bm) and S (b1, …, bm), R/S, with attributes a1, …, an, is the set of all tuples <a> such that for every tuple <b> in S, there is an <a,b> tuple in R.
- It is not an essential operator: just a useful shorthand.
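
The queries above, plus one common way to express relational division in SQL (a double NOT EXISTS), can be run as a sketch with Python's standard-library sqlite3 module. The table Required (playing the role of the divisor S) and all sample rows are hypothetical, and for simplicity the candidate sids are drawn from Student rather than from Enrolled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Enrolled (sid INTEGER, uos_code TEXT);
CREATE TABLE Assessment (sid INTEGER, uos_code TEXT, mark INTEGER);
CREATE TABLE Required (uos_code TEXT);   -- hypothetical divisor relation S
INSERT INTO Student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Enrolled VALUES (1, 'INFO2120'), (1, 'COMP5138'), (2, 'COMP5138');
INSERT INTO Assessment VALUES (1, 'INFO2120', 80), (1, 'COMP5138', 60),
                              (2, 'COMP5138', 70);
INSERT INTO Required VALUES ('INFO2120'), ('COMP5138');
""")

# Non-correlated: the inner query runs once.
in_rows = conn.execute("""
    SELECT sid, name FROM Student
    WHERE sid IN (SELECT E.sid FROM Enrolled E WHERE E.uos_code = 'INFO2120')
""").fetchall()

# Correlated: the inner query refers to S from the outer query.
exists_rows = conn.execute("""
    SELECT sid, name FROM Student S
    WHERE EXISTS (SELECT * FROM Enrolled E
                  WHERE E.sid = S.sid AND E.uos_code = 'INFO2120')
""").fetchall()

averages = conn.execute("""
    SELECT uos_code AS unit_of_study, AVG(mark)
    FROM Assessment GROUP BY uos_code ORDER BY uos_code
""").fetchall()

# Enrolled / Required: students for whom no required unit is missing.
division = conn.execute("""
    SELECT S.sid FROM Student S
    WHERE NOT EXISTS (
        SELECT * FROM Required R
        WHERE NOT EXISTS (SELECT * FROM Enrolled E
                          WHERE E.sid = S.sid AND E.uos_code = R.uos_code))
""").fetchall()
print(in_rows, exists_rows, averages, division)
```

The IN and EXISTS forms return the same students; the division query reads as "there is no required unit that this student is not enrolled in".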

Week 6: Schema Normalization (including BCNF)

Motivation
- The most important requirement of DB design is adequacy: every important process can be done using the data in the database.
- If a design is adequate, seek to avoid redundancy in the data: the same information repeated in several places.
- Redundancy is at the root of several problems associated with relational schemas: redundant storage, insertion anomaly, deletion anomaly, update anomaly.

Functional Dependencies and Normal Forms
- Functional Dependency ("FD"): the value of one attribute (the determinant) determines the value of another attribute. X → Y means "X functionally determines Y" and "Y is functionally dependent on X".
- Candidate key: each non-key field is functionally dependent on every candidate key. There may be several candidate keys. Choose one candidate key as the primary key.
- A superkey is a column/set that includes a candidate key.
- From FDs to Keys: candidate keys are defined by functional dependencies | Consequently, FDs help us to identify candidate keys. If you know the FDs, you can check whether a column (or set) is a key for the relation.
- Candidate Key Identification: identifying all FDs that hold on our data set | Then reasoning over those FDs using a set of rules on how we can combine FDs to infer candidate keys | Or alternatively, using these FDs to verify whether a given set of attributes is a candidate key or not.
- From the Attribute Closure to Keys: the set of functional dependencies can be used to find candidate keys. Example: R ( A, B, C, D ) with FDs: {A -> B D and B -> C}.

Schema Normalisation ("SN")
- Main idea: only allow FDs of the form of a key constraint.
- SN is the process of validating and improving a logical design so that it satisfies certain constraints (Normal Forms) that avoid unnecessary duplication of data.
- First normal form ("1NF"): domains of all attributes are atomic
- Second normal form ("2NF"): 1NF + no partial dependencies
- Third normal form ("3NF"): 2NF + no transitive dependencies
- BCNF: the only non-trivial FDs that hold are key constraints

Table Decomposition
A decomposition of R consists of replacing R by two or more relations such that:
- each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R),
- every attribute of R appears as an attribute of one of the new relations,
- and all new relations differ.

Making it Precise
- It is essential that all decompositions used to deal with redundancy be lossless!
- Dependency-preserving: if R is decomposed into S and T, then all FDs that were given to hold on R must also hold on S and/or T. (Dependency preserving does not imply lossless join & vice-versa!)
- Must consider whether all FDs are preserved. If a dependency-preserving decomposition into BCNF is not possible (or unsuitable, given typical queries), should consider decomposition into 3NF.

Overall Design Process
Consider a proposed schema | Find out application domain properties expressed as functional dependencies | See whether every relation is in BCNF | If not, use a bad FD to decompose one of the relations, starting with partial dependencies (replace the original relation by its decomposed tables) | Repeat the above, until you find that every relation is in BCNF.
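
The step from attribute closure to candidate keys can be sketched in a few lines of Python. This is an illustration under assumptions: the FD `A -> B D` from the example is read as "A determines both B and D", and the function names closure and candidate_keys are made up for this sketch.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key: a superkey, not minimal
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return keys

# R(A, B, C, D) with A -> B D (read as A -> {B, D}) and B -> C
schema = {"A", "B", "C", "D"}
fds = [("A", "BD"), ("B", "C")]

a_plus = closure("A", fds)
b_plus = closure("B", fds)
keys = candidate_keys(schema, fds)

# An FD whose left side is not a superkey is a 'bad FD' with respect to BCNF.
bcnf_violations = [(l, r) for l, r in fds if closure(l, fds) != schema]
print(sorted(a_plus), sorted(b_plus), keys, bcnf_violations)
```

Under this reading A is the only candidate key, and B -> C is the FD one would use to decompose R, since B does not determine the whole relation.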

Week 7: Database Security and Integrity (+ Triggers)

Every database's security needs to be managed at some level; that's why there is database access control. There are two types of access control, namely authentication and authorization. Authentication makes use of logins and passwords to make sure the person who tries to log in is really the user they claim to be. Authorization, on the other hand, lets the owner of the database give some rights to other people on their tables and views using the syntax: GRANT event ON tablename TO personname. And revoke access using the syntax: REVOKE event ON tablename FROM personname. Event: insert / delete / select / update. Now we can generate some views and grant or revoke access to some people. Why are all these measures taken? To protect the private data of an individual.

Integrity constraints are conditions that must be satisfied for every instance of the database. Integrity constraints are specified in the database schema and are checked when the database is modified. If the conditions are not satisfied then the DBMS will abort the transaction. There are 2 types of integrity constraints, namely static integrity constraints and dynamic integrity constraints.

A static integrity constraint is a condition that every legal instance of a database must satisfy. Some examples of static integrity constraints are domain constraints, key constraints and assertions.

Domain constraints: let's say you have a database containing many VARCHAR columns and you don't want to rewrite the same thing again and again; then you can use a domain constraint to create a VARCHAR domain which will be available to all the tables in the database, and a check will be made to verify that each value is within the limit. E.g. CREATE DOMAIN domainname CHECK (value IN ()).

Key constraints: e.g. CREATE TABLE student ( Sid INTEGER PRIMARY KEY, name VARCHAR ). We can add some constraints on the database like "ON DELETE NO ACTION" so that a parent table's tuple cannot be deleted while a child table's tuple still references it.

Semantic integrity constraints were introduced so that there is no loss of data consistency when changes are made to the database. One example of a semantic integrity constraint is the UNIQUE keyword on the student ID.

ASSERTIONs are schema objects and are static integrity constraints that make the database always satisfy a condition. E.g. CREATE ASSERTION checksid CHECK ( (SELECT COUNT(Sid) FROM student) <= 100 ) to check that the number of students must not exceed 100.

A DEFERRED constraint lets the transaction complete first and then checks the constraint; a NOT DEFERRABLE constraint is checked immediately, every time the database is modified.

A dynamic integrity constraint is a condition that a legal database state change must satisfy. One example of a dynamic integrity constraint is the trigger. A trigger is a statement that automatically fires if some specific modifications occur on the database. E.g. CREATE TRIGGER triggername AFTER/BEFORE INSERT OR UPDATE OF column ON tablename BEGIN action END.
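
As a hedged illustration of a CHECK constraint and a trigger, the sketch below uses Python's standard-library sqlite3 module. SQLite supports neither CREATE DOMAIN nor CREATE ASSERTION, so the "number of students must not exceed a limit" rule is emulated with a trigger, and the limit is set to 2 (not 100) to keep the example short. The status column, trigger name and sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (
    sid    INTEGER PRIMARY KEY,
    name   VARCHAR(40) NOT NULL,
    status VARCHAR(1),
    CONSTRAINT chk_status CHECK (status = 'A' OR status = 'B')
);
-- SQLite has no CREATE ASSERTION, so a trigger stands in for the
-- 'number of students must not exceed a limit' rule (limit 2 here).
CREATE TRIGGER max_students BEFORE INSERT ON Student
WHEN (SELECT COUNT(*) FROM Student) >= 2
BEGIN
    SELECT RAISE(ABORT, 'too many students');
END;
""")

def try_insert(sid, name, status):
    """Return 'ok' or the reason the DBMS rejected the row."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (sid, name, status))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = [
    try_insert(1, "Ann", "A"),
    try_insert(2, "Bob", "X"),   # violates the CHECK constraint
    try_insert(2, "Bob", "B"),
    try_insert(3, "Cho", "A"),   # rejected by the trigger
]
count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(results, count)
```

In a DBMS that implements assertions, the trigger could be replaced by the CREATE ASSERTION form given in the notes.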

Week 8: DB Application Development

Database Application Architectures
- Data-intensive systems: three types of functionality: presentation logic, processing logic, data management.
- System architectures can be 1, 2, or 3 tiered depending on the presence of client, DB server and web/application server.
- Interactive SQL refers to SQL statements input directly to the terminal; the DBMS outputs to screen. Non-interactive SQL refers to SQL statements included in an application program.

Client-side Database Application Development
- To integrate SQL with a host language (e.g. Java, C), can either embed SQL in the language (statement-level interface), or create an API to call SQL commands (call-level interface).
- Five problems with interfacing with SQL, including:
  1. Establishing a database connection: $conn = new PDO( DSN, $userid, $passwd [, $params] );
     a. PDO is DBMS independent; when creating a new connection, need to insert the DBMS prefix.
     b. New connections take some time, so should only be done once in a program.
  2. Executing SQL statements, etc.

PHP: a scripting language for dynamic websites that is embedded into HTML
- Variables: begin with $; a value must belong to a class, but variables can be declared without giving a type.
- Strings: double quotes replace substrings with variables, single quotes do not.
- Arrays: numeric arrays are indexed 0, 1, 2, ...; associative arrays are paired (key and value).
- NULL: PHP supports NULL by default. isset($var) checks if var exists and is not NULL. empty($var) returns FALSE if var exists and has a non-empty, nonzero value.

PDO (PHP Data Objects)
- An extension to PHP that provides a database abstraction layer (used to connect PHP and the database).
- Three different ways of executing SQL statements: semistatic (PDO::query(sql)), parameterized (PDO::prepare(sql)), or immediately run (PDO::exec(sql)).
- Placeholders: anonymous placeholders are represented as a ? inside a query and linked using $stmt->bindValue(1, $variable); the '1' represents the first ? in the query. Named placeholders are represented in the query using the format :name and are linked using $stmt->bindValue(':name', $variable).
- Error handling: never show database errors to the end user.
- Exception handling: PDOException::getMessage() returns the exception message; PDOException::getCode() returns the exception code.

SQL Injection
- SQL injection attacks most frequently occur when an unauthorised user exploits unchecked user input or buffer overflows in the database.
- When a query is built by concatenating user input into the SQL text, this holds potential for an SQL injection attack; for this reason parameterized queries (prepared statements with placeholders) are a better choice.

Stored Procedures
- Stored procedures are when application logic is run from within the database server.
- There are many advantages to stored procedures: improved maintainability, reduced data transfer, fewer locks that are being held for long periods, abstraction layer (programmers need not know the schema).
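
The difference between a concatenated query and a parameterized one can be demonstrated with a small sketch. The notes use PHP and PDO; this sketch instead assumes Python's standard-library sqlite3 module, whose :name placeholders play the same role as PDO's named placeholders. The Users table, its rows and the two login functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (userid TEXT PRIMARY KEY, passwd TEXT);
INSERT INTO Users VALUES ('alice', 'secret'), ('bob', 'hunter2');
""")

def login_unsafe(userid, passwd):
    """Builds the SQL text from user input: vulnerable to injection."""
    sql = ("SELECT userid FROM Users WHERE userid = '" + userid +
           "' AND passwd = '" + passwd + "'")
    return conn.execute(sql).fetchall()

def login_safe(userid, passwd):
    """Uses named placeholders; input is bound as data, never parsed as SQL."""
    sql = "SELECT userid FROM Users WHERE userid = :userid AND passwd = :passwd"
    return conn.execute(sql, {"userid": userid, "passwd": passwd}).fetchall()

# The 'password' closes the string literal and appends an always-true condition.
attack = "' OR '1'='1"
unsafe_rows = login_unsafe("alice", attack)
safe_rows = login_safe("alice", attack)
good_rows = login_safe("alice", "secret")
print(unsafe_rows, safe_rows, good_rows)
```

With concatenation the attack string returns every user; with placeholders it is compared as an ordinary (wrong) password and matches nothing.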

Week 9: Transaction Management (ACID, serialisability)

Transaction: a collection of one or more operations on one or more databases, which reflects a discrete unit of work. A transaction does:
- Return information from the database
- Update the database to reflect the occurrence of a real world event
- Cause the occurrence of a real world event

ACID Properties
- Atomicity: a transaction should either complete or have no effect at all.
- Consistency: execution of a transaction in isolation preserves the consistency of the database.
- Isolation: although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions.
- Durability: the effect of a transaction on the database state should not be lost once the transaction has committed.

Commit: if the transaction successfully completes. Abort: if the transaction does not successfully complete.
The database is consistent if all static integrity constraints are satisfied. Basically: each transaction preserves database consistency.

Serializability
- A sequence of database operations is serializable if it is equivalent to a serial execution of the involved transactions.
- A serializable execution guarantees correctness in that it moves a database from one consistent state to another consistent state. Thus it follows ACID and fulfils the consistency component.
- Concurrency control is the protocol that manages simultaneous operations against a database so that serializability is assured.

Locking Protocol => Two-phase Locking Protocol (2PL)
- A transaction must obtain either:
  o S (shared) lock: data can only be read, but is shared.
  o X (exclusive) lock: data can be read and written by only one transaction.
- Once a transaction has released a lock it cannot request additional locks; all locks are released once the transaction completes.
- Be careful of deadlocks: a cycle of transactions waiting for locks to be released by each other.

Versioning / Snapshot Isolation
- A new version of the items (snapshot) being accessed is created on update. All queries are performed on this new version and then applied to the old version.
- Doesn't mean a transaction is always 100% serializable.

Isolation levels
There are different levels of serialization and different databases require different levels. From lowest to highest:
  o Read uncommitted: uncommitted records may be read.
  o Read committed: only committed records can be read, but successive reads of a record may return different values. Most used level in practice.
  o Repeatable read: only committed records can be read; repeated reads of the same record must return the same value.
  o Serializable: default according to the SQL standard. Means that all transactions are serialized and follow ACID.
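
Atomicity, commit and abort can be illustrated with a minimal sketch, assuming Python's standard-library sqlite3 module. The Account table, its CHECK constraint and the transfer function are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (
    id      INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
INSERT INTO Account VALUES (1, 100), (2, 50);
""")

def transfer(src, dst, amount):
    """Move money as one transaction: both updates happen, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE Account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return "commit"
    except sqlite3.IntegrityError:
        return "abort"

def balances():
    return conn.execute("SELECT id, balance FROM Account ORDER BY id").fetchall()

first = transfer(1, 2, 30)     # fine: 100 -> 70, 50 -> 80
second = transfer(1, 2, 500)   # would make account 1 negative
final = balances()
print(first, second, final)
```

In the aborted transfer the credit to account 2 had already been executed, yet the rollback removes it, so the database moves only between consistent states.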

Week 10: Indexing and Tuning

A database is a collection of relations. Each relation is a set of records. A record is a sequence of attributes.

An index is an access path to efficiently locate row(s) via search key fields without having to scan the entire table.

Indexes: structures to organize records via trees or hashing. Two examples:
1. Ordered index: search keys are stored in sorted order
2. Hash index: search keys are distributed uniformly across 'buckets' using a hash function

Primary index: an index whose search key specifies the sequential order of the file. Also called main index or integrated index.
Secondary index: an index whose structure is separated from the data file and whose search key typically specifies an order different from the sequential order of the file.

In SQL, an index is created with: CREATE INDEX name ON relation-name (<attributelist>)

Clustered index
- Good for range searches over a range of search key values
- Index entries and rows are ordered in the same way
- There can be at most one clustered index on a table
- CREATE TABLE generally creates an integrated, clustered (main) index on the primary key

Unclustered (secondary) index
- Index entries and rows are not ordered in the same way
- There can be many unclustered indices on a table
- Unclustered isn't ever as good as clustered, but may be necessary for attributes other than the primary key

Types of Indexes
- Tree-based indexes: B+-Tree
  o Very flexible, only indexes to support point queries, range queries and prefix searches
- Hash-based indexes
  o Fast for equality searches
- Special indexes
  o Such as Bitmap Indexes for OLAP or R-Tree for spatial databases
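
To see an index acting as an access path, the sketch below compares SQLite's query plan before and after CREATE INDEX, assuming Python's standard-library sqlite3 module. The index name idx_enrolled_uos, the helper plan and the generated rows are illustrative, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enrolled (sid INTEGER, uos_code TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [(i, "INFO2120" if i % 2 else "COMP5138", 50 + i % 50) for i in range(1000)],
)

def plan(sql):
    """Return SQLite's textual query plan (last column of each plan row)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT sid FROM Enrolled WHERE uos_code = 'INFO2120'"
before = plan(query)   # no index yet: expect a full table scan

conn.execute("CREATE INDEX idx_enrolled_uos ON Enrolled (uos_code)")
after = plan(query)    # expect the plan to mention idx_enrolled_uos
print(before)
print(after)
```

This is an unclustered (secondary) index in the sense used above: the rows stay in insertion order and the index only points to them.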

Week 11: Data Analysis - OLAP and Data Warehousing

The Problem/Motivation:
- Businesses aim to create complex, interactive and exploratory analysis of datasets by integrating data collected across the enterprise.
- Data such as current and historical data are being analyzed to identify useful patterns and support strategies.
- Traditionally OLAP queries data collected in the enterprise's own OLTP system, but newer applications such as Internet companies prefer gathering the data they need and potentially even purchasing it.
- The Internet helps the sharing of big data sets, and correlating the data with one's own data becomes more important.
- Data visualization turns large amounts of data into useful information that businesses can understand easily and base decisions on.
- Example: Google Fusion Tables, Maptd.com
- Three complementary trends of data analysis in the enterprise:
  o Data Warehousing: consolidate data from many sources in one large repository
  o OLAP: interactive and "online" queries based on spreadsheet-style operations and a "multidimensional" view of data
  o Data Mining: exploratory search for interesting trends and anomalies

OLTP vs OLAP vs Data Mining
- OLTP (On Line Transaction Processing) maintains a database of some real-world enterprise and supports day-to-day operations. These are short simple transactions with frequent updates that only access a small fraction of the database at a time.
- OLAP (On Line Analytic Processing) uses mainly historic data in the database to guide strategic decisions. These are complex queries with infrequent updates. Transactions access a large fraction of the database. Data is queried in more sophisticated and more specific ways.
- Data mining attempts to find patterns and extract useful information from a database without setting a strict guideline in the query.

Data Warehouse
- Data (often derived from OLTP) for OLAP and data mining applications is usually stored in a special database called a data warehouse.
- Data warehouses contain large amounts of read-only data that has been gathered at different times spanning long periods, provided by different vendors and with different schemas.
- Populating such warehouses is non-trivial (data integration etc.).
- Must include a metadata repository, which is information about the physical and logical organization of data.
- New techniques for database design, indexing, and analytical querying need to be supported.

Data Warehousing: Issues and the ETL Process
- Data needs to be gathered in a form suitable for analysis.
- Issues in data warehousing include semantic integration (eliminate mismatches from different sources, e.g. different attribute names or domains), heterogeneous sources (access data from a variety of source formats), Load Refresh Purge (load data, refresh periodically, and purge old data) and metadata management (keep track of source, loading time, and other information for data in the warehouse).
- Populating a data warehouse: ETL process (Capture/Extract, Transform, and Load).
- Typical operational data is transient, not comprehensive, and potentially contains inconsistencies and errors. After ETL, data should be detailed, periodic, and comprehensive.

Further topics
- Star Schema
- ROLAP/MOLAP
- CUBE, ROLLUP and GROUPING SET
- Window and Ranking Queries

Week 12: Introduction to Data Exchange with XML

SQL can be ignorant of how data is stored, but a schema is still required! But, how do we transport semi-structured data? XML!

Semistructured Data: "Self-describing, irregular data, no a priori structure"
- Origins: integration of heterogeneous sources; data sources with non-rigid structure (biological or Web).
- Characteristics: missing or additional attributes, multiple attributes, different types in different objects, and heterogeneous collections.

XML
- XML describes content whereas HTML describes presentation.
- XML has 4 core specifications: XML Documents, Document Type Definitions (DTDs), Namespaces, XML Schema.
- Specifics for XML: syntactic structure, elements & attributes, character set, and a logical / physical structure (DTD with 'entities').
- Paradigm shifts:
  o Web: from HTML to XML, from information retrieval to data management
  o Databases: from relational model to semistructured data, from data processing to data/query translation, from storage to transport
- Database issues: Model XML using graphs, Store XML, Query XML using XQuery, Processing XML.

XML vs. JSON
- JSON: JavaScript Object Notation. Text-based, semi-structured format for data interchange; originates from object serialization a la JavaScript. Low-overhead format as opposed to XML.
- XML:
    <person name="John Smith">
      <address street="1 Cleveland Street" city="Sydney" state="NSW" zipcode="2006" />
    </person>
- JSON:
    { "name": "John Smith",
      "address": { "street": "1 Cleveland Street", "city": "Sydney", "state": "NSW", "zipcode": 2006 } }

DTD (Document Type Definition) vs. XML Schema
- DTD:
  o Example: <!ELEMENT book (title)>
  o Grammar
  o Elements + attributes
  o Only "Part of" relationships
  o Specified as part of the prologue of an XML document
- XML Schema:
  o Example:
      <xsd:simpleType name="Score">
        <xsd:restriction base="xsd:integer">
          <xsd:minInclusive value="0"/>
          <xsd:maxInclusive value="100"/>
        </xsd:restriction>
      </xsd:simpleType>
  o Structure and typing
  o Elements, attributes, simple and complex types, groups
  o Supports "includes" relationships: inheritance
  o Specified as attribute of the document elements

Modern databases support SQL/XML
- Provide an XML datatype to store XML in the database, stored in native tree form. Produces tables that can have columns of type XML.
- Integrates XML support functions for querying and inserting XML data:
  o XMLPARSE() parses XML fragments or documents so that they can be stored in SQL.
  o XMLELEMENT() produces a single nested XML element.
  o XMLATTRIBUTES(): only as an optional part of an XMLELEMENT call; adds attribute(s) to a new XML element.
  o XMLCONCAT() concatenates individual XML values.
  o XMLAGG(): an aggregate function that concatenates several input XML rows to a single XML output value.
  o XMLCOMMENT creates an XML comment element containing text.
  o XPATH(xpath, xml): selects the XML content specified by the xpath expression from the xml data.
  o XMLEXTRACT and XMLEXISTS: tell whether the set of nodes returned by an XPath expression is empty (not supported by PostgreSQL; will be added in upcoming version 9.3).
- An SQL query does not return XML directly.
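
The XML and JSON versions of the person record can be compared directly with a small sketch using only Python's standard library (xml.etree.ElementTree and json); the variable names are illustrative.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """
<person name="John Smith">
  <address street="1 Cleveland Street" city="Sydney"
           state="NSW" zipcode="2006" />
</person>
"""

json_text = """
{ "name": "John Smith",
  "address": { "street": "1 Cleveland Street", "city": "Sydney",
               "state": "NSW", "zipcode": 2006 } }
"""

person_xml = ET.fromstring(xml_text.strip())
person_json = json.loads(json_text)

# XML attribute values are always strings; JSON keeps 2006 as a number.
xml_city = person_xml.find("address").get("city")
xml_zip = person_xml.find("address").get("zipcode")
json_city = person_json["address"]["city"]
json_zip = person_json["address"]["zipcode"]
print(xml_city, repr(xml_zip), json_city, repr(json_zip))
```

Both formats are self-describing, but without a DTD or XML Schema the XML side carries no type information, whereas JSON distinguishes numbers from strings in the syntax itself.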

Jenna's Super Summary

1. Basic database stuff
- Why is a table called a relation? Relation from primary key to every column
- Types of relations: Relation, Relational schema, Relational schema instance, Relational instance (examples: Department, foreign key value, foreign key field, 3, ER diagram, DB)
- Integrity constraints (all constraints, enforces data integrity)
- Types of keys
  1. Primary
  2. Candidate
  3. Super
  4. Foreign
- Types of constraints
  - Static constraints:
    - Domain constraints (fields must be of correct data domain) (constraint on ONE attribute)
      1. Null/not null
      2. ENUM checks
      3. Unique and Unique checks
    - Key constraints
      1. Keys (including foreign keys)
         - personid INTEGER REFERENCES person (id) ON DELETE NO ACTION
         - ON DELETE: CASCADE, SET NULL, SET DEFAULT, NO ACTION (default, post-triggers), RESTRICT (pre-triggers)
    - Semantic integrity constraints (constraints on MULTIPLE attributes)
      1. Checks
         - anonym: status VARCHAR CHECK (status = 'A' OR STATUS = 'B')
         - named: CONSTRAINT chk_status CHECK (status = 'A' OR STATUS = 'B')
      2. Assertions
         - CREATE ASSERTION x CHECK ( NOT EXISTS ( SELECT … ) )
      3. Functional dependencies
  - Dynamic constraints
    1. Triggers

2. ER diagrams
- entity: square
- Weak entity types: double rectangles
- Weak entity relationship: double diamonds
- Discriminator (aka partial key): discriminates among all weak entities related to the same strong entity
- relationships: diamonds
- attribute: ellipse
  - keys are underlined
  - double-ellipses: multi-valued attributes
  - ellipses to ellipses: related attributes
- Cardinality: relationship applies to FURTHEST entity
  - arrow: at most one
  - thick line: at least one
  - thick arrow: exactly one
  - Employee works in EXACTLY ONE department:  Employee ==> WorksIn --- Department
  - Employee works in AT MOST ONE department:  Employee --> WorksIn --- Department
  - Employee works in AT LEAST ONE department: Employee === WorksIn --- Department
  - Employee works in 1 TO 3 departments:      Employee ==1..3== WorksIn --0..*-- Department
- Superclass/subclass
  - Triangle, superclass at tip of triangle
  - overlapping: default (can belong to 1 or more)
  - disjoint: write disjoint (can belong to only 1)
  - total: thick line (an entity must belong to one)
  - partial: default (an entity doesn't have to belong to any)

3. SQL
- CREATE TABLE
  - Types of fields: INTEGER, VARCHAR, CHAR, TEXT, DATE
  - ENUM: CREATE TYPE x AS ENUM …
- syntax: SELECT FROM WHERE GROUP BY HAVING ORDER BY
- SELECT stuff
  - SELECT * (all columns)
  - SELECT ALL (keep dups, default) or SELECT DISTINCT (remove dups)
  - SELECT x AS y (rename operator)
  - SELECT 3 * 4 (arithmetic operations)
- WHERE stuff
  - =, <>, !=, <, >, <=, >=
  - AND, OR, NOT
  - BETWEEN 75 AND 100
  - LIKE 'POST%' (and lots of other string/regex operations)
  - IS NULL, not = null
  - DATE '2012-03-01'
  - '2012-04-01' + INTERVAL '36 HOUR'
  - EXTRACT(year FROM enrolmentDate)
  - CURRENT_DATE and CURRENT_TIME
- JOIN stuff
  - R natural join S
  - R inner join S on <join condition>
  - R inner join S using (<list of attributes>)
  - Equi-join: when fields are equal
  - Natural join: duplicate column names
  - Outer join: non-matches included as NULL
    - left outer join: joined table can have null attributes
    - right outer join: non-joined table can have null attributes
    - full outer join: both tables can have null attributes
  - Union join: all columns included, all rows included (cartesian join)
- set operations
  - UNION (add rows)
  - INTERSECT (duplicate rows only)
  - EXCEPT (minus duplicate rows)
- Aggregate functions
  - Avg, count, sum, min, max
  - select count(*)
  - select count(distinct sid) from Enrolled
  - select avg(mark)
  - most aggr. functions ignore nulls
- Subqueries
  - Correlated vs uncorrelated
- Nulls
  - 5 + null returns null
  - three-valued logic: AND, OR, NOT

4. Relational Algebra
  1. set operations
     - union: OR
     - intersection: AND
     - difference: MINUS
     - cross-product (X): fully combine relations (col x row = for each thing in the col, that plus the row)
  2. remove parts
     - selection (sigma): WHERE clause (select rows)
     - projection (pi): SELECT clause (select only specified cols)
  3. combine parts
     - join: combine fields
     - conditional join: join on specified fields
     - natural join: join on all equal fields
     - join (triangular-infinity thing): combine matching tuples (col x row = same as col, but with extra row-part for matches)
  4. rename parts
     - rename (rho, looks like rounded p): rename one field to another

5. Functional dependencies
- A --> B if 'A functionally determines B', or 'B is functionally dependent on A'
- a primary key functionally determines the whole row
- a candidate key determines every column
- a superkey is a set of columns that contains the candidate key
- trivial FDs: X --> Y where Y is a subset of X (you determine yourself)
- Data redundancy causes anomalies
  1. insertion: duplicate data or null values
  2. delete: loss of data needed for future rows
  3. update: changes in one row cause changes to all rows (biggest problem)
- Attribute closure X^+ of some attributes X is 'all attributes that are determined by X' (functionally dependent on X), including X itself
  1. Initialise result with the given set of attributes: X = {A1, …, An}
  2. Repeatedly search for some FD: A1 A2 … Am -> C such that all A1, …, Am are already in the set of attributes result, but C is not. Add C to the set result.
  3. Repeat step 2 until no more attributes can be added to result
  4. The set result is the correct value of X^+ (the closure of attributes)
- To find all candidate keys, look at each set of attributes K and calculate the attribute closure K^+
  - If K^+ contains all columns, K is a superkey
  - Check each subset of K to see if it is also a superkey
  - Find the candidate keys (the smallest subset that is still a superkey)
  - Pick one candidate key to be the primary key

6. Normalisation ('decomposing' into normal forms)
- 1NF: all attributes are atomic (no multivalued or composite attributes)
- 2NF: no partial dependencies (not important)
- 3NF: no transitive dependencies (not important)
- BCNF: no remaining anomalies from functional dependencies (good!)
  - The only non-trivial FDs are key constraints
  - formally: for every FD A --> B, either the FD is trivial or A is a superkey (primary key, candidate key, or more)
- 4NF: no multivalued dependencies (not important)
- 5NF: no remaining anomalies (not important)
- Decomposition
  - Lossless-join decomposition
    - When you join the decomposed relations, you get the original relation
    - Not lossless-join doesn't usually mean whole rows are lost, it could mean that meaningless rows are added
    - If R(A, B, C) has A -> B, then the decomposition L(A, B) and L(A, C) is always lossless-join
  - Dependency-preserving decomposition
    - Every dependency from the original is still in the decomposed relations
    - Often, we say every original dependency is in exactly ONE of the decomposed relations

7. Indexing
- With an index, selecting takes less time, inserting takes more time
- An "access path" is the journey you take to reach the data (e.g. query --> table scan --> record)
- A "search key" is a sequence of attributes that are indexed.
- CREATE INDEX name ON table (field)
- Types of indexes
  - sorted (uncommon, tree is better)
  - tree (like sorted; good for range, equality and prefix searches)
    - 2 levels mean records are 2 indexes away at most
  - hash (good for equality and that's it)
  - special (e.g. bitmap indexes, r-trees for spatial data)
- Properties of indexes
  - Unique (index over a candidate key, includes primary key) vs nonunique
  - Single- vs multi-attribute
  - Clustered (data records are ordered the same way as indexes) vs unclustered
  - Main [or primary] (indexes contain the whole row) vs secondary (indexes contain a pointer)
- There can be at most one clustered index on a table
- Clustered is good for "range searches" (key is between two limits)
- A covering index (for a query) means all fields in the query are indexed, so the records are not accessed at all
- CREATE TABLE usually creates a unique, clustered, main index on the primary key
- CREATE INDEX usually creates a secondary, unclustered index
- Space and time problems
  - records are stored in pages, each page contains a maximum number of records
  - an index is a type of page
  - assumptions:
    - if a field has 3 possible values, there are an equal number of records with each value
    - 10% of the records with A = a also have B = b
  - how much space per row? add up space per field (e.g. 20 byte record)
  - how many records per block? divide the space of a block by this amount (calculate records per block, round down!) (e.g. 4K block)
  - how many blocks? divide total # of records by # of records per block (e.g. 50 blocks)
  - how long does the query take? times the number of blocks by the time an access takes (reading a disk block into memory)

8. Transactions
- a Transaction is a list of SQL statements that are ACID, one logical 'unit of work'
- they happen in order, together, if one fails they all fail
- 'Auto-commit' means every SQL statement is an entire transaction
- Commit / Abort
- ACID
  - Atomicity (all or nothing)
  - Consistency (db always in valid state: triggers, CHECKs, cascading deletes, etc)
  - Isolation (transactions do not interfere)
  - Durability (committing MEANS committed, once a commit returns, any crashes can return to that commit)

9. Serialisability
- Serialisability means interleaved execution is the same as batch execution: given 2 transactions, the final state is the same regardless of the order
- dirty read (reading uncommitted data, WR conflict)
    T1: R(A), W(A), R(B), W(B), Abort
    T2: R(A), W(A), Commit
- unrepeatable read (two reads in a transaction give different results, RW conflict)
    T1: R(A), R(A), W(A), Commit
    T2: R(A), W(A), Commit
- lost update (overwriting uncommitted data, WW conflict)
    T1: W(A), W(B), Commit
    T2: W(A), W(B), Commit
- 2-phase-locking ensures serializable executions, but can mean some operations are blocked
  - Before reading, take shared lock
  - Before writing, take exclusive lock
  - Hold lock until transaction commits/aborts

10. OLAP
- OLAP stands for "online analytical processing"
- Data warehousing
  - db needs to be optimised for SELECT queries
  - maximise (to a point) redundancy
  - LOTS of tricks used: indexes, redundant fields, etc
  - UPDATE, DELETE etc can be slow
- Star schema
  - 1 central fact table, n tables with FKs from the fact table
  - for each dimension, we have a hierarchy
  - getting totals and subtotals for the hierarchies:
    - ROLLUP(x, y, z) does GROUP BY (x, y, z), GROUP BY (x, y), GROUP BY (x), GROUP BY (nothing)
    - CUBE(x, y, z) does GROUP BY (every combination), down to GROUP BY (nothing)
- WINDOW queries
    SELECT AGG(…) OVER name FROM …
    WINDOW name AS (
      [ PARTITION BY attributelist ]   (attributes to partition by)
      [ ORDER BY attributelist ]       (attributes to order by)
      [ (RANGE|ROWS) BETWEEN v1 PRECEDING AND v2 FOLLOWING ]   (rows to look at)
    )

XML
- Not summarised
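Going back to the Functional dependencies section: the attribute closure steps and the "find all candidate keys" recipe can be sketched in Python roughly like this. The function names, the example relation R(A, B, C, D) and its FDs (A -> B, B -> C) are made up for illustration, and the brute-force search over all attribute subsets is only sensible for small exam-sized schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X^+: everything functionally determined by attrs."""
    result = set(attrs)                      # step 1: start with X itself
    changed = True
    while changed:                           # step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:                 # step 2: look for an FD lhs -> rhs
            if lhs <= result and not rhs <= result:
                result |= rhs                # lhs already in result, rhs is new
                changed = True
    return result                            # step 4: result is X^+


def candidate_keys(all_attrs, fds):
    """Smallest attribute sets whose closure contains every column."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue                     # contains a smaller key: superkey only
            if closure(k, fds) == set(all_attrs):
                keys.append(k)
    return keys


# Hypothetical example: R(A, B, C, D) with A -> B and B -> C
ATTRS = {"A", "B", "C", "D"}
FDS = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, FDS)))                        # ['A', 'B', 'C']
print([sorted(k) for k in candidate_keys(ATTRS, FDS)])    # [['A', 'D']]
```

Because subsets are tried smallest first, any set that already contains a found key is skipped, which is the "check each subset of K" step done in reverse.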