Course description: Structured information is the lifeblood of commerce, government, and science today. This course introduces the broad field of structured data, covering topics that range from data modeling, through logical foundations and popular languages, to system implementations. We will study the theory of relational data design; the basics of query languages; efficient storage of data, query execution, and query optimization; transactions and updates; and "big data" and NoSQL systems. The course entails roughly 6 homeworks, a group project, and 2 midterms.
Topics covered: Database design, relational algebra, query languages (SQL, XQuery), data import and munging, views, indexing, transactions, query optimization, client-side and server-side Web development, Map/Reduce, and NoSQL systems.
Prerequisites: For Penn undergrads and submatriculants: CIS 121. For MCIT students: CIT 592 and 594. For other Master's students: programming experience and mathematical maturity equivalent to the above.
Text: Ramakrishnan and Gehrke, Database Management Systems, McGraw-Hill
Detailed syllabus:
1. Introduction. Relational model, schemas and SQL. (Ch. 1, 3.1-3.3, 5.1-5.3)
2. Advanced SQL. Nulls, outer joins (Ch. 5.3-5.6)
3. Updates and views. Embedded SQL and the Web (Ch. 3.6, 7.5-7.7)
4. Relational database design: ER modeling (Ch. 2.1-2.5, 3.1-3.6)
5. Relational design theory: Functional dependencies, Armstrong’s Axioms. Schema refinement and normalization. Decomposition into BCNF and 3NF. (Ch. 19:1-6)
6. Updates and transactions, transactions in practice. Isolation levels. (Ch. 16:1-6)
7. Storage and Indexing (Ch. 8:1-4)
8. B+trees (Ch. 10:3-8)
9. Relational database optimization foundations: relational algebra (Ch. 4:1-2)
10. Query processing and optimization (Ch. 12:1-4, 14:1-4, 15:1, 3)
11. NoSQL solutions and document-oriented databases: MongoDB and MapReduce (handout)
12. NoSQL solutions and graph databases: Neo4j (handout)
13. Security and authorization (Ch. 21)