P. 1
Nunes Database

Nunes Database

|Views: 17|Likes:
Published by venkisura7031

More info:

Published by: venkisura7031 on Aug 21, 2010
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PPT, PDF, TXT or read online from Scribd
See more
See less

10/29/2011

pdf

text

original

Introduction to Database Design

July 2005 Ken Nunes knunes @ sdsc.edu

SAN DIEGO SUPERCOMPUTER CENTER

Database Design Agenda
‡General Design Considerations ‡Entity-Relationship Model ‡Tutorial ‡Normalization ‡Star Schemas ‡Additional Information ‡Q&A

SAN DIEGO SUPERCOMPUTER CENTER

General Design Considerations

‡Users ‡Legacy Systems/Data ‡Application Requirements

SAN DIEGO SUPERCOMPUTER CENTER

Users
‡Who are they?
‡Administrative ‡Scientific ‡Technical

‡Impact
‡Access Controls ‡Interfaces ‡Service levels

SAN DIEGO SUPERCOMPUTER CENTER

Legacy Systems/Data
‡What systems are currently in place? ‡Where does the data come from? ‡How is it generated? ‡What format is it in? ‡What is the data used for? ‡Which parts of the system must remain static?

SAN DIEGO SUPERCOMPUTER CENTER

Application Requirements ‡What kind of database? ‡OnLine Analytical Processing (OLAP) ‡OnLine Transactional Processing (OLTP) ‡Budget ‡Platform / Vendor ‡Workflow? ‡order of operations ‡error handling ‡reporting SAN DIEGO SUPERCOMPUTER CENTER .

Relationship Model A logical design method which emphasizes simplicity and readability. ‡Basic objects of the model are: ‡Entities ‡Relationships ‡Attributes SAN DIEGO SUPERCOMPUTER CENTER .Entity .

Entities Data objects detailed by the information in the database. ‡Denoted by rectangles in the model. Employee Department SAN DIEGO SUPERCOMPUTER CENTER .

Attributes Characteristics of entities or relationships. ‡Denoted by ellipses in the model. Employee Name SSN Department Name Budget SAN DIEGO SUPERCOMPUTER CENTER .

‡Denoted by diamonds in the model.Relationships Represent associations between entities. Employee Name SSN works in Department Name Budget Start date SAN DIEGO SUPERCOMPUTER CENTER .

Relationship Connectivity Constraints on the mapping of the associated entities in the relationship. ‡Denoted by variables between the related entities. ‡Generally. values for connectivity are expressed as ³one´ or ³many´ Employee Name SSN N work 1 Department Name Budget Start date SAN DIEGO SUPERCOMPUTER CENTER .

Connectivity one-to-one Department 1 1 has Manager one-to-many Department 1 has N Project many-to-many Employee M works on N Project SAN DIEGO SUPERCOMPUTER CENTER .

‡The coach requires information on: ‡Players ‡Player statistics ‡Games ‡Sales SAN DIEGO SUPERCOMPUTER CENTER .ER example Volleyball coach needs to collect information about his team.

tickets.date. merchandise Name Statistics Games date opponent result Players Start date End date tickets Sales merchandise SAN DIEGO SUPERCOMPUTER CENTER .date.Team Entities & Attributes ‡Players . end date ‡Games . opponent. name.statistics. start date. result ‡Sales .

Team Relationships Identify the relationships. ‡For each game. we have multiple players so the relationship is one-to-many 1 Games N play Players SAN DIEGO SUPERCOMPUTER CENTER . ‡The player statistics are recorded at each game so the player and game entities are related.

‡We have only 1 set of sales numbers for each game. 1 1 generates Games Sales SAN DIEGO SUPERCOMPUTER CENTER .Team Relationships Identify the relationships. one-to-one. ‡The sales are generated at each game so the sales and games are related.

Team ER Diagram date opponent result Games 1 play N 1 1 generates Players Name Start date End date Statistics tickets Sales merchandise SAN DIEGO SUPERCOMPUTER CENTER .

‡Transform each entity into a table with the key and its attributes.Logical Design to Physical Design Creating relational SQL schemas from entity-relationship models. ‡Transform each relationship as either a relationship table (many-tomany) or a ³foreign key´ (one-to-many and many-to-many). SAN DIEGO SUPERCOMPUTER CENTER .

Entity tables Transform each entity into a table with a key and its attributes. Employee Name SSN create table employee (emp_no number. SAN DIEGO SUPERCOMPUTER CENTER . name varchar2(256). primary key (emp_no)). ssn number.

ssn number. ‡Foreign key is a reference in the child (many) table to the primary key of the parent (one) table.Foreign Keys Transform each one-to-one or one-to-many relationship as a ³foreign key´. primary key (dept_no)). foreign key (dept_no) references department). name varchar2(256). SAN DIEGO SUPERCOMPUTER CENTER . name varchar2(50). Department 1 has N Employee create table department (dept_no number. create table employee (emp_no number. dept_no number. primary key (emp_no).

Foreign Key Department dept_no 1 2 3 Name Accounting Human Resources IT Accounting has 1 employee: Brian Burnett Human Resources has 2 employees: Nora Edwards Ben Smith Employee emp_no 1 2 3 4 5 6 dept_no 2 3 2 1 3 3 Name Nora Edwards Ajay Patel Ben Smith Brian Burnett John O'Leary Julia Lenin IT has 3 employees: Ajay Patel John O¶Leary Julia Lenin SAN DIEGO SUPERCOMPUTER CENTER .

Many-to-Many tables Transform each many-to-many relationship as a table. start_date date. emp_no number. primary key (proj_no. emp_no). ‡The relationship table will contain the foreign keys to the related entities as well as any relationship attributes. Employee SAN DIEGO SUPERCOMPUTER CENTER . Project N Start date has M create table proj_has_emp (proj_no number. foreign key (proj_no) references project foreign key (emp_no) references employee).

Many-to-Many tables Project proj_no 1 Name Empl t I tr t it proj_no 1 3 3 2 3 2 proj_has_emp emp_no 4 6 5 6 2 1 start_date 4/7/03 8/12/02 3/4/01 11/11/02 12/2/03 7/21/04 Employee emp_no 1 2 3 4 5 6 dept_no 2 3 2 1 3 3 Name Nora Edwards Ajay Patel Ben Smith Brian Burnett John O'Leary Julia Lenin Employee Audit has 1 employee: Brian Burnett Budget has 2 employees: Julia Lenin Nora Edwards Intranet has 3 employees: Julia Lenin John O¶Leary Ajay Patel SAN DIEGO SUPERCOMPUTER CENTER .

/dbms/db2/home/db2i010/sqllib/db2cshrc ‡Run the DB2 command line processor (CLP) % db2 SAN DIEGO SUPERCOMPUTER CENTER .sdsc. ‡Log on to the system using SSH. % ssh user@ds003. or bash) $ .Tutorial Entering the physical design into the database.edu ‡Setup the database instance environment: (csh or tcsh) % source /dbms/db2/home/db2i010/sqllib/db2cshrc (sh. ksh.

) => name varchar(50).Tutorial ‡db2 prompt will appear following version information. \ db2 (cont.) => primary key (dept_no)) SAN DIEGO SUPERCOMPUTER CENTER . \ db2 (cont. db2=> ‡connect to the workshop database: db2=> connect to workshop ‡create the department table db2=> create table department \ db2 (cont.) => (dept_no smallint not null.

) => foreign key (dept_no) references department) ‡list the tables db2 => list tables for schema <user> SAN DIEGO SUPERCOMPUTER CENTER .) => ssn int not null. \ db2 (cont. \ db2 (cont.) => (emp_no smallint not null. \ db2 (cont.) => name varchar(50).) => primary key (emp_no). \ db2 (cont.Tutorial ‡create the employee table db2 => create table employee \ db2 (cont.) => dept_no smallint not null. \ db2 (cont.

Normalization A logical design method which minimizes data redundancy and reduces design flaws. SAN DIEGO SUPERCOMPUTER CENTER . ‡The normal forms break down large tables into smaller subsets. ‡Consists of applying various ³normal´ forms to the database design.

1NF simplifies attributes ‡ Queries become easier. ‡ No multi-valued columns. SAN DIEGO SUPERCOMPUTER CENTER .First Normal Form (1NF) Each attribute must be atomic ‡ No repeating columns within a row.

racle. Mac DB2.1NF Employee (unnormalized) ¡  ©  ¨ ¦  § ¥  £ ¤ £¢¡      emp_no 1 2 3 name b Kevi J Barbara J es Jake Rivera dept_no 01 224 201 dept_name R&D I R&D skills C. J va i x. Java Employee (1NF)   emp_no 1 1 1 2 2 3 3 3 name Kevi Jacobs Kevi Jacobs Kevi Jacobs Barbara Jones Barbara Jones Jake Rivera Jake Rivera Jake Rivera dept_no 201 201 201 224 224 201 201 201 dept_name R&D R&D R&D I I R&D R&D R&D skills C Perl Java inux Mac DB2 Oracle Java SAN DIEGO SUPERCOMPUTER CENTER . erl.

and delete anomalies. SAN DIEGO SUPERCOMPUTER CENTER .the property of one or more attributes that uniquely determines the value of other attributes. ‡ Functional dependence . ‡ Prevents update. insert.Second Normal Form (2NF) Each attribute must be functionally dependent on the primary key. ‡ Any non-dependent attributes are moved into a smaller (subset) table. 2NF improves data integrity.

dept_no. dept_name) (emp_no -> name.Functional Dependence Employee (1NF) $ #" !               emp_no 1 1 1 2 2 3 3 3 name Kevi J Kevi J Kevi J Bar ara J es Bar ara J es Jake Rivera Jake Rivera Jake Rivera dept_no 201 201 201 224 224 201 201 201 dept_name R&D R&D R&D I I R&D R&D R&D skills C Perl J va i ac DB2 Oracle Java Name. and dept_name are functionally dependent on emp_no. Skills is not functionally dependent on emp_no since it is not unique to each emp_no. dept_no. SAN DIEGO SUPERCOMPUTER CENTER .

2NF Employee (1NF) % ) () ( ( ( ( ( '& '& '& % % e 1 1 1 2 2 3 3 3 p_no na e Kevi acobs Kevi acobs Kevi acobs Barbara Jones Barbara Jones Jake Rivera Jake Rivera Jake Rivera dept_no 201 201 201 224 224 201 201 201 dept_na R& R& R& I I R& R& R& e skills C Perl Java inux ac B2 racle Java Employee (2NF) 3 emp_no 1 2 3 name Kevin Jacobs Barbara Jones Jake Rivera dept_no 201 224 201 dept_name R&D I R&D Skills (2NF) emp_no 1 1 1 2 2 3 3 3 skills C Perl Java Linux Mac DB2 Oracle Java SAN DIEGO SUPERCOMPUTER CENTER 2 ( 1 0 .

adding null values.deleting wanted information. ‡ Update Anomaly . causes performance degradation. eg. deleting the IT department removes employee Barbara Jones from the database SAN DIEGO SUPERCOMPUTER CENTER . eg. changing IT dept_name to IS ‡ Delete Anomaly . eg. inserting a new department does not require the primary key of emp_no to be added.Data Integrity Employee (1NF) 4 4 emp_no 1 1 1 2 2 3 3 3 name Kevin Jacobs Kevin Jacobs Kevin Jacobs Barbara Jones Barbara Jones Jake Rivera Jake Rivera Jake Rivera dept_no 201 201 201 224 224 201 201 201 dept_name R&D R&D R&D I I R&D R&D R&D skills C Perl Java Linux Mac DB2 Oracle Java ‡ Insert Anomaly .multiple updates for a single name change.

‡ Transitive dependence . SAN DIEGO SUPERCOMPUTER CENTER . and delete anomalies. insert. ‡ Any transitive dependencies are moved into a smaller (subset) table. 3NF further improves data integrity.two separate entities exist within one table.Third Normal Form (3NF) Remove transitive dependencies. ‡ Prevents update.

SAN DIEGO SUPERCOMPUTER CENTER . department can be considered a separate entity.Transitive Dependence Employee (2NF) 5 emp_no 1 2 3 name Kevin Jacobs Barbara Jones Jake Rivera dept_no 201 224 201 dept_name R&D I R&D Dept_no and dept_name are functionally dependent on emp_no however.

3NF Employee (2NF) 6 emp_no 1 2 3 name Kevin Jacobs Barbara Jones Jake Rivera dept_no 201 224 201 dept_name R&D I R&D Employee (3NF) emp_no 1 2 3 name Kevin Jacobs Barbara Jones Jake Rivera dept_no 201 224 201 Department (3NF) dept_no dept_name 201 R&D 224 I SAN DIEGO SUPERCOMPUTER CENTER .

Other Normal Forms Boyce-Codd Normal Form (BCNF) ‡ Strengthens 3NF by requiring the keys in the functional dependencies to be superkeys (a column or columns that uniquely identify a row) Fourth Normal Form (4NF) ‡ Eliminate trivial multivalued dependencies. SAN DIEGO SUPERCOMPUTER CENTER . Fifth Normal Form (5NF) ‡ Eliminate dependencies not determined by keys.

Normalizing our team (1NF) games @ @ 9 77 7 78 7 game_id 34 35 4 42 date 6/3/ 5 6/ / 5 6/15/ 5 6/2 / 5 opponent Chicago Seattle Phoeni LA result W W L W sales_id 12 122 125 126 sales @@@@ @@@ @@@@ @@@ @@ @@ @@ @@@ game_id 34 35 4 42 merch 5 45 25 65 tickets 25 3 15 4 SAN DIEGO SUPERCOMPUTER CENTER B B B 6 8 1 18 24 2 B B C B B B B B B B B BB BB BB A A A B player_id 45 45 45 78 1 2 1 2 1 3 7 players game_id 34 35 4 42 34 35 42 name ike Speedy ike Speedy ike Speedy Frank Newmon Joe Powers Joe Powers Tony Tough start_date end_date aces 1/1/ 12 1/1/ 1 1/1/ 5/1/ 5 1/1/ 2 7/1/ 5 8 1/1/ 2 7/1/ 5 1 1/1/ 5 15 blocks 3 2 2 spikes 2 15 1 digs 5 4 3 1 12 14 B B B .

Normalizing our team (2NF & 3NF) games G game_id 34 35 40 42 date 6/3/05 6/8/05 6/15/05 6/20/05 opponent Chicago Seattle Phoeni LA result W W L W sales_id 120 122 125 126 sales game_id 34 35 40 42 merch 5000 4500 2500 6500 tickets 25000 30000 15000 40000 players F E pla er_id 45 78 102 103 name Mike peedy Frank ewmon Joe Powers Tony Tough start_date end_date 1/1/00 5/1/05 1/1/02 7/1/05 1/1/05 player_id 45 45 45 102 102 103 player_stats game_id 34 35 40 34 35 42 aces 12 10 7 8 10 15 blocks 3 2 2 6 8 10 spikes 20 15 10 18 24 20 digs 5 4 3 10 12 14 D SAN DIEGO SUPERCOMPUTER CENTER .

Revisit team ER diagram date opponent result 1 1 generates games 1 Recorded by N sales tickets merchandise player_stats aces blocks digs spikes tracked N 1 players Start date End date Name SAN DIEGO SUPERCOMPUTER CENTER .

allows for faster querying due to less joins. ‡ Denormalized .Star Schemas Designed for data retrieval ‡ Best for use in decision support tasks such as Data Warehouses and Data Marts. delete. ‡ Slow performance for insert. and update transactions. SAN DIEGO SUPERCOMPUTER CENTER . ‡ Comprised of two types tables: facts and dimensions.

Fact Table The main table in a star schema is the Fact table. ‡Measure . ‡ Contains groupings of measures of an event to be analyzed.numeric data Invoice Facts units sold unit amount total sale price SAN DIEGO SUPERCOMPUTER CENTER .

Dimension Table Dimension tables are groupings of descriptors and measures of the fact.non-numeric data Customer Dimension cust_dim_key name address phone Time Dimension time_dim_key invoice date due date delivered date Location Dimension loc_dim_key store number store address store phone SAN DIEGO SUPERCOMPUTER CENTER Product Dimension prod_dim_key product price cost . ‡descriptor .

Customer Dimension cust_dim_key name address phone Time Dimension 1 N Invoice Facts cust_dim_key loc_dim_key time_dim_key prod_dim_key units sold unit amount total sale price 1 N time_dim_key invoice date due date delivered date Location Dimension loc_dim_key store number store address store phone Product Dimension N prod_dim_key product price 1 cost N 1 SAN DIEGO SUPERCOMPUTER CENTER .Star Schema The fact table forms a one to many relationship with each dimension table.

Team Facts date merchandise tickets SAN DIEGO SUPERCOMPUTER CENTER . ‡ From this we will use the sales table to create our fact table.Analyzing the team The coach needs to analyze how the team generates income.

Team Dimension We have 2 dimensions for the schema: player and games. Game Dimension game_dim_key opponent result Player Dimension player_dim_key name start_date end_date aces blocks spikes digs SAN DIEGO SUPERCOMPUTER CENTER .

Team Star Schema Team Facts player_dim_key game_dim_key date merchandise tickets N 1 N Player Dimension 1 player_dim_key name start_date end_date aces blocks spikes digs Game Dimension game_dim_key opponent result SAN DIEGO SUPERCOMPUTER CENTER .

Books and Reference ‡Database Design for Mere Mortals. Michael J. Toby J. Terry Halpin ‡Database Modeling and Design. Teorey SAN DIEGO SUPERCOMPUTER CENTER . Hernandez ‡Information Modeling and Relational Databases.

Continuing Education UCSD Extension Data Management Courses DBA Certificate Program Database Application Developer Certificate Program SAN DIEGO SUPERCOMPUTER CENTER .

and database storage resources.edu/ ‡Tools and expertise for making data collections available to the broader scientific community. SAN DIEGO SUPERCOMPUTER CENTER . ‡Provide disk.Data Central The Data Services Group provides Data Allocations for the scientific community. ‡ http://datacentral. tape.sdsc.

You're Reading a Free Preview

Download
scribd
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->