
Fundamentals of Relational

Database Design and
Database Planning


• Definitions
• Selecting a DBMS
• Selecting an application layer
• Relational Design
• Planning
• A very few words about Replication
• Space

What is a database?
A database is the implementation of freeware or
commercial software that provides a means to
organize and retrieve data. The database is the
set of physical files in which all the objects and
database metadata are stored. These files can
usually be seen at the operating system level.
This talk will focus on the organization aspect of
data storage and retrieval.
Commercial vendors include Microsoft and Oracle.
Freeware products include MySQL and PostgreSQL.
For this discussion, all points/issues apply to both
commercial and freeware products.


Definitions
Instance
A database instance, or an ‘instance’, is made up of the background processes needed by the database software. These processes usually include a process monitor, session monitor, lock monitor, etc. They will vary from database vendor to database vendor.

Definitions
What is a schema?
A SCHEMA IS NOT A DATABASE, AND A DATABASE IS NOT A SCHEMA.
A database instance controls 0 or more databases.
A database contains 0 or more database application schemas.
A database application schema is the set of database objects that apply to a specific application. These objects are relational in nature, and are related to each other within a database to serve a specific functionality. For example payroll, purchasing, calibration, trigger, etc.
A database application schema is not a database. Usually several schemas coexist in a database.
A database application is the code base to manipulate and retrieve the data stored in the database application schema.

Definitions Cont.
Primary Definitions
• Table, a set of columns that contain data. In the old days, a table was called a file.
• Row, a set of columns from a table reflecting a record.
• Index, an object that allows for fast retrieval of table rows. Every primary key and foreign key should have an index for retrieval speed.
• Primary key, often designated pk, is 1 or more columns in a table that makes a record unique.

Definitions Cont.
Primary Definitions
• Foreign key, often designated fk, is a column common to 2 tables that defines the relationship between those 2 tables.
• Foreign keys are either mandatory or optional. Mandatory forces a child to have a parent by creating a not null column at the child. Optional allows a child to exist without a parent, allowing a nullable column at the child table (not a common circumstance).
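The primary key, mandatory foreign key and optional foreign key definitions above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names (parent, child, optional_child) and index names are hypothetical, and SQLite stands in for whichever database product is actually chosen.

```python
import sqlite3

def build_key_demo():
    """Create a parent table, a child with a mandatory fk, and a child with an optional fk."""
    conn = sqlite3.connect(":memory:")
    # Assumption: SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE parent (
            parent_id INTEGER PRIMARY KEY              -- pk: makes each record unique
        );
        CREATE TABLE child (
            child_id  INTEGER PRIMARY KEY,
            parent_id INTEGER NOT NULL                 -- mandatory fk: not null at the child
                      REFERENCES parent(parent_id)
        );
        CREATE TABLE optional_child (
            optional_child_id INTEGER PRIMARY KEY,
            parent_id INTEGER                          -- optional fk: nullable at the child
                      REFERENCES parent(parent_id)
        );
        -- Every foreign key gets an index for retrieval speed.
        CREATE INDEX child_parent_fk_idx ON child(parent_id);
        CREATE INDEX optional_child_parent_fk_idx ON optional_child(parent_id);
    """)
    return conn

def try_insert(conn, sql, params):
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With one parent row in place, a child pointing at it is accepted, a child with no parent or a nonexistent parent is rejected, and an optional_child with a NULL parent is accepted.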

Definitions Cont.
Primary Definitions
Entity Relationship Diagram or ER is a pictorial representation of the application schema.


Definitions Cont.
Primary Definitions
Constraints are rules residing in the database’s data dictionary governing relationships and dictating the ways records are manipulated, what is a legal move vs. what is an illegal move. These are of the utmost importance for a secure and consistent set of data.

Definitions Cont.
Primary Definitions
Data Manipulation Language or DML, sql statements that insert, update or delete data in a database.
Data Definition Language or DDL, sql used to create and modify database objects used in an application schema.

Definitions Cont.
Primary Definitions
A transaction is a logical unit of work that contains one or more SQL statements. A transaction is an atomic unit. The effects of all the SQL statements in a transaction can be either all committed (applied to the database) or all rolled back (undone from the database), ensuring data consistency.
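As a rough sketch of the all-or-nothing behaviour described above, the following uses Python's sqlite3 module. The account table, its two rows and the simulated failure are hypothetical, chosen only to show that a failure between two statements leaves neither applied.

```python
import sqlite3

def make_accounts():
    """Two hypothetical rows so that a two-statement unit of work has something to change."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn

def transfer(conn, amount, fail_midway=False):
    """Run two updates as one transaction; return True if committed, False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'a'", (amount,))
            if fail_midway:
                raise RuntimeError("simulated failure between the two statements")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'b'", (amount,))
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A failed transfer leaves both balances exactly as they were. A successful one changes both rows together.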

Definitions Cont.
Primary Definitions
• A view is a selective presentation of the structure of, and data in, one or more tables (or other views). A view is a ‘virtual table’, having predefined columns and joins to one or more tables, reflecting a specific facet of information.
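A view as a ‘virtual table’ with predefined columns and a join might look like the sketch below, written with Python's sqlite3 module. The run and event tables and the run_summary view are hypothetical names used only for illustration.

```python
import sqlite3

def make_view_demo():
    """Two related tables plus a view that presents one facet of them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE run (run_id INTEGER PRIMARY KEY);
        CREATE TABLE event (
            event_id INTEGER PRIMARY KEY,
            run_id   INTEGER NOT NULL REFERENCES run(run_id)
        );
        -- The view stores no data of its own; it is a predefined query over the tables.
        CREATE VIEW run_summary AS
            SELECT r.run_id, COUNT(e.event_id) AS n_events
            FROM run r LEFT JOIN event e ON e.run_id = r.run_id
            GROUP BY r.run_id;
        INSERT INTO run VALUES (1), (2);
        INSERT INTO event VALUES (100, 1), (101, 1);
    """)
    return conn

def run_summary(conn):
    """Query the view exactly as if it were a table."""
    return conn.execute("SELECT run_id, n_events FROM run_summary ORDER BY run_id").fetchall()
```

Callers query run_summary like any table, and the join and grouping stay defined in one place in the schema.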

Definitions Cont.
Primary Definitions
Database triggers are PL/SQL, Java, or C procedures that run implicitly whenever a table or view is modified or when some user actions or database system actions occur. Database triggers can be used in a variety of ways for managing your database. For example, they can automate data generation, audit data modifications, enforce complex integrity constraints, and customize complex security authorizations. Trigger methodology differs between databases.
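To illustrate the "audit data modifications" use, here is a small sketch with Python's sqlite3 module. SQLite triggers are written in SQL rather than PL/SQL, Java or C, so this only shows the idea. The setting and setting_audit tables are hypothetical, and trigger syntax will differ in other databases.

```python
import sqlite3

def make_audited_db():
    """Create a table whose updates are recorded implicitly by a trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE setting (name TEXT PRIMARY KEY, value TEXT NOT NULL);
        CREATE TABLE setting_audit (
            audit_id  INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            old_value TEXT,
            new_value TEXT
        );
        -- Runs implicitly after every update; the application never calls it.
        CREATE TRIGGER setting_audit_upd AFTER UPDATE ON setting
        BEGIN
            INSERT INTO setting_audit (name, old_value, new_value)
            VALUES (OLD.name, OLD.value, NEW.value);
        END;
    """)
    return conn

def audit_rows(conn):
    """Return the audit trail in insertion order."""
    return conn.execute(
        "SELECT name, old_value, new_value FROM setting_audit ORDER BY audit_id"
    ).fetchall()
```

Any update to setting leaves a row in setting_audit, whichever application or user made the change.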

Definitions Cont.
Primary Definitions
Replication is the process of copying and maintaining database objects, such as tables, in multiple databases that make up a distributed database system.
Backups are copies of the database data in a format specific to the database. Backups are used to recover one or more files that have been physically damaged as the result of a disk failure. Media recovery requires the restoration of the damaged files from the most recent operating system backup of a database. It is of the utmost importance to perform regularly scheduled backups.

Definitions Cont.
Mission Critical Applications
An application is defined as mission critical, imho, if
1. there are legal implications or financial loss to the institution if the data is lost or unavailable.
2. there are safety issues if the data is lost or unavailable.
3. no data loss can be tolerated.
4. uptime must be maximized (98%+).

Definitions Cont.
‘large’ or ‘very large’ or ‘a lot’
Seems odd, but ‘large’ is a hard definition to determine. Its definition varies depending on the database software one selects.
Vldb is an acronym for very large databases. Very large normally indicates data that is reaching the limits of capacity for the database software, or data for which extraordinary measures need to be taken for operations such as backup, recovery, storage, etc.

Definitions Cont.
Commercial databases do not have a practical limit to the size of the load. Issues will be backup strategies for large databases.
Freeware does limit the size of the databases, and the number of users. Documentation on these issues varies widely from the freeware sites to the user sites. MySQL supposedly can support 8T and 100 users. However, you will find arguments on the users lists that these numbers cannot be met.

Selecting a DBMS
Many options, many decisions, planning, costs, criticality.
For lots of good information, please refer to the urls on the last slides. Many examples of people choosing product.

Selecting a DBMS
How do I Choose?
Which database product is appropriate for my application? You must make a requirements assessment.
Does your database need 24x7 availability?
Is your database mission critical, and no data loss can be tolerated?
Is your database large? (backup recovery methods)
What data types do I need? (binary, large objects?)
Do I need replication? What level of replication is required? Read only? Read/Write? Read/Write is very expensive, so can I justify it?

Selecting a DBMS
How do I Choose? Cont.
If your answer to any of the above is ‘yes’, I would strongly suggest purchasing and using a commercial database with support. Support includes:
• 24x7 assistance with technical issues
• Patches for bugs and security
• The ability to report bugs, and get them resolved in a timely manner.
• Priority for production issues
• Upgrades/new releases
• Assistance with and use of proven backup/recovery methods

Selecting a DBMS
The Freeware Choice
Freeware is an alternative for applications. Be prepared, be forewarned, support for these databases is done via email to an ad hoc support group. The level of support via these groups may vary over the life of your database. Also expect less functionality than any commercial product. See http://www- css.

Selecting a DBMS
The Freeware Choice
Freeware is free.
Freeware is open source.
Freeware functionality is improving.
Freeware is good for smaller non-mission critical applications.

Selecting an Application Layer
Again, planning takes center stage. In the end you want stability and dependability.
• How many users need access?
• What will the security requirements be?
• Are there software licensing issues that need consideration?
• Is platform portability a requirement?
• Two tier or three tier architecture?

Selecting an Application Layer
• Direct access to the database layer? (probably should be avoided)
• Are you replicating? How? Where? With what?
• There are no utilities that will port data from 1 database to another (i.e. postgres to mysql). If database portability is a requirement, independent code must be written to satisfy this requirement.

Selecting an Application Layer Cont.
Application maintenance issues
• People availability, talent, and turnover? (historically a huge issue)
• A ‘known’ or ‘common’ language?
• Freeware? Bug fixes, patches…are they important and timely?
• Documentation? Set standards, procedures, code reviews making sure the documentation exists and is clear, working with users as a team.
• Is the application flexible enough to easily accommodate business rule changes that mandate modifications?
• The availability of an ER diagram at this stage is invaluable. We consider it a must have.
• There are no utilities to port data from 1 type of db to another. This lack of portability means a method to move data between databases must be written independently.

Selecting an Application Layer
Misc. application definitions…
This presentation is not an application presentation, but I will mention a few terms you may hear.
Sql, the query language for relational databases. A must learn.
ODBC, open database connectivity. The software that allows a database to talk to an application.
JDBC, java database connectivity.

Relational Design
The design of the application schema will determine the usability and query ability of the application. Done incorrectly, the application and users will suffer until someone else is forced to rewrite it.

Relational Design
The Setup
The database group has a standard 3 tier infrastructure for developing and deploying production databases and applications. This infrastructure is applicable to any application schema, mission critical or not. This infrastructure provides 3 database instances: development, integration and production. Each of these instances contains 1 or more applications. It is designed to ensure development, testing, feedback, signoff, and a protected production environment.

Relational Design
The Setup
The 3 instances are used as follows:
1. Development instance. Developers’ playground. Small in size compared to production. Much of the data is ‘invented’ and input by the developers. Usually there is not enough disk space to ever ‘refresh’ with production data.

Relational Design Cont.
The Setup
2. The integration instance is used for moving what is thought to be ‘complete’ functionality to a pre production implementation. Power users and developers work in concert in integration to make sure the specs were followed. The users should use integration as their sign off area. Cuts from dev to int are frequent and common to maintain the newest releases in int for user testing.

Relational Design Cont.
The Setup
3. The production instance, real data. Needs to be kept pure. NO testing allowed. Very few logons. In a critical 24x7 supported database, developers, development tools, web servers, log files, all should be kept off the production database server. The optimal setup of a production database server machine has ~3 operating system logons: root, the database logon (ie oracle), and a monitoring tool.

Relational Design Cont.
The Setup
Let’s talk about mission critical & 24x7 a bit.
1. To optimize a mission critical 24/7 database, the database server machine should be dedicated to running the database, nothing else.
2. Resist putting software products on the db server machine so that their maintenance does not inhibit the running of the database. All software products need maintenance and downtime. Further, if the product breaks, it could inhibit access to the database for a long period. Example: a logging application monitoring users on the db goes wild, fills all available space and halts the database. If this logging app were not on the dbserver machine, the db would be unaffected by the malfunction.

Relational Design Cont.
The Setup
3. All database applications and database software require modifications. Most times these modifications require down time because the schema or data modifications need to lock entire tables exclusively. If you are sharing your database instance with many other applications, and 1 of those applications needs the database for an upgrade, all apps may have to take the down time. Avoid this by ensuring your 24/7 database application is segregated from all other software that is not absolutely needed. In that way you ensure any down times are specific to your cause.

Our 1st relational example
A cpu can house 1 or more databases.
A database can accommodate 1 or more instances.
An instance may contain 1 or more application schemas.
[Diagram: CPU (d0ora2); databases on d0ora2 (d0ofprd1, d0ofint1); schema applications in d0ofprd1 (sam, runs, calib); schema applications in d0ofint1 (sam, runs, calib)]

What is a schema?
It is:
• Tables (columns/datatypes) having constraints (not null, unique, foreign & primary keys)
• Triggers
• Indexes etc.
• Accounts
• Privileges & Roles
• Server side processes
It is not:
• The environment (servers, OS)
• The results of queries, i.e. objects
• Application code
One implements a schema by running scripts. These scripts can be run against multiple servers and should be archived.

Relational Design
Getting Started
Using your design tool, you will begin by relating objects that will eventually become tables. All the other schema objects will fall out of this design. You will spend LOADS of time in your design tool, honing, redoing, reacting to modifications, etc.
The end users and the designers need to be working almost at the same desk for this process. If the end user is the designer, the end user should involve additional users to ensure an unbiased and general design. It is highly suggested that the design be kept up to date for future documentation and maintainers.
Tables are related, most frequently in a 0 to many relationship. Example: 1 run will result in 0 or more events. Analyzing and defining these relationships results in an application schema.

What will a good schema design buy you?
I am afraid the 80% planning 20% implementation rule applies. Gather requirements.
• Discovery of data that needs to be gathered.
• Fast query results
• Limited application code maintenance
• Data flexibility
• Less painful turnover of application to new maintainers.
• Fewer long term maintenance issues.

Relational Design
Let’s get started
Write a requirements document. Who are the users? What is their mission? Identify objects that need to be stored/tracked. Think about how objects relate to each other. Do not be afraid to argue/debate the relationships with others.
You will not be able to anticipate all requirements, but a document will be a start. A well designed schema naturally allows for additional functionality.

Relational Design
So how do you get there?
Design tools are available, however, they do not think for you. They will give you a clue that you are doing something stupid, but they won’t stop you. It is highly recommended you use a design tool.
Create ER, entity relationship, diagrams. A picture says 1000 words.
Get a commitment from the developer(s) to see the application through to implementation. We have seen several applications redone multiple times. A string of developers tried, left the project, and left a mess. A new developer started from scratch because there was no documentation or design.

Relational Design
How do I get there?
Adhere to the recommendations of your database
vendor for setup and architecture.
Don’t be afraid to ask for help or to see other
Don’t be afraid to pilfer others’ design work; if it is
good and closely fits your requirements, then use it.
Ask questions, schedule reviews with experts and
Work with your hardware system administrators to
ensure you have the hardware you need for the
proposed job to be done.


Relational Design
Common Mistakes
Mistakes we see ALL the time
• Do not design your schema around your
favorite query. A relational design will
enable all queries to be speedy, not only
your favorite.
• Don’t design the schema around your
narrow view of the application. Get other
users involved from the start, ask for input
and review.

Relational Design
Common Mistakes
• Create a relational structure, not a
hierarchical structure. The ER diagram
should not necessarily resemble a tree
or a circle. It is the logical building of
relationships between data.
Relationships flow between subsets of
data. The resulting ER diagram’s
‘look’ is not a standard by which one
can judge the quality of the design.

Relational Design
Common Mistakes
• Do not create 1 huge table to hold 99% of the data. We have seen a table with 1100+ columns…unusable, unqueryable, required an entire application rewrite, made 80 tables from the 1 table, took over a year.
• Do not create separate schemas for the same application or functions within an application.
• Use indices and constraints, this is a MUST!

Relational Design
Examples of Common Mistakes
• Using timestamp as the primary key assumes that within a second, no other record will be inserted. Actually this was not the case, and an insert operation failed. Use database generated sequences as primary keys and a NON-UNIQUE index on timestamp.
• A table with more than 900 columns. Such design will cause chaining since each record is not going to fit in one block. One record spanning many blocks, thus chaining, hence bad performance.
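The timestamp-as-primary-key failure and its suggested fix can be reproduced in a few lines with Python's sqlite3 module. This is a sketch under assumptions: the reading table is hypothetical, and SQLite's AUTOINCREMENT stands in for the sequence mechanism of a database such as Oracle.

```python
import sqlite3

def insert_readings(use_timestamp_pk):
    """Insert two rows that share the same second; return how many were stored."""
    conn = sqlite3.connect(":memory:")
    if use_timestamp_pk:
        # The mistake: the timestamp itself must be unique.
        conn.execute("CREATE TABLE reading (taken_at TEXT PRIMARY KEY, value REAL)")
    else:
        # The fix: database-generated key plus a NON-UNIQUE index on the timestamp.
        conn.executescript("""
            CREATE TABLE reading (
                reading_id INTEGER PRIMARY KEY AUTOINCREMENT,
                taken_at   TEXT NOT NULL,
                value      REAL
            );
            CREATE INDEX reading_taken_at_idx ON reading(taken_at);
        """)
    rows = [("2004-02-01 12:00:00", 1.0), ("2004-02-01 12:00:00", 2.0)]
    stored = 0
    for taken_at, value in rows:
        try:
            conn.execute("INSERT INTO reading (taken_at, value) VALUES (?, ?)", (taken_at, value))
            stored += 1
        except sqlite3.IntegrityError:
            pass  # the second insert collides when the timestamp is the primary key
    return stored
```

With the timestamp as primary key only one of the two rows survives. With a generated key both are stored, and the index still allows fast lookups by time.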

Relational Design
Examples of Common Mistakes
• Do not let the application control a generated sequence. Have seen locking issues, and duplicate values issues when the application increments the sequence. That is why the databases have sequence mechanisms, use them. Have the database increment/lock/constrain the sequence/primary key.
• Use indices! An Atlas table with 200,000 rows halted during a query. Reason? No indices. Added a primary key index, instantaneous query response. Indices are not wasted space!
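One way to see what an index buys is to ask the database for its query plan before and after creating one. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The hit table and its columns are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

def plan_for_lookup(with_index):
    """Return SQLite's query plan text for a single-row lookup, with or without an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hit (hit_id INTEGER, channel INTEGER, charge REAL)")
    conn.executemany(
        "INSERT INTO hit VALUES (?, ?, ?)",
        [(i, i % 100, 0.5) for i in range(1000)],
    )
    if with_index:
        conn.execute("CREATE INDEX hit_id_idx ON hit(hit_id)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT charge FROM hit WHERE hit_id = 500"
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[-1] for row in rows)
```

Without the index the plan reports a full scan of the table. With it, the plan reports a search using hit_id_idx, which is the difference between reading every row and going straight to the one requested.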

Relational Design
Examples of Common Mistakes
USE DATABASE CONSTRAINTS!!!!!!
Have examples where constraints were not used, but ‘implemented’ via the api. Bugs in the api allowed data to be deleted that should not have been deleted, and constraints would have prevented the error.
Have also seen apis error with ‘cannot delete’ errors. They were trying to force an invalid delete, luckily the database constraints saved the data.
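The invalid-delete scenario can be sketched with Python's sqlite3 module: the same buggy delete is attempted with and without the database enforcing the foreign key. Table names are hypothetical, and the foreign_keys pragma is SQLite-specific; other databases enforce declared constraints by default.

```python
import sqlite3

def delete_parent_with_children(enforce_constraints):
    """Try to delete a parent that still has a child; report the outcome and orphan count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = %s" % ("ON" if enforce_constraints else "OFF"))
    conn.executescript("""
        CREATE TABLE parent (parent_id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            child_id  INTEGER PRIMARY KEY,
            parent_id INTEGER NOT NULL REFERENCES parent(parent_id)
        );
        INSERT INTO parent VALUES (1);
        INSERT INTO child VALUES (10, 1);
    """)
    try:
        conn.execute("DELETE FROM parent WHERE parent_id = 1")  # the api's invalid delete
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "cannot delete"
    orphans = conn.execute(
        "SELECT COUNT(*) FROM child WHERE parent_id NOT IN (SELECT parent_id FROM parent)"
    ).fetchone()[0]
    return outcome, orphans
```

With enforcement on, the delete fails and no child is orphaned. With enforcement off, the delete succeeds silently and leaves an orphaned child row behind.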

Entity Relationship Diagrams
1 to many
[Diagram: one-to-many relationships. PARENT (# PARENT_ID) ‘have’ CHILD (# CHILD_ID), CHILD ‘belong to’ PARENT; the same pattern is drawn for A (# A_ID) / B (# B_ID), C (# C_ID) / D (# D_ID) and E (# E_ID) / F (# F_ID).]

Entity Relationship Diagrams
many to many
[Diagram: many-to-many relationships. G (# G_ID) ‘define’ / ‘owned by’ H (# H_ID); G2 (# G2_ID) and H2 (# H2_ID) each ‘define’ / ‘map to’ a G2H2 table; I (# I_ID) ‘define’ / ‘relate to’ J (# J_ID); I2 (# I2_ID) and J2 (# J2_ID) each ‘define’ / ‘map to’ an I2J2 table.]
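The G2 / G2H2 / H2 names in the many-to-many diagram suggest a mapping table sitting between the two entities. A minimal sketch of that pattern with Python's sqlite3 module follows. The column layout, the composite primary key and the extra index are assumptions, since the slide only names the entities.

```python
import sqlite3

def make_mapping_schema():
    """Resolve a many-to-many relationship between g2 and h2 through a g2h2 mapping table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE g2 (g2_id INTEGER PRIMARY KEY);
        CREATE TABLE h2 (h2_id INTEGER PRIMARY KEY);
        CREATE TABLE g2h2 (
            g2_id INTEGER NOT NULL REFERENCES g2(g2_id),
            h2_id INTEGER NOT NULL REFERENCES h2(h2_id),
            PRIMARY KEY (g2_id, h2_id)           -- each pairing is recorded once
        );
        CREATE INDEX g2h2_h2_fk_idx ON g2h2(h2_id);
        INSERT INTO g2 VALUES (1), (2);
        INSERT INTO h2 VALUES (10), (20);
        INSERT INTO g2h2 VALUES (1, 10), (1, 20), (2, 10);
    """)
    return conn

def h2_for(conn, g2_id):
    """All h2 rows mapped to one g2 row."""
    sql = "SELECT h2_id FROM g2h2 WHERE g2_id = ? ORDER BY h2_id"
    return [row[0] for row in conn.execute(sql, (g2_id,))]

def g2_for(conn, h2_id):
    """All g2 rows mapped to one h2 row."""
    sql = "SELECT g2_id FROM g2h2 WHERE h2_id = ? ORDER BY g2_id"
    return [row[0] for row in conn.execute(sql, (h2_id,))]
```

Each side can have many partners on the other side, and the mapping table is the only place the pairing is stored.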

Entity Relationship Diagrams
1 to 1
[Diagram: one-to-one relationships. K (# K_ID) ‘define’ / ‘relate to’ L (# L_ID); the same pattern is drawn for M (# M_ID) / N (# N_ID) and O (# O_ID) / P (# P_ID).]

Relational Design
The Good
[Diagram: CALIB_TYPE (# CALIB_TYPE_ID, * DESCRIPTION) ‘define’ CALIBRATION (# CALIBRATION_ID, * TSTART, o TEND); CALIBRATION ‘be defined by’ CALIB_TYPE.]
Calibration type might have 3 rows: drift, pedestal, & gain. This is a parent table.
Each calibration record will be defined by drift, pedestal or gain, in addition to start and end times. This is a child table.

Relational Design
The Bad
[Diagram: CALIBRATION (# CALIBRATION_ID, * TSTART, o TEND) ‘define’ / ‘relate to’ three children: DRIFT_CALIB (# DRIFT_CALIB_ID, * TSTART, o TEND), PEDESTAL_CALIB (# PEDESTAL_CALIB_ID, * TSTART, o TEND) and GAIN_CALIB (# GAIN_CALIB_ID, * TSTART, o TEND).]
You have now created 3 different children, all reporting the same information, when 1 child would suffice. Code will have to be written, tested, and maintained for 4 tables now instead of 2.

Relational Design
The Ugly
[Diagram: three separate parent tables, CALIBRATION, CALIBRATION(2) and CALIBRATION(3) (each # CALIBRATION_ID, * TSTART, o TEND), each of which ‘defines’ / ‘relate to’ its own child: PEDESTAL_CALIB (# PEDESTAL_CALIB_ID), DRIFT_CALIB (# DRIFT_CALIB_ID) and GAIN_CALIB (# GAIN_CALIB_ID), each with * TSTART, o TEND.]
Now you have created 3 different applications, using 6 tables. All of which could be managed with 2 tables. Extra code, extra testing, extra maintenance.

Relational Design
The Good…let’s recap
[Diagram: CALIB_TYPE (# CALIB_TYPE_ID, * DESCRIPTION) ‘define’ CALIBRATION (# CALIBRATION_ID, * TSTART, o TEND); CALIBRATION ‘be defined by’ CALIB_TYPE.]
AHHH, back to normal, or normalization as we refer to it.
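The two-table "Good" design could be written out as DDL along the following lines, here run through Python's sqlite3 module. The entity and attribute names come from the diagram. The foreign key column on calibration, the UNIQUE constraint on description and the index name are assumptions made for the sketch.

```python
import sqlite3

def make_calibration_schema():
    """Parent calib_type (one row per kind) and child calibration (one row per record)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE calib_type (
            calib_type_id INTEGER PRIMARY KEY,
            description   TEXT NOT NULL UNIQUE
        );
        CREATE TABLE calibration (
            calibration_id INTEGER PRIMARY KEY,
            calib_type_id  INTEGER NOT NULL REFERENCES calib_type(calib_type_id),
            tstart         TEXT NOT NULL,      -- '*' in the diagram: mandatory
            tend           TEXT                -- 'o' in the diagram: optional
        );
        CREATE INDEX calibration_type_fk_idx ON calibration(calib_type_id);
        INSERT INTO calib_type (description) VALUES ('drift'), ('pedestal'), ('gain');
    """)
    return conn

def add_calibration(conn, description, tstart, tend=None):
    """Insert a calibration record of the named type."""
    conn.execute(
        "INSERT INTO calibration (calib_type_id, tstart, tend) "
        "SELECT calib_type_id, ?, ? FROM calib_type WHERE description = ?",
        (tstart, tend, description),
    )

def calibrations_of(conn, description):
    """Return (tstart, tend) for every calibration of one type."""
    return conn.execute(
        "SELECT c.tstart, c.tend FROM calibration c "
        "JOIN calib_type t ON t.calib_type_id = c.calib_type_id "
        "WHERE t.description = ? ORDER BY c.tstart",
        (description,),
    ).fetchall()
```

Adding a fourth kind of calibration is one new row in calib_type, with no new table and no new code path, which is the point of the comparison with the 4-table and 6-table variants.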

Relational Design
What to expect from a design tool
• An entity relationship diagram
• The ability to create the ddl (data definition language) needed
• The ability to project disk space usage
• Ddl in a format to allow you to enter the code into a code library (cvs), and that will allow you to run against your database

Relational Design
Why bother? Experience from RunII
TO SAVE TIME AND PRECIOUS PEOPLE RESOURCES!
Personnel consistency does not exist. Application developers come and go regularly. The documentation that a design product provides will give the next developer an immediate understanding of the application in picture format.
Application sharing is enhanced when others can look at your design and determine whether the application is reusable in their environment. Sam is a good example of an application that 3 experiments are now using.

Relational Design
Why bother? Cont.
When an application is under construction, the ER diagram goes to every application meeting, and quite possibly the wallet of the application leader. It is the pictorial answer to many issues.
Planning for disk space has been an issue, the designer tool should assist with this task.

Planning
Overall
What do I need to plan for? People, hardware, software, obsolescence, maintenance, emergencies.
How far out do I need to plan? Initially 2-4 years.
How often do I need to review the plans? Annually.
What if my plan fails or looks undoable? Nip it in the bud, be proactive, come up with options.

Planning
Overall
• Disk space requirements. It is hard to predict the number of rows in a table. My experience is all the wags (wild guesses) fall short of what is needed. It would be easier if we knew the amount and results of the science ahead of time! Remember, 10x what you think the data will take.
• Hardware requirements. Experience tells us that the database machine should serve 1 master (if it is a large database or mission critical), the database, nothing else. Ideally there will be root, a database monitor user and a database user, oracle for example. No apache, no applications, no log file areas, etc.

Planning
Overall
• Growth and obsolescence. Hardware and software become obsolete. Plan for 3-4 years before needing to replace hardware. New/upgraded software gives additional functionality that you will want/need. Fire walling will not protect you from bugs and obsolescence.
• Maintenance. Do you change the oil in your car? Plan on 1 morning per month downtime for caring for the hardware and software. If the downtime is not needed, it will not be taken. Security patches could mandate additional stoppages. Planning maintenance time is as important as planning to buy disks. I cannot stress how important this is.

Planning
User Requirements
Will user requirements influence your hardware & software decisions?
Do you need replication?
What architecture is your api going to be?
How many users will be loading the database and hardware?

Planning
Maintenance
• Database/Operating system software need upgrades. One always hopes one can get on a stable version of something and not upgrade. That is a fallacy. Bug patches and security patches are a never ending fact of life. Major version upgrades provide needed and new functionality.

Planning
Backup and Recovery
Backup and recovery procedures of vldb (very large databases) are difficult at best. Vldb is normally defined as multiple gigabyte or terabyte databases. This is probably the most sensitive area when choosing a freeware database.
Hardware plays a part here as well. Disk and tape may be needed. Ensure when planning for hardware there is a plan for backup and recovery.

Planning
Good Practices with a Hammer
Make a standards document and enforce its use. System as well as database standards need to be followed and enforced.
When dbas and developers are always on the same page, life is easier for both. Expectations are clear and defined. Anger and disappointment are lessened.

Planning
Failover
Yikes, we are down!
Everyone always wants 24x7 scheduled uptime. Until they see the cost. Make anyone who insists on real 100% uptime justify it (and pay for it?). 98-99% uptime can be realized at a much lower cost.
Uptime requirements will influence, possibly dictate, database choices, hardware choices, fte requirements.
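To put the uptime percentages into concrete terms, a small calculation helps. The sketch below assumes a 365-day year and treats every hour as scheduled service time. Both the function name and those assumptions are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year of round-the-clock service

def allowed_downtime_hours(uptime_percent, hours=HOURS_PER_YEAR):
    """Hours per year the database may be unavailable at a given uptime percentage."""
    return hours * (100.0 - uptime_percent) / 100.0
```

Under these assumptions, 98% uptime leaves about 175 hours of downtime a year (roughly a week), 99% leaves about 88 hours, and 100% leaves none at all, which is why that last step is the expensive one.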

Planning
Failover
The cheapest method of addressing a failure is proactive planning.
Make sure your database and database software are backed up. Do not forget tape backups as a catastrophic recovery method.
Practice recovery on your integration and development databases. Practice different scenarios: delete a datafile, delete the entire database.
Unless you are using a commercial database with roll forward recovery, assume you will lose all dml since your last backup if you need to recover. This should dictate your backup schedule.
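The "you lose all dml since your last backup" point can be demonstrated on a toy scale with the backup method of Python's sqlite3 connections. This is only a sketch of the principle: the log table is hypothetical, and real backup and recovery procedures are specific to each database product.

```python
import sqlite3

def recover_from_last_backup():
    """Back up, apply more DML, lose the live database, and see what the backup holds."""
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE log (entry TEXT)")
    live.execute("INSERT INTO log VALUES ('before backup')")
    live.commit()

    backup = sqlite3.connect(":memory:")
    live.backup(backup)  # copy of the database as of this moment

    live.execute("INSERT INTO log VALUES ('after backup')")
    live.commit()
    live.close()  # simulate losing the live database

    # Recovery without roll forward can only return what the backup captured.
    return [row[0] for row in backup.execute("SELECT entry FROM log")]
```

Only the row written before the backup comes back, so the time between backups is the amount of work at risk.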

Replication
Replication is the process of copying and maintaining database objects in multiple databases that make up a distributed database system. Replication can improve the performance and protect the availability of applications because alternate data access options exist.

Replication Cont.
• Oracle supports 3 types of replication: READ ONLY Snapshots (Materialized views), Advanced Replication and streams based replication.
• READ ONLY Snapshots replication from a Sun box to a Sun & Linux box(s) is being done in CDF.
• Advanced replication also supports master to master. But streams based replication is recommended.
• Streams can be configured in uni-directional (single source and one or more targets) or master to master where updates can happen to any participant database.
• Streams allows ddl modifications made to the master to be propagated automatically.
• The replicas are up and running in read only mode if the master is down for maintenance.
• When a replica is under maintenance there is failover to another replica.

Replication cont.
Oracle master to master replication allows for updates on both the master and replica sides. Master to master is a complex and high maintenance replication. It seems to be the 1st option the unwitting opt for. Every link in the multi master would be required to be fully staffed, as downtime will be critical.
Both Cern and Fermi dbas have requested firm justification before considering this type of replication request.

Replication cont.
1. Conflict Resolution. In Master to Master, conflict resolution may be a challenge. Rules should be well defined to resolve the data conflicts.
2. Design of Data Model. The data model should be designed very carefully. If primary keys are populated by sequences, there is a high chance of the sequences overlapping, which will cause integrity constraint violations.
3. DB Support. In Master to Master replication, all master sites should be in 24*7 support mode. Otherwise, sync up of data will be a challenge, or one may be led to reinstantiation of replication. Reinstantiation is not an unplug and play type of situation.
4. Disk Space for Archives. Space, space and more space. If the receiving site is down for an extended period of time, then the source db should be tuned enough to hold the archive logs. Reasonable downtime for the target depends upon the archive area being generated on the source. Otherwise, one has to reinstantiate the replication.

Freeware Replication
MySQL has replication in the last stable version (3.23.32. v4.1 is out). It is master-slave replication using binary log of operations on the server side. It is possible to build star or chain type structures.
There is a PostgreSQL replication tool. We have not tested it yet.

Lost in Space
Space is the 1 area consistently under estimated in every application I have seen. Imho, consistently, data volume initial estimates were undersized by a factor of 2 or 3.
For example, RunII events were estimated at 1 billion rows. This estimate was surpassed Feb. 2004. We will probably end up with 4-5 billion event rows. That is a lot of disk space.
Disk hardware becomes unsupported, and obsolete in what seems to be a blink of an eye.

Lost In Space cont.
All databases use disk to store data.
Good rule of thumb: You need 10x the disk to hold a given amount of data in an RDB.
[Diagram: N Gb of data becomes 8 x N Gb of disk: Data, Index, Redo, Rollback, Data mirror, Index mirror, Backup, Replication. Unexpected?]
Operate in 2 year cycles:
• First 2 years storage available on day 1.
• Evaluate growth at end of year 1, begin prep of next 2 yr.
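The rule of thumb can be turned into a quick planning calculation. The sketch below reads the diagram as eight components each about the size of the raw data, with the 10x rule covering the remainder for the unexpected. That equal-size reading is an assumption, and the function and key names are made up for the example.

```python
# Components named in the slide's diagram; assumed here to be about N Gb each.
COMPONENTS = (
    "data", "index", "redo", "rollback",
    "data mirror", "index mirror", "backup", "replication",
)

def disk_plan_gb(raw_data_gb, rule_of_thumb=10):
    """Compare the itemized 8 x N estimate with the 10x rule of thumb."""
    itemized_total = sum(raw_data_gb for _ in COMPONENTS)  # 8 x N
    planned_total = raw_data_gb * rule_of_thumb            # 10 x N
    return {
        "itemized_gb": itemized_total,
        "planned_gb": planned_total,
        "headroom_gb": planned_total - itemized_total,      # left for the unexpected
    }
```

For 100 Gb of raw data this gives 800 Gb itemized and 1000 Gb planned, leaving 200 Gb of headroom.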

Lost in Space, cont.
Database indices will take at least as much space as the tables, probably considerably more.
Give WIDE lead time to purchase disk storage. New disks are not installed and configured over night. They require planning, downtime and $.
You will use as much disk space as you purchase, and then some.

com/articles/evodb.syronex. Additional References **WARNING some of these may be database pring/cs4400a/  Intro to Oracle tutorial http://w2.html mentions 1 dba for atlas  Sql course  Evolutionary Database Design http:// www.  Intro to database design 75 . ware/  db infrastructure standard.fnal. support levels. Additional References  ***Highly recommended / 76 . for fermi computing http://www- css. db comparatives http://www-css.

h tm#designer (choose Oracle Designer tutorial or Oracle Designer Short Cuts and Lessons Learned)  Btev specific additional information http://www- Additional References  Oracle Designer tutorial http://www- html 77 .