Database Concepts

What is a Database?

• Personal address book in a Word document
• Collection of Word documents
• Collection of Excel spreadsheets
• Data collected, maintained, and used in airline reservation
• Data used to support launch of a space shuttle
• Structured set of data

Models of Reality

[Figure: REALITY (structures, processes) mapped to a DATABASE SYSTEM containing a DATABASE, with arrows labeled DDL and DML]

• A database is a model of structures of reality
• Use of a database reflects processes of reality
• A database system is a software system which supports definition and use of a database

• DDL: Data Definition Language
• DML: Data Manipulation Language

When Is a DBMS Used?

• Persistent storage of data
• Centralized control of data
• Control of redundancy
• Control of consistency and integrity
• Multiple user support
• Sharing of data
• Data independence
• Control of access and security
• Backup and recovery

Data Modeling

[Figure: data modeling maps REALITY (structures, processes) to a MODEL in the DATABASE SYSTEM]

• A model represents a perception of structures of reality
• The data modeling process is to fix a perception of structures of reality and represent this perception
• In the data modeling process we select aspects and we abstract

Database Design
The purpose of database design is to create a database which

• is a model of structures of reality
• supports queries and updates
• runs efficiently

Database Terminology

• Data Models
• Keys
• Integrity
• Triggers & Stored Procedures
• Null Values
• Surrogates
• Normalization

Data Model
A data model consists of notations for expressing:

• Data Structures
• Integrity Constraints
• Operations

Data Model - Data Structures
All data models have notation for defining:

• entity types
• attribute types
• relationship types

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
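The two example tables can be tried out in any relational system. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names (flight_schedule, flight_no, and so on) are assumptions chosen to avoid the # and - characters, and the column types are a guess at suitable domains.

```python
import sqlite3

# In-memory database; names and types below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight_schedule (
    flight_no INTEGER,   -- FLIGHT#
    airline   TEXT,      -- AIRLINE
    weekday   TEXT,      -- WEEKDAY
    price     NUMERIC    -- PRICE
);
CREATE TABLE dept_airport (
    flight_no    INTEGER,  -- FLIGHT#
    airport_code TEXT      -- AIRPORT-CODE
);
""")

# Entities and attribute values taken from the tables above.
flight_rows = [
    (101, "delta", "mo", 156),
    (545, "american", "we", 110),
    (912, "scandinavian", "fr", 450),
    (242, "usair", "mo", 231),
]
dept_rows = [(101, "atl"), (912, "cph"), (545, "lax")]
conn.executemany("INSERT INTO flight_schedule VALUES (?, ?, ?, ?)", flight_rows)
conn.executemany("INSERT INTO dept_airport VALUES (?, ?)", dept_rows)

# The relationship between the two entity types is followed with a join.
departures = conn.execute("""
    SELECT f.flight_no, f.airline, d.airport_code
    FROM flight_schedule AS f
    JOIN dept_airport AS d ON d.flight_no = f.flight_no
    ORDER BY f.flight_no
""").fetchall()
print(departures)
```

Flight 242 has no DEPT-AIRPORT row in this state, so it does not appear in the join result.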

Data Model Constraints
Rules that cannot be expressed by Data Structures

• Static constraints apply to a database state
• Dynamic constraints apply to a change of database state
• E.g., “All FLIGHT-SCHEDULE entities must have precisely one DEPT-AIRPORT relationship”

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
242      bos
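A static constraint like the one quoted above can be checked against a database state with a query. The sketch below, using Python's sqlite3 module, lists every flight that does not have precisely one DEPT-AIRPORT row; the table and column names and the helper violations() are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight_schedule (flight_no INTEGER, airline TEXT, weekday TEXT, price NUMERIC);
CREATE TABLE dept_airport (flight_no INTEGER, airport_code TEXT);
INSERT INTO flight_schedule VALUES
    (101, 'delta', 'mo', 156), (545, 'american', 'we', 110),
    (912, 'scandinavian', 'fr', 450), (242, 'usair', 'mo', 231);
INSERT INTO dept_airport VALUES
    (101, 'atl'), (912, 'cph'), (545, 'lax'), (242, 'bos');
""")

def violations(connection):
    """Return (flight_no, count) for flights without exactly one departure airport."""
    return connection.execute("""
        SELECT f.flight_no, COUNT(d.flight_no)
        FROM flight_schedule AS f
        LEFT JOIN dept_airport AS d ON d.flight_no = f.flight_no
        GROUP BY f.flight_no
        HAVING COUNT(d.flight_no) <> 1
        ORDER BY f.flight_no
    """).fetchall()

before = violations(conn)   # the state shown above satisfies the constraint
conn.execute("DELETE FROM dept_airport WHERE flight_no = 242")
after = violations(conn)    # flight 242 now has zero departure airports
print(before, after)
```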

Data Model Operations
Operations support change and retrieval of data:

    insert FLIGHT-SCHEDULE(97, delta, tu, 258);
    insert DEPT-AIRPORT(97, atl);
    select FLIGHT#, WEEKDAY from FLIGHT-SCHEDULE where AIRLINE='delta';

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231
97       delta         tu       258

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
242      bos
97       atl
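The same three operations can be written in SQL. This is a sketch using Python's sqlite3 module, with assumed table and column names (flight_schedule, flight_no, and so on) standing in for the notation used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight_schedule (flight_no INTEGER, airline TEXT, weekday TEXT, price NUMERIC);
CREATE TABLE dept_airport (flight_no INTEGER, airport_code TEXT);
INSERT INTO flight_schedule VALUES
    (101, 'delta', 'mo', 156), (545, 'american', 'we', 110),
    (912, 'scandinavian', 'fr', 450), (242, 'usair', 'mo', 231);
INSERT INTO dept_airport VALUES
    (101, 'atl'), (912, 'cph'), (545, 'lax'), (242, 'bos');
""")

# Change: the two inserts from the example above.
conn.execute("INSERT INTO flight_schedule VALUES (?, ?, ?, ?)", (97, "delta", "tu", 258))
conn.execute("INSERT INTO dept_airport VALUES (?, ?)", (97, "atl"))

# Retrieval: flight number and weekday of all delta flights, in insertion order.
delta_flights = conn.execute(
    "SELECT flight_no, weekday FROM flight_schedule WHERE airline = ? ORDER BY rowid",
    ("delta",),
).fetchall()
print(delta_flights)  # [(101, 'mo'), (97, 'tu')]
```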

Keys
Keys are uniqueness constraints

• A key on FLIGHT# in FLIGHT-SCHEDULE will force all FLIGHT# values to be unique in FLIGHT-SCHEDULE
• Consider the following keys on DEPT-AIRPORT:

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
242      bos
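To illustrate a key as a uniqueness constraint, the sketch below declares a key on the flight number of FLIGHT-SCHEDULE and then tries to insert a second row with FLIGHT# 101. It uses Python's sqlite3 module; the table and column names and the extra row are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight_schedule (
    flight_no INTEGER PRIMARY KEY,  -- key on FLIGHT#
    airline TEXT, weekday TEXT, price NUMERIC
);
INSERT INTO flight_schedule VALUES
    (101, 'delta', 'mo', 156), (545, 'american', 'we', 110),
    (912, 'scandinavian', 'fr', 450), (242, 'usair', 'mo', 231);
""")

# A second row with FLIGHT# 101 (hypothetical data) violates the key.
try:
    conn.execute("INSERT INTO flight_schedule VALUES (101, 'usair', 'tu', 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM flight_schedule").fetchone()[0]
print(duplicate_rejected, row_count)
```

The rejected insert leaves the table unchanged, so the four original rows remain.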

Integrity

• Integrity:
  – does the model reflect reality well?
  – is the model without internal conflicts?
• A FLIGHT# in FLIGHT-SCHEDULE cannot be null because it models the existence of an entity in the real world
• A FLIGHT# in DEPT-AIRPORT must exist in FLIGHT-SCHEDULE because it doesn’t make sense for a non-existing FLIGHT-SCHEDULE entity to have a DEPT-AIRPORT

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
242      bos
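Both integrity rules can be declared so that the database system enforces them. The following sketch uses Python's sqlite3 module with assumed table and column names. Two SQLite specifics apply: foreign keys are only enforced after PRAGMA foreign_keys = ON, and the key column is declared INT rather than INTEGER, because SQLite would silently generate a value for a null INTEGER PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE flight_schedule (
    flight_no INT PRIMARY KEY NOT NULL,  -- FLIGHT# cannot be null
    airline TEXT, weekday TEXT, price NUMERIC
);
CREATE TABLE dept_airport (
    flight_no INT NOT NULL REFERENCES flight_schedule(flight_no),  -- must exist in FLIGHT-SCHEDULE
    airport_code TEXT
);
INSERT INTO flight_schedule VALUES
    (101, 'delta', 'mo', 156), (545, 'american', 'we', 110),
    (912, 'scandinavian', 'fr', 450), (242, 'usair', 'mo', 231);
INSERT INTO dept_airport VALUES
    (101, 'atl'), (912, 'cph'), (545, 'lax'), (242, 'bos');
""")

def rejected(sql):
    """Return True if the statement is refused by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A flight without a FLIGHT# is refused.
null_rejected = rejected("INSERT INTO flight_schedule VALUES (NULL, 'delta', 'tu', 258)")
# A departure airport for a non-existing flight (999 is hypothetical) is refused.
orphan_rejected = rejected("INSERT INTO dept_airport VALUES (999, 'bos')")
# A new flight followed by its departure airport is accepted.
valid_rejected = (
    rejected("INSERT INTO flight_schedule VALUES (97, 'delta', 'tu', 258)")
    or rejected("INSERT INTO dept_airport VALUES (97, 'atl')")
)
print(null_rejected, orphan_rejected, valid_rejected)
```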

Triggers & Stored Procedures

Triggers can be defined to enforce constraints on a database, e.g.,

    DEFINE TRIGGER DELETE-FLIGHT-SCHEDULE
    ON DELETE FROM FLIGHT-SCHEDULE WHERE FLIGHT#='X'
    ACTION DELETE FROM DEPT-AIRPORT WHERE FLIGHT#='X';

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
242      bos
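The trigger above is written in a generic notation; actual syntax differs between systems. As one possible rendering, here is a sketch in SQLite's CREATE TRIGGER syntax, run through Python's sqlite3 module. The table, column, and trigger names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight_schedule (flight_no INTEGER PRIMARY KEY, airline TEXT, weekday TEXT, price NUMERIC);
CREATE TABLE dept_airport (flight_no INTEGER, airport_code TEXT);
INSERT INTO flight_schedule VALUES
    (101, 'delta', 'mo', 156), (545, 'american', 'we', 110),
    (912, 'scandinavian', 'fr', 450), (242, 'usair', 'mo', 231);
INSERT INTO dept_airport VALUES
    (101, 'atl'), (912, 'cph'), (545, 'lax'), (242, 'bos');

-- When a flight is deleted, delete its departure airport as well.
CREATE TRIGGER delete_flight_schedule
AFTER DELETE ON flight_schedule
FOR EACH ROW
BEGIN
    DELETE FROM dept_airport WHERE flight_no = OLD.flight_no;
END;
""")

conn.execute("DELETE FROM flight_schedule WHERE flight_no = 242")
remaining = conn.execute(
    "SELECT flight_no, airport_code FROM dept_airport ORDER BY flight_no"
).fetchall()
print(remaining)  # [(101, 'atl'), (545, 'lax'), (912, 'cph')]
```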

Null Values

CUSTOMER#    NAME            MAIDEN NAME   DRAFT STATUS
123-45-6789  Lisa Smith      Lisa Jones    inapplicable
234-56-7890  George Foreman  inapplicable  drafted
345-67-8901  Mary Blake      unknown       inapplicable

• Null-value unknown reflects that the attribute does apply, but the value is currently unknown. That’s ok!
• Null-value inapplicable indicates that the attribute does not apply. That’s bad!
• Null-value inapplicable results from direct use of “catch all forms” in database design
• “Catch all forms” are ok in reality, but detrimental in database design

Surrogates

[Figure: a customer in reality, with attributes custom#, name, and addr, shown next to its surrogate-based representation as a customer record with custom#, name, and addr]

• Surrogates are system-generated, unique, internal identifiers
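One common way to obtain surrogates is to let the database system generate an internal identifier for each row. The sketch below does this with SQLite's INTEGER PRIMARY KEY AUTOINCREMENT through Python's sqlite3 module. The column name surrogate_id, the addresses, and the replacement customer number are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    surrogate_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- system-generated, internal
    custom_no TEXT,                                  -- custom#, visible to users
    name TEXT,
    addr TEXT
)
""")

insert = "INSERT INTO customer (custom_no, name, addr) VALUES (?, ?, ?)"
lisa_id = conn.execute(insert, ("123-45-6789", "Lisa Smith", "hypothetical address 1")).lastrowid
george_id = conn.execute(insert, ("234-56-7890", "George Foreman", "hypothetical address 2")).lastrowid

# The user-visible custom# can change; the surrogate still identifies the same customer.
conn.execute("UPDATE customer SET custom_no = ? WHERE surrogate_id = ?", ("999-99-9999", lisa_id))
row = conn.execute(
    "SELECT custom_no, name FROM customer WHERE surrogate_id = ?", (lisa_id,)
).fetchone()
print(lisa_id, george_id, row)
```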

DATA MODELS

• ER-Model
• Hierarchical Model
• Relational Model
• Object-Oriented Model

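ER Model

[Figure: ER diagram. Entity types: airport, customer, flight schedule, flight instance, domestic flight, international flight. Attributes: airport code, airport name, airport addr (street, city, zip), customer#, customer name, flight#, weekdays, date, seat#, visa required, dept time, arriv time. Relationship types: dept airport, arriv airport, instance of, reservation, with cardinalities marked 1 and n. Domestic flight and international flight appear as subtypes (marked p).]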

Hierarchical Model

[Figure: record types flight-sched (flight#), flight-inst (date), customer (customer#, customer name), dept-airp (airport-code), arriv-airp (airport-code)]

• parent-child relationship types (1:n only!): (flight-sched, flight-inst), (flight-inst, customer)
• one record type is root; every other record type is a child of one parent record type only
• substantial duplication of customer instances
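The duplication follows from the 1:n restriction: a customer who books two flight instances must be stored once under each of them. The sketch below uses nested Python dictionaries as a stand-in for hierarchical records; the dates and the customer data are hypothetical.

```python
# A customer record (hypothetical data).
lisa = {"customer#": "123-45-6789", "customer name": "Lisa Smith"}

# flight-sched is the root; flight-inst records are its children;
# customer records are children of exactly one flight-inst.
flight_sched = {
    "flight#": 101,
    "dept-airp": {"airport-code": "atl"},
    "flight-inst": [
        {"date": "2024-01-01", "customer": [dict(lisa)]},  # first copy
        {"date": "2024-01-08", "customer": [dict(lisa)]},  # second copy
    ],
}

# Collect every stored copy of this customer.
copies = [
    customer
    for inst in flight_sched["flight-inst"]
    for customer in inst["customer"]
    if customer["customer#"] == lisa["customer#"]
]

# Updating only one copy leaves the stored data inconsistent.
flight_sched["flight-inst"][0]["customer"][0]["customer name"] = "Lisa Jones"
names = {customer["customer name"] for customer in copies}
print(len(copies), sorted(names))
```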

Relational Model

• Commercial systems include: ORACLE, DB2, SYBASE, INFORMIX, INGRES, SQL Server
• Dominates the database market on all platforms

Relational Model Data Structures

• domains
• attributes
• relations

relation name:    flight-schedule
attribute names:  flight#   airline    weekday   price
domain names:     integer   char(20)   char(2)   dec(6,2)
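In SQL, the relation name, attribute names, and domain names all appear in the table definition. The sketch below declares the flight-schedule relation in SQLite through Python's sqlite3 module and reads the declaration back with PRAGMA table_info. The identifiers are assumptions, and SQLite records the declared types without enforcing lengths such as CHAR(20).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE flight_schedule (
    flight_no INTEGER,    -- flight#: integer
    airline   CHAR(20),   -- airline: char(20)
    weekday   CHAR(2),    -- weekday: char(2)
    price     DEC(6,2)    -- price: dec(6,2)
)
""")

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per attribute.
schema = [
    (name, declared_type)
    for _cid, name, declared_type, _notnull, _default, _pk
    in conn.execute("PRAGMA table_info(flight_schedule)")
]
print(schema)
```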

Relational Model Integrity Constraints

• Primary Keys
• Entity Integrity
• Referential Integrity

[Figure: relations customer (customer#, customer name), reservation (flight#, date, customer#), and flight-schedule (flight#); customer# in customer and flight# in flight-schedule are marked p]

Relational Model Operations

• Powerful query languages
  – Procedural: describes how to compute a query; operators like JOIN, SELECT, PROJECT
  – Declarative: describes the desired result, e.g., SQL, QBE
• insert, delete, and update capabilities
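The difference between the two styles can be seen by answering one question both ways: which flights leave on Monday, and from which airport? The sketch below first composes hand-written SELECT, JOIN, and PROJECT operators in Python, then states the same query in SQL through the sqlite3 module. The operator functions and all identifiers are assumptions for illustration, not part of any library.

```python
import sqlite3

flight_schedule = [
    {"flight_no": 101, "airline": "delta", "weekday": "mo", "price": 156},
    {"flight_no": 545, "airline": "american", "weekday": "we", "price": 110},
    {"flight_no": 912, "airline": "scandinavian", "weekday": "fr", "price": 450},
    {"flight_no": 242, "airline": "usair", "weekday": "mo", "price": 231},
]
dept_airport = [
    {"flight_no": 101, "airport_code": "atl"},
    {"flight_no": 912, "airport_code": "cph"},
    {"flight_no": 545, "airport_code": "lax"},
    {"flight_no": 242, "airport_code": "bos"},
]

def select(rows, predicate):
    """SELECT: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def join(left, right, attr):
    """JOIN: combine rows that agree on the given attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]

def project(rows, attrs):
    """PROJECT: keep only the named attributes."""
    return [tuple(row[a] for a in attrs) for row in rows]

# Procedural: we spell out the order of the operators.
monday = select(flight_schedule, lambda row: row["weekday"] == "mo")
procedural = project(join(monday, dept_airport, "flight_no"), ["flight_no", "airport_code"])

# Declarative: we describe the result and leave the evaluation order to the system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flight_schedule (flight_no INTEGER, airline TEXT, weekday TEXT, price NUMERIC)")
conn.execute("CREATE TABLE dept_airport (flight_no INTEGER, airport_code TEXT)")
conn.executemany("INSERT INTO flight_schedule VALUES (:flight_no, :airline, :weekday, :price)", flight_schedule)
conn.executemany("INSERT INTO dept_airport VALUES (:flight_no, :airport_code)", dept_airport)
declarative = conn.execute("""
    SELECT f.flight_no, d.airport_code
    FROM flight_schedule AS f JOIN dept_airport AS d ON d.flight_no = f.flight_no
    WHERE f.weekday = 'mo'
    ORDER BY f.flight_no
""").fetchall()

print(sorted(procedural) == declarative)
```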

Object-Oriented Model

• based on the object-oriented paradigm, e.g., Simula, Smalltalk, C++, Java
• area is in a state of flux
• object-oriented commercial systems include GemStone, Ontos, Orion-2, Statice, Versant, O2
• object-relational model has a relational repository model; adds object-oriented features
• object-relational commercial systems include Starburst, POSTGRES

Clinical Database

• Data from a clinical trial stored in some computer system
  – Manual
  – Electronic
• Complex data cleaning, review, and reporting
• Excel, Access, SAS, Oracle

Clinical Db Design
Always follows Protocol Creation

• Accuracy
• Clarity
• Ease and speed of data entry
• Creation of analysis datasets
• Source data transfer formats
• DBMS requirements

Normalization

• Step-by-step decomposition of complex records into simple records
• Reduces redundancy
  – To avoid inconsistencies
  – Update anomalies
• Non-loss decomposition
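A small worked example may help with the terms above. In the sketch below, a wide record type repeats the customer name on every reservation; decomposing it into two simpler record types stores each name once, and a join on the customer number reconstructs the original records exactly, which is what non-loss means. The data and field layout are hypothetical.

```python
# Wide records: (customer#, customer name, flight#). The name is stored redundantly.
wide = [
    ("123-45-6789", "Lisa Smith", 101),
    ("123-45-6789", "Lisa Smith", 545),
    ("234-56-7890", "George Foreman", 101),
]

# Decomposition into two simple record types.
customer = sorted({(number, name) for number, name, _flight in wide})
reservation = sorted({(number, flight) for number, _name, flight in wide})

# Joining on customer# gives back the original records: nothing lost, nothing invented.
rejoined = sorted(
    (number, name, flight)
    for number, name in customer
    for res_number, flight in reservation
    if number == res_number
)

print(len(customer), rejoined == sorted(wide))
```

After the decomposition a name change touches one customer record instead of every reservation, which is how the update anomaly is avoided.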

Clinical Db Normalization

• Tall Skinny Vs Short Fat
• Normalized Vs Non-normalized
• Fewer columns and more rows
• Efficient access and storage
• Reducing the size of data groupings or records
• Does not apply to CRF Image Db

Short Fat

Patient ID  Visit  BP_DIA_1  BP_SYS_1  BP_DIA_2  BP_SYS_2  BP_DIA_3  BP_SYS_3
1001        2      120       72        118       70        117       68

Tall Skinny

Patient ID  Visit  Measurement  BP_DIA  BP_SYS
1001        2      1            120     72
1001        2      2            118     70
1001        2      3            117     68
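Going from the short fat layout to the tall skinny one is a mechanical reshaping. The following is a minimal sketch in plain Python, assuming the repeated variables are named with a trailing _1, _2, _3 as in the example; the function name to_tall_skinny and the dictionary keys are hypothetical.

```python
# One short fat record, values as in the example above.
short_fat = {
    "patient_id": 1001, "visit": 2,
    "BP_DIA_1": 120, "BP_SYS_1": 72,
    "BP_DIA_2": 118, "BP_SYS_2": 70,
    "BP_DIA_3": 117, "BP_SYS_3": 68,
}

def to_tall_skinny(row, variables=("BP_DIA", "BP_SYS")):
    """Turn one wide record into one record per repeated measurement."""
    # Find the measurement numbers from column names such as BP_DIA_2.
    measurements = sorted({
        int(key.rsplit("_", 1)[1])
        for key in row
        if key.rsplit("_", 1)[0] in variables
    })
    return [
        {
            "patient_id": row["patient_id"],
            "visit": row["visit"],
            "measurement": m,
            **{var: row[f"{var}_{m}"] for var in variables},
        }
        for m in measurements
    ]

tall_skinny = to_tall_skinny(short_fat)
for record in tall_skinny:
    print(record)
```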

Short Fat Vs Tall Skinny

• Short Fat
  – Data cleaning checks for variables within a single visit are easy
  – Missing values are easily detected

• Tall Skinny
  – Easy creation of structures and associated checks
  – Data querying is easier

Thank You
