DATABASE MANAGEMENT

CSYS2404

LECTURE NOTES
© Mrs. Gaye Campbell 2010


TABLE OF CONTENTS

SYLLABUS/COURSE OUTLINE
  UNIT I – Introduction of Database Concepts
  UNIT II – Database Design
  UNIT III – Introduction to Relational Algebra and SQL
  UNIT IV – Distributed Databases
  UNIT V – Security Issues

UNIT I: INTRODUCTION TO DATABASE CONCEPTS
  The need for File Systems and Databases
  Basic Concepts
    Sample Payroll Database Structure
  The traditional/file oriented approach
    Problems with the Traditional approach
  The database approach
    DBMS (Database management systems)
    Functions common to most databases
  Advantages of databases
  Disadvantages of databases
  Components of a DBMS
  The different types of databases/Database Models
    Hierarchical
    Network
    Relational
    Object-Oriented
    Object-Relational
    Multidimensional

UNIT II: DATABASE DESIGN
  Introduction to the Database System Life Cycle (DBLC)
    Analysis and design phase
    Database implementation and operation phase
  Roles of database personnel
    Data modellers
    Business Analysts

    Database Designers
    Systems Analysts [see Business Systems course]
    Programmers
    Database Administrators
  Database Design – Conceptual, Logical, Physical
    Conceptual design
    Logical Design
    Physical Design
  Database Schema or Levels of abstraction in specifying a database structure
    Definition of database schema
    Explanation of the four database schema
  Entity-Relationship Diagrams
    Types of relationships
    The symbols used in an ERD
    Sample ERDs
    Example of Creating the ERD
    Entity and Referential Integrity
    ERD Exercises
  Functional Dependencies
  Computation of Closures
    Algorithm for finding the closure of a set of attributes
    Closure Exercises
  Armstrong’s Axioms
    Reflexivity
    Augmentation
    Transitivity
    Examples
    EXERCISE
  Covers and their role in determining redundant FDs
    Algorithm to find redundant FDs
    Exercises - Find the redundant FDs in the following sets
  1st, 2nd, 3rd Normal Forms
    Definition - A relation is in first normal form (1NF) if:
    Definition - A relation is in second normal form (2NF) if:


    Definition - A relation is in 3rd normal form (3NF) if:
    Comprehensive example (1NF to 3NF)
    Another example of the process
    Normalization Exercises to 3NF
  Assessment of file layouts as they affect the functioning of a database
  Physical and logical data organization

UNIT III: INTRODUCTION TO RELATIONAL ALGEBRA AND SQL
  The role of Relational DMLs and DDLs
  The languages used in database systems
  The difference between relational algebra and relational calculus
  Relational algebra
    Simple projection
    Selection
    Join (natural, equi, inner, outer)
    Division
    Cartesian product
    Union
    Intersection
    Difference (or Set Difference)
    Renaming
  Relational Algebra Exercises
  SQL Commands – LAB PORTION
    Brief Summary of Commands
    CREATE TABLE (using constraints – primary key, foreign key)
    ALTER TABLE
    INSERT
    SELECT (using WHERE, ORDER BY, GROUP BY, HAVING, aggregate functions, logical operators, comparison operators)
    SELECT sub queries
    Operations on Result Sets
    UPDATE
    DELETE
    CREATE VIEW
    CREATE INDEX
    DROP TABLE
    DROP VIEW
    DROP INDEX
    COMMIT and ROLLBACK
    GRANT and REVOKE
  SQL EXERCISES
    EXERCISE 1 – CREATE TABLE AND ALTER TABLE STATEMENTS
    EXERCISE 2 – INSERT, UPDATE, DELETE, CREATE INDEX, DROP INDEX, DROP TABLE
    EXERCISE 3 – SELECT STATEMENT
    EXERCISE 4 – SELECT STATEMENT USING MORE THAN ONE TABLE
    EXERCISE 5 – DISTINCT, SUB QUERY, WILDCARD cont’d, SELECT USING UNION
    EXERCISE 6 – REVIEW OF ALL COMMANDS

UNIT IV: DISTRIBUTED DATABASES
  Characteristics of a distributed database
  Definition of logical database, local and global application, global intelligence
  Assessment of a distributed database versus a loose connection of independent site
  Terms and concepts used in distributed databases
  Advantages and disadvantages of a distributed database
    Advantages
    Disadvantages
  Practice Questions
  Data warehouse
  Differences between data warehouse and operational database
  Data mart
  On-line analytical processing
  Data mining
  Transactions – Atomic, Consistent, Isolated, Durable (ACID)
  Concurrency control

UNIT V: SECURITY ISSUES
  Security risks and their effects
    What is data security?
    What are Security Risks?
  The role of the Data Dictionary
  Database protection methods - backup and restore methods
  Integrity Preservation – keys (primary and foreign), data validation, authority levels
    Keys
    Data Validation
    Authority Levels
  Security Control – unauthorized access and use, encryption, firewall, anti-virus, SQL views

SAMPLE SQL CODE FOR RECREATING DATABASE

REFERENCES

SYLLABUS/COURSE OUTLINE

THE COUNCIL OF COMMUNITY COLLEGES OF JAMAICA

COURSE NAME: Database Management
COURSE CODE: CSYS2404
CREDITS: 3
CONTACT HOURS: 45 (45 hours theory)
PRE-REQUISITE(S): None
CO-REQUISITE(S): None
SEMESTER:

COURSE DESCRIPTION:
This course is designed to ensure that the student completes a study of Database Management Systems. Students will be exposed to database concepts including functional dependencies, SQL and normalization. Emphasis will be placed on the creation and manipulation of databases using Oracle, but this can be extended to any available DBMS.

GENERAL OBJECTIVES:
Upon successful completion of this course, students should:
1. understand various terms used in Database Management
2. appreciate the advantages of the database approach
3. understand key components of a database management system
4. appreciate the historical transformation of database models and DBMS
5. know the steps in the Database System Life Cycle
6. appreciate the differences between Logical and Physical Database Design and organization
7. understand functional dependencies
8. understand how to normalize up to 3NF
9. use SQL commands
10. understand how to create reports using ad-hoc SQL commands
11. understand how to solve relational Algebra problems
12. understand distributed database concepts
13. appreciate the importance of maintaining data integrity and security
14. understand the application of Entity Relationship Diagrams

UNIT I – Introduction of Database Concepts

Specific Objectives:
Upon successful completion of this unit, students should be able to:
1. define key terms associated with database management
2. discuss the file oriented versus the database approach
3. discuss advantages associated with database approach as opposed to file-oriented approach
4. identify hardware, software and DBMS components
5. describe features of hierarchical, network, relational, object-oriented and object-relational models

Content:
1. Basic Concepts – character, field, record, table/file, database, Database Management System, primary key, secondary key, composite key, foreign key, super key, candidate key
2. The traditional/file oriented approach
3. The database approach
4. Advantages of databases
5. Components of a DBMS – DDL, DML, Query Language, Report Generator
6. The different types of databases – hierarchical, network, relational, object-oriented, object-relational

UNIT II – Database Design

Specific Objectives:
Upon successful completion of this unit, students should be able to:
1. define the Database System Life Cycle
2. identify the Phases in the Database System Life Cycle
3. identify the roles of database personnel
4. discuss conceptual, logical and physical data design
5. discuss the concept of database schema
6. utilize ERDs to capture data requirements
7. discuss concepts of entity and referential integrity
8. discuss Functional Dependencies (FDs)
9. find redundant FDs in a set
10. normalize to 3NF
11. assess file layouts as they affect the functioning of databases
12. discuss the differences between physical and logical data organization

Content:

aggregate functions. GRANT and REVOKE. renaming. CREATE INDEX. inner. 1st . intersection. SQL Commands . union. DROP VIEW. difference. discuss and identify the role of Relational DMLs and DDLs differentiate between relational algebra and relational calculus solve Relational Algebra problems utilize SQL commands Content: 1. Physical 4. Entity. DROP TABLE. 3. 3. Introduction to Relational algebra – Simple projection. UPDATE.Database Management 1. Assessment of file layouts as they affect the functioning of a database. The difference between relational algebra and relational calculus. Programmers and Database Administrators. comparison operators). students should be able to: © Copyright G. Database Schema 5. Database Design. Database Implementation. SELECT (using WHERE. UNIT IV – Distributed Databases Specific Objectives: Upon successful completion of this unit. 3rd Normal Forms 12. Database Design – Conceptual. foreign key).Database Analysis. Physical and logical data organization. Campbell 2010 9 . Roles of database personnel . join (natural. HAVING. selection. ORDER BY. Database Testing and Evaluation. INSERT. Systems Analysts. The role of Relational DMLs and DDLs. 2. Logical. 4. Armstrong’s Axioms 10.Relationship Diagrams 6. The Database Management System Life Cycle . COMMIT and ROLLBACK. Business Analysts. SELECT sub queries. logical operators. 13. Database Designers. 2nd . DROP INDEX. Computation of Closures 9. Entity and Referential Integrity 7. 4. students should be able to: 1. division. 3. ALTER TABLE. 2. outer) and Cartesian product. CREATE VIEW. Covers and their role in determining redundant FDs 11. GROUP BY.Data modelers. Operation. DELETE. UNIT III – Introduction to Relational Algebra and SQL Specific Objectives: Upon successful completion of this unit. equi.CREATE TABLE (using constraints – primary key. Functional Dependencies 8. Database Maintenance 2.

1. define characteristics of Distributed Databases
2. assess a distributed database versus a loose connection of independent sites
3. define terms and concepts used in the distributed database environment
4. identify advantages and disadvantages of distributed databases
5. discuss data warehousing
6. differentiate between a data warehouse and a data mart
7. differentiate between a data warehouse and an operational database
8. discuss On-line analytical processing (OLAP)
9. discuss the concept of data mining
10. discuss the concept of transactions and concurrency control

Content:
1. Characteristics of a distributed database
2. Definition of logical database, local and global application, global intelligence
3. Assessment of a distributed database versus a loose connection of independent sites
4. Terms and concepts used in distributed databases – transparency, replication, fragmentation – vertical/horizontal, homogeneous versus heterogeneous distribution, and allocation
5. Advantages and disadvantages of a distributed database
6. Data mart
7. Data warehouse
8. Differences between data warehouse and operational database
9. On-line analytical processing
10. Data mining
11. Transactions – Atomic, Consistent, Isolated, Durable (ACID)
12. Concurrency control

UNIT V – Security Issues

Specific Objectives:
Upon successful completion of this unit, students should be able to:
1. identify the role of the Data Dictionary/Directory
2. identify methods used in database protection
3. discuss methods used in integrity preservation
4. identify and discuss security control techniques

Content:
1. The role of the Data Dictionary
2. Database protection methods - backup and restore methods
3. Integrity Preservation – keys (primary and foreign), data validation, authority levels
4. Security Control – unauthorized access and use, encryption, firewall, anti-virus, SQL views

METHODS OF DELIVERY:
1. Lectures
2. Discussions
3. Lab

METHODS OF ASSESSMENT AND EVALUATION:
1. Common Coursework 20%
2. Internal Tests 20%
3. Final Examination 60%

RESOURCE MATERIAL:
Prescribed:
Hoffer, J.A., Prescott, M., & Topi, H. (2008) Modern database management. (9th ed.). NJ: Prentice Hall.
Shah, N. (2004) Database systems using oracle. (2nd ed.). NJ: Prentice Hall.
Recommended:
Date, C. J. (2003) An introduction to database systems. (8th ed.). NJ: Addison Wesley.

A table is-a collection of similar records. storing and retrieving data. A single-unit of data in its simplest form. that are arranged to express information and belongs to a character set (e.g.g. Campbell 2010 12 . that the value is not decomposable. which may be performed on the stored value. In the example employee table below the “lastname” field would contain all of the last names of the employees in the table.Database Management UNIT I: INTRODUCTION TO DATABASE CONCEPTS The need for File Systems and Databases In order to be competitive in today’s data driven environment. Data type/Field type Record/Row/Tuple The physical representation of a data value. ASCII represented by 8 bits). Data management is the process of identifying effective and efficient methods of collecting. such as letters or numbers. the same structure. Table/File/Relation A group of records having. A data type is a unified set of data values that is integrated with a set of operations that allows the effective manipulation of each data value within the set. A field contains a specific piece of information within a record. Over the years. This means that what is information for someone may be data for another. Basic Concepts Term/Concept Data Information Character Definition Field/Attribute/Column Raw facts which are important to an organization Organized-data. this need has given rise to the emergence of two distinct data management approaches: the file approach and the database approach. The data type determines what kind of data may be stored in the field and it also determines the operations. business organizations have to be concerned with the concept of data management. place. Each field is allowed to hold an atomic value. These data item (values) are often stored in fields. One of a set of symbols. the employee table has all of the © Copyright G. which means that all the records within a table must have the same structure (physical and logical). A group of related fields. 
It captures all of the records of a particular type of entity. A record is defined as being a collection of related data. It is an attribute or characteristic of an entity. In order to store information each field has to be associated with a data type. Before we look at the differences between the file approach and the database approach we need to be aware of some basic file/database concepts. event or thing. E. A record in an employee table would contain specific information about a particular employee. A record contains information about a given person. A field name uniquely identifies each field.

employee records, which are stored in a table. The structure of the table is described by the fields, that is, the type of data that will be held in the table. Examples: Employee data, Customer data, Supplier data.

Database (db)/Information Repository
A database system is essentially nothing more than a computerized record-keeping system. A database is a collection of tables, which collectively stores and provides the information needed by an organization. The users will have the following facilities: add new files, insert new data, retrieve data, update data, delete data, and delete files.
Common types of databases in society include: Payroll, Banking, Library book management, Inventory management/Stock, Sales, Student Registration.

Database Management Systems (DBMS)
Complex system software which constructs and maintains the database in a controlled way. It allows creation, access, and management of a database.

Entity
An entity is an object or event about which someone chooses to collect data. It may be a person, place, event or thing. E.g. Student, employee, car, library book, bank account etc.

Key
Attribute(s) used to identify an entity.

Primary key
The primary key is one or more fields whose values uniquely identify each record in a table. A primary key cannot allow Null values and must always have a unique index. (Null values indicate that the field is empty.) A primary key is used to relate a table to foreign keys in other tables.
Fields that could be used as primary keys include: TRN, NIS number, Passport number, Student id number, Employee id number, License plate number, Chassis number, Engine number, Part number, Reference number, ISBN on books, Bar code, Department id etc.

Secondary key
A set of attributes used for identifying records but not uniquely (e.g. Name).

Candidate key (minimal superkey)
An attribute that can serve as a primary key (an alternate key). E.g. on an employee table the TRN may be used as the key but the NIS No. is also unique.

Composite key
A primary key that consists of two or more attributes.

Foreign key
The primary key of one entity that is placed in a second entity for the purpose of accessing the first entity. It can allow null values.

Superkey
A super key is a collection of one or more fields whose collective value creates a unique value. The importance of a super key is that it allows us to make a distinction between the records. All keys are superkeys, but not all superkeys are keys.

Index key
This is a field or a collection of fields whose collective value is used to order the information in a database table. The main purpose of an index is to speed up data retrieval.

Query
A question about the data stored in your tables, or a request to perform an action on the data. It can bring together data from multiple tables to serve as the source of data for a form, report, or data access page.

Form
A database object on which you place controls for taking actions or for entering, displaying, and editing data in fields.

Report
A database object that prints information that is formatted and organized according to your specifications.

Non prime attribute
An attribute that is not a part of the primary key.

Sample Payroll Database Structure
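The key terms defined above can be made concrete with a small sketch. The schema below is hypothetical (it is not the sample payroll structure from these notes); the table names, column names and sample values are all assumptions chosen for illustration, and Python's built-in sqlite3 module stands in for a full DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

conn.executescript("""
CREATE TABLE department (
    dept_id   TEXT NOT NULL PRIMARY KEY,          -- primary key
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    trn     TEXT NOT NULL PRIMARY KEY,            -- primary key: unique, no NULLs
    nis_no  TEXT NOT NULL UNIQUE,                 -- candidate (alternate) key
    name    TEXT NOT NULL,                        -- secondary key: not unique
    dept_id TEXT REFERENCES department(dept_id)   -- foreign key: may be NULL
);
CREATE TABLE pay_slip (
    trn        TEXT NOT NULL REFERENCES employee(trn),
    pay_period TEXT NOT NULL,
    gross_pay  REAL NOT NULL,
    PRIMARY KEY (trn, pay_period)                 -- composite key
);
CREATE INDEX idx_employee_name ON employee(name); -- index to speed up retrieval by name
""")

conn.execute("INSERT INTO department VALUES ('D01', 'Accounts')")
conn.execute("INSERT INTO employee VALUES ('100-200-300', 'A123456', 'Ann Brown', 'D01')")
conn.execute("INSERT INTO employee VALUES ('100-200-301', 'B654321', 'Ann Brown', NULL)")
conn.execute("INSERT INTO pay_slip VALUES ('100-200-300', '2010-01', 52000.0)")


def try_insert(sql):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Re-using an existing TRN violates the primary key.
duplicate_pk_ok = try_insert(
    "INSERT INTO employee VALUES ('100-200-300', 'C111111', 'Carl Reid', 'D01')")
# Department D99 does not exist, so the foreign key is rejected.
dangling_fk_ok = try_insert(
    "INSERT INTO employee VALUES ('100-200-302', 'D222222', 'Dee Hall', 'D99')")

# A query that brings together data from several tables.
rows = conn.execute("""
    SELECT e.name, d.dept_name, p.gross_pay
    FROM employee e
    JOIN department d ON d.dept_id = e.dept_id
    JOIN pay_slip p ON p.trn = e.trn
""").fetchall()
```

Two employees share the name "Ann Brown", which is why a name can only act as a secondary key, while the TRN and NIS number each identify exactly one record.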

The traditional/file oriented approach

The file processing approach is an approach to storing and managing data where each department within an organization typically has its own set of files. This means that the data files and the programs which manipulate these files are created on a departmental basis without due consideration of the needs of the other departments. The file approach is often called the traditional approach. In this methodology, the process of data management is handled in an unstructured and ad-hoc (unplanned) manner. The focus is on procedures. Files are designed to meet the needs of a given program. Data flows from program to program.

Can you use the above approach to do this query? Find the employees making < $23000 who
a) work in a warehouse with floor area larger than 30000 square feet,
b) have issued an order to supplier "S6".

Problems with the Traditional approach

The problems created by this approach may be divided into 2 categories: data problems and programming problems.

Data problems
These problems were brought about by the differences in the format of the duplicated data. These differences were typically seen in 3 areas:
• Typographical errors in the duplicated data
• Data type differences in the duplicated data
• Differences in the logical representation of the duplicated data

Programming problems
The programming languages that were available during this period of time were all 3rd Generational Languages, which are also known as procedural languages. Procedural languages suffer from two deficiencies, which make it difficult to write programming routines that manipulate data within the data files. These 2 deficiencies are known as structural dependence and data dependence.
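A small sketch may help show why a procedural routine is tied to the layout of its file. The record layouts, names and salaries below are invented for illustration; the routine hard-codes the position of the salary field, so it stops working as soon as another department adds a field to the file.

```python
# Hypothetical flat file written by a payroll program: id,name,salary
OLD_LAYOUT = ["E01,Ann Brown,21000", "E02,Carl Reid,30000"]

# Another department later inserts a 'dept' field before the salary.
NEW_LAYOUT = ["E01,Ann Brown,Accounts,21000", "E02,Carl Reid,Stores,30000"]


def low_paid(lines, limit=23000):
    """Return names of employees earning less than limit.

    The routine assumes the salary is always the third field, so it depends
    on the structure of the file it was written for.
    """
    result = []
    for line in lines:
        fields = line.split(",")
        try:
            salary = int(fields[2])
        except ValueError:
            # The layout changed underneath the program.
            raise RuntimeError("file structure changed; routine must be rewritten") from None
        if salary < limit:
            result.append(fields[1])
    return result


names = low_paid(OLD_LAYOUT)   # works with the layout the routine was written for
# low_paid(NEW_LAYOUT) raises RuntimeError because field 2 is now the department
```

Every program that reads the file would need the same repair, which is the maintenance burden the notes describe.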

Structural dependence
This is the situation in which a programmer needs to have a knowledge of the representation of the logical structure of a file in order to write programming routines to manipulate the data within the file. The logical structure of a file is concerned with the order in which data occurs within the file.

Data dependence
This is the situation in which a programmer needs to have a knowledge of the physical representation of the data within the file in order to write programming routines to manipulate the data.

The problems are as follows:
• Application program dependent. E.g. Prog 1 cannot access directly those files designed for Prog 2 (files are often designed specifically for their particular application)
• Separated and isolated data – resulting in difficulty accessing data stored in different files
• Incompatible files
• Files must be pre-sorted
• Redundant data can arise as new programs are written (the same fields are stored in multiple places, so the chance for errors is increased; there are also typographical errors in the duplicated data)
• Inconsistent data arises when one program does an update and another does not.
• File structure changes severely impact existing programs.
• Poor data control – with no centralized control at the data element level it is common for the same data element to have multiple names
• Often difficult to understand

The database approach

In the database approach data management is handled in a structured and planned manner. The focus is on the data and not on procedures. In the database approach many programs and users share the data in the database. The data resource is separate from the programs. Users access data using software called a Database Management System (DBMS).

The first step in the database approach is to perform a data requirements analysis of the organization as a whole. In other words we are concerned with identifying the data needs of the organization, not just the data needs of the specific department. This results in a pool of centralized data, which is then shared among the various organizational departments. The first step in the database approach is geared towards solving the data problems that were present in the file approach. We have now eliminated the duplication of data by ensuring that there is a centralized pool of data, which is accessed by the entire organization. The result is that all the problems that were generated because of duplicated data are now eliminated.

The second step in the database approach is the use of a 4th Generational Language, which is also known as a nonprocedural language. A nonprocedural language does not suffer from the deficiencies of a 3rd Generational Language. In fact, a 4th Generational Language supports structural independence and data independence.

Structural independence – the situation in which the logical representation of a file structure is not needed in order to write programming routines for manipulating the file contents.

Data independence – the situation in which the physical representation of data is not needed in order to write programming routines for manipulating the data.

Database – An organized collection of data. A set of related files. The data describe one particular enterprise.

Formal Definition: A database is a single organized collection of structured data, stored with minimum of duplication of data items so as to provide a consistent and controlled pool of data. This data is common to all users of the system but is independent of programs which use the data.

DBMS (Database management systems)

• The DBMS is an item of complex system software which constructs and maintains the database in a controlled way. It allows creation, access, and management of a database.
• It consists of a collection of interrelated data and a collection of programs to access that data.
• A DBMS is usually purchased from a software vendor and is the means by which an application program or end-user views and manipulates data in a database.

• It also provides the interface between the user and the data. The DBMS provides the user with the services needed and handles the technicalities of maintaining and using the data. (The user is unaware of the structure of the database.)
• The DBMS also allocates the storage to data.
• It maintains indices so that any required data can be retrieved, and so that separate items of data can be cross referenced. This allows redundant data to be removed.
• It also allows data which is frequently used to be kept in a readily accessible form so that time is saved. (Research: Look up hashing)
• The DBMS also has the function of providing security for the data. The main aspects of this are: protecting data against unauthorized access, safeguarding data against corruption, providing recovery and restart facilities after a hardware/software failure.
• The DBMS keeps statistics of the use made of the data.

Functions common to most databases

• Data Dictionary (DD)
  o Is sometimes called a repository
  o Contains data about each file in the DB and each field within the files
  o Should only be updated by skilled personnel
  o Is used to perform validation checks
  o Allows users to specify a default field
• File retrieval and maintenance
  o Many tools provided
  o Involves adding new records, updating existing records and deleting unwanted records
• Query Language
  o Allows users to specify data to be displayed, printed or stored
  o Consists of simple English-like statements
  o Each has its own grammar and vocabulary
  o Usually quickly learned by non-programmers
• Form
  o A window used to enter and change data
  o When well designed, validates data as entered, reducing data entry errors
• Report Generator
  o Also called report writer
  o Allows users to design a report on the screen
  o Normally used only to retrieve data
• Data Security
  o A DBMS provides means to ensure that only authorized users access data at permitted times
  o Most DBMSs allow different levels of access privileges
• Backup and Recovery
  o A DBMS provides a variety of techniques to restore a damaged or destroyed database to usable form.

  o A backup or copy of the entire database should be made on a regular basis
  o Some DBMSs maintain a log of activities

Advantages of databases

• Data is managed by the DBMS
• Program independent
• Information supplied to managers is more valuable because it is based on a comprehensive collection of data instead of files which contain only the data needed for one application. [Data is centralized and integrated]
• Shared data – data belongs to and is shared, usually over a network, by the entire organization. (Total availability)
• As well as routine reports, it is possible to obtain ad hoc reports to meet particular requirements. [Better service to the users]
• There is an economic advantage in not duplicating data. In addition, errors due to discrepancies between 2 files are eliminated. (Reduced data redundancy – most data items are stored in only one file which greatly reduces duplicate data)
• Improved data integrity – data modification is accomplished by changing only one file, reducing the probability of introducing inconsistencies and redundancies
• The amount of input preparation needed is minimized by the single input principle. (This means that there is little duplication of data; one transaction will cause the necessary changes to be made to the data.)
• A great deal of programming time is saved because the DBMS handles the construction and processing of the files and the retrieval of data. (Reduced development time)
• The integration of different business systems is greatly facilitated.
• Data definition and documentation are standardized.
• Easier Access – non-technical users can access and maintain data if afforded the necessary privileges.
• Security settings are usually used to define who has access to what level.

Disadvantages of databases

• Requires more memory, storage and processing power
• Data are more vulnerable than in file processing systems

[Research – The History of databases, 1970 – E.F. Codd]
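The "program independent" advantage listed above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical employee table and using Python's built-in sqlite3 module as the DBMS: the query names the fields it needs, so adding a new column to the table does not force the query to be rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id TEXT PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("E01", "Ann Brown", 21000), ("E02", "Carl Reid", 30000)],
)

# The program states WHAT it wants, not where the data sits in the file.
QUERY = "SELECT name FROM employee WHERE salary < 23000"
before = conn.execute(QUERY).fetchall()

# The structure of the table changes: a new column is added.
conn.execute("ALTER TABLE employee ADD COLUMN dept TEXT")

# The unchanged query still runs and returns the same answer.
after = conn.execute(QUERY).fetchall()
```

Both `before` and `after` hold the same single row, because the DBMS, not the program, keeps track of how the table is laid out.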

Components of a DBMS

The components of a Database System are as follows:
• Database
• Software – DDL, DML, Query Language, Report Generator/Writer (see unit III for more details)
• Hardware
• Users
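As a rough illustration of the software components listed above, the sketch below separates DDL (data definition language) statements, which describe the structure of the data, from DML (data manipulation language) statements, which change the data, and from a query, which retrieves it. The supplier table and its contents are assumptions made up for the example; sqlite3 is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the structure of the data.
conn.execute("CREATE TABLE supplier (supplier_id TEXT PRIMARY KEY, name TEXT NOT NULL)")

# DML: inserts, updates and deletes the data itself.
conn.execute("INSERT INTO supplier VALUES ('S6', 'Island Supplies')")
conn.execute("INSERT INTO supplier VALUES ('S7', 'Harbour Traders')")
conn.execute("UPDATE supplier SET name = 'Island Supplies Ltd' WHERE supplier_id = 'S6'")
conn.execute("DELETE FROM supplier WHERE supplier_id = 'S7'")

# Query language: retrieves data without changing it.
suppliers = conn.execute("SELECT supplier_id, name FROM supplier").fetchall()
```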

The different types of databases/Database Models

Every database and DBMS is based on a specific data model. The data model consists of the rules that define how the database organizes data and how users view the organization of data. A data model is a representation of data and its interrelationships which describe ideas about the real world. Databases are classified according to the approaches taken to database organization. The classes are:
• Relational
• Network
• Hierarchical
• Object Oriented
• Multidimensional

The hierarchical and network database models store their data in a series of records, each of which has a set of field values attached to it. They collect all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows. Links between the record types are created using parent-child relationships.

Hierarchical

A hierarchical system is one that is organized in the shape of a pyramid, with each row of objects linked to objects directly beneath it. Hierarchical systems pervade everyday life.

Examples of hierarchical systems in society are:
• The army, which has generals at the top and privates at the bottom
• The classification of plants and animals according to species, family, genus etc.

Examples of hierarchical systems in computers are:
• File system – a hierarchy of folders and sub-folders in which files are placed.
• Menu driven system – systems of main menus and sub-menus below. (E.g. when you click on File another menu comes up under it.)

Hierarchical database systems can also be found in inventory and accounting systems used by government departments and hospitals.

The hierarchical model is the oldest of the database models, and unlike the network, relational and object oriented models, does not have a well documented history of its conception and initial release. It is derived from the Information Management Systems of the 1950's and 60's. It was adopted by many banks and insurance companies who are still running it as a legacy system to this day.

The hierarchical model is a tree structured model and consists of many record types with one being the root. The root record type exists at the top of the tree. All data must be accessed through the root. One-to-many relationships exist between records in the hierarchy with one being the parent and the other the child. Each child has a unique parent and a parent can have many children. This child/parent rule assures that data is systematically accessible.

To get to a low-level table, you start at the root and work your way down through the tree until you reach your target. Of course, as you might imagine, one problem with this system is that the user must know how the tree is structured in order to find anything.

For example, in the diagram below, the root record type is customer, the parent of order is customer, the parent of parts is order. Order has two children which are parts and salesman. The path to the parts record type is therefore Customer, Order, Parts. In order to access an order, you must first access the customer (e.g. by knowing the customer#). In order to access the parts, you must first access the customer then the order.

Hierarchical structures were widely used in the first mainframe database management systems. However, due to their restrictions, they often cannot be used to relate structures that exist in the real world. Hierarchical relationships between different types of data can make it very easy to answer some questions, but very difficult to answer others. If a one-to-many relationship is violated (e.g. a patient can have more than one physician) then the hierarchy becomes a network.

The hierarchical model is no longer used as the basis for current commercially produced systems; however, there are a large number of legacy (old) installations. These legacy systems are likely to be phased out over time, as the number of qualified staff declines due to retirement and retraining.

Examples of hierarchical databases include:
• IMS – Information Management Systems by IBM
• System 2000 by MRI systems corp.
• Adabas
• GT.M
• Caché
• Multidimensional hierarchical toolkit
• Mumps compiler
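The Customer, Order, Parts path described above can be sketched as a small tree. This is only an in-memory illustration of the idea, not how a product such as IMS stores data; the Record class, the keys and the field values are hypothetical.

```python
class Record:
    """One record in a hierarchy: exactly one parent, any number of children."""

    def __init__(self, record_type, key, **fields):
        self.record_type = record_type
        self.key = key
        self.fields = fields
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self          # each child has a unique parent
        self.children.append(child)  # a parent can have many children
        return child

    def find_child(self, record_type, key):
        for child in self.children:
            if child.record_type == record_type and child.key == key:
                return child
        raise KeyError(f"{record_type} {key} not found under {self.record_type} {self.key}")


def path_to_root(record):
    """List the record types on the way from the root down to this record."""
    path = []
    while record is not None:
        path.append(record.record_type)
        record = record.parent
    return list(reversed(path))


# Build the hierarchy: Customer is the root; Order has children Parts and Salesman.
customer = Record("Customer", "C1", name="Ann Brown")
order = customer.add_child(Record("Order", "O10"))
order.add_child(Record("Parts", "P7", description="bolt"))
order.add_child(Record("Salesman", "S3", name="Carl Reid"))

# To reach a part you must start at the root and walk down through the order.
part = customer.find_child("Order", "O10").find_child("Parts", "P7")
part_path = path_to_root(part)
```

There is no way to reach the part without first naming the customer and the order, which is why the user has to know how the tree is structured.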

Advantages of the Hierarchical Model

• Data is unified since all records stem from the root
• Easier to secure the database since you can access data through only one path
• Good for large volumes of one-to-many relationships
• Adding, updating, and deleting records is more efficient and accurate than the network model

Disadvantages of the Hierarchical Model

• Software dependence (changes to the database structure require modification to all programs which access the database)
• You cannot add a record to a child table until it has already been incorporated into the parent table. This might be troublesome if, for example, you wanted to add a student who had not yet signed up for any courses. In the diagram above, you cannot add a new salesperson until there is a customer and an order.
• Cannot (difficult) show many-to-many relationships
• One-to-many relationship can result in redundant data
• Not flexible enough to support ad-hoc queries
• Data can only be accessed through the right path
• It is not user friendly as users have to know the structure in order to access data through the right path

Network

The network model is a database model conceived as a flexible way of representing objects and their relationships. Its original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the Conference on Data Systems Languages (CODASYL) Consortium. In many ways, the Network Database model was designed to solve some of the problems with the Hierarchical Database Model.

Where the hierarchical model structures data as a tree of record types, with each record type having one parent record and many children, the network model allows each record type to have multiple parent and child records, forming a lattice structure. This allows the model to support many-to-many relationships. There is no root record type. Data can therefore be accessed through more than one path.

For example, in the diagram below, an order can be accessed through either the salesperson or the customer as order has salesperson and customer as its parents. Another way of saying it is that the child of salesperson and customer is order. The path to Parts is either Salesperson, Order, Parts or Customer, Order, Parts. You can therefore access parts by either knowing who the salesperson is or through the order by knowing, for example, the order #.

The chief argument in favour of the network model, in comparison to the hierarchical model, was that it allowed a more natural modeling of relationships between entities. Although the model was widely implemented and used, it failed to become dominant for two main reasons. Firstly, IBM chose to stick to the hierarchical model in their established products such as IMS and DL/I. Secondly, it was eventually displaced by the relational model, which offered a higher-level, more declarative interface.

Examples of network databases include:
• Codasyl
• Total
• VAX-DBMS
• IMAGE of Hewlett Packard
• DMS-1100 of UNIVAC
• SUPRA of Cincom
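The two access paths to Parts described above can be sketched by letting a record keep a list of parents instead of a single parent. The classes, keys and helper functions are hypothetical and only model the idea in memory; they are not the CODASYL interface.

```python
class Record:
    """A record in a network: it may have several parents and several children."""

    def __init__(self, record_type, key):
        self.record_type = record_type
        self.key = key
        self.parents = []
        self.children = []


def link(parent, child):
    """Create a parent-child link between two records."""
    parent.children.append(child)
    child.parents.append(parent)


def paths_to(target):
    """Return every access path ending at target, each starting at a record with no parent."""
    if not target.parents:
        return [[target.record_type]]
    return [
        path + [target.record_type]
        for parent in target.parents
        for path in paths_to(parent)
    ]


salesperson = Record("Salesperson", "S3")
customer = Record("Customer", "C1")
order = Record("Order", "O10")
parts = Record("Parts", "P7")

link(salesperson, order)  # order has two parents ...
link(customer, order)     # ... so there is no single root
link(order, parts)

access_paths = paths_to(parts)
```

`access_paths` contains both Salesperson, Order, Parts and Customer, Order, Parts, matching the two routes given in the text.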

Advantages of the Network Model

• Many-to-many relationships are easily represented
• It is more flexible as you can access data through more than 1 path
• Represents redundancy more efficiently than hierarchical model

Disadvantages of the Network Model

• Software dependence (changes to the database structure require modification to all programs which access the database)
• Uses more processing time than the hierarchical structure
• Users must have knowledge of the structure of the database in order to navigate
• Hard to design, use and maintain

Relational

Relational databases consist of tables called relations. All have the same simple format, making them easy to set out under column headings. Relations are made up of tuples and attributes. The rows/records are called tuples. The columns/fields are called attributes. Each row normally has a unique identifying key. Relationships between relations are implicit in the overlapping attributes.

Most relational databases include Structured Query Language (SQL), a query language that allows users to manage, update and retrieve data (e.g. Oracle, Sybase, Ingres, MySQL, db2, Access, Visual FoxPro).

A relational DB developer calls a file a relation, a record a tuple, and a field an attribute.
A relational DB user calls a file a table, a record a row, and a field a column.

[Diagram labels: CustName, Salesperson; Order-No, Salesperson; Part-No, Order-no]
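The statement that relationships are implicit in the overlapping attributes can be sketched with a join. The three tables below are an assumption: their columns are loosely borrowed from the labels that survive from the diagram (CustName, Salesperson, Order-No, Part-No), while the table names and all of the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_name TEXT, salesperson TEXT);
CREATE TABLE orders   (order_no TEXT PRIMARY KEY, salesperson TEXT);
CREATE TABLE part     (part_no TEXT, order_no TEXT);

INSERT INTO customer VALUES ('Ann Brown', 'Reid');
INSERT INTO orders   VALUES ('O10', 'Reid');
INSERT INTO orders   VALUES ('O11', 'Hall');
INSERT INTO part     VALUES ('P7', 'O10');
INSERT INTO part     VALUES ('P8', 'O10');
INSERT INTO part     VALUES ('P9', 'O11');
""")

# No stored links or paths: rows are matched on the attributes the tables share.
rows = conn.execute("""
    SELECT c.cust_name, o.order_no, p.part_no
    FROM customer c
    JOIN orders o ON o.salesperson = c.salesperson
    JOIN part   p ON p.order_no = o.order_no
    ORDER BY p.part_no
""").fetchall()
```

The query never mentions how to navigate from one record to another; the shared salesperson and order number values are enough for the DBMS to relate the rows.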

Advantages of the Relational Model

• Structural independence (i.e. changes to the database structure DO NOT require modification to all programs which access the database)
• Powerful and flexible query mechanism that makes ad-hoc queries possible
• Easy representation of all types of relationships
• Unification of data that minimizes redundancy and maximizes security

Disadvantages of the Relational Model

• Requires more space and processing power
• Requires more planning if the database structure is to be designed properly

Entity Table/Relationship Table

An entity table is a table structure which allows us to store a set of similar entities. A relationship table, on the other hand, is a table structure that enables us to show the associations which exist among elements in related entity tables.

Standard Notation

Standard Notation is a format for writing database tables so that their logical structure may be understood. In standard notation, each table is given a unique name. The name of the table is then written in capital letters. Following the table name is a list of all the fields which are found in the table. These fields are enclosed in brackets. The primary key field for the table is then underlined.

For example, let us assume that we want to store the following information about a student: id, Fname, Lname and sex. Let us also assume that we want to store the following information about a subject: subid, subName, sublength. Finally, we want to show the relationship between each student and the subjects taken in another table called takes. If we assume that id is the primary key for the student table and that subid is the primary key for the subject table, we will end up with the following table structures in standard notation.

STUDENT(id, Fname, Lname, sex)
SUBJECT(subid, subName, sublength)
TAKES(id, subid)

Object-Oriented

Summary
o Stores data in objects (an object contains data plus the actions that process the data)
o Can usually store more types of data than Relational databases
o Can usually access data faster than the Relational DB

o Stores unstructured data more efficiently than the Relational DB
o Examples: FastObjects, GemStone

What is an Object?

An object generally is any item that can be individually selected and manipulated. This can include shapes and pictures that appear on a screen as well as less tangible software entities. In object-oriented programming an object is a self-contained entity that consists of both data and procedures to manipulate the data. In other words, an object is an item that contains data, as well as the actions that read or process the data.

Real-world objects share two characteristics: they all have state and behavior. For example, dogs have state (name, color, breed, hungry) and behavior (barking, fetching, wagging tail). Bicycles have state (current gear, current pedal, two wheels, number of gears) and behavior (braking, accelerating, slowing down, changing gears). Software objects are modeled after real-world objects in that they too have state and behavior. You might want to represent real-world dogs as software objects in an animation program or a real-world bicycle as a software object in the program that controls an electronic exercise bike. You can also use software objects to model abstract concepts.

As indicated above, each object must have a state and a set of methods. The state refers to the data that is stored inside the object, while the methods/behaviours refer to the set of operations/functions which the object can perform. Each object must have a set of well-defined public interfaces, which a client may use to get the object to perform a specific operation. For example, a user can click on a button, right click or double click on the button, or put the mouse over the button. Click, right click, double click, mouse over etc. are therefore examples of methods, which are encapsulated (contained) inside the object. When the user clicks on the button, the relevant code for the particular user action is executed.

What is a Class?

A class is a category of objects. A class is a special programming construct that allows us to create objects. In other words, a class provides the blueprint for the creation of an object. The class defines all the common properties (characteristics) of the different objects that belong to it. The class must specify a description of the data that is stored and a description of the operations that the object can provide. For example, there might be a class called shape that contains objects which are circles, rectangles, and triangles. An object oriented database can contain many classes of objects.

Examples of objects include:
• Command buttons
• List boxes
• Data windows

• Windows
• Menus
• Text boxes
• Pictures
• Audio clips
• Video clips (animation)
• Students
• Courses
• Employees

What is an object-oriented database (OODB)?

Object-oriented databases or object database management systems grew out of research during the early to mid-1980s into having intrinsic database management support for graph-structured objects. The term "object-oriented database system" first appeared around 1985.

An object-oriented database stores data in objects. An object contains data, as well as actions that read or process the data. A Member object, for example, might contain data about a member such as Member ID, First Name, Last Name, Address, and so on. It also could contain instructions on how to print the member record or the formula required to calculate a member's balance due. A record in a relational database, by contrast, would contain only data about a member.

Object-oriented databases have several advantages compared with relational databases. They can store more types of data, access this data faster, and allow programmers to reuse objects. An object-oriented database stores unstructured data more efficiently than a relational database. Unstructured data includes photographs, video clips, audio clips, and documents. When users query an object-oriented database, the results often display more quickly than the same query of a relational database. If an object already exists, programmers can reuse it instead of recreating a new object, saving on program development time. For example, if a Close button exists on each screen, the programmer only needs to write the code once, then place the same button on each screen. This is called inheritance as discussed below.

Object-oriented databases are designed to work well with object-oriented programming languages such as Java, C#, and C++. The most significant characteristic of object-oriented database technology is that it combines object-oriented programming with database technology to provide an integrated application development system.

The following are features of an object-oriented database:
• Inheritance – the ability to create new objects by allowing them to automatically obtain the data members and the data operations of an existing class without rewriting the code that is present in the existing class.

• Polymorphism (many forms) – the ability to have multiple classes of objects using the same interfaces although the implementation details may vary from object to object. For example, you can have a function/subroutine that calculates the area of an object. The way it calculates area depends on the type of object that called the function. This is because the formula for area is different for circle, rectangle, triangle etc. In other words, there is one function called CALCULATE_AREA and multiple objects will call this function, but the function behaves differently from object to object.
• Encapsulation – the ability of an object to hide its internal representation from the program that uses it. This is accomplished by defining public interfaces and by specifying that these public interfaces must be used when accessing the internal data.
• Information-hiding – an object has a public interface that other objects can use to communicate with it. The object can maintain private information and methods that can be changed at any time without affecting other objects that depend on it. You don't need to understand a bike's gear mechanism to use it.

Examples of object oriented databases include:
• FastObjects
• GemStone
• KE Texpress
• ObjectStore
• Versant

Examples of applications appropriate for an object-oriented database include the following:
• A multimedia database stores images, audio clips, and/or video clips. For example, a geographic information system (GIS) database stores maps. A voice mail system database stores audio messages. A television news station database stores audio and video clips.
• A groupware database stores documents such as schedules, calendars, manuals, memos, and reports. Users perform queries to search the document contents. For example, you can search people's schedules for available meeting times.
• A computer-aided design (CAD) database stores data about engineering, architectural, and scientific designs. Data in the database includes a list of components of the item being designed, the relationship among the components, and previous versions of the design drafts.
• A hypertext database contains text links to other types of documents. A hypermedia database contains text, graphics, video, and sound. The Web contains a variety of hypertext and hypermedia databases. You can search these databases for items such as documents, graphics, audio and video clips, and links to Web pages.
• A Web database links to an e-form on a Web page. The Web browser sends and receives data between the form and the database.
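The inheritance, polymorphism and encapsulation features listed earlier can be sketched with the shape example from the text. This is ordinary object-oriented Python, not an object database; the class names, the calculate_area method name (modelled on the CALCULATE_AREA function mentioned above) and the dimensions are assumptions for illustration.

```python
import math


class Shape:
    """Base class: holds what every shape has in common."""

    def __init__(self, name):
        self._name = name  # internal state, reached only through methods

    def calculate_area(self):
        raise NotImplementedError("each kind of shape supplies its own formula")

    def describe(self):
        # Public interface: callers never touch the internal fields directly.
        return f"{self._name} with area {self.calculate_area():.2f}"


class Circle(Shape):  # inherits describe() without rewriting it
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def calculate_area(self):
        return math.pi * self._radius ** 2


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self._width, self._height = width, height

    def calculate_area(self):
        return self._width * self._height


class Triangle(Shape):
    def __init__(self, base, height):
        super().__init__("triangle")
        self._base, self._height = base, height

    def calculate_area(self):
        return 0.5 * self._base * self._height


# Polymorphism: the same call behaves differently for each kind of object.
shapes = [Circle(1), Rectangle(2, 3), Triangle(4, 5)]
areas = [round(shape.calculate_area(), 2) for shape in shapes]
circle_text = shapes[0].describe()
```

The loop calls one method name on every object, and each class answers with its own formula, which is the behaviour the polymorphism bullet describes.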

OODBs add database functionality to object programming languages. A major benefit is the unification of the application and database development into a seamless data model and language environment. As a result, applications require less code, use more natural data modeling, and code bases are easier to maintain. Object developers can write complete database applications with a modest amount of additional effort.

In contrast to a relational DBMS where a complex data structure must be flattened out to fit into tables or joined together from those tables to form the in-memory structure, OODBs have no performance overhead to store or retrieve a web or hierarchy of interrelated objects. This one-to-one mapping of object programming language objects to database objects has two benefits over other storage approaches: it provides higher performance management of objects, and it enables better management of the complex interrelationships between objects. This makes object DBMSs better suited to support applications such as financial portfolio risk analysis systems, telecommunications service applications, world wide web document structures, design and manufacturing systems, and hospital patient record systems, which have complex relationships between data.

According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of object-oriented programming language (OOPL) systems and persistent systems. The power of the OODB comes from the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programs."

Data in a database is said to be persistent (constant) because you can read a record at one point in time and read the record at another point in time and the record is still there. In other words, the record is not transient (temporary).
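The difference between transient and persistent data can be sketched in a few lines. This is not an object-oriented database: it simply writes an object's state to a file with Python's json module and rebuilds the object later. The Member class follows the member example used earlier in these notes, but its field names, the fee amounts and the file name are assumptions.

```python
import json
import os
import tempfile


class Member:
    """Data about a member plus an action that processes the data."""

    def __init__(self, member_id, first_name, last_name, fees_charged, fees_paid):
        self.member_id = member_id
        self.first_name = first_name
        self.last_name = last_name
        self.fees_charged = fees_charged
        self.fees_paid = fees_paid

    def balance_due(self):
        return self.fees_charged - self.fees_paid


def save(member, path):
    """Make the object's state persistent by writing it to disk."""
    with open(path, "w") as handle:
        json.dump(vars(member), handle)


def load(path):
    """Rebuild a Member object from the stored state."""
    with open(path) as handle:
        return Member(**json.load(handle))


path = os.path.join(tempfile.mkdtemp(), "member.json")

member = Member("M001", "Ann", "Brown", 500, 350)  # transient: exists only in memory
save(member, path)                                  # persistent: survives on disk
del member                                          # the in-memory object is gone

restored = load(path)                               # read back at a later point in time
```

An object database aims to make the save and load steps invisible, so that the programmer works with objects such as `restored` directly.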

Representation of an object oriented database.
In the sample website below, the object-oriented database contains buttons and a map. When the user puts their mouse over a button, a description of the button appears. When the user clicks on a button, there is a link to another web page. When the user clicks on a particular area of the map, information on that area will appear.

Object-Relational

What is a hybrid object-relational database (ORD)?
An object-relational database (ORD) or object-relational database management system (ORDBMS) combines features of the relational and object-oriented data models. It is a relational database management system that allows developers to integrate the database with their own custom data types and methods. The term object-relational database is sometimes used to describe external software products running over traditional DBMSs to provide similar features; these systems are more correctly referred to as object-relational mapping systems.

Whereas RDBMS or SQL-DBMS products focused on the efficient management of data drawn from a limited set of data types (defined by the relevant language standards), an object-relational DBMS allows software developers to integrate their own types and the methods that apply to them into the DBMS. The goal of ORDBMS technology is to allow developers to raise the level of abstraction at which they view the problem domain.

Object-relational database management systems (ORDBMSs) add new object storage capabilities to the relational systems at the core of modern information systems. These new facilities integrate management of traditional fielded data, complex objects such as time-series and geospatial data, and diverse binary media such as audio, video, images, and applets. An applet is an application that has limited features, requires limited memory resources, and is usually portable between operating systems. By encapsulating methods with data structures, an ORDBMS server can execute complex analytical and data manipulation operations to search and transform multimedia and other complex objects.

As an evolutionary technology, the object-relational (OR) approach has inherited the robust transaction- and performance-management features of its relational ancestor and the flexibility of its object-oriented cousin. Database designers can work with familiar tabular structures while assimilating new object-management possibilities.

Examples of Object-relational databases include:
• DB2
• JDataStore
• Oracle
• Polyhedra
• PostgreSQL

What is Object Definition Language (ODL)?
Object-oriented and object-relational databases often use a query language called object query language (OQL) to manipulate and retrieve data. These databases also have an object definition language (ODL), which is used to define the objects in the database. ODL must specify a description of the data that is stored in objects as well as a description of the operations that the object can provide. For example, an object could be defined as being a command button. Code could be written to manipulate the button in various ways such as: raise the button, bring it into focus, move its location, enlarge it etc.

Multidimensional
o Stores data in dimensions
o The number of dimensions varies
o Most have a time dimension
o Examples: D3, Oracle Express

The following shows the difference between the relational view of sales data and the multidimensional view of sales data.

Relational View

INVOICE Table
Number   Date      Customer   Amount
2034     15/5/96   Dartonik   $3500
2035     15/5/96   INC        $1800
2036     16/5/96   Dartonik   $2000
2037     16/5/96   INC        $800

LINE Table
Number   Product    Price   Quantity
2034     Mouse      $150    20
2034     Diskette   $50     10

Multidimensional View

                       Time Dimension
Customer Dimension     15/5/96   16/5/96   Totals
Dartonik               $3500     $2000     $5500
INC                    $1800     $800      $2600
Totals                 $5300     $2800     $8100

Sales figures occur at the intersection of a customer row and time column.

[Extra Research: semi-structured model, associative model]
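The relationship between the two views of the sales data can be illustrated with a short Python sketch. It is only an illustration, not how a multidimensional DBMS is actually implemented: the invoice rows are copied from the INVOICE table in these notes, while the function names (build_cube, totals) are made up for this example.

```python
from collections import defaultdict

# Rows of the INVOICE table: (number, date, customer, amount in dollars).
invoices = [
    (2034, "15/5/96", "Dartonik", 3500),
    (2035, "15/5/96", "INC", 1800),
    (2036, "16/5/96", "Dartonik", 2000),
    (2037, "16/5/96", "INC", 800),
]


def build_cube(rows):
    """Place each sales figure at the intersection of a customer and a date."""
    cube = defaultdict(int)
    for _number, date, customer, amount in rows:
        cube[(customer, date)] += amount
    return dict(cube)


def totals(cube):
    """Sum the cube along each dimension and overall."""
    by_customer = defaultdict(int)
    by_date = defaultdict(int)
    for (customer, date), amount in cube.items():
        by_customer[customer] += amount
        by_date[date] += amount
    return dict(by_customer), dict(by_date), sum(cube.values())


cube = build_cube(invoices)
by_customer, by_date, grand_total = totals(cube)

print(cube[("Dartonik", "15/5/96")])  # 3500
print(by_customer)                    # {'Dartonik': 5500, 'INC': 2600}
print(by_date)                        # {'15/5/96': 5300, '16/5/96': 2800}
print(grand_total)                    # 8100
```

The relational view keeps one row per invoice, whereas the cube is keyed by the two dimensions (customer and time), so the row, column and grand totals fall out of simple sums along each dimension.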

UNIT II: DATABASE DESIGN

Introduction to the Database System Life Cycle (DBLC)

The DBLC is made up of the following phases:
• Database Analysis
• Database Design
• Database Implementation
• Database Testing and Evaluation
• Database Operation
• Database Maintenance

A database goes through this cycle as it is designed. The steps in the cycle are further broken down as follows:

Analysis and design phase
• Requirements formulation and analysis
• Logical Design
• Implementation design
• Physical design

Database implementation and operation phase
• Database implementation
• Operation and monitoring
• Modification and adaptation

Database Analysis
This phase is done in the analysis phase of the SDLC. The main aim of database analysis is to perform the following functions:
• Analyse the current situation of the company (initial study)
• Define the problems being experienced
• Define organizational objectives and business rules (for validation rules etc.)
• Define the scope and the boundaries of the project

Database Design
This phase is concerned with performing the following functions:
• Conceptual Design – how data relates to each other (models the real world) (e.g. ERD)
• Logical Design – the information content of the database (tables/objects and links)
• Physical Design – layout on secondary storage (indexing, data types, access methods etc.)

[Research the various access methods: Indexed, Sequential, Random/Direct]

Database Implementation
This phase is concerned with the actual creation of the database with respect to the database design that was constructed above. We are also concerned with the population of the database. In addition this phase is also concerned with the implementation of security routines, concurrency control etc. [In other words this is where we create the database structure using SQL commands]

Database Testing and Evaluation
This phase is concerned with running tests to ensure that the database will meet the needs of the organization. This involves verifying that the appropriate business rules are being called, that the security of the database is indeed intact etc. This phase also involves testing of the programs that will use the database to ensure that the interface works. The failure of evaluation criteria may signal changes in the conceptual, logical or physical layers.

Database Operation
In this step users are actually using the database (e.g. adding records) through the relevant application software.

Database Maintenance
This phase is concerned with ensuring that the database is functional and reliable. This includes making modifications (e.g. adding new fields, increasing field sizes etc.). Maintenance is often attained by performing the following activities:
• Preventive maintenance (e.g. backup)
• Corrective maintenance (recovery from failure)
• Adaptive maintenance (adding new entities, business rules, enhancing performance etc.)
• Performing periodic security audit checks
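The implementation and maintenance phases described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the employee table, its columns and the sample rows are hypothetical, and a production DBMS would add the security and concurrency controls that this sketch leaves out.

```python
import sqlite3


def build_sample_db():
    """Create, populate and then modify a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")

    # Implementation: create the database structure using SQL commands.
    conn.execute(
        "CREATE TABLE employee ("
        " emp_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " salary REAL CHECK (salary >= 0))"  # a simple business rule
    )

    # Implementation: populate the database.
    conn.executemany(
        "INSERT INTO employee (emp_id, name, salary) VALUES (?, ?, ?)",
        [(1, "Alice", 52000.0), (2, "Boris", 48000.0)],
    )

    # Adaptive maintenance: add a new field to an existing table.
    conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
    conn.commit()
    return conn


conn = build_sample_db()
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
print(columns)  # ['emp_id', 'name', 'salary', 'department']
```

During testing and evaluation one would check, for instance, that the CHECK constraint rejects a negative salary, which is one way of verifying that a business rule is being applied.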

Roles of database personnel

Data modellers
"Database design seeks to design the logical and physical structure of one or more databases to accommodate the information needs of the users in an organization for a defined set of applications". The design process roughly follows five steps:
1. planning and analysis
2. conceptual design
3. logical design
4. physical design
5. implementation
The data model is one part of the conceptual design process. The most widely used form of data modelling is the Entity-Relationship (ER) approach. The role of the data modeller therefore is to create the data model or to carry out conceptual database design.

Business Analysts
This person has both business and computer knowledge. He/She should have strong analytical skills and can therefore translate business needs into requirements. He/She also has good communication skills and is able to challenge business units.

The term Business Analyst (BA) is used to describe a person who practices the discipline of business analysis. A business analyst or "BA" is responsible for analyzing the business needs of clients to help identify business problems and propose solutions. Within the systems development life cycle domain, the business analyst typically performs a liaison function between the business side of an enterprise and the providers of services to the enterprise. Common alternative titles are systems analyst and functional analyst, although some organizations may differentiate between these titles and corresponding responsibilities.

The International Institute of Business Analysis has the following definition of the role: "A business analyst works as a liaison among stakeholders in order to elicit, analyze, communicate and validate requirements for changes to business processes, policies and information systems. The business analyst understands business problems and opportunities in the context of the requirements and recommends solutions that enable the organization to achieve its goals."

The British Computer Society proposes the following definition of a business analyst: "An internal consultancy role that has responsibility for investigating business systems, identifying options for improving business systems and bridging the needs of the business with the use of IT." This person critically evaluates the information gathered.

Database Designers
The process of designing a database generally consists of a number of steps which will be carried out by the database designer. Usually, the database designer must:
• Determine the data to be stored in the database

• Determine the relationships between the different data elements
• Superimpose a logical structure upon the data on the basis of these relationships.
[See the Database Design section for more details]

Systems Analysts [see Business Systems course]
The systems analyst analyses and designs systems that meet the computer requirements of an organization. He/She uses computer technology to solve problems, communicates with users and trains them. He/She also writes user manuals. This person must be able to communicate effectively. The systems analyst plays a vital role in the systems development process.

A successful systems analyst must acquire four skills: analytical, technical, managerial, and interpersonal. Analytical skills enable systems analysts to understand the organization and its functions, which helps him/her to identify opportunities and to analyze and solve problems. Technical skills help systems analysts understand the potential and the limitations of information technology. The systems analyst must be able to work with various programming languages, operating systems, and computer hardware platforms. Management skills help systems analysts manage projects, resources, risk, and change. Interpersonal skills help systems analysts work with end users as well as with analysts, programmers, and other systems professionals.

A systems analyst is responsible for researching, planning, coordinating and recommending software and system choices to meet an organization's business requirements. Because they must write user requests into technical specifications, the systems analysts are the liaisons between vendors and the IT professionals of the organization they represent. They may be responsible for developing cost analysis, design considerations, and implementation timelines. They may also be responsible for feasibility studies of a computer system before making recommendations to senior management. Systems analysts are called Systems Architects in some companies.

Basically, a systems analyst performs the following tasks:
• Interact with the customers to know their requirements
• Interact with designers to convey the possible interface of the software
• Interact with/guide the coders/developers to keep track of system development
• Perform system testing with sample/live data with the help of testers
• Implement the new system
• Prepare documentation
Many systems analysts have morphed into business analysts.

Programmers
Writes, tests and modifies computer programs; consults with users, engineers etc.; writes documentation; conducts training.
[Please note that programming languages are numerous and change from time to time. Programmers should therefore have the ability to learn new languages on their own as technology changes.]

Database Administrators
A database administrator (DBA) is a person who is responsible for the environmental aspects of a database. The duties of a database administrator vary depending on job description, corporate and IT policies and the technical features and capabilities of the database management systems (DBMSes) being administered. They nearly always include disaster recovery (backups and testing of backups), performance analysis and tuning, and some database design or assistance thereof.

Database administrators work with database management systems software and determine ways to organize and store data. They identify user requirements, set up computer databases, and test and coordinate modifications to the computer database systems. An organization's database administrator ensures the performance of the system, understands the platform on which the database runs, and adds new users to the system. Because they also may design and implement system security, database administrators often plan and coordinate security measures. With the volume of sensitive data generated every second growing rapidly, data integrity, backup systems, and database security have become increasingly important aspects of the job of database administrators. Their salaries range from $65,000US to $86,000US depending on qualifications and experience.

Managing a company's database requires a great deal of coordination. The role of coordinating the use of the database belongs to the database administrator (DBA). The administrative controls carried out by the DBA therefore include the following:
• Select and implement the DBMS
• Develop database models (e.g. Entity relationship diagrams)
• Create and maintain the data dictionary.1 This includes documentation of the data dictionary.
• Ensure that the database structure is documented
• Provide manuals describing the facilities the database offers and how to make use of these facilities
• Provide facilities for retrieving data and for structuring reports that are appropriate to the needs of the organization
• Manage and evaluate security of the database (includes backup and recovery)
• Verify database integrity
• Monitor performance of the database
• Recoverability – check backup and recovery/restore procedures
• Perform archiving (back up and remove historical data from current files)
• Appraise the performance of the database and take corrective action if performance degrades (monitor performance)
• Periodically appraise the data to ensure it is complete, accurate and not duplicated
• Availability – ensure that the database is running when necessary
• Use query languages to obtain reports of the information in the database

1 A data dictionary (also called repository) is a DBMS element that contains data about each table in a database and each field within those tables.
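As a small illustration of the data dictionary idea in footnote 1, the sketch below reads the catalog that SQLite keeps about its own tables. It is only a sketch under stated assumptions: the two sample tables and the helper name describe_database are hypothetical, and other DBMSes expose their dictionaries through different catalog tables or views.

```python
import sqlite3


def describe_database(conn):
    """Return {table name: [(field name, declared type), ...]} from SQLite's catalog."""
    dictionary = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per field: (cid, name, type, ...).
        fields = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        dictionary[table] = [(field[1], field[2]) for field in fields]
    return dictionary


# Hypothetical sample tables, used only to have something to describe.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")

data_dictionary = describe_database(conn)
for table, fields in data_dictionary.items():
    print(table, fields)
```

The output lists each table with its fields and their declared types, which is the kind of "data about data" a DBA documents and maintains.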

Although not strictly part of a database administrator's duties, logical and physical design of databases is sometimes part of the job. These functions are traditionally thought of as being the duties of a database analyst or database designer.

[Research – Salaries of the above job titles in various companies]

Database Design – Conceptual, Logical, Physical

Database Design is the process of developing a database structure from user requirements. It is the process of producing a detailed data model of a database. This logical data model contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity.

The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system (DBMS).

The Database Design Process
The process of designing a database generally consists of a number of steps which will be carried out by the database designer. Not all of these steps will be necessary in all cases. Usually, the designer must:
• Determine the data to be stored in the database
• Determine the relationships between the different data elements
• Superimpose a logical structure upon the data on the basis of these relationships.

Determining data to be stored
In a majority of cases, the person who is doing the design of a database is a person with expertise in the area of database design, rather than expertise in the domain from which the data to be stored is drawn e.g. financial information, biological information etc. Therefore the data to be stored in the database must be determined in cooperation with a person who does have expertise in that domain, and who is aware of what data must be stored within the system.

This process is one which is generally considered part of requirements analysis, and requires skill on the part of the database designer to elicit the needed information from those with the domain knowledge. This is because those with the necessary domain knowledge frequently cannot express clearly what their system requirements for the database are, as they are unaccustomed to thinking in terms of the discrete data elements which must be stored. Data to be stored can be determined by Requirement Specification.

Conceptual design
It is the consolidation of all user requirements into a DBMS-independent information structure (conceptual schema). The conceptual schema accurately models the real world organization and its important data elements and relationships. The conceptual schema normally used is the ERD.

Once a database designer is aware of the data which is to be stored within the database, they must then determine how the various pieces of that data relate to one another. When performing this step, the designer is generally looking out for the dependencies in the data, where one piece of information is dependent upon another, i.e. when one piece of information changes, the other will also. For example, take a list of names and addresses, assuming the normal situation where two people can have the same address, but one person cannot have two addresses. Here the address is dependent upon the name: a name determines a single address, so if two entries have different addresses then the associated names must be different too. However, the inverse is not necessarily true, i.e. when the name changes the address may be the same.

Logical Design
This involves the design of the entire information content of the database. Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by the database management system. In the case of relational databases the storage objects are normalized tables which store data in rows and columns. In an Object database the storage objects correspond directly to the objects used by the Object-oriented programming language used to write the applications that will manage and access the data. The relationships may be defined as attributes of the object classes involved or as methods that operate on the object classes.

Each table may represent an implementation of either a logical object or a relationship joining one or more instances of one or more logical objects. Relationships between tables may then be stored as links connecting child tables with parents. Since complex logical relationships are themselves tables they will probably have links to more than one parent. Logical design results in the logical database structure.

Physical Design
This results in a physical database structure which is developed from the logical structure. This determines the layout or configuration on secondary storage. In other words the physical design of the database specifies the physical configuration of the database on the storage media. This includes detailed specification of data elements, data types, indexing options, and other parameters residing in the DBMS data dictionary. It is the detailed design of a system that includes modules and the database's hardware and software specifications of the system.

Physical design can be roughly divided into 3 steps:
• Stored record format design – concerned with the problem of formatting stored data by analysis of the characteristics of data item types, distribution of data item values, and their usage by various applications.
• Stored record clustering – physical allocation of stored records. Record clustering places the same or different record types together in blocks on the storage device.
• Access method design – provides storage and retrieval capabilities for data stored on physical devices.

Database Schema or Levels of abstraction in specifying a database structure

Look at the diagram above. Which is easier to see, the rabbit or the duck? Just as different persons perceive the illusion in different ways, different users will view the data in different ways. Data can be viewed as entities with attributes or it can be viewed as groups of bits. Database schema is therefore based on how one views the data.

Definition of database schema
Database schema defines a database's structure, its tables, relationships, domains, and business rules. Database schema is a design, the foundation on which the database and the application are built.

Explanation of the four database schema

• Conceptual schema – consists of attributes, entities, relationships
The conceptual schema is also called the logical model, and is the basic database model, which deals with organizational structures that are used to define database structures such as tables and constraints. This represents a global view of the data. It is an enterprise-wide representation of data as viewed by high-level managers. This model is the basis for the identification and description of the main data objects, avoiding details. The most widely used conceptual model is the entity relationship (E-R) model. Using the E-R model yields the conceptual schema, which is, in effect, the basic database blueprint. In other words, this schema is used to design the database structure.

Conceptual schema provides a relatively easily understood bird's eye view of the data environment. The conceptual schema is independent of both software and hardware. Software independence means that the model does not depend on the DBMS software used to implement the model. Hardware independence means that the model does not depend on the hardware used in the implementation of the model. Therefore, changes in either the hardware or the DBMS software will have no effect on the database design at the conceptual level.

• Internal schema – physical view – what analyst/programmer sees
Once a specific DBMS has been selected, the internal model adapts the conceptual model to a specific DBMS. In other words, the internal model requires the database designer to match the conceptual model's characteristics and constraints to those of the selected database model. The database designer will, for example, see the specific tables in the database and know which fields are on which table.

Because the internal model depends on the existence of specific database software, it is said to be software-dependent. Therefore a change in the DBMS requires that the internal model be changed to fit the DBMS's characteristics and requirements [e.g. currency datatype is not on all DBMSes]. The internal model is still hardware-independent because it is unaffected by the choice of the computer in which the software is installed. Therefore a change in storage devices or even a change in operating system will not affect the internal model's design requirements.

The development of a detailed internal model is especially important to database designers who work with certain database models that require very precise specification of data storage location and data access paths. In contrast, the relational database model requires less detail in its internal model because most RDBMSes handle data access path definition transparently – that is, the designer need not be aware of the data access path details. Nevertheless, even relational database software usually requires data storage location specification, especially in a mainframe environment. For example, DB2 requires that the data storage group, the location of the database within the group and the location of the tables within the database be specified.

• External schema – applications programmer or end user view
This is based on the internal model. It is the end user's view of the data environment or the applications interface. By end users we mean the people who use the application programs as well as those who designed and implemented them. It deals with methods through which users may access the data (e.g. through the use of a data input form). Whereas the database designer will know that fields are located on different tables, an end user may see every field on one screen (form). This user therefore views the fields as if they were on one table. The end user will not need to know that the data is separated into different tables. Some fields may also be missing from the user's screen. The user does not necessarily need to know about these fields in order to perform his/her tasks.

• Physical schema – way data is stored on secondary storage
The lowest level of abstraction describes the way data is saved on storage media such as disks or tapes. This model requires the definition of both the physical storage devices and the physical access methods required to reach the data within those storage devices. It is both software and hardware dependent. A change to either the DBMS software or hardware would require a change to the database model.

Attributes of storage media
o Tracks
Bits (0s and 1s) are the smallest unit of data. On magnetic storage devices a 0 is a non-magnetized spot; on optical storage devices bits are represented by pits (holes) burnt in the surface. The bits are commonly stored on tracks.
o Sectors
Data can be grouped in blocks called sectors. A sector on magnetic disk for example is in the shape of a pizza slice/wedge. The block is read or written to at once.

o File Organization and Access Methods
When data are stored on secondary storage devices, the method of organization chosen will determine how the data can be accessed. In turn, this will affect the types of applications that can use the data, as well as the time and cost necessary to do so.

a) Sequential
With sequential file organization records are stored physically in order in the file. This can be in alphabetical or numerical order. Processing begins at the first logical record and proceeds through each record in the file until the final record has been read or written. Magnetic tape is a sequential access device and can only use sequential files. In order to modify a file the original file (master) is changed by creating transactions in a transaction file. The transaction file is processed and a new master file is created based on the transactions.

Advantages
• This method can use magnetic tape which is the least expensive method of storage.
• Any type of storage device can access sequential files.
• It is the most efficient form of organization when the entire file, or most of it, must be processed at once.
• Transaction and old master files act as a backup, should the new master file be damaged or destroyed.

Disadvantages
• This method can be slow when trying to locate a record near to the end of the file.
• The entire file must be processed and a new master file created even if only one record requires maintenance or updating.
• Records cannot be inserted in the middle of the file.

b) Indexed
Records have a unique key which is a pointer to the record in order to access them. The records are not physically in logical order. The pointers exist in an index file (separate from the data file) and direct you to the next logical record. In order to access the file sequentially, you follow the sequence of the pointers. Files can be ordered in many ways by using more than one set of pointers.

For example, Alice is the first logical record. If you want to know what the record is after Alice, the next record logically (alphabetically) is Boris. The pointer after Alice therefore says 4. In other words, go to record number 4 to find it.
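The pointer-following just described can be sketched in a few lines of Python. This is only an illustration, not a real indexed file system: the five names and pointer values are the sample records used in these notes, while the dictionaries and the function name read_in_logical_order are assumptions made for the example. Because the last pointer in the sample leads back to the first logical record, the loop is bounded by the number of records.

```python
# Data file: physical record number -> data (records are NOT in logical order).
data_file = {1: "Mary", 2: "Alice", 3: "Jane", 4: "Boris", 5: "Peter"}

# Index file, kept separately: physical record number -> next logical record.
index_file = {1: 5, 2: 4, 3: 1, 4: 3, 5: 2}


def read_in_logical_order(data, index, first):
    """Follow the pointers from the first logical record, visiting each record once."""
    order = []
    current = first
    for _ in range(len(data)):
        order.append(data[current])
        current = index[current]  # jump to the next logical record
    return order


# Alice (physical record 2) is the first logical record.
print(read_in_logical_order(data_file, index_file, first=2))
# ['Alice', 'Boris', 'Jane', 'Mary', 'Peter']
```

A second index (for example, one ordering the same records by a different field) could be added as another dictionary without moving any data, which is what "more than one set of pointers" means in practice.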

Physical Rec#   Data    Index/Pointer
1               Mary    5
2               Alice   4
3               Jane    1
4               Boris   3
5               Peter   2

Advantages
• Data can be accessed sequentially or directly.
• If an index is lost, the data still exists.

Disadvantages
• Indexes lower efficiency.
• Indexes can be damaged, thus the sequencing is lost.

c) Direct (Random)
The data in this method may be organized in such a way that they are scattered throughout the disk in what may appear to be a random order. Direct access permits access to any record without the necessity to read other records in the file. To accomplish this each record is uniquely identified by a key. The key is used to calculate an address for the record. This method is known as hashing.

Hashing is a method used for determining the physical location of a record. In this method, the primary key is processed mathematically and another number is computed that represents the location where the record will be stored. When a user retrieves the record, its key is entered, and the hashing routine is used to determine where the record can be found. Once accessed, a record can be read or updated.

The problem with hashing however is that different keys processed can sometimes result in the same number or the same storage location, leading to "collisions". The second record must then be stored in an overflow area. This reduces the efficiency of the retrieval process, because the search for the right record becomes more complex through the use of overflow areas and thus becomes more time-consuming.

Advantages
• Data can be accessed directly and quickly.
• Data is easily kept up-to-date.
• Files can also be processed sequentially.

Disadvantages
• This is more expensive than sequential.
• This method requires the use of direct access devices such as magnetic disk.
• No transaction files are maintained, so there is no backup of the master file. Procedures must be established to ensure the regular creation of backup files.

[Research – The levels of the ANSI/SPARC database architecture]
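The hashing method described under Direct (Random) organization can be sketched as follows. This is a simplified illustration under stated assumptions: the notes do not say which calculation is used, so the sketch assumes the common division-remainder technique (key modulo the number of storage locations), and the class name HashedFile, the slot count and the sample keys are all hypothetical.

```python
class HashedFile:
    """A toy direct-access file: home slots plus a single overflow area."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots  # home storage locations
        self.overflow = []               # records displaced by collisions

    def _address(self, key):
        # Assumed hashing routine: division-remainder on the primary key.
        return key % len(self.slots)

    def store(self, key, data):
        """Store a record and report whether it went to its home slot or to overflow."""
        addr = self._address(key)
        occupant = self.slots[addr]
        if occupant is None or occupant[0] == key:
            self.slots[addr] = (key, data)
            return "home"
        # Collision: a different key already occupies this location.
        for i, (existing_key, _) in enumerate(self.overflow):
            if existing_key == key:
                self.overflow[i] = (key, data)
                return "overflow"
        self.overflow.append((key, data))
        return "overflow"

    def retrieve(self, key):
        """Recompute the address from the key; fall back to the overflow area."""
        occupant = self.slots[self._address(key)]
        if occupant is not None and occupant[0] == key:
            return occupant[1]
        for existing_key, data in self.overflow:  # slower, sequential search
            if existing_key == key:
                return data
        return None


sample_file = HashedFile(num_slots=7)
print(sample_file.store(10, "Mary"))   # home      (10 % 7 == 3)
print(sample_file.store(17, "Alice"))  # overflow  (17 % 7 == 3, collision)
print(sample_file.retrieve(17))        # Alice
```

Retrieving key 10 needs a single address calculation, whereas key 17 also requires a search of the overflow area, which is why collisions make retrieval more time-consuming.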

Entity-Relationship Diagrams

A data model is a pictorial abstraction of the contents of a database. The major function of the data model is to provide a simplified view of the database contents in a form that is easily understood by the client, the end-user, the application programmer and the database designer. The most popular diagramming technique that is used to create the data model is known as the Entity-Relationship diagram.

Entity-Relationship diagrams (ERDs) emerged in the 1970's from work by Dr. Peter Chen and others. They were looking for means to simplify the representation of large and complex data storage concepts. An Entity-Relationship Diagram (ERD), also known as the Entity Relationship Model, is a specialized graphic that shows the interrelationships between entities in a database. The purpose of an ERD is to design a database structure. ERDs can also be used with clients to discuss business rules.

An entity is an object or event about which someone chooses to collect data. It may be a person, place or thing etc. Examples are: student, employee, customer, patient, product, car, song, library book. Entities are drawn as rectangles.
[Research: weak entity (an entity that cannot be uniquely identified by its own attributes alone), existence-dependent, cardinality, supertype entity, subtype entity]

An entity has certain attributes. An attribute is a characteristic of an entity, or it can be defined as the data collected about the entity. (The attributes equate to the fields/data items.) A record would form a collection of these data items. Examples are: name, address, sex, date of birth, eye color, blood type, title, product code etc. The attribute that would uniquely identify a particular entity would be the primary key field. This field is the data item that uniquely identifies the record.

Types of relationships

A relationship is an association between entities. A relationship captures how two or more entities are related to one another. Examples: an owns relation between a company and a computer, a supervises relation between an employee and a department, a performs relation between an artist and a song. Entities can be thought of (roughly) as nouns. Relationships can be thought of (again, roughly) as verbs.

There are three types of relationships that can exist between entities. These are:
• One-to-one (1:1) i.e. each entity A has only one entity B. Each B has only one A. E.g. A product can have only one package.
• One-to-many (1:m) i.e. each entity A has many entity Bs. E.g. A teacher of this subject can have many students, but a student has only one teacher in the subject.

• Many-to-many (m:n) i.e. each entity A has many entity Bs. Each B has many As. E.g. A doctor can have many patients; a patient can have many doctors.

The symbols used in an ERD
• Entity - represented by a rectangle
• Relationship - represented by a diamond or a line depending on convention
• Type of relationship - represented by a diamond with a number or lines depending on convention
• Attribute - represented by ovals outside the entity or listed inside the entity depending on convention

Sample ERDs

Convention 1 - Chen
In this convention, entities are represented by rectangles, relationships are represented by diamonds and attributes by ovals. The name of the relationship is written in the diamond. The type of relationship is represented by 1, m or n. The name of the attribute is written in the oval. The oval is attached to the entity with a line.

[Figure: three sample ERDs in the Chen convention]

For example, based on the 3 diagrams above there is a one-to-many relationship between Dept and Emp, there is a one-to-one relationship between Office and Emp, and there is a many-to-many relationship between Salesman and City. Dept has an attribute called manager.

Convention 2 - Martin
In this convention, entities are represented by labelled rectangles. The label is the name of the entity. Entity names should be singular nouns. Relationships are represented by a solid line connecting two entities. The name of the relationship is written above the line. Relationship names should be verbs. Attributes, when included, are listed inside the entity rectangle (e.g. DeptID and ProjectID). Attributes which are identifiers are underlined. Attribute names should be singular nouns.

A “one” is represented by a single line attached to the entity and a “many” is indicated by a “crow’s foot” or three lines. Mandatory existence is represented by placing a perpendicular bar on the line next to the mandatory entity. Optional existence is represented by placing a circle on the line next to the optional entity.

[Figure: Martin-convention ERD relating Department and Project]

The above diagram shows a one-to-many relationship (one department to many projects). The diagram shows that Departments are mandatory but Projects are optional.

Example of Creating the ERD

Consider a hospital:
Patients are treated in a single ward by the doctors assigned to them. Usually each patient will be assigned a single doctor, but in rare cases they will have two. Healthcare assistants also attend to the patients; a number of these are associated with each ward. Initially the system will be concerned solely with drug treatment. Each patient is required to take a variety of drugs a certain number of times per day and for varying lengths of time. The system must record details concerning patient treatment and staff payment. Some staff are paid part time, and doctors and care assistants work varying amounts of overtime at varying rates (subject to grade). The system will also need to track what treatments are required for which patients and when, and it should be capable of calculating the cost of treatment per week for each patient (though it is currently unclear to what use this information will be put).

How do we start the ERD?
1. Define Entities: these are usually nouns used in descriptions of the system, in the discussion of business rules, or in documentation; identified in the narrative (see highlighted items above).
2. Define Relationships: these are usually verbs used in descriptions of the system or in discussion of the business rules (entity ______ entity); identified in the narrative (see highlighted items above).
3. Add attributes to the relations; these are determined by the queries, and may also suggest new entities, e.g. grade; or they may suggest the need for keys or identifiers.
4. What questions can we ask?
   a. Which doctors work in which wards?
   b. How much will be spent in a ward in a given week?
   c. How much will a patient cost to treat?
   d. How much does a doctor cost per week?
   e. Which assistants can a patient expect to see?
   f. Which drugs are being used?
5. Describe the type of relationship between the entities.
   Many-to-many must be resolved to two one-to-manys with an additional entity.
   This usually happens automatically.
   Sometimes it involves the introduction of a link entity (which will be all foreign key).
   Examples: Patient-Drug
6. This flexibility allows us to consider a variety of questions such as:
   a. Which beds are free?
   b. Which assistants work for Dr. X?
   c. What is the least expensive prescription?
   d. How many doctors are there in the hospital?
   e. Which patients are family related?
7. Represent that information with symbols.
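The resolution of the many-to-many Patient-Drug relationship into two one-to-many relationships can be sketched in a few lines. The patient identifiers, drug names and variable names below are hypothetical; the point is only that each row of the link entity holds nothing but the two foreign keys.

```python
from collections import defaultdict

# Hypothetical link-entity rows: (patient_id, drug_name).
# Each row records one "patient takes drug" fact; the pair acts as the key.
patient_drug = [
    ("P1", "Tavegyl"),
    ("P1", "Ventolin"),
    ("P2", "Tavegyl"),
]

drugs_by_patient = defaultdict(set)   # one patient  -> many link rows
patients_by_drug = defaultdict(set)   # one drug     -> many link rows
for patient_id, drug_name in patient_drug:
    drugs_by_patient[patient_id].add(drug_name)
    patients_by_drug[drug_name].add(patient_id)

# "Which drugs are being used?" is answered from the link entity alone.
drugs_in_use = sorted(patients_by_drug)
```

A patient can appear in many link rows and so can a drug, yet each link row points to exactly one patient and one drug, which is what makes both sides one-to-many.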

Entity and Referential Integrity

As a database designer you will discover that database integrity rules are essential if you are to create a good database design. Although some Relational DBMS automatically enforce these rules, you still need to be aware of them.

Entity integrity - this states that all records must have a primary key and the primary key value must never contain a null or undefined value. The purpose of this rule is to ensure that each record within a table has a unique identity.

Referential integrity - this states that a foreign key must either have a null value or it must have a matching primary key value in the table to which it is related. The purpose of this rule is to ensure that there are no illegal entries within the relationship tables. It also prevents us from deleting records whose primary key value has a corresponding match in a relationship table.
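Both rules can be tried out with the SQLite engine that ships with Python. This is a minimal sketch under stated assumptions: the `dept` and `emp` tables, their columns and the sample rows are hypothetical, and SQLite is used only as a convenient example of a relational DBMS (it checks foreign keys only after `PRAGMA foreign_keys = ON`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE dept (dept_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("""
    CREATE TABLE emp (
        emp_id  TEXT PRIMARY KEY NOT NULL,
        name    TEXT,
        dept_id TEXT REFERENCES dept(dept_id)  -- foreign key, may be NULL
    )""")
conn.execute("INSERT INTO dept VALUES ('D1', 'Accounts')")
conn.execute("INSERT INTO emp VALUES ('E1', 'Ann', 'D1')")  # matching key: accepted
conn.execute("INSERT INTO emp VALUES ('E2', 'Bob', NULL)")  # null foreign key: accepted

def rejected(sql: str) -> bool:
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Entity integrity: no null and no duplicate primary key values.
no_null_key = rejected("INSERT INTO dept VALUES (NULL, 'Nameless')")
no_duplicate = rejected("INSERT INTO dept VALUES ('D1', 'Copy')")
# Referential integrity: no unmatched foreign key, no deleting a referenced row.
no_orphan = rejected("INSERT INTO emp VALUES ('E3', 'Cy', 'D9')")
no_bad_delete = rejected("DELETE FROM dept WHERE dept_id = 'D1'")
```

All four flags come out true: the engine refuses each statement that would break one of the two rules, while the two legal employee rows are stored.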

ERD Exercises

Exercise 1
Man is married to wife
Manager manages employee
Lecturer teaches student
Student studies course

Exercise 2
An artist belongs to a band. The artist can make a CD if he wishes. A CD contains one or more tracks on it.

Exercise 3
Consider a construction firm: Employees belong to a particular department. Each employee is employed to perform a particular task. The system should capture the employee’s name, TRN, address and other contact information. The system should capture information about the task performed by an employee such as description, date assigned, deadline date, and hourly rate. The system should also capture information about the department such as department name, location, and supervisor.

Exercise 4
A Sales Rep serves none, one or more customers at a time. A customer can place as many orders as he would like to. An order lists one or many products. Products that are available are stored in the company warehouse.

Exercise 5
A company has several departments. Each department has a supervisor and at least one employee. Employees must be assigned to at least one, but possibly more departments. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number.

Exercise 6
A Metropolitan Bus Company owns a number of buses. Each bus is allocated to a particular route, although some routes may have several buses. Each route passes through a number of towns. One or more drivers are allocated to each stage of a route, which corresponds to a journey through some or all of the towns on a route. Some of the towns have a garage where buses are kept, and each bus is identified by the registration number and can carry different numbers of passengers, since the vehicles vary in size and can be single or double-decked. Each route is identified by a route number and information is available on the average number of passengers carried per day for each route. Drivers have an employee number, name, address, and sometimes a telephone number.
[Entities: Bus, Route, Stage, Driver, Town, Garage. Relationships: bus-route - is serviced by / route-stage - comprises / driver-stage - is allocated / stage-town - passes-through / route-town - passes-through / garage-town - is situated / garage-bus - is garaged]
[Research Martin & Chen]

Functional Dependencies

A functional dependency allows us to use the value of one attribute and predict the value of another attribute.

Definition - Let R(A1, A2, …, An) be a relational schema (i.e. a relation/table with attributes A1 etc.), and let X and Y be subsets of {A1, A2, …, An}. We say that X functionally determines Y (or Y is functionally dependent on X), written as X --> Y, if for each value of X there exists exactly one value of Y. If X is the primary key (or a candidate key) then all attributes Y of relation R must be functionally dependent on X.
NB - we allow for the case where X and Y are composite.

Example
SUPPLIERS (name, address, item, price)
Here are 2 FDs
• name --> address. Given a particular value of name there exists precisely one corresponding value for address.
• name + item --> price.

Definition - Let F be the set of functional dependencies for relation R, and let X --> Y be a given functional dependency. Then F logically implies X --> Y (written F |= X --> Y) if every relation (instance r of R) that satisfies the dependencies in F also satisfies X --> Y.

Example
{ A --> B, B --> C } |= A --> C.

Computation of Closures

Definition - The closure of F, F+, is the set of FDs that are logically implied by F, that is, F+ = {X --> Y : F |= X --> Y}. If we have a set of FDs, then its closure is the set of all FDs that it implies. Closure can be used to find keys of a relation.

Definition - Consider the relational schema R(A1, A2, …, An) and the set of FDs F, and let X be a subset of {A1, A2, …, An}. Then X is a (candidate) key if:
(a) X --> {A1, A2, …, An} is in F+, and
(b) X is a minimal key, that is, for no proper subset Y ⊂ X is Y --> {A1, A2, …, An} in F+.

Example

Let R(A, B, C) and F = {A --> B, B --> C}. What is the key?
F |= A --> C if every A gives a value C. If F |= A --> C then it follows therefore that A --> B, A --> C, A --> A, that is, A --> {A, B, C}. Since A is a single attribute, it has no proper subsets. Hence A is a key.

Algorithm for finding the closure of a set of attributes
Given a set of attributes U, a set of FDs F, and a set X ⊆ U. To find X+, the closure of X.
Method
1. X(0) is X
2. X(i+1) is X(i) plus (i.e. unioned with) the set of attributes A such that there is some dependency Y --> Z in F, such that A is in Z, and Y ⊆ X(i). (i.e. X(i) --> Z, so form X(i) U Z)
Note: We will eventually reach i such that X(i) = X(i+1). There is then no need to compute beyond X(i) once we discover that X(i) = X(i+1). Also the process terminates if X(i) = U. If X(i) = U then X determines every attribute of R; X is a key provided that no proper subset of X also has closure U.

Example
Given relation R (city, st, zip) and nontrivial FDs city + st --> zip and zip --> city. To show that city + st is a key:
Let X = {city, st}. Using the above algorithm, we have X(0) = X = {city, st}. We now look for dependencies of the form city --> Q1, st --> Q2, or city, st --> Q3 (city and st are subsets of X(0)). If all 3 exist, then X(1) = X(0) U Q1 U Q2 U Q3. There is one such dependency, namely, city, st --> zip. Hence, X(1) = X(0) U zip = {city, st} U zip = {city, st, zip} = U. Hence X+ = U. Neither city alone nor st alone has closure U, so {city, st} is a key.

Closure Exercises
Exercise 1: By computing its closure, show that (i) st, zip is a key, (ii) city, zip is not a key.
Exercise 2: Given Supplier (name, address, item, price) and F = {name --> address, name + item --> price}. Show that name + item is a key. Show if address, price is a key.
Exercise 3: Given R(name, job, dept) and job, dept -> name and name -> dept. Determine if job, dept or dept, name or job, name is a key.
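The closure method can be sketched as a short function. The representation is an assumption made for illustration: each FD is a pair of Python sets (left-hand side, right-hand side), and the function name `closure` is not from these notes. The data is the R(city, st, zip) example worked above.

```python
def closure(attrs, fds):
    """Return X+, the set of attributes determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)                      # X(0) is X
    changed = True
    while changed:                           # stop once X(i+1) equals X(i)
        changed = False
        for lhs, rhs in fds:
            # Y --> Z applies when Y is contained in X(i) and adds something new.
            if lhs.issubset(result) and not rhs.issubset(result):
                result |= rhs                # X(i+1) = X(i) U Z
                changed = True
    return result

U = {"city", "st", "zip"}
F = [({"city", "st"}, {"zip"}), ({"zip"}, {"city"})]

city_st_plus = closure({"city", "st"}, F)    # reaches U, as in the worked example
zip_plus = closure({"zip"}, F)               # zip alone only reaches city
```

The same function can be used to check answers to the closure exercises by comparing the returned set with U.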

Armstrong’s Axioms

These are rules used to determine/generate dependencies from other dependencies. Given a relational scheme R, a set of attributes U and a set of functional dependencies F, the axioms are as follows:

Reflexivity
If Y ⊆ X ⊆ U, then X --> Y is logically implied by F. e.g. if X = name + item and Y = item (i.e. Y ⊆ X) then name + item --> item.

Augmentation
If X --> Y holds, and Z ⊆ U, then XZ --> YZ. e.g. if item --> price then item + name --> price + name

Transitivity
If X --> Y and Y --> Z, then X --> Z.

Note: Armstrong’s axioms are sound and complete. They are sound because they do not generate any incorrect dependencies. They are complete because all FDs implied by F can be derived from F using the axioms.

Examples
Given the relation R (city, st, zip) and nontrivial FDs
city + st --> zip
zip --> city
to show that both city + st and st + zip are keys for R.
(a) zip --> city (given)
(b) zip st --> city st (augmentation using (a))
(c) city st --> zip (given)
(d) city st --> city st zip (augmentation using (c))
(e) st zip --> city st zip (transitivity using (b) and (d)).
Hence from (d) and (e) both city st and st zip are keys for R. [Both determine all fields]

EXERCISE
Given R(TRN, Name, Addr, Age, Year) and FDs
TRN, Addr -> Age
Age -> Year
Use Armstrong’s Axioms to come up with other FDs.
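The derivation (a) to (e) can be replayed mechanically. In this sketch an FD is modelled as a pair of frozensets; the helper names `fd`, `augment` and `transitive` are illustrative assumptions, and `transitive` applies the rule in its strict form (the right-hand side of the first FD must equal the left-hand side of the second).

```python
from typing import FrozenSet, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def augment(dep: FD, extra: str) -> FD:
    """Augmentation: from X --> Y derive XZ --> YZ."""
    lhs, rhs = dep
    z = frozenset(extra.split())
    return lhs | z, rhs | z

def transitive(first: FD, second: FD) -> FD:
    """Transitivity: from X --> Y and Y --> Z derive X --> Z."""
    if first[1] != second[0]:
        raise ValueError("right side of first FD must equal left side of second")
    return first[0], second[1]

a = fd("zip", "city")            # (a) given
b = augment(a, "st")             # (b) zip st --> city st
c = fd("city st", "zip")         # (c) given
d = augment(c, "city st")        # (d) city st --> city st zip
e = transitive(b, d)             # (e) st zip --> city st zip
```

Because sets ignore repeated attributes, augmenting (c) with city st gives exactly the right-hand side city st zip shown in step (d).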

Covers and their role in determining redundant FDs

If F and G are sets of dependencies, then F is equivalent to G if F+ = G+. In that case we say that F covers G (and G covers F). To test whether F and G are equivalent, we must show that every dependency in F is in G+ and that every dependency in G is in F+.

We say that a set of dependencies F is a minimal cover, if:
1. Every right hand side of a dependency in F is a single attribute.
2. For no X --> A in F is the set F - {X --> A} equivalent to F. That is, no dependency in F is redundant.
3. For no X --> A in F and proper subset Z of X is F - {X --> A} U {Z --> A} equivalent to F. That is, no attribute on the l.h.s. of any FD in F is redundant.

Note: Every set of dependencies F is equivalent to a set Fm that is minimal. In designing databases we ensure that the set of functional dependencies for a given schema is minimal, that is, that there are no redundant dependencies.

Example
Consider the set F = {A-->B, B-->A, B-->C, A-->C, C-->A}. A minimal cover, Fm, found by eliminating the dependencies B-->A and A-->C, is given by Fm = {A-->B, B-->C, C-->A}.

Algorithm to find redundant FDs.
If any r.h.s. has more than 1 attribute, then split it.
1. Choose an FD, say X-->Y, and remove it from the set of FDs
2. result = X;
   while (result changes and Y is not contained in result) do
      for each FD, A-->B, remaining in the reduced set of FDs
         if A is a subset of result then result = result U B
   end
3. if Y is a subset of result then FD X-->Y is redundant.

Exercises - Find the redundant FDs in the following sets:
a) Colour -> Density, Density -> Elasticity, Colour -> Elasticity
b) Lime --> Melon, Lime --> Naseberry, Lime Melon --> Naseberry, Naseberry Melon --> Orange
c) Name -> Addr, name -> price, Name, item -> price
d) Name -> id, name, id -> dept, Id -> name, Id -> dept
e) Flight# -> destination, flight# -> origin, origin, destination -> flight#, Origin, destination, arrival time -> flight#
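The redundancy test can be sketched as follows, using the example set F = {A-->B, B-->A, B-->C, A-->C, C-->A}. The function names and the pair-of-frozensets representation are assumptions made for illustration. FDs are examined in the order listed and removed one at a time; a different order can produce a different, equally valid, reduced set.

```python
def closure(attrs, fds):
    """Attributes reachable from `attrs` using the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs.issubset(result) and not rhs.issubset(result):
                result |= rhs
                changed = True
    return result

def is_redundant(dep, fds):
    """X --> Y is redundant if Y is still derivable from X without it."""
    lhs, rhs = dep
    remaining = [other for other in fds if other != dep]
    return rhs.issubset(closure(lhs, remaining))

def drop_redundant(fds):
    """Remove redundant FDs one at a time, testing against what is left."""
    kept = list(fds)
    for dep in fds:
        if is_redundant(dep, kept):
            kept.remove(dep)
    return kept

def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)

F = [fd("A", "B"), fd("B", "A"), fd("B", "C"), fd("A", "C"), fd("C", "A")]
Fm = drop_redundant(F)
```

For this ordering the function removes B-->A and A-->C, which matches the minimal cover given in the example.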

1st, 2nd, 3rd Normal Forms

Definition - Normalization is a process of obtaining stable groupings of attributes into relations.

Normalization is the process of eliminating data redundancies and data anomalies from table structures by applying various rules called normal forms. It is a process of decomposing a table into smaller, simpler tables. Normalization organizes a database into one of several forms to remove ambiguous relationships between data and minimize data redundancy. In addition to being simpler and more stable, normalized tables are more easily maintained.

Note: When you break down a table into simpler tables always ensure that there is a common field that you will be able to use to join the tables back together for queries.

The normalization process starts with unnormalized relations, where at least one value is not atomic (that is, it can be decomposed further). In zero normal form (0NF), the database is completely non-normalized/unnormalized, and all of the data fields are included in one relation or table. The table has large rows due to the repeating groups and wastes disk space.

Example
S#    PQ
      P#    QTY
S1    P1    300
      P2    200
      P3    400
      P4    200
S2    P1    300
      P2    400
S3    P2    200

The field PQ can be broken down into P# (part number) and QTY (quantity). Other examples include:
• NAME can be broken down into FIRSTNAME, MIDDLENAME, LASTNAME.
• ADDRESS can be broken down into STREET#, STREET, CITY, ZIPCODE etc.

Definition - An attribute of relation R is a prime (sometimes called key) attribute if it participates in a key.
Example: If A + B + C is a key for a relation R, then attributes A, B, and C are prime attributes.

Definition - A relation is in first normal form (1NF) if every attribute is a simple (atomic) attribute. Put another way, a table or relation is in first normal form (1NF) if every field is a simple (atomic) field. A simple, atomic field is one that cannot be broken down further. A table is also in first normal form (1NF) if it contains no repeating groups.
Note: Every normalized table is in 1NF.

1NF violations cause data redundancy, which may lead to data inconsistencies, wastage of space, poor data integrity, data anomalies etc. To convert to 1NF you should break down fields to their simplest and remove any repeating groups into another table.

Example: We can convert the above to 1NF as follows:
S#    P#    QTY
S1    P1    300
S1    P2    200
S1    P3    400
S1    P4    200
S2    P1    300
S2    P2    400
S3    P2    200

Definition - A relation is in second normal form (2NF) if: a) it is in 1NF and b) it has no partial dependencies of nonprime (nonkey) attributes on keys. That is, every nonprime attribute is fully dependent on the primary key. [Primary key --> all attributes.]
[NB. An attribute of relation R is a prime (sometimes called key) attribute if it participates in a key. Example: If A + B + C is a key for a table R, then attributes A, B, and C are prime attributes.]

Example
Cars (model, cylinder#, origin, tax, fee). Key is model + cylinder#, and FD is model --> origin. Model and cylinder# are prime; origin, tax, fee are non prime. Table Cars is not in 2NF because origin is non prime and not fully dependent on model and cylinder# (i.e. key).

Definition - A relation is in 3rd normal form (3NF) if: a) it is in 2NF and b) it has no transitive dependencies of nonprime attributes on keys (i.e. A --> B, B --> C means A --> C).

Example
employee (emp#, dept, location). Suppose emp# is a key. emp# --> dept, dept --> location, emp# --> location. Employee is not in 3NF because there is a transitive dependency of location on the key.

In order to convert to 3NF we need to remove or break up the transitive dependencies. Example: if X --> Y and Y --> Z then X and Y will remain on one table with X being the key, and Y and Z would be on the other table with Y being the key. All fields dependent on X would be on one table and all fields dependent on Y would be on another table. Y will be the common field that will be used to join the tables for the running of queries. From the Employee table above, we would therefore place emp# and dept on one table (key emp#) and dept and location on another table (key dept).
[Research 4NF, BCNF]

Comprehensive example (1NF to 3NF)

1NF
S#    Status    City      P#    Qty
S1    20        LONDON    P1    300
S1    20        LONDON    P2    200
S1    20        LONDON    P3    400
S1    20        LONDON    P4    200
S1    20        LONDON    P5    100
S1    20        LONDON    P6    100
S2    10        PARIS     P1    300
S2    10        PARIS     P2    400
S3    10        PARIS     P2    200
S4    20        LONDON    P2    200
S4    20        LONDON    P4    300
S4    20        LONDON    P5    400

S# + P# is a key. S# --> city, S# --> status, city --> status

Problems
• We cannot insert the fact that a supplier is in a given city until he supplies at least one part
• Deletion of a row for a given supplier destroys additional info
• Redundancy can result in long searches and inconsistency (if we change one row we have to make the same change in another). Example: Suppose supplier S1 changes status to 30, then all 6 rows would have to be modified

2NF
To change to 2NF we ensure that everything is fully dependent on the key. Only fields fully dependent on the key should remain in the original table. Fields that are not fully dependent on the key should be moved to a separate table. The fields status and city should therefore be placed in their own table, and the key for that table is the field that they are functionally dependent on (which is S#).
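Whether a dependency is contradicted by the sample data can be checked directly. The sketch below loads the 1NF supplier table from the comprehensive example; the function name `holds` and the tuple layout are illustrative assumptions. Note that sample rows can only refute an FD: a True result means the rows are consistent with it, not that it is guaranteed by the business rules.

```python
COLUMNS = ("S#", "Status", "City", "P#", "Qty")
ROWS = [
    ("S1", 20, "LONDON", "P1", 300), ("S1", 20, "LONDON", "P2", 200),
    ("S1", 20, "LONDON", "P3", 400), ("S1", 20, "LONDON", "P4", 200),
    ("S1", 20, "LONDON", "P5", 100), ("S1", 20, "LONDON", "P6", 100),
    ("S2", 10, "PARIS", "P1", 300), ("S2", 10, "PARIS", "P2", 400),
    ("S3", 10, "PARIS", "P2", 200), ("S4", 20, "LONDON", "P2", 200),
    ("S4", 20, "LONDON", "P4", 300), ("S4", 20, "LONDON", "P5", 400),
]

def holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """True if each combination of `lhs` values maps to one `rhs` value."""
    index = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        key = tuple(row[index[c]] for c in lhs)
        value = tuple(row[index[c]] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False          # same lhs value, different rhs value
    return True

partial_city = holds(["S#"], ["City"])          # depends on part of the key
partial_status = holds(["S#"], ["Status"])      # depends on part of the key
full_qty = holds(["S#", "P#"], ["Qty"])         # needs the whole key
part_alone = holds(["P#"], ["Qty"])             # P# alone does not fix Qty
```

City and Status are fixed by S# alone, which is the partial dependency that 2NF removes, while Qty needs both S# and P#.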

S#    P#    Qty          S#    Status    City
S1    P1    300          S1    20        LONDON
S1    P2    200          S2    10        PARIS
S1    P3    400          S3    10        PARIS
S1    P4    200          S4    20        LONDON
S1    P5    100          S5    30        ATHENS
S1    P6    100
S2    P1    300
S2    P2    400
S3    P2    200
S4    P2    200
S4    P4    300
S4    P5    400

Note - We can now enter the fact that supplier S5 is located in Athens.

Problems
• We cannot enter the fact that a given city has a given status until a supplier is located in that city.
• If we delete the only row for a city we destroy the fact that a city has a given status value.
• Status value occurs many times. Hence search and consistency problems.

3NF
Remove transitive dependencies S# --> city and city --> status

S#    P#    Qty          S#    City          City      Status
S1    P1    300          S1    LONDON        ATHENS    30
S1    P2    200          S2    PARIS         LONDON    20
S1    P3    400          S3    PARIS         PARIS     10
S1    P4    200          S4    LONDON
S1    P5    100          S5    ATHENS
S1    P6    100
S2    P1    300
S2    P2    400
S3    P2    200
S4    P2    200
S4    P4    300
S4    P5    400
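One way to convince yourself that the 3NF tables lose nothing is to split the original 1NF rows and join them back. This is a sketch under stated assumptions: the variable names are illustrative, and plain Python sets and dictionaries stand in for tables.

```python
ROWS = [  # (S#, Status, City, P#, Qty) from the 1NF table
    ("S1", 20, "LONDON", "P1", 300), ("S1", 20, "LONDON", "P2", 200),
    ("S1", 20, "LONDON", "P3", 400), ("S1", 20, "LONDON", "P4", 200),
    ("S1", 20, "LONDON", "P5", 100), ("S1", 20, "LONDON", "P6", 100),
    ("S2", 10, "PARIS", "P1", 300), ("S2", 10, "PARIS", "P2", 400),
    ("S3", 10, "PARIS", "P2", 200), ("S4", 20, "LONDON", "P2", 200),
    ("S4", 20, "LONDON", "P4", 300), ("S4", 20, "LONDON", "P5", 400),
]

# Decompose into the three 3NF tables (sets remove repeated rows).
shipment = {(s, p, q) for s, _status, _city, p, q in ROWS}
supplier_city = {(s, city) for s, _status, city, _p, _q in ROWS}
city_status = {(city, status) for _s, status, city, _p, _q in ROWS}

# Join back using the common fields S# and City.
city_of = dict(supplier_city)
status_of = dict(city_status)
rejoined = {(s, status_of[city_of[s]], city_of[s], p, q) for s, p, q in shipment}
lossless = rejoined == set(ROWS)

# Facts that could not be stored before can now be added independently.
supplier_city.add(("S5", "ATHENS"))   # a supplier with no shipments yet
city_status.add(("ATHENS", 30))       # a city status with no dependent rows
```

The status value for each city is now stored once, so a change of status is a single-row update.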

Another example of the process.

Consider the Order table.

Order Table
Order #  Order Date  Product #  Product Name      Qty Ordered  Vendor #  Vendor Name  Product #  Product Name    Qty Ordered  Vendor #  Vendor Name
1001     6/8/2004    605        White Copy Paper  2            321       Hammermill   203        CD Jewel Cases  5            110       Fellowes
1002     6/10/2004   751        Ballpoint pens    6            166       Pilot
1003     6/10/2004   321        Ring Binder       12           450       Globe
1004     6/11/2004   605        White Copy Paper  2            321       Hammermill   102        File Folders    2            450       Globe

How do you identify repeating groups? For every order there will only be one order number and date. For a particular order we will have more than one product number, product name, quantity ordered etc. This is because you are able to order more than one thing. Other items will be repeated. Thus Product # to Vendor Name is shown twice to facilitate two products.

Primary keys are underlined to distinguish them from other fields. Repeating groups are listed in parentheses (part a). The table has large rows due to the repeating groups and wastes disk space when an order has only one item.

To normalize the data from 0NF to 1NF (first normal form), you remove the repeating groups (fields 3 through 7 and 8 through 12) and place them in a second table (part b). You then assign a primary key to the second table (Line Item), by combining the primary key of the nonrepeating group (Order #) with the primary key of the repeating group (Product #).

To further normalize the database from 1NF to 2NF (second normal form), you remove partial dependencies. A partial dependency exists when fields in the table depend on only part of the primary key. In the Line Item table (part b), Product Name is dependent on Product #, which is only part of the primary key. Second normal form requires you to place the product information in a separate Product table to remove the partial dependency (part c).

To move from 2NF to 3NF (third normal form), you remove transitive dependencies. A transitive dependency exists when a nonprimary key field depends on another nonprimary key field. As shown in part c, Vendor Name is dependent on Vendor #, both of which are nonprimary key fields. If Vendor Name is left in the Line Item table, the database will store redundant data each time a product is ordered from the same vendor. Third normal form requires Vendor Name to be placed in a separate Vendor table, with Vendor # as the primary key. The field that is the primary key in the new table (in this case, Vendor #) also remains in the original table as a foreign key and is identified by a dotted underline (part d).

In 3NF, the database now is well organized into four separate tables and is easier to maintain. For instance, to add, delete, or change a Vendor or Product Name, you make the change in just one table.

Order
Order #    Order Date
1001       6/8/2004
1002       6/10/2004
1003       6/10/2004
1004       6/11/2004

Line Item
Order #    Product #    Qty Ordered    Vendor #
1001       605          2              321
1001       203          5              110
1002       751          6              166
1003       321          12             450
1004       605          2              321
1004       102          2              450

Product
Product #    Product Name
102          File Folders
203          CD Jewel Cases
321          Ring Binder
605          White Copy Paper
751          Ballpoint pens

Vendor
Vendor #    Vendor Name
110         Fellowes
166         Pilot
321         Hammermill
450         Globe

a) Zero Normal Form (0NF)
(Order #, Order Date, (Product #, Product Name, Quantity Ordered, Vendor #, Vendor Name))

b) First Normal Form (1NF)
Order (Order #, Order Date)
Line Item (Order # + Product #, Product Name, Quantity Ordered, Vendor #, Vendor Name)

c) Second Normal Form (2NF)
Order (Order #, Order Date)
Line Item (Order # + Product #, Quantity Ordered, Vendor #, Vendor Name)
Product (Product #, Product Name)

d) Third Normal Form (3NF)
Order (Order #, Order Date)
Line Item (Order # + Product #, Quantity Ordered, Vendor #)
Product (Product #, Product Name)
Vendor (Vendor #, Vendor Name)
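The four 3NF tables in part d can be built and queried with the SQLite engine bundled with Python. This is a sketch, not a prescribed implementation: the table and column names are assumptions (the table is called `orders` because ORDER is a reserved word in SQL), and the new vendor name used in the update is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_no   INTEGER PRIMARY KEY, order_date   TEXT);
    CREATE TABLE product (product_no INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE vendor  (vendor_no  INTEGER PRIMARY KEY, vendor_name  TEXT);
    CREATE TABLE line_item (
        order_no    INTEGER REFERENCES orders(order_no),
        product_no  INTEGER REFERENCES product(product_no),
        qty_ordered INTEGER,
        vendor_no   INTEGER REFERENCES vendor(vendor_no),
        PRIMARY KEY (order_no, product_no)   -- Order # + Product #
    );
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1001, "6/8/2004"), (1002, "6/10/2004"),
    (1003, "6/10/2004"), (1004, "6/11/2004")])
conn.executemany("INSERT INTO product VALUES (?, ?)", [
    (102, "File Folders"), (203, "CD Jewel Cases"), (321, "Ring Binder"),
    (605, "White Copy Paper"), (751, "Ballpoint pens")])
conn.executemany("INSERT INTO vendor VALUES (?, ?)", [
    (110, "Fellowes"), (166, "Pilot"), (321, "Hammermill"), (450, "Globe")])
conn.executemany("INSERT INTO line_item VALUES (?, ?, ?, ?)", [
    (1001, 605, 2, 321), (1001, 203, 5, 110), (1002, 751, 6, 166),
    (1003, 321, 12, 450), (1004, 605, 2, 321), (1004, 102, 2, 450)])

# Rebuild the original view of one order by joining on the common fields.
QUERY = """
    SELECT o.order_no, p.product_name, l.qty_ordered, v.vendor_name
    FROM line_item l
    JOIN orders  o ON o.order_no   = l.order_no
    JOIN product p ON p.product_no = l.product_no
    JOIN vendor  v ON v.vendor_no  = l.vendor_no
    WHERE o.order_no = ?
    ORDER BY p.product_no
"""
order_1004 = conn.execute(QUERY, (1004,)).fetchall()

# A vendor name is changed in one row only (hypothetical new name).
conn.execute("UPDATE vendor SET vendor_name = 'Hammermill Paper' WHERE vendor_no = 321")
renamed_lines = conn.execute("""
    SELECT COUNT(*) FROM line_item l
    JOIN vendor v ON v.vendor_no = l.vendor_no
    WHERE v.vendor_name = 'Hammermill Paper'
""").fetchone()[0]
```

The single UPDATE is reflected in every line item supplied by that vendor, which is the maintenance benefit the text describes.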

Normalization Exercises to 3NF.

Exercise 1 - PatientDrug Table Structure

PatientID  Patient Name    Drug        Trade Name     Formulation  Size   Dose  Frequency        Side Effect
9876765    Brown, Ann      Triceptan   Tegretol       Tablets      100mg  30mg  Once a day       Stomach Cramps
                           Hatceptan   Smithcline     Capsules     200mg  30mg  Once a day
7654433    Dunn, Karen     Tavegyl     Antihistamine  Liquid       200ml  10ml  Twice a day      Headache
9876567    Green, Bob      Clidets     Cyomistin      Ointment     100ml  2ml   Every two hours  Kidney damage
8768888    Harris, Oscar   Ventolin    Inhalador      Gas          20oz   1oz   Once a day
                           Panadeine   PanadolET      Tablets      100mg  5mg   Twice a day      Indigestion
9877771    Jones, Mary     Panadeine   Panadol ET     Tablets      100mg  5mg   Twice a day      Indigestion
6512334    Allen, Kay      Tavegyl     Antihistamine  Liquid       200ml  10ml  Twice a day      Headache

The key is PatientID & Drug
The FDs are:
PatientID --> PatientName
PatientID, Drug --> Dose
PatientID, Drug --> Frequency
Drug --> TradeName
Drug --> Formulation
Drug --> Size
Drug --> SideEffect
Size --> SideEffect

Exercise 2 - Sales Table

Salesman#  Salesman Name     Sales Area  Customer#  Customer Name   Warehouse#  Warehouse Location  Sales Amount
3462       Walters, Kevin    West        18765      Delta Services  4           Fargo               13,540
3462       Walters, Kevin    West        18830      Levy & Sons     3           Bismarck            10,600
4578       Allen, Ian        East        32112      Johnsons        5           Goshen              14,800
1111       Matthews, Joan    West        98787      Facey           4           Fargo               45,890
1111       Matthews, Joan    West        98799      Webster’s Inc   7           Portsmouth          34,877
6765       Brown, Johnathan  North       87889      Taino Limited   2           Ferry               40,000

The key is Salesman# and Customer#
The FDs are:
Salesman# --> Salesman Name
Salesman# --> Sales Area
Customer# --> Customer Name
Customer# --> Warehouse#
Customer# --> Warehouse Location
Warehouse# --> Warehouse Location
Salesman#, Customer# --> Sales Amount

Assessment of file layouts as they affect the functioning of a database.

Physical and logical data organization.

It is important to evaluate the performance characteristics of the physical model before implementing the database. The performance parameters normally used are the space estimates and time estimates. Both of these parameters are predictable. The database designer should therefore try to optimize the physical model for space and time considerations. Once the database is installed it is difficult or impossible to redesign it.

Note trade-offs between space and time - I/O can be reduced if some redundant data is carried, whereas not having redundant data can save space but cost more time. E.g. 1 file vs 2 files etc.
E.g. Id#, Name, address, subject, grade in one file
E.g. Id#, Name, address in one file; id#, subject, grade in another file

Logical
• Simplicity is important.
• Logical organization must be stable so that programs do not have to be rewritten.
• Data independence is of prime importance. (This gives the DBA the freedom to change both the physical and logical aspects of the database system without disturbing the applications built on the database.)
• Application program requests correspond to the logical data structure.
• Efficient use of storage is of little concern.
• High level of redundancy often exists between logical files.
• Means of finding/addressing data does not have a major effect on logical structures.

Physical
• Complex organizations may be important. Software hides the complexity.
• Physical layout may be changeable, designed for periodic reorganization.
• Data independence is of little concern if facilities are provided for restructuring the physical data.
• Application program requests are usually unrelated to data storage. The program does not care about the physical layout of data.
• Efficient use of storage is of major concern.
• Elimination of redundancy is an objective of physical organization.
• Methods of locating data depend on how data is physically laid out. Addressing techniques have a major effect on physical storage layout.

UNIT III: INTRODUCTION TO RELATIONAL ALGEBRA AND SQL

The languages used in database systems

Some databases have their own computer languages associated with them, which allow the user to access and retrieve data. Other databases are only accessed via third generation languages.

A 4GL (4th generation language) is a product that aids the development of new systems. They are called 4th generation because they work at a higher level than normal high level languages such as COBOL or Pascal. Most 4GLs make use of relational databases, which themselves have query languages which perform operations at a very high level. Some 4GLs are actually the combination of a database query language and other facilities.

Features of a 4GL
• Define data
• Define what processing must be performed on the data
• Define report or screen format
• Define input data and validation checks
• Handle user queries

The role of Relational DMLs and DDLs.

Data descriptions must be standardized; for this reason a Data Description Language (DDL) is provided which must be used to specify the data in the database. Similarly, a Data Manipulation Language (DML) is provided which must be used to access the data. The combination of the DDL and DML is often called a Data Sub-Language (DSL) or a query language.

Data Definition Language - The DDL is that portion of the DBMS which allows us to create and modify the structure of the database and the database tables. The functions of a DDL may therefore include:
Creating Database structures
Creating table structures
Associating fields with table structures
Associating data types with field structures etc.

Data Manipulation Language - The DML is that portion of the DBMS which allows us to store, modify, and retrieve data from the database. There are two types of DMLs: procedural DML and the nonprocedural DML.
• Procedural DMLs require that the user specify the data that is needed from the database and how to obtain it. Procedural DMLs are more difficult to use since they require that the user be proficient in using the language commands to manipulate the structure and the contents of the data file. On the other hand they are more flexible since they allow the user to determine the method that is used for accessing and manipulating the structure and contents of a file.

• Nonprocedural DMLs require that the user specify the data that is needed from the database, but they do not allow the user to tell how to obtain it. Nonprocedural DMLs are easier to use since they do not require a detailed knowledge of the language commands which are needed to manipulate the structure and the contents of a data file. On the other hand they lack flexibility since the programmer has no way of determining the method for accessing and manipulating the contents of the data file.

Please note that it is the nonprocedural DML of a 4th Generation Language that allows it to exhibit structural and data independence.

Query Language
The implementation of a query language is vital for a DBMS. The query language allows the end user to generate ad hoc queries, which are immediately answered. In most languages the DML and the query language are one and the same. Today, many DBMS also provide support for a standardized query language that may be different from the DML of the language. This is known as the Structured Query Language (SQL).

The difference between relational algebra and relational calculus
Query languages can roughly be divided into two types:
• Relational algebra - allows the user to explicitly describe how to find the answer to the query. It uses specific operators to apply to tables. The operators are join, union, set difference, selection and projection.
• Relational calculus - queries describe a desired set of tuples by specifying a predicate the tuples must satisfy. The user describes the answer but does not give the algorithm for finding it. It provides a notation for formulating the definition of the desired relation.

Relational algebra

Relational Algebra is:
• the formal description of how a relational database operates
• an interface to the data stored in the database itself
• the mathematics which underpin SQL operations

This section uses the sample tables below along with others to demonstrate how to solve relational algebra problems.

    A               B
    x  y  z         x  y  z
    a  b  c         b  e  d
    d  a  f         d  a  f
    c  b  d

R (attributes x, y, z) and S (attributes x, z) are used for the division example.

Simple projection
Πx,y (A) = Produces output showing only certain attributes (x, y) of table A.

Selection
σ x = 7 (A) = Produces a subset of rows that match/satisfy a criterion (field x = 7).

Please note that projection and selection can be combined:
Πx,y ( σ x = 7 (A) ) OR σ x = 7 ( Πx,y (A) )

Union
For relations with the same arity (number of attributes).
A ∪ B = all rows appearing in A or in B (or in both), without repeating duplicates.
    a b c
    d a f
    c b d
    b e d

Intersection
A ∩ B = Builds a relation consisting of all tuples appearing in both files.
    d a f

Difference (or Set Difference)
A - B = rows in A but not in B
    a b c
    c b d

Renaming
A rename is a unary operation written as ρa / b(R) where the result is identical to R except that the b field in all tuples is renamed to an a field. This is simply used to rename the attribute of a relation or the relation itself.

Division
Takes 2 relations, one binary, one unary, and builds a relation consisting of all values of one attribute of the binary relation that match (in the other attribute) all values in the unary relation.
R divided by S by matching x to x and z to z. Answer = a from other field.
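The set operators above can be tried out with ordinary Python sets. This is only a sketch: it assumes A holds the rows abc, daf, cbd and B holds the rows bed, daf (which is what the union, intersection and difference results listed in the notes imply), and the helper names `select` and `project` are made up for illustration.

```python
# Relations A and B held as sets of (x, y, z) tuples (contents assumed, see above).
A = {("a", "b", "c"), ("d", "a", "f"), ("c", "b", "d")}
B = {("b", "e", "d"), ("d", "a", "f")}

union = A | B            # rows in A or in B, duplicates removed
intersection = A & B     # rows in both A and B
difference = A - B       # rows in A but not in B

def select(rel, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return {row for row in rel if predicate(row)}

def project(rel, *positions):
    """Projection: keep only the listed column positions, dropping duplicate rows."""
    return {tuple(row[i] for i in positions) for row in rel}

# Selection followed by projection: rows of A with y = 'b', showing only x and y.
xy_where_y_is_b = project(select(A, lambda row: row[1] == "b"), 0, 1)

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(sorted(xy_where_y_is_b))
```

Because a set cannot hold the same tuple twice, the "without repeating duplicates" rule of the union comes for free.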

Another Example of Division

Join (natural, equi, inner, outer)
A ⋈ B = Builds a relation consisting of all possible concatenated pairs of tuples, one from each of the 2 files, such that each pair satisfies the join condition.
• Natural join - don't repeat the common field.
• Opposite of natural join is the equi-join, which keeps both copies of the common field.
• θ-join - using conditions.
• Outer Join - include rows in table A with no match. There are three forms of the outer join, depending on which data is to be kept.
   o LEFT OUTER JOIN - keep data from the left-hand table
   o RIGHT OUTER JOIN - keep data from the right-hand table
   o FULL OUTER JOIN - keep data from both tables
• Opposite of the outer join is the regular (inner) join, which keeps only the rows that have a match.
• The semi-join is similar to the natural join and is written as R ⋉ S where R and S are relations. The result of the semi-join is only the set of all tuples in R for which there is a tuple in S that is equal on their common attribute names.
• The antijoin, written as R ▷ S where R and S are relations, is similar to the natural join, but the result of an antijoin is only those tuples in R for which there is NOT a tuple in S that is equal on their common attribute names.
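The join variants described above can be sketched in a few lines of Python. The two relations and all helper names here are hypothetical and exist only for illustration; each row is a dictionary keyed by attribute name.

```python
# Hypothetical relations: each row is a dict keyed by attribute name.
EMP = [
    {"name": "Harry", "dept": "Finance"},
    {"name": "Sally", "dept": "Sales"},
    {"name": "Tim", "dept": "Executive"},
]
DEPT = [
    {"dept": "Finance", "manager": "George"},
    {"dept": "Sales", "manager": "Harriet"},
    {"dept": "Production", "manager": "Charles"},
]

def common(r, s):
    """Attribute names shared by two non-empty relations."""
    return set(r[0]) & set(s[0])

def matches(row_r, row_s, attrs):
    """True when the two rows agree on every common attribute."""
    return all(row_r[a] == row_s[a] for a in attrs)

def natural_join(r, s):
    attrs = common(r, s)
    # Merging the two dicts keeps the common attribute only once.
    return [{**x, **y} for x in r for y in s if matches(x, y, attrs)]

def semijoin(r, s):
    attrs = common(r, s)
    return [x for x in r if any(matches(x, y, attrs) for y in s)]

def antijoin(r, s):
    attrs = common(r, s)
    return [x for x in r if not any(matches(x, y, attrs) for y in s)]

def left_outer_join(r, s):
    # None plays the role of NULL for the attributes that only s has.
    pad = {a: None for a in s[0] if a not in r[0]}
    return natural_join(r, s) + [{**x, **pad} for x in antijoin(r, s)]

print(natural_join(EMP, DEPT))
print(semijoin(EMP, DEPT))
print(antijoin(EMP, DEPT))
print(left_outer_join(EMP, DEPT))
```

Note how the left outer join is simply the natural join plus the antijoin rows padded with nulls.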

Example of Natural Join

Example of θ-join

Consider tables Car and Boat which list models of cars and boats and their respective prices. Suppose a customer wants to buy a car and a boat, but she doesn't want to spend more money for the boat than for the car. The θ-join on the relation CarPrice ≥ BoatPrice produces a table with all the possible options.
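The Car and Boat example can be sketched as a filter over the Cartesian product. The model names and prices below are invented for illustration and are not the figures from the original example.

```python
from itertools import product

# Hypothetical models and prices, as (model, price) tuples.
CAR = [("CarA", 20000), ("CarB", 30000), ("CarC", 50000)]
BOAT = [("Boat1", 10000), ("Boat2", 40000), ("Boat3", 60000)]

def theta_join(r, s, theta):
    """Keep the concatenated pairs from the Cartesian product that satisfy theta."""
    return [x + y for x, y in product(r, s) if theta(x, y)]

# CarPrice >= BoatPrice
options = theta_join(CAR, BOAT, lambda car, boat: car[1] >= boat[1])

for row in options:
    print(row)
```

Leaving the condition out (a theta that is always true) gives the plain Cartesian product of the two tables.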

Example of a semijoin

Example of Left Outer Join

Example of Right Outer Join

Example of Full Outer Join

Example of an antijoin


Cartesian product
The Cartesian Product is also an operator which works on two sets. It is sometimes called the CROSS PRODUCT or CROSS JOIN. It combines the tuples of one relation with all the tuples of the other relation.

Cartesian Product Example


Relational Algebra Exercises

Exercise 1

Key Club Table
    IdNumber  Firstname  Lastname  Age  Sex
    452145    John       Jones     18   M
    785475    Heather    Coombs    22   F
    745874    Michelle   Gentles   20   F
    745888    Keith      Smith     25   M
    888999    Ingrid     Harris    30   F

Student Council Table
    IdNumber  Firstname  Lastname  Age  Sex
    785475    Heather    Coombs    22   F
    745874    Michelle   Gentles   20   F
    362121    Philip     Cameron   19   M

Math Grades Table
    IdNumber  Grade
    452145    56
    785475    99
    745874    82
    745888    65
    888999    70

Scholarship Grades Table
    Grade
    99
    82

i)   Key Club ∪ Student Council
ii)  Key Club ∩ Student Council
iii) Key Club - Student Council
iv)  ∏ Firstname, Age (Key Club)
v)   ∏ Firstname, Lastname (σ Age < 21 (Student Council))
vi)  Math Grades ÷ Scholarship Grades

Exercise 2 - Dec 2001 Past Paper Question 5 [20 marks]
Given the files below, give the results for the relational algebra.

ICEPStudents
    Idnumber  Name            ClassCode
    5         Karen Henry     2S
    9         Crystal Adobe   CSS
    16        Donna Building  1D

ComputerStudents
    Idnumber  Name            ClassCode
    4         Ellen Albright  MIS
    9         Crystal Adobe   CSS
    22        Peter Rock      CSO

Classes
    ClassCode  ClassName
    CSS        Cert in Computing
    1D         Year 1 Comp Major
    3D         Year 3 Comp Major

FinalYearClasses
    ClassCode
    3D

a) ICEPStudents ∪ ComputerStudents
b) ICEPStudents ∩ ComputerStudents
c) ICEPStudents – ComputerStudents
d) Classes ÷ FinalYearClasses
e) Π Name, ClassCode (ComputerStudents)
f) σ Idnumber > 6 (ICEPStudents)
g) Π Name, ClassCode (σ Idnumber > 6 (ComputerStudents))
h) σ Idnumber > 6 (Π Name (ICEPStudents))
i) ICEPStudents ⋈ Classes (Equi, Natural)
j) ICEPStudents ⋈ Classes (Outer, Natural)

Exercise 3

Employees
    EMPNO  NAME   JOBNO
    111    Adams  34
    234    Henry  23
    456    Gregg  23
    121    Brown  78

Retired Employees
    EMPNO  NAME    JOBNO
    456    Gregg   23
    789    Jones   12
    369    Wilson  56

Jobs
    JOBNO  JOBTITLE
    12     Mason
    23     Carpenter
    34     Plumber

Insured Jobs
    JOBNO
    23

a) Π NAME, JOBNO (Retired Employees) [3 marks]
b) Π JOBNO, EMPNO (σ JOBNO > 30 (Employees)) [3 marks]
c) Employees ∪ Retired Employees [3 marks]
d) Retired Employees – Employees [3 marks]
e) Employees ∩ Retired Employees [3 marks]
f) σ EMPNO > 200 (Employees) [3 marks]
g) Jobs ÷ Insured Jobs [3 marks]
h) Employees ⋈ Jobs (Outer, Regular) [4 marks]

Exercise 4
a) Which relational algebra operation is unary?
b) If a Cartesian product is done from one table to itself, how would you prevent duplicate field names?

SQL Commands – LAB PORTION

What is SQL?
Abbreviation of structured query language, and pronounced either see-kwell or as separate letters. SQL is a standardized query language for requesting information from a database. The original version called SEQUEL (structured English query language) was designed by an IBM research center in 1974 and 1975. SQL was first introduced as a commercial database system in 1979 by Oracle Corporation.

Historically, SQL has been the favorite query language for database management systems running on minicomputers and mainframes. Increasingly, however, SQL is being supported by PC database systems because it supports distributed databases (databases that are spread out over several computer systems). This enables several users on a local-area network to access the same database simultaneously.

Although there are different dialects of SQL, it is nevertheless the closest thing to a standard query language that currently exists. In 1986, ANSI approved a rudimentary version of SQL as the official standard, but most versions of SQL since then have included many extensions to the ANSI standard. In 1991, ANSI updated the standard. The new standard is known as SAG SQL.

Please note that SQL command syntax varies slightly from one DBMS to the other.
Please note that even though SQL is done in the lab, you are required to know the syntax by heart for the written final exam.

Oracle command
• At command line type CONNECT
• User Name SYSTEM
• Password ADMIN

MySQL command
• Start, Run, cmd <enter>
• mysql -u gcampbell -p -h exedvhost1
• Pwd gcampbell [the server's IP address can be used instead of the host name]

Brief Summary of Commands

1. Data Manipulation

Projection and Selection
    SELECT [field(s)]
    FROM [file(s)]
    WHERE [condition]
    GROUP BY [field]
    HAVING [condition]
    ORDER BY [field(s)]

[fields]
    *                            all fields
    field1, field2, …, fieldn
    count(*)
    count(distinct dept)
    sum(salary)                  also avg, min, max
    substr(field, 1, 4)
    distinct(dept)
    amount * 10

[files] join
    SELECT a.field, b.field
    OR
    SELECT file1.field, file2.field

    WHERE a.field = b.field

Union
    SELECT stmt 1 UNION ALL SELECT stmt 2

Modification
    UPDATE file SET field1 = value, field2 = field2 + 20 WHERE [condition]
    DELETE FROM file WHERE [condition]
    INSERT INTO file VALUES (x, y, z)
    INSERT INTO file SELECT stmt …

WHERE Clause
    Field IN ('A', 'B', 'C')
    Dept LIKE ('A%')
    Dept [NOT] LIKE ('E_')
    Dept between 'A' and 'C'
    Salary < 200 OR/AND sex = 'F'      (>, >=, =, <>, <=)

ORDER BY Clause
    ORDER BY name DESC, age ASC
    OR
    ORDER BY 2      (i.e. second field)

HAVING Clause
Used with a GROUP BY. Sets conditions for summary (grouped) data.
    E.g. HAVING count(*) > 3

2. Data Definition
    CREATE TABLE file (field1 CHAR(5) NOT NULL, field2 INT, field3 DEC(5,2))
    CREATE [UNIQUE] INDEX indexname ON file (field1 ASC, field2 DESC)
    CREATE VIEW viewname (field1, field2, field3) AS SELECT stmt …
    ALTER TABLE file ADD field CHAR(5)
    DROP TABLE file
    DROP INDEX indexname ON tablename
    DROP VIEW viewname

Control
    GRANT SELECT ON file TO PUBLIC
    REVOKE SELECT ON file FROM PUBLIC
    COMMIT
    ROLLBACK

MySQL data types
    Auto_increment
    Char
    Boolean
    Date
    Dec/Decimal
    Double
    Double precision
    Float
    Int/Integer
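The summary lists GROUP BY and HAVING without showing them together, so here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for Oracle or MySQL so the statement can be run anywhere; the staff table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.executescript("""
    CREATE TABLE staff (name CHAR(20), dept CHAR(3), salary DEC(7,2));
    INSERT INTO staff VALUES ('Ann', 'ACC', 300), ('Bob', 'ACC', 500),
                             ('Cy',  'MIS', 400), ('Dee', 'MIS', 200),
                             ('Eve', 'MIS', 900), ('Fay', 'HR',  250);
""")

# WHERE filters the rows first, GROUP BY forms the groups, HAVING filters
# the groups, and ORDER BY sorts whatever is left.
rows = conn.execute("""
    SELECT dept, COUNT(*), SUM(salary)
    FROM staff
    WHERE salary >= 250
    GROUP BY dept
    HAVING COUNT(*) > 1
    ORDER BY dept
""").fetchall()

print(rows)
```

Dee is removed by the WHERE clause before grouping, and the HR group is removed by HAVING because only one row remains in it.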

CREATE TABLE (using constraints – primary key, foreign key)

The SQL command for creating an empty table has the following form:

    create table <table> (
        <column 1> <data type> [not null] [unique] [<column constraint>],
        . . .
        <column n> <data type> [not null] [unique] [<column constraint>],
        [<table constraint(s)>]
    );

For each column, a name and a data type must be specified and the column name must be unique within the table definition. Column definitions are separated by commas. There is no difference between names in lower case letters and names in upper case letters. In fact, the only place where upper and lower case letters matter is in string comparisons.

A not null constraint is directly specified after the data type of the column and the constraint requires defined attribute values for that column, different from null.

The keyword unique specifies that no two records can have the same attribute value for this column. Unless the condition not null is also specified for this column, the attribute value null is allowed and two tuples having the attribute value null for this column do not violate the constraint.

Example: The create table statement for the EMP table has the form

    create table EMP (
        EMPNO    number(4) not null,
        ENAME    varchar2(30) not null,
        JOB      varchar2(10),
        MGR      number(4),
        HIREDATE date,
        SAL      number(7,2),
        DEPTNO   number(2)
    );

NB: Except for the columns EMPNO and ENAME null values are allowed.

Oracle offers the following basic data types:
• char(n): Fixed-length character data (string), n characters long. The maximum size for n is 255 bytes (2000 in Oracle8). Note that a string of type char is always padded on the right with blanks to the full length of n (this can be memory consuming). Example: char(40)
• varchar2(n): Variable-length character string. The maximum size for n is 2000 (4000 in Oracle8). Only the bytes used for a string require storage. Example: varchar2(80)
• number(o, d): Numeric data type for integers and reals. o = overall number of digits, d = number of digits to the right of the decimal point. Maximum values: o = 38, d = −84 to +127. Examples: number(8), number(5,2). Note that, e.g., number(5,2) cannot contain anything larger than 999.99 without resulting in an error. Data types derived from number are int[eger], dec[imal], smallint and real.
• date: Date data type for storing date and time. The default format for a date is DD-MMM-YY. Examples: '13-OCT-94', '07-JAN-98'
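The behaviour of not null and unique can be checked with a short script. This is a sketch only: it assumes Python's built-in sqlite3 module in place of Oracle (SQLite accepts the Oracle type names but does not enforce their sizes), and the EMAIL column and the helper `rejected` are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table EMP (
        EMPNO number(4) not null,
        ENAME varchar2(30) not null,
        EMAIL varchar2(40) unique      -- hypothetical column, added to show unique
    )
""")
conn.execute("insert into EMP values (7369, 'SMITH', 'smith@example.com')")

def rejected(sql):
    """Return True if the database refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A null in a not null column is refused.
null_name_rejected = rejected("insert into EMP values (7499, null, 'a@example.com')")
# A repeated value in a unique column is refused.
duplicate_rejected = rejected("insert into EMP values (7521, 'WARD', 'smith@example.com')")
# Two nulls in a unique column do not violate the constraint.
first_null_rejected = rejected("insert into EMP values (7566, 'JONES', null)")
second_null_rejected = rejected("insert into EMP values (7654, 'MARTIN', null)")

print(null_name_rejected, duplicate_rejected, first_null_rejected, second_null_rejected)
```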

• long: Character data up to a length of 2GB. Only one long column is allowed per table.

It should be noted that data types vary from one database to another.

The definition of a table may include the specification of integrity constraints. Basically two types of constraints are provided: column constraints are associated with a single column whereas table constraints are typically associated with more than one column. However, any column constraint can also be formulated as a table constraint.

The specification of a (simple) constraint has the following form:

    [constraint <name>] primary key | unique | not null

A constraint can be named. It is advisable to name a constraint in order to get more meaningful information when this constraint is violated due to, e.g., an insertion of a record that violates the constraint. If no name is specified for the constraint, Oracle automatically generates a name of the pattern SYS_C<number>.

The two most simple types of constraints have already been discussed: not null and unique. Probably the most important type of integrity constraints in a database are primary key constraints. A primary key constraint enables a unique identification of each record in a table. Based on a primary key, the database system ensures that no duplicates appear in a table. For example, for our EMP table in the example above, the specification

    create table EMP (
        EMPNO number(4) constraint pk_emp primary key,
        . . .
    );

defines the attribute EMPNO as the primary key for the table. Each value for the attribute EMPNO thus must appear only once in the table EMP. A table, of course, may only have one primary key. Note that in contrast to a unique constraint, null values are not allowed.

Example: We want to create a table called PROJECT to store information about projects. For each project, we want to store the number and the name of the project, the employee number of the project's manager, the budget and the number of persons working on the project, and the start date and end date of the project. Furthermore, we have the following conditions:
- a project is identified by its project number,
- the name of a project must be unique,
- the manager and the budget must be defined.

Table definition:

    create table PROJECT (
        PNO     number(3) constraint prj_pk primary key,
        PNAME   varchar2(60) unique,
        PMGR    number(4) not null,
        PERSONS number(5),

        BUDGET  number(8,2) not null,
        PSTART  date,
        PEND    date);

A unique constraint can include more than one attribute. In this case the pattern unique(<column i>, . . . , <column j>) is used. If it is required, for example, that no two projects have the same start and end date, we have to add the table constraint

    constraint no_same_dates unique(PEND, PSTART)

This constraint has to be defined in the create table command after both columns PEND and PSTART have been defined. A primary key constraint that includes more than only one column can be specified in an analogous way.

Instead of a not null constraint it is sometimes useful to specify a default value for an attribute if no value is given, e.g., when a tuple is inserted. For this, we use the default clause.

Example: If no start date is given when inserting a tuple into the table PROJECT, the project start date should be set to January 1st, 1995:

    PSTART date default('01-JAN-95')

Examples:

    Create table Employee (
        empno       int,
        empname     char(40),
        dateofbirth date,
        salary      number(6,2),
        deptcode    char(3),
        primary key (empno),
        constraint EmpC foreign key (deptcode) references DeptTable);

    CREATE TABLE SUPPLIERS (
        SNO    CHAR(5),
        SNAME  CHAR(20) NOT NULL,
        STATUS DEC(3),
        CITY   CHAR(15),
        PRIMARY KEY ( SNO ) )

    CREATE TABLE PARTS (
        PNO    CHAR(6),
        PNAME  CHAR(20),
        COLOR  CHAR(6),
        WEIGHT DEC(3),
        CITY   CHAR(15),
        PRIMARY KEY ( PNO ) )

    CREATE TABLE INVENTORY (
        SNO CHAR(5),
        PNO CHAR(6),
        QTY DEC(5),
        PRIMARY KEY ( SNO, PNO ),
        FOREIGN KEY ( SNO ) REFERENCES SUPPLIERS,
        CONSTRAINT FKC FOREIGN KEY ( PNO ) REFERENCES PARTS )

NB. FKC is the name of the constraint.

ALTER TABLE

It is possible to modify the structure of a table (the relation schema) even if records have already been inserted into this table. A column can be added using the alter table command

    alter table <table> add(<column> <data type> [default <value>] [<column constraint>]);

Example:

    Alter table Employees add column nisno char(6);
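The effect of the primary key, foreign key and default clauses can be seen by trying a few inserts. A minimal sketch, assuming Python's built-in sqlite3 module in place of Oracle; the default value of 10 for STATUS, the sample rows and the helper `rejected` are assumptions made for illustration, and the table definitions are cut down to the columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE SUPPLIERS (SNO CHAR(5), SNAME CHAR(20) NOT NULL,
                            STATUS DEC(3) DEFAULT 10,     -- default value is an assumption
                            PRIMARY KEY (SNO));
    CREATE TABLE PARTS (PNO CHAR(6), PNAME CHAR(20), PRIMARY KEY (PNO));
    CREATE TABLE INVENTORY (SNO CHAR(5), PNO CHAR(6), QTY DEC(5),
                            PRIMARY KEY (SNO, PNO),
                            FOREIGN KEY (SNO) REFERENCES SUPPLIERS (SNO),
                            CONSTRAINT FKC FOREIGN KEY (PNO) REFERENCES PARTS (PNO));
    INSERT INTO SUPPLIERS (SNO, SNAME) VALUES ('S1', 'Smith');
    INSERT INTO PARTS VALUES ('P1', 'Nut');
    INSERT INTO INVENTORY VALUES ('S1', 'P1', 300);
""")

def rejected(sql):
    """Return True if the database refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# STATUS was not supplied, so the default is stored.
status = conn.execute("SELECT STATUS FROM SUPPLIERS WHERE SNO = 'S1'").fetchone()[0]
# Part P9 does not exist in PARTS, so constraint FKC refuses the row.
unknown_part_rejected = rejected("INSERT INTO INVENTORY VALUES ('S1', 'P9', 50)")
# (S1, P1) is already present, so the two-column primary key refuses the row.
duplicate_key_rejected = rejected("INSERT INTO INVENTORY VALUES ('S1', 'P1', 75)")

print(status, unknown_part_rejected, duplicate_key_rejected)
```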

If more than only one column should be added at one time, respective add clauses need to be separated by commas. A table constraint can be added to a table using

    alter table <table> add (<table constraint>);

Note that a column constraint is a table constraint, too. not null and primary key constraints can only be added to a table if none of the specified columns contains a null value.

Table definitions can be modified in an analogous way. This is useful, e.g., when the size of strings that can be stored needs to be increased. The syntax of the command for modifying a column is

    alter table <table> modify(<column> [<data type>] [default <value>] [<column constraint>]);

Example:

    Alter table Employees modify lastname char(35);

[NB. Use alter instead of modify for some DBMS's]

A column can be removed using the following:

    Alter table <table> Drop column <column>;

Examples:

    Alter table Employees drop column Address3;
    ALTER TABLE SUPPLIERS ADD COLUMN STATE CHAR(15)
    ALTER TABLE SUPPLIERS DROP COLUMN CITY
    ALTER TABLE SUPPLIERS ADD TRN INT
    ALTER TABLE PARTS ADD DISCOUNT SMALLINT
    ALTER TABLE PARTS ALTER COLUMN COLOR CHAR(10)      [in SQL Server]
    ALTER TABLE PARTS MODIFY COLOR CHAR(10)            [in Oracle and MySQL]
    ALTER TABLE INVENTORY DROP CONSTRAINT FKC
    ALTER TABLE STUDENTS ADD CONSTRAINT FKC FOREIGN KEY (DEPTID) REFERENCES DEPARTMENTS

INSERT

The most simple way to insert a record into a table is to use the insert statement

    insert into <table> [(<column i, . . . , column j>)] values (<value i, . . . , value j>);

For each of the listed columns, a corresponding (matching) value must be specified. Therefore an insertion does not necessarily have to follow the order of the attributes as specified in the create table statement. If a column is omitted, the value null is inserted instead. If no column list is given, however, for each column as defined in the create table statement a value must be given.

Examples:

    insert into PROJECT(PNO, PNAME, PERSONS, BUDGET, PSTART)
    values(313, 'DBS', 4, 150000.42, '10-OCT-94');
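Adding a column to an existing table and inserting with a column list can be tried as follows. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module rather than Oracle, so the column is added with SQLite's `add column` form, the date is written as an ISO string, and the not null constraint on PMGR is left out so that the omitted column can be seen to receive null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table PROJECT (PNO number(3) primary key, PNAME varchar2(60) unique,
                          PMGR number(4), PERSONS number(5), BUDGET number(8,2));
    alter table PROJECT add column PSTART date;   -- SQLite adds one column per statement
    insert into PROJECT (PNO, PNAME, PERSONS, BUDGET, PSTART)
        values (313, 'DBS', 4, 150000.42, '1994-10-10');
""")

# PMGR was not in the column list, so it is stored as null (None in Python).
row = conn.execute("select PNO, PMGR, PSTART from PROJECT").fetchone()

# The second field of each table_info row is the column name.
columns = [info[1] for info in conn.execute("pragma table_info(PROJECT)")]

print(row)
print(columns)
```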

or

    insert into PROJECT values(313, 'DBS', 7411, null, 150000.42, '10-OCT-94', null);

If there are already some data in other tables, these data can be used for insertions into a new table. For this, we write a query whose result is a set of records to be inserted. Such an insert statement has the form

    insert into <table> [(<column i, . . . , column j>)] <query>

Example: Suppose we have defined the following table:

    create table OLDEMP (
        ENO   number(4) not null,
        HDATE date);

We now can use the table EMP to insert records into this new relation:

    insert into OLDEMP (ENO, HDATE)
    select EMPNO, HIREDATE from EMP
    where HIREDATE < '31-DEC-60';

SELECT (using WHERE, ORDER BY, GROUP BY, HAVING, aggregate functions, logical operators, comparison operators)

In order to retrieve the information stored in the database, the SQL query language is used. In SQL a query has the following (simplified) form (components in brackets [ ] are optional):

    select [distinct] <column(s)>
    from <table>
    [ where <condition> ]
    [ order by <column(s) [asc|desc]> ]

Selecting Columns
The columns to be selected from a table are specified after the keyword select. This operation is also called projection. For example, the query

    select LOC, DEPTNO from DEPT;

lists only the number and the location for each tuple from the relation DEPT. If all columns should be selected, the asterisk symbol "*" can be used to denote all attributes. The query

    select * from EMP;

retrieves all records with all columns from the table EMP. Instead of an attribute name, the select clause may also contain arithmetic expressions involving arithmetic operators etc.

    select ENAME, DEPTNO, SAL * 1.55 from EMP;

For the different data types supported in Oracle, several operators and functions are provided:
• for numbers: abs, cos, sin, exp, log, power, mod, sqrt, +, −, *, /, . . .
• for strings: chr, concat(string1, string2), lower, upper, replace(string, search string, replacement string), translate, substr(string, m, n), length, to_date, . . .
• for the date data type: add_months, months_between, next_day, to_char, . . .

Consider the query

    select DEPTNO from EMP;

which retrieves the department number for each record. Typically, some numbers will appear more than only once in the query result, that is, duplicate result records are not automatically eliminated. Inserting the keyword distinct after the keyword select, however, forces the elimination of duplicates from the query result.

It is also possible to specify a sorting order in which the result records of a query are displayed. For this the order by clause is used, which has one or more attributes listed in the select clause as parameter. desc specifies a descending order and asc specifies an ascending order (this is also the default order). For example, the query

    select ENAME, DEPTNO, HIREDATE from EMP
    order by DEPTNO [asc], HIREDATE desc;

displays the result in an ascending order by the attribute DEPTNO. If two records have the same attribute value for DEPTNO, the sorting criterion is a descending order by the attribute values of HIREDATE. For the above query, we would get the following output:

    ENAME  DEPTNO  HIREDATE
    FORD   10      03-DEC-81
    SMITH  20      17-DEC-80
    BLAKE  30      01-MAY-81
    WARD   30      22-FEB-81
    ALLEN  30      20-FEB-81

Selection of Records
Up to now we have only focused on selecting (some) attributes of all records from a table. If one is interested in records that satisfy certain conditions, the where clause is used. In a where clause simple conditions based on comparison operators can be combined using the logical connectives and, or, and not to form complex conditions. Conditions may also include pattern matching operations and even subqueries.
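The distinct keyword and the order by example can be reproduced with the five employees from the sample output. A minimal sketch, assuming Python's built-in sqlite3 module in place of Oracle; the hire dates are stored as ISO strings (an assumption, since SQLite has no Oracle-style date type) so that they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table EMP (ENAME varchar2(30), DEPTNO number(2), HIREDATE date);
    insert into EMP values ('ALLEN', 30, '1981-02-20'), ('WARD', 30, '1981-02-22'),
                           ('SMITH', 20, '1980-12-17'), ('BLAKE', 30, '1981-05-01'),
                           ('FORD', 10, '1981-12-03');
""")

# Without distinct, department 30 appears once per employee.
with_duplicates = [r[0] for r in conn.execute("select DEPTNO from EMP")]
without_duplicates = [r[0] for r in conn.execute(
    "select distinct DEPTNO from EMP order by DEPTNO")]

# Ascending by department, then newest hire first within a department.
ordered = [r[0] for r in conn.execute(
    "select ENAME from EMP order by DEPTNO asc, HIREDATE desc")]

print(with_duplicates, without_duplicates, ordered)
```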

Example: List the job title and the salary of those employees whose manager has the number 7698 or 7566 and who earn more than 1500:

    select JOB, SAL from EMP
    where (MGR = 7698 or MGR = 7566) and SAL > 1500;

For all data types, the comparison operators =, != or <>, <, >, <=, >= are allowed in the conditions of a where clause.

Further comparison operators are:
• Set Conditions: <column> [not] in (<list of values>)
  Example: select * from DEPT where DEPTNO in (20,30);
• Null value: <column> is [not] null, i.e., for a tuple to be selected there must (not) exist a defined value for this column.
  Example: select * from EMP where MGR is not null;
  Note: the operations = null and != null are not defined!
• Domain conditions: <column> [not] between <lower bound> and <upper bound>
  Examples:
  • select EMPNO, ENAME, SAL from EMP where SAL between 1500 and 2500;
  • select ENAME from EMP where HIREDATE between '02-APR-81' and '08-SEP-81';

String Operations
In order to compare an attribute with a string, it is required to surround the string by apostrophes, e.g., where LOCATION = 'DALLAS'. A powerful operator for pattern matching is the like operator. Together with this operator, two special characters are used: the percent sign % (also called wild card), and the underline _ , also called position marker. For example, if one is interested in all records of the table DEPT that contain two Cs in the name of the department, the condition would be where DNAME like '%C%C%'. The percent sign means that any (sub)string is allowed there, even the empty string. In contrast, the underline stands for exactly one character. Thus the condition where DNAME like '%C_C%' would require that exactly one character appears between the two Cs. To test for inequality, the not clause is used.

Further string operations are:
• upper(<string>) takes a string and converts any letters in it to uppercase, e.g., DNAME = upper(DNAME) (The name of a department must consist only of upper case letters.)
• lower(<string>) converts any letter to lowercase.
• initcap(<string>) converts the initial letter of every word in <string> to uppercase.
• length(<string>) returns the length of the string.
• substr(<string>, n [, m]) clips out an m character piece of <string>, starting at position n. If m is not specified, the end of the string is assumed. substr('DATABASE SYSTEMS', 10, 7) returns the string 'SYSTEMS'.

Aggregate Functions
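The null test, between, like and substr can be checked side by side. A minimal sketch, assuming Python's built-in sqlite3 module in place of Oracle; the rows are invented for illustration, 'COCOA' is a made-up department name chosen because it has exactly one character between two Cs, and the helper `names` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table EMP (ENAME varchar2(30), MGR number(4), SAL number(7,2));
    insert into EMP values ('KING', null, 5000), ('BLAKE', 7839, 2850),
                           ('ALLEN', 7698, 1600), ('WARD', 7698, 1250);
    create table DEPT (DNAME varchar2(30));
    insert into DEPT values ('ACCOUNTING'), ('RESEARCH'), ('COCOA');
""")

def names(sql):
    """Run a one-column query and return its values in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))

has_manager = names("select ENAME from EMP where MGR is not null")
equals_null = names("select ENAME from EMP where MGR = null")    # never true for any row
mid_salary = names("select ENAME from EMP where SAL between 1500 and 2500")
two_cs = names("select DNAME from DEPT where DNAME like '%C%C%'")
c_any_c = names("select DNAME from DEPT where DNAME like '%C_C%'")
piece = conn.execute("select substr('DATABASE SYSTEMS', 10, 7)").fetchone()[0]

print(has_manager, equals_null, mid_salary, two_cs, c_any_c, piece)
```

The empty result for `MGR = null` is the reason the notes insist on `is null` and `is not null`.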

Aggregate functions are statistical functions such as count, min, max etc. They are used to compute a single value from a set of attribute values of a column:
• count Counting rows
  Example: How many records are stored in the relation EMP?
      select count(*) from EMP;
  Example: How many different job titles are stored in the relation EMP?
      select count(distinct JOB) from EMP;
• max Maximum value for a column
• min Minimum value for a column
  Example: List the minimum and maximum salary.
      select min(SAL), max(SAL) from EMP;
  Example: Compute the difference between the minimum and maximum salary.
      select max(SAL) - min(SAL) from EMP;
• sum Computes the sum of values (only applicable to the data type number)
  Example: Sum of all salaries of employees working in the department 30.
      select sum(SAL) from EMP where DEPTNO = 30;
• avg Computes average value for a column (only applicable to the data type number)

Note: avg, min and max ignore tuples that have a null value for the specified attribute, but count(*) counts every row, including rows that contain null values. count(<column>) counts only the rows where that column is not null.

Joining Tables
Thus far we have only focused on queries that refer to exactly one table. Furthermore, conditions in a where clause were restricted to simple comparisons. A major feature of relational databases, however, is to combine (join) records stored in different tables in order to display more meaningful and complete information. In SQL the select statement is used for this kind of queries joining relations:

    select [distinct] [<alias ak>.]<column i>, . . . , [<alias al>.]<column j>
    from <table 1> [<alias a1>], . . . , <table n> [<alias an>]
    [where <condition>]

The specification of table aliases in the from clause is necessary to refer to columns that have the same name in different tables. For example, the column DEPTNO occurs in both EMP and DEPT. If we want to refer to either of these columns in the where or select clause, a table alias has to be specified and put in front of the column name. Instead of a table alias also the complete relation name can be put in front of the column, such as DEPT.DEPTNO, but this sometimes can lead to rather lengthy query formulations.
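How the aggregate functions treat null values is easiest to see on a column that contains some. A minimal sketch, assuming Python's built-in sqlite3 module in place of Oracle; the COMM (commission) column and all rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table EMP (ENAME varchar2(30), JOB varchar2(10),
                      SAL number(7,2), COMM number(7,2));
    insert into EMP values ('ALLEN', 'SALESMAN', 1600, 300),
                           ('WARD',  'SALESMAN', 1250, 500),
                           ('BLAKE', 'MANAGER',  2850, null),
                           ('SCOTT', 'ANALYST',  3000, null);
""")

# count(*) counts rows, count(COMM) skips the nulls, and avg(COMM) averages
# only the two commissions that are actually defined.
count_rows, count_comm, jobs, avg_comm, spread = conn.execute("""
    select count(*), count(COMM), count(distinct JOB), avg(COMM), max(SAL) - min(SAL)
    from EMP
""").fetchone()

print(count_rows, count_comm, jobs, avg_comm, spread)
```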

Comparisons in the where clause are used to combine rows from the tables listed in the from clause.

Example: In the table EMP only the numbers of the departments are stored, not their name. For each salesman, we now want to retrieve the name as well as the number and the name of the department where he is working:

    select ENAME, E.DEPTNO, DNAME
    from EMP E, DEPT D
    where E.DEPTNO = D.DEPTNO and JOB = 'SALESMAN';

Any number of tables can be combined in a select statement.

Example: For each project, retrieve its name, the name of its manager, and the name of the department where the manager is working:

    select ENAME, DNAME, PNAME
    from EMP E, DEPT D, PROJECT P
    where E.EMPNO = P.PMGR and D.DEPTNO = E.DEPTNO;

It is even possible to join a table with itself:

Example: List the names of all employees together with the name of their manager:

    select E1.ENAME, E2.ENAME
    from EMP E1, EMP E2
    where E1.MGR = E2.EMPNO;

Explanation: The join columns are MGR for the table E1 and EMPNO for the table E2. The equijoin comparison is E1.MGR = E2.EMPNO.

SELECT sub queries

Up to now we have only concentrated on simple comparison conditions in a where clause, i.e., we have compared a column with a constant or we have compared two columns. As we have already seen for the insert statement, queries can be used for assignments to columns. A query result can also be used in a condition of a where clause. In such a case the query is called a subquery and the complete select statement is called a nested query. A respective condition in the where clause then can have one of the following forms:

1. Set-valued subqueries
       <expression> [not] in (<subquery>)
       <expression> <comparison operator> [any|all] (<subquery>)
   An <expression> can either be a column or a computed value.
2. Test for (non)existence
       [not] exists (<subquery>)
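The two-table join and the self-join from the examples above can be run on a handful of rows. A minimal sketch, assuming Python's built-in sqlite3 module in place of Oracle; the three employees and two departments are sample data chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table DEPT (DEPTNO number(2), DNAME varchar2(30));
    create table EMP (EMPNO number(4), ENAME varchar2(30), JOB varchar2(10),
                      MGR number(4), DEPTNO number(2));
    insert into DEPT values (10, 'ACCOUNTING'), (30, 'SALES');
    insert into EMP values (7839, 'KING',  'PRESIDENT', null, 10),
                           (7698, 'BLAKE', 'MANAGER',   7839, 30),
                           (7499, 'ALLEN', 'SALESMAN',  7698, 30);
""")

# DEPTNO exists in both tables, so it must be qualified with an alias.
salesmen = conn.execute("""
    select ENAME, E.DEPTNO, DNAME
    from EMP E, DEPT D
    where E.DEPTNO = D.DEPTNO and JOB = 'SALESMAN'
""").fetchall()

# Self-join: E1 is the employee, E2 is the same table read as "the manager".
managers = conn.execute("""
    select E1.ENAME, E2.ENAME
    from EMP E1, EMP E2
    where E1.MGR = E2.EMPNO
    order by E1.ENAME
""").fetchall()

print(salesmen)
print(managers)
```

KING does not appear in the second result because his MGR is null and therefore matches no EMPNO.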

In a where clause conditions using subqueries can be combined arbitrarily by using the logical connectives and and or.

Example: List the name and salary of employees of the department 20 who are leading a project that started before December 31, 1990:

    select ENAME, SAL from EMP
    where EMPNO in (select PMGR from PROJECT where PSTART < '31-DEC-90')
    and DEPTNO = 20;

Explanation: The subquery retrieves the set of those employees who manage a project that started before December 31, 1990. If the employee working in department 20 is contained in this set (in operator), this record belongs to the query result set.

Example: List all employees who are working in a department located in BOSTON:

    select * from EMP
    where DEPTNO in (select DEPTNO from DEPT where LOC = 'BOSTON');

The subquery retrieves only one value (the number of the department located in Boston). Thus it is possible to use "=" instead of in. As long as the result of a subquery is not known in advance, i.e., whether it is a single value or a set, it is advisable to use the in operator.

A subquery may use again a subquery in its where clause. Thus conditions can be nested arbitrarily. An important class of subqueries are those that refer to their surrounding (sub)query and the tables listed in the from clause, respectively. Queries of this type are called correlated subqueries.

Example: List all those employees who are working in the same department as their manager (note that components in [ ] are optional):

    select * from EMP E1
    where DEPTNO in (select DEPTNO from EMP [E] where [E.]EMPNO = E1.MGR);

Explanation: The subquery in this example is related to its surrounding query since it refers to the column E1.MGR. A record is selected from the table EMP (E1) for the query result if the value for the column DEPTNO occurs in the set of values selected in the subquery. One can think of the evaluation of this query as follows: For each tuple in the table E1, the subquery is evaluated individually. If the condition where DEPTNO in ... evaluates to true, this tuple is selected. Note that an alias for the table EMP in the subquery is not necessary since columns without a preceding alias listed there always refer to the innermost query and tables.
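The difference between an ordinary subquery and a correlated subquery can be seen by running both kinds side by side. This is a sketch under stated assumptions: SQLite (through Python's sqlite3 module) stands in for Oracle, and the four employees and two departments are invented sample data.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table DEPT (DEPTNO integer primary key, DNAME text, LOC text);
    create table EMP  (EMPNO integer primary key, ENAME text,
                       MGR integer, DEPTNO integer);
    insert into DEPT values (20, 'RESEARCH',   'DALLAS');
    insert into DEPT values (40, 'OPERATIONS', 'BOSTON');
    insert into EMP values (1, 'KING',  null, 20);
    insert into EMP values (2, 'JONES', 1,    20);
    insert into EMP values (3, 'ADAMS', 2,    40);
    insert into EMP values (4, 'FORD',  3,    40);
""")

# Uncorrelated subquery: evaluated once, independent of the outer row.
in_boston = [row[0] for row in conn.execute("""
    select ENAME from EMP
    where DEPTNO in (select DEPTNO from DEPT where LOC = 'BOSTON')
    order by ENAME
""")]

# Correlated subquery: refers to E1.MGR, so it is evaluated per outer row.
same_dept_as_manager = [row[0] for row in conn.execute("""
    select ENAME from EMP E1
    where DEPTNO in (select DEPTNO from EMP E where E.EMPNO = E1.MGR)
    order by ENAME
""")]

print(in_boston)
print(same_dept_as_manager)
```

ADAMS is left out of the second result because his manager JONES works in department 20 while ADAMS works in department 40; KING is left out because the subquery returns no rows for a null MGR.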

Conditions of the form <expression> <comparison operator> [any|all] <subquery> are used to compare a given <expression> with each value selected by <subquery>.

• For the clause any, the condition evaluates to true if there exists at least one row selected by the subquery for which the comparison holds. If the subquery yields an empty result set, the condition is not satisfied.
• For the clause all, in contrast, the condition evaluates to true if for all rows selected by the subquery the comparison holds. In this case the condition evaluates to true if the subquery does not yield any row or value.

Example: Retrieve all employees who are working in department 10 and who earn at least as much as any (i.e., at least one) employee working in department 30:

    select * from EMP
    where SAL >= any (select SAL from EMP where DEPTNO = 30)
    and DEPTNO = 10;

Note: Also in this subquery no aliases are necessary since the columns refer to the innermost from clause.

Example: List all employees who are not working in department 30 and who earn more than all employees working in department 30:

    select * from EMP
    where SAL > all (select SAL from EMP where DEPTNO = 30)
    and DEPTNO <> 30;

For all and any, the following equivalences hold:

    in      is equivalent to   = any
    not in  is equivalent to   <> all or != all

Often a query result depends on whether certain rows do (not) exist in (other) tables. Queries of this type are formulated using the exists operator.

Example: List all departments that have no employees:

    select * from DEPT
    where not exists (select * from EMP where DEPTNO = DEPT.DEPTNO);

Explanation: For each tuple from the table DEPT, the condition is checked whether there exists a record in the table EMP that has the same department number (DEPT.DEPTNO). In case no such record exists, the condition is satisfied for the tuple under consideration and it is selected. If there exists a corresponding record in the table EMP, the record is not selected.
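The exists test described above can be checked on a small data set. The sketch below assumes SQLite (via Python's sqlite3 module) in place of Oracle; SQLite does not support the any and all keywords, so only exists and not exists are shown. The table contents are made-up sample rows.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table DEPT (DEPTNO integer primary key, DNAME text);
    create table EMP  (EMPNO integer primary key, ENAME text, DEPTNO integer);
    insert into DEPT values (10, 'ACCOUNTING');
    insert into DEPT values (20, 'RESEARCH');
    insert into DEPT values (40, 'OPERATIONS');
    insert into EMP values (1, 'KING',  10);
    insert into EMP values (2, 'JONES', 20);
""")

# Departments for which the correlated subquery returns no row at all.
empty_depts = conn.execute("""
    select DEPTNO, DNAME from DEPT
    where not exists (select * from EMP where DEPTNO = DEPT.DEPTNO)
""").fetchall()

# The opposite test: departments with at least one employee.
staffed_depts = conn.execute("""
    select DEPTNO, DNAME from DEPT
    where exists (select * from EMP where DEPTNO = DEPT.DEPTNO)
    order by DEPTNO
""").fetchall()

print(empty_depts)
print(staffed_depts)
```

Inside the subquery the unqualified DEPTNO refers to the innermost table (EMP), while DEPT.DEPTNO refers to the row of the surrounding query that is currently being checked.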

Example: List workers who receive a higher rate than the average hourly rate.

    Select empname from employee
    Where hrly_rate > (select avg(hrly_rate) from employee);

Example: List workers who get an hourly rate higher than the average of those workers reporting to the worker's supervisor.

    Select a.name from worker a
    Where a.hrly_rate > (select avg(b.hrly_rate)
                         From worker b
                         Where b.supv_id = a.supv_id);

Operations on Result Sets

Sometimes it is useful to combine query results from two or more queries into a single result. SQL supports three set operators which have the pattern:

    <query 1> <set operator> <query 2>

The set operators are:
• union [all] returns a table consisting of all rows either appearing in the result of <query 1> or in the result of <query 2>. Duplicates are automatically eliminated unless the clause all is used.
• intersect returns all rows that appear in both results <query 1> and <query 2>.
• minus returns those rows that appear in the result of <query 1> but not in the result of <query 2>.

Example: Assume that we have a table EMP2 that has the same structure and columns as the table EMP:
• All employee numbers and names from both tables:
    select EMPNO, ENAME from EMP union select EMPNO, ENAME from EMP2;
• Employees who are listed in both EMP and EMP2:
    select * from EMP intersect select * from EMP2;
• Employees who are only listed in EMP:
    select * from EMP minus select * from EMP2;
  [NB. In other DBMS's use EXCEPT instead of MINUS]

Each operator requires that both tables have the same data types for the columns to which the operator is applied.

Grouping

In previous sections we have seen how aggregate functions can be used to compute a single value for a column. Often applications require grouping rows that have certain properties and then applying an aggregate function on one column for each group separately.
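The behaviour of the set operators can be compared directly on two small tables. This sketch assumes SQLite (through Python's sqlite3 module) instead of Oracle, so the difference operator is written except rather than minus, as the NB above anticipates. The helper name run and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table EMP  (EMPNO integer, ENAME text);
    create table EMP2 (EMPNO integer, ENAME text);
    insert into EMP  values (1, 'KING');
    insert into EMP  values (2, 'JONES');
    insert into EMP2 values (2, 'JONES');
    insert into EMP2 values (3, 'ADAMS');
""")

def run(set_operator):
    """Combine EMP and EMP2 with the given set operator (hypothetical helper)."""
    sql = (f"select EMPNO, ENAME from EMP {set_operator} "
           "select EMPNO, ENAME from EMP2 order by EMPNO")
    return conn.execute(sql).fetchall()

union_rows = run("union")          # duplicates eliminated
union_all_rows = run("union all")  # duplicates kept
in_both = run("intersect")         # rows present in both tables
only_in_emp = run("except")        # SQLite spelling of Oracle's minus

print(union_rows, union_all_rows, in_both, only_in_emp, sep="\n")
```

JONES is stored in both tables, so union returns three rows while union all returns four.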

For this, SQL provides the clause group by <group column(s)>. This clause appears after the where clause and must refer to columns of tables listed in the from clause.

    select <column(s)>
    from <table(s)>
    where <condition>
    group by <group column(s)>
    [having <group condition(s)>];

Those rows retrieved by the select clause that have the same value(s) for <group column(s)> are grouped. Aggregations specified in the select clause are then applied to each group separately. It is important that only those columns that appear in the <group column(s)> clause can be listed without an aggregate function in the select clause!

Example: For each department, we want to retrieve the minimum and maximum salary.

    select DEPTNO, min(SAL), max(SAL)
    from EMP
    group by DEPTNO;

Rows from the table EMP are grouped such that all rows in a group have the same department number. The aggregate functions are then applied to each such group. We thus get the following query result:

    DEPTNO   MIN(SAL)   MAX(SAL)
    10       1300       5000
    20       800        3000
    30       950        2850

Rows to form a group can be restricted in the where clause. For example, if we add the condition where JOB = 'CLERK', only respective rows build a group. The query then would retrieve the minimum and maximum salary of all clerks for each department. Note that it is not allowed to specify any other column than DEPTNO without an aggregate function in the select clause since this is the only column listed in the group by clause (it is also easy to see that other columns would not make any sense).

Once groups have been formed, certain groups can be eliminated based on their properties, e.g., if a group contains less than three rows. This type of condition is specified using the having clause. As for the select clause, also in a having clause only <group column(s)> and aggregations can be used.

Example: Retrieve the minimum and maximum salary of clerks for each department having more than three clerks.

    select DEPTNO, min(SAL), max(SAL)
    from EMP
    where JOB = 'CLERK'
    group by DEPTNO
    having count(*) > 3;
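The grouping queries above can be reproduced on a reduced EMP table. This is a sketch only: SQLite (via Python's sqlite3 module) replaces Oracle, the eight rows are an assumed subset chosen so that the first query returns the same minimum and maximum salaries as the result table shown above, and the having threshold is lowered from 3 to 1 because the sample is so small.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table EMP (ENAME text, JOB text, SAL integer, DEPTNO integer)")
conn.executemany("insert into EMP values (?, ?, ?, ?)", [
    ("CLARK",  "MANAGER",   2450, 10),
    ("KING",   "PRESIDENT", 5000, 10),
    ("MILLER", "CLERK",     1300, 10),
    ("SMITH",  "CLERK",      800, 20),
    ("ADAMS",  "CLERK",     1100, 20),
    ("FORD",   "ANALYST",   3000, 20),
    ("JAMES",  "CLERK",      950, 30),
    ("BLAKE",  "MANAGER",   2850, 30),
])

# One result row per department; aggregates are computed per group.
per_dept = conn.execute("""
    select DEPTNO, min(SAL), max(SAL)
    from EMP
    group by DEPTNO
    order by DEPTNO
""").fetchall()

# where filters rows before grouping; having filters the groups afterwards.
clerk_groups = conn.execute("""
    select DEPTNO, min(SAL), max(SAL)
    from EMP
    where JOB = 'CLERK'
    group by DEPTNO
    having count(*) > 1
    order by DEPTNO
""").fetchall()

print(per_dept)
print(clerk_groups)
```

Only department 20 has more than one clerk in this sample, so it is the only group that survives the having condition.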

Note that it is even possible to specify a subquery in a having clause. In the above query, for example, instead of the constant 3, a subquery can be specified.

A query containing a group by clause is processed in the following way:
1. Select all rows that satisfy the condition specified in the where clause.
2. From these rows form groups according to the group by clause.
3. Discard all groups that do not satisfy the condition in the having clause.
4. Apply aggregate functions to each group.
5. Retrieve values for the columns and aggregations listed in the select clause.

UPDATE

For modifying attribute values of (some) records in a table, we use the update statement:

    update <table> set
    <column i> = <expression i>, . . . , <column j> = <expression j>
    [where <condition>];

An expression consists of either a constant (new value), an arithmetic or string operation, or an SQL query. Note that the new value to assign to <column i> must have a matching data type.

An update statement without a where clause results in changing the respective attributes of all records in the specified table. Typically, however, only a (small) portion of the table requires an update.

Examples:
• The employee JONES is transferred to the department 20 as a manager and his salary is increased by 1000:
    update EMP set
    JOB = 'MANAGER', DEPTNO = 20, SAL = SAL + 1000
    where ENAME = 'JONES';
• All employees working in the departments 10 and 30 get a 15% salary increase.
    update EMP set
    SAL = SAL * 1.15
    where DEPTNO in (10, 30);

Analogous to the insert statement, other tables can be used to retrieve data that are used as new values. In such a case we have a <query> instead of an <expression>.

Example: All salesmen working in the department 20 get the same salary as the manager who has the lowest salary among all managers.

    update EMP set
    SAL = (select min(SAL) from EMP where JOB = 'MANAGER')
    where JOB = 'SALESMAN' and DEPTNO = 20;
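Two of the update statements above can be followed step by step on a four-row table. The sketch assumes SQLite (via Python's sqlite3 module) rather than Oracle, and the starting salaries, jobs and departments are invented so that the effect of each statement is easy to see.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table EMP (ENAME text, JOB text, SAL integer, DEPTNO integer)")
conn.executemany("insert into EMP values (?, ?, ?, ?)", [
    ("JONES", "CLERK",    1000, 10),
    ("BLAKE", "MANAGER",  2850, 30),
    ("CLARK", "MANAGER",  2450, 10),
    ("WARD",  "SALESMAN", 1250, 20),
])

# Several columns of one record are changed in a single statement.
conn.execute("""
    update EMP set
    JOB = 'MANAGER', DEPTNO = 20, SAL = SAL + 1000
    where ENAME = 'JONES'
""")

# The new value comes from a subquery: the lowest salary among all managers.
changed = conn.execute("""
    update EMP set
    SAL = (select min(SAL) from EMP where JOB = 'MANAGER')
    where JOB = 'SALESMAN' and DEPTNO = 20
""").rowcount

salaries = dict(conn.execute("select ENAME, SAL from EMP"))
print(changed, salaries)
```

After the first statement JONES is a manager earning 2000, which is the lowest manager salary, so the second statement sets WARD's salary to 2000 as well.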

Explanation: The query retrieves the minimum salary of all managers. This value then is assigned to all salesmen working in department 20.

It is also possible to specify a query that retrieves more than only one value (but still only one record!). In this case the set clause has the form set(<column i, . . . , column j>) = <query>. It is important that the order of data types and values of the selected row exactly correspond to the list of columns in the set clause.

DELETE

All or selected records can be deleted from a table using the delete statement:

    delete from <table> [where <condition>];

If the where clause is omitted, all records are deleted from the table. An alternative command for deleting all records from a table is the truncate table <table> command. However, in this case, the deletions cannot be undone.

Example: Delete all projects (tuples) that have been finished before the actual date (system date):

    delete from PROJECT where PEND < sysdate;

sysdate is a function in SQL that returns the system date. Another important SQL function is user, which returns the name of the user logged into the current Oracle session.

CREATE VIEW

NB. Not all DBMS's (e.g. MS-Access) have this command.

In Oracle the SQL command to create a view (virtual table) has the form

    create [or replace] view <view-name> [(<column(s)>)] as
    <select-statement> [with check option [constraint <name>]];

The optional clause or replace re-creates the view if it already exists. <column(s)> names the columns of the view. If <column(s)> is not specified in the view definition, the columns of the view get the same names as the attributes listed in the select statement (if possible).

Example: The following view contains the name, job title and the annual salary of employees working in the department 20:

    Create view DEPT20 as
    select ENAME, JOB, SAL * 12 ANNUAL_SALARY from EMP
    where DEPTNO = 20;

In the select statement the column alias ANNUAL_SALARY is specified for the expression SAL * 12 and this alias is taken by the view. An alternative formulation of the above view definition is

that is. • set-valued subqueries (in. CREATE © Copyright G. [NB. records can be retrieved from a view (also respective records are not physically stored. SAL _ 12 from EMP where DEPTNO = 20. INSERT. these rows would not be selected based on the select statement. . UPDATE. Campbell 2010 93 . A view is evaluated again each time it is accessed. all) or test for existence (exists) • group by clause or distinct clause In combination with the clause with check option any update or insertion of a row into the view is rejected if the new/modified row does not meet the view definition. or records can even be modified. update. but derived on basis of the select statement in the view definition). JOB. GRANT SELECT ON file to PUBLIC REVOKE SELECT ON file FROM PUBLIC Examples of privileges to be granted SELECT.. min. or delete modifications on views are allowed that use one of the following constructs in the view definition: • Joins • Aggregate function such as sum. DELETE.Database Management Create view DEPT20 (ENAME. DROP. ANNUAL SALARY) as select ENAME.. any. A view can be used in the same way as a table. JOB.. DROP VIEW A view can be deleted using the command delete <view-name>. In Oracle SQL no insert. field [ASC/DESC]. i. CREATE INDEX Create [UNIQUE] INDEX <indexname> on <table> (field [ASC/DESC] [. . A with check option can be named using the constraint clause.]) [WITH {primary | disallow null | ignore null }] Example: Create UNIQUE index Custid on Customers (CustomerID) with disallow null. Use Drop instead of delete for Oracle] DROP INDEX Drop index Custid on Customers.e. privilegen> on <table> to <username> Revoke < privilege > on <table> from <username>. GRANT and REVOKE Grant <privilege1. max etc.. DROP TABLE A table and its records can be deleted by issuing the command drop table <table> [cascade constraints]..

COMMIT and ROLLBACK

A sequence of database modifications, i.e., a sequence of insert, update, and delete statements, is called a transaction. Modifications of records are temporarily stored in the database system. They become permanent only after the commit command has been issued.

As long as the user has not issued the commit statement, it is possible to undo all modifications since the last commit. To undo modifications, one has to issue the rollback command.

It is advisable to complete each modification of the database with a commit (as long as the modification has the expected effect). Note that any data definition command such as create table results in an internal commit. A commit is also implicitly executed when the user terminates an Oracle session.
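The effect of commit and rollback on a transaction can be demonstrated in a few lines. The sketch assumes Python's sqlite3 module in its default transaction mode as a stand-in for an Oracle session: the module opens a transaction automatically before the first insert, update or delete, and unlike Oracle it does not commit implicitly on create table. The sample records are invented.

```python
import sqlite3

# In-memory SQLite database standing in for an Oracle session (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table EMP (ENAME text, SAL integer)")
conn.execute("insert into EMP values ('JONES', 2975)")
conn.commit()  # the inserted record is now permanent

# A new transaction: two modifications that are not yet committed.
conn.execute("update EMP set SAL = SAL + 1000 where ENAME = 'JONES'")
conn.execute("insert into EMP values ('SMITH', 800)")
uncommitted = conn.execute("select ENAME, SAL from EMP order by ENAME").fetchall()

# rollback undoes everything since the last commit.
conn.rollback()
restored = conn.execute("select ENAME, SAL from EMP order by ENAME").fetchall()

print(uncommitted)
print(restored)
```

Within the session the uncommitted changes are visible, but after the rollback the table is back in the state it had at the last commit.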

SQL EXERCISES

EXERCISE 1 – CREATE TABLE AND ALTER TABLE STATEMENTS
1 Create a table called DEPARTMENTS with the following fields:- department 4 characters, deptname 50 characters, depthead 50 characters. The primary key of the table is department. Please note that the deptname field is a compulsory field.
2 Create a table called MorantBayDepts. It has the same structure as Departments.
3 Create a table called STUDENTS with the following fields:- idnum numeric, firstname, lastname each 20 characters, address with 50 characters, sex 1 character, maritalstatus 1 character, DOB date, telephone long integer, department 4 characters, schoolfee currency. The primary key is idnum, the field department should be used to link this table to the departments table. Please name the link so that you can delete it later. Please also note that the firstname field is a compulsory field.
4 You forgot the status field, please add it to the table, it is 10 characters long.
5 You no longer need the field maritalstatus, remove it from the table.
6 You have realized that 20 characters is not enough for the lastname, increase it to 25.
7 Remove the link between the two tables.
8 Add back the link between the two tables.
9 Add back the field marital status

EXERCISE 2 – INSERT, UPDATE, DELETE, SELECT USING UNION
1. Use the insert command to add data to the departments table
2. Use the insert command to add data to the students table
3. Student with idnum 4 changed address to 9 Brentwood Rd
4. Student with idnumber 12 got married, change her last name to Gordon and her maritalstatus to M.
5. The School board made a ruling that the minimum school fee for all programs is $10,000. Change the schoolfee to $10,000 for all students whose school fee is less than $10,000.
6. Add $4,000 to the schoolfee of all TVED students.
7. Add a new record to the Departments table, department is IT, deptname is Information Technology, depthead is Mr. Davis.
8. Delete all students whose status says GRADUATED
9. Add 3 records to the MorantBayDepts table.
10. Display all records from both Departments and MorantBayDepts
11. Display all records in the departments table that start with the letter C as well as all records in the MorantBayDepts table.
12. Copy all of the records in the MorantBayDepts to the Departments table.
13. Delete all records from the MorantBayDepts table.

EXERCISE 3 - SELECT STATEMENT
1. All records and all fields in the Departments table.
2. All records and all fields in the Students table.
3. All fields in the students table for those who are in CS department
4. The idnum, firstname and lastname of all students.
5. The idnum, firstname and lastname of all students sorted by lastname

6. The idnum, firstname and lastname of all students sorted by lastname in descending order. 7. The idnum, firstname, lastname and sex of all students 8. The idnum, firstname, lastname and sex of all female students 9. The idnum, firstname, lastname , sex and maritalstatus of all female married students 10. The firstname, lastname, maritalstatus of single and divorced students 11. The firstname, lastname, maritalstatus of those who are not single or divorced students 12. The idnum, firstname, lastname, schoolfee of all female students sorted by schoolfee 13. The lastname, firstname, schoolfee of students with schoolfee greater than $30,000 sorted by lastname and firstname 14. The lastname, firstname, maritalstatus of students with lastname starting with the letter C 15. The lastname, firstname, maritalstatus of students with lastname not starting with the letter C 16. The total schoolfee 17. The total schoolfee for each department 18. The total schoolfee for each department where totals exceed 30000 19. The total number of students 20. The average schoolfee

EXERCISE 4 - SELECT STATEMENT USING MORE THAN ONE TABLE
1. All fields and records in both tables 2. Firstname, lastname, department, deptname, depthead for all Students. 3. Firstname, lastname, department, deptname, depthead for all students in the CS, BA and HET departments. 4. Firstname, lastname, depthead, maritalstatus of all married students. 5. Firstname, Lastname, deptname of all students whose lastname ends with the letter E. 6. Firstname, lastname, deptname, schoolfee of all students with schoolfee between $50,000 and $80,000 7. Average schoolfee per deptname 8. Average schoolfee per deptname where the average is between $25,000 and $50,000. 9. Total number of students in each deptname 10. Total number of students in each deptname where the department has more than 2 students

EXERCISE 5 – DISTINCT, WILDCARD cont’d, SUB QUERY, CREATE INDEX, DROP TABLE, DROP INDEX
1. Display the departments in the students table. Display each one only once.
2. Display the lastnames of those with “a” as the second letter.
3. Display the names of all students whose schoolfee is more than the average schoolfee.
4. Display the names of the students whose schoolfee is more than the average schoolfee of those in the same department.
5. Display the names of the students who are below the average age.
6. Create an index called NAMEIDX on the students table. The index should be on lastname and firstname. Why would you need to do this?
7. Create a unique index called SEXIDX on the students table. The index should be on sex. Why do you get an error message?
8. Remove the index
9. Delete the table MorantBayDepts.

EXERCISE 6 – REVIEW OF ALL COMMANDS

WRITE DOWN THE SQL COMMANDS FOR THE FOLLOWING THEN EXECUTE THEM IN ORACLE/MYSQL. Writing the commands before executing them is good practice as you will not have the computer before you in the final examination. (NB. Please prefix all tablenames, viewnames and indexnames with your initials. E.g. GCMOVIETYPES) DATABASE FOR A VIDEO CLUB 1. Create a table called MOVIETYPES with the following fields:- typecode 3 characters, typename 25 characters. The primary key of the table is typecode. 2. Create a table called OTHERMTYPES with the same structure as MOVIETYPES. 3. Create a table called MOVIES with the following fields:- movienum integer, movietitle, 20 characters, typecode 3 characters, producer 20 characters, rating 2 characters, cost 6 numbers with 2 decimal places, datepurchased date. The primary key is movienum, the field typecode should be the foreign key to the table called MOVIETYPES. 4. You forgot the director field, please add it to the MOVIES table, it is 25 characters long. 5. You no longer need the field producer, remove it from the MOVIES table. 6. You have realized that 20 characters is not enough for the movietitle, increase it to 30. 7. Add the following data to the MOVIETYPES table: [COM, Comedy], [HOR, Horror], [DRA, Drama], [TRA, Tragedy], [CAR, Cartoon]. 8. Add the following data to the OTHERMTYPES table: [MUS, Musical], [COM, Comedy], [DOC, Documentary]. 9. Add the following data to the MOVIES table. [123, Finding Nemo, CAR, G, 1500, 01JAN-2005, DisneyPixar], [456, Incredibles, CAR, G, 1300, 03-MAR-2006, Pixar], [789, Pursuit of Happyness, DRA, M, 1000, 02-JAN-2007, Steven Speilberg], [111, Free Willy, DRA, G, 900, 01-JAN-1980, John Holt], [222, Dancing with wolves, DRA, R, 1300, 04OCT-1990, Perry Mason]. 10. Add 6 more of your own records to the MOVIES table. 11. Display all records and all fields in the MOVIES table. 12. Display all records and all fields in the MOVIETYPES table. 13. 
Display all fields in the MOVIES table for those records who are rated G. 14. Display the movienum, movietitle of all movies. 15. Display the first 5 letters of the movietitle of all movies. 16. Display the movietitle, cost, and cost * 10 of all movies. 17. Display the movienum, movietitle of all movies sorted by rating. 18. Display the movienum, movietitle of all movies sorted by rating in descending order. 19. Display the movietitles that end with the letter S. 20. Display the movienum, movietitle of all movietitles that start with the letter F. 21. Display the movienum, movietitle of all movietitles that start with the letter F and cost less than $2000. 22. Display the movienum, movietitle of all movietitles that start with the letter F or cost less than $2000. 23. Display the movietitle, cost of all movies that cost between $1200 and $1400. 24. Display the total cost of the movies. 25. Display the average cost of the movies. 26. Display the highest and lowest cost of the movies. 27. Display the total cost for each movie rating. 28. Display the total cost for each movie rating where totals exceed $4000 29. Display the total number of movies. 30. Display the typecodes in the MOVIES table. Display each typecode only once. 31. Display the movietitles of the movies whose cost is more than the average cost. 32. Display all fields and records in both tables. 33. Display the movietitle, typecode and typename of all movies. 34. Display the movietitle, typecode and typename of all movies with typecodes CAR, COM and HOR.

35. Display the movietitle, typecode and typename of all movies with typecodes CAR, COM and HOR. Include the typecodes from the MOVIETYPES table that did not have a match as well. 36. Change the director of movienum 111 to Robin Givens. 37. Change the price of the movienum 123 to $2500. 38. Increase the price of all movies to $1200 if the price is less than $1200. 39. Delete all movies that are rated R. 40. Display all records from both MOVIETYPES and OTHERMTYPES. 41. Display all records that are common to both MOVIETYPES and OTHERMTYPES. 42. Display the result of MOVIETYPES minus OTHERMTYPES. 43. Create an index called MTITLES on the MOVIES table. The index should be on movietitle. 44. Remove the index called MTITLES. 45. Remove the table called OTHERMTYPES 46. Create a view called MOVIEV on the MOVIES table. It should only contain movietitle and rating. 47. Display all of the data in MOVIEV. 48. Remove the view called MOVIEV. 49. Create another user. Give this user SELECT access to your tables. 50. Login as this user and display all fields and records in the tables.


UNIT IV: DISTRIBUTED DATABASES

Characteristics of a distributed database

A centralized system is one in which all of the data is located in a single database at a single site.

A distributed database is a database that is under the control of a central database management system or distributed database management system (DDBMS) in which storage devices are not all attached to a common CPU. A distributed database is a database that is spread across a network of computers that are geographically dispersed and connected via communication lines. It can also be stored in multiple computers located in the same physical location. The database must have a single logical data model. Users can log in from any location to access the database. Examples are: SDD-1 by Computer Corp. of America, R* (System R*) by IBM Research, Distributed Ingres by Univ. of Ca. at Berkeley.

Users access the distributed database through:
• Local applications - applications which do not require data from other sites.
• Global applications - applications which do require data from other sites.

A distributed database works by using database links. A database link is a pointer that defines a one-way communication path from a database server to another database server. The link pointer is actually defined as an entry in a data dictionary table. To access the link, you must be connected to the local database that contains the data dictionary entry.

A database link connection is one-way in the sense that a client connected to local database A can use a link stored in database A to access information in remote database B, but users connected to database B cannot use the same link to access data in database A. If local users on database B want to access data on database A, then they must define a link that is stored in the data dictionary of database B.

A database link connection allows local users to access data on a remote database. For this connection to occur, each database in the distributed system must have a unique global database name in the network domain. The global database name uniquely identifies a database server in a distributed system.

Definition of logical database, database server, local and global application, global intelligence

Logical database
Logical databases are programs that read data from database tables.

Database server
Database servers are responsible for processing SQL queries that have been generated by the client process, and for returning the results of these queries back to the client process that made the request.

Global Intelligence
This is a DBMS that manages the distributed database.

Client-server
A client-server architecture in a distributed database is a network architecture in which each computer or process on the network is either a client or a server or both. Database servers are powerful computers. Clients are PCs or workstations on which users run applications. Clients rely on database servers to process their queries. The user will therefore use his client application to run queries. The queries will be sent to the database server, which returns the result to the client.

Assessment of a distributed database versus a loose connection of independent sites
1. Data that makes up the logical database is stored at multiple sites connected by a network.
2. A global intelligence (i.e. a DBMS) exists over and above all the local intelligence (i.e. DBMSs). Its job is to manage the distributed database as a whole.
3. At least one application takes a global view of the data. The global application accesses all sites at least once.
4. Transactions are transparent – each transaction must maintain database integrity across multiple databases. Transactions must also be divided into sub-transactions, each subtransaction affecting one database system.

Terms and concepts used in distributed databases

Transparency - Does a user access all of the files in a system in the same manner, regardless of where they reside? Care with a distributed database must be taken to ensure that the distribution is transparent. In other words, users must be able to interact with the system as if it was one logical system. This applies to the system's performance, and methods of access amongst other things. In other words, a distributed system should look like a centralized system to the user.

A DDBMS must provide certain transparency features, which will serve to hide the complexities of the distributed database from the end user. In other words the DDBMS should make the user think that he/she is working with a centralized database. These transparency features are listed below:
• Distribution transparency - this means the user should not know that the data is partitioned, that it is replicated or where it is located. The users should not need to know at which site any given piece of data is stored.
• Transaction transparency - this enables a transaction to update data at several locations; in addition, if all the locations are not updated then the transaction is cancelled and the data reverts to its original state.
• Failure transparency - if one machine fails, the system should still continue to operate without the user being aware that something had gone wrong.
• Performance transparency - the performance of the system should not suffer because of the distributed design (in terms of network congestion etc.)
• Heterogeneity transparency - the system should allow the integration of various DBMS without the user being aware of all these issues.

Homogeneous distributed database – All of the sites use the same DBMS (e.g. Oracle).

Table Replication . For example. Certain processing can go on at one site and other processing at other sites thereby speeding up processing.contains all the attributes/fields and a subset of the tuples/rows/records b) Vertical . the different sites do not have to use the same DBMS (e. The data may be distributed in several ways using the following database concepts: Fragmentation .Allows local groups (departments) to have control over their own data. © Copyright G. Three replication conditions exist: full replication. but the duplicate site has not been changed. Advantages and disadvantages of a distributed database Advantages • Reflects organizational structure – database fragments are located in the departments they relate to • Local Processing and Autonomy . data can be changed at one site. Reasons for replication a) To maximize local availability of data b) To provide backup copies of tables in case a particular network fails. • Partial replication . • No replication .Determines the distribution of tables around the network.Database Management Heterogeneous distributed database – Uses multiple DBMS’s. while others have been duplicated at various sites (e. (Parallel processing). Campbell 2010 101 .Describes how a single table/file is divided among network sites. There are three types of fragmentation. • Cost Reduction/Economics . subsets of rows and columns).contains a subset of the columns/fields/attributes and all the rows/records c) Mixed – database is fragmentation horizontally and vertically. It also costs less to create a network of smaller computers with the power of a single large computer.only some of the database fragments are replicated. these are as follows: a) Horizontal . Allocation . Some tables exist at only one site. partial replication or partial replication or no replication • Full replication . 
frequently used files that are basically static – such as a code file).g.combines fragmentation and replication.each database fragment is stored at the same location.g. Oracle and MS-SQL and Postgresql).all database fragments are replicated. replication degrades database performance as all copies of table must be updated regularly to maintain integrity. Replication can introduce integrity problems.Less transmission of data so communication costs down as data closer to locations where originate. For frequently updated tables. (in other words. In other words.
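Horizontal and vertical fragmentation, as defined in this unit, can be imitated inside a single database to see how the complete table is rebuilt from its fragments. This is only an illustrative sketch: Python's sqlite3 module is used, all fragments live in one in-memory database rather than at separate network sites, and the table EMPLOYEE, the fragment names and the rows are hypothetical. Repeating the key TRN in both vertical fragments is an assumption of the sketch; it is what makes the join back together possible.

```python
import sqlite3

# One in-memory database plays the role of all sites (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table EMPLOYEE (TRN integer primary key, NAME text,
                           SITE text, SALARY integer);
    insert into EMPLOYEE values (1, 'Mary',  'Kingston',    50000);
    insert into EMPLOYEE values (2, 'Karen', 'Montego Bay', 60000);
    insert into EMPLOYEE values (3, 'Paul',  'Kingston',    55000);

    -- Horizontal fragments: all columns, a subset of the rows.
    create table EMP_KGN as select * from EMPLOYEE where SITE = 'Kingston';
    create table EMP_MBJ as select * from EMPLOYEE where SITE = 'Montego Bay';

    -- Vertical fragments: a subset of the columns, all rows (key TRN in both).
    create table EMP_PERSONAL as select TRN, NAME, SITE from EMPLOYEE;
    create table EMP_PAYROLL  as select TRN, SALARY from EMPLOYEE;
""")

original = conn.execute("select * from EMPLOYEE order by TRN").fetchall()

# Horizontal fragments are recombined with union.
from_horizontal = conn.execute("""
    select * from EMP_KGN union select * from EMP_MBJ order by 1
""").fetchall()

# Vertical fragments are recombined with a join on the shared key.
from_vertical = conn.execute("""
    select P.TRN, NAME, SITE, SALARY
    from EMP_PERSONAL P, EMP_PAYROLL S
    where P.TRN = S.TRN
    order by P.TRN
""").fetchall()

print(from_horizontal == original, from_vertical == original)
```

In a real distributed database the union or join would run across the network, which is why a query that needs columns from two vertical fragments is more expensive than one answered by a single site.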

• Data and load sharing – Each site does its own processing rather than overloading one site. This leads to improved performance – data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
• Efficiency and flexibility – If data is stored close to its normal point of use then response times and communication cost will be reduced.
• Improved Availability and Reliability - If one site fails, data may be on another site. A fault in one database system will only affect one fragment, instead of the entire database.
• Capacity and incremental growth - There is no one machine that can hold all of the data. If it becomes necessary to expand the system then it is easier to add a new computer than upgrade one computer.
• Security - If there is a fire at or sabotage of one site, the data is still available on another site.
• Modularity – systems can be modified, added and removed from the distributed database without affecting other modules (systems).

Disadvantages
• Distributed execution - The distributed DBMS needs to synchronize and control processes on the various computers on the network. There is a need for concurrency control and recovery mechanisms to process updates across the network and restore consistency after a crash. A difficulty may arise if one site holding a copy is not available at the time of the update. One solution is to designate one copy as the primary copy. This site is responsible for broadcasting the updates.
• Catalog management is more difficult. A distributed DBMS needs data about the distributed database to manage it. The database catalog consists of metadata in which definitions of database objects such as tables, views (virtual tables), indexes, and user groups are stored.
• Distributed DBMS schema management is very difficult - Such schemas must be stored and managed in a distributed fashion.
• Distributed transaction management is hard to control. It is harder to recover from backups. It is difficult to maintain integrity because enforcing integrity over a network may require too much networking resources to be feasible.
• Complexity - Extra work must be done by the database administrator (DBA) to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.
• Economics - Increased complexity and a more extensive infrastructure means extra labour costs.
• Security - Remote database fragments must be secured, and they are not centralized so the remote sites must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links between remote sites).

The payroll officer. name. Mary’s name change was made at Site A by the site manager. address. name. That night. Brown need to use the network for? Which query allows Mr. The database administrator now needs to do a restore. Osbourne Inc has 2 sites. Karen’s address was changed at Site B by the site manager but he could not make the same change on Site A because of the same network problem. Brown. Which version of the table is the correct one? 4. The table is duplicated on two different sites. name. The fields TRN. gender and date of birth. occupation and salary are located in Montego Bay. The distributed database has a table with the fields TRN. and as a young field there is not much readily available experience on proper practice. How does a distributed database work? © Copyright G. one in Kingston and the other in Montego Bay. a distributed database or a non-distributed (centralized) database? Give reasons for your answer. address. Campbell 2010 103 . What advantage does Hewlett Limited have in this case? 2. Mr. but he could not make the update on Site B because of a network problem. 6. What are the advantages of a distributed database? 7. both sites did a backup. These records need to be processed. One of their sites burnt to the ground. address and gender are located in Kingston while TRN. Geo Systems Limited has a table that contains the fields TRN. Which do you think is cheaper. Hewlett Limited has a distributed database.• Database Management Inexperience — distributed databases are difficult to work with. PQHG Limited has millions and millions of records in their database. Query 1 shows names and addresses of employees and Query 2 shows names and salaries of employees. Which query does Mr. He needs to create 2 queries. Karen changed her address. occupation and salary. Practice Questions 1. Do you think it is better to place all of the records on one computer to be processed or is it better to let several computers share the load? 3. 
who deals with salaries is located in Montego Bay. Brown executes Query 2 very often and Query 1 very rarely. gender. Mary got married and changed her last name. Brown to access files locally? Should there be a difference in the way he runs or accesses either query? What is transparency? Mr. What are the disadvantages of a distributed database? 8. name. Would you redistribute the fields or do you feel that the existing location is fine? 5. The next morning both systems crashed.
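The primary copy idea mentioned under distributed transaction management (one copy is designated as the primary, and that site broadcasts the updates) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any particular distributed DBMS works: the classes `Site` and `PrimaryCopyTable`, the `pending` queue and the sample row are hypothetical names introduced here, every update is assumed to go to the primary site first, and updates for an unreachable site are simply queued until it is available again.

```python
class Site:
    """One site holding a full copy of a replicated table (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}       # primary key -> row (a dict of field values)
        self.online = True   # False simulates a network problem or a crash


class PrimaryCopyTable:
    """Routes every update through one designated primary copy."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        # Updates that have not yet reached each replica site.
        self.pending = {replica.name: [] for replica in replicas}

    def update(self, key, row):
        """Apply the update at the primary, then try to broadcast it."""
        if not self.primary.online:
            raise RuntimeError("primary copy unavailable; update rejected")
        self.primary.rows[key] = dict(row)
        for replica in self.replicas:
            self.pending[replica.name].append((key, dict(row)))
        self.broadcast()

    def broadcast(self):
        """Deliver queued updates, in order, to every replica that is reachable."""
        for replica in self.replicas:
            if replica.online:
                for key, row in self.pending[replica.name]:
                    replica.rows[key] = row
                self.pending[replica.name].clear()


site_a = Site("A")
site_b = Site("B")
table = PrimaryCopyTable(primary=site_a, replicas=[site_b])

site_b.online = False                    # network problem between the sites
table.update("100", {"name": "Mary"})    # accepted at the primary only
site_b.online = True                     # link restored
table.broadcast()                        # Site B catches up
print(site_a.rows == site_b.rows)
```

Because all changes pass through one site, the copies cannot drift apart in two directions at once; the cost is that no update can be made while the primary itself is down.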

Data warehouse

The need for data analysis
Organizations tend to grow and prosper as they gain a better understanding of their environment. Typically, business managers must be able to track daily transactions to evaluate how the business is performing. By tapping into the operational database, management can develop strategies to meet organizational goals. In addition, data analysis can provide information about short-term tactical evaluations and strategies such as: Are our sales promotions working? What market percentage are we controlling? Are we attracting new customers?

Tactical and strategic decisions are also shaped by constant pressure from external and internal forces, including globalization, the cultural and legal environment and, perhaps most important, technology. Given the many and varied competitive pressures, managers are always looking for competitive advantages through product development, service, marketing and so on. Managers understand that their business climate is very dynamic, thus mandating their prompt reaction to change in order to remain competitive. In addition, the modern business climate requires managers to approach increasingly complex problems based on a rapidly growing number of internal and external variables. There is therefore growing interest in creating support systems dedicated to facilitating quick decision making in a complex environment. In other words, the decision making cycle time is reduced.

Different managerial levels have different decision support needs. For example, transaction processing systems, based on operational databases, are tailored to serve the information needs of people who deal with short term inventory, accounts payable or purchasing. Middle level managers, general managers, vice-presidents and presidents focus on strategic and tactical decision making. Such managers require detailed information designed to help them make decisions in a complex data and analysis environment.

Data warehousing
Downloading does move data closer to the user and thereby increases its potential utility. Unfortunately, while one or two download sites can be managed without a problem, if every department wants to have its own source of downloaded data, the management problems become immense. Accordingly, organizations began to look for some means of providing a standardized service for moving data to the user and making them more useful. That service is called data warehousing.

What is a data warehouse?
A data warehouse (DW) is a huge database that stores and manages the data required to analyze historical and current transactions. It is designed to support management decision making. A data warehouse contains a wide variety of data that present a coherent picture of business conditions at a single point in time. Through a data warehouse, managers and other users access transactions and summaries of transactions quickly and efficiently. The databases in a data warehouse usually are quite large. It typically has a user-friendly interface so users can easily interact with its data.

A data warehouse includes not only data but also tools, procedures, training, personnel and other resources that make access to the data easier and more relevant to decision makers. The goal of the data warehouse is to increase the value of the organization's data asset. Development of a data warehouse includes development of systems to extract data from operational systems plus installation of a warehouse database system that provides managers flexible access to the data.

Figure 1 – A Data Warehouse (DW)

The role of the data warehouse is to store extracts from operational data and make them available to users in a useful format. The data can be extracts from databases and files, but can also be document images, recordings, photos and other non-scalar data. The source data could also be purchased from other organizations. The data warehouse stores the extracted data and also combines it, aggregates³ it, transforms it and makes it available to users via tools that are designed for analysis and decision making such as OLAP (see section "What is On-line analytical processing (OLAP)?" below).

³ A collection of, or the total of, disparate elements.

Evolution of the data warehouse
The origins of today's data warehouses can be traced to the reporting systems that were popular in the 1980s. The reporting systems that formed the foundation of basic decision support required direct access to the operational data through a menu interface to yield predefined report structures. Typically, the reporting system was front-ended by a text-only presentation tool. These reporting systems provided some basic answers to the end user's questions, although the format wasn't always the most appropriate.

The next development stage produced a more sophisticated form of decision support by supplying lightly summarized data extracted from the operational database. Such lightly summarized data were usually stored in an RDBMS and were accessed through SQL statements via a query tool. The SQL-based query tool provided some predefined reports and, better yet, some ad hoc query capability. Unfortunately, to use the queries the end user had to know the details of the underlying data structure. The presentation tool was similar to the one used by the original reporting system, but it did provide additional customization options for ad hoc reports.

A variation on this theme of greater end user empowerment was the use of spreadsheets or statistical packages to analyze operational data. End users used their own desktop tools to access and manipulate data in order to support their decision making process. Primitive as they were by current standards, these reporting systems and their extensions gave IS departments the first major tools with which to solve decision support problems. Given advances in hardware and software in the late 1980s and early to mid-1990s, the explosion of available operational data, and the growing sophistication of decision support systems, data warehouse developments were almost inevitable.

Differences between data warehouse and operational database

Characteristic: Integrated
  Operational database data: Similar data can have different representations or meanings.
  Data warehouse data: Provide a unified view of all data elements with a common definition and representation for all departments.

Characteristic: Subject-Oriented
  Operational database data: Data are stored with a functional or process orientation (for example, invoices, credits, debits etc).
  Data warehouse data: Data are stored with a subject orientation that facilitates multiple views for data and decision making (e.g. sales, products, sales by products etc.).

Characteristic: Time-Variant
  Operational database data: Data represent current transactions (e.g. the sales of a product on a given date).
  Data warehouse data: Data are historic in nature. A time dimension is added to facilitate data analysis and time comparisons.

Characteristic: Non-volatile
  Operational database data: Data updates and deletes are very common.
  Data warehouse data: Data cannot be changed. Data are only added periodically from operational systems. Once data are stored, no changes are allowed.

Components of a data warehouse
• Data extraction tools
• Extracted data
• Metadata⁴ of warehouse contents
• Warehouse DBMS(s) and OLAP (online analytical processing) servers
• Warehouse data management tools
• Data delivery programs
• End-user analysis tools
• User training courses and materials
• Warehouse consultants

The source of the warehouse is operational data, or data generated from routine transaction processing systems such as Sales, Payroll, Registration of a student, Banking deposit/withdrawal etc. The data warehouse therefore needs tools for extracting the data and storing them. These data however are not useful without metadata that describe the nature of the data, their origins, their format, limits on their use and other characteristics of the data that influence the way they can and should be used.

⁴ Data about the data (such as field names, field types, validation rules etc).

Potentially, the data warehouse contains billions of bytes of data in many different formats. Accordingly, it needs DBMS and OLAP servers of its own to store and process the data. In fact, several DBMS and OLAP products may be used, and the features and functions of these may be augmented by additional in-house developed software that reformats, aggregates⁵, integrates and transfers data from one processor to another within the data warehouse. Programs may be needed to store and process non-scalar data like graphics and animations also.

Because the purpose of the data warehouse is to make organizational data more available, the warehouse must include tools not only to deliver the data to the users but also to transform the data for analysis, query and reporting, and OLAP for user-specified aggregation and dis-aggregation.

The data warehouse provides an important, but complicated set of resources and services. Hence the warehouse needs to include training courses, training materials and on-line help utilities, and other similar training products to make it easy for users to take advantage of the warehouse resources. Finally, the data warehouse includes knowledgeable personnel who can serve as consultants.

User requirements for a data warehouse
The requirements for a data warehouse are different from the requirements for a traditional database application. For one, in a typical database application, the structure of reports and queries is standardized. While the data in a report or query may vary from month to month, for instance, the structure of the report or query stays the same. Data warehouse users, on the other hand, often need to change the structure of queries and reports.

Another difference is that users want to do their own data aggregation⁶. For example, a user who wants to investigate the impact of different marketing campaigns may want to aggregate product sales according to package color at one time, according to marketing program at another time, and according to package color within marketing program at a third time. The analyst wants the same data in each report, but simply presents it differently.

Data warehouse users also want to dis-aggregate them in their own terms, or drill down their data. For example, a user may be presented with a screen that shows total product sales for a given year. The user may then want to be able to click on the data and have them explode into sales by month, and to click again and have the data explode into sales by product by month or sales by region by product by month.

Graphical output is another common requirement. Users want to see results of geographic data in geographic form. Sales by state and province should be shown on a map. A reshuffling of employees and offices should be shown on a diagram of office space. These requirements are more difficult because they vary from user to user and from task to task.

Many users of data warehouse facilities want to import warehouse data into domain-specific programs. For example, financial analysts want to import data into their spreadsheet models and into more sophisticated financial analysis programs. Portfolio managers want to import data into portfolio management programs, and oil drilling engineers want to import data into seismic analysis programs. All of this importing usually means that the warehouse data needs to be formatted in specific ways.

⁵ A collection of, or the total of, disparate elements.
⁶ To collect or total disparate elements.

Rules for defining a data warehouse
The following list is made up of 12 rules that define a data warehouse. This list was created by William H. Inmon and Chuck Kelley in 1994.
1. The data warehouse and operational environments are separated.
2. The data warehouse data are integrated.
3. The data warehouse contains historical data over a long time horizon.
4. The data warehouse data are snapshot data captured at a given point in time.
5. The data warehouse data are subject-oriented.
6. The data warehouse data are mainly read-only with periodic batch updates from operational data. No online updates are allowed.
7. The data warehouse development life cycle differs from classical systems development. The data warehouse development is data driven; the classical approach is process driven.
8. The data warehouse contains data with several levels of detail: current detail data, old detail data, lightly summarized, and highly summarized data.
9. The data warehouse environment is characterized by read-only transactions to very large data sets. The operational environment is characterized by numerous update transactions to a few data entities at a time.
10. The data warehouse environment has a system that traces data sources, transformations and storage.
11. The data warehouse's metadata⁷ are a critical component of this environment. The metadata identify and define all data elements. The metadata provide the source, transformation, integration, storage, usage, relationships, and history of each data element.
12. The data warehouse contains a charge-back mechanism for resource usage that enforces optimal use of the data by end users.

The 12 rules capture the data warehouse life cycle, from its introduction as an entity separate from the operational data store, to its components, functionality, and management processes. The current generation of specialized decision support systems provides a comprehensive infrastructure to design, develop, implement and use decision support systems within an organization.

⁷ Data about the data (such as field names, field types, validation rules etc).

Data mart
Some organizations decide to limit the scope of the warehouse to more manageable chunks. A data mart is a smaller version of a data warehouse, containing a database that helps a specific group or department make decisions. Marketing and sales departments may have their own separate data marts. Individual groups or departments often extract data from the data warehouse to create their data marts.

Restricting a data mart to a particular type of data makes the management of the data warehouse simpler and probably means that an off-the-shelf DBMS product can be used to manage the data warehouse. Metadata⁸ is also simpler and easier to maintain.

A data mart that is restricted to a particular business function, such as marketing analysis, may have many types of data and metadata to maintain, but all of those data serve the same type of users. Tools for managing the data warehouse and for providing data to the users can be written with an eye toward the requirements that marketing analysts are likely to have.

A data mart that is restricted to a particular business unit or geographical area may have many types of input and many types of users, but the amount of data to be managed is less than for the entire company. There will also be fewer requests for service, so the data warehouse resources can be allocated to fewer users.

⁸ Data about the data (such as field names, field types, validation rules etc).

The following diagram summarizes the scope of alternatives for sharing data.

Figure 2 – Continuum of Enterprise Data Sharing
(Easier to More Difficult: Data Downloading; Data Marts for Particular Data Inputs, Particular Business Functions, or a Particular Business Unit or Geographical Region; Data Warehouse)

Data downloading is the smallest and easiest alternative. Data are extracted from operational systems and delivered to particular users for specific purposes. The downloaded data are provided on a regular and recurring basis, so the structure of the application is fixed, the users are well trained, and problems such as timing and domain inconsistencies are unlikely to occur because users gain experience working with the same data. At the other extreme, a data warehouse provides extensive types of data and services for both recurring and ad hoc requests. Data marts fall in the middle. As we move from left to right, the alternatives become more powerful but also more expensive and difficult to create.

On-line analytical processing

What is On-line analytical processing (OLAP)?
OLAP refers to an advanced data analysis environment that supports decision making, business modelling, and operations research activities. OLAP systems share four major characteristics:
1. Use multidimensional data analysis techniques
2. Provide advanced database support
3. Provide easy-to-use end user interfaces
4. Support client/server architecture

OLAP is an approach to quickly answer multi-dimensional analytical queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. The typical applications of OLAP are in business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).

Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases and hierarchical databases that are faster than relational databases.

The following shows the difference between the operational view of sales data and the multidimensional view of sales data.

Operational View

INVOICE Table
Number   Date      Customer          Amount
2034     15/5/96   Dartonik INC      $3500
2035     15/5/96   [name missing]    $1800
2036     16/5/96   Dartonik INC      $2000
2037     16/5/96   [name missing]    $800

LINE Table
Number   Product    Price   Quantity
2034     Mouse      $150    20
2034     Diskette   $50     10

Multidimensional View
                     Time Dimension
Customer Dimension   15/5/96   16/5/96   Totals
Dartonik INC         $3500     $2000     $5500
[name missing]       $1800     $800      $2600
Totals               $5300     $2800     $8100

Sales figures occur at the intersection of a customer row and time column.

Data mining
Often, the database in a data warehouse is distributed. Data warehouses often use a process called data mining. Data mining is a process that often is used by data warehouses to find patterns and relationships among data. E.g. a state government could mine through data to check if the number of births has a relationship to income level. Many e-commerce sites use data mining to determine customer preferences.

Examples of data mining findings can be:
• 65% of customers who did not use their credit card in the last six months are 88% likely to cancel their account
• 82% of customers who bought a new TV 27" or larger are 90% likely to buy an entertainment center within the next four weeks
• If age < 30 and income <= 25000 and credit rating < 3 and credit amount > 25000 then the minimum loan term is 10 years.

Practice Questions
1. What is the difference between operational data and a data warehouse?
2. Explain the components of a data warehouse.
3. What is OLAP?
4. Draw an example of a Multidimensional View of the data in the Education data warehouse.
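The step from the operational view to the multidimensional view in this section is just an aggregation of invoice amounts along two dimensions, customer and time. The following Python sketch shows one way to compute it. It is an illustration only, not part of any OLAP product: the function names `multidimensional_view` and `print_view` are made up for this example, and "Customer B" is a placeholder because the second customer's name does not appear in these notes.

```python
from collections import defaultdict

# (invoice number, date, customer, amount) taken from the INVOICE table.
# "Customer B" is a placeholder name, not a value from the notes.
INVOICES = [
    (2034, "15/5/96", "Dartonik INC", 3500),
    (2035, "15/5/96", "Customer B", 1800),
    (2036, "16/5/96", "Dartonik INC", 2000),
    (2037, "16/5/96", "Customer B", 800),
]


def multidimensional_view(invoices):
    """Return {customer: {date: amount}} with a 'Totals' margin on both axes."""
    cube = defaultdict(lambda: defaultdict(int))
    for _number, date, customer, amount in invoices:
        cube[customer][date] += amount          # the cell itself
        cube[customer]["Totals"] += amount      # row total (per customer)
        cube["Totals"][date] += amount          # column total (per date)
        cube["Totals"]["Totals"] += amount      # grand total
    return {customer: dict(cells) for customer, cells in cube.items()}


def print_view(view):
    """Print the view with customers as rows and dates as columns."""
    dates = sorted(d for d in view["Totals"] if d != "Totals") + ["Totals"]
    customers = [c for c in view if c != "Totals"] + ["Totals"]
    print(f"{'Customer':<14}" + "".join(f"{d:>10}" for d in dates))
    for customer in customers:
        row = [f"${view[customer].get(d, 0)}" for d in dates]
        print(f"{customer:<14}" + "".join(f"{cell:>10}" for cell in row))


view = multidimensional_view(INVOICES)
print_view(view)
```

Each sales figure sits at the intersection of a customer row and a time column, and the totals are computed in the same pass. Drilling down would mean repeating the same aggregation with an extra dimension, for example product taken from the LINE table.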

Transactions – Atomic, Consistent, Isolated, Durable (ACID)
An understanding of transactions is essential to the database designer, especially if he/she is designing a multiuser database. A transaction may be defined as being a group of data modifications that must be performed entirely or not at all. All transactions must adhere to the ACID test:
• Atomic – this property states that the transaction must be completed in its entirety or not at all.
• Consistent – this property states that the transaction should never leave the database in an inconsistent state. This property ensures that the integrity rules and business rules are not violated.
• Isolated – this property states that the data that is being used by a transaction is not accessible until the transaction has been completed.
• Durable – this property states that the data modification is permanent once the transaction has been completed, and if the transaction is not completed then the system should remain in its original state.

Concurrency control
The concept of concurrency control is very important when designing multiuser databases. Concurrency control is the process of coordinating the simultaneous execution of transactions within a multiuser environment. The simultaneous execution of transactions becomes problematic only if the transactions are attempting to access or modify the same data. If concurrency control is not enforced at this point, then data inconsistencies may occur during the process of data modification. The concept of isolation is what makes concurrency control possible. Remember, isolation states that a transaction has exclusive rights to the data being modified.

Conflict Table
Transaction1   Transaction2   Result
Read           Read           No conflict
Read           Write          Conflict
Write          Read           Conflict
Write          Write          Conflict

Lock Level
In order to accomplish isolation the DBMS makes it possible to perform a lock on a data item. A lock is a mechanism that guarantees exclusive use of a data item. With this method each transaction must impose a lock on the data item being accessed and must release the lock once the transaction has been completed. We can have several types of locks:
• Database locks – all the tables within the database are exclusive to the current transaction.
• Table locks – all the rows and columns within a table are exclusive to the current transaction.
• Row locks – the selected rows are exclusive to the current transaction.
• Column locks – the selected columns are exclusive to the current transaction.

Lock Type
Irrespective of the lock level, the DBMS may impose different lock types on the data item. The most common are exclusive locks and shared locks.
* An exclusive lock exists when the data item is available only to a single transaction. The problem with an exclusive lock is that the DBMS will not allow two or more transactions to access the same data item for reading at the same time.
* A shared lock is one that allows two or more transactions to access the same data item for reading purposes.
An exclusive lock behaves like a binary lock, which has only two states: locked or unlocked. A shared lock adds a third state, in which several transactions hold the item for reading at once.

Transaction Logs
The DBMS uses a transaction log to keep track of all the data modifications which are performed by each transaction. The DBMS will then use this information to ensure that each transaction is durable (made permanent). A typical transaction log will store the following pieces of information:
• The start of a transaction
• The name of the table being modified
• The primary key of the record being modified
• The field that is being modified
• The before and after value of the field being modified
• The end of the transaction

When a system failure occurs the transaction log is checked to see which transactions were completed and which transactions were not. If the transaction was completed then the DBMS would ensure the durability of the system by ensuring that the after values are permanent. If the transaction was not completed then the system would ensure the durability of the system by ensuring that the before values are permanent.
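The recovery rule just described (keep the after values of completed transactions, restore the before values of incomplete ones) can be sketched in Python as follows. This is a simplified illustration, not the recovery algorithm of any real DBMS: the tuple layout of the log entries, the `recover` function and the sample data are assumptions made for this example, and the database is modelled as a plain dictionary keyed by (table, primary key, field). It also assumes that locking has kept two transactions from writing the same item at the same time.

```python
def recover(database, log):
    """Bring `database` (a dict) to a consistent state after a crash.

    Each log entry is one of these tuples (hypothetical layout):
      ("start", txn)
      ("write", txn, table, key, field, before_value, after_value)
      ("end", txn)
    """
    # A transaction is complete only if its end record reached the log.
    completed = {entry[1] for entry in log if entry[0] == "end"}

    # Redo: go forward through the log and make the after values of
    # completed transactions permanent.
    for entry in log:
        if entry[0] == "write" and entry[1] in completed:
            _, _txn, table, key, field, _before, after = entry
            database[(table, key, field)] = after

    # Undo: go backward through the log and restore the before values
    # written by transactions that never completed.
    for entry in reversed(log):
        if entry[0] == "write" and entry[1] not in completed:
            _, _txn, table, key, field, before, _after = entry
            database[(table, key, field)] = before

    return database


SAMPLE_LOG = [
    ("start", "T1"),
    ("write", "T1", "employee", "1001", "salary", 50000, 55000),
    ("end", "T1"),
    ("start", "T2"),
    ("write", "T2", "employee", "1002", "salary", 40000, 99000),
    # The system crashed here: T2 has no end record.
]

# State found on disk after the crash: T1's change had not been written
# yet, while T2's unfinished change had.
on_disk = {
    ("employee", "1001", "salary"): 50000,
    ("employee", "1002", "salary"): 99000,
}

print(recover(on_disk, SAMPLE_LOG))
```

After recovery, the salary changed by T1 holds its after value and the salary touched by T2 is back to its before value, so the database reflects only completed transactions.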


UNIT V: SECURITY ISSUES
The role of the Data Dictionary
The DBMS makes use of descriptions of data items provided by the DDL. This is data about data (metadata). Metadata describes the structure and format of the data and the overall database.

System tables store metadata. Contents include: number of tables and table names, number of fields and field names, field types, field lengths, key fields, field descriptions, files, cross references, error checks (e.g. range) etc.

The DD helps a database user in:
• Communicating with other users
• Controlling data elements (add fields, change descriptions, formatting). Maintaining standards.
• Determining the impact of changes to data elements on the total database
• Centralizing the control of data elements as an aid in database design and in expanding the design.
• Data validation

What is data security?
In the computer industry, data security refers to techniques for ensuring that data stored in a computer cannot be read or compromised by any individuals without authorization. Most security measures involve data encryption and passwords. Data encryption is the translation of data into a form that is unintelligible without a deciphering mechanism. A password is a secret word or phrase that gives a user access to a particular program or system.

[Research – Protection vs Security]

What are Security Risks?
A computer or data security risk is any event or action that could cause a loss of or damage to computer hardware, software, data, information, or processing capability. Security risks fall into 6 main categories:
• Human error
• Technical error
• Virus, worm, Trojan horse
• Natural disasters etc
• Unauthorized use and access
• Theft and vandalism

Sources of incorrect data:
• Accidents - mistyping input or programming errors
• Malicious use of the database
• System problems - disk crash etc.
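The data dictionary and its role in data validation can be seen in a small, real DBMS. The sketch below uses SQLite through Python's standard `sqlite3` module: the `sqlite_master` system table and the `PRAGMA table_info` statement are SQLite's own ways of exposing metadata, while the `employee` table, its salary range and the helper names (`make_db`, `table_names`, `describe`, `try_insert`) are assumptions chosen for this illustration. Other DBMSs store the same kind of information in differently named system tables.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE employee (
            trn    TEXT PRIMARY KEY,
            name   TEXT NOT NULL,
            salary REAL CHECK (salary BETWEEN 0 AND 1000000)  -- range check
        )
        """
    )
    return conn


def table_names(conn):
    """List table names from the system table that stores the metadata."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def describe(conn, table):
    """Return (field name, field type, is key field) for each field.

    PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    The table name is inserted directly, so it must come from trusted code.
    """
    info = conn.execute(f"PRAGMA table_info({table})")
    return [(name, ftype, bool(pk)) for _cid, name, ftype, _nn, _dflt, pk in info]


def try_insert(conn, trn, name, salary):
    """Insert a row; return False if the DBMS rejects it as invalid."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee VALUES (?, ?, ?)", (trn, name, salary)
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(table_names(conn))
print(describe(conn, "employee"))
print(try_insert(conn, "100-000-001", "Mary", 50000))   # within range
print(try_insert(conn, "100-000-002", "Karen", -5))     # mistyped salary
```

Because the range rule is recorded with the table definition, the DBMS itself rejects the mistyped value, which addresses the "accidents" source of incorrect data without relying on every application to repeat the check.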


Database protection involves:
• Integrity preservation - concerns non-malicious errors and their prevention.
• Security (Access control) - concerned with restricting certain users so they are allowed to access and/or modify only a subset of the database.

Security risks and their effects

1. Human error
Humans make mistakes. Examples of mistakes made include:
• Deleting a file by accident
• Formatting a hard drive
• Adding data twice
• Entering incorrect data
• The computer is being misused by someone who is not adequately trained/experienced (e.g. young child)
The effects of human error include:
• Loss of data
• Less data integrity (incorrect data), therefore incorrect information will be retrieved
• Physical damage to computer due to improper use

2. Technical error
A technical error is a system failure. The failure could be because of hardware, software or both. Examples include:
• Hard disk crashing
• Missing or corrupted files (e.g. due to not shutting down properly etc.)
• Computer not booting
• Drives (diskette, CD) not working (e.g. due to dust)
The effects of technical error include:
• Loss of data
• Loss of time in having to re-enter data
• The inability to use certain devices

3. Virus
A virus is a computer program that is designed to replicate itself by copying itself into the other programs stored in a computer. It may be benign or have a negative effect, such as causing a program to operate incorrectly or corrupting a computer's memory. In addition to replication, some computer viruses share another commonality: a damage routine that delivers the virus payload. A virus's payload is an action it performs on the infected computer.
The effects of viruses include:
• The computer cannot boot because a boot sector virus has corrupted the boot sector
• Files are erased by the virus
• Hard drive is formatted (all files are therefore lost)

• Files are corrupted by the virus
• Consumption of storage space and memory
• Degrading performance of the computer

It's important to remember that most viruses aren't programmed with destructive intentions. Most simply reproduce without any destructive attack. However, these viruses can cause damage to your files, particularly since many of the viruses are poorly written programs that can cause unintended software conflicts. At the very least, viruses are intrusive applications that steal storage and CPU cycles without your permission.

Most people's worst virus fear is having their hard drive erased, but those who regularly create back-up versions of important data could recover within a few hours. Viruses that subtly corrupt data are potentially much more destructive - computer users may not notice their presence until a great deal of data has been ruined. Some viruses insert random numbers in spreadsheet applications or system files, or add typos to word processing documents. One particularly nasty virus posted confidential documents in the user's name to Internet newsgroups.

[Research – the different types of viruses]

4. Natural disasters etc
Disasters can cause physical damage to computers, thereby causing loss of the data on the computers. Examples of disasters (natural and otherwise) include:
• Earthquake
• Hurricane
• Fire
• Flood
• Lightning
• Power surge, low voltage
• Rats, roaches, insects etc.
The effects of disasters include:
• Physical damage to computer
• Loss of data
• Repair bills

5. Unauthorized access and use
Unauthorized access is the use of a computer or network without permission. Unauthorized access includes:
• Hacker/cracker – Hacker is a slang term for a computer enthusiast, i.e., a person who enjoys learning programming languages and computer systems and can often be considered an expert on the subject(s). Depending on how it is used, the term can be either complimentary or derogatory, although it is developing an increasingly derogatory connotation. The pejorative sense of hacker is becoming more prominent largely because the popular press has co-opted the term to refer to individuals who gain unauthorized access


to computer systems for the purpose of stealing and corrupting data. Hackers maintain that the proper term for such individuals is cracker.
• A person accessing someone else's bank account, email, medical records etc. without permission.

Unauthorized use is the use of a computer or its data for unapproved or possibly illegal or unethical activities. Unauthorized use includes:
• Employees deliberately modifying data, such as giving themselves a raise
• Taking money from someone's account
• Checking personal email or playing computer games on company time
• Software piracy - the unauthorized copying of software.
The effects of unauthorized access and use are as follows:
• Loss of sales due to piracy, theft of marketing information (e.g., customer lists, pricing data, or marketing plans), or blackmail based on information gained from computerized files (e.g., medical information, personal history, or sexual preference). A competing entity could use the data against your company
• Loss of time
• Identity theft
• Also leads to theft of intellectual property (9).

(9) Intellectual property refers to the category of intangible (non-physical) property comprising primarily copyright, moral rights related to copyrighted materials, trademark, patent and industrial design.

6. Theft and vandalism
A computer can be physically stolen or destroyed. This also causes loss of data.
The effects of theft and vandalism include:
• Loss of computer and data (and time to re-enter etc.)
• Illegal access to files
• Loss of income due to software piracy.

Database protection methods - backup and restore methods

Backup is the key – the ultimate safeguard
Regardless of the precautions that you take, things can still go wrong. Backup is therefore the main risk management solution. A backup is a duplicate of a file or disk that can be used if the original is lost, damaged, or destroyed. If your computer fails you can restore from the backup. The following describes the different types of backup.
• Full – backup that copies all of the files in a computer (also called archival backup)

• Incremental – backup that copies only the files that have changed since the last full or last incremental backup
• Differential – backup that copies only the files that have changed since the last full backup
• Selective – backup that allows a user to choose specific files to back up, regardless of whether or not the files have changed since the last backup
• Grandfather, Father, Son (or Three-generation backup) – backup method in which you recycle 3 sets of backups. The oldest backup is called the grandfather, the middle backup is the father and the latest backup is called the son. Each time that you back up, you reuse the oldest backup medium. The father then becomes the grandfather, the son becomes the father and the new backup becomes the son. This method allows you to have the last 3 backups at all times.

Integrity Preservation – keys (primary and foreign), data validation, authority levels

Keys
The power of a database system comes from its ability to quickly find and bring together information stored in separate tables using queries, forms, and reports. In order to do this, each table should include a field or set of fields that uniquely identifies each record stored in the table. Since a primary key does not allow null or duplicate values, it prevents the data entry person from entering the same record more than once or from entering a record with no unique identifier. Since the primary key automatically sets an index, it also allows the DBMS to locate records faster.
• Uniqueness of key - This prevents duplication. E.g. no two students should have the same id number.
• Referential integrity (must match foreign key) – Ensures that related records in separate tables have a match on the common field.

Data Validation
What is data validation?
Data validation is the process of comparing data with a set of rules or values to find out if the data is correct.
What is a validation rule?
Validation rules, also called validity checks, are checks performed on the data to ensure that the user is entering the correct data.
What is the purpose of a validation rule?
Validation rules reduce data entry errors. They do this by limiting what the user is allowed to enter in a particular field.

The various types of validity checks include:
• Valid values – List

The data in the field is limited to a certain list of values. For example, sex can only be male or female; marital status can only be single, married, widowed or divorced.
• Range check
A range check determines whether a number is within a specified range. (E.g. 5 to 9)
• Alphabetic/numeric check (Data type check)
Alphabetic check - Ensures that users enter only alphabetic data into a field.
Numeric check - Ensures that users enter only numeric data into a field.
• Field size check
Data that is entered into a field can also be limited by the size. For example, your student id number is made up of 6 characters. The user should therefore not be allowed to enter a student id number that has more than 6 characters.
• Completeness check
Verifies that a required field contains data. For example, every student must have a first and last name entered.
• Consistency check
This tests the data in two or more associated fields to ensure that the relationship is logical. For example, the value in a Training_Date field cannot occur earlier in time than the value in the Date_Joined field.
• Check Digit
A number or character that is appended to or inserted into a primary key value. A check digit often confirms the accuracy of a primary key value. Bank account, credit card and other identification numbers often include one or more check digits.

Authority Levels
Authority levels are used to limit access (only certain users can perform certain tasks). One user may have Add/Change authority while another has Delete authority. This is done, for example, through login ids and passwords.

Security Control – unauthorized access and use, anti-virus, firewall, encryption, SQL views

Unauthorized access is the use of a computer or network without permission. Unauthorized use is the use of a computer or its data for unapproved or possibly illegal activities (e.g. playing games, surfing the net on company time).

A security control is an action taken to either prevent a data security risk from happening or to reduce its effects. Security controls help to preserve the integrity of data. Security controls include:
• Data validation
• Reduction of human interaction (because humans make mistakes). In other words, automate as many processes as possible. For example, use a bar code reader to scan in the items rather than have the cashier type in the item code.
• Training of users so that human error is reduced.
• Supervision of children and inexperienced users.

• Separation of duties (e.g. one person enters the data and another person is needed to change it, as with a cashier). This is in order to prevent employees from making mistakes, committing fraud or stealing from the company.
• Use only authorized media for loading data and software.
• Backup - just in case the hardware fails you, or if you get a virus or other problem that causes loss of files. An offsite backup is one that is not at the same location as the computer. An offsite backup protects in cases of disaster. You can also use mirrored disks, in which data is saved to more than one disk, i.e. if one disk crashes the other takes over. [Research RAID]
• Buy quality hardware from a reputable dealer to reduce the likelihood of hardware failure.
• Get a warranty period when you purchase a computer – a computer that has a technical error can therefore be fixed free of cost
• Air conditioning – to keep the computer cool
• Plastic dust covers to keep dust out of diskette drives etc.
• Proper (sturdy) desk on which to store the computer
• No magnets/don't open the shutter, and other proper diskette care procedures, to prevent data from being erased
• Proper maintenance (care) – e.g. defrag, cleaning the computer
• Regular testing of hardware and software
• Virus protection - Anti-virus software detects and removes viruses (e.g. Norton Antivirus, McAfee). The software must however be updated regularly, as new viruses are invented each day.
• Limit software downloads to reduce the likelihood of getting a virus.
• Do not open unknown email and attachments, to avoid getting a virus.
• Write protection of diskettes if not saving (only reading) so as not to get a virus.
• Use a firewall - a program and/or hardware that filters the data coming through the internet to prevent unauthorized access. Some firewalls protect systems from viruses and junk email (spam). (e.g. Black Ice, Zone Alarm)
• Place the computer site in a good location (e.g. not on a hillside or near the sea)
• Strong, weatherproof facilities (no windows, fireproof)
• No food/drink around the computer – no insects, spills on keyboard etc.
• Raised (false) floors – Similar to a false ceiling except this is below your feet. It is used for earthquake protection as it works as a shock absorber. Raised floors also allow you to hide cables below.
• Fire extinguishers – of a type made for electrical equipment (e.g. carbon dioxide rather than water or foam). These will not damage the computers, whereas water would cause damage.
• Insurance of equipment in order to re-purchase if your computer is destroyed.
• Lightning rod to protect the building and all electrical devices within the building from lightning storms.
• Surge protectors to protect against low voltage, power surge/spike, lightning etc.
• UPS (Uninterruptible Power Supply) – This has a battery which charges while there is power. It gives you time to shut down the computer properly when there is a power cut. The UPS is important because improper shutdown can corrupt files. The UPS also provides protection from power surges. This is different from a generator, which is used during a power cut and runs on gas. A generator allows you to continue using the computer for as long as there is gas.
• Access codes, passwords – to prevent unauthorized access and use.
• Use biometric devices – e.g. retinal scan, fingerprint scan, voice activated

• Intrusion detection software – detects if you put in the wrong password more than 3 times and kicks you off. (What happens when you, e.g., try to put in a false telephone card number, or the wrong PIN for your debit card at the ATM?)
• Audit trails and logs - audit trails keep track of what a user does while on the system, while logs keep track of user sign on/off
• Physical security – e.g. guards, locks, grills etc.
• Metal detectors to prevent hardware theft
• Lock the computer to the desk
• Low profile facilities (no overt disclosure of the high-value nature of the site, in other words do not display a sign to let persons know where your computer facilities are)
• Mark your computers in a secret place so that you can identify them if the police find them. (Keep the receipt/invoice as proof of purchase and to have a record of the serial number.)
• Time and Location controls – User can only use the system at certain times and in certain locations (can't hide and do wrong things)
• Proper distribution and disposal - reports should be distributed to the correct users; this reduces unauthorized access and use. Shred reports and do not just throw them in the garbage (e.g. do not throw away credit card statements; this prevents persons from going through your garbage and getting your private information).
• Callback systems – the user can connect to the computer only after the computer calls the user back at a previously established telephone number.
• Physical isolation of data
• Encryption of data - encoding data so that it means nothing to hackers if they get into the system.
• Go to secure sites (lock at the bottom of the screen). Go to reputable web sites so that your credit card number will not be stolen.
• Copyright and License agreements – so that you have the right to sue persons who steal your software/data. (Patents/Trademarks)
• Auditing the programs that are written, in case an unscrupulous employee deliberately put in code for his own benefit.
• Views/Virtual tables – the user is able to see only certain fields/records.
• Grant and Revoke – allows users to have only certain types of privileges – e.g. select, update, delete
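
The validity checks and the check digit described under Data Validation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the rating field and the function names are hypothetical, the student id is assumed to be numeric, and since the notes do not name a check digit scheme, the Luhn scheme (commonly used for credit card numbers) is used as an example.

```python
from datetime import date

# Hypothetical rule values, taken from the examples in the notes.
MARITAL_STATUSES = {"single", "married", "widowed", "divorced"}
STUDENT_ID_LENGTH = 6


def validate_student(record):
    """Return a list of validity-check failures for one student record."""
    errors = []

    for field in ("first_name", "last_name"):
        value = record.get(field, "").strip()
        if not value:
            # Completeness check: a required field must contain data.
            errors.append(f"completeness: {field} is required")
        elif not value.isalpha():
            # Alphabetic check: only letters are allowed.
            errors.append(f"alphabetic: {field} must contain only letters")

    student_id = record.get("student_id", "")
    if not student_id.isdigit():
        # Numeric check (assumes ids are made up of digits).
        errors.append("numeric: student_id must contain only digits")
    if len(student_id) > STUDENT_ID_LENGTH:
        # Field size check: no more than 6 characters.
        errors.append("field size: student_id is longer than 6 characters")

    # Valid values (list) check.
    if record.get("marital_status") not in MARITAL_STATUSES:
        errors.append("valid values: marital_status is not in the list")

    # Range check, using the 5 to 9 range from the notes on a made-up field.
    if not 5 <= record.get("rating", 0) <= 9:
        errors.append("range: rating must be between 5 and 9")

    # Consistency check: training cannot take place before joining.
    if record["training_date"] < record["date_joined"]:
        errors.append("consistency: training_date is before date_joined")

    return errors


def luhn_check_digit(payload):
    """Compute the Luhn check digit for a string of digits."""
    total = 0
    # Work from the rightmost digit, doubling every second digit.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def has_valid_check_digit(number):
    """True when the last digit of number is the Luhn check digit of the rest."""
    return (number.isdigit() and len(number) > 1
            and luhn_check_digit(number[:-1]) == int(number[-1]))
```

In a real DBMS most of these rules would be declared on the table or the data entry form rather than written by hand; the sketch only shows what each check tests.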

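
The backup types and the grandfather, father, son rotation described under the backup and restore methods can be illustrated as follows. This is a minimal sketch, not a real backup tool: the function and class names are hypothetical, and modification times are represented by plain numbers instead of being read from a file system.

```python
from collections import deque


def files_to_copy(files, kind, last_full, last_backup):
    """Pick the files that one backup run would copy.

    files maps a file name to its last-modified time.
    last_full is the time of the last full backup.
    last_backup is the time of the most recent full or incremental backup.
    """
    if kind == "full":
        # Full: copy everything.
        return sorted(files)
    if kind == "differential":
        # Differential: everything changed since the last full backup.
        return sorted(n for n, modified in files.items() if modified > last_full)
    if kind == "incremental":
        # Incremental: everything changed since the last full or incremental backup.
        return sorted(n for n, modified in files.items() if modified > last_backup)
    raise ValueError(f"unknown backup kind: {kind}")


class ThreeGenerationBackup:
    """Keeps the last three backups: grandfather, father and son."""

    def __init__(self):
        # Oldest backup first; maxlen=3 discards (reuses) the oldest one.
        self._generations = deque(maxlen=3)

    def add_backup(self, label):
        """Record a new backup; it becomes the son."""
        self._generations.append(label)

    def generations(self):
        """Return the current backups by generation, newest first."""
        names = ("son", "father", "grandfather")
        return dict(zip(names, reversed(self._generations)))


files = {"a.txt": 1, "b.txt": 5, "c.txt": 9}
rotation = ThreeGenerationBackup()
for day in ("Mon", "Tue", "Wed", "Thu"):
    rotation.add_backup(day)
```

After the fourth backup the Monday medium has been reused, so the three generations held are Tuesday, Wednesday and Thursday.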
customer_name varchar2(20). CREATE TABLE dmot_depositor ( account_number char(5). INSERT INTO dmot_branch VALUES ('Jamaica'. balance number. account_number). © Copyright G. 'East Village'. customer_city varchar2(20). 'Canal St'. 150000). INSERT INTO dmot_account VALUES ('A-105'. CREATE TABLE dmot_borrower ( loan_number char(5). 'Park Slope'. 'East Village'. foreign key (loan_number) references dmot_loan). CREATE TABLE dmot_customer ( customer_name varchar2(20). 'New York'. 200000). CREATE TABLE dmot_branch ( branch_name varchar2(20). primary key (customer_name)). 'Jamaica'). assets number. DROP TABLE dmot_account. INSERT INTO dmot_customer VALUES ('Christina'. 350000). CREATE TABLE dmot_loan ( loan_number char(5). DROP TABLE dmot_customer. customer_street varchar2(20). branch_city varchar2(20). INSERT INTO dmot_account VALUES ('A-103'. 220000000). DROP TABLE dmot_borrower. 300000000). 'Jamaica'. loan_number). CREATE TABLE dmot_account ( account_number char(5). INSERT INTO dmot_branch VALUES ('East Village'. foreign key (branch_name) references dmot_branch). foreign key (customer_name) references dmot_customer. INSERT INTO dmot_account VALUES ('A-102'. branch_name varchar2(20). DROP TABLE dmot_loan. 'Jay St'. INSERT INTO dmot_account VALUES ('A-104'. 'Broadway'. INSERT INTO dmot_customer VALUES ('Adams'. '7th Ave'. foreign key (account_number) references dmot_account). INSERT INTO dmot_branch VALUES ('Brooklyn Heights'. INSERT INTO dmot_customer VALUES ('Bob'. INSERT INTO dmot_customer VALUES ('Susan'. 'Jamaica'. primary key (account_number). 'Brooklyn'. primary key (loan_number). 500000). INSERT INTO dmot_branch VALUES ('SOHO'. 'East Village'. 450000). Campbell 2010 121 . amount number. 'New York').Database Management SAMPLE SQL CODE FOR RECREATING DATABASE DROP TABLE dmot_depositor. 'Brooklyn'). DROP TABLE dmot_branch. 'Brooklyn'). primary key (customer_name. foreign key (branch_name) references dmot_branch). 'New York'). 
foreign key (customer_name) references dmot_customer. 'New York'. primary key (customer_name. primary key (branch_name)). INSERT INTO dmot_customer VALUES ('Joe'. INSERT INTO dmot_branch VALUES ('Park Slope'. 'New York'). customer_name varchar2(20). 'Park Ave'. INSERT INTO dmot_account VALUES ('A-101'. 200000000). '112th St'. INSERT INTO dmot_customer VALUES ('Johnson'. 150000000). 180000000). branch_name varchar2(20). 'Brooklyn'.

100000). 'Bob'). 'Bob'). 'Jamaica'. 'Park Slope'. 'Susan'). INSERT INTO dmot_loan VALUES ('L-103'. INSERT INTO dmot_borrower VALUES ('L-101'. INSERT INTO dmot_loan VALUES ('L-102'. 'Park Slope'. INSERT INTO dmot_borrower VALUES ('L-106'. 'SOHO'. 'Jamaica'. INSERT INTO dmot_depositor VALUES ('A-108'. 220000). 'Joe'). © Copyright G. 'Johnson'). 'Johnson'). INSERT INTO dmot_account VALUES ('A-108'. 'Joe'). INSERT INTO dmot_loan VALUES ('L-104'. INSERT INTO dmot_borrower VALUES ('L-104'. 100000). INSERT INTO dmot_loan VALUES ('L-106'. 'Adams'). INSERT INTO dmot_depositor VALUES ('A-105'. 'Adams'). 180000). INSERT INTO dmot_depositor VALUES ('A-104'. 'Jamaica'. INSERT INTO dmot_depositor VALUES ('A-101'. 'Bob'). INSERT INTO dmot_depositor VALUES ('A-102'. INSERT INTO dmot_loan VALUES ('L-105'. 'Susan'). INSERT INTO dmot_account VALUES ('A-107'. INSERT INTO dmot_borrower VALUES ('L-102'. 50000). 'Park Slope'. 'Brooklyn Heights'. 'Bob'). Campbell 2010 122 . INSERT INTO dmot_depositor VALUES ('A-103'. INSERT INTO dmot_borrower VALUES ('L-103'. You will need to create a similar text file and execute it each time you need to recreate your tables and data quickly. NB. 'East Village'. 200000). INSERT INTO dmot_depositor VALUES ('A-106'. 150000). INSERT INTO dmot_depositor VALUES ('A-107'.Database Management INSERT INTO dmot_account VALUES ('A-106'. 'Susan'). 120000). INSERT INTO dmot_borrower VALUES ('L-105'. INSERT INTO dmot_loan VALUES ('L-101'. 'Christina'). 100000).
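
The key rules from the Integrity Preservation section (uniqueness of the primary key, referential integrity) and the use of a view to limit what a user sees can be tried out without an Oracle installation. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 module instead of Oracle, simplified column types, only two of the sample tables, and made-up rows ('Demo Branch', 'X-001') that are not part of the sample data. The view name and the helper function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE dmot_branch (
    branch_name TEXT PRIMARY KEY,
    branch_city TEXT,
    assets      NUMERIC
);
CREATE TABLE dmot_account (
    account_number TEXT PRIMARY KEY,
    branch_name    TEXT REFERENCES dmot_branch,
    balance        NUMERIC
);
INSERT INTO dmot_branch VALUES ('Demo Branch', 'Demo City', 1000);
INSERT INTO dmot_account VALUES ('X-001', 'Demo Branch', 500);

-- A view (virtual table) that hides the balance column.
CREATE VIEW account_branches AS
    SELECT account_number, branch_name FROM dmot_account;
""")


def try_insert(row):
    """Try to add an account row and report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO dmot_account VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Same primary key as an existing row: breaks uniqueness of key.
duplicate = try_insert(("X-001", "Demo Branch", 10))
# Branch that does not exist: breaks referential integrity.
orphan = try_insert(("X-002", "No Such Branch", 10))
# New key and an existing branch: allowed.
valid = try_insert(("X-003", "Demo Branch", 10))

# A user who queries the view sees only the columns it exposes.
cursor = conn.execute("SELECT * FROM account_branches")
view_columns = [column[0] for column in cursor.description]
```

Grant and Revoke are not shown because SQLite has no user accounts; in Oracle the view would typically be combined with GRANT SELECT on the view only.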

