
INTRODUCTION TO DATABASE

• Data: Raw, unorganized facts (such as letters, numbers, or symbols) that refer to, or
represent, conditions, ideas, or objects.

• Information: Data processed to be useful in decision making

• Data item - smallest named unit of data that has meaning in the real world (examples:
last name, address, SSN, political party)

• Data aggregate (or group) - a collection of related data items that form a whole concept;
a simple group is a fixed collection, e.g. date (month, day, year); a repeating group is a
variable length collection, e.g. a set of aliases.

• File - collection of records of a single type (examples: president, election)

• Database - computerized collection of interrelated stored data that serves the needs of
multiple users within one or more organizations, i.e. interrelated collections of records of
potentially many types. Motivation for databases over files: integration for easy access
and update, non-redundancy, multi-access.

• Server-Based Database: It is a multiuser database that is designed to be hosted on a
server instead of a desktop.

• Desktop Database: This is a database that is primarily designed to stand alone on a
desktop or a PC for a single user. Microsoft Access and FileMaker Pro are good
examples.

• Open Source Software (OSS): Computer software available with its source code under
an open source license to study, change and improve its design.

• Three-Tier Architecture: This is a conceptual model for a Web database application
that has at its base the database tier (with database management system), then the middle
tier (hosting the application logic), and finally the client tier (usually the browser).

• Metadata: Data that describes data

• Case-Sensitive - Distinguishing between uppercase and lowercase characters. In this
context, “SPUD,” “Spud” and “spud” would all be considered as different strings, so the
case-sensitivity of a function or query will influence the values it returns.

• Domain - The domain of a database attribute is the set of all allowable values that the
attribute may assume. The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values.

• Example: A field for gender may have the domain {male, female, unknown}, where
those three values are the only permitted entries in that column.

• Table/relation – The view that displays the database as a combination of rows
(records) and columns (fields). The cells contain the individual pieces of data for each
record in each field. The first row of a table is reserved for the field names.

• View - A database component that behaves exactly like a table but has no independent
existence of its own; a virtual table.

• Field/column - A component of a relation or table that holds a single attribute of that
relation or table.

• Record/row/tuple: An instance of data in a table. A record is a collection of all the facts
related to one physical or conceptual entity, often referring to a single object or person

• Database schema, or simply the schema, is the structure of the database, which
describes how data are organized and stored in the database. In a relational database, the
schema defines the tables, the fields in each table, and the relationships between fields
and tables.

• Instance is an occurrence of a data object in a database.

• Data Redundancy - Having the same data stored in more than one place in a database.

• Null Value - A field that contains a data item has a specific value. A field that does not
contain a data item is said to have a null value. In a numeric field, a null value is not the
same as a value of zero; in a character field, a null value is not the same as a blank. Both
the numeric zero and the blank character are definite values. A null value indicates that
the field's value is undefined: its value is not known.

• Database administrator (DBA) - person or group responsible for the effective use of
database technology in an organization or enterprise. Roles of the DBA include designing
logical/physical schemas, handling security and authorization, ensuring data availability
(crash recovery), and tuning the database as needs evolve.
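
Three of the definitions above (domain, null value, and case sensitivity) can be observed in a small experiment. The following is a minimal sketch using Python's built-in sqlite3 module; the table `person` and its columns are hypothetical names chosen for illustration, and other DBMSs may behave differently, especially with regard to case sensitivity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Domain: the CHECK constraint restricts gender to an enumerated list of values.
conn.execute(
    """
    CREATE TABLE person (
        name   TEXT NOT NULL,
        gender TEXT CHECK (gender IN ('male', 'female', 'unknown')),
        age    INTEGER              -- may be NULL when the age is not known
    )
    """
)
conn.execute("INSERT INTO person VALUES ('Spud', 'male', 0)")
conn.execute("INSERT INTO person VALUES ('Ada', 'female', NULL)")

# A value outside the domain is rejected by the DBMS.
try:
    conn.execute("INSERT INTO person VALUES ('Bob', 'robot', 30)")
    domain_enforced = False
except sqlite3.IntegrityError:
    domain_enforced = True

# Null value: NULL is not the same as zero, so the two queries match different rows.
null_ages = [row[0] for row in conn.execute("SELECT name FROM person WHERE age IS NULL")]
zero_ages = [row[0] for row in conn.execute("SELECT name FROM person WHERE age = 0")]

# Case sensitivity: '=' on text is case-sensitive by default in SQLite,
# while COLLATE NOCASE ignores case for ASCII letters.
exact = conn.execute("SELECT COUNT(*) FROM person WHERE name = 'SPUD'").fetchone()[0]
nocase = conn.execute(
    "SELECT COUNT(*) FROM person WHERE name = 'SPUD' COLLATE NOCASE"
).fetchone()[0]

print(domain_enforced, null_ages, zero_ages, exact, nocase)
```

The same query text therefore returns different rows depending on whether the comparison is case-sensitive, which is the point made in the Case-Sensitive definition.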

Duties of database administrator

One of the main reasons for using DBMS is to have a central control of both data and the
programs accessing those data. A person who has such control over the system is called a
Database Administrator (DBA). The following are the functions of a Database administrator
• Schema Definition
The Database Administrator creates the database schema by executing DDL statements.
The schema includes the logical structure of each database table (relation), such as the
data types of attributes, the length of attributes, integrity constraints, etc.

• Storage structure and access method definition
Database tables or indexes can be stored in several ways, such as flat files, trees,
etc.

• Schema and physical organization modification
The DBA carries out changes to the existing schema and physical organization.

• Database Security: Ensuring that only authorized users have access to the database and
fortifying it against any external, unauthorized access.

• Database Tuning: Tweaking any of several parameters to optimize performance, such as
server memory allocation, file fragmentation and disk usage.

• Backup and Recovery: It is a DBA's role to ensure that the database has adequate
backup and recovery procedures in place to recover from any accidental or deliberate loss
of data.

• Producing Reports from Queries: DBAs are frequently called upon to generate reports
by writing queries, which are then run against the database.
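
The schema definition, access method, and schema modification duties listed above can be sketched with a few DDL statements. This is only an illustration using Python's built-in sqlite3 module; the table `president`, its columns, and the index name are hypothetical, and a production DBMS would offer many more storage options than SQLite does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema definition: a DDL statement giving data types, a length limit,
# and integrity constraints for each attribute.
conn.execute(
    """
    CREATE TABLE president (
        president_id INTEGER PRIMARY KEY,
        last_name    TEXT NOT NULL CHECK (length(last_name) <= 40),
        party        TEXT
    )
    """
)

# Access method definition: an index (stored as a tree in SQLite) on last_name.
conn.execute("CREATE INDEX idx_president_last_name ON president (last_name)")

# Schema modification: add a column to the existing table.
conn.execute("ALTER TABLE president ADD COLUMN birth_year INTEGER")

# Metadata: the DBMS keeps a description of the schema that can itself be queried.
objects = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(president)")]

print(objects)
print(columns)
```

Querying `sqlite_master` and `PRAGMA table_info` returns data about the data, which is the metadata described in the definitions earlier in these notes.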

DATABASE MODELS
Data model is the way in which data is organised for storage in a database. There are a number
of different types of database models, also referred to as DBMS models. Each one represents a
somewhat different approach to organizing data in a systematic manner. They include:

• Flat files

• Hierarchical

• Network

• Relational

• Object oriented

The relational database model is by far the most widely used.

Flat files
• The most basic way to organize data is as a flat file. You can think of this as a single
table with a large number of records and fields. Everything you need is stored in this
table, or flat file.

Hierarchical

• In a hierarchical database model, each data item is subordinate to another one. This is
called a parent-child relationship. The hierarchical data model organizes data in a tree-like
structure.

• One of the rules of a hierarchical database is that a parent can have multiple children, but
a child can only have one parent. For example, think of an online store that sells many
different products. The entire product catalog would be the parent, and the various types
of products, such as books, electronics, etc., would be the children.

Network

• In a network database model, every data item can be related to many other ones. The
database structure is like a graph. This is similar to the hierarchical model, except that a
child is allowed to have more than one parent. In the example of the product catalog, a
book could fall into more than one category. The structure of a network database
becomes more like a cobweb of connected elements.
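
The difference between the hierarchical and network models can be illustrated with plain Python dictionaries. This is a conceptual sketch only, not how a hierarchical or network DBMS stores its data; the category names and the helper functions `path_to_root` and `ancestors` are invented for the example.

```python
# Hierarchical model: each child records exactly one parent (a tree).
hierarchical_parent = {
    "books": "catalog",
    "electronics": "catalog",
    "novel": "books",
}

# Network model: each child may record several parents (a graph).
network_parents = {
    "books": ["catalog"],
    "gifts": ["catalog"],
    "novel": ["books", "gifts"],  # one book filed under two categories
}


def path_to_root(item, parent_of):
    """Follow the single parent link upward until the root is reached."""
    path = [item]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def ancestors(item, parents_of):
    """Collect every item reachable by following any parent link."""
    found = set()
    stack = list(parents_of.get(item, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents_of.get(current, []))
    return found


novel_path = path_to_root("novel", hierarchical_parent)
novel_ancestors = ancestors("novel", network_parents)
print(novel_path)
print(sorted(novel_ancestors))
```

In the tree there is exactly one path from any item to the root, whereas in the graph an item can be reached through several parents, so a set of ancestors replaces the single path.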

Relational

• In a relational database model all data are organized in the form of tables. This DBMS
model emerged in the 1970s and has become by far the most widely used type of DBMS.
Most of the DBMS software developed over the past few decades uses this model. In a
table, each row represents a record, also referred to as an entity. Each column represents
a field, also referred to as an attribute of the entity.

Object oriented

• Object databases consist of objects rather than relations, tables or other data structures.

• Objects basically consist of the following:

• Attributes - Attributes are data which define the characteristics of an object. This data
may be simple such as integers, strings, and real numbers or it may be a reference to a
complex object.

• Methods - Methods define the behavior of an object and are what was formerly called
procedures or functions.

• Objects = Attributes + Behaviour (Operations)
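
The idea that an object combines attributes with behaviour can be shown with an ordinary Python class. This is a sketch of the concept rather than of any particular object database product; the classes `Employee` and `Address` and the method `give_raise` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Employee:
    # Attributes: simple values plus a reference to a complex object.
    name: str
    salary: float
    address: Address

    # Method: behaviour that is defined together with the data it acts on.
    def give_raise(self, percent: float) -> None:
        self.salary = self.salary * (1 + percent / 100)


emp = Employee("A. Example", 1000.0, Address("1 Main St", "Springfield"))
emp.give_raise(10)
print(emp.salary, emp.address.city)
```

Here `name` and `salary` are simple attributes, `address` is a reference to another object, and `give_raise` is the operation that would formerly have been a separate procedure.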

DATA MODELLING TECHNIQUES


• We are using object-oriented techniques with UML to design the data model

• There are other methods which are also commonly used in database design

• One widely used method is called Entity Relationship Modelling (ERM)

• ERM represents the data model as an Entity Relationship Diagram (ERD)

RELATIONAL DATABASE
A relational database is a collection of data items organized as a set of formally-described tables
from which data can be accessed or reassembled in many different ways without having to
reorganize the database tables.

DBMS
• A Database Management System (DBMS) is a software package designed to store and
manage databases.

• Database management systems (DBMSs) are specially designed software applications
that interact with the user, other applications, and the database itself to capture and
analyse data. A general-purpose DBMS is a software system designed to allow the
definition, creation, querying, update, and administration of databases.

Popular DBMSs
• Microsoft Access. Aimed at small businesses, and useful for desktop applications and
systems with a small number of users

• Microsoft SQL Server, Oracle, IBM DB2. Scalable and secure, and widely used by
large organisations

• MySQL. Open-source and quite powerful, widely used in web sites

• Microsoft SQL Server Compact, Java DB, SQLite. Compact DBMSs, suitable for
mobile devices in particular
DBMS COMPONENTS
• Hardware. The physical computer system that allows physical access to data

• Software. The actual program that allows users to access, maintain, and update physical
data

• Data. Stored physically on the storage devices

• Users. Include end users and application programs

• Procedure. A set of procedures (rules) that should be clearly defined and followed by the
users of the database

DATABASE APPLICATIONS
• Banking: all transactions

• Airlines: reservations, schedules

• Universities: registration, grades

• Sales: customers, products, purchases

• Manufacturing: production, inventory, orders, supply chain

• Human resources: employee records, salaries, tax deductions

Databases touch all aspects of our lives

ADVANTAGES OF DATABASE SYSTEMS


1. Data Independence: The data is held in such a way that changes to the structure of
the database do not affect any of the programs used to access the data.
2. Consistency of Data: Each item of data is held only once, so there is no danger of an
item being updated on one system and not on another.
3. Control over Redundancy: In a non-database system, the same information may be
held on several files. This wastes space and makes updating more time-consuming. A
database system minimizes these effects.
4. Integrity of Data: The DBMS provides users with the ability to specify constraints
on data such as making a field entry essential or using a validation routine.
5. Greater Security of Data: The DBMS can ensure only authorized users are allowed
access to the data.
6. Centralized Control of Data: The Database Administrator will control who has
access to what and will structure the database with the needs of the whole department
in mind.
7. More Information Available to Users: Users have access to a wider range of data
that was previously held in separate departments and sometimes on incompatible
systems.
8. Increased Productivity: The DBMS provides an easy to use query language that
allows users to get immediate response from their queries rather than having to use a
specialist "programmer" to write queries for them.

DISADVANTAGES OF DATABASE SYSTEMS


1. Larger Size: More disk space is required and probably a larger and more powerful
computer.
2. Greater Complexity: For optimum use the database must be very carefully
designed. If not done well, the new system may fail to satisfy anyone.
3. Greater Impact of System Failure: "All eggs in one basket."
4. More Complex Recovery Procedures: If a system failure occurs it is vital that no
data is lost.

PROCESS/STEPS IN DATABASE DESIGN


The process of database design is divided into different parts. It consists of a series of steps.
They are

• Requirement Analysis

• Conceptual Database Design (ER-Diagram)

• Logical Database Design (Tables, Normalization)

• Physical Database design (Table Indexing, Clustering)

• Database implementation, monitoring, and modification

Requirement Analysis

In this phase a detailed analysis of the requirement is done. The objective of this phase is to get a
clear understanding of the requirements. It makes use of various information gathering methods
for this purpose. Some of them are

• Interview

• Analyzing documents
• Survey

• Site visit

Conceptual Database Design

The requirement analysis is modeled in this conceptual design. The ER Model is used at the
conceptual design stage of the database design. The ER diagram is used to represent this
conceptual design. ER diagram consists of Entities, Attributes and Relationships.

Logical Database Design

Once the relationships and dependencies are identified, the data can be arranged into logical
structures and mapped into database management system tables. Normalization is performed
to bring the relations into appropriate normal forms.

Physical Database Design

It deals with the physical implementation of the database in a database management system. It
includes the specification of data elements, data types, indexing etc. All this information is
stored in the data dictionary.

INTEGRITY RULES AND CONSTRAINTS


Before one can start to implement the database tables, one must define the integrity constraints.
Integrity means that the data are correct and consistent. The data in a database must be right and
in good condition.

There are domain integrity, entity integrity, referential integrity and foreign key
integrity constraints.

• Domain Integrity. Means the definition of a valid set of values for an attribute. You
define the data type, the length or size, whether a null value is allowed, and whether the
value must be unique for an attribute.
You may also define the default value, the range (values in between) and/or specific
values for the attribute. Some DBMSs allow you to define the output format and/or input
mask for the attribute.
These definitions ensure that a specific attribute will have a right and proper value in the
database.

• Entity Integrity. Every table requires a primary key. Neither the primary key nor any part
of it can contain NULL values. This is because NULL values for the primary
key mean we cannot identify some rows. For example, in the EMPLOYEE table, Phone
cannot be a key since some people may not have a phone.

• Referential Integrity Constraint. The referential integrity constraint is specified
between two tables and it is used to maintain the consistency among rows between the
two tables. It also includes the techniques known as cascading update and cascading
delete, which ensure that changes made to the primary table are reflected in the linked
table.
The rules are:
1. You can't delete a record from a primary table if matching records exist in a related
table.
2. You can't change a primary key value in the primary table if that record has related
records.
3. You can't enter a value in the foreign key field of the related table that doesn't exist in
the primary key of the primary table.
4. However, you can enter a Null value in the foreign key, specifying that the records are
unrelated.

• Foreign Key Integrity Constraint. There are two foreign key integrity constraints:
cascade update related fields and cascade delete related rows. These constraints affect the
referential integrity constraint.
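
The entity integrity rule and the four referential integrity rules above can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module; the tables `department` (primary table) and `employee` (related table) and the helper `rejected` are hypothetical names. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and that the primary key column is declared NOT NULL explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript(
    """
    CREATE TABLE department (          -- the primary table
        dept_id TEXT PRIMARY KEY NOT NULL,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (            -- the related table
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id TEXT REFERENCES department (dept_id)
    );
    INSERT INTO department VALUES ('D1', 'Sales');
    INSERT INTO employee VALUES (1, 'Ann', 'D1');
    """
)


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Entity integrity: a NULL or duplicate primary key is rejected.
null_pk = rejected("INSERT INTO department VALUES (NULL, 'Ghost')")
dup_pk = rejected("INSERT INTO department VALUES ('D1', 'Copy')")

# Rule 1: cannot delete a primary row that still has matching related rows.
rule1 = rejected("DELETE FROM department WHERE dept_id = 'D1'")
# Rule 2: cannot change a primary key value that related rows refer to.
rule2 = rejected("UPDATE department SET dept_id = 'D9' WHERE dept_id = 'D1'")
# Rule 3: cannot enter a foreign key value that has no matching primary key.
rule3 = rejected("INSERT INTO employee VALUES (2, 'Bob', 'D7')")
# Rule 4: a NULL foreign key is accepted and marks the row as unrelated.
rule4 = not rejected("INSERT INTO employee VALUES (3, 'Cy', NULL)")

print(null_pk, dup_pk, rule1, rule2, rule3, rule4)
```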

Cascade Update Related Fields


Any time you change the primary key of a row in the primary table, the foreign key
values are updated in the matching rows in the related table. This constraint overrules
rule 2 in the referential integrity constraints. 

Cascade Delete Related Rows


Any time you delete a row in the primary table, the matching rows are automatically
deleted in the related table. This constraint overrules rule 1 in the referential integrity
constraints. 
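
Both cascade options can be sketched in the same way. This example again uses Python's built-in sqlite3 module with hypothetical tables `customer` (primary table) and `orders` (related table); the `ON UPDATE CASCADE` and `ON DELETE CASCADE` clauses are standard SQL, but the exact syntax and defaults should be checked for the DBMS actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys

conn.executescript(
    """
    CREATE TABLE customer (
        cust_id TEXT PRIMARY KEY NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  TEXT REFERENCES customer (cust_id)
                 ON UPDATE CASCADE
                 ON DELETE CASCADE
    );
    INSERT INTO customer VALUES ('C1'), ('C2');
    INSERT INTO orders VALUES (10, 'C1'), (11, 'C1'), (12, 'C2');
    """
)

# Cascade update: changing the primary key also changes the matching foreign keys.
conn.execute("UPDATE customer SET cust_id = 'C100' WHERE cust_id = 'C1'")
after_update = conn.execute(
    "SELECT order_id, cust_id FROM orders ORDER BY order_id"
).fetchall()

# Cascade delete: deleting the primary row also deletes the matching related rows.
conn.execute("DELETE FROM customer WHERE cust_id = 'C100'")
after_delete = conn.execute(
    "SELECT order_id, cust_id FROM orders ORDER BY order_id"
).fetchall()

print(after_update)
print(after_delete)
```

Without the two cascade clauses, both the UPDATE and the DELETE would be refused under rules 2 and 1 respectively.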

BUSINESS RULES
Business rules, also referred to as semantics, are obtained from users when gathering
requirements.  The requirements gathering process is very important and should be verified by
the user before the database design is built.  If the business rules are incorrect, the design will be
incorrect and ultimately the application built will not function as expected by the users.

Some examples of business rules are:


• A teacher can teach many students

• A class can have a maximum of 35 students

• A course can be taught many times, but by only one instructor

• Not all teachers teach classes, etc

The Entity-Relationship Model


• The E-R model is a detailed, logical representation of the data for an organisation or
business area

• It must be flexible enough so that it can be used and understood in practically any
environment where information is modelled

The ER model
• It is expressed in terms of entities in the business environment, the relationships (or
associations) among those entities and the attributes (properties) of both the entities and
their relationships

• The E-R model is usually expressed as an E-R diagram

E-R Model Constructs


• Entity - person, place, object, event, concept

• Entity Type - is a collection of entities that share common properties or characteristics.
Each entity type is given a name. Since this name represents a set of items, it is always
singular. It is placed inside the box representing the entity type.

• Entity instance – is a single occurrence of an entity type. An entity type is described just
once (using metadata) in a database, while many instances of that entity type may be
represented by data stored in the database. e.g. – there is one EMPLOYEE entity type in
most organizations, but there may be hundreds of instances of this entity stored in the
database
Sample E-R Diagram

Strong versus Weak entity type


Most of the basic entity types are classified as strong entity types [Rectangle]. A strong
entity type is one that exists independently from other entity types (such as EMPLOYEE).

Strong entity types always have a unique characteristic (identifier): an attribute or
combination of attributes that uniquely distinguishes each occurrence of that entity.

A weak entity type [[Double Rectangle]] is one whose existence depends on some other entity
type. It has no meaning in the ER diagram without the entity on which it depends (such as
DEPENDENT).

The entity type on which the weak entity type depends is called the Identifying owner (or
owner for short).

Identifying relationship is the relationship between a weak entity type and its owner
(such as ‘Has’ in the following Fig.)

Weak entity identifier is its partial identifier (double underline) combined with that of its
owner. During a later design stage dependent name will be combined with Employee_ID
(the identifier of the owner) to form a full identifier for DEPENDENT.
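
One common way to carry the weak entity DEPENDENT into tables is to make its full identifier a composite primary key that includes the owner's identifier. The sketch below uses Python's built-in sqlite3 module; the lower-case table and column names are adaptations of the names in the text, and the mapping shown is an assumption about the later design stage rather than something the ER diagram itself prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE employee (            -- strong entity (the identifying owner)
        employee_id INTEGER PRIMARY KEY
    );
    CREATE TABLE dependent (           -- weak entity
        employee_id    INTEGER NOT NULL REFERENCES employee (employee_id),
        dependent_name TEXT NOT NULL,  -- partial identifier
        PRIMARY KEY (employee_id, dependent_name)
    );
    INSERT INTO employee VALUES (1), (2);
    -- The same dependent name may appear under different owners.
    INSERT INTO dependent VALUES (1, 'Sam'), (2, 'Sam');
    """
)

# The full identifier (owner + partial identifier) must be unique.
try:
    conn.execute("INSERT INTO dependent VALUES (1, 'Sam')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A dependent cannot exist without its owner.
try:
    conn.execute("INSERT INTO dependent VALUES (99, 'Lee')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

dependent_count = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
print(duplicate_rejected, orphan_rejected, dependent_count)
```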

Example of a weak entity

Attributes
• An attribute is a property or characteristic of an entity type, for example the entity
EMPLOYEE may have attributes Employee_Name and Employee_Address.

• In ER diagrams, place each attribute's name in an ellipse with a line connecting it to its
associated entity

• Attributes may also be associated with relationships

• An attribute is associated with exactly one entity or relationship

Simple versus composite attributes


• Some attributes can be broken down into meaningful component parts, such as Address,
which can be broken down into Street Address, City, etc.

• The component attributes may appear above or below the composite attribute on an ER
diagram

• Composite attributes provide flexibility to users, who can refer to the attribute as a single
unit or to its individual components

• A simple (atomic) attribute is one that cannot be broken down into smaller components

A composite attribute

Single-Valued versus Multivalued Attribute


• It frequently happens that there is an attribute that may have more than one value for a
given instance, e.g. EMPLOYEE may have more than one Skill.

• A multivalued attribute is one that may take on more than one value – it is represented by
an ellipse with double lines

Entity with a multivalued attribute (Skill) and derived attribute (Years_Employed)

Derived Attributes
• Some attribute values can be calculated or derived from others

• e.g., if Years_Employed needs to be calculated for EMPLOYEE, it can be calculated
using Date_Employed and Today's_Date

• A derived attribute is one whose value can be calculated from related attribute values
(plus possibly other data not in the database)

• A derived attribute is signified by an ellipse with a dashed line (see previous Fig.)
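
The Years_Employed calculation described above can be written out as a short function. This is a sketch that assumes "years employed" means completed whole years; the class `Employee` and the method `years_employed` are illustrative names, and today's date is passed in as a parameter because it is data that is not stored in the database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    name: str
    date_employed: date  # stored attribute

    def years_employed(self, today: date) -> int:
        """Derived attribute: whole years between date_employed and today."""
        years = today.year - self.date_employed.year
        # Subtract one if this year's anniversary has not been reached yet.
        if (today.month, today.day) < (self.date_employed.month, self.date_employed.day):
            years -= 1
        return years


emp = Employee("A. Example", date(2015, 6, 1))
before_anniversary = emp.years_employed(date(2024, 5, 31))
on_anniversary = emp.years_employed(date(2024, 6, 1))
print(before_anniversary, on_anniversary)
```

Because the value is computed on demand, it never needs to be stored and cannot become out of date.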

Identifier attribute
• Identifier attribute or Key is an attribute (or combination of attributes) that uniquely
identifies individual instances of an entity type, such as Student_ID

• To be a candidate identifier, each entity instance must have a single value for the
attribute, and the attribute must be associated with each entity

• The identifier attribute is underlined, such as Student_ID

Primary Key (PK) - Is a value that can be used to identify a unique row in a
table. In the relational model of data, a primary key is a candidate key chosen as the main
method of uniquely identifying a tuple in a relation.

To qualify as a primary key for an entity, an attribute must have the following properties:

• It must have a non-null value for each instance of the entity

• The value must be unique for each instance of an entity

• The values must not change or become null during the life of each entity instance

Types of Key

• Candidate key - A candidate key is a field or combination of fields that can act as a
primary key field for that table to uniquely identify each record in that table.  It can be
defined as minimal Super Key or irreducible Super Key

• Alternate key - An alternate key is any candidate key which is not selected to be the
primary key.

• Super Key – An attribute or a combination of attributes that is used to identify the records
uniquely is known as a Super Key. A table can have many Super Keys.

• Secondary (Index) Key: An attribute or a set of attributes that has been used to construct
the data retrieval index.

• Concatenated (Combined, Compound or Composite) Key: A key that consists of two or
more attributes.

• Foreign key - a foreign key (FK) is a field or group of fields in a database record that
points to a key field or group of fields forming a key of another database record in some
(usually different) table. Usually a foreign key in one table refers to the primary key (PK)
of another table. This way references can be made to link information together and it is
an essential part of database normalization

• Natural key.  A primary key which is composed of one or more attributes (fields) which
already exist in the real world, e.g. First Name, Last Name, Social Security Number.

• Surrogate key.  A primary key which is internally generated (typically an auto-incremental
integer value) and that does not exist in the real world, e.g. ID=1 for Customer A
and ID=2 for Customer B. It serves to uniquely identify the record but has no bearing on the
customers themselves and is an attribute they will never (need to) be aware of.

• Unique key (UK) - uniquely identifies each record in a database table.

• There can be more than one unique key in one table.

• A unique key can have null values.
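
The relationship between super keys and candidate keys defined above can be checked mechanically on sample data. The sketch below is plain Python; the sample rows and the function names `is_super_key` and `candidate_keys` are invented for illustration. It only tests the rows it is given, whereas a real key is a property of every row the table may ever hold, so the result is a hint and not a proof.

```python
from itertools import combinations

rows = [
    {"student_id": 1, "email": "a@example.org", "last_name": "Lee"},
    {"student_id": 2, "email": "b@example.org", "last_name": "Lee"},
    {"student_id": 3, "email": "c@example.org", "last_name": "Kim"},
]


def is_super_key(attrs, rows):
    """True if no two rows share the same combination of values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows):
    """Minimal super keys: super keys that contain no smaller super key."""
    names = sorted(rows[0])
    found = []
    # Try small attribute sets first so that minimal keys are found before supersets.
    for size in range(1, len(names) + 1):
        for attrs in combinations(names, size):
            contains_smaller_key = any(set(key) <= set(attrs) for key in found)
            if not contains_smaller_key and is_super_key(attrs, rows):
                found.append(attrs)
    return found


keys = candidate_keys(rows)
print(keys)
print(is_super_key(("last_name", "student_id"), rows))
```

With these rows, `student_id` and `email` are both candidate keys; whichever is chosen as the primary key, the other becomes an alternate key. The pair (`last_name`, `student_id`) is a super key but not a candidate key, because `last_name` can be removed from it.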

Difference between Primary Key and Unique Key

Primary Key                                     | Unique Key
Primary key can't accept null values.           | Unique key can accept null values.
By default, the primary key is a clustered      | By default, a unique key is a unique
index, and data in the database table is        | non-clustered index.
physically organized in the sequence of the     |
clustered index.                                |
We can have only one primary key in a table.    | We can have more than one unique key in a table.

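
The null-handling and "how many per table" rows of the comparison above can be demonstrated with Python's built-in sqlite3 module. The table `customer` and its columns are hypothetical. Two caveats apply: how many NULLs a unique column may hold differs between DBMSs (SQLite allows several), and the clustered versus non-clustered defaults are DBMS-specific and are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- one primary key, generated automatically
        email       TEXT UNIQUE,           -- unique key, NULL allowed
        ssn         TEXT UNIQUE            -- a second unique key on the same table
    )
    """
)

# NULLs are accepted in the unique columns.
conn.execute("INSERT INTO customer (email, ssn) VALUES ('a@example.org', NULL)")
conn.execute("INSERT INTO customer (email, ssn) VALUES (NULL, NULL)")

# A repeated non-null value in a unique column is rejected.
try:
    conn.execute("INSERT INTO customer (email) VALUES ('a@example.org')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

ids = [row[0] for row in conn.execute("SELECT customer_id FROM customer ORDER BY customer_id")]
print(duplicate_rejected, ids)
```

The automatically generated `customer_id` values also illustrate a surrogate key: they identify each row but carry no real-world meaning.
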
Simple and composite key attributes
(a) Simple key attribute
Composite Identifier

• A composite identifier is used when there is no single (or atomic) attribute that can serve
as an identifier

• Flight_ID is a composite identifier that has component attributes Flight_Number and
Date – this combination is required to uniquely identify individual occurrences of Flight

• Flight_ID is underlined, whilst its components are not

(b) Composite key attribute
