Documenting Information Systems

DOCUMENTATION OF INFORMATION SYSTEMS
Several tools are available for documenting the flow of information within an organization, from a simple
descriptive method to a diagrammatic model. For example, using short statements to list the set of
activities to be performed within a department of an organization is an example of a simple descriptive
method.
Data Flow Diagrams (DFDs)

A data flow diagram, also known as DFD, is a graphical representation of the information flow within a
system. It is an example of a diagrammatic model. A data flow diagram can also be used for the
visualization of data processing (structured design). It is common practice for a designer to draw a
context-level DFD first which shows the interaction between the system and outside entities. This context-
level DFD is then "exploded" to show more detail of the system being modeled. With a dataflow diagram,
users are able to visualize how the system will operate, what the system will accomplish and how the
system will be implemented. Old system dataflow diagrams can be drawn up and compared with the new
systems dataflow diagrams to draw comparisons to implement a more efficient system. Dataflow diagrams
can be used to provide the end user with a physical idea of where the data they input, ultimately has an
effect upon the structure of the whole system from order to dispatch to restock how any system is
developed can be determined through a dataflow diagram.
Types of DFD
In analyzing a business, several sets of DFDs are drawn. Initial DFDs might model the existing system
(flaws and all), while later DFDs may model a solution to the problem being analyzed. For these solution
DFDs a logical and physical DFD is drawn. Physical DFDs represent physical files and transactions, while
logical or conceptual DFDs can be used to represent business functions or processes.
Components of a DFD
A data flow diagram illustrates the processes, data stores, and external entities in a business or other
system and the connecting data/resource flows.
The four components of a data flow diagram (DFD) are:
 External Entities (represented by a square or oval)
 Processes (represented by a rounded rectangle)
 Data Stores (represented by an open ended narrow rectangle)
 Data Flows (represented by an arrow)
 Resource Flows
Data Flow Diagrams – Diagram Notation

There are only five symbols that are used in the drawing of business process diagrams (data flow
diagrams). These are now explained, together with the rules that apply to them.

This diagram represents a banking process, which maintains customer accounts. In this example,
CAPE NOTES INFORMATION TECHNOLOGY Unit 2 1

customers can withdraw or deposit cash, request information about their account or update their account
details. The five different symbols used in this example represent the full set of symbols required to draw
any business process diagram.
External Entities
An external entity is a source (where the data comes from) or destination (where the data goes) of a data
flow which is outside the system but interacts with the system (e.g., person, department or organization).
When modeling a DFD, the designer is not interested in the inner workings of the external entity, but only
what data is produced/needed by the entity. The symbol used to represent an external entity is a square or
oval.
Student OR
Custome
r
Diagrams used to represent external entities
Process
A process shows a transformation or manipulation of data flows within the system. A process is a manual
or computerized routine that changes data by performing a calculation, updating information, sorting a list,
and so on. There are two ways in which a process is represented, one with a rounded rectangle, which is
numbered and labeled with a verb phase (except for the process numbered 0, which gives the system’s
name). Two, with a rectangle that has three parts an identification number, a location that appears to the
right of the identifier which describes where in the system the process takes place and a descriptive title
placed in the centre of the box. The process shows a part of the system that transforms inputs into outputs;
that is, it shows how one or more inputs are changed into outputs.
Enroll OR
student
Diagrams used to represent a process
Data Store
A data store is a holding place for information within the system. It is represented by an open ended
narrow rectangle. Data stores may be long-term files such as sales ledgers, or may be short-term
accumulations: for example batches of documents that are waiting to be processed. Each data store should
be given a reference followed by an arbitrary number. It could represent the data being held manually or
on a computer. Basically, a data store is a logical collection of data.
OR
D1Student’s Records

Diagram used to represent data store

Data flow
A data flow shows the flow of information from its source to its destination. A data flow is represented by
a line, with arrowheads showing the direction of flow. Information always flows to or from a process or a
data store and may be written, verbal or electronic.
Diagram used to represent data flow
Resource Flow
A resource flow shows the flow of any physical material from its source to its destination. For this reason
they are sometimes referred to as physical flows. The physical material in question should be given a
meaningful name. Resource flows are usually restricted to early, high-level diagrams and are used when a
description of the physical flow of materials is considered to be important to help the analysis.
Diagram used to represent resource
Rules governing how the diagram components relate to each other
Data Flows
For data flows the rules are as follows:
Data flows and resource flows are allowed between external entities and processes. Data flows are also
allowed between different external entities. However, data flows and resource flows are not allowed
between external entities and data stores.
Processes
For processes the data flow rules are as follows:
Data flows and resource flows are allowed between processes and external entities and between processes
and data stores. They are also allowed between different processes. In other words processes can
communicate with all other areas of the business process diagram.
Data Stores
For data stores the data flow rules are as follows:
Data flows and resource flows are allowed between data stores and processes. However, these flows are
not allowed between data stores and external entities or between one data store and another. In practice
this means that data stores cannot initiate a communication of information, they require a process to do
this.

Data Flow Diagrams – Context Diagrams (Level 0 Diagram)
The first step in constructing DFDs is to draw a context diagram. A context diagram is a top-level view of
an information system that shows the system’s boundaries and scope. To draw a context diagram, you
start by placing a single process symbol in the centre of the page. The symbol represents the entire system,
and you identify it as process 0. Then you place the entities around the perimeter of the page and use data
flows to connect the entities to the central process. You do not show any data stores in a context diagram
because data stores are internal to the system. In drawing the context diagram you should only be
concerned with the most important information flows. Remember that no data flow process diagram is
the definitive solution - there is no absolute right or wrong.

Context Diagram Example

The context diagram shown above represents a book lending library. The library receives details of books,
and orders books from one or more book suppliers. Books may be reserved and borrowed by members of
the public, who are required to give a borrower number. The library will notify borrowers when a reserved
book becomes available or when a borrowed book becomes overdue. In addition to supplying books, a
book supplier will furnish details of specific books in response to library enquiries.
Note, that communications involving external entities are only included where they involve the 'system'
process. Whilst a book supplier would communicate with various agencies, for example, publishers and
other suppliers - these data flow are remote from the 'system' process and so this is not included on the
context diagram.
Data Flow Diagrams – Level 1 Diagrams

The level 1 diagram shows the main functional areas of the system under investigation. As with the
context diagram, any system under investigation should be represented by only one level 1 diagram.

There is no formula that can be applied in deciding what is, and what is not, a level 1 process. Level 1
processes should describe only the main functional areas of the system, and you should avoid the
temptation of including lower level processes on this diagram. As a general rule no data flow diagram
should contain more than 12 process boxes.
The level 1 diagram is surrounded by the outline of a process box that represents the boundaries of the
system. Because the level 1 diagram depicts the whole of the system under investigation, it can be difficult
to know where to start.
Some additional guidelines for creating DFDs:
1. All processes must have at least one data flow in and one data flow out.
2. All processes should modify the incoming data, producing new forms of outgoing data.
3. Each data store must be involved with at least one data flow.
4. Each external entity must be involved with at least one data flow.
5. A data flow must be attached to at least one process.
6. Two entities, or two data stores, or an entity and a data store or vice versa, cannot be connected
together with a data flow.
7. Choose meaningful names for processes, flows, stores, and external entities.
8. Number the processes.
9. Avoid overly complex DFDs, do not cross lines.
10. Make sure the DFD is internally consistent and consistent with any associated DFDs.
11. The general principle in Data Flow Diagramming is that a system can be decomposed into
subsystems, and subsystems can be decomposed into lower level subsystems, and so on.
12. Each subsystem represents a process or activity in which data is processed. At the lowest level,
processes can no longer be decomposed.
13. Each process in a DFD has the characteristics of a system.
14. Data enters the system from the environment; data flows between processes within the system;
and data is produced as output from the system.

Dos and Don’ts
1. An Entity must be a noun (e.g. Member, Student, Customer etc.)
2. Two Data Stores, two Entities, or an Entity and a Data Store or vice versa, must not be
directly connected.
D1 D1
Video File Video File
Member Student
D1
Member Video File
N.B. Always use one or more processes

3. A Process must have at least one Data Flow entering it and one leaving it.
Video Number 1 Video Details

Get Video
Details
4.
1
Print airline Ticket A Process that has no input is called a Miracle. It is
tickets impossible for an information system to process data it
doesn’t have.
2
5. Serial numbers A Process that has no output is called a Black hole. A
Sort Serial
process that just does
numbers
something to data, but doesn’t put that data anywhere is
useless.
6. A Data Flow must not be split into two or more Data Flows

7. All Data Flows, Entities, Processes or Data Store must be labelled.
8. The Data Flow label leading to a Process must be different from the Data Flow label being
outputted from that same Process.
(If you put grapes into a blender and blend you will not get grapes when finished but grape juice)

Example of a Data Flow Diagram
This data flow diagram represents a system used in a video rental store.
D1 Video File
Video Record
1 1
Video Video Video
Number Get Details Co
Video mp
ute
Details Bill
Member Tot
Details Bill Total
al
Payment
Member Member
Number 1 1
Get Ge
Memb t
er Pa
Details
Member Payment ym Member
Record Detailsent
D1 Member File
Member
Record
1
Pri
nt
Re
Class Activity cei
pt
Use the Data Flow Diagram below to answer the following questions.
1. How many processes are in the diagram?

2. Give the names of two of the processes.
3. What action is done to bill amount?
4. What is bill amount?
5. How many data stores are in the diagram
6. Name an external entity.
Data Flow Diagrams – Practices Exercises
Question 1
A student completes an application to become a member of a library. The student takes the application to
the librarian who checks the form and then issue a membership card. The completed application is stored
as Member Record in file, named Member Folder.
To borrow books, a member takes the books and his membership card to the librarian. She updates her
cards in the Book Folder.
When a member returns the borrowed books to the library, the librarian updates her records.
New books come into the library periodically. The librarian records the relevant details on cards, which
are stored in the Book Folder.
Required: Construct a Context Diagram (Level 0).
Question 2
You have an address book in which you keep details of your friends. From time to time, you will make
amendments such as change of address or telephone number for ‘old’ friend or add details for ‘new’
friends. In addition, a friend may request that you get the address or telephone number (from the address
book) of another contact.
Required: Construct a Context Diagram (Level 0).

CAPE
Information Technology
Syllabus Focus: Unit 2 Module 1 Content 9
Specific Objective 9: Explain the concept of normalization;
Content: Definition of normalisation; attribute redundancy and anomalies; normal forms: including First
Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF); keys: primary, Foreign
and Composite (or compound or concatenated); Partial and Non-key Dependencies; relationships, use of
entity-relationship diagrams (ERD).
explain the concept of Definition of normalisation; attribute redundancy and

normalization; anomalies; normal forms: including first normal form
(1NF), second normal form (2NF), third normal form
(3NF); keys: primary, foreign and composite (or
compound or concatenated); partial and non-key
dependencies; relationships, use of entity-relationship
diagrams (ERD).
Database Normalisation, sometimes referred to as Canonical Synthesis, is a technique for designing

Relational Database tables to minimize duplication of information and, in so doing, to safeguard the
Database against certain types of logical or structural problems, namely Data Anomalies. For example,
when multiple instances of a given piece of information occur in a table, the possibility exists that these
instances will not be kept consistent when the data within the table is updated, leading to a loss of data
integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its
structure reflects the basic assumptions for when multiple instances of the same information should be
represented by a single instance only.
Higher degrees of normalization typically involve more tables and create the need for a larger number of
joins, which can reduce performance. Accordingly, more highly normalized tables are typically used in
database applications involving many isolated transactions (e.g. an automated teller machine), while less
normalized tables tend to be used in database applications that need to map complex relationships between
data entities and data attributes (e.g. a reporting application, or a full-text search application).
Database theory describes a table's degree of normalization in terms of normal forms of successively
higher degrees of strictness. A table in third normal form (3NF), for example, is consequently in second
normal form (2NF) as well; but the reverse is not necessarily the case.
Problems addressed by Normalisation
Although the Normal Forms are often defined informally in terms of the characteristics of tables, rigorous
definitions of the normal forms are concerned with the characteristics of mathematical constructs known
as relations. Whenever information is represented relationally, it is meaningful to consider the extent to
which the representation is normalised.

A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and
from anomalies involving data operations. In such a table:
 The same information can be

expressed on multiple records;
therefore updates to the table
may result in logical
inconsistencies. For example,
each record in an "Employees'
Skills" table might contain an
Employee ID, Employee
Address, and Skill; thus a
change of address for a
particular employee will potentially need to be applied to multiple records (one for each of his
skills). If the update is not carried through successfully—if, that is, the employee's address is
updated on some records but not others—then the table is left in an inconsistent state. Specifically,
the table provides conflicting answers to the question of what this particular employee's address is.
This phenomenon is known as an Update Anomaly.
 There are circumstances in which certain facts cannot
be recorded at all. For example, each record in a
"Faculty and Their Courses" table might contain a
Faculty ID, Faculty Name, Faculty Hire Date, and
Course Code—thus we can record the details of any
faculty member who teaches at least one course, but
we cannot record the details of a newly-hired faculty
member who has not yet been assigned to teach any
courses. This phenomenon is known as an Insertion
Anomaly.
 There are circumstances in which the deletion of data
representing certain facts necessitates the deletion of
data representing completely different facts. The
"Faculty and Their Courses" table described in the
previous example suffers from this type of anomaly,
for if a faculty member temporarily ceases to be
assigned to any courses, we must delete the last of the
records on which that faculty member appears. This
phenomenon is known as a Deletion Anomaly.
Ideally, a relational database table should be designed in such

a way as to exclude the possibility of Update, Insertion, and
Deletion Anomalies. The Normal Forms of Relational
Database theory provides guidelines for deciding whether a
particular design will be vulnerable to such anomalies. It is possible to correct an unnormalised design
so as to make it adhere to the demands of the Normal Forms: this is called normalization. Removal
of redundancies of the tables will lead to several tables, with Referential Integrity restrictions between
them.

Normalisation typically involves decomposing an unnormalized table into two or more tables that
were they are to be combined (joined), would convey exactly the same information as the original
table.
Referential Integrity in a Relational Database is consistency

between coupled tables. Referential Integrity is usually
enforced by the combination of a primary key or candidate key
(alternate key) and a foreign key. For Referential Integrity to
hold, any field in a table that is declared a foreign key can
contain only values from a parent table's primary key or a
candidate key. For instance, deleting a record that contains a
value referred to by a foreign key in another table would break
referential integrity. The Relational Database Management
System (RDBMS) enforces referential integrity, normally either
by deleting the foreign key rows as well to maintain integrity,
or by returning an error and not performing the delete. Which
method is used would be determined by the referential integrity
constraint, as defined in the Data Dictionary.
Background to Normalisation
Functional Dependency: Attribute B has a functional

dependency on attribute A i.e. A → B if, for each value of
attribute A, there is exactly one value of attribute B. If value of
A is repeating in tuples then value of B will also repeat. In our
example, Employee Address has a functional dependency on Employee ID, because a particular Employee
ID value corresponds to one and only one Employee Address value. (Note that the reverse need not be
true: several employees could live at the same address and therefore one Employee Address value could
correspond to more than one Employee ID. Employee ID is therefore not
functionally dependent on Employee Address.) An attribute may be
functionally dependent either on a single
attribute or on a combination of attributes. It
is not possible to determine the extent to
which a design is normalized without
understanding what functional
dependencies apply to the attributes within
its tables; understanding this, in turn,
requires knowledge of the problem domain.
For example, an Employer may require certain
employees to split their time between two locations, such as Kingston and Spanish Town, and therefore
want to allow Employees to have more than one Employee Address. In this case, Employee Address
would no longer be functionally dependent on Employee ID.
 Primary key: Most DBMSs require a table to be defined as having a single unique key, rather than
a number of possible unique keys. A primary key is a key which the database designer has
designated for this purpose.

 Super key: A super key is an attribute or set of attributes that uniquely identifies rows within a
table; in other words, two distinct rows are always guaranteed to have distinct super keys.
{Employee ID, Employee Address, Skill} would be a super key for the "Employees' Skills" table;
{Employee ID, Skill} would also be a super key.
 Candidate key: A candidate key is a minimal super key, that is, a super key for which we can say
that no proper subset of it is also a super key. {Employee Id, Skill} would be a candidate key for
the "Employees' Skills" table.
 Non-prime attribute: A non-prime attribute is an attribute that does not occur in any candidate
key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.
NORMAL FORMS
The Normal Forms (abbrev. NF) of Relational Database theory provide criteria for determining a table's
degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to
a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form"
(HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower
than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its
HNF.
The Normal Forms are applicable to individual tables; to say that an entire database is in Normal Form n
is to say that all of its tables are in Normal Form n.
First Normal Form
First normal form (1NF or Minimal Form) is a normal form used in database normalization. A
relational database table that adheres to 1NF is one that meets a certain minimum set of criteria. These
criteria are basically concerned with ensuring that the table is a faithful representation of a relation and
that it is free of repeating groups.
The concept of a "repeating group" is, however, understood in different ways by different theorists. As a
consequence, there is not universal agreement as to which features would disqualify a table from being in
1NF. Most notably, 1NF as defined by some authors (for example, Ramez Elmasri and Shamkant B.
Navathe, following the precedent established by Edgar F. Codd) excludes relation-valued attributes (tables
within tables); whereas 1NF as defined by other authors (for example, Chris Date) permits them.
According to Date's definition of 1NF, a table is in 1NF if and only if it is "isomorphic to some relation",
which means, specifically, that it satisfies the following five conditions:
1. There's no top-to-bottom ordering to the rows.

2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from the applicable domain (and
nothing else).
5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden
timestamps].

Violation of any of these conditions would mean that the table is not strictly relational, and therefore that
it is not in 1NF.
Examples of tables (or views) that would not meet this definition of 1NF are:
 A table that lacks a unique key. Such a table would be able to accommodate duplicate rows, in
violation of condition 3.
 A table whose definition mandates that results be returned in a particular order, so that the row-
ordering is an intrinsic and meaningful aspect of the view. This violates condition 1. The tuples
in true relations are not ordered with respect to each other.
 A table with at least one null able attribute. A nullable attribute would be in violation of
condition 4, which requires every field to contain exactly one value from its column's domain. It
should be noted, however, that this aspect of condition 4 is controversial. It marks an important
departure from Codd's original vision of the relational model, which made explicit provision for
nulls.
Date's fourth condition, which expresses "what most people think of as the defining feature of 1NF", is
concerned with repeating groups. The following example illustrates how a database design might
incorporate repeating groups, in violation of 1NF.
Example 1: Domains and values
Suppose novice designer CUSTOMER

wishes to record the names Customer ID First Name Surname Telephone Number
and telephone numbers of 123 Robert Ingram 555-861-2025
customers. He defines a 456 Jane Wright 555-403-1659
customer table which looks 789 Maria Fernandez 555-808-9633
like this:
The designer then becomes aware of a requirement to record multiple telephone numbers for some
customers. He reasons that the
simplest way of doing this is CUSTOMER
to allow the "Telephone Customer ID First Name Surname Telephone Number
Number" field in any given 123 Robert Ingram 555-861-2025
record to contain more than 456 Jane Wright 555-403-1659
one value: 555-776-4100
789 Maria Fernandez 555-808-9633
Assuming, however, that the
Telephone Number column is defined on some Telephone Number-like domain (e.g. the domain of strings
12 characters in length), the representation above is not in 1NF. 1NF (and, for that matter, the RDBMS)
prohibits a field from containing more than one value from its column's domain.

Example 2: Repeating groups across columns
The designer might attempt to get around this restriction by defining multiple Telephone Number
columns:
CUSTOMER
Customer ID First Name Surname Tel. No. 1 Tel. No. 2 Tel. No. 3
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659 555-776-4100
This representation, however, makes use of null able columns, and therefore does not conform to Date's
definition of 1NF. Even if the table is taken that null able columns are allowed, the design is not in
keeping with the spirit of 1NF. Tel. No. 1, Tel. No. 2., and Tel. No. 3. share exactly the same domain and
exactly the same meaning; the splitting of Telephone Number into three headings is artificial and causes
logical problems. These problems include:
 Difficulty in querying the table. Answering such questions as "Which customers have telephone
number X?" and "Which pairs of customers share a telephone number?" is awkward.
 Inability to enforce uniqueness of Customer-to-Telephone Number links through the RDBMS.
Customer 789 might mistakenly be given a Tel. No. 2 value that is exactly the same as her Tel. No. 1
value.
 Restriction of the number of telephone numbers per customer to three. If a customer with four
telephone numbers comes along, we are constrained to record only three and leave the fourth
unrecorded. This means that the database design is imposing constraints on the business process,
rather than (as should ideally be the case) vice-versa.
Example 3: Repeating groups within columns
The designer might, alternatively, retain the single Telephone Number column but alter its domain,
making it a string of sufficient length to accommodate multiple telephone numbers:
CUSTOMER
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659, 555-776-4100
This design is not consistent with 1NF, and presents several design issues. The Telephone Number
heading becomes semantically woolly, as it can now represent either a telephone number, a list of
telephone numbers, or indeed anything at all. A query such as "Which pairs of customers share a
telephone number?" is more difficult to formulate, given the necessity to cater for lists of telephone
numbers as well as individual telephone numbers. Meaningful constraints on telephone numbers are also
very difficult to define in the RDBMS with this design.

A design that complies with 1NF
A design that is unambiguously in 1NF makes use of two tables: a Customer Name table and a Customer
Telephone Number table.
CUSTOMER NAME CUSTOMER TELEPHONE NUMBER

Customer First Customer ID Telephone Number
ID Name123 555-861-2025
123 Robert456 555-403-1659
456 Jane 456 555-776-4100
789 Maria789 555-808-9633
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone
Number link appears on its own record.
Newcomers to database design sometimes suppose that normalisation proceeds in an iterative fashion, i.e.
a 1NF design are first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of
how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt;
furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher"
normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the
designer, because 3NF tables usually need no modification to meet the requirements of these higher
normal forms.
Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF). These Normal
Forms have been summarized as requiring that all non-key attributes be dependent on "the key, the whole
key and nothing but the key".
A table is in first normal form (1NF) if and only if it represents a relation. Given that database tables
embody a relation-like form, the defining characteristic of one in First Normal Form is that it does not
allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents
duplicate rows) and without any null able columns is in 1NF.
By redefining 1NF to exclude null able columns in 1NF, no level of normalization can ever be achieved
unless all null able columns are completely eliminated from the entire database.
One requirement of a relation is that every table contains exactly one value for each attribute. This is
sometimes expressed as "no repeating groups". While that statement itself is axiomatic, experts disagree
about what qualifies as a "repeating group", in particular whether a value may be a relation value; thus the
precise definition of 1NF is the subject of some controversy. Notwithstanding, this theoretical uncertainty
applies to relations, not tables. Table manifestations are intrinsically free of variable repeating groups
because they are structurally constrained to the same number of columns in all rows.
Put at its simplest; when applying 1NF to a database, every record must be the same length. This means
that each record has the same number of fields, and none of them contain a null value.

Second Normal Form
The criteria for second normal form (2NF) are:
 The table must be in 1NF.

 None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a
candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys
are full functional dependencies. For example, consider an "Employees' Skills" table whose attributes
are Employee ID, Employee Name, and Skill; and suppose that the combination of Employee ID and
Skill uniquely identifies records within the table. Given that Employee Name depends on only one of
those attributes – namely, Employee ID – the table is not in 2NF.
 In simple terms, a table is 2NF if it is in 1NF and all fields are dependent on the whole of the primary
key, or a relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each
candidate key of the relation.
 Note that if none of a 1NF table's candidate keys are composite – i.e. every candidate key consists of
just one attribute – then we can say immediately that the table is in 2NF.
 All columns must be a fact about the entire key, and not a subset of the key.
Consider a table describing employees' skills:
EMPLOYEES' SKILLS
Employee Skill Current Work Location
Jones Typing 114 Main Street
Jones Shorthand 114 Main Street
Jones Whittling 114 Main Street
Roberts Light Cleaning 73 Industrial Way
Ellis Alchemy 73 Industrial Way
Ellis Juggling 73 Industrial Way
Harrison Light Cleaning 73 Industrial Way
The table's only candidate key is {Employee, Skill}.
The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely
Employee. Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are
represented: we are told three times that Jones works at 114 Main Street, and twice that Ellis works at 73
Industrial Way. This redundancy makes the table vulnerable to update anomalies: it is, for example,
possible to update Jones' work location on his "Typing" and "Shorthand" records and not update his
"Whittling" record. The resulting data would imply contradictory answers to the question "What is Jones'
current work location?"

A 2NF alternative to this design would represent the same information in two tables:
EMPLOYEES EMPLOYEES' SKILLS

Employee Current WorkEmployee
Location
Jones Typing
Jones 114 Main Street
Jones Shorthand
Roberts 73 Industrial
Jones Whittling
Way Roberts Light Cleaning
Ellis 73 Industrial
Ellis Alchemy
Way Ellis Juggling
Harrison 73 Industrial
Harrison Light Cleaning
Way
Update anomalies cannot occur in these tables, which are both in 2NF.
Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers
from update anomalies is:
TOURNAMENT WINNERS
Tournament Year Winner Winner Date of Birth
Des Moines Masters 1998 Chip Masterson 14 March 1977
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
Even though Winner and Winner Date of Birth are determined by the whole key {Tournament, Year} and
not part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple
records. This problem is addressed by third normal form (3NF).
Third Normal Form
A table for which there are no partial functional dependencies on the primary key is typically, but not
always, in 2NF. In addition to the primary key, the table may contain other candidate keys; it is necessary
to establish that no non-prime attributes have part-key dependencies on any of these candidate keys.
Multiple candidate keys occur in the following table:
ELECTRIC TOOTHBRUSH MODELS

Manufacturer Model Model Full Name Manufacturer Country
Forte X-Prime Forte X-Prime Italy
Forte Ultraclean Forte Ultraclean Italy
Dent-o-Fresh EZBrush Dent-o-Fresh EZBrush USA
Kobayashi ST-60 Kobayashi ST-60 Japan

Hoch Toothmaster Hoch Toothmaster Germany
Hoch Contender Hoch Contender Germany
Even if the designer has specified the primary key as {Model Full Name}, the table is not in 2NF.
{Manufacturer, Model} is also a candidate key, and Manufacturer Country is dependent on a proper subset
of it: Manufacturer.
Third Normal Form
The Third Normal Form (3NF) is a normal form used in database normalization. 3NF was originally
defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the
following conditions hold:
 The relation R (table) is in second normal form (2NF)

 Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every key
of R.
An example of a 2NF table that fails to meet the requirements of 3NF is:
TOURNAMENT WINNERS
Tournament Year Winner Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
The only candidate key is {Tournament, Year}.
The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent
on {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is
functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing
to stop the same person from being shown with different dates of birth on different records.
In order to express the same facts without violating 3NF, it is necessary to split the table into two:
TOURNAMENT WINNERS PLAYER DATES OF BIRTH

Tournament Year Winner Player Date of Birth
Indiana Chip Masterson 14 March 1977
1998 Al Fredrickson
Invitational Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson Bob Albertson 28 September 1968
Des Moines
1999 Al Fredrickson
Masters
Indiana
1999 Chip Masterson
Invitational

Update anomalies cannot occur in these tables, which are both in 3NF.
Additional Reading
An Introduction to Database Normalization
By Mike Hillyer
Table of contents:
 Introduction
 Mike's Bookstore
 First Normal Form
 Defining Relationships
 Second Normal Form
 Third Normal Form
 Joining Tables
 Conclusion
 Resources

Normalization [Extra Notes]
Introduction
This article will take a practical look at database normalization, focusing on the first three of seven
generally recognized normal forms. Additional resources that look at the theory of database normalization
and the additional normal forms can be found in the Resources section at the end of this article.
Note:
This article has been updated from an original version published in 2003.
Mike's Bookstore
Let's say you were looking to start an online bookstore. You would need to track certain information about
the books available to your site viewers, such as:
 Title
 Author
 Author Biography
 ISBN
 Price
 Subject
 Number of Pages
 Publisher
 Publisher Address
 Description
 Review
 Reviewer Name
Let's start by adding the book that coined the term ?Spreadsheet Syndrome?. Because this book has two
authors, we are going to need to accommodate both in our table. Lets take a look at a typical approach
Table 1. Two Books
Title Author Bio ISBN Subject Pages Publisher

Chad Russell is a
programmer and network
Beginning
Chad administrator who owns his MySQL,
MySQL
Russell, Jon own Internet hosting 1590593324 Database 520 Apress
Database Design
Stephens company., Jon Stephens is a Design
and Optimization
member of the MySQL AB
documentation team.
Lets take a look at some issues involved in this design:

First, this table is subject to several anomalies: we cannot list publishers or authors without having a book
because the ISBN is a primary key which cannot be NULL (referred to as an insertion anomaly).
Similarly, we cannot delete a book without losing information on the authors and publisher (a deletion
anomaly). Finally, when updating information, such as an author's name, we must change the data in every
row, potentially corrupting data (an update anomaly).
Note:
Normalization is a part of relational theory, which requires that each relation (AKA table) has a
primary key. As a result, this article assumes that all tables have primary keys, without which a
table cannot even be considered to be in first normal form.
Second, this table is not very efficient with storage. Lets imagine for a second that our publisher is
extremely busy and managed to produce 5000 books for our database. Across 5000 rows we would need
to store information such as a publisher name, address, phone number, URL, contact email, etc. All that
information repeated over 5000 rows is a serious waste of storage resources.
Third, this design does not protect data consistency. Lets once again imagine that Jon Stephens has written
20 books. Someone has had to type his name into the database 20 times, and it is possible that his name
will be misspelled at least once (i.e.. John Stevens instead of Jon Stephens). Our data is now in an
inconsistent state, and anyone searching for a book by author name will find some of the results missing.
This also contributes to the update anomalies mentioned earlier.
First Normal Form
The normalization process involves getting our data to conform to progressive normal forms, and a higher
level of normalization cannot be achieved unless the previous levels have been satisfied (though many
experienced designers can create normalized tables directly without iterating through the lower forms).
The first normal form (or 1NF) requires that the values in each column of a table are atomic. By atomic
we mean that there are no sets of values within a column.
In our example table, we have a set of values in our author and subject columns. With more than one value
in a single column, it is difficult to search for all books on a given subject or by a specific author. In
addition, the author names themselves are non-atomic: first name and last name are in fact different
values. Without separating first and last names it becomes difficult to sort on last name.
One method for bringing a table into first normal form is to separate the entities contained in the table into
separate tables. In our case this would result in Book, Author, Subject and Publisher tables.
Table 2. Book Table
ISBN Title Pages

1590593324 Beginning MySQL Database Design and Optimization 520
Table 3. Author Table
Author_ID First_Name Last_name

1 Chad Russell

2 Jon Stephens
3 Mike Hillyer
Table 4. Subject Table
Subject_ID Name
1 MySQL
2 Database Design
Table 5. Publisher Table
Publisher_ID Name Address City State Zip

1 Apress 2560 Ninth Street, Station 219 Berkeley California 94710
Note
The Author, Subject, and Publisher tables use what is known as a surrogate primary key -- an
artificial primary key used when a natural primary key is either unavailable or impractical. In the
case of author we cannot use the combination of first and last name as a primary key because there
is no guarantee that each author's name will be unique, and we cannot assume to have the author's
government ID number (such as SIN or SSN), so we use a surrogate key.
Some developers use surrogate primary keys as a rule, others use them only in the absence of a
natural candidate for the primary key. From a performance point of view, an integer used as a
surrogate primary key can often provide better performance in a join than a composite primary key
across several columns. However, when using a surrogate primary key it is still important to create
a UNIQUE key to ensure that duplicate records are not created inadvertently (but some would
argue that if you need a UNIQUE key it would be better to stick to a composite primary key).
By separating the data into different tables according to the entities each piece of data represents, we can
now overcome some of the anomalies mentioned earlier: we can add authors who have not yet written
books, we can delete books without losing author or publisher information, and information such as author
names are only recoded once, preventing potential inconsistencies when updating.
Depending on your point of view, the Publisher table may or may not meet the 1NF requirements because
of the Address column: on the one hand it represents a single address, on the other hand it is a
concatenation of a building number, street number, and street name.
The decision on whether to further break down the address will depend on how you intend to use the data:
if you need to query all publishers on a given street, you may want to have separate columns. If you only
need the address for mailings, having a single address column should be acceptable (but keep potential
future needs in mind).
Defining Relationships
As you can see, while our data is now split up, relationships between the tables have not been defined.
There are various types of relationships that can exist between two tables:

 One to (Zero or) One
 One to (Zero or) Many
 Many to Many
The relationship between the Book table and the Author table is a many-to-many relationship: A book can
have more than one author, and an author can write more than one book. To represent a many-to-many
relationship in a relational database we need a third table to serve as a link between the two. By naming
the table appropriately, it becomes instantly clear which tables it connects in a many-to-many relationship
(in the following example, between the Book and the Author table).
Table 6. Book_Author Table
ISBN Author_ID
1590593324 1
1590593324 2
Similarly, the Subject table also has a many-to-many relationship with the Book table, as a book can cover
multiple subjects, and a subject can be explained by multiple books:
Table 7. Book_Subject Table
ISBN Subject_ID
1590593324 1
1590593324 2
As you can see, we now have established the relationships between the Book, Author, and Subject tables.
A book can have an unlimited number of authors, and can refer to an unlimited number of subjects. We
can also easily search for books by a given author or referring to a given subject.
The case of a one-to-many relationship exists between the Book table and the Publisher table. A given
book has only one publisher (for our purposes), and a publisher will publish many books. When we have a
one-to-many relationship, we place a foreign key in the table representing the ?many?, pointing to the
primary key of the table representing the ?one?. Here is the new Book table:
Table 8. Book Table
ISBN Title Pages Publisher_ID

1590593324 Beginning MySQL Database Design and Optimization 520 1
Since the Book table represents the ?many? portion of our one-to-many relationship, we have placed the
primary key value of the Publisher as in aPublisher_ID column as a foreign key.
In the tables above the values stored refer to primary key values from the Book, Author, Subject and
Publisher tables. Columns in a table that refer to primary keys from another table are known as foreign
keys, and serve the purpose of defining data relationships.

In database systems (DBMS) which support referential integrity constraints, such as the InnoDB storage
engine for MySQL, defining a column as a foreign key will allow the DBMS to enforce the relationships
you define. For example, with foreign keys defined, the InnoDB storage engine will not allow you to
insert a row into the Book_Subject table unless the book and subject in question already exist in the Book
and Subject tables or if you're inserting NULL values. Such systems will also prevent the deletion of
books from the book table that have ?child? entries in the Book_Subject or Book_Author tables.
Second Normal Form
Where the First Normal Form deals with atomicity of data, the Second Normal Form (or 2NF) deals with
relationships between composite key columns and non-key columns. As stated earlier, the normal forms
are progressive, so to achieve Second Normal Form, your tables must already be in First Normal Form.
The second normal form (or 2NF) any non-key columns must depend on the entire primary key. In the
case of a composite primary key, this means that a non-key column cannot depend on only part of the
composite key.
Let's introduce a Review table as an example.
Table 9. Review Table
ISBN Author_ID Summary Author_URL

1590593324 3 A great book! http://www.openwin.org
In this situation, the URL for the author of the review depends on the Author_ID, and not to the
combination of Author_ID and ISBN, which form the composite primary key. To bring the Review table
into compliance with 2NF, the Author_URL must be moved to the Author table.
Third Normal Form
Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables violate the
Third Normal Form when one column depends on another column, which in turn depends on the primary
key (a transitive dependency).
One way to identify transitive dependencies is to look at your table and see if any columns would require
updating if another column in the table was updated. If such a column exists, it probably violates 3NF.
In the Publisher table the City and State fields are really dependent on the Zip column and not the
Publisher_ID. To bring this table into compliance with Third Normal Form, we would need a table based
on zip code:
Table 10. Zip Table
Zip City State

94710 Berkeley California

In addition, you may wish to instead have separate City and State tables, with the City_ID in the Zip table
and the State_ID in the City table.
A complete normalization of tables is desirable, but you may find that in practice that full normalization
can introduce complexity to your design and application. More tables often means more JOIN operations,
and in most database management systems (DBMSs) such JOIN operations can be costly, leading to
decreased performance. The key lies in finding a balance where the first three normal forms are generally
met without creating an exceedingly complicated schema.
Joining Tables
With our tables now separated by entity, we join the tables together in our SELECT queries and other
statements to retrieve and manipulate related data. When joining tables, there are a variety of JOIN
syntaxes available, but typically developers use the INNER JOIN and OUTER JOIN syntaxes.
An INNER JOIN query returns one row for each pair or matching rows in the tables being joined. Take
our Author and Book_Author tables as an example:
mysql> SELECT First_Name, Last_Name, ISBN

-> FROM Author INNER JOIN Book_Author ON Author.Author_ID = Book_Author.Author_ID;
+------------+-----------+------------+
| First_Name | Last_Name | ISBN |
+------------+-----------+------------+
| Chad | Russell | 1590593324 |
| Jon | Stephens | 1590593324 |
+------------+-----------+------------+
2 rows in set (0.05 sec)
The third author in the Author table is missing because there are no corresponding rows in the
Book_Author table. When we need at least one row in the result set for every row in a given table,
regardless of matching rows, we use an OUTER JOIN query.
There are three variations of the OUTER JOIN syntax: LEFT OUTER JOIN, RIGHT OUTER JOIN and
FULL OUTER JOIN. The syntax used determines which table will be fully represented. A LEFT OUTER
JOIN returns one row for each row in the table specified on the left side of the LEFT OUTER JOIN
clause. The opposite is true for the RIGHT OUTER JOIN clause. A FULL OUTER JOIN returns one row
for each row in both tables.
In each case, a row of NULL values is substituted when a matching row is not present. The following is an
example of a LEFT OUTER JOIN:
mysql> SELECT First_Name, Last_Name, ISBN

-> FROM Author LEFT OUTER JOIN Book_Author ON Author.Author_ID =
Book_Author.Author_ID;
+------------+-----------+------------+
| First_Name | Last_Name | ISBN |
+------------+-----------+------------+

| Chad | Russell | 1590593324 |
| Jon | Stephens | 1590593324 |
| Mike | Hillyer | NULL |
+------------+-----------+------------+
3 rows in set (0.00 sec)
The third author is returned in this example, with a NULL value for the ISBN column, indicating that
there are no matching rows in the Book_Author table.
Conclusion
Through the process of database normalization we bring our schema's tables into conformance with
progressive normal forms. As a result our tables each represent a single entity (a book, an author, a
subject, etc) and we benefit from decreased redundancy, fewer anomalies and improved efficiency.

SQL
 SQL is a standard language for accessing databases.

 Our SQL tutorial will teach you how to use SQL to access and manipulate data in:
 MySQL, SQL Server, Access, Oracle, Sybase, DB2, and other database systems.

What is SQL?
 SQL stands for Structured Query Language
 SQL lets you access and manipulate databases
 SQL is an ANSI (American National Standards Institute) standard
CREATING A TABE SQL
SYNTAX
CREATE TABLE table_name

(
column_name1 data_type,
....
)
EXAMPLE (Verified in Access)
CREATE TABLE Persons

(
P_Id int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
N.B. The P_Id column is of type int and will hold a number. The LastName, FirstName, Address, and City
columns are of type varchar with a maximum length of 255 characters.

CREATE A TABLE (with Primary Key)
CREATE TABLE Persons

(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
PRIMARY KEY (P_Id)
)
SQL can be divided into two parts: The Data Manipulation Language (DML) and the Data Definition
Language (DDL).
The query and update commands form the DML part of SQL:
 SELECT - extracts data from a database
 UPDATE - updates data in a database
 DELETE - deletes data from a database
 INSERT INTO - inserts new data into a database
The DDL part of SQL permits database tables to be created or deleted. It also defines indexes (keys),
specifies links between tables, and imposes constraints between tables. The most important DDL
statements in SQL are:
 CREATE DATABASE - creates a new database
 ALTER DATABASE - modifies a database
 CREATE TABLE - creates a new table
 ALTER TABLE - modifies a table
 DROP TABLE - deletes a table
 CREATE INDEX - creates an index (search key)

DROP INDEX - deletes an index
INSERT DATA INTO TABE SQL
SYNTAX
INSERT INTO table_name
VALUES (value1, value2, value3,...)
OR
INSERT INTO “Table_Name” (Column1, Column2, Column3)
VALUES (Value1, Value2, Value3)
N.B. words go in single quotations
INSERT INTO Persons ( P_Id, LastName, FirstName, Address, City )
VALUES (1000, 'Gordon', 'Jamiel', 'Moneague', 'St. Ann');
INSERT INTO Persons (P_Id, LastName, FirstName, Address, City)
VALUES (1001, 'Thomspon', 'Keneisha', 'Marlborough', 'Manchester');
VALUES (1002, 'Archibald', 'Delano', 'Poverty', 'Kingston');
VALUES (1003, 'Nembhard', 'Natalee', 'Lucea', 'Hanover');

UPDATE DATA IN SQL
SYNTAX
UPDATE table_name
SET column1=value, column2=value2,...
WHERE some_column=some_value
UPDATE Persons
SET LastName = 'Williams'
WHERE FirstName='Jamiel';
DELETE DATA IN SQL
SYNTAX
DELETE *
FROM “Table name”
WHERE Columm1='value';

DELETE *
FROM Persons
WHERE FirstName='Jamiel'

SELECT DATA IN SQL
This will become tricky as there are many ways to utilize the “select” function
SYNTAX
SELECT column_name(s)
FROM table_name
OR
SELECT * FROM table_name
OR
SELECT column_name(s)
FROM table_name
WHERE column_name(s)=value
EXAMPLES
SELECT *
FROM Persons
WHERE P_Id=1000; (Verified in Access)
SELECT LastName
FROM Persons
WHERE FirstName='Jamiel' (Verified in Access)
N.B Literal Words need single quotes around them, numbers DO NOT
SELECT * FROM Persons (Verified in Access)
WHERE FirstName='Jamiel’
AND LastName='Williams'  Use Williams if previous UPDATE query was run
N.B Using the words “AND” and “OR”, one can dictate more than one columns and criteria

SELECT * FROM Persons
WHERE FirstName='Jamiel'
OR P_Id=1000 (Verified in Access)

CAPE
Information Technology
Syllabus Focus: Unit 2 Module 2 Content 1 - 2
Specific Objective 1: Explain reasons for the use of IT tools

Content: For example: Enhanced speed, accuracy, reliability, efficiency, flexibility, communication,
presentation of information, integration of processes, decision making, storage and retrieval of large
volumes of data; manageability of task pressure from clients competitors and suppliers.
What are Information Technology Tools?

Information Technology tool is a program or software that is used to do work or perform a task.
Sometimes, people will call any software a "tool." For example, the phrase, "there aren't any tools to do
that job" means that no application is available to perform the required processing.
Benefits (Reason for the use) of Information Technology

 Enhance speed – eg. Information systems can perform calculations or process paperwork much
faster than people.
 Increase accuracy – eg. When calculation is done it can give one a more accurate figure than
human.
 It is more reliable- can get the same result using the same input all the time.
 It is more efficient- books compare to hard drive
 It is more flexible
 Enhance presentation of information
 Assist in a more objective decision making.
 It enable storage and retrieval of large volumes of data
 Allow for multitasking.
 Information systems can help companies learn more about the purchase patterns and preferences of
their customers.
 Information systems provide new efficiencies through services such as automated teller machines
(ATMs), telephone systems, or computer-controlled airplanes and air terminals.
 Information systems have made possible new medical advances in surgery, radiology and patient
monitoring
 Better communication - The internet distributes information instantly to millions of people across
the world.
Negative (Problems) Impact of Information System

 By automating activities that were previously performed by people, information systems may
eliminate jobs.
 Information systems may allow organizations to collect personal details about people that violate
their privacy.
 Information systems are used in so many aspects of everyday life that systems outages can cause
shutdown of businesses or transportation services, paralyzing communities.
 Heavy users of information systems may suffer repetitive stress injury, techno stress, and other
health problems.
 The internet can be used to distribute illegal copies of software, books, articles, and other
intellectual property.
 Can be expensive to set information system.
 Sometimes there are issue with training personnel to use system

 Information system rapidly changing therefore increase therefore there is problems with
obsolescence.
 There might be resistance to changes.
Specific Objective 1: Explain the function and uses of the major types of software tools
Content: For example types of software packages and their function and uses. Including financial
packages, Software Development Tools, Statistical Tools, Word Processors, Spreadsheets, Presentation
Tools and Database Management Tools, Desktop publishing, graphics and personal information
management.
Productivity tools
Within the category of general purpose software is a set of application tools known as productivity tools.
These software tools were originally designed for use in businesses. The most popular productivity tools
that are available on the market today are
 Word processing – produce documents such as letters, memos, newsletter and reports
 Spreadsheet – process numerical data and formulae to produce tabulated information or graphs.
Also perform what is scenarios
 Desktop publishing – combine text and graphics such as photographs and line drawings to create
visually appealing, high quality pages for books, magazines etc. Example of DTP include
PageMker, Ms. Publisher, Pageplus
 Presentation packages and DBMS – prepare great looking visual such as slide show for
presentation, especially to large audience.
Other software tools include:
 Financial or accounting packages – track income and expenses and produce financial reports and
statements. Examples of this type of software include ACPAC and Quicken
 Software development tools
 Statistical tools – perform statistical function of variance, regression, correlation and distribution.

Examples of this type of software include SPSS, RATS
Spreadsheet
A spreadsheet is a program that is designed to create tables and financial schedules by allowing you to enter
data into rows and columns arranged as a grid.
Dan Bricklin and Bob Frankson invented the first practical spreadsheet for microcomputers. They called it
VisiCalc. It was released for the Apple II computer in 1979. Prior to this, spreadsheets had to be manually
re-calculated each time one cell was changed. With Visicalc, work that might take a week could be done
in minutes. Several years after VisiCalc's release, the inventors sold the program to Lotus Development
Corporation, who renamed it Lotus 1-2-3 and upgraded the software to run on an IBM-PC or compatible
computer.
Function
Spreadsheets can do more than perform simple arithmetic calculations. A spreadsheet can translate
complicated data and reports into a combination of numbers and graphs. Modern versions include an
extensive list of financial calculators, such as interest calculations, loan amortization, even calculations for
Treasury bill rates. Statistical functions from common calculations (Chi Square, Pearson Coefficient of
Correlation and Standard Deviation) to abstruse functions like the Hypergeometric Distribution and the
Poisson distribution return needed values with no pain. There are at least 100 of these formulas included
in contemporary spreadsheets.
Features
Spreadsheets can function as basic databases. By inserting data and number in different columns, the
results can be sorted, searched or filtered. What-if analysis can run using data from cells in a spreadsheet.
There are mathematical and trigonometric functions as well. A user can create a simplified mailing list in
a spreadsheet by entering names and addresses in individual columns. There are even word processing
features offering control over fonts, bold or italic typeface, size, color and page formatting.
Warning

Because of the complexity of how a spreadsheet works and the potential to develop complicated and
interrelated calculations, the potential for error increases exponentially with the size of the spreadsheet.
Audit controls are limited and often what is possible in this area is under-utilized. All too often not enough
planning goes into the development of spreadsheets, particularly when designed for other users. Because it
is so easy to change values in the spreadsheet, easy mistakes have unintended consequences.
This package allows the user to enter data and to perform calculations on them. A typical use might be to
keep simple accounts. Spreadsheet are used primarily by persons who work with numbers. They are ideally
suited for fats, accurate number crunching and forecasting.
Examples of spreadsheet on the market are Ms. Excel, Lotus 1-2-3 and Quattro Pro.
Spreadsheet hide mathematical formulae “under” the numbers, so that some of the numbers that you are
seeing on the screen are actually generated by the underlying formulae.
Built in functions allow you to quickly and easily create complex formulae. The capabilities to recalculate
complex tables of number makes spreadsheets ideal for what if scenarios.
You can create a variety of graphs and charts by using the program’s built in wizard.
The data are organised into cells that can contain data or formulae. The formulae can use numbers or the
contents of other cells to perform the calculations. Typical features of a spreadsheet package are:
• Enter numerical data or text.
• Enter formulae that contain numerical operations.
• Functions to perform mathematical or statistical calculations.
• Sorting of data into sequence.
• Charts of various types.
• Automatically perform a series of operations using a macro.
Importing and exporting capabilities
 Electronic spreadsheets allow for calculation across multiple spreadsheets whereas paper spreadsheets
make this task very difficult and time consuming.
Electronic spreadsheets can be sent to numerous recipients via email and shared across different locations
in a matter of minutes. Paper spreadsheets need to be faxed or mailed; this would take more time to
distribute. Additionally, parts of the spreadsheets can be locked to prevent data from being modified.

What Are the Main Parts of a Spreadsheet?
A spreadsheet is a grid, like a paper ledger, made up of columns and rows. But unlike a paper ledger,
spreadsheets are on the computer and have more components and functions.
Row
1. The horizontal strips of the spreadsheet are called rows. They are identified by number at
the far left hand side of the spreadsheet. This is called the row header.
Columns
2. The vertical strips of the spreadsheet are columns. They are identified by letters across the
top of the spreadsheet. This is called the column header.
Cell
3. The point where the rows and columns intersect is called a cell. To activate a cell, simply
click on it.
Formula Bar
4. Data or formulas can be entered directly into the active cell or entered into the formula
bar. If you click on a cell that already has data in it, the formula used will show in this bar.
Name Box
5. The name box is located at the upper left corner before the formula bar and lists the
location of the active cell you are working on.
Shortcomings of spreadsheet
While spreadsheets are a great step forward in quantitative modeling, they have deficiencies. At the level
of overall user benefits, spreadsheets have four main shortcomings.

 Spreadsheets have significant reliability problems. Research studies estimate that roughly 94% of
spreadsheets deployed in the field contain errors, and 5.2% of cells in unaudited spreadsheets
contain errors
 The practical expressiveness of spreadsheets is limited. Several factors contribute to this limitation.
Implementing a complex model requires implementing detailed layouts, cell-at-a-time. Authors
have difficulty remembering the meanings of hundreds or thousands of cell addresses that appear
in formulas
 Collaboration in authoring spreadsheet formulas is difficult because such collaboration must occur
at the level of cells and cell addresses. By comparison, programming languages aggregate cells
with similar meaning into indexed variables with names that indicate meaning. Although some
spreadsheets have good collaboration features, authoring at the level of cells and cell formulas
remains a significant obstacle to collaboration in authoring spreadsheet models. On the other hand,
many people collaborate on entering numerical data and many people can use the same
spreadsheet.
 Productivity of spreadsheet modelers is reduced by the cell-level focus of spreadsheets. Even

conceptually simple changes in spreadsheets (such as changing starting or ending time or time
grain, adding new members or a level of hierarchy to a dimension, or changing one conceptual
formula that is represented as hundreds of cell formulas) often require large numbers of manual
cell-level operations (such as inserting or deleting cells/rows/columns, editing and copying
formulas, re-laying out worksheets). Each of these manual corrections increases the risk of
introducing further mistakes.
These four deficiencies in high-level benefits have deeper causes that, ironically, flow directly from the
signature strength of spreadsheets (that they capture the structure of models in terms of WYSIWYG sheet
layout for authors and report users).
 Spreadsheets capture model logic in terms of sheet layout, especially contiguous layout of cells in
a table. Spreadsheets have weak or nonexistent methods to capture higher level structures such as
named variables, segmentation dimensions, and time series.
 Formulas are subordinated to the cell layout. This forces the sheet layout to carry the structure of
the model, not variables and formulas that relate variables. This also causes a large proliferation of
cells, formulas and cell-level tasks even when only a few basic concepts are involved in a model.
This forces authors to think and work at the level of cells instead of at the level of the natural
concepts and structures of the model.
 Formulas expressed in terms of cell addresses are hard to keep straight and hard to audit. Research
shows that spreadsheet auditors who check numerical results and cell formulas find no more errors
than auditors who only check numerical results

 Proliferation of error-prone manual cell-level operations contributes to all four of the high-level
problems listed above.
Other problems associated with spreadsheets include:
Some sources advocate the use of specialized software instead of spreadsheets for some applications
(budgeting, statistics)
 Many spreadsheet software products, such as Microsoft Excel (versions prior to 2007) and
OpenOffice.org Calc have a capacity limit of 65,536 rows by 256 columns. This can present a
problem for people using very large datasets, and may result in lost data.
 Lack of auditing and revision control. This makes it difficult to determine who changed what and
when. This can cause problems with regulatory compliance. Lack of revision control greatly
increases the risk of errors due the inability to track, isolate and test changes made to a document.
 Lack of security. Generally, if one has permission to open a spreadsheet, one has permission to
modify any part of it. This, combined with the lack of auditing above, can make it easy for
someone to commit fraud.
 Because they are loosely structured, it is easy for someone to introduce an error, either accidentally
or intentionally, by entering information in the wrong place or expressing dependencies among
cells (such as in a formula) incorrectly.
 The results of a formula (example "=A1*B1") applies only to a single cell (that is, the cell the
formula is actually located in - in this case perhaps C1), even though it can "extract" data from
many other cells, and even real time dates and actual times. This means that to cause a similar
calculation on an array of cells, an almost identical formula (but residing in its own "output" cell)
must be repeated for each row of the "input" array. This differs from a "formula" in a conventional
computer program which would typically have one calculation which would then apply to all of
the input in turn. With current spreadsheets, this forced repetition of near identical formulas can
have detrimental consequences from a quality assurance standpoint and is often the cause of many
spreadsheet errors. Some spreadsheets have array formulas to address this issue.
 Trying to manage the sheer volume of spreadsheets which sometimes exists within an organization
without proper security, audit trails, the unintentional introduction of errors and other items listed
above can become overwhelming.
While there are built-in and third-party tools for desktop spreadsheet applications that address some of
these shortcomings, awareness and use of these is generally low. A good example of this is that 55% of
Capital market professionals "don't know" how their spreadsheets are audited; only 6% invest in a third-
party solution
Database
This package stores and manipulates large quantities of data. It allows the user to enter the data and
provides reports that allow the data to be viewed in different forms. Most modern database packages
store the data in tables and allow the data in different tables to be related in various ways. Typical
features of a database package are:
• Forms to enter the data.
• Queries that allow the data to be selected and sorted in various ways.
• Tabular reports.

Software development tool
A programming tool or software development tool is a program or application that software developers
use to create, debug, maintain, or otherwise support other programs and applications. The term usually
refers to relatively simple programs that can be combined together to accomplish a task, much as one
might use multiple hand tools to fix a physical object.
History
The history of software tools began with the first computers in the early 1950s that used linkers, loaders,
and control programs. Tools became famous with Unix in the early 1970s with tools like grep, awk and
make that were meant to be combined flexibly with pipes. The term "software tools" came from the book
of the same name by Brian Kernighan and P. J. Plauger.
Tools were originally simple and light weight. As some tools have been maintained, they have been
integrated into more powerful integrated development environments (IDEs). These environments
consolidate functionality into one place, sometimes increasing simplicity and productivity, other times
sacrificing flexibility and extensibility. The workflow of IDEs is routinely contrasted with alternative
approaches, such as the use of Unix shell tools with text editors like Vim and Emacs.
The distinction between tools and applications is murky. For example, developers use simple databases
(such as a file containing a list of important values) all the time as tools. However a full-blown database is
usually thought of as an application in its own right.
For many years, computer-assisted software engineering (CASE) tools were sought after. Successful tools
have proven elusive. In one sense, CASE tools emphasized design and architecture support, such as for
UML. But the most successful of these tools are IDEs.
The ability to use a variety of tools productively is one hallmark of a skilled software engineer.
Categories
Software development tools can be roughly divided into the following categories:
 performance analysis tools

 debugging tools
 static analysis and formal verification tools
 correctness checking tools
 memory usage tools
 application build tools
 integrated development environment

Word processor and Desk Top Publishing
Both packages assist you in creating documents; but end result and the way you get there can be quite
different. WP is designed for textual document processing i.e you first enter the text, then added images,
border and shading around the text. .
In DTP you first task is to determine the layout followed by the text and graphics. The page layout
capabilities, combined with its precision make the DTP ideal for professionally printed documents such as
books and magazines.
Spreadsheet and DBMS

Both packages allow you to work with data in rows and columns. The SS uses worksheets and the DBMS
uses tables, which consist also of rows and columns. Both packages are also used for storing and
analyzing data.
SS give greater flexibility in building formulae, performing what if analysis, generating charts and graphs,
and in the manipulation of rows and columns of data. DBMS are ideal for data entry, building queries,
generating formatted reports and developing information systems.

Word processor
This package allows the user to produce documents of various types. It can produce reports, letters, essays,
and so on. Modern word processors have all the features necessary to produce newspapers and magazines.
Typical features of a word processor are:
• Type and correct text, delete text, move text.
• Produce various layouts, set margins, organise into columns, and so on.
• Set fonts and special effects, italic, bold, underline.
• Insert pictures and diagrams.
• Create indexes and table of contents.
• Find and replace text.
• Check spelling and grammar.
• Automatically produce personalised letters (mail merge).
• Charts of various types.
Wordprocessing packages are the most popular type of computer software. For many people a word
processing package is the only software they use. It is hard to think of a single job where some use for
word-processing d not be found.
Many people now type their own documents directly into a computer rather than give them to a typist. This
saves time, if people can type quickly. Because the text is stored in memory once it is typed, people can
alter their text before finally saving and print.
Now, even printing letters is becoming out-of-date because electronic mail is often to send word processed
documents in electronic form from one place to another. There is no need for an envelope or a stamp and the
recipient may store the letter on disk for future reference, thus saving valuable storage space.
Hardware and software for word processing

The most common hardware arrangement is:
• a microcomputer
• a high quality printer (a laser or ink-jet printer)
a mouse (necessary if the word processor I uses Windows or the Macintosh operating r/stem)
• a keyboard.
Apart from the operating system, the only Software needed is a word-processing package. This can be a
separate package which just does word-processing or part of an integrated package, where wordprocessing is
one part of the complete package.
You can also use special voice recognition software to dictate straight into a word-processor.

The advantages of word processing
Word processing has several advantages compared with typing.
1. Much more professional results can be obtained by everyone and not just those who are experienced typists,
because typing mistakes are easily altered on the screen before printing out.
2. Fewer resources are used, provided that material is carefully proofread on the screen before finally printing.
Since we can edit and correct mistakes before printing, this reduces the amount of paper used. Most large
companies now send all their internal letters and memos using electronic mail. The widespread use of
electronic mail will help to conserve valuable resources.
3. More people are able to produce their own documents rather than pass them to someone else to type. This
can save both time and money.
4. More word processors are becoming like simple desktop publishing packages. These extra facilities enable
people to produce notices, posters, tickets, etc. with very little training.
What can you do with word processing software?

Word-processing software is software that is able to store, edit and manipulate text. All word-processing
software allows you to enter text then edit and rearrange it before storing it and printing it out. Some word
processor are 'WYSIWYG' ('what you see is what you get'). This means that you are able to see on the
screen exactly what the page will look like when it is printed out.
Most word processors allow you to change the fonts (type styles and sizes).
Spell checkers
Nearly all word processors have a dictionary against which the words in a document canj compared to check
their spelling. Most allow you to add words to the dictionary which is useful if you use special terms in
subjects such as law or medicine. It is important to note that spell checking a document will not remove all
the errors. For instance, if you intended to type 'the' and typed 'he' instead, then the spell checker will not
detect this since 'he' as a word is spelt correctly. After using a spell checker it is still necessary to of read a
document.
Thesaurus
A thesaurus is useful for creative writing (perhaps for English GCSE coursework). It allows you to highlight
a word in a document and the computer lists words with similar meanings (called synonyms)
Mail merge
Merging involves combining a list of say, names and addresses, with a typed letter, that a series of similar
letters is produced, each addressed to a different person. The list is either created using the word processor
or by importing data from a database of names and addresses. The letter is typed, using the wordprocessor,
with blanks where the data the list is to be inserted.
Indexing

It allows you to highlight words that you would like to use in an index. The wordprocessor keeps a record
of the words, and creates an index when instructed to.
Macros
Macros allow you automatically to produce a sequence of keystrokes so that, for example, you can just press
one key or a combination of keys and have your name and address printed at the top of the page. You can
also insert dates just by pressing a couple of keys. Macros are very useful for things that need to be done
repetitively.
Grammar checkers
Some of the more sophisticated word processors have a feature called a grammar checker. This is useful if
your English is not so good. If the word processor you use does not have this facility, then you can buy a
separate grammar checking package to use with it. Figure 28.5 shows a grammar checker being used.
Because of the complexities of the English language, grammar checkers do have their limitations. As yet
they find only a few faults and do tend to provide an incorrect analysis of the grammar of writing. You need
to exercise care when using them.
Other features found in most word-processing packages

Here are just some of the many features you will find as part of a professional word-processing package such
as Microsoft Word.
Print preview - using the print preview you can see the whole page (as a smaller version) before you print
it out. This can save time and paper.
Templates and wizards - here the framework of the document (fonts, sizes, headings, etc.) is laid out for
you. You do not need to worry about the layout. All you have to do is supply the text.
Save in different file formats - this means you can transfer your wordprocessed material to almost any other
package.
Word-count - this counts the number of words in a document or part of a document. This is useful if you
have to write essays or course work of a certain number of words.
On-line help - you can search for help by entering a keyword or by selecting a topic from a list. Good on-
line help is particularly important as a lot of software is supplied without manuals.
Drawing tools - these tools allow you to produce arrows, shapes, boxes, etc. without having to use a special
graphics package.
Zooming in and out - particularly useful if you have difficulty seeing small text on the screen.
Tables - rather than use a spreadsheet you can use the tables function in the wordprocessor. If there are
numbers in the tables you can even perform simple calculations on them.

Ability to import clip art/other files - documents can be made much more attractive by importing pictures
(clip art) or photographs from pre-stored files.
Choosing a word processing package

Ideally you should choose the software you want to run and then choose hardware to suit software.
However, since many people using existing hardware, their choice of word processing package may, to
some extent be dictated by the hardware.\
Most people's favourite word processor is the one they are used to, but there are many new and useful
features in the newer packages to tempt you. How do you decide which ONE TO USE? You may like to ask
yourself the following questions to help you decide.
1. If you are a fast typist you would look for a wordprocessor that could keep up with
your typing speed. Windows word-processors can be slow on older computers, so a DOS based
package might be better.
2. you would like to do simple desktop publishing, then many of the Windows,
Macintosh or Windows 98-based word processors would be suitable. Many of have some of the
features you used find only in DTP packages.
3) If you need to send out a number of letters to lots of different people, it would be best to go for a word
processor with a mail merge facility. Mail merging allows you to create standard letters and link them
to a database of names and addresses so that you can make it appear that the letters are all tailor made.
4) If you use a variety of word processors, then you should check whether you can transfer the files
between the different packages. With some it is easy, but with others you may find that one word
processor is unable to read files from a different word processor.

Intro to HTML
HTML is a language for describing web pages.

 HTML stands for Hyper Text Markup Language
 HTML is not a programming language, it is a markup language
 HTML uses markup tags to describe web pages
HTML markup tags are usually called HTML tags

 HTML tags are keywords surrounded by angle brackets like <html>
 HTML tags normally come in pairs like <b> and </b>
 The first tag in a pair is the start tag, the second tag is the end tag
 Start and end tags are also called opening tags and closing tags
In this tutorial we use a plain text editor (like Notepad) to edit HTML. I believe this is the best way to
learn HTML.
You save the document ALWAYS with whatever name you chooses ending with “.html”
HOW TO MAKE A WEB PAGE

To make a web page you can type the following codes using note pad or any text editor of your choice
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>
The code <html> </html> tells us that you want to create a web page. Please note a webpage must have
an open and a close tag.
The code <body></body> defines what will be shown on the web page. Please note the body must have an
opening and a closed tag
HTML headings are defined with the <h1> to </h1> code. Heading is what we call the title. You can
change the number to different sizes. <h1> is the largest while <h6> is the smallest. All header tags should
have an opening and a closed tag.
HTML paragraphs are defined with the <p> tag.

Whenever you want to start a new paragraph <p> is what you can use. Paragraph tag should have an
opening and a closed tag.
HOW TO ALIGN THE TEXT
This is called text aligning where you can choose for the words to be left, right or centre. We can do this
from within the “h” tags. For instance if I wanted to centre the title I would use the following codes.

<h1 style="text-align:center">
Similarly if I wanted to centre the paragraph I would use the following codes.
<p1 style="text-align:center">
HOW TO GIVE THE PAGE SOME COLOUR
We use the following codes when we want to give the web page some life.
<body style="background-color:yellow">
We write the code within the body tag. Also noticed that the word colour is spelt the American way color
as that is the way the computer understands it.
HOW TO CHANGE THE STYLE OF THE FONT
We use the following codes to change the style of the writing. We do this in the <body> tag, the <p> tag
or the <h> wherever you would like to change the style of the writing.
<p style="font-family:courier new”>
<body style=”font-family:arial”>
<h1 style= “font-family:times new roman”>
Note that you must type the code “font-famly:” before the name of the font you would like to use.
HOW TO CHANGE THE COLOUR OF THE FONT
We use the following codes to change the colour of the writing. We do this in the <body> tag, the <p> tag
or the <h> wherever you would like to change the colour of the writing.
<p style="color:red”>
<body style=”color:blue”>
<h1 style= “color:green”>
Note that you must type the code “color:” before the name of the colour you would like to use. If you
want a particular paragraph to have a different colour you use the <p> tag or if you want the whole page
to have a particular colour you use the <body> tag and similarly if you want your title to have a different
colour you use the <h> tag.

HOW TO CHANGE THE FONT SIZE
We use the following codes to change the size of the words. We do this in the <body> tag or the <p>
wherever you would like to change the side of the words.
<p style="font-size:10px”>
<body style=”font-size:15px”>
Note that you must type the code “font-size:” before how big you would like your words to look. If you
want a particular paragraph to have a different size you use the <p> tag or if you want the whole page to
have a particular font size you use the <body> tag.
HOW TO MAKE A TABLE IN YOUR WEB PAGE
Tables are defined with the <table> tag. A table is divided into rows (with the <tr> tag), and each row is
divided into data cells (with the <td> tag). The letters td stands for "table data," which is the content of a
cell. A data cell can contain text, images, lists, paragraphs, forms, horizontal rules, tables, etc.
You use the following codes to make a table
<table>
<tr>
<th>Day</th>
<th>Subject</th>
</tr>
<tr>
<td>Monday</td>
<td>Information Technology</td>
</tr>
</table>
The table has an opening and a closed tag. The rows and columns are defined within these tags <table>
</table>
If you would like to make more rows and column simply copy the <tr> to </tr> code.
Notice in the code above there are some codes called <th></th>. Within this code is the table header or
what gives the title for each of the column.
HOW TO INSERT A PICTURE
A web page would not be a web page without pictures

To put pictures onto your webpage you write the following codes
<img src="mypicture.jpg"/>

You can place the image anywhere you prefer on your web page. However you must ensure that the
picture and the web page are in the same folder for the picture to show on the page.
The name of the image must be the same as that in the code.

Types of Web Pages
Portal
A web portal is most often a specially designed web site that brings information together from
diverse sources in a uniform way. Usually, each information source gets its dedicated area on the page for
displaying information
Example of Portal: Yahoo!
Justification: Yahoo! is an American multinational technology company. It operates a portal that provides
the latest news, entertainment, and sports information. It is important to note that the portal also gives
users access to other Yahoo services like Yahoo! Search, Yahoo Mail, Yahoo Maps, Yahoo Finance,
Yahoo Groups and Yahoo Messenger. For this reason, it can be classified as a suitable web portal
News
A news website is one that is created for the main purpose of providing accurate and up-to-date
news information from around the region and the world and gathering it together in one spot so the user
can see it. A news website can sometimes be the online version of a newspaper.
Example: BBC News (www.bbc.com/news)
Justification: The BBC News website is one of largest online based newspaper in the world and is the
most frequently accessed news website in the United Kingdom. The website contains international news
coverage, as well as British, entertainment, science, and political news. Many reports are accompanied by
audio and video from the BBC's television and radio news services, while the latest TV and radio bulletins
are also available to view or listen to on the site together with other current affairs programmes. Therefore,
it fits perfectly in the “news” category.

Blog
A blog (shortened version of “weblog”) is a discussion or informational website published on the
World Wide Web consisting of discrete, often informal diary-style text entries ("posts"). Blogs can also be
described as online diaries, journals, or editorials.
Example: Lifehacker
Justification: Lifehacker is a weblog about life hacks and software (a life hack refers to any shortcut or
trick that increases productivity in all walks in life). The blog posts cover a wide variety of topics that
discuss both software and life itself. Topics include Microsoft Windows, Mac, Linux programs, iOS and
Android, as well as general life tips and tricks. The staff of Lifehacker updates the site about 18 times per
weekdays, with reduced times on weekends. This in accordance with a blog’s characteristics of periodic
updates. Because of its nature to share stories, tricks and advice on a timely basis, Lifehacker can easily be
classified as a blog.
Social Networking
A social networking site an online platform that is used by people to build social networks or
social relations with other people who share similar personal or career interests, activities, backgrounds or
real-life connections. Depending on the social media platform, members may be able to contact any other
member. In other cases, members can contact anyone they have a connection to, and subsequently anyone
that contact has a connection to, and so on.
Example: Facebook (www.facebook.com)
Justification: Facebook is easily, and by far, “the biggest and most recognizable social media network on
the Internet, both in terms of total number of users and name recognition. Founded on February 4, 2004,
Facebook has within 12 years managed to accumulate more than 1.59 billion monthly active users and this
automatically makes it one of the best mediums for connecting people from all over the world with your
business.” (Maina, 2016). Facebook provides a plethora of options for individuals to connect to friends,

make new friends based on recommendations, see a friend’s status or update and chat online in real time.
This makes it the perfect example of a social networking site.
Personal
A personal website are World Wide Web pages created by an individual to contain content of a
personal nature rather than content pertaining to a company, organization or institution. Personal web
pages are primarily used for informative or entertainment purposes but can also be used for personal
career marketing.
Example: JoshJohnson (http://joshuajohnson.co.uk/)
Justification: Josh Johnson is a personal website created by a man of the same name. It is used to sell
himself as a very skilled web designer. Characteristic of a personal website, JoshJohnson contains
personal information of Johnson as well being used to market his personal career.
Business/Marketing
A business website is a collection webpages that are used to carry out some form of business
and/or marketing purposes. They are setup so that individuals can purchases a product, request a service or
market a product of their own. Businesses use these websites to expand their range of customers and carry
out their operations across the Internet.
Example: Amazon.com
Justification: Amazon is an American based electronic e-commerce and Business Company. It is the
largest Internet-based retailer in the world by total sales and market capitalization. Amazon more than
qualifies as a business website as it allows users to purchase a wide range of products from their stores by
the click of a button.

Documenting Information Systems

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Documenting Information Systems

Uploaded by

Copyright:

Available Formats

DOCUMENTATION OF INFORMATION SYSTEMS

Data Flow Diagrams (DFDs)

Data Flow Diagrams – Diagram Notation

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 1

Diagrams used to represent a process

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 2

Diagram used to represent data flow

Diagram used to represent resource

Rules governing how the diagram components relate to each other

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 3

Context Diagram Example

Data Flow Diagrams – Level 1 Diagrams

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 4

Some additional guidelines for creating DFDs:

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 5

1. An Entity must be a noun (e.g. Member, Student, Customer etc.)

N.B. Always use one or more processes

Video Number 1 Video Details

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 6

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 7

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 8

Data Flow Diagrams – Practices Exercises

Required: Construct a Context Diagram (Level 0).

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 9

Specific Objective 9: Explain the concept of normalization;

explain the concept of Definition of normalisation; attribute redundancy and

Database Normalisation, sometimes referred to as Canonical Synthesis, is a technique for designing

Problems addressed by Normalisation

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 10

 The same information can be

Ideally, a relational database table should be designed in such

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 11

Referential Integrity in a Relational Database is consistency

Functional Dependency: Attribute B has a functional

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 12

First Normal Form

1. There's no top-to-bottom ordering to the rows.

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 13

Example 1: Domains and values

Suppose novice designer CUSTOMER

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 14

Example 3: Repeating groups within columns

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 15

CUSTOMER NAME CUSTOMER TELEPHONE NUMBER

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 16

The criteria for second normal form (2NF) are:

 The table must be in 1NF.

Consider a table describing employees' skills:

The table's only candidate key is {Employee, Skill}.

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 17

EMPLOYEES EMPLOYEES' SKILLS

Third Normal Form

Multiple candidate keys occur in the following table:

ELECTRIC TOOTHBRUSH MODELS

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 18

Third Normal Form

 The relation R (table) is in second normal form (2NF)

The only candidate key is {Tournament, Year}.

TOURNAMENT WINNERS PLAYER DATES OF BIRTH

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 19

An Introduction to Database Normalization

CAPE NOTES INFORMATION TECHNOLOGY Unit 2 20

Table 1. Two Books