
Computer Applications – BBA 3rd year SN

Computer Applications
System Analysis and Design:
System
A collection of components that work together to realize some objective forms a
system. Basically, there are three major components in every system: input,
processing and output.

In a system, the different components are connected with each other and they are
interdependent. For example, the human body represents a complete natural system.
We are also bound by many national systems, such as the political system, economic
system, educational system and so forth. The objectives of the system demand that
some output is produced as a result of processing suitable inputs.

The term “system” originates from the Greek term systēma, which means to
“place together.” Multiple business and engineering domains have definitions of a
system. This text defines a system as:

• System: An integrated set of interoperable elements, each with explicitly
specified and bounded capabilities, working synergistically to perform
value-added processing to enable a User to satisfy mission-oriented operational
needs in a prescribed operating environment with a specified outcome and
probability of success.

Characteristics of a System

1. Organization: The structure or order in which the system is built.
2. Interaction: The procedure by which the components interact.
3. Interdependence: How the modules in a system depend on one another.
4. Integration: How the different modules are integrated to form a complete system.
5. Central Objective: How the system is made to achieve its central goal,
and its performance towards that goal.


Elements of System Analysis

There are four basic elements of system analysis:

1. Outputs
2. Inputs: The essential elements of Inputs are
a) Accuracy of data
b) Timeliness
c) Proper format
d) Economy.
3. Files
4. Process

Super System and Sub System

A sub-system is a system that exists within another system. Its existence depends
upon the existence of its super-system. The sub-system contributes to the proper
working of the entire system, and the proper functioning of the sub-systems ensures
the proper functioning of the entire system. For example, an automobile system
consists of many sub-systems, such as the acceleration system, fuel injection
system, braking system, etc.; every sub-system is responsible for the proper
functioning of the entire automobile system. A super-system is a system which
contains many sub-systems. The super-system is responsible for monitoring the
overall working of its sub-systems and decides the constraints and resources to be
put on them. For example, a central government system is a super-system which has
under its control the state government systems, which form its sub-systems.

Types of Systems

1. Physical or abstract systems
2. Open or closed systems
3. Deterministic or probabilistic systems
4. Man-made systems:
   Formal systems – organization representation
   Informal systems – employee-based systems
   Computer-based information systems – computers handling business
   applications. These are collectively known as Computer Based Information
   Systems (CBIS).

a. Transaction Processing System (TPS)
b. Management Information System (MIS)
c. Decision Support System (DSS)
d. Office Automation System (OAS)

Systems analysis is the interdisciplinary part of science dealing with the analysis
of sets of interacting entities (the systems), often prior to their automation as
computer systems, and the interactions within those systems. This field is closely
related to operations research. It is also "an explicit formal inquiry carried out
to help someone, referred to as the decision maker, identify a better course of
action and make a better decision than he might have otherwise made."

SYSTEM LIFE CYCLE

The system life cycle is an organizational process of developing and maintaining
systems. It helps in establishing a system project plan, because it gives an
overall list of the processes and sub-processes required to develop a system.

The system development life cycle is a combination of various activities; in other
words, the various activities put together are referred to as the system
development life cycle. In System Analysis and Design terminology, the system
development life cycle means the software development life cycle.

Following are the different phases of software development cycle:

• System study
• Feasibility study
• System analysis
• System design
• Coding and Implementation
• Testing
• User Implementation
• Maintenance

The different phases of the software development life cycle are shown in Fig. 29.1.


Fig. 29.1 Different phases of Software development Life Cycle

29.5 PHASES OF SYSTEM DEVELOPMENT LIFE CYCLE

Let us now describe the different phases and the related activities of system
development life cycle in detail.

(a) System Study

System study is the first stage of the system development life cycle. It gives a
clear picture of what the physical system actually is. In practice, the system
study is done in two phases. In the first phase, a preliminary survey of the
system is done, which helps in identifying the scope of the system. The second
phase of the system study is a more detailed and in-depth study, in which the
user's requirements and the limitations and problems of the present system are
identified. After completing the system study, a system proposal is prepared by
the System Analyst (who studies the system) and placed before the user. The
proposal contains the findings about the present system and recommendations to
overcome its limitations and problems in the light of the user's requirements.

To describe the system study phase more analytically, we would say that
system study phase passes through the following steps:

• problem identification and project initiation
• background analysis
• inference or findings

(b) Feasibility Study

On the basis of the results of the initial study, the feasibility study takes
place. The feasibility study is basically a test of the proposed system in the
light of its workability, meeting of the user's requirements, effective use of
resources and, of course, cost-effectiveness. The main goal of the feasibility
study is not to solve the problem but to establish its scope. In the process of
the feasibility study, the costs and benefits are estimated with greater accuracy.

There are three types of feasibility:

1. Economic
2. Operational
3. Technical

(c) System Analysis



Assuming that a new system is to be developed, the next phase is system analysis.
Analysis involves a detailed study of the current system, leading to
specifications of a new system. It is a detailed study of the various operations
performed by a system and their relationships within and outside the system.
During analysis, data are collected on the available files, decision points and
transactions handled by the present system. Interviews, on-site observation and
questionnaires are the tools used for system analysis. Using the following steps,
it becomes easy to draw the exact boundary of the new system under consideration:

• keeping in view the problems and new requirements
• working out the pros and cons, including new areas of the system

All procedures and requirements must be analyzed and documented in the form of
detailed data flow diagrams (DFDs), a data dictionary, logical data structures and
miniature specifications. System analysis also includes sub-dividing the complex
processes involving the entire system, and identification of data stores and
manual processes.

The main points to be discussed in system analysis are:

• Specification of what the new system is to accomplish, based on the user
requirements.
• Functional hierarchy, showing the functions to be performed by the new
system and their relationships with each other.
• Function network, which is similar to a function hierarchy but highlights
those functions which are common to more than one procedure.
• List of attributes of the entities – these are the data items which need
to be held about each entity (record).

(d) System Design

Based on the user requirements and the detailed analysis of the new system,
the new system must be designed. This is the system design phase. It is the
most crucial phase in the development of a system. Normally, the design
proceeds in two stages:

• preliminary or general design
• structure or detailed design

Preliminary or general design: In the preliminary or general design, the
features of the new system are specified. The costs of implementing these
features and the benefits to be derived are estimated. If the project is still
considered to be feasible, we move to the detailed design stage.


Structure or detailed design: In the detailed design stage, computer-oriented
work begins in earnest. At this stage, the design of the system becomes more
structured. A structure design is a blueprint of a computer system solution to
a given problem, having the same components and inter-relationships among the
same components as the original problem. Input, output and processing
specifications are drawn up in detail. In the design stage, the programming
language and the platform on which the new system will run are also decided.

There are several tools and techniques used for designing. These tools and
techniques are:

• Flowchart
• Data flow diagram (DFDs)
• Data dictionary
• Structured English
• Decision table
• Decision tree

Each of the above design tools will be discussed in detail in the next lesson.
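One of these tools, the decision table, can be sketched in code as a mapping from
condition combinations to actions. The loan-screening conditions and actions below
are hypothetical, chosen only to illustrate the structure:

```python
# A decision table maps each combination of conditions to one action.
# The conditions (has_account, good_credit) and actions are made-up examples.
decision_table = {
    # (has_account, good_credit): action
    (True,  True):  "approve loan",
    (True,  False): "refer to manager",
    (False, True):  "open account first",
    (False, False): "reject",
}

def decide(has_account, good_credit):
    """Look up the action for a given combination of conditions."""
    return decision_table[(has_account, good_credit)]

print(decide(True, True))    # approve loan
print(decide(False, False))  # reject
```

Because every condition combination appears exactly once, a decision table makes
it easy to check that no case has been forgotten.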

(e) Coding and Implementation

After designing the new system, the whole system is required to be converted
into a language the computer can understand. This is done by coding the new
system in a computer programming language. It is an important stage, in which
the defined procedures are transformed into control specifications with the
help of a computer language. This is also called the programming phase, in
which the programmer converts the program specifications into computer
instructions, which we refer to as programs. The programs coordinate the data
movements and control the entire process in a system.

It is generally felt that the programs must be modular in nature. This helps in
fast development, maintenance and future change, if required.

(f) Testing

Before actually putting the new system into operation, a test run of the
system is done to remove all bugs, if any. This is an important phase of a
successful system. After coding the whole of the system's programs, a test
plan should be developed and run on a given set of test data. The output of
the test run should match the expected results.

Using the test data, the following test runs are carried out:


• Unit test
• System test

Unit test: When the programs have been coded, compiled and brought to working
condition, they must be individually tested with the prepared test data. Any
undesirable behaviour must be noted and debugged (error correction).

System test: After carrying out the unit test for each of the programs of the
system, and when the errors have been removed, the system test is done. At this
stage the test is done on actual data: the complete system is executed on the
actual data. At each stage of the execution, the results or output of the
system are analyzed. During this analysis, it may be found that the outputs do
not match the expected output of the system. In such cases, the errors in the
particular programs are identified, fixed and tested again for the expected
output.

When it is ensured that the system is running error-free, the users are called
in with their own actual data so that the system can be shown running as per
their requirements.
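The unit-test step described above can be sketched with Python's built-in
unittest module. The function under test (is_valid_roll_no) and its validation
rule are hypothetical stand-ins for a real program unit:

```python
import unittest

# Hypothetical unit under test: a roll number is valid if it is a
# positive integer. Name and rule are illustrative, not from the text.
def is_valid_roll_no(roll_no):
    return isinstance(roll_no, int) and roll_no > 0

class TestRollNoValidation(unittest.TestCase):
    # Each test feeds prepared test data and compares the actual result
    # with the expected result, as the unit-test step describes.
    def test_accepts_positive_integer(self):
        self.assertTrue(is_valid_roll_no(10))

    def test_rejects_zero_and_negatives(self):
        self.assertFalse(is_valid_roll_no(0))
        self.assertFalse(is_valid_roll_no(-5))

    def test_rejects_non_integers(self):
        self.assertFalse(is_valid_roll_no("10"))

if __name__ == "__main__":
    unittest.main(exit=False, verbosity=2)
```

Any failing assertion points to a defect that must be noted and debugged before
the programs move on to the system test.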

(g) User Implementation

Once the user has accepted the new system, the implementation phase begins.
Implementation is the stage of a project during which theory is turned into
practice. During this phase, all the programs of the system are loaded onto
the user's computer. After loading the system, training of the users starts.
The main topics of such training are:

• How to execute the package
• How to enter the data
• How to process the data (processing details)
• How to take out the reports

After the users have been trained on the computerized system, working has to
shift from manual to computerized. The following two strategies are followed
for running the system:

i. Parallel run: In such a run, for a certain defined period both the
systems, i.e. computerized and manual, are executed in parallel. This
strategy is helpful because of the following:

o Manual results can be compared with the results of the computerized system.


o Failure of the computerized system at an early stage does not affect the
working of the organization, because the manual system continues to work as
it used to.

ii. Pilot run: In this type of run, the new system is installed in parts.
Some part of the new system is installed first and executed successfully for
a considerable time period. Only when the results are found satisfactory are
the other parts implemented. This strategy builds confidence, and errors are
traced easily.

(h) Maintenance

Maintenance is necessary to eliminate errors in the system during its working
life and to tune the system to any variations in its working environment.
There are always some errors found in the system, and these must be noted and
corrected. Maintenance also means the review of the system from time to time.
The review of the system is done for:

• knowing the full capabilities of the system
• knowing the required changes or the additional requirements
• studying the performance

If a major change to a system is needed, a new project may have to be set up
to carry out the change. The new project will then proceed through all the
above life cycle phases.

Requirements Gathering

Requirements analysis, in systems engineering and software engineering,
encompasses those tasks that go into determining the needs or conditions to
meet for a new or altered product, taking account of the possibly conflicting
requirements of the various stakeholders, such as beneficiaries or users.
Requirements analysis is critical to the success of a development project.
Requirements must be actionable, measurable, testable, related to identified
business needs or opportunities, and defined to a level of detail sufficient
for system design. Requirements can be functional or non-functional.

Conceptually, requirements analysis includes three types of activity:

• Eliciting requirements: the task of communicating with customers and users
to determine what their requirements are. This is sometimes also called
requirements gathering.
• Analyzing requirements: determining whether the stated requirements are
unclear, incomplete, ambiguous, or contradictory, and then resolving these
issues.


• Recording requirements: requirements might be documented in various forms,
such as natural-language documents, use cases, user stories, or process
specifications.

Requirements analysis can be a long and arduous process, during which many
delicate psychological skills are involved. New systems change the
environment and relationships between people, so it is important to identify
all the stakeholders, take into account all their needs and ensure they
understand the implications of the new systems. Analysts can employ several
techniques to elicit the requirements from the customer. Historically, this
has included such things as holding interviews or focus groups (more aptly
named in this context as requirements workshops) and creating requirements
lists. More modern techniques include prototyping and use cases. Where
necessary, the analyst will employ a combination of these methods to
establish the exact requirements of the stakeholders, so that a system that
meets the business needs is produced.

Requirements engineering
Systematic requirements analysis is also known as requirements engineering. It is
sometimes referred to loosely by names such as requirements gathering,
requirements capture, or requirements specification. The term requirements
analysis can also be applied specifically to the analysis proper, as opposed to
elicitation or documentation of the requirements, for instance.

Developing an IT application is an investment, since, once developed, the
application provides the organization with profits. Profits can be monetary
or in the form of an improved working environment. However, it carries risks,
because in some cases an estimate can be wrong and the project might not
actually turn out to be beneficial.

Cost-benefit analysis helps give management a picture of the costs, benefits
and risks. It usually involves comparing alternative investments.
Cost-benefit analysis determines the benefits and savings that are expected
from the system and compares them with the expected costs.
The cost of an information system involves the development cost and the
maintenance cost. The development costs are a one-time investment, whereas
maintenance costs are recurring. The development cost is basically the cost
incurred during the various stages of system development.

Each phase of the life cycle has a cost. Some examples are:

• Personnel
• Equipment
• Supplies
• Overheads
• Consultants' fees


Cost and Benefit Categories

In performing cost-benefit analysis (CBA) it is important to identify the cost
and benefit factors. Costs and benefits can be categorized as follows.
There are several cost factors/elements: hardware, personnel, facility,
operating, and supply costs.
In a broad sense, the costs can be divided into two types:

1. Development costs:

Development costs, which are incurred during the development of the system,
are a one-time investment, e.g.:

• Wages
• Equipment

2. Operating costs, e.g.:

• Wages
• Supplies
• Overheads

Another classification of the costs can be:

3. Hardware/software costs:

These include the cost of purchasing or leasing computers and their
peripherals. Software costs cover the required software.

4. Personnel costs:

This is the money spent on the people involved in the development of the
system. These expenditures include salaries and other benefits such as health
insurance, conveyance allowance, etc.

5. Facility costs:

Expenses incurred during the preparation of the physical site where the
system will be operational. These can include wiring, flooring, acoustics,
lighting, and air conditioning.

6. Operating costs:

Operating costs are the expenses required for the day-to-day running of the
system. This includes the maintenance of the system, which can be in the form
of maintaining the hardware or application programs, or money paid to
professionals responsible for running or maintaining the system.

7. Supply costs:

These are variable costs that vary in proportion to the amount of use of
paper, ribbons, disks, and the like. They should be estimated and included in
the overall cost of the system.

Benefits

We can define benefit as:

Profit or Benefit = Income - Costs

Benefits can be accrued by:

- increasing income, or
- decreasing costs, or
- both

The system will provide some benefits also. Benefits can be tangible or
intangible, direct or indirect. In cost benefit analysis, the first task is to
identify each benefit and assign a monetary value to it.

The two main benefits are improved performance and minimized processing
costs.

Further costs and benefits can be categorized as

Tangible or Intangible Costs and Benefits

Tangible costs and benefits can be measured. Hardware costs, salaries for
professionals and software costs are all tangible costs; they can be
identified and measured. The purchase of hardware or software, personnel
training, and employee salaries are examples of tangible costs. Costs whose
value cannot be measured are referred to as intangible costs. For example,
the breakdown of an online system during banking hours will cause the bank to
lose deposits.

Benefits are also tangible or intangible. For example, greater customer
satisfaction, improved company status, etc. are all intangible benefits,
whereas improved response time and the production of error-free output, such
as reports, are tangible benefits. Both tangible and intangible costs and
benefits should be considered in the evaluation process.

Direct or Indirect Costs and Benefits


From the cost accounting point of view, costs are treated as either direct or
indirect. Direct costs have a rupee value associated with them. Direct
benefits are also attributable to a given project. For example, if the
proposed system can handle, say, 25% more transactions than the present
system, that is a direct benefit.

Indirect costs result from the operations that are not directly associated with
the system. Insurance, maintenance, heat, light, air conditioning are all
indirect costs.

Fixed or Variable Costs and Benefits

Some costs and benefits are fixed. Fixed costs don't change; depreciation of
hardware, insurance, etc. are all fixed costs. Variable costs are incurred on
a regular basis; the recurring period may be weekly or monthly, depending
upon the system. They are proportional to the work volume and continue as
long as the system is in operation.

Fixed benefits don't change. Variable benefits are realized on a regular basis.

Performing Cost Benefit Analysis (CBA)

Example:

Cost for the proposed system (figures in USD thousands): table not
reproduced; total costs USD 154,000.

Benefit for the proposed system: table not reproduced; total benefits
USD 300,000.

Profit = Benefits - Costs
       = 300,000 - 154,000
       = USD 146,000

Since we are gaining, this system is feasible.
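The profit computation above can be sketched as a small function, using the
example figures from the text:

```python
# Cost-benefit profit check, with the example figures from the text:
# USD 300,000 in benefits and USD 154,000 in costs.
def cost_benefit(benefits, costs):
    """Return the profit and whether the system is feasible (profit > 0)."""
    profit = benefits - costs
    return profit, profit > 0

profit, feasible = cost_benefit(300_000, 154_000)
print(profit)    # 146000
print(feasible)  # True
```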

Steps of CBA can briefly be described as:

• Estimate the development costs, operating costs and benefits.
• Determine the life of the system.
• When will the benefits start to accrue?
• When will the system become obsolete?
• Determine the interest rate
(this should reflect a realistic low-risk investment rate).

Select Evaluation Method


When all the financial data have been identified and broken down into cost
categories, the analyst selects a method for evaluation.
There are various analysis methods available. Some of them are following.

1. Present value analysis
2. Payback analysis
3. Net present value
4. Net benefit analysis
5. Cash-flow analysis
6. Break-even analysis
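Two of these methods, net present value and payback analysis, can be sketched as
follows. The yearly benefit figures and the 10% interest rate are assumptions for
illustration, not from the text:

```python
# Sketches of two evaluation methods from the list above.
# The cash flows (USD 60,000/year) and the 10% rate are hypothetical.

def net_present_value(rate, initial_cost, yearly_benefits):
    """NPV = -initial cost + each year's benefit discounted to the present."""
    npv = -initial_cost
    for year, benefit in enumerate(yearly_benefits, start=1):
        npv += benefit / (1 + rate) ** year
    return npv

def payback_period(initial_cost, yearly_benefits):
    """Number of years until cumulative benefits cover the initial cost."""
    cumulative = 0
    for year, benefit in enumerate(yearly_benefits, start=1):
        cumulative += benefit
        if cumulative >= initial_cost:
            return year
    return None  # never pays back within the system's given life

benefits = [60_000, 60_000, 60_000, 60_000]  # hypothetical, per year
print(round(net_present_value(0.10, 154_000, benefits)))  # 36192
print(payback_period(154_000, benefits))                  # 3
```

A positive NPV at the chosen rate, or a payback period shorter than the system's
life, indicates the project is worth undertaking.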


Database Management System

A database is a collection of related information. For example, a phone book
is a database of names, addresses and phone numbers.

A data file is a single disk file that stores related information on a hard disk
or floppy diskette. For example, a phone book database would be stored in a
single data file.

A Database Management System (DBMS) is a software tool that facilitates
creating, maintaining, and manipulating an information database. A DBMS is a
repository of interrelated data, along with a set of functions to access and
manipulate those data.

Data manipulation involves:

• retrieval of data from the database
• insertion of new data into the database
• updating of existing data in the database
• deletion of data
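These four operations can be sketched with Python's built-in sqlite3 module. The
student table and its columns are illustrative assumptions, not from the text:

```python
import sqlite3

# A minimal sketch of the four data-manipulation operations on an
# in-memory SQLite database. Table/column names are made up.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)")

# Insert new data into the database
cur.execute("INSERT INTO student VALUES (10, 'Asha', 'Computer')")

# Update existing data in the database
cur.execute("UPDATE student SET course = 'Accounts' WHERE roll_no = 10")

# Retrieve data from the database
row = cur.execute(
    "SELECT name, course FROM student WHERE roll_no = 10").fetchone()

# Delete data
cur.execute("DELETE FROM student WHERE roll_no = 10")
count = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
conn.close()

print(row)    # ('Asha', 'Accounts')
print(count)  # 0
```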

Relational Database Software

Relational database software allows the user to work with several database
files at the same time and share information across the files. For example,
to implement an accounting database system, one requires relational
capabilities to link together information that is stored in different files.

Examples of relational database software are Microsoft Access, Oracle, Sybase
and Paradox.

Flat-file Database Software

A flat-file database program allows the user to create many databases but
lets him/her work with only one file at a time. Using a flat-file database
program, one can create simple applications such as mailing-list databases or
personnel files.

Advantages of the database approach over traditional file-processing systems

Following are some of the advantages of using a database over a traditional
file-processing system:

• Potential for enforcing standards.
• Flexibility.
• Reduced application development time.
• Availability of up-to-date information to all users.
• Economies of scale.

Benefits of a Relational Database

Following are some of the advantages of a relational database:

• Data can be easily accessed.
• Data can be shared.
• Data modeling can be flexible.
• Data storage and redundancy can be reduced.
• Data inconsistency can be avoided.
• Data integrity can be maintained.
• Standards can be enforced.
• Security restrictions can be applied.
• Independence between physical storage and logical data design can be
maintained.
• A high-level data manipulation language (SQL) can be used to access and
manipulate data.

A relational database stores data in tables. The data stored in a table is
organized into rows and columns. Each row in a table represents an individual
record and each column represents a field. A record is an individual entry in
the database. For example, each person's name, address, and phone number is a
single record of information in a phone book, whereas a "field" is a piece of
information in a record. For example, you can divide a person's record in the
phone book into fields for their last name, first name, address, city and
phone number.
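The phone-book analogy can be made concrete with a small relational table, where
each row is a record and each column a field. The names and numbers below are
made up for illustration:

```python
import sqlite3

# A phone book as a relational table: each row is one record; the columns
# (last_name, first_name, address, city, phone) are its fields.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE phone_book (
    last_name TEXT, first_name TEXT, address TEXT, city TEXT, phone TEXT)""")

cur.executemany(
    "INSERT INTO phone_book VALUES (?, ?, ?, ?, ?)",
    [("Sharma", "Ravi", "12 MG Road", "Pune", "555-0101"),      # one record
     ("Iyer", "Lakshmi", "4 Park St", "Chennai", "555-0102")])  # another record

# Retrieving two fields (first_name, phone) from records in a chosen city
rows = cur.execute(
    "SELECT first_name, phone FROM phone_book WHERE city = 'Pune'").fetchall()
conn.close()
print(rows)  # [('Ravi', '555-0101')]
```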


ADVANTAGES OF DBMS

The DBMS (Database Management System) is preferred over the conventional
file-processing system due to the following advantages:

1. Controlling Data Redundancy - In the conventional file-processing system,
every user group maintains its own files for handling its data. This may lead
to:

• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated due to updating the same data in different files.
• Time wasted in entering the same data again and again.
• Needless use of computer resources.
• Difficulty in combining information.

2. Elimination of Inconsistency - In the file-processing system, information
is duplicated throughout the system, so changes made in one file may need to
be carried over to another file. This may lead to inconsistent data. We
therefore need to remove this duplication of data across multiple files to
eliminate inconsistency.

For example, let us consider a student result system. Suppose that in the
STUDENT file it is indicated that Roll No. = 10 has opted for the 'Computer'
course, but in the RESULT file it is indicated that Roll No. = 10 has opted
for the 'Accounts' course. In this case, the two entries for a particular
student don't agree with each other; the database is said to be in an
inconsistent state. To eliminate this conflicting information we need to
centralize the database. On centralizing the database, the duplication will
be controlled and hence the inconsistency will be removed.

Data inconsistencies are often encountered in everyday life. Consider another
example: we have all come across situations when a new address is
communicated to an organization that we deal with (e.g. a telecom company,
gas company or bank), and we find that some of the communications from that
organization are received at the new address while others continue to be
mailed to the old address. Combining all the data in a database would reduce
redundancy as well as inconsistency, and is therefore likely to reduce the
costs of collecting, storing and updating data.

Let us again consider the example of the result system. Suppose that a
student with Roll No. = 201 changes his course from 'Computer' to 'Arts'. The
change is made in the SUBJECT file but not in the RESULT file. This may lead
to inconsistency of the data, so we need to centralize the database so that
changes, once made, are reflected in all the tables where a particular field
is stored. The update is then applied automatically; this is known as
propagating updates.

3. Better service to the users - A DBMS is often used to provide better
services to the users. In a conventional system, availability of information
is often poor, since it is normally difficult to obtain information that the
existing systems were not designed for. Once several conventional systems are
combined to form one centralized database, the availability of information
and its currency are likely to improve, since the data can now be shared and
the DBMS makes it easy to respond to unanticipated information requests.

Centralizing the data in the database also means that users can easily obtain
new and combined information that would have been impossible to obtain
otherwise. The use of a DBMS should also allow users who don't know
programming to interact with the data more easily, unlike a file-processing
system, where the programmer may need to write new programs to meet every new
demand.

4. Flexibility of the System is Improved - Since changes are often necessary
to the contents of the data stored in any system, these changes are made more
easily in a centralized database than in a conventional system. Application
programs need not be changed when the data in the database changes.

5. Integrity can be improved - Since the data of an organization using the
database approach is centralized and used by a number of users at a time, it
is essential to enforce integrity constraints.

In conventional systems, because the data is duplicated in multiple files,
updates or changes may sometimes lead to the entry of incorrect data in some
of the files where it exists.

For example, consider the result system that we have already discussed. Since
multiple files are to be maintained, you may sometimes enter a value for
course which does not exist. Suppose course can have the values (Computer,
Accounts, Economics, Arts) but we enter the value 'Hindi' for it; this leads
to inconsistent data and a lack of integrity.

Even if we centralize the database, it may still contain incorrect data. For
example:

• The salary of a full-time employee may be entered as Rs. 500 rather than
Rs. 5000.
• A student may be shown to have borrowed books but to have no enrollment.
• A list of employee numbers for a given department may include a number of
non-existent employees.

These problems can be avoided by defining validation procedures that run
whenever any update operation is attempted.
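One way to sketch such a validation procedure is a CHECK constraint, shown here
with SQLite via Python. It restricts the course field to the four values
mentioned in the text, so an invalid value such as 'Hindi' is rejected at
insert/update time (the table layout is an illustrative assumption):

```python
import sqlite3

# A CHECK constraint as a validation procedure: course may only take the
# four values from the text, so 'Hindi' is rejected by the DBMS itself.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE result (
    roll_no INTEGER PRIMARY KEY,
    course  TEXT CHECK (course IN ('Computer', 'Accounts', 'Economics', 'Arts')))""")

cur.execute("INSERT INTO result VALUES (201, 'Computer')")  # valid, accepted
try:
    cur.execute("INSERT INTO result VALUES (202, 'Hindi')")  # invalid value
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS enforced the integrity constraint

stored = cur.execute("SELECT COUNT(*) FROM result").fetchone()[0]
conn.close()
print(rejected)  # True
print(stored)    # 1
```

Because the rule lives in the database rather than in each application program,
every update path is validated the same way.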

6. Standards can be enforced - Since all access to the database must be
through the DBMS, standards are easier to enforce. Standards may relate to
the naming of data, the format of data, the structure of the data, etc.
Standardizing stored data formats is usually desirable for the purpose of
data interchange or migration between systems.

7. Security can be improved - In conventional systems, applications are
developed in an ad hoc/temporary manner. Often, different systems of an
organization access different components of the operational data; in such an
environment, enforcing security can be quite difficult. Setting up a database
makes it easier to enforce security restrictions, since the data is now
centralized. It is easier to control who has access to which parts of the
database. Different checks can be established for each type of access
(retrieve, modify, delete, etc.) to each piece of information in the
database.

Consider an example from banking, in which employees at different levels may be given access to different types of data in the database. A clerk may be given the authority to know only the names of the customers who have a loan in the bank, but not the details of each loan a customer may have. This can be accomplished by granting appropriate privileges to each employee.
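In most SQL DBMSs such privileges are managed with GRANT and REVOKE statements. SQLite has no user accounts, so the sketch below (with invented table and names) uses a view instead: the clerk's application is given only the view, which exposes customer names but not loan amounts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO loans VALUES (?, ?)",
                 [("Asha", 50000.0), ("Ravi", 120000.0)])

# The clerk queries only this restricted window, never the base table.
conn.execute("CREATE VIEW clerk_view AS SELECT customer FROM loans")

names = [row[0] for row in conn.execute("SELECT customer FROM clerk_view")]
```

The same idea scales up in a server DBMS: define a view per role and grant each role access only to its view.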

8. Organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of competing units. It may therefore be necessary to refuse some requests for information if they conflict with higher-priority needs of the organization.

It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.

For example, a DBA must choose the best file structure and access method to give fast response times for highly critical applications, as compared with less critical applications.

9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers is higher when using the non-procedural languages provided with a DBMS than when using procedural languages.

10. Data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, files are more likely to be designed according to the demands of particular applications, and the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.

11. Provides backup and recovery - Centralizing a database makes it possible to provide schemes for backup and recovery from failures such as disk crashes, power failures, and software errors. These schemes help restore the database from an inconsistent state to the state that existed prior to the failure, though the methods involved are complex.

12. Concurrent access - Concurrent access to the data is possible while keeping the whole database consistent.

Disadvantages of DBMS


1. Cost
2. Complexity

Data Models

A data model is a conceptual representation of the data structures that are required by a database. The data structures include the data objects, the associations between data objects, and the rules which govern operations on the objects. As the name implies, the data model focuses on what data is required and how it should be organized rather than on what operations will be performed on the data. To use a common analogy, the data model is equivalent to an architect's building plans.

A data model is independent of hardware or software constraints. Rather than trying to represent the data as a database would see it, the data model focuses on representing the data as the user sees it in the "real world". It serves as a bridge between the concepts that make up real-world events and processes and the physical representation of those concepts in a database.

The Entity-Relationship Model


The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects. Since Chen wrote his paper, the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is:

• It maps well to the relational model. The constructs used in the ER model can
easily be transformed into relational tables.
• It is simple and easy to understand with a minimum of training. Therefore,
the model can be used by the database designer to communicate the design
to the end user.
• In addition, the model can be used as a design plan by the database
developer to implement a data model in specific database management
software.

Basic Constructs of E-R Modeling

The ER model views the real world as a construct of entities and association
between entities.

Entities

Entities are the principal data objects about which information is to be collected. Entities are usually recognizable concepts, either concrete or abstract, such as persons, places, things, or events, which have relevance to the database. Some specific examples of entities are EMPLOYEES, PROJECTS, and INVOICES. An entity is analogous to a table in the relational model.


Entities are classified as independent or dependent (in some methodologies, the terms used are strong and weak, respectively). An independent entity is one that does not rely on another for identification. A dependent entity is one that relies on another for identification.

An entity occurrence (also called an instance) is an individual occurrence of an entity. An occurrence is analogous to a row in a relational table.

Special Entity Types

Associative entities (also known as intersection entities) are entities used to associate two or more entities in order to reconcile a many-to-many relationship.

Subtype entities are used in generalization hierarchies to represent a subset of instances of their parent entity, called the supertype, but which have attributes or relationships that apply only to the subset.

Associative entities and generalization hierarchies are discussed in more detail below.

Relationships

A relationship represents an association between two or more entities. Examples of relationships would be:

employees are assigned to projects

projects have subtasks

departments manage one or more projects

Relationships are classified in terms of degree, connectivity, cardinality, and existence. These concepts are discussed below.

Attributes

Attributes describe the entity with which they are associated. A particular instance of an attribute is a value. For example, "Jane R. Hathaway" is one value of the attribute Name. The domain of an attribute is the collection of all possible values the attribute can have. The domain of Name is a character string.


Attributes can be classified as identifiers or descriptors. Identifiers, more commonly called keys, uniquely identify an instance of an entity. A descriptor describes a non-unique characteristic of an entity instance.

Classifying Relationships

Relationships are classified by their degree, connectivity, cardinality, direction, type, and existence. Not all modeling methodologies use all of these classifications.

Degree of a Relationship

The degree of a relationship is the number of entities associated with the relationship. The n-ary relationship is the general form for degree n. Special cases are the binary and ternary relationships, where the degree is 2 and 3, respectively.

Binary relationships, associations between two entities, are the most common type in the real world. A recursive binary relationship occurs when an entity is related to itself. An example might be "some employees are married to other employees".

A ternary relationship involves three entities and is used when a binary relationship is inadequate. Many modeling approaches recognize only binary relationships; ternary and n-ary relationships are decomposed into two or more binary relationships.

Connectivity and Cardinality

The connectivity of a relationship describes the mapping of associated entity instances in the relationship. The values of connectivity are "one" or "many". The cardinality of a relationship is the actual number of related occurrences for each of the two entities. The basic types of connectivity for relationships are one-to-one, one-to-many, and many-to-many.

A one-to-one (1:1) relationship is one in which at most one instance of entity A is associated with one instance of entity B. For example, "employees in the company are each assigned their own office": for each employee there exists a unique office, and for each office there exists a unique employee.

A one-to-many (1:N) relationship is one in which, for one instance of entity A, there are zero, one, or many instances of entity B, but for one instance of entity B, there is only one instance of entity A. An example of a 1:N relationship is:
a department has many employees
each employee is assigned to one department
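A 1:N relationship maps directly to a foreign key on the "many" side. A minimal sketch (SQLite via Python's sqlite3; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES departments(dept_id)  -- each employee -> one department
    )
""")
conn.execute("INSERT INTO departments VALUES (10, 'Sales')")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, 'Asha', 10), (2, 'Ravi', 10)])

# One department, many employees:
count = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE dept_id = 10").fetchone()[0]
```

Each employee row carries exactly one dept_id, while a department may appear in any number of employee rows, which is precisely the 1:N shape described above.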


A many-to-many (M:N) relationship, sometimes called non-specific, is one in which, for one instance of entity A, there are zero, one, or many instances of entity B, and for one instance of entity B, there are zero, one, or many instances of entity A. An example is:
Employees can be assigned to no more than two projects at the same time;
Projects must have at least three employees assigned.

A single employee can be assigned to many projects; conversely, a single project can have many employees assigned to it. Here the cardinality of the relationship from employees to projects is two, and the cardinality from projects to employees is three. Many-to-many relationships cannot be directly translated into relational tables; instead, they must be transformed into two or more one-to-many relationships using associative entities.
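That transformation can be sketched concretely (SQLite via Python's sqlite3; all names are illustrative): the assignments table below is the associative entity, and its composite key turns the M:N relationship into two 1:N relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- Associative (intersection) entity: one row per assignment.
    CREATE TABLE assignments (
        emp_id  INTEGER REFERENCES employees(emp_id),
        proj_id INTEGER REFERENCES projects(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employees VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO projects  VALUES (100, 'Payroll'), (200, 'Website');
    INSERT INTO assignments VALUES (1, 100), (1, 200), (2, 100);
""")

# Joining through the associative entity recovers the M:N pairs:
rows = conn.execute("""
    SELECT e.name, p.title
    FROM assignments a
    JOIN employees e ON e.emp_id = a.emp_id
    JOIN projects  p ON p.proj_id = a.proj_id
    ORDER BY e.name, p.title
""").fetchall()
```

An employee can appear in many assignment rows and so can a project, yet each table involved has only one-to-many links.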

Direction
The direction of a relationship indicates the originating entity of a binary
relationship. The entity from which a relationship originates is the parent entity; the
entity where the relationship terminates is the child entity.
The direction of a relationship is determined by its connectivity. In a one-to-one
relationship the direction is from the independent entity to a dependent entity. If
both entities are independent, the direction is arbitrary. With one-to-many
relationships, the entity occurring once is the parent. The direction of many-to-
many relationships is arbitrary.

Type
An identifying relationship is one in which the child entity is also a dependent entity. A non-identifying relationship is one in which both entities are independent.

Existence
Existence denotes whether the existence of an entity instance is dependent upon
the existence of another, related, entity instance. The existence of an entity in a
relationship is defined as either mandatory or optional. If an instance of an entity
must always occur for an entity to be included in a relationship, then it is
mandatory. An example of mandatory existence is the statement "every project
must be managed by a single department". If the instance of the entity is not
required, it is optional. An example of optional existence is the statement,
"employees may be assigned to work on projects".

Generalization Hierarchies

A generalization hierarchy is a form of abstraction that specifies that two or more entities sharing common attributes can be generalized into a higher-level entity type called a supertype or generic entity. The lower-level entities become the subtypes, or categories, of the supertype. Subtypes are dependent entities.

Generalization occurs when two or more entities represent categories of the same real-world object. For example, Wages_Employees and Classified_Employees represent categories of the same entity, Employees. In this example, Employees would be the supertype; Wages_Employees and Classified_Employees would be the subtypes.

Subtypes can be either mutually exclusive (disjoint) or overlapping (inclusive). A mutually exclusive category is one in which an entity instance can be in only one category. The above example is a mutually exclusive category: an employee can be either wages or classified, but not both. An overlapping category is one in which an entity instance may be in two or more subtypes. An example would be a person who works for a university and is also a student at that same university. The completeness constraint requires that all instances of the subtype be represented in the supertype. Generalization hierarchies can be nested; that is, a subtype of one hierarchy can be a supertype of another. The level of nesting is limited only by the constraint of simplicity. Subtype entities may be the parent entity in a relationship but not the child.

ER Notation

There is no standard for representing data objects in ER diagrams. Each modeling methodology uses its own notation. The original notation used by Chen is widely used in academic texts and journals but rarely seen in either CASE tools or publications by non-academics. Today, a number of notations are in use; among the more common are Bachman, crow's foot, and IDEF1X.

All notational styles represent entities as rectangular boxes and relationships as lines connecting boxes. Each style uses a special set of symbols to represent the cardinality of a connection. The notation used in this document is from Martin. The symbols used for the basic ER constructs are:

• Entities are represented by labeled rectangles. The label is the name of the
entity. Entity names should be singular nouns.
• Relationships are represented by a solid line connecting two entities. The
name of the relationship is written above the line. Relationship names should
be verbs.
• Attributes, when included, are listed inside the entity rectangle. Attributes
which are identifiers are underlined. Attribute names should be singular
nouns.
• Cardinality of many is represented by a line ending in a crow's foot. If the
crow's foot is omitted, the cardinality is one.
• Existence is represented by placing a circle or a perpendicular bar on the line. Mandatory existence is shown by the bar (which looks like a 1) next to the entity for which an instance is required. Optional existence is shown by placing a circle next to the entity that is optional.

Examples of these symbols are shown in Figure 1 below:



Figure 1: ER Notation

Entity-Relationship Diagrams (ERD)

Data models are tools used in analysis to describe the data requirements and assumptions
in the system from a top-down perspective. They also set the stage for the design of
databases later on in the SDLC.
There are three basic elements in ER models:

• Entities are the "things" about which we seek information.
• Attributes are the data we collect about the entities.
• Relationships provide the structure needed to draw information from multiple entities.

Generally, ERDs look like this:


Developing an ERD

Developing an ERD requires an understanding of the system and its components. Before discussing the procedure, let's look at a narrative created by Professor Harman.

Consider a hospital:
Patients are treated in a single ward by the doctors assigned to them.
Usually each patient will be assigned a single doctor, but in rare cases they
will have two.
Healthcare assistants also attend to the patients; a number of these are
associated with each ward.
Initially the system will be concerned solely with drug treatment. Each
patient is required to take a variety of drugs a certain number of times per
day and for varying lengths of time.
The system must record details concerning patient treatment and staff
payment. Some staff are paid part time and doctors and care assistants
work varying amounts of overtime at varying rates (subject to grade).
The system will also need to track what treatments are required for which
patients and when and it should be capable of calculating the cost of
treatment per week for each patient (though it is currently unclear to what
use this information will be put).

How do we start an ERD?

1. Define Entities: these are usually nouns used in descriptions of the system, in
the discussion of business rules, or in documentation; identified in the narrative
(see highlighted items above).

2. Define Relationships: these are usually verbs used in descriptions of the system or in discussion of the business rules (entity ______ entity); identified in the narrative (see highlighted items above).


3. Add attributes to the relations; these are determined by the queries, and may also suggest new entities (e.g. grade), or they may suggest the need for keys or identifiers.
What questions can we ask?
a. Which doctors work in which wards?
b. How much will be spent in a ward in a given week?
c. How much will a patient cost to treat?
d. How much does a doctor cost per week?
e. Which assistants can a patient expect to see?
f. Which drugs are being used?
4. Add cardinality to the relations
Many-to-many relationships must be resolved into two one-to-manys with an additional entity
This usually happens automatically
Sometimes it involves the introduction of a link entity (whose key consists entirely of foreign keys). Example: Patient-Drug
5. This flexibility allows us to consider a variety of questions such as:
a. Which beds are free?
b. Which assistants work for Dr. X?
c. What is the least expensive prescription?
d. How many doctors are there in the hospital?
e. Which patients are family related?

6. Represent that information with symbols. Generally E-R Diagrams require the
use of the following symbols:

Reading an ERD


It takes some practice reading an ERD, but they can be used with clients to
discuss business rules.

These allow us to represent the information from above such as the E-R Diagram
below:

An ERD brings out issues such as:

Many-to-manys
Ambiguities
Entities and their relationships
What data needs to be stored
The degree of a relationship

Now, think about a university in terms of an ERD. What entities, relationships and attributes might you consider?

Database Normalization
In the field of relational database design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics (insertion, update, and deletion anomalies) that could lead to a loss of data integrity. It is also the process of removing redundancy.
Edgar F. Codd, the inventor of the relational model, introduced the concept of
normalization and what we now know as the First Normal Form (1NF) in 1970. Codd
went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in
1971.
When an attempt is made to modify (update, insert into, or delete from) a table,
undesired side-effects may follow. Not all tables can suffer from these side-effects;
rather, the side-effects can only arise in tables that have not been sufficiently
normalized. An insufficiently normalized table might have one or more of the
following characteristics:

• The same information can be expressed on multiple rows; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills).
If the update is not carried through successfully—if, that is, the employee's
address is updated on some records but not others—then the table is left in
an inconsistent state. Specifically, the table provides conflicting answers to
the question of what this particular employee's address is. This phenomenon
is known as an update anomaly.
• There are circumstances in which certain facts cannot be recorded at all. For
example, each record in a "Faculty and Their Courses" table might contain a
Faculty ID, Faculty Name, Faculty Hire Date, and Course Code—thus we can
record the details of any faculty member who teaches at least one course,
but we cannot record the details of a newly-hired faculty member who has
not yet been assigned to teach any courses. This phenomenon is known as an
insertion anomaly.
• There are circumstances in which the deletion of data representing certain
facts necessitates the deletion of data representing completely different
facts. The "Faculty and Their Courses" table described in the previous
example suffers from this type of anomaly, for if a faculty member
temporarily ceases to be assigned to any courses, we must delete the last of
the records on which that faculty member appears. This phenomenon is
known as a deletion anomaly.
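The update anomaly is easy to reproduce. In this sketch (SQLite via Python's sqlite3; employee ID 519 follows the figure caption below, the rest is invented), updating the address on only one of an employee's rows leaves the table giving two conflicting answers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized "Employees' Skills" table: the address repeats per skill.
conn.execute("CREATE TABLE emp_skills (emp_id INTEGER, address TEXT, skill TEXT)")
conn.executemany("INSERT INTO emp_skills VALUES (?, ?, ?)",
                 [(519, '12 Oak St', 'Typing'),
                  (519, '12 Oak St', 'Filing')])

# A partial update (only one row changed) creates the anomaly:
conn.execute("UPDATE emp_skills SET address = '9 Elm Ave' "
             "WHERE emp_id = 519 AND skill = 'Typing'")

# The table now gives conflicting answers to "where does 519 live?"
addresses = {row[0] for row in conn.execute(
    "SELECT DISTINCT address FROM emp_skills WHERE emp_id = 519")}
```

Storing the address once, in an Employees table keyed by emp_id, removes the possibility of this inconsistency.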

Free the database of modification anomalies

An update anomaly. Employee 519 is shown as having different addresses on different records.


An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at
least one course, his details cannot be recorded.

A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be
assigned to any courses.

Normalization
First Normal Form

• Eliminate repeating groups in individual tables.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.

A table is said to be in 1NF if the intersection of every row and column contains only a single value; i.e., multivalued data is not permitted.

Do not use multiple fields in a single table to store similar data. For example,
to track an inventory item that may come from two possible sources, an
inventory record may contain fields for Vendor Code 1 and Vendor Code 2.

But what happens when you add a third vendor? Adding a field is not the
answer; it requires program and table modifications and does not smoothly
accommodate a dynamic number of vendors. Instead, place all vendor
information in a separate table called Vendors, then link inventory to vendors
with an item number key, or vendors to inventory with a vendor code key.

For a table to be in first normal form, data must be broken up into the smallest
units possible. For example, the following table is not in first normal form.


Name          Address                                  Phone
Sally Singer  123 Broadway, New York, NY 11234         (111) 222-3345
Jason Jumper  456 Jolly Jumper St., Trenton, NJ 11547  (222) 334-5566

To conform to first normal form, this table would require additional fields. The name field should be divided into first and last name, and the address should be divided into street, city, state, and zip, like this:

ID   First  Last    Street                City      State  Zip    Phone
564  Sally  Singer  123 Broadway          New York  NY     11234  (111) 222-3345
565  Jason  Jumper  456 Jolly Jumper St.  Trenton   NJ     11547  (222) 334-5566

In addition to breaking data up into the smallest meaningful values, tables in first normal form should not contain repeating groups of fields, such as in the following table.

Rep ID  Representative    Client 1  Time 1  Client 2  Time 2  Client 3     Time 3
TS-89   Gilroy Gladstone  US Corp.  14 hrs  Taggarts  26 hrs  Kilroy Inc.  9 hrs
RK-56   Mary Mayhem       Italiana  67 hrs  Linkers   2 hrs

The problem here is that each representative can have multiple clients, and not all will have exactly three. Some may have fewer, as in the second record, leaving unused fields that tie up storage space in the database, and some may have more, in which case there are not enough fields. The solution is to add a record for each new piece of information.

Rep ID  Rep First Name  Rep Last Name  Client       Time With Client
TS-89   Gilroy          Gladstone      US Corp      14 hrs
TS-89   Gilroy          Gladstone      Taggarts     26 hrs
TS-89   Gilroy          Gladstone      Kilroy Inc.  9 hrs
RK-56   Mary            Mayhem         Italiana     67 hrs
RK-56   Mary            Mayhem         Linkers      2 hrs

Notice the splitting of the first and last name fields again.

This table is now in first normal form. Note that by avoiding repeating groups
of fields, we have created a new problem in that there are identical values
in the primary key field, violating the rules of the primary key. In order to
remedy this, we need to have some other way of identifying each record. This
can be done with the creation of a new key called client ID.

Rep ID*  Rep First Name  Rep Last Name  Client ID*  Client       Time With Client
TS-89    Gilroy          Gladstone      978         US Corp      14 hrs
TS-89    Gilroy          Gladstone      665         Taggarts     26 hrs
TS-89    Gilroy          Gladstone      782         Kilroy Inc.  9 hrs
RK-56    Mary            Mayhem         221         Italiana     67 hrs
RK-56    Mary            Mayhem         982         Linkers      2 hrs

This new field can now be used in conjunction with the Rep ID field to create
a multiple field primary key. This will prevent confusion if ever more than
one Representative were to serve a single client.
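The finished 1NF design can be sketched directly (SQLite via Python's sqlite3; column names are illustrative): the composite primary key (Rep ID, Client ID) allows repeated Rep IDs while still identifying each row uniquely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 1NF table: atomic columns, no repeating Client1/Client2/... groups,
# and a composite primary key (rep_id, client_id).
conn.execute("""
    CREATE TABLE rep_clients (
        rep_id    TEXT,
        client_id INTEGER,
        hours     INTEGER,
        PRIMARY KEY (rep_id, client_id)
    )
""")
conn.executemany("INSERT INTO rep_clients VALUES (?, ?, ?)",
                 [('TS-89', 978, 14), ('TS-89', 665, 26), ('RK-56', 221, 67)])

rejected = False
try:
    # A duplicate (rep, client) pair violates the composite key.
    conn.execute("INSERT INTO rep_clients VALUES ('TS-89', 978, 5)")
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM rep_clients").fetchone()[0]
```

A representative may serve any number of clients, but each (rep, client) combination can appear only once.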

Second Normal Form

• Create separate tables for sets of values that apply to multiple records.
• Relate these tables with a foreign key.

A table is said to be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
A fully functional dependency is one in which the right-hand side (the dependent) depends on the entire composite left-hand side (the determinant). That is, if AB → C, then C is fully functionally dependent on AB. If A → C or B → C also holds, the dependency is said to be partial rather than full.
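The definition can be checked mechanically. This sketch (plain Python; the toy rows mirror the representative/client tables used later in this section) tests whether a functional dependency holds over a set of rows, then uses it to distinguish a full dependency from a partial one:

```python
def holds_fd(rows, lhs, rhs):
    """True if the attributes in `lhs` functionally determine `rhs`:
    no two rows agree on all lhs values but differ on the rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True

# Toy relation with composite key (rep_id, client_id):
rows = [
    {"rep_id": "TS-89", "client_id": 978, "rep_name": "Gilroy", "hours": 14},
    {"rep_id": "TS-89", "client_id": 665, "rep_name": "Gilroy", "hours": 26},
    {"rep_id": "RK-56", "client_id": 221, "rep_name": "Mary",   "hours": 67},
    {"rep_id": "RK-56", "client_id": 665, "rep_name": "Mary",   "hours": 4},
]

key = ("rep_id", "client_id")
# hours is fully dependent: the whole key determines it, neither part alone does.
full_hours = (holds_fd(rows, key, "hours")
              and not holds_fd(rows, ("rep_id",), "hours")
              and not holds_fd(rows, ("client_id",), "hours"))
# rep_name is only partially dependent: rep_id alone determines it,
# which is exactly the situation 2NF forbids.
partial_name = holds_fd(rows, ("rep_id",), "rep_name")
```

On real data such a check only ever disproves a dependency; whether a dependency truly holds is a design decision about the meaning of the attributes.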

Records should not depend on anything other than a table's primary key (a
compound key, if necessary). For example, consider a customer's address in an
accounting system. The address is needed by the Customers table, but also by the
Orders, Shipping, Invoices, Accounts Receivable, and Collections tables. Instead of
storing the customer's address as a separate entry in each of these tables, store it
in one place, either in the Customers table or in a separate Addresses table.


The second normal form applies only to tables with multiple field primary
keys. Take the following table for example.

Rep ID*  Rep First Name  Rep Last Name  Client ID*  Client       Time With Client
TS-89    Gilroy          Gladstone      978         US Corp      14 hrs
TS-89    Gilroy          Gladstone      665         Taggarts     26 hrs
TS-89    Gilroy          Gladstone      782         Kilroy Inc.  9 hrs
RK-56    Mary            Mayhem         221         Italiana     67 hrs
RK-56    Mary            Mayhem         982         Linkers      2 hrs
RK-56    Mary            Mayhem         665         Taggarts     4 hrs

This table is already in first normal form. It has a primary key consisting of Rep ID and Client ID, since neither alone can be considered a unique value.

The second normal form states that each field in a table with a multiple-field primary key must be directly related to the entire primary key. In other words, each non-key field should be a fact about all the fields in the primary key. Only fields that are absolutely necessary should appear in the table; all other fields should reside in different tables. In order to find out which fields are necessary, we should ask a few questions of our database. In the preceding example, I should ask, "What information is this table meant to store?" Currently, the answer is not obvious: it may be meant to store information about individual clients, or it could be holding data for employees' time cards. As a further example, if my database is going to contain records of employees, I may want a table of demographics and a table for payroll. The demographics table will hold all the employees' personal information and will assign each an ID number. I should not have to enter the data twice; the payroll table, on the other hand, should refer to each employee only by ID number. I can then link the two tables by a relationship and will have access to all the necessary data.

In the table of the preceding example, we are devoting three fields to the identification of the employee and two to the identification of the client. We could identify each with only one field -- the primary key. We can then take out the extraneous fields and put them in their own tables. The database would then look like the following.


Rep ID*  Client ID*  Time With Client
TS-89    978         14 hrs
TS-89    665         26 hrs
TS-89    782         9 hrs
RK-56    221         67 hrs
RK-56    982         2 hrs
RK-56    665         4 hrs

The above table contains time card information.

Rep ID*  First Name  Last Name
TS-89    Gilroy      Gladstone
RK-56    Mary        Mayhem

The above table contains Employee Information.

Client ID*  Client Name
978         US Corp
665         Taggarts
782         Kilroy Inc.
221         Italiana
982         Linkers

The above table contains Client Information.

These tables are now in second normal form. By splitting off the redundant information and putting it in its own tables, we have eliminated redundancy and put our first table in second normal form. These tables are now ready to be linked to each other through relationships.
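The three tables and their linking can be sketched concretely (SQLite via Python's sqlite3; a subset of the rows above, with illustrative table names). A join reassembles the original wide view without storing any name redundantly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reps      (rep_id TEXT PRIMARY KEY, first TEXT, last TEXT);
    CREATE TABLE clients   (client_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE timecards (rep_id TEXT, client_id INTEGER, hours INTEGER,
                            PRIMARY KEY (rep_id, client_id));
    INSERT INTO reps VALUES ('TS-89', 'Gilroy', 'Gladstone'),
                            ('RK-56', 'Mary', 'Mayhem');
    INSERT INTO clients VALUES (978, 'US Corp'), (665, 'Taggarts');
    INSERT INTO timecards VALUES ('TS-89', 978, 14),
                                 ('TS-89', 665, 26),
                                 ('RK-56', 665, 4);
""")

# Each name is stored once; the join brings the pieces back together.
rows = conn.execute("""
    SELECT r.first, r.last, c.name, t.hours
    FROM timecards t
    JOIN reps    r ON r.rep_id = t.rep_id
    JOIN clients c ON c.client_id = t.client_id
    ORDER BY t.rep_id, t.client_id
""").fetchall()
```

Changing a representative's name now means updating one row in reps, not every time-card row.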

Third Normal Form


• Eliminate fields that do not depend on the key; i.e., in third normal form there should not be any transitive dependency.
• Remove transitive dependencies.
• Transitive dependency: a type of functional dependency in which an attribute is functionally dependent on an attribute other than the primary key, so that its value is only indirectly determined by the primary key.
• Create a separate table containing the attribute and the fields that are functionally dependent on it. Tables created at this step will usually contain descriptions of either resources or agents. Keep a copy of the key attribute in the original file.

A table is said to be in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key; i.e., there should not be any transitive dependency.

Third Normal Form Example

Suppose the original sales order table contained: SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName. CustomerName and CustomerAdd depend on CustomerNo, and ClerkName depends on ClerkNo, so these fields are only transitively dependent on the primary key, SalesOrderNo. The new tables would be:

CustomerNo, CustomerName, CustomerAdd

ClerkNo, ClerkName

All of these fields except the primary keys will be removed from the original table. The primary keys will be left in the original table to allow linking of data, as follows:

SalesOrderNo, Date, CustomerNo, ClerkNo

Together with the unchanged tables below, these tables make up the database in
third normal form.

ItemNo, Description

SalesOrderNo, ItemNo, Qty, UnitPrice

What if we did not Normalize the Database to Third Normal Form?


• Repetition of data – details for the customer/clerk would appear on every sales order.
• Delete anomalies – deleting a sales order deletes the customer/clerk information.
• Insert anomalies – to insert a customer/clerk, a sales order must be inserted.
• Update anomalies – to change a name, address, etc., it must be changed on every sales order.

Completed Tables in Third Normal Form


Customers: CustomerNo, CustomerName, CustomerAdd

Clerks: ClerkNo, ClerkName

Inventory Items: ItemNo, Description


Sales Orders: SalesOrderNo, Date, CustomerNo, ClerkNo

SalesOrderDetail: SalesOrderNo, ItemNo, Qty, UnitPrice
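The completed 3NF schema can be sketched end to end (SQLite via Python's sqlite3; the sample rows are invented for illustration). Note that a customer's address now lives in exactly one row, so updating it touches one record rather than every sales order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (CustomerNo INTEGER PRIMARY KEY,
                            CustomerName TEXT, CustomerAdd TEXT);
    CREATE TABLE clerks    (ClerkNo INTEGER PRIMARY KEY, ClerkName TEXT);
    CREATE TABLE items     (ItemNo INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE sales_orders (SalesOrderNo INTEGER PRIMARY KEY, Date TEXT,
                               CustomerNo INTEGER REFERENCES customers,
                               ClerkNo INTEGER REFERENCES clerks);
    CREATE TABLE sales_order_detail (SalesOrderNo INTEGER REFERENCES sales_orders,
                                     ItemNo INTEGER REFERENCES items,
                                     Qty INTEGER, UnitPrice REAL,
                                     PRIMARY KEY (SalesOrderNo, ItemNo));
    INSERT INTO customers VALUES (1, 'Acme Ltd', '12 Oak St');
    INSERT INTO clerks    VALUES (7, 'R. Patel');
    INSERT INTO items     VALUES (501, 'Widget');
    INSERT INTO sales_orders VALUES (9001, '2024-01-15', 1, 7);
    INSERT INTO sales_order_detail VALUES (9001, 501, 3, 25.0);
""")

# Changing a customer's address now touches exactly one row:
conn.execute("UPDATE customers SET CustomerAdd = '9 Elm Ave' WHERE CustomerNo = 1")

# Order totals are still easy to compute from the detail table:
total = conn.execute("""
    SELECT SUM(Qty * UnitPrice) FROM sales_order_detail
    WHERE SalesOrderNo = 9001
""").fetchone()[0]
```

Inserting a new customer or clerk no longer requires inserting a sales order, and deleting an order no longer destroys customer or clerk information: the anomalies listed above are gone.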

Values in a record that are not part of that record's key do not belong in the
table. In general, any time the contents of a group of fields may apply to
more than a single record in the table, consider placing those fields in a
separate table.

For example, in an Employee Recruitment table, a candidate's university name and address may be included. But you need a complete list of universities for group mailings. If university information is stored in the Candidates table, there is no way to list universities with no current candidates. Create a separate Universities table and link it to the Candidates table with a university code key.

EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical. If you have a Customers table and you want to eliminate all possible interfield dependencies, you must create separate tables for cities, ZIP codes, sales representatives, customer classes, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursuing; however, many small tables may degrade performance or exceed open-file and memory capacities.

It may be more feasible to apply third normal form only to data that changes
frequently. If some dependent fields remain, design your application to
require the user to verify all related fields when any one is changed.

SQL
SQL (Structured Query Language) is a database computer language designed for
managing data in relational database management systems (RDBMS), originally
based upon relational algebra. Its scope includes data query and update, schema
creation and modification, and data access control. SQL was one of the first
languages for Edgar F. Codd's relational model, described in his influential
1970 paper, "A Relational Model of Data for Large Shared Data Banks",[4] and it
became the most widely used language for relational databases.


SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the
early 1970s. This version, initially called SEQUEL, was designed to manipulate and
retrieve data stored in IBM's original relational database product, System R.

The SQL language is sub-divided into several language elements, including:

• Clauses, which are in some cases optional, constituent components of
statements and queries.[9]
• Expressions which can produce either scalar values or tables consisting of
columns and rows of data.
• Predicates which specify conditions that can be evaluated to SQL three-
valued logic (3VL) Boolean truth values and which are used to limit the
effects of statements and queries, or to change program flow.
• Queries which retrieve data based on specific criteria.
• Statements which may have a persistent effect on schemas and data, or
which may control transactions, program flow, connections, sessions, or
diagnostics.
o SQL statements also include the semicolon (";") statement terminator.
Though not required on every platform, it is defined as a standard part
of the SQL grammar.

Queries

The most common operation in SQL is the query, which is performed with the declarative
SELECT statement. SELECT retrieves data from one or more tables, or expressions. Standard
SELECT statements have no persistent effects on the database. Some non-standard
implementations of SELECT can have persistent effects, such as the SELECT INTO syntax that
exists in some databases.[10]

Queries allow the user to describe desired data, leaving the database management system
(DBMS) responsible for planning, optimizing, and performing the physical operations necessary
to produce that result as it chooses.

A query includes a list of columns to be included in the final result immediately following the
SELECT keyword. An asterisk ("*") can also be used to specify that the query should return all
columns of the queried tables. SELECT is the most complex statement in SQL, with optional
keywords and clauses that include:

• The FROM clause which indicates the table(s) from which data is to be retrieved. The FROM
clause can include optional JOIN subclauses to specify the rules for joining tables.
• The WHERE clause includes a comparison predicate, which restricts the rows returned by
the query. The WHERE clause eliminates all rows from the result set for which the
comparison predicate does not evaluate to True.
• The GROUP BY clause is used to project rows having common values into a smaller set of
rows. GROUP BY is often used in conjunction with SQL aggregation functions or to
eliminate duplicate rows from a result set. The WHERE clause is applied before the GROUP
BY clause.

• The HAVING clause includes a predicate used to filter rows resulting from the GROUP BY
clause. Because it acts on the results of the GROUP BY clause, aggregation functions can
be used in the HAVING clause predicate.
• The ORDER BY clause identifies which columns are used to sort the resulting data, and in
which direction they should be sorted (options are ascending or descending). Without an
ORDER BY clause, the order of rows returned by an SQL query is undefined.

The following is an example of a SELECT query that returns a list of expensive books. The query
retrieves all rows from the Book table in which the price column contains a value greater than
100.00. The result is sorted in ascending order by title. The asterisk (*) in the select list indicates
that all columns of the Book table should be included in the result set.

SELECT *
FROM Book
WHERE price > 100.00
ORDER BY title;
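The GROUP BY and HAVING clauses can be illustrated with a small sketch, again using a Book table. The genre column and the sample rows are invented for illustration; the query is run here through Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Book (title TEXT, genre TEXT, price REAL)")
cur.executemany("INSERT INTO Book VALUES (?, ?, ?)", [
    ("SQL Basics",   "Databases", 120.0),
    ("Advanced SQL", "Databases", 150.0),
    ("Poems",        "Poetry",     30.0),
])

# WHERE filters rows first; GROUP BY then collapses them into one row per
# genre; HAVING finally filters the groups themselves.
cur.execute("""
SELECT genre, COUNT(*), AVG(price)
FROM Book
WHERE price > 20.00
GROUP BY genre
HAVING COUNT(*) > 1
""")
rows = cur.fetchall()
print(rows)  # [('Databases', 2, 135.0)]
```

The Poetry group survives the WHERE clause but is removed by HAVING, since it contains only one book.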

Data manipulation

The Data Manipulation Language (DML) is the subset of SQL used to add, update
and delete data:

• INSERT adds rows (formally tuples) to an existing table, e.g.,:

INSERT INTO My_table
  (field1, field2, field3)
VALUES
  ('test', 'N', NULL);

• UPDATE modifies a set of existing table rows, e.g.,:

UPDATE My_table
SET field1 = 'updated value'
WHERE field2 = 'N';

• DELETE removes existing rows from a table, e.g.,:

DELETE FROM My_table
WHERE field2 = 'N';

• TRUNCATE deletes all data from a table in a very fast way. It usually implies a
subsequent COMMIT operation.
• MERGE is used to combine the data of multiple tables. It combines the
INSERT and UPDATE elements. It is defined in the SQL:2003 standard; prior to
that, some databases provided similar functionality via different syntax,
sometimes called "upsert".
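The INSERT, UPDATE and DELETE statements above can be exercised in one short sketch using Python's built-in sqlite3 module; My_table and its fields follow the examples in the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE My_table (field1 TEXT, field2 TEXT, field3 TEXT)")

# INSERT adds a row; UPDATE modifies matching rows; DELETE removes them.
cur.execute("INSERT INTO My_table (field1, field2, field3) VALUES ('test', 'N', NULL)")
cur.execute("UPDATE My_table SET field1 = 'updated value' WHERE field2 = 'N'")
cur.execute("SELECT field1 FROM My_table")
rows = cur.fetchall()
print(rows)  # [('updated value',)]

cur.execute("DELETE FROM My_table WHERE field2 = 'N'")
cur.execute("SELECT COUNT(*) FROM My_table")
remaining = cur.fetchone()
print(remaining)  # (0,)
```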

Transaction Controls (TCL)

Transactions, if available, wrap DML operations:

• START TRANSACTION (or BEGIN WORK, or BEGIN TRANSACTION, depending on the SQL
dialect) marks the start of a database transaction, which either completes
entirely or not at all.
• SAVE TRANSACTION (or SAVEPOINT) saves the state of the database at the
current point in the transaction.

CREATE TABLE tbl_1 (id INT);
INSERT INTO tbl_1 (id) VALUES (1);
INSERT INTO tbl_1 (id) VALUES (2);
COMMIT;
UPDATE tbl_1 SET id = 200 WHERE id = 1;
SAVEPOINT id_1upd;
UPDATE tbl_1 SET id = 1000 WHERE id = 2;
ROLLBACK TO id_1upd;
SELECT id FROM tbl_1;

• COMMIT causes all data changes in a transaction to be made permanent.
• ROLLBACK causes all data changes since the last COMMIT or ROLLBACK to be
discarded, leaving the state of the data as it was prior to those changes.

Once the COMMIT statement completes, the transaction's changes cannot be rolled
back.

COMMIT and ROLLBACK terminate the current transaction and release data locks. In
the absence of a START TRANSACTION or similar statement, the semantics of SQL
are implementation-dependent. Example: A classic bank transfer of funds
transaction.

START TRANSACTION;
UPDATE Account SET amount=amount-200 WHERE account_number=1234;
UPDATE Account SET amount=amount+200 WHERE account_number=2345;
IF ERRORS=0 COMMIT;
IF ERRORS<>0 ROLLBACK;
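The funds-transfer pattern can be sketched with Python's built-in sqlite3 module, whose connection object exposes COMMIT and ROLLBACK as commit() and rollback(); the account numbers and amounts are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Account (account_number INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO Account VALUES (?, ?)", [(1234, 500), (2345, 100)])
conn.commit()

try:
    # Both updates succeed or neither does: the classic funds transfer.
    cur.execute("UPDATE Account SET amount = amount - 200 WHERE account_number = 1234")
    cur.execute("UPDATE Account SET amount = amount + 200 WHERE account_number = 2345")
    conn.commit()
except sqlite3.Error:
    conn.rollback()  # discard all changes since the last COMMIT

cur.execute("SELECT amount FROM Account ORDER BY account_number")
balances = cur.fetchall()
print(balances)  # [(300,), (300,)]
```

If either UPDATE raised an error, rollback() would restore both balances to their committed values of 500 and 100.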

Data definition

The Data Definition Language (DDL) manages table and index structure. The most
basic items of DDL are the CREATE, ALTER, RENAME, DROP and TRUNCATE
statements:

• CREATE creates an object (a table, for example) in the database.
• DROP deletes an object in the database, usually irretrievably.


• ALTER modifies the structure of an existing object in various ways, for
example by adding a column to an existing table.

Example:

CREATE TABLE My_table
(
  my_field1 INT,
  my_field2 VARCHAR(50),
  my_field3 DATE NOT NULL,
  PRIMARY KEY (my_field1, my_field2)
);
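A sketch of these DDL statements in action, using Python's built-in sqlite3 module; the added my_field4 column is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE builds the table; ALTER then adds a column; DROP removes the object.
cur.execute("""CREATE TABLE My_table (
    my_field1 INT,
    my_field2 VARCHAR(50),
    my_field3 DATE NOT NULL,
    PRIMARY KEY (my_field1, my_field2))""")
cur.execute("ALTER TABLE My_table ADD COLUMN my_field4 TEXT")

cur.execute("SELECT * FROM My_table")
cols = [d[0] for d in cur.description]
print(cols)  # ['my_field1', 'my_field2', 'my_field3', 'my_field4']

cur.execute("DROP TABLE My_table")  # usually irretrievable, as noted above
```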

Embedded SQL
Embedded SQL is a method of combining the computing power of a programming
language and the database manipulation capabilities of SQL. Embedded SQL
statements are SQL statements written in line with the program source code of the
host language. The embedded SQL statements are parsed by an embedded SQL
preprocessor and replaced by host-language calls to a code library. The output from
the preprocessor is then compiled by the host compiler. This allows programmers to
embed SQL statements in programs written in any number of languages such as:
C/C++, COBOL and FORTRAN.
The ANSI SQL standards committee defined the embedded SQL standard in two
steps: a formalism called Module Language was defined, then the embedded SQL
standard was derived from Module Language.[1] The SQL standard defines
embedding of SQL as embedded SQL and the language in which SQL queries are
embedded is referred to as the host language. A popular host language is C. The
combination of C and embedded SQL is called Pro*C in Oracle and Sybase database
management systems. In the PostgreSQL database management system this
precompiled version is called ECPG. Other embedded SQL precompilers are
Pro*Ada, Pro*COBOL, Pro*FORTRAN, Pro*Pascal, and Pro*PL/I.

Systems that support Embedded SQL


IBM DB2
IBM DB2 version 9 for Linux, UNIX and Windows supports embedded SQL for C, C++,
Java, COBOL, FORTRAN and REXX, although support for FORTRAN and REXX has been
deprecated.[2]

Oracle Corporation
Ada

Pro*Ada was officially desupported by Oracle in version 7.3. Starting with
Oracle8, Pro*Ada was replaced by SQL*Module, which appears not to have been
updated since.[3] SQL*Module is a module language that offers a different
programming method from embedded SQL. SQL*Module supports the Ada83 language
standard for Ada.

C/C++

Pro*C became Pro*C/C++ with Oracle8. Pro*C/C++ is currently supported as of
Oracle Database 11g.

COBOL

Pro*COBOL is currently supported as of Oracle Database 11g.

Fortran

Pro*FORTRAN is no longer updated as of Oracle8, but Oracle will continue to
issue patch releases as bugs are reported and corrected.[4]

Pascal

Pro*Pascal was not released with Oracle8.[4]

PL/I

Pro*PL/I was not released with Oracle8. The Pro*PL/I Supplement to the Oracle
Precompilers Guide, however, continues to make appearances in the Oracle
Documentation Library (current as of release 11g).[4]

PostgreSQL
C/C++

ECPG is part of PostgreSQL since version 6.3.

COBOL

Cobol-IT is now distributing a COBOL precompiler for PostgreSQL.

Altibase
C/C++

SESC is an embedded SQL precompiler provided by Altibase Corp. for its DBMS
server.

Data Access Corporation


With DataFlex 3.2 and Visual DataFlex you can pass SQL statements via one of the
Data Access CLI connectivity kits to Microsoft SQL Server, IBM DB2 or any ODBC
supporting database. The results can be retrieved and processed.

Microsoft SQL Server


COBOL

Cobol-IT is distributing an embedded SQL precompiler for COBOL.

MySQL
COBOL

Cobol-IT is distributing an embedded SQL precompiler for COBOL.

File Organization
Introduction

File organization is the methodology which is applied to structured computer files. Files contain
computer records which can be documents or information which is stored in a certain way for
later retrieval. File organization refers primarily to the logical arrangement of data (which can
itself be organized in a system of records with correlation between the fields/columns) in a file
system. It should not be confused with the physical storage of the file in some types of storage
media. There are certain basic types of computer file, which can include files stored as blocks of
data and streams of data, where the information streams out of the file while it is being read until
the end of the file is encountered.

We will look at two components of file organization here:

1. The way the internal file structure is arranged, and
2. The external file as it is presented to the O/S or program that calls it.
Here we will also examine the concept of file extensions.

We will examine various ways that files can be stored and organized. Files are presented to the
application as a stream of bytes and then an EOF (end of file) condition.

A program that uses a file needs to know the structure of the file and needs to interpret its
contents.

Internal File Structure


Methods and Design Paradigm

It is a high-level design decision to specify a system of file organization for a computer software
program or a computer system designed for a particular purpose. Performance is high on the list
of priorities for this design process, depending on how the file is being used. The design of the
file organization usually depends mainly on the system environment: for
instance, whether the file is going to be used for transaction-oriented
processes like OLTP or for data warehousing, and whether the file is shared
among various processes, as in a typical distributed system, or is standalone.
It must also be asked whether the file is on a network and used by a number of
users, whether it may be accessed internally or remotely, and how often it is
accessed.

However, all things considered, the most important considerations might be:

1. Rapid access to a record or a number of records which are related to each other.
2. The adding, modification, or deletion of records.
3. Efficiency of storage and retrieval of records.
4. Redundancy, being the method of ensuring data integrity.

A file should be organized in such a way that the records are always available for processing
with no delay. This should be done in line with the activity and volatility of the information.

Types of File Organization


Organizing a file depends on what kind of file it happens to be: in the
simplest form a file can be a text file (that is, a file composed of ASCII
(American Standard Code for Information Interchange) text). Files can also be
created as binary or executable types (containing elements other than plain
text). Also, files are keyed with attributes which help determine their use by
the host operating system.

Techniques of File Organization

The three techniques of file organization are:

1. Heap (unordered)
2. Sorted
1. Sequential (SAM)
2. Line Sequential (LSAM)
3. Indexed Sequential (ISAM)
3. Hashed or Direct

Sequential Organization


A sequential file contains records organized in the order they were entered.
The order of the records is fixed. The records are stored in physically
contiguous blocks; within each block the records are in sequence.
Records in these files can only be read or written sequentially.
Once stored in the file, a record cannot be made shorter or longer, or deleted.
However, a record can be updated if its length does not change. (Deletions and
length changes are handled by rewriting the records to a new file.) New records
will always appear at the end of the file.
If the order of the records in a file is not important, sequential organization
will suffice, no matter how many records you may have. Sequential output is
also useful for report printing or for the sequential reads which some programs
prefer to do.

Line-Sequential Organization

Line-sequential files are like sequential files, except that the records can contain
only characters as data. Line-sequential files are maintained by the native byte
stream files of the operating system.

In the COBOL environment, line-sequential files that are created with WRITE
statements with the ADVANCING phrase can be directed to a printer as well as to a
disk.

Indexed-Sequential Organization
Key searches are improved by this system too. The single-level indexing structure is
the simplest one where a file, whose records are pairs, contains a key pointer. This
pointer is the position in the data file of the record with the given key. A subset of
the records, which are evenly spaced along the data file, is indexed, in order to
mark intervals of data records.
This is how a key search is performed: the search key is compared with the index
keys to find the highest index key coming in front of the search key, while a linear
search is performed from the record that the index key points to, until the search
key is matched or until the record pointed to by the next index entry is reached.
Regardless of double file access (index + data) required by this sort of search, the
access time reduction is significant compared with sequential file searches.
Let's examine, for the sake of example, a simple linear search on a 1,000-record
sequentially organized file. An average of 500 key comparisons is needed
(assuming the search keys are uniformly distributed among the data keys).
However, using an evenly spaced index with 100 entries, each index interval
covers only 10 data records, so the total falls to about 50 comparisons in the
index file plus about 5 in the data file: roughly a nine-to-one reduction in
the operations count!
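This arithmetic can be checked with a short simulation. The sketch below assumes the keys are simply 0..999 with a 100-entry single-level index, and counts comparisons directly:

```python
# Average comparisons for a linear search vs. a one-level index, over a
# file of 1,000 records with keys 0..999 and a 100-entry index.
N, STEP = 1000, 10                       # 100 index entries, one every 10 records
index = list(range(0, N, STEP))          # keys of the indexed records

def linear_comparisons(key):
    return key + 1                       # records examined before a match

def indexed_comparisons(key):
    i = 0
    while i + 1 < len(index) and index[i + 1] <= key:
        i += 1                           # scan the index file
    # index entries examined plus data records scanned from that point
    return (i + 1) + (key - index[i] + 1)

avg_linear = sum(linear_comparisons(k) for k in range(N)) / N
avg_indexed = sum(indexed_comparisons(k) for k in range(N)) / N
print(avg_linear, avg_indexed)  # 500.5 56.0
```

The averages come out to 500.5 and 56.0 comparisons, matching the estimate of about 50 index plus 5 data comparisons.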
A hierarchical extension of this scheme is possible, since an index is itself a
sequential file, which can in turn be indexed by a second-level index, and so
on. Exploiting this hierarchical decomposition of the searches further, to
decrease the access time, pays increasing dividends in reduced processing time.
There is, however, a point at which this advantage starts to be offset by the
increased cost of storage, which in turn increases the index access time.

Hardware for indexed-sequential organization is usually disk-based, rather than
tape. Records are physically ordered by primary key, and the index gives the
physical location of each record. Records can be accessed sequentially or
directly, via the index. The index is stored in a file and read into memory
when the file is opened. The index must also be maintained.

Like sequential organization, the data is stored in physically contiguous
blocks. However, the difference is in the use of indexes. There are three areas
in the disc storage:

• Primary Area: contains file records stored by key or ID numbers.
• Overflow Area: contains records that cannot be placed in the primary area.
• Index Area: contains the keys of records and their locations on the disc.

Inverted List
In file organization, this is a file that is indexed on many of the attributes of the data
itself. The inverted list method has a single index for each key type. The records are
not necessarily stored in a sequence. They are placed in the data storage area, but
indexes are updated for the record keys and location.

Here's an example: in a company file, an index could be maintained for all
products, and another might be maintained for product types. Thus, it is faster
to search the indexes than every record. These types of files are also known as
"inverted indexes." Nevertheless, inverted list files use more media space and
the storage devices fill up quickly with this type of organization. The
benefits are apparent immediately because searching is fast. However, updating
is much slower.
Content-based queries in text retrieval systems use inverted indexes as their
preferred mechanism. Data items in these systems are usually stored compressed
which would normally slow the retrieval process, but the compression algorithm will
be chosen to support this technique.
When querying a file, there are certain circumstances in which the query is
designed to be modal, meaning that rules are set which require different
information to be held in the index. Here's an example of this modality: when
phrase querying is undertaken, the particular algorithm requires that offsets
to word classifications be held in addition to document numbers.
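A toy sketch of an inverted index, mapping each word to the set of documents that contain it (the documents are invented):

```python
from collections import defaultdict

# Toy document collection: record number -> text.
docs = {
    1: "sales order for widgets",
    2: "widgets and gadgets",
    3: "order of gadgets",
}

# Build the inverted index: word -> set of document numbers.
index = defaultdict(set)
for doc_no, text in docs.items():
    for word in text.split():
        index[word].add(doc_no)

# Searching the index is faster than scanning every record.
print(sorted(index["widgets"]))                   # [1, 2]
print(sorted(index["order"] & index["gadgets"]))  # [3]
```

The set intersection shows why content-based queries favor this structure: combining two terms costs a lookup per term rather than a full scan.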

Direct or Hashed Access


With direct or hashed access a portion of disk space is reserved and a “hashing”
algorithm computes the record address. So there is additional space required for
this kind of file in the store. Records are placed randomly throughout the file.
Records are accessed by addresses that specify their disc location. Also, this type of
file organization requires a disk storage rather than tape. It has an excellent search
retrieval performance, but care must be taken to maintain the indexes. If the
indexes become corrupt, what is left may as well go to the bit-bucket, so it is as well
to have regular backups of this kind of file just as it is for all stored valuable data.
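The idea of computing a record address from its key can be sketched as follows. The bucket count and keys are invented, and Python's built-in hash stands in for the hashing algorithm:

```python
# Sketch of direct (hashed) access: a hash of the record key computes
# the bucket (the "disk address") where the record is stored.
BUCKETS = 7                         # reserved portion of "disk space"
file_area = [[] for _ in range(BUCKETS)]

def address(key):
    return hash(key) % BUCKETS      # the "hashing" algorithm

def store(key, record):
    file_area[address(key)].append((key, record))

def fetch(key):
    # Only one bucket is examined, regardless of file size.
    for k, record in file_area[address(key)]:
        if k == key:
            return record

store("A100", "first record")
store("B200", "second record")
print(fetch("B200"))  # second record
```

Records land in buckets in effectively random order, which is why this organization gives fast direct retrieval but no useful sequential ordering.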

Distributed Database Management System

A distributed database management system (DDBMS) is a software system that
permits the management of a distributed database and makes the distribution
transparent to the users. A distributed database is a collection of multiple,
logically interrelated databases distributed over a computer network. Sometimes
"distributed database system" is used to refer jointly to the distributed
database and the distributed DBMS.

A distributed database is a database that is under the control of a central database management
system (DBMS) in which storage devices are not all attached to a common CPU. It may be
stored in multiple computers located in the same physical location, or may be dispersed over a
network of interconnected computers.

Collections of data (e.g. in a database) can be distributed across multiple physical locations. A
distributed database can reside on network servers on the Internet, on corporate intranets or
extranets, or on other company networks. Replication and distribution of databases improve
database performance at end-user worksites.

To ensure that distributed databases remain up to date and current, there are two
processes: replication and duplication. Replication involves using specialized
software that looks for changes in the distributed database. Once the changes
have been identified, the replication process makes all the databases look the
same. The replication process can be very complex and time-consuming, depending
on the size and number of the distributed databases, and can require a lot of
time and computer resources. Duplication, on the other hand, is not as
complicated: it identifies one database as a master and then duplicates that
database. The duplication process is normally done at a set time after hours,
to ensure that each distributed location has the same data. In the duplication
process, changes are allowed only to the master database, to ensure that local
data will not be overwritten. Both of these processes can keep the data current
in all distributed locations.

Besides distributed database replication and fragmentation, there are many other
distributed database design technologies, such as local autonomy and synchronous
and asynchronous distributed database technologies. The implementation of these
technologies can and does depend on the needs of the business and the
sensitivity/confidentiality of the data to be stored in the database, and hence
on the price the business is willing to pay to ensure data security, consistency
and integrity.

Important considerations

Care with a distributed database must be taken to ensure the following:

• The distribution is transparent: users must be able to interact with the
system as if it were one logical system. This applies to the system's
performance and methods of access, among other things.
• Transactions are transparent: each transaction must maintain database
integrity across multiple databases. Transactions must also be divided into
subtransactions, each subtransaction affecting one database system.


Advantages of distributed databases

• Management of distributed data with different levels of transparency.
• Increased reliability and availability.
• Easier expansion.
• Reflects organizational structure: database fragments are located in the
departments they relate to.
• Local autonomy: a department can control its own data, as it is the most
familiar with it.
• Protection of valuable data: in a catastrophic event such as a fire, the data
would not all be in one place, but distributed over multiple locations.
• Improved performance: data is located near the site of greatest demand, and
the database systems themselves are parallelized, allowing the load on the
databases to be balanced among servers. (A high load on one module of the
database won't affect other modules of the database in a distributed database.)
• Economics: it costs less to create a network of smaller computers with the
power of a single large computer.
• Modularity: systems can be modified, added and removed from the distributed
database without affecting other modules (systems).
• Reliable transactions, due to replication of the database.
• Hardware, operating system, network, fragmentation, DBMS, replication and
location independence.
• Continuous operation.
• Distributed query processing.
• Distributed transaction management.

A single-site failure does not affect the performance of the system. All
transactions follow the A.C.I.D. properties: atomicity, the transaction takes
place as a whole or not at all; consistency, maps one consistent DB state to
another; isolation, each transaction sees a consistent DB; durability, the
results of a transaction must survive system failures. The merge replication
method is used to consolidate the data between databases.

Disadvantages of distributed databases

• Complexity: extra work must be done by the DBAs to ensure that the
distributed nature of the system is transparent. Extra work must also be done
to maintain multiple disparate systems, instead of one big one. Extra database
design work must also be done to account for the disconnected nature of the
database; for example, joins become prohibitively expensive when performed
across multiple systems.
• Economics: increased complexity and a more extensive infrastructure mean
extra labour costs.
• Security: remote database fragments must be secured, and because they are not
centralized the remote sites must be secured as well. The infrastructure must
also be secured (e.g., by encrypting the network links between remote sites).
• Difficult to maintain integrity: in a distributed database, enforcing
integrity over a network may require too much of the network's resources to be
feasible.
• Inexperience: distributed databases are difficult to work with, and as a
young field there is not much readily available experience on proper practice.
• Lack of standards: there are as yet no tools or methodologies to help users
convert a centralized DBMS into a distributed DBMS.
• More complex database design: besides the normal difficulties, the design of
a distributed database has to consider fragmentation of data, allocation of
fragments to specific sites and data replication.
• Additional software is required.
• The operating system should support a distributed environment.
• Concurrency control is a major issue. It is solved by locking and
timestamping.

A distributed database management system is software for managing databases
stored on multiple computers in a network. A distributed database is a set of
databases stored on multiple computers that typically appears to applications
as a single database. Consequently, an application can simultaneously access
and modify the data in several databases in a network. DDBMSs are specially
developed for heterogeneous database platforms, focusing mainly on
heterogeneous database management systems (HDBMS).

Object-Oriented Programming

What is Object-Oriented Programming?
As computers increase in processing power, the software they execute becomes
more complex. This increased complexity comes at a cost of large programs with
huge codebases that can quickly become difficult to understand, maintain and keep
bug-free.

Object-oriented programming (OOP) tries to alleviate this problem by creating
networks of objects, each like a small software 'machine'. These objects are
naturally smaller entities, simplifying the development task of each unit.
However, when the objects co-operate in a system, they become the building
blocks of a much more complex solution.

Consider the motor car. If a car were designed as a single machine, it would be seen
as hugely complex with lots of opportunities for failure. However, when broken
down into its constituent parts, such as wheels, pedals, doors, etc. the individual
design items become simpler. Each part (or object) is created independently, tested
carefully and then assembled into the final product. The creation of the parts can be
simplified further when they are broken down into even simpler items. For example,
when each door is considered as being composed of an outer panel, handle, inner
panel and window.


The car example is analogous to object-oriented software. Rather than writing a
huge program to create, for example, a project management system, the solution is
broken into real-world parts such as project, task, estimate, actual, deliverable, etc.
Each of these can then be developed and tested independently before being
combined.
An object-oriented program may be considered a collection of interacting
objects. Each object is capable of sending and receiving messages, and processing
data. Consider the objects of a driver, a car, and a traffic light. When the traffic
light changes, it sends a virtual message to the driver. The driver receives the
message, and then chooses to accelerate or decelerate. This sends a virtual
message to the car. When the car's speed changes, it sends a virtual message back
to the driver, via the speedometer.

Key Concepts

Classes and Objects

The basic building blocks of object-oriented programming are the class and the
object. A class defines the available characteristics and behavior of a set of similar
objects and is defined by a programmer in code. A class is an abstract definition
that is made concrete at run-time when objects based upon the class are
instantiated and take on the class' behavior.
As an analogy, let's consider the concept of a 'vehicle' class. The class developed by
a programmer would include methods such as Steer(), Accelerate() and Brake(). The
class would also include properties such as Colour, NumberOfDoors, TopSpeed and
NumberOfWheels. The class is an abstract design that becomes real when objects
such as Car, RacingCar, Tank and Tricycle are created, each with its own version of
the class' methods and properties.

Class
A class defines the characteristics of an object; i.e., it is a template for
creating objects. Characteristics include attributes (fields or properties) and
behaviors (methods or operations). For example, a "Car" class could have
properties such as year, make, model, color, number of doors, and engine.
Behaviors of the "Car" class include: on, off, change gears, accelerate,
decelerate, turn, and brake.
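As a sketch, the "Car" class might look like this in Python; the attribute names follow the text, while the speed arithmetic is invented for illustration:

```python
class Car:
    """A class defines attributes and behaviors; objects are instances of it."""

    def __init__(self, year, make, model, color, doors):
        # Attributes (fields or properties)
        self.year, self.make, self.model = year, make, model
        self.color, self.doors = color, doors
        self.speed = 0

    # Behaviors (methods)
    def accelerate(self, amount):
        self.speed += amount

    def brake(self, amount):
        self.speed = max(0, self.speed - amount)

# "my_porsche" is an instance (an object) of the Car class.
my_porsche = Car(2004, "Porsche", "Carrera GT", "silver", 2)
my_porsche.accelerate(50)
my_porsche.brake(20)
print(my_porsche.speed)  # 30
```

Every Car object carries its own copy of the attributes, while the behaviors are shared by all instances of the class.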

Object

An object is an instance of a class. Creating an object is also known as
instantiation. For example, the object "my Porsche" is an instance of the Car
class.

Method

• A method is a behavior of an object. Within a program, a method usually
affects only one particular object. In our example, all cars can accelerate,
but the program only needs to make "my Porsche" accelerate.


Message Passing

• Message passing (or method calling) is the process where an object sends
data to another object to invoke a method. For example, when the object
called "joe" (an instance of the driver class), presses the gas pedal, he
literally passes an accelerate message to object "my Porsche", which in turn,
invokes the "my Porsche" accelerate method.

• Message passing, also known as interfacing, describes the communication
between objects using their public interfaces. There are three main ways to
pass messages: methods, properties and events. A property can be defined in a
class to allow objects of that type to advertise and allow changing of state
information, such as the 'TopSpeed' property. Methods can be provided so that
other objects can request a process to be undertaken by an object, such as the
Steer() method. Events can be defined that an object can raise in response to
an internal action. Other objects can subscribe to these so that they can react
to an event occurring. An example for vehicles could be an 'ImpactDetected'
event subscribed to by one or more 'AirBag' objects.

Encapsulation

Encapsulation conceals the functional details of a class from objects that send
messages to it. For example, the "Porsche Carrera GT" class has a method called
"accelerate". The code for the "accelerate" method defines exactly how
acceleration occurs: fuel is pumped from the gas tank and mixed with
air in the cylinders, pistons move causing compression, resulting in combustion,
etc. Object "Joe" is an instance of the "Driver" class. It does not need to know how
"my Porsche" accelerates when sending it an accelerate message.

Encapsulation protects the integrity of an object by preventing users from changing
internal data into something invalid. Encapsulation reduces system complexity and
thus increases robustness, by limiting inter-dependencies between components.

Encapsulation, also known as data hiding, is an important object-oriented
programming concept. It is the act of concealing the functionality of a class so that
its internal operations are hidden from, and irrelevant to, the programmer. With
correct encapsulation, the developer does not need to understand how the class
actually operates in order to communicate with it via its publicly available methods
and properties, known as its public interface.
Encapsulation is essential to creating maintainable object-oriented programs. When
the interaction with an object uses only the publicly available interface of methods
and properties, the class of the object becomes a correctly isolated unit. This unit


can then be replaced independently to fix bugs, to change internal behavior or to
improve functionality or performance.
In the car analogy this is similar to replacing a headlight bulb. As long as we choose
the correct bulb size and connection (the public interface), it will work in the car. It
does not matter if the manufacturer has changed or the internal workings of the
bulb differ from the original. It may even offer an improvement in brightness!

Abstraction
Abstraction is the practice of reducing detail so that someone can focus on a few
concepts at a time. For example, "my Porsche" may be treated as a "Car" most of
the time. It may sometimes be treated as a "Porsche Carrera GT" to access specific
properties and methods relevant to a "Porsche Carrera GT". It could also be treated
as a "Vehicle", the parent class of "Car", when considering all traffic in your
neighborhood.

Inheritance

• Inheritance (also known as subclassing) occurs when a specialized version of a
class is defined. The subclass inherits attributes and behaviors from the
parent class. For example, the "Car" class could have subclasses called
"Porsche car", "Chevy car", and "Ford car". Subclasses inherit properties and
methods from the parent class, so the software engineer only has to write
the code for them once. A subclass can alter its inherited attributes or
methods. In our example, the "Porsche car" subclass would specify that the
default make is Porsche. A subclass can also include its own attributes and
behaviors.

• We could create a subclass of "Porsche car" called "Porsche Carrera GT". The
"Porsche Carrera GT" class could further specify a default model of "Carrera
GT" and "number of doors" as two. It could also include a new method called
"deploy rear wing spoiler".

• The object of "my Porsche" may be instantiated from class "Porsche Carrera
GT" instead of class "Car". This allows sending a message to invoke the new
method "deploy rear wing spoiler".

• Inheritance is an interesting object-oriented programming concept. It allows
one class (the sub-class) to be based upon another (the super-class) and
inherit all of its functionality automatically. Additional code may then be
added to create a more specialized version of the class. In the example of
vehicles, sub-classes for cars or motorcycles could be created. Each would
still have all of the behavior of a vehicle but can add specialized methods and
properties, such as 'Lean()' and 'LeanAngle' for motorcycles.


• Some programming languages allow for multiple inheritance, where a
sub-class is derived from two or more super-classes. C# does not permit this but
does allow a class to implement multiple interfaces. An interface defines a
contract for the methods and properties of classes that implement it.
However, it does not include any actual functionality.

Polymorphism

• Polymorphism is the ability of one type to appear as (and be used like)
another type. Classes "Porsche Carrera GT" and "Ford Mustang" both
inherit a method called "brake" from a common parent class.
• Executing the method "brake" on the two types produces different
results: "my Porsche" may brake at a rate of 33 feet per second, whereas
"my Mustang" may brake at a rate of 29 feet per second.
• Polymorphism is the ability for an object to change its behavior according to
how it is being used. Where an object's class inherits from a super-class or
implements one or more interfaces, it can be referred to by those class or
interface names. So if we have a method that expects an object of type
'vehicle' to be passed as a parameter, we can pass any vehicle, car or
motorcycle object to that method even though the data type may be
technically different.

Function Overloading

We are overloading a function name f by declaring more than one function with the name f in
the same scope. The declarations of f must differ from each other by the types and/or the number
of arguments in the argument list. When you call an overloaded function named f, the correct
function is selected by comparing the argument list of the function call with the parameter list of
each of the overloaded candidate functions with the name f. A candidate function is a function
that can be called based on the context of the call of the overloaded function name.

Consider a function print, which displays an int. As shown in the following example, you can
overload the function print to display other types, for example, double and char*. You can
have three functions with the same name, each performing a similar operation on a different data
type:

#include <iostream>
using namespace std;

void print(int i) {
    cout << " Here is int " << i << endl;
}

void print(double f) {
    cout << " Here is float " << f << endl;
}

void print(const char* c) {
    cout << " Here is char* " << c << endl;
}


int main() {
    print(10);
    print(10.10);
    print("ten");
}

The following is the output of the above example:

Here is int 10
Here is float 10.1
Here is char* ten

Virtual Function
A C++ virtual function is a member function of a class whose functionality can be
overridden in its derived classes. The whole function body can be replaced with a
new implementation in the derived class. The concept of C++ virtual
functions is different from C++ function overloading.

C++ Virtual Function - Properties:

A C++ virtual function is:

• A member function of a class
• Declared with the virtual keyword
• Usually given different functionality in the derived class
• Resolved at run-time when called

The difference between a non-virtual C++ member function and a virtual member
function is that non-virtual member functions are resolved at compile time. This
mechanism is called static binding. C++ virtual member functions, by contrast,
are resolved during run-time. This mechanism is known as dynamic binding.

C++ Virtual Function - Reasons:

The most prominent reason why a C++ virtual function will be used is to have
different functionality in the derived class.
For example, a Create function in a class Window may have to create a window
with a white background. But a class called CommandButton, derived or inherited
from Window, may have to use a gray background and write a caption in the center.
The Create function for CommandButton should now have functionality different
from the one in the class Window.
C++ Virtual function - Example:
This example assumes a base class named Window with a virtual member function
named Create. The derived class is CommandButton, with its overridden
function Create.

#include <iostream>
using namespace std;

class Window // Base class for C++ virtual function example
{
public:
    virtual void Create() // virtual function
    {
        cout << "Base class Window" << endl;
    }
};

class CommandButton : public Window
{
public:
    void Create() // overrides the base class version
    {
        cout << "Derived class Command Button - Overridden C++ virtual function" << endl;
    }
};

int main()
{
    Window *x, *y;

    x = new Window();
    x->Create();

    y = new CommandButton();
    y->Create();
}

The output of the above program will be:

Base class Window
Derived class Command Button - Overridden C++ virtual function
If the function had not been declared virtual, then the base class function
would have been called every time, because the function address would have
been statically bound during compile time. But now, as the function is declared
virtual, it is a candidate for run-time linking and the derived class function is
invoked.
C++ Virtual function - Call Mechanism:
Whenever a program has a C++ virtual function declared, a v-table is constructed
for the class. The v-table holds the addresses of the virtual functions of the class,
and each object of the class carries a hidden pointer to its class's v-table.
Whenever a call is made to a C++ virtual function, the v-table is used to resolve
the function address. This is how dynamic binding happens during a virtual
function call.

Friend Functions

In this C++ tutorial, you will learn about friend functions: the need for a friend
function, how to define and use one, and a few important points regarding friend
functions, explained with an example.

53
Computer Applications – BBA 3rd year SN

Need for Friend Function:

As we know from access specifiers, when data is declared as private inside a class,
it is not accessible from outside the class. A function that is not a member of the
class, or an external class, will not be able to access the private data. A programmer
may have a situation where he or she would need to access private data from non-
member functions and external classes. For handling such cases, the concept of
friend functions is a useful tool.

What is a Friend Function?

A friend function is used for accessing the non-public members of a class. A class
can allow non-member functions and other classes to access its own private data,
by making them friends. Thus, a friend function is an ordinary function or a member
of another class.

How to define and use Friend Function in C++:

A friend function is written like any other normal function, except that its
declaration is preceded by the keyword friend. The friend function is usually
passed, as an argument, an object of the class to which it is declared a friend.

Some important points to note while using friend functions in C++:

• The keyword friend is placed only in the function declaration of the friend
function and not in the function definition.
• It is possible to declare a function as friend in any number of classes.
• When a class is declared as a friend, the friend class has access to the private
data of the class that made it a friend.
• A friend function, even though it is not a member function, has the right to
access the private members of the class.
• It is possible to place the friend declaration in either the private or the public
section of the class; the effect is the same.
• The function can be invoked without the use of an object. The friend function
takes objects as arguments, as seen in the example below.

Example to understand the friend function:

#include <iostream>
using namespace std;

class exforsys
{
private:
    int a,b;
public:
    void test()
    {
        a=100;
        b=200;
    }
    friend int compute(exforsys e1);
    //Friend function declaration with the keyword friend; an object of
    //class exforsys, to which it is a friend, is passed to it
};

int compute(exforsys e1)
{
    //Friend function definition, which has access to private data
    return int(e1.a+e1.b)-5;
}

int main()
{
    exforsys e;
    e.test();
    cout<<"The result is: "<<compute(e)<<endl;
    //Calling the friend function with an object as argument.
}

The output of the above program is

The result is: 295

The function compute() is a non-member function of the class exforsys. In order to
make this function have access to the private data a and b of class exforsys, it is
created as a friend function for the class exforsys. As a first step, the function
compute() is declared as a friend in the class exforsys as:

friend int compute(exforsys e1);

Composition

Objects can work together in many ways within a system. In some situations, classes and objects
can be tightly coupled together to provide more complex functionality. This is known as
composition. In the car example, the wheels, panels, engine, gearbox, etc. can be thought of as
individual classes. To create the car class, you link all of these objects together, possibly adding
further functionality. The internal workings of each class are not important due to encapsulation,
as the communication between the objects is still via passing messages to their public interfaces.

Modularity

In addition to the concepts described above, object-oriented programming also permits increased
modularity. Individual classes or groups of linked classes can be thought of as a module of code
that can be re-used in many software projects. This reduces the need to redevelop similar
functionality and therefore can lower development time and costs.
