Database Management Systems

1. Introduction

DBMS – A database is a collection of interrelated data, and a Database Management System (DBMS) is a set of programs used to access and modify this data.

1.1 Approaches to Data Management

• File-Based Systems
Before database systems evolved, data in software systems was conventionally stored in, and represented using, flat files.

• Database Systems
Database systems evolved in the late 1960s to address issues common to data-intensive applications that handle large volumes of data. Some of these issues can be traced back to the following disadvantages of file-based systems.

Drawbacks of File-Based Systems

As shown in the figure, in a file-based system, different programs in the same application may interact with different private data files. No system enforces any standardized control over the organization and structure of these data files.

• Data Redundancy and Inconsistency
Since data resides in different private data files, redundancy and the resulting inconsistency are likely. For example, in the example shown above, the same customer can have a savings account as well as a mortgage loan. The customer details may then be duplicated, since the programs for the two functions store their data in two different data files. This gives rise to redundancy in the customer's data. Since the same data is stored in two files, inconsistency arises if a change made to the data in one file is not reflected in the other.

• Unanticipated Queries
In a file-based system, handling sudden or ad-hoc queries can be difficult, since it requires changes to the existing programs.

• Data Isolation
Though the data used by different programs in the application may be related, it resides in isolated data files.

• Concurrent Access Anomalies
In large multi-user systems, the same file or record may need to be accessed by multiple users simultaneously. Handling this in a file-based system is difficult.

• Security Problems
In data-intensive applications, security of data is a major concern. Users should be given access only to the data they require and not to the whole database. In a file-based system, this can be handled only by additional programming in each application.

• Integrity Problems
In any application, there are certain data integrity rules which need to be maintained. These could take the form of conditions or constraints on the elements of the data records. In the savings bank application, one such integrity rule could be “Customer ID, which is the unique identifier for a customer record, should be non-empty”. There can be several such integrity rules. In a file-based system, all these rules need to be explicitly programmed in the application program.

Note that handling issues such as concurrent access, security and integrity is not impossible in a file-based system. The real problem was that, although these issues concern every data-intensive application, each application had to handle all of them on its own. The application programmer had to worry not only about implementing the application business rules but also about handling these common issues.

1.2 Advantages of Database Systems

As shown in the figure, the DBMS is a central system which provides a common interface between the data and the various front-end programs in the application. It also provides a central location for all the data in the application. Due to its centralized nature, the database system can overcome the disadvantages of the file-based system, as discussed below.

• Minimal Data Redundancy
Since all the data resides in one central database, the various programs in the application can access data in different data files. Hence data present in one file need not be duplicated in another. This reduces data redundancy. However, this does not mean that all redundancy can be eliminated. There could be business or technical reasons for keeping some amount of redundancy. Any such redundancy should be carefully controlled, and the DBMS should be aware of it.

• Data Consistency
Reduced data redundancy leads to better data consistency.

• Data Integration
Since related data is stored in one single database, enforcing data integrity is much easier. Moreover, the functions in the DBMS can be used to enforce the integrity rules with minimum programming in the application programs.

• Data Sharing
Related data can be shared across programs since the data is stored in a centralized manner. Even new applications can be developed to operate against the same data.

• Enforcement of Standards
Enforcing standards in the organization and structure of data files is both required and easy in a database system, since a single set of programs always interacts with the data files.

• Application Development Ease
The application programmer need not build the functions for handling issues like concurrent access, security and data integrity. The programmer only needs to implement the application business rules. This makes application development easier. Adding functional modules is also easier than in file-based systems.

• Better Controls
Better controls can be achieved due to the centralized nature of the system.

• Data Independence
The architecture of the DBMS can be viewed as a 3-level system comprising the following:
- The internal or physical level, where the data resides.
- The conceptual level, which is the level of the DBMS functions.
- The external level, which is the level of the application programs or the end user.
Data independence means isolating an upper level from changes in the organization or structure of a lower level. For example, if changes in the file organization of a data file do not demand changes in the functions of the DBMS or in the application programs, data independence is achieved. Thus data independence can be defined as the immunity of applications to changes in physical representation and access technique. The provision of data independence is a major objective of database systems.

• Reduced Maintenance
Maintenance is reduced and simplified, again due to the centralized nature of the system.

1.3 Functions of a DBMS

The functions performed by a typical DBMS are the following:

• Data Definition

The DBMS provides functions to define the structure of the data in the application. These include defining and modifying the record structure, the type and size of fields, and the various constraints/conditions to be satisfied by the data in each field.

• Data Manipulation
Once the data structure is defined, data needs to be inserted, modified or deleted. The functions which perform these operations are also part of the DBMS. These functions can handle planned and unplanned data manipulation needs. Planned queries are those which form part of the application. Unplanned queries are ad-hoc queries which are performed on a need basis.

• Data Security & Integrity
The DBMS contains functions which handle the security and integrity of data in the application. These can be easily invoked by the application, and hence the application programmer need not code these functions in his/her programs.

• Data Recovery & Concurrency
Recovery of data after a system failure and concurrent access of records by multiple users are also handled by the DBMS.

• Data Dictionary Maintenance
Maintaining the Data Dictionary, which contains the data definition of the application, is also one of the functions of a DBMS.

• Performance
Optimizing the performance of queries is one of the important functions of a DBMS. Hence the DBMS has a set of programs forming the Query Optimizer, which evaluates the different implementations of a query and chooses the best among them.

Thus the DBMS provides an environment that is both convenient and efficient to use when there is a large volume of data and many transactions to be processed.

1.4 Role of the Database Administrator

Typically there are three types of users for a DBMS. They are:

1. The End User, who uses the application. Ultimately, this is the user who actually puts the data in the system to use in business. This user need not know anything about the organization of data at the physical level. She also need not be aware of the complete data in the system. She needs to have access to and knowledge of only the data she is using.

2. The Application Programmer, who develops the application programs. She has more knowledge about the data and its structure, since she has to manipulate the data using her programs. She also need not have access to and knowledge of the complete data in the system.

3. The Database Administrator (DBA), who is like the super-user of the system.

The role of the DBA is very important and is defined by the following functions.

• Defining the Schema
The DBA defines the schema, which contains the structure of the data in the application. The DBA determines what data needs to be present in the system and how this data has to be represented and organized.

• Liaising with Users
The DBA needs to interact continuously with the users to understand the data in the system and its use.

• Defining Security & Integrity Checks
The DBA finds out about the access restrictions to be defined and defines security checks accordingly. Data integrity checks are also defined by the DBA.

• Defining Backup / Recovery Procedures
The DBA also defines procedures for backup and recovery. Defining backup procedures includes specifying what data is to be backed up, the periodicity of taking backups, and the medium and storage place for the backup data.

• Monitoring Performance
The DBA has to continuously monitor the performance of the queries and take measures to optimize all the queries in the application.

1.5 Types of Database Systems

Database systems can be categorised according to the data structures and operators they present to the user. The oldest systems fall into inverted list, hierarchic and network systems. These are the pre-relational models.

• In the Hierarchical Model, different records are inter-related through hierarchical or tree-like structures. A parent record can have several children, but a child can have only one parent. In the figure, there are two hierarchies shown: the first storing the relations between CUSTOMER, ORDERS, CONTACTS and ORDER_PARTS, and the second showing the relation between PARTS, ORDER_PARTS and SALES_HISTORY. The many-to-many relationship is implemented through the ORDER_PARTS segment, which occurs in both hierarchies. In practice, only one tree stores the ORDER_PARTS segment, while the other has a logical pointer to this segment. IMS (Information Management System) of IBM is an example of a Hierarchical DBMS.

• In the Network Model, a parent can have several children and a child can also have many parent records. Records are physically linked through linked lists. IDMS from Computer Associates International Inc. is an example of a Network DBMS.

• In the Relational Model, unlike the Hierarchical and Network models, there are no physical links. All data is maintained in the form of tables consisting of rows and columns. Data in two tables is related through common columns and not through physical links or pointers. Operators are provided for operating on rows in tables. Unlike the other two types of DBMS, there is no need to traverse pointers in a Relational DBMS. This makes querying much easier in a Relational DBMS than in a Hierarchical or Network DBMS. This, in fact, is a major reason why the relational model became more programmer friendly and much more dominant and popular in both industrial and academic scenarios. Oracle, Sybase, DB2, Ingres, Informix and MS-SQL Server are a few of the popular Relational DBMSs.

CUSTOMER
CUST. NO.   CUSTOMER NAME     ADDRESS      CITY
15371       Nanubhai & Sons   L. J. Road   Mumbai
...         ...               ...          ...

CONTACTS
CUST. NO.   CONTACT        DESIGNATION
15371       Nanubhai       Owner
15371       Rajesh Munim   Accountant
...         ...            ...

ORDERS
ORDER NO.   ORDER DATE     CUSTOMER NO.
3216        24-June-1997   15371
...         ...            ...

PARTS
PARTS NO.   PARTS DESC              PART PRICE
S3          Amkette 3.5" Floppies   400.00
...         ...                     ...

ORDERS-PARTS
ORDER NO.   PART NO.   QUANTITY
3216        C1         300
3216        S3         120
...         ...        ...

SALES-HISTORY
PART NO.   REGION   YEAR   UNITS
S3         East     1996   2000
S3         North    1996   5500
S3         South    1996   12000
S3         West     1996   20000

Recent developments in the area have shown up in the form of certain object and object/relational DBMS products. Examples of such systems are GemStone and Versant ODBMS. Research has also proceeded on a variety of other schemes, including the multi-dimensional approach and the logic-based approach.

3-Level Database System Architecture

• The External Level represents the collection of views available to different end users.
• The Conceptual Level is the representation of the entire information content of the database.
• The Internal Level is the physical level, which shows how the data is stored, how the fields are represented, etc.

2. The Internal Level

This chapter discusses how the data is physically stored on the disk and some of the access mechanisms commonly used for retrieving this data. The Internal Level is the level which deals with the physical storage of data. While designing this layer, the main objective is to optimize performance by minimizing the number of disk accesses during the various database operations.

The figure shows the process of database access in general. The DBMS views the database as a collection of records. The File Manager of the underlying operating system views it as a set of pages, and the Disk Manager views it as a collection of physical locations on the disk. When the DBMS makes a request for a specific record to the File Manager, the latter maps the record to the page containing it and requests that page from the Disk Manager. The Disk Manager determines the physical location on the disk and retrieves the required page.

2.1 Clustering

In the above process, if the page containing the requested record is already in memory, retrieval from the disk is not necessary. In such a situation, the time taken for the whole operation is less. Thus, if records which are frequently used together are placed physically together, more of the required records will be in the same page. Hence fewer pages need to be retrieved, which reduces the number of disk accesses and in turn gives better performance. This method of storing logically related records physically together is called clustering.

Eg: Consider the CUSTOMER table shown below.

Cust ID   Cust Name   Cust City
10001     Raj         Delhi
10002     ...         ...
10003     ...         ...
10004     ...         ...
...       ...         ...

If queries retrieving customers with consecutive Cust_IDs occur frequently in the application, clustering based on Cust_ID will help improve the performance of these queries. This can be explained as follows.

Assume that the Customer record size is 128 bytes and the typical size of a page retrieved by the File Manager is 1 KB (1024 bytes). If there is no clustering, it can be assumed that the Customer records are stored at random physical locations. In the worst-case scenario, each record may be placed in a different page. Hence a query to retrieve 100 records with consecutive Cust_IDs (say, 10001 to 10100) will require 100 pages to be accessed, which in turn translates to 100 disk accesses. But if the records are clustered, a page can contain 1024/128 = 8 records. Hence the number of pages to be accessed for retrieving the 100 consecutive records will be ceil(100/8) = 13, i.e., only 13 disk accesses will be required to obtain the query results. Thus, in the given example, clustering improves the speed by a factor of about 100/13 = 7.7.

Q: For what record size will clustering be of no benefit to improve performance?
A: When the record size and page size are such that a page can contain only one record.

Q: Can a table have clustering on multiple fields simultaneously?
A: No

• Intra-file Clustering – Clustered records belong to the same file (table), as in the above example.
• Inter-file Clustering – Clustered records belong to different files (tables). This type of clustering may be required to enhance the speed of queries retrieving related records from more than one table. Here interleaving of records is used.

2.2 Indexing

Indexing is another common method for making retrievals faster.

Consider the example of the CUSTOMER table used above. The following query is based on the customer's city:

“Retrieve the records of all customers who reside in Delhi”

Here a sequential search on the CUSTOMER table has to be carried out, and all records with the value 'Delhi' in the Cust_City field have to be retrieved. The time taken for this operation depends on the number of pages to be accessed. If the records are randomly stored, the number of page accesses depends on the volume of data. If the records are stored physically together, the number of pages also depends on the size of each record.

If such queries based on the Cust_City field are very frequent in the application, steps can be taken to improve their performance. Creating an index on Cust_City is one such method. This results in the scenario shown below.

A new index file is created. The number of records in the index file is the same as that of the data file. The index file has two fields in each record. One field contains the value of the Cust_City field, and the second contains a pointer to the actual data record in the CUSTOMER table.

Whenever a query based on the Cust_City field occurs, a search is carried out on the index file. This search will be much faster than a sequential search in the CUSTOMER table if the records are stored physically together. This is because the index record is much smaller, so each page can contain more records. When the records with the value 'Delhi' in the Cust_City field of the index file are located, the pointer in the second field of these records can be followed to directly retrieve the corresponding CUSTOMER records. Thus the access involves a sequential access on the index file and a direct access on the actual data file.

Retrieval Speed v/s Update Speed: Though indexes help make retrievals faster, they slow down updates on the table, since updates on the base table demand updates on the index file as well.
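The index-file idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular DBMS implements indexes: the customer rows other than 10001 are hypothetical sample data, the "pointer" is simply the position of the row in an in-memory list, and the helper names build_index and lookup are made up for this sketch.

```python
# Hypothetical CUSTOMER rows (Cust_ID, Cust_Name, Cust_City).
# Only the first row appears in the notes; the rest are made-up sample data.
customers = [
    (10001, "Raj", "Delhi"),
    (10002, "Asha", "Mumbai"),
    (10003, "Vikram", "Delhi"),
    (10004, "Meena", "Chennai"),
]


def build_index(table, field_pos):
    """Build the index file: one (field value, pointer) record per data record.

    The pointer is assumed to be the row position in the data file.
    """
    return [(row[field_pos], pos) for pos, row in enumerate(table)]


def lookup(index, table, value):
    """Sequential access on the index file, then direct access on the data file."""
    pointers = [ptr for key, ptr in index if key == value]
    return [table[ptr] for ptr in pointers]


city_index = build_index(customers, 2)   # index on Cust_City
delhi_customers = lookup(city_index, customers, "Delhi")
print(delhi_customers)
```

The sketch also makes the retrieval/update trade-off visible: every insert into customers would require a matching insert into city_index.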

It is possible to create an index with multiple fields, i.e., an index on field combinations. Multiple indexes can also be created on the same table simultaneously, though there may be a limit on the maximum number of indexes that can be created on a table.

Q: In which of the following situations will indexes be ineffective?
a) When the percentage of rows being retrieved is large
b) When the data table is small and the index record is of almost the same size as the actual data record
c) In queries involving NULL / Not NULL in the indexed field
d) All of the above
A: d) All of the above

Q: Can clustering based on one field and indexing on another field exist on the same table simultaneously?
A: Yes

2.3 Hashing

Hashing is yet another method used for making retrievals faster. This method provides direct access to a record on the basis of the value of a specific field called the hash_field. Here, when a new record is inserted, it is physically stored at an address which is computed by applying a mathematical function (the hash function) to the value of the hash field. Thus for every new record,

hash_address = f (hash_field), where f is the hash function.
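The relation hash_address = f(hash_field) can be sketched as follows. This is a minimal illustration under stated assumptions: the particular function f used here, (CUST_ID mod 10000) * 64 + 1025, is just one possible choice, a Python dictionary stands in for addressable disk storage, and collisions (two records mapping to the same address) are deliberately ignored.

```python
def hash_address(cust_id):
    """Illustrative hash function f applied to the hash field CUST_ID."""
    return (cust_id % 10000) * 64 + 1025


# Stand-in for the disk: maps a computed address to the record stored there.
storage = {}


def insert(record):
    """Store the record at the address computed from its hash field."""
    storage[hash_address(record["cust_id"])] = record


def retrieve(cust_id):
    """Recompute the address with the same function; no search is involved."""
    return storage.get(hash_address(cust_id))


insert({"cust_id": 10001, "name": "Raj", "city": "Delhi"})
address = hash_address(10001)
found = retrieve(10001)
print(address, found)
```

Because the same function is used for insertion and retrieval, a lookup is a single address computation followed by one direct access.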

Later, when a record is to be retrieved, the same hash function is used to compute the address where the record is stored. Retrievals are faster since direct access is provided and there is no search involved in the process.

A typical hash function takes a numeric hash field, say an id, modulo a very large prime number.

Q: Can there be more than one hash field on a file?
A: No

As hashing relates the field value to the address of the record, multiple hash fields would map a record to multiple addresses at the same time. Hence there can be only one hash field per file.

Collisions: Consider the example of the CUSTOMER table given earlier while discussing clustering. Let CUST_ID be the hash field and the hash function be defined as ((CUST_ID mod 10000)*64 + 1025). The records with CUST_ID 10001, 10002, 10003 etc. will be stored at addresses 1089, 1153, 1217 etc. respectively.

It is possible that two records hash to the same address, leading to a collision. In the above example, records with CUST_ID values 20001, 20002, 20003 etc. will also map on to the addresses 1089, 1153, 1217 etc. respectively. The same is the case with CUST_ID values 30001, 30002, 30003 etc.

The methods to resolve a collision are:

1. Linear Search: While inserting a new record, if it is found that the location at the hash address is already occupied by a previously inserted record, search for the next free location available on the disk and store the new record at this location. A pointer from the first record at the original hash address to the new record will also be stored. During retrieval, the hash address is computed to locate the record. When it is seen that the record is not available at the hash address, the pointer from the record at that address is followed to locate the required record. In this method, the overhead incurred is the time taken for the linear search to locate the next free location while inserting a record.

2. Collision Chain: Here, the hash address location contains the head of a list of pointers linking together all records which hash to that address. In this method, an overflow area needs to be used if the number of records mapping on to the same hash address exceeds the number of locations linked to it.

3. The Relational Model

3.1 Relational Databases: Terminology

Relational Databases: Case Example

Ord_Aug
Ord#   OrdDate    Cust#
101    02-08-94   002
102    11-08-94   003
103    21-08-94   003
104    28-08-94   002
105    30-08-94   005

Ord_Items
Ord#   Item#   Qty
101    HW1     100
101    HW3     50
101    SW1     150
102    HW2     10
103    HW3     50
104    HW2     25
104    HW3     100
105    SW1     100

Items
Item#   Descr          Price
HW1     Power Supply   4000
HW2     101-Keyboard   2000
HW3     Mouse          800
SW1     MS-DOS 6.0     5000
SW2     MS-Word 6.0    8000

Customers
Cust#   CustName     City
001     Shah         Bombay
002     Srinivasan   Madras
003     Gupta        Delhi
004     Banerjee     Calcutta
005     Apte         Bombay

The terms are listed below, each with its meaning and an example from the given Case Example.

• Relation – A table. Eg: Ord_Aug, Customers, Items etc.
• Tuple – A row or a record in a relation. Eg: A row from the Customers relation is a Customer tuple.
• Attribute – A field or a column in a relation. Eg: Ord_Date, Item#, CustName etc.
• Cardinality of a relation – The number of tuples in a relation. Eg: Cardinality of the Ord_Items relation is 8.
• Degree of a relation – The number of attributes in a relation. Eg: Degree of the Customers relation is 3.
• Domain of an attribute – The set of all values that can be taken by the attribute. Eg: Domain of Qty in Ord_Items is the set of all values which can represent the quantity of an ordered item.
• Primary Key of a relation – An attribute or a combination of attributes that uniquely defines each tuple in a relation. Eg: The Primary Key of the Customers relation is Cust#. The Ord# and Item# combination forms the Primary Key of Ord_Items.
• Foreign Key – An attribute or a combination of attributes in one relation R1 which indicates the relationship of R1 with another relation R2. The foreign key attributes in R1 must contain values matching those in R2. Eg: Cust# in the Ord_Aug relation is a foreign key creating a reference from Ord_Aug to Customers. This is required to indicate the relationship between orders in Ord_Aug and Customers. Ord# and Item# in Ord_Items are foreign keys creating references from Ord_Items to Ord_Aug and Items respectively.

3.2 Properties of Relations

• No Duplicate Tuples – A relation cannot contain two or more tuples which have the same values for all the attributes, i.e., in any relation, every row is unique.
• Tuples are unordered – The order of rows in a relation is immaterial.
• Attributes are unordered – The order of columns in a relation is immaterial.
• Attribute Values are Atomic – Each tuple contains exactly one value for each attribute.

It may be noted that many of the properties of relations follow from the fact that the body of a relation is a mathematical set.

3.3 Integrity Rules

The following are the integrity rules to be satisfied by any relation.

• No component of the Primary Key can be null.
• The database must not contain any unmatched Foreign Key values. This is called the referential integrity rule.

Q: Can the Foreign Key accept nulls?
A: Yes, if the application business rule allows this.

How do we explain this? Unlike the case of Primary Keys, there is no integrity rule saying that no component of the foreign key can be null. This can be logically explained with the help of the following example. Consider the relations Employee and Account as given below.

Employee
Emp#   EmpName   EmpCity   EmpAcc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

Account
ACC#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120002   29-Oct-1998   1200
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

EmpAcc# in the Employee relation is a foreign key creating a reference from Employee to Account. Here, a Null value in the EmpAcc# attribute is logically possible if an employee does not have a bank account. If the business rules allow an employee to exist in the system without opening an account, a Null value can be allowed for EmpAcc# in the Employee relation.

In the case example given, Cust# in Ord_Aug cannot accept Null if the business rule insists that the Customer No. needs to be stored for every order placed.

The next issue related to foreign key references is the handling of deletes / updates of the parent record. In the case example, can we delete the record with Cust# value 002, 003 or 005? The default answer is NO, as long as there is a foreign key reference to these records from some other table. Here, the records are referenced from the order records in the Ord_Aug relation. Hence the default strategy is to Restrict the deletion of the parent record.

Deletion can still be carried out if we use the Cascade or Nullify strategies.

Cascade: Delete/update all the references successively, or in a cascaded fashion, and finally delete/update the parent record. In the case example, the Customer record with Cust# 002 can be deleted after deleting the order records with Ord# 101 and 104. But these order records, in turn, can be deleted only after deleting the records with Ord# 101 and 104 from the Ord_Items relation.

Nullify: Update the referencing field to Null and then delete/update the parent record. In the above example of Employee and Account relations, an account record may have to be deleted if the account is to be closed. For example, if Employee Raj decides to close his account, the Account record with Acc# 120002 has to be deleted. But this deletion is not possible as long as the Employee record of Raj references it. Hence the strategy can be to update the EmpAcc# field in the employee record of Raj to Null and then delete the parent Account record 120002. After the deletion, the data in the tables will be as follows:

Employee
Emp#   EmpName   EmpCity   EmpAcc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      Null
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

Account
ACC#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

3.4 Relational Algebra Operators

The eight relational algebra operators are:

1. SELECT – To retrieve specific tuples/rows from a relation.

Ord#   OrdDate    Cust#
101    02-08-94   002
104    28-08-94   002

2. PROJECT – To retrieve specific attributes/columns from a relation.

Descr          Price
Power Supply   4000
101-Keyboard   2000
Mouse          800
MS-DOS 6.0     5000
MS-Word 6.0    8000

3. PRODUCT – To obtain all possible combinations of tuples from two relations.

Ord#   OrdDate    O.Cust#   C.Cust#   CustName     City
101    02-08-94   002       001       Shah         Bombay
101    02-08-94   002       002       Srinivasan   Madras
101    02-08-94   002       003       Gupta        Delhi
101    02-08-94   002       004       Banerjee     Calcutta
101    02-08-94   002       005       Apte         Bombay
102    11-08-94   003       001       Shah         Bombay
102    11-08-94   003       002       Srinivasan   Madras
...    ...        ...       ...       ...          ...

4. UNION – To retrieve tuples appearing in either or both of the relations participating in the UNION.

Eg: Consider the relation Ord_Jul as follows.

(Table: Ord_Jul)
Ord#   OrdDate    Cust#
101    03-07-94   001
102    27-07-94   003

The union of Ord_Jul and Ord_Aug gives:

Ord#   OrdDate    Cust#
101    03-07-94   001
102    27-07-94   003
101    02-08-94   002
102    11-08-94   003
103    21-08-94   003
104    28-08-94   002
105    30-08-94   005

Note: The union operation shown above logically implies retrieval of records of orders placed in July or in August.

5. INTERSECT – To retrieve tuples appearing in both the relations participating in the INTERSECT.

Eg: To retrieve Cust# of customers who have placed orders in July and in August:

Cust#
003

6. DIFFERENCE – To retrieve tuples appearing in the first relation participating in the DIFFERENCE but not in the second.

Eg: To retrieve Cust# of Customers who've placed orders in July but not in August Cust# 001 7. Eg: ORD_AUG join CUSTOMERS (here. JOIN – To retrieve combinations of tuples in two relations based on a common field in both the relations. the common column is Cust#) Ord# 101 OrdDate 02-08-94 Cust# 002 CustNames Srinivasan City Madras .

1999 BalAmt 5000 1200 3000 500 EmpName Shekhar Raj Sharma Vani EmpCity Bombay Pune Nagpur Bhopal Acc# 120001 120002 Null 120003 A join can be formed between the two relations based on the common column Acc#. 1999 4. Mar. This is the most common join operation. EMPLOYEE EMP # X101 X102 X103 X104 ACCOUNT Acc# 120001 120002 120003 120004 OpenDate 30. Jan. Aug. Aug. Consider the example of EMPLOYEE and ACCOUNT relations. Such a join operation where only those rows having corresponding rows in the both the relations are retrieved is called the natural join or inner join. 1998 29.102 103 104 105 11-08-94 21-08-94 28-08-94 30-08-94 003 003 002 005 Gupta Gupta Srinivasan Apte Delhi Delhi Madras Bombay Note: The above join operation logically implies retrieval of details of all orders and the details of the corresponding customers who placed the orders. 1998 BalAmt 5000 . Oct. The result of the (inner) join is : Emp# X101 EmpName Shekhar EmpCity Bombay Acc# 120001 OpenDate 30. 1998 1.

only those records which have corresponding records in the other table appear in the result set. from each table. Oct. Otherwise. Oct. Jan 1999 1200 3000 Note that. 1998 NULL 1. EMPLOYEE left outer join ACCOUNT gives: Emp# X101 X102 X103 X104 EmpName Shekhar Raj Sharma Vani EmpCity Bombay Pune Nagpur Bhopal Acc# 120001 120002 NULL 120003 OpenDate 30. . columns of the right-side table will take null values. The other type of join is the outer join which has three variations – the left outer join. Aug. 1998 29. the correspondence will be shown. If there are corresponding or related rows in the right-side table.X102 X104 Raj Vani Pune Bhopal 120002 120003 29. Otherwise. columns of the left-side table will take null values. These three joins are explained as follows: The left outer join retrieves all rows from the left-side (of the join operator) table. 1998 1. If there are corresponding or related rows in the left-side table. the right outer join and the full outer join. This means that result of the inner join shows the details of those employees who hold an account along with the account details. Jan 1999 BalAmt 5000 1200 NULL 3000 The right outer join retrieves all rows from the right-side (of the join operator) table. the correspondence will be shown.

If there is a correspondence or relation between rows from the tables of either side. Otherwise. Aug. 1998 1. 1999 BalAmt 5000 1200 3000 500 (Assume that Acc# 120004 belongs to someone who is not an employee and hence the details of the Account holder are not available here) The full outer join retrieves all rows from both the tables. Mar. Oct. Oct. EMPLOYEE full outer join ACCOUNT gives: Emp# X101 X102 EmpName Shekhar Raj EmpCity Bombay Pune Acc# 120001 120002 OpenDate 30. Jan 1999 4. 1998 BalAmt 5000 1200 . Aug. related columns will take null values. 1998 29.EMPLOYEE right outer join ACCOUNT gives: Emp# X101 X102 X104 NULL EmpName Shekhar Raj Vani NULL EmpCity Bombay Pune Bhopal NULL Acc# 120001 120002 120003 120004 OpenDate 30. 1998 29. the correspondence will be shown.

Q: What will be the result of a natural join operation between R1 and R2?
A:
a1   b1   c1
a2   b2   c2
a3   b3   c3

8. DIVIDE

Consider the following three relations:

R1 divide by R2 per R3 gives:
a

Thus the result contains those values from R1 whose corresponding R2 values in R3 include all R2 values.

4. Structured Query Language (SQL)

4.1 SQL : An Overview

The components of SQL are
a. Data Manipulation Language – Consists of SQL statements for operating on the data (Inserting, Modifying, Deleting and Retrieving Data) in tables which already exist.
b. Data Definition Language – Consists of SQL statements for defining the schema (Creating, Modifying and Dropping tables, indexes, views etc.)
c. Data Control Language – Consists of SQL statements for providing and revoking access permissions to users

Tables used:

Ord_Aug
Ord#   OrdDate     Cust#
101    02-AUG-94   002
102    11-AUG-94   003
103    21-AUG-94   003
104    28-AUG-94   002
105    30-AUG-94   005

Items
Item#   Descr          Price
HW1     Power Supply   4000
HW2     101-Keyboard   2000
HW3     Mouse          800
SW1     MS-DOS 6.0     5000
SW2     MS-Word 6.0    8000

Ord_Items
Ord#   Item#   Qty
101    HW1     100
101    HW3     50
101    SW1     150
102    HW2     10
103    HW3     50
104    HW2     25
104    HW3     100
105    SW1     100

Customers
Cust#   CustName     City
001     Shah         Bombay
002     Srinivasan   Madras
003     Gupta        Delhi
004     Banerjee     Calcutta
005     Apte         Bombay

4.2 DML – SELECT, INSERT, UPDATE and DELETE statements.

The SELECT statement

Retrieves rows from one or more tables according to given conditions.

General form:
    SELECT [ ALL | DISTINCT ] <attribute (comma)list>
    FROM <table (comma)list>
    [ WHERE <conditional expression> ]
    [ GROUP BY <attribute (comma)list> ]
    [ HAVING <conditional expression> ]
    [ ORDER BY <attribute list> [DESC] ]

Some SELECT statements on the Case Example

Query 1:
    SELECT *
    FROM items;
* denotes all attributes in the table.

Query 2:
    SELECT cust#, custname
    FROM customers;

Query 3:
    SELECT DISTINCT item#
    FROM ord_items;

Query 4:
    SELECT ord# "Order ", orddate "Ordered On"
    FROM ord_aug;
In the result set the column headings will appear as "Order" and "Ordered On" instead of ord# and orddate.

Query 5:
    SELECT item#, descr
    FROM items
    WHERE price>2000;

Query 6:
    SELECT custname
    FROM customers
    WHERE city<>'Bombay';

Query 7:
    SELECT custname
    FROM customers
    WHERE UPPER(city)<>'BOMBAY';

Query 8:
    SELECT *
    FROM ord_aug
    WHERE orddate > '15-AUG-94';
Illustrates the use of 'date' fields. In SQL, a separate datatype (eg: date, datetime etc.) is available to store data which is of type date.

Query 9:
    SELECT *
    FROM ord_items
    WHERE qty BETWEEN 100 AND 200;

Query 10:
    SELECT custname
    FROM customers
    WHERE city IN ('Bombay', 'Madras');
The conditional expression evaluates to TRUE for those records for which the value of the city field is in the list ('Bombay', 'Madras').

Query 11:
    SELECT custname
    FROM customers
    WHERE custname LIKE 'S%';
LIKE 'S%' matches 'S' followed by zero or more characters.

Query 12:
    SELECT *
    FROM ord_items
    WHERE qty>100 AND item# LIKE 'SW%';

Query 13:
    SELECT custname
    FROM customers
    WHERE city='Bombay' OR city='Madras';

Query 14:
    SELECT *
    FROM customers
    WHERE city='Bombay'
    ORDER BY custname;
Records in the result set are displayed in the ascending order of custname.

Query 15:
    SELECT *
    FROM ord_items
    ORDER BY item#, qty DESC;
Displays the result set in the ascending order of item#. If there is more than one record with the same item#, they will be displayed in the descending order of qty.

Query 16:
    SELECT descr, price
    FROM items
    ORDER BY 2;
ORDER BY 2 orders by the 2nd attribute (price) in the attribute list of the SELECT clause.

Query 17:
    SELECT ord#, customers.cust#, city
    FROM ord_aug, customers
    WHERE ord_aug.cust# = customers.cust#;
A SELECT statement implementing the JOIN operation. The WHERE clause is the JOIN condition.

Query 18:
    SELECT ord#, customers.cust#, city
    FROM ord_aug, customers
    WHERE ord_aug.cust# = customers.cust# (+);
(+) indicates outer join. The (+) is placed after the field of the table whose columns take NULL values when there is no matching row. Here it follows customers.cust#, so every row of ord_aug is retained, i.e. this is a left outer join of ord_aug with customers.

Query 19:
    SELECT ord#, ord_aug.cust#, custname
    FROM ord_aug, customers
    WHERE city='Delhi'
    AND ord_aug.cust# = customers.cust#;
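A few of the ORDER BY and join queries above can be run against the Case Example data with Python's sqlite3 module; the following is only a sketch. SQLite does not accept # in unquoted column names, so the hypothetical names cust_no, ord_no and item_no stand in for cust#, ord# and item#, and only the columns needed by the three queries are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Case Example rows; *_no column names are stand-ins for cust#, ord#, item#.
CREATE TABLE customers (cust_no TEXT, custname TEXT, city TEXT);
CREATE TABLE ord_aug   (ord_no INTEGER, cust_no TEXT);
CREATE TABLE items     (item_no TEXT, price INTEGER);
INSERT INTO customers VALUES ('001','Shah','Bombay'), ('002','Srinivasan','Madras'),
    ('003','Gupta','Delhi'), ('004','Banerjee','Calcutta'), ('005','Apte','Bombay');
INSERT INTO ord_aug VALUES (101,'002'), (102,'003'), (103,'003'), (104,'002'), (105,'005');
INSERT INTO items VALUES ('HW1',4000), ('HW2',2000), ('HW3',800), ('SW1',5000), ('SW2',8000);
""")

# Query 14: restriction plus ORDER BY on a named column.
q14 = conn.execute(
    "SELECT custname FROM customers WHERE city='Bombay' ORDER BY custname").fetchall()

# Query 16 (adapted): ORDER BY 2 sorts on the second item of the SELECT list.
q16 = conn.execute("SELECT item_no, price FROM items ORDER BY 2").fetchall()

# Query 19: join condition combined with a restriction on city.
q19 = conn.execute("""
    SELECT ord_no, ord_aug.cust_no, custname
    FROM ord_aug, customers
    WHERE city='Delhi' AND ord_aug.cust_no = customers.cust_no
    ORDER BY ord_no""").fetchall()

print(q14)
print(q16)
print(q19)
```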

Nested SELECT statements

SQL allows nesting of SELECT statements. In a nested SELECT statement the inner SELECT is evaluated first and is replaced by its result to evaluate the outer SELECT statement.

Query 20:
    SELECT item#, descr, price
    FROM items
    WHERE price > (SELECT AVG(price) FROM items);

Query 21:
    SELECT cust#, custname                  (outer SELECT statement)
    FROM customers
    WHERE city = ( SELECT city              (inner SELECT statement)
                   FROM customers
                   WHERE custname='Shah');
Here the outer SELECT is evaluated as
    SELECT cust#, custname
    FROM customers
    WHERE city = 'Bombay'

Arithmetic Expressions
    + - * / ()
Arithmetic expressions are allowed in SELECT and WHERE clauses.

Query 22:
    SELECT descr, price, price*0.1 "discount"
    FROM items
    WHERE price >= 4000
    ORDER BY 3;

Query 23:
    SELECT descr
    FROM items, ord_items
    WHERE price*qty > 250000
    AND items.item# = ord_items.item#;

Numeric Functions

Query 24:
    SELECT qty, ROUND(qty/2,0) "qty supplied"
    FROM ord_items
    WHERE item#='HW2';

Query 25:
    SELECT qty, TRUNC(qty/2,0) "qty supplied"
    FROM ord_items
    WHERE item#='HW2';

Examples of Numeric Functions
    MOD(n,m)
    SQRT(n)
    ROUND(n,m)
    TRUNC(n,m)
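The rule that the inner SELECT is evaluated first and replaced by its result can be checked with a short sqlite3 sketch based on Query 20. The column name item_no is a stand-in for item#; the prices are those of the Items table in the Case Example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_no TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("HW1", 4000), ("HW2", 2000), ("HW3", 800), ("SW1", 5000), ("SW2", 8000)],
)

# Step 1: evaluate the inner SELECT on its own.
avg_price = conn.execute("SELECT AVG(price) FROM items").fetchone()[0]

# Step 2: evaluate the outer SELECT with the inner SELECT replaced by its result.
two_step = conn.execute(
    "SELECT item_no FROM items WHERE price > ? ORDER BY item_no", (avg_price,)
).fetchall()

# The nested form does both steps in a single statement.
nested = conn.execute(
    "SELECT item_no FROM items "
    "WHERE price > (SELECT AVG(price) FROM items) ORDER BY item_no"
).fetchall()

print(avg_price, two_step, nested)
```

Both forms return the same items, since the average price of the five items is 3960.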

In ROUND(n,m) and TRUNC(n,m), 'm' indicates the number of digits after the decimal point in the result.

Date Arithmetic
    Date + No. of days
    Date - No. of days
    Date – Date

Query 26:
    SELECT ord#, orddate+15 "Supply by"
    FROM ord_aug;

Date Functions
    MONTHS_BETWEEN(date1, date2)
    ADD_MONTHS(date, no. of months)
    SYSDATE    Returns system date.

Query 27:
    SELECT ord#, MONTHS_BETWEEN(SYSDATE, orddate)
    FROM ord_aug;

Query 28:
    SELECT TO_CHAR(orddate, ' DD/MM/YYYY')
    FROM ord_aug;
Converts the value of the date field orddate to a character string of the format DD/MM/YYYY.

Note:
    DD       - day of month (1-31)
    D        - day of week (1-7)
    DAY      - name of day
    MM       - month (01-12)
    MONTH    - name of month
    MON      - abbreviated name of month
    HH:MI:SS - hours:minutes:seconds
    fm       - fill mode : suppress blank padding

Character Expressions & Functions
    || - Concatenate operator

Query 29:
    SELECT custname || ' - ' || city
    FROM customers;

Examples of Character Functions:
    INITCAP(string)
    UPPER(string)
    LOWER(string)
    SUBSTR(string, start, no. of characters)

Group Functions

Group functions are functions which act on the entire column of selected rows.

Query 30:
    SELECT SUM(qty), AVG(qty)
    FROM ord_items
    WHERE item#='SW1';
SUM and AVG are examples of Group Functions. They compute the sum/average of qty values of all rows where item#='SW1'.

Examples of Group Functions:
    SUM  AVG  COUNT  MAX  MIN

Query 31:
    SELECT item#, SUM(qty)
    FROM ord_items
    GROUP BY item#;
GROUP BY clause used to group rows according to the value of item# in the result. SUM function acts individually on each group of rows.

Query 32:
    SELECT item#, SUM(qty)
    FROM ord_items
    GROUP BY item#
    HAVING SUM(qty)>100;
HAVING clause used to apply the condition on the grouped rows and display the final result.

Query 33:
    SELECT item#, SUM(qty)
    FROM ord_items
    GROUP BY item#
    HAVING COUNT(*)>2;
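The group function queries (Queries 30 to 33) can be reproduced on the Ord_Items rows of the Case Example with a small sqlite3 sketch. The column names ord_no and item_no are stand-ins for ord# and item#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ord_items (ord_no INTEGER, item_no TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO ord_items VALUES (?, ?, ?)",
    [(101, "HW1", 100), (101, "HW3", 50), (101, "SW1", 150), (102, "HW2", 10),
     (103, "HW3", 50), (104, "HW2", 25), (104, "HW3", 100), (105, "SW1", 100)],
)

# Query 30: group functions over all rows selected by the WHERE clause.
q30 = conn.execute(
    "SELECT SUM(qty), AVG(qty) FROM ord_items WHERE item_no='SW1'").fetchone()

# Query 31: SUM acts on each group of rows sharing an item number.
q31 = conn.execute(
    "SELECT item_no, SUM(qty) FROM ord_items GROUP BY item_no ORDER BY item_no").fetchall()

# Query 32: HAVING filters the groups after aggregation.
q32 = conn.execute(
    "SELECT item_no, SUM(qty) FROM ord_items GROUP BY item_no "
    "HAVING SUM(qty) > 100 ORDER BY item_no").fetchall()

# Query 33: the HAVING condition may use a different group function.
q33 = conn.execute(
    "SELECT item_no, SUM(qty) FROM ord_items GROUP BY item_no "
    "HAVING COUNT(*) > 2 ORDER BY item_no").fetchall()

print(q30, q31, q32, q33)
```

HW1 has a total of exactly 100 and is therefore excluded by HAVING SUM(qty)>100, while only HW3 appears in more than two order lines.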

The INSERT statement

Inserts one or more tuples in a table.

General forms:
To insert a single tuple
    INSERT INTO <table-name> [<attribute (comma)list>]
    VALUES <value list>;
To insert multiple tuples
    INSERT INTO <table-name> [<attribute (comma)list>]
    SELECT [ ALL | DISTINCT ] <attribute (comma)list>
    FROM <table (comma)list>*
    [ WHERE <conditional expression>];
* - list of existing tables

Sample INSERT statements from the Case Example

Query 34: Insert all values for a new row
    INSERT INTO customers
    VALUES (006, 'Krishnan', 'Madras');
Inserts a single row in Customers Table. Attribute list need not be mentioned if values are given for all attributes in the tuple.

Query 35: Insert values of item# & descr columns for a new row
    INSERT INTO items (item#, descr)
    VALUES ('HW4', '132-DMPrinter');
Attribute list mentioned since values are not given for all attributes in the tuple. Here Price column for the newly inserted tuple takes NULL value.

Query 36: Inserts a new row which includes a date field
    INSERT INTO ord_aug
    VALUES (106, '31-AUG-94', 005);

Query 37: Inserts a new row with the date field being specified in non DD-MON-YY format
    INSERT INTO ord_aug
    VALUES (106, TO_DATE('310894','DDMMYY'), 005);

The UPDATE statement

Updates values of one or more attributes of one or more tuples in a table.

General form:
    UPDATE <table-name>
    SET <attribute-1 = value-1[, attribute-2 = value-2, ... attribute-n = value-n]>
    [ WHERE <conditional expression>];

Sample UPDATE statements from the Case Example

Query 38: Changes price of item SW1 to 6000
    UPDATE items
    SET price = 6000
    WHERE item# ='SW1';

Query 39: Changes a wrongly entered item# from HW2 to SW2
    UPDATE ord_items
    SET item# = 'SW2'
    WHERE ord#=104 AND item# = 'HW2';

The DELETE statement

Deletes one or more tuples in a table according to given conditions.

General form:
    DELETE FROM <table-name>
    [ WHERE <conditional expression>];

Sample DELETE statements from the Case Example

Query 40: Deletes Customer record with Customer Number 004
    DELETE FROM customers
    WHERE cust# = 004;

    DELETE FROM Ord_Items;
Deletes all rows in Ord_Items Table. The table remains empty after the DELETE operation.

4.3 DDL – CREATE, ALTER, and DROP statements.

DDL statements are those which are used to create, modify and drop the definitions or structures of various tables, views, indexes and other elements of the DBMS.

The CREATE TABLE statement

Creates a new table.

General form:
    CREATE TABLE <table-name>
    (<table-element (comma)list>*);
* - table element may be attribute with its data-type and size or any integrity constraint on attributes.

Some CREATE TABLE statements on the Case Example

Query:
    CREATE TABLE customers
    ( cust# NUMBER(6) NOT NULL,
      custname CHAR(30),
      city CHAR(20));
- This query creates a table CUSTOMERS with 3 fields - cust#, custname and city. Cust# cannot be null.

Query:
    CREATE TABLE ord_sep
    AS SELECT * from ord_aug;
- This query creates table ORD_SEP as a copy of ORD_AUG. It copies structure as well as data: the new table ord_sep has the same structure as ord_aug, and the data in ord_aug is copied to it.

Query:
    CREATE TABLE ord_sep
    AS SELECT * from ord_aug
    WHERE 1 = 2;
- This query creates table ORD_SEP as a copy of ORD_AUG, but does not copy any data as the WHERE clause is never satisfied. The new table ord_sep has the same structure as ord_aug. No data in ord_aug is copied to the new table since there is no row which satisfies the 'always false' condition 1 = 2.

The ALTER TABLE statement

Alters the structure of an existing table.

General form:
    ALTER TABLE <table-name>
    ADD | MODIFY (<table-element (comma)list>);

Examples of ALTER TABLE statement.

Query:
    ALTER TABLE customers
    MODIFY custname CHAR(35);
- This query changes the custname field to a character field of length 35. Modifies the data type/size of an attribute in the table. Used for modifying field lengths and attributes.
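The two CREATE TABLE ... AS SELECT variants above can be compared side by side in a minimal sqlite3 sketch. The table names ord_sep_full and ord_sep_empty, and the column names ord_no and cust_no, are hypothetical; two names are used only so that both variants can exist in the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ord_aug (ord_no INTEGER, orddate TEXT, cust_no TEXT)")
conn.executemany(
    "INSERT INTO ord_aug VALUES (?, ?, ?)",
    [(101, "02-AUG-94", "002"), (102, "11-AUG-94", "003")],
)

# Copies structure as well as data.
conn.execute("CREATE TABLE ord_sep_full AS SELECT * FROM ord_aug")

# Copies structure only: no row satisfies the always-false condition 1 = 2.
conn.execute("CREATE TABLE ord_sep_empty AS SELECT * FROM ord_aug WHERE 1 = 2")


def columns(table):
    """Column names of a table, read from SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def row_count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


print(columns("ord_sep_empty"), row_count("ord_sep_full"), row_count("ord_sep_empty"))
```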

Query:
    ALTER TABLE customers
    ADD (phone number(8),
         credit_rating char(1));
- This query adds two new fields - phone & credit_rating - to the customers table. Here, for existing tuples (if any), the new attributes will take NULL values since no DEFAULT value is mentioned for them.

The DROP TABLE statement

DROPS an existing table.

General form:
    DROP TABLE <table-name>;

Example:
Query:
    DROP TABLE ord_sep;
- The above query drops table ORD_SEP from the database.

Creating & Dropping Views

A view is a virtual relation created with attributes from one or more base tables.

Query:
    CREATE VIEW myview1
    AS SELECT ord#, orddate, ord_aug.cust#, custname
    FROM ord_aug, customers
    WHERE ord_aug.cust# = customers.cust#;
- This query defines a view consisting of ord#, orddate, cust# and custname using a join of ORD_AUG and CUSTOMERS tables.

    SELECT * FROM myview1;
at any given time will evaluate the view-defining query in the CREATE VIEW statement and display the result.

Query:
    CREATE VIEW myview2 (ItemNo, Quantity)
    AS SELECT item#, qty
    FROM ord_items;
- This query defines a view with columns item# and qty from the ORD_ITEMS table, and renames these columns as ItemNo and Quantity respectively.

Query:
    CREATE VIEW myview3
    AS SELECT item#, descr, price
    FROM items
    WHERE price < 1000
    WITH CHECK OPTION;
- This query defines the view as defined. WITH CHECK OPTION in a CREATE VIEW statement indicates that INSERTs or UPDATEs on the view will be rejected if they violate any integrity constraint implied by the view-defining query. It ensures that if this view is used for updation, the updated values do not cause the row to fall outside the view.

To drop a view
Query:
    DROP VIEW myview1;
- This query drops the view MYVIEW1.

Creating & Dropping Indexes

Query:
    CREATE INDEX i_city
    ON customers (city);
Creates a new index named i_city. The new index file (table) will have the values of the city column of the Customers table.

Query:
    CREATE UNIQUE INDEX i_custname
    ON customers (custname);
Creates an index which allows only unique values for custnames.

Query:
    CREATE INDEX i_city_custname
    ON customers (city, custname);
Creates an index based on two fields : city and custname.

Query:
    DROP INDEX i_city;
Drops index i_city.

4.4 DCL – GRANT and REVOKE statements.

DCL statements are those which are used to control access permissions on the tables, indexes, views and other elements of the DBMS.

Granting & Revoking Privileges

Query:
    GRANT ALL
    ON customers
    TO ashraf;
Grants all permissions on the table customers to the user who logs in as 'ashraf'.

Query:
    GRANT SELECT
    ON customers
    TO sunil;
Grants SELECT permission on the table customers to the user 'sunil'. User 'sunil' does not have permission to insert, update, delete or perform any other operation on customers table.

Query:
    GRANT SELECT
    ON customers
    TO sunil
    WITH GRANT OPTION;
Enables user 'sunil' to give SELECT permission on customers table to other users.

Query:
    REVOKE DELETE
    ON customers
    FROM ashraf;
Takes away DELETE permission on customers table from user 'ashraf'.

5. Recovery and Concurrency

Recovery and Concurrency in a DBMS are part of the general topic of transaction management. Hence we shall begin the discussion by examining the fundamental notion of a transaction.

5.1 Transaction

A transaction is a logical unit of work. Consider the following example: The procedure for transferring an amount of Rs. 100/- from the account of one customer to another is given.

            EXEC SQL WHENEVER SQLERROR GOTO UNDO
            EXEC SQL UPDATE DEPOSIT
                     SET BALANCE=BALANCE-100
                     WHERE CUSTID=from_cust;
            EXEC SQL UPDATE DEPOSIT
                     SET BALANCE=BALANCE+100
                     WHERE CUSTID=to_cust;
            EXEC SQL COMMIT;
            GOTO FINISH
    UNDO:   EXEC SQL ROLLBACK;
    FINISH: RETURN;

Here, it has to be noted that the single operation "amount transfer" involves two database updates – updating the record of from_cust and updating the record of to_cust. In between these two updates the database is in an inconsistent (or incorrect in this example) state, i.e., if only one of the updates is performed, one cannot say by seeing the database contents whether the amount transfer operation has been done or not. Hence to guarantee database consistency it has to be ensured that either both updates are performed or none are performed. If, after one update and before the next update, something goes wrong due to problems like a system crash, an overflow error, or a violation of an integrity constraint etc., then the first update needs to be undone.

This is true with all transactions. Any transaction takes the database from one consistent state to another. It need not necessarily preserve consistency of database at all intermediate points. Hence it is important to ensure that either a transaction executes in its entirety or is totally cancelled. The set of programs which handles this forms the transaction manager in the DBMS.
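The all-or-nothing behaviour of the amount transfer can be sketched with Python's sqlite3 module. This is an illustration, not the embedded SQL of the notes: the CHECK constraint, the customer IDs C1 and C2 and their balances are hypothetical, and the credit is deliberately done before the debit so that a failing debit leaves a first update that has to be undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rule for the sketch: a balance may never go negative.
conn.execute(
    "CREATE TABLE deposit (custid TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO deposit VALUES (?, ?)", [("C1", 150), ("C2", 50)])
conn.commit()


def transfer(from_cust, to_cust, amount):
    """Perform both updates or neither. Returns True if the transfer committed."""
    try:
        # First update: credit the receiving customer.
        conn.execute(
            "UPDATE deposit SET balance = balance + ? WHERE custid = ?",
            (amount, to_cust),
        )
        # Second update: debit the paying customer; may violate the CHECK rule.
        conn.execute(
            "UPDATE deposit SET balance = balance - ? WHERE custid = ?",
            (amount, from_cust),
        )
        conn.commit()          # both updates become permanent together
        return True
    except sqlite3.IntegrityError:
        conn.rollback()        # undo the first update as well
        return False


def balances():
    return dict(conn.execute("SELECT custid, balance FROM deposit"))
```

Calling transfer("C1", "C2", 100) once succeeds; calling it a second time fails on the debit, and the rollback also removes the credit already applied to C2, so the balances stay as they were after the first call.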

The transaction manager uses COMMIT and ROLLBACK operations for ensuring atomicity of transactions.

COMMIT – The COMMIT operation indicates successful completion of a transaction which means that the database is in a consistent state and all updates made by the transaction can now be made permanent. If a transaction successfully commits, then the system will guarantee that its updates will be permanently installed in the database even if the system crashes immediately after the COMMIT.

ROLLBACK – The ROLLBACK operation indicates that the transaction has been unsuccessful which means that all updates done by the transaction till then need to be undone to bring the database back to a consistent state. To help undoing the updates once done, a system log or journal is maintained by the transaction manager. The before- and after-images of the updated tuples are recorded in the log.

The properties of transaction can be summarised as ACID properties - ACID standing for atomicity, consistency, isolation and durability.

Atomicity: A transaction is atomic. Either all operations in the transaction have to be performed or none should be performed.
Consistency: Transactions preserve database consistency, i.e., a transaction transforms a consistent state of the database into another without necessarily preserving consistency at all intermediate points.
Isolation: Transactions are isolated from one another, i.e., a transaction's updates are concealed from all others until it commits (or rolls back).
Durability: Once a transaction commits, its updates survive in the database even if there is a subsequent system crash.

5.2 Recovery from System Failures

System failures (also called soft crashes) are those failures like power outage which affect all transactions in progress, but do not physically damage the database.

During a system failure, the contents of the main memory are lost. Thus the contents of the database buffers which contain the updates of transactions are lost. (Note: Transactions do not directly write on to the database. The updates are written to database buffers and, at regular intervals, transferred to the database.) At restart, the system has to ensure that the ACID properties of transactions are maintained and the database remains in a consistent state. To attain this, the strategy to be followed for recovery at restart is as follows:
• Transactions which were in progress at the time of failure have to be undone at the time of restart. This is needed because the precise state of such a transaction which was active at the time of failure is no longer known and hence cannot be successfully completed.
• Transactions which had completed prior to the crash but could not get all their updates transferred from the database buffers to the physical database have to be redone at the time of restart.

This recovery procedure is carried out with the help of
• An online logfile or journal – The logfile maintains the before- and after-images of the tuples updated during a transaction. This helps in carrying out the UNDO and REDO operations as required. Typical entries made in the logfile are :
    • Start of Transaction Marker
    • Transaction Identifier
    • Record Identifier
    • Operations Performed
    • Previous Values of Modified Data (Before-image or Undo Log)
    • Updated Values of Modified Records (After-image or Redo Log)
    • Commit / Rollback Transaction Marker
• Taking a checkpoint at specific intervals – This involves the following two operations:
    a) physically writing the contents of the database buffers out to the physical database. Thus during a checkpoint the updates of all transactions, including both active and committed transactions, will be written to the physical database.
    b) physically writing a special checkpoint record to the physical log. The checkpoint record has a list of all active transactions at the time of taking the checkpoint.

5.3 Recovery : An Example

At the time of restart, T3 and T5 must be undone and T2 and T4 must be redone. T1 does not enter the recovery procedure at all since its updates were all written to the database at time tc as part of the checkpoint process.
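The undo/redo decision in this example can be sketched as a small classification over the checkpoint time tc and the failure time tf. The start and commit times below are hypothetical, chosen only so that they agree with the outcome stated above (T1 untouched, T2 and T4 redone, T3 and T5 undone).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Txn:
    name: str
    start: int
    commit: Optional[int]  # None means the transaction was still active at the failure


def classify(txns: List[Txn], tc: int, tf: int) -> Tuple[List[str], List[str]]:
    """Return (undo, redo) lists for checkpoint time tc and failure time tf."""
    undo, redo = [], []
    for t in txns:
        if t.commit is not None and t.commit <= tc:
            continue                 # committed before the checkpoint: already on disk
        if t.commit is not None and t.commit < tf:
            redo.append(t.name)      # committed after the checkpoint, before the crash
        else:
            undo.append(t.name)      # in progress when the system failed
    return undo, redo


TC, TF = 10, 20  # hypothetical checkpoint and failure times
transactions = [
    Txn("T1", start=1, commit=5),
    Txn("T2", start=6, commit=14),
    Txn("T3", start=8, commit=None),
    Txn("T4", start=12, commit=17),
    Txn("T5", start=15, commit=None),
]
undo, redo = classify(transactions, TC, TF)
print("UNDO:", undo, "REDO:", redo)
```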

5.4 Concurrency

Concurrency refers to multiple transactions accessing the same database at the same time. In a system which allows concurrency, some kind of control mechanism has to be in place to ensure that concurrent transactions do not interfere with each other. Three typical problems which can occur due to concurrency are explained here.

a) Lost Update Problem

(To understand this situation, assume that
• there is a record R, with a field, say Amt, having value 1000 before time t1.
    o Both transactions A & B fetch this value at t1 and t2 respectively.
    o Transaction A updates the Amt field in R to 800 at time t3.
    o Transaction B updates the Amt field in R to 1200 at time t4.
Thus after time t4, the Amt value in record R has value 1200. Update by Transaction A at time t3 is over-written by the Transaction B at time t4.)

b) Uncommitted Dependency Problem

(To understand this situation, assume that
• there is a record R, with a field, say Amt, having value 1000 before time t1.
    o Transaction B fetches this value and updates it to 800 at time t1.
    o Transaction A fetches R with Amt field value 800 at time t2.
    o Transaction B rolls back and its update is undone at time t3. The Amt field takes the initial value 1000 during rollback.
Transaction A continues processing with Amt field value 800 without knowing about B's rollback.)

c) Inconsistent Analysis Problem
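One common way to picture the inconsistent analysis problem is a transaction that adds up several balances while another transaction moves money between two of them. The sketch below replays such an interleaving step by step in plain Python; the three account names and their balances are hypothetical and are not taken from the notes.

```python
# Hypothetical account balances, used only for this illustration.
accounts = {"ACC1": 40, "ACC2": 50, "ACC3": 30}
true_total = sum(accounts.values())

# Transaction A computes the total by reading one account at a time.
running_total = 0
running_total += accounts["ACC1"]   # A reads ACC1
running_total += accounts["ACC2"]   # A reads ACC2

# Transaction B transfers 10 from ACC3 to ACC1 and commits in between.
accounts["ACC3"] -= 10
accounts["ACC1"] += 10

running_total += accounts["ACC3"]   # A reads ACC3 after B's update

print("total seen by A:", running_total)
print("total before B:", true_total, "total after B:", sum(accounts.values()))
```

The database holds the same total before and after B's transfer, yet A reports a smaller figure because it read ACC1 before the transfer and ACC3 after it.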

5.5 Locking

Locking: A solution to problems arising due to concurrency.

Locking of records can be used as a concurrency control technique to prevent the above mentioned problems. A transaction acquires a lock on a record if it does not want the record values to be changed by some other transaction during a period of time. The transaction releases the lock after this time.

Locks are of two types
1. shared (S lock), and
2. exclusive (X lock).

• A transaction acquires a shared (read) lock on a record when it wishes to retrieve or fetch the record.
• An exclusive (write) lock is acquired on a record when a transaction wishes to update the record. (Here update means INSERT, UPDATE or DELETE.)

The following figure shows the Lock Compatibility matrix.
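The compatibility of shared and exclusive locks can also be written down as a small lookup table. The sketch below assumes the usual rule for S and X locks, namely that only two shared locks are compatible with each other; the function name can_grant is hypothetical.

```python
# (lock held by another transaction, lock requested) -> compatible?
# Assumed rule: only shared locks can coexist on the same record.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """A request is granted only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


print(can_grant("S", ["S", "S"]))  # readers can share a record
print(can_grant("X", ["S"]))       # a writer must wait for readers
```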

Normally, locks are implicit. A FETCH request is an implicit request for a shared lock whereas an UPDATE request is an implicit request for an exclusive lock. Explicit lock requests need to be issued if a different kind of lock is required during an operation. For example, if an X lock is to be acquired before a FETCH it has to be explicitly requested for.

5.6 Deadlocks

Locking can be used to solve the problems of concurrency. However, locking can also introduce the problem of deadlock as shown in the example below.

Deadlock is a situation in which two or more transactions are in a simultaneous wait state, each of them waiting for one of the others to release a lock before it can proceed.

If a deadlock occurs, the system may detect it and break it. Detecting involves detecting a cycle in the "Wait-For Graph" (a graph which shows 'who is waiting for whom').
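Cycle detection in a wait-for graph can be sketched with a depth-first search. The representation below (a dictionary mapping each transaction to the transactions it is waiting for) and the function name find_deadlock are assumptions made for illustration; a real DBMS may detect deadlocks differently.

```python
def find_deadlock(wait_for):
    """Return the transactions forming a cycle in the wait-for graph, or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {t: WHITE for t in wait_for}
    path = []

    def visit(t):
        colour[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            state = colour.get(u, WHITE)
            if state == GREY:             # u is already on the path: a cycle
                return path[path.index(u):]
            if state == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        colour[t] = BLACK
        return None

    for t in list(wait_for):
        if colour[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


# A waits for B and B waits for A: a deadlock between the two.
print(find_deadlock({"A": ["B"], "B": ["A"]}))
# A waits for B, B waits for C, C waits for nobody: no deadlock.
print(find_deadlock({"A": ["B"], "B": ["C"], "C": []}))
```

Any transaction in the returned cycle is a candidate victim for rollback.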

Breaking a deadlock implies choosing one of the deadlocked transactions as the victim and rolling it back, thereby releasing all its locks. This may allow some other transaction(s) to proceed.

Deadlock prevention can be done by not allowing any cyclic-waits.

6. Query Optimization

6.1 Overview

When compared to other database systems, query optimization is a strength of the relational systems. It can be said so since relational systems by themselves do optimization to a large extent unlike the other systems which leave optimization to the programmer. Automatic optimization done by the relational systems will be much more efficient than manual optimization due to several reasons like :
• uniformity in optimization across programs irrespective of the programmer's expertise in optimizing the programs.
• system's ability to make use of the knowledge of internal conditions (eg: volume of data at the time of querying) for optimization. For the same query, such conditions may be different at different times of querying. (In a manual system, this knowledge can be utilised only if the query is re-written each time, which is not practically possible.)
• system's ability to evaluate large number of alternatives to find the most efficient query evaluation method.

In this chapter we shall look into the process of automatic query optimization done by the relational systems.

6.2 An Example of Query Optimization

Let us look at a query being evaluated in two different ways to see the dramatic effect of query optimization. Consider the following query.

    Select ORDDATE, ITEM#, QTY
    from ORDTBL, ORD_ITEMS
    where ORDTBL.ORD# = ORD_ITEMS.ORD#
    and ITEM# = 'HW3';

Assumptions:
• There are 100 records in ORDTBL
• There are 10,000 records in ORD_ITEMS
• There are 50 order items with item# 'HW3'

Query Evaluation – Method 1

T1 = ORDTBL X ORD_ITEMS
(Perform the Product operation as the first step towards joining the two tables)
- 10000 X 100 tuple reads (1000000 tuple reads -> generates 1000000 tuples as intermediate result)
- 1000000 tuples written to disk (Assuming that 1000000 tuples in the intermediate result cannot be held in the memory. 1000000 tuple writes to a temporary space in the disk.)

T2 = Select [ORDTBL.ORD# = ORD_ITEMS.ORD# & ITEM# = 'HW3'] (T1)
(Apply the two conditions in the query on the intermediate result obtained after the first step)
- 1000000 tuples read into memory (1000000 tuple reads)
- 50 selected (those tuples satisfying both the conditions. 50 held in the memory itself)

T3 = Project [ORDDATE, ITEM#, QTY] (T2)
(Projection performed as the final step. No more tuple i/o s)
- 50 tuples (final result)

Total no. of tuple i/o s = 1000000 reads + 1000000 writes + 1000000 reads = 3000000 tuple i/o s

Query Evaluation – Method 2

T1 = Select [ITEM# = 'HW3'] (ORD_ITEMS)
(Perform the Select operation on ORD_ITEMS as the first step)
- 10000 tuple reads (10000 tuple reads from ORD_ITEMS)
- 50 tuples selected, no disk writes (50 tuples satisfy the condition in Select. No disk writes assuming that the 50 tuples forming the intermediate result can be held in the memory)

T2 = ORDTBL JOIN T1
- 100 tuple reads (100 tuple reads from ORDTBL)
- resulting relation with 50 tuples

T3 = Project [ORDDATE, ITEM#, QTY] (T2)
(Projection performed as the final step. No more tuple i/o s)
- 50 tuples (final result)

Total no. of tuple i/o s = 10000 reads + 100 reads = 10100 tuple i/o's

Comparison of the two Query Evaluation Methods

10,100 tuple I/O's (of Method 2) v/s 3,000,000 tuple I/O's (of Method 1) !
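The tuple I/O counts of the two methods follow directly from the three assumptions, and the arithmetic can be written out as a short sketch. The function and constant names are hypothetical; the counting rules are the ones used in the walkthrough above.

```python
ORDTBL_ROWS = 100        # records in ORDTBL
ORD_ITEMS_ROWS = 10_000  # records in ORD_ITEMS
HW3_ROWS = 50            # order items with item# 'HW3'


def method1_io():
    """Product first, then Select, then Project."""
    product = ORDTBL_ROWS * ORD_ITEMS_ROWS   # tuples generated by the product
    reads_for_product = product
    writes_of_intermediate = product         # intermediate result spilled to disk
    reads_for_select = product               # intermediate result read back
    return reads_for_product + writes_of_intermediate + reads_for_select


def method2_io():
    """Select on ORD_ITEMS first, then Join with ORDTBL, then Project."""
    reads_for_select = ORD_ITEMS_ROWS        # the 50 selected tuples stay in memory
    reads_for_join = ORDTBL_ROWS
    return reads_for_select + reads_for_join


print(method1_io(), method2_io(), round(method1_io() / method2_io()))
```

Under these assumptions Method 1 needs roughly 300 times as many tuple I/Os as Method 2.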

Thus by sequencing the operations differently a dramatic difference can be made in the performance of queries. Here it needs to be noted that in the Method 2 of evaluation, the first operation to be performed was a 'Select' which filters out 50 tuples from the 10,000 tuples in the ORD_ITEMS table. Thus this operation causes elimination of 9950 tuples. Thus elimination in the initial steps would help optimization.

Some more examples:

1.  select CITY, COUNT(*) from CUSTTBL
    group by CITY
    having CITY != 'BOMBAY';
        v/s
    select CITY, COUNT(*) from CUSTTBL
    where CITY != 'BOMBAY'
    group by CITY;

2.  select * from ORDTBL
    where to_char(ORDDATE,'dd-mm-yy') = '11-08-94';
        v/s
    select * from ORDTBL
    where ORDDATE = to_date('11-08-94', 'dd-mm-yy');

Here the second version is faster. In the first form of the query, a function to_char is applied on an attribute and hence needs to be evaluated for each tuple in the table. The time for this evaluation will be thus proportional to the cardinality of the relation. In the second form, a function to_date is applied on a constant and hence needs to be evaluated just once, irrespective of the cardinality of the relation. Moreover, if the attribute ORDDATE is indexed, the index will not be used in the first case, since the attribute appears in an expression and its value is not directly used.

6.3 The Query Optimization Process

The steps of query optimization are explained below.

a) Cast into some Internal Representation – This step involves representing each SQL query into some internal representation which is more suitable for machine manipulation. The internal form typically chosen is a query tree as shown below.

Query Tree for the SELECT statement discussed above:
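One plausible shape for such a query tree, for the ORDTBL / ORD_ITEMS query of section 6.2, is a projection over a restriction over a join. The sketch below builds that tree from small Python classes and prints it with indentation; the class names and the exact arrangement of nodes are assumptions, not a reproduction of the original figure.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"
    condition: str


@dataclass(frozen=True)
class Restrict:
    child: "Node"
    condition: str


@dataclass(frozen=True)
class Project:
    child: "Node"
    attributes: Tuple[str, ...]


Node = Union[Relation, Join, Restrict, Project]


def render(node, depth=0):
    """Return the tree as a list of indented lines, root first."""
    pad = "  " * depth
    if isinstance(node, Relation):
        return [pad + node.name]
    if isinstance(node, Join):
        return ([pad + f"JOIN [{node.condition}]"]
                + render(node.left, depth + 1) + render(node.right, depth + 1))
    if isinstance(node, Restrict):
        return [pad + f"RESTRICT [{node.condition}]"] + render(node.child, depth + 1)
    return [pad + "PROJECT [" + ", ".join(node.attributes) + "]"] + render(node.child, depth + 1)


tree = Project(
    Restrict(
        Join(Relation("ORDTBL"), Relation("ORD_ITEMS"), "ORDTBL.ORD# = ORD_ITEMS.ORD#"),
        "ITEM# = 'HW3'",
    ),
    ("ORDDATE", "ITEM#", "QTY"),
)
lines = render(tree)
print("\n".join(lines))
```

A tree of this kind is what the later steps rearrange, for example by moving the restriction below the join.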

b) Convert to Canonical Form – In this second step, the optimizer makes use of some transformation laws or rules for sequencing the internal operations involved. Some examples are given below.
(Note: In all these examples the second form will be more efficient irrespective of the actual data values and physical access paths that exist in the stored database.)

Rule 1: (A JOIN B) WHERE restriction_A AND restriction_B
        -> (A WHERE restriction_A) JOIN (B WHERE restriction_B)
Restrictions when applied first, cause eliminations and hence better performance.

Rule 2: (A WHERE restriction_1) WHERE restriction_2
        -> A WHERE restriction_1 AND restriction_2
Two restrictions applied as a single compound one instead of applying the two individual restrictions separately.

Rule 3: (A[projection_1])[projection_2]
        -> A[projection_2]
If there is a sequence of successive projections applied on the same relation, all but the last one can be ignored, i.e., the entire operation is equivalent to applying the last projection alone.

Rule 4: (A[projection]) WHERE restriction
        -> (A WHERE restriction)[projection]
Restrictions when applied first, cause eliminations and hence better performance.

Reference [1] gives more such general transformation laws.

c) Choose Candidate Low-level Procedures – In this step, the optimizer decides how to execute the transformed query. At this stage factors such as existence of indexes or other access paths, physical clustering of records, distribution of data values etc. are considered.
The basic strategy here is to consider the query expression as a set of low-level implementation procedures predefined for each operation. For eg., there will be a set of procedures for implementing the restriction operation: one (say, procedure 'a') for the case where the restriction attribute is indexed, one (say, procedure 'b') where the restriction attribute is hashed and so on.
Each such procedure has an associated cost measure indicating the cost, typically in terms of disk I/Os. The optimizer chooses one or more candidate procedures for each low-level operation in the query. The information about the current state of the database (existence of indexes, current cardinalities etc.) which is available from the system catalog will be used to make this choice of candidate procedures.

d) Generate Query Plans and Choose the Cheapest – In this last step, query plans are generated by combining a set of candidate implementation procedures. This can be explained with the following example (a trivial one but illustrative enough). Assume that there is a query expression comprising a restriction, a join and a projection. Some examples of implementation procedures available for each of these operations can be assumed as given in the table below.

Operation     Condition Existing                                    Implementation Procedure
Restriction   Restriction attribute is indexed                      a
Restriction   Restriction attribute is hashed                       b
Restriction   Restriction attribute is neither indexed nor hashed   c
Join                                                                d
Join                                                                e
Projection                                                          f
Projection                                                          g
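The combination of candidate procedures into query plans can be sketched with itertools.product, assuming procedures a, b and c for the restriction, d and e for the join, and f and g for the projection. The per-procedure costs below are purely hypothetical numbers added to show how the cheapest plan would be picked; the notes give no cost figures.

```python
from itertools import product

# Candidate implementation procedures per operation (labels as in the example).
PROCEDURES = {
    "restriction": ["a", "b", "c"],
    "join": ["d", "e"],
    "projection": ["f", "g"],
}

# Hypothetical cost of each procedure, e.g. in disk I/Os.
COST = {"a": 10, "b": 15, "c": 100, "d": 40, "e": 60, "f": 5, "g": 8}


def query_plans(procedures):
    """Every combination of one procedure per operation, as strings like 'adf'."""
    return ["".join(combo) for combo in product(
        procedures["restriction"], procedures["join"], procedures["projection"])]


def plan_cost(plan):
    return sum(COST[p] for p in plan)


all_plans = query_plans(PROCEDURES)

# Heuristic reduction: if the restriction attribute is known to be neither
# indexed nor hashed, only procedure 'c' is applicable for the restriction.
reduced_plans = query_plans({**PROCEDURES, "restriction": ["c"]})
cheapest = min(reduced_plans, key=plan_cost)

print(all_plans)
print(reduced_plans, cheapest)
```

Even this tiny example yields 12 plans, which hints at why the search space has to be pruned before costing.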

Considering the above example.4 Query Optimization in Oracle Some of the query optimization measures used in Oracle are the following: –Indexes unnecessary for small tables... Hence indexes will not make much difference in the performance of queries. Thus the query plans can be – adf ... 6. the number of such query plans possible can be too many and hence generating all such plans and then choosing the cheapest will be expensive by itself. i. one such heuristic method can be as follows: If the system knows that the restriction attribute is neither indexed nor hashed. It has to be noted that in reality. –Indexes/clusters when retrieving less than 25% of rows. Hence a heuristic reduction of search space rather than exhaustive search needs to be done..adg – aef – aeg – bdf .Restriction Restriction Join Join Projection Projection Restriction attribute is hashed Restriction attribute is neither indexed nor hashed b c d e f g Now the various query plans for the original query expression can be generated by making permutations of implementation procedures available for different operations.e. –Multiple column WHERE clauses –evaluations causing largest number of eliminations performed first . The overhead of searching in the index file will be more when retrieving more rows. . then the query plans involving implementation procedure 'c ' alone (and not 'a' and 'b') need to be considered and the cheapest plan can be chosen from the reduced set of query plans. the search time in the index table and the data table will be comparable. if the size of the actual data record is not much larger than the index record.

• JOIN columns should be indexed. JOIN columns or Foreign Key columns may be indexed since queries based on these columns can be expected to be very frequent.
• Index not used in queries containing NULL / NOT NULL. Index tables will not have NULL / NOT NULL entries. Hence there is no need to search for these in the index table.

Suggested References:
1. Date C. J., "An Introduction to Database Systems", 7th edition, Addison-Wesley, 2000.
2. Korth H. F. & Silberschatz A., "Database System Concepts", 2nd edition, McGraw-Hill (Year).
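The plan generation and heuristic reduction described in step d) of the query optimization process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the procedure labels 'a' to 'g' come from the table in that step, but the cost figures, the additive cost model and the names `CANDIDATES`, `COST`, `generate_plans` and `plan_cost` are hypothetical and are not taken from any real optimizer.

```python
from itertools import product

# One list of candidate procedures per operation, labelled as in the table.
CANDIDATES = {
    "restriction": ["a", "b", "c"],
    "join": ["d", "e"],
    "projection": ["f", "g"],
}

# Hypothetical cost of each procedure in disk I/Os (invented for illustration).
COST = {"a": 10, "b": 12, "c": 100, "d": 50, "e": 40, "f": 5, "g": 8}


def generate_plans(candidates):
    """Build every plan by picking one procedure for each operation."""
    return ["".join(choice) for choice in product(*candidates.values())]


def plan_cost(plan):
    """Assumed cost model: a plan costs the sum of its procedures' costs."""
    return sum(COST[procedure] for procedure in plan)


# Exhaustive search space: every combination of candidate procedures.
all_plans = generate_plans(CANDIDATES)

# Heuristic reduction: the catalog says the restriction attribute is neither
# indexed nor hashed, so only procedure 'c' is kept for the restriction.
reduced_plans = generate_plans({**CANDIDATES, "restriction": ["c"]})

print(len(all_plans), all_plans[:5])
print(reduced_plans, min(reduced_plans, key=plan_cost))
```

Even in this trivial case the heuristic cuts the number of plans to be costed from twelve to four; with realistic queries the saving is what makes plan selection affordable.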

Object Oriented Concepts

1. Introduction
Improvement of programmer productivity has been a primary preoccupation of the software industry since the early days of computers. Hence it is no surprise that it continues to be a dominant theme even today. However, it should be noted that concerns arising out of this preoccupation have changed with times. As software engineering and software technology evolved, the concerns have changed from productivity during software development to reliability of software. Hence the emphasis on extensibility and reusability of software modules. Its perceived benefits are:
• reduction in the effort needed to ensure reliability of software, and
• improved productivity of the software development process

Software being inherently complex, the time tested technique of decomposition, also known as "DIVIDE and CONQUER", has been applied from the early days of software development. Two types of decomposition have been used, namely, algorithmic decomposition and object-oriented decomposition.

The algorithmic decomposition views software as
Program={Data Structures}+{Operations}
The program is viewed as a series of tasks to be carried out to solve the problem.

The object-oriented decomposition views it as
Object={Data Structures}+{Operations}
Program={Objects}
The program is viewed as a set of objects, which cooperate with each other to solve the problem.

For example, consider a typical banking system. Algorithmic decomposition views it as a series of tasks, not unlike the following:
• Open an account
• Deposit money
• Withdraw money
• Transfer money between accounts
• Close an account
• ...etc.

The object-oriented decomposition views it as a set of objects, like the following:
• Account object
  o account number (data)
  o current balance (data)
  o get account number (operation)
  o update balance (operation)
• Account holder object
  o name (data)
  o address (data)
  o deposit money (operation)
  o withdraw money (operation)
• ...etc.
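The object-oriented view of the banking example can be sketched as follows. The data items and operations mirror the two objects listed above; the class names `Account` and `AccountHolder`, the rule that a balance may not go negative, and the sample values are assumptions made only for this illustration.

```python
class Account:
    """Account object: account number and current balance plus their operations."""

    def __init__(self, account_number, current_balance=0):
        self.account_number = account_number      # data
        self.current_balance = current_balance    # data

    def get_account_number(self):                 # operation
        return self.account_number

    def update_balance(self, amount):             # operation
        # Assumed rule for this sketch: the balance may not become negative.
        if self.current_balance + amount < 0:
            raise ValueError("insufficient funds")
        self.current_balance += amount


class AccountHolder:
    """Account holder object: cooperates with an Account to move money."""

    def __init__(self, name, address, account):
        self.name = name                          # data
        self.address = address                    # data
        self.account = account

    def deposit_money(self, amount):              # operation
        self.account.update_balance(amount)

    def withdraw_money(self, amount):             # operation
        self.account.update_balance(-amount)


# The program is a set of objects that cooperate to solve the problem.
account = Account("SB-001")
holder = AccountHolder("A. Customer", "12 Main Street", account)
holder.deposit_money(100)
holder.withdraw_money(30)
print(account.get_account_number(), account.current_balance)
```

In the algorithmic view the same behaviour would be written as free-standing "deposit" and "withdraw" tasks operating on shared data structures; here each operation lives with the data it manipulates.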

2. How object oriented decomposition helps software productivity
• Data and operations are packaged together into objects. This facilitates independent and parallel development of code.
• Because of packaging, most updates are localised. This makes it easier to maintain OO systems.
• It allows reuse and extension of behaviour.
• It permits iterative development of large systems.

Basically, these gains arise from the dominant themes in the OO paradigm. They are:

Data Abstraction: This provides a means by which a designer or programmer can focus on the essential features of her data. It avoids unnecessary attention being paid to implementation of data.

Encapsulation: This helps in controlling the visibility of internal details of the objects. It improves security and integrity of data.

Hierarchy: It is the basic means to provide extensibility of software modules, and helps in increasing the reuse of modules.

Polymorphism: It is the means by which an operation behaves differently in different contexts.

These themes are discussed further below against the backdrop of significant periods in the history of software engineering methodologies.

3. Historical Perspective
After early experience in software development, the software field hoped that structured programming ideas would provide solutions to its problems. Structured programming is a style of programming usually associated with languages such as C, Fortran, Pascal and so on. Using structured programming techniques, a problem is often solved using a divide and conquer approach. An initially large problem is broken into several smaller subproblems. Each of these is then progressively broken into even smaller sub-problems, until the level of difficulty is considered to be manageable. At the lowest level, a solution is implemented in terms of data structures and procedures. This approach is often used with imperative programming languages that are not object-oriented languages, i.e. the data structures and procedures are not implemented as classes.

It was looked upon as a technique to make programs easier to understand, modify, and extend. It was felt that use of the most appropriate control structures and data structures would make a program easy to understand. It was realised that branching (use of goto's) makes a program unstructured. It was felt that a program should be modular in nature. The language PL/I was designed around this time. It is a language rich in data structures and control structures and promotes modularity.

However, ideas of structured programming were devoid of a methodology. Given a program one could apply the definition of structuredness and say whether it was a structured program or not. But one did not know how to design a structured program. Similarly, guidelines for developing good modular structure for a program were conspicuously absent. In the absence of a methodology, a novice could obtain absurd results starting from a problem specification.

For example, the rule that each module should be about 50 lines of code can be used to decide that each segment of 50 lines in a program should become a module.

So a methodology for program design was necessary. It should
• make the task of program design simple and systematic,
• provide effective means to control complexity of program design
• provide easy means to help a user design his data

The PASCAL language was designed with a rich control structure, and the means to support user defined data. The designer of PASCAL, Niklaus Wirth, also popularised the methodology of stepwise refinement. It can be summarised as follows:
1. Write a clear statement of the problem in simple English. Avoid going to great depth. Also focus on what is to be done, not on how it is to be done.
2. Identify the fundamental data structures involved in the problem. This data belongs to the problem domain.
3. Identify the fundamental operations which, when performed on the data defined in the above step, would implement the problem specification. These operations are also domain specific.
4. Design the data identified in step 2 above.
5. Design the operations identified in step 3 above. If any of these operations are composed of other operations, apply steps 1-4 to each such operation.

Application of this methodology in a recursive manner leads to a hierarchical program structure. An operation being defined becomes a 'problem' to be solved at a lower level, leading to the identification of its own data and sub-operations.

When a clear and unambiguous statement of the problem is written in simple English in the above stated manner
• the data gets identified in the abstract form. Only the features of the problem get highlighted. Other features are suppressed.
• similarly, the essential features of an operation get highlighted
• such data and operations are said to be 'domain-specific'. They are directly meaningful in the problem domain
• use of domain-specific data and operations simplifies program design. It is more natural than use of data or operations which have more or less generality than needed. It is also less prone to design or programming errors.

Around this time it was also felt that having to use a set of predefined data types of the language was very constraining, and hence prone to design errors/program bugs. PASCAL permitted the programmer to define her own data types. These could be
• Simple data, viz. enumerated data types
• Structured data, viz. array and record types

Operations on user defined data were coded as procedures and functions. Thus, a user could define the set of values a data item could take, and could also define how the values were to be manipulated. The point to be noted here is that permitting a user to define her own data permits her to use the most appropriate data for a problem.

For example, an inventory control program could start like this:

    type code = (issue, receipt);
    var trans_code : code;

    begin
      ...
      if trans_code = issue then ...
      ...
    end.

Advantage of this is that programming is now more natural, and hence less prone to errors.

PASCAL was also the first widely used language to emphasize the importance of compile-time validation of a program. It implies the following:
• Syntax errors should be detected by a compiler. Most compilers do this.
• Violations of language semantics should also be detected by a compiler.
• The semantic checks should also apply to user-defined data.

Thus, invalid use of data should be detected by the compiler. This will eliminate the debugging effort which would otherwise be required to detect and correct the error.

However, PASCAL does not fully succeed in implementing compile-time validation. PASCAL lacks features to define legal operations on user defined data and to ensure that only such operations are used on the data. This endangers data consistency. For example:

    type complex_conjugate = record
      real1 : real;
      imag1 : real;
      real2 : real;
      imag2 : real;
    end;
    begin
      ...
      real1 := real1+3; {this violates the idea of complex conjugates}

The user defined data is like any other data and hence its components can be used in any manner consistent with the type of the components. Complex conjugates can only be used in some specific manner, but this is not known to the compiler. It will allow the component real1 to be used as a real variable. However, stressing the importance of compile-time validation is one of PASCAL's achievements.

SIMULA was the first language which introduced the concept of classes. In fact, it is the first language which introduced many of the object oriented concepts which are taken for granted today. A class in SIMULA is a linguistic unit which contains:
• Definition of the legal values that data of the class can assume.
• Definition of the operations that can be performed on the data.

A user can create variables of a class in her program, and can use the operations defined in the class on these variables. This way the compiler knows what operations are legal on the data. Attempts to use any other operations will lead to compile-time errors. This takes compile-time validation of a program one step further. However, it did not provide encapsulation.

Encapsulation implies 'sealing' of the internal details of data and operations. Use of encapsulation in the class promotes information hiding. Only those data and operations which are explicitly declared as visible to the outside world will be visible, and the rest will be hidden. Encapsulation implies compile-time validation. This would ensure that a program cannot perform any invalid operation on the data. Thus, illegal access to the imaginary part of a complex number can be prevented.

(At this point you are encouraged to visit these links: [1], [2].)
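The complex-conjugate example can be revisited with a class that seals its components and exposes only operations that keep the pair consistent. This is a rough sketch, not a rendering of the SIMULA or PASCAL code: the class name `ComplexConjugatePair` and its methods are hypothetical, and Python detects the illegal access when the program runs rather than at compile time, so it only approximates the compile-time validation discussed above.

```python
class ComplexConjugatePair:
    """Holds a complex number and its conjugate; the parts are hidden."""

    def __init__(self, real, imag):
        # Double-underscore names are mangled by Python, so code outside
        # the class cannot reach them under these names.
        self.__real = float(real)
        self.__imag = float(imag)

    def add_real(self, delta):
        """Legal operation: shifts both members, so they stay conjugates."""
        self.__real += delta

    def first(self):
        return complex(self.__real, self.__imag)

    def second(self):
        return complex(self.__real, -self.__imag)


pair = ComplexConjugatePair(1, 2)
pair.add_real(3)          # allowed: goes through a declared operation
print(pair.first(), pair.second())

try:
    pair.__imag = 99      # creates an unrelated attribute, not the hidden one
    print(pair.__real)    # direct access to the hidden part fails
except AttributeError as error:
    print("illegal access rejected:", error)
```

Because the real part is stored once and shared by both members, the situation in the record example, where one component is changed without the other, cannot arise through the public operations.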

Data abstraction coupled with encapsulation provides considerable advantages in the presence of modularity. Consider a program consisting of two modules A and B. We can make the following observations:
• Module B defines its own data.
• The data of module B can only be manipulated by its operations, also defined in module B.
• Hence, module A cannot access the data of module B directly.
• The responsibility for correct manipulation of the data now rests only with module B.
• Changes made in module B do not affect module A.
• This simplifies debugging.

Now we are in a position to clarify the expressions:
Object={Data structures}+{Operations}
Program={Objects}
• An object is a collection of data items and operations that manipulate data items.
• An object oriented program is a collection of objects which interact with each other.

The program consists of one 'main' object which is active when the program is initiated. In most present-day object oriented systems, only one object is active at a time, i.e. the execution of an object oriented program is 'single threaded'.

Each object is an instance of a class. A class defines a 'data type'. Each object is a variable of a 'data type'. Thus a class is a template and objects are its 'copies'.
• The data structures are declared in the class. Each object of the class contains a copy of the data structures declared in the class. These are called instance variables of the object.
• The operations are also defined in the class. Each operation is called a method.
• The operations in all objects of a class share the code of the class methods. Thus a single copy of the code exists in the program.

4. Abstraction
Abstraction provides a well-defined conceptual boundary relative to the perspective of the viewer. When we define abstract data, we identify their essential characteristics. All other characteristics are unimportant. There are different types of abstraction:
• Entity abstraction: The object presents a useful model of an entity in the problem domain.
• Action abstraction: The object provides a generalised set of operations, all of which perform the same kind of function.
• Virtual machine abstraction: The object groups together operations that are all used by some superior level of control.
• Coincidental abstraction: The object packages operations that have no relation to each other.
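The earlier description of a class as a template, where each object holds its own copy of the instance variables while all objects share a single copy of the method code, can be observed directly in a small sketch. The class name `Counter` and its members are hypothetical and chosen only for illustration.

```python
class Counter:
    """Template: declares one instance variable and one method."""

    def __init__(self):
        self.count = 0            # instance variable: one copy per object

    def increment(self):          # method: one copy of the code per class
        self.count += 1
        return self.count


first = Counter()
second = Counter()

first.increment()
first.increment()
second.increment()

# Each object has its own data ...
print(first.count, second.count)

# ... but both bound methods wrap the same function defined in the class.
print(first.increment.__func__ is second.increment.__func__)
```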

Entity abstraction is considered to be the best form of abstraction as it groups together the data and operations concerning an entity. For example, an object called salaried_employee could contain operations like compute_income_tax, compute_pf, etc. Thus information about an entity is localized.

One often faces the question as to how to identify useful objects from a problem specification. The answer is to look for nouns in the specification. They represent 'things' or entities. This leads to entity abstraction.

5. Encapsulation
While abstraction helps one to focus on the essential characteristics of an object, encapsulation enables one to expose only those details that are necessary to effectively use the object. This is achieved through information hiding and the "need to know" principle. One could declare some of the variables and methods of the object as private and the rest public. Only public variables and methods will be accessible by other objects. By carefully selecting what is available to the outside world, the developer can prevent illegal use of objects.

For example, consider a class called Person with an operation like 'compute_income_tax'. Details of a person's income are not essential to a viewer of this class. Only the ability to compute the income-tax for each object is essential. Hence the data members of the class can be declared private, while 'compute_income_tax' can be public.

6. Hierarchy
Hierarchy is a ranking or ordering of abstractions. The important hierarchies in an OO system are:
• The class structure hierarchy
• The object structure hierarchy

A class structure hierarchy is used for sharing of behaviour and sharing of code between different classes of entities which have some features in common. This is achieved through inheritance. It is called the 'is a' hierarchy. Features that are common to many classes are migrated to a common class, called the base-class (or super-class, or parent-class). Other classes may add, modify, or even hide some of these features. These are called derived classes (or sub-classes, or child-classes). For example, vehicle can be a base-class and two-wheeler, three-wheeler, and four-wheeler can be derived classes.

There is also a possibility of inheriting from more than one class. This is called multiple inheritance. For example, a two-in-one inherits the features of a radio and a tape-recorder. However, it can raise the following complications:
• Possibility of name clashes
• Repeated inheritance from two peer super-classes

The object structure hierarchy is also called the 'part of' hierarchy. It is implemented through aggregation, that is, one object becomes a part of another object. Aggregation permits grouping of logically related structures. It is not unique to object oriented systems. For example, in C a structure can be used to group logically related data elements and structures. In OO context, a vehicle consists of many parts like engine, wheels, chassis, etc. Hence it can be represented as an aggregation of many parts.

7. Polymorphism

It is a concept wherein a name may denote instances of different classes as long as they are related by some common super class. Any such name is thus able to respond to some common set of operations in different ways. For example, let us say that we have declared a class called polygon with a function called draw(). We have derived three classes, rectangle, triangle and pentagon, each with its own redefinition of the draw() function. The version of the draw function to be used during runtime is decided by the object through which it is called. This is polymorphism. It assists in adopting a unified design approach.

8. Modularity
Modularity creates a number of well-defined, documented boundaries within a system. This is invaluable in understanding a system. A module typically clusters logically related abstractions.
• Each module can be compiled separately, but has connections with other modules.
• Connections between modules are the assumptions which modules make about each other.
• Ideally, a module should be a single syntactic construct in the programming language.

9. Meyer's Criteria
Bertrand Meyer suggests five criteria for evaluating a design method's ability to achieve modularity and relates these to object oriented design:
• Decomposability: Ability to decompose a large problem into subproblems.
• Composability: Degree to which modules once designed and built can be reused to create other systems.
• Understandability: Ease of understanding the program components by themselves, i.e. without referring to other components.
• Continuity: Ability to make incremental changes.
• Protection: A characteristic that reduces the propagation of side effects of an error in one module.

Based on the foregoing criteria, Meyer suggests five rules to be followed to ensure modularity:
• Direct mapping: Modular structure of the software system should be compatible with the modular structure devised in the process of modeling the problem domain.
• Few interfaces: Minimum number of interfaces between modules.
• Small interfaces: Minimum amount of information should move across an interface.
• Explicit interfaces: Interfaces should be explicit. Communication through global variables violates this criterion.
• Information hiding: Information concerning implementation is hidden from the rest of the program.

Based on the above rules and criteria, the following five design principles follow:
• Linguistic modular units principle: Modules must correspond to syntactic units in the language used.
• Self-documentation principle: All information about a module should be part of the module itself.

• Uniform access principle: All services offered by a module should be available through a uniform notation.
• Open-closed principle: Modules should be both open and closed. That is, it should be possible to extend a module while it is in use.
• Single choice principle: Whenever a software system must support a set of alternatives, one and only one module in the system should know their exhaustive list.

A class in an OO language provides a linguistic modular unit. A class provides explicit interfaces and information hiding. This is what sets OO languages apart from early languages like PASCAL and SIMULA.

There are three types of visibility into a class:
• Public: The feature that is declared public is accessible to the class itself and all its clients.
• Protected: The feature that is declared protected is accessible to the class itself and all its derived classes and friends.
• Private: The feature that is declared private is accessible only to the class itself and its friends.

10. Characteristics of an object
From the perspective of human cognition, an object is one of the following:
• A tangible and/or visible thing.
• Something that may be apprehended intellectually.
• Something towards which thought or action is directed.

Some objects may have clear physical identities (e.g. a machine), while some others may be intangible with crisp conceptual boundaries (e.g. a chemical process). However, the following statement is true for all objects: an object has state, behaviour, and identity.

An object's behaviour is governed by its state. For example, a two-in-one cannot operate as a radio when it is in tape-recorder mode.

An operation is a service that an object offers to the rest of the system. There are five kinds of service:
• Modifier: The operation alters the state of the object.
• Selector: The operation accesses the state of an object but does not alter it.
• Iterator: The operation accesses parts of an object in some order.
• Constructor: The operation creates an object and initialises its state.
• Destructor: The operation destroys the object.

There are three categories of objects:
• Actor: The object acts upon other objects but is never acted upon.
• Server: The object is always acted upon by other objects.
• Agent: The object acts on other objects and is also acted upon by others.

11. Extensibility
The notions of abstraction and encapsulation provide us with features of understandability,

continuity, and protection. As we said in the beginning, one of the goals of the OO paradigm is 'extensibility'. It can be done in two ways: extension by scaling and semantic extension.

Consider a banking application which supports two 'banking stations'. What changes are required to support additional stations?
• Let us say the application contains a class called BSTATION.
• Each banking station is an instance of this class.
• To incorporate the extension
  o We create another instance of BSTATION. This instance will share the code for the methods with other instances, but will have its own copies of the data.
  o Use of the new station is integrated in the software, i.e. provision is made to direct some workload at the new station.

This is extension by scaling. It is made simple by the notion of a class. We can achieve scaling by merely declaring another variable of a class.

Semantic extension may involve
• Changing the behaviour of existing entities in an application.
• Defining new kinds of behaviour for an entity.

For example, we may want to compute the average age of students in years, months and days. We may want to compute the average marks of students. Let us say these requirements necessitate changes to the entity 'Persons'. The classical process is to declare that the entity is under maintenance. This implies:
• Applications using 'Persons' must be held in abeyance. They cannot be used until the maintenance is complete.
• 'Persons' must then be modified. It must then be extensively tested.
• If necessary, each application using 'Persons' should be retested.
• Now existing applications using 'Persons' can be resumed.
• New applications using 'Persons' can be developed, tested, etc.

Thus, this form of maintenance for extension is obtrusive. It interrupts the running applications. Avoiding obtrusion leads to different versions of software.

Semantic extension using object oriented methodology does not suffer from this problem. Applications using an entity are not affected by the extension and can continue to run while the entity is being extended. Such applications need not be retested nor revalidated. This is achieved through the notion of inheritance.

12. Conclusions
Thus the object-oriented approach has many advantages over the structured methodology. There are many languages which support object-oriented programming. Some of the popular ones are Smalltalk, Eiffel, Ada, C++ and Java. The choice of the language will usually be based on business requirements. The object-oriented approach itself is most suitable in business areas which are
• Rapidly changing
• Require fast response to changes
• Complex in requirements or implementation
• Repeated across company with variations
• Having long-life applications that must evolve
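The two forms of extension discussed in section 11 can be sketched as follows. The names BSTATION and 'Persons' come from the text; everything else (the Python spelling `BStation`, the `Students` subclass, the `handle` method and the sample data) is a hypothetical illustration, and the average age is kept in plain years for brevity.

```python
class BStation:
    """A banking station; every station is an instance of this class."""

    def __init__(self, name):
        self.name = name
        self.transactions = []        # each instance has its own data

    def handle(self, transaction):    # method code shared by all instances
        self.transactions.append(transaction)
        return len(self.transactions)


# Extension by scaling: a third station is just one more instance.
stations = [BStation("station-1"), BStation("station-2")]
stations.append(BStation("station-3"))
stations[2].handle("deposit 100")     # some workload is directed to it


class Persons:
    """Existing entity; applications already depend on it unchanged."""

    def __init__(self, ages):
        self.ages = list(ages)

    def headcount(self):
        return len(self.ages)


class Students(Persons):
    """Semantic extension by inheritance: new behaviour, Persons untouched."""

    def __init__(self, ages, marks):
        super().__init__(ages)
        self.marks = list(marks)

    def average_age(self):
        return sum(self.ages) / len(self.ages)

    def average_marks(self):
        return sum(self.marks) / len(self.marks)


students = Students(ages=[20, 22], marks=[80, 90])
print(students.headcount(), students.average_age(), students.average_marks())
```

Existing code that only knows about `Persons` keeps working against the unmodified class, which is the non-obtrusive property the text attributes to inheritance.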

13. References
• "Object-oriented analysis and design with applications", by Grady Booch, Addison-Wesley
• "Object-oriented software construction", by Bertrand Meyer, Prentice-Hall
• "Object-oriented modeling and design", by James Rumbaugh, et al., Prentice-Hall of India
• "Object-oriented software engineering, a use case driven approach", by Ivar Jacobson, Addison-Wesley
• Object oriented concepts
• Introduction to object oriented programming using C++
• An introduction to design by contract

OOAD using UML

1. Modeling

1.1 What is Modeling?
A software development method consists of a modeling language and a process. The Unified Modeling Language (UML) is called a modeling language, not a method. The modeling language is the notation that methods use to express designs. The process describes the steps taken in doing a design.

Models in software help us to visualize, specify, construct and document the artifacts of a software intensive system. Models are built so that we can better understand the system that we are developing. While creating a model, for a given level of abstraction, it is to be decided which elements are to be included and which are to be excluded. Models provide various perspectives. Some type of notation represents models visually. The notation often takes the form of graphical symbols and connections.

1.2 Principles of Modeling
• The choice of the right model is extremely important since it decides the way in which we deal with the problem
• The levels of precision for each model vary
• The best models are connected to reality
• A single model may not be able to represent all the details of a system. A system can be represented by a set of independent models which, when put together, will provide an overall view of the system.

1.3 What is Business Modeling?
Business modeling is a technique to model business processes. UML is not only used for modeling system software but also for modeling the business process. Business models provide ways of expressing the business processes in terms of business activities and collaborative behavior. Business modeling is a technique which will help in finding out whether we have identified all the system use cases as well as determining the business value of the system.

Why Business Modeling?
The system can provide value only if we know how it will be used, who will use it and in what circumstances it will be used. To ensure that customer-oriented solutions are built, we must not overlook
• the environment in which these systems will work
• the roles and responsibilities of the employees using the system
• the "things" that are handled by the business, as a basis for building the system

One of the great benefits of business modeling is to elicit better system requirements, requirements that will drive the creation of information systems that actually fit in the organization and that will indeed be used by end-users.

2. Unified Process

2.1 What is Unified Process?
A process is a set of activities intended to reach a goal. The inputs to the software process are the needs of the business and the output will be the software product. We can use UML with a number of software engineering processes. The Unified Process is one such lifecycle approach well-suited to the UML.

The goals of the Unified Process are to enable the production of highest quality software that meets end-user needs with predictable schedules and budgets. The Unified Process captures some of the best current software development practices in a form that is tailorable for a wide range of projects and organizations. On the management side, the Unified Process provides a disciplined approach on how to assign tasks and responsibilities within a software development organization.

2.2 Unified Process - Features
The Unified Process captures many of modern software development's best practices in a form suitable for a wide range of projects and organizations:
• Develop software iteratively and incrementally.
• Manage requirements.
• Use component-based architectures.

The Unified Process is iterative, incremental, architecture-centric and use-case driven.

a) Iterative
The development is iterative and involves a sequence of steps, and each iteration adds some new information. Each iteration is evaluated and used to produce input for the next iteration. Thus, this process provides continuous feedback that improves the final product.

b) Incremental
The iterations are incremental in function. Each iteration builds on the use cases developed in the previous iterations.

c) Architecture-Centric
The importance of a well defined basic system architecture is realized, and it is established in the initial stage of the process. The architectural blueprint serves as a solid basis against which to plan and manage software component based development.

d) Use Case driven

Development activities under the Unified Process are use case driven. The Unified Process places strong emphasis on building systems based on a thorough understanding of how the delivered system will be used.

2.3 A brief outline of a typical development process
A typical OO development process is iterative and incremental. A software system is not released in a big bang at the end of the project but is developed and released in phases. It has the following stages:
• Inception
• Elaboration
• Construction
• Transition

The construction phase typically has many iterations, in which each iteration builds production quality software (tested and integrated) that satisfies a subset of project requirements.

2.3.1 Inception phase
During the inception phase, we develop the business model for the project. The feasibility study is performed, and the overall scope and size of the project are determined during the inception phase. Determine roughly how much it will cost and how much it will earn. The actors of the system and their interaction with the system are analyzed at a high level. Identifying all use cases and describing important use cases also happens at this stage.

Inception can take many forms. It may be as informal as a chat in the cafeteria, or a full-fledged feasibility study that takes weeks.

At the end of the Inception stage, the following objectives are to be achieved:
• Concurrence on the scope of the project and the estimates
• Understanding of the requirements

2.3.2 Elaboration phase
In the elaboration phase, a baseline architecture is established, the project plan is developed and risk assessment is also performed. The major types of risks are
• Requirements Risks
• Technological Risks
• Skills Risks
• Political Risks

Dealing with Requirements Risks
• Use cases can be employed to provide a basis for communication between customer and developers in planning a project
• A use case is a typical interaction between user and system to achieve a goal. E.g. when we use an ATM, one use case will be dispense cash, another may be print receipt, display balance and so on.
• Skeleton Domain models can be drawn using OO techniques like UML
• Considerations for Workflow, Rulebase and Security need to be made

Dealing with Technological Risks
Build prototypes that try out pieces of technology that we plan to use, e.g. if we are using Java and a relational database, build a simple application using these. Try out tools that support our choice of technology. Spend time getting comfortable with them. Also, consider how easy or difficult it is to port to other platforms in future.

Dealing with Skills Risks
• Acquire skills through training and mentoring

e. impact on system design. The cost and schedule estimates can be comfortably made at this stage and create a plan for construction. It should end with system tests to confirm that the use cases have been built correctly and a demo to the user Each iteration builds on the use cases developed in the previous iterations. estimate length of time required for each use case.. we will never have a developer with no distractions .an iteration should be long enough for us to do several use cases. This is how much development can we do in an iteration. integration and documentation. if we have 6 developers.g. Determine our iteration length.how fast can we go.so difference between ideal time and reality will be the load factor. they lose their structure and deform into a mass of spaghetti! Refactoring is a technique used to reduce the short term pain of redesigning .g. Assuming a fully committed developer. they can be looked upon as example models o They describe common ways of doing things o They are collected by people who spot repeating themes in analysis and design o Each theme is described so that people can read it and see how to apply it. medium and low business value Have the developers categorize use cases into high. design. Determine our project velocity . Planning the Construction phase • • • • • • Have our customer categorize use cases into high. 3 week iteration length and a load factor of 2.apply load factors . lack of adequate understanding .it is iterative in nature Use Refactoring in iterating code Test extensively • • • • • • • Refactoring • • Software Entropy: The principle of software entropy suggests that programs start off in a well designed state. Estimate effort involved for each iteration. o e. Thus.include analysis..it is incremental in nature Each iteration will involve rewriting some existing code to make it more flexible .these would reflect level of difficulty. patterns promote reuse. 
we will have 9 ideal developer-weeks per iteration Deal with Use cases with high risk first Have a release plan ready Each iteration is a mini-project in itself. Dealing with Political Risks • • Training on Effective Team work can be of great help here. medium and low risks. but as bits of functionality are added. coding unit testing.• • Read relevant technical books Look for patterns o Pattern is an idea that has been useful in one practical context and may probably be useful in others o In the modeling context.
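The project-velocity arithmetic from the planning list above can be written out as a short sketch. The function names and the 40-week backlog are illustrative assumptions, not part of the original notes. Only the formula (developers times iteration length divided by load factor) comes from the worked example.

```python
import math


def ideal_developer_weeks(developers: int, iteration_weeks: float, load_factor: float) -> float:
    """Ideal developer-weeks that fit into one iteration (the project velocity)."""
    if load_factor <= 0:
        raise ValueError("load_factor must be positive")
    return developers * iteration_weeks / load_factor


def iterations_needed(total_ideal_weeks: float, velocity: float) -> int:
    """Whole iterations needed to cover the estimated use cases."""
    return math.ceil(total_ideal_weeks / velocity)


# Figures from the notes: 6 developers, 3 week iterations, load factor 2.
velocity = ideal_developer_weeks(6, 3, 2)
print(velocity)                          # 9.0

# Hypothetical backlog: use cases estimated at 40 ideal developer-weeks in total.
print(iterations_needed(40, velocity))   # 5
```

Dividing by the load factor is what turns calendar capacity (18 developer-weeks here) into the ideal time that the use case estimates are expressed in.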

Refactoring is done after every iteration. It involves small steps like renaming a method, consolidating similar methods into a superclass and the like. Each step is tiny, but performing these steps can make a remarkable difference to the program.
• Test after each such step
• Do not add new functionality and refactor at the same time

At the end of the elaboration
• The use case model should be complete
• Nonfunctional requirements should be elaborated
• Software architecture should be described
• A revised risk list should be present
• A preliminary user manual (optional)

2.3.3 Construction phase
All the components are developed and integrated during the construction phase. All the features are completely tested during this stage. Resources are managed and operations are controlled to optimize cost, schedule and quality.
The construction phase is incremental and iterative. It is incremental because each iteration builds on the code developed in the previous iterations. It is iterative since each iteration involves rewriting some existing code to make it more flexible.
At the end of the construction,
• The product should be stable and mature for release
• Actual versus planned expenditure should be acceptable

2.3.4 Transition phase
The objective of this phase is to transition the software product to the user community. Developing new releases, correcting defects and optimization of the software are part of this phase.
The activities in this phase include
• User training
• Conversion of operational databases
• Roll out of the product to marketing and sales
The objectives of the transition phase are
• Customer satisfaction
• Achieving the concurrence of the stakeholders that the deployment baselines are complete and consistent with the evaluation criteria
• Achieving the final product baseline rapidly in a cost effective manner

3. Introduction to UML *

3.1 What is UML
The Unified Modeling Language (UML) is a robust notation that we can use to build OOAD models. It is called so, since it was the unification of the ideas from different methodologies from three amigos: Booch, Rumbaugh and Jacobson.
The UML is the standard language for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system. It can be used with all processes, throughout the development life cycle, and across different implementation technologies.
The UML combines the best from
• Data Modeling concepts
• Business Modeling (work flow)
• Object Modeling
• Component Modeling
The UML may be used to visualize, specify, construct and document the artifacts of a software-intensive system. The UML may be used to
• Display the boundary of a system and its major functions using use cases and actors
• Illustrate use case realizations with interaction diagrams
• Represent a static structure of a system using class diagrams
• Model the behavior of objects with state transition diagrams
• Reveal the physical implementation architecture with component & deployment diagrams
• Extend the functionality of the system with stereotypes
The UML is only a language and so is just one part of a software development method. The UML is process independent, although optimally it should be used in a process that is use case driven, architecture-centric, iterative, and incremental.

* This material is not in compliance with UML 2.0

3.2 Evolution of UML
One of the methods was the Object Modeling Technique (OMT), devised by James Rumbaugh and others at General Electric. It consists of a series of models - use case, object, dynamic, and functional - that combine to give a full view of a system.
The Booch method was devised by Grady Booch and developed the practice of analyzing a system as a series of views. It emphasizes analyzing the system from both a macro development view and a micro development view, and it was accompanied by a very detailed notation.
The Object-Oriented Software Engineering (OOSE) method was devised by Ivar Jacobson and focused on the analysis of system behavior. It advocated that at each stage of the process there should be a check to see that the requirements of the user were being met.

3.3 Why Unification
Each of these methods had its strong points and its weak points. Each had its own notation and its own tools. This made it very difficult for developers to choose the method and notation that suited them and to use it successfully. New versions of some of the methods were created, each drawing on strengths of the others to augment their weaker aspects. This led to a growing similarity between the methods and hence the Unification.

3.4 The UML is composed of three different parts:
• Model elements
• Diagrams
• Views

Model Elements
• The model elements represent basic object-oriented concepts such as classes, objects, and relationships.
• Each model element has a corresponding graphical symbol to represent it in the diagrams.

Diagrams
• Diagrams portray different combinations of model elements.
• For example, the class diagram represents a group of classes and the relationships, such as association and inheritance, between them.
• The UML provides nine types of diagram - use case, class, object, state chart, sequence, collaboration, activity, component, and deployment

Views
• Views provide the highest level of abstraction for analyzing the system.
• Each view is an aspect of the system that is abstracted to a number of related UML diagrams.
• Taken together, the views of a system provide a picture of the system in its entirety.
• In the UML, the five main views of the system are
o User
o Structural
o Behavioral
o Implementation
o Environment

In addition to model elements, diagrams, and views, the UML provides mechanisms for adding comments, information, or semantics to diagrams. And it provides mechanisms to adapt or extend itself to a particular method, software system, or organization.

Extensions of UML
• Stereotypes can be used to extend the UML notational elements
• Stereotypes may be used to classify and extend associations, inheritance relationships, classes, and components
Examples
• Class stereotypes: boundary, control, entity
• Inheritance stereotypes: extend
• Dependency stereotypes: uses
• Component stereotypes: subsystem

3.5 UML Diagrams
3.5.1 Use Case Diagrams
The use case diagram presents an outside view of the system. The use-case model consists of use-case diagrams.
• The use-case diagrams illustrate the actors, the use cases, and their relationships.
• Use cases also require a textual description (use case specification), as the visual diagrams can't contain all of the information that is necessary.

• The customers, the end-users, the domain experts, and the developers all have an input into the development of the use-case model.
Creating a use-case model involves the following steps:
1. defining the system
2. identifying the actors and the use cases
3. describing the use cases
4. defining the relationships between use cases and actors
5. defining the relationships between the use cases

Use Case:
Definition: a use case is a sequence of actions a system performs that yields an observable result of value to a particular actor.
Use Case Naming: A use case should always be named in business terms, picking the words from the vocabulary of the particular domain for which we are modeling the system. It should be meaningful to the user because use case analysis is always done from the user's perspective. Names will usually be verbs or short verb phrases.
Use Case Specification shall document the following
• Brief Description
• Precondition
• Main Flow
• Alternate Flow
• Exceptional flows
• Post Condition
• Special Requirements
Notation of use case

Actor:
Definition: someone or something outside the system that interacts with the system
• An actor is external - it is not actually part of what we are building, but an interface needed to support or use it.
• It represents anything that interacts with the system.
Notation for Actor

Relation: Two important types of relation are used in a Use Case Diagram.
Include - An include relationship shows behavior that is common to two or more use cases (Mandatory). An include relation results when we extract the common sub flows and make them a use case.
Extend - An extend relationship shows optional behavior (Optional). An extend relation usually results when we add a bit more specialized feature to an already existing one; we say that use case B extends its functionality to use case A.
A system boundary rectangle separates the clinic system from the external actors. An extend relationship indicates that one use case is a variation of another. Extend notation is a dotted line, labeled <<extend>>, with an arrow toward the base case. The extension point, which determines when the extended case is appropriate, is written inside the base case.

3.5.2 Class Diagrams
A class diagram shows the existence of classes and their relationships in the structural view of a system.
UML modeling elements in class diagrams
• Classes and their structure and behavior
• Relationships
o Association
o Aggregation
o Composition
o Dependency
o Generalization / Specialization (inheritance relationships)
o Multiplicity and navigation indicators
o Role names
A class describes properties and behavior of a type of object.
• Classes are found by examining the objects in sequence and collaboration diagrams
• A class is drawn as a rectangle with three compartments
• Classes should be named using the vocabulary of the domain
• Naming standards should be created, e.g., all classes are singular nouns starting with a capital letter

• The behavior of a class is represented by its operations. Operations may be found by examining interaction diagrams
• The structure of a class is represented by its attributes. Attributes may be found by examining class definitions, the problem requirements, and by applying domain knowledge

Notation: Class information: visibility and scope
The class notation is a 3-piece rectangle with the class name, attributes, and operations. Attributes and operations can be labeled according to access and scope. Here is a new, expanded Order class.

Relationship - Association
• Association represents the physical or conceptual connection between two or more objects
• An association is a bi-directional connection between classes
• An association is shown as a line connecting the related classes

Relationship - Aggregation
• An aggregation is a stronger form of relationship where the relationship is between a whole and its parts
• It is entirely conceptual and does nothing more than distinguishing a 'whole' from the 'part'
• It doesn't link the lifetime of the whole and its parts
• An aggregation is shown as a line connecting the related classes with a diamond next to the class representing the whole

Relationship - Composition
• Composition is a form of aggregation with strong ownership, in which the lifetime of the part coincides with that of the whole.

Relationship - Multiplicity
• Multiplicity defines how many objects participate in relationships
• Multiplicity is the number of instances of one class related to ONE instance of the other class
This table gives the most common multiplicities.

Multiplicities   Meaning
0..1             zero or one instance. The notation n..m indicates n to m instances
0..* or *        no limit on the number of instances (including none)
1                exactly one instance
1..*             at least one instance

• For each association and aggregation, there are two multiplicity decisions to make: one for each end of the relationship
• Although associations and aggregations are bi-directional by default, it is often desirable to restrict navigation to one direction

• If navigation is restricted, an arrowhead is added to indicate the direction of the navigation

Relationship - Dependency
A dependency relationship is a weaker form of relationship showing a relationship between a client and a supplier where the client does not have semantic knowledge of the supplier. A dependency is shown as a dashed line pointing from the client to the supplier.

Relationship - Generalization
Generalization is a relationship between a general thing (called the super class or the parent) and a more specific kind of that thing (called the subclass(es) or the child).

An association class is an association that also has class properties (or a class that has association properties).
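The class-diagram relationships described above can also be read as code. The following is a minimal sketch in Python, reusing the library setting that the later diagrams refer to. All class, attribute and function names here are illustrative assumptions and are not taken from the original figures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Member:
    """General class (parent) in a generalization."""
    name: str

    def loan_limit(self) -> int:
        return 2


@dataclass
class StaffMember(Member):
    """Specific class (child): inherits from Member and overrides behaviour."""

    def loan_limit(self) -> int:
        return 5


@dataclass
class Copy:
    """A part that is created by, and lives inside, its Book (composition)."""
    copy_number: int
    borrower: Optional[Member] = None  # association with multiplicity 0..1


@dataclass
class Book:
    title: str
    copies: List[Copy] = field(default_factory=list)  # composition, 0..*

    def add_copy(self) -> Copy:
        # The whole creates its own parts, so their lifetimes coincide.
        copy = Copy(copy_number=len(self.copies) + 1)
        self.copies.append(copy)
        return copy


@dataclass
class ReadingGroup:
    """Aggregation: the members exist independently of the group."""
    members: List[Member] = field(default_factory=list)


def format_receipt(copy: Copy) -> str:
    """Dependency: this client only uses Copy and Member as inputs."""
    who = copy.borrower.name if copy.borrower else "nobody"
    return f"copy {copy.copy_number} is held by {who}"


book = Book("UML Distilled")
copy = book.add_copy()
copy.borrower = StaffMember("Asha")
print(format_receipt(copy))         # copy 1 is held by Asha
print(copy.borrower.loan_limit())   # 5
```

Navigation is one-directional in this sketch: a Copy knows its borrower, but a Member holds no reference back to its copies.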

A qualifier is an attribute or set of attributes whose values serve to partition the set of instances associated with an instance across an association.

A constraint is a semantic relationship among model elements that specifies conditions and propositions that must be maintained as true: otherwise, the system described by the model is invalid.

An interface is a specifier for the externally visible operations of a class without specification of internal structure. An interface is formally equivalent to an abstract class with no attributes and methods, only abstract operations.
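As a rough illustration of "an abstract class with only abstract operations", here is a sketch using Python's `abc` module. The `Reservable` interface, the `BookCopy` class and the one-reservation rule are hypothetical examples chosen for this sketch, not definitions from the notes.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Reservable(ABC):
    """Interface: externally visible operations only, no attributes."""

    @abstractmethod
    def reserve(self, member_name: str) -> bool:
        """Try to reserve; return True on success."""

    @abstractmethod
    def is_reserved(self) -> bool:
        """Report whether a reservation is currently held."""


class BookCopy(Reservable):
    """A class that realizes the interface and adds internal structure."""

    def __init__(self) -> None:
        self._reserved_by: Optional[str] = None

    def reserve(self, member_name: str) -> bool:
        # Assumed constraint: at most one member may hold the reservation.
        if self._reserved_by is not None:
            return False
        self._reserved_by = member_name
        return True

    def is_reserved(self) -> bool:
        return self._reserved_by is not None


copy = BookCopy()
print(copy.reserve("Asha"))   # True
print(copy.reserve("Ravi"))   # False
```

Trying to instantiate `Reservable` directly raises `TypeError`, which matches the idea that an interface specifies operations without providing an implementation.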

3.5.3 Interaction Diagrams
• Interaction diagrams are used to model the dynamic behavior of the system
• Interaction diagrams help us to identify the classes and their methods
• Interaction diagrams describe how use cases are realized as interactions among objects
• They show classes, objects, actors and the messages between them that achieve the functionality of a use case
There are two types of interaction diagrams
1. Sequence Diagram
2. Collaboration Diagram

Sequence diagram: shows the interaction of objects with respect to time.
Sequence diagrams have two axes
• the horizontal axis represents the objects involved in a sequence
• the vertical axis represents the passage of time
The following sequence diagram realizes a scenario of reserving a copy of a book in a library.

Collaboration diagram: shows the interaction of the objects and also the group of all messages sent or received by an object. This allows us to see the complete set of services that an object must provide. The following collaboration diagram realizes a scenario of reserving a copy of a book in a library.

Difference between the Sequence Diagram and Collaboration Diagram
• Sequence diagrams emphasize the temporal aspect of a scenario - they focus on time.
• Collaboration diagrams emphasize the spatial aspect of a scenario - they focus on how objects are linked.

3.5.4 Activity Diagram
An activity diagram is essentially a fancy flowchart. Activity diagrams and statechart diagrams are related. While a statechart diagram focuses attention on an object undergoing a process (or on a process as an object), an activity diagram focuses on the flow of activities involved in a single process. The activity diagram shows how those activities depend on one another.
Activity diagrams can be divided into object swimlanes that determine which object is responsible for which activity. A single transition comes out of each activity, connecting it to the next activity.
A transition may branch into two or more mutually exclusive transitions. Guard expressions (inside [ ]) label the transitions coming out of a branch. A branch and its subsequent merge marking the end of the branch appear in the diagram as hollow diamonds.
A transition may fork into two or more parallel activities. The fork and the subsequent join of the threads coming out of the fork appear in the diagram as solid bars.
The activity diagram for reserving a book (in the Library Management System) is shown below:

3.5.5 State Chart Diagram
A state chart (transition) diagram is used to show
• The life history of a given class, use case, operation
• The events that cause a transition from one state to another
• The actions that result from a state change
State transition diagrams are created for objects with significant dynamic behavior (More Contents are to be added)

3.5.6 Component Diagram
Describes the organization of, and dependencies between, the software implementation components.

Components are distributable physical units - e.g. source code, object code.

3.5.7 Deployment Diagram
Describes the configuration of processing resource elements and the mapping of software implementation components onto them. Deployment diagrams contain components (e.g. object code, source code) and nodes (e.g. printer, database, client machine).

4. Suggested References
1. The Unified Modeling Language User Guide (Authors: Grady Booch, James Rumbaugh, Ivar Jacobson)
2. UML Distilled (Author: Martin Fowler)
3. UML in a Nutshell (Author: Sinan Si Alhir)
4. The Elements of UML Style (Author: Scott W. Ambler)
5. http://www.omg.org/technology/documents/formal/uml.htm

Requirements Engineering

1. Introduction
The objectives of this module are
• To establish the importance / relevance of requirement specifications in software development
• To bring out the problems involved in specifying requirements
• To illustrate the use of modelling techniques to minimise problems in specifying requirements

Requirements can be defined as follows:
• A condition or capability needed by a user to solve a problem or achieve an objective.
• A condition or capability that must be met or possessed by a system to satisfy a contract, standard, specification, or other formally imposed document.

At a high level, requirements can be classified as user/client requirements and software requirements. Client requirements are usually stated in terms of business needs. Software requirements specify what the software must do to meet the business needs. For example, a stores manager might state his requirements in terms of efficiency in stores management. A bank manager might state his requirements in terms of time to service his customers. It is the analyst's job to understand these requirements and provide an appropriate solution. To be able to do this, the analyst must understand the client's business domain: who all the stakeholders are, how they affect the system, what the constraints are, what the alterables are, etc.
The analyst should not blindly assume that only a software solution will solve a client's problem. He should have a broader vision. Sometimes, re-engineering of the business processes may be required to improve efficiency, and that may be all that is required. After all this, if it is found that a software solution will add value, then a detailed statement of what the software must do to meet the client's needs should be prepared. This document is called the Software Requirements Specification (SRS) document.

Stating and understanding requirements is not an easy task. Let us look at a few examples:
• "The counter value is picked up from the last record"
In the above statement, the word 'last' is ambiguous. It could mean the last accessed record, which could be anywhere in a random access file, or it could be physically the last record in the file.
• "Calculate the inverse of a square matrix 'M' of size 'n' such that LM = ML = I_n, where 'L' is the inverse matrix and 'I_n' is the identity matrix of size 'n'"
This statement, though it appears to be complete, says nothing about the type of the matrix elements. Are they integers, real numbers, or complex numbers? Depending on the answer to this question, the algorithm will be different.
• "The software should be highly user friendly"
How does one determine whether this requirement is satisfied or not?

• "The output of the program shall usually be given within 10 seconds"
What are the exceptions to the 'usual 10 seconds' requirement?

The statement of requirements or SRS should possess the following properties:
• All requirements must be correct. There should be no factual errors.
• All requirements should have one interpretation only. We have seen a few examples of ambiguous statements above.
• The SRS should be complete in all respects. It is difficult to achieve this objective. Many times clients change the requirements as the development progresses, or new requirements are added. The Agile development methodologies are specifically designed to take this factor into account. They partition the requirements into subsets called scenarios, and each scenario is implemented separately. However, each scenario should be complete.
• All requirements must be verifiable, that is, it should be possible to verify if a requirement is met or not. Words like 'highly' and 'usually' should not be used.
• All requirements must be consistent and non-conflicting.
• As we have stated earlier, requirements do change. So the format of the SRS should be such that the changes can be easily incorporated.

2. Understanding Requirements
2.1 Functional and Non-Functional Requirements
Requirements can be classified into two types, namely, functional requirements and non-functional requirements.
Functional requirements specify what the system should do. Examples are:
• Calculate the compound interest at the rate of 14% per annum on a fixed deposit for a period of three years
• Calculate tax at the rate of 30% on an annual income equal to and above Rs. 2,00,000 but less than Rs. 3,00,000
• Invert a square matrix of real numbers (maximum size 100 X 100)
Non-functional requirements specify the overall quality attributes the system must satisfy. The following is a sample list of quality attributes:
• portability
• reliability
• performance
• testability
• modifiability
• security
• presentation
• reusability
• understandability
• acceptance criteria
• interoperability
Some examples of non-functional requirements are:

• Number of significant digits to which accuracy should be maintained in all numerical calculations is 10
• The response time of the system should always be less than 5 seconds
• The software should be developed using C language on a UNIX based system
• A book can be deleted from the Library Management System by the Database Administrator only
• The matrix diagonalisation routine should zero out all off-diagonal elements which are equal to or less than 10^-3
• Experienced officers should be able to use all the system functions after a total training of two hours. After this training, the average number of errors made by experienced officers should not exceed two per day.

2.2 Other Classifications
Requirements can also be classified into the following categories:
• Satisfiability
• Criticality
• Stability
• User categories

Satisfiability: There are three types of satisfiability, namely, normal, expected, and exciting.
Normal requirements are specific statements of user needs. The user satisfaction level is directly proportional to the extent to which these requirements are satisfied by the system.
Expected requirements may not be stated by the users, but the developer is expected to meet them. If the requirements are met, the user satisfaction level may not increase, but if they are not met, users may be thoroughly dissatisfied. They are very important from the developer's point of view.
Exciting requirements are not only unstated by the users; the users do not even expect them. But if the developer provides for them in the system, the user satisfaction level will be very high.
The trend over the years has been that the exciting requirements often become normal requirements and some of the normal requirements become expected requirements. For example, as the story goes, the on-line help feature was first introduced in the UNIX system in the form of man pages. At that time, it was an exciting feature. Later, other users started demanding it as part of their systems. Nowadays, users do not ask for it, but the developer is expected to provide it.

Criticality: This is a form of prioritising the requirements. They can be classified as mandatory, desirable, and non-essential. This classification should be done in consultation with the users and helps in determining the focus in an iterative development model.

Stability: Requirements can also be categorised as stable and non-stable. Stable requirements don't change often, or at least the time period of change will be very long. Some requirements may change often. For example, if business process reengineering is going on alongside the development, then the corresponding requirements may change till the process stabilises.

User categories: As was stated in the introduction, there will be many stakeholders in a system. Broadly they are of two kinds: those who dictate the policies of the system and those who utilise the services of the system. All of them use the system. There can be further subdivisions among these classes depending on the information needs and services required. It is important that all stakeholders are identified and their requirements are captured.

3. Modelling Requirements
3.1 Overview
Every software system has the following essential characteristics:
• It has a boundary. The boundary separates what is within the system scope and what is outside
• It takes inputs from external agents and generates outputs

• It has processes which collaborate with each other to generate the outputs
• These processes operate on data by creating, modifying, destroying, and querying it
• The system may also use data stores to store data which has a life beyond the system

In the following, we will be describing artifacts used by Structured Systems Analysis and Design Methodology (SSADM). It uses:
• Data Flow Diagram (DFD) for modelling processes and their interactions
• Entity Relationship Diagram (ERD) for modelling data and their relationships
• Data Dictionary to specify data
• Decision Tables and Decision Trees to model complex decisions
• Structured English to paraphrase process algorithms
• State Transition Diagram to model state changes of the system

3.2 Data Flow Diagram (DFD)
A data flow diagram focuses on the movement of data through the system and its transformations. It is divided into levels.
Level 0, also known as the context diagram, defines the system scope. It consists of external agents, the system boundary, and the data flow between the external agents and the system. At level 0, only a single process, depicting the system, is shown.
Level 1 is an explosion of Level 0, where all the major processes, data stores, and the data flow between them are shown.
Level 2, Level 3, etc. show details of individual processes. On subsequent levels, the number of processes should be limited to 7 ± 2.

The notation used is the following:
External agents: They are external to the system, but interact with the system. They must be drawn at level 0, but need not be drawn at level 2 onwards. Duplicates are to be identified.
Process: They indicate information processing activity. They must be given meaningful names. They must be shown at all levels. No duplicates are allowed.

Data Stores: They are used to store information. They are not shown at level 0. All data stores should be shown at level 1. Duplicates must be indicated.
Data Flows: They indicate the flow of information. They must be shown at all levels and meaningful names must be given.
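The level rules stated above (a single process and no data stores at level 0, a bounded number of processes on later levels) can be expressed as a small check. This is only a sketch: the `DfdLevel` structure is a hypothetical representation, and reading "7 ± 2" as an upper bound of 9 processes is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DfdLevel:
    """One diagram of a levelled DFD (hypothetical representation)."""
    level: int
    external_agents: List[str] = field(default_factory=list)
    processes: List[str] = field(default_factory=list)
    data_stores: List[str] = field(default_factory=list)


def check_level(diagram: DfdLevel) -> List[str]:
    """Return the level rules that the diagram breaks (empty list if none)."""
    problems = []
    if diagram.level == 0:
        if not diagram.external_agents:
            problems.append("external agents must be drawn at level 0")
        if len(diagram.processes) != 1:
            problems.append("level 0 shows a single process depicting the system")
        if diagram.data_stores:
            problems.append("data stores are not shown at level 0")
    elif len(diagram.processes) > 9:
        # Assumption: "7 +/- 2" is treated as an upper bound of 9.
        problems.append("limit the number of processes to 7 +/- 2")
    return problems


context = DfdLevel(0, ["Customer"], ["Sales System"])
print(check_level(context))   # []

crowded = DfdLevel(0, ["Customer"], ["Take order", "Update sales"], ["Orders"])
print(len(check_level(crowded)))   # 2
```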

Examples:
1. Customer places sales orders. The system checks for availability of products and updates sales information.

2. Company receives applications. Checks for eligibility conditions. Invites all eligible candidates for interview. Maintains a list of all candidates called for interview. Updates the eligibility conditions as and when desired by the management.

Getting started:
• Identify the inputs or events which trigger the system and the outputs or responses from the system
• Identify the corresponding sources and destinations (external agents)
• Produce a context diagram (Level 0). It should show the system boundary, external agents, and the dataflows connecting the system and the external agents.
• Produce the Level 1 diagram. It must show all the external agents, all the major processes, all the data stores, and all the dataflows connecting the various artifacts. The artifacts should be placed based on logical precedence rather than temporal precedence. Avoid dataflow crossings.
• Refine the Level 1 diagram.
• Explode the individual processes as necessary.

Points to remember:
1) Remember to name every external agent, every process, every data store, and every dataflow.
2) Do not show how things begin and end.
3) Do not show loops and decisions.
4) Do not show dataflows between external agents. They are outside the scope of the system.

5) Do not show dataflow between an external agent and a data store. There should be a process in between.
6) Do not show dataflow between two data stores. There should be a process in between.
7) There should not be any unconnected external agent, process, or data store.
8) Beware of read-only or write-only data stores.
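Points 4 to 8 above are structural, so they can be checked mechanically once a diagram is written down as nodes and flows. The sketch below is one possible way to do that. The dictionary-and-tuple representation and the node names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

AGENT, PROCESS, STORE = "external agent", "process", "data store"


def check_flows(nodes: Dict[str, str], flows: List[Tuple[str, str]]) -> List[str]:
    """Check a DFD given as {name: kind} and a list of (source, target) flows."""
    problems = []

    # Points 4, 5, 6: every flow must have a process at one end at least.
    for source, target in flows:
        if PROCESS not in (nodes[source], nodes[target]):
            problems.append(f"{source} -> {target}: no process at either end")

    # Point 7: nothing may be left unconnected.
    connected = {name for flow in flows for name in flow}
    for name, kind in nodes.items():
        if name not in connected:
            problems.append(f"{name}: unconnected {kind}")

    # Point 8: flag data stores that are only written or only read.
    for name, kind in nodes.items():
        if kind == STORE and name in connected:
            written = any(target == name for _, target in flows)
            read = any(source == name for source, _ in flows)
            if written != read:
                label = "write-only" if written else "read-only"
                problems.append(f"{name}: {label} data store")
    return problems


nodes = {"Customer": AGENT, "Process order": PROCESS, "Sales": STORE}
good = [("Customer", "Process order"),
        ("Process order", "Sales"),
        ("Sales", "Process order")]
print(check_flows(nodes, good))                            # []
print(len(check_flows(nodes, [("Customer", "Sales")])))    # 3
```

The second call reports three problems: a flow from an external agent straight into a data store, an unconnected process, and a write-only data store.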

9) Beware of processes which take inputs without generating any outputs. Also, beware of processes which generate outputs spontaneously without taking any inputs.
10) Ensure that the data flowing into a process exactly matches the data flowing into the exploded view of that process. Similarly for the data flowing out of the process.
11) Ensure that the data flowing out of a data store matches data that has been stored in it before.
See the appendix for the complete data flow diagram of "Material Procurement System (Case Study)".

3.3 Entity Relationship Diagram (ERD)
ERD complements DFD. While DFD focuses on processes and the data flow between them, ERD focuses on data and the relationships between them. It helps to organise data used by a system in a disciplined way. It helps to ensure completeness, adaptability and stability of data. It is an effective tool to communicate with senior management (what is the data needed to run the business), data administrators (how to manage and control data), and database designers (how to organise data efficiently and remove redundancies). It consists of three components.
Entity: It represents a collection of objects or things in the real world whose individual members or instances have the following characteristics:

• Each can be identified uniquely in some fashion.
• Each plays a necessary role in the system we are building.
• Each can be described by one or more data elements (attributes).
Entities generally correspond to persons, objects, locations, events, etc. Examples are employee, vendor, supplier, materials, warehouse, delivery, etc.
There are five types of entities.
• Fundamental entity: It does not depend on any other entity for its existence. E.g., materials.
• Subordinate entity: It depends on another entity for its existence. For example, in an inventory management system, purchase order can be an entity and it will depend on materials being procured. Similarly invoices will depend on purchase orders.
• Associative entity: It depends on two or more entities for its existence. For example, student grades will depend on the student and the course.
• Generalisation entity: It encapsulates common characteristics of many subordinate entities. For example, a four wheeler is a type of vehicle. A truck is a type of four wheeler.
• Aggregation entity: It consists of, or is an aggregation of, other entities. For example, a car consists of engine, chassis, gear box, etc. A vehicle can also be regarded as an aggregation entity, because a vehicle can be regarded as an aggregation of many parts.

Attributes: They express the properties of the entities. Every entity will have many attributes, but only a subset, which are relevant for the system under study, will be chosen. For example, an employee entity will have professional attributes like name, designation, salary, etc. and also physical attributes like height, weight, etc. But only one set will be chosen depending on the context.
Attributes are classified as entity keys and entity descriptors.
• Entity keys are used to uniquely identify instances of entities. Attributes having unique values are called candidate keys and one of them is designated as the primary key.
The domains of the attributes should be pre-defined. If 'name' is an attribute of an entity, then its domain is the set of strings of alphabets of predefined length.

Relationships: They describe the association between entities. They are characterised by optionality and cardinality.
Optionality is of two types, namely, mandatory and optional.
1. Mandatory relationship means that associated with every instance of the first entity there will be at least one instance of the second entity.
2. Optional relationship means that there may be instances of the first entity which are not associated with any instance of the second entity.

For example, the employee-spouse relationship has to be optional because there could be unmarried employees. It is not correct to make the relationship mandatory.
Cardinality is of three types: one-to-one, one-to-many, many-to-many.
1. One-to-one relationship means an instance of the first entity is associated with only one instance of the second entity. Similarly, each instance of the second entity is related to one instance of the first entity.
2. One-to-many relationship means that one instance of the first entity is related to many instances of the second entity, while an instance of the second entity is associated with only one instance of the first entity.
3. In many-to-many relationship an instance of the first entity is related to many instances of the second entity and the same is true in the reverse direction also.
Other types of relationships are multiple relationships between entities, relationships leading to associative entities, relationship of entity with itself, EXCLUSIVE-OR and AND relationships.
ERD notation: There are two types of notation used:
1. Peter Chen notation
2. Bachman notation
Not surprisingly, Peter Chen and Bachman are the names of the inventors of the notation. The following table gives the notation.
[Table: for each COMPONENT (ENTITY OR OBJECT TYPE, e.g. PURCHASE ORDER; RELATIONSHIP; CARDINALITY; OPTIONALITY) the REPRESENTATION in PETER CHEN and in BACHMAN notation]
Example for Bachman notation

Example for Peter Chen notation .
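The ideas of entity keys, cardinality and optionality described in this section can also be expressed in code. The following is only a minimal sketch under stated assumptions: the class names, the field names (`chassis_no`, `emp_id`) and the helper `build_car` are hypothetical and chosen purely for illustration. It shows a one-to-many relationship (one company, many cars), a mandatory relationship (every car has a maker) and an optional relationship (an employee may have no spouse).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Company:
    name: str  # acts as the entity key in this sketch
    # "many" side of the one-to-many relationship; excluded from
    # repr/compare so the two-way link does not recurse.
    cars: List["Car"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Car:
    chassis_no: str  # hypothetical entity key
    maker: Company   # mandatory: a car is made by exactly one company


@dataclass
class Employee:
    emp_id: str                   # hypothetical entity key
    name: str                     # entity descriptor
    spouse: Optional[str] = None  # optional relationship: may be absent


def build_car(company: Company, chassis_no: str) -> Car:
    """Create a car and record it on both sides of the relationship."""
    car = Car(chassis_no=chassis_no, maker=company)
    company.cars.append(car)
    return car
```

Making `maker` a required field while `spouse` defaults to `None` mirrors the difference between mandatory and optional relationships; a real design would usually enforce these rules in the database schema rather than in application classes.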

Given below are a few examples of ER diagrams using Bachman notation. First the textual statement is given, followed by the diagram.
1. In a company, each division is managed by only one manager and each manager manages only one division.
2. Among the automobile manufacturing companies, a company manufactures many cars, but a given car is manufactured in only one company.
3. In a college, every student takes many courses and every course is taken by many students.

4. In a library, a member may borrow many books and there may be books which are not borrowed by any member.
5. A teacher teaches many students and a student is taught by many teachers. A teacher conducts examination for many students and a student is examined by many teachers.
6. An extension of example-3 above is that student-grades depend upon both student and the course. Hence it is an associative entity.
7. An employee can play the role of a manager. In that sense, an employee reports to another employee.

8. A tender is floated either for materials or services but not both.
9. A car consists of an engine and a chassis.

3.4 Data Dictionary
It contains
• an organised list of data elements, data structures, data flows, and data stores
• mini specifications of the primitive processes in the system
• any other details which will provide useful information on the system
Data element is a piece of data which can not be decomposed further in the current context of the system.

Examples are purchase_order_no., employee_name, interest_rate, etc. Each data element is a member of a domain. The dictionary entry of a data element should also specify the domain.
Data structure is composed of data elements or other data structures. Examples are Customer_details, which may be composed of Customer_name and Customer_address. Customer_address in turn is a structure. Another example is Invoice, which may be composed of Invoice_identification, Customer_details, Delivery_address and Invoice_details.
Data flow is composed of data structures and/or data elements. Definitions of dependent data structures/data elements precede the definition of data flow. While defining the data flow the connecting points should be mentioned. It is also useful to include the flow volume/frequency and growth rates.
Data store, like data flow, is made up of a combination of data structures and/or data elements. The description is similar to data flows.
The notation elements used in the data dictionary are the following:
• [spouse_name] This indicates that spouse_name is optional
• {dependent_name, relationship} * (0 to 15) This indicates that the data structure can be repeated 0 to 15 times
• {expense_description, company_name, charge} * (1 to N) This indicates that the data structure may be repeated 1 to N times, where N is not fixed
• voter_identity_number/customer_account_number This indicates that either of the elements will be present
Data dictionary also contains mini specifications. They state the ways in which data flows that enter the primitive process are transformed in to data flows that leave the process. Only the broad outline is given, not the detailed steps. They must exist for every primitive process. Structured english is used for stating minispecifications.
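The data dictionary notation (optional elements in square brackets, repeated structures in braces with a range, and alternatives separated by a slash) can be checked mechanically. The sketch below is illustrative only: the function names and the record layout (a `dependents` list inside a dictionary) are assumptions made here, and "either of the elements will be present" is read as "exactly one of the two", which the notes do not state explicitly.

```python
def check_repeat(rows, fields, low, high=None):
    """{fields} * (low to high); high=None stands for an open bound N."""
    if len(rows) < low:
        return False
    if high is not None and len(rows) > high:
        return False
    # Every repetition must carry exactly the named elements.
    return all(set(row) == set(fields) for row in rows)


def check_either(record, first, second):
    """first/second: assumed here to mean exactly one is present."""
    return (first in record) != (second in record)


def validate_person(record):
    """Validate a hypothetical record against the notation examples."""
    dependents = record.get("dependents", [])
    # [spouse_name] is optional, so its absence is never an error.
    return (
        check_repeat(dependents, ("dependent_name", "relationship"), 0, 15)
        and check_either(
            record, "voter_identity_number", "customer_account_number"
        )
    )
```

For example, a record holding only a `voter_identity_number` passes, while a record holding both identifiers, or one listing 16 dependents, is rejected.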

Once the DFD, ERD, and the Data dictionary are created, the three of them must be matched against each other. DFD and ERD can be created independently and in parallel.
• Every data store in the DFD must correspond to at least one entity in the ERD.
• There should be processes in DFD which create, modify and delete instances of the entities in ERD.
• For every relationship in ERD there should be a process in DFD which uses it.
• For every description in the data dictionary, there should be corresponding elements in DFD and ERD.

3.5 Decision Tree and Decision Tables
A decision tree represents complex decisions in the form of a tree. Though visually it is appealing, it can soon get out of hand when the number and complexity of decisions increase. An example is given below. First the textual statement is given and then the corresponding decision tree is given:
Rules for electricity billing are as below:
If the meter reading is "OK", calculate on consumption basis (i.e. meter reading)
If the meter reading appears "LOW", then check if the house is occupied
If the house is occupied, calculate on seasonal consumption basis, otherwise calculate on consumption basis
If the meter is damaged, calculate based on maximum possible electricity usage
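A decision tree of this kind translates directly into nested conditions. The sketch below assumes the electricity billing rules read as follows: an "OK" reading is billed on consumption; a "LOW" reading is billed on seasonal consumption if the house is occupied and on consumption otherwise; a damaged meter is billed on maximum possible usage. The function name, the status strings and the returned labels are hypothetical.

```python
def billing_basis(meter_status: str, house_occupied: bool = False) -> str:
    """Return the billing basis chosen by the decision tree."""
    if meter_status == "OK":
        return "consumption"
    if meter_status == "LOW":
        # Second-level branch: only reached for a suspiciously low reading.
        if house_occupied:
            return "seasonal consumption"
        return "consumption"
    if meter_status == "DAMAGED":
        return "maximum possible usage"
    raise ValueError(f"unknown meter status: {meter_status!r}")
```

Each `if` corresponds to one branch of the tree, which also shows why such trees become hard to follow once the number of conditions grows.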

There are two types of decision tables, binary-valued (yes or no) and multi-valued. An example follows:
ELECTRICITY BILL CALCULATION BASED ON CUSTOMER CLASS
If a customer uses electricity for domestic purposes and if the consumption is less than 300 units per month then bill with minimum monthly charges. Domestic customers with a consumption of 300 units or more per month are billed at special rate. Non-domestic users are charged double that of domestic users (minimum and special rates are double).

BINARY-VALUED DECISION TABLE
Domestic customer                    Y  Y  N  N
Consumption < 300 units per month    Y  N  Y  N
Minimum rate                         Y  N  N  N
Special rate                         N  Y  N  N
Double minimum rate                  N  N  Y  N
Double special rate                  N  N  N  Y

MULTI-VALUED DECISION TABLE
Customer       D      D      N      N
Consumption    ≥300   <300   ≥300   <300
Rate           S      M      2S     2M

Like decision trees, binary-valued decision tables can grow large if the number of rules increases. Multi-valued decision tables have an edge. In the above example, if we add a new class of customers, called Academic, with the rules: "If the consumption is less than 300 units per month then bill with concessional rates. Otherwise bill with twice the concessional rates", then the new tables will look like the following:

BINARY-VALUED DECISION TABLE (three rows and two columns are added to deal with the extra class of customers)
Domestic customer                    Y  Y  N  N  N  N
Academic                             N  N  N  N  Y  Y
Consumption < 300 units/month        Y  N  Y  N  Y  N
Minimum rate                         Y  N  N  N  N  N
Special rate                         N  Y  N  N  N  N
Twice minimum rate                   N  N  Y  N  N  N
Twice special rate                   N  N  N  Y  N  N
Concessional rate                    N  N  N  N  Y  N
Twice concessional rate              N  N  N  N  N  Y

MULTI-VALUED DECISION TABLE (only two columns are added to deal with the extra class of customers)
Customer       Domestic   Domestic   Non-domestic    Non-domestic    Academic             Academic
Consumption    ≥300       <300       ≥300            <300            ≥300                 <300
Rate           Special    Minimum    Twice special   Twice minimum   Twice concessional   Concessional

3.6 Structured English
To specify the processes (minispecifications) structured english is used. It consists of:
• sequences of instructions (action statements)
• decisions (if-else)
• loops (repeat-until)
• case
• groups of instructions
Examples:
Loose, normal english: In the case of 'Bill', a master file is updated with the bill (that is consumer account number and bill date). A control file is also to be updated for the 'total bill amount'. A similar treatment is to be given to 'Payment'.
Structured english:
If transaction is 'BILL' then
    update bill in the Accounts master file
    update total bill amount in the Control file
If transaction is 'PAYMENT' then
    update receipt in the Accounts master file
    update total receipt amount in the Control file
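The multi-valued decision table for customer classes in section 3.5 maps naturally onto a lookup table keyed by the condition values. The following is a minimal sketch: the names `RATE_TABLE` and `rate_for`, and the lower-case labels, are assumptions made for illustration; the rules themselves are the ones stated for domestic, non-domestic and academic customers.

```python
# Key: (customer class, consumption below 300 units per month?)
RATE_TABLE = {
    ("domestic", True): "minimum",
    ("domestic", False): "special",
    ("non-domestic", True): "twice minimum",
    ("non-domestic", False): "twice special",
    ("academic", True): "concessional",
    ("academic", False): "twice concessional",
}


def rate_for(customer_class: str, units_per_month: int) -> str:
    """Look up the billing rate for one customer and one month."""
    return RATE_TABLE[(customer_class, units_per_month < 300)]
```

Adding a new customer class means adding two entries to the table, which reflects the point made above that multi-valued tables grow more slowly than binary-valued ones.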

Another example:
If previous reading and new reading match then
    perform 'status-check'
    If status is 'dead' then
        calculate bills based on average consumption
    Else
        compute bill based on actual consumption
status-check
If meter does not register any change after switching on any electrical device then
    meter status is 'dead'
Else
    meter status is 'ok'

3.7 State Transition Diagram
Another useful diagram is the state transition diagram. It can be used to model the state changes of the system. A system is in a state and will remain in that state till a condition and an action force it to change state. See the following figure. Appendix contains another example.

4. Conclusion
The output of the requirements engineering phase is the software requirements specifications (SRS) document. At a minimum, it should contain the DFD, ERD, the data dictionary and the minispecifications. The other diagrams may be used as required. The standards body of the Institute of Electrical and Electronics Engineers (IEEE) has defined a set of recommended practices called "IEEE Recommended Practice for Software Requirements Specifications", IEEE standard 830-1998. It can be used as a guideline document for SRS.

5. Appendix
Case Study: Material Procurement System
XYZ is a company which manufactures fertilizers (Soil Nutrition Chemicals) and chemicals for agricultural use. The company has its head office in Delhi and a manufacturing plant in Surat. The manufacturing plant is looked after by a Director-Operations. The Director-Operations is supported by various departmental heads for executing day-to-day operations. Plant production, plant maintenance, stores, purchase, finance, personnel, fertilizers transportation and plant administration are some of the major departments. Our study is restricted to the purchase operations and related interfaces.
The Purchase Department's main function is to procure material for production, maintenance and other departments in time, so that there should not be any situation where the plant has to be shut down due to requirement of some material requested by the departments. Currently, all purchase operations are being carried out manually. At present, there are more than 300 Purchase Orders (POs) always open at any time (a closed Purchase Order is defined as a Purchase Order for which all material ordered is received and paid) and it has become impossible to keep track of activities and provide up-to-date status of these activities to the concerned departments.
Management has decided to support purchase operations with a computer. The idea behind seeking computer support is to access information quickly and monitor purchase activity efficiently. It is also expected that after computerisation, with the same manpower, future purchase load (related to diversification plans of the company) can also be met. The various procedures followed by the Purchase Department have already been studied and improved by a company approved management consultant and unless really necessary no procedures should be changed while providing the necessary computer support. An extract of the discussions with the purchase manager and his staff related to current procedures is enclosed.

An extract of discussions held with the purchase manager and his staff.
A. Overview of the Department
The Purchase Department is headed by a Purchase Manager and he reports directly to Director-Operations. The Purchase Manager is supported by Material Procurement Officers (Buyers) and clerical staff. The total strength of the department is currently 20.

B. Procedural Details Related to Purchase Activity
User Departments (i.e. Production, Maintenance, etc.) prepare their own material purchase request (MPR) and send these to purchase department as and when a requirement comes up. More than one material procurement detail can be given in the same MPR. Material Procurement details contain material code, description and quantity.
Purchase Department validates the MPR for material requested and MPRs received. MPRs are accepted by Purchase Department only if the required quantity is not available in stores (Stock Clearance by Stores Dept.) and concerned department has not exhausted allocated budgets for procurement. (Allocated Budgets are provided by Finance Department at the beginning of the financial year and are available with Purchase Department.) Stock Clearance from Stores Dept. is required only for non-stock items of specific class.
Whenever a buyer who handles a particular material group (25 material groups have been formed on the basis of similarity of material) gets time, he reviews MPR Register and consolidates (from various MPRs) material requirement for his groups. After this exercise, he raises an enquiry (request for quotation) with registered vendors (vendors are registered with the company for specific class of items). An enquiry can contain one or many materials of a material group (for example 1.5", 1" bolts etc., which have separate material codes, can be part of single enquiry). Every enquiry has only one closing date (i.e. last date of receipt of quotation from vendors).
Vendors submit technical and commercial quotation (2 stage bidding). If any technical discrepancy is found such quotations are rejected after closing date for an enquiry. All commercial quotations received are opened by the concerned buyer (in presence of Purchase Manager and Vendors) and afterwards are analysed for total value (basic cost of materials, taxes, insurance, packing, freight). On lowest cost basis an offer is selected and purchase order proposal is prepared. Sometimes in the organisation's interest, for an enquiry more than one Purchase Order is placed (for example, out of 4 materials involved in an enquiry, 2 are cheaper from one vendor and rest are cheaper from another vendor).
A PO proposal contains details of all materials, payment terms and all other relevant terms and conditions. This PO proposal is sent to Finance Dept. and a financial concurrence is obtained. PO is considered effective from the date of acceptance of PO by vendors. In a Purchase Order every material ordered has a due date for delivery. Also in some purchase orders, where quantity ordered is large (this situation is typical for bulk material procurement: cement, steel etc.), staggered delivery of material is mentioned.
Vendors deliver material (against PO) to Stores Department and submit an invoice to Purchase Department. On receiving "Acceptance" intimation from stores, Purchase Department checks and passes the invoice and payment advice to Finance Department for payment. While passing invoices, adjustments are made for any material which is rejected by stores (such data is mentioned by Stores in Material Receipt Intimation). In case of any discrepancy in the terms & conditions such invoice is rejected.

Environmental Model (Ref. Material Procurement System)
Event List
1. Receipt of Purchase Request from Departments
2. Receipt of Stock Clearance Data
3. Receipt of Quotations from Vendors

4. Order Acceptance from Vendors
5. Availability of Material Receipt Data
6. Invoice Submission by Vendors
7. Receipt of Financial Concurrence
8. Receipt of Allocated Budgets

Level -1 Data Flow Diagram .

Level 2 .Data Flow Diagram .

6. References
1. Software Engineering - Ian Sommerville, Addison-Wesley Publishers Ltd.
2. Software Requirements - Analysis and Specification - Alan M. Davis, Prentice Hall.
3. Introducing Systems Analysis - Steven Skidmore, BPB Publications.
4. Systems Analysis & Design Methods - J. L. Whitten, L. D. Bentley and V. M. Barlow, Richard D. Irwin Inc., Galgotia Publications Pvt. Ltd.

Reviews, Walkthroughs & Inspections

1. Formal Definitions
Quality Control (QC)
A set of techniques designed to verify and validate the quality of work products and observe whether requirements are met.
Software Element
Every deliverable or in-process document produced or acquired during the Software Development Life Cycle (SDLC) is a software element.
Verification and validation techniques
Verification - "Is the task done correctly?"
Validation - "Is the correct task done?"
Static Testing
V&V is done on any software element.
Dynamic Testing
V&V is done on executing the software with pre-defined test cases.

2. Importance of Static Testing
Why Static Testing?
The benefit is clear once you think about it. If you can find a problem in the requirements before it turns into a problem in the system, that will save time and money. The following statistics would be mind boggling.
M.E. Fagan, "Design and Code Inspections to Reduce Errors in Program Development", IBM Systems Journal, March 1976.
Systems Product: 67% of total defects during the development found in Inspection
Applications Product: 82% of all the defects found during inspection of design and code

A.F. Ackerman, L. Buchwald, and F. Lewski, "Software Inspections: An Effective Verification Process," IEEE Software, May 1989.
Operating System: Inspection decreased the cost of detecting a fault by 85%
Marilyn Bush, "Improving Software Quality: The Use of Formal Inspections at the Jet Propulsion Laboratory", Proceedings of the 12th International Conference on Software Engineering, pages 196-199, Nice, France, March 1990, IEEE Computer Society Press.
Jet Propulsion Laboratory Project: Every two-hour inspection session results, on an average, in saving of $25,000
The following three 'stories' should communicate the importance of Static Testing:
When my daughter called me a 'cheat'
My failure as a programmer
Loss of "Mars Climate Orbiter"
A Few More Software Failures - Lessons for others
The following diagram of Fagan (Advances in Inspections, IEEE Transactions on Software Engineering) captures the importance of Static Testing. The lesson learned could be summarized in one sentence - Spend a little extra earlier or spend much more later.

The 'statistics', the above stories and Fagan's diagram emphasize the need for Static Testing. It is appropriate to state that not all static testing involves people sitting at a table looking at a document. Sometimes automated tools can help. For C programmers, the lint program can help find potential bugs in programs. Java programmers can use tools like the JTest product to check their programs against a coding standard.
When to start the Static Testing?
To get value from static testing, we have to start at the right time. For example, reviewing the requirements after the programmers have finished coding the entire system may help testers design test cases. However, the significant return on the static testing investment is no longer available, as testers can't prevent bugs in code that's already written. For optimal returns, static testing should happen as soon as possible after the item to be tested has been created, while the assumptions and inspirations remain fresh in the creator's mind and none of the errors in the item have caused negative consequences in downstream processes.
Effective reviews involve the right people. Business domain experts must attend requirements reviews, system architects must attend design reviews, and expert programmers must attend code reviews. As testers, we can also be valuable participants, because we're good at spotting inconsistencies, vagueness, missing details, and the like. However, testers who attend review meetings do need to bring sufficient knowledge of the business domain, system architecture, and programming to each review. And everyone who attends a review, walkthrough or inspection should understand the basic ground rules of such events.
The following diagram of Sommerville (Software Engineering, 6th Edition) communicates where Static Testing starts.

3. Reviews
IEEE classifies Static Testing under three broad categories:
• Reviews
• Walkthroughs
• Inspections
What is "Reviews"?
A meeting at which the software element is presented to project personnel, managers, users, customers or other interested parties for comment or approval. The software element can be Project Plans, URS, SRS, Design Documents, code, Test Plans, User Manual.
What are objectives of "Reviews"?
To ensure that:
• The software element conforms to its specifications.
• The development of the software element is being done as per plans, standards, and guidelines applicable for the project.
• Changes to the software element are properly implemented and affect only those system areas identified by the change specification.
Reviews - Input
• A statement of objectives for the technical reviews
• The software element being examined
• Software project management plan
• Current anomalies or issues list for the software product
• Documented review procedures
• Earlier review report - when applicable
• Review team members should receive the review materials in advance and they come prepared for the meeting
• Check list for defects
Reviews - Meeting

• Examine the software element for adherence to specifications and standards
• Changes to software element are properly implemented and affect only the specified areas
• Record all deviations
• Assign responsibility for getting the issues resolved
• Review sessions are not expected to find solutions to the deviations.
• The areas of major concerns, status on previous feedback and review days utilized are also recorded.
• The review leader shall verify, later, that the action items assigned in the meeting are closed
Reviews - Outputs
• List of review findings
• List of resolved and unresolved issues found during the later re-verification

4. Walkthrough
Walkthrough - Definition
A technique in which a designer or programmer leads the members of the development team and other interested parties through the segment of the documentation or code and the participants ask questions and make comments about possible errors, violation of standards and other problems.
Walkthrough - Objectives
• To find defects
• To consider alternative implementations
• To ensure compliance to standards & specifications
Walkthrough - Input
• A statement of objectives
• The software element for examination
• Standards for the development of the software
• Distribution of materials to the team members, before the meeting
• Team members shall examine and come prepared for the meeting
• Check list for defects
Walkthrough - Meeting
• Author presents the software element
• Members ask questions and raise issues regarding deviations
• Discuss concerns, perceived omissions or deviations from the specifications
• Document the above discussions
• Record the comments and decisions
• The walk-through leader shall verify, later, that the action items assigned in the meeting are closed
Walkthrough - Outputs

• List of walk-through findings
• List of resolved and unresolved issues found during the later re-verification

5. Inspection
Inspection - Definition
A visual examination of software element to detect errors, violations of development standards and other problems. An inspection is very rigorous and the inspectors are trained in inspection techniques. Determination of remedial or investigative action for a defect is a mandatory element of a software inspection, although the solution should not be determined in the inspection meeting.
Inspection - Objectives
• To verify that the software element satisfies the specifications & conforms to applicable standards
• To identify deviations
• To collect software engineering data like defect and effort
• To improve the checklists, as a spin-off
Inspection - Input
• A statement of objectives for the inspection
• Having the software element ready
• Documented inspection procedure
• Current defect list
• A checklist of possible defects
• Arrange for all standards and guidelines
• Distribution of materials to the team members, before the meeting
• Team members shall examine and come prepared for the inspection
Inspection - Meeting
• Introducing the participants and describing their role (by the Moderator)
• Presentation of the software element by non-author
• Inspectors raise questions to expose the defects
• Recording defects - a defect list details the location, description and severity of the defect
• Reviewing the defect list - specific questions to ensure completeness and accuracy
• Making exit decision
  - Acceptance with no or minor rework, without further verification
  - Accept with rework verification (by inspection team leader or a designated member of the inspection team)
  - Re-inspect
Inspection - Output
• Defect list, containing a defect location, description and classification
• An estimate of rework effort and rework completion date

6. Comparison of Reviews, Walk-Throughs and Inspections
Objectives:
• Reviews - Evaluate conformance to specifications. Ensure change integrity
• Walkthroughs - Detect defects. Examine alternatives. Forum for learning
• Inspections - Detect and identify defects. Verify resolution
Group Dynamics:
• Reviews: 3 or more persons. Technical experts and peer mix
• Walkthroughs: 2 to 7 persons. Technical experts and peer mix
• Inspections: 3 to 6 persons. Documented attendance, with formally trained inspectors
Decision Making & Change Control:
• Reviews: Review Team requests Project Team leadership or management to act on recommendations
• Walkthroughs: All decisions made by producer. Change is prerogative of the author
• Inspections: Team declares exit decision - Acceptance, Rework & Verify or Rework & Re-inspect
Material Volume:
• Reviews: Moderate to high
• Walkthroughs: Relatively low
• Inspections: Relatively low
Presenter:
• Reviews: Software Element Representative
• Walkthroughs: Author
• Inspections: Other than author

7. Advantages
Advantages of Static Methods over Dynamic Methods
• Early detection of software defects
• Static methods expose defects, whereas dynamic methods show only the symptom of the defect
• Static methods expose a batch of defects, whereas it is usually one by one in dynamic methods
• Some defects can be found only by Static Testing
  o Code redundancy (when logic is not affected)
  o Dead code
  o Violations of coding standards

8. Metrics for Inspections

Fault Density
• Specification and Design: Faults per page
• Code: Faults per 1000 lines of code
Fault Detection Rate
Faults detected per hour
Fault Detection Efficiency
Faults detected per person-hour
Inspection Efficiency
(Number of faults found during inspection) / (Total number of faults during development)
Maintenance Vs Inspection
"Number of corrections during the first six months of operational phase" and "Number of defects found in inspections" for different projects of comparable size

9. Common Issues for Reviews, Walk-throughs and Inspections
Responsibilities for Team Members
• Leader: To conduct / moderate the review / walkthrough / inspection process
• Reader: To present the relevant material in a logical fashion
• Recorder: To document defects and deviations
• Other Members: To critically question the material being presented
Communication Factors
• Discussion is kept within bounds
• Not extreme in opinion
• Reasonable and calm
• Not directed at a personal level
• Concentrate on finding defects
• Not get bogged down with disagreement
• Not discuss trivial issues
• Not involve itself in solution-hunting
• The participants should be sensitive to each other by keeping the synergy of the meeting very high
Being aware of, and correcting, any conditions, either physical or emotional, that are draining off the participants' attention shall ensure that the meeting is fruitful, i.e. the maximum number of defects is found during the early stages of the software development life cycle.
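The inspection metrics defined in section 8 are simple ratios, so they are easy to compute from the data recorded in each inspection. The following sketch is illustrative only: the function and parameter names are assumptions made here, and the guard against a zero denominator is an addition not discussed in the notes.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Divide, rejecting a zero or negative denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def fault_density(faults: int, size: float) -> float:
    """Faults per page (documents) or per 1000 lines of code (code)."""
    return _ratio(faults, size)


def fault_detection_rate(faults: int, hours: float) -> float:
    """Faults detected per hour of inspection."""
    return _ratio(faults, hours)


def fault_detection_efficiency(faults: int, person_hours: float) -> float:
    """Faults detected per person-hour spent."""
    return _ratio(faults, person_hours)


def inspection_efficiency(found_in_inspection: int, total_faults: int) -> float:
    """Share of all development faults that inspections caught."""
    return _ratio(found_in_inspection, total_faults)
```

For instance, if 82 of 100 faults recorded during development were found in inspections, `inspection_efficiency(82, 100)` gives 0.82.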

10. Basic References
1. IEEE Standards for Software Reviews, IEEE Std 1028-1997
2. Daniel P. Freedman and Gerald M. Weinberg, Handbook of Walkthroughs, Inspections, and Technical Reviews (Third Edition), Dorset House, 1990
Literature References for further reading:
1. M.E. Fagan, "Design and Code Inspections to Reduce Errors in Program Development", IBM Systems Journal, vol. 15, no 3, March 1976, pp. 182-211.
2. M.E. Fagan, "Advances In Software Inspections," IEEE Transactions on Software Engineering, vol. 12, no 7, July 1986, pp. 744-751.
3. A.F. Ackerman, L. Buchwald, and F. Lewski, "Software Inspections: An Effective Verification Process," IEEE Software, vol. 6, no 3, May 1989, pp. 31-36.
4. Marilyn Bush, "Improving Software Quality: The Use of Formal Inspections at the Jet Propulsion Laboratory", Proceedings of the 12th International Conference on Software Engineering, Nice, France, March 1990, IEEE Computer Society Press, pp. 196-199.
5. E. P. Doolan, "Experience with Fagan's Inspection Method," Software-Practice and Experience, vol. 22, no 2, February 1992, pp. 173-182.
6. G. W. Russell, "Experience with inspection in ultralarge-scale developments", IEEE Software, vol. 8, no 1, January 1991, pp. 25-31.
7. Priscilla J. Fowler, "In-process inspections of workproducts at AT&T", AT&T Technical Journal, vol. 65, no 2, March 1986, pp. 102-112.

Software Development Process

1. Introduction
Computers are becoming a key element in our daily lives. Slowly and surely they are taking over many of the functions that affect our lives critically. They are now controlling all forms of monetary transactions, manufacturing, transportation, communication, defence systems, process control systems, and so on. In the near future, they will be found in our homes, controlling all forms of appliances. Left to themselves, they are harmless pieces of hardware. Load the right kind of software, and they can take you to the moon, both literally and figuratively. It is the software that gives life to them.
When they are going to play such a crucial role, one small flaw either in the hardware or the software can lead to catastrophic consequences. The sad part is, while there are well defined processes based on theoretical foundations to ensure the reliability of the hardware, the same thing can not be said about software. There is no theory for software development as yet. But at the same time, it is mandatory that software always behaves in a predictable manner, even in unforeseen circumstances. Hence there is a need to control its development through a well defined and systematic process. The old fashioned 'code & test' approach will not do any more. It may be good enough for 'toy' problems, but in real life, software is expected to solve enormously complex problems.
Some of the aspects of real life software projects are:
• Team effort: Any large development effort requires the services of a team of specialists. For example the team could consist of domain experts, software design experts, coding specialists, testing experts, hardware specialists, etc. Each group could concentrate on a specific aspect of the problem and design a suitable solution. However no group can work in isolation. There will be constant interaction among team members.
• Methodology: Broadly there are two types of methodologies, namely, 'procedure oriented methodologies' and 'object oriented methodologies'. Though theoretically either of them could be used in any given problem situation, one of them should be chosen in advance.
• Documentation: Clear and unambiguous documentation of the artifacts of the development process is critical for the success of the software project. Oral communication and 'back of the envelope designs' are not sufficient. For example, documentation is necessary if client signoff is required at various stages of the process. Once developed, the software lives for a long time. During its life, it has to undergo a lot of changes. Without clear design specifications and well documented code, it will be impossible to make changes.
• Planning: Since the development takes place against a client's requirements it is imperative that the whole effort is well planned to meet the schedule and cost constraints.
• Quality assurance: Clients expect value for money. In addition to meeting the client's requirements, the software also should meet additional quality constraints. They could be in terms of performance, security, etc.
• Lay user: Most of the time, these software packages will be used by non-computer savvy users. Hence the software has to be highly robust.
• Software tools: Documentation is important for the success of a software project, but it is a cumbersome task and many software practitioners balk at the prospect of documentation. There are tools known as Computer Aided Software Engineering (CASE) tools which simplify the process of documentation.
• Conformance to standards: We need to follow certain standards to ensure clear and unambiguous documentation. For example, IEEE standards for requirements

mathematical libraries.• • • • • specifications. it is important that the user gets the right copy of the software. But. Non-developer maintenance: Software lives for a long time. it is necessary to analyse its impact on various parts of the software. inadequate resources. graphical user interface tool kits. 2. a little less reliability may be acceptable. Every function that accesses the variable will be effected. Sometimes. Subject to risks: Any large effort is subject to risks. Imagine modifying the value of global variable. It is necessary to constantly evaluate the risks. in a weather monitoring system. the software may not behave as expected. etc. Reuse: The development effort can be optimised. If the software is going to be used in life critical situations. memory usage Correctness is the most important attribute. by reusing well-tested components. The other attributes may be present in varying degrees. The development team. Some other team will have to ensure that the software continues to provide services. However. it should be possible to roll back to the previous versions. etc. accuracy. The risks could be in terms of non-availability of skills. clients may specify the standards to be used. technology. In case of failures. etc. and put in place risk mitigation measures. Version control: Once changes are made to the software. Unless care is taken to minimise the impact. it is an expensive proposition to make a software 100% reliable and it is not required in all contexts. the final . even when unexpected inputs are given Usability: Ease of use. Software Quality The goal of any software development process is to produce high quality software. Change management: Whenever a change has to be made. Every software must be correct. For example. For example. What is software quality? 
It has been variously defined as: • • • • Fitness for purpose Zero defects Conformability & dependability The ability of the software to meet customer's stated and implied needs Some of the important attributes that can be used to measure software quality are: • • • • • • • • • • • Correctness: Software should meet the customer's needs Robustness: Software must always behave in an expected manner. EJBs. design. say. may not be available to maintain the package. A software with a graphical user interface is considered more user-friendly than one without it Portability: The ease with which software can be moved from one platform to another Efficiency: Optimal resource (memory & execution time) utilization Maintainability: Ease with which software can be modified Reliability: The probability of the software giving consistent results over a period of time Flexibility: Ease with which software can be adapted for use in different contexts Security: Prevention of unauthorised access Interoperabilty: The abilty of the software to integrate with existing systems Performance: The ability of the software to deliver the outputs with in the given constraints like time. then 100% reliability is mandatory.

it was possible to access the internal architecture directly to improve performance.decision lies with the client. In the days.100% Inspection .1 Exercise . One should keep in mind that some of the above attributes conflict with each other. For example. 3. when DOS ruled the world. To port such a program to any other platform would require enormous changes. one may resort to using system dependent features. What is a Process 3. but that will effect the portability. portability and efficiency could conflict with each other. To improve efficiency. So in practice there will always be a tradeoff.

3.2 What is a Process?
A Process is a series of definable, repeatable, and measurable tasks leading to a useful result. The benefits of a well defined process are numerous.
• It provides visibility into a project. Visibility in turn aids timely mid-course corrections
• It helps developers to weed out faults at the point of introduction. This avoids cascading of faults into later phases
• It helps to organize workflow and outputs to maximize resource utilization
• It defines everybody's roles and responsibilities clearly
• Individual productivity increases due to specialization and at the same time the team's productivity increases due to coordination of activities
A good software development process should:
• view software development as a value added business activity and not merely as a technical activity
• ensure that every product is checked to see if value addition has indeed taken place
• safeguard against loss of value once the product is complete
• provide management information for in-situ control of the process
To define such a process the following steps need to be followed:

• Identify the phases of development and the tasks to be carried out in each phase
• Model the intra and inter phase transitions
• Use techniques to carry out the tasks
• Verify and Validate each task and the results
• Exercise process and project management skills
The words 'verify' and 'validate' need some clarification. Verify means to check if the task has been executed correctly, while validate means to check if the correct task has been executed. In the context of software, the process of checking if an algorithm has been implemented correctly is verification, while the process of checking if the result of the algorithm execution is the solution to the desired problem is validation.

The generic phases that are normally used in a software development process are:
• Analysis: In this phase user needs are gathered and converted into software requirements. For example, if the user need is to generate the trajectory of a missile, the software requirement is to solve the governing equations. This phase should answer the question: what is to be done to meet user needs?
• Design: This phase answers the question: How to meet the user needs? With respect to the above example, design consists of deciding the algorithm to be used to solve the governing equations. The choice of the algorithm depends on design objectives like execution time, accuracy, etc. In this phase we determine the organisation of various modules in the software system
• Construction: Coding is the main activity in this phase
• Testing: There are three categories of testing: unit testing, integration testing, and system testing. There are two types of testing: Black box testing and White box testing. Black box testing focuses on generating test cases based on requirements. White box testing focuses on generating test cases based on the internal logic of various modules
• Implementation

4. Software Life Cycle Models
In practice, two types of software life cycle models are used - sequential model and iterative model.

4.1 Waterfall Model
Sequential model, also known as waterfall model, is pictorially shown thus:
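As a rough illustration of the sequential ordering, the sketch below uses the generic phases listed above. The `Phase` and `WaterfallProject` names are hypothetical, and the rule that a phase can only be completed once every earlier phase is complete is an assumption about how a strictly sequential model is usually enforced.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Generic phases, in the order a sequential model visits them."""
    ANALYSIS = 1
    DESIGN = 2
    CONSTRUCTION = 3
    TESTING = 4
    IMPLEMENTATION = 5


class WaterfallProject:
    """Hypothetical tracker that only lets phases complete in order."""

    def __init__(self) -> None:
        self.completed: list[Phase] = []

    @property
    def finished(self) -> bool:
        return len(self.completed) == len(Phase)

    def complete(self, phase: Phase) -> None:
        if self.finished:
            raise ValueError("all phases are already complete")
        # The next phase allowed is the one right after the last completed one.
        expected = Phase(len(self.completed) + 1)
        if phase != expected:
            raise ValueError(
                f"cannot complete {phase.name}: {expected.name} is not complete yet"
            )
        self.completed.append(phase)


project = WaterfallProject()
project.complete(Phase.ANALYSIS)
project.complete(Phase.DESIGN)
try:
    project.complete(Phase.TESTING)  # skips CONSTRUCTION
except ValueError as error:
    print(error)
```

Modelling the phases as an ordered enumeration keeps the "no skipping" rule in one place, which is also what makes the model rigid when requirements change late.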

It represents the development process as a sequence of steps (phases). It requires that a phase is complete before the next phase is started. Because of the explicit recognition of phases and sequencing, it helps in contract finalisation with reference to delivery and payment schedules. In practice it is difficult to use this model as it is, because of the uncertainty in the software requirements. It is often difficult to envisage all the requirements a priori. If a mistake in understanding the requirements gets detected during the coding phase, then the whole process has to be started all over again. A working version of the software will not be available until late in the project life cycle. So, iteration both within a phase and across phases is a necessity.

4.2 Prototyping
Prototyping is discussed in the literature as a separate approach to software development. Prototyping, as the name suggests, requires that a working version of the software is built early in the project life. There are two types of prototyping models, namely:
• Throw away prototype and
• Evolutionary prototype
The objective of the throw away prototyping model is to understand the requirements and solution methodologies better. The essence is speed. Hence, an ad-hoc and quick development approach, with no thought to quality, is resorted to. It is akin to 'code and test'. However, once the objective is met, the code is discarded and fresh development is started, ensuring that quality standards are met. Since the requirements are now well understood, one could use the sequential approach. This model suffers from wastage of effort, in the sense that developed code is discarded because it does not meet the quality standards.

Evolutionary prototyping takes a different approach. The requirements are prioritised and the code is developed for the most important requirements first, always with an eye on quality. Software is continuously refined and expanded with feedback from the client. The chief advantage of prototyping is that the client gets a feel of the product early in the project life cycle.
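The evolutionary approach described above can be sketched as a loop over a prioritised backlog. This is only an illustration: the `Requirement` class, the `evolve` function, the numeric priority scale and the fixed batch size are all hypothetical choices, not part of any prescribed method.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Requirement:
    name: str
    priority: int  # assumed scale: 1 is the most important


def evolve(
    requirements: list[Requirement],
    batch_size: int,
    get_feedback: Callable[[int, list[str]], list[Requirement]],
) -> list[str]:
    """Grow the product in increments, most important requirements first."""
    backlog = sorted(requirements, key=lambda r: r.priority)
    product: list[str] = []
    iteration = 0
    while backlog:
        iteration += 1
        # Build the next increment from the highest-priority items.
        increment, backlog = backlog[:batch_size], backlog[batch_size:]
        product.extend(r.name for r in increment)
        # Client feedback on the working version may add new requirements.
        new_items = get_feedback(iteration, list(product))
        backlog = sorted(backlog + new_items, key=lambda r: r.priority)
    return product


def client_feedback(iteration: int, product: list[str]) -> list[Requirement]:
    # Hypothetical client: asks for an export feature after the first increment.
    return [Requirement("export", 2)] if iteration == 1 else []


initial = [Requirement("reports", 3), Requirement("login", 1), Requirement("search", 2)]
print(evolve(initial, batch_size=2, get_feedback=client_feedback))
# ['login', 'search', 'export', 'reports']
```

Unlike a throw away prototype, nothing built in an earlier iteration is discarded here; each increment is added to the same product.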

As can be seen, evolutionary prototyping is an iterative model. Such a model can be characterised by doing a little analysis, design, code, test and repeating the process till the product is complete.

4.3 Spiral Model
Barry Boehm has suggested another iterative model called the spiral model. It is more in the nature of a framework, which needs to be adapted to specific projects. Pictorially it can be shown thus:

It allows the best mix of other approaches and focusses on eliminating errors and unattractive alternatives early. An important feature of this model is the stress on risk analysis. Once the objectives, alternatives, and constraints for a phase are identified, the risks involved in carrying out the phase are evaluated, which is expected to result in a 'go, no go' decision. For evaluation purposes, one could use prototyping, simulations, etc. This model is best suited for projects which involve new technology development. Risk analysis expertise is most critical for such projects.
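One way to picture the 'go, no go' decision is the small sketch below. It assumes that each risk is scored by its exposure (probability multiplied by impact), a commonly used measure that the text above does not prescribe, and that the phase proceeds only if the total exposure stays within a limit agreed in advance. The `Risk` class, the `go_no_go` function, the sample risks and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    probability: float  # chance of the risk occurring, between 0.0 and 1.0
    impact: float       # cost if it occurs, in arbitrary units

    @property
    def exposure(self) -> float:
        # Assumed scoring rule: exposure = probability * impact.
        return self.probability * self.impact


def go_no_go(risks: list[Risk], max_total_exposure: float) -> tuple[str, float]:
    """Return the decision for a phase together with the total exposure."""
    total = sum(risk.exposure for risk in risks)
    decision = "go" if total <= max_total_exposure else "no go"
    return decision, total


phase_risks = [
    Risk("new technology is unproven", probability=0.5, impact=40),
    Risk("required skills not available", probability=0.25, impact=20),
]
print(go_no_go(phase_risks, max_total_exposure=30))  # ('go', 25.0)
print(go_no_go(phase_risks, max_total_exposure=20))  # ('no go', 25.0)
```

In a real project the probabilities and impacts would come from the prototyping or simulation work mentioned above rather than from fixed numbers.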

4.4 ETVX model
IBM introduced the ETVX model during the 1980s to document their processes. 'E' stands for the entry criteria which must be satisfied before a set of tasks can be performed, 'T' is the set of tasks to be performed, 'V' stands for the verification & validation process to ensure that the right tasks are performed, and 'X' stands for the exit criteria or the outputs of the tasks. If an activity fails in the validation check, either corrective action is taken or a rework is ordered. It can be used in any development process. Each phase in the process can be considered as an activity and structured using the ETVX model. If required, the tasks can be further subdivided and each subtask can be further structured using the ETVX model.

4.5 Rational Unified Process Model
Among the modern process models, Rational Unified Process (RUP) developed by Rational Corporation is noteworthy. It is an iterative model and captures many of the best practices of modern software development. RUP is explained more fully in the module OOAD with UML.

4.6 Agile Methodologies
All the methodologies described before are based on the premise that any software development process should be predictable and repeatable. One of the criticisms against these methodologies is that there is more emphasis on following procedures and preparing documentation. They are considered to be heavyweight or rigorous. They are also criticised for their excessive emphasis on structure. There is a movement called the Agile Software Movement, questioning this premise. The proponents argue that, software development being essentially a human activity, there will always be variations in processes and inputs and the model should be flexible enough to handle the variations. For example: the entire set of software requirements cannot be known at the beginning of the project nor do they remain static. If the model cannot handle this dynamism, then there can be a lot of wastage of effort or the final product may not meet the customer's needs. Hence the agile methodologies advocate the principle "build short, build often". That is, the given project is broken up into subprojects and each subproject is developed and integrated into the already delivered system. This way the customer gets continuous delivery of useful and usable systems. The subprojects are chosen so that they have short delivery cycles, usually of the order of 3 to 4 weeks. The development team also gets continuous feedback.

A number of agile methodologies have been proposed. The more popular among them are SCRUM, DYNAMIC SYSTEMS DEVELOPMENT METHOD (DSDM), CRYSTAL METHODS, FEATURE-DRIVEN DEVELOPMENT, LEAN DEVELOPMENT (LD), EXTREME PROGRAMMING (XP). A short description of each of these methods follows:

• SCRUM: It is a project management framework. It divides the development into short cycles called sprint cycles in which a specified set of features are delivered. It advocates daily team meetings for coordination and integration.
• DYNAMIC SYSTEMS DEVELOPMENT METHOD (DSDM): It is characterised by nine principles:
1. Active user involvement
2. Team empowerment
3. Frequent delivery of products
4. Fitness for business purpose
5. Iterative and incremental development
6. All changes during development are reversible
7. Baselining of requirements at a high level
8. Integrated testing
9. Collaboration and cooperation between stakeholders
• CRYSTAL METHODOLOGIES: They are a set of configurable methodologies. They focus on the people aspects of development. The configuration is carried out based on project size, criticality and objectives. Some of the names used for the methodologies are Clear, Yellow, Orange, Orange web, Red, etc.
• FEATURE DRIVEN DEVELOPMENT (FDD): It is a short iteration framework for software development. It focuses on building an object model, build feature list, plan by feature, design by feature, and build by feature.
• LEAN DEVELOPMENT (LD): This methodology is derived from the principles of lean production, the restructuring of the Japanese automobile manufacturing industry that occurred in the 1980s. It is based on the following principles of lean thinking: Eliminate waste, Amplify learning, Decide as late as possible, Deliver as fast as possible, Empower the team, Build the integrity in, See the whole.
• EXTREME PROGRAMMING (XP): This methodology is probably the most popular among the agile methodologies. It is based on three important principles, viz., test first, continuous refactoring, and pair programming. One of the important concepts popularised by XP is pair programming. Code is always developed in pairs. While one person is keying in the code, the other person would be reviewing. The paper by Laurie Williams et al. demonstrates the efficacy of pair programming.
The site agilealliance.com is dedicated to promoting agile software development methodologies.

5. How to choose a process
Among the plethora of available processes, how can we choose one? There is no single answer to this question. Probably the best way to attack this problem is to look at the software requirements.

• If they are stable and well understood, then waterfall model may be sufficient.
• If they are stable, but not clear, then throw away prototyping can be used.
• Where the requirements are changing, evolutionary prototyping is better.
• If the requirements are coupled with the underlying business processes, which are going through a process of change, then a model based on Boehm's spiral model, like the Rational Unified Process, should be used.
• In these days of dynamic business environment, where 'time to market' is critical and project size is relatively small, an agile process should be chosen.
These are but guidelines only. Many organisations choose a model and adapt it to their business requirements. For example, some organisations use the waterfall model, modified to include iterations within phases.

6. Conclusions
The most important take away from this module is that software development should follow a disciplined process. The choice of the process should depend upon the stability of the requirements, completeness of requirements, underlying business processes, organisational structure, and the prevailing business environment.

8. References For Further Reading
In addition to the links provided, the following references may be consulted:
1. "An Integrated Approach to Software Engineering", by Pankaj Jalote, Springer-Verlag
2. "Software Engineering: A Practitioner's Approach", by Roger S. Pressman, McGraw-Hill, Inc.
3. "Software Engineering", by Ian Sommerville, Addison-Wesley Publishing Company
4. "The Rational Unified Process, An Introduction", by Philippe Kruchten, Addison-Wesley Publishing Company
5. "Agile Software Development Ecosystems", by Jim Highsmith, Addison-Wesley Publishing Company

Software Maintenance

1. What is Software Maintenance

1.1 Introduction
Software maintenance is often considered to be (if it is considered at all) an unpleasant, time consuming, expensive and unrewarding occupation - something that is carried out at the end of development only when absolutely necessary (and hopefully not very often). As such it is often considered to be the “poor relation” of software development - budgets often do not allow for it (or allow too little), and few programmers given a choice would choose to carry out maintenance over developmental work. This view, that software maintenance is “the last resort”, is largely born out of ignorance. Misconceptions, misunderstandings and myths concerning this crucial area of software engineering abound.

Software Maintenance suffers from an “image problem” due to the fact that although software has been maintained for years, relatively little is written about the topic. Little funding for research about software maintenance exists, thus, the academic community publishes relatively few papers on the subject. Maintenance organisations within business publish even less because of the corporate fear of giving away the "competitive edge". Although there are some textbooks on software maintenance, they are relatively few and far between (examples are included in the bibliography). Periodicals address the topic infrequently and few universities include software maintenance explicitly in their degree programmes. This lack of published information contributes to the misunderstandings and misconceptions that people have about software maintenance.

Part of the confusion about software maintenance relates to its definition; when it begins, when it ends and how it relates to development. Therefore it is necessary to first consider what is meant by the term software maintenance.

What is Software Maintenance? In order to define software maintenance we need to define exactly what is meant by software. It is a common misconception to believe that software is programs and that maintenance activities are carried out exclusively on programs. This is because many software maintainers are more familiar with, or rather are more exposed to, programs than other components of a software system - usually it is the program code that attracts most attention.

1.2 A Definition of Software
A more comprehensive view of software is given by McDermid (1991) who states that it “consists of the programs, documentation and operating procedures by which computers can be made useful to man”. Table 1.1 below depicts the components of a software system according to this view and includes some examples of each. McDermid's definition suggests that software is not only concerned with programs - source and object code - but also relates to documentation of any facet of the program, such as requirements analysis, specification, design, system and user manuals, and the procedures used to set up and operate the software system. McDermid's is not the only definition of a software system but it is comprehensive and widely accepted.

Table 1.1
Software Components      Examples
Program                  1. Source code
                         2. Object code
Documentation            1. Analysis/specification: (a) Formal specification (b) Context diagram (c) Data flow diagrams
                         2. Design: (a) Flowcharts (b) Entity-relationship charts
                         3. Implementation: (a) Source code listings (b) Cross-reference listing
                         4. Testing: (a) Test data (b) Test results
Operating Procedures     1. Instructions to set up and use the software system
                         2. Instructions on how to react to system failures

1.3 A Definition of Maintenance
The use of the word maintenance to describe activities undertaken on software systems after delivery has been considered a misnomer due to its failure to capture the evolutionary tendency of software products. Maintenance has traditionally meant the upkeep of an artifact in response to the gradual deterioration of parts due to extended use, which is simply corrective maintenance. So for example, one carries out maintenance on a car or a house usually to correct problems, e.g. replacing the brakes or fixing the leaking roof. If however we were to build an extension to the house or fit a sun roof to a car then those would usually be thought of as improvements (rather than maintenance activities). Therefore to apply the traditional definition of maintenance in the context of software means that software maintenance is only concerned with correcting errors. However, correcting errors accounts for only part of the maintenance effort. Consequently, a number of authors have advanced alternative terms that are considered to be more inclusive and encompass most, if not all, of the activities undertaken on existing software to keep it operational and acceptable to the users. These include “software evolution”, “post-delivery evolution” and “support”. However it can be argued that there is nothing wrong with using the word maintenance provided software engineers are educated to accept its meaning within the software engineering context regardless of what it means in non-software engineering disciplines. After all, any work that needs to be done to keep a software system at a level considered useful to its users will still have to be carried out regardless of the name it is given.

1.4 Definitions of Software Maintenance
As a result of the above, attempts have been made to develop a more comprehensive definition of maintenance which would be appropriate for use within the context of software systems. Some definitions focus on the particular activities carried out during maintenance. For example, according to Martin and McClure (1983), software maintenance must be performed in order to
• Correct errors
• Correct design flaws
• Interface with other systems
• Make enhancements
• Make necessary changes to the system
• Make changes in files or databases
• Improve the design
• Convert programs so that different hardware, software, system features, and telecommunications facilities can be used
Others insist on a general view which considers software maintenance as “any work that is undertaken after delivery of a software system”, for example:
• ". . . changes that have to be made to computer programs after they have been delivered to the customer or user." (Martin and McClure 1983).
• "Maintenance covers the life of a software system from the time it is installed until it is phased out." (von Mayrhauser 1990).
The problem with these definitions is that they fail to indicate what is actually done during maintenance. Also the common theme of the above definitions is that maintenance is an "after-the-fact" activity. Based on these definitions, no maintenance activities occur during the software development effort (the pre-delivery stage of the life cycle of a software system). Maintenance occurs after the product is in operation (during the post-delivery stage). During development little consideration is made to the maintenance phase which is to follow, the developers considering their work done once the system is handed over to users. Schneidewind (1987) believes that maintenance is difficult because of this shortsighted view that maintenance is a post-delivery activity.

Other perspectives include:
• the 'bug-fixing' view which considers software maintenance as an activity involving the detection and correction of errors found in programs,
• the 'need-to-adapt' view which sees maintenance as an activity which entails changing the software when its operational environment or original requirement changes,
• the 'support' to users view where maintenance of software is seen as providing a service to users of the system.
The IEEE software maintenance standard, IEEE STD 1219-1993, which draws on these different views, defines software maintenance as: Modification of a software product after delivery, to correct faults, to improve performance or other attributes, or to adapt the product to a modified environment (Van Edelstein 1993).

Software maintenance can also be defined as follows:
• Maintenance is the activity associated with keeping operational computer systems continuously in tune with requirements of users & data processing operation.
• Software Maintenance is any work done to a computer program after its first installation and implementation in an operational environment.
The maintenance of software systems is motivated by a number of factors:
• To provide continuity of service: This entails fixing bugs, recovering from failures, and accommodating changes in the operating system and hardware.
• To support mandatory upgrades: These are usually caused by changes in government regulations, and also by attempts to maintain a competitive edge over rival products.
• To support user requests for improvements: Examples include enhancement of functionality, better performance and customisation to local working patterns.
• To facilitate future maintenance work: This usually involves code and database restructuring, and updating documentation.

1.5 Maintenance image problems
The inclusion of the word maintenance in the term software maintenance has been linked to the negative image associated with the area. Higgins (1988) describes the problem:
"...programmers...tend to think of program development as a form of puzzle solving, and it is reassuring to their ego when they manage to successfully complete a difficult section of code. Software maintenance, on the other hand, entails very little new creation and is therefore categorised as dull, unexciting detective work."
Similarly, Schneidewind (1987) contends that to work in maintenance has been akin to having bad breath. Further, some authors argue that the general lack of consensus on software maintenance terminology has also contributed to the negative image associated with it.

2. Why is Software Maintenance necessary

2.1 Overview
In order to answer this question we need to consider what happens when the system is delivered to the users. The users operate the system and may find things wrong with it, or identify things they would like to see added to it. Via management feedback the maintainer makes the approved corrections or improvements and the improved system is delivered to the users. The cycle then repeats itself, thus perpetuating the loop of maintenance and extending the life of the product. In most cases the maintenance phase ends up being the longest process of the entire life cycle, and so far outweighs the development phase in terms of time and cost. Figure 1.1 shows the lifecycle of maintenance on a software product and why (theoretically) it may be never ending.

Figure 1.1 The Maintenance Lifecycle

Lehman's (1980) first two laws of software evolution help explain why the Operations and Maintenance phase can be the longest of the life-cycle processes. His first law is the Law of Continuing Change, which states that a system needs to change in order to be useful. The second law is the Law of Increasing Complexity, which states that the structure of a program deteriorates as it evolves. Over time, the structure of the code degrades until it becomes more cost-effective to rewrite the program.

3. Types of Software Maintenance
In order for a software system to remain useful in its environment it may be necessary to carry out a wide range of maintenance activities upon it. Swanson (1976) was one of the first to examine what really happens during maintenance and was able to identify three different categories of maintenance activity.

3.1 Corrective
Changes necessitated by actual errors (defects or residual "bugs") in a system are termed corrective maintenance. These defects manifest themselves when the system does not operate as it was designed or advertised to do. A defect or “bug” can result from design errors, logic errors and coding errors. Design errors occur when, for example, changes made to the software are incorrect, incomplete, wrongly communicated or the change request misunderstood. Logic errors result from invalid tests and conclusions, incorrect implementation of design specification, faulty logic flow or incomplete test data. Coding errors are caused by incorrect implementation of detailed logic design and incorrect use of the source code logic. Defects are also caused by data processing errors and system performance errors. All these errors, sometimes called “residual errors” or “bugs”, prevent the software from conforming to its agreed specification.

In the event of a system failure due to an error, actions are taken to restore operation of the software system. The approach here is to locate the original specifications in order to determine what the system was originally designed to do. However, due to pressure from management, maintenance personnel sometimes resort to emergency fixes known as “patching”. The ad hoc nature of this approach often gives rise to a range of problems that include increased program complexity and unforeseen ripple effects. Increased program complexity usually arises from degeneration of program structure which makes the program increasingly difficult, if not impossible, to comprehend. This state of affairs can be referred to as the “spaghetti syndrome” or “software fatigue”. Unforeseen ripple effects imply a change to one part of a program may affect other sections in an unpredictable fashion. This is often due to lack of time to carry out a thorough “impact analysis” before effecting the change. Corrective maintenance has been estimated to account for 20% of all maintenance activities.

3.2 Adaptive
Any effort that is initiated as a result of changes in the environment in which a software system must operate is termed adaptive change. Adaptive change is a change driven by the need to accommodate modifications in the environment of the software system, without which the system would become increasingly less useful until it became obsolete. The term environment in this context refers to all the conditions and influences which act from outside upon the system, for example business rules, government policies, work patterns, software and hardware operating platforms. A change to the whole or part of this environment will warrant a corresponding modification of the software. Unfortunately, with this type of maintenance the user does not see a direct change in the operation of the system, but the software maintainer must expend resources to effect the change. This task is estimated to consume about 25% of the total maintenance activity.

3.3 Perfective
The third widely accepted task is that of perfective maintenance. This is actually the most common type of maintenance, encompassing enhancements both to the function and the efficiency of the code, and includes all changes, insertions, deletions, modifications, extensions, and enhancements made to a system to meet the evolving and/or expanding needs of the user. A successful piece of software tends to be subjected to a succession of changes resulting in an increase in its requirements. This is based on the premise that as the software becomes useful, the users tend to experiment with new cases beyond the scope for which it was initially developed. Expansion in requirements can take the form of enhancement of existing system functionality or improvement in computational efficiency. As the program continues to grow with each enhancement the system evolves from an average-sized program of average maintainability to a very large program that offers great resistance to modification. Perfective maintenance is by far the largest consumer of maintenance resources; estimates of around 50% are not uncommon.

The categories of maintenance above were further defined in the 1993 IEEE Standard on Software Maintenance (IEEE 1219 1993) which goes on to define a fourth category.

3.4 Preventive
The long-term effect of corrective, adaptive and perfective change is expressed in Lehman's law of increasing entropy: As a large program is continuously changed, its complexity, which reflects deteriorating structure, increases unless work is done to maintain or reduce it (Lehman 1985). The IEEE defined preventative maintenance as "maintenance performed for the purpose of preventing problems before they occur" (IEEE 1219 1993). This is the process of changing software to improve its future maintainability or to provide a better basis for future enhancements. The preventative change is usually initiated from within the maintenance organisation with the intention of making programs easier to understand and hence facilitating future maintenance work. Preventive change does not usually give rise to a substantial increase in the baseline functionality.

Preventive maintenance is rare (only about 5%), the reason being that other pressures tend to push it to the end of the queue. For instance, a demand may come to develop a new system that will improve the organisation’s competitiveness in the market. This will likely be seen as more desirable than spending time and money on a project that delivers no new function. Still, if one considers the probability of a software unit needing change and the time pressures that are often present when the change is requested, it makes a lot of sense to anticipate change and to prepare accordingly.

The most comprehensive and authoritative study of software maintenance was conducted by B. P. Lientz and E. B. Swanson (1980). Figure 1.2 depicts the distribution of maintenance activities by category by percentage of time from the Lientz and Swanson study of some 487 software organisations. Clearly, corrective maintenance (that is, fixing problems and routine debugging) is a small percentage of overall maintenance costs. Martin and McClure (1983) provide similar data.

Figure 1.2 Distribution of maintenance by categories

3.5 Maintenance as Ongoing Support
This category of maintenance work refers to the service provided to satisfy non-programming related work requests. Ongoing support, although not a change in itself, is essential for successful communication of desired changes. The objectives of ongoing support include effective communication between maintenance and end user personnel, training of end-users and providing business information to users and their organisations to aid decision making.
• Effective communication is essential as maintenance is probably the most customer-intensive part of the software life cycle, since a greater proportion of maintenance effort is spent providing enhancements requested by customers than is spent on other types of system change. Good customer relations are important for several reasons and can lead to a reduction in the misinterpretation of users' change requests, a better understanding of users' business needs and increased user involvement in the maintenance process. Failure to achieve the required level of communication between the maintenance organisation and those affected by the software changes may eventually lead to software failure.
• Training of end users - typical services provided by the maintenance organisation include manuals, telephone support, help desk, on-site visits, informal short courses, and user groups.
• Business information - users need various types of timely and accurate business information (for example, time, cost, resource estimates) to enable them to take strategic business decisions. Questions such as “should we enhance the existing system or replace it completely” may need to be considered.

4. The Importance of Categorising Software Changes
4.1 Overview
In principle, software maintenance activities can be classified individually. In practice, however, they are often intertwined. For example, in the course of modifying a program due to the introduction of a new operating system (adaptive change), obscure 'bugs' may be introduced. The bugs have to be traced and dealt with (corrective maintenance). Similarly, the introduction of a more efficient sorting algorithm into a data processing package (perfective maintenance) may require that the existing program code be restructured (preventive maintenance). Figure 1.3 depicts the potential relations between the different types of software change. Despite the overlapping nature of these changes, there are several reasons why a good understanding of the distinction between them is important.

Firstly, it allows management to set priorities for change requests. Some changes require a faster response than others.

Secondly, there are limitations to software change. Ideally changes are implemented as the need for them arises. In practice, however, this is not always possible for several reasons:
• Resource Limitations: Some of the major hindrances to the quality and productivity of maintenance activities are the lack of skilled and trained maintenance programmers and of suitable tools and environments to support their work. Cost may also be an issue.
• Quality of the existing system: In some 'old' systems, this can be so poor that any change can lead to unpredictable ripple effects and a potential collapse of the system.
• Organisational strategy: The desire to be on a par with other organisations, especially rivals, can be a great determinant of the size of a maintenance budget.
• Inertia: The resistance to change by users may prevent modification to a software product, however important or potentially profitable such change may be.

Thirdly, software is often subject to incremental release, where changes to a software product are not all made together. The changes take place incrementally, with minor changes usually implemented while a system is in operation. Major enhancements are usually planned and incorporated, together with other minor changes, in a new release or upgrade. The change introduction mechanism also depends on whether the software package is bespoke or off-the-shelf. With bespoke software, change can often be effected as the need for it arises. For off-the-shelf packages, users normally have to wait for the next upgrade.

Swanson's definitions allow the software maintenance practitioner to tell the user that a certain portion of a maintenance organisation’s efforts is devoted to user-driven or environment-driven requirements. The user requirements should not be buried with other types of maintenance. The point here is that these types of updates are not corrective in nature; they are improvements, and no matter which definitions are used, it is imperative to discriminate between corrections and enhancements.

Fig 1.3. The Relationship between the different types of software change

By studying the types of maintenance activities above it is clear that, regardless of which tools and development model are used, maintenance is needed. The categories clearly indicate that maintenance is more than fixing bugs. This view is supported by Jones (1991), who comments that “organisations lump enhancements and the fixing of bugs together”. He goes on to say that this distorts both activities and leads to confusion and mistakes in estimating budgets and the time it takes to implement changes. Even worse, this "lumping" perpetuates the notion that maintenance is fixing bugs and mistakes. Because many maintainers do not use maintenance categories, there is confusion and misinformation about maintenance.

5. A Comparison between Development and Maintenance
5.1 Overview
Although maintenance could be regarded as a continuation of development, there is a fundamental difference between the two activities. The constraints that the existing system imposes on maintenance give rise to this difference. For example, in the course of designing an enhancement, the designer needs to investigate the current system to abstract the architectural and the low-level designs. This information is then used to
• ascertain how the change can be accommodated
• predict the potential ripple effect of the change
• determine the skills and knowledge required to do the job
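Predicting the potential ripple effect of a change can be illustrated with a small sketch. It assumes the maintainer has already recovered a map of which modules depend on which; the module names and the `ripple_set` helper are hypothetical and serve only as an illustration of one simple form of impact analysis, not as a method prescribed by the sources cited here.

```python
from collections import deque


def ripple_set(depends_on, changed):
    """Return every module that may be affected, directly or indirectly,
    by a change to `changed`.

    `depends_on` maps each module to the set of modules it uses.
    """
    # Invert the map: for each module, which modules use it?
    dependents = {}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk over the "is used by" relation.
    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in affected:
                affected.add(user)
                queue.append(user)
    affected.discard(changed)  # a dependency cycle could lead back to the start
    return affected


# Hypothetical dependency map recovered from an existing system.
DEPENDS_ON = {
    "billing": {"tax_rules", "customer_db"},
    "reports": {"billing"},
    "customer_db": set(),
    "tax_rules": set(),
    "help_pages": set(),
}

print(sorted(ripple_set(DEPENDS_ON, "tax_rules")))  # ['billing', 'reports']
```

A change to the hypothetical `tax_rules` module reaches `reports` only indirectly, through `billing`, which is the kind of effect that is easily missed when there is no time for impact analysis.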

To explain the difference between new development and software maintenance further, Jones (1986) provides an interesting analogy: “The task of adding functional requirements to existing systems can be likened to the architectural work of adding a new room to an existing building. The design will be severely constrained by the existing structure, and both the architect and the builders must take care not to weaken the existing structure when additions are made. Although the costs of the new room usually will be lower than the costs of constructing an entirely new building, the costs per square foot may be much higher because of the need to remove existing walls, reroute plumbing and electrical circuits and take special care to avoid disrupting the current site” (quoted in Corbi 1989).

6. The Cost of Maintenance
6.1 Overview
Although there is no real agreement on the actual costs, sufficient data exist to indicate that maintenance does consume a large portion of overall software lifecycle costs. Arthur (1988) states that only a quarter to a third of all lifecycle costs are attributed to software development, and that some 67% of lifecycle costs are expended in the operations and maintenance phase of the life cycle. Jones (1994) states that maintenance will continue to grow and become the primary work of the software industry. Table 1.2 (Arthur 1988) provides a sample of data compiled by various people and organisations regarding the percentage of lifecycle costs devoted to maintenance. These data were collected in the late 1970s, prior to all the software engineering innovations, methods, and techniques that purport to decrease overall costs. However, despite software engineering innovations, recent literature suggests that maintenance is gaining more notoriety because of its increasing costs. A research marketing firm, the Gartner Group, estimated that U.S.
corporations alone spend over $30 billion annually on software maintenance, and that in the 1990s, 95% of lifecycle costs would go to maintenance (Moad 1990; Figure 1.4). Clearly, maintenance is costly, and the costs are increasing. All the innovative software engineering efforts from the 1970s and 1980s have not reduced lifecycle costs.

Survey              Year    Maintenance (%)
Canning             1972    60
Boehm               1973    40-80
deRose/Nyman        1976    60-70
Mills               1976    75
Zelkowitz           1979    67
Cashman and Holt    1979    60-80

Table 1.2 Maintenance Costs as a Percentage of Total Software Life-cycle Costs

Today programmers' salaries consume the majority of the software budget, and most of their time is spent on maintenance as it is a labour-intensive activity. As a result, organisations have seen the operations and maintenance phase of the software life cycle consume more and more resources over time. Others attribute rising maintenance costs to the age and lack of structure of the software. Osborne and Chikofsky (1990) state that much of today's software is ten to fifteen years old, and was created without benefit of the best design and coding techniques. The result is poorly designed structures, poor coding, poor logic, and poor documentation for the systems that must be maintained.

Figure 1.4 The Percentage of Software life-cycle costs devoted to maintenance.

Over 75% of maintenance costs are for providing enhancements in the form of adaptive and perfective maintenance. These enhancements are significantly more expensive to complete than corrections as they require major redesign work and considerably more coding than a corrective action. The result is that the user-driven enhancements (improvements) dominate the costs over the life cycle. Several later studies confirm that Lientz and Swanson's data from 1980 was still accurate in 1990. Table 1.3 summarises data from several researchers and shows that non-corrective work ranges from 78% to 84% of the overall effort, and therefore that the majority of maintenance costs are being spent on enhancements. Maintenance is expensive, therefore, because requirements and environments change, and the majority of maintenance costs are driven by users.

Maintenance Category    Lientz & Swanson 1980    Ball 1987    Deklava 1990    Abran 1990
Corrective              22%                      17%          16%             21%
Non-corrective          78%                      83%          84%             79%

Table 1.3 Percentage effort spent on Corrective and Non-Corrective maintenance

The situation at the turn of the millennium shows little sign of improvement.

The Maintenance Budget
Normally, after completing a lengthy and costly software development effort, organisations do not want to devote significant resources to post-delivery activities. Defining what is meant by “a significant resource” is in itself problematic: how much should maintenance cost? Underestimation of maintenance costs is partly human nature, as developers do not want to believe that maintenance for the new system will consume a significant portion of lifecycle costs. They hope that the new system will be the exception to the norm because modern software engineering techniques and methods were used, and conclude that the software maintenance phase of the life cycle will not, by definition, consume large amounts of money. Accordingly, sufficient amounts of money are often not allocated for maintenance. With limited resources, maintainers can only provide limited maintenance. The lack of financial resources for maintenance is due in large part to the lack of recognition that "maintenance" is primarily enhancing delivered systems rather than correcting bugs.

Another factor influencing high maintenance costs is that needed items are often not included initially in the development phase, usually due to schedule or monetary constraints, but are deferred until the operations and maintenance phase. Therefore maintainers end up spending a large amount of their time coding the functions that were delayed until maintenance. As a result development costs remain within budget but maintenance costs increase. As can be seen from Table 1.3, the maintenance categories are particularly useful when trying to explain the real costs of maintenance. If organisations have this data, they will understand why maintenance is expensive and will be able to defend their estimates of the time and resources required to complete tasks.

7. A Software Maintenance Framework
7.1 Overview
To a large extent the requirement for software systems to evolve in order to accommodate changing user needs contributes to the high maintenance cost. However, there are other factors which contribute indirectly to this by hindering maintenance activities. A Software Maintenance Framework (which is a derivative of the Software Maintenance Framework proposed by Haworth et al. 1992) will be used to discuss some of the factors that contribute to the maintenance challenge. The elements of this framework are the user requirements, organisational and operational environments, maintenance process, maintenance personnel, and the software product (Table 1.4).

Component                        Feature
1. Users & requirements          Requests for additional functionality, error correction and improved maintainability
                                 Requests for non-programming related support
2. Organisational environment    Change in policies
                                 Competition in the market place
3. Operational environment       Hardware innovations
                                 Software innovations
4. Maintenance process           Capturing requirements
                                 Variation in programming and working practices
                                 Paradigm shift
                                 Error detection and correction
5. Software product              Maturity and difficulty of application domain
                                 Quality of documentation
                                 Malleability of programs
                                 Complexity of programs
                                 Program structure
                                 Inherent quality
6. Maintenance personnel         Staff turnover
                                 Domain expertise

Table 1.4 Components of the Software Maintenance Framework

7.2 Users and their Requirements
Users often have little understanding of software maintenance and so can be unsupportive of the maintenance process. They may take the view that
• Software maintenance is like hardware maintenance
• Changing software is easy
• Changes cost too much and take too long
Users may be unaware that their requests
• may involve major structural changes to the software which may take time to implement
• must be feasible, desirable, prioritised, scheduled and resourced
• may conflict with one another or with company policy, such that they are never implemented

7.3 Organisational and Operational Environment
An environmental factor is a factor which acts upon the product from outside and influences its form or operation. The two categories of environment are
• The organisational environment - e.g. business rules, government regulations, taxation policies, work patterns, competition in the market place
• The operational environment - software systems, e.g. operating systems, database systems and compilers, and hardware systems, e.g. processor, memory and peripherals

In this environment the scheduling of maintenance work can be problematic, as urgent ‘fixes’ always go to the head of the queue, thus upsetting schedules, and unexpected, mandatory large-scale changes will also need urgent attention. Further problems can stem from the organisational environment in that the maintenance budget is often underfunded.

7.4 Maintenance Process
The term process here refers to any activity carried out or action taken either by a machine or maintenance personnel during software maintenance. The facets of a maintenance process which affect the evolution of the software or contribute to maintenance costs include
• The difficulty of capturing change (and changing) requirements - requirements and user problems only become clearer when a system is in use. Also users may not be able to express their requirements in a form understandable to the analyst or programmer - the 'information gap'. The requirements and changes evolve, therefore the maintenance team is always playing “catch-up”.
• Variation in programming practice - this may present difficulties if there is no consistency, therefore standards or stylistic guidelines are often provided. Working practices impact on the way a change is effected. “Time to change” can be adversely affected by “clever code”, undocumented assumptions, and undocumented design and implementation decisions. After some time, programmers find it difficult to understand their own code.
• Paradigm shift - older systems developed prior to the advent of structured programming techniques may be difficult to maintain. However, existing programs may be restructured or 'revamped' using techniques and tools, e.g. structured programming, object orientation, hierarchical program decomposition, reformatters and pretty-printers.
• Error detection and correction - error-free software is virtually non-existent. Software products tend to have 'residual' errors. The later these errors are discovered the more expensive they are to correct. The cost gets even higher if the errors are detected during the maintenance phase (Figure 1.5).

Figure 1.5 Cost of fixing errors increases in later phases of the life cycle

7.5 Software Product
Aspects of a software product that contribute to the maintenance challenge include
• Maturity and difficulty of the application domain: The requirements of applications that have been widely used and well understood are less likely to undergo substantial modifications than those that are still in their infancy.
• Inherent difficulty of the original problem: For example, programs dealing with simple problems such as sorting are obviously easier to handle than those used for more complex computations such as weather forecasting.
• Quality of the documentation: The lack of up-to-date systems documentation affects maintenance productivity. Maintenance is difficult to perform because of the need to understand (or comprehend) program code. Program understanding is a labour-intensive activity that increases costs. IBM estimated that a programmer spends around 50% of his time in the area of program analysis.
• Malleability of the programs: The malleable or 'soft' nature of software products makes them more vulnerable to undesirable modifications than hardware items. Inadvertent software changes may have unknown and even fatal repercussions. This is particularly true of 'safety-related' or 'safety-critical' systems.
• Inherent quality: The tendency for a system to decay as more changes are undertaken implies that preventive maintenance needs to be undertaken to restore order in the programs.

7.6 Maintenance Personnel
This refers to individuals involved in the maintenance of a software product. They include maintenance managers, analysts, designers, programmers and testers. The aspects of personnel that affect maintenance activities include the following:
• Staff turnover: Due to high staff turnover many systems end up being maintained by individuals who are not the original authors, so a substantial proportion of the maintenance effort is spent on just understanding the code. Staff who leave take irreplaceable knowledge with them.
• Domain expertise: Staff may end up working on a system for which they have neither the system domain knowledge nor the application domain knowledge. Maintainers may therefore inadvertently cause the “ripple effect”. This problem may be worsened by absent, out-of-date or inadequate documentation. A contrary situation is where a programmer becomes ‘enslaved’ to a certain application because she/he is the only person who understands it.

Obviously the factors of product, environment, user and maintenance personnel do not exist in isolation but interact with one another. Three major types of relation and interaction that can be identified are product/environment, product/user and product/maintenance personnel (Figure 1.6).
• Relationship between product and environment - as the environment changes so must the product in order to be useful.
• Relationship between product and user - in order for the system to stay useful and acceptable to its users it also has to change to accommodate their changing requirements.
• Interaction between personnel and product - the maintenance personnel who implement changes also act as receptors of the changes. That is, they serve as the main avenue by which changes in the other factors - user requirements, maintenance process, organisational and operational environments - act upon the software product. The nature of the maintenance process used and the attributes of the maintenance personnel will impact upon the quality of the change.

Figure 1.6 Inter-relationship between maintenance factors

8. Potential Solutions to the Maintenance Problem

A number of possible solutions to maintenance problems have been suggested. They include
• Budget and Effort Reallocation
• Complete Replacement of the System
• Maintenance of the Existing System

8.1 Budget and Effort Reallocation
Based on the observation that software maintenance costs at least as much as new development, some authors have proposed that rather than allocating less resource to develop unmaintainable or difficult-to-maintain systems, more time and resource should be invested in the development (specification and design) of more maintainable systems. However, this is difficult to pursue, and even if it were possible, it would not address the problem of legacy systems that are already in a maintenance crisis situation.

8.2 Complete Replacement of the System
One might be tempted to suggest that “if maintaining an existing system costs as much as developing a new one, then why not develop a new system from scratch”. In practice, of course, it is not that simple, as there are costs and risks involved. The organisation may not be able to afford the capital outlay and there is no guarantee that the new system will function any better than the old one. Additionally, the existing system represents a valuable knowledge base that could prove useful for the development of future systems, thereby reducing the chance of re-inventing the wheel. It would be unrealistic for an organisation to part with such an asset.

8.3 Maintenance of the Existing System
In most maintenance situations budget and effort reallocation is not possible, and completely redesigning the whole software system is usually undesirable (but nevertheless forced in some situations). Given that these approaches are beset with problems, maintaining the existing system is often the only alternative. Maintenance could be achieved, albeit in an “ad hoc” fashion, by making necessary modifications as ‘patches’ to the source code, often by hitting alien code ‘head-on’. However, this approach is fraught with difficulty and is often adopted because of time pressures. Corrective maintenance is almost always done in this way for the same reason.
An alternative and cost-effective solution is to apply preventative maintenance. Preventative maintenance may take several forms. It may
• be applied to those sections of the software that are most often the target for change (this approach may be very cost effective as it has been estimated that 80% of changes affect only 30% of the code)
• involve updating inaccurate documentation
• involve restructuring, reverse engineering or the re-use of existing software components
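The first form, targeting the sections that change most often, can be sketched as a simple ranking over a change history. The sketch assumes a log with one entry per change naming the unit that was modified; the unit names, the `change_hotspots` helper and the 80% threshold parameter are illustrative assumptions, not part of any cited study.

```python
from collections import Counter


def change_hotspots(change_log, coverage=0.8):
    """Return the most frequently changed units that together account
    for at least `coverage` of all recorded changes."""
    counts = Counter(change_log)
    total = sum(counts.values())
    hotspots, covered = [], 0
    # Walk units from most to least frequently changed.
    for unit, n in counts.most_common():
        if covered >= coverage * total:
            break
        hotspots.append(unit)
        covered += n
    return hotspots


# Hypothetical change history: one entry per change request handled.
LOG = ["billing"] * 5 + ["reports"] * 3 + ["auth"] + ["ui"]

print(change_hotspots(LOG))  # ['billing', 'reports']
```

In this made-up history, two of the four units absorb 8 of the 10 changes, so they would be the first candidates for restructuring or documentation updates.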

Unfortunately preventive maintenance is rare; the reason being that other pressures tend to push it to the end of the queue.

References

ANSI/IEEE 1983  IEEE Standard Glossary of Software Engineering Terminology. Technical Report 729, IEEE, 1983.
Arthur 1988  Arthur, L. J. Software Evolution: The Software Maintenance Challenge. New York: John Wiley and Sons, 1988.
Corbi 1989  Corbi, T. A. Program Understanding: Challenge for the 1990's. IBM Systems Journal, 28(2): 294-306, 1989.
Higgins 1988  Higgins, D. A. Data Structured Maintenance: The Warnier/Orr Approach. New York: Dorset House Publishing Co. Inc., 1988.
IEEE 1993  IEEE Standard 1219: Standard for Software Maintenance. Los Alamitos, CA: IEEE Computer Society Press, 1993.
Jones 1991  Jones, C. Applied Software Measurement. New York, NY: McGraw-Hill, 1991.
Jones 1994  Jones, C. Assessment and Control of Software Risks. Englewood Cliffs, NJ: Prentice Hall, 1994.
Lehman 1980  Lehman, M. M. On Understanding Laws, Evolution, and Conservation in the Large-Program Lifecycle. The Journal of Systems and Software, 1:213-21, 1980.
Lehman 1985  Lehman, M. M. Program Evolution. London: Academic Press, 1985.
Lientz & Swanson 1980  Lientz, B. P. & Swanson, E. B. Software Maintenance Management. Reading, Massachusetts: Addison-Wesley Publishing Company, 1980.
Martin & McClure 1983  Martin, J. & McClure, C. Software Maintenance: The Problem and its Solutions. Englewood Cliffs, NJ: Prentice Hall, 1983.
McDermid 1991  McDermid, J. Software Engineers Reference Book, Chapter 16. Butterworth-Heinemann, 1991.
Moad 1990  Moad, J. Maintaining The Competitive Edge. DATAMATION 61-6, 1990.
Osborne and Chikofsky 1990  Osborne, W. M. & Chikofsky, E. J. Fitting Pieces to the Maintenance Puzzle. IEEE Software, 10-11, 1990.
Pigoski 1996  Pigoski, T. M. Practical Software Maintenance: Best Practice for Managing your Software Investment. New York: John Wiley & Sons, 1996.
Schneidewind 1987  Schneidewind, N. F. The State of Software Maintenance. IEEE Transactions on Software Engineering, SE-13(3):303-10, March 1987.
Swanson 1976  Swanson, E. B. The Dimensions of Maintenance. In Proceedings of the Second International Conference on Software Engineering, pages 492-497, San Francisco, October 1976.
Van Edelstein 1993  Van Edelstein, D. Report on the IEEE STD 1219-1993 - Standard for Software Maintenance. ACM SIGSOFT Software Engineering Notes, 18(4):94-95, October 1993.
von Mayrhauser 1990  von Mayrhauser, A. Software Engineering - Methods and Management. San Diego, CA: Academic Press, Inc., 1990.

1981]. & Mr. In addition. In fact. a process or a system in sufficient detail to permit its physical realization” [Webster Dictionary]. Some definitions for Design: “Devising artifacts to attain goals” [H. “The process of defining the architecture. XYZ to sleep cooking dining general activities and so on. System will be .12]. Simon. Introduction to Design 1.1 Introduction Design is an iterative process of transforming the requirements specification into a design specification. The architectural design specifies a particular solution. Software design can be viewed in the same way. interfaces and other characteristics of a system or component” [ IEEE 160. & Mr.A. one may maximize children’s room. and other minimizes it to have large living room. Consider an example where Mrs. modern and two-storied. We use requirements specification to define the problem and transform this to a solution that satisfies all the requirements in the specification. XYZ want a new house. component. For example. • • • • • a a a a a room room room room room for for for for for two children to play and sleep Mrs. and there may not be a “best” design. “The process of applying various techniques and principles for the purpose of defining a device. Without Design. An architect takes these requirements and designs a house. the style of the proposed houses may differ: traditional. the architect may produce several designs to meet this requirement. Their requirements include.System Design Overview 1. All of the proposed designs solve the problem.

Design is different from programming.2 Qualities of a Good Design Functional: It is a very basic quality attribute. Data disintegrity may also result. The very purpose of doing design activities is to build systems that are modifiable in the event of any changes in the requirements. Design brings out a representation for the program – not the program or any component of it.so that such needs are not “hard-coded” later.how it work successfully (More important for real-time and mission critical and on-line systems).inefficient due to possible data redundancy and untuned code. Therefore it is difficult to monitor & control.unmaintainable since standards & guidelines for design & construction are not used.unmanageable since there is no concrete output until coding. . No reusability consideration.. The difference is tabulated below. Any design solution should work. syntax of language Make trade-offs w. size of executable.t. . .inflexible since planning for long term changes was not given due emphasis. size of source. of transactions / unit time) memory usage. . Poor design may result in tightly coupled modules with low cohesion.r.constraints etc Devices representation of program 1. and should be constructable. . Portability & Security: These are to be addressed during design . etc Construction of program Programming Device algorithms and data representations Consider run-time environments Flexibility: It is another basic and important attribute.not portable to various hardware / software platforms. Reliability: It tells the goodness of the design . Efficiency: This can be measured through • • • • run time (time taken to undertake whole of processing task or transaction) response time (time taken to respond to a request for information) throughput (no. Design Abstactions of operations and data("What to do") Establishes interfaces Choose between design alternatives Choose functions.

Economy: This can be achieved by identifying re-usable components.

Usability: Usability is in terms of how the interfaces are designed (clarity, aesthetics, directness, forgiveness, user control, ergonomics, etc.) and how much time it takes to master the system.

1.3 Design Constraints

Typical Design Constraints are:
• Budget
• Time
• Integration with other systems
• Skills
• Standards
• Hardware and software platforms

Budget and Time cannot be changed. The problems with respect to integrating with other systems (typically the client may ask to use a proprietary database that he is using) have to be studied & solution(s) are to be found. 'Skills' is alterable (for example, by arranging appropriate training for the team). Mutually agreed upon standards have to be adhered to. Hardware and software platforms may remain a constraint.

Designers try to answer the "How" part of the "What" that is raised during the requirements phase. As such, the solution proposed should be contemporary. To that extent a designer should know what is happening in technology. Large, central computer systems with proprietary architecture are being replaced by distributed networks of low-cost computers in an open systems environment. We are moving away from conventional software development based on hand generation of code (COBOL, C) to integrated programming environments. Typical applications today are internet based.

1.4 Popular Design Methods

Popular Design Methods (Wasserman, 1995) include

1) Modular decomposition
• Based on assigning functions to components.
• It starts from functions that are to be implemented and explains how each component will be organized and related to other components.

2) Event-oriented decomposition
• Based on events that the system must handle.
• It starts with cataloging various states and then describes how transformations take place.

3) Object-oriented design
• Based on objects and their interrelationships.
• It starts with object types and then explores object attributes and actions.

Structured Design uses modular decomposition.

1.5 Transition from Analysis to Design
(source: Pressman, R.S., "Software Engineering: A Practitioner's Approach", fifth edition, McGraw-Hill, 2001)

The data design transforms the data model created during analysis into the data structures that will be required to implement the software.

The architectural design defines the relationship between major structural elements of the software, the design patterns that can be used to achieve the requirements and the constraints that affect the implementation.

The interface design describes how the software communicates within itself, and with humans who use it.

The procedural design (typically, Low Level Design) elaborates structural elements of the software into a procedural (algorithmic) description.

2. High Level Design Activities

Broadly, High Level Design includes Architectural Design, Interface Design and Data Design.

2.1 Architectural Design

Shaw and Garlan (1996) suggest that software architecture is the first step in producing a software design. Architecture design associates the system capabilities with the system components (like modules) that will implement them. The architecture of a system is a comprehensive framework that describes its form and structure, its components and how they interact together. Generally, a complete architecture plan addresses the functions that the system provides, the hardware and network that are used to develop and operate it, and the software that is used to develop and operate it.

An architecture style involves its components, connectors, and constraints on combining components. Shaw and Garlan (1996) describe seven architectural styles. Commonly used styles include
• Pipes and Filters
• Call-and-return systems
  o Main program / subprogram architecture
• Object-oriented systems
• Layered systems
• Data-centered systems
• Distributed systems
  o Client/Server architecture

In Pipes and Filters, each component (filter) reads streams of data on its inputs and produces streams of data on its output. Pipes are the connectors that transmit output from one filter to another. e.g. programs written in Unix shell.

In Call-and-return systems, the program structure decomposes function into a control hierarchy where a "main" program invokes (via procedure calls) a number of program components, which in turn may invoke still other components. e.g. a Structure Chart is a hierarchical representation of main program and subprograms.

In Object-oriented systems, a component is an encapsulation of data and the operations that must be applied to manipulate the data. Communication and coordination between components is accomplished via message calls.

In Layered systems, each layer provides service to the one outside it, and acts as a client to the layer inside it. They are arranged like an "onion ring". e.g. the ISO OSI model.

Data-centered systems use repositories. A repository includes a central data structure representing current state, and a collection of independent components that operate on

the central data store. In a traditional database, the transactions, in the form of an input stream, trigger process execution. e.g. Database.

A popular form of distributed system architecture is Client/Server, where a server system responds to the requests for actions / services made by client systems. Clients access the server by remote procedure call.

For a detailed description of architecture styles, read [1] (OPTIONAL).

The following issues are also addressed during architecture design:
• Security
• Data Processing: Centralized / Distributed / Stand-alone
• Audit Trails
• Restart / Recovery
• User Interface
• Other software interfaces

2.2 User Interface Design

The design of user interfaces draws heavily on the experience of the designer. Pressman [2] (Refer Chapter 15) presents a set of Human-Computer Interaction (HCI) design guidelines that will result in a "friendly," efficient interface.

Three categories of HCI design guidelines are
1. General interaction
2. Information display
3. Data entry

• General Interaction
Guidelines for general interaction often cross the boundary into information display, data entry and overall system control. They are, therefore, all-encompassing and are ignored at great risk. The following guidelines focus on general interaction.
o Be consistent
  § Use a consistent format for menu selection, command input, data display and the myriad other functions that occur in a HCI.
o Offer meaningful feedback
  § Provide the user with visual and auditory feedback to ensure that two-way communication (between user and interface) is established.
o Ask for verification of any nontrivial destructive action
  § If a user requests the deletion of a file, indicates that substantial information is to be overwritten, or asks for the termination of a program, an "Are you sure ..." message should appear.
o Permit easy reversal of most actions
  § UNDO or REVERSE functions have saved tens of thousands of end users from millions of hours of frustration. Reversal should be available in every interactive application.
o Reduce the amount of information that must be memorized between actions

  § The user should not be expected to remember a list of numbers or names so that he or she can re-use them in a subsequent function. Memory load should be minimized.
o Seek efficiency in dialog, motion, and thought
  § Keystrokes should be minimized, the distance a mouse must travel between picks should be considered in designing screen layout, and the user should rarely encounter a situation where he or she asks, "Now what does this mean?"
o Forgive mistakes
  § The system should protect itself from errors that might cause it to fail (defensive programming).
o Categorize activities by functions and organize screen geography accordingly
  § One of the key benefits of the pull-down menu is the ability to organize commands by type. In essence, the designer should strive for "cohesive" placement of commands and actions.
o Provide help facilities that are context sensitive
o Use simple action verbs or short verb phrases to name commands
  § A lengthy command name is more difficult to recognize and recall. It may also take up unnecessary space in menu lists.

• Information Display
If information presented by the HCI is incomplete, ambiguous or unintelligible, the application will fail to satisfy the needs of a user. Information is "displayed" in many different ways: with text, pictures and sound; by placement, motion and size; using color, resolution, and even omission. The following guidelines focus on information display.
o Display only information that is relevant to the current context
  § The user should not have to wade through extraneous data, menus and graphics to obtain information relevant to a specific system function.
o Don't bury the user with data; use a presentation format that enables rapid assimilation of information
  § Graphs or charts should replace voluminous tables.
o Use consistent labels, standard abbreviations, and predictable colors
  § The meaning of a display should be obvious without reference to some outside source of information.
o Allow the user to maintain visual context
  § If computer graphics displays are scaled up and down, the original image should be displayed constantly (in reduced form at the corner of the display) so that the user understands the relative location of the portion of the image that is currently being viewed.
o Produce meaningful error messages
o Use upper and lower case, indentation, and text grouping to aid in understanding
  § Much of the information imparted by a HCI is textual, yet the layout and form of the text has a significant impact on the ease with which information is assimilated by the user.
o Use windows to compartmentalize different types of information
  § Windows enable the user to "keep" many different types of information within easy reach.
o Use "analog" displays to represent information that is more easily assimilated with this form of representation

  § For example, a display of holding tank pressure in an oil refinery would have little impact if a numeric representation were used. However, if a thermometer-like display were used, vertical motion and color changes could be used to indicate dangerous pressure conditions. This would provide the user with both absolute and relative information.
o Consider the available geography of the display screen and use it efficiently
  § When multiple windows are to be used, space should be available to show at least some portion of each. In addition, screen size (a system engineering issue) should be selected to accommodate the type of application that is to be implemented.

• Data Input
Much of the user's time is spent picking commands, typing data and otherwise providing system input. In many applications, the keyboard remains the primary input medium, but the mouse, digitizer and even voice recognition systems are rapidly becoming effective alternatives. The following guidelines focus on data input:
o Minimize the number of input actions required of the user
  § Reduce the amount of typing that is required. This can be accomplished by using the mouse to select from pre-defined sets of input; using a "sliding scale" to specify input data across a range of values; using "macros" that enable a single keystroke to be transformed into a more complex collection of input data.
o Maintain consistency between information display and data input
  § The visual characteristics of the display (e.g., text size, color, placement) should be carried over to the input domain.
o Allow the user to customize the input
  § An expert user might decide to create custom commands or dispense with some types of warning messages and action verification. The HCI should allow this.
o Interaction should be flexible but also tuned to the user's preferred mode of input
  § The user model will assist in determining which mode of input is preferred. A clerical worker might be very happy with keyboard input, while a manager might be more comfortable using a point-and-pick device such as a mouse.
o Deactivate commands that are inappropriate in the context of current actions
  § This protects the user from attempting some action that could result in an error.
o Let the user control the interactive flow
  § The user should be able to jump unnecessary actions, change the order of required actions (when possible in the context of an application), and recover from error conditions without exiting from the program.
o Provide help to assist with all input actions
o Eliminate "Mickey mouse" input
  § Do not let the user specify units for engineering input (unless there may be ambiguity). Do not let the user type .00 for whole number dollar amounts, provide default values whenever possible, and never let the user enter information that can be acquired automatically or computed within the program.

3. Structured Design Methodology

The two major design methodologies are based on
• Functional decomposition
• Object-oriented approach

3.1 Structured Design

Structured design is based on functional decomposition, where the decomposition is centered around the identification of the major system functions and their elaboration and refinement in a top-down manner. It follows typically from the dataflow diagram and associated process descriptions created as part of Structured Analysis. Structured design uses the following strategies
• Transformation analysis
• Transaction analysis
and a few heuristics (like fan-in / fan-out, span of effect vs. scope of control, etc.) to transform a DFD into a software architecture (represented using a structure chart).

In structured design we functionally decompose the processes in a large system (as described in the DFD) into components (called modules) and organize these components in a hierarchical fashion (structure chart) based on the following principles:
• Abstraction (functional)
• Information Hiding
• Modularity

Abstraction

"A view of a problem that extracts the essential information relevant to a particular purpose and ignores the remainder of the information." -- [IEEE, 1983]

"A simplified description, or specification, of a system that emphasizes some of the system's details or properties while suppressing others. A good abstraction is one that emphasizes details that are significant to the reader or user and suppress details that are, at least for the moment, immaterial or diversionary." -- [Shaw, 1984]

While decomposing, we consider the top level to be the most abstract, and as we move to lower levels, we give more details about each component. Such levels of abstraction provide flexibility to the code in the event of any future modifications.

Information Hiding

"... Every module is characterized by its knowledge of a design decision which it hides from all others. Its interface or definition was chosen to reveal as little as possible about its inner workings." --- [Parnas, 1972]

subprograms (e. As long as the programmer sticks to the interfaces agreed upon. functions. For example. 3. she can have flexibility in altering the component at any given point. and (to a large degree) unchanging interfaces. arrays. e. and protected members. and subroutines). C++ provides for public. Programming languages have long supported encapsulation. There are degrees of information hiding. 1990]. and "packages" in Ada. For example. Thus. The concept of encapsulation as used in an object-oriented context is essentially different from information hiding. it is easy to examine each component separately from others to determine whether the component implements its required tasks. if needed. Newer programming languages support larger encapsulation mechanisms.Parnas advocates that the details of the difficult and likely-to-change decisions be hidden from the rest of the system. with respect to railway reservation. This gives a greater freedom to programmers. For example. the rest of the system will have access to these design decisions only through well-defined. and record structures are common examples of encapsulation mechanisms supported by most programming languages. at the programming language level. 1988] • • • Break the system into suitably tractable units by means of transaction analysis Convert each unit into into a good structure chart by means of transform analysis Link back the separate units into overall system implementation Transaction Analysis The transaction is identified by studying the discrete event types that drive the system. The difference between abstraction and information hiding is that the former (abstraction) is a technique that is used to help identify which information is to be hidden. and each component has a clearly stated purpose. private. a customer may give the following transaction stimulus: . For example. Smalltalk and C++. 
the user interface may be designed with object orientation and the security design might use state-transition diagram..g. Modularity Modularity leads to components that have clearly defined inputs and outputs. Encapsulation refers to building a capsule around some collection of things [Wirfs-Brock et al. In C language. Further. Modularity also helps one to design different components in different ways..g. "modules" in Modula. and Ada has both private and limited private types.2 Strategies for converting the DFD into Structure Chart Steps [Page-Jones. procedures. "classes" in Simula. information hiding can be done by declaring a variable static within a source file.

The three transaction types here are: Check Availability (an enquiry), Reserve Ticket (booking) and Cancel Ticket (cancellation). At any given time we will get customers interested in giving any of the above transaction stimuli. In a typical situation, any one stimulus may be entered through a particular terminal. The human user would inform the system of her preference by selecting a transaction type from a menu. The first step in our strategy is to identify such transaction types and draw the first-level breakup of modules in the structure chart, by creating a separate module to co-ordinate the various transaction types. This is shown as follows:

Main ( ), which is the over-all coordinating module, gets the information about what transaction the user prefers to do through TransChoice. TransChoice is returned as a parameter to Main ( ). Remember, we are following our design principles faithfully in decomposing our modules. The actual details of how GetTransactionType ( ) works are not relevant for Main ( ). It may, for example, refresh and print a text menu, prompt the user to select a choice and return this choice to Main ( ). It will not affect any other components in our breakup, even when this module is changed later to return the same input through a graphical interface instead of a textual menu. The modules Transaction1 ( ), Transaction2 ( ) and Transaction3 ( ) are the coordinators of transactions one, two and three respectively. The details of these transactions are to be exploded in the next levels of abstraction.

We will continue to identify more transaction centers by drawing a navigation chart of all input screens that are needed to get various transaction stimuli from the user. These are to be factored out in the next levels of the structure chart (in exactly the same way as seen before), for all identified transaction centers.

Transform Analysis

Transform analysis is the strategy of converting each piece of the DFD (may be from level 2 or level 3, etc.) for all the identified transaction centers. In case the given system has only

one transaction (like a payroll system), then we can start transformation from the level 1 DFD itself. Transform analysis is composed of the following five steps [Page-Jones, 1988]:
1. Draw a DFD of a transaction type (usually done during analysis phase)
2. Find the central functions of the DFD
3. Convert the DFD into a first-cut structure chart
4. Refine the structure chart
5. Verify that the final structure chart meets the requirements of the original DFD

Let us understand these steps through a payroll system example:

• Identifying the central transform
The central transform is the portion of the DFD that contains the essential functions of the system and is independent of the particular implementation of the input and output. One way of identifying the central transform (Page-Jones, 1988) is to identify the centre of the DFD by pruning off its afferent and efferent branches. An afferent stream is traced from outside of the DFD to a flow point inside, just before the input is being transformed into some form of output (for example, a format or validation process only refines the input and does not transform it). Similarly, an efferent stream is a flow point from where output is

formatted for better presentation. The processes between the afferent and efferent streams represent the central transform (marked within dotted lines above). In the above example, P1 is an input process, and P6 & P7 are output processes. Central transform processes are P2, P3, P4 & P5, which transform the given input into some form of output.

• First-cut Structure Chart
To produce a first-cut (first draft) structure chart, first we have to establish a boss module. A boss module can be one of the central transform processes. Ideally, such a process has to be more of a coordinating process (encompassing the essence of transformation). In case we fail to find a boss module within, a dummy coordinating module is created.

In the above illustration, we have a dummy boss module "Produce Payroll", which is named in a way that indicates what the program is about. Having established the boss module, the afferent stream processes are moved to the left-most side of the next level of the structure chart, the efferent stream processes to the right-most side and the central transform processes to the middle. Here, we moved a module to get a valid timesheet (afferent process) to the left side (indicated in yellow). The two central transform processes are moved to the middle (indicated in orange). By grouping the other two central transform processes with the respective efferent processes, we have created two modules (in blue), essentially to print results, on the right side.

The main advantage of a hierarchical (functional) arrangement of modules is that it leads to flexibility in the software. For instance, if the "Calculate Deduction" module is to select deduction rates from multiple rates, the module can be split into two in the next level: one to get the selection and another to calculate. Even after this change, the "Calculate Deduction" module would return the same value.

• Refine the Structure Chart
Expand the structure chart further by using the different levels of the DFD. Factor down till you reach modules that correspond to processes that access a source / sink or data stores. Once this is ready, other features of the software like error handling, security, etc. have to be added. A module name should not be used for two different modules. If the same module is to be used in more than one place, it will be demoted down such that "fan in" can be done from the higher levels. Ideally, the name should sum up the activities done by the module and its sub-ordinates.

• Verify Structure Chart vis-à-vis the DFD

Because the Structure Chart is oriented towards the end-product (the software), the finer details of how data originates and is stored (as they appear in the DFD) are not explicit in the Structure Chart. Hence the DFD may still be needed along with the Structure Chart to understand the data flow while creating the low-level design.

• Constructing Structure Chart (An illustration)

Some characteristics of the structure chart as a whole give clues about the quality of the system. Page-Jones (1988) suggests the following guidelines for a good decomposition of a structure chart:
• Avoid decision splits. Keep span-of-effect within scope-of-control, i.e. a module can affect only those modules which come under its control (all sub-ordinates: immediate ones, modules reporting to them, etc.).
• An error should be reported from the module that both detects the error and knows what the error is.
• Restrict the fan-out (number of subordinates of a module) to seven. Increase fan-in (number of immediate bosses of a module). High fan-in (in a functional way) improves reusability.
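The fan-out, fan-in and scope-of-control ideas above can be checked mechanically once a structure chart is written down as data. The following is a minimal Python sketch under stated assumptions: the chart is a plain dictionary from each boss module to its immediate subordinates, and the module names are hypothetical, only loosely modelled on the payroll example.

```python
from collections import defaultdict

# Hypothetical structure chart: boss module -> immediate subordinates.
# The names are illustrative and are not taken from the original figures.
STRUCTURE_CHART = {
    "produce_payroll": ["get_valid_timesheet", "calculate_gross_pay",
                        "calculate_deduction", "print_pay_slip"],
    "get_valid_timesheet": ["read_timesheet", "validate_timesheet"],
    "calculate_gross_pay": ["lookup_pay_rate"],
    "calculate_deduction": ["lookup_pay_rate"],
    "print_pay_slip": ["format_amount"],
}

MAX_FAN_OUT = 7  # limit suggested in the guidelines above


def fan_out(chart):
    """Number of immediate subordinates of each boss module."""
    return {boss: len(subs) for boss, subs in chart.items()}


def fan_in(chart):
    """Number of immediate bosses of each subordinate module."""
    counts = defaultdict(int)
    for subs in chart.values():
        for sub in set(subs):
            counts[sub] += 1
    return dict(counts)


def scope_of_control(chart, module):
    """All subordinates of a module: immediate ones and those below them."""
    seen = set()
    stack = list(chart.get(module, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(chart.get(current, []))
    return seen


def over_limit(chart, limit=MAX_FAN_OUT):
    """Modules whose fan-out exceeds the suggested limit."""
    return sorted(m for m, n in fan_out(chart).items() if n > limit)
```

In this sketch `lookup_pay_rate` has a fan-in of two because both calculation modules call it, which is the kind of functional reuse the guideline encourages.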

Refer [Page-Jones, 1988: Chapters 7 & 10] for more guidelines & illustrations on structure charts.

3.3 How to measure the goodness of the design

To measure design quality, we use coupling (the degree of interdependence between two modules) and cohesion (the measure of the strength of functional relatedness of elements within a module). Page-Jones gives a good metaphor for understanding coupling and cohesion: Consider two cities A & B, each having a big soda plant, C & D respectively. The employees of C are predominantly in city B and the employees of D in city A. What will happen

to the highway traffic between cities A & B? Placing the employees of a plant in the city where the plant is situated improves the situation (reduces the traffic). This is the basis of cohesion (which also automatically 'improves' coupling).

COUPLING

Coupling is the measure of strength of association established by a connection from one module to another [Stevens et al., 1974]. Minimizing connections between modules also minimizes the paths along which changes and errors can propagate into other parts of the system ('ripple effect'). The use of global variables can result in an enormous number of connections between the modules of a program. The degree of coupling between two modules is a function of several factors [Stevens et al., 1974]: (1) how complicated the connection is, (2) whether the connection refers to the module itself or something inside it, and (3) what is being sent or received.

Table 1 summarizes various types of coupling [Page-Jones, 1988].

NORMAL Coupling (acceptable)
• DATA Coupling: Two modules are data coupled if they communicate by parameters (each being an elementary piece of data). e.g. sin (theta) returning sine value; calc_interest (amt, interest rate, term) returning interest amt.
• STAMP Coupling: Two modules are stamp coupled if one passes to the other a composite piece of data (a piece of data with meaningful internal structure). e.g. calc_order_amt (PO_Details) returning value of the order.
• CONTROL Coupling: Two modules are control coupled if one passes to the other a piece of information intended to control the internal logic of the other. e.g. print_report (what_to_print_flag).

COMMON (or GLOBAL) Coupling (unacceptable)
• Two modules are common coupled if they refer to the same global data area. Instead of communicating through parameters, two modules use a global data.

CONTENT (or PATHOLOGICAL) Coupling (forbidden)
• Two modules exhibit content coupling if one refers to the inside of the other in any way (if one module 'jumps' inside another module). Jumping inside a module violates all the design principles like abstraction, information hiding and modularity.

We aim for 'loose' coupling. Two modules A & B are normally coupled if A calls B, B returns to A, and all information passed between them is by means of parameters passed through the call mechanism. We may come across a (rare) case of module A calling module B with no parameters passed between them (neither sent nor received). Strictly, this should be positioned at the zero point on the scale of coupling (lower than Normal Coupling itself) [Page-Jones, 1988]. The other two types of coupling (Common and Content) are abnormal coupling and are not desired. Even in Normal Coupling we should take care of the following issues [Page-Jones, 1988]:

• Data coupling can become complex if the number of parameters communicated between modules is large.
• In Stamp coupling there is always a danger of over-exposing irrelevant data to the called module. (Beware of the meaning of composite data. A name represented as an array of characters may not qualify as composite data. The meaning of composite data lies in the way it is used in the application, NOT in how it is represented in a program.)
• "What-to-do flags" are not desirable when they come from a called module ('inversion of authority'): it is alright for the calling module (which, by virtue of the hierarchical arrangement, is the boss) to know the internals of the called module, and not the other way around.
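The three normal coupling types can be seen side by side in code. The sketch below is illustrative Python, not taken from Page-Jones: the function names follow the examples in Table 1, while the `PurchaseOrder` fields, the simple-interest formula and the flag values are assumptions made only for this example.

```python
from dataclasses import dataclass


# Data coupling: only elementary parameters cross the interface.
def calc_interest(amount: float, interest_rate: float, term_years: float) -> float:
    """Simple interest; the formula is an assumption for illustration."""
    return amount * interest_rate * term_years


@dataclass
class PurchaseOrder:
    """Hypothetical composite record standing in for PO_Details."""
    quantity: int
    unit_price: float
    customer_name: str  # not needed for pricing, yet exposed to the callee


# Stamp coupling: the whole composite is passed although two fields are used.
def calc_order_amt(po: PurchaseOrder) -> float:
    return po.quantity * po.unit_price


# Control coupling: the flag steers the internal logic of the called module.
def print_report(what_to_print_flag: str) -> str:
    if what_to_print_flag == "summary":
        return "Summary report"
    if what_to_print_flag == "detail":
        return "Detailed report"
    raise ValueError(f"unknown flag: {what_to_print_flag}")
```

Note how `calc_order_amt` receives `customer_name` without needing it, which is the over-exposure risk of stamp coupling mentioned above; the caller of `print_report` has to know which flag values the callee understands.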

In general, Page-Jones also warns of tramp data and hybrid coupling. When data is passed up and down merely to send it to a desired module, the data will have no meaning at various levels. This leads to tramp data. Hybrid coupling results when different parts of flags are used (misused?) to mean different things in different places (usually we may brand it as control coupling, but hybrid coupling complicates connections between modules). Page-Jones advocates a way to distinguish data from control flags (data are named by nouns and control flags by verbs).

Two modules may be coupled in more than one way. In such cases, their coupling is defined by the worst coupling type they exhibit [Page-Jones, 1988].

COHESION

Designers should aim for loosely coupled and highly cohesive modules. Coupling is reduced when the relationships among elements not in the same module are minimized. Cohesion, on the other hand, aims to maximize the relationships among elements in the same module. Cohesion is a good measure of the maintainability of a module [Stevens et al., 1974]. Stevens, Myers, Constantine, and Yourdon developed a scale of cohesion (from highest to lowest):
1. Functional Cohesion (Best)
2. Sequential Cohesion
3. Communicational Cohesion
4. Procedural Cohesion
5. Temporal Cohesion
6. Logical Cohesion
7. Coincidental Cohesion (Worst)

Let us create a module that calculates the average of marks obtained by students in a class:

    calc_stat(){              // only a pseudo code
        read (x[])
        a = average (x)
        print a
    }

    average(m){
        sum = 0
        for i = 1 to N {
            sum = sum + m[i]
        }
        return (sum/N)
    }

In average() above, all of the elements are related to the performance of a single function. Such a functional binding (cohesion) is the strongest type of binding. Suppose we need to calculate the standard deviation also in the above problem; our pseudo code would look like:

    calc_stat(){              // only a pseudo code
        read (x[])
        a = average (x)
        s = sd (x, a)
        print a, s
    }

    average(m){               // same as before
        …
    }

    sd (m, y){                // function to calculate standard deviation
        …
    }

Now, though average() and sd() are functionally cohesive, calc_stat() has a sequential binding (cohesion). Like a factory assembly line, functions are arranged in sequence and the output from average() goes as an input to sd(). Suppose we make sd() calculate the average also; then calc_stat() has two functions related by a reference to the same set of input. This results in communicational cohesion. Let us make calc_stat() into a procedure as below:

    calc_stat(){
        sum = sumsq = count = 0
        for i = 1 to N {
            read (x[i])
            sum = sum + x[i]
            sumsq = sumsq + x[i]*x[i]
            …
        }
        a = sum/N
        s = …                 // formula to calculate SD
        print a, s
    }

Now, instead of binding functional units with data, calc_stat() is involved in binding activities through control flow. calc_stat() has made two statistical functions into a procedure. Obviously, this arrangement affects reuse of this module in a different context (for instance, when we need to calculate only the average and not the standard deviation). Such cohesion is called procedural. A good design for calc_stat() could be (data not shown):
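As a rough code-level picture of such a design, the Python sketch below keeps average() and sd() as separate, functionally cohesive functions and leaves calc_stat() as a thin coordinator. It is an illustrative rendering rather than the chart from the original notes; using the population form of the standard deviation is an assumption, since the pseudo code leaves the formula open.

```python
import math
from typing import Sequence, Tuple


def average(values: Sequence[float]) -> float:
    """Functionally cohesive: computes the mean and nothing else."""
    return sum(values) / len(values)


def sd(values: Sequence[float], mean: float) -> float:
    """Functionally cohesive: population standard deviation around `mean`.

    The population form (divide by N) is an assumption; the source
    pseudo code does not say which formula is intended.
    """
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def calc_stat(values: Sequence[float]) -> Tuple[float, float]:
    """Coordinator: the output of average() is fed into sd()."""
    a = average(values)
    s = sd(values, a)
    return a, s
```

Because average() stands alone, a caller that needs only the mean can reuse it without paying for the standard deviation, which is exactly what the procedural version prevents.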

In a temporally bound (cohesion) module, the elements are related in time. The best examples of modules in this type are the traditional “initialization”, “termination”, “housekeeping”, and “clean-up” modules. A logically cohesive module contains a number of activities of the same kind. To use the module, we may have to send a flag to indicate what we want (forcing various activities

sharing the interface). An example is a module that performs all input and output operations for a program. The activities in a logically cohesive module usually fall into the same category (validate all input or edit all data), leading to sharing of common lines of code (plate of spaghetti?). Suppose we have a module with all possible statistical measures (like average, standard deviation, mode, etc.). If we want to calculate only the average, the call to it would look like calc_all_stat (x[], flag1, flag2, para1,…). The flags are used to indicate our intent. Some parameters will also be left blank.

When there is no meaningful relationship among the elements in a module, we have coincidental cohesion.

Refer [Page-Jones, 1988: Chapters 5 & 6] for more illustrations and exercises on coupling and cohesion.

4. Data Design

The data design can start from the ERD and Data Dictionary. The first choice the designer has to make is between a file-based system and a database system (DBMS). File-based systems are easy to design and implement. Processing speed may be higher. The major drawback of this arrangement is that it leads to isolated applications, each with its own data, and hence to high redundancy between applications. On the other hand, a DBMS allows you to plug in your application on top of integrated (organization-wide) data. This arrangement ensures minimum redundancy across applications. Change is easier to implement without performance degradation. A restart/recovery feature (the ability to recover from hardware/software failures) is usually part of a DBMS. On the flip side, because of all these in-built features, a DBMS is quite complex and may be slower.

A File is a logical collection of records. Typically the fields in the records are the attributes. Files can be accessed either sequentially or randomly; files are organized using Sequential, Indexed or Indexed Sequential organization.

E.F. Codd introduced a technique called normalization, to minimize redundancy in a table.
Consider R (St#, StName, Major, {C#, CTitle, Faculty, FacLoc, Grade}). The following are the assumptions given:
• Every course is handled by only one faculty
• Each course is associated to a particular faculty
• Faculty is unique
• No student is allowed to repeat a course

This table needs to be converted to 1NF as there is more than one value in a cell. This is illustrated below:

After removing repeating groups, R becomes R1(St#, StName, Major) and R2(St#, C#, CTitle, Faculty, FacLoc, Grade), where the key attributes (St# in R1; St# and C# in R2) are underlined in the original figure, as illustrated below:

The primary key of the parent table needs to be included in the new table so as to ensure no loss of information. Now, R1 is in 2NF as it has no composite key. However, R2 has a composite key of St# and C#, which together determine Grade. Also, C# alone determines CTitle, Faculty and FacLoc, which is a partial dependency.

In order to convert it into 2NF we have to remove this partial dependency. R2 now becomes R21(St#, C#, Grade) and R22(C#, CTitle, Faculty, FacLoc). Now, R21 and R22 are in 2NF, which is illustrated as below:
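Since the figure is not reproduced here, the 2NF step can be sketched in a few lines of Python. This is only an illustration: the sample rows, the faculty names and the helper functions `project` and `join_on_course` are hypothetical and not part of the original notes. The sketch projects R2 onto the two new relations and checks that joining them on C# gives back the original rows, i.e. that no information is lost.

```python
# Hypothetical sample rows for R2(St#, C#, CTitle, Faculty, FacLoc, Grade).
R2 = [
    ("S1", "C1", "Databases", "Rao", "Block A", "A"),
    ("S1", "C2", "Networks", "Iyer", "Block B", "B"),
    ("S2", "C1", "Databases", "Rao", "Block A", "C"),
]


def project(rows, columns):
    """Project rows onto the given column positions, dropping duplicates."""
    seen = []
    for row in rows:
        picked = tuple(row[i] for i in columns)
        if picked not in seen:
            seen.append(picked)
    return seen


# R21(St#, C#, Grade): Grade depends on the whole key (St#, C#).
R21 = project(R2, (0, 1, 5))
# R22(C#, CTitle, Faculty, FacLoc): these depend on C# alone.
R22 = project(R2, (1, 2, 3, 4))


def join_on_course(r21, r22):
    """Rebuild the rows of R2 by joining R21 and R22 on C#."""
    by_course = {row[0]: row[1:] for row in r22}
    return [(st, c) + by_course[c] + (grade,) for st, c, grade in r21]


rebuilt = join_on_course(R21, R22)
```

In this sample the course details for C1 are stored once in R22 instead of once per enrolled student, which is the redundancy the 2NF step removes.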

Now, R1 and R21 are in 3NF also. However, in R22 there exists one transitive dependency, as Faculty determines FacLoc, shown as below:

In order to convert it into 3NF we have to remove this transitive dependency. R22 now becomes R221(C#, CTitle, Faculty*) and R222(Faculty, FacLoc), assuming Faculty is unique. Now R221 and R222 are in 3NF.

The final 3NF reduction for the given relation is:
Student (St#, StName, Major) – R1
Mark_List (St#, C#, Grade) – R21
Course (C#, CTitle, Faculty*) – R221
Faculty (Faculty, FacLoc) – R222

De-normalization can be done for improving performance (as against flexibility). For example, Employee (Name, Door#, StreetName, Location, Pin) may become two tables during 3NF, with a separate table for maintaining Pin information. If the application does not have any frequent change to Pin, then we can de-normalize these two tables back to the original form, allowing some redundancy in exchange for improved performance.

Guidelines for converting ERD to tables:
• Each simple entity --> Table / File
• For Super-type / Sub-type entities
o Separate / one table for all entities
o Separate table for sub & super-type
• If M:M relationship between entities, create one Table / File for the relationship, with the primary keys of both entities forming a composite key
• Add inherited attribute (foreign key) depending on retrieval need (typically on the ‘M’ side of a 1:M relationship)

References
[1] Shaw, Mary & Garlan, David, Software Architecture: Perspectives on an Emerging Discipline, Prentice Hall, 1996.
[2] Roger S. Pressman, Software Engineering: A Practitioner’s Approach, 5th edition, McGraw-Hill, New York, 2000.
[3] Ben Shneiderman, Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd edition, Addison-Wesley, 1998.
[4] [IEEE, 1983] IEEE Standard Glossary of Software Engineering Terminology, The Institute of Electrical and Electronic Engineers, New York, 1983.
[5] [Parnas, 1972b] D.L. Parnas, "On the Criteria To Be Used in Decomposing Systems Into Modules," Communications of the ACM, Vol. 15, No. 12, December 1972, pp. 1053-1058.
[6] [Shaw, 1984] M. Shaw, "Abstraction Techniques in Modern Programming Languages," IEEE Software, Vol. 1, No. 4, October 1984, pp. 10-26.
[7] [Wirfs-Brock et al., 1990] R. Wirfs-Brock, B. Wilkerson, and L. Wiener, Designing Object-Oriented Software, Prentice Hall, Englewood Cliffs, New Jersey, 1990.
[8] [Page-Jones, 1988] Page-Jones, Meilir, The Practical Guide to Structured Systems Design, Second Edition, Prentice-Hall, 1988.
[9] [Stevens et al., 1974] W.P. Stevens, G.J. Myers, and L.L. Constantine, “Structured Design,” IBM Systems Journal, Vol. 13, No. 2, 1974. (Reprinted in IBM Systems Journal, Vol. 38, Nos. 2 & 3, 1999, pp. 231-256)

Recommended Text Books
[10] Pankaj Jalote, An Integrated Approach to Software Engineering, 2nd edition, Narosa Pub. House, 1997.
[11] Meilir Page-Jones, Practical Guide to Structured Systems Design, 2nd Edition, Prentice-Hall, 1988.
[12] Date, C.J., Introduction to Database Systems, 7th edition, Addison-Wesley, 2000.
[13] Ben Shneiderman, Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd edition, Addison-Wesley, 1998.
[14] Grady Booch, Object-Oriented Analysis and Design With Applications, 2nd edition, Addison-Wesley, 1994.
[15] Garlan, D. and Shaw, M., An Introduction to Software Architecture, CMU Software Engineering Institute Report, CMU-CS-94-166, January, 1994.

Testing and Debugging

1. Introduction to Testing

1.1 A Self-Assessment Test [1]
Take the following test before starting your learning: “A program reads three integer values. The three values are interpreted as representing the lengths of the sides of a triangle. The program displays a message that states whether the given sides can make a scalene or isosceles or equilateral triangle (Triangle Program)”. On a sheet of paper, write a set of test cases that would adequately test this program. Evaluate the effectiveness of your test cases using this list of common errors.

Testing is an important, mandatory part of software development; it is a technique for evaluating product quality and also for indirectly improving it, by identifying defects and problems [2].

What is common between these disasters?
• Ariane 5, Explosion, 1996
• AT&T long distance service fails for nine hours, 1990
• Airbus downing during Iran-conflict, 1988
• Shut down of Nuclear Reactors, 1979
Software faults!! Refer Prof. Thomas Huckle’s site for a collection of software bugs: http://wwwzenger.informatik.tu-muenchen.de/persons/huckle/bugse.html and refer http://www.cs.tau.ac.il/~nachumd/verify/horror.html for software horror stories!

1.2 How do we decide correctness of a software?
To answer this question, we need to first understand how a software can fail. Failure, in the software context, is non-conformance to requirements. Failure may be due to one or more faults [3]:
• Error or incompleteness in the requirements
• Difficulty in implementing the specification in the target environment
• Faulty system or program design
• Defects in the code

From the variety of faults above (Refer Pfleeger [3] for an excellent discussion on various types of faults), it is clear that testing cannot be seen as an activity that will start after the coding phase – software testing is an activity that encompasses the whole development life cycle. We need to test a program to demonstrate the existence of a fault. Contrary to what the term suggests, ‘demonstrating correctness’ is not a demonstration that the program works properly. Testing brings negative connotations to our normal understanding.

Myers’ classic (which is still regarded as the best fundamental book on testing), “The Art of Software Testing”, lists the following as the most important testing principles:
• (Definition): Testing is the process of executing a program with the intent of finding errors
• A good test case is one that has a high probability of detecting an as-yet undiscovered error
• A successful test case is one that detects an as-yet undiscovered error

Myers [1] discusses more testing principles:
• Test case definition includes expected output (a test oracle)
• Programmers should avoid testing their own programs (third party testing?)
• Inspect the result of each test
• Include test cases for invalid & unexpected input conditions
• See whether the program does what it is not supposed to do (‘error of commission’)
• Avoid throw-away test cases (test cases serve as a documentation for future maintenance)
• Do not assume that the program is bug free (non-developer, rather a destructive mindset is needed)
• The probability of more errors is proportional to the number of errors already found

Consider the following diagram:

If we abstract a software to be a mathematical function f, which is expected to produce various outputs for inputs from a domain (represented diagrammatically above), then it is clear from the definition of Testing by Myers that
• Testing cannot prove correctness of a program – it is just a series of experiments to find out errors (as f is usually a discrete function that maps the input domain to various outputs – that can be observed by executing the program)
• There is nothing like 100% error-free code as it is not feasible to conduct exhaustive testing; proving 100% correctness of a program is not possible.

We need to develop an attitude for ‘egoless programming’ and keep a goal of eliminating as many faults as possible. Statistics on review effectiveness, and common sense, say that prevention is better than cure. We need to put static testing also in place to capture an error before it becomes a defect in the software. Recent agile methodologies like extreme programming address these issues better with practices like test-driven programming and pair programming (to reduce the psychological pressure on individuals and to bring review into the coding activity) [4].

1.3 Testing Approaches
There are two major approaches to testing:
• Black-box (or closed box or data-driven or input/output driven or behavioral) testing
• White-box (or clear box or glass box or logic-driven) testing

If the component under test is viewed as a “black-box”, the inputs have to be given to observe the behavior (output) of the program. In this case, it makes sense to give both valid and invalid inputs. The observed output is then matched with the expected result (from specifications). The advantage of this approach is that the tester need not worry about the internal structure of the program. However, it is impossible to find all errors using this approach. For instance, if one tried three equilateral triangle test cases for the triangle program, we cannot be sure whether the program will detect all equilateral triangles. The program may contain a hard coded display of ‘scalene triangle’ for values (300, 300, 300). To exhaustively test the triangle program, we need to create test cases for all valid triangles up to MAXIMUM integer size. This is an astronomical task – but still not exhaustive (Why?). To be sure of finding all possible errors, we not only test with valid inputs but with all possible inputs (including invalid ones like characters, float, negative integers, etc.).

The “white-box” approach examines the internal structure of the program. This means logical analysis of the software element, where testing is done to trace all possible paths of control flow. This method of testing exposes both errors of omission (errors due to neglected specification) and also errors of commission (something not defined by the specification). Let us look at a specification of a simple file handling program as given below.

“The program has to read employee details such as employee id, name, date of joining and department as input from the user, create a file of employees and display the records sorted in the order of employee ids.”

Examples of errors of omission: omission of the Display module, display of records not in sorted order of employee ids, file created with fewer fields, etc.
Example of error of commission: additional lines of code deleting some arbitrary records from the created file. (No amount of black box testing can expose such errors of commission, as the method uses the specification as the reference to prepare test cases.)

However, it is many a time practically impossible to do complete white box testing to trace all possible paths of control flow, as the number of paths could be astronomically large. For instance, if a program segment has 5 different control paths (4 nested if-then-else) and this segment is iterated 20 times, the number of unique paths would be 5^20 + 5^19 + … + 5^1, which is approximately 10^14 or 100 trillion [1]. If we were able to complete one test case every 5 minutes, it would take approximately one billion years to test every unique path. Due to dependency of decisions, not all control paths may be feasible. Hence, actually, we may not be testing all these paths. Even if we manage to do an exhaustive testing of all possible paths, it may not guarantee that the program works as per specification: instead of sorting in descending order (required by the specification), the program may sort in ascending order. Exhaustive path testing will not address missing paths and data-sensitive errors. A black-box approach would capture these errors!

In conclusion,
• It is not feasible to do exhaustive testing either in black or in white box approaches.
• None of these approaches is superior – meaning, one has to use both approaches, as they really complement each other.
• Last, but not the least, static testing still plays a large role in software testing.

The challenge, then, lies in using a right mix of all these approaches and in identifying a subset of all possible test cases that have the highest probability of detecting most errors. The details of various techniques under black and white box approach are covered in Test Techniques.

References
[1] Myers, G.J., The Art of Software Testing, John Wiley & Sons, 1979.
[2] Bertolino, A., “Software Testing”, in Pierre Bourque and Robert Dupuis (eds.), Guide to the Software Engineering Body of Knowledge (SWEBOK), IEEE, 2001.
[3] Pfleeger, S.L., Software Engineering: Theory and Practice, 2nd edition, Prentice Hall, 2001.
[4] Kent Beck, Extreme Programming Explained: Embrace Change, Addison-Wesley, 1999.

2. Levels of Testing

2.1 Overview
In developing a large system, testing usually involves several stages (Refer the following figure [2]).
• Unit Testing
• Integration Testing
• System Testing
• Acceptance Testing

Initially, each program component (module) is tested independently, verifying the component functions with the types of input identified by studying the component’s design. Such testing is called Unit Testing (or component or module testing). Unit testing is done in a controlled environment with a predetermined set of data fed into the component to observe what output actions and data are produced.

When collections of components have been unit-tested, the next step is ensuring that the interfaces among the components are defined and handled properly. This process of verifying the synergy of system components against the program Design Specification is called Integration Testing.

Once the system is integrated, the overall functionality is tested against the Software Requirements Specification (SRS). Then, the other non-functional requirements like performance are tested to ensure readiness of the system to work successfully in a customer’s actual working environment. This step is called System Testing.

The next step is the customer’s validation of the system against the User Requirements Specification (URS). The customer, in their own working environment, does this exercise of Acceptance Testing, usually with assistance from the developers. Once the system is accepted, it will be installed and put to use.

2.2 Unit Testing
Pfleeger [2] advocates the following steps to address the goal of finding faults in modules (components):

• Examining the code
o Typically the static testing methods like Reviews, Walkthroughs and Inspections are used (Refer RWI course)

• Proving code correct
o After the coding and review exercise, if we want to ascertain the correctness of the code we can use formal methods. A program is correct if it implements the functions and data properly as indicated in the design, and if it interfaces properly with all other components. One way to investigate program correctness is to view the code as a statement of logical flow. Using mathematical logic, if we can formulate the program as a set of assertions and theorems, we can show that the truth of the theorems implies the correctness of the code.
o Use of this approach forces us to be more rigorous and precise in specification. Much work is involved in setting up and carrying out the proof. For example, the code for performing bubble sort is much smaller than its logical description and proof.

• Testing program components (modules)
o In the absence of simpler methods and automated tools, “proving code correctness” will be an elusive goal for software engineers. Proving views programs in terms of classes of data and conditions, and the proof may not involve execution of the code. On the contrary, testing is a series of experiments to observe the behaviour of the program for various input conditions. While proof tells us how a program will work in a hypothetical environment described by the design and requirements, testing gives us information about how a program works in its actual operating environment.
o To test a component (module), input data and conditions are chosen to demonstrate an observable behaviour of the code. A test case is a particular choice of input data to be used in testing a program. Test cases are generated by using either black-box or white-box approaches (Refer Test Techniques)

2.3 Integration Testing
Integration is the process of assembling unit-tested modules. We need to test the following aspects that are not previously addressed while independently testing the modules:
• Interfaces: To ensure “interface integrity,” the transfer of data between modules is tested. When data is passed to another module, by way of a call, there should not be any loss or corruption of data. The loss or corruption of data can happen due to mismatch or differences in the number or order of calling and receiving parameters.
• Module combinations may produce a different behaviour due to combinations of data that are not exercised during unit testing.
• Global data structures, if used, may reveal errors due to unintended usage in some module.

Integration Strategies
Depending on design approach, one of the following integration strategies can be adopted:
· Big Bang approach
· Incremental approach
• Top-down testing
• Bottom-up testing
• Sandwich testing

To illustrate, consider the following arrangement of modules:
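The figure with the module arrangement is not reproduced here. As a rough stand-in, the calling hierarchy implied by the discussion that follows (M1 calls M2, M3 and M4; M3 calls M5; M4 calls M6 and M7) can be written down as a small Python structure. This is a sketch under that assumption, and the function names `top_down_order`, `bottom_up_order` and `stubs_needed` are hypothetical. It derives one possible integration order for each incremental strategy, and lists the called modules that would still have to be simulated (the text calls such stand-ins stubs) at a given point.

```python
from collections import deque

# Assumed calling hierarchy, inferred from the text: module -> modules it calls.
CALLS = {
    "M1": ["M2", "M3", "M4"],
    "M2": [],
    "M3": ["M5"],
    "M4": ["M6", "M7"],
    "M5": [],
    "M6": [],
    "M7": [],
}


def top_down_order(calls, root):
    """Breadth-first order: a module is added only after its caller."""
    order, queue = [], deque([root])
    while queue:
        module = queue.popleft()
        order.append(module)
        queue.extend(calls[module])
    return order


def bottom_up_order(calls, root):
    """Post-order: a module is added only after everything it calls."""
    order = []

    def visit(module):
        for callee in calls[module]:
            visit(callee)
        order.append(module)

    visit(root)
    return order


def stubs_needed(calls, integrated):
    """Modules called by the integrated set that are not yet integrated."""
    called = {callee for module in integrated for callee in calls[module]}
    return sorted(called - set(integrated))
```

Other orders are equally valid (for example, a depth-first top-down order); the only constraint the sketch encodes is the caller-before-callee rule for top-down and its reverse for bottom-up.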

Big Bang approach consists of testing each module individually and linking all these modules together only when every module in the system has been tested. Though the Big Bang approach seems to be advantageous when we construct independent modules concurrently, this approach is quite challenging and risky as we integrate all modules in a single step and test the resulting system. Locating interface errors, if any, becomes difficult here.

The alternative strategy is an incremental approach, wherein modules of a system are consolidated with already tested components of the system. In this way, the software is gradually built up, spreading the integration testing load more evenly through the construction phase. The incremental approach can be implemented in two distinct ways: Top-down and Bottom-up.

In Top-down testing, testing begins with the topmost module. A module will be integrated into the system only when the module which calls it has already been integrated successfully. An example order of Top-down testing for the above illustration will be:

The testing starts with M1. To test M1 in isolation, communications to modules M2, M3 and M4 have to be somehow simulated by the tester, as these modules may not be ready yet. To simulate responses of M2, M3 and M4 whenever they are to be invoked from M1, “stubs” are created. Simple applications may require stubs which simply return control to their superior modules. More complex situations demand stubs that simulate a full range of responses, including parameter passing. Stubs may be individually created by the tester (as programs in their own right) or they may be provided by a software testing harness, which is a piece of software specifically designed to provide a testing environment. In the above illustration, M1 would require stubs to simulate the activities of M2, M3 and M4. The integration of M3 would require a stub or stubs (?!) for M5, and M4 would require stubs for M6 and M7. Elementary modules (those which call no subordinates) require no stubs.

Bottom-up testing begins with elementary modules. If M5 is ready, we need to simulate the activities of its superior, M3. Such a “driver” for M5 would simulate the invocation activities of M3. As with the stub, the complexity of a driver would depend upon the application under test. The driver would be responsible for invoking the module under test, it could be responsible for passing test data (as parameters) and it might be responsible for receiving output data. Again, the driving function can be provided through a testing harness or may be created by the tester as a program. The following diagram shows the bottom-up testing approach for the above illustration:

For the above example, drivers must be provided for modules M2, M5, M6, M7, M3 and M4. There is no need for a driver for the topmost node, M1.

Myers [1] lists the advantages and disadvantages of Top-down testing and Bottom-up testing:

Top-down testing
Advantages:
• Advantageous if major flaws occur toward the top of the program
• Early skeletal program allows demonstrations and boosts morale
Disadvantages:
• Stub modules must be produced
• Test conditions may be impossible, or very difficult, to create
• Observation of test output is more difficult, as only simulated values will be used initially. For the same reason, program correctness can be misleading

Bottom-up testing
Advantages:
• Advantageous if major flaws occur toward the bottom of the program
• Test conditions are easier to create
• Observation of test results is easier (as “live” data is used from the beginning)
Disadvantages:
• Driver modules must be produced
• The program as an entity does not exist until the last module is added

To overcome the limitations and to exploit the advantages of Top-down and Bottom-up testing, sandwich testing is used [2]: the system is viewed as three layers – the target layer in the middle, the levels above the target, and the levels below the target. A top-down approach is used in the top layer and a bottom-up one in the lower layer. Testing converges on the target layer, chosen on the basis of system characteristics and the structure of the code. For example, if the bottom layer contains many general-purpose utility programs, the target layer (the one above) will be the components using the utilities. This approach allows bottom-up testing to verify the utilities’ correctness at the beginning of testing.

Choosing an integration strategy [2] depends not only on system characteristics, but also on customer expectations. For instance, the customer may want to see a working version as soon as possible, so we may adopt an integration schedule that produces a basic working system early in the testing process. In this way coding and testing can go concurrently.

2.4 System Testing
The objective of unit and integration testing was to ensure that the code implemented the design properly. In system testing, we need to ensure that the system does what the customer wants it to do. Initially the functions (functional requirements) performed by the system are tested. A function test checks whether the integrated system performs its functions as specified in the requirements. After ensuring that the system performs the intended functions, the performance test is done. The non-functional requirements include security, accuracy, speed, and reliability.

System testing begins with function testing. Since the focus here is on functionality, a black-box approach is taken (Refer Test Techniques). Function testing is performed in a controlled situation. Since function testing compares the system’s actual performance with its requirements, test cases are developed from the requirements document (SRS). For example [2], a word processing system can be tested by examining the following functions: document creation, document modification and document deletion. To test document modification, adding a character, adding a word, adding a paragraph, deleting a character, deleting a word, deleting a paragraph, changing the font, changing the type size, changing the paragraph formatting, etc. are to be tested.

Performance testing addresses the non-functional requirements. System performance is measured against the performance objectives set by the customer. For example, function testing may have demonstrated how the system handles deposit or withdraw transactions in a bank account package. Performance testing evaluates the speed with which calculations are made, the precision of the computation, the security precautions required, and the response time to user inquiry.

Types of Performance Tests [2]
1. Stress tests – evaluate the system when stressed to its limits. If the requirements state that a system is to handle up to a specified number of devices or users, a stress test evaluates system performance when all those devices or users are active simultaneously. This test brings out the performance during peak demand.
2. Volume tests – address the handling of large amounts of data in the system. This includes
• Checking of whether data structures have been defined large enough to handle all possible situations,
• Checking the size of fields, records and files to see whether they can accommodate all expected data, and
• Checking of the system’s reaction when data sets reach their maximum size.
3. Configuration tests – analyze the various software and hardware configurations specified in the requirements (e.g. system to serve variety of audiences)
4. Compatibility tests – are needed when a system interfaces with other systems (e.g. system to retrieve information from a large database system)
5. Regression tests – are required when the system being tested is replacing an existing system (always used during a phased development – to ensure that the new system’s performance is at least as good as that of the old)
6. Security tests – ensure the security requirements (testing characteristics related to availability, integrity, and confidentiality of data and services)
7. Timing tests – include response time, transaction time, etc. Usually done with the stress test to see if the timing requirements are met even when the system is extremely active.
8. Environmental tests – look at the system’s ability to perform at the installation site. If the requirements include tolerances to heat, humidity, motion, chemical presence, moisture, portability, electrical or magnetic fields, disruption of power, or any other environmental characteristics of the site, then our tests should ensure that the system performs under these conditions.
9. Quality tests – evaluate the system’s reliability, maintainability, and availability. These tests include calculation of mean time to failure and mean time to repair, as well as average time to find and fix a fault.

10. Recovery tests – address the response to the loss of data, power, devices or services. The system is subjected to loss of system resources and tested to see whether it recovers properly.
11. Maintenance tests – address the need for diagnostic tools and procedures to help in finding the source of problems. They verify the existence and functioning of aids like diagnostic programs, memory maps, traces of transactions, etc.
12. Documentation tests – ensure that documents like user guides, maintenance guides and technical documentation exist, and verify the consistency of information in them.
13. Human factor (or Usability) tests – investigate user interface related requirements. Display screens, messages, report formats and other aspects are examined for ease of use.

2.5 Acceptance Testing
Acceptance testing is the customer (and user) evaluation of the system, primarily to determine whether the system meets their needs and expectations. Usually the acceptance test is done by the customer with assistance from developers. Customers can evaluate the system either by conducting a benchmark test or by a pilot test [2]. In a benchmark test, the system performance is evaluated against test cases that represent typical conditions under which the system will operate when actually installed. A pilot test installs the system on an experimental basis, and the system is evaluated against everyday working. Sometimes the system is piloted in-house before the customer runs the real pilot test. The in-house test, in such a case, is called an alpha test, and the customer’s pilot is a beta test. This approach is common in the case of commercial software where the system has to be released to a wide variety of customers. A third approach, parallel testing, is done when a new system is replacing an existing one or is part of a phased development. The new system is put to use in parallel with the previous version; this facilitates a gradual transition of users and allows the new system to be compared and contrasted with the old.

References
[1] Myers, G.J., The Art of Software Testing, John Wiley & Sons, 1979.
[2] Pfleeger, S.L., Software Engineering: Theory and Practice, 2nd edition, Prentice Hall, 2001.

3. Test Techniques
We shall discuss the Black Box and White Box approaches.

3.1 Black Box Approach
- Equivalence Partitioning
- Boundary Value Analysis
- Cause Effect Analysis
- Cause Effect Graphing
- Error Guessing

I. Equivalence Partitioning
Equivalence partitioning is partitioning the input domain of a system into a finite number of equivalent classes in such a way that testing one representative from a class is equivalent to testing any other value from that class. To put this in simpler words, since it is practically infeasible to do exhaustive testing, the next best alternative is to check whether the program extends similar behaviour or treatment to a certain group of inputs. If such a group of values can be found in the input domain, treat them together as one equivalent class and test one representative from it. This can be explained with the following example.

Consider a program which takes “Salary” as input with values 12000...37000 in the valid range. The program calculates tax as follows:
- Salary up to Rs. 15000 – No Tax
- Salary between 15001 and 25000 – Tax is 18% of Salary
- Salary above 25000 – Tax is 20% of Salary

Here, the specification contains a clue that certain groups of values in the input domain are treated “equivalently” by the program. Accordingly, the valid input domain can be divided into three valid equivalent classes as below:
c1: values in the range 12000...15000
c2: values in the range 15001...25000
c3: values > 25000

However, it is not sufficient that we test only valid test cases. We need to test the program with invalid data also, as the users of the program may give invalid inputs, intentionally or unintentionally. It is easy to identify an invalid class “c4: values < 12000”. If we assume some maximum limit (MAX) for the variable Salary, we can modify the class c3 above to “values in the range 25001...MAX” and identify an invalid class “c5: values > MAX”. Depending on the system, MAX may be either defined by the specification or defined by the hardware or software constraints later during the design phase.

If we further expand our discussion and assume that the user or tester of the program may give any value which can be typed in through the keyboard as input, we can form the equivalence classes as explained below. Since the input has to be “salary”, it can be seen intuitively that numeric and non-numeric values are treated differently by the program. Hence we can form two classes
- class of non-numeric values
- class of numeric values
Since all non-numeric values are treated as invalid by the program, the class of non-numeric values (c1 in the table that follows) need not be further subdivided.

The class of numeric values needs further subdivision as all elements of the class are not treated alike by the program. Again, within this class, if we look for groups of values meeting with similar treatment from the program, the following classes can be identified:
values < 12000
values in the range 12000…15000
values in the range 15001…25000
values in the range 25001…MAX
values > MAX
Each of these equivalent classes need not be further subdivided as the program should treat all values within each class in a similar manner. Thus the equivalent classes identified for the given specification, along with a set of sample test cases designed using these classes, are shown in the following table.

Class | Input Condition (Salary) | Expected Result (Tax amount)
c1 - class of non-numeric values | A non-numeric value | Error Msg: “Invalid Input”
c2 - values < 12000 | A numeric value < 12000 | Error Msg: “Invalid Input”
c3 - values in the range 12000…15000 | A numeric value >= 12000 and <= 15000 | No Tax
c4 - values in the range 15001…25000 | A numeric value >= 15001 and <= 25000 | Tax = 18% of Salary
c5 - values in the range 25001…MAX | A numeric value >= 25001 and <= MAX | Tax = 20% of Salary
c6 - values > MAX | A numeric value > MAX | Error Msg: “Invalid Input”
(The Actual/Observed Result and Remarks columns of the table are left blank, to be filled in when the tests are run.)

We can summarise this discussion as follows: To design test cases using equivalence partitioning, for a range of valid input values identify
- one valid value within the range
- one invalid value below the range and
- one invalid value above the range
Similarly, to design test cases for a specific set of values, identify
- one valid case for each value belonging to the set
- one invalid value
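The salary classes c1 to c6 above can be turned into a small table-driven check. The following Python sketch is only an illustration: the function `equivalence_class`, the representative inputs, and the choice of MAX = 37000 (taken from the stated valid range) and of whole-number salaries are assumptions, not part of the original specification.

```python
MAX = 37000  # assumed upper limit, taken from the valid range 12000...37000


def equivalence_class(raw):
    """Map a raw keyboard input to one of the classes c1..c6.

    Assumes salaries are whole numbers, so anything that does not parse
    as an integer counts as non-numeric (class c1).
    """
    try:
        salary = int(str(raw).strip())
    except ValueError:
        return "c1"
    if salary < 12000:
        return "c2"
    if salary <= 15000:
        return "c3"
    if salary <= 25000:
        return "c4"
    if salary <= MAX:
        return "c5"
    return "c6"


# Expected result for each class, as in the table above.
EXPECTED = {
    "c1": "Invalid Input",
    "c2": "Invalid Input",
    "c3": "No Tax",
    "c4": "Tax = 18% of Salary",
    "c5": "Tax = 20% of Salary",
    "c6": "Invalid Input",
}

# One hypothetical representative per class; any other member would do.
REPRESENTATIVES = ["abc", "5000", "13000", "20000", "30000", "50000"]


def test_table():
    """Return (input, class, expected result) rows, one per representative."""
    rows = []
    for raw in REPRESENTATIVES:
        cls = equivalence_class(raw)
        rows.append((raw, cls, EXPECTED[cls]))
    return rows
```

Six inputs cover all six classes here; replacing "13000" by "14000" would not add any coverage under the equivalence assumption, which is the point of the technique.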

Eg: Test Cases for Types of Account (Savings, Current) will be
    - Savings, Current (valid cases)
    - Overdraft (invalid case)

It may be noted that we need fewer test cases if some test cases can cover more than one equivalent class.

II. Boundary Value Analysis

Even though the definition of equivalence partitioning states that testing one value from a class is equivalent to testing any other value from that class, we need to look at the boundaries of equivalent classes more closely. This is so since boundaries are more error prone.

To design test cases using boundary value analysis, for a range of values, identify
    - Two valid cases at both the ends
    - Two invalid cases just beyond the range limits

Consider the example discussed in the previous section. For the valid equivalence class “values in the range 12000…15000” of Salary (class c3 in the table), the test cases using boundary value analysis are

    Input Condition (Salary)   Expected Result (Tax amount)   Actual/Observed Result   Remarks
    11999                      Invalid input
    12000                      No Tax
    15000                      No Tax
    15001                      Tax = 18% of Salary

If we closely look at the Expected Result column, we can see that the expected results differ for the two input values on either side of each boundary (11999 and 12000, 15000 and 15001). We need to perform testing using boundary value analysis to ensure that this difference is maintained. The same guidelines need to be followed to check output boundaries also.

Other examples of test cases using boundary value analysis are:
    - A compiler being tested with an empty source program
    - Trigonometric functions like TAN being tested with values near π/2
    - A function for deleting a record from a file being tested with an empty data file or a data file with just one record in it.
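The salary example can be sketched in C++ to show how the equivalence classes and the boundary values turn into concrete test data. This is only an illustration under stated assumptions: the names computeTax, TaxResult, TestCase and countFailures are hypothetical, the value chosen for MAX is invented (the notes never give one), and an empty input is assumed to be invalid.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical upper limit: the notes only call it MAX.
const long MAX_SALARY = 1000000;

struct TaxResult {
    bool valid;   // false stands for the "Invalid Input" error message
    double tax;   // meaningful only when valid is true
};

// Sketch of the program under test, written from the Expected Result column.
TaxResult computeTax(const std::string& input) {
    const TaxResult invalid = {false, 0.0};
    if (input.empty() || input.size() > 9) return invalid;     // empty, or far above MAX
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] < '0' || input[i] > '9') return invalid;  // c1: non-numeric
    }
    const long salary = std::stol(input);
    if (salary < 12000) return invalid;                        // c2
    TaxResult result = {true, 0.0};
    if (salary <= 15000) return result;                        // c3: no tax
    if (salary <= 25000) { result.tax = 0.18 * salary; return result; }       // c4
    if (salary <= MAX_SALARY) { result.tax = 0.20 * salary; return result; }  // c5
    return invalid;                                            // c6
}

struct TestCase {
    std::string input;
    TaxResult expected;
};

// One representative value per equivalence class c1..c6.
std::vector<TestCase> equivalenceClassCases() {
    return {
        {"abc",     {false, 0.0}},
        {"5000",    {false, 0.0}},
        {"13000",   {true, 0.0}},
        {"20000",   {true, 0.18 * 20000}},
        {"30000",   {true, 0.20 * 30000}},
        {"1000001", {false, 0.0}},
    };
}

// Boundary values for the class 12000..15000, as in the boundary value table.
std::vector<TestCase> boundaryValueCases() {
    return {
        {"11999", {false, 0.0}},
        {"12000", {true, 0.0}},
        {"15000", {true, 0.0}},
        {"15001", {true, 0.18 * 15001}},
    };
}

// Runs every case and counts those whose observed result differs from the expected one.
int countFailures(const std::vector<TestCase>& cases) {
    assert(!cases.empty());
    int failures = 0;
    for (std::size_t i = 0; i < cases.size(); ++i) {
        const TaxResult actual = computeTax(cases[i].input);
        const TaxResult& expected = cases[i].expected;
        const bool same = actual.valid == expected.valid &&
                          (!actual.valid || std::fabs(actual.tax - expected.tax) < 1e-9);
        if (!same) ++failures;
    }
    return failures;
}
```

Calling countFailures(equivalenceClassCases()) and countFailures(boundaryValueCases()) plays the role of filling in the Actual/Observed Result column: a non-zero count means some observed result differs from the expected one.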

Though it may sound that the method is too simple, boundary value analysis is one of the most effective methods for designing test cases that reveal common errors made in programming.

III. Cause Effect Analysis

The main drawback of the previous two techniques is that they do not explore the combination of input conditions. Cause effect analysis is an approach for studying the specifications carefully, identifying the combinations of input conditions (causes) and their effects in the form of a table, and designing test cases from it. It is suitable for applications in which combinations of input conditions are few and readily visible.

IV. Cause Effect Graphing

This is a rigorous approach, recommended for complex systems only. In such systems the number of inputs and the number of equivalent classes for each input could be many, and hence the number of input combinations usually is astronomical. Hence we need a systematic approach to select a subset of these input conditions.

Guidelines for graphing:
    – Divide specifications into workable pieces, as it may be practically difficult to work on large specifications.
    – Identify the causes and their effects. A cause is an input condition or an equivalence class of input conditions. An effect is an output condition or a system transformation.
    – Link causes and effects in a Boolean graph, which is the cause-effect graph.
    – Make decision tables based on the graph. This is done by having one row each for a node in the graph. The number of columns will depend on the number of different combinations of input conditions which can be made.
    – Convert the columns in the decision table into test cases.

Consider the following specification: A program accepts Transaction Code - 3 characters as input. For a valid input the following must be true.

    1st character (denoting issue or receipt)
        + for issue
        - for receipt

    2nd character - a digit
    3rd character - a digit

To carry out cause effect graphing, the cause-effect graph is constructed as below.

In the graph:
    (1) or (2) must be true (∨ in the graph to be interpreted as OR)
    (3) and (4) must be true (∧ in the graph to be interpreted as AND)

The Boolean graph has to be interpreted as follows:
    - node (1) turns true if the 1st character is ‘+’
    - node (2) turns true if the 1st character is ‘-’ (both node (1) and node (2) cannot be true simultaneously)
    - node (3) becomes true if the 2nd character is a digit
    - node (4) becomes true if the 3rd character is a digit
    - the intermediate node (5) turns true if (1) or (2) is true (i.e., if the 1st character is ‘+’ or ‘-’)
    - the intermediate node (6) turns true if (3) and (4) are true (i.e., if the 2nd and 3rd characters are digits)
    - the final node (7) turns true if (5) and (6) are true (i.e., if the 1st character is ‘+’ or ‘-’, and the 2nd and 3rd characters are digits)
    - the final node will be true for any valid input and false for any invalid input.

A partial decision table corresponding to the above graph:

    Node                 Some possible combinations of node states
    (1)                  0     1     1     1     0     1
    (2)                  0     0     0     0     0     0
    (3)                  0     0     0     1     1     1
    (4)                  0     0     1     0     1     1
    (5)                  0     1     1     1     0     1
    (6)                  0     0     0     0     1     1
    (7)                  0     0     0     0     0     1
    Sample test case     $xy   +ab   +a4   +2y   @45   +67

The sample test cases can be derived by giving values to the input characters such that the nodes turn true/false as given in the columns of the decision table.

V. Error Guessing

Error guessing is a supplementary technique where test case design is based on the tester's intuition and experience. There is no formal procedure. However, a checklist of common errors could be helpful here.

3.2 Test Techniques: White Box Approach - Basis Path Testing

Basis Path Testing is a white box testing method where we design test cases to cover every statement, every branch and every predicate (condition) in the code which has been written. Thus the method attempts statement coverage, decision coverage and condition coverage.

To perform Basis Path Testing
    • Derive a logical complexity measure of procedural design
        o Break the module into blocks delimited by statements that affect the control flow (e.g. statements like return, exit, jump etc., and conditions)
        o Mark out these as nodes in a control flow graph
        o Draw connectors (arcs) with arrow heads to mark the flow of logic
        o Identify the number of regions (Cyclomatic Number), which is equivalent to the McCabe’s number
    • Define a basis set of execution paths
        o Determine independent paths
    • Derive test cases to exercise (cover) the basis set
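Returning briefly to the transaction-code example from the Cause Effect Graphing section: the node definitions (1) to (7) are simple enough to evaluate mechanically, which gives a way to check each column of a decision table against its sample test case. The following is a minimal sketch; the function names evaluateNodes and isValidTransactionCode are hypothetical and not part of the original specification.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <string>

// Returns the states of nodes (1)..(7) for a 3-character transaction code.
// Index 0 is unused so that the indices match the node numbers in the graph.
std::array<bool, 8> evaluateNodes(const std::string& code) {
    assert(code.size() == 3);
    std::array<bool, 8> node{};   // all false initially
    node[1] = (code[0] == '+');                                          // issue
    node[2] = (code[0] == '-');                                          // receipt
    node[3] = std::isdigit(static_cast<unsigned char>(code[1])) != 0;    // 2nd character is a digit
    node[4] = std::isdigit(static_cast<unsigned char>(code[2])) != 0;    // 3rd character is a digit
    node[5] = node[1] || node[2];                                        // OR node
    node[6] = node[3] && node[4];                                        // AND node
    node[7] = node[5] && node[6];                                        // final node: valid input
    return node;
}

// A code is valid exactly when the final node (7) is true.
bool isValidTransactionCode(const std::string& code) {
    return code.size() == 3 && evaluateNodes(code)[7];
}
```

For instance, evaluateNodes("+2y") yields node (5) true and node (6) false, so the final node is false and the input is rejected.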

McCabe’s Number (Cyclomatic Complexity)
    • Gives a quantitative measure of the logical complexity of the module
    • Defines the number of independent paths
    • Provides an upper bound to the number of tests that must be conducted to ensure that all the statements are executed at least once.
    • Complexity of a flow graph G, V(G), is computed in one of three ways:
        o V(G) = No. of regions of G
        o V(G) = E - N + 2 (E: No. of edges & N: No. of nodes)
        o V(G) = P + 1 (P: No. of predicate nodes in G or No. of conditions in the code)

McCabe’s Number = No. of Regions (count the mutually exclusive closed regions and also the whole outer space as one region)
                = 2 (in the above graph)

Two other formulae, as given below, also define the above measure:

    McCabe’s Number = E - N + 2 ( = 6 – 6 + 2 = 2 for the above graph)
    McCabe’s Number = P + 1 ( = 1 + 1 = 2 for the above graph)

Please note that if the number of conditions is more than one in a single control structure, each condition needs to be separately marked as a node.

When the McCabe’s number is 2, it indicates that there are two linearly independent paths in the code, i.e., two different ways in which the graph can be traversed from the 1st node to the last node. The independent paths in the above graph are:

    i)  1-2-3-5-6
    ii) 1-2-4-5-6

The last step is to write test cases corresponding to the listed paths. This would mean giving the input conditions in such a way that the above paths are traced by the control of execution. The test cases for the paths listed here are shown in the following table.

    Path   Input Condition                    Expected Result       Actual Result   Remarks
    i)     value of ‘a’ > value of ‘b’        Increment ‘a’ by 1
    ii)    value of ‘a’ <= value of ‘b’       Increment ‘b’ by 1
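The E - N + 2 formula can be checked with a few lines of C++. The sketch below makes two assumptions: the edge list is inferred from the two independent paths listed above (the figure itself is not reproduced here), and incrementLarger is a hypothetical module whose behaviour matches the expected results in the test case table. The formula as coded applies to a single connected flow graph.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

typedef std::pair<int, int> Edge;   // directed edge: from node, to node

// V(G) = E - N + 2 for a single connected control flow graph.
int cyclomaticComplexity(const std::vector<Edge>& edges) {
    assert(!edges.empty());
    std::set<int> nodes;
    for (std::size_t i = 0; i < edges.size(); ++i) {
        nodes.insert(edges[i].first);
        nodes.insert(edges[i].second);
    }
    return static_cast<int>(edges.size()) - static_cast<int>(nodes.size()) + 2;
}

// Edges inferred from the paths 1-2-3-5-6 and 1-2-4-5-6 (6 edges, 6 nodes).
std::vector<Edge> exampleGraph() {
    return { {1, 2}, {2, 3}, {2, 4}, {3, 5}, {4, 5}, {5, 6} };
}

// Hypothetical module under test with one predicate, hence P + 1 = 2.
void incrementLarger(int& a, int& b) {
    if (a > b) {
        ++a;   // path i)
    } else {
        ++b;   // path ii)
    }
}
```

One test case per independent path is then enough to cover the basis set: one call with a > b and one with a <= b.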

4. When to stop testing

The question arises as testing is never complete and we cannot scientifically prove that a software system does not contain any more errors.

Common Criteria Practiced
    • Stop When Scheduled Time For Testing Expires
    • Stop When All The Test Cases Execute Without Detecting Errors

Both are meaningless and counterproductive, as the first can be satisfied by doing absolutely nothing and the second is equally useless as it does not ensure quality of test cases.

Stop When
    • All test cases, derived from equivalence partitioning, cause-effect analysis & boundary-value analysis, are executed without detecting errors.

Drawbacks
    • Rather than defining a goal & allowing the tester to select the most appropriate way of achieving it, it does the opposite !!!
    • Defined methodologies are not suitable for all occasions !!!
    • No way to guarantee that the particular methodology is properly & rigorously used
    • Depends on the abilities of the tester & no quantification attempted !

Completion Criterion Based On The Detection Of Pre-Defined Number Of Errors

In this method the goal of testing is positively defined as to find errors, and hence this is a more goal oriented approach.

Eg.
    - Testing of a module is not complete until 3 errors are discovered
    - For a system test : Detection of 70 errors or an elapsed time of 3 months, whichever comes later

How To Determine "Number Of Predefined Errors" ?

Predictive Models
    • Based on the history of usage / initial testing & the errors found

Defect Seeding Models
    • Based on the initial testing & the ratio of detected seeded errors to detected unseeded errors (Very critically depends on the quality of 'seeding')

Using this approach, as an example, we can say that testing is complete if 80% of the predefined number of errors are detected or the scheduled four months of testing is over, whichever comes later.

Caution ! The Above Condition May Never Be Achieved For The Following Reasons
    • Over Estimation Of Predefined Errors (The Software Is Too Good !!)
    • Inadequate Test Cases

Hence a best completion criterion may be a combination of all the methods discussed.

Module Test
    • Defining test case design methodologies (such as boundary value analysis...)

Function & System Test
    • Based on finding the pre-defined number of defects

5. Debugging

Debugging occurs as a consequence of successful testing. It is an exercise to connect the external manifestation of the error and the internal cause of the error.

Debugging techniques include use of:
    • Breakpoints
      A point in a computer program at which execution can be suspended to permit manual or automated monitoring of program performance or results
    • Desk Checking
      A technique in which code listings, test results or other documentation are visually examined, usually by the person who generated them, to identify errors, violations of development standards or other problems
    • Dumps

      1. A display of some aspect of a computer program’s execution state, usually the contents of internal storage or registers
      2. A display of the contents of a file or device
    • Single-Step Operation
      In this debugging technique a single computer instruction is executed in response to an external signal.
    • Traces
      A record of the execution of a computer program, showing the sequence of instructions executed, the names and values of variables, or both.

6. References

1. Glenford J. Myers, The ART of Software Testing, 2nd edition, Wiley, 2004.
2. Roger S. Pressman, Software Engineering: A Practitioner’s Approach, 5th edition, McGraw-Hill, 2000.

An Abbreviated C++ Code Inspection Checklist

John T. Baldwin
October 27, 1992

Copyright © 1992 by John T. Baldwin. See rear page for complete information concerning copyright permission, sources, and distribution.

How to Conduct an Informal Code Inspection

1. Code inspector teams consist of 2-5 individuals. The author of the code to be inspected is not part of the initial inspection team! A code inspection is not a witch hunt, so no witch-hunting! Our purpose here is to improve the code, not to evaluate developers.

2. To get ready for the inspection, print separate hardcopies of the source code for each inspector. A single code inspector should cover no more than 250 source code lines, including comments, but not including whitespace. Surprisingly enough, this is a "carved in stone" limit! The hardcopy should contain a count of the source code lines shown.

3. Inspection overview. The code author spends 20 - 40 minutes explaining the general layout of the code to the inspectors. The inspectors are not allowed to ask questions (the code is supposed to answer them), but this overview is designed to speed up the process. The author's goal is to stick to the major, important points, and keep it as close to 20 minutes as possible without undercutting the explanation.

4. Individual inspections. Each inspector uses the attached checklist to try to put forward a maximum number of discovered possible defects. This should be done in a single, uninterrupted sitting. The inspector should have a goal of covering 70-120 source lines of code per hour. Use the source line counts on the hardcopy, and strive not to inspect too quickly, nor too slowly! [This has been shown in several studies to be the next major factor after programming experience which affects the number of errors found. There is a sharp drop-off beyond 122 sloc/hr, so don't rush!]

   To do the inspection, go through the code line by line, attempting to fully understand what you are reading. At each line or block of code, skim through the inspection checklist, looking for questions which apply. For each applicable question, find whether the answer is "yes." A yes answer means a probable defect. Write it down. You will notice that some of the questions are very low-level and concern themselves with syntactical details, while others are high-level and require an understanding of what a block of code does. Be prepared to change your mental focus.

5. Meeting. The meeting is attended by all the code inspectors for that chunk of code. If you want this to be more like a formal inspection, the meeting should have a moderator who is well experienced in C or C++, and in conducting code inspections, and the author of the code should not be present. To be more like a walkthrough, the moderator may be omitted, or the author may be present, or both. If the author is present, it is for the purpose of collecting feedback, not for defending or explaining the code. Remember, one of the major purposes of the inspection is to ensure that the code is sufficiently self-explanatory.

   Each meeting is strictly limited to two hours duration, including interruptions. This is because inspection ability generally drops off after this amount of time. Strive to stay on task, and to not allow interruptions.

   Different inspectors may cover different groups of code for a single meeting. Thus, a single meeting could theoretically cover a maximum of (5 inspectors) × (120 sloc/hr) × (2 hrs) = 1200 lines of source code. In actuality, there should be some overlap between inspectors, up to the case of everyone having inspected the same code.

   If the group is not finished at the end of two hours, quit. Do not attempt to push ahead. The moderator or note taker should submit the existing notes to the author or maintainer, and the remaining material should be covered in a subsequent meeting.

6. Rework. The defects list is submitted to the author, or to another assigned individual for "rework." This can consist of changing code, adding or deleting comments, restructuring or relocating things, etc. Note that solutions are not discussed at the inspection meeting! They are neither productive nor necessary in that setting. If the author/maintainer desires feedback on solutions or improvements, he or she may hold a short meeting with any or all of the inspectors, following the code inspection meeting. The "improvements" meeting is led by the author/maintainer, who is free to accept or reject any suggestions from the attenders.

7. Follow up. It is the moderator's personal responsibility to ensure all defects have been satisfactorily reworked. If there is no formal moderator, then an individual is selected for this role at the inspection meeting. The correctness of the rework will be verified either at a short review meeting, or during later inspection stages during the project.

8. Record keeping. In order to objectively track success in detecting and correcting defects, one of the by-products of the meeting will be a count of the total number of different types of potential defects noted. In order to eliminate both the perception and the possibility that the records will be used to evaluate developers (remember, the goal is to improve the software), neither the name of the author nor the source module will be noted in the defect counts. If absolutely necessary to keep the counts straight, a "code module number" may be assigned and used. The document containing the pairing of code modules with their numbers will be maintained by a single individual who has no management responsibilities on the project, and this document will be destroyed upon completion of the code development phase of the project.

C++ Inspection Checklist

1 VARIABLE DECLARATIONS

1.1 Arrays

1.1.1 Is an array dimensioned to a hard-coded constant?

        int intarray[13];
    should be
        int intarray[TOT_MONTHS+1];

1.1.2 Is the array dimensioned to the total number of items?

        char entry[TOTAL_ENTRIES];
    should be
        char entry[LAST_ENTRY+1];

    The first example is extremely error-prone and often gives rise to off-by-one errors in the code. The preferred (second) method permits the writer to use the LAST_ENTRY identifier to refer to the last item in the array. Instances which require a buffer of a certain size are rarely rendered invalid by this practice, which results in the buffer being one element bigger than absolutely necessary.

1.2 Constants

1.2.1 Does the value of the variable never change?

        int months_in_year = 12;
    should be
        const unsigned months_in_year = 12;

1.2.2 Are constants declared with the preprocessor #define mechanism?

        #define MAX_FILES 20
    should be
        const unsigned MAX_FILES = 20;

1.2.3 Is the usage of the constant limited to only a few (or perhaps only one) class? If so, is the constant global?

        const unsigned MAX_FOOS = 1000;
        const unsigned MAX_FOO_BUFFERS = 40;
    should be
        class foo {
        public:
            enum { MAX_INSTANCES = 1000 };
            ...
        private:
            enum { MAX_FOO_BUFFERS = 40 };
            ...
        };

    If the size of the constant exceeds int, another mechanism is available:

        class bar {
        public:
            static const long MAX_INSTS;
            ...
        };

        const long bar::MAX_INSTS = 70000L;

    The keyword static ensures there is only one instance of the variable for the entire class. Static data items are not permitted to be initialized within the class declaration, so the initialization line must be included in the implementation file for class bar.

    Static constant members have one drawback: you cannot use them to declare member data arrays of a certain size. This is because the value is not available to the compiler at the point at which the array is declared in the class.

1.3 Scalar Variables

1.3.1 Does a negative value of the variable make no sense? If so, is the variable signed?

        int age;
    should be
        unsigned int age;

    This is an easy error to make, since the default types are usually signed.
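The restrictions described under question 1.2.3 reflect the language as it stood in 1992. As a hedged aside, compilers supporting C++11 or later accept an in-class initializer for a static constexpr member, which can then also size a member array. The sketch below assumes such a compiler; the class name Foo and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

class Foo {
public:
    // Class-scope constants with in-class initializers (C++11 and later).
    static constexpr long MAX_INSTANCES = 70000L;
    static constexpr std::size_t MAX_FOO_BUFFERS = 40;

    // Number of elements in the member array, derived from the array itself.
    std::size_t bufferCount() const {
        return sizeof(buffers_) / sizeof(buffers_[0]);
    }

    // Bounds-checked access to one buffer slot.
    int& bufferAt(std::size_t index) {
        assert(index < MAX_FOO_BUFFERS);
        return buffers_[index];
    }

private:
    int buffers_[MAX_FOO_BUFFERS] = {};   // sized by the class-scope constant
};

// The value is a compile-time constant, so it can be checked at compile time.
static_assert(Foo::MAX_INSTANCES > 65535, "constant needs more than 16 bits");
```

The design intent is the same as in the checklist: the constant lives in the scope of the only class that uses it, rather than in the global namespace.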

1.3.2 Does the code assume char is either signed or unsigned?

        typedef char SmallInt;
        SmallInt mumble = 280;    // WRONG on Borland C++ 3.1
                                  // or MSC/C++ 7.0!

    The typedefs should be

        typedef unsigned char SmallUInt;
        typedef signed char   SmallInt;

1.3.3 Does the program unnecessarily use float or double?

        double acct_balance;
    should be
        unsigned long acct_balance;

    In general, the only time floating point arithmetic is necessary is in scientific or navigational calculations. It is slow, and subject to more complex overflow and underflow behavior than integer math is. Monetary calculations, as above, can often be handled in counts of cents, and formatted properly on output. Thus, acct_balance might equal 103446, and print out as $1,034.46.

1.4 Classes

1.4.1 Does the class have any virtual functions? If so, is the destructor non-virtual?

    Classes having virtual functions should always have a virtual destructor. This is necessary since it is likely that you will hold an object of a class with a pointer of a less-derived type. Making the destructor virtual ensures that the right code will be run if you delete the object via the pointer.

1.4.2 Does the class have any of the following:

        Copy-constructor
        Assignment operator
        Destructor

    If so, it generally will need all three. (Exceptions may occasionally be found for some classes having a destructor with neither of the other two.)
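Question 1.4.1 can be illustrated with a small sketch. The classes Shape and Square and the function destroyThroughBasePointer are hypothetical names chosen for this example; the counter exists only to make the destructor call observable.

```cpp
#include <cassert>
#include <memory>

class Shape {
public:
    virtual ~Shape() = default;          // virtual: deleting via Shape* runs the derived destructor
    virtual double area() const = 0;
};

class Square : public Shape {
public:
    Square(double side, int& destroyedCount)
        : side_(side), destroyedCount_(destroyedCount) {}
    ~Square() override { ++destroyedCount_; }   // records that the derived destructor ran
    double area() const override { return side_ * side_; }

private:
    double side_;
    int& destroyedCount_;
};

// Holds a Square through a pointer of the less-derived type and destroys it.
// Returns how many times the Square destructor ran.
int destroyThroughBasePointer() {
    int destroyed = 0;
    {
        std::unique_ptr<Shape> shape(new Square(2.0, destroyed));
        assert(shape->area() == 4.0);
    }   // the unique_ptr deletes the object through a Shape* here
    return destroyed;
}
```

With the virtual destructor in place the function returns 1. Without it, deleting a Square through a Shape pointer would be undefined behaviour.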

2 DATA USAGE

2.1 Strings

2.1.1 Can the string ever not be null-terminated?

2.1.2 Is the code attempting to use a strxxx() function on a non-terminated char array, as if it were a string?

2.2 Buffers

2.2.1 Are there always size checks when copying into the buffer?

2.2.2 Can the buffer ever be too small to hold its contents?

    For example, one program had no size checks when reading data into a buffer because the correct data would always fit. But when the file it read was accidentally overwritten with incorrect data, the program crashed mysteriously.

2.3 Bitfields

2.3.1 Is a bitfield really required for this application?

2.3.2 Are there possible ordering problems (portability)?

3 INITIALIZATION

3.1 Local Variables

3.1.1 Are local variables initialized before being used?

3.1.2 Are C++ locals created, then assigned later?

    This practice has been shown to incur up to 350% overhead, compared to the practice of declaring the variable later in the code, when an initialization value is known. It is the simple matter of putting a value in once, instead of assigning some default value, then later throwing it away and assigning the real value.

3.2 Missing Reinitialization

3.2.1 Can a variable carry an old value forward from one loop iteration to the next?

    Suppose the processing of a data element in a sequence causes a variable to be set. For example, a file might be read, and some globals initialized for that file. Can those globals be used for the next file in the sequence without being re-initialized?

4 MACROS

4.1 If a macro's formal parameter is evaluated more than once, is the macro ever expanded with an actual parameter having side effects?

    For example, what happens in this code:

        #define max(a,b) ( (a) > (b) ? (a) : (b) )

        max(i++, j);

4.2 If a macro is not completely parenthesized, is it ever invoked in a way that will cause unexpected results?

        #define max(a, b) (a) > (b) ? (a) : (b)

        result = max(i, j) + 3;

    This expands into:

        result = (i) > (j) ? (i) : (j)+3;

    See the example in 4.1 for the correct parenthesization.

4.3 If the macro's arguments are not parenthesized, will this ever cause unexpected results?

        #define IsXBitSet(var) (var && bitmask)

        result = IsXBitSet( i || j );

    This expands into:

        result = (i || j && bitmask);    // not what expected!

    The correct form is:

        #define IsXBitSet(var) ((var) && (bitmask))
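The double evaluation asked about in question 4.1 can be observed directly. In the sketch below the macro is renamed MACRO_MAX (an assumption made to avoid clashing with library names), and maxOf is a hypothetical inline function template shown as one possible alternative that evaluates each argument exactly once.

```cpp
#include <cassert>

// Fully parenthesized, but each argument may still be evaluated twice.
#define MACRO_MAX(a, b) ( (a) > (b) ? (a) : (b) )

// Function alternative: arguments are evaluated once, before the call.
template <typename T>
inline T maxOf(T a, T b) {
    return a > b ? a : b;
}

// Returns the final value of the counter after using the macro.
int counterAfterMacro() {
    int i = 5;
    const int j = 3;
    const int result = MACRO_MAX(i++, j);   // expands to ((i++) > (j) ? (i++) : (j))
    assert(result == 6);                    // the second i++ supplied the value
    return i;                               // incremented twice
}

// Returns the final value of the counter after using the function.
int counterAfterFunction() {
    int i = 5;
    const int j = 3;
    const int result = maxOf(i++, j);       // i++ is evaluated exactly once
    assert(result == 5);
    return i;                               // incremented once
}
```

Here counterAfterMacro() returns 7 while counterAfterFunction() returns 6, which is the kind of surprise the question is meant to catch.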

5 SIZING OF DATA

5.1 In a function call with arguments for a buffer and its size, is the argument to sizeof different from the buffer argument?

    For example:

        memset(buffer1, 0, sizeof(buffer2));    // danger!

    This is not always an error, but it is a dangerous practice. Each instance should be verified as (a) necessary, and (b) correct, and then commented as such.

5.2 Is the argument to sizeof an incorrect type?

    Common errors:

        sizeof(ptr)      instead of    sizeof(*ptr)
        sizeof(*array)   instead of    sizeof(array)
        sizeof(array)    instead of    sizeof(array[0])    (when the user wanted the size of an element)

6 DYNAMIC ALLOCATION

6.1 Allocating Data

6.1.1 Is too little space being allocated?

6.1.2 Does the code allocate memory and then assume someone else will delete it?

    This is not always an error, but should always be prominently documented, along with the reason for implementing in this manner. Constructors which allocate, paired with destructors which deallocate, are an obvious exception, since a single object has control of its class data.

6.1.3 Is malloc(), calloc(), or realloc() used in lieu of new?

    C standard library allocation functions should never be used in C++ programs, since C++ provides an allocation operator.
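The sizeof questions in section 5 can be made concrete with a short sketch. The helper names elementCount and clearBuffer are hypothetical; the point is only that the size passed along with a buffer should be derived from that same buffer, and that an element count is not the same thing as a byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of elements in a real array. It does not compile for a pointer,
// which catches the sizeof(ptr) versus sizeof(*ptr) confusion at compile time.
template <typename T, std::size_t N>
constexpr std::size_t elementCount(const T (&)[N]) {
    return N;
}

// Clears exactly `size` bytes of `buffer`. The caller derives `size`
// from the same object it passes as `buffer`.
void clearBuffer(char* buffer, std::size_t size) {
    assert(buffer != nullptr);
    std::memset(buffer, 0, size);
}

// Fills a local buffer, clears it using its own size, and reports whether
// every byte ended up zero.
bool clearsWholeBuffer() {
    char buffer1[16];
    std::memset(buffer1, 'x', sizeof(buffer1));
    clearBuffer(buffer1, sizeof(buffer1));   // same object on both sides
    for (std::size_t i = 0; i < elementCount(buffer1); ++i) {
        if (buffer1[i] != 0) return false;
    }
    return true;
}
```

For an array such as long values[10], elementCount(values) is 10 while sizeof(values) is 10 * sizeof(long), which is the distinction behind the third common error listed under 5.2.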

calloc.2. 6.2. or realloc invoked for an object which has a constructor? Program behavior is undefined if this is done. or to another safe value meaning "uninitialized. and such usage is specifically deprecated by the ANSI draft C++ standard.2.2 Does the deleted storage still have pointers to it? It is recommended that pointers are set to NULL following deletion.2. 6. 6.2 Is malloc. Program behavior is undefined if you do them.2. or realloc? 6.4 Is delete invoked on a pointer obtained via malloc." This is neither necessary nor recommended within destructors.5 Is free invoked on a pointer obtained via new? Both of these practices are dangerous.2.3 Are you deleting already-deleted storage? This is not possible if the code conforms to 6.2. myCharArray. Page 9 . The draft C++ standard specifies that it is always safe to delete a NULL pointer. since the pointer variable itself will cease to exist upon exiting. so it is not necessary to check for that value. If C standard library allocators are used in a C++ program (not recommended): 6.1 Are arrays being deleted as if they were scalars? delete should be delete [] myCharArray.If you find you must mix C allocation with C++ allocation: 6. calloc.2.2 Deallocating Data 6.

7 POINTERS

7.1 When dereferenced, can the pointer ever be NULL?

7.2 When copying the value of a pointer, should it instead allocate a copy of what the first pointer points to?

8 CASTING

8.1 Is NULL cast to the correct type when passed as a function argument?

8.2 Does the code rely on an implicit type conversion?

    C++ is somewhat charitable when arguments are passed to functions: if no function is found which exactly matches the types of the arguments supplied, it attempts to apply certain type conversion rules to find a match. While this saves unnecessary casting, if more than one function fits the conversion rules, it will result in a compilation error. Worse, it can cause additions to the type system (either from adding a related class, or from adding an overloaded function) to cause previously working code to break! See the Appendix (A) for an example.

9 COMPUTATION

9.1 When testing the value of an assignment or computation, is the parenthesization incorrect?

        if ( a = function() == 0 )
    should be
        if ( (a = function()) == 0 )

9.2 Can any synchronized values not get updated?

    Sometimes, a group of variables must be modified as a group to complete a single conceptual "transaction." If this does not occur all in one place, is it guaranteed that all variables get updated if a single value changes? Do all updates occur before any of the values are tested or used?

10 CONDITIONALS

10.1 Are exact equality tests used on floating point numbers?

        if ( someVar == 0.1 )

    might never be evaluated as true. The constant 0.1 is not exactly representable by any finite binary mantissa and exponent, thus the compiler must round it to some other number. Calculations involving someVar may never result in it taking on that value. Solution: use >, >=, <, or <=, depending on which direction you wish the variable bound.

10.2 Are unsigned values tested greater than or equal to zero?

        if ( myUnsignedVar >= 0 )

    will always evaluate true.

10.3 Are signed variables tested for equality to zero or another constant?

        if ( mySignedVar )         // not always good
        if ( mySignedVar >= 0 )    // better!
        if ( mySignedVar <= 0 )    // opposite case

    If the variable is updated by any means other than ++ or --, it may miss the value of the test constant entirely. This can cause subtle and frightening bugs when code executes under conditions that weren't planned for.

10.4 If the test is an error check, could the "error condition" actually be legitimate in some cases?
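The advice under question 10.1 can be sketched as follows. The function names are hypothetical. The sketch accumulates 0.1 repeatedly, a calculation whose result on typical IEEE double arithmetic is close to, but usually not exactly, the value one would write as a constant, and then tests it with bounds instead of with ==.

```cpp
#include <cassert>

// Adds 0.1 to a running total `count` times.
double sumOfTenths(int count) {
    assert(count >= 0);
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += 0.1;   // 0.1 is rounded, and the rounding error accumulates
    }
    return total;
}

// One-sided bound, as recommended: true once the value has reached the limit.
bool hasReached(double value, double limit) {
    return value >= limit;
}

// Two-sided bound for the case where "approximately equal" is what is meant.
bool isWithin(double value, double low, double high) {
    return value >= low && value <= high;
}
```

A loop written as `while (!hasReached(total, 1.1))` terminates whether or not the total ever hits a particular constant exactly, whereas a loop waiting for `total == 1.0` may run forever.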

11 FLOW CONTROL

11.1 Control Variables

11.1.1 Is the lower limit an exclusive limit?

11.1.2 Is the upper limit an inclusive limit?

    By always using inclusive lower limits and exclusive upper limits, a whole class of off-by-one errors is eliminated. Furthermore, the following assumptions always apply:

        • the size of the interval equals the difference of the two limits
        • the limits are equal if the interval is empty
        • the upper limit is never less than the lower limit

    Examples: instead of saying x>=23 and x<=42, use x>=23 and x<43.

11.2 Branching

11.2.1 In a switch statement, is any case not terminated with a break statement?

    When several cases are followed by the same block of code, they may be "stacked" together and the code terminated with a single break. Cases may also be exited via return. All other circumstances requiring "drop through" cases should be clearly documented in a strategic comment before the switch. This should only be used when it makes the code simpler and clearer.

11.2.2 Does the switch statement lack a default branch?

    There should always be a default branch to handle unexpected cases, even when it appears that the code can never get there.

11.2.3 Does a loop set a boolean flag in order to effect an exit?

    Consider using break instead. It is likely to simplify the code.
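The three properties listed under question 11.1.2 can be written down directly. This is a minimal sketch with hypothetical names (Interval, intervalSize, isEmpty, contains, valuesIn), using the x>=23 and x<43 example from the text.

```cpp
#include <cassert>
#include <vector>

// Half-open interval [lower, upper): inclusive lower limit, exclusive upper limit.
struct Interval {
    int lower;
    int upper;
};

// The size of the interval equals the difference of the two limits.
int intervalSize(const Interval& r) {
    assert(r.lower <= r.upper);   // the upper limit is never less than the lower limit
    return r.upper - r.lower;
}

// The limits are equal if the interval is empty.
bool isEmpty(const Interval& r) {
    return r.lower == r.upper;
}

bool contains(const Interval& r, int x) {
    return x >= r.lower && x < r.upper;
}

// The usual loop form: start at the lower limit, stop before the upper limit.
std::vector<int> valuesIn(const Interval& r) {
    std::vector<int> values;
    for (int x = r.lower; x < r.upper; ++x) {
        values.push_back(x);
    }
    return values;
}
```

For the interval {23, 43}, the size is 20, the value 42 is contained and 43 is not, and the loop visits exactly 20 values with no +1 or -1 adjustment anywhere.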

11.2.4 Does the loop contain a continue?

    If the continue occurs in the body of an if conditional, consider replacing it with an else clause if it will simplify the code.

12 ASSIGNMENT

12.1 Assignment operator

12.1.1 Does "a += b" mean something different than "a = a + b"?

    The programmer should never change the semantics of relationships between operators. For the example here, the two statements above are semantically identical for intrinsic types (even though the code generated might be different), so for a user defined class, they should be semantically identical, too. They may, in fact, be implemented differently (+= should be more efficient).

12.1.2 Is the argument for a copy constructor or assignment operator non-const?

12.1.3 Does the assignment operator fail to test for self-assignment?

    The code for operator=() should always start out with:

        if ( this == &right_hand_arg ) return *this;

12.1.4 Does the assignment operator return anything other than a const reference to this?

    Failure to return a reference to this prevents the user from writing (legal C++):

        a = b = c;

    Failure to make the return reference const allows the user to write (illegal C++):

        (a = b) = c;

12.2 Use of assignment

12.2.1 Can this assignment be replaced with an initialization? (See question 3.1.2 and commentary.)
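Questions 12.1.2 to 12.1.4 can be seen together in one small class. The class Text below is a hypothetical example. Note one deliberate difference from 12.1.4: the operator returns a plain (non-const) reference, which is the convention followed by the standard library containers; changing the return type to const Text& gives the variant the checklist asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class Text {
public:
    explicit Text(const char* s) : data_(copyOf(s)) {}
    Text(const Text& other) : data_(copyOf(other.data_)) {}   // const argument (12.1.2)

    Text& operator=(const Text& rhs) {                         // const argument (12.1.2)
        if (this == &rhs) return *this;                        // self-assignment test (12.1.3)
        char* fresh = copyOf(rhs.data_);                       // copy first, then release
        delete [] data_;
        data_ = fresh;
        return *this;                                          // allows a = b = c (12.1.4)
    }

    ~Text() { delete [] data_; }

    const char* c_str() const { return data_; }

private:
    // Allocates and returns a copy of a null-terminated string.
    static char* copyOf(const char* s) {
        assert(s != nullptr);
        const std::size_t length = std::strlen(s) + 1;
        char* copy = new char[length];
        std::memcpy(copy, s, length);
        return copy;
    }

    char* data_;
};
```

Without the self-assignment test, a naive implementation that deletes data_ before copying would read from storage it has just freed whenever an object is assigned to itself.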

12.2.2 Is there a mismatch between the units of the expression and those of the variable?

    For example, you might be calculating the number of bytes for an array when the number of elements was requested. If the elements are big (say, a long, or a struct!), you'd be using way too much memory.

13 ARGUMENT PASSING

13.1 Are non-intrinsic type arguments passed by value?

        Foo& do_something( Foo anotherFoo, Bar someThing );
    should be
        Foo& do_something( const Foo& anotherFoo, const Bar& someThing );

    While it is cheaper to pass ints, longs, and such by value, passing objects this way incurs significant expense due to the construction of temporary objects. The problem becomes more severe when inheritance is involved. Simulate pass-by-value by passing const references.

14 RETURN VALUES

14.1 Is the return value of a function call being stored in a type that is too narrow? (See Appendix (B).)

14.2 Does a public member function return a non-const reference or pointer to member data?

14.3 Does a public member function return a non-const reference or pointer to data outside the object?

    This is permissible provided the data was intended to be shared, and this fact is documented in the source code.

14.4 Does an operator return a reference when it should return an object?

14.5 Are objects returned by value instead of const references? (See question 13.1 and commentary.)

15 FUNCTION CALLS

15.1 Varargs functions (printf, and other functions with ellipsis ...)

15.1.1 Is the FILE argument of fprintf missing? (This happens all the time.)

15.1.2 Are there extra arguments?

15.1.3 Do the argument types explicitly match the conversion specifications in the format string? (printf and friends.)

    Type checking cannot occur for functions with variable length argument lists. For example, a user was surprised to see nonsensical values when the following code was executed:

        printf(" %d %ld \n", a_long_int, another_long_int);

    On that particular system, ints and longs were different sizes (2 and 4 bytes, respectively). printf() is responsible for manually accessing the stack; thus, it saw "%d" and grabbed 2 bytes (an int). It then saw "%ld" and grabbed 4 bytes (a long). The two values printed were the MSW of a_long_int, and the combination of a_long_int's LSW and another_long_int's MSW.

    Solution: ensure types explicitly match. If necessary, arguments may be cast to smaller sizes (long to int) if the author knows for certain that the smaller type can hold all possible values of the variable.

15.2 General functions

15.2.1 Is this function call correct? That is, should it be a different function with a similar name? (E.g. strchr instead of strrchr?)

15.2.2 Can this function violate the preconditions of a called function?
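The solution given under question 15.1.3 amounts to making each conversion specification agree with the type of its argument. A minimal sketch, with the hypothetical function name formatCounts, using std::snprintf so that the output size is also checked:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats two long values. Both conversion specifications are %ld,
// matching the long arguments exactly.
std::string formatCounts(long first, long second) {
    char buffer[64];
    const int written = std::snprintf(buffer, sizeof(buffer), "%ld %ld", first, second);
    // snprintf reports the length it needed; check that the output was not truncated.
    assert(written >= 0 && static_cast<std::size_t>(written) < sizeof(buffer));
    return std::string(buffer);
}
```

Many current compilers can also warn about mismatched format strings at compile time, but that depends on the compiler and its warning settings, so it complements inspection rather than replacing it.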

16 FILES

16.1 Can a temporary file name not be unique? (This is, surprisingly enough, a common design bug.)

16.2 Is a file pointer reused without closing the previous file?

    fp = fopen(...);
    fp = fopen(...);

16.3 Is a file not closed in case of an error return?
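One way to make questions 16.2 and 16.3 hard to get wrong in C++ is to tie fclose to the lifetime of an owning handle. The sketch below assumes a C++11 compiler; FileHandle, open_file, and count_bytes are hypothetical names chosen for this example and are not part of the checklist.

```cpp
#include <cstdio>
#include <memory>

// Owning handle: fclose runs automatically when the handle is destroyed
// or when it is reassigned to a different file.
typedef std::unique_ptr<std::FILE, int (*)(std::FILE*)> FileHandle;

FileHandle open_file(const char* path, const char* mode) {
    return FileHandle(std::fopen(path, mode), &std::fclose);
}

// Returns the number of bytes in the file, or -1 if it cannot be opened or read.
long count_bytes(const char* path) {
    FileHandle fp = open_file(path, "rb");
    if (!fp) {
        return -1;  // nothing was opened, so there is nothing to close
    }
    long total = 0;
    int ch;  // int, so that EOF stays distinguishable from every byte value
    while ((ch = std::fgetc(fp.get())) != EOF) {
        ++total;
    }
    if (std::ferror(fp.get())) {
        return -1;  // error return: the handle still closes the file
    }
    return total;
}
```

With a raw FILE pointer, every return path needs its own fclose, which is exactly what an inspector has to verify by hand. With the handle, the check reduces to confirming that no raw pointer escapes.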

Appendix A. Errors due to implicit type conversions.

Code which relies upon implicit type conversions may become broken when new classes or functions are added. For example:

    class String {
    public:
        String( const char *arg );      // converting constructor
        operator const char* () const;
        // ...
    };

    void foo( const String& aString );
    void bar( const char *anArray );

    // Now, we added the following class
    class Word {
    public:
        Word( const char *arg );        // converting constructor
        // ...
    };

    // need another foo that works with "Words"
    void foo( const Word& aWord );

    int gorp()
    {
        foo("hello");          // This used to work! Now it breaks! What gives?
        String baz = "quux";
        bar(baz);              // but this still works.
        return 0;
    }

The code worked before class Word and the second foo() were added. Even though there was no foo() accepting an argument of type const char * (i.e., a constant string like "hello"), there is a foo() which takes a constant String argument by reference. And (un)fortunately, there is also a way to convert Strings to char *'s and vice-versa. So the compiler performed the implicit conversion.

Now, with the addition of class Word, and another foo() which works with it, there is a problem. The line which calls foo("hello") matches both:

    void foo( const String& );
    void foo( const Word& );

Since the mechanisms of the failure may be distributed among two or more header files in addition to the implementation file, along with a lot of other code, it may be difficult to find the real problem.

The easiest solution is to recognize while coding or inspecting that a function call results in implicit type conversion, and either (a) overload the function to provide an explicitly-typed variant, or (b) explicitly cast the argument. Option (a) is preferred over (b), since (b) defeats automatic type checking. Option (a) can still be implemented very efficiently, simply by writing the new function as a forwarding function and making it inline.

Appendix B. Errors due to loss of "precision" in return values

Functions which can return EOF should not have their return values stored in a char variable. For example:

    int getchar(void);
    char chr;
    while ( (chr = getchar()) != EOF ) { ... };

should be:

    int tmpchar;
    while ( (tmpchar = getchar()) != EOF ) {
        chr = (char) tmpchar;   // or use casted tmpchar throughout.
        ...
    };

The practice in the top example is unsafe because functions like getchar() may return 257 different values: valid characters with indexes 0-255, plus EOF (-1). If sizeof(int) > sizeof(char), then information will be lost when the high-order byte(s) are scraped off prior to the test for EOF. This can cause the test to fail. Worse yet, depending on whether char is signed or unsigned by default on the particular compiler and machine being used, sign-extension can wreak havoc and cause some of these loops never to terminate.

Appendix C. Loop Checklist

The following loops are indexed correctly, and are handy for comparisons when doing inspections. If the actual code doesn't look like one of these, chances are that something is wrong or at least that something could be clearer.

Acceptable forms of for loops which avoid off-by-one errors.

    for ( i = 0; i <= max_index; ++i )
    for ( i = 0; i < sizeof(array); ++i )
    for ( i = max_index; i >= 0; --i )
    for ( i = max_index; i; --i )
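Two caveats apply when these forms are used in real code, and the sketch below illustrates both. First, sizeof(array) yields a byte count, so it equals the element count only for one-byte elements (compare question 12.2); the hypothetical element_count helper returns the number of elements instead. Second, the i >= 0 form terminates only when i is a signed type. All function names here are assumptions made for the example.

```cpp
#include <cstddef>

// Number of elements in a built-in array, independent of the element size.
template <typename T, std::size_t N>
std::size_t element_count(const T (&)[N]) {
    return N;
}

// Forward loop: i runs from 0 to count - 1.
long sum_forward(const int* values, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// Backward loop: the index is signed so that "i >= 0" can become false.
long sum_backward(const int* values, std::size_t count) {
    long total = 0;
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(count) - 1; i >= 0; --i) {
        total += values[i];
    }
    return total;
}
```

With an unsigned index, the condition i >= 0 is always true and the backward loop would run past the start of the array, which is the kind of deviation from the reference forms an inspection should catch.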

Copyright Notices

1. Some questions and comment material were modified from Programming in C++, Rules and Recommendations, Copyright © 1990-1992 by Ellemtel Telecommunication Systems Laboratories. In conformance with their copyright notice: "Permission is granted to any individual or institution to use, copy, modify, and distribute this document, provided that this complete copyright and permission notice is maintained intact in all copies."

2. Some of the questions applicable to conventional C contained herein were modified or taken from A Question Catalog for Code Inspections, Copyright © 1992 by Brian Marick. Portions of his document were Copyright © 1991 by Motorola, Inc., which graciously granted him rights to those portions. In conformance with his copyright notice, the following contact information is provided below:

    Brian Marick
    Testing Foundations
    809 Balboa, Champaign, IL 61820
    (217) 351-7228
    marick@cs.uiuc.edu, marick@testing.com

"You may copy or modify this document for personal use, provided you retain the original copyright notice and contact information."

3. Finally, all modifications and remaining original material are: Copyright © 1992 by John T. Baldwin. All Rights Reserved.

    John T. Baldwin
    1511 Omie Way
    Lawrenceville, GA 30243
    1-404-339-9621
    johnb@searchtech.com

Permission is granted to any institution or individual to copy, modify, distribute, and use this document, provided that the complete copyright, permission, and contact information applicable to all copyright holders specified herein remains intact in all copies of this document.

Code Inspection Checklist

Title: Software Engineering in the UNIX/C Environment
Author: William Bruce Frakes, Christopher John Fox, Brian A. Nejmeh
Publisher: Prentice Hall
Date Published: February 1991

Data-Declaration Errors
1. Is each data structure correctly typed?
2. Is each data structure properly initialized?
3. Are descriptive data structure names used?
4. Could global data structures be made local?
5. Have all data structures been explicitly declared?
6. Is the initialization of a data structure consistent with its type?
7. Are there data structures that should be defined in a type definition?
8. Are there data structures with confusingly similar names?

Data-Reference Errors
1. Is a variable referenced whose value is uninitialized or not set to its proper value?
2. For all array references, is each subscript value within the defined bounds?
3. For pointer references, is the correct level of indirection used?
4. For all references through pointers, is the referenced storage currently allocated to the proper data?
5. Are all defined variables used?
6. Is the #define construct used when appropriate instead of having hard-wired constants in functions?

Computation Errors
1. Are there missing validity tests (e.g., is the denominator not too close to zero)?
2. Is the correct data being operated on in each statement?
3. Are there any computations involving variables having inconsistent data types?
4. Is overflow or underflow possible during a computation?
5. For expressions containing more than one operator, are the assumptions about order of evaluation and precedence correct?
6. Are parentheses used to avoid ambiguity?
7. Do all variables used in a computation contain the correct values?

Comparison Errors
1. Is the "=" expression used in a comparison instead of "=="?
2. Is the correct condition checked (e.g., if (FOUND) instead of if (!FOUND) )?
3. Is the correct variable used for the test (e.g., X == TRUE instead of FOUND == TRUE )?
4. Are there any comparisons between variables of inconsistent types?
5. Are the comparison operators correct?
6. Is each boolean expression correct?
7. Are there improper and unnoticed side-effects of a comparison?
8. Has an "&" inadvertently been interchanged with a "&&" or a "|" for a "||"?
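The validity test named under Computation Errors (a denominator too close to zero) can be sketched as a guarded division. The function name safe_divide and the tolerance value are assumptions made for this example; an appropriate tolerance depends on the scale of the data.

```cpp
#include <cmath>

// Stores numerator / denominator in result and returns true, or returns false
// and leaves result untouched when the denominator is too close to zero.
bool safe_divide(double numerator, double denominator, double& result) {
    const double tolerance = 1e-12;  // assumed threshold, not a universal constant
    if (std::fabs(denominator) < tolerance) {
        return false;
    }
    result = numerator / denominator;
    return true;
}
```

Returning a status rather than a sentinel value forces the caller to handle the rejected case explicitly, which also gives an inspector something concrete to check at each call site.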

Control-Flow Errors
1. Are null bodied if, else, or other control structure constructs correct?
2. Will all loops terminate?
3. Is there any unreachable code?
4. Do the most frequently occurring cases in a switch statement appear as the earliest cases?
5. Is the most frequently exercised branch of an if-else statement the if statement?
6. Are there any unnecessary branches?
7. Are if statements at the same level when they do not need to be (e.g., they can be nested within each other)?
8. Are goto's avoided?
9. Are out-of-boundary conditions properly handled?
10. When there are multiple exits from a loop, is each exit necessary? If so, is each exit handled properly?
11. Is the nesting of loops and branches correct?
12. Are all loop terminations correct?
13. Does each switch statement have a default case?
14. Are there switch case's missing break statements? If so, are these marked with comments?
15. Does the function eventually terminate?
16. Is it possible that a loop or condition will never be executed (e.g., flag = FALSE; if ( flag == TRUE ) )?
17. For a loop controlled by iteration and a boolean expression (e.g., a searching loop), what are the consequences of falling through the loop? For example, consider the following while statement: while ( !found && (i < LIST_SIZE) ) What happens if found becomes TRUE? Can found become TRUE?
18. Are there any "off by one" errors (e.g., one too many or too few iterations)?
19. Are there any "dangling elses" in the function (recall that an else is always associated with the closest unmatched if)?
20. Are statement lists properly enclosed in { }?

Input-Output Errors
1. Have all files been opened before use?
2. Are the attributes of the open statement consistent with the use of the file (e.g., read, write)?
3. Have all files been closed after use?
4. Is buffered data flushed?
5. Are there spelling or grammatical errors in any text printed or displayed by the function?
6. Are error conditions checked?

Interface Errors
1. Are the number, order, types, and values of parameters received by a function correct?
2. Do the values in units agree (e.g., inches versus yards)?
3. Are all output variables assigned values?
4. Are call by reference and call by value parameters used properly?
5. If a parameter is passed by reference, does its value get changed by the function called? If so, is this correct?

Comment Errors
1. Is the underlying behavior of the function expressed in plain language?
2. Is the interface specification of the function consistent with the behavior of the function?
3. Do the comments and code agree?
4. Do the comments help in understanding the code?
5. Are useful comments associated with each block of code?
6. Are there enough comments in the code?
7. Are there too many comments in the code?

Modularity Errors
1. Can the underlying behavior of the function be expressed in plain English?
2. Is there a low level of coupling among functions (e.g., is the degree of dependency on other functions low)?
3. Is there a high level of cohesion among functions (e.g., is there a strong relationship among functions in a module)?
4. Is there repetitive code (common code) throughout the function(s) that can be replaced by a call to a common function that provides the behavior of the repetitive code?
5. Are library functions used where and when appropriate?

Storage Usage Errors
1. Is statically allocated memory large enough (e.g., is the dimension of an array large enough)?
2. Is dynamically allocated memory large enough?
3. Is memory being freed when appropriate?

Performance Errors
1. Are frequently used variables declared with the register construct?
2. Can the cost of recomputing a value be reduced by computing the function once and storing the results?
3. Can a more concise storage representation be used?
4. Can a computation be moved outside a loop without affecting the behavior of the loop?
5. Are there tests within a loop that do not need to be done?
6. Can a short loop be unrolled?
7. Are there two loops that operate on the same data that can be combined into one loop?
8. Is there a way of exploiting algebraic rules to reduce the cost of evaluating a logical expression?
9. Are the logical tests arranged such that the often successful and inexpensive tests precede the more expensive and less frequently successful tests?
10. Is code written unclearly for the sake of "efficiency"?

Maintenance Errors
1. Is the function well enough documented that someone other than the implementer could confidently change the function?
2. Are any expected changes missing?
3. Do the expected changes require a considerable amount of change to the function?
4. Is the style of the function consistent with the coding standards of the project?

Traceability Errors
1. Has the entire design allocated to the function(s) been satisfactorily implemented?
2. Has additional functionality beyond that specified in the design of the function(s) been implemented?
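The searching loop mentioned in the control-flow questions, while ( !found && (i < LIST_SIZE) ), has two ways to end, and the code after the loop has to tell them apart. The sketch below is one possible shape; find_index, the fixed LIST_SIZE of 5, and the convention of returning LIST_SIZE for "not found" are assumptions made for illustration.

```cpp
#include <cstddef>

const std::size_t LIST_SIZE = 5;

// Returns the index of key in list, or LIST_SIZE when key is absent.
std::size_t find_index(const int (&list)[LIST_SIZE], int key) {
    bool found = false;
    std::size_t i = 0;
    while (!found && (i < LIST_SIZE)) {
        if (list[i] == key) {
            found = true;   // exit 1: match, i still names the matching slot
        } else {
            ++i;            // exit 2: i reaches LIST_SIZE with found == false
        }
    }
    return found ? i : LIST_SIZE;
}
```

Advancing i only when no match is found keeps i meaningful after the loop. A version that incremented i unconditionally would leave it one past the match, which is the "off by one" error the checklist asks about.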

Software Requirements Specification (SRS) Checklist
(Alfred Hussein, University of Calgary)

1. Is the SRS correct?

Each requirement in the SRS is free from error. This requires that each requirement statement accurately represent the functionality required of the system to be built. While the other categories within this checklist result in errors, this category of errors is concerned with the technical nature of the application at hand. In other words, the requirement is just plain wrong. For example, if the problem domain states that the XYZ system is to provide a response to input within 5 seconds and the SRS requirement specifies that the XYZ system will respond within 10 seconds, the requirement is in error.

2. Are the requirements in the SRS unambiguous, precise, and clear?

Each requirement in the SRS is exact and unambiguous; there is one and only one interpretation for every requirement. The meaning of each requirement is easily understood and is easy to read. Requirement statements should be short, explicit, precise and clear. Each requirement should have a specific purpose and represent a specific characteristic of the problem domain. Rambling verbose descriptions are generally subject to interpretation. At a minimum, all terms that could have multiple meanings should be defined in a glossary where its meaning is made more specific.

The difficulty of ambiguity stems from the use of natural language which in itself is inherently ambiguous. It is recommended that checklists of words and grammatical constructs that are particularly prone to ambiguity be used to aid in identifying possible errors of this type. Refer to Appendix B for a checklist of potential ambiguity indicators.

It is also important to remain objective when writing requirements; never assume that everyone will understand the requirement the way you understand it. When writing the requirement, try to imagine that the requirement is to be given to ten people who are asked for their interpretation. If there is more than one such interpretation, then the requirement is probably ambiguous.

3. Is the SRS complete?

An SRS is considered complete if all requirements that are needed for the specification of the requirements of the problem have been defined and the SRS is complete as a document. The following qualities indicate a complete SRS:

• Everything the software is supposed to do is included in the SRS. This includes functionality, performance, design constraints, or external interfaces. This is a difficult quality to judge because it requires a complete understanding of the problem domain. It is further complicated by the implication that something is missing in the SRS, and it is not a simple process to find something that is not present by examining what is present.
• Definitions of the responses of the software to all realizable classes of input data in all realizable classes of situations are included. This includes the responses to both valid and invalid input. This implies that for every input mentioned in the SRS, the SRS specifies what the appropriate output will be.
• Conformity to the applicable SRS standard. If a particular section is not applicable, the SRS should include the section number and an explanation of why it is not applicable.
• All pages are numbered; all figures and tables are numbered, named and referenced; all terms and units of measure are provided; and all referenced material and sections are present. This is completeness from a word-processing perspective.
• No sections are marked "To Be Determined (TBD)". The use of "TBD" in a section of the SRS should be avoided whenever possible. If this is unavoidable, each TBD should be appended with a notation that explains why it can't be completed, what needs to be done to complete the section, who will complete the section and when it will be completed. This ensures that the TBD is not interpreted as an excuse to delay completion of the SRS indefinitely. By including the name of the responsible individual and the date, we ensure that the TBD expires at some point.

4. Is the SRS verifiable or testable?

An SRS is verifiable if and only if every requirement statement is verifiable. A requirement is verifiable if and only if there is some finite cost-effective way in which a person or machine can check to see if the software product meets the requirements. There should be some quantitative way to test the requirement. When looking at the testability of a requirement consider if it can be tested by actual test cases, analysis or inspection. There are a number of reasons why a requirement may not be verifiable:

• The requirement is ambiguous. This leads to non-verifiability; there is no way to verify software that exhibits ambiguous traits.
• Using non-measurable quantities such as "usually" or "often" implies the absence of a finite test process. Refer to Appendix C for additional nonquantifiable measures.

5. Is the SRS consistent?

An SRS is consistent if and only if the stated requirements do not conflict with other stated requirements within the SRS. Inconsistency can manifest itself in a number of ways.

• Conflicting terms: Two terms are used in different contexts to mean the same thing. The term prompt to denote a message to have a user input data is used in one requirement while the term cue is used by another requirement to mean the same thing.
• Conflicting characteristics: Two requirements in the SRS demand the software to exhibit contradictory attributes. For example, one requirement states all inputs shall be via a menu interface while another states all inputs shall be via a command language.
• Temporal or logical inconsistency: Two parts of the SRS might specify conflicting timing characteristics or logic. For example, one requirement may state that system A will occur only while system B is running and another requirement may conflict by stating that system A will occur 15 seconds after the start of system B. A logic inconsistency may be one requirement stating that the software will multiply the user inputs, while another requirement states that the software will add the user inputs.

Inconsistency can also occur between the requirements and their source. It is important to ensure that the terminology used is the same as the terminology used in the source documents. Using the same vocabulary as the source documents works towards solving this form of inconsistency.

6. Does the SRS deal only with the problem?

The SRS is a statement of the requirements that must be satisfied by the problem and they are not obscured by design detail. Requirements should state "what" is required at the appropriate system level, not "how". In some cases, a requirement may dictate how a task is to be accomplished, but this is always a customer prerogative.

Requirements must be detailed enough to specify "what" is required, yet provide sufficient freedom so design is not constrained. Avoid telling the designer "how" to do this job; instead state "what" has to be accomplished.

7. Is each requirement in the SRS feasible?

Each requirement in the SRS can be implemented with the techniques, tools, resources and personnel that are available within the specified cost and schedule constraints. Requirements should not be technology-oriented. Specify what functions and performance attributes the user requires and yet do not commit to any particular physical solution. Set performance bounds based on system requirements and not state-of-the-art capability. If growth capability is to be included, state and place bounds on it. Vague growth statements are generally useless. The level to which a "hook" is to be designed into the system should be clearly identified.

8. Is the SRS modifiable?

An SRS is modifiable if its structure and style are such that any necessary changes to the requirements can be made easily, completely, and consistently. This attribute of an SRS is more concerned with the format and style of the SRS as opposed to the actual requirements themselves. Modifiability of an SRS requires the following:

• The SRS has a coherent and easy-to-use organization, with a table of contents, an index, and cross-references if necessary. This will allow easy modification of the SRS in future updates.
• The SRS should have no redundancy; that is, a requirement should not be in more than one place in the SRS. Redundancy itself is not an error; it is a technique, which can be used to improve readability, but it also can reduce modifiability. The difficulty here is that a requirement may be repeated in another requirement to make the second requirement clearer. If the first requirement needs to be changed, there are now two locations to change. If the change is made in only the first location, the SRS can become inconsistent. Though a cross-reference can help to alleviate this problem, redundancy should not be used unless absolutely necessary. If references to a previous requirement are necessary, do so by paragraph reference.

9. Is the SRS traceable?

An SRS is traceable if the origin of each of its requirements is clear and if it facilitates the referencing of each requirement in future development or enhancement documentation.

There are two types of traceability:

• Backward traceability implies that we know why every requirement in the SRS exists. Each requirement explicitly references its source in previous documents.
• Forward traceability implies that all documents that follow the SRS are able to reference the requirements. This is done by each requirement in the SRS having a unique name or reference number.

Each requirement should be contained in a single, numbered paragraph so that it may be referred to in other documents. This will facilitate backward traceability to source documents and forward traceability to software design documents and test cases. Preferably the requirement should be titled as well. Grouping of several requirements in a single paragraph should be avoided, unless interrelated requirements cannot be separated and still provide clarity.

10. Does the SRS specify design goals?

Design goals should be used with discretion. Quantitative design goals must have a requirement associated with them. All requirements must be met even if the design goal is not satisfied.

11. Does the SRS use the appropriate specification language?

The use of "shall" statements is encouraged, as this implies a directive to express what is mandatory. The use of "shall" requires a sense of commitment. Unwillingness to make such a commitment may indicate that the requirement, as currently stated, is too vague or is stating "how" and not "what". "Will" statements imply a desire, wish, intent or declaration of purpose. "Should" or "may" are used to express non-mandatory provisions. It is recommended that "will, should and may" not be used in writing requirements unless absolutely necessary.

APPENDIX A
Software Requirements Specification (SRS) Summary Checklist

1. Is the SRS correct?
2. Are the requirements in the SRS nonambiguous, precise, and clear?
3. Is the SRS complete?
4. Is the SRS verifiable or testable?
5. Is the SRS consistent?
6. Does the SRS deal only with the problem?
7. Is each requirement in the SRS feasible?
8. Is the SRS modifiable?
9. Is the SRS traceable?
10. Does the SRS specify design goals?
11. Does the SRS use the appropriate specification language?

APPENDIX B
Checklist of Words and Grammatical Constructs Prone to Ambiguity

• Incomplete lists, typically ending with "etc.", "and/or", and "TBD".
• Vague words and phrases, such as "generally", "normally", "to the greatest extent", and "where practicable".
• Imprecise verbs, such as "supported", "handled", "processed", or "rejected".
• Implied certainty, such as "always", "never", "all", or "every".
• Passive voice, such as "the counter is set." (by whom?)
• Every pronoun, particularly "it" or "its" should have an explicit and unmistakable reference.
• Comparatives, such as "earliest", "latest", "highest". Words ending in "est" or "er" should be suspect.
• Words whose meanings are subject to different interpretations between the customer and contractor such as:
  o Instantaneous
  o Simultaneous
  o Achievable
  o Complete
  o Finish
  o Degraded
  o A minimum number of
  o Nominal/normal/average
  o Peak/minimum/steady state
  o As required/specified/indicated
  o Coincident/adjacent/synchronous with

APPENDIX C
Nonquantifiable Measures

These words signify non-quantifiable measures that can indicate that a requirement can not be verified or tested:

• Flexible
• Modular
• Efficient
• Adequate
• Accomplish
• Possible (possibly/correct(ly))
• Minimum required/acceptable/reasonable
• Better/higher/faster/less/slower/infrequent
• Some/worst
• Usually/often
• To the extent specified
• To the extent required
• To be compatible/associated with

NOTES ON THE SOFTWARE INSPECTION:

Overview and Frequently Asked Questions
Prepared by Craig K. Tyran

Overview

Software plays a major role in modern organizations, as it is used to run the computer-based systems which collect, store, transform, and organize the data used to conduct business. Unfortunately, the development of software is a major headache for organizations. Frequently software is delivered late and does not meet user requirements due to defects. Many software problems may be attributed to the development of low quality software that is characterized by numerous defects. It has been suggested by software experts that the way to address these types of software problems is to improve software quality through quality assurance methods. Quality improvements will reduce defects and thus speed up development time (due to less rework required prior to release) and decrease the costs and personnel demands associated with program maintenance (due to fewer defects in the released software).

One of the most well known software quality techniques is the software inspection. A software inspection is a group meeting which is conducted to uncover defects in a software "work product" (e.g., requirements specification, user interface design, code, test plan). The software inspection approach is a formally defined process involving a series of well defined inspection steps and roles, a checklist to aid error detection, and the formal collection of process and product data.

Software inspections are conducted in industry because they have been found to be an effective way to uncover defects. The detection rate for an inspection varies depending on the type of work product being inspected and the specific inspection process used. Studies have found that 30 to 90 percent of the defects in a work product may be uncovered through the inspection process. Early detection of defects can lead to cost savings. For example, one study has estimated that inspection-based techniques at Hewlett-Packard have yielded a cost saving of $21 million.

It should be noted that inspections may take up a significant portion of a project's time and budget if performed on a consistent basis throughout the life of a project. According to an industry estimate from AT&T, project teams may devote four to fifteen percent of project time to the inspection process. While allocating this amount of time on inspections may seem high, the benefit of reducing software defects has been found to outweigh the cost of conducting inspections.

The favorable attitude that many in the software development field have toward the inspection process is underscored by a statement made by Barry Boehm (a well known expert in the field of systems development), who has written that "the [software inspection] has been the most cost effective technique to date for eliminating software errors."

Software Inspections: Frequently Asked Questions

What is a software inspection?

• A group review of a software "work product" for the purpose of identifying defects or deficiencies.

What is meant by a "work product"?

• A work product refers to the models, designs, programs, test plans, etc. that are generated during the systems development process. Examples of work products corresponding to some of the different system life cycle phases include:
  o Analysis phase: DFDs, ER diagrams, project dictionary entries, process specifications
  o Design phase: input/output forms, reports, database design, structure charts
  o Implementation phase: program code, test plan for code

What is meant by "defects and deficiencies"?

• Anything that may ultimately hinder the quality of the final software product is considered a defect. Defects may happen throughout the development process. For example, errors regarding DFDs and the project dictionary may occur during the analysis phase, poor user interfaces may be generated during the design phase, and "buggy" code and incomplete testing plans may be generated during the implementation phase.

Why perform software inspections?

• Inspections are justified on an economic basis. The earlier that a software defect can be identified, the cheaper it is to fix. For example, studies show that a defect that is found during system testing can be 10 to 100 times more expensive to fix than defects found early in the development life cycle (e.g., analysis and design).

• Data collected during the inspection process (e.g., the types and rates of defects) can be used to help improve the software development process. For example, if software developers in an organization tend to make the same types of errors on multiple projects, the organization may wish to understand the root cause for these problems and take action to improve the development process.
• Inspections can help to improve communication and learning
  o Among IS personnel
  o Between IS and user personnel

What is the "output" of an inspection review?

• An "Action List" of the errors/deficiencies that need to be fixed. This list is then passed onto the person who produced the work product.

Does a software inspection entail error detection AND error correction, or just error detection?

• Only error detection. Error correction is delegated to the person who produced the work product.

What is a good attitude for inspectors to have during an inspection?

• Consider the work product guilty until proven innocent.
• However, the producer is always innocent (i.e., focus on the product and not the person who developed the product).

Are inspections typically used for purposes of performance appraisal?

• No. Organizations that conduct inspections have found that it is best to use the inspection process to improve the quality of a work product and not use the inspection for performance appraisal. They have found that if inspections are used to formally track performance of IS personnel for performance appraisals, then the inspections become too much of a high-stress activity and some personnel do not want to participate.

When should an inspection be conducted?

• When a unit of work/documentation is completed and is still in a "bite size" piece, then an inspection should be scheduled. For example, during analysis, it may be appropriate to conduct an inspection once the first draft of DFDs for a subset of the project have been completed. During programming, an inspection of code could be conducted once the program for a new module has been coded.
• Conduct inspections frequently to find errors as early as possible.

Who participates in an inspection?

• IS personnel and users may be involved. The selection of participants depends on the purpose of the inspection. For example, code inspections typically involve only IS personnel. However, an inspection concerning analysis and design work products may involve the users. In the latter case, prior to a review with the user, there may be an internal MIS review involving only IS personnel where the work product is checked to make sure it is accurate and clear. Once the work product is "in shape" to be viewed by users, an external review and inspection involving both IS and user personnel may take place (to determine if the system supports the user needs).

What may be some of the potential problems that happen during inspection meetings?

• Group inspection meetings may be unproductive due to sidetracking and domination by certain group members. Also, if inspectors are not "polite," inspections may lead to excessive criticism and conflict. Also, there is the chance that inspectors can get bogged down on trying to correct problems.

How can these potential problems be addressed?

• To help address some of the potential problems with group inspections, successful inspection teams follow a set of guidelines/ground rules. Specific guidelines include:
  ° The number of participants should be manageable. Range: 3-6 people
  ° Emphasize error detection, not correction
  ° Participants should be "charitable critics" (don't be nasty)

  ° Follow organizational standards (e.g., diagramming/naming conventions, etc.) to reduce misunderstandings or disagreements.
  ° Recognize that there may be some "open issues" that can not be resolved at the inspection meeting (e.g., when inspecting an analysis document, there may be some questions that need to be referred to the user)
  ° Keep length of inspection to be less than two hours to reduce fatigue factor.

Sources:

Ackerman, A., et al. "Software inspections: An effective verification process," IEEE Software, May 1989, 31-36.
Boehm, B. "Industrial software metrics top ten list," IEEE Software, September 1987, 84-85.
Ebenau, R. and Strauss, S. Software Inspection Process, McGraw-Hill, New York, 1994.
Gilb, T. and Graham, D. Software Inspection, Addison-Wesley, Wokingham, England, 1993.
Grady, R. and Van Slack, T. "Key lessons in achieving widespread inspection use," IEEE Software, 11(4), July 1994.

************************* T H E E N D *************************
