
Department of Computing Science Faculty of Computing & Engineering

Software Engineering using C++


Lecture Notes

Prepared by Terry Chapman

September 1999

Table of Contents

BASIC C++
1. A First C++ Program
2. Data Types
3. String Constants
4. Variables and Constants
5. Arithmetic Operators
6. Type conversions
7. Assignment operator
8. The compound assignment operators
9. The increment & decrement operators
10. Iostream library
11. Command line redirection
12. Streams
13. Output manipulators
14. Relational operators and expressions
15. FALSE and TRUE
16. Logical operators and expressions
17. Short-circuit evaluation
18. The while statement
19. The if statement
20. Style for logical expressions
21. The ctype library

FUNCTIONS
1. Introduction
2. Input and output in functions
3. Multi-function programs
4. Stepwise Refinement (or Top-down design)
5. Automatic variables
6. Function values
7. Function arguments
8. Function argument agreement & conversion
9. Overloaded function names
10. Reference Arguments
11. Function comments
12. Summary

FLOW OF CONTROL
1. The type cast operator
2. The comma operator
3. The conditional operator
4. The for statement
5. The do statement
6. Nested loops
7. The break statement
8. The continue statement
9. The switch statement

POINTERS, REFERENCES AND FUNCTIONS
1. Introduction
2. Reference Type
3. Pointers v References
4. Enumeration Types
5. The typedef statement
6. Reference arguments to functions
7. Pointer arguments to functions
8. Default arguments
9. Inline functions
10. Mathematical functions

ARRAYS
1. Introduction
2. Defining and referencing arrays
3. Array initialisation
4. Multi-dimensional arrays
5. Arrays as function arguments
6. Pointers and arrays
7. Character strings and variable pointers
8. Character string input/output
9. Arrays of pointers and pointers to pointers
10. Command line arguments
11. Initialising pointer arrays
12. Review
13. Summary
14. An array application - Stack of char

PROGRAM FILES
1. Introduction
2. The steps to produce an executable
3. Types, storage class and scope
4. Local duration
5. Declaration versus definition
6. Static duration
7. Storage class static
8. Static local variables
9. Static global variables
10. The C++ pre-processor
11. Conditional compilation
12. Conditional file inclusion

DATA STRUCTURES
1. Data Types
2. Abstract Data Types
3. Classification
4. Categories of Collection
5. Stacks
6. Abstract Data Type?
7. Queues
8. Lists
9. Structs
10. Unions

DYNAMIC DATA STRUCTURES
1. Structures
2. Comparison between structs and arrays
3. Storage Management
4. Dynamic Data Structures - Linked Lists
5. Other dynamic structures

SORTING
1. Introduction
2. Components of Sorting
3. Sorting Files
4. Why sort?
5. Does it pay to sort?
6. What is the best sort?
7. Sorting efficiency
8. Simple Array Sort - Exchange (Bubble)
9. Insertion Sort
10. Simple Sort performance
11. Conclusions
12. Complex sorts
13. QuickSort
14. Efficiency of Quicksort
15. C++ code for function Quicksort (see Wirth)
16. Comparison of complex sorting algorithms
17. Further Reading

TESTING
1. The context for testing - Verification and Validation
2. The objectives of testing
3. Testing & Debugging
4. Two different testing strategies
5. Categories of Testing
6. Test Planning
7. How much testing?
8. Test Data v Test Cases
9. Black box v White box testing
10. Black box testing
11. White box testing - Introduction
12. White box testing
13. Automated Testing

DATA STRUCTURE METRICS
1. Representing Abstract Structure
2. Implementing Data Structures
3. Metrics
4. Mathematical Notations

TREES
1. Applications
2. Implementation
3. Variations
4. Example Declaration
5. Expression Trees
6. Tree Traversal
7. Parse Trees
8. Binary Search Trees
9. Importance of Balance
10. Other types of tree

HASH TABLES
1. Applications
2. Operations
3. Efficiency
4. Problem
5. Hashing
6. Collision Resolution
7. Hash Table example
8. Perfect Hashing Functions

LIBRARIES
1. The ctype library
2. The maths library
3. The standard library

BIBLIOGRAPHY

Basic C++


1. A First C++ Program
    // first.cpp
    // My first C++ program
    // A. Student
    // 27/09/99
    #include <iostream>
    using namespace std;

    int main( void )
    {
        cout << "Hello World" << endl;
        return 0;
    }

The lines starting with // are comments. These are for human consumption - the compiler ignores them. They cause all text on the current line to the right of the symbol to be a comment. An alternative form of comment is the pair:

    /* this is a comment */

These do not need repeating on every line and therefore a number of lines can be enclosed within one pair.

Since the program is going to display output, it is necessary to make available the input/output library iostream. This is done by issuing a compiler directive that the text of the header file iostream should be included in the compilation (older compilers name this file iostream.h). The compiler knows where to find this file. The line using namespace std; allows the library names cout and endl to be written without the prefix std::.

The word cout represents the standard output stream and the symbol << causes what follows it to be placed on that stream. By default, the standard output stream is displayed on the terminal.

Every C++ program must have one, and only one, function main. This is where program execution always commences. This, and all other functions, have a return type (in this case int), an argument list (in this case empty, indicated by void) and a body that is delimited by open and close braces { }.

The first statement of function main outputs the message Hello World to the terminal followed by a new line. The program then terminates, returning the value 0 to the operating system. By convention, a return value of 0 indicates success.

This program deals only with two values - a constant string [1] literal containing the words Hello World and an integer constant 0. It does not require the use of any variables. Most programs require the use of variables, i.e. storage locations in memory that contain values during program execution. Variables may be of different types.

2. Data Types
There are a number of basic data types built in to most programming languages. A data type consists of a name and a specification of:

• the range of values that a variable of that type can hold - its domain. This range is often limited by the amount of storage that is used by such items.
• the operations that may be carried out on values of that type.

In C++, the most common data type is int - whole numbers that may be positive, negative or zero, i.e. integers. The amount of storage allocated to variables of type int is often 2 bytes and sometimes 4 bytes, depending on the compiler. This allows a range of values from

• -32768 to 32767 in the case of 2 bytes, and
• -2,147,483,648 to 2,147,483,647 where 4 bytes are employed.

[1] A string is a sequence of characters.

These peculiar ranges arise from use of the binary system. The fundamental native [2] data types and their storage size in GNU C++ are:

    type            Range of values                                    Bytes
    char            Character codes 0 - 127                            1
    unsigned char   Unsigned character codes 0 - 255                   1
    short int       Signed integer -32768 to 32767                     2
    int             Signed integer -2,147,483,648 to 2,147,483,647     4
    unsigned int    Unsigned integer 0 - 4,294,967,295                 4
    long int        Signed integer -2,147,483,648 to 2,147,483,647     4
    float           1.17549e-38 to 3.40282e+38                         4
    double          2.22507e-308 to 1.79769e+308                       8

Note that, unlike some compilers, GNU C++ uses 4 bytes for type int, thus providing the same range of values as type long int (or just long). Unsigned integers can hold values about twice as large as signed integers of the same size because no bit is needed to store the sign.
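The "peculiar" limits in the table follow directly from the number of bits available. As a rough sketch, assuming the usual two's complement representation of signed integers (the helper names signed_min, signed_max and unsigned_max are made up for this illustration and are not library functions), the limits for an integer of a given size in bytes could be computed like this:

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; intended for sizes of 1 to 4 bytes.
// A 64-bit type is used for the arithmetic so that the shifts cannot overflow.

// Smallest value of a signed integer of the given size: -2^(bits-1)
std::int64_t signed_min(int bytes)
{
    return -(std::int64_t(1) << (8 * bytes - 1));
}

// Largest value of a signed integer of the given size: 2^(bits-1) - 1
std::int64_t signed_max(int bytes)
{
    return (std::int64_t(1) << (8 * bytes - 1)) - 1;
}

// Largest value of an unsigned integer of the given size: 2^bits - 1
std::int64_t unsigned_max(int bytes)
{
    return (std::int64_t(1) << (8 * bytes)) - 1;
}
```

Calling signed_max(2) gives 32767 and unsigned_max(4) gives 4,294,967,295, in agreement with the table. The sizes actually used by a particular compiler can be checked with sizeof, e.g. sizeof(int).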

3. String Constants
A string constant is a sequence of characters enclosed in double quotes, e.g. "MSc Information Technology". The sequence may be empty, e.g. "". If the string is to include certain characters, e.g. double quotes and the backslash, then these must be escaped with the backslash character '\', e.g.

    "She said \"I have lost my file mydir\\myprog.cpp\""

When output, this would display:

    She said "I have lost my file mydir\myprog.cpp"

[2] Native types are the types built into the language.

Other special characters may be included, e.g.

    \n    newline         \?    question mark
    \t    tab             \'    single quote
    \f    formfeed        \a    alarm bell

A string constant can extend over 2 or more lines by placing a backslash at the end of an uncompleted line. Two adjacent strings are concatenated to form a single string, e.g.

    "This string " "is concatenated with this one"

There is no native data type string in C++. Instead, strings are implemented as an array [3] of characters terminated by the special character '\0' (ASCII NUL), i.e. the unprintable character which has the ASCII code 0.

    index        0    1    2    3    4    5
    character    H    e    l    l    o    \0

We will cover arrays later - they are a very important compound data type holding a sequence of data items in a contiguous area of memory.

Strings and characters are not the same. A string containing only a single character, e.g. "W", actually occupies 2 bytes of storage: one for the 'W' and one for the ASCII NUL. A character variable can hold only one single character, e.g. 'W', normally occupying only one byte.

To declare a variable of type string and give it a value immediately:

    char myname[] = "Terry Chapman";

If the string is not intended to be changed, it should be declared as a constant:

    const char myname[] = "Terry Chapman";

The empty brackets signify an array whose size is determined automatically by the compiler, which also reserves space for the terminating ASCII NUL. The variable or constant can be output in the usual way, i.e.

    cout << myname;
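The storage claims above can be checked with sizeof and the library function strlen. The following is only a sketch: the names one_char_string, one_char, joined and storage_needed are chosen for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// A one-character string versus a single character.
const char one_char_string[] = "W";   // stored as 'W' followed by '\0'
const char one_char = 'W';            // a single character, no terminator

// Adjacent string constants are joined by the compiler into one string.
const char joined[] = "This string " "is concatenated with this one";

// Hypothetical helper: bytes needed to store a string, including the NUL.
std::size_t storage_needed(const char* s)
{
    return std::strlen(s) + 1;   // strlen does not count the terminating '\0'
}
```

Here sizeof(one_char_string) is 2 while sizeof(one_char) is 1, and storage_needed("Hello") is 6, which matches the six boxes in the diagram.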

4. Variables and Constants
Variables are names associated with a value. In programming, names are referred to as identifiers. During program execution, the value associated with an identifier may be changed many times. In C++ the compiler must know the type of the identifier because this determines the amount of storage that must be allocated for its value. For this reason, every variable declaration must have a type. In addition, the variable may be initialised with a value.

Examples (each line is a separate example):

    int sum;                          // variable named sum of type int
    int size = 37;                    // initialised on declaration
    int sum, total = 0;               // 2 integers, only total is initialised to 0
    float average = 0.0;              // Initialisation must be of the appropriate type
    char ch;                          // Uninitialised declaration
    char ch = ' ';                    // literal space surrounded by single quotes
    char progname[] = "myprog.cpp";   // strings by double quotes

[3] An array is a contiguous sequence of memory locations.

Identifiers must start with a letter (or an underscore). After this, they may contain any number of letters, digits or the underscore character. They must not include spaces.

    int this_is_a_very_long_identifier_with_99 = 99;   // valid
    float The Average;                                 // invalid - contains a space
    char 2good;                                        // invalid, starts with digit

You must use meaningful identifiers. They are part of the program's documentation and should be expressive of the purpose for which the identifier is required. An exception to this is loop control variables that have no other purpose than to access elements of an array. These are commonly a single character, e.g. i, j.

Constants are named items that cannot change. These are used for values in the program that will remain constant throughout the program's execution. They must be initialised with a value. Examples:

    const double pi = 3.14159265359;
    const int numitems = 350;

5. Arithmetic Operators

    +    unary plus or addition
    -    unary minus or subtraction
    *    multiplication
    /    division
    %    modulus

Note that there is no exponentiation operator that raises a number to a power. There are library routines that accomplish this. The above operators apply to all numeric types (except %). Modulus produces the remainder after integer division and applies only to integral types, e.g. 5 % 2 = 1, 11 % 3 = 2, 19 % 5 = 4.

You should find a table of operator precedence in your textbook. 2 + 3 * 4 means "add 2 to the product of 3 and 4". If you want it to mean "add 2 to 3 and then multiply by 4" you must change the precedence with parentheses: (2 + 3) * 4. A combination of arithmetic operators and arithmetic constants or variables is known as an arithmetic expression. An expression has a value, thus 10 * 3 has the value 30. A statement on the other hand is a command to carry out processing, e.g. x = 10 * 3; is a statement that means assign to the variable x the value of the expression 10 * 3. You might have rationalised the difference between a statement and an expression by thinking to yourself that an expression has a value whereas a statement does not. You would be correct if you were talking about most conventional programming languages like Pascal, Modula-2 and BASIC. But the distinction is less sharp in C and C++ since, in these languages, an assignment is itself an expression and so has a value - in the above example, the assignment x = 10 * 3 has the value 30, and it only becomes a statement when the terminating semicolon is added. This value can be used for further operations, e.g. for assignment to another variable:
    y = x = 10 * 3;    // both x and y now have the value 30
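A minimal sketch of the precedence, modulus and chained-assignment examples from this section follows. The function names are hypothetical and exist only so that each value can be inspected separately.

```cpp
// Multiplication binds more tightly than addition.
int without_parentheses()
{
    return 2 + 3 * 4;
}

// Parentheses force the addition to be carried out first.
int with_parentheses()
{
    return (2 + 3) * 4;
}

// Remainder after integer division.
int remainder_example()
{
    return 19 % 5;
}

// The assignment x = 10 * 3 has a value, which is then assigned to y.
int chained_assignment()
{
    int x = 0;
    int y = 0;
    y = x = 10 * 3;
    return x + y;   // both are 30 at this point
}
```

The four functions should return 14, 20, 4 and 60 respectively.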

6. Type conversions
There are two aspects:
- Automatic conversions carried out by the compiler. These are discussed below (para 7).
- Type conversion operators. These use the name of a type as a function in order to force an expression into a particular type, e.g. int(99.21) will yield 99.

7. Assignment operator
C++ carries out automatic type conversion so that the result of an expression on the right hand side of the assignment symbol is automatically converted (if possible) into the type of the variable on the left hand side. This is convenient in many ways, but there are occasions when you need to know what the exact effect is. It is like letting a futuristic washing machine automatically decide what program to use according to the clothes you put in. What program does the machine decide to use when you wash a silk shirt and a very dirty towel? Do you get a grubby towel or a ruined silk shirt? Ultimately you will need to know what the conversion rules are, but do not worry about them at present. In any case it is desirable not to make a habit of mixing your washing since you may get a result you did not expect. Briefly, fractional values (types float and double) are truncated when assigned to integral variables (int, unsigned int, long int). Large values that exceed the capacity of the integral variable to which they are assigned will cause overflow and the result will be meaningless. No overflow warning is issued, and care should be taken when writing expressions with integral values to ensure that overflow does not occur.
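The truncation rule can be illustrated with a small sketch. The function names and the sample values 3.99 and -3.99 are assumptions made for this illustration. Note that truncation discards the fractional part; it does not round.

```cpp
// Assigning a double to an int discards the fractional part.
int truncate_positive()
{
    double d = 3.99;
    int i = d;        // automatic conversion on assignment
    return i;
}

// Truncation is towards zero, so a negative value moves up, not down.
int truncate_negative()
{
    double d = -3.99;
    int i = d;
    return i;
}

// The type conversion operator makes the same conversion explicit.
int explicit_conversion()
{
    return int(99.21);
}
```

The functions should return 3, -3 and 99. Many compilers can be asked to warn about the first two conversions, which is one reason for preferring the explicit form.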

8. The compound assignment operators


These are all very intuitive and make life easier by reducing the typing.
    count += 2         increment the value of count by 2
    stock -= 1         decrement the value of stock by 1
    divisor /= 10.0    assign to the float divisor the result of dividing its current value by 10
    power *= 9         assign to power the result of multiplying its previous value by 9
    remainder %= 2     assign to remainder the result of taking its current value modulus 2

Note that the last may only be used with integers, all others may be used with any arithmetic type. Note also the effect of sum /= 3 + 7. The expression 3 + 7 is evaluated first.
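The effect of sum /= 3 + 7 can be compared with the expression it is sometimes mistaken for. This is only a sketch: the function names and the starting value 100 are assumptions made for the illustration.

```cpp
// sum /= 3 + 7 evaluates 3 + 7 first, so sum is divided by 10.
int divide_by_sum()
{
    int sum = 100;
    sum /= 3 + 7;
    return sum;
}

// Writing the operation out without parentheses gives a different result,
// because the division is now carried out before the addition.
int divide_then_add()
{
    int sum = 100;
    sum = sum / 3 + 7;   // 100 / 3 is 33 in integer arithmetic
    return sum;
}
```

The first function should return 10 and the second 40.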

9. The increment & decrement operators


    y = x++    assign the old value of x to y and then increment x (postincrement)
    y = ++x    increment x and assign the new value to y (preincrement)
Similarly with --.
We will return to these when we look at processing arrays using loops.
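The difference between the two forms only shows when the value of the expression is used. A minimal sketch, with hypothetical function names and a starting value of 5 chosen for the illustration:

```cpp
// Postincrement: y receives the old value of x.
int y_after_postincrement()
{
    int x = 5;
    int y = x++;
    return y;
}

// Preincrement: x is incremented first, so y receives the new value.
int y_after_preincrement()
{
    int x = 5;
    int y = ++x;
    return y;
}

// Used on its own as a statement, either form simply adds one to x.
int x_after_increment()
{
    int x = 5;
    x++;
    return x;
}
```

The functions should return 5, 6 and 6 respectively.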

10. Iostream library
Input and output in C++ is based on streams. A stream is an abstract concept that you do not need to worry about. Just think of the natural phenomenon. Whenever a C++ program executes, three streams are opened automatically - standard input, standard output and standard error. Normally, standard input is expected to come from the keyboard and standard output is sent to the display. However standard input and standard output can be redirected from the DOS command line using the < and > characters when the program is executed. Standard error cannot be redirected.

11. Command line redirection


If you wish to capture the output of a program in a file, simply redirect its output as follows:
    myprog > myprog.out
Similarly you can substitute a file to be the input to a program instead of the keyboard:
    myprog < myinput
To redirect both input and output use something like:
    myprog < myinput > myprog.out

12. Streams
Access to istream (input stream) and ostream (output stream) operators is obtained by putting the preprocessor directive #include<iostream> at the top of each program file that needs to carry out standard input and/or standard output. This has the effect of including the header file iostream (a text file, called iostream.h by older compilers) in the compilation.

12.1 Unformatted input and output


cin and cout are the predefined standard input and output streams defined in the above header file (there is a third, cerr):
    cin >> x >> y >> z;
obtains from the keyboard values for 3 variables. Spaces or tabs may separate the actual inputs.
    cout << "A message : " << message << endl;
where message is a string constant or variable.

<< and >>
    are known as the insertion and extraction operators. The unusual notation arises from the object-oriented aspects of the language. Just take it for granted at present.
endl
    causes subsequent output to be displayed on the next line of the display.
cin.get(ch)
    gets a single character from standard input and returns the state of the standard input stream.
cout.put(ch)
    puts a single character to standard output and returns the state of the standard output stream.
cout.good(), cin.good()
    return true if there has been no error from the last output (input) operation.
cout.bad(), cin.bad()
    return true if a serious error has occurred on the stream. Broadly the opposite of good(), although good() also becomes false at end of input.
cin.eof()
    returns true if end of input encountered, false otherwise. When entering from the keyboard, end of input is indicated with Ctrl Z.

All of the fundamental types supported by C++ (including strings) may be input using cin >> and output using cout <<.
13. Output manipulators
As their name implies, these allow formatting of the output stream for such things as the field width, justification, decimal precision etc. They are normally included within the output statement - see examples below and Skansholm pp 365-369. Use of these manipulators requires that the header file iomanip be included in the program:
    #include<iomanip>

setw(n)
    sets the field width to n characters for the output, e.g.
        cout << "22 right adjusted in field width of 4 is [" << setw(4) << 22 << "]";
    produces
        22 right adjusted in field width of 4 is [  22]
    setw must be repeated for each subsequent output for which a field width is required. In the absence of setw() the field width is the actual width of the output.

setfill( char )
    specifies the character that is to be used for padding output that is narrower than the field width, e.g.
        cout << "[" << setw(4) << setfill('*') << 22 << "]";
    produces
        [**22]

setprecision(int)
    changes the precision for the display of types float and double (the default is 6 digits). Normally it determines the number of significant digits displayed, but if the ios::fixed flag (see below) has been set, then it controls the number of decimal places displayed.

setiosflags( ) and setf()
    change flags that control such things as justification, precision etc.
    setiosflags( ios::showpoint ) forces the decimal point, and any trailing zeros, to be displayed even for whole numbers.
    setiosflags( ios::fixed ) selects fixed-point notation. After the fixed flag has been set, the effect of setprecision is to control the number of decimal places displayed.
    setiosflags( ios::left ) and setiosflags( ios::right ) determine the justification of the output, which will remain unchanged until the flag is modified by another call.
    setf() is a member function of the stream classes and does the same job as setiosflags except that it cannot be used within an output statement as setiosflags can. It would be called by e.g. cout.setf(ios::right);

The items starting with ios:: within the parentheses after setiosflags are constants that are defined in the iostream library. Their names are self-explanatory and you do not need to know their values. The meaning of ios:: will only be explained in a subsequent module unless you read up on it yourself. A program basiccpp.cpp is provided in the lab that shows the effect of setw(n) and some of the flags that can be set using setiosflags(), including display of integer in octal and hexadecimal.
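The manipulators can be tried out with the following sketch. It is not the lab program basiccpp.cpp. As assumptions for this illustration, the output is written to a string stream (ostringstream) rather than to cout so that the exact characters can be inspected, the function names are hypothetical, and the std:: prefix expected by standard-conforming compilers is used.

```cpp
#include <iomanip>   // setw, setfill, setprecision, setiosflags
#include <sstream>   // std::ostringstream
#include <string>

// Pads 22 to a field width of 4 using '*' as the fill character.
std::string padded_field()
{
    std::ostringstream out;
    out << "[" << std::setw(4) << std::setfill('*') << 22 << "]";
    return out.str();
}

// Left-justifies 22 in a field width of 6 (the fill is a space by default).
std::string left_justified()
{
    std::ostringstream out;
    out << "[" << std::setiosflags(std::ios::left) << std::setw(6) << 22 << "]";
    return out.str();
}

// showpoint keeps the decimal point and trailing zeros of a whole number.
std::string whole_number_with_point()
{
    std::ostringstream out;
    out << std::setiosflags(std::ios::showpoint) << 3.0;
    return out.str();
}

// With the fixed flag set, setprecision counts decimal places.
std::string two_decimal_places()
{
    std::ostringstream out;
    out << std::setiosflags(std::ios::fixed) << std::setprecision(2) << 3.14159;
    return out.str();
}
```

The four functions should produce [**22], [22    ], 3.00000 and 3.14. Replacing out with cout in any of them sends the same characters to the display.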

14. Relational operators and expressions
14.1 Relational operators
    <     less than
    >     greater than
    <=    less than or equal
    >=    greater than or equal
    ==    equal (Note: 2 equal signs with no space between)
    !=    not equal

14.2 Relational expressions


These expressions compare two values and return true if the test succeeds and false otherwise.
    ch > 'A'
    y * y <= 2 * y + 1
    f < 0.0
    y == x
    ch != '\0'
Take care not to use = as the equality operator. This is a common programming error. Beware of testing two floating point variables for equality. Their binary internal representation means that many fractional values cannot be expressed exactly. Instead, test whether the absolute value of their difference is smaller than some small tolerance. The function fabs(<float>) can be used to find the absolute value of a float or double. To use it you need to #include<math.h>
    double f1 = 12.34574, f2 = 12.34578;
    const double delta = 0.00005;
    if ( fabs( f1 - f2 ) < delta )
        // consider them equal
    else
        // consider them unequal
See also Skansholm p52.
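The tolerance test can be wrapped in a small function. This is a sketch only: the name nearly_equal is hypothetical, and a suitable value for delta depends on the size of the numbers being compared.

```cpp
#include <cmath>   // std::fabs

// Treats a and b as equal when they differ by less than delta.
bool nearly_equal(double a, double b, double delta)
{
    return std::fabs(a - b) < delta;
}
```

With the values used above, nearly_equal(12.34574, 12.34578, 0.00005) should be true because the two numbers differ by about 0.00004, whereas nearly_equal(1.0, 1.1, 0.00005) should be false.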

15. FALSE and TRUE


Recent C++ compilers support a bool data type that can take one of two possible values, true or false. Earlier compilers do not support this type and, instead, false is represented by an integer with the value 0 and true by any non-zero value. GNU C++ provides the bool data type and we shall be using it on this course. If you are using a compiler at home that does not support bool, there is a simple addition that you can make to your programs that gets around this deficiency. Enter the following into a file called bool.h and #include this in all programs if you are not using GNU C++:
    typedef int bool;
    const bool false = 0;
    const bool true = !false;
However, I strongly recommend that you do use the GNU compiler. Several students have had problems when using the Borland 4.5 compiler.
16. Logical operators and expressions
16.1 Logical operators.
The draft C++ ANSI standard introduced the alternative operator names and, or and not. These are not supported by the GNU C++ compiler, nor by Borland 4.5. Instead use &&, || and !
    &&    (and)    logical AND
    ||    (or)     logical OR
    !     (not)    unary negation

16.2 Logical expressions


    !5                          false
    !0                          true
    ch = 'a';                   assign to ch the letter 'a', i.e. NOT the test for equality
    ch == '\0'                  false
    !(ch == '\0')               true
    (ch)                        true (the character 'a' is converted to an integer and is tested for non-zero)
    (!ch)                       false
    (ch && ch != '\n')          true
    (ch == 'a' || ch == 'A')    true

17. Short-circuit evaluation


Note that, in the logical expression expression1 && expression2 both expressions must be true for the whole expression to be true. If expression1 yields false, then the whole expression cannot possibly be true. Therefore expression2 does not need to be and will not be evaluated. Similarly with expression1 || expression2, if the expression1 yields true, then the whole expression must be true, whatever the value of expression2, so expression2 is not evaluated. This feature is important in cases where, if the first test fails, the second test must not be evaluated because it would cause an error. We will meet this again when we come to look at pointers.
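A common use of short-circuit evaluation is to guard a test that would otherwise fail. In this sketch the function name divides_evenly is hypothetical. The second test divides by divisor, which must never be attempted when divisor is zero.

```cpp
// Returns true if divisor is non-zero and divides value exactly.
// If divisor is 0 the first test yields false, so the second test,
// which would divide by zero, is never evaluated.
bool divides_evenly(int value, int divisor)
{
    return divisor != 0 && value % divisor == 0;
}
```

A call such as divides_evenly(10, 0) should simply return false. Reversing the order of the two tests would remove the protection.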

18. The while statement
This is one of several iteration constructs provided by C++, and is the simplest.
    while ( logical expression == true )
        <statement>
The parentheses () are required. If there is more than one statement to be executed within the loop then braces { } are required:
    while ( logical expression == true )
    {
        statement1;
        statement2;
        etc..
    }
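[Figure: flowchart of the while statement. The condition is set up and then tested. While it is true, the statement(s) are executed and the condition is set up again. When it is false, control passes to the next statement.]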
Example
    // show.cpp
    // copies its input to its output
    #include<iostream>
    using namespace std;    // lets standard-conforming compilers find cin and cout

    int main(void)
    {
        char ch;
        cin.get(ch);            // get a character from the keyboard
        while ( cin.good() )    // Becomes false if end of file or other input problem
        {
            cout.put(ch);       // output the character to the display
            cin.get(ch);        // get the next char in preparation for the next loop iteration
        }
        return(0);
    }
The while statement is preceded by a statement cin.get(ch) that sets up the value to be tested by while. This is important because the termination condition may already exist, in which case the loop should not be entered. If the loop is entered, then cin.get(ch) is repeated at the bottom of the loop to set up the condition again. This is invariably the way that files are processed since they may be empty. It is a common error to forget to initialise the test condition before entering the while loop. This program can be used to display the contents of a text file if issued at the DOS command line using redirection:
    show < show.cpp
displays the source program file show.cpp at the terminal.

The output can also be redirected, giving a file copy:
    show < show.cpp > showcpy.txt
showcpy.txt is now an exact copy of show.cpp. Here is a refinement of the above program:-

    // show2.cpp
    // copies its input to its output
    #include<iostream>
    using namespace std;    // lets standard-conforming compilers find cin and cout

    int main(void)
    {
        char ch;
        while ( cin.get(ch) )    // Becomes false if end of file or other input problem
            cout.put(ch);        // output the character to the display
        return(0);
    }
The get( ch ) function is called within the loop condition parentheses. The expression cin.get(ch) does two things: a) it gets a character from standard input and passes it back via its argument ch and b) it returns a reference to the standard input stream cin as its function result. When used as a condition, the stream tests as false when there is no further input (or after any other input failure) and this is the condition being tested by while. This does away with the need for the get prior to entry of the loop, and also with the get at the bottom of the loop.

19. The if statement


This statement is classified as a branching construct. It allows the flow of control of the program to be changed depending on the value tested, e.g. an input from the user or a value held in a file.
    if (condition)
        statement;
Or
    if (condition)
        statement1;
    else
        statement2;
As with while, if there is more than one statement in either the if part or the else part then they must be surrounded by braces {} as in the body of function main. Notice that, unlike Pascal, there must be a semi-colon after each statement. Condition is any logical expression yielding a boolean value (true or false). It may consist of expressions combined into a larger, more complex expression by the logical operators && (logical AND) and || (logical OR). Statements within each part (if or else) may be any statement, including another if statement.
    if (condition1)
        if (condition2)
            statement2a;
        else
            statement2b;
    else
        statement1;
The else clause is assumed to relate to the immediately preceding if unless braces are used to change this association.
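[Figure: flowchart of the if statement. The condition is tested. If it is true (nonzero) one set of statement(s) is executed, and if it is false (0) the other set of statement(s) is executed. These may also be if statements. Control then passes to the next statement.]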
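The rule that an else belongs to the immediately preceding if can be seen by writing the same tests with and without braces. This is a sketch with hypothetical function names. The functions return a string naming the branch that was taken, and some compilers warn about the first version precisely because it is easy to misread.

```cpp
#include <string>

// No braces: the else is paired with the nearest if, i.e. if (c2).
std::string without_braces(bool c1, bool c2)
{
    if (c1)
        if (c2)
            return "inner if";
        else
            return "inner else";
    return "neither";
}

// Braces end the inner if early, so the else is paired with if (c1).
std::string with_braces(bool c1, bool c2)
{
    if (c1)
    {
        if (c2)
            return "inner if";
    }
    else
        return "outer else";
    return "neither";
}
```

With c1 true and c2 false the first function should return "inner else" and the second "neither". With c1 false the first should return "neither" and the second "outer else".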

20. Style for logical expressions
In natural language we can say "If late for lecture then hurry else have another coffee". We do not say "If late for lecture is true then ...". Similarly, in programming, the test of a logical value, e.g. in an if statement, would be written as
    if( late_for_lecture )
        hurry();
    else
        have_another_coffee();
and not
    if( late_for_lecture == true )
        hurry();
It is generally considered to be poor programming style to use this second approach and you will lose marks if you use it.

21. The ctype library


This is a 'C' library of functions that operate on characters. They include functions to test whether a char is a letter, a digit, punctuation etc. and also to carry out case conversion. See Libraries on page 115.
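As a brief sketch of what the library offers, the two functions below use isdigit and toupper from the ctype library. The function names are hypothetical. The conversion to unsigned char is a precaution, since these library functions expect a non-negative character code.

```cpp
#include <cctype>   // std::isdigit, std::toupper

// Counts the digit characters in a NUL-terminated string.
int count_digits(const char text[])
{
    int digits = 0;
    int i = 0;
    while (text[i] != '\0')
    {
        if (std::isdigit(static_cast<unsigned char>(text[i])))
            ++digits;
        ++i;
    }
    return digits;
}

// Returns the upper case form of a letter; other characters are unchanged.
char to_upper_char(char ch)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
}
```

For example, count_digits("a1b22") should return 3, and to_upper_char should turn 'a' into 'A' while leaving '7' alone.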


Functions
1. Introduction
You have already seen and used a function - the function main which every C++ program must have. Until now it has been reasonable to write all of the code of your programs in this function. However, as programs become larger, it is necessary to break them down into collections of smaller and more manageable units. One such subdivision is the function. Functions give us the ability to store a computation in a named block of code and to carry out the computation simply by referring to its name i.e. by calling the function. This facility for breaking programs down into simpler and more manageable units is a major weapon in the fight to reduce the complexity of large programs and involves the process of abstraction. Abstraction allows us to concentrate on the current task and to ignore details that are not relevant. So when we call a function e.g. sqrt to find the square root of a number, we are concerned only with how to make the call and not what steps the function takes to achieve the computation. We do need to know the data type of the number to be passed to sqrt, the data type of the value returned by it and what happens if we pass a negative value etc. - these aspects are relevant to our making the call, but the actual details of the computation are not relevant. Of course, at different times we will have different levels and views of abstraction - if we had been concerned with writing function sqrt then we would have been concentrating our attention on expressing the algorithm to compute the square root of a number and would have ignored unnecessary detail elsewhere (e.g. the other functions which make up the library maths). A further advantage of storing code in functions is, of course, the ability to re-use them again in other programs. This type of abstraction is called procedural abstraction after the procedures - the name that most other languages use to refer to these named blocks of code. 
Technically a function differs from a procedure in that it returns a value, whereas a procedure does not. C++ does not have procedures, but it is possible to specify that a function does not return a value. Functions in C, C++ and most other languages (except the functional languages) do not conform closely to the mathematical concept of a function that accepts a single argument and returns a single value. As we shall see, it is possible to pass more than one value to a function and to get back more than one result. The structure of a function is:
    type_specifier function_name(argument_list)
    {
        definition_and_statement_list
    }

type_specifier
    The data type of the value that is returned by the function.
function_name
    A programmer-defined identifier that conforms to the rules for identifiers. This is the name that is used to call the function.
formal argument_list
    The names and types of the values that are passed to the function on which it is to carry out some computation.
definition_and_statement_list
    Exactly what you have been writing in function main up until now, i.e. constant and variable definitions and statements including (normally) a return statement that provides the value returned back to the point of the call, e.g. the return(0) appearing at the bottom of main.

Example: You are writing a program which needs to compute values raised to a power. There is no exponentiation operator in C++, so you must develop one yourself. You want to be able to write e.g.
    result = power(12,3);
where result is an integral variable which is to be given the value 12 raised to the power 3 (i.e. 1728). On other occasions in the same program different numbers are to be raised to different powers, e.g. the statement
    cout << power(7,5) << endl;
outputs 7 raised to the power 5, i.e. 16,807. So the function must be generalised to handle a range of different inputs for its single result. This generalisation is provided by the argument list. In the call to the function, the values passed to the function are known as the actual arguments, i.e. 12 and 3 in the first example above, and 7 and 5 in the second example. In the definition of the function they are known as the formal arguments. It is important that you understand this distinction because these two terms are used frequently when talking about functions. Assuming that we want to be able to handle some large resulting values, the integral return type should probably be of type long int. The type of the arguments can be left as plain integer. The formal specification of function power is then:
    long int power(int a, int b)    // long power (without the int) is also OK
    {
        definition_and_statement_list
        return (<long_integer_expression>);
    }
Where <long_integer_expression> is an expression of the result of raising a to the power b. When the call power(12,3) is made the actual argument values 12 and 3 are copied into their respective formal argument variables a and b. If the actual arguments had been integer variables (as opposed to constants) with the same values (12 and 3), then the values of the actual argument variables would have been copied into the formal argument variables producing exactly the same effect. In the function, the formal arguments a and b are effectively local variables of the function.
Any variable definitions made in the body of the function are also local variables. This means that they are not accessible from outside the function. In fact, normally, they only exist while the function is executing and are then removed from memory. Inside the body of power there will be an appropriate computation that produces a value representing a raised to the power b, and this value will be passed back by the return statement. A function normally has a value (unless its return type is void) and can therefore be used on the right hand side of an assignment or within a cout statement in just the same way as a variable or an arithmetic expression. In fact a call to a function which returns a value is an expression. In the case of the statement result = power(12,3); the returned value will therefore be assigned to result. The value returned by power can be used anywhere else that an expression of long int type is required, e.g. in
    cout << power(7,5);                   // 16,807
or even as the actual argument of a call to another function.
    cout << power( power( 2, 3 ), 4 );    // 4,096
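The notes leave the body of power unspecified. One possible body, given here only as a sketch, multiplies a by itself b times using a while loop. It assumes that b is not negative and that the result fits in a long int, and the local variable name result is a choice made for this illustration.

```cpp
// Returns a raised to the power b, assuming b >= 0.
long int power(int a, int b)
{
    long int result = 1;      // local variable, exists only during the call
    while (b > 0)             // b is a local copy, so it is safe to count it down
    {
        result *= a;
        --b;
    }
    return result;
}
```

With this body, power(12,3) should give 1728, power(7,5) should give 16807 and power( power( 2, 3 ), 4 ) should give 4096. In the nested call the long int value returned by the inner call is converted automatically to the int expected by the formal argument a.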

2. Input and output in functions
In general, it is considered good practice to isolate input and output statements in one particular area of a program. This is because I/O tends to be hardware-specific and it is easier to make changes for a different machine platform or display device if all the I/O code is in one place. When writing small programs in a learning situation, it is not always easy to follow this guide for best practice. But, wherever possible, try to confine I/O to one or more suitable functions rather than spreading it across the program in a number of functions whose primary purpose is not I/O. In particular, it is not good practice to carry out I/O in low level functions. The reason for this is that a function that may be re-used many times in many different programs cannot know how the calling program wishes its output to be displayed, whereas the calling program does know this. Different operating environments have different ways of displaying output to the user of the program, so a low level routine that displays output for a character console could not be used in a program that runs in a windowing environment.

3. Multi-function programs
There must always be a function called main in any C++ program. There may be any number of other functions in the same source program file (or indeed in other source program files). The question then arises - where do you put these other functions? C++ does not allow functions to be nested within other functions (unlike Pascal and Modula-2). So additional functions may appear textually either before function main, or after it. When the compiler scans the source text of a program, it will flag an error if it finds a call to a function whose definition it has not yet encountered. So if a function is defined after main, then a function declaration must appear before the point at which the call is made. This declaration (also known as a function prototype) should normally be placed at the start of main, giving the compiler sufficient information to enable it to check that the function has been called properly. This prototype will consist only of the return type, the function name, and the types of its arguments.
    // fun01.cpp
    // illustrates the placing of functions in relation to main
    // tdc 28/09/95
    #include<iostream>
    using namespace std;    // lets standard-conforming compilers find cout and endl

    int add(int a, int b )    // this placing is deprecated
    {
        return(a + b);
    }

    int main(void)
    {
        // int mult(int a, int b);    // prototype commented out
        int x = 10, y = 3;
        cout << add( x, y ) << endl << mult(x,y) << endl;
        // Error: Function 'mult' should have a prototype in function main()
        return(0);
    }

    int mult(int a, int b)
    {
        return (a * b);
    }
Function add has been placed before main contrary to the recommendation for best practice above.
The prototype for function mult has been commented out, causing the compiler error. Removing the comment markers allows the program to compile successfully. Different organisations may set their own 'house' styles, but we will show the full definition of functions after main with prototypes normally appearing as the first definitions within the body of main. Note that the identifiers a and b in the prototype for mult are not essential. The prototype could have been
    int mult(int, int);    // prototype with argument identifiers omitted
But the argument identifiers may be included if they aid the understanding of their purpose. The compiler will also flag an error if the prototype does not match the formal definition as regards either its name, or its number and type of arguments. A difference between the return type as declared in the prototype and as defined in the formal definition is a different matter. Modern compilers usually report it when both are visible in the same source file, but if the prototype and the definition are in separate source files the difference is not detected and a run-time error is likely to result.

4. Stepwise Refinement (or Top-down design)


When designing the solution to a problem it is normal to set out in logical order the steps that need to be taken. Stepwise refinement is a technique for tackling the problem of program complexity by breaking a task down into steps, each of which is implemented by a function. Each of these functions is then further refined by breaking it down into a series of steps implemented as functions, and so on. Initially, the design process can be approached by using a PDL (Program Description Language). We do not introduce a formal description of such a language; it is better left flexible so that you can use a structured type of natural language. Read Skansholm pp 20-26 for an example of Top-down design.

[Figure: schematic of stepwise refinement. Function main() consists of Step 1 to Step 4. Each step is refined into a function made up of sub-steps (Step 1.1, Step 1.2, Step 2.1, Step 3.1 to Step 3.3, then Step 1.2.1 and so on) until every box consists of code.]

When you find that it is impossible to specify any further steps in the process of functional decomposition without using commands of the programming language, then you have taken the functional decomposition process as far as it can go. You are then ready to translate your natural language description into C++ source code. Initially, your programs will be short and simple and you will wonder what all the fuss was about. But when you have to tackle a large problem you will, I hope, see the point. Initially, you will be unfamiliar with the syntax of C++, so it will be extremely difficult for you to express a solution to a difficult problem directly in the programming language. In these circumstances, it is essential that you develop the habit of expressing a solution in natural language before attempting to write the code. Note that, in the schematic diagram above, those boxes (functions) which consist entirely of Step x.x should not be assumed to consist entirely of function calls without any other code. They may well contain constructs such as branching (if, else) and loops (while etc.) within which the subsidiary functions are called.

5. Automatic variables
Variables declared within a function are called local variables and have the default storage class automatic (auto is the key word). Since this is the default, the storage class does not have to be given and it is normal to omit it. There are other storage classes that will be dealt with later. Scope is an important topic since the scope rules determine the visibility of objects. If an object is not visible, it cannot be changed. Your Unix password is invisible to others because, if others had access to your account, you do not know what they could do. They might let you have useful comments about your work. On the other hand they might change it, or delete it. The scope mechanism is employed to reduce the chances of errors in a program caused by some other programmer (or even yourself!) inadvertently corrupting the program by changing an object to which he/she should not have access. This is part of the concept of encapsulation which we shall cover in more detail in the second Semester. For now, work on the principle that functions should not, as a rule, use or modify global variables. As an example, if function x requires a variable to control a loop, declare that variable locally within the function. In that way, only errors within the function itself can cause the loop to run incorrectly. If a global variable were used for this purpose, there is a possibility of it being changed from outside the function while the loop is executing, causing errors that can be very difficult to identify and correct. Similarly, although there can be exceptions, functions should not modify global variables directly. Instead this should be done via arguments. More about how in a later lecture. An obvious corollary to the lack of visibility of a local variable from outside the function is that variable names may be duplicated within different functions without any clash.
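The point about duplicated names can be sketched with two functions that each declare their own loop control variable i and their own total. The function names are hypothetical. Because each variable is local, the inner function cannot disturb the loop of the function that calls it.

```cpp
// Adds the whole numbers from 1 to n using a local loop variable i.
int sum_to(int n)
{
    int total = 0;
    int i = 1;
    while (i <= n)
    {
        total += i;
        ++i;
    }
    return total;
}

// Has its own total and its own i. Calling sum_to inside the loop
// does not change them, because sum_to works on separate variables.
int sum_of_sums(int n)
{
    int total = 0;
    int i = 1;
    while (i <= n)
    {
        total += sum_to(i);
        ++i;
    }
    return total;
}
```

sum_to(4) should return 10, and sum_of_sums(3) should return 1 + 3 + 6 = 10. Had both functions shared a single global i, the loop in sum_of_sums would not behave as intended.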

6. Function values
A fairly obvious point - the value appearing after return should be of the same type as that in the definition. Thus int add(int a, int b) should, in its return statement, return an integer value. You have been doing this for some time in function main. As mentioned earlier, it is possible for a function to accept no arguments, or to return no value. In either of these cases, the reserved word void should be used, e.g. void dosomething(void) is a function which neither accepts arguments nor returns a value. In this case its return statement, if it has one, must not supply a value, and a call to it must be used differently to reflect the fact that no value is returned.
    dosomething();             // i.e. a statement, not an expression
    result = dosomething();    // wrong

7. Function arguments
These are a means of passing information to a called function. It is also possible for a function to pass information back via its arguments and this will be dealt with later. Arguments are a comma-separated list of type/identifier pairs appearing within the parentheses after the function name, e.g. (int a, int b) as in function add above. Naturally, the number and type of the actual arguments supplied in the call must match the number and type of the formal arguments, with the exception of default arguments (see Default arguments on page 32). The function may modify the values of its arguments, and this will have no effect on the values of any actual argument variables used in the call. Remember that the values of the actual arguments are copied into the formal argument identifiers. This is the pass-by-value argument mechanism. The actual arguments may be any expression of the correct type. This includes a literal constant, e.g. 9.0, a variable, e.g. f, or even a call to another function which returns a value of the correct type, e.g.
    cout << sqrt( sqrt(81.0) );    // outputs 3
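Pass-by-value can be demonstrated with a function that changes its formal argument. This is a sketch with hypothetical function names. The caller's variable should be unaffected because the function works on a copy.

```cpp
#include <cmath>   // std::sqrt

// Modifies its formal argument n, which is a copy of the actual argument.
int add_one(int n)
{
    n = n + 1;
    return n;
}

// Calls add_one and reports the caller's variable afterwards.
int caller_value_after_call()
{
    int x = 5;
    add_one(x);     // the returned value is deliberately ignored here
    return x;       // x itself was never changed
}

// A function call used as the actual argument of another call.
double nested_square_roots()
{
    return std::sqrt(std::sqrt(81.0));
}
```

add_one(5) should return 6, yet caller_value_after_call() should still return 5. nested_square_roots() should return 3.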

8. Function argument agreement & conversion


Automatic type conversion is carried out when actual arguments are copied into formal argument variables in just the same way as that carried out during assignment. Generally speaking you should not rely on this. Instead always pass the correct type as actual arguments.

9. Overloaded function names


First you should recognise that an operator (e.g. +) is just a function specified in a different way, i.e. normally in infix form. Thus the arithmetic expression a + b in infix form is just a different (and more convenient) way of expressing the function add( a, b ) which is in prefix form. Assuming that add has been declared as:-

    int add ( int a, int b );

then both a + b and add( a, b ) are expressions which have the value of the sum of a and b. In most programming languages, some operators are overloaded, e.g. the '-' operator can mean

    unary negation                    e.g. -1
    binary subtraction of integers    e.g. 3 - 2
    binary subtraction of floats      e.g. 4.5 - 3.2
    binary subtraction of long int    e.g. 123456789L - 123456788L

We are allowed to use the same operator for semantically similar operations because it is convenient to do so even though the actual computation required is quite different: the compiler determines which computation to perform based on the type of the operands. But many languages will not allow the corresponding functions to have the same name, e.g. subtract( int, int ), a function accepting two arguments of type int, would not be permitted to exist in the same scope as subtract( float, float ), a function accepting two float arguments. This is illogical. Fortunately for us, C++ does permit overloading of function names provided that they can be distinguished by their signatures, i.e. the number and type of their arguments. You have already seen this with the standard output stream cout that has a function << that accepts an argument of any one of the fundamental types. The language allows the function << to be declared in such a way that it can be used as an operator.

Note that functions with the same name must be distinguishable by their number and type of arguments. The function return type is not taken into account in determining whether they are different.

    void print( int, int );
    void print( float, float );    // OK. different argument types
    int print( int, int );         // error: erroneous redeclaration, the return type is
                                   // not considered
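As a small sketch of overloading by signature, the three functions below share one name and differ only in the number or type of their arguments. The name describe is hypothetical; it simply reports which version the compiler selected.

```cpp
#include <string>

// Three hypothetical overloads of describe, distinguished by their signatures.
std::string describe(int)      { return "int"; }
std::string describe(double)   { return "double"; }
std::string describe(int, int) { return "two ints"; }
```

The compiler chooses the version from the actual arguments in the call: describe(3) selects the first, describe(4.5) the second and describe(1, 2) the third.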

10. Reference Arguments
This will be dealt with in a subsequent lecture.

11. Function comments


Each function should be provided with one or two lines of comment after the header describing what it does and any special assumptions that it makes about any arguments passed to it. The formal way to do this is to provide pre and post conditions which specify:-

pre
    assumptions the function makes about the value of arguments passed to it and any other relevant conditions. There is no need to include assumptions about the types of the arguments since the compiler will check these.

post
    the state after it has accomplished its task. This may include any limitations on the return value, how unusual situations are flagged etc.

These pre and post conditions then form a contract between the caller and the function. The caller guarantees to meet the pre-conditions and the function guarantees to satisfy the post-conditions. If the caller fails to meet his side of the contract (i.e. he does not meet the pre-conditions), then all bets are off, and the function is relieved from meeting the post-conditions.

Some language designers consider that this concept is so important that it should not be dealt with merely by comments. They have therefore incorporated pre and post conditions into the language so that they can be checked at run-time, raising an exception if the contract is broken. Eiffel is an example.

Large programs have to be broken down into smaller and more manageable components in order to deal with their complexity and to allow teams of programmers to work on them. The separate components can be tested individually with a range of inputs to ensure that they behave as specified. But what happens when they are put back together again? Will all these components work together? Or will there be discrepancies arising from a misunderstanding on how the parts interrelate? The ability to check the interaction of these components at run time can provide significant advantages in terms of quality and reduction of debugging time.
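A minimal sketch of a function commented with pre and post conditions follows. The function mean is hypothetical, and the use of assert from the standard header cassert to check the pre-condition at run time is an assumption of this sketch, not a requirement of the notes.

```cpp
#include <cassert>

// Hypothetical function documented with pre and post conditions.
// pre:  count > 0
// post: returns sum divided by count as a float
float mean(int sum, int count)
{
    assert(count > 0);                   // run-time check of the pre-condition
    return float(sum) / float(count);
}
```

If a caller breaks the contract by passing a count of zero, the assert stops the program at the point of the fault instead of letting a meaningless result propagate.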

12. Summary
We have looked at functions which may have formal arguments or should have the word void in the formal argument list to indicate that no arguments are required. Functions normally return a value via the return statement, and the type returned must agree with the return type provided in the definition. Functions are called by name, passing actual arguments whose values are copied into the formal arguments. Since a call to a function that returns a value is an expression (i.e. it has a value), a function call may be used in any case where an expression is expected. It is recommended that function definitions appear after the function main. This requires that function prototypes appear as the first lines of function main. Functions whose prototypes are supplied in main are private to main, i.e. the prototypes serve the requirements of main and no other functions. If there are other functions, defined after main, and before the functions they wish to call, then they will not be able to do so. There are two solutions:-

- Ensure that the definitions of the functions to be called appear before the definitions of the functions that wish to call them.
- Provide prototype declarations for the called functions before main so that they have file scope and can therefore be called from anywhere in that file.

Local variables of a function usually have the storage class auto and are not visible to code outside the function. They cease to exist after the function terminates. The formal arguments are also invisible from outside. Changes to formal arguments that are passed by value and changes to local auto variables have no effect outside the function, and their identifiers may duplicate identifiers appearing elsewhere in the program. Functions are one of the weapons that C++ provides in the war against complexity and the errors that this complexity may bring with it. They are an example of procedural abstraction and allow a program to be designed as a hierarchy of functions that progressively refine the problem by breaking it down into smaller problems. Large programs must be designed on paper using this process of stepwise refinement before the program is written. A suitable tool for this design process is a PDL (program description language), one variant of this being known as Structured English. Libraries of frequently used routines (functions) can be written and a very large number of libraries are provided with all compilers, each library containing a number of functions. Pre and post conditions provided as comments at the head of the function are an important way of specifying what they do and how they are to be used. This helps to ensure that, when a large number of tested functions is finally brought together to form a program, the various parts work together as specified. Ideally, input and output should be isolated in a limited number of functions designed for that purpose and not scattered about over many functions whose primary purpose is not I/O. Generally speaking, functions should not modify global variables and should never use global variables for such local uses as loop control.

Flow of Control


1. The type cast operator
The typecast operator provides the possibility of forcing an expression into another data type by using the name of the new type as though it were a function. For example, in a program to calculate the statistics on a sequence of integers, the mean can be calculated from the integer total of the numbers divided by the count of the number of items (where mean is a float) by:-

    mean = float(sum) / float(count);

The new C++ standard has introduced four new operators that carry out explicit conversions from one type to another. Of these four, only static_cast is introduced here. It is intended to be used for conversion between similar types, e.g. between char and int, between int and enum, and between float and int. Example:-

    mean = static_cast<float>(sum) / static_cast<float>(count);

Explicit type conversions are error-prone and a large proportion of program errors is due to them (Stroustrup). The virtue of the new operators is that they are easy to search for and find in large program source files, whereas the earlier example float(sum) could be very difficult to find.
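The effect of the conversion can be seen by comparing the two functions in this sketch. The names truncated_mean and exact_mean are hypothetical.

```cpp
// Integer division discards the fractional part.
int truncated_mean(int sum, int count)
{
    return sum / count;
}

// Converting the operands first gives a floating-point division.
float exact_mean(int sum, int count)
{
    return static_cast<float>(sum) / static_cast<float>(count);
}
```

With sum 7 and count 2, the first function yields 3 and the second yields 3.5.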

2. The comma operator


A sequence of expressions can be separated by commas. They are evaluated from left to right and the last expression provides the value of the sequence, e.g.

    s = ( t = 2, t + 3 );

t is assigned the value 2, then the expression t + 3 is evaluated to 5, and this last expression provides the value assigned to s. This device can be used to include, for instance, a number of expressions in a while loop condition. The value of the last expression is the value of the condition.

    while( cin.get(ch), !cin.eof() )

An attempt is made to read a character from standard input and a test is made to see if a character could not be read because the end of the file has been reached. The value of the loop continuation condition is that of the test !cin.eof() and, if this has the value false, then the loop terminates. If the input stream is empty then the loop is never entered.
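Both uses of the comma operator can be tried in a short sketch. To keep it self-contained it reads from a string stream (std::istringstream from the standard header sstream) instead of cin; that substitution, and the names count_chars and comma_value, are assumptions made for this illustration.

```cpp
#include <sstream>
#include <string>

// Counts the characters in text, using the comma operator in the loop
// condition: read a character, then test for end of input.
int count_chars(std::string text)
{
    std::istringstream in(text);
    int count = 0;
    char ch;
    while ( in.get(ch), !in.eof() )
        count++;
    return count;
}

// The value of a comma expression is the value of its last operand.
int comma_value()
{
    int t;
    int s = ( t = 2, t + 3 );
    return s;
}
```

For an empty string the first read fails at once, so the loop body is never entered and the count is 0.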

3. The conditional operator


This consists of 3 expressions separated by a '?' and a ':'

    expression1 ? expression2 : expression3

The first must be a logical expression, i.e. yielding either true or false. If expression1 is true, then the value of the whole expression is the value of expression2, otherwise the value is that of expression3. This could have been used in defining function max:-

Example 1

    int max( int a, int b )
    {
        return( a > b ? a : b );
    }

Example 2

    cout.setf( justify == 'L' ? ios::left : ios::right );

Equivalent to:-

    if ( justify == 'L' )
        cout.setf( ios::left );
    else
        cout.setf( ios::right );
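A further small sketch of the conditional operator, using a hypothetical function parity that selects one of two strings:

```cpp
#include <string>

// Returns "even" or "odd" depending on the remainder of n divided by 2.
std::string parity(int n)
{
    return n % 2 == 0 ? "even" : "odd";
}
```

Only one of the two alternatives is evaluated, exactly as in the equivalent if else form.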

4. The for statement

In most programming languages, the for iteration construct is suitable mainly for loops whose number of iterations can be determined in advance. In C++, the for loop is much more general and can, in fact, be employed for any loop including while and do. The syntax is

    for ( expression1; expression2; expression3 )
        statement_block;

where:-

expression1
    may consist of one or more expressions (separated by commas) that initialise the loop, e.g. count = 1, max = 10. A new variable declaration may be made here whose scope will extend to the end of the for loop block, e.g. int count = 1, max = 10

expression2
    is a logical expression which determines the continuation of the loop (in the same way as in a while loop), e.g. count <= max. This expression may consist of several logical expressions connected by the boolean operators && (and) and || (or).

expression3
    is an expression or expressions (separated by commas) which will be executed at the end of each loop iteration. Normally this is used to modify the loop control variable, e.g. count++

statement_block
    is either one statement, or more than one statement surrounded by braces. The single statement may be empty.

If any of the 3 expressions is missing, the semi-colon separator must remain to show its absence.

Examples

a)  for( ; ; )
        cout << "hello" << endl;

    runs for ever, printing "hello" on a new line each time

b)  for( bool forever = true; forever ; )
        cout << "hello" << endl;

    behaves as a) because forever is forever true

c)  for ( cin.get(ch); !cin.eof() ; cin.get(ch) )
        cout.put(ch);

    this is the same as:-

    cin.get( ch );
    while( !cin.eof() )
    {
        cout.put(ch);
        cin.get(ch);
    }

    or

    while( cin.get(ch), !cin.eof() )
        cout.put(ch);
Note that, in example b) above, bool forever in the first expression is the declaration of a new boolean variable. It is convenient and makes programs easier to read if the declaration of variables is as close as possible to the point where they are used. This facility is one of the improvements over the C language provided by C++. Because of its versatility, there is a tendency for programmers to use the for loop exclusively and to ignore the while loop. However, the latter is designed to deal explicitly with cases where the loop should not be entered at all under certain conditions (e.g. when processing a file which may be empty). Although this condition can be handled by for as shown above, its primary purpose is for loops whose number of iterations can be determined before it is entered e.g. when processing arrays (to be covered soon). The very fact that a while loop is being used signals that it may never be entered whereas, in a for loop this fact can only be determined by inspection of its expressions.
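The correspondence between the two loop forms can be sketched with a counting loop written both ways. The function names are hypothetical.

```cpp
// Sum of the integers 1..max, written with a for loop.
int sum_with_for(int max)
{
    int total = 0;
    for ( int count = 1; count <= max; count++ )
        total += count;
    return total;
}

// The same loop written with while: initialise, test, update.
int sum_with_while(int max)
{
    int total = 0;
    int count = 1;
    while ( count <= max )
    {
        total += count;
        count++;
    }
    return total;
}
```

Both versions test the condition before the first iteration, so with max equal to 0 neither loop body is executed.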

5. The do statement
In a limited number of cases, processing requires that the loop condition is tested at the bottom rather than at the top of the loop. In other words, the statement(s) in the loop body will always be executed at least once. The format of the do loop is:-

    do
        statement_block;
    while ( expression );

where expression is a logical expression yielding either true or false. As with all loop statements, if statement_block comprises more than one statement, it must be enclosed in braces:-

    do
    {
        statement_1;
        statement_2;
        ...
    } while ( expression );    // Note: the test normally appears on same line as the
                               // closing brace
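One case where the test belongs at the bottom is counting the digits of a number, sketched here with a hypothetical function count_digits that assumes a non-negative argument.

```cpp
// Counts the decimal digits of a non-negative integer. The body must run at
// least once so that 0 is reported as having one digit.
int count_digits(int n)
{
    int digits = 0;
    do
    {
        digits++;
        n /= 10;
    } while ( n != 0 );
    return digits;
}
```

A while loop with the same condition would report zero digits for the value 0.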

6. Nested loops
Frequently, a loop is nested within another loop or loops. The reasons why this might be necessary will become clearer when arrays are covered. Notice that the total number of iterations of the inner loop is the product of the number of its iterations and those of any surrounding loops. This number can escalate to very large values and can result in programs that run slowly.

    for ( int i = 0; i < 10; i++ )
        for ( int j = 0; j < 10; j++ )
            for ( int k = 0; k < 10; k++ )
                process( i, j, k );

Function process is called 1,000 times.

Sometimes, it may not be obvious how many potential iterations of the inner statement will occur because, for instance, the second and third lines above may consist of function calls that, themselves, contain a loop. You should always be aware of the possibility of introducing inefficiencies into a program in this way because it may result in unacceptable performance.
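The growth in the number of iterations can be checked with a sketch that counts them. The function name is hypothetical, and a counter stands in for the call to process.

```cpp
// Counts how many times the innermost statement of three nested loops runs.
int count_inner_iterations(int n)
{
    int calls = 0;
    for ( int i = 0; i < n; i++ )
        for ( int j = 0; j < n; j++ )
            for ( int k = 0; k < n; k++ )
                calls++;     // stands in for process( i, j, k )
    return calls;
}
```

Doubling n from 10 to 20 multiplies the count by eight, from 1,000 to 8,000.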

7. The break statement


This statement is used to alter the flow of control in loop statements (for, while and do) and in the switch statement (see below). Its effect in loops is to cause immediate exit from the loop in which it appears. This might be required if some abnormal condition occurs that requires that no further iterations of the loop should be made. The abnormal condition would normally be detected by an if statement. If the loop is nested within another loop then control will return to the immediately surrounding loop.

8. The continue statement


This complements the break statement and causes an immediate switch of control to the test part of the loop in which it appears (in a for loop, expression3 is executed first), thus ignoring any remaining statements that appear after it in the loop body. Its use is discouraged.
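The two statements can be compared in a short sketch. Both functions are hypothetical; smallest_divisor assumes n is at least 2.

```cpp
// Returns the smallest divisor of n greater than 1 (n itself if n is prime).
// pre: n >= 2
int smallest_divisor(int n)
{
    int d = 2;
    for ( ; d < n; d++ )
        if ( n % d == 0 )
            break;           // leave the loop as soon as a divisor is found
    return d;
}

// Sums the odd numbers from 1 to max, using continue to skip the even ones.
int sum_of_odds(int max)
{
    int total = 0;
    for ( int i = 1; i <= max; i++ )
    {
        if ( i % 2 == 0 )
            continue;        // go to i++ and then to the test
        total += i;
    }
    return total;
}
```

The second function could equally be written without continue by placing total += i inside an if statement, which is often the clearer form.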

9. The switch statement


Where a choice needs to be made from a number of possible states, the if else statement can become cumbersome. The switch statement is a more compact and readable alternative. The syntax is:-

    switch ( expression )
    {
        case constant_expression1 : statement_block;
        case constant_expression2 : statement_block;
        case constant_expression3 : statement_block;
        ...
        default : statement_block;
    }

Where:-

expression
    is an expression yielding an integral value, i.e. int, short, long, unsigned or char (but excluding floats and arrays), e.g. a typical menu selection might include:-

        cin.get(ch);
        switch( toupper(ch) )

constant_expression1,2,3 ..
    are some possible values of expression, e.g. 'P', 'D', 'E'. They must be constants, either literal or symbolic (see the example below).

statement_block
    is a statement or sequence of statements which will normally end with the break statement. The effect of break is to cause control to jump out of the switch statement and not to execute any statements in the following cases. If break is not present, then the statements in subsequent cases are executed until either a break statement is encountered, or the end of the switch statement is met.

default
    if none of the cases is met, the statements in the default section are executed. It is wise always to include this so as to deal with all other possible values of expression.

Example (this program is installed in the lab)

    #include <iostream>
    #include <cctype>     // toupper
    #include <conio.h>    // getch (not a standard header)
    using namespace std;

    int main(void)
    {
        void DoPrint( void );      // assumed functions DoPrint etc. are
        void DoDisplay( void );    // defined elsewhere
        void DoEdit( void );
        const char EDIT = 'E', DISPLAY = 'D', PRINT = 'P', QUIT = 'Q';

        cout << "Enter choice P)rint D)isplay E)dit Q)uit : " << flush;
        for( char ch = '\0'; ch != QUIT; )    // ch initialised to null; loop until ch is QUIT
        {
            ch = toupper( getch() );    // getch: char input without echo to the display.
                                        // Converted to upper case so that 'q' also quits
            switch ( ch )
            {
                case PRINT   : DoPrint();   break;
                case DISPLAY : DoDisplay(); break;
                case EDIT    : DoEdit();    break;
                case QUIT    : break;
                default      : cout << '\a' << endl;    // invalid response,
                               ch = '\0';               // sound the bell
            }
            cout << "Enter choice P)rint D)isplay E)dit Q)uit : " << flush;
        }
        return(0);
    }

Pointers, References and Functions


1. Introduction
A variable is a symbolic name for a location in memory that holds a value of the relevant type. In the assignment statement:-

    x = y;

the meaning of the use of the variable names x and y is different. The use of variable y means the value currently stored at the memory location known as y. The use of variable x means the memory location known as x. Thus the whole statement can be understood as meaning: obtain the value stored at the memory location known as y and store it in the memory location known as x. The current value stored in x is neither needed nor accessed, and is overwritten by the assignment.

2. Reference Type
We introduce a new data type, the reference, whose value is not an integer, float, char etc. but a reference to a variable which holds an integer, float, char etc. It is an alias for another object. Alias means another name for.

Example:-

Assume the following declarations

    int k = 5;
    int& b = k;

and assume that variable k is stored in memory location number 46524. The value stored at memory location 46524 is 5.

    Variable    Address    Contents
    k           46524      5
    b           75145      46524

Variable b, a reference variable, is declared to be a reference to variable k. It will therefore hold as its value the memory location of k, thus referencing the value of k, i.e. 5. Any assignment of a new value to k will therefore affect the value referenced by b, and any change to the value referenced by b will change the value of k. Note that the special symbol & is placed after the type (int) in the declaration of b and that this must be followed by an initialisation using a previously declared variable (not a value) of the correct type. Once this declaration and initialisation has been made, b behaves exactly as though it were an ordinary variable. The compiler looks after the necessary indirection so that e.g. the assignment:-

    b = 12;

is interpreted as 'assign the value 12 to the memory location referenced by b'. After this assignment, k also has the value 12, and after the further statement b++, k has the value 13. Note that b++ does not (in contrast to a pointer) increment the value stored in b, i.e. it does not change 46524 to 46525. Once a reference variable has been associated with another variable in this way, it cannot be changed so that it refers to a different variable. Thus &b = m, intended to mean 'change b so that it is now an alias for m instead of k', is not allowed.
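The behaviour described above can be confirmed with a minimal sketch. The function name change_through_alias is hypothetical.

```cpp
// Returns the final value of k after it has been changed only through b.
int change_through_alias()
{
    int k = 5;
    int& b = k;   // b is another name for k
    b = 12;       // k is now 12
    b++;          // k is now 13
    return k;
}
```

k itself is never assigned to after its initialisation, yet the function returns 13.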



3. Pointers v References
Pointers are carried over from C, and are, in part, superseded by the reference type. However, many C libraries use pointers and the type has been retained for compatibility purposes and for their importance in building dynamic data structures. Some books describe references in abstract terms, and pointers in concrete terms. Pointers, they say, are variables which hold the address of another variable. But, in fact, this is exactly what references hold as their value. The differences are:-

- Abstraction
  The fact that references hold an address does not need to be known in order to use them, whereas you must take specific action in order to make a pointer point to some other object and to obtain the value of the object pointed to (see Syntax below).

- Syntax
  Pointers require special symbols to be used by the programmer
    - to assign to a pointer the address of another object, i.e. to make it point to it: use the address operator &
    - to yield the value of the object to which a pointer points, known as dereferencing: use the indirection operator *

Reference variables, once declared are treated as ordinary variables without the use of special symbols. The necessary indirection is looked after by the compiler. References are at a higher level of abstraction than pointers. A further difference is that pointers can be reassigned at will to point to another variable and can be incremented to step through memory. They are a much lower level tool than references as befits their origin. References cannot be reassigned to point to a different object.

Pointer variables

    int k = 5;
    int* ptr;           OK. Declaration of a pointer to int named ptr
    int* ptr = &k;      declaration and initialisation combined using address operator &
    *ptr = 12;          using indirection operator * to assign 12 to the variable to which ptr points
    cout << k;          prints 12
    cout << *ptr;       prints 12 using dereferencing
    cout << ptr;        prints the address of k e.g. 46524
    ptr++;              increments the address held by ptr which now references the memory
                        location immediately following that of k. (k is unchanged)

Reference variables

    int k = 5;
    int& ref;           illegal. Must be initialised on declaration
    int& ref = k;       declaration and initialisation
    ref = 12;           assignment to the variable referenced by ref (no special syntax needed)
    cout << k;          prints 12
    cout << ref;        prints 12
    ref++;              increments the variable referenced by ref, k now has the value 13
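The pointer column of the comparison can be exercised with the following sketch. The function names are hypothetical; the second function shows the reassignment that a reference does not allow.

```cpp
// Changes k through a pointer and returns its final value.
int change_through_pointer()
{
    int k = 5;
    int* ptr = &k;   // ptr holds the address of k
    *ptr = 12;       // indirection: assigns to k
    (*ptr)++;        // increments k, not the address held by ptr
    return k;
}

// A pointer, unlike a reference, can be made to point at another variable.
int repoint()
{
    int first = 1, second = 2;
    int* ptr = &first;
    ptr = &second;   // ptr now points to second
    *ptr = 20;       // first is untouched
    return first + second;
}
```

Note the parentheses in (*ptr)++: without them the expression would advance the pointer itself, as ptr++ does in the comparison above.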



4. Enumeration Types
It is valuable as a documentation tool to use symbolic names for constant values in programs. The classic case is pi which can be given a symbolic name by

    const double pi = 3.14159265359;

If however you need to model a real world object that may take on any one of a set of known values, then you can declare an enumeration type

    enum dow { SUN, MON, TUE, WED, THU, FRI, SAT };
    dow day1, day2, day3;

This creates a new data type dow (day of week) and declares 3 variables (day1, day2 and day3) of this type. Note that the enumerated values are not strings. They are simply constant numeric values that commence with 0.

A further possible use for an enumeration type is to describe the different states that a program may be in at any one time. In this type of program, the processing e.g. of input will vary depending on the current state, and certain types of input will have the effect of changing the state. An example of this type of processing is reading a data file that consists of several lines, each containing a description and a number. The description may include numeric digits, so it is enclosed in quotes:-

    "3D Drawing Program" 12
    "Sprocket Type 4S" 31

The states might be described using an enumeration as follows

    enum State { IN_NAME, IN_NUMBER, BETWEEN };
    State state = BETWEEN;
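One way the State enumeration might drive the processing of a single line is sketched below. The function number_after_name and the exact handling of each state are assumptions made for illustration; the notes do not give the processing code.

```cpp
#include <cctype>
#include <string>

enum State { IN_NAME, IN_NUMBER, BETWEEN };

// Extracts the number that follows the quoted description on one line.
// Digits inside the quotes belong to the name and are ignored.
int number_after_name(std::string line)
{
    State state = BETWEEN;
    int number = 0;
    for ( std::string::size_type i = 0; i < line.length(); i++ )
    {
        char ch = line[i];
        bool digit = std::isdigit( static_cast<unsigned char>(ch) ) != 0;
        switch ( state )
        {
            case BETWEEN :
                if ( ch == '"' )
                    state = IN_NAME;              // opening quote
                else if ( digit )
                {
                    state = IN_NUMBER;            // first digit of the number
                    number = ch - '0';
                }
                break;
            case IN_NAME :
                if ( ch == '"' )
                    state = BETWEEN;              // closing quote
                break;
            case IN_NUMBER :
                if ( digit )
                    number = number * 10 + ( ch - '0' );
                else
                    state = BETWEEN;
                break;
        }
    }
    return number;
}
```

The same character is treated differently according to the current state: the 4 in "Sprocket Type 4S" is skipped because the state is IN_NAME, whereas the 3 of 31 starts a number because the state is BETWEEN.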

5. The typedef statement


Allows the definition of new data types based on the fundamental types of the language. The new type is just an alias for the base type and cannot be given any attributes that are different from the original.

    typedef float real;       // creates a new type real based on float.
    real length = 10.56;
    typedef int* intptr;      // type intptr is a pointer to int

There may not seem to be a great deal of value in this mechanism until we meet compound data types, e.g. array and struct.

6. Reference arguments to functions


The reference type is rarely used in the way described in para. 2. It is intended primarily to be used in function formal arguments. Earlier, we looked at simple functions and noted that functions in programming languages are not 'pure' in the mathematical sense in that they can return more than one value. The classic example of this is the function that swaps the value of its two arguments. This is frequently used in sorting algorithms.

    int a = 6, b = 199;
    swap( a, b );
    cout << setw(6) << a << setw(6) << b << endl;

Output:

       199     6


This function does not need to have a return value, but it must return the changed values of its two arguments. This is accomplished by using reference arguments:-

    void swap( int& x, int& y )
    {
        int temp;
        temp = x; x = y; y = temp;    // classic swap algorithm. Needs a temporary variable
    }

So what is happening here? The actual arguments in the call swap( a, b ) are the variables a and b. The formal arguments are defined to be references to integer ( int& x, int& y ). When the function is called, the compiler recognises that the function is expecting references to integers and not integer values, so it copies into x and y, not the values 6 and 199, but references to the variables a and b which hold these values. When the swapping is carried out in the body of the function, the values that are swapped are those of the variables referenced by x and y, namely a and b. This is because x and y are aliases for a and b, so anything done to x and y is actually being done to a and b! For this reason, the function may only be called with variables and not with literal constants, e.g. swap( 6, 199 ); would be an error.

This is the mechanism provided by C++ to allow a function to return values via its arguments. Not all of the arguments need to be reference arguments. A function to convert a time in seconds held as a long int (first argument) into hours, minutes and seconds (the remaining 3 arguments) will have the first argument as a value parameter and the remaining three as reference parameters.

    void time2hms( long t, int& h, int& m, int& s )

Sometimes formal arguments are referred to as IN, OUT and INOUT arguments. In the case of function swap, both arguments were INOUT, whereas in time2hms, the first argument is IN, and the next three are OUT. So both OUT and INOUT arguments need to be defined in the function declaration as reference arguments.
Notice that, in the prototype of any function, the argument identifiers may be omitted as shown below, but notice that the absence of the identifiers makes it more difficult to understand what the function does without further comment being provided. void time2hms( long, int&, int&, int& ); // prototype for time2hms
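The two functions discussed in this section can be sketched as follows. The swap function is given the illustrative name swap_ints to avoid any confusion with the standard library's swap, and the body of time2hms is one possible implementation assumed for this sketch; the notes give only its header.

```cpp
// Exchanges the values of the two variables passed by the caller (INOUT).
void swap_ints( int& x, int& y )
{
    int temp = x;
    x = y;
    y = temp;
}

// One possible body for time2hms: t is an IN argument, h, m and s are OUT.
// Assumes t is not negative.
void time2hms( long t, int& h, int& m, int& s )
{
    h = static_cast<int>( t / 3600 );
    m = static_cast<int>( ( t % 3600 ) / 60 );
    s = static_cast<int>( t % 60 );
}
```

After the call time2hms( 3725L, h, m, s ) the caller's variables h, m and s hold 1, 2 and 5, although the function has no return value.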

7. Pointer arguments to functions


Although there is generally no need to use pointer arguments to functions because reference arguments can do the same job, it is still common to find them - particularly if they were originally written by 'C' programmers. In addition, many functions in the 'C' libraries accept and return pointers. A further consideration is that pointers are frequently used to access successive components of an array rather than by the conventional means (array indexing - to be covered later).



Example 1

A typical example of C code:-

    void makeupper( char* s )
    // converts a string to upper case by using a pointer to access the components of the
    // array
    {
        char *p = s;    // local pointer variable p given value of s,
                        // i.e. p now points to the first character of the string s
        while ( *p )    // while char pointed to by p != '\0' - the ASCII NUL string
                        // terminator
        {
            if (*p >= 'a' && *p <= 'z')    // if the char pointed to by p is lower case
                *p += ('A' - 'a');         // convert to upper case
            p++;                           // increment pointer to look at the next char
        }
        // since p points to the same string as s and s is a
        // pointer to the actual array argument, the actual
        // argument has been converted to upper case
    }
    ...
    char name[] = "i am all in lower case";
    makeupper(name);
    cout << name << endl;

Output:

    I AM ALL IN LOWER CASE

Note that, in 'C' and C++ an array passed as an argument to a function is always passed as a pointer to the first element.

Example 2

The 'C' string library cstring or string.h contains a number of functions operating on 'C' style strings which accept pointer arguments and some of which return pointer results. Typical ones are:-

    char *strcat(char *dest, const char *src);    // concatenates 2 strings returning a
                                                  // pointer to the result. dest has been
                                                  // modified
    char *strcpy(char *dest, const char *src);    // copies src into dest returning a
                                                  // pointer to dest as result

An example of the use of these two functions is:-

    char source[25] = "GNU";
    const char *blank = " ", *cplus = "C++";    // string literals must not be modified
    char destination[25];
    char *p = destination;        // p points to the string destination
    p = strcat(source, blank);    // concatenate a blank onto source. p points to source
    strcat(source, cplus);        // concatenate "C++" onto source
    strcpy(destination, p);       // copy the result back into destination. p still points to
                                  // source which has been changed.
    cout << "destination = " << destination << endl;

Output:

    destination = GNU C++



8. Default arguments
Sometimes we need to provide an argument that enables the caller to change the default behaviour of the function. Where the default behaviour is not to be overridden, then there should be no need to provide this argument. C++ permits a default argument value to be specified in the function declaration and, if this argument is not supplied by the caller, then the default value is used by the function. If the argument is supplied, then it overrides the default. In the case of one default, it must be the last. In the case of two defaults, they must be the last and last but one etc. The default must be supplied only once, in the declaration (prototype), and should not be repeated in the function definition.

Assume a function is to print to the stdout a number of lines of a file. The default is 4 lines, but this may be overridden by supplying an argument specifying a different number of lines.

    void printfile( const char filename[], int numlines = 4 );    // prototype

    void printfile( const char filename[], int numlines )         // definition
    {
        ...
        ...
    }

    printfile( "fred.cpp", 10 );    // overrides default with 10
    printfile( "jim.cpp" );         // default of 4 is used
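Since printfile depends on a file being present, the same mechanism is sketched here with a hypothetical function repeat that builds a string instead.

```cpp
#include <string>

// Hypothetical function: returns count copies of ch; count defaults to 4.
// pre: count >= 0
std::string repeat( char ch, int count = 4 );    // prototype carries the default

std::string repeat( char ch, int count )         // definition does not repeat it
{
    return std::string( count, ch );
}
```

The call repeat( '-' ) uses the default and produces four characters, whereas repeat( '*', 2 ) overrides it.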

9. Inline functions
Calling a function has an overhead that costs time. The runtime system has to set up a 'stack frame' and allocate space for the arguments and local variables. On termination, the stack frame has to be released and a jump made to the point immediately after the call. Very small functions can be specified as 'inline' so that the compiler will substitute the actual code of the function body for each occurrence of a call to the function. This will improve speed at the expense of code size. In fact, the use of inline is a recommendation only, and there is no guarantee that the compiler will honour it: this will depend on the compiler and the size of the inline function. Standard C++ does not allow the inline specifier on a prototype declared inside a function body, so the prototype below is placed at file scope.

    inline int square( int );    // prototype

    int main ( void )
    {
        ...
        z = square( x );         // compiler should substitute z = x * x
        ...
    }

    int square( int a )
    {
        return ( a * a );
    }

A test of the above program was timed for 100 million calls to function square. The elapsed time without inlining was approx 3.9 seconds and, with inlining, approx 3.05 seconds, an improvement of about 22%. The code size was increased by a very minor amount because the call to function square occurs only once.



Note that the GNU compiler does not care whether the keyword inline occurs in the prototype, in the function definition or in both places. To achieve inlining the compiler optimisation switch -O has to be set. In RHIDE change the option Options.Compilers.Optimizations -O to 1

10. Mathematical functions


See Libraries on page 116


Arrays
1. Introduction
Arrays are an aggregate type capable of holding a number of values all of the same type, contiguously in memory. The components may be any one of the fundamental data types int, long, unsigned, float, char, enumerated, pointer or one of the aggregate types, i.e. array, struct or class. The struct and class types have not yet been covered. The struct is referred to in other languages as record and consists of one or more fields of (possibly) different types (including arrays and records). The class data type will be covered in the Object-Oriented Programming & Design module. The advantage of the built-in array type is that a large number of data items can be held in a single named array variable whose components can be accessed randomly as we shall see later. The disadvantage is that its size is fixed at compile time and this cannot be varied at run time to accommodate the fluctuating requirements of the application. Most of the time, therefore, it is wasting space because it is not full and the type itself does not allow resizing. The solution, as we shall see later, is dynamic memory allocation.

2. Defining and referencing arrays

The syntax for the definition of an array is

    type_specifier name[number_of_elements]

where

    type_specifier        is the data type of the components.
    name                  is an identifier conforming to the normal requirements for identifiers.
    number_of_elements    is the total number of components that the array is to be capable of holding. This value appears in square brackets and may be a literal e.g. 6, or a previously defined constant e.g. numelements where numelements has been defined as const int numelements = 6;

Example

    index    0    1    2    3    4    5
    value    9   14    7    5    1    3

An array of integer with 6 elements

    int table[6];              an array called table capable of holding 6 integers
    float temperatures[31];    an array called temperatures capable of holding 31 floats
    char name[16];             an array called name capable of holding 16 characters (but note that, allowing for the terminating NUL character, only 15 readable characters can be held).

Arrays are indexed. That is, each element is uniquely numbered. The numbering always starts at 0 and always increments by 1 for each successive element (regardless of the size of the elements).

The value held by table element 0 is 9, the value held by table element 1 is 14 etc. Access to the elements (or components) is by subscripting the table name with the desired element number. Thus table[0] is an integer with the value 9, table[1] contains 14 etc. Notice that, since the numbering starts at 0, the last element always has an index one less than the number of elements. The subscripted array can be used anywhere that an expression of the component type is required:-

    const int size = 6;
    int table[ size ];
    table[ 5 ] = 22;
    table[ 1 ] = table[ 5 ];    // change the value of element 1 to that of element 5
    cout << table[1];           // output the integer (22) contained in element 1

The subscript may be any expression with an integer value, thus:-

    int i = 3;
    table[ i ] = table[ size - 1 ];    // change the value of element 3 to that of element 5 (the last)

Since the array subscript can be a variable, we can process an array's elements by means of a loop using as subscript a variable that increments for each iteration of the loop:-

2.1 Inputting values to array table

    int count = 0, size = 6, anint;
    cout << "Enter an integer: ";
    cin >> anint;
    cout << endl;
    while( cin.good() && count < size )
    {
        table[ count++ ] = anint;
        cout << "Enter an integer: ";
        cin >> anint;
        cout << endl;
    }

Note the need to check two conditions:-

    • The input is a valid integer                 cin.good()
    • The end of the array has not been reached    count < size

For this reason, the input is read into an auxiliary variable anint before the start of the loop and before it is assigned to an array element inside the loop. A further input is then assigned to anint at the bottom of the loop.
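The same two checks can be packaged as a function. What follows is a minimal sketch rather than the notes' own code: the name readTable and the stream parameter are assumptions made for illustration, and the sketch tests count < size before attempting a read so that no value is consumed once the array is full.

```cpp
#include <iostream>
#include <sstream>   // only needed for the std::istringstream usage mentioned below

// Reads integers from 'in' into table until the input fails
// or the array is full. Returns the number of elements stored.
int readTable( std::istream& in, int table[], int size )
{
    int count = 0;
    int anint;
    // count < size is tested first, so no read is attempted when the array is full
    while( count < size && in >> anint )
        table[ count++ ] = anint;
    return count;
}
```

Calling readTable( cin, table, 6 ) reads from the keyboard; passing a std::istringstream instead allows the function to be tried with fixed text.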

2.2 Outputting values from array table

    for( int i = 0; i < count; i++ )
        cout << table[ i ] << endl;

Note that, in this example, the condition for the loop to continue is controlled by the number of items entered (count). This might be less than the total number of elements in the array. Attempting to process elements of an array that have not been given a value can lead to unpredictable results.

2.3 Shuffling array elements one position left (or down)
This requires care to avoid overwriting a value before it has been copied.

    const int size = 6;
    int table[ size ] = { 0, 1, 2, 3, 4, 5 };    // initialised on declaration - see below

    for( int i = 1; i < size; i++ )
        table[ i - 1 ] = table[ i ];             // shuffle the contents one element to the left

    Original contents    0 1 2 3 4 5
    Shuffled left        1 2 3 4 5 5

2.4 Shuffling array elements one position right (or up)

    for( int i = size - 1; i > 0; i-- )    // traverse the array backwards
        table[ i ] = table[ i - 1 ];       // shuffle the contents one element to the right

    Shuffled right (from the original contents 0 1 2 3 4 5)    0 0 1 2 3 4
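The two shuffles can be wrapped in functions so that the direction of traversal is easy to compare. This is a sketch only; the function names shuffleLeft and shuffleRight are not from the notes.

```cpp
// Moves every element one place towards index 0.
// Traverses forwards; the last element keeps its old value.
void shuffleLeft( int table[], int size )
{
    for( int i = 1; i < size; i++ )
        table[ i - 1 ] = table[ i ];
}

// Moves every element one place towards the end.
// Traverses backwards so that each value is copied before it is
// overwritten; element 0 keeps its old value.
void shuffleRight( int table[], int size )
{
    for( int i = size - 1; i > 0; i-- )
        table[ i ] = table[ i - 1 ];
}
```

If shuffleRight traversed forwards instead, the value of element 0 would be copied into every element in turn, which is the overwriting problem that the backwards loop avoids.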

3. Array initialisation

Arrays may be initialised on declaration by enclosing a list of values within braces, separated by commas. If all elements of the array are given values in this way, the number of elements need not be supplied between the brackets after the array name:-

    int table[] = { 9, 14, 7, 5, 1, 3 };

Multi-dimensional arrays may be initialised by placing braces around each row, and separating the rows with commas (see the definition of type Plane in section 4):-

    Plane aPlane = { { 'X', ' ', 'X', 'X' },    // Row 1
                     { ' ', 'X', ' ', 'X' },    // Row 2
                     ....                       // etc.
                     { 'X', 'X', ' ', 'X' }     // Row 12, no comma
                   };

Where an initialiser list is given but some initialisers are omitted, the remaining elements are set to 0, whatever the storage class of the array. Where no initialiser is given at all, an array that is not auto is set to 0, but the elements of an auto (local function) array hold indeterminate values.

The number of elements in an array can be found by the built-in sizeof operator:-

    cout << "sizeof(table) = " << sizeof(table) << endl
         << "sizeof(int) = " << sizeof(int) << endl
         << "num elements = " << sizeof(table) / sizeof(table[0]) << endl;

which, where an int occupies 4 bytes, outputs

    sizeof(table) = 24
    sizeof(int) = 4
    num elements = 6

But note that sizeof cannot be used in a function to find the size of an array formal argument since this is a pointer.
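The sizeof idiom and its limitation inside a function can be seen together in the sketch below. The function names are hypothetical. In standard C++ the elements omitted from a braced initialiser list are set to 0 even for a local array, which is what the second function relies on.

```cpp
#include <cstddef>

// Inside a function an array argument is really a pointer, so the
// element count has to be passed in separately: sizeof( table ) here
// would give the size of a pointer, not of the caller's array.
int sum( const int table[], std::size_t count )
{
    int total = 0;
    for( std::size_t i = 0; i < count; i++ )
        total += table[ i ];
    return total;
}

// Hypothetical helper: only the first two of six elements are given
// initialisers; the remaining four are set to 0.
int sumOfPartlyInitialised( void )
{
    int table[ 6 ] = { 9, 14 };
    std::size_t count = sizeof( table ) / sizeof( table[ 0 ] );   // 6 elements
    return sum( table, count );
}
```

Here the element count is computed where the array itself is visible and then handed to the function along with the array.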

4. Multi-dimensional arrays
There is no theoretical limit to the number of dimensions an array may have, although the number of elements increases rapidly with the number of dimensions as do the chances of there being redundant elements. Two dimensional arrays are declared with 2 values, each enclosed in brackets:-

    // airplane reservation system
    const int maxRows = 12, seatsPerRow = 4;
    typedef char Plane[maxRows][seatsPerRow];    // declares a new type based on a fundamental type
    Plane aPlane;                                // aPlane is a variable of type Plane

    void makeEmpty( Plane aPlane)
    {
        for( int row = 0; row < maxRows; row++ )
            for( int seat = 0; seat < seatsPerRow; seat++ )
                aPlane[ row ][ seat ] = ' ';     // Space = empty
    }

Functions that operate on the Plane data structure

    bool seatFree( Plane aPlane, int row, int seat );        // return true if row,seat is a space, else false
    void allocateSeat( Plane aPlane, int row, int seat );    // mark seat allocated with an 'X'
    void showSeatingPlan( const Plane aPlane );              // show plan with spaces and Xs as opposite

[Figure: seating plan grid, rows 1 to 12 down the side and seats 1 to 4 across the top, with allocated seats marked X]

5. Arrays as function arguments

An example of a 2 dimensional array aPlane of type Plane being passed to a function appears in section 4 above. In C++, an array formal argument to a function is always a pointer to the first element of the array. This is automatic without any action on the part of the programmer. Within the function, the array may be subscripted in the normal way. This explains why, in the function makeEmpty above, it was not necessary to use a reference argument to ensure that the changed value of the array was passed back to the point of the call. Since a pointer is passed automatically, any change to the formal argument within the function body is, in fact, being made to the actual argument. If it is not intended that the function should modify its formal argument, then the argument should be const modified to indicate the fact. The compiler will then flag an error if the function body contains statements that might modify the formal argument.

    void showSeatingPlan( const Plane aPlane )

The elements of aPlane are constant and may not appear on the LHS of an assignment within the function.
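The functions declared for the Plane type might be written along the following lines. This is a sketch under assumptions: rows and seats are taken to be numbered from 0 as in makeEmpty, seatFree is given a const argument because it only reads the array (the notes declare it without const), and seatsTaken is a hypothetical extra function.

```cpp
const int maxRows = 12, seatsPerRow = 4;
typedef char Plane[maxRows][seatsPerRow];

// Modifies the caller's array: no reference argument is needed
// because the array argument is already a pointer.
void makeEmpty( Plane aPlane )
{
    for( int row = 0; row < maxRows; row++ )
        for( int seat = 0; seat < seatsPerRow; seat++ )
            aPlane[ row ][ seat ] = ' ';        // Space = empty
}

// Only reads the array, so the argument is const modified.
bool seatFree( const Plane aPlane, int row, int seat )
{
    return aPlane[ row ][ seat ] == ' ';
}

void allocateSeat( Plane aPlane, int row, int seat )
{
    aPlane[ row ][ seat ] = 'X';                // mark seat allocated
}

// Hypothetical helper: counts the seats marked 'X'.
int seatsTaken( const Plane aPlane )
{
    int taken = 0;
    for( int row = 0; row < maxRows; row++ )
        for( int seat = 0; seat < seatsPerRow; seat++ )
            if( aPlane[ row ][ seat ] == 'X' )
                taken++;
    return taken;
}
```

An assignment such as aPlane[ 0 ][ 0 ] = 'X' inside seatFree or seatsTaken would be rejected by the compiler because of the const.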

6. Pointers and arrays
This has already been introduced under pointers. Note that an array name unqualified is treated by the compiler as an address, so

    const int size = 6;
    int table[size] = { 0, 1, 2, 3, 4, 5 };
    int *ptr = table;        // assigns to ptr the address of the first element of table
    cout << *ptr;            // outputs the object to which ptr points, namely the integer 0
    *ptr = 10;               // changes the value of table[0] to 10
    ptr++;                   // moves ptr to point to the next element of the array
    cout << *ptr;            // outputs 1
    cout << *(table + 3);    // outputs 3
    cout << table[3];        // same as above, outputs 3

Unlike most other languages, C++ supports pointer arithmetic and, since the name table is treated as a pointer to its first element, a variable can be used to indicate an offset from the beginning

    for ( int i = size - 1; i > 0; i-- )
        *( table + i ) = *( table + i - 1 );    // shuffle contents one element to the right

or, using a supplementary pointer

    for ( int* p = table + size - 1; p > table; p-- )
        *p = *( p - 1 );

    table + size - 1    address of table + size(6) - 1 elements = address of last element
    p > table           while the address held by p > the address of table
    p--                 the compiler knows the size of an int, so p-- results in p being adjusted by sizeof(int), i.e. by 2 or 4 bytes on a PC (depending on the compiler), similarly with p - 1
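Pointer comparison and pointer arithmetic can be combined to walk an array from both ends at once. The sketch below reverses an array of int in place; the function name reverseInts and the convention that last points one element past the end are assumptions made for this illustration.

```cpp
// Reverses the ints in the range starting at first and ending
// just before last (last points one element past the end).
void reverseInts( int* first, int* last )
{
    if( first == last )
        return;                         // nothing to do for an empty range
    // Step last back onto the final element, then swap inwards
    // until the two pointers meet or cross.
    for( --last; first < last; ++first, --last )
    {
        int tmp = *first;
        *first = *last;
        *last = tmp;
    }
}
```

For the array table above, the call would be reverseInts( table, table + size ).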

7. Character strings and variable pointers
Notice the difference between char word[] = "hello" and char *greeting = "hello". word is a constant address where the string is stored. greeting is a pointer containing the address at which the string is stored.

    char word[] = "hello";
    char *greeting = "hello";
    cout << "word[] = " << word << endl;          // OK. No problem
    cout << "greeting = " << greeting << endl;    // OK. No problem
    word = "fred";                                // compiler error - see note 1
    strcpy(word, "wilfred");                      // see note 2
    greeting = "william";                         // OK - see note 3
    cout << "word[] = " << word << endl;
    cout << "greeting = " << greeting << endl;

Note 1: the compiler reports an error such as "incompatible types in assignment of 'char[5]' to 'char[6]'" because word is a constant pointer and can't be assigned.

Note 2: do this instead, but note that, if the new string is longer than the array (as "wilfred" is here: it needs 8 chars including the NUL, and word holds only 6), the extra chars are stored outside the array's allocated memory and may cause the program to crash.

Note 3: OK because greeting is a variable pointer. The string "william" is stored at its own location in memory and greeting is changed to point to that location.
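One way of guarding against the overflow described for strcpy is to tell the copying routine how large the destination is. The following is a sketch, not a standard library facility: safeCopy is a hypothetical name, built here on the standard function strncpy.

```cpp
#include <cstring>

// Copies src into dest without ever writing more than destSize chars,
// and always leaves dest terminated by NUL.
// Returns true if the whole of src fitted, false if it was cut short.
bool safeCopy( char dest[], std::size_t destSize, const char* src )
{
    if( destSize == 0 )
        return false;                             // no room even for the NUL
    std::strncpy( dest, src, destSize - 1 );      // copy at most destSize - 1 chars
    dest[ destSize - 1 ] = '\0';                  // strncpy does not always terminate
    return std::strlen( src ) < destSize;
}
```

With char word[] = "hello", the call safeCopy( word, sizeof( word ), "wilfred" ) leaves "wilfr" in word and returns false rather than writing beyond the array. Note that sizeof( word ) is only valid here because word is the array itself and not a function argument.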

8. Character string input/output

As shown above, inserting into the output stream either the name of a character array e.g. word or a pointer to a character string e.g. greeting has the same effect. setw(<field_width>) causes a string to be output right-justified in field_width. It can be left justified by the member function call cout.setf( ios::left, ios::adjustfield ); or by the manipulator setiosflags(ios::left) as in

    cout << setiosflags(ios::left) << setw(10) << word << endl;

cin can be used for string input, but terminates at the first whitespace character (space, tab, newline). To avoid possible overflow by the input exceeding the space allocated to the string, setw can be used within cin to limit the number of characters entered. The excess characters are held in the input buffer and are used to satisfy any subsequent use of cin.

    const int MESSAGESIZE = 4;
    char input[MESSAGESIZE+1];
    cout << "Enter a message without spaces: ";
    cin >> setw(MESSAGESIZE+1) >> input;
    char overflow[80];
    cin >> overflow;
    cout << "your input: " << input << endl
         << "the overflow was: " << overflow << endl;

To input lines of text whose length is unknown at compile time, use

    cin.getline( char *line, int limit, char delim = '\n' )

At most limit - 1 characters are stored, leaving room for the terminating NUL, so limit is normally the size of the array (e.g. 81 for a typical 80 character line of text). Input is terminated by the supplied delimiter that defaults to newline and may be omitted to use the default. The terminator is not stored in the array. The address at which the line is stored is held in the pointer line

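    const int linelen = 80;
    char line[linelen+1];                       // 80 chars plus the terminating NUL
    while( cin.getline( line, linelen+1 ) )     // false at end of file or on error
    {
        cout << line << endl;                   // output the line
    }

Note that a line of more than 80 chars is not silently cut short: getline puts the stream into a failed state and leaves the excess chars in the input, so the loop above stops at such a line.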

9. Arrays of pointers and pointers to pointers

Arrays of pointers can point to different arrays whose declared lengths differ. Thus arrays of pointers to char can accommodate jagged arrays i.e. arrays of string whose lengths are different - not just different in the number of characters held, but also in the numbers of elements allocated in memory.

    char *ptr[4] = { "one", "two", "three", "four" };    // array of 4 pointers to char

Assume that the address held in ptr[0] is 36714

    pointer    address held    string pointed to
    ptr[0]     36714           o n e \0
    ptr[1]     36718           t w o \0
    ptr[2]     36722           t h r e e \0
    ptr[3]     36728           f o u r \0

Using the Borland C++ Debug Inspect menu item (the GNU C++ debugger built into RHIDE does not support inspect):-

    -------- Inspecting ptr -------
    8F50:0FF0
    [0] 8F4C:001E "one"
    [1] 8F4C:0022 "two"
    [2] 8F4C:0026 "three"
    [3] 8F4C:002C "four"

This makes for efficient use of memory when storing large numbers of strings. The 4 arrays of char are allocated contiguously in memory and the above could be viewed as follows:-

    address    36714       36718       36722           36728
    chars      o n e \0    t w o \0    t h r e e \0    f o u r \0
    pointer    ptr[0]      ptr[1]      ptr[2]          ptr[3]

Printing this array of pointers can be done by

    for (int i = 0; i < 4; i++ )
        cout << ptr[i] << endl;
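Because each element of the array is simply a pointer to char, any function that works on a single string can be applied to each element in turn. The sketch below finds the longest string in such an array; the function name longest is hypothetical, and the pointers are declared const here because the function only reads the strings.

```cpp
#include <cstring>

// Returns a pointer to the longest string in words[0..count-1].
// Assumes count is at least 1. If several strings share the
// greatest length, the first of them is returned.
const char* longest( const char* const words[], int count )
{
    const char* best = words[ 0 ];
    for( int i = 1; i < count; i++ )
        if( std::strlen( words[ i ] ) > std::strlen( best ) )
            best = words[ i ];
    return best;
}
```

No strings are copied: the function returns one of the pointers already held in the array.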

10. Command line arguments
You have already encountered programs that accept command line arguments, e.g. dir /w. Dir accepts an argument w that indicates a wide display of file names. The slash is just an indicator that an argument follows. MS DOS provides the facility for programs to pick up arguments supplied at the command line when invoking a program. For example, pretty.exe might be a C++ program to 'pretty print' C++ source files. In the command line invocation

    pretty myprog.cpp

the argument myprog.cpp represents the name of the source file to be printed. In C++, information about these command line arguments is provided by 2 arguments to function main named by convention:-

    • int argc        the number of arguments (including the name of the executed program)
    • char *argv[]    an array of pointers to char representing the strings appearing on the command line.

In the above example, argc = 2, argv[0] is a pointer to the string "pretty", and argv[1] is a pointer to the string "myprog.cpp". Whitespace on the command line separates the arguments into the individual components of argv[]. Thus a command line containing

    myprog /x/y/t myfile

would give argc the value 3, with "myprog" in argv[0], "/x/y/t" in argv[1] and "myfile" in argv[2], whereas

    myprog /x /y /t myfile

would produce argc with the value 5, with argv[0] holding the string "myprog" and the four arguments /x, /y, /t and myfile held in elements argv[1], argv[2], argv[3] and argv[4] respectively. What these arguments mean, of course, is up to the author of myprog.

It is good practice to check the number of arguments in main and, if the number falls outside the number expected (often a variable number of arguments can be entered), an error message is issued and the program terminates. If no arguments are supplied (other than the program name, of course) and at least one is expected, then it is usual to print the program name together with a list of valid arguments. This list should not be verbose and should not exceed about 22 lines otherwise some lines will disappear off the top of the screen. There is a convention that MSDOS programs expect arguments announced by the slash '/'. In Unix the character used is invariably minus '-'.

Assuming that you have written a program to pretty-print a C++ program; that the program name is pretty and that 3 arguments are allowed:-

    1. /ln         print n lines per page, where n is an integer (optional - defaults to 60)
    2. /fn         print with font size n, where n is an integer (required)
    3. filename    to print (required)

argc will hold a maximum value of 4 (the name of the program plus 3 arguments) and a minimum of 3. If argc < 3 or argc > 4 then there is an error and the program should display an error message to the terminal and then terminate. The error message would be something like:-

    incorrect number of arguments
    usage: pretty [/ln] /fn filename
        /ln = print n lines per page
        /fn = use font size n (8..12)

Note the square brackets to indicate an optional argument. The program can then be terminated with either:-

    • return 1;    when the error is detected in main, or
    • exit(1);     in other cases. exit is in cstdlib (or stdlib.h).

By convention, a non-zero value returned from main or as an argument to exit indicates an error. In both cases, other non-zero values can be used to indicate different error conditions.
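The checks described for the pretty program could be gathered into one function that main calls before doing any work. The following is a sketch under assumptions: the function name parseArgs, its reference arguments and the use of atoi to convert the digits are choices made for this illustration, and the arguments are expected in the order given in the usage message.

```cpp
#include <cstdlib>   // std::atoi
#include <cstring>   // std::strncmp

// Examines the command line of the hypothetical pretty program.
// Returns true and sets linesPerPage, fontSize and filename if the
// arguments are acceptable; returns false if main should print the
// usage message and terminate with a non-zero value.
bool parseArgs( int argc, char* argv[],
                int& linesPerPage, int& fontSize, const char*& filename )
{
    if( argc < 3 || argc > 4 )
        return false;                          // incorrect number of arguments

    linesPerPage = 60;                         // default when /ln is omitted
    int next = 1;                              // index of the argument being examined
    if( argc == 4 )                            // the optional /ln must come first
    {
        if( std::strncmp( argv[ next ], "/l", 2 ) != 0 )
            return false;
        linesPerPage = std::atoi( argv[ next ] + 2 );   // digits after "/l"
        next++;
    }
    if( std::strncmp( argv[ next ], "/f", 2 ) != 0 )
        return false;                          // required /fn is missing
    fontSize = std::atoi( argv[ next ] + 2 );  // digits after "/f"
    filename = argv[ next + 1 ];

    return linesPerPage > 0 && fontSize >= 8 && fontSize <= 12;
}
```

In main, a false result would be followed by the usage message and return 1. The expression argv[ next ] + 2 is pointer arithmetic: it skips the two characters "/l" or "/f" so that atoi sees only the number.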

11. Initialising pointer arrays


Here is an example of an array being used to provide a lookup table. In a program involving the use of dates, it is likely that a facility to convert a day number into the name of a day of the week may be required. The day numbers would be in the range 0 - 6, and values within this range would be passed as an argument to a function that returns the corresponding day of the week, i.e. "Sunday" - "Saturday". We introduce here the concept of static function local variables. These are variables declared within a function whose scope is limited to the function body but, unlike auto local variables, their life is that of the surrounding program - they are not destroyed when the function terminates. This topic is covered more fully in the chapter on Program Files para. 8

    char* dayname( int daynum )
    {
        static char *name[] = { "Sunday", "Monday", "Tuesday", "Wednesday",
                                "Thursday", "Friday", "Saturday" };
        return name[daynum];
    }
    ...
    int daynumber = 2;
    cout << "day " << daynumber << " is " << dayname(daynumber) << endl;

The static local variable name is created and initialised only once. Thereafter, the declaration is ignored and the return statement simply looks up the day name within the array that corresponds to the incoming argument. Note that no check is carried out on the argument, so if it falls outside the range of values 0 - 6 the function will either return an incorrect value or cause a runtime error.
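A version of the lookup that does check its argument might look like the sketch below. Returning the text "invalid day" for an out-of-range number is an assumption made for illustration (a real program might prefer to report the error in some other way), and the pointers are declared const because the strings are not meant to be changed.

```cpp
// Returns the name of the day for daynum in the range 0 - 6,
// or the text "invalid day" for any other value.
const char* dayname( int daynum )
{
    // Created and initialised once; survives between calls.
    static const char* const name[] = { "Sunday", "Monday", "Tuesday", "Wednesday",
                                        "Thursday", "Friday", "Saturday" };
    if( daynum < 0 || daynum > 6 )
        return "invalid day";           // the array is never indexed out of bounds
    return name[ daynum ];
}
```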

12. Review
You will, by now, have seen that arrays and pointers to arrays in C++ are somewhat complex and error-prone. This is because these facilities were designed over 20 years ago for 'C' (a language that was originally designed for writing operating systems) and have had to be retained in C++ for backward compatibility. In fact, the object-oriented facilities provided by C++ allow these deficiencies to be hidden from the application programmer who can use libraries of classes e.g. class string which hides the underlying shortcomings of the built-in array of char type. In particular, the disadvantage of the fixed size of built-in arrays and the absence of array bounds checking can be overcome in container classes which are provided with most C++ implementations and are now standardised as the Standard Template Library. However, we shall be concerned with how container classes are designed and written and we therefore need to understand the base facilities on which they are built. You will be provided with a simple String data type that can be used for assignments. You should read Skansholm pp 91-93 on the standard string type that is now part of the Standard Template Library. If you wish, you can use this standard type wherever strings are required.
13. Summary
    • The array type allows a collection of items of the same type to be stored under a single name. The array declaration specifies the type of its components and the number of elements.
    • Individual components of an array can be accessed by subscripting the array name with an integer expression, making them well suited to processing by loops.
    • The compiler provides no run time checking of array bounds so that care needs to be taken to ensure that array bounds are not exceeded otherwise memory may be corrupted.
    • When an array is passed to a function, the address of the first element of the actual argument is copied into the corresponding formal argument. An array formal argument can be declared as either e.g. int table[] or int *table; they both mean a pointer to the first int of the array. Within the body of the function, the components of the array may be accessed either using subscripts, normally in the form of a variable whose values are controlled by a loop e.g. table[ i ], or by a pointer.
    • In the formal argument list of a function, a multi-dimensional array must specify the size of every dimension except the first.
    • Arrays with 2 or more dimensions are likely to be specific to a particular application and are best given a new type name using typedef.
    • Arrays can be initialised on declaration with values inside braces separated by commas. Any items unspecified in this way are initialised to 0. An auto array declared with no initialiser at all holds indeterminate values. This default initialisation only has meaning for the primitive types.
    • Strings are one-dimensional arrays of char terminated by the ASCII NUL ('\0') character. Room must be allowed for this character otherwise output and other routines will not behave correctly. In some programmer-defined functions that process arrays of char, the terminator must be provided by the programmer.
    • Arrays of pointers to char can be used to handle arrays of strings. This is how command line arguments are provided as the second argument (argv) to function main, the first argument (argc) being an integer representing the number of arguments.
    • An array of pointers to char can be initialised with a list of strings. The number of pointer elements, unless given within the brackets, is fixed by the number of strings in the initialisation list. Output of such an array of char pointers (here called course and holding 5 strings) could be by:-

          int size = sizeof(course) / sizeof(course[0]);
          for ( int i = 0; i < size; i++ )
              cout << course[ i ] << " ";
          cout << endl;

      sizeof(course[0]) will yield either 2 or 4 (the size of a pointer), and sizeof(course) will yield either 10 or 20 (5 pointers). The value of size in either case will be 5.

14. An array application - Stack of char
A stack is an abstract data type - a type that is not provided by the programming language but which can be implemented by using the data structuring facilities of the language. A stack works on the LIFO (last in, first out) principle - the last item put onto the stack is the first to be removed from it. The last item put onto the stack is at the top of the stack and the next item to be removed will be taken from the top. Access to the stack is at one end only - the top. Compare it to a stack of plates - the next one to be used is the latest one to be placed onto the stack. The standard operations on a stack of char are:-

    • void push( char )         char is pushed onto the stack
    • void pop( void )          the top of stack item is removed
    • char top( void )          the top of stack char is returned, the stack is unchanged
    • bool empty( void )        returns true if the stack is empty, otherwise false
    • void makeempty( void )    empties the stack

One way of implementing a stack is to use an array:-

    // charstck.cpp
    // illustrates an array implementation of a stack of char
    const int MAXSTACK = 20;     // 20 elements
    char stack[MAXSTACK ];       // the stack
    int thetop;                  // the index value of the current top of stack
                                 // (initially empty)
    int main ( void )
    {
        void push( char ch );    // 5 function prototypes
        void pop ( void );
        char top( void );
        bool empty( void );
        void makeempty( void );

        char word[] = "abracadabra";
        makeempty();
        for ( int i = 0; word[ i ] != '\0'; i++ )    // push each letter of word
            push( word[i] );
        cout << word << " reversed = ";
        while ( !empty( ) )
        {
            cout << top( );      // output the top char and then pop
            pop( );
        }
        cout << endl;
        return 0;
    }

    void push( char ch )
    // post - ch has been placed at the top of the stack
    { ... }

    void pop ( void )
    // pre - the stack is not empty
    // post - the top of stack item has been removed
    { ... }

    char top( void )
    // pre - the stack is not empty
    // post - the top of stack item has been returned. The state of the stack is unchanged
    { ... }

    bool empty( void )
    // post - if the stack is empty, true is returned, else false is returned
    { ... }

    void makeempty( void )
    // post - the stack is empty
    { ... }

    // abracadabra reversed = arbadacarba

Note that the code in function main never accesses the array stack directly. All operations are carried out only via the provided routines makeempty, push, pop, top, empty. This is an example of data abstraction - the stack data structure is protected from corruption by requiring all accesses to be made through these functions. In the example, this discipline is not enforced - it is possible for the stack to be accessed directly since stack is a global variable that has file scope. We shall see later how direct access can be prevented, and how the stack can be encapsulated in a single entity that holds both the array and the variable that records the top of stack.
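The bodies of the five stack functions are left open above. One possible way of filling them in is sketched below; it is not necessarily the intended solution. It assumes that thetop holds -1 when the stack is empty, and it uses assert from cassert to check the stated preconditions, neither of which is specified in the notes.

```cpp
#include <cassert>

const int MAXSTACK = 20;          // 20 elements
char stack[ MAXSTACK ];           // the stack
int thetop = -1;                  // index of the current top; -1 (an assumption) means empty

void makeempty( void )
// post - the stack is empty
{
    thetop = -1;
}

bool empty( void )
// post - if the stack is empty, true is returned, else false is returned
{
    return thetop == -1;
}

void push( char ch )
// pre  - the stack is not full (a precondition added in this sketch)
// post - ch has been placed at the top of the stack
{
    assert( thetop < MAXSTACK - 1 );
    stack[ ++thetop ] = ch;
}

void pop( void )
// pre  - the stack is not empty
// post - the top of stack item has been removed
{
    assert( !empty() );
    thetop--;
}

char top( void )
// pre  - the stack is not empty
// post - the top of stack item has been returned. The state of the stack is unchanged
{
    assert( !empty() );
    return stack[ thetop ];
}
```

Popping does not erase anything from the array; it simply moves the index down so that the next push overwrites the old value.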


Program Files


1. Introduction
The unit of compilation in C++ is the file. A program can be built from several files. These will comprise:-

    • The main program file that includes a function main
    • Zero or more modules providing support functions, data types etc. comprising
        • A header file ( .h ) that contains prototype declarations for the functions provided by the module and possibly type and data declarations.
        • A source ( .cpp ) file containing the definition of the functions, types and variables provided by the module. This file may or may not be present.
        • The object file ( .obj ) created by compiling the .cpp file (see above) that provides the definition of the functions whose declarations appear in the header.

The main program file contains compiler directives to #include the header file(s) for the supporting modules. This ensures that functions and variables, constants and types defined in the supporting source files can be accessed by the main program. In other words, the header files provide the prototypes for functions and referencing declarations for variables etc. that allow the compiler to generate code for the main program without the source of the supporting .cpp files themselves being present at compile time. At link time, the programmer must indicate which supporting object ( .obj ) files he wants to be linked with the object code of the main program. Within the GNU C++ IDE this is done by creating a project which defines all the required source files for a particular project and ensures that the object code of each is up to date before the linker links them all in to produce the executable. The project definition itself is saved as a .gpr file which can be opened and changed as required. By default, the name of the executable file will be the name of the project file. Thus assign1.gpr (the project file) will cause the executable resulting from linking all object files to be named assign1.exe regardless of the name of the main source program file. The default can be changed by the menu item Project.main targetname. Take iostream as an example. You must include the compiler directive #include<iostream> to ensure that the actual text of this header file is included in the compilation of your main program. Without this, the compiler would not be able to make sense of a call to e.g. cin.get(). You do not need the source of iostream (iostream.cpp) and it is not even present on the machine. At link time, the linker sees the header declaration and knows from this that the object file for iostream must be combined with the object code generated from the source of your main program in order to produce the executable. 
The integrated environment allows the location of the object code of iostream to be specified and the linker fetches it from that directory for inclusion. Thus we have the concept of separate program modules that consist of two parts:-

    • an interface part - the header file iostream
    • an implementation part - the object file iostream.obj. (In fact, you will not find iostream.obj in the directory because the code is included in the library files in the lib directory).

The interface part defines the services provided by the module in terms of the functions, variables, constants and types that are provided (exported) by the module. The implementation part provides the actual implementation in the form of object code that is needed at link time. This is another example of abstraction. We need to know how to call the iostream functions, and it is convenient that objects like cin and cout are pre-declared. For this
reason, prototypes of the functions and the declaration of the standard I/O streams are made available to us in the header file iostream, but the implementation is hidden in the library files since we need not be concerned with how the functions are implemented nor how stream objects are represented. Consequently we can access the resources provided by iostream only via the routines and declarations provided in the header file (the interface). We cannot access the representation of streams because it is hidden and is therefore protected from the possible corruption that might have occurred had we been allowed direct access to it. Note that the ANSI C++ standard specifies that system header files such as iostream, string, vector etc. should not be given with a .h file name extension. However, all other modules (including those that you write) must have the extension .h. The GNU C++ compiler meets this requirement of the standard, but other, older, compilers may not and, in those cases you will have to use the old name for such system headers, e.g. iostream.h.

2. The steps to produce an executable

Assume that you have a program that consists of the files:-

    main.cpp     the main program file containing the function main
    other.cpp    a source file containing the definition of support functions, type and variable definitions
    other.h      the header file containing the external referencing declarations for the functions, types and variables that are defined in other.cpp

    • Select Project - Open project - call it myprog.prj
    • Add other.cpp and main.cpp to the project
    • Compile other.cpp
    • Compile main.cpp. (Header file other.h is brought in during compilation)
    • Link main.obj with other.obj. When you choose link with main.cpp
        • You are not linking the source files but the object files created by the compiler.
        • The linker doesn't know what to link with main.obj unless you have a project
        • The linker links together the object code of other, main and of any library code required e.g. iostream
    • You could use make instead of compile and link. This will compile all modules whose object file has a time earlier than the source file (.cpp) and then link.

The name of the executable is the same as that of the project i.e. myprog ( not main ). Some students find this process of setting up a project intimidating for some reason. But it is quite simple and has to be mastered in order to write real programs that consist of more than one file.

3. Types, storage class and scope


Each object that is given an identifier in a program is a reference to a memory location where that object's representation is stored. Thus the declaration int count associates count with a location in memory where the bit representation of the value of count is stored. An object known by its identifier has 3 attributes in addition to its value: -

• type
  This is important because it determines the amount of memory that is allocated for the representation of the object and also its bit pattern. Thus both the number of bytes and the pattern of the bits stored in those bytes will be completely different between e.g. an int and a float even if they appear to hold the same value.

• storage class
  This is important because it determines the lifetime of the object, i.e. how long it remains in existence occupying storage. Storage class has defaults which are determined by the position in the source code of the object's declaration. This may be varied by providing an explicit storage class on declaration. There are 3 categories of lifetime:-

    local (auto)   lifetime is transient and exists only for the lifetime of the enclosing block (usually a function, but see later).
    static         lifetime exists for the duration of the program's execution.
    dynamic        allocated dynamically during a program's execution. Lifetime is for the duration of the program, or until de-allocation, whichever is sooner. This will be dealt with later.

• scope
  This is the portion of the source code within which the object is visible. Thus a variable declared within a function is visible (in scope) only within the block of statements that constitute the function body regardless of its storage class. See also Skansholm Chapter 4.3 Declaration, scope and visibility.

There can be different combinations of scope and storage class, e.g. a function local variable can be declared static. The effect is that its visibility (scope) remains limited to the enclosing block (i.e. the function body) but its lifetime continues for the duration of the program's execution.

4. Local duration
Unlike some programming languages (e.g. Pascal and Modula-2), the body of a function may not include the definition of another function. In other words, functions may not be nested in C++ and the only valid definitions appearing within a function are those for data items. Variables defined in a function have the default storage class auto and the formal arguments to the function are also treated as auto. The body of a function is a sequence of declarations and statements surrounded by braces {}. This construct is known as a compound statement or block. Within a function body, any statement may itself be a block. It is logical therefore that such a block, nested within a function body, should be allowed to contain data declarations, and that the scope of those declarations should be the surrounding block as with function local variables. Therefore the sequence of statements that depend on the truth or otherwise of the logical expression in an if statement may be a block that contains declarations whose scope is limited to that block. A block may even consist of just the braces surrounding one or more statements :-

    void swapifless ( int& a, int& b )
    {                                   // function body block
        if ( a < b )
        {                               // if block
            const int temp = a;
            a = b;
            b = temp ;
            {                           // inner block
                int inner = temp;
                cout << a;
            }
            cout << inner << endl;      // error undefined symbol inner
        }
        cout << temp << endl;           // error undefined symbol temp
    }

Function swapifless above could have included a local variable definition int temp (declared before the if statement). This outer temp would have been invisible within the if block because the inner temp would have caused a 'hole' in its scope. This hole would extend for the scope of the if block only. A local variable can, of course, be initialised on definition. This initialisation can be by any expression that is valid at that point, for instance by an expression that contains reference to the formal arguments as above. In the absence of any initialisation, the value of a local auto variable is undefined.

5. Declaration versus definition


A definition of a function is a block of source code that defines the function and its body:-

    void swap ( int& a, int& b )
    {
        int temp = a;
        a = b;
        b = temp;
    }

A declaration of a function is just the header followed by a semi-colon:-

    void swap ( int&, int& );       // prototype

A definition of a variable is a statement that allocates storage with optional initialisation:-

    int count = 0;                  // allocates storage

A declaration of a variable is a notification to the compiler that a variable has been defined in another file, but is being referenced in the current file:-

    extern int count;               // external referencing declaration. Does not
                                    // allocate storage

You will not normally need to make external referencing declarations because our standard practice will be to #include a header file that serves the same purpose (see para 6 below).

6. Static duration
An external referencing declaration for a function is no different in form from the function prototypes with which you are already familiar. It informs the compiler that a function is to be called from a separate file from that in which it is defined. An external referencing declaration for a function is made in the source program file in which the call to the function is to be made, i.e. in the file in which it is not defined. The format is as follows:-

    extern void print( void );      // declares a function that is defined in another file
                                    // extern may be omitted

External referencing declarations are usually made by placing in the main program file a compiler directive to #include a header file that provides the necessary external referencing declarations as explained in paragraph 1.

Variables declared outside of any function - e.g. before function main - have file scope and are referred to as global variables. The C++ compiler guarantees to initialise any global variables to zero, but it is considered good practice to initialise them explicitly. As with any data declaration that uses the same identifier as another object declared in a surrounding block, a local variable causes a hole in the scope of the global variable with the same name - see the example below:-

    #include <iostream>
    using namespace std;            // needed by standard compilers for cout and endl
    int sum;
    int main( void )
    {
        void subroutine( void );    // prototype declaration
        sum = 15;
        subroutine();
        cout << "Global sum is " << sum << endl;
        return 0;
    }
    void subroutine( void )
    {
        float sum = 1.234;
        cout << "Local sum is " << sum << endl;
    }

The global variable sum is distinct from the local variable of the same name in function subroutine. The latter causes a hole in the scope of the global from the point immediately after the definition of float sum. The only variable of that name visible within subroutine is the local one with the value 1.234. As a corollary, float sum is not visible within main because its scope is confined to the function in which it is defined. The program's output is:-

    Local sum is 1.234
    Global sum is 15

It is possible to gain access to a global variable even when it is masked by a local variable of the same name. In function subroutine for instance the global variable sum can be referenced by preceding it with the double colon scope resolution operator which you have already met in e.g. setiosflags( ios::left ):-

    cout << "Local sum is " << sum << endl;
    cout << "Global sum is " << ::sum << endl;

7. Storage class static
Variables that are explicitly given the storage class static may be either local or global. The meaning of static differs depending on whether the declaration appears within a function or outside.

8. Static local variables


The default storage class of variables declared within a function is auto. This means that their scope is confined to the block in which they are declared, and also that their lifetime is the same as that of the block. If a variable declared within a function is initialised with e.g.

    int local = 1;

then the initial value of local will be 1 for every activation of that function. If it is not initialised, then its initial value is undefined. A local variable given the storage class static still has local scope, but retains its value between successive activations of the block in which it is declared.

    void fun1()
    {
        static int staticlocal = 1;
        ...
        staticlocal++;
    }

On the very first occasion that fun1 is called, the value of staticlocal will be 1. But for subsequent calls staticlocal will have the value that it was last given in the body of fun1 e.g. 2 on entry at the second call, 3, 4 etc. in the above example. In other words, staticlocal retains its value across activations of fun1 and occupies storage for the whole of the program's execution.

9. Static global variables


The effect of giving a global variable or function the storage class static is to make it inaccessible to any program unit (i.e. file) other than the one in which it is defined. In other words, it can be accessed by any function in the file in which it is declared, but may not be accessed from any other file, even if an external referencing declaration is given in the other file. The effect of static definitions at the global level in source files that have no function main is to give the programmer of these implementation modules the ability to control the export of both variables and functions. This is a standard requirement of a programming language that supports the separate compilation of modules. A function of storage class static would typically be a support function called by other functions in the same module but required not to be accessible from another module. A global variable would be given the storage class static to prevent access to it from any module other than the one in which it is declared. This is known as data hiding. Items which are explicitly made visible (by declaring them in a header file) are said to be exported from the module. Note that this mechanism could be used to prevent access to the stack and its top-of-stack indicator if the char stack in the previous chapter were to be implemented in a separate file.

10. The C++ pre-processor
This is a simple macro processor that, in the case of GNU C++, constitutes a separate pass by the compiler. It makes a pass over the source file substituting all occurrences of defined identifiers with the token string that represents the macro definition. Thus, if you liked Pascal and also like typing, you could make C++ look more like Pascal by replacing all occurrences of { with BEGIN and all occurrences of } with END, and by providing macros that carry out the conversion back to the C++ convention immediately prior to compilation. (A macro name must be an identifier, so the semi-colon that follows END in Pascal cannot form part of the name.)

    #define BEGIN {
    #define END }
    int main( void )
    BEGIN
        int a = 2, b = 1;
        if ( a > b )
        BEGIN
            int temp = a;
            a = b;
            b = temp;
        END
        return(0);
    END

The macro processor was used extensively in C to produce the effect of inline functions and constant declarations which are now part of the C++ language. Its use in C++ is therefore mostly confined to controlling conditional compilation and the inclusion of header files.

11. Conditional compilation


When developing a complex program, it may be useful to include debugging statements that output the value of certain variables or that indicate at which point in the source code execution is currently being carried out. The output can be directed to a file by using output redirection at the command line. When the program appears to be working correctly, these debugging statements could be deleted from the source. But all too often, it is found that bugs still remain and some or all of the debugging statements have to be reinserted. The inclusion in the compilation of the debugging statements can be controlled by macro conditional statements of the form:-

    #define DEBUG 1         // Macro definition setting DEBUG to true
    ...
    #if DEBUG
        statements1
    #else
        statements2
    #endif

Statements1 and statements2 are actual C++ program statements. The sequence #if DEBUG, #else, #endif can be scattered throughout the source code and will have the effect of including statements1 into the compilation if DEBUG is true, and including statements2 if DEBUG is false.

In order to eliminate the debugging statements, it is only necessary to change the value of DEBUG from true to false (0), and re-compile and link. The GNU C++ IDE allows macro constant definitions to be changed via the menu item Options.Compiler options. To define a macro named DEBUG, go to this menu item and enter -DDEBUG. To undefine it, enter -UDEBUG. A file macro.cpp is installed in the labs for you to try this out. The conditional compilation facility may also be used to generate different versions of a program for different platforms or conditions.

12. Conditional file inclusion


There are two formats for specifying the name of the include file in a compiler include directive. If the header file name is surrounded by angle brackets, a predefined list of specified include directories is searched. If the header file name is surrounded by double quotes, the current directory is searched followed by the specified include directories.

    #include <iostream>         look in the standard include directories
    #include "myheader.h"       look in the current directory first, then the standard include directories

The standard include directories are indicated by operating system path directives that are set up when the system starts, or by values that can be configured from within the IDE. When developing programs that consist of several modules (files) it is normal to supply a header file for each module other than the main module. The main module then requires compiler directives to #include these header files, using the form #include "filename.h". If necessary, the header file may also be included in the compilation of the .cpp file for which it is the header. In cases where header files themselves contain include directives, there is the likelihood that some declarations will be included twice. In those cases, header file inclusion may be made conditional on the existence or otherwise of a macro definition, using #ifndef and #define. Initially, you will not be writing programs whose complexity requires the use of #ifndef and #define so do not worry about them unduly. When the compiler complains that you have multiple definitions of a type, function or variable, you will know that you have hit the problem. Then seek advice.

Data Structures


1. Data Types
Data types can be described in terms of the range of values they may hold and by the operations provided for them. e.g. type int (where it occupies 32 bits) has a range of possible values from -2,147,483,648 to 2,147,483,647, and the provided operations include +, -, *, /, %, ++, +=, >, <=, ==, !=. We have not dealt in any detail with the way in which type int is represented in memory because we do not need to know this in order to use the type. We defined a type Clock to have a range of values representing the times from midnight to 23:59 at intervals of 1 minute. We also provided a small set of operations - gettime, tick and show. We try to follow the principle that the definition of such data types provides all the information another programmer needs in order to use them in his program, but that the representation should be hidden so that it cannot be corrupted. Another reason for hiding the implementation is that it should be possible to change it, e.g. to improve performance. The client program will have to be re-linked with the object code of the new implementation but, provided that the definition is unaltered, no change should be required to the source code of the client program.

2. Abstract Data Types


These are data types that are defined entirely in terms of their set of operations without any consideration of how their values are represented. The domain of values may also feature in their definition, but often it is so large as to make this not useful. There may, in fact, be several different ways of implementing them, each with their own set of advantages and disadvantages. They are often models of objects from the real world or from mathematics, e.g. Sets, Queues and Lists. The implementation should allow a programmer to define new instances of the type, but should prevent access to the representation.

3. Classification
There are two main groups - single entities of which there may be many instances e.g. Clock, and collections (or containers) of many objects of the same type e.g. Set, List etc. The components of these collections may be of any type, but, within one collection, must all be of the same type. Frequently, part of the definition of a collection is the relationship between the members.

4. Categories of Collection
The broad categories are:-

• Collections in which there is no relationship between the members except that, in the domain of all possible values that may be members, each is either a member or is not, e.g. Set and Bag.
• Linear structures in which the members have a one to one relationship with each other.
• Hierarchical structures (trees) in which the members have a one to many relationship with each other.
• Graphs - where the members have a many to many relationship.

[Figures: Set, Linear, Hierarchical (Tree), Graph]

5. Stacks
Definition
This is the simplest of the linear collection types since the number of operations is typically small. As with all containers, the components may be of any type, but must be of the same type within any one stack. Additions to, and removals from the stack are made at one end only - the top. Access to components is limited to the item currently at the top. The consequence of this relationship between members is that the first item to be added is the last to be removed. This is known as a LIFO structure - last in, first out. Stacks are very widely used in Computer Science. When a function is called, a stack frame is built containing the address to which control must return when the function has finished execution. In addition, space is reserved in the stack frame for any auto local variables and for the values of any actual arguments passed to the function. This structure is pushed onto the system stack. When the function terminates, the stack frame is popped from the stack, causing the arguments and local variables to perish. Another application is recording the path taken through a structure so that it can be retraced - the 'Hansel & Gretel' effect.
    int funa ( int y ) { return ( y * 2 ) ; }
    int funb ( int z ) { return ( funa ( z ) / 2 ); }
    int func( int a ) { return ( funb( a ) ); }
    int main (void )
    {
        int x = 4, y;
        y = func( x );
    }

                                funa
                        funb    funb    funb
                func    func    func    func    func
        main    main    main    main    main    main    main

Stack frames for the above code (top of stack uppermost, time running from left to right)

The classic operations are:-

    push     push a new item onto the stack
    top      retrieve the top of stack item without removing it
    pop      remove the top of stack item
    empty    test if the stack is empty

Viewed as an abstract type, a stack cannot be full, but the actual implementation may have to place a limit on the number of items that can be held on the stack. This gives rise to a further operation

    full     test if the stack is full

Operations on abstract data types can typically be categorised into those that:-
• change the state of the data type e.g. push, pop
• report on the state of the data type without changing it e.g. top, empty, full.
• create and/or initialise an instance of the type - no example here

Each operation is provided with a pre-condition and post-condition that states
i) pre - any requirement placed on the caller as to the state of the structure prior to the call, or on the values passed as arguments; for instance, top and pop must not be called on an empty stack.
ii) post - the state of the structure that is guaranteed to hold after the operation has been carried out, provided that the pre-condition has been met; for instance, after a push, the number pushed is at the top of the stack.

The definition of a stack of integers can be placed in a header file which is then available for importing (using #include "intstack.h") by any client program requiring it:-

    // intstack.h
    // definition of a stack of integers

    void push( int arg );
    // pre - !full()
    // post - stack contains the value of arg, top() = arg

    void pop( void );
    // pre - !empty()
    // post - top() has been removed

    int top ( void );
    // pre - !empty()
    // post - stack is unchanged, the item at the top of the stack has been returned

    bool empty();
    // pre - none
    // post - returns TRUE if stack is empty, otherwise FALSE

    bool full();
    // pre - none
    // post - returns TRUE if stack is full, otherwise FALSE

Representation
The obvious first choice for representing a stack is an array, although this has the disadvantage that an upper limit for the number of items to be stored must be chosen before compiling, and this cannot be varied at run-time. This representation should be hidden from a user of the stack by specifying the storage class static.

    // intstack.cpp
    // representation and implementation of a stack of integers
    #include "intstack.h"
    const int MAX_STACK = 10;       // the maximum number of items that can be stored
    static int data[MAX_STACK];     // the container for the stack members
    static int Top;                 // the index of the top item.
    // Top will need to be initialised on startup, incremented
    // before pushing a new member, and decremented after
    // popping a member.
    // When Top = MAX_STACK - 1, the stack is full

Implementation of the operations
This is left as an exercise. The full definition of the functions would be placed after the global data definitions in intstack.cpp. Note that intstack.cpp contains an include compiler directive for the header file. intstack.cpp would contain only the data declarations shown above and the function definitions. There must be no function main.

Using the stack
A client program wishing to use the integer stack would import the definition (i.e. #include "intstack.h") and then carry out operations on it as though it had been defined in the same file. Because of the static qualifiers used for the array definition data and the integer variable Top, the client program cannot access the representation directly even if extern declarations are made for these two items in the client's source code. const MAX_STACK also cannot be accessed because of its const qualifier.

    #include <iostream>
    #include "intstack.h"
    using namespace std;            // needed by standard compilers for cout and endl
    int main( void )
    {
        // push some items
        cout << endl << endl;
        while( !full())
        {
            static int item = 0;
            push( ++item );
            cout << "pushing " << item << endl;
        }

Now an attempt to access the stack variables directly - causes linker errors:-

        Top = -1;                   // Linker error undefined symbol _Top - defined as static
        cout << "MAX_STACK = "      // Linker error undefined symbol
             << MAX_STACK << endl;  // MAX_STACK is const in intstack.cpp

        // pop them
        while ( !empty() )
        {
            cout << "popping " << top() << endl;
            pop();
        }
        return 0;
    }

6. Abstract Data Type?
Can this implementation of a stack of integers be classed as an abstract data type? It has been defined in terms of its set of operations. It is encapsulated by being placed in separate files and its representation is hidden from its clients - its state can only be altered through the supplied operations. But only one stack can exist at any one time in any one client program. The client cannot declare instances of the type by e.g. IntStack astack, bstack; This is clear since there is no mechanism provided by the stack module for specifying on which stack the operations are to be carried out - there is only one. This single instance of an encapsulated type is sometimes referred to as an abstract state machine and is simple to implement and useful when only one instance of the type is required at any one time. Later, we will see how a true abstract data type can be defined of which as many instances may be created as the client program requires.

7. Queues
A queue follows closely the real-world example. Operations are permitted at both 'ends' with additions (enqueue or append) being made at the tail and removals (serve or remove) being taken from the head. Effectively, the elements are ordered physically according to the time of their arrival. It is known as a FIFO structure - first in, first out. Typical operations are:-

    append or enqueue    add an element at the tail
    serve or remove      remove an element from the head
    size                 return the length of the queue
    empty                query whether the queue is empty
    full                 query whether the queue is full

Implementation
Again, an array implementation is considered. We need two integers to indicate the head and tail of the queue and possibly a further integer to record the size (although this can be computed from head and tail).

    const int MAX_QUEUE = 10;
    static char queueitems[ MAX_QUEUE ];        // A queue of characters
    static int head = 0, tail = -1, count = 0;

Initially, the indicator (technically cursor) tail is set to a special value to indicate the empty state. The head of the queue can be viewed as being at the 'left hand' or 'bottom' of the array, while the tail grows 'right' or 'up' the array as items are appended.

[Figure: the contents of the array and the positions of head and tail after each of the steps 1. Empty, 2. append('A'), 3. append('B'), 4. ch = serve(), 5. append('C')]

The problem with this method of handling the array is that as items are appended and served, the queue moves up the array, and will eventually bump up against the end when, in fact, there may be space available lower down caused by elements being removed from the head e.g. A in this case. One solution is to slide all items in the queue down the array once the tail has reached the top, but data moves are relatively expensive - particularly if the queue elements are large. A satisfactory solution is to view the array as circular so that the first element follows on immediately after the last. Spare space in the array caused by removals will always be available for use as long as the number of elements remains below MAX_QUEUE. Instead of simply incrementing head on each removal, and tail on each append, these two cursors must be taken modulo MAX_QUEUE each time they are incremented. Thus, if e.g. tail is presently 9, and a further element is appended, tail becomes ( 9 + 1 ) % 10 = 0, and the newly arrived element is inserted at array element 0.

[Figure: the array viewed as a circle, holding a queue with count = 6, with the head and tail cursors marked]

    void enqueue( char element )
    {
        tail = (tail + 1) % MAX_QUEUE;
        queueitems[tail] = element;
        count++;
    }

The simplest way of implementing the test for full and empty is to maintain the size of the queue in a variable (e.g. count) within the queue module. As with all data structures based on an array, the storage space is fixed at compile time and the number of items that can therefore be stored is bounded. This inflexibility means that arrays can only be used in cases where the maximum number of components can be determined in advance.

8. Lists
Basically a list is a sequence of elements, each element other than the first and the last having a predecessor and a successor. Another way of expressing this is that a list is
• either empty
• or consists of an element followed by a list.

This is known as a recursive definition. The elements may be ordered:-
• by their time of arrival, i.e. each successive addition is placed after the previous last, or
• inversely by their time of arrival - each element is inserted before the previous in a similar way to a stack, although access may be allowed to any element.
• by some quality of the data e.g. a list of names ordered alphabetically.
• by requesting insertion at the 'current' position as indicated by some cursor.

Again, an array is considered as the method of representation. However, we find that there is a high cost involved where insertion and deletion is permitted other than at the ends. Each insertion within the list will require all elements following it to be moved up the array to make room, and, since there can be no null elements, each deletion will require all following elements to be moved down to close the gap. The time required to carry out these moves makes this method of representation less than optimal. There are more efficient and flexible ways of implementing lists in cases where insertions and deletions are permitted within the list.

9. Structs
Frequently there is a need to store information about an entity under a single name where the information describing that entity involves different data types. The struct is an aggregate type that provides this facility:-

    struct student              // student is a type, not a variable.
    {
        char name[30];
        int age;
        char coursecode[6];
    };                          // note the semi-colon
    student courserep;          // courserep is one student

Each separate data item within the structure is referred to as a data member. Once the new type student has been declared, a collection with that component type can be defined.

    student aclass[16];         // aclass is an array of 16 students

Access to the members of a struct is by dot notation:-

    strcpy( courserep.name, "William Brown" );  // simple assignment not allowed
    courserep.age = 21;
    strcpy( courserep.coursecode, "mit96" );
    cout << courserep.name << endl
         << courserep.age << endl
         << courserep.coursecode << endl;

A queue of students could be declared as:-

    const int MAX_QUEUE = 16;
    static student stuqueue[ MAX_QUEUE ];       // A queue of students
    static int head = 0, tail = -1, count = 0;

10. Unions
This is similar to the struct in that it can hold one or more items of different types. It differs from struct in that it can hold only one of its components at any one time. The compiler allocates storage for the largest of the specified members and all members are overlaid onto the same storage. In other programming languages this type is usually known as a variant record. There are two main uses for unions.
• In cases where different instances of the same entity may have different characteristics, i.e. they are described by a different set of variables. This might arise in a collection of students where part-time students require a record of their employer whereas full-time students do not.
• In low level programming when a location in memory may be viewed as two different sets of data, e.g. either two separate integer values or a long integer.

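Example:

    typedef short TwoInts[2];
    union cheat
    {
        TwoInts twoints;
        long along;
    };
    cheat x;
    x.twoints[0] = 255;
    x.twoints[1] = 1;
    cout << x.along << endl;

Output (assuming a 16-bit short, a 32-bit long and a machine that stores the low-order bytes first, so that the value is 255 + 1 * 65536):

    65791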

Dynamic Data Structures


1. Structures
These were introduced in the previous chapter. The type name in C and C++ is struct, but in most other languages they are known as Records. They are particularly useful for modelling real-world objects that are described by a set of attributes (data values). The syntax is

    struct type-name { list-of-members };

This is a type definition and does not allocate storage. It introduces a new type that can be used subsequently in definitions of variables whose type is type-name. Examples:-

    struct Date
    {
        int year;
        int month;
        int day;
    };
    ...
    Date today, his_birthday;

    struct Person
    {
        char name[20];
        Date birthdate;
        char address[4][20];
    };
    ....
    Person Fred, Jane;

    struct Student
    {
        Person personaldata;
        char tutorGrp;
        int modulemarks[9];
    };
    ...
    Student mscit[40];

These examples illustrate several things about the data type struct.
• The members (referred to as fields in other languages) may be of the same type, or of different types.
• There is no limit to the number of members, but large records can be built up from other struct types, for instance, type Person has a field birthdate which is itself a struct type (Date).
• The members may be of any type, including arrays (and other structs).
• The type name can be used in declarations of arrays whose elements are of struct type, e.g. mscit is an array of 40 elements, each of whose data type is Student. Each Student has a data member called personaldata of type Person; a tutorGrp of type char; and an array of 9 elements of type int called modulemarks.
• The type-name appearing after the reserved word struct is known as the structure tag. It is desirable that this name (e.g. Date, Person, Student) be unique within its own scope.

As you can see, structures can be used in combination with other structures and with arrays to create arbitrarily complex types capable of modelling many real-world entities.



2. Comparison between structs and arrays
Component data type
The elements of an array must all be of the same type, whereas structs may contain data members of different types.

Assignment
An array may not be assigned to another array, because an array name is treated as a constant pointer to its first element, whereas the use of a structure variable name accesses the whole structure. The consequences of this are important: variables of structure type may be assigned to other variables of the same type. The effect of assignment is to copy all of the fields from the source structure to the target structure (including each element of any array members of the structure). Thus we could write

    Jane = Fred;
or
    mscit[1] = mscit[2];

Function arguments and return
Structure arguments are, by default, passed to a function by value (not as a pointer, as is the case with arrays). However, a reference argument may be used to reduce the cost of copying large structures and/or to enable any changes to the structure to be reflected in the actual argument. If the objective is to eliminate the cost of copying large structures when a function is called and it is not the intention to modify the structure within the function, then the formal reference argument can be const modified, e.g.

    void printDate( const Date& aDate )
    {
        cout << aDate.day << '/' << aDate.month << '/' << aDate.year << endl;
    }

There is no intention to change the value of the argument aDate since it is only being output. However, to reduce the cost of copying the actual argument into the formal argument, the formal argument is made a reference to the actual argument (Date&). Copying a reference involves only a few bytes. A function may return a structure or a reference to a structure as its result. Example:

    Date changeDate( Date aDate )
    {
        aDate.year++;
        return aDate;
    }
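The three ways of passing a structure described above can be compared side by side. The following is a minimal sketch rather than part of the original notes: the function names nextYear and advanceYear are invented for illustration, and std:: prefixes are used so that it compiles with a standard-conforming compiler.

```cpp
#include <iostream>

struct Date
{
    int year;
    int month;
    int day;
};

// const reference: no copy is made and the function cannot modify the caller's Date
void printDate(const Date& aDate)
{
    std::cout << aDate.day << '/' << aDate.month << '/' << aDate.year << std::endl;
}

// by value: aDate is a private copy, so the caller's Date is left unchanged
// (hypothetical name, equivalent to changeDate in the text)
Date nextYear(Date aDate)
{
    aDate.year++;
    return aDate;
}

// non-const reference: the change is made directly to the caller's Date
// (hypothetical name)
void advanceYear(Date& aDate)
{
    aDate.year++;
}
```

Calling nextYear(d) leaves d untouched and returns a modified copy, whereas advanceYear(d) changes d itself.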

Access to components
Elements of an array can be accessed by subscripting the array name as in the example above. The subscript can be a variable that is modified within a loop (c.f. the Plane example). This allows computed random access to any array component. The members of a struct, on the other hand, are accessed using dot notation, i.e. the structure variable name followed by a dot followed by the member name. The dot is known as the structure member operator. If the member is itself a structure and access is required to its members, then further dots are required to tunnel down through the member hierarchy, viz.



    Fred.name;
    Fred.birthdate.day;
    mscit[10].personaldata.birthdate.year;
    mscit[20].personaldata.address[1];
    mscit[30].modulemarks[2];   // the mark of the student at index 30 for the
                                // module at index 2 (subscripts start at 0)

Pointers to structures
If a structure is referenced by a pointer then the de-referencing operator applied to the pointer provides the access:

    Date* dptr = &today;   // dptr is a pointer to Date and points to the Date today
    Date dt = *dptr;       // dt is assigned the value of today by dereferencing the
                           // pointer dptr

However, the structure member operator (dot) has a higher precedence than the dereferencing operator (*). So access to a member of today via the pointer dptr must use parentheses to resolve the precedence:

    cout << (*dptr).year;  // displays the year member of today via the
                           // pointer dptr

This type of access is frequently required and the syntax is rather clumsy. A further operator is provided for this purpose - the structure pointer operator ->. This does two things - it dereferences the pointer to access the whole structure, and then accesses the member given after the operator (year in this example).

    cout << dptr->year;

Initialisation
As with arrays, structures may be initialised at the time they are defined, e.g.

    Date his_birthday = { 1995, 11, 15 };
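As a small illustration of access through a pointer, the sketch below wraps the -> operator in two functions. The function names yearOf and setDay are assumptions made for this example only.

```cpp
struct Date
{
    int year;
    int month;
    int day;
};

// Reads a member through a pointer; dptr->year means the same as (*dptr).year
// (hypothetical helper)
int yearOf(const Date* dptr)
{
    return dptr->year;
}

// Writes a member through a pointer, so the Date that dptr points to is changed
// (hypothetical helper)
void setDay(Date* dptr, int day)
{
    dptr->day = day;
}
```

Given Date today = { 1999, 9, 1 }; the call setDay(&today, 30) changes today.day, because the function works on the original structure rather than on a copy.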

3. Storage Management
So far we have only been able to use data items that have been defined at compile-time. Thus, an array defined in the source code of a program as:

    int table[100];

will hold 100 integers and, if the requirements of the program exceed this number of elements, then the excess cannot be handled. Clearly this is unsatisfactory. The programmer cannot predict the demands that will be made on the program when it is being used by a client. What may have seemed a generous estimate when the program was written might soon turn out in practice to be a ludicrous under-estimate. What is more, if the estimate is indeed generous, then a large amount of storage space remains unused and therefore wasted because it cannot be used temporarily by other data items.

An example is a windowing system like MS Windows. The programmers of Windows could not possibly have worked on the assumption that the number of open windows should never exceed a certain fixed limit. Since that code was written, the memory installed in the average PC has at least doubled, redoubled, and redoubled again. To have fixed this limit 3 or 4 years ago would have put all users in a straitjacket which would now appear intolerable.

So how can we create and delete data items dynamically at run-time in response to the demands of the application program? By using the memory allocation and de-allocation operators new and delete. The use of these operators is closely bound up with pointers, and equivalent facilities are to be found in most of the conventional programming languages such as Ada, Pascal, Modula-2 and C.


3.1 new
The syntax is new type-name [number-of-elements], where [number-of-elements] is optional and is used when a dynamically allocated array is required. Examples:

    int* intptr = new int;
    char* chptr = new char[20];

The first statement allocates from the heap a chunk of memory sufficient to hold one integer and sets the pointer to integer intptr to point to this memory location. The heap, or free store, is the name given to that part of available random access memory that is not currently occupied by program code and ordinary program variables. The second statement allocates sufficient memory from the heap to accommodate an array of 20 characters and sets chptr to point to the first.

3.2 De-referencing the pointers
Notice that the new data items are anonymous - they have no name. This is not surprising since the compiler is responsible for associating variable names with memory locations and the compiler did not know whether or not we would execute these two statements at run time - they may be encountered only if the user selects a particular menu option. Access to the newly allocated data items is obtained only via a pointer that points to them:

    *intptr = 99;
    cout << "intptr points to " << *intptr << endl;
    strcpy( chptr, "Hello, Hello!!!!!!!");
    cout << " and chptr points to " << chptr << endl;

In the assignment and output statements, intptr needs de-referencing to produce the value of the integer to which it points. chptr, on the other hand, does not require de-referencing since we want the whole array to be assigned or output rather than just the single character to which chptr points. This treatment is analogous to that of an array name.

3.3 delete
The delete operator has two forms: without brackets for single data items, and with brackets for arrays. Note that, whereas the form of new requires the brackets to be placed after the type name:

    char* chptr = new char[20];

the syntax of delete requires the brackets to be placed after delete:

    delete intptr;    // de-allocate memory occupied by int pointed to by intptr
    delete[] chptr;   // de-allocate memory occupied by string pointed to by chptr

The effect of delete is to return to the heap the memory referenced by the pointer (intptr and chptr in the above examples) and not to delete the pointer itself. After this, it is an error to attempt to de-reference these pointers in order to access the item they previously referenced.
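The pairing of new with delete, and of new[] with delete[], can be seen in a short sketch. The function names duplicate and storeAndRelease are invented for this illustration and do not appear elsewhere in these notes.

```cpp
#include <cstring>

// Allocates a copy of a C string on the heap (array form of new).
// The caller owns the result and must release it with delete[].
char* duplicate(const char* text)
{
    char* copy = new char[std::strlen(text) + 1];   // +1 for the terminating '\0'
    std::strcpy(copy, text);
    return copy;
}

// Allocates a single int, uses it through the pointer, then releases it
// (single-item forms of new and delete). Returns the value that was stored.
int storeAndRelease(int value)
{
    int* intptr = new int;
    *intptr = value;
    int result = *intptr;
    delete intptr;        // intptr itself still exists but must not be de-referenced now
    return result;
}
```

A caller would write char* s = duplicate("Hello"); and later delete[] s; using the bracketed form because the memory was allocated as an array.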

3.4 Lifetime
The lifetime of objects allocated by new is from allocation to the earlier of de-allocation (via delete) or termination of the program. Notice that lifetime may be different from scope. If a pointer providing access to a dynamically allocated item goes out of scope (perhaps because it is a local function variable and the function terminates) then the dynamic data item continues to exist, but is inaccessible. This is known as memory leakage. If it happens often enough, the program could run out of memory even though not all of it is being used. Local function variables can be used for allocating dynamic data items, but it is necessary to ensure that, before the function terminates, some other pointer that will continue in scope is set to point to the item.

    int* makenewtable( int size )
    {
        int* intptr = new int[size];
        return intptr;
    }

Since the function returns a pointer to integer, the result of the function call will be assigned to some other pointer to integer and access will not be lost by the demise of intptr:

    int* newtable;
    newtable = makenewtable( 20 );

If there is insufficient memory available on the heap when new is called, the allocation fails. In early (pre-standard) C++, new then returns the special pointer value 0. In standard C++, plain new instead throws a std::bad_alloc exception, and it is the form new (std::nothrow), declared in the header <new>, that returns 0 on failure. A pointer value of 0 means that the pointer does not point to anything. When building dynamic data structures, 0 is frequently used as a pointer value to indicate that no link exists between components of the structure.

    int* intptr = new (std::nothrow) int[ size ];
    if ( intptr == 0 )
    {
        cout << "Error, insufficient memory " << endl;
        exit(1);
    }

The size argument to new permits the size of a dynamically allocated array to be determined at run-time. This can be used to get over the fixed size problem of arrays. The array is allocated on start-up with, say, 10 elements. When it becomes full, makenewtable is called with an argument of, say, double this (i.e. 20). The contents of the original array are copied into the newly allocated one, and the old array is then deleted. Next time the array becomes full, makenewtable is called with an argument of 40, and the copying is done again. In this way, the effect of a dynamically resizeable array can be obtained. However, during this doubling process, there is a temporary requirement for additional memory that might cause memory exhaustion.
Also, the requirement that the old data be copied into the newly allocated table is relatively costly in terms of time, and it is therefore advisable to minimise the number of resizing operations wherever possible - this is the reason for doubling the size on each resize.
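The doubling scheme just described might be sketched as follows. This is an illustration under stated assumptions rather than code from the notes: the Table type and the functions initTable, append and destroyTable are hypothetical names, and the allocation is done inline instead of through makenewtable.

```cpp
#include <cstddef>

// Hypothetical growable table of ints
struct Table
{
    int* items;             // points to the dynamically allocated array
    std::size_t size;       // number of elements in use
    std::size_t capacity;   // number of elements allocated
};

void initTable(Table& t, std::size_t capacity)
{
    t.items = new int[capacity];
    t.size = 0;
    t.capacity = capacity;
}

// Adds value at the end, doubling the allocation first if the table is full
void append(Table& t, int value)
{
    if (t.size == t.capacity)
    {
        std::size_t newCapacity = (t.capacity == 0) ? 1 : t.capacity * 2;
        int* bigger = new int[newCapacity];     // old and new arrays briefly coexist
        for (std::size_t i = 0; i < t.size; ++i)
            bigger[i] = t.items[i];             // copy the old contents across
        delete[] t.items;                       // release the old array
        t.items = bigger;
        t.capacity = newCapacity;
    }
    t.items[t.size++] = value;
}

void destroyTable(Table& t)
{
    delete[] t.items;
    t.items = 0;
    t.size = 0;
    t.capacity = 0;
}
```

Starting from a capacity of 10, appending 25 values triggers two resizes (to 20 and then to 40), which shows why doubling keeps the number of copying operations small.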



4. Dynamic Data Structures - Linked Lists
A list is a sequence of data items, each item other than the first and the last having a predecessor and a successor. A more elegant definition using recursion (and one which can be realised in most programming languages) is:

- a list is either empty, or
- consists of a head representing a single data item followed by a tail which is a list of data items.

Lists may be implemented using arrays but dynamic memory allocation is more flexible in that the list may grow and shrink in response to the demands of the application. The list should be viewed as a series of nodes, each node containing some data and a link to the next node. The link is a pointer to a node, and the node is most usefully implemented as a struct.

[Figure: Linked List. A list header holding first, last and count, and a chain of Nodes, each with a data member and a link member pointing to the next Node.]

For simplicity, a list of integers will be illustrated, but the data contained in a node (struct) may be as large or as complex as the application requires. The node is therefore defined as:

    struct Node
    {
        int data;
        Node* link;
    };

Each node therefore consists of a data field (in this case an integer) and a pointer to the next node. The list itself can be implemented as a structure containing links to the first and last nodes in the list, and a count of the number of nodes. These links are, again, of type pointer to Node. If the list is empty, then the links to the first and last nodes are given the special value 0 referred to above. The same principle will be applied to the link member of the last node in the list since it will have no successor:

    struct LinkList
    {
        int count;
        Node* first, * last;
    };

The operations for a list are much less closely prescribed than those for stacks and queues since it is a more general structure and access may be provided at any point. There are also several possibilities for the ordering of the nodes. For simplicity, therefore, the example shown below will add new items to the end of the list, and remove items from the front. This is therefore, in effect, a queue.



4.1 List Cursors
Sometimes a list is provided with an internal cursor that can be moved about by making calls to appropriate functions. At any time, additions may be made at the position indicated by this internal cursor, and also deletions provided the list is not empty. The addition of a cursor (a pointer to Node) and the operations to move it are left as an exercise.

It is not usual to provide a print function for a data structure since it ties it to a particular I/O regime which may not be appropriate for all applications or on other platforms. However, it is sometimes useful for debugging purposes and one is included here to demonstrate a traversal of the list.

These example operations all have a reference to a list as one of their arguments. This allows the client program to declare several lists, and to specify via the actual argument on which list the operation is to be carried out.

4.2 Initialising the list

    void init( LinkList& t )
    {
        t.count = 0;   // count of elements = 0
        t.first = 0;   // pointer to first element does not point to anything
        t.last = 0;    // pointer to last element does not point to anything
    }

4.3 Creating a new node

    // Function that returns a pointer to a Node. This function is private to
    // the list module (it does not appear in the header file and is declared
    // with storage class static to prevent access by a client).
    static Node* newnode()
    {
        Node* n = new Node;   // allocate memory from the heap sufficient to
                              // accommodate a Node and store a pointer to it in n
        return n;             // return the pointer as the function's result
    }

4.4 Checking for empty

    // The argument is const because the list is not changed by this function.
    // post: returns true if the list is empty, false otherwise
    bool empty( const LinkList& t )
    {
        return (t.count == 0);
    }



4.5 Adding a new item to the list
    // post: item is added at end of list
    void add( LinkList& t, const int item )
    {
        Node* n = newnode();     // create a new node dynamically by calling
                                 // function newnode
        n->data = item;          // put incoming data into the data member of
                                 // the new node
        n->link = 0;             // and set its link member to point to nothing
        if (empty(t))            // special action required if empty
            t.first = n;         // set first to point to the new (first) node
        else
            t.last->link = n;    // set link member of last node to point to
                                 // the new node
        t.last = n;              // set 'last' member of the list to point to
                                 // the new (last) node
        t.count++;               // increment the count
    }
[Figure: adding a node to a list of 2 Nodes. Node* n = newnode() allocates a Node (data, link) from the heap; then n->data = item; n->link = 0; a) t.last->link = n; b) t.last = n; c) t.count++. Afterwards the LinkList has a count of 3 and last points to the new third Node.]



4.6 Removing an item from the list
    // pre : the list is not empty()
    // post: the first item in the list has been removed and its data returned
    int remove( LinkList& t )
    {
        int tempdata = t.first->data;   // save the data in the node pointed to
                                        // by first for return
        Node* tempnode = t.first;       // save the first node for deletion
        t.first = t.first->link;        // reset first to point to the next node
                                        // after first
        if (t.first == 0)               // the list is now empty, so last must
            t.last = 0;                 // not be left pointing at the old node
        t.count--;
        delete tempnode;                // recover memory for the old node
        return tempdata;                // return the saved data
    }
[Figure: removing the first node from a list of 3 Nodes. a) int tempdata = t.first->data; b) Node* tempnode = t.first; c) t.first = t.first->link; d) t.count--; e) delete tempnode. Afterwards the LinkList has a count of 2 and first points to the former second Node.]



4.7 Printing the list
    void printlist( const LinkList& t )
    {
        Node* temp = t.first;              // temp points to first node
        while ( temp != 0 )                // while list not completely traversed
        {
            cout << temp->data << endl;    // output the data member of the node
                                           // pointed to by temp
            temp = temp->link;             // move pointer forward one node
        }
    }

4.8 Searching the list

    // post: returns true if target is in the list, else false
    bool found( const LinkList& t, const int target )
    {
        if (empty( t ))
            return false;
        Node* temp = t.first;
        do
        {
            if ( target == temp->data )
                return true;               // return true if found
            temp = temp->link;             // else move to next node
        } while ( temp != 0 );             // while not at end of list
        return false;                      // not found
    }
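The list operations above can be gathered into one compilable unit to see how they work together as a queue. The sketch below is a condensed restatement, not the module as the notes would distribute it: newnode is folded into add, found uses a for loop, and clear is a hypothetical extra helper that releases every node so that no memory is leaked.

```cpp
struct Node
{
    int data;
    Node* link;
};

struct LinkList
{
    int count;
    Node* first;
    Node* last;
};

void init(LinkList& t) { t.count = 0; t.first = 0; t.last = 0; }

bool empty(const LinkList& t) { return t.count == 0; }

// post: item is added at the end of the list
void add(LinkList& t, int item)
{
    Node* n = new Node;
    n->data = item;
    n->link = 0;
    if (empty(t))
        t.first = n;
    else
        t.last->link = n;
    t.last = n;
    t.count++;
}

// pre: the list is not empty; post: the first item is removed and returned
int remove(LinkList& t)
{
    Node* old = t.first;
    int data = old->data;
    t.first = old->link;
    if (t.first == 0)       // list became empty: do not leave last dangling
        t.last = 0;
    t.count--;
    delete old;
    return data;
}

// post: returns true if target is in the list, else false
bool found(const LinkList& t, int target)
{
    for (const Node* p = t.first; p != 0; p = p->link)
        if (p->data == target)
            return true;
    return false;
}

// Hypothetical helper, not part of the notes: removes (and frees) every node
void clear(LinkList& t)
{
    while (!empty(t))
        remove(t);
}
```

Items come out in the order they went in: after add(q, 5); add(q, 8); the first call to remove(q) returns 5.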

5. Other dynamic structures


The ability to create storage space for data dynamically at run time in response to the requirements of the application and to link these data items together by means of a pointer or pointers allows us to represent a wide range of structures of arbitrary complexity. Thus we can model stacks, queues, priority queues, lists, ordered lists, lists of lists, trees, graphs etc. The object-oriented features of the language that we shall be studying in the second Semester enable us to design data types as classes of object that represent these data structures. There are a number of books available that provide examples of these data structures and the algorithms to process them.

Sorting
1. Introduction
There are two main types of sorting - sorting arrays held in random access memory, and sorting files. In the early period of computing, file sorting tended to be dominant because RAM was very expensive and mass storage was held on magnetic tape, access to which is sequential. In contrast, magnetic disk storage provides the possibility of accessing file records by reference to their position in the file.

2. Components of Sorting
Sorting involves rearranging the elements so that they are in order. This, in turn, consists of two operations:

- Comparing elements - usually by reference to a key field
- Moving elements - usually by swapping pairs of elements

There are normally many more comparisons than moves and the number of comparisons will be the most significant operation in terms of time, and therefore the prime indicator of the efficiency of a sorting algorithm.

3. Sorting Files
Database systems are now universal, and file sorting has become less important. Instead, a number of different indexes are held - either within the data file, or as separate files - that allow the data file to be read (and output) in different orderings. If the amount of RAM permits it, and indexes are not supported, then the fastest way of sorting a file is to read it into an array, sort the array and write the data back out to file. If the file is too big, then it can be broken up into chunks, each of which is sorted in an array and written out to a separate file. Then the several ordered files are merged back into a single file. The traditional file merge requires only 2 elements of the file to be in memory at any one time and works as follows:

- split the original file into two new files, writing 1 item to each new file alternately. Then merge back into the original file in pairs, creating n/2 runs of 2 items per run
- split the original file into 2, writing 2 items to each file alternately. Then merge back into the original file in quadruples, creating n/4 runs of 4 items per run
- split the original file into 2, writing 4 items to each file alternately. Then merge back into the original file in octuples, creating n/8 runs of 8 items per run
- etc.

The sort has finished when the original file contains 1 run of n items. The following is a simplified example based on a file of 8 items. The principle is exactly the same for any number of items.

73

Sorting
    Pass  Description                                         Files
    ----  --------------------------------------------------  -------------------------
          Original file                                       5 8 3 6 7 2 4 1
    1     Split into 2 files consisting of 1 item from the    file 1: 5 3 7 4
          original file written alternately                   file 2: 8 6 2 1
          Merge the two files by comparing 1 item from each   5 8, 3 6, 2 7, 1 4
          file and writing the smaller then the larger into
          the original file, giving 4 runs of 2 items
    2     Split into 2 files consisting of 2 items from the   file 1: 5 8, 2 7
          original alternately                                file 2: 3 6, 1 4
          Merge the two files in groups of 2 items from each
          file, giving 2 runs of 4 items (detail below)       3 5 6 8, 1 2 4 7
    3     Split into 2 files consisting of 4 items from the   file 1: 3 5 6 8
          original alternately                                file 2: 1 2 4 7
          Merge the 2 files in groups of 4 items, giving 1    1 2 3 4 5 6 7 8
          run of 8 items. The file is now sorted

Detail of the merge in pass 2:

    Step        In memory          Original file after the write
                (file 1, file 2)
    ----------  -----------------  -----------------------------
    2.1 Run 1   5, 3               3
                5, 6               3 5
                8, 6               3 5 6
                8                  3 5 6 8      (only 1 item remaining from this run, write it)
    2.2 Run 2   2, 1               3 5 6 8, 1
                2, 4               3 5 6 8, 1 2
                7, 4               3 5 6 8, 1 2 4
                7                  3 5 6 8, 1 2 4 7   (only 1 item remaining from this run, write it)

Note that:

- There are only 2 elements from the file present in memory at any one time
- The process is dominated by I/O time
- The number of passes required to sort the original file is log2 n

    n            Passes
    ---------    ------
    8            3
    64           6
    512          9
    4,096        12
    32,768       15
    262,144      18
    2,097,152    21
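The run-doubling idea can be tried out in memory. The sketch below is only an analogy to the file merge: it assumes a std::vector in place of the files, uses the standard library's std::inplace_merge to merge adjacent runs rather than reading two items at a time, and the function name mergeSortPasses is invented for the example. It returns the number of passes so that the log2 n behaviour can be observed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sorts data by merging adjacent runs of length 1, then 2, then 4, ...
// Returns the number of passes that were needed.
int mergeSortPasses(std::vector<int>& data)
{
    int passes = 0;
    const std::size_t n = data.size();
    for (std::size_t run = 1; run < n; run *= 2)
    {
        // merge each pair of neighbouring runs [lo, mid) and [mid, hi)
        for (std::size_t lo = 0; lo + run < n; lo += 2 * run)
        {
            std::size_t mid = lo + run;
            std::size_t hi = std::min(lo + 2 * run, n);
            std::inplace_merge(data.begin() + lo,
                               data.begin() + mid,
                               data.begin() + hi);
        }
        ++passes;
    }
    return passes;
}
```

For the 8-item file used in the example, the function makes 3 passes, in line with the table above.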

4. Why sort?
- Sorting is used to optimise searching for and retrieving data either by humans or by the computer
- To produce a report which, because it is sorted, simplifies the manual retrieval of information
- To make more efficient searches for items held in either main memory or external storage

5. Does it pay to sort?
Sorting carries an overhead for

- time
- memory for the code
- memory for temporary data

For very small amounts of data, sequential searching may be sufficiently fast to avoid the need for sorting. But a simple sorting technique can be employed for low data volumes, needing little overhead.

6. What is the best sort?
Different sorting techniques have different strengths and weaknesses depending on:

- The number of items to be sorted
- Whether the items are:
    - already ordered, or nearly so
    - in random order
    - already inversely ordered, or nearly so
- The amount of additional storage required:
    - Permanent: for the code which implements the sort
    - Temporary: local variables; an explicit stack; additional space on the system stack for stack frames if a recursive algorithm is used
- The number and size of data items required to be moved

7. Sorting efficiency
We are not usually concerned with the absolute amount of time required for a sort. But we are concerned with how the time t taken for a sort varies with the number of items n required to be sorted. If there is a linear relationship, then t will vary directly with n, i.e. it will be O(n). But no general-purpose sort that works by comparing keys can be O(n): any such sort needs on the order of n.log n comparisons in the worst case.

If t varies as a function of n^2 then an increase in n by a factor of, say, 10 will increase t 100 times, and increasing n by a factor of 100 will increase t 10,000 times. The simple sorting algorithms are all O(n^2).

8. Simple Array Sort - Exchange (Bubble)
Work through the array comparing adjacent pairs of elements. If the first element is heavier (larger) than the second, swap them. Continue making passes, but stop one element sooner on each pass, because the next heaviest element has bubbled down to its correct place.

    k = n
    While k > 1 Do
        For each element i from 1 to k - 1 Do
            If element i > element i + 1 then
                Swap element i with element i + 1
            Endif
        EndFor
        Decrement k
    EndWhile

    Pass  K   Array after the pass
    ----  --  ------------------------
              44 55 12 42 94 18  6 67
    1     8   44 12 42 55 18  6 67 94
    2     7   12 42 44 18  6 55 67 94
    3     6   12 42 18  6 44 55 67 94
    4     5   12 18  6 42 44 55 67 94
    5     4   12  6 18 42 44 55 67 94
    6     3    6 12 18 42 44 55 67 94
    7     2    6 12 18 42 44 55 67 94

Notice that after each pass, the heaviest element in the unsorted part of the array has settled to the bottom, increasing the sorted portion by one and decreasing the unsorted portion by one. The indicators of the efficiency of this algorithm are (values for n = 8):

    Comparisons = (n-1) + (n-2) + ... + 1 = (n^2 - n)/2  = 28
    Max moves   = 3(n^2 - n)/2                           = 84
    Ave moves   = 3(n^2 - n)/4                           = 42

This algorithm can be improved by employing a flag that records whether any exchanges took place on a pass. If a pass makes no exchanges, the array is sorted and no further passes are required. This is an O(n^2) algorithm. It is rarely used in real applications because it is among the least efficient of the sorting algorithms. It is introduced here because it is relatively easy to understand and so that you will know not to use it!
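The pseudocode and the flag improvement translate fairly directly into C++. This is a sketch under a few assumptions: the array holds plain ints, subscripts run from 0 rather than 1, and the function (whose name bubbleSort is chosen here) returns the number of passes made so that the effect of the flag can be seen.

```cpp
#include <cstddef>

// Exchange (bubble) sort with an early-exit flag.
// Returns the number of passes made over the array.
int bubbleSort(int a[], std::size_t n)
{
    int passes = 0;
    bool swapped = true;                        // forces at least one pass
    for (std::size_t k = n; k > 1 && swapped; --k)
    {
        swapped = false;
        for (std::size_t i = 0; i + 1 < k; ++i) // compare adjacent pairs in a[0..k-1]
        {
            if (a[i] > a[i + 1])
            {
                int temp = a[i];                // swap the pair
                a[i] = a[i + 1];
                a[i + 1] = temp;
                swapped = true;
            }
        }
        ++passes;
    }
    return passes;
}
```

On the 8-element example above the function makes all 7 passes, whereas on an array that is already in order it stops after a single pass.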

9. Insertion Sort
This works in a similar way to the sorting of a hand of cards. Pick up the last but one element and place it in the correct order in the last 2. Pick up the last but 2 and place it in the correct order in the last 3, etc.

    If the number of items to be sorted > 1 then
        For each element k from last item but one down to 0
            j = k + 1
            save = k'th element
            While j <= last item AND the key of save > the key of the j'th element
                r[j-1] = r[j]
                increment j
            endwhile
            r[j-1] = save
        endfor
    endif

    Pass  K   k'th key   Array after the pass
    ----  --  --------   ------------------------
                         44 55 12 42 94 18  6 67
    1     7   6          44 55 12 42 94 18  6 67
    2     6   18         44 55 12 42 94  6 18 67
    3     5   94         44 55 12 42  6 18 67 94
    4     4   42         44 55 12  6 18 42 67 94
    5     3   12         44 55  6 12 18 42 67 94
    6     2   55         44  6 12 18 42 55 67 94
    7     1   44          6 12 18 42 44 55 67 94

    Measure               Formula                 Quoted for n = 8   In the example
    --------------------  ----------------------  -----------------  --------------
    Ave no. comparisons   (n^2 + n - 2)/4         14                 14
    Ave no. moves         (n^2 + 9n - 10)/4       32                 29

- On average, there are half as many comparisons as Exchange sort
- The algorithm is efficient if the data is already in order
- It is an O(n^2) algorithm
- It is stable - equal keys are not moved. This can be important if 2 or more consecutive sorts are required - each using a different key - the second being the tie breaker when the first keys contain duplicates.
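The insertion sort pseudocode can be written in C++ almost line for line. The sketch assumes an array of plain ints whose values serve as their own keys; the function name insertionSort and the parameter n (the number of items) are choices made for this example.

```cpp
// Insertion sort working from the right-hand end, as in the pseudocode:
// before each step, r[k+1 .. last] is already in order.
void insertionSort(int r[], int n)
{
    if (n > 1)
    {
        int last = n - 1;
        for (int k = last - 1; k >= 0; --k)
        {
            int j = k + 1;
            int save = r[k];                    // the item being inserted
            while (j <= last && save > r[j])
            {
                r[j - 1] = r[j];                // shift smaller items one place left
                ++j;
            }
            r[j - 1] = save;                    // drop the saved item into the gap
        }
    }
}
```

Because the loop condition uses a strict > comparison, an item is never moved past another item with an equal key, which is what makes the sort stable.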

10. Simple Sort performance
               Selection Sort              Insertion Sort              Exchange Sort
               (not covered in this note)
               Moves      Compares         Moves        Compares       Moves         Compares
    Worst      3(n-1)     1/2 n(n-1)       1/2 n(n-1)   1/2 n(n-1)     1.5 n(n-1)    1/2 n(n-1)
    Average    3(n-1)     1/2 n(n-1)       1/4 n(n-1)   1/4 n(n-1)     3/4 n(n-1)    1/2 n(n-1)
    Best       3(n-1)     1/2 n(n-1)       2(n-1)       n-1            0             n-1

[Figure: Simple Sorting Algorithms. Bar chart on a log scale (1 to 10,000) comparing Insertion, Selection and Exchange sort on Ordered, Random and Inverse data.]

11. Conclusions
11.1 Insertion sort is better for small data items and large keys. It also gives good performance when the data is already ordered (or nearly so). For this reason it is often used in conjunction with advanced sorting algorithms, e.g. Quicksort.
11.2 Exchange sort is the slowest sorting algorithm and is only used in teaching or trivial applications because it is the simplest to code.
11.3 Selection sort (not shown) is better for large data items with small keys. It has shown slightly better performance than Insertion on inversely ordered data.

12. Complex sorts

- Shell sort - derived from insertion sort
- Quicksort - see later
- Heapsort

These are in a different class from the simple sorts. The number of comparisons tends to vary in proportion to n.log2 n and they are therefore O(n.log n) sorts.

13. QuickSort
This was invented by C.A.R. Hoare - a famous Oxford professor of computing - and is an advanced algorithm, based on the exchange sort, that normally employs recursion. It is generally the most efficient of the advanced sorts, although it becomes inefficient under certain very exceptional conditions. The more data items, the less likely these conditions are to arise. Insertion sort is often used in conjunction with Quicksort to sort small partitions. The technique is to split the array into two partitions and then to sort the first partition followed by the second partition:

    void QuickSort( AnyType array[] )
    {
        If sorting is needed then
            split array into partitions S1 and S2
            QuickSort(S1);
            QuickSort(S2);
        EndIf
    }

All the keys in partition S1 must be less than (or possibly equal to) each of the keys in partition S2. The recursive routine sorts successively smaller and smaller partitions until a partition contains only one item and is therefore sorted. The partitions are portions of the array itself - described by starting and ending indexes - and not some additional temporary data structure. Here is a refinement of the first description using four array index variables:

    void QuickSort( AnyType array[], int first, int last )
    {
        if ( first < last )
        {
            split the array into 2 partitions
            QuickSort( array, first, last_of_first_partition );
            QuickSort( array, first_of_last_partition, last );
        }
    }

The 'partition' portion of the algorithm is where all the work is done. The second and third statements are simply recursive calls to the function itself. The partitioning process ensures that all items in the first partition have values that are <= all items in the second partition - although neither partition is necessarily sorted. One of the keys in the partition currently under consideration is selected as the pivot (the central element in this example). The items in the current partition are scanned:

- first from left to right looking for an element >= pivot
- then from right to left looking for an element <= pivot
- when each scan has stopped, and provided the scan indexes have not crossed over, the two items are swapped.

[Figure: partitioning with pivot 44]

    44 55 12 42 94  6 18 67     Scan, then swap 44 and 18
    18 55 12 42 94  6 44 67     Scan, then swap 55 and 6
    18  6 12 42 94 55 44 67     1st Partition: 18 6 12 42   2nd Partition: 94 55 44 67

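Scanning continues until the 2 pointers cross over. At that point every item in the first partition is <= the pivot and every item in the second partition is >= the pivot. The pivot may have been moved from its original position and, with this scanning scheme, it is not necessarily in its final sorted position: in the figure above, 44 ends up inside the 2nd partition. Quicksort is then called recursively to partition the lower and upper partitions, provided there are at least 2 elements in them.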

14. Efficiency of Quicksort


14.1 Best Case
The pivot exactly divides the array into 2 equal partitions at every level. There are then log2 n levels of partitioning. Each level involves about n comparisons, so the total number of comparisons is about n.log2 n, i.e. O(n.log n).

14.2 Worst case
O(n^2) - no better than Exchange sort. But this is extremely unlikely. The choice of pivot is crucial - ideally, this should be the median key, but the true median can only be found by sorting! Some variants choose the pivot by finding the median of 3 items randomly selected. The example below selects the central element as the pivot.
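The median-of-3 idea needs only a small helper. The sketch below shows one way of finding the median of three key values by sorting them with at most three exchanges; the function name medianOfThree is an assumption, and how the three candidate items are picked (at random, or at fixed positions) is left to the caller.

```cpp
// Returns the median (middle value) of three keys.
int medianOfThree(int a, int b, int c)
{
    int t;
    if (a > b) { t = a; a = b; b = t; }   // now a <= b
    if (b > c) { t = b; b = c; c = t; }   // now c is the largest
    if (a > b) { t = a; a = b; b = t; }   // now a <= b <= c
    return b;
}
```

A Quicksort variant would call this with three keys taken from the current partition and use the result as the pivot in place of the central element.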

14.3 Average
Averaged over all possible orderings of the keys, the number of comparisons is about 1.39 n.log2 n. Mathematicians can see the proof in Algorithms + Data Structures = Programs - see para 17 below.

15. C++ code for function Quicksort ( see Wirth )
    void QuickSort( int array[], int first, int last )
    {
        int lb = first, ub = last;                 // lower bound and upper bound
        int pivot = array[ (first + last) / 2 ];   // pivot = central element
        int temp;                                  // for the swap
        do
        {
            while ( array[ lb ] < pivot )          // search up for item >= pivot
                lb++;
            while ( pivot < array[ ub ] )          // search down for item <= pivot
                ub--;
            if ( lb <= ub )                        // if not crossed over, then swap
            {
                temp = array[ lb ];                // swap the elements at lb and ub
                array[ lb ] = array[ ub ];
                array[ ub ] = temp;
                lb++;                              // increment ready for next scan
                ub--;                              // decrement ready for next scan
            }
        } while ( lb <= ub );                      // until indexes cross over
        if ( first < ub )                          // if > 1 item in the partition
            QuickSort(array, first, ub);           // partition the lower partition
        if ( lb < last )                           // if > 1 item in the partition
            QuickSort(array, lb, last);            // partition the upper partition
    }

16. Comparison of complex sorting algorithms


16.1 Shell sort - a refinement of insertion sort proposed by D L Shell in 1959. The analysis of this algorithm poses some difficult mathematical problems.
16.2 Heapsort - a refinement of selection sort. It seems to like sequences which are initially in inverse order. The second fastest of the advanced sorts. Shell sort is faster only if the data is already ordered.
16.3 Quicksort - is significantly faster than either of the above whatever the initial ordering of the data.

[Figure: bar chart of sort time (scale 100 to 500) for Shell Sort, Heap Sort and Quicksort on Ordered, Random and Inverse data.]

17. Further Reading


Wirth, N., Algorithms + Data Structures = Programs, Prentice Hall, 1976
Budd, Timothy A., Classic Data Structures in C++, Addison Wesley, 1994

Testing
1. The context for testing - Verification and Validation
Verification and Validation is a generic term for all processes which ensure that the software meets its requirements, and that the specification meets the needs of the client. In other words:

- Verification means "Are we building the product right?" This involves checking that the software product conforms to its specification.
- Validation means "Are we building the right product?" This involves checking to ensure that the software product meets the expectations of the client.

Techniques required

- Static: analysis of the design and program listing. Includes walkthroughs, inspections and formal verification.
- Dynamic: exercising the program using test data similar to real data, i.e. testing.

2. The objectives of testing
• To show that the software system meets its specification.
• To exercise the system in such a way that any latent defects are exposed.

Testing cannot prove the absence of defects, only their presence. A successful test is one that discovers defects.

Testing can never be exhaustive


Apart from trivial programs, the number of different
• possible inputs
• pathways through the program
are effectively infinite. For large programs, testing all possible combinations of pathways through the code and all possible variations in categories of input would take until the end of the universe even at the rate of one test per millisecond.

3. Testing & Debugging
• Testing is required to discover errors in software.
• Debugging is the process of correcting errors discovered by testing.

[Figure: the debugging cycle: Locate Error → Design Repair → Repair Error → Re-Test]

It is much more economical to discover errors at the design stage than after the program has been coded because this avoids the correction process i.e. it avoids the need to debug and re-test.

4. Two different testing strategies
• Bottom-up
• Top-down

4.1 Bottom-up testing
As each component (e.g. function or module) is developed it is tested 'stand-alone' by using a specially written 'test harness' or 'test driver'. This is referred to as unit testing. In C++ a module is a file pair - the interface (header file) and the implementation (object code file). Usually this pair will implement either:-
• A set of useful functions, e.g. iostream, math
• An abstract type, e.g. a linked list or string abstraction

Re-usable components (e.g. a linked list module) should be distributed with test drivers. Individual components, e.g. functions, are tested to ensure that they operate correctly. Each component is treated as a stand-alone entity that does not need other components in order for it to be tested. Functions are assembled into modules that are then tested - module testing. Several modules may be amalgamated to produce sub-systems which are then tested - sub-system testing.

One of the problems that module or sub-system testing might reveal is a mismatch between the interfaces. This can occur when the module using the facilities of another module has been designed on assumptions that differ from those made in the design of that module. This might result from a lack of understanding of the interface specification on the part of either the author or the user of the module. Or it might be caused by an error in implementation.
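A test driver of the kind described above might look like the following minimal sketch. The component under test (Max3) and the driver (RunMax3Driver) are hypothetical names invented purely for illustration; the test values are likewise assumptions.

```cpp
#include <iostream>

// Hypothetical component under test: returns the largest of three integers.
int Max3( int a, int b, int c )
{
    int largest = a;
    if ( b > largest ) largest = b;
    if ( c > largest ) largest = c;
    return largest;
}

// Test driver: exercises Max3 stand-alone and returns the number of failures.
int RunMax3Driver()
{
    struct Case { int a, b, c, expected; };
    const Case cases[] = {
        {  1,  2,  3,  3 },   // largest is last
        {  3,  2,  1,  3 },   // largest is first
        {  2,  3,  1,  3 },   // largest is in the middle
        {  5,  5,  5,  5 },   // all equal
        { -1, -2, -3, -1 }    // all negative
    };
    int failures = 0;
    for ( const Case& c : cases )
    {
        int actual = Max3( c.a, c.b, c.c );
        if ( actual != c.expected )
        {
            std::cout << "FAIL: Max3(" << c.a << ", " << c.b << ", " << c.c
                      << ") returned " << actual
                      << ", expected " << c.expected << "\n";
            failures++;
        }
    }
    return failures;
}
```

A main function that simply calls RunMax3Driver and reports the result would complete the driver; a return value of 0 indicates that every case passed.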

[Figure: the stages of testing: Unit Testing → Module Testing → Sub-System Testing → System Testing → Acceptance Testing, grouped under the labels Component Testing, Integration Testing and User Testing]

Finally, all modules are combined to produce the program - system testing.
After this, the user carries out acceptance testing. For bespoke systems developed for a single user, this is sometimes referred to as alpha testing. For marketable software products beta testing may be used where a number of users agree to use the system and to report on any problems. In exchange for this they may get the software either free or at a preferential rate.

Advantages and Disadvantages of Bottom-up Testing


• Advantage - It is easier to create test conditions. The functionality is there - it just needs code to test it.
• Disadvantages
   - If combined with top-down development, all system components must be available before testing can start because the last items to be completed under this development strategy are the lowest level components - the first to be tested.
   - If top-down development is not employed, then special test drivers have to be written for each component. Eventually these are replaced by the actual higher level components when they are implemented.

4.2 Top-Down Testing
This starts with a skeleton of the system: an 'executive module' at the top of the hierarchy. Some or all of the lower level modules may not have been implemented and exist only as stubs. Stubs are functions whose body has not yet been implemented. They simply report e.g. the name of the function or the value of the arguments and/or return a dummy value. Initially, the tests are very limited - the purpose is only to exercise the interfaces between major sub-systems. As more and more modules are implemented the tests can become more comprehensive.
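As an illustration of a stub, consider the sketch below. Both function names (CalculateTax and NetPay) are hypothetical and were chosen only to show a higher level function being exercised before the lower level one exists.

```cpp
#include <iostream>

// Stub for a lower level function that has not yet been implemented.
// It reports its name and argument and returns a dummy value.
double CalculateTax( double grossPay )        // hypothetical function
{
    std::cout << "Stub CalculateTax called with grossPay = " << grossPay << "\n";
    return 0.0;                               // dummy value
}

// Higher level function under test. Only its interface to CalculateTax
// can be exercised until the stub is replaced by the real implementation.
double NetPay( double grossPay )              // hypothetical function
{
    return grossPay - CalculateTax( grossPay );
}
```

With the stub in place, NetPay( 100.0 ) returns 100.0, which confirms that the call is made correctly but says nothing yet about the tax calculation itself.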

Advantages
• The testing process matches the top-down design approach.
• Structural errors - perhaps faults in the design - are found earlier. This may avoid extensive re-design at a later stage.
• The availability of a limited working system is a morale booster and may be available to demonstrate to the client.

Disadvantages
• It may be difficult to provide stubs which simulate the behaviour of a complex component.
• In most systems, output is generated by lower level modules. There may therefore be a need for an artificial environment to generate test results for higher level modules.

4.3 Conclusion
The top-down approach is generally considered preferable for most systems today (Yourdon). But, in practice, it will always be necessary to include a certain amount of bottom-up testing of low level components.

5. Categories of Testing
5.1 Functional testing
The most common form. Its purpose is to ensure that the program performs its normal functions correctly - see above.

5.2 Thread testing
This may be used in real-time systems which are usually made up of a number of co-operating processes. An external event such as an input from a sensor may cause control to be transferred from the current process to the process that handles that event. Real time systems are difficult to test because of the time-dependent interactions between the processes. An error may occur only when the processes are each in a particular state. Thread testing follows the functional testing of the processes and is designed to trace the effect of the different external events as they thread through the various processes. The number of combinations of state of the various processes may be so great that it is impossible to test all of them, e.g. 10 processes, each with 10 possible states produces 10,000,000,000 different combinations.

5.3 Recovery Testing
Purpose - to ensure that the system can recover from various types of failure. This is important in on-line and real-time systems, e.g. those controlling manufacturing processes. It may be necessary to simulate in software such failures as hardware, power and operating system failures.

5.4 Performance (Stress) Testing


Purpose - to ensure that the system can handle the specified volume of transactions in terms of response time, storage requirements etc. This would be important in large transaction processing applications such as airline reservation systems.

6. Test Planning
The planning of tests should be carried out during the Specification and Design phases of the software project:-

[Figure: test planning alongside the development phases. Phases: Requirements Specification, System Specification, System Design, Detailed Design. Plans: Acceptance test plan, System Integration test plan, Sub-system integration test plan. Activities: Module & Unit code test, Sub-System Integration test, System Integration test, Acceptance test, Service.]

6.1 Test Plan & Test Log
The Test plan includes
• A unique identifying number for the test.
• A description of the purpose of the test.
• A specification of the data to be used.
• A description of the expected result.

The Test log includes


• A reference to a test plan item.
• The date of the test.
• The result of the test.
• An indication of whether or not the expected result was obtained.
• A reference to any corrective action required if a fault is found.
• A possible reference to re-testing if this is needed.

7. How much testing?


In theory, a program should be tested in such a way that all sets of pathways through it and all possible combinations of input data are covered. In practice this is impossible for all except very trivial programs because the number of combinations of input and pathway is effectively infinite. However not every possible input may need to be tested. There is probably a very large number of different inputs that will have the same effect. Thus, if a function expects to receive an integer argument in the range 1..100, then all argument values in this range should cause the function to behave correctly, and any outside of this range should cause an error. It should not be necessary to test for every single valid argument value, nor for every single invalid value. Instead, the range of argument values can be partitioned into a number of equivalence classes (see para. 10).

8. Test Data v Test Cases
• Test Data - The inputs devised to test the system.
• Test Cases - Input and output specifications + a statement of the function under test, the reason for the test and the expected result.

Test data can sometimes be generated automatically, but it is impossible to generate test cases automatically.

9. Black box v White box testing
• Black box - Does not consider the code of a component. Test cases are derived only from its specification and interface.
• White box - Test cases are derived from a detailed study of the code of the component to be tested.

These two methods are NOT alternatives. White box testing may be carried out early in the testing process, while black box testing may be applied later. They are likely to uncover different classes of error.

10. Black box testing
There are two techniques for deriving the test data
• Equivalence Partitioning
• Boundary Value analysis

10.1 Equivalence Partitioning


This technique divides the input domain into a number of equivalence classes so that a test on one representative value of each class is equivalent to a test using any other value in that class.

Example
A function requires an argument Age which is an integer. The allowable range of values for Age accepted by the function is 18..65. From a study of the specification of the function or other program documentation the following 3 equivalence classes can be identified:-
• Valid class - any value in range 18..65
• Invalid class - any value in range MIN(int)..17
• Invalid class - any value in range 66..MAX(int)

Test cases can then be designed for each valid equivalence class and for each invalid equivalence class - a total of 3 tests in this simple case. If there is more than one argument, the test cases should cover the invalid classes for only one argument at a time because one erroneous argument may mask the effect of another erroneous argument.
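The three equivalence classes for Age can be turned into three test cases, one representative value per class. In the sketch below the function under test (IsValidAge), the representative values 40, 5 and 90 and the helper CountAgeFailures are all assumptions made for illustration.

```cpp
// Hypothetical function under test: accepts ages in the range 18..65.
bool IsValidAge( int age )
{
    return age >= 18 && age <= 65;
}

// One representative value from each equivalence class.
struct AgeCase { int age; bool expected; };

const AgeCase ageCases[] = {
    { 40, true  },   // valid class:   18..65
    {  5, false },   // invalid class: MIN(int)..17
    { 90, false }    // invalid class: 66..MAX(int)
};

// Returns the number of representative values that give the wrong result.
int CountAgeFailures()
{
    int failures = 0;
    for ( const AgeCase& c : ageCases )
        if ( IsValidAge( c.age ) != c.expected )
            failures++;
    return failures;
}
```

Any other value from the same class (e.g. 30 instead of 40) would be expected to give the same result, which is the point of the technique.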

Another Example - Binary Search function of an ordered array

bool binsearch( int array[], int numitems, int target, int& location )
/* Pre -  The array is ordered, numitems >= 1, numitems <= no. of array elements
   Post - If target is present in the array, then location records the element
          number at which target was found and true is returned, else location
          records the correct insertion point and false is returned */
{
    int low = 0, high = numitems - 1, mid;
    bool found = false;
    do
    {
        mid = (low + high) / 2;
        if( target > array[ mid ] )
            low = mid + 1;
        else
            high = mid - 1;
    } while( target != array[ mid ] && low <= high );
    found = ( target == array[ mid ] );
    if ( found )
        location = mid;
    else
        location = low;
    return found;
}

Valid Equivalence classes for input arguments:-
The choice of valid equivalence classes (VECs) may require experience, e.g. that the binary search of an ordered array may, if not correctly coded, behave differently depending on whether the number of items stored in the array is odd or even, or if there is only one item.
• Array
   - has 1 item (numitems = 1)
   - has even number of items (e.g. numitems = 6)
   - has odd number of items (e.g. numitems = 7)
• Target
   - is present in the array
   - is not present in the array

= 6 combinations of valid equivalence classes

Invalid Equivalence classes for input arguments:-
These are all cases where the pre-conditions are not met. The specification of the binsearch function says nothing about how it will respond to such error conditions. C++ provides the facility for an exception to be raised in such cases and for error handlers implemented elsewhere in the code to catch the exception and take the necessary action. In a production program these invalid equivalence classes would be tested to ensure that the exception and handling mechanisms dealt correctly with the various causes of the error.
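One possible way of raising and catching such an exception is sketched below. The helper names (CheckSearchPreconditions and IsRejected) and the choice of std::invalid_argument are assumptions, not part of the binsearch specification. Note that the pre-condition "numitems <= no. of array elements" cannot be checked from a bare array argument and is therefore omitted.

```cpp
#include <stdexcept>

// Hypothetical helper: throws if the checkable pre-conditions are not met.
void CheckSearchPreconditions( const int array[], int numitems )
{
    if ( numitems < 1 )
        throw std::invalid_argument( "numitems must be >= 1" );
    for ( int i = 1; i < numitems; i++ )
        if ( array[ i - 1 ] > array[ i ] )
            throw std::invalid_argument( "array is not ordered" );
}

// Returns true if the input is rejected, i.e. an exception is raised
// by the check and caught by the handler.
bool IsRejected( const int array[], int numitems )
{
    try
    {
        CheckSearchPreconditions( array, numitems );
    }
    catch ( const std::invalid_argument& )
    {
        return true;     // handler reached: an invalid class was detected
    }
    return false;        // pre-conditions satisfied
}
```

A test of the invalid equivalence classes would then call IsRejected with, for example, an unordered array or a numitems of 0 and expect true in each case.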

Black box testing on classes of output


It is necessary to test the outputs from the function in the same way as for inputs. The same principles are applied as for input by specifying valid and invalid equivalence classes for each output. Inputs are then devised that will produce these defined outputs:-
• location
   - valid: 0..numitems
   - invalid: < 0
   - invalid: > numitems
• return values (there are no invalid return values)
   - valid: non-zero (true)
   - valid: zero (false)

= 2 combinations of valid, and two combinations of invalid equivalence classes.

10.2 Boundary Value Analysis
This complements equivalence partitioning and, in practice, is used at the same time as equivalence partitioning to determine the test data required for testing a component. Boundary values are those
• directly on
• just below
• just above
the boundaries of the equivalence classes.

It is an observed fact that a greater number of errors occur at the boundaries of the input domain than in the centre.

Examples
• Range of values, e.g. 18..65 - Test 17, 18, 65 and 66
• Discrete set of values, e.g. 2, 3, 5, 8, 13 - Test 1, 2, 13, 14
• Data structure (e.g. array) has 1..100 elements - Test 0, 1, 100, 101
• Loop iterations - none, 1, 2, max, max + 1

It is also necessary to identify the boundaries of the output equivalence classes.

Boundary Analysis of Search Procedure


Previously identified valid equivalence classes:-
• Array
   - has 1 item (numitems = 1)
   - has even number of items (numitems = e.g. 6)
   - has odd number of items (numitems = e.g. 7)
• Target
   - is present
   - is not present

= 6 combinations of valid equivalence classes

Experience shows that programmers often make errors in an algorithm due to a misunderstanding of its behaviour at the boundaries of its input domain. In the case of the binary search algorithm, these errors might occur when the target (if present) is located in the first element of the array, or in the last element. Obviously it is necessary also to test the normal case when the target is in neither of these locations.

Thus the following further test cases are added to those above:-
• Target is in the first element of the array
• Target is in the last element of the array
• Target is in neither the first nor the last element

When the equivalence classes already developed are combined with these boundary values, the following 10 test cases arise:-
• numitems = 1, target is present
• numitems = 1, target is not present
• numitems is even, target is in the first element
• numitems is even, target is in the last element
• numitems is even, target is present and in neither the first nor the last element
• numitems is even, target is not present
• numitems is odd, target is in the first element
• numitems is odd, target is in the last element
• numitems is odd, target is present and in neither the first nor the last element
• numitems is odd, target is not present.
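The ten test cases above lend themselves to a table-driven test driver. The sketch below is one possible arrangement: binsearch is transcribed from the listing earlier in this section (with a terminating semicolon after the assignment to found), while the array contents, the SearchCase structure and the function CountSearchFailures are assumptions chosen for illustration.

```cpp
// binsearch, transcribed from the listing earlier in this section.
bool binsearch( int array[], int numitems, int target, int& location )
{
    int low = 0, high = numitems - 1, mid;
    bool found = false;
    do
    {
        mid = (low + high) / 2;
        if ( target > array[ mid ] )
            low = mid + 1;
        else
            high = mid - 1;
    } while ( target != array[ mid ] && low <= high );
    found = ( target == array[ mid ] );
    if ( found ) location = mid; else location = low;
    return found;
}

// One row per test case: the input data and the expected outputs.
struct SearchCase { int* array; int numitems; int target; bool found; int location; };

// Runs the ten test cases and returns the number that fail.
int CountSearchFailures()
{
    int one[]  = { 10 };
    int even[] = { 10, 20, 30, 40, 50, 60 };
    int odd[]  = { 10, 20, 30, 40, 50, 60, 70 };
    const SearchCase cases[] = {
        { one,  1, 10, true,  0 },   // 1 item, present
        { one,  1, 15, false, 1 },   // 1 item, not present
        { even, 6, 10, true,  0 },   // even, first element
        { even, 6, 60, true,  5 },   // even, last element
        { even, 6, 40, true,  3 },   // even, neither first nor last
        { even, 6, 35, false, 3 },   // even, not present
        { odd,  7, 10, true,  0 },   // odd, first element
        { odd,  7, 70, true,  6 },   // odd, last element
        { odd,  7, 20, true,  1 },   // odd, neither first nor last
        { odd,  7, 45, false, 4 }    // odd, not present
    };
    int failures = 0;
    for ( const SearchCase& c : cases )
    {
        int location = -1;
        bool found = binsearch( c.array, c.numitems, c.target, location );
        if ( found != c.found || location != c.location )
            failures++;
    }
    return failures;
}
```

For the "not present" cases the expected location is the insertion point required by the post-condition, e.g. index 3 for a target of 35 in the six-element array.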

11. White box testing - Introduction


Test data is derived from the actual source code of the component instead of from its specification. Ideally tests should exercise all possible sets of paths through the code.

[Figure: flow chart of a loop executed twice whose body is a branching decision leading to one of three statement blocks (A, B or C)]

How many different sets of paths exist for this simple piece of code?

Set                1  2  3  4  5  6  7  8  9
First iteration    A  A  A  B  B  B  C  C  C
Second iteration   A  B  C  B  A  C  C  A  B

The answer is 9, i.e. 3 paths raised to the power of the number of loop iterations (3^2).

And this?

[Figure: a similar flow chart with five paths through the loop body, loop 20 times]

The answer is 5^20 = 95,367,431,640,625 different sets of paths. Evaluating every possible set of paths at 1 test/millisecond would take 3,022 years. So exhaustive testing is not possible. In practice, tests should guarantee that
• Each path (not necessarily all sets of paths) has been exercised.
• All logical branches have both values tested (true and false).
• All loops are exercised at their boundaries and within their bounds.
• All internal data structures have been exercised to ensure their validity.
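The two figures quoted above (the number of sets of paths and the number of years) can be checked with a few lines of code. The function names are hypothetical, and the year length of 365.25 days is an assumption that happens to reproduce the quoted 3,022 years.

```cpp
#include <cstdint>

// Number of distinct sets of paths: the number of paths through the loop
// body raised to the power of the number of iterations.
std::uint64_t PathSets( std::uint64_t pathsPerIteration, int iterations )
{
    std::uint64_t total = 1;
    for ( int i = 0; i < iterations; i++ )
        total *= pathsPerIteration;
    return total;
}

// Whole years needed to run the given number of tests at one test per
// millisecond, assuming a year of 365.25 days.
std::uint64_t YearsAtOneTestPerMs( std::uint64_t tests )
{
    const std::uint64_t msPerYear = 31557600000ULL;   // 365.25 * 24 * 60 * 60 * 1000
    return tests / msPerYear;
}
```

PathSets( 3, 2 ) gives 9 and PathSets( 5, 20 ) gives 95,367,431,640,625, which at one test per millisecond corresponds to 3,022 whole years.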

But why do we need to go to all this trouble? Wouldn't we spend our time better simply ensuring that the function/module/program requirements have been met? In other words why don't we confine our tests to black box testing?

Because
• Logic errors and incorrect assumptions tend to occur in inverse proportion to the probability that a path will be executed. Normal processing tends to be well understood and scrutinised, but special cases tend to fall down the cracks.
• We often believe that a path is unlikely to be executed when, in fact, it may be executed regularly.
• Typing errors are usually picked up by the compiler. But those that are not detected are just as likely to occur on an obscure logical path as on a mainstream path.

12. White box testing


12.1 Techniques
• Statement coverage
• Condition coverage
   - Branch testing
   - Domain testing
• Loop coverage

12.2 Statement coverage


Every statement should be executed at least once. See Sommerville Ch 22.2.1 on path testing & cyclomatic complexity; also Pressman Ch 18.2..18.4.

12.3 Path testing
A technique for finding the number of unique paths through a program thus providing the number of test cases. Uses flow graphs derived from the program code or from the PDL (program description language) for the routine + metrics for calculating the cyclomatic complexity.

12.4 Path testing


Flow Graph Constructs
[Figure: flow graph constructs for Sequence, If, While, Repeat and Case]

12.5 Cyclomatic Complexity


The cyclomatic complexity is a measure of the logical complexity of the code. A flow graph is drawn from the flow chart of the component. The C.C. may be calculated in any one of three ways (see the flow graph for binary search):
• Number of regions (including the one outside the graph)
• Number of edges - number of nodes + 2
• Number of predicate nodes + 1. (Predicates are simple 2 branch constructs. Each diamond in the flow chart is a predicate.)

Each of these three methods produces the same cyclomatic complexity metric (i.e. the number of independent paths through the code). In this example = 5. The number of independent paths also provides the number of different test cases required to ensure that all statements are exercised.

[Figure: Flow chart for binary search, with boxes for: do; mid = (low + high) / 2; if( target > array[ mid ] ); low = mid + 1; else high = mid - 1; while( target != array[ mid ] && low <= high ); if ( found ); location = mid; else location = low; return found]
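As a quick check of the arithmetic, the counts quoted with the flow graph for binary search (14 edges, 11 nodes, 4 predicates, 5 regions) can be put into the two formulae. The function names below are hypothetical.

```cpp
// Cyclomatic complexity from the number of edges and nodes of the flow graph.
int ComplexityFromEdges( int edges, int nodes )
{
    return edges - nodes + 2;
}

// Cyclomatic complexity from the number of predicate nodes.
int ComplexityFromPredicates( int predicates )
{
    return predicates + 1;
}
```

ComplexityFromEdges( 14, 11 ) and ComplexityFromPredicates( 4 ) both give 5, in agreement with the number of regions. The compound while condition counts as two predicates, one for each simple condition.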

12.6 Condition Testing
Conditions are made up of:-
• Arithmetic & character expressions involving arithmetic and character variables and constants.
• Relational expressions - logical expressions involving arithmetic and character expressions and relational operators. They have the value of either TRUE or FALSE.
• Boolean variables - values non-zero (TRUE), zero (FALSE).
• Boolean operators (&&, ||, !) joining one or more logical expressions.
• Parentheses surrounding simple or compound conditions.

[Figure: Flow graph for binary search, with regions R1 to R5. Number of edges = 14, number of nodes = 11, number of regions = 5, number of predicates = 4]

Condition testing
Focuses on testing each condition in the component (including each of the simple conditions making up a compound condition).

Condition testing strategies


• Branch testing
• Domain testing

The advantages of condition testing are i) it is easy to generate test cases and ii) it is likely to reveal other errors in the program.

12.7 Branch testing


Test data is constructed so that the TRUE and FALSE branches of compound conditions and the TRUE and FALSE values of every simple condition within the compound conditions are tested. To find all possible combinations of the TRUE and FALSE branches of all conditions, it is necessary to construct a truth table.

Example
if ( A > 1 && B == 0 ) X /= A;

A > 1                 B == 0                A > 1 && B == 0
TRUE / FALSE  Value   TRUE / FALSE  Value   TRUE / FALSE
T             3       T             0       T
T             3       F             1       F
F             1       T             0       F
F             1       F             1       F

For the above 2 conditions there are 4 test cases, i.e. 2^2. For 3 conditions, there are 2^3 = 8 possible combinations etc. This technique is therefore only practicable for small numbers of conditions.
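The four rows of the truth table translate directly into four test cases. In the sketch below, Condition wraps the compound condition from the example; the BranchCase structure and CountBranchFailures are hypothetical names.

```cpp
// The compound condition from the example.
bool Condition( int A, int B )
{
    return A > 1 && B == 0;
}

// One test case per row of the truth table.
struct BranchCase { int A; int B; bool expected; };

const BranchCase branchCases[] = {
    { 3, 0, true  },   // A > 1 TRUE,  B == 0 TRUE
    { 3, 1, false },   // A > 1 TRUE,  B == 0 FALSE
    { 1, 0, false },   // A > 1 FALSE, B == 0 TRUE
    { 1, 1, false }    // A > 1 FALSE, B == 0 FALSE
};

// Returns the number of rows for which the condition gives the wrong value.
int CountBranchFailures()
{
    int failures = 0;
    for ( const BranchCase& c : branchCases )
        if ( Condition( c.A, c.B ) != c.expected )
            failures++;
    return failures;
}
```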

12.8 Domain testing


Domain testing of relational expressions requires that 3 values be considered for each variable component of a relational expression : less than, equal to and greater than. For the above example, this gives rise to the following test cases for each variable A and B.
Relation   Condition on A   Value of A   Condition on B   Value of B
=          A == 1           1            B == 0           0
>          A > 1            2            B > 0            1
<          A < 1            0            B < 0            -1

There are therefore 3 test cases for each of the two variables in the example compound condition, leading to 3^2 = 9 test cases. Again, the number of test cases rises rapidly as the number of variables involved in a relational expression increases.
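The nine combinations can be generated mechanically from the three values chosen for each variable. The sketch below uses the values from the table (0, 1, 2 for A and -1, 0, 1 for B); the function name DomainTestCases is hypothetical.

```cpp
#include <utility>
#include <vector>

// Generates every (A, B) pair from the three domain values of each variable.
std::vector< std::pair<int, int> > DomainTestCases()
{
    const int aValues[] = {  0, 1, 2 };   // A < 1, A == 1, A > 1
    const int bValues[] = { -1, 0, 1 };   // B < 0, B == 0, B > 0
    std::vector< std::pair<int, int> > cases;
    for ( int a : aValues )
        for ( int b : bValues )
            cases.push_back( std::make_pair( a, b ) );
    return cases;
}
```

With a third variable the nested loops would produce 3^3 = 27 cases, which illustrates how quickly the number of tests grows.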

12.9 Loop coverage


The vast majority of algorithms in software employ loops. Loop testing focuses entirely on the validity of loops which are classified as follows
• Simple loops
• Nested loops
• Concatenated loops

[Figure: flow diagrams of simple loops, nested loops and concatenated loops]

Simple loops
The following tests should be applied to simple loops, where n is the maximum number of allowable iterations of the loop:-
• Skip (loop is not entered)
• One pass
• 2 passes
• m passes (m < n)
• n - 1, n, n + 1 passes
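The list of pass counts above can be produced for any n by a small helper. This is only a sketch: the function name is hypothetical and the choice of m = n / 2 as the typical value is an assumption.

```cpp
#include <vector>

// Pass counts to try for a simple loop with at most n allowable iterations.
// m is a typical value, assumed here to be n / 2; n should be large enough
// that 2 < m < n - 1.
std::vector<int> LoopPassCounts( int n )
{
    int m = n / 2;
    return { 0, 1, 2, m, n - 1, n, n + 1 };
}
```

For n = 10 the helper gives 0, 1, 2, 5, 9, 10 and 11 passes. The last value exceeds the allowable maximum and should therefore be rejected by the loop under test.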

Nested loops
The number of times that statements within the inner loop are executed is the product of the number of iterations of all nested loops within which it appears. Thus a triply nested loop, where each loop iterates 10 times, will cause statements in the inner loop to be executed 1,000 times. The number of test cases grows geometrically and full testing may be impracticable. The suggested solution is:-
a) Start with the innermost loop, setting all outer loop control variables to their minimum.
b) Test the inner loop as Simple above.
c) Work outwards to the next innermost loop etc., keeping outer loop control variables at their minimums, and the inner at typical values.
d) Continue until all nested loops have been tested.

Concatenated loops
Where the concatenated loops are independent of each other, treat each as a simple loop. Where the second loop has the same control variable as the first and starts with its value unchanged, treat the two loops as nested.

13. Automated Testing


Testing often accounts for as much as 40% of the total time spent on software development. Automated testing tools are therefore an important ingredient in the software developer's armoury. The following categories have been identified:-
• Static Analysers - Carry out a static analysis of the program's structure and format.
• Code auditors - Special purpose filters that check the quality of software to ensure it meets minimum coding standards.
• Assertion processors - The programmer writes assertions about the state of the program. The assertion processor tests whether they are true or false. C incorporates a simple form of assertion testing:-

#include <assert.h>

int main ( void )
{
    int i = 0;
    for( ; i <= 10; i++ );
    assert( i == 10 );
    return 0;
}
/* Assertion failed: i == 10, file ASSERT.CPP, line 7
   Abnormal program termination */

  C++ provides exception handling which gives greater flexibility and permits an exception handler to attempt recovery from an error.
• Test file & Test data generators
• Test verifiers - measure and report on internal test coverage.
• Test harnesses - Allow the program to be installed in a test environment, and fed input data. The behaviour of subordinate modules is simulated by stubs.
• Output comparators - compare output from the current version of the program with that from an earlier version to determine any differences.

This is an area of growing importance and descendants of the first generation testing tools are expected to cause radical changes in the way software is tested.

Data Structure Metrics


1. Representing Abstract Structure
Assume we wish to store a linear list of names in random access memory. There are several ways this could be done.

Scheme 1
Names are stored in successive memory locations (each name is assumed to occupy only 8 bytes). Given the start address of the list (1000), we can find the ith name by going to address Start + (i - 1) * 8.

Address   Name
1000      Milton
1008      Dickens
1016      Eliot
1024      Arnold
1032      Conrad

We can find the address of the next name by adding 8 to the address of the current element. Thus, Scheme 1 implements the logical structure of the data by locating its elements in physically adjacent memory locations. But if we wish to retrieve a name (in order to access some other data associated with it), then we would have to scan the list from the start, looking for the name to be retrieved.

Scheme 2
Each name is positioned in memory according to the value of its first letter. The address for a particular name is found by 1000 + 8 * (int(firstletter) - int('A'))

Address   Name
1000      Arnold
1008
1016      Conrad
1024      Dickens
1032      Eliot
..        ..
1096      Milton

In this case there is no way of finding the logical successor of a record. We are prevented from operating on the data using its logical structure. But if we wished to retrieve a particular name, we could do so very quickly by calculating the address directly from the name.

Scheme 3
Each element contains both a name and an address pointing to the element's logical successor. Given the address of any element, we can find its successor by simply going to the address contained in that element.

Address   Name      Successor Address
992                 1024
1000      Milton    0
1008      Dickens   1016
1016      Eliot     1000
1024      Arnold    1032
1032      Conrad    1008

Scheme 3 implements the logical order by linking the elements together in the proper sequence which is not the same as the physical sequence. Address 992 is used to hold the address of the first name in the list. Milton has a blank successor address field indicating that this is the last name in the list.

As with Scheme 1 we cannot find a given name other than by starting at the beginning of the list and comparing each successive name with the target. These three schemes illustrate the three fundamental methods of implementing abstract list data types - by an array, a hash table and a linked list.
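The address arithmetic of Schemes 1 and 2 and the link-following of Scheme 3 can be sketched as follows. Memory is modelled here by a std::map from address to element, which is purely an illustrative assumption, as are the names Scheme1Address, Scheme2Address, Element and Scheme3Names.

```cpp
#include <map>
#include <string>
#include <vector>

// Scheme 1: address of the ith name (i counted from 1) in a contiguous list
// of 8-byte names.
int Scheme1Address( int start, int i )
{
    return start + ( i - 1 ) * 8;
}

// Scheme 2: address computed directly from the first letter of the name.
int Scheme2Address( char firstLetter )
{
    return 1000 + 8 * ( int( firstLetter ) - int( 'A' ) );
}

// Scheme 3: each element holds a name and the address of its logical
// successor; a successor address of 0 marks the end of the list.
struct Element { std::string name; int successor; };

// Follows the successor addresses from head and returns the names in
// logical order.
std::vector<std::string> Scheme3Names( const std::map<int, Element>& memory, int head )
{
    std::vector<std::string> names;
    for ( int address = head; address != 0; address = memory.at( address ).successor )
        names.push_back( memory.at( address ).name );
    return names;
}
```

Scheme1Address( 1000, 3 ) gives 1016 (Eliot) and Scheme2Address( 'M' ) gives 1096 (Milton). Starting from the head address 1024 held at location 992, Scheme3Names visits Arnold, Conrad, Dickens, Eliot and Milton in that order.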



2. Implementing Data Structures
The implementor of a data structure must design the black box so that memory space is not wasted and the operations are performed efficiently. If the user knows in advance how many data elements the structure is required to handle, then certain efficiencies can be gained. If not, then the structure must be made flexible in order to accommodate an unknown number of items, or considerable space may be wasted. If the length of the elements is fixed and known, again efficiencies can be obtained compared with the case where elements are of unknown length. The implementor can make certain operations efficient at the expense of others, and he will need to know for which operations maximum efficiency is important to the user. There is almost always a trade-off available between space and time. Greater speed can be obtained at the expense of more memory space and, conversely, a saving in space will usually incur a time penalty. Which is the more important to the user, space or time? All of these considerations must be taken into account in the implementation of a data structure.

3. Metrics
One way of implementing a list is to use an array. It is true that arrays are relatively unsuitable for this purpose because of their inflexibility and because of the need to shuffle array elements down to fill the hole left by a deletion, but they have the advantage of requiring no overhead in terms of space. Linked lists, of course, carry an overhead in the form of the links (pointers) that connect the nodes. Envisage then a list implemented as an array as in Scheme 1 above and assume that we wish to find the name Eliot in the list.

3.1 Number of Comparisons
We simply start at the first name in the list (Milton) and search through the list, comparing each name encountered with Eliot. One measure of the time required to find this name is the number of comparisons made of each name with the target. Unless the list is very short, the time required to initialise and finalise the search will be relatively unimportant when set against the number of comparisons. It is generally true that the number of comparisons made when searching a data structure will be one of the major factors in determining the speed of execution.
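Counting comparisons is easy to do in code. The sketch below searches an array of names sequentially and returns the number of comparisons made; a second function averages this count over every name in the list. Both function names are hypothetical.

```cpp
#include <string>

// Sequential search of the first n names in list. Returns the number of
// comparisons made; found reports whether the target was located.
int CountComparisons( const std::string list[], int n,
                      const std::string& target, bool& found )
{
    int comparisons = 0;
    found = false;
    for ( int i = 0; i < n && !found; i++ )
    {
        comparisons++;                       // one comparison with the target
        found = ( list[ i ] == target );
    }
    return comparisons;
}

// Average number of comparisons needed to find each of the n names once.
double AverageComparisons( const std::string list[], int n )
{
    int total = 0;
    bool found = false;
    for ( int i = 0; i < n; i++ )
        total += CountComparisons( list, n, list[ i ], found );
    return double( total ) / n;
}
```

For the list Milton, Dickens, Eliot, Arnold, Conrad, finding Eliot takes 3 comparisons and the average over all five names is also 3, i.e. (n + 1) / 2 with n = 5.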

3.2 Number of Data Moves
This is the second most important operation determining the efficiency of operations on a data structure. Suppose we wished to remove the name Eliot from the list and did not wish to leave a gap. We could move each name below Eliot up one location, thus reducing the length of the list by one element. No comparisons are required for this operation, but many data moves. The speed of deletion will therefore be governed by the number of moves.
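A deletion of this kind, with the data moves counted, might be sketched as follows. The function name RemoveAt and its interface are assumptions made for illustration.

```cpp
#include <string>

// Removes the element at index pos from the first count elements of list by
// moving each later element up one place. Decrements count and returns the
// number of data moves made.
int RemoveAt( std::string list[], int& count, int pos )
{
    int moves = 0;
    for ( int i = pos; i < count - 1; i++ )
    {
        list[ i ] = list[ i + 1 ];    // one data move
        moves++;
    }
    count--;
    return moves;
}
```

Removing Eliot (index 2) from the five-name list requires 2 moves, one each for Arnold and Conrad. Removing the first name from a list of n names requires n - 1 moves.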



3.3 Algorithm Complexity
The measurement of the complexity of an algorithm is important because of the effort required to
• implement the algorithm
• understand it
• debug it
• modify it
• maintain it

4. Mathematical Notations
One way of ascertaining the efficiency of algorithms used in operations on data structures is to write a program which tests the algorithm on a large number of different types and sizes of data. This approach is useful in trying to understand an algorithm and the factors which affect its efficiency, but the problem is that:-
a) The data would only be valid for the computer, operating system and language we have employed and the nature of the data stored in the data structure.
b) We could not possibly examine exhaustively all possible combinations of data (there are over 358,000 different combinations of just four characters, ignoring case).
c) We would finish up with a mass of results which would be difficult to understand and distil into a general indication of the efficiency of the algorithm under consideration.

We require a crude indicator of the time complexity of an algorithm that relates the time taken to the number of elements held in the data structure. We are not particularly concerned with the absolute amount of time, which, for one algorithm, will depend on the factors mentioned in a) above. Looking at the search example above, how many comparisons, on average, will be required to find a name in the list? Let n denote the number of names in the list:-

Element Number           1   2   3   ..   n
Number of Comparisons    1   2   3   ..   n

To find the average number of comparisons necessary to locate a name present in the list, we first find the total required to find each of the names, and then divide by n. Thus, n comparisons would be needed to find the last name, n - 1 to find the last but one ... through to just one comparison to find the first. We can calculate the average number of comparisons for n items without needing to know the value of n:-

101

Data Structure Metrics


Total number of comparisons = Reverse Add n + (n-1) + (n-2) + 1 + 2 + 3 + (n+1) + (n+1) + (n+1) + .. 1 .. n .. (n+1)

Since there are n items in the sequence, the total of the third row is n(n + 1). This is twice the total number of comparisons, because we added the two sequences together, so to find the average for any one name we divide by 2 and also by n:

    n(n + 1) / 2n = (n + 1) / 2

Thus the average number of comparisons required to find a name in the list is about half n, whatever the value of n. Since we have seen that the number of comparisons is a major determinant of the time required, we can say that the time taken for this search is proportional to (n + 1) / 2. Since the constants are not significant in relation to other possible functions of n, we can say that the order of magnitude of the efficiency of the search is n, and we write this as O(n). This is sometimes referred to as the Big O notation. Only the dominant term is chosen to represent a crude notion of the order of magnitude of the entire expression, e.g.

    n(n + 1)                         is O(n^2)
    15n log2 n + 0.1n^2 + 5          is O(n^2)
    (6 log n + 3n + 7) / (2n - 5)    is O(1)

Why is the second item above classified as O(n^2) when this appears to form only a small part of the expression? Table 1 shows the value of this function for various values of n. The last column shows the value of the expression divided by 0.1n^2. Note that the value in this last column falls steadily as n grows and, from about n = 65,536, settles down to about 1.0, indicating the overwhelming importance of the 0.1n^2 component.

        n          15n   log2 n            0.1n^2   15n.log2 n + 0.1n^2 + 5   / 0.1n^2
        8          120        3                 6                       371      58.03
       32          480        5               102                     2,507      24.49
      128        1,920        7             1,638                    15,083       9.21
      512        7,680        9            26,214                    95,339       3.64
    4,096       61,440       12         1,677,722                 2,415,007       1.44
   65,536      983,040       16       429,496,730               445,225,375       1.04
1,048,576   15,728,640       20   109,951,162,778           110,265,735,583       1.00

TABLE 1 Value of expression 15n log2 n + 0.1n^2 + 5 for various values of n
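The last column of Table 1 can be recomputed with a few lines of C++. This is only a sketch: the function names expression and ratioToDominantTerm are invented for the illustration, and std::log2 assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <cmath>

// Value of the expression 15 n log2(n) + 0.1 n^2 + 5 used in Table 1.
double expression(double n)
{
    assert(n > 0);
    return 15.0 * n * std::log2(n) + 0.1 * n * n + 5.0;
}

// The whole expression divided by its dominant term 0.1 n^2.
// As n grows this ratio approaches 1, which is why the expression is O(n^2).
double ratioToDominantTerm(double n)
{
    return expression(n) / (0.1 * n * n);
}
```

Calling ratioToDominantTerm with the values of n from the first column gives the figures in the last column, to two decimal places.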

Table 2 gives some idea of the values of several different functions of n. Some simple sorting methods (e.g. Exchange or Bubble sort) operate in a time which is O(n^2), whereas other, more complex algorithms (e.g. Quicksort in the average case and, approximately, Shell sort) operate in a time which is O(n log2 n). If there were 1,024 items to sort, the simple method would take approx 1,000,000 units of time. Compare this with a time of only approx 10,000 for the O(n log2 n) sort. However, it is not true to say that the complex sort is 100 times faster than the simple sort. Because of its complexity, the more powerful O(n log2 n) sort carries an overhead. This appears as constants which are present in the true value of the function but which are ignored in arriving at the crude order of magnitude value. For this reason also, the complex sort may not be as fast as the simple sort for small values of n.


        n        O(n)              O(n^2)   O(sqrt n)   O(log2 n)   O(n.log2 n)
        8           8                  64           3           3            24
      128         128              16,384          11           7           896
    1,024       1,024           1,048,576          32          10        10,240
1,048,576   1,048,576   1,099,511,627,776       1,024          20    20,971,520

TABLE 2 Values for various functions of n

In some sources you may find logarithms specified without the base, e.g. O(n log n). Does it matter which logarithm base is used in these order of magnitude expressions? The answer is no because, although the absolute values of the expressions will differ according to the base used, the rate of increase of the function for increasing values of n will remain the same for all logarithm bases. Table 3 illustrates this by showing the values of the expression n log n for logarithms base 2, e and 10 and for values of n which double in each row. Note that the rate of increase is exactly the same for all three bases, and is approximately 2.2 times for each doubling of n.

     n   n.log2 n   rate of    n.ln n   rate of   n.log10 n   rate of
                    increase            increase              increase
   128        896                 621                   270
   256      2,048       2.29    1,420       2.29        617       2.29
   512      4,608       2.25    3,194       2.25      1,387       2.25
 1,024     10,240       2.22    7,098       2.22      3,083       2.22
 2,048     22,528       2.20   15,615       2.20      6,782       2.20
 4,096     49,152       2.18   34,070       2.18     14,796       2.18
 8,192    106,496       2.17   73,817       2.17     32,058       2.17
16,384    229,376       2.15  158,991       2.15     69,049       2.15

TABLE 3 Values for various logarithm bases
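The reason the base does not matter is the change-of-base rule: log_b(n) = ln(n) / ln(b), so changing the base only multiplies the function by a constant. The sketch below (function names are illustrative, not part of the course code) computes the growth factor for a doubling of n in any base.

```cpp
#include <cassert>
#include <cmath>

// n * log_base(n), computed with the change-of-base rule
// log_base(n) = ln(n) / ln(base).
double nLogN(double n, double base)
{
    assert(n > 1 && base > 1);
    return n * std::log(n) / std::log(base);
}

// Factor by which n log n grows when n is doubled.
// The constant 1 / ln(base) cancels, so the result is the same for every base.
double growthOnDoubling(double n, double base)
{
    return nLogN(2 * n, base) / nLogN(n, base);
}
```

For n = 128 the factor is 16/7, about 2.29, whichever base is passed in.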


Trees
1. Applications
- Trees are hierarchical structures and can be used in any application that models a hierarchical structure, e.g. disk directory and file structure.
- In some forms they can provide rapid searching and lookup.
- They can maintain their data ordered (usually on a unique key that is associated with their data).

2. Implementation
Trees cannot normally be based on a fixed size structure such as an array. They are normally implemented using dynamically allocated nodes linked by pointers.

3. Variations
- Binary Search Trees
- Expression Trees
- Balanced Trees
- N'ary Trees
- B Trees

4. Example Declaration

    struct DataItem
    {
        int key;              // key to search on
        anytype value;        // depends on the application
    };

    struct Node
    {
        DataItem data;        // struct as above
        Node *left, *right;   // pointers to left and right child nodes
    };

    struct BinaryTree
    {
        int count;            // number of nodes
        Node* root;           // single entry point into the tree
    };

5. Expression Trees
Assume the expression ( 3 + 4 ) * ( 6 - 4 ) is to be evaluated. Parsing and evaluating an infix expression of this sort in a single pass is very difficult because the string has to be searched back and forth to recognise and allow for the modifying effect that the parentheses have on the meaning of the expression. A tree of nodes representing operators ( +, -, *, / ) and values (or variables) can be built to represent the semantics of the expression without the parentheses. The tree can then be traversed to retrieve the symbols and values in an appropriate order for evaluation - see Traversal below.

6. Tree Traversal
There are several possible ways in which the tree can be traversed; the most common are known as inorder, postorder and preorder:

Inorder     <left tree> Node <right tree>     (3 + 4) * (6 - 4)
PostOrder   <left tree> <right tree> Node     3 4 + 6 4 - *
PreOrder    Node <left tree> <right tree>     * + 3 4 - 6 4

The postorder traversal would produce the nodes in an order suitable for evaluating the resultant postfix expression using a stack. The algorithm for binary tree traversal is one of the most elegant in computer science. It is recursive:

    void inorderTraverse( Node* p )
    {
        if ( p != 0 )
        {
            inorderTraverse( p->left );
            Process( p->data );
            inorderTraverse( p->right );
        }
    }

Process( p->data ) is the operation that is to be carried out on each node. Note that this algorithm effectively maintains its own stack of nodes visited but not yet processed. This is represented by the series of stack frames that is pushed onto the system stack for each call to the function. A non-recursive version of this algorithm requires an explicit stack of nodes to be maintained and is quite inelegant when compared to the above.
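The remark about evaluating the postfix form with a stack can be made concrete. The following is a minimal sketch, not the course's own code: ExprNode is a hypothetical node type invented for the illustration, and the tree for ( 3 + 4 ) * ( 6 - 4 ) is built by hand rather than parsed.

```cpp
#include <cassert>
#include <stack>

// Hypothetical node type for this sketch: either an operator or a value.
struct ExprNode {
    char op;           // '+', '-', '*' or '/' for an operator, 0 for a value
    int value;         // used only when op == 0
    ExprNode* left;
    ExprNode* right;
};

// Postorder traversal: left subtree, right subtree, then the node itself.
// A value is pushed; an operator pops its two operands and pushes the result.
void postorderEvaluate(const ExprNode* p, std::stack<int>& operands)
{
    if (p == 0) return;
    postorderEvaluate(p->left, operands);
    postorderEvaluate(p->right, operands);
    if (p->op == 0) {
        operands.push(p->value);
        return;
    }
    assert(operands.size() >= 2);
    int rhs = operands.top(); operands.pop();
    int lhs = operands.top(); operands.pop();
    switch (p->op) {
        case '+': operands.push(lhs + rhs); break;
        case '-': operands.push(lhs - rhs); break;
        case '*': operands.push(lhs * rhs); break;
        case '/': operands.push(lhs / rhs); break;
    }
}

// Builds the tree for ( 3 + 4 ) * ( 6 - 4 ) and evaluates it.
int evaluateExample()
{
    ExprNode three = {0, 3, 0, 0}, four = {0, 4, 0, 0};
    ExprNode six = {0, 6, 0, 0}, fourAgain = {0, 4, 0, 0};
    ExprNode plus = {'+', 0, &three, &four};
    ExprNode minus = {'-', 0, &six, &fourAgain};
    ExprNode times = {'*', 0, &plus, &minus};
    std::stack<int> operands;
    postorderEvaluate(&times, operands);
    return operands.top();
}
```

The nodes are visited in the order 3 4 + 6 4 - *, so the stack holds 7 and then 2 before the final multiplication.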

7. Parse Trees
Sentence  =  Subject Verb Object
Subject   =  Noun | Noun Phrase
Object    =  Noun | Noun Phrase
Noun      =  Cat | Mat | Dog
Verb      =  sat | ate | chased

[Figure: parse tree in which Sentence branches to Subject, Verb and Object; Subject and Object each branch to Noun OR Noun Phrase]

Parse trees such as the very simple example above may be used in natural language recognition and language translation software.

8. Binary Search Trees
The (recursive) definition of a binary tree is:

A binary tree is
- either empty
- or consists of a node with left and right binary trees

Binary search trees are ordered on a unique key field. The first data item to arrive causes a new node to be allocated which becomes the root node. Access to the tree is always via the root. For subsequent additions, the tree is traversed, starting at the root, looking for an empty left or right child link. If the key of the data to be added is less than that of the current node, then the left child of the current node is visited. If the key of the data to be inserted is greater than that of the current node, then the right child is visited. If the two keys are equal, then the data cannot be added since binary search trees rely on the keys being unique. Eventually, an empty left link or right link is encountered. A new node is allocated and linked in to the tree as the left or right child of the node currently being visited. All additions therefore take place at the lower levels of the tree - as leaf nodes.
[Figure: example binary search tree containing the keys 2, 4, 6, 10, 12 and 14]

Searching for 6:    Left, Right, found
Searching for 11:   Right, Left, Right, not found
Inserting 13:       Right, Right, Left, not found, so insert as left child of 14
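The insertion and search rules above can be sketched in C++ as follows. The node here holds only an int key rather than the DataItem of the Example Declaration, and the root key 8 used in buildExampleTree is an assumption, chosen so that the three example paths come out as stated.

```cpp
#include <cassert>
#include <string>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Follows left/right links from the root until an empty link is found,
// then hangs a new leaf there. Returns false if the key is already present.
bool insert(Node*& root, int key)
{
    Node** link = &root;
    while (*link != 0) {
        if (key < (*link)->key)      link = &(*link)->left;
        else if (key > (*link)->key) link = &(*link)->right;
        else                         return false;   // keys must be unique
    }
    Node* leaf = new Node;
    leaf->key = key;
    leaf->left = leaf->right = 0;
    *link = leaf;
    return true;
}

// Searches for key, recording each turn taken as 'L' or 'R' in path.
bool search(const Node* root, int key, std::string& path)
{
    path.clear();
    for (const Node* p = root; p != 0; ) {
        if (key == p->key) return true;
        if (key < p->key) { path += 'L'; p = p->left; }
        else              { path += 'R'; p = p->right; }
    }
    return false;
}

// Frees every node (postorder, so children go before their parent).
void destroy(Node* p)
{
    if (p == 0) return;
    destroy(p->left);
    destroy(p->right);
    delete p;
}

// Builds the example tree; the root key 8 is assumed, not taken from the notes.
Node* buildExampleTree()
{
    const int keys[] = {8, 4, 12, 2, 6, 10, 14};
    Node* root = 0;
    for (int i = 0; i < 7; ++i) insert(root, keys[i]);
    return root;
}
```

Searching this tree for 6 records the path LR, searching for 11 records RLR and fails, and after inserting 13 a search for it records RRL.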

The total number of nodes in a perfectly balanced binary search tree is 2^Level - 1. Thus, for 20 levels, the total number of nodes would be 1,048,575.

The efficiency of a perfectly balanced tree is measured by the average number of comparisons required to find a key that is present in the tree. It requires one comparison to find a key at the root node, two comparisons to find a key in one of its child nodes, and so on, so the maximum number of comparisons is the number of levels. Since the number of nodes doubles at each level, about half of all the nodes lie in the bottom level, and the average number of comparisons for a perfectly balanced tree is approximately the number of levels - 1. Thus for a perfectly balanced tree of 1,048,575 nodes (20 levels), the average number of comparisons is Number of Levels - 1 = 19.
This makes binary search trees a suitable structure for fast retrieval of data by reference to a key and, for this reason, implementations of the C++ Standard Template Library typically use balanced binary search trees to implement searchable structures such as map and set.

9. Importance of Balance
The tree shown below was generated by inserting the data in numeric order - 2, 4, 6 .. 16. If, as in this case, the tree is not balanced, search efficiency degrades towards that of a simple sequential search, i.e. from an average number of comparisons of about Level - 1 to (n + 1) / 2. There is little difference between the two in this small example but, for large numbers of items, the difference in searching efficiency is extremely large.
[Figure: Degenerate Binary Search Tree - the keys 2, 4, 6, 8, 10, 12, 14 and 16 form a single chain of right children]

AVL Trees (from Adelson-Velskii & Landis) employ a balancing algorithm on every insertion and deletion which ensures that the tree maintains an adequate (although not perfect) balance. Another balancing scheme is the red/black tree, which is commonly used in implementations of the Standard Template Library.
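The effect of insertion order on the shape of an unbalanced tree is easy to measure. The sketch below uses a plain, non-balancing insert and counts the levels in the result; all names are illustrative.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Plain (non-balancing) recursive insertion; duplicate keys are ignored.
void insert(Node*& p, int key)
{
    if (p == 0) {
        p = new Node;
        p->key = key;
        p->left = p->right = 0;
    }
    else if (key < p->key) insert(p->left, key);
    else if (key > p->key) insert(p->right, key);
}

// Number of levels in the tree: 0 for an empty tree.
int levels(const Node* p)
{
    if (p == 0) return 0;
    int l = levels(p->left);
    int r = levels(p->right);
    return 1 + (l > r ? l : r);
}

void destroy(Node* p)
{
    if (p == 0) return;
    destroy(p->left);
    destroy(p->right);
    delete p;
}

// Inserts the keys in the order given and reports how many levels result.
int levelsAfterInserting(const int* keys, int count)
{
    assert(keys != 0 && count >= 0);
    Node* root = 0;
    for (int i = 0; i < count; ++i) insert(root, keys[i]);
    int result = levels(root);
    destroy(root);
    return result;
}
```

Inserting 2, 4, 6 .. 16 in numeric order gives 8 levels, one per key. The same eight keys inserted in the order 8, 4, 12, 2, 6, 10, 14, 16 give only 4 levels.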

10. Other types of tree


An important tree structure is the B Tree (not to be confused with the binary tree). B Trees are used extensively in database software for file indexing, i.e. the storage (possibly in a different file) of pairs consisting of a key and the record number at which the data associated with the key may be found in a data file. They differ from binary trees in that each node contains not one key but an array of ordered keys. They have the attribute that they are always balanced, and that new levels are created by splitting the root node in two.

Arrays can be searched efficiently by a binary search in which the searched-for key is compared with the middle element of the array. If it is smaller, then the 'top' half of the array can be discarded, and a binary search carried out on the lower half. The converse applies, of course, where the key is larger than the middle element of the array. The efficiency of a binary search is the same as that of a balanced binary search tree: each comparison halves the number of items which remain to be searched.
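A minimal sketch of the binary search just described, on a sorted array of int. The probes parameter is an addition for the illustration, so that the number of comparisons against the middle element can be observed.

```cpp
#include <cassert>

// Returns the index of key in the sorted array a[0 .. n-1], or -1 if absent.
// probes is set to the number of middle elements that were examined.
int binarySearch(const int a[], int n, int key, int& probes)
{
    assert(a != 0 && n >= 0);
    int low = 0;
    int high = n - 1;
    probes = 0;
    while (low <= high) {
        int mid = low + (high - low) / 2;    // middle of the remaining range
        ++probes;
        if (key == a[mid]) return mid;
        if (key < a[mid]) high = mid - 1;    // discard the upper half
        else              low = mid + 1;     // discard the lower half
    }
    return -1;
}
```

With seven sorted keys no search examines more than three elements, the same as the number of levels in a perfectly balanced tree of seven nodes.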


Level   Nodes in   2^level     Total Nodes     Comparisons for level    Total comps   Ave comps
        Level                  (2^level - 1)   (level * nodes in level)               per node
    1          1           2               1                          1             1       1.000
    2          2           4               3                          4             5       1.667
    3          4           8               7                         12            17       2.429
    4          8          16              15                         32            49       3.267
    5         16          32              31                         80           129       4.161
    6         32          64              63                        192           321       5.095
    7         64         128             127                        448           769       6.055
    8        128         256             255                      1,024         1,793       7.031
    9        256         512             511                      2,304         4,097       8.018
   10        512       1,024           1,023                      5,120         9,217       9.010
   11      1,024       2,048           2,047                     11,264        20,481      10.005
   12      2,048       4,096           4,095                     24,576        45,057      11.003
   13      4,096       8,192           8,191                     53,248        98,305      12.002
   14      8,192      16,384          16,383                    114,688       212,993      13.001
   15     16,384      32,768          32,767                    245,760       458,753      14.000
   16     32,768      65,536          65,535                    524,288       983,041      15.000
   17     65,536     131,072         131,071                  1,114,112     2,097,153      16.000
   18    131,072     262,144         262,143                  2,359,296     4,456,449      17.000
   19    262,144     524,288         524,287                  4,980,736     9,437,185      18.000
   20    524,288   1,048,576       1,048,575                 10,485,760    19,922,945      19.000

Table 1 Metrics for Binary Trees
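The last column of the table can be reproduced by accumulating the other columns level by level. A short sketch follows; the function name is illustrative and long long assumes C++11 or later.

```cpp
#include <cassert>

// Average number of comparisons needed to find a key in a perfectly
// balanced binary search tree with the given number of full levels.
double averageComparisons(int levels)
{
    assert(levels >= 1 && levels <= 30);
    long long nodesInLevel = 1;        // 1, 2, 4, 8, ...
    long long totalNodes = 0;          // becomes 2^levels - 1
    long long totalComparisons = 0;    // sum of level * nodes in level
    for (int level = 1; level <= levels; ++level) {
        totalNodes += nodesInLevel;
        totalComparisons += level * nodesInLevel;
        nodesInLevel *= 2;
    }
    return double(totalComparisons) / double(totalNodes);
}
```

For 20 levels the function divides 19,922,945 by 1,048,575, which gives the 19.000 shown in the final row.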


Hash Tables

1. Applications
- Compilers (see later under perfect hashing functions)
- Basis for other Abstract Data Types, e.g. Set, Dictionary
- Very efficient retrieval

2. Operations
- Insert
- Remove
- Find (Lookup)

3. Efficiency
The measure of efficiency of searching and sorting is given using the big O notation (see Data Structure Metrics on page 99). This is a very crude measure of the relationship between time and the number of items being dealt with. The important factor is the rate at which time increases as the number of items increases. Hash tables are unusual among data structures in that, provided the table is not allowed to become too full, their efficiency is not dependent on the number of items stored. Their efficiency is therefore given as O(1).

4. Problem
The penalty paid for this exceptional measure of efficiency is that hashing destroys the lexical order of keys, so that they cannot subsequently be retrieved in their lexical order.

5. Hashing
Data is stored in a Hash Table that is based on the fundamental array structure provided by the language. The size of the table is normally chosen to be a prime number. Insertion (and searching) is performed by applying some function to the key which converts it into an integer in the range 0 .. table_size - 1. The modulus operation is used to achieve wrap-around.

In this example the column headed ASC represents the sum of the ASCII codes of the first 3 characters of the name. This is then taken modulo 11 (the table size) to produce the table index.

Name          Key   ASC   Table Index
SHELLEY       SHE   224        4
WORDSWORTH    WOR   248        6
KEATS         KEA   209        0
BYRON         BYR   237        6
BLAKE         BLA   207        9
BETJEMAN      BET   219       10

The insertion of the first three items is shown in the hash table under Collision Resolution below. The fourth key BYR produces the same index as that of WORDSWORTH - a collision. This is not surprising since we are trying to insert a very large domain of values into a table with only 11 locations.
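The ASC and Table Index columns can be reproduced with a few lines of C++. This sketch assumes an ASCII character set; the function names asciiSum3 and tableIndex are invented for the illustration.

```cpp
#include <cassert>
#include <string>

const int TABLE_SIZE = 11;    // the table size used in the example

// Sum of the character codes of the first three characters of the name.
int asciiSum3(const std::string& name)
{
    int sum = 0;
    for (std::string::size_type i = 0; i < name.size() && i < 3; ++i)
        sum += static_cast<unsigned char>(name[i]);
    return sum;
}

// The modulus operation wraps the sum into the range 0 .. TABLE_SIZE - 1.
int tableIndex(const std::string& name)
{
    return asciiSum3(name) % TABLE_SIZE;
}
```

Both WORDSWORTH and BYRON produce index 6, which is the collision described above.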

6. Collision Resolution
There are two strategies for resolving collisions:

- Open Addressing
A second hashing function is used to give a new table location and a further attempt is made to enter the key into the table. The simplest function to produce a new location after a collision is to successively add 1 to the result of hashing the key. But this can cause clustering, where the relative density of certain areas of the table is higher than average. This can give rise to a higher than necessary number of collisions. An improved second hashing function is:

    hashvalue = hashvalue + step
    where step = hashvalue % ( table size - 2 ) + 1

step is computed only once, before the loop is entered.

Index   Key   Data
  0     KEA   KEATS
  1
  2
  3
  4     SHE   SHELLEY
  5
  6     WOR   WORDSWORTH
  7
  8
  9
 10

Probing continues until an empty slot is found or, after a certain number of tries, the table is deemed to be full.
- Chaining
The Table entry contains a data entry and a pointer to the head of a list of data items that collided with the first or, more simply, just a pointer to the head of a list.

Index   Key   Data
  0     KEA   KEATS
  1
  2
  3
  4     SHE   SHELLEY
  5
  6     WOR   WORDSWORTH  ->  BYR  BYRON
  7
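A minimal sketch of the simpler chaining variant, in which each table entry is just the head of a list. It uses the standard library's vector and list rather than hand-written pointers, and hashes on the sum of the first three characters as in the example above. The class name and its members are assumptions made for the illustration.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedTable {
    typedef std::list<std::pair<std::string, std::string> > Bucket;
public:
    explicit ChainedTable(int size) : buckets_(size) { assert(size > 0); }

    // Appends the pair to the chain for its table index.
    // No check is made here for a key that is already present.
    void insert(const std::string& key, const std::string& data)
    {
        buckets_[index(key)].push_back(std::make_pair(key, data));
    }

    // Returns a pointer to the data stored under key, or 0 if it is absent.
    const std::string* find(const std::string& key) const
    {
        const Bucket& chain = buckets_[index(key)];
        for (Bucket::const_iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) return &it->second;
        return 0;
    }

    // Number of items that hashed to table index i.
    int chainLength(int i) const { return int(buckets_[i].size()); }

private:
    // Sum of the first three character codes, modulo the table size.
    int index(const std::string& key) const
    {
        int sum = 0;
        for (std::string::size_type i = 0; i < key.size() && i < 3; ++i)
            sum += static_cast<unsigned char>(key[i]);
        return sum % int(buckets_.size());
    }

    std::vector<Bucket> buckets_;
};
```

After inserting the keys SHE, WOR, KEA and BYR into a table of size 11, the chain at index 6 holds two items, matching the diagram.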

7. Hash Table example
This is a simple skeleton for a hash table that holds a String pair - a key and its associated string data. Several functions are not shown, e.g. resize, search. The search function closely matches the add function except that resizing is not needed.

    #include "strng.h"
    #include <assert.h>

    struct Item                     // component type of the table
    {
        String Key, Data;
        bool occupied;
    };

    const int TABLESIZE = 167;      // 167 is prime
    Item tbl[TABLESIZE];            // Hash table is an array of Item
    int theSize;                    // current size of the table
    int itemcount;                  // number of items stored

    void resize( );                 // not shown

    void init( void )
    {
        for ( int i = 0; i < TABLESIZE; i++ )
            tbl[i].occupied = false;
        theSize = TABLESIZE;
        itemcount = 0;
    }

    void add( const String& key, const String& data )
    {
        // for best efficiency, the number of occupied slots should be <=
        // 80% of table size
        if ( itemcount > theSize * 8 / 10 )
        {
            resize( );
        }
        int hash = key.hashvalue();           // key must support a hashvalue function
        int step = hash % (theSize - 2) + 1;  // step size for collision resolution
        hash %= theSize;                      // hash mod table size
        int numprobes = 1;                    // to count the number of probes
        // look for an unoccupied slot
        bool foundslot = ( !tbl[hash].occupied );
        // loop not entered if unoccupied slot found first time
        while( !foundslot && (numprobes < theSize) )  // second cond is belt & braces
        {
            hash = ( hash + step ) % theSize;
            foundslot = ( !tbl[hash].occupied );
            numprobes++;
        }
        assert( foundslot );                  // should always be true
        tbl[hash].Key = key;                  // store the key
        tbl[hash].Data = data;                // and the associated data
        tbl[hash].occupied = true;            // slot is now occupied
        itemcount++;                          // increment count of items
    }
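For experimenting without the course String class, here is a self-contained sketch of the same probing scheme that also shows how a search follows the same probe sequence as add. It is not the course code: it uses std::string, a fixed table size with no resizing, and (as an assumption) the sum-of-first-three-characters hash from the Hashing section in place of hashvalue(). The names ProbedTable, locate and slotOf are invented.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Slot {
    std::string key, data;
    bool occupied;
    Slot() : occupied(false) {}
};

class ProbedTable {
public:
    // size should be a prime greater than 2 so that every slot is reachable
    explicit ProbedTable(int size) : slots_(size) { assert(size > 2); }

    // Stores the pair; returns false only if no free slot could be found.
    bool add(const std::string& key, const std::string& data)
    {
        int i = locate(key);
        if (i < 0) return false;
        slots_[i].key = key;
        slots_[i].data = data;
        slots_[i].occupied = true;
        return true;
    }

    // Returns a pointer to the data stored under key, or 0 if it is absent.
    const std::string* find(const std::string& key) const
    {
        int i = locate(key);
        return (i >= 0 && slots_[i].occupied) ? &slots_[i].data : 0;
    }

    // Index at which key is stored, or -1 if it is not in the table.
    int slotOf(const std::string& key) const
    {
        int i = locate(key);
        return (i >= 0 && slots_[i].occupied) ? i : -1;
    }

private:
    // Assumed hash: sum of the first three character codes.
    static int rawHash(const std::string& key)
    {
        int sum = 0;
        for (std::string::size_type i = 0; i < key.size() && i < 3; ++i)
            sum += static_cast<unsigned char>(key[i]);
        return sum;
    }

    // Follows the probe sequence until it meets the key or an empty slot.
    // Returns that index, or -1 if every probe hit a different key.
    int locate(const std::string& key) const
    {
        const int size = int(slots_.size());
        const int raw = rawHash(key);
        const int step = raw % (size - 2) + 1;   // computed once, before the loop
        int index = raw % size;
        for (int probes = 0; probes < size; ++probes) {
            if (!slots_[index].occupied || slots_[index].key == key)
                return index;
            index = (index + step) % size;
        }
        return -1;
    }

    std::vector<Slot> slots_;
};
```

In a table of size 11, BYR first hashes to the occupied slot 6. Its step is 237 % 9 + 1 = 4, so it is stored at (6 + 4) % 11 = 10. Stopping a search at the first empty slot is only valid because this sketch never removes items.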

8. Perfect Hashing Functions
The special properties of hash tables have led to extensive research to exploit their efficiency. One such area is the speed at which compilers can parse the source code of a program. If a function can be found that is guaranteed to find a unique location in a fixed size hash table for all the reserved words of a programming language, then a significant speed improvement could be gained. It is not easy to find such a function other than empirically. But it may be worth a considerable amount of effort to find it, bearing in mind the commercial advantage to be obtained from a fast compiler. A perfect hashing function for Pascal reserved words that does not result in any collisions is:

    H(key) = L + g(key[1]) + g(key[L])

where L = the length of the reserved word and g = a function associating a letter with an integer. This gives the fastest retrieval possible.

 [2] do          [3] end         [4] else        [5] case
 [6] downto      [7] goto        [8] to          [9] otherwise
[10] type       [11] while      [12] const      [13] div
[14] and        [15] set        [16] or         [17] of
[18] mod        [19] file       [20] record     [21] packed
[22] not        [23] then       [24] procedure  [25] with
[26] repeat     [27] var        [28] in         [29] array
[30] if         [31] nil        [32] for        [33] begin
[34] until      [35] label      [36] function   [37] program
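The notes do not give the letter function g. The sketch below uses one set of letter values that reproduces every table location listed above. These values are an assumption for the purpose of illustration; they appear to correspond to Cichelli's published minimal perfect hash for Pascal, but that attribution should be treated as unverified here. The code also assumes lower-case words and a character set in which 'a' to 'z' are contiguous.

```cpp
#include <cassert>
#include <cstring>

// Assumed values of g for the letters 'a' .. 'z' (not given in the notes).
const int LETTER_VALUE[26] = {
    11, 15,  1,  0,  0, 15,  3, 15, 13,  0,  0, 15, 15,   // a .. m
    13,  0, 15,  0, 14,  6,  6, 14, 10,  6,  0, 13,  0    // n .. z
};

// H(key) = L + g(first letter) + g(last letter)
int perfectHash(const char* word)
{
    const int length = int(std::strlen(word));
    assert(length > 0);
    const char first = word[0];
    const char last = word[length - 1];
    assert(first >= 'a' && first <= 'z' && last >= 'a' && last <= 'z');
    return length + LETTER_VALUE[first - 'a'] + LETTER_VALUE[last - 'a'];
}

// The reserved words in the order of their table locations 2 .. 37.
const char* const RESERVED[36] = {
    "do", "end", "else", "case", "downto", "goto", "to", "otherwise",
    "type", "while", "const", "div", "and", "set", "or", "of",
    "mod", "file", "record", "packed", "not", "then", "procedure", "with",
    "repeat", "var", "in", "array", "if", "nil", "for", "begin",
    "until", "label", "function", "program"
};

// True if every reserved word hashes to its own location, in list order.
bool reservedWordsHashInOrder()
{
    for (int i = 0; i < 36; ++i)
        if (perfectHash(RESERVED[i]) != i + 2) return false;
    return true;
}
```

A lookup needs one hash computation and one string comparison against the word stored at that location. A word that is not reserved may still hash into the range 2 .. 37, so the comparison cannot be omitted.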


Libraries
1. The ctype library
This is a 'C' library of functions that operate on characters. They include functions to test whether a char is a letter, a digit, punctuation etc. and also to carry out case conversion. The functions available from ctype.h are:

    int isalnum(int c);
    int isalpha(int c);
    int isascii(int c);
    int toascii(int c);
    int iscntrl(int c);
    int isdigit(int c);
    int isgraph(int c);
    int islower(int c);
    int isprint(int c);
    int ispunct(int c);
    int isspace(int c);
    int isupper(int c);
    int isxdigit(int c);
    int tolower(int c);
    int toupper(int c);

The use of int instead of char in the return and argument types is historical. For the is.. functions, the return type can be understood to be boolean. In all cases the argument type can be read as type char. Help on each of these functions is provided from the RHIDE menu Help.libc reference.functional categories.ctype.
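A short sketch of typical use. The helper names are invented. The cast to unsigned char is a precaution: the ctype functions expect a value in the range of unsigned char (or EOF), and a plain char may be negative on some compilers.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns a copy of text with every letter converted to upper case.
std::string toUpperCopy(const std::string& text)
{
    std::string result(text);
    for (std::string::size_type i = 0; i < result.size(); ++i)
        result[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(result[i])));
    return result;
}

// Counts the decimal digits in text.
int countDigits(const std::string& text)
{
    int count = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i)
        if (std::isdigit(static_cast<unsigned char>(text[i])))
            ++count;
    return count;
}
```

Note that the is.. functions return zero for false and some non-zero value for true, not necessarily 1, so the result should be tested rather than compared with 1.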

2. The maths library
These are to be found in math.h. To use them you need to #include <cmath> or #include <math.h>. The functions and constants to be found are:

    double acos(double x);
    double asin(double x);
    double atan(double x);
    double atan2(double y, double x);
    double ceil(double x);
    double cos(double x);
    double cosh(double x);
    double exp(double x);
    double fabs(double x);
    double floor(double x);
    double fmod(double x, double y);
    double frexp(double x, int *pexp);
    double ldexp(double x, int _exp);
    double log(double y);
    double log10(double x);
    double modf(double x, double *pint);
    double pow(double x, double y);
    double sin(double x);
    double sinh(double x);
    double sqrt(double x);
    double tan(double x);
    double tanh(double x);
    double acosh(double a);
    double asinh(double a);
    double atanh(double a);
    double hypot(double x, double y);
    double log2(double x);
    long double modfl(long double x, long double *pint);
    double pow10(double x);
    double pow2(double x);

    #define M_E         2.7182818284590452354
    #define M_LOG2E     1.4426950408889634074
    #define M_LOG10E    0.43429448190325182765
    #define M_LN2       0.69314718055994530942
    #define M_LN10      2.30258509299404568402
    #define M_PI        3.14159265358979323846
    #define M_PI_2      1.57079632679489661923
    #define M_PI_4      0.78539816339744830962
    #define M_1_PI      0.31830988618379067154
    #define M_2_PI      0.63661977236758134308
    #define M_2_SQRTPI  1.12837916709551257390
    #define M_SQRT2     1.41421356237309504880
    #define M_SQRT1_2   0.70710678118654752440
    #define PI          M_PI
    #define PI2         M_PI_2

The usage of any of these functions can be found by running the info program from the DOS command line. Move the cursor to * libc.a: (libc.inf). The Standard C Library Reference, press Enter and choose menu options Functional Categories and math functions. Press Q to exit the info program.
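A brief sketch using a few of the functions listed above. The helper names are invented. Only functions that are part of standard C++ are used, since the M_ constants and functions such as pow10 and pow2 are extensions that may not be available with every compiler.

```cpp
#include <cassert>
#include <cmath>

// Length of the hypotenuse of a right-angled triangle with sides a and b.
double hypotenuse(double a, double b)
{
    return std::sqrt(std::pow(a, 2.0) + std::pow(b, 2.0));
}

// Fractional part of x; modf stores the whole part through its second argument.
double fractionalPart(double x)
{
    double whole = 0.0;
    return std::modf(x, &whole);
}
```

For example, hypotenuse(3, 4) is 5 and fractionalPart(2.75) is 0.75.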
3. The standard library
This requires the inclusion of cstdlib or stdlib.h. It is a miscellaneous collection of functions for such operations as converting strings to numeric types, sorting and searching, exiting or aborting a program, and executing DOS commands.

    void          abort(void);
    int           abs(int _i);
    int           atexit(void (*_func)(void));
    double        atof(const char *_s);
    int           atoi(const char *_s);
    long          atol(const char *_s);
    void *        bsearch(const void *_key, const void *_base, size_t _nelem,
                          size_t _size, int (*_cmp)(const void *_ck, const void *_ce));
    div_t         div(int _numer, int _denom);
    void          exit(int _status) __attribute__((noreturn));
    char *        getenv(const char *_name);
    long          labs(long _i);
    ldiv_t        ldiv(long _numer, long _denom);
    void          qsort(void *_base, size_t _nelem, size_t _size,
                        int (*_cmp)(const void *_e1, const void *_e2));
    int           rand(void);
    void          srand(unsigned _seed);
    double        strtod(const char *_s, char **_endptr);
    long          strtol(const char *_s, char **_endptr, int _base);
    unsigned long strtoul(const char *_s, char **_endptr, int _base);
    int           system(const char *_s);

Some functions in the standard library have been omitted from the above list, because they are either 'C' functions that have a better counterpart in C++ or because they refer to the wide char type that is not covered on this course. Help on these functions can be obtained from within RHIDE by selecting Help.libc reference.alphabetical list or by entering info at a DOS prompt, moving the cursor to * libc.a: (libc). The Standard C Library Reference and pressing Enter, then Alphabetical list.
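A short sketch of three of the functions listed above: qsort, bsearch and strtol. The wrapper names are invented for the illustration. The comparison callback must return a negative, zero or positive value, in the style of strcmp.

```cpp
#include <cassert>
#include <cstdlib>

// Comparison callback in the form required by both qsort and bsearch.
int compareInts(const void* a, const void* b)
{
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);    // -1, 0 or +1 without risk of overflow
}

// Sorts the array into ascending order.
void sortInts(int* values, std::size_t count)
{
    std::qsort(values, count, sizeof(int), compareInts);
}

// Binary search of an array that is already sorted in ascending order.
bool containsInt(const int* sorted, std::size_t count, int key)
{
    return std::bsearch(&key, sorted, count, sizeof(int), compareInts) != 0;
}

// Converts decimal text to long; fails if any character is left unconverted.
bool parseLong(const char* text, long& value)
{
    char* end = 0;
    value = std::strtol(text, &end, 10);
    return end != text && *end == '\0';
}
```

Unlike atoi and atol, strtol reports through its second argument where conversion stopped, which allows trailing rubbish such as "16x" to be detected.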


Bibliography
C++ From the Beginning                             Skansholm J                           Addison-Wesley
C++ for Engineers                                  Bramer B & Bramer S                   Arnold
Instant C++ Programming                            Wilks Ian                             Wrox
C++ Primer 3rd Edition                             Lippman Stanley B                     Addison-Wesley
The C++ Programming Language 3rd Edition           Stroustrup Bjarne                     Addison-Wesley
Object-Oriented Programming using C++              Romanovskaya, Shapetko & Svitovsky    Wrox
Software Engineering 4th Edition                   Sommerville I                         Addison-Wesley
Software Engineering - A Practitioner's Approach   Pressman R S                          McGraw-Hill
Algorithms + Data Structures = Programs            Wirth N                               Prentice Hall
Classic Data Structures in C++                     Budd Timothy A                        Addison-Wesley
