PROGRAMMING TECHNIQUES USING ‘C’

Disclaimer

The copyright of the content used in the courseware will remain with Principle Company.

TABLE OF CONTENTS

Chapter 1 Introduction to C
Chapter 2 Data types and Storage class
Chapter 3 Operators and Type casting
Chapter 4 Control Statements and Looping
Chapter 5 Array and String
Chapter 6 Function
Chapter 7 Structure
Chapter 8 Pointer
Chapter 9 Dynamic memory allocation
Chapter 10 File Handling

Chapter 1

Introduction to C
THE C PROGRAMMING LANGUAGE

The C programming language is one of the most popular computer languages today. C is the language of choice for thousands of programmers working in various fields like systems programming, design and implementation of database management systems, firmware development, graphical user interfaces, mathematical modelling etc. That C enjoys a vast following and popularity is not without reason. Several factors make C one of the premier programming languages in the world today, and in this section we shall attempt to discover them. In the process, enough light will be thrown on the subject for you to gain an insight into the language and its design objectives.

As you are aware, computer languages are classified into generations. Machine language, assembly language, and high level languages are called the first, second, and third generation languages respectively. Let us leave machine language out of the discussion because hardly anyone programs in it today. By choosing to program in assembly language a programmer gains the opportunity to exploit the full power of the hardware. Since the assembly programmer's instruction set is directly based on the hardware's native capabilities, what cannot be done in assembly language simply cannot be done at all with the given hardware. But assembly programming has its own disadvantages: control flow mechanisms (like if, while, for...) are not directly available, I/O functions are not available, and the programs are not portable at all. Besides, an assembly language programmer must have an intimate knowledge of the hardware.

The first high level languages were designed precisely to address these problems. Languages like COBOL, FORTRAN and Pascal provided high level control structures, I/O facilities, hardware independence and so on.
But quite late in the evolution of computer languages it was realized that while the high level languages provided high level mechanisms, they took the programmer so far away from the hardware that they were simply not suitable for system level programming (tasks like writing operating systems, compilers etc.). The realization dawned that a good high level language is one which, while providing the desirable features of a third generation language, does not rob the programmer of the power of assembly language. So efforts were started to design a programming language which would provide the best of both worlds: the benefits of high level features on one hand and the speed and power of assembly language on the other. The language was aptly named CPL (Combined Programming Language). CPL was not destined to become popular, for one reason: it was a very bulky and complex language. A small derivative of CPL called BCPL (for Basic CPL)
was developed by Martin Richards. BCPL was further adapted by Ken Thompson to produce another language called B, and we are now studying C, which was developed by Dennis Ritchie based on B.

C is very suitable for system programming. C has often been called a high level assembly language. This is because C deals with the same set of operands as the hardware does: numbers, characters, strings, and addresses. C implements a rich set of operators with which these operands can be manipulated. That C meets its design objectives can be seen from the fact that several popular programs have been written in C: the Unix operating system, the ORACLE database system, the Clipper compiler, dBASE III Plus, the Turbo-C compiler and the MS-Windows graphical kernel, to name just a few.

C encourages structured programming. The well defined control flow mechanisms and subroutine call mechanisms allow developing well defined, provable, and easy to maintain code. The programmer can break down his program into functional units, develop them independently, and then integrate them into a complete application. The different modules can be present in different source files and they can be compiled separately, which facilitates participation by a team in the development.

C encourages code reusability. Apart from the standard C library (which contains a set of subroutines to perform I/O, string manipulation, file operations, heap management and so on), the programmer can develop subroutines to perform various operations, load them into a library and possibly use them in a totally different application.

C is one of the most portable languages. The language very clearly spells out the issues that might affect portability, and as long as your program does not tread on these grounds, it can be ported with ease to new hardware, a new OS, or both.

C is a general purpose programming language.
Though the language had its origin in systems programming (it was originally developed to write UNIX in), it is general purpose enough that it has been used in various other fields too.

STRUCTURE OF A C PROGRAM

A typical C program has the following sections:

1. Documentation Section
2. Link Section
3. Definition Section
4. Global Declaration Section
5. Main Function Section
6. User Defined Function Section

Though we say that there are different sections in a C program, these are not sections in the sense of the divisions of COBOL or Pascal. There are no section headers or footers. You simply start your program with pre processor directives and, once you have supplied them, provide the global declarations, and once they are over, supply the functions. Also, these sections are not compulsory. If you do not require any pre processor directives you need not supply them; if you do not require any global variables you can skip the global declaration section also. But you cannot skip the function section: your program must have at least one function, by name main.

A function in C is nothing but a subroutine as it is called in BASIC, and is similar to the procedures and functions of Pascal and the procedures of dBASE. The function's body comprises two parts: variable declarations and executable statements. All variable declarations must be grouped together and made before the very first action statement.

The function main must be present in every C program. This is because, when the program is run after compilation, execution starts in main; it is the entry point to the program. Based on your requirements you may provide as many additional functions as you please.

Let us take a look at a typical C program, dissect it and analyze it. The following program accepts marks from the keyboard and calculates the total and average.
    #include <stdio.h>

    int main(void)
    {
        int m1, m2, m3, m4, m5;
        int total;
        float average;

        printf("Enter the 5 marks..");
        scanf("%d%d%d%d%d", &m1, &m2, &m3, &m4, &m5);
        total = m1 + m2 + m3 + m4 + m5;
        average = total / 5.0;
        printf("Total = %d Average = %f\n", total, average);
        return 0;
    }

The first line of the program constitutes the pre processor directive. Let us study the pre processor directives in the next chapter. The first 3 lines inside the function main are the variable declarations. One of the first points you have to appreciate is that all variables that you want to use in your program must be declared. The general form of a variable declaration is

    type name [= initial value];

By declaring a variable you are informing the compiler of the attributes of the variable such as its name, its type and its initial value. Specification of the initial value is optional. If you do not supply an initial value, a variable declared inside a function will have an undefined value. The semicolon
in the declaration acts as a statement delimiter. All statements in C must end with a semicolon.

Why is declaration of variables important? For a number of reasons:

1. By looking at the declaration the compiler will know the memory requirements for the variable.
2. The compiler will know what value the variable should have on program start-up.
3. Further, the compiler can ensure that the operations you perform on the variables are legal and meaningful. For example the modulus operator % can be applied only to integral types and it is meaningless when applied to floats. The type declaration that you provide thus helps the compiler to verify your programs.

Naming Conventions in C

The names that you choose for your variables must obey the naming conventions of C. In C, a variable name or a function name (in general, an identifier name) has to obey these rules:

1. The name can be up to 31 characters long.
2. It can be made up of a-z, A-Z, 0-9 and the _ (underscore).
3. It cannot duplicate a reserved word or keyword in C.
4. The first character cannot be a numeral.
5. All names are case sensitive.

There are a number of points that require close attention here. All the type names that we have seen so far are keywords in the language: for example, int, short, long, float, double, char, signed and unsigned are all keywords in C. These words cannot be reused for your variable names and function names. The next important point is that C is a case sensitive language. This means that the name total is not the same as TOTAL, which is not the same as Total. So if you declare a variable as int a; and later attempt to assign 10 to it by saying A = 10; the compiler will give an error ('A' undefined).

C is a free form input language. What this means is that the compiler does not expect the source program to be in any positional form. Contrast this with a language like COBOL or FORTRAN where a statement has to start in a fixed column of the source program. A statement in C can start anywhere in a line and end anywhere in a line. The same statement can span several lines and the same line can contain several statements. One or more white spaces have to separate one word from another, and where one white space is legal any number of white spaces are legal. Note that the phrase white space here refers to either a space or a tab or a carriage return. So the declaration

    int a = 25;

may as well be written as

    int     a    =    25   ;

C does not expect the variable declarations to come in any order. That is to say that you can declare whatever variable you want first. If you want to declare multiple variables of the same type you can provide a comma separated list of variables, or choose to declare them in separate lines. Take a look at the following declarations. They are all legal in C.

    int a,b,c,d;               /* Comma separated list of variables */
    float e; int f; char g;    /* Multiple statements in same line */
    short int p = 50, j = 45;
    unsigned char q = 'A';     /* char constants are enclosed in ' ' */

Also take a look at the following. These are illegal in C.

    int float;       /* float cannot be used as a var name */
    int 2temp;       /* name cannot start with numeral */
    int a;b;         /* b; is a separate incorrect statement now */
    int r float f;   /* semicolon missing */

Look at the function main now. When the program starts running, it starts in main and executes the statements one by one. The first statement in main is a function call: a call to the function printf. A function in C is, as already mentioned, nothing but a subroutine. You call a subroutine in C by simply naming the subroutine and supplying a parenthesized list of parameters. Note that C does not provide any command like BASIC (GOSUB) or FORTRAN (CALL) or dBASE (DO) to call a subroutine. So the general form of a subroutine call is:

    functionname(parameter1, parameter2, ...);

So, as soon as the program starts running, a call is made to the function printf and the string "Enter the 5 marks.." is passed as a parameter. A parameter or what is otherwise called
an argument is nothing but a piece of input information given to a subroutine. printf is a function available in the standard library (more on this later) of C. printf is a general purpose formatted output function. It is called a general purpose function because it can be used to print integers, floats, characters or strings. printf sends its output to the standard output, which usually is the monitor.

COMPILATION PROCESS

The compilation of a C program is a complex process which involves several different stages. In this chapter, we will strive to achieve a clear understanding of the process. Next, we will take a look at a very interesting feature of the language: mixed mode arithmetic. Often one needs to combine different data types and operate on them. We will study how C provides for this. After studying these issues, we will turn our focus to learning special operators in C and more I/O functions.

Stages of compilation of a C program
Role of the C pre processor
The include & define directives
Header files
The stdio library
The role of the linker
Mixed mode arithmetic
Special operators
More on I/O - getchar() and putchar()

    "The main purpose of the DATA statement is to give names to constants; instead of referring to pi as 3.141592653589793 at every appearance, the variable PI can be given that value with a DATA statement and used instead of the longer form of the constant. This also simplifies modifying the program, should the value of pi change."
    - FORTRAN manual for Xerox Computers

One of the very interesting points that you will have to learn now is that cc is not the C compiler. cc is instead the C compilation system. When you attempt to compile your program, it is taken through various steps of compilation. The different steps in the compilation of a C source file, and the associated programs, are as shown below:

    No   Stage           Program involved
    1.   Preprocessing   /lib/cpp
    2.   Compiling       /lib/comp
    3.   Assembling      /bin/as
    4.   Linking         /bin/ld

So when you invoke cc, it in turn invokes, one by one, the C pre processor (/lib/cpp), the compiler (/lib/comp), the system assembler (/bin/as) and finally the linker (/bin/ld). The output of a particular stage acts as input to the subsequent stage. Thus the pre processor accepts your program and preprocesses it. The output of the pre processor is nothing but a preprocessed C program, which is then passed on to the compiler. Note that traditionally the C compilers on UNIX machines produce assembly code as output and not machine code, which is why an assembly step is involved. The output of the assembler, though machine code, is not suitable for execution until it is linked.

Pre processing

As already mentioned, the first step in the compilation of a C program is pre processing. The pre processor reads the source file and scans for pre processor directives. Pre processor directives are nothing but commands to /lib/cpp. Any line that starts with '#' is treated as a pre processor directive. Note that older compilers require the '#' to appear in the first column of the source program. There are several commands which you can issue to the pre processor, but we shall at the moment restrict our attention to two: the include directive and the define directive. In the program that we studied in the last chapter, we already encountered the directive

    #include <stdio.h>

This instructs the pre processor to include the contents of the file stdio.h, which is stored in a predefined place (/usr/include), into your source file. Why is the include directive important? For a number of reasons. We already saw that the C compiler expects you to declare all the variables that you use in your program. Well, the truth is that it is not sufficient to declare your variables alone; you also have to declare all the functions that you call from your source.
As already mentioned, there is a standard C library which contains several functions that a C programmer will need. But the standard functions are not a part of the compiler proper, and are as alien to the compiler as your own subroutines will be.

Compiler and the library

The distinction between the C compiler and the C standard library has to be understood clearly. The C compiler is the translator whereas the standard library is nothing but a
collection of compiled subroutines. Whenever you call a subroutine, the subroutine's body must be present in the executable file. For example, if a BASIC programmer says something like GOSUB 500, then there must be a subroutine present at line number 500 of the source program. Or if a dBASE programmer says DO Addcust, then there must be a procedure by name Addcust. It is ridiculous to attempt to transfer control to a non-existent subroutine. The developers of C have identified a set of functions that a C programmer will need to perform various operations like I/O, string operations, file manipulation, and so on, have already written these functions and have loaded them into the library. So the library is nothing but a collection of compiled subroutines and is therefore an external body and not an integral part of the compiler. This means that such commonly used C functions as printf, scanf, malloc and fopen (to be studied later) are unknown to the compiler and therefore have to be declared.

Well, this indeed can be a serious nuisance. If you write a C program today and use printf and scanf in it you will have to declare them, and again tomorrow in your next program. So to avoid this trouble the designers of C came up with a simple solution: the include directive. The file stdio.h (called a header file) contains a sequence of declarations for all the functions present in the stdio library. When the pre processor encounters the #include <stdio.h> directive in your source, it throws away this line and instead inserts the contents of the file stdio.h into your source. Since the file stdio.h contains the set of declarations for the stdio functions, the declarations as such become a part of your source. Since the compiler accepts the output of the pre processor as its own input, by the time the compiler goes into action, the declarations for the functions are in place.

Header files

The standard I/O library is not the only library available to a C programmer. C installations will usually have many additional libraries, each of which will contain a related set of functions. On Unix machines you have other libraries like the math library, the isam library and the curses library. Similarly the file stdio.h is not the only header file. As a matter of fact it is just one of numerous header files. There is math.h which has the declarations for the math library, curses.h for the curses library and isam.h for the isam library. You may choose to include whatever header files your source program needs.

Header files and Library

A very clear understanding of the relationship between header files and the library, and also of the difference between them, is called for. Whereas the library is a collection of compiled object routines, a header file is nothing but C source text, which contains such things as function declarations, macro definitions and type definitions (to be studied later). A header
file, being a text file, can be viewed using any text editor and, if required, modified to suit your needs.

The define directive

The next pre processor directive that we shall study is the define directive. This is used to define a symbolic constant in your program. A symbolic constant is a name attached to a literal value. For example consider the following:

    #define PI 3.14
    #define RATE_OF_INTEREST 9.5
    #define PASS_MARK 40

The general form of a define directive is

    #define symbol value

Having provided such a directive you may go ahead and use the symbolic name instead of the original value. When the pre processor encounters the define directive, it notes down the symbol name and the associated value, and later on, wherever you use the symbolic name in the source, it replaces the symbol with its value.

By using the define directive, you gain a number of advantages. Firstly, your program acquires a degree of readability. Numbers like 9.5 and 4739 which may make sense to you at the time of writing the program may not make sense at all after a few months when you want to modify the program. By using a symbolic name you lend your program a good degree of documentation value.

There is a more important benefit in using the define directive. Proper use of the directive makes your program easily modifiable. To illustrate this point let us assume that you are writing a banking application where the figure 7 appears again and again, presumably as the rate of interest. Let us assume that you go ahead and hard code the value 7 in your program. If at a later date your bank changes the rate of interest to 8 then you will have to replace the number 7 with 8 in all places of the program. If by mistake you forget to change even a single place, the integrity of your program becomes questionable. You cannot safely attempt a global search and replace using a word processor because the number 7 can occur in contexts other than the rate of interest.
The define directive solves this problem altogether. If you had provided a define directive saying

    #define RATE 7

and had gone ahead and used this symbolic name, then the only change you would have to make is to the define directive itself. What happens when you use the define directive? As already said, the pre processor does a text substitution, replacing the symbol RATE with the value 7. After modification of the
define directive, the symbol RATE will instead get replaced with 8. So when the compiler gets to see your source program the symbol RATE is not present at all.

Compiling & Assembling

Once the pre processor has had its run, compilation and assembly take place, resulting in object files.

Linking

The object code, though in machine language, is not suitable for execution immediately. It has to be linked. The linker takes the object modules, identifies whatever external functions are called, extracts the bodies of those functions from the library and links them with the object modules, thereby producing the executable file.

MIXED MODE ARITHMETIC

The phrase mixed mode arithmetic refers to performing arithmetic between different data types. For example, what if you want to add a float to an int and store the result back in an int? Is it legal at all? Well, C allows you to mix the 3 fundamental data types that we have studied so far: the int, float, and char. What may really be surprising to a person new to C is that characters can also take part in arithmetic. The three types and all their flavours can be mixed freely. For example it is perfectly legal to multiply an int by a float and store the result in a char. It is equally legal to divide a char by a float and store the result in a float.

What really happens when a char takes part in arithmetic? When a char is used in an expression, the numerical code that is stored in it is used for evaluating the expression. Note that the numerical code stored in a char is always integral (the ASCII code is after all an integral code). For this reason the char can be freely mixed with integers.
In a manner of speaking, you can say that the char is just another variety of the int, apart from the short, long and int. Whenever you mix the data types across an assignment statement, the narrower type is converted to an equivalent wider type value, the arithmetic is done at the wider type, and the result is converted to the type of the LHS. Note that the float is conceptually regarded as wider than the int. Whenever an int is to be converted to a float this is achieved, in effect, by appending a .0 to the value. But when a float is to be converted into an int, this is achieved by truncating and not by rounding. Study the following code:

    #include <stdio.h>

    int main(void)
    {
        int i;
        float f;
        char c;

        i = 65;
        f = i;                   /* f now contains 65.0 */
        c = f;                   /* c contains 65 */
        printf("%c ", c);        /* prints 'A': the ASCII code of 'A' is 65! */
        f = f + 0.99;
        i = f;                   /* i contains 65 and not 66! */
        c = c + 1;               /* c becomes 66 */
        printf("%d\n", c + i);   /* prints 131 (66 + 65) */
        return 0;
    }

Though it is legal to perform mixed mode arithmetic in C, note that such operations may not always be meaningful. For example, assuming that we have a float variable f with a value 10000.0, assigning it to a character variable c is, though legal, completely meaningless, because a char variable with just 1 byte of storage cannot store any value greater than +127. So if we did do that, what is the value of c after the assignment? It is undefined. C never checks for numeric overflow. Similar circumstances in certain other languages (notably Pascal) would have resulted in a run time error. But your C program simply continues to run; only the value of the variable becomes undefined. The considerable overhead involved in checking for numeric overflow is avoided in C. This is one of the many reasons why C programs have the reputation of running very fast. But this advantage comes with a risk: it is now your responsibility to ensure that your programs are meaningful. This is very important, so let us repeat it: it is totally the responsibility of the programmer to ensure that the receiving data type is wide enough to receive whatever you are trying to store in it.

Chapter 2

Data types and Storage class

DATA TYPES

The first question that should come to one's mind when learning a new language is what data types are available in the language. The data types supported in a language dictate what kind of values can be processed in the language. Prima facie one may argue that a language that provides more data types is more powerful, more general purpose and more versatile. But obviously this is not the only criterion to evaluate a language. C supports a number of data types, which may broadly be classified into two varieties:

1. Simple or primitive or atomic data
2. Compound or structured or derived data

An atomic datum is a fundamental unit of information which cannot be broken down into constituent parts, whereas a derived data item is made up of one or more simple data items. As an example, we can consider an integer, which is a simple data type, and an array of 10 integers. Whereas an integer cannot be broken down into smaller parts, an integer array can be: it can be broken down into its components. The following table shows the different data types provided in C.

    Data Types
    Simple        Compound
    Integers      Arrays
    Floats        Structures
    Characters    Unions
    Pointers      Bit Fields

We shall reserve all discussion of the compound data types and pointers until a later chapter and focus our attention on the simple data types now.

Integers

Integers store numeric values without a decimal point. C provides three different flavours of integers. There is what is called an int, another called a short int and finally a long int. Since all of these are flavours of the same data type, what is the difference between them? The difference between these forms of integers is the number of bytes the variables of these types occupy and consequently the range of values you can store in them. A short int occupies 2 bytes, an int occupies 4 bytes and the long int 4 bytes again.
Internally integers are represented in pure binary, i.e., 0 is represented as the bit stream 0000, 1 is represented as 0001, 2 as 0010, 3 as 0011 and so on. So if you have an int variable by name intvar and you move the value 10 into it, the variable will contain

    0000 0000 0000 0000 0000 0000 0000 1010 (32 bits)

whereas a short int variable with the same value 10 will have the representation

    0000 0000 0000 1010 (16 bits)

Since the number of bytes required to store these various forms of integers is different, the range of values you can store in them is also different. In a short integer, which has 16 bits in total, 15 bits are available to store the value and one bit, usually the MSB, stores the sign. So going by the old rule, which you must be aware of, that in n bits one can represent a total of 2^n combinations, the range of values which you can store in a short int is -32768 (-2^15) to +32767 (2^15 - 1). Integers by default are signed, but if you desire you can make them unsigned. An unsigned integer never stores a negative value, and such being the case there is no necessity to reserve the MSB to store the sign bit, so all the bits (16 or 32 as the case may be) store the value. The following table summarizes the properties of the different flavours of integers:

    Type                 Bytes Required   Range
    short int            2                -32768 to +32767
    int                  4                -2147483648 to +2147483647
    long int             4                -2147483648 to +2147483647
    unsigned short int   2                0 to +65535
    unsigned int         4                0 to +4294967295
    unsigned long int    4                0 to +4294967295

Because there is a wide variety of choice available, a programmer can, depending on his requirements, choose a proper type. Let us assume that as part of your program you need to store the age of a person. The variable to store the age can be declared as a short int rather than as an int, thereby effecting a saving of 2 bytes. The age of a person, after all, is never going to exceed 200. Have a look at the table again. There does not seem to be any difference between a long int and an int at all. So are these two the same? Yes. This is so, as far as the Unix C compiler is concerned. One of the first points that you will have to appreciate is that the sizes of the data types are not fixed and may change from one implementation to another. The C language standard does not prescribe any fixed size for these data types (it specifies only minimum ranges) but instead leaves it to the discretion of the compiler writer. So you may expect to find that the sizes of these types are different in a different compiler. Well, this indeed is the case when you look at one of the very popular C
compilers for MS-DOS: the Turbo C compiler. In Turbo C a short int is 2 bytes, an int is 2 bytes again and a long is 4 bytes. As a matter of fact it is very much conceivable that on a particular machine you may find the short int, the int, and the long int all to be of the same size. The idea of having these different flavors is that where possible an implementation can provide them with different sizes. The values shown in the table are for one implementation of C: the cc compiler, which is the official C compiler on Unix machines.

Floats

The next important data type that C provides is the float. Floating point data types store numerical values with a fractional portion. Like the integers, floats are also available in many flavors. There is what is just called a float, another called a double and finally a long double. Floating point numbers are always signed. The following table summarizes the properties of floats, again with respect to the cc compiler.

    Type          Description          Size   Range
    float         Single precision     4      3.4E-38 to 3.4E+38
    double        Double precision     8      1.7E-308 to 1.7E+308
    long double   Extended precision   8      1.7E-308 to 1.7E+308

Floating point variables are represented in a different way internally. In the common IEEE 754 32 bit float, 23 bits store the mantissa, 8 bits store the exponent and 1 bit stores the sign. Floating point arithmetic may be carried out by hardware, if a math co-processor is available in the machine. If one is unavailable, float arithmetic will have to be carried out by software routines.

Characters

A char is a data type which stores an element of the machine's character set. The character set used is usually the ASCII set, though it can be different depending on the host machine. Characters come in two flavors, the signed char and the unsigned char. Characters usually occupy 1 byte. So a signed char can store a value from -128 to +127 whereas an unsigned char can store values from 0 to +255.

Let us study the printf function now. The first parameter to printf is always a format string. It may additionally receive many more parameters. When printf is called, it starts at the left end of the format string and reads one character at a time. If the character read is not a % or a \, printf prints it on the standard output and advances to the next character. The % symbol and the \ character have a special meaning to printf. When printf encounters a % symbol it does not print the % character; it reads the next character, does not print that either, and instead treats the % symbol and the
16

immediate next character as what is known as a format descriptor. Some of the important descriptors are %d %f %c %s - for int - for float - for char - for strings /* char arrays to be studied later* /

Similarly the \ character has a special meaning inside the format string. A \ is not printed; together with the character that follows it, it forms what is known as an escape sequence. (Strictly speaking, escape sequences are translated by the compiler when it processes the string, so printf receives the resulting character directly.) Some of the important escape sequences are:

\n - new line (Carriage return + Line Feed)
\t - Tabulator
\a - To beep the speaker /* only in ANSI C */

So, when printf is called for the first time in the program and the string "Enter the 5 marks.." is passed as a parameter to it, printf prints the string as such at the current cursor location, because the string contains neither format descriptors nor escape sequences. But look at the second call to printf, reproduced below:

    printf("Total = %d Average = %f\n", total, average);

Now printf, instead of printing the %d, prints the value of the variable total as an integer (%d is the descriptor for int) and, instead of printing the %f, prints the value of the variable average as a float. The immediate logical conclusion you should have arrived at by now is that if there are n format descriptors in the format string, there must be n subsequent parameters of suitable types. Calls to printf such as

    printf("%d\t%d\t%d\t", onlyonevar);     /* 2 parameters missing ! */
    printf("%d %f\n", floatvar, intvar);    /* order is wrong ! */

will produce nonsensical output, and such errors may on occasion even result in a run time error. In the above code fragment, variable declarations have of course been omitted, and the reader is expected to assume that the variables have been declared properly.

The output of printf, when it is directed to the screen, always appears at the current cursor position. A limited amount of cursor control and output formatting can be achieved by using the \t and \n sequences. Note that ANSI standard C does not provide any mechanism to clear the screen or to position the cursor. Where such capabilities are desired you will have to make use of additional special libraries such as the curses library, which we shall be studying later.

The function scanf is the twin of printf and performs general purpose formatted input. scanf can be used to read ints, floats, chars and strings from the standard input, which usually is the keyboard. scanf also receives a format string as its first parameter. The format string contains format descriptors which tell scanf what to read from the keyboard. scanf reads the input and stores it in the associated parameters. Just as in the case of printf, it is imperative that if the format string contains n descriptors then there must be n subsequent parameters. Also note an important point here: if you are trying to read in an int, a float or a char from the keyboard, then you must prefix the variable name with an & symbol. Calls to scanf like

    scanf("%d%f%d", &intvar);    /* Parameter count wrong ! */
    scanf("%d", intvar);         /* & missing ! */
    scanf("%d", &floatvar);      /* Type mismatch */

are going to result in grave trouble: the behaviour is unpredictable and often ends in a run time error.

STORAGE CLASSES
The storage class of a variable dictates how, when and where storage will be allocated for the variable. The different storage classes available are:

1. auto
2. register
3. static
4. extern

Let us study these one by one.

auto

The auto storage class is the default storage class for all local variables. The keyword auto derives from automatic, but it is seldom used, because local variables belong to the auto storage class anyway. Variables belonging to the auto storage class get automatically created when the function/block containing them is entered and they automatically cease to exist when the corresponding function/block is exited. Local variables derive all their characteristics by virtue of being automatic variables. Thus they contain garbage in the absence of explicit initializers and lose their value between function calls.

register

It is possible for you to attribute the register storage class to certain variables. First of all, the register storage class is applicable only to local variables and, even among those, only to variables of the int, char and pointer data types. You can declare a variable with the register storage class as shown below:

    register int i;
    register char c, f;
    register j;           /* same as register int j; (implicit int is not accepted by C99 and later) */
    register char *cp;

When the compiler encounters a variable with the register storage class specification, it tries to store this variable directly in one of the registers of the CPU, rather than in memory. Since the access time of the CPU's registers is far less than that of the memory, your program will run faster. It is best to identify heavily used variables and assign them the register storage class. Whether the compiler will satisfy your request or not is contingent upon the availability of registers. Usually, at most three variables per function can be accommodated in registers. But you may specify as many such variables as you want. Excess declarations are harmless and are ignored. Registers of the CPU do not have addresses, so the & operator cannot be applied to a register variable.

EXTERNAL LINKAGE

All our programs so far have been quite small, so it was all right to have the entire source program in one source file. But this arrangement is, as we shall see shortly, not a very suitable one when it comes to very large programs containing several hundreds of source lines. Why is this so? With a little imagination, we will be able to answer this question. When your program is very big, having all the source in the same file is not advantageous. This is because any change done to the source, however small it is, will mean compiling the entire source, which can be a very time consuming operation. To reduce compilation time, C allows you to break down your program into multiple files. Please bear in mind that we are not talking of multiple programs but, on the contrary, about the same program being split into multiple files.

Assume that you have a source program 1000 lines long. If you want, you may organize this into four separate files: let us say a.c, b.c, c.c and d.c. To compile these files and arrive at a.exe, you invoke tcc on all four source files. The compiler will then compile all these files, producing the object files a.obj, b.obj, c.obj and d.obj. The linker will link all these object files into one executable load module, a.exe. Subsequent to this, if you make a modification to a.c, you need to compile only that file and can forego the compilation of the other files, because they have already been compiled, their object modules are available and, after all, no change has taken place in them. You can now recompile just a.c and link it with the existing object modules.


The compiler will now compile the source file a.c, arriving at the object module a.obj, and link this object module with the others to produce a.exe. Thus you save considerable time.

Splitting your source program into multiple source files as shown just now raises a number of issues. First of all, what is the organization of each file? Each has its own preprocessor directives, its own global variables and its own functions. Note that only one of these files can have the function main. Secondly, how do you access in one file a variable declared in another? Note that the local variables of a function in one file cannot be accessed by other functions in the same file, let alone those outside. But if you want, you may share the global variables. Let us assume that you have a global int variable by name intvar declared in the file a.c and that you want to use this variable in the file b.c. Then you will have to provide a declaration of the form shown below in the file b.c:

    extern int intvar;

extern is a keyword in C which, when you attach it to a declaration, tells the compiler that the variable is defined elsewhere. Note that this is not the same as saying:

    int intvar;

If you provide the latter declaration, it becomes a second definition of the variable intvar. Now the variable intvar in the file a.c has global scope, and that in the file b.c too has global scope, thus violating the scope rule of C: no two variables in the same scope can have the same name.

extern Storage Class

This is the default storage class for all global variables. As a matter of fact, global variables derive most of their properties from belonging to the extern storage class. extern storage class variables get initialized to zero automatically, retain their value throughout the execution of the program and can be shared by different modules of the same program.

static

The final storage class that we shall study is the static class.
The static storage class has a number of implications depending upon its usage.

1. The default storage class for all local variables is auto. This can be changed to static by prefixing the declaration with the keyword static, as in static int intvar. A local variable with static storage class is still a local variable as far as its scope is concerned: it is still available only inside the function in which it is declared. But certain other properties of the variable change: it gets initialized to zero automatically, is initialized only once (during program startup) and retains its value throughout the execution of the program.

2. When applied to a global variable, the global variable becomes inaccessible outside the file in which it is declared.

3. This storage class can be applied to functions too. Similar to case 2, the functions become inaccessible outside the file.

The static storage class provides for information hiding. By applying this storage class to internal identifiers, a library programmer can prevent internal names from being exported, thus averting any potential naming conflicts between the internal names and the identifiers the user of the library may use.


Chapter 3

Operators and Type casting

Having had an introduction to the fundamental data types available in C, let us take a look at the operators provided in C.

Arithmetic operators are: + for addition, - for subtraction, * for multiplication, / for division, % for the modulus operation.

Relational operators are: < for less than, <= for less than or equal to, > for greater than, >= for greater than or equal to, != for not equal to, == for equality testing. The assignment operator is =, which must not be confused with ==.

Logical operators are: ! for the not operation, && for and, || for the or operation.

Note that wherever you have two symbols put together to make up an operator there is no space between the symbols. C provides several more operators. We shall study them later.

SPECIAL OPERATORS

C provides a number of operators which perform various operations. We are already familiar with a number of them. Now we shall study a few more. ++ serves as the increment operator in C and -- serves as the decrement operator. Note that these operators can be applied only to variables, not to constants or to the results of expressions. The operand is typically an integral variable (by now you know that char is also integral), although floating point variables and pointers can be incremented and decremented as well. We saw earlier that whenever two symbols are put together to make up an operator it is illegal to provide a space between them. So the two plus symbols and the two minus symbols must be together. With older compilers, using the special increment and decrement operators (instead of adding 1 and subtracting 1) often resulted in better performance of the program. This is because most microprocessors implement separate increment and decrement instructions which are more efficient than the equivalent addition and subtraction. Modern optimizing compilers usually generate the same code for both forms.

The increment and decrement operators are available in two forms: 1. the prefix form and 2. the postfix form. That is, if you have an int variable by name count, and you want to increment it, you can achieve this either by saying ++count; or by saying count++; There is a small but very important difference between these two forms. When you apply the operator in the prefix form, first the value of the variable is incremented, then its new value is used. On the other hand, if you use the postfix operator, then the current value of the variable is first used (for whatever purpose) and then it is incremented. The difference between the two forms is not at all apparent when you consider the above example. But consider the one below:

    int a = 50, b = 100;
    a = ++b;                 /* now both a and b contain 101 */
    b = a++;                 /* now b contains 101 whereas a's value is 102 */
    printf("%d\n", ++a);     /* printf is going to print 103 now */
    printf("%d\n", a++);     /* prints 103 again ! but a has become 104 */

Note that the difference between the two forms has an effect only when the value of the variable that is being incremented is used in the expression, as in the above case to assign to the variable on the LHS (or possibly when passed as a parameter to a function). When all you want to do is to increment a variable (as in the first case), which of the two forms you use does not make a difference at all. Though in the above discussion we have been talking mainly of the ++ operator, the same facts hold true for the -- operator also.

C provides a number of operators which can be expressed in the general form op=, where op is an operator like +, -, *, /, % etc. So you have operators like +=, -=, *=, /= and so on. In general, an expression like


    var op= value;

is interpreted as

    var = var op value;

So

    Temperature += 5;

is actually equivalent to saying

    Temperature = Temperature + 5;

Also look at this statement:

    Total *= i;    /* Total = Total * i; */

These operators serve a number of purposes:

1. They provide a shorthand notation which, once clearly understood and mastered, reduces the number of key strokes.

2. They reduce the possibility of spelling errors (since you do not have to specify the variable name on both sides of the assignment operator).

3. Last but definitely not the least: they can result in better object code on compilers that do little optimization.

MORE ON I/O

We saw in the last chapter the two functions printf and scanf, which allow you to perform general purpose formatted I/O. Now we shall look at a pair of functions which specialize in character I/O: getchar() and putchar(). The function getchar does not take any parameters. When it is called, it reads a character from the standard input device, and returns an integer value corresponding to the ASCII code of the input char. If no input is available (i.e. end of file is encountered) getchar returns a special value called EOF. You will learn later that the identifier EOF is actually a constant defined in stdio.h. You will recall that end of file can be signaled from the keyboard by typing CTRL-D on Unix (CTRL-Z on MS-DOS). Bear in mind that though getchar reads a char it returns an int, and so the return value must be captured in an int.

putchar is the twin of getchar and does the complementary job: it prints a char to the standard output device. The char which is to be printed has to be passed as a parameter to putchar. Take a look at the following code fragment, which serves to illustrate the use of getchar and putchar:

    #include <stdio.h>

    int main(void)
    {
        int c;

        printf("Enter an uppercase letter");
        c = getchar();      /* parentheses needed even if no parameter ! */
        printf("Lowercase=");
        putchar(c + 32);    /* 'a' - 'A' is 32 in ASCII */
        return 0;
    }

Also study the following statements:

    putchar('A');
    putchar(65);         /* int and char can be mixed in C */
    putchar('A' + 5);    /* prints 'F' : char can take part in arithmetic ! */

The last call to putchar merits discussion. Observe that an expression is being passed (well, not exactly: only the result of the expression is) to a function. This is a perfectly legal arrangement in C. Wherever a value of a particular data type is legal, in all those places a constant of that type, a variable of that type, an expression which results in that type or a call to a function that returns that type is legal in C. Since putchar takes an int as a parameter, you can pass to it an int constant (like 65), an int variable, an integer expression, or the result of a function that returns an integer (for example getchar()).


Chapter 4

Control Statements and Looping
if statement in C

The if statement in C starts with the keyword if followed by a parenthesized Boolean expression. A Boolean expression is one which can be evaluated as being true or false. Attached to the if condition you may specify actions. The if statement in its simplest form can be specified as

    if (expression)    /* parentheses compulsory ! */
        statement;

For example:

    if (average > 50)
        printf("pass\n");

Note that C does not use the word "then". The if statement is processed in C in much the same manner as in other languages. The expression is evaluated. If it is true the associated statement will be executed, otherwise not. Let us pay some attention to the expression getting evaluated. One of the first rules is that wherever a simple condition is legal, in all those places a compound condition is legal and vice versa. An expression of the form (average > 50) is called a simple condition. A compound condition is formed by linking two or more simple conditions by using the logical or (||) and the logical and (&&) operators. Whenever you form a compound condition by using the && operator, for the compound condition to be true, each and every one of the constituent simple conditions must be true. On the other hand, if you use the || operator, then it is enough if just one of the simple conditions is true. Look at the example below, which illustrates the use of a compound condition:

    if ((m1 > 40) && (m2 > 40) && (m3 > 40) && (m4 > 40) && (m5 > 40))
        printf("pass\n");

Whenever you link simple conditions together to form a compound condition, the evaluation of the condition is carried out from left to right and stops as soon as the truth or falsehood of the expression is known. For example, in the above code fragment, if m1 has the value 30 then the compound condition is false. The evaluation of the other conditions is irrelevant and is as such not carried out in C.

CONTROL FLOW - I

The action part of the if statement need not be just a single statement as in the above case. You may want to execute multiple statements subject to the truth of a given condition. Under such circumstances you can supply multiple action statements by using a compound statement, or what is otherwise called a block.

A compound statement or a block is nothing but one or more simple statements sandwiched between a pair of braces ({ }). Another rule in C is that wherever a simple statement is legal, in all those places a compound statement is legal and vice versa. So the following is a perfectly legal statement in C:

    if (average > 60)
    {
        printf("pass - ");
        printf("Class - I\n");
    }

The variable average is tested against 60. If it is greater, the whole of the compound statement gets executed; otherwise the whole of the compound statement is skipped. Note that the same effect cannot be achieved by saying:

    if (average > 60)
        printf("pass - ");
        printf("Class - I\n");

If you do not supply the braces, only the first printf call is connected to the if and the second call gets executed unconditionally! Let us now ask ourselves one question: what can be specified in the action part of the if? C says that any legal C statement can be used in the action part of if, and that includes the if itself. So study the following if statement, which is syntactically correct:

    if (m1 > 50)
        if (m2 > 50)
            if (m3 > 50)
                if (m4 > 50)
                    if (m5 > 50)
                        printf("pass\n");

The else in C

The if statement can have an optional else clause. Look at the following statement:

    if (average > 40)
        printf("pass\n");
    else
        printf("Fail\n");

Note that the body of the else clause can again be a compound statement. Also, C says that the else clause can have any legal C statement attached to it, which means that an if statement can be attached to it too, thus producing a nested if - else - if - else - if ladder. The nesting of if statements can thus go to any level in C. The following code fragment illustrates this point:

    if (average > 80)
        printf("Distinction \n");
    else if (average > 60)
        printf("First Class \n");
    else if (average > 50)
        printf("Second class\n");
    else
        printf("Fail\n");

It will help to remember that C is a free form language. So the statements can be laid out in any form. The whole of the above ladder could have been placed on the same source line if needed. One of the issues that you will have to keep in mind is that an else always associates with the closest if:

    #include <stdio.h>
    #define MALE 0
    #define FEMALE 1

    int main(void)
    {
        int sexcode, age;

        printf("Enter Your Age & Sex Code : ");
        scanf("%d%d", &age, &sexcode);
        if (sexcode == MALE)
            if (age < 20)
                printf("Hello young boy !\n");
        else
            printf("Hello Lady !\n");
        return 0;
    }

Though in the above program the indentation clearly specifies what the programmer has in mind, the else actually belongs to the inner if, and you may be embarrassed if you are a male and your age is greater than or equal to 20. How to remedy this problem? Use braces and clearly bracket out the statements as below:

    #include <stdio.h>
    #define MALE 0
    #define FEMALE 1

    int main(void)
    {
        int sexcode, age;

        printf("Enter Your Age & Sex code: ");
        scanf("%d%d", &age, &sexcode);
        if (sexcode == MALE)
        {
            if (age < 20)
                printf("Hello young boy !\n");
        }
        else
            printf("Hello Lady !\n");
        return 0;
    }

Before we leave the if statement altogether, let us make one final observation: you may on occasions end up with an else that is not connected to any if. (The term dangling else is popularly used for this, although it more commonly refers to the ambiguity illustrated in the previous program.) Look at the following code fragment:

    if (average > 80)
        printf("Distinction \n");
        printf("congratulations\n");
    else if (average < 40)
        printf("Fail \n");

Since the braces are missing, only the first printf belongs to the if. The second printf ends the if statement, so the else that follows has no if to attach to and the compiler reports an error.

TERNARY OPERATOR

C provides a condition evaluation operator (called the ternary operator) in the form of the ? symbol. The general form of the ternary operator is as shown below:

    (condition) ? expression1 : expression2

The ? operator evaluates the condition that precedes it. If it is true, it returns expression1, and it returns expression2 otherwise. Thus, this operator provides a shorthand notation which can be used as a substitute for the if statement. But where you want a nested if, an expression constructed using this operator will indeed prove to be cryptic.

while loop

C provides 3 different kinds of loops, namely the while loop, the do - while loop and finally the for loop. Of these the while loop is the commonest. The while loop in C starts with the keyword while, followed by a parenthesized Boolean condition, and has a set of statements which constitutes the body of the loop. The general form of the while loop can be expressed as follows:

    while (expression)
    {
        action statements;
    }


We have already seen that, wherever a simple condition is legal, in all those places a compound condition is legal. So the expression that you supply as the controlling condition of your while loop can be as complex as you please. The while loop executes like this: after executing whatever statements are prior to the while loop, you arrive at the while loop. As soon as execution reaches the while loop, the condition specified is tested. If it is found to be true, you enter the body of the loop, execute the whole body and, once you reach the close brace of the body, you automatically loop back to the top, test the condition afresh and, if it is true, re-enter the body, and so on. When the controlling condition of the while loop becomes false, you break out of the while loop and start executing whatever statements are subsequent to it. In short, as long as the controlling condition remains true, you will continue to iterate and repeatedly execute the body of the loop. From the above discussion, it must be clear to you that the controlling condition should be such that it will fail at some point of time. Otherwise you will have on your hands an infinite loop.

Now let us ask ourselves one question: what can be specified in the body of the loop? First of all, the body can be just a single statement or a block: we have seen that wherever a simple statement is legal, in all those places a block is also legal. As to the contents of the body, C says that any legal C statement can be used in the body of a while loop. Since while itself is a legal C statement, it follows that you may have a while loop inside another. This nesting can go to any level. Let us write a small program: a program to print the ASCII table.

    #include <stdio.h>

    /* A program to print the ASCII table */
    int main(void)
    {
        int ch;

        ch = 0;
        while (ch < 128)
        {
            printf("%c %d\n", ch, ch);
            ch++;
        }
        return 0;
    }

The program itself deserves little explanation. So let us look at the comments instead. We have already been using comments on several occasions, though they were never recognized as such. Anything enclosed between a /* and a */ is a comment and is ignored by the compiler. It is a helpful practice to liberally comment your programs. Because of their documentation value, comments help you to remember what different parts of the program do, and are a really valuable aid when you want to modify the program at a later point of time.

There are a few rules that you have to know about comments: Comments do not nest. A comment inside a string is not recognized as one. Comments can span multiple lines.

I/O Redirection

The I/O functions that we have already studied - printf, scanf, getchar and putchar (as well as gets and puts, which we shall be studying later) - do not directly operate on the keyboard and the monitor. They instead read from and write to the standard input stream and the standard output stream. As you are aware, the standard input of a program can be redirected to come from some source other than the keyboard by using the command line redirection operator <. Similarly, the standard output can be redirected so as to reach some destination (typically a file or a device) other than the VDU by using the command line redirection operator > or >>. So if your program uses the C stdio functions, and if its I/O is redirected on the command line when the executable file is run, then getchar, scanf and gets will automatically read from the file specified and, similarly, putchar, printf and puts will send their output to the file specified. Let us exploit this feature to write a few programs that will operate on files. The first that we will write is a program to convert a file (its input) into upper case.

    #include <stdio.h>

    int main(void)
    {
        int ch;    /* value returned by getchar to be captured in an int ! */

        ch = getchar();
        while (ch != EOF)
        {
            if (ch >= 'a' && ch <= 'z')    /* char const to be enclosed in ' ' */
                ch -= 32;                  /* difference between 'a' & 'A' */
            putchar(ch);
            ch = getchar();
        }    /* end of while */
        return 0;
    }    /* end of main */

You will now recall how getchar functions: it reads a char from the standard input stream and returns an int value corresponding to the char read. If no input is available (end of file encountered) it returns a value called EOF. EOF is actually a symbolic constant defined in the stdio.h file, whose value is usually -1. With this background information in mind, the program should be quite simple to understand. We read a char in every iteration and ascertain whether it is a lower case one. If it is not, we simply print it, but in the event that the char just read happens to be a lower case letter, we first convert it into the equivalent uppercase letter and then print it. When ultimately getchar finds that it has exhausted the input, it returns EOF, at which point of time the controlling condition of the while loop becomes false. So control will break out of the while loop, and since there is nothing else to do in the program, you are left at the shell prompt.

Though the above program is technically a correct one, you will in general find that experienced C programmers will write it in a much more concise way. Let us illustrate this by writing another program: one which will display its input a page at a time (say 25 lines), and provide for a pause in between.

    #include <stdio.h>
    #include <unistd.h>    /* declares sleep on Unix; Turbo C declares it in <dos.h> */

    int main(void)
    {
        int lines = 0;
        int ch;

        while ((ch = getchar()) != EOF)
        {
            putchar(ch);
            if (ch == '\n')    /* \n is C's notation for line feed ! */
            {
                lines++;
                if (lines == 25)
                {
                    lines = 0;
                    sleep(5);    /* stop for 5 sec ! */
                }
            }
        }    /* end of while */
        return 0;
    }    /* end of main */

It is an extremely common occurrence in C that, in the evaluation part of the while loop (as well as in a for loop or a do - while loop), some action will be specified, as in while ((ch = getchar()) != EOF). When the controlling condition of the while loop is tested, getchar is called, whatever it returns is captured in ch and then ch is tested against EOF. Note that the parentheses that enclose the expression ch = getchar() are absolutely necessary. The != operator in C has higher precedence than the = operator, so that an expression like

    ch = getchar() != EOF

is interpreted by the compiler as meaning

    ch = (getchar() != EOF)

What this means is certainly different from what we intended: now getchar is called, and whatever it returns is not captured in ch but is compared with EOF. The result of the comparison, which is either a 0 (false) or a 1 (true), is stored in ch: quite different from what we want indeed.

That brings us to the next topic. What is the precedence of operators in C? C with its numerous operators makes it extremely difficult to memorize the precedence relationships. For the moment, you are advised to parenthesize clearly and specify to the compiler what you want. The parentheses, you can be sure, have the highest precedence. Before we close this chapter let us address a few miscellaneous yet important topics.

THE TRUTH ABOUT TRUTH

What exactly is true in C and what is not? According to C's convention a value of 0 is false and any other value, positive or negative, is true. C does not have a Boolean or logical data type as in Pascal or dBase. The integer itself plays the role of the Boolean data type. Often, instead of saying if (i != 0), a C programmer will say if (i). Let us illustrate this point by rewriting the program to print the ASCII table. We will print the ASCII table in the reverse order now:

    #include <stdio.h>

    int main(void)
    {
        int ch = 127;

        while (ch)
        {
            printf("%c %d\n", ch, ch);
            ch--;
        }
        return 0;
    }

With this information in mind, you will be able to judge what will happen if by mistake you provide an if statement like this:

    if (a = 5)    /* should be a == 5 */
        b = 10;

Instead of comparing a with 5 you are assigning 5 to a, and since the value 5 is true in C you will end up assigning 10 to b too. What in effect is happening is that the old value of a is inadvertently lost forever and the assignment of 10 to b is executed unconditionally. The compiler is totally silent about this because in C an assignment is a legal expression and has the value assigned to the LHS. We have already used an assignment in the expression:

    while ((ch = getchar()) != EOF)

One final question: why is it that you have to capture getchar's return value in an int, and not in a char? This is because there is no end of file marker in Unix systems. So all characters (a total of 256) can legally be present inside the file. So if getchar were to return a char, what value could it return to signal end of file? None. getchar needs to return all possible 256 char combinations as well as a special EOF signal. Hence it has to return a type which is wide enough to represent 257 combinations (256 chars + 1 EOF signal): the int is a suitable candidate for this purpose. If all you want to do is read a char and you are not bothered about the end of file condition, it does not make a difference. You may as well receive it in a char.
In this chapter, we will take a close look at the other control flow mechanisms available in C: the do - while loop, the for loop and the switch case structure. We will also study the functional relationship between the for loop and the while loop.

The do - while loop
The for loop
switch case structure
break and continue statements

Goto: (noun) A programming tool that exists to allow structured programmers to complain about unstructured programmers.

The do - while loop in C

C provides a do - while loop which performs the test at the bottom rather than at the top. The do - while loop starts with the keyword do, followed by the body of the loop (a set of statements enclosed between braces), the keyword while and a parenthesized Boolean expression. The general form of a do - while loop can be represented as follows:

    do
    {
        action statements
    } while (expression);

The do - while loop behaves in this manner: as soon as execution reaches the do - while loop, you enter the body of the loop and execute the statements present in the body. Once you reach the while, the expression specified is evaluated. If it is found to be true, you automatically loop back to the top and re-enter the body of the loop. If at the time of testing the condition evaluates as false, you break out of the do - while loop and execute whatever statements are subsequent to it. In short, you continue to iterate and repeatedly execute the body of the loop as long as the controlling condition remains true.

One point of contrast should have occurred to you by now: in the case of a while loop there is a possibility that the body of the loop may not be executed at all, whereas the body of a do - while loop will get executed at least once. This difference stems from the fact that in the while loop the test is done at the top, as opposed to the do - while loop where the evaluation of the controlling condition is done at the bottom of the loop. By now it hardly needs mention that the body of the do - while loop can be either a simple statement or a compound statement and that the controlling condition can be a simple condition or a compound condition.
Any legal C statement can be a part of the body of the do - while loop: that of course includes the do - while loop itself. Let us write a program to demonstrate the use of the do - while loop: a program that will print the multiplication tables from 1 to 10.

    #include <stdio.h>

    int main(void)
    {
        int multiplier;
        int multiplicand;
        int product;

        multiplicand = 1;
        do
        {
            printf("\nMultiplication Table For %d", multiplicand);
            printf("\n_______________________________\n\n");
            multiplier = 1;
            do
            {
                product = multiplier * multiplicand;
                printf("\n%d * %d = %d", multiplicand, multiplier, product);
                multiplier++;
            } while (multiplier <= 10);
            multiplicand++;
            getchar();    /* wait for the Enter key */
        } while (multiplicand <= 10);    /* end of outer do - while */
        return 0;
    } /* end of main */

The C for loop

The for loop in C is very different from the for loops that you may encounter in other languages. The general form of a for loop in C can be specified as:

    for (initial statements; expression; terminating statements)
    {
        body;
    }

As you can see, the for loop starts with the keyword for. The keyword for is followed by a parenthesized part called the header. This is followed by the body of the loop, which typically is a set of statements enclosed between braces. The header of the for loop consists of 3 portions: a set of statements to be executed initially before actually entering the loop, an expression that will act as the controlling condition of the loop, and finally the terminating actions, which are nothing but a set of statements executed after every iteration of the loop.

The behaviour of the for loop requires close attention: After executing whatever statements come before it, you arrive at the for loop. As soon as you reach the for loop, the statements supplied in the initialization part of the header are carried out. Once this is done the initialization statements are conceptually cut off from the for loop and have no further role to play. The controlling condition is evaluated now. If it is found to be false, you immediately break out of the for loop. On the other hand, if the controlling condition is found to be true, you

enter the body of the loop, execute it throughout and, once you reach the close brace of the body, loop back to the top and execute the terminating actions. After executing the terminating actions, you test the controlling condition afresh to decide whether another iteration should be carried out or not. As long as the controlling condition remains true you will loop across the body of the loop.

    for (startch = 'A'; startch <= 'Z'; startch++)
        printf("%d %c\n", startch, startch);

As you can imagine, the above for loop prints the ASCII codes of the upper case letters alone. startch is assumed to have been declared as an int or a char. A number of points require attention here:

1. The set of statements specified in the initialization part of the for loop's header gets executed just once, as soon as the for loop is reached, irrespective of how many times the loop iterates.
2. Every iteration of the loop is immediately preceded by the test.
3. For each iteration, after the body is fully executed, the terminating statements get executed.

Though the for loop's header has 3 parts to it, not all of them are compulsory. If you do not need any initial actions, you need not provide any; if you do not want to supply a test, you can omit that too; no terminating action need be specified if none is called for; and the body can be reduced to a lone semicolon. The minimal syntactically correct for loop is:

    for (;;) ;

What does it do? Nothing is initialized, nothing is done, and since an omitted test is treated as always true, the loop never ends. In other words, you have in your hands an infinite do-nothing loop. What the C compiler does insist on is that there be two semicolons inside the header, the first one separating the (possibly empty) initializer from the (possibly empty) expression and the second one separating the expression from the (possibly empty) terminating statements.

While on one hand you may omit specific portions from the for loop's header, on the other you may, if you so desire, supply multiple initialization statements and multiple terminating statements, separated by commas. Of course, the expression specified can be either a simple condition or a compound condition, and the body can be a simple statement or a compound statement. The following is a syntactically correct for loop, assuming that i, j and k have been declared:

    for (i = 0, j = 10, k = 20; i < 10 && j < 40; j++, k += 2, i *= 2)
    {
        /* do something here */
    }
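A smaller case of a header with two initializations and two terminating actions may help. The sketch below is illustrative only and the function name steps_until_meet is hypothetical: one counter moves up, the other moves down, and the loop ends when they meet or cross.

```c
/* Hypothetical helper: counts the iterations needed for an upward
   counter and a downward counter to meet or cross. */
int steps_until_meet(int low, int high)
{
    int i, j, steps = 0;

    /* two initializations and two terminating actions,
       each pair separated by a comma */
    for (i = low, j = high; i < j; i++, j--)
        steps++;

    return steps;
}
```

Starting from 0 and 10, the counters meet at 5 after five iterations; if low is not less than high to begin with, the body is never entered.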


The for loop in C, as already mentioned, is quite different from the for loops in other languages. There is no notion of initial value, step, and final value, though the for loop can be used in that context too. In a manner of speaking, the for loop in C is just a variant of the while loop. The while loop can always be replaced with a for loop and vice versa. See with what ease a for loop can be transformed into a while loop:

    for (i = 0; i < 10; i++)
    {
        action - 1;
        action - 2;
        action - 3;
    }

    i = 0;
    while (i < 10)
    {
        action - 1;
        action - 2;
        action - 3;
        i++;
    }

The foregoing discussion should not be misunderstood to mean that the for loop in C is by any means redundant. The for loop, with its header clearly spelling out the initialization statements, the controlling condition and the terminating statements all in one place, is indeed a very elegant choice for several situations.

You may at this point recall some of the rules associated with the for loop in other languages. In languages like BASIC and Pascal there are certain rules which you have to follow, typically: you cannot alter the for loop's index variable within the body of the loop; where you have a nested for loop, the inner for loop's index variable cannot be the same as that of the outer for loop; and as soon as you exit the for loop, the value of its index variable is undefined. Well, none of these is true of the for loop in C. If you want, you are at liberty to alter the controlling variable, nested for loops can employ the same variable, and finally the controlling variable's value is very much defined on exit from the loop.

To demonstrate the relationship between the while loop and the for loop, and also to show that the C for loop is not tied down to the paradigm of initial value - step - final value, let us rewrite the program to print a file page by page, this time employing the for loop instead of the while:

    #include <stdio.h>
    #include <unistd.h>    /* sleep() on UNIX-like systems */

    int main(void)
    {
        int lines = 0;
        int ch;

        for (ch = getchar(); ch != EOF; ch = getchar())
        {
            putchar(ch);
            if (ch == '\n')    /* \n is C's notation for line feed! */
            {
                lines++;
                if (lines == 25)
                {
                    lines = 0;
                    sleep(5);    /* stop for 5 sec! */
                }
            }
        } /* end of for */
        return 0;
    } /* end of main */
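As a further illustration that the header need not follow the initial value - step - final value pattern, here is a small sketch (the function name sum_digits is hypothetical) in which the "step" is a division and the loop ends when the number has been used up:

```c
/* Hypothetical helper: adds up the decimal digits of a non-negative number. */
int sum_digits(int n)
{
    int sum;

    /* initialize the total; continue while digits remain;
       drop the last digit after every iteration */
    for (sum = 0; n > 0; n /= 10)
        sum += n % 10;

    return sum;
}
```

For 1234 the body runs four times, adding 4, 3, 2 and 1 in turn; for 0 the test fails at once and the body is never executed.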

THE SWITCH - CASE STRUCTURE

C provides a switch case structure also. Similar constructs are available in other languages too, as in Pascal, dBASE and of course the Bourne shell. For those of you who know FORTRAN, the switch case structure is quite analogous to the computed GOTO statement. One of the first observations that we shall make is that the switch case structure is not a loop: it is instead a multiple path decision making tool. You supply several alternative courses of action, and which one the program will take is governed by the value of the case index variable. Look at the statement below. It portrays the general form of the switch case structure.

    switch (integral expression)    /* case index var */
    {
        case value1:
            C statements;
            break;
        case value2:
            C statements;
            break;
        case value3:
            C statements;
            break;
        :
        :
        default:
            C statements;
            break;
    } /* End of switch statement */


Note that in the above statement switch, case, break, and default are C keywords, while value1, value2 and so on will be substituted with actual integral values by you. As soon as execution reaches the switch case structure, the case index variable's value is compared one by one with the listed case values. Whichever listed case value matches the case index variable's value, the C statements corresponding to it get executed. When a break statement is encountered, you break out of the switch case structure. If none of the listed case values matches the value of the case index variable, then the set of statements associated with the default clause gets executed.

A number of points need close attention here: The case index variable has got to be an integral variable. Floats and strings can just not be used. The listed case values must be integral constants. You cannot use a variable in a case. No two case values can be identical.

Let us take a look at the default clause now. The default clause need not be the last in the ladder. It can appear anywhere, even as the first. It is selected only if none of the other explicitly listed values matches the value of the case index. A default clause is optional. What happens when you do not provide a default clause and the case index variable's value is not matched? Well, nothing happens. You simply get out of the switch case structure.

The break statement that you see used in the example above is often necessary but not compulsory. The break ensures that you do not fall through and reach the statements associated with the next case value. If it is not specified, then execution will not stop with the selected set of statements alone. Instead you will fall through and execute the statements corresponding to the next case also, even though the case index variable's value is different.
Look at the example below:

    int index = 10;

    switch (index)
    {
        case 10: printf("Index var value = 10 \n");
        case 20: printf("Index var value = 20 \n");
        case 30: printf("Index var value = 30 \n");
    }

The switch case structure above will execute all 3 printf calls if the variable index has the value 10. This is at times convenient: when you want the same set of statements to be executed for different values of the case index variable, as in:

    switch (var)
    {
        case 1: case 3: case 5: case 7: case 9:
            printf("Odd \n");
            break;
        case 2: case 4: case 6: case 8: case 10:
            printf("Even \n");
            break;
    }
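The same grouping idea can be sketched with a more familiar problem. The function name days_in_month below is hypothetical, and leap years are deliberately ignored to keep the illustration short:

```c
/* Hypothetical helper: number of days in a month (1 = January),
   ignoring leap years; returns 0 for an invalid month number. */
int days_in_month(int month)
{
    int days;

    switch (month)
    {
        case 4: case 6: case 9: case 11:
            days = 30;          /* all four labels share these statements */
            break;
        case 2:
            days = 28;          /* leap years ignored in this sketch */
            break;
        case 1: case 3: case 5: case 7: case 8: case 10: case 12:
            days = 31;
            break;
        default:
            days = 0;           /* no listed value matched */
            break;
    }
    return days;
}
```

Each group of case labels falls through to the one set of statements that follows it, and the break after each group keeps control from running on into the next group.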

Recall the free form input nature of C. Let us demonstrate the use of the switch case structure with a small program: a four function calculator which will use infix notation.

    #include <stdio.h>

    int main(void)
    {
        int first, second;
        char operator;    /* %c stores into a char, not an int */

        /* the space before %c skips any blanks before the operator */
        scanf("%d %c%d", &first, &operator, &second);
        switch (operator)
        {
            case '+':
                printf("%d + %d = %d\n", first, second, first + second);
                break;
            case '-':
                printf("%d - %d = %d\n", first, second, first - second);
                break;
            case '*':
                printf("%d * %d = %d\n", first, second, first * second);
                break;
            case '/':
                printf("%d / %d = %d\n", first, second, first / second);
                break;
            default:
                printf("Unknown operator \n");
                break;
        } /* end of switch */
        return 0;
    } /* end of main */

One small observation: See the way the constants are specified in the case ladder. With C you never have to find out the ASCII code of a char: you can simply specify the character itself inside single quotes. The compiler will automatically insert the character's code for you. We have indeed already used this kind of construct in:

    if (ch >= 'a' && ch <= 'z')

You can, if you want, take the harder way and specify

    if (ch >= 97 && ch <= 122)

But you can immediately see that by taking the latter approach your program loses its readability.

break & continue

C provides two statements, break and continue, using which the normal behaviour of a loop can be altered. We have already used the break statement. It can also be used inside a while loop, a for loop, or a do - while loop. Wherever it is used it always performs the same job:

It causes control to break out of the innermost enclosing loop or switch, irrespective of whether the controlling condition is true or not. Pay some attention to the phrase "innermost". C does not provide any form of break that leaves an outer enclosing loop. Because of its nature a break inside a loop will almost always be conditional (attached to an if). An unconditional break inside a loop is meaningless, since the loop will break in the very first iteration.

    while (1)
    {
        /* do something */
        if (some condition)
            break;
        /* do something else */
    }

It is quite customary for a C programmer to employ a while (1) loop as shown above. Since, as we already discussed, 1 is always true in C, the loop appears to be an infinite loop while actually it is not. By using the break statement within the loop, the programmer causes control to break out.

Though the break statement is convenient at times, excessive use of it violates the principles of structured programming. One of the axioms of structured programming is that a control structure should have just one entry point and one exit point. Where break statements are liberally used, it is no longer apparent, by just looking at the top of the loop (while and for) or the bottom of it (do - while), what all the conditions are that cause the loop to end. Thus your program comes to lose readability.

The continue statement, whenever executed, causes the rest of the current iteration to be skipped and the next iteration to begin, subject of course to the truth of the controlling condition. In a manner of speaking, the continue statement is like a goto statement executed to reach a hidden label at the end of the body. Note that the continue statement can be used only inside a loop.
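One consequence of the "hidden label at the end of the body" view is worth seeing in code: in a for loop the terminating statements in the header still run after a continue, whereas in a while loop any stepping you wrote at the bottom of the body is skipped. The sketch below is illustrative only; both function names are hypothetical.

```c
/* Hypothetical helper: sums the odd numbers in 0 .. n-1 using a for loop. */
int sum_odd_for(int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
    {
        if (i % 2 == 0)
            continue;       /* skips the addition; i++ in the header still runs */
        sum += i;
    }
    return sum;
}

/* The same job with a while loop: the step must be done by hand
   before continue, or the loop would never advance past an even i. */
int sum_odd_while(int n)
{
    int i = 0, sum = 0;

    while (i < n)
    {
        if (i % 2 == 0)
        {
            i++;
            continue;
        }
        sum += i;
        i++;
    }
    return sum;
}
```

Both functions give the same total; leaving out the i++ before continue in the while version would turn it into an endless loop as soon as i is even.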
Study the code below. The first loop uses continue; the second achieves the same effect with a goto to a label placed at the end of the body:

    while (exp)
    {
        /* do something */
        if (a == 10)
            continue;
        /* do something else */
    }

    while (exp)
    {
        /* do something */
        if (a == 10)
            goto end;
        /* do something else */
    end: ;
    }

The minimal for loop revisited

One final word: Recall the minimal for loop that we discussed earlier, reproduced below for convenience:


    for (;;) ;    /* legal C statement */

A semicolon is a legal statement in C. It can be used wherever a statement is required syntactically. What happens when you execute a semicolon? Nothing really. A semicolon results in no operations. This is important because many an experienced programmer has spent hours debugging code like this:

    for (i = 0; i < 10; i++);    /* see the semicolon */
    {
        /* this is supposed to be the body */
        /* while it actually is not */
        /* See the semicolon after for loop's header */
        /* THIS BLOCK IS NOT CONNECTED TO for LOOP AT ALL !!! */
    }

The above code fragment will not draw any errors from the compiler (after all, it is syntactically correct). You will only be left wondering why in the world the block is executed just once, after the loop has finished, instead of ten times.

Let us combine the skills we have acquired to write one more program in this chapter: a program which will check for syntax errors in your C programs. Although C is not a line oriented language, programmers generally tend to place their statements in a line oriented fashion. Given this, you can say that in any line there will be an equal number of open and close parentheses and brackets (used with array indexing), and that in any given file there will be an equal number of open and close braces. So let us write a program which will check for this and report an error on mismatch.

    #include <stdio.h>

    int main(void)
    {
        int bracket = 0, parenthesis = 0, brace = 0, line = 0, ch;

        while ((ch = getchar()) != EOF)
        {
            if (ch == '\n')
            {
                line++;
                if (bracket != 0)
                {
                    printf("Bracket mismatch in %d\n", line);
                    bracket = 0;
                }
                if (parenthesis)    /* 0 is false & nonzero true! */
                {
                    printf("Parenthesis mismatch in %d\n", line);
                    parenthesis = 0;
                }
            }
            switch (ch)
            {
                case '(': parenthesis++; break;
                case ')': parenthesis--; break;
                case '[': bracket++;     break;
                case ']': bracket--;     break;
                case '{': brace++;       break;
                case '}': brace--;       break;
            } /* end of switch */
        } /* end of while */
        if (brace)
        {
            printf("Brace Mismatch\n");
        }
        return 0;
    } /* end of main */

The program checks for braces only at EOF because braces normally do not occur in pairs in the same line. The program is not infallible of course. But it can be useful.


Chapter 5

Array and String
ARRAYS

Arrays are a homogeneous compound data type. Since an array is fundamentally homogeneous in nature it always has a base type. Thus you can talk of an array of ten ints or an array of a hundred chars. You cannot have an array where some elements are ints and some are floats. Why are arrays so useful? This is because you can treat an array as a single unit when you desire, or break it down to pieces and access the elements independently.

Imagine that you are writing a program to print mark sheets for students for the entire degree course. If there are forty papers in all that one has to appear for to qualify for the degree, then you will need forty int variables to store the marks. Just imagine how you would have to scan them from the keyboard:

    scanf("%d%d%d%d%d ..........", &m1, &m2, &m3, ............, &m40);
    /* 40 format descriptors and 40 different variables */

Quite a cumbersome process indeed! Here is where arrays come into the picture. Look at how elegantly you will accomplish the same thing using arrays:

    #include <stdio.h>

    int main(void)
    {
        int marks[40];
        int i;

        printf("Enter the 40 marks...\n");
        for (i = 0; i < 40; i++)
            scanf("%d", &marks[i]);
        /* process the marks here */
        return 0;
    }

Arrays, with your ability to index across them and thereby access the independent elements, will prove to be an extremely elegant and useful data type for a number of problems. Let us get down to brass tacks and study how to declare arrays, how to initialize them, how to use them and so on.

In C, an array variable is declared by specifying first the base type of the array, then the name of the array variable, and then the number of elements the array will have, followed optionally by a list of initial values. Look at the general form of array declaration shown below:

    type name [no of elements] = { List of initial values };

For example:

    int array[5] = { 10, 20, 30, 40, 50 };
    float values[3] = { 12.56, 34.78, -90.67 };
    char string[10] = { 'H', 'e', 'l', 'l', 'o' };

As in the case of other data types, initial values are optional and need not be specified. If no initial values are specified, all elements start up with an undefined value. You will recall our earlier discussion that in the absence of explicit initialization by the programmer all variables have an undefined value. When you do want to supply initial values, note that the list of values will have to be comma separated and enclosed between braces. You may supply fewer initial values than there are elements in the array: the rest of the elements are then set to zero.

Now let us look at the array declaration itself. The number of elements that the array will have should be specified between a pair of square brackets ([]). Note that in this text this value cannot be a variable and has to be an integral constant. See how C uses various symbols for various purposes: brackets ([]) are used for array indexing, parentheses (()) are used for function calls and braces ({}) are used for enclosing the body of blocks. Their use is fixed and cannot be interchanged.

When you declare an array you must specify (either directly or indirectly, as we will see later) how many elements the array will have. This is necessary because the compiler will have to know how much memory to reserve for this array. So if you declare an integer array like this:

    int array[5] = { 10, 20, 30, 40, 50 };

the compiler will reserve twenty contiguous bytes in memory to hold these five integer elements, assuming an int occupies 4 bytes. See the diagram below:

    +----------+----------+----------+----------+----------+
    |    10    |    20    |    30    |    40    |    50    |
    +----------+----------+----------+----------+----------+
      array[0]   array[1]   array[2]   array[3]   array[4]

Having declared such an array, it is now possible for you to access the individual elements by using indexing, or what is otherwise called subscripting. One of the very first observations that we shall make is this: In C, array indexing always starts at zero. The very first element is array[0] and not array[1].
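Two of the points above can be checked with a short sketch. The function names are hypothetical. The first function uses sizeof to recover the element count from the memory the compiler reserved; the second relies on the rule in standard C that, when an initializer list is shorter than the array, the remaining elements are set to zero.

```c
#include <stddef.h>    /* size_t */

/* Hypothetical helper: element count of a five element array,
   computed as total bytes divided by bytes per element. */
size_t element_count_demo(void)
{
    int array[5] = { 10, 20, 30, 40, 50 };

    return sizeof(array) / sizeof(array[0]);
}

/* Hypothetical helper: sums an array given only two initial values.
   values[2], values[3] and values[4] are set to 0 by the compiler. */
int partial_init_sum(void)
{
    int values[5] = { 10, 20 };
    int i, sum = 0;

    for (i = 0; i < 5; i++)
        sum += values[i];
    return sum;
}
```

The division in element_count_demo gives 5 whatever the size of an int happens to be on your machine, which is why it is preferred over hard-coding a byte count.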
So considering the above declaration, the legal elements of the array are array[0], array[1], array[2], array[3] and array[4]. array[5] is undefined and it is illegal to attempt to access array[5]. So if you want to access the second element of the array and assign 100 to it, you can achieve this by saying:

    array[1] = 100;    /* indexing starts at 0 ! */

We already saw that wherever a particular data type is legal, in all those places a constant of that type, a variable of that type, an expression which yields a result of that type, and a function that returns that type are also legal. In C the array index will have to be an integral type: so the index that you specify can be an int constant (like 2 or 33), an int variable, an integer expression, or a function that returns an integer. Study the following code fragment. What does it do?

    #include <stdio.h>

    int main(void)
    {
        int arr[5] = { 10, 20, 30, 40, 50 };

        printf("Which element do you want to display? (<5)\n");
        printf("Here it is : %d \n", arr[getchar() - '0']);
        return 0;
    }

Array Overflow

One of the extremely important points that you will note here is that it is illegal to access a non-existent element of the array. The array indexing that you perform should not leave either bound of the array. Note that C traditionally does not check for array overflow at all. In other languages like Pascal, BASIC and Clipper, the compiler attaches run time support code to your executable file. The duty of this run time support code is to verify whether things like numeric overflow and array overflow take place during the execution of the program. In such languages, an array overflow will immediately result in a run time error. The C run time system never checks for array overflow: so it is entirely the programmer's responsibility to ensure that any indexing performed does not cross the upper or the lower bound of the array.

There are two sides to this issue: On one hand, the considerable overhead involved in checking for array overflow is not present in C, and your programs consequently are considerably faster. On the other, the onus of ensuring the integrity of your program lies with you. What happens when you do by mistake cross the bounds of the array? Look at the following code fragment:

    int arr[5];
    arr[5] = 100;    /* arr[5] is illegal */

Assuming a 4 byte int, you are now moving a 32 bit pattern corresponding to the binary value of 100 onto the 4 bytes that immediately follow the end of the array. What do these four bytes contain? It is not known. The C compiler generally stores all your variables together. So when you are executing the assignment arr[5] = 100, you possibly are overwriting some other variables that your program uses. In the best case you will end up overwriting one variable (either a single int of 4 bytes or a single float) and in the worst case you will end up overwriting 4 variables: 4 different char variables.
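Since C will not make the check for you, one option is to make it yourself before every doubtful store. The following is a minimal sketch, not a standard facility: the function name checked_store and its convention of returning 1 or 0 are assumptions made for this illustration.

```c
/* Hypothetical helper: stores value in arr[index] only when index lies
   inside 0 .. size-1. Returns 1 on success, 0 if the index is out of bounds. */
int checked_store(int arr[], int size, int index, int value)
{
    if (index < 0 || index >= size)
        return 0;           /* refuse to touch memory outside the array */
    arr[index] = value;
    return 1;
}
```

With a five element array, an attempt to store at index 5 or at index -1 is rejected and the memory around the array is left alone, at the cost of one comparison per access.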

The treatment of array overflow in C merits a closer look. Though on one hand your programs gain the advantage of speed, this benefit does not come to you without a price. In other languages, as soon as your program makes an attempt to access a non-existent element, it crashes with a run time error. Well, all is not lost: you at least know that there is an array overflow. But what is going to happen to your C program? This depends purely on how significant the overwritten variables are and where they are used subsequently in your program. It is not inconceivable that an array overflow that takes place when the hundredth line of your source is executed causes your program to misbehave at a much later time, when possibly the thousandth line is being executed. This kind of delayed manifestation of misbehaviour can be extremely frustrating, because it puts you totally on the wrong scent. You end up suspecting innocent code and spend precious hours debugging code that is perfectly all right, not at all suspecting the offending line that is possibly in an altogether different part of the source file.

The above paragraph seems to paint a dismal picture: Well, it was not intended to. The problem is not really as complex as that. There is a very simple solution: ensure in the first place that you never cross the bounds of the array.

Note that an expression like arr[0] has the value of the very first (0th) element and is of the same type as that element. So the expression arr[0] can be used in any place where an int can be.

Having discussed the fundamental issues related to arrays, let us write a few programs using arrays. The first of the programs that we shall write is to sort an array of ten integers. To start with, we will employ one of the simplest sorting algorithms: the exchange sort. Several sorting algorithms have been developed, and a considerable amount of literature has been devoted to sorting.
Why is sorting an important exercise at all? This is because sorting makes subsequent searching easy. Two simple examples come to mind: the dictionary and the telephone directory. Just imagine your plight if you were to search for the meaning of a word in an unsorted dictionary!

Before we see the program let us discuss the algorithm. To perform exchange sort on an array of ten integers, first of all you gather the input. Once the input is available the sorting can begin thus: You compare the first element of the array, one by one, with every subsequent element of the array. Whenever you find that the first element is greater than the element with which it is being compared, exchange the two. What would you achieve by this? Whatever is the least of all the numbers present in the array, wherever it was, would now have come to the very first position (array[0]) in the array.


Now the minimum has come to array [0] and your job is to sort the rest of the array. So start with array [1] and compare it one by one with all the subsequent elements and effect a swap whenever necessary. Repeatedly apply this process to every element of the array and you will finally have the array completely sorted.

The following program implements this algorithm:

    #include <stdio.h>
    #define MAXELEMENTS 10

    int main(void)
    {
        int array[MAXELEMENTS];
        int i, j, temp;

        printf("Enter the numbers to sort :");
        for (i = 0; i < MAXELEMENTS; i++)
            scanf("%d", &array[i]);

        for (i = 0; i < MAXELEMENTS; i++)
            for (j = i + 1; j < MAXELEMENTS; j++)
                if (array[i] > array[j])
                {
                    temp = array[i];
                    array[i] = array[j];
                    array[j] = temp;
                }

        printf("The sorted List of numbers is : ");
        for (i = 0; i < MAXELEMENTS; i++)
            printf("%d\n", array[i]);
        return 0;
    } /* end of main */

Note the fact that the number of elements of the array is not coded into the program directly. We have instead chosen to use a define directive. As we discussed in the second chapter, this allows us to modify the program easily. If we want the program to sort an array of 20 ints, all we need to do is change this directive and recompile the program.

The outer for loop selects one element for the comparison, starting from array[0]. This element is compared one by one with the rest of the elements in the inner for loop. You will note here that the entire inner for loop is executed for every iteration of the outer for loop and the elements are interchanged whenever necessary, thus leaving the array completely sorted.

See the way the numbers are scanned from the standard input: Since array[0], array[1], array[2] and so on are the elements of the array, and further since these are integers, an ampersand has been prefixed to the expression array[i].

Linear Search

One of the simplest search algorithms is the linear search. This is simply to start at the first element of the array and keep searching for the element of interest. Search can be stopped


when the element is found or you have exhaustively searched the entire array. Look at the program below:

    #include <stdio.h>

    int main(void)
    {
        int list[] = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 };
        /* See that the array size is not specified!!! */
        int number;
        int i;

        printf("Enter the number to Search ");
        scanf("%d", &number);
        i = 0;
        do
        {
            if (number == list[i])
            {
                printf("%d Found in position %d\n", number, i);
                break;    /* no point continuing */
            }
            i++;
        } while (i < 10);
        if (i == 10)
            printf("%d not found in the array \n", number);
        return 0;
    } /* end of main */

The program itself is quite simple and does not call for any more explanation. Instead let us look at the array declaration. See that the size of the array is not specified. This is perfectly legal in C. When you supply a list of initial values, the compiler can automatically determine the number of elements in the array.

Binary Search

The linear search that we studied just now is quite simplistic and inefficient. Let us take up the binary search algorithm now. To perform binary search, one of the very first requirements is that the array must be sorted. The following discussion assumes that you have an array of n integers and that the array has already been sorted in ascending order.

To search for any particular value, you divide the array into two equal intervals and look at the middle element of the array. If the number you are searching for is equal to what is found at the middle of the array, then the search is over. On the other hand, if you find that the middle element of the array is less than what you are searching for, then you can be sure that the element you are searching for can just not be present in the left half of the array and must be present, if at all, only in the right half. So you can treat the right half of the array as the search space, divide that into two intervals and look at its middle. Conversely, if the middle element is greater than what you are searching for, only the left half needs to be searched.

Applying the process again and again, you will continue until either you hit upon the element you are searching for or the interval comes down to 0, in which case the element is not present in the array. You can see that with every comparison the search space comes down by half. So if initially your array has 1000 elements, your search space consists of 1000 numbers. After the first comparison your search space comes down to just 500 elements (either the first half of the array or the latter half), and after the second comparison it further comes down to just 250 elements. Hence the name binary search. Look at the program below:

    #include <stdio.h>
    #include <stdlib.h>    /* exit() */
    #define MAX 10

    int main(void)
    {
        int Lower_Boundary, Upper_Boundary, Mid;
        int arr[MAX];
        int found = 0;
        int i, number;

        printf("Enter the array elements in ascending order \n");
        for (i = 0; i < MAX; i++)    /* input must be in sorted order! */
            scanf("%d", &arr[i]);
        printf("Enter the number to be searched :");
        scanf("%d", &number);

        Lower_Boundary = 0;
        Upper_Boundary = MAX;
        while (!found)
        {
            Mid = (Lower_Boundary + Upper_Boundary) / 2;
            if (arr[Mid] == number)
                found = 1;
            else if (Lower_Boundary >= Mid)    /* Interval Dwindled to 0 */
            {
                printf("Given Element Not Found \n");
                exit(1);
            }
            else if (arr[Mid] < number)
                Lower_Boundary = Mid;    /* Upper 1/2 to be searched */
            else if (arr[Mid] > number)    /* Lower 1/2 to be searched */
                Upper_Boundary = Mid;
        } /* End of while */

        printf("Given Number Found at position: %d\n", Mid);
        exit(0);
    } /* end of main */

Look at the call to exit. The function exit, whenever executed, immediately terminates the program and returns control to the OS. It is possible for you to return a status code to the invocation environment by passing the status code as a parameter to exit. This status code is precisely the same as the exit status that you check for in shell programming. By convention an exit status of 0 is returned to indicate success and non-zero to indicate failure of some sort.
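The halving idea can also be sketched as a function that can be called from any program. This is a variant and not the program above: the name binary_search is hypothetical, it returns -1 instead of calling exit when the value is absent, and it moves the lower boundary to one past the middle so that the interval shrinks on every pass.

```c
/* Hypothetical helper: returns the index of key in arr[0 .. n-1],
   which must be sorted in ascending order, or -1 if key is absent. */
int binary_search(const int arr[], int n, int key)
{
    int lower = 0;      /* first index still in the search space */
    int upper = n;      /* one past the last index in the search space */

    while (lower < upper)
    {
        /* written this way so that lower + upper cannot overflow */
        int mid = lower + (upper - lower) / 2;

        if (arr[mid] == key)
            return mid;
        else if (arr[mid] < key)
            lower = mid + 1;    /* upper half to be searched */
        else
            upper = mid;        /* lower half to be searched */
    }
    return -1;                  /* interval dwindled to 0 */
}
```

For an array of 1000 sorted elements the loop runs at most about ten times, since each pass discards half of what remains.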


INTRODUCTION TO STRINGS
In this chapter, we will have a very limited introduction to strings. (Strings are the main topic for a later chapter). Strings are nothing but single dimensional char arrays. So whatever rules are applicable to single dimensional arrays are applicable to strings too. Look at the following string declaration: char message[10] = { 'H', 'e', 'l', 'l', 'o',’\0’ } ; In as much as strings are a common data type in C the compiler gives you a shorthand for string initialization. The following declaration will serve the same purpose : char message [10] = "Hello"; See that string constants are enclosed in double quotes in C. We have already used them in printf and scanf. As we discussed earlier, you can input and output strings using scanf and printf respectively using the %s format descriptor. C also provides you two more specialized string I/O functions - gets and puts. The function gets accepts one parameter - a char array. It reads a string (until new line) from the standard input and stores it in the given array (excluding the new line). If no input is available, (end of me encountered) gets returns a special value called NULL. Note that EOF is not returned by gets. NULL is a symbolic constant defined in stdio. h whose value is 0. The function puts accepts one parameter - the string that you want to print. It prints the string on the standard output and further automatically issues a new line. So if you have a string declared like this: char name [20] ; you can perform I/O using the following statements :
    printf("%s", name);    /* No automatic new line! */
    puts(name);            /* equivalent to printf("%s\n", name); */
    gets(name);            /* reads until new line */
    scanf("%s", name);     /* No & for strings! Scans only up to white space */

STRINGS IN C
We have already seen how to declare strings, how to initialize them, and how to perform I/O on strings (refer to chapter 5). In this chapter, we will find out how to operate on strings. Strings are internally represented as single dimensional char arrays. So if you declare a string like this:

    char name[10] = "chintu";

the compiler stores the string in memory as shown below:

    c  h  i  n  t  u  '\0'

See that each cell of the array stores the ASCII code of one letter of the string. Every string has two attributes attached to it: one is the size of the string and the other is its length. The size of the string is fixed, decided by the programmer at the time of writing the program. For example, the size of the string declared above is 10 bytes. The size cannot change at run time. The only way to change the size of the string is to modify and recompile the source program. The other attribute of the string, its length, is variable and can change at run time depending upon what you store in the string. If you store "Raama", the length of the string is 5, but if you choose to store "Ramaswamy" in the string, its length is 9.

String Delimiter in C
Since the length of the string can change at run time, there must be some means of identifying where the actual contents of the string end. What is required, in essence, is a string delimiter. C uses what is called a null byte as the string delimiter. A null byte is a byte in which all eight bits are zero. Since a character can be treated as an integer in C, the null byte has the numeric value 0. Please note that the null byte is not the same as the character '0'. The ASCII code of '0' is 48, not 0.

Any language that supports strings must provide some means by which the programmer and the functions in the language can identify where a string ends. Let us, for the sake of evaluation, compare how Pascal handles this problem with C's approach. In Pascal, whenever you declare a string, one byte of the string (its 0'th cell) is reserved to store its length. The actual contents of the string start only from the subsequent cell. C does not use this kind of "length byte" approach, but instead, as we have seen, defines a specific character (the null byte) as the string delimiter. Which of these approaches do you think is better?
As you can immediately conclude, C's design and implementation of strings is far superior in this respect, because in Pascal a string's length cannot exceed 255 characters: the length byte, being a single byte after all, cannot store a value greater than 255. C has no such limitation: a string can be arbitrarily large. Note that in C a null byte can occur in a string only as the delimiter and not as a part of the contents. But this is hardly a handicap, because you will in general never need to store a null byte as a part of a string.

Note: In the above discussion, by Pascal we are referring to Turbo Pascal. ISO Pascal does not allow operations on strings at all.

All string functions in C take cognizance of the presence of the null byte as the string delimiter and act accordingly. puts and printf (when you use it to print a string) will stop printing as soon as they encounter the null byte. Similarly, gets and scanf (when you use it to scan a string) will append a null byte to the end of the string in the array. strcpy will copy only up to the null byte in the source string. strlen, when computing the length of the string, counts only up to the null byte (excluding it). strcpy and strlen are string functions provided in the C library. We will be studying them shortly.

At this point, you will be able to appreciate the difference between a character and a string whose length is 1. That is to say that "A" is totally different from 'A'. Their internal representations are different, their types are different, they occupy different numbers of bytes, and consequently where one can be used, the other can never be used.

String Functions
One of the first observations that we shall make is this: there are absolutely no operators in C which operate on strings. A BASIC programmer can assign one string to another using the assignment operator '=' (as in A$ = B$), compare two strings using the equality operator '=' (as in IF A$ = B$ THEN...) and concatenate two strings using the string catenation operator '+' (as in C$ = A$ + B$). A dBASE programmer can, in a similar way, perform operations on strings using operators. See that you cannot achieve these operations in C using operators, as in:

    char str1[20], str2[20], str3[40];

    str1 = str2;           /* won't compile at all! */
    if (str1 == str2)      /* won't get you anywhere */
    str3 = str1 + str2;    /* Blasphemous! */

Note: The only exception to this is when you declare a string. At the time of declaration of a string, you can assign an initial value to it by using '='.

So if that is the case, how do we operate on strings at all? The C standard library provides you with a whole set of functions which specialize in string manipulation. There are numerous string functions which perform all kinds of odd jobs on strings. The prototypes of these functions are declared in the file string.h. This file should be included to make use of these functions. Let us study the important ones now.

C provides a function called strcpy to copy one string to another. So, using the declarations provided above, if you want to assign str2 to str1 you will achieve this by saying:

    strcpy(str1, str2);

Note that the function strcpy takes two parameters. The first one is the destination string and the second is the source.
The source can be a constant string, as in:

    strcpy(str1, "Raama");

whereas the first parameter, the destination, has to be a variable. The result of strcpy is undefined when the first parameter is a string constant, as shown below:

    strcpy("You are Doomed", " If you do this");

And last but definitely not the least: the receiving string has to be wide enough to store whatever you are trying to move into it. Trying to move in a string that is wider than the destination will produce the same kinds of complications as we discussed in chapter 5 (refer to the section on Array Overflow). Note that if you declare a string of 10 chars (as in: char string[10];), the longest string that you can store in it is a string of 9 characters and not 10 characters. This is of course because one byte is required to store the delimiter, the null byte.

strcpy copies the source string to the destination string character by character until the null byte. As you can imagine, strcpy copies the null byte also. Otherwise the end of the string could not be properly detected.

C provides a function by name strcmp which can be used to compare two strings. The function strcmp takes two parameters, the strings that have to be compared. It returns 0 on a perfect match, a negative value when the first string is alphabetically less than the second, and a positive value otherwise. The comparison is of course case sensitive. See how it can be used:

    if (!strcmp(str1, str2))        /* ! is the NOT operator */
        /* strings are identical; act accordingly. */

or as in:

    if (strcmp(str1, str2) == 0)    /* returns 0 on match, not 1 !!! */
        /* strings are identical; act accordingly. */

strlen can be used to calculate the length of a string. Pass the string as a parameter to it. strlen computes the length and returns it, and you can capture the result in a suitable variable of yours. Note that the null byte is not counted when the length of a string is determined.

To concatenate two strings, C provides a function by name strcat. strcat's use is illustrated in the example below:

    char str1[20] = "Chintu";
    char str2[20] = "Kumar";

    strcat(str1, str2);    /* str1 now contains "ChintuKumar" */

The first parameter is the destination string, and the second is concatenated to the end of it. Of course the first parameter must be a variable which is wide enough to store the full string that results from the catenation.
To illustrate the use of string functions, let us now write a program which will print the longest line in its input:

    #include <stdio.h>
    #include <string.h>

    main()
    {
        char longest[256] = "";
        char line[256];

        while (gets(line) != NULL)    /* Returns NULL on end of file! */
        {
            if (strlen(line) > strlen(longest))
                strcpy(longest, line);
        }
        printf("%s\n", longest);
    }

Passing strings to functions
Let us write a few functions now that operate on strings. The first that we shall write is a function which will convert a given string to upper case. Since it is sufficient for the function to achieve the conversion, and it need not really return anything at all, we will make it a void function. See the function declaration below:

    void capitalize(char str[]);

The function will be called as:

    capitalize(string);

Take a look at the body of the function produced below:

    void capitalize(char str[])    /* See that the size is not specified. */
    {
        int i = 0;

        while (str[i] != 0)        /* while not end of str */
        {
            if (str[i] >= 'a' && str[i] <= 'z')
                str[i] = str[i] - 32;
            i++;
        }
    }

The function itself does very little: starting from the zeroth element, it scans the string character by character, and whenever it encounters a lower case letter, converts it to upper case (in ASCII, an upper case letter is 32 less than its lower case counterpart). See how the formal parameter is declared. The declaration does not specify the size of the array. When an array is a formal parameter, do not specify the size. For one thing, you are not trying to allocate memory: memory allocation for the string has been done elsewhere. For another, the caller can pass differently sized arrays. So it does not make sense to supply the size when an array is a formal parameter to the function.

Before we close the chapter, we will write just one more function: a function to copy one string to another. It shows how strcpy can be implemented:

    void copy(char str1[], char str2[])
    {
        int i = 0;

        while (str2[i] != '\000')
        {
            str1[i] = str2[i];
            i++;
        }
        str1[i] = '\000';    /* Null byte has to be copied too! */
    } /* end of function copy */

See how the null byte is represented in the program. C provides a convenient notation to express character constants in octal in your program: you supply the octal code in the form '\000'. There is yet another way by which an octal number or a hexadecimal number can be specified directly in your program. According to C's conventions, any number that starts with a leading 0 is treated as an octal number and any number that starts with a leading 0x is treated as a hexadecimal number.


Chapter 6

Function
As already mentioned, functions in C are nothing but subroutines. Functions act as the fundamental building blocks of a large program. Let us ask ourselves an elementary, yet important question now: why use functions at all? For a number of reasons.

A programmer can identify whatever actions are performed repeatedly in various parts of the program, write them as a function and call the function from the various parts of the program. This approach avoids code duplication. Instead of duplicating the same code in different parts of the program, you just write the function once and call it from different parts of the program, thus reducing the executable file size. In multitasking systems like UNIX, there is a direct relationship between the amount of memory a program consumes and the speed with which it runs. You will, in general, find that large programs which demand a lot of memory are sluggish.

Breaking down your program into subroutines lends your program a structure. You can divide your task into multiple sub-tasks, develop a function for each sub-task, and integrate them into a single program. Once a function is written and completely tested, a programmer can expect it to fit smoothly with the rest of the program modules.

Once a function is written, tested and found to be working as per expectations, the function can be loaded into a library and used in an altogether different application. This approach encourages code reusability.

There are three parts to any function in C. Firstly, a function will have to be declared, then there will be a function call, and finally a function will have a body. For example, take the function printf. As you are aware, the file stdio.h contains the declaration of the function printf, the call to printf will be made from your program, and the body of printf is present in the standard I/O library of C. Let us look at these one by one, starting with the function declaration.
The general form of function declaration is as follows:

    type name(parameter list);

Note: As mentioned earlier, the pre ANSI C style of function declaration is different from what is shown above. A later section on ANSI C presents these differences.

As you can immediately imagine, a function can have a type attached to it. C allows bidirectional transmission of information between the caller of a function and the called function. The caller can communicate with the function by passing parameters, whereas the called function can communicate with the caller by returning a value. The caller is more at liberty here, because C allows you to design functions which receive several parameters, as many as you please. But the called function can return at most one value of one type. So the type of a function specifies the type of the value that the function will return to the caller.

The idea of a function returning a value is nothing new to us. We have been using getchar, which returns an integer. Just as you capture the return value of getchar, you can also capture what your own function returns, by using an assignment operation. The call to your own function is again straightforward. You call it by naming it and by supplying a parenthesized set of parameters, as in:

    val = myfunction(param1, param2);

Let us look at the body of the function now. The general form of the body of the function is as shown below:

    type name(parameter declaration, if any)
    {
        /* Local variable declarations */
        /* Action statements */
    }

We have said often enough that main itself is a function in C. But main, as we have been writing it until now, does not seem to fit the scheme (declaration, call, body) that we just discussed. Let us strive to establish now that main does indeed get treated in the same way as any other function of yours.

First of all, let us address this paradox: we saw that your functions will have to be declared. But main is not declared anywhere, neither by you in your program nor in stdio.h. To understand how this is permissible, you will have to know of a concession that the C compiler makes towards function declarations: if a function returns an int, it need not be declared. The other side of the coin is that if a function is used but not declared, the compiler automatically assumes it to be an int returning function. (This concession belongs to the older C described here; the C99 standard and its successors no longer allow it.) main in C is an int returning function.
The body of main too, as we have been writing it till now, is quite different from the general form that we saw. The two are shown side by side below:

    type name(parameter declaration)        main()
    {                                       {
        /* Local variable declarations */       /* This is how we       */
                                                /* have been writing    */
        /* Action statements */                 /* main till now.       */
                                                /* Looks very different */
                                                /* indeed compared to   */
                                                /* the general syntax   */
    }                                       }

To understand this, we will have to be aware of yet another concession that the C compiler makes: the type of a function need not be specified in the body if the function happens to be an int returning one. Furthermore, main is presently not receiving any parameters, and since there are no parameters, the declaration of parameters is also missing. The objective of the discussion above is to show you that the idea of your own functions is not really anything new. We have been writing a function right from day one. So you can feel at home.

Let us apply the skills acquired to write a function now: a function which will calculate the cube of a given number. Look at the program below:

    #include <stdio.h>

    int number, result;    /* global variables */
    int cube(void);        /* void specifies an empty parameter list */

    main()
    {
        printf("Enter a number ...:");
        scanf("%d", &number);
        result = cube();
        printf("The cube of %d = %d\n", number, result);
    }

    int cube(void)
    {
        return (number * number * number);
    }

main itself is quite simple, so let us look at the function cube instead. See how the function has been declared. The keyword void specifies that the parameter list is empty. Look at the return statement employed in the function cube. return is a keyword in C and is actually a statement, like break and continue. A return statement is used to transfer control from a called function to the caller. Whenever a return statement is executed, the function is over, and control immediately returns to the caller along with any value specified. So in the case of the function cube, the return value is computed and returned to the caller, and as you can see the caller (main) captures this value in a variable by name result. Though the above program is technically correct, there is something wrong in the way the function cube has been written.
A little contemplation will show you that the function cube, as it has been written now, can only be used to calculate the cube of just one variable: the one by name number. If you want to calculate the cube of some other variable, let us say one by name value, the same function cannot be used.

Parameter Passing

How can we write a function in such a way that it can be used to calculate the cube of any variable? This can be accomplished by passing parameters. Parameters are nothing but input information given to a function. By passing parameters, the caller can ask the function to process a given set of values. By passing different sets of values at different points of time, the same function can be made to operate on different data. Let us demonstrate this by writing yet another version of the same function cube. Look at the program below:

    #include <stdio.h>

    int cube(int);

    main()
    {
        int number, result;

        printf("Enter a number ...:");
        scanf("%d", &number);
        result = cube(number);
        printf("The cube of %d = %d\n", number, result);
    }

    int cube(int value)
    {
        int retval = (value * value * value);
        return retval;
    }

Parameter passing is a very powerful mechanism. By passing parameters, as we saw already, you can make the function process different sets of data at different points of time. Thus parameter passing allows you to write generalized, reusable functions. Let us write yet another function now: one which will raise a given number to a given power, i.e. one which, when called as power(x, y), will return x raised to the power y.

    #include <stdio.h>

    int power(int, int);

    main()
    {
        int base, pow;

        printf("Enter 2 numbers ");
        scanf("%d%d", &base, &pow);
        printf("%d power %d = %d", base, pow, power(base, pow));
    }

    int power(int x, int y)
    {
        int result;

        for (result = 1; y > 0; y--)
            result = result * x;
        return result;
    }

Whatever parameters the caller passes are called the actual parameters, and whatever parameters the function is written to receive are called the formal parameters. Considering the program shown above, the parameters that the caller passes, namely base and pow, are the actual parameters, and the parameters the function power receives, namely x and y, are the formal parameters. A formal parameter can be given any name by the programmer. It need not bear the same name as the actual parameter, although it may have the same name if you so desire.

One of the very important rules in C is that the actual parameter list and the formal parameter list must be identical in terms of the number of parameters, the types of parameters and the order of parameters. If the function expects n parameters, be sure to pass the same number of parameters. Where the compiler does not check the call (as in pre ANSI C, discussed below), passing more is usually harmless in practice, though you are strongly discouraged from doing so; passing fewer parameters than the function expects is dangerous. The actual parameter list and the formal parameter list should also be identical, or at least compatible, in terms of the types of parameters.

ANSI C
The American National Standards Institute has formulated a standard for C. ANSI C differs from pre ANSI C in a few major areas. One of them is with respect to function declarations. ANSI C introduces the idea of function prototypes. A prototype of a function clearly spells out the input output relationships of the function. For example, under the new style, the function power will be declared as shown below:

    int power(int, int);

whereas in the pre ANSI compilers, the declaration will have the following form:

    int power();

The function declaration in ANSI C specifies not only the return type, but also the parameters to the function. Please note that the pre ANSI C compilers never have an opportunity to check for parameter mismatches.
The following function calls, which are glaringly wrong, will compile without a problem in pre ANSI C:

    int ch, a, b;

    printf();        /* no parameters at all */
    getchar(ch);     /* getchar does not accept any at all! */
    strcpy(a, b);    /* ints are being passed instead of strings! */

All of the above erroneous calls will be passed by the pre ANSI C compiler without an error. Of course, the program will not run properly and is very likely to crash with a run time error. The pre ANSI C compiler can never check for parameter mismatches, because the compiler just does not know the input output relations of a function. The provision of the prototype is precisely to address this problem. Now, with the new style of declarations, the compiler is equipped with enough knowledge to find out whether a function call is proper, and if not, carry out suitable action like rejecting the source with an error or attempting to provide suitable type conversions. Note that the ANSI C compiler will reject all the above erroneous statements, because the compiler can now verify the validity of your source to a greater degree.

The function prototype not only allows the compiler to verify whether the actual parameter list and the formal parameter list are identical, but also allows the compiler to perform automatic type conversions wherever suitable. For example, suppose the function power is declared with the new style of declaration and is called with float parameters as:

    float f1 = 10.99, f2 = 3.89;

    printf("%d", power(f1, f2));

The compiler will now automatically convert the values of the actual parameters into equivalent int values (10 and 3) and then pass them on to the function power. Thus, as you can expect, printf will now print 1000. In the pre ANSI C compilers, the situation narrated just now will produce totally nonsensical results. This is because the compiler, in the absence of a prototype, is unaware of the fact that the function expects ints. Hence, it does not provide the needed conversion of the internal representation of the values of the actual parameters.

The syntax for the body is also slightly different with ANSI C. The body of the function power is shown below in the new style:

    int power(int x, int y)
    {
        int result;

        for (result = 1; y > 0; y--)
            result = result * x;
        return result;
    }

Contrast this with the old style shown below:

    int power(x, y)
    int x, y;
    {
        int result;

        for (result = 1; y > 0; y--)
            result = result * x;
        return result;
    }


Note that the ANSI C compilers will also accept the pre ANSI form of function declarations for compatibility reasons.

Note: Check whether the compiler you have access to conforms to the ANSI standard or not, and choose the suitable style.

We mentioned earlier that int returning functions need not be declared. But you are strongly discouraged from omitting the declaration. This is because, in the absence of the declaration, the compiler assumes that the function returns an int. Well, this is fine, but what does the compiler know about the parameter list the function expects? Nothing at all! So inadvertent parameter mismatches will go undetected by the compiler.

One final observation before we move on to the next topic. How do we declare a function which does not accept any parameters at all? The proper way of doing this is:

    int some_function(void);

and not

    int some_function();

In the latter case, what really happens is that the compiler interprets the declaration as a pre ANSI C style declaration and turns off all parameter count and type checking.

void functions
A function need not necessarily return a value. If you do not care to return a value from a function at all, you may specify the return type as void. A void function does not return any value and cannot return any value.

The return statement revisited
The general form of the return statement is:

    return (expression);

Where you do not care to return a value (as in a void function) and want to return control alone, you may simply say:

    return;

We will continue with our discussion of functions in this chapter. We made a brief mention of global variables earlier. We shall learn about them in greater detail now. A very clear understanding of the idea of scope is necessary to program successfully in C. We will strive to achieve this. C allows you to declare variables that belong to several storage classes. We shall discuss the different storage classes and their significance.

LOCAL VARIABLES

We saw that the body of any function comprises two parts: the declaration of variables and the set of executable statements. We have indeed been practicing this with respect to main in a number of programs. main is just another subroutine in C. So what holds true for main holds true for any other subroutine that you write. Thus it follows that variables can be declared inside any function. These variables can be numerous, belong to various types and so on. Variables declared inside a function are called local variables. This name derives from the fact that a variable declared inside a function can be used only inside that function. An attempt on your part to access the local variable of one function in another will draw an error from the compiler: Identifier undefined.

But you often require the ability to access variables declared in another function. So how do we handle this? As you can imagine, this problem can be solved by passing parameters. When a function is called, you can pass all the relevant information as parameters. But this approach can at times prove to be cumbersome, because where a called function has to be provided access to numerous variables of the caller, the parameter list will be very lengthy. Apart from the burden on the programmer to remember and properly invoke a function with the correct parameter list, this approach will also consume a lot of time during the execution of the program. You will recall how the parameter passing mechanism works: the actual parameters are copied to the formal parameters. Thus when the number of parameters is large, the overhead of parameter passing can be very high. Besides this, several functions are often required to share information. This results in transmission and re-transmission of parameters, which affects the performance of the program.

GLOBAL VARIABLES
One way of solving this problem is to resort to the use of global variables. In the very first chapter we made a passing mention of global variables. We saw that a C program consists of three sections, namely the preprocessor directives, the global variable section and finally the functions. The variables you declare in the global variable section are called global variables or external variables. While a local variable can be used only inside the function in which it is declared, a global variable can be used anywhere in the program. The problem depicted above, that of having to pass a large number of parameters to a function, can be circumvented using global variables. Since these are accessible by name anywhere within the program, they need not be passed as parameters. The caller can deposit information in the global variables and branch to the function, which can now access the variables and act suitably.

The simplicity of this scheme can be misleading. Global variables, though they seem to simplify communication between functions, may actually cause a lot of trouble when handled indiscriminately. One of the axioms of structured programming is that a program must have well defined control coupling and data coupling. See that where global variables are used excessively, the data coupling is no longer apparent. By merely examining the function call or the prototype, you can no longer determine the data communicated between the caller and the called function. This leads to ill-defined interconnections between the different modules of the program, thus making it less structured. Maintenance of the program becomes complicated, because a change to one module could affect the other modules too. The reusability of a function is also affected. Unless the same global variables are available in other programs also, a function which uses global variables cannot be reused. For functions to be reusable, they must encapsulate code and data.

So the choice of using global variables has to be made by judiciously considering the above implications. Certain rules of thumb can be formed:

1. If a piece of information is needed by only one function, it should be available only to that function.
2. As you break down your program into functions, try to identify the reusable components. Develop them with the least dependence on global variables.

Block variables
Yet another place to declare variables is inside any block: these variables are called block variables and they can be used only inside that block. Their characteristics are identical to those of local variables. So, for all practical purposes, these can be considered to be local variables, exclusive to the block in which they have been declared.

GLOBAL VS LOCAL VARIABLES
There are a number of differences between global variables and local variables.

1. Local variables can be used only inside the function or the block in which they are declared. On the other hand, global variables can be used throughout the program.
2. The rules of initialization differ. All global variables, in the absence of explicit initialization, are automatically initialized to zero. A global int variable starts up with the value 0, a global float gets initialized to 0.0, a global char holds the ASCII null byte, and a global pointer is set to NULL.
As against this, local variables do not get initialized to any specific value when you do not provide one. Thus a local variable starts up with an unknown value, which may be different each time.
3. Global variables get initialized only once, typically just before the program starts executing. But local variables get initialized each time the function or block containing their declaration is entered.
4. The initial value that you supply for a global variable must be a constant, whereas the initializer of a local variable can contain variables.


5. A local variable loses its value the moment the function or block containing it is exited. So you cannot expect a local variable to retain the value deposited in it the previous time the function or block was entered. Global variables retain their values throughout the program's execution.
6. The pre ANSI C compilers do not allow local compound variables to be initialized.

A number of points deserve close attention here. One of the fundamental rules that a good program should obey is that, under a given set of conditions and with a given set of input, the program should yield the same output today, tomorrow, next year and a thousand years hence. But see how this rule is in peril when it concerns the usage of uninitialized local variables. If a local variable is not initialized, it contains garbage, and should your program attempt to use this value, the program's behavior becomes erratic and absolutely unpredictable. Look at the function below:

    int some_function()
    {
        int index;
        int array[10];

        while (index < 10)         /* Loop to initialize elements to 0 */
            array[index++] = 0;
        /* Do something */
    }

The intention of the programmer is quite obvious. He/she wants to initialize the elements of the array to zero. Observe that the local variable index has not been initialized. What happens when this function is executed? Since the value of index is unknown, we probably are accessing non-existent elements of the array. Also take a look at the following code fragment: int some_function () { int i; if(i % 2)/* Test if odd or even /* */ Do something */ else */ Do something else */ } Whether the statements associated with the if will be executed or those attached to the else will be executed is unpredictable. During different runs of the function, different courses, of action may be taken. So beware! Always assign a value to local variable before attempting to access it. Some in sigh t into the memory layout of an executable file is called for. When you attempt to run a.out from the shell prompt, the kernel transforms the file a.out present in the file

system, into a process. Before execution, the kernel allocates memory to hold the different sections of the executable image. The three sections of any process are: the text/code section, the data section and the stack section. The code section contains the sequence of machine instructions which the compiler has produced. The data section is nothing but the working storage for the global variables. Finally, the stack is a reusable area of memory used for a number of purposes like storing local variables, function parameters, return values etc.

By reading the source file, the compiler knows the amount of memory required to hold the local variables of each function. The compiler arranges to allocate as many bytes as are necessary to store the local variables in the stack. This allocation is done just prior to entering the function, and as soon as the function is exited, this storage is de-allocated and made available for future allocation of storage space for local variables belonging possibly to another function. Given this piece of information, you should be able to appreciate why local variables do not retain their values between function calls. Also, reflect for a moment on why a local variable contains garbage when not initialized. Since the stack is a recycled area of memory, the storage space allocated for a local variable would have housed other variables earlier. Hence the garbage.

PARAMETERS VS LOCAL VARIABLES

How about parameters? Are these global or local? According to C's rules, the parameters of a function are very much local variables of the function. But these differ from local variables in one respect: parameters are initialized by the caller. We discussed the positional transmission of values in the last chapter. By passing parameters, the caller initializes the formal parameters of the function. Note that the formal parameters and the actual parameters are totally distinct variables. Changing the formal parameter does not change the actual parameter, even when the two have the same name. As mentioned earlier, parameters are also housed in the stack.

For the positional transmission to effect proper initialization of the formal parameters, it is crucial that the formal and actual parameter lists conform. So what happens when you pass more or fewer parameters than expected by the called function? If the compiler is an ANSI compatible compiler and you have supplied the ANSI style function prototype declaration, then the compiler will issue an error. On the other hand, if your function declaration style or the compiler is pre-ANSI, then what happens can be described as follows. When you pass fewer parameters than expected, some of the formal parameters of the called function remain uninitialized. This leads to the same kind of complications as uninitialized local variables. Passing more parameters than expected is usually harmless.

SCOPE OF VARIABLES

We are already aware of the notion of scope, though it was never recognized as such. Scope specifies the region of the source program where a variable is known and accessible. Other


languages too have similar features, though they may be referred to by other names. Two notable examples are Pascal and dBASE. The scope of local variables is limited to the functions in which they are declared. In other words, these variables are inaccessible outside of the function. Likewise, the scope of block variables is limited to the block in which they are declared. Global variables have a scope that spans the entire source program, which is why they can be used in any function.

Closely associated with scope is the notion of the extent of a variable. The extent of a variable is the lifetime of the variable. This notion specifies when a variable comes into existence and when it expires. A local variable gets created, as we saw, just as the function is about to be entered. It expires the moment the function is exited. The extent of block variables is, similarly, confined to the open and close braces that enclose the block. Global variables are alive till the program terminates.

One of the points you will have to be aware of is that it is possible for you to have more than one variable by the same name, as long as their scopes are different. Consider the scenario presented below:

    #include <stdio.h>

    int x, y, z;
    float a, b, c;
    short p, q;
    char str1[20], str2[20];

    int main(void)
    {
        int a, b;
        float p, q;
        short x, y;
        /* Action statements */
    }

    void some_function(void)
    {
        char str1[10], str2[10];
        int p, q;
        /* Action statements */
    }

As you can observe, there are several variables whose names are not unique. For example, a and b are global float variables and there are also two int variables by the same name inside main. Similarly, main as well as some_function have local variables by name p and q. One of the first points that we shall make is this: these variables, though they have the same name, are not the same variables. Each has its own address, may store a different value, and may be of a different type. Altering the value of one of these variables will not affect the others with the same name.


Given this kind of a situation, conflicts may arise due to non-unique names. How does the compiler know which variable the programmer is referring to? For example, if the programmer were to say

    p = 100;

in main, then which of the three variables with the name p takes the value 100? Is it the global variable p, or main's local variable p, or the variable p that is local to some_function? In order to answer this question you will have to be aware of C's rules of precedence concerning identifier conflicts.

First of all, the variable p declared inside some_function, by virtue of its scope and extent, is not at all accessible in main. Therefore, the p inside some_function is out of the conflict. As per the rules of C, the innermost declaration of a variable is the one the compiler will bind the reference to. Whenever you refer to an identifier in your program, the compiler checks for a declaration within the current block. If one is found, your reference binds to this declaration, and the compiler looks no further. If no such block level declaration is found, then the compiler searches for a local variable with the same name. If one is found, the compiler binds your reference to it and stops looking further. In the event that a declaration is absent in both these places, the compiler searches for a global declaration to bind your reference. If this is also missing, the compiler will reject your source. Note that the compiler does not search the local variable sections of other functions at all.

Though you may use several variables with the same name, one of the stipulations that you have to meet is that no two variables with the same scope may have the same name. This is a demand imposed by the compiler because, in the absence of this stipulation, the compiler would not be able to resolve naming conflicts.

RECURSION

Recursion is a situation where something is defined in terms of itself. Loosely put, recursion refers to a situation where a function calls itself. If a function xyz is allowed to call printf, scanf, getchar, putchar and all other functions, can it call xyz itself? Yes, this is very much possible in C. Why is recursion useful at all? Basically because a number of problems in the human domain and the computer domain are recursive in nature. Family trees and directory structures come to mind. A recursive program is the natural choice to handle these cases.

To illustrate how recursion may be employed, let us write a program now: a program to calculate the factorial of a given number. Factorials are by definition recursive. The factorial of a positive integer N is defined as:

    N! = N * (N-1) * (N-2) * ... * 3 * 2 * 1

This by inspection can be reduced to:

    N! = N * (N-1)!

Thus the factorial of a number is nothing but that number multiplied by the factorial of the number one less. As you can see, factorial is defined in terms of itself. Look at the program below:

    #include <stdio.h>

    int fact(int);

    int main(void)
    {
        int number;
        int result;

        printf("Enter a positive int: ");
        scanf("%d", &number);
        result = fact(number);
        printf("The factorial of %d is %d\n", number, result);
        return 0;
    }

    int fact(int number)
    {
        if (number == 0 || number == 1)
            return 1;
        else
            return number * fact(number - 1);   /* fact is called from fact */
    }

Look at the body of the function fact. As you can see, it calls itself. Assume that in response to the scanf you typed the value 2. So main calls fact and passes 2 to it. The if statement inside the function fact gets executed now, and execution reaches the else clause because the condition is false at present. So fact gets called a second time, this time with the value 1. In this second call the condition is true, so it returns 1 without calling fact again. Whatever the second call returns is multiplied with the present value of number, and the product, 2 * 1 = 2, is returned to main.


Chapter 7

Structure, Union, Enum

A structure in C is a heterogeneous compound data type, similar to the records of dBASE and Pascal. We have earlier studied one of the other compound data types that C provides: arrays. Whereas arrays are essentially homogeneous in nature, a structure can have as its members objects of different types. Why are structures useful? For the same reason that makes arrays indispensable. To spell it out, you have the choice of treating a structure as a single composite unit if you so desire, or of breaking it down and accessing the independent members, if that is more convenient.

Just as any other variable, a structure variable will also have to be declared. The declaration of the structure can be done in the following form:

    struct [tag]
    {
        type var1;
        type var2;
        type var3;
        :
        :
        type var_n;
    } stvar;

As you can see, the structure declaration starts with the keyword struct followed by a tag. Sandwiched between an open and a close brace, you provide the type declarations for the members of the structure. After the close brace you may specify a comma separated list of variables.

The first observation that we shall make is this: the tag is optional. What purpose does the structure tag serve? It is a name used to identify a particular template of the structure. Just as with other data types, you may require structure variables of the same shape to be declared in several parts of the source program. Some of these declarations could be local to functions, and some others are possibly global. So is it really necessary to introduce the structure template (shape) in every one of these places? Well, not really. This is where the structure tag comes into play. The optional structure tag, when it is provided, can be used to declare structure variables by using the following form of declaration:

    struct tag newstvar;    /* template introduced earlier ! */


STRUCTURES

The tag eliminates the need to spell out the member declarations all over again. Look at the following code fragment. It tells you how the tag may be used and also how to initialize structures.

    struct person
    {
        char name[50];
        int age;
        float salary;
        char telephone[15];
    };

    struct person emprec = { "Ram", 20, 4000.50, "0448251209" };

C provides the . (dot) operator, using which structure members can be accessed independently. The dot operator connects a structure variable with its member. Look at the example code fragment shown below:

    emprec.age = 50;
    strcpy(emprec.name, "Tom");     /* not emprec.name = "Tom" ! */

The dot operator must have a structure variable on its left and a legal member name on its right. Note that the structure tag is not a variable name. It is rather a name given to a template of a structure. Hence statements like the one shown below will be rejected by the compiler.

    person.age = 20;    /* person is not a structure variable but a tag */

Having discovered how to declare structures and access their members, let us now find out what the members of a structure can be. The definition of C says that any legal data type can be a member of a structure. This includes a structure too, thus giving rise to nested structure declarations. Look at the following structure declaration, which shows how nested structures may be employed.

    struct librec
    {
        char bookname[50];
        char author[50];
        int accession_no;
        struct
        {
            int date;
            int month;
            int year;
        } return_date;
        char borrower_name[50];
    } bookrec;

Given this kind of a declaration, how do you set the year to 1992? You do it as shown below:

    bookrec.return_date.year = 1992;

ARRAY OF STRUCTURES

It is possible, as we saw, to have an array inside a structure. The other side of the coin holds true too. You can declare an array of structures if you so desire:

    struct person
    {
        char name[50];
        int age;
        float salary;
        char telephone[15];
    } starray[10];

STRUCTURES AND FILES

A C programmer who needs to save a structure in a file can resort to two means. The first is to create an ASCII file with the use of the fprintf and fscanf functions. These formatted I/O calls create a file which is cat'able, that is, one that can be displayed as plain text. In general, records in the file can be organized in a line oriented fashion, with each line containing a structure. The structure's fields may typically be separated by white space. Look at the following program. It illustrates this approach.

    #include <stdio.h>
    #include <stdlib.h>     /* system and exit are declared here */
    #include <string.h>

    /* Read one line from the keyboard, dropping the trailing newline. */
    /* fgets is used because, unlike gets, it limits the input length. */
    static void read_line(char *buf, int size)
    {
        if (fgets(buf, size, stdin) == NULL)
            buf[0] = '\0';
        buf[strcspn(buf, "\n")] = '\0';
    }

    int main(void)
    {
        struct address
        {
            char person[50];
            char street[50];
            char locality[50];
            char city[50];
        } emprec;
        char answer[10];
        FILE *fp;

        fp = fopen("address.dat", "a");
        if (fp == NULL)
        {
            perror("address.dat");
            exit(1);
        }
        do
        {
            system("cls");  /* system executes arg as though typed at prompt */
            printf("Enter name :");
            read_line(emprec.person, sizeof emprec.person);
            printf("Enter street :");
            read_line(emprec.street, sizeof emprec.street);
            printf("Enter locality :");
            read_line(emprec.locality, sizeof emprec.locality);
            printf("Enter city :");
            read_line(emprec.city, sizeof emprec.city);
            fprintf(fp, "%s\t%s\t%s\t%s\n", emprec.person, emprec.street,
                    emprec.locality, emprec.city);
            printf("Do you want to add more ?");
            read_line(answer, sizeof answer);
        } while (answer[0] == 'y');
        fclose(fp);
        exit(0);
    }

OPERATIONS ON STRUCTURES

Certain operations are permissible on structures. One structure variable can be assigned to another, provided that they have the same tag. A structure can be passed to a function as a parameter, and a function can return a structure. The address of a structure variable can be extracted and manipulated. Addition, multiplication and all such operations are forbidden, even when the structure consists only of numerical data. Study the program below. It serves to illustrate how structures may be passed to functions.

    #include <stdio.h>

    /* Program to add two complex numbers */
    struct complex
    {
        int real;
        int imaginary;
    };

    struct complex complexadd(struct complex, struct complex);

    int main(void)
    {
        struct complex c1, c2, c3;

        printf("Enter the real and imaginary parts");
        scanf("%d%d", &c1.real, &c1.imaginary);
        printf("Enter the real and imaginary parts");
        scanf("%d%d", &c2.real, &c2.imaginary);
        c3 = complexadd(c1, c2);
        printf("Sum = %d+j%d\n", c3.real, c3.imaginary);
        return 0;
    }

    struct complex complexadd(struct complex c1, struct complex c2)
    {
        struct complex temp;

        temp.real = c1.real + c2.real;
        temp.imaginary = c1.imaginary + c2.imaginary;
        return temp;
    }


Chapter 8

Pointer
Pointers are a simple data type in C. We have already studied the other atomic data types that C provides, namely the ints, the floats and the chars. The first question that we shall ask ourselves is this: if pointers are a data type in C, what kind of values do they store? Before we can get an answer to that question, we will have to quickly refresh certain fundamental ideas which we have learnt elsewhere.

Any digital computer, be it a mainframe or a home computer, can be found to possess the same component units. You have the central processor, which is connected on one hand to the memory unit and on the other to I/O devices. The central processor is the combination of the arithmetic logic unit (ALU) and the control unit. By memory, we are referring to the primary semiconductor random access memory. Hard disks and floppy disks are not memory devices but instead are I/O units (secondary storage devices).

The central processor is the only electronic circuitry in the entire computer which can perform arithmetic. The CPU has inside it several of what are called registers. A register is an electronic circuit which registers data. The width of a register, in terms of the number of bits it can store, may vary from one computer to another. You would have often heard of a 32 bit computer or a 16 bit computer and so on. What is being talked about is precisely the width of the CPU's registers. A typical central processor has about twenty to thirty registers inside it. But a program may use several hundreds of variables. So it is clear that not all of these variables can be stored inside the CPU itself. This is where the memory unit comes into the picture. The program with all its code and data is stored in RAM; the data is brought into the CPU's registers as and when needed, computation is performed on the data, and the results are transferred back to RAM. This requires movement of data back and forth between the CPU and the RAM.

Since there usually will be several thousands of memory locations in the RAM, the CPU needs some mechanism to identify and request transfer of data from and to a particular location. This mechanism is called addressing. The RAM is organized as a sequence of byte locations. How many such memory locations there are in a computer depends on the configuration of the computer. In order for the CPU to identify each and every one of these locations and request transfer of data, each of these locations has been given a unique non-negative identification number. This number is what is called the address of that memory location. It is absolutely imperative that one memory location should have only one address and that one address should correspond to only one memory location. Well, this is something that the hardware has to guarantee and is not a concern for us. Since your program's code and data are stored in memory, you can infer that every variable and every function has an address, and that these addresses will be distinct for each variable and function.

Now let us find out how the computer executes a high level source statement such as:

    total = first + second;

First of all, we will make this observation: the identifiers that you use in your programs for variables and functions are used only in the high level source. The hardware does not understand anything about these variable names, and it does not execute a program in terms of these. A variable name is merely a symbolic reference given to a memory location. So, if the program with the above statement is loaded in memory in such a way that the variable first occupies the address 5000, second the address 5010, and total the address 5020, then the compiler would have generated machine instructions to move the data from the locations 5000 and 5010 into the CPU's registers, add them, and move the sum to the location 5020. The compilation process strips all identifiers from the executable file, and everything now is in terms of addresses. So, addresses of objects are a very important piece of information which the computer operates on.

The pointer data type in C stores nothing but addresses. C provides operators using which the addresses of objects can be extracted. These can be stored in pointer variables and manipulated in several ways. The following sections describe what you can do with pointers, and how.

DECLARATION AND USAGE

Having discussed what pointers are, let us now discover how to declare a pointer and how to make use of it. The general form of a pointer declaration is as shown below:

    type *varname;

for example:

    int *ip;
    float *fp;
    short *sp;
    char *cp1, *cp2;    /* not char *cp1, cp2 */

The last declaration deserves close attention. If you provide a declaration such as

    char *cp1, cp2;

what you are declaring are two variables: one by name cp1 which is a char pointer, and another by name cp2 which is a char and not a pointer. If you have a pointer variable ip and an int variable i, as shown below,

    int *ip, i;

then, you can make ip point to i by the statement:

    ip = &i;

The & is called the address-of operator. It extracts the address of the associated variable. This address is captured in the pointer variable ip. ip now contains the address of the variable i. You say that ip is pointing to that area of memory where i resides. Hence the name pointer. Now that ip is pointing to i, it is possible to access i using the pointer ip. This is done by saying:

    *ip = 50;

The * operator, called the indirection operator, when applied to a pointer variable, accesses the variable to which the pointer points. Thus, in the above statement, we are moving the value 50 into i. Note that the expression *ip has the type int and as such can be used in any place where an int is expected. Since ip is a variable, it can hold the addresses of different variables at different times. If you make an assignment such as:

    ip = &temp;     /* where temp is an int variable */

then ip no longer points to i; it points to temp now.

Initialization

The rules governing the initialization of variables that we studied earlier apply to pointers also. Thus, a pointer variable declared in the global variable section gets initialized so as to contain the address 0. On the other hand, local pointer variables are not initialized to any specific address. So what happens if you apply the * operator to such a pointer? Before you use a pointer variable to access an object, you must, first of all, make the pointer point to a valid object of yours.

A global pointer variable gets initialized to a NULL pointer. A NULL pointer is nothing but a pointer which contains the address 0. One of the first warnings that we will give you is this: never apply the indirection operator * to a NULL pointer. Usually, the operating system kernel occupies the lower end of memory. So, an attempt on your part to access the 0th location for writing means that you are trying to overwrite the OS kernel, and for reading means that you are attempting to read the data structures or code of the kernel. Now, what will be the effect of such an attempt on your part? This depends on the OS you are working with. UNIX and other protected mode operating systems will, in general, prohibit any such intrusions and abort the offending program (segmentation violation). On the other hand, benign operating systems like MS-DOS will allow themselves to be overwritten, causing misbehaviour and utter chaos.

Let us now find out what happens when you apply the * operator to a local pointer variable to which you have not assigned any address. Since a local variable gets initialized to an unknown value, a local pointer variable which is not initialized by you points to an unknown location. So, applying the * operator to such a pointer is dangerous. The pointer could be pointing to the OS kernel space or another application program's address space. The UNIX kernel will thwart such attempts by aborting your program. Even if the stray pointer is pointing to your own address space, you still do not know where in your address space it is pointing. If the pointer that we are talking about is an int pointer, in the best case you will end up corrupting one variable, and in the worst case four different variables.

Often, there are certain hardware imposed restrictions on how the data is to be aligned in memory. Several microprocessors require ints to be aligned so as to start at an even address. So, if an integer pointer happens to contain an odd address and you apply the * operator to it, it will result in a bus error. Use pointers with care. Never apply the indirection operator to a pointer until it points to a valid object.

Mixing pointers

There are as many pointer varieties as there are data types. The data types in C (char, short, int, long, float, double etc.) each have their own associated pointer type. It is in general not advisable to mix pointer types. That is, do not attempt to store an int's address in a char pointer, or a char's address in a float pointer, and so on. Why is it that such mixing of pointers is to be avoided?

    int *ip;
    char ch;

    ip = &ch;       /* Not recommended */
    *ip = 100;      /* Gone case !!! */

Assuming that the address of the variable ch is 5000, this is captured in ip. The final statement shown attempts to move a 32-bit pattern of value 100 into the four locations 5000, 5001, 5002 and 5003. This happens because ip is an int pointer and an int occupies 4 bytes on the machine assumed here. So, the compiler, while generating the code, would have arranged for the 32-bit pattern to be moved into the four locations whose base address is 5000. Only the first of these locations belongs to ch. We can conclude by saying that mixing pointers will, under normal circumstances, result in improper access of data.
POINTER ARITHMETIC

Since a pointer is a data type in C, certain operations are allowed on it. Let us now find out what operations you can perform on a pointer variable. For illustration, let us assume that we have an int pointer variable by name ip and an array of 10 integers by name arr, and that the pointer ip is pointing to this array as in:

    int *ip;
    int arr[10];

    ip = &arr[0];

or

    ip = arr;       /* Same as previous statement */

[Figure: the pointer ip pointing to arr[0], followed in memory by the elements arr[1] and arr[2]]

In order to make ip a pointer to arr, either of the above statements can be used. In C, the name of an array is automatically a pointer to the array. So arr, being the name of an array, is an integer pointer expression which refers to &arr[0]. But there is an important difference between the name of an array and a pointer variable like ip. The name of an array, though a pointer, is a constant expression pointing to the start of the array. Because of this, while it is legal to say:

    ip++;

it is illegal to say:

    arr++;

The above statement is illegal on two counts: firstly, arr, being a constant, cannot be incremented. Secondly, what you are, in effect, trying to achieve is to alter the base address of the array. This is impossible and is as such forbidden.

Given this kind of a situation, C allows you to perform a number of operations on the pointer variable ip. Firstly, ip can be incremented by applying the ++ operator to it. What is the effect of incrementing a pointer? When you increment an integer, the value of the int variable increases by one. On the other hand, when you apply the ++ operator to a pointer, the pointer comes to contain the address of the next element. In other words, the pointer becomes a pointer to the next element. Note that for this to happen, the ++ operator may have to raise the contents of the pointer by more than one. Let us assume that the array starts at the address 10,000. This means that the element arr[1] has the address 10,004, arr[2] the address 10,008, and so on. So, the statement

    ip = &arr[0];

results in ip containing 10,000. Now if you increment ip, the value of ip becomes 10,004 and not 10,001. Thus, it is important to realize that applying the increment operator to a pointer may result in the value contained by the pointer being incremented by more than one. Though the above discussion uses an int pointer and an int array as an example, the same logic holds true for any kind of pointer.
Assume that we have a char pointer variable cp and a string as shown below:


    char *cp;
    char string[10];

    cp = &string[0];

Given this, if you increment cp, it will become a pointer to string[1]. Increment it further, and it points to string[2]. If the string were to begin at the address 20,000, the statement

    cp = &string[0];

assigns 20,000 to cp. Incrementing cp will cause 20,001 to be stored in cp. Thus, the actual increment in value that the ++ operator provides is dependent on the data type. This is not an issue that the programmer has to bother with. The C language guarantees that when you increment a pointer, it will point to the next element. For this to be achieved, as you can imagine, the contents of the pointer variable will have to be augmented by as much as the size of the data type of the pointer. This is important, so let us repeat it: whenever you increment a pointer of any type, it becomes a pointer to the next element. Note that the word element is generic.

The decrement operator --, when applied to a pointer, makes the pointer point to the previous element. Look at the statements given below. They illustrate this feature:

    int *ip;
    int array[10];

    ip = &array[5];
    ip--;           /* ip points to array[4], now */

C allows you to add an integer to a pointer. Adding an integer n to a pointer causes the pointer to point to the element which is n elements beyond the current element. In a similar fashion, subtracting an integer n from a pointer results in the pointer pointing to the element which is n elements before the current element.

The final operation that C allows with pointers is the subtraction of two pointers, provided that they point to the same array. Subtracting two pointers yields the number of elements in between the locations pointed to by them. Let us summarize and enumerate the operations permissible on pointers.

1. A pointer can be assigned to another. The effect of this operation is to make both the pointers point to the same object.
2. A pointer can be incremented/decremented. The result is to cause the pointer to point to the subsequent/previous element.


3. Integers can be added/subtracted with pointers. The pointer moves forward or backward accordingly. 4. Two pointers can be subtracted from one another provided that they are pointing to the same array. The expression returns the number of elements between them. All other operations on pointers are forbidden. You cannot add two pointers, nor multiply them and so on. The compiler will reject all such attempts. Let us demonstrate the issues involved in pointer arithmetic by rewriting the program to sort an array of 10 ints. # include <stdio.h> #define MAXELEMENTS 10 main() { int array[MAXELEMENTS]; int i, j, temp; int *ip; ip = &array [0]; printf("Enter the numbers to sort :"); for (i = 0; i < MAXELEMENTS ; i++) scanf("%d", ip + i); for (i.= 0; i < MAXELEMENTS ; i++) for (j = i+l ~ j < MAXELEMENTS ; j++) if(* (ip + i) > * (ip + j)) { temp = *(ip + i); *(ip + i) = *(ip + j); *(ip + j) = temp; } printf("The sorted List of numbers is : "); for(i = 0; i < MAXELEMENTS ; i++) printf("%d\n", * (ip + i)); } /* end of main */ Before we conclude this section and move onto discuss the relationship between arrays and pointers in C, let us make one final observation: Pointer arithmetic is meaningful only when performed on a pointer that is pointing to an array. Doing arithmetic on a pointer that is pointing to a standalone variable will result in grave trouble. int * ip; int temp; ip = &temp; ip++; /* meaningless though the compiler will allow this */ /* Grave mistake ! */ *ip = 100 ; As the above example shows performing arithmetic on a pointer (ip) pointing to a standalone variable (temp) will cause serious trouble. Even when the pointer is pointing to an
array, it is the programmer's responsibility to ensure that any arithmetic done does not carry the pointer beyond either of the bounds of the array.

ARRAYS AND POINTERS

In C there is a strong relationship between pointers and arrays. A pointer can be treated as though it is an array and, similarly, an array can be treated as a pointer, provided certain conditions are met. A pointer can be treated as an array and indexed if and only if it is presently pointing into an array. An array can be treated as a pointer as long as you do not attempt to change the base address of the array. As you are aware, the name of an array in C is a constant expression which is a pointer to the 0th element of the array. Look at the following statements:

    int *ip;
    int array[10];
    ip = array;

Given the above situation, if you want to access array[5] and store the value 100 in it, this can be done in any of the following ways:

    array[5] = 100;        /* Treat array as an array */
    *(ip + 5) = 100;       /* Treat pointer as a pointer */
    ip[5] = 100;           /* Treat pointer as an array */
    *(array + 5) = 100;    /* Treat array as a pointer */

The following, however, is not permitted:

    array = array + 2;     /* illegal : LHS is a constant expression */

PARAMETER PASSING

We have already seen the advantages of parameter passing. In this section we shall take a look at the different methods of parameter passing. C provides two different methods of parameter passing, namely call by value and call by reference. The kind of parameter passing that we have been employing in several of our functions till now is call by value. To bring to light the differences between the two methods, let us take up a specific exercise now: to write a function which will accept two int variables as parameters and swap the values of these variables. Look at the program below, which looks like the obvious solution:

    #include <stdio.h>

    void swap(int, int);

    int main(void)
    {
        int first, second;

        first = 100;
        second = 200;
        swap(first, second);    /* This won't work! */
        printf("Value of first = %d\tvalue of second = %d\n", first, second);
        return 0;
    }

    void swap(int a, int b)
    {
        int temp;

        temp = a;
        a = b;
        b = temp;
    }

If you try the above program as it is, you will be in for a surprise. The program's output will show that the variable first has the value 100 and second has 200. In other words, though the function swap got called, the values of the variables have not been swapped. What is happening? Well, there is no mystery involved here; we are already in a position to explain it. We saw in chapter 7 that the parameters of a function are nothing but local variables of the function. The function swap, when called, does swap a pair of variables, but the variables that are swapped are the formal parameters and not the actual parameters.

There is yet another way of explaining why the function swap will not work as it is written now. When swap is called, it is passed two items of information: the value 100 and the value 200. So swap knows that it has to swap some variable with the value 100 and some other variable with the value 200. It does not know the identity of these variables. What information is really necessary to swap two variables? This can be answered by looking at what you should do to effect a swap. You must pick up the bit pattern present in the location(s) allocated to the first variable and deposit it in the location(s) allocated to the second variable, and vice versa. For a function to do this, just knowing the values of the variables concerned will not do. Something more is required: the addresses of these variables.

As mentioned earlier, C allows two methods of parameter passing. The first is called call by value and the second, call by reference. In call by value, you simply pass the value of the variable, whereas in call by reference you pass the address of the variable. (Strictly speaking, the address is itself passed by value; C achieves call by reference through pointers.)
Let us now learn one of the most important rules, probably the most important, about programming in C: When a function is called by value, the called function cannot modify the actual parameters. In the above program the function swap was called by value. So there is no way the function swap can alter the actual parameters first and second. Let us now rewrite the program so as to employ the other method of parameter passing, call by reference.

    #include <stdio.h>
    void swap(int *, int *);    /* We are going to employ call by reference! */

    int main(void)
    {
        int first, second;

        first = 100;
        second = 200;
        swap(&first, &second);    /* pass address rather than value */
        printf("Value of first = %d\tvalue of second = %d\n", first, second);
        return 0;
    }

    void swap(int *a, int *b)    /* Note the changed header! */
    {
        int temp;

        temp = *a;
        *a = *b;
        *b = temp;
    }

Note that just changing the call from swap(first, second) to swap(&first, &second) will not suffice. The function's header has to be changed too. Otherwise we will be violating the golden rule about the identity of the actual parameter list and the formal parameter list. In chapter 6 we saw that the actual parameter list and the formal parameter list must be identical in terms of the number, type and order of parameters. Unless we change the function's header there will be a type mismatch between the actual parameters (addresses of integers, or simply int pointers) and the formal parameters (integers). That is why the function's header has also been changed to swap(int *a, int *b).

Given this piece of information, you will be able to appreciate why, all the while, we have been attaching an & to the parameters used with scanf. The function scanf should be in a position to alter the contents of its parameters. When scanf is called and asked to read an integer value for a variable intvar, scanf needs to modify the bit pattern present in the locations where the variable intvar is stored. In other words, the parameters to scanf must be reference parameters. Value parameters are useless. Contrast this with how printf works. For printf to print the value of an int variable, it is sufficient if the value of the variable is known; its address is not relevant.

That brings two questions to mind. The first: what will happen if scanf is wrongly called as shown below?

    int var = 100;
    scanf("%d", var);    /* & is missing! */

Since the current value of the variable is 100, this value gets passed to scanf.
Now scanf mistakenly believes that the variable whose new value it has to read from the keyboard is located at the address 100! So scanf reads an int value from the keyboard (well, from the standard input) and attempts to store it in the locations whose addresses are 100, 101, 102 and 103 (assuming a 4 byte int). The fate of your program is subject to the same failures that we discussed in the last chapter.

ARRAYS AS PARAMETERS

The second question that we shall take up now is this: Why don't we attach an & to strings when used with scanf? We have already seen that the name of an array is, by C's rules and conventions, automatically a pointer to the array. So attaching an & to a string variable name is superfluous. That the name of an array is automatically a pointer to the array should throw some light on one of the very important aspects of C: In C, arrays are always passed by reference. This is a serious advantage because employing call by value on huge data structures such as arrays is wasteful of CPU time, what with having to copy the whole data structure from the actual parameter to the formal parameter. Look at the modified version of the capitalize function produced below:

    void capitalize(char *str)
    {
        while (*str != 0) {    /* while not end of str */
            if (*str >= 'a' && *str <= 'z')
                *str = *str - 32;    /* assumes the ASCII character set */
            str++;    /* move on */
        }
    }

It is worthwhile comparing this version of the function with that shown in chapter 8. When the function capitalize is called as in

    char strvar[100];
    gets(strvar);
    capitalize(strvar);

what the caller passes is indeed a char pointer and not the whole array. Look at the previous version shown below:

    void capitalize(char str[])
    {
        int i = 0;

        while (str[i] != 0) {    /* while not end of str */
            if (str[i] >= 'a' && str[i] <= 'z')
                str[i] = str[i] - 32;
            i++;
        }
    }

In this version, the formal parameter is being treated as though it were an array, while in reality it is a char pointer. Well, as you are aware, this is very much consistent with C's
practices. We saw in the last chapter that it is possible to treat an array as a pointer and a pointer as an array.

COMMAND LINE ARGUMENTS

We have said time and again that main is a regular subroutine in C. A question arises now: If main is just as good as any other function, is it possible to pass parameters to main? Yes, main can very well receive parameters. The parameters to main are the command line parameters to the program. In shell programming, we saw that it is possible for you to pass parameters to a shell script by simply typing them along with the command name during the invocation of the program. The shell script accesses the parameters by using the notation $1, $2 etc. In a similar fashion, when a.out is invoked, parameters can be typed at the command line itself. Well, there is nothing new here. We have been passing command line parameters to vi, cc, head, more and so on. These are, after all, C programs. So it follows that your programs too may receive parameters from the command line. Command line parameters are a convenient mechanism to "get down to business right away". A dialogue with vi every time you want to edit a file would indeed be frustrating.

main receives two parameters, traditionally called argc (for argument count) and argv (for argument vector). Inasmuch as these are formal parameters, you can call them by any name, not necessarily argc and argv. Whereas the names of these parameters are your choice, their types are fixed. The first parameter to main is always an integer, and the next always an array of character pointers. By inspecting argc, you can determine how many arguments have been passed, and by looking at argv you can determine what those are. Look at the command below:

A number of points require close attention here. The argument count includes the command name also. So the minimum value that argc can ever assume is one and not zero. The legally accessible pointers in the argv array are from argv[0] to argv[argc-1]. These pointers can be used in any place where a character pointer is required.
To illustrate the issues involved, let us write a program now: a program which will show you how the echo command of Unix could be implemented.

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int i;

        for (i = 1; i < argc; i++)
            printf("%s ", argv[i]);
        printf("\n");
        return 0;
    }

MULTI DIMENSIONAL ARRAYS

C allows you to declare and use multi dimensional arrays also. Arrays in C can be of any number of dimensions, limited only by the amount of memory available in your computer. Two dimensional and three dimensional arrays are extremely common in C. Let us now take a look at how to declare multi dimensional arrays and use them.

    int array[3][3] = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };

Look at the declaration above. It shows how to declare a two dimensional array consisting of three rows and three columns. C uses what is known as the row major arrangement for accessing the elements of the array. In this method what is specified first is the number of rows and next the number of columns. You can visualize the two dimensional array as a matrix such as the one shown below:

                Column 0   Column 1   Column 2
    Row 0          1          2          3
    Row 1          4          5          6
    Row 2          7          8          9

When you want to access a cell of a two dimensional array, you need to specify two indices: one to identify the row number and the other to identify the column number. For instance, if you want to access the element which lies at the intersection of row 1 and column 1 (counting from zero), you will have to say something like:

    array[1][1] = 100;

Or if you want to read an int and store it in the cell which lies at the intersection of the 0th row and the 0th column, you can achieve this by saying

    scanf("%d", &array[0][0]);

You will remember here that array indexing always starts at zero. In a manner of speaking, it can be argued that C allows only single dimensional arrays. For instance, the above declaration can be interpreted as that of a single dimensional array by the name array which has three elements, each of which is a single dimensional array of 3 integers.

Pointer to array

Since an array is a data type in C, it follows that you may extract the address of it and attempt to manipulate it. Yes, this indeed is possible. Look at the declaration below:

    int (*arrptr)[3];
    int array[3][3];
    arrptr = array;

What we are declaring is a pointer to an array of 3 integers and a two dimensional array of integers. Incrementing this pointer now makes it point to the subsequent element of the array, i.e. to array[1], which begins at &array[1][0]. (Recall that a two dimensional array is nothing but an array of arrays.)

Multi Dimensional arrays as parameters

The following code fragment shows how multi dimensional arrays may be passed to functions:

    int matrix[3][3];
    process(matrix);

The function itself will have to be defined as below:

    void process(int matrix[][3])    /* Row specification missing! */
    {
        /* Do something */
    }

Whereas the row specification can be omitted, you cannot omit the column specification. This is necessary so that the relative positions of the different elements may be computed accurately.

String operations revisited

We saw earlier that there are absolutely no operators with which we can operate on strings. At this point, we will be able to appreciate why this is so. Look at the following code fragments:

    char str1[20], str2[20], str3[20];

    str1 = str2;            /* won't compile at all! */
    if (str1 == str2)       /* will compile, but won't work */
        ;
    str3 = str1 + str2;     /* ghastly!! */

As we are aware, the name of an array in C is a constant pointer to the beginning of the array. So, the first case above will be rejected by the compiler because the LHS happens to be a constant. The second statement will compile, but will not produce the desired result because what we are comparing are the addresses of the strings rather than their contents. The final statement is wrong on two counts. Firstly, the LHS is a constant, and secondly, it is illegal to attempt to add two pointers.

Pointer to pointer

We saw that there are as many pointer types as there are data types in C. A question arises now. Since a pointer is a data type in C, is it possible to have a pointer to a pointer? Yes, this is very much possible. Look at the following declaration:

    char **cpp;

We are declaring cpp to be a pointer to a character pointer; in other words, cpp is capable of storing the address of a character pointer variable. There is nothing special about pointers to pointers, except for the kind of object to which they point. They obey the same rules of initialization and usage, and they can also take part in arithmetic. The results of such arithmetic are quite predictable from our general understanding of pointer arithmetic.

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int i = 1;

        argv++;    /* skip over argv[0] */
        while (i < argc) {
            printf("%s ", *argv);
            argv++;
            i++;
        }
        return 0;
    }

As we are aware, an array can be treated as a pointer and a pointer can be treated as an array. The above program, which is a revision of the echo program which we wrote earlier, exploits this feature.

Chapter 9

Dynamic Memory Allocation
DYNAMIC MEMORY ALLOCATION

Until now, we have been discussing the syntactic and semantic issues involved in using pointers. We have not attempted at all to explain why and how pointers are useful. In this section, we shall have a brief overview of the applications of pointers and focus on one use of pointers: dynamic memory allocation. Pointers are useful in a number of ways.

1. Pointers are used to implement the call by reference mechanism. We shall study this in detail, later.
2. Pointer arithmetic may, with some compilers, result in better performance of the program compared to equivalent array indexing, though optimizing compilers often generate equivalent code for both.
3. Pointers are used in dynamic memory allocation.

Whatever memory allocation the compiler performs is called static memory allocation. The compiler, during compilation, reads your source file(s), determines from the various declarations provided by you how much memory is required, and arranges to allocate that amount of memory. This static allocation has a number of disadvantages. The sizes of the data structures have to be decided right at compile time. There is no way the size of a data structure like an array can be increased or decreased at run time. But often, the precise size of the input is not known at the time of writing the program, and so the programmer has to make a judicious guess about the size of a data structure. However intelligent the guess is, it still remains a guess.

To illustrate this point, let us consider the array sorting exercise which we have done earlier. What if we were to sort an array of 20 ints instead of 10? The program cannot be used as it is, and will have to be modified and recompiled. Let us say that we effect this modification, but what will we do if we were to sort 100 numbers instead of 20? One crude way of solving the above problem is to declare a very huge array, let us say of 10,000 ints. Once this is done, what happens if you are given just 5 ints to sort?
Your program is functional, yes, but in the last case, it is going to waste a lot of memory. As we mentioned earlier, on another occasion, in multitasking systems like UNIX there is a direct relationship between the amount of memory a program consumes and the speed with which it runs. In general, programs that claim too much memory are sluggish. So, how do we achieve our objective of writing a program to sort an array of any number of ints without wasting memory? Here is where dynamic allocation comes into the picture. The
problem depicted above is by no means peculiar to the sorting exercise. The same problem occurs again and again in several programming activities. The problem can be expressed very briefly as follows: how do we allocate the right amount of memory for our data structures when, in the first place, we do not know the precise size of the input at all? The solution is to avoid static memory allocation, as it is very rigid, and postpone all allocation till run time, when the size of the input will be known. This scheme of allocation of memory is called dynamic memory allocation.

The C library provides a number of functions with which, at run time, memory can be allocated, used, released and so on. These functions are declared in the file stdlib.h, which should be included in your program to make use of them. The function malloc is used to allocate memory. malloc is called with one parameter, which specifies the number of bytes of memory required. malloc tries to allocate the said number of bytes from the heap and, if successful, returns a pointer to the allocated space.

As you can imagine, malloc may be asked to allocate space for ints or floats or any other type. So, what should malloc return: an int pointer, or a float pointer, or what? Pre ANSI C treats a char pointer as the generic pointer type. A generic pointer is one which can be converted to other pointer types. ANSI C designates a void pointer as the generic type. So, in the pre ANSI versions, malloc is designed to return a char pointer and in ANSI versions, a void pointer. The return value of malloc has to be converted into a pointer of the suitable type as per your necessity. This is done by a mechanism that C provides, called type-casting. Conversion between compatible types is possible by using this facility. (In ANSI C a void pointer is converted automatically when it is assigned to another object pointer type, so the cast is optional there; it matters mainly for pre ANSI compilers.) Look at the example below, which allocates space for 10 floats.
    float *fp;
    fp = (float *) malloc(10 * sizeof(float));

To achieve a type-cast, you simply prefix the expression which is to be cast with the new type enclosed within parentheses. The above code fragment also illustrates the use of the sizeof operator in C. sizeof is an operator that yields the number of bytes that a variable or data type occupies. It is advantageous to use sizeof to calculate the sizes of data types rather than to directly embed values like 2 and 4. This, of course, follows from the fact that the sizes of data types may differ in different implementations of C. If you hard code the number of bytes, your program loses portability. Note that sizeof is a keyword in C and is an operator, though it looks like a function.

The C library provides yet another function, calloc, to allocate memory at run time. calloc takes two parameters: the first one, an unsigned integer (of type size_t) specifying the number of objects for which you want to allocate space, and the second one, the size of each element, as in:
    void *calloc(size_t number_of_elements, size_t sizeof_element);

The difference between malloc and calloc is that whereas calloc initializes the allocated space to zeros, malloc does not provide any initialization. So the space allocated by malloc contains garbage initially. Thus, by using these two functions, memory can be allocated at run time after clearly determining the size of the input.

If it should become necessary, for whatever reason, to increase the allocated space or to shrink it, this can be done by using the function realloc. The prototype of realloc is as shown below:

    void *realloc(void *p, size_t size);

The void pointer p must be pointing to previously malloced or calloced space. size specifies the new size that you desire. realloc alters the allocation and returns a pointer to the resized space, which may be at a different address from the original. The contents are unchanged up to the minimum of the old and new sizes; if the new size is larger, the additional space is uninitialized.

There is a general convention in C which you have to be aware of. The library functions, when they want to signal an error or failure, return specific values. An int function will usually return EOF to signal an error, whereas a pointer function will return a NULL pointer. True to this convention, the functions that we saw just now return a NULL pointer if the request for memory allocation cannot be satisfied, for whatever reason. You should never assume that the memory allocation functions will always be successful. You must always determine whether the concerned function succeeded or not by inspecting its return value to see if it is NULL. Otherwise, should you attempt to access the space by applying the * operator on the pointer, your program will most likely crash with a run-time error.

One of the major advantages of allocating memory dynamically is that the allocated space can be released to the system once you are finished with it. This allows recycling of memory. The function free is used to release memory.
The prototype of free is as shown below:

    void free(void *p);

What you pass to free must be a pointer to space obtained from malloc, calloc or realloc. Passing anything else will usually result in a run time error. Once space is freed, you lose all access to it and should not attempt to use it. Allocation and freeing can be done in any order. When a program terminates, all memory allocated to it is automatically reclaimed by the kernel. Look at the modified version of the array sort program produced below:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *ip;    /* ip is pointing to an unknown location now */
        int i, j, temp, no_of_elements;

        printf("\nEnter the number of elements to sort : ");
        if (scanf("%d", &no_of_elements) != 1 || no_of_elements <= 0) {
            printf("\nInvalid number of elements\n");
            exit(1);
        }
        ip = (int *) malloc(no_of_elements * sizeof(int));
        if (ip == NULL) {
            printf("\nNo room in RAM\n");
            exit(1);
        }
        printf("Enter the numbers to sort : ");
        for (i = 0; i < no_of_elements; i++)
            scanf("%d", ip + i);
        /* values have been accepted into the allocated space */
        for (i = 0; i < no_of_elements; i++)
            for (j = i + 1; j < no_of_elements; j++)
                if (*(ip + i) > *(ip + j)) {
                    temp = *(ip + i);
                    *(ip + i) = *(ip + j);
                    *(ip + j) = temp;
                }
        printf("The sorted list of numbers is :\n");
        for (i = 0; i < no_of_elements; i++)
            printf("%d\n", *(ip + i));
        free(ip);    /* not really necessary here */
        return 0;
    } /* end of main */

Failing to capture what malloc, calloc or realloc returns renders the allocated space inaccessible and useless.

Chapter 10

File Handling

Let us get down to business right away. To operate on files you will first have to, as in any other language, open a file. When you open a file, you will have to specify what kind of operation you desire on the file. This is achieved by opening the file in a specific mode. The different modes of opening a file allow you to read from, write to, append to and update the contents of the file.

One of the first points that we shall make is this: to operate on files you need to declare a FILE pointer variable. What is a FILE pointer? Very simply put, it is a pointer to a FILE. That brings us to the next question: What is a FILE? A FILE is a data type (defined in stdio.h) which is used to operate on files. Since you may open several files simultaneously, when you want to perform I/O on a file you will need some means to identify and specify which file you want to operate on. This is the purpose the FILE pointer variable serves. You will pass the FILE pointer as a parameter to the I/O functions, thereby identifying the file to operate on.

To open a file, the stdio library provides a function called fopen. The prototype of the function fopen is as shown below:

    FILE *fopen(const char *filename, const char *mode);

As you can see, the function expects two parameters. The first one, which is a character pointer, is expected to be pointing to a string containing the name of the file to be opened. Since the name of a char array is a pointer to the array, and a quoted string constant is also a char pointer, the first parameter can be either of these. Look at the examples shown below. All of them are legal calls to fopen.

    char *fname = "/etc/passwd";
    char filename[20] = "/etc/passwd";
    FILE *fp1, *fp2, *fp3;    /* Not FILE * fp1, fp2, fp3! */

    fp1 = fopen(fname, "r");
    fp2 = fopen(filename, "r");
    fp3 = fopen("/etc/passwd", "r");

Note that the file name can be either an absolute path name or a relative path name.
Whether or not file names are case sensitive is operating system dependent. The second parameter to fopen is a char pointer, pointing to a string containing the mode of opening. The table given below lists the important modes and their significance. Turn to the reference section for a detailed discussion of the other modes.

Mode of Access

    1  "r"   Read only. File must be present and readable. Soon after
             opening, current position is set to top of file.
    2  "w"   Overwrite. If file is present, contents are discarded.
             Essentially a rewrite mode. If file is absent it is created.
    3  "a"   Append. If file is present, any output done gets appended.
             If file is absent it gets created. As soon as file is opened,
             current position is set to the end of file.
    4  "r+"  Update. File must be present. Read as well as write access
             available. Current position as soon as fopen is called is
             top of file.

The function fopen tries to open the said file in the said mode and, if successful, returns a FILE pointer, which you will capture in your own FILE * variable. Note that if you fail to capture the return value of fopen, you cannot access the file. If fopen encounters an error, it returns a NULL pointer. You should never assume that a file open operation will be successful. You must always check for errors and, if they do occur, act suitably. We have earlier discussed the dangers of dereferencing a NULL pointer. Considerations such as those necessitate good error trapping and error handling. Why should fopen fail at all? This could be because of a number of reasons:

1. When you are trying to open a file in read mode, the file must be present. Trying to take input from a non-existent file is meaningless and therefore will result in an error.
2. It could be that the file is present but you do not have the necessary access rights.

3. If the mode of opening employed is "w" and the file is not present, fopen tries to create a file by that name. For this operation to be successful, you need write permission on the target directory, and if this is not available, fopen will fail.
4. Every OS imposes a limit on the number of files that can be open simultaneously, and fopen fails once that limit is reached. This limit was traditionally 20 on older Unix systems.
5. Under Unix, directories can be opened as though they were files. Note that you can open a directory only in read mode, and cannot open it in any output mode even if write permission is available on that directory.
6. An absolute path name is specified and some component of the path is missing.

A number of points require attention here. Once a file is successfully opened, you can perform I/O on the file based on the opening mode. It goes without saying that a file opened in read mode cannot be written to and vice versa. When the mode of opening is "w" and the file is present, the file's contents are discarded and the file is created afresh. The append mode "a" allows you to add on to the file without destroying its original contents. The "a" mode also creates the file if it is not present.

CHARACTER I/O

C provides a number of functions to perform character I/O on files. getc can be used to read a character from an open file. The prototype of getc is as shown below:

    int getc(FILE *fp);

getc is functionally very similar to getchar. It reads a character from the file specified and returns an int containing the value read. On end of file, getc too returns EOF. putc, which is the twin of getc, performs character output on files and has the prototype shown below:

    int putc(int ch, FILE *fp);

putc writes the given character to the given file and, on success, returns the same character. putc returns EOF on error. If this happens, a possible reason could be a disk-full condition. Let us put into practice the skills acquired and write a program now: a program which will copy one file to another.

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        FILE *fp1, *fp2;
        int ch;

        if (argc != 3) {
            puts("usage : copy source_file target_file");
            exit(1);
        }
        fp1 = fopen(argv[1], "r");
        if (fp1 == NULL) {
            printf("Cannot open source file\n");
            exit(2);
        }
        if ((fp2 = fopen(argv[2], "w")) == NULL) {
            printf("Cannot open target file\n");
            exit(3);
        }
        while ((ch = getc(fp1)) != EOF) {
            if (putc(ch, fp2) != ch) {
                printf("Error writing target file\n");
                exit(4);
            }
        } /* end of while loop */
        fclose(fp1);
        fclose(fp2);
        exit(0);
    } /* fclose(fp1, fp2) is not allowed! */

Note that with file I/O there is always a notion of current position. The current position specifies where in the file your next input or output operation will take place. The different modes of opening files leave the current position in different places, as you saw in table 11.1. So when, in the above program, the source file is opened in read mode, the current position is set to the top of the file. Thus when the program calls getc for the first time, what it reads is in fact the very first character in the file. As soon as getc reads the first character, the current position changes to the subsequent byte, so that a subsequent call to getc returns the next byte in the file. Hence the name sequential access.

The program itself is simple and deserves little discussion. So, let us look at the function fclose instead. The function fclose is used to close a file. The file to be closed is identified by passing the associated FILE pointer. Once a file is closed, you have lost access to it, and unless it is reopened you cannot operate on it. All files that you open in your program must be closed by you. Failing to close a file may result in problems like loss of data, files not being updated properly and so on. Note that fclose returns EOF on error.

C provides yet another pair of functions to perform character I/O. The function fgetc is, for all practical purposes, identical in its nature to getc. It accepts the same parameters, performs the same task and returns the same values as getc does. Similarly, the function fputc is identical to putc in all respects. So, what is the difference between these pairs of functions at all? getc and putc are often implemented as macros, while fgetc and fputc are pure functions. Macros will be the subject of discussion in a later chapter.
For the moment, we shall close this discussion by saying that these pairs of functions are operationally identical but differ in the way they have been implemented.

STRING I/O

Having discovered how to carry out character I/O, let us now turn our attention to string I/O. The pair of functions fgets and fputs allows you to read and write strings to files. The prototypes of these functions are provided below:

    char *fgets(char *buffer, int max_chars, FILE *fp);
    int fputs(const char *buffer, FILE *fp);

The function fgets, which is similar to gets, takes 3 parameters, as you can observe. The first parameter is the pointer to the destination buffer. fgets reads either until a '\n' is encountered or until max_chars - 1 characters have been read, whichever occurs earlier. Note that fgets transfers the '\n' also to the buffer, as against gets, which does not. The function fputs prints the specified string onto the given file. fputs does not issue a terminating '\n' automatically.

As we already discussed, the C stdio functions behave in a specific way when they encounter an error or an anomalous condition. You can in general observe that the pointer returning functions return a NULL pointer on error, while the int returning functions return EOF on error. Though this is not a rule, it is a commonly followed practice. Think of such pointer functions as malloc, calloc and fopen: all of these return a NULL pointer on error. On the other hand, getchar, getc and fgetc return EOF on error. True to this spirit, the function fgets returns a NULL pointer on error, while fputs returns EOF.

Let us now write a program which will show how the head command of UNIX can be implemented.

    #include <stdio.h>
    #include <stdlib.h>

    #define SIZE 100

    int main(int argc, char *argv[])
    {
        int count;
        FILE *fp;
        char buffer[SIZE];

        if (argc != 3) {
            puts("Usage : head number filename");
            exit(1);
        }
        count = atoi(argv[1]);   /* Convert string to binary int */
        if ((fp = fopen(argv[2], "r")) == NULL) {
            perror(argv[2]);
            exit(2);
        }
        while (fgets(buffer, SIZE, fp) != NULL && count > 0) {
            printf("%s", buffer);
            count--;
        }
        fclose(fp);
        exit(0);
    }

Since the number of lines to be displayed, as supplied in the command line, is a null terminated ASCII string and not a binary int, it has to be converted before usage. The function atoi achieves this. It accepts a string and returns the equivalent binary int. Note that the C library also provides other functions such as atof (ASCII-to-float) and atol (ASCII-to-long).
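The conversion functions just mentioned can be tried in isolation. What follows is only a small sketch: the helper names parse_count and show_conversions are our own and are not part of the head program. It illustrates one thing worth knowing about atoi: when the string holds no digits at all, the result is 0, so the caller cannot tell such input apart from a genuine "0".

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the text): convert a command line
   argument to a line count, treating anything below 1 as invalid. */
int parse_count(const char *text)
{
    int n = atoi(text);        /* "10" gives 10, "abc" gives 0 */
    return (n > 0) ? n : -1;
}

/* Print the result of each conversion function for a sample string. */
void show_conversions(void)
{
    printf("%d\n", atoi("25"));        /* ASCII to int    */
    printf("%ld\n", atol("100000"));   /* ASCII to long   */
    printf("%.2f\n", atof("3.50"));    /* ASCII to double */
}
```

A program such as head could call parse_count(argv[1]) and print its usage message when the result is -1, instead of silently displaying nothing.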


We saw earlier that there could be several reasons for fopen to fail. This is true not only of fopen but also of several other functions. So how are we to determine the actual cause of failure of a function? The function perror (short for print error) helps you do this. perror accepts one parameter, a character pointer. It prints whatever string is passed to it, followed by a colon and a diagnostic message which describes the cause of the most recent error.
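As a minimal sketch of this usage, the function below tries to open a file and lets perror explain any failure. The name open_or_report is hypothetical and chosen only for this illustration. The wording of the diagnostic message comes from the system and differs from one platform to another.

```c
#include <stdio.h>

/* Hypothetical helper: try to open a file for reading. On failure,
   report the cause with perror and return -1; otherwise return 0. */
int open_or_report(const char *name)
{
    FILE *fp = fopen(name, "r");

    if (fp == NULL) {
        perror(name);   /* prints the name, a colon and the system's message */
        return -1;
    }
    fclose(fp);
    return 0;
}
```

Passing the file name to perror, as the head program does, makes the message tell the user both which file was involved and what went wrong.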

FORMATTED I/O

To perform formatted I/O on files, the C library provides two functions, namely fprintf and fscanf. Their prototypes are given below:

    int fprintf(FILE *fp, const char *format_string, ...);
    int fscanf(FILE *fp, const char *format_string, ...);

These functions are quite similar to their stdio counterparts, printf and scanf, except that these operate on files, while those operate on the standard I/O streams. Accordingly, these functions receive as the first parameter the FILE pointer indicating the file on which I/O is to be done.

ERROR HANDLING

Reflect for a moment on how the above programs handle errors. Both the copy and head programs print a diagnostic message and exit when the file open fails (copy uses printf for this, head uses perror). Though this kind of error handling is perfectly all right for the above programs, it is not suitable for all programs. For example, if a word-processor finds that the file specified is missing, it cannot simply exit after printing a message. It has to ask the user if he / she wants to create a fresh one and act according to the user's wishes. So how do we handle this kind of situation?

The C library functions, when they encounter an error, set a variable called errno (for error number). The variable errno is an integer declared in the file errno.h. The C library functions deposit different values in the variable errno for different kinds of errors. When a program encounters an error, it can inspect the value available in the variable errno and, based on this, decide on a suitable course of action. Typical values deposited in errno for various important error conditions are shown in table 11.2 below. Note that the identifiers used are all symbolic constants defined in errno.h, and such being the case, a program will have to include the file errno.h to use these constants.

    errno value    Meaning
    EACCES         Access denied
    ENOENT         No such file or directory
    ENOMEM         No room in RAM
    ENOSPC         Disk full
    EMFILE         Too many open files

Note: The variable errno is set only when errors occur and is not cleared by the library functions when they are successful. So the value is meaningful only immediately after an error.

When an error occurs, the program should try to transmit error messages to the standard error device rather than to the standard output. This is to ensure that the message, which is meant for the user's inspection, does not get redirected to a file or drain down a pipeline. The stdio.h header file introduces three FILE pointer constants, namely stdin, stdout and stderr. Bear in mind that these are not variables. These FILE pointer constants point to the standard input, standard output and standard error device, respectively. They can be used in any place where a FILE pointer is required. Look at the following code fragments:

    getc(stdin);                      /* equivalent to getchar() */
    fprintf(stderr, "File missing");  /* o/p goes to VDU even under redirection */
    fputs("Hello!\n", stdout);        /* same as puts("Hello!"); */

The stderr FILE pointer is particularly useful with respect to error handling and can be used as shown above.

Binary I/O

ASCII files are just one way of storing data. The C stdio library provides two functions called fread and fwrite, using which binary I/O can be implemented. The prototypes of these functions are given below:

    size_t fread(void *, size_t size, size_t no_of_units, FILE *);
    size_t fwrite(const void *, size_t size, size_t no_of_units, FILE *);

As you can see, both the functions take the same four parameters. When fwrite is called, you supply the base address where the data resides, the size of one unit of datum, the number of units of data that you want to write and finally the FILE pointer. fwrite simply picks up the required number of bytes from the appropriate memory locations and dumps the memory image into the file. Similarly, fread reads a previous memory image from a file into a variable. Both functions return the number of units actually transferred.
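The round trip just described can be sketched with a pair of small helpers. The names save_ints and load_ints are hypothetical, chosen only for this illustration. The sketch also assumes the "wb" and "rb" modes: the letter b requests binary mode, which matters on systems that distinguish text files from binary files. The counts are held in size_t, the unsigned type the standard library uses for sizes.

```c
#include <stdio.h>

/* Hypothetical helper: write count ints from data to a binary file.
   Returns the number of units written, or 0 on failure. */
size_t save_ints(const char *name, const int *data, size_t count)
{
    FILE *fp = fopen(name, "wb");
    size_t written;

    if (fp == NULL) {
        perror(name);
        return 0;
    }
    written = fwrite(data, sizeof(int), count, fp);
    if (fclose(fp) == EOF)   /* a failed close may mean lost data */
        return 0;
    return written;
}

/* Hypothetical helper: read up to count ints from a binary file into data.
   Returns the number of units actually read. */
size_t load_ints(const char *name, int *data, size_t count)
{
    FILE *fp = fopen(name, "rb");
    size_t got;

    if (fp == NULL) {
        perror(name);
        return 0;
    }
    got = fread(data, sizeof(int), count, fp);
    fclose(fp);
    return got;
}
```

Comparing the value returned by fread or fwrite with the number of units requested is the usual way to detect a short read or a failed write.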
Files created using fwrite must be read by fread to be properly interpreted, in the same way as files created using fprintf must be read by using fscanf. Look at the following program which shows how a binary file may be created.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Prompt for one field and read it into buf, dropping the '\n' that
       fgets keeps. fgets is used because gets cannot limit the input length. */
    static void read_field(const char *prompt, char *buf, int size)
    {
        printf("%s", prompt);
        if (fgets(buf, size, stdin) == NULL)
            buf[0] = '\0';
        buf[strcspn(buf, "\n")] = '\0';
    }

    int main(void)
    {
        struct address {
            char person[50];
            char street[50];
            char locality[50];
            char city[50];
        } emprec;
        char answer[10];
        FILE *fp;

        fp = fopen("address.dat", "ab");   /* append; b selects binary mode */
        if (fp == NULL) {
            perror("address.dat");
            exit(1);
        }
        do {
            system("cls");   /* clears the screen on DOS */
            read_field("Enter name :", emprec.person, sizeof emprec.person);
            read_field("Enter street :", emprec.street, sizeof emprec.street);
            read_field("Enter locality:", emprec.locality, sizeof emprec.locality);
            read_field("Enter city:", emprec.city, sizeof emprec.city);
            fwrite(&emprec, sizeof(struct address), 1, fp);
            read_field("Do you want to add more ?", answer, sizeof answer);
        } while (answer[0] == 'y');
        fclose(fp);
        exit(0);
    }

Look at how sizeof is used to calculate the size of the structure variable. This is very important: Never, repeat, never attempt to calculate the size of a structure manually. This may often yield a wrong value, because the compiler may introduce hidden padding byte(s) to satisfy alignment requirements imposed by certain processors.
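The effect of padding is easy to observe. In the sketch below, struct sample and the two helper functions are hypothetical and exist only for this illustration. On many machines with a 4-byte int, the manual sum comes to 5 while sizeof reports 8, but the exact figures depend on the compiler and the processor.

```c
#include <stdio.h>
#include <stddef.h>

struct sample {      /* hypothetical structure, for illustration only */
    char flag;
    int  value;
};

/* Size obtained by adding up the member sizes "by hand". */
size_t manual_size(void)
{
    return sizeof(char) + sizeof(int);
}

/* Size of the structure as the compiler actually lays it out. */
size_t actual_size(void)
{
    return sizeof(struct sample);
}

/* Print both figures side by side. */
void show_sizes(void)
{
    printf("manual = %lu, sizeof = %lu\n",
           (unsigned long) manual_size(),
           (unsigned long) actual_size());
}
```

Whenever the two figures differ, an fwrite call that used the manual sum would write too few bytes and leave a truncated record in the file.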

Binary I/O Vs ASCII I/O
There are a number of important issues which deserve attention here.

1. ASCII I/O using fprintf and fscanf involves conversion during input as well as output. fprintf converts binary data to ASCII information during output and fscanf converts ASCII information back to binary. This conversion can be very expensive when the number of records is very high. On the other hand, the use of the functions fread and fwrite involves no conversions whatsoever: as described earlier, fwrite simply picks up the required number of bytes from memory and dumps the memory image into the file.

2. fwrite and fread always take a fixed number of parameters: four. On the other hand, the number of parameters that fscanf and fprintf receive depends on the number of fields present in a structure. Where the structure has numerous members, calls to fprintf and fscanf can indeed be very cumbersome.

3. As we saw, fwrite always produces records of fixed size, whereas the size of a record in the case of an ASCII file depends on the data. Having records of fixed size is a great advantage because it allows you to perform random access. Note that it is very much possible to obtain records of fixed size using fprintf too, but ASCII data files may not always lend themselves to random access.

4. The above comparison seems to paint a black picture as far as ASCII files are concerned. Well, this is not totally true. ASCII files too have their own advantages, the chief ones being your ability to view them directly and their portability.
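The third point above, on record size, can be checked with a few lines of code. The two helpers below are hypothetical and serve only as an illustration. They rely on the fact that fprintf returns the number of characters it has written, while fwrite returns the number of units written.

```c
#include <stdio.h>

/* Hypothetical helper: write value as one line of text and return the
   record length in characters (it varies with the value). */
int text_record_length(FILE *fp, int value)
{
    return fprintf(fp, "%d\n", value);
}

/* Hypothetical helper: write value as a memory image and return the
   record length in bytes (always sizeof(int) on success). */
size_t binary_record_length(FILE *fp, int value)
{
    return fwrite(&value, sizeof value, 1, fp) * sizeof value;
}
```

Writing 7 and then 123456 as text gives records of 2 and 7 characters respectively, whereas the binary form occupies sizeof(int) bytes both times.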

RANDOM FILES

We studied sequential files. As you will recall, in sequential file organization, you start at the top of the file and access the information sequentially till the end of file. Sequential files have the advantage that they are very simple to implement and as such do not require any extra programming effort. But sequential file organization is not always the best. Assume that we have a database file which consists of 20000 records of the following form:
    struct person {
        char name[50];
        int age;
        float salary;
        char telephone[15];
    };

Let us assume that you want to read the 10000th record from this file. As we saw earlier, with file I/O there is always a notion of current position. The current position specifies where in the file your next input or output operation will take place. The different modes of opening files leave the current position in different places, as you saw in table 11.1. So if you open this database file in read mode, the current position will be the top of file. You are also aware that, as and when you do any I/O, your current position advances automatically and proportionately. So if we want to read the 10000th record, we somehow have to make the beginning of the 10000th record our current position inside the file. Well, how to do this? One way is to read the first 9999 records sequentially, so that at the end of that exercise your current position would have reached the 10000th record. This, precisely, is sequential access. This process will prove to be very time consuming and hence will just not be suitable for the several applications where very fast access to data is desired: for example, on-line transaction processing.

One alternative to this is to employ random file access. As the name implies, in random access files you do not have to read information in any given order; you can directly position yourself at any particular byte inside the file. This is accomplished by using the fseek function from the C library. Look at the prototype of the function fseek produced below:

    int fseek(FILE *fp, long offset, int origin);

The first parameter identifies the file whose current position is to be changed. The second parameter is the number of bytes by which the current position must be adjusted, and the final parameter specifies the origin with respect to which the current position must be set. The legal values for origin are 0 for top of file, 1 for the current position and 2 for the end of file. Thus a call to fseek such as

    fseek(fp, 0, 0);

alters the current position so that it is at an offset of 0 bytes from the top of file. Or in other words, the current position is set to the top of file. Similarly the call to fseek as:

    fseek(fp, 0, 2);

moves the current position to the end of the file. Note that fseek returns 0 on success and non zero otherwise. By supplying a negative value for the offset, it is possible for you to move back in the file. Thus to accomplish our objective of reading the 10000th record, all we need to do is call fseek and thereby adjust our current position inside the file, and then go ahead and read the desired record. Look at the code fragment below:

    fseek(fp, 9999 * sizeof(struct person), 0);

This call to fseek adjusts the current position so as to be at an offset of 9999 * sizeof(struct person) bytes from the top of file. Or in other words, at the beginning of the 10000th record. A subsequent call to fread can get you the record you needed.
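Putting fseek and fread together, the sketch below fetches any record by its number. The function name read_record is hypothetical and is used only for this illustration. The sketch also uses the symbolic name SEEK_SET, which stdio.h provides for the "top of file" origin, in place of the numeric value 0 used above; SEEK_CUR and SEEK_END are the corresponding names for the other two origins.

```c
#include <stdio.h>

struct person {
    char  name[50];
    int   age;
    float salary;
    char  telephone[15];
};

/* Hypothetical helper: read record number recno (counting from 1)
   from fp into *out. Returns 0 on success and -1 on failure. */
int read_record(FILE *fp, long recno, struct person *out)
{
    long offset;

    if (recno < 1)
        return -1;
    /* Record n begins after the n - 1 records that precede it. */
    offset = (recno - 1) * (long) sizeof(struct person);
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    /* fread returns the number of units read; anything but 1 is a failure. */
    if (fread(out, sizeof(struct person), 1, fp) != 1)
        return -1;
    return 0;
}
```

With the database file opened in "rb" mode, the call read_record(fp, 10000, &p) would fetch the 10000th record into p without touching the 9999 records before it. Checking the return value also catches a request for a record beyond the end of the file, because fread then reads nothing.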
