
Any use of this file constitutes your agreement to the terms of the license

(see next page). The text in this file corresponds exactly to the printed
version of the book. Electronic versions of this and other books by the
author can be obtained at http://www.stolyarov.info.
PLEASE READ THIS MESSAGE CAREFULLY BEFORE USING THIS BOOK


I TRANSLATED THIS VOLUME FOR MY PERSONAL USE, USING ABBYY FINEREADER TO
EXTRACT THE TEXT FROM THE ORIGINAL FILE AND DEEPL TO TRANSLATE THE TEXT INTO
ENGLISH.
I DO NOT KNOW THE RUSSIAN LANGUAGE, SO I AM NOT RESPONSIBLE FOR ANY ERRORS
THAT OCCURRED DURING THE TRANSLATION.
USE THIS MATERIAL AT YOUR OWN RISK.
I THINK THE BOOK IS VERY GOOD FOR LEARNING PROGRAMMING, WHICH IS WHY I
TRANSLATED IT SO THAT I COULD READ IT.
PUBLIC LICENSE
Andrei Viktorovich Stolyarov's textbook "Programming: Introduction to the Profession" in three volumes,
published by MAKS Press in 2021, hereinafter referred to as the "Work", is protected by the current Russian and
international copyright laws. All rights to the Work under the law, both proprietary and non-property, belong to
its author.
This License establishes the ways of using the electronic version of the Work, the right to which is granted by
the author and the copyright holder to an unlimited number of persons, provided that these persons
unconditionally accept all the terms of this License. Any use of the Work that does not comply with the terms of
this License, as well as the use of the Work by persons who do not agree with the terms of the License, is possible
only with the written permission of the author and the copyright holder, and in the absence of such permission is
illegal and is prosecuted under civil, administrative and criminal law.
The author and copyright holder hereby authorizes the following uses of this file, which is an electronic
representation of the Work, without notice to the copyright holder and without payment of royalties:
1) reproduction of the Work (in whole or in part) on paper by means of a printer in a single copy for personal
household or educational needs, without the right to transfer the reproduced copy to other persons;
2) copying and distributing this file electronically, including by recording on physical media and by
transmission over computer networks, subject to the following conditions: (1) all copies of the file reproduced
and transmitted to any person are exact copies of the original PDF file, and no deletions, abbreviations,
additions, alterations, distortions, or any other changes, including changes in the file's presentation, are made
in the copying process; (2) distribution and transmission of copies to other persons must be strictly free of
charge, i.e., no remuneration may be collected in any form, whether for the media, for the act of copying and
transmission itself, or indirectly in the form of advertising.
Any other means of distributing this file without the written permission of the author is prohibited. In particular,
it is prohibited: to make any changes to this file, to create and distribute distorted copies, including copies
containing only a part of the Work; to distribute this file on the Internet through websites providing paid services,
through websites of commercial companies and individual entrepreneurs (including file-sharing and any other
services organized on the Internet by commercial companies, even free of charge), as well as through websites
containing advertising of any kind; to sell or exchange physical media containing this file. On the other hand, it
is allowed to give away (free of charge) media containing the file, to record the file free of charge on media
belonging to other users, to distribute the file through free decentralized P2P file-sharing networks, and so on.
Links to a copy of the file located on the author's official website are allowed without restrictions.
A. V. Stolyarov forbids the Russian Authors' Society and any other organizations to carry out any kind of
licensing of any of his works and to conduct, in the author's interests, any other copyright-related activities
without his written permission.

A. V. STOLYAROV

PROGRAMMING
INTRODUCTION TO THE PROFESSION
second edition
in three volumes

Volume I: BASICS OF PROGRAMMING

MAKS Press

Moscow - 2021
UDC 519.683+004.2+004.45
BBK 32.97.
С81

Stolyarov, Andrey Viktorovich.


C81 Programming: Introduction to the Profession: in 3 volumes / A. V. Stolyarov. - 2nd edition,
revised and supplemented. - Moscow : MAKS Press, 2021.
ISBN 978-5-317-06573-7.
Volume I: Fundamentals of Programming. - 704 p. : ill.
ISBN 978-5-317-06574-4.
DOI 10.29003/m1982.978-5-317-06574-4
The textbook "Programming: Introduction to the Profession" is oriented to
self-study and assumes the use of Unix family systems (including Linux) as an
end-to-end learning environment. The first volume of the textbook contains three
parts covering the basics of programming as an activity.
The first part includes selected information from the history of computer
science, discussion of some areas of mathematics, mathematical foundations of
programming, principles of construction and functioning of computer systems,
initial information about working with the command line of Unix OS.
The second part is devoted to the initial skills of composing computer
programs on the example of Free Pascal under Unix OS. The material is oriented
on studying C language in the future; much attention is paid to working with
addresses and pointers, building dynamic data structures; at the same time, many
Pascal features are excluded from consideration. Information about the rules of
program text design, testing and debugging is given.
The third part deals with programming at the machine instruction level (in
assembly language). The text assumes the use of i386 hardware platform and
NASM assembler.
For high school students, college students, teachers, and anyone interested
in programming.
UDC 519.683+004.2+004.45
BBK 32.97.

© Stolyarov A. V., 2016
© Stolyarov A. V., 2021, amended

ISBN 978-5-317-06574-4 (vol. I)
ISBN 978-5-317-06573-7

Table of contents

Preface one, philosophical ................................................................................ 10
Preface two, methodological ............................................................................. 20
Preface three, parting ........................................................................................ 39
Book structure and conventions used in the text ............................................... 43

1. Preliminary information 49
1.1. Computer: what it is ................................................................................ 49
1.2. How to use a computer properly ............................................................. 72
1.3. Now a little math................................................................................... 136
1.4. Programs and Data ................................................................................ 196

2. Pascal Language and the Beginnings of Programming 231


2.1. First programs ....................................................................................... 232
2.2. Expressions, variables and operators ..................................................... 243
2.3. Subprograms .......................................................................................... 284
2.4. Program Design .................................................................................... 304
2.5. Symbols and their codes; text data ......................................................... 314
2.6. Pascal's type system .............................................................................. 333
2.7. The selection statement .......................................................................... 362
2.8. Full-screen programs ............................................................................ 365
2.9. Files ....................................................................................................... 383
2.10. Addresses, pointers and dynamic memory.................................................
2.11. More on recursion ................................................................................ 441
2.12. More about program design ................................................................. 460
2.13. Testing and debugging ........................................................................ 489
2.14. Modules and separate compilation ....................................................... 507

3. Processor capabilities and assembly language


3.1. Introductory information....................................................................... 523
3.2. Fundamentals of the i386 command system ......................................... 544
3.3. Stack, subroutines, recursion ................................................................ 591
3.4. Main features of the NASM assembler ................................................. 609
3.5. Macro tools and the macro processor ..........................................................
3.6. Interfacing with the operating system ................................................... 635
3.7. Separate compilation ..................................................................................
3.8. Floating point arithmetic ....................................................................... 682
Concluding remarks ......................................................................................... 702
List of references ............................................................................................ 703

Detailed table of contents
Preface one, philosophical ................................................................................ 10
Preface two, methodological ............................................................................. 20
Can you learn to be a programmer .......................................................... 21
Self-learning isn't easy either .................................................................. 22
There's a way out, or "Why Unix" ........................................................... 23
Reason one is math.................................................................................. 24
Reason two is psychological ................................................................... 25
Reason three is ergonomic ...................................................................... 27
Reason four is pedagogical...................................................................... 27
Language defines thinking ...................................................................... 29
How to ruin a good idea and how to save it ............................................ 36
Preface three, parting ........................................................................................ 39
Structure of the book and conventions used in the text ................................ 43

1. Preliminary information 49
1.1. Computer: what it is ................................................................................ 49
1.1.1. A bit of history ........................................................................... 49
1.1.2. Processor, memory, bus.............................................................. 63
1.1.3. Principles of central processing unit operation .... 66
1.1.4. External devices ......................................................................... 67
1.1.5. Memory Hierarchy ..................................................................... 69
1.1.6. Summary .................................................................................... 71
1.2. How to use a computer properly ............................................................. 72
1.2.1. Operating systems and types of user
interface ...................................................................................... 72
1.2.2. History of Unix OS .................................................................... 82
1.2.3. Unix on the home machine ......................................................... 86
1.2.4. First session at the computer ...................................................... 89
1.2.5. Directory tree. Working with files ............................................. 91
1.2.6. Command and its parameters ..................................................... 95
1.2.7. File name templates .................................................................... 98
1.2.8. Command history and file name autocompletion ...................... 99
1.2.9. Job control ................................................................................ 100
1.2.10. Running in the background .................................................... 105
1.2.11. Redirecting I/O streams.......................................................... 106
1.2.12. Text editors ............................................................................ 108
1.2.13. File permissions ..................................................................... 115
1.2.14. Electronic documentation (man command) .... 118
1.2.15. Command files in Bourne Shell ............................................. 119
1.2.16. Environment variables............................................................ 126
1.2.17. Session logging ...................................................................... 128
1.2.18. Graphics subsystem in Unix OS ............................................. 128
1.3. Now a little math................................................................................... 136
1.3.1. Elements of combinatorics ....................................................... 136
1.3.2. Positional number systems ....................................................... 152
1.3.3. Binary logic .............................................................................. 164
1.3.4. Types of infinity ....................................................................... 170
1.3.5. Algorithms and Computability ................................................. 175
1.3.6. Algorithm and its properties ..................................................... 185
1.3.7. A "sequence of actions" has nothing to do with it . . . . 193
1.4. Programs and Data ................................................................................ 196
1.4.1. On measuring the quantity of information ............................... 196
1.4.2. Machine representation of integers ........................................... 204
1.4.3. Floating point numbers.............................................................. 210
1.4.4. Texts and languages ................................................................. 211
1.4.5. Text as a data format. Encodings .............................................. 215
1.4.6. Binary and textual data .............................................................. 221
1.4.7. Machine code, compilers and interpreters . 224

2. Pascal Language and the Beginnings of Programming 231


2.1. First programs ....................................................................................... 232
2.2. Expressions, variables and operators .................................................... 243
2.2.1. Arithmetic operations and the concept of type .... 243
2.2.2. Variables, initialization, and assignment . . . . 246
2.2.3. Identifiers and reserved words . . . 250
2.2.4. Input of information for its further processing 251
2.2.5. Watch out for bit capacity!........................................................ 254
2.2.6. A simple sequence of statements ............................................... 256
2.2.7. Branching construct .................................................................. 258
2.2.8. Compound statement ................................................................. 261
2.2.9. Logical expressions and the boolean type ................................. 263
2.2.10. The concept of a loop; the while statement ............ 265
2.2.11. Loop with postcondition; the repeat statement ......... 270
2.2.12. Arithmetic loops and the for statement ............... 271
2.2.13. Nested loops............................................................................ 273
2.2.14. Bitwise operations................................................................... 277
2.2.15. Named constants ..................................................................... 279
2.2.16. Different ways of writing numbers ......................................... 283
2.3. Subprograms ......................................................................................... 284
2.3.1. Procedures ................................................................................. 285
2.3.2. Functions ................................................................................... 290
2.3.3. Logical functions and conditional expressions . . . . 293
2.3.4. Parameters-variables ................................................................. 294
2.3.5. Global variables ........................................................................ 297
2.3.6. Functions and side effects ......................................................... 298
2.3.7. Recursion .................................................................................. 301
2.4. Program Design ..................................................................................... 304
2.4.1. The concept of structural programming . . . . 304
2.4.2. Exceptions to the rules: exit statements .... 306
2.4.3. Unconditional jumps .................................................................. 310
2.4.4. On dividing the program into subprograms .... 312
2.5. Symbols and their codes; text data ......................................................... 314
2.5.1. Symbol handling tools in Pascal ............................................... 315
2.5.2. Character input of information .................................................. 319
2.5.3. Reading to the end of the file and filter programs . . . 324
2.5.4. Reading numbers to the end of the file ..................................... 329
2.6. Pascal's type system .............................................................................. 333
2.6.1. Built-in types and custom types .... 333
2.6.2. Ranges and enumerated types ................................................... 335
2.6.3. General concept of ordinal type ................................................ 337
2.6.4. Arrays ....................................................................................... 338
2.6.5. Records ..................................................................................... 346
2.6.6. Constructing complex data structures . . . . 348
2.6.7. User-defined subprogram types and parameters 349
2.6.8. Type conversions ...................................................................... 351
2.6.9. String literals and arrays of char ........................................... 354
2.6.10. Type string ........................... 357
2.6.11. Built-in tools for working with strings .... 359
2.6.12. Processing command line parameters .... 361
2.7. The selection statement .......................................................................... 362
2.8. Full-screen programs ............................................................................ 365
2.8.1. A bit of theory ........................................................................... 366
2.8.2. Output to arbitrary screen positions .......................................... 368
2.8.3. Dynamic input........................................................................... 369
2.8.4. Color Management.................................................................... 376
2.8.5. Random and pseudorandom numbers ....................................... 380
2.9. Files ...................................................................................................... 383
2.9.1. General information .................................................................. 383
2.9.2. Text files ................................................................................... 388
2.9.3. Typed files ................................................................................ 392
2.9.4. Block I/O.........................................................................................
2.9.5. Operations on a file as a whole ................................................. 398
2.10. Addresses, pointers and dynamic memory.................................................
2.10.1. What is a pointer ..................................................................... 401
2.10.2. Pointers in Pascal .................................................................... 402
2.10.3. Dynamic variables .................................................................. 404
2.10.4. Single-linked lists ................................................................... 408
2.10.5. Stack and Queue ...........................................................................
2.10.6. Traversing a list via a pointer to a pointer .... 425
2.10.7. Doubly-linked lists; deques ...........................................................
2.10.8. An overview of other dynamic data structures . . 436
2.11. More on recursion ................................................................................ 441
2.11.1. Mutual recursion ..................................................................... 441
2.11.2. The Towers of Hanoi ............................................................... 442
2.11.3. Pattern matching ............................................................................
2.11.4. Recursion when working with lists ......................................... 452
2.11.5. Working with binary search tree ............................................. 455
2.12. More about program design ................................................................. 460
2.12.1. On the role of ASCII typing and English .... 460
2.12.2. Allowable structural indentation styles .... 463
2.12.3. If statement with else branch ................ 464
2.12.4. Design features of the selection operator . . . 466
2.12.5. Sequences of mutually exclusive ifs . 467
2.12.6. Labels and the goto statement ..................... 471
2.12.7. Maximum width of the program text .... 473
2.12.8. How to break a long line ......................................................... 476
2.12.9. Spaces and separators ............................................................. 483
2.12.10. Selecting names (identifiers) ................................................. 484
2.12.11. Letter case in names and keywords . . . . . 486
2.12.12. How to deal with description sections .................................. 487
2.12.13. Continuity of compliance ...................................................... 487
2.13. Testing and debugging ......................................................................... 489
2.13.1. Debugging in the life of a programmer ................................... 489
2.13.2. Tests ....................................................................................... 494
2.13.3. Debug Print ............................................................................. 499
2.13.4. The gdb debugger ............................ 502
2.14. Modules and separate compilation ...................................................... 507
2.14.1. Modules in Pascal ................................................................... 509
2.14.2. Using modules from each other ....................................................
2.14.3. Module as an architectural unit .....................................................
2.14.4. Weakening of module cohesion ....................................................

3. Processor capabilities and assembly language


3.1. Introductory information....................................................................... 523
3.1.1. Classical principles of program execution . . 523
3.1.2. Features of programming under the control of
multitasking operating systems ................................................ 526
3.1.3. History of the i386 platform ...................................................... 529
3.1.4. Familiarizing yourself with the tool ................................................
3.1.5. Macros from the file stud_io.inc .............. 540
3.1.6. Rules of assembly program design . . . 541
3.2. Fundamentals of the i386 command system ......................................... 544
3.2.1. Register System ..............................................................................
3.2.2. User task memory. Segments . . . . 548
3.2.3. Directives for reserving memory ............................................... 551
3.2.4. The mov command and operand types .................................. 556
3.2.5. Indirect addressing; the effective address . . . . 559
3.2.6. Operand sizes and their permissible combinations 563
3.2.7. Integer addition and subtraction ......................................................
3.2.8. Integer multiplication and division .................................................
3.2.9. Conditional and unconditional jumps ..............................................
3.2.10. On constructing branches and loops ..............................................
3.2.11. Conditional jumps and the ECX register; loops .... 576
3.2.12. Bitwise operations................................................................... 579
3.2.13. String operations ..................................................................... 585
3.2.14. A few more interesting commands .......................................... 589
3.3. Stack, subroutines, recursion ................................................................ 591
3.3.1. The concept of a stack and its purpose ...................................... 591
3.3.2. Stack organization in the i386 processor .................................. 592
3.3.3. Additional stack commands .... 595
3.3.4. Subprograms: general principles .....................................................
3.3.5. Calling subroutines and returning from them............................ 596
3.3.6. Stack Frame Organization ......................................................... 598
3.3.7. Basic subroutine call conventions .... 601
3.3.8. Local labels ................................................................................ 603
3.3.9. Example: pattern matching ........................................................ 604
3.4. Main features of the NASM assembler .................................................. 609
3.4.1. Command-line switches and options ......................................... 610
3.4.2. Syntax basics...................................................................................
3.4.3. Pseudocommands............................................................................
3.4.4. Constants.........................................................................................
3.4.5. Evaluating expressions at assembly time ................................. 616
3.4.6. Critical Expressions ........................................................................
3.5. Macro tools and the macro processor ...........................................................
3.5.1. Basic concepts .................................................................................
3.5.2. The simplest macro examples ..........................................................
3.5.3. Single-line macros; macro variables .... 623
3.5.4. Conditional compilation ..................................................................
3.5.5. Macro-repeats .................................................................................
3.5.6. Multiline macros and local labels . . . . 630
3.5.7. Macros with variable number of parameters .... 632
3.5.8. Macro directives for working with strings ..............................
3.6. Interfacing with the operating system ..........................................................
3.6.1. Multitasking and its main types ......................................................
3.6.2. Hardware support for multitasking ....................................... 640
3.6.3. Interrupts and exceptions .......................................................... 643
3.6.4. System calls and "program interrupts" 647
3.6.5. Linux OS system call convention .............................................. 650
3.6.6. FreeBSD OS system call convention .... 651
3.6.7. Examples of system calls .......................................................... 653
3.6.8. Accessing command line parameters ........................................ 656
3.6.9. Example: copying a file ...................................................................
3.7. Separate compilation ....................................................................................
3.7.1. Module support in NASM ......................................................... 670
3.7.2. Example ................................................................................... 670
3.7.3. Object code and machine code .........................................
3.7.4. Libraries ..........................................................................................
3.7.5. How the link editor works ...............................................
3.8. Floating point arithmetic ..............................................................................
3.8.1. Floating point number formats ........................................................
3.8.2. Organization of the arithmetic coprocessor .... 685
3.8.3. Data exchange with the coprocessor ...............................................
3.8.4. Arithmetic commands .....................................................................
3.8.5. Commands for calculating mathematical functions . 693
3.8.6. Comparison and processing of its results ........................................
3.8.7. Exceptional situations and their handling ............................ 697
3.8.8. Exceptions and the wait command ................. 699
3.8.9. Controlling the coprocessor ...................................................... 700
Concluding remarks ......................................................................................... 702
List of references ............................................................................................ 703
Preface one, philosophical
The book you are reading is an almost unique phenomenon - though not in its
content, which is a matter for others to assess, but in the way in which the book - now
in its second edition - came into being.
I had the idea to write a book like this one quite a long time ago, and it took me about
five years before the idea turned into a concrete crowdfunding project. I already had
experience in writing books, and a lot of it, but none of the books I'd written before were
more than two hundred pages long. In the past, I had always made do on my own; once I
had a book in mind, I would just sit down and write it. Some of my textbooks have been
published by the educational institutions where I work or have worked, others I have
successfully published at my own expense, recouping the costs by selling part of the print
run: with a print run of one or two hundred copies it is not so difficult, though it takes a long time.
Several times I made attempts to cooperate with publishers; if I had agreed to their terms,
I would not have had to publish some of my books at my own expense, but these books
would not have been on my site in the public domain: publishers always and everywhere
require full transfer of property rights to the book, which completely excludes legal free
distribution of the electronic version. Thanks, gentlemen, no thanks: I write my books to
be read, not for you to make money off them by ripping off my readers.
Everything was fine, as long as my ideas were not large-scale; I always managed to
carve out a couple of more or less free weeks to write a text, and ten or fifteen thousand
rubles to publish the written book. But this time the reality was somewhat different. First
of all, the volume of the book was initially supposed to be quite large. I must say that the
book turned out even bigger than planned - about twice as big; but even in the
configuration that was originally envisioned, it was clear that publishing it on paper, even
in the smallest print run, exceeded my personal financial means.
Secondly, it was clear from the very beginning that I would have a serious labor
marathon to write the book. The parts of this book devoted to assembly language
programming and C++ already existed as separate books; I also planned to use an
existing book on operating systems (the book "Introduction to Operating Systems" of
2006 can hardly be recognized in what I eventually produced on this topic, but at the time
I didn't intend to rewrite it that much). In any case, I had to write from scratch
the introductory part, the part devoted to Pascal and programming basics, and the part
about the C language. Drawing on my previous experience, I estimated the upcoming labor costs
at 500 hours and, as it turned out, I was almost right: if I had had the willpower to limit
myself to the originally planned amount of work, it would have taken just that much.
Five hundred hours of work time was not a couple of weeks or even a couple of
months; given my main job, it would take at least six months to write the book, provided
I gave up freelance work and private tutoring for that time. In addition to that, the amount
of money that was to be paid for the publication of the book, according to the most
optimistic calculations, corresponded to my salary for six months. All this together turned
my little undertaking into a full-blown project.
I could go to publishers; I could probably find someone who would agree to enter into
a so-called contract of authorship with me and even pay me some token fee. But this is
not about royalties. Distribution of the book in electronic form, as I do with all my books,
would then be completely impossible; one could read the book either on paper or, even
worse, by buying the electronic version. This option is contrary to my beliefs; in
particular, I am deeply convinced that the only sensible way to pay for electronic copies of
books (and bits and bytes in general) is with electronic copies of money.
Before finally abandoning my idea, I decided, and without much hope, to try the last
opportunity I saw: crowdfunding, which has been trendy lately; simply put, I decided to
ask the public for money for the project.
After looking at the crowdfunding-oriented sites, I had to give up on using them altogether:
working with them requires registration, during which you have to accept "terms
of use", and these terms of use, in particular, allow their owners to send advertisements to
registered users and anything else they want - and these are the terms of use on all the
crowdfunding sites I have seen. They also refuse to work with JavaScript disabled in the browser.
I simply can't ask anyone to visit such sites, it's out of the question.
In addition, I do not tolerate so-called "social networks", which are not really networks at all¹.
The picture is completed by my extreme sensitivity in the choice of means of PR and other
publicity: I am categorically not ready to tolerate anything even remotely resembling spam - after
all, my dissertation in philosophy was called "Information Freedom and Information Violence",
and it grew out of the study of a private question about the specific reasons why spam cannot
be considered a manifestation of freedom of speech.
Meanwhile, I needed to raise a substantial sum of 600,000 rubles (I remind you, we are
talking about the beginning of 2015). Half of this sum was supposed to be spent on partially
compensating my working hours, which would allow me to stay afloat without wasting time on
casual part-time jobs; the other half of the sum was to be spent on publishing a paper book. By
posting on my website stolyarov.info, I had little hope for anything, but I had thought of
a system of incentives for those who would support me financially: for a donation of 300 rubles
I promised a mention in the list of sponsors to be placed in the book; for a donation of
500 rubles, a "branded" CD with the author's autograph (I note that in the end not a single such
disk was ever claimed, so this idea was, apparently, quite unsuccessful); for a
donation of 1500 rubles, a copy of the paper book, again with an autograph; and from 3000 rubles,
a book in a gift edition, to be produced in accordance with the number of such
donations.

¹ Details about my attitude to so-called "social networks" are outlined in my article "Theater of Content
Absurdity. Social networks: the history of one terminological deformation", which can be easily found on
the Internet with the help of search engines.
Almost immediately - at least before the first donations came in, anyway - several people
asked me what would happen to the money if the required amount was not raised; I responded
by writing a separate page where I promised to at least do something anyway, even if I didn't
receive any donations at all. Specifically, I promised that if the sum of collected donations was
less than 25 000 rubles, I would still write a part of the book devoted to the C language and
publish it as a separate book, plus I would once again finalize the text of my C++ book and
republish it for the fourth time. At the sum of donations from 25 to 55 thousand I promised to
revise and republish also my old book on NASM, at the sum from 55 to 100 thousand - to revise
and republish "Introduction to Operating Systems", at the sum from 100 to 120 thousand - to
write a part devoted to Pascal and publish it as a separate book. Finally, when the threshold of
120 thousand was reached, I promised that I would write the whole book and continue raising
money to make it possible to publish it. I set September 1, 2015 as the deadline for summing up,
while the events described were taking place at the beginning of January - the announcement
of the project was dated 7.01.2015.
After the announcement, things were quiet for the first two days; true, the first donation came
the very next day, but it hardly counted: one of my old friends, known under the nickname Gremlin,
had decided to support the project. The real merry-go-round started on January 10, when I received
seven donations in a single day, totaling over 14 thousand. I would like to take this opportunity to
sincerely thank Grigory Kraynov who, having sent the second donation, took the trouble to
bring word of the project to the general public through the notorious "social
networks".
The first milestone of 25,000 was reached on January 12, the second milestone (55,000) -
on January 16; on February 4, the amount exceeded 100,000, and on February 10 - the magical
120,000, so all my "backup options" became irrelevant at once; now the book simply had to be
finished, by hook or by crook.
Of course, things were not so rosy; by spring the first wave of donations had finally dried up,
so I even had to stop working on the manuscript to earn money on the side. In the summer I
managed to announce the project on the Linux.Org.Ru site, with special thanks to its owner
Maxim Valyansky for permission to do so; the announcement generated a second wave of
donations. In the first year, the project went into the negative many times and came back out
again, and until the last minute it was unclear whether there would be enough money for the
publication and in what form.
Work on the manuscript was going surprisingly briskly at that time. By February 23, 2015 -
a measly month and a half after the start - I had already finished the manuscript of the C part, a
month later I finished the introductory part; then I stopped for a couple of months, because the
project just went into negative territory and I had other things to do. When I resumed work in
early June, by September 1 I had written the Pascal part, designed to "tell the story of
programming". I was most afraid of this part at the beginning of the project, because I had no
experience in teaching Pascal to students, and private lessons with high school students
preparing for the Unified State Exam are not quite the same thing. But, as they say, the road is
mastered by the one who walks it: having become the largest TeX file I had ever edited, the part
of the manuscript devoted to basic programming skills brought a fair dose of creative satisfaction,
and at the same time completed the manuscript work on the parts that had to be created from
scratch - only the parts that had already existed before as separate books remained.
At the beginning of November 2015 the manuscript was completed in the form originally
planned. The result gave a strange impression: seven parts (introduction, Pascal, assembler, C,
operating systems, parallel programming and C++) stretched for 1103 pages, so the book was
too thick. However, I had no money to publish it anyway, and meanwhile I was actively looking for an
artist who could draw a decent cover for this ugliness. While several of my friends were reading
the manuscript in search of obvious errors, I had a growing feeling that I didn't like the
manuscript as it was - it was missing several chapters that I really wanted to have (in particular,
chapters on working with the terminal driver and the ncurses library), and the further I went on,
the more I wanted to include parts on Tcl and Tcl/Tk and on various exotic languages like Lisp and
Prolog, and to show how to use OOP to create graphical interfaces.
Thinking about what to do next led to the idea of turning the book into a three-volume set,
which would allow me to publish the first parts I was sure of right away, and to continue working
on anything that seemed too raw. The idea was published on the website on December 15 and
seemed to be unobjectionable, so I concentrated on preparing the first volume, which included
only the introduction and the Pascal part. Besides many revisions of the text, an important part
of this preparation was the drawing and design of the cover. My friends introduced me to the
designer Elena Domennova, who brilliantly realized the idea I had formed. The design of the
drawing - a globe standing on three crutches on the back of a beetle that rides a square-wheeled
bicycle across a field strewn with rakes, while a crazy fish with wings and webbed feet flies
around - I borrowed from a work that can easily be found on the Internet under the title "The Amazing
World of Programming". The original was a felt-tip drawing on a whiteboard that
someone had taken the trouble to photograph. Many thanks to the author of the original drawing for
the idea, which provided me with a considerable dose of positive mood. Unfortunately, I still don't
know who created that drawing, but I hope that if its author ever sees the cover of my first volume,
he will like our remake with Elena :-).
The first volume went to the printer on March 2, 2016. By that time, a little more than 400
thousand rubles had been collected in donations; taking into account the compensation for my
time (557 hours of which had been spent by that time) and the costs of publishing, the project
"flew into the minus" by almost 34 thousand rubles. Technically, I had the manuscript of the
second volume ready, but there was no money to publish it. And after
finishing the chapter about ncurses and a few other fragments, the second volume had swelled quite
a bit - up to 650 pages. The last straw was the nagging feeling that I didn't like the text about
operating systems in its current form; after all, the book "Introduction to Operating Systems" was
ten years old by that time, and the further I went, the more I wanted to take it apart and put it
back together anew. As a result, I decided to include in the second volume only the parts about assembler
and the C language (under the general title "Low-Level Programming"), while the rest - essentially
everything that had grown out of that old book - would be brought up to date, supplemented with a
full-length part devoted to computer networks, and published as a separate (third) volume entitled
"Systems and Networks".
Despite all the difficulties, the publication of the first volume seemed to be an important
victory, first of all from an ideological and philosophical point of view: it had proved possible to
publish the book using the donations collected, retaining control over the copyright and thus
escaping the "copyrast" pen built for authors by the publishing and media industry.
There were, however, reasons for dissatisfaction. I myself, of course, did not intend to stop at
Pascal, but this was not at all obvious to the public, who had seen only the first volume; naturally,
there were plenty of haters claiming that "outdated" Pascal is pushed on students
only by those who know nothing else. The publication of the second volume clearly had
to be accelerated.
In April 2016, I was invited to give a course on "Computer Architecture and Assembly
Language" at the MSU branch in Yerevan; of course, I did not have time to publish the volume
"Low-Level Programming" before the start of that business trip - and, I must say, it is very good
that I did not. Giving the lectures, which I based on the corresponding chapters of the
manuscript, showed that there were things to correct; when, nine years
earlier - in spring 2007 - I had given the same course in Tashkent (out of which, in fact, the book
about NASM grew), I didn't yet know some curious things, such as the CDECL convention;
while lecturing in Yerevan, I clearly saw how some of the examples in the
text should be reworked and where the emphasis should be shifted.
When I returned from Yerevan, I spent the whole of May 2016 on finalizing the parts about
NASM and C; the second volume went to print in early June, and the fact that the project was
still deep "in the minus" did not stop me: I really wanted, as they say, to "close the gestalt" and at
the same time to demonstrate that Pascal is used in my book certainly not because I supposedly
know nothing else; for that matter, I know C much better, and in fact it is C, not Pascal, that I have
been teaching to students every year since 2000. The manuscript was burning my
hands, and I did not want to postpone the publication of the second volume.
So the second volume went to press, and the project was down almost 190,000; that was
already a lot more than I could afford. After the marathon publishing of the first two volumes, I
needed a break anyway, so I honestly announced on the website that I would return to work on
the third volume when the project was out of the minus. The break, however, didn't last that long.
On September 29, 2016, having received a record-breaking donation for the fantastic sum of
99999 rubles, I couldn't go on shirking work.
The logic of presentation used in the old book on operating systems ultimately had to be
broken up completely. To discuss certain aspects of the kernel's operation in detail, the reader
had to be already familiar with the problems that arise when accessing shared data; but, of
course, it would not do at all to put the part about parallel programming before the part about
operating systems as a phenomenon. I had to "saw apart" the material of the essentially
finished part: to tell separately how the kernel looks from the viewpoint of user tasks and what
services it provides to the user-programmer at the level of system calls, and to devote a separate
part to the operation of the kernel itself. To this was added an independent part about
computer networks; it started with an already existing chapter on network sockets, grew into a
primer on the protocols of the TCP/IP stack and, together with a lot of pictures, swelled to almost
a hundred pages.
After the dizzying speed with which the text of the first two volumes was produced, work on
the third was unexpectedly slow, not least because of the long search for the correct sequence
of presentation, but also because some of the material was outside my area of confident
knowledge and some issues had to be studied carefully. Teachers, following Richard Feynman,
often say: if you want to master a subject seriously, teach a course of lectures on it. I can now say
from my own experience that writing your own book on the subject is an even more reliable
method; I once lectured on operating systems for several years, but I certainly learned even more
about the subject while working on this book.
In June 2017, the manuscript of the third volume was finally completed and submitted to the
proofreader, and on July 14, 2017 - the day when the sacramental number of
seconds since 01.01.1970 passed one and a half billion - the volume went to press; I could
not deny myself the pleasure of mentioning this when talking about the time system call. The
project, which had been in a financial plus for some time, again "sank into a minus", this time,
however, not so deeply - by 117 thousand. The total time of work on the manuscript amounted
to 975 hours by this point - almost twice as much as the miserable five hundred hours I had
originally planned; however, I had not intended to turn an old 200-page book on operating
systems into a full 400-page volume, and much had appeared in the first two volumes (after I gave up
the idea of publishing one big book) that otherwise might not have been there.
The most painful was the last, fourth volume under the general title "Paradigms". It was not
planned at all - except for the part into which my book on C++, which by that time had gone
through three editions, was to be turned. As the book was becoming a four-volume set, I
realized what I wanted to see in the fourth volume besides the notorious C++. It turned out to be
quite logical: the first volume - the very basics (at the level of an "advanced" school), the second
volume - how everything actually works, the third - "systematic", and the fourth - "applied science"
where efficiency is not always critical and one can afford all sorts of "liberties" with the style of
thinking.
The amount of work to be done was, frankly, daunting. At first I didn't touch the manuscript
because the project was "in the negative," but on October 2, 2017, that excuse disappeared:
book sales and continuing donations brought the project to a plus, where it has remained ever
since. But even then I could not immediately bring myself to continue working: in the spring,
when I was finishing the third volume, I completely "burned out", I did not manage to recover
over the summer, then I had a "fun" fall semester, and only in January-February 2018, when
the students had taken their exams and gone off on vacation, and teachers could afford to do nothing for
a month, did I manage to more or less come to my senses.
At the beginning of the semester, students persuaded me to republish Introduction to C++
again (for the fourth time), which required, as they say, "restoring the context", and I took
advantage of this to continue working on the fourth volume. I had to start, as usual, with
rearranging the structure of the parts. Initially I planned four parts: on C++, on some widget library
for creating GUIs (imagine, I even thought it would be Qt - perish the thought), on scripting using Tcl as
an example (plus Tcl/Tk at the same time), and on "exotic" languages like Lisp and Prolog,
where, in fact, I planned to cover, as they say, the topic of paradigms. Almost at once it became
clear that putting off the talk about paradigms to the very end of the volume was an absolutely
crazy idea; there was a short preface devoted to paradigms in the book on C++, but such a
superficial text did not fit into the big book where paradigms were to become one of the main
subjects of discussion. So there appeared the part "about paradigms in general", the first in the
fourth volume and the ninth in the general numbering. The part on C++, which was supposed to
be the seventh according to the initial idea, turned out to be the tenth in the end.
My own understanding of the programmer's reality didn't stand still either. In 2018, the article "Pure
Compilation as a Programming Paradigm" was published, co-authored with my
graduate student Anna Anikina and my by-then former graduate student Oleg
Frantsuzov; I thought it right to develop the topic of interpreted and compiled execution in
the book, and the part about scripting was best suited for this purpose - but in this form it had to
be put after the discussion about "exotic" (and mostly interpreted) languages; so these two parts
changed places. In order not to multiply the parts, I decided to attach the material about graphical
interfaces and the use of OOP for them to the part about C++; that's how the structure of the
fourth volume got its final form. Also, I suddenly realized how exactly the C language disfigures the
thinking of novice programmers when it is used as the first language of instruction; the result of this
"enlightenment" was the paragraph "Conceptual Difference between C and Pascal" included into
the part about paradigms.
On June 10, 2019, I had to withdraw the hard copies of the first and second volumes from
open sale: there were just enough copies left to fulfill my obligations to the donors. The manuscript of the fourth volume,
which by then had been almost two years in the making, was still far from being published, and
I could not even predict its date; it was clear that the work had to be accelerated. But I had no
time to do it - I had other things to do that had nothing to do with the book. Nevertheless, armed
with the notes of my lectures on the course "Programming Paradigms", I got down to the part
about "non-destructive paradigms". The part "about paradigms in general" was already ready by
that time, in the final part the description of Tcl and Tcl/Tk was completed and the most interesting
thing was left - compilation and interpretation as paradigms; material about graphical interfaces
was still "hanging", but at least it was already clear that the FLTK library would serve as
the basis - by that time I had even had occasion to do a small commercial project with it.
The "non-destructive" part showed me convincingly what real hardship costs. It
would seem: here are the lectures - my own lectures! - on all these languages; put them into
literary form and you're done. Not so. During the lectures one could get away with suggesting
that students "google" how to write real programs using the existing implementations of Common Lisp,
Scheme and Prolog, rather than the worthless toys that students usually run from inside the
interpreter. This option did not work for the book, and I had to look into the issue in detail,
considering different implementations; for a long time I did not want to believe that everything
was so bad with them. The work on this part stretched over half a year and was finished only on
December 12; fortunately, at some point I had to admit that no acceptable
implementations of Refal remained and to abandon the chapter about that language, otherwise it would have
taken even more time. On the bright side, I finally dotted the i's regarding currying and the fixed-
point combinator; the chapter on lazy computation has separate paragraphs on both of these
intricate entities.
After finishing this part, taking advantage of the winter exam session and the vacations, I
managed rather quickly to finish, first, the chapter on creating graphical interfaces with FLTK and, second,
the "philosophical" aspects of the final part - the treatment of interpretation, compilation and
scripting as distinct paradigms. On February 9, 2020, the last "to be finalized" mark was dropped
from the manuscript. The fourth volume turned out to be the thickest of all, 656 pages (vs. 464,
496, and 400 pages of the previous volumes). It took almost another month to proofread and
prepare for printing; the volume went to press only on March 5, when the coronavirus panic was
growing worldwide. I would like to take this opportunity to once again thank Alla Nikolaevna
Matveeva and other employees of MAKS Press, who literally snatched the finished volume from
the printing house on the last day before the general lockdown - March 27, 2020.
It is noteworthy that the total time of work on the manuscript amounted to 1703 hours, of
which 728 hours were spent on the fourth volume. At the time of sending the fourth volume to
print, the total amount of donations collected amounted to 1,173,798 rubles.
In the meantime, the first two volumes were long gone, there were very few copies of the
third volume left; working copies of the first three volumes were full of marks, plus there was a
serious ideological revision due to the epiphany about Pascal, C, and side effects. All this made
me think that the second edition should not be delayed. The announcement of the beginning of
work on the second edition, which also contained plans to create a problem book to support the
whole book, was published on the website on May 13, 2020; alas, I could not immediately force
myself to start working on the manuscript - the coronavirus marasmus going on in the country
was not conducive to constructive activity. In May I had only one meaningful working day, during
which the structure of the book was reorganized: three volumes of roughly equal size were
made out of the four volumes, which had varied wildly in length. All I managed to do in June
was to work the notes accumulated for the first volume into the manuscript, in two
sittings. More or less steady work began only towards the end of July - and was
almost immediately interrupted by vacation; fortunately, it was this two-week shock "rest" in the
form of a category five water trip that allowed me to get my brain more or less back into working
order.
Initially, I thought that preparing the manuscript for the second edition would take about 200
hours, well, maybe a little more. As usual, reality made its own adjustments: the preparation
of the first volume alone took more than 140 of those notorious hours, and I managed to finish it
only on November 11. At the same time, another noteworthy story took place: I finally decided that
everyone who could claim their "prize" set of first-edition books had already done so, and I put
the remaining eight sets on sale; despite the deliberately inflated prices, the books sold out in just
over a week. After two months and another 196 hours of work time, the last notes "to be revised"
were thrown out of the new second volume; there was still the third volume, which was made
from the old fourth, but it was the last one to be published, the list of desired revisions and
corrections in it had not yet had time to swell, so one could hope that it would be faster. And so
it was, it took a little over two weeks and "only" about 50 hours to rework the last volume. At the
same time, I figured out how to make the subject index common to all three volumes; what I
didn't expect was that getting it in relative order would take another forty hours or so.
Anyway, on February 4, 2021, I deemed the book ready for a second edition. Since the
beginning of the project, more than 2200 working hours have been spent on it, 500+ of them on
preparing the text for the second edition. The volume of donations received passed 1,750,000
rubles, but, unfortunately, the publication of the paper book pushed the balance into the red; it
is noteworthy that this happened for the first time since 2018, when the project climbed out of
negative territory after the publication of the third volume; when the fourth volume was published,
the project did not go into the red - there was enough money.
I would like to say thank you to everyone who reported errors in the text of the
published volumes; special thanks to Ekaterina Yasinitskaya for her heroic proofreading
work, which borders on a feat, and to Elena Domennova for the beautiful covers. I would
also like to thank Leonid Chayka for his high praise of the book in his popular video blog.
And, of course, I am deeply grateful to all those who participated in financing the project,
thus making it possible. Below is the list of donors (except for those who preferred to
remain incognito):
Gremlin, Grigoriy Kraynov, Arseniy Sher, Vasily Taranov, Sergey Setchenkov, Valeria Shakirzyanova,
Katerina Galkina, Ilya Lobanov, Suzana Tevdoradze, Oksana Ivanova, Julia Kulikova, Kirill Sokolov,
jeckep, Anna Sergeevna Kulyova, Marina Ermakova, Maxim Olegovich Perevedentsev, Ivan Sergeevich
Kostarev, Evgeny Dontsov, Oleg Frantsuzov, Stepan Kholopkin, Artem Sergeevich Popov, Alexander
Bykov, I. B. Beloborodov, Kim Maxim, artyrian, Igor Elman, Ilyushkin Nikita, Kalsin Sergey
Alexandrovich, Evgeny Zemtsov, Shramov Georgiy, Vladimir Lazarev, eupharina, Nikolay Korolev,
Goroshevsky Aleksey Valerievich, Lemenkov D.D., Forester, say42, Anya "canja" F., Sergey,
big_fellow, Dmitry Volkanov, Tanechka, Tatiana 'Vikora' Alpatova, Andrey Belyaev, Andrey Loshkins
(Alexander and Daria), Kirill Alexeev, kopish32, Ekaterina Glazkova, Oleg "burunduk3" Davydov,
Dmitry Kronberg, yobibyte, Mikhail Agranovsky, Alexander Shepelev, G.Nerc=Y.uR, Vasily Artemyev,
Smirnov Denis, Pavel Korzhenko, Ruslan Stepanenko, Tereshko Grigory Yuryevich 15e65d3d,
Lothlorien, vasiliandets, Maxim Filippov, Gleb Semenov, Pavel, unDEFER, kilolife, Arbichev, Ryabinin
Sergey Anatolievich, Nikolay Ksenev, Kuchin Vadim, Maria Trofimova, igneus, Alexander Chernov,
Roman Kurynin, Andrey Vlasov, Boris Dergachev, Aleksey Alekseevich, Georgy Moshkin, Vladimir
Rutsky, Roman Fedulov, Denis Shadrin, Anton Panfyorov, os80, Ivan Zubkov, Konstantin Arkhipenko,
Alexander Asiryan, Dmitry S. Guskov, Toigildin Vladislav, Masutacu, D.A.X., Kaganov Vladislav,
Anastasia
Nazarova, Gena Ivan Evgenievich, Linara Adylova, Alexander, izin, Nikolay Podonin, Julia
Korukhova, Evgeniya Kuzmenkova, Sergey "GDM" Ivanov, Andrey Shestimerov, vap, Tatyana
Gratsianova, Yuri Menshov, nvasil, V. Krasnykh, Ogryzkov Stanislav Anatolievich, Buzov Denis
Nikolaevich, capgelka, Volkovich Maxim Sergeevich, Vladimir Ermolenko, Goryachaya Ilona
Vladimirovna, Polyakova Irina Nikolaevna, Anton Khvan, Ivan K., Aleksey Salnikov, Aleksey
Shcheslavsky, Roman Zolotarev, Konstantin Glazkov, Sergey Cherevkov, Andrey Litvinov,
Shubin M.V., Alexey Syschenko, Nikolay Kurto, Dmitry Kovrigin
Anatolievich, Andrey Kabanets, Yuri Skursky, Dmitry Belyaev, Baranov
Vitaly, Sergey Novikov, maxon86, mishamm, Spiridonov Sergey
Vyacheslavovich, Sergey Cherevkov, Kirill Filatov, Chaplygin Andrey, Victor
Nikolayevich Ostroukhov, Nikolay Bogdanov, Baev Alen, Ploskov Alexander,
Sergey Matveev a.k.a. stargrave, Ilya, aykar, Oleg Bartunov, micky_madfree,
Alexey Kurochkin aka kaa37, Nikolay Smolin, I, JDZab, Kravchik Roman,
Dmitry Machnev, bergentroll, Ivan A. Frolov, Alexander Chashchin, Muslimov Ya,
Sedar, Maxim Sadovnikov, Yakovlev S.D., Rustam Kadyrov, Nabiev Marat,
Pokrovsky Dmitry Evgenievich, Zavorin Alexander, Pavlochev Sergey
Yuryevich, Rustam Yusupov, Noko Anna, Andrey Voronov, Lisitsa Vladimir,
Alexey Kovura, Chaika Leonid Nikolaevich, Koroban Dmitry, Alexey
Veresov, suhorez, Olga Sergeyevna Tsaun, Sergey Boborykin,
Olokhtonov Vladimir, Alexander Smirnitsky, Maxim Klochkov, Anisimov
Sergey, Vadim Vadimovich Chemodurov, rumiantcev, babera, Artyom Korotchenko,
Evgeny Shevkunov, Alexander Smirnitsky, Artyom Shutov, Zaseev Zaurbek
Slobodnyuk, Yan Zaripov, Vitaly Bodrenkov, Alexander Sergienko,
Denis Kuzakov, Fluffy Bumblebee, Sergey Spivak, suuuumcaa, Gagarin, Valery Gainullin, Alexander
Makhayev (mankms), VD, A.B. Likhachev, Col_Kurtz, Dmitry Sergeevich H., Anatoly Kamchatnov,
Evgeny Tabakaev, Alexander Troshenkov, Andrey Malyuga, Andrey Sorokin, Ivan Burkin, Alexander
Logunov, moya_bbf3, Vilnur_Sh, Alexander Kipnis, Oleg G. Geier, Vladimir Isaev (fBSD),
Filimonov Sergey Vladimirovich, vsudakou, AniMath, Danilov Evgeny, Vorobiev V. S., mochalov,
Kamchatka LUG, Sergey Loginov, Artem Chistyakov, A&A Sulimovs, Denis Denisov, Andrey Sutupov,
kuddai, Aleksey Ozeritsky, alexz, Vladimir Tsoi, Vladimir Berdovshchikov, Sergey Dmitrichenko, Danil
Ivantsov, D.A. Zamyslov, Vladimir Khalamin, Maxim Karasev (begs), ErraticLunatic,
A. E. Artemiev, FriendlyElk, Alexey Spasyuk, Konstantin Andrievsky
Andreyevich, Vladislav Vyacheslavovich Sukachev, Artyom Abramov, maxon86, Sokolov
Pavel Andreyevich, Alexey N, Nikita Gulyaev, Evgeny Bodiul, rebus_x

The experience of this project has made me rethink my attitude to reality in many
ways, and in some respects even believe in humanity. I can hardly think of another equally
convincing proof that my work is in demand and that the time I spend on my books is not
wasted. But the main conclusion from the success of our project with you, dear donors,
is that we can really do without copyright parasites and without the institution of so-called
"copyright" (in fact, purely publishing) law in general. The creators of free
software showed this in their field long ago; in the field of fiction this fact is also
practically obvious, as evidenced by the multitude of "samizdat" sites on the Internet and
the abundance of amateur translations of foreign "art". The book you are holding in your
hands is another very clear nail in the coffin of the traditional (i.e. copyright-based) publishing
and media business built on information violence, and a very serious step towards
building a free information society. I would like to congratulate you and myself once
again on this very convincing, albeit small, victory.

Preface two, methodological


This preface is addressed, oddly enough, not to those whom I consider to be the main
audience of my book, i.e. not to those who have decided to learn programming; I have
another preface for them, the parting words. The methodological preface is
intended rather for those who already know how to program, as well as for my fellow
teachers; here I will try to explain my approach and, at the same time, the reasons for the
appearance of this book.

Can you learn to be a programmer


The situation with the training of new programmers looks quite absurd at first glance. On
the one hand, the programmer is one of the most in-demand, highly paid and at the same
time scarce professions: the hunger for personnel in this sphere does not go away even during the
most severe crises. The salaries of qualified programmers are comparable to the salaries
of top managers of medium-sized and sometimes large companies, and even at such a salary
a candidate can take a long time to find. On the other hand, programming is not actually
taught anywhere. The majority of university teachers who teach
"programming" disciplines have never been programmers themselves and have only a very
approximate idea of this kind of activity; this is understandable: nowadays, most of
those who can program for money earn that money by programming. In
a few "top" universities there are former and sometimes even current programmers among
teachers, but this does not save the situation in general. People who can both program and
teach at the same time are quite rare, but even among them few people are able to
adequately imagine the general methodical picture of a new programmer's formation;
judging by the results observed at the output, if there are such people, they cannot translate
their vision of programmer education into a concrete set of disciplines considered in the
university, because the resistance of the environment is too great.
There is another problem with studying at universities. Applicants enter
"programming" specialties having in most cases a very rough idea of what they are going
to do. Programming is not an activity that can be taught to just anyone; it requires very specific
abilities and aptitudes. By and large, all programmers are something of perverts, because they
manage to enjoy work that any normal person would run away from. But it is absolutely
unrealistic to recognize a future programmer at university entrance exams or even at an
interview (which nobody conducts anywhere anyway), especially if we take into account that at
school programming is either not studied at all or is studied in such a way that it would
be better not studied at all. Whether a person will make a programmer becomes clear closer to the
second year, but in the existing conditions (unlike most Western universities) changing
the chosen specialty is possible in theory, but in practice it is too difficult for mass
application; most students prefer to finish their studies in the specialty they originally
entered, despite the obvious error in its choice. As a result, even among the students of
MSU VMK, where the author of the book has the honor to teach, there are at best one
third of future programmers, and ten percent of future good programmers.
Moreover, there are reasons to suppose that it is fundamentally impossible to produce a
programmer within the framework of a higher education institution: a craft is not
passed on within the walls of educational institutions; a craft is passed on only in a
workshop - from a working master to an apprentice, directly in the process of work -
and this is true not only of programming. For all that, programmers do appear from
somewhere, and the conclusion, disappointing as it may be, is beyond doubt: a person
can become a programmer only and exclusively as a result of self-training. Note that
this is confirmed both by my personal experience and by the experience of other
programmers: to my question of whether they learned programming on their own or were
taught it at a university, no one has ever given the second answer.

Self-learning isn't easy either


I was lucky enough to start programming in earnest at the very beginning of the
nineties - exactly when this profession suddenly ceased to be the domain of a narrow
circle of unknown people and became a mass phenomenon. But in those times the world
was organized a bit differently. The dominant platform (if that term can be applied to the
realities of that time at all) was the MS-DOS system and its numerous clones,
and the typical appearance of the computer screen was formed by blue panels of Norton
Commander. It was not difficult to write a program for MS-DOS, there were plenty of
means for that, and all this resulted in a unique flowering of amateur programming. Many
of those amateurs later became professionals.
Today's environment is qualitatively different from the era of the early nineties. All
prevailing platforms emphasize the graphical user interface; creating a program with a
GUI requires understanding the principles of event-driven application construction, the
ability to think in terms of objects and messages, that is, simply put, to make a program
equipped with a graphical user interface, you must already be a programmer, so the
options "tried it and liked it" or "tried it and it worked" are cut off for purely technical reasons. Moreover,
starting to learn programming by drawing windows in most cases means irreversibly
traumatizing one's own thinking; such trauma completely excludes achieving high
qualification in the future. Even the seemingly harmless fact that every program is built
according to an event-oriented template distorts thinking; some beginners manage to
completely miss the fact of the main loop's existence while writing event handlers, and
the perception of the program ceases to be whole.
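For the record, the main loop that beginners manage to overlook is nothing mysterious; here is a minimal self-contained sketch in C (an illustration only, with a hypothetical event type and handler, and keyboard input standing in for a real event source):

    #include <stdio.h>

    /* hypothetical event type and source, just to make the loop visible */
    typedef enum { EV_KEY, EV_QUIT } EventType;
    typedef struct { EventType type; int key; } Event;

    static Event get_next_event(void)
    {
        int c = getchar();          /* stand-in for "wait until something happens" */
        Event ev;
        ev.type = (c == EOF || c == 'q') ? EV_QUIT : EV_KEY;
        ev.key = c;
        return ev;
    }

    static void on_key(int key)     /* an "event handler" */
    {
        printf("key event: %c\n", key);
    }

    int main(void)
    {
        for (;;) {                  /* the main loop all the handlers hang off of */
            Event ev = get_next_event();
            if (ev.type == EV_QUIT)
                break;
            on_key(ev.key);
        }
        return 0;
    }

A real GUI toolkit hides exactly this loop inside itself; the handlers the beginner writes are called from it, whether he realizes it or not.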

The only refuge left for amateur programmers suddenly turned out to be web
development. Unfortunately, those who start in this field usually stay in it. The
difference between the scripts that make up websites and serious programs can be compared,
perhaps, to the difference between a moped and a dump truck; moreover, having got
used to the "forgiving" style of scripting languages like PHP, most neophytes are
fundamentally unable to switch to programming in strict languages - even in some Java,
not to mention C, while the intricacies of C++ remain beyond the horizon of understanding for
such people. Web coders, as a rule, call themselves programmers and often even earn good
money without ever realizing what real programming is and what they have lost for
themselves.

There's a way out, or "Why Unix"


Whenever the situation starts to seem hopeless, it makes sense to look for a way out
where it hasn't been looked for yet. In this particular case, the way out is immediately
apparent, if you just take a step away from the office-home computer mainstream of our
days. Operating systems of the Unix family throughout the history of the Internet firmly
and unshakably held the sector of server systems; since the mid-1990s Unix-systems have
penetrated the computers of end users, and today their share on desktop computers and
laptops is such that it is no longer possible to ignore it. This situation becomes especially
interesting when you consider that MacOS X, which is used on luxury macbooks, is
nothing but Unix: it is based on the Darwin system, which belongs to the BSD family.
In spite of the presence of graphical interfaces in Unix-like systems, which often surpass
their mainstream counterparts, the command line has always
been and remains the main tool of a professional user of these systems, simply because,
for a person who knows how to handle it, a properly organized command line is much
more convenient than "menu-and-icon" interfaces. The possibilities of a graphical interface
are limited by the imagination of its developer, while the possibilities of the command
line (if properly organized, of course) are limited only by the capabilities of the
computer; work in the command line is faster, sometimes dozens of times faster; finally, the
hands, freed from the need to constantly reach for the mouse, tire much less, and the right
shoulder joint and wrist stop hurting. By the way, it is more natural for people to express
their thoughts (in this case, wishes) with words rather than gestures. In Unix family
systems, the command line is so competently organized that its dominant position as the
main interface is not threatened. In the context of our problem, the important fact is that
it is much easier to write a program designed to work in the command line than a
GUI program; the command line as the main tool in Unix systems makes possible the
very amateur programming that seemed irretrievably lost when the "good old" MS-DOS
was replaced by Windows systems in the mainstream.
If you look closely, Unix family systems are not only suitable, but also (in modern
conditions) the only possible option when it comes to learning programming. I will allow
myself to emphasize four reasons here.

Reason one is mathematical


Any computer program is, as we know, a record of some algorithm in a chosen
programming language. Nobody really knows what an algorithm is and, interestingly
enough, nobody can know, otherwise we would have to throw the whole theory of
computability together with the theory of algorithms to the dump, forget the Church-
Turing thesis and abandon the theoretical component of computer science altogether.
Nevertheless, it is commonly accepted that every algorithm performs a transformation
from the set of words (chains of symbols) over some alphabet into that same set. Of
course, not every such transformation can be performed by an algorithm: as is
easy to show, there is a continuum of such transformations, while there are no more
than countably many algorithms. Moreover, an algorithm is not itself such a transformation,
because we can sometimes speak of equivalent algorithms, i.e. algorithms that always "make"
the same output word from the same input word; in other words, there can be
more than one algorithm for one and the same transformation (more precisely, to each
transformation there corresponds either no algorithm at all or a whole set of them). Nevertheless,
every algorithm performs exactly such a transformation, and, generally speaking, that alone
is what makes it interesting. If you like, an algorithm is a thing that takes some input word
("reads" it), does something constructive and produces another word; the indefiniteness of the
concept here lies in the word "constructive", which, of course, cannot be defined either.
Many programs in Unix systems work exactly like this: they read data from the standard input
stream and write the results to the standard output stream. Such programs even have their own
name - filters. Thanks to the advanced command-line tools, such filter programs can be
combined on the fly to solve a wide variety of text transformation problems. Since text
representation is virtually universal, the algebra of console programs proves to be an
unexpectedly powerful tool. Each new console program, however simple it may be,
becomes a part of this system, making the system itself a little more powerful and the
range of tasks to be solved a little wider. At the same time, filter programs fully
correspond to the understanding of an algorithm as a transformation from an input word
to an output word.
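To make this concrete, here is a minimal sketch of such a filter in C (an illustration only, not a program from the main text): it reads lines from standard input, strips trailing spaces and tabs, and writes the result to standard output, so it can be placed anywhere in a pipeline.

    #include <stdio.h>
    #include <string.h>

    /* a classic Unix filter: standard input in, standard output out */
    int main(void)
    {
        char line[4096];
        while (fgets(line, sizeof line, stdin) != NULL) {
            size_t len = strlen(line);
            int had_newline = (len > 0 && line[len - 1] == '\n');
            if (had_newline)
                line[--len] = '\0';
            /* remove trailing spaces and tabs */
            while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
                line[--len] = '\0';
            fputs(line, stdout);
            if (had_newline)
                putchar('\n');
        }
        return 0;
    }

Twenty lines, yet the result is a full citizen of the system: it can be combined with any other filter through a pipe.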
But even this is not the main thing in learning. The developed culture of console
applications gives an opportunity to a novice programmer to write a real program instead
of a toy sketch. Why this is so important will help us understand

The second reason is psychological


Programming is ultimately nothing more than a craft, and a craft cannot be taught - it
can only be learned. Before a beginner becomes a programmer, he has to take a number
of very important steps. The first step is the transition from textbook tasks to tasks set by
oneself - not tasks made up by someone else, not tasks solved "because you have
to", but tasks taken on because this particular person found it interesting to subdue
the computer and make it solve this particular problem.
The second step is to move from sketches to a real solution to a real problem, however
simple, but real. It can be a calendar or a notebook, a reminder of friends' birthdays, some
simple text converter (even one for removing extra spaces), anything. For me, at one
time, such a program was a "cracker" for the game "F-19" that corrected a byte in the
pilot roster file, "reviving" those who were marked as dead. Finding the right byte was much
harder than writing the Pascal program that wrote zeros into the right
positions of the file, but nevertheless this primitive, one-screen program allowed me to
jump, as they say, to the next level - something your humble servant realized only ten years
later.
The future programmer notices the third step only when this step has long been taken. The
new quality this time consists in the fact that some very primitive program written by you
acquires a third-party user. Of course, there is no money in it yet: we are talking only
about the fact that you managed to write not just a useful program, but one whose
usefulness was appreciated (in deed, not in words) by someone other than yourself. In
other words, there is someone who agrees to spend time using your program
because the results he gets are more valuable to him than the time he spends. In many
cases this program turns out to be some simple toy, more rarely something more serious -
some simple cataloger or the like. Interestingly, when you write the program
you cannot know in advance that someone will use it, so you usually do not notice
your own transition to the next level. And only when you suddenly find out that someone
has actually started using your handicraft - not because you begged him to, but
voluntarily - at that moment you can congratulate yourself from the bottom of your heart:
you have become a programmer.
Of course, there will be a fourth milestone - getting paid for writing programs. In most
cases, you can consider yourself a professional only after that. However, the difference
between a professional and an amateur is not so significant as between a programmer and
a non-programmer. After all, history knows examples when by the time of the first
monetary effect from his programming activity a person had such a high qualification that
nobody would dare to doubt his professionalism; take Linus Torvalds for example.
So, by teaching a person to program on systems of the Windows family, we deprive
him of the opportunity to take all three of these steps. A real Windows program - which,
of course, can only mean a window application - can be written when you
are already a programmer, and not before; text programs, which nobody even explains to
students how to run properly (yes, hello, readln at the end of every program), do not
look real from any angle, and therefore such a program will never have
a third-party user, and the author himself will not use this incomprehensible thing.
Moreover, if the result is so miserable and there is no chance to make it look like
something real, it is unlikely that our trainee will be interested in the prospect of spending
several hours of the rest of his life to make such a worthless solution to a very interesting
problem.
That is why it is vital for the learner not just to program under Unix, but to live under
Unix, i.e. to use Unix (be it Linux or any other Unix system) for everyday work, for
surfing the Internet, for communicating via e-mail and various messengers, for working
on texts, for watching movies and photos - in general, for everything computers
are usually used for. Only then will our learner, at a more or less early stage of development,
run into an everyday need for which it makes sense to write a
program.

The third reason is ergonomic


No matter what tricks the creators of graphical user interfaces resort to, they will never
surpass or overtake the good old command line in terms of usability and efficiency. The
only people who disagree with this statement are those who don't know how to use the
command line, and those who don't know how to use the command line are usually those
who have never seriously tried it.
If you can't imagine your life without a GUI, you will never understand what working
with a computer should really look like. In terms of usability, most modern programs are ugly
monsters, fighting whose flaws takes nine-tenths of the user's effort, and users
manage to ignore this situation only because they simply don't know that things can be any
other way. That's why it's very important to master the command line (even if you don't plan to
stop using the GUI - that will happen by itself), and you should do it as early as possible, before
your brain loses its ability to learn quickly and adapt instantly to unfamiliar conditions:
after the age of 25, learning something fundamentally new becomes so difficult that it
requires a qualitatively higher level of motivation.
But really full-fledged command line tools are not available outside the Unix family,
sorry. That's just the way it is.

The fourth reason is pedagogical


If there is a future programmer among the students trying to master programming under
inhuman Windows conditions, the terrible pseudo- and non-programs they are forced
to write - most often in some hopelessly dead environment like Turbo Pascal, or even in an
environment designed specifically for teaching that has never been
"alive" at all - will very quickly cease to satisfy such a person, and he will want something real.
A teacher will hardly explain to an advanced student how to write window programs for
Windows (most teachers cannot do it themselves), but such trifles will not stop a future
programmer: armed with the first book he comes across, he will learn to draw
windows by himself.
In some (alas, quite rare) cases even this will not spoil him, and in a few years such
a student, who is in fact a born programmer, will become a competent and mature
specialist whom employers will fight over. People of this class - people who cannot
be turned off the right path by any teacher or minister of education - really do exist; moreover,
the whole modern industry rests on these rare specimens. The problem is that such people
are very, very, very few.
Much more often we observe a completely different picture: by starting with drawing
windows, a beginner irreversibly traumatizes his own thinking, turning the world upside
down: he starts thinking about a program not from the subject area but from the
elements of the graphical interface, and they (or rather, their event handlers) become a
kind of skeleton of his every program, on which the "meat" of functionality is hung.
The mere prospect of an action such as, for example, changing the widget library in use
terrifies such a programmer, which outwardly shows itself in the phrase "what do you mean, that
is absolutely impossible, the program would have to be rewritten from scratch"; he is afraid
even to think that the user interface could be made replaceable.
Such people often resort to an absolutely fantastic technique, which more competent
programmers jokingly call "drawing on the back side of the screen": when there are not
enough event handlers, an invisible (!) graphical object is created in a dialog window,
through which the other objects exchange information.
In the present conditions such a programmer will be quite in demand, moreover, he
will even be paid a good salary; but he will never realize what he has really lost and how
much more interesting his work could have been if he had not grabbed the notorious
windows in his time.
Alas, most teachers and professors continue to use Windows computers in the
classroom with a tenacity worthy of a better cause; in fact, they usually teach students to
program under MS-DOS, the same MS-DOS that is now, a quarter of a century after its
final death, embarrassing even to think about. Under such conditions, any arguments in favor
of keeping Windows on educational computers obviously turn out to be mere
excuses, and the real reason is always the same: an irrational fear of mastering anything new.
Windows is not suitable as a teaching tool; to continue using it when there are better (and
better in every way: Windows has absolutely no advantages over Unix systems)
alternatives is, to put it mildly, bizarre. Learning an unfamiliar operating environment should
not be a problem for someone who teaches others how to program.
Language defines thinking
In addition to the operating environment used for teaching, the choice of
programming languages is also crucial. The days of BASIC with numbered lines are,
fortunately, over; but the opposite extreme is often encountered (especially in specialized
schools). Failed programmers who decided to try themselves as school teachers "teach"
innocent schoolchildren "professional" languages such as Java, C# and even C++. Of
course, a fifth-grader who gets C++ shoved into his brain (a real case in a real educational
institution) will understand absolutely nothing as a result, apart from memorizing the
"magic words" cin and cout (which has a very doubtful relation to real C++
programming), but such "genius teachers" do not care about the abilities of their
audience at all, especially since the methods of assessment are, of course, chosen in such a
way that the students pass the tests and other "obstacles" without any problems - and without
understanding anything of the material given to them. I have met schoolchildren who
don't understand what a loop is, but at the same time get A's in computer science at the
schools where they are "taught C++".
Nothing seems to bother teachers of this category: C++ has the STL library, so they
have to tell their students about STL; of course, they never get further than vector
and list (these two containers are probably the most useless in all of STL), but
the most interesting thing is that the students, naturally, don't understand what they are
talking about. Indeed, how can you explain the difference between vector and list
to a person who has never seen either dynamic arrays or linked lists in his life and does not
understand what a pointer is at all? For such a student, list differs from vector in
that it lacks a convenient indexing operation (why isn't it there? well, they
explained something to us, but I didn't understand any of it), so you should always use
vector, because it is much more convenient. What? Adding to the beginning or the
middle? vector has that too, so what's the problem? Yes, we were told that it's
"inefficient", but it works! It's almost impossible to retrain such a student: trying to make
him create a singly-linked list by hand is doomed to failure, because there is list,
and all this is so much unnecessary pain! In fact, that's the end of it: if our student was given
STL before he mastered dynamic data structures, he will never learn them; the way
to serious programming is thus closed to him.
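For reference, the kind of hand-made singly-linked list meant here is nothing exotic; a minimal sketch in C (an illustration of mine, not an exercise from the book) could look like this:

    #include <stdio.h>
    #include <stdlib.h>

    /* a node of a singly-linked list holding one integer */
    struct node {
        int value;
        struct node *next;
    };

    /* prepend a value; the head pointer is passed by address so it can change */
    static void push_front(struct node **head, int value)
    {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            exit(1);
        n->value = value;
        n->next = *head;
        *head = n;
    }

    int main(void)
    {
        struct node *head = NULL;
        for (int i = 1; i <= 5; i++)
            push_front(&head, i);
        for (struct node *p = head; p != NULL; p = p->next)
            printf("%d ", p->value);        /* prints: 5 4 3 2 1 */
        putchar('\n');
        while (head != NULL) {              /* release the nodes */
            struct node *tmp = head->next;
            free(head);
            head = tmp;
        }
        return 0;
    }

A student who has never built such a structure himself simply has nothing to compare list with.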
Another variant is no less common: the teachers seemingly try to teach pure C (the one
without the pluses), i.e. they say nothing about classes, containers, STL (which is generally
correct) or references, yet cin/cout, the bool type (which never existed in pure C),
line comments and other C++ tricks pop up out of nowhere. The explanation is quite
simple: thanks to Microsoft's efforts with their Visual Studio, the difference between pure
C and C++ is sometimes completely lost in the minds of Windows programmers,
especially beginners. In the Unix world things are much better: there, at least, nobody confuses
these two completely different languages; but, as already mentioned, Unix is hard to find
in our schools - teachers prefer to pay state money for commercial software, to fight viruses
by reinstalling all the computers every week (again, a real situation in a real school), and
to mangle students' brains, just to avoid learning anything outside the mainstream.
However, even pure C as a first programming language is frankly nonsense. There is
a rather simple technical reason for this: one should approach the study of C already
understanding pointers and knowing how to handle them; otherwise, at the very first
lesson, in order to read something from the keyboard using scanf, you will need the
address-of operation, and you will not be able to explain to a beginner what this beast is
- that is a fundamentally impossible task, and no incantations will help. Working with
pointers seems simple only to those who have long known how to handle them;
for most beginners, pointers are a huge and hard-to-overcome barrier. Attempts to
teach the C language to people who cannot handle pointers and addresses
are akin to the well-known principle of education through selection: whoever swims out
is good, whoever drowns is no loss. People tend to underestimate this problem on the
principle of "we'll get through somehow", but such a stunt always ends in one of
two ways: either the students don't understand anything and end up perceiving the
program as a kind of incantation (and the teacher simply ignores this fact for lack of
anything better, satisfied that the students somehow manage to perform
simple tasks); or the teacher "forgets" about pure C and starts using cin/cout from
C++ for input and output. It is not clear which is worse: in the first case we
get trained monkeys instead of programmers, who do not even try to think about what is
going on (and, what is quite bad, are sure that this is how programming is done), while in
the second case the students do not realize that there is such a language as C and that it is
not the same as C++.
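The pointer hurdle shows up literally in the first C program that reads anything; a minimal sketch (an illustration only, using nothing beyond the standard library):

    #include <stdio.h>

    int main(void)
    {
        int n;
        /* even the very first input requires the address-of operator:
           scanf must know where to put the value, i.e. it needs a pointer */
        if (scanf("%d", &n) == 1)
            printf("twice %d is %d\n", n, 2 * n);
        return 0;
    }

That lone ampersand is precisely the "beast" that cannot be honestly explained to someone who has never seen a pointer.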
There is a second reason why C is unacceptable as a first language to learn. This
reason is not so easy to explain, although it is actually much more important than the
pointer problem. For example, I personally have always argued that C as a first language
irreversibly traumatizes thinking, but I have to admit that I was only able to articulate the
reasons for this two or three years ago,
when the first volumes of the first edition of this book had already been written and
published.
In C, as we know, there are no procedures, only functions, and all the "standard" ways
of changing the value of a variable, from assignment to increments and decrements, are
operations, which can themselves be included in more complex expressions.
Under such conditions, purely formally, any action turns into a side effect, and since C is
mostly an imperative language, i.e. program execution in it consists of actions, it turns
out that the execution of a C program consists (entirely!) of side effects. It is not surprising that
in such conditions people simply forget the meaning of the word "side-effect". Modern
programmers mostly either do not remember the term "side effect" at all or are sure that
it is any modifying (in terms of functional programming - "destroying") action performed
in any program in any language.
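To see just how literally "everything is a side effect" in C, consider a small sketch (an illustration of mine):

    #include <stdio.h>

    int main(void)
    {
        int a, b, c;
        /* in C an assignment is an expression that yields a value, so it can
           be nested inside a larger expression; computing the value of the
           whole expression changes a and b - formally, as side effects */
        c = (a = 2) + (b = 3);
        printf("a=%d b=%d c=%d\n", a, b, c);   /* a=2 b=3 c=5 */
        return 0;
    }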
In fact, this belief is categorically wrong; in Pascal programs, for example, side effects are
usually almost non-existent: they can only be created deliberately, by writing a function
that does something besides computing its result, and such functions are frowned upon
there, because for isolating arbitrary sets of actions there are procedures; assignment is an
original sense, a side effect can only occur when evaluating an expression, and this is
very important: a side effect is an arbitrary change that occurs when evaluating an
expression and can later be detected in some way. If the only thing that happens when an
expression is evaluated is that its result (value) is obtained, the expression (or rather, its
evaluation) is said to have no side effects. When a certain action is specified by an
operator or other language construct that has nothing to do with the computation of
expressions, we cannot speak of any side effects at all. The confusion here is largely due
to the prevalence of C and C++ languages - for them, as well as for functional
programming languages, any modifying (if you will, "destroying") action is indeed a side
effect.
These (seemingly purely terminological) mishaps should not be underestimated. Let us
recall an example of "true C code" known to every C programmer - the textbook string
copying:

    while((*dest++ = *src++));

On the PDP-11, where C originally appeared, this construction was translated into a
single machine instruction, which explains and even, in a certain sense, justifies the
appearance of such a rebus; but the PDP-11 is long since history, and in general the
suggestion to write C in such a way that particular machine instructions come out is
somewhat questionable - would it not be easier to write in assembly language right away?
However, explaining to C fans why one should not write this way often turns out to be
harder than one might expect. The correct answer to the question "why" will, of course,
rest on the undesirability of side effects, but our opponent may claim that all these side
effects will not go anywhere anyway, that there will be exactly the same number of them
no matter how we do this string copying - and in C terms he will be (from a purely
formal point of view) absolutely right (in fact, one of the three side effects can be
avoided, but that is no longer so essential and may also, though not necessarily, lead to a
loss of efficiency). You can try to explain that side effects supposedly "come in many
forms", and it is indeed true - among all side effects one can single out those that are
side effects only because of the structure of the C language and would not be side effects
in other languages... but such arguments will have no effect on a person who "thinks
in C". I'll take the liberty of saying that the well-known "C-ishness of the brain" consists
precisely in taking side effects for granted and in perceiving the term "side effect" itself
as having no negative connotation.
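For contrast, here is one way (a sketch of mine, not the author's canonical version) to write the same copying so that evaluating the loop condition merely tests a value, while every change is an explicit statement:

    /* copy the string src into the buffer dest, including the terminating
       zero; no expression here is evaluated for the sake of a side effect */
    void string_copy(char *dest, const char *src)
    {
        while (*src != '\0') {
            *dest = *src;
            dest = dest + 1;
            src = src + 1;
        }
        *dest = '\0';
    }

The compiled code is, with any modern optimizer, essentially the same; what differs is the habit of mind the two versions cultivate.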
Despite all of the above, it is necessary to learn C and low-level programming. A
programmer who does not know C is unlikely to be taken seriously by sensible employers,
even if the candidate is not required to write in C, and there are reasons for that. A person
who does not feel at the subconscious level how exactly a computer does this or that
simply cannot write high-quality programs, no matter how high-level programming
languages he or she uses. The basics of interacting with the operating system are also
best learned in C; nothing else gives the full feel of it.
The necessity of learning C, once postulated, brings us back to the problem of pointers.
They must be mastered before one starts learning C, and for this purpose we need
some language in which (a) there are pointers, full-scale ones, without any garbage collection;
(b) one can do without pointers until the learner is more or less ready to grasp them; and (c)
by starting to use pointers the learner expands his possibilities, i.e. there is a real
need for pointers. Note that without the last requirement one could use C++ in conjunction
with STL, but this requirement cannot be thrown away: we have already discussed above
what happens to a beginner who is given containers before low-level data structures.
It turns out that only Pascal satisfies all three points simultaneously; this language allows you to
approach pointers smoothly and from afar, neither using nor introducing them until the
level of the learner is sufficient; at the same time, once they are introduced, pointers
in Pascal exhibit almost all the properties of "real" pointers, except for address arithmetic.
A search for another language with similar pointer-teaching capabilities has proved
fruitless; there seems to be no alternative to Pascal.
On the other hand, if we treat learning Pascal as a preparatory stage before C, we
can leave out some of its features, such as set types, the with statement and nested
subroutines, to save time. It should be remembered that the goal of learning here is
not "the Pascal language" but programming. There is absolutely no point in insisting on
drilling the student in formal syntax, operator precedence tables and other such nonsense:
the desired output is not knowledge of Pascal, which the student may never need again, but the
ability to write programs. The highest barriers on the student's way here are, firstly, those
same pointers and, secondly, recursion, which can also be learned using Pascal as the
example. Note that the CRT module, fondly loved by our teachers (so much so that the
sacramental "uses crt;" can often be seen, even in textbooks, in programs that use none of
its features) works fine in Free Pascal under Linux and *BSD,
allowing you to create full-screen terminal programs; in C this is much harder to do -
even a professional usually needs a few days to more or less figure out the ncurses
library.
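To give an idea of the scale: even the smallest full-screen program in C already requires explicit screen management. A minimal sketch using ncurses (an illustration only; typically built with something like cc prog.c -lcurses):

    #include <curses.h>

    int main(void)
    {
        initscr();                  /* switch the terminal to full-screen mode */
        mvprintw(10, 20, "Hello from ncurses");
        refresh();                  /* actually update the screen */
        getch();                    /* wait for a key press */
        endwin();                   /* restore the terminal */
        return 0;
    }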
Using Pascal also removes the problem of side effects. Assignment here is an
operator, not an operation; there is a clear and unambiguous distinction between functions
and procedures, and a separate procedure-call operator, so a Pascal program can be
written without a single side effect. Unfortunately, existing implementations destroy this
aspect of conceptual purity by allowing functions to be called for the sake of a side effect
(this was forbidden in Wirth's original Pascal) and by introducing a number of library
functions that have side effects; but if one understands which direction to take, these
shortcomings are easily circumvented.
Another "inevitability" is assembly language programming. Here we have something
quite reminiscent of the well-known mutually exclusive paragraphs. On the one hand, it
is better never to write anything in assembly language at all, except for short fragments
in operating system kernels (for example, entry points to interrupt handlers and all kinds
of virtual memory management) and in microcontroller firmware. Everything else is better
written in that same C: run-time efficiency does not suffer at all, and in some
cases even improves thanks to optimization, while the savings in labor
can reach dozens of times. Most programmers will never encounter a single "assembly"
task in their entire lives. On the other hand, assembly language experience is absolutely
necessary for a skilled programmer; in the absence of such experience, people do not
understand what they are doing. Since assembly languages are almost never used in
practice, the only chance to get some experience is during the learning period, and
therefore it is clear that there is no way we can neglect assembly language.
Learning assembly language can also demonstrate what the operating system kernel
is, why it is needed, and how to interact with it; a system call no longer seems magical
when you have had to perform it manually at the level of machine instructions. Since the goal
here, again, is not to learn a specific assembly language, or even assembly programming as
such, but only to understand how the world works, there is certainly no need to provide
the student with ready-made libraries that will do all the work for him - in particular,
translate a number into its textual representation. On the contrary, by writing a simple
assembly language program that reads two numbers from the standard input stream,
multiplies them and outputs the resulting product, the student will understand and feel
much more than if he were offered to write something large and complex in the
same assembly language with the conversion between text and number done
for him by some macro library. Here one should also see how subroutines with local
variables and recursion are organized (no, not on the hackneyed, contrived example
of the factorial everyone is bored with, but rather on, say, matching a string
against a pattern or something similar), how a stack frame is
built, and what calling conventions exist.
If you have to learn assembly language programming anyway, it is logical to do it
before learning C, because it helps you to understand why C is the way it is: this, to put
it mildly, strange language becomes not so strange if you consider it as a substitute for
assembly language. Address arithmetic, assignment as an operation, separate increment
and decrement operations, and much more - all this is easier not only to understand but
also to accept if you already know by now what programming looks like at the level of
CPU instructions. On the other hand, the idea to start learning programming from
assembly language is not even worth discussing, it is an obvious absurdity.
With this in mind, there is a rather unambiguous chain of languages for initial training:
Pascal, assembly language, C. You can add something to this chain at any place, but it
seems that you can neither remove elements from it nor rearrange them.
Knowing C, we can return to studying the phenomenon called the operating system
and its capabilities from the point of view of a programmer creating user programs. Our
student already understands what a system call is, so we can tell him which system calls
there are, using the level of terminology characteristic of this subject area - namely,
describing system calls in terms of C functions. File I/O, Unix process control (which, by the way,
is organized in a much simpler and clearer way than in other systems), ways of process
interaction - all these are not only concepts demonstrating the structure of the world, but
also new opportunities for the student to develop his own ideas leading to independent
developments. Mastering sockets and the unexpected discovery of how easy it is to write
programs that communicate with each other through a computer network gives students
a great deal of enthusiasm.
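As an illustration of that simplicity, here is a minimal sketch (mine, not an example from the book) of Unix process control in C: the parent creates a child and waits for its termination.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();         /* one process becomes two */
        if (pid == -1) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {             /* this branch runs in the child */
            printf("child: my pid is %d\n", (int)getpid());
            return 42;
        }
        int status;                 /* this branch runs in the parent */
        waitpid(pid, &status, 0);
        if (WIFEXITED(status))
            printf("parent: child exited with code %d\n", WEXITSTATUS(status));
        return 0;
    }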
At some point in the course it is worth covering shared data and multithreaded
programming, emphasizing that it is better not to work with threads even if you know how
to do it; in other words, you should know how to work with threads, if only to make a
conscious decision not to use them. At the same time, any qualified programmer needs to
understand why mutexes and semaphores are needed, where the need for mutual exclusion
comes from, what a critical section is, etc., otherwise, for example, when reading a
description of the Linux or FreeBSD kernel architecture, a person will simply not
understand what we are talking about.
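A minimal sketch (mine, using POSIX threads; built with the -pthread flag) of why mutual exclusion is needed: two threads increment a shared counter, and the mutex makes each increment a critical section.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);      /* enter the critical section */
            counter++;                      /* without the lock, updates would be lost */
            pthread_mutex_unlock(&lock);    /* leave the critical section */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%ld\n", counter);           /* always 2000000 with the mutex */
        return 0;
    }

Remove the two mutex calls and the printed total will, on almost any run, be less than 2000000 - the very effect mutual exclusion exists to prevent.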
It is curious that this is exactly the traditional sequence of programming courses at the VMK
faculty: in the first semester the course "Algorithms and algorithmic languages" is
supported by practical work in Pascal, in the second semester the lecture course is called
"Computer architecture and assembly language", and the third-semester lectures -
"Operating systems" - imply practical work in C. The fourth semester is a bit more complicated;
the lecture course there is called "Programming Systems" and is built as a rather strange
combination of an introduction to the theory of formal grammars with object-oriented
programming using C++ as the example. I would venture to say that C++ is not a very good
language for a first acquaintance with OOP, and in general this language has no place in the
programs of the core courses: those students who will become professional programmers
can (and do) master C++ on their own, while for those who will work in related or other
specialties, a superficial acquaintance with a narrowly specialized professional tool -
which is what the C++ language undoubtedly is - adds neither general outlook nor
understanding of the world.
The situation with the readers of this book is somewhat different: those who are not
interested in programming as an activity will simply not read it, and those who
originally wanted to become programmers but changed their plans upon closer
acquaintance with this kind of activity will most likely quit reading somewhere in the
second or third part and will not get to the end in any case. At the same time, those who
do finish this book in its entirety - that is, future professionals - may find useful not the
C++ language as such, which can be learned from any of the hundreds of existing
books, but rather that special view of this language which your humble servant always
tries to convey to students at the fourth-semester seminars: a view of C++ not as a
professional tool, but as a unique phenomenon among existing programming languages -
as C++ was before it was hopelessly spoiled by the authors of the standards. That is why C++ is
among the languages described in this book; the peculiarities of my approach to this
language are described in the preface to the corresponding part.
How to ruin a good idea and how to save it
Unfortunately, small things and particulars make the series of programming
courses adopted at VMK hopelessly far from perfect. Thus, in the first-semester
lectures students are for some reason taught the so-called "standard Pascal" - a
monster suitable only for intimidation and not found in nature; at the same time, in the
seminars the same students are forced to program in Turbo Pascal under that same MS-
DOS - on a dead system, in a dead environment, which, on top of that, has nothing to do with
the "standard" Pascal talked about in the lectures. Moreover, the lectures are
built as if the goal were not to learn how to program but to study in detail the Pascal language
as such, and in its obviously dead version at that: a lot of time is spent on the formal description
of the syntax, and it is repeatedly emphasized in what order the declaration sections must
go (any actually existing version of Pascal allows the declarations to be arranged in any
order, and there is certainly nothing good in following the conventions of standard
Pascal on this point).
Things are a little better in the second semester. Until recently, assembly language
was also demonstrated under MS-DOS, i.e. the instruction system of 16-bit Intel
processors (8086 and 80286) was studied. Programming under an obviously dead system
relaxed the students and cultivated a contemptuous attitude towards the subject (and this
attitude was often transferred to the teachers).
In 2010, one of the three streams at VMK started an experiment to introduce a new
program. We should give credit to the authors of the experiment, they eliminated the
rotten corpse of MS-DOS from the educational process, but, unfortunately, along with the
frank deadness, the experimenters threw out the Pascal language, starting the training with
C. This could have been conditionally acceptable if all applicants entering the first year
had at least rudimentary programming experience: for a person who has already seen
pointers, C is not that difficult. Alas, even the Unified State Exam (USE) in computer science
cannot guarantee that entrants have even the most rudimentary programming skills: it is quite
possible to pass it with a positive score without knowing how to program, not to mention
the fact that pointers are not included in the school computer science curriculum (and,
accordingly, not in the USE program either). Most freshmen come to the faculty with absolutely
zero understanding of what programming is and what it looks like; C becomes the
first programming language in their lives, and the output is an outright disaster.
By the way, the author of these lines warned the ideologists of the notorious
experiment about the problem of pointers, but of course nobody listened to him. Now he
can only state with a certain amount of grim satisfaction that everything ended exactly as
it should have ended - the use of input-output via cin/cout in classes on supposedly
"pure C" and the annual output in the form of many dozens of new "programmers" who
do not know what pure C is and do not understand what they are doing.
It is interesting that at some point the lecturers giving the "Computer
Architecture" course on the remaining two streams were, as they say, pinned to the wall by the
demand to modernize the course, but this, to put it bluntly, did not help.
The lecturers proudly announced the transition to a 32-bit architecture, as if the size of a
machine word could change anything; assembly is now studied under Windows, and for
the workshop a monstrous wrapper had to be made, supposedly allowing window
applications to be created in assembly; the fact that executable files built with this wrapper are
four and more megabytes in size is enough to understand what all this actually has to do
with the study of assembly language and its role in the world around us. But the situation
is no better on the first ("experimental") stream, where NASM under Linux is studied:
programs there are written using I/O from the standard C library - and, again, with a
wrapper of its own, hiding, in particular, the program's entry point. Most students who have
mastered this workshop are convinced that a process should end with the ret
command.
With some stretch one can agree that in reality it doesn't make much difference which
assembly language to study - the main thing is to catch the logic of working with registers
and memory areas; but what the authors of both currently taught versions of the course were
guided by is quite impossible to understand. At the end of the course students usually
don't understand what an interrupt is, and almost nobody knows what a stack frame looks
like; it's not quite clear what the whole semester is spent on and what the benefit of this
version of the course is.
Everything gets more or less back to normal only in the second year, where Unix is
used (previously FreeBSD, now, unfortunately, Linux, since the technical services of the
faculty seem to have failed to support FreeBSD) and pure C is studied in this environment,
which is ideally suited for C. However, before that, two whole semesters are spent, to put
it mildly, with questionable efficiency.
The order of the programming disciplines in the junior years adopted at VMK seems to be
potentially the most successful of any encountered - were it not for the above-
mentioned "trifles". The persistent unwillingness of some teachers to give up Windows,
and of others to accept that purely technical training is out of place at a
university, puts the future of the whole concept in doubt. All the steps
taken to "modernize" the courses and workshops in recent years (to be more precise,
during the author's entire time at the faculty) have turned out to be purely
destructive, eroding the fundamental character of programmer training at VMK and
turning it either into barely sensible vocational training or into a meaningless phenomenon
akin to the well-known cargo cult.
The book you are holding in your hands is an attempt of its author to preserve at least
in some form a unique methodological experience, which is in danger of total oblivion.
In conclusion, I feel it is my duty to give my fellow teachers fair warning about one
important thing. If you want to use this book in the classroom, your own primary way of
interacting with computers in your daily life should be (or become?) the command line.
The book is designed for the student to use the command line in preference to graphical
interfaces - that is the only way he has a chance to take the steps listed above toward the
profession. If you yourself copy files and shuffle windows with the mouse in a graphical
environment, you can hardly convince your students that the command line is more
efficient and convenient, because you don't believe it yourself. In that case, this book is
useless to you.
Preface three, parting words
This preface, the last of three, is addressed to those for whom the book was written -
to those who have decided to study programming, that is, one of the most fascinating
kinds of human intellectual activity.
For a long time, the smartest and most skillful people have wanted to create something
that works by itself; before the advent of electricity, this was available only to mechanical
watchmakers. In the 18th century, Pierre Jaquet-Droz [3] created several unique
mechanical dolls, which he called "automatons": one of these dolls plays five different
melodies on an organ, pressing the necessary keys with its fingers - the organ was made
especially for the doll, yet it is genuinely controlled by the keys; another draws quite
complex pictures on paper - any one of three predefined ones. Finally, the last and most
complex doll, the "Writing Boy" or "Calligrapher", writes a phrase on paper, dipping
a goose quill into an inkwell; the phrase consists of forty letters and is "programmed" by
turning a special wheel. This mechanism, completed in 1772, consists of more than six
thousand parts.

[3] A number of sources render the surname as "Droz" with the final consonant sounded; this is
incorrect - the last letter of the French surname Droz is not pronounced.
Of course, the hardest part of building such an automaton is to come up with all its
mechanics, to find the combination of parts that will make the mechanical arms make
such complex and precise movements; no doubt the creator of the Writing Boy was a
unique genius in the field of mechanics. But once you are dealing with mechanics, genius
alone is not enough. Pierre Jaquet-Droz had to make each of the six thousand parts,
milling them out of metal with fantastic precision; of course, some of the work was done
by the hired workers of his workshop, but the fact remains that, apart from the genius of
the designer of such mechanical products, their appearance requires a huge amount of
human labor, and one that cannot be called creative.
Jaquet-Droz's automatons are a kind of extreme illustration of the possibilities of the
creative human mind combined with the investment of a great deal of routine labor in the
manufacture of material parts; but the same principle can be observed in almost any kind
of engineering activity. A brilliant architect can draw a sketch of a beautiful palace and
even produce its detailed design, but the palace will never appear unless there are those
willing to pay for the labor of thousands of people involved in the whole chain of
production of building materials and then in the construction itself. A genius designer can
invent a new car or airplane, which will remain an idea until thousands of other people
agree (most likely for money, which must also come from somewhere) to produce all the
necessary parts and units, and then, combining them all together, to conduct a cycle of
tests and improvements. Everywhere creative technical genius stumbles upon the material
prose of life; we see with our own eyes the results of the work of ingenious designers, if
the resistance of the material environment can be overcome, but we can only guess how
many equally ingenious ideas have been wasted without ever finding an opportunity to be
embodied in metal, plastic or stone.
With the advent of programmable computers, it has become possible to create
something that works by itself, avoiding the complexities associated with material
embodiment. The design of a house, airplane, or automobile is just a formal description,
which must then be used to create the automobile or house itself, otherwise it will be of
no use. A computer program is also a formal description of what should happen, but,
unlike technical projects, the program itself is a finished product. If Pierre Jaquet-Droz
could have materialized his ideas simply by making drawings, he would surely have
surprised the public with something much more complex than the "Writing Boy". It is no
exaggeration to say that programmers have exactly this opportunity; perhaps that is why
programming is the most creative of all engineering professions and attracts not only
professionals but also a great number of amateurs. The eternal question of what there is
more of in programming - technique or art - has not been settled in anyone's favor and is
unlikely ever to be.
The flight of engineering thought, unbound by production routine, inevitably leads to
increasing complexity of programming as a discipline, and this is the reason for some
peculiarities of this unique profession. It is well known that a programmer cannot simply
be taught: a person can become a programmer only by himself, or not at all. Higher
education is desirable because a good knowledge of mathematics, physics and other
sciences puts the brain in order and sharply increases the potential for self-development;
however, we must admit that all this is desirable but not obligatory. The "programming"
subjects studied at a university may be useful, providing information and skills that would
otherwise have to be found independently; but, observing the development of future
programmers, one can say quite definitely that the role of "programming" subjects in this
development is much more modest than is commonly believed: without a teacher, a future
programmer would find everything he needs by himself - and indeed he does, since the
efforts of teachers cover his need for specialized knowledge by a quarter at best.
Being a university teacher myself, I have to admit that I know many excellent
programmers who have non-core higher education (chemical, medical, philological) or
even no diploma at all; on the other hand, being a professional programmer, though now
perhaps a former one, I must say that a specialized university education certainly helped
my professional growth, but on the whole I made a programmer of myself by myself - no
other way is simply possible. So, higher education for a programmer is desirable but not
obligatory, whereas self-study, on the contrary, is categorically necessary: if a potential
programmer does not make himself, others will not make a programmer out of him at all.
The book you are reading now is the result of an attempt to gather together the basic
information you need when learning programming on your own, so that you don't have to
search for it in various places and sources of dubious quality. Of course, you can become
a programmer without this book; there are many different paths you can take to eventually
come to an understanding of programming; this book will show you certain waypoints,
but even with that in mind, your path to your goal will remain yours alone, unique and
different from others.
This book alone will not be enough to become a programmer; all you can get out of
it is a general understanding of what programming is as a human activity and how
approximately it should be done. Besides, this book will remain an absolutely useless pile
of paper for you if you decide to just read it without trying to write programs on a
computer. One more thing: this book will not teach you anything if the Unix command
line is not your primary means of everyday work with your machine.
The explanation for this is very simple. To become a programmer, you first have to
start writing programs so that they work; then at some point you have to switch from
sketches to trying to extract some usefulness from your own programs; then you have to
take the last crucial step - to bring the usefulness of your programs to such a level that
someone other than you starts using them. Writing any useful program with a graphical
interface is quite difficult - you have to be a programmer to do it; but to become one, you
have to start by writing useful programs. This vicious circle can be broken by dropping
the graphical interface from consideration, and programs that have no graphical interface
and are nevertheless useful exist only under Unix OS, nowhere else.
Unfortunately, there is one more not very pleasant circumstance that is better taken into
account from the very beginning. Not everyone can become a programmer, and it is not
a matter of intelligence or "aptitude" but of individual inclinations. Programming is very
hard work requiring extreme intellectual effort, and only those relatively rare oddballs
who manage to enjoy the process of creating computer programs can endure this torture.
It is quite possible that in the course of studying this
book you will realize that programming is "not your thing"; that's okay, there are many
other good professions in the world. If this book "only" allows you to realize in time that
this is not your path and not to spend the best years of your life on fruitless attempts to
study at a university in some programming specialty - well, this is a great deal in itself:
the best years wasted will never be returned to you, and the sooner you realize what you
need (or rather, don't need), the better.
But enough about sad things. The first, introductory part of this book contains
information that you will need later in programming, but which does not require
programming exercises in itself. It can take you from one day to several weeks to learn
the introductory part; during this time, try to install some Linux or BSD system (FreeBSD,
OpenBSD, or any other system - of course, if you can manage to install it) on your
computer and start using this system in your daily work. For this purpose, you can use
almost any old computer that has not yet crumbled into rusty ashes; you are unlikely to
find a "live" Pentium-1 nowadays, but a Pentium-II class machine from the late 1990s is
enough to run some of the actively supported Linux distributions. By the way, you can
use the appearance of the necessary operating system in your household as a test of your
own readiness to go further: if three or four weeks have passed and there is still nothing
Unix-like on your computers, do not deceive yourself - you simply have no need of further
attempts to "learn to program".
Once you have Unix at your disposal, start by trying to do as much of your normal
"computer stuff" as possible in it. Yes, you can listen to music, watch photos and videos,
access the Internet, have adequate substitutes for the usual office applications, you can
do everything. At first, it may be unfamiliar and hard to use; don't worry about it
Don't worry, this period will soon pass. When you get to the beginning of the second part
of our book, take your text editor and Pascal compiler in your hands and try it out. Try,
try, try, try, try, try! Know that your computer will not explode from your programs, so
try harder. Try this and that, try this and that. If some task seems interesting to you - solve
it, it will be more useful than the tasks from a problem book. And remember: all of this
should be "fun"; it is useless to torture programming.
To all those who are not afraid, I sincerely and wholeheartedly wish you success. I
have spent more than six years on this book; I hope it was not in vain.

Structure of the book and conventions used in the text
It was not originally planned to divide the book into volumes; the idea arose when the
manuscript of the seven originally planned parts was almost ready. The manuscript turned
out to be much larger than anticipated, and the financial situation of the project did not
allow for immediate publication of the entire book, even after giving up everything that
could be given up. Gradual publication in separate volumes partially relieved the
problems or at least reduced their severity, especially since the material of the book,
which eventually came to consist of twelve parts instead of seven, divided successfully
and naturally into four volumes in the first edition and three in the second.
The second edition you are holding in your hands retains the structure of the parts of
the first edition; the only significant change is that the volumes are now three instead of
four. The first volume contains the first three parts of the book. Part I is devoted to the
preliminary knowledge needed by the would-be programmer; it contains information
from history, from mathematics (mostly discrete), a popular presentation of the basics of
computability and algorithm theory, and, finally, an outline of how to use a computer
running the Unix family of operating systems. Part II was usually called "Pascalian",
which is not really correct: the study of Pascal as such was never the purpose of this book,
including its second part, which seems to deal with this language. It would be more correct
to say that the second part is devoted to acquiring basic skills of writing computer
programs, for which Pascal is best suited. If the world were perfect, the entire content of
the first two parts would be part of the high school curriculum; unfortunately, the ideal
seems unattainable so far.

Part III, also included in the first volume, is devoted to assembly language
programming; together with the next part, devoted to the C language and located in the
second volume, it is intended to demonstrate an important phenomenon, conventionally
called low-level programming.
It will be appropriate to say a few words for those who doubt the necessity of studying
low-level programming as such. The difference between programmers who know how to program in
assembly language and C and those who don't is really the difference between those who
know what they are doing and those who don't. The statement that in modern conditions
"you can do without it" is partly true: among the people who get paid for writing programs
you can find those who don't know how to work with pointers, those who don't know the
machine representation of integers, and those who don't understand the word "stack"; it
is also true that all these people find quite well-paid positions for themselves. This is all
true; but to conclude that low-level programming is "unnecessary" would be at least
strange. The very possibility of writing programs without fully understanding one's own
actions is created by software that cannot itself be written in that style; this software,
usually called system software, obviously has to be developed by someone. And the claim
that "not many system developers are needed" seems quite ridiculous: there is an
objective shortage of qualified people, i.e. the demand for them exceeds the supply, so
more of them are needed, at any rate, than there are; the fact that fewer of them are needed
overall than of those whose work does not so critically depend on high qualification has
no relevance here at all, because what matters is the ratio of supply and demand, not the
volume of demand as such.
Assembly language and C have one very important thing in common: it is absolutely
impossible to work in either without a thorough understanding of what is going on. A
student who proves unable to cope with machine-level programming can always go into
web development, computer support of business processes and other similar areas, but
this is no reason not to try to teach everyone serious programming from the start.
While the C language is among the actively used professional tools, assembly code in
modern conditions is written very rarely and only in very specific cases; the vast majority
of programmers never encounter a single assembly language task in their entire life.
Nevertheless, the skill of and experience in working at the machine instruction level are
vital to understanding what is going on, which makes the study of assembly language
strictly necessary. We can consider that all the material in the first volume (i.e. the
first three parts of the book) is united by one property: most likely, it will not be directly
applicable in your future professional practice, but you cannot become a good
programmer without it. That's why the volume is called "Programming Basics".
The second volume, as already mentioned, opens with Part IV, devoted to the C
language; knowing this language is very important in itself, but we will need it, among
other things, to master the later parts of the second volume, in which all the examples are
written in C.
In Part V, we will learn about the main "visible" objects of the operating system and
how to interact with them through system calls; this part includes material on file I/O,
process management, interprocess communication, and terminal driver management.
The discussion of core system services continues in Part VI on computer networks.
Any data transfer over a network is, of course, also made possible only by the operating
system. Experience has shown that the simplicity of the socket interface and the ease with
which Unix allows you to create programs that communicate with each other over a
network literally delights many students and dramatically increases the "degree of
enthusiasm". The material on sockets is preceded by a small "literacy" on networks in
general, the TCP/IP protocol stack is considered, and examples of application layer
protocols are given.
Part VII describes the problems that may arise when several "actors" (running programs
or instances of one and the same program) simultaneously access one and the same
portion of data, whether a RAM area or a disk file. This is exactly the situation that
occurs when so-called threads are used - independent streams of parallel execution within
one instance of a running program. We must admit that this part of the book is written
not so much to teach the reader how to use threads (although all the information necessary
for that purpose is there) as to convince him that threads should not be used; but even if the
reader, following the author, decides never to use multithreaded programming for
anything, the material of this part will remain useful. First, such a decision should be
made consciously, with the ability to give arguments in its favor. Secondly,
working with shared data occurs not only in multithreaded programs: multi-user
databases are one example, and sooner or later any professional programmer will
face a task of this kind. Besides, working with shared data is unavoidable in the operating
system kernel, so it would be difficult to explain some aspects of its internal structure
without a preliminary discussion of shared data, critical sections and mutual exclusion.
The volume ends with Part VIII, which attempts to explain how an operating system
works from the inside out. Here we will learn about virtual memory models, talk about
CPU time scheduling, and how I/O is actually organized (i.e., at the OS kernel level,
where it all really happens).
All this can be - again conditionally - combined by the term "system programming";
the C language as the most suitable for creating system programs also belongs to this area,
so don't be surprised that the part devoted to this language appeared in the volume under
the general title "Systems and Networks" together with the material devoted to the
operating system and computer networks.
The third and final volume of the book is entitled "Paradigms". The programming
languages discussed in the first and second volumes - Pascal, assembly language and C -
are often referred to as von Neumann languages [4], because their construction is
conditioned by the structure of a computer (the von Neumann machine): a program is built
as a sequence of direct instructions to perform certain actions, variables represent memory
areas and, importantly, have addresses, and these addresses can be directly manipulated.
The prevailing modern views unambiguously classify not only assembly languages, but
also the C language as low-level, that is, close to the hardware. The situation with the
Pascal language is slightly more complicated, the example of which is used in the first
volume to show the basic principles of program creation; Pascal has always been
considered a high-level language, but even when working in this language we know for
sure, for example, that a variable is nothing but a memory area, we can work directly with
pointers, which also implies the presence of memory in the von Neumann sense, and
assignment itself as a concept is conditioned by the von Neumann machine.
In system programming, i.e. when creating programs that serve other programs and
the computer itself, in most cases no languages other than von Neumann ones can
be used; however, in application programming, when programs oriented to the end user
are created, things are not so rigid. The efficiency of a program - the speed of its operation
and/or the amount of memory it occupies - may not be as important a factor here as the
labor cost of creating the program or, say, the time that passes from the beginning of
development to the appearance of a finished tool; it becomes possible (and reasonable) to
use programming languages that exploit high-level abstractions created programmatically
and not related to the von Neumann machine. A programmer can afford to distract himself
from thinking about the structure of the computer in order to concentrate on how to
translate his idea of what the program should do into a program in a simpler, shorter, or
clearer way, and at the same time it immediately turns out that it is possible to think about
the program in a different way, quite different from what programmers writing in Pascal,
C, and other von Neumann languages are used to. This leads to the emergence of a variety
of programming paradigms.

[4] The rules of the Russian language make the spelling of the adjective "von-Neumannian", and
especially of its antonym "non-von-Neumannian", problematic. If we follow the letter of the rules, the
adjective should be written as separate words, but the author of these lines feels a strong inner protest
against such a spelling; as for the antonym, no correct spelling of it exists at all - any variant violates some
rule. Here and hereafter the fused spelling is used; if you like, consider it the author's orthography.
The first part of the third volume, numbered IX, discusses programming paradigms
(and paradigms in general) as a phenomenon. Here, the reader will find explanations of
what paradigms are and what this phenomenon looks like in programming; examples of
particular paradigms are discussed, including those the reader has already met before
(recursion, event-driven programming, etc.), and an overview is given of the "big"
paradigms: functional, logic and object-oriented programming. Most of the examples are
based on the C language; only for demonstrating logic programming is the Prolog
language used, with appropriate explanations.
Part X is devoted to the C++ language and the paradigms of object-oriented
programming and abstract data types. C++, to use "modern" terms, is presented as a
truncated subset, one that does not include the "features" imposed on the world by
standardization committees; more details about the choice of the C++ subset are given
in §10.2. The main material of this part has been published several times before as a
separate book; as a kind of "bonus", the part includes a chapter on building graphical user
interfaces in C++ using the FLTK library.
Part XI is entirely devoted to an alternative view of programs, which assumes that
nothing changes during execution: new information may appear (and does appear), but,
having once appeared, any data object remains unchanged until it disappears (goes out
of scope). Here we will finally get acquainted with functional and logic programming,
for which we will consider "exotic" programming languages of "very high level": Lisp,
Scheme, Prolog and Hope.
In the final part of the book, Part XII, the strategies of program execution - interpretation
and compilation - are considered as peculiar programming paradigms. The part begins
with a look at the command-scripting language Tcl, whose interpreted nature is beyond
question; it is in this capacity that it interests us. The study of Tcl comes with one more
GUI-related "bonus", not directly connected with the study of paradigms: a brief
acquaintance with the Tcl/Tk library, which lets you build simple GUI programs very
quickly, literally "on the knee". Having completed the study of Tcl, we will devote the
rest of the part to the peculiarities of programmer's thinking conditioned by the chosen
strategy of program execution, and discuss the limits of what is permissible when
applying interpretation and compilation.
Note that before you start studying the material of Volume 3, especially its C++
part, you should have some programming experience. Your programs must reach
volumes measured in thousands of lines, and they must have third-party users; only then
will you understand what object-oriented programming is and why you need it. Haste in
this matter is fraught with irreversible consequences for one's thinking. As they say,
forewarned is forearmed.
In the text of all three volumes, there are fragments typed in reduced sans serif font.
When reading the book for the first time, you can safely skip such passages; some of them
may contain forward references and are intended for readers who already know something
about programming. Examples of what not to do are marked with a special sign in the
margin. Newly introduced concepts are set in bold italics. In addition, the text uses italics
for semantic emphasis and bold type to highlight facts and rules that should not be
forgotten, lest problems arise with the subsequent material.
At the end of volume three you will find a general subject index; for each term it is
indicated in which volume and on which page it appears in the text - for example, 2:107
means that the term you are interested in can be found on page 107 of the second volume.
The home page for this book on the Internet is located at
http://www.stolyarov.info/books/programming_intro
Here you can find an archive of the example programs given in the book, as well as an
electronic version of the book itself. For the examples included in the archive, the file
names are given in the text.
Part 1

Preliminary information

1.1. Computer: what it is


When dealing with the variety of computer devices that surround us today, we often
forget that the original function of a computer is to count; most of us cannot remember
the last time we used a computer for calculations. However, even if we try to do it, for
example, by launching a "Calculator" program or a spreadsheet like LibreOffice Calc or
Microsoft Excel, we can notice one curious fact: the computer will spend millions of
times more operations on drawing windows, buttons and table frames, and in general on
organizing a dialog with the user, than on the calculations as such. In other words, a
device designed to perform calculations [5] does anything but calculations. A small
excursion into history will help us understand how this happened.

[5] This follows even from its name: the English word computer literally translates as "calculator",
and the official Russian term is an abbreviation of "electronic computing machine".

1.1.1. A little history


Wilhelm Schickard's mechanical arithmometer, created in 1623, is considered the
first calculating machine in history. The machine was called a "counting clock" because
it was made of mechanical parts typical of clockwork. The "counting clock" operated on
six-digit integers; the machine was capable of addition and subtraction, and overflow was
indicated by the
ringing of a bell. The machine has not survived to this day, but a working copy was created
in 1960. According to some reports, Schickard's machine may not have been the very first
mechanical counting machine: sketches by Leonardo da Vinci (16th century) depicting a
counting mechanism are known, though it is not known whether that mechanism was ever
embodied in metal.
The oldest surviving counting machine is Blaise Pascal's arithmometer, created in
1645. Pascal began work on the machine in 1642 at the age of 19. The inventor's father
dealt with tax collection and had to perform long, grueling calculations; with his
invention, Blaise Pascal hoped to make his father's work easier. The first specimen had
five decimal disks, that is, it could work with five-digit numbers; later, machines with up
to twenty disks were created. Addition on Pascal's machine was simple from the
operator's point of view: one had to dial first the first summand and then the second;
subtraction, however, required the use of the so-called nine's complement method.
If we have (for example) only five digits, a carry into the sixth digit, like a borrow from it, is
safely lost, which allows us to replace the subtraction of a number with the addition of a certain
other number. For example, if we want to subtract the number 134 (i.e. 00134) from the number
500 (that is, with five digits, from 00500), we can instead add the number 99866. With a sixth
digit we would get 100366, but since there is no sixth digit, the result is 00366, which is exactly
what we need. As is easy to guess, the "magic" number 99866 is obtained by subtracting the
subtrahend from 100000; from the point of view of arithmetic, instead of the operation x - y we
perform x + (100000 - y) - 100000, and the last subtraction happens by itself thanks to the carry
into the nonexistent sixth digit.
The trick here is that obtaining the number 100000 - y from the number y turns out to be
unexpectedly simple. Let us rewrite the expression 100000 - y in the form 99999 - y + 1. Since
the number y has at most five digits, the column subtraction 99999 - y proceeds without a single
borrow: each digit of the number y is simply replaced by the digit that complements it to nine.
It remains only to add one, and the job is done. In our example, the digits 00134 are replaced by
the corresponding digits 99865; adding one gives the "magic" 99866, which we added to 500
instead of subtracting 134.
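For readers who would like to check the trick mechanically, here is a small illustration in C (the
code and its names are ours, not the book's), modeling a five-digit decimal register in which any
carry out of the highest digit is lost:

    #include <stdio.h>

    #define MOD 100000   /* five decimal digits: carries beyond them are lost */

    /* nine's complement: every digit of y is replaced by (9 - digit) */
    int nines_complement(int y)
    {
        return 99999 - y;
    }

    /* compute x - y using nothing but addition, as described above;
       we assume 0 <= y <= x <= 99999 */
    int subtract_by_addition(int x, int y)
    {
        /* x + (99999 - y) + 1 equals x - y + 100000; dropping the carry
           into the nonexistent sixth digit leaves exactly x - y */
        return (x + nines_complement(y) + 1) % MOD;
    }

    int main(void)
    {
        printf("%05d\n", subtract_by_addition(500, 134));  /* prints 00366 */
        return 0;
    }

Here the % MOD operation plays the role of the lost carry: it simply discards the sixth digit, just
as Pascal's five-wheel machine does.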
On Pascal's arithmometers, subtraction was performed in a slightly trickier way. First, one had
to dial the nine's complement of the minuend (the number 99999 - x, which for our example is
99499); to make this possible, the drums with the result digits, visible through special windows,
carried two digits each - the main digit and its complement to nine - and the machine itself was
equipped with a bar that covered the "unneeded" row of digits so that it would not distract the
operator. To the dialed nine's complement the subtrahend was then added, in our example 00134,
producing the number 99999 - x + y. The operator, however, kept looking at the row of
complement digits, which displayed 99999 - (99999 - x + y), that is, exactly x - y. For the
numbers of our example, the result of the addition would be 99633, whose nine's complement,
the number 00366, is the correct result of the operation 500 - 134.
Nowadays this method looks like a curious trick, hardly needed in modern realities; but we will
meet complements that require adding one again when we discuss the representation of negative
integers in a computer.
Thirty years later, the famous German mathematician Gottfried Wilhelm Leibniz built
a mechanical machine capable of performing addition, subtraction, multiplication and
division; multiplication and division were performed on this machine much as we
perform them in column arithmetic: multiplication as a sequence of additions, and
division as a sequence of subtractions. In some sources one can find the claim that the
machine was supposedly able to extract square and cube roots; in fact this is not true - it
is simply that extracting a root is much easier when you have a device for multiplication
than without one.
The history of mechanical arithmometers lasted for quite a long time and ended in the
second half of the 20th century, when mechanical calculating devices were replaced by
electronic calculators. One common property of arithmometers is important for our
historical excursion: they could not perform calculations consisting of more than one
action without human participation; meanwhile, solving even relatively simple problems
requires performing long sequences of arithmetic operations. Of course, arithmometers
facilitated the labor of calculators, but the need remained to write out intermediate results
on paper and dial them back in manually with the help of wheels and levers or, in later
versions, buttons.
The English mathematician Charles Babbage (1792-1871) drew attention to the fact
that the labor of calculators could be automated completely [14]; in 1822 he proposed
the design of a more complex device known as a difference machine. This machine was
to interpolate polynomials by the finite difference method, which would automate the
construction of tables of a wide variety of functions. Having secured the support of the
English government, Babbage began work on the machine in 1823, but the technical
difficulties he encountered somewhat exceeded his expectations. The story of this project
is told differently by different sources, but all agree that the total amount of government
subsidies came to £17,000 - a huge sum at the time; some authors add that Babbage spent
a similar amount of his own fortune. Be that as it may, Babbage never built a working
machine, and in the course of the project, which dragged on for almost two decades, he
himself cooled towards his idea, concluding that the method of finite differences covered
only one (albeit an important one) of a huge number of computational problems; the next
machine conceived by the inventor was to be universal, that is, adjustable for solving any
problem.
[14] In fact, an earlier description of a difference machine is known, from a book by the German
engineer Johann von Müller published in 1788. It is not known whether Babbage used the ideas from
this book.

In 1842, having failed to obtain any working device, the British government refused
to finance Babbage's activities any further. Based on the principles proposed by Babbage,
the Swede Georg Scheutz completed the construction of a working difference machine in
1843, and in the following years built several more copies, one of which he sold to the
British government and another to the government of the United States. At the end of the
20th century, two copies of Babbage's difference machine were built from his original
drawings, one for the Science Museum in London and the other for the Computer History
Museum in California, thus demonstrating that Babbage's difference machine could have
worked had it been completed.
However, in historical terms it is not the difference machine that is the more interesting,
but the universal computing machine conceived by Babbage, which he called analytical.
The complexity of this machine was such that Babbage could not even complete its
drawings; the conceived device exceeded the capabilities of the technology of that time,
and his own capabilities as well. Be that as it may, it was in Babbage's work on the
analytical machine that, first, the idea of program control, i.e. the execution of actions
prescribed by a program, emerged; and, second, actions not directly related to arithmetic
appeared: the transfer of data (intermediate results) from one storage device to another,
and the execution of certain actions depending on the results of data analysis (e.g.,
comparison).
In the same year when the British government stopped funding the difference
machine project, Babbage gave a lecture at the University of Turin, devoted mainly to
the analytical machine; the Italian mathematician and engineer Federico Luigi Menabrea
published an abstract of this lecture in French [15]. At Babbage's request, Lady Augusta
Ada Lovelace translated this abstract into English, supplying her translation with
extensive commentaries [16], much larger than the article itself. One section of these comments contains
a complete set of commands for computing Bernoulli numbers on an analytical machine;
this set of commands is considered the first computer program ever written, and Ada
Lovelace herself is often referred to as the first programmer. Interestingly enough, Ada
Lovelace, while pondering the possibilities of the analytical machine, was already able to
look into the future of computers; among other things, her comments contained the
following fragment: "The essence and purpose of the machine will change from the
information we put into it. The machine will be able to write music, paint pictures, and
show science in ways we have never seen before." In fact, Ada Lovelace observed that the
machine conceived by Babbage could be seen as a tool for processing information in a
broad sense, while the solution of computational mathematical problems is only a special
case of such processing.
[15] Menabrea published quite a lot of his works in French, which in those days was more popular
as an international language than English.

[16] Ada Lovelace was the only legitimate child of the poet Byron, though her father saw his daughter
only once in his life, a month after her birth. True, the most plausible version is that Ada took her passion
for mathematics from her mother, Anna Isabella Byron. Ada Lovelace's acquaintances included, in
addition to Charles Babbage, such luminaries as Michael Faraday and Charles Dickens, and her mentor
as a young girl was the famous woman scientist Mary Somerville.

If a working difference machine, as mentioned above, was nevertheless built in the
middle of the 19th century, although not by Babbage, the idea of a
programmable computer was almost a hundred years ahead of the state of the art: the first
working program-controlled computers appeared only in the second quarter of the 20th
century. At present it is considered that chronologically the first programmable computer
was Z1, built by Konrad Zuse in Germany in 1938; the machine was completely
mechanical, electricity was used only in the motor that drove the mechanisms in motion.
The Z1 used binary logic, and the elements that calculated logical functions such as
conjunction, disjunction, etc., were realized as sets of metal plates with clever cutouts.
We can recommend that the interested reader find a video on the Internet demonstrating
these elements on an enlarged model: the impression made by their operation is certainly
worth the time spent.
The Z1 machine was not very reliable: the mechanisms often jammed, distorting the
result, so it saw no practical use; but it was followed a year later by the Z2, which used the
same mechanics to store information ("memory"), but performed computational
operations using electromagnetic relays. Both machines carried out instructions received
from punched tape; they were unable to rewind the tape, which severely limited their
capabilities, making it impossible to organize the repetition of a section of the
program, i.e. loops. Later, in 1941, Zuse built the Z3 machine, which used only relays
and stored the program on plastic punched tape; according to some sources, ordinary film
was used for this purpose - rejected takes and other waste from film studios. This machine
made it possible to organize loops but had no conditional branch instruction, which also
somewhat limited its capabilities. The Z3 was developed as a secret government project;
Zuse applied to the government for additional funding to replace the relays with
electronic circuits, but was refused. The last machine in the series, the Z4, was similar in
principle to the Z3 but finally allowed branching. The Z4 was completed in 1944, and of
all Konrad Zuse's machines it was the only one to survive; the others were destroyed in
the Allied bombing of Berlin.
For a long time Zuse's work remained unknown outside Germany. Meanwhile, around
the time of World War II, a real boom in the creation of computing devices was under
way on both sides of the ocean - both electromechanical devices (including relay-based
ones) and electronic ones, based on vacuum tubes.
A radio tube (see Fig. 1.1) is an electronic device in the form of a sealed glass bulb with
electrodes inside, from which the air has been pumped out. The simplest radio tube, the diode,
has two working electrodes (an anode and a cathode), as well as a filament that heats the cathode
to temperatures at which thermionic emission begins: negatively charged electrons leave the
cathode and form a kind of electron cloud inside the tube; under the action of a potential
difference, the electrons are attracted to the anode and absorbed by it. In the opposite direction
there is nothing to carry the charge: the anode, remaining cold, does not emit electrons, and there
are no charged ions in the bulb, because it contains a vacuum. Thus, current can flow through
the diode in one direction only; since the electron's charge is taken to be negative, in terms of
electrodynamics the electric charge moves counter to the electrons, i.e. from anode to cathode.
If the polarity of the diode in the circuit is reversed, the electrons leaving the heated electrode
are immediately attracted back to it by the positive potential, and no particles reach the second
electrode (the anode, which has become the cathode as a result of the polarity reversal), since
they are repelled by the negative potential.
By adding another electrode, the so-called grid, we get a new type of radio tube called the
triode. The grid is placed inside the bulb in the path of the electrons traveling from the cathode
to the anode. When a negative potential is applied to the grid, it begins to repel electrons,
preventing them from reaching the anode; if a modulated signal, such as one received from a
microphone, is applied to the grid, the current through the triode will follow the changes of the
grid potential, but may be much stronger. Triodes were originally designed for signal amplification.
Figure 1.1. Radio tube (double triode) in action (left); circuit of a trigger on two triodes
(right) [5]

By taking two triodes and connecting the anode of each triode to the grid of the other, we
obtain a device called a trigger (a flip-flop). It can be in one of two stable states: current flows
through one of the two triodes (that triode is said to be open), and because of this the grid of the
second triode carries a potential that prevents current from flowing through it (that triode is
closed). By briefly applying a negative potential to the grid of the open triode, we stop the current
through it; as a result the second triode opens and thereby closes the first; in other words, the
triodes swap roles and the trigger passes into the opposite stable state. A trigger can be used, for
example, to store a single bit of information. Other ways of connecting triodes make it possible
to build logic gates realizing conjunction, disjunction and negation. All this allows radio tubes
to be used to build an electronic computing device.
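To feel how such a circuit stores a bit without building one, here is a toy model in C (ours, not
the book's): the two cross-coupled triodes are replaced by two NOR gates, a standard logically
equivalent circuit, and the brief pulses applied to the grids are modeled by the s and r inputs:

    #include <stdio.h>

    /* two cross-coupled NOR gates: each output feeds the other's input,
       just as each triode's anode is wired to the other triode's grid */
    int q = 0, nq = 1;

    /* let the feedback loop settle after applying inputs s ("set")
       and r ("reset"); a few iterations are enough to stabilize */
    void settle(int s, int r)
    {
        for (int i = 0; i < 4; i++) {
            q  = !(r || nq);
            nq = !(s || q);
        }
    }

    int main(void)
    {
        settle(1, 0); printf("after set pulse:   q=%d\n", q);  /* q=1 */
        settle(0, 0); printf("pulse removed:     q=%d\n", q);  /* still 1 */
        settle(0, 1); printf("after reset pulse: q=%d\n", q);  /* q=0 */
        settle(0, 0); printf("pulse removed:     q=%d\n", q);  /* still 0 */
        return 0;
    }

The point to notice is that after a pulse is removed the outputs keep their values: nothing except
the feedback loop itself stores the bit.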
Due to the absence of mechanical parts, machines built on vacuum tubes worked
much faster, but the radio tubes themselves are rather unreliable: the bulb can lose
its seal, and the filament heating the cathode burns out with time. One of the first
programmable computers, ENIAC, contained 18,000 tubes, and the machine could only
work while all the tubes were in working order. Despite the unprecedented measures
taken to improve reliability, the machine had to be repaired constantly.
[5] Photo from Wikipedia; the original can be downloaded from
https://en.wikipedia.org/wiki/File:Dubulttriode_darbiibaa.jpg. Images used here and hereafter from
Wikimedia Foundation sites are licensed for redistribution under various Creative Commons licenses;
detailed information, as well as the original images in substantially better quality, can be obtained from
the respective web pages. In what follows we omit the detailed attribution, limiting ourselves to
references to the web pages containing the original images.

ENIAC was created by the American scientist John Mauchly and his student J. Eckert; the
work was started during World War II and financed by the military, but, luckily for the creators
of the machine, they did not manage to complete it before the end of the war, and so the project
was declassified. The pioneers of tube computing from Great Britain were less lucky: the
Colossus Mark I and Colossus Mark II machines, built in the strictest secrecy, were destroyed
after the end of the war by Churchill's personal order [17], and their creator Tommy Flowers,
obeying the same order, was forced to burn all the design documentation, which made it
impossible to recreate the machines. The general public became aware of this project only thirty
years later, and its participants were deprived of deserved recognition and effectively
excommunicated from the worldwide development of computing technology. By the time the
project was declassified, the achievements of the Colossus creators were of historical interest
only, and most of them had been lost along with the destroyed machines and documentation.

[17] There is a version according to which Churchill's goal was to prevent publicity of the fact that
he had known in advance, from intercepted ciphergrams, about the mass bombing of Coventry but did
nothing about it, so as not to reveal Britain's ability to break German ciphers; historians differ on this
point, however, and the version of the deliberate sacrifice of Coventry is refuted by a number of
testimonies of direct participants in those events. The question of why the cipher-breaking equipment
had to be destroyed after the end of the war remains open.
It is often claimed that the Colossus machines were designed to decrypt messages encrypted
by the German electromechanical cipher machine Enigma, and that the famous mathematician
Alan Turing, one of the founders of the theory of algorithms, participated in the project (and all
but led it). This is not true: Turing took no part in the Colossus project, and the machine built
with his direct participation and really intended for breaking Enigma codes was called the
Bombe; it was purely electromechanical and, strictly speaking, was not a computer - and neither
was Enigma itself. Tommy Flowers' machines were designed to break ciphergrams produced by
the Lorenz SZ machine, whose cipher was much more resistant to breaking than Enigma's and
did not yield to electromechanical methods.
However, Tommy Flowers did happen to work under Turing for some time in one of the
British cryptanalytic projects, and it was Turing who recommended Flowers for the Lorenz
SZ-related project.
Computers built on radio tubes are usually called first-generation computers; note
that only electronic computers are divided into generations - mechanical and
electromechanical computing machines of all kinds are not included in this classification.
In particular, Konrad Zuse's machines were not electronic, so they are considered neither
"first-generation computers" nor, strictly speaking, computers at all.
The capabilities of the machines of that era were very limited: the bulky element
base forced designers to make do with meager (by modern standards) memory capacities.
Nevertheless, one of the most important inventions in the history of computing machines
belongs to the first generation - the stored program principle, which implies that the
program, in the form of a sequence of instruction codes, is stored in the same memory as
the data, the memory itself is homogeneous, and instruction codes do not differ in
principle from data. Machines conforming to this principle are traditionally called
von Neumann machines in honor of John von Neumann.
The history of the name is rather peculiar. One of the first electronic machines to store a
program in memory was the EDVAC computer; it was built by Mauchly and Eckert, already
familiar to us from ENIAC, who discussed and designed the new machine while ENIAC was
still under construction. John von Neumann, who participated as a scientific consultant in the
Manhattan Project (the American project to build the atomic bomb), became interested in the
ENIAC project because the Manhattan Project required huge amounts of calculations, performed
by a whole army of female calculators using mechanical arithmometers. Naturally, von Neumann
took an active part in discussions with Mauchly
and Eckert about the architectural principles of the new machine (EDVAC); in 1945 he
summarized the results of the discussions in a written document known as the First Draft of a
Report on the EDVAC. Von Neumann did not consider the document complete: in this
version, the text was intended only for discussion by members of the Mauchly and Eckert
research group, which included, among others, Herman Goldstine. The prevailing version of
historical events is that it was Goldstine who commissioned the reprinting of the manuscript
document, putting only von Neumann's name on its title page (which is formally correct, since
von Neumann was the author of the text, but not quite correct in the light of scientific traditions,
since the ideas set forth in the document were the result of collective work), and then, having
reproduced the document, sent out several dozen copies to interested scientists. It was this
document that firmly linked von Neumann's name with the corresponding architectural principles,
although von Neumann does not appear to be the author (at least not the sole author) of most of
the ideas presented there. Later von Neumann built another machine, the IAS, in which he
embodied the architectural principles outlined in the "Draft".
There are many interesting stories associated with the computational work done for the
Manhattan Project; some of them were described by another participant in the project, the
famous physicist Richard Feynman, in his book "Surely You're Joking, Mr. Feynman!" [7]. It
contains, in particular, the following fragment:

And as for Mr. Frankel, who started all this activity, he began to suffer from the computer
disease that everyone who has worked with computers now knows about. It is a very
serious disease, and it interferes completely with work. The trouble with computers is that
you play with them. They are so beautiful, there are so many possibilities - if it's an even
number you do this, if it's an odd number you do that, and very soon you can do more and
more sophisticated things on one single machine, if you are smart enough.
After a while the whole system fell apart. Frankel paid no attention to it; he was no longer
supervising anything. The system was very, very slow, while he was sitting in his room
figuring out how to make one of the tabulators automatically print the arctangent of X.
Then the tabulator would turn on, print the columns, then - bang, bang, bang - it would
calculate the arctangent automatically by integration and produce the whole table in one
operation.
Absolutely useless. After all, we already had tables of arctangents. But if you have ever
worked with computers, you understand what a disease it is - the delight of being able to
see how much can be done.
Unfortunately, our time differs too much from the days when Feynman worked on the
Manhattan Project, and even from the days when he wrote his book. Not everyone who deals
with computers now is aware of the existence of this "computer disease": computers have
become all too commonplace, and most people find computer games much more fun than
"playing" with the computer itself and its capabilities. Feynman is absolutely right that
"everyone who worked with computers" knew about this disease - it is just that in those days
there were no "end users"; everyone who worked with computers was a programmer. Strange
as it may sound, it is precisely this "disease" that turns a person into a programmer. If you want
to become a programmer, try to catch the disease described by Feynman.
One way or another, the stored program principle was a definite breakthrough in
the field of computing. Before it, machines were programmed either with punched
tapes, like Konrad Zuse's machines, or with jumpers and toggle switches, like ENIAC;
physically setting up a program - rearranging all the jumpers and flipping the toggle
switches - took several days, after which the computation itself would run in an hour or
two, and then the machine had to be reprogrammed all over again. Programs in those
days were not so much written as devised, because in essence a program was not a
sequence of instructions but a scheme of connections between machine units.
Storing a program in memory in the form of instructions made it possible, firstly, not
to spend huge amounts of time on changing a program: it could be read from an external
medium (punched tape or a deck of punched cards), placed in memory and executed, and
all this happened quite quickly; of course, preparing the program - devising it and then
putting it on punched tape or cards - also took a long time, but that did not consume
the time of the machine itself, which cost a great deal of money. Secondly, the use of the same
memory both for command codes and for processed data made it possible to treat a
program as data and to create programs that operated on other programs. Such now-
familiar phenomena as compilers and operating systems would have been unthinkable on
machines that did not meet the definition of a von Neumann machine.
Strictly speaking, von Neumann's architectural principles include not only the stored program
principle but also a number of other properties of a computer; we will return to this point
in §3.1.1.
Meanwhile, the memory capacity of computers had managed to grow somewhat; for
example, the already mentioned IAS of John von Neumann had 512 memory cells of 40
bits each. But while the Americans continued to build computers focused exclusively on
scientific and engineering numerical calculations, albeit with a stored program, in
Britain at the same time there were people who paid attention to the potential of
computing machines for processing information beyond the narrow "computational"
domain. The first, or at any rate one of the first, computers originally designed for
purposes broader than numerical calculation is considered to be the LEO I, developed by
the British company J. Lyons & Co.; it is noteworthy that this firm, engaged in the food
supply, restaurant and hotel business, had nothing to do with the engineering industry. In
1951 the newly built computer took over a large part of the company's accounting and
financial analysis functions, with calculations as such making up a noticeable but by no
means the largest share of the operations performed by the machine. By accepting
input data from the punch cards and outputting the results to a text printing device, the
machine made it possible to automate the preparation of payrolls and other similar
documents. Ada Lovelace's prophecy slowly began to come true: the object of work for
the computer program was information, and mathematical calculations were an important,
but by no means the only way to process it.
Meanwhile, the growing level of technology was inevitably bringing a revolution
in computer construction closer. The main innovation behind the revolution was the
semiconductor transistor, an electronic element whose behavior in a circuit is very similar
to that of the triode tube. The transistor, like the triode, has three contacts, usually called
the base, the emitter and the collector (or the gate, the source and the drain for the type
of transistor called a field-effect transistor). When the voltage on the base changes with
respect to the emitter (on the gate with respect to the source), the current between the
emitter and the collector (between the source and the drain) changes. In analog electronics
both the triode tube and the semiconductor transistor are used to amplify signals, because
the currents flowing between the anode and cathode of the triode, or between the emitter
and collector of the transistor, can be much more powerful than the control signals applied
to the grid or the base, respectively. In digital circuits amplification plays no role; what
matters is the controlling effect as such. In particular, just like two triodes, two transistors
can be connected to form a trigger, with the current flowing through one transistor keeping
the other closed, and vice versa.
The first working transistor is believed to have been created in 1947 at Bell Labs, with
William Shockley, John Bardeen and Walter Brattain credited as the inventors; they were
awarded the Nobel Prize in Physics a few years later. Early transistors were bulky,
unreliable, and inconvenient to work with, but rapid improvements in crystal-growing
technology made it possible to mass-produce transistors which, compared to vacuum tubes,
were, first, quite tiny; second, they required no cathode heating and therefore
consumed much less electricity; finally, again compared to tubes, transistors were
virtually trouble-free: of course, they too failed occasionally, but that was an emergency,
whereas a tube failure was just a routine event, and the tubes themselves were
regarded as consumables rather than permanent parts of the design.
The second major invention that determined the change of computer generations
was magnetic-core memory. A bank of such memory (Fig. 1.2, right) was a
rectangular grid of wires with ferrite rings at its nodes; each ring stored one bit of
information, replacing the bulky circuit of three or four vacuum tubes used for the same
purpose in the first generation of computers. Computers built on solid-state electronic
components, primarily transistors, are commonly referred to as second-generation
computers. If first-generation computers occupied entire buildings, a second-generation
machine fit in a single room; power consumption dropped dramatically, while the
capabilities, above all the amount of RAM, increased significantly. The reliability of the
machines also rose, since transistors fail much less often than vacuum tubes. The cost of
computers in monetary terms dropped substantially. The first fully transistorized
computing machines were built in 1953, and in 1954 IBM released the IBM 608
Transistor Calculator, which is called the first commercial transistorized computing machine.
The next major change in the approach to building computers came with the
invention of integrated circuits: semiconductor devices in which a single crystal houses
several elements (up to several billion in modern conditions) such as transistors, diodes,
resistors and capacitors.

Fig. 1.2. Transistor (left); memory bank on ferrite cores (right) 8

8 Photo from Wikimedia Commons; http://commons.wikimedia.org/wiki/File:KL_CoreMemory_Macro.jpg.

Computers based on integrated circuits are considered to be of
the third generation; despite their still very high cost, it became possible to mass-produce
these machines - up to tens of thousands of copies. The central processor of such a
computer was a cabinet- or pedestal-sized unit full of electronics. As the technology improved,
microchips became more and more compact, and their total number in the CPU steadily
decreased. In 1971 the next transition of quantity into quality took place: microchips were
created that contained an entire central processor. Which chip became the first
microprocessor in history is not known for certain; most often the Intel 4004 is named,
about which we can at least say for sure that it was the first microprocessor available on
the market. According to some sources, the honor belongs to the MP944 chip used in the
avionics of the F-14 fighter jet, but the general public, as usual, knew nothing about that
development until 1997.
The advent of microprocessors made it possible to "package" a computer into a
desktop device known as a "personal computer". From this point on, it is customary to
count the history of the fourth generation of computers, which continues to this day.
Strange as it may seem, no qualitatively new improvements have been proposed for the
past half a century or so. The Japanese project of "fifth-generation computers" did not
yield significant results, especially since it relied not on the technological development
of the hardware base but on an alternative direction of software development.
As we can see, nowadays computers are used to process any information that can be
recorded and reproduced. Besides the traditional databases and texts to which electronic
information processing was reduced in the middle of the twentieth century, computers
successfully process recorded sound, images and video; there are attempts to process
tactile information, albeit in an embryonic state: in practical use so far there are only
Braille displays for the blind, but engineers do not give up their attempts to create all
kinds of electronic gloves and other similar devices. The situation with taste and smell is
much worse: at the current level of technology, taste and smell information cannot be
recorded or reproduced at all; but there is no doubt that if a way to record and reproduce
taste and smell is ever found, computers will be able to work with these types of
information as well.
Of course, sometimes computers are also used for numerical calculations; there is
even a special industry for the production of so-called supercomputers designed
exclusively for solving large-scale computational problems. Modern supercomputers
have tens of thousands of processors and in most cases are produced in single copies; in
general, supercomputers are rather an exception to the general rule, while most
applications of computers have very little in common with numerical calculations. The
question may naturally arise - why then do computers still continue to be called
computers? Wouldn't it be better to use some other term, for example, call them info-
analyzers or info-processors? Strange as it may seem, this is completely unnecessary; the
point is that not only numbers and not only formulas can be computed. If we
recall the notion of a mathematical function, we immediately find that both its domain
and its range can be sets of arbitrary nature. As is well known,
any information can be processed only if it is represented in some objective form;
moreover, digital computers require a discrete representation of information, and that is
nothing other than a representation in the form of chains of symbols over some alphabet, or
simply texts; note that it is precisely this representation of arbitrary information that is
considered in the theory of algorithms. With this approach, any transformation of
information turns out to be a function from a set of texts to a set of texts, and any
processing of information becomes the computation of a function. It turns out that
computers are still engaged in computation, albeit computation not of numbers but of
arbitrary information.
1.1.2. Processor, memory, bus
There's a machine in our hall,
a bus goes through it,
and back and forth along the bus
the information flows.

The internal structure of almost all modern computers is based on the same principles.
The basis of the computer is a common bus, which is, roughly speaking, a bundle of several
dozen parallel wires called tracks. Connected to the bus are the central processing unit
(CPU), the random access memory (RAM), and the controllers
that make it possible to control the rest of the computer's devices. The CPU communicates with
the rest of the computer through the bus; RAM and controllers are designed to ignore any
information passing over the bus except that which is addressed to a particular memory
bank or controller. To accomplish this, a portion of the bus tracks is allocated to an
address; this portion of the tracks is called the address bus. The tracks that carry
information form the data bus, and the tracks that carry control signals form the
control bus. From the circuitry point of view, each track can be in the state of logical
one (the track is "pulled up" to the supply voltage of the circuit) or zero (the track is
connected to "ground", i.e. zero voltage level); a certain combination of zeros and ones
on the address bus constitutes an address, and all devices, except the CPU, are enabled to
work with the bus only when the state of the address bus corresponds to their address, and
the rest of the time they do not pay any attention to the bus and do not transmit anything
to it, so as not to interfere with the work of the CPU with other devices.
RAM, or simply memory, consists of identical memory cells19, each of which has its
own unique address distinguishing it from the others. All cell addresses technically
possible on a given computer form the address space; its size is determined by the number
of tracks on the address bus: if there are N such tracks, then there are 2^N possible addresses
(to the reader who has not studied combinatorics this may seem incomprehensible; in that
case, come back here after §1.3.1, where all the necessary information is given).
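To get a feel for how fast 2^N grows, you can make the command interpreter (discussed
in §1.2) do the arithmetic itself; this is a sketch, assuming a shell with at least 64-bit
arithmetic, and the left-shift operator << here simply doubles the number 1 N times:

echo $((1 << 16))    # 16 address tracks: 65536 possible addresses
echo $((1 << 32))    # 32 address tracks: 4294967296 addresses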
Many computers use virtual addresses to work with memory; in this case the address space
we have just discussed - actually the set of possible address bus states - is called physical (as
opposed to the virtual address space, which is formed by virtual addresses). We will return to
the discussion of virtual memory in §3.1.2.

19 It is useful to know that the English term memory cell is actually used in a quite different
sense: it denotes a circuit that stores a single bit. For a memory cell in the sense in which we
use the term here, English sources use the phrase memory addressable location or simply
memory location.
There are only two operations that can be performed on a memory cell: write a value
to it and read a value from it. To perform these operations, the CPU sets the address of
the desired cell on the address bus and then, using the control bus, transmits an electrical
impulse that causes the selected cell - that is, the one whose address is set on the bus and
no other - to transfer its contents to the data bus (read operation) or, conversely, to set its
new contents according to the state of the data bus (write operation). The old cell contents
are lost in this process. Information on the data bus is transmitted in parallel: for example,
if a cell contains, as is usually the case, eight binary digits (zeros and ones), then eight
tracks are used to transfer its data during reading and writing; when performing a read
operation, the memory cell must set these tracks to the logic levels corresponding to the
digits stored in it, and during a write operation it must, conversely, set its stored digits
according to the logic levels of the data tracks. To store a value, several memory cells
with neighboring addresses are often used in a row, and the width of the data bus is
usually sufficient to transfer the contents of several cells simultaneously.
It should be noted that RAM is an electronic device that requires power to function.
When the power is turned off, the information stored in the memory cells is
immediately and irretrievably lost.
Computer memory should never be confused with disk storage devices, where
files are stored. The CPU can interact directly with memory via the bus; it cannot
work with disks and other devices by itself: for that, special and rather complex
programs called drivers have to be run on the CPU. Drivers organize the work with
disks and other external devices through the controllers by sending them certain control
information.
Some memory blocks may physically be permanent (read-only) memory rather than RAM.
Such memory does not support the write operation, i.e. its contents cannot be changed, at
least not by CPU operations; on the other hand, information written into such memory is
not erased when the power is turned off. The CPU does not distinguish between RAM
cells and permanent memory cells in any way, because both behave exactly the same when
performing a read operation. Usually, when a computer is manufactured, some program
is written into the permanent memory to test the computer hardware and prepare it for
operation. This program starts running when you turn on the computer; its job is to find
where the operating system can be loaded from, then load it and give it control; everything
else, including running user programs, is up to the operating system. We will talk about
operating systems in detail later; for now, let us just note that an operating system is a
program and nothing else, i.e. it was written by programmers; an operating system
differs from all other programs only in that, having started on the computer before the other
programs, it gets access to all of the computer's capabilities, while all other programs are
started by the operating system, and it starts them in such a way that they get no direct
access to the hardware's capabilities: to do anything with the machine beyond transforming
data in their own memory, they have to ask the operating system.
Note that the operating system is responsible, among other things, for organizing work
with all external devices, so it contains all the necessary drivers for this purpose.
Disks connected to a computer may contain a large amount of varied information,
and in order not to get lost in this abundance, the operating system organizes the
storage of information on disks in the form of files: units of information storage with
human-readable names. Every computer user knows about the existence of files,
because one has to work with them not just every day, but every time some information
needs to be put into storage or, conversely, information stored earlier needs to be used. Not
everyone understands that files reside on disks, and only there. There are no
files in memory; they are simply not needed there, because the allocation of memory is
constantly changing depending on the needs of running programs; this is handled by the
same operating system, and only it knows which memory areas are currently in use and
for what. Even running programs are not privy to this internal bookkeeping: each of them
manages only the areas given to it; as for the human user, he does not need to know about
memory allocation at all, since he could not do anything useful with this knowledge anyway.
Therefore, no names are required for memory areas; the operating system identifies them
for itself as it sees fit.

1.1.3. Principles of operation of the central processing unit


The central processor is an electronic circuit, usually in the form of a microprocessor
(i.e. in the form of a single chip); its main and only purpose is to perform the simplest
actions specified by the commands that make up the program. The CPU usually
contains so-called registers: memory devices capable of storing from a few to
several dozen binary digits each; the processor performs its main work on information
stored in the registers. The operations the processor can perform necessarily
include reading and writing memory cells, in which information is
transferred over the bus from RAM to the processor or back.
Among the operations performed by the CPU there is always arithmetic - at least
addition and subtraction, although all modern CPUs also have multiplication and division;
an example of an arithmetic operation might be the instruction "take the values from the
second and fourth registers, add them up, and write the result back to the second register".
Some processors can perform such actions not only on registers, but also on groups of
memory locations. There are other actions, such as copying information from one register
to another, logical operations, and all sorts of service actions like jumps to execution of
instructions from another memory location; taken together they form the instruction set
of a particular processor. We will get to know the instruction set of a real processor
in detail later, in the third part of this book.
Each elementary action performed by the CPU (machine instruction) is indicated by
an operation code or, as it is often called, a machine code. A program consisting of such
codes is located in memory locations; one of the processor's registers, called the
instruction counter or instruction pointer20, contains the address of the memory location
where the next instruction to be executed is located. The processor works by repeating
the instruction processing cycle over and over. At the beginning of this cycle, the address
is taken from the instruction counter and the code of the next instruction is read from the
memory cells located at that address. Immediately thereafter, the instruction counter
changes its value to point to the next instruction in memory; for example, if the instruction
just read occupied three memory cells, the instruction counter is incremented by
three. The processor circuits decode the code and perform the actions it prescribes:
for example, this may be the familiar instruction "take the contents of two
registers, add them up, and put the result back into one of the registers", or "copy a number
from one register to another", etc. When the actions prescribed by the instruction have been
performed, the processor returns to the beginning of the instruction processing cycle, so
that the next pass of the cycle executes the next instruction, and so on indefinitely (or
rather, until the processor is shut down).

20 The corresponding English terms are program counter and instruction pointer; the name
"counter" may seem ill-chosen here, since this register does not really count anything, but the
point is that the meaning of the English word counter is much broader.
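To make the cycle more tangible, here is a toy model of it in the form of a shell script
(shell scripts are discussed in §1.2); the three-instruction machine simulated here is
entirely made up for illustration and corresponds to no real processor:

#!/bin/sh
# A toy fetch-decode-execute loop. The "memory cells" mem_1..mem_3
# hold instruction codes; pc is the instruction counter and acc is
# the machine's only register.
mem_1="ADD 2 3"
mem_2="PRINT"
mem_3="HALT"
pc=1
acc=0
while :; do
    eval "insn=\$mem_$pc"       # fetch the instruction at address pc
    pc=$((pc + 1))              # advance the counter right away
    case $insn in               # decode and execute
        ADD*)  set -- $insn; acc=$(($2 + $3)) ;;
        PRINT) echo "acc = $acc" ;;
        HALT)  break ;;         # stop the cycle
    esac
done

Running this script prints acc = 5: the "processor" fetched and executed the ADD
instruction, then the PRINT instruction, and halted. A real processor does essentially the
same thing, only in hardware and billions of times per second.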
Some machine instructions can change the sequence of instruction execution by
instructing the processor to move to another place in the program (that is, simply put,
by explicitly changing the current value of the instruction counter). Such instructions are
called jump instructions. There is a distinction between conditional and
unconditional jumps: a conditional jump instruction first checks the truth of
some condition and performs the jump only if the condition holds, while an unconditional
jump instruction simply forces the processor to continue executing instructions from
a given address without any checks. Processors usually also support jumps that memorize
a return point; these are used to call subroutines.

1.1.4. External devices


The CPU and memory (RAM and persistent memory) essentially make up what is
called a computer, but if we only had them, it would be surprisingly useless. First, a
computer somehow needs to receive programs and data from the outside world, and
somehow give the results back. Secondly, the contents of RAM are instantly lost when
the power is turned off, so it is not suitable for any long-term and reliable storage of
information. Clearly, a computer needs devices other than RAM and the processor. In
early computer systems, all devices were connected directly to the CPU, but it quickly
became clear that this was inconvenient: the CPU had to be redesigned for each new
device; meanwhile, external devices are usually much simpler than the CPU, and new ones,
naturally, appear more often than new processors do. In addition, there are too many different external
devices. It is simply technically impossible to organize support for all or at least a
significant number of such devices in a CPU. All this led to the idea of the common bus
described above.
External devices are connected to the common bus via a controller, an electronic
circuit that can communicate with the CPU via the bus; all controllers do this in the same
way, regardless of which devices they control, i.e. from the CPU's point of view all
controllers are "one and the same". The "other end" of each controller connects directly
to its external device and controls its operation. Interaction of the central processor with
the controller is based on the familiar read and write operations; moreover, on some
architectures memory cells and controllers share one common address space and are
interchangeable, that is, the central processor "does not know" whether it is dealing with
real memory or with a controller; a more popular approach, however, is for
controllers to have a separate address space21; in this case one speaks of the addresses of
I/O ports. A single controller can support one or more such "ports", that is, it can respond
to several different addresses when operating on the bus. The operations of reading from
an I/O port and writing to such a port from the central processor's point of view look
exactly the same as the read and write operations on memory cells, but controllers, unlike
memory cells, do not memorize the values sent to them during a "write"; they perceive
them as instructions to do something. During a "read" from a port, controllers do not
return any previously stored value but one that reflects, in some way, the state of the
controller itself, allowing the processor to find out, for example, whether the current
operation has completed or whether the controller is ready for the next one.
How many ports a particular controller supports, what prescriptions can be given to
it and what the codes of these prescriptions are, what the values read from its ports
mean - none of this is known to the CPU, because all of it depends on the particular
controller and may differ from one controller to another. To work with each specific
controller you need a special program, which, as we have already said, is called a driver;
running on the CPU, the driver program gives commands to write and read the I/O ports
of "its" controller, solving its tasks. Typically, the driver is part of the operating system,
or becomes part of the operating system once it is loaded.

Fig. 1.3. Hierarchy of memory devices (registers, RAM, disks, tapes)

For example, if a user runs a certain program, and this program needs to write to a file on
the disk22, then for
this purpose the program will address the operating system with a request to write such
and such data to such and such a file, the operating system will calculate in which place
of the disk the corresponding file is or should be located and will, in its turn, turn to
the driver (which is actually a separate part of it) with a request to write certain data to a
certain place on the disk; then the driver, armed with its knowledge of the capabilities of
the disk controller, will give the controller the necessary commands through its I/O ports.

1.1.5. Memory hierarchy


Information in a computer system can be memorized and stored by devices of
different types, depending on how fast access to the information needs to be, how long it
must be stored, and how large its volume is.

21 In fact, in this case there are two different buses in the computer: one for memory cells
and one for controllers.
22 In fact, in the case of the disk things are a bit more complicated; we will leave a more
detailed discussion of what goes on until volume two.

The hierarchy of memory devices is
schematically shown in Fig. 1.3. The most immediately needed information is kept in the
registers of the central processor. However, the amount of register memory is fixed once
and for all when the processor is designed and cannot be increased; it is kept small because
each new CPU register increases the complexity of the CPU circuit, requires the
introduction of additional instructions and, in general, can significantly increase the cost
of the processor.
Cache memory is designed to increase the speed of access to data stored in main
memory. Cache memory duplicates data from main memory that is most frequently used
by the running program; it is important to understand that this data is a copy of the data
stored in main memory and not something else. The speed of access to the cache is
significantly higher than to the main memory, because the processor does not need to use
the bus to interact with the cache - and the bus is quite slow due to its relatively long
length, and its work cannot be significantly accelerated, as it is limited by the speed of
light and other physical reasons. At the same time, the cache itself has a rather complex
structure, and its volume is relatively small.
The cache usually has several levels; in modern conditions a four-level scheme is typical.
Each subsequent level is in some sense "further" from the processor's computational
circuits, due to which it has a lower access speed, but has a larger volume. Usually all
cache levels, except for the last one, are physically realized in one chip with the processor,
while the last one is a separate circuit located next to the processor (between it and the
bus).
RAM is the primary storage for running programs and the data needed to run them.
RAM can be relatively large, and its cost has decreased in recent years. However, it may
not be enough. In addition, the contents of RAM, cache, and registers are lost when the
computer is shut down, so these types of storage devices are not suitable for long-term
data storage.
At the next level of the hierarchy are magnetic disks or, in general, long-term storage
devices that allow access to data in any order. In addition to magnetic disks themselves,
devices in this class include, for example, flash card drives. The now defunct magnetic
drums also belong to this class. The volume of such devices can be orders of magnitude
larger than the volume of RAM, and their cost significantly lower. In addition, the
information stored on disks is not lost when the power is turned off and can be kept for
a long time. On the other hand, access to disks requires I/O operations that are slow
compared to the speed of the processor and RAM; moreover, the processor cannot
address disks directly, so all the information it is to work with - both the code of
programs and the data for them - must first be copied into RAM. As already mentioned,
it is on disks that the files familiar to every computer user are located.
Disk storage can last for years, but is still limited. For archival purposes, magnetic
tape drives (streamers) are used. Tapes are the most reliable, long-lasting and cheapest
(per unit volume) way to store data. The disadvantage of tapes is that data blocks cannot
be accessed in arbitrary order; as a rule, data from tapes is copied to disks before use. In recent
years, with the growth of hard disk capacities, tapes have become relatively rare; streamers
are now found only in organizations dealing with large data archives. A modern streamer
cassette can store several terabytes of data; the streamer itself, i.e. the device for working
with such cassettes, is much more expensive than an ordinary hard disk of similar
capacity, but the cassettes for it are comparatively cheap; when using a large number of
cassettes, the specific cost of storage (i.e. the total cost of storage divided by the volume
of stored information) can be lower by dozens of times compared to the use of hard disks.
In addition, tapes are much more durable than hard disks if storage conditions are met;
tapes recorded in the 1960s are still perfectly readable more than half a century later, as
long as you can find a working device to handle tapes of the right format.

1.1.6. Summary
So, let us summarize: a computer is based on a common bus to which
the RAM and CPU are connected; external devices - hard disks and other drives,
as well as keyboards, monitors, sound devices, and in general everything you are used
to seeing in a computer that is neither CPU nor memory - are also connected to the
common bus, only not directly but through special circuits called controllers. The
processor can work with memory by itself; working with all other devices requires special
programs called drivers. Disk storage devices are used for long-term storage of
information, where information is usually organized in the form of the files you are
familiar with; files can store both data and computer programs, but to run a program or
process data, both must first be loaded into RAM.
Among all programs a special place is occupied by the one called the operating
system; it is launched first and gets full access to all the capabilities of the computer
hardware, while all other programs are launched under the control (and supervision)
of the operating system and have no direct access to the hardware; to perform actions
that do not reduce to transforming information in their allocated memory,
programs have to turn to the operating system.

Understanding these basic principles of the design of computer systems is vital to a
programmer's work and to anyone who chooses to learn programming, so if something
remains unclear, try rereading this chapter, and if that does not help, ask someone more
experienced to explain whatever you did not understand.

1.2. How to use a computer properly


1.2.1. Operating systems and types of user interface
If you install Windows, you will be happy for a
month, then you will suffer for the rest of your life. If
you install Unix, you will suffer for a month, then you
will be happy for the rest of your life.
Overheard from students

In the previous section we mentioned a special program called the operating
system. Its main tasks include, firstly, controlling the start and termination of other
programs; secondly, the operating system takes over the control of peripheral devices in
all their diversity and provides all other programs with simplified means of access
to the peripherals: thus, a user program can ask the operating system to open a file for
reading (specifying only the file name), read information from it, place this information
in a specified area of RAM, and then close the file. The program does not "care" what type of
disk the file is located on: a hard disk built into the computer (of which, by the way, there
are many very different kinds), an optical CD or DVD, an old-style floppy disk,
a flash key or even the disk of another computer made available as a
network resource. The operating system takes care of all the technical operations needed
to find the file with the right name and extract information from it.
The first operating systems appeared back in the 1960s, and in the half-century since
then, of course, there have been many; but strangely enough, the number of fundamentally
different operating systems has dwindled considerably by now. Everyone knows the word
"Windows": it is the name of Microsoft's systems. However, virtually all operating
systems existing today that are not related to Microsoft (and, consequently, do not have
the word "Windows" in their name) turn out to be members of a family of systems
collectively called Unix. These are the freely
distributed Linux systems in all the variety of their distributions, such as Debian, Ubuntu,
Fedora, Slackware, Gentoo and many others, as well as numerous systems of the BSD
family - FreeBSD, OpenBSD and others. In addition, the Unix family includes Android,
which is based on the Linux kernel, as well as Mac OS X and iOS, which trace their
origins back to the BSD family.
The role, tasks, and principles of operating systems are a topic for a long discussion,
and we will come back to this issue more than once; for now, let us note that the tasks of
operating systems, contrary to a common misconception, in no way include the
organization of interaction with the user, i.e. the person who works with the computer.
The point is that operating systems are themselves quite complex - they are nearly the
most complex programs in the world - so their creators usually try to implement everything
that can be done outside the operating system precisely there, outside it. And the means
of communication between the computer and the user - the so-called user interface - does
not have to be part of the operating system; it can be supported by an ordinary program.
This is how it is done in all Unix variants; for drawing windows, switching between them
and so on, various add-ons are responsible, written in the form of ordinary programs that
run under the control of the operating system, but are not part of it. In Microsoft operating
systems, on the contrary, support for the graphical interface is included in the kernel of
the system, which leads, in particular, to the impossibility for the user to choose his own
interface; he has to work with the only interface provided by the system.
Since we are talking about the graphical user interface, it should be noted that in some
cases it is not needed in the system at all. For example, server computers, which serve user
requests coming over the network, are most often mounted in special hardware racks;
depending on the task, one rack may hold from a dozen to two or three
hundred computers, each with its own processor, memory and peripherals. The processors
and power supplies of computers need cooling - usually air cooling, that is, by means of
ordinary fans; with so many computers in one place, these fans can make a lot
of noise. Therefore a special room is usually set aside for servers, a room in which a person
is uncomfortable: because of the noise; because of the lowered air temperature,
specially maintained by air conditioners to ensure more reliable operation of the
computers; and because of the drafts created by the fans. Moreover, the physical absence of people
in the server room improves conditions for the work of computers - no one brings dust
and dirt into the room, does not release excess moisture into the air, does not trip over
wires. Therefore, people enter such a room only when something must be done with
the equipment - repairing it, installing new hardware, replacing old hardware; it may happen
that not a single living soul appears in the server room for several months. All configuration
and management of server machines is done remotely, from other rooms where the
workstations of programmers and system administrators are located. Some server computers
do not even have a video card, so a monitor cannot be connected to them; before USB became
widespread, a keyboard could not be connected to such computers either. Why, one
may ask, should such machines support a graphical interface that no one will ever see?
Most end users37 of computers these days do not understand how a computer can be
used at all if it does not have a graphical user interface, but this is largely just a consequence
of the propaganda of some corporations. Up until the mid-1990s, graphical user interfaces
were not as widespread as they are now, which did not prevent people from using
computers, and even now many users prefer to copy files and view the contents of disks
using two-panel file managers such as Far Manager or Total Commander, whose
ideological predecessor was Norton Commander, which worked in text mode. Curiously,
even the traditional window interface, which implies the ability to resize windows, move
them around the screen, partially overlap them, etc., was in the 1980s
and 1990s often implemented without any graphics at all, on the screen of an alphanumeric
monitor.
However, both Norton Commander and all its later clones, and window interfaces that
used text mode (and in the days of MS-DOS they were very popular), although they do
not use graphics as such, are still based on the same basic principle as the now familiar
"icon-menu" interfaces: they use the screen space to place the so-called interface
elements, or widgets, which usually include menus, buttons, checkboxes and
radioobuttons. checkboxes and radiobuttons, fields for entering text information, as well
as static explanatory inscriptions; the use of graphical mode somewhat expands the
repertoire of widgets, including windows with pictograms ("icons"), all sorts of sliders,
indicators and other elements that the developer had enough imagination. Meanwhile,
observing the work of professionals - programmers and system administrators, especially
those who use Unix systems, one can notice another approach to human-computer
interaction: the command line. In this mode, the user enters commands from the
keyboard, prescribing the execution of certain actions, and the computer executes these
commands and displays the results on the screen; once upon a time, this was called the
dialog mode of work with the computer, in contrast to the batch mode, when operators
formed packages of tasks received from programmers in advance, and the computer

37 An end user is usually defined as a person who uses computers to solve some tasks that are not related
to the further use of computers; for example, a secretary or a designer using a computer is an end user, but
a programmer is not, because the tasks he solves are aimed at organizing the work of other users (or even
himself) with the computer. It is quite possible that the end users will be those who use the program that
the programmer is currently writing.
processed these tasks when ready.
Initially, the dialog mode of working with computers was built around so-called
teletypes38: electromechanical typewriters connected to a communication
line. The original purpose of teletypes was to transmit text messages over a distance; not so long
ago telegrams were used for urgent messages: a letter carrier delivered them to the addressee's
home, and the received telegram was a strip of printed text issued by the teletype, cut
with scissors and pasted onto a sturdy backing.
the operator typed on the keyboard was transmitted to the communication line, and whatever
came from the communication line was printed on paper. For example, if two teletypes were
connected to each other, the operators could "talk" to each other; in fact, that's how telegrams
were transmitted, except that the lines of communication between the teletypes were
automatically switched in much the same way as wireline telephony lines are switched, and in
some cases wireline telephones were used as lines of communication. Telegrams have been
almost completely displaced by the development of digital communication networks - mobile
telephony and the Internet.
The idea of connecting a teletype to a computer dates back to the first generation of
computers; teletypes were mass-produced for telegraphy and were available on the market, so
they did not need to be developed, and computer engineers of that time had other things to worry
about. When working with a computer in dialog mode through a teletype, the operator typed a
command on the keyboard, and the computer's response was printed on the paper tape.

Fig. 1.4. ASR-33 teletype with paper tape punch and reader 15

15 Photo from Wikipedia, see http://en.wikipedia.org/wiki/File:ASR-33_at_CHM.agr.jpg.
Interestingly, this mode of operation "lasted" for a surprisingly long time: it was completely
eliminated from practice only by the end of the 1970s.
Using a teletype as a computer access device had an obvious disadvantage: it was very
paper-hungry. Precisely this was the reason for the mass transition from traditional teletypes to
alphanumeric terminals, which were equipped with a keyboard and a display device (a screen)
based on an electron-beam tube (kinescope); everything the operator typed on the keyboard was
transmitted to the communication line, as in the case of a teletype, and the information received

38 It is interesting to note that in Russian the word "teletype" has become firmly established as a
generic designation for such devices, while English-language sources more often use the term
"teleprinter"; the point is that Teletype was a registered trademark of one of the manufacturers
of such equipment.
from the communication line was displayed on the screen, thus avoiding wasting paper.
Saving paper is by no means the only or even the main advantage of the screen in
comparison with paper tape, because on the screen the image can be changed anywhere at
any time; control character sequences known as escape sequences (after the special
character Escape, which has code 27) were introduced for terminals almost immediately:
on receiving one of them, the terminal would move the cursor to a specified position on
the screen, change the color of the output text, and so on.
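Today's terminal emulators still understand the escape sequences of the once widespread
VT100 family, so the mechanism is easy to observe; the following sketch uses two
well-known ANSI/VT100 sequences (here \033 is the octal code of the Escape character):

printf '\033[1;31mAttention!\033[0m\n'    # print "Attention!" in bold red, then reset the attributes
printf '\033[2J\033[10;20HHello\n'        # clear the screen and print "Hello" at row 10, column 20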
Alphanumeric terminals are no longer produced these days; if necessary, their role can be
played by any laptop equipped with a serial port or a USB-serial adapter, provided the
appropriate software is run on it. Incidentally, this is how the initial configuration of the
above-mentioned server machines that have no video card is done: the system administrator
connects his work computer via a COM port to the server being configured and runs a
terminal emulator. This makes it possible to boot the operating system from external media,
install it on the server machine, and set up communication with the local network along with
remote access tools; further configuration, as well as management during operation, is
usually done remotely over the network, because that is more convenient: the machine being
configured no longer needs to be connected directly to the administrator's machine by a cable.
A special program called a command line interpreter is used to process commands
entered by the user. This program usually issues a short prompt and waits for the user to
type a line and press Enter, then performs the actions prescribed by the entered line. In
the simplest case, such a line contains a single command, although the interpreter allows
several commands to be placed on the same line and even cleverly linked together so
that they interact with each other. The first word of a command is treated as its name, the
other words as parameters. Some command names the interpreter "knows" itself; such
commands are called built-in commands. All other names the interpreter treats as
names of external programs: it finds the corresponding executable files on disk and runs them.
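The difference is easy to observe in practice: most interpreters have a built-in command
type that reports how a given name would be interpreted. A hypothetical session (the exact
wording of the output varies between interpreters; this is roughly what bash prints):

type cd    # "cd is a shell builtin": the interpreter handles it itself
type ls    # "ls is /bin/ls": an external program found on disk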
One sometimes hears that Unix systems are supposedly user-unfriendly because of
the command line. This is, of course, a myth, one based on confusing cause and effect.
It is easy to disprove this myth. Hardly anyone who has ever seen an Apple notebook will
claim that MacOS X lacks a graphical user interface; on the contrary, its interface is just
about the most luxuriant there is. Most "ordinary" users are quite satisfied with it, but as
soon as yet another MacBook ends up in the hands of a professional, a terminal emulator
with a command-line prompt suddenly appears amid all the splendor of the graphical interface.
Practically the same thing happens on modern distributions of freely distributed Unix
systems oriented to the end user. The variety of graphical shells used there is striking. As
already mentioned, the graphical user interface is not part of the operating system;
moreover, unlike commercial systems, including MacOS X, the appearance of the user
interface in traditional systems is not hardwired into a graphical add-on, but is
implemented by a separate program called a window manager. The user can choose whatever
appearance and functionality of the window system is more convenient for him and,
with some skill, change the window system's look and behavior - say, to match
his mood - right in the middle of a session, without even closing the
windows of running applications.
Of course, for Linux and FreeBSD there have long existed, among other things, "icon-
§ 1.2. How to use a computer properly 107
based" file managers, and quite a few of them have been written - Nautilus, Dolphin,
Konqueror, PCManFM, Thunar, Nemo, SpaceFM, ROX Desktop, Xfe, and others.

Fig. 1.5. vt100 terminal 17

17 Ibid, see http://en.wikipedia.org/wiki/File:DEC_VT100_terminal.jpg

Even more widely represented are two- and three-panel file managers that continue the
tradition of the famous Norton Commander: the
text-based Midnight Commander, as well as the graphical gentoo (not to be confused with
the Linux distribution of the same name), Krusader, emelFM2, Sunflower, GNOME
Commander, Double Commander, muCommander, and so on. Nevertheless, many
professionals prefer to work with files - copying them, renaming them, sorting them into
separate directories39,40, moving them from disk to disk, deleting them - using command-line
commands. This is explained by one very simple fact: it really is more convenient and
faster.
Interestingly, command line facilities are also present in Windows family systems;
you can get a terminal window with a corresponding prompt there by pressing the
sacramental "Start" button, selecting "Run" from the menu and entering three letters
"cmd" as the command name; but the standard Windows command line prompt is very
primitive, it is inconvenient to use, and most users are not even aware of its existence. It's
not suitable for professionals either, so in the Windows world even they have to make do
with graphical interfaces, using the command line only on rare occasions, usually related
to system maintenance. Programmers who are used to Unix systems and for one reason
or another are forced to work with Windows often install command line interpreters
ported from Unix; for example, such an interpreter is included in the MinGW package.
Of course, the command line requires some memorization, but there are not many

40 Nowadays the term "folder" is in common use; this term, which actually denotes an element of the
graphical interface - a "box of icons" - is not acceptable for naming the file system object that contains
file names. In particular, folders are not necessarily represented on disk in any way, and their icons do
not have to correspond to files; at the same time, one can work with files and directories without any
"folders": neither two-panel file managers nor the command line involve any "folders". In this book we
use correct terminology: we treat the terms "directory" and "catalog" as equivalent, and the word
"folder" appears in the text only when we need to remind you of its inappropriateness.
commands to be memorized; meanwhile graphical interfaces, despite all the claims about
their "intuitive comprehensibility", also require a good deal of memorization: consider
merely the use of the Ctrl and Shift keys in combination with the mouse when selecting
items (this, at least, is fairly simple, because the result is immediately visible) and when
copying files, moving them and creating "shortcuts". Learning to work with graphical
interfaces from scratch, i.e. when the learner has no experience with computers at all,
turns out to be harder than learning to work with command-line tools; the general public
is slowly ceasing to notice this, simply because nowadays people get used to graphical
interfaces from pre-school age thanks to their ubiquity - which, in turn, is more the result
of the efforts of the PR departments of certain commercial corporations than a consequence
of their very questionable convenience. Often a user gets used not to graphical interfaces
in general but to one particular version, and finds himself completely helpless when, for
example, switching to another version of the operating system.
Of course, before command line tools became really convenient, they had to go
through a long path of improvement. Modern command-line interpreters "remember"
several hundred of the last commands entered by the user and make it quick and
effortless to find the desired command among those memorized; in addition, they allow
the command being entered to be edited with the arrow keys and "guess" a file name from
its first letters typed; some variants of the command-line interface give the user
contextual hints as to what else can be written at the current point of the command, etc.
Working with such a command line can be several times or even dozens of times faster
than performing the same actions with the help of any "tricked-out" graphical interface.
Imagine, for example, that you have returned from a trip to Paris and want to copy photos
from your camera card to your computer. Commands
cd Photoalbum/2015
mkdir Paris
cd Paris
mount /mnt/flash
cp /mnt/flash/dcim/* .
umount /mnt/flash

taking into account file-name autocompletion and the command history, these can be typed
in six or seven seconds without much hurry, since most of the text will not have to be typed
at all: the Photoalbum subdirectory is probably the only one in your
home directory whose name starts with Ph, so you can type just those two letters, press
the Tab key, and the command interpreter will add the name Photoalbum with a slash
after it; the same can be done while typing the command "mount /mnt/flash" (the
mnt directory is most likely the only directory in the root directory that starts
with m, and its flash subdirectory is most likely the only one that starts with f);
instead of "cp /mnt/flash/dcim/* ." an experienced user will type "cp
!:1/dcim/* .", and the interpreter will substitute the first argument of the previous
command for "!:1", i.e. "/mnt/flash"; the command "umount /mnt/flash" does not
need to be typed at all: it is enough to type "u!m" (the text of the last command starting
with m will be substituted for !m), or simply press the up arrow twice and add the letter
u to the beginning of the command mount /mnt/flash that appears on the screen.
If you perform the same actions through the icon interface, you will first need to click
the mouse to get to the contents of the card, then, using the mouse in combination with
the Shift key, mark the entire list of files, right-click the mouse to call the context menu,
select "copy", then find (using the same mouse clicks) the Photoalbum/2015
directory, call the context menu again, create the Paris subdirectory, double-click it,
and finally, calling the context menu for the third time, select "paste". Even if you do
everything quickly, this procedure will take you at least twenty or thirty seconds, if not
more. But this, strange as it may seem, is not the main thing. If you, for example, very
often copy photos to your disk, then using the command line this procedure can be
automated by writing a so-called script - an ordinary text file consisting of commands.
For our example, the script might look like this:

#!/bin/sh
cd Photoalbum/2015
mkdir $1
cd $1
mount /mnt/flash
cp /mnt/flash/dcim/* .
umount /mnt/flash

but an experienced user is likely to write the script more flexibly:


#!/bin/sh
DIR=Photoalbum/2015
[ "$1" = "" ] && { echo "No dir name"; exit 1 }
mkdir $DIR/$1
mount /mnt/flash
cp /mnt/flash/dcim/* $DIR/$1
umount /mnt/flash

If you now name either of these two scripts, e.g. getphotos, the next time you need
to copy new photos (e.g. when you return from Milan), all you have to do is give the
command

./getphotos Milan

This trick does not work with graphical interfaces: unlike commands, mouse movements
and clicks cannot be formally described, at least not in a way that is simple enough for
practical use.
Note that it is also better to launch graphical/window programs from the command
line rather than using all sorts of menus. For example, if you know the address of the site
you want to go to, the easiest way to launch a browser is to give the command:

firefox http://www.stolyarov.info &


The name of the program to be launched (in this case firefox41) is not so long that
typing it should pose any problems, especially taking autocompletion into account: on the
author's computer, for example, it was enough to type just the letters fir and press Tab;
the address of the site, of course, would have to be typed on the keyboard in any case,
only perhaps not on the command line but in the appropriate browser window.

41 Paying tribute to the popularity of firefox, the author nevertheless considers it necessary
to note that he himself stopped using this browser in 2018, due to the unjustified "weighting
down" of its graphical interface, and switched to palemoon.
The expressive power of modern command interpreters is remarkable: for example,
you can use the text output from one command as part of another command, not to
mention that the result of one program can be sent to the input of another program, and
thus build a whole chain of information transformations called a pipeline. Every program
that appears on your system, including those written by you personally, can be used in an
infinite number of combinations with other programs and with the built-in tools of the
command-line interpreter itself; in doing so you can put other people's programs to uses
their authors never even suspected. In general, if the possibilities of a
graphical user interface are limited by the imagination of its developer, the
possibilities of a properly organized command line are limited only by the
capabilities of the computer.
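As a small illustration (a sketch reusing the made-up directory names from the examples
above), the output of one program can be fed to another through a pipeline, and the output
of a whole command can be embedded into another command:

ls /mnt/flash/dcim | wc -l             # count the photos on the card
du -s Photoalbum/* | sort -n           # album sizes, smallest first
echo "Copied $(ls | wc -l) files."     # one command's output inside another

Note that wc and sort know nothing about photos or albums, yet they combine with ls and
du into a tool their authors never had to anticipate.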
In any case, these possibilities are certainly worth exploring. Of course, overcoming
the influence of the propaganda of the "software" monsters and convincing all computer users
to switch to the command line is an unrealistic task in modern conditions; but since
you are reading this book, you are apparently not quite an ordinary user. So, for an IT
professional, fluent command-line skills are practically mandatory; the absence of
these skills drastically reduces your value as a specialist. In addition, the command-line
interface proves to be extremely useful during initial programming training as, if you will,
a teaching aid. The reasons for this are detailed in the "methodical preface", but that preface
may be hard going for a non-specialist; in that case the author can only ask the reader
to take the importance of the command line on faith for a while - not for long, as it will
soon become clear anyway.
Our entire book is written assuming that you have a Unix system installed on your
computer and that you are using a command-line interface to work with your computer;
the remainder of this chapter is devoted to how to do this. Although we've already
mentioned it in the preface, we feel it's worth repeating: if you want to learn anything
from this book, the command-line interface should become your primary way of working
with computers on a daily basis, and you should do so as soon as possible.

1.2.2. History of Unix OS


In the late 1960s, a consortium of General Electric, MIT, and Bell Laboratories (then
a division of AT&T) was developing the MULTICS operating system. The MULTICS
project is sometimes spoken of as a failure; either way, Bell Labs withdrew from the
project at some point. Among the Bell Labs employees involved in the MULTICS project

was Ken Thompson. According to one legend, he was interested at that time in a then-new
area of programming: computer games. Because of the high cost of computing equipment
in those days, Ken Thompson had certain difficulties using computers for entertainment
purposes, and so he took an interest in a PDP-7 machine available at Bell Labs; this
machine was already obsolete and, as a consequence, there were not many other claimants
to it. Thompson was not satisfied with the system software that came standard with that
machine, so, drawing on the experience gained in the MULTICS project, he wrote his own
operating system for the PDP-7. Initially, Thompson's system was a dual-task system,
i.e. it allowed two independent processes to run, matching the number of terminals
connected to the PDP-7 [2].
The name UNICS (similar to MULTICS) was jokingly suggested by Brian
Kernighan. The name stuck, only the last letters of CS were later replaced by a single X
(the pronunciation remained the same). Ken Thompson was joined in its development by
Dennis Ritchie, and the two of them transferred the system to the more advanced PDP-11
minicomputer. It was then that the idea arose to rewrite the system (at least as much of it
as possible) in a high-level language. Thompson tried to use a stripped-down dialect of the
BCPL language, which he called "B", but the language was too primitive for
this: it did not even have structured data types. Ritchie proposed extending the language;
to name the resulting language the authors took the next letter of the English alphabet
after "B" - the letter "C".
In 1973, the system created by Thompson was rewritten in C. For its time this was
a more than dubious step: the prevailing view was that high-level languages were
fundamentally unsuitable for anything as low-level as an operating system. Time showed, however,
that this very step determined the tendencies of industry development for many years. The
C programming language and the Unix operating system retain their popularity almost
half a century after the described events. Apparently, the reason is that Unix turned out to
be the first operating system rewritten in a high-level language, and C became that
language.
In 1974, Thompson and Ritchie published an article describing their achievements.
The PDP-11 was a very popular machine at the time, installed in many universities and
other organizations, so after the article was published many people wanted to try the new
system. At this point in the story, the special position of AT&T played an important role:
anti-trust restrictions prevented it from participating in the computer business, or indeed
any business outside telephony. Copies of Unix with source code were therefore made
available to everyone on a non-commercial basis. According to one legend, Ken
Thompson signed each copy, recorded on a reel of magnetic tape, with the words
"love, ken" [3]. The next big step was porting Unix to a new architecture. The idea was
put forward by Dennis Ritchie and Stephen Johnson and tested on the Interdata 8/32
machine. As part of this project, Johnson developed a portable C compiler, very nearly
the first portable compiler in history. The porting of the system was completed in
1977.
The most important contribution to the development of Unix came from researchers
at UC Berkeley. One of the most popular branches of Unix, BSD, now represented by
FreeBSD, NetBSD, OpenBSD, and BSDi, was created there; in fact, the acronym BSD
stands for Berkeley Software Distribution. Unix-related research began there in 1974;
Ken Thompson's lectures at Berkeley in 1975-1976 also played a role. The first
version of BSD was created in 1977 by Bill Joy.
In 1984, the anti-trust restrictions were lifted from AT&T after one of its divisions
was spun off; AT&T management began a rapid commercialization of Unix, and the free
distribution of Unix source code stopped. The following years were marked by
confrontations and exhausting litigation between Unix developers, in particular between
AT&T and BSDi, which tried to continue development on the basis of BSD. The
uncertainty over the legal status of BSD stalled the development of the Unix community
as a whole. Beginning in 1987, work was done at Berkeley to remove AT&T's proprietary
code from the system. The legal disputes were not resolved until 1993, when AT&T sold
its Unix division (Unix Software Labs, USL) to Novell; the latter's lawyers identified
three disputed files out of 18,000 (!) and reached a settlement with UC Berkeley that
put an end to the conflict.
While the Unix developers were busy squabbling, the market was flooded with cheap
Intel-based computers and Microsoft operating systems. The Intel 80386 processor,
introduced in 1986, was quite suitable for Unix; there were attempts to port BSD to the
i386 platform, but (not least because of the legal problems) nothing was heard of these
efforts until early 1992.
Another interesting line of events can be traced back to 1984, when Richard Stallman
founded the Free Software Foundation and published its ideological manifesto. The
nascent social movement set itself the goal of creating, to begin with, a free operating
system. Reportedly, it was Stallman who in 1987 convinced the researchers at Berkeley
to purge BSD of the code owned by AT&T. Stallman's supporters managed to create a
substantial number of free software tools, but without a completely free OS kernel the
goal remained distant. The situation did not change until the early 1990s. In 1991, Linus
Torvalds, a Finnish student, began work on a Unix-like operating system kernel for the
i386 platform, without using code from other operating systems.
According to Torvalds himself, his creation was first conceived as a terminal emulator for
remote access to a university computer: the corresponding Minix program did not satisfy him.
Wishing at the same time to get to know the i386 hardware, Torvalds decided to write his terminal
emulator as a program independent of any operating system. A terminal emulator implies two
data streams running in opposite directions; to handle them, Torvalds wrote a CPU time
scheduler, which does essentially the same thing as the schedulers in the kernels of multitasking
operating systems. Later he needed to transfer files, so the terminal emulator acquired a disk
drive driver; eventually the author was surprised to find himself writing an operating system [4].
Torvalds published his interim results openly on the Internet, which allowed first
dozens and then hundreds of volunteers to join the development.
The new operating system was named Linux after its creator. It is noteworthy that this
name was given to the system by one of the project's outside participants; Torvalds
himself had planned to name the system "Freax". The first publicly available code
(version 0.01) appeared in 1991, the first official version (1.0) in 1994, the second in
1996. As Linus Torvalds himself points out, the legal war between AT&T and the
University of California at Berkeley, which prevented the distribution of BSD on the
i386, played an important role in Linux's meteoric rise. Linux got a big head start,
eventually leaving BSD in second place: nowadays BSD systems are less common,
although they are still actively used. Torvalds'
kernel solved a major problem for the social movement led by Richard Stallman: a
completely free operating system was finally available. Moreover, Torvalds decided to
use Stallman's license, the GNU GPL, for the kernel, so Stallman and his associates could
simply declare that they had achieved their goal.
The current Linux kernel source code includes code written by tens of thousands of
people. One consequence of this is that it is fundamentally impossible to "buy" Linux:
the kernel, as a copyrighted work, has too many copyright holders for any agreement with
all of them to be seriously contemplated. The only license under which the Linux kernel
can be used is GNU GPL v.2, originally (at Stallman's suggestion) adopted for the kernel
source code; one feature of this license is that every programmer who makes a
copyrightable contribution to the kernel accepts the terms of the GNU GPL by the very
fact of such a contribution, that is, agrees to make the results of his work available to
everyone under its terms.
Nowadays, the trademark "Unix" is not used to name specific operating systems.
Instead, one speaks of Unix-like operating systems, which form a whole family. The most
popular are Linux, represented by several hundred distributions from various vendors,
and (by a noticeable margin) FreeBSD. Both systems are freely distributed. In addition,
we should mention the commercial systems of the Unix family, of which the best known
are SunOS/Solaris and AIX.
After nearly half a century of history, Unix - no longer as a specific operating system,
but as a general approach to building them - does not look obsolete at all, although it has
undergone virtually no revolutionary changes since the mid-1970s. Even the creation of
the X Window graphical add-on did not significantly change the fundamentals of Unix.
Note that the well-known Android is nothing more than Linux with its own graphical
shell (made to Google's order specifically for Android); something similar is true of
Apple computers: MacOS X is a descendant of the BSD systems.

1.2.3. Unix on a home machine


Among modern versions of Unix systems, many are end-user-oriented, so you can
install them without any problems by following the step-by-step instructions that abound
on the Internet. There you can also download the systems themselves, as well as programs
for them for almost any purpose: one of the key differences between the Unix world and
the world of "commercial" systems is that no one will ask you to pay for mass-produced
software; money is usually asked only when a program is custom-made for you (and it
pays for the programmers' work, not for the program itself), but such things are usually
the province of corporate users, not individuals.
There are really only two main options when choosing a system: Linux (in all the
splendor of its hundreds of available distributions) or something from the BSD family
(FreeBSD, OpenBSD, NetBSD). We will take the liberty of recommending Linux as your
first system; any distribution will do. If you like, you can try installing something from
the BSD family later, when you know what you are doing.
The question of where to put a Unix system may immediately arise. Of course, if you
are used to using other systems, it will be difficult to abandon them all at once, but it is
not necessary; the transition to Unix can be made smoothly and gradually.
The easiest way to solve the problem is to install Unix on a separate computer; this
is a good option if you have an "obsolete" computer that you stopped using when you
bought a newer model but for some reason didn't sell or give away. You may be surprised
by how undemanding Unix systems are of hardware: a computer that is already ten years
old and that you thought worthless will fly under Linux. Professionals have no problem
installing Linux on Pentium-1 class machines manufactured in the mid-1990s, although
we would probably not advise you to do so: many modern programs are not designed for
such hardware, so you would have to use old, unsupported versions of them. However,
you are unlikely to find a working Pentium-1 anyway; anything newer will suit you fine.
If you don't have a suitable computer lying around, you can try to buy such a machine
second-hand at some online flea market; as a rule, they go for very little money - a
kilogram of sausage may cost more.
If you don't have a separate computer, you can install Linux on one of the disks of
your existing machine. In principle, Linux can be installed on any logical disk, but it will
take up the whole of that disk, because Linux's file system format is completely different
from that of Windows. If your computer's hard disk is divided into logical partitions
("C:", "D:" and so on), you can pick one of them, make copies of all the files you need,
delete that logical disk (it is easier to do this with Windows tools, although it can also be
done during Linux installation), and then install Linux into the freed space. If you have
software that can resize existing logical disks, it will be even easier to shrink one of them
so that some unused space appears on your hard disk. Approximately 10-15 GB will be
more than enough.
From the very beginning, it is better to settle on "lightweight" window managers, such
as IceWM, Blackbox, Fluxbox and others. The author of these lines uses on all his
computers the ancient and rather ascetic fvwm2, which he started with when he switched
to Linux in 1994; however, he would probably not recommend it to the reader. What
should be avoided as far as possible is the use of "heavy" environments, especially KDE
and GNOME; after all, you need an operating environment in order to run programs in it,
and the creators of the "heavy" environments seem to forget this: their products
themselves eat up the lion's share of system resources. Moreover, the so-called "desktop
metaphor" - drawing icons and folders - is a phenomenon obviously alien to the Unix
world; it was implemented on Unix systems to make it easier to persuade end users to
switch to Unix, but for our purposes the desktop metaphor is definitely harmful. It is
likely that your chosen Linux distribution will by default run some variant of a "desktop
environment" (a term that covers graphical shells implementing the desktop metaphor
and distinguishes them from simple window managers); try to replace it with a simple
window manager as soon as possible, especially since your distribution almost certainly
provides packages with window managers - you just need to install them.
Whether you are installing the system yourself or someone is helping you, you should
immediately find out how to run a terminal emulator; it can be one of the programs
xterm, konsole, Terminal and others. Usually the default size of a terminal
emulator window is 24 lines of 80 characters; you should not change the number of
characters per line - it is optimal - but the size of one character (and, consequently, of the
entire terminal window) should be adjusted so that the window, while still 80 characters
wide, covers most of your screen. It is recommended to set the background color to black
right away, with gray (not white, but gray) letters; this is less tiring for the eyes.
Make sure that the terminal window can be brought up effortlessly, by pressing a key
combination or by clicking an icon located somewhere visible; invoking the terminal
window through hierarchical menus is too time-consuming. It is desirable to set up your
operating environment so that the terminal window opens immediately with the desired
font size. If you don't know how to do this, make sure you figure it out: look for
instructions on the Internet (there almost certainly are some), or find a forum where you
can get help. The comfort of your working environment is very important; your
self-learning may fail simply because you were too lazy to set up the environment
properly from the beginning.
Immediately install the programs you need for everyday use - a browser (one will
probably be set up along with the system), LibreOffice for "office" file formats, atril
for reading PDF files, eog for viewing images, mplayer and/or vlc for audio
and video playback. Think about what else you normally do on your Windows computer
and find out how the same things are done in Linux; you can be sure it is all possible.
While studying the material in this book, you will also need programmers' text editors
(it is better to install vim, joe, and nano, so that they are always at hand), a Free
Pascal compiler (the corresponding package may be called fpc or fp-compiler;
you don't need the integrated environment), the NASM assembler (the package
is usually called nasm), the C and C++ compiler gcc (make sure you install the
C++ part of the compiler too - sometimes the two come in different packages; C++ itself,
however, will not be needed for quite a while), the make build system (just in case, note
that on BSD systems this version of the builder is called gmake), and the gdb debugger.
After installing all this, you are ready for further work.
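To give a concrete illustration (assuming a Debian-family distribution; package names
and the installation command differ between distributions), the whole set could be
installed in one go from a root shell:

root@host:~# apt-get install vim joe nano fpc nasm gcc g++ make gdb

On other distributions the package manager will be different, but the idea is the same.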
From the very beginning, be sure to create a non-privileged user account for normal
work and do all your work under it. Daily work with the rights of the root user (i.e.
the system administrator) is categorically unacceptable; some programs will even refuse
to start that way. You should log in with administrator rights only when necessary (for
example, to install additional programs, adjust system settings, etc.), and it is better to do
so from the text console rather than from the graphical shell; to switch to the text console
press Ctrl-Alt-F1, and to switch back to the X Window shell press Alt-F7, Alt-F8 or
Alt-F9, depending on the configuration of your system. Note that using the sudo
command to elevate your rights from user to administrator is also highly undesirable,
despite the fact that various manuals (especially those oriented toward the Ubuntu
distribution) suggest doing exactly that. Experienced Unix users often do not allow
sudo on their systems at all.

1.2.4. First session in the computer lab


While it is easiest to install one of the Linux distributions on your home computer, in
various computer labs you may encounter the same Linux operating system, or perhaps
another Unix system such as FreeBSD; in some cases the operating system you need may
be running either directly on the machine you are working at or on a shared server that
you access remotely. From the user's point of view, the differences between these options
are not too great.

If you are in a computer lab, you will probably receive brief instructions on how to
log in from your teacher or from the lab's system administrator, along with your login
name (login) and password. So, enter your login name and password. If you make a
mistake, the system will display the message Login incorrect, which can mean
either a typo in the login name or an incorrect password. Note that letter case matters in
both, so the reason the system does not accept your password could be, for example, an
accidentally pressed CapsLock key.
You will need a command line interpreter to work with the system. When using
remote terminal access (for example, via the putty program), the command line is
the only means of working with the system available to you. A prompt will appear as
soon as you enter the correct name and password. If you are working in a Unix terminal
class and logging in at a text console, you will likewise get a prompt immediately after
entering a valid username and password, but in this case you also have the option of
running one of the available graphical window interfaces, which is more convenient if
only because you can open several windows at the same time. To start the X Window
graphical shell, give the command startx (some systems may require a different
command; ask your system administrator). It is also possible to log in through the GUI
right away; this option exists both when working on a local machine and when using
remote access. Once you have a working graphical shell, you should run one or more
instances of xterm or some equivalent; they look like graphical windows in which the
command prompt runs.
Your first action on the system, unless it is your personal computer, should be to
change your password. Depending on the system configuration, this may require the
passwd command or (in rare cases) some other command; your system administrator
will probably tell you which. Type this command (without parameters). The system will
ask you first for your old password, then (twice) for your new one. Note that nothing is
displayed on the screen while you type a password. The password you come up with
must be at least eight characters long and should contain upper and lower case Latin
letters, digits and punctuation marks. The password should not be based on a
natural-language word or on your login name. At the same time, it should be a password
you can easily remember. The easiest way is to take some memorable phrase containing
punctuation marks and numerals and build the password from it: numerals are rendered
as digits, other words contribute their first letters, and the letters corresponding to nouns
are capitalized while the rest are lowercase. For example, the Russian proverb "One with
a plough, seven with a spoon" yields, from its Russian wording, the password
"1sS,7sL.". One last thing: do not share your password with anyone, and never let
anyone work in the system under your name. Phrases like "I don't care", "I trust my
friends" or "I don't have anything secret there anyway" are amateurishness and
thoughtlessness in the worst sense of the word, and you will realize it as you gain
experience.

1.2.5. Directory Tree. Working with files


Here and below we will often need examples of a dialog with the command line
interpreter, that is, fragments of text that include commands typed by the user together
with what those commands printed in response. In such cases, before the commands
themselves, we will show the prompt - the short line that the command line interpreter
prints to indicate that it is waiting for a command. Depending on your settings, the
prompt can look quite different; in our examples it will contain the username, the short
name of the computer we are working on, the current directory, and the traditional $
symbol indicating that we are working as an ordinary user (for the administrator this
symbol is replaced by #). The username of the author of this book is avst, but other
users will appear in the examples; we will assume that our computer is simply called
host. The user's home directory is abbreviated by the ~ symbol, so, for example, the
prompt avst@host:~$ means that we are working on the host computer
under the avst account in the home directory; if we went into the work subdirectory,
the prompt would be avst@host:~/work$.
The directory system in Unix differs significantly from what users of MS-DOS and
Windows are used to; the most noticeable differences at first glance are the absence of
letters denoting devices (A:, C:, etc.) and the fact that directory names in Unix are
separated by an ordinary slash (/) instead of a backslash.
After logging in, you will find yourself in your home directory. The home directory
is where your personal files are stored. To find out the name (path) of the current
directory, type the pwd command:

lizzie@host:~$ pwd
/home/lizzie

You can find out which files are in the current directory by using the ls command:

lizzie@host:~$ ls
work tmp

Unix file names can contain any number of dots in any position, i.e., for example,
a.b..c...d....e is a perfectly valid file name. Names starting with a dot
correspond to "invisible" files; the ls command does not show them unless
specifically requested. To see all files, including invisible ones, add the -a parameter:

lizzie@host:~$ ls -a
. .. .bash_history work tmp

Some of the names shown may correspond to subdirectories of the current directory,
others may have special meanings. To make it easier to distinguish files by type, you can
use the -F flag:
lizzie@host:~$ ls -aF
./ ../ .bash_history work/ tmp/

Now we see that all names except .bash_history correspond to directories. Note
that "." is a reference to the current directory itself, and ".." is a reference to the
directory containing the current directory (in our example, /home).
You can move to a different directory with the cd command:
lizzie@host:~$ pwd
/home/lizzie
lizzie@host:~$ cd tmp
lizzie@host:~/tmp$ pwd
/home/lizzie/tmp
lizzie@host:~/tmp$ cd ..
lizzie@host:~$ pwd
/home/lizzie
lizzie@host:~$ cd /usr/include
lizzie@host:/usr/include$ pwd
/usr/include
lizzie@host:/usr/include$ cd /
lizzie@host:/$ pwd
/
lizzie@host:/$ cd
lizzie@host:~$ pwd
/home/lizzie

The last example shows that the cd command without specifying a directory makes the
user's home directory current, as it was immediately after logging in.
Table 1.1. Commands for working with files

cp      copying a file
mv      renaming or moving a file
rm      deleting a file
mkdir   creating a directory
rmdir   deleting a directory
touch   creating a file or setting a new modification time
less    viewing the contents of a file page by page

It is important to understand that in Unix the current directory is a separate notion for
each running program (a so-called process), so if you open, say, two or more terminal
emulators, the current directory in each of them will change independently of the others.
Moreover, each of the many programs running in one terminal can change its own current
directory as it pleases without affecting the other programs - but to verify that for
yourself, you will have to start programming.
The basic commands for working with files are listed in Table 1.1. For example, the
command cp file1.txt file2.txt will create a copy of file1.txt named
file2.txt; the command rm oldfile will delete the file named oldfile. Most
commands accept additional flag options beginning with the "-" sign. For example,
the command rm -r the_dir deletes the directory the_dir along with
all its contents.
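As an illustration, here is a possible session using the commands from Table 1.1 (all
file and directory names are, of course, invented for the example):

lizzie@host:~$ mkdir drafts
lizzie@host:~$ touch drafts/letter.txt
lizzie@host:~$ cp drafts/letter.txt drafts/copy.txt
lizzie@host:~$ mv drafts/copy.txt drafts/letter2.txt
lizzie@host:~$ rm drafts/letter.txt drafts/letter2.txt
lizzie@host:~$ rmdir drafts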
The system allows you to use several kinds of file names. An absolute filename
uniquely identifies a file in the system and does not depend on the current directory; in
Unix, such a name always starts with the "/" character, which denotes the root directory.
To give examples, note that the root directory usually contains a first-level directory
named home, which is traditionally where the personal directories of the system's users
reside. If there is a user named vasya on the system, his personal directory will usually
be called vasya too; the absolute name of that directory is the string /home/vasya.
If we create in it a directory photos, in which we place the file mars.jpg, the
absolute name of this file will look like this: /home/vasya/photos/mars.jpg.
If our current directory is /home/vasya/photos, we can refer to the same
file by a short filename containing no directory names at all; in this case it is
mars.jpg.
When in some other directory, we can use a relative file name, one that contains
directory names but does not start with "/"; such a name is interpreted starting from the
current directory. Relative filenames often use the name ".." (two dots), which denotes
the parent directory, that is, the directory one level above the current one. For example,
if we are in the /home/vasya directory, we can refer to the same file by specifying
the string photos/mars.jpg as its name; if our current directory is /home/anya,
we will need the path ../vasya/photos/mars.jpg; if we are in the
/home/anya/work/progs directory, the path will be longer:
../../../vasya/photos/mars.jpg. The parent directory name can also be
used by itself, without combining it with other directory names; for example, from the
/home/vasya/photos/milan directory we could "reach" the same file through
the name ../mars.jpg.
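For instance, a session with the hypothetical user vasya and the directories from our
examples might look like this:

vasya@host:~$ cd /home/vasya/photos
vasya@host:~/photos$ pwd
/home/vasya/photos
vasya@host:~/photos$ cd ../..
vasya@host:/home$ ls vasya/photos
mars.jpg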
The system also allows combinations that look rather pointless at first sight, such as

work/progs/../../work/../../vasya/photos/../photos/mars.jpg

or

photos/../photos/../photos/../photos/mars.jpg

Obviously, these strange paths can be written shorter and clearer:
../vasya/photos/mars.jpg and photos/mars.jpg respectively. Strange as
it may seem, in some cases when writing programs the operating system's ability to
handle such names successfully turns out to be convenient, but describing those cases
here would take too long.
Moreover, the system allows a single dot to be used in complex names; a name of this
kind exists in every directory and refers to that directory itself, which at first glance makes
it useless: for example, ./././././photos/./././././././mars.jpg is
exactly the same as simply photos/mars.jpg. We will soon encounter situations in
which "a reference to the directory itself" finds a use and turns out to be meaningful.
A complex file name containing directories and "/" characters is often called a path
to the file. When working with files, wherever a file name is expected, a complex path,
absolute or relative, can be specified, so the term "file name" is often abandoned in favor
of "path to the file" or even simply "file path", although the latter is, of course, jargon
resulting from a too literal translation of the English file path.
From the very beginning, we will give the reader one piece of strong advice: never
use Russian letters, spaces, or any punctuation marks in file names, except for the
period and the underscore. If you really want to, you may also use the minus sign, but
never start a file name with it, because almost all commands that work with files treat
words starting with a minus as special option keys (and then it becomes tricky to do
anything with such a file; in fact, you can do anything with it, but you need to know and
remember how). All the other "tricky" characters in filenames can also cause problems
sooner or later; all of these problems are actually quite easy to solve, but avoiding a
problem is always easier than solving it.
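For example (a standard trick, given here purely as an illustration), if a file named
-oops has somehow appeared, you can delete it by writing its name so that it no longer
starts with a minus; the exact text of the error message in the first attempt depends on
the rm version:

lizzie@host:~$ rm -oops
rm: invalid option -- 'o'
lizzie@host:~$ rm ./-oops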

1.2.6. Command and its parameters


We have already mentioned (see page 77) that in dialog mode the command
interpreter issues a prompt and waits for the user to enter a string. In the simplest case
this string consists of one command, but it can contain more; the easiest way to achieve
this is to separate the commands with a semicolon, so that the interpreter executes first
one command, then the other:

lizzie@host:~$ ls -a ; pwd ; echo abrakadabra


. .. .bash_history work tmp
/home/lizzie
abrakadabra
lizzie@host:~$

Later we will consider a number of constructions that involve several commands in one
line; the semicolon is interesting only as the simplest among them. It is important to
realize that the ";" symbol is not part of either the first command or the second.
Each command consists of words, of which the interpreter considers the first to be
the name of the command and the rest to be its arguments; in our example, ls, pwd,
and echo are command names, -a is the argument with which we asked ls
to show "invisible" files, and the word abrakadabra is the argument of
echo, which simply printed it (this command is designed to print its arguments). In
the previous section we used arguments to specify file and directory names;
commands and programs can give their command-line arguments the most varied
meanings, but from the interpreter's point of view all these arguments are nothing more
than strings.
Since we are talking about words, it is easy to guess that the space character is
somewhat special from the interpreter's point of view: it separates words from each other.
The tab character could in theory play the same role, but in modern command interpreters
working in dialog mode the tab is used for something else: on receiving this character,
the interpreter tries to complete the word being entered (we will discuss this feature in
detail in §1.2.8). The number of spaces between two words can be arbitrary; it does not
affect anything, as can be seen from the example of two echo commands:

lizzie@host:~$ echo abra kadabra
abra kadabra
lizzie@host:~$ echo abra        kadabra
abra kadabra

As we can see, inserting extra spaces changes nothing. But what if we need a
parameter that contains spaces - for example, if you encounter a file with a space in its
name? Even if you strictly follow the advice given at the end of the previous section and
never use spaces in filenames, other computer users (especially those used to Windows,
who don't know there is anything else in the world) are often not so careful, so a file with
spaces in its name may be handed to you on a flash drive, arrive in the mail, be
downloaded from an Internet site, and so on.
The command line interpreter provides three basic ways to deprive a character of its
special role: to "escape" it with a backslash "\", to enclose it in apostrophes, or to enclose
it in double quotes. For example, if a friend with whom you recently traveled to Paris
gave you a flash drive with photos, on which you discovered a directory named
Photos from Paris, it would be best to rename the directory immediately,
replacing the spaces with underscores; you can do this with one of the following
commands:

mv Photos\ from\ Paris Photos_from_Paris
mv 'Photos from Paris' Photos_from_Paris
mv "Photos from Paris" Photos_from_Paris

Similarly, if you want the echo command to print words separated by more than
one space, you can arrange that with quotes too:

lizzie@host:~$ echo abra     kadabra
abra kadabra
lizzie@host:~$ echo "abra     kadabra"
abra     kadabra

Escaping can achieve the same thing, but it is inconvenient: you have to put a backslash
before each space. Note that these methods deprive of their special meaning not only the
space but also the other "tricky" characters. In particular, the semicolon usually separates
one command from another, but if you need it as an ordinary character, any of the three
methods will do:

lizzie@host:~$ echo \; ';;;;' ";;;;"
; ;;;; ;;;;

Note that the escape character, apostrophes, and double quotes themselves disappear once
they have done their job, so the programs and commands we run do not see them (unless
we make them stand for themselves).
Within apostrophes, only the apostrophe itself has a special meaning - it is treated as
the closing character, and it cannot be deprived of this role; all other characters, including
double quotes, the backslash, and anything else, stand for themselves and have no special
meaning:

lizzie@host:~$ echo ',?*$#@!()&\/";'
,?*$#@!()&\/";

If you need an apostrophe character as itself, there are two options: either escape it or use
double quotes:

lizzie@host:~$ echo I\'m fine "I'm fine"


I'm fine I'm fine

Double quotation marks differ from apostrophes in that they deprive not all special
characters of their special meaning. In particular, the escape character works inside
double quotes:

lizzie@host:~$ echo "\"\\"
"\

Looking ahead, we note that inside double quotes the characters "!", "`" (backquote)
and "$" also retain their special meaning.
It is very important to realize that apostrophes and quotation marks only change the
meaning of the characters inside them; they do not split the text into separate words, and
you can even use different kinds of quotation marks within one word:

lizzie@host:~$ echo "abra"schwabra'kadabra'
abraschwabrakadabra

On the other hand, with their help you can create an empty word by putting two double
quotes or two apostrophes in a row ("" or '').

1.2.7. File name templates


In many cases it is convenient to perform some operation on several files at once.
For this purpose, the command interpreter supports filename substitution according to a
given template. The interpreter treats as a template any word, not enclosed in quotes or
apostrophes, that contains at least one "*" or "?" character, or square or curly brackets.
A question mark in a template matches one arbitrary character, and an asterisk matches
an arbitrary chain of characters (possibly even an empty one); we will explain the
meaning of the brackets a little later. All other characters in a template stand for
themselves. When the interpreter encounters such a template on the command line, it
replaces it with the list of all filenames matching the template; in general, the template
word may be replaced by a sequence of words of arbitrary length: one word, ten, a
hundred, a thousand - depending on how many files match. For example, instead of a
template consisting of a single asterisk (or any number of asterisks), the interpreter will
substitute the list of all files in the current directory; instead of the "???*" template, all
filenames consisting of at least three characters will be substituted, and instead of
"???" - filenames consisting of exactly three characters. The "*.txt" template will
be replaced by the list of all filenames with the suffix .txt, and the img_????.jpg
template by names like img_2578.jpg, img_cool.jpg, and so on.
A reader accustomed to the traditional terminology of the Windows world may be surprised
by the use of the term "suffix" instead of "extension". The point is that in MS-DOS and early
Windows the "extension" of a file was inherently tied to its type and was treated separately from
the name, whereas in Unix systems the ending of a file name never played such a role and has
always been simply part of the name.


Templates can be used in any commands that take lists of file names as arguments.
For example, the command rm *~ will delete all files in the current directory whose
name ends with a tilde, the command ls /etc/*.conf will show a list of files with
the .conf suffix in the /etc directory, the command cp files/* /mnt/flash
will copy all files from the files subdirectory of the current directory to the
/mnt/flash directory, and so on. Generally speaking, the command we make work
with templates does not have to deal specifically with files; for example, the command
echo * will print a list of files in the current directory, even though the echo
command itself has nothing to do with files and does not work with them: it prints its
command line arguments.
Square brackets in a template denote any one character from a given set; for
example, the template "img_27[234][0123456789].jpg" matches the names
img_2720.jpg, img_2721.jpg, ..., img_2734.jpg, ...,
img_2749.jpg and no others. Of course, in real life you can take advantage of the
fact that the directory most likely contains no file with something other than a digit in
that position and use the shorter template "img_27[234]?.jpg". The exclamation
point, by contrast, allows you to denote any character except those listed; for example,
"[!_]*.s" matches any filename with the suffix ".s" except those beginning with an
underscore.
Curly brackets in templates denote any chain of characters from those explicitly
listed, the chains being separated by commas. For example, the "*.{jpg,png,gif}"
template matches all names of files in the current directory that have the suffix .jpg,
.png, or .gif.
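Here is what template substitution might look like in a hypothetical directory:

lizzie@host:~/photos$ ls
img_2720.jpg img_2721.jpg img_2749.jpg notes.txt
lizzie@host:~/photos$ echo img_27[234]?.jpg
img_2720.jpg img_2721.jpg img_2749.jpg
lizzie@host:~/photos$ echo *.{jpg,txt}
img_2720.jpg img_2721.jpg img_2749.jpg notes.txt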
If no filename matches the template, the interpreter leaves the template unchanged,
that is, it passes the word on to the command as if it were not a template at all. This
feature should be used with caution; note also that most command interpreters other than
the Bourne Shell do not have it.
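This behavior is easy to observe with the echo command (assuming the current
directory contains no file with the suffix .nosuch):

lizzie@host:~$ echo *.nosuch
*.nosuch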

1.2.8. Command history and filename autocompletion


Novice command-line users often assume that every command has to be typed by
hand, letter by letter; that was indeed the case in the command interpreters used forty
years ago, but those days are long gone.
First of all, modern command interpreters can complete file names; this feature is
invoked with the Tab key. When you start typing a file name, you can press Tab, and if
there is only one file on the disk whose name begins with the letters already entered, the
interpreter will complete the name for you. If there is more than one matching file,
nothing visible happens on the first press of Tab, but you can immediately press it a
second time, and the interpreter will show a list of all matching names; a glance at the
list will in most cases tell you how many more letters you need to type before pressing
Tab again. Some interpreters can complete not only filenames but also other parameters,
depending on the command you are entering. This feature alone, called autocompletion,
can save you more than half of your keystrokes.

The second nice feature of the interpreter, which makes the user's life much easier,
is that it remembers the history of the commands you have entered and saves this history
to a file when you end a session, so you can reuse your commands the next day or a week
later. If the command you need was given recently, you can bring it back to the screen
by pressing the up-arrow key; if you accidentally skip past the desired command during
this "upward movement", you can go back by pressing the down-arrow, which is quite
natural. Any command retrieved from the saved history can be edited using the familiar
left and right arrows, Home, End and Backspace keys in their usual roles.
The entire saved history can be viewed with the history command; in most cases
it is more convenient to combine it with the less pager, i.e. to give the command
history | less. There you will see that each command memorized by the
interpreter has a number; knowing the number, you can repeat any of the old commands
using an exclamation mark: for example, !137 will execute the command stored in
the history as number 137. Note that "!!" denotes the last command entered, and
"!!:0", "!!:1", etc. denote individual words from it; an individual word can be
extracted not only from the last command - for example, !137:2 denotes the second
word of command number 137; "!abc" denotes the last command starting with the
string abc, and individual words can be extracted from it too.
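For illustration, a fragment of a session might look like this (the history numbers are, of
course, invented):

lizzie@host:~$ history | tail -3
  135  cd /usr/include
  136  ls -a
  137  echo abra kadabra
lizzie@host:~$ !137
echo abra kadabra
abra kadabra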
Finally, you can search the history for a substring. To do this, press Ctrl-R (R for
reverse) and start typing the substring. As you type, the interpreter will find ever older
commands containing the typed substring. Pressing Ctrl-R again takes you to the next
(i.e., still older) command containing the same substring.
If you get confused while editing a command, searching the history, etc., you can
discard the input line at any moment by pressing Ctrl-C; this is much faster than, say,
deleting all the entered characters with Backspace.
Once you start actively using the tools listed here, you will soon find that your effort
spent typing commands has dropped by a factor of twenty or more. Do not neglect these
features! Mastering them will take you about five minutes, and the time saved will
amount to hundreds of hours.

1.2.9. Task management


Not all commands are executed instantly, as pwd, cd, and ls were in our examples;
it is often necessary to tell a running program that it is time to terminate, and if it does
not take the hint, to terminate it forcibly.
The first thing to remember when working at the Unix command line is that many
programs read data from the keyboard (strictly speaking, "from the standard input
stream") until an end-of-file situation occurs. Beginners often have some difficulty with
this point: how a file can end is more or less clear, but how can the same thing happen to
the keyboard?! Actually, there is nothing complicated here. Unix uses the generalized
notion of a "data stream", which can mean reading from a file, input from the keyboard,
or something else entirely; we will discuss this in detail later. One of the fundamental
properties of a data stream is its ability to come to an end.
Of course, the keyboard itself cannot run out, but the user entering data has every
right to decide that he has already entered everything he wanted to. To inform the active
program of this (i.e. the program currently reading what is typed on the keyboard), press
the Ctrl-D key combination; the operating system (or, to be precise, the terminal
driver) will then arrange the "end of file" situation in the corresponding input stream,
and although the keyboard has not gone anywhere, the active program will know for
certain that its input stream has run out. By the way, the command interpreter conversing
with us also handles the end-of-file situation correctly, so if you want to end your session
in one of the command-line windows, the most correct and fastest way is to press
Ctrl-D. Note, on the other hand, that closing a terminal window by means of the
window manager (double-clicks of any kind, or menus) is the most incorrect way; you
should never do this with terminals, because the programs running in the terminal may
not disappear.
The ability to simulate the "end of file" situation on the keyboard will be needed many
times in the future, so remember: "end of file" on the keyboard is simulated by
pressing Ctrl-D.
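For example, the standard wc program counts the lines, words, and bytes in its input
stream; here we type two lines on the keyboard and then press Ctrl-D (the keypress itself
leaves no visible trace on the screen):

lizzie@host:~$ wc
one two
three
      2       3      14
lizzie@host:~$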
Of course, an active program does not have to terminate when it detects end of file;
moreover, it may not read anything from its standard input stream at all, in which case it
will simply never learn that the end-of-file situation has occurred. Finally, a program may
simply "hang" because of some error. In all such cases you need to know how to terminate
the program without waiting for it to terminate by itself.
The simplest and most common way to force termination of the active program is to
press Ctrl-C; in most cases this helps. Unfortunately, there are times when Ctrl-C
has no effect; in that case you can try pressing Ctrl-\, which in some cases will stop
the active program, but afterwards (depending on system settings) a file named core (in
FreeBSD, a file with the suffix .core) may appear in the current directory; you should
delete it as soon as you notice it - it takes up a lot of space and is of no use to you for
now.
What you should definitely not do is use the Ctrl-Z combination until you
understand what it actually does in Unix. We will discuss this issue in detail later.
Before resorting to "heavy artillery", remember that terminals usually allow you to pause
output temporarily (for example, to have time to read a needed fragment of text before it
"scrolls" off the screen). Such a "pause" is activated by Ctrl-S and deactivated by Ctrl-Q.
Beginning Unix operators, accustomed to the key combinations of graphical interfaces, often
press Ctrl-S accidentally (trying, for example, to save a file in a text editor, forgetting that this
is not Windows). If you think your terminal has hung hopelessly, try pressing Ctrl-Q just in
case: it will help if the cause of the "hang" is an accidentally pressed "pause", and in other cases
nothing bad will happen anyway.
Unix beginners often make another typical mistake: not knowing how to deal with a
program running in a terminal window, they simply close the window itself, for example,
by double-clicking "where it should be". This strategy is akin to sticking your head in the
sand: you can no longer see the hung program, but that does not mean it has disappeared.
On the contrary, if the regular methods did not help, closing the terminal window will
help even less: the running task continues to work, and it may waste CPU time and
memory, and in some cases do other things as well.
If neither Ctrl-C, nor Ctrl-\, nor Ctrl-Q helped, then in order to force the
task to terminate we have to get acquainted (at least briefly) with the concept of a process
and the ways of dealing with processes. To a first approximation, a "process" is a program
that has been started and is currently executing in the system; in other words, when you
start any program, a process appears in the system, and when the program ends, the
corresponding process disappears (terminates). In reality things are a bit more
complicated: for example, a program you run can, at its own discretion, spawn several
more processes, and so on; all of this will be discussed when the time comes. For now
we are concerned with a purely pragmatic question: if we have started a program that has
created a process in the system, how can we find and destroy that process?
Note at once that every process in the system has a unique number, by which
processes are distinguished from one another. The list of processes currently running can
be obtained with the ps command:

avst@host:~$ ps
  PID TTY          TIME CMD
 2199 pts/5    00:00:00 bash
 2241 pts/5    00:00:00 ps

As you can see, by default the command lists only the processes running in this
particular session. Unfortunately, the flags of the ps command differ greatly from
version to version (in particular between *BSD and Linux). For detailed information you
should consult the documentation of your particular OS; here we only note that the
command ps ax will list all processes existing in the system, and ps axu will
additionally show information about the owners of the processes (this is true for Linux
and FreeBSD; in other operating systems, such as SunOS/Solaris, the keys of the ps
command have completely different meanings).

In some cases, the top program, which works interactively, may be useful: it
displays a list of the most active processes, updating it every second. To exit top,
press the q key.
A process can be removed by means of a so-called signal; in fact, this is what happens
when you press the Ctrl-C and Ctrl-\ combinations mentioned above: the terminal
driver sends signals to the process. Each signal has its own number, name, and some
predefined role; we cannot say much more about signals here - the concept of "sending
a signal to a process" cannot be explained properly without descending into the depths
of the system, and we do not need that now. It is enough to know, first, that a process can
be sent a signal with a given number (or name); second, that a process can decide how to
react to most signals, including not reacting at all; and third, that there are signals over
which processes have no control, which makes it possible to kill a process with certainty.
Ctrl-C and Ctrl-\ send the active process the signals SIGINT and SIGQUIT
respectively (for clarity we note that their numbers are 2 and 3, but you don't need to
remember that). Usually either of these signals causes the process to terminate
immediately; if not, the process has probably intercepted them, and to remove it you will
have to use the non-interceptable SIGKILL signal (#9). The kill command allows
you to send an arbitrary signal to a process, but before using it you need to know the
number of the process you want to kill. To find it, you usually open another terminal
window and give the command ps ax there; the resulting list shows both process
numbers and command lines, which usually lets you identify the process to be killed. For
example, if you have written a program prog, run it, and it hangs so badly that no key
combinations help, you are likely to find a line like this near the end of the ps ax
output:

2763 pts/6 R+ 0:06 ./prog

The line is identified by the program name (in this case ./prog); the process number
is at the beginning of the line (here it is 2763). Knowing this number, we can use the
kill command, remembering that by default it sends the specified process the
SIGTERM signal (#15), which can also be intercepted by the process. A different signal
can be specified either by number or by name (TERM, KILL, INT, etc.). The following
two commands are equivalent; both send the SIGKILL signal to process number 2763:

kill -9 2763
kill -KILL 2763

Very rarely, a process fails to disappear even after this. That can happen in only two cases.
First, it may be a so-called zombie process, which has in fact already terminated but remains
in the system because its immediate ancestor - the one that started it - is in no hurry, for
whatever reason, to ask the operating system about the circumstances of its descendant's
termination. You cannot kill a zombie - it is already dead; the only thing that helps is destroying
its ancestor, after which the zombie itself disappears. However, a zombie does not consume
system resources; it only takes up space in the process table, which is unpleasant but not
particularly scary.
The second situation is much worse. The process may have made a system call, i.e.
requested some service from the operating system, in the course of which the system put it into
the "uninterruptible sleep" state, and it has remained there. Normally the system puts processes
into this state for fractions of a second; if a process remains in it for a long time, in most cases
this means serious problems with your computer, for example a (physically!) damaged disk.
Unfortunately, nothing can be done here; it is impossible to force a process out of this state.
On the other hand, if your disk has started to die, you probably have more to worry about than
processes.
Nowadays it is possible to drive a process into this state without waiting for serious
hardware problems: it is enough, for example, to plug a flash drive into the computer, start
copying a large file onto it, and then pull the drive out without unmounting it. Most likely, the
process copying the file will end up in uninterruptible sleep. Pulling out a flash drive that is in
active use is, in fact, an extremely bad idea, but at least it is not as terrible as a dying hard
disk.

Which of the two situations you are dealing with can be determined from the output of the
same ps ax command. A zombie process is marked with the letter Z in the STAT column
and the word "defunct" in the command line column, like this:

3159 pts/6 Z+ 0:00 [prog] <defunct>

A process in the "uninterruptible sleep" state can be recognized by the letter D in the STAT
field:

4711 pts/6 D 0:01 badblocks /dev/sdc1

If any process stays in this state for more than a few seconds, it is a reason to check whether
everything is all right with your computer.

1.2.10. Running in the background


Some programs run for a long time without requiring user interaction through the
standard I/O streams. While such a program runs, it is convenient to be able to keep
giving commands to the command interpreter, so as not to waste time.
Suppose we need to update the database for the locate command, which finds
files in the system by part of their name. Normally this database is updated automatically,
but this is usually done at night, and if we are in the habit of switching the computer off
for the night, the data used by locate may be rather stale. The update is done with
the updatedb command, which can take several minutes to complete. We don't want
to sit waiting for it to finish, because we could spend those minutes typing in an editor,
for example. To run a command in the background, append the & symbol to the end of
the command line, for example:

avst@host:~$ updatedb &


[1] 2437
In response to our command, the system informs us that the job is running in the
background as background task #1, and the number of the running process is 2437. The
current list of running background tasks can be found with the jobs command:

avst@host:~$ jobs
[1]+ Running updatedb &

When the task is completed, the command interpreter will inform us about it. In case of
successful completion, the message will look like this:

[1]+ Done updatedb &

If, on termination, the program informs the operating system that it does not consider
its execution successful (this rarely happens with updatedb, but much more often with
other programs), the message will have a different form:

[1]+ Exit 1 updatedb &

Finally, if the background process is removed by a signal, the message will look
something like this (here, for the SIGTERM signal):

[1]+ Terminated updatedb &

When sending signals to processes that are background tasks of a given instance of the
shell, you can refer to them by background task number instead of process number by
prefixing the number with "%". Thus, the command kill %2 sends the SIGTERM
signal to the second background task. The "%" symbol without a number refers to the
last of the background tasks.
If a task has been started as a regular (foreground) task and we don't want to wait for
it to finish, we can turn it into a background task. To do this, press Ctrl-Z, which
suspends the current task. Then the bg (background) command puts the suspended task
back into execution, but now in the background. It is also possible to make any of the
background or suspended tasks the current one (i.e. the one the command interpreter
waits for); this is done with the fg (foreground) command.
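A typical session might look like this (the exact messages depend on your interpreter):

avst@host:~$ updatedb
^Z
[1]+  Stopped                 updatedb
avst@host:~$ bg
[1]+ updatedb &
avst@host:~$ fg
updatedb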
Remember: the Ctrl-Z combination does not kill the active task, it only
temporarily suspends its execution. This is especially important for those who are used
to working with "console" programs in Windows, where this combination has a
completely different meaning. If you are used to that, it is time to break the habit.
Note that background execution is especially useful for launching windowed
applications, such as a web browser, a text editor running in a separate window (e.g.
geany), or simply another instance of xterm. When we start such a program, we
usually do not want our command interpreter to wait for it to finish instead of accepting
new commands from us.
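For example (the process number is, of course, invented):

avst@host:~$ geany mynotes.txt &
[1] 3012
avst@host:~$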
1.2.11. Redirecting I/O streams
In Unix systems, running programs communicate with the outside world through
so-called I/O streams. Each such stream allows a sequence of bytes to be received from
outside (input) or, conversely, transmitted outside (output); the bytes may come from the
keyboard, from a file, from a communication channel with another program, from a
hardware device, or from a partner across a computer network, and similarly they may
be output to the screen, to a file on disk, to a communication channel, to a hardware
device, or sent out over a network. A program can handle several I/O streams
simultaneously, distinguishing them by numbers; these numbers are called descriptors.
Virtually all Unix programs follow the convention that the stream with descriptor 0
is the standard input stream, the stream with descriptor 1 is the standard output stream,
and the stream with descriptor 2 is the stream for error messages. When receiving and
passing data through the standard streams, most programs make no assumptions about
what a particular stream is actually connected to. This allows the same programs to be
used both for work at the terminal and for reading from and/or writing to files. Command
interpreters, including the classic Bourne Shell, provide means for controlling the I/O of
the programs they run; the symbols <, >, >>, >& and | serve this purpose
(see Table 1.2).
Unix usually has a less program that allows you to page through the contents of
files using the up arrow, down arrow, PgUp, PgDn, etc. keys for scrolling. The same
program allows you to page through the text submitted to it for standard input. Using the
less program is useful if the information given by any of the programs you run
does not fit on the screen. For example, the command
ls -lR | less

will allow you to view a list of all files in the current directory and all its subdirectories.
Note that many programs output their error messages and warnings to the diagnostic stream. To view the messages of such a program page by page (for example, of the C compiler gcc), you must issue a command that merges the diagnostic stream with the standard output stream and feeds the merged result to the input of less:
gcc -Wall -g myprog.c -o myprog 2>&1 | less

Table 1.2. Examples of I/O redirections

cmd1 > file1          run the cmd1 program, directing its output to the file
                      file1; if the file exists, it will be overwritten from
                      scratch; if it does not exist, it will be created
cmd1 >> file1         run the cmd1 program, appending its output to the end
                      of the file file1; if the file does not exist, it will
                      be created
cmd2 < file2          run the cmd2 program, giving it the contents of file2
                      as standard input; if the file does not exist, an error
                      will occur
cmd3 > file1 < file2  run the cmd3 program, redirecting both its input and
                      its output
cmd1 | cmd2           run the cmd1 and cmd2 programs simultaneously, feeding
                      the standard output of the first to the standard input
                      of the second (a so-called pipeline)
cmd4 2> errfile       send the stream of error messages to the errfile file
cmd5 2>&1 | cmd6      merge the standard output and diagnostic streams of the
                      cmd5 program and direct them to the standard input of
                      the cmd6 program

If for some reason you are not interested in some output stream produced by a program, you can redirect it to the /dev/null pseudo-device: anything directed there simply disappears. For example, the following command will generate a list of all files on your system, except for the directories it has no permission to read, while all error messages are discarded:

ls -l -R / > list.txt 2> /dev/null

Device files, which include /dev/null, are a separate and rather serious topic, a detailed review of which we will postpone until the second volume. For now we will need only one of these files in our daily life, namely /dev/null, and only in order to send everything unnecessary there. Just in case, keep in mind that all device files are located in the /dev directory, whose name is derived from the English word devices.
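By the way, nothing prevents you from combining redirections so that a command is silenced completely; for an arbitrary command cmd, discarding both its normal output and its error messages looks like this:

cmd > /dev/null 2>&1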

1.2.12. Text editors


There are several hundred different text editors in the Unix family of operating
systems. The following is a basic introduction to some of them.
When choosing a text editor for your work, you should pay attention to whether it is suitable for writing programs. For this purpose a text editor must, first of all, work with files in plain text format; second, it must not automatically format paragraphs of text (MS Word, for example, is not suitable for this reason); and third, it must use a monospaced font, i.e. a font in which all characters have the same width. The easiest way to find out whether an editor satisfies this last property is to type a string of ten Latin letters m and, below it, a string of ten Latin letters i. In an editor using a monospaced font the resulting text will look like this:

mmmmmmmmmm
iiiiiiiiii

whereas in an editor that uses a proportional font (and is therefore unsuitable for programming), the line of i's will come out much narrower than the line of m's. For editors that work in a terminal window this property holds automatically, and we would not recommend using text editors that open their own graphical windows anyway.

Table 1.3. Commands of the vim editor

^    jump to the beginning of the line
$    jump to the end of the line
x    delete the character under the cursor
dw   delete a word (from the cursor to a space or the end of the line)
dd   delete the current line
d$   delete the characters from the cursor to the end of the line
J    merge the next line with the current one (delete the line feed)
i    start entering text from the position before the current character
     (insert)
a    the same, but after the current character (append)
o    insert an empty line after the current line and start entering text
     there
O    the same, but the line is inserted before the current line
.    repeat the last operation
u    undo the last operation (undo)
U    undo all changes made to the current line

vim editor

The vim (Vi Improved) editor is a clone of vi, the classic text editor of Unix-like operating systems.

[Fig. 1.6. Moving the cursor in vim using the alphabetic keys]

Working in this family of editors may seem a bit inconvenient to a novice user, as they are fundamentally different from the menu-driven on-screen text editors most users are accustomed to. At the same time, many programmers working under Unix systems prefer exactly these editors, because for a person who has mastered their basic functions, this variant of the interface turns out to be the most convenient for working on program text. Moreover, as the experience of the author of these lines shows, all this applies not only to programs: the text of the book you are reading was typed in the vim editor, as were the texts of all the author's other books.
If you find vim too difficult to master, there are other text editors available, two of which are described below. For readers who choose not to learn vim, here is the key sequence for leaving it: if you accidentally start vim, in almost any situation you can press Escape and then type :qa! followed by Enter, which exits the editor without saving your changes.
To start the vim editor, just give the command vim myfile.c. If the myfile.c file
does not exist, it will be created the first time you save changes. The first thing you should
realize when working with vim is that it has two modes of operation: text input mode and
command mode. As soon as you start, you are in command mode. In this mode, any keystrokes
will be taken as commands to the editor, so if you try to enter text, you won't like the result.
You can move through the text in command mode using the arrow keys, but more
experienced vim users prefer to use j, k, h, and l to move down, up, left, and right,
respectively (see Figure 1.6). To make it easier to remember these letters, note that the four
keys you need are located next to each other on the keyboard, with the h key on the left and
the l key on the right, which are used to move left and right; the letter j looks a bit like
a down arrow, and is used to move down; the only thing left is k, which is used to move up.
Table 1.4. File commands of the vim editor

:w         save the edited file
:w <name>  write the file under a new name
:w!        save, ignoring (if possible) the read-only flag
:wq        save the file and exit
:q         exit the editor (if the file has not been modified since the last
           save)
:q!        exit without saving, discarding the changes made
:r <name>  read the contents of the file <name> and paste it into the text
           being edited
:e <name>  start editing another file
:ls        show the list of files being edited (active buffers)
:b <N>     switch to buffer number N

The reason for this choice is that in Unix the arrow keys generate sequences of bytes beginning with the Esc code (27); such a sequence can be perceived by the editor as a press of the Esc key (a request to switch to command mode) followed by several single-character commands, and the only way to distinguish a sequence generated by an arrow key from the same sequence typed by the user is to measure the time between the arrival of the Esc code and the byte following it. When working over a slow link (for example, when editing a file remotely on a slow or unstable network), this method can misfire, which can be frustrating.
A few of the most commonly used commands are listed in Table 1.3. The i, a, o, and O commands put you into text input mode. Everything you enter from the keyboard is then treated as text to be inserted. Naturally, the Backspace key can be used in its usual role. In most cases the arrow keys can be used as well, but in some versions of vim, under certain configuration settings, as well as when working over a slow communication channel, the editor may not respond correctly to the arrows; in that case you have to exit input mode to navigate through the text. To exit input mode and return to command mode, press the Escape key.
To search by text, you can use (in command mode) the sequence /<text>, ending it by
pressing Enter. Thus, /myfun will position the cursor at the nearest occurrence of the string
myfun in your text. You can repeat the search by typing / and pressing Enter immediately.
You can move to a line with a given number (for example, to a line for which the compiler
has generated an error message) by typing a colon, the line number and pressing Enter.
Commands to save, load files, exit, etc. are also available via colon. (see Table 1.4).
When working with several files simultaneously, the Ctrl-^ combination allows you to quickly switch between the two most recently edited files. By default, the editor requires the current file to be saved before switching to another one, which is not always convenient; this can be overridden by enabling the hidden option with :set hidden. By the way, this and other commands can be written to the .vimrc file in your home directory, so that they are executed every time you start the editor.
The commands for selecting blocks and working with blocks deserve special mention. To
start selection of a fragment consisting exclusively of whole lines, use the V command; to
select a fragment consisting of an arbitrary number of characters, use the v command. The
selection boundary is set by arrows or by the corresponding h, j, k, and l commands.
The selected block can be deleted with the d command and copied with the y command.
In both cases, the selection is deselected and the text fragment that was under the selection is
placed in a special buffer. The contents of the buffer can be inserted into the text with the
commands p (after the cursor) and P (before the cursor). Text can also be placed in the buffer
without selection. Thus, all commands that delete certain text fragments (x, dd, dw, d$, etc.)
place the deleted text in the buffer. The yy, yw, y$ commands put the current line, the current
word, and the characters from the cursor to the end of the line into the buffer, respectively.
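For example, to move a line a few lines down, in command mode one could type the following sequence (a sketch; the numeric count prefix works with most movement commands):

dd   delete the current line (it is placed in the buffer)
5j   move the cursor five lines down
p    paste the buffered line after the current one

Typing yy instead of dd would copy the line rather than move it.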
If you decide to seriously learn vim, we strongly recommend that you go through the
vimtutor tutorial program, which usually appears on your system along with vim itself.

Nano Editor
The Nano editor has become extremely popular in the world of "pop" Linux distributions over the last ten years: some popular distributions install it as the default editor. The history of its appearance is quite interesting. Around the turn of the century, the Pine e-mail client, which worked in a terminal window, was quite popular on Unix systems. The built-in text editor of this client, originally intended for composing e-mails, was at some point released as a separate program called Pico (from the words Pine Composer); as for the Nano editor, it is a clone of Pico, implemented from scratch by the GNU project because of doubts about the licensing purity of Pico. This editor is not really intended for programming, but it does have a number of purely "programmer" features, such as syntax highlighting, automatic indentation, etc. To start this editor, as usual, you use its name as a command and the name of the file to be edited as an argument:
nano myfile.pas

There are no tricky "modes" in this editor: you can type text right away, using the arrow keys, PgUp/PgDn, Home, End, Backspace and Del in their usual roles. All these keys have "single-byte" alternatives for working over a slow communication line, but, unlike the corresponding vim commands, the location of the letters on the keyboard is not very convenient: for example, to move the cursor right, left, up and down you can use the combinations Ctrl-F, Ctrl-B, Ctrl-P and Ctrl-N (from the words forward, backward, previous, next); the arrows are probably much more convenient.
In the lower part of the screen there are two lines of hints; it is worth remembering at once that the symbol "^" there stands for Ctrl, i.e., for example, "^C" corresponds to pressing Ctrl-C. By the way, this combination is used in Nano in a very useful, though unexpected (for exactly such keys) role: it shows the row and column numbers corresponding to the current cursor position (the mnemonic here is current [position]). It is also worth noting the combinations Ctrl-O (write Out), which saves the edited file, and Ctrl-X, which exits the editor (if the text contains unsaved changes, the editor will offer to save them).
You may not immediately notice that the editor asks questions in many cases, and for this
purpose it uses the third line from the bottom, just above the prompts. In particular, when you
try to save a file, it always asks if you want to save it under this name; usually you just press
Enter (or enter a different name), but you have to notice that the editor wants something from
you first. Try saving your file right away and pay attention to the bottom of the screen, then
you probably won't miss the editor's question next time.
The Ctrl-_ combination (on most keyboards typed as Ctrl-Shift-minus) is extremely useful in programming: the editor will prompt you for the number of the row and the column you want to go to; the two numbers are entered separated by a comma, or you can enter just the row number and press Enter.
Another "secret knowledge" worth arming yourself with is the way to copy a text fragment
here. To do this, you first delete a line or several lines in a row from the text using Ctrl-K
(strangely enough, from the word cut - it's just that the letter C was already occupied), and then
press Ctrl-U (uncut) to paste the newly deleted fragment back into the text. Naturally,
between
Table 1.5: The most common commands of the joe editor
Ctrl-K D save file
Ctrl-K X save and exit
Ctrl-C bail out
Ctrl-Y delete the current line
Ctrl-K B kick off the block
Ctrl-K K end block
Ctrl-K C copy the selected block to a new location
Ctrl-K M move the selected block to a new location
§ 1.2. How to use a computer properly 136
Ctrl-K Y deselect
Ctrl-K L line number
Ctrl-Shift-'-' undo
Ctrl-' redo the canceled action (redo)
Ctrl-K F keyword search
Ctrl-L repeated search

with these actions you can move the cursor where you want it to go. Ctrl-U can be pressed
several times, which allows, firstly, to "multiply" text fragments, and, secondly, if you want to
copy a fragment rather than move it, you can first (immediately after deleting it) paste it in its
place, and then go to the desired location and paste the same fragment again.
You can learn about the Nano's additional features from the built-in description, which is
invoked by Ctrl-G or the traditional F1 key.

Editor Joe
Another popular Unix text editor is called Joe, from Jonathan's Own Editor. To run it, just
give the command joe myfile.c. If the file myfile.c does not exist, it will be created
the first time you save changes. Unlike the vim editor, the joe interface is more similar to the
text editors most users are used to. The arrow keys, Enter, Backspace, and others work in their
usual roles, and the Delete key is usually available as well. Commands to the editor are given
using key combinations, most of which begin with Ctrl-K. In particular, Ctrl-K h will
show a memo of the most used editor commands at the top of the screen (see Table 1.5).

Built-in Midnight Commander shell editor


The Midnight Commander shell (a file manager) is a clone of the once popular MS-DOS file
manager known as Norton Commander. The shell is launched with the mc command. The
built-in text editor for editing the selected file is invoked with the F4 key; if you want to create
a new file, use the Shift-F4 combination.
The interface of this editor is quite clear on an intuitive level, so we omit a detailed
description. We will limit ourselves to one recommendation. If no special measures are taken,
the editor will insert a tab character into the text instead of groups of eight spaces, which may
be inconvenient when using other editors. The only way to disable this style of filling is to set
the "Fill tabs with spaces" option. To get to the settings dialog, press F9, select the "Options"
menu item, and then select "General". To avoid losing your settings when you exit Midnight
Commander, save them. To do this, after exiting the editor, press F9, select the "Options" menu
item, and then select "Save Setup".

1.2.13. File access rights


To understand the material in this paragraph, you need to be able to handle binary and octal
number systems, and to understand what a "bit" is. If you are still unsure, skip this paragraph
and come back to it later; everything you need will be introduced and explained in §1.3.2.
Each file in Unix is associated with a 12-bit word called the file's "permissions". The lower nine bits of this word are organized into three groups of three bits; each group defines
access rights for the owner of the file, for his group, and for all other users. The three bits in
each group are responsible for the right to read the file, the right to write to the file, and the
right to execute the file. To find out the access rights of a file, you can use the ls -l
command, for example:
$ ls -l /bin/cat
-rwxr-xr-x 1 root root 14232 Feb 4 2013 /bin/cat
The -rwxr-xr-x character group at the beginning of the line shows the file type (a
minus at the very beginning means we are dealing with an ordinary file, a d would mean a
directory, etc.) and access rights for the owner (in this case rwx, i.e. read, write and execute),
the group and everyone else (in this case r-x, i.e. no write rights). As we can see, the
/bin/cat file is available for any user to read and execute, but only the root user (system
administrator) can modify it.
Since a group of three bits corresponds to exactly one digit of the octal number system [45], file access rights are often written as an octal number, usually with three digits, sometimes four. In this notation the lowest (last) digit corresponds to the rights of all other users, the middle digit to the rights of the group, and the highest (usually the first) digit denotes the rights of the owner. Execution rights are represented by the lowest bit of each group (value 1), write rights by the next bit (value 2), and read rights by the highest bit (value 4); these values are summed, so that, for example, read and write rights are denoted by 6 (4 + 2), and read and execute rights by 5 (4 + 1). The permissions of the /bin/cat file from our example can thus be encoded by the octal number 0755 [46].

[45] Number systems are discussed in detail in §1.3.2; in principle, it is not necessary to understand the octal number system to work with file access rights: it is enough to remember which type of right corresponds to which value (4, 2, 1) and that the final designation of an access mode is their sum, which, naturally, turns out to be a digit from 0 to 7.

[46] Note that the number is written with a leading zero; according to C rules this means that the number is octal, and since professional Unix users are very fond of this language, they usually write octal and hexadecimal numbers following the C conventions without explicitly saying so, i.e. assuming that they will be understood.
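Returning to the octal notation, here is a quick worked example (the file name and the rest of the ls output are, of course, hypothetical): 6 = 4 + 2 gives rw- for the owner, 4 gives r-- for the group, and 0 gives --- for everyone else:

avst@host:~$ chmod 640 notes.txt
avst@host:~$ ls -l notes.txt
-rw-r----- 1 avst avst 120 May 17 16:31 notes.txt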

For directories, the interpretation of the permission bits is slightly different. Read permission allows you to view the contents of the directory. Write permission allows you to modify the directory, i.e. to create and delete files in it; note that you can delete someone else's file, even one to which you have no access rights at all: it is enough to have write permission on the directory itself. As for the "execute" bit, for a directory it means the ability to use the contents of the directory in any way, including, for example, opening the files in it. Thus, if a directory has read permission but no execute permission, we can list its contents, but we cannot use what we see; this is a rather pointless situation, and directories are not usually configured this way. Conversely, if we have execute permission but no read permission, we can open a file in that directory only if we know its exact name; we have no way to find the name out, because we cannot list the directory. This option is sometimes used by system administrators; in most cases, however, read and execute permissions on a directory are set and removed together.
The remaining three (higher) bits of the permissions word are called the SetUid Bit (04000), the SetGid Bit (02000) and the Sticky Bit (01000). If the SetUid Bit is set on an executable file, the program, when run, executes with the rights of the file's owner (most often the root user), regardless of which user started it. The SetGid Bit works similarly, making the program execute under the group of the file's owner instead of the group of the user running it; the SetUid Bit is typically set, for example, on the passwd program. The Sticky Bit on regular files is ignored by modern systems. For directories, the SetGid Bit means that whichever user creates a file in that directory, the owning group of that file will be set to the group of the directory itself. The Sticky Bit on a directory means that even a user who has write permission to the directory can only delete the files he owns; this is used for shared storage locations such as the /tmp directory. The SetUid Bit on directories is ignored by most systems. We will return to the discussion of permissions in Volume 2.
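For instance, on a typical system the /tmp directory looks something like this (the trailing t in the permissions string is the Sticky Bit; the other output details are, of course, system-dependent):

avst@host:~$ ls -ld /tmp
drwxrwxrwt 14 root root 4096 May 17 16:31 /tmp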
The chmod command [47] is used to change file permissions. This command allows you to
set new permissions as an octal number, e.g.

chmod 644 myfile.c

sets the permissions of the myfile.c file so that only the owner may write to it, while everyone may read it.
Permissions can also be specified as a mnemonic string of the form [ugoa][+-=][rwxsXtugo]. The letters u, g, o and a at the beginning stand for the owner (user), the group, others and all at once, respectively; "+" stands for adding new permissions, "-" for removing existing ones, and "=" for setting exactly the specified permissions and removing all others. After the sign, the letters r, w and x mean, as you might guess, read, write and execute rights, the letter s means the SetUid/SetGid bits (it makes sense for the owner and the group), t means the Sticky Bit, and the letters u, g and o to the right of the action sign mean the rights currently set for the owner, the group and others, respectively. The letter X (capital) means setting the execution bit only for directories and for those files for which at least someone already has execution rights. If the chmod command is given the -R flag, it recursively changes the permissions of all files in all subdirectories of the given directory. For example, the command chmod a+x myscript will make the myscript file executable; the command chmod go-rwx * will remove all permissions except the owner's from all files in the current directory. The following command can be very useful
chmod -R u+rwX,go=rX ~

just in case you accidentally mess up the permissions on your home directory; this command
will probably restore everything to a satisfactory state. To explain, this command sets all files
in your home directory and all its subdirectories to read and write permissions for the owner;
for directories, as well as files for which execution is allowed to anyone, the owner is also
assigned execution rights. Read permissions are set for the group and other users, execution
permissions are set for executable files and directories, and all other permissions are removed.

[47] An abbreviation of the English words change mode, i.e. "change the access mode".
1.2.14. Electronic documentation (man command)
Unix distributions usually contain a large amount of documentation that can be accessed
directly during operation. Much of this documentation is in the form of files displayed with the
man (from the word manual) command.
The reference book available through the man command covers Unix OS commands,
system calls (i.e. functions that the OS kernel provides to user programs), C library functions
(and sometimes other languages supported by the system), file formats, some general concepts,
etc. For example, if you want to know all the options of the ls command, you should
give the command "man ls", and if you have forgotten in what order the arguments of the
waitpid system call go, the command "man waitpid" will help you. The man program
will find the corresponding document in the system directory and start the program to display
it. The document that appears on the screen can be scrolled with the "up arrow" and "down
arrow" keys, you can use the "space" key to skip a page of text at once. Exit from viewing the
help document by pressing the q key (from the word quit).
If the reference document you need is large and you need to find a specific place in it, it
may be convenient to search for a substring. This is done by typing / followed by typing the
string to be searched and pressing Enter. To search for the same string again, type / and press
Enter (i.e. the string itself can be omitted). To search in the opposite direction, you can use ?
instead of /.
In some cases, a system directory may contain more than one document with a given name.
For example, there is a write command and a write system call. You are unlikely to
need the write command, so if you type man write, you probably mean the system call;
unfortunately, the system doesn't know this and will give you the wrong document. This
problem can be solved by specifying the section number of the system directory. So, in our
example, the command

man 2 write

will output exactly the document devoted to the write system call, since section #2 contains
reference documents on system calls. Let's list the other sections of the system reference book:
• 1 - Unix OS user commands (such commands as ls, rm, mv, etc. are described in this
section);
• 2 - Unix OS kernel system calls;

• 3 - C library functions (this section can be referred to, for example, for information about
the sprintf function);
• 4 - device file descriptions;
• 5 - descriptions of system configuration file formats;
• 6 - game programs;
• 7 - general concepts (for example, man 7 ip will give useful information about
programming using TCP/IP);
• 8 - Unix system administration commands (for example, in this section you will find a
description of the mount command for mounting file systems).
The directory may also contain other sections, not necessarily labeled with a number; for example, when the Tcl language interpreter is installed on the system, its reference pages are usually organized into a separate section of their own.

1.2.15. Command files in the Bourne Shell


The Bourne Shell interpreter can not only work in dialog mode with the user, but also
execute programs called command files (scripts). Unix family systems provide special support
for script programming: the system kernel considers executable files to be of two kinds -
ordinary "binaries" containing specially formatted machine code, and scripts - text files, the
beginning of which specifies which interpreter to use to execute them, followed by the program
text. The first line of a script file must necessarily begin with the characters "#!", the system
recognizes scripts exactly by these two characters; further in this line the full path to the
executable file of the interpreter is written. In particular, a file with a program intended for
execution by the Bourne Shell interpreter must begin with the following line

#!/bin/sh

Of course, /bin/sh is not the only possible interpreter. For example, a Perl program may
start with the line

#!/usr/bin/perl

- if, of course, the Perl interpreter is installed in the system. To turn a plain text file into an
executable script, it is enough to form its first line from the characters " #! " and the path
to the interpreter, and set the x bit in the file access rights (see §1.2.13). In the simplest case,
after the header line, a script file contains the commands to be executed, one per line (empty
lines are ignored); we have already given examples of such scripts in §1.2.1 (see page 80), but
did not explain how to handle them. Let's try to fill this gap.
If we remember that the echo command prints its command line arguments, we can
write a script that prints a poem. Let's take any text editor and create a file humpty.sh
containing the following text:

#!/bin/sh
echo "Humpty Dumpty sat on a wall,"
echo "Humpty Dumpty had a great fall."
echo "All the king's horses and all the king's men."
echo "Couldn't put Humpty together again."

Now set this file to execute and run it:


avst@host:~$ chmod +x humpty.sh
avst@host:~$ ./humpty.sh
Humpty Dumpty sat on a wall,
Humpty Dumpty had a great fall.
All the king's horses and all the king's men
Couldn't put Humpty together again.
avst@host:~$
Pay attention to the "./" characters before the script name when launching it. The dot here
stands for the current directory, and the "/" is, as usual, a separator between the directory name
and the file name, so the mysterious "./humpty.sh" literally means "humpty.sh file
located in the current directory". But why can't you just write "humpty.sh"?
To understand the answer to this question, let's first recall how we gave the usual commands
- ls, pwd, cd, cp, rm and others, how we ran text editors and other programs installed on
the system. To do this, we used a name that didn't contain a single slash every time. Actually,
we could have done things differently - say, run the vim editor with the command
/usr/bin/vim (and on FreeBSD, /usr/local/bin/vim), and write /bin/ls instead of ls (this is where the ls executable is located on the system) [48]; but this is inconvenient, to say the least. Therefore, file names denoting programs to be run follow a convention slightly different from the usual one for file names. Absolute and relative names, i.e. any names containing at least one slash, work the same way as regular file names, but short names, i.e. names without slashes, are treated not as names of files in the current directory but as command names. The command interpreter either executes such commands
itself or, if there is no built-in command with such a name, searches for the executable file in
system directories; we will learn more about which directories are considered "system" in this
sense later.
This is why we can run the same text editor with just its short name, vim, even though
it is not in the current directory; but for the same reason we cannot use short names to run
programs in the current directory, so we have to artificially turn the short name into a relative
name by adding the sacramental "./". This is how we'll have to run our own programs when
we start writing them. Of course, we could run our example differently, either by an absolute
name (something like /home/vasya/humpty.sh) or by some more complicated relative
name - in particular, if we are in the directory /home/vasya, it has a subdirectory called
work, and we put the script in that subdirectory, we could run it with the command
work/humpty.sh. One thing is important: the command name must contain at least one
slash, otherwise the system will try to find a command with that name in the system directories,
fail, and generate an error.
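For example, with the script from above in the current directory, the two variants behave like this (the exact wording of the error message depends on the shell):

avst@host:~$ humpty.sh
humpty.sh: command not found
avst@host:~$ ./humpty.sh
Humpty Dumpty sat on a wall,
...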
Let us note one more important point. In most systems, the length of the first line of a script is
limited, and in some systems the limits are quite severe - only 32 bytes. You can pass a command line
parameter to the interpreter (the interpreter itself, as a program), but, alas, not more than one; the
system will pass the name of the script file as the second argument. We don't need this now, but in
Volume 3, as we study a variety of interpreted languages, we will encounter some inconvenience
because of these limitations.
Of course, scripts are not limited to simple sequences of commands; the Bourne Shell
interpreter allows you to execute certain commands depending on the results of condition
checks, organize loops, use variables to store information, and so on. We will now look at some
of these features, but there is one difficulty here: not all readers have enough experience to
understand what is going on. If you haven't experienced programming (in any language at all),

the rest of this paragraph may seem confusing and abstruse. There is nothing wrong with that: just skip it and come back here later, after learning the basics of programming using Pascal as an example, at least after you have finished chapter 2.2. Keep in mind that the Bourne Shell scripting language is quite specific, because it belongs to the group of command-scripting languages, in which programs longer than two or three hundred lines are not usually written (and if they are, in most cases it means that something went wrong in the management of the particular project); you should certainly not start learning programming with this language. You should continue reading this paragraph only if you already have experience of writing working programs in some language, even small ones.

[48] Actually, the command line interpreter handles the ls command and some other commands by itself, without running any external programs; but just in case, all these commands are also available as separate programs.
Like many other programming languages, Bourne Shell allows you to store information in so-called
variables - if you will, to associate some information with some name that can be accessed later to
use the previously stored information. Variables in the Bourne Shell language have names consisting
of Latin letters, numbers, an underscore and always starting with a letter. The value of a variable can
be any string of characters. To assign a value to a variable, you must give an assignment command,
for example:

I=10
MYFILE=/tmp/the_file_name
MYSTRING="Here are several words"

Note that there should not be spaces in the variable name and around the equal sign (assignment
character), otherwise the command will not be considered as an assignment, but as an ordinary
command in which the assignment character is one of the parameters. If there are spaces in the string
acting as a value, the string itself should be enclosed in quotes; if there are no spaces in the value, the
quotes can be omitted.
To refer to a variable, the $ sign is used to indicate that its value should be substituted for the
variable name. For example, the command

echo $I $MYFILE $MYSTRING

will print the string "10 /tmp/the_file_name Here are several words ". You can
enclose variable names in curly braces to make a concatenated text from variable values; for example,
the command "echo ${I}abc" will print "10abc".
The $(( )) construct is used to perform arithmetic operations. For example, the command I=$(( $I + 7 )) will increase the value of the variable I by seven. Inside the double brackets you can omit the variable reference sign: the interpreter treats as a variable name any word that cannot be interpreted otherwise in an arithmetic expression (as a number or as an operation symbol). Spaces are also optional in most cases, so you can simply write I=$((I+7)), with the same effect.
Special variables $0, $1, $2, etc. are used to access the command line arguments of the script itself, with $0 standing for the name of the script as specified by the user at startup. The variable $# expands to an integer, the number of arguments. For example, if you create a script argdemo.sh with the following text:

#!/bin/sh
# argdemo.sh

echo "My name is" $0
echo "I've got" $# "parameters"
echo "Here are the first three of them, in reverse order:"
echo "" $3 $2 $1

and run it with three parameters, the result is this:

avst@host:~$ ./argdemo.sh abra schwabra kadabra
My name is ./argdemo.sh
I've got 3 parameters
Here are the first three of them, in reverse order:
 kadabra schwabra abra
avst@host:~$

Note the "empty" parameter in the last echo command. It is needed to prevent echo from processing
command-line options starting with a minus in the parameters after it. The command accepts a few
such parameters, but only before the beginning of the normal parameters that need to be printed; if
you don't put an empty parameter at the beginning, it is possible for something starting with minus to
affect echo's operation by passing a third parameter to the script.
Bourne Shell supports subroutines, which we will not consider for the sake of space, but just in
case we note that within a subroutine the variables $1, $2, etc. denote not the script command
line arguments, but the values of the parameters with which the subroutine was called; $#
corresponds to their number.
Recall (see §1.2.6) that the $ symbol retains its special meaning inside double quotes, but loses
it inside apostrophes.
To go further, we need to know that any commands executed in the system have the property of
terminating successfully or unsuccessfully. For this purpose, programs, no matter what language they
are written in, when they terminate, inform the operating system about the success of their work in the
form of a so-called termination code; formally, this code is a number from 0 to 255, with zero being
considered a success code and all other numbers being considered unsuccessful. We will learn how
exactly all this happens later, but what is important for us now is the very fact that the termination code
exists, because in the Bourne Shell any condition - for a branch or for a loop - actually represents the
execution of a command, and its successful completion is considered as a logical truth, and its
unsuccessful completion is considered as false.
The most common way to check a condition is to use the test command built into the Bourne Shell interpreter, which can verify various assertions. If the assertion holds, the command terminates with a zero (success) return code, otherwise with a code of one (failure). A synonym for the test command is the opening square bracket symbol; in this case the command takes the closing square bracket as its last parameter (as a sign of the end of the expression), which allows you to write the expression being tested visually, enclosing it in square brackets. Here are some examples.

[ -f "file.txt" ]
# whether a file named file.txt exists
[ "$I" -lt 25 ]
# the value of variable I is less than 25
[ "$A" = "abc" ]
# the value of variable A is the string abc
[ "$A" != "abc" ]
# the value of variable A is not the string abc

This can be used, for example, in a branching construct:


if [ -f "file.txt" ]; then
    cat "file.txt"
else
    echo "File file.txt not found"
fi

Note that the same thing could have been written differently, without using square brackets, but not as
clearly:

if test -f "file.txt"; then
    cat "file.txt"
else
    echo "File file.txt not found"
fi

Of course, not only test, but also any other command can act as a condition. For example:

if mkdir new_dir; then
    echo "Directory created"
else
    echo "Failed to make new directory"
fi

In addition to branching, the Bourne Shell language supports more complex constructs, including
loops. For example, the following fragment will print all numbers from 1 to 100:

I=1
while [ $I -le 100 ]; do
    echo $I
    I=$(( I + 1 ))
done

The -le flag accepted by the test command is derived from the words less or equal; -lt stands for "strictly less than", while -gt and -ge stand for "greater than" and "greater than or equal to", respectively.
The while construct is not the only loop option available in the Bourne Shell; a second loop
construct is built with the word for, specifying the name of a variable that must run through all values
in a given word list, the in keyword, and the word list itself, ending with a semicolon; the loop
body is framed with the same words do and done. Thus, the following loop

for C in red orange yellow green blue indigo violet; do
    echo $C
done

will print the English names of the rainbow colors (in a column, since the echo command outputs a newline when it finishes). The for loop is especially convenient in combination with filename substitution (see §1.2.7).
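For instance, here is a sketch combining for with a filename pattern (assuming the current directory contains some .c files):

for F in *.c; do
    echo "=== $F ==="
    wc -l "$F"
done

This prints each file's name followed by its number of lines.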
Information about the success of command execution can be used not only in the if and while constructs, but also by means of the so-called logical connectives && and ||, which correspond to the logical operations "and" and "or". As usual, logical truth corresponds to the successful completion of a command and falsehood to an unsuccessful one; the connectives rely on the fact that for certain values of the first operand of a conjunction or a disjunction the overall result is clear without evaluating the second operand: if the first operand of a conjunction is false, the result (falsehood) is already known and nothing more needs to be done; likewise, nothing more needs to be done if the first operand of a disjunction is true. Simply put, the command line

cmd1 && cmd2

will cause the interpreter to execute cmd1 first; and cmd2 will be executed only if cmd1 completes
successfully. Conversely, the command line

cmd1 || cmd2

implies running cmd2 if cmd1 fails.
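For example (the names here are made up): the first line below creates a directory and switches into it only if the creation succeeded, while the second prints a warning only if the file is missing:

mkdir results && cd results
[ -f config.txt ] || echo "config.txt is missing"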


The relative priority of the logical connectives is traditional (i.e., "and" has higher priority than "or"). At the same time, the priority of "pipeline" operations and I/O redirections is higher than that of the logical connectives; for example, the command

cmd1 && cmd2 | cmd3

represents a conjunction of cmd1 and the pipeline cmd2 | cmd3 taken as a whole. The "truth" value of a pipeline is determined by the success or failure of the last of its component commands. As usual, the order of operations can be changed by using parentheses, e.g.:

(cmd1 && cmd2) | cmd3

In this example, the standard output of cmd1 and of cmd2 (if, of course, it gets executed at all) will be directed to the standard input of cmd3.
All of the features listed here are available not only in scripts but also in an ordinary session, i.e. we can enter, for example, a loop construct as a normal command and the loop will be executed immediately; this is not very convenient, but it can be useful. The Bourne Shell language contains many other features that we will not cover here. For more detailed information about programming in this language, refer to specialized literature (e.g., [1]).
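For instance, the following loop, typed directly at the prompt, makes all .sh files in the current directory executable:

avst@host:~$ for F in *.sh; do chmod +x "$F"; done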

1.2.16. Environment variables


Programs running in Unix OS have the ability to get some information (usually related to
global settings) from so-called environment variables. An environment is actually a set of text
strings of the form VAR=VALUE, where VAR is the name of the variable and VALUE
is its value.
In fact, the environment is different for each running program, the so-called process. A process
has the ability to change its environment: add new variables, delete existing variables, or change their
values. When one process starts another process, the spawned process usually inherits the
environment of the parent process.
One of the most important variables is the PATH variable. It contains the list of directories in which an executable file is searched for when the user gives a command without specifying a directory [49]. We should also mention the HOME variable, which contains the path to the user's home directory; the LANG variable, which multilingual applications use to determine the language in which to issue their messages; and the EDITOR variable, into which you can put the name of your preferred text editor. Of course, the list of environment variables does not end there. You can see the entire set of variables in your environment by issuing the export command without parameters.

[49] See the reasoning on p. 120.
The command line interpreter provides options for controlling environment variables. First,
at startup the interpreter copies the entire environment into its own variables (note that the
interpreter's internal variables are organized in the same way as environment variables, namely
as a set of VAR=VALUE strings), so that they can be accessed:

avst@host:~$ echo $PATH


/usr/local/bin:/bin:/usr/bin
avst@host:~$ echo $HOME
/home/stud/s2003324
avst@host:~$ echo $LANG

ru_RU.KOI8-R

In addition, the interpreter provides the ability to copy variable values back to the environment
using the export command:

PATH=$PATH:/sbin:/usr/sbin
export PATH

or just

export PATH=$PATH:/sbin:/usr/sbin

Internal variable assignments, such as those we used in the command files in the previous
paragraph, do not affect the environment by themselves.
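The difference is easy to observe (MYVAR here is an arbitrary name; sh -c starts a child process, which sees only the exported variables):

avst@host:~$ MYVAR=hello
avst@host:~$ sh -c 'echo $MYVAR'

avst@host:~$ export MYVAR
avst@host:~$ sh -c 'echo $MYVAR'
hello

The first echo prints an empty line because the child shell did not inherit MYVAR; note the apostrophes, which prevent our own shell from substituting the value itself.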
The variable can be removed from the environment using the unset and export
commands:

unset MYVAR
export MYVAR

Modifying the environment affects the execution of all commands that we give to the
interpreter, because the processes started by the interpreter inherit a modified set of
environment variables. In addition, if necessary, you can run an individual command with an
environment modified just for it. This is done in the following way:

VAR=value command

For example, to change user information, including the command interpreter used, you can use
the chfn command, which can be implemented in different ways: in some systems it
asks the user a series of questions, and in others it offers to edit a certain text, from which it
then extracts the desired values. To edit text, the vi text editor is launched by default, which
is not convenient for all users. You can get out of this situation, for example, in the following
way:
EDITOR=joe chfn

In this case, the joe editor will be launched.

1.2.17. Session logging


Workshop tasks often require you to provide a record of a program session, i.e., text that
includes both information entered by the user and information output by the program. This is
easily accomplished with the script command. To start logging, run the script
command with one parameter specifying the name of the log file. To end logging, press
Ctrl-D ("end of file"). For example:

avst@host:~$ script my_protocol.txt
Script started, file is my_protocol.txt
avst@host:~$ ls
a1.c Documents my_protocol.txt tmp
avst@host:~$ echo "abc"
abc
avst@host:~$ [Ctrl-D]
Script done, file is my_protocol.txt

The file my_protocol.txt now contains the session protocol:

Script started on Wed May 17 16:31:54 2015
avst@host:~$ ls
a1.c Documents my_protocol.txt tmp
avst@host:~$ echo "abc"
abc
avst@host:~$
Script done on Wed May 17 16:32:14 2015

1.2.18. Graphics subsystem in Unix operating system


General information
Unlike some other systems, Unix itself does not include any GUI support. To work in
graphical mode, the Unix OS uses a software package collectively called the X Window
System, which consists of ordinary user programs and is not, generally speaking, part of the
operating system.
The name "XWindows" can sometimes be found in literature and conversations. Such a name is
categorically wrong, which the creators of the X Window system insistently emphasize. The word
"window" in the name of this system should be in the singular.
Central to the operation of the X Window System is the program responsible for displaying
graphical information on the user's display. This program is called the X server. All applications
that use graphics make requests to the X server to display a particular image; thus, the X server
provides a service for displaying graphics information to applications, hence the name "X
server". The programs that access the X server (i.e. all programs running in Unix that use
graphics) are called X-clients.
X-clients of a special type, called window managers, deserve special mention. A window
manager is responsible for framing the windows that appear on the screen - it draws window
frames and titles, allows you to move windows around the screen and resize them. Authors of
other graphics programs, as a consequence, do not have to think about window decoration;
usually, an X application is only responsible for drawing a rectangular area of the screen that
has no frame, title, or other standard elements of window decoration. On the other hand, the
user in this situation can also choose among several window managers the one that best suits
his individual inclinations and needs.
The author at one time liked to demonstrate a simple trick to the "uninitiated": replacing, on the fly, the window manager, from the ascetic-looking fvwm2 to fvwm95, which copies the appearance of MS Windows 95 down to the smallest details. For some reason, viewers were especially impressed by the fact that the open applications did not go anywhere.
One of the most popular X client programs is xterm, an alphanumeric display
emulator for the X Window. It may be convenient to have several instances of the xterm
process running at the same time, each spawning its own window where it runs a copy of the
command line interpreter. In one window we can run the text editor, in another window we can
perform translation and debugging, in a third window we can run test programs, etc.

Starting X Window and selecting a window manager


Depending on the configuration of your particular machine, the X Window system may
already be running, or you may need to start it yourself. This is usually done with the startx
command, which in turn starts the xinit program.
You may have a machine on your local network that acts as an xdm-based application server;
you can connect to such a machine using X Window's in-house tools so that all your programs will run
on that (remote) machine and your local workstation will only display their graphical windows, i.e. your
computer will act as an X terminal. To check if there are xdm servers on your network, try giving the
command "X -broadcast". If there is indeed an xdm server on your network, after switching to
graphical mode you will see a prompt to enter a username and password to log in to that server. If
there is more than one xdm server on your network, you will first be shown a list of them and asked to
choose which one you want to use. If nothing of the sort happens within 15-20 seconds of entering
graphical mode, there is probably no xdm server on your network.
The startx command will launch a window manager for you along with the X server.
On some systems the window manager can be selected from the menu that appears immediately
after starting X Window, on other systems the choice of a particular window manager is
configuration-dependent.
If you are not completely happy with the session configuration started by the startx
command, you can customize it to your liking by creating a file named .xinitrc in your
home directory (or editing it if you already have one). Note the dot before the file name, it's
important. Essentially .xinitrc is a command file from which application programs,
including the window manager itself, are run. The xinit program starts the X server and
then, after setting environment variables appropriately, begins executing commands from
.xinitrc. Ending the execution of .xinitrc means the end of X Window. The simplest
.xinitrc example might look something like this:

xterm &
twm

In this case, xterm will be run first (we run it just in case, so that we can work even if the window manager has an inconvenient configuration), followed by the twm window manager. Note that xterm is run in the background (this is what the & sign at the end of the first line is for): otherwise the execution of .xinitrc would not reach twm until xterm terminated.
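A slightly more elaborate .xinitrc might, for example, start two terminals at fixed positions before handing control to the window manager (icewm here is just an example; any installed window manager will do):

xterm -geometry 80x25+0+0 &
xterm -geometry 80x25-0+0 &
icewm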
If the graphical shell starts automatically at system boot, and you log in by entering your name
and password in the graphical input form (this input f orm is drawn for you by the so-called display
manager), you will need the .xsession file instead of .xinitrc to set up your session. In fact,
it is organized in the same way as .xinitrc, i.e. commands from it are executed after starting
your work session; you should only take into account that you give the startx command from an
existing work session (even if it is not a graphical one), where environment variables affecting the
work of many programs are already set up; when logging in through the display manager, it is the
.xsession file that is responsible for setting up the environment in the work session. Some
recommend writing the commands that start the programs making up your work session in the .xinitrc file, and building the .xsession file from two commands: executing the standard script describing your work environment (.profile) and running .xinitrc. The contents of .xsession then look as follows:

. ~/.profile
. ~/.xinitrc

In this case, there's a good chance you'll get a completely identical working environment both when
you launch X Window via startx and when you log in via the display manager.
All window managers existing nowadays can be divided into those that try to implement
the desktop metaphor in addition to their main functions (window management), and those that
do not. The difference between them is huge; the former are usually even called desktop
environments (DE). These include Gnome, KDE, xfce, MATE, Cinnamon, Unity; "regular"
window managers are represented by such programs as IceWM, Fluxbox, Window Maker,
BlackBox, the already mentioned twm, as well as mwm, fvwm, AfterStep and many others.
Strictly speaking, not all DEs are window managers - some of them include their own window
manager as a separate program, e.g. xfce's window manager is xfwm, and Gnome's window manager
can even be changed. However, usually a window manager included in a DE is not designed to work
separately from its DE.
The history of the "desktop metaphor" is rather peculiar. It is believed to have been
invented back in 1970 at Xerox PARC, Xerox's research center; Alan Kay is credited as the
author. The first experimental computer with a graphical interface - Xerox Alto - appeared in
1973. It should be understood that at that time computers were still "big", the rapid transition
to the fourth generation of computers was almost ten years away; punch cards were actively
used to work with computers, they were just beginning to be overtaken by alphanumeric
terminals, and a graphical monitor, even equipped with a "mouse" manipulator (so familiar to
us now) was perceived by professionals as exotic. There were no end-users in the modern sense
at that time; any computer at all was "exotic" for the general public.
The first realization of a "desktop" for a computer available to mass consumers appeared
only a decade later - in 1983 - on the Commodore 64 computer, and there the graphical interface
was not the main one. As the main user interface "desktop" was used in 1984 by the creators of
Apple Macintosh. It cannot be said that this approach immediately became popular: on a market flooded with "IBM-compatible" computers, the first shell with a DE, Windows 1.0, appeared in 1985, but systems of this line reached tangible popularity only in the early 1990s. This is understandable: for all this graphical machinery to stop being disgustingly slow, computers first had to become fast enough, and until that happened the public got used to working in text mode (although not with the command line: the command line in MS-DOS was too primitive for serious work). Despite titanic advertising efforts, the marketers did not manage to convince the mass user that Windows was exactly what he needed.
Unfortunately, nowadays most users cannot imagine working with a computer in any other
way than in the desktop metaphor; this partly explains the even sadder fact that almost all
popular Linux distributions, unless they are specifically tweaked after installation, offer some
variant of the Desktop Environment by default; one of the justifications for this is said to be the
desire to make it easier for users to migrate from the Windows world.
Of course, nothing good comes of this. To begin with, all DEs without exception require a lot of
resources and, which is quite natural for programs of this class, often "slow down" quite
noticeably, especially on "old" computers. Meanwhile, the graphical shell is an auxiliary tool
whose duty is to provide the user with an opportunity to run programs that solve his tasks; it is
for the sake of these programs, called application programs, that the computer itself, the
operating system, and the graphical shell exist; a situation in which auxiliary programs take
away valuable resources from applications looks strange, to put it mildly.
Keep in mind that if you do allow some DE to start in your system, it will feel obliged to
create a whole bunch of subdirectories in your home directory with names like Desktop,
Downloads, Documents, Music, Photos and so on, even though they are completely empty. By the
way, if you now throw some files into the Desktop directory, they will immediately appear as
icons on your main screen (outside of any windows), which is, in fact, the "desktop" in the
sense of DEs running under Unix. And be thankful if these directories are at least named in
English rather than in Russian, and without spaces in the names on top of that (yes, localized
names with spaces do happen); such names cause strange problems with programs that don't
expect them, and difficulties when trying to deal with all this from the command line.
As already mentioned, in order to learn programming (and in general to increase your own
efficiency when working with computers), you should make the command line interface your
main tool; Desktop Environment programs only get in the way of this, so it is advisable to
immediately replace the environment that your Linux distribution has installed by default with
one of the "lightweight" window managers. Fortunately, many of these are included in most
distributions, although they are not installed by default. The author of this book would venture
to recommend that you try IceWM first, but don't get "hooked" on it; when you get more or
less used to the new environment, you should definitely try other window managers.
If you are logging in in graphical mode, i.e. through the display manager (in most cases
this will be the case, although the system can always be reconfigured to not run the display
manager), see if the display manager itself, in addition to providing a username and password,
also provides some sort of menu or other means of selecting the desired window manager; the
appropriate choice is usually called something like session type in English and "session type"
or something similar in Russian. If there is no such option, you will have to use the .xinitrc
and .xsession files mentioned above; for example, you can write something like this in .xinitrc:

xterm &
icewm

(however, xterm is not really necessary here, icewm alone would do), and in the .xsession file,
as we suggested above, this:

. "/.profile
. "/.xinitrc.

You can, however, do it even simpler - create a single .xsession file, write a single word
icewm in it, and with a good chance you will be satisfied with the result.

Working with classic window managers


We consider twm, fvwm and some others to be classic window managers; note that the IceWM
recommended in the previous paragraph is not considered classic, because its window behavior
is rather similar to what you might be used to in Windows - for example, to move the input
focus to the desired window, you need to "click" on this window, and it will be immediately
"raised up", i.e. if some part of it was closed by other windows, after the sacramental "mouse
click" you will see the whole window.
If you follow our advice and try to work with different window managers, sooner or later
you will come across one in which the input focus follows the mouse cursor without any clicks,
which allows you, in particular, to enter text into a window that is only partially visible on the
screen. This may be unfamiliar, but in many cases it is much more convenient. There are other
features of classic windows that, when you look closely, turn out to be convenient, although
you will have to get used to them.
Any window manager has very advanced customization tools that allow you to
significantly change its behavior, so it would be difficult to give a comprehensive instruction
on how to work with any of the window managers at the level of "press this key to get this
result". In this paragraph we will limit ourselves to general recommendations that will allow
you to quickly master working with your window manager in the configuration installed in
your system. You can change any of the window manager's settings if you wish; you should
refer to the technical documentation for instructions on how to do this.
So, the first thing that can be advised after launching the window manager is to try to
figure out how to bring up its main menu. In all classic window managers, the main menu is displayed
when you left-click anywhere on the screen background (i.e., in a place not covered by any window).
Of course, there is nothing similar to the "desktop" and icons here, so you can launch the
programs you need either through the menu or with the help of the command line, and the
second one is, to put it bluntly, preferable, if only because it is simply faster. Therefore, it is
better to get a command line window at your disposal right away by launching the xterm
program or some of its analogs. Usually these programs can be found either in the main menu
itself, or in its submenus called "terminals", "shells", and so on. If you already have one
command line window, you can start a new instance of xterm by issuing the
command
xterm &

Note the "&" symbol. We run the xterm program in the background, so that the old xterm
instance (with which we issue the command) does not have to wait for it to finish: otherwise,
starting a new xterm would make no sense, because we would not be able to use the
old one while it is running.
The xterm program has a well-developed system of options. For example,

xterm -bg black -fg gray &

will launch the terminal emulator on a black background with gray letters (the same set of
colors is usually used in a text console).
In most cases, a window that is partially hidden can be fully displayed (raised to the top
level) by clicking on its title bar (rather than anywhere in the window, as you may be used to).
Your settings may also allow you to do the reverse - to "drown" the window by showing what's
below it; this is usually done by right-clicking on the title bar. To move a window across the
screen, you can also use its title bar: just put the mouse cursor over the title bar, press (and keep
the left button pressed), select a new window position and release the button. If the window
title is not visible (for example, it is hidden under other windows), the same operation can be
done using vertical and horizontal parts of the window frame, except for selected areas in the
corners of the frame; these corner areas are used to resize the window, i.e. when you drag them
with the mouse, not the whole window is moved, but only the corner you have captured.
If you lose the window you need, you can usually find it easily by right-clicking in an
empty space on the screen - this will bring up a menu consisting of a list of existing windows.
In most cases, window managers support so-called virtual screens, on each of which you
can place your own windows. This is useful if you are working with a large number of windows
at the same time. The virtual screens map, which shows the virtual screens, is usually located
in the lower right corner of the screen; to switch to the virtual screen you need, just click on the
corresponding place on the map. Classic window managers (unlike IceWM, by the way) usually
consider all available virtual screens as parts of one "big screen", so, for example, a window
can be located partially on one virtual screen and partially on another. This is especially useful
when for some reason it is desirable to make a window larger than the size of your (real, if you
will, physical) monitor.
From the windows in which a particular text is displayed, it is usually possible to copy that
text to other windows. To do this, just select the text with the mouse; many programs running
under X Window do not have a special "copy" operation: the text that is selected is copied.
However, even if copy and paste operations are provided as separate entities - through menus
or hotkeys (for example, it is so in browsers and office applications), these operations belong
to another, parallel scheme of text copying, and do not cancel automatic copying of everything
that is selected. You can paste the selected text with the third (middle) mouse button. Most
likely, your mouse has a "wheel" for scrolling; note that this wheel can be pressed straight
down without turning it, and then it works as a regular (third) button, which is just what you
need. Besides, if, for example, you don't want to reach for the mouse, you can try
pressing the Shift-Ins key combination, most likely it will lead to the same result.
1.3. Now, a little math
In this chapter we will consider very brief information from the field of mathematics,
without knowledge and understanding of which you will definitely have problems during
further reading of this book (and learning programming). Most of this information relates to
so-called discrete mathematics, which is completely ignored in the school mathematics
curriculum, but in recent years has become part of the school computer science curriculum.
Unfortunately, the way these things are usually presented in school leaves us no choice but to
tell them ourselves.

1.3.1. Elements of combinatorics


Combinatorics is a branch of mathematics that covers problems like "in how many ways
can...", "how many different variants of..." and so on. Combinatorics always considers finite
sets with whose elements something happens all the time: they are rearranged in different
orders, some of them are discarded, then brought back, combined into different groups, sorted
and mixed again, and generally abused in every possible way. We will start with one
of the simplest problems of combinatorics, which, to avoid unnecessary dryness, we
will formulate in the language of a textbook for primary school children.

Vasya and Petya decided to play spies. For this purpose Vasya went to the trouble of
fixing three colored light bulbs to the window of his room; they can be seen from
outside when lit, and each of the bulbs can be lit independently of the others.
Vasya has screwed the bulbs on tightly, so they cannot be swapped around.
How many different signals can Vasya send to Petya with the help of his bulbs, if
the variant "none of them is lit" is also considered as a signal? Vasya doesn't know
exactly when Petya will decide to look at his window, so all kinds of variants with
Morse code and other similar signaling systems are not suitable: Vasya needs to
put the lights in the position corresponding to the signal to be transmitted, and
leave them in that position for a long time, so that Petya will definitely notice the
signal.

Many readers will probably give the correct answer without much thought: eight; however,
what is of interest here is not how to calculate the answer (by raising a two to the right power),
but why the answer is calculated in this way. To find out, we start with a trivial case: when
there is only one light bulb. Obviously, two different signals can be transmitted here: one of
them will be indicated by the bulb being on, and the other by the bulb being off.
Let us now add another light bulb. If, for example, this second bulb is always on, we can
transmit only two signals as before: "first bulb on" and "first bulb off". But nobody prevents
the second bulb from being turned off; in this position we will also have two different signals:
"first bulb on" and "first bulb off". The one to whom the signals are intended, in our task Petya,
can look at both bulbs, that is, consider the state of both of them. The first two signals (when
the second bulb is on) will be different for him from the second two signals (when the second
bulb is off). In total, therefore, we get the possibility of transmitting four different signals:
off-off, on-off, off-on and on-on.
Let's equip these four signals with numbers from 1 to 4 and add one more bulb, the third
one. If we turn it on, we can transmit four different signals (by turning the first two bulbs on
and off). If we turn it off, we will get four more signals, which will be different from the first
four; in total we will get eight different signals. No one forces us to stop there; numbering the
existing eight signals with numbers from 1 to 8 and adding a fourth bulb, we get 8 + 8 = 16
signals. The reasoning can be generalized: if with the help of n bulbs we can transmit N signals,
then adding bulb number n + 1 doubles the number of possible signals (i.e. makes it 2N), because
the first N we get with the originally available bulbs when the new one is turned off, and the
second N (with the same bulbs) when the new one is turned on.
It will be useful to consider the degenerate case: no bulbs, i.e., n = 0. Of course, you can't
play spies in this way, but the case is nevertheless important from the mathematical point of
view. To the question "how many signals can be transmitted with 0 light bulbs", most people
will answer "none", but, strangely enough, this answer is unfortunate. Indeed, our "signals"
distinguish one situation from another, or, more precisely, they correspond to some different
situations. To be even more precise, we can see that there are actually infinitely many
"situations", it's just that in signaling we ignore some factors, thereby combining many
situations into one. For example, our young spy Vasya could signal "all the lights are off" as
"I'm not home"; the other seven combinations in our friends' signaling could mean "I'm doing
homework", "I'm reading a book", "I'm eating", "I'm watching TV", "I'm sleeping", "I'm doing
something else", and finally, "I'm not doing anything, I have nothing to do at all, so come visit".
If we look carefully at this list, we can notice that in any of the situations further clarifications
are possible: the signal "I am eating" can equally denote the situations "I am having lunch", "I
am having dinner", "I am eating a delicious cake", "I am trying to overcome unpalatable and
disgusting peppers", etc. etc.; "I am doing my homework" can equally well mean "I am solving
math problems", "I am coloring the maps assigned in geography", or "I am blowing off my
neighbor Katya's Russian exercises". Possible variants are "I'm doing my homework and I'm
feeling good, so I'll get it all done soon" and "I'm doing my homework, but I have a
stomachache, so homework will take longer today." Each of the possible signals somewhat
reduces the overall uncertainty, but, of course, does not eliminate it.
Let's go back to our degenerate example. Not having a single light bulb, we cannot
distinguish situations from each other at all, but does that mean that we don't have any situations
at all? Obviously not: our young spies are still engaged in something or, on the contrary, not
engaged, it is just that our degenerate version of signaling does not allow us to distinguish these
situations. Simply put, we have combined all possible situations into one, completely removing
any certainty; but we have combined them into one situation, not zero of them.
Now everything becomes clear: with zero bulbs we have one possible signal, and adding each
new bulb doubles the number of signals, so that N = 2^n, where n is the number of bulbs and N
is the number of possible signals. In passing, we note that the above reasoning sometimes
allows us to better understand why k^0 = 1 for any k > 0.
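This reasoning is easy to check by brute force. Here is a minimal Python sketch (our
illustration, not part of the book's problem) that enumerates all on/off states of n bulbs
and counts them:

from itertools import product

# Enumerate every possible on/off state of n bulbs and count them.
for n in range(5):
    signals = list(product((False, True), repeat=n))
    print(n, len(signals))
# prints: 0 1, 1 2, 2 4, 3 8, 4 16; that is, N = 2^n,
# including exactly one (empty) signal for n = 0.
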
The problem about the number of signals transmitted by n light bulbs, each of which may
or may not be lit, is equivalent to many other problems; before we get into the dry math, here
is another formulation:

Masha has a brooch, a chain, earrings, a ring and a bracelet. Every time she
leaves the house, Masha thinks long and hard about which jewelry to wear this
time and which to leave at home. How many choices does she have?

To understand that this is the same problem, let us introduce an arbitrary assumption that is not
related to the essence of the problem and does not affect it in any way: suppose our young
spy Vasya from the previous problem is Masha's younger brother and has decided to
tell his friend Petya what jewelry his sister has put on this time. To do this, he has to add two
more bulbs to the three already existing ones, so that there are as many bulbs as Masha has
pieces of jewelry. The first bulb out of five will indicate whether Masha is wearing the brooch,
the second whether Masha is wearing the chain, and so on, one bulb for each piece of
jewelry Masha has. We already know that the number of signals transmitted by five bulbs
is 2^5 = 32; it is obvious that this number is exactly equal to the number of combinations of
jewelry Masha may be wearing.


In "dried" mathematical language, the same problem, completely stripped of the "husks"
that do not affect the result, and reduced to pure abstractions essential from the point of view
of the solution, is formulated as follows:

Given a set of n elements. How many different subsets of this set are there?

The answer 2^n is easy to remember, and unfortunately, that is exactly what they usually do in school; the
result of this "cheap and cheerful" approach to learning math is the ease with which a student can
be completely confused by any non-standard formulation of the problem conditions. Here's an
example:

Dima has four different-colored cubes. By placing them one on top of the other,
Dima builds "towers", and the one cube with nothing on it is also considered a
"tower" by Dima; in other words, Dima's tower has a height from 1 to 4 cubes.
How many different "towers" can be built from the available cubes, one at a time?

If only you knew, dear reader, how many high school students, without thinking at all, give the
answer 2^4 to this problem! By the way, some of them, noticing that the empty tower is excluded
by the terms of the problem, "improve" their answer by subtracting the "forbidden" variant, and
get 2^4 - 1. It does not become any more correct from that, because this is simply not the problem
in which a two is raised to the power n; but to notice that, one should understand why a two is
raised to the power n in "that" problem, and schoolchildren who have memorized the "magic"
2^n have fatal trouble with this.

By the way, the correct answer to this problem is 64, but the solution has nothing to do
with raising a two to the sixth degree; if there were three cubes, the answer would be 15, but
for five cubes the correct answer is 325. The point here, of course, is that in this problem it
matters not only what cubes the tower consists of, but also in what order the cubes that make
up the tower are arranged. Since for towers consisting of more than one cube it is possible to
get different variants simply by swapping the cubes, the resulting combinations are much more
numerous than if we consider possible sets of cubes without taking into account their order.
Before proceeding to the problems in which permutations are essential, let us consider a
couple more problems on the number of variants without permutations of elements. The first
of them we will "make up" from the original problem about young spies:
Daddy brought Vasya a cheap Chinese lamp that can either be turned off, just glow or
blink. Vasya immediately screwed the lamp to his window, which already has three
ordinary light bulbs. How many different signals can Vasya transmit to Petya now that
he has improved his spy equipment?

The problem is, of course, quite elementary, but it is interesting in one respect. If a person
understands how (and why exactly so) the problem with three ordinary light bulbs is solved,
then the "tricky" lamp with three different states will give him no trouble; but if the problem
is given to an average schoolboy who was taught the formula N = 2^n without any explanation
of where it came from, then he will quite likely get stuck on this new problem. And it is solved
by the same reasoning: we can transmit eight different signals if the Chinese lamp is extinguished;
the same number of signals if it is just lit; and the same number again if it is blinking.
In total, 8 + 8 + 8 = 3 · 8 = 24. This case shows how much more valuable the scheme of a
formula's derivation is than the formula itself, and now it is time to note that in combinatorics
it is always so; moreover, combinatorial formulas are simply harmful to memorize, it is better
to derive them every time; they are all so simple that you can derive them in your head. If you
memorize any formula from the field of combinatorics, you run the risk of applying it
inappropriately, as the above-mentioned schoolchildren do when trying to solve the problem
about towers of cubes.
Another task on the same topic looks like this:

Olya has blanks for flags in the shape of a plain rectangle, a triangle, and a rectangle
with a cut-out; Olya also has patches of a different color in the shape of a circle, a square,
a triangle and a star. Olya decided to make a lot of flags for the holiday; how many
different flags can she make by sewing one of the available patches onto one of the
available blanks?

Fig. 1.7. The flag problem
This problem is also, one could say, a standard textbook problem, so most people who at least
roughly know what we are talking about will simply multiply the two numbers and get the
absolutely correct answer - 12. But what is much more interesting here is not how to solve a
particular problem, but in what ways it can be done. To begin with, let's note that our reasoning
with the light bulbs also works remarkably well here: indeed, if Olya had only rectangular blanks,
she could make as many different flags as she has different patches, i.e. four. If she had only
triangular blanks, she could also make four different flags, and the same holds if all
her blanks were rectangles with a cut-out. But the first four variants differ from the
second four, and the third four differ from both the first and the second by the
shape of the blank; therefore, the total number of variants is 3 · 4 = 12.
Another variant of reasoning may be more interesting. Let's make a table with three
columns for the number of different blanks and four rows for the number of different patches.
In each column we will place the flags made using the corresponding blank, and in each row
we will place the flags made using the corresponding patch (see Fig. 1.7). For anyone who has
formed the abstraction of multiplication in his mind, it is immediately obvious that there are
3 · 4 = 12 cells with flags; interestingly, this insight is akin to the concept of the area of a
rectangle, only for the discrete case.
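The same table can be produced mechanically; a small Python sketch (ours; the names of the
blanks and patches are merely illustrative) builds the Cartesian product of blanks and patches:

from itertools import product

blanks = ["rectangle", "triangle", "cut-out rectangle"]
patches = ["circle", "square", "triangle", "star"]
# Every flag is a (blank, patch) pair, so the flags form the
# Cartesian product of the two sets: 3 * 4 = 12 of them.
flags = list(product(blanks, patches))
print(len(flags))    # 12
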
Consider another problem similar to the one we left for later:

Dima has a box with cubes of four different colors. By placing the cubes one on
top of the other, Dima builds "towers" up to four cubes high, and Dima also
considers one cube with nothing on it to be a "tower". How many different
"towers" can be built from the available cubes? It is considered that Dima has as
many cubes of each color as he wants.

Despite the apparent similarity (a large part of the text here was simply copied), this problem
is much simpler than its previous version, where there were only four cubes. However, once again,
unless you understand how combinatorial results are obtained, this problem cannot be solved,
because the standard formulas do not work for it. The correct answer here is 340; we invite
the reader to work out how this answer is obtained.
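Those who want to check their own derivation against a machine can brute-force the answer;
this Python sketch (our illustration) enumerates all towers from 1 to 4 cubes high, where each
level independently takes one of the four colors:

from itertools import product

colors = range(4)
# A tower of height h is an ordered sequence of h colors, with
# repetitions allowed; heights run from 1 to 4.
towers = [t for h in range(1, 5) for t in product(colors, repeat=h)]
print(len(towers))    # 4 + 16 + 64 + 256 = 340
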
So far, all the problems we have considered have been solved without taking permutations
into account; we have not solved the only problem in which permutations turned out to be
essential. We will start our discussion of problems with permutations with the canonical
problem of the number of possible permutations. As usual, let us first formulate it in the
schoolboy way:

Kolya has seven American billiard balls with different numbers (for example, "solid"
from one to seven) in his bag. How many different ways can Kolya put them in a row on
the shelf?

There are two ways to arrive at the correct answer, and we will consider them both. Let's start
with a trivial variant: there is only one ball, how many "ways" can we put it on the shelf?
Obviously, there is only one way. Now let's take two balls; it doesn't matter what their numbers
are, the result doesn't change, but let them be balls numbered 2 and 6 for definiteness.
Obviously, there are two ways to arrange them on the shelf: "two on the left, six on the right"
and "six on the left, two on the right". The first way is called direct, the second way is called
reverse, because the numbers of balls from left to right in this case do not increase, but, on the
contrary, decrease.
Now let's add a third ball (for example, let it be the ball with number 3) and see how many ways
there are. We can choose the leftmost ball in three ways: put the two on the left, put the three
on the left, or put the six on the left. Whichever ball we choose, the remaining two balls can be
placed on the remaining two positions in the two ways already known to us, forward and
backward; in other words, for each choice of the leftmost ball there are two arrangements of the
rest, i.e. there are six arrangements in total (Fig. 1.8).

Figure 1.8. The six permutations of three balls: 2-3-6, 2-6-3, 3-2-6, 3-6-2, 6-2-3, 6-3-2

Note that permutations are usually numbered in this
order: first they are sorted in ascending order of the first element (i.e., first come permutations
in which the first element has the smallest number, and at the end - permutations where the
number of the first element is the largest), then all permutations having the same first element
are sorted in ascending order of the second element, and so on.
If now we add a fourth ball (let it be the ball with number 5), we will get four ways of
choosing the leftmost one of them, and with each such way the rest of the balls can be arranged
in the six ways already known to us; the total number of permutations for the four balls is 24.
Now we are hopefully ready to generalize: if for k - 1 balls there are M_{k-1} possible permutations,
then adding the k-th ball increases the number of permutations by a factor of k, i.e.
M_k = k · M_{k-1}. Indeed, when the k-th ball is added, the total number of balls becomes k, i.e.
the very first (say, the leftmost) ball can be chosen in k ways, and the rest, by our assumption,
can be arranged (with the leftmost one fixed) in M_{k-1} ways. Since we started from
the fact that for one ball there is one possible permutation, i.e. M_1 = 1, the total number of
permutations of k balls equals the product of all natural numbers from 1 to k:

    M_k = k · M_{k-1} = k · (k-1) · M_{k-2} = ... = k · (k-1) · ... · 2 · 1

As is well known, this number is called the factorial of k and is denoted "k!"; in fact, the
definition of the factorial of the natural number k is "the number of permutations of k elements,"
and that the factorial is equal to the product of the numbers from 1 to k is a corollary.

For the problem formulated above, the answer will thus be 7! = 5040 combinations.
This result can be reached in another way. Consider a bag containing seven balls and seven
empty positions on the shelf. We can choose a ball to fill the first empty position in seven ways;
whichever ball we choose, there will be six in the bag. In other words, when one position has
already been filled, we have six choices to fill the second position, regardless of which of the
seven possible ways the first empty position was filled. Thus, for each of the seven ways to fill
the first position we have six ways to fill the second position, and the total number of ways to
fill the first two positions is 7 - 6 = 42. There are five balls left in the bag, i.e. for each of 42
combinations of the first two balls there are five variants of the third ball; the total number of
variants for the first three balls is 42 - 5 = 210. But for each such combination we have four
ways of choosing the next ball, because there are four balls left in the bag; and so on. We can
choose the penultimate ball from the remaining balls in the bag in two ways, the last one - in
one way. It turns out that in total we have

    7 · 6 · 5 · 4 · 3 · 2 · 1 = 7! = 5040

ways of arranging the seven balls. Repeating the same reasoning for the case of k balls, we come
to the already familiar expression

    k · (k-1) · (k-2) · ... · 3 · 2 · 1 = k!

only this time we arrive at it moving from larger numbers to smaller ones, instead of vice
versa, as in the previous reasoning. Note that both lines of reasoning will be useful for
understanding further calculations.
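Both counts are easy to confirm by machine; in the following Python sketch (ours) the first
line literally enumerates the arrangements, the second uses the ready-made factorial:

from itertools import permutations
from math import factorial

balls = [1, 2, 3, 4, 5, 6, 7]
print(sum(1 for _ in permutations(balls)))    # 5040: enumerate and count
print(factorial(7))                           # 5040: 7! computed directly
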
Let us now consider the intermediate problem, starting, as usual, with a special case:

Kolya still has seven American billiard balls in his bag with numbers from one to
seven. Vasya showed Kolya a small shelf on which only three balls can fit. How
many different ways can Kolya fill this shelf with balls?

Undoubtedly, the reader will easily find the answer to this question by repeating the first three
steps of the above reasoning: we can choose the first of the three balls in seven ways, the second
in six ways, and the third in five ways; the answer is 7 · 6 · 5 = 210 variants. This number can
be written using the factorial symbol:

    7 · 6 · 5 = (7 · 6 · 5 · 4 · 3 · 2 · 1) / (4 · 3 · 2 · 1) = 7!/4!
In the general case, when we have n items (elements of a set) and we need to compose an ordered
set (a tuple) of length k from them, we get:

    n · (n-1) · ... · (n-k+1) = (n · (n-1) · ... · (n-k+1) · (n-k) · ... · 2 · 1) / ((n-k) · ... · 2 · 1) = n!/(n-k)!
This quantity, called the number of placements of n elements taken k at a time, is sometimes
denoted A_n^k (read "A from n to k") in the Russian-language literature. In the English-language
literature, as well as in some Russian-language sources, the notation (n)_k is used, which is
called the falling factorial.
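itertools can enumerate placements as well; a short Python sketch (our illustration) checks the
small-shelf problem both ways:

from itertools import permutations
from math import factorial

n, k = 7, 3
# permutations(..., k) yields ordered selections of k items, i.e.
# exactly the placements from n to k.
print(sum(1 for _ in permutations(range(1, n + 1), k)))    # 210
print(factorial(n) // factorial(n - k))                    # 210 = 7!/4!
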
Now is probably a good time to solve the problem that we formulated on page 139 but did not
solve. Let's recall its condition:

Dima has four different-colored cubes. By putting them one on top of another, Dima
builds towers, and Dima considers one cube, on which nothing stands, to be a tower too;
in other words, Dima's tower has a height from 1 to 4 cubes. How many different towers
can be built from the available cubes?

Clearly, there will be 4! = 24 towers of four cubes. There will be the same number of towers of
three cubes: each of them is obtained from one strictly defined tower of height 4 by removing
the top cube, and whether Dima keeps this cube in his hands or puts it away instead of
placing it on top of the tower does not change the number of combinations. There will be
4 · 3 = 12 towers of two cubes, and 4 towers of one cube, according to the number of cubes
available. 24 + 24 + 12 + 4 = 64; this is the answer to the problem.
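A brute-force check (again a Python sketch of ours; the color names are arbitrary) confirms
the count:

from itertools import permutations

cubes = ["red", "green", "blue", "yellow"]
# A tower is an ordered selection of distinct cubes, 1 to 4 high.
towers = [t for h in range(1, 5) for t in permutations(cubes, h)]
print(len(towers))    # 4 + 12 + 24 + 24 = 64
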
Now we come close to another classical problem, for the sake of which, in general, the
whole conversation about permutations was started. As usual, we start with a special case:

In Kolya's bag there are the same seven American billiard balls with numbers from one
to seven. Vasya gave Kolya an empty bag and asked him to put any three balls into it.
How many different ways can Kolya fulfill Vasya's request?
This problem differs from the previous problem about Kolya and Vasya in that the balls in the
bag are obviously intermingled; in other words, we are no longer interested in the order of the
elements in the final combinations. To understand how this problem is solved, let us imagine
that Kolya is also interested in how many ways he can choose three balls out of the available
seven without taking order into account, and he first wrote down on a piece of paper all 210
variants obtained when solving the previous problem, where instead of a bag there was a shelf,
i.e. all the possible variants of the placements from seven to three, taking into account the order
of elements. Knowing that the variants differing only in the order of the elements will now have
to be considered as identical, Kolya decided to see how many times the combinations consisting
of balls with numbers 1, 2 and 3 occur among the 210 combinations written out. Having
carefully looked through his records, Kolya found six such combinations: 123, 132, 213, 231,
312 and 321. Deciding to check some other set of balls, Kolya looked through his list for
combinations using balls with numbers 2, 3, and 6; he found six of them as well: 236, 263,
326, 362, 623, and 632 (these combinations are already familiar to us from Figure 1.8 on page
143).
At this point in his research, Kolya began to guess (hopefully together with us) that the
same thing would happen for any set of balls. In fact, the list of 210 combinations includes all
possible choices of three balls out of seven, taking into account their order; as a consequence,
whatever three balls out of seven we take, our list will contain, again, all combinations
consisting of these three balls, that is, simply, all permutations of the chosen three balls; well,
as we know, there are 3 permutations of three elements! = 3 - 2 - 1 = 6. It turns out that any of
the combinations we are interested in is represented in the list six times; in other words, the list
is exactly six times longer than the result we need. All we have to do is divide 210 by 6, and
we get the answer to the problem: 35 = TS0 = W3 = 44.
, 63! 4!3!
In the general case, we are interested in how many ways we can choose k items from the
available n without taking their order into account; the corresponding value is called the number
of combinations of n elements taken k at a time and is denoted C_n^k (read "C from n to k"; the
letter C comes from the word "combinations"). Repeating the above reasoning for the general
case, we note that if we write out all (n)_k placements (which differ from combinations in that
the order of elements is considered important in them), then each combination will be represented
in that list k! times, once for each possible permutation of its k elements; i.e. the number
(n)_k = n!/(n-k)! exceeds the sought C_n^k exactly k! times. All that is left is to divide (n)_k
by k!, and we get the most important formula of school combinatorics:

    C_n^k = n! / (k!(n-k)!)
And now we will tell you something that you were hardly ever told in school: under no
circumstances, not for anything in the world, memorize this formula! If you happen to have
memorized it before you got your hands on our book, try to forget it, like a bad nightmare.
The point is that remembering this formula is simply dangerous: it is tempting to apply it
without much thought in any combinatorial problem where any two numbers are present; in
most cases such an application will be erroneous and will give wrong results.
Instead of memorizing the formula itself, memorize the scheme of its derivation. When you
really need the number of ways to choose k of n initial elements, you can derive the formula
for C_n^k in your mind while you are writing it down: write "C_n^k =", draw a fraction bar,
and run through the following in your mind: the total number of permutations of n elements is
n!, but we only need k factors of the factorial, so we remove the extra factors by dividing by
(n-k)!; what we get is the number of combinations taking order into account, but we don't
care about order, so we have counted each combination k! times; divide by k! and get what we
need. This approach will prevent you from misusing the formula, because you will know exactly
what the formula means and what you can use it for.
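The derivation is again easy to check by machine; in this Python sketch (ours) the first line
enumerates the combinations, the second applies the formula just derived, and the third uses
the library function math.comb, available since Python 3.8:

from itertools import combinations
from math import comb, factorial

n, k = 7, 3
print(sum(1 for _ in combinations(range(1, n + 1), k)))     # 35: enumerate
print(factorial(n) // (factorial(k) * factorial(n - k)))    # 35: n!/(k!(n-k)!)
print(comb(n, k))                                           # 35: library function
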
It is interesting that the same formula can be derived by another reasoning. Imagine that
we first put n balls in a row on the shelf, then separated the first k of them and poured them into
a bag, and poured the remaining n - k into another bag; of course, in each bag the balls get
mixed up. How many such (final) combinations are possible, in which the balls are placed in
two bags and the order in which the balls were poured into each bag does not concern us? Let's
follow the scheme of reasoning already familiar to us: initially we had n! combinations, but
among them groups of k! became indistinguishable because k balls were mixed in the first bag,
so we are left with n!/k! combinations (after the first k balls have been poured into the first bag
and mixed there, while the remaining (n - k) balls have not yet been poured anywhere); but
among these combinations, groups of (n - k)! then also became indistinguishable due to the
mixing of balls in the second bag. In total, n! exceeds the desired C_n^k by a factor of
k!(n-k)!, i.e. C_n^k = n!/(k!(n-k)!).
This reasoning is remarkable because, unlike the previous one, it is symmetric: both
multipliers in the denominator of the fraction are obtained in the same way. The problem does
indeed have some symmetry: which balls to pour into the other bag can obviously be chosen in
exactly as many ways as which balls to leave in the original bag. This is expressed by the
identity C_n^k = C_n^{n-k}.
For the degenerate cases one assumes that C_n^0 = C_n^n = 1 for any natural n, and this
assumption, as it is easy to see, is quite natural. Indeed, C_n^0 corresponds to the answer to the
question "in how many ways can zero balls out of n available be transferred to another bag".
Obviously, there is exactly one way: we simply do nothing, all the balls remain in the original
bag, and the second bag remains empty. Things are almost as simple with C_n^n: "in how many
ways can we pour all n balls from a bag with n balls into another bag?" Naturally, in exactly one:
we take the second bag, pour into it everything found in the first bag, and the job is done.
The numbers C_n^k are called binomial coefficients, because through them we can write the general
form of the expansion of Newton's binomial:

    (a + b)^n = Σ_{k=0}^{n} C_n^k · a^{n-k} · b^k

For example, (a + b)^5 = a^5 + 5a^4·b + 10a^3·b^2 + 10a^2·b^3 + 5a·b^4 + b^5, with the numbers
1, 5, 10, 10, 5 and 1 representing C_5^0, C_5^1, C_5^2, C_5^3, C_5^4 and C_5^5. It is interesting
that the majority of professional mathematicians

consider everything here so obvious that they do not condescend to any explanations; meanwhile, the
rest of the public, including the majority of people who have higher technical education but are not
professional mathematicians, do not see any connection between the problem of pouring balls from
bag to bag and the expansion of Newton's binomial; when asked where the combinations of balls
in the binomial formula came from, they usually answer with the sacramental "it just so happened",
apparently believing the whole thing to be a mere coincidence.
Meanwhile, in order to see our "problem about bags of balls" in the expansion of the binomial,
it is enough to notice that the question should be not how to expand the binomial into summands
(that question is too general and by no means concerns combinations alone), but what coefficient
will stand at each term of the expansion.
Just in case, let us recall how the school-style "opening of brackets" works when multiplying one
sum by another. To multiply the sum (a_1 + a_2 + ... + a_n) by the sum (b_1 + b_2 + ... + b_m),
you must first multiply each of the summands of the first sum by the first summand of the second
sum (here, by b_1), obtaining a_1·b_1 + a_2·b_1 + ... + a_n·b_1; then do the same with the second
summand of the second sum, obtaining a_1·b_2 + a_2·b_2 + ... + a_n·b_2; and so on for each
summand of the second sum; all the resulting chains of summands are then added together. The
result is a sum consisting of n·m summands representing all possible products of the form
a_i·b_j. In particular, (a + b)(c + d) = ac + bc + ad + bd.
                      1
                    1   1
                  1   2   1
                1   3   3   1
              1   4   6   4   1
            1   5  10  10   5   1
          1   6  15  20  15   6   1

Figure 1.9. Pascal's triangle (row n, counting from zero, lists the values C_n^0, C_n^1, ..., C_n^n)

It is clear that if you do not collect like terms when opening the parentheses in the
expression (a + b)^n, you will get 2^n summands in the final sum. For example:

(a + b)^4 = (a + b)(a + b)(a + b)(a + b) = (a + b)(a + b)(aa + ab + ba + bb) =

= (a + b)(aaa + aab + aba + abb + baa + bab + bba + bbb) =

= aaaa + aaab + aaba + aabb + abaa + abab + abba + abbb +
+ baaa + baab + baba + babb + bbaa + bbab + bbba + bbbb

For the sake of clarity, we have not used exponents here. Each summand of the final expansion
is a product in which either a or b is taken from each initial "bracket", and the sum itself consists
of all possible such summands. It is not difficult to see that there are exactly 2^n of them, for
from each bracket we must take either a or b, i.e. we get the familiar problem about young spies
and light bulbs; but that is not what matters here.
After collecting like terms, we obviously obtain a sum of monomials of the form M·a^k·b^{n-k}
(recall that in our expansion example n = 4, but the example is only a partial illustration of the
general reasoning), and it remains for us to find out what M equals; it is easy to guess that M is
the answer to the question of how many ways we can choose, from all n "brackets", those k
"brackets" from which we take the summand a, while from the rest we take the summand b. In
this formulation it becomes clear that this is, in fact, our problem about balls: instead of balls
we have "brackets"; instead of moving a ball into the other bag we choose the summand b from
a "bracket", and instead of leaving a ball in the original bag we choose the summand a from a
"bracket". In particular, for our example, the monomial a^2·b^2 occurs six times in the expansion:

    aabb + abab + abba + baab + baba + bbaa

which corresponds to the value of C_4^2; after collecting these like terms, the term 6a^2·b^2
appears in the final polynomial. Meanwhile, for example, the monomial b^4 occurred only once
(in the form bbbb), which corresponds to C_4^4 = 1, and so on.
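The "brackets" argument can be replayed verbatim on a computer; the Python sketch below
(ours) opens the brackets of (a + b)^4 by enumeration and counts, for each k, the products
containing exactly k letters b, comparing the counts with the binomial coefficients:

from itertools import product
from collections import Counter
from math import comb

n = 4
# Each summand of (a+b)^n picks the letter a or b from each bracket.
terms = ("".join(choice) for choice in product("ab", repeat=n))
counts = Counter(term.count("b") for term in terms)
for k in range(n + 1):
    print(k, counts[k], comb(n, k))    # the last two columns coincide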


There is a very simple way to calculate the values of C_n^k that requires no operations other
than addition: the so-called Pascal's triangle (Fig. 1.9). The first line of this "triangle" consists
of a single one, which corresponds to the value of C_0^0 (in how many ways can one transfer
zero balls from an empty bag into another empty bag? Obviously, one). Subsequent lines begin
and end with a one, corresponding to C_n^0 and C_n^n; every other element of a line is obtained
by adding the two elements of the previous line between which the element to be calculated
stands. For example, the two in the middle of the third line is obtained as the sum of the ones
above it; the two threes in the next line are obtained by adding the two and the one above each
of them; in the bottom line shown, the number 15 is obtained by adding the 5 and the 10 above
it, and so on. Pascal's triangle is based
on the identity C_{n+1}^k = C_n^{k-1} + C_n^k, which is easily derived by transforming the formulas:

    C_n^{k-1} + C_n^k = n!/((k-1)!(n-k+1)!) + n!/(k!(n-k)!) =

    = n! · ( k/(k!(n-k+1)!) + (n-k+1)/(k!(n-k+1)!) ) =

    = n! · (k + n - k + 1)/(k!(n-k+1)!) = n!(n+1)/(k!(n+1-k)!) =

    = (n+1)!/(k!(n+1-k)!) = C_{n+1}^k
However, much more interesting in the context of our conversation is the "combinatorial sense"
of this identity, which turns out to be unexpectedly simple. So, let us have a bag with n balls
numbered from 1 to n, from which we pour k balls into an empty bag. Suppose we were given one
more ball, numbered n + 1, and asked in how many ways we can now fill the empty bag with k
balls. Holding the (n + 1)-th ball in our hands, we realize that all our variants can be divided
into two non-overlapping groups. The first group of variants arises if we keep holding the
(n + 1)-th ball in our hands, or hide it in our pocket altogether, and use only the balls we had
originally to fill the empty bag; as it is easy to guess, there are C_n^k such variants. The second
group of variants assumes, on the contrary, that we throw our (n + 1)-th ball into the empty bag
to begin with, so that it remains for us to add (k - 1) balls from our initial bag; this can be done
in C_n^{k-1} ways. The total number of variants thus turns out to be C_n^k + C_n^{k-1},
which was to be proved.
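Building the triangle by additions alone is a pleasant little exercise; here is one possible
Python sketch (ours), which also checks in passing the row-sum property mentioned just below:

def pascal_rows(count):
    # Produce the first `count` rows using nothing but additions.
    row = [1]
    for _ in range(count):
        yield row
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]

for n, row in enumerate(pascal_rows(7)):
    print(row, sum(row) == 2 ** n)    # each row sums to 2^n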

Pascal's triangle has many interesting properties, which we will not enumerate here,
because we have already gotten a bit carried away. Let us note only one of them: the sum of the
numbers in any line of Pascal's triangle is 2^n, where n is the number of the line if the lines are
numbered from zero (i.e., n corresponds to the power of the binomial whose expansion
coefficients make up that line). In other words, Σ_{k=0}^{n} C_n^k = 2^n. This property, too,
has a completely trivial combinatorial meaning, which we suggest the reader find on his own as
an exercise.
To conclude the discussion of combinatorics, let us consider another textbook
problem:
Seven chess players are participating in a chess tournament, and it is
assumed that each of them will play exactly one game with each other.
How many games will be played in total?

It is clear that each of the seven must play six games, one with each of the other
participants of the tournament. But the following phrase for some reason drives many
novice students of combinatorics into a stupor: since two people participate in each game,
the total number of games will be half of 7 · 6, i.e. (7 · 6)/2 = 21 games will be played.
Since there are often difficulties with this "two people participate in each game",
we will have to give some explanations, and we will give them in two ways. First of
all, let's remember that chess players in competitions necessarily write down all moves,
and both participants of each game do it; the filled-in protocols are then handed over to
the judges. Imagine now that each of the tournament participants has prepared one
protocol form for each upcoming game. It is clear that each of them prepared six such
forms, and in total, therefore, 6 · 7 = 42 of them were prepared. Now the chess players met
in the hall on the day of the tournament and began to play; after each game its
participants handed their protocols to the judges, i.e. after each game the judges
received two protocols. At the end of the tournament, obviously, all 42 protocols end
up with the judges, but the judges received two protocols after each game - hence, there
were half as many games, i.e. 21.
There is a second variant of the explanation. The results of sports competitions
under the so-called "round robin system", where everyone plays exactly one game with
everyone else, are usually presented in the form of a tournament table, with a row and
a column for each participant in the tournament. Diagonal cells of the table, i.e. such
cells, which stand at the intersection of the row and column corresponding to one
player, are shaded, because nobody is going to play with himself. Further, if, for
example, player B and player D played a game and B won, then it is considered that
the game ended with the score 1:0 in favor of B; in his row at the intersection with
column D is entered the result "1:0", while in the row D at the intersection with column
B is entered the result "0:1" (see Fig. 1.10).
It is obvious that at first there were 7 · 7 = 49 cells in the table, but seven of them
were immediately painted over and there are 42 cells left; at the end of each game two
cells are filled in, i.e. after 21 games all cells will be filled in and the tournament will
be over.

     A    B    C    D    E    F    G
A   XXX
B        XXX       1:0
C             XXX
D        0:1       XXX
E                       XXX
F                            XXX
G                                 XXX

Fig. 1.10. Tournament table

Translated into a purely mathematical language, this problem turns into a problem
about the number of edges in a complete graph. Recall that a graph is a finite set of
abstract vertices, as well as a finite set of unordered pairs of vertices, which are called
edges; a graph is represented as a picture in which vertices are denoted by points and
edges by lines connecting the corresponding vertices. A complete graph is a graph in
which any two vertices are connected by an edge, and by exactly one edge. A complete graph
with n vertices contains n(n-1)/2 edges; indeed, each vertex has (n-1) edges incident to it,
i.e. there are n(n-1) "edge ends" in the graph, but since each edge has two ends, the total
number of edges is n(n-1)/2.
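In Python this count is a one-liner over unordered pairs (our sketch):

from itertools import combinations

players = range(7)
# Each game is an unordered pair of distinct players, i.e. an edge
# of the complete graph on 7 vertices.
games = list(combinations(players, 2))
print(len(games))    # 7 * 6 / 2 = 21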

1.3.2. Positional number systems


As we know, the system of writing numbers in Arabic numerals, common and familiar
to us since preschool age, is a special case of a positional number system. We use only ten
digits; at that, if a number is read from right to left, each next digit has a "weight" ten times
greater than the previous one, i.e. the actual value of a digit depends on its position within
the number (which is why the system is called positional). Note that we have ten digits, and
the "weight" grows tenfold per position.
This is by no means a coincidence: if we want every integer to be written in our system,
and in only one way, then each successive digit must "weigh" exactly as many times as
many digits as we use. The number that denotes both the number of digits used and the
number of times each next digit is "heavier" than the previous one is called the base of
the number system; for the decimal system, the base is, as is easy to guess, the number
ten.
Here there is a rather simple connection with the combinatorics we considered in the previous
paragraph. Indeed, suppose we have k numbered flagpoles and an unlimited supply of flags in
n - 1 different colors; on each of the k flagpoles we can raise any one of the n - 1 flags, or
leave the flagpole empty, all independently of one another; in other words, each of the k
flagpoles is independently in one of n states. Then the total number of flag combinations is n^k.
If for some reason this number of combinations is not enough for us, we will have to add one
more flagpole; we can even consider that there are initially "as many flagpoles as we want", just
that all but the first k of them are empty.
This is exactly what happens when we write numbers in a positional number system.
We use n digits, with the digit 0 corresponding to an "empty flagpole"; when working with k
digits (positions), this gives us n^k numbers, from 0 to n^k - 1. For example, there are
1000 = 10^3 three-digit decimal numbers, from 0 to 999. Here too we can assume that there are
initially infinitely many digit positions, just that all of them, except for the first (lowest) k,
contain zeros.
When adding one to the number n^k - 1 (in our example, to the number 999) we exhaust
the possibilities of k digits and have to use one more digit, the (k + 1)-th one. It makes no
sense to use the new digit before this moment, because all smaller numbers can be represented
using only k digits, and if we moved into the next digit earlier, we would get more than one
representation for the same number. But when all combinations of the lower digits have been
exhausted, we have no option left but to use the next digit. The logical thing to do in this next
digit is to start with the smallest possible value, i.e. one, and to zero out all the lower digits so
as to "start over"; thus, a one in the (k + 1)-th digit must correspond to the total number of
combinations that can be obtained in the first k digits.
The fact that all mankind now uses the base-10 system is nothing more than an
accident: the base of the number system corresponds to the number of fingers on our
hands. Working with this system seems to us "simple" and "natural" only because we
get used to it from early childhood; in fact, as we will see later, counting in binary is
much easier: no multiplication table is needed (at all; that is, it simply does not exist there),
and long multiplication, so hated by schoolchildren in the lower grades, turns in binary into a
trivial procedure of "writing out with shifts". Back in the
grades, in binary turns into a trivial procedure of "writing out with shifts". Back in the
17th century, Gottfried Wilhelm Leibniz, who was the first in history to describe the
binary number system in the form in which it is known now, noticed this circumstance
and stated that the use of the decimal system is a fatal mistake of mankind.
Anyway, we can, if we wish, use any number of digits, starting from two, to create
a positional number system; if we follow the traditional approach and, using n digits,
assign them the values from 0 to (n - 1), then we can work with such a number system[57]
in much the same way as with the familiar decimal system. For example, in any number
system with base n, the notation 1000 means n^3: in the decimal system it is a thousand, in
the binary system it is 8, in the base-5 system it is 125. You just need to keep in mind one
important thing. The
number system determines how a number is written, but the number itself and its properties
do not depend on the number system: a prime number will always remain prime, an even
number will always remain even, and 5 · 7 will be 35 regardless of what digits (even Roman
ones!) we use to write these numbers.
Before proceeding to consider other systems, let us note two properties of the
ordinary decimal notation of a number that generalize without change to number
systems on a different base. The first of these follows directly from the definition of
positional notation. If a number is written with the digits d_k d_{k-1} ... d_2 d_1 d_0, then its
numerical value is Σ_{i=0}^{k} 10^i · d_i; for example, the value of the number 3275 is
calculated as 3·10^3 + 2·10^2 + 7·10^1 + 5·10^0 = 3000 + 200 + 70 + 5 = 3275. The
second property requires a slightly longer explanation, but, by and large, it is no more
complicated: if we divide the number by 10 with a remainder, then divide the quotient by 10
with a remainder again, and so on until the quotient becomes zero, then the written-out
remainders are exactly the digits that make up the number, starting from the least significant
one. For example, divide 3275 by 10 with a remainder: we get 327 and 5 as the remainder;
divide 327 by 10: we get 32 and 7 as the remainder; divide 32: we get 3 and 2 as the
remainder; divide 3 by 10: we get 0 and 3 as the remainder. The sequence of remainders
5, 7, 2, 3 consists of the digits of the number 3275.
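To make the second property tangible, here is a small sketch in Python (the language and the function name are our own choice, purely for illustration): it extracts the digits of a number by repeated division by 10 with a remainder, exactly as described above.

    # Extract decimal digits by repeatedly dividing by 10 with a
    # remainder; the remainders are the digits, least significant first.
    def digits_base10(number):
        digits = []
        while number > 0:
            number, remainder = divmod(number, 10)
            digits.append(remainder)
        return digits

    print(digits_base10(3275))   # prints [5, 7, 2, 3]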
Both of these properties generalize to a number system with an arbitrary base, only
instead of 10 we must use the corresponding base in the calculations. For example, for
the base-7 notation 1532₇ the numerical value is 1·7³ + 5·7² + 3·7 + 2 = 343 + 245 + 21 + 2 = 611
(of course, we perform all the calculations in the decimal system, because it is easier
for us). Now let us find out what digits make up the base-7 notation of the number 611,
for which we successively perform several divisions by 7 with a remainder. The result
of the first division is 87, with a remainder of 2; the result of the second division is 12,
with a remainder of 3; the result of the third division is 1, with a remainder of 5; the
result of the fourth division is 0, with a remainder of 1. So the base-7 notation of the
number 611 consists of the digits 2, 3, 5, 1, listed starting from the least significant one;
that is, this notation is 1532₇ (we have already seen it somewhere).

[57] Other approaches are possible; for example, a number system using three digits whose values are
0, 1, and −1 is quite often mentioned in the literature; we leave such number systems outside the
scope of our book, but the interested reader can easily find descriptions of them in other sources.
As we can see, the first of the two formulated properties of positional notation
allows us to convert a number from any number system into the one in which we are
used to performing calculations (for us, the base-10 system), and the second property
allows us to convert a number from the familiar decimal notation into notation in an
arbitrary number system.
Note that when converting a number from "some other" system to decimal, we can
save on multiplications by representing

    d_k·n^k + d_{k−1}·n^{k−1} + ... + d_1·n + d_0

in the form

    (...((d_k·n + d_{k−1})·n + d_{k−2})·n + ... + d_1)·n + d_0

For example, the same 1532₇ can be converted to decimal by calculating
((1·7+5)·7+3)·7+2 = (12·7+3)·7+2 = 87·7+2 = 609+2 = 611.
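This computation scheme is known as Horner's scheme, and it translates into code almost word for word; a minimal Python sketch (the function name is ours):

    # Horner's scheme: compute the value of a digit sequence (most
    # significant digit first) in base n, one multiplication per digit.
    def from_base(digits, base):
        value = 0
        for d in digits:
            value = value * base + d
        return value

    print(from_base([1, 5, 3, 2], 7))   # prints 611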
It may be noted that the traditional order of writing numbers from higher to lower
digits, which we have been accustomed to since childhood, often turns out to be not
very convenient when translating from one number system to another; for example, we
have already had to write out a sequence of remainders from division by a base and
then "flip" this sequence to get the desired number entry. Moreover, if we think a little,
we will notice that when we see a decimal number in a text, for example, 2347450, we
do not really know what the first of its digits stands for; it is, of course, "two" (or "two"),
but what? Tens of thousands? Hundreds of thousands? Millions? Tens of millions? We
can find out only by looking through the record of the number to the end and finding
out how many digits are in it; only then, going back to the beginning of the record, will
we realize that this time the two really meant two million and not something else.
But why does the whole world use this "inconvenient" notation? The answer is
surprisingly simple: the numbers we use are called Arabic for a reason; according to
the prevailing historical view, the modern system of recording numbers was invented
by Indian and Arab mathematicians presumably in the sixth to seventh centuries A.D.,
and reached its almost final form in the extant works of the famous sage Al-Khwarizmi
(from whose name the word "algorithm" is derived) and his contemporary Al-Kindi;
these works were written, according to traditional dating, in the ninth century. In any
case, in the Arabic script words and lines run from right to left, not from left to right
as we are used to; so for the creators of the decimal notation system the digits of a
number were arranged in the way most convenient for them: looking at the notation of
a number, they first saw the digit of the units, then the digit of the tens, then the digit
of the hundreds, and so on.
In programming we often have to deal with binary notation of numbers, because
this is how numbers and all other information are represented in the computer memory
as sequences of zeros and ones. Since it is not very convenient to write out a series of
binary digits, programmers often use the number systems on the base of 16 and 8 as an
abbreviated record: each digit of the octal record of a number corresponds to three digits
of the binary record of the same number, and each digit of the hexadecimal record -
four binary digits. Since the hexadecimal system requires sixteen digits, and the Arabic
digits are only ten, the first six letters of the Latin alphabet are added: A stands for 10,
B for 11, C for 12, D for 13, E for 14, and F for 15. For example, the
entry 3F₁₆ means 63, 100₁₆ corresponds to the number 256, and 111₁₆ corresponds to
the number 273.


When working with modern computers, binary digits (bits) are usually combined into
groups of eight (so-called bytes), which explains the popularity of the hexadecimal
system: each byte corresponds to exactly two digits of this notation, since a byte can take
values from 0 to 255, which in hexadecimal notation are the numbers from 00₁₆ to FF₁₆.
With octal digits this trick does not work: to write a byte you generally need three of them,
because the numbers from 0 to 255 are represented in the octal system by the notations
000₈ to 377₈; and at the same time three octal digits can encode numbers that do not fit
into a byte, because the maximum three-digit octal number, 777₈, is 511. In particular, it
takes exactly eight octal digits to write three consecutive bytes, but in practice groups of
three bytes almost never occur. However, the octal number system has another undoubted
advantage: it uses only Arabic numerals. Indeed, 8 is the largest power of two that does not
exceed 10; this is why the octal system was very popular with programmers before the
eight-bit byte became the established unit of information measurement, but by now it is
used much less often than hexadecimal.
Since there are only two digits in the binary system, converting numbers into and
out of it is easier than with other systems; in particular, if you know the powers of two
by heart (which programmers do anyway), you can do without multiplications when
converting in either direction. Suppose, for example, we need to convert the number
1001101₂ into decimal notation; the most significant one, standing in the seventh digit,
corresponds to the sixth power of two, i.e. 64; the next one stands in the fourth digit
and corresponds to the third power of two, i.e. 8; the next one means 4; and the last,
least significant one means 1 (in any number system the least significant digit simply
denotes its own value). Adding 64, 8, 4 and 1, we get 77; this is the number we were
looking for.

    103            76
     51 | 1        38 | 0
     25 | 1        19 | 0
     12 | 1         9 | 1
      6 | 0         4 | 1
      3 | 0         2 | 0
      1 | 1         1 | 0
      0 | 1         0 | 1

    1100111        1001100

    Fig. 1.11. Conversion to binary system by dividing in half
There are two ways to convert from decimal to binary. The first is the traditional
one: divide the original number in half with a remainder, writing out the resulting
remainders, until zero remains in the quotient. Since division in half is easy to perform
in one's head, the whole operation is usually carried out by drawing a vertical line on
paper: on the left (from top to bottom) one writes first the original number and then the
results of the divisions, and on the right one writes out the remainders. For example,
converting the number 103 to binary yields: 51 and 1 as a remainder; 25 and 1 as a
remainder; 12 and 1 as a remainder; 6 and 0 as a remainder; 3 and 0 as a remainder; 1
and 1 as a remainder; 0 and 1 as a remainder (see Fig. 1.11, left). All that remains is
to write out the remainders from bottom to top, and we get 1100111₂. Similarly,
for the number 76 we get 38 and 0, 19 and 0, 9 and 1, 4 and 1, 2 and 0, 1 and 0, 0 and
1; writing out the remainders, we get 1001100₂ (ibid., right).
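In code this method is the digit extraction shown earlier with 10 replaced by 2; a sketch:

    # Convert a number to binary notation by halving: collect the
    # remainders of division by 2, then read them from bottom to top.
    def to_binary(number):
        if number == 0:
            return "0"
        remainders = []
        while number > 0:
            number, r = divmod(number, 2)
            remainders.append(str(r))
        return "".join(reversed(remainders))

    print(to_binary(103))   # prints 1100111
    print(to_binary(76))    # prints 1001100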
There is another way, based on knowing the powers of two. At each step we
choose the greatest power of two not exceeding the remaining number, write out a one
in the corresponding digit, and subtract that power from the number. Suppose, for
example, we need to convert the number 757 into binary. The greatest power of two
that does not exceed it is the ninth (512); subtracting it leaves 245. The next power of
two to fit is the seventh (128, since 256 does not fit); that leaves 117. Continuing in
exactly the same way, we subtract 64, leaving 53; subtract 32, leaving 21; subtract 16,
leaving 5; subtract 4, leaving 1; subtract 1 (the zeroth power of two), leaving 0. The
result is 1011110101₂. This method is especially convenient when the original number
only slightly exceeds a power of two: for example, the number 260 is converted to
binary almost instantly: 260 = 256 + 4 = 100000100₂.
Since, as we have already mentioned, programmers often use the number systems
with bases 16 and (somewhat less often) 8 as a shorthand for binary numbers, there is
often a need to convert numbers from binary notation to these systems

Table 1.6. Binary representation of octal and hexadecimal digits

        octal             hexadecimal
    d₈    bin        d₁₆   bin       d₁₆   bin
    0     000        0     0000      8     1000
    1     001        1     0001      9     1001
    2     010        2     0010      A     1010
    3     011        3     0011      B     1011
    4     100        4     0100      C     1100
    5     101        5     0101      D     1101
    6     110        6     0110      E     1110
    7     111        7     0111      F     1111

and vice versa. Fortunately, if the base of one number system is a natural power n of
the base of another, then one digit of the first system corresponds to exactly n digits
of the second. In practice this property is used only for conversions between the binary
system and the systems with bases 8 and 16, although it would be possible, for example,
to convert numbers between ternary and base-9 notation in the same way; it is just that
neither the ternary nor the base-9 system has been widely used in practice.
To convert a number from octal to binary, each digit of the original number is
replaced by the corresponding three binary digits (see Table 1.6). For example, for the
number 3741₈ these would be the groups of digits 011 111 100 001; the non-significant
zero can be discarded, so the result is 11111100001₂. To convert from hexadecimal to
binary, we do the same thing, but replace each digit with four binary digits; for example,
for 2A3F₁₆ we get 0010 1010 0011 1111, and after discarding the non-significant zeros,
we get 10101000111111₂.
To convert backwards, the original binary number from right to left (this is
important) is divided into groups of three or four digits (respectively for conversion to
octal or hexadecimal systems); if there are not enough digits in the highest group, it is
supplemented with non-significant zeros. Then each resulting group is replaced by the
corresponding digit. Consider, for example, the number 10000101111011₂. To
convert it into the octal system, we divide it into groups of three digits, adding one
non-significant zero on the left: 010 000 101 111 011; replacing each group with the
corresponding octal digit, we get 20573₈. To convert the same number into
hexadecimal, we break it into groups of four digits, adding two non-significant zeros
at the beginning of the highest group: 0010 0001 0111 1011; replacing the groups with
the corresponding hexadecimal digits yields 217B₁₆.
Combinations of binary digits given in Table 1.6 are usually just remembered by
programmers, but it is not necessary to memorize them on purpose: when necessary,
they can be easily calculated, and after some time they will be stored in the memory by
themselves.
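The digit-group correspondence is easy to express in code as well; a sketch of both directions for the hexadecimal case (the helper names are ours):

    # Hexadecimal <-> binary via the four-bit groups of Table 1.6.
    HEX_BITS = {d: format(int(d, 16), "04b") for d in "0123456789ABCDEF"}

    def hex_to_bin(hex_text):
        bits = "".join(HEX_BITS[d] for d in hex_text.upper())
        return bits.lstrip("0") or "0"   # drop non-significant zeros

    def bin_to_hex(bin_text):
        while len(bin_text) % 4 != 0:    # pad the highest group
            bin_text = "0" + bin_text
        groups = [bin_text[i:i + 4] for i in range(0, len(bin_text), 4)]
        return "".join(format(int(g, 2), "X") for g in groups)

    print(hex_to_bin("2A3F"))            # prints 10101000111111
    print(bin_to_hex("10000101111011"))  # prints 217B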
Binary fractions are converted to the decimal system in the same way as we
converted integers, only the digits after the "binary point" [58] correspond to the fractions
1/2, 1/4, 1/8, 1/16, and so on. For example, for the number 101.01101₂ we have
4 + 1 + 1/4 + 1/8 + 1/32 = 5.40625. We can also do it another way: realizing that one
place after the "point" means halves, two places mean quarters, three places mean
eighths and so on, we can treat the whole fractional part as an integer and divide it by
the corresponding power of two. In the case under consideration the five digits after
the "point" are thirty-second parts, and 01101₂ is 13₁₀, so we have 5 + 13/32 = 5.40625.
The reverse conversion of a fractional number from decimal to binary is also easy,
but it is a bit more difficult to explain why it is done in this way. To begin with, we
separately convert the integer part of the number, write out what we get, and forget
about it, leaving only the decimal fraction, which is obviously smaller than one. Now
we need to find out how many halves (one or none) we have in this fractional part. To
do this, it is enough to multiply it by two. In the resulting number, the integer part can
be equal to zero or one, this is the desired "number of halves" in the original number.
Whatever the obtained integer part is, we write it out as another binary digit, and
remove it from the working number, because we have already taken it into account in
the result. The remaining number is again a fraction, obviously smaller than one,
because we have just cut off the whole part; we multiply this fraction by two to
determine the "number of quarters", write it out, cut it off, multiply it by two, determine
the "number of eighths", and so on.
[58] Hereinafter, when writing positional fractions, we use a period to separate the fractional
part rather than a comma, as is usually done in Russian-language literature. English-language texts
have always used the point in this role, and as a result it is the point that is used in all existing
programming languages. When programming, you should accustom yourself to the idea that there is
no "decimal comma", only a "decimal point".

For example, for the already familiar number 5.40625, the conversion back to
binary would look like this. We immediately translate the integer part as an ordinary
integer, get 101, write out the result, put a binary dot and forget about the integer part
of our original number. We're left with 0.40625. Multiply it by two and we get 0.8125.
Since the integer part is zero, we write out the digit 0 in the result (right after the decimal
point) and continue the process. Multiplying 0.8125 by two gives 1.625; write out a one
in the result, remove it from the working number (we get 0.625), multiply it by two to
get 1.25, write out a one, multiply 0.25 by two to get 0.5, write out a zero, multiply it
by two to get 1.0, write out a one. This is the end of the translation, because we still
have a zero in the working number, and, of course, no matter how many times we
multiply it, we will get only zeros; note that in principle we have the right to do so -
because to the obtained binary fraction we can add to the right side as many zeros as
we want, since all of them will be non-significant. The written-out result is 101.01101₂, which,
as we have seen, is the binary representation of the number 5.40625.
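The doubling procedure is short enough to sketch right away; we use Python's exact Fraction type to avoid floating-point rounding (that precaution is ours, not part of the method):

    from fractions import Fraction

    # Binary digits of a fractional part: multiply by two, carve off the
    # integer part as the next digit, repeat with what remains.
    def fractional_to_binary(x, max_digits=20):
        digits = ""
        while x != 0 and len(digits) < max_digits:
            x *= 2
            if x >= 1:
                digits += "1"
                x -= 1
            else:
                digits += "0"
        return digits

    print(fractional_to_binary(Fraction(40625, 100000)))   # prints 01101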
It will not always be so favorable; in most cases you will get an infinite (but of
course periodic) binary fraction. To understand why this happens so often, it is enough
to remember that any finite or periodic decimal fraction can be represented as an
irreducible simple fraction with an integer numerator and a natural denominator; in fact,
this is the definition of a rational number. Now, it is not difficult to see that those and
only those rational numbers are representable as finite binary fractions whose
denominators are powers of two. Of course, a similar restriction is present for decimal
fractions, and generally in a number system with any base, but in the general case the
restriction takes a milder form: a rational number written as an irreducible fraction m/n
is representable as a finite fraction in the system with base N if and only if some natural
power of N is divisible by n. In particular, the fraction 1/50 can be represented as a finite
decimal because 10² = 100 is divisible by 50 without remainder; similarly, the fraction
7/80 can be written as a finite decimal because 10⁴ = 10000 is divisible by 80 without
remainder. In the general case, an irreducible simple fraction turns into a finite decimal
fraction if and only if its denominator decomposes into prime factors as 2^k · 5^m: in this
case we need only take the greater of k and m and use it as the power to which 10 is
raised.
In the case of the binary system things are tighter: a power of two, whatever it is,
is divisible without remainder only by other powers of two. As applied to the conversion
from decimal to binary, note that any finite decimal fraction is a number of the form
M/10^k; for the denominator to contain only twos, the numerator must be divisible by
five the required number of times. Thus, the number 5.40625 considered in the example
above is 540625/100000, and the numerator 540625 happens to be divisible without
remainder by 5⁵ = 3125 (the result of the division is 173), so after cancelling the fives
only a power of two remains in the denominator, which allows us to write this number as a
finite binary fraction. But of course this is not always the case; in most cases (in
particular, whenever the last significant digit of the decimal fraction differs from 5) the
resulting binary fraction will be infinite, although periodic. In such a case we follow
the above procedure of successive multiplication by two until we obtain a working
number we have already seen; this means that we have hit the period; recall that a
periodic fraction is written by enclosing its period in parentheses. For example, for the
number 0.35₁₀ we get 0.7 (write out 0), 1.4 (write out 1, keep 0.4), 0.8 (write out 0),
1.6 (write out 1, keep 0.6), 1.2 (write out 1, keep 0.2), and finally 0.4, which we already
saw four steps earlier. Hence the period of the fraction is four digits long, and the result
is 0.35₁₀ = 0.01(0110)₂.
Since periodic fractions "pop up" quite often when converting between number
systems, it is useful to be able to determine which simple fraction corresponds to a
given periodic fraction. Recall that turning a finite decimal fraction into a simple
fraction is not difficult: the integer part is set aside (at worst, we add it to the numerator
at the end, after multiplying it by the denominator), and we count the digits of the
fractional part; the simple fraction we need has the form M/10^k, where M is the integer
obtained by writing out the digits of the fractional part and k is the number of these
digits; it remains only to reduce the obtained fraction, if possible, and the job is done.
For example, the number 17.325 has the integer part 17 and the fractional part 0.325;
we get M = 325 and k = 3, so 0.325 = 325/1000 = 13/40, and the original number is
17.325 = 17 + 13/40 = (17·40 + 13)/40 = 693/40.

For a number system with an arbitrary base everything is done exactly the same
way, except that the base of the number system is used instead of 10. For example, for
10.1001₂ we have M = 1001₂ and k = 4, so the fractional part is 1001₂/10000₂, and the
whole number is 10.1001₂ = (10₂·10000₂ + 1001₂)/10000₂ = 101001₂/10000₂ (all the numbers here are, of course,
binary). Note that for the binary case, the fraction is always irreducible, unless it
originally contained non-significant zeros; indeed, if the fractional part ends with a one,
the integer in the numerator (the same M) will be odd, and the denominator will always
be a power of two.
But what if the fraction is infinite, even though it is periodic? They don't tell you
about this at school (unless it is a math school). At first glance, the situation may seem
hopeless, because the number k turns out to be "infinity"; but in reality there is nothing
difficult here, you just need to use a different technique. Let's start with the decimal
case to make it easier to understand how it works. First of all, let's discard the integer
part, and with it those digits of the fractional part that are not included in the period of
the fraction - we already know how to deal with them.
§1.3. Now a little math 163
we know how to deal with them. We will be left with only infinitely repeating digits,
the very period. Let's denote the number in question by x and see what x - 10 . Since fc

the fraction is infinite, it will be the number obtained by writing out the digits of the
period instead of the k zeros, to which is assigned an "infinite tail" equal to x; to
"discard" this tail, it is enough to subtract x, so that the number x - 10 - x will always be
fc

a finite fraction (and if the period began with the first decimal place after the decimal
point, it will be an integer). It remains to find x by solving a trivial equation.
For example, consider the number 7.3327327327... = 7.3(327). The part that we can
already handle (7.3) we put aside, leaving 0.0(327), which we denote by x. The length
of our period is k = 3, so we multiply by 10³ = 1000. We have 1000x = 32.7(327); to
get rid of the accursed periodic "tail", we subtract x from both sides of the equality and
obtain 999x = 32.7, so x = 32.7/999 = 327/9990. Remembering that we also have the
7.3 that was set aside, we turn it into 73/10 and get that the original number is
73/10 + 327/9990 = 73254/9990 = 12209/1665.

In a number system with an arbitrary base everything happens in exactly the same
way, only 10 must be replaced with the desired base. For example, let us try to deal
with the fraction 0.01(0110)₂, which appeared in our calculations a few paragraphs
above. We put 0.01₂ aside, leaving 0.00(0110)₂, which we denote by x; the length of
the period is 4, the base of the system is 2, so we multiply by 10000₂ (i.e. by 16). We
have:
    10000₂ · x = 1.10(0110)₂
    10000₂ · x − x = 1.10₂
    1111₂ · x = 1.1₂
    11110₂ · x = 11₂
    x = 11₂ / 11110₂

Since 0.01₂ = 1/100₂ (the digits here are binary), we get:

    0.01(0110)₂ = 1/100₂ + 11₂/11110₂ = 1111₂/111100₂ + 110₂/111100₂ = 10101₂/111100₂
In the decimal system this is 21/60 = 7/20 = 0.35, which is exactly the number from
which we obtained the periodic fraction 0.01(0110)₂ above. We could also have reduced
the resulting simple fraction without converting it to decimal (10101₂/111100₂ =
111₂/10100₂) using, say, Euclid's algorithm for finding the greatest common divisor:
taking two numbers, at each step subtract the smaller from the larger until they become
equal. It is also curious to note here that 0.00(0110)₂ is nothing other than one tenth:
indeed, 0.35 = 0.25 + 0.1, and 0.25 is 1/4, that is, the same 0.01₂ that we "put aside".

    With the rows of zeros (left):        Without them (right):

          110001                                110001
      ×     1101                            ×     1101
      ----------                            ----------
          110001                                110001
         000000                               110001
        110001                             + 110001
     + 110001                               ----------
      ----------                            1001111101
      1001111101

    Fig. 1.12. Multiplication by column in binary system
At this point the reader may reasonably ask how it is that we so dashingly perform
operations on binary numbers. The answer is worthy of Captain Obvious: we do it with
the ordinary "column" arithmetic, the same that is undoubtedly familiar to the reader
from elementary school; we only need to remember that the base of the number system
is different and we have only two digits at our disposal. For addition, say, having written
one binary number above the other, in each column from right to left we count the ones.
If there are none, we write zero; if there is exactly one (1+0 or 0+1), we write one. If
there are two ones, we can no longer write the result as a single binary digit, and we get
the old school "write zero, carry one". Taking the carry into account, there can also be
three ones in a column, and then we get "write one, carry one". This is how we obtained,
for example, 1111 + 110 = 10101 (be sure to try it yourself!). Subtraction in a column
in the binary system is also no more difficult than in decimal; one only needs to
remember that when borrowing from the higher digit, the lower one receives two (10₂),
and not ten at all.
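The column addition just described fits in a few lines of code; a sketch:

    # Add two binary strings by the column method with a carry.
    def binary_add(a_bits, b_bits):
        width = max(len(a_bits), len(b_bits))
        a_bits, b_bits = a_bits.rjust(width, "0"), b_bits.rjust(width, "0")
        carry, result = 0, []
        for i in range(width - 1, -1, -1):   # columns from right to left
            total = int(a_bits[i]) + int(b_bits[i]) + carry
            result.append(str(total % 2))    # "write" this digit
            carry = total // 2               # "carry one"
        if carry:
            result.append("1")
        return "".join(reversed(result))

    print(binary_add("1111", "110"))   # prints 10101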

When reducing the fractions above to a common denominator, we needed to find
this common denominator and then multiply the numerators by the appropriate factors,
and there we cheated a little. The point is that, obviously, 100₂ = 10₂·10₂ and 11110₂ =
1111₂·10₂, so the common denominator is 1111₂·10₂·10₂ = 111100₂ (if something is
unclear here, note the following fact: the school trick of "appending zeros" works in
any number system, not only in decimal). The numerator of the first fraction had to be
multiplied by 1111₂, but since that numerator was simply a one, the multiplication
caused no problems; the second numerator was not so simple, it consisted of two ones
(the decimal 3), but it had to be multiplied only by 10₂, i.e. we just appended a zero,
which is what we did.
Now let's see what column multiplication turns into in binary. On pg. 153 we
mentioned that Leibniz called the decimal system the fatal mistake of mankind; it is
suggested that he said this when he saw how easy it was to multiply numbers in binary.
We hope that the reader remembers how multi-digit decimal numbers are
multiplied in a column: the longer one is written out at the top, the shorter one at the
bottom, and the whole "upper" number is multiplied separately by each digit of the
"lower" one, writing out the result each time one digit to the left of the previous one. In
the binary system, everything is the same with one important difference: since there are
only two digits, the "upper" number has to be multiplied at each step either by zero
(which, as we understand, is quite simple) or by one (also, to put it bluntly, nothing
complicated). Zero chains can be written or not written; multiplication by one is
reduced, obviously, to mechanical rewriting of the first ("top") multiplier. The main
thing is not to get confused with shifts.
For example, let us try to multiply the numbers 49₁₀ = 110001₂ and 13₁₀ = 1101₂ in
binary. Having written the numbers one above the other, we multiply the first factor
(110001) first by one (that is, we simply write it out), then by zero (even easier: we
write out the corresponding number of zeros), then two more times by one (see Fig.
1.12, left). Adding up the resulting column, we get 1001111101₂; let the reader verify
for himself that this answer is correct. The rows of zeros may be omitted, and then the
column will look as in Fig. 1.12, right.
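The "writing out with shifts" procedure is exactly what a shift-and-add multiplier does; a sketch (the function name is ours):

    # Multiply two binary strings by the column method: for each one-digit
    # of the multiplier, add the first factor shifted by that digit's
    # position.
    def binary_multiply(a_bits, b_bits):
        a = int(a_bits, 2)
        result = 0
        for shift, digit in enumerate(reversed(b_bits)):
            if digit == "1":
                result += a << shift    # write out with a shift and add
        return bin(result)[2:]

    print(binary_multiply("110001", "1101"))   # prints 1001111101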

1.3.3. Binary logic


In programming we often encounter checks of all kinds of conditions, such as "the
string is not too long", "the discriminant is not negative", "we have enough space", "the
file we need exists", "the user has chosen this option or another", etc. When we start
programming, we will quickly see that the execution of even the simplest programs is
based on condition checks. The conditions themselves are logical expressions, i.e.
expressions whose evaluation yields a logical value: false or true.
The section of mathematics where expressions of this kind are studied is called
mathematical logic; it should be said that this is a rather complex area of knowledge,
including many non-trivial theories, and a deep study of mathematical logic can take a
lifetime. In our book we will consider only one of the most primitive components of
mathematical logic - the so-called algebra of binary logic.

Table 1.7. Basic binary logic operations

    arguments   conjunction   disjunction   excl. or   implication
     x    y     x&y (x∧y)     x∨y           x⊕y        x→y
     0    0         0           0             0           1
     0    1         0           1             1           1
     1    0         0           1             1           0
     1    1         1           1             0           1

In essence, binary logic is similar to arithmetic, but instead of an infinite set of


numbers, it uses a set consisting of only two values: 0 (false) and 1 (true). Different
operations are defined on these values, which also result in 0 or 1. Perhaps one of the
simplest logical operations is negation, "not x", which is traditionally denoted by an
overbar (x̄); other notations, such as ¬x, can also be found in books, and we will use
the latter below. The negation operation reverses a value, that is, ¬1 = 0 and ¬0 = 1.
Since the negation operation has only one argument, it is said to be unary.
Probably the most famous and most frequently used binary logical operations, i.e.
operations of two arguments, are "logical or" and "logical and", which in mathematics
are also called disjunction and conjunction. "Logical or" between two logical values
will be true when at least one of the original values is true; of course, they can be true
at the same time, then the "logical or" between them will also remain true. The only
case where "logical or" turns out to be false is when both of its arguments are false.
"Logical and", on the other hand, is true if and only if both of its arguments are true,
and is false in all other cases.
The operation "logical or" is usually denoted by the sign "V", as for the operation
"logical and", the most popular denotation for it is the ampersand "&", but in many
modern textbooks this operation for some reason prefers to be denoted by the sign "L".
In addition to conjunction and disjunction, the operations excluding or and
implication are quite common. "Excluding or", denoted by "f ", is true when one of its
arguments is true, but not both at once; this operation differs from the usual "or" in its
meaning for the case of both true arguments.
The operation of implication (denoted by "→") is somewhat more difficult to
understand. It is based on the principle that in reasoning, true premises can lead only to
true conclusions, while false premises can lead to anything, that is, to conclusions both
true and false. To understand why this is so, recall that not all scientific theories were
true, yet some false theories worked remarkably well and even produced correct results.
In particular, it is known that the Montgolfier brothers lifted the first balloon in history
into the air by filling it with smoke from a mixture of straw and wool; they regarded
straw as the vegetable principle of life and wool as the animal principle, which, in their
opinion, should lead to the possibility of flight, and the flight actually took place, despite
the fact that the animal and vegetable principles had nothing to do with it. In other
words, it is possible to obtain a perfectly correct result starting from absolutely false
premises. Accordingly, an implication is false only if its left argument is true and its
right argument is false (a lie cannot follow from the truth); in all other cases the
implication is considered true.

Table 1.8. All possible binary functions of two arguments

    x y   0  &  >  x  <  y  ⊕  ∨  ↓  =  ¬y ←  ¬x →  |  1
    0 0   0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1
    0 1   0  0  0  0  1  1  1  1  0  0  0  0  1  1  1  1
    1 0   0  0  1  1  0  0  1  1  0  0  1  1  0  0  1  1
    1 1   0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
If we denote the set {0,1} by the letter B [59], then logical operations of two
arguments can be regarded as functions whose domain of definition is B × B (i.e., the
set of pairs {(0,0), (0,1), (1,0), (1,1)}) and whose domain of values is B itself. Since the
domain of definition of a logical function is finite, such functions can be defined by
tables that explicitly specify the value for each element of the domain. Such tables for
logical functions are called truth tables; in particular, Table 1.7 contains the truth tables
of the conjunction, disjunction, "exclusive or" and implication we have considered.

[59] From the word Boolean, after the English mathematician George Boole, who first proposed the
formal system now known as the algebra of logic, or Boolean algebra.

Since a logical function of two arguments is completely determined by its values,
of which it has four, that is, simply speaking, by a set of four binary values, we can,
recalling our knowledge of combinatorics, conclude that there are 2⁴ = 16 such
functions in total. The set of all possible logical functions of two arguments is shown
in Table 1.8, where they are ordered in ascending order of their value sets, each value
set read as if it were the notation of a binary number. The enumeration starts with the function "constant
0", which is false for any arguments, and ends with "constant 1", or tautology, a
function which, on the contrary, is true on any arguments.
As can easily be seen, the table contains, among other things, the conjunction,
disjunction, implication and "exclusive or" that we have just discussed. In addition, the
table reveals the functions x (always equal to the first argument, regardless of the
second) and y (always equal to the second argument, regardless of the first), as well as
their negations; the functions labeled "↓" and "|" are called "Peirce's arrow" and the
"Sheffer stroke" respectively, and represent the negation of disjunction and the negation
of conjunction:

    x ↓ y = ¬(x ∨ y)        x | y = ¬(x & y)

The function labeled "=" is called equivalence: it is true when its arguments are equal
and false when they differ. It is easy to see that equivalence is the negation of
"exclusive or". Three more functions remain: the "reverse implication" (x ← y = y → x)
and the "greater than" and "less than" functions, which are the negations of the two
implications. In total, starting with the constants, we have listed exactly 16 functions,
that is, all of them.
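Table 1.8 can also be generated mechanically, which is a good way to convince yourself that there are exactly 16 such functions; a sketch:

    # Print all 16 binary logic functions of two arguments: the number j
    # of a function, read as a 4-bit value, is its column of results for
    # the argument pairs (0,0), (0,1), (1,0), (1,1).
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for j in range(16):
        column = [(j >> (3 - i)) & 1 for i in range(4)]
        print("f%-2d:" % j, dict(zip(pairs, column)))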
If we consider logical functions of three arguments, their domain of definition, the
set B × B × B, consists of eight triples {(0,0,0), (0,0,1), (0,1,0), ..., (1,1,0), (1,1,1)};
as a consequence, a logical function of three arguments is defined by a set of eight
values, and there are 2⁸ = 256 such functions in total. In the general case, a logical
function of n arguments is defined on a set of 2^n tuples and is given by an ordered set
of 2^n values, so that there are obviously 2^(2^n) such functions. Thus, there turn out
to be 2¹⁶ = 65536 functions of four arguments, 2³² = 4294967296 functions of five
arguments, and so on. If something in this paragraph seems unclear, reread the
= 4294967296 , and so on. If something in this paragraph seems unclear, reread the
paragraph on combinatorics; if that doesn't help either, be sure to find someone
who can explain to you what's going on here. It's not about the number of functions,
of course; but if you don't understand something here, you definitely have problems
with simple combinatorics, and that's no good.
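The counting argument itself is one line of code:

    # Number of logical functions of n arguments: 2 ** (2 ** n).
    for n in (2, 3, 4, 5):
        print(n, 2 ** 2 ** n)   # 16, 256, 65536, 4294967296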
Returning to functions of two arguments, we note that conjunction and disjunction
are in many ways analogous to multiplication and addition: thus, conjunction with zero,
like multiplication by zero, always yields zero; conjunction with one, like multiplication
by one, always yields the other argument unchanged; disjunction with zero, like the
addition of zero, also yields the other argument. Because of this similarity,
mathematicians often omit the conjunction sign in formulas, whether "x & y" or "x ∧ y",
and write simply "xy" or "x · y". In particular:

    x · 0 = 0        x · 1 = x        x ∨ 0 = x        x ∨ 1 = 1

It is not necessary to memorize these relations! It is enough to remember what
conjunction and disjunction mean, and all four of them become quite obvious. Indeed,
a conjunction equals one only when both of its arguments are one, so if one argument
of a conjunction is zero, then whatever the second argument is, the whole thing will not
become one, i.e. x · 0 = 0; at the same time, if one of its arguments is known to be one,
then everything depends on the second argument: if it is one, the whole conjunction is
one, otherwise it is zero; that is, the conjunction equals the second argument, x · 1 = x.
Similarly, if one of the arguments of a disjunction is known to be one, this is already
enough, and nothing can turn the result into zero, i.e. x ∨ 1 = 1; and if one of the
arguments of a disjunction is zero, this is not a verdict, since the second argument can
be one, and then the whole thing will be one: x ∨ 0 = x.
Similarly, the relations that allow opening parentheses in expressions consisting of
conjunction, disjunction, and negation do not need to be memorized:

    (x ∨ y)z = xz ∨ yz        xy ∨ z = (x ∨ z)(y ∨ z)


The first relation is obtained from the reasoning "for the truth of the whole expression
on the left, it is necessary and sufficient that at least one of the two variables x and y be
true, and at the same time that the variable z be true; but this is the same as saying that
we need either x and z to be true at the same time (a true x will make the parenthesis
true, z must be true, otherwise the conjunction cannot be true), or for the same reasons
y and z must be true at the same time". The second relation is obtained by reasoning
"the truth of the expression can be secured either by the truth of z, or by the
simultaneous truth of x and y ; this is the same as saying that we must have two
parentheses simultaneously true, and the truth of the first of them can be secured either
by x or z, and the truth of the second by either y or z".
Here are some more elementary relations:

x&x= xx V x = x x = xx = 0 x V x =1 x = x

They, of course, do not need to be memorized either. The reader is invited to find the
corresponding reasoning on his own.
The so-called De Morgan laws deserve special mention:

    ¬(x ∨ y) = ¬x & ¬y        ¬(x & y) = ¬x ∨ ¬y

For some reason the fact that these relations are named, that is, they bear the name of
the person who supposedly discovered them, scares many beginners: if someone had to
"discover" these relations, and for this discovery they were even named after the
discoverer, then surely there is no way around rote learning. Meanwhile, everything
here is actually quite elementary. The first relation: what do we need to make a
disjunction false? If at least one argument is true, the whole disjunction becomes true;
therefore, for it to be false, both arguments must be false, i.e. x must be false and y must
be false. The second relation: what do we need to make a conjunction false? Evidently,
it is enough for at least one of its arguments to be false. So much for the "great and
terrible" laws of De Morgan.
To conclude the review of binary logic, let us give one recommendation. If you
have to solve a problem in which a certain logical formula is given and you are asked
to do something with it, first make sure that only conjunction, disjunction and negation
are used in the formula. If this is not the case, immediately get rid of all other operation
signs, reducing them to the first three; note that this can always be done. Take, for
example, the mysterious implication. For it to be true, it is sufficient for the first
argument to be false (for anything can follow from a lie); similarly, it is sufficient for
the second argument to be true (for truth can follow from both truth and falsehood). We
obtain that

    x → y = ¬x ∨ y

Similarly, if you encounter an "exclusive or", replace it using one of the following
readings: "one of the arguments must be true, but not both at once" or "one must be
false and the other true, or vice versa":

    x ⊕ y = (x ∨ y)·¬(xy)        x ⊕ y = ¬x·y ∨ x·¬y

Note that the second form is obtained from the first by opening the brackets:

    (x ∨ y)·¬(xy) = (x ∨ y)(¬x ∨ ¬y) = x¬x ∨ y¬x ∨ x¬y ∨ y¬y = ¬x·y ∨ x·¬y
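Since every variable takes only two values, all the identities of this section, De Morgan's laws included, can be verified exhaustively; a sketch:

    # Exhaustively check the reductions of implication and exclusive or,
    # as well as De Morgan's laws, on all combinations of 0 and 1.
    for x in (0, 1):
        for y in (0, 1):
            nx, ny = 1 - x, 1 - y                    # negations
            implication = y if x else 1              # x -> y
            assert implication == (nx | y)
            assert (x ^ y) == (x | y) & (1 - (x & y))
            assert (x ^ y) == (nx & y) | (x & ny)
            assert 1 - (x | y) == nx & ny            # De Morgan
            assert 1 - (x & y) == nx | ny
    print("all identities hold")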
By the way, any logical function of any number of variables can be written in a similar form
just by looking at its truth table. Choosing the rows where the value of the function is 1, for
each such row we write out a conjunct consisting of all the variables, where the variables
equal to zero in this row enter the conjunct with a negation sign. All the conjuncts obtained
(there will be as many of them as there are argument tuples on which the function equals one)
are joined by disjunction signs. For example, for Peirce's arrow the corresponding expression
consists of the single conjunct ¬x¬y, and for the Sheffer stroke it consists of three conjuncts:
¬x¬y ∨ ¬x·y ∨ x·¬y. If, say, we consider a function of three arguments f(x, y, z) which equals
one on the four tuples {(0,0,0), (0,0,1), (0,1,1), (1,1,1)} and zero on all the others, then the
corresponding expression for this function will look like this: ¬x¬y¬z ∨ ¬x¬y·z ∨ ¬x·y·z ∨ x·y·z.
This form of writing a logical function is called disjunctive normal form (DNF).
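The construction of a DNF is entirely mechanical, which a sketch shows well (the function f below is the three-argument example from the text; the "!" notation for negation is our own):

    from itertools import product

    # Build a disjunctive normal form from a truth table: one conjunct per
    # row where the function is 1; variables that are 0 in that row enter
    # the conjunct negated.
    def dnf(func, names):
        conjuncts = []
        for row in product((0, 1), repeat=len(names)):
            if func(*row):
                literals = [n if v else "!" + n for n, v in zip(names, row)]
                conjuncts.append(" & ".join(literals))
        return "  |  ".join(conjuncts)

    f = lambda x, y, z: (x, y, z) in {(0,0,0), (0,0,1), (0,1,1), (1,1,1)}
    print(dnf(f, ["x", "y", "z"]))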

1.3.4. Types of infinity


The material of this paragraph may seem too "abstruse" to you; it should be
recognized that it has nothing to do with practical programming, or, to be more precise,
you can program without it. Unfortunately, if you skip this paragraph, you won't
understand the next paragraphs of this chapter, which are about algorithms and
computability theory; but if you are bored with math, you can skip them all. No one is
stopping you from coming back here later, when you're ready for it. If you do decide to
skip the rest of the "math" chapter, just remember one thing: the term "algorithm" is
actually quite a bit more complicated than you might think. In particular, the definition
of an algorithm does not exist and cannot exist, and not in any form or in any sense
at all. Why this is so, is described in the next paragraph.
Let us return to our subject of discussion. From the school course of mathematics
we know that, for example, there are "infinitely many" numbers; this manifests itself
in the fact that, no matter how large a number we take, we can always add one to it
and get an even larger number. In school we are usually satisfied with this, but in the
course of higher mathematics we have to note that infinities are different. The
"smallest" of them is the so-called countable infinity; a set is considered countably
infinite if its elements can be renumbered with natural numbers in some order. The
simplest example of countable infinity is the set of natural numbers itself: to number 1
we give the number one, to number 2 we give the number two, and so on. The set of
integers (which, along with natural numbers, also includes zero and negative integers,
i.e. "natural numbers with minus") is also countably infinite: for example, we can give
the first number to number 0, the second number to number 1, the third number to
number -1, numbers 2 and -2 will get the fourth and fifth numbers respectively, and so
on; each positive number k gets the number 2k, and each negative number -t gets the
number 2t + 1 (for example, number 107 gets the number 214, and number -751 gets
the number 1503). The set of even natural numbers is also countable: number 2 gets
number one, number 4 gets number two, and so on to infinity, i.e. every number of the
form 2n gets number n. It turns out that there are exactly as many natural numbers as
there are even natural numbers, and, on the other hand, exactly as many as there are
integers. Mathematicians denote the "number" of natural numbers by the symbol ℵ₀
(read "aleph-null").
On closer examination it turns out that the set of rational numbers, i.e. numbers
representable as an irreducible fraction m/n, where m is an integer and n is a natural number,
is also countable. To number them, imagine the coordinate half-plane where the values of the
denominator are laid out along the horizontal axis (recall that the denominator must be natural,
that is, at least one) and the values of the numerator along the vertical axis. In other words, we
need to think up some numbering for the integer points of the coordinate plane lying to the
right of the vertical axis, and each number must receive a number only once: for example, the
fractions 1/2, 2/4, 3/6, etc. must not be assigned different numbers, because they all denote the
same number. This, however, is simple: when numbering we should, firstly, skip all reducible
fractions, and secondly, skip all fractions with numerator 0 except the very first one, 0/1, which
will serve as the notation for zero. The simplest example of such a numbering is constructed by
"corners" diverging from the origin. We give the number one to the fraction 0/1. The first
"corner" contains the fractions 1/1, 0/2 and −1/1; the fraction 0/2 we, as agreed, skip, while the
other two (the numbers 1 and −1) receive the numbers two and three. Moving along the next
"corner", we number: 1/2 (No. 4), 2/1 (No. 5), 0/3 (skipped), −1/2 (No. 6) and −2/1 (No. 7). On the
corner after that, we already have to skip reducible fractions: 3/1 (No. 8), 2/2 (skipped), 1/3 (No. 9),
0/4 (skipped), −1/3 (No. 10), −2/2 (skipped), −3/1 (No. 11). Continuing the process "to infinity", we
assign natural numbers to all the rational numbers.


The proposal "to continue the process to infinity" may seem unconstructive, because we
do not and cannot have "infinite" time at our disposal, but in this case we do not need infinite
time. What is important is that whatever the rational number is, we will be able to determine
what number it has in our numbering, and we will be able to do it in a finite time, no matter
how "tricky" the number is given to us.
[60] Just in case, we remind the reader that an irrational number is one that cannot be represented as a
fraction m/n, where m is an integer and n is a natural number; an example of such a number is √2. It is
very important not to make the common but no less monstrous mistake of calling all infinite decimal
fractions, such as 1/3 or 1/7, "irrational". The "infinity" of periodic fractions is actually due to the choice
of the number system and has nothing to do with the properties of the number itself; for example, in the
number system with base 21 both of these fractions have a finite "base-21" representation, while 1/2
becomes an infinite fraction.

One can prove quite easily that the set of infinite decimal fractions [60] is not
countable. Indeed, suppose we have thought up some numbering of the infinite decimal
fractions. Let us now construct, for example, a fraction having zero
integer part; as its first digit after the decimal point we take any digit except that which
is the first digit in fraction No. 1; as the second digit - any digit except that which is the
second digit in fraction No. 2, and so on. The resulting fraction will be obviously
different from every fraction in our numbering, because it differs from a fraction with
arbitrary number p by at least a digit in the n-th position after the decimal point. It turns
out that in our (infinite!) numbering this new fraction has no number, and this does not
depend on how exactly we tried to number the fractions. It is easy to see that there are
an infinite number of such "unaccounted" fractions, although it is not so important. So,
no numbering can cover the whole set of infinite decimal fractions. The proof scheme
used here is called the Cantor diagonal method in honor of the German mathematician
Georg Cantor, who invented it.
The set of infinite decimal fractions is said to have the power of the continuum;
mathematicians denote it by the symbol ℵ₁ ("aleph-one"). To understand how "much" this
is, let us conduct a mental experiment, especially since its results will be useful to us
when we consider the theory of algorithms. Suppose we have an alphabet , i.e. a finite
set of "symbols", whatever these symbols may be: they may be letters, numbers, any
signs at all, but they may just as well be elements of any nature, as long as they are a
finite set. Let us denote the alphabet by the letter A. Now consider the set of all finite
chains composed of the symbols of the alphabet A, that is, the set of finite sequences
of the form a₁, a₂, a₃, ..., a_k, where each aᵢ ∈ A. This set, including the empty chain
(the chain of length 0), is denoted by A*. It is not difficult to see that, since the alphabet
is finite and every individual chain is finite too (although we do not limit their length,
i.e. we may consider chains a billion symbols long, a trillion, a trillion trillion, and so
on), the whole set of chains is countable. Indeed, let the alphabet consist of n symbols.
The empty chain will receive the number 1; the chains of one symbol will receive the
numbers 2 through n + 1; the chains of two symbols, of which there are n², will receive
the numbers n + 2 through n² + n + 1; similarly, the chains of three symbols will start
with n² + n + 2 and end with n³ + n² + n + 1, and so on.
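This numbering of chains is easy to make concrete; a sketch that lists the chains of a two-symbol alphabet in exactly this order, shorter chains first:

    from itertools import product

    # Enumerate A* in order of increasing length: the empty chain gets
    # number 1, then all chains of length 1, of length 2, and so on.
    def first_chains(alphabet, count):
        chains = [""]
        length = 0
        while len(chains) < count:
            length += 1
            for combo in product(alphabet, repeat=length):
                if len(chains) == count:
                    break
                chains.append("".join(combo))
        return chains

    print(first_chains("ab", 8))
    # prints ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa']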

Let us now consider an alphabet consisting of all the characters ever used in books
on Earth. Let's include Latin, Cyrillic, Greek alphabet, Arabic letters, exotic alphabets
used in Georgian and Armenian, all Chinese, Japanese and Korean characters,
Babylonian cuneiform, hieroglyphics of ancient Egypt, Scandinavian runes, add
numbers and all mathematical signs, think for a while if we have not forgotten
something, add everything that we remember or that our acquaintances advise us. It is
clear that such a super-alphabet, despite its rather impressive size, will still be finite.
Let us denote it by the letter V and see what we end up with in the set of chains V*
(recall that we also consider only finite chains).
Obviously, this set will include all books ever written on Earth. Not only that, it
will also include all books that have not yet been written but will someday be written
[61]; all books that have never been and will never be written but theoretically could
have been written; as well as books that no one would ever write: for example, all books
whose text is a chaotic sequence of symbols from different writing systems, where the
German letter "ß" stands next to the pre-revolutionary Russian "yat", and Japanese
characters are interspersed with Babylonian cuneiform.

[61] The obvious objection here is that symbols may appear in the future which we have not included
in V; but on closer examination nothing is changed by this: it is enough to devise some way of
designating such symbols by means of the symbols already existing. For example, we can introduce a
special symbol followed by ordinary decimal digits giving the number of the "new" symbol, and assign
such numbers to all new symbols as they appear. Moreover, we could even consider an alphabet of just
two symbols, 0 and 1, with all the other symbols encoded by combinations of zeros and ones; in fact,
this is exactly how things stand in the memory of computers.
If this is not enough for you, imagine a book as thick as the radius of the solar
system, and the diagonal of the sheet as from us to the neighboring galaxy, and typed
in the usual 12th font; it is clear that it is physically impossible to make such a book,
but, nevertheless, the set V* will include not just "this book", it will include all such
books, differing from each other at least by one symbol. And not only these books, the
Universe is infinite! Who's stopping us from imagining a book a billion light years in
size? And all such books?
If your imagination has not failed you yet, you can only be envied. And after all this,
we remember that the set V* is "only" countable! To get a continuum, we need to
consider not just books of ridiculously huge size, we need infinite books, the kind that
have no size at all, that never end, no matter how much we move along the text; in
comparison with this, a book with a format of a billion light years turns out to be
something like a child's toy. Moreover, such an "infinite book" itself, taken separately,
does not yet give a continuum, it is only one; no, we need to consider all infinite books,
and then we will see that the considered set (of infinite books) has the power of a
continuum. But it is not quite clear how exactly we are going to "consider" these infinite
books, especially directly in all their variety, if we cannot even imagine one such book.
For those readers who want to test their imagination, we advise searching the Internet for
articles about the so-called Graham's number. If, say, we cut the entire observable part of the
Universe into Planck volumes (in modern physics this is considered the smallest meaningful
volume, one that cannot be divided into parts) and imagine a number whose decimal notation
occupies the whole Universe (or rather, all of its part that is at least somehow available for
observation from the Earth), with each decimal digit the size of one Planck volume, the
resulting monster still does not come anywhere near Graham's number; even if in each Planck
volume of our Universe we "cram" another such universe and fill all these universes to the
brim with decimal digits, it will not bring us much closer to Graham's number. Imagination
fails long before the description of the number reaches the finish line, even though that
description is quite correctly formulated in mathematical language. Of course, the notation of
Graham's number is also included in our V*, which is understandable: this number is
mathematically defined (i.e. there really exists a text giving its definition) and is the solution
to a clearly formulated problem.
Infinite texts, like infinite fractions, you might say, play in a different league - there, trying
to imagine something like that is simply futile from the start.
It is quite obvious that it is impossible to operate with continuous infinities in any
constructive sense; countable infinities represent an absolute limit not only for the
human brain but for any other brain, unless such a brain happens to be infinite in itself.
Frankly speaking, even when working with countable infinities, we never consider the
infinities themselves, we simply state that whatever element N is, there will always be
an element N +1; in other words, whatever the set of elements of the set already
considered, we will always come up with another element that is part of the set but not
yet considered in the proposed set. Here, by and large, there is no "infinity"; there is
simply our refusal to consider some "ceiling" above which for some reason it is
forbidden to jump.
At the same time, in mathematics we consider not only continuum infinities, but
also infinities exceeding the continuum: the simplest example of such an infinity is the
set of all sets of real numbers (its power is denoted by ℵ₂). Of course, such constructions
do not carry any constructive sense; as long as we set ourselves applied tasks, the
appearance of a continuum infinity in our reasoning should serve as a kind of "warning
light": be careful, going beyond the limits of the constructive. Does this mean that math
is somehow bad? Of course not; it's just that math doesn't have to be constructive at all.
Mathematicians are constantly testing the limits of human thinking, and for this alone
they should be thanked, as well as for the fact that it is mathematicians who provide the
general public with the means to develop brain capacity; for the sake of this effect alone
- the development of one's own intellectual capacity - mathematics is certainly worth
studying. It's just that not every mathematical model is suitable for applied or, if you
will, engineering purposes; that in itself is neither bad nor good, it's just a fact.
The situation is more delicate with the question of whether there exist sets which cannot be
numbered (i.e. are uncountable) but which are "smaller" than the continuum; in other words,
whether there are any other infinities between the countable and the continuum. The existence of such infinities
can neither be proved nor disproved, i.e. we have the right to think that such sets do not exist,
but just as we have the right to think that they do. It is clear, however, that it will not be possible
to construct such a set, that is, to describe the elements of which it will consist, just as we
described natural numbers for countable sets and infinite decimal fractions for continuums; if
this were possible, the existence of such sets would be proved, and this is impossible (and
this very impossibility, strangely enough, has been proved). That is, even if we assume that
such sets exist, nobody will let us "touch" them.

1.3.5. Algorithms and computability


In discussing the history of computers, we noted (see page 62) that the job of a computer is to perform calculations, although in most cases the results of these calculations have nothing to do with numbers. Any information the computer works with must be represented in some objective form, and, as a consequence, the rules by which one piece of information is used to produce another (and this is exactly what the computer does) are nothing but functions in the strictly mathematical sense of the word: the set of "portions of information", represented in the chosen objective way, serves both as the domain of such a function and as its range. It is usually assumed that information, both initial and resulting, is represented as chains of symbols in some alphabet A [62], i.e. each "portion of information" is an element of the set A* already familiar to us from the previous paragraph. Taking this into account, we may consider that the computer's work always consists in computing some function of the form A* → A* (the expression X → Y denotes a function with domain X and range Y).
[62] Recall that an alphabet can be understood as an arbitrary finite set, usually consisting of at least two elements, although in some problems one-symbol alphabets are also considered. An alphabet cannot be empty.
In doing so, we can easily notice a certain problem. If we can compute the value
of a function (in any sense whatsoever), then obviously we can somehow write down our intermediate calculations, i.e. we can represent the computation as a text. Moreover, if a function can be computed in some sense, then the very rule of how it is to be computed can also be represented as a text. The set of all possible texts, as we already know (see page 172), is countable. Meanwhile, by means of the same Cantor diagonal method, it is easy to see that the set of all functions over the natural numbers has the cardinality of the continuum; if we replace texts, i.e. elements of the set A*, by their numbers (which can be done by virtue of the countability of the set A*), it turns out that the functions of the form A* → A* also form a continuum, whereas the set of all possible rules for computing a function is at most countable, because such rules can be written down as texts. Consequently, it is impossible to provide every such function with a rule by which it is computed; some functions are said to be computable, while the others are not.
Even if we consider only functions whose domain is the set of natural numbers and whose range is the set of decimal digits from 0 to 9, this set of functions is still a continuum: indeed, each such function can be put into one-to-one correspondence with an infinite decimal fraction with zero integer part, by taking f(1) as the first digit, f(2) as the second, f(27) as the twenty-seventh, and so on, and the set of infinite decimal fractions, as we have already seen, is uncountable, i.e. a continuum. Clearly, if we extend the range to all the naturals, there will be no fewer functions; but there will be no "more" of them either: they will remain the same continuum. At the same time, the computable functions, let us recall, form at most a countable infinity, because each of them corresponds to a description, i.e. a text, and the set of all texts is countable. It turns out that there are far more "natural" functions than computable ones, whatever we mean by computability, as long as we mean that the rules of constructive computation can be written down.
Since infinite binary fractions also form a continuum, we can simplify the set of functions under consideration even further, leaving only two possible results: 0 and 1, or "false" and "true". Such functions, which take a natural argument and produce truth or falsehood, also turn out to form a continuum, from which it immediately follows that the set of all sets of natural numbers is also uncountable (has the cardinality of the continuum): indeed, every such function defines a set of natural numbers, and, conversely, to every set of natural numbers there corresponds the function that yields truth for the elements of the set and falsehood for the numbers not in it. We shall not need this result in what follows, but it is so beautiful that it would be barbaric not to mention it.
A computer performs its calculations by obeying a program that embodies a certain constructive procedure, or algorithm. Simply put, for a computer to be of any use to us (and it can only be useful by producing one piece of information from another), we need someone who knows exactly how to obtain the "other" piece of information from the "one", and knows it so well that he or she can make the computer put this knowledge into practice without direct supervision from the owner of the original knowledge; this person, in fact, is called a programmer. As one may easily guess, an algorithm is precisely the rule by which a function is computed; we may say that a function should be considered computable if there exists an algorithm for computing it.
The simplicity of this reasoning is deceptive; the concepts of algorithm and computable function turn out to be very intricate. Thus, in almost any school computer science textbook you will find a definition of an algorithm: not an explanation of what is being discussed, not a story about the subject, but a definition, a short phrase like "an algorithm is such-and-such". Such a definition is usually set in large bold type, surrounded by a frame, supplied with some pictogram bearing an exclamation mark; in short, everything is done to convince both students and their teachers that it must be memorized by heart. Unfortunately, such definitions are good only as a rote-learning exercise. In fact, there is no such thing as a definition of an algorithm; whatever "definition" is given, it will necessarily be wrong: every author who gives such a definition commits a factual error the moment he decides to start formulating it, and it does not matter at all what the final formulation is. There is no correct definition, not in the sense that "different definitions are possible" or that "we do not know the exact definition now, but perhaps we will someday"; on the contrary, we know for certain that there is no definition of an algorithm and there cannot be one, because any such definition, whatever it may be, would pull the foundation out from under an entire branch of mathematics: the theory of computability.
To understand how this came about, we need another excursion into history. In the first half of the 20th century, mathematicians became interested in the question of how, among the whole theoretical variety of mathematical functions, to single out those that a person, using mechanical or any other devices, can actually calculate. The impetus was given by David Hilbert, who in 1900 formulated a list of (then) unsolved mathematical problems known as the "Hilbert problems"; the problem of solving an arbitrary Diophantine equation, known as Hilbert's tenth problem, later turned out to be unsolvable, but to prove this fact it was necessary to create a theory formalizing the notion of "solvability": without it, nothing definite can be said about the set of problems that can be constructively solved, or about what constructiveness should be taken to mean. The problems of solvability (in other words, of computability of functions) were studied by such famous mathematicians as Kurt Gödel, Stephen Kleene, Alonzo Church and Alan Turing.
Functions operating with irrational numbers had to be discarded immediately: the irrationals turned out to be "too many"; in the previous paragraph we gave some explanation of why continuum infinities are unsuitable for constructive work.
Being thus limited to countable sets, Gödel and Kleene proposed to consider for theoretical investigation only functions of natural arguments (possibly several of them) whose values are also natural numbers; if necessary, any function working on arbitrary countable sets (including, importantly for us, the set A*) can be reduced to such "natural" functions by replacing the elements of the sets with their numbers.
Common sense suggests that even such functions are not always computable; the appeal to common sense is necessary here because we have not yet understood (and, strictly speaking, never will understand) what a "computable function" is. Nevertheless, as already mentioned, the functions of the form N → N (where N is the set of natural numbers) form a continuum, while algorithms apparently form at most a countable set; the total number of possible functions, even when we consider "just" natural functions of a natural argument, is "much larger" than the number of functions that can be computed in any way.
Studying the computability of functions, Gödel, Kleene, Ackermann and other mathematicians arrived at a class of so-called partially recursive functions. This class is defined by a basic set of very simple initial functions (the constant, the increment by one, and the projection, i.e. a function of several arguments whose value is one of its arguments) and operators, that is, operations on functions that allow new functions to be constructed (the composition, primitive recursion and minimization operators); a partially recursive function is any function that can be constructed from the listed initial functions with the help of the listed operators. The word "partially" in the name of the class indicates that the class necessarily includes functions that are defined only on some set of numbers [63], while for numbers outside this set they are undefined, i.e. cannot be computed. Note that the epithet "recursive" in this context means that the functions are expressed one through another, possibly even through themselves, though not necessarily so. As we shall see later, in programming the meaning of the term "recursion" is somewhat narrower.
Numerous attempts to extend the set of computable functions by introducing new operations came to nothing: each time it was proved that the class of functions defined by the new set of operations is the same as the already known class of partially recursive functions, and that all the new operations can be safely (though in some cases rather ingeniously) expressed through the existing ones.
Alonzo Church abandoned further attempts to extend this class and stated that it is apparently precisely the partially recursive functions that correspond to the notion of a computable function under any reasonable understanding of computability. This claim is called Church's thesis. Note that Church's thesis cannot be regarded as a theorem: it cannot be proved, since we have no definition of a computable function, let alone a definition of a "reasonable understanding". But why not, you may ask, supply some definition so that Church's thesis becomes provable? The answer is very simple. By turning Church's thesis into a supposedly proven fact, we would be depriving ourselves, quite unreasonably, of the prospects for further research into computability and the various mechanisms of computation.
So far, all attempts to create a set of constructive operations richer than the one proposed earlier have failed: each time it turns out that the class of functions is exactly the same. It is quite possible that this will always be the case, that is, that the class of computable functions will never be extended; this is what Church's thesis asserts. But this cannot be proved, if only because it is not quite clear what a "constructive operation" is and what a set of such operations is. Hence there always remains the possibility that someone in the future will come up with a set of operations more powerful than the basis of the partially recursive functions. In that case Church's thesis will be refuted, or, more precisely, a new thesis will appear in its place, similar to the existing one but referring to a different class of functions. Let us emphasize that a definition of a computable function will still not appear, because even if the class of computable functions turns out to have been extended, this in itself cannot mean that it cannot be extended further.
[63] Why this is so important, we shall learn a little later; see the discussion on page 190.
With a bit of a stretch one may consider that the class of partially recursive functions with all its properties constitutes an abstract mathematical theory, like Euclid's geometry or, say, probability theory, while the notion of computability as such lies outside mathematics, being a property of our Universe (the "real world") along with the speed of light, the law of universal gravitation and the like. Church's thesis then turns out to be a kind of scientific hypothesis about how the real world works; everything finally falls into place if we remember that, according to Karl Popper's theory of scientific knowledge, hypotheses are never true, only not yet refuted, and a researcher must bear in mind that any hypothesis, however much evidence supports it, may be disproved in the future. Church's thesis states that any function that can be constructively computed belongs to the class of partially recursive functions; no one has yet managed to disprove this statement, and we therefore accept it as true. Note that Popper's falsifiability criterion applies perfectly well to Church's thesis. Indeed, we can (and easily enough) specify an experiment whose positive outcome would disprove Church's thesis: it suffices to build some constructive automaton that computes a function not belonging to the class of partially recursive functions.
The formal theory of algorithms is constructed in much the same way as the theory of computability. An algorithm is said to be a constructive realization of some transformation of an input word into a result word, where both the input word and the result word are finite chains of symbols in some alphabet. In other words, to be able to discuss algorithms we must first fix some alphabet A, and algorithms then turn out to be constructive realizations of the already familiar transformations of the form A* → A*, that is, simply put, realizations (constructive rules of computation, if you like) of functions of one argument whose argument is a word of symbols from the alphabet and whose result is again such a word. Of course, none of this can be considered a definition of an algorithm, since it relies on expressions such as "constructive realization" and "constructive rules of computation", and these "terms" themselves remain undefined. Continuing the analogy, we note that not every such transformation can be realized by an algorithm: there is a continuum of such transformations, while algorithms, of course, form at most a countable set, because whatever we understand by an algorithm, we in any case mean that it can be written down in some way, i.e. represented as a finite text, and the set of all possible texts is countable. Moreover, it would not be quite right to identify an algorithm with the transformation it performs, since two different algorithms can perform the same transformation; we shall return to this question shortly.
Alan Turing, one of the founders of the theory of algorithms, proposed a formal model of an automaton known as the Turing machine. This automaton has a tape, infinite in both directions, each cell of which can hold one character of the alphabet or be empty. A head moves along the tape; it can be in one of several predefined states, one of which is designated as the initial state (the head is in it when work begins) and another as the final state (on entering it the machine finishes its work). Depending on the current state and the character in the current cell, the machine can:
• write into the current cell any character of the alphabet in place of the one currently written there, including the same character, i.e. leave the cell contents unchanged;
• change the state of the head to any other state, including remaining in the state the head was in before;
• move one position to the right, one position to the left, or remain in the current position.
A program for a Turing machine, more commonly referred to simply as a "Turing machine", is represented as a table specifying what the machine should do for each combination of current symbol and current state; the symbols are laid out horizontally, the states vertically (or vice versa), and each cell of the table records three values: the new symbol, the new state, and the move to make (left, right, or stay put). Before the machine is started, an input word is written on the tape; if, after some number of steps, the machine enters the final state, the word then written on the tape is said to be the result of its work.
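To make the tabular form of a Turing machine more tangible, here is a minimal sketch in C; the machine, its states, the symbol encoding and the transition table are all invented for this illustration and are not taken from the book. The machine adds one to a binary number written on the tape:

    #include <stdio.h>

    enum { BLANK = 2 };                 /* tape symbols: 0, 1, blank */
    enum { RIGHTMOST, CARRY, DONE };    /* head states; DONE is final */

    struct rule { int write; int move; int next; };  /* move: -1, 0, +1 */

    /* The transition table, rules[state][symbol], IS the machine. */
    struct rule rules[2][3] = {
        /* RIGHTMOST: run to the right end of the number */
        { {0, +1, RIGHTMOST}, {1, +1, RIGHTMOST}, {BLANK, -1, CARRY} },
        /* CARRY: add one, propagating the carry leftwards */
        { {1, 0, DONE}, {0, -1, CARRY}, {1, 0, DONE} }
    };

    int main(void)
    {
        int tape[64], i, pos = 1, state = RIGHTMOST;
        for (i = 0; i < 64; i++) tape[i] = BLANK;
        tape[1] = 1; tape[2] = 0; tape[3] = 1;   /* input word: 101 (five) */

        while (state != DONE) {          /* the final state stops the machine */
            struct rule r = rules[state][tape[pos]];
            tape[pos] = r.write;
            pos += r.move;
            state = r.next;
        }
        for (i = 0; i < 64; i++)
            if (tape[i] != BLANK) putchar('0' + tape[i]);
        putchar('\n');                   /* prints 110 (six) */
        return 0;
    }

Note that the whole machine is the table; the loop merely applies it step by step, which is exactly why the table itself is customarily called a "Turing machine".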
Turing's thesis states that whatever reasonable understanding of an algorithm one adopts, any algorithm corresponding to this understanding can be realized as a Turing machine. The thesis is supported by the fact that numerous attempts to create a "more powerful" automaton have failed: for every formalism (formal automaton) created, it is possible to show how to build a Turing machine equivalent to it. Many such formalisms have been constructed: normal Markov algorithms, all kinds of automata with registers, and variations on the theme of the Turing machine itself, such as Post's machine, machines with several tapes, and so on. Each such "algorithmic formalism", considered as a concretely defined working model in place of the "elusive" notion of an algorithm, turns out to be useful in one way or another for the development of the theory, and in some cases for practical application; there are, in particular, programming languages based on the lambda calculus (which belongs rather to the theory of computable functions), as well as languages whose computational model resembles Markov algorithms. For each such formalism it was proved that it can be realized on a Turing machine, and that a Turing machine can be realized on it.
Nevertheless, Turing's thesis cannot be proved, because it is impossible to define what a "reasonable understanding of an algorithm" is; this does not exclude the theoretical possibility that some day Turing's thesis will be disproved: to do so, it would suffice to propose a formal automaton that corresponds to our understanding of an algorithm (i.e. is constructively realizable) but has configurations that cannot be translated into a Turing machine. The fact that no one has yet managed to propose such an automaton formally proves nothing: what if someone gets luckier?
All this is very similar to the situation with computable functions, partially recursive functions and Church's thesis, and the similarity is not accidental. As we have already noted, all transformations of the form A* → A* can be turned into transformations of the form N → N by replacing an element of the set A* (i.e. a word) with its number (which can be done since the set A* is countable), and vice versa, by replacing the number of a word with the word itself. Moreover, it has been proved that any transformation realized by a Turing machine can be given as a partially recursive function, and any partially recursive function can be realized as a Turing machine.
Does this mean that "computable function" and "algorithm" are the same thing? Formally speaking, no, and for two reasons. First, both notions are undefined, so their equivalence can be neither proved nor disproved. Second, as already mentioned, the notions differ somewhat in content: if two functions written down in different ways have the same domain and always give the same values for the same arguments, it is usually considered that we are dealing with two records of the same function, whereas for algorithms we would speak of the equivalence of two different algorithms.
An excellent example of such algorithms is the solution of the famous Towers of Hanoi problem. The problem involves three rods, one of which holds N flat disks of different sizes stacked in a kind of pyramid (the largest disk at the bottom, the smallest at the top). In one move we can transfer one disk from one rod to another; any disk may be placed on an empty rod, but if a rod already holds disks, a disk may only be placed on top of a larger one, never the other way around. Several disks cannot be taken at once: exactly one disk is moved per move. The task is to move all the disks from one rod to another in the smallest possible number of moves, using the third rod as an intermediate.
It is well known that the problem is solved in 2^N - 1 moves, where N is the number of disks; moreover, the recursive [64] algorithm for solving it is well known too, set out, for example, in Ya. Perelman's book "Living Mathematics" [5], published in 1934. The basis of the recursion can be the transfer of one disk in one move from the source rod to the target rod, but it is even simpler to take as the basis the degenerate case in which the problem is already solved and nothing needs to be moved anywhere, i.e. the number of disks to be moved is zero. To move N disks, we use our own algorithm (i.e. invoke it recursively) to first move N - 1 disks from the source rod to the intermediate rod, then move the largest disk from the source rod to the target rod, and then invoke ourselves again to move the N - 1 disks from the intermediate rod to the target rod.
[64] When we talk about algorithms or program fragments, recursion means the use of such an algorithm (or program fragment) by itself to solve a simpler case of the problem.
To implement this algorithm as a program, we need to agree on some rules for recording moves. This is quite simple: since only one disk is moved at a time, a move is represented by a pair of rod numbers: from which rod and to which rod the next disk is transferred. The initial data for our program is the number of disks. We will write the text of this program in Pascal and in C later, once we have studied these languages sufficiently; the impatient reader may look at §2.11.2, where the solution in Pascal is given, while for the solution in C one will have to turn to the second volume of our book (§4.3.22). Note, running ahead, that the recursive subroutine performing the actual solution of the problem takes eight lines in both languages, including the header and the operator brackets (a sketch of it is given below); everything else we have to write consists of auxiliary actions that check the correctness of the input data and convert it from textual to numeric representation.
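For the curious, here is what that eight-line subroutine looks like; this is a sketch of ours in C, with names chosen for this illustration (the book's complete versions, with all the input checking, come later in §2.11.2 and §4.3.22):

    #include <stdio.h>

    /* Move n disks from rod "from" to rod "to" using rod "via" as the
       intermediate; each move is printed as a pair of rod numbers. */
    void hanoi(int n, int from, int to, int via)
    {
        if (n == 0)
            return;                      /* degenerate case: nothing to move */
        hanoi(n - 1, from, via, to);     /* clear the way for the largest disk */
        printf("%d %d\n", from, to);     /* move the largest disk */
        hanoi(n - 1, via, to, from);     /* put the rest back on top of it */
    }

    int main(void)
    {
        hanoi(3, 1, 2, 3);               /* three disks from rod 1 to rod 2 */
        return 0;
    }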
Let us now consider another solution to the same problem, this time without recursion. Perelman did not give this solution, so few people know it [65]; at the same time, in words this solution is much simpler to describe than the recursive one. On odd-numbered moves (the first, the third, the fifth, etc.) the smallest disk (i.e. disk no. 1) moves "in a circle": from the first rod to the second, from the second to the third, from the third to the first, and so on, or, on the contrary, from the first to the third, from the third to the second, from the second to the first, and so on. The choice of "direction" of this circle depends on the total number of disks: if it is even, we go in the "natural" direction, i.e. 1 → 2 → 3 → 1 → ..., and if it is odd, we go around the circle "backwards": 1 → 3 → 2 → 1 → ... The move made on even-numbered moves is determined unambiguously by the fact that we must not touch the smallest disk, and there is only one way to make a move without touching it: we simply look at the two rods that do not hold the smallest disk and make the only possible move between them.
[65] The author does not claim the laurels of the inventor of this variant of the algorithm: he remembers distinctly that this solution was told to him in his student years by one of the senior students who ran a special seminar he attended, but it is hard to recall who exactly it was.
Strange as it may seem, the computer program embodying this variant of the solution turns out to be much more complicated (more than ten times!) than the one given above for the recursive case. The point is the formal realization of the phrase "look at the two rods and make the single possible move". A person solving the puzzle will indeed look at the rods, compare which of the top disks is smaller, and move that disk. To do the same in a program, we would have to remember which disks currently sit on which rod, which is not too hard if you know how to work with singly linked lists, but still cumbersome enough for us not to insert the text of this program into the book, at least not in its entirety (a compressed sketch is given below). The recursive solution was so easy precisely because we did not remember which disks were where: we knew what moves to make without it.
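Here is what the bookkeeping costs; this is a compressed sketch of ours in C that stores each rod as an array-based stack rather than a linked list (all names and conventions are invented for this illustration):

    #include <stdio.h>

    #define MAXDISKS 64

    static int rod[3][MAXDISKS];   /* rod[r][0..top[r]-1], bottom to top */
    static int top[3];

    static void move_disk(int from, int to)
    {
        rod[to][top[to]++] = rod[from][--top[from]];
        printf("%d %d\n", from + 1, to + 1);   /* print 1-based rod numbers */
    }

    int main(void)
    {
        int n = 3, i, k, small = 0;            /* rod holding the smallest disk */
        int dir = (n % 2 == 0) ? 1 : 2;        /* +1 or +2 (mod 3): the "circle" */

        for (i = 0; i < n; i++) rod[0][i] = n - i;   /* the initial pyramid */
        top[0] = n;

        for (k = 1; k <= (1 << n) - 1; k++) {
            if (k % 2 == 1) {                  /* odd move: the smallest disk */
                int next = (small + dir) % 3;
                move_disk(small, next);
                small = next;
            } else {                           /* even move: the only legal one */
                int a = (small + 1) % 3, b = (small + 2) % 3;
                if (top[a] == 0)
                    move_disk(b, a);
                else if (top[b] == 0 || rod[a][top[a] - 1] < rod[b][top[b] - 1])
                    move_disk(a, b);
                else
                    move_disk(b, a);
            }
        }
        return 0;
    }

Even in this compressed array-based form the move-selection logic dwarfs the eight-line recursive routine; the full list-based version the text refers to is longer still.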
What we can say for certain is that, given the same input data, this program will print the same moves as the previous (recursive) one, although the program itself is obviously written quite differently. The terminological problem that arises here is as follows. Obviously, we are dealing with two different programs (such programs are called equivalent). But are we dealing with two different algorithms, or with the same algorithm written out in different words? In most cases the "obvious" answer is that these are two different algorithms, albeit equivalent ones, i.e. realizing the same (computable, of course) function.
Nevertheless, it is usually considered that the theory of algorithms and the theory of computable functions are more or less the same thing. Common sense suggests that this is right: both deal with some "constructive" transformation of a countable set into itself, only the sets considered differ: in one case it is the set of chains over some alphabet, in the other simply the set of natural numbers. The numbering of chains makes it easy to pass from one to the other without violating "constructiveness", whatever that is; hence we are dealing with the same essence, only expressed in different terms. Whether to distinguish transformations only by their "external manifestations", as in the theory of computability, or also by their "concrete embodiment", as in the theory of algorithms, is ultimately a matter of tradition. Moreover, many authors use the notion of the "Church-Turing thesis", implying that the two theses should not be separated: they speak of the same thing, only in different terms.
After all that has been said, any definition of an algorithm is perplexing at best, because by giving such a definition we would scrap the theory of computability and the theory of algorithms, Church's thesis together with Turing's thesis, the numerous attempts to build something more powerful (for example, a Turing machine with several tapes) along with the proofs of their equivalence to the original Turing machine, the work of hundreds, perhaps thousands of researchers; and the definition would still turn out to be either wrong or so vague as to be unusable.
Does this mean that the notion of an algorithm cannot be used at all because of its undefinability? Of course not. First, in mathematics we often use concepts that have no definitions: for example, we cannot define a point, a line or a plane, but this fact does not in any way cancel geometry. Second, in the sphere of engineering and technical knowledge strict definitions are rare in general, and this bothers no one. Finally, we should keep in mind the theses of Church and Turing: in their light the notion of an algorithm acquires quite a rigorous mathematical content; one must only remember where this content came from, i.e. not forget the role the theses of Church and Turing play in our theory.

1.3.6. Algorithm and its properties


Although the notion of an algorithm itself cannot be defined, our intuitive understanding of algorithms allows us to speak of certain properties that hold for any object we are prepared to regard as an algorithm. Besides these, we can speak of certain characteristics of algorithms, i.e. properties that may hold for some algorithms and not for others.
One of the basic properties of any algorithm we have already used repeatedly in our reasoning, postulating that an algorithm can be written down as a text, and a finite one at that: we cannot physically write out infinite texts, so, as long as algorithms pertain to constructive activity, they cannot be infinite. The representability of any algorithm as a text, and a finite one, may be called the properties of objectivity and finiteness of an algorithm.
Another rather obvious property of any algorithm is its discreteness: whatever the rule by which the resulting information is obtained from the initial information, and whoever the executor of this rule may be, the execution itself is always a discrete process which, on closer inspection, breaks down into elementary steps or actions. Discreteness can also be understood in the sense that any information the algorithm works on can be represented as a text, which means, in particular, that we can work with integers and rational numbers, whereas with irrational numbers we will have certain difficulties. However, as soon as there is a notation for an irrational number, an algorithm can operate with such a number, or rather not with the number itself but with its notation; for example, we can imagine an algorithm for solving quadratic equations which, when the square root of the discriminant cannot be extracted exactly, produces as its result an expression containing that root: given the equation x^2 - 5x + 6 = 0 as input, such an algorithm would produce the numbers 2 and 3 as the result, whereas for the equation x^2 - x - 1 = 0 it would, using one or another textual notation for the square root, produce the expressions (1 - √5)/2 and (1 + √5)/2 as the answer. It is important to realize that at no point would such an algorithm operate on the numerical value of √5, since that value has no discrete representation.
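The idea is easy to sketch in code. The following C fragment (an illustration of ours; the output format is an arbitrary choice) solves x^2 + px + q = 0 with integer coefficients and keeps the root symbolic whenever the discriminant is not a perfect square:

    #include <stdio.h>

    int main(void)
    {
        int p = -1, q = -1;               /* x^2 - x - 1 = 0 */
        int d = p * p - 4 * q;            /* the discriminant */
        int r = 0;

        while (r * r < d) r++;            /* try an integer square root */
        if (d >= 0 && r * r == d)         /* perfect square: numeric roots */
            printf("x1 = %g, x2 = %g\n", (-p - r) / 2.0, (-p + r) / 2.0);
        else if (d >= 0)                  /* irrational roots: keep sqrt(d) as text */
            printf("x1 = (%d - sqrt(%d))/2, x2 = (%d + sqrt(%d))/2\n",
                   -p, d, -p, d);
        else
            printf("no real roots\n");
        return 0;
    }

For the coefficients shown it prints the two expressions with sqrt(5) spelled out as text; at no point does any approximation of the irrational value appear.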
Note that the number √5 has a rather simple "analog" representation: it is enough to fix some standard of unit length, for example the centimeter, and take a rectangle with sides 1 and 2; its diagonal will then be exactly the root of five. In other words, we can construct (with compass and straightedge, if you like) a segment exactly √5 times longer than the unit length. With a little thought one can, starting from integer initial values, also obtain √5 as a force acting on some body, as a volume of liquid in a vessel, and as other such analog physical quantities. Algorithms, however, work with nothing of the kind; as we have already said, the whole theory of computability is built exclusively on integers, and computers (ultimately) operate on integers.
In principle, so-called analog computers are known, which work precisely with continuous physical processes; the initial, intermediate and resulting data in such machines are represented by the values of electrical quantities, most often voltage; but the functioning of analog computers has nothing to do with algorithms.
The third fundamental property inherent in any algorithm is less obvious; it is called determinism and consists in the fact that an algorithm leaves the executor no "freedom of choice": the prescribed procedure can be followed in one and only one way. The only thing that can affect the course of the algorithm's execution is the initial data; in particular, given the same initial data, an algorithm always produces the same results.
Readers with some practical programming experience may note that programs, especially games (and also, for example, cryptographic programs), often deviate from this rule by using random number generators. In fact, this in no way contradicts the theory of algorithms: random numbers, wherever they come from, should simply be considered part of the initial data.
The properties of objectivity and finiteness (in the sense that every algorithm has a finite objective representation), discreteness and determinism are inherent in all algorithms without exception; that is, if something lacks any of these properties, it is certainly not an algorithm. Of course, these properties should rather be regarded as axioms, i.e. statements accepted despite their unprovability: it is simply postulated that algorithms, whatever is meant by them, are always like this. With some stretch one could claim that an algorithm must also be understandable to the executor, but this is already borderline foul play: understandability pertains not to the algorithm but to its record, and an algorithm and its record are not the same thing; moreover, one algorithm can have infinitely many representations even within the same recording system: for example, if you rename all the variables in a program or swap its subroutines around, the algorithm does not change.
Along with the mandatory properties listed above, an algorithm may have (or may lack) certain private properties, such as mass character, completability (applicability to individual input words or to all possible input words), correctness and usefulness (whatever these may mean), and so on. These properties may be desirable, but nothing more; they are characteristics that a particular algorithm may or may not satisfy, and in many cases such satisfaction cannot (cannot!) even be verified. Unfortunately, there are plenty of literary sources of dubious quality (which, sadly, include some school textbooks of computer science) where the mandatory properties of algorithms are piled into one heap with the private ones, producing an absolutely fantastic muddle.
Let us start with the mass-character property of an algorithm, as the simplest one; it is usually understood in the sense that an algorithm should solve a whole family of problems rather than a single problem, which is why an algorithm is made dependent on input data. Mass character is obviously verifiable: considering a particular algorithm, it is easy to determine whether it has this property or not; but that is about all. It is in no way obligatory, i.e. an algorithm that does not depend on input words at all does not thereby cease to be an algorithm. Just as there are constants among computable functions, so there are generators of a single result among algorithms. The famous "Hello, world" program belongs to this very category; it is the example of a first program in a programming language favored by many authors, and all it does is print the phrase "Hello, world!" and terminate. Incidentally, we too will begin learning Pascal, and then C, with this very program. Obviously, a program that always prints the same phrase does not depend on any input data at all and therefore has no mass character. If we considered mass character a mandatory property of an algorithm, we would have to conclude that programs like this one implement no algorithm at all.
If, say, we consider ordinary triangles in the plane, we may note that every triangle obeys the triangle inequality (the sum of the lengths of any two sides is strictly greater than the length of the third side), and that the sum of its angles is always 180°; these are properties of all triangles without exception. In addition, among all triangles there are right triangles, for which the Pythagorean theorem holds; the Pythagorean theorem is certainly an important and useful property, but it would be absurd to demand that it hold for all triangles. The same applies to the mass character of algorithms.
When reasoning about the properties of algorithms fails to draw an explicit distinction between mandatory properties, i.e. properties inherent in all algorithms without exception, and private properties, which an algorithm may or may not possess, it results in gross factual errors; let us now discuss the most popular of them. This discussion will, first, help us avoid repeating mistakes whose popularity in no way diminishes their grossness, and, second, teach us along the way a few more interesting aspects of the theory of algorithms.
It is surprisingly common to encounter the claim that any algorithm must have the properties of "correctness" and "completability"; in other words, that any algorithm must always terminate in a finite number of steps, and not merely terminate but produce the correct result. Of course, we would like all algorithms (and hence all computer programs) never to hang and never to make mistakes; as the saying goes, there is no harm in wanting. In reality things are quite different: such a "paradise-ideal" state of affairs turns out to be fundamentally unattainable, not just technically but, as we shall soon see, purely mathematically. One might as well demand that all cars be equipped with perpetual motion machines, or that the number π be equal to three to make calculations easier.
Let us start with "correctness". This notion is obviously impossible to formalize; worse, when developing an algorithm it is often impossible to specify any criterion for checking "correctness" at all, or, worse still, different people may apply different criteria to the same algorithm. This property of the subject area is well known to professional programmers: one and the same behavior of a program may seem correct to its author and not merely incorrect but, in some cases, outrageous to the customer. No formal descriptions and no amount of testing can remedy this situation. Among programmers it is believed, not without reason, that there are no "correct" programs, only programs in which no errors have been found yet, which by no means implies that there are no errors there. Some time ago the topics of formal program verification, provable programming and the like were popular among research programmers; no one achieved encouraging results in this field, and the popularity of the direction has waned. It turns out that if we take correctness to be a mandatory property of any algorithm, it follows that algorithms do not exist, or at least cannot be "presented", because the property of "correctness" is technically impossible to check.
The "completability" property is even more interesting. The theory of algorithms actively uses the term applicability: an algorithm is called applicable to a given input word if, given this word as input, it terminates in a finite number of steps. And now the most interesting part: the problem of the applicability of an algorithm to an input word is algorithmically unsolvable, i.e. it is impossible (mathematically impossible!) to construct an algorithm which, given the record of another algorithm and some input word, would determine whether that algorithm is applicable to that word. The applicability problem, also known as the halting problem, is perhaps the simplest and most graphic example of algorithmic unsolvability.
The unsolvability of the applicability problem can be proved, as they say, in three lines, if we bring in its special case, the self-applicability problem, i.e. the question of whether a given algorithm will halt if it is given its own record as input. Indeed, suppose such an algorithm exists; call it S. Let us write an algorithm S' of the following form [67]:

    S'(X) = if S(X) then INFINITE_LOOP else EXIT

Clearly, the algorithm S' is either self-applicable or not. Consider its application to itself, that is, S'(S'). Suppose S' is self-applicable; then the result of S(S') is true, and hence S'(S'), by its own definition, goes into an infinite loop, i.e. turns out not to be self-applicable, which contradicts the assumption. Now suppose the algorithm S' is not self-applicable. Then the result of S(S') is false, so S'(S') safely terminates, i.e. turns out to be self-applicable, which again contradicts the assumption. Thus the algorithm S' is neither self-applicable nor non-self-applicable; hence it simply does not exist. As a consequence, the algorithm S does not exist either, for otherwise S' could be written (no matter which of the algorithmic formalisms we use: both branching and the infinite loop are available everywhere).
[67] Here we reason according to a simplified scheme: in particular, we do not distinguish between an algorithm and its record; moreover, we speak of "an algorithm in general", whereas from the formal point of view we should use one of the strictly defined formalisms, say the same Turing machine. All this is easy to repair, but the conciseness of the exposition would suffer greatly, and our task now is not to give a formal proof but to explain why things are the way they are.
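The same diagonal trick can be written down in C-like form. In the sketch below (entirely our own; the names are hypothetical, and no correct body for S can really be written, which is the whole point), S is a stub standing in for the impossible "halting oracle":

    #include <stdio.h>

    /* A stub standing in for the impossible decider: a real S would
       answer, for any algorithm text and input, "does it halt?" */
    int S(const char *algorithm_text, const char *input)
    {
        (void)algorithm_text; (void)input;
        return 0;
    }

    void S_prime(const char *X)
    {
        if (S(X, X))        /* if S claims X halts on its own text... */
            for (;;) ;      /* ...loop forever */
        /* ...otherwise terminate at once */
    }

    int main(void)
    {
        /* Feeding S_prime its own text: whatever S answered, the actual
           behavior of S_prime refutes that answer. */
        S_prime("the text of S_prime itself");
        printf("S said: does not halt; yet we halted.\n");
        return 0;
    }

Whatever body we put into S, running S_prime on its own text refutes S's verdict; that is exactly why no correct body can exist.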
Let us return to the property of "completability". Usually, when it is mentioned, nothing is said about "input words"; that is, the authors who ascribe this property to algorithms apparently mean that an algorithm should be, in the classical terminology, applicable to every input word. Meanwhile, in the general case it is impossible to check even the applicability of an algorithm to one single input word, let alone to the entire infinite set of such words. Of course, trivial cases can be pointed out: for example, an algorithm that always produces the same value regardless of the input word, without analyzing that word in any way, is obviously applicable to any input word. However, not only in the "general case", but
also in all the cases "interesting" enough to include almost any useful computer program, we can say nothing definite on the subject of "completability": algorithmic unsolvability is a stubborn thing. If we take "completability" to be an a priori property of any algorithm (that is, if we assume that whatever lacks this property is not an algorithm), we will be unable to give any examples of algorithms except the most trivial ones, and will certainly lose the largest and most interesting part of them. In other words, if we build "completability" into the notion of an algorithm, as textbook authors often do, then with such a notion of "algorithm" we would, in the vast majority of cases, be unable to say definitively whether what we are looking at is an algorithm or not, i.e. we would simply be unable to tell algorithms from non-algorithms; and what good is such a notion?
The situation looks even more piquant if we take into account that in the theory of computable functions the possible undefinedness of a function on a subset of its domain plays an extremely important role, and any attempt to complete the definition of such functions, in the style of "consider another function equal to zero at all points where the original one is undefined", fails miserably.
To see why this is so, consider the following rather simple reasoning. Since we have agreed that any rule of constructive computation can be written down as a finite text, the computable functions form at most a countable set, i.e. there exists some numbering of all computable functions. Let us now try to introduce a notion of a computable function of one argument such that any function computable in this sense is defined on the entire set of natural numbers. Clearly, since we are still talking about computable functions, these functions again form at most a countable set. Let us denote them all by f1, f2, f3, ..., fn, ...; in other words, let the numbered sequence {fn} exhaust the set of functions computable in our sense. Consider now the function e(n) = fn(n) + 1. Clearly, in any reasonable sense such a function is computable, yet it differs from every fn; that is, it turns out not to belong to the set of computable functions [68].
As a consequence, we must either agree that adding one can turn a computable function into an uncomputable one, or accept as a given that, in considering computable functions (whatever we mean by them), we cannot restrict ourselves to everywhere-defined functions. Note that if the functions of the family {fn} are not required to be everywhere defined, no contradiction arises: indeed, if the function fk is not defined at the point k, then e(n), defined as above, is also undefined at the point k, and consequently its difference from fk is no longer guaranteed by its definition; in other words, e(n) may coincide with any of those functions fk that are undefined at the point equal to their own number, and thus "gains the right" not to go beyond the set {fn}.
[68] To the reader interested in computable functions we can recommend, for a start, V. Boss's book "From Diophantus to Turing" [6], which contains a rather successful popular review of the corresponding mathematical theory. There are, of course, more specialized texts as well. A more detailed exposition of the theory of algorithms and computability we leave outside the scope of this book, so as not to attempt to embrace the boundless.

The theory of computable functions considers, among other things, a "simplified" class of functions: the so-called primitive recursive functions. They differ from the partially recursive functions (see page 178) by the absence of the minimization operator; that is, the functions are constructed from the same initial functions (the constant, the increment by one and the projection) using only the composition and primitive recursion operators. All functions constructed in this way are obviously defined for all values of their arguments; there is simply nowhere for them to "loop". In the class of partially recursive functions it is precisely the minimization operator that introduces the possibility of "undefinedness".
This class would seem quite broad, and, as far as the arithmetic of integers is concerned, one gets the (deceptive) impression that it covers "almost everything". To give an example of a partially recursive function that is not primitive recursive, Wilhelm Ackermann had to invent a function, later called the Ackermann function; this function of two arguments grows so rapidly that all of its values of any practical interest fit into a small table, beyond which the numbers exceed the number of atoms in the Universe.
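The Ackermann function is short enough to show in full; here it is in its common two-argument form (the Ackermann-Peter variant, a standard reformulation rather than Ackermann's original three-argument function):

    #include <stdio.h>

    /* Grows faster than any primitive recursive function of its arguments. */
    unsigned long ack(unsigned long m, unsigned long n)
    {
        if (m == 0) return n + 1;
        if (n == 0) return ack(m - 1, 1);
        return ack(m - 1, ack(m, n - 1));
    }

    int main(void)
    {
        printf("ack(2, 3) = %lu\n", ack(2, 3));   /* 9 */
        printf("ack(3, 3) = %lu\n", ack(3, 3));   /* 61 */
        return 0;
    }

Already ack(4, 2) has 19729 decimal digits, so the "small table" of practically representable values ends very quickly indeed.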
If, however, we return from the realm of integer functions to the world of algorithms, it turns out that, once input and output chains are numbered, the number of atoms in the Universe is not such a large number after all, while an example of a function that is partially recursive but not primitive recursive "suddenly" turns out to be any computer program that uses "non-arithmetic" loops, i.e., in effect, ordinary while loops, for which at the moment the loop is entered it is impossible to determine (other than by running the loop itself) how many iterations it will take; likewise any program using recursion for which it is impossible to know in advance, at the moment of entry, how many levels of nesting there will be [69].
[69] Note that the "primitive recursion operator" uses an integer parameter that decreases by one on each recursion step, so that the number of remaining recursions is always exactly equal to this parameter.

For anyone experienced in writing computer programs, two things are fairly obvious. On the one hand, if a program is written using only arithmetic loops, such as for in Pascal, and without any recursion (or using only "primitive" recursion, in which the number of nested calls is bounded by a given number), then the execution time of such a program can be estimated (from above) in advance. Such programs simply have nowhere to loop forever, so whatever data are given as input, the program is certain to terminate in a finite number of steps.
On the other hand, alas, no useful program can be written this way. As soon as we attempt a practical problem, we have to resort either to a while loop, or to recursion without a "level counter", or to a backward jump (which is in fact the same while loop, so one had better use a loop anyway). And together with such "nondeterministic repetition", the sources of undefinedness, the algorithmic unsolvability of the halting problem and other delights known to users under the general name of "glitches" seep into our program. Looked at from this angle, reality shows that the general "glitchiness" of programs is not a consequence of programmers' carelessness, as is commonly believed, but a mathematical property of the subject area: by adding even one nondeterministic repetition to a program (be it a non-arithmetic loop or non-primitive recursion), we thereby move the algorithm implemented by the program from the class of primitive recursive functions into the class of partially recursive functions.
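A classic illustration (our example, not the book's) is the Collatz iteration: nobody has proved that the while loop below terminates for every starting value, which is precisely the kind of "non-arithmetic" repetition just described:

    #include <stdio.h>

    int main(void)
    {
        unsigned long n = 27, steps = 0;
        while (n != 1) {            /* iteration count unknowable in advance */
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            steps++;
        }
        printf("reached 1 after %lu steps\n", steps);   /* 111 for n = 27 */
        return 0;
    }

To rewrite this with an arithmetic for loop, one would have to name an iteration bound in advance, which is exactly what no one knows how to do here.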
Thus, the undefinedness of some computable functions at some points, or, what is the same, the inapplicability of some algorithms to some input words, is a fundamental property of any model of constructive computation, following almost trivially from the very foundations of the theory. Not only can nothing be done about it; nothing should be done about it, for otherwise we would lose much more than we would gain: our programs would become "correct" and would stop "glitching" altogether, but they would also become useless.

1.3.7. A sequence of actions has nothing to do with it
Among the "definitions" of an algorithm which, despite being wrong from the outset, are still found in the literature, and more often than one would like, variations on the theme of a "sequence of actions" prevail. In reality, an algorithm in the general case need have nothing in common with any sequence of actions, at least not an explicitly described one.
Indeed, we have already mentioned that a variety of formal systems are used to study algorithms: among them, for example, the Turing machine (represented by a state transition table) and the partially recursive functions (represented by a combination of basic functions and operators). Algorithmic formalisms also include normal Markov algorithms (a set of rules for rewriting words), the lambda calculus (systems of functions of one argument over some space of expressions), the abstract machine with unbounded registers (URM), and many other models. Of all these formalisms, an explicitly written-out sequence of actions is inherent only in the URM; meanwhile, it has been proved repeatedly that all these formalisms are pairwise equivalent, i.e. they define one and the same class of possible computations. In other words, any algorithm can be represented in any of the available formal systems, most of which do not involve describing (at least explicitly) any sequence of actions.
If we come back down from the mathematical heavens to the sinful programmer's earth, we find that the embodiment of an algorithm in the form of a computer program is by no means always a description of a sequence of actions. Everything here depends on the programming paradigm in use, i.e. on the programmer's style of thinking, which is in many respects determined by the programming language. Thus, when working in Haskell and other functional programming languages, we have no actions at all; we have only functions in the purely mathematical sense, functions that compute a value from given arguments. The computation of such a function is expressed through other functions, and this expression has nothing to do with actions and their sequences. In most cases it is impossible to predict in what order the computations in a Haskell program will be performed, because the language implements so-called lazy semantics, which allows a computation to be postponed until its result is needed by another computation.
While in Haskell we at least specify how to compute the result, even if not as a specific "sequence of actions", in programming languages with declarative semantics we pay no attention at all to how the required solution is to be found; we only specify what properties it must have, leaving the search to the system. The best known of such languages is the relational language Prolog: a program in this language is written as a set of logical statements, and its execution is the proof of a theorem (or rather an attempt to disprove it).
Finally, the now super-popular paradigm of object-oriented programming is also based not on "sequences of actions" but on a kind of message exchange between abstract objects.
A program is written as a sequence of actions only in so-called imperative programming languages, also sometimes called "von Neumann" languages. These include, for example, Pascal, C, BASIC and quite a few others; the popularity of the von Neumann style, however, is due solely to the prevailing approach to computer architecture, and not at all to any "simplicity" or, worse, "naturalness" of imperative constructs.
Clearly, programs in any actually existing programming language possess the property of constructive computability, for otherwise a practical implementation of such a language could not exist, and it does. It is thus obvious that a computer program is always an algorithm, no matter how it is written; the definition of an algorithm as a sequence of actions is therefore no good and can lead to quite dangerous misconceptions. Incidentally, the authors of such definitions mislead themselves as well: this seems to be the source of the frequently encountered but utterly insane claim that "programming languages are divided into algorithmic and non-algorithmic". In reality, of course, any programming language is algorithmic. One could perhaps imagine a programming language lacking the property of determinism, based, say, on heuristics and consequently guaranteeing neither the correctness of programs nor even their stability, with the results of program runs utterly unpredictable. Strange as it may seem, such approaches to solving certain problems do exist (neural networks, at the very least), but fortunately it has not come to the creation of such programming languages: working with unpredictable results is hard, and such methods, in particular, completely rule out debugging, so the practical potential of non-algorithmic computation is highly doubtful (greetings to the fans of neural networks).
However, when speaking of "non-algorithmic" languages, the authors of textbooks of dubious quality usually mean something much simpler: namely, all languages in which a program is written other than in the form of that same "sequence of actions". Strictly speaking, "non-algorithmic" would then have to include all modern languages except assembly language, because even the imperative C and Pascal allow recursive programming and, as a consequence, allow a program to be written in a form in which no specific sequence of actions can be discerned.
Note that the same idea of "dividing" programming languages can be expressed in perfectly correct wording: for example, we may say that programming languages are divided into imperative and non-imperative ones, which will, of course, require at least a brief explanation of the term "imperative programming", and here the "sequence of actions" has every right to appear, but now in application to programs in specific programming languages, not to abstract algorithms.
In defense of their picture of the world, those who like to define an algorithm as a sequence of actions quite often resort to the claim that the theory of algorithms deals with the "wrong" algorithms, and that the theory of computability supposedly has nothing to do with programming at all. Refuting such a claim turns out to be somewhat difficult: on closer inspection it proves to be purely terminological, and disputes whose central point is the meaning of this or that term are a thankless occupation. Nevertheless, something can and even should be said on the subject. Let us start with algorithmic unsolvability: one may appeal to the "abstract mathematical nature" of the theory of algorithms all one likes, but try suggesting to a more or less competent programmer that he develop an add-on for the operating system that would automatically detect "hung" programs and stop them; if the programmer is worth anything in terms of qualification, we will immediately hear that he will not take on such a task, that no one else will either, and that whoever does take it on will not solve it anyway, because the problem is algorithmically unsolvable. In this case, it seems, the algorithm turned out to be the "right" one after all.
Moreover, the algorithmic unsolvability of the applicability problem (the halting
problem) inspires programmers to quickly and efficiently dismiss out of hand any ideas
related to proving properties of an arbitrary program: if it is impossible even to formally
predict whether it will stop or not, what can be said about more interesting properties?
In spite of this, one need only mention computability theory or some of its terms, like
"partial recursive functions", and for some reason the risk of running into an
interlocutor claiming that "this is not the right algorithm" increases dramatically. In
principle, this phenomenon is easily explained: algorithmic unsolvability is usually
demonstrated without going into the maze of computability theory or even mentioning
it at all, and in most cases it is done exactly as we did on page 189. The proof of the
unsolvability of the self-applicability problem is practically trivial, so most future
programmers who have seen it at least once have no difficulty in
understanding the essence of algorithmic unsolvability. The theory of computability is
another matter: it is a very specific subject, so most programmers have never even heard
the term "partial-recursive function" and certainly do not know what it is. Of course, it
is much easier to dismiss the incomprehensible than to go into it.
Moreover, we have already shown above that the classes of functions introduced
in the theory of computability have a direct relation to practical programming (see the
discussion on page 192), namely, that primitive-recursive functions correspond to the
class of programs without "unpredictable" loops, while partial-recursive functions
correspond to the whole set of computer programs. Will there be anyone willing to
argue that an algorithm in the mathematical sense is the "wrong" algorithm?

1.4. Programs and data


1.4.1. On measuring the quantity of information
As we already know, when using a computer to store and process information, a
binary representation is used; in other words, any information in a computer is
represented by zeros and ones. The smallest unit of information in computer
processing is said to be the bit; the word bit itself is derived from the English Binary
digiT, i.e. "binary digit".
Although computers are very convenient for processing information, information
as such exists by itself and existed long before the first calculating machines appeared.
Being a quite objective phenomenon, information is measurable, and from this angle
of view, the bit so familiar to us suddenly appears in a new light. To obtain information
is to reduce uncertainty; in this case, a bit is the amount of information that halves
uncertainty. The following problem will help us to illustrate what we have said:
Vasya has thought of a natural number not exceeding 1000 and invited Petya to
guess it, promising to answer honestly any questions that admit only the
answers "yes" or "no". In how many questions can Petya be guaranteed to
determine the number?
Obviously, each of Vasya's answers gives Petya exactly one bit of information. The initial
uncertainty is 1000 variants. If each received bit of information halves the uncertainty,
then after receiving 10 bits of information the uncertainty will be reduced by a factor of
2^10 = 1024, i.e. it will become less than one, so that in ten questions the number is
guaranteed to be recognized. It remains to understand how to formulate the questions
correctly.
If we approach the problem from an abstract point of view, it is enough, before asking
the next question, to somehow (arbitrarily) split all remaining options into two sets of
equal size - or at least differing in size by no more than one element - and
ask Vasya whether the number he has in mind belongs to one of the two sets. For example,
Petya could take a thousand cards, write a number on each of them, divide the cards
randomly into two decks of 500 cards each, hand one deck to Vasya and ask him
whether the number he conceived is written on one of the cards; if the answer is
negative, throw away all the cards that were handed to Vasya; if the answer is positive,
on the contrary, throw away all the cards that were not given to Vasya. Divide the
remaining 500 cards into two decks of 250 and offer one of them to Vasya again. After
the second answer another 250 cards will be discarded, the remaining cards will be
divided into two decks of 125. After the third answer the remaining number of cards
will be odd, so we will have to divide them into unequal parts, and as a result after the
fourth answer we will have 64 or 63 cards, depending on our luck; after the fifth answer
- either 31 or 32; after the sixth answer - either 15 or 16; after the seventh - seven or
eight, after the eighth - three or four, after the ninth - one or two. If there is one card
left, the problem is solved; if there are two cards left, we have to ask the tenth question,
after which we are guaranteed to have no more than one card left.
Of course, technically this whole business with the cards is rather tedious, although the
process is mathematically perfect. But no one forces us to divide the variants into two
sets arbitrarily; with a small modification, Petya can do without drawing cards, and
Vasya doesn't have to look through them over and over again. It is enough to divide the
variants in half so that all the numbers in one set are less than any of the numbers in the
other set. At the first step, Petya can ask Vasya whether the conceived number is in the
range from 1 to 500. Depending on the answer, Petya will have either 1 to 500 or 501
to 1000 numbers left to consider; the next question will be, respectively, whether the
conceived number is in the range [1, 250] or [501, 750], and so on; at each step the
number of options left to consider will be halved.
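This halving strategy is nothing other than binary search. For illustration, here is a
minimal sketch in C (the variable names and the hard-coded number are ours): it plays
Petya's part against a known number and counts the questions asked.

    #include <stdio.h>

    /* Guess a number from 1..1000 by halving the range of candidates;
       each comparison plays the role of one yes/no question, i.e. one
       bit of information. */
    int main(void)
    {
        int secret = 729;            /* the number Vasya has in mind */
        int lo = 1, hi = 1000;
        int questions = 0;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            questions++;             /* "Is the number <= mid?" */
            if (secret <= mid)
                hi = mid;
            else
                lo = mid + 1;
        }
        printf("The number is %d, found in %d questions\n", lo, questions);
        return 0;
    }

Whatever number is chosen, the loop terminates after at most ten iterations, since each
iteration halves the number of remaining candidates.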
The amount of information does not have to be a whole number of bits. Widely known,
for example, is a simple card trick in which the presenter first
lays out an incomplete deck of cards on the table and invites the spectator to guess a
card. The presenter then arranges the cards in three rows and asks in which of the rows
the card is found; on receiving the answer, he collects the cards and arranges them again
in three rows, and then again. Having received the third answer, the presenter draws
from somewhere in the middle of the said row the card which the spectator has guessed,
or, for the sake of the effect, he collects the cards in the deck again, counts out some
cards and discards them, and opens the card which is on top, and it turns out to be the
card which has been guessed.
In this trick 27 cards are used, that is why the deck is incomplete; usually the
performer puts aside sixes, sevens and one eight in advance. Having arranged the cards
in three rows of nine and having learned from the spectator in which of the rows the
required card is, the performer collects the cards, but does not mix them; the cards from
the row named by the spectator are placed in the middle of the deck, placing the other
two rows above and below. Then the cards are to be placed one by one again in three
rows of nine, with the first in the first row, the second in the second, the third in the
third, the fourth again in the first, and so on; it is not difficult to guess that the
"necessary" cards turn out to be the three middle cards in each of the new rows, and the
rest (three from either edge in any row), though laid out on the table, come from rows
which have already been excluded from consideration. Having repeated his question
and again placed the desired row between the other two, the performer again lays them
out one by one, but this time only those three cards which lie in the very middle of each
row actually take part in the consideration; the others only create a "mass". Having
received the answer, in which row the card is this time, our magician now knows
exactly what card was guessed - it is in the middle. He could simply take it out, but then
the spectator might immediately guess what is going on; therefore the cards are usually
put back into the deck, the "right" row again placed in the middle; the performer then
counts out 13 cards from the top and turns over the next, fourteenth card, which was
exactly in the middle of the right row and is therefore the card the spectator picked.
In our conversation about the amount of information, this simple trick is
noteworthy because with each response of the viewer, the uncertainty, which originally
amounted to 27 options, is reduced by three times; such a unit of information is
sometimes called a trit. If we try to express a trit in bits, we get an irrational number,
log2 3 ≈ 1.585: indeed, how many times do we need to halve the uncertainty (i.e., receive
one bit of information) for it to be reduced threefold in the end? Naturally, exactly the
number which, if we raise two to that power, gives three - and this, by the definition of
the logarithm, is log2 3.
Let's consider a more complex case.

Two archaeologists, studying the archives of an old castle, learned that a
treasure was buried in the courtyard of the castle, which has the shape of
a rectangle 60 x 80 meters. While one of the archaeologists went for a
metal detector, his colleague continued studying the archives and found
out that the treasure was buried in the south-eastern part of the courtyard,
at a distance of no more than 30 meters from the short wall of the castle
and no more than 20 meters from the long wall. What is the informational
value of his discovery?

For some reason this problem puzzles even many programmers; meanwhile, it is easy to
see that the initial search area was 60 x 80 = 4800 m^2, and after the discovery of the
new information it was reduced to 30 x 20 = 600 m^2, i.e. exactly eight times. This is
the reduction of uncertainty; since one bit halves the uncertainty, we are dealing with
three bits of information (2^3 = 8).
Often in problems similar to this one, we talk about the probabilities of occurrence of some
events, and then that a message indicating the occurrence of some combination of such
events has been received, and ask what is the information capacity of the received message.
For example:

On the first day of the table tennis tournament Vasya had to play first with Petya
and then with Kolya. Watching the practice games that took place before the
tournament, the fans estimated that Vasya played with Kolya on approximately
equal terms, but Petya beat Vasya 75% of the time.
Masha could not attend the tournament, but she was rooting for Vasya and
asked her friends to tell her the results of both of his games. Some time later,
she received an SMS message saying that Vasya had beaten both of his
opponents. (1) What is the information capacity of this message? (2) If Vasya
had lost both games and Masha had been informed about it, what would be the
information capacity of the message?

Such problems are no more difficult to solve than the problem about the archaeologists in
the courtyard of a castle, but instead of a courtyard we have the space of elementary
outcomes, and instead of an area, a probability. The probability of Vasya's victory in the
first game, according to the conditions of the problem, is 1/4, and the probability of
victory in the second game is 1/2; since for combinations of independent events the
probabilities are multiplied, it turns out that the probability that Vasya will win both
games is 1/4 * 1/2 = 1/8. Consequently, the information that Vasya won both games
reduces the uncertainty eight times, and the information value of the message is 3 bits.
The probability that Vasya will lose both games is 3/4 * 1/2 = 3/8, so here the
information value of the message will be, firstly, lower, since a more probable event has
occurred, and, secondly, this time the sought number will be not only non-integer but
not even rational: since the uncertainty this time decreases 8/3 times, the information
value of the message, if still measured in bits, will be log2(8/3) = 3 - log2 3.
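In general, a message reporting that an event of probability p has occurred carries
-log2(p) bits of information. A minimal C sketch (ours; compile with -lm) confirms
both answers:

    #include <stdio.h>
    #include <math.h>

    /* Information value, in bits, of a message about an event of
       probability p: I = -log2(p). */
    static double info_bits(double p) { return -log2(p); }

    int main(void)
    {
        double p_won  = 0.25 * 0.5;   /* Vasya beats both Petya and Kolya */
        double p_lost = 0.75 * 0.5;   /* Vasya loses both games           */
        printf("won both:  %f bits\n", info_bits(p_won));   /* 3.000000 */
        printf("lost both: %f bits\n", info_bits(p_lost));  /* ~1.415   */
        return 0;
    }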

In addition to the term "bit", the term "byte" is also used, usually referring to eight
bits. Oddly enough, this was not always the case: there were computers that had a
minimum memory location of, for example, seven bits rather than eight, and on these
computers a byte was usually just the amount of information stored in one such
location; they even introduced the special term octet to denote exactly eight bits.
Fortunately, you are unlikely to encounter in practice an understanding of a byte other
than eight bits, but it is worth keeping in mind the history of the term in any case.
An eight-bit byte can take on one of 256 possible values; these values are usually
interpreted as either a number between 0 and 255, or a number between -128 and 127.
Eight-bit memory cells are well suited for storing the letters that make up a text if that
text is written in a language that uses an alphabet like Latin; a so-called character code
is used to represent each individual letter, and there are significantly fewer different
codes than there are possible byte values. With multilingual texts everything is a bit
worse: for Cyrillic there are still enough codes, but, for example, with hieroglyphs this
approach is no longer suitable. We will come back to the question of text encoding.
Since we are talking about memory cells, it should be noted that memory cells are
used to store any information that is processed by a computer, including programs or,
more precisely, command codes that make up programs. When the range of values of
one cell is not enough, several memory cells are used in a row, and we no longer speak
of a cell, but of a memory area.
It is important to realize that the memory cell itself "does not know" how the
information stored in it should be interpreted. Let's consider this on the simplest
example. Suppose we have four consecutive memory cells whose contents correspond
to the hexadecimal numbers 41, 4E, 4E and 41 (the corresponding decimal numbers
are 65, 78, 78, 65). The information contained in such a memory area can be interpreted
equally well as the integer 1095650881; as a fractional number (a so-called
floating-point number) 12.894105; as a text string containing the name 'ANNA';
or, finally, as a sequence of machine commands. In particular, on processors of the i386
platform, these bytes encode the commands conventionally written as inc ecx, dec esi,
dec esi, inc ecx; we will discuss what these commands do in the third part of the book.
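The effect is easy to observe. The following C sketch (ours; it assumes 4-byte int and
float types, which holds on common platforms) places the same four bytes into variables
of different types:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        unsigned char bytes[4] = { 0x41, 0x4E, 0x4E, 0x41 };
        unsigned int as_int;
        float as_float;

        memcpy(&as_int, bytes, 4);      /* reinterpret the same bytes */
        memcpy(&as_float, bytes, 4);

        /* this byte pattern reads the same in both byte orders,
           so the results do not depend on endianness here */
        printf("as integer: %u\n", as_int);          /* 1095650881 */
        printf("as float:   %f\n", as_float);        /* ~12.894105 */
        printf("as text:    %.4s\n", (char *)bytes); /* ANNA */
        return 0;
    }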
The history of units of measurement of large amounts of information is rather
peculiar. Buying a flash drive in a store, we usually pay attention to its capacity, which
nowadays is usually expressed in gigabytes, denoted by "GB"; the reader has
undoubtedly many times encountered other units of this kind, such as kilobyte,
megabyte, terabyte, etc. Remembering that "byte" is not always 8 bits, we may notice
that such units of measurement of memory capacity are not quite logical, but this is half
the problem - machines whose memory cells differ from eight bits have not been seen
for nearly half a century now, and, of course, absolutely all current storage media
(at least those that can be bought in a store) work with eight-bit bytes. What's much
worse is that there are two different opinions about what a kilobyte, a megabyte, and a
gigabyte actually are.
Long before the era of mass computerization, which began around the mid-1980s,
people who built and worked with computers had a need to characterize the RAM
capacity of different machines in some brief way. Recall that in §1.1.2 we mentioned
the address bus, which, like other parts of the bus, consists of a number of tracks, and
each track can carry a single bit, either a logical one or a logical zero; if the address bus
contains N tracks, then this bus allows a total of 2^N addresses to be distinguished; hence
it is precisely 2^N that will be the memory limit (more precisely, the cell limit; see
footnote 9 on page 63) on a computer using this bus.
In real life, memory is usually smaller than the bus allows: the bus tracks are not
as expensive as the memory itself, and they are laid out with room to spare when the
processor is designed. Memory itself usually consists of banks, each of which has its own
connection to the bus, so that some tracks of the address bus select the bank to be used,
while other tracks already select the cell inside the bank. The number of memory banks
connected to a particular computer does not have to be a power of two or even an even
number - for example, they can be connected three; but the banks themselves for
convenience always contain the number of cells corresponding to the power of two,
otherwise the address space of the computer would cease to be continuous, that is, the
addresses to which the memory cells correspond would be mixed with addresses that
can not be used, because the corresponding cells in the composition of the machine is
simply not available. It is very inconvenient to work with such a "piecewise" address
space.
Anyway, the number of memory cells is always closely related to powers of two,
although it is not always such a power. For example, "in the times when computers
were big and programs were small", memory could consist of banks of, say, 2 = 8192 13

cells each; this is still OK, programmers usually remember degrees of two, but what if
there are three such banks connected to the machine? Or seven? Looking at the numbers
24576 and 57344, it is unlikely to realize that these are actually 3 - 2 and 7 - 2 .
13 13

It is not known who first noticed the closeness of the numbers 1000 and 2^10 = 1024
and suggested that 1024 cells be denoted by the term kilobyte. It is not even certain
that the word "byte" figured in this story from the very beginning; say, if memory cells on some
machine consisted of 39 bits, they were not usually called "bytes"; with such cells,
usually the machine word (i.e. the size of the data portion processed by the processor
in one operation) coincided with the size of the cell. If such a machine had 2048 cells,
specialists said that the memory capacity was "2 K", sometimes going so far as to
explain that they meant "2 K words"; it was clear to everyone (that is, in fact, to all
other specialists) that the "K" stood for "kilo", but it was not 1000 as in other fields, but
1024. This is quite logical, considering that memory sizes have almost never been
multiples of 1000, but they have almost always been multiples of 1024 (actually, history
also knows machines with decimal addressing, such as IBM 702 and IBM 705, but this,
as they say, passed quickly). With the use of this "K" numbers become clearer; in our
example in the paragraph above we can say about a bank that its capacity is 8 K, and
memory capacities with three and seven such banks are 24 K and 56 K respectively, it
is enough to remember the multiplication table to understand what is going on.
Bytes, apparently, appeared a little later, when computers began to actively process
text information, and it became clear that it was expensive to spend a long (30-40 bits)
word to store the code of one character, and to reduce the machine word to 8, 7, or even
6 bits - it's just absurd. The logical next step was the transition to cells smaller than the
machine word - for example, a word could correspond to two, four or eight cells. This
finally fixed the kilobyte as a unit of memory size measurement, and when the byte size
actually lost its uncertainty and "froze" in its eight-bit version, it became possible to
use the same unit to measure the amount of information (not everyone realizes that this
is not the same as the number of cells in the computer memory).
With the growth of volumes there naturally appeared megabytes (1024 KB, or 2^20 =
1048576 bytes) and gigabytes (2^30 bytes), followed by terabytes (2^40 bytes) and
petabytes (2^50 bytes),
but at some point experts and ordinary computer users faced a rather unpleasant
phenomenon coming from the marketing departments of companies - manufacturers of
equipment. Hard disks suddenly appeared on the market whose capacities seemed to be
designated in gigabytes (GB), but here a gigabyte was understood not as 2^30 but as
10^9 bytes. Manufacturers justified this by saying that the prefix "giga-", according to
international standards, means 10^9 (a billion), and that the "jargon" of computer
specialists [88] was no concern of theirs.
Alas, the marketers had something to fight for here. If at the level of kilobytes the
difference between the two understandings of the unit is insignificant (about 2.4%),
with gigabytes the difference reaches almost 7.5%, and if you interpret a terabyte as a
power of two, the result will differ from the decimal ("metric") one by 10%, which is
quite a lot.
The problem here is that even in the field of measuring the quantity of information,
computer scientists themselves did not always use powers of two. For example, the
throughput of digital communication channels, which is not technically tied to bytes,
has usually been measured in bits per second, and since serial bit transmission is also
not tied to powers of two, all sorts of kilobits, megabits and gigabits have, since the
early computer networks, denoted the corresponding powers of ten, not powers of two.

[88] Here, by the way, is another role of standards: to justify moral freaks in their moral ugliness.
Already in the mid-nineties there were proposals to introduce new prefixes denoting
powers of 1024 in units of measurement (as opposed to the traditional powers of a
thousand). Standardizers immediately jumped on this idea, and thus "standard" prefixes
appeared: kibi- (from the words kilo and binary), mebi-, gibi-, tebi-, pebi- and even
exbi- (2^60), zebi- (2^70) and, pardon the expression, yobi- (2^80). The designations for
these prefixes have also been standardized: Ki, Mi, Gi, Ti, Pi, Ei, Zi and Yi. At the same
time it was proposed to designate bytes by the capital letter "B" and bits by the whole
word "bit", so that on seeing a plain "B" you do not have to guess whether bits or bytes
are meant. For example, a gibibyte, i.e. 2^30 bytes, should according to these rules be
designated GiB, and a mebibit (2^20 bits), Mibit.
The only thing left to do was to convince the general public that "from now on" a
kilobyte is equal to 1000 bytes, and here the standardizers received a passive but very
effective response from the public; simply put, for the first ten years or so the public
ignored this innovation completely. The author of these lines first heard about "kibibits"
near the end of the noughties; the largest companies, vulnerable to the activities of
standardizers, reluctantly started using "new units" with footnotes around 2012-2013
to avoid lawsuits for misleading the public by using traditional units in the "new
meaning" (which, as we understand, is smaller than what people are used to). The funny
thing is that even among the standardizers there is no complete agreement on what
constitutes a kilobyte, megabyte, and gigabyte; when it comes to RAM capacity, these
units are almost always used in their traditional meanings (1024, 1024^2, 1024^3 bytes).

To be fair, the "new" prefixes have one undoubted advantage: there is no ambiguity
here - if, of course, you know what it is at all - i.e. there are many people in the world
who do not know what "GiB" is, but there are hardly any people who believe that it is
10 bytes.
9

Fig. 1.13. Mechanical counter

1.4.2. Machine representation of integers


We have already mentioned that computers use the binary number system. At the
same time, computers, being real technical devices, impose some restrictions on the
representation of integers. In mathematics, the set of integers is infinite, that is,
whatever the number N, there is always a next number N + 1. To represent all integers,
the number of digits in a number's written form, whatever system we use, must not be
limited in any way - but this requirement is technically impossible to fulfill (even purely
theoretically: after all, the number of atoms in the Universe - at least in the part of it
accessible to observation and to any interaction at all - is considered to be
finite).
In practice, some fixed number of digits (bits) are allocated to the computer
representation of an integer; usually 8 bits (one cell), 16 bits (two cells), 32 bits (four
cells), or 64 bits (eight cells). Limiting the digit capacity results in the "largest number",
and this does not only apply to the binary system. Imagine, for example, a simple
counting device used in electric meters and mechanical odometers of old cars: a chain
of rollers with numbers on them, which can be scrolled, and passing through the
"transfer point" (from nine to zero), scroll the next roller by one. Suppose such a device
consists of five rollers (see Fig. 1.13). At first we see the number zero on it: 00000.
As we rotate the rightmost roller, the number will change, and we will see 00001,
then 00002, and so on up to 00009. If we now turn the rightmost roller one more
unit, we will again see zero in the right position, but the rightmost roller will catch its
neighbor on the left and make it turn one unit, so we will see 00010, i.e. the number
"ten"; we have observed at this well known from junior high school transposition: "nine
plus one, zero we write, one in mind". The same will happen when we go from the
number 00019 to the number 00020 and so on, and when we reach the number
00099 and scroll the right roller for one more unit, two of its neighbors will get into
the meshing, so that three rollers will be scrolled forward by one unit, and we will get
the number "one hundred": 00100.
Now it is clear where such a monster as "the largest possible number" comes from:
sooner or later our counter will count up to 99999, and now there will be nowhere to
increase the number; when we once again scroll the right roller forward by one, it will
catch all the other rollers, they will all move to the next position, and we will see zeros
again. If we had one more roller on the left, it would be hooked and show one, so the
result would be 100000 (which is absolutely correct), but we have only five rollers,
there is no sixth one. This situation is called carrying to a non-existent digit; we
encountered it when we discussed subtraction on Pascal's arithmometer (see page 50).
When we write numbers on paper, this does not usually happen: we can always add
another digit to the left, and if there is no space left on the sheet, we can take a larger
sheet or glue another sheet to the existing one; but if the number is represented by the
state of some group of technical devices, be it a chain of rollers or a set of triggers in
the computer's RAM, we do not have the possibility to add another digit to the number.
When using the binary number system, the same thing happens, with the difference
that only two digits are used to write numbers. Suppose we use a memory cell that
contains eight digits to count some items or events. First, the cell is zero: 00000000.
Adding a one, we get a binary representation of one: 00000001. We add one more unit,
there is nowhere to increase the lowest (rightmost) digit because we have only two
digits, so it becomes zero again, but a carry occurs, so the unit appears in the second
digit: 00000010; this is the binary representation of the number 2. Next would be
00000011, 00000100, and so on. At some point there will be one in all available
digits, there will be nowhere to add further: 11111111; this is the binary
representation of the number 255 (2^8 - 1). If we now add one more unit, instead of 256
we will get "all zeros", i.e. simply zero; the familiar carry to a non-existent digit has
taken place. In general, when we use a positional non-mixed number system with base N
to represent positive integers and limit the number of digits to k, the maximum number
we can represent is N^k - 1; thus, in our example with the counter there were five
decimal digits, and the maximum number was 99999 = 10^5 - 1, while in the example
with the eight-bit cell the system was binary with eight digits, so the maximum number
was 2^8 - 1 = 255.
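In C this wraparound can be observed directly (a minimal sketch; the unsigned char
type is eight bits wide on virtually all modern platforms):

    #include <stdio.h>

    int main(void)
    {
        unsigned char counter = 255;  /* all eight binary digits are ones */
        counter = counter + 1;        /* carry into a non-existent digit  */
        printf("%d\n", counter);      /* prints 0 */
        return 0;
    }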
Some high-level programming languages allow you to operate with arbitrarily large integers, as
long as you have enough memory, but we will not consider such tools for now: it is important
for us to understand how the computer works, while high-level languages try to hide the
computer device from us as much as possible. Let us only note that this feature, usually called
long arithmetic, significantly reduces the speed of integer calculations. Programming
languages that support long arithmetic will be discussed in the last volume of this book.
We have already noted that one cell usually consists of eight digits and can store a
number from 0 to 255, but if you want to work with numbers from a larger range,
use several consecutive memory cells; and here a question arises that may seem unexpected:
in what order to place the parts of the representation of one number in neighboring
memory cells. Two different approaches to byte ordering are used on different
machines. One approach, called little-endian [89], assumes that the lowest byte of the
number comes first, the bytes then follow in ascending order of significance, and the
highest byte is last. The second approach, called big-endian, is the exact opposite: the
highest byte of the number comes first, and the lowest byte is placed last in memory.
For example, the number 7500 is written in hexadecimal as 1D4C. If you represent it
as a 32-bit (4-byte) integer on a computer using the big-endian approach, the four-cell
memory area storing this number will be filled as follows: the first two bytes (those
with the lowest addresses) will contain 00, the next (third) byte will contain 1D, and
the last, fourth byte will contain 4C: 00 00 1D 4C. If the same number is written to
the memory of a computer using the little-endian approach, the individual bytes in the
corresponding memory area will hold the same values in the opposite order:
4C 1D 00 00. Most
computers in use today use the "little-endian" order, that is, they store the least
significant byte first, although "big-endian" machines are also sometimes found.
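The byte order of the machine at hand is easy to check; the following C sketch (ours;
it assumes a 4-byte unsigned int) prints the bytes of the number 7500 in the order in
which they lie in memory:

    #include <stdio.h>

    int main(void)
    {
        unsigned int x = 0x1D4C;                /* 7500 */
        unsigned char *p = (unsigned char *)&x; /* view the memory as bytes */
        int i;
        for (i = 0; i < 4; i++)
            printf("%02X ", p[i]);  /* little-endian machine: 4C 1D 00 00 */
        printf("\n%s\n", p[0] == 0x4C ? "little-endian" : "big-endian");
        return 0;
    }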
Let us now see what to do if we need to work not only with positive numbers. It is
clear that some other way of interpreting combinations of binary digits is required, such
that some of the combinations are considered to represent a negative number. In such
cases we will say that the cell or memory area stores a signed integer, in contrast to the
previous case when we speak of an unsigned integer.
In the early days of computing, different approaches were tried to represent
negative integers, for example, storing the sign of a number as a separate bit. It turned
out, however, that this made it inconvenient to implement even the simplest operations -
addition and subtraction - because the sign bits of both operands had to be handled
specially.

"The terms" big-endians and little-endians were introduced by Jonathan Swift in Gulliver's Travels
89

to denote the irreconcilable supporters of breaking eggs from the blunt end and from the sharp end,
respectively. In Russian, these names were usually translated as "blunt-enders" and "sharp-enders".
Arguments in favor of one or another architecture indeed often resemble a holy war between pointy-
pointed and blunt-pointed people.
Computer creators quickly enough came to the use of the so-called two's complement
code [90].
To understand how two's complement works, let's go back to our example with a
mechanical counter. In most cases, such roller chains can spin both forward and
backward, and if scrolling forward gave us an addition of one, then scrolling backward
will subtract one. Let us now have all rollers set to zero and unscroll the counter
backwards. The result will be 99999; it is understandable, because when we added one
to 99999 we got 00000, and now we have done the reverse operation. It is said that we
have borrowed from a non-existent digit: as in the case of transfer to a non-existent
digit, if we had another roll, everything would be correct (e.g. 100000-1 = 99999), but
it is not there. The same thing happens in binary: if there were zeros in all digits of a
cell (00000000) and we subtracted a one, we get all ones: 11111111; if we now add a
one again, we get zeros in all digits again. This logically leads us to the idea of using
as a representation of the number -1 the ones in all digits of a binary number,
no matter how many such digits we have. Thus, if we are working with eight-bit
numbers, 11111111 now means -1, not 255; if we are working with sixteen-bit
numbers, 1111111111111111 now means, again, -1, not 65535, and so on.
Continuing the operation of subtracting one from an eight-bit cell, we come to the
conclusion that to represent the number -2 we should use 11111110 (previously it
was 254), to represent -3, 11111101 (previously 253), and so on. In other words, we
have voluntaristically declared that some combinations of binary digits represent
negative numbers instead of positive ones, and in every case the new (negative) value
of a combination of digits is obtained from the old (positive) one by subtracting 256:
255 - 256 = -1, 254 - 256 = -2 and so on (the number 256 is 2^8, and our reasoning
holds only for the special case of eight-bit numbers; in the general case, 2^n must be
subtracted from the old value, where n is the digit capacity used). The question
remains where to stop, that is, when to stop counting combinations as negative;
otherwise, getting carried away, we could reach 00000001 and declare that it is not 1
at all, but -255. The following convention is accepted: if a set of binary digits is
considered as the representation of a signed number, the combinations whose
highest bit is 1 are considered negative, and the other combinations non-negative.
It follows that the negative number of the greatest absolute value is represented by a
one in the high bit and zeros in all others; in the eight-bit case this is 10000000, i.e.
-128. If you subtract one from this number, you get 01111111; this combination (a
zero in the high bit, ones in all others) is considered the representation of the largest
signed number, and in the eight-bit case it represents, as is easy to check, the number
127. As you have already guessed, adding one to this number will again give the
negative number of the greatest absolute value. Here we see the two simplest cases
of overflow.
The role of overflow in signed integer arithmetic is similar to the role of carry to a
non-existent digit and borrow from it in unsigned arithmetic: both are the result of a
lack of digit capacity to represent the result of an operation (addition or subtraction). In
the general case of overflow, the sum of two positive numbers turns out to be "negative"
or, conversely, the sum of negative numbers turns out to be "positive". Unless special

[90] The Russian literature calls this representation "additional code"; the English term is two's
complement.
measures are taken, such a "result" cannot be used further, but if you know for sure that
an overflow occurred during addition or subtraction (and the processor allows you to
know this), then in fact the correct result is known - it simply does not fit into the
available digit capacity. One can redo the operation using storage of a greater digit
capacity and obtain the required (correct!) value.
An overflow is detected by crossing the boundary between the combinations
011...11 and 100...00. This, rather than any other location of the overflow
boundary, provides two nice features. First, the sign of a number can be determined by
taking only one (high) bit from it. Second, the operation of changing the sign of a
number is very simple. To change the sign of a number to the opposite when using
two's complement, it is enough to invert the values in all digits and add one to
the result. For example, the number 5 is represented by the following eight-bit
signed integer: 00000101. To get the representation of the number -5, we first invert
all the digits and get 11111010; now add one and get 11111011, which is the
representation of the number -5. For clarity, let's do the sign change again: we invert
all bits in the representation of the number -5, get 00000100, add one, get 00000101,
that is, again the number 5, as required. It is easy to see that the sign change
operation leaves the representation of zero unchanged:
00000000 -> (invert) 11111111 -> (+1) 00000000.
Somewhat unexpectedly, the same happens to the number -128 (in the eight-bit case)
or, generally speaking, to the negative number of the greatest absolute value for a
given digit capacity: 10000000 -> (invert) 01111111 -> (+1) 10000000. This is caused
by the absence of a positive number of the same absolute value in the given digit
capacity; i.e., an overflow occurs when the sign-change operation is applied to the
combination 100...00.
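The invert-and-add-one rule is easy to try out in C (a sketch of ours; reading the result
back as signed relies on the two's complement representation used by all common
processors):

    #include <stdio.h>

    int main(void)
    {
        unsigned char x = 5;                              /* 00000101 */
        unsigned char minus_x = (unsigned char)(~x + 1);  /* invert, add one */
        printf("%d\n", minus_x);                  /* 251, i.e. 256 - 5 */
        printf("%d\n", (signed char)minus_x);     /* -5 when read as signed */
        return 0;
    }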
If negative numbers are represented in two's complement, addition and subtraction
are implemented in hardware in exactly the same way regardless of the signs of the
operands, or even of whether they are signed at all: we can still consider all possible
bit combinations as representations of non-negative numbers (i.e., return to unsigned
arithmetic), and addition and subtraction do not change at the circuit level. This
eliminates the need for a separate electronic subtraction device: the subtraction
operation can be implemented as addition of a number whose sign has first been
changed, and this, paradoxically, also works for unsigned numbers.
The point is that, thanks to carries to a non-existent digit and borrows from it,
subtraction of one number from another can be performed as addition of the minuend
to some other number, which is itself very easy to calculate; this number is precisely
the two's complement of the original subtrahend. The addition can be performed as if
the numbers were unsigned. Suppose, for example, that we need to compute the
difference 75 - 19 on eight-bit numbers. The binary representation of the number 75
is 01001011, that of the number 19 is 00010011; inverting the latter and adding one,
we get 11101101, the representation of the number -19. Now add 01001011 and
11101101: the result is the binary number 100111000, which has nine digits. But we
have only eight digits, so the highest digit is simply discarded during the addition,
and we get the combination 00111000, which is the binary representation of the
number 56.
The arithmetic basis here is quite simple. A bitwise inversion of an eight-bit number
amounts to subtracting that number from 11111111₂ = 255₁₀; since we also add one, it
turns out that when changing the sign of the number we replace the number x (if we
treat it as unsigned) with 256 - x. Instead of subtracting y - x, we appear to be
computing y + (256 - x), but in doing so we get a carry to a non-existent digit, as a
result of which we lose 256; the final result is y + (256 - x) - 256 = y - x. We have
already discussed this effect (see page 50).
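The same computation written out in C (ours; the arithmetic is deliberately done on
eight-bit unsigned values so that the ninth binary digit is discarded exactly as
described):

    #include <stdio.h>

    int main(void)
    {
        unsigned char y = 75;                           /* 01001011 */
        unsigned char x = 19;                           /* 00010011 */
        unsigned char neg_x = (unsigned char)(~x + 1);  /* 11101101 = 256 - 19 */
        unsigned char sum = (unsigned char)(y + neg_x); /* ninth digit discarded */
        printf("%d\n", sum);                            /* 56 */
        return 0;
    }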
1.4.3. Floating point numbers
Not every computation can be done in integers; in fact, any computation actually
can [91], it just won't be very convenient. Therefore, in addition to integers, whose
representation is described in the previous paragraph, computers can also handle
fractional numbers, called floating-point numbers. Such numbers involve the separate
storage of the mantissa M (a binary fraction from the interval 1 <= M < 2) and the
machine order P, an integer giving the power of two by which the mantissa is to be
multiplied. A separate bit s is allocated for the sign of the number: if it equals 1, the
number is considered negative, otherwise positive. The resulting number equals
N = (-1)^s * M * 2^P. A set of particular agreements on the format of floating-point
numbers, known as the IEEE-754 standard, is currently used by almost all processors
capable of working with fractional numbers, despite the fact that the standard itself is
an example of a jumble of extremely unsuccessful technical solutions.

[91] In fact, with integers one can obviously represent rational numbers (as a
numerator/denominator fraction), or use so-called fixed-point numbers, where an integer
is considered to represent not whole units but, say, hundred-thousandths of the quantity
being processed.
Since the integer part of the mantissa is always 1, it can be left out, so all available
bits are used to store the digits of the fractional part (there are exceptions to this rule,
but we will not discuss them for now). Different ways have been used at different times
to store the machine order: a signed integer in two's complement, a separate bit for the
sign of the order, etc.; IEEE-754 prescribes storing the machine order as a biased
unsigned integer: the corresponding bits are treated as an unsigned integer, from which
a constant called the machine order bias is subtracted to obtain the machine order. The
specific number of bits allocated to the order and the mantissa depends on the size of
the whole number; for example, in an eight-byte floating-point number the first bit (as
in any other) stores the sign, the next 11 bits store the order (the bias in this case is
1023), and 52 bits remain for the mantissa.
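This layout is easy to examine. Here is a C sketch (ours; it assumes the usual 8-byte
IEEE-754 double) that takes a number apart into its sign, order and mantissa bits:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void)
    {
        double d = -6.5;          /* -1.101 (binary) * 2^2 */
        uint64_t bits;
        memcpy(&bits, &d, 8);     /* view the same 8 bytes as an integer */

        unsigned sign  = (unsigned)(bits >> 63);           /* 1 bit  */
        unsigned order = (unsigned)((bits >> 52) & 0x7FF); /* 11 bits, biased */
        uint64_t mant  = bits & 0xFFFFFFFFFFFFFULL;        /* 52 bits */

        printf("sign = %u\n", sign);               /* 1: negative */
        printf("order = %d\n", (int)order - 1023); /* 2 */
        printf("mantissa bits = %013llX\n",
               (unsigned long long)mant);          /* A000000000000 */
        return 0;
    }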
It is clear that even the simplest arithmetic operations on floating-point numbers
are much more complicated than on integers. For example, to add or subtract two such
numbers, you must first bring them to the same order; for this purpose, the mantissa of
the number whose machine order is smaller than the other is shifted to the right by the
required number of positions. Then the actual addition or subtraction is performed, and
then, if the result is less than one or greater than or equal to 2, it is subjected to
normalization, that is, the order is changed and the mantissa simultaneously shifted so
that the value of the number does not change but the mantissa again satisfies the
condition 1 <= M < 2. Similar normalization is done in multiplication and division of
numbers.
When shifting to the right, the lower bits of the mantissa that have no place in the
allocated digits are simply lost. The difference between the result and what would have
been obtained if nothing had been discarded is called a rounding error. Generally
speaking, rounding error in operations with binary (as well as with decimal) fractions
is inevitable, no matter how many digits we have allocated for storing the mantissa,
because even an ordinary division of two integers that form an irreducible fraction
whose denominator is not a power of two results in a periodic binary fraction (see page
160), whose "exact" storage would require an infinite amount of memory. The binary
representation of numbers such as 1/3, 1/5 or 1/10 is infinite (though periodic), so
significant bits must inevitably be discarded when converting them to
floating-point format. Therefore, calculations over floating-point numbers almost
always give not an exact result, but an approximate one. In particular, programmers
consider it wrong to try to compare two floating-point numbers for strict equality,
because the numbers may not be "equal" just because of rounding errors.
By the way, the author of these lines has repeatedly met the statement that any
calculations on a computer are made with errors; as you can guess, the source of this
nonsense is the same school textbooks of computer science. Don't believe it, you are being
deceived! Computer calculations in integers are absolutely accurate.

1.4.4. Texts and languages


Textual data, i.e. information presented in a form understandable to a human being,
is probably the most universal way of working with information, because a person can
describe almost anything in the form of a text: the expressive power of natural
languages, such as Russian or English, is usually sufficient for this purpose;
philosophers, united by a common trend called "linguistic positivism", generally
believe that the boundaries of language coincide with the boundaries of thinking.
However, one interesting problem arises here: in general, natural language text is
extremely difficult for machines to analyze. A whole branch of science - computational
linguistics - deals with the problems related to such analysis. Scientists have more or less
coped with morphological analysis of natural language text, i.e. they can make a
computer program determine what word it is and in what form it stands: for example,
the Russian letter sequence "полка" can be the noun "полка" ("shelf") in the
nominative case, but equally well the noun "полк" ("regiment") in the genitive case,
and without analyzing the context the two cannot be distinguished.
analysis of the text, which roughly corresponds to the school "parsing by sentence
members", can be called conditionally successful. As for the semantic analysis of
natural language text, which should result in understanding the meaning of the written
text, it is not certain that this task will ever be solved, and the growth of computing
power of computers will not help here in any way: the problem is not in the number of
calculations, but in the algorithms of analysis, the complexity of which may exceed
human capabilities.
As usual, if a machine cannot be fully adapted to the needs of people, then people,
on the contrary, have to adapt to the capabilities of the machine. It is impossible (and
will not be possible in the foreseeable future) to make a machine understand text in
natural language, and the fault here, of course, is not the machine - the concept "to be
blamed" is not applicable to it at all - but people, programmers, who are by no means
omnipotent. However, it is worth simplifying the task a bit, establishing some rather
simple formal rules for constructing language constructions and offering people to form
texts in accordance with these rules - and programs written by programmers perfectly
cope with the task of analyzing such a text.
When formal rules of text construction are defined, within the framework of which
a person can specify information in such a way that computer programs will understand
it, it is said that a formal language has been created - as opposed to natural languages,
which have developed in the course of civilization as a means of communication
between people. However, formal languages are not limited to the field of computer
use and have arisen much earlier; examples of formal sign systems are, in particular,
the familiar mathematical formulas, as well as musical notation used in music, and
many others.
In general, a language can be understood as the whole (usually infinite) set of texts
that corresponds to the established rules, and in the case of natural languages such rules
are very, very complex, while for formal languages they may be quite simple, or they
may be relatively complex, although not as complex as the rules of natural languages.
Of course, this understanding of language is somewhat coarse, because it does not take
into account the semantics (i.e., simply put, the meaning) of the text, for the sake of
which the text, generally speaking, exists; nevertheless, when we are talking about
automatic (i.e., computer) analysis of a text, the matter does not come to its meaning at
once, and at the early stages of analysis the understanding of language as a set of chains
of symbols is quite consistent with the tasks at hand.
The more complex are the rules of a language, the more complex is the computer
program designed to analyze it, so formal languages are usually created as a result of a
compromise between the requirements of the task and the capabilities of programmers.
For example, when the task is to construct a graph or histogram based on the results of
some measurements or calculations, it is quite possible to use a language for
representing the initial data, which implies writing a sequence of numbers in the
decimal number system, separated by a space character and/or line feed, and does not
allow anything else. On the other hand, if the task is to write computer programs, it
requires a formal language of a special kind - a programming language; some of these
languages can be extremely complex, although, of course, still cannot compare in
complexity with natural languages.
This is not to say that computers cannot handle natural language text at all; on the
contrary, computers in today's environment do just that. The book you are reading was
typed in a word processor, its original layout was prepared by a computerized layout
system; when you send and receive e-mail and other messages, when you read Internet
sites or e-books with a pocket reader, in all these cases we see the computer working
with natural text. Of course, the computer does not understand what is being said in the
book, or website, or e-mail that the user is reading, but it doesn't need to: in all of these
cases, the computer's job is simply to take natural language text from one person and
present it to another.
However, computers (or rather, computer programs) can do much more
sophisticated things with natural language texts. One of the most spectacular, though
by no means the most complex examples on this topic can be considered programs that
"maintain a conversation" with a human, the first of which - "Eliza" - was written in
92

1966 by the American scientist Joseph Weizenbaum. For a person sitting at a computer
keyboard, programs like "Eliza " can create an impression that there is a live person
"on the other side", although people who know what the matter is, in fact, quite easily
distinguish a person from the program by specially creating a conversational situation
in which the program lacks "perfection". It is noteworthy that such programs do not go
into the meaning of the text at all, i.e. they do not perform semantic analysis of the
interlocutor's lines; instead, they analyze the structure of the received phrases and
themselves use the words received from the user to construct phrases whose meaning
they do not understand.
The range of existing formal languages is quite wide; several thousand
programming languages alone have been created in the history of computers, although
not all of them exist now - many programming languages have lost all their supporters
and died, as they say, a natural death. However, there are also at least several hundred
supported programming languages, i.e. languages in which we could write programs,
if we wanted to, and then execute these programs on computers.
In addition to these, so-called markup languages are widely used to design texts;
perhaps the best known of these is HTML, which is used to create hypertext pages on
Internet sites. Note that in many school textbooks you can find a completely insane
statement that HTML is supposedly a programming language; don't believe it.
Finally, in general, any computer program that accepts information in the form of
text as input, by the very fact of its existence sets a certain formal language consisting
of all such texts in which this program works without errors. Such languages are often
very primitive and, as a rule, have no name.
When discussing formal languages, sometimes doubts arise as to whether a
language can be considered a programming language, i.e., a language in which
programs are written. In some cases, the answer depends on the answer to the question
of what a "program" is, and this question is not as simple as it seems at first sight; in
any case, there are situations when different people give different answers to the
question whether a certain text is a program or not. For example, when working with
databases, the SQL query language is quite popular; there are claims that writing
queries in this language constitutes programming, but there is also an opposite opinion.
To eliminate terminological ambiguity, a narrower term can be introduced;
languages in which any algorithm can be written are called algorithmically complete
languages. Since, as we remember, the notion of algorithm has no definition, any of

[92] According to one version, the program was named after the heroine of Bernard Shaw's play
Pygmalion; there is indeed something in common, because in the play Professor Higgins teaches
Eliza to pronounce words correctly, but at first completely overlooks the other things that
distinguish a lady of high society from a flower girl.
the introduced formalisms is used instead, most often a Turing machine; from the
formal point of view, an algorithmically complete language is a language in which an
interpreter (if you like, a simulator or model) of a Turing machine can be written. Some
authors prefer to call such languages "Turing-complete" rather than "algorithmically
complete" for greater certainty. It is worth noting that algorithmically complete
languages often turn out to be languages that from the very beginning were not intended
for writing programs at all. For example, the book you are reading has been prepared
using the TeX markup language created by Donald Knuth; this language consists mainly
of commands specifying text features such as the typeface, the size and position of
headings and illustrations, and so on, i.e. it was not originally designed for programming;
nevertheless, TeX is algorithmically complete, although not all of its users know this.

1.4.5. Text as a data format. Encodings


Text as such, in the common sense of the word, in whatever language it is written,
can be represented in many different ways in computer memory (as the contents of
memory cells) and as a file on disk. The reader may have already come across the fact
that when saving a file, a text editor offers to choose whether to save it in the native
format of that particular text editor or in a format intended for subsequent reading by
some other program, most often some other text editor.
Among all possible representations of text, a special place belongs to the format called
ASCII text, or plain text, i.e. "simple", unformatted text; ASCII stands for American
Standard Code for Information Interchange, a code table originally intended for use in
telegraphy on teletypes (we discussed these devices in §1.2.1). The ASCII table was
finalized in 1963; at that time
eight-bit bytes had not yet become widespread, so the creators of ASCII allocated the
minimum possible number of bits to represent one character, allowing all characters
that were deemed necessary at that time to be distinguished from each other. The initial
task for the development of the standard code table implied the possibility of encoding
26 uppercase and 26 lowercase letters of the Latin alphabet, 10 Arabic numerals, as
well as a certain number of punctuation marks (including, incidentally, the ordinary
period and comma).

        30  40  50  60  70  80  90 100 110 120
    0        (   2   <   F   P   Z   d   n   x
    1        )   3   =   G   Q   [   e   o   y
    2   SPC  *   4   >   H   R   \   f   p   z
    3    !   +   5   ?   I   S   ]   g   q   {
    4    "   ,   6   @   J   T   ^   h   r   |
    5    #   -   7   A   K   U   _   i   s   }
    6    $   .   8   B   L   V   `   j   t   ~
    7    %   /   9   C   M   W   a   k   u
    8    &   0   :   D   N   X   b   l   v
    9    '   1   ;   E   O   Y   c   m   w

Fig. 1.14. The displayed ASCII characters and their decimal codes (the code of a
character is the sum of its column and row numbers)
It was also necessary to provide several so-called control codes, which did not represent
any symbols but were used to control the teletype on the other side of the line; it was
standard practice to send messages to a teletype operating without human attention, for
example, at night in a locked building.
It was not possible to fit into 64 code values (indeed, the letters and digits alone
account for 62), so it was decided to use a seven-bit encoding, that is, to represent one
character (or control code) by seven binary digits. The total number of different codes
is 2^7 = 128. The first thirty-two of them, from 0 to 31, were assigned the role of
control codes, such as line feed (10), carriage return (13), tabulation (9) and others.
Code 32 was assigned to the space character; further in the table (see Fig. 1.14) come
punctuation marks and mathematical symbols, occupying codes 33 to 47; codes 48 to 57
correspond to the Arabic numerals from 0 to 9, so that if you subtract 48 from the code
of a digit, you always get its numerical value (we will make use of this more than once).
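In Pascal terms this subtraction looks as follows (a tiny sketch of ours, not from the
book; it uses the standard ord function, which yields a character's code):

program DigitValue;
begin
    { ord('7') = 55 and ord('0') = 48, so the difference is the digit's value }
    writeln(ord('7') - ord('0'))    { prints 7 }
end.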
Uppercase Latin letters occupy positions 65 through 90 of the ASCII table in
alphabetical order, and positions 97 through 122, again in alphabetical order, are
occupied by lowercase letters. It is easy to see that the letters are arranged in the table
in such a way that the binary representation of an uppercase letter differs from the
binary representation of the corresponding lowercase letter by exactly one bit (the
sixth, i.e. the penultimate one, with the value 32). This was done on purpose to make it
easier to bring all characters of a processed text to the same case, for example, in order
to perform a case-insensitive search.
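To illustrate, here is a small sketch of ours (not from the book): the bit in question has
the value 32, so setting it with or produces the lowercase letter, and clearing it with
and produces the uppercase one.

program CaseBit;
begin
    writeln(chr(ord('G') or 32));        { prints g }
    writeln(chr(ord('q') and not 32))    { prints Q }
end.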
The remaining free positions are occupied by a variety of characters that, at the
time, seemed more useful than others to the members of the working group that created
the table. The last displayed character in the ASCII table is code 126 (the tilde, "~"),
while code 127 is considered a control code, like codes 0 through 31; this code, called
RUBOUT, was originally intended for crossing out a character. The point is that in those
days characters were often represented by holes punched in paper tape, with a punched
hole corresponding to a one and an unpunched position to a zero; a punched tape
designed for storing seven-bit text was exactly seven positions wide, so each character
was represented by exactly one row of holes. The binary representation of the number
127 consists of seven ones, i.e. if you "punch" this code over any row of the tape, you
get seven holes, regardless of what was there originally. When reading the tape, code
127 was simply skipped, on the assumption that there used to be a character in this
place which was later crossed out.
The mass transition to eight-bit bytes in computers caused a persistent association
between bytes and characters in programming, because, naturally, a byte came to be
used to store one ASCII code. At the same time, the appearance of the eighth bit
allowed the creation of a number of different extensions of the ASCII table that used
codes in the range 128-255. In particular, at different times and in different systems,
five (!) different ASCII extensions providing for Russian letters (the Cyrillic alphabet)
were in use. Historically, the first of them was the KOI8 encoding ("eight-bit code for
information interchange"), which came into use back in the mid-1970s on machines of
the ES computer series. The main disadvantage of this encoding is a somewhat
"strange" order of letters, completely different from the alphabetical order:
"ЮАБЦДЕФГХ..." This makes it difficult to sort strings, requiring an extra
conversion step, whereas if
the characters in the table are arranged in alphabetical order, sorting can be performed
by simply comparing the character codes in an arithmetic sense.
This order of letters in the KOI8 table has a very definite reason, which becomes
clear if you write out the rows of the resulting extended ASCII table with 16 elements
per row. It turns out that the Cyrillic letters in KOI8 occupy, in the second half of the
256-code table, exactly the same positions in which their Latin counterparts sit in the
first half of the table (i.e. in the original ASCII table). The point is that in the past there
were (and still are) situations when someone forcibly "discards" the eighth (high)
bit in text data. KOI8 is the only Russian encoding that retains readability in such a
situation. For example, the phrase "Добрый день" ("good afternoon") turns into
"dOBRYJ DENX" when the eighth bit is discarded; reading such text is not very
convenient, but still possible. That is why the KOI8 encoding was for a long time
considered the only one acceptable in telecommunications and surrendered its position
only under the onslaught of Unicode, which will be discussed below. It should be noted
that the Russian alphabet, containing 33 letters, "slightly failed" to fit into two
consecutive rows of 16 codes; the "unlucky" one this time was the letter "ё", which was
"evicted" in KOI8: its lowercase and uppercase versions were assigned the codes A3₁₆
and B3₁₆, while all the other letters occupy a continuous area of codes from C0₁₆ to FF₁₆.
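"Discarding the eighth bit" is easy to express in code; in this sketch of ours (not from
the book) only the seven low bits of each character survive, which is exactly the
transformation that turns a KOI8 text into its Latin look-alike:

function Strip8(const s: string): string;
var
    t: string;
    i: integer;
begin
    t := s;
    for i := 1 to length(t) do
        t[i] := chr(ord(t[i]) and $7f);    { keep bits 0..6, zero out bit 7 }
    Strip8 := t
end;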
In the MS-DOS era, i.e. in the 1980s and early 1990s, the most popular Cyrillic
encoding on personal computers was the so-called "alternative" Russian encoding, also
known as cp866 (code page #866). Its characters, it should be said, were not arranged
too well either: the alphabetical order was preserved, so the encoding could be used for
sorting without intermediate transformations (except for the letter "ё", which was
unlucky again), but while the uppercase Russian letters in cp866 sit in one continuous
row, between the first sixteen and the second sixteen lowercase letters there are the
codes of 48 pseudo-graphic characters. Interestingly, this encoding is still in use
today: for example, some versions of Windows use it when working with console
applications; it was also the only Cyrillic encoding in the OS/2 family of systems.
During the mass migration of users to Windows systems, many were unpleasantly
surprised to learn that these systems used yet another Cyrillic encoding, completely
different from both cp866 and KOI8. This was the cp1251 encoding, which at the same
time contained characters from other Cyrillic alphabets (Ukrainian, Belarusian, Serbian,
Macedonian and Bulgarian), but whose creators forgot about, for example, Kazakh and
many other non-Slavic languages that use the Cyrillic alphabet. It should be noted that
the letter "ё" was unlucky once again: it ended up outside the main code area.
Apple Macintosh computers also used, and still use, their own encoding, the so-
called MacCyrillic. Another "standard" worth mentioning is ISO 8859-5, whose code
table differs from all of the above; however, this standard, created by yet another
committee for a completely unclear purpose, has never really been used anywhere.
Several basic properties of the textual representation of data can be distinguished:

• any fragment of a text is itself a text;

• the concatenation of several texts is also a text;

• the representation of any character occupies exactly one byte, which, in particular,
makes it possible to extract any character from the text by its number without
looking through the entire text;

• the text representation does not depend on the computer architecture: on the size of
a machine word, on the byte order in the representation of integers, or on other
such features, i.e. it is universal for all computers in the world;

• data in text format (a text) can be read by a human on any computer using
thousands of different programs.

As the number of characters known and used in computer information processing grew,
a need arose to bring some order into this area, and in 1987 the Unicode registry was created,
which is often (and mistakenly) taken for an encoding. In fact, the basis of Unicode is the
so-called universal character set (UCS), that is, simply a register of known characters in
which each of them is assigned a number; as for encodings, there are at least four of them
based on Unicode: UCS-2, UCS-4 (aka UTF-32), UTF-8 and UTF-16.

The UCS-2 and UCS-4 encodings use two bytes and four bytes, respectively, to store a
single character. Since by now there are more than 110,000 characters in the Unicode
registry, the UCS-2 encoding cannot cope with the task of representing arbitrary characters
(two bytes can store only 65536 different values, and this has long been insufficient) and is
now considered obsolete; as for UCS-4, people try not to use it because of its excessive and
unproductive memory consumption: indeed, one byte is enough to store most characters,
more than three bytes are not needed at all, and it would be strange to expect the number of
known characters ever to exceed 2^24 = 16777216.
It should be noted that multibyte encodings lack most of the above advantages of the
textual representation of data; they can be considered text data only with a very high degree
of suspicion: indeed, they even depend on the byte order used in the representation of
integers. For this reason, the standards require that in a file containing "text" in one of these
Unicode-based encodings, the first two (or four) bytes form a pseudo-character with the code
FEFF₁₆, which allows a program reading the text to understand what order the bytes are in.
As a result, the concatenation of two such "texts" does not necessarily represent a text,
because the two source texts may have different byte orders, and the notorious FEFF₁₆ found
in the middle of a text is not perceived as an indication of a byte order change; likewise, a
fragment of a "text" in a multibyte encoding does not have to be a correct text.
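For illustration, a program can recognize the byte order by examining those first two
bytes; here is a sketch of ours (not from the book; the file name is made up):

program BomCheck;
var
    f: file of byte;
    b1, b2: byte;
begin
    assign(f, 'message.txt');    { a hypothetical file in UCS-2 or UTF-16 }
    reset(f);
    read(f, b1);
    read(f, b2);
    if (b1 = $fe) and (b2 = $ff) then
        writeln('big-endian text')
    else if (b1 = $ff) and (b2 = $fe) then
        writeln('little-endian text')
    else
        writeln('no byte order mark at all');
    close(f)
end.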

The only exception is the UTF-8 encoding, which continues to use a sequence of bytes
rather than multibyte integers to encode text. Moreover, its first 128 codes coincide with
ASCII, and it is agreed that if a byte begins with a zero bit, then this byte is a one-byte
character code. Thus, plain text in the traditional ASCII format is correct from the point of
view of UTF-8 and is interpreted the same way in both ASCII and UTF-8. For characters not
included in ASCII, UTF-8 uses sequences of two, three or four bytes and, should it become
necessary, five or six bytes, although this is not needed yet: Unicode simply does not have
that many characters.
In UTF-8, a byte starting with the bits 110 means that we are dealing with a character
whose code is represented by two bytes; the first byte of a three-byte character starts with the
bits 1110; the first byte of a four-byte character starts with the bits 11110. Continuation bytes
(the second and subsequent ones in the representation of a character) begin with the bits 10.
Thus, in a two-byte code 11 bits out of 16 carry useful information, in a three-byte code 16
out of 24, and in a four-byte code 21 out of 32.
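These rules are easy to turn into code; the following sketch of ours (not from the book)
determines from the first byte of a sequence how many bytes the character's representation
occupies:

function Utf8SeqLen(b: byte): integer;
begin
    if (b and $80) = 0 then           { 0xxxxxxx: a plain ASCII character }
        Utf8SeqLen := 1
    else if (b and $e0) = $c0 then    { 110xxxxx }
        Utf8SeqLen := 2
    else if (b and $f0) = $e0 then    { 1110xxxx }
        Utf8SeqLen := 3
    else if (b and $f8) = $f0 then    { 11110xxx }
        Utf8SeqLen := 4
    else
        Utf8SeqLen := 0               { 10xxxxxx: a continuation byte }
end;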
Text represented in UTF-8 is independent of the byte order in integers; a fragment of such
text remains correct, except perhaps for "chunks" of a single character's representation at its
very beginning and end; the concatenation of two or more UTF-8 texts is again a UTF-8 text.
UTF-8 has only one obvious disadvantage: since characters are encoded with codes of variable
length, to retrieve a character by its number you have to look through all the text preceding it.
In addition, UTF-8 text has a disadvantage common to all Unicode-based encodings: no one
can guarantee that on the computer where someone tries to read text generated in UTF-8, the
font in use will have correct glyphs for all the characters involved. Indeed, it is not easy to
create a font that represents a hundred and ten thousand different characters.
The strangest of all the Unicode encodings is UTF-16: it uses two-byte numbers, but
sometimes two such numbers are needed to represent one character. This encoding has no
advantages over the others, yet combines all their disadvantages; in particular, if we are in
principle satisfied with a variable-length character code, it would be much better to use
UTF-8, which does not depend on byte order, is also compatible with ASCII, and has no
disadvantages compared to UTF-16. Nevertheless, it is UTF-16 that is used in the Windows
line of systems starting with Windows 2000; this is because the earlier systems of the line,
starting with Windows NT, used UCS-2, which also assumes two-byte codes, and, as we
already know, two bytes turned out to be insufficient to represent all Unicode characters.
In any case, text representations based on Unicode encodings other than UTF-8 cannot be
considered text data in the original programming sense at all; UTF-8 is somewhat better in
this respect, but while ASCII characters are known to be present and displayed correctly in
any operating environment, the same cannot be said of other characters. Therefore, in some
cases the use of characters not included in the ASCII set is considered undesirable or is
prohibited outright; for example, this applies to the source texts of computer programs in
any programming language.

1.4.6. Binary and text data


When processing various kinds of information, there is often a choice of the form in
which to represent this information: in text form, that is, in a form in which it can be read
by a human, or in binary form; binary means, in principle, any representation other than
text, but usually such data is stored in files and transmitted over computer networks in the
same form in which it is represented in the memory cells of the computer while programs
are processing it.
In order to understand the difference between the text and binary representations of
data, we will conduct a small experiment, and the hexdump program, which is usually
available on any Unix system, will help us with it. First, let's create two files⁹³ into
which we will write the sequence of numbers 1000, 2001, 3002, ..., 100099, that is, a
total of 100 numbers, each one greater than the previous by 1001. The trick is to write
these numbers into one file in text representation, one per line, and into the other file
as four-byte integers, exactly as they are represented in the computer's memory when
all sorts of calculations are performed on them. We will call the files numbers.txt
and numbers.bin.

⁹³ These files can be found in the archive of examples for this book; the Pascal programs that
create them will be given in §2.9.

First of all, let's see what we've got:
avst@host:~/work$ ls
numbers.bin numbers.txt
avst@host:~/work$ ls -l
total 8
-rw-r--r-- 1 avst avst 400 Mar 23 16:21 numbers.bin
-rw-r--r-- 1 avst avst 592 Mar 23 16:21 numbers.txt
avst@host:~/work$

As we can see, the binary file is exactly 400 bytes in size, which is quite understandable:
we wrote 100 numbers of 4 bytes each into it. The size of the text file is 592 bytes, i.e.
a little more; it could have been smaller if we had written into it, for example, the numbers
from 1 to 100, i.e. numbers whose representation consists of fewer digits. Let's see
where the number 592 comes from. The numbers from 1000 to 9008 have four digits in
their text representation, and there are nine such numbers; the numbers from 10009 to
99098 are written with five digits each, and we have ninety of them; the last number of
our sequence, 100099, is represented by six digits, and there is one such number. In
addition, each number is written on a separate line, and lines, as we know, are separated
by the line feed character (the character with code 10); since there are exactly one hundred
numbers, there are also one hundred line feed characters, one after each number (including
the last one: in general, it is considered that a correct text file should end with a correct
line, i.e. its last character should be precisely a line feed). In total we have 4 · 9 +
5 · 90 + 6 · 1 + 100 = 36 + 450 + 6 + 100 = 592, which is what we saw in the output of
the ls -l command.
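For reference, here is a minimal sketch of how such a pair of files could be created (the
book's own programs for this will appear in §2.9; this version of ours is merely
illustrative):

program MakeFiles;
var
    t: text;
    b: file of longint;
    i, n: longint;
begin
    assign(t, 'numbers.txt');
    rewrite(t);
    assign(b, 'numbers.bin');
    rewrite(b);
    n := 1000;
    for i := 1 to 100 do begin
        writeln(t, n);    { text: decimal digits plus a line feed }
        write(b, n);      { binary: the four bytes of the longint as such }
        n := n + 1001
    end;
    close(t);
    close(b)
end.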
The numbers.txt file can be printed, for example:

avst@host:~/work$ cat numbers.txt
1000
2001
3002
4003
5004
[...]
98097
99098
100099
avst@host:~/work$

It can also be opened in any text editor - if we do that, we will see the same hundred
numbers, or rather their decimal representation, and we can make changes if we want.
In other words, we can work with the numbers.txt file using our usual text tools.
The numbers.bin file is another matter. If you try to print it, you will see
some blatant gibberish, something like this:

avst@host:~/work$ cat numbers.bin


Hyah.
eu~G0#'+K.T2.6.:>xBaFJJJ3NRVHYB;ba.ei{mdqMu6y}H3n;.~gP9.".
.T.T.schjof....u1g8k<o%svVzYuchiB.FintrVJ?Z(ChZZl.рU!V%+)
IW4o8.<.@DsH\LEP. TX\M\pc.g.kovs_wH{1avst@host:~/work$

- and we will be lucky if our terminal does not end up in an unusable state, having
accidentally received, among all this nonsense, control sequences that reprogram its
behavior. However, the terminal can be brought back to its senses at any moment with
the reset command. In any case, we can see that numbers.bin is not intended for
humans; the same can be said about any file that is not a text file.

The hexdump program allows us to print the byte values of a given file (in
hexadecimal notation) and, if the additional -C option is specified, also to show the
characters corresponding to those bytes - or rather, those of them that can be displayed
on the screen.

avst@host:~/work$ hexdump -C numbers.bin
00000000  e8 03 00 00 d1 07 00 00  ba 0b 00 00 a3 0f 00 00  |................|
00000010  8c 13 00 00 75 17 00 00  5e 1b 00 00 47 1f 00 00  |....u...^...G...|
00000020  30 23 00 00 19 27 00 00  02 2b 00 00 eb 2e 00 00  |0#...'...+......|
00000030  d4 32 00 00 bd 36 00 00  a6 3a 00 00 8f 3e 00 00  |.2...6...:...>..|
00000040  78 42 00 00 61 46 00 00  4a 4a 00 00 33 4e 00 00  |xB..aF..JJ..3N..|
[...]
00000170  a4 6b 01 00 8d 6f 01 00  76 73 01 00 5f 77 01 00  |.k...o..vs.._w..|
00000180  48 7b 01 00 31 7f 01 00  1a 83 01 00 03 87 01 00  |H{..1...........|
00000190
avst@host:~/work$

The leftmost column here contains the addresses, i.e. the byte offsets from the beginning
of the file; they increase in steps of 10₁₆, since each line shows sixteen consecutive
bytes. Next come the bytes themselves, and in the column on the right are the characters
whose codes are contained in the corresponding bytes (or a dot if the character cannot
be displayed). Looking at this column, we see the familiar gibberish.
Returning to the contents of the bytes, let us remember that the numbers we wrote are,
first, four-byte numbers, and, second, that our computer belongs to the little-endian
class, i.e. the least significant byte of a number comes first. Taking the first four bytes
of our dump, we see e8 03 00 00; rearranging them in reverse order, we get
000003e8; converting from hexadecimal to decimal, we have (taking into account that
E stands for 14): 3 · 16² + 14 · 16¹ + 8 · 16⁰ = 768 + 224 + 8 = 1000, which is, as we
remember, the first number written to the file. Just in case, let us do the same for the
last four bytes of the dump: 03 87 01 00; rearranging them, we get 00018703₁₆ =
1 · 16⁴ + 8 · 16³ + 7 · 16² + 0 · 16¹ + 3 · 16⁰ = 65536 + 32768 + 1792 + 0 + 3 = 100099;
as you can see, everything fits.
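The same arithmetic can be written down as a program; a tiny sketch of ours (not from
the book):

program LittleEndian;
var
    value: longint;
begin
    { the first four bytes of the dump, least significant byte first }
    value := $e8 + $03 * 256 + $00 * 65536 + $00 * 16777216;
    writeln(value)    { prints 1000 }
end.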


To complete the picture, let's apply hexdump to the text file:

avst@host:~/work$ hexdump -C numbers.txt
00000000  31 30 30 30 0a 32 30 30  31 0a 33 30 30 32 0a 34  |1000.2001.3002.4|
00000010  30 30 33 0a 35 30 30 34  0a 36 30 30 35 0a 37 30  |003.5004.6005.70|
00000020  30 36 0a 38 30 30 37 0a  39 30 30 38 0a 31 30 30  |06.8007.9008.100|
00000030  30 39 0a 31 31 30 31 30  0a 31 32 30 31 31 0a 31  |09.11010.12011.1|
00000040  33 30 31 32 0a 31 34 30  31 33 0a 31 35 30 31 34  |3012.14013.15014|
[...]
00000230  0a 39 36 30 39 35 0a 39  37 30 39 36 0a 39 38 30  |.96095.97096.980|
00000240  39 37 0a 39 39 30 39 38  0a 31 30 30 30 39 39 0a  |97.99098.100099.|
00000250
avst@host:~/work$

The right column contains quite readable text, which is understandable, because the file
is a text file; here the hexdump program only had to replace the line feed characters
with dots. Looking through the byte values, starting from the first byte, we see 31₁₆
(decimal 49), the ASCII code of the digit 1, then three times 30₁₆, the ASCII code of
zero, then 0A₁₆ (decimal 10), which is just the line feed character, and so on.

This representation is convenient for humans, but it is harder for programs to work
with: to perform any calculations, the numbers have to be converted into a representation
that the central processor can handle. As we will see later, this transformation is not
difficult at all; one just has to remember that it is needed and to understand clearly what
representation our data is in at each moment.
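Running a little ahead, here is a sketch of ours (not from the book) of that very
transformation for non-negative numbers, built on the "subtract 48" observation from
the discussion of ASCII:

function TextToNumber(const s: string): longint;
var
    i: integer;
    n: longint;
begin
    n := 0;
    for i := 1 to length(s) do
        n := n * 10 + (ord(s[i]) - ord('0'));
    TextToNumber := n
end;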

1.4.7. Machine code, compilers and interpreters


As we have already mentioned, practically all modern digital computing machines
work on the same principle. A computing device (the computer itself) consists of a
central processor, RAM and peripherals. In most cases, all these components are
connected by a common bus.
RAM consists of identical memory cells, each of which has its own unique number,
called an address. A cell contains several (most often eight) binary digits, each of which
can be in one of two states, usually denoted as "zero" and "one". This allows the cell as
a whole to be in one of 2^n states, where n is the number of bits in the cell; so, with
eight bits, there are 2^8 = 256 possible states of the cell, or, in other words, the cell can
"remember" a number from 0 to 255.


The CPU contains a number of registers, which are circuits resembling memory cells;
since the registers are located directly in the CPU, they are very fast, but there are few
of them, so the registers should be used to store the most needed information. The
processor is able to copy data from RAM to registers and back, and to perform
arithmetic and other operations on the contents of the registers; in some cases,
operations can be performed directly on data in memory cells, without copying their
contents into registers⁹⁴.

The amount of information that a processor can process in a single instruction is


called a machine word. The size of most registers is exactly equal to a machine word.
In modern systems, a machine word is usually larger than a memory cell; for example,
the machine word of a Pentium processor is 32 bits, which corresponds to four eight-
bit memory cells.
⁹⁴ Whether such a possibility exists depends on the particular processor; for example, Pentium
processors can, bypassing the registers, add a given number to the contents of a given memory
location and perform some other operations on memory, while the SPARC processors used in
computers made by Sun Microsystems could only copy the contents of a memory location into
a register or, conversely, the contents of a register into a memory location, and could perform
no other operations on memory locations.

As we already know (see §1.1.3, page 66), a program is written into RAM in the
form of numeric codes denoting certain operations, and a special processor register
called the program counter or instruction pointer determines from which memory
location the next instruction code is to be retrieved. The processor runs an instruction
processing cycle: it retrieves the next instruction code from memory, increments the
instruction counter, decodes the retrieved code, performs the prescribed actions, then
again retrieves the next instruction code from memory, and so on ad infinitum.
The representation of a program as machine instruction codes, which is consequently
"understandable" to the central processor, is called machine code. The processor easily
decodes such instruction codes, but for a human they are very hard to memorize,
especially since in many cases the required number has to be calculated by substituting
chains of binary bits into particular places of the code. Here is an example: the two
bytes written in hexadecimal as 01 D8 (the corresponding decimal values are 1 and
216) denote, on Pentium processors, the instruction "take the number from register
EAX, add to it the number from register EBX, and put the result of the addition back
into register EAX". Memorizing the two numbers 01 D8 is not difficult, but there are
several hundred different instructions on a Pentium processor, and, besides, only the
first byte (01) here is the instruction code itself, while the second (D8) we have to
calculate in our head, remembering (or learning from a reference book) that the lowest
three bits of this byte denote the first register (the first summand and also the place
where the result is to be written), the next three bits denote the second register, and the
highest two bits here must be equal to ones, which means that both operands are
registers. Knowing (or, again, having looked up in the reference book) that the number
of register EAX is 0 and the number of register EBX is 3, we can now write down the
binary representation of our byte: 11 011 000 (the spaces are inserted for clarity),
which gives 216 in decimal notation and the desired D8 in hexadecimal notation.
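That bit arithmetic can be checked mechanically; a tiny sketch of ours (not from the
book) assembles the second byte of the instruction from its three fields:

program ModRM;
const
    EAX = 0;    { register numbers taken from the reference book }
    EBX = 3;
begin
    { the two high bits are ones, then the second register, then the first }
    writeln((3 shl 6) or (EBX shl 3) or EAX)    { prints 216, i.e. D8 in hex }
end.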
If we need to recall a piece of our program written two days ago, then in order to read
it we will have to manually decompose the bytes into their constituent bits and,
consulting the reference book, recall which instruction does what. If a programmer
were forced to compose programs this way, he would not write anything useful in his
whole life, because any program, even a small but practically applicable one, consists
of several thousand such instructions, and the largest programs consist of tens of
millions of machine instructions.
When working with high-level programming languages, such as Pascal, C, Lisp,
etc., the programmer is given the opportunity to write a program in a form that is
understandable and convenient for a human, but not for a central processor. The CPU
cannot execute such a program, and in order to make the program written in a high-
level language execute, one of two possible ways of program translation has to be used.
These two ways are called compilation and interpretation.
In the first case, a compiler is used: a program that takes as input the text of a
program written in a high-level programming language and outputs the equivalent
machine code⁹⁵. For example, in the next part of the book we will write programs in
Pascal; after typing the program text (the so-called source code) and saving it in a file,
we will run the Pascal compiler, which, having read the text of our program, will either
produce error messages or, if the program is written in accordance with the rules of the
Pascal language, will create its equivalent in the form of an executable file containing
the machine code of our program. By running this file, we ask the operating system to
load the machine code into RAM and transfer control to it, thereby making the
processor perform the actions we described in the Pascal text.

⁹⁵ Generally speaking, a compiler is a program that translates programs from one language into
another; translation into the language of machine codes is only a special case, though a very
important one.
It should be emphasized that a compiler is also a program written in some
programming language; in particular, our Pascal compiler is itself, oddly enough,
written in Pascal, and its creators use the previous version of their own compiler to
compile each subsequent version.
The second way of executing programs written in high-level languages is called
interpretation. An interpreter program loads the source text of a program from a file
specified to it and performs the actions prescribed by this text step by step without
translating anything. Modern interpreters usually create their own internal
representation of the executed program for convenience and to increase the speed of
work, but this representation has nothing to do with machine code.
Here we should make an important remark. From one school textbook to another
wanders a completely nonsensical phrase claiming that an interpreter supposedly
translates a program into machine code step by step and immediately executes this
code. If you have been told something like this, or if you have read it yourself in some
book written by goodness knows whom (even if it bears the Ministry of Education
stamp, that happens), don't believe it: you are being deceived yet again! Such a
technique of execution is possible in principle and even has a name, JIT compilation
(the abbreviation JIT is formed from the English words just in time), but it is relatively
difficult to implement; for example, it has to circumvent restrictions imposed by the
operating system, which, unless special measures are taken, does not allow anything
to be written into the memory areas storing the machine code of the program being
executed, and does not allow control to be transferred to memory areas whose contents
the program can change (that is, their contents cannot be executed as a set of machine
instructions). Because of the technical problems involved, JIT compilation is not used
all that often; incidentally, many programmers do not consider this variant of program
execution to be interpretation at all. A normal interpreter does not need any such tricks
to execute our program, because it is a program itself and can simply perform the
necessary actions without translating them into any code; creating such an interpreter
is ten times easier than creating a JIT compiler that translates program fragments into
machine code at run time. The strange person who first put the phrase about step-by-step
translation into machine code with immediate execution into a textbook obviously
never wrote any programs himself, otherwise such an idea would not have occurred
to him.
We have already encountered one interpreter: it is a command interpreter that
handles the commands we type on the command line. As we saw in §1.2.15, it is
possible to write real programs in the language of this interpreter, called Bourne Shell.
We will not consider other interpreters for the moment, deferring familiarity with them
until Volume 3; but in general, interpreted execution is characteristic of a wide range
of programming languages, such as Perl, Python, Ruby, Lisp, and many others.
It is interesting to note that nowadays the boundaries between compilation and
interpretation are gradually blurring. Many compilers do not translate a program into
machine code, but into some intermediate representation, usually called "bytecode";
this is how Java and C# compilers work. In many cases, compilers generate code that
interprets some part of the intermediate representation of the program at runtime. On
the other hand, interpreters also translate the program into an intermediate
representation in order to increase its efficiency, but then they execute it themselves.
There are compilers that seem to create a separate executable file, but a close look at
this file reveals that it contains the entire interpreter of the program's internal
representation plus the representation itself.
In any case, compiled execution often uses elements of interpretation, and interpreted
execution uses elements of compilation, so the question arises where to draw the line
between these two approaches and whether such a line exists at all. We venture to
propose a rather simple answer, which allows us to say in each particular case whether
we are dealing with compiled or interpreted execution: during the execution of an
interpreted program, the interpreter itself has to be present in memory, whereas a
compiler is needed only at the compilation stage, and the program can then be executed
without its participation. Note that not all specialists agree with this criterion (in
particular, with the fact that it classifies JIT compilers as interpreters rather than
compilers). In general, the question of the boundary between interpretation and
compilation is quite complex and entails a whole trail of methodological problems; in
the third volume of our book we will devote a separate part of considerable size to it.
Programming in high-level languages is convenient but, unfortunately, not always
applicable. The reasons may be very different. For example, a high-level language may
not take into account some peculiarities of a particular processor, or the programmer
may not be satisfied with the particular way in which the compiler implements certain
constructions of the source language with the help of machine codes. In these cases, it
is necessary to abandon the high-level language and compose a program in the form of
a specific sequence of machine commands. However, as we have already seen, it is very
difficult to compose a program directly in machine codes. This is where a program
called an assembler comes to the rescue.
An assembler is a special case of a compiler: a program that takes as input a text
written using human-readable designations of machine instructions and translates these
designations into the sequence of corresponding machine instruction codes
understandable to the processor. Unlike the machine instructions themselves, their designations,
also called mnemonics, are relatively easy to memorize. Thus, the instruction from the
example given earlier, whose code, as we found out with some difficulty, is 01 D8,
looks like this in assembler notation⁹⁶:

add eax, ebx

⁹⁶ Here and below we use the notation of the NASM assembler unless otherwise stated; the
NASM assembler will be discussed in detail in Part 3 of our book.
Here we do not need to memorize the numeric code of the instruction or to calculate
operand encodings in our head; we just need to remember that the word add denotes
addition and that in such instructions the first place after the mnemonic is occupied by
the first summand (not necessarily a register: it can be a memory location), the second
place by the second summand (a register, a memory location, or simply a number), and
that the result is always written to the place of the first summand. The language of such
designations (mnemonics) is called assembly language.
Assembly language programming is fundamentally different from programming in
high-level languages. In a high-level language (like Pascal), we give only general
instructions, and the compiler is free to choose how to carry them out: which registers
and memory locations to use for storing intermediate results, which algorithm to use
for executing some non-trivial construct, and so on. To optimize performance, the
compiler may rearrange instructions or replace one instruction with another, as long as
the result remains unchanged. In an assembly language program, we specify explicitly
and unambiguously which machine instructions our program will consist of, and the
assembler (unlike a high-level language compiler) has no freedom whatsoever.
Unlike machine codes, mnemonics are accessible to humans, that is, a programmer
can work with mnemonics without much difficulty, but this does not mean that
programming in assembly language is easy. An action that would take us one high-
level language statement to describe may require dozens, if not hundreds, of assembly
language lines, and in some cases even more. The point is that a high-level language
compiler contains a large set of ready-made "recipes" for solving often arising small
problems and provides all these "recipes" to the programmer in the form of convenient
high-level constructs; assembly language contains nothing of the kind, so we have only
the processor's capabilities at our disposal.
It is interesting that there can be several different assemblers for one processor. At
first glance this seems strange, because one and the same processor cannot work with
different systems of machine code (so-called instruction sets). In fact, there is nothing
strange here; one just has to remember what an assembler really is. The instruction set
of a processor, of course, cannot change (unless you take a different processor), but for
the same instructions different notations can be invented; for example, the already
familiar instruction add eax, ebx will, in the notation proposed by AT&T, look like
addl %ebx, %eax: the mnemonic is different, the registers are labeled differently,
and the operands come in a different order, although the resulting machine code is, of
course, strictly the same 01 D8. Besides, when programming in assembly language we
write not only mnemonics of machine instructions but also directives, which are orders
to the assembler itself. Following such directives, the assembler can reserve memory,
declare a label visible from other modules of the program, switch to generating another
program section, calculate (right during assembly) some expression, and even "write"
a fragment of an assembly program (following, of course, our instructions), which it
will then process itself. The set of such directives supported by an assembler can also
differ, both in capabilities and in syntax.
Since an assembler is nothing more than a program written by quite ordinary
programmers, no one prevents other programmers from writing their own assembler,
which in fact happens quite often. The NASM assembler discussed in our book is one
of several assemblers that exist for the 80x86 family of processors; there are other
assemblers for these processors as well.
Part 2

The Pascal language and the beginnings of programming
In this part of the book, we will try to move from words to deeds and learn the basics of
writing computer programs, for which we will need the Pascal language. Pascal was originally
proposed by the Swiss scientist Niklaus Wirth in 1970 as a language for teaching
programming; he also wrote the very first Pascal compiler.
It is rather surprising for beginners to hear the statement that nowadays it is impossible
to define unambiguously what the Pascal language is. The fact is that over the decades of
Pascal's history various people and organizations have created their own translators of this
language and at the same time introduced various extensions and changes into the language;
such modified versions of the programming language are usually called dialects. By now, the
concept of the Pascal language has become so blurred that it always requires clarification.
Both Wirth's original definition of the language and the existing Pascal standards are very
limited in what they cover, and, in fact, no one has paid attention to them for a long time.
The implementation we will use for learning is called Free Pascal and dates back to 1993;
its author, Florian Paul Klämpfl, began developing his own Pascal compiler in response to
Borland's announcement that it was discontinuing its line of Pascal compilers for MS-DOS.
The Free Pascal compiler is currently available for all the most popular operating systems,
including Linux and FreeBSD (as well as Windows, Mac OS X, OS/2 and iOS; at the time of
this writing, upcoming support for Android had been announced). Free Pascal supports several different
Pascal dialects (at the programmer's choice) and includes a huge number of various features
from other versions of Pascal.
Oddly enough, we will not be exploring all this variety; rather, the set of features we will
be using throughout this part of the book will be very limited. The point is that we are not
interested in Free Pascal per se, not as a tool for professional programming (although it can
act as such a tool), but only as a tutorial that will allow us to learn the beginnings and basic
techniques of imperative programming. In the future we will get acquainted with C and C++,
and it is desirable to approach their study with already formed ideas about structural
programming, recursion, pointers and other basic features that are used in computer programs;
Pascal will allow us to learn all this, but we will not need all Pascal (and even more so the
features of Free Pascal in all their diversity) for this purpose.
If the reader wishes, he or she can certainly bring his or her knowledge of both Free Pascal
and other Pascal implementations to any level he or she wishes by using the literature and
other materials abundantly available on the Internet; however, it is quite possible that after
familiarizing himself or herself with other programming languages, the reader will no longer
want to use Pascal as a professional tool. In any case, the choice of a professional tool is a
matter for the future.
The task before us now can be formulated in one phrase: we are not really studying Pascal,
but programming as such, and Pascal interests us only as an illustration, nothing more.

2.1. First programs


To begin with, let us recall some of the things that have already been discussed in the
introductory part. A computer - or rather, its central processor - can execute a program
represented in machine code, but it is almost unrealistic for a human to write machine code
because of the enormous laboriousness of such work. That's why programmers write
programs in the form of a text corresponding to the rules of a particular programming
language, and then use a translator program; it can be either a compiler, which translates the
whole program into some other representation (for example, into machine code), or an
interpreter, which does not translate anything anywhere, but simply performs step by step the
actions prescribed by the text of the program being executed. The text written by the
programmer according to the rules of the programming language is called source code or
program source code.
The Pascal language, which we are beginning to learn, is usually referred to as
compiled; this means that in most cases when working with Pascal, compilers are
used⁹⁷. A Pascal program, as in almost any programming language⁹⁸, is text in ASCII
representation, which we discussed in §1.4.5. Consequently, we will have to use some
text editor to write the program; we described some of them in §1.2.12. We will save
the result in a file whose name ends with the suffix⁹⁹ ".pas", which conventionally
marks the text of a Pascal program.

⁹⁷ Pascal interpreters also exist, but they are used very rarely.

⁹⁸ There are exceptions to this rule, but they are so rare that you can quite safely ignore them.

⁹⁹ Recall that the Unix family of systems does not have the file name "extension" that many
users are accustomed to; the word "suffix" means much the same thing - a few characters at
the very end of a name, separated from the main part by a period - but, unlike other systems,
Unix does not consider a suffix to be anything other than just a piece of the name.
We will then run the compiler, which in our case is called fpc from the words Free Pascal
Compiler, and if all is well, the result of our exercises will be an executable file that we can
run.
For the sake of clarity and convenience, before starting our experiments we will create
an empty directory¹⁰⁰ and conduct all the experiments in it. In the examples of dialogs
with the computer, here and below we will reproduce the command line prompt,
consisting of the user name (avst), the machine name (host), the current directory
(recall that the user's home directory is denoted by the "~" symbol) and the "$" sign
traditionally used in the prompt. So, let's begin:
avst@host:~$ mkdir firstprog
avst@host:~$ cd firstprog
avst@host:~/firstprog$ ls
avst@host:~/firstprog$

We created the directory firstprog, entered it (i.e. made it the current directory), and,
using the ls command, verified that it is empty, i.e. does not contain any files. If something
here is not quite clear, please reread §1.2.5 immediately.
Now it's time to launch a text editor and type the program text in it. The author of these
lines prefers the vim editor, but if you don't want to learn it at all (which, admittedly, is
not very easy), you can just as well use other editors, such as joe or nano. In all cases
the principle of starting a text editor is the same: the editor itself is just a program that
has a name by which it is run, and this program should be given, as a parameter, the
name of the file you want to edit. If such a file does not exist yet, the editor will create
it the first time you save.
The program we will write will be very simple: all it will do is display¹⁰¹ the same
string every time it is run. Of course, there is no practical use for such a program, but
right now it is more important to get some program to work at all, just to make sure that
we can do it. In our example the program will produce the phrase Hello, world!¹⁰²,
so we will name its source code file hello.pas. So let's run the editor (substitute joe
or nano for vim if you want):

vim hello.pas

¹⁰⁰ Once again, let us remind you that the term "folder" should not be used!

¹⁰¹ It is more correct to speak, of course, not about the screen but about the standard output
stream, see §1.2.11; but we will allow ourselves to use the terms loosely for the time being, so
as not to complicate the understanding of what is going on.

¹⁰² The tradition of starting to learn a programming language with a program that prints this
exact phrase was introduced long ago by Brian Kernighan for C; it is not a bad tradition in
itself, so you can follow it when learning Pascal as well. However, you can use any phrase
you like.

Now we need to type the program text. It will look like this:

program hello;
begin
    writeln('Hello, world!')
end.

Now it is time to give some explanations. The first line of the program is the so-called
header, which says that all the text that follows is a program (the word program)
which its author named hello (actually, you can write any name here consisting of
Latin letters, digits and the underscore character, as long as the name begins with a
letter; we will call such names identifiers). The header ends with a semicolon.
Generally speaking, modern implementations of Pascal allow you to omit the header,
but we will not take advantage of this: a program without a header does not look as
clear.

The word begin, which we wrote on the second line, means beginning in English; in
this case it means the beginning of the main part of the program, but our program is so simple
that it actually consists of this "main part" alone; later we will write programs in which the
"main part" will be quite small compared to everything else.
The next line, writeln('Hello, world!'), does exactly what our program was
written for: it prints "Hello, world!". The word write means just that, while the
mysterious addition "ln" comes from the word line and means that after printing, the
line must be fed (we will consider this point in detail later). The resulting word
"writeln" denotes in Pascal the output operator with a line feed, and the parentheses
list everything the operator should output; in this case, a single string.
The reader already familiar with Pascal may object that most sources call writeln not an
operator but a "built-in procedure"; this is not quite correct, because the word "procedure"
(without the epithet "built-in") denotes a subroutine written by the programmer himself, and
built-in procedures should be those that the programmer could have written but does not need
to, because the compiler already contains them. We cannot write anything like writeln, with
its variable number of parameters and its output formatting directives; in other words, if this
"built-in procedure" did not exist in the language, we could not make it ourselves. Writeln
and other similar entities have their own syntax (a colon after an argument, followed by an
integer width in characters), so it is a part of the Pascal language that the compiler handles in
its own way, not according to generalized rules. In this situation it seems much more logical to
call writeln an operator rather than anything else. To be fair, the compiler treats the words
write, writeln, read, readln, etc. as ordinary identifiers, not as reserved words, which
is a strong argument against classifying these entities as operators; then again, Free Pascal
classifies these words as "modifiers", a category that also includes, for example, break and
continue, which no one ever calls anything other than operators.
The line intended for printing is enclosed in apostrophes to show that this text fragment
stands for itself and not for any language construct. If we had not put the apostrophes,
the compiler would have tried to understand what we meant by the word Hello and,
failing to find any suitable meaning, would have issued an error message and, in the
end, would not have translated our program into machine code; but since the word is
enclosed in apostrophes, it stands for itself and nothing else, so the compiler does not
need to puzzle over anything. A sequence of characters enclosed in apostrophes and
specifying a text string is called a string literal or string constant.
The last line of our program consists of the word end and a dot. The word end here
marks the end of the main part of the program. Pascal rules require the program to end
with a dot, whether for safety's sake or just for beauty.
So, the text is typed; we save it and exit the text editor (in vim we press Esc, then type
:wq and press Enter; in nano - Ctrl-O, Enter, Ctrl-X; in joe - Ctrl-K and then X; from
now on we will not explain how to do this or that in different editors, since all of this
has already been discussed in §1.2.12). When we get the command line prompt again,
we make sure that our directory is no longer empty: it contains the file we have just
typed:
avst@host:~/firstprog$ ls
hello.pas
avst@host:~/firstprog$

For the sake of clarity, we'll give a command that shows the contents of the file; it's best not
to do this with large texts, but our program consists of only four lines, so we can afford to
experiment a little. The command itself is called cat, as we already know, and its parameter
is the name of the file:

avst@host:~/firstprog$ cat hello.pas


program hello;
begin
writeln('Hello, world!')
end.
avst@host:~/firstprog$

By the way, as long as the file in our directory is only one, you can not type its name on the
keyboard, but press the Tab key, and the command line interpreter will write the name for us.
Now that we are sure that we have the file and that it contains what we expect, we can
run the compiler. Recall that it is called fpc; as usual, it needs a parameter, and this is, once
again, our source file name:

avst@host:~/firstprog$ fpc hello.pas

The compiler will print a number of lines that vary slightly depending on the particular version
of the compiler. If among them there is a string similar to
/usr/bin/ld: warning: link.res contains output sections; did you forget
-T? - you can safely ignore it, as well as all other lines, unless they contain the word
Error, Fatal, warning or note. An error message (with the word Error or
Fatal) means that the text you've fed to the compiler doesn't follow the rules of the Pascal
language, so you won't get compilation results - the compiler just doesn't know what to do.
Warnings (with the word warning) are issued if the program text formally complies with
the language requirements, but for some reason the compiler thinks that the result will not
work as you expected (most likely, incorrectly); the only exception is the above warning about
output sections, it is not actually issued by the compiler, but by the ld (linker)
program it calls, and we are not concerned with this warning. Finally, comments (messages
with the word note) are issued by the compiler if some part of the program seems strange
to it, although it should not, in its opinion, lead to incorrect operation.
For example, if we wrote writenl instead of writeln and then forgot to put a
period at the end of the program, we would see, among other things, messages like these:
hello.pas(3,12) Error: Identifier not found "writenl"
hello.pas(5) Fatal: Syntax error, "." expected but "end of file"
found

The first message informs us that the compiler does not know the word writenl, so
nothing good will come out of our program; the second message means that the file has run
out and the compiler has not reached the point, and this has upset it so much that it refuses to
consider our program further at all (this is how Fatal differs from just Error).
Pay attention to the numbers in brackets after the file name; hello.pas(3,12)
means that the error was detected in the file hello.pas, in line 3, column 12, and
hello.pas(5) means an error in line 5 - there is no such line in our program, but by the
time the compiler detected that the file unexpectedly ran out, line 4 was left behind, and the
fact that there is nothing in line 5 is another matter.
Line numbers given together with error messages and warnings are very valuable
information that allows us to quickly find the place in our program where the compiler did
not like something. Regardless of which text editor you use, it is desirable to immediately
understand how to find a line by number in it, otherwise programming will be difficult.
You can verify the success of the compilation by once again viewing the contents of the
current directory with the ls command:

avst@host:~/firstprog$ ls
hello hello.o hello.pas
avst@host:~/firstprog$

As you can see, there are more files. We are not very interested in the file hello.o, it is
a so-called object module, which the compiler feeds to the linker and then forgets to delete
for some reason; but the file called simply hello without the suffix is what we started the
compiler for: an executable file, that is, simplistically speaking, a file containing the machine
code corresponding to our program. To be sure, let's try to get more information about these
files:

avst@host:~/firstprog$ ls -l
total 136
-rwxr-xr-x 1 avst avst 123324 2015-06-04 19:57 hello
-rw-r--r-- 1 avst avst   1892 2015-06-04 19:57 hello.o
-rw-r--r-- 1 avst avst     47 2015-06-04 19:57 hello.pas
avst@host:~/firstprog$

In the first column, we see that the hello file has execution rights set (the letter "x"). At
the same time, we notice how much larger this file is than the source text: the file hello.pas
occupies only 47 bytes (by the number of characters in it), while the resulting executable file
"weighs" more than 120 kilobytes. Actually, everything is not so bad: as we will soon see,
the resulting executable file will not increase with the growth of the source program. It's just
that the compiler is forced to stuff into the executable file everything that is needed to perform
various input/output operations at once, and we don't use these possibilities yet.
All that remains is to run the resulting file. This is done in the following way:
avst@host:~/firstprog$ ./hello
Hello, world!
avst@host:~/firstprog$

If you are wondering why you should always write "./hello" rather than just "hello",
recall that we have already dealt with this when studying command scripts; there is a detailed
explanation on page 120. Briefly: a name containing no slashes is used in Unix to run
commands from the system directories (those listed in PATH, see §1.2.16), and our working
directory is not one of them; so we necessarily need an absolute or relative name with at
least one slash in it. The name "." is present in any directory and refers to that directory
itself, which is exactly what we need.
Naturally, there can be more than one operator in a program; in the simplest case, they
will be executed one after another. Let's consider such a program for example:
program hello2;
begin
    writeln('Hello, world!');
    writeln('Good bye, world.')
end.

Now the program contains not one operator, as before, but two; note that we put a
semicolon between the operators. A semicolon is placed at the end of an operator to
separate it from the next one - if, of course, there is a "next" operator; if there is not,
no semicolon is needed. The word end is not an operator, so it is not usually preceded
by a semicolon.
If this program is compiled and run, it will print first (as a result of the first statement)
"Hello, world!" and then (as a result of the second statement) "Good bye,
world.":

avst@host:~/firstprog$ ./hello2
Hello, world!
Good bye, world.
avst@host:~/firstprog$

Let's go back a bit and explain in a bit more detail the letters "ln" in the name of the
writeln operator, which, as we have already said, mean line feed. Pascal also has the
write operator, which works in exactly the same way as writeln, but does not perform
a line feed at the end of the output operation. Let's try to edit our first program by removing
the letters "ln":

program hello;
begin
    write('Hello, world!')
end.

After that, let's run the fpc compiler again to update our executable and see how our
program will work now. The picture on the screen will look something like this:

avst@host:~/firstprog$ ./hello
Hello, world!avst@host:~/firstprog$
This time, as before, our program printed "Hello, world!", but it did not perform a line
feed; so when, after the program finished, the command line interpreter printed its next
prompt, the prompt appeared on the same line as the output of our program.
Here is another example. Let's write such a program:

program NewlineDemo;
begin
    write('First');
    writeln('Second');
    write('Third');
    writeln('Fourth')
end.

Let's save this program to a file, for example, nldemo.pas, compile and run it:

avst@host:~/firstprog$ fpc nldemo.pas

avst@host:~/firstprog$ ./nldemo
FirstSecond
ThirdFourth
avst@host:~/firstprog$

In the program, we output the first and third words using the write operator, while the
second and fourth words are output using writeln, i.e. with a line feed at the end. The
effect is quite obvious: the word "Second" is printed on the same line as "First"
(after which no line feed was performed), so they merge together; the same happens
with the words "Third" and "Fourth".
Let us make one important terminological remark. The word "operator" is an example of a rather
unfortunate translation of an English term: in the original, this entity is called a statement, a word
that would be more accurately rendered as "assertion" or "sentence". The problem is that the
word operator also exists in English, and in programming it denotes what Russian terminology
calls operations - addition, subtraction, multiplication, division, comparison and so on.
It is interesting that in mathematics the English word operator has passed into Russian
terminology, practically preserving (except that it has somewhat narrowed) its original meaning; the
reader may be familiar with such terms as linear operator, differentiation operator, etc.
In any case, the term "operator" in the Russian-language programming vocabulary is firmly fixed
as a designation of such a programming language construct that prescribes some action - not a
calculation, but exactly an action. How it happened is not important now; unfortunately, it creates
problems when programming languages use the word operator - in English, of course, and not
in Russian. There is no such word in Pascal, but in the third volume of our book we will study C++,
where the word operator is used quite actively. That's why it is useful to remember that in
programming the Russian word "operator" and the English word "operator" denote completely
different entities.
Before we finish talking about the simplest programs, let's mention one more very
important point - the way we have arranged the different parts of our source code relative to
each other. We wrote the header and the words begin and end, which mark the beginning
and end of the main part of the program, at the start of the line, while the operators that make
up the main part we shifted to the right, putting four spaces before them; in addition, we
placed each of these elements on a separate line.
Interestingly, the compiler does not need all this at all. We could write, for example, like
this:

program hello; begin writeln('Hello, world!') end.

or like this:

program hello; begin


writeln('Hello, world!') end.

As for our second program, which has more operators, there's also a lot more room for
messing around - for example, you could write something like

program NewlineDemo; begin write('First');


writeln('Second'); write(
'Third'); writeln('Fourth') end.

From the compiler's point of view, nothing would change; moreover, as long as we are talking
about very primitive programs, hardly anything changes from our own point of view either,
except that our aesthetic sense may be revolted. However, the situation changes dramatically for
more or less complex programs. The reader will soon see for himself that program texts are
rather hard to understand; in other words, to understand from the available text what a program
does and how it does it, one has to make intellectual efforts that in many cases exceed the efforts
required to write such a text from scratch. If the text is, on top of that, badly laid out,
understanding it becomes altogether impossible, as Bulgakov's famous character would say.
The most interesting thing is that you can get hopelessly lost not only in someone else's
program, but also in your own, and sometimes it happens before the program is even finished.
There is nothing easier (and more frustrating) than getting lost in your own code before you
have had time to write anything.
In practice, programmers spend almost more time reading programs - both their own and
other people's - than writing them, so, quite naturally, the readability of program code
always receives the most careful attention. Experienced programmers say that a program
text is intended first of all for human reading and only secondarily for computer
processing. Practically all existing programming languages give the programmer a certain
freedom in laying out code, which makes it possible to design even the most complex program
so that it is understandable at a glance to any reader familiar with the language used; but it is
just as possible to make a very simple program incomprehensible even to its own author.
To improve the readability of programs, a number of techniques are used, which together
make up a competent style of program code design, and one of the most important points
here is the use of structural indentation. Almost all modern programming languages allow an
arbitrary number of whitespace characters - spaces or tabs - at the beginning of any line of
text[103], which allows you to lay out a program so that its structure can be grasped by an
unfocused eye, without reading. As we will soon see, the structure of a program is formed by
the principle of nesting one thing into another; the technique of structural indentation allows
us to emphasize the structure of a program by simply shifting any nested construct to the right
relative to what it is nested in.

[103] A curious exception to this statement is Python, which does not allow an arbitrary number of
spaces at the beginning of a line - on the contrary, it has a strict syntax requirement for the number of
such spaces, a requirement that corresponds to the principles of structural indentation. In other words,
most programming languages allow structural indentation, whereas Python requires it.
In our simplest programs, there is only one level of nesting: the writeln and write
operators are nested in the main part of the program. In this case, neither the header nor the
main part itself is nested in anything, so we wrote them starting from the leftmost column of
characters in the program text, and shifted the writeln and write operators to
the right to show that they are nested in the main part of the program. The only thing we
have chosen rather arbitrarily here is the size of the structural indentation, which in our
examples throughout this book will be four spaces. In reality, many projects (in particular, in
the Linux kernel) use the tab character for structural indentation, and exactly one. You may
also see a two-space indentation, which is the indentation used in the code of programs
produced by the Free Software Foundation (FSF). It is very rare to use three spaces; this
indentation size is sometimes found in programs written for Windows. Other indentation sizes
should not be used, and there are a number of reasons for this. One space is too small to
set off blocks visually: the left edge of the text is perceived as nearly smooth, and the
indentation does not serve its purpose. Indentation wider than four spaces is difficult to
enter: more than five spaces have to be counted as you type, which slows down your work,
while five spaces already turns out to be too many.
And if you use more than one tab per level, there is almost no horizontal space left on the screen.
Once you have chosen one indentation size, it should be adhered to throughout the
program text. This also applies to other decisions made about the design style: the source text
of the program must be stylistically homogeneous, otherwise its readers will have to
constantly rearrange their perception, which is rather tedious.

2.2. Expressions, variables and operators


2.2.1. Arithmetic operations and the concept of type
The write and writeln operators can print more than just strings. For example,
if we urgently need to multiply 175 and 113, we can write the following program[126]:

program mult175_113;
begin
    writeln(175*113)
end.

[126] Of course, it is much easier to use a calculator or the arithmetic capabilities of a command
interpreter (see §1.2.15), but that is not important now.

After compiling and running this program, we will see the answer 19775 on the screen.
The "*" symbol in Pascal (and in most other programming languages) denotes the
multiplication operation, and the construct "175*113" is an example of an arithmetic
expression. We could make this program a little more "user-friendly" by showing in its output
what the answer is actually referring to:

program mult175_113;
begin
    writeln('175*113 = ', 175*113)
end.

Here we have given the writeln operator two arguments to print: a string and an
arithmetic expression. We can specify as many arguments as we like, listing them, as in this
example, separated by commas. If we now compile and run a new version of the program, it
will look like this:

avst@host:~/firstprog$ ./mult
175*113 = 19775
avst@host:~/firstprog$

Note the fundamental (from the compiler's point of view) difference between characters
included in the string literal and characters outside it: while the expression 175*113 outside
the string literal was evaluated, the compiler made no attempt to treat the same characters
placed between the apostrophes - and thus included in the string literal - as an expression, or
in any capacity other than simply characters. Therefore, in accordance with our instructions,
the program first printed, one by one, all the characters of the given string literal (including,
as you can easily see, the spaces), and then the next argument of the writeln operator -
the value of the expression, 19775.
Of course, multiplication is by no means the only arithmetic operation that Pascal
supports. Addition and subtraction in Pascal are denoted by the natural "+" and "-" signs,
so that, for example, the expression 12 + 105 will result in 117, and the expression
10 - 25 will result in -15. Just as in writing mathematical formulas, operations
in Pascal expressions have different priorities; for example, the priority of multiplication in
Pascal, as in mathematics, is higher than the priority of addition and subtraction, so the value
of the expression 10 + 5*7 will be 45, not 105: when calculating this expression,
multiplication is done first, i.e. 5 is multiplied by 7, resulting in 35, and only then
the resulting number is added to ten. Just as in mathematical formulas, in Pascal's expressions
we can use parentheses to change the order of operations: (10 + 5) * 7 will result
in 105.
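As a quick illustration (this example is ours, not the book's), the following program prints
both values, so you can check the effect of the priorities and the parentheses yourself:

program priorities;
begin
    writeln(10 + 5*7);      { multiplication first: prints 45 }
    writeln((10 + 5) * 7)   { parentheses first: prints 105 }
end.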
Note that Pascal also provides unary (i.e., taking one argument) operations "+" and "-":
you can, for example, write -(5 * 7) and get -35; the unary operation "-", as you would
expect, changes the sign of the number to the opposite. The unary operation "+" is also
provided, but it is of no practical use: its result is always equal to its argument.
The division operation is somewhat more complicated. The usual mathematical division
is indicated by the slash "/"; just remember that the division operation is different from
multiplication, addition, and subtraction: even if its arguments are integers, the result
generally cannot be expressed as an integer. This leads to a somewhat unexpected effect for
beginners. For example, if we write the writeln(14/7) operator in a program, we may
not even immediately recognize the number 2 in its output:
2.0000000000000000E+0000

To understand what the problem is and why writeln did not print just "2", we will need
a rather lengthy explanation introducing the concept of expression type[127]. Since this is one
of the fundamental concepts in programming and we will not be able to do without it later,
let's try to deal with it right now.
Note first that all the numbers we have written in the above examples are integers; if we
had written something like 2.75, we would be dealing with a different kind of number, a
so-called floating-point number. We have already examined the representation of both kinds
of numbers in detail (see §1.4.2 and 1.4.3) and have seen that they are stored and handled
quite differently. Mathematically, "2" and "2.0" are the same number, but in programming
they are quite different things, because they have different representations, so that operations
on them require different sequences of machine instructions. Moreover, the integer 2 can be
represented as a two-byte, one-byte, four-byte or even eight-byte integer, signed or
unsigned[128]. All these are also examples of situations where different types are involved. In
Pascal, each type has its own name; for example, signed integers can be of type shortint,
integer, longint, and int64 (one-byte, two-byte, four-byte, and eight-byte signed
integers respectively), the corresponding unsigned types are called byte, word,
longword, and qword, and floating-point numbers are usually of type real (although
Free Pascal supports other floating-point types as well).
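To see the difference between 2 and 2.0 with your own eyes, you can compile and run a
two-line sketch like the following (our example; the exact output format may vary slightly
between compiler versions):

program twokinds;
begin
    writeln(2);     { an integer: prints just 2 }
    writeln(2.0)    { a floating-point number: printed in scientific notation }
end.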
As it often happens in engineering disciplines, the notion of type is poorly definable: even
if we try to give such a definition, it is likely to be in some way inconsistent with reality. Most
often in the literature one can find the statement that a type is a set of possible values, but if
we accept such a definition, we cannot explain the difference between 2 and 2.0; therefore,
the notion of type should include not only a set of values, but also the machine representation
of these values accepted for a given type. In addition, many authors emphasize the important
role of the set of operations defined for a given type, and this is, in principle, reasonable. By
stating that an expression type fixes the set of values, their machine representation and
the set of operations defined on these values, we are likely to be close to the truth.
Types (and expressions) are not only numeric. For example, the boolean type,
intended for working with logical expressions, has only two values: true and false; the
set of operations on values of this type includes the familiar conjunction, disjunction,
negation, and "exclusive or". The 'Hello, world!' used in our very first program is

Of course, if we simply write a number in a program, such an entry (a so-called numeric constant, or
127

numeric literal) will also be a special case of an expression.


If something is unclear here, be sure to reread §1.4.2.
128
§ 2.2. Expressions, variables and operators 272
nothing but a constant (and therefore an expression) of the string type, and there is even
an operation on strings, though only one - 'addition', denoted by the '+' symbol, which actually
means concatenation (simply put, joining) of two strings into one. Strings can also be
compared, but the result of the comparison, of course, will not be a string, but a logical value,
i.e. a value of the boolean type. The char type is used to work with single characters,
etc.
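A small sketch of our own can make these non-numeric types tangible; it concatenates two
string literals, prints a boolean expression, and compares two strings (Free Pascal prints
boolean values as TRUE and FALSE):

program othertypes;
begin
    writeln('Hello, ' + 'world!');   { string concatenation }
    writeln(true and not false);     { a boolean expression: prints TRUE }
    writeln('abc' < 'abd')           { string comparison yields a boolean: TRUE }
end.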
Let's return to the division operation, which is where this whole conversation started.
Now it is easy to guess why writeln(14/7) behaved in such an unexpected way. The
result of multiplication, addition and subtraction operations is usually of the same type as the
arguments of the operation, i.e. when adding or multiplying two integers we get an integer; if
we try to add two floating-point numbers, the result will be a floating-point number.
Division is different: its result is always a floating-point number, and this is precisely the
reason for the effect observed above.
To elaborate: unless special measures are taken, the writeln operator prints floating-point
numbers in so-called scientific notation, as a mantissa and an exponent. The mantissa is a decimal
fraction m satisfying the condition 1 <= m < 10, printed with 16 decimal places; it is followed by
the letter E (from the word exponent), which separates the mantissa from the exponent: a positive
or negative integer p representing a power of ten. The number as a whole equals m * 10^p. This
behavior can be changed by specifying explicitly how many characters we want to allocate for
printing the number and how many of them for decimal places. To do this, after the numeric
expression in a write or writeln operator we add a colon, an integer (how many characters
in total), another colon, and another integer (how many decimal places). For example, if we write
writeln(14/7:7:3), it will print 2.000, and since that is only five characters, two spaces
will be printed before it.
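The following sketch (ours, not the book's) shows the same quotient printed three ways; a
width of 0 asks for the minimal number of characters:

program widths;
begin
    writeln(14/7);         { scientific notation by default }
    writeln(14/7:7:3);     { field width 7, 3 decimals: prints "  2.000" }
    writeln(14/7:0:2)      { minimal width, 2 decimals: prints "2.00" }
end.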
In addition to ordinary division, Pascal provides for integer division, known from school
mathematics as division with remainder. For this purpose, two more operations are
introduced, denoted by the words div and mod, which mean division (with the remainder
discarded) and the remainder of such division. For example, if we write
writeln(27 div 4, ' ', 27 mod 4);

it will print two numbers separated by a space: "6 3".
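Since div and mod often work in tandem, here is a tiny worked example of our own:
converting a number of seconds into minutes and seconds:

program divmod;
begin
    writeln(27 div 4, ' ', 27 mod 4);                 { prints "6 3" }
    writeln(125 div 60, ' min ', 125 mod 60, ' sec')  { prints "2 min 5 sec" }
end.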

2.2.2. Variables, initialization and assignment


All the examples given in the previous paragraph have one fundamental flaw - they print
the same thing every time, no matter what the circumstances, because they solve not just the
same problem (which is what most programs do), but the same problem for the same special
case. Those who like to talk about properties of algorithms would probably say that the
algorithms implemented in our examples do not have the mass property (see page 187).
It would be much more interesting if a program that can solve some problem, even if only
one and a very simple one, could solve it in a general way, i.e. ask the user for values (or
take them from somewhere else) and work with them, rather than with values rigidly fixed
in its source code at the time the program is written. To achieve this, we need one very
important capability: we must be able to store information in memory[129] and work with it.
Pascal uses so-called variables for this purpose.

[129] This refers to the computer's RAM - or rather, the part of it that is allocated to our program
while it runs.
This is true not only for Pascal, but probably for most existing programming languages - but still
not for all of them. There are exotic programming languages that do not provide for variables at all.
Besides, in many programming languages variables are present, but they are used in a completely
different way and organized in a different way than in Pascal; examples of such languages are Prolog,
Refal, Haskell and others. However, if we consider only the so-called imperative programming
languages, for which the programmer's thinking style is similar to Pascal's, then in all such languages
the concept of a variable is present and means approximately the same thing.
In the simplest case, a variable is denoted by an identifier - a word that may consist of
Latin letters, digits and the underscore, but must begin with a letter[130]; such an identifier is
called a variable name. For example, we can name a variable "x", "counter", "p12",
"LineNumber" or "grand_total". Later we will encounter variables that have no
names, but that is still a long way off. Note that Pascal does not distinguish between
uppercase and lowercase letters, i.e. according to Pascal's rules the words "LineNumber",
"LINENUMBER", "linenumber" and "LiNeNuMBeR" denote one and the same
variable; it is another matter that using different spellings of the same identifier is
considered extremely bad taste among programmers.

[130] You can also start an identifier with an underscore, although in Pascal programs this is not
usually done; but you cannot start an identifier with a digit.
A variable has a value associated with it at any given time; a variable is said to store a
value or a value is said to be contained in a variable. If a variable name occurs in an arithmetic
expression, a so-called variable reference is performed, where the variable value is
substituted into the expression instead of the variable name.
Pascal is classified as a strictly typed programming language; this means, in particular,
that each variable in a Pascal program has a strictly defined type. In the previous paragraph
we considered the concept of expression type; a variable type can be understood, on the one
hand, as the type of an expression consisting of a single reference to this variable, and, on the
other hand, as the type of an expression whose value can be stored in such a variable. For
example, the most popular type in Pascal programs, which is called integer, implies that
variables of this type are used to store integer values and can contain numbers from -32768
to 32767 (i.e., two-byte signed integers); a reference to such a variable will also,
of course, be an expression of type integer.
Before a variable can be used in a program, it must be described. To do this, a variable
description section is inserted between the header and the main part of the program; this
section begins with the word "var" (from the word variables), followed by one or more
variable descriptions and their types. For example, the variable descriptions section may look
like this:

var
    x: integer;
    y: integer;
    flag: boolean;

Here, the variables x and y are of type integer, and the variable flag is of type
boolean (recall that this type, sometimes called logical, admits only two possible values,
true and false). Variables of the same type can be grouped into one description by listing
them separated by commas:

var
    x, y, z: integer;

Pascal provides several different ways to put a value into a variable. For example, you can set
the initial value of a variable directly in the description section; this is called initialization:

var
    x: integer = 500;

If you don't do this, the variable will still contain some value, but what value is impossible
to predict: it can be arbitrary garbage. Such a variable is called uninitialized. Using a
"garbage" value is inherently wrong, so the compiler, when it sees such a use, issues a
warning; unfortunately, it cannot always detect it: in some cases a program may be written
so "cunningly" that it refers to an uninitialized variable without the compiler noticing.
Sometimes it also happens the other way around: the compiler issues a warning even though
the variable is in fact always initialized before use.
The value of a variable can be changed at any time by executing a so-called assignment
operator. In Pascal, an assignment is denoted by ":=", to the left of which is written the
variable to which the new value is to be assigned, and to the right is written the expression
whose value is to be entered into the variable. For example, the operator

x := 79

(read "put x equal to 79") will put the value 79 into the variable x, and from the moment
this operator is executed, expressions containing the variable x will use this value. The old
value of the variable, whatever it may be, is lost when the assignment is executed. The
operation of assignment can be illustrated by the following example:

program assignments;
var
    x: integer = 25;
begin
    writeln(x);
    x := 36;
    writeln(x);
    x := 49;
    writeln(x)
end.

When the first of the writeln statements is executed, the variable x contains the
initial value given in the description, that is, the number 25; this is what will be printed. Then
the assignment operator on the next line will change the value of x; the old value of 25 will
be lost, and the variable will now contain the value 36, which will be printed by the
second writeln operator; after that, the variable will contain the value 49, and the
last writeln operator will print it. In general, the execution of this program
will look like this:

avst@host:~/firstprog$ ./assignments
25
36
49
avst@host:~/firstprog$
Another example of assignment is somewhat more difficult to understand:
x := x + 5

Here, the value of the expression to the right of the assignment sign is calculated first; since
the assignment itself has not yet occurred, the old value of x, the value that was in
this variable just before the operator was executed, is used in the calculation. Then the
calculated value, which for our example will be 5 more than the value of x, is put back
into the variable x, that is, roughly speaking, as a result of the execution of this
operator, the value contained in the variable x becomes 5 more: if it was 17, it
becomes 22, if it was 100, it becomes 105, and so on.
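This behavior is easy to observe with a small experiment of our own, in the spirit of the
assignments program above:

program increment;
var
    x: integer = 17;
begin
    writeln(x);    { prints 17 }
    x := x + 5;
    writeln(x);    { prints 22 }
    x := x + 5;
    writeln(x)     { prints 27 }
end.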

2.2.3. Identifiers and reserved words


In the previous paragraph we needed to describe variables, giving them names such as
"x", "flag", "LineNumber", etc.; we even introduced the concept of an identifier: a
word that can consist of Latin letters, Arabic numerals and the underscore, and must start with
a letter. If we recall the very beginning of our conversation about Pascal, a similar identifier
was needed in the program header as a name for the whole program. Besides the
program and variables, Pascal has a number of entities described by the user (i.e., the
programmer) that also need names; we will gradually discuss most of these entities and learn
how to use them. The general rule is that an identifier can be a name for anything in Pascal,
and the rules for creating identifiers are the same regardless of what the identifier is for.
At the same time, we encountered words that Pascal itself provides us, i.e. words that we
don't need to describe: on the one hand, the words program, var, begin, end, div,
mod, and on the other hand, the words write and writeln, integer, boolean.
Although all of these words are part of the Pascal language to some extent, they belong
to different categories. The words program, var, begin, end, div and mod (as well
as many others, some of which we will study in the future) are considered reserved words
(sometimes also called keywords) by Pascal's rules; this means that we cannot use them as
names for variables or anything else; formally, they are not considered identifiers at all, hence
the name "reserved": the property of being reserved is precisely that these words cannot serve
as identifiers.
On the other hand, the words write, writeln and even the word integer,
although introduced by Pascal, are nevertheless ordinary identifiers (in fact, write and
writeln are not quite ordinary, but integer and boolean are really ordinary),
i.e. we can describe a variable with the name integer (of some other type), but we will
lose the possibility to use the type integer, because its name will be occupied by the
variable. Of course, we should not do this; but it is worth keeping in mind that this possibility
exists, if only for reasons of general erudition.
It is worth noting that there is a certain degree of arbitrariness in dividing the words provided by
the Pascal language into reserved words and "built-in identifiers". For example, the words true and
false, used to denote logical truth and logical falsehood, were considered simple identifiers in
the classical versions of Pascal, as well as in the famous Turbo Pascal (which most programmers
didn't even realize, because it never occurred to anyone to use these words for anything else). The
creators of Free Pascal found this inconvenient, so the Free Pascal compiler treats these two words
(as well as new, dispose, and exit) as reserved.

2.2.4. Inputting information for further processing


Besides assignment, there are other situations in which a variable changes its current
value; one of the most typical examples is the execution of so-called input operations. During
such an operation, the program receives information from an external source - from the
keyboard, from a file on disk, from a communication channel on a computer network, etc.;
naturally, the received information must be stored somewhere, and variables are used for this
purpose.
In Pascal, the most popular means of input operations is the input operator, denoted by
the word read, and its variation readln. Let's start with a simple example - a program
that inputs an integer from the keyboard, squares it, and prints the result:

program square;
var
    x: integer;
begin
    read(x);
    x := x*x;
    writeln(x)
end.

As you can see, the program uses one variable of the integer type, which is called x.
The main part of the program includes three operators. The first of them, read, prescribes
to read an integer number from the keyboard and then put the read number into the variable
x. The program execution pauses at this point until the user enters the number; moreover,
due to certain peculiarities of the terminal's operation mode (or rather, in our case, of its
software emulator, which reproduces the behavior of real terminals), the program will "see"
the entered number no sooner than the user presses the Enter key.
The second operator in our example is an assignment operator; the expression on its right
side takes the current value of the variable x (i.e. the value that has just been read from the
keyboard by the read operator), multiplies it by itself, i.e. squares it, and puts the result
back into the variable x. The third operator - the familiar writeln - prints the result.
Execution of this program may look like this:

avst@host:~/firstprog$ ./square
25
625
avst@host:~/firstprog$

Immediately after starting, the program "goes quiet", so an inexperienced user may think it
has hung; in fact it is simply waiting for the user to enter the required number.
In the example above, the number 25 was entered by the user, and the number 625 was
printed by the program.
By the way, now is a good time to show why the expression "input from the keyboard" is
not quite true and it would be more correct to speak about "input from the standard input
stream". First of all, let's create a text file containing one line and a number in this line, let it
be 37 for a change. We'll call the file num.txt. To create it, you can use the same text
editor that you use to enter program texts, but you can do it in a simpler way - for example,
like this:

avst@host:~/firstprog$ echo 37 > num.txt
avst@host:~/firstprog$ cat num.txt
37
avst@host:~/firstprog$

Now let's run our square program, redirecting its input from the num.txt file:

avst@host:~/firstprog$ ./square < num.txt
1369
avst@host:~/firstprog$
The number 1369 is the square of the number 37; our program printed it. As we can see,
it did not enter the original number from the keyboard - it read it from the num.txt file in
accordance with our instructions. It is easy to make sure of it: edit the num.txt file, replacing
the number 37 with some other number, and run the square program again with
redirection from the file, as shown in the example; this time the program will print the square
of the number you entered into the file.
In our example, we first used the system command echo with redirected output to
generate a file containing a number, and then ran our program with redirected input from that
file. Roughly the same results can be achieved without any intermediate files by running the
echo command at the same time as our program with a so-called pipeline (see §1.2.11);
in this case, the output of the echo command will go directly to the input of our program.
This is done in the following way:

avst@host:~/firstprog$ echo 37 | ./square
1369
avst@host:~/firstprog$ echo 49 | ./square
2401
avst@host:~/firstprog$
Now let's redirect the output to the file result.txt:

avst@host:~/firstprog$ echo 37 | ./square > result.txt
avst@host:~/firstprog$
This time nothing was displayed on the screen at all, but the result was written to a file, which
is easy to verify:
avst@host:~/firstprog$ cat result.txt
1369
avst@host:~/firstprog$
Now we know from our own experience: output "to the screen" is not always on the
screen, and input "from the keyboard" is not always performed from the keyboard.
That is why it is more correct to speak about output to the standard output stream and about
input from the standard input stream. Let us emphasize that the program itself does not know
where its input is coming from and where its output is directed to, all redirections are
made by the command line interpreter just before launching our program.
2.2.5. Beware of insufficient digit capacity!
Once Ilya Muromets came to chop off the head of the
Serpent Gorynych. He came and cut off one of the Serpent's
heads, and the Serpent grew two. Ilya Muromets cut off
two of the Serpent's heads, and the Serpent grew four. He
chopped off four heads, and the Serpent grew eight.
Ilya Muromets chopped and chopped; when he had finally
chopped off a total of 65535 heads, the Serpent Gorynych
died. Because he was sixteen-bit.

Let's continue the experiments with the square program started in the previous
paragraph, but this time we will take larger numbers.

avst@host:~/firstprog$ echo 100 | ./square
10000
avst@host:~/firstprog$ echo 150 | ./square
22500
avst@host:~/firstprog$ echo 200 | ./square
-25536
avst@host:~/firstprog$

If everything seemed to be OK with the first two runs, something obviously went wrong on
the third one. To understand what was going on, let's remember that variables of the
integer type in Pascal can take values from -32768 to 32767; but the square of the
number 200 is 40000, so it simply doesn't fit into a variable of the integer type!
Hence the ridiculous result, which is also negative.
The result, despite its absurdity, is very simple to explain. We already know that we are dealing
with signed integers and that an overflow has occurred (see §1.4.2, page 208). The digit capacity
of our numbers is 16 bits, so the overflow yields a final number that is 2^16 = 65536 less than it
should be. The correct result of the multiplication would be 200^2 = 40000, but the overflow
reduced it by 65536, so that 40000 - 65536 = -25536; this is exactly what we see in the example.
The largest number that our program processes correctly is 181: its square is 32761;
the square of 182, which is 33124, no longer fits into the digit capacity of the
integer type. But things are not so terrible; we just need to use a different type of
variable. The most obvious candidate for this role is longint, which is 32 bits wide;
variables of this type can take values from -2147483648 to 2147483647 (i.e. from
-2^31 to 2^31 - 1). It is enough to change one word in the program, replacing integer with
longint:

program square;
var
    x: longint;
begin
    read(x);
    x := x*x;
    writeln(x)
end.

and the capabilities of our program (after its recompilation) will increase dramatically:

avst@host:~/firstprog$ echo 182 | ./square
33124
avst@host:~/firstprog$ echo 200 | ./square
40000
avst@host:~/firstprog$ echo 20000 | ./square
400000000
avst@host:~/firstprog$

Of course, it's too early to rejoice, there is a limit here as well:

avst@host:~/firstprog$ echo 46300 | ./square
2143690000
avst@host:~/firstprog$ echo 46340 | ./square
2147395600
avst@host:~/firstprog$ echo 46341 | ./square
-2147479015
avst@host:~/firstprog$

but it's still better than what we had.


We can extend the digit capacity of our numbers even further by applying the int64 type,
which uses signed 64-bit numbers[131]. After replacing longint with int64 and
recompiling, our program will be able to square "huge" numbers:

avst@host:~/firstprog$ echo 3000000000 | ./square
9000000000000000000

though, of course, he who seeks will always find: a maximum possible number exists
for int64 as well:

avst@host:~/firstprog$ echo 3037000499 | ./square
9223372030926249001
avst@host:~/firstprog$ echo 3037000500 | ./square
-9223372036709301616

The last thing we can do to extend the range of our numbers is to replace signed numbers
with unsigned ones. This won't gain us much - only one bit - because there are no numbers
wider than 64 bits in Pascal (at least not in Free Pascal). So, let's change int64 to qword
(from "quad word", i.e. "quadruple word": a "word" on x86 architectures traditionally means
16 bits) and try it:

avst@host:~/firstprog$ echo 3037000500 | ./square
9223372037000250000

Since the maximum possible value of our variable is now 2^64 - 1 = 18446744073709551615,
we can predict at which number the program will fail. The square of 2^32 is 2^64, which is
one more than allowed. Therefore, the largest number that our program can still square
correctly is 2^32 - 1 = 4294967295. Let's check:

avst@host:~/firstprog$ echo 4294967295 | ./square
18446744065119617025
avst@host:~/firstprog$ echo 4294967296 | ./square
0

This somewhat unexpected effect is again the result of overflow or, to put it more strictly,
of a carry into a nonexistent bit, since this time we are dealing with unsigned numbers.
Do you remember the joke about the Serpent Gorynych?

[131] While the longint type can be found in almost any Pascal implementation, the int64
type is a peculiarity of our chosen implementation (i.e. Free Pascal), so if you try to use it in
other versions of Pascal, there is a high probability that it will not be there.

2.2.6. Simple sequence of operators


All the programs we have written so far have been executed sequentially, statement by
statement. Interestingly, the essentially trivial idea of sequential execution of instructions is
not something everyone masters immediately and without a struggle; if you don't feel
confident, try writing something like this:

program sequence;
begin
    writeln('First');
    readln;
    writeln('Second');
    readln;
    writeln('Third')
end.

Let's explain that the readln operator works in much the same way as the familiar read
operator, with the difference that, having read everything that was required, it necessarily
waits for an end of line on its input. Since in our example we have given this operator no
parameters, that is all it will do: wait for the end of the line on input, i.e. simply wait for
the Enter key to be pressed. If our program is compiled and run, it will
type the word "First" and stop waiting; nothing else will happen until you press Enter. The
program has completed its first statement and started executing the second; it will not
complete until the line feed character is read on the input.
When you press Enter, you will see the program "come to life" and print the word
"Second", then stop again. What has happened is that the first of the two readln
operators finished, then the second writeln ran and printed the word "Second";
after that, the second readln started executing. Like the first one, it will run until you
press Enter. When you press Enter a second time, you will see that the program has printed
"Third" and terminated: first the penultimate operator (the readln) finished, and
then the last one was executed.
In many computer science and programming textbooks, especially school textbooks, literally
every example program ends with such a readln; sometimes the readln at the end of a
program becomes so habitual that students, and even some teachers, begin to perceive it as part
of the furniture, completely forgetting what it is actually needed for. This whole semi-shamanic
ritual started with the fact that most schools use systems of the Windows family as a teaching
aid, and since it is very difficult to write a normal program for Windows, the programs written
are "console" ones - or, more precisely, simply DOS programs. Naturally, students are expected
to run their programs exclusively from inside an integrated environment like Turbo Pascal or
whatever else is at hand, and nobody thinks it necessary to bother configuring this environment
correctly. As a result, when a program created in the integrated environment is launched, the
operating system opens an MS-DOS emulator window to execute it, but this window
automatically disappears as soon as the program ends; naturally, we simply have no time to read
what our program has printed.
Note that this "problem" can be overcome in many different ways, such as setting up the
environment so that the window does not close, or simply starting a command line session and
executing compiled programs from it; unfortunately, teachers instead prefer to show students how to
insert a completely irrelevant statement into programs, which is actually needed to keep the window
open until the user presses Enter; however, this is usually not explained to students.
Fortunately, we use neither Windows nor integrated environments, so these problems do not
concern us. We should also note that the ridiculous readln at the end of every program is far
from the only problem that arises when using integrated environments. For example, having
become used to launching a program from inside the integrated environment by pressing the
corresponding key, students lose sight of the notion of an executable file, and with it of the
compiler and the role it plays: many recent students are sure that compilation is needed to
check the program for errors - and for nothing else.

Figure 2.1. Flowcharts for a simple sequence of operators, and for complete and incomplete
branching
As an independent exercise, take any collection of English-language poems[132] and write a
program that will print some sonnet line by line, each time waiting for you to press Enter.

[132] It is very important that the poems be in English. The use of Cyrillic characters in program
texts is inadmissible, even though compilers usually allow it. Proper design of a program capable
of "speaking Russian" requires studying special libraries for creating multilingual programs; all
non-English messages then live not in the program itself but in special external files.
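One possible shape of such a program (a sketch of ours, using the opening lines of
Shakespeare's Sonnet 18, not a solution from the book):

program sonnet;
begin
    writeln('Shall I compare thee to a summer''s day?');
    readln;
    writeln('Thou art more lovely and more temperate:');
    readln;
    writeln('Rough winds do shake the darling buds of May,');
    readln;
    writeln('And summer''s lease hath all too short a date.')
end.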
The sequence of operators in a program is often depicted schematically in the form of so-
called flowcharts. A flowchart for a simple sequence of operators is shown in Fig. 2.1 (left).
Note that common actions in flowcharts are traditionally depicted as a rectangle, the
beginning and the end of the program fragment under consideration are indicated by a small
circle, and the condition check is indicated by a rhombus; more about this in the next
paragraph.

2.2.7. Branching design


A single and forever set sequence of actions, which we have used so far, is suitable only
for very trivial tasks, which, moreover, have to be specially selected. In more complex cases,
the sequence of actions has to depend on various conditions (e.g., if a condition is met, we do
one thing, and if it is not met, we do something completely different), some program
fragments have to be repeated several (or even many) times in a row, we have to temporarily
jump to other places in the program in order to come back later, and so on.
Perhaps the simplest construction that breaks the rigid sequence of operator execution is
the so-called branching; when executing it, some condition is checked first, which may turn
out to be true or false, and at the time of writing the program we do not know what this
condition will turn out to be during its execution (most often it happens both ways, and during
one execution of the program). Branching can be complete or incomplete. With full branching
(Fig. 2.1, center), the program specifies one operator to be executed if the condition is true
and another to be executed if the condition turns out to be false; with incomplete branching
(Fig. 2.1, right), only one operator is specified and it is executed only if the condition turns
out to be true.
In Pascal, the simplest cases of branching are defined using the if statement, also called
a conditional statement or a branching statement. This operator is somewhat different from
the operators we have met so far. The point is that the if operator is complex: it
contains other operators inside it - including, possibly, another if operator. Let's
start with a simple example: let's write a program that calculates the modulus[133] of the
entered number. As you know, the modulus is equal to the number itself if the number is non-
negative, and for negative numbers the modulus is obtained by changing the sign. For
simplicity we will work with integers. The program will look like this:

program modulo;
var
    x: integer;
begin
    read(x);
    if x > 0 then
        writeln(x)
    else
        writeln(-x)
end.

[133] Actually, Pascal has a built-in function for calculating the modulus, but we will ignore
that fact here.

As you can see, the first step here is to read the number; the read number is placed in the
variable x. Then, if the entered number is strictly greater than zero (the condition x > 0
is met), the number itself is printed; otherwise, the value of the expression -x is printed,
i.e. the number obtained from the initial one by changing the sign.
What is noteworthy here is that the construct

if x > 0 then writeln(x) else writeln(-x)

is entirely a single operator, but complex because it contains the operators writeln(x)
and writeln(-x).
To put it more strictly, the if operator is composed as follows. First, we write the if
keyword, which tells the compiler that our program will now contain a branching construct.
Then we write a condition, which is a so-called logical expression; such expressions are
similar to arithmetic expressions, but the result of evaluating them is not a number but a
logical value (a value of the boolean type), i.e. true or false. In our example, the
logical expression is formed by a comparison operation, denoted ">" ("greater than").
After the condition, we must write the keyword then; the compiler recognizes by it
that our logical expression has ended. Next comes the operator that we want to execute if the
condition is met; in our case, this operator is writeln(x). In principle, the conditional
operator (i.e. the if operator) can end here if we want incomplete branching; but if we
want to make branching complete, we write the keyword else, followed by another
operator specifying the action we want to perform if the condition is false. The syntax of the
if statement can be expressed as follows:
if <condition> then <operator1> [ else <operator2> ]

Square brackets here denote an optional part.
Our modulus calculation can be written with incomplete branching, and even a bit shorter:

program modulo;
var
    x: integer;
begin
    read(x);
    if x < 0 then
        x := -x;
    writeln(x)
end.
Here we have changed the condition in the if statement to "x is strictly less than
zero" and in this case we put into the variable x the number obtained from the old value
of x by changing the sign; if x was non-negative, nothing happens. Then, regardless of
whether the condition is false or true, the writeln operator is executed, which prints what
is finally in the variable x.
Note how the indentation is arranged in our examples. The operators nested inside the if,
i.e. forming part of it, are shifted to the right, relative to what they are nested in, by the
already familiar four spaces; in the style we have chosen this gives a total of eight spaces -
two indentation sizes[134]. In our example, these operators are at the second level of nesting.

[134] Recall that the indentation size you choose can be two spaces, three, four, or exactly one
tab character; see page 242 for an explanation.
Let us note one more important point. Beginners often make a rather typical mistake -
they put a semicolon before else; the program fails to compile after that. The point is that
in Pascal, the semicolon, as we have already mentioned, separates one operator from another;
when the compiler sees the semicolon, it considers that the next operator is over, and in this
case it is an if. Since the word else has no meaning by itself and can appear in the
program only as a part of the if operator, and this operator has already ended from
the compiler's point of view, the word else encountered afterwards leads to an error. Let
us repeat once again: in Pascal programs, no semicolon is placed before else as part of
the if statement!
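To see the error with your own eyes, compare the two fragments below (our illustration;
assume x is an integer variable). The first will not compile, because the semicolon ends
the if operator and the compiler then meets a stray else:

{ wrong: semicolon before else }
if x < 0 then
    x := -x;
else
    writeln(x);

{ right: no semicolon before else }
if x < 0 then
    x := -x
else
    writeln(x);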

2.2.8. Compound operator


In the previous paragraph it was said that the actions performed by the if statement
in case of true or false values of a condition are specified by a single operator (in the example
of the previous paragraph it was the writeln and assignment operators). But what should
we do if we need to perform several actions?
Here is a classic example of such a situation. Suppose we have variables a and b of
some numeric type in our program, and for some reason we need to make sure that the value
in variable a does not exceed the value in b, and if it does, we need to swap the values.
For temporary storage we need a third variable (let it be called t); but to swap the values of
two variables through a third one, we need to make three assignments, whereas the body of
the if can hold only one operator.
The problem is solved by using so-called operator brackets, which in Pascal use the
familiar keywords begin and end. By enclosing an arbitrary sequence of operators in
these "brackets", we turn the whole sequence together with the brackets into one so-called
compound operator. Taking this into account, our problem with the ordering of values in
variables a and b is solved by the following code fragment:

if a > b then
begin
    t := a;
    a := b;
    b := t
end

Let's emphasize again that the whole construction consisting of the words begin, end and
everything between them is a single operator - also, of course, belonging to the "complex"
ones, because it includes other operators.
Pay attention to the layout of code fragments containing a compound operator!
There are three different acceptable ways to arrange the construct shown above; in our
example, we moved begin to the line following the if statement header, but we did
not move it relative to if; as for end, we wrote it in the same column where the construct
that end closes begins.
The second popular way of arranging such a construction differs in that begin is left
on the same line as the if header:

if a > b then begin
    t := a;
    a := b;
    b := t
end

Note that end remains where it was! You may think that it closes if; you may still insist
that it closes begin, but the horizontal position of the word end must in any case
coincide with the position of the line containing the thing (whatever it is) that this end
closes. In other words, end must be indented exactly the same way as the line containing
the thing that end closes. Following this rule allows you to "grasp" the overall structure of
the program with a unfocused eye, which is very important when working with source code.
The third variant is used comparatively rarely: the word begin is shifted onto a separate
nesting level, and what is enclosed in the compound operator is shifted even further. It looks
like this:

if a > b then
    begin
        t := a;
        a := b;
        b := t
    end

We will not recommend the use of this style for a number of reasons, but in principle it is acceptable.

2.2.9. Logical expressions and logical type


Since we have started using logical ("Boolean") expressions, let's try to discuss them in
more detail. So far our examples have used only the comparison operations "greater than"
and "less than", denoted respectively by ">" and "<"; besides these, Pascal provides the
operations "equal" ("="), "not equal" ("<>"), "greater than or equal to" (">="), "less than
or equal to" ("<=") and some others, which we will not consider.
Logical expressions, like arithmetic expressions, are evaluated: if, for example, we have
a variable x of type integer, then the expressions x + 1 and x > 1 differ only
in the type of value: the former is of the same type integer, while the latter is of
type boolean; in other words, while the expression x + 1 can result in an arbitrary integer,
the expression x > 1 has only one of two values, denoted true and false, but this value
- the result of comparison - is evaluated, just like the result of addition.
As we have already mentioned, boolean can act as a variable type, that is, we can
describe a variable that stores a logical value. The mere mention of such a variable is itself a
logical expression, and it can be used, for example, as a condition in an if statement;
variables of this type can be assigned values - of course, logical ones, i.e. if we put a variable
of boolean type to the left of the assignment sign, we have to write a logical expression
on the right.
In particular, we could rewrite the modulus calculation example using a boolean variable
to store an indication of whether we are dealing with a negative number. It would look like
this:

program modulo;
var
    x: integer;
    negative: boolean;
begin
    read(x);
    negative := x < 0;
    if negative then
        x := -x;
    writeln(x)
end.

Here the variable negative after assignment will contain the value true if the
number entered by the user (the value of the variable x) is less than zero, and false
otherwise. After that we use the negative variable as a condition in the if
statement.
The Pascal language allows operations on logical values that correspond to the basic
functions of logic algebra (see §1.3.3). These operations are denoted by the keywords not
(negation), and (logical "and", conjunction), or (logical "or", disjunction) and xor
("excluding or"). For example, we could use the assignment operator flag := not flag
to change the value of the logical variable flag to the opposite value; we can check
whether the integer variable k contains a number written with one digit by using the logical
expression (k >= 0) and (k <= 9). Pay attention to the brackets! The point here
is that in Pascal the priority of logical connectives, including the and operation, is higher
than the priority of comparison operations, just as the priority of multiplication and division
is higher than the priority of addition and subtraction; if you don't put brackets, the
expression k >= 0 and k <= 9 will be "parsed" by the Pascal compiler according
to the priorities as if we wrote k >= (0 and k) <= 9, which will cause a compilation
error.
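Here is a small program of our own that puts this parenthesized condition to work, storing
its value in a boolean variable and printing it:

program digitcheck;
var
    k: integer;
    isdigit: boolean;
begin
    read(k);
    isdigit := (k >= 0) and (k <= 9);
    writeln(isdigit)   { prints TRUE for 0..9, FALSE otherwise }
end.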
Pascal allows you to write any complex logical expressions; for example, if the variable
c is of type char, then the expression
((c >= 'A') and (c <= 'Z')) or ((c >= 'a') and (c <= 'z'))

will let you know if its current value is a Latin letter. Here we could do without brackets
around and, since the priority of and is higher than or anyway, but this is a case
where eliminating redundant brackets would not add any clarity to the expression.
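Wrapped into a complete program (our sketch), the expression might be used like this:

program letter;
var
    c: char;
begin
    read(c);
    if ((c >= 'A') and (c <= 'Z')) or ((c >= 'a') and (c <= 'z')) then
        writeln('a Latin letter')
    else
        writeln('not a Latin letter')
end.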
In the programs of novice programmers you can sometimes find branching and loop
conditions formed by comparing a logical variable with a logical value, something
like

if negative = true then x := -x;

Formally, from the point of view of the Pascal compiler, this is not an error, because logical
values, like any other values, can be checked for equality or inequality to each other.
Nevertheless, you should never write like this; if you find this in your code, it means that you
probably haven't realized what logical expressions and logical values are and what their role
in programming is; it is also possible that you don't fully understand the essence of the
condition in branching and loop operators, because any such condition is a logical expression
and nothing else. If B is a logical variable or some more complex logical expression, then
instead of B = true you should write just B, and instead of B = false you
should use the negation operation, i.e. write not B. If the resulting expression does
not seem very clear after such a substitution, it makes sense to look for a more adequate name
for the logical variable you are using. For example, if your variable is simply called flag,
then writing if flag = true then may seem even more logical and understandable
than if flag then, but if instead of the impersonal word "flag", which can denote
any logical value at all, you use something more meaningful and relevant to the task at hand
- for example, found or exists, if you were looking for something, or some
negative, as in our example above, everything will become much clearer: if found
then looks more natural than if found = true then.

2.2.10. The concept of a loop; the while operator


In programming, a loop is understood as a certain sequence of actions that is executed
(repeated) a number of times in a row during program operation. This sequence of actions
itself, represented by one or several operators, is called the loop body, and each separate
execution of it is called an iteration; we can say that the execution of the whole loop consists
of a certain number of iterations. The loop body may be short or quite long, the number of
iterations may be quite small (two, three, one or even none) or may reach many billions; even
more interesting is the fact that the number of upcoming iterations may be known in advance
when the loop execution starts, or may be determined during the loop execution. There are
even some loops that are executed "indefinitely" - or rather, until the program executing such
a loop is stopped by someone.
Usually, something must change from iteration to iteration in a program when executing
a loop, otherwise the loop will never end; sometimes, however, such an infinite loop is
organized intentionally, but this is rather an exception. In the simplest case, the value of some
variable changes between iterations, and the logical expression defined for a particular loop,
which determines whether to continue or terminate the loop, depends on this variable.

There are three different loop statements in Pascal; the simplest of them is the while
loop. The header of this statement specifies a logical expression that will be evaluated before
executing each iteration of the loop; if the result of the evaluation is false, the execution of
the loop will be terminated immediately; if the result is true, the body of the loop, defined by
a single (possibly compound) statement, will be executed. Since the condition is checked each
time before the body is executed, this loop is called a preconditioned loop. The beginning of
the construct is marked with the while keyword, while the body is separated from
the condition by the do keyword. The syntax of the while statement can be
summarized as follows:
while <condition> do <operator>
Let's say, for example, that we need to display the same "Hello, world!" message on
the screen, but not once, but twenty times. Naturally, this can and must be done with the help
of a loop, and the while loop is quite suitable for this purpose[135]; we just need to

figure out how to set the loop condition and how to change something in the program state so
that the loop is executed exactly twenty times and the condition is false on the twenty-first
time. The easiest way to achieve this is to simply count how many times the loop has already

[135] Looking ahead, we should note that Pascal provides another control construct - the for loop - precisely for those situations when the number of iterations is known exactly when entering the loop.
been executed, and when it has been executed twenty times, do not execute it again. You can
organize such counting by introducing an integer variable to store a number equal to the
number of iterations that have already been executed. Initially we will put zero into this
variable, and at the end of each iteration we will increase it by one; as a result, at each moment
of time this variable will be equal to the number of iterations that have been executed so far.
This variable is often called a loop counter.
In addition to incrementing the variable by one, we need to do one more thing in the body
of the loop, which is actually what the loop is all about, i.e. printing the string. It turns out
that we need two operators in the loop body, and the while statement syntax provides
for only one operator as the loop body; but we already know that this is not a problem - we
just need to combine all the operators we need into one compound operator using the begin
and end operator brackets. Our entire program will look like this:

program hello20;
var
    i: integer;
begin
    i := 0;
    while i < 20 do
    begin
        writeln('Hello, world!');
        i := i + 1
    end
end.

Note that in the body of the loop we have placed the print statement first, and only then the
assignment statement that increments the variable i by one. For this particular task, nothing
would have changed if we had swapped them; however, we should not do so. As practice
shows, it is better - safer in terms of possible errors - to always follow one rather simple
convention: the preparation of variable values for the first iteration of a while loop
should take place just before the loop, and the preparation of values for the next iteration
should be located at the very end of the loop body. In this case, the preparation for the first
iteration consists in assigning zero to the loop counter, and the preparation for the next
iteration consists in incrementing the counter by one; we put the i := 0 operator before
the while loop itself, and the i := i + 1 operator last in its body.
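Schematically, this convention can be summarized by the following skeleton (a sketch of ours, not a fragment of the original program; N stands for the desired number of iterations):

i := 0;                    { preparing for the first iteration }
while i < N do
begin
    { ... the useful work of one iteration ... }
    i := i + 1             { preparing for the next iteration }
end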
We can approach this question in another way. Since the variable i stores the number of
lines printed so far (zero at first, then one more each time), it is quite logical to print the next
line first, and only then take this fact into account by increasing the variable by one.
The value of the loop counter can be used not only in the condition, as we did in the
hello20 program, but also in the body of the loop. Suppose we are interested in the squares
of integers from 1 to 100; we can print them, for example, like this:

program square100;
var
    i: integer;
begin
    i := 1;
    while i <= 100 do
    begin
        writeln(i * i);
        i := i + 1
    end
end.

It will not be very convenient to use the result of this program, because it will place each number on
a separate line. We can improve it by replacing writeln with write; but if we do so without
taking any additional steps, i.e., simply remove the letters ln and run the program as it is, the result
may be quite disconcerting:

avst@host:~/work$ ./square100
14916253649648110012114416919622525628932436140044148452957662567672978484190096
11024108911561225129613691444152116001681176418491936202521162209230424012500260
12704280929163025313632493364348136003721384439694096422543564489462447614900504
15184532954765625577659296084624164006561672468897056722573967569774479218100828
18464864988369025921694099604980110000avst@host:~/work$

The point is that the write operator fulfills our will literally: if we demanded to print a number,
it will print the digits that make up the decimal notation of this number, and nothing else - no
spaces or any other separators. If we look carefully at the output, we can see that the digits that make
up the decimal notation of the numbers 1, 4, 9, 16, etc. have not gone anywhere, just that the
numbers are not separated from each other.
It is very easy to solve this problem: we just tell the write operator to print a space character after each number. In addition, at the end of the program, i.e. after the loop, it is desirable to add a writeln operator, so that the program outputs a line break before terminating and the command line prompt appears on a new line instead of being merged with the printed numbers. The whole program will look like this:

program square100;
var
i: integer;
begin
i := 1;
while i <= 100 do
begin
write(i * i, ' ');
i := i + 1
end;
writeln
end.

Let us now consider an example of such a loop for which we do not know the number of
iterations in advance. Suppose we are writing a program that at some point must ask the user
his year of birth, and we need to check whether the entered number can really represent the
year of birth. Let's assume that the user's year of birth cannot be less than 1900[136]. In addition,

we will assume that the user's year of birth cannot be greater than 2020, since one-year-old
children do not know how to use a computer; if enough time has passed by the time you are
reading this book, you can adjust these values yourself.
Either way, we need to ask the user to enter their year of birth; if we are not satisfied with

[136] At the time this book was written in 2016, there were only two people left on Earth about whom it was reliably known that they were born before 1900; when the second edition was being prepared for publication in 2021, there were sadly no such people left.
the entry, we need to tell the user that they appear to have made a mistake and ask them to
repeat the entry. In the descriptions section, we can provide a year variable:

var
year: integer;

As for the dialog with the user itself, it can be implemented as follows:

write('Please type in your birth year: ');
readln(year);
while (year < 1900) or (year > 2020) do
begin
    writeln(year, ' is not a valid year!');
    write('Please try again: ');
    readln(year)
end;
writeln('The year ', year, ' is accepted. Thank you!')

With such a program, for example, the following dialogue can take place:

Please type in your birth year: 1755
1755 is not a valid year!
Please try again: -500
-500 is not a valid year!
Please try again: 2050
2050 is not a valid year!
Please try again: 1974
The year 1974 is accepted. Thank you!

Note that the loop may not run even once if the user immediately enters a normal year; on the
other hand, the user may be stubborn, so strictly speaking we cannot know what the maximum
number of iterations of our loop is. We can assume that, say, a billion iterations is still too
many for the user's patience, but what exactly is the upper limit? A hundred? A thousand?
Assuming that a user can still be patient for 1000 iterations but not for 1001 looks rather
ridiculous, and any other specific number will look equally ridiculous in this role; it is easier
not to make any assumptions at all.

Fig. 2.2. Block diagrams of loops with precondition and postcondition

2.2.11. Loop with postcondition; the repeat operator


In the while statement, which was the subject of the previous paragraph, the
condition is checked first, and only then, perhaps, the first iteration is executed. As mentioned
above, such constructs in programming are called loops with a precondition.
In addition, when writing programs, postconditioned loops are sometimes used, in which
the loop body is executed first, and only then is it checked whether it should be executed
again; thus, in postconditioned loops, the body is executed at least once. The Pascal language
provides a special operator for loops with a postcondition, which is defined by the keywords
repeat and until; the operators that make up the body of the loop are written between
these words, and there can be as many such operators as you want, the use of operator brackets
is not required here; after the word until comes the condition for exiting the loop - a logical expression whose false value indicates that the loop must continue, and whose true value indicates that it is time to terminate the loop. Block diagrams of loops with precondition and postcondition are shown in Fig. 2.2.
For example, if for some reason we didn't need a dialog with the user, as in the example
on page 269, but just needed to enter integers from the keyboard until the next integer fell
within the 1900 to 2020 range, we could do it this way:

repeat
readln(year)
until (year >= 1900) and (year <= 2020)

Let's look at another example. The following loop inputs numbers from the keyboard and
adds them until the total sum is greater than 1000:
sum := 0;
repeat
    readln(x);
    sum := sum + x
until sum > 1000

The syntax of the repeat operator can be represented as follows:
repeat <operators> until <condition>
Usually the repeat operator is encountered in programs much more rarely than the while operator, but you should know about its existence anyway - sooner or later this construct will come in handy.
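By the way, a repeat loop can always be imitated with while at the cost of duplicating part of the body; the sketch below (ours, assuming sum and x are integer variables) rewrites the summation example this way and shows why repeat is the more natural choice here:

sum := 0;
readln(x);         { the first iteration has to be written out separately, }
sum := sum + x;    { because while checks its condition before the body }
while sum <= 1000 do
begin
    readln(x);
    sum := sum + x
end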

2.2.12. Arithmetic loops and the for operator


Let us return to the examples given at the beginning of §2.2.10; recall that there we wrote
loops to print the same inscription twenty times and to print the squares of all integers from
one to one hundred. The loops we wrote to solve these simple problems have one very
important property: at the moment we enter the loop, we know exactly how many times it will
be executed, and the right number of iterations is provided by counting them in an integer
variable. Such loops are called arithmetic loops.
Arithmetic loops are so common that many programming languages, including Pascal,
have a special operator for them; in Pascal, this operator is called for. As is often the case,
it is easier to talk about it if we first give an example and then explain it, so we will start by
rewriting the programs hello20 and square100 using the arithmetic loop operator.
Let's start with the first one:

program hello20for;
var
i: integer;
begin
for i := 1 to 20 do
writeln('Hello, world!')
end.

The construct for i := 1 to 20 do means that the variable i will be
used as a loop variable, its initial value will be 1 and its final value will be 20, i.e. it
must run through all values from 1 to 20, and for each such value the loop body will be
executed. Since there are 20 such values, the body will be executed twenty times, which is
what we need. If you compare the resulting program with the one we wrote on page 266, you
will notice that its text is much more compact; moreover, for a person who is already used
to the for syntax, this version is much easier to understand.
Let's now rewrite the square100 program, taking as a basis the variant that prints
numbers over a space. Using a for loop, the same effect can be achieved in the following
way:

program square100_for;
var
    i: integer;
begin
    for i := 1 to 100 do
        write(i * i, ' ');
    writeln
end.

As you can see, you can use the value of the loop variable in the for loop body. It should
be noted at once that this value can be accessed, but it should never be changed. Changing
the loop variable during loop execution is the prerogative of the for operator itself, and
attempts to interfere with its operation may lead to unpredictable consequences. Besides, there
is one more restriction related to the loop variable: after the for loop is completed,
the value of the loop variable is considered undefined, i.e. we should not assume that this
variable will be equal to a specific number. Of course, some value will be there, but it may
depend on the compiler version and even on the place where the loop is encountered in the
program. Simply put, the compiler's creators do not pay any attention to what value to leave
in the loop variable after the loop is finished and can leave anything there.
In both of our examples, the loop variable changed in the upward direction, but you can
make it run values in the opposite direction, from larger to smaller. To do this, the word to
is replaced by the word downto. For example, the program

program countdown;
var
    i: integer;
begin
    for i := 10 downto 1 do
        write(i, '... ');
    writeln('start!')
end.

prints a string

10... 9... 8... 7... 6... 5... 4... 3... 2... 1... start!

Formally, the syntax of the for operator can be represented as follows:


for <var> := <start> to|downto <end> do <operator>

Here <var> is the name of an integer variable, <start> is the initial value, and <end> is the final value.
Both of these values can be set not only by explicitly written numbers, as in our examples,
but also by arbitrary expressions, as long as the result is an integer. The expressions will be
evaluated once before the loop is executed, so if they include variables and the values of
these variables change during the loop execution, the loop itself will not be affected.
If after calculating the initial and final values, it turns out that the final value is less (and
for downto - on the contrary, it is more) than the initial value, it is not an error in itself: the
loop will not be executed even once, and in some cases this property can be used.
In fact, the loop variable used in the for statement can be more than just an integer type;
a little later we will introduce the concept of an ordinal type, which includes all integer types, as well
as ranges, character type, enumerated types, and logical type; the for statement can be used
with any such type. Of course, the type of the expressions that specify the start and end values must
match the type of the loop variable.
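For example, since char is an ordinal type, a variable of type char may serve as the loop variable; the following little program (our own sketch) prints the lowercase Latin alphabet:

program alphabet;
var
    c: char;
begin
    for c := 'a' to 'z' do
        write(c);
    writeln
end.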
2.2.13. Nested loops
Consider the following task. We need to print a "slash" of the full screen size consisting
of the characters "*". The result should look like this:
*
 *
  *
   *
    *
     *
      *
       *

(to save space, we have shown only eight lines, although there should be 24). To do this, in
principle, is very simple: we need to print 24 lines, and in each line we first print a certain
number of spaces, and then print an "asterisk" and translate the line. In the very first line we
don't print any spaces at all, we print an asterisk right away; we can consider that we print
zero spaces. Each subsequent line prints one more space than the previous line. If we consider
that the line numbers are from 1 to 24, it is easy to see that the line with the number n should
have n - 1 spaces.
It is clear that we should output strings in a loop, one iteration per line. Since at the
moment of entering the loop we know exactly how many iterations there will be, we should
use the for loop. Let's call the loop variable n, its value will correspond to the line
number. The loop[19] should look like this:

for n := 1 to 24 do
begin
{ print the required number of spaces }
writeln('*')
end

It remains to understand how to organize printing of the required number of spaces. We already know how many spaces to print: n - 1. In other words, we need to replace the
mysterious "{ print the required number of spaces }" in our code with
something that will print n - 1 spaces. As it is easy to guess, we need a loop for this too,
and an arithmetic one too: we know at the moment of its start how many iterations there
should be. This is how we come to the concept of nested loops.
It is clear that another loop variable must be used in the nested loop so that the loops do
not conflict; after all, until the outer loop has completed, no one can change its variable,
including the inner loop. Having described the variable m for the inner loop, we get the
following program:

program StarSlash;
var
    n, m: integer;
begin
    for n := 1 to 24 do
    begin
        for m := 1 to n - 1 do
            write(' ');
        writeln('*')
    end
end.

[19] The curly brackets in Pascal denote a comment, i.e. a text fragment that is intended exclusively for the human reader and is completely ignored by the compiler. Hereafter, we will sometimes write comments in Russian. In a textbook such liberty is acceptable, but in real programs you should never do this: firstly, Cyrillic characters are not included in ASCII; secondly, the world language of communication among programmers is English. If comments are written in a program at all, they should be written in English, and if possible without errors; otherwise it is better not to write them at all.

Let's consider a slightly more complicated task - to display a figure of approximately this
kind:

   *
  * *
 *   *
*     *
 *   *
  * *
   *

This figure is often called a "diamond". This time we will read the height of the figure from
the keyboard, i.e. we will ask the user to tell us how high he wants to see the "diamond".
Further analysis will be required.
First of all, we note that the height of our figure is always an odd number, so if the user
enters an even number, we will have to ask him to repeat the input; the same should probably
be done if the entered number is negative. It is known that an odd number can be represented as 2n+1, where n is an integer; the top of our figure will consist of n+1 rows. For the figure shown above, the height is seven lines, and n will be three.
Now we need to figure out how many spaces and where to print to get the shape we are
looking for. Note that when printing the very first line, we have to print n spaces, when
printing the second line - n - 1 space, and so on; when printing the last (n + 1)'th line, no
spaces are needed at all (we can consider that we print zero spaces).
The situation with spaces after the first asterisk is a bit more complicated. The first line
doesn't need any such spaces at all, there is only one asterisk there; but further on there is a
rather interesting process: in the second line you need to print one space (and after it a second
asterisk), in the third line - three spaces, in the fourth - five, and so on, each time two spaces
more. It is not difficult to guess that for the line number k (k > 1) the required number of
spaces is expressed by the formula 1 + 2(k - 2) = 2k - 3. It is interesting that with some stretch
we can consider this formula "true" also for the case k = 1, where it gives -1: if the operation
"print t spaces" is further defined for negative t as "return back to the corresponding number
of characters", it turns out that, having printed the first asterisk, we will have to go back one
position and print the second asterisk exactly on top of the first one. However, it is much more
difficult to do this than to simply check the value of k, and if it is equal to one, then after the
first asterisk we should not print any more spaces or asterisks, but translate the line at once.
The complete printing of a line with the number k should look like this: first we print n + 1 - k spaces, then an asterisk; after that, if k equals one, we simply issue a line feed and consider the line finished; otherwise we print 2k - 3 spaces, an asterisk, and only after that we do a line feed. We need to do all this for k from 1 to n + 1, where n is the "half-height" of our "diamond".
Once the top of the figure has been printed, we need to somehow output the bottom of
the figure as well. We could continue numbering the lines and derive formulas for the number
of spaces in each line numbered n + 1 < k ≤ 2n + 1, which is in principle not that difficult;
however, we can do it even simpler by noticing that the lines we print now are exactly the
same as in the upper part of the figure, i.e., we first print the same line as n-th, then the same
as (n - 1)-th, and so on. The simplest way is to perform exactly the same printing procedure
for each line as described in the paragraph above, only this time the line numbers run through all values from n down to 1, i.e. in the reverse direction.
Our program will consist of three main parts: entering a number that means the height of
the figure, printing the top part of the figure, printing the bottom part. Here is its text (recall
that the word div denotes integer division and mod the remainder of a division):

program diamond; { diamond.pas }
var
    n, k, h, i: integer;
begin
    { enter a number until the user enters it properly }
    repeat
        write('Enter the diamond''s height (positive odd): ');
        readln(h)
    until (h > 0) and (h mod 2 = 1);
    n := h div 2;
    { print the upper part of the figure }
    for k := 1 to n + 1 do
    begin
        for i := 1 to n + 1 - k do
            write(' ');
        write('*');
        if k > 1 then
        begin
            for i := 1 to 2*k - 3 do
                write(' ');
            write('*')
        end;
        writeln
    end;
    { print the bottom part }
    for k := n downto 1 do
    begin
        for i := 1 to n + 1 - k do
            write(' ');
        write('*');
        if k > 1 then
        begin
            for i := 1 to 2*k - 3 do
                write(' ');
            write('*')
        end;
        writeln
    end
end.

It is easy to notice a very serious drawback of our program: the loops for drawing the upper
and lower parts of the figure differ only in the header, while the bodies in them are exactly
the same. In general, programmers consider this unacceptable: if we have two or more copies of the same code in our program, then, should we want to correct one of these fragments (for example, if we find an error in it), we will most likely have to correct them all; this leads to unproductive labor costs (which is only half the trouble) and provokes errors, because we may edit some of the copies and forget the others. But we can deal with this problem only by
studying the so-called subroutines, which will be the subject of the next chapter.

2.2.14. Bitwise operations


Before going any further, let's try to complete our discussion of arithmetic expressions;
it would remain incomplete without bitwise operations, writing integers in number systems
other than decimal, and named constants.
Bitwise operations are performed on integers of the same type, but the numbers are not
treated as numbers themselves, but as strings of individual bits (i.e., binary digits) that make
up the machine representation. For example, if a bitwise operation involves a number 75 of
type integer, it means the bit string 0000000001001011.

Bitwise operations can be divided into two types: logical operations performed on
individual bits (all at the same time) and shifts. Thus, the not operation applied to an
integer (as opposed to the familiar not operation applied to a value of the boolean type)
results in a number whose bits are all opposite to the initial one. For example, if the variables
x and y are of type integer, after executing the operators

x := 75;
y := not x;

the variable y will contain the number -76, the machine representation of which
1111111110110100 is a bitwise inversion of the above representation of the number
75. Note that if x and y were of word type, i.e. unsigned type of the same digit
capacity, the result in the y variable would be 65460; the machine representation of this
number as a 16-bit unsigned number is the same as that of the number -76 as a 16-bit
signed number.
The and, or, and xor operations already familiar to us from §2.2.9 work in a
similar way on integers. All these operations are binary, that is, they require two operands;
when we apply them to integers, the corresponding logical operations ("and", "or", "excluding
or") are performed simultaneously on the first bits of the operands, on their second bits, and
so on; the results (also separate bits) are concatenated into an integer of the same type and,
consequently, of the same digit capacity, which becomes the result of the whole operation.
For example, an eight-bit unsigned representation (i.e., a representation of the byte type)
for the numbers 42 and 166 would be 00101010 and 10100110, respectively; if
we have variables x, y, p, q, and r of the byte type, then after the assignments of

x := 42;
y := 166;
p := x and y;
q := x or y;
r := x xor y;

variables p, q, and r will get the values 34 (00100010), 174 (10101110), and 140 (10001100), respectively.
Bitwise shift operations, as the name implies, shift a bit representation a certain number
of positions to the left (shl, from the words shift left) or to the right (shr, shift right).
Both operations are binary, that is, they involve two operands; to the left of the operation
name is the original integer, to the right is the number of positions to shift its bitwise machine
representation. When shifting to the left by k positions, the higher k bits of the machine
representation of the number disappear, and to the right (i.e. as the lower bits) the zero bits
are added. A shift to the left by k positions is equivalent to multiplying the number by 2^k. For example, the result of the expression 1 shl 5 will be the number 32, and the result of 21 shl 3 will be 168.
When shifting to the right, the low-order bits disappear, and the zero bits are added to the
left. For unsigned integers this is equivalent to division by a power of two with discarding the
remainder, and the same is true for positive numbers, even represented as signed numbers,
but when shifting negative numbers to the right, the equivalence to division fails; it is
understandable - if you remember how signed integers are represented in the computer (see
page 206), it is obvious that the result of any shift to the right will be a positive number,
because the sign bit will contain zero. The built-in functions SarShortint, SarSmallint, SarLongint and SarInt64 make it possible to remedy the situation; we leave them for the reader to study on their own.
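A throwaway program (ours) can be used to check the values given in this paragraph; the numbers in the comments follow from the bit strings discussed above:

program bitwisedemo;
var
    x, y: byte;
begin
    x := 42;              { 00101010 }
    y := 166;             { 10100110 }
    writeln(x and y);     { 34, i.e. 00100010 }
    writeln(x or y);      { 174, i.e. 10101110 }
    writeln(x xor y);     { 140, i.e. 10001100 }
    writeln(1 shl 5);     { 32 }
    writeln(168 shr 3)    { 21, since 168 div 8 = 21 }
end.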

2.2.15. Named constants


The word "constant" originally refers to an expression whose value is always the same.
A trivial example of a constant is a literal, such as a simple number written explicitly. For
example, "37.0" is a literal that represents an expression of type real; obviously, the value
of this expression will always be the same, namely 37.0; hence, it is a constant. We can give
a more complex example of a constant: the expression "6*7". This is no longer a literal, it is
an arithmetic expression, and there are two literals here - the numbers 6 and 7; nevertheless,
the value of this expression is always the same, so this is an example of a constant.
Among all constants, compile-time constants are those whose value is determined by the
compiler while our program is being processed. All literals belong to such constants, which
is quite natural; besides, during compilation, the compiler can evaluate arithmetic expressions
that do not contain references to variables and functions. Therefore, "6*7" is also a compile-
time constant; the compiler itself calculates that the value here is always 42, and it is the
number 42 that it places in the machine code; neither sixes, sevens, nor the
multiplication operation will appear in the code.
In addition to compile-time constants, there are also run-time constants, which are
expressions that supposedly always have the same value, but the compiler cannot compute
this value at compile time for some reason, so it only becomes known at run time.
The exact boundary between these types of constants depends on the compiler
implementation; for example, Free Pascal can compile-time calculate sines, cosines, square
roots, logarithms, and exponents, although it is not required to do all this, and other versions
of Pascal do not.
However, you can also find limitations for Free Pascal: for example, string functions, even if
called with constant literals as parameters, are not evaluated at compile time. Thus, the following
fragment will cause a compile-time error, even though the copy function is just as built into the
Pascal compiler as the math functions like sine and logarithm mentioned above:

const
hello = 'Hello world!';
part = copy(hello, 3, 7);

We will discuss the copy function and other tools for working with strings in §2.6.11.
The mechanism of named constants allows you to associate a name, i.e. an identifier,
with a certain constant value (compile-time constant) and use this identifier instead of the
value written explicitly throughout the program text. This is done in the constants description
section, which can be placed anywhere between the header and the beginning of the main part
of the program, but usually programmers place the constants section as close to the beginning
of the file as possible - for example, right after the program header. The point here is that the
values of some constants can be (and are) the most frequently changed part of the program,
and placing constants at the very beginning of the program saves time and intellectual effort
when editing them.
For example, consider the program hello20for (see page 271); it outputs the
message "Hello, world!" "to the screen" (to the standard output stream) and does it 20
times. This task can be generalized in an obvious way: the program outputs a given message
a given number of times. We know from the school physics course that it is best to solve
almost any problem in a general form, and to substitute specific values at the very end, when
the general solution has already been obtained. The same way can be done in programming.
In fact, what will change in the program if we want to change the message being output? And
if we want to output the message not 20 times, but 27? The answer is obvious: only the
corresponding literal constants will change. In such a short program as hello20for, of
course, it is not difficult to find these literals; but if the program consists of at least five
hundred lines? Five thousand? And this is far from the limit: in the largest and most complex
computer programs, the number of lines is in the tens of millions.

In this case, the constants defined in the code, on which the program execution depends,
are sufficiently arbitrary: this is clearly indicated by the fact that the problem is obviously
generalized to arbitrary values. It is logical to expect that we might want to change the values
of constants without changing anything else in the program; in this sense, constants are like
knobs on a variety of technical devices. Named constants make such "tuning" easier: if
without their use literals are scattered all over the code, then by giving each constant its own
name, we can collect all the "tuning parameters" at the beginning of the program text,
annotating them if necessary. For example, instead of the program hello20for we can
write the following program:
program MessageN; { message_n.pas }
const
    message = 'Hello, world!';  { what to print }
    count = 20;                 { how many times }
var
    i: integer;
begin
    for i := 1 to count do
        writeln(message)
end.

As you can see, the constant descriptions section consists of the const keyword followed
by one or more constant descriptions; each such description consists of the name (identifier)
of the new constant, an equal sign, an expression specifying the value of the constant (this
expression itself must be a compile-time constant), and a semicolon. From the moment the
compiler processes such a description, the identifier introduced by this description will be
replaced by the constant value associated with it in further program text. The constant name
itself is, quite naturally, also considered a compile-time constant; as we will see later (e.g.,
when we study arrays), this fact is quite important.
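As a small preview (arrays are discussed later, so the fragment below is only an illustrative sketch of ours, with invented names), a named constant may appear wherever the compiler itself must know the value, for example as an array bound:

const
    MaxSize = 100;
var
    data: array [1..MaxSize] of integer;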
The usefulness of named constants is not limited to facilitating "customization" of the
program. For example, it often happens that one and the same constant value occurs in several
different places in the program, and it makes sense that if you change it in one of the places,
you should also (synchronously) change all the other places where the same constant occurs.
For example, if we are writing a program that controls a storage chamber of individual
automated cells, we will certainly need to know how many cells we have. This number will be involved, for example, in counting the number of free cells, in all sorts of user interface elements where we need to select one cell from all those available, and much more. It is clear that the number
meaning the total number of cells will occur time and again in different parts of the program.
If now the engineers suddenly decide to design the same storage chamber with a few more
cells, we will have to go through our whole program looking for the cursed number that must
be changed everywhere. It is easy to guess that such things are an inexhaustible source of
errors: if, say, one and the same number occurs thirty times in a program, we can be sure that
from the first look through we will "catch" only twenty such occurrences and miss the rest.
The situation becomes more complicated if there are two different, independent of each
other parameters in the program that happen to be equal to the same number; for example,
we have 26 cells of a storage room, and we also have a check printer that has 26 characters in
its line, and both numbers occur directly in the program text. If one of these parameters has
to be changed, we can be sure that we will not only miss some of the occurrences of the
required parameter, but we will also change the parameter that did not need to be changed
once or twice.
It is quite different if the number of cells of our storage box is explicitly mentioned
only once in the program - at the very beginning of the program, and then the name of a
constant is used throughout the text, for example, LockerBoxCount or something like
that. It is very easy to change the value of such a parameter, because the value itself is written
in exactly one place in the program; the risk of changing something wrong also disappears.
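In the situation just described, the beginning of the program might contain something like the following (the second name is our own invented illustration); the two parameters now remain independent even though their values coincide:

const
    LockerBoxCount = 26;      { number of cells in the storage chamber }
    PrinterLineWidth = 26;    { characters per line of the check printer }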
There is one more very important advantage of named constants: it is much easier to
understand a program created using them. Put yourself, for example, in the place of a person
who somewhere in the wilds of a long (say, several thousand lines) program stumbles upon
the number 80, written just like that, in digits in an explicit form - and, of course, without
comments. What is this "80", what does it correspond to, where did it come from? Maybe it
is the age of the program author's grandfather? Or the number of floors in a skyscraper on
New York's Broadway? Or the maximum number of characters allowed in a line of text
displayed on the screen? Or the room temperature in degrees Fahrenheit?
After spending a considerable amount of time, the reader of such a program may notice
that 80 is part of the network address, the so-called port, when establishing a connection
with some remote server; remembering that the port with this number is usually used for web
servers, it will be possible to guess that the program is trying to get something from
somewhere via HTTP protocol (and, by the way, it is not a fact that the guess will be correct).
How long will it take to analyze this? A minute? Ten minutes? An hour? It depends, of course,
on the complexity of a particular program; but if the DefaultHttpPortNumber
identifier had been used instead of the number 80 in the program, there would have been no
need to waste time at all.
In most cases, the rules of program code design simply forbid numbers written in explicit
form to appear in the program (outside the constants description section), except for 0, 1
and (sometimes) -1; all other numbers must be named. In some organizations,
programmers are prohibited to use not only numbers but also strings in the depths of the
program code, i.e. all string literals needed in the program must be put in the beginning and
named, and these names must be used in the further text.
The constants we have considered are called untyped constants in Pascal, because their type
is not specified in their description; it is inferred when they are used. In addition to them, Pascal (at
least its dialects related to the famous Turbo Pascal, including our Free Pascal) also provides typed
constants, for which the type is explicitly specified in the description. Unlike untyped constants, typed
constants are not compile-time constants; moreover, in the compiler's default mode, the values of
such constants are allowed to be changed at runtime, making the use of the name "constant"
questionable. We leave the history of the origin of this strange entity out of our book; the interested
reader can easily find the relevant materials on his own. Typed constants are not needed in our
course and we will not consider them. Anyway, in case you come across examples of programs that
use typed constants, something like

const
message: string = 'Hello, world!';
count: integer = 20;
remember that this is not the same as constants without type indication, and is similar in behavior
to an initialized variable rather than a constant.

2.2.16. Different ways of writing numbers


So far we have dealt mainly with whole numbers written in the decimal number system,
and when we had to work with fractional numbers, we wrote them in the simplest form - as an ordinary decimal fraction, in which the role of the decimal comma is played by the period character.
Modern versions of the Pascal language allow you to write integers in hexadecimal
notation, using the "$" symbol and a sequence of hexadecimal digits, with both upper- and
lowercase Latin letters for digits greater than nine. For example, $1A7 or $1a7 is the same
as 423.
Free Pascal also supports binary and octal literals. Octal constants begin with the "&" symbol, while binary constants begin with the "%" symbol. For example, the number 423 can also be written as %110100111 or as &647. Other versions of Pascal don't support this way of writing numbers; Turbo Pascal didn't have it either.
As for floating-point numbers, they are always written in the decimal system, but even
here there is a form of notation that differs from the usual one. We have already encountered
the so-called scientific notation (see page 246) when we printed floating-point numbers;
recall that what is printed is the mantissa, that is, a number m satisfying the condition 1 ≤ m < 10, followed by the letter "E" and an integer denoting the order (the power of 10 by which the mantissa should be multiplied).
similar way. For example, 7E3 is the same as 7000.0, and 2.5E-5 is the
same as 0.000025.
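The following throwaway program (ours; it assumes Free Pascal because of the binary and octal literals) prints the literal forms mentioned in this paragraph, so that you can verify they denote the same values:

program literals;
begin
    writeln($1A7);          { 423, hexadecimal }
    writeln(%110100111);    { 423, binary (Free Pascal only) }
    writeln(&647);          { 423, octal (Free Pascal only) }
    writeln(7E3);           { the value 7000.0 }
    writeln(2.5E-5)         { the value 0.000025 }
end.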

2.3. Subprograms
By the word "subprogram" programmers call a separate (i.e., having its own beginning,
its own end, and even its own variables) part of a program intended for solving some part of
a task. Almost any part of the main program or another subroutine can be allocated to a
subroutine; in the place where there used to be code that now appears in the subroutine, we
write the so-called subroutine call, consisting of its name and (in most cases) a list of
parameters passed to it. Through the parameters we can pass to the subprogram any
information it needs for its work.
Putting code parts into subroutines allows you to avoid code duplication, because the
same subroutine can be called as many times as you like from different parts of the program
once written. By specifying different parameter values when calling it, you can adapt the same
subroutine to solve a whole family of similar problems, saving even more on the amount of
code you have to write.
Experienced programmers know that saving code size is not the only reason for using
subroutines. Very often
in programs there are such subroutines that are called only once, which, of course, not only
does not reduce the size of the written code, but even on the contrary - increases it, because
the design of the subroutine itself requires writing several extra lines. Such "one-time"
subroutines are written to reduce the complexity of human perception of the program. By
correctly separating code fragments into subroutines and replacing them with their names in
the main program, we allow the reader of our program (and, by the way, mostly ourselves)
not to think about minor details when working with the main program.
Pascal provides two types of subroutines: procedures and functions. A procedure can
contain almost any set of actions, but it is important to make sure that these actions are
somehow related to each other, otherwise such a procedure will be of little use. Launching a
procedure is a separate operator in the program, which is called the procedure call operator.
As for functions, their main task is to calculate a certain value (by formula or otherwise), and
they are called from arithmetic expressions, and their calls are also arithmetic expressions.
As we will soon see, procedures and functions look very similar. Originally in Pascal, functions
could only be called from expressions, but the creators of Turbo Pascal removed this "inconvenient"
restriction, so that functions can be used instead of procedures. In some other programming
languages there are only procedures or (much more often) only functions; you may encounter the
statement that dividing subroutines into two types is redundant and meaningless.
In reality, the absence of division into procedures and functions distorts the perception of the
most important phenomenon in programming - side effects, provokes their thoughtless application
and eventually cripples the programmer's thinking, which has already been mentioned in the prefaces
(see page 31). We will come back to this question many times.

2.3.1. Procedures
Subroutines, whether procedures or functions, are described in the program between its
header and the main program, i.e. in the familiar description section. To create a first
impression of the subject, let's take a very simple, albeit strange, example: let's rewrite our
very first program hello (see page 234), putting the only action it contains into a procedure.
It will look like this:

program HelloProc;

procedure SayHello;

begin
writeln('Hello, world!')
end;

begin
SayHello
end.

This program first describes a procedure named SayHello, which does all the work, and
the main program consists of a single action - calling this procedure. During the execution of
the program, a subroutine call works as follows. The computer remembers the address[137] of the memory location where the call instruction was encountered in the program, and then
transfers control to the called subroutine, i.e. proceeds to execution of its machine code. When
the subroutine is terminated, the return address memorized before its call is used to return
control to the place where the call originated, or, more precisely, to the next instruction after
the call.
In general, the structure of the procedure text is very similar to the structure of the whole
program: it consists of a header, a section of local descriptions (in our elementary example,
this section is not present, or rather, it is empty, but in the next example we will see how it
looks like when it contains something) and an analog of the "main part", the so-called body,
which looks exactly the same - it starts with the word begin, ends with the word end, and

[137] The return address of a subroutine is stored in the so-called hardware (machine) stack; we will return to
this in the next part of our book, which is devoted to computer architecture and assembly language.
contains operators inside. There are, however, some differences: the text of the procedure
ends with a semicolon instead of a dot, and the header may contain a list of formal parameters
(the SayHello procedure does not have this list, but it happens relatively rarely).
Our example illustrates what a subroutine is, but it does not show why it is needed:
compared to the original program with which we began our acquaintance with Pascal, the
new program is almost twice as long and much less understandable, so that it seems as if we
have gained nothing, but on the contrary, lost. This is true, but only for the reason that we
have put too simple an action into the procedure.
Now let's return to the diamond program from the previous paragraph (see page 276)
and try to make it more understandable. Let's first note that the program contains several times
a loop to print the required number of spaces; let's put this action into a procedure. Different programming languages handle subroutine parameters differently; in Pascal, they are written in parentheses immediately after the subroutine
name in a list, very similar to the list of variable descriptions. In our case, the parameter will
be just one - the number of spaces to be printed. We'll call the procedure PrintSpaces
and the parameter count; the procedure header will look like this:

procedure PrintSpaces(count: integer);

Now let's remember that to print a given number of spaces we need a for loop, and in
it we need a variable of integer type as a loop counter. We can say for sure about this
variable that it does not concern anyone and nothing outside of our procedure, in other words,
it is such a detail of the procedure implementation that we do not need to know, unless we
write and edit our procedure itself. Most existing programming languages, including Pascal,
allow in such cases to describe local variables that are accessible (and generally visible) only
inside a single subroutine. For this purpose, the subprogram has its own description section,
which, like the description section of the main program, is located between the header and the
word begin. Our entire procedure will look like this:

procedure PrintSpaces(count: integer);
var
    i: integer;
begin
    for i := 1 to count do
        write(' ')
end;

Let us emphasize that the names i and count in this procedure are local, i.e. they do
not affect the rest of the program: we can, for example, describe variables (and not only
variables) with the same names in any place, and perhaps of other types, and it will not lead
to any bad consequences.
Having described the procedure, we can now call it anywhere in the program by writing
something like PrintSpaces(k), and k spaces will be printed. The name of the
procedure, with a list of parameters enclosed in parentheses if necessary, is, in fact, the
procedure call statement.
Before rewriting the diamond program, let's recall the remark we made right after
writing it - that the bodies of two loops in this program turned out to be exactly the same and
that one should not do so, but that the problem could not be handled without subroutines.
Now we have subroutines at our disposal, so let's fix this drawback at the same time. Recall
that in the bodies of both loops we printed the next line of a figure with "half-height" n, and did it in the following way: first print n + 1 - k spaces, then an asterisk; then, if k > 1, print 2k - 3 spaces and another asterisk; and finally, issue a line feed. As we can see, to perform these steps we need to know two values: k and n, and this is enough to print the desired line without paying any attention to what is going on around it, including which of the two loops (i.e. which phase of printing the figure) the program is currently in.
We will print a separate line of our figure in a procedure called
PrintLineOfDiamond; we will pass the numbers k and n to the procedure through
parameters. After replacing the bodies of both loops with calls of this procedure, the main
program will become quite short; the whole program will look like this:

program DiamondProc; { diamondp.pas }

procedure PrintSpaces(count: integer);
var
    i: integer;
begin
    for i := 1 to count do
        write(' ')
end;

procedure PrintLineOfDiamond(k, n: integer);
begin
    PrintSpaces(n + 1 - k);
    write('*');
    if k > 1 then
    begin
        PrintSpaces(2*k - 3);
        write('*')
    end;
    writeln
end;

var
    n, k, h: integer;
begin
    repeat
        write('Enter the diamond''s height (positive odd): ');
        readln(h)
    until (h > 0) and (h mod 2 = 1);
    n := h div 2;
    for k := 1 to n + 1 do
        PrintLineOfDiamond(k, n);
    for k := n downto 1 do
        PrintLineOfDiamond(k, n)
end.
Despite the abundance of service lines (when describing each procedure we spend at least
three extra lines - on the header, on the word begin and on the word end) and empty lines,
which we insert between subroutines for clarity, the new version of the program still turned
out to be four lines shorter than the previous one, and if we compare the length of their texts
in bytes, it turns out that we saved almost a quarter of the volume. However, such a modest
saving is only due to the primitive nature of the problem to be solved; in more complex cases,
the savings due to the use of subroutines can reach tens, hundreds, thousands of times, so that
it is simply unthinkable to create serious programs without dividing them into subroutines.
Before moving on, let us note one technical point. As in the description of ordinary
variables, in the description of subprogram parameters, parameters of the same type can be
listed comma-separated, specifying their type once; this is what we did when describing the
PrintLineOfDiamond procedure. If we need parameters of different types, the type is
specified for them separately, and a semicolon is placed between the descriptions of
parameters of different types. For example, if we want to improve the PrintSpaces
procedure so that it can print not only spaces but also any other characters, we can pass the
required character through a parameter:

procedure PrintChars(ch: char; count: integer);
var
    i: integer;
begin
    for i := 1 to count do
        write(ch)
end;

Here, the parameters ch and count have different types, so a semicolon appears between
their descriptions in the header.
Sometimes there are procedures that do not require parameters; an example of such a
procedure is SayHello, which we described at the beginning of this paragraph. The Pascal
language allows in such cases not to write a parameter list either in the procedure description
or when calling it - the parameters disappear together with the parentheses in which their list
should be enclosed. Looking ahead, we note that this is true for functions as well.

The list of parameters specified when describing a subprogram - the one consisting of
variable names and their types - is often called the list of formal parameters, and the list of
values (or, more precisely, in the general case - expressions, the calculation of which gives
the values sought) specified when calling the subprogram - the list of actual parameters. We
do not use these terms, but it is useful to remember their existence.
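For instance, in the header of the PrintChars procedure above, ch and count are formal parameters, while the actual parameters in a call may be arbitrary expressions of suitable types; the call below is our own illustration, assuming n is an integer variable:

PrintChars('-', 2*n + 1)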

2.3.2. Functions
In addition to procedures, Pascal also provides another type of subroutines - so-called
functions. At first glance, functions are very similar to procedures: they too consist of a
header, a description section, and a main part, just as local variables can be described in them,
etc.; but, unlike procedures, functions are designed to compute a value and are called from
arithmetic expressions. A function returns control to the calling fragment of the program not
just for nothing, but by communicating a calculated value; a function is said to return a value.
Let's consider the simplest example - a function that raises a floating-point number to the third power, i.e. computes its cube:

function Cube(x: real): real;
begin
    Cube := x * x * x
end;

As you can see, the function header starts with the keyword function, then, as for a
procedure, the name and the list of parameters are written down; after it, the type of the return
value must be specified (with a colon) for the function, i.e., the type of the value our function
is intended to calculate. In this case, the function calculates a number of type real. Like a
procedure, a function can contain (but, like our Cube, may not contain) a section of local
descriptions, and its main part (the body of the function) is written between the words begin
and end, followed by a semicolon.
As we remember, a procedure call is a separate special kind of operator. This is not the
case with functions. They are called in the same way as procedures - by writing a name and,
if necessary, a list of parameter values in brackets after it; but, unlike procedures, this is no
longer an operator. A function call is an expression that has the same type as the return
value specified for the function. Thus, "Cube(2.7)" is an expression of type real. Of
course, this expression, as well as any other, can be a part of more complex expressions; for
example, in a program you may encounter something like
a := Cube(b+3.5) - 17.1;

where a and b are variables of type real; during the calculation of the expression in the
right part of the assignment, control will be temporarily given to the Cube function, and the
parameter called x in it will receive the value of the result of the calculation b+3.5; when
the function finishes its work, the number calculated by it will be used as the minuend in the subtraction, and the result of the subtraction will be stored in the variable a.
Of course, the function can be called in a simpler way, for example, like this:

a := Cube(b);

or even like this:

a := Cube(a);

You could even do it like this:

a := Cube(10);

but in such a situation it would be better to just write the number 1000 so as not to waste
time calculating it every time this operator is executed.
Free Pascal, like Turbo Pascal before it, allows a function to be called in the same way as a
procedure - as a separate operator, ignoring its return value. We will not do that; moreover, we
strongly discourage you from doing so.
Like a procedure, a function is a subprogram, that is, a separate code fragment to which
control is temporarily transferred when it is called. After finishing its work, a function returns
control just as a procedure does; the one fundamental difference is that, before returning
control to its caller, the function must establish the value it was called to compute. The
function "hands" this value to the caller together with the returned control, which is why
programmers say that the function returns a value, and the value itself is called the return
value.
As we saw in the example, the return value is specified in the function body by a special
kind of assignment operator, which has the function's own name on its left-hand side instead
of a variable name. Our Cube function consists of this single operator, but there are more
complicated cases. For example, the sequence of Fibonacci numbers is well known: its first
two elements are equal to one, and each subsequent element is the sum of the two previous
ones: 1, 1, 2, 3, 5, 8, 13, 21, 34, ... A function that computes a Fibonacci number from its
index could look like this (given the rapid growth of these numbers, we will use values of
type integer for the indices and values of type longint for the Fibonacci numbers
themselves):
function Fibonacci(n: integer): longint;
var
    i: integer;
    p, q, r: longint;
begin
    if n <= 0 then
        Fibonacci := 0
    else
    begin
        q := 0;
        r := 1;
        for i := 2 to n do
        begin
            p := q;
            q := r;
            r := p + q
        end;
        Fibonacci := r
    end
end;
Some explanations are in order here. The basic algorithm implemented in our function
works under the condition that the number passed through the parameter n is at least one.
At each step, the two previous Fibonacci numbers and the current one are kept in the
variables p, q and r. Before the loop starts, q contains 0 and r contains the number 1,
which correspond to the "zeroth" and first Fibonacci numbers; note that r already holds the
current Fibonacci number, and the index of the current number is considered to be one. The
loop in the following lines "shifts" the variables p, q and r one position along the
sequence: p receives what was in q, q receives what was in r (the last number becomes
the penultimate one), and r receives the sum of p and q, i.e., the next Fibonacci number
is computed. In other words, each iteration of the loop increases the index of the current
number by one, and the current number itself appears in the variable r each time. The loop
runs as many times as needed for the index of the current number to reach the value of the
parameter n: if n equals one, the loop is not executed at all; if it equals two, the loop runs
one iteration, and so on. The resulting number r is returned as the final value of the
function.
It is easy to see that all this can only work for parameter values (indices) of one or more,
which is why we had to treat the case of non-positive n separately; that is what the if
operator is for. This lets our example illustrate another important point: the body of a
function may contain more than one assignment operator specifying the value the function
will return, but on each call of the function exactly one such operator must be executed; you
cannot specify some return value and then "change your mind" and specify another.

2.3.3. Logical functions and conditional expressions


Pascal functions can return values of almost any type [138], which naturally includes the
boolean type (see §2.2.9). Clearly, a call to such a function is a logical expression;
however, one crucial point escapes the attention of many beginners: a call to a function
returning boolean can itself serve as the condition in branching and loop operators. This,
in particular, lets you avoid cluttering the text with complex (especially multi-line)
conditions by moving them into separate functions.

[138] The only exception is the family of so-called file variable types, whose values cannot
be passed to subprograms by value, returned from functions, or even simply assigned. We
will consider these variables in Chapter 2.9, devoted to working with files.
For example, if we need to check, as in the example on page 264, whether a variable c
(of type char) contains a Latin letter, the header of an operator (e.g., if) that uses such
a condition turns out quite cumbersome:

if ((c >= 'A') and (c <= 'Z')) or ((c >= 'a') and (c <= 'z')) then

This is exactly the case where you should not be lazy but should write a logical function,
calling it, for example, IsLatinLetter:

function IsLatinLetter(ch: char): boolean;
begin
    IsLatinLetter :=
        ((ch >= 'A') and (ch <= 'Z')) or
        ((ch >= 'a') and (ch <= 'z'))
end;

Our if will now be much more concise and, strangely enough, clearer:

if IsLatinLetter(c) then

Besides, once a function is written, we can use it elsewhere in the program whenever
necessary. Just in case, we remind you: if you notice that you want to write something like

if IsLatinLetter(c) = true then

then this is a reason to reread the discussion on page 264 and finally rid yourself of the bad
habit of comparing logical values with constants, especially with the constant true.

2.3.4. Parameter variables


The parameters we have used in our subprograms so far are sometimes called parameter-
values. The name of such a parameter, given in the subprogram header, actually denotes a
local variable into which the value specified at the call site is stored when the subprogram
is called (hence the name "parameter-value"). For example, suppose we have a procedure
p with a parameter x:

procedure p(x: integer);
begin
    { ... }
end;

We can call it by specifying an arbitrary expression at the call site as the parameter value,
as long as it yields an integer (of any integer type), for example:

a := 15;
p(2*a + 7);

In such a call, the value of the expression 2*a + 7 is computed first; the resulting
number 37 is stored into the local variable (parameter) x of the procedure p, after
which the body of the procedure is executed. Obviously, from inside the procedure we
cannot influence the value of the expression 2*a + 7, still less the value of the variable
a, of which we know nothing at all while inside the procedure. We can, in principle, assign
new values to the variable x (although some programmers consider this bad style), but
when the procedure finishes, x will simply disappear along with all the information we
have written into it.
Some confusion among beginners is caused by the fact that you can (that is, no one
prevents you from) specifying the name of a variable of the corresponding type as a parameter
value:

a := 15;
p(a);

This case is no different from the previous one: the mention of the variable a is nothing
more than a special case of an expression; the result of evaluating this expression is the
number 15, it is stored into the local variable x, and then the body of the procedure is
executed. We can say that the variable x has become a copy of a, but this copy is not
connected to its original; actions on the copy do not affect the original in any way, i.e., as
in the previous case, we cannot affect the value of the variable a from within the procedure.
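
A minimal sketch illustrating this (the program and its names are invented purely for demonstration):

program ValueParamDemo;
var
    a: integer;

procedure p(x: integer);
begin
    x := x + 100 { changes only the local copy }
end;

begin
    a := 15;
    p(a);
    writeln(a) { prints 15: the original variable is untouched }
end.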
Meanwhile, in some cases it is convenient to be able to change, from within a subprogram,
the values of one or more variables located at the call site. For example, this may be needed
when our subprogram computes more than one value and the caller needs all of them. We
can use a function to pass a single value from a subprogram to its caller, as we did for
cubing; but what if we wanted a subprogram that, for a given number, computes its square,
cube, fourth and fifth powers all at once? For this simple example a "head-on" solution
exists: write not one function but four; but if we call them all in turn, we end up performing
ten multiplications, whereas four are enough to solve the problem. In more complicated
cases a "head-on" solution may not exist at all.
This is where parameter-variables come to the rescue. They differ from parameter-values
in that it is not a value that is passed to the subprogram but the variable as such. During the
execution of the subprogram, the parameter's name becomes a synonym for the variable
specified at the call site, and everything done with the parameter name inside the
subprogram actually happens to that variable.

In the subprogram header, the word var is placed before the descriptions of parameter-
variables and applies up to the nearest semicolon or the closing parenthesis; for example,
for a procedure with the header

procedure p(var x, y: integer; z: integer; var k: real);

the parameters x, y and k will be parameter-variables, while z will be a parameter-value.


The four-powers problem can be solved using parameter-variables as follows:

procedure powers(x: real; var quad, cube, fourth, fifth: real);
begin
    quad := x * x;
    cube := quad * x;
    fourth := cube * x;
    fifth := fourth * x
end;

The powers procedure has five parameters; when calling it, an arbitrary expression of
type real may be given as the first parameter [139], but the other four parameters require
a variable, and a variable of type real and no other. This restriction is quite
understandable: inside the powers procedure, the identifiers quad, cube, fourth and
fifth will be synonyms for whatever we specify at the call site as the corresponding
parameters, and inside the procedure they are handled as variables of type real; if we
could use a variable of any other type (i.e., one using a different machine representation to
store its value), we would get complete chaos at the output, so the Pascal compiler does not
allow such things.

[139] By the way, the first parameter may even be an integer expression: Pascal silently
converts integers to floating-point numbers when necessary; but to convert in the opposite
direction you have to state explicitly how the conversion is to be performed - with rounding
or with truncation of the fractional part. We will talk about this later.
A correct call to such a subprogram could look like this:

var
    p, q, r, t: real;
begin
    { ... }
    powers(17.5, p, q, r, t);

As a result of such a call, the variables p, q, r and t will contain the second, third,
fourth and fifth powers of the number 17.5.
Beginners are often confused by the word "variable": a superficial understanding of what
is going on may suggest that only an identifier naming a variable can be supplied for a
parameter-variable in a call. In reality this is not so: not all variables in Pascal are named
by identifiers; there are variables that are parts of other variables, and there are variables
with no name at all. We have not met them yet, but we will.
Passing information from a subprogram "outward" is not the only use of parameter-
variables. For example, later we will meet variables that are quite large; copying such a
variable takes so long that passing them by value could make the whole program too slow.
When a variable (however large) is passed to a subprogram via a parameter-variable, no
copying takes place, which makes parameter-variables usable for optimization. In addition,
Pascal has one special kind of variables (the so-called file variables) that cannot even be
assigned, let alone copied; such variables can be passed to subprograms only through
parameter-variables and in no other way.

2.3.5. Global variables


As we have already noted, a variable described in the description section of a subprogram
is visible only within that subprogram and nowhere else; such variables are called the
subprogram's local variables. In contrast, variables described in a program outside any
subprogram are visible from the point of their description to the end of the program text.
If a variable is to be used only in the main part of the program, it is best to describe it
immediately before the main part [140], so that it is visible only there; with some stretch,
such variables can be called local variables of the main part of the program.

[140] The original version of Pascal proposed by Wirth did not provide such a possibility:
the description sections there had a strictly fixed order, and the variable description section
had to come before the procedure and function descriptions. Modern implementations of
Pascal have no such restriction.


If a variable is described earlier than that, it will be visible in all subprograms described
after it. Such variables are called global variables. Theoretically, global variables can be
used to pass information into and out of a subprogram: we can assign a value to a global
variable before calling a subprogram, and the subprogram will use that value; conversely,
a subprogram can put a value into a global variable, and whoever called the subprogram
will then retrieve it. Moreover, it is possible (in theory) to organize communication between
different subprograms through global variables: say, we call first one subprogram and then
another; the first puts something into global variables, and the second uses it. Well then:
you should not do any of this. As far as possible, all communication with subprograms
should go through parameters: pass information into subprograms through parameter-
values, and pass all information from subprograms "outward" as values returned by
functions and, where necessary, through parameter-variables.
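
As a schematic illustration of this rule (the names here are invented), compare communicating through a global variable with passing the same information through parameters:

var
    total: integer; { global }

{ discouraged: a hidden dependency on the global variable }
procedure AddGlobal(x: integer);
begin
    total := total + x
end;

{ better: everything the subprogram touches is visible at the call site }
procedure AddTo(var sum: integer; x: integer);
begin
    sum := sum + x
end;

A call such as AddTo(total, 5) names all the data it works with, whereas AddGlobal(5) silently depends on, and changes, the global variable.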

The main reason for this recommendation lies, as is often the case, in the peculiarities of
human perception of programs. If the work of a procedure or function depends only on its
parameters, it is much easier to picture how the program works and to understand what is
going on; if the values of global variables also come into play, you have to keep them in
mind, that is, every time you look at a call of a subprogram with certain parameters, you
must remember that its work will also depend on the values in some global variables; the
parameters are clearly visible at the call site, which cannot be said of the global variables.
Quite often it looks as if some subprogram suddenly changes its behavior (most often from
correct to incorrect), and it is not easy to realize that global variables are to blame. Global
variables are said to accumulate state; unexpected changes in the behavior of various parts
of the program appear to be a consequence of this accumulation.
Besides, global variables can be accessed from many different places in the program, so if,
for example, you discover during debugging that someone managed to put into a global
variable a value nobody expected there, you may spend a lot of time finding out where in
your program that happened; this is especially true of large programs developed by several
programmers at once.
There is another reason why global variables should be avoided whenever possible. There
is always a chance that an object which currently exists in your program in a single copy
will need to be "multiplied". For example, if you are implementing a game and your
implementation has a game board, it is very, very likely that at some point you will need
two game boards. If your program works with a database, you can (and should) assume that
sooner or later you will need to open two or more such databases simultaneously (for
example, to convert the data from one representation format to another). The list of
examples can be continued indefinitely. If the information critical for working with your
database (or game board, or any other object) is stored in a global variable and all the
subprograms are tied to using that variable, you will be unable to make the transition from
a single instance of the object to several.

2.3.6. Functions and side effects


The Pascal language imposes no formal restrictions on the actions performed in the body
of a function, so the work of a function need not be reduced to computation in the
mathematical sense. For example, a function can return the current time or something
similar as its value. Moreover, a function can not only compute or otherwise obtain the
value expected of it, but also do other things along the way: print a string on the screen,
change a global variable, store a value into a variable passed as a var-parameter - in
general, anything at all; nothing prevents it. If an expression contains a call to such a
function, then evaluating the expression does something besides producing a value; in other
words, something somewhere changes because the expression was evaluated.
Any changes in a program's execution environment, both in memory (in variables)
and outside of memory (e.g., the results of I/O operations), that occurred during the
evaluation of an expression are called side effects of that expression. In a sense, side effects
are counter-intuitive: we usually expect an expression to be evaluated and produce some
value, and if something else happens, it may come as a complete surprise to us (and to any
reader of the program).
For example, in the program for drawing a "diamond" of asterisks (see page 288), we could,
to simplify the main part of the program further, move the dialog with the user into a
function. Recall that in the end we need a number n equal to half the height of the figure,
while the user is asked to enter the full height, which must be a positive odd number. Let
us write a function that conducts the whole dialog and returns the number obtained from
the user:

function NegotiateSize: integer;
var
    h: integer;
begin
    repeat
        write('Enter the diamond''s height (positive odd): ');
        readln(h)
    until (h > 0) and (h mod 2 = 1);
    NegotiateSize := h
end;

Now the first six lines of the main part of the program can be replaced by a single assignment:

n := NegotiateSize div 2;
As you can see, the function eventually calculates the value, but it also outputs something to
the screen, reads something from the keyboard; all these I/O operations are nothing but its
side effects.
To tell the truth, it is not at all obvious why we should do it this way, although people used
to certain other programming languages create such functions surprisingly often. Let us
point out that, in terms of simplifying the program text, we can achieve almost the same
result with a procedure instead of a function, thereby avoiding the use of side effects
altogether:

procedure NegotiateSize(var res: integer);
var
    h: integer;
begin
    repeat
        write('Enter the diamond''s height (positive odd): ');
        readln(h)
    until (h > 0) and (h mod 2 = 1);
    res := h
end;

A call to this procedure would look like this:

NegotiateSize(n);
n := n div 2;

If you don't like the second line here, you can do the halving inside the procedure, when
assigning a value to the res variable (see the sketch below); in fact, we pulled the halving
out of the subprogram (first the function, then the procedure) in our example purely for
clarity, to show that a call of a function with a side effect can be part of a more complex
expression - which is more illustrative than merely calling the function on the right-hand
side of an assignment.
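
For completeness, here is what that variant might look like (a sketch; we simply move the div 2 into the procedure):

procedure NegotiateSize(var res: integer);
var
    h: integer;
begin
    repeat
        write('Enter the diamond''s height (positive odd): ');
        readln(h)
    until (h > 0) and (h mod 2 = 1);
    res := h div 2 { the halving now happens here }
end;

so that the call site shrinks to the single line NegotiateSize(n);.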
Many programmers sincerely believe that the procedure-based solution is completely
equivalent to the previous one, including with respect to side effects - that a "side effect"
is simply any change of anything anywhere, i.e., that any assignment, any input or output
operation and so on is a side effect. Moreover, someone will probably try to convince you
of the same. Don't fall for it! They are mistaken. Neither assignment nor I/O has, by itself,
anything to do with side effects - at least as long as we work in Pascal and refrain from
writing functions (functions, not procedures - that's important!) that perform I/O or assign
to variables other than their own local ones.

The roots of the widespread misconceptions about the essence of side effects grow from
the popularity of the C and C++ languages, in which, strange as it may seem, all this is
actually true; when we get to studying C in the second volume, we will find out that there
the entire execution of a program consists of side effects - literally, 100%, and this is not
an exaggeration: formally it is true. But, first, there are no procedures there, and second
(which may completely bewilder a person with insufficient experience), assignment there
is not an operator but an arithmetic operation.
Strange as it may seem, there are situations in which the (deliberate, mind you) use of side
effects can be justified, including in Pascal programs; it would therefore be wrong to say
that side effects must never be used at all. This is why Pascal allows functions that have
side effects. But in every such situation another solution can be found that requires no side
effects; generally speaking, Pascal lets you do without side effects entirely, and this is one
of its undoubted advantages. Since our task now is to learn to program well, we will try to
take advantage of this property of Pascal and get used to the idea that an expression is
always evaluated purely for the sake of its result. Later, when we study C, where working
without side effects is fundamentally impossible, the habits formed now will help us
distinguish the unavoidable and often harmless side effects, many of which would not even
count as side effects in other languages, from the inappropriate ones, whose use turns a
program into a puzzle.

2.3.7. Recursion
Subprograms, as we already know, can call one another, but that is not all: a subprogram
can, if necessary, call itself - either directly, when the body of the subprogram explicitly
contains a call to that same subprogram, or indirectly, when one subprogram calls another,
that one possibly a third, and so on, until some of them calls the first one again.
At the mention of recursion, beginners usually start worrying about local variables; it turns
out there is no problem here, because the local variables of a subprogram are created when
the subprogram is called and disappear when it returns control; a recursive call creates a
"new set" of local variables, and so on with each call. If a procedure describes a variable
x and that procedure has called itself ten times, x exists in eleven instances.

It is important to realize that recursion must end sooner or later; for this, each successive
recursive call must solve the same problem for an at least slightly simpler case. Further, it
is obligatory to single out the so-called base of recursion - a case so simple (usually trivial
or degenerate) that further recursive calls are no longer needed. If this is not done, the
recursion will be infinite; and since each recursive call consumes memory - to store the
return address, to place parameter values, for local variables - a program that goes into
infinite recursion will sooner or later exhaust the available memory and crash.
We will give a very simple example of recursion. In §2.3.1 we wrote the PrintChars
procedure, which prints a given number of identical characters; the character itself and the
desired count are passed through parameters (see page 289). This procedure can be
implemented with recursion instead of a loop. To do so, note, first, that the case where the
required number of characters is zero is the degenerate case, in which nothing needs to be
done; and second, that if the case is not degenerate, then printing n characters is the same
as printing one character first and then (n - 1) characters, and the task "print (n - 1)
characters" is perfectly suitable as a "slightly simpler" case of the same problem. A
recursive implementation would look like this:

procedure PrintChars(ch: char; count: integer);
begin
    if count > 0 then
    begin
        write(ch);
        PrintChars(ch, count - 1)
    end
end;

As you can see, in the case where there is nothing to print, our procedure does nothing,
while in every other case it prints one character, so that there is one character fewer left to
print, and uses itself to print the remaining characters. For example, if we call
PrintChars('*', 3), the procedure prints an asterisk and calls
PrintChars('*', 2); this "new copy" of the procedure prints an asterisk and calls
PrintChars('*', 1), which prints an asterisk and calls PrintChars('*', 0);
this last call does nothing and terminates, then the previous one terminates, then the one
before it, and finally our original call terminates. As is easy to see, the asterisk is printed
three times.

It is often useful to use recursion "in reverse", when a subprogram first makes the recursive
call and then performs some other actions. Suppose, for example, we are asked to print
(separated by spaces, for clarity) the digits of the decimal representation of a given number.
Detaching the lowest digit from a number is not a problem: it is the remainder of division
by 10. All the other digits can be extracted by repeating the same process for the original
number divided by ten with the remainder discarded. The recursion can be based on the
case of zero: for zero we print nothing. If zero must nevertheless be printed when the
number 0 is given, this special case can be handled by writing another procedure, which
prints zero for a zero argument and calls our procedure for any other number. Suppose we
write something like this:
procedure PrintDigitsOfNumber(n: integer);
begin
    if n > 0 then
    begin
        write(n mod 10, ' ');
        PrintDigitsOfNumber(n div 10)
    end
end;

If we now call, say, PrintDigitsOfNumber(7583), it will print not quite what we
want: "3 8 5 7". The digits are right, but the order is reversed. This is understandable:
we "chopped off" the lowest digit first and printed it immediately, and so on, so the digits
are printed from right to left. But the problem is solved with just one small change:
procedure PrintDigitsOfNumber(n: integer);
begin
    if n > 0 then
    begin
        PrintDigitsOfNumber(n div 10);
        write(n mod 10, ' ')
    end
end;

Here we have swapped the recursive call and the print operator. Since returns from
recursion occur in the order opposite to entering it, the digits will now be printed in the
order opposite to that in which they were "chopped off"; i.e., the same call will print
"7 5 8 3", which is what we needed.
Not only procedures but also functions can be recursive. In the following example, the
ReverseNumber function computes the number obtained from the original one by
"reversing" its decimal notation, while the recursion itself takes place in the auxiliary
function DoReverseNumber:
function DoReverseNumber(n, m: longint): longint;
begin
    if n = 0 then
        DoReverseNumber := m
    else
        DoReverseNumber :=
            DoReverseNumber(n div 10, m * 10 + n mod 10)
end;

function ReverseNumber(n: longint): longint;
begin
    ReverseNumber := DoReverseNumber(n, 0)
end;
For example, ReverseNumber(752) will return the number 257, and
ReverseNumber(12345) will return 54321. How it all works we leave for the reader
to figure out.
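
If you want to check your understanding, a main part exercising the function might look like this (assuming both functions above are declared in the same program):

begin
    writeln(ReverseNumber(752));  { prints 257 }
    writeln(ReverseNumber(12345)) { prints 54321 }
end.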

2.4. Program design


The material in this chapter may be hard to grasp for "complete beginners", because with
short programs the problems discussed here simply do not have time to show themselves.
As long as your program fits entirely on the screen, there is no difficulty in perceiving its
structure - frankly speaking, there is hardly any structure to perceive.
The situation changes dramatically when your program grows to at least several hundred
lines: keeping its entire structure in your head becomes first problematic and then simply
impossible. A professional's capacity is somewhat larger than a beginner's - an experienced
programmer can, of course, navigate much larger volumes of code - but for professionals,
too, the limit is very near. Programmers all over the world ran into this problem in the late
1960s, when the complexity of everyday computer programs reached a level at which even
the best of the best could no longer cope.

2.4.1. The concept of structural programming


The programming languages used in the 1960s mostly allowed control to be transferred,
at any moment and without restriction, to any place in the program - to perform so-called
unconditional jumps. Such jumps tangled the control structure of a program to the point
where often its own author could no longer understand the resulting text [141].

In 1968 the Dutch scientist and programmer Edsger Dijkstra, in his article Go to statement
considered harmful [142], proposed abandoning the practice of uncontrolled "jumps"
between different places in the code in order to make programs clearer. Two years earlier,
the Italians Corrado Böhm and Giuseppe Jacopini had formulated and proved a theorem
now commonly referred to as the structural programming theorem; it states that any
algorithm represented by a flowchart can be transformed into an equivalent algorithm (i.e.,
one producing the same output words for the same input words) built as a superposition of
only three "elementary constructs": direct succession, in which one action is executed first
and then another; incomplete branching, in which an action is either executed or not,
depending on a condition; and the loop with a precondition, in which an action is repeated
as long as a condition holds. In practice, full branching and the loop with a postcondition
are usually added as well; we have already seen all these basic constructs in Figures 2.1
and 2.2 (see pages 258 and 270). All of these basic constructs have one very important
thing in common: each of them has exactly one entry point and exactly one exit point.
[141] By the way, do not think that this is impossible under modern conditions: it is no
exaggeration to say that almost every programmer has at least once become hopelessly lost
in his own code. Many beginners begin to take the structure of their code seriously only
after such an incident.

[142] The title can be roughly rendered as "The go to operator considered harmful".

The mysterious word "superposition" here means that any rectangle denoting (according
to flowchart rules) some action can be replaced by a more detailed (or more formal)
description of that action, i.e., in turn, by a flowchart fragment that is itself built as one of
the basic constructs. Such a replacement is called detailing; the reverse replacement of a
correct flowchart fragment (i.e., one that has a single entry point and a single exit point and
is built as one of the basic constructs) by a single rectangle ("action") is usually called
generalization. In fact, the very possibility of generalization comes from observing the rule
of one entry point and one exit point: the rectangle denoting a single action in a flowchart
also has exactly one entry and exactly one exit, which is what allows any basic construct
of structural programming to be replaced by a single rectangle, i.e., generalized. This, in
turn, allows any flowchart to be made simpler and simpler by hiding minor details, until
the whole of it is simple enough to understand at a glance.
While generalization is usually needed when studying an existing program, detailing, on
the contrary, is widely used when creating new ones. One of the most popular strategies of
writing program code, called top-down step-by-step detailing, is to start writing a program
from its main part, writing so-called stubs instead of certain isolated fragments; a stub can
be either a simple comment like "this is where such-and-such should happen" or a call to
a subprogram (procedure or function) for which only a header and an empty body are
written (for functions, as a rule, an operator specifying some return value is added). The
stubs are then gradually replaced by working code, with new stubs appearing in the process,
of course. It is easy to see that each stub corresponds to a rectangle denoting some complex
action in the flowchart, to be replaced by a more detailed flowchart fragment in the course
of detailing. By the way, we already used top-down step-by-step detailing when creating
the StarSlash program (see page 274).
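
To give the flavor of this strategy, here is an invented skeleton (all names are ours, chosen purely for illustration) of a program at an early stage of top-down detailing, with all three steps still stubbed out:

program Skeleton;

procedure ReadInput;
begin
    { stub: reading the data will be implemented here }
end;

procedure Solve;
begin
    { stub: the actual computation will be implemented here }
end;

procedure PrintResults;
begin
    { stub: printing the results will be implemented here }
end;

begin
    ReadInput;
    Solve;
    PrintResults
end.

The program already compiles and runs; the stubs are then replaced with working code one at a time, possibly spawning new stubs of their own.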

2.4.2. Exceptions to the rules: exit operators


Structural programming is a wonderful concept, but in some cases the program text can be
made clearer by departing slightly from the strict canons. Since the ultimate goal is the
comprehensibility of the program, not the canons of structural programming as such, when
the choice is between observing the canons and an unambiguous gain in code
comprehensibility, you should naturally choose the latter.
Among all the programmer's tricks that "slightly" violate the concept of structural
programming, the most prominent are the various forms of "early exit", i.e., a forced jump
from somewhere "inside" a program construct to its end. It is interesting that the Pascal
originally proposed by Wirth had no special early-exit operators but did have the
unconditional jump operator (the notorious goto), which we will consider in the next
section; as often happens, practice has corrected theory, and modern Pascal dialects have
special operators for early termination of a loop and of a single iteration of it, for immediate
exit from a subprogram, and for forced termination of the whole program. We will consider
them now.
The early exit operator we need most often is exit, which leaves a subprogram. For
example, the function for computing Fibonacci numbers given on page 292 can be rewritten
with the exit operator as follows:

function Fibonacci(n: integer): longint;
var
    i: integer;
    p, q, r: longint;
begin
    if n <= 0 then
    begin
        Fibonacci := 0;
        exit
    end;
    q := 0;
    r := 1;
    for i := 2 to n do
    begin
        p := q;
        q := r;
        r := p + q
    end;
    Fibonacci := r
end;

If in the previous version we had to split the entire body of the function into two branches
of an if operator and keep that in mind throughout the body, here we first handle the
"special case", and if it occurs, we set the return value and immediately terminate
execution. After that we can safely forget about the already-handled case; this technique
allows us not to wrap the rest of the code (which actually implements everything the
function was written for) in an else branch.
The benefit of the exit operator becomes more obvious as the number of special cases
grows. Suppose, for example, we need to write a subprogram that solves a quadratic
equation. It will receive the coefficients of the equation through parameters; the result
returned through parameter-variables will consist of three values: a logical one (whether
roots were found) and two values of type real - the roots themselves [143]. We will not
treat the case of coincident roots specially: we will simply assign the same number to both
variables.

[143] A reader familiar with complex numbers may notice that the roots of a quadratic
equation always exist; solving a quadratic equation in complex numbers is not very
difficult, but we have other goals at the moment, so we will solve it the "school way", in
real numbers.

There are two special cases. First, the coefficient of the second-degree term may be zero,
in which case the equation is not quadratic and cannot be solved as a quadratic equation.
Second, the discriminant may be negative. Without the early exit operator the procedure
would look like this:

procedure quadratic(a, b, c: real;
    var ok: boolean; var x1, x2: real);
var
    d: real;
begin
    if a = 0 then
        ok := false
    else
    begin
        d := b*b - 4*a*c;
        if d < 0 then
            ok := false
        else
        begin
            d := sqrt(d);
            x1 := (-b - d) / (2*a);
            x2 := (-b + d) / (2*a);
            ok := true
        end
    end
end;

The most interesting part - the actual solving of the equation - ended up "buried" at the
third level of nesting, and on the whole the control structure of our procedure looks rather
frightening. Using the exit operator, we can rewrite it differently:

procedure quadratic(a, b, c: real;
    var ok: boolean; var x1, x2: real);
var
    d: real;
begin
    if a = 0 then
    begin
        ok := false;
        exit
    end;
    d := b*b - 4*a*c;
    if d < 0 then
    begin
        ok := false;
        exit
    end;
    d := sqrt(d);
    x1 := (-b - d) / (2*a);
    x2 := (-b + d) / (2*a);
    ok := true
end;

With a little trickery, you can make it even shorter:

procedure quadratic(a, b, c: real;
    var ok: boolean; var x1, x2: real);
var
    d: real;
begin
    ok := false;
    if a = 0 then
        exit;
    d := b*b - 4*a*c;
    if d < 0 then
        exit;
    d := sqrt(d);
    x1 := (-b - d) / (2*a);
    x2 := (-b + d) / (2*a);
    ok := true
end;

Obviously, the text of the procedure has become much clearer, helped considerably by the
fact that its body is now one and a half times shorter (it was 15 lines, now it is 10).
In practice one meets subprograms with a much larger number of special cases - five, ten,
or more; if we try to write such a subprogram without exit, we simply run out of screen
width for the structural indentation. Besides, handling special cases with nested if
constructs is not quite right even at the ideological level: the general case, which is
obviously "more important" than all the special cases taken separately, ends up processed
somewhere deep inside the control structure, which draws attention away from it and
contradicts its primary role.

The whole program can also be terminated early; this is done with the halt operator. It
can be used anywhere in the program, including inside any subprogram, but it should be
used with care. For example, novice programmers are very fond of "handling" any
erroneous situation by printing an error message and immediately terminating the program;
to see why this should not be done, it is enough to imagine a text editor that reacted so
radically to any wrong keystroke.
The version of the halt operator included in Free Pascal has two forms: the plain form, where
the word halt is simply written in the program, and the parameterized form, in which an
integer expression is given in parentheses after the word halt, e.g., halt(1). The parameter
of halt, if given, sets the termination code of our program, which lets us tell the operating
system whether we consider our program's run successful. Code 0 means success; codes 1, 2
and so on are treated by the operating system as errors. In theory, any number from 0 to 255
(a single unsigned byte) can serve as a termination code, but large numbers are rarely used in
this role - in most cases the termination code does not exceed 10.
The halt operator without a parameter is equivalent to halt(0), i.e., it corresponds to
successful termination.
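
For example, a fragment along these lines (hypothetical, for illustration only; ok is assumed to be a boolean set earlier) reports a fatal error and signals failure to the operating system:

if not ok then
begin
    writeln('error: the equation could not be solved');
    halt(1) { termination code 1 tells the system we failed }
end;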
Finally, the operators for terminating a loop (break) and a single loop iteration
(continue), which came to Pascal from the C language, are often useful. We cannot
illustrate them with tasks as simple as those considered so far, because they are not needed
there; but very soon we will meet more complex tasks in which break and continue
will let us simplify the program text considerably.
2.4.3. Unconditional transitions
The unconditional jump operator goto allows you at any moment to transfer control to
another point in the program or, if you prefer, to continue execution of the program from
another place. The operators to which unconditional jumps will be made are marked with
so-called labels, which can be ordinary identifiers or unsigned numbers; the compiler
supports the latter for compatibility with older Pascal dialects. Since a program with
identifier labels is obviously easier to read than one with numeric labels, numbers are not
normally used as labels nowadays.
Labels must be listed in the description section using the label keyword, forming the
label description section; the labels themselves are listed separated by commas, with a
semicolon at the end, for example:

label
    Quit, AllDone, SecondStage;

Usually, label descriptions are placed immediately before the word begin or before the
variable description section. Note that Pascal forbids "jumping" from one subprogram into
another, "jumping" into a subprogram from the main program, and "jumping" into the main
program from subprograms, so there is no point in making labels "global". Labels used
inside a subprogram should be described in the description section of that subprogram, and
labels needed in the main part of the program - just before it begins.
To mark an operator with a label, the label is written before the operator's text, separated
from it by a colon, e.g., like this:

writeln('We are before the point');
point:
writeln('And now we are at the point');

Jumping to such a label is done quite simply:

goto point;

In the literature you can often find the claim that the goto operator supposedly "must not
be used" because it tangles the program. In most cases this is true, but there are two (not
one, not three - two) situations in which the use of goto is not only acceptable but even
desirable.
The first of the two situations is very simple: jumping out of control constructs, such as
loops, nested several levels deep. The break and continue operators specially designed
for this purpose can handle exiting a single loop, but what if you need to "jump out" of, say,
three loops nested inside one another? Strictly speaking, you can of course do without
goto here: introduce a special flag, check it in the conditions of all the loops, set it in the
innermost loop right before doing break - and then all the loops will terminate. Note that
in most cases the flag would have to be checked not only in the loop conditions but also in
parts of the loop bodies, with ifs testing that same flag. It would be strange, to say the
least, to claim that all this clutter is clearer than a single goto operator (provided, of
course, that the label's name is well chosen and matches the situation in which the jump to
it is made).
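
Here is a sketch of this first situation; the predicate Matches is our invention, standing in for whatever condition is being sought (here, for concreteness, a Pythagorean triple):

function Matches(i, j, k: integer): boolean;
begin
    Matches := i * i + j * j = k * k
end;

procedure FindTriple;
label
    found;
var
    i, j, k: integer;
begin
    for i := 1 to 100 do
        for j := 1 to 100 do
            for k := 1 to 100 do
                if Matches(i, j, k) then
                    goto found; { jump out of all three loops at once }
    writeln('nothing found');
    exit;
found:
    writeln('found: ', i, ' ', j, ' ', k)
end;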
The second situation is a bit harder to describe, since we have not yet considered either
files or dynamic memory. Nevertheless, try to imagine that you are writing a subprogram
which, at the beginning of its work, acquires some resource for a time and then gives it
back. This scheme of operation is quite common; in English, releasing captured resources
before finishing work is called cleanup. The need to perform cleanup before leaving a
subprogram presents no problem as long as there is only one exit point; the problems begin
when somewhere in the middle of the subprogram's text a need arises to end it early with
exit. An attempt to do without goto leads to duplicating the code of all the cleanup
operations at every exit point, i.e., before every exit and at the end of the subprogram
body. Duplicated code is usually bad: if we later change the beginning of the subprogram,
adding or removing resource-acquiring operations, it is quite likely that of the several
identical cleanup fragments we will fix only some and forget the rest. That is why in such
situations one usually does something else: a label is placed before the cleanup operations
at the end of the subprogram (it is usually called quit or cleanup), and instead of
exit a goto to this label is performed, as in the sketch below.
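
Schematically it looks like this (the writeln calls merely stand in for the real acquisition and release operations, which we cannot show before discussing files; the error condition is likewise invented):

procedure DoWork(x: integer);
label
    cleanup;
begin
    writeln('acquiring the resource'); { stands in for the real acquisition }
    if x < 0 then { an invented error condition }
        goto cleanup; { early exit, but through the cleanup code }
    writeln('doing the main work with ', x);
cleanup:
    writeln('releasing the resource') { the single copy of the cleanup code }
end;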
Note that in both cases the jump goes "down" the code, i.e., forward in the execution
sequence (the label is located below the goto operator in the program text), and "outward"
from control constructs. If you feel an urge to goto backwards, it means you are creating
a loop, and for loops there are special operators: try while or repeat/until. If you
want to jump into the middle of a control construct, something has gone completely wrong,
and you need to understand the reasons for such strange desires; note that Pascal will not
allow such a thing anyway.

2.4.4. On dividing the program into subprograms


As we have already mentioned when discussing subprograms, competently dividing a
program into separate parts is an effective way to fight the complexity of its perception;
this is the programmer's version of the "divide and conquer" principle.
Beginning programmers (usually schoolchildren or junior students) often make a serious
strategic mistake: neglecting subprograms, they try to implement the whole task as one big
main part. When the size of such a program exceeds a hundred lines, navigating the code
becomes very difficult; to put it simply, when working with such a program (for example,
when something in it needs fixing), you spend more time searching for the right fragment
than fixing it. The same happens if you let any of the subprograms, rather than the main
program, "swell" significantly.
Ideally, every isolated part of a program, be it a subprogram or the main part, should be
short enough that a quick glance suffices to understand its overall structure. Experience
shows that an ideal subprogram should not exceed 25 lines, although this limit is not,
generally speaking, absolutely rigid. If your subprogram has to consider many different
possible variants and act according to that choice, slightly exceeding this size is quite
acceptable: a 50-line subprogram in such a situation is not considered a crime, though it is
not welcomed either; on the other hand, if a subprogram approaches 60 or even 70 lines, it
should be broken into subtasks immediately, without putting it off for an abstract "later".
If a subprogram exceeds a hundred lines, you should reconsider your attitude to code
design, because with the right approach this simply never happens.
The number 25 did not appear here by chance. The traditional screen size of an alphanumeric
terminal is 25x80 or 24x80 (24 or 25 lines of 80 characters each), and it is considered that a
subprogram should fit entirely on such a screen, so that no scrolling is needed to analyze it.
You may well prefer text editors running in graphics mode and be able to fit far more than
25 lines on your screen; this does not really change things because, first, text much longer
than 25 lines is hard to take in even if you manage to fit it on the screen, and second, many
programmers prefer large fonts when working in graphics mode, so their screens will not hold
more than 25 lines anyway.
The question of optimal subprogram size can be approached from another angle. Each
subprogram must solve exactly one task, and you must formulate for yourself what specific
task a given subprogram solves. If that task is complex enough that subtasks can be
separated out of it, they should be separated - this increases the clarity of the program.
Note that the rule "one subprogram - one task" will also help you choose the parameters
for a subprogram. Beginners often make a typical mistake, furnishing a subprogram with
parameters whose meaning cannot be explained without explaining how the calling
subprogram works. This style is no good. The answer to the question "what does this
subprogram do" should consist of a single phrase, and from this phrase it should be clear,
at least to a first approximation, what the semantics of all the subprogram's parameters is.
This principle is called the rule of one phrase.
While we are on the subject of parameters, let us note one more point. A subprogram with
no more than five parameters is easy to use; a subprogram with six parameters is somewhat
harder to use; subprograms with seven or more parameters complicate work with the code
rather than easing it. The cause lies in the peculiarities of the human brain: we hold a
sequence of five objects in memory quite easily, not everyone can hold a sequence of six,
and with seven or more objects it is simply impossible to keep them in mind as a single
picture - the sequence has to be deliberately memorized, and then time and effort must be
spent recalling what was memorized. As long as your subprogram has no more than five
parameters, you can usually keep their semantics in mind with ease (if not, the subprogram
is most likely badly designed), so writing a call to such a subprogram will be easy. If there
are more parameters, writing each call turns into painful and not always successful
rummaging through your memory or, more likely, through the program text; such things
inevitably distract the programmer from the current task, forcing him to recall irrelevant
details of code written earlier.
There is one more rule of thumb that helps divide a task into subtasks correctly. Each
separated subtask must be such that, when working on the calling subprogram, you can
ignore the implementation details of the called one and, conversely, when working on the
called one, you can ignore the implementation details of the caller. If this rule is not
observed, reading code divided into subprograms may become harder than before the
division, because analyzing the code will require constantly "jumping" between the bodies
of two subprograms. To avoid this, the text of the called subprogram should be made
understandable to someone who has never seen the text of the calling one, and vice versa.
If two interacting subprograms cannot be understood without each other, such a division
into subprograms is useless and may even be harmful.

2.5. Symbols and their codes; text data


As we already know from §1.4.5, computer texts are represented as sequences of
numbers, each of which corresponds to one character of the text, and the same character is
always represented by the same number; this number is called the character code.

      .0  .1  .2  .3  .4  .5  .6  .7  .8  .9  .A  .B  .C  .D  .E  .F
0. | NUL SOH STX ETX EOT ENQ ACK BEL  BS  HT  LF  VT  FF  CR  SO  SI
1. | DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN  EM SUB ESC  FS  GS  RS  US
2. | SPC  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
3. |  0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
4. |  @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5. |  P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6. |  `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7. |  p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~  DEL

Figure 2.3. Hexadecimal ASCII codes

In this chapter we will consider a somewhat simplified situation: assume that one byte is
sufficient to store the code of a single character. ASCII and its many eight-bit extensions (see
§1.4.5) fit into this picture; moreover, many programs written for single-byte character codes
will work quite well with UTF-8. As for the "quite correct" handling of multibyte characters,
this is a topic for a separate discussion, and a rather complex one at that; a detailed study of
this issue would distract us from more pressing tasks, so we will leave these problems outside
the scope of this book.

2.5.1. Tools for working with symbols in Pascal


As we already know, Pascal has a special type called char for storing characters. The
values of expressions of this type are single characters, such as 'a', '7', 'G', '!',
' ' (space), as well as special characters such as the line feed, the tabulation character and
so on. When discussing the computer representation of textual information in the first part
of our book, we already noted that the specific set of available characters depends on the
encoding used, but we can say for sure that the characters of the ASCII table are always
and everywhere present. We gave the ASCII table itself (with character codes in decimal)
in Fig. 1.14 on page 216. For convenience, Fig. 2.3 shows the same table with hexadecimal
codes; combinations of two or three capital Latin letters denote special (control) characters;
the space is labeled "SPC" for clarity. Of all the variety of these "tricky" symbols, which
are not really symbols at all, we may be interested in NUL (the "character" with code 0;
printing it to the screen changes nothing; it is often used as a terminator in string buffers);
BEL (bell, code 7; printing it changes nothing on the screen but is supposed to produce a
short beep; on teletypes it rang a bell); BS (backspace, code 8; when printed, moves the
cursor one position to the left); HT (horizontal tabulation, code 9; moves the cursor right
to the nearest position that is a multiple of eight, or of another number set in the terminal
settings); LF (line feed, the familiar newline with code 10); CR (carriage return, code 13;
moves the cursor to the beginning of the current line); and ESC (escape, code 27; when
printed, usually marks the beginning of a control sequence, for example to move the cursor
to a given position).
Some control characters can arrive in a program when text is entered from the keyboard:
these are LF (produced by pressing Enter), HT (the Tab key), BS and DEL (the
Backspace and Delete keys, respectively), and ESC (the Escape key); in addition,
when working with a terminal (whether real or software-emulated), "characters" with
control codes can be entered by pressing key combinations with Ctrl: Ctrl-@ (0),
Ctrl-A (1), Ctrl-B (2), ..., Ctrl-Z (26), Ctrl-[ (27), Ctrl-\ (28), Ctrl-] (29),
Ctrl-^ (30), Ctrl-_ (31); however, many of these codes are processed by the terminal
driver itself, so the program running in the terminal does not see them. For example, if you
press Ctrl-C, the terminal driver will not pass your program the character with code 3;
instead it will send the program a special signal that will simply kill it (we discussed this
effect in the introduction, see page 103). If necessary, the terminal driver can be
reconfigured, and your program can do this itself; in particular, we will later meet programs
that refuse to be killed by pressing Ctrl-C.
In a Pascal program characters, as we already know, are denoted with apostrophes: any
character enclosed in apostrophes denotes itself [155]. In addition, a character can be
specified by its code (in the decimal system): for example, #10 denotes the line feed
character, and #55 is exactly the same as '7' (as the table shows, the code of the digit
seven is 37 in hexadecimal, i.e., 55 in decimal). For the special characters with codes from
1 to 26 you can also use the so-called "caret" notation: ^A denotes the character with code
1 (i.e., the same as #1), ^B is the same as #2, and so on; instead of #26 you can write
^Z.

[155] If your system uses a Unicode-based encoding, it is easy to make the mistake of
putting between apostrophes a character whose code occupies more than one byte; the
compiler cannot cope with that. The universal recipe here is simple: use only characters
from the ASCII set in the program text and, if anything else is needed, put it in separate
files.
For example, as we already know very well, the writeln operator, having printed all
its arguments, produces a line feed character at the end, which moves the cursor to the next
line. However, no one prevents us from issuing a line feed character without writeln; in
particular, instead of the familiar

writeln('Hello, world!')

we could write

write('Hello, world!', #10)

or

write('Hello, world!', ^J)

(in both of these cases, the string is output first, and then the line feed character separately);
looking ahead, we note that you can do even trickier things by "driving" the line feed character
directly into the string itself in one of the following ways:

write('Hello, world!'#10)
write('Hello, world!'^J)

(here in both cases write prints only one line, but that line contains a line feed character
at the end).
The apostrophe character serves as the delimiter for literals representing both single
characters and strings; if you need to specify the "'" character itself, it is doubled, thus
telling the compiler that the apostrophe character itself is meant, and not the end of the
literal. For example, the string That's fine! in a Pascal program is written as
'That''s fine!'. If we need not a string but a single apostrophe character, the
corresponding literal looks like this: ''''; here the first apostrophe marks the beginning of
the literal, the next two encode the apostrophe character itself, and the last one marks the
end of the literal.
Comparison operations are defined on characters in the same way as on numbers; in fact,
it is simply the character codes that are compared: for example, the expression 'a' < 'z'
is true, and '7' > 'Q' is false. This, together with the fact that the characters of certain
categories occupy contiguous runs in the ASCII table, allows a single logical expression to
determine whether a character belongs to such a category: thus, the expression
(c >= 'a') and (c <= 'z') is the Pascal way of asking "is the character in the variable c a
lowercase Latin letter"; the similar expressions for uppercase letters and for digits are
(c >= 'A') and (c <= 'Z'), and (c >= '0') and (c <= '9'), respectively.
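Such checks are often wrapped into small boolean functions so that they read naturally at the point of use; here is a minimal sketch (the function names are ours, not anything predefined):

function IsLower(c: char): boolean;
begin
    IsLower := (c >= 'a') and (c <= 'z')
end;

function IsDigit(c: char): boolean;
begin
    IsDigit := (c >= '0') and (c <= '9')
end;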
While a program is running, it is possible to obtain the numerical value of the code of an
existing character, i.e. of an expression of type char; the built-in function ord is used for
this purpose [156]. For example, if we have the variables

var
    c: char;
    n: integer;

then the assignment n := ord(c) is correct, and the code of the character stored in
the variable c will be placed into the variable n. The reverse operation - obtaining a
character from a given code - is performed by the built-in function chr; the assignment
c := chr(n) will put into the variable c the character whose code (as an ordinary number)
is in n, provided that this number is in the range from 0 to 255; the result of chr for
other argument values is undefined. Of course, chr(55) is the same as #55
or '7'; but, unlike literals, the chr function allows us to construct a character whose code
we didn't know when writing the program, or which changes while the program runs.

[156] In fact, ord can do more than this; we will consider its full capabilities when we
discuss the generalized notion of ordinal types.

For example, a table similar to the one shown in Fig. 2.3 (only without the control characters)
can be printed using the following program:

program PrintAscii; { print_ascii.pas }
var
    i, j: integer;
    c: char;
begin
    write('  |');                     { first header line }
    for c := '0' to '9' do
        write(' .', c);
    for c := 'A' to 'F' do
        write(' .', c);
    writeln;
    write('--+');                     { second header line }
    for i := 1 to 16 do
        write('---');
    writeln;
    for i := 2 to 7 do                { the table itself, row by row }
    begin
        write(i, '.|');
        for j := 0 to 15 do           { print a single character }
            write('  ', chr(i*16 + j));
        writeln
    end
end.

As you can see, here chr is applied to the expression i*16 + j: each row contains
16 characters, i holds the row number and j the column number, so this expression is
exactly the code of the desired character; all that remains is to turn this code into a
character, which is what chr does. The output of the program looks like this:

  | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
2.|     !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /
3.|  0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ?
4.|  @  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
5.|  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _
6.|  `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
7.|  p  q  r  s  t  u  v  w  x  y  z  {  |  }  ~
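As one more small illustration of ord and chr working together, note that in the ASCII table the code of every lowercase Latin letter exceeds the code of the corresponding uppercase letter by exactly 32; a conversion function is a natural sketch here (ours, not a built-in):

function ToUpper(c: char): char;
begin
    { 'a' has code 97, 'A' has code 65; the difference is 32 }
    if (c >= 'a') and (c <= 'z') then
        ToUpper := chr(ord(c) - 32)
    else
        ToUpper := c
end;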
2.5.2. Character input
Let's consider the opposite problem, when we have to get the code of a character unknown
at the moment of writing the program. We already know that we can use read to read,
for example, an integer from the keyboard; but what happens if our program tries to do this
and the user enters some gibberish? Usually, when entering numbers, the read operator
checks if the user input is correct, and if the user makes a mistake and enters something that
doesn't match the program's expectations, it displays an error message and terminates the
program. It looks like this:
Runtime error 106 at $080480C4
$080480C4
$08061F37

If we know that the program is written in Free Pascal, we can find out (for example, on the
Internet) what error 106 is, but that's about all; if the program is intended for a user who
cannot program, such diagnostics are useless to him and can only spoil his mood; besides, as
was said before, terminating the program on any error whatsoever is not a good idea.
We can "take the initiative" and tell the compiler that we will handle errors ourselves.
This is done by inserting a rather strange-looking {$I-} directive into the program text
("I" from the word input, "-" means that we disable built-in diagnostic messages for user
input errors). After that, we can always find out whether the next input operation was
successful or not; the built-in function lOResult is used for this purpose. If this function
returns 0, the operation was successful; if it is not zero, it indicates that an error has occurred.
In our case, if the user enters gibberish instead of a number, lOResult will return the
above-mentioned 106. For example, a program multiplying two integers could look like this
(taking into account the use of lOResult):
program mul;
var
    x, y: longint;
begin
    {$I-}
    read(x, y);
    if IOResult = 0 then
        writeln(x * y)
    else
        writeln('I couldn''t parse your input')
end.

Of course, the phrase "I couldn't parse your input" looks friendlier than the frightening
"Runtime error 106", but it does not solve the problem completely. We don't know whether the
user made a mistake in the first number or in the second, which character caused the error, at
which input position it happened - in fact, we know nothing at all, except that the user
entered something indecipherable instead of a number. This deprives us of the possibility of
giving the user a diagnostic message whose informational value would be even slightly higher
than the sacramental "user, you are wrong".
For the trivial case of entering two integers this may be tolerable, but when parsing more
complex texts, especially files containing text in some formal language, such meager
diagnostics are useless: the program must explain to the user in detail what exactly the error
is and where it was made, otherwise working with such a program will be absolutely impossible.
The only option that retains full control over what is happening, and lets us make our
program's reaction to errors as flexible as we want, is to refuse the services of the read
operator for turning the digits entered by the user (i.e. the textual representation of a
number) into a number, and to do it all ourselves, reading the user input character by
character.
If we decide to implement the reading of an integer ourselves, it is convenient, for a start,
to place it in a procedure. In the simplest case we can entrust this procedure with printing
the error message itself, although this will, of course, somewhat narrow its applicability: we
are unlikely to be able to reuse the same procedure in our other programs, where this
particular message, and not another, may not fit the general style, or may need to be issued
not by printing to the screen but in some cleverer way - for example, via a dialog box or
something else; but for simplicity we will do it this way. To make the procedure more
universal, let it work with numbers of type longint; we will call it ReadLongint.
Before we start writing the procedure, note that it may succeed, in which case it must
somehow inform the caller of the number read; but the user may enter something that cannot
be interpreted as a number, in which case the procedure will have nothing to say about the
number read, but must still inform the caller that it failed. The procedure will send the
results of its work "outward" through parameter variables [157], of which there will be two:
one of type boolean to report success or failure, and one of type longint to pass the
number read. If the number is correct, the procedure will put the value true into the first
variable and the number read into the second; if the user made a mistake, the procedure will
print a message about it and put false into the first variable, while not touching the
second variable at all, because there is nothing to put into it - no number has been read. The
caller can call the procedure again if it wishes.

[157] If the concept of a parameter variable causes uncertainty, this is the time to reread §2.3.4.
We will rely on two observations when forming a number from the characters received.
First, as we have seen, the codes of the digit characters in the ASCII table are consecutive,
starting with 48 (the code of zero) and ending with 57 (the code of nine); this allows us to
obtain the numerical value of a digit character by subtracting the code of zero from the code
of the character in question. Thus, ord('5') - ord('0') equals 5 (53 - 48), ord('8')
- ord('0') in exactly the same way equals 8, and so on.
Second, the numerical value of the decimal notation of a number can be composed from
the individual values of its digits, scanned from left to right, by a fairly simple algorithm.
To begin with, we create a variable in which the desired number will be formed, and put zero
there. Then, after reading each next digit, we multiply the number accumulated so far by ten,
and add to the result the numerical value of the freshly read digit. For example, when reading
the number 257 we have zero in the variable before we start reading; after reading the digit
"2" the new value of the number is 0 * 10 + 2 = 2, after reading the digit "5" we get
2 * 10 + 5 = 25, and after reading the last digit we get 25 * 10 + 7 = 257, which is what was
required.
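In Pascal the core of this algorithm takes just a few lines; here is a minimal sketch, assuming that c holds the character just read and res accumulates the value:

res := 0;
while (c >= '0') and (c <= '9') do
begin
    { shift the accumulated value left by one decimal digit, add the new one }
    res := res * 10 + (ord(c) - ord('0'));
    read(c)
end;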
It remains to note that for character-by-character reading we can use the familiar read
operator, specifying a variable of type char as its parameter. The final text looks like
this:

{ char2num.pas }
procedure ReadLongint(var success: boolean; var result: longint);
var
    c: char;
    res: longint;
    pos: integer;
begin
    res := 0;
    pos := 0;
    repeat
        read(c);
        pos := pos + 1
    until (c <> ' ') and (c <> #10);
    while (c <> ' ') and (c <> #10) do
    begin
        if (c < '0') or (c > '9') then
        begin
            writeln('Unexpected ''', c, ''' in pos: ', pos);
            readln;
            success := false;
            exit
        end;
        res := res*10 + ord(c) - ord('0');
        read(c);
        pos := pos + 1
    end;
    result := res;
    success := true
end;

Note that when an error is detected, we not only report it but also execute the readln
statement; called without parameters, this statement removes from the input stream all
characters up to and including the nearest line feed; in other words, having detected an error
in user input, we discard the entire rest of the line in which the error was found. Try
experimenting with our procedure, making various errors in various quantities (including
several errors in one line), first with the procedure as presented, and then with the readln
operator removed; its purpose will probably become obvious to you.
To demonstrate working with the ReadLongint procedure, let's write a program that
asks the user for two integers and outputs their product. Its main part can look, for example,
like this:

var
    x, y: longint;
    ok: boolean;
begin
    repeat
        write('Please type the first number: ');
        ReadLongint(ok, x)
    until ok;
    repeat
        write('Please type the second number: ');
        ReadLongint(ok, y)
    until ok;
    writeln(x, ' times ', y, ' is ', x * y)
end.

In fact, in Unix systems the tradition of organizing a dialog with the user is somewhat
different: it is considered that the program should not ask the user questions, and in general
should say nothing at all as long as everything goes as it should; something should be said only
in case of errors. Applied to our program, this means that the write statements that print the
input prompts should simply be removed.

If, in addition, we turn our ReadLongint procedure into a function that returns the logical
value (instead of passing it through a parameter variable), we get an obvious source of side
effects in the program (see §2.3.6), but the main part of the program can be rewritten much more
briefly:

var
    x, y: longint;
begin
    while not ReadLongint(x) do ;
    while not ReadLongint(y) do ;
    writeln(x, ' times ', y, ' is ', x * y)
end.

This program is remarkable in that the bodies of both loops are empty; the semicolon plays
the role of an empty statement, or rather, it terminates the while statement, leaving it with
an empty body. As you can guess, this is made possible by the side effect of the ReadLongint
function; this side effect, consisting of input/output operations and of placing a value into
a parameter variable, is what constitutes the actual loop body.
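The text of the function variant is not given here, but it is easy to reconstruct as a sketch: the body repeats the ReadLongint procedure, with success reported through the return value instead of the first parameter:

function ReadLongint(var result: longint): boolean;
var
    c: char;
    res: longint;
    pos: integer;
begin
    res := 0;
    pos := 0;
    repeat
        read(c);
        pos := pos + 1
    until (c <> ' ') and (c <> #10);
    while (c <> ' ') and (c <> #10) do
    begin
        if (c < '0') or (c > '9') then
        begin
            writeln('Unexpected ''', c, ''' in pos: ', pos);
            readln;
            ReadLongint := false;
            exit
        end;
        res := res*10 + ord(c) - ord('0');
        read(c);
        pos := pos + 1
    end;
    result := res;
    ReadLongint := true
end;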

In general, this is an example of how not to do things. The technique of a side effect in the
loop header is often used in other programming languages, including C, which we will study in
the second volume of this book. Pascal programmers use this technique less often; Pascal does
not honor side effects at all. As we study C, we will try to show in which (not so frequent, it
must be said) cases side effects in the loop header are not a trick but a justifiable
programming technique; until then, we strongly advise you to continue, as we agreed in §2.3.6,
to avoid using side effects.
The conclusion of this paragraph can be expressed in one phrase: character-by-character
input and analysis is the most universal approach to processing textual information.

2.5.3. Read to end of file and filter programs


As we already know, reading performed with read does not always actually come from the
keyboard: the user can run our program with redirected input or as part of a pipeline. Data
read from a file, unlike keyboard input, has one interesting property: at some point it runs
out. Let's discuss this point in more detail.
Until now we always knew exactly what information the user had to enter and how much of it
there would be, so there was no question of when to stop reading. But things may be different:
we may need to read data "until it runs out". The question is how the program finds out that
the data in the standard input stream has run out. Formally, in this case one says that an
"end of file" situation has occurred in the stream; Pascal allows you to check for it using the
built-in function eof, whose name comes from the words end of file. When working with the
standard input stream, the eof function is used without parameters; it returns a boolean
value: false if the end of the file has not yet been reached and reading can continue, and
true if there is nothing more to read.
It is important to note that "end-of-file" is a situation, not something else. In beginner's
literature, the quality of which leaves much to be desired, there are such strange notions as
"end-of-file symbol", "end-of-file marker", "end-of-file sign" and the like; don't believe it!
The existence of some sort of end-of-file "symbol" is nothing more than a myth that dates
back to the days when a "file" was a magnetic tape record; in those days, the tape actually had
a special sequence of magnetized sections that marked the end of the record (even then,
however, it had nothing to do with "symbols"). Files on disk do not need any "end marks" at
all, because the operating system stores the length of each file on disk (as an ordinary integer)
and knows exactly how much data can be read from that file.
So, remember once and for all: there is no character, no sign, no marker, nothing else to
"read" at the end of a file instead of another batch of data, and anyone who tries to convince
you otherwise is simply lying. When we read data from a file and the file ends, we don't read
anything special at that moment; instead, an "end of file" situation occurs, meaning that there
is nothing more to read from this file. This is exactly what the built-in eof function checks:
whether the end-of-file situation has occurred or not.
Unix systems traditionally have a whole class of programs that read text from their standard
input stream and output some new text to the standard output stream; the output may be a
modification of the text read, or a completely new text containing some results of its
analysis. Such programs are called filters. For example, the grep program selects from the
input text the lines that meet certain conditions (most often, simply those containing a given
substring) and outputs only those lines, ignoring the rest; the sort program forms its output
from the lines of the input text sorted in a certain order; the cut program allows a certain
substring to be extracted from each line; the wc program (from word count) counts the
characters, words and lines in the input text and outputs a single line with the results. All
these programs, and many others, are just filters.
Now that we know how to perform character-by-character reading and detect the end-of-file
situation, we can write a filter program in Pascal. Let's start with a very simple filter that
reads text from the standard input stream, responds to each line entered with the string "Ok",
and, when the text runs out, says "Good bye". For this filter we don't need to store the
entire input line in memory: it is enough to read characters one by one and print "Ok"
whenever a line feed character arrives; this should continue until the end-of-file situation
occurs. After leaving the reading loop, all that remains is to print "Good bye". Let's write
it:
program FilterOk; { filter_ok.pas }
var
    c: char;
begin
    while not eof do
    begin
        read(c);
        if c = #10 then
            writeln('Ok')
    end;
    writeln('Good bye')
end.

You can check the correct operation of this program not only with the help of a file (which
is not very interesting: it will say "Ok" as many times as there are lines in the file, though
you can hardly count them), but also with ordinary keyboard input. Recall (see §1.2.9) that
Unix terminal drivers are able to simulate the end of file; unless the terminal has been
reconfigured, this happens when the Ctrl-D key combination is pressed. If we run the
FilterOk program without redirecting input and start typing arbitrary lines, the program
will say "Ok" in response to each entered line; when we get bored, we can press Ctrl-D, and
the program will correctly terminate with its "Good bye". Of course, we could simply "kill"
the program by pressing Ctrl-C instead of Ctrl-D, but then it would not tell us any
"Good bye".
Now let's write a more interesting filter, one that counts the length of each input line and
outputs the result when the line ends. As in the previous case, we don't need to store the
whole line; we will read the text character by character, and to keep the current value of the
line length we will create a counter variable. On reading any character other than the line
feed we will increment this variable by one, and on reading a line feed we will output the
accumulated value and reset the variable to zero, in order to count the length of the next
line. It is also important not to forget to zero the variable at the very beginning of the
program, so that the length of the very first line is counted correctly too. All together it
looks like this:

program FilterLength; { filter_len.pas }
var
    c: char;
    n: integer;
begin
    n := 0;
    while not eof do
    begin
        read(c);
        if c = #10 then
        begin
            writeln(n);
            n := 0
        end
        else
            n := n + 1
    end
end.

Let's consider a more complex example. Suppose we need a filter program that selects from
the input text the lines starting with non-whitespace characters and prints only them,
ignoring lines that start with a space or a tab, as well as empty lines. It may feel as if
reading the whole line is unavoidable here, but once again it is not. In fact, we only need to
remember whether we are printing the current line or not; besides, there are moments when we
don't yet know whether to print the current line - this is the case while we haven't read a
single character of the line. To store both conditions we'll use boolean variables [158]: one,
which we'll call know, will remember whether we know whether or not to print the current
line, and the second, print, will be used only when know is true, in which case it will
indicate whether we are printing the current line.

[158] It would be more correct to use a variable of an enumerated type here, with three values
("yes", "no", "don't know"), but we haven't studied enumerated types yet.
After reading a character, we first check whether it is a line feed. If it is, we check
whether the current line is being printed and, if so, print the line feed character; we then
put false into the know variable, to indicate that we are starting the next line, about
which we don't yet know whether it should be printed.

If the character read is not a line feed, there are two possibilities. If we don't know yet
whether the current line is to be printed, it is time to find out: depending on whether the
character read is a space or tab character or not, we put false or true into the variable
print, and then put true into the variable know, because now we know for sure whether
the current line is to be printed.

Next, whatever the character read is, we know whether to print it or not: indeed, even if we
didn't know a moment ago, we do now. If necessary, we print the character, and that is where
the loop body ends.

At the beginning of the program we must remember to state that we don't yet know whether the
first line will be printed; this is done by putting false into the know variable. To be on
the safe side, we should also put false into the print variable, otherwise the compiler
will warn that the variable may be used uninitialized (this is not actually the case here, but
the situation is too complicated for the compiler). The full text of the program looks like
this:
program SkipIndented; { skip_indented.pas }
var
    c: char;
    know, print: boolean;
begin
    know := false;
    print := false;
    while not eof do
    begin
        read(c);
        if c = #10 then
        begin
            if know and print then
                writeln;
            know := false
        end
        else
        begin
            if not know then
            begin
                print := (c <> ' ') and (c <> #9);
                know := true
            end;
            { by this point always know = true }
            if print then
                write(c)
        end
    end
end.
It is interesting to test this program by feeding it its own source code as input:

avst@host:~/work$ ./skip_indented < skip_indented.pas
program SkipIndented; { skip_indented.pas }
var
begin
end.
avst@host:~/work$

Indeed, in our program only these four lines start at the leftmost position, while the rest
are shifted to the right (structural indentation is used) and therefore begin with spaces.
When analyzing the text of our program, you may get the misleading impression that, if the
text is entered not from a file but from the keyboard, the characters output by the program
will be mixed in with the characters entered by the user. In fact, this is not so: the terminal
gives our program the text entered by the user not one character at a time but in whole lines,
so by the time our program reads the first character of a line, the user has already entered
the whole line, and the characters printed by the program appear on the screen on the next
line. You can check this yourself.
Of course, there are filter programs that are much more complex, and they often have to keep
in memory not only whole lines but the entire input text (this is what the sort program, for
example, is forced to do). Later, when we get acquainted with dynamic memory management, we
will learn to write such programs too.

2.5.4. Reading numbers to the end of the file


In the previous paragraph we discussed the end-of-file situation and its occurrence in the
standard input stream, and showed how to organize reading data until it runs out. When the
reading is done character by character, there is no problem with this; but beginners often fall
into a rather simple and standard trap here, trying to apply the same approach to reading
sequences of numbers.

Let's consider the simplest task: the user enters integers from the keyboard, and we need to
read them all and compute their count and their total sum. The count of the numbers is not
known in advance, and the user informs us that the numbers have run out in the most correct
way possible - by arranging an "end of file", i.e. by pressing Ctrl-D if the input comes
from the keyboard; if the user has written the numbers to a file beforehand, or if they are
generated by another program, then nothing needs to be pressed at all for the end of file to
occur - everything happens by itself. Let's assume that the capacity of the longint type is
enough for us, i.e. the user will not enter numbers exceeding two billion even in total.
Of course, the textual representation of a number can be read one character at a time, as we
did in §2.5.2, but this is rather cumbersome, so there is a natural desire to use the
capabilities already present in the read operator: it can read the textual representation of
numbers, translating the sequence of digits into the machine representation for us. Beginning
programmers in this situation often write a program like this:

program SimpleSum;
var
    sum, count, n: longint;
begin
    sum := 0;
    count := 0;
    while not eof do    { bad idea! }
    begin
        read(n);
        sum := sum + n;
        count := count + 1
    end;
    writeln(count, ' ', sum)
end.
- and are surprised to find that it works "somehow wrong": they have to press Ctrl-D several
times before the program finally calms down, and the count of numbers it reports is greater
than the quantity of numbers actually entered.
Figuring out what is wrong here can be tricky, but let's try; to do so, we need to understand
in detail what read actually does when we ask it to read a number - and that should not be a
problem, since we did the same thing ourselves in §2.5.2. So, the first thing read does is
skip whitespace characters, that is, it reads characters from the input stream one at a time
until it comes across the first digit of the number. Having found this digit, read reads the
digits making up the representation of the number one by one, until it again comes across a
whitespace character, and in the course of this reading it accumulates the value of the number
in the same way as we did: by a sequence of multiplications by ten and additions of the next
digit's value.
Note that the "end of file" situation does not have to occur immediately after
reading the last number. In fact, it almost never does; remember that text data is a
sequence of lines, and at the end of each line is a line feed character. If the data input
to our program is a valid text, then after the very last number in this text there should
be a line feed (this is if the user has not left a dozen or two insignificant spaces after
the number, and no one forbids him to do so). It turns out that at the moment when
read finishes reading the last number from the input stream, the numbers have
already run out, but the characters in the stream have not yet. As a consequence, eof
still thinks that nothing has happened and gives "false"; as a result, our program does
one more read, which, when trying to read the next character, safely stops at the
end of the file. Its behavior in this situation is a bit unexpected - without producing any
errors, it simply pretends to read the number 0 (it is not clear why this is done, but
the reality is like this). Hence the discrepancy between the number of entered numbers
calculated by the program and the number of numbers actually entered by the user
(although the sum is correct, because adding an extra zero does not change it; but this
is just lucky for us - imagine what would happen if we calculated the product instead
of the sum).
And this is not the end of the story. When the input comes not from a real file (which can
indeed run out) but from the keyboard, where the end-of-file situation has to be simulated by
pressing Ctrl-D, it is quite possible to continue reading after the end-of-file situation has
occurred; in our case this means that the read that ran into the end-of-file situation has
"consumed" it, so that eof never sees it. That is why Ctrl-D has to be pressed twice for the
program to terminate. When working with real files this effect is not observed: once a file
has ended, the associated input stream remains in the end-of-file situation "forever".
Anyway, we have a problem, and we need a means of solving it; Free Pascal provides such a
means: the SeekEof function. Unlike the usual eof, this function first reads and discards
all whitespace characters; if in doing so it finally runs into the end of file, it returns
true, and if it finds a non-whitespace character, it returns false. In the latter case the
non-whitespace character found is "returned" to the input stream, so that the next read will
start with it.

The "incorrect" program above turns into a correct one with a single correction: replace eof
in the loop header with SeekEof, and everything will work exactly as we want.
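For clarity, here is the corrected version as a sketch (only the loop header differs from SimpleSum; the program name here is ours):

program SimpleSum2;
var
    sum, count, n: longint;
begin
    sum := 0;
    count := 0;
    while not SeekEof do   { skips whitespace, then checks for end of file }
    begin
        read(n);
        sum := sum + n;
        count := count + 1
    end;
    writeln(count, ' ', sum)
end.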
Free Pascal's implementation of SeekEof until recently contained a completely ridiculous bug
that made it impossible to work with any input streams other than real disk files and terminal
(keyboard) input. In particular, programs using SeekEof could not work as part of pipelines.
The author was informed of this by readers soon after the first volume of the first edition was
published.

Interestingly, the bug had been introduced into the code around 2001; apparently, attempts
were made more than once to report it to the Free Pascal developers, but instead of fixing the
bug they preferred to write in the documentation that the function is not designed to work
with streams other than disk files - in other words, they declared the bug a feature, in the
best of traditions. The creators of Free Pascal kept silent about why such a function would
then be needed at all, and how one is supposed to write programs that read numbers from text
streams.
In September 2020, while preparing the second edition of the book, yours truly decided to try
to convince this company that SeekEof should be fixed. During the discussion the author got
the clear impression that there was not a single person on the Free Pascal development team
who understood what the SeekEof function is for, and more generally what text input streams
are (especially in Unix systems) and how they are handled. You can read more about this story
on the author's website, where links to the forum discussions can also be found. To convince
one particularly hard-headed character that the world really works this way, the author had to
bring up the heavy artillery: dig up the original Borland Pascal 7.0 box with all its books and
quote, as they say, the primary source - the original description of SeekEof, which, among
other things, stated what the function was for. Not only that, but the author had to extract
from 25-year-old archives the distribution kit of that same BP 7.0, find the RTL sources, find
the file containing the implementation of four functions at once - Eof, SeekEof, Eoln and
SeekEoln - and demonstrate plainly to these strange people that the original version of
SeekEof never did any of the idiotic things that the code of its Free Pascal implementation
contained for some reason. Apparently, those who had tried to report the bug earlier didn't
have a Borland Pascal box in their closet, and the Free Pascal maintainers would not settle
for anything less.
The most interesting thing is that the same character, with whom your humble servant spent
dozens of hours in fruitless discussion, reluctantly agreed to fix the SeekEof function
itself, but "at the same time" completely broke another function in the code and refused to
discuss these actions of his; and there were no other Free Pascal maintainers who had access
to the code and were willing to cooperate.

Anyway, at the time the second edition of this book was released in early 2021, the latest
official release of Free Pascal was still 3.2.0, in which SeekEof still contains the results
of someone else's idiocy, by now 19 years old. To get a fixed version of SeekEof, you need
to download an archive of the latest development version from freepascal.org and work with
that; no one can say, of course, when the next official release will come, but there is at
least some hope that SeekEof will work correctly in the next version.
This whole story, unfortunately, shows that using Free Pascal as a professional tool would be
extremely imprudent. The author would have nothing to do with this project at all if there
were any other live implementations of Pascal left in the world.
Just in case, let us also mention the SeekEoln function, which returns true when the end of
the current line is reached. Like SeekEof, it reads and discards whitespace characters along
the way. This function may be needed, for example, when the input data format involves sets of
numbers of varying size grouped on separate lines.
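As a sketch of such usage (the program name here is ours), the following sums the numbers on each input line separately, printing one result per line:

program SumEachLine;
var
    n, sum: longint;
begin
    while not SeekEof do
    begin
        sum := 0;
        while not SeekEoln do    { any more numbers on this line? }
        begin
            read(n);
            sum := sum + n
        end;
        readln;                  { consume the line feed }
        writeln(sum)
    end
end.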
2.6. Pascal's type system
2.6.1. Built-in types and custom types
The variables and expressions we have used so far belong to different types (integer and
longint, real, boolean and some others), but all these types have one thing in common:
they are built into the Pascal language, that is, we don't need to explain to the compiler
what these types are - it already knows about them.

The matter is not limited to built-in types: Pascal allows us to create new types of
expressions and variables ourselves. In most cases new types are named, using identifiers -
just like variables, constants and subroutines; in some cases anonymous types are also used,
i.e., as the name implies, types that are not named, although their use in Pascal is somewhat
limited. All types introduced by the programmer (i.e., not built-in) are called user-defined
types, because they are introduced by the user of the compiler; clearly, this user is the
programmer, who of course should not be confused with the user of the resulting program.
To describe user-defined types and their names, type description sections are used, starting
with the keyword type, just as variable description sections start with the word var. As
with variables, a type description section can be located in the global area or locally in a
subroutine (between the subroutine header and its body). Like variables, types described in a
subroutine are visible only in that subroutine, while types described outside subroutines
(i.e., in the global area) are visible from the place where the type is described to the end
of the program.
The simplest variant of a new type is a synonym of some type we already have,
including a built-in type; for example, we can describe the type MyNumber as a
synonym of the type real:

type
MyNumber = real;

after which we will be able to describe variables of type MyNumber:


var
    x, y, z: MyNumber;

You can do everything with these variables that you can do with variables of type real;
moreover, you can mix them in expressions with variables and other values of type real,
assign them to each other, and so on. At first glance, introducing such a synonym may seem
pointless, but sometimes it turns out to be useful. For example, while writing some relatively
complex program, we may be in doubt about what capacity of integers will be enough for a
particular sub-problem. If, say, we decide that ordinary two-byte integers are enough for us,
and then (during testing, or even once the program is in use) it turns out that their capacity
is insufficient and the program makes errors due to overflows, then, to replace the integers
with longint or even int64, we will have to comb through the program text carefully,
identifying the variables that should work with numbers of higher capacity and changing their
types; in doing so, we risk missing something or, on the contrary, turning into a four-byte
variable one for which two bytes were enough.
Some programmers cope with this problem, as they say, on the cheap: they simply use longint
always and for everything. On closer examination this approach turns out to be not too
successful: first, the capacity of longint may also prove insufficient, so that int64 is
needed; second, thoughtless use of variables of higher capacity than required sometimes leads
to noticeable memory waste (for example, when working with large arrays) and to slowing down
the program (if you use int64 instead of integer on a 32-bit system, the program's speed
may drop several times).
Introducing synonym types allows us to handle this problem more elegantly. Suppose we are
writing a traffic simulator in which every object participating in the simulation has its own
unique number, and we also need loops running through all such numbers; at the same time, we
don't know for sure whether 32767 objects will be enough to achieve the required simulation
goals. Instead of guessing which type to use - integer or longint - we can introduce our
own type, or rather, a type name:

type
    SimObjectId = integer;

Now we will use the type name SimObjectId (instead of integer) wherever we need to store
and process a simulation object identifier, and if we suddenly notice that the number of
objects in the model is dangerously approaching 32000, we can replace integer with longint
in a single place in the program - in the description of the SimObjectId type - and the rest
will happen automatically. By the way, if a negative value of an object identifier is never
needed in the program, we can instead replace integer with the unsigned type word, which
suits this purpose even better.
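The change itself is then literally one line; a sketch:

type
    SimObjectId = longint;   { was: integer; nothing else in the program changes }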

2.6.2. Ranges and enumerated types


The synonym names considered in the previous paragraph did not, generally speaking, create any
new types; they only introduced new names for existing ones. This paragraph is devoted to the
two simplest cases of creating genuinely new types.

Perhaps the simplest case, requiring the least explanation, is a range of integers. A variable
of such a type can take integer values, but not all of them - not even all those determined by
its bit capacity - but only values from a given range. For example, we know that decimal
digits have values from zero to nine; we can reflect this fact by describing a corresponding
type:

type
    digit10 = 0..9;
var
    d: digit10;

The variable d described in this way can take only ten values: integers from zero to
nine. Otherwise, you can work with this variable in the same way as with other integer
variables.
It is worth noting that (at least in Free Pascal) the machine representation of a number in
such a variable is the same as the machine representation of an ordinary integer, so the size
of a variable of a range type equals the size of the smallest variable of a built-in integer
type that can accept all values of the given range. Thus, a variable of type digit10 will
occupy one byte; a variable of the range type 15000..15010 will occupy two bytes, oddly
enough, because the smallest type that can accept all values of this range is integer.
Similarly, a variable of the range type 60000..60010 will also occupy two bytes, since a
variable of type word can take all these values, and so on. At first glance this seems a bit
odd, since each of these two ranges involves only 11 different values; clearly, one byte would
be sufficient. But the point is that, by using a single byte in these situations, the compiler
would be forced to represent the numbers of the range differently from regular integers, and
for every arithmetic operation on them it would have to insert an additional addition or
subtraction into the machine code to bring the machine representation to the form handled by
the CPU. There is nothing impossible in this, but under modern conditions program performance
is almost always more important than the amount of memory used.

Range types are not limited to integers; for example, we can specify a subset of
characters:

type
    LatinCaps = 'A'..'Z';

We will return to this issue when we study the concept of an ordinal type. For now, let
us note one more very important point: when defining a range, only compile-time
constants can be used to specify its boundaries (see §2.2.15).
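For example (a sketch of ours), a named constant may serve as a range boundary, since it is known at compile time:

const
    MaxIndex = 100;
type
    IndexRange = 1..MaxIndex;   { fine: MaxIndex is a compile-time constant }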
Another simple kind of user-defined type is the so-called enumerated type; an expression of
such a type can take one of the values explicitly listed in the type description. The values
themselves are specified by identifiers. For example, to describe the colors of a rainbow, we
could define the type
type
    RainbowColors =
        (red, orange, yellow, green, blue, indigo, violet);
var
    rc: RainbowColors;

The variable rc can take one of the values listed in the type description; for example,
you could do this:

rc := green;

In addition to assignment, values of enumerated types can be compared (both for equality and
inequality, and for order, i.e. using the operations <, >, <= and >=). Furthermore, for a
value of an enumerated type you can find the previous and the next value; this is done by the
built-in functions pred and succ. For example, with the RainbowColors type from our
example, the expression pred(yellow) has the value orange, and the expression succ(blue)
has the value indigo. Attempting to compute the value preceding the first or following the
last will do no good, so before applying succ and pred you should make sure that the
expression used as the argument does not have, respectively, the last or the first value of
the type.
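A typical guarded use looks like this (a sketch):

{ advance rc to the next rainbow colour, if there is one }
if rc <> violet then
    rc := succ(rc);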
Note that succ and pred are defined not only for enumerated types; but more on that later.

Values of an enumerated type have ordinal numbers, which can be found using the built-in
function ord, already familiar to us [159]; the numbers always start at zero, and the number
of each next value exceeds the number of the previous one by one.

[159] Recall that we met this function when we worked with character codes; see §2.5.1.
In classical Pascal, the constants specifying an enumerated type are equal only to themselves
and nothing else; the phrase "explicitly set the value of a constant in an enumeration" is
simply meaningless from the point of view of classical Pascal. At the same time, modern
dialects of Pascal, including Free Pascal, under the influence of C (in which similar
constants are simply integers), have been somewhat modified and allow numbers to be set
explicitly for the constants in enumerations, giving rise to a strange (not to say nonsensical)
type that is considered an ordinal type but does not allow succ and pred to be applied, and
in fact gives nothing useful, because (again unlike C) Pascal has facilities specifically
designed for describing compile-time constants (see page 279), and not only integer ones. We
will not consider such "explicit number values", as we will not consider many other facilities
available in Free Pascal.
Note that a constant specifying a value of an enumerated type can be used in only one such
type; otherwise the compiler would be unable to determine the type of an expression consisting
of that constant alone. Thus, having described the RainbowColors type in a program and then,
forgetting about it, describing (say, to simulate traffic lights) something like
type
    Signals = (red, yellow, green);

we will get an error. Therefore, programmers often prefix the constants of enumerated types
with prefixes mnemonically related to the type. For example, we could avoid the conflict like
this:

type
    RainbowColors = (
        RcRed, RcOrange, RcYellow, RcGreen,
        RcBlue, RcIndigo, RcViolet
    );
    Signals = (SigRed, SigYellow, SigGreen);

2.6.3. The general concept of ordinal type


An ordinal type in Pascal is a type satisfying the following conditions:

1. all possible values of the type are numbered with integers, and this is done in some
natural way;

2. comparison operations are defined on the values of the type, and the element with the lower
number is considered the smaller;

3. for the type one can specify the smallest and the largest value; for all values except the
largest the next value is defined, and for all values except the smallest the previous value
is defined; the ordinal number of the previous value is one less, and that of the next value
one more, than the number of the value itself.

Pascal has the built-in function ord for computing the ordinal number of an element, and the
functions pred and succ for computing the previous and the next element, respectively. All
these functions are already familiar to us, but now we can pass from special cases to the
general one and state what these functions actually do and what their scope of application is.

The ordinal types are:

• the boolean type; its smallest value false has number 0, and its largest value true
has number 1;

• the char type; its element numbers correspond to the character codes, the smallest element
being #0 and the largest #255;

• the integer types, both signed and unsigned [160];

• any range type;

• any enumerated type.

[160] The Free Pascal compiler does not consider the 64-bit integer types, i.e. int64 and
qword, to be ordinal. This restriction stems from an arbitrary decision of the compiler's
creators and has no grounds other than some simplification of the implementation.
The real type, on the other hand, is not an ordinal type; this is quite understandable,
because floating-point numbers (actually binary fractions) do not admit any "natural"
numbering. The functions ord, pred and succ cannot be applied to values of type real:
this causes a compilation error. In general, no types other than those listed above - that is,
other than boolean, char, the integers (up to and including 32 bits), enumerated types and
ranges - are ordinal in Pascal.
The concept of an ordinal type is very important in Pascal, because in many situations any
ordinal type is allowed but no other type is. We have already seen one such situation: a range
can be specified as a subrange of any ordinal type - and of no other. Further on we will
encounter more situations where an ordinal type is required.

2.6.4. Arrays
The Pascal language allows you to create complex variables, which differ from simple ones in
that they themselves consist of variables (of a different, "simpler" type, of course). There
are two main kinds of complex variables: arrays and records; we will start with arrays.

An array in Pascal is a complex variable consisting of several variables of the same type,
called the array's elements. Array elements are referred to using so-called indices - values
of some ordinal type, most often ordinary integers, though not necessarily; the index is
written in square brackets after the array name. If, for example, a is a variable of an
array type indexed by integers, then a[3] is the element of the array a with number
(index) 3. In square brackets you can specify not only a constant but an arbitrary expression
of the appropriate type, which allows indices to be computed during program execution (without
this feature arrays would be pointless).
Since an array is, in the end, a variable, even if a "complex" one, this variable must have a
type; that type can be described and named like other types. For example, if we plan to use
arrays of one hundred numbers of type real in a program, the corresponding type can be
described as follows:

type
    real100 = array [1..100] of real;

Now real100 is a type name; a variable of this type consists of one hundred variables of
type real (the array's elements), which are provided with numbers (indices), the integers
from 1 to 100 serving as these indices. Having introduced the type name, we can describe
variables of this type, for example:
var
    a, b: real100;

The a and b described in this way are arrays; they consist of the elements a[1], a[2],
..., a[100] and b[1], b[2], ..., b[100]. For example, if for some reason we need to
study the behavior of the sine function in the vicinity of zero, we can start by forming the
sequence of numbers 1, 1/2, 1/4, 1/8, ..., 2^(-99) in the elements of the array a:

a[1] := 1;
for i := 2 to 100 do
    a[i] := a[i-1] / 2;

and then enter the corresponding sine values into the elements of the array b:

for i := 1 to 100 do
    b[i] := sin(a[i]);

and finally print the entire resulting table:

for i := 1 to 100 do
    writeln(a[i], ' ', b[i]);

Note that here we perform all operations on array elements rather than on the arrays
themselves; but an array can also be handled as a whole when necessary, because it is a
variable. In particular, arrays can be assigned to each other:

a := b;

But you should think carefully before doing so, because this assignment copies all the
information from the memory area occupied by one array into the memory area occupied by the
other, which can be relatively time-consuming, especially if done often, say, in a loop.
Similar care should be taken when passing arrays as parameters to procedures and functions; we
will come back to this question. It is important to remember that only arrays of the same type
can be assigned to each other; if two arrays belong to different types, even ones described in
exactly the same way, they cannot be assigned.
As we have already mentioned, it is in principle possible not to give the type a name but to
describe a variable of that type right away; the type itself is then considered anonymous.
This applies to array types as well: for example, we could write

var
    a, b: array [1..100] of real;

- and in a simple task where no more arrays of this type are needed, we would notice no
difference at all. But if at some point, for example locally in some procedure, we need one
more such array and describe it:

var
    c: array [1..100] of real;

- then this array will be incompatible with the arrays a and b, i.e. they cannot be
assigned to one another, even though they are described in exactly the same way. The point is
that formally they belong to different types: every time the compiler sees such a variable
description, another type is actually created, but it is not given a name; in our example the
first such type was created when describing the arrays a and b, and the second (exactly
the same, but new) when describing the array c.

The specification of index bounds deserves special attention. The syntactic construction
"1..100" definitely reminds us of something: range types were described in exactly the same
way, and this is no accident. In Pascal, any ordinal type - and nothing else - can be used to
index arrays, and when describing an array type you actually specify the type to which the
index values must belong; most often it is a range, and an anonymous one, but not necessarily.
Thus, we could describe the real100 type in a more verbose way:

type
    from1to100 = 1..100;
    real100 = array [from1to100] of real;

Here we first describe the range type explicitly, giving it the name from1to100, and then
use this name when describing the array. There are also arrays whose indices are not ranges
but something else. For example, if in our task we have balls of the seven colors of the
rainbow, we process the colors using the enumerated type RainbowColors (see page 336), and
at some point we need to count how many balls of each color there are, an array of the
following type may be convenient:

type
    RainbowCounters = array [RainbowColors] of integer;

You can use char or even boolean as an array index:

type
    CharCounters = array [char] of integer;
    PosNegAmount = array [boolean] of real;

The first of these types defines an array of 256 elements (of type integer) corresponding
to all possible values of type char (i.e., to all possible characters); such an array may be
needed, for example, when analyzing the character frequencies of a text. The second type gives
two elements of type real corresponding to the logical values; to understand where such a
strange construction can be useful, imagine a geometric problem involving sums of areas of
certain figures, where the summation must be performed separately for the figures satisfying
some condition and for all the others.
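As a sketch of the first case (combining the CharCounters type just described with the end-of-file loop from §2.5.3), counting how often each character occurs in the standard input stream might look like this:

var
    counters: CharCounters;
    c: char;
begin
    for c := chr(0) to chr(255) do    { start all 256 counters at zero }
        counters[c] := 0;
    while not eof do
    begin
        read(c);
        counters[c] := counters[c] + 1
    end;
    { ...now the counters can be printed, e.g. for visible characters only... }
end.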
When describing array variables, you can initialize them, i.e. set the initial values of their
elements. The values are listed in parentheses, separated by commas, for example:

type
    arr5 = array[1..5] of integer;
var
    a: arr5 = (25, 36, 49, 64, 81);

Let us note one more important point: each array element is a full-fledged variable. You can
not only assign values to elements but also, for example, pass them to procedures and
functions as variable parameters (if you don't remember what that is, be sure to reread
§2.3.4). Looking ahead, we note that, like any variable, an array element has an address in
memory, and this address can be obtained and used; we will learn how to do this in chapter
2.10.

Arrays are indispensable in problems where it is clear from the statement that we must
maintain many values of the same type that differ by their numbers. As an example, let's
consider the following problem.

A computer science Olympiad was being held in the city of N. A lot of participants were
expected, so it was decided that they would register for the Olympiad right at their
schools. Since there are only 67 schools in the city, and no more than two or three dozen
students from each school can take part in the Olympiad, the organizers decided to arrange
the numbering of the participants' cards as follows: the number consists of three or four
digits, the two lower digits give the number of the participant among the students of the
same school, and the remaining one or two higher digits give the number of the school; for
example, in school No. 5 the future participants of the Olympiad were given cards with the
numbers 501, 502, ..., and in school No. 49 with the numbers 4901, 4902, etc.
The schoolchildren who came to the Olympiad presented their cards to the organizers, and
the organizers entered lines of the following form into the text file olymp.txt: first
the card number, then, after a space, the surname and first name of the participant.
Naturally, the participants showed up at the Olympiad in a completely arbitrary order, so
the file could contain, for example, the following fragment:

5803 Ivanov Vasily
401 Lavrukhina Olga
419 Gorelik Oksana
2102 Borisov Andrey

On the day of the Olympiad there was a soccer match between popular teams in the city, as
a result of which not all the students who had registered at their schools came to the
Olympiad - some preferred to go to the stadium. It is required to find out from which
schools the largest number of participants came.

At first glance, this task may seem difficult to solve, because we haven't yet learned
how to work with files or how to process strings in Pascal; but, strangely enough, we
don't need to. We can ignore the names of the students, because we only need the school
number, which is extracted from the number of the participant's card by integer
division of this number by 100. As for the files, we can compensate for our lack of
knowledge with our ability to redirect the standard input stream: in the program we
will read the information by the usual means - the readln operator (after reading
a number, this operator discards everything remaining up to the end of the line, which is
exactly what we need), and we will run the program redirecting its standard input from the
olymp.txt file. We will read until the file is finished; we already know how to do
this from §2.5.3.
While reading, we will have to count the number of students for each of the 67
schools in the city, i.e. we will need to maintain 67 variables at the same time; arrays
are invented just for such situations. To be on the safe side, we will put the number 67
in a named constant at the beginning of the program. In the same way, we will put the
maximum allowed number of students from one school into a named constant; it
corresponds to the number by which we need to divide the card number to get the
school number (in our case it is 100):

program OlympiadCounter;
const
MaxSchool = 67;
MaxGroup = 100;

In principle, we need exactly one array in our program, so we could leave its type
anonymous; but we won't do that, and will instead give our array's type a name of its own:

type
CountersArray = array [1..MaxSchool] of integer;

The variables we will need are, first, the array itself; second, we will need integer
variables to loop through the array, to read card numbers from the input stream, and to
store the school number. Let's describe these variables:

var
    Counters: CountersArray;
    i, c, n: integer;

Now we can write the main part of the program, which we start by zeroing out all the
elements of the array; indeed, so far we haven't seen a single participant card, so the
number of participants from each school should be zero:

begin
    for i := 1 to MaxSchool do
        Counters[i] := 0;

Now we need to organize the reading loop. As we have already agreed,
we will read until the eof function tells us that there is nothing more to read; we will
use the readln operator to read. Just in case, we will not assume that the data in the
standard input stream is correct, because the file was typed by people, and people make
mistakes. Therefore, after each number entered, we will check whether our readln
operator managed to correctly convert the entered sequence of characters into a number.
Recall that this is done using the IOResult function (see page 319). To make this
work, we must remember to inform the compiler of our intention to handle I/O errors
ourselves; as we know, this is done with the {$I-} directive.
Finally, the last important point is that before accessing an array element to try to
increment it, we must be sure to check that the resulting school number is valid. For
example, if the operator accidentally makes a mistake and enters some invalid card
number like 20505, our program will crash when trying to access the array element
with the number 205; we should not allow this to happen, it is better to inform the
user that we have found an invalid school number in the file.
If any error is detected, we should stop processing the file at this point, as we will
never get the correct results anyway; we can terminate the program by telling the
operating system that we are terminating unsuccessfully (see page 310). Our complete
reading cycle will look like this:

{$I-}
while not eof do
begin
    readln(c);
    if IOResult <> 0 then
    begin
        writeln('Incorrect data');
        halt(1)
    end;
    n := c div MaxGroup;
    if (n < 1) or (n > MaxSchool) then
    begin
        writeln('Illegal school id: ', n, ' [', c, ']');
        halt(1)
    end;
    Counters[n] := Counters[n] + 1
end;

The next step of processing the obtained information is to determine what number of
students from the same school is a "record", i.e. simply to determine the maximum
value among the elements of the Counters array. This is done as follows. To begin
with, we will declare school #1 to be the "record" school, no matter how many students
arrive from there (even if none). To do this, we will put its number, i.e. the number 1,
into the variable n; this variable will store the number of the school that is currently
(for now) considered to be a "record" school. Then we will look through the
information for all other schools, and every time the number of students from the next
school exceeds the number of students from the school that has been considered the
"record" school so far, we will assign to variable n the number of the new
"record" school:
n := 1;                        { give the record to the first school }
for i := 2 to MaxSchool do     { go through the rest }
    if Counters[i] > Counters[n] then   { new record? }
        n := i;                { update the number of the "record" school }

By the end of this loop, all the counters will have been looked through, so variable n
will contain the number of one of the schools from which the maximum
number of students came; in general, there may be more than one such school (for
example, the Olympiad may have had 17 participants from schools #3, #29 and #51,
and fewer from all other schools). What remains to be done is what the program is
written for: to print the numbers of all schools from which exactly as many students
came as from the one whose number is in the variable n. This is quite simple: we look
through all schools in order, and if there are as many students from this school as from
the n'th school, we print its number:

for i := 1 to MaxSchool do
if Counters[i] = Counters[n] then writeln(i)

The only thing left to do is to end the program with the word end and a dot. The
whole text of the program is as follows:
program OlympiadCounter; { olympcount.pas }
const
    MaxSchool = 67;
    MaxGroup = 100;
type
    CountersArray = array [1..MaxSchool] of integer;
var
    Counters: CountersArray;
    i, c, n: integer;
begin
    for i := 1 to MaxSchool do
        Counters[i] := 0;
    {$I-}
    while not eof do
    begin
        readln(c);
        if IOResult <> 0 then
        begin
            writeln('Incorrect data');
            halt(1)
        end;
        n := c div MaxGroup;
        if (n < 1) or (n > MaxSchool) then
        begin
            writeln('Illegal school id: ', n, ' [', c, ']');
            halt(1)
        end;
        Counters[n] := Counters[n] + 1
    end;
    n := 1;
    for i := 2 to MaxSchool do
        if Counters[i] > Counters[n] then
            n := i;
    for i := 1 to MaxSchool do
        if Counters[i] = Counters[n] then
            writeln(i)
end.

2.6.5. The record type


As we have already mentioned, Pascal supports two kinds of complex types - arrays
and records. While a variable of the array type, as we have seen, consists of variables
(elements) of the same type that differ in number (index), a variable of the record
type consists of variables called fields, whose types are generally different. Unlike
array elements, record fields are distinguished not by numbers, but by names.

For example, in orienteering competitions, each checkpoint to be passed by the
participants may be characterized, first, by its number; second, by its coordinates,
which can be written as fractional numbers in degrees of latitude and longitude (by
default, for example, we can assume that our latitudes are northern and longitudes are
eastern, while southern latitudes and western longitudes are denoted by negative
values); third, some checkpoints may be "hidden", i.e. not shown on the maps given to
competitors; usually the location of such checkpoints becomes known when other
checkpoints are taken; finally, each checkpoint is associated with a penalty for not
taking it, usually an integer number expressed in minutes. A special type can be created
to represent the checkpoint information:
type
    Checkpoint = record
        n: integer;
        latitude, longitude: real;
        hidden: boolean;
        penalty: integer;
    end;

Here the Checkpoint identifier is the name of the new type, record is the
keyword for the record, and the description of the record fields follows in the same
format as we describe variables in the var sections: first the variable names (in this
case, fields) separated by commas, then a colon, the type name, and a semicolon. A
variable of type Checkpoint would thus be a record with integer fields n and
penalty, latitude and longitude fields of type real, and a
hidden field of logical type. Let's describe such a variable:
var
cp: CheckPoint;

The cp variable occupies as much memory as is needed to accommodate all its
fields; the fields themselves are accessed "via the dot": since cp is the
name of a variable of the record type, cp.n, cp.latitude, cp.longitude,
cp.hidden and cp.penalty are its fields, which are also, of course,
full-fledged variables. For example, we can write the data of some checkpoint into the variable cp:
cp.n := 70;
cp.latitude := 54.83843;
cp.longitude := 37.59556;
cp.hidden := false;
cp.penalty := 30;

As in the case of arrays, when working with a record, most, if not all, of the actions are
performed on its fields; the only thing that can be done with the record itself as a single
entity is assignment. But the fields themselves (again, like array elements) are full-
fledged variables; you can do everything with them that is usually done with variables
- you can assign values to them, pass them to subroutines via variable parameters, etc.
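
For instance (the variable backup is introduced just for this sketch), assigning one record to another copies all its fields at once:

var
    backup: Checkpoint;
begin
    { ... cp is filled in as shown above ... }
    backup := cp;          { all five fields are copied }
    backup.penalty := 0    { only the copy changes, cp is intact }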

2.6.6. Designing complex data structures


The field of a record may well be an array or even another record (although this
option is rarely needed). Similarly, a record can be an array element; for example, we
could, at the beginning of the program, set a constant indicating the total number of
checkpoints:

const
MaxCheckPoint = 75;

and, after describing the Checkpoint type (as done in the previous paragraph),
introduce a type for an array that can hold information about all the checkpoints of the
course, together with a variable of that type:

type
    CheckPointArray = array [1..MaxCheckPoint] of Checkpoint;
var
    track: CheckPointArray;

The resulting data structure in the track variable corresponds to our idea of a table.
When we build a table on paper, we usually write down the names of its columns at
the top, and then we have rows containing "records". In the data structure just built,
the role of the table header with column names is played by the field names - n,
latitude, longitude, hidden and penalty. The table rows correspond to
array elements: track[1], track[2], etc., each of which represents a
Checkpoint record, and a separate cell of the table is a field of the corresponding
record: track[7].latitude, track[63].hidden, etc.
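
As a small sketch of processing such a "table" (the variable names total and i are chosen for the example), here is a loop that sums the penalties over all hidden checkpoints:

var
    total, i: integer;
begin
    total := 0;
    for i := 1 to MaxCheckPoint do
        if track[i].hidden then
            total := total + track[i].penalty;
    writeln('Total penalty for hidden checkpoints: ', total)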
An array element can also be another array. This situation is so frequent that it has
its own name - "multidimensional array", and a special syntax to describe it: when
specifying an array type, we can write not one type of index in square brackets, but
several, separated by a comma. For example, the following type descriptions are
equivalent:
type
    array1 = array [1..5, 1..7] of integer;
    array2 = array [1..5] of array [1..7] of integer;

Similarly, when referring to an element of a multidimensional (in this case, two-dimensional)
array, we can equally well write a[2][6] and a[2,6]. The most obvious use of
multidimensional arrays is to represent the mathematical concept of a matrix; for example,
when solving a system of linear equations, we might need a matrix type like

type
    matrix5x5 = array [1..5, 1..5] of real;
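
For instance, a minimal sketch of filling such a matrix with the identity matrix using two nested loops (the variable names are chosen for the example):

var
    m: matrix5x5;
    i, j: integer;
begin
    for i := 1 to 5 do
        for j := 1 to 5 do
            if i = j then
                m[i, j] := 1.0
            else
                m[i, j] := 0.0;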

Multidimensional arrays can also be fields of records, just as records can be their elements.
There are no formal restrictions on the depth of nesting of type descriptions in Pascal, but you
should not get carried away with it: computer memory is by no means infinite, and in addition,
some specific situations may impose additional restrictions. For example, you should think
ten times before making a huge array a local variable of a subroutine: local variables are
located in stack memory, which may be much smaller than you expect.

2.6.7. User-defined types and subprogram parameters


When passing values (and variables) having user-defined types into subroutines, you
should take into account some points that are not quite obvious for beginners.
First of all, it should be noted that it is forbidden to use anonymous types when
describing parameters in subprogram headers. In other words, an attempt to write
something like the following will cause an error:

procedure p1(b: 1..100);    { error! }
begin
    writeln(b)
end;

In order for a type to be used when passing a parameter to a subprogram, this type must be
described in the type description section and named:

type
    MyRange = 1..100;

procedure p1(b: MyRange);
begin
    writeln(b)
end;

The same can be said about returning a value from a function: only types with names are
suitable for this.
The second point worth mentioning concerns passing values of complex types that occupy
a large amount of memory into subroutines (and returning them from functions). Pascal, in
principle, does not forbid this: you can pass a large record or array to a procedure, and
the program will compile successfully and even work somehow. You only need to
remember two things. First, as already mentioned, local variables (including value
parameters) are located in the area of the hardware stack, where there may be little
space. Second, copying large amounts of data itself takes non-zero time, which you will feel
very well if you do such things often - for example, calling a subroutine in a loop and
passing it a large array as a value parameter each time.
Therefore, if possible, you should refrain from passing values that occupy a significant
amount of memory into subroutines; if you cannot avoid it, it is better to use a var-parameter
even if you do not intend to change anything in the passed variable. The point is that when
a value is passed through a var-parameter, no copying takes place. We have already said
that the local name (var-parameter's name) becomes a synonym of the variable specified
during the subprogram's work; note that this synonym is realized at the machine code level
by passing an address, and the address does not take much space - 4 bytes on 32-bit systems
and 8 bytes on 64-bit systems.
So, if you work with variables of significant size, it is best not to pass them
as parameters to subroutines at all; if you do have to, pass them as var-parameters;
and only in a very extreme case (which usually does not occur) can you try
to pass such a variable by value and hope that everything will be fine.
The question naturally arises as to which variables are considered to be of "significant
size". Of course, a size of 4 or 8 bytes is not "significant". You need not worry much about
copying 16 or even 32 bytes, but as the size of your type grows further, passing
objects of this type by value first becomes "undesirable", then, somewhere around 128
bytes, "highly undesirable", and somewhere around the 512-byte level it becomes
unacceptable, even though the compiler will not object. If you plan to pass a data structure
occupying kilobytes or more by value, at least try not to show your source code to anyone:
there is a great risk that, having seen such a trick, people will not want to do business
with you anymore.

2.6.8. Type conversions


If you describe a variable of type real in a program and then assign an integer to it:

var
    r: real;
begin
    { ... }
    r := 15;

- then, strangely enough, nothing terrible will happen: the variable r will contain the value
15.0, which is absolutely correct and natural for such a situation. In the same way, you can
assign to a variable of type real the value of an arbitrary integer expression, including
those calculated during program execution; the value assigned will be, of course, a floating-
point number, the integer part of which is equal to the value of the expression, and the
fractional part is zero.
Beginners often don't think about what actually happens in this situation; meanwhile, as
we know, the machine representations of integer 15 and floating-point 15.0 are quite
different from each other. A normal assignment, where you put the value of an expression of
exactly the same type into a variable, is reduced (at the level of machine code) to simply
copying information from one place to another; assigning an integer to a floating-point
variable cannot be realized in this way, you have to convert one representation into another.
This is how we come to the notion of type conversion.
The case of "magical conversion" of an integer into a fractional one refers to the so-called
implicit type conversions; this is what is said when the compiler converts the type of an
expression by itself, without direct instructions from the programmer. The possibilities of
implicit conversions in Pascal are rather modest: you are allowed to implicitly convert
numbers of different integer types into each other and into floating-point numbers, plus you
can convert floating-point numbers themselves into each other, of which there are also several
types - single, double and extended, and the familiar real is a synonym of
one of the three, in our case - double. A little later we will meet with implicit conversion
of a character into a string, but that's all over; the compiler won't convert anything else on its
own initiative. Implicit conversions occur not only in assignments, but also in subroutine
§ 2.6. Pascal's type system 349
calls: if a procedure expects a parameter of type real, we can easily specify an integer
expression when calling it, and the compiler will understand us.
Note that the compiler agrees to "magically transform" an integer number into a floating-
point number, but it refuses to do the conversion in the opposite direction: if we try to assign
a floating-point value to an integer variable, an error will be generated during
compilation. The point here is that there are several ways to convert a fractional number
into an integer, and the compiler refuses to choose one for us; we must specify
exactly how the number should be rid of its fractional part. There are two
options: conversion by discarding the fractional part, and mathematical rounding; the first
is performed by the built-in trunc function, the second by round. So, if we have
three variables:
var
    i, j: integer;
    r: real;

- then after performing the assignments
r := 15.75;
i := trunc(r);
j := round(r);

variable i will contain the number 15 (the result of simply discarding the fractional part),
and variable j the number 16 (the result of rounding to the nearest integer). Strictly
speaking, the round and trunc functions do not perform a type conversion but a kind of
computation, because in the general case (when the fractional part is non-zero) the resulting
value differs from the initial one. These functions are related to the discussion of type
conversions only indirectly: it would not be quite fair to say that the corresponding
implicit conversion is forbidden without explaining what to do when it is needed.
Let us note one important point. Implicit transformations can be applied to expression
values, but not to variables; this means that when passing a variable to a subprogram via
var-parameter, the type of the variable must exactly match the type of the
parameter, no liberties are allowed here; for example, if your subprogram has a var-
parameter of type integer, you cannot pass it a variable of type longint, or
of type word, or of any other type - only integer. This has a simple explanation: the
machine code of the subroutine is generated for a variable of the integer type and
does not and cannot know anything about what actually happened at the call point, so if
the compiler allowed passing, for example, a variable of the longint type instead of
integer, the subroutine would have no way of knowing that it was given a variable of the
wrong type. With values (unlike variables) everything is much simpler: the
compiler performs all the conversions at the call point, and the body of the subroutine
receives a value of exactly the type it expects.
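
To make this concrete, here is a hedged sketch (the procedure p and both variables are made up for the example); the first call compiles, the second does not:

procedure p(var x: integer);
begin
    x := x + 1
end;

var
    a: integer;
    b: longint;
begin
    p(a);   { fine: the types coincide exactly }
    p(b)    { compile-time error: longint is not integer }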
In addition to implicit conversions, Free Pascal (following the Turbo Pascal and Delphi
compilers, but unlike the classic variants of Pascal, where there was nothing of the kind)
also supports explicit type conversions, and
such conversions are possible for both expression values and variables. The word "explicit" here
means that the programmer himself - explicitly - indicates that here it is necessary to convert an
expression to a different type, or to temporarily consider a variable to be of a different type. In Pascal,
the syntax of such an explicit conversion resembles a function call with one parameter, but instead
of the function name, the type name is specified. For example, the expressions integer('5')
and byte('5') would both have an integer value of 53, in the first case a two-byte signed value,
in the second a one-byte unsigned value. Similarly, we can describe a variable of type char and
temporarily consider it as a variable of type byte:

var
c: char;
begin
{ ... }
byte(c) := 65;

Of course, not all types can be converted to each other. Free Pascal allows explicit conversions in
two cases: (1) ordinal types can be converted to each other without restriction, and this applies not
only to integers, but also to char, boolean, and enumerated types; and (2) types that have the
same machine representation size can be converted to each other. As for converting variables
(rather than values), the first case does not apply to them; matching sizes are required.
You should be careful with explicit type conversions, because what actually happens will not
always match your expectations. For example, it is better to obtain the code of a character using the
ord function, and to create a character from a given code using chr: they are specially designed for
this purpose. If you are not quite sure that you need an explicit type conversion, or if you
know how to do without it, it is better not to use it at all. Note that you can
always do without such conversions; but in some complicated situations the ability to convert
types saves effort - the other side of the question is that you will not meet such tricky
cases for quite a while.
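
For example, a small sketch of the recommended approach:

var
    c: char;
    code: integer;
begin
    c := 'A';
    code := ord(c);       { 65, the code of the letter 'A' }
    c := chr(code + 1)    { 'B', the character with code 66 }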

2.6.9. String literals and arrays of char


We have already met strings in the form of string literals - text fragments enclosed
in apostrophes; until now, we have encountered them only in write and writeln operators,
and we wrote them for one and only purpose: to print them immediately.
Meanwhile, as we have already discussed in the introductory part, text is the most
universal representation for almost any information; the only exceptions are data obtained by
digitizing various analog processes - photos, sound and video, which could also,
in principle, be represented as text, but very inconveniently. Everything else, including, for
example, pictures drawn by a person with the help of graphic editors, is convenient and
practical to represent in the form of text; the obvious consequence of this is the urgent need
to be able to write programs that process textual information.
We have already met with the simplest cases of text processing in the paragraph devoted
to filter programs (see §2.5.3), where we obtained one text from another text, which was
analyzed using character-by-character reading. In practice, processing texts one character at
a time often turns out to be inconvenient, and we want to treat text fragments - i.e., strings -
as a whole.
Since a string is a sequence of characters, and characters can be represented by values of
type char, it is logical to assume that arrays of elements of type char can be used to
process strings. This is indeed possible; moreover, for such arrays Pascal allows some rather
unusual actions that cannot be done with any other arrays. For example, we know that arrays
can be assigned to each other only if they are of strictly the same type; but to an array consisting
of elements of type char, Pascal allows you to assign a value given by a string literal, that is,
a familiar sequence of characters enclosed in apostrophes:
program HelloString;
var
    hello: array [1..30] of char;
begin
    hello := 'Hello, world!';
    writeln(hello)
end.

This program will successfully pass compilation and even, at first glance, work normally,
printing the sacramental "Hello, world!". But only at first glance; in fact, after the usual
phrase, the program will print 17 more characters with code 0, and only then will print a
line feed and terminate; you can easily verify this in various ways: redirect the output to a file
and look at its size, run the program as a pipeline in conjunction with the familiar wc
program, apply hexdump (you can apply it to the output at once, or to the resulting
file). The character with code 0, when printed, does not show itself on the screen - the
cursor does not even move - so these 17 "extra" zeros are invisible; but they are
there and can be detected. What is especially unpleasant is that this invisible "null
character" has no right to occur in text data at all, so, formally speaking, the output of our
program is no longer correct text.
It is easy to guess where these parasitic "zeros" came from: the phrase "Hello,
world!" consists of 13 characters, and we have declared an array of 30 elements. The
assignment operator, having copied 13 characters from the string literal into the array
elements hello[1], hello[2], ..., hello[13], filled the other elements with zeros for lack
of anything better.
Of course, you can fix the program and make it correct, for example, like this (recall that
the break operator stops loop execution prematurely):

program HelloString;
var
hello: array [1..30] of char;
i: integer;
begin
hello := 'Hello, world!';
for i := 1 to 30 do
begin
if hello[i] = #0 then
break;
write(hello[i])
end;
writeln
end.

but this is very cumbersome - once again we have to process the string one character at a
time!
The situation when we need to process a string without knowing its length in advance is
quite typical. Imagine that you need to ask a user what his name is; one will answer with a
laconic "Vova", and another will say that he is no less than Ostap-Suleiman-Berta-Maria
Bender-Zadunaisky. The question whether this can be predicted at the stage of program
writing should be considered rhetorical. In this connection, it is desirable to have some
flexible tool for working with strings that takes into account this fundamental property of a
string - to have unpredictable length. Besides, the concatenation operation (joining one string
to another) is very often performed on strings, and it is desirable to designate it in such a way
that its invocation would be unburdensome for the programmer. By the way, a special case of
concatenation - adding a single character to a string - is so common that once you get used to
it, you will remember it nostalgically in the future when working in C (where this action
requires much more effort).
Anyway, to solve most of the problems that arise when working with strings, Pascal - or
rather, its later dialects, including Turbo Pascal and, of course, our Free Pascal - provides a
special family of types, which will be the subject of the next paragraph.
Note that as part of a string literal we can represent not only characters that have a printed
representation, but also any other characters through their codes. In the examples we have
already met strings ending with a line feed character, such as 'Hello'#10; "tricky"
characters can be inserted not only at the beginning or end of a string literal, but also at any
place in the string, for example, 'one'#9'two' - here two words are separated by a tab
character. In general, within a string literal, you can arbitrarily alternate sequences of
characters enclosed in apostrophes and characters specified by their codes; for example,

'first'#10#9'second'#10#9#9'third'#10#9#9#9'fourth'

is a valid string literal; the result of printing it (due to the tabs and line feeds inserted into
it) will look like this:

first
        second
                third
                        fourth

Characters with codes from 1 to 26 can also be written as ^A, ^B, ..., ^Z; this is
justified by the fact that, as already mentioned, the characters with the corresponding codes are
generated by the combinations Ctrl-A, Ctrl-B, etc. when entered from the keyboard. In
particular, the literal from our example can be written as follows:

'first'^J^I'second'^J^I^I'third'^J^I^I^I'fourth'
2.6.10. Type string
The string type, introduced in Pascal specifically for working with strings, is
actually a special case of an array of elements of type char, but a rather nontrivial one.
First of all, note that when describing a variable of type string you may specify a limit
on the string size or omit it; omitting the limit does not make the string "infinite":
its maximum length is then limited to 255 characters. For example:

var
    s1: string[15];
    s2: string;

Variable s1 described in this way can contain a string up to and including 15 characters
long, and variable s2 can contain a string up to 255 characters long. Specifying a number
greater than 255 is not allowed, it will cause an error, and there is a rather simple explanation:
the string type assumes that the string length is stored in a separate byte, and a
byte, as we remember, cannot store a number greater than 255.
A variable of the string type occupies one byte more than the maximum length of
the stored string: for example, our variables s1 and s2 will occupy 16 and 256 bytes,
respectively. A variable of the string type can be handled as a simple array of elements
of type char, with the indexing of elements containing the string's characters starting from
one (for s1 these are elements s1[1] through s1[15], for s2 - elements s2[1]
through s2[255]); but - somewhat unexpectedly - one more element, with index 0, is found
in these arrays. This is the byte used to store the string's length; since array elements
must all be of the same type, this byte, when accessed, behaves as having the same type as
the other elements, i.e. char. For example, if you perform the assignment

s1 := 'abrakadabra';

then the expression s1[1] will be equal to 'a', and the expression s1[5] to
'k'; since the word "abrakadabra" has 11 letters, the last meaningful element is
s1[11], which is also equal to 'a'; the values of the elements with larger indices are
undefined (they may contain anything at all). Finally, the element
s1[0] contains the length; but since s1[0] is an expression of type char, it
would be wrong to say that it equals 11; in fact, it equals the character
with code 11, which is denoted #11 or ^K, and the length of the string can be obtained
by evaluating the expression ord(s1[0]), which gives the desired 11. However,
there is a more common way to find out the string length: the built-in length
function, which is specially designed for this purpose.
Regardless of the length limit, all variables and expressions of type string in
Pascal are assignment-compatible, i.e. we can make the compiler execute both s2 := s1
and s1 := s2; in the second case the string may be cut off during the assignment - in fact,
the possibilities of s2 are somewhat wider: nothing prevents this variable from containing
a string, say, 50 characters long, while s1 cannot contain more than 15 characters.
What is especially nice is that when assigning strings, as well as when passing them by
value to subroutines, and when returning from functions, only the significant part of the
variable is copied; for example, if a variable declared as string without a length limit is a
string of three characters, only those three characters (plus the byte containing the length) will
be copied, even though the entire variable occupies 256 bytes.
It is even more interesting that if your subprogram accepts a var-parameter of
type string, you can pass any variable of type string as this parameter,
including one whose length is limited in its declaration. This exception to
the general rule (requiring the variable's type to coincide exactly with the var-parameter's
type) looks ugly and unsafe, but turns out to be very convenient in practice.
Variables of type string can be "added up" using the "+" sign, which
for strings means joining them together. For example, the program

program abrakadabra;
var
    s1, s2: string;
begin
    s1 := 'abra';
    s2 := s1 + 'kadabra';
    writeln(s2)
end.

will print, as you can guess, the word "abrakadabra".
Strings can be empty, i.e., containing no characters. The length of such a string is zero;
the literal denoting an empty string looks like "''" (two apostrophe characters placed next to
each other); this literal should not be confused with "' '", which denotes a space character
(or a string consisting of a single space character).
In almost all cases, expressions of the char type can be implicitly converted to the
string type - a string containing exactly one character. This is especially
convenient in combination with the addition operation. Thus, the program
program a_z; { a_z.pas }
var
    s: string;
    c: char;
begin
    s := '';
    for c := 'A' to 'Z' do
        s := s + c;
    writeln(s)
end.

will print "ABCDEFGHIJKLMNOPQRSTUVWXYZ".


In fact, Free Pascal supports a number of types for working with strings, and many of these
types do not have the limitations of the string type - that is, they can, for example,
store text of completely arbitrary length, work with multibyte character encodings, and so on. We
won't consider them, since we won't need all this splendor when mastering the material in the
following parts of the book; a reader who decides to make Free Pascal a working tool (rather
than just a learning aid) can study these features on his own.

2.6.11. Built-in tools for working with strings


The material in this paragraph can easily be dispensed with, because strings can be handled at
the level of individual characters, which means that we can do literally anything with them without
any additional tools. Nevertheless, when processing strings, a number of built-in procedures and
functions can make life much easier, so we will describe some of them here.
We already know one function: length takes a string expression as input and returns
an integer - the length of the string. The current string length can be changed forcibly by the
SetLength procedure; for example, after executing

s := 'abrakadabra';
SetLength(s, 4);

the variable s will contain the string "abra". Note that SetLength can also increase the length
of the string, which will result in "garbage" at the end of the string - unintelligible characters that were
in that memory location before the string was placed there; therefore, if you decide to increase the
length of the string using SetLength, it is best to immediately fill all of its "new" elements with
something meaningful. Note that only the current length of the string changes, but in no way the size
of the memory area allocated for this string. In particular, if you describe a string variable with ten
characters

s10: string[10];
and then try to set its length to more than 10, you will not succeed: the s10 variable cannot
contain a string longer than ten characters.
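
Here is a minimal sketch of the careful approach just described (the filler character is chosen arbitrarily):

var
    s: string;
    i: integer;
begin
    s := 'abra';
    SetLength(s, 8);
    for i := 5 to 8 do    { fill the "new" elements right away }
        s[i] := '!';
    writeln(s)            { prints "abra!!!!" }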
All the procedures and functions listed below, up to the end of this
paragraph, perform actions that you must be able to program yourself; we
strongly discourage using any of them until you are fluent enough with
strings at the character-by-character level to do the same thing "manually".
The built-in functions LowerCase and UpCase take an expression of type string
as input and return the same string as contained in the parameter, except that
the Latin letters are converted to lower or upper case, respectively (in other words, the first
function replaces uppercase letters with lowercase ones, and the second, on the contrary,
replaces lowercase letters with uppercase ones).
The copy function takes three parameters as input: a string, a start position and a number of
characters, and returns a substring of the given string starting at the given position, with
length equal to the given number of characters, or shorter if the original string does not have
enough characters. For example, copy('abrakadabra', 3, 4) will return the string 'raka' and
copy('foobar', 4, 5) will return the string 'bar'.
The delete procedure also takes as input a string (this time via a parameter variable,
because the string will be changed), the starting position and the number of characters, and deletes
the given number of characters from this string (right on the spot, i.e. in the variable you passed as
a parameter), starting from the given position (or to the end of the string, if there are not enough
characters). For example, if the variable s contained the same "abrakadabra", then after
performing delete(s, 5, 4) the variable s will contain "abrabra"; but if we had applied
delete(s, 5, 100), we would have gotten just "abra".
The built-in insert procedure inserts one string into another. The first parameter specifies the
string to be inserted; the second parameter specifies the string type variable into which the specified
string is to be inserted. Finally, the third parameter (an integer) specifies the position from which the
insertion should be performed. For example, if the variable s contains the string "abcdef"
and then insert('PQR', s, 4), the variable s will contain the string "abcPQRdef".
The pos function accepts two strings as input: the first one specifies the substring to search
for, the second one specifies the string to search in. It returns an integer equal to the position of the
substring in the string if it is found, or 0 if it is not found. For example, pos('kada',
'abrakadabra') will return 5, and pos('aaa', 'abrakadabra') will return 0.
The val procedure, which constructs a number of type longint, integer or byte
from its string representation, can be very useful. The first parameter of the procedure is a string
that must contain the text representation of a number (possibly with some spaces before it); the
second parameter must be a variable of type longint, integer or byte; the third parameter is
another variable, always of type word. If everything went well, the procedure puts the resulting
number into the second parameter and the number 0 into the third; if there was an error (i.e. the
string did not contain a correct representation of a number), the position in the string at which
the conversion failed is put into the third parameter, and the second parameter remains
undefined in this case.
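
For example, a small sketch of a typical use of val (the variable names are chosen for the example):

var
    s: string;
    n: longint;
    errpos: word;
begin
    s := '  123';
    val(s, n, errpos);
    if errpos = 0 then
        writeln('got the number ', n)
    else
        writeln('not a number; error at position ', errpos)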
Pascal also provides means of converting a number back to its string representation. If a decimal
representation is required, you can use the str pseudo-procedure; here you can use width
specifiers in the same way as we did for printing in the write statement (see page 246). For
example, str(12.5:9:3, s) will put the string "   12.500" (with three spaces at the beginning
to make it exactly nine characters) into the variable s.
There are also built-in tools for conversion to binary, octal and hexadecimal representation, but
this time they are almost ordinary functions, called BinStr, OctStr and HexStr. Unlike str,
they work only with integers, and, being functions, they return the resulting string as their value
rather than through a parameter. All three take two parameters: the first is an integer of arbitrary
type, the second is the number of characters in the resulting string.
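
For instance (the widths here are chosen arbitrarily):

writeln(BinStr(10, 8));    { prints 00001010 }
writeln(OctStr(8, 3));     { prints 010 }
writeln(HexStr(255, 4));   { prints 00FF }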

2.6.12. Processing command line parameters


Programs written in Pascal, like any Unix program, can be run with command line
arguments (see §1.2.6). Let our program be called "demo"; the user can run it, for example,
as follows:

./demo abra schwabra kadabra

Here the command line consists of four words: the program name itself, the word "abra",
the word "schwabra", and the word "kadabra".
In a Pascal program these parameters are available using the built-in functions
ParamCount and ParamStr; the first function, without receiving any parameters,
returns an integer corresponding to the number of parameters without taking into account the
program name (in our example it will be number 3); the second function takes an integer as
input and returns a string corresponding to the command line parameter with a given
number. In this case, the program name is considered to be parameter number 0 (i.e. it can
be found out using the ParamStr(0) expression), and the others are numbered from one
to the number returned by the ParamCount function.
Let's write an example program that prints all elements of its command line, no matter
how many there are:

program cmdline; { cmdline.pas }
var
    i: integer;
begin
    for i := 0 to ParamCount do
        writeln('[', i, ']: ', ParamStr(i))
end.
Once compiled, we can try this program in action, for example:

avst@host:~/work$ ./cmdline abra schwabra kadabra
[0]: /home/avst/work/cmdline
[1]: abra
[2]: schwabra
[3]: kadabra
avst@host:~/work$ ./cmdline
[0]: /home/avst/work/cmdline
avst@host:~/work$ ./cmdline "one two three"
[0]: /home/avst/work/cmdline
[1]: one two three
avst@host:~/work$

Note that a three-word phrase enclosed in double quotes was treated as a single parameter.
This has nothing to do with the Pascal language; it is a property of the command interpreter,
which we discussed in detail in §1.2.6.

2.7. Selection operator


The selection operator is, in a certain sense, a generalization of the branching (if) operator:
unlike branching, which provides only two execution options, the selection operator allows you
to specify as many options as you like. During program execution, a given expression is
evaluated, which can be of any ordinal type (see §2.6.3); depending on the resulting value,
one of the execution branches provided in the body of the operator is selected.
In Pascal, the selection operator begins with the keyword case, followed by an arbitrary
arithmetic expression of an ordinal type. The termination of the expression is marked by the
keyword of; it is followed by a number of branches (at least one), each consisting of two
parts: the set of values for which the branch should be executed (i.e., the branch will be
executed if the result of evaluating the expression after case is one of the given values)
and an operator (possibly compound, but not necessarily). Values are separated from the
operator by a colon. At the end of the body of the case operator, it is possible (but not
mandatory) to put a branch else, which consists of the keyword else and an operator;
this operator is executed if the value of the expression in the case header does not match
any of the branches.
The values for a branch can be specified as a single value; multiple values can be listed,
separated by commas; finally, a range of values can be given using two dots. Note that the values
in the selection operator must be given by compile-time constants (see §2.2.15).
Here is an example. The following program reads one character from the keyboard and
classifies it as belonging to one of the categories:
program SymbolType;
var
    c: char;
begin
    read(c);
    write('The symbol is ');
    case c of
        'a'..'z', 'A'..'Z':
            writeln('a latin letter');
        '0'..'9':
            writeln('a digit');
        '+', '-', '/', '*':
            writeln('an arithmetic operation symbol');
        '<', '>', '=':
            writeln('a comparison sign');
        '.', ',', ';', ':', '!', '?':
            writeln('a punctuation symbol');
        '_', '~', '@', '#', '$', '%', '^', '&', '|', '\':
            writeln('a special purpose sign');
        ' ':
            writeln('the space character');
        #9, #10, #13:
            writeln('a formatting code');
        #27:
            writeln('the escape code');
        '(', ')', '[', ']', '{', '}':
            writeln('a grouping symbol');
        '''', '"', '`':
            writeln('a quoting symbol');
        else
            writeln('something strange')
    end
end.
The select operator often provokes a very serious problem related to the style of program
writing. In the sources of novice programmers you can find fragments like this:

case nv of
    1: begin
        {...}
    end;
    2: begin
        {...}
    end;
    3: begin
        {...}
    end;
    {...}
    14: begin
        {...}
    end
end
The determining factor here is that the case-expression has an ordinary integer type, and the
variants are designated by the numbers 1, 2, 3 and so on, sometimes up to quite large numbers
(the author of these lines has seen such constructions with 30 or more variants). People get
fired from their jobs for such programming, and rightly so. Indeed, how is one supposed to
read it? What, for example, is 3 - what does it mean in this case? To answer this question
the reader has to dig through the whole program, finding out where the value of the variable
nv comes from and in which cases it takes which values; this takes a lot of time. But that is
not the worst of it: the author of the program can confuse values of this kind himself,
returning, say, the number 7 instead of the number 5 from some function by mistake. In
general, a program whose internal logic is built on "variant numbers" gets out of control
surprisingly quickly.
The right thing to do in such cases is to use not variant numbers, which tell the reader
nothing, but values of an enumerated type specially introduced for the purpose (see §2.6.2),
with meaningful identifiers.
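
For instance, here is a minimal sketch of the same fragment done properly (the enumerated type and its values are made up for the example); now every branch explains itself:

type
    EditorCommand = (CmdInsert, CmdDelete, CmdSave, CmdQuit);
var
    cmd: EditorCommand;
begin
    { ... cmd is set somewhere above ... }
    case cmd of
        CmdInsert: writeln('inserting');
        CmdDelete: writeln('deleting');
        CmdSave:   writeln('saving');
        CmdQuit:   halt(0)
    end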
We should also note the influence of the selection operator on the size of subroutines.
As we have already mentioned, an ideal subroutine is no more than 25 lines long; if there is
a selection operator in your subroutine, in most cases you will not fit into this limit.
In principle, this is a case when it is quite acceptable to exceed the specified size, but not by
much; selection operators nested into each other are almost always unacceptable. If the
implementation of all or some of the alternatives in your selection operator is that
complicated, you should move each alternative into a separate subroutine, so that the
selection operator itself consists only of their calls.

2.8. Full-screen programs


In this chapter we will take a clear step away from the main path of learning. The material
covered here is completely unnecessary for understanding the following chapters and their
examples, and will not be used in any way in any further text, not only in this part on Pascal,
but also in the whole book. Nevertheless, we refrain from the traditional recommendation to
"skip this chapter if...".
Let us recall what we have already mentioned in the preface. The first - and perhaps the
most important - step in becoming a programmer is made when the future programmer passes
from the etudes from the problem book to solving problems set independently, not because it
is necessary, but because it is more interesting. The wider the range of simple possibilities
available to a beginner, the more chances that he will think of something that he would like to
write and that he can write.
Unfortunately, we are still far away from graphical interfaces - at least if we approach
them seriously and without the risk of traumatizing our own brains - but no one forces us to
limit ourselves to traditional console applications. By the end of this chapter we will learn to
write programs that work in the terminal emulator window, but at the same time use its
capabilities to the full, not limiting ourselves to "canonical" streaming input (line by line) and
the same output.
The possibilities of full-screen alphanumeric user interfaces are by no means as modest as
they may seem at first glance; besides the familiar text editors working in the terminal, we can
mention, for example, the mail client mutt, the mcabber client for the XMPP protocol (also
known as Jabber) and many other programs; but a really adequate idea of the possibilities is
given by the game NetHack, which some people play for decades (literally so; cases are known
of a complete playthrough of this game taking 15 years) and which is built entirely on the
capabilities of the alphanumeric terminal. The ability to work with the terminal "at full speed"
really allows you to create dynamic game programs - not graphical, but no less interesting;
and, of course, the matter is not limited to games.
Now we can tell you under what conditions you can skip this chapter. You can go straight
to the next chapter if you have already thought of an interesting problem for yourself, started
to solve it, and you don't need full-screen work to do it. In this case, skipping the chapter on
full-screen programs will just save you time; in all other cases, try reading it and mastering
the tools offered here. You may well find them to your liking.

2.8.1. A little theory


Alphanumeric terminals, starting with the oldest models that had a screen instead of a
traditional printer, supported so-called escape sequences to control the display; each escape
sequence was a set of code bytes starting with a pseudo-character code 27 (Escape; hence the
name). Having received such a sequence, the terminal could, for example, move the cursor to
the position encoded in the sequence, change the color of the output text, scroll up or down,
etc.
For writing full-screen programs this is not enough, because by default the terminal driver
works with the keyboard in the so-called canonical mode, in which, firstly, the active program
receives user input line by line, i.e. the effect of pressing a key does not manifest itself until
the user presses Enter, and, secondly, some key combinations, such as Ctrl-C, Ctrl-D, etc.,
have special meanings, so that the program cannot "read" them as ordinary information and
receives only the ready-made effect - for Ctrl-C it is the SIGINT signal, fatal for most
programs, for Ctrl-D an imitation of the "end of file" situation, and so on. Fortunately, all
these features of the terminal driver's behavior can be reprogrammed, which is what
full-screen programs do.
If everything is more or less clear with reprogramming the terminal driver (just read the
reference information on the word termios), things are more complicated with
escape sequences. The point is that very different terminals used to be produced, and their
sets of escape sequences differed somewhat; modern terminal-emulating programs, such
as xterm, konsole and others, also differ from each other in their capabilities. All these
difficulties are more or less covered by libraries that provide a set of functions to control
the terminal screen and generate the appropriate escape sequences depending on the type of
terminal used. In C programming, the ncurses library, itself quite complex, is usually used
for this purpose.
Borland's versions of Pascal, popular in the MS-DOS era, included a special library
module crt [162], which allowed creating full-screen text programs for MS-DOS. Of course,
it had nothing in common with terminals in Unix: the screen was controlled through the
BIOS interface or even by direct access to video memory. Nowadays all this is only of
archaeological interest, except that the creators of Free Pascal set as one of their goals full
compatibility with Turbo Pascal, including the crt module; as a result, the version of Free
Pascal that you and I use on Unix systems contains its own implementation of the crt
module. This implementation supports all the functions that were present in the MS-DOS
Turbo Pascal, but implements them, of course, by means of escape sequences and terminal
driver reprogramming.

[162] From the words Cathode Ray Tube, aka "kinescope". The liquid-crystal "flat" monitors,
which have by now completely replaced CRT monitors, did not exist at that time.
It should be noted that the interface of the crt module is much simpler than that of the
same ncurses library and is much better suited for beginners. It is this module that we
will use.
To make the module's features available in the program, we need to tell the compiler that
we are going to use it. This is usually done immediately after the program header, before all
other sections, including before the constants section (though not necessarily), for example:

program tetris;
uses crt;

Before we start discussing the module's features, we should make one caveat. As soon as a
program written using the crt module is run, it will immediately reprogram the terminal to
suit its needs; among other things, this means that the life-saving Ctrl-C combination will
no longer work, so if your program hangs or you simply forget to provide a correct way to tell
it that it is time to terminate, you will have to remember how to kill processes from a nearby
terminal window. Perhaps you should reread §1.2.9.
Note that for programs written using the crt module, I/O redirections make no sense at
all; the whole thing just won't work.

2.8.2. Display in arbitrary screen positions


Let's start by clearing the screen so that the text left over from previous commands does
not interfere with our full-screen program. This is done by the clrscr procedure, the name
of which is derived from the words clear screen. The cursor will be in the upper left corner of
the screen; this is where the text would appear if we were to output it with a simple write
or writeln. But we will not do that; it is much more interesting to specify the place on the
screen where the message should be output.
Text, as you know, appears where the cursor is positioned. To move the cursor to an
arbitrary screen position, there is the GotoXY procedure, which accepts two integer
parameters: the horizontal coordinate and the vertical coordinate. The origin is the upper
left corner of the screen, and its coordinates are (1, 1), not (0, 0) as one might
expect. You can find out the width and height of the screen by referring to the global
variables ScreenWidth
and ScreenHeight. These variables are also provided by the crt module; when the
program starts, the module writes into them the actual number of character positions per
line and the number of lines on the screen.
With GotoXY at our disposal, we can already do something interesting. Let's write
a program that displays our traditional phrase "Hello, world!" - not within
our dialog with the command interpreter, as before, but in the center of a screen cleared of
all extraneous text. Having printed the inscription, let's move the cursor back to the upper left
corner so that it doesn't spoil the picture, wait five seconds (this can be done using the delay
procedure, whose argument is an integer expressed in thousandths of a second; it is also
provided by the crt module), clear the screen again and finish the work. The duration of
the delay, as well as the text of the message to be output, will be placed at the beginning of
the program as named constants.
It remains to calculate the coordinates for printing the message. The vertical coordinate
is simply half the screen height (ScreenHeight divided by two); for the horizontal
coordinate, we subtract the length of our message from the screen width (ScreenWidth)
and divide the remaining space in half. With this approach, the difference between the
upper and lower margins, as well as between the right and left ones, will not exceed one;
we cannot do any better anyway, because output in alphanumeric mode is aligned to
character positions - you cannot move text by half a character either horizontally or
vertically. By the way, we should not forget that we need integer division here, i.e.
the div operation.
So, writing:

program HelloCrt; { hellocrt.pas }
uses crt;
const
    Message = 'Hello, world!';
    DelayDuration = 5000; { 5 seconds }
var
    x, y: integer;
begin
    clrscr;
    x := (ScreenWidth - length(Message)) div 2;
    y := ScreenHeight div 2;
    GotoXY(x, y);
    write(Message);
    GotoXY(1, 1);
    delay(DelayDuration);
    clrscr
end.

Note that the current coordinates of the cursor can be obtained with the WhereX and WhereY
functions; these functions take no parameters. If we use GotoXY to move the cursor to a position
that exists, the cursor will end up exactly there, whereas if we try to move it to a position that does
not exist on our screen, the resulting current coordinates may be anything but what we expect. Besides
GotoXY, the current position of the cursor is naturally changed by output operations (with the
crt module, this usually means the write operator).
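For instance, a tiny sketch (not one of the book's numbered examples; the program name is invented)
showing WhereX and WhereY in action might look like this:

program WherePos;  { a minimal sketch, assuming only the crt module }
uses crt;
var
    x, y: integer;
begin
    clrscr;
    write('abc');    { the cursor moves three positions to the right }
    x := WhereX;     { should now be 4 }
    y := WhereY;     { still 1: we are on the first line }
    GotoXY(1, 3);
    writeln('the cursor was at (', x, ', ', y, ')')
end.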
Unfortunately, these tools have a very serious limitation: if the user resizes the window in which
the program is running, the program will not know about it; the ScreenWidth and
ScreenHeight values will remain as they were set by the crt module at startup. The source
of this limitation is quite obvious: back when the crt module was invented, the screen could not be
resized.
2.8.3. Dynamic input
To organize keyboard input key by key, as well as to handle all sorts of "tricky" keys like
the "arrows", F1-F12 and other such things, the crt module provides two functions:
KeyPressed and ReadKey. Neither function takes parameters. The KeyPressed
function is quite simple: it returns the logical value true if the user has pressed a key whose
code you haven't read yet, and false if the user hasn't pressed anything.
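For instance, a program can keep doing useful work and only check for input from time to time;
the following fragment is a sketch of such a polling loop (illustration only, the "useful work"
is imaginary):

while not KeyPressed do
begin
    { ... do another portion of useful work ... }
    delay(50)    { rest a little so as not to burn CPU time }
end;
{ a key has been pressed by now; its code is still waiting in the buffer }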
The ReadKey function is a bit more complicated. It allows you to get the code of the
next key pressed; if you call ReadKey before the user has pressed anything, the function
will block until the user deigns to press something; if a key has already been pressed, the
function returns control immediately. It should be emphasized that the ReadKey call,
while returning the next code, removes it from the input buffer, i.e. this function has a side
effect.

Beginners often say that the program "hangs" in this case, but this is wrong: when something "hangs",
the only way to get it out of that state is by extraordinary measures like killing the process, whereas
simply waiting for an event - in this case a keystroke - and resuming as soon as the event occurs is
not called hanging but blocking.
The return value of ReadKey is of the ordinary type char, and if the user presses a key
with a letter, digit or punctuation mark, exactly this character will be returned. The same
holds for the space (' '), tab (#9), Enter (#13; note that it is not #10, although in other
implementations it may be #10), Backspace (#8) and Esc (#27). The combinations
Ctrl-A, Ctrl-B, Ctrl-C, ..., Ctrl-Z give the codes 1, 2, 3, ..., 26, and the
combinations Ctrl-[, Ctrl-\ and Ctrl-] give the next codes, 27, 28 and 29.
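The 1-26 mapping makes it easy to recognize Ctrl-letter combinations; for example, a fragment
like this sketch (assuming c is a variable of type char) prints which Ctrl-letter was pressed:

c := ReadKey;
if (c >= #1) and (c <= #26) then
    writeln('You pressed Ctrl-', chr(ord('A') + ord(c) - 1));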
The ReadKey function is quite tricky with the other special keys, such as the arrow keys,
Insert, Delete, PgUp, PgDown, F1-F12. Back in the days of MS-DOS, the creators of the crt
module adopted a rather unobvious and not particularly elegant solution: so-called "extended
codes". In practice it looks like this: the user presses a key, and the ReadKey function
returns the character #0 (the character with code 0), which means that the function must
immediately be called a second time; this second call returns control immediately, and the
character it returns identifies the special key pressed. For example, the "left arrow" is
encoded by the pair 0-75, the "right arrow" by 0-77, and the "up arrow" and "down arrow"
by 0-72 and 0-80 respectively. The following simple program will allow you to find out
which keys correspond to which codes:

program RdKey;  { rdkey.pas }
uses crt;
var
    c, cc: char;
begin
    repeat
        c := ReadKey;
        cc := c;
        if (cc < #32) or (cc > #126) then
            cc := '?';
        writeln(ord(c), ' (', cc, ')')
    until c = ' '
end.

You must press the space bar to terminate this program.


The version of Free Pascal that the author had at hand when writing this book handled some keys
rather strangely, producing a sequence of three codes, the last of which was 27 (Escape). Such a
sequence is very hard to recognize, because it doesn't even start with a zero. This behavior was
demonstrated by the End key, the digit 5 on the numeric keypad with NumLock turned off, and
the Shift-Tab combination. Besides, the Ctrl-Space and Ctrl-@ combinations
produced code 0 followed by nothing. All this contradicts the original specification of the ReadKey
function from the Turbo Pascal crt module and is not documented anywhere. It is quite possible
that other window managers and other runtime environments will reveal some further
inconsistencies; but it is just as likely that future versions of Free Pascal will fix this nonsense.
We must admit that the ReadKey function is designed rather poorly: apart from its non-
obvious logic, the very fact that a library function has a side effect is annoying, because it
contradicts the basic traditions of Pascal programming culture. In our examples we will
isolate this strange function by writing our own procedure to get the extended code. We'll call
it GetKey. The procedure will naturally use the library ReadKey, but it will be
the only place in each of our programs where this (rather odd) function is called; by writing
our own procedure, we get rid of the "proprietary" feature of ReadKey's logic - the
need to sometimes call it twice - since our procedure will do that itself when necessary. Besides,
the side effect stops being a problem: GetKey is a procedure, and for procedures, unlike
functions, side effects are the expected way of working.
The procedure will pass the received code to the caller through a variable parameter of type
integer; ordinary (i.e. not extended) character codes will be passed on unchanged, while
for extended codes our procedure will change the sign, returning a negative number. In
particular, the "left arrow" will correspond to code -75, the "right arrow" to code -77, etc.
Together with a small main program demonstrating the operation of the procedure, its text
will look like this:

program GtKey;  { getkey.pas }
uses crt;

procedure GetKey(var code: integer);
var
    c: char;
begin
    c := ReadKey;
    if c = #0 then
    begin
        c := ReadKey;
        code := -ord(c)
    end
    else
        code := ord(c)
end;

var
    i: integer;
begin
    repeat
        GetKey(i);
        writeln(i)
    until i = ord(' ')
end.

To get a general idea of the possibilities offered by dynamic input, we will first write a
program that, like hellocrt, displays the text "Hello, world!" in the middle of the
screen, but this time the message can be moved around with the arrow keys; the program is
exited by pressing any key that has a normal (not extended) code.
In the constants section we will have only the text of the message. To display the message
and remove it from the screen, we will write the ShowMessage and HideMessage
procedures; the latter will display a number of spaces equal to the length of the message at the
desired position. The basis of the program will be a relatively short procedure
MoveMessage, which accepts five parameters: two integer variables - the current
coordinates of the message on the screen; the message itself as a string; two integers dx and
dy, specifying the change of x and y coordinates.
In the main program we will make a pseudo-infinite loop in which we will read the key
codes. If a normal "non-extended" code is read (the GetKey procedure has written a positive
number to the variable), the loop will be interrupted with the break operator, and the
program will end after clearing the screen. If the extended code is read (the number in the
variable is negative), then, if it corresponds to one of the four arrow keys, the
MoveMessage procedure will be called with the corresponding parameters; the program
will ignore the other keys with extended codes. The full text of the program is as follows:

program MovingHello;  { movehello.pas }
uses crt;
const
    Message = 'Hello, world!';

procedure GetKey(var code: integer);
var
    c: char;
begin
    c := ReadKey;
    if c = #0 then
    begin
        c := ReadKey;
        code := -ord(c)
    end
    else
        code := ord(c)
end;

procedure ShowMessage(x, y: integer; msg: string);
begin
    GotoXY(x, y);
    write(msg);
    GotoXY(1, 1)
end;

procedure HideMessage(x, y: integer; msg: string);
var
    len, i: integer;
begin
    len := length(msg);
    GotoXY(x, y);
    for i := 1 to len do
        write(' ');
    GotoXY(1, 1)
end;

procedure MoveMessage(var x, y: integer; msg: string; dx, dy: integer);
begin
    HideMessage(x, y, msg);
    x := x + dx;
    y := y + dy;
    ShowMessage(x, y, msg)
end;

var
    CurX, CurY: integer;
    c: integer;
begin
    clrscr;
    CurX := (ScreenWidth - length(Message)) div 2;
    CurY := ScreenHeight div 2;
    ShowMessage(CurX, CurY, Message);
    while true do
    begin
        GetKey(c);
        if c > 0 then  { non-extended code; exit }
            break;
        case c of
            -75:  { left arrow }
                MoveMessage(CurX, CurY, Message, -1, 0);
            -77:  { right arrow }
                MoveMessage(CurX, CurY, Message, 1, 0);
            -72:  { up arrow }
                MoveMessage(CurX, CurY, Message, 0, -1);
            -80:  { down arrow }
                MoveMessage(CurX, CurY, Message, 0, 1)
        end
    end;
    clrscr
end.

This program has a serious drawback: it does not keep track of the valid range of the
coordinates, so we can easily "push" the message off the screen; after that it will always
appear in the upper left corner. We suggest that the reader fix this as an exercise; one
possible approach is sketched below.
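One way to do it (just a sketch; the reader may well prefer another) is to make MoveMessage
refuse any move that would take part of the message off the screen:

procedure MoveMessage(var x, y: integer; msg: string; dx, dy: integer);
begin
    { ignore the move if the new position does not fit on the screen }
    if (x + dx < 1) or (y + dy < 1) or (y + dy > ScreenHeight) or
        (x + dx + length(msg) - 1 > ScreenWidth)
    then
        exit;
    HideMessage(x, y, msg);
    x := x + dx;
    y := y + dy;
    ShowMessage(x, y, msg)
end;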
Let's consider a more complex example. Our next program will display an asterisk symbol
(*) in the middle of a blank screen. At first the symbol will be stationary, but if you press any
of the four arrows, the symbol will start moving in the specified direction at a rate of ten
characters per second. Pressing the other arrows will change the direction of its movement,
and pressing the spacebar will stop it. The Escape key will end the program.
In the constants section we will again have DelayDuration, this time equal to 100, i.e.
one tenth of a second. This is the time interval that will pass between two movements of the star.
Taking into account the experience of the previous program, we will collect all the data
specifying the current state of the star into one record, which we will pass to the
procedures as a var-parameter. This data includes the current coordinates of the star, as
well as the direction of motion specified by the familiar dx and dy values. The type for this
record will be called simply star. The procedures ShowStar and HideStar, receiving
a single parameter (a record of type star), will show the star on the screen and remove it
by printing a space in its place; the MoveStar procedure will move the star by one
position in accordance with the dx and dy values. For convenience, we will also describe
the SetDirection procedure, which puts the specified values into the dx and dy fields.
The main part of the program will first set the initial values for the star and display it
in the middle of the screen; the program will then enter a pseudo-infinite loop in which, if the
user has not pressed any keys (i.e. KeyPressed has returned false), MoveStar
will be called followed by delay; since nothing else needs to be done in this case, the loop
body will be cut short with the continue statement (recall that, unlike break, the
continue statement prematurely terminates only the current iteration of the loop body,
not the entire loop). When the code of one of the arrows is received, SetDirection will
be called with the corresponding parameter values; when the code of the space character (32)
is received, the star will be stopped by calling SetDirection with zero dx and dy;
when Escape (27) is received, the loop will be terminated by the break operator. All
together it will look like this (here and below we omit the body of the GetKey procedure
to save space; it is the same in all examples):
program MovingStar; { movingstar.pas }
uses crt;
const
DelayDuration = 100;

procedure GetKey(var code: integer);


{ ... }

type
    star = record
        CurX, CurY, dx, dy: integer;
    end;

procedure ShowStar(var s: star);


begin
GotoXY(s.CurX, s.CurY);
write('*');
GotoXY(1, 1)
end;

procedure HideStar(var s: star);


begin
GotoXY(s.CurX, s.CurY);
write(' ');
GotoXY(1, 1)
end;

procedure MoveStar(var s: star);


begin
HideStar(s);
s.CurX := s.CurX + s.dx;
    if s.CurX > ScreenWidth then
        s.CurX := 1
    else
        if s.CurX < 1 then
            s.CurX := ScreenWidth;
    s.CurY := s.CurY + s.dy;
    if s.CurY > ScreenHeight then
        s.CurY := 1
    else
        if s.CurY < 1 then
            s.CurY := ScreenHeight;
ShowStar(s)
end;

procedure SetDirection(var s: star; dx, dy: integer);


begin
s.dx := dx;
    s.dy := dy
end;

var
    s: star;
    c: integer;
begin
clrscr;
s.CurX := ScreenWidth div 2;
s.CurY := ScreenHeight div 2;
s.dx := 0;
s.dy := 0;
    ShowStar(s);
    while true do
    begin
        if not KeyPressed then
        begin
            MoveStar(s);
            delay(DelayDuration);
            continue
        end;
        GetKey(c);
        case c of
-75: SetDirection(s, -1, 0);
-77: SetDirection(s, 1, 0);
-72: SetDirection(s, 0, -1);
-80: SetDirection(s, 0, 1);
            32: SetDirection(s, 0, 0);
            27: break
        end
    end;
    clrscr
end.

2.8.4. Color management


So far, all text appearing in the terminal window as a result of running our programs has
been of the same color - the one specified in the terminal program's settings; but this can be
changed. Modern terminal emulators, as well as late-model hardware terminals (for example,
the DEC VT340, production of which was discontinued only in the second half of the 1990s),
form a color image on the screen and support escape sequences that set the text color and
the background color.

Table 2.1. Constants for color designation in the crt module

    for text and background    for text only
    Black                      DarkGray
    Blue                       LightBlue
    Green                      LightGreen
    Cyan                       LightCyan
    Red                        LightRed
    Magenta                    LightMagenta
    Brown                      Yellow
    LightGray                  White
Unfortunately, the interface of our crt module does not expose these capabilities in full.
The reason is that the prototype of this module from Turbo Pascal was designed for the
standard text mode of so-called IBM-compatible computers, where everything was relatively
simple: each character position corresponded to two one-byte cells of video memory, the first
byte containing the character code and the second the color attribute, in which four bits set
the text color and three bits the background color; this made it possible to use only eight
different colors for the background and sixteen for the text. Even the capabilities of the
alphanumeric terminals produced in those days were wider, not to mention modern emulator
programs.
Since the crt module included in Free Pascal was developed primarily to maintain
compatibility with its prototype, its interface repeats the features of the prototype's interface
and does not provide anything broader. Such capabilities could be reached using the video
module, but it is much more complicated to work with, and our tasks are only educational;
if you seriously want to write full-screen programs for an alphanumeric terminal, it would
probably be better to learn C and use the ncurses library. However, as you will soon see, the
capabilities of the crt module are quite enough to create rather interesting effects, and it
is much easier to master.
We have only two main tools here: the TextColor procedure sets the text color, and
the TextBackground procedure sets the background color. The colors themselves are
set by constants described in the crt module; they are listed in Table 2.1. Note that all 16
constants listed in the table can be used to set the text color, while only the eight constants in
the left column can be used to set the background color. For example, if you execute
TextColor(Yellow);
TextBackground(Blue);
write('Hello');

then the word "Hello" will be printed in yellow letters on a blue background, and this
combination will be used for all text output until you change the text or background color
again. Alternatively, you can make the output text blink; to do this, when you call
TextBackground, you add a Blink constant to its argument; you can do this using
normal addition, although a bitwise "or" operation would be more appropriate. For example,
the text displayed on the screen after executing TextColor(Blue or Blink) will
be blue and blinking.
The described tools have a fundamental disadvantage: the text and background color
settings remain in effect after your program terminates, and the crt module has no means
of finding out what text and background colors are currently set (in particular, at the moment
your program is launched); so, using the crt module alone, we cannot restore the terminal
settings, and after our program terminates the user will have to "bring the terminal back to
normal", for example with the reset command, or close the window altogether. However,
this problem can be solved by using the terminal's capabilities directly. The statement

write(#27'[0m');

outputs to the terminal an escape sequence (literally, a sequence of character codes starting
with the Escape code, i.e. 27; see page 366) which corresponds to restoring the terminal's
"default" settings; in particular, for the terminal emulators used in X Window, the settings
that were in effect when the emulator was started are restored.
The following program demonstrates all possible combinations of text and background
colors, filling the screen with asterisks according to the following rules: within each line the
color of the asterisks themselves (i.e. the text color) is the same and is chosen anew for every
line; the width of the line is divided into equal (as far as possible) parts corresponding to all
possible background colors; for each screen position, before the asterisk is printed, the text
color of the current line and the background color of the current part are set, and the Blink
attribute is added for asterisks in even-numbered positions, making them blink.

All possible color values will be placed in the AllColors array; for convenience,
we introduce the ColorCount and BGColCount constants corresponding to the total
number of colors and the number of background colors. The MakeScreen procedure will be
responsible for selecting the text color for each line; it will loop through all screen line
numbers and for each line call the MakeLine procedure with two parameters: the line number
and the selected color value. MakeLine will calculate the column width for each possible
background color and loop through all positions of the row, setting the appropriate colors for
each of them and printing an asterisk. If the computed column width turns out to be less than
one, we force it to one to avoid division by zero later. Note that we will not print an asterisk
in the last position of the last line, to avoid scrolling the entire screen; alas, the crt module
gives us no way to print a character in this position without everything shifting.
To prevent the program from terminating immediately after forming the "picture" (so that
we wouldn't have time to see the picture), we need to perform some input operation; we don't
want to use the notorious ReadKey, but it doesn't make sense to drag our (rather
cumbersome) GetKey procedure into the program to use it in such a trivial situation, so
we'll use the usual readln operator to "pause" the program; the user will have to press the
Enter key to exit the program.
The text of the program turns out like this (we advise the reader to see for himself how
exactly the necessary values of positions and indexes of the array of colors are obtained with
the help of div and mod operations):
program ColorsDemo; { colordemo.pas }
uses crt;

const
ColorCount = 16;
BGColCount = 8;
var
AllColors: array [1..ColorCount] of word =
(
Black, Blue, Green, Cyan,
Red, Magenta, Brown, LightGray,
DarkGray, LightBlue, LightGreen, LightCyan, LightRed,
LightMagenta, Yellow, White
);

procedure MakeLine(line: integer; fgcolor: word);
var
    i, w, cw: integer;
begin
    w := ScreenWidth;
    cw := w div BGColCount;
    if cw = 0 then
        cw := 1;
    if line = ScreenHeight then
        w := w - 1;
    for i := 1 to w do
    begin
        GotoXY(i, line);
        TextBackground(AllColors[(i-1) div cw + 1]);
        if i mod 2 = 0 then
            TextColor(fgcolor + Blink)
        else
            TextColor(fgcolor);
        write('*')
    end
end;

procedure MakeScreen;
var
    i: integer;
begin
clrscr;
for i := 1 to ScreenHeight do
        MakeLine(i, AllColors[i mod ColorCount + 1])
end;

begin
MakeScreen;
readln;
write(#27'[0m');
    clrscr
end.

2.8.5. Random and pseudo-random numbers


Having at your disposal the means of dynamic input and of controlling the screen space
(even if it is the screen of an emulated text terminal rather than the whole computer screen),
you will almost certainly want to write some simple game program. An indispensable tool for
creating "toys" is the so-called random number generator, which allows you to make each game
session different from the others, introducing variety and healthy doses of unpredictability into
the game.

Of course, not all games require elements of randomness; but don't jump to conclusions:
even a program that plays chess will bore the user to death if it always plays the same opening.

By the way, if you have not noticed any such desire in yourself, it may well mean that you are
studying programming in vain. May the reader forgive me for another reminder that not everyone has
to become a programmer; but if, having received a toolkit sufficient for creating your own "toy", you
did not feel the urge to rush off and make one immediately, then most likely the programming process
gives you no pleasure - and this, as we discussed in the prefaces, almost certainly means that the work
of a programmer would turn into sheer torture for you. However, it's up to you.
Generating truly random numbers that cannot be predicted is a science in itself; however,
programmers have long since learned how to deal with this task, and modern operating
systems (including Linux) include tools for generating random numbers based on
unpredictable events, such as the intervals between packet arrivals from the local network
and between keystrokes on the keyboard, fluctuations in the rotation speed of hard disks, and
so on. Such tools are usually required in serious cases, for example, when generating secret
cryptographic keys - in general, in situations where the ability of an outsider to predict the
sequence of "random" numbers generated on our computer could cost us dearly.

Game programs do not create such situations, unless we are going to play with someone
for money, and a lot of it. In most cases no one will even try to predict the sequence of
numbers generated in a game program or in some screensaver, because such a prediction, even
if it could be made, would not bring the predictor any benefit comparable to the time spent
(and in most cases none at all). Therefore, programmers often replace the generation of "real"
random numbers with sequences of pseudo-random numbers.
The general principle of pseudo-random number generation is as follows. There is a certain
variable, usually called the random seed; every time a random number is required, the next
value of this variable is computed from its previous value by some tricky formula and stored
back into it. The random number itself is obtained from the current value of the random seed
by some other formula, probably much less tricky.

(A literal Russian translation of this phrase does not reflect its actual meaning because of the
numerous figurative senses of the word seed, and the author has not come across an adequate
translation; one could perhaps render seed as "seedling", but it is easier to leave the phrase
random seed as it is.)
Sequences of pseudo-random numbers have one undoubted advantage: if the need arises,
they can be repeated by starting from the same random seed value as last time. Sometimes
this is required when debugging programs. In most cases, however, we need the opposite: to
make the sequence new and "unpredictable" each time, bearing in mind that nobody will
seriously try to predict it. For this purpose, the random seed is filled at the start with some
value that can be expected to be new every time - for example, the current value of the system
time, which is measured as the number of seconds since January 1, 1970, or something similar.
Free Pascal provides built-in facilities for generating a sequence of pseudorandom
numbers. To fill the random seed at the beginning of the program, you must call the
randomize procedure; it will put a "random" number (actually, just the value of the current
time) into a global variable called randseed, but that's beside the point - this variable
should not be accessed directly. It should be emphasized that randomize must be called
exactly once; if, for example, you start calling it every time you need another random number,
it is likely that all the numbers generated in the program within one second will be the same.
To get a random number, you should use the random function, which is present in two
variants: it can be called without parameters, and then it will return a number of type real
on the half-interval from 0 to 1 (including zero, but not including one); if you need an integer,
the random function is called with one (integer, or more precisely - of type longint)
parameter, and it returns an integer number from 0 to the value passed by the parameter, but
not including this value. For example, random(5) will return the number 0, 1, 2, 3 or 4.
Note that the random function also has a side effect: before returning a random number, it
changes the randseed variable to return a different number the next time it is called.
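As a quick illustration (not one of the book's numbered examples; the program name is invented),
here is a sketch that seeds the generator once and then prints ten simulated throws of a die:

program DiceDemo;  { a sketch for illustration }
var
    i: integer;
begin
    randomize;                      { seed once, at program start }
    for i := 1 to 10 do
        write(random(6) + 1, ' ');  { a number from 1 to 6 }
    writeln
end.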
To demonstrate the capabilities of the pseudorandom number generator, let's write a
simple program that will gradually fill the initially empty screen with multicolored stars; the
position and color of the next star will be chosen randomly using the random function,
and a short (e.g., 20 ms) delay will be made between the output of two stars for better effect.
The program will look like this:

program RandomStars;  { randstars.pas }
uses crt;
const
    DelayDuration = 20;
    ColorCount = 16;
var
    AllColors: array [1..ColorCount] of word =
    (
        Black, Blue, Green, Cyan, Red, Magenta, Brown,
        LightGray, DarkGray, LightBlue, LightGreen, LightCyan,
        LightRed, LightMagenta, Yellow, White
    );
var
    x, y, col: integer;
begin
    randomize;
    clrscr;
    while not KeyPressed do
    begin
        x := random(ScreenWidth) + 1;
        y := random(ScreenHeight) + 1;
        if (x = ScreenWidth) and (y = ScreenHeight) then
            continue;
        col := random(ColorCount) + 1;
        GotoXY(x, y);
        TextColor(AllColors[col]);
        write('*');
        delay(DelayDuration)
    end;
    write(#27'[0m');
    clrscr
end.

You can press any key to end the program.

2.9. Files
2.9.1. General information
We have already worked with files through the standard input and output streams, relying
on the user to "slip" us the necessary file at the start of our program by redirecting input or
output by means of the command interpreter. Of course, a program can also work with files
itself, as long as it has sufficient permissions to do so.

In order to work with the contents of a file, the file must be opened. To do this, the program
addresses the operating system, declaring its intention to start working with the file; usually
such a request specifies which file our program is interested in (i.e. specifies the file name)
and what the program is going to do with it (the mode of working with the file: read-only,
write-only, read and write, append to the end). Once the file is successfully opened, a new
input or output stream associated with the disk file specified at opening becomes available to
our program alongside the standard input/output streams we already know.

The operations that can be done with such a thread are generally similar (and at the operating
system level - simply the same) to those that we can perform with standard threads: mostly,
of course, they are reads and writes, although there are others.
I/O threads associated with newly opened files must be distinguished from each other and
from standard threads. The Pascal language provides so-called file variables for this purpose;
there is a whole family of special file types to describe such variables. It should be noted that
a file type differs significantly from other types; the most noticeable difference is that file type
§ 2.9. Files 370
variables represent the only variant of a file type expression, i.e. file type values exist only
as "something that is somehow stored in file variables" and nothing else; they cannot even be
assigned. You can pass file variables to subroutines only through var-parameters. This
may seem unusual, because before we always talked about values of a given type and
expressions, the calculation of which gives such values, and variables of the same type were
considered just as a storage of the corresponding value. With file types it's the opposite: we
have only file variables; we can guess that they store something, and even say that they
probably store a "file type value", but all these talks will be nothing more than abstract
philosophy, because there are no means of working with such values in isolation from the
variables storing them in Pascal. In other words, file type variables allow us to distinguish
between any number of simultaneously active (open) I/O threads, but that's all: we can't use
file variables for anything else.
Depending on how we are going to work with the file, we have to choose a specific type
of file variable. Here we have three possibilities:

• work with a file assuming that it is text; for this purpose, a file variable of type text
is used;
• work with a file as an abstract sequence of bytes, being able to write and read any of
its fragments using the so-called block read and block write operations; this will
require a file variable, the type of which is called file;
• assume that the file consists of fixed-length blocks of information that correspond to the
machine representation in memory of values of some type; here we need a so-called
typed file, for which Pascal supports a whole family of user-defined types, such as file
of integer, file of real, or (more often) file of myrecord, where
myrecord is the name of a previously described record type.
Note that we can work with one and the same file in at least two, and often all three of these
ways; the way we choose to work depends not on the file, but on the task at hand.
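For example (a sketch with a hypothetical record type, invented for illustration), file variables
for all three ways of working might be declared like this:

type
    myrecord = record
        x, y: integer;
    end;
var
    t: text;               { a text file }
    b: file;               { an untyped file for block I/O }
    r: file of myrecord;   { a typed file }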
Regardless of the type of file variable we use, before we start working with a file, we need
to assign a file name to the variable; this is done by calling the assign procedure.
For example, if we want to work with the text file data.txt located in the current
directory, in the variable description section we have to write something like

var
    f: text;

and somewhere in the program there is a call

    assign(f, 'data.txt');

We emphasize that this call simply associates the name 'data.txt' with a file variable.
The assign procedure does not try to open the file or even check whether it exists (which is
understandable, because we may be about to create a new file). Of course, you can use not
only constants but any string expression as the file name; if the name starts with the
"/" character, it is considered an absolute file name and is resolved from the root
directory of our system; if the name starts with any other character, it is considered relative
and is resolved from the current directory. However, this is a property not of Pascal but of
Unix-like operating systems.
Since beginners often confuse the file name with the name of the file variable, let us
emphasize once again that these are two completely different, initially unrelated entities. A
file name is what the file is called on disk, the name by which the operating system
knows it. When we write a program, we may not know at all what the file name will be when
the program runs: perhaps the user will give us the file name, or we will get it from some other
source, perhaps we will even read it from another file; this often happens in real tasks.

On the other hand, the name of a file variable is the name of a variable and nothing
more. We are free to name our variables as we wish; if we rename the variables but change
nothing else in the program, the behavior of our program will not change, because
variable names have no effect on this behavior. And, of course, the file variable name we
choose has nothing to do with which file on our disk we are going to use.

(Note that this fundamental independence of program behavior from the specific variable names
chosen by the programmer is one of the key properties of compiled programming languages, to which
Pascal, among others, belongs.)

The relationship between a file variable and a file name on disk begins to exist only
after the assign procedure is called; moreover, no one forbids calling this procedure
again, breaking the old relationship and establishing a new one.
After the file variable has been assigned a file name, we can try to open the file for further
work. If we are going to read information from the file, we should open it using the
reset procedure; in this case the file must already exist (if it does not, an error
occurs), and work with it will start from its initial position, i.e. the first read operation will
retrieve data from the very beginning of the file, the next operation will retrieve the next
portion of data, etc. An alternative way to open a file is the rewrite procedure. In this
case the file does not have to exist: if it does not exist, it will be created; if it already exists,
all the information in it will be destroyed and the work will start "from scratch". Text files
can also be opened for appending; this is done with the append procedure, which does
not work for typed and block files.
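For instance, a fragment like the following sketch (log.txt is an arbitrary name chosen for
illustration) adds one line to the end of an existing text file:

var
    f: text;
begin
    assign(f, 'log.txt');
    append(f);                     { the file must already exist }
    writeln(f, 'one more line');
    close(f)
end.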
It should be borne in mind that the operation of opening a file is always fraught with
errors, and the built-in diagnostics that the Free Pascal compiler is able to insert into our
programs are notable for their incomprehensibility; it is therefore highly desirable to disable
the built-in I/O error handling with the already familiar {$I-} directive and to organize
error handling ourselves, using the value of the IOResult variable (see page 319).
To read and write text files and typed files we use the already familiar read and write
operators, and for text (but not typed) files also readln and writeln; the only
difference is that, when working with files instead of the standard I/O streams, we specify a
file variable as the first argument of these operators. For example, if we have a file variable
f and a variable x of type integer, we can write something like

    write(f, x);

and if f is of type text, the textual representation of the number stored in x (i.e. a
sequence of bytes with the character codes of digits) will be written to the corresponding file,
whereas if f is a typed file, exactly two bytes will be written - the machine representation
of a number of type integer. Similarly, the familiar eof and SeekEof functions are
used (the latter only for text files): when working with files, these functions take one
argument, a file variable, so we can write something like "while not eof(f) do".
For working with block files, read and write are not suitable; instead, the
BlockRead and BlockWrite procedures are used, which we will consider later in the
paragraph devoted to this kind of file.

When work with a file is finished, the file should be closed by calling the close
procedure. The file variable can then be used to work with another (or even the same) file;
if you apply reset or rewrite to the same file variable after closing the file, a file
with the same name will be opened, but you can also reassign the name by calling assign
again.

(It would actually be a mistake to say that the same file will be opened: in the time that elapses
between close and reset/rewrite, someone else may have renamed or deleted our file and
written a completely different one to disk under its name.)
To conclude the introductory paragraph, here is the text of the program that writes the
same phrase Hello, world! to the text file hello.txt:

program HelloFile;  { hellofile.pas }
const
    message = 'Hello, world!';
    filename = 'hello.txt';
var
    f: text;
begin
    assign(f, filename);
    rewrite(f);
    writeln(f, message);
    close(f)
end.

After running such a program, a 14 byte hello.txt file (13 message characters and a line
feed) will appear in the current directory, which can be viewed, for example, with the cat
command:

avst@host:~/work$ ./hellofile
avst@host:~/work$ ls -l hello.txt
-rw-r--r-- 1 avst avst 14 2015-07-18 18:50 hello.txt
avst@host:~/work$ cat hello.txt
Hello, world!
avst@host:~/work$

Actually, to do this properly, we ought of course to be a little more careful:
program HelloFile;
const
    message = 'Hello, world!';
    filename = 'hello.txt';
var
    f: text;
begin
    {$I-}
    assign(f, filename);
    rewrite(f);
    if IOResult <> 0 then
    begin
        writeln('Couldn''t open file ', filename);
        halt(1)
    end;
    writeln(f, message);
    if IOResult <> 0 then
    begin
        writeln('Couldn''t write to the file');
        halt(1)
    end;
    close(f)
end.

The first error message is much more important than the second one: files very often fail to
open for reasons beyond our control, whereas if a file has been opened, writing to it will
succeed in most cases (though not always, of course: for example, the disk may run out of space).
An open file, if it is a simple disk file, is characterized by its current position, which is
usually set to the beginning of the file when it is opened, and to the end of the file when a text
file is opened using the append procedure. Each input or output operation shifts the current
position forward by as many bytes as the number of bytes that were input or output. Therefore,
for example, successive read operations from the same file will not read the same data, but
successive portions of the data in the file, one after the other. In some cases, the current
position of an open file can be changed.

2.9.2. Text files


As the name implies, a text file is a file containing text, or, more precisely, data in text
format (see §§1.4.5 and 1.4.6). We can say that the data in such a file is a sequence of
characters, sometimes separated by line feeds; this is the format in which we read data from
the standard input stream and output it to the standard output stream.
To work with text files, as already mentioned, file variables of the built-in type text
are used; only for such input/output streams is the concept of a line defined, which makes
it meaningful to use the writeln and readln operators, as well as the eoln function
(end of line), which returns true if the current position in the file contains the end of a line.
Other kinds of files do not consist of lines, so neither output with a line feed, nor input up to
the end of a line, nor the end of line itself makes sense for them.
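As an illustration (a sketch only; f is assumed to be an open text file and c a char variable),
eoln lets us process a file character by character while still noticing where each line ends:

while not eof(f) do
begin
    while not eoln(f) do
    begin
        read(f, c);
        { ... process the character c ... }
    end;
    readln(f)    { skip the end-of-line and move to the next line }
end;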
It is important to realize that if we are not talking about characters and strings, but about
data of other types - for example, numbers - then an output operation to a text file implies a
translation from machine representation to text representation, and an input operation implies
a translation from text representation to machine representation. If this is not clear, reread
§1.4.6, and do so now; if the difficulty persists, ask someone who can explain it to you. The
difference between textual and machine representation (particularly for numbers) should be
quite obvious to you; if it is not, there is no point in going any further.
In particular, in §1.4.6 we considered files containing one hundred integers from 1000 to
100099, each 1001 larger than the previous one, with the numbers.txt file containing
these numbers in textual representation and the numbers.bin file in machine
representation, which we called binary at the time. The program creating the first of these
files could look like this:

program GenerateNumTxt;  { gennumtx.pas }
const
    name = 'numbers.txt';
    start = 1000;
    step = 1001;
    count = 100;
var
    f: text;
    i: integer;
    n: longint;
begin
    assign(f, name);
    rewrite(f);
    n := start;
    for i := 1 to count do
    begin
        writeln(f, n);
        n := n + step
    end;
    close(f)
end.

Text files do not allow forced changes to the current position and do not involve alternating
read and write operations; such files should be written as a whole, from beginning to end,
sometimes in several steps (in this case, the file is opened for appending using append).
There is no going back when writing text files; if something needs to be changed at the
beginning or in the middle of an existing text file, the whole file is overwritten. Therefore, for
text files, the reset procedure opens the file in read-only mode, and the rewrite
and append procedures open the file in write-only mode.
The peculiarities of the textual representation of data require extra care when reading "up
to the end of the file". We have already discussed this in detail in §2.5.4 for the case of the
standard input stream; when reading from an ordinary text file similar problems arise, and
they are solved by the same SeekEof function, only in this case it is called with one
parameter, a file variable. Recall that SeekEof actually checks whether there are still
meaningful (non-whitespace) characters left in the stream; to do so it reads and discards
whitespace characters. If the "end of file" situation occurs during this reading and discarding,
the function returns true; if a meaningful character is found, that character is pushed back
into the stream (it is considered unread, so that the next read operation can use it) and the
function returns false. A "file" version is also provided for the SeekEoln function,
which similarly "searches" for the end of the line, i.e. checks whether anything else
meaningful can be read from the current line.
Suppose, for example, we have a text file whose name we get through a command line
argument; the file consists of lines, each of which contains one or more floating-point
numbers. We need to multiply the numbers on each line, then add up the results of these
multiplications and print the total. For example, for a file containing

2.0 3.0 5.0
0.5 12.0

the result should be the number 36.0. The corresponding program can be written as
follows:

program MultAndAdd;  { multandadd.pas }
var
    sum, mul, n: real;
    f: text;
begin
    {$I-}
    if ParamCount < 1 then
    begin
        writeln('Please specify the file name');
        halt(1)
    end;
    assign(f, ParamStr(1));
    reset(f);
    if IOResult <> 0 then
    begin
        writeln('Could not open ', ParamStr(1));
        halt(1)
    end;
    sum := 0;
    while not SeekEof(f) do
    begin
        mul := 1;
        while not SeekEoln(f) do
        begin
            read(f, n);
            mul := mul * n
        end;
        readln(f);
        sum := sum + mul
    end;
    writeln(sum:7:5)
end.

Pay attention to readln(f) after the read/multiply loop. It is inserted to remove the line
feed character from the input stream; if this operator is removed, the program will simply
"hang".
It is clear that the SeekEof and SeekEoln functions can be used only for text files;
for any other data format such functions simply make no sense, because both separating data
with spaces and splitting data across lines are obviously phenomena possible only when
working with the textual representation.
It is worth noting that the standard input and standard output streams also have their own names
- Free Pascal provides global variables of type text for them. A standard output stream can be
referred to by the name output; for example, writeln(output, 'Hello') is the
same as just writeln('Hello'). Similarly, a standard input stream is referred
to by the name input, so you can write read(input, x) instead of just read(x).
These variables can be useful, for example, if you are writing a subroutine that outputs data in text
form but do not know in advance whether the data will go to a text file or to a standard
stream; in this case you can provide a parameter of type text and pass it either an open
file variable or output.
Influenced by the C language, Free Pascal has also incorporated other names for global
variables that denote standard streams: stdin (same as input) and stdout (same as
output). In addition, Free Pascal allows output to the standard error reporting stream (see §1.2.11),
also called the diagnostic stream, which is labeled ErrOutput or stderr.
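For example, a procedure like this sketch (the names are invented for illustration) can send its
output either to the screen or to a file, depending on what the caller passes:

procedure WriteReport(var dst: text);
begin
    writeln(dst, 'report header');
    writeln(dst, 'report body')
end;

{ ... }
WriteReport(output);   { print to the screen }
WriteReport(f);        { write to a previously opened text file f }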

2.9.3. Typed files


A typed file in Pascal is a file containing a sequence of records of equal size, corresponding
to the machine representation of values of some type. For example, the numbers.bin file
used in §1.4.6 can be thought of as consisting of 100 records of four bytes each, corresponding
to the machine representation of a four-byte integer (type longint). The following program
will create such a file:

program GenerateNumBin;  { gennumbin.pas }
const
    name = 'numbers.bin';
    start = 1000;
    step = 1001;
    count = 100;
var
    f: file of longint;
    i: integer;
    n: longint;
begin
    assign(f, name);
    rewrite(f);
    n := start;
    for i := 1 to count do
    begin
        write(f, n);
        n := n + step
    end;
    close(f)
end.

If you compare this program with the GenerateNumTxt program from the previous
paragraph, you will find that almost nothing has changed in the text: the program name has
changed, the suffix in the file name has changed (.bin instead of .txt), the write
operator is used instead of writeln and, finally, the most important thing: the file variable
in the previous program was of type text, while in this program it is of type file of
longint.
In principle, a file can consist of records of almost any type; only file types cannot be
used, and it is strongly discouraged (although possible) to use pointers, which we will
consider in the next chapter. A file can consist of data of any other type; in particular, using
the type file of char, you can open a text file as a typed file - or indeed any file at
all, because any file consists of bytes.
Very often a record type is used as the element type of a typed file. For example, when
creating a program for working with topographic maps, we could use a file that stores points
on the terrain, each specified by its latitude and longitude and carrying a name. For this
purpose we could describe the following type:

type
    NamedPoint = record
        latitude, longitude: real;
        name: string[15];
    end;

and the corresponding file variable:

var
    f: file of NamedPoint;

To create such a file you can use the rewrite procedure; to open an existing one, the
reset procedure. There is no opening for appending for typed files.
Unlike text files, which consist of lines of different sizes, records of typed files have a
fixed size, which allows you to alternate read and write operations to any place in an existing
file. You can change the current position in an open typed file using the seek procedure,
which must be passed two parameters: a file variable and the record number (the very first
record in the file has the number 0). For example, the following two lines:

seek(f, 15);
write(f, rec);
will write the rec record to position #15, regardless of which file positions we have
worked with before. This can be used to modify individual records of an existing file, which
is especially important for files of significant size because it allows us to avoid overwriting
them. For example, let's say we have a file consisting of NamedPoint records and
we need to take the record number 705 from this file and change its name (i.e. the value of
the name field) to the string 'Check12'. To do this, we can read this record into a
variable of the NamedPoint type (we will assume that we have such a variable and it is
called np), change the value of the name field and write the resulting record to the same
place:

seek(f, 705);
read(f, np);
np.name := 'Check12';
seek(f, 705);
write(f, np);

Note that we had to apply seek again before writing; the point is that after the read
operation, the current position in the open file f corresponded to the record following the
read, i.e. in this case, record #706, and we had to correct this.
Generally speaking, not all files that can be opened support changing the current position; those
I/O streams for which the very notion of a "current position" makes sense are called positionable.
These include, in particular, ordinary disk files; but for streams associated with the terminal (keyboard
input, screen output) or with the /dev/null pseudo-device mentioned in §1.2.11, the current
position is not defined. We will discuss the "positionability" of I/O streams in detail in Volume 2.
Since typed files allow alternating read and write operations, by default, procedures that
open a typed file open it in "read and write" mode. This applies to both reset and
rewrite: the only difference between them is that rewrite creates a new file, and if
there is already a file with that name, it eliminates its old contents; reset does neither, and
if there is no file with that name, an error is thrown.
This approach can cause problems, for example, when working with a file that the
program should only read, but the program does not have enough authority to write it. In such
a situation, an attempt to open the file for reading and writing will fail, i.e. both reset and
rewrite will fail. The problem is solved by a global variable called filemode; by
default it contains the value 2, which means that typed files are opened for reading and
writing. If we write 0 to this variable, files will be opened in read-only mode, which will
allow us (using the reset procedure) to successfully open a file that we don't have write
privileges for, but we do have read privileges for; of course, we will only be able to read such
a file. Very rarely there is a situation when we have write permissions to a file, but no read
permissions. In this case we need to set the filemode variable to 1 and use rewrite
to open the file.
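To illustrate (a sketch; data.bin is an arbitrary name chosen for the example), opening a typed
file strictly for reading might look like this:

var
    f: file of longint;
    n: longint;
begin
    filemode := 0;           { from now on, open files read-only }
    assign(f, 'data.bin');
    reset(f);                { succeeds even without write permission }
    read(f, n);
    close(f)
end.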

2.9.4. Block I/O


In addition to text and typed files, Free Pascal supports so-called untyped files, which
allow you to read and write large portions of data at once. The original Pascal had no such
facility, but this is the option that best matches the file-handling capabilities of modern
operating systems.

Information read from an untyped file can be placed in an arbitrary memory area, i.e.
almost any variable will do; the same goes for writing to an untyped file: the information
for such a write can be taken from a variable of any type. When opening an untyped file, the
block size in bytes is specified, and when reading and writing, the number of blocks to be read
or written. Most often the number 1 is specified as the block size, which allows completely
arbitrary fragments to be read and written.
A file variable for block I/O must be of type file with no element type specified:

var
    f: file;

As with other file types, untyped files are named using the assign procedure, and opened
by calling the familiar reset and rewrite procedures, but these procedures have
a second parameter when working with untyped files - an integer indicating the block size. It
is very important not to forget about this parameter, because "by default" (i.e. if you forget to
specify the second parameter) the block size will be 128 bytes, which usually does not
correspond to our purposes. It is not clear why such a "default" is adopted; as we have already
mentioned, the most common size of a "block" is one byte, which is the most universal.
Just as with typed files, both the reset and rewrite routines attempt to open a file
in read-and-write mode by default; this can be affected by changing the value of the global
variable filemode, as described in the previous paragraph.
To read from and write to untyped files, the BlockRead and BlockWrite
procedures are used; they are very similar to each other. Each takes four parameters: the
first is a file variable; the second is a variable of arbitrary type and size (except file
variables) into which the information read from the file will be placed or from which the
information to be written will be taken (for BlockRead and BlockWrite
respectively). The third parameter is an integer specifying the number of blocks to be read
or written; naturally, the product of this number and the block size in use must never exceed
the size of the variable given by the second parameter. Finally, the fourth parameter is a
variable of type longint, int64, word or integer, into which the procedures store
the number of blocks they actually managed to read or write. This result may well be less
than what we asked for; this happens most often when reading, when less information
remains in the file than we are trying to read. For example:

const
    BufSize = 100;
var
    f: file;
    buf: array [1..BufSize] of char;
    res: integer;
begin
    { ... }
    BlockRead(f, buf, BufSize, res);
    { ... }
    BlockWrite(f, buf, BufSize, res);

One special case, which arises only with BlockRead, is very important for us: if, after
the call, the variable given as the last parameter contains the value 0, it means the "end of
file" situation has occurred.
In principle, the fourth parameter can be omitted, in which case any discrepancy between the
result and the expectation will cause an error. Doing so is strongly discouraged, especially when
reading: after all, there is nothing wrong with the file having less data left than requested or with
reaching the end of the file, and in general the fourth parameter allows programs to be written
more flexibly.
For example, let's consider a program that copies one file to another, getting their names
from the command line parameters. We will use untyped file variables for both source and
destination files; we will open the first file in read-only mode and the second in write-only
mode. We will read the file in fragments of 4096 bytes (4 Kb), and this size will be set
to a constant; we will use an array of byte type elements of the corresponding size as a
buffer, i.e. a variable where the read information is placed.
We will write to the target file at each step exactly as much information as was read from
the source file. When the "end of file" situation occurs, we will immediately terminate the
read/write loop, and we will have to do it before writing, i.e. from the middle of the loop body;
we will use the break operator for this purpose, and make the loop itself "infinite". After
the loop is finished, we will naturally have to close both files. Since we already know about
the existence of the ErrOutput variable denoting the error message stream, we will output
all such messages into this stream as it should be. After detecting errors, we will terminate the
program with code 1 to show the operating system that something went wrong. The complete
program will look like this:

program BlockFileCopy;  { block_cp.pas }
const
    BufSize = 4096;
var
    src, dest: file;
    buffer: array [1..BufSize] of byte;
    ReadRes, WriteRes: longint;
begin
    {$I-}
    if ParamCount < 2 then
    begin
        writeln(ErrOutput, 'Expected: source and dest. names');
        halt(1)
    end;
    assign(src, ParamStr(1));
    assign(dest, ParamStr(2));
    filemode := 0;
    reset(src, 1);  { block size of 1 byte }
    if IOResult <> 0 then
    begin
        writeln(ErrOutput, 'Couldn''t open ', ParamStr(1));
        halt(1)
    end;
    filemode := 1;
    rewrite(dest, 1);
    if IOResult <> 0 then
    begin
        writeln(ErrOutput, 'Couldn''t open ', ParamStr(2));
        halt(1)
    end;
    while true do
    begin
        BlockRead(src, buffer, BufSize, ReadRes);
        if ReadRes = 0 then  { end of file! }
            break;
        BlockWrite(dest, buffer, ReadRes, WriteRes);
        if WriteRes <> ReadRes then
        begin
            writeln(ErrOutput, 'Error writing the file');
            break
        end
    end;
    close(src);
    close(dest)
end.

2.9.5. Operations on a file as a whole


Several operations can be performed on a file as a whole; the most important and popular
of them are deleting and renaming. Pascal provides the erase and rename procedures
for this purpose.
The erase procedure receives one parameter - a file variable of arbitrary type, i.e. a
typed file or a variable of file or text type will do. By the time erase
is called, the file variable must be assigned a file name using assign, but the file must
not be open, i.e. after assign is executed, we must either not call reset, rewrite
or append at all, or, having opened the file and worked with it, close it using close.
For example, the following simple program deletes from disk a file whose name is specified
by a command line parameter:

program EraseFile; { erase_f.pas }
var
    f: file;
begin
    {$I-}
    if ParamCount < 1 then
    begin
        writeln(ErrOutput, 'Please specify the file to erase');
        halt(1)
    end;
    assign(f, ParamStr(1));
    erase(f);
    if IOResult <> 0 then
    begin
        writeln(ErrOutput, 'Error erasing the file');
        halt(1)
    end
end.
The rename procedure receives two parameters: a file variable and a new name for the
file (as a string). As with erase, the file variable can be of any file type, and at the time
rename is called, it must be assigned a file name using assign, but the file must not
be open. The following program receives two command line arguments, the old and new file
names, and renames the file accordingly:
program RenameFile;
var
    f: file;
begin
    {$I-}
    if ParamCount < 2 then
    begin
        writeln(ErrOutput, 'Expected the old and new names');
        halt(1)
    end;
    assign(f, ParamStr(1));
    rename(f, ParamStr(2));
    if IOResult <> 0 then
    begin
        writeln(ErrOutput, 'Error renaming the file');
        halt(1)
    end
end.

2.10. Addresses, pointers and dynamic memory


So far, we have only used variables defined at program writing time; since a variable is
nothing more than a memory location, this means, among other things, that it is at program
writing time that we fully determine how much memory our program will use at runtime.
Meanwhile, when writing a program, we may simply not know how much memory we
need. Learning problems are often specifically formulated so that the size problem does not
arise: for example, we may be told that there can be as many numbers as we want but no more
than 10000, or that strings can be of any length but no longer than 255 characters, or something
like that; in practice, restrictions of this kind may not be acceptable. This is, for example, the
problem solved by the standard sort program: this program reads some text (from a file or
from a standard input stream), then sorts it and outputs the lines of the read text in sorted
order. The program does not impose any restrictions on the length of an individual line, nor
on the total number of lines; just as long as there is enough memory. It should be noted that
the amount of memory may vary from system to system; the sort program is written
without any assumptions about how much memory will be available to it, and tries to take as
much as it needs to sort a particular text. If there is not enough memory in a particular system
for the particular text it is given, the program will of course fail; but if there is enough memory,
it will successfully complete its task, and it will not take up any extra memory during
execution.
Mastering addresses and pointers will allow us to write programs that determine at
runtime how much memory to use. This becomes possible thanks to dynamic variables,
which, unlike ordinary variables, are not described in the program text, but are created during
program execution. It would be wrong to assume that pointers are needed only for this
purpose; their scope of application is much wider.
You may find this chapter the most difficult; it is possible that at first you will not understand
what is being talked about at all. Pointers are not part of the computer science curriculum, and
it seems that the reason for this is not that students cannot learn them, but that pointers
are beyond the capabilities of most teachers.
Meanwhile, later we will have to learn the C language, in which, unlike Pascal, it is
practically impossible to do anything without understanding the concept of addresses and
pointers. Pascal is more lenient to students in this respect: it allows you to write quite complex
programs without resorting to pointers, i.e. you can first learn to program in Pascal and then,
when the time is right, learn pointers - and thus prepare yourself for the
wisdom of C.
We have previously advised you to skip a paragraph or even an entire chapter if it is too
difficult and come back to it later; we cannot recommend this for the chapter on pointers,
because if you skip it, you will probably not be able to understand anything in either Volume
2 or Volume 3 of this book. If you find the material in this chapter difficult to understand, we
can advise you either to find someone who can help you understand it, or to postpone learning
new tools for a while and try to write programs using everything you have already learned.
Perhaps after a few months and half a dozen written and debugged programs, the chapter
about pointers will not seem so difficult to you. Besides, you may try to read the third part of
our book, which is devoted to programming at the level of machine commands using assembly
language, and then resume your attempts to master pointers. What you should definitely not
do is try to learn C without understanding pointers; you won't succeed anyway.

2.10.1. What is a pointer


Let us recall a few basic statements to begin with. First, computer memory consists of
identical cells, each with its own address. In most cases, an address is simply an integer, but
not always; on some architectures, an address may consist of two numbers, and it is
theoretically possible to imagine addresses having an even more complex structure. What we
can say for sure is that an address is some information that allows us to uniquely identify
a memory cell.
Second, we often use not individual memory cells, but some sets of cells in a row, which
we call memory areas. In particular, any variable is a memory area (variables of some types,
such as char or byte, occupy exactly one cell, which can be considered a special case
of an area). The address of a memory area is the address of the first cell of this area.
Like other information, addresses can be stored in memory, that is, one memory
location can contain (store) the address of another memory location. Many high-level
programming languages, including Pascal, provide a special type (or a family of types)
corresponding to addresses; a variable of such a type, that is, a variable that stores an
address (e.g., the address of another variable), is called a pointer.
The whole wisdom of pointers and addresses can be expressed in just two short phrases:

    A pointer is a variable that stores an address.

    A statement of the form "A points to B" means "A contains the address of B".
In practice, terminological liberties are often taken by confusing the concepts of address and
pointer; for example, one may hear that a function "returns a pointer", when in fact it does not
return a variable, but a value, i.e. an address, not a pointer. Similarly, it is quite often said that
an address (not a pointer) points to something, although the address, of course, does not point
to a variable but is a property of it. Although statements of this kind are not quite correct
formally, they do not, strangely enough, cause any confusion: it is always clear what is meant.
2.10.2. Pointers in Pascal
In most cases, Pascal uses so-called typed pointers - variables that store an address for
which it is known exactly what type of variable is located in the corresponding memory area.
To denote a pointer, the original Wirt description of Pascal used the symbol "1", but in ASCII
(and on your keyboard as well) there is no such symbol, so all real-life versions of Pascal use
the symbol """ (you can easily find this "cap" on the key "6"). For example, if you describe
two variables

var
    p: ^integer;
    q: ^real;

then p will be able to store the address of a variable of type integer, and q will
be able to store the address of a variable of type real.
The address of a variable can be obtained using the address-taking operation, which is
denoted by "@"; for example, the expression @x gives the address of the variable x.[182] In
particular, if we describe a variable of type real and a variable of type "pointer to real":

var
    r: real;
    p: ^real;

then it will be possible to put the address of the first variable into the second one:

p := @r;

Just in case, let us emphasize that the address-taking operation can be applied to any
variable, not only to a variable named by an identifier. For example, it can be used to
get the address of an array element or a record field.
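
For instance, the following sketch takes the address of an array element and then of a record
field (the variables here are made up for the illustration):

var
    a: array [1..10] of integer;
    c: record
        x, y: integer
    end;
    p: ^integer;
begin
    p := @a[3];   { address of the third element of the array }
    p := @c.y     { address of the field y of the record }
end.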
Pointers and address expressions in general would be completely useless if it were
impossible to do something with a memory region (i.e., in most cases, with a variable)
knowing only its address. For this purpose, the dereference operation is used (one also
says "dereferencing"; we could use, for example, the term "address reference" as well).
This operation is denoted by the already familiar symbol "^", which is placed after the
pointer name (or, generally speaking, after an arbitrary address expression, which can be,
for example, a call of a function returning an address). Thus, after the address assignment
from the above example, the expression p^ will denote "what p points to", which in this
case is the variable r. In particular, the operator

p^ := 25.7;

will put the value 25.7 into the memory located at the address stored in p (i.e. simply
into the variable r), and the operator

writeln(p^);

will print this value.

[182] In the original version of Pascal there was no such operation, which, in our opinion,
complicates not only the work but also the explanations; fortunately, in modern versions of
Pascal the address-taking operation is always present.


Let us note one more important point. Pascal also provides untyped pointers (and
addresses), for which the built-in pointer type is introduced. Addresses of this type are
treated by the compiler simply as abstract addresses of memory cells, without making any
assumptions about what type of values are stored in memory at such an address; pointers of
this type are therefore capable of storing an arbitrary address in memory, regardless of what
is at that address. If we describe a variable of pointer type, e.g.
var
ap: pointer;

then it will be possible to put an address of any type into such a variable; moreover, the value
of this variable can be assigned to a variable of any pointer type, which is fraught
with errors: for example, you can put the address of a variable of the string type into
ap, then forget about it and assign its value to a variable of the ^integer type; if
you now try to work with such a variable, nothing good will come of it, because the
address actually refers to a string, not an integer. That is why you should be extremely careful
when working with untyped pointers, and it is better not to use them at all unless you seriously
need to. Note that such a need will not arise for you soon, if at all: the real need for untyped
pointers appears when creating non-trivial data structures, which may be needed only in large
and complex programs.
We might not have mentioned untyped pointers at all in our book, if it weren't for the fact
that the result of the address-taking operation is an untyped address. Why the creators of Turbo
Pascal, who first introduced the address-taking operation into this language, did it this way
remains unclear (for example, in C the same operation is perfectly capable of producing a
typed address). This aspect of the compiler's behavior can be corrected, however, by inserting
the {$T+} directive into the program (for example, at its beginning).
There is another important case of an untyped address expression: the built-in constant
nil. It denotes an invalid address, i.e. one at which no variable can be located in memory, and
it is assigned to variables of pointer types to show that the pointer does not point anywhere at
the moment. The constant nil is sometimes called a null pointer, although strictly
speaking it is not a pointer, since a pointer is a variable, while nil is a value.
If we try to extract the "dry residue" from this paragraph, we get the following:
• if t is some type, then ^t is a "pointer to t" type;
• if x is an arbitrary variable, the expression @x means "address of the variable x" (untyped
by default, but if you apply the {$T+} directive, it has the type "pointer to T",
where T is the type of the variable x);
• if p is a pointer (or another address expression), then p^ denotes "what p points
to";
• the word nil denotes a special "null address" used to show that the pointer does not
point to anything at the moment.
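
All four points can be seen at work in one tiny program; this is our own illustrative sketch
(with the {$T+} directive assumed to be in effect, so that @x has a typed result):

var
    x: integer;
    p: ^integer;
begin
    p := nil;     { the pointer points nowhere yet }
    p := @x;      { now p stores the address of x }
    p^ := 5;      { writes 5 into x through the pointer }
    writeln(x)    { prints 5 }
end.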

2.10.3. Dynamic variables


As we have already mentioned, a dynamic variable is a variable (i.e. a memory area) that
is created during program execution. It is clear that such a variable cannot have a name,
because we set all the names when we write a program. Dynamic variables are handled using
their addresses stored in pointers.
Memory for dynamic variables is allocated from a special memory area called the heap.
When a dynamic variable is destroyed, its memory is returned back to the heap and can be
allocated for another dynamic variable. If there is not enough space in the heap, our program
(unbeknownst to us) asks the operating system to allocate more memory, increasing the heap
size. It should be taken into account that it is impossible to reduce the size of the heap: if our
program has already requested and received memory from the operating system, this memory
will remain at the disposal of the program until its completion. It follows from this, for
example, that if you need to create some dynamic variable and at the same time destroy some
other one, it is better to destroy the old variable first and then create the new one. Sometimes
this allows you to save on the total size of the heap.

[Figure 2.4. A pointer p to a string in dynamic memory; the string "This is a string, which
resides in the heap" is placed in the heap]
A dynamic variable is created using the new pseudo-procedure, which must be applied
to a typed pointer. Two things happen: first, a memory area of the required size is allocated
from the heap, where the newly created dynamic variable will be placed (in fact, this dynamic
variable is the memory area that has just been allocated); second, the address of the created
dynamic variable is entered into the specified pointer.
For example, if we describe a pointer

var
    p: ^string;

then we can now create a dynamic variable of type string; this is done using new:

new(p);

When this new is executed, first, a memory area of 256 bytes will be allocated from the heap,
which will become our new (dynamic) variable of the string type; second, the address of
this memory area will be stored in the variable p. Thus, we now have an unnamed
variable of the string type, and the only way to access it is through its address: the expression
p^ corresponds to this variable. We can, for example, put a value into this variable:

p^ := 'This is a string, which resides in the heap';

The structure obtained in the program memory can be schematically represented as shown in
Fig. 2.4.

Deleting a dynamic variable that we no longer need is done using the dispose
pseudoprocedure; its parameter is the address of the variable to be disposed of:

dispose(p);

Strangely enough, the value of the pointer p does not change; the only thing that happens
is that the memory that was occupied by the variable p^ is returned back to the heap; in
other words, the status of this memory area changes: instead of being occupied, it is marked
as free and available for allocation on demand (by one of the subsequent calls to new). Of
course, the value of the pointer p cannot be used after that, because we have informed the
heap manager that we are no longer going to work with the variable p^.
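
Now that both new and dispose are on the table, the earlier advice about ordering can be
shown concretely; a sketch, assuming p already points to a dynamic variable we no longer
need and q is another pointer of the same type:

dispose(p);   { return the old variable's memory to the heap first... }
new(q);       { ...so this allocation can reuse it instead of growing the heap }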
It is important to realize that a dynamic variable is not bound to a specific pointer. For
§2.10. Addresses, pointers and dynamic memory 426
example, if we have two pointers:

var
    p, q: ^string;

then we can allocate memory using one of them:

new(p);

then at some point put that address into another pointer:

q := p;

and work with the resulting variable, referring to it as q^ instead of p^; indeed, the address of
our variable is now in both pointers. Moreover, we can occupy the pointer p with something
else and work with the previously allocated variable only through q, and, when the time
comes, delete this variable using, again, the pointer q:

dispose(q);

What is important here is the address itself (i.e. the value of the address), not which of the
pointers this address currently lies in.
Another very important point is that if you are careless, you can easily lose the address of
a dynamic variable. For example, if we create a variable using new(p), work with it, and
then execute new(p) again without deleting the variable, the heap manager will allocate a
new variable and store its address in the pointer p; as always in such cases, the old value of
the variable p will be lost, but the address of the first allocated dynamic variable was stored
there, and we have no other way to access it!
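
The mistake just described fits in three lines; a sketch, assuming p is of type ^string:

new(p);
p^ := 'the first string';
new(p);   { the address of the first variable is lost: it has become garbage }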
A dynamic variable that we forgot to free and that is no longer pointed to by any
pointer becomes so-called garbage (this term should not be confused with junk, which in
programming usually means meaningless data rather than lost memory). Some programming
languages provide so-called garbage collection,
which ensures automatic detection of such variables and their return to the heap, but Pascal
has no such thing, and this, in general, is even good. The thing is that garbage collection
mechanisms are quite complex and often trigger at the most inopportune moment,
"suspending" for some time the execution of our program; for example, at the DARPA Grand
Challenge robot car competition in 2005, one of the cars running Linux and Java programs
flew at a speed of about 100 km/h into a concrete wall, and one of the possible causes was a
garbage collector that was not activated in time.
Anyway, Pascal doesn't have garbage collection, so we need to be careful about deleting
unnecessary dynamic variables; otherwise, we can run out of available memory very quickly.
By the way, programmers call the process of generating garbage a memory leak. Note
that memory leaks indicate only one thing: the program author's carelessness and inattention.
There can be no excuses for memory leaks, and if someone tries to tell you otherwise, do
not believe it: such a person simply does not know how to program.
The material of this paragraph may leave you a bit perplexed. In fact, why describe a
pointer (for example, to a string, as in our example) and then do some kind of new, if you
can immediately describe a variable of type string and work with it as usual?
Working with dynamic variables makes some sense if these variables are relatively large,
for example, several hundred kilobytes each (such a variable can be an array of records, some
fields of which are also arrays, etc.); it is simply dangerous to describe a variable of such a
size as a regular local variable, as there may not be enough stack memory, while there will be
no problems with placing it in the heap; besides, if you have many such variables, but not all
of them are needed at the same time, it may be useful to create and delete them as needed.
However, all these examples are, frankly speaking, a bit contrived; pointers are almost never
used in this way. The full potential of pointers is revealed only when creating so-called linked
dynamic data structures, which consist of separate variables of the "record" type, each
containing one or more pointers to other variables of the same type. We will consider some of
these data structures in the following paragraphs.

2.10.4. Single-linked lists


Perhaps the simplest dynamic data structure is the so-called single-linked list. Such a list is
built from links, each of which, being a variable of type "record", has as one of its
fields a pointer to the next element of the list; obviously, such a field must be of type "pointer
to a record of the same type". The last element of the list contains in this field the value nil
to show that there are no more elements. It is enough to store the address of the very first
element of the list in some pointer, and we can get to any other element of the list if necessary.
The word "single-linked" in the name of this data structure means that each of its elements
is pointed to by a single pointer; however, one might as well say that each element has a single
field designed to keep the list connected, and claim that "single-linkedness" implies just that.
Finally, we can say that the list has a single chain of pointer links, and attribute the mysterious
"single-linkedness" to this fact. As we will see later, all these statements are equivalent.
An example of a single-linked list of three elements containing integers 25, 36 and 49 is
shown in Fig. 2.5. To create such a list, we need a record consisting of two fields: the first
will store an integer, the second will store the address of the next link in the list. There is a
trick here in Pascal: we cannot use the name of the type being described when describing its
own fields, i.e. we cannot write like this:

type
    item = record
        data: integer;
        next: ^item; { error! the item type is not yet described }
    end;

At the same time, Pascal allows us to describe a pointer type using a type name that has not
yet been introduced; as a consequence, we can give a separate name to the item pointer
type before we describe the item type itself, and use that name when describing the item,
for example:
type
    itemptr = ^item;
    item = record
        data: integer;
        next: itemptr;
    end;

Such a description will not cause any objections from the compiler. Having item and
itemptr types, we can describe a pointer for working with a list:

var
first: itemptr;

and the list itself, shown in the figure, can be created, for example, like this:

new(first);
first^.data := 25;
new(first^.next);
first^.next^.data := 36;
new(first^.next^.next);
first^.next^.next^.data := 49;
first^.next^.next^.next := nil;

Constructions like first^.next^.data may scare you off; it's a normal reaction, but
we can't afford to be scared for long, so we'll have to figure out what's going on here. So, since
the pointer to the first element is called first, the expression first^ will denote
the whole first element. Since this element itself is a record of two fields, the fields are accessed,
as with any record, via a dot and a field name; thus, first^.data is the field of the first
element which contains an integer (in the figure and in our example it is the number 25), and
first^.next is a pointer to the next (second) element of the list. In turn,
first^.next^ is the second element of the list itself, and so on (see Figure 2.6).

[Figure 2.6. Navigation in a single-linked list]
If something here remains unclear, there is no point in moving on! First get an
understanding of what is going on, otherwise you will not understand anything in the
further text.
Of course, this is not how lists are handled in most cases; the example that required the
dreaded first^.next^.next^.data serves more for illustrative
purposes. Usually, when working with a single-linked list, one of two things is done: either
the elements are always placed at the beginning of the list (this is done with an auxiliary
pointer), or two pointers are stored, one at the beginning and one at the end of the list, and the
elements are always placed at the end. Before we show you how it is done, we will offer you
two problems that we strongly recommend you to try to solve yourself, at least before you
read the rest of this paragraph. The point is that once you have figured out how to do it
yourself, you will never forget it, and working with lists will never cause you problems again;
if you start right away with the examples we will give next, it will be much harder to figure
out how to do it yourself, because even the most strong-willed people often cannot resist the
temptation to write "by analogy" (i.e., without fully understanding what is going on).
So here's the first task; it will require creating a single-linked list and adding items to the
beginning of it:
Write a program that reads integers from the standard input stream until the situation
"end of file" occurs, then prints all the entered numbers in reverse order. The number
of numbers is unknown in advance, it is forbidden to introduce explicit restrictions on
this number.

The second task will also require the use of a single-linked list, but we will have to add new
elements to the end of the list, for which we need to work with the list through two pointers:
the first will store the address of the first element of the list, the second - the address of the
last element.

Write a program that reads integers from the standard input stream until
the situation "end of file" occurs, after which it prints all entered numbers
twice in the order in which they were entered. The number of numbers is
unknown in advance; it is forbidden to introduce explicit restrictions on
this number.

Keep in mind that such problems can be considered solved no sooner than a program
written by you and meeting the conditions is working on your computer (and working
correctly). Even then the problem is not necessarily solved correctly, because you may have
missed some important cases during testing or simply misinterpreted the results obtained; but
if there is no working program at all, there can be no question of the problem being solved.
In the hope that you have at least tried to solve the proposed tasks, we will continue our
discussion of working with lists. First of all, let us note one extremely important point. If the
context of the problem to be solved implies that you start working with an empty list, i.e. a
list that does not contain a single element, be sure to turn your pointer into a correct empty
list by entering the value nil into it. Beginners often forget about this and get constant
crashes as an output.
Adding an element to the beginning of a singly-linked list is done in three steps. First,
using an auxiliary pointer, we create (in dynamic memory) a new list element. Then we fill
this element; in particular, we make its pointer to the next element point to the element of the
list which is now (for now) the first, and after adding a new one it will become the second,
i.e. just the next element after the new one. The necessary address, as it is easy to guess, is in
the pointer to the first element of the list, and we will assign it to the next field in the new
element. Finally, the third step is to recognize the new element as the new first element of the
list; this is done by entering its address into the pointer storing the address of the first element.
[Fig. 2.7. Putting a new element at the beginning of a single-linked list]

What happens is shown schematically in Fig. 2.7 on the example of the same list of
integers, consisting of elements of type item, with the address of the first element
of the list stored in the pointer first; at first, this list contains elements storing
the numbers 36 and 49 (Fig. 2.7, "a)"), and we need to put a new element containing the
number 25 at its beginning. For this purpose, we introduce an additional pointer, which we
will call tmp, short for "temporary":

var
    tmp: itemptr;
In the first step we, as mentioned, create a new element:

new(tmp);

The resulting situation is shown in Fig. 2.7, "b)". Nothing has happened to the list yet, the
created new element does not affect it in any way. The element itself is still very "unfinished":
both of its fields contain incomprehensible garbage, which is shown in the figure by "?!"
symbols. It's time to make the second step - to fill in the fields of the new element. In the
data field we will need to enter the number 25, while the next field should
indicate the element that (after including the new element in the list) will become the next
after the new one; this is, in fact, the element that is now the first in the list, i.e. its address is
stored in the pointer first, and we will assign it to the next field:

tmp^.data := 25;
tmp^.next := first;

The state of our data structure after these assignments is shown in Fig. 2.7, "c)". All that
remains is to declare the new (and fully prepared for its role) element as the first element of
the list by assigning its address to the first pointer; since this address is stored in tmp,
everything turns out to be quite simple:

first := tmp;

The result is the situation shown in Fig. 2.7, "d)". Forgetting about our temporary pointer and
the fact that the first element of the list was just "new", we get exactly the state of the list we
were aiming for.
This three-step procedure of adding a new element to the beginning has one remarkable
property: for the case of an empty list, the sequence of actions to add the first element is
completely the same as the sequence of actions to add a new element to the beginning of
a list that already contains elements. The corresponding sequence of states of the data
structure is shown in Fig. 2.8: at first the list is empty ("a)"); at the first step we create a new
element ("b)"); at the second step we fill it ("c)"); the last action makes this new element the
first element of the list ("d)"). All this works only if we remembered to make the list
correct before starting to work with it by putting the value nil in the first pointer!
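
Gathered in one place, the whole procedure, starting from the correct empty list, might look
like this (the types are the same item and itemptr as before):

first := nil;        { a correct empty list }
new(tmp);            { step one: create a new element }
tmp^.data := 25;     { step two: fill its fields; }
tmp^.next := first;  { the current first element (or nil) becomes the next one }
first := tmp         { step three: the new element is now the first }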

Fig. 2.8. Operation of the insert-at-the-beginning procedure on an initially empty list

If we keep putting items at the beginning of the list, they end up in the list in the order
opposite to the order in which they were put there; this is exactly what is required in the first
of the two problems proposed earlier. The solution will consist of two loops: the first of them
will read numbers until the "end of file" situation occurs[183], adding each number just read
to the beginning of the list; after that it only remains to go through the list from its beginning
to its end, printing the numbers contained in it. The whole program will look like this:

program Numbers1;
type
    itemptr = ^item;
    item = record
        data: integer;
        next: itemptr;
    end;
var
    first, tmp: itemptr;
    n: integer;
begin
    first := nil;            { make the list correctly empty! }
    while not SeekEof do     { loop reading numbers }
    begin
        read(n);
        new(tmp);            { create }
        tmp^.data := n;      { fill }
        tmp^.next := first;
        first := tmp         { include in the list }
    end;
    tmp := first;            { go through the list from the beginning }
    while tmp <> nil do      { to the end }
    begin
        writeln(tmp^.data);
        tmp := tmp^.next     { move to the next item }
    end
end.

[183] Recall that when reading numbers we need to use the SeekEof function to track the
end-of-file situation; we discussed this in detail in §2.5.4.

In this text, we never used dispose; here, freeing memory makes little sense, because
immediately after the first (and only) use of the constructed list the program terminates, and
all the memory allocated to it in the system is freed; of course, this includes the heap. In more
complex programs, such as those that build one list after another many times over, freeing
memory should never be forgotten. You can free the memory of a single-linked list using a
loop in which at each step the first element is removed from the list, until the list is
empty. There is a trick to this too. If you simply call dispose(first), the first element of the
list will cease to exist, which means we will not be able to use its fields, but the address of the
second element is stored there. Therefore, before destroying the first element, we must
memorize the address of the next element; after that, the first element is destroyed and
the pointer first is assigned the address of the next element stored in the auxiliary
pointer. The corresponding loop looks like this:

while first <> nil do
begin
    tmp := first^.next;  { memorize the address of the next }
    dispose(first);      { destroy the first element }
    first := tmp         { the list now starts from the next }
end

You can also do something else: remembering the address of the first element in the auxiliary
pointer, exclude this element from the list by changing first accordingly, and then
delete it:

while first <> nil do
begin
    tmp := first;          { memorize the address of the first }
    first := first^.next;  { exclude it from the list }
    dispose(tmp)           { delete it from memory }
end
[Figure 2.9. Two ways of deleting the first element of a single-linked list]

Fig. 2.9 shows the first method on the left and the second on the right. You can decide for
yourself which one is better; there is no difference in efficiency between them.
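
Either loop is easily wrapped into a reusable procedure; here is a sketch of our own (the
name DisposeList is not from the book) built on the first method:

procedure DisposeList(var first: itemptr);
var
    tmp: itemptr;
begin
    while first <> nil do
    begin
        tmp := first^.next;  { memorize the address of the next element }
        dispose(first);      { destroy the current first element }
        first := tmp         { the list now starts from the next one }
    end
end;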
As you can see, the condition of the second problem differs from the condition of the first
one mainly in the order in which the items are printed. In principle, no one prevents us from
building the list "backwards", as for the first problem, and then "reverse" it by a separate loop
(try to figure out how to do it yourself as an exercise); but it would be more correct to build
the list in the required order at once, for which we need to add items to the end of the list, not
to its beginning.
If a single-linked list is to be incremented "from the end", it is usual to store not one
pointer for this purpose, as in the previous examples, but two pointers: to the first element of
the list and to the last element. When the list is empty, both pointers must be set to nil. The
procedure of adding a new element to such a list (to its end) looks even simpler than the just
discussed procedure of adding to the beginning: if, for example, the pointer to the last
element is called last, then we can create a new element right where we need it, i.e. put its
address in last^.next, then move the pointer last to the newly created last element
and fill its fields after that; if we need to add the number 49 to the list, we can do it this
way (see Fig. 2.10):

[Fig. 2.10. Adding a new item to the end of the list]

new(last^.next);
last := last^.next;
last^.data := 49;
last^.next := nil;

As is often the case, any simplification has its price. Unlike putting an element at the
beginning, when only the first pointer is stored for the list, putting in the first element
when there are two pointers is a special case that requires a completely different set of actions:
in fact, the last pointer does not point anywhere in this situation, because the last element
does not exist yet (no element exists at all), so the last^.next variable that we used so
dashingly does not exist either. The correct sequence of actions for putting the number 25
into the list, which was empty before, will now be as follows:

new(first);         { create the first element of the list }
last := first;      { declare it the last one }
last^.data := 25;   { fill }
last^.next := nil;

As we can see, the last two lines remained exactly the same as before (we have purposely
tried to make it so), while the first two lines were completely changed. The above can be
generalized. If the number we put into the list is in the variable n, and we don't know whether
there is at least one item in the list by that time or not, then the correct code for putting an
item into the list would be as follows:

if first = nil then
begin
    new(first);
    last := first
end
else
begin
    new(last^.next);
    last := last^.next
end;
last^.data := n;
last^.next := nil;
We do not give the full solution of the second problem; given all that has been said, it should
be quite obvious.

2.10.5. Stack and queue


In some computer science textbooks, one can find the statement that "stack" and "queue"
are data structures; in such cases, "stack" usually refers to the entity we called a
single-linked list in the previous paragraph. We can only urge the reader not to believe such
anti-scientific stories; in fact, both stack and queue can be realized by completely different
data structures (lists, arrays, various combinations of both, the same queue is often realized
by means of a ring buffer, etc.; moreover, the values do not even have to be stored in RAM,
we often have to use files on disk); stack and queue themselves are determined not by the data
structure that realizes them, but by the operations available to the user.
Both the stack and the queue represent some abstract "object" that provides (no matter
how) storage of values of a certain type, and two basic operations are defined for this object:
adding a new value and retrieving a value. The difference is in what order the retrieval takes
place: values are retrieved from the queue in the same order in which they were put there, and
from the stack - in the reverse order. In English, a queue is usually denoted by the abbreviation
FIFO (first in, first out), and a stack by the abbreviation LIFO (last in, first out).
In other words, a queue is such a thing where you can put values of some type (for
example, integer), and you can extract them from there, and when extracting them we always
get the value that (among those that have been entered but have not been extracted yet) has
been in our queue longer than others; with a stack the situation is the opposite: the most "fresh"
values are extracted first.
We already know the simplest way to implement a stack: it is a single-linked list in which
we always deal with the beginning: new elements are added at the beginning, and elements
are retrieved from the beginning as well. After a bit of thought, we can see
that a queue is also very easy to implement on the basis of a single-linked list, only new
elements should be added to the end instead of the beginning, while elements are still
retrieved from the beginning.
Note that in both cases we are talking about the realization of a mechanism, concerning
which we care about the available operations, while the way these operations are done may
remain behind the scenes. This situation in programming occurs quite often; it is this fact that
gave birth to the so-called object-oriented programming, which is super-popular nowadays.
We will learn what it is later; a separate part will be devoted to it in the third volume of our
book. Before we get to OOP, let us note that in any such situation programmers try to create
a set of subroutines (procedures) that implement all necessary operations, and describe them
in such a way that even if the principles of implementation change (in our case, for example,
when we switch from using a list to using an array), only the text of these procedures will
change, and the programmer using them will not notice anything at all.
In other words, if someone asks us to implement the same stack or some other abstract
object for which operations are important, but not its internal structure, we usually write a set
of procedures and functions and explain how to use this set, being careful that all our
explanations do not depend on what we use inside our procedures as an implementation
mechanism. Such a set of procedures, together with explanations of how to use them, is called
an interface.[184] An interface is said to be successfully designed if no changes in the
implementation details of our abstract object lead to changes either in the interface itself or
in the explanations of how to use it, i.e. if they can be painlessly ignored by those programmers
who will use our product.
One approach to creating good interfaces is to first think of an interface based on the set
of operations that the user expects from us, and then try to write an implementation that will
allow the procedures in our interface to work as intended. Since we start thinking about the
implementation after the interface is created, such an interface is very likely to be independent
of a particular implementation, which is what we need.
Let's try to apply this to the case of a stack; for definiteness, let it be a stack of integers of
the longint type. It is clear that no matter how it is realized, we will have to store at least
something, at least some information (if it is realized as a single-linked list, it will be a pointer
to the first element of the list, but we don't want to think about it yet). We can store information
only in variables, and if we remember the existence of variables of the "record" type, we will
notice that we can store any information (of a reasonable size) in one variable if we wish, we
just need to choose a suitable type for it. So, whatever the implementation of our stack is, it
is necessary and sufficient for each stack (of which there can be as many as we like) to describe
a variable of some type; not knowing what the implementation will be, we cannot say what
type it will be, but this does not prevent us from naming it. Since we are talking about a stack
of numbers of the longint type, let's call this type StackOfLongints.
The basic operations on the stack are traditionally denoted by the words push (adding a
value to the stack) and pop (extracting a value). We implement both of these operations as
procedures. At first glance it would be logical to call them something like
StackOfLongintsPush and StackOfLongintsPop, but looking carefully at
the resulting long names, we may (quite naturally) doubt that it will be convenient to use them.
Therefore, we will replace the cumbersome StackOfLongints with the short
SOL, and call the procedures SOLPush and SOLPop respectively. We will provide each
procedure with two parameters. The first of them in both cases will be a variable parameter
for passing the stack itself to the procedure; through the second parameter, the SOLPush
procedure will receive a value of the longint type to be placed on the stack, while
in the SOLPop procedure the second parameter will be a variable parameter: the
procedure will write the number extracted from the stack into this variable.

[184] The abbreviation API, formed from the words application programming interface, is
often used.
It should be noted that putting a new element on the stack always succeeds[185], while
removing an element from an empty stack is a known error. To avoid such an error, we will
add to our procedures a logical function SOLIsEmpty, which receives the stack as a
parameter and returns true if it is empty and false otherwise. Of course, the function will not
change the stack, but we will still pass the stack through a variable parameter to avoid
copying the object, which may be cumbersome; looking ahead, we note that our "stack object"
will be a simple pointer, but at the current stage of interface design we don't "know" this yet.
Let's agree to completely exclude SOLPop calls on an empty stack when working with
our subroutines[186]; for this purpose we will always check with SOLIsEmpty whether
the stack is empty before calling the SOLPop procedure (unless the presence of values in
the stack is obvious, for example, immediately after executing SOLPush, but this is rarely
the case).
As a matter of fact, we're almost done inventing the interface. Let's remember that a
variable, if nothing has been assigned to it, can contain arbitrary garbage, and add a procedure
that turns a freshly described variable into a correct empty stack; let's call it SOLInit. That's
it, the interface is ready. Remembering the method of step-by-step detailing, in which empty
subroutines are written first, we can fix our interface in the form of a program fragment:

type
    StackOfLongints = ;

procedure SOLInit(var stack: StackOfLongints);
begin end;

procedure SOLPush(var stack: StackOfLongints; n: longint);
begin end;

procedure SOLPop(var stack: StackOfLongints; var n: longint);
begin end;

function SOLIsEmpty(var stack: StackOfLongints): boolean;
begin end;

Of course, this fragment won't even compile because we haven't specified what
StackOfLongints is; but we have to start somewhere. Now that the interface is
completely fixed, we can start implementing it. Since we wanted to use a single-linked list,

[185] Except for the case when there is not enough memory; but modern operating systems
are such that in this case our program will not even know about the problem: it will simply
be killed, automatically.

[186] If it comes to writing documentation for our procedures and functions, such agreements
should definitely be reflected in it.
we need a type for its links:

type
    LongItemPtr = ^LongItem;
    LongItem = record
        data: longint;
        next: LongItemPtr;
    end;

Now we can start "fleshing out" our "stub" interface subroutines. To begin with, let's note that
only a pointer to the beginning of the list is needed to work with the stack, so we can use a
variable of type LongItemPtr as StackOfLongints. Let's reflect this fact:

type
StackOfLongints = LongItemPtr;

We take our four subroutines and implement their bodies:

procedure SOLInit(var stack: StackOfLongints);
begin
    stack := nil
end;

procedure SOLPush(var stack: StackOfLongints; n: longint);
var
    tmp: LongItemPtr;
begin
    new(tmp);
    tmp^.data := n;
    tmp^.next := stack;
    stack := tmp
end;

procedure SOLPop(var stack: StackOfLongints; var n: longint);
var
    tmp: LongItemPtr;
begin
    n := stack^.data;
    tmp := stack;
    stack := stack^.next;
    dispose(tmp)
end;

function SOLIsEmpty(var stack: StackOfLongints): boolean;
begin
    SOLIsEmpty := stack = nil
end;

With the use of these procedures, the solution to the problem of outputting an input
sequence of numbers in reverse order, which we considered in the previous paragraph, can
become noticeably more elegant:

var
    s: StackOfLongints;
    n: longint;
begin
    SOLInit(s);
    while not SeekEof do
    begin
        read(n);
        SOLPush(s, n)
    end;
    while not SOLIsEmpty(s) do
    begin
        SOLPop(s, n);
        writeln(n)
    end
end.

The full text of our example can be found in the sol.pas file.
Many programmers prefer, when a subroutine call may result in an error, to write it not as a
procedure, but as a function that returns a logical value. For example, we could implement SOLPop
as a function[187]:

function SOLPop(var stack: StackOfLongints; var n: longint): boolean;
var
    tmp: LongItemPtr;
begin
    if stack = nil then
    begin
        SOLPop := false;
        exit
    end;
    n := stack^.data;
    tmp := stack;
    stack := stack^.next;
    dispose(tmp);
    SOLPop := true
end;

This would make the second loop from our example noticeably shorter:

while SOLPop(s, n) do
    writeln(n)

The disadvantage of this solution, as you can guess, is that here the function has a side effect. We
will refrain from using such techniques until we are forced to do so by the specifics of the C language.

[187] By the way, the first edition of the book did exactly that.
Let's try to apply the same methodology to the queue. It is clear that we will definitely
need some variable to organize the queue; let us call its type QueueOfLongints. The
basic operations on a queue, unlike those on a stack, are usually called put and get; like the
stack, the queue will need an initialization procedure and a function to find out whether the
queue is empty.
The following stubs can be written to fix the interface:

type
    QueueOfLongints = ;

procedure QOLInit(var queue: QueueOfLongints);
begin
end;

procedure QOLPut(var queue: QueueOfLongints; n: longint);
begin
end;

procedure QOLGet(var queue: QueueOfLongints; var n: longint);
begin
end;

function QOLIsEmpty(var queue: QueueOfLongints): boolean;
begin
end;

For implementation, we will use a list of items of the same LongItem type, but note
that we need two pointers of type LongItemPtr to represent the queue; to combine
them into one variable, we will use a record:

{ qol.pas }
type
    QueueOfLongints = record
        first, last: LongItemPtr;
    end;

The implementation of the interface procedures will look like this:

procedure QOLInit(var queue: QueueOfLongints);
begin
    queue.first := nil;
    queue.last := nil
end;

procedure QOLPut(var queue: QueueOfLongints; n: longint);
begin
    if queue.first = nil then
    begin
        new(queue.first);
        queue.last := queue.first
    end
    else
    begin
        new(queue.last^.next);
        queue.last := queue.last^.next
    end;
    queue.last^.data := n;
    queue.last^.next := nil
end;

procedure QOLGet(var queue: QueueOfLongints; var n: longint);
var
    tmp: LongItemPtr;
begin
    n := queue.first^.data;
    tmp := queue.first;
    queue.first := queue.first^.next;
    if queue.first = nil then
        queue.last := nil;
    dispose(tmp)
end;

function QOLIsEmpty(var queue: QueueOfLongints): boolean;
begin
    QOLIsEmpty := queue.first = nil
end;

This code, complete with a demo main program, is in the qol.pas file.
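
The demo program in qol.pas may, of course, differ; a minimal usage sketch of our own,
showing that the numbers come out in the same order they were put in, could look like this:

var
    q: QueueOfLongints;
    i, n: longint;
begin
    QOLInit(q);
    for i := 1 to 5 do
        QOLPut(q, i * 10);   { put 10, 20, 30, 40, 50 }
    while not QOLIsEmpty(q) do
    begin
        QOLGet(q, n);
        writeln(n)           { prints 10, 20, 30, 40, 50 }
    end
end.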

2.10.6. Passing through the list by pointer to pointer


Let's consider a rather simple task: suppose we have the same list of integers and we need
to remove all elements containing negative values from it. The difficulty here is that to remove
an element from the list we need to change the pointer that points to it; if the element to be
removed is not the first in the list, we need to change the value of the pointer next in the
previous element of the list, if it is the first, we need to change the pointer first.
It will not work in this task to loop through the list as we did, for example, to print it. In
fact, executing the loop according to the familiar scheme

tmp := first;
while tmp <> nil do
begin
    { actions with the element tmp^ }
    tmp := tmp^.next
end

— At the moment of working with an element, we do not remember where the previous
element is located in memory and cannot change the value of its next field. We can try to
deal with this problem by storing the address of the previous element in the loop variable:

tmp := first;
while tmp^.next <> nil do
begin
    { actions with the element tmp^.next^ }
    tmp := tmp^.next
end

— but then the first element of the list will have to be processed separately; this, in turn, will
entail special processing for the case of an empty list. Besides, this fragment itself can be
executed only if the list is not empty, otherwise it will crash when trying to calculate the
condition in the loop header; the list may be empty as a result of throwing out the first
elements, so an additional check will be needed here as well. As a result, to remove all
negative values we get something like the following fragment:

while (first <> nil) and (first^.data < 0) do { delete from the beginning }
begin
    tmp := first;
    first := first^.next;
    dispose(tmp)
end;
if first <> nil then
begin
    tmp := first;
    while tmp^.next <> nil do
    begin
        if tmp^.next^.data < 0 then
        begin
            tmp2 := tmp^.next;
            tmp^.next := tmp^.next^.next;
            dispose(tmp2)
        end
        else
            tmp := tmp^.next
    end
end

[Fig. 2.11. Pointer to pointer as a loop variable]

It is hard to call the resulting solution beautiful: the fragment is cumbersome because of the
presence of two loops and bulky ifs, and it is rather difficult to read. Meanwhile, in
fact, when deleting the first element, exactly the same actions are performed as when deleting
any other element, only in the first case they are performed on the pointer first, and in
the second case on tmp^.next.
A non-trivial technique involving a pointer to a pointer allows us to reduce the size of this
code and make the solution clearer. If our list link type is called item, as before, and its
pointer type is called itemptr, then such a pointer to a pointer is described as follows:

var
    pp: ^itemptr;

The resulting variable pp is intended for storing the address of the pointer to the item,
so it can equally well contain both the address of our first and the address of the next
field located in any of the items of the list. This is exactly what we are trying to achieve; we
organize the loop through the list with the removal of negative items by using pp as a loop
variable, and pp will first contain the address of first, then the address of next from
the first link of the list, from the second link, and so on (see Fig. 2.11). In general, pp will at
every moment point to the pointer where the address of the current (considered) link of the
list is located. The initial value of the pp variable is obtained by the obvious
assignment pp := @first; the transition to the consideration of the next element will
look a bit more complicated:

pp := @(pp"".next);

If the two "caps" after pp seem too complicated, remember that pp" is what pp points
to, i.e., as we agreed, a pointer to the current element; hence, pp is the current element
itself, pp .next is its next field (i.e., just a pointer to the next element), we take
an address from it, put that address into pp, and then we're ready to work with the next
element.
The transition to the next element should be performed only if the current element has not
been deleted; if the element has been deleted, the transition to the next one happens by
itself: although the address in pp will not change, the address stored in the pointer pp^
(whether it is first or a next field, it does not matter) will change.
It remains to understand what the condition of such a loop should be. The loop should
end when there is no next element, i.e. when the pointer to the next element contains nil.
This can be either the first pointer (if the list is empty) or the next field of the last
element in the list (if there is at least one element in the list). This condition can be checked
by the expression pp^ <> nil. Finally (assuming that the variable tmp is of type
itemptr, as before, and the variable pp is of type ^itemptr), our loop will take
the following form:

pp := @first;
while pp^ <> nil do
begin
    if pp^^.data < 0 then
    begin
        tmp := pp^;
        pp^ := pp^^.next;
        dispose(tmp)
    end
    else
        pp := @(pp^^.next)
end
Comparing this solution with the previous one, we can see that it repeats almost verbatim the
loop that was the main loop in that solution, but we have lost the separate loop for removing
items from the beginning and separate checks for the list emptiness. The use of a pointer to a
pointer allowed us to generalize the main loop, thus removing the need to handle special cases.
Using a pointer to pointer allows you not only to delete elements from any place in the
list, but also to insert elements in any position of the list - more precisely, in the place where
the pointer pointed to by the working pointer is currently pointing. If the working pointer
contains the address of the first pointer, the insertion will be performed at the
beginning of the list, but if it contains the address of the next field of one of the links of the
list, the insertion will take place after this link, i.e. before the next link. It can also be an
insertion to the end of the list, if only the address of the next field from the last link of
the list is in the working pointer. For example, if the working pointer is still called pp, the
auxiliary pointer of the itemptr type is called tmp, and the number to be added to the
list is stored in the variable n, then inserting a link with the number n into the position
marked with pp will look like this:

new(tmp);
tmp^.next := pp^;
tmp^.data := n;
pp^ := tmp;

As you can see, this fragment repeats verbatim the procedure for inserting an element at the
beginning of a singly-linked list that we saw on page 413, except that pp^ is used instead of
first; as with deleting an element, this method of insertion is a
generalization of the procedure for inserting at the beginning. The generalization
is based on the fact that any "tail" of a singly-linked list is also a singly-linked list, only instead
of first, the next field of the element preceding such a "tail" is used for it. We will
use this fact later; among other things, it makes possible a rather natural application of
recursion to the processing of singly-linked lists.
By being able to insert links into arbitrary places in a list, we can, for example, work with
a list of numbers sorted in ascending order (or descending order, or by any other criterion).
At the same time, we need to make sure that each element is inserted exactly in the right
position of the list, so that the list remains sorted after such an addition.
This is done quite simply. If, as before, the number to be inserted is stored in the variable
n, and the list to the beginning of which first points is sorted in ascending order, then
inserting a new element can be done, for example, in the following way:

pp := @first;
while (pp^ <> nil) and (pp^^.data < n) do
    pp := @(pp^^.next);
new(tmp);
tmp^.next := pp^;
tmp^.data := n;
pp^ := tmp;
Only the first three lines are of interest here; we have already seen the other four. The while
loop allows us to find the right place in the list or, more precisely, the pointer that points to
the link before which the new element must be inserted. As usual, we start from the beginning;
here this means that the address of the pointer first is written into the working pointer.
There may be no next element at all, either because the list is empty or because
all the values stored in it are less than the one being inserted; in this case the
pointer pp^ (i.e. the one our working pointer points to) will have the value nil, hence
the first part of the loop condition. In such a situation there is nowhere further to go, and
we must insert right here.
The second part of the loop condition ensures that we find the right position: as long as
the next element of the list is smaller than the inserted one, we need to move further down the
list.
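To see the whole picture at once, the sorted insertion can be packaged as a procedure; this is a sketch of ours, not taken from the book, with the list head passed as a var-parameter so that its address can be taken:

procedure InsertSorted(var first: itemptr; n: integer);
var
    pp: ^itemptr;
    tmp: itemptr;
begin
    pp := @first;
    while (pp^ <> nil) and (pp^^.data < n) do
        pp := @(pp^^.next);
    new(tmp);
    tmp^.next := pp^;
    tmp^.data := n;
    pp^ := tmp
end;
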
Here we should pay attention to one important peculiarity of the evaluation of logical expressions.
If, for example, the pointer pp^ is equal to nil, an attempt to access pp^^.data would
lead to an immediate program crash, because the record pp^^ simply does not exist. Fortunately,
such an access never happens. This is because Free Pascal uses "lazy semantics" when
evaluating logical expressions: once the value of an expression is already known, its remaining
subexpressions are not evaluated. In our case the two parts of the condition are joined by the
conjunction and, so if the first part is false, the whole expression is false, and there is no need
to evaluate the second part.
It is useful to know that the original Pascal, as described by Niklaus Wirth, did not have this
property: when an expression was evaluated there, all of its subexpressions were evaluated. If Free
Pascal did the same, we would be in trouble, because evaluating the condition
(pp^ <> nil) and (pp^^.data < n) would be guaranteed to crash whenever pp^
was nil. It should be said that the break operator was not present in the original Pascal either;
goto had to be used instead. However, the original Pascal did not have the address-of operation
either, so our "pointer to pointer" technique could not be applied there at all.
Free Pascal allows you to use the "classical" semantics of evaluating logical expressions, where
all subexpressions are necessarily evaluated. This is enabled by the {$B+} directive, and disabled
by the {$B-} directive. Fortunately, it is the {$B-} mode that is used by default. Most likely, you
will never need to change this mode in your practice; if you think you do, think again.
Let us note one more point, just in case. For some reason the authors of many textbooks are afraid
to talk about the address-of operation, and without it the technique described here is
unavailable. Instead, such a strange construction as a list with a "key link" is occasionally
considered: only the next field of the "key link" is used (in the role in which we use the separate
pointer first), and all this only so that for every link in use one can speak of a "pointer
to the previous element" (for the first element this "previous" turns out to be the "key link"). Needless
to say, such solutions are not used in real life.

2.10.7. Doubly-linked lists; deques


The lists we have considered so far are called singly-linked, which obviously suggests
the existence of some other kinds of lists. In this paragraph we will consider lists called
doubly-linked.
When introducing the notion of a singly-linked list, we noted that this name can equally well
be explained by the presence of only one pointer pointing to each element of the list;
by the presence of only one pointer in each element to maintain cohesion; and by the presence of
only one coherent chain of pointers in the list. Similarly, in a doubly-linked list each
element is pointed to by two pointers (from the previous element and from the next
one), each element has two pointers, to the previous element and to the next one,
and the list has two chains of cohesion, forward and backward. It should be noted that such a
list is usually (though not always) handled with the help of two pointers: to the
first element of the list and to the last one. An example of a doubly-linked list is shown
in Fig. 2.12.
A doubly-linked list can be constructed from links described, for example, as follows:

type
    item2ptr = ^item2;
    item2 = record
        data: integer;
        prev, next: item2ptr;
    end;

As with the word next for the pointer to the next element, the word prev (from
the English previous) traditionally denotes the pointer to the previous element; of course,
you can use another name, but anyone else reading your program will then have a harder
time understanding it.
It should be said that doubly-linked lists are used somewhat less often than singly-linked
ones, because any insertion or deletion of a link requires twice as many pointer
operations, and the links themselves become larger due to the presence of two
pointers instead of one, i.e. they occupy more memory. On the other hand, doubly-linked lists
have a number of undoubted advantages over singly-linked ones. First of all, there is the obvious
symmetry, which allows the list to be traversed both forward and backward; in
some problems this is important.
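For example, a backward traversal is as straightforward as a forward one; here is a small sketch of ours, using the item2 type declared above and a last pointer to the final link:

procedure PrintBackwards(last: item2ptr);
var
    cur: item2ptr;
begin
    cur := last;
    while cur <> nil do
    begin
        writeln(cur^.data);
        cur := cur^.prev
    end
end;
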
The second undoubted advantage of a doubly-linked list is less obvious, but sometimes
even more important: knowing the address of any link of a doubly-linked list, we can find all
of its links in memory. Incidentally, the pointer-to-pointer technique discussed in the previous
paragraph is never necessary when working with a doubly-linked list. Indeed, let the current
link be pointed to by a pointer current; if the prev field in the current link is nil, then
it is the first link, and when inserting a new element to the left of it, first must be
changed; otherwise it is not the first link, that is, there is a previous link whose next
field must be changed, and we know where it is: the expression current^.prev^.next
gives the pointer that must be changed. In addition, the field current^.prev must be
changed. Inserting to the right of the current link is done in the same way: first change
current^.next^.prev, or, if it does not exist (i.e. the current link is the last in the list),
then last; then change current^.next.
If the first, last and current links of a doubly-linked list are pointed to by the first,
last and current pointers respectively, the new number is in the variable n, and
the temporary pointer is called tmp, inserting a new link to the left of the current link looks
like this:

new(tmp);
tmp^.prev := current^.prev;
tmp^.next := current;
tmp^.data := n;
if current^.prev = nil then
    first := tmp
else
    current^.prev^.next := tmp;
current^.prev := tmp;

Insertion to the right looks similar:

new(tmp);
tmp^.prev := current;
tmp^.next := current^.next;
tmp^.data := n;
if current^.next = nil then
    last := tmp
else
    current^.next^.prev := tmp;
current^.next := tmp;

Deleting the current item is even easier:

if current^.prev = nil then
    first := current^.next
else
    current^.prev^.next := current^.next;
if current^.next = nil then
    last := current^.prev
else
    current^.next^.prev := current^.prev;
dispose(current);

This fragment has one disadvantage: after its execution the pointer current points to a
non-existent (destroyed) element. This can be avoided by using a temporary pointer for the
deletion, putting the address of the previous or the next element into current
beforehand (before destroying the current element), depending on whether we are traversing
the list forward or backward.
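A sketch of that advice (ours, not from the book), for a forward traversal; tmp is of type item2ptr:

tmp := current;
current := current^.next;   { step forward before destroying }
if tmp^.prev = nil then
    first := tmp^.next
else
    tmp^.prev^.next := tmp^.next;
if tmp^.next = nil then
    last := tmp^.prev
else
    tmp^.next^.prev := tmp^.prev;
dispose(tmp)
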
Adding an element to the beginning of a doubly-linked list, taking into account a
possible special case, can look like this:
new(tmp);
tmp^.data := n;
tmp^.prev := nil;
tmp^.next := first;
if first = nil then
    last := tmp
else
    first^.prev := tmp;
first := tmp;

Adding to the end is done in the same way (with the directions reversed):

new(tmp);
tmp^.data := n;
tmp^.prev := last;
tmp^.next := nil;
if last = nil then
    first := tmp
else
    last^.next := tmp;
last := tmp;

We can generalize the above procedure for inserting to the left of the current link by agreeing
that if the current link does not exist (i.e. the insertion is "to the left of a nonexistent link"),
then it is an insertion at the end of the list. It will look like this:

new(tmp);
if current = nil then
    tmp^.prev := last
else
    tmp^.prev := current^.prev;
tmp^.next := current;
tmp^.data := n;
if tmp^.prev = nil then
    first := tmp
else
    tmp^.prev^.next := tmp;
if tmp^.next = nil then
    last := tmp
else
    tmp^.next^.prev := tmp;

A similar generalization of insertion to the right of the current link, to the rule "insertion
to the right of a nonexistent link is insertion at the beginning", is done almost the same way;
only the lines responsible for filling the fields tmp^.prev and tmp^.next change
(in our fragment these are lines two through six):

tmp^.prev := current;
if current = nil then
    tmp^.next := first
else
    tmp^.next := current^.next;

We do not provide schematic diagrams of pointer changes for all these cases, leaving the
visualization of what is happening to the reader as a very useful exercise.
Doubly-linked lists allow us to create an object commonly called a deque [188]. A deque
is an abstract object that supports four operations: add to the beginning, add to the end, extract
from the beginning, and extract from the end. A value added to the beginning of a deque can
be immediately retrieved back (by extracting from the beginning), but if values are extracted
from the end, the value just added to the beginning will be retrieved only after all the other
values stored in the deque; the situation with adding to the end and extracting from the
beginning is symmetrical.
If only the operations "add to the beginning" and "extract from the beginning" are used, the
deque turns into a stack (likewise with adding and extracting at the end), and with "add to
the beginning" and "extract from the end" into a queue; but you should not use a deque as a
stack or a queue, because much simpler implementations exist for those: a stack and a queue
can be realized through a singly-linked list, while a deque cannot (or, rather, two such lists
could be used, but this is very inconvenient). Note that in English the corresponding
operations are usually called push front, push back, pop front and pop back.

As usual, we can start the implementation of a deque with a stub. For example, for a deque
storing numbers of type longint, the stub might be as follows (as before, we assume
that before calling the extraction procedures we always check whether the object is empty):

type
    LongItem2Ptr = ^LongItem2;
    LongItem2 = record
        data: longint;
        prev, next: LongItem2Ptr;
    end;

    LongDeque = record
        first, last: LongItem2Ptr;
    end;

procedure LongDequeInit(var deque: LongDeque);
begin
end;

procedure LongDequePushFront(var deque: LongDeque; n: longint);
begin
end;

procedure LongDequePushBack(var deque: LongDeque; n: longint);
begin
end;

[188] In fact, the origin of the term "deque" is not so obvious. Originally the object in
question was called a "double-ended queue"; these three words were shortened by English-speaking
programmers first to dequeue, and then to deque.
procedure LongDequePopFront(var deque: LongDeque; var n: longint);
begin
end;

procedure LongDequePopBack(var deque: LongDeque; var n: longint);
begin
end;

function LongDequeIsEmpty(var deque: LongDeque): boolean;
begin
end;

We leave it to the reader to implement all these procedures and functions.
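For orientation, here is one possible shape of two of these operations; this is a sketch of ours, not the book's solution, and the remaining operations are symmetric:

procedure LongDequeInit(var deque: LongDeque);
begin
    deque.first := nil;
    deque.last := nil
end;

procedure LongDequePushFront(var deque: LongDeque; n: longint);
var
    tmp: LongItem2Ptr;
begin
    new(tmp);
    tmp^.data := n;
    tmp^.prev := nil;
    tmp^.next := deque.first;
    if deque.first = nil then
        deque.last := tmp
    else
        deque.first^.prev := tmp;
    deque.first := tmp
end;
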

2.10.8. Overview of other dynamic data structures


First of all, we should note that in many tasks a list link contains not just one data field,
as in our examples, but several. There is often a need to link one value with others:
for example, an inventory number with the description of an item, or a person's name
with his detailed questionnaire, or a computer's network address with a description of the
network services running on it, etc. In this case one link contains both the information that is
searched for (the number, name, address) and the data that serves as the result of the search.
The information by which the search is performed is called the search key.
Searching a list, whether singly- or doubly-linked, is quite slow, because we have
to go through it link by link until the right key is found. Even if the list is maintained in
key-sorted order, on average half of the links have to be examined per search.
In other words, when searching the list for a record with the desired key, on average
kn operations are spent per search, where n is the length of the list and k is some constant
factor; the duration of each search operation is said to depend linearly on the number of
records stored. If, say, the length of the list triples, the average time taken to find the desired
record increases by the same factor. Such a search is called linear search.
Linear search is satisfactory only in tasks where the number of records does not exceed
a few dozen; with several hundred records we will start having problems with
linear search; with several thousand, the user may be indignant at how slowly our
program works, and he will be right; and when the number of records reaches
tens of thousands, we can forget about linear search: we might as well not write
the program at all, it will be useless anyway.
Fortunately, the variety of data structures built from links floating in the "heap" is by no
means limited to lists. More "tricky" data structures can reduce the search time dramatically;
one of the simplest variants here is the so-called binary search tree. Like a list, a tree is built
from variables of type "record" containing pointer fields to records of the same type; the
records composing a tree are usually called nodes. In the simplest case each node has two
pointer fields, for the left subtree and the right subtree, usually called left and right. A
pointer containing the value nil is treated as an empty subtree.
A binary search tree is maintained in sorted form; here this means that, both for the
tree itself and for any of its subtrees, all values smaller than the value in the current node are
located in the left subtree, and all values larger than it in the right
subtree. An example of a binary search tree containing integers is shown in Fig. 2.13. In such
a tree any element can be found, given the required value, or its absence can be established,
in a number of actions proportional to the height of the tree, i.e. to the length of the longest
coherent chain from the root of the tree to a "leaf" node (for the tree in the figure this height
is three).

Fig. 2.13. Binary search tree storing integers

Note that a binary tree of height h can contain up to 2^h - 1 nodes; for example, a tree of
height 20 can contain more than a million values, and a search will take at most 20
comparisons; if the same values were stored in a list, each search would require half a million
comparisons on average.
Of course, upon closer examination everything turns out to be not so simple and rosy.
A tree is not always as densely filled with nodes as in our figure. An example of such a
situation is shown in Fig. 2.14: at height 5, the tree contains only nine nodes out of the 31
possible. Such a tree is called unbalanced. In the worst case the tree can take a
degenerate form in which its height equals the number of elements; this happens, for example,
if the values entered into the tree arrive in ascending or descending order.
Note, however, that the worst case for a tree is the only case for a list. In addition, there are
tree balancing algorithms that rebuild a search tree, reducing its height to
the minimum possible; these algorithms are quite complex, but they can be
implemented.
We will return to binary search trees in the chapter devoted to
"advanced" uses of recursion. The point is that all basic operations on a binary tree,
except balancing, are many times (!) easier to write with recursion than without it. For
now let us note that trees, of course, are not only binary: in the general case the number of
descendants of a tree node can be arbitrary and can even change dynamically.

Fig. 2.14. Unbalanced binary search tree

For example, there is a well-known method of storing text strings in which
each node of the tree can have as many descendants as there are letters in the alphabet; during
a search, the descendant of the next node is selected according to the next letter of the string.
The height of such a tree equals the length of the longest string stored in it, and it
needs no balancing.
The search time in a list is proportional to the length of the list; the search time in a
balanced binary tree is proportional to the binary logarithm of the number of nodes; for cases
when the search time is critical and the amount of stored data is large, there is an approach in
which the search time does not depend on the number of stored elements at all: it remains
constant whether there are a dozen values or ten million. This approach is called a hash table.
To create a hash table, a so-called hash function is used. In essence it is just some
arithmetic function that produces an integer for a given value of the search key, and this
number should be difficult to predict; it is said that the hash function should behave as a
well-distributed random variable. At the same time the hash function must depend only on the
value of the key, i.e. on equal keys it must always take equal values.
The hash table itself is an array whose size is fixed in advance and somewhat exceeds the
expected number of records; initially all elements of the array are marked as free.
When the hash function has been computed, its value determines in which position of the array
the record with the given key should be located; this is done simply by taking the remainder
of dividing the value of the hash function by the array length. To reduce the probability of
two keys landing in the same position, the size of the array used for the table is usually
chosen to be a prime number [189].
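As an illustration, the computation just described might look as follows; this is a sketch of ours, not from the book, and the multiplier 31 and the table size 211 (a prime) are arbitrary choices:

const
    TableSize = 211;

function HashString(var s: string): longint;
var
    i: integer;
    h: longint;
begin
    h := 0;
    for i := 1 to length(s) do
        { the "and" keeps the value non-negative if it overflows }
        h := (h * 31 + ord(s[i])) and $7fffffff;
    HashString := h
end;

{ the position of a key in the table is then  }
{     pos := HashString(key) mod TableSize    }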

If, when a new element is entered into the hash table, the calculated position is empty, the
new record is placed there. If the calculated position is empty when searching for
a key, it is concluded that the table contains no record with this key. A
somewhat more interesting question is what to do if a position is occupied but the key of the
record in it differs from the key we need; this situation means that two different
keys, despite all our efforts, give the same remainder when the hash function is divided by the
array length. This is called a collision. There are two main methods of collision resolution.
The first is very simple: the array stores not the records themselves but pointers that
form a list (an ordinary singly-linked list) of the records whose keys give the same hash value
(or rather, the same remainder of its division by the array length). Despite its outward
simplicity, the method has a serious drawback: lists are quite bulky, and working with them
takes time.
The second way of resolving collisions is trickier: if a position in the table is occupied by
a record whose key does not match the one we are looking for, the next position is examined,
and if that is occupied, the one after it, and so on. When a new record is entered into the table,
it is simply placed in the first free position found (after the hash function has been computed);
during a search, the records are examined one by one in search of the required key, and
if a free position is encountered, the required key is considered absent from the table. Among
the disadvantages of this method is a rather unobvious algorithm for deleting records from
the table. The simplest version of it goes as follows: if the positions immediately following the
record to be deleted are occupied, find the first free position in the table,
then take, one by one, every record located between the deleted record and that first free
position, temporarily remove it from the table and insert it back by the usual rule (a record
may end up earlier or in exactly the same place). Clearly, this may take quite a lot of time.
There is a more efficient deletion procedure that effectively does the same thing but in some
cases completes before reaching an empty position (see, e.g., [10], §6.4, "algorithm R"); this
procedure is harder to explain, and it is easy to make a mistake in it without a clear
understanding.
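The probing loop itself is short; here is a sketch of ours (it assumes non-negative keys, a table that is never completely full, and a hypothetical TableEntry record with an occupied flag):

const
    TableSize = 211;
type
    TableEntry = record
        occupied: boolean;
        key: longint;
        { ...whatever payload fields are needed... }
    end;
var
    table: array [0 .. TableSize - 1] of TableEntry;

{ returns either the position holding the key or the first
  free position where the key could be inserted }
function FindSlot(key: longint): longint;
var
    i: longint;
begin
    i := key mod TableSize;
    while table[i].occupied and (table[i].key <> key) do
        i := (i + 1) mod TableSize;
    FindSlot := i
end;
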
It should be noted that both methods (the first to a lesser extent, the second to a greater
one) are sensitive to how full the table is. It is considered that a hash table should be no more
than about two-thirds full, otherwise the constant linear searching (through the lists or through
the table itself) negates all the benefits of hashing. If there are too many records in the table,
it must be rebuilt with a new size: for example, double the current size, then take the nearest
prime number above that and declare it the new table size. Rebuilding a table is an extremely
expensive operation, because every record from the old table has to be added to the new
one by the usual rule, with the computation of the hash function and the taking of the
remainder, and there are no shortcuts for this.
One way or another, building hash tables requires arrays whose lengths are not known in
advance but determined dynamically. The original Pascal language did not include such
tools; modern dialects, including Free Pascal, support dynamic arrays, but we will not
consider them; if you wish, you can learn this tool on your own.

[189] Recall, just in case, that a prime number is a natural number that is divisible only by
one and by itself.

2.11. More on recursion


We first encountered recursion in §2.3.7; unfortunately, at that time we knew too
little to appreciate the examples in which the expressive power of
recursion is fully revealed. Now that we know how to work with strings and lists, have
an idea of trees, and have more experience in general, we will return to recursive
programming and try to give a more adequate idea of its capabilities.

2.11.1. Mutual recursion


Earlier we mentioned that recursion can be mutual; in the simplest case one subroutine
calls another, and the second calls the first again. This poses a small technical
problem: one of the two subroutines involved in mutual recursion must be described in the
program text before the other, yet it must (by assumption) contain a call to the second
subroutine, i.e. a call to a subroutine that will be described later. The trouble is that the
compiler does not yet know the name of the second subroutine and consequently does not
allow it to be used: calling the second subroutine from the first would cause a
compilation error.
To solve this problem, Pascal introduces the notion of a forward declaration of a
subroutine. Unlike a subroutine description, a declaration only tells the
compiler that later in the program text there will be a subroutine (procedure or function) with
such-and-such a name and such-and-such parameters, and, for functions, such-and-such
return type. The declaration gives the compiler nothing more: neither the subroutine's body
nor its local declarations section is written in it. Instead of all this, the keyword forward
and a semicolon are placed immediately after the header, which is formed in the usual way,
for example:

procedure TraverseTree(var p: NodePtr); forward;

function CountValues(p: NodePtr; lim: longint): integer; forward;

Such declarations allow us not to write the subroutine body, which we may not be ready for
yet, usually because not all the subroutines called from that body have been described in the
program. The declaration tells the compiler, first, the name of the subroutine and, second, all
the information necessary to check the correctness of calls to it and to generate the
machine code performing the calls (this may require conversions of expression types, which
is why the types of the subroutine's parameters must be known to the compiler at the point
of call, and not only for correctness checking).
When constructing mutual recursion, we first provide in the program a declaration (with
the word forward) of the first of the subroutines involved, then describe (completely,
with a body) the second of them, and then give the complete description of the
first. In both bodies the compiler can see the names of both subroutines and all the
information needed to call them, which is exactly what we needed.
Clearly, a declared subroutine must be described later; if this is not done, a compile-time
error occurs.
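A minimal illustration of the scheme just described (ours, deliberately silly; the pair of functions obviously terminates for non-negative n):

function IsOdd(n: integer): boolean; forward;

function IsEven(n: integer): boolean;
begin
    if n = 0 then
        IsEven := true
    else
        IsEven := IsOdd(n - 1)
end;

function IsOdd(n: integer): boolean;
begin
    if n = 0 then
        IsOdd := false
    else
        IsOdd := IsEven(n - 1)
end;
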

2.11.2. The Towers of Hanoi


We considered the Tower of Hanoi problem in the introductory part of this book when
we discussed the concept of an algorithm (see §1.3.5). At that time we promised to
demonstrate the text of this program first in Pascal and then in C; now it is time to fulfill the
first part of the promise.
Recall that in this problem there are three rods, one of which holds N flat disks of different
sizes, the largest at the bottom and the smallest at the top. In one move a single disk can be
transferred from one rod to another; a disk may be placed only on top of a larger one, and any
disk can be placed on an empty rod. Several disks cannot be taken at once. The goal is to
move all the disks from one rod to another in the smallest possible number of moves, using
the third rod as an intermediate.
As we discussed in §1.3.5, the simplest and best-known solution is constructed as a
recursive procedure for transferring a given number n of disks from a given source rod to a
given target rod using a given intermediate rod (its number could be computed, but it is easier
to receive it as a parameter of the procedure); the simpler case is the transfer of n - 1 disks, and
the recursion basis is the degenerate case n = 0, when nothing needs to be transferred
at all.
The initial data for our program is the total number of disks, which for convenience we
will take as a command line parameter [190]. The recursive implementation of the algorithm
will be formalized as a procedure which, for clarity, receives four parameters: the
number of the source rod (source), the number of the target rod (target), the
number of the intermediate rod (transit) and the number of disks (n). The
procedure itself will be called solve. During its work it will print lines like "1: 3 ->
2" or "7: 1 -> 3", meaning, respectively, the transfer of disk number 1 from the
third rod to the second and of disk number 7 from the first to the third. The main program,
having converted the command line parameter into a number [191], will call the solve
subroutine with parameters 1, 3, 2, N, as a result of which the problem will be solved.
The text of the program is as follows:
the program will be as follows:

program hanoi; { hanoi.pas }

procedure solve(source, target, transit, n: integer);
begin
    if n = 0 then
        Exit;
    solve(source, transit, target, n-1);
    writeln(n, ': ', source, ' -> ', target);
    solve(transit, target, source, n-1)
end;

var
    n, code: integer;
begin
    if ParamCount < 1 then
    begin
        writeln(ErrOutput, 'No parameters given');
        halt(1)
    end;
    val(ParamStr(1), n, code);
    if (code <> 0) or (n < 1) then
    begin
        writeln(ErrOutput, 'Invalid token count');
        halt(1)
    end;
    solve(1, 3, 2, n)
end.

[190] Let's not be like school teachers and read this number from the keyboard; it is
inconvenient and simply stupid.
[191] Here we use the built-in val procedure for brevity.

Note that the procedure that solves the problem consists of only eight lines, three of which are
service lines.
Let us now try to do without recursion. First, recall the algorithm we gave on page
183: on odd moves the smallest disk is moved "in a circle"; if the total number of disks is
even, in the "forward" order, i.e. 1 -> 2 -> 3 -> 1 -> ..., and if it is odd, in the "reverse" order,
i.e. 1 -> 3 -> 2 -> 1 -> ...; as for the even moves, on them we do not touch the smallest disk,
as a result of which the move is unambiguous.
An attempt to implement this algorithm in the form of a program encounters an
unexpected obstacle. For a human, an action like "look at the rods, find the smallest disk and,
without touching it, make the only possible move" is so simple that we execute such an
instruction without thinking for a second; in a program, however, we have to remember which
rod currently has which disks on it and perform many comparisons, taking into account that
the rods may turn out to be empty.
In the archive of examples for this book you will find the corresponding program in a file
named hanoi2.pas; we do not give its text here in order to save space. Let us only note
that, to store information about the location of the disks on the rods, the program uses three
singly-linked lists, one per rod; at the beginning the first list contains the numbers from n
down to 1, where n is the number of disks; to store the pointers to the first elements of these
lists, an array of three pointers is used. To solve the problem we run a loop that continues as
long as at least one "disk" (i.e., a list element) is present in lists #1 and #2. On odd-numbered
moves we simply compute the motion of the smallest disk by formulas that use the move
number. In particular, for an even total number of disks the number of the rod from which
the disk must be taken is calculated like this:

src := (turn div 2) mod 3 + 1;

and for an odd total number of disks like this (turn is the move number):

src := 3 - ((turn div 2 + 2) mod 3);

The case of even moves is more complicated. To compute such a move we must first find
out which rod holds the smallest disk and exclude that rod from consideration.
Then, for the remaining two rods, we must determine in which direction a disk moves
between them, taking into account that one of them may be empty (in which case the disk
moves onto it from the other rod), or both may hold disks, in which case the smaller disk
moves to the rod holding the larger one. The body of the subroutine containing
these actions, despite their apparent simplicity, took 15 lines.
The total length of the program turned out to be 111 lines (against 27 for the recursive
solution). If we discard the empty and service lines, as well as the text of the main part of the
program, which is practically the same in both cases (only the parameters of the solve call
differ), and count only the significant lines implementing the solution, the recursive version
had eight such lines (the text of the solve subroutine), while the new solution has 87.
In other words, the solution has become more complicated by more than an order of
magnitude!
Now let us try to write a recursion-free solution that does not use the "tricky"
algorithm above. Instead, recall that to move all disks from one rod to another, we must first
move all but the largest disk to the intermediate rod, then move the largest disk to the target
rod, and finally move all the other disks from the intermediate rod to the target one. Although
this description of the algorithm is obviously recursive, it can be implemented without
recursion; in principle this is true of any algorithm, i.e. recursion can always be replaced by
a loop, the only question is how difficult that is.
The difficulty is that at each moment we have to remember what we are moving, where to,
and for what purpose. For example, in the process of solving the problem for four disks, at
some point we move the first (smallest) disk from the second rod to the third, to move two
disks from the second rod to the first, to be able to move the third disk from the second rod to
the third, to move three disks from the second rod to the third (since we have already moved
the fourth disk there), to move all four disks from the first to the third rod. This can be
somewhat more conveniently diagrammed "from the end":
• we are solving the problem of transferring four disks from the first rod to the third; we
have already moved all the smaller disks to the second rod, transferred the fourth disk
to the third rod, and are now in the process of transferring the rest there, i.e.
• we are solving the problem of transferring three disks from the second rod to the third;
right now we are trying to free the third disk in order to transfer it to the third rod, for
which purpose
• we are solving the problem of transferring two disks from the second rod to the first;
right now we are trying to free the second disk in order to transfer it to the first rod, for
which purpose
• we transfer the first disk from the second rod to the third.
Obviously, we are dealing here with tasks, each characterized by how
many disks we want to move, from where, to where, and what state the task is in. The number
of tasks varies, so we will have to use some dynamic data structure; the easiest,
I guess, is an ordinary list. For each task we must store the number of disks and two rod
numbers, as well as the state of affairs, of which we can distinguish three:
1. we have not done anything yet; the next action is to free the largest
of our disks by moving everything smaller than it to the intermediate rod;
2. we have already freed the largest disk; now we need to move it and then move all the
disks smaller than it on top of it;
3. the task is already solved, so it can be removed from the list.
In principle we could do without one of these states (the last one), but at the
expense of program clarity. To denote the states we will use an enumerated type with three
possible values, indicating what should be done next when we encounter the given task again:

type
    TaskState = (StClearing, StLargest, StFinal);

Let's describe the type for task list links:


type
    ptask = ^task;
    task = record
        amount, src, dst: integer;
        state: TaskState;
        next: ptask;
    end;

and two pointers, a working one and a temporary one:

var
    first, tmp: ptask;

To solve the problem we need to move all the disks (n of them) from the first rod to the third.
Let us formalize this as a task and put it into the list (as its single item); since we have not
done anything yet, we specify StClearing as the task state:

new(first);
first^.amount := n;
first^.src := 1;
first^.dst := 3;
first^.state := StClearing;
first^.next := nil;
§2.11 More on recursion 462
Next, a loop is organized which runs until the task list is empty. At each step of the loop the
actions to be performed depend on the state of the task at the head of the list. If it is in
the StClearing state, then, if this task requires moving more than one
disk, another task must be added: to move n - 1 disks to the intermediate rod; if the current
task was created to move just one disk, this is not needed. In both cases the current
task itself is moved into the next state, StLargest; that is, when we see it next time
(either after all the smaller disks have been moved or, if there is only one disk,
right at the next step), we will need to move the largest of the disks covered by this task and
proceed to the final stage.
If the head of the list holds a task in the StLargest state, the first thing we do is
move the largest of the disks to which this task applies; the number of this disk coincides
with the number of disks in the task. By "moving a disk" we mean here simply printing the
appropriate message. After that, if the task covers more than one disk, we need to
add a new task: to move all the smaller disks from the intermediate rod to the target one; if
there is only one disk, this is not needed. In any case the current task is put into the
StFinal state, so that the next time we see it at the head of the list it can be eliminated.
The next problem we face is the calculation of the intermediate rod number; we have
provided a separate function in the program for this purpose:

function IntermRod(src, dst: integer): integer;
begin
    if (src <> 1) and (dst <> 1) then
        IntermRod := 1
    else
    if (src <> 2) and (dst <> 2) then
        IntermRod := 2
    else
        IntermRod := 3
end;

Using this function, the basic "problem solving" loop looks like this:
while first <> nil do
begin
    case first^.state of
        StClearing:
            begin
                first^.state := StLargest;
                if first^.amount > 1 then
                begin
                    new(tmp);
                    tmp^.src := first^.src;
                    tmp^.dst := IntermRod(first^.src, first^.dst);
                    tmp^.amount := first^.amount - 1;
                    tmp^.state := StClearing;
                    tmp^.next := first;
                    first := tmp
                end
            end;
        StLargest:
            begin
                first^.state := StFinal;
                writeln(first^.amount, ': ', first^.src,
                    ' -> ', first^.dst);
                if first^.amount > 1 then
                begin
                    new(tmp);
                    tmp^.src := IntermRod(first^.src, first^.dst);
                    tmp^.dst := first^.dst;
                    tmp^.amount := first^.amount - 1;
                    tmp^.state := StClearing;
                    tmp^.next := first;
                    first := tmp
                end
            end;
        StFinal:
            begin
                tmp := first;
                first := first^.next;
                dispose(tmp)
            end
    end
end;

The full text of the program can be found in the file hanoi3.pas; note that its size is 90 lines,
of which 70 are significant lines of the implementation of the solution (including
auxiliary subroutines). The result is somewhat better than in the previous case, but still,
compared with the recursive solution, the difference is almost an order of magnitude.

2.11.3. Matching against a pattern


Consider the following problem. We are given two character strings whose lengths are
not known in advance. The first string is treated as the string to be matched, and the second
as a pattern. In the pattern, the character '?' matches an arbitrary single character,
the character '*' matches an arbitrary (possibly empty) substring, and all
other characters stand for themselves and match only themselves. Thus, the pattern
'abc' matches only the string 'abc'; the pattern 'a?c' matches any string
of three characters beginning with 'a' and ending with 'c' (the character in the middle
can be anything). Finally, the pattern 'a*' matches any string beginning with 'a',
and the pattern '*a*' matches any string containing the letter 'a' anywhere. We must
determine whether the given string matches the given pattern (as a whole).
The matching algorithm, if recursion may be used, turns out to be quite
simple. At each step we look at a remainder of the string and a remainder of the pattern;
initially these remainders are the whole string and the whole pattern, and then, as the
algorithm progresses, characters are discarded from their beginnings, and we assume that the
discarded characters have been matched successfully. The first thing to do at each step is
to check whether the pattern has run out. If it has, the result depends on whether the
string has run out too: if it has, we record a positive result of the match, and if it has not,
we state failure; indeed, only an empty string can match an empty pattern.
If the pattern has not run out yet, we check whether the remainder of
the pattern begins with an '*'. If not,
everything is simple: we compare the first characters of the string and pattern remainders;
if the first character of the pattern is not a '?' and is not equal to the first character of the
string, the algorithm ends here, stating that the match has failed; otherwise we consider the
current characters of the pattern and the string successfully matched, discard them
(i.e. shorten both remainders at the front) and return to the beginning of the algorithm.
The most interesting case is when the first character of the pattern remainder
turns out to be an '*'. In this case we must successively try to match this
"asterisk" with an empty substring of the string, with one character of the string, with two
characters, and so on, until the string itself runs out. We do it as follows. We introduce
an integer variable denoting the variant currently under consideration and assign zero to it
(we start with the empty substring). Now, for each variant under
consideration, we discard one character (the asterisk) from the pattern, and from the string as
many characters as our variable says. We try to match the resulting remainders by
calling the subroutine we are writing right now, i.e. we make a recursive call to "ourselves". If
the call returns "true", we finish, also returning "true". If the result is "false",
we check whether the variable can still be incremented (whether we would go beyond the
string being matched). If there is nowhere further to go, we finish, returning "false".
Otherwise we return to the beginning of the loop and consider the next possible length of the
substring matched with the "asterisk".
We will implement this algorithm as a function that receives as
parameters two strings, the string to be matched and the pattern, and two integers indicating
the positions from which the two strings are to be examined. Our function will return a value
of type boolean: true if the match succeeded and false if it failed. Taking
into account that values of type string occupy quite a lot of memory (256 bytes)
and that our function constantly calls itself, we will pass the strings to it as var-parameters,
thus avoiding copying them (see the discussion of this subject on page 350). We will call the
function MatchIdx, because it solves the matching problem, but not simply for strings:
for the "remainders" of the given strings, starting from the given indices.
The original problem will be solved by another function, which we will call simply Match;
it is passed just the two strings and nothing else. The body of this function consists of a call
to MatchIdx with the same two strings and both index values equal to one, which
means that both strings are considered from the beginning. This time we describe the
string parameters as ordinary by-value parameters. The point is that Match is called
once, no recursive calls are made to it, so copying the strings during the call is not a disaster
here; on the other hand, if its parameters were declared as variable parameters, we would lose
the ability to call the function with arguments that are not variables; for example, we could
not specify the pattern as a string literal, which is often necessary. Using by-value parameters
for Match, we can safely use variable parameters for MatchIdx, because it will be called
only from Match and will therefore work with Match's local variables. In other words,
when we need to match a string against a pattern, we call our functions, they make copies of
both strings for themselves, and then they work only with those copies.
To test all this, let us add a header and a main part that takes the string and the pattern
from the command line parameters. The result is as follows:

program MatchPattern; { match_pt.pas }

function MatchIdx(var str, pat: string; idxs, idxp: integer): boolean;
var
    i: integer;
begin
    while true do
    begin
        if idxp > length(pat) then
        begin
            MatchIdx := idxs > length(str);
            exit
        end;
        if pat[idxp] = '*' then
        begin
            for i := 0 to length(str) - idxs + 1 do
                if MatchIdx(str, pat, idxs + i, idxp + 1) then
                begin
                    MatchIdx := true;
                    exit
                end;
            MatchIdx := false;
            exit
        end;
        if (idxs > length(str)) or
            ((str[idxs] <> pat[idxp]) and (pat[idxp] <> '?')) then
        begin
            MatchIdx := false;
            exit
        end;
        idxs := idxs + 1;
        idxp := idxp + 1
    end
end;

function Match(str, pat: string): boolean;
begin
    Match := MatchIdx(str, pat, 1, 1)
end;

begin
    if ParamCount < 2 then
    begin
        writeln(ErrOutput, 'Two parameters expected');
        halt(1)
    end;
    if Match(ParamStr(1), ParamStr(2)) then
        writeln('yes')
    else
        writeln('no')
end.

When working with this program, note that the characters "*" and "?" are treated by the
command interpreter in a special way: it, too, considers it its duty to perform a pattern
matching of its own (see §1.2.7 for details). The easiest and safest way to avoid
trouble with special characters is to enclose the parameters in apostrophes; the command
interpreter performs no substitutions inside them, and the apostrophes themselves
disappear before the parameters reach your program, i.e. you will not see them in the strings
returned by the ParamStr function. A session with the match_pt program
may look like this:

avst@host:~/work$ ./match_pt 'abc' 'a?c'
yes
avst@host:~/work$ ./match_pt 'abc' 'a??c'
no
avst@host:~/work$ ./match_pt 'abc' '***a***c***'
yes
avst@host:~/work$

et cetera, et cetera.
2.11.4. Recursion when working with lists
As already mentioned, a singly-linked list is recursive in nature: one can consider that a
list is either empty or consists of a first element followed by a list. This property can be
exploited by processing singly-linked lists with recursive subroutines, the empty list almost
always serving as the recursion basis. Suppose, for example, that we have a simple list of
integers consisting, as in the previous examples, of items of type item:

type
    itemptr = ^item;
    item = record
        data: integer;
        next: itemptr;
    end;

Let's start with a simple calculation of the sum of numbers in such a list. Of course, we can
loop through the list, as we have done many times before:
function ItemListSum(p: itemptr): integer;
var
    sum: integer;
    tmp: itemptr;
begin
    tmp := p;
    sum := 0;
    while tmp <> nil do
    begin
        sum := sum + tmp^.data;
        tmp := tmp^.next
    end;
    ItemListSum := sum
end;

Now let us take advantage of the fact that the sum of an empty list is zero, while the sum
of a non-empty list equals the sum of its remainder plus the first item. The
new (recursive) implementation of the ItemListSum function looks like this:

function ItemListSum(p: itemptr): integer;
begin
    if p = nil then
        ItemListSum := 0
    else
        ItemListSum := p^.data + ItemListSum(p^.next)
end;

To be honest, if your list contains several million records, it is better not to do this, because
you may run out of stack; but then, lists are not usually used to store millions of
records. If stack overflow does not threaten you, successful recursive solutions,
strange as it may seem, can even work faster than "traditional" loops.
Let us continue with the procedure that deletes all elements of a list. How this is done
with a loop we examined in detail earlier; now note that to delete an empty list
nothing needs to be done, while to delete a non-empty list we must free the memory of its
first element and delete its remainder. Of course, the remainder must be deleted first,
so as not to lose the pointer to it when deleting the first element. Let us write:

procedure DisposeItemList(p: itemptr);
begin
    if p = nil then
        Exit;
    DisposeItemList(p^.next);
    dispose(p)
end;

We cannot say that we gain much in code size here, but that the recursive deletion of a list is
easier to read is beyond doubt: you do not even need to draw diagrams to understand it.
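Another illustration of the same kind (ours, not from the book): printing a list in reverse order, which is awkward to do with a plain loop, becomes trivial with recursion:

procedure PrintReversed(p: itemptr);
begin
    if p = nil then
        Exit;
    PrintReversed(p^.next);
    writeln(p^.data)
end;
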
Let us now consider a more complicated example. On page 430 we examined code that
inserts a new element into a list of integers sorted in ascending order, preserving
the sortedness. To solve that problem with a loop we needed a pointer to a pointer. Note now
that, since the new element must be inserted into a sorted list, three cases are possible:
• the list is empty: the element to be inserted becomes its first;
• the list is non-empty and its first element is larger than the one being inserted: the new
element must be inserted at the beginning;
• the list is non-empty and its first element is less than or equal to the one being inserted:
the element should be inserted into the remainder of the list.
As we have already discussed, if the pointer to the first element of a list is passed to a
subroutine as a parameter-variable, the subroutine can insert and delete elements
anywhere in the list, including at the beginning. Besides, it is useful to remember that
inserting an element into an empty singly-linked list and inserting an element at the
beginning of a non-empty list are performed in exactly the same way, which allows us to
merge the first two cases into one, which will serve as the recursion basis. Note also that the
role of the "pointer to the first element" for the list that is the remainder of the original list is
played by the next field of the first element of the original list. Taking all this into account,
the procedure inserting a new element into a sorted list while preserving the sortedness looks
like this:

procedure AddNumIntoSortedList(var p: itemptr; n: integer);
var
    tmp: itemptr;
begin
    if (p = nil) or (p^.data > n) then
    begin
        new(tmp);
        tmp^.data := n;
        tmp^.next := p;
        p := tmp
    end
    else
        AddNumIntoSortedList(p^.next, n)
end;

Compare this with the code on page 430 and its explanations. Comments, as they say, are
unnecessary.

2.11.5. Working with binary search tree


When discussing binary trees in §2.10.8, we deliberately did not give code examples
for working with them, postponing the issue until the discussion of recursive methods. While
in the case of singly-linked lists recursion can make the work somewhat easier, in the case of
trees it is not a matter of making the work easier but of making it possible at all.
To give an example, consider a binary search tree containing integers of the longint
type, which may consist, for example, of such nodes:
type
    TreeNodePtr = ^TreeNode;
    TreeNode = record
        data: longint;
        left, right: TreeNodePtr;
    end;

To work with it we need a pointer, which is often called the root of the tree; just as often,
though, the root means the initial node itself rather than the pointer to it; which of the two is
meant is usually clear from the context. Let us describe the root pointer as follows:

var
    root: TreeNodePtr = nil;

Now we can begin describing the basic tree operations. Since we will have to use recursion
actively, we will formalize each action as a subroutine. To get a rough idea of what this will
look like, let us assume that the tree has already been built and write a function that computes
the sum of the values in all its nodes. For this purpose, note that for an empty tree this sum is
obviously zero; if the tree is non-empty, then to compute the sum we must first compute the
sum of the left subtree, then the sum of the right subtree, add them together and add the
number in the current node. Since the left and right subtrees are the same kind of trees as the
whole tree, only with fewer nodes, we can use the very function we are writing to compute
the subtree sums:
function SumTree(p: TreeNodePtr): longint;
begin
    if p = nil then
        SumTree := 0
    else
        SumTree := SumTree(p^.left) + p^.data + SumTree(p^.right)
end;
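One more example in the same spirit, a sketch of ours rather than the book's: the height of a tree is computed by exactly the same scheme (the helper Max2 is introduced here just for this purpose):

function Max2(a, b: integer): integer;
begin
    if a > b then
        Max2 := a
    else
        Max2 := b
end;

function TreeHeight(p: TreeNodePtr): integer;
begin
    if p = nil then
        TreeHeight := 0
    else
        TreeHeight := 1 + Max2(TreeHeight(p^.left), TreeHeight(p^.right))
end;
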
As a matter of fact, almost everything we do with a tree follows the same scheme: an empty
tree serves as the degenerate case for the recursion basis, and then recursive calls are made
for the left and/or right subtree. Let us continue our series of examples with a subroutine that
adds a new element to the tree. When the tree is empty, we must create the element "right
here", i.e. change the pointer serving as this tree's root pointer; with this in mind we will
pass the pointer to the procedure as a parameter-variable. If the tree is not empty, i.e. at least
its root element exists, three cases are possible. First, the element to be added may be
strictly smaller than the one in the root element; then the addition should be done in the left
subtree. Second, it may be strictly larger, in which case the right subtree is used. The
third variant spoils the whole picture by adding a special case: the numbers may be equal.
Depending on the task, different reactions are possible in such a situation:
sometimes matching keys are treated as an error, sometimes a counter is incremented to
record that the given key was entered one more time, sometimes nothing is done at all. We
will settle on informing the caller that the addition cannot be performed; for this purpose we
provide the procedure with a parameter of type boolean, into which it writes "true"
when the value is successfully added and "false" when there is a key conflict. The result is
the following:

procedure AddToTree(var p: TreeNodePtr; val: longint; var ok: boolean);
begin
    if p = nil then
    begin
        new(p);
        p^.data := val;
        p^.left := nil;
        p^.right := nil;
        ok := true
    end
    else
    if val < p^.data then
        AddToTree(p^.left, val, ok)
    else
    if val > p^.data then
        AddToTree(p^.right, val, ok)
    else
        ok := false
end;

If you try to write a procedure that determines whether a given number is present in a given
tree, you will get something very similar. Actually, it would be more correct to formalize this
subroutine as a function, because it does not "do" anything in the sense of changes, it merely
computes a result; but right now it is more important to show the similarity between the
two procedures:

procedure IsInTree(p: TreeNodePtr; val: longint; var res: boolean);
begin
    if p = nil then
        res := false
    else
    if val < p^.data then
        IsInTree(p^.left, val, res)
    else
    if val > p^.data then
        IsInTree(p^.right, val, res)
    else
        res := true
end;

The similarity of the two procedures is not accidental, because in both cases the search is
performed. It seems worth writing one generalized search subroutine that can be used
to implement both the addition and the presence check: given a tree and a value, it
will find the position in the tree where a node with this value should be (but is not
necessarily) located. This "position" is nothing but the address of the
pointer that points to the corresponding node or should point to it if the node does not exist
yet. For a change, let's still formalize this subroutine as a function, because it simply calculates
its result without changing anything anywhere. Since the pointer address will be the return
value of the function, we will have to describe and name the corresponding value type:
type
    TreeNodePos = ^TreeNodePtr;

Our function must in some cases return the address of the pointer given to it - if the tree is
empty, and also if the root element of the tree contains the number we are looking for. To
return such an address, it must at least be known, and for this purpose we will pass the pointer
as a parameter variable; the name of such a parameter, as we remember, becomes synonymous
with the variable used as a parameter at the call point during the execution of the subroutine,
so that the address taking operation applied to the parameter name will give us the address of
this variable. Note also that the cases of an empty tree and of equality of the sought value
at the root node can now be combined: in the case of equality we have found the position
where the corresponding number is located, and in the case of an empty subtree - the position
where it should be located. The caller can distinguish between these two cases by checking
whether the pointer located at the address obtained from the function is equal to nil or not.
The text of the function is as follows:
function SearchTree(var p: TreeNodePtr; val: longint): TreeNodePos;
begin
    if (p = nil) or (p^.data = val) then
        SearchTree := @p
    else
        if val < p^.data then
            SearchTree := SearchTree(p^.left, val)
        else
            SearchTree := SearchTree(p^.right, val)
end;

Using this function, we can rewrite two previously written subroutines in a new way, making
them noticeably shorter. There will be no resemblance between their texts now, because
everything they had in common has been moved into SearchTree; so now nothing prevents us
from formalizing IsInTree as a function after all. The result will be like this:

procedure AddToTree(var p: TreeNodePtr; val: longint;
                    var ok: boolean);
var
    pos: TreeNodePos;
begin
    pos := SearchTree(p, val);
    if pos^ = nil then
    begin
        new(pos^);
        pos^^.data := val;
        pos^^.left := nil;
        pos^^.right := nil;
        ok := true
    end
    else
        ok := false
end;

function IsInTree(p: TreeNodePtr; val: longint): boolean;
begin
    IsInTree := SearchTree(p, val)^ <> nil
end;
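These subroutines combine in the obvious way; for example, a fragment in the spirit of
the demo program (a sketch with illustrative values) might read:

var
    root: TreeNodePtr = nil;
    ok: boolean;
begin
    AddToTree(root, 25, ok);
    AddToTree(root, 10, ok);
    AddToTree(root, 40, ok);
    if IsInTree(root, 10) then
        writeln('10 is in the tree');
    writeln('Sum of all values: ', SumTree(root))
end.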

A small demonstration program using these functions can be found in the file
treedemo.pas. Unfortunately, we have to remind you of a serious disadvantage of binary
search trees: the tree may become unbalanced if the stored values happen to arrive
in an unfortunate order. There are several approaches to constructing a
binary search tree, in which unbalance either does not occur at all or is quickly eliminated,
but the story about them is far beyond the scope of our book.
In case the reader has an irresistible desire to learn tree balancing algorithms, we will take the
liberty to draw his attention to several books in which the relevant issues are discussed in detail. We
should start with N. Wirth's textbook "Algorithms and Data Structures" [8]; the presentation in this
book is notable for its brevity and clarity, since it is intended for beginners. If this is not enough, try
the huge book by Cormen, Leiserson and Rivest [9]; finally, for the strong-hearted there is also the
four-volume work of Donald Knuth, the third volume of which [10] contains a detailed analysis of all
kinds of data structures oriented towards sorting and searching.
As an exercise, we venture to suggest that the reader try to write subroutines for working
with trees without recursion. Note that the tree search is realized relatively simply and is a bit
like inserting an element into the right place of a singly-linked list, as it was shown in §2.10.6.
The algorithm for traversing the tree, which is needed, for example, to calculate the sum
of the stored values, is much more complicated; but even here we have already considered something
similar when we solved the problem of the Towers of Hanoi (§2.11.2). In the program we
began discussing on p. 445, we had to memorize the path by which we arrived at the need to
do this or that action, and then go back and continue the interrupted activity. Traversing the
tree is essentially the same thing: for each node we reach, we have to remember where we
came from, as well as what we have already done at that node, or, more precisely, what we
will have to do when we return to that node again.
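To give an idea of the simpler half of the exercise, the search might be done without
recursion roughly like this (one possible sketch, certainly not the only solution):

function SearchTreeIter(var p: TreeNodePtr; val: longint): TreeNodePos;
var
    cur: TreeNodePos;
begin
    cur := @p;                    { start at the root pointer }
    while (cur^ <> nil) and (cur^^.data <> val) do
        if val < cur^^.data then
            cur := @cur^^.left    { descend into the left subtree }
        else
            cur := @cur^^.right;  { descend into the right subtree }
    SearchTreeIter := cur
end;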
If it doesn't work, that's okay, this task is quite difficult; we'll come back to it when we
learn the C language.
2.12. More about program design
If builders built houses the way programmers write
programs, the first woodpecker that flew in would destroy
civilization.
Weinberg's second law

We have already repeatedly drawn the reader's attention (see page 241) to the fact that
the program text cannot be written just anyhow: it is intended first of all for the human
reader and only secondly for the compiler, and if you forget this, not only will other people
understand nothing in your program, but also, what is especially annoying, you risk getting
lost in your own program before you finish it.
There are a number of simple but very important rules to improve the comprehensibility
and readability of a program. Let's try to formulate the most important of them.

2.12.1. On the role of the ASCII character set and the English language


We have already mentioned several times the need to use English in the comments and in
the choice of names, but we did it usually in footnotes, without going into details. Let us try
to give a detailed explanation of this requirement.
When creating program text, you should take into account that there is a wide variety
of operating systems and environments in the world; programmers use several dozen (if
not hundreds of) text editors to every taste, as well as all sorts of code viewers, whose
capabilities may vary greatly. Not all programmers speak Russian;
besides, there are different encodings for Cyrillic characters. Finally, screen sizes and fonts
used may differ significantly from one workplace to another. At the same time, assumptions
with the word "never" are generally harmful; this also applies to the assumption that your
program will "never" be read by programmers from other countries.
Reading a properly designed program should not be a problem, no matter who is reading
your program or what operating environment they are using. This can be achieved by
relying on just three assumptions: ASCII characters are always available, everyone knows
English, and the screen is never smaller than 24x80 characters (but may well be no
larger). We will return to the question of program text width in §2.12.7; for now, let
us discuss the first two of the three assumptions.
Back in the introductory section (§1.4.5), we described the ASCII encoding and the
reasons for its universality. If you use only characters from the ASCII set in the program text,
you can be sure that this text will be successfully read on any computer in the world, in any
operating system, with the help of any program designed to work with text, and so on. Recall
that this set includes:
• uppercase and lowercase letters of the Latin alphabet without
diacritical marks: ABCDEFGHIJKLMNOPQRSTUVWXYZ,
abcdefghijklmnopqrstuvwxyz;
• Arabic numerals: 0123456789;
• arithmetic signs, parentheses and punctuation marks:
  . , ; : ' " - ? ! @ # $ % ^ & ( ) [ ] { } < > = + - * / ~ \ |
• the underscore _;
• whitespace characters: the space, the tab, and the line feed.
No other characters are included in this set. ASCII has no room for characters of national
alphabets, including the Russian Cyrillic alphabet, nor for Latin letters with diacritical marks,
such as é or ü. This set does not include many of the usual typographic symbols, such as
the long dash, angled quotation marks, and many others. Characters that are not included
in the ASCII set cannot be used in the program text: not in comments, let alone string
constants, and certainly not in identifiers. Note that most programming languages will
not allow such characters in identifiers anyway, although there are some translators that
consider characters not included in the ASCII table acceptable in identifiers - an example is most Lisp
interpreters. And most translators do not pay any attention to the contents of string constants
and comments at all, allowing you to insert almost anything there. Nevertheless, the
connivance of translators should not confuse us: the text of a program in any programming
language must consist of ASCII characters and only of them. Any non-ASCII character may
turn into something completely different when the text is transferred to another computer,
may simply be unreadable, etc.
The question remains what to do if the program you are writing has to communicate with
the user in Russian. We'll give you the answer later; for now, we'll note that the limitation on
the character set used is not the only reason why programs are not allowed to use Russian;
moreover, it has nothing to do with Russian specifically. Thus, the set of letters included in
ASCII is sufficient to represent text in Italian, and, if individual letters are replaced with
digraphs, in German as well; but neither Italian nor German may be used in the text of a
program any more than Russian. There is only one language that is allowed in the text
of a computer program, and this language is English. For programmers English is not a
luxury but a means of mutual understanding. By tradition, programmers all over the world
use English to communicate among themselves [192]. As a rule, we can assume that any person
working with computer program texts will understand at least a not very complex text in
English, because this is the language in which documentation for various libraries, all sorts of
specifications, descriptions of network protocols are written, many books on programming
are published; of course, many books and other texts are translated into Russian (as well as
into French, Japanese, Hungarian, Hindi and other national languages), but it would be
unreasonable to expect that any text you need will be available in Russian.
Three important requirements follow from this. First, any identifiers in the program
must consist of English words or be abbreviations of English words. If you have forgotten
the word you need, don't hesitate to look it up in the dictionary. Let us emphasize that the
words should be English - not German, not French, not Latin and especially not Russian
"translit" (the latter is generally considered by professionals to be extremely bad taste).

[192] We will leave aside the question of whether this is good or bad and confine ourselves to
stating the fact. It should be noted, however, that doctors and pharmacists all over the world have
a tradition of filling prescriptions and some other medical documents in Latin, and, for example,
the official language of the Universal Postal Union is French; in any case, the existence of a
common professional language of communication is useful in many ways.
Secondly, the comments in the program must be written in English; as already mentioned,
it is better not to write comments at all than to try to write them in a language other than
English. And finally, the user interface of the program should be either English or
"international", i.e. allowing translation into any language.
By now, some readers may have a legitimate question: "What should I do if I don't
know English?" The answer is trivial: learn it, urgently. A programmer who cannot
write English more or less competently (to say nothing of understanding it) is
professionally unfit in modern conditions, however unpleasant that may sound.
The question remains what to do if, according to its requirements, the program must
communicate with the user in a language other than English; by the rules formulated
above, this means that it must be made international, but how is this done correctly? The basic
principle is quite simple: all messages in languages other than English must be placed in
separate files external to your program. In this case your program will read these files, analyze
them and give the user the messages received from the files; this does not violate our
principles in any way, because the program can process absolutely any data, including texts
in any language, and the prohibition on languages other than English concerns only the text
of the program itself.
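To illustrate the principle, here is a minimal sketch (the file name messages.txt and
the one-message-per-line format are purely illustrative assumptions, not a standard of
any kind):

{ The program text stays pure ASCII; all user-visible phrases live in
  an external file, one phrase per line, and are fetched by number }
procedure PrintMessage(n: integer);
var
    f: text;
    s: string;
    i: integer;
begin
    assign(f, 'messages.txt');   { hypothetical file name }
    reset(f);
    for i := 1 to n do           { the n-th line holds the n-th message }
        readln(f, s);
    close(f);
    writeln(s)
end;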
Free Pascal even contains special tools for creating international programs, but we will
not consider these tools, to save time and effort - in the hope that the reader will not
stop at Pascal. In the second volume of our book, we will discuss one of the libraries
designed for creating international C programs.

2.12.2. Allowable structural indentation styles


We already know that nested fragments should be shifted to the right relative to what they
are nested in, and that the lines that begin and end any structure should start in the same
horizontal position. The size of the right shift, also called structural indentation, can be two
spaces, three spaces, four spaces, or one tab character; you can choose any of these four
options, but then stick with the chosen option throughout your program. For instance, all
the sample programs in this book are typed using four-space indentation.
We also already know that there are three valid choices for the arrangement of operator
brackets (for Pascal, these are the words begin and end).
while p <> nil do
begin
    s := s + p^.data;
    p := p^.next
end;

while p <> nil do begin
    s := s + p^.data;
    p := p^.next
end;

while p <> nil do
  begin
    s := s + p^.data;
    p := p^.next
  end;

Fig. 2.15. Three styles of operator bracket arrangement (the first, second, and third variants shown one after another)

In all Pascal programs in this part of the book, we always moved the word begin to the
next line after the operator header (while or if) and wrote it starting at the same position
where the header begins, i.e., we did not make a separate shift for the operator brackets
(the first variant in Fig. 2.15). Two other options are allowed. The word begin can be left
on the same line as the operator header, with end placed exactly under the beginning of the
operator (not under the word begin!), as in the second variant in the figure; this is how C
programs will be laid out in the relevant parts of our book. Finally, you can move begin to
the next line and provide a separate shift for the operator brackets (the third variant in
the figure). When using the last option, the indentation size of two spaces is usually chosen,
because otherwise the horizontal space on the screen is quickly exhausted; as we have already
mentioned, this option is acceptable, but we will not recommend it.

2.12.3. If statement with else branch


In Pascal, the branching operator can be used both in the full version with the else
branch and in the abbreviated version without it. If the else branch is absent, then
everything is clear with the design of the text of the construction; there is also no freedom in
the case when both branches are present but consist of one simple operator, i.e. they do not
require the use of operator brackets:

if p <> nil then
    res := p^.data
else
    res := 0;

Just remember that the body should always be placed on a separate line, and in the case of an
if statement, this applies to both branches. Thus, the following variant is
unacceptable:

if p <> nil then res := p^.data else res := 0;


Let us now consider the case when both branches of the if operator use a
compound operator. In this situation, we should first of all remember which of the three
allowed options of opening operator bracket placement we finally chose. If we have chosen
the first or the third option (in both cases, the opening operator bracket is shifted to the next
line), everything is quite simple. If we decided not to shift the brackets of the compound
operator, but to shift only its body, the if will have to be formatted as follows:
if p <> nil then
begin
    flag := true;
    x := p^.data
end
else
begin
    new(p);
    p^.data := x
end
As a matter of fact, this is the way we used to formalize branching in our examples. If, despite
all our efforts, you have chosen the option of shifting the operator brackets, then the if will
look like this:
if p <> nil then
  begin
    flag := true;
    x := p^.data
  end
else
  begin
    new(p);
    p^.data := x
  end
In both cases, else together with the operator brackets takes three lines, which many
programmers consider too much for a mere delimiter. If the opening operator bracket is
left on the same line as the header, there are again two options for the else branch:
to put the word else on the same line as the closing operator bracket, or to write
else on a new line:
if p <> nil then begin
    flag := true;
    x := p^.data
end else begin
    new(p);
    p^.data := x
end

or:

if p <> nil then begin
    flag := true;
    x := p^.data
end
else begin
    new(p);
    p^.data := x
end
Let's now consider the case when only one branch of an if-else construct requires a
compound operator, while the other branch consists of a single simple operator. You might not
treat this case as special at all, but the result may look a bit unaesthetic, especially if you
decide not to move begin onto a separate line. You will often find the following recommendation:
if one branch of an if-else construct is a compound operator, use a compound
operator for the other branch as well, even if it consists of one
simple operator. Note that following this recommendation is, generally speaking, optional;
moreover, as you can see, we did not do it in our examples - but we will definitely do it when we get
to learning C.

2.12.4. Formatting features of the selection statement


The Pascal language does not provide for the word begin as part of a case
statement, although it does use the word end there, so the layout of the header and
body of a case statement does not depend on whether we leave the opening bracket
on the same line as the header of the complex statement, move it to the next line,
or indent it with its own shift; on the other hand, in most cases we have
to use compound statements in each branch, and their layout does depend on the style chosen.
In either case, another question to answer is whether you will shift the labels relative to
the statement itself; the labels marking the beginning of the next alternative in a choice
statement can either be left in the column where the header of the choice statement
begins, or shifted relative to the header by the amount of the indentation. As usual in
such cases, you can choose either of the two alternatives, but only once - for the whole
program.
If we leave begin on the same line as the operator header, then depending on the label
placement style we choose, we can write like this:

case Operation of
    '+': begin
        writeln('Addition');
        c := a + b
    end;
    '-': begin
        writeln('Subtraction');
        c := a - b
    end;
    else begin
        writeln('Error');
        c := 0
    end
end

or, without shifting the labels:

case Operation of
'+': begin
    writeln('Addition');
    c := a + b
end;
'-': begin
    writeln('Subtraction');
    c := a - b
end;
else begin
    writeln('Error');
    c := 0
end
end
Both options are acceptable, but the first one looks significantly more attractive and clearer,
so if we do not move begin onto a separate line, it is better to decide in favor of shifting
the labels.
If we agreed to move begin to the next line but not to shift it, the above code
fragment should look like this:
case Operation of
    '+':
    begin
        writeln('Addition');
        c := a + b
    end;
    '-':
    begin
        writeln('Subtraction');
        c := a - b
    end;
    else
    begin
        writeln('Error');
        c := 0
    end
end

or, without shifting the labels:

case Operation of
'+':
begin
    writeln('Addition');
    c := a + b
end;
'-':
begin
    writeln('Subtraction');
    c := a - b
end;
else
begin
    writeln('Error');
    c := 0
end
end

For this case we will also recommend shifting the labels (we did exactly that in our
examples), but we will leave the final decision to the reader.
Finally, if we not only move begin to the next line, but also give the compound
operator its own shift, it will look like this:
case Operation of
    '+':
        begin
            writeln('Addition');
            c := a + b
        end;
    '-':
        begin
            writeln('Subtraction');
            c := a - b
        end;
    else
        begin
            writeln('Error');
            c := 0
        end
end

or, without shifting the labels:

case Operation of
'+':
    begin
        writeln('Addition');
        c := a + b
    end;
'-':
    begin
        writeln('Subtraction');
        c := a - b
    end;
else
    begin
        writeln('Error');
        c := 0
    end
end

For this case, our recommendation would be the opposite: if we equip compound
operators with a separate shift, it is better not to shift the labels in the selection
statement; the result will be more aesthetically pleasing.

2.12.5. Sequences of mutually exclusive if's


The selection operator, as we know, has a very important restriction: the condition
for switching to one of the labels is that the selecting expression must be equal to one
of the constants, and both the constants and the expression must be of an ordinal type.
If it is necessary to choose one of several possible branches of work based on more
complicated conditions or on a selection expression having a non-ordinal type (for
example, selection by string value), we have to use a long chain of branching operators:
if / else if / else if / ... / else. If you follow the rules of nested
operator formatting literally, the branch bodies of such a choice construction will have
to be shifted further and further to the right, like this:

if cmd = "Save" then


begin
writeln('Saving...');
SaveFile
end
else
if cmd = "Load" then
begin
writeln('Loading...'); LoadFile
end
else
if cmd = "Quit" then begin
writeln('Good bye...');
QuitProgram
end
else
begin
writeln('Unknown command') end

It is easy to see that with this approach, seven or eight branches will be enough to run
out of horizontal screen space; however, many more branches may be needed. More
importantly, this (formally seemingly correct) style of formatting misleads the reader
of the program as to the relationship between the branches of the design. It is clear that
these branches have the same nesting rank. If in doubt, try swapping the branches. It is
obvious that the program operation will not change in any way, and if so, it means that
the assumption that, for example, the first branch is "more important" than the second
one, and the second branch is "more important" than the third one, is incorrect. But they
are shifted to different positions!
The problem can be explained in another way. It is clear that such a chain of if 's
is a generalization of the select operator and serves the same purpose as the select
operator; the only difference is the expressive power - if is not limited to comparing
an expression of an ordinal type with a constant. But the branches of the choice operator
are written at the same level of nesting. It is quite logical to think that the branches of
such a construction of if's should also be located at the same indentation level.
This is achieved by treating the keywords else and if next to each other as a
single unit. Regardless of the style you choose, you can write else if on a single
line with a space, or you can separate them on different lines starting at the same
position. In particular, if you move begin onto a separate line, you can arrange the
above fragments like this (if on a new line each time):

if cmd = "Save" then


begin
§ 2.12. More on program design 481
writeln('Saving...');
SaveFile
end
else
if cmd = "Load" then
begin
writeln('Loading...');
LoadFile
end
else
if cmd = "Quit" then
begin
writeln('Good bye...');
QuitProgram
end
else
begin
writeln('Unknown command')
end

or like this (if on the same line as the preceding else; this is also acceptable and
probably even more correct, since we agreed to treat else and if as a single unit):

if cmd = "Save" then


begin
writeln('Saving...');
SaveFile
end
else if cmd = "Load" then
begin
writeln('Loading...');
LoadFile

end
else if cmd = "Quit" then
begin
writeln('Good bye...');
QuitProgram
end
else
begin writeln('Unknown command') end

If you decide to leave begin on the same line as the operator headers, the chain of
if's can be laid out as follows:

if cmd = "Save" then begin


writeln('Saving...');
§ 2.12. More on program design 482
SaveFile
end else
if cmd = "Load" then begin
writeln('Loading...');
LoadFile
end else
if cmd = "Quit" then begin
writeln('Good bye...');
QuitProgram
end else begin writeln('Unknown command') end

or this:
if cmd = "Save" then begin
writeln('Saving...');
SaveFile
end else if cmd = "Load" then begin
writeln('Loading...');
LoadFile
end else if cmd = "Quit" then begin
writeln('Good bye...');
QuitProgram
end else begin
writeln('Unknown command')
end

We emphasize that everything said in this paragraph applies only to the case when the
else branch consists of exactly one if statement. If this is not the case, the
usual rules of branching operator formatting should be applied.

2.12.6. Labels and the goto operator


We already know from §2.4.3 that goto can and should be used in some cases;
moreover, we know exactly which cases they are; to repeat, there are exactly two of
them, and there is no third. What remains is the question of how to format all this.
To be more precise, the goto operator itself does not cause any
problems with its layout, it is an ordinary operator, which should be formatted
according to the usual rules; but the way to put the label may become the subject of a
heated discussion.
Recall that an operator is always marked with a label, i.e., even if we mark an
"empty space" with a label, the compiler will assume that an empty operator is located
there. There are two questions to answer (and to fix the answers once and for all): whether to shift the
label relative to the enclosing construct (note that we have already encountered a similar
dilemma when discussing selection operators in §2.12.4) and whether to place the
labeled operator on the same line as the label or on a separate line.
The most popular variant is that the label is not shifted, i.e. it is written in the
same horizontal position where the beginning and the end of the volumetric control
structure are placed, while the labeled operator is moved to the next line, for example:
procedure GoodProc;
label
    quit;
var
    p, q: ^SomethingBig;
begin
    new(p);
    new(q);
    { ... }
quit:
    dispose(q);
    dispose(p)
end;

The label here is in the leftmost position only because that is where the enclosing
structure (in this case it is a subroutine) is placed. The label can also occur not at the
top level, for example:

if cond = 1 then begin
    while f <> 0 do begin
        while g <> 0 do begin
            while h <> 0 do begin
                if t = 0 then
                    goto eject
            end
        end
    end;
eject:
    writeln('go on...')
end

The above variant is the most popular, but it is not the only acceptable one. Quite
often the labeled operator is written on the same line as the label, like this:
    { ... }
quit: dispose(q);
    dispose(p)
end;

This option looks good if the label - along with the colon and the space after it - takes
up less space horizontally than the selected indentation size, allowing horizontal
alignment for operators:
    { ... }
q:  dispose(a);
    dispose(b)
end;

Since a one-letter label name by itself does not look too nice, this style is usually used
in combination with tab indentation (the maximum length of the label name is then 6
characters).
Some programmers prefer to treat the label simply as a part of the labeled operator,
without distinguishing it as a special entity; the operator is shifted as usual, but this
time together with the label. The end of the above subroutine in this style would look
like this:
    { ... }
    quit: dispose(q);
    dispose(p)
end;

Sometimes the label is shifted, but the labeled operator is moved to the next line,
roughly like this:
    { ... }
    quit:
    dispose(q);
    dispose(p)
end;

The main disadvantage of such solutions is that the label merges with the
surrounding "landscape" and ceases to be visible as a special point in the code structure;
we will take the liberty to recommend refraining from this style, but nevertheless, we
will leave the decision to the reader.

2.12.7. Maximum width of the program text


Thou shalt not cross 80 columns in thy file.

(The sacred 80-column rule)

The traditional text width in programming is 80 characters. The origin of the
number 80 goes back to the days of punched cards; the most popular punched card
format offered by IBM contained 80 columns for punching holes, and when using these
cards to represent text information, each column encoded one character. One punched
card thus contained a line of text up to 80 characters long, and it was of such lines
that the texts of computer programs of those times consisted. The 80-character line length
was still one of the standards for dot matrix printers in the early 1990s. When
alphanumeric terminals were introduced in the early 1970s, they were 80 characters
wide to ensure compatibility between two fundamentally different ways of entering
computer programs. To this day, computers start in text mode after the power is turned
on, and only after the operating system is loaded do they switch to graphics mode; the
screen width in text mode in most cases is still the same 80 characters. Finally, the
familiar terminal emulators have a default line width of 80 characters, although this is
easy to change by simply resizing the window.
The number 80 did not arise by chance. If the limit on line length is made much
smaller, it will be inconvenient to write programs, especially in structured languages
where structural indentation is required; for example, even the simplest programs will
not fit into a width of 40 characters. On the other hand, programs with significantly
longer lines are hard to read, even if the corresponding lines fit on a screen or a sheet
of paper. The reason for this is purely ergonomic and is connected with the necessity
to move your eyes left and right. The reader can easily verify that in any
typographically produced book the width of the type block does not exceed 75
characters; the recommended length of a book line is 50-65 characters, and lines up to
75 characters are considered acceptable, but no more; this "magic" limit was known to
book publishers long before the computer era. The 80-column punched cards that came
to hand [194] turned out to be well suited for representing such lines of text: the first
four columns were usually reserved for the line number, the fifth contained a space
separating the number from the content, and there were just 75 positions left for the
line itself.
With the modern size of displays, their graphic resolution, and the ability to sit
close to them without harm to health, many programmers see nothing wrong with
editing text with a window width substantially larger than 80 characters. From the point
of view of ergonomics, this solution is not quite successful; it is advisable either to
make the font larger, so that your eyes get tired less, or to use the screen width to place
several windows with the possibility of simultaneous editing of different files - this will
make navigation in your code more convenient, because the code of complex programs
usually consists of many files, and you often have to make changes in several of them
at the same time. Programming-oriented window text editors such as geany, gedit,
kate, etc. routinely show the right border line on the screen, just at the level of the
80th character.
Many programmers prefer not to open a text editor window wider than 80
characters; moreover, many programmers use text editors that work in a terminal
emulator, such as vim or emacs; both editors have graphical versions, but not all
programmers like these versions. Quite often, while a program is in production use, there
is a need to view and even edit its source code on a remote machine, and the quality of
communication (or security policy) may not allow the use of graphics, and then the
window of an alphanumeric terminal becomes the only available tool. There are known
software tools designed to work with program source texts (for example, detecting
differences between two versions of the same source text), which are implemented on
the assumption that source text lines do not exceed 80 characters in length.
Often a program listing may need to be printed on paper. The presence of long lines
in this situation presents you with an unpleasant choice. You can make the long lines
fit on paper in one line - either by reducing the font size, using a wider sheet of paper,
or using a "landscape" orientation - but this leaves most of the paper area blank, and
makes the listing harder to read; if you trim the lines by simply dropping a few right-

[194] IBM produced various punched cards, not only 80-column ones, back in the 1930s -
much earlier than the first computers; they were intended for sorting machines (tabulators),
which were used, in particular, for processing statistical information. And as early as the
19th century, analogs of punched cards were used to control weaving looms.
hand positions, you risk losing something important; finally, if you let the lines be
wrapped automatically, the readability of the resulting paper listing will be even worse
than that of the original text on the screen, which already suffers.
The conclusion from all of the above is quite obvious: whatever text editor you use,
you should not allow lines longer than 80 characters to appear in your program. In fact,
it is always advisable to keep within 75 characters; this will allow a programmer using
a vim editor with line numbering enabled to work comfortably with your text, for
example; such source code will produce a nice and easy-to-read listing with numbered
lines.
Some code style guides allow you to exceed the line length limit "in exceptional cases".
For example, the coding style adopted for the Linux kernel categorically forbids text messages to
be spread over several lines, and for this case it is said that it is better if a line of source text
"goes beyond" the set limit. The reason for this prohibition is quite obvious. The Linux kernel
is an extremely extensive program, and it is difficult to navigate its source code. It is often
necessary to find out exactly which fragment of the source code caused a particular message
to appear in the system log, and the easiest way to find the appropriate place is a simple text
search, which of course will not work if the message we are trying to find is spread over several
text constants located in different lines of the source.
However, exceeding the allowable line length remains undesirable. The same Linux
kernel code design guidelines have additional restrictions on this issue - for example, there
must be "nothing significant" beyond the right border of the screen, so that a person viewing
the program cursorily and not seeing text to the right of the border will not miss some important
property of the program. It may take some serious experience to determine whether your case
is acceptable. Therefore, the best option is still to consider the 80-character boundary
requirement to be strict, i.e., not allowing exceptions; as practice shows, this can always be
handled by successfully splitting the expression, reducing the text message, reducing the level
of nesting by moving parts of the algorithm into auxiliary subroutines.
In addition to the standard screen width, attention should also be paid to the screen
height. As mentioned above, subprograms should be kept as small as possible to fit the
screen height; the question remains as to what this "screen height" should be. The
traditional answer to this question is 25 lines, although there are variations (e.g., 24
lines). It should not be assumed that the screen will be larger; however, as already
mentioned, the length of the subroutine in some cases has the right to slightly exceed
the height of the screen, but not by much.

2.12.8. How to split a long line


Since we have decided not to go beyond 80 columns, the question logically arises:
what to do if the next line of the program does not want to fit into this limit. Let's start
with the case when the header of an operator (if, while or case) is too long.
First of all, you should consider whether it is possible to shorten the conditional
expression. In many cases, long expressions in the headers of structural operators arise
due to the lack of programmer's experience; for example, the author has often seen in
student programs checking whether a symbol belongs to a particular set (e.g., a set of
punctuation marks) through a sequence of explicitly written comparisons with each of
the elements of the set, something like:
if (a = ',') or (a = '.') or (a = ';') or (a = ':') or (a
...

Of course, the real problem here is not a lack of space in the line but a lack of
imagination. In such a situation, a programmer who has passed the beginner stage will
write a function that checks whether a given character belongs to the predefined set,
and the if header will contain a call to this function:
if IsPunctuation(a) then
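Such a function is easy to write; for instance, using a Pascal set constructor (the
particular set of characters here is just an illustration):

function IsPunctuation(c: char): boolean;
begin
    { extend the set as the task at hand requires }
    IsPunctuation := c in [',', '.', ';', ':', '!', '?']
end;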

A more experienced programmer will use a ready-made function from the standard
library, and an even more experienced programmer may claim that the standard
function is too complicated because it depends on the locale settings in the
environment, and go back to the version with his own function. Anyway, there is no
problem with header length.
Unfortunately, problems are not always solved so easily. Multi-line headers, no
matter how hard we try to avoid them, still sometimes occur in a program.
Unfortunately, there is no unambiguous answer as to how to deal with them; we will
consider one of the possible options, which seems to us the most practical and best
serves the readability of the program.
So, if the header of a complex operator has to be spread over several lines, then:
• break the expression in the header into multiple lines, preferably at a
  "top-level" operation, which is usually the logical connective and or or;
• shift each subsequent header line relative to the first header line by the
  normal indentation size;
• regardless of the number of simple operators in the body, be sure to enclose
  the body of your operator in operator brackets, i.e. make it a compound
  operator;
• regardless of the style you normally use, move the opening operator bracket
  to the next line so that it serves as a visual separator between the header
  lines and the body of your operator.
All together it will look something like this:

while (TheCollection^.KnownSet^.First = nil) and
    (TheCollection^.ToParse^.First <> nil) and
    (TheCollection^.ToParse^.First^.s = ' ') do
begin
    SkipSpace(TheCollection)
end;

This option works fine if you don't move the compound operator relative to the header;
if you prefer this ("third") style of design, you may be advised to move the word then,
do, or of to the next line, like this:
while (TheCollection^.KnownSet^.First = nil) and
    (TheCollection^.ToParse^.First <> nil) and
    (TheCollection^.ToParse^.First^.s = ' ')
do
  begin
    SkipSpace(TheCollection)
  end;

The role of the visual separator here is played by the closing word of the header.
If you are using the style that leaves the opening operator bracket on the same line as
the header, you can use yet another formatting option: as in the previous example, move
the last token of the header onto a separate line, but leave the opening bracket on that
same line:

while (TheCollection^.KnownSet^.First = nil) and
    (TheCollection^.ToParse^.First <> nil) and
    (TheCollection^.ToParse^.First^.s = ' ')
do begin
    SkipSpace(TheCollection)
end;

This style is not to everyone's liking, but you don't have to follow it; even if you leave
the operator bracket on the title line everywhere else, for a multi-line title you may well
make an exception and format it as shown in the first example in this paragraph.
Let's consider other situations when a line may not fit into the allotted horizontal
space. I would like to note at once that the best way to deal with such situations is to
avoid them. Code style guides are often written under the assumption that a
programmer can always avoid an undesirable situation, and they simply do not say
what to do if the situation does occur; such silence leads programmers to get out of
the situation however they can, often in different ways even within the same program.
To avoid this, we will give some examples of what to do if a long line does not want
to get shorter.
Suppose an overly long expression is encountered on the right-hand side of an
assignment. The first thing to try is breaking the line at the assignment sign. If there
is something long enough to the left of the assignment, this alone may help, for
example:

MyArray[f(x)].ThePtr^.MyField :=
    StrangeFunction(p, q, r) + AnotherStrangeFunction(z);

Note that the expression to the right of the assignment is not only moved to the next
line, but also moved to the right by the indentation size. If the expression still doesn't
fit on the screen after that, you can start splitting it too, and it is best to do it by the
signs of the lowest-priority operations, for example:

MyArray[f(x)].ThePtr^.MyField :=
    StrangeFunction(p, q, r) * SillyCoeff +
    AnotherStrangeFunction(z) / SecondSillyCoeff +
    JustAVariable;

It may happen that even after this, the screen is still too narrow for your expression.
Then you can try to start splitting the subexpressions included in your expression into
several lines; their parts should, in turn, be moved one more indentation, so that the
expression is more or less easy to read, as far as it is possible to talk about readability
for such a monstrous expression:

MyArray[f(x)].ThePtr^.MyField :=
    StrangeFunction(p, q, r) +
    AnotherStrangeFunction(z) *
        FunctionWhichReturnsCoeff(z) *
        AnotherSillyFunction(z) +
    JustAVariable;

If an expression consists of a large number of top-level subexpressions (e.g.,
summands) that are not very long by themselves, it is perfectly acceptable to leave
several such subexpressions on each line:

MyArray[f(x)].ThePtr^.MyField :=
    a + b + c + d + e + f + g + h + i + j + k + l + m +
    n + o + p + q + r + s + t + u + v + w + x + y + z;

Of course, if in real life you had to add up 26 variables like this, it is a reason to wonder
why you don't use an array; here we give the sum of simple variables for illustration
only, in real life you will have something more complex instead of variables.
The situation when there is a simple variable name or even an expression to the left
of the assignment, but a short one, so that breaking the string by the assignment sign
gives no (or almost no) advantage, deserves a separate discussion. Of course, the
expression to the right of the assignment is still best broken down by lower-priority
operations; the only question is at what position to start each subsequent line. There are
exactly two answers to this question: you can either shift each next line by one indent,
as in the examples above, or you can place its beginning exactly below the beginning
of the expression in the first line of our assignment (right after the assignment sign).
Compare, here is an example of the first option:

MyArray[n] := StrangeFunction(p, q, r) *
    SillyCoeff + AnotherStrangeFunction(z) /
    AnotherCoeff + JustAVariable;

And this is how the same code will look if you choose the second option:

MyArray[n] := StrangeFunction(p, q, r) * SillyCoeff +
              AnotherStrangeFunction(z) / AnotherCoeff +
              JustAVariable;

Both variants are acceptable, but they have significant drawbacks. The first option loses
to the second in clarity, but the second option requires a non-standard indentation size
for the second and subsequent lines, which turns out to depend on the length of the
expression to the left of the assignment. Note that this (second) option is completely
unsuitable if you use tabs as an indentation, because this alignment can only be
achieved with spaces, and you should never mix spaces and tabs.

If the disadvantages of both options seem significant to you, you can make it a rule
to always break the line right after the assignment sign whenever the whole operator
does not fit on one line. This variant (discussed at the beginning of the paragraph) is
free from both disadvantages, but requires an extra line; fortunately, the Universe has
an unlimited supply of lines. It looks like this:

MyArray[n] :=
    StrangeFunction(p, q, r) * SillyCoeff +
    AnotherStrangeFunction(z) / AnotherCoeff +
    JustAVariable;
The next case that needs to be considered is a subprogram call that is too long. If you
cannot fit the parameters into one line when calling a procedure or function, the line
will naturally have to be broken, and this is usually done after one of the commas
separating the parameters from each other. As in the case of an expression spread
over several lines, the question arises as to which position to start the second and
subsequent lines from, and there are two options: to shift them either by the indentation
size or so that all parameters are written "in column". The first option looks like this:
VeryGoodProcedure("This is the first parameter",
"Another parameter", YetAnotherParameter, More +
Parameters * ToCome);
The second option for the above example would look like this:
VeryGoodProcedure("This is the first parameter",
"Another parameter",
YetAnotherParameter, More + Parameters
* ToCome);
Note that this option, as well as a similar option for formatting expressions, is not
suitable when using tabs as an indentation size: only spaces can achieve such alignment,
and spaces and tabs should not be mixed.
If for one reason or another you like neither option, we can suggest yet another one,
which is rarely used although it looks quite logical: treat the subprogram name and the
parentheses as a volumetric construct, and the parameters as nested elements. In this
case, our example will look like this:
VeryGoodProcedure(
    'This is the first parameter',
    'Another parameter', YetAnotherParameter, More +
    Parameters * ToCome
);
It often happens that the subprogram header is too long. In this situation, you should
first of all carefully consider the possibilities of its reduction, while allowing, among
other things, the option of changing the division of the code into subprograms. As we
have already mentioned, subroutines with six or more parameters are very hard to use,
so if a large number of parameters has caused the header to "swell", you should consider
whether it is possible to change your architecture in such a way as to reduce this number
(perhaps at the cost of introducing more subroutines).
The next thing to pay attention to is the names (identifiers) of the parameters. Since
these names are local for your subprogram, they can be made short, up to two or three
letters. Of course, we deprive these names of self-explanatory power, but in any case,
the subprogram header is usually provided with a comment, at least a short one, and
the corresponding explanations about the meaning of each parameter can be included
in this comment.
Sometimes, even after all these tricks, the header still does not fit into 79 characters.
Most likely, you will have to spread the parameter list over several lines, but before
doing that, you should try moving the beginning and the end of the header onto separate
lines. For example, you can write the word procedure or function on a separate
line (the line that follows it is not shifted!). In addition, the type of the function's
return value specified at the end of the header can also be moved to a separate line
along with the colon, but this line should be shifted so that the return value type ends
up somewhere below the end of the parameter list (even if you use tabs). The point is
that the reader of your program expects to see the return value type there (somewhere
on the right), and it will take extra effort for the reader to find it on the next line
on the left instead of on the right. All together it may look like this:

procedure
VeryGoodProcedure(fpar: integer; spar: MyBestPtr; str: string);
begin
    {...}
end;

function
VeryGoodFunction(fpar: integer; spar: MyBestPtr; str: string)
                                        : ExcellentRecordPtr;
begin
    {...}
end;

If this does not help and the header is still too long, the only option left is to
split the parameter list into parts. Naturally, line feeds are inserted between the
descriptions of the individual parameters. If several parameters have the same type and
are listed comma-separated, it is desirable to leave them on the same line and place line
breaks after the semicolon after the type name. In any case, the question remains as to
the horizontal placement (shift) of the second and subsequent lines. As for the cases of
a long expression and a long subprogram call discussed above, there are three options
here. First, you can start the parameter list on one line with the subprogram name, and
shift the subsequent lines by the indentation size. Second, you can start the list on the
same line as the subprogram name, and shift the subsequent lines so that all parameter
descriptions start at the same position (this case is not suitable when tabulation is used
for formatting). Finally, it is possible, considering the name of the subprogram and the
opening parenthesis as the header of a complex structure, to shift the description of the
first parameter to the next line, shifting it by the indentation size, placing the rest of the
parameters below it, and placing the parenthesis closing the parameter list on a separate
line in the first position (under the beginning of the header).
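With this last approach, the header from the example above might be laid out like this
(a sketch):

procedure VeryGoodProcedure(
    fpar: integer;
    spar: MyBestPtr;
    str: string
);
begin
    {...}
end;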
The case of a long string constant stands somewhat apart in our list. Of course, the
worst thing you can do is to "escape" the line feed character, continuing the string
literal at the beginning of the next line of code. Don't ever do that:

writeln('This is a string which unfortunately is \
too long to fit on a single code line');

In Pascal, thanks to the string addition operation, you can do this:

writeln('This is a string which unfortunately is '
    + 'too long to fit on a single code line');

But this is not quite right either. A single text message that is output as a single line
(i.e., does not contain line feed characters among the output text) should not be split
into different lines of code at all (see the remark on page 475). There are two more
ways to deal with the string literal length.
First of all, no matter how trivial it may seem, you should consider whether it is
possible to shorten the phrase contained in the line without losing its meaning. As you
know, brevity is the sister of talent. For example, for the example under consideration,
the following variant is possible:

writeln('String too long to fit on a line');

We have left the meaning of the English phrase unchanged, but now it fits into a code
line quite normally, contrary to its own content.
Secondly (if you don't want to shorten anything), you may notice that some string
constants would fit in a line of code if the operator containing them started in the
leftmost position, i.e. if there were no structural indentation. In such a situation, it is
quite easy to deal with a stubborn constant: just give it a name - for example, describe
it in the constants section:

const TheLongString =
'This string could be too long if it was placed in the code';

{ ... }
    writeln(TheLongString);

Unfortunately, there are times when none of the above methods works. Then the only
thing left to do is to follow the rules in the Linux Kernel Coding Style Guide and leave
a line longer than 80 characters in the code. Just make sure that this length does not
exceed the limits of reasonableness. So, if the resulting line of code "exceeds" 100
characters and you think that none of the above mentioned methods can be used to
defeat the malicious constant, you probably only think so; the author of these lines has
never seen a situation in which a string constant could not fit into the usual 80
characters, let alone a hundred.

2.12.9. Spaces and separators


Symbols that are set off as separate tokens in the program text regardless of the
presence or absence of whitespace around them are called delimiters. These are usually
arithmetic operation signs, parentheses, and punctuation marks such as commas,
semicolons, and colons. For example, the + and - operations are delimiters, because
you can write a + b, or you can write a+b, but the meaning is the same. At
the same time, the and operation is not a separator: a and b is not the same as
aandb.
Although whitespace around delimiters is not mandatory, in some cases adding
them can make program text more aesthetically pleasing and readable - but not always.
However, as usual, it is impossible to propose a single universal set of rules for placing
such spaces; there are different approaches with their own advantages and
disadvantages. It is safe to say that for punctuation marks - commas, semicolons and
colons - one simple rule is best: no spaces should be placed before them, and spaces
should be placed after them (this can be either a space itself or a line break).
Some programmers put spaces on the inside of parentheses (round, square, curly,
and angle brackets), like this:

MyProcedure( a, b[ idx + 5 ], c );

We do not recommend doing so, although it is acceptable; it is better to write it this


way:

MyProcedure(a, b[idx + 5], c);

When calling procedures and functions, a space is usually not put between the name of
the called subroutine and the opening parenthesis, nor between the name of an array
and the opening square bracket of an indexing operation.
A somewhat separate issue is the question of which arithmetic operations should
be separated by spaces, and how - on one side or both sides. One of the most popular
and clear recommendations is as follows: symbols of binary operations should be
separated by spaces on both sides, symbols of unary operations should not be
separated by spaces. It should be taken into account that the operation of selecting a
field from a compound variable (in Pascal it is the dot) is not a binary operation, because
on the right side it has not an operand, but the name of the field, which cannot be the
value of the expression. We emphasize that this is the most popular style, but by no
means the only one; it is possible to follow completely different rules, for example, to
space out binary operations of the lowest priority (i.e., operations of the "top" level) in
any expression, and not to space out the rest, etc.
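For instance, under the most popular rule just described, one would write something
like this (an illustrative fragment; the variable names mean nothing):

d := -b + Sqr(x) * 2;    { binary + and * are spaced; unary minus is not }
p := arr[i].next;        { indexing and field selection: no spaces }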
2.12.10. Selecting names (identifiers)
The general rule for choosing names is fairly obvious: identifiers should be
chosen according to what they are used for. Some authors argue that identifiers must
always be meaningful and consist of several words. In fact, it is not always so: if a
variable performs a purely local task and its application is limited to several program
lines, the name of such a variable may well consist of a single letter. In particular, an
integer variable that plays the role of a loop variable is most often called simply "è",
and there is nothing wrong with that. But single-letter variables are appropriate only
when it is clear from the context unambiguously (and without additional efforts to
analyze the code) what it is and why it is needed, and then, perhaps, only in those rare
cases when the variable contains some physical quantity traditionally denoted by such
a letter - for example, temperature may well be stored in the variable t, and spatial
coordinates - in the variables x, y and z. A pointer can be called p or ptr, a string
can be called str, a variable for temporary storage of some value can be
called tmp; a variable whose value will be the result of a function calculation is often
called result or simply res, a sum is quite suitable for an adder, and so on.
It is important to realize that such brevity suits only local identifiers, i.e.
those whose scope is limited - for example, to a single subprogram. If an identifier
is visible in the whole program, it must be long and clear - if only to avoid conflicts
with identifiers from other subsystems. To see what we mean, imagine a program
that two programmers are working on, one of them dealing with a temperature
sensor and the other with a clock; both temperature and time are traditionally
denoted by the letter t, but if our programmers use this tradition to name globally
visible objects, trouble is guaranteed: a program with two different global variables
of the same name has no chance of passing the linking stage.
Moreover, when it comes to globally visible identifiers, length and verbosity alone
do not guarantee the absence of problems. Suppose we need to write a function that
polls a temperature sensor and returns the value read; if we call it GetTemperature,
formally everything seems fine, but quite probably somewhere else in the program
we will also need to retrieve a temperature previously written to a file or simply
stored in memory by another subsystem, and the identifier GetTemperature suits
that action just as well. Unfortunately, there is no universal recipe for avoiding such
conflicts, but some advice can still be given: when choosing a name for a globally
visible object, consider whether the same name could stand for something else. In
the example under consideration, two or three alternative roles can be suggested for
the identifier GetTemperature right away, so it must be judged a poor choice.
The identifier ScanTemperatureSensor, for example, might be better, but only
if it is used to work with all the temperature sensors your program deals with - say,
if such a sensor is known to be the only one, or if the ScanTemperatureSensor
function receives a sensor number or some other sensor identifier as a parameter. If
your function is intended to measure, say, the temperature in the cabin of a car, and
there is also a sensor of, say, the coolant temperature in the engine, then you should
add another word to the function name so that the resulting name
identifies what is happening unambiguously, for example:
ScanCabinTemperatureSensor.

2.12.11. Letter case in names and keywords


One of the features of Pascal is its fundamental case insensitivity: one and the same
identifier can be written as myname, MYNAME, MyName, mYnAmE and so on. The
same applies to keywords: begin, Begin, BEGIN, bEgIn... the compiler will
tolerate everything.
Nevertheless, the most common opinion is that keywords should be written in
lower case (i.e. in small letters), without inventing anything fancier on this subject.
One logical exception can be made to this rule: writing in capital letters the BEGIN
and END that frame the main part of the program (we do not do this in our book, but
in general it is quite a common practice).
Sometimes you can come across Pascal programs in which keywords are capitalized:
Begin, End, If, etc.; there are also programs in which only the names of control
statements (If, While, For, Repeat) are capitalized, while all the others, including
begin and end, are written in lower case. All of this is acceptable, though somewhat
exotic; you just need to clearly formulate for yourself the rules of when to write
keywords this way or that, and strictly follow these (your own!) rules throughout the
program.
It is very rare to find a text where all keywords are typed in capital letters. As
practice shows, such text is harder to read; therefore, you should not write like that.
And, of course, you should not resort to refinements like BeGiN or, say, FunctioN
- this also occurs, but belongs to the realm of meaningless posturing.
As for the choice of names for identifiers, Pascal has a certain tradition on this
subject. If the name of a variable consists of a single letter or is a short abbreviation
(which is quite acceptable for local variables, see §2.12.10), the name of such a variable
is usually written in small letters: i, j, t, tmp, res, cnt, and so on. If the name of
a variable (as well as the name of a type, procedure or function, constant, label, etc.)
consists of several words, these words are written together, starting each with a capital
letter: StrangeGlobalVariable, ListPtr, UserListItem,
ExcellentFunction, KillThemAll, ProduceSomeCompleteMess,
etc. The question remains what to do with single-word names; in our book we wrote
them in lower case (counter, flag, item), but many programmers prefer to
capitalize them (Counter, Flag, Item); sometimes short variable and type names
are written with a small letter, and function and procedure names with a capital letter.
As usual in such cases, the choice is yours, but always act according to the same chosen
principles.

2.12.12. How to deal with description sections


The Pascal standard requires a strict ordering of description sections, but
fortunately existing implementations do not follow this requirement, allowing
description sections to be arranged in any order and more than one section of any type
to be created - in particular, it is not necessary to be limited to a single variable
description section.
All this makes it possible to distinguish between real global variables and variables
that are needed in the main part of the program, but to which you do not need access
from subprograms. For example, if there is a loop in the main part and an integer loop
variable is needed for it, then, for lack of anything better, you will have to describe
such a variable in the section of variable descriptions related to the whole program, but
it is clear that it has nothing to do with global variables. Therefore, everything you
need in the main part of the program and only in it - variables, labels, and
sometimes types that are not used anywhere but in the main part - should be
described immediately before the word begin, which denotes the beginning
of the main part, i.e. after all subroutine descriptions. If you need real global
variables - i.e. those that can be accessed from more than one subroutine, or from the
main program and subroutine - you should create another var section to describe
them, this time before the subroutine descriptions.
Note that jumping by goto between different parts of the program is impossible, so
labels used in subprograms must be described inside them, while labels described in
the global label description section can only be intended for the main program.
Therefore, if labels are used in the main program, their description section is best
placed immediately before the start of the main part.
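A skeleton of a program laid out according to these recommendations might look
as follows (all the names here are invented for the example; a label section for the
main part, if one were needed, would go in the same place, next to the last var):

program layout;

var                      { a real global variable: used both in }
    count: integer;      { a subroutine and in the main part }

procedure Increase;
begin
    count := count + 1
end;

var                      { needed in the main part and only there }
    i: integer;

begin
    count := 0;
    for i := 1 to 5 do
        Increase;
    writeln(count)
end.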

2.12.13. Continuity of compliance


To conclude the conversation about program design, let us note one extremely
important point. The rules of design must be observed in the program text
continuously throughout the whole time of its existence - starting from the moment
when the first line of the future program is written, and then, in fact, always.
One of the most foolish and at the same time most serious mistakes that beginners
make is hidden in the word "later". When the author of a program says - or does not
even say, merely thinks - something like "first I'll make it work/compile/whatever,
and then...", he allows what must not be allowed: the possibility of "temporarily"
doing without the rules. The catch is that "later", when the program is already written,
its correct design may no longer be needed by anyone. Of course, if you come back
to this text after some time, or, worse, someone else has to read it, an illiterately
formatted text will most likely simply be thrown away; but it may also happen that
no one ever comes back to this particular text at all - among other reasons because
you never finish it, not least due to your own attitude to the rules of design.
You have to read a program not only (and not so much) "afterwards", but right in
the process of writing it, because it is impossible to hold all of the text in your memory;
it is precisely while writing a program that correct structural indentation is needed
most of all. Non-compliance with the basic rules of design leads to an overconsumption
of intellectual effort, to unnecessary errors and the work of eliminating them, to
frantic attempts to recall how this or that fragment is organized and what it does.
Beginners usually try to save time and effort with this proverbial "later", but in
reality the effect is just the opposite: trying to save a few seconds, you
may lose several hours.
When you gain a little experience, continuously maintaining the correct form of
the program text will cease to require any noticeable effort from you - it will, as they
say, "run on reflexes": your hands will make the necessary changes in the text
automatically, without interrupting the main train of thought. You can reach this level
quite quickly - literally in a week or two - but for this you must from the very start
strictly refuse any "postponing for later"; simply do not allow yourself to leave
fragments that violate the design rules in the text.
Ensuring good readability of the text is the first priority; a program whose
text violates the design rules cannot yet be considered correct or incorrect, working
or non-working, useful or useless. As long as the design violations are not eliminated,
they are the only property of the program that can be discussed, and fixing them is
the only thing that can be done with such a program.
If you see an indentation violation in a text, fix it immediately, and until you do,
do not do anything else with the program. Very soon it will become a habit and will no
longer cause internal protest. Let us emphasize once again: yes, it is very important.
Indentation may be broken when you move a fragment from one place to another, add
or remove an if (less often, a loop), etc. In any such case, always indent the text
properly first, and only then continue with the meaningful work with the text. Note that
any programmer's text editor allows you to shift a selected block of text to the
right or left by a specified number of positions; for programmers, this operation is
routine; if you don't know how to do it in your editor, figure it out! Don't waste time
manually punching in and out spaces in every line; this process is rightly distasteful
and may prevent you from following the principle of continuity of compliance.

2.13. Testing and debugging


2.13.1. Debugging in a programmer's life
By now your knowledge is sufficient to write a program of noticeable size; if you
have ever tried, you know that a freshly written program almost never works as
expected: it requires a long, painstaking process of finding and fixing errors, called
debugging. To begin with, we will try to formulate some basic principles related to
debugging, and we will do it in such a way that they may seem like a joke to you; you
will soon see for yourself that there is only a small share of joke in this joke, and all
the rest is the plain hard truth. So:
• there's always an error;
• the error is always in the wrong place;
• if you know exactly where the error is, the error may have a different opinion;
• if you think the program should work, it's time to remember that "should" is
when you borrow and don't pay back;
• if debugging is the process of removing errors, then writing a program is the
process of introducing them;
• as soon as an error is discovered, the case always looks hopeless;
• an error, once found, always looks stupid;
• the more hopeless the case looked, the stupider the error turns out to be;
• the computer doesn't do what you want it to do, it does what you told it to do;
• a correct program works correctly under any conditions; an incorrect program
sometimes works too;
• and it would be better if it didn't;
• if the program works, it doesn't mean anything;
• if the program "crashes", you should be happy: the error has shown itself,
so it can now be found;
• the louder the rumble and the brighter the special effects when the program
"crashes", the better - a conspicuous error is much easier to find;
• if there is definitely an error in the program, but the program still works, you are
out of luck - this is the nastiest case;
• neither the compiler, nor the library, nor the operating system is at fault;
• no one wants you dead, but if you die, no one is going to be upset;
• it's actually not that bad - it's worse;
• the very first written line of the future program makes the debugging stage
inevitable;
• if you are not ready for debugging - don't start programming;
• the computer won't explode; but no one promised you more than that.
In computer labs one can often observe students who, having written some text in a
programming language, having successfully compiled it, and having made sure that the
result does not meet the requirements of the task, stop all constructive activity and
switch, for example, to another problem, usually with a similar outcome. They usually
explain this strange choice of strategy with the time-honored phrase "well, I wrote it,
but it doesn't work", pronounced with an intonation implying that the blame lies with
the program itself, the teacher, the computer, the weather in Africa, the Argentinean
ambassador to Sweden, or the cafeteria lady - but certainly not with the speaker: after
all, he wrote the program.
In such a situation, one simple principle should be immediately recalled: the
computer does exactly what is written in the program. This fact seems trivial, but it is
immediately followed by the second one, namely: if the program does not work
properly, it is not written properly. With this in mind, the statement "I wrote the
program" requires clarification: it would be more correct to say "I wrote the wrong
program".
Clearly, writing an incorrect program, even one that compiles successfully, is
certainly not a noteworthy endeavor; after all, the simplest Pascal text that compiles
successfully consists of only two words and one dot: "begin end. " Of course,
this program doesn't solve the problem at hand - but a program that "was written and
doesn't work" doesn't solve anything either, so how is it any better?
Another situation is no less typical, occurring mostly during tests and exams, and
expressed by the phrase "I wrote everything, but I didn't have time to debug it". The
problem here lies in the meaning of the word "everything": the authors of such programs
often do not even suspect how far they were actually from solving the problem.
One can even understand the feelings of a beginner who has struggled to write a
program text and found that the program does not want to meet his expectations at all.
The very process of writing a program, usually called "coding", still seems very
difficult to a beginner, so subconsciously the author of such a program expects at least
some reward for "successfully" overcoming difficulties, and what the computer does in
the end resembles not a reward but a mockery.
Experienced programmers perceive all this quite differently. First, they know for
sure that coding, i.e. the process of writing the program text itself, is only a small part
of the various activities called programming, and not just a small part, but also the
easiest. Secondly, having the experience of creating programs, a programmer
understands well that at the moment when the program text has been successfully
compiled at last, nothing ends, but on the contrary - the most laborious phase of
program creation begins, called, as we have already guessed, debugging. Debugging
takes more effort than coding, requires much more sophisticated skills, and the main
thing is that it can take several times longer, and this is not an exaggeration at all.
Being psychologically prepared for debugging, a programmer rationally allocates
his strength and time, so that errors detected at the first launch of the program do not
discourage him: that is how it should be! It would be rather strange if a program of
any noticeable complexity showed no errors during its first runs. The beginner's
problem may be that, forgetting about the upcoming debugging, he spends all his time
and energy on writing the first version of the text; when it comes to the most
interesting part, there is no time or energy left.
Mountaineers who climb serious mountains follow one crucial principle: the goal
of climbing is not to reach the summit, but to get back. Those who forget this principle
often die on the descent after reaching the coveted summit. Of course, programming is
not so cruel - in any case, you will not die here; but if your goal is to write a program
that does what you want it to do, you need to be ready to spend two thirds of your time
and energy on debugging and not on anything else.
It is impossible to avoid debugging, but following some simple rules can make it
much easier. So, the most important thing: try to check the work of separate parts of
the program as you write them. Two laws work for you here at once: firstly, it is
much easier to debug code you have just written than code you have already forgotten;
secondly, the complexity of debugging grows nonlinearly as the amount of text to be
debugged increases. By the way, there is a rather obvious consequence of this rule: try
to divide your program into subroutines so that they depend on each other as little
as possible; among other benefits, this approach will make it easier to test parts of your
program separately. We can suggest an even more general principle: when you write
code, think about how you will debug it. Debugging does not forgive carelessness at
the coding stage; even such a "harmless" design violation as a short body of a branch
or loop statement left on the same line as the statement header can cost you a lot of
wasted nerves.
We have already seen the second rule at the beginning of this section in the form
of a simple and succinct phrase: the error is always in the wrong place. If you have
an "intuitive certainty" that the effect you are observing can, of course, only be caused
by an error in this procedure, in this loop, in this fragment - do not believe your
intuition. In general, intuition is a wonderful thing, but it does not work during program
debugging. The explanation is very simple: if your intuition was worth something
against this particular error, you would not make this error. So, before trying to fix this
or that fragment, you should be objectively (not "intuitively") sure that the error is
located here. Remember: in a place of the program where you know that you can make
a mistake, you are most likely not to make a mistake; on the contrary, the most intricate
errors appear exactly where you could not expect them; by the way, that's why they
appear there.
Objective methods of error localization include debug printing and step-by-step
execution under the control of a debugger program; if you hope to do without them, it
is better not to start writing programs at all. And here we come to the third rule: the
method of staring at the code practically does not work during debugging. No
matter how long you stare at your text, the result will be expressed by the phrase
"everything seems right, but why doesn't it work?". Again, the reason is quite obvious: if you
could see your own error, you wouldn't have made it in the first place. There are two
more considerations against the effectiveness of the "close look": you will look most
attentively at those fragments of your code where you expect to find the error, and,
as we already know, it is probably not there; besides, there is the well-known
phenomenon of the "worn-out eye" - even if you look directly at the program line
containing the error, you will hardly notice it. The reason for this effect is
also easy to understand: you have just written this code fragment yourself using some
considerations that seem correct to you, and, as a result, the code fragment itself
continues to seem correct to you even if it actually contains an error. So, don't drag
your feet and don't waste precious time on "carefully studying" your own code:
you will have to debug the program anyway.
One more thing to note here: rewriting the program again is generally a good thing,
but it won't help you avoid debugging; most likely, you will just make mistakes in the
same places again. As for rewriting separate program fragments, it is even worse: the
error will probably be in a fragment other than the one you decide to rewrite.
The next rule is as follows: don't hope that the error is somewhere outside your
program. Of course, the compiler and the operating system are also programs, and
there are errors in them too, but hundreds of thousands of other users have already
caught all the simple errors there. The probability of running into an unknown error in
system software is much lower than the chance of winning a jackpot in some lottery.
As long as your programs do not exceed several hundred lines, you may assume that
they are simply in a different weight category: to expose an error in something like a
compiler, a much trickier program is needed.
become complex enough, you will realize that trying to blame it on the compiler and
the operating system looks rather ridiculous.
One more thing to consider: if you cannot find a bug in your program yourself,
no one else will find it. Students often ask the same question: "Why doesn't my
program work?" Your humble servant, hearing this question, usually, in turn, asks who
they think he is: a psychic, a telepath or a clairvoyant. In most cases it is more difficult
for an outsider to understand your program than to write a similar program from scratch.
Besides, it is your program; you have made it yourself, you can figure it out yourself.
It is interesting that in the vast majority of cases the student asking this question has
not even tried to do anything to debug his program.
We will try to end this section on a positive note. Debugging is unavoidable
and very hard but, as often happens, the debugging process turns out to be very
exciting, even akin to gambling. Some programmers declare that the debugger is their
favorite computer game: no strategy game, no solitaire, no arcade gives such a variety
of puzzles and food for the brain; no flight simulators and shooters release as much
adrenaline as the debugging process; and no successfully completed quest brings as
much satisfaction as a bug successfully found and destroyed in a program. As it is easy
to guess, it is all a matter of your personal attitude to what is happening: try to
perceive debugging as a game of "who beats whom" against the bug, and you will see
that even this aspect of programming can be enjoyable.

2.13.2. Tests
If you have found that your program contains an error, it means that you have run
it at least once, most likely feeding it some data; the latter, however, is not strictly
necessary - some programs "crash" immediately after startup, before they have time
to read anything. Either way, you have already started testing your program, so it is
worth saying a few words about how to organize this work properly.
Beginners, as a rule, "test" their programs very simply: they run them, type in some
input data and see what happens. This approach is really no good, but that, alas, does
not become clear immediately or to everyone; the author has met professional teams
of programmers that even had specially hired testers doing exactly this from morning
till night: they ran the program under test this way and that, typing various data into
various input forms, pressing buttons and making other motions in the hope of sooner
or later stumbling upon some inconsistency. Whenever the programmers make changes
to the program, the work of such "testers" starts all over again, because, as we know,
any change can break anything.
A similar picture can be observed in computer classes: students type the input data
on the keyboard at every run of their programs, so that the same text gets typed ten,
twenty, forty times. Watching this, one cannot help wondering when the student will
finally grow too lazy to keep doing such nonsense.

To understand where such a student is wrong and how to act correctly, let's
remember that the standard input stream is not necessarily associated with the
keyboard; when we start a program, we can decide where it will read information from.
We have already used this in testing very simple programs (see page 252); even for a
program that needs only one integer as input, we chose not to enter this number every
time, but used the echo command. Of course, we still have to type the number when
we form a command, but the command itself remains in the history that the command
interpreter remembers for us, so we don't need to type the same number a second time:
instead, we use the up arrow or Ctrl-R search (see §1.2.8) to repeat the command
we've already entered.
Of course, you can use the command interpreter's capabilities to store and re-run
test cases only in the simplest cases; if you approach the matter correctly, you should
create a set of tests represented in some objective form - usually in the form of a file or
several files - to check the program's operation.
By a test we mean - and this is very important - all the information needed to
run the program or some part of it, to feed it input data that exercises one or another
aspect of its functioning, to check the result and to give a verdict on whether everything
worked correctly. A test may consist of data alone: for example, in one file we can put
the data the program is to read, and in another - what we expect to get at the output. A
more complex test may include special testing code - this is how we have to act when
testing individual parts of a program, such as its individual subroutines. Finally, the
most complex tests take the form of whole programs that themselves run the program
under test, feed it certain data and check the results.
Suppose we decided to write a program for reducing fractions: it receives two
integers as input, meaning the numerator and the denominator, and prints two numbers
as the result - the numerator and denominator of the same fraction reduced as far as
possible.
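We will not need the text of frcancel itself, but to make the discussion concrete,
here is one possible sketch of such a program (the abs calls deal with negative
input, a point we return to below):

program frcancel;

function gcd(a, b: integer): integer;
var
    r: integer;
begin
    a := abs(a);    { guard against negative operands; a naive }
    b := abs(b);    { version that ignores signs misbehaves }
    while b <> 0 do
    begin
        r := a mod b;
        a := b;
        b := r
    end;
    gcd := a
end;

var
    n, d, g: integer;

begin
    read(n, d);
    g := gcd(n, d);
    if g = 0 then    { both numbers were zero }
        g := 1;
    writeln(n div g, ' ', d div g)
end.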
When the program is written and passes compilation, further actions of a novice
programmer who has never thought about the correct organization of testing may look
like this:
newbie@host:~/work$ ./frcancel
25 15
5 3
newbie@host:~/work$ ./frcancel
7 12
7 12
newbie@host:~/work$ ./frcancel
100 2000
1 20
newbie@host:~/work$

In most cases beginners are satisfied with this and consider the program "correct",
but the task of reducing fractions is not as simple as it seems at first glance. If the
program is written "head-on", it will most likely not work for negative numbers. Our
beginner may be told about this by a more experienced friend or, if it happens in class,
by the teacher; trying to run the program on something "with a minus", its author can
see for himself that his seniors are right and the program, for example, "loops" forever
(this is what happens with the simplest implementation of Euclid's algorithm, which
ignores the peculiarities of the mod operation for negative operands). Of course,
fixing the program is not a problem, but something else matters more: any fix may
"break" what worked before, so the new version of the program has to be tested from
scratch, i.e. the test runs already performed have to be repeated, each time entering
the numbers from the
keyboard.
As the reader has probably guessed by now, testing performed by a more
experienced programmer could look like this:

advanced@host:~/work$ echo 25 15 | ./frcancel
5 3
advanced@host:~/work$ echo 7 12 | ./frcancel
7 12
advanced@host:~/work$ echo 100 2000 | ./frcancel
1 20
advanced@host:~/work$

This approach is undoubtedly better than the previous one, but it is also still far from
being a full-fledged test, because to repeat each of the tests you will have to find it in
the history manually, and after running it you will have to spend time checking if the
numbers are printed correctly or not.
To understand how to properly organize testing of the program, let's first note that
each test consists of four numbers: two of them are given to the program as input, and
the other two are needed to compare the result printed by the program with them. Such
a test can be written in a single line; thus, the three tests in our example are expressed
as the following lines:

25 15 5 3
7 12 7 12
100 2000 1 20

It remains to invent some mechanism that, having a set of tests in this form, will run
the program under test the required number of times without our participation, feed it
with test data and check the correctness of the results. In our situation, the easiest way
to accomplish this is to write a script in the command interpreter language; if you don't
remember how to do this, reread §1.2.15.
To understand what our script will look like, let's imagine that the four numbers that
make up a test are stored in the variables $a, $b, $c, and $d. You can "run" the
test with the command "echo $a $b | ./frcancel"; but we don't just need
to run the program, we need to compare its result with the expected one, and for that
we have to put the result into a variable. For this purpose we can use an assignment
with backquotes, which, as we remember, substitute the result of a command's
execution:

res=`echo $a $b | ./frcancel`

The result in the $res variable can be compared with the expected result, and if a
mismatch is detected, the user can be informed about it:

if [ x"$c $d" != x"$res" ]; then


echo TEST $a $b FAILED: expected "$c $d", got "$res"
fi

The read command built into the interpreter will help us "drive" the numbers
from the tests into the variables $a, $b, $c and $d; this command takes variable
names (without the "$" sign) as parameters, reads a line from its input stream, splits
it into words and "distributes" these words among the specified variables; if there are
more words than variables, the last variable receives the rest of the line, consisting of
all the "extra" words. The read command has a useful property: if the next line is
read successfully, the command completes successfully, and if the stream has run dry,
it fails. This allows us to use it to organize a while loop like this:

while read a b c d; do
    # the body of the loop will be here
done

In the body of such a loop, the variables $a, $b, $c and $d successively take as
their values the first, second, third and fourth words of each next line. Note that each
of our tests is exactly a four-word line (the words are numbers, but numbers, too,
consist of characters; there is nothing but strings in scripting languages). In the body
of the loop we place the if shown above, which runs a single test, and all that remains
is to figure out how to feed the sequence of our tests to the standard input of the
resulting construct. To do this, we can use a redirection of the "here document" kind,
written with the "<<" sign. This sign is followed by some word (a "stop word"), after
which comes the text to be given to the command as input; the text ends with a line
consisting of the stop word alone. All together it will look something like this:

#!/bin/sh
# frcancel_test.sh

while read a b c d; do
    res=`echo $a $b | ./frcancel`
    if [ x"$c $d" != x"$res" ]; then
        echo TEST $a $b FAILED: expected "$c $d", got "$res"
    fi
done <<END
25 15 5 3
7 12 7 12
100 2000 1 20
END

In spite of its primitive nature, it is already a real full-fledged test suite with the ability
to run tests automatically. As you can see, adding a new test is reduced to writing one
more line before END, but that's not the main thing; more important is that running
all the tests doesn't require any effort from us, we just run the script and see if it
produces anything. If it doesn't produce anything - the run was successful. The general
rule, which we have actually already followed, is as follows: a test may be hard to
write, but it should be easy to run.
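For example, such a run might look like this (assuming the script was saved under
the name given in its comment; no output means that all the tests have passed):

advanced@host:~/work$ sh frcancel_test.sh
advanced@host:~/work$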
The point here is this. Tests are needed not only to try to detect errors right after
writing a program, but also to assure ourselves to some reasonable extent that we have
not broken anything by making changes. That's why debugging should never be
considered finished, and tests should never be thrown away (for example, erased) - they
will come in handy many times; and, of course, we should count on the fact that after
any slightest noticeable changes in the program we will have to check it with all the
tests we have, and it is understandably a bit expensive to do it manually.

If another test reveals an error in the program, do not rush to fix something. First
of all, you should consider whether you can simplify the test where the error occurred,
i.e. whether you can write a test that shows the same error but is simpler. Of course,
this does not mean that the more complex test should be thrown away. Tests should not
be thrown away at all. But the simpler the test is, the fewer factors that can affect the
program's work and the easier it will be to find an error. By successively simplifying
one test, you can create a whole family of tests, and perhaps the simplest of them will
not show the error. This is no reason to stop: take the simplest test that still fails
and simplify it in some other direction. In any case, all the created tests, regardless of
whether they show some error right now or not, are valuable for further debugging:
some of them can show you the way to find an error, while others can show you where
you should not look for an error.
There is a special approach to writing programs called test first. In this approach, you
first write tests, run them and make sure they fail, and only then write the program text that
makes the tests pass. If the programmer begins to feel that the program is still not what it
should be, he must first write a new test which, by failing, gives objective confirmation of the
program's "incorrectness", and only then change the program so that both the new test and
all the old ones pass. Writing program text for any purpose other than satisfying the existing
tests is completely excluded.
In this approach, tests are mostly written not for the whole program, but for each of its
procedures and functions, for subsystems comprising several interconnected procedures, and
so on. Following the "test first" principle allows you to edit the program more boldly, without
fear of breaking it: if we break something, the tests that stop passing will tell us about it; and
if something breaks but no test stops passing, it means the test coverage is insufficient and
more tests need to be written.
Of course, you don't have to follow this approach, but knowing that it exists and keeping
it in mind will do no harm.

2.13.3. Debug print


When everything possible has been squeezed out of the tests, the imagination for
"what else to test" has finally dried up, and the program still works incorrectly, the
moment comes when you need to understand why the tests give such unexpected
results - in other words, to find out what is really going on inside the program.
Perhaps the simplest and, as the saying goes, "cheap and cheerful" way to do this is
to insert into the program additional statements that print something. The information
printed by such statements has nothing to do with the problem being solved; it is
needed only during debugging, and all of this is called debug printing. Most often
debug printing helps to answer one of two questions: "does the program reach this
place" and "how does this variable change (what values does it take)".
The first thing to remember about debug printing is that debug messages
should be easily recognizable and must not blend in with the rest of the
output your program produces. You can start each debug message, for example,
with five asterisks, or with the word "DEBUG" (necessarily in capital letters - unless
your program itself prints its messages in capitals, in which case the debug marker is
better typed in lower case), or with a laconic "XXX"; the main thing is that the debug
print must be clearly visible. The second simple rule of debug printing is not to forget
line feeds; in Pascal this means using writeln. The reason is the so-called output
buffering: a message that does not end with a line feed may not appear immediately,
and if the program crashes, it may not appear at all, which defeats the whole purpose
of debug printing.
More specifically, output operations place information into a buffer, from which it is
handed to the operating system for output into the stream in certain cases: when the buffer
is full, when the program terminates, and when a procedure forcibly flushing the buffer is
called (in Free Pascal such a procedure is called flush; in particular, flush(output)
forcibly flushes the buffer of the standard output stream). In addition, when output goes to
the terminal (as opposed to a file), the buffer is also flushed whenever a line feed is printed
and whenever the program requests an input operation, in case an input prompt had been
printed before it.
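So a typical debug print statement, with a marker and an explicit flush (the variable
name here is invented), might look like this:

writeln('DEBUG: entered the loop, n = ', n);
flush(output);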
Another point related to debug printing is rather obvious, but for some reason it
has to be repeated to some students several times: no one has canceled structural
indentation for debug print statements. Even if you intend to clean the inserted
statements out of the program text five minutes later, this is no reason to turn the text
into a ridiculous scribble, even for those five minutes.
However, do not hurry to remove the debug print from the text. As practice
shows and Murphy's law dictates, as soon as the last debug print statement is removed
from the program, another error will immediately be found in it, and to catch it you
will have to re-insert most of the statements you have just removed. It is better to
enclose the debugging statements in curly brackets, turning them into comments (but
without breaking the structural indentation!). Yet even inserting and deleting the
comment braces may take an unreasonable amount of time. Better still, use
conditional compilation directives, which came to Pascal from the C language and
look rather strange at first sight. Conditional compilation uses "symbols" that can be
"defined"; they are good for nothing else (unlike C, where the main role of similar
"symbols" is quite different). So make up some identifier that you will use as a
"symbol" for turning debug printing on and off, and for nothing else; we can recommend
the word DEBUG for this role. Place the directive "{$DEFINE DEBUG}" at the
beginning of the program to "define" this symbol. Now enclose each program
fragment intended for debug printing between conditional compilation directives,
like this:

{$IFDEF DEBUG}
writeln('DEBUG: x = ', x, ' y = ', y);
{$ENDIF}

As long as the directive defining the DEBUG symbol stands at the beginning of the
program, such a writeln statement will be processed by the compiler as usual;
but if you remove the DEFINE directive, the DEBUG "symbol" becomes undefined,
and everything between {$IFDEF DEBUG} and {$ENDIF} will simply be ignored
by the compiler. Note that the DEFINE directive doesn't even have to be removed
completely: just remove the "$" character from it, and it will turn into an ordinary
comment, disabling all debug printing; if debug printing is needed again, it is enough
to put the character back. But you can do even better: do not insert the DEFINE
directive into the program at all, and instead define the symbol from the compiler's
command line when necessary. To do this, just add the flag "-dDEBUG" to the
command, like this:

fpc -dDEBUG myprog.pas

By doing so, we can compile our program with or without using debug print without
changing the source code. You will understand why this may be important when you
start using version control.
In some cases, when debugging, you may want to know the current state of a
complex data structure - a list, a tree, a hash table, or something even more intricate.
There is nothing to be afraid of in such situations: simply describe a procedure
specially intended for printing the current state of the data structure in question. Such
a procedure can itself be enclosed in a conditionally compiled fragment, so that it is
not included in the version of the executable built without debug printing and does
not increase the amount of machine code.
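For example, for a singly linked list of integers such a procedure might look like
this (the type and procedure names here are our own invention):

type
    ItemPtr = ^Item;
    Item = record
        data: integer;
        next: ItemPtr
    end;

{$IFDEF DEBUG}
procedure DumpList(p: ItemPtr);
begin
    write('DEBUG: list contents:');
    while p <> nil do
    begin
        write(' ', p^.data);    { all elements on a single line }
        p := p^.next
    end;
    writeln
end;
{$ENDIF}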

2.13.4. Debugger gdb


Debug printing is undoubtedly a powerful tool, but in some cases we can
understand what is going on much faster if we are allowed to execute our program step
by step with viewing the current values of variables. For this purpose, a special program
called a debugger is used; in modern versions of Unix, including Linux, the most
popular debugger is gdb (GNU Debugger), and we will try to use it now.
The first thing to realize when starting to work with the debugger is that we will
need some help from the compiler. The debugger works with the executable file,
which contains the result of translating our program into machine code; consequently,
it normally contains no information about the names of our variables and subroutines,
the lines of the source code, and so on. The debugger could, of course, show us the
machine code obtained from our program in the form of mnemonics (assembly
notation) and offer to step through it, but that would not be of much use: most likely,
looking at the mnemonics of machine instructions, we would simply fail to recognize
the program constructs from which this code was produced. The problem is solved by
so-called debugging information, which the compiler can place into the executable
file at our request. This information includes all the names we used in the program,
as well as the names of the files containing the program's source code and the
numbers of the source lines from which each fragment of machine code was
produced.
Debugging information occupies a relatively large amount of space in the
executable file without affecting the execution of the program - it is needed only when
the debugger is used. Therefore, the compiler does not supply debugging information
to the executable unless you ask it to do so, and you can ask it by specifying the -g
switch on the command line:

fpc -g myprog.pas
The resulting executable file will be much larger in size, but we will be able to see
fragments of our source code and use variable names when executing our program
under the control of the debugger.
The gdb debugger is a program with its own built-in command line
interpreter; we perform all actions on our program by giving commands to the
debugger. The debugger can work in different modes: in particular, it can be attached
to an already running program (process), and it can also be used to figure out where
in the program and for what reason a crash occurred (provided the operating system
has created a so-called core file, which, however, usually does not happen with
programs compiled by fpc). But it will be enough for us to master only one mode,
the most popular one, in which the debugger itself starts our program and controls
its execution, obeying our commands. The command line built into gdb offers
editing, autocompletion (not of file names, of course, but of variable and subroutine
names), and storing and searching the history of entered commands, so working with
gdb turns out to be quite convenient - provided we know how to do it.
You can start the debugger to work in this mode by specifying the name of the
executable file as a parameter:

gdb ./myprog

If we need to pass some command line arguments to our program, this is done with the
--args switch, for example:

gdb --args ./myprog abra schwabra kadabra

(in this example, the myprog program will be run with three command line arguments
- abra, schwabra, and kadabra).
Once started, the debugger will report its version and some other information and
issue its command line prompt, usually looking like this:

(gdb)

We can start program execution with the start command, in this case the debugger
will start our program, but will stop it at the first statement of the main part, not allowing
it to do anything; further execution will take place under our control. We can do the
opposite: give the run command; then the program will start and run as normal (that
is, as it does without the debugger), and if all is well, the program will terminate safely
and the debugger will do nothing; but if the program execution is interrupted by
pressing Ctrl-C, then the program will not be destroyed when executed under the
debugger; instead, the debugger will stop it and issue its command line prompt, asking
us what to do next. In addition, if the program crashes while running under the
debugger, the debugger will show us where in the source code it happened and allow
us to view the current values of variables, which in most cases allows us to understand
why the crash occurred.
When the program being debugged is stopped, we can tell the debugger to perform
one step in it; the question is what will be considered a "step". There are two answers
to this question. The next command considers one line of the source program text as
a "step", and if there are procedure and function calls in this line, their execution is
considered as a part of the "step", i.e. the next command does not go inside the
called subroutines. The second command for "one step" is called step and differs in
that it goes inside the called procedures and functions, i.e. if the current line contains a
call to a subprogram and we gave the step command, then after that the first line of
the text of this subprogram will become the current one. The step and next
commands can be repeated just by pressing Enter, which speeds up step-by-step
execution.
After stopping program execution (including after step or next commands),
the debugger usually shows a line of source text that corresponds to the current point
of execution, which in most cases allows you to orient yourself and understand where
we are. If the given line is not enough, you can use the list command, which will
display the neighborhood of the current line - the five lines before it, the line itself and
the five lines after it. If this is not enough, you can tell the list command the line
number to start with; for example, "list 120" will display ten lines starting at line
120. If desired, you can see the next ten lines by pressing Enter instead of typing the
next command, and so on to the end of the file.
If the "steps" executed by the program are too many for a step-by-step execution,
we can start the program for normal execution, during which it will not stop after each
step; at the initial start, as already mentioned, the run command is used for this
purpose, and if the program has been suspended, the cont command (from the word
continue) can be used to continue its execution. Usually, before using one of these
commands, so-called breakpoints are set in the program; when the program reaches
such a point, it will stop and the debugger will give us an invitation asking for further
instructions. Breakpoints are set using the break command, which needs a parameter;
it can be either the line number of the source code (and when debugging a program
consisting of several source codes - the file name and line number separated by a colon),
or the name of a subprogram (procedure or function). In the first case the program will
be stopped when it reaches the specified line (but before this line is executed), in the
second case the program will stop as soon as the specified subroutine is called.
Note that since Pascal does not distinguish between upper and lower case letters in
identifiers, at the object-code level (including the debugging information) the compiler
converts all names to upper case. This means that when using the debugger, the
names of procedures and functions must be written in capital letters, otherwise
the debugger will not understand us. Strictly speaking, the same applies to variable
names, but, for example, the versions of the compiler and debugger used by the author
of this book understood local variable names written in any case perfectly well.
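For example, the beginning of a session with the hanoi2 program discussed below
might look like this (the line number is chosen arbitrarily for the example):

(gdb) break hanoi2.pas:52
(gdb) break SOLVE
(gdb) run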
When you create a new breakpoint, the debugger shows you its number, which you
can use for more flexible work with breakpoints. For example, the command
"disable 3" will temporarily turn off breakpoint #3, and the command "enable
3" will turn it back on. The "ignore 3 550" command will tell the debugger that
breakpoint #3 should be "followed without stopping" (ignored) 550 times, and only
stopped after that - that is, if it is ever reached for the 551st time. Finally, the cond
(from the word conditional) command allows you to specify a stop condition in the
form of a logical expression. For example,

cond 5 i < 100

indicates that you should stop at breakpoint #5 only if the value of the i variable is
less than one hundred. The "info breakpoints" command allows you to find
out what breakpoints you have, what conditions are set for them, ignore counters, etc.
You can view the values of variables when the program is stopped by using the
inspect command. If necessary, the "set var" command allows you to change
the value of a variable, although this is relatively rarely used; for example, "set var
x=50" will force the variable x to be set to 50.
In a program that actively uses subroutines, the bt (or backtrace, for short)
command can be very useful. This command shows which subroutines have been called
(but not yet completed), with what parameters they were called, and from where in the
program. For example, while debugging the hanoi2 program (see §2.11.2), the bt
command might produce:
(gdb) bt
#0 MOVELARGER (RODS=...) at hanoi2.pas:52
#1 0x080483d9 in SOLVE (N=20) at hanoi2.pas:91
#2 0x08048521 in main () at hanoi2.pas:110

This means that the MOVELARGER procedure (called MoveLarger in the program
text) is currently active, with the current line being line 52 of the hanoi2.pas file;
the MoveLarger procedure was called from the SOLVE procedure (Solve), the
call being located at line 91. Finally, Solve was called from the main part of the
program (denoted by the word main; the point is that gdb is mainly oriented toward
the C language, in which the role of the main part is played by a function named
main); this call is located at line 110.
The first number in each line of the bt output is the frame number. Using this
number, we can switch between the contexts of the listed subroutines - for example,
to look at the values of variables at the points where the lower subroutines were
called. Thus, in our example the frame 1 command will let us look at the point in
the Solve procedure where it calls MoveLarger. After the frame command,
the list and inspect commands will provide information related to the chosen
position in the selected frame.
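For example, continuing the bt session shown above, the commands

(gdb) frame 1
(gdb) inspect N

would switch us to the frame of the Solve procedure and show the value of its
parameter N at the moment of the call.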
Another useful command is call; it allows you to call any of your subroutines
with specified parameters at any time. Unfortunately, there are some limitations here;
gdb doesn't know anything about Pascal strings, for example, so if your subroutine
requires a string as one of its parameters, you can call it by specifying some suitable
variable as a parameter, but you can't specify a specific string value.
Exiting the debugger is done with the quit command, or you can arrange an
"end of file" situation by pressing Ctrl-D. In addition, it is useful to know that the
debugger has a help command, although it is not so easy to work with.
As mentioned at the beginning of this section, gdb can be used in different
modes. For example, if you have already started your program and it behaves
incorrectly, but you don't want to repeat the actions that led to this behavior, or you
are not sure you can recreate the situation, you can attach the debugger to the existing
process. To do this, of course, you need to find out the process number; how to do
that was described in §1.2.9. Then gdb is run with two parameters: the name of the
executable file and the process number, e.g.:

gdb ./myprogram 2765

The debugger needs the executable file to take the debugging information from it, i.e.
the information about variable names and source line numbers. After a successful
attachment, the debugger pauses the process and waits for your instructions; you can
use the bt command to find out where you are and how you got there, the inspect
command to view the current values of variables, as well as the break, cont,
step, next and other commands. After you exit the debugger, the process will
continue execution - unless, of course, you have killed it during debugging.
The last of the three gdb modes, the core-file analysis mode, is not needed when working
with Free Pascal: Free Pascal builds executables so that they intercept the operating system
signals indicating an emergency, print an error message and terminate - which, from the
system's point of view, looks like a normal termination, not a crash, and does not result in
the creation of a core file. We will return to studying gdb in the second volume: for programs
written in C we will definitely need to analyze core files.

2.14. Modules and separate compilation


As long as the source code of a program consists of a few dozen lines, it is easiest
to store it in a single file. However, as the size of the program grows, working with a
single file becomes harder and harder, and for several reasons. First, a long file is
simply hard to leaf through. Second, as a rule, at any given moment a programmer
works with only a small fragment of the source code, deliberately putting the other
parts of the program out of mind so as not to be distracted, and in this respect it
would be better if the fragments not currently being worked on were located somewhere
far away - out of sight, even accidental sight. Third, if a program is divided into
separate files, it is much easier to find the right place in it, just as it is easier to find the
right paper in a cabinet of office folders than in a large drawer full of papers piled up
in disorder. Finally, it often happens that one and the same code fragment is used in
different programs - and it is likely to be edited from time to time (for example, to fix
errors); obviously it is much easier to fix a file in one place and copy the whole file
into all the other projects than to fix the same fragment inserted into different files.
Almost all programming languages support including the contents of one file in
another file during translation; in most Pascal implementations, including our Free
Pascal, this is done with the {$I file_name} directive, for example:

{$I myfile.pas}

This will work the same as if you had pasted the entire contents of myfile.pas
right there instead of this line.

Partitioning the program text into separate files joined together by the translator
removes some of the problems, but unfortunately not all of them, because such a set
of files remains, as programmers say, one translation unit - in other words, we can
only compile all of them together, in one pass. Meanwhile, although modern compilers
are quite fast, the volumes of serious programs are such that complete recompilation
may take hours and sometimes even days. If, after any change, even the most
insignificant one, we had to wait a day (or even a couple of hours - that would be quite
enough), working would be absolutely impossible. Besides, programmers almost
always use so-called libraries - sets of ready-made subprograms that change very
rarely, so that it would be silly to spend time recompiling them over and over. Finally,
problems are caused by constantly arising name conflicts: the larger the code, the
more distinct global identifiers (at the very least, subroutine names) it requires, the
higher the probability of accidental coincidences - and there is almost nothing you
can do about it when translating in a single pass.
All these problems are solved by the technique of separate compilation. Its essence
is that a program is created as a set of separate parts, each of which is compiled
separately. Such parts are called translation units, or modules. Most programming
languages, including Pascal, assume that modules are individual files. Usually a set of
logically related subroutines is formed into a separate translation unit; everything
necessary for their operation is also placed in the module - for example, global
variables, if any, as well as all sorts of constants and so on. Each module is compiled
separately; the translation of each of them results in an intermediate file containing
so-called object code[202], and such files are combined into the finished executable
by the link editor (linker); the link editor usually works so fast that rebuilding the
executable file from the intermediate files every time does not create significant
problems.

[202] Object code is a kind of blank for machine code: program fragments are represented in it
by sequences of instruction codes, but some addresses in these codes may be left unfilled because
they were not known at compilation time; the final transformation of the code into machine code is
the task of the link editor.
A very important property of a module is that it has its own namespace: when
creating a module, we can decide which of the names introduced in it will be visible
from other modules and which will not; the module is said to export some of the names
introduced in it. It often happens that a module introduces several dozen and sometimes
hundreds of identifiers, but all of them turn out to be needed only inside the module
itself, while the rest of the program needs to address only one or two subroutines, and
it is their names that the module exports. This eliminates the problem of name conflicts:
identifiers with the same names may appear in different modules, and this does not
bother us in any way as long as they are not exported. Technically this means that,
when the module's source code is translated into object code, all identifiers other than
the exported ones disappear.

2.14.1. Modules in Pascal


In Pascal, the file containing the main part of the program differs syntactically
from the files implementing the other (so to speak, "subordinate") translation units. The
main module, as we have seen, begins with an optional program header with the
keyword program; "non-main" modules (which we have not yet encountered)
begin with a header with the keyword unit:

unit myunit;

Unlike the identifier in the program header, which affects absolutely nothing,
the module identifier (name) is a very important thing. First of all, this
name identifies the module in other translation units, including the main program: to
get the module's capabilities at your disposal, you must place the uses directive,
already familiar to us from the chapter on full-screen programs, in the program (and,
if necessary, in another module, but more on that later). The crt module we used earlier
comes with the compiler, but connecting it is not fundamentally different from
connecting the modules we write ourselves:

program MyProgram1;
uses myunit;

You can use several uses directives, or you can list modules comma-separated in
one such directive, for example:

uses crt, mymodule, mydata;

The easiest thing is to make the module name specified in its header match the main
part of the file name - more precisely, to give the module's source file a name formed
from the module name by adding the suffix ".pp"; for example, the module
mymodule is most easily stored in a file called mymodule.pp. This convention can
be circumvented, but we will not discuss that possibility.
The rest of the module text should consist of two parts: the interface, marked
with the keyword interface, and the implementation, marked with the keyword
implementation. In the interface part we describe everything that is to be visible
from other translation units using this module; for subroutines, only their headers are
placed in the interface part. Besides subroutines, the interface part can also describe
constants, types and global variables (but do not forget that it is better not to use global
variables at all).
In the implementation we must, first, write all the subroutines whose headers are
placed in the interface part; second, we can describe here any objects that we do not
want to show to the "outside world" (i.e. to other modules): these can be constants,
variables, types, and even subroutines whose headers were not placed in the interface
part.
The main idea of dividing a module into an interface and an implementation is that
we have to tell the programmers who will use our module about all the features of the
interface in detail, otherwise they will simply be unable to use it. When we create
documentation for our module, we have to describe in it all the names that the interface
section introduces. Moreover, once someone starts using our module (note that this
also applies when we use it ourselves), we have to do our best to keep the rules for
using the names introduced in the interface unchanged: we can add new types or
subroutines, but if we think of changing something that was already there, we have to
think twice, because all programs using our module will "break".
Everything is much simpler with the implementation. We do not need to tell our
module's users about it, nor to include it in the documentation[203]; we can
change it at any time without fear that anything other than our module itself will break.
Among other things, possible name conflicts must be taken into account. If all the
names used in a module are visible throughout the program, and the program itself is
large enough, the problem of accidental name conflicts of the kind "oh, it seems
someone has already given this name to a completely different procedure in a
completely different place" gives programmers plenty of headaches, especially if some
modules are used in several programs at once. Obviously, hiding inside the module
those names that are not intended for direct use from other translation units drastically
reduces the probability of such accidental coincidences.
A module's own namespace solves not only the problem of name conflicts but also
the problem of simple "foolproofing", which is especially relevant in large projects
developed by several people. If the author of a module does not intend a certain
procedure to be called from other modules, or a certain variable to be changed in any
way other than by procedures of the same module, it is enough not to put the
corresponding names into the interface - and there is nothing to worry about: other
programmers will simply be unable to access them, purely technically.
In general, hiding the implementation details of a subsystem of a program is called
encapsulation; it allows programmers to modify the code of modules more boldly,
without fear that other modules will stop working: it is enough to keep unchanged and
working those names that are exposed in the interface.

[203] In fact, the implementation is also often documented, but such documentation is intended
not for the users of the module but for the programmers who work in the same team with us and
who may need to debug or improve our module.
Like the main program file, the module file ends with the keyword end and a dot.
Before that you can insert a so-called initialization section: write the word begin,
then several statements, and only then end with the dot; these statements will be
executed before the main part of the program starts. But this only makes sense if you
have global variables, so if you are doing things right, you will not need the
initialization section for a very long time - perhaps never, unless you decide to make
Free Pascal your main tool.
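For reference, here is a minimal sketch of what a module with an initialization
section could look like (the unit name and its contents are made up for illustration):

unit counters;
interface
procedure Count;
function Total: longint;
implementation
var
    cnt: longint;  { module-local global variable }

procedure Count;
begin
    cnt := cnt + 1
end;

function Total: longint;
begin
    Total := cnt
end;

begin  { initialization section: executed before the main program }
    cnt := 0
end.

(Free Pascal zero-initializes global variables anyway, so this particular
initialization is redundant and serves only to show the syntax.)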
As an example, let's return to our binary search tree from §2.11.5 and try to put
everything needed to work with it into a separate module. The interface will consist
of two types - the tree node itself and a pointer to it, i.e. the TreeNode and
TreeNodePtr types - as well as two subroutines: the AddToTree procedure and the
IsInTree function. Note that the "generalized" SearchTree function and the
TreeNodePos type it returns are implementation peculiarities that the module
user does not need to know about: what if we later want to change this implementation?
Therefore the TreeNodePos type will be described in the implementation part of
the module, and of the subroutine headers only AddToTree and IsInTree, but not
SearchTree, will be present in the interface part. It will look like this:
unit lngtree;  { lngtree.pp }
interface
type
    TreeNodePtr = ^TreeNode;
    TreeNode = record
        data: longint;
        left, right: TreeNodePtr;
    end;

procedure AddToTree(var p: TreeNodePtr; val: longint;
    var ok: boolean);
function IsInTree(p: TreeNodePtr; val: longint): boolean;

implementation

type
    TreeNodePos = ^TreeNodePtr;

function SearchTree(var p: TreeNodePtr; val: longint): TreeNodePos;
begin
    if (p = nil) or (p^.data = val) then
        SearchTree := @p
    else
    if val < p^.data then
        SearchTree := SearchTree(p^.left, val)
    else
        SearchTree := SearchTree(p^.right, val)
end;

procedure AddToTree(var p: TreeNodePtr; val: longint;
    var ok: boolean);
var
    pos: TreeNodePos;
begin
    pos := SearchTree(p, val);
    if pos^ = nil then
    begin
        new(pos^);
        pos^^.data := val;
        pos^^.left := nil;
        pos^^.right := nil;
        ok := true
    end
    else
        ok := false
end;

function IsInTree(p: TreeNodePtr; val: longint): boolean;
begin
    IsInTree := SearchTree(p, val)^ <> nil
end;

end.

To demonstrate this module at work, let's write a small program that reads from
the keyboard requests of the form "+ 25" and "? 36"; a request of the first kind is
executed by adding the specified number to the tree, while in response to a request of
the second kind the program prints Yes or No depending on whether the specified
number is present in the tree. The program will look like this:
program UnitDemo;  { unitdemo.pas }
uses lngtree;
var
    root: TreeNodePtr = nil;
    c: char;
    n: longint;
    ok: boolean;
begin
    while not eof do
    begin
        readln(c, n);
        case c of
            '?': begin
                if IsInTree(root, n) then
                    writeln('Yes!')
                else
                    writeln('No.')
            end;
            '+': begin
                AddToTree(root, n, ok);
                if ok then
                    writeln('Successfully added')
                else
                    writeln('Couldn''t add!')
            end;
            else
                writeln('Unknown command "', c, '"')
        end
    end
end.
It is enough to run the compiler once to compile the whole program:

fpc unitdemo.pas

The lngtree.pp module will be compiled automatically, and only if required.
The result will be two files: lngtree.ppu and lngtree.o. If the module's source
text is changed, the compiler will recompile the module the next time the whole
program is rebuilt; if it is left untouched, only the main program will be recompiled.
The compiler finds out whether the module needs to be recompiled by comparing the
last modification times of the lngtree.pp and lngtree.ppu files: if the first is
newer (or the second simply does not exist), compilation is performed; otherwise the
compiler considers it unnecessary and skips it. However, no one prevents you from
compiling the module "manually" by issuing a separate command:

fpc lngtree.pp
2.14.2. Using modules from each other
Quite often modules need to use the capabilities of other modules. The simplest
such case occurs when a subroutine of one module must call a subroutine of another
module. Similarly, the body of a subroutine may need to use the name of a constant,
type, or global variable introduced by another module. None of these cases creates any
difficulties: just insert the uses directive into the implementation section (usually
right after the word implementation), and all the features provided by the interface
of the module named in the directive will be available to you.
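For illustration, here is a sketch of a module whose implementation (but not
interface) depends on the lngtree module from §2.14.1 (the unit name and its
procedure are made up):

unit treedemo;
interface
procedure FillWithSquares;
implementation
uses lngtree;  { the dependency is confined to the implementation }
var
    root: TreeNodePtr = nil;

procedure FillWithSquares;
var
    i: longint;
    ok: boolean;
begin
    for i := 1 to 10 do
        AddToTree(root, i * i, ok)
end;

end.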
Things are a bit worse if you have to make your module's interface dependent on
another module. Fortunately, such cases are much rarer, but they are still possible. For
example, you may need to describe a new type in the interface part of your module
based on a type introduced in another module (say, one module introduces a record
type, and another module introduces an array of such records, etc.). You may just as
well need, when creating a new array type, to refer to a constant introduced by another
module; finally, a subroutine in your interface may need a parameter of a type described
in another module, or a function may return a value of such a type, or you may simply
need a global variable whose type came from another module. All these situations have
a common feature: in the interface part of your module, you use a name introduced by
another module.
In principle, there are no special problems in this case either: it is enough to place
the uses directive in the interface part (usually right after the word interface) or
at the very beginning of the module right after its header; the effect will be exactly the
same. It should only be taken into account that such a dependency, unlike a dependency
at the implementation level, gives rise to certain restrictions: the interface parts of two
or more modules cannot depend on each other crosswise or "in a circle".
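A sketch of such an interface-level dependency (both units are made up; they would
live in the files people.pp and crowd.pp):

unit people;
interface
type
    Person = record
        name: string;
        age: integer;
    end;
implementation
end.

unit crowd;
interface
uses people;  { the interface itself refers to the Person type }
type
    PersonArray = array [1..100] of Person;
implementation
end.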
In general, circular dependencies between modules should be avoided in any case,
but sometimes they are still necessary; at the very least, make sure that your modules
use each other's features only in their implementations, not in their interfaces.
Programmers try to avoid dependencies between interfaces even when they are not
circular, but this is not always possible.
2.14.3. Module as an architectural unit
When distributing program code among modules, you should keep several rules
in mind.
First of all, all features of one module must be logically related to each other.
When a program consists of two or three modules, we can still remember how the parts
of the program are distributed among them, even if the distribution follows no logic at
all. The situation changes dramatically when the number of modules reaches at least a
dozen; meanwhile, programs consisting of hundreds of modules are quite common,
and you can easily find programs with thousands and even tens of thousands of
modules. You can navigate such an ocean of code only if the program's implementation
is not just scattered among modules but is divided, according to some logic, into
subsystems, each consisting of one or several modules.
To check whether your division into modules is correct, ask a simple question
about each module (and about each subsystem consisting of several modules): what
exactly is this module (this subsystem) responsible for? The answer should consist of
a single phrase, as in the case of subprograms. If you cannot give such an answer, then
most likely your principle of division into modules needs correction. In particular, if a
module is responsible not for one task but for two unrelated ones, it is logical to
consider splitting this module in two.
There is one more point related to global identifiers. Pascal does not have separate
namespaces for global objects, so, to avoid possible name conflicts, all globally visible
identifiers belonging to one subsystem (a module or some logically united set of
modules) are often given a common prefix denoting that subsystem. For example, if
you create a module for working with complex numbers, it makes sense to start all its
exported identifiers with the word Complex - something like ComplexAddition,
ComplexMultiplication, ComplexRealPart, etc. This is not so relevant in
small programs, but in large projects name conflicts can become a serious problem.
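A hypothetical module following this convention might look like this:

unit complexm;
interface
type
    ComplexNumber = record
        re, im: real;
    end;

function ComplexAddition(a, b: ComplexNumber): ComplexNumber;
function ComplexRealPart(c: ComplexNumber): real;

implementation

function ComplexAddition(a, b: ComplexNumber): ComplexNumber;
var
    r: ComplexNumber;
begin
    r.re := a.re + b.re;
    r.im := a.im + b.im;
    ComplexAddition := r
end;

function ComplexRealPart(c: ComplexNumber): real;
begin
    ComplexRealPart := c.re
end;

end.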

2.14.4. Reducing module coupling


When implementing some modules, one constantly has to use facilities
implemented in others; in such cases it is said that the implementation of one module
depends on the existence of another, or that the modules are coupled with each other.
Experience shows that the weaker the coupling between modules, i.e. their dependence
on each other, the more useful, versatile and easy to modify these modules are.
Coupling between modules comes in different forms. In particular, if one module
uses the capabilities of another but the second does not depend on the first, we speak
of one-way dependence, whereas if each of the two modules is written assuming the
existence of the other, we have to speak of mutual dependence. Furthermore, if a
module only calls subroutines of another module, we speak of coupling by calls; if a
module refers to global variables of another module, we speak of coupling by
variables. We also distinguish coupling by data, when one and the same data structure
located in memory is used by two or more modules; such coupling can occur without
coupling by variables - for example, when one of the subroutines of a module returns
a pointer to a data structure belonging to that module.
Experience shows that one-way dependence is always better than mutual
dependence, and coupling by calls is always preferable to coupling by variables.
Particular caution is required with coupling by data, which often becomes a source of
unpleasant errors. In a program that is ideal from the point of view of division into
modules, all dependencies between modules are one-way, there are no global variables
at all, for each data structure placed in memory you can name its owner (a module that
is responsible, for example, for the timely destruction of this structure), and each data
structure is used only by its owner.
Practice makes some adjustments to these "ideal" requirements. The need for
mutually dependent modules arises quite often. Of course, it is always possible to
merge such modules into one, and in some cases this is exactly what should be done,
but not always. For example, when implementing a multiplayer game, we could put
communication with the user into one module and support for communication with the
other instances of our program, serving the other players, into another; almost
inevitably such modules will have to address each other, but since each of them is
responsible for its own (clearly formulated!) subtask, there is no need to merge them
into one module - the clarity of the program would gain nothing from that. One can
say that mutual dependence of modules should be avoided where possible, but its
appearance should not be seriously feared.
This is not the case with coupling by variables and by data. You can always do
without global variables, and, as we have already discussed (see §2.3.5), you should
try to do without them, because using them can prove too costly. As long as a global
variable is localized within a module, the situation can still be kept under control,
because a module is not the whole program. But if a global variable is visible to the
whole program (exported from its module), then, first, any change to it can potentially
disrupt the work of any of the program's subsystems, and, second, anyone can change
it, which means we must be prepared to look for the cause of any failure throughout
the entire program.
Coupling by variables has another negative effect: such modules are harder to
modify. Imagine that one of your modules must "remember" the coordinates of some
object in space, and that when creating it you decided to store the usual orthogonal
(Cartesian) coordinates. While the program evolves, you may find that it is more
convenient to store polar rather than Cartesian coordinates; if the modules
communicate with each other only by calling subroutines, such a modification will not
be a problem, but if the variables storing the coordinates are accessible from other
modules and actively used by them, you will most likely have to forget about the
modification: rewriting the whole program may be too complicated.
Besides, it often happens that the values of several variables are related to each
other, so that when one variable changes, the other(s) must change too; in such cases it
is said that the integrity of the state must be ensured. Coupling by global variables
deprives the module of any possibility to guarantee such integrity.
The worst situation is coupling on dynamic data structures. While we can always
do without global variables, and in most cases this is not too difficult, pointers to
dynamic variables usually have to be passed from one module to another, and there is
nothing we can do about that. And then, for example, one module may consider a
certain data structure no longer needed and delete it, while pointers to it remain in other
modules and continue to be used. Errors arising in such cases are almost impossible to
localize. Therefore, the "one-owner rule" should be strictly adhered to: every dynamic
data structure created must have an "owner", which can be a subsystem, a module or
even another ("main") data structure[204], and such an owner must be one and only
one.


During its existence, a dynamic data structure may change its owner: for example,
subroutines of one module may create a list or a tree that another module will then use.
Note that coupling does not even necessarily arise here: if, say, one module calls a
subroutine of another module to create a certain list, takes this list for itself and is
responsible for it from then on, while the module that created the list stores no
information about it, then there is no coupling, because at every moment only the
owner works with the list. In any case, the rule of the existence and uniqueness of the
owner must be strictly observed, regardless of the presence or absence of coupling. A
data structure may be used either by its owner or by someone the owner has called; in
the latter case the callee may not assume that the data structure will live longer than
until control returns to the owner, and must not remember any pointers to the structure
or its parts.
Note that the concept of the "owner" of a dynamic data structure is not supported
by programming language facilities and therefore exists only in the programmer's
mind. If the "ownership" relation is not quite obvious from the program text, be sure to
write appropriate comments, especially where the owner changes.
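Such a comment might look like this (the types and the function here are made up):

type
    ItemPtr = ^Item;
    Item = record
        data: longint;
        next: ItemPtr;
    end;

{ The node created here is handed over to the caller, who becomes
  its owner and must eventually dispose of it. }
function MakeItem(val: longint): ItemPtr;
var
    p: ItemPtr;
begin
    new(p);
    p^.data := val;
    p^.next := nil;
    MakeItem := p
end;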
The general approach to module coupling can be summarized in the following brief
rules:
• avoid mutual (bidirectional) dependencies between modules unless avoiding
them proves too difficult, but do not consider them completely unacceptable;
• avoid global variables for as long as possible; use them only if they save at
least several days of work (saving a few hours should not be considered a
reason for introducing global variables);
• avoid coupling by data, and where it cannot be avoided, rigorously enforce the
one-owner rule.

[204] In object-oriented programming languages, an object is also added to this list.
Part 3

Processor capabilities and assembly language
This part of our book is devoted to programming in the NASM assembly language
- as usual, in the Unix environment. Meanwhile, the overwhelming majority of
professional programmers, on hearing this, will only grin and ask a rhetorical question:
"Who writes in assembly language for Unix? This is the 21st century!" The most
interesting thing is that they will be absolutely right. In the modern world, assembly
programming has been displaced even from such a traditionally "assembly" area as the
programming of microcontrollers - small single-chip computers designed to be
embedded in all sorts of equipment, from washing machines and cell phones to
airplanes and power-plant turbines. In most cases, microcontroller firmware is now
written in C, with only small insertions in assembly language; the same is true of
operating system kernels and of other tasks that require tailoring to the capabilities of
a particular processor.
Of course, it is not possible to do without assembly fragments entirely. Separate
assembly modules, as well as assembly inserts in code written in other languages, are
present in operating system kernels and in the system libraries of the same C language
(and of other high-level languages); in special cases microcontroller programmers also
have to abandon C and write "in assembler" - for example, to save scarce memory.
Thus, the rather popular ATtiny4 microcontroller has only 16 bytes of RAM and
512 bytes of pseudo-permanent memory for storing program code; in such
circumstances there is practically no alternative to assembly language. However, such
cases are rare even in the world of microcontrollers, most of which offer the
programmer a far less harsh environment. Few of you now learning assembly
programming will ever have to use these skills in practice even once in your lifetime.
So why spend time learning assembly language if it is never going to be useful
anyway? It only looks that way at first glance; on closer examination, the ability to
think in terms of machine instructions turns out to be not just "useful" but vital for any
professional programmer, even one who never writes a line of assembly. Whatever
language you write your programs in, you need at least a rough idea of what exactly
the processor will do to fulfill your highest will. A programmer who has no such idea
ends up mindlessly applying all the available operations without knowing what he is
actually doing. Meanwhile, a single assignment in the familiar Pascal can be performed
in a billionth of a second, but it can also take a long time - if, for example, we decide
to assign large arrays of strings to one another. Things are even more interesting with
more complex programming languages: an assignment written in C++ can be executed
in a single machine instruction, or it can involve millions of instructions[1]. Two such
assignments are written in the program in exactly the same way (with an equals sign),
but this fact does not help us in any way: we cannot adequately estimate the resource
consumption of this or that operation without understanding how and what the
processor does. A programmer who has no experience of working at the level of
processor instructions simply does not know what he is actually doing; when inserting
operations into a high-level-language program, he often does not realize how complex
a task he is setting before the processor. As a result, we have huge programs
discouraging in their inefficiency - for example, office document automation
applications that feel "cramped" in four gigabytes of RAM and for which a processor
many orders of magnitude faster than the supercomputers of the eighties turns out to
be "too slow".
Experience shows that a professional computer user, be it a programmer or a system
administrator, may not know something, but can by no means afford not to
understand how a computer system is organized at all its levels, from electronic logic
circuits to cumbersome application programs. When we fail to understand something,
we leave room in our rear for a "feeling of magic": on some almost subconscious level
we keep thinking that something uncanny is going on in there and that it could not have
been done without a couple of wizards with magic wands. Such a feeling is categorically
inadmissible for a professional: on the contrary, a professional must understand (and
intuitively feel) that the device he is dealing with was created by people like himself,
and that there is nothing "magical" or "unknowable" about it.
If the goal is to achieve this level of understanding, it does not matter at all which
particular architecture and assembly language you study. Once you know one assembly
language, you can start writing in any other after spending two or three hours (or even
less) studying reference material; what is important is that, being able to think in terms
of machine instructions, you will always know what actually happens when your
programs are executed.

[1] For those who know C++, let us explain: think about what happens if you apply the
assignment operation to an object of type list<string> containing two or three thousand
elements.
In spite of all the above, the choice of a particular architecture needs explaining.
The material of the "assembly" part of our book is based on the instruction set of
processors of the x86 family, and we will use the 32-bit variant of this architecture, the
so-called i386 instruction set. At the time of writing, 32-bit computers of the x86 family
have been almost completely replaced by computers based on 64-bit processors[206],
but fortunately these processors can execute programs in 32-bit mode. We will not
study the 64-bit instruction system itself, and there is a certain reason for that: all
available descriptions of this system are built on the principle of enumerating its
differences from the 32-bit case, so it turns out that we need to study the 32-bit
instruction system first in any case. But having studied it, we will have already reached
our goal - we will have gained experience of working in assembly language; a further
transition to the 64-bit case is possible, but for our purposes somewhat excessive. Even
the 32-bit instruction system we will study far from in all its (nightmarish) splendor:
about a tenth of the capabilities of the processor under study will be quite enough for
us to write programs.
There is one more reason to study the 32-bit architecture. One of the main technical
inventions whose understanding should be taken away from assembly programming
is the so-called stack frame, used when interfacing subroutines written in high-level
languages. The x86_64 architecture doubled the number of general-purpose registers
available to a program, which is good in itself, but this number of registers is almost
always enough to pass all the parameters of any subroutine; the stack frame
degenerates, losing half its content - local data and the return address are still stored
on the stack, but parameter values are not. Note that the savings here are not so great:
calling another subroutine will require the same registers, so they still have to be saved
on the stack - just not in the parameter area, which no longer exists, but in the local
data area; something is really saved only in the case of subroutines that call no one
themselves. In any case, understanding how a stack frame is organized in its full version
is obligatory for a good programmer, and from this point of view the 32-bit architecture
turns out to be a better candidate for the role of a teaching aid.
It will be appropriate to say a few words about the choice of a particular
assembler. There are two main approaches to assembly language syntax for x86
processors: AT&T syntax and Intel syntax. The same processor instruction is written
quite differently in these two systems: for example, an instruction that in Intel syntax
looks like

mov eax, [a+edx]

in AT&T syntax would be written as follows:

movl a(%edx), %eax

[206] The author considers it appropriate to note that much of this text was prepared for printing
on an EEE PC 901; this netbook is equipped with a single-core 32-bit processor, which is
nevertheless sufficient to handle all the tasks the author encounters under normal circumstances.
AT&T syntax is traditionally more popular in the Unix environment, but for the task
at hand this creates certain problems. Textbooks oriented toward assembly
programming in Intel syntax do exist, while AT&T syntax is described only in special
(reference) technical literature not intended for teaching. In addition, one should take
into account the long dominance of MS DOS as a platform for similar courses; all this
makes Intel syntax much more familiar to teachers (and, oddly enough, to some
students as well) and better supported. There are two main assemblers available for
Unix that support Intel syntax: NASM ("Netwide Assembler"), developed by Simon
Tatham and Julian Hall, and FASM ("Flat Assembler"), created by Tomasz Grysztar.
It is difficult to make a clear choice between these assemblers. Our book deals with the
NASM assembly language, including its specific macro facilities; there is no deep
reason behind this choice - it is essentially random.

3.1. Introductory information


We are already familiar with the concept of assembler from the introductory
section (see §1.4.7); there we discussed the basic principles of computer design and
how the processor executes programs. Perhaps now is a good time to return to that
section and reread it; it would also be useful to refresh your memory of Chapter 1.4 as
a whole.
Before we start programming in the chosen assembly language, we will have to
dwell on some more subtle points, without which it will be difficult to understand what
is going on. In this chapter we will discuss the peculiarities of the environment in which
our programs will be executed, make a short excursion into the history of the processor
family used, and then, to get acquainted with the new tool, write a simple program and
make it work.

3.1.1. Classical principles of program execution
In the introductory section, we have already met the notion of a von Neumann
machine. Strictly speaking, the principles of computing machine design, known as von
Neumann architectural principles, assume that:
• The computer is based on a central processor (an electronic circuit that performs
calculations) and RAM (an electronic device that stores information and can
communicate directly with the central processor);
• RAM consists of memory cells; each cell is capable of storing ("remembering")
a number from a certain range (in particular, the vast majority of modern
computer architectures use cells of 8 binary digits; such a cell can store a
number from 0 to 255); all memory cells are identical in design and size and
differ only in their numbers - the so-called addresses (the principle of memory
linearity and homogeneity);
• the central processor can at any moment write a number from this range into
any of the cells, and can also read the contents of any cell, i.e. find out what
number is stored there (the principle of direct memory access);
• the CPU automatically performs one after the other the operations prescribed by
the program (program control principle);
• the program is stored in RAM cells in the form of machine instructions -
numbers representing code designations of operations to be performed (the
principle of a stored program);
• a memory cell itself "does not know" whether the number stored in it belongs
to the program code or is just data (the principle of indistinguishability of
instructions and data).
Often this list also includes the use of the binary number system, but this aspect can
hardly be considered defining - it is a different story altogether.
The indistinguishability of instructions and data makes it possible to treat programs
as data and to create programs whose input is other programs. Moreover, as late as the
mid-1990s, programs on some platforms could modify themselves during execution;
modern computing systems exclude this possibility or at least make it difficult.
The CPU contains electronic circuits for storing information, similar to memory
cells; they are called registers. A distinction is usually made between general-purpose
registers and service registers. General-purpose registers are designed for short-term
placement of the data being processed; operations with them are performed orders of
magnitude faster than with memory cells, but the total volume of the registers is
millions and, under modern conditions, billions of times smaller than that of memory,
so the registers usually hold the initial data and intermediate results of the calculations
being performed right now. When a particular calculation is finished, the data is
transferred to RAM to free the registers for other purposes.
Service registers contain information that the processor itself needs to organize
program execution. The most important of the service registers is the instruction
pointer, sometimes also called the instruction counter: this register contains the
address of the memory cell from which the processor will fetch the code of its next
action.
The CPU works by endlessly repeating an instruction processing cycle consisting
of three steps:
• fetch the code of the next machine instruction from memory, starting from the
cell whose address is currently in the instruction pointer;
• increase the value of the instruction pointer by the length of the fetched
code[207], after which the register contains the address of the instruction
following the current one;
• decode the instruction code fetched from memory and perform the action
corresponding to this code.

[207] In particular, the instruction codes of the processor we are going to study can occupy
from 1 to 15 cells.
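The cycle is easy to express as a sketch in Pascal-like pseudocode (all the names
here are made up; real processors, of course, implement this in hardware, not in code):

while true do
begin
    code := FetchFromMemory(ip);   { step 1: read the instruction code }
    ip := ip + CodeLength(code);   { step 2: advance the pointer }
    Execute(code)                  { step 3: perform the action;
                                     a jump may overwrite ip here }
end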
For some reason, many students are stumped at an exam by the question of how the
CPU knows which machine instruction (out of the millions present in memory) must
be executed right now; the correct answer is quite trivial: the address of the required
machine instruction is in the instruction pointer. The processor in no way tries to
penetrate the logic of the program being executed or to grasp what the program's results
as a whole should be; it merely follows its once-and-for-all established cycle: read the
code, increment the instruction pointer, execute the instruction, start over. The
automatic incrementing of the address in the instruction pointer causes the machine
instructions that make up the program to be executed one after another, in the order in
which they are written in the program (and located in memory).
When the instruction pointer contains the address of a particular location in RAM,
it is said that control is located at this memory location (or, which is the same thing, at
this part of the program). The logic of the term is that the processor's actions are
subordinated to machine instructions (controlled by them), and the processor fetches
the next instruction from memory at the address taken from the instruction pointer.
To organize the familiar branching, loops and subroutine calls, machine
instructions are used that forcibly change the contents of the instruction pointer;
as a result, the instruction sequence is broken and program execution continues from
another place - from the instruction whose address was stored into the register. This
is called a jump or a transfer of control (to another part of the machine code). Note
that the instruction execution cycle discussed above first increments the instruction
pointer and only then executes the instruction, so an instruction that performs a transfer
of control writes the new address into the instruction pointer over the address already
there, computed during the automatic increment.
We have already discussed jumps in the introductory part of the book (see p. 67).
It was also said there that control transfer instructions are unconditional and
conditional: the former simply put a given address into the instruction pointer, while
the latter first check whether a condition is met and, if it is not, do nothing, so that no
jump takes place and execution continues as usual with the next instruction. It is
conditional jumps that make it possible to organize branching, as well as loops whose
duration depends on conditions; unconditional jump instructions play a rather
auxiliary, though very important, role.
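Looking slightly ahead to the NASM notation studied later, a conditional and an
unconditional jump might appear together like this:

        cmp  eax, 10       ; compare the EAX register with 10
        jl   smaller       ; conditional jump: taken only if EAX < 10
        jmp  done          ; unconditional jump: always taken
smaller:
        ; ...this code executes only when EAX < 10...
done: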
Most processors also support a jump with return-address saving, used to call
subroutines. When such a jump is performed, the address in the instruction pointer is
first stored somewhere in memory (as we will see later, the hardware stack is used for
this purpose), and only then replaced with a new address - usually the address of the
beginning of the machine code of a procedure or function. Since by the time an
instruction is executed (in this case, the jump with saved return) the instruction pointer
already contains the address of the next instruction, it is exactly the address of the next
instruction that gets saved. When the subroutine finishes its work, it returns control,
i.e. places into the instruction pointer the value saved at the call, so that execution of
the calling part of the program continues from the instruction following the jump-with-
return instruction. Such an instruction is usually called a return instruction.
Let us repeat once again: a transfer of control is a forced change of the address
located in the instruction pointer register (instruction counter). This is worth
remembering.

3.1.2. Peculiarities of programming under multitasking operating systems
Since we are going to run our programs under the Unix operating system, it is
appropriate to describe in advance some peculiarities of such systems from the point
of view of the programs they execute; these peculiarities are not specific to Unix and
do not depend on the programming language used, but they become especially
noticeable when working in assembly language.
Almost all modern operating systems allow several programs to be run and
executed simultaneously. This mode of operation of a computer system, called
multitasking[208], creates certain problems that require support from the hardware -
first of all, from the central processor.

First, running programs must be protected from each other, and the operating
system itself from user programs. If one of the running tasks (even by mistake rather
than by malicious intent) changes something in the memory belonging to another task,
this will most likely crash that second task, and finding the cause of such a crash will
be fundamentally impossible. If a user task (again, by mistake) makes changes in the
operating system's memory, the whole system will crash - again, without the slightest
possibility of understanding the reasons. Therefore, the CPU must support a memory
protection mechanism: each running task is allocated a certain memory area, and the
task cannot access cells outside this area.
Second, in multitasking mode user tasks are generally not allowed to work with
external devices directly[209]. If this rule were not enforced, tasks would constantly
conflict over access to devices, and such conflicts would, of course, lead to crashes. To
limit what a user task can do, the creators of CPUs declared some of the available
machine instructions privileged. The processor can operate either in privileged mode,
also called supervisor mode, or in restricted mode (also known as task mode or user
mode)[210]. In restricted mode privileged instructions are unavailable; in privileged
mode the processor can execute all available instructions, both ordinary and privileged.
The operating system, of course, executes in privileged mode and switches the
processor to restricted mode when control is transferred to a user task. The processor
can return to privileged mode only together with the return of control to the operating
system; this precludes the execution of user program code in privileged mode.
Privileged instructions include those that interact with external devices; this category
also includes the instructions used to configure the memory protection mechanisms
and some other instructions affecting the operation of the system as a whole. All such
"global" actions are considered the prerogative of the operating system. When
working under a multitasking operating system, a user task is only allowed to
transform information within its allocated area of RAM. All interaction with the
outside world is performed by the task through calls to the operating system. A
task cannot even simply display a string on the screen by itself - it has to ask the
operating system to do so. Such an appeal of a user task to the operating system for
some service is called a system call. It is interesting that only the operating system is
able to terminate a task; this becomes obvious if we remember that the task itself is an
object of the operating system: it is the operating system that loads the program code
into memory, allocates memory for data, sets up protection, starts the task and provides
it with processor time; when the task terminates, its memory must be marked as free,
processor time must no longer be allocated to it, and so on - and only the operating
system can do all this, of course. Thus, a correct user task cannot do without system
calls: it needs to call the operating system even simply in order to terminate.

[208] The term "task" is, strictly speaking, rather complex, but in simplified terms a task can be
understood as a program started for execution under the control of an operating system; in other
words, when a program is started on the system, a task is created.
[209] There are exceptions to this rule, such as displaying graphical information on the screen,
but in that case the device must be assigned to one user task and be strictly unavailable to the others.
[210] In fact, the i386 processor and its descendants have not two but four modes, also called
protection rings, but in reality operating systems use only ring zero (the highest privilege level)
and ring three (the lowest).
One more important point to mention before starting to study a particular
processor is the presence of a virtual memory mechanism in our runtime environment.
Let's try to understand what it is. As already mentioned, RAM is divided into cells of
equal capacity (in our case each cell holds 8 bits of data), and each such cell has its
own serial number. It is this number that the CPU transmits over the common bus to
distinguish one memory cell from another when working with memory. Let's call this
number the physical address of the memory cell. Initially, memory cells had no
addresses other than physical ones; physical addresses were used in the machine code
of programs and were called simply "addresses", without the qualifying word
"physical". With the development of multitasking it turned out that for a number of
reasons the use of physical addresses is inconvenient. For example, a program whose
machine code uses physical memory addresses cannot work when placed in a different
memory area - and in a multitasking situation it may well turn out that the area it needs
is already occupied by another task. There are other reasons too, which we will return
to in Volume 2 when we examine the principles of operating systems.
Modern processors use two kinds of addresses. The processor itself works with
memory using the physical addresses we already know, but programs running on the
processor use quite different addresses - virtual ones. A virtual address is a number
from some abstract virtual address space. On i386 processors, virtual addresses are
32-bit integers, i.e. the virtual address space is the set of integers from 0 to 2^32 - 1;
addresses are usually written in hexadecimal, so an address is a number from
00000000 to ffffffff. It is important to realize that a virtual address does not
have to correspond to any memory location at all. To be more precise, some virtual
addresses correspond to physical memory cells, others do not, and the correspondence
can change over time. These correspondences are set up by configuring the CPU,
which is the responsibility of the operating system.
The central processor, having received a virtual address from the next machine
instruction, converts it into a physical address, which it then uses to access RAM.
Thus, the addresses we use in programs are virtual (abstract) addresses rather than
physical cell numbers; the processor itself converts them into real cell numbers. The
device inside the processor that converts virtual addresses into physical ones is called
the memory management unit (MMU).
The presence of an MMU in the processor allows, in particular, each program to
have its own address space: indeed, nothing prevents the operating system from setting
up address translation so that the same virtual address is mapped to one physical cell
in one user task and to a completely different one in another.
It is not in our plans to study all the features of the i386 processor in detail; we will
confine ourselves to the instructions available to a user task running in restricted mode.
Moreover, we will not consider even all of the restricted-mode capabilities. In
particular, operating systems of the Unix family execute user tasks in the so-called flat
memory model, in which some registers and some kinds of machine instructions are
not used; we will not waste time studying those registers and instructions, because we
would not be able to use them anyway. Later we will consider in detail the mechanisms
of interaction with the operating system, including how a system call is made on Linux
and FreeBSD systems; until we know these mechanisms, however, we will perform
input/output and program termination with the help of ready-made macros - special
identifiers that our assembler will expand into whole sequences of machine instructions
and translate in that form. Of course, in time we will learn not only to do without these
macros but also to create such macros ourselves; we simply have to start somewhere.
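The concrete macros we will use are introduced later; as a foretaste, here is a trivial
made-up NASM macro, just to show the idea of an identifier expanding into a sequence
of instructions:

%macro DOUBLE_EAX 0        ; a made-up macro taking no parameters
        add eax, eax
%endmacro

        mov eax, 21
        DOUBLE_EAX         ; the assembler substitutes add eax, eax here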

3.1.3. History of the i386 platform


In 1971, Intel Corporation released a family of chips called MCS-4. One of these
chips, the Intel 4004, already mentioned in the introduction, was the world's first
complete central processing unit on a single chip - in other words, the first
microprocessor in history, at least among those available to the general public. The
machine word[211] of this processor was four bits long. A year later, Intel released the
Intel 8008, an eight-bit processor, followed by the more advanced Intel 8080 in 1974.
Interestingly, the 8080 used different operation codes, yet programs written in assembly
language for the 8008 could be translated for the 8080 without changes. Intel designers
maintained similar "source code compatibility" for the 16-bit Intel 8086 processor that
appeared in 1978. The Intel 8088, released a year later, was practically the same device,
differing only in the width of the external data bus (8 bits for the 8088, 16 bits for the
8086). It was the 8088 processor that was used in the IBM PC, the computer that gave
rise to the numerous and incredibly popular family of machines[212] still called
IBM PC-compatible or simply IBM-compatible.

[211] Recall that a machine word is the portion of information processed by a processor in one go.
[212] The popularity of IBM-compatible machines is a rather controversial phenomenon; many
other architectures of substantially better design could not survive in a market flooded with IBM-
compatible computers, which were cheaper because of their mass production. Anyway, this is the
situation now, and no significant changes are yet foreseen.
The 8086 and 8088 processors did not support memory protection and made no
distinction between ordinary and privileged instructions, so it was impossible to run a
full multitasking operating system on computers with these processors[213]. The same
was true of the 80186 processor released in 1982. Compared to its predecessors, this
processor was much faster: some operations performed by microcode in previous
processors were now implemented in hardware, and the clock frequency also increased.
The processor included some subsystems that previously required additional chips,
such as an interrupt controller and a direct memory access controller. In addition, the
instruction system was extended with new instructions; for example, it became possible
to push all general-purpose registers onto the stack with a single instruction. The
address bus of the 8086, 8088 and 80186 processors was 20 bits wide, which allowed
addressing no more than 1 MB of RAM (2^20 cells).
The same year, 1982, saw the 80286, the last 16-bit processor in the series. This
processor supported the so-called protected mode of operation and a segmented virtual
memory model, which implied, among other features, memory protection; four
protection rings made it possible to prohibit user tasks from performing actions
affecting the system as a whole, which is necessary for running a multitasking
operating system. The address bus received four additional bits, increasing the
maximum amount of directly addressable memory to 16 MB.

[213] Multitasking systems for these machines are known, but without memory protection - that
is, in conditions where any running program can do literally anything to the whole system - their
practical applicability remained extremely doubtful.
True multitasking operating systems appeared only for the next processor in the
line, the 32-bit Intel 80386, referred to simply as "i386" for short. This processor,
whose mass production began in 1986, differed from its predecessors in registers
extended to 32 bits, a significantly expanded instruction system, and an address bus
widened to 32 bits, which allowed up to 4 GB of physical memory to be addressed
directly. The addition of support for page-based virtual memory, best suited for
multitasking, completed the picture. It was with the appearance of the i386 that the
so-called IBM-compatible computers finally became full-fledged computing systems.
At the same time, the i386 fully preserved compatibility with the previous processors
of its series, which explains its register system, rather strange at first sight. For
example, the general-purpose registers of the 8086-80286 processors were called AX,
BX, CX and DX and held 16 bits of data each; the i386 and later processors of the line
have registers of 32 bits each, called EAX, EBX, ECX and EDX (the letter E stands
for "extended"). The lower 16 bits of each of these registers retain their old names
(AX, BX, CX and DX, respectively) and are still accessible without their "extended"
parts.
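In NASM notation (introduced in the next chapter), this relationship between the
registers can be illustrated as follows:

        mov eax, 0x12345678   ; store a value in the whole 32-bit register
        mov ax, 0             ; change only the lower 16 bits;
                              ; EAX now contains 0x12340000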
Further development of the x86 processor family up to 2003 was purely
quantitative: speed was increased, new commands were added, but there were no
fundamental changes in the architecture. In 2001 the alliance of Hewllett Packard and

no significant changes are foreseen yet.


213
Multitasking systems for these machines are known, but without memory protection, that is, in
conditions when any of the running programs can do literally anything with the whole system, the
practical applicability of such systems remained extremely doubtful.
§ 3.1. Introductory information 529
Intel released the Itanium (Merced) processor, whose architecture, called
IA-64, was incompatible with x86 but included emulation of the execution of commands
of the i386 architecture; the emulation turned out to be too slow for practical use, and
nobody was in a hurry to create programs and operating systems for the new
architecture. In 2003 AMD presented a new processor, Opteron, whose
architecture was a 64-bit extension of the x86 architecture, just as the 32-bit i386 had been
an extension of the original 16-bit 8086 architecture. The new instruction system
was called "x86_64". The appearance of Opteron finally killed the Itanium architecture,
which never managed to gain serious distribution, though processors of this architecture
are still produced. Intel itself released the Xeon processor with x86_64
architecture in 2004; since then no significant changes in the architecture are foreseen.
The "multi-core" architectures that followed are nothing more than quantitative
development, and in a direction that does little to increase real system performance.
The point is that even highly loaded server machines are mostly limited in their performance
not by the speed of the processor, and especially not by competition between programs for
a single processor, but rather by the speed of disk exchanges and bus operation; multiple
cores can affect neither of these. As a rule, all but one of the
cores in a system are idle most of the time.
Processors of the x86_64 architecture can execute 32-bit programs, which makes
migration much easier. In particular, the computer you are going to program on is most
likely a 64-bit computer; it may have a 32-bit or a 64-bit operating system installed. Your
own assembly language programs will be 32-bit, but this will in no way prevent you from
executing them.

3.1.4. Getting to know the tool


In order to write assembly language programs, we must study, first, the processor
we will be working with (if not all of its capabilities, then at least some essential part
of them), and, second, the syntax of the assembly language. Unfortunately, there is a
certain problem here: it is impossible to study these two things at the same time, yet it is
a thankless task to study the processor's instruction system without having any idea of
assembly language syntax, or to study the syntax without having any idea of the
instruction system, so no matter where we start, the result will be somewhat strange. We
will try to go another way: first we will get some idea of both the instruction system and
assembly language syntax, even if this idea is very superficial, and then we will begin a
systematic study of both.
Now we will write a working program in assembly language, translate it and run it.
At first, not everything in the program text will be clear; some things we will explain
right away, and some we will leave until a more appropriate time. The task we choose
for ourselves is very simple: to print (i.e. to display on the screen, or, strictly speaking, to
write to the standard output stream) the word "Hello" five times. As we discussed on
page 529, we will need to contact the operating system to print the string to the screen,
as well as to terminate the program correctly, but we will use ready-made macros for
this purpose, which are described in a separate file. The assembler, having read this file
and our instructions, converts each use of such a macro into a fragment of assembly
language code and translates these fragments itself.
When we introduced the concept of assembly language in §1.4.7, we noted that
machine commands in assembly language are denoted by short words, called
mnemonics, that are easy to remember. In addition to mnemonics, assembly language
programs also contain directives, i.e. direct orders to the assembler, and macro calls,
which use the capabilities of macros. Our first program will not have that many
mnemonics; directives and macro calls will occupy more space in it. So let's write:

%include "stud_io.inc"

        global  _start

        section .text
_start: mov     eax, 0
again:  PRINT   "Hello"
        PUTCHAR 10
        inc     eax
        cmp     eax, 5
        jl      again
        FINISH

Let's now try to explain a few things. The first line of the program contains the
%include directive, which instructs the assembler to insert in place of the directive
itself the entire contents of a certain file, in this case the file stud_io.inc. This file
is also written in assembly language and contains descriptions of the PRINT, PUTCHAR
and FINISH macros, which we will use, respectively, to print a string, to move to the
next line on the screen and to terminate the program. On seeing the
%include directive, the assembler will read the file with the macro descriptions,
allowing us to use them.
It is important to note that the %include directive must appear in the program
text before the macro names are encountered. The assembler looks through the text from
top to bottom. Initially it knows nothing about the macros and will not be able to process
them unless it has been informed about them. Having looked through the file containing
the macro descriptions, the assembler remembers these descriptions and continues to
remember them until the translation is complete, so we can use them in the program -
but not before the assembler knows about them. That is why we put the %include
directive at the very beginning of the program: now the macros can be used throughout
the program text.

After the %include directive we see a line with the word global; this is
also a directive, and we will return to it a little later.
The next line of the program contains the section directive; let us try to explain
its meaning. The Unix executable file is designed in such a way that it stores machine
commands in one place, and initialized data (i.e., data that is given an initial value
directly in the program) in another place, and finally, the third place contains
information about how much memory the program will need for uninitialized data. The
corresponding parts of the executable file are called sections. When loading the
executable file into memory, the operating system creates separate memory areas (so-
called segments) for machine code (taking our section containing machine code as a
basis), for data (here initialized and uninitialized data are combined; in general, a
segment may consist of several sections), and for the stack (no sections correspond to
this segment).
Based on our program text, the assembler generates separate images (i.e., future
memory contents) for each of the sections; we must place our executable code in one
section, descriptions of memory areas with a given initial value in another section, and
descriptions of memory areas without initial values in a third section. The
corresponding sections are called .text, .data, and .bss. The stack section is
formed by the operating system without our participation, so it is not mentioned in
assembly language programs. In our simple program, we only need the .text
section; the directive under consideration tells the assembler to start forming this
section. In the future, when we consider more complex programs, we will have to deal
with all three sections.
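To preview what a program using all three sections might look like, here is a hedged
sketch; the db and resb data-description directives used in it will only be introduced
in §3.2.3, the label names are arbitrary, and the code shown does nothing useful:

        section .data
msg     db      "Hi"            ; initialized data: stored in the executable file

        section .bss
buf     resb    64              ; uninitialized data: only the size is recorded

        section .text
        global  _start
_start: mov     eax, 0          ; the program's machine commands go here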
Next in the program we see the line
_start: mov eax, 0

The word mov refers to an instruction that causes the processor to send some data
from one location to another. The instruction is followed by two parameters called
operands; for the mov instruction, the first operand specifies where the
data should be copied to, and the second operand specifies what data should be copied
there. In this particular case, the command requires the number 0 (zero) to be written
to the EAX register.[214] The value stored in the EAX register will be used as a loop
counter, that is, it will indicate how many times we have already printed the word
"Hello"; clearly, at the beginning of the program execution this counter should be zero,
since we have not printed anything yet.

[214] The reader already experienced in assembly language programming may notice that it is "more
correct" to do this with a completely different command: xor eax, eax, which achieves the same
effect faster and with less memory consumption; however, for a simple tutorial example, such a trick
would require too long an explanation. We will come back to this issue later and will certainly
consider this and other similar tricks.
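To illustrate the footnote's point, here is a hedged comparison of the two encodings;
the byte sequences given in the comments are those generated for 32-bit code:

        mov     eax, 0          ; 5 bytes: b8 00 00 00 00
        xor     eax, eax        ; 2 bytes: 31 c0 (any value xor'ed with itself is zero)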
So, the line in question means telling the processor to put a zero in the EAX; but
what's the mysterious "_start:" at the beginning of the line?
The word _start (the underscore in this case is part of the word) is a so-called
label. Let's first try to explain what these labels are "in general", and then we'll tell you
why we need a label in this particular case.
The mov eax,0 command is converted into machine code by the assembler
(see §§1.1.3 and 1.4.7). Note for clarity that the machine code of this command consists
of five bytes: b8 00 00 00 00, the first of which specifies the actual
action "place a given number in a register" and also the number of the EAX register.
The other four bytes (taken together) specify the number to be placed in the register; in
this case it is the number 0. During the execution of the program, this code will be
located in some area of RAM (in this case, in five consecutive cells). In some cases,
we need to know what address a particular memory location will have; in the case of
commands, we may need the address, for example, to force the processor to transfer
control to this location in the program (i.e., to make a conditional or unconditional jump
here).
Of course, RAM can store not only commands but also data. Memory areas intended
for data are usually called variables and are named in much the same way as in
high-level programming languages, including Pascal. Naturally, we need to know what
address the beginning of the memory area allocated for a variable has. The address,
as we have already mentioned, is written with eight hexadecimal digits,[215] for example,
18b4a0f0. It is inconvenient to memorize such numbers; besides, at the moment

of writing a program we don't know where this or that command or variable will be
placed in memory. This is where labels come to our aid. A label is a word (identifier)
entered by the programmer, with which the assembler associates some number,
most often an address in memory, but not always; a label may denote just a number
not related to addresses. In this case, _start is exactly the label associated with the
memory address. If the assembler sees a label before an instruction (or, as we will see
later, before a directive allocating memory for a variable), it takes this as an instruction
to enter a new label into its internal tables and associate the corresponding address with
it; if the label is found among an instruction's parameters, the assembler recalls what
address (or just a number) is associated with this label and substitutes this address
(number) for the label in the instruction. Thus, the _start label in our
program will be associated with the address of the cell starting from which the machine
code corresponding to the mov eax,0 command (the code b8 00 00 00 00) will be
placed in RAM.

[215] At least for the processor and system we are considering.
It is important to understand that labels exist in the assembler's memory during
program translation, and some continue to exist while the link editor runs, but a ready-to-
execute program represented in machine code contains no labels, only the
addresses substituted for them.
The label in the line under discussion is followed by a colon character.
Interestingly, we could have left it out. Some assemblers distinguish labels with colons
from labels without them, but our NASM is not one of those; here it is up to us whether
to put a colon after a label or not. Usually programmers put colons after labels that mark
machine commands (i.e., labels to which control can be passed), but do not put colons
after labels that mark data in memory (variables). Since the _start label marks a
command, we decided to put a colon after it.
The attentive reader may notice that no jumps to the _start label are made
in our program. Why is it needed then? The point is that the word "_start" is a
special label that marks the program entry point, i.e. the place in the program to which
the operating system should transfer control after loading the program into RAM; in
other words, the _start label denotes the place where program execution will
start.
Let's return to the program text and consider the following line:

again: PRINT "Hello"

As you can easily guess, the word again at the beginning of the line is another
label. For the word Hello to be printed five times, we will have to return to this point
in the program four more times; hence the name of the label. The word PRINT is the
name of a macro, and the string "Hello" is the macro's parameter. The macro itself is
described, as already mentioned, in the file stud_io.inc. When the assembler sees
the macro name and parameter, it will substitute for them a number of commands and
directives, the execution of which will eventually result in the string "Hello" being
displayed on the screen.
It is very important to realize that PRINT has nothing to do with the
capabilities of the processor. We have already mentioned this fact several times, but
we will nevertheless repeat it: PRINT is not the name of any processor command;
the processor cannot print anything. The program line we are considering is not a
command but a directive, also called a macro call. Obeying this directive, the assembler
will form a fragment of assembly language text (note for clarity that in this case the
fragment will consist of 23 lines under Linux and 15 lines under FreeBSD) and will
translate this fragment itself, obtaining a sequence of machine instructions. These
instructions will contain, among other things, an appeal to the operating system for the
service of data output (the write system call). The set of macros, including the PRINT
macro, is introduced for convenience at first, while we do not yet know how to access
the operating system. Later we will learn this, and then the macros described in
stud_io.inc will no longer be needed; moreover, we will learn to create such macros
ourselves.
Let's go back to the text of our example. The next line looks like

PUTCHAR 10

This is also a call to a macro, one called PUTCHAR, which is designed to print a
single character. In this case, we use it to print the character with code 10; as we already
know, this is the line feed character, which means that when this character is printed,
the cursor on the screen will move to the next line. Note that this and the following lines
contain only commands and macro calls, but no labels. We do not need them, because
we are not going to make jumps to any of the following commands and, therefore,
do not need information about the addresses in memory where these commands will
be located.
The next line in the program is as follows:

inc eax

Here we see the machine command inc, an order to increment the given
register by 1. In this case, the EAX register is incremented. Recall that in the EAX
register we have agreed to store information about how many times the word "Hello"
has already been printed. Since the execution of the previous two lines of the program,
containing calls to the PRINT and PUTCHAR macros, eventually led to printing
the word "Hello", we should reflect this fact in the register, which is what we do.
Curiously, the machine code of this command is very short - only one byte
(hexadecimal 40, decimal 64).
Next in our program is the compare command:

cmp eax, 5

The machine command for comparing two integers is denoted by the mnemonic cmp,
from the English "to compare". In this case, the contents of the EAX register
are compared with the number 5. The results of the comparison are written to a
special processor register called the flag register. This allows, in particular,
a conditional jump to be made depending on the results of the previous comparison,
which is what we do in the next line of the program:

jl again

Here jl (jump if less) is the mnemonic of a conditional jump machine command,
which is executed if the previous comparison yielded the result "the first operand is less
than the second", i.e. in our case, if the number in the EAX register is less than 5. In
terms of our task, this means that the word "Hello" has been printed fewer than
five times, so we need to continue printing it, for which we jump (transfer
control) to the command marked with the again label.
If the result of the comparison was anything other than "less than", the jl
instruction will do nothing, and the processor will move on to the next
instruction in the sequence. This will happen when the word "Hello" has already been printed
five times, i.e. just when it is time to end the loop. After the loop ends, our initial task
is solved, so it is time to terminate the program as well. This is the purpose of the next
line of the program:

FINISH

The word FINISH denotes, as already noted, a macro that unfolds into a sequence
of commands executing a request to the operating system to terminate the execution
of our program.
We only need to go back to the beginning of the program and consider the line

global _start

The word global is a directive that requires the assembler to consider a certain label
"global", that is, visible from the outside (strictly speaking, visible from outside
the object module; we will consider this concept later). In this case, the _start label
is declared global. As we already know, this is a special label that marks the entry point
into the program, i.e. the place in the program to which the operating system should
transfer control after the program is loaded into RAM. Clearly this label must
be visible from the outside, which is what the global directive achieves.
So, our program consists of three parts: the preparation, a loop whose beginning
is marked again, and the final part, which consists of the single line FINISH. Before
starting the loop, we put the number 0 in the EAX register; then at each iteration
of the loop we print the word "Hello", output a line feed, increment the EAX register by
one and compare it with the number 5; if the EAX register still contains a number
less than five, we go back to the beginning of the loop (i.e. to the label again),
otherwise we exit the loop and terminate the program execution.
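Incidentally, the same loop could just as well count downward. The following hedged
variant uses two commands we have not discussed yet: dec, which decrements a register
by one (setting the zero flag when the result is zero), and jnz, a conditional jump taken
while the result of the previous operation is nonzero:

        mov     ecx, 5          ; five words remain to be printed
again:  PRINT   "Hello"
        PUTCHAR 10
        dec     ecx             ; one printing done
        jnz     again           ; repeat while ecx has not reached zero
        FINISH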
To try this program, as they say, in practice, you need to arm yourself with a
text editor, type the program in and save it in a file with a name ending in ".asm" -
this is how files containing assembly language source code are usually named.
Suppose we have saved the program text in the file hello5.asm. To get an
executable file, we need to perform two actions. The first is to run the NASM
assembler, which will build an object module from the source text we have given it. An
object module is not yet an executable file. As we already know from the Pascal part,
large programs usually consist of a whole set of source files called modules, plus we
may want to use third-party routines organized into libraries. Each
module is compiled separately, resulting in an object file; its contents are so-called
object code, which requires one more stage of processing to turn into machine code
ready for execution by the processor. To produce an executable file, all the object files
obtained from the modules must be linked together, libraries must be attached to them,
and all references (addresses) must be finalized, thus turning the object code into
machine code; this is done by the linker, also called a link editor.
Our program consists of only one module and does not need any libraries, but this
does not exclude the assembly (linking) stage. This is the second action needed to build
the executable: run the linker to build the executable file from the object file. It is at this
stage that the _start label will be used; we can now state more precisely that the
global directive does not just make the label "visible from the outside" but causes the
assembler to place information about the label into the object file, where it is visible to
the linker.
So, first, we call the NASM assembler:

nasm -f elf hello5.asm

The "-f elf" option tells the assembler to produce an object file in ELF format
(executable and linkable format), which is the format used in our system for executable
files.[216] The result of running the assembler will be the file hello5.o containing the
object module. Now we can run the linker, which is called ld:
ld hello5.o -o hello5

If you are working with a 64-bit operating system, which is most likely the case these days,
you will have to add one more option so that the linker builds a 32-bit executable; in
particular, for GNU ld under Linux it looks like this:

ld -m elf_i386 hello5.o -o hello5

and when working under FreeBSD, like this:

ld -m elf_i386_fbsd hello5.o -o hello5

[216] This is true at least for modern versions of the Linux and FreeBSD operating systems. Other
systems may require a different format for object and executable files.

To find out what hardware platform you are dealing with, you can use the command "uname
-a". This command will produce one rather long line of text, near the end of which you will
find the hardware architecture designation: i386, i586, i686 and the like indicate 32-bit
processors, while x86_64, amd64, etc. indicate 64-bit processors. It may also turn out
that your computer does not belong to the i386 family at all, which may be indicated by
something like armv6l (this, for example, is the designation of the Raspberry Pi
architecture); in that case it has a completely different instruction system and most likely
no NASM assembler at all. There is nothing you can do about it; you will have to find
another computer.
With the -o option (from the word output) we have set the name of the
executable file (hello5, this time without a suffix). Let's run it with the command
"./hello5". If we haven't made any mistakes, we will see five lines of Hello.

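On a 64-bit Linux machine the whole session might look something like this (the $
here stands for the shell prompt):

$ nasm -f elf hello5.asm
$ ld -m elf_i386 hello5.o -o hello5
$ ./hello5
Hello
Hello
Hello
Hello
Hello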
3.1.5. Macros from the stud_io.inc file


The macros described in the stud_io.inc file will be needed many times in
what follows, so, in order not to return to them again, we will describe their capabilities
here once and for all. The file itself can be found in the archive of examples
accompanying our book. Under Linux, you can use the file as it is in the archive.
To work under FreeBSD, you will have to modify the file slightly. To do this, open the file
in the text editor you use for programming, and find the following two lines at the very beginning
of the file (after the comment annotation):

%define STUD_IO_LINUX
;%define STUD_IO_FREEBSD

The semicolon character here denotes a comment, i.e. the assembler sees the first of these
two lines and ignores the second one as a comment. To adapt the file for FreeBSD, we
need to disable the first line and enable the second. To do this, we remove the semicolon
at the beginning of the second line and put one at the beginning of the first line. The result
looks like this:

;%define STUD_IO_LINUX
%define STUD_IO_FREEBSD
After this edit, your stud_io.inc file is ready to run under FreeBSD.
In the program we described in the previous paragraph, we used the PRINT,
PUTCHAR and FINISH macros. In addition to these three macros, our
stud_io.inc file also supports the GETCHAR macro, so there are four
macros in total.
The PRINT macro is designed to print a string; its argument must be a string in
apostrophes or double quotes, it cannot print anything else.
The PUTCHAR macro is designed to print a single character. As an argument it
accepts the character code written as a number or as the character itself taken in quotes
or apostrophes; you can also use a single-byte register as an argument to this macro -
AL, AH, BL, BH, CL, CH, DL or DH. You cannot use other registers as an
argument to PUTCHAR! In addition, the argument of this macro can be an effective
address enclosed in square brackets - then the character code will be taken from the
memory location at this address.
The GETCHAR macro reads a character from the standard input stream (from
the keyboard). The code of the character read is placed in the EAX register; since a
character code always fits in one byte, it can be retrieved from the AL register, and the
remaining bits of EAX will be zero. If there are no more characters (the familiar
end-of-file situation has been reached - remember, in Unix it can be simulated by
pressing Ctrl-D), the value -1 is written to EAX (hexadecimal
FFFFFFFF, that is, all 32 bits of the register equal one). This macro does
not accept any parameters.
The FINISH macro terminates program execution. This macro can be called
without parameters, or it can be called with a single numeric parameter that specifies
the termination code (see page 310).
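As a small exercise with these macros, here is a hedged sketch of a program that copies
its standard input to its standard output character by character; it relies only on the
properties just described (GETCHAR leaving -1 in EAX at end of file, PUTCHAR
accepting the AL register) and on two commands not yet discussed: je, a conditional
jump taken when the compared values are equal, and jmp, an unconditional jump:

%include "stud_io.inc"

        global  _start

        section .text
_start: GETCHAR                 ; the next character code goes into EAX
        cmp     eax, -1
        je      quit            ; -1 means end of file, so stop
        PUTCHAR al              ; print the character just read
        jmp     _start          ; and go get another one
quit:   FINISH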

3.1.6. Rules of assembly program design


When studying the Pascal language, we paid much attention to the rules of program
text formatting. There are rules for assembly language programs too, and they are quite
different, as the reader may already have noticed from the text of the example above.
The peculiar style used when working in assembly language is due to two reasons.
First, assembly languages belong to a rather small group of programming languages in
which the source code line is the basic syntactic unit. Nowadays, most languages treat
the end of a line as one of the whitespace characters, not fundamentally different from
the space and tab. Apart from assembly languages, line-by-line syntax is now preserved
only in Fortran. It is interesting that it was Fortran (or rather, its early versions) that laid
down a certain tradition of code layout, which is now used for assembly language; as
for Fortran itself, the so-called free syntax has become popular in recent years, allowing
the use of traditional structural indents; however, there are programmers who prefer to
write in Fortran "the old-fashioned way".
The tradition of "fixed syntax" dates back to the days when a Fortran program was a deck
of punch cards. Early versions of Fortran required that a label, which in Fortran represents an
integer, be written in columns one through five, and often a label shorter than five digits was
shifted to the right, leaving the first positions empty. In the first position it was also possible to
place the character C, denoting a comment; later the comment could be denoted by * or !.
The text of the operator had to start strictly from the seventh position, and a space was required
before it in the sixth position; a non-space character in the sixth position meant that this line
(or rather, punch card) was a continuation of the previous one, and the first five positions had
to be left empty.
There is a second reason why assembly language programs are not like Pascal
programs: an assembly language program is a prototype of the contents of RAM,
which, according to von Neumann's principles, is linear and homogeneous. It is for this
reason that, as the reader may have already noticed, no structural indentation is used
here and no attempt is made to move away from using lines as the basic syntactic units.
Only comments are left to emphasize control constructs, and you should use
them to the fullest, unless you want to end up with a text that you will never understand
yourself. Let us emphasize once again: in assembly language programs, write as
detailed comments as possible! Unlike other languages, where there may be "too
many" comments, it is almost impossible to over-comment an assembly program.
Modern assembly language text usually somewhat resembles early Fortran
programs. The general principle of assembly code layout is quite simple. You should
mentally divide the horizontal space of the screen into two parts: the label area
and the command area. Often a comment area is also set apart. The code is written
"in a column", like this:
        xor     ebx, ebx        ; zero ebx
        xor     ecx, ecx        ; zero ecx
lp:     mov     bl, [esi+ecx]   ; another byte from the string
        cmp     bl, 0           ; is the string over?
        je      lpquit          ; end the loop if so
        push    ebx
        inc     ecx             ; next index
        jmp     lp              ; repeat the loop
lpquit: jecxz   done            ; finish if the string is empty
        mov     edi, esi        ; point to the buffer's beginning
lp2:    pop     ebx             ; get a char
        mov     [edi], bl       ; store the char
        inc     edi             ; next address
        loop    lp2             ; repeat ecx times
done:
If a label does not fit in the space provided for labels, it is placed on a separate line:
fill_memory:
        jecxz   fm_q
fm_lp:  mov     [edi], al
        inc     edi
        loop    fm_lp
fm_q:   ret

Usually, when working in assembly language, a column the width of one tab is allocated
for labels, and, of course, it is the tab character (not spaces!) that is placed before each
command (including after labels). Some programmers prefer to give labels two tabs;
this allows longer labels to be used without having to allocate separate lines for them:
fill_memory:    jecxz   fm_q
fm_lp:          mov     [edi], al
                inc     edi
                loop    fm_lp
fm_q:           ret

Quite often one can also find a style that allots a separate column to the command
mnemonic itself. It looks like this:

fill_memory:    jecxz           fm_q
fm_lp:          mov             [edi], al
                inc             edi
                loop            fm_lp
fm_q:           ret
Unfortunately, this style "breaks" when using, for example, macros with names longer
than seven characters: the name of such a macro should, quite naturally, be written in
the command column, but there is not enough room for it there.
However, an exception to the rule can be made for macros.
It should be emphasized that a number of general principles of code design apply
to assembly language in the same way as to any other programming language. Let us
try to enumerate them.
First of all, let us remind you that the program text must consist exclusively of
ASCII characters; characters not included in the ASCII set are not allowed in the
program text even in comments, not to mention string constants or identifiers. We
have already mentioned this point several times, notably in the footnotes on pages 258
and 274. Among other things, it was stated there that comments should be written in
English or not written at all; for assembly language the second half of this
recommendation does not work - it is absolutely impossible to do without comments
here - and only one thing follows from this: if you have any problems with English,
you should start solving them immediately, using a dictionary at first.
As with any program, assembly language text is subject to the eighty-column rule:
program lines must not exceed 79 characters in length. The reasons for this
were discussed in detail in §2.12.7.
Of course, we should not forget about choosing meaningful names for labels,
especially since in assembly programs most of the identifiers introduced by the
programmer are global; the rules for dividing code into separate subroutines, as well as
into modules and subsystems, are also completely independent of the language used
and apply to assembly language just as everywhere else; here we can advise you to
return to the Pascal part once again and reread §§2.4.4, 2.12.10, 2.14.3 and 2.14.4.

3.2. Basics of the i386 instruction system


In this chapter we will give the simplest and most necessary information about the
architecture of the i386 processor: its register system, instructions for copying
information, for performing integer arithmetic operations, and for controlling program
execution. Subroutine organization and floating-point arithmetic will be covered in
separate chapters.

[Figure 3.1 (diagram): the i386 register system. It shows the 32-bit general-purpose
registers EAX, EBX, ECX and EDX with their 16-bit lower halves AX, BX, CX, DX and
the 8-bit subregisters AH/AL, BH/BL, CH/CL, DH/DL; the registers ESI, EDI, EBP and
ESP with their 16-bit halves SI, DI, BP and SP; the segment registers CS, SS, DS, ES,
FS and GS; and the special registers EIP and EFLAGS (with its 16-bit part FLAGS).]

Figure 3.1. The i386 register system

The processor commands that can be used only in privileged mode and,
consequently, only by the operating system will not be considered at all: an account
of them would take up too much space, and to try them out in practice one would
have to write one's own operating system. This is not necessary for our intended
educational purposes; if desired, the reader can turn to the reference literature to learn
more about the processor's capabilities.

3.2.1. Register system


A register is an electronic device in the central processing unit capable of holding
a certain amount of data in the form of binary digits. In most cases (but not always) the
contents of a register are treated as an integer written in binary notation. The registers
of the i386 processor can be divided into general purpose registers, segment registers,
and special registers. Each register has its own name consisting of two or three Latin
letters; this is where x86 processors differ from many other processors, where registers
are numbered rather than named.
Segment registers (CS, DS, SS, ES, GS and FS) are not used in the "flat" memory
model. To be more precise, before transferring control to a user task, the operating
system puts certain values into these registers; the task can theoretically change them,
but nothing good will come of it - most likely, the task will simply crash. We take note
of the existence of these registers, but we will not return to them.
Earlier (see page 534) we mentioned that the user task memory consists of
segments - a code segment, a data segment and a stack segment. Strictly speaking, those
are not the same segments as these. Segment registers are part of the hardware support
provided in the processor for forming user task segments, but Unix systems do not take
advantage of this support; this does not invalidate the fact that the segments exist, they
are just not formed the way the processor designers intended.
The general-purpose registers of the i386 processor are the 32-bit registers EAX,
EBX, ECX, EDX, ESI, EDI, EBP, and ESP. As noted on page 531, the letter E
in the names of these registers stands for the word extended; it was introduced during
the transition from the 16-bit registers of the older processors to the 32-bit registers of
the i386. For compatibility with previous x86 processors, the lower half (the lower
16 bits) of each 32-bit register has a separate name, obtained by dropping the letter E;
in other words, we can also work with the 16-bit registers AX, BX, CX, DX, SI, DI, BP,
and SP, which are the lower halves of the corresponding 32-bit registers.
In addition, the registers AX, BX, CX and DX are also divided into low and
high parts, now eight-bit ones. Thus, for the AX register its low byte is called AL, and
its high byte AH (from the words low and high). Similarly, we can work with the
registers BL, BH, CL, CH, DL and DH, which are the low and high bytes of the
registers BX, CX and DX. The other general-purpose registers do not have such separate
single-byte subregisters.
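The aliasing is easy to observe; here is a hedged sketch (the hexadecimal notation used
in it is explained in §3.2.3):

        mov     eax, 12345678h  ; now ax = 5678h, ah = 56h, al = 78h
        mov     al, 0           ; eax becomes 12345600h: only the low byte changes
        mov     ah, 0ffh        ; eax becomes 1234ff00h: only bits 8-15 change

In other words, the subregisters are not copies but literally parts of the same register,
so writing to any of them changes the value of the whole.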
Almost all general-purpose registers have a specific role in some cases, partially encoded
in the register name. For example, in the name of the AX register, the letter A stands
for the word "accumulator"; in many architectures, including John von Neumann's famous
IAS, an accumulator was a register that participated in any arithmetic operations as one of
the operands and as the place where the result should be placed. The related special role of
the AX and EAX registers is evident in integer multiplication and division commands (see
§3.2.8).
In the name of the BX register the letter B stands for the word "base", but no special
role in 32-bit processors is assigned to this register, although in 16-bit processors such a role
existed.
In the name CX the letter C corresponds to the word "counter". Registers ECX, CX,
and in some cases even CL are used in many machine commands that involve (in one
sense or another) a certain number of iterations.
In the name of the DX register the letter D stands for the word "data". The EDX register
(or DX if a 16-bit operation is performed) plays a special role in integer multiplication (it
stores the part of the result that does not fit in the accumulator) and in integer division (it
stores the high part of the dividend, and after the operation is performed - the remainder
of the division).
The SI and DI register names mean "source index" and "destination index"
respectively. ESI and EDI registers are used in commands working with data arrays, where
ESI stores the address of the current position in the source array (for example, in the memory
area to be copied somewhere), and EDI stores the address of the current position in the
destination array (in the memory area where data is copied or written).
The name of the BP register is derived from the words "base pointer". As a rule, the EBP
register is used to store the base address of the stack frame in subroutines that have
parameters and local variables.
Finally, the SP register name stands for "stack pointer". Despite the fact that the ESP
register belongs to the group of general-purpose registers, in reality it is always used as a
stack pointer, i.e. it stores the address of the current position of the hardware stack top. Since
it is hard to do without the stack, and other registers are not suitable for this purpose, we can
consider that ESP never acts in any other role. It is counted among the general-purpose
registers only on the grounds that it can be used in arithmetic operations on a par with the other
registers of this group.
Among the special-purpose registers we will consider the instruction counter, also
known as the current instruction pointer, EIP, and the flag register FLAGS.
The EIP register, whose name is derived from the words extended instruction
pointer, stores the address of the location in RAM from which the processor should
fetch the next machine instruction to be executed. After fetching an instruction from
memory, the value in the EIP register is automatically incremented by the length of
the read instruction (note that an instruction can occupy from one to eleven consecutive
locations in memory), so that the register again contains the address of the instruction
to be executed next.
The FLAGS register is the only one of the registers under consideration that
is very rarely used as a whole and is never treated as a number at all. Instead, each
binary digit (bit) of this register represents a flag with a name of its own. Some of these
flags are set to zero or one by the processor itself, depending on the result of the
instruction just executed; other flags are set explicitly by corresponding instructions and
then affect the course of execution of certain other instructions. In particular, flags are
used to perform conditional jumps: one instruction performs an arithmetic or other
operation, and the next instruction transfers control to another place in the program, but
only if the result of the previous operation satisfies certain conditions; these conditions
are checked via the flags that have been set (a short illustration follows the list below).
Let us list some of the flags:
• ZF - zero flag; this flag is set during arithmetic and comparison operations: if
the result of the operation is zero, ZF is set to one;
• CF - carry flag; after an arithmetic operation on unsigned numbers,
this flag is set to one if a carry out of the highest digit occurred, i.e. the result did
not fit in the register, or if a borrow from a non-existent digit was required during
subtraction, i.e. the subtrahend was greater than the minuend (see §1.4.2);
otherwise the flag is set to zero;
• SF - sign flag; it is set equal to the high bit of the result, which for signed
numbers corresponds to the sign of the number (see page 208);
• OF - overflow flag; it is set to one if an overflow occurred when
working with signed numbers (see page 208);
• DF - direction flag; this flag can be set with the std command
and reset with the cld command; depending on its value, string operations,
which we will consider a little later, are performed in the forward or reverse
direction;
• PF and AF are parity flag and auxiliary carry flag; we do not need these flags;
• IF and TF are interrupt flag and trap flag; these flags are not available to us,
they can be changed only in the privileged mode of the processor.
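For illustration, here is how the flags tie a comparison to a conditional jump in our
hello5 example; a hedged sketch, with the standard i386 flag conditions given in the
comments:

        cmp     eax, 5          ; computes eax - 5 and discards the result,
                                ; setting ZF, SF, OF and CF accordingly
        jl      again           ; "less" for signed numbers means SF <> OF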
In fact, this set of flags existed even before the i386 processor; with the transition to the i386,
the flag register, like all the other registers, grew in size and changed its name to
EFLAGS, but all the new flags are inaccessible in unprivileged mode, so we will not consider them.
3.2.2. User task memory. Segments
It is clear that CPU registers will not be enough to store all the information needed
in any more or less complex program. Therefore, registers are used only for short-term
storage of intermediate results that are about to be needed again. In addition to registers,
a program can use RAM to store information.
As we discussed in §3.1.1, the von Neumann architecture assumes that both the
program itself (i.e., the machine instructions that make it up) and all the data it works
with are located in memory cells that are identical in their structure and have addresses
from a single address space. In our case, each memory cell can store eight bits (one
byte) and has its own unique address, a 32-bit number (we are talking, of course,
about virtual addresses, which we discussed in §3.1.2).
Although physically all memory locations are exactly the same, the operating
system can set different access capabilities to different memory locations for a user
task. This is achieved by means of hardware memory protection, which we have already
mentioned. In particular, some memory areas may be available to a task for reading
only, but not for modification; in addition, not every memory area is allowed to be
treated as machine code, i.e., to write the addresses of cells from this area to the
instruction counter register. If a task is allowed to treat the contents of a memory area
as a fragment of an executable machine program, the memory area is said to be
executable; the memory area whose contents the task can modify is called writable. The
term read access is also commonly used, but when applied to RAM, the absence of this
type of access usually means the absence of any access at all.
Usually, modern operating systems build the virtual address space of a user task by
dividing it into several segments, among which there are three main segments: code
segment, data segment and stack segment. At program startup, the first two segments
are formed in memory on the basis of information recorded in the executable file, and
the third - the stack - could be created empty, but in Unix family systems this segment
at startup contains information about command line parameters and environment
variables (see §§1.2.6, 1.2.16, 2.6.12). As we have already mentioned in §3.1.4, the
information in an executable file is organized in sections, and the contents of one
segment can be formed from one or more sections; in addition, some sections play an
auxiliary role and are not loaded into memory at startup.
A code segment contains the machine code that actually makes up the executable
program. Naturally, the memory area allocated for the code segment is available to the
task for execution. At the same time, the operating system does not allow user tasks
to modify the contents of the code segment: an attempt by a task to write something
into its own code segment is treated as a violation of memory protection. This is
done for a rather simple reason: if several instances of the same program are
simultaneously launched in the system as tasks, the operating system usually stores only
one instance of the machine code of such a program in physical memory. This is true
even if the running tasks belong to different users and have different permissions in the
system. If one of such tasks modifies "its" code segment, it will obviously prevent the
others from running - after all, their machine code is located (physically) in the same
memory area. However, the code segment is read-only, so it can be used not only for
code per se, but also for storing constant data - information that does not change during
program execution.
A code segment in memory is formed from a code section written to an executable
file, which is denoted by ".text" in programs; a dot before the section name is
mandatory and is part of the name.
The second of the three main segments is called the data segment; global and
dynamic variables are stored here. This segment is available to the task for both reading
and writing, but the operating system usually prohibits transferring control into the data
segment, to make "hacking" computer programs somewhat more difficult. The initial
contents of the data segment are determined by two sections of the executable file. The
first of them is called the data section proper; in programs it is denoted by ".data"[233]
and contains initialized data, i.e. those global variables for which an initial value is set
in the program. The second section is called the uninitialized data section or BSS
section and is denoted by ".bss";[234] as the name makes clear, this section is for
variables for which no initial value is specified. The BSS section differs from the data
section in one important feature: since the contents of the data section at the moment of
program start must be as specified by the program, the executable file has to store its
image (as a whole), whereas for the BSS section it is enough to store only its size.
Already at runtime, a task may ask the operating system to increase the data segment;
this creates a new memory area that can be used to store dynamic variables (see
§2.10.3). However, the way memory is allocated for the heap (recall that this is the
name of the memory area in which dynamic variables are placed) depends on the
operating system and the compiler used; often a separate segment is created for the
heap.
We will not consider working with dynamic memory in assembly language, but for
inquisitive readers we note that under Linux additional memory is allocated by the
brk system call, which can be looked up in the kernel's technical documentation; this call
allows changing (usually increasing) the data segment. FreeBSD allocates additional
memory with the mmap system call, which is unfortunately much more complicated,
especially for assembly programs; we will return to this call in Volume 2, after we
learn more about C. The result of mmap's operation is a separate, newly created segment.
The third main segment is the so-called stack segment; it is needed to store local
variables in subroutines and return addresses from subroutines. We still have a detailed
story about the stack ahead of us, for now we will just note that this segment is also
writable; its availability for execution depends on the particular operating system and
even on the particular version of the kernel: for example, in most versions of Linux it
is possible to pass control to the stack segment, but a special "patch"[235] to the kernel
source code removes this possibility. This section can also grow in size as needed, and
this happens automatically (as opposed to the growth of the data segment, which must
be requested from the operating system explicitly). The stack segment is always present
in a user task; its initial contents depend only on the program's startup parameters. The
executable file does not contain any information about this segment, so no sections have
anything to do with it.

[233] From the English word "data".

[234] Originally the abbreviation BSS stood for Block Started by Symbol, which was due to the
peculiarities of one old assembler. Now programmers prefer to decipher BSS as Blank Static Storage.

[235] Programmers use the English word patch to denote a file containing a formal list of differences
between two versions of a program's source code; having the initial version of the source code and such
a file, one can use a special program to obtain the modified version, which makes it possible to send
not the whole program but only the file containing the necessary changes. The word literally means
"patch", as on clothing; there is no established Russian term corresponding to the English patch, and
in the original the direct transliteration is used.

3.2.3. Memory reservation directives


The contents of this paragraph are not directly related to the i386 processor architecture;
here we consider directives specific to the assembler itself. We have to tell about them
now because we will not be able to do without them in the rest of this chapter.
The assembler translates the symbols of machine commands we have written into
a certain image of a memory area - an array of numbers (data) that will be written into
adjacent RAM cells. Then, when the program is started, control will be transferred to
this memory area (i.e. the address of one of these cells will be written to the EIP
register) and the CPU will start executing our program using the numbers from the
image created by the assembler as command codes.

Similarly, you can use the assembler to create an image of a memory area that
contains data rather than commands. To do this, we need to tell the assembler how
much memory we need for certain needs, and possibly set the values that will be placed
in this memory before the program starts.
Using our instructions, the assembler will form a separate image of memory
containing commands (the image of the .text section) and a separate image of
memory containing initialized data (the image of the .data section), and will
also calculate how much memory we need, the initial value of which we don't care
about, so we don't need to form an image for it, but only specify the total size (the size
of the .bss section). The assembler will write all this into a file with object code, and
the linker will form an executable file from such files (possibly several), which
contains, besides the actual machine code, firstly, the data to be written into memory
before the program starts, and secondly, instructions on how much memory the program
will need other than the memory needed to accommodate the machine code and initial
data. To tell the assembler in which section this or that fragment of the memory
image to be formed should be placed, we must use the section directive in the
assembly language program; for example, the line

section .text

means that the result of processing the subsequent lines should be placed in the code
section, and the line

section .bss

causes the assembler to switch to forming the uninitialized data section. Section-
switching directives can occur any number of times in a program: we can form
part of one section, then part of another, then return to forming the first section.
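A hedged sketch of such interleaving: the assembler glues together all the fragments
belonging to one section, so the two .data fragments below end up adjacent in the
data section image (the label names are arbitrary):

        section .data
a       db      1               ; goes into the data section

        section .text
        global  _start
_start: mov     eax, 0          ; goes into the code section

        section .data           ; back to the data section:
b       db      2               ; b is placed immediately after a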
We can inform the assembler about our memory needs by using memory
reservation directives, which are divided into two types: directives for reserving
uninitialized memory and directives for setting initial data. Usually, both types of
directives are preceded by a label so that it can be used to refer to the address in memory
where the assembler has allocated the required cells for us.
Memory reservation directives (of both kinds) in the NASM assembler use bytes,
words, and also double and quadruple words as units of memory capacity, and there is
a small problem with this terminology. We have mentioned several times that the i386
processor we are studying is a 32-bit processor, that is, its machine word (see §1.4.7)
is 32 bits. We know from §3.1.3 that the previous processors of the same line (up
to and including the 80286) were 16-bit, that is, they had a machine word of 16 bits.
Programmers working with these processors at the level of machine instructions were
accustomed to call two bytes of information a "word", and four bytes were called a
"double word". When the word size doubled with the release of the next processor,
programmers did not change the usual terminology, and they could hardly do it. Thus,
our NASM assembler can generate machine code for all x86 processors - not only for
32-bit processors, but also for 16-bit and 64-bit ones; it would be strange to change the
meaning of the term "word" as a unit of memory quantity measurement every now and
then, because forming an image of sections which do not contain machine commands
(.data and .bss sections) is not connected with the type of the processor used at
all.
All this creates a certain confusion: we remember that the machine word is 32 bits,
i.e. four bytes, but in assembly language programs we use the word word to denote
a memory area of two bytes; a four-byte memory area is called a "double word"
(dword), and eight-byte "quad words" (qword) are also occasionally used.
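To keep the terminology straight, the NASM units and the corresponding directives
(discussed just below) can be summarized as follows; dq and resq, which handle
eight-byte quantities, are listed only for completeness:

; unit      size       initial data    reservation
; byte      1 byte     db              resb
; word      2 bytes    dw              resw
; dword     4 bytes    dd              resd
; qword     8 bytes    dq              resq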
Uninitialized memory reservation directives tell the assembler to allocate a given
number of memory locations, and nothing is specified beyond the number. We do not
require the assembler to fill the allocated memory with any specific values; it is enough
that the memory is available at all. The resb directive is used to reserve a specified
number of single-byte cells; the resw directive is used to reserve memory
for a certain number of "words", i.e. two-byte values (for example, short integers); the
resd directive is used for "double words" (four-byte values); the directive
is followed (as a parameter) by a number indicating the number of values for which we
reserve memory. As we have already mentioned, a label is usually placed before the
memory reservation directive. For example, if we write the following lines:

string  resb    20
count   resw    256
x       resd    1


- then at the address associated with the string label there will be an array of 20
single-byte cells (such an array can be used, for example, to store a string of
characters); at the address count the assembler will allocate an array of 256
two-byte "words" (i.e. 512 cells), which can be used, for example, for some counters;
finally, at address x there will be one "double word", i.e. four bytes of memory, in
which we can store a rather large integer or the address of some other
memory area - our addresses, as we remember, are 32-bit.
Directives of the second type, called initial data directives, do not simply reserve
memory but specify what values this memory should contain by the time the program
is started. The corresponding values are listed after the directive, separated by
commas; as much memory is allocated as there are values specified. The db
directive is used to specify single-byte values, the dw directive to specify
"words", and the dd directive to specify "double words". For example, the line

fibon   dw      1, 1, 2, 3, 5, 8, 13, 21

will reserve memory for eight two-byte "words" (i.e. 16 bytes in total), with the first
two "words" containing the number 1, the third word containing the number 2, the
fourth word containing the number 3, etc. The fibon label will be associated with
the address of the first byte of the memory allocated and filled in this way.
Numbers can be specified not only in decimal, but also in hexadecimal, octal, and
binary. Hexadecimal numbers in NASM assembler can be specified in three ways: by
adding the letter h to the end of the number (e.g., 2af3h), or by writing
the $ symbol before the number ($2af3), or by putting 0x symbols before the
number, as in C (0x2af3). When using the $ symbol, care must be taken that the
character immediately following the $ is a digit, not a letter, so if the number
begins with a letter, you must prepend a 0 (e.g., write $0f9, not $f9). Similarly, you
should watch the first character when using the letter h: for example, a21h will be
perceived by the assembler as an identifier, not as a number, so you should write
0a21h; but with the number 2fah such a problem does not arise initially, because
the first character in its record is a digit. An octal number is denoted by adding the letter
o or q after the number (e.g., 634o, 754q). Finally, a binary number is denoted by
the letter b (10011011b).
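For instance, the following data definitions are all legal (the label names here are ours, chosen purely for illustration); the first three define the same "word" value:

h1 dw 2af3h       ; hexadecimal, letter h appended
h2 dw $2af3       ; hexadecimal, $ prefixed
h3 dw 0x2af3      ; hexadecimal, C-style prefix
o1 dw 634o        ; octal
b1 db 10011011b   ; binary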
Character codes and text strings deserve special mention. As we already know, to
work with text data, each character is assigned a character code - a small positive
integer. We are already familiar with the ASCII encoding table (see §1.4.5). To prevent
the programmer from having to memorize the codes corresponding to printed characters
(letters, numbers, etc.), the assembler allows you to write the character itself instead of
the code by enclosing it in apostrophes or double quotes. Thus, the directive

fig7 db '7'

will place in memory a byte containing the number 55 - the code of the character
"seven" - and the address of this cell will be associated with the label fig7. We can
also write a whole string at once, for example, like this:

welmsg db 'Welcome to Cyberspace!'

In this case, at the welmsg address there will be a string of 22 characters (i.e. an array
of single-byte cells containing the codes of the corresponding characters). As already
mentioned, NASM allows using both single quotes (apostrophes) and double quotes,
so the following line is completely equivalent to the previous one:

welmsg db "Welcome to Cyberspace!"

Within double quotes, apostrophes are treated as a normal character; the same can be
said of the double-quote character within apostrophes. For example, the phrase
So I say: "Don't panic!" can be set as follows:

panic db 'So I say: "Don', "'", 't panic!"'

Here we first used an apostrophe to mark the beginning of a string literal, so that the
double-quote character marking the beginning of direct speech entered our string as a
simple character. Then, when we needed an apostrophe in the string, we closed the
single quotes and used the double quotes to type the apostrophe character inside them.
At the end, we used apostrophes again to set the rest of our phrase, including the double-
quote character ending the direct speech.
Note that strings in single and double quotes can be used not only with the db directive,
but also with the dw and dd directives, but you need to take into account some subtleties,
which we will not consider.
When writing programs, initial data directives are usually placed in the .data
section (i.e., the section .data directive is placed before the data
descriptions), while memory reservation directives go into the .bss
section. This is due to the already mentioned difference in their nature: initialized
data must be stored in the executable file, while for uninitialized data it is enough to
specify their total number. The .bss section, as we remember, differs from .data
in that only a size specification is stored in the executable; in other words, the size of
the executable does not depend on the size of the .bss section. Thus, if we add a
directive to the .data section

db "This is a string

— then the size of the executable file will increase by 16 bytes (we have to store the
string "This is a string" somewhere), whereas if we add the directive to the
.bss section

resb 16
— the assembler will allocate 16 bytes of memory, but the size of the executable file
will not change at all.
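A typical arrangement thus looks something like the following sketch (msg and buf are illustrative names of ours):

        section .data
msg     db "This is a string"   ; initialized: all 16 bytes go into the file
        section .bss
buf     resb 16                 ; uninitialized: only the size is recorded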
We can also place the directives for setting the initial data in the .text section, so that
they will end up in the code segment during operation; we just need to remember that then
this data cannot be changed during the program operation. But if we have a large array in our
program that we don't need to change (some table of constants or, more often, some text that
our program must print), it is more advantageous to place this data in the code segment,
because if users run many instances of our program at the same time, they will have one code
segment for all of them and we will save memory. It is clear that this saving is possible only
for immutable data. Remember that an attempt to change the contents of a code segment at
runtime will cause the program to crash!
The assembler allows us to use any commands and directives in any section. In particular,
we can put machine commands in the data section, and they will be translated into the
corresponding machine code as usual, but we will not be able to transfer control to this code.
Still, in some exotic cases this may make sense (indeed, we can treat programs as data, so
there are programs that work with machine code as data), so the assembler will silently
carry out our instructions, generating machine code that is never executed. If the assembler
encounters memory reservation directives (resb, resw, etc.) in the .data section, it will
also do its job, but in this case it will issue a warning message; indeed, the situation is a bit
strange, because it increases the size of the executable file to no effect, although it does
not lead to any fatal consequences. Directives reserving uninitialized memory in a code
section look even stranger: indeed, if the initial value is not specified and we cannot change
this memory, then no meaningful value will ever get into such memory - and what good is it
then! Nevertheless, the assembler will continue the translation even in this case, issuing only
a warning message. A warning message will also be generated if the .bss section contains
anything other than uninitialized memory reservation directives: the assembler knows for sure
that the image generated for this section has nowhere to store such data. Even though in all
of the above cases the assembler will continue working after issuing a warning, it is more
correct to assume that you have made a mistake and correct the program.

3.2.4. The mov command and operand types


One of the most common commands in assembly language programs is the
command to move data from one place to another. It is called mov (from the word to
move). This command is also interesting for us because we can use its example to study
a number of very important issues, such as types of operands, the concept of operand
length, direct and indirect addressing, the general view of the executive address,
working with labels, etc.
So, the mov command has two operands, i.e. two parameters written after the
command mnemonic (in this case, the word "mov") and specifying the objects on
which the command will operate. The first operand specifies the place where the data
will be placed, and the second operand specifies where the data will be taken from. Thus,
the instruction already familiar to us from the introductory examples is

mov eax, ebx

copies data from the EBX register to the EAX register. It is important to note
that the mov command only copies data without performing any conversions.
There are other commands for data conversion.
In the examples discussed above, we have seen at least two uses of the mov
command:
mov eax, ebx
mov ecx, 5

The first variant copies the contents of one register into another register, while the
second variant puts into the register some number specified directly in the command
itself (in this case the number 5). This example shows that operands are of different
types. If the operand is the name of the register, we speak of a register operand; if the
value is specified directly in the instruction itself, such an operand is called a direct
operand.
In fact, in this case we should not even talk about different types of operands, but about
two different commands, which are simply denoted by the same mnemonics. The two mov
commands from our example are translated into completely different machine codes, with
the first one occupying two bytes in memory and the second one five bytes, four of which are
used to place the immediate operand.
In addition to direct and register operands, there is a third type of operand - an
address operand, also called a memory operand. In this case, the operand in one way
or another specifies the address of the memory location or area to be dealt with by the
command. It should be remembered that in NASM assembly language the "memory"
operand is always denoted by square brackets, in which the address itself is written.
In the simplest case, the address is given explicitly, i.e. in the form of a number; usually
when programming in assembly language we use labels instead of numbers, as already
mentioned. For example, we can write:

section .data
; ...
count dd 0

(the ";" symbol in assembly language means a comment), describing a 4-byte


memory area with the count label associated with its address, where the number 0
is initially stored. If you now write

section .text
; ...
mov [count], eax

- this mov command will indicate copying of data from the EAX register to the
memory area marked with the count label, and, for example, the command

mov edx, [count]

will, on the contrary, indicate copying from memory at address count to register
EDX. To understand the role of square brackets, consider the command

mov edx, count


Recall that the label (in this case count), as we discussed on page 535, is simply
replaced by the assembler with some number - in this case, with the address of a
memory location.
For example, if the count memory location is in cells whose addresses begin with
40f2a008, then the above command is exactly the same as if we had written

mov edx, 40f2a008h

Now it is obvious that this is just a familiar form of the mov instruction with a direct
operand, i.e. this instruction writes the number 40f2a008 into the EDX register
without looking into whether this number is the address of any memory location or not.
If we add square brackets, we are talking about accessing memory at a given address,
i.e. the number will be used as the address of the memory area where the value to be
handled (in this case - put it into the EDX register) is located.
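To summarize the difference, here are the two forms side by side, using the count label described above:

mov edx, count      ; EDX := the address denoted by count (direct operand)
mov edx, [count]    ; EDX := the four bytes stored in memory at that address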

3.2.5. Indirect addressing; the effective address


It is not always possible to set the address of a memory area as a number or a label.
In many cases we have to calculate the address in one way or another and then access
the memory area by this calculated address. For example, this is how things will be if
we need to fill all the elements of some array with the specified values: we know the
address of the beginning of the array for sure, but we will have to organize a loop
(through the array elements) and at each step of the loop copy the specified value into
the next (each time different) element of the array. The easiest way to do this is to set a
certain address equal to the array start address before entering the loop and increment
it at each iteration.
An important difference from the simplest case discussed in the previous paragraph
is that the address used for memory access will be calculated during program execution
rather than set when the program is written. Thus, instead of telling the processor to
"access memory at such and such address", we need to require a more complex action:
"take a value there (for example, in a register), use this value as an address, and access
memory at this address". This way of addressing memory is called indirect addressing
(as opposed to direct addressing, when the address is set explicitly).
The i386 processor allows only values stored in processor registers to be used for
indirect addressing. The simplest type of indirect addressing is accessing memory at an
address stored in one of the general-purpose registers. For example, the instruction

mov ebx, [eax]

means "take the value in the EAX register, use that value as an address, access memory
at that address, fetch four bytes from there, and write them to the EBX register",
whereas the instruction

mov ebx, eax

meant, as we have already seen, simply "copy the contents of the EAX register to the
EBX register".
Let's consider a small example. Suppose we have an array of single-byte elements
intended for storing a string of characters, and we need to put the code of the '@'
character into each element of this array. Let's see what code fragment we can use to
do it (we will use the commands we already know from the example on page 533, adding
to them the decrement command dec, which decreases its operand by one):
        section .bss
array   resb 256        ; array of 256 bytes

        section .text
        ; ...
        mov ecx, 256    ; number of elements -> into the counter (ECX)
        mov edi, array  ; array address -> into EDI
        mov al, '@'     ; the required code -> into the one-byte AL
again:  mov [edi], al   ; put the code into the next element
        inc edi         ; increment the address
        dec ecx         ; decrease the counter
        jnz again       ; if it is not zero, repeat the loop
Here we used the ECX register to store the number of loop iterations that remain
to be executed (initially 256; at each iteration we decrease it by one, and on reaching
zero we end the loop), and to store the address we used the EDI register, into which
before entering the loop we put the address of the beginning of the array,
and at each iteration we increase it by one, thus moving to the next cell.
An attentive reader may notice that this code fragment is not written quite rationally. First,
we could have made do with only one changing register, either by comparing it not with zero
but with the number 256, or by traversing the array from the end. Second, it is not quite clear
why the AL register was used to store the character code, since we could have used a direct
operand right in the command that puts the value into the next element of the array.
All this is true, but then we would have needed, first, an explicit indication of the operand
size, which we have not discussed yet, and second, either the cmp command or a more
complicated command assigning the initial value of the address. By using not quite rational
code here, we were able to limit ourselves to fewer explanations distracting attention
from the main subject.
Thus, the address for memory access is not always predetermined; we can compute
the address during program execution, write the result of the computation into a
processor register and use indirect addressing. The address at which the next machine
instruction accesses memory (no matter whether this address is set explicitly or
computed) is called the effective address. So far we have considered situations
where the computed address is stored in a register, and it is the value stored in the
register that serves as the effective address. For the convenience of programming,
the i386 processor also allows the effective address to be specified so that it is
computed during instruction execution.
More specifically, we can require the processor to take some predetermined value
(perhaps zero), add to it a value stored in one of the registers, and then take a
value stored in another register, multiply it by 1, 2, 4, or 8, and add the result to the
existing address.

[Fig. 3.2. General view of the effective address: a constant, plus any one of the
eight general-purpose registers, plus any register except ESP multiplied by 1, 2, 4, or 8]

For example, we can write

mov eax, [array+ebx+2*edi]

By executing this instruction, the processor will add the number denoted by the
array label²³⁶, the contents of the EBX register, and twice the contents of the
EDI register, use the resulting sum as the effective address, fetch 4 bytes from
the memory area at this address and copy them into the EAX register. Each of the three
summands in the effective address is optional, so we can use only two of them,
or just one - as we have done so far.

236 Recall that a label is nothing but a designation of some number - in this case,
most likely, the address of the beginning of some array.
It is important to realize that the expression in square brackets cannot be arbitrary.
For example, we cannot use three registers; we cannot multiply one register by 2 and
another by 4; we cannot multiply by numbers other than 1, 2, 4, and 8; we cannot
multiply two registers by each other, or subtract a register value instead of adding it,
and so on. The general form of the effective address is shown in Fig. 3.2; as can
be seen, ESP cannot be used as the register multiplied by a factor; however, any
of the eight general-purpose registers can be used as the register whose value is simply
added to the address.
The assembler allows certain liberties with address writing, as long as it can correctly
convert the address into a machine instruction. First, the summands can be arranged in any
order. Second, you can use two or more constants instead of one: the assembler itself will add
them up and write the result into the resulting machine command. Finally, you can multiply a
register by 3, 5 or 9: if you write, for example, [eax*5], the assembler will "translate" it as
[eax+eax*4]. Of course, if you try to write [eax+ebx*5], the assembler will generate
an error, because you have already used the summand it needs.
To understand why such a complex form of the effective address may be needed, it is
enough to imagine a two-dimensional array consisting, for example, of 10 rows, each
of which contains 15 four-byte integers. Let's call this array matrix, putting the
corresponding label before its description:

matrix resd 10*15

To access the elements of the N-th row of such an array, we can calculate the offset
from the beginning of the array to the beginning of this N-th row (to do this we
multiply N by the length of a row, which is 15 * 4 = 60 bytes), put the result of the
calculation, say, into EAX, then put into another register (for example, EBX) the
number of the desired element within the row - and the effective address
[matrix+eax+4*ebx] will point exactly to the place in memory where the
desired element is located.
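As a minimal sketch, assuming the byte offset N*60 of the desired row has already been computed into EAX (for instance, with the multiplication commands covered in §3.2.8) and the element number within the row is in EBX:

mov ecx, [matrix+eax+4*ebx]   ; fetch the desired element into ECX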
The processor's ability to calculate the effective address can be used separately
from memory access if desired. For this purpose, the lea instruction is provided (the
name is derived from the words load effective address). The command has two
operands, and the first one must be a register operand (2 or 4 bytes in size), and the
second one must be an operand of the "memory" type. The command does not make
any memory access; instead, the address calculated in the usual way for the second
operand is entered into the register specified by the first operand. If the first operand is
a two-byte register, the lower 16 bits of the calculated address will be written to the
register. For example, the command

lea eax, [1000+ebx+8*ecx]

will take the value of the ECX register, multiply it by 8, add to it the value of the EBX
register and the number 1000, and the result will be entered into the EAX
register. Of course, you can use a label instead of a number. The restrictions on the
expression in brackets are exactly the same as in other cases of using an operand of the
"memory" type (see Figure 3.2 on page 561).
Let us emphasize once again that the lea command only calculates an address
without accessing memory, despite the use of an operand of the "memory" type. It can
be used for ordinary arithmetic calculations, including those not related to addresses,
and, I must say, sometimes it can be very convenient. As we will see later, integer
multiplication commands are rather cumbersome, so, say, if you want to multiply by
three, five or nine, the easiest way to do it is to use lea.
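For instance, such multiplications fit the allowed address form "register plus register times scale factor":

lea eax, [eax+2*eax]    ; EAX := EAX * 3
lea ebx, [ebx+4*ebx]    ; EBX := EBX * 5
lea ecx, [ecx+8*ecx]    ; ECX := ECX * 9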

3.2.6. Operand sizes and their permissible combinations


Recall that we introduced three types of operands:
• direct, setting the value directly in the command;
• register, which instructs you to take a value from a given register and/or place a
value in that register;
• operands of the "memory" type, specifying the address at which the required
value is located in memory and/or at which the result of an instruction is to be
written to memory.
In different situations, there may be certain restrictions on the type of operands. For
example, it is obvious that a direct operand cannot be used as the first argument of the
mov instruction, because this argument must specify the place where the data is
copied; we can copy data to a register or to a RAM area, but direct operands do not
specify either of these. There are other constraints, usually imposed by the design of
the processor itself as an electronic circuit. For example, neither in the mov command
nor in other commands can two operands of the "memory" type be used at once. For
example, if it is necessary to copy a value from memory area x to memory area y,
it will have to be done through a register:

mov eax, [x]
mov [y], eax

The mov [y],[x] command will be rejected by the assembler as incorrect,
because it does not correspond to any machine code: the processor simply does not
know how to perform such copying in one instruction.
All other combinations of operand types for the mov command are allowed, i.e.
we can use one mov command:
• copy the value from register to register;

• copy the value from the register to memory;


• copy a value from memory to a register;
• set (direct operand) the value of the register;
• set (by direct operand) the value of a cell or memory area.
The last option deserves special consideration. So far, in all the commands we have
used in the examples, at least one of the operands has been a register; this allows us not
to think about the size of the operands, that is, whether our operands are single bytes,
two-byte "words", or four-byte "double words". Note that the mov instruction cannot
transfer data between operands of different sizes (e.g., between the one-byte AL
register and the two-byte CX register); therefore, if at least one of the operands is a
register, we can always tell unambiguously what size data portion is to be processed (in
the case of mov, it is a simple copy). However, when the first operand of the
mov instruction specifies the address in memory where the value is to be written, and
the second operand is direct (i.e. the value to be written is specified directly in the
instruction), the assembler does not know and has no reason to assume what size data
portion is to be sent, or, in other words, how many bytes of memory, starting from the
specified address, are to be written. Therefore, for example, the command
mov [x], 25 ; ERROR!!!

will be rejected as incorrect: it is not clear whether you mean a byte with value 25, a
"word" with value 25, or a "double word" with value 25. Nevertheless, a
command like the one above may well be necessary, and the processor knows how to
execute such a command. To use such an instruction, we need to explain to the
assembler exactly what we mean by putting a size specifier before any of the operands
- the word byte, word, or dword, meaning byte, word, or double word
(i.e., size 1, 2, or 4 bytes), respectively. For example, to write the number 25 into a
four-byte memory location at address x, we can write
mov [x], dword 25

or

mov dword [x], 25


Let us make one important remark. Different machine commands that perform similar
actions can be denoted by the same mnemonics. Thus,
mov eax, 2
mov eax, [x]
mov [x], eax
mov [x], al
are four completely different machine commands, they have different machine code
values and even occupy different numbers of bytes in memory. At the same time, the
commands
mov eax, 17
mov eax, x

use the same machine code for the operation and differ only in the value of the second
operand, which is direct in both cases: indeed, the label x will be replaced by an
address, which is simply a number.

3.2.7. Integer addition and subtraction


The addition and subtraction operations on integers are performed by the add
and sub commands, respectively. Both commands have two operands; the
first operand specifies both one of the numbers involved in the operation and the
place where the result is to be written, while the second operand specifies the second
number for the operation (the second addend, or the subtrahend). The first operand must
be of register or memory type; the second operand of both commands can be of any
type, but two operands of memory type cannot be used in one command - in other
words, these commands allow the same five forms as the mov command. For example, the
command

add eax, ebx

means "take a value from the EAX register, add to it a value from the EBX
register, and write the result back to the EAX register". The command

sub [x], ecx

means "take a four-byte number from memory at address x, subtract from it the value
from the ECX register, write the result back into memory at the same address". The
command

add edx, 12
will increment by 12 the contents of the EDX register, and the command

add dword [x], 12

will do the same with the four-byte memory location at address x; note that we had to
explicitly specify the size of the operand, as discussed in the previous paragraph.
Interestingly, the add and sub instructions do not care whether we consider
their operands to be signed or unsigned numbers²³⁷. Adding and subtracting signed and
unsigned numbers is done in exactly the same way from the implementation point of view,
so that when adding and subtracting the processor may not (and does not) know
whether it is working with signed or unsigned numbers. It is the programmer's
responsibility to remember which numbers are meant.
The add and sub commands set the values of the OF, CF, ZF, and SF flags
according to the result obtained (see page 548). The ZF flag is set if the last operation
results in zero, otherwise the flag is cleared; it is clear that the value of this flag is
meaningful for both signed and unsigned numbers, because the representation of zero
is the same for them.
The flags SF and OF make sense only when working with signed numbers.
SF is set if a negative number is obtained, otherwise it is cleared. The
processor sets this flag by copying the high bit of the result into it; for signed numbers
the high bit, as we know, corresponds to the sign of the number. The OF flag is set if
an overflow has occurred, meaning that the sign of the obtained result does not
correspond to the one that should have been obtained from the mathematical sense of the
operation - for example, if the result of adding two positive numbers is negative, or vice
versa. Clearly, this flag has no meaning for unsigned numbers.
Finally, the CF flag is set if (in terms of unsigned numbers) there is a carry
from a higher digit or a borrow from a non-existent digit. In terms of meaning, this flag
is analogous to OF when applied to unsigned numbers (the result does not fit into
the operand size or is negative). CF has no meaning for signed numbers.
Without knowing what numbers are meant, the processor sets all four flags based
on the results of the add and sub instructions; the programmer must use those that
correspond to the meaning of the operation performed.
The presence of the carry flag allows you to organize addition and subtraction of
unsigned numbers that do not fit into the registers in a way reminiscent of school
addition and subtraction "in column" - the so-called addition and subtraction with
carry. The i386 processor has adc and sbb commands for this purpose. By their
operation and properties they are completely similar to the add and sub
commands, but they differ from them in that they take into account the value of the

237 We discussed signedness and unsignedness of integers in the introductory part,
see §1.4.2; if you do not feel confident handling these terms, be sure to reread that
paragraph and sort the matter out; if necessary, find someone who can explain it to you.
Otherwise you risk not understanding anything at all.
carry flag (CF) at the moment the operation starts. The adc command adds
the value of the carry flag to its final result; the sbb command, on the other hand,
subtracts the carry flag value from its result. After the result is formed, both commands
set all the flags anew, including CF, according to the new result.
Here is an example. Suppose we have two 64-bit integers and the first one is written
into the EDX (high 32 bits) and EAX (low 32 bits) registers and the second one is
written into the EBX and ECX registers in the same way. Then these two numbers
can be added by the commands

add eax, ecx    ; add the low halves
adc edx, ebx    ; now the high ones, taking the carry into account

If we need to subtract, it is done by the commands

sub eax, ecx    ; subtract the low halves
sbb edx, ebx    ; now the high ones, taking the borrow into account

The addition and subtraction operations should be supplemented by several other
commands. Incrementing and decrementing an integer by one is such a common special
case that special increment and decrement commands inc and dec are provided
case that special increment and decrement commands inc and dec are provided
for it. These commands, which we have already encountered in earlier examples, have
only one operand (register or memory type) and perform incrementing and
decrementing by one respectively. Both commands set the ZF, OF, and SF flags,
but do not affect the CF flag. When using these commands with a memory operand,
explicitly specifying the size of the operand is mandatory, because there is no other
way for the assembler to know what size memory area is meant.
The neg command, which also has one operand, performs a sign change, i.e., the
unary minus operation. It is usually applied to signed numbers; in any case it sets all
four flags ZF, OF, SF, and CF as if the operand had been subtracted from zero.
Finally, the cmp command (from the word compare) performs exactly the same
subtraction as the sub command, except that the result is not written anywhere.
The command exists solely to set the flags and is usually followed immediately by a conditional jump
command. It is worth remembering that the first operand of cmp cannot be a direct
operand; at first glance this seems illogical, but the reason for this restriction is very
simple to understand: the sub command cannot work with a direct operand as the first
operand, because the result must be written there, and cmp is executed exactly the
same way as sub, except for the last action of writing the result; to make cmp accept
a direct operand on the left side, we would have to significantly complicate its hardware
implementation.
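A few illustrative uses of these commands (counter here is an assumed dword variable of ours):

inc dword [counter]   ; counter := counter + 1 (size specifier required)
dec ecx               ; ECX := ECX - 1
neg eax               ; EAX := -EAX
cmp eax, 100          ; set the flags as for EAX - 100, discarding the result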

3.2.8. Integer multiplication and division


Unlike addition and subtraction, multiplication and division are implemented at the
circuit level in a relatively complex way²³⁸, so the multiplication and division
instructions may seem organized in a very awkward way for the programmer. The
reason for this appears to be that the creators of the i386 processor and its predecessors
acted here primarily out of considerations of convenience in implementing the processor
itself.

It should be said that multiplication and division cause certain difficulties not only
for processor designers, but also for programmers, not only because of the
inconvenience of the corresponding commands, but also because of their very nature.
First, unlike addition and subtraction, multiplication and division are performed quite
differently for signed and unsigned numbers, so different commands are necessary.
Second, interesting things happen with operand sizes. In multiplication, the size
(number of significant bits) of the result can be twice the size of the original operands
(not larger by one single bit, as in addition and subtraction), so if we do not want to lose
information, a single flag is not enough: we need an additional register to store
the higher bits of the result. With division the situation is even more interesting: if the
modulus of the divisor exceeds 1, the size of the result will be smaller than the size
of the dividend (to be more precise, the number of significant bits of the result of binary
division does not exceed n - t + 1, where n and t are the numbers of significant bits of
the dividend and the divisor respectively), so it is desirable to be able to make the dividend
longer than the divisor and the result. In addition, integer division yields not one but two
numbers as the result: a quotient and a remainder. It is desirable to combine finding the
quotient and the remainder in one operation, since otherwise the same work (at
the level of electronic circuits) would have to be done twice.
All integer multiplication and division commands have only one operand²³⁹,
which specifies the second multiplier in multiplication commands and the divisor in
division commands; this operand can be of register or memory type, but not direct.
The role of the first multiplier and of the dividend, as well as of the place for storing
the result, is played by implicit operands: the registers AL, AX, EAX and, where
necessary, the register pairs DX:AX and EDX:EAX (recall that the letter A stands for
the word "accumulator"; this special role of the EAX register was discussed
on page 546).
The mul command is used to multiply unsigned numbers, and the imul
command is used to multiply signed numbers. In both cases, depending on the digit
capacity of the operand (the second multiplier), the first multiplier is taken from the
register AL (for a one-byte operation), or AX (for a two-byte operation), or EAX
(for a four-byte operation), and the result is placed in the register AX (if the
operands were one-byte), or in the register pair DX:AX (for a two-byte operation), or
in the register pair EDX:EAX (for a four-byte operation). This can be visualized more
clearly in the form of a table (see Table 3.1).

238 Some processors, even modern ones, do not have these operations at all, and the reason for this is
the high complexity of their hardware implementation. On such processors multiplication has to be
performed "manually", by binary long multiplication; usually a subroutine is created for it.

239 In fact, there is an exception to this rule: the signed integer multiplication command imul
also has two-operand and even three-operand forms, but we will not consider them: they are even
harder to use than the usual one-operand form.

Table 3.1. Location of the implicit operand and of the results of integer
multiplication and division, depending on the width of the explicit operand

 width          multiplication                division
 (bits)   implicit      result        dividend   quotient   remainder
          multiplier
   8         AL            AX            AX         AL          AH
  16         AX          DX:AX         DX:AX        AX          DX
  32        EAX        EDX:EAX       EDX:EAX       EAX         EDX
The mul and imul commands clear the CF and OF flags if the upper half
of the result is not actually used, i.e. all significant bits of the result fit into the lower
half. For mul this means that all bits of the upper half of the result contain zeros;
for imul, that all bits of the upper half equal the high bit of the lower half, i.e. the
whole result - whether it is the AX register or one of the pairs DX:AX, EDX:EAX -
is the sign extension of its lower half (respectively, of AL, AX or EAX). Otherwise
both CF and OF are set. The values of the other flags are undefined after execution
of mul and imul; this means that nothing meaningful can be said about their values:
different processors may set them differently, and even executing the same instruction
on the same processor with the same operand values may (at least in theory) leave
different values in these flags.
For dividing (and finding the remainder of dividing) integers, the div command (for
unsigned numbers) and the idiv command (for signed numbers) are used. The only
operand of the command, as mentioned above, specifies the divisor. Depending on the
width of this divisor (1, 2 or 4 bytes), the dividend is taken from the register AX, the
register pair DX:AX or the register pair EDX:EAX, the quotient is placed in the register
AL, AX or EAX, and the remainder of the division - in the registers AH, DX or EDX
respectively (see Table 3.1). The quotient is always rounded towards zero (for unsigned
and positive values - downwards, for negative values - upwards). The sign of the
remainder computed by the idiv command always coincides with the sign of the
dividend, and the absolute value (modulus) of the remainder is always strictly less than
the modulus of the divisor. The values of the flags after integer division are undefined.
The situation when the divisor contains the number 0 at the moment the div or idiv
instruction is executed deserves special consideration. It is known that division by zero is
impossible, and the processor has no means of its own to report an error. Therefore, the
processor initiates a so-called exception, also called an internal interrupt, as a result of
which the operating system takes control; in most cases it reports an error and abnormally
terminates the current task. The same thing happens if the result of division does not fit
into the allotted bits: for example, if we put 10h into EDX and any number at all, even
just 0, into EAX, and try to divide this (i.e. 1000000000 in hexadecimal, or 2³⁶) by, say, 2
(writing it, for example, into EBX to make the division 32-bit), the result (2³⁵) will not fit
into 32 bits, and the processor will have to initiate an exception. We will talk more about
exceptions (internal interrupts) in §3.6.3.
In integer division of signed numbers, it is often necessary to widen the dividend
before dividing: if we are working with one-byte numbers, then from a one-byte dividend
located in AL we must first make a two-byte dividend located in AX, for which we must
put 0 into the upper half of AX if the number is non-negative, and FFh if it is negative.
In other words, we actually need to fill the upper half of AX with the sign bit of
AL. This can be done with the cbw (convert byte to word) command. Similarly, the
cwd (convert word to doubleword) command extends the number in the AX register
to the register pair DX:AX, i.e., it fills the DX register with the sign bit of AX. The
cwde (convert word to dword, extended) command extends the same register AX to
the register EAX, filling the upper 16 bits of this register with the sign bit. Finally, the
cdq (convert dword to qword) command extends EAX to the register pair EDX:EAX,
filling the EDX register with the sign bit of EAX. The scope of these commands is not
limited to integer division, especially as concerns cwde. The cbw, cwd, cwde, and
cdq commands have no operands because they always operate on the same registers.
Note that when dividing unsigned numbers no special commands are needed to
widen the number: it is sufficient simply to zero out the upper part
of the dividend, be it AH, DX or EDX.
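Putting this together, a minimal sketch of a 32-bit signed division (the divisor is assumed to be in EBX and to be nonzero):

cdq             ; sign-extend EAX into EDX:EAX
idiv ebx        ; quotient -> EAX, remainder -> EDX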

3.2.9. Conditional and unconditional jumps


As already mentioned, the normal sequential execution of instructions can be
interfered with by performing a control transfer, also called a jump: a control
transfer instruction forcibly writes a new address into the EIP register, forcing the
processor to continue program execution from another location. There is a distinction
between unconditional jump instructions, which transfer control to another place in
the program without any checks, and conditional jump instructions, which,
depending on the result of checking some condition, either perform the jump to a given
point or do not, in which case program execution continues as usual from
the next instruction.

Before discussing the means of performing jumps available on the i386, we should
first note that all control transfer instructions are divided into three types depending
on the "range" of the transfer.
• Far jumps transfer control to a program fragment located in another
segment²⁴⁰. Since we use the "flat" memory model under Unix, we have no use
for such jumps: we simply have no other segments.
• Near jumps are transfers of control to an arbitrary location within the same
segment; in fact, such a jump is simply an explicit change of the EIP value. In
the "flat" memory model, these are the jumps that let us reach an arbitrary
location in our address space.
• Short jumps are used for optimization when the jump target is no more than
127 bytes forward or 128 bytes backward from the current instruction. In the
machine code of such an instruction the offset is specified by a single byte,
hence the limitation.

240 Segments in the processor's "understanding" are meant here; see the remark on page 546.
The type of jump can be specified explicitly by putting the word short or near
after the command mnemonic (the assembler understands the word far too, of course,
but we do not need it). If you do not, the assembler chooses a default: for
unconditional jumps it is near, which usually suits us, but for
conditional jumps the default is short, which creates certain difficulties.
The unconditional jump command is called jmp (from the word jump). The
command has one operand, which defines the address where control should be
transferred. Most often we use the form of the jmp command with a direct operand,
i.e. with the address specified directly in the command; of course, we specify not a
numeric address, which we usually do not know, but a label. It is also possible to use
a register operand (in this case the jump is made to the address taken from the
register) or an operand of the "memory" type (the address is read from a double word
located at a given position in memory); such jumps are called indirect, as opposed to
direct jumps, for which the address is specified explicitly. Here are some examples:

jmp cycle       ; jump to the cycle label
jmp eax         ; jump to the address from the EAX register
jmp [addr]      ; jump to the address contained in the memory
                ;   marked with the addr label
jmp [eax]       ; jump to the address read from the memory
                ;   whose address is taken from the EAX register
Here the first command performs a direct jump, while the others perform indirect
jumps.
When a jmp instruction uses a direct operand, the assembler actually calculates the
difference between the address of the label we want to jump to and the address of the
instruction immediately following jmp; it is this difference that serves as the actual direct
operand in the resulting machine instruction. Relative addressing is said to be used when
performing a jump to an explicitly specified address. When writing a program in assembly
language you may not think about this point, or even be unaware of it, because in the
program text we simply put labels in the commands and the assembler takes care of the
rest. Note that all this is true only for the direct jump command, while indirect jumps are
performed at the "real" (absolute) address.
If the label to jump to is close enough to the current position, you can try to optimize the
machine code by applying the word short:
mylabel:
        ; ...
        ; a small number of commands
        ; ...
        jmp short mylabel
It is usually hard to tell by eye whether the label is really close enough, especially since macros
(e.g., GETCHAR) can generate a whole series of commands, sometimes of poorly predictable
length. But you don't have to worry about that: if the distance to the label exceeds what is
allowed, the assembler will report an error of the following kind:

file.asm:35: error: short jump is out of range

and all that remains is to find the line with the given number (in this case 35) and remove
the "failed" short.
Now let us consider conditional jump commands. The processor supports quite
a few of them: a jump can be executed depending on the value of one flag, on a
combination of flags, and even on the value of a register.
Let us make an important remark right away. In contrast to unconditional jump
instructions, conditional jump instructions are considered "short" by the assembler
by default, unless you specify the jump type explicitly. This strange approach is due
to historical reasons: on the early x86 processors conditional jumps could only be
short; there were no others. The i386 processor and all later ones, of course, support
near conditional jumps as well; far conditional jumps are still not supported, but we
do not need them anyway.

Table 3.2. The simplest conditional jump commands

 command   jump condition   |   command   jump condition
   jz         ZF = 1        |     jnz        ZF = 0
   js         SF = 1        |     jns        SF = 0
   jc         CF = 1        |     jnc        CF = 0
   jo         OF = 1        |     jno        OF = 0
   jp         PF = 1        |     jnp        PF = 0
Another non-trivial point is that all conditional jump instructions allow only a direct
operand (usually just a label). You cannot take the address for such a jump either
from a register or from memory. Usually this is not needed, but if you do need it,
you can place a conditional jump on the opposite condition two commands ahead, and
put an unconditional jump as the next command: then we safely jump over this
unconditional jump if the initial condition is not met, and, conversely, perform the
jump if the condition is met.
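As a sketch, suppose we want a jump through the (hypothetical) memory variable addr to be taken only when ZF is set; since there is no conditional indirect jump, we write:

        jnz skip        ; opposite condition: skip the indirect jump
        jmp [addr]      ; executed only when ZF = 1
skip: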
As with unconditional jumps, when translating conditional jump instructions into machine
code, the operand is not an address as such, but the difference between the memory position
to which the jump should be made and the instruction following the current one, i.e. relative
addressing is used.
The simplest conditional jump commands jump to a specified address depending
on the value of a single flag. The names of these commands are formed from the letter
J (from the word jump), the first letter of the flag name (e.g., Z for the flag ZF) and
possibly the letter N (from the word "not") inserted between them, if the transition is
to be made under the condition that the flag is zero. All these commands are
summarized in Table 3.2. Recall that we discussed the meaning of each of the flags on
page 548.
Such conditional jump commands are usually placed immediately after an
arithmetic operation (e.g., immediately after the cmp command, see page 567). Thus,
the two commands

cmp eax, ebx
jz are_equal

can be read as an order: "compare the values in registers EAX and EBX, and if they
are equal, jump to the are_equal label".
If we need to compare two numbers for equality, everything is quite simple: just
use the ZF flag, as in the previous example. But what do we do if we are interested,
for example, in the condition a < b? First, of course, we apply the command

cmp a, b

The command will compare its operands - more precisely, it will subtract the value of
b from a and set the flag values accordingly. What follows, as we will see in a moment,
is a bit more complicated.
If a and b are signed numbers, then at first glance everything is simple: subtracting
a - b under the condition a < b gives a strictly negative number, so the sign flag (SF,
sign flag) must be set and we could use the js or jns command. But the result might
not fit into the operand length (for example, 32 bits, if we compare 32-bit numbers),
i.e. an overflow might occur! In this case the value of the SF flag will be opposite
to the true sign of the result, but the OF flag (overflow flag) will be raised. In other
words, if the condition a < b holds after the comparison (or subtraction), two
combinations of flag values are possible: SF=1 and OF=0 (there was no overflow, the
number is negative), or SF=0 and OF=1 (the number looks positive, but it is the result
of an overflow, and the true result is negative). In other words, we are interested in the
fact that the SF and OF flags differ from each other: SF≠OF. For this case
the i386 processor has the jl instruction (from jump if less), also denoted by the
mnemonic jnge (jump if not greater or equal).
Let us now consider the situation when numbers a and b are unsigned. As we have
already discussed in §3.2.7 (see page 565), it makes no sense to consider the flags OF
and SF after arithmetic operations on unsigned numbers, but it makes sense to consider
the flag CF (carry flag), which is set to one if the arithmetic operation results in a
carry from a higher digit (for addition) or a borrow from a non-existent digit (for
subtraction). This is exactly what we need here: if a and b are considered unsigned and
a < b, such a borrowing will occur when subtracting a - b, so we can use the value of
the CF flag, i.e. execute the jc command, which - specifically for this situation -
has the synonyms jb (jump if below) and jnae (jump if not above or equal).
When we are interested in "greater than" and "less than or equal to" relations, we
have to include the ZF flag, which (for both signed and unsigned numbers) indicates
equality of the arguments of the preceding cmp command.
All the conditional jump commands based on the results of arithmetic comparison are
given in Table 3.3.
Table 3.3. Conditional jump commands based on the results of an arithmetic
comparison (cmp a, b)

 command   meaning             jumps if   flag condition        synonyms

                             equality
 je        equal               a = b      ZF = 1                jz
 jne       not equal           a ≠ b      ZF = 0                jnz

                 inequalities for signed numbers
 jl        less                a < b      SF ≠ OF               jnge
 jle       less or equal       a ≤ b      SF ≠ OF or ZF = 1     jng
 jg        greater             a > b      SF = OF and ZF = 0    jnle
 jge       greater or equal    a ≥ b      SF = OF               jnl

                inequalities for unsigned numbers
 jb        below               a < b      CF = 1                jc, jnae
 jbe       below or equal      a ≤ b      CF = 1 or ZF = 1      jna
 ja        above               a > b      CF = 0 and ZF = 0     jnbe
 jae       above or equal      a ≥ b      CF = 0                jnc, jnb

3.2.10. On the construction of branches and loops


Beginners often get lost when trying to use conditional and unconditional jump
commands to construct the branching (if-else statement) or precondition-loop
(while statement) constructs familiar from Pascal. The secret of their construction is
that in most cases the conditional jump has to be made on the opposite condition;
for example, where in Pascal we would write a loop with a heading like "while a = 0
do", in assembly language we must first compare a with zero (for example, with the
command cmp dword [a], 0) and then perform the jump with the jnz
command, i.e. jump if not zero.
In general, a normal Pascal loop with a precondition
while condition do body
by means of machine commands is realized according to the following scheme:
cycle:  calculation of the condition
        JNx cycle_quit      ; exit the loop
        execution of the body
        JMP cycle           ; repeat
cycle_quit:

and branching in its full version

if condition then branch1 else branch2

turns into

        calculation of the condition
        JNx else_branch     ; go to the else branch
        execution of branch1
        JMP if_quit         ; bypass the else branch
else_branch:
        execution of branch2
if_quit:
In both cases we used the "mnemonic" JNx to denote a conditional jump on failure
of the condition, i.e., a jump that is executed if the condition is false. This is quite
understandable if we take into account that right after the condition there is, in the
first case, the loop body and, in the second case, the then branch - exactly those
actions that must be performed if the condition is met (true); in that case we do not
need to jump anywhere, we are already where we need to be. The jump must be
performed if the body intended for execution is to be skipped without executing -
that is, if the condition is false.
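For instance, the Pascal loop "while a = 0 do body" mentioned above (with a stored in a dword variable) sketches out like this:

cycle:  cmp dword [a], 0
        jnz cycle_quit      ; condition false: leave the loop
        ; ... the loop body ...
        jmp cycle           ; and check the condition again
cycle_quit: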
When programming in assembly language, you may notice that conditional
jumps on failure of a condition occur much more often than jumps on its
fulfillment. It is worth saying that in such cases, when there is a choice, it is better to
use the mnemonics with the letter n, such as jnb, jna, jnge, jnle, etc., to
emphasize that the jump is made on the opposite condition; this will add clarity to
your program. Recall that the letter n in all these mnemonics means not.

3.2.11. Conditional jumps and the ECX register; loops


As already mentioned, some general-purpose registers have special roles in some
cases. In particular, the ECX register is best suited for the role of a loop counter: the
i386 processor instruction system has special instructions for constructing loops with
ECX as the counter, while there are no such instructions for other registers.
One of these instructions is called loop and is designed to organize loops
with a predetermined number of iterations. It uses the ECX register as the loop counter,
into which the number of desired iterations must be entered before starting the loop.
The loop command itself performs two actions: it decreases the value in the ECX
register by one and, if the value is not equal to zero as a result, moves to the specified
label. Note that the loop command has one important limitation: it performs only
"short" transitions, that is, with its help it is impossible to make a transition to a label
that is more than 128 bytes away from the command itself.
Suppose, for example, we have an array of 1000 double words, specified using the
directive
array resd 1000

and we want to calculate the sum of its elements. This can be done using the following
code fragment:

        mov ecx, 1000   ; number of iterations
        mov esi, array  ; address of the first element
        mov eax, 0      ; initial value of the sum
lp:     add eax, [esi]  ; add the number to the sum
        add esi, 4      ; address of the next element
        loop lp         ; decrease the counter; if it is not zero, continue
Here we have actually used two loop variables - the ECX register as a counter and the
ESI register to store the address of the current array element.
Of course, we can do the same for any other general-purpose register using two
commands. For example, we can reduce the EAX register by one and go to the lp
label, provided that the result obtained in EAX is not equal to zero; it will look like
this:

dec eax
jnz lp

In the same way, you can write two commands for the ECX register:

dec ecx
jnz lp

The advantage of the loop lp command over these two commands is that its
machine code takes up less memory, although, oddly enough, on most modern
processors it works slower.
In the example with the array we can do without ESI at all, using just the counter:

        mov ecx, 1000
        mov eax, 0
lp:     add eax, [array+4*ecx-4]
        loop lp
There are two interesting points here. First, we now traverse the array from the end to the
beginning. Second, the effective address in the add command looks somewhat strange.
Indeed, the ECX register runs from 1000 down to 1 (the loop body is not executed for
the zero value), while the addresses of the array elements run from array+4*999 down to
array+4*0, so we should multiply by 4 not ECX but (ecx-1). We cannot write that, but
we can simply subtract 4. At first glance this contradicts what was said in §3.2.5 about
the general form of the effective address (there must be one constant summand or none), but
in fact the NASM assembler will subtract 4 from the value of array right during
translation, so that in the final machine code there is only one constant summand.
Let us now consider two additional conditional jump commands. The jcxz (jump
if CX is zero) command makes a jump if the CX register contains zero. Flags are not
taken into account. Similarly, the jecxz command jumps if zero is contained in the
ECX register. As with the loop command, this jump is always short.
To understand why these commands were introduced,
imagine that the ECX register already contains zero when you enter the loop.
Then the loop body will be executed first, and then the loop instruction will decrease
the counter by one, as a result of which the counter will become equal to the maximum
possible unsigned integer (the binary notation of this number consists of all ones), so
the loop body will be executed 2^32 times, whereas by its meaning it probably should
not have been executed at all. To avoid such trouble, you can put the jecxz
command before the loop:

        ; fill ecx
        jecxz lpq
lp:     ; loop body
        ; ...
        loop lp
lpq:

To complete the picture, let us mention two modifications of the loop command. The
loope command, also called loopz, makes a transition if the ECX register is non-
zero after it has been decremented by one and the ZF flag is set, while the loopne
command (or, what is the same thing, loopnz) - if the ECX register is non-zero and
the ZF flag is reset. The ECX register is decremented by these commands in any case,
i.e. even when ZF is "wrong". As can be guessed, the letter "e" here means equal
and the letter "z" means zero.
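For illustration, suppose we have a buffer buf of 256 bytes (the name and size here are
arbitrary) and want to find the first zero byte in it; a loopne-based scan could be
sketched as follows:

        mov esi, buf-1       ; start one byte before the buffer
        mov ecx, 256         ; at most 256 iterations
lp:     inc esi
        cmp byte [esi], 0    ; ZF is set when the byte is zero
        loopne lp            ; repeat while the byte is non-zero and ECX > 0
        ; here either [esi] contains zero, or the buffer is exhausted

Note that cmp sets ZF on every iteration, which is exactly what loopne examines in
addition to the ECX register.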

3.2.12. Bitwise operations


Information written to registers and memory in the form of bytes, words and double
words can be considered not only as a representation of integers, but also as strings
consisting of separate and (in general) unrelated bits.
To work with such bit strings, special bitwise operation commands are used. The
simplest of them are the two-operand and, or and xor commands, which perform
the corresponding logical operation ("and", "or", "exclusive or") separately on the first
bits of both operands, separately on the second bits, etc.; the result, which is a bit string
of the same length as the operands, is stored, as usual for arithmetic commands, in the
register or memory area defined by the first operand. The restrictions on the operands
used in these instructions are the same as for two-place arithmetic instructions: the first
operand must be of either register or memory type, the second operand can be of any
type; you cannot use an operand of memory type for the first and second operands at
the same time; if neither operand is a register operand, you must specify the digit
capacity of the operation using one of byte, word, and dword. Bitwise negation
(inversion) can be performed with the not command, which has one operand. The
operand can be of register or memory type; in the latter case, of course, the digit
capacity of the operand must be specified. All these commands set the ZF, SF and PF
flags according to the result; usually only the ZF flag is used.
In assembly language programs, it is common to see the xor command with both
operands representing the same register, e.g., "xor eax, eax". This means zeroing
the specified register, i.e. the same as "mov eax, 0". The xor command is used
for this because it takes up less space (2 bytes versus 5 for mov) and runs a few clock
cycles faster. Some programmers prefer to use two commands xor eax,eax and
not eax instead of mov eax,-1, although the gain here is not so noticeable (4
bytes of code versus 5), and you can lose in terms of execution time. There are other
examples of such use of bitwise operations. For example, the and command can be
used to get the remainder of dividing an unsigned number by a power of two: a mask
of k ones leaves the remainder of division by 2^k. For example, "and eax,3" will
leave in eax the remainder of dividing its initial value by 4, and "and eax,1fh"
will leave in eax the remainder of dividing it by 32.
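To make this concrete, here is a minimal sketch combining both idioms (the values are
arbitrary):

        xor eax, eax    ; eax := 0, shorter than mov eax, 0
        mov ebx, 1234
        and ebx, 7      ; 8 = 2^3, so ebx now holds 1234 mod 8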
When you just need to check whether certain bits are present in a number, you
may find it convenient to use the test command, which works in the same way as
the and command, i.e. it performs a bitwise "and" over its operands, but it does not
write the result anywhere and only sets the flags according to it.
Fig. 3.3. Schematic diagram of the bitwise shift commands SHR, SHL/SAL and SAR

The command "test eax, eax" is often used instead of "cmp eax, 0". In particular, the
command "test eax, eax " is often used instead of "cmp eax, 0" to check
for equality to zero, which takes less memory and works faster.
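For example, a typical zero check might look like this (the label name is arbitrary):

        test eax, eax   ; "and" eax with itself; the result goes nowhere,
                        ; only the flags are set
        jz was_zero     ; jump if eax contained zero (ZF=1)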
In addition to commands that work on each bit of the operand (operands) and
realize logical operations, it is often necessary to use bit shift operations that work on
all bits of the operand at once, simply by shifting them. The simple bitwise shift
commands shr (shift right) and shl (shift left) have two operands, the first of which
specifies what to shift and the second of which specifies how many bits to shift. The
first operand can be a register operand or of the "memory" type (in the second case the
bit size must be specified). The second operand can be either direct, that is, a number
from 1 to 31 (in fact, you can specify any number, but only its lower five bits will be
used), or the CL register; no other register can be used. When executing these
instructions with CL as the second operand, the processor likewise ignores all but the
low five bits of CL; the register itself, of course, does not change.
The scheme of shifting by 1 bit is as follows. When shifting to the left, the high bit
of the shifted number is transferred to the CF flag, the other bits are shifted to the left
(i.e. the bit with the number p gets the value that the bit with the number p-1 had
before the operation; by convention, bits are numbered from right to left starting from
zero, so that in a 32-bit number the low-order bit is numbered 0 and the high-order bit
is numbered 31), and zero is written to the low bit. When shifting to the right, on the
contrary, the lowest bit is written to the CF flag, all bits are shifted to the right (i.e. the
bit with the number p gets the value that the bit with the number p+1 had before the
operation), and zero is written to the high bit.
Note that for unsigned numbers a shift to the left by n bits is equivalent to
multiplication by 2^n, and a shift to the right is equivalent to integer division by 2^n
with the remainder discarded. Interestingly, for signed numbers the situation with the
left shift is exactly the same, but a right shift of any negative number will give a
positive result, because zero is written to the sign bit. Therefore, along with the
simple shift commands, the arithmetic bitwise shift commands sal (shift
arithmetic left) and sar (shift arithmetic right) are also introduced. The sal
command does the same thing as the shl command (they are actually the same
machine command). As for the sar command, it works similarly to the shr
command, except that the value in the high bit is kept the same as it was before the
operation; thus, if we consider the shifted bit string as a record of a signed integer, the
sar operation will not change the sign of the number (positive will remain positive,
negative will remain negative). In other words, the arithmetic right shift operation is
equivalent to division by 2^n with the remainder discarded for signed integers as well.
The simple and arithmetic shift operations are shown schematically in Fig. 3.3.
Bit-shift commands are much faster than multiplication and division commands;
moreover, they are much easier to handle: you can use any registers, so you don't have
to think about releasing the accumulator. That's why programmers almost always use
bitwise shift commands when multiplying and dividing by powers of two. High-level
language compilers also try to use shifts instead of multiplication and division when
translating arithmetic expressions.
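For example (a sketch; keep in mind that for negative numbers sar rounds the
quotient down, toward minus infinity, rather than toward zero as idiv does):

        shl eax, 3      ; multiply an unsigned value in eax by 8 (2^3)
        shr ebx, 2      ; divide an unsigned value in ebx by 4
        sar ecx, 4      ; divide a signed value in ecx by 16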
In addition to the above, the i386 processor also supports "complex" bit-shift
instructions shrd and shld, which work through two registers; cyclic bit-shift
instructions ror and rol; cyclic bit-shift instructions through the CF flag - rcr
and rcl. The commands working with individual bits of their operands - bt, bts,
btc, btr, bsf and bsr - can be very useful. We will not consider all these
commands; if desired, the reader can learn them on his own using reference books.
Let's consider an example of a situation in which it is reasonable to use bitwise
operations. Bit strings are convenient for representing subsets of a finite number of
initial elements; simply put, we have a finite set of objects (for example, employees of
some enterprise, or toggle switches on some control panel, or simply numbers from 0
to N) and we need a program to be able to represent a subset of this set: which employees
are at work now; which toggle switches on the control panel are set to "on"; which of
N athletes running a marathon have passed the next checkpoint. The most obvious
representation for a subset of a set of N elements is a memory area containing N binary
digits (so, if the set can include numbers from 0 to 511, we need 512 bits, i.e. 64
single-byte cells), where each of the N possible elements is assigned
one bit, and this bit will be equal to one if the corresponding element is included in the
subset, and zero otherwise. Each of the N objects is said to be assigned one of two
statuses: either "included in the set" (1) or "not included in the set" (0).
So, let us require a subset of a set of 512 elements; these can be completely arbitrary
objects, we are only interested in the fact that each of them has a unique number - a
number from 0 to 511. To store such a set, we will describe an array of 16 double words
(recall that a double word contains 32 bits, i.e. it can store the status of 32 different
objects). As usual, we will consider the array elements as numbered (or having indices)
from 0 to 15. The array element with index 0 will store the status of objects with
numbers from 0 to 31, the element with index 1 - the status of objects with numbers
from 32 to 63, etc. At the same time, within the element itself we will consider the bits
numbered from right to left, that is, the lowest digit will have the number 0, the highest
- the number 31. For example, the status of the object with the number 17 will be stored
in the 17th bit of the zero element of the array; the status of the object with the number
37 - in the 5th bit of the first element; the status of the object with the number 510 - in
the 30th bit of the 15th element of the array. In general, to find out by the number of
the object X, in which bit of which element of the array stores its status, it is enough to
divide X by 32 (the number of bits in each element) with a remainder. The quotient
will correspond to the number of the element in the array, the remainder will correspond
to the number of bits in this element. This could be done with the div command, but
it is better to remember that the number 32 is the power of two (2 ), so if we take the
5

lower five bits of the number X, we will get the remainder of its division by 32, and if
we perform a bitwise shift to the right for it by 5 positions, the result will be equal to
the desired quotient. For example, let the number X be stored in the register EBX,
and suppose we need to find the element number and the bit number within the element.
Both numbers do not exceed 255 (more precisely, the element number does not exceed
15 and the bit number does not exceed 31), so we can place the results in single-byte
registers; let these be BL (for the bit number) and BH (for the array element number).
Since putting any new values into BL and BH would spoil the contents of the EBX
register as an integer, it would be logical to first copy the number somewhere else, say
into EDX, and then zero all bits of EBX except the five low-order ones. After that
both the value of EBX as an integer and the value of its lowest byte, the BL register,
will be equal to the desired remainder of the division; then we shift EDX to the right
and copy the result, which fully fits in the lowest byte of EDX, that is, in the register
DL, to BH:
        mov edx, ebx
        and ebx, 11111b  ; take the 5 lower bits
        shr edx, 5       ; divide the rest by 32
        mov bh, dl
However, the same can be done in a shorter way, without using additional registers,
because all the bits we need are in EBX from the beginning. The lower five bits of
the number X are the remainder we need, and the desired quotient is formed by the
next few (in this case, no more than four) bits. When the number X was placed in EBX,
these bits occupied positions starting from the fifth, and we need them to be in the
register BH, which is nothing but the second byte of the register EBX; so it is enough
to shift the entire contents of EBX to the left by three positions, and the desired
quotient will neatly "fit" into BH; after that, we shift the contents of BL back by the
same three bits, which at the same time clears its high bits:

        shl ebx, 3
        shr bl, 3

Having learned how to convert the object number to the array element number and the
number of digits in the element, let's return to the original problem. First, let's describe
the array:
section .bss
set512 resd 16

Now we have a suitable memory area, and the label set512 is associated with the
address of its beginning. Somewhere at the beginning of the program (and perhaps not
only at the beginning) we will probably need a set clearing operation, i.e. such a set of
commands after which the status of all elements is zero (no element is included in the
set). To do this, it is enough to put zeros into all array elements, for example, like this:
section .text
        xor eax, eax          ; eax := 0
        mov ecx, 16
        mov esi, set512
lp:     mov [esi+4*ecx-4], eax
        loop lp

The mov command here will be executed 16 times, with values in ECX from 16
down to 1, hence the cumbersome expression in the effective address.
Now suppose the number X of an element is in the EBX register, and we need to
add the element to the set, i.e. set the corresponding bit to one. To do this, we first find
the bit number of the array element and calculate the mask - a number in which only
one bit (just the one we need) is equal to one, and the other bits are zeros. Then we will
find the required array element and apply the "or" operation to it and to the mask, the
result of which we will put back into the array element. In this case, the bit we need in
the element will be equal to one, while the other bits will not change. To calculate the
mask, we will take the one and shift it to the left by the required number of bits. Recall
that of the registers only CL can be the second argument of bit shift commands, so it
makes sense to calculate the bit number in CL at once. So, let's write:

; add an element to the set512 set,
; whose number is in EBX
        mov cl, bl              ; get the bit number
        and cl, 11111b          ;   into the CL register
        mov eax, 1              ; create a mask
        shl eax, cl             ;   in the EAX register
        mov edx, ebx            ; calculate the element number
        shr edx, 5              ;   in the EDX register
        or [set512+4*edx], eax  ; apply the mask
The task of excluding an element from the set is solved in a similar way, only this time
the mask will be inverted (0 in the required bit, ones in all other bits), and
we will apply it with the and command (logical "and"), as a result of which the
required bit will be zeroed, while the others will not change:

; remove an element from the set512 set,
; whose number is in EBX
        mov cl, bl               ; get the bit number
        and cl, 11111b           ;   into the CL register
        mov eax, 1               ; create a mask
        shl eax, cl              ;   in the EAX register
        not eax                  ; invert the mask
        mov edx, ebx             ; calculate the element number
        shr edx, 5               ;   in the EDX register
        and [set512+4*edx], eax  ; apply the mask
To find out whether an element with a given number is included in the set, you can also
use the mask (one in the required digit, zeros in the others) and the test command.
The result will be shown by the ZF flag: if it is raised, it means that the corresponding
element was not in the set, and vice versa:

; find out if the set512 set includes the element
; whose number is in EBX
        mov cl, bl                ; get the bit number
        and cl, 11111b            ;   into the CL register
        mov eax, 1                ; create a mask
        shl eax, cl               ;   in the EAX register
        mov edx, ebx              ; calculate the element number
        shr edx, 5                ;   in the EDX register
        test [set512+4*edx], eax  ; apply the mask
; now ZF=1 means that the element was absent
; from the set, and ZF=0 that it was present

Let's consider another example. Suppose we need to count how many elements are
included in the set. To do this, we will have to look through all the array elements and
count the single bits in each of them. The easiest way to do this is to load a value from
an array element into a register, and then shift the value to the right by one bit and each
time check if there is a one in the low-order bit; this can be done exactly 32 times, but
it is easier to finish when there is zero left in the register. We will look through the array
from the end, indexing by ECX: this will allow us to use the jecxz command. We
will use the EBX register as the result counter, and use EAX to analyze the array
elements.

; count the elements in the set set512
        xor ebx, ebx             ; EBX := 0
        mov ecx, 15              ; last index
lp:     mov eax, [set512+4*ecx]  ; load the element
lp2:    test eax, 1              ; a one in the lowest bit?
        jz notone                ; if not, skip ahead
        inc ebx                  ; if so, increment the counter
notone: shr eax, 1               ; shift EAX
        test eax, eax            ; anything left in there?
        jnz lp2                  ; if so, continue the inner loop
        jecxz quit               ; if ECX is zero, we finish
        dec ecx                  ; otherwise decrement it
        jmp lp                   ; and continue the outer loop
quit:   ; now the counting result is in EBX
3.2.13. String operations
For convenience in working with arrays (continuous areas of memory), the i386
processor introduces several instructions grouped under the category of string
operations. It is these instructions that utilize the ESI and EDI registers in their
special role discussed on page 547. The general idea behind string commands is that a
read from memory is performed at an address from the ESI register, a write to
memory is performed at an address from the EDI register, and then these
registers are incremented (or decremented) by 1, 2, or 4 depending on the instruction.
Some commands perform a read to a register or a write to memory from a register; in
this case, an appropriately sized "accumulator" register, that is, the AL, AX, or EAX
register, is used. String commands have no operands, always using the same
registers.
The "direction" of address changes (movement along the lines) is determined by
the DF flag (recall that its name stands for direction flag). If this flag is reset, the
addresses are increased, i.e. the string operation is performed from left
to right; if the flag is set, the addresses are decreased (working from right to left). DF
can be set with the command std (set direction), and reset with the command cld
(clear direction).
The simplest of the string commands are the stosb, stosw, and stosd
commands, which write a byte, word, or double word from the AL, AX, or EAX
register to memory at address [edi], respectively, and then increment or decrement
(depending on the value of DF) the EDI register by 1, 2, or 4. For example, if we have
an array of

buf resb 1024

and need to fill it with zeros, we can apply the following code:

        xor al, al     ; reset al
        mov edi, buf   ; array start address
        mov ecx, 1024  ; array length
        cld            ; working in the forward direction
lp:     stosb          ; al -> [edi], increment edi
        loop lp

The lodsb, lodsw and lodsd commands, on the other hand, read a byte, word
or double word from memory at the address located in the ESI register and
place the read into the AL, AX or EAX register, after which they increment or
decrement the value of the ESI register by 1, 2 or 4. Using these commands
with the rep prefix is usually pointless, since we will not be able to insert any other
actions between successive executions of the string command that process the value
read and placed in the register. Using lods series commands without a prefix, on the
contrary, can be very useful. For example, suppose we have an array of four-byte
numbers

array resd 256

and we need to count the sum of its elements. This can be done as follows:

        xor ebx, ebx   ; zero the sum
        mov esi, array
        mov ecx, 256
        cld
lp:     lodsd
        add ebx, eax
        loop lp
It is often convenient to combine lods commands with the corresponding stos
commands. Suppose, for example, that we need to increase by one all elements of the
same array. This can be done in the following way:
        mov esi, array
        mov edi, esi
        mov ecx, 256
        cld
lp:     lodsd
        inc eax
        stosd
        loop lp

If you just want to copy data from one memory location to another, the movsb,
movsw, and movsd commands are very convenient. These commands copy a
byte, word or double word from memory at address [esi] to memory at address
[edi] and then increment (or decrement) both registers ESI and EDI by 1, 2 or
4 respectively.
The stosX and movsX commands can be prefixed with rep. The command
prefixed with rep will be executed as many times as indicated by the ECX register
(in 16-bit code the CX register is used instead).
With the rep prefix, we can rewrite the example above for stosb without using
the label:
xor al, al
mov edi, buf
mov ecx, 1024
cld
rep stosb

Note that rep also allows for a zero initial value of ECX, in which case the string
command is not executed even once (unlike the loop command, where the zero
case has to be considered separately).
The stosX commands are often used in conjunction with lodsX, in which
case the rep prefix cannot be used, because it refers to only one machine command
(in fact, it is part of the command: the byte F3 placed immediately before the
command's own code). The movsX commands are quite
different; they are most often used exactly with the prefix. For example, if we have two
string arrays
buf1 resb 1024
buf2 resb 1024

and you want to copy the contents of one of them into the other, you can do it this way:

        mov ecx, 1024
        mov esi, buf1
        mov edi, buf2
        cld
        rep movsb

Thanks to the ability to change the direction of work (with the help of DF) we can copy
partially overlapping memory areas. Suppose, for example, the buf1 array contains
the string "This is a string" and we need to insert the word "long" before
the word "string". To do this, we must first copy the memory area starting from the
address [bufl+10] five bytes forward to make room for the word "long" and a
space. Obviously, we can only do this copying from the end to the beginning, otherwise
some of the letters will be erased before we copy them. If the word "long" (together
with the space) is contained in the buffer buf2, we can insert it into the phrase in
buf1 in the following way:
        std
        mov edi, buf1+15+5
        mov esi, buf1+15
        mov ecx, 6
        rep movsb
        mov esi, buf2+4
        mov ecx, 5
        rep movsb

Let us explain that the length of the source string is 16 characters, so the address
buf1+15 is the address of the last letter in the string - g in the word string.
Having copied six characters, i.e. the whole word string, to a new position, we
changed the address of the "source" (buf2+4 is the address of the space in the
string "long ") and continued copying.
In addition to those listed above, the i386 processor implements the cmpsb,
cmpsw, and cmpsd (compare string), and scasb, scasw, and scasd (scan
string) commands. The scas series commands compare an accumulator (AL, AX, or
EAX, respectively) to a byte, word, or double word at address [edi], setting flags
like the cmp command, and increment/decrement the EDI. The cmps commands
compare bytes, words, or double words in memory at [esi] and [edi], set
flags, and increment/decrement both registers.
The rep prefix has no meaning for these commands, but with the scasX and
cmpsX commands you can use the repz and repnz prefixes (also called repe
and repne), which, in addition to decrementing and checking the ECX register (or
CX in 16-bit code), also check the value of the ZF flag and continue
working only if this flag is set (repz/repe) or reset (repnz/repne).
It is interesting to note that the repe/repz prefix has exactly the same machine code
as the rep prefix used with the stosX and movsX instructions; the processor
"understands" whether or not to check the flag depending on which instruction this prefix
precedes - for stos and movs the flag is not checked, for scas and cmps it
is checked. The repne/repnz prefix has a separate code. In addition to the above
commands, the rep prefix can be used to prefix some other commands that extract and write
information to I/O ports, but they are privileged commands, so we do not consider them.
For example, if we need to find the letter 'a' in a mystr character array of size
mystr_len, we can proceed as follows:

mov edi, mystr


        mov ecx, mystr_len
        mov al, 'a'
        cld
        repnz scasb
The last line will cyclically compare the AL register, into which we have put the
code of the desired letter, with the byte at [edi], stopping in one of two cases:
ECX has reached zero, or the latest comparison showed equality (the ZF flag is
raised). If after that ECX equals zero, the desired character was not found (we have
reached the end of the array). Keep in mind that EDI has already been advanced past
the byte that was compared last, so the matched character, if any, is at [edi-1].
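A more reliable way to tell the two outcomes apart is to check ZF itself; as a sketch
(the label name is arbitrary), this also handles the case when the match occurs on the
very last byte, where ECX ends up zero as well:

        jne not_found   ; ZF=0: the scan stopped because ECX ran out
        dec edi         ; found: EDI points one byte past the match,
                        ; so back it up to the matched character
        ; ...
not_found: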

3.2.14. Some more interesting commands


On page 586 we introduced the std and cld commands, which can
be used to set and reset the DF flag (direction flag). The same can be done with the
carry flag (CF): the stc command sets it and the clc command resets it; this is
sometimes used to transfer information between different parts of a program. Oddly
enough, there are no similar commands for the other flags we know of; there are
commands to control privileged flags, such as cli and sti for the IF (interrupt
flag), but these cannot be used in restricted mode.

The lahf command copies the contents of the flag register to the AH
register: the CF flag is copied to the lowest bit of the register (bit
number 0), the PF flag to bit number 2, the AF flag to bit number
4, the ZF flag to bit number 6 and finally the SF flag to bit number 7,
i.e. the highest bit. The other bits are left undefined.
The movsx (move with sign extension) and movzx (move with zero
extension) commands allow you to combine copying with widening. Both
commands have two operands: the first one must be a register, the second can be a
register or of type "memory", and in any case the first operand must be longer than
the second, i.e. you can copy from byte to word, from byte to double word, or from
word to double word. The movzx
command fills the missing bits with zeros, and the movsx command fills them with
the value of the high bit of the original operand.
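For example (a sketch):

        movzx eax, byte [esi]  ; byte -> double word, upper bits zeroed
        movsx ebx, ax          ; word -> double word, the sign bit of AX
                               ; is replicated into the upper half of EBX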
Quite interesting is the cpuid command available on Pentium and later
processors, which can be used to find out which processor model our program is
running on and what features this processor supports; a detailed description of the
command can be found on the Internet or in reference books, we will not give it here,
but it is useful to remember the fact of its existence.
We will also mention the commands xlat (convenient for recoding text data
through the recoding table), bswap (allows you to rearrange the bytes of a given 32-
bit register in reverse order, first appeared in the 80486 processor), aaa, aad, aam
and aas (allow you to perform arithmetic operations on binary-coded decimal
numbers, in which each half-byte represents a decimal rather than a hexadecimal digit).
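For example, bswap can be used like this (the value is arbitrary):

        mov eax, 12345678h
        bswap eax       ; eax becomes 78563412h: the four bytes
                        ; are now in reverse order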
The xchg command allows you to swap the values of its two operands. One of
them is any of the general-purpose registers, the other is a register or operand of the
"memory" type with the same size. The register operand can be specified first or second
- as you can easily guess, it has no effect on anything. When both operands of the
command are registers, in principle this command does not give any serious
possibilities, but if one of the operands is a memory location, the use of xchg allows
you to put some value into this memory in one indivisible action, and store the value
that was there before in a register. Why it is so important to ensure the indivisibility of
memory operations, we will learn in Part VII of our book, which is devoted to
parallel programming.
Indivisibility can also be achieved with some other commands - more specifically, with all
commands that assume that some value is retrieved from memory, a new value is calculated
from it and written back to the same memory; for example, commands like "inc dword [x]",
"neg byte [b]", "sub [m], eax", etc., can be made indivisible. To make such a command
indivisible, it must be prefixed with lock, i.e. you write something like this:

lock sub [m], eax

As with the rep, repe, and repne prefixes we used in conjunction with string
commands in §3.2.13, the lock prefix is a single byte that is added to the front of the
command's machine code. For clarity, we note that this is byte F0, but it is not necessary to
remember this. By executing an instruction with this prefix, the processor sets a special flag
on the control bus that prohibits other processors (and other actors like DMA controllers, which
we will postpone until the last part of the second volume) from any work with RAM; this, in fact,
ensures the indivisibility of the operation. You should only realize that such a command can
be executed dozens of times slower than the same command without the prefix. Well, for the
xchg command the processor forbids all access to memory without any lock prefix,
simply because xchg is specially designed for use in conditions when atomicity of the
operation is required.
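Looking ahead, a simple spin lock can be sketched on top of xchg (lockvar here is
an arbitrary double-word variable, initially zero):

spin:   mov eax, 1
        xchg eax, [lockvar]     ; atomically: lockvar := 1, eax := old value
        test eax, eax
        jnz spin                ; the lock was already taken, try again
        ; ... critical section ...
        mov dword [lockvar], 0  ; release the lock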
Consideration of the command system cannot be considered complete without the
nop command. It performs a very important action: it does nothing. Its name itself is
derived from the word no operation.

3.3. Stack, subroutines, recursion


3.3.1. The concept of a stack and its purpose
As we already know, a stack in programming means a data structure built according
to the "last in first out" (LIFO) principle, i.e. an object over which the operations "add
an element" and "extract an element" are defined, and the elements that were added are
extracted in reverse order. As applied to low-level programming, the concept of a stack
is significantly narrower: here, a stack is understood as a continuous memory area for
which a special register stores the address of the top of the stack; memory in the
considered area above the top (i.e. with addresses smaller than the address of the top)
is considered free, and memory from the top to the end of the area (up to higher
addresses), including the top itself, is considered occupied; the register storing the
address of the top is called a stack pointer (see Fig. 3.4). The operation of adding a
value to the stack decreases the address of the top, thus shifting the top upwards
(i.e. in the direction of smaller addresses), and writes the added value to the new top;
the fetch operation reads the value from the top of the stack and shifts the top
downwards, increasing its address.
Generally speaking, the direction of stack growth depends on the particular processor; on
the machines we are considering, and in general on most existing architectures, the stack
"grows downwards", i.e. in the direction of decreasing addresses, but there are processors
on which the stack "grows upwards", processors where the direction of stack growth can
be chosen, and even processors where the stack is organized cyclically.

Figure 3.4. Stack
The stack can be used, for example, to temporarily store register values; if a certain
register stores a value needed for further calculations, and we need to temporarily use
this register for something else, the easiest way to get out of the situation is to store the
value of the register in the stack, then use the register for other needs, and then retrieve
the stored value from the stack back to the register. This is quite convenient, but another
thing is much more important: the stack is used when calling subroutines to store
return addresses, to pass actual parameters to subroutines and to store local
variables. It is the use of the stack that allows implementing the recursion mechanism,
when a subroutine can directly or indirectly call itself.

3.3.2. Stack organization in the І386 processor


Most existing processors support stack handling at the machine instruction level,
and the i386 is no exception. Stack commands allow words and double words to be
added to and removed from the stack; individual bytes cannot be written to the stack,
so the address of the stack top always remains even. Moreover, when working in 32-bit
mode, it is desirable to always use double words in the stack, keeping the address of
the vertex a multiple of four; everything will work without it, but the stack commands
will be slower.
As already mentioned (see page 547), the ESP register, which is formally part of
the general-purpose register group, is nevertheless almost never used in any role other
than that of a stack pointer; the name of this register stands for stack pointer. The
address contained in ESP is considered to point to the top of the stack, i.e. to the
memory area where the last value stored in the stack is stored. The stack "grows" in the
direction of decreasing addresses, i.e. ESP decreases when a new value is added to the
stack and increases when the value is removed.
The value is placed on the stack by the push command, which has one operand.
This operand can be direct, register or memory type and have the size word or
dword; if the operand is not register, the size must be specified explicitly. To retrieve
a value from the stack, the pop command is used, the operand of which can be of
register or memory type; of course, the operand must have the size word or dword.
We emphasize once again that two-byte operands should not be used when working
with the stack; nevertheless, it is necessary to remember about specifying the operand
size.
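For example (myvar here is an arbitrary double-word variable):

        push ecx            ; register operand: the size is implied
        push dword [myvar]  ; memory operand: the size must be given
        push dword 12345    ; it does not hurt to specify it for an
                            ; immediate operand as well
        ; ...
        pop dword [myvar]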
The push and pop commands combine copying data (to or from the stack
top) with shifting the top itself, i.e. changing the value of the ESP register. It is possible
to access the value at the top of the stack without fetching it - by applying [esp]
operand (in any command allowing the "memory" type operand). For example, the
command
mov eax, [esp]

will copy the four-byte value from the top of the stack to the EAX register.
Of course, you can work this way not only with the top, but also with any data in the stack,
because it is a normal memory area; the only restriction here is that the free part of the stack
- cells with addresses smaller than the current esp value - should not be used, because
the operating system may use this memory between the time quanta allocated to your
process, and thus corrupt your data. For example, this is exactly what happens when
so-called signals are processed, and it can happen at a completely random moment, because
you don't know between which two commands your program will lose control to let other
programs run, and just then a signal may arrive. We will discuss signal handling in detail
in Volume 2.
As mentioned above, the stack is very convenient to use for temporary storage of
values from registers:

push eax ; memorize eax


; ... use eax for other needs ...
pop eax ; restore eax

Let's consider a more complicated example. Suppose the ESI register contains the
address of a character string in memory, and it is known that the string ends with a byte
with the value 0 (but we do not know the length of the string), and we need to
"reverse" this string, that is, write its constituent characters in reverse order in the
same memory location; the zero byte, which plays the role of a terminator, naturally remains
in place and is not copied anywhere. One way to do this is to sequentially write the
character codes to the stack, and then go through the string from beginning to end again,
retrieving characters from the stack and writing them to the cells that make up the string.
Since one-byte values cannot be written to the stack, and two-byte values are
possible but undesirable, we will write four-byte values using only the low-order byte.
Of course, it is possible to do everything more rationally, but now we are more
interested in the clarity of our illustration. We will use the EBX register for
intermediate storage, and only its low byte (BL) will contain useful information, but we
will write the whole EBX to the stack and retrieve it from the stack. The task will be
solved in two loops. Before the first loop we put zero into the ECX register; then
at each step we extract the byte at address [esi+ecx] and place it (as part
of a double word) on the stack, incrementing ECX by one, and so on
until the extracted byte turns out to be zero, which by the conditions of the task means
the end of the string. As a result, all non-zero elements of the string will be on the stack,
and the ECX register will hold the length of the string.
Since the number of iterations (the string length) for the second loop is known in
advance and is already contained in ECX, we organize this loop using the loop
command. Before entering the loop, we check whether the string is empty (i.e. whether
ECX equals zero), and if it is, we immediately go to the end of our fragment. Since the
value in ECX will be decreasing while we need to traverse the string in the forward
direction, along with ECX we will use the EDI register, which at the beginning is set
equal to ESI (that is, pointing to the beginning of the string), and which we shift at
each iteration. So, we write:

        xor ebx, ebx       ; reset ebx
        xor ecx, ecx       ; reset ecx
lp:     mov bl, [esi+ecx]  ; the next byte of the string
        cmp bl, 0          ; the end of the string?
        je lpquit          ; if so, end the loop
        push ebx           ; bl as part of ebx
        inc ecx            ; the next index
        jmp lp             ; repeat the loop
lpquit: jecxz done         ; if the string is empty, finish
        mov edi, esi       ; the beginning of the buffer again
lp2:    pop ebx            ; retrieve from the stack
        mov [edi], bl      ; write it down
        inc edi            ; next address
        loop lp2           ; repeat ecx times
done:

3.3.3. Additional stack commands


If necessary, it is possible to push the value of all general-purpose registers
onto the stack with a single command; this command is called pushad (push all
doublewords). Note that this command pushes the contents of the EAX, ECX, EDX,
EBX, ESP, EBP, ESI, and EDI registers (in that order) onto the stack, with the
ESP value being pushed as it was before the command was executed. The
corresponding stack extraction command is called popad (pop all doublewords). It
fetches eight four-byte values from the stack and writes these values to the registers in
the reverse order to the pushad command, while the value corresponding
to the ESP register is ignored (i.e., it is fetched from the stack but not written to the
register).
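For example (a sketch):

        pushad    ; save EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI
        ; ... use the registers freely ...
        popad     ; restore them all; the stored ESP value is skipped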
The flag register (EFLAGS) can be written to the stack with the pushfd
command and retrieved with the popfd command; however, if we are working in
restricted mode, only the flags that may be changed in restricted mode will actually
be modified, and the rest will not be affected by the popfd command.
There are similar commands for 16-bit registers supported for compatibility with older
processors; they are called pushaw, popaw, pushfw and popfw and work in a
completely similar way, but use the corresponding 16-bit registers instead of 32-bit registers.
The pushaw and popaw commands are practically not used, as for the pushfw and
popfw commands, their use might make sense if we take into account that there are no
flags in the "extended" part of the EFLAGS register, the value of which we could change in
the limited mode of operation; in reality, however, these commands are not used either,
because they can break the stack alignment to addresses divisible by four, thus slowing down
the work with the stack.

3.3.4. Subprograms: general principles


Recall that a subroutine is a separate part of program code that can be called from
the main program (or from another subroutine); calling means temporarily transferring
control to the subroutine so that when the subroutine does its work, it returns control to
the point from which it was called. We have already encountered subroutines in the
form of Pascal procedures and functions.
When calling a subroutine, it is necessary to memorize the return address, i.e. the
address of the machine command following the subroutine call command, and to do it
in such a way that the called subroutine itself can use this stored address to return
control when it finishes its work. Besides, subroutines often receive parameters that
affect their work and use local variables in their work. All this requires allocating RAM
(or registers). The simplest solution would be to allocate each subroutine its own
memory area for storing all local information, including the return address, parameters,
and local variables. Then the call of the subprogram will require first of all to write into
the memory area belonging to the subprogram (in predetermined places) the values of
the parameters and the return address, and then to transfer control to the beginning of
the subprogram.
It is interesting that once upon a time subroutines were treated in this way, but with
the development of programming methods and techniques, the need for recursion arose
- a program structure in which some subroutines can directly or indirectly call
themselves, and potentially an unlimited number of times (to be more precise, limited
only by the memory size). It is clear that each recursive call requires a new instance of
the memory area for storing the return address, parameters and local variables, and the
later such an instance is created, the earlier the corresponding call will finish its work,
i.e. recursive calls of subroutines in a certain sense obey the rule "the last one in - the
first one out". The idea of using the already familiar stack when calling subroutines
follows quite logically from this.
The essence of the approach is that before calling the subroutine, the values of the
call parameters are placed on the stack, then the call itself is made, i.e. control transfer,
which is combined with saving the return address in the same stack. When the
subroutine receives control, it (already itself) reserves a certain amount of memory in
the stack for storing local variables, usually by simply shifting the stack pointer to the
required number of cells. The area of stack memory containing the parameter values,
return address and local variables associated with a single call is called a stack frame.

3.3.5. Calling subprograms and returning from them


A subprogram call, as it is clear from the above, is a transfer of control to the
subprogram start address with simultaneous storing of the return address in the stack,
i.e. the address of the machine instruction immediately following the call instruction.
The І386 processor provides the call command for this purpose; similarly to the
jmp command, the argument of the call command can be direct (the transition
address is specified directly in the command, usually by a label; as for the jmp
command, the distance from the current position is used in the machine code, i.e.
relative addressing is used), register (the address of the control transfer is located in a
register) and memory type (the transition should be performed at the address read from
a specified memory location). The call command does not have a "short" form; since
we usually do not need the "far" form due to the lack of segments, there is only one
form - the "near" form, which we always use.
Returning from a subprogram is performed by the ret command (from the
word return - "return"). In its simplest form, this command has no arguments. When
executing this command, the processor fetches four bytes from the stack top and writes
them to the EIP register, as a result of which control is transferred to the address that
was in memory at the stack top.
Let's consider a simple example. Suppose that in our program we often have to fill
memory areas of different length with some single-byte value. Such an action can be
quite well formalized as a subroutine. For the simplicity of the picture, let's assume that
the address of the necessary memory area is passed through the EDI register,
the number of single-byte cells to be filled - through the ECX register, and
the value to be written into all these cells - through the AL register. The code of the
corresponding subroutine can look like this, for example:

; fill memory (edi=address, ecx=length, al=value)


fill_memory:
        jecxz fm_q
fm_lp:  mov [edi], al
        inc edi
        loop fm_lp
fm_q:   ret

This subroutine can be accessed as follows:


        mov edi, my_array
        mov ecx, 256
        mov al, '@'
        call fill_memory

As a result, 256 bytes of memory starting from the address specified by the my_array
label will be filled with the '@' character code (number 64).

3.3.6. Organization of stack frames


The subroutine given as an example in the previous paragraph did not actually use
the stack frame mechanism, storing only the return address on the stack. This was
enough because the subroutine did not need local variables, and we passed the
parameters through registers. In practice, subroutines are rarely so simple. In more
complex cases, we will certainly need local variables, because there are not enough
registers for everything. Besides, passing parameters through registers may also be
inconvenient: first, they are not always enough, and second, the subroutine may need
the values passed through registers for a long time, and this will actually deprive it of
the possibility to use for its internal needs those registers that were used when passing
parameters. Finally, if you need to call another subroutine that also accepts parameters
through registers, the information from the registers will still have to be "unloaded"
somewhere (usually into the same stack).
That's why usually, especially when translating a program from a high-level
language, such as Pascal or C, parameters to functions are passed through the stack,
and local variables are placed on the stack. As it was said above, parameters are placed
in the stack by the calling program, then when calling a subroutine the return address
is put into the stack, and then the called subroutine itself reserves space in the stack for
local variables. All this together forms a stack frame. The contents of the stack frame
can be accessed using addresses "bound" to the address containing the return address;
in other words, the memory location starting from which the return address was written
to the stack is used as a kind of reference point. Thus, if three four-byte parameters are
placed on the stack and then the procedure is called, the return address will be at
[esp], and the parameters will obviously be available at [esp+4], [esp+8]
and [esp+12]. If local four-byte variables are placed in the stack, they will be
available at addresses [esp-4], [esp-8], etc.
It is not very convenient to use the ESP register to access parameters, because in
the procedure itself we may also need a stack - both for temporary data storage and for
calling other subroutines. Therefore, the first action of the subroutine usually saves the
value of the ESP register in some other register (most often EBP) and uses it to
access parameters and local variables, and the ESP register continues to play its role
as a stack pointer, changing as necessary; before returning from the subroutine it is
usually restored to its original value.
value, by simply copying the value from EBP back into it so that it points to the return
address again.

Figure 3.5. Stack frame structure
Another question naturally arises: what if other subroutines also use the EBP
register for the same purpose? In this case, the first call of another subroutine will ruin
our work. Of course, we can save EBP in the stack before calling each subroutine, but
since there are usually many more subroutine calls in a program than subroutines
themselves, it is more economical to follow a simple rule: each subroutine should save
the old EBP value itself and restore it before returning control. The stack is also used
to save the EBP value. The saving is performed by a simple command push ebp
immediately after receiving control, so that the old EBP value is placed in the stack
immediately after the return address of the subroutine, and this address of the stack top
is used as the "anchor point". To do this, the next command is mov ebp,esp. As a
result, the EBP register points to the place in the stack where its own, EBP, stored
value is located; if we now access the memory at address [ebp+4], we will find
there the address of return from the subroutine, and the parameters stored in the stack
before calling the subroutine are available at addresses [ebp+8], [ebp+12],
[ebp+16], etc. Memory for local variables is allocated by simply subtracting the
required number from the current ESP value; so, if we need 16 bytes for local
variables, we should execute the command sub esp,16 immediately after saving
EBP and copying the ESP contents into it; if (for the sake of simplicity) all our local
variables also occupy 4 bytes, they will be available at addresses [ebp-4], [ebp-
8], etc. The structure of a stack frame with three four-byte parameters and four four-
byte local variables is shown in Fig. 3.5.
Let us repeat that at the beginning of its work, according to our agreements, each
subprogram must execute

        push ebp
        mov ebp, esp
        sub esp, 16   ; instead of 16, substitute the amount of
                      ; memory needed for local variables

The completion of the subroutine should now look like this:


        mov esp, ebp
        pop ebp
        ret

The І386 processor supports special commands for servicing stack frames. Thus, at the
beginning of a subroutine, instead of the three commands given above, you could give one
command "enter 16, 0", and instead of two commands before ret you could write
leave. The problem, oddly enough, is that enter and leave are slower than the
corresponding set of simple commands, so they are almost never used; if we disassemble
machine code generated by a Pascal or C compiler, we are likely to find exactly those
commands at the beginning of any procedure or function, as shown above, and nothing like
enter. The only justification for the existence of enter and leave commands may be
their shortness (for example, the machine command leave occupies only one byte in
memory), but nowadays nobody usually thinks about saving memory on machine code;
performance is usually more important.
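For comparison, the two variants side by side (a sketch):

        enter 16, 0   ; same effect as: push ebp / mov ebp, esp / sub esp, 16
        ; ... the body of the subroutine ...
        leave         ; same effect as: mov esp, ebp / pop ebp
        ret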
Let us make one more important remark. When working under Unix OS, we don't
have to worry about stack availability or stack size. The operating system creates a
stack automatically at the start of any task, and during its execution it increases the size
of memory available for the stack if necessary: as the top of the stack moves "up" (i.e.
in the direction of decreasing addresses) through the virtual address space, the operating
system puts more and more new pages of physical memory to correspond to virtual
addresses. That is why in Figs. 3.4 and 3.5 we have depicted the top edge of the stack
as something fuzzy. However, after lightweight processes (threads) appeared in Unix
systems, a rather strict limit was imposed on the stack size: 8 MB; if this limit is
exceeded, the system kernel kills your process as if it had crashed.

3.3.7. Basic conventions of subroutine calls


Despite the detailed description of the stack frame mechanism given in the previous
paragraph, there is still room for maneuver in some issues. For example, in what order
should the values of subprogram parameters be written to the stack? If we write a
program in assembly language, this question, in fact, does not arise; however, it turns
out to be unexpectedly crucial when creating compilers for high-level programming
languages.
The creators of classical Pascal compilers usually followed the "obvious" way [251]: a
procedure or function call was translated into a series of commands that put the values
on the stack, and the values were put in the natural (for humans) order - from left to
right; then the call command was inserted into the code.
When such a procedure receives control, the values of actual parameters are placed in
the stack from bottom to top, i.e. the last parameter is placed closer to the frame
reference point (available at address [ebp+8]). This, in turn, implies that to access
the first (and any other) parameter, a Pascal procedure or function must know the
total number of these parameters, since the location of the n-th parameter in the stack
frame depends on the total number. Thus, if a procedure has three four-byte parameters,
the first of them will appear in the stack at address [ebp+16], while if there are
five of them, the first one will be found at address [ebp+24]. This is why Pascal
does not allow creating procedures or functions with a variable number of arguments,
so-called variadic subroutines (which is quite normal for an educational language, but
not quite acceptable for a professional language). As we discussed in §2.1 (see the note
on page 235), all sorts of writeln, readln and other entities that resemble
variadic subroutines are actually part of the Pascal language itself, i.e. they should
be considered operators rather than procedures.
The creators of the C language took a different path. When translating a C function
call, the parameters are placed on the stack in reverse order, so that the first of them (if
there is one, of course) is always available at address [ebp+8], the second at address
[ebp+12], and so on, regardless of the total number of parameters. This allows
you to create variadic functions; in particular, the C language itself does not include
any functions at all, but the "standard" library provides a number of functions that
assume a variable number of arguments (such as printf, scanf, etc.), and all
these functions are also written in C (you can't do this in Pascal).
On the other hand, the absence of variadic subroutines in Pascal allows us to put
the care of stack cleaning on the caller. Indeed, a Pascal subroutine always knows how
much space the actual parameters occupy in its stack frame (since for each subroutine
this amount is set once and for all and cannot change) and, accordingly, can take care
of stack cleaning. As we have already mentioned, there are more subroutine calls in any
program than subroutines themselves, so by shifting the care of stack cleaning from the
caller to the called one, a certain memory saving (the number of machine instructions)
is achieved. When using C conventions, such saving is impossible, because a subroutine
in general case does not know and cannot know how many parameters are passed to
252

251
Pascal compilers don't have to do this; for example, the familiar Free Pascal tries to pass parameters
through registers, and only if there are not enough registers does it place the remaining parameters on
the stack, but the order is indeed "direct".
252
Different situations use different ways of fixing the number of parameters; for example, the
printf function finds out how many parameters to fetch from the stack by analyzing the format
string, and the execlp function fetches arguments until it hits a null pointer, but both are just special
cases.
§3.3 Stack, subroutines, recursion 605
it, so the care of clearing the stack of parameters remains on the caller; usually it is done
simply by increasing the ESP value by a number equal to the total length of the actual
parameters. For example, if a procl subroutine takes three four-byte parameters (let's
call them al, a2, and a3) as input, its call would look something like this:

push dword a3   ; put the parameters on the stack
push dword a2
push dword a1
call proc1      ; call the subroutine
add esp, 12     ; remove the parameters from the stack

In the case of the Pascal conventions the last command (add) is unnecessary: the
callee takes care of everything. The i386 processor even has a special form of the
ret instruction with one operand for this purpose (we used ret without operands
in the examples above). This operand, which can only be direct and is always two
bytes long (a "word"), specifies the amount of memory (in bytes) occupied by the
parameters. For example, the Pascal compiler will end a procedure that accepts three
four-byte parameters through the stack with the command
ret 12

This command, like the usual ret command, will retrieve the return address from the
stack and pass control over it, but in addition (at the same time) will increase the ESP
value by a given number (in this case 12), relieving the caller of the obligation to clear
the stack.
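As an illustration, here is a minimal sketch (the name pproc and the body are
hypothetical) of a procedure compiled under the Pascal conventions, together with its
call; note that the caller needs no add esp:

pproc:  push ebp
        mov ebp, esp
        mov eax, [ebp+16]   ; the FIRST of the three parameters is the
                            ; farthest one; the last one is at [ebp+8]
        pop ebp
        ret 12              ; remove all 12 bytes of parameters ourselves

        push dword 1        ; the parameters go left to right
        push dword 2
        push dword 3
        call pproc          ; and no "add esp, 12" here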
Both Pascal and C compilers arrange the return of values from functions through
registers, and here the authors of the compilers were unanimous: the "most important"
register is used, which for i386 is EAX. To be more precise, integer values that fit
into this register (i.e., values of no more than four bytes) are returned through EAX.
Eight-byte integers are returned through the register pair EDX:EAX; looking ahead,
we note that floating-point numbers are returned through the "main" register of the
arithmetic coprocessor. Only if the returned value is not a number at all - for example,
if you need to return a record (record in Pascal, struct in C) - is the return done
through memory provided by the caller, and the caller must pass the address of this
memory to the subroutine through the stack along with the parameters.
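For example, a hypothetical function returning the eight-byte value
0123456789ABCDEFh would place it in the register pair like this:

        mov eax, 89abcdefh  ; the low double word
        mov edx, 01234567h  ; the high double word
        ret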

3.3.8. Local labels


The essence and main advantage of subroutines is their isolation. In other words,
while writing one subroutine, we usually do not remember how other subroutines are
organized inside, and perceive each subroutine except one (the one we are writing right
now) as one big command. This allows us to eliminate unnecessary details from our
minds and concentrate on the implementation of a particular program fragment. The
problem is that we will definitely need labels in the body of any subprogram, and we
need to make it so that when choosing names for such labels we don't have to remember
whether there is a label with the same name somewhere else (in another subprogram).
The NASM assembler provides special local labels for this purpose. Syntactically,
these labels differ from regular labels in that they begin with a dot. The assembler
localizes such labels in a program fragment bounded on both sides by regular (non-
local) labels. In other words, the assembler does not consider a local label by itself, but
as something subordinate to the last (nearest from above) non-local label. For example,
in the following fragment:

first_proc:
        ; ...
.cycle:
        ; ...
second_proc:
        ; ...
.cycle:
        ; ...
third_proc:

The first .cycle label is subordinate to the first_proc label and the second to
the second_proc label, so they do not conflict with each other. If the .cycle
label occurs in the operands of an instruction between the
first_proc and second_proc labels, the assembler will know that it is the
first .cycle label that is meant; if it occurs after second_proc but
before third_proc, the second label will be used, while the
occurrence of a .cycle label before first_proc or after third_proc
will cause an error. If we start each subroutine with a regular label and use only local
labels inside the subroutine, we can use local labels with the same names in different
subroutines, and the assembler will not get confused by them.
In fact, the assembler achieves this effect in a not entirely honest way: when it sees a
label whose name starts with a dot, it simply prepends to it the name of the last label it
encountered without a dot. So, in the example above, we are really talking not about two
identical .cycle labels, but about two different labels, first_proc.cycle and
second_proc.cycle. It is useful to keep this in mind and not to use labels containing
a dot explicitly in the program, although the assembler allows it.
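The described mangling can be observed directly: a local label may even be referenced
by its full name from outside its "parent" (the labels below are hypothetical),
although, as just said, it is better not to do so without need:

first_proc:
        mov ecx, 10
.cycle:                      ; the full name is first_proc.cycle
        ; ... loop body ...
        loop .cycle          ; inside first_proc the short form suffices
        ret
second_proc:
        jmp first_proc.cycle ; from outside, only the full name works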

3.3.9. Example: pattern matching


Here is an example of a subroutine that uses recursion. One of the simplest classical
problems solved recursively is the familiar matching of a string against a pattern, and
we will use it. A detailed description of the solution algorithm was given in §2.11.3,
where we solved this problem in Pascal, so here we will limit ourselves to brief remarks.
Let us refine the problem taking into account the use of "low-level" strings. We are
given two strings of characters whose lengths are not known in advance, but we
know that each of them is terminated by a zero byte (note that in Pascal the strings
were organized differently). We consider the first string as the string to be matched
and the second as the pattern. In the pattern, the character '?' can be matched with
an arbitrary character, the character '*' can be matched with an arbitrary subchain
of characters (possibly even an empty one), and the other characters denote themselves
and are matched only with themselves. It is required to determine whether the given
string matches (in its entirety) the given pattern, returning the result 0 if it does
not match and 1 if it does.
As we saw when solving the problem in Pascal, the recursive algorithm for it
turns out to be quite simple. At each step (more precisely, at each recursive call), we
consider the remainders of the string and of the pattern; at first these remainders are
the whole string and the whole pattern, then, as the algorithm progresses, characters
are discarded from their beginnings, and we assume that for the already discarded
characters the matching was successful.
Depending on the first character of the pattern, our algorithm works according to
one of three basic scenarios. If instead of the first character we see the bounding zero
in the pattern, i.e. we have run out of pattern, the algorithm stops here and immediately
gives the result: "true" if the string being matched has also run out, otherwise "false";
indeed, only an empty string can be matched with an empty pattern.
If the pattern has not run out yet and its first character is anything other than
'*', we must match the first character of the pattern against the first character of
the string, taking into account that the string must be non-empty and that the character
'?' in the pattern is successfully matched against any character of the string. If the
matching fails (i.e. either the first character of the string is the bounding zero, or the
first character of the pattern is not a question mark and is not equal to the character
of the string), we return "false"; otherwise we match the remainders of the string and
the pattern, discarding the first characters, and return the result of that matching as
our result.
Finally, if the first character of the pattern is an '*', we must sequentially try
to match this asterisk with an empty subchain of the string, with one character of the
string, with two characters, etc., until the string itself runs out. We do this in the
following way. We introduce an integer variable I, which will denote the variant
currently under consideration, and assign zero to it (we start the consideration with
the empty subchain). Now, for each variant under consideration, we discard one
character (the asterisk) from the pattern, and from the string as many characters as
the current value of the variable I. We try to match the resulting remainders using a
recursive call of the subroutine to itself. If the result of the call is "true", we finish
the work by returning "true"; if the result is "false", we check whether we can still
increment the variable I (we must not go beyond the end of the string being matched).
If there is no room left to increase it, we terminate the loop by returning "false";
otherwise we return to the beginning of the loop and consider the next possible value
of I.
The program we wrote earlier in Pascal used the Pascal representation of strings, and
because of this it was very different from the assembly language solution we are about to
write: for example, there we had to use four parameters for the subroutine, whereas here there
will be only two parameters. Looking ahead, we note that the C solution will repeat the
assembly language solution almost word for word; it can be found in Volume 2, §4.3.22.
We will implement it in assembly language as a subroutine, which we will call
match. The subroutine will assume that it is passed two parameters - the string address
([ebp+8]) and the pattern address ([ebp+12]); the subroutine itself will use one
four-byte variable (the one we called I); it will be allocated space in the stack frame,
so it will be located at address [ebp-4]. For speed, our subroutine will copy the
addresses from the parameters into the registers ESI (the string address) and EDI
(the pattern address) at the very beginning. In addition, the subroutine will use the EAX
register to perform arithmetic operations. It will also return the result of its work
through it: number 0 as an indication of logical falsehood (no match found) or number
1 as an indication of logical truth (match found).
We will "discard" characters from the beginning of strings by simply incrementing
the string address in question: indeed, if the address string contains a string,
we can assume that the address string+1 contains the same string except for the
first letter.
The subroutine will call itself recursively, and, being called recursively, will have
to perform work on values different from those set in the previous call. In this case,
registers as local data storage will be needed both for the initial subroutine call and for
the "nested" (recursive) one, but there is only one set of registers in the processor, and
it is necessary to make sure that different "instances" of the running subroutine do not
interfere with each other.
There can be various agreements about which registers the subroutine is allowed
to "spoil" and which registers it must leave behind in the form they were in when it was
called. As an extreme case we can consider the options when the subroutine is allowed
to spoil all registers, or when it is not allowed to spoil any registers except, perhaps,
EAX, through which the result is returned. The first case forces us anywhere in the
program, when calling any subroutine, to first save on the stack all registers in which
we have stored anything important; this is a bad option, because most small subroutines
do not need all registers, so our saving and restoring will be unnecessary work, reducing
the efficiency of the program. On the other hand, imposing on all subroutines the
requirement to restore all registers is also redundant, although not that redundant.
A compromise known as the CDECL convention and used by most C compilers on
the x86 platform is quite successful. According to this convention, a subroutine is
allowed to spoil EAX, EDX, and ECX, while all other general-purpose registers must
either be left untouched or saved and restored before returning control. This choice of
registers is not accidental. The point is that EAX is spoiled every now and then,
because many operations can be performed only through it; that's why nobody uses it
for long-term data storage, and, by the way, this is the reason why EAX is used to
return values from functions. The EDX register is often used together with EAX,
either as the "high part" of a 64-bit number or to store the remainder from a
division; any multiplication or division operation will spoil this register, so it is
not suitable for long-term storage of information either.
Finally, ECX is used as a counter in all sorts of loops which also occur very often.
With this in mind, these three registers are the ones that get messed up more often than
others, and as a result, losing the values in them causes the least amount of problems
for the caller.
Our subroutine will work in accordance with CDECL: it changes the values of the
registers EBP, ESI, EDI and EAX, but EAX is allowed to be "spoiled" anyway
(we return the final value through it), so we should save only ESI, EDI and, of
course, EBP. It may seem that we could save a little by using, for example, ECX and
EDX instead of ESI and EDI, which according to CDECL we have the right to spoil;
but we call ourselves recursively, so we would still have to save these registers - not
at the beginning of the procedure, but before the recursive call to ourselves.
The text of our subroutine is as follows:

match:                      ; START OF THE SUBROUTINE
        push ebp            ; organize the stack frame
        mov ebp, esp
        sub esp, 4          ; local variable I
                            ; will be at address [ebp-4]
        push esi            ; save the ESI and EDI registers
        push edi            ; (EAX will change anyway,
                            ; the others we don't use)
        mov esi, [ebp+8]    ; load the parameters: the addresses
        mov edi, [ebp+12]   ; of the string and of the pattern
.again:                     ; this is where we come back to
                            ; compare one more character
        cmp byte [edi], 0   ; have we run out of pattern?
        jne .not_end        ; if not, jump away
        cmp byte [esi], 0   ; the pattern ran out, but the string?
        jne near .false     ; if not, return FALSE
        jmp .true           ; both ended at once: TRUE
.not_end:                   ; so the pattern hasn't run out;
        cmp byte [edi], '*' ; is there an asterisk at its beginning?
        jne .not_star       ; if not, jump away from here!
                            ; otherwise organize a loop:
        mov dword [ebp-4], 0 ; I := 0
.star_loop:                 ; prepare for the recursion;
        mov eax, edi        ; the second argument first:
        inc eax             ; the pattern after the asterisk
        push eax
        mov eax, esi        ; now the first argument:
        add eax, [ebp-4]    ; the string starting from character I
        push eax            ; (recall, [ebp-4] is I)
        call match          ; call ourselves, but
                            ; with new parameters
        add esp, 8          ; clear the stack after the call
        test eax, eax       ; what did we get back?
        jnz .true           ; non-zero, i.e. TRUE, means the rest
                            ; of the string matched the rest of
                            ; the pattern => return TRUE
        add eax, [ebp-4]    ; got 0, i.e. FALSE: we should try to
                            ; write off more characters to this
                            ; asterisk (EAX was 0, so now EAX = I)
        cmp byte [esi+eax], 0 ; but maybe the string has
                            ; already ended,
        je .false           ; then there's nothing more to try;
        inc dword [ebp-4]   ; otherwise try: I := I + 1
        jmp .star_loop      ; and back to the start of the loop
.not_star:                  ; we get here if the pattern is not
                            ; empty and does not start with '*'
        mov al, [edi]       ; maybe there's a '?'
        cmp al, '?'
        je .quest           ; if so, jump away from here;
        cmp al, [esi]       ; if not, the characters at the
                            ; beginning of the string and of the
                            ; pattern must match; if the string
                            ; has ended, this check fails too
        jne .false          ; no match (or end of string) =>
                            ; return FALSE
        jmp .goon           ; they match - keep going
.quest:                     ; the pattern starts with '?':
        cmp byte [esi], 0   ; just make sure the string
        jz .false           ; isn't over
.goon:                      ; the characters matched =>
        inc esi             ; advance along the string and
        inc edi             ; the pattern and continue
        jmp .again
.true:                      ; we jumped here to
        mov eax, 1          ; return TRUE
        jmp .quit
.false:                     ; and here to
        xor eax, eax        ; return FALSE
.quit:                      ; that's it, the job is done;
        pop edi             ; put everything back in
        pop esi             ; order before returning
        mov esp, ebp        ; control; the result
        pop ebp             ; is in EAX
        ret                 ; return control
                            ; END OF THE SUBROUTINE

If, for example, a string is located in memory labeled string and a pattern is
located in memory labeled pattern, the call to the match subroutine will look like
this:

push dword pattern
push dword string
call match
add esp, 8

After that the result of the match (0 or 1) will be in the EAX register. The text of this
example together with the main program using command line parameters can be found
in the file match.asm.
Note that at the beginning of the subroutine, when trying to jump to the .false
label, we had to explicitly specify that the jump is "near". The point is that the
.false label is a little farther away from the jump instruction than is acceptable
for a "short" jump; see the discussion on page 572.

3.4. Main features of the NASM assembler


Earlier we used the NASM assembler, limiting ourselves to general remarks and
occasionally digressing to describe some of its features that we could not do without.
Thus, in §3.1.4, exactly enough explanations were given to allow us to understand one
simple program. Later, we needed to use memory to store data, and had to devote §3.2.3
to memory reservation directives and labels. Before giving an example of a complex
subroutine in §3.3.9, we had to talk about local labels in §3.3.8.

This section will be devoted entirely to a study of the NASM assembler, beginning
with a brief description of the command-line keys used to run it and continuing with a
more formal description of its language syntax than before. After that, we will devote
a separate section to the macro processor.
3.4.1. Command line keys and options
As we have already mentioned, when calling the nasm program, you must specify
the name of the file containing the assembly language source code, and in addition, it
is usually necessary to specify the keys specifying the mode of operation. We are already
familiar with one of these keys - -f; let us remind you that it specifies the format of
the resulting code. In our case, the elf format is always used. Interestingly, if you
do not specify this key, the assembler will create the output file in a "raw" format, i.e.,
simply put, it will convert our commands into a binary representation and write them
to a file in this form. We cannot run such a file under operating systems, but if, for
example, we wanted to write a program to be placed in the boot sector of a disk, the
"raw" format would be just what we need.
The -o key specifies the name of the file to which the translation result should
be written. If we use the elf format, we can trust NASM to choose the file name:
it will drop the .asm suffix from the source file name and replace it with .o,
which is what we need in most cases. If for some reason we prefer a different
name, we can specify it explicitly with -o.
We will need the -d key after we have studied the macro processor; it is used to
define a macro symbol in case we don't want to edit the source code to do so. For
example, -dSYMBOL has the same effect as inserting the line %define SYMBOL
at the beginning of the program, and -dSIZE=1024 will not only define the symbol
SIZE but also assign it the value 1024, as the directive %define SIZE 1024
would do. We'll come back to this on page 626.
The ability to generate a so-called listing - a detailed report of the assembler on the
work done - is very interesting from an educational point of view. The listing includes
lines of source code with information about the addresses used and the final code
generated as a result of processing each source line. Listing generation is triggered by
the -l switch, followed by a file name. As an example, take any assembly language
program and translate it with the -l flag; so, if your program is called prog.asm, try
using the command
nasm -f elf -l prog.lst prog.asm

The text of the listing will be placed in the prog.lst file. Be sure to look through
the resulting file and figure out what's going on; if you don't understand something, find
someone who can help you figure it out.
The -g switch, which tells NASM to include so-called debugging information
in the translation results, can be very useful. When this key is specified,
NASM inserts into the object file, in addition to the object code, information about the
name of the source file, line numbers in it, etc. All this information is completely useless
for the program operation, especially since it can be several times larger than the
"useful" object code. However, if your program does not work as you expect, compiling
with the -g flag will allow you to use a debugger (e.g. gdb) to execute the program
step by step, which in turn will allow you to figure out what is going on.
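For example, a typical debugging session might begin like this (assuming a GNU/Linux
system with the GNU linker; the exact build commands depend on how your programs
are linked):

nasm -f elf -g prog.asm
ld -m elf_i386 -o prog prog.o
gdb ./prog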
Another useful key is -e; it instructs NASM to run our source code through the
macro processor, output the result to the standard output stream (simply put, to the
screen), and do nothing more. This mode of operation can be useful if we made a mistake when
writing a macro and can't find our mistake; when we see the result of macro-processing
our program, we will most likely understand what went wrong and why.
NASM supports other command-line keys; those who wish to learn them for
themselves can consult the documentation.

3.4.2. Syntax basics


The basic syntactic unit of almost any assembly language (and NASM is no
exception) is a line of text. This makes assembly languages different from most (though
by no means all) high-level languages, in which the line feed character is equated with
a space.
If one line is not long enough to fit everything we want into it, we can use a
means of "gluing" lines together: by placing a backslash (the "\" character) as the
last character of a line, we tell the assembler to consider the next line a continuation
of the current one. This is much better than allowing overly long lines in the
program text. We discussed the issue of program text width in §2.12.7; recall that it
is always desirable to keep within 75 characters, at most 79. Here and below, when
describing the syntax supported by NASM, by "line of text" we will also mean
"logical" lines glued together from several lines with backslashes, without mentioning
this specifically.
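For example, a long data definition can be split like this (the label table is
hypothetical):

table dd 1, 2, 3, \
         4, 5, 6 ; one logical line for the assembler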

A line of text in NASM assembly language consists (in general) of four fields:
label, command name, operands, and comment, with label, command name, and
comment being optional fields. As for the operands, the requirements for them are
imposed by the command; if the command name is missing, the operands are missing,
and if the command is specified, the operands must correspond to it. All four fields may
also be missing, in which case the string is empty. The assembler ignores empty lines,
but we can use them to visually separate parts of a program.
A word consisting of Latin letters, digits, and the characters '_', '$', '#', '@',
'~', '.' and '?' can be used as a label, and a label can begin only with a
letter or one of the characters '_', '?' and '.'. As we recall from §3.3.8, labels starting with a
dot are considered local. In addition, in some cases, you can precede the label name
with a '$' character; this is usually used if you want to create a label whose name is
the same as the name of a register, command, or directive. This may be necessary if
your program is made up of modules written in different programming languages; then
other modules may well have labels that match assembly keywords, and you will need
to refer to them somehow. The assembler is case-sensitive in label names, so, for
example, 'label', 'LABEL', 'Label', and 'LaBeL' are four different labels. A
colon character may be placed after a label if it is present in the string, but not
necessarily. As noted, programmers usually put colons after labels to which control can
be transferred, and do not put colons after labels that designate memory regions.
Although the assembler does not require this, the program is clearer when this
convention is used.
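For example, the following line (the label and the comment are hypothetical) contains
all four fields at once:

again: mov eax, [ebx] ; label, command name, operands, comment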
The command name field, if present, may contain a machine command designation
(possibly with a prefix such as rep or lock), or pseudo-commands - directives of a
special kind (we have already considered some of them and will come back to this
issue), or, finally, the name of a macro (we have also met with these, for example,
PRINT used in the examples; a separate paragraph will be devoted to the creation of
macros). Unlike labels, the assembler does not distinguish letter case in the names
of machine commands and pseudo-commands, so we can equally well write mov,
MOV, Mov and even mOv, although we should not write it that way, of course.
Macro names, as well as label names, are case-sensitive.
The requirements for the contents of the operand field depend on which specific
command, pseudo-command, or macro is specified in the command field. If there is
more than one operand, they are separated by a comma. Register names often have to
be used in the operand field, and they are case-insensitive, just as machine command
names are.
A reader who is confused about where case is important and where it is not
should remember one simple rule: the NASM assembler does not distinguish
between upper and lower case letters in all words that it has introduced itself - in
instruction names, register names, directives, pseudo-commands, operand lengths
and jump types (the words byte, dword, near, etc.) - but it treats upper and
lower case letters as different in the names introduced by the user (a programmer
writing in assembly language) - in labels and macro names.
Let us note one more NASM property related to operand writing. An operand of
type "memory" is always written using square brackets. This is not the case for
some other assemblers, which causes constant confusion.
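For example, assuming msg is a label on some data:

mov eax, msg   ; the ADDRESS msg itself (a direct operand)
mov eax, [msg] ; the four bytes in MEMORY at address msg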
A comment is indicated by a semicolon (";"). Starting from this symbol, the
assembler ignores all text up to the end of the line, which allows you to write anything
you want there. It is usually used to insert explanations into the program text for those
who will have to read the text.

3.4.3. Pseudo commands


Pseudo-commands are a number of words introduced by the NASM assembler that can
be used syntactically in the same way as the mnemonics of machine commands,
although they are not actually machine commands. Some such pseudo-commands are
db, dw, dd, resb, resw and resd, which we already know from §3.2.3; to
these are added the pseudo-commands dq, dt, resq and rest.
The letter q in their names stands for quadro - "quadruple word" (8 bytes), the letter
t comes from the word ten and stands for ten-byte memory areas. These pseudo-
commands are usually used in programs that handle floating-point numbers (simply
put, fractional numbers); moreover, dt allows only floating-point numbers as
initializers (e.g., 71.361775). In addition to the dt pseudo-command, floating-
point numbers can also be used in the dd and dq arguments; this is because the
arithmetic coprocessor can handle three formats of floating-point numbers - regular,
double-precision, and high-precision, occupying 4 bytes, 8 bytes, and 10 bytes,
respectively.
The equ pseudo-command for defining constants deserves a separate discussion.
This pseudo-command is always used in combination with a label, i.e. it is an error not
to put a label in front of it. The equ pseudo-command associates the label preceding
it with an explicitly specified number. The simplest example:

four equ 4

We have defined label four, specifying the number 4. Now, for example,

mov eax, four

is the same as

mov eax, 4

It is worth recalling that any label is nothing more than a number, but when a program
line containing the mnemonic of a machine instruction or a memory allocation directive
is labeled, the corresponding memory address (which is nothing more than a number)
is associated with such a label, whereas the equ directive allows you to specify a
number explicitly.
The equ directive is often used to associate with some name (label) the length
of an array just specified with a db, dw, or any other directive. This is accomplished
by using the pseudo-label $, which in each line where it appears denotes the current
address (253). For example, you could write it like this:

msg db "Hello and welcome", 10, 0


msglen equ $-msg

The expression $-msg, which is the difference of two numbers known to the
assembler while it works, will be calculated directly at assembly time. Since $ here
means the current address after the string has been laid out, and msg means the
address of the beginning of the string, their difference is exactly equal to the length
of the string (19 in our example). We will return to the computation of expressions
during assembly in
§3.4.5.
The times directive allows you to repeat a command (or pseudo-command) a
specified number of times. For example,

stars times 4096 db '*'

specifies a memory area of 4096 bytes filled with the '*' character code, just as
4096 identical strings containing the db '*' directive would do.
Sometimes the incbin pseudo-command can be useful; it allows you to create a
memory area filled with data from some external file. We will not consider it in
detail; the interested reader can study this directive by referring to the documentation.

   (253) More precisely, the current offset relative to the beginning of the section.

3.4.4. Constants
Constants in NASM assembly language fall into four categories: integers,
character constants, string constants, and floating point numbers.
As already mentioned, integer constants can be specified in the decimal, binary,
hexadecimal and octal number systems. If you simply write a number consisting of
digits, perhaps with a minus sign as the first character, the assembler will treat the
number as decimal; how to specify constants in other number systems is explained in
detail on page 554.
Character constants and string constants are very similar to each other; indeed,
wherever a string constant should be used, a character constant can be used. The
difference between string and character constants is only in their length: a character
constant is a constant that fits within the length of a "double word" (i.e., contains no
more than 4 characters) and can therefore be considered an alternative record of an
integer (or bit string). Both character and string constants can be written using double
quotes and apostrophes. This allows you to use apostrophe and quote characters
themselves in strings: if a string contains a quote character of one type, it is enclosed in
quotes of another type (see the example on page 555).
Symbolic constants containing less than 4 characters are considered synonymous with
integers, with the low bytes equal to the character codes of the constant and the missing high
bytes filled with zeros. When using character constants, it should be remembered that integers
in computers with i386 processors are written in reverse byte order, that is, the least significant
byte comes first. At the same time, according to the meaning of the string (and character
constant), the code of the first letter should be placed in memory first. That's why, for example,
the constant 'abcd' is equivalent to the number 64636261h: 64h is the code of
the letter d, 61h is the code of the letter a, and in both cases the byte with the value
61h is first and 64h is last. In some cases, the assembler treats as string constants
even constants which are short enough to be considered character constants. This
happens, for example, if the assembler sees a constant more than one character long
among the parameters of the db directive or a constant more than two characters long
among the parameters of the dw directive.
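For example (the labels are hypothetical):

c1 dd 'abcd'           ; a character constant, the same as dd 64636261h
s1 db "don't panic", 0 ; a string in double quotes may contain apostrophes
s2 db 'he said "hi"'   ; and the other way around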
Floating-point constants that specify fractional numbers are syntactically
distinguished from integer constants by the presence of a decimal point. Note that
integer constant 1 and constant 1.0 have nothing in common! For clarity, we
note that the bitwise record of a floating-point 1.0 single-precision number (that is, a
record that takes up 4 bytes, just as for an integer) is equivalent to the integer
3f800000h (1065353216 in decimal notation). A floating-point constant can also
be specified in exponential form, using the letter e or E. For example, 1.0e-5 is
the same as 0.00001. Note that the decimal point is still required.
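For example (hypothetical labels):

i_one dd 1      ; the integer 1: bytes 01 00 00 00
f_one dd 1.0    ; the number 1.0: bytes 00 00 80 3f, i.e. 3f800000h
small dd 1.0e-5 ; the same as 0.00001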

3.4.5. Calculating expressions at assembly time


In some cases, the NASM assembler computes arithmetic expressions it encounters
directly during assembly. It is important to realize that only the computed results are
included in the final machine code, not the computation itself. Naturally, to compute
an expression at assembly time, it is required that the expression does not contain any
unknowns: everything needed for the computation must be known to the assembler at
the time of its operation.
An expression evaluated by the assembler must be integer, that is, it must consist
of integer constants and labels, and it must use operations from the following list:
• + and - - addition and subtraction;

• * is multiplication;
• / and % - integer division and remainder of division (for unsigned integers);
• // and %% - integer division and remainder of division (for signed integers);
• &, | and ^ - bitwise "and", "or" and "exclusive or" operations;
• << and >> - bitwise shift operations to the left and right;
• unary operations - and + are used in their usual role: - changes the sign
of a number to the opposite sign, + does nothing;
• The unary operation ~ denotes bitwise negation.
When using % and %% operations, it is necessary to leave a space character after the
operation sign so that the assembler does not confuse them with macro directives (we
have already used macro directives in the examples, and we will consider them in detail
later).
Another unary operation, seg, is not applicable for us due to the absence of segments in
the "flat" memory model.
Unary operations have the highest priority, followed by multiplication, division and
remainder operations; addition and subtraction have lower priority still. Next in
descending order of priority come the shift operations, the & operation, then the ^
operation, and then the | operation, which has the lowest priority. You can change
the order of evaluation by enclosing part of the expression in parentheses.
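For example, such an expression can be combined with equ (the name flags is
hypothetical); the whole computation happens at assembly time, leaving only the
number 28h in the resulting code:

flags equ (1 << 5) | (1 << 3)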
3.4.6. Critical expressions
The assembler analyzes the source text in two passes. The first pass calculates the
size of all machine commands and other data to be placed in program memory; as a
result, the assembler determines what numerical value should be assigned to each of
the labels encountered in the program text. The second pass generates the actual
machine code and other memory contents. The second pass is needed so that, for
example, it is possible to refer to a label that appears in the text later than the reference
to it: when the assembler sees a label, say, in the jmp command, before the actual
command marked with this label is encountered, it cannot generate code on the first
pass because it does not know the numerical value of the label. On the second pass, all
the values are already known and there are no problems with code generation.
All this has a direct relation to the mechanism of expression evaluation. It is clear
that an expression containing a label can be evaluated by the assembler on the first pass
only if the label was in the text before the expression to be evaluated; otherwise, the
evaluation of the expression has to be postponed until the second pass. There is nothing
wrong with this as long as the value of the expression does not affect the size of an
instruction, the amount of memory reserved, etc., i.e. as long as the numerical values
to be assigned to labels encountered further on do not depend on the value of the
expression. If this
condition is not fulfilled, the inability to evaluate the expression on the first pass
makes it impossible to accomplish the task of the first pass - determining the
numerical values of all labels. Moreover, in some cases no number of passes would
help, even if the assembler could make them. The NASM assembler documentation
gives the following example:

times (label-$) db 0
label: db 'Where am I?'

Here, the line with the times directive should create as many null bytes as the number
of cells the label label is from the line itself - but the label label
is just as many cells away from the line as the number of null bytes that will be
created. So how many should there be?!
This makes it necessary to introduce the notion of a critical expression: it is an
expression evaluated during assembly whose value the assembler needs to know during
the first pass. The assembler considers as critical any expression that in one way or
another affects the size of anything in memory, and therefore may affect the values of
labels entered later. Only numeric constants can be used in critical expressions, as well
as labels defined higher up in the program text than the expression in question. This
ensures that the expression can be evaluated on the first pass.
Besides the argument of the times directive, the critical category includes, for
example, expressions in the arguments of the pseudo-commands resb, resw, etc., and
in some cases expressions in effective addresses, which may affect the final size of
the assembled instruction. Thus, the commands "mov eax,[ebx]",
"mov eax,[ebx+10]" and "mov eax,[ebx+10000]" generate 2 bytes, 3 bytes and
6 bytes of code, respectively, because the effective address in the first case occupies
only 1 byte, in the second case 2 bytes because of the single-byte number included in
it, and in the last case 5 bytes, of which four are used to represent the number 10000;
but how much memory will the following command take up

mov eax, [ebx+label]

if the label value has not been defined yet? However, these difficulties can be
avoided if you explicitly specify the operand size inside the effective address with
byte, word or dword. For example, if you write

mov eax, [ebx + dword label]

- then, even if the label value is not yet known, its length (and, consequently, the
length of the entire machine instruction) is already specified.
3.5. Macro tools and macro processor
3.5.1. Basic Concepts
A macroprocessor is a software tool that takes as input some text and, using the
instructions given in the text itself, partially transforms it, giving as output, in turn, a
text that no longer has instructions for transformation. As applied to programming
languages, a macroprocessor is a converter of the program source text, usually
combined with a compiler; the result of the macroprocessor's work is a text in the
programming language, which is then processed by the compiler according to the rules
of the language (see Fig. 3.6).

Fig. 3.6. Schematic diagram of macroprocessor operation


Since assembly languages are usually very poor in their representational
capabilities (compared to high-level languages), assemblers are usually equipped with
very powerful macroprocessors to compensate programmers for the inconvenience. In
particular, the NASM assembler we are considering contains an algorithmically
complete macroprocessor, which we can make write almost the whole program for us
if we wish.
We have already met macros: the frequently used PRINT and FINISH are
macros, or, more precisely, macro names. In general, a macro is a rule according
to which a program fragment containing a certain word should be transformed. The
word itself is called a macro name; often the term "macro name" is replaced by the
word "macro", although this is not entirely correct.
Before we can use a macro, it must be defined, i.e., first, we must tell the macro
processor that a certain identifier is henceforth considered a macro name (so that its
appearance in the program text requires the macro processor's intervention), and
second, we must specify the rule by which the macro processor should act when it
encounters this name. The program fragment that defines the macro is called a macro
definition. When the macro processor encounters a macro name with its parameters in
the program text (this is called a macro call), it replaces the macro name (and
possibly the parameters related to it) with a fragment of text obtained according to the
macro definition. This replacement is called a macro substitution, and the resulting
text is called a macro expansion (254).

It can also happen that the macro processor performs the transformation of the
program text without seeing any macro name, but obeys even more direct instructions
expressed in the form of macro directives. We already know one such macro directive:
it is the %include directive, which orders the macro processor to replace the
directive itself with the contents of the file specified by its parameter. Thus, the
familiar line

%include "stud_io.inc"

is replaced by whatever is in the stud_io.inc file.

   (254) The term "macro expansion" is a not entirely successful calque of the
English term macro expansion.

3.5.2. The simplest examples of macros


To give you an idea of how you can use the macro processor and what it is used
for, let us give two simple examples. As we saw in §§3.3.6, 3.3.7, and 3.3.9, writing a
subroutine call in assembly language takes several lines - two more than the number of
parameters passed. This is not always convenient, especially for people accustomed to
high-level languages. Using the macro mechanism, we can significantly reduce the
subroutine call record. For this purpose, we will describe macros pcall1, pcall2,
etc., for calling a procedure with one parameter, with two parameters, and so on,
respectively. With the help of such macros, a procedure call shrinks to a single line;
instead of

push edx
push dword mylabel
push dword 517
call myproc
add esp, 12

you could write

pcall3 myproc, dword 517, dword mylabel, edx

which, of course, is more convenient and understandable. Later, when we understand
macro definitions more deeply, we will rewrite these macros, introducing instead of
them a single macro pcall that works for any number of arguments; for now we
will limit ourselves to the special cases. So, let's write a macro definition:

%macro pcall1 2 ; 2 -- the number of macro parameters
        push %2
        call %1
        add esp, 4
%endmacro

We have described a multiline macro named pcall1, which has two parameters: the
name of the called procedure for the call command and the argument of the
procedure to be placed on the stack. The lines written between the %macro and
%endmacro directives make up the body of the macro - a template for the text that
should result from macro substitution. In the body of the macro you can use the
parameters specified when the macro is called - they are %1, %2, etc. up to %9, with
%0 representing the total number of parameters; there can be more than nine
parameters, but more on that later. In our example, the macro substitution will be quite
simple: the macro processor only needs to replace occurrences of %1 and %2 with
the first and second parameters defined in the macro call. If after such a definition in
the program text there is a line of the following form
pcall1 proc, eax

- the macro processor will treat this line as a macro call and perform the macro
substitution according to our macro definition, considering the word proc to be the
first parameter and the word eax the second, and substituting them for %1 and %2.
The result is as follows:

push eax
call proc
add esp, 4

Let's describe macros pcall2 and pcall3 in a similar way:

%macro pcall2 3
        push %3
        push %2
        call %1
        add esp, 8
%endmacro

%macro pcall3 4
        push %4
        push %3
        push %2
        call %1
        add esp, 12
%endmacro

For completeness, you can also add the pcall0 macro:

%macro pcall0 1
        call %1
%endmacro

Of course, this macro, unlike the previous ones, does not reduce the size of the program
in any way, but it allows us to write all subroutine calls uniformly. We leave the
description of the macros pcall4, pcall5, and so on up to pcall8 to the reader
as an exercise; at the same time, for self-checking, answer the question of why we
propose to stop at pcall8 and not, for example, at pcall9 or pcall12.
The example we examined used a multiline macro; as we have seen, calling a
multiline macro syntactically looks just like using machine commands or pseudo-
commands: the macro name is written instead of the command name, followed by the
parameters separated by commas. A multiline macro is always converted to one or
more lines of assembly language. But what if we need a macro to generate a part of
a single line rather than a fragment of several lines? Such a need also arises quite
regularly. For example, in the example given in §3.3.9, we can see that inside
subroutines we often have to use constructions like [ebp+12], [ebp-4], etc. to
refer to subroutine parameters and local variables. It is not so difficult to get used
to these constructions; but we can go another way, using one-line macros. Let's
start by writing the following macro definitions (255):

%define arg1 ebp+8
%define arg2 ebp+12
%define arg3 ebp+16
%define local1 ebp-4
%define local2 ebp-8
%define local3 ebp-12

In addition to these, we'll add this:


%define arg(n) ebp+(4*n)+4
%define local(n) ebp-(4*n)

Now the procedure parameter can be accessed like this:


mov eax, [arg1]

or like this (if, for example, the described macros were not enough)
mov [arg(7)], edx

In principle, we could include square brackets inside macros, so that we don't have to write
them every time. For example, if we change the definition of macro arg1 to the following:

%define arg1 [ebp+8]


then the corresponding macro call would look like this:

mov eax, arg1

We have not done this for reasons of clarity. The NASM assembler supports, as we know, the
convention that any memory access is formalized with square brackets; if there are no square
brackets, we deal with a direct or register operand. A programmer who is used to this
convention will have to make extra efforts when reading the program to remember that arg1
in this case is not a label but the name of a macro, so it is a memory access that is performed
here, not loading the label address into the register. Such things do not help the program
comprehensibility at all. Keep in mind that you yourself, even if you are the author of the
program, may forget completely what was meant in a few days, and then saving two characters
(brackets) will cost you invaluable time.

3.5.3. Single-line macros; macro variables


As you can see from the examples in the previous paragraph, a single-line macro is
a macro whose definition consists of a single line, but its call is expanded into a
fragment of a line of text (i.e., it can be used to generate part of a line). Note that a
once-defined macro can be redefined if necessary by simply inserting another definition
of the same macro into the program text. From the moment the macro processor "sees"
the new definition, it will use it instead of the old one. With this in mind, the same
macro name can mean different things in different places in the program and be
expanded into different text fragments. Moreover, a macro can be removed altogether
by using the %undef directive; upon encountering such a directive, the macro
processor will immediately "forget" that the macro exists. An interesting question is
what happens if one macro definition uses a call to another macro, and the latter is
occasionally redefined.

   (255) Here and in the examples below we assume that all procedure parameters and
all local variables are always "double words", i.e., 4 bytes in size; in reality, of
course, this is not always the case, but we are now more concerned with the
illustrative value of the example.
If we use the familiar %define directive to describe a one-line macro A and
use a call of another macro B in its body, this macro call is not expanded in the
definition itself; the macro processor leaves the occurrence of B as it is until it
encounters a call of macro A. When the macro substitution for A is performed, its
result will contain B, and the macro processor will in turn perform a macro
substitution for it. Obviously, this will use the definition of macro B that was
current at the moment A was substituted (not at the moment A was defined).
Let's explain the above with an example. Let's assume that we have entered two
macros:

%define thenumber 25
%define mkvar dd thenumber

If you now write in the program the line

var1 mkvar

— then the macro processor will first perform a macro substitution for mkvar,
obtaining the string

var1 dd thenumber

— and from it, in turn, by macro substitution thenumber will get the string

var1 dd 25

If you now override thenumber and call mkvar again:

%define thenumber 36
var2 mkvar

— then the result of the macroprocessor will be a string containing exactly the number
36:
var2 dd 36

- even though we have not changed the mkvar macro itself: at the first step,
dd thenumber will be obtained just as last time, but thenumber now has the
value 36, and that is what will be substituted. This macro substitution strategy is
called "lazy" (256).

However, the NASM assembler also allows another strategy, called eager, for
which the %xdefine directive is provided. This directive is completely analogous
to the %define directive with the only difference that if macro calls occur in the
body of the macro definition, the macro processor performs their macro substitutions
immediately, i.e. right at the moment the macro definition is processed, without
waiting for the user to call the macro being defined. Thus, if in the above example you
replace the %define directive in the definition of the mkvar macro with %xdefine:

%define thenumber 25
%xdefine mkvar dd thenumber
var1 mkvar
%define thenumber 36
var2 mkvar

- then both resulting strings will contain the number 25:


var1 dd 25
var2 dd 25

The redefinition of the thenumber macro cannot affect the operation of the mkvar
macro now, because the body of the mkvar macro does not contain the word
thenumber this time: while processing the definition of mkvar, the macro
processor substituted its value (25) instead of the word thenumber.
Sometimes it is necessary to associate with a macro name not just a string, but the
number resulting from the calculation of an arithmetic expression. The NASM
assembler allows you to do this using the %assign directive. Unlike %define and
%xdefine, this directive not only performs all substitutions in the body of the macro
definition, but also attempts to evaluate the body as an ordinary integer arithmetic
expression. If it fails, an error is reported. Thus, if you write in the program first

%assign var 25

and then

%assign var var+1

- then as a result the value 26 will be associated with the var macro name, which
will be substituted if the macro processor encounters the word var in the further
program text.
Names introduced by the %assign directive are commonly referred to as macro
variables. This is an important tool: it makes it possible to give the macro processor
what amounts to an entire program, whose result can be assembly language text -
generally speaking, text of any length.

   (256) The name is a loan translation of the English "lazy" and is partially
justified by the fact that the macro processor seems to be too "lazy" to perform the
macro substitution (in this case, of the macro thenumber) until it is forced to do so.

3.5.4. Conditional compilation


Often when developing programs, there is a need to create different versions of an
executable file using the same source code. Suppose we write custom programs and we
have two customers Petrov and Sidorov; their programs are almost identical, but each
of them has specific requirements that the other does not have. In such a situation, of
course, we would like to have and maintain one source code: otherwise we will have
two copies of the same code, and we will have to fix, for example, every found error in
two places. However, when compiling a version for Petrov, we should exclude
fragments intended for Sidorov, and vice versa.
There are other such situations; we have already seen one of them, debugging
printing, in our study of Pascal (see §2.13.3). There we also looked at conditional
compilation directives, noting at the same time that they look rather strange in Pascal.
The reason for this "strangeness" is very simple: Pascal does not have a macro
processor, and conditional compilation is usually implemented by the macro processor,
if, of course, it is provided in the language. Our NASM assembler does the same - its
macro processor directives include conditional compilation directives.
Let's consider an example related to debugging. Suppose we have written a
program, compiled it and run it, but it crashes, and we cannot understand the reason,
but we think that the crash occurs in some "suspicious" fragment. To check our
assumption, we want to print the corresponding messages immediately before entering
this fragment and immediately after exiting it. To avoid having to erase these messages
several times and insert them again, we will use conditional compilation directives. It
will look like this:

%ifdef DEBUG_PRINT
        PRINT "Entering suspicious section"
        PUTCHAR 10
%endif
;
; here comes the "suspicious" part of the program.
%ifdef DEBUG_PRINT
        PRINT "Leaving suspicious section"
        PUTCHAR 10
%endif

Here %ifdef is one of the conditional compilation directives, meaning "compile
only if the specified one-line macro is defined" (in this case it is the DEBUG_PRINT
macro). Now we should insert a line defining this symbol at the beginning of the
program:

%define DEBUG_PRINT

Then at startup NASM will "see" and compile the fragments of our source code
enclosed between %ifdef and %endif; when we have found the error and no longer
need the debug print, it will be enough to remove this %define from the beginning
of the program or even just to put a comment sign in front of it:

;%define DEBUG_PRINT

- and the fragments framed by the corresponding directives will simply be ignored by
the macro processor, so you can leave them in the program text in case they are
needed again.
When studying similar tools in Pascal, we noted that to enable and disable debug
printing framed by conditional compilation directives, you can do without editing the
source code at all. This is also possible when working with assembly language. As we
saw in §3.4.1, a macro character can be defined with the NASM command line key; in
particular, the debug print from our example can be enabled by running NASM like
this:

nasm -f elf -dDEBUG_PRINT prog.asm

This saves us from having to insert the %define directive into the source text and
then delete it.
Returning to the two-customer situation, we can envision constructions like the
following in the program:

%ifdef FOR_PETROV
;
; here's the code for Petrov only.

%elifdef FOR_SIDOROV
;
; and here, just for Sidorov

%else
; if neither symbol is defined,
; abort compilation and generate an error message
%error Please define either FOR_PETROV or FOR_SIDOROV
%endif

(as you can easily guess, the %elifdef directive allows you to shorten the record
requiring %else and %ifdef, saving one %endif at the same time). When
compiling such a program, it will be necessary to specify the -dFOR_PETROV or -
dFOR_SIDOROV key, otherwise NASM will start processing the fragment located
after %else and will generate an error message when it encounters the %error
directive.
In addition to checking for the presence of a macro character, it is also possible to
check for the absence of a macro character (i.e. the exact opposite condition). This is
done with the %ifndef (if not defined) directive. As with %ifdef, there is a
shortened version of the %else construct for %ifndef - the %elifndef
directive.
It is not only the presence or absence of a macro that can be used to specify the
condition under which a fragment should or should not be compiled; NASM supports
other conditional compilation directives. The most common is the %if directive, in
which the condition is specified by an arithmetic-logic expression that is evaluated at
compile time. We have already met such expressions in §3.4.5; to form logical conditions, the set of allowed operations is extended by =, <, >, >=, <= in their usual sense; the "not equal" operation can be written as <>, as in Pascal, or as !=, as in C; the C-like spelling of the "equal" operation as two equal signs == is also supported. In addition, the logical connectives && ("and"), || ("or") and ^^ ("exclusive or") are available. Note that all expressions used in the %if directive are considered critical (see §3.4.6). As with all other %if-directives, there is a shortened form of the %else construct, the %elif directive.
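
For illustration, here is a small sketch of ours (the symbol BUFSIZE and the chosen bounds are arbitrary) showing how such compile-time expressions combine with %error:

%assign BUFSIZE 512
%if BUFSIZE < 64 || BUFSIZE > 65536
%error BUFSIZE must be between 64 and 65536
%endif
section .bss
buffer  resb BUFSIZE

If someone later changes the %assign line to an unreasonable value, translation stops with a clear message instead of silently producing a buffer of the wrong size.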
Let us briefly list the other conditional directives supported by NASM. The
%ifidn and %ifidni directives take two comma-separated arguments and
compare them as strings, making macro substitutions in the text of the arguments if
necessary. The code fragment following these directives is translated only if the strings
are equal, and %ifidn requires an exact match, whereas %ifidni ignores case
and considers, for example, the strings foobar, FooBar and FOOBAR to be
the same. The %ifnidn and %ifnidni directives can be used to check the
opposite condition; all four directives have %elif forms, %elifidn,
%elifidni, %elifnidn and %elifnidni, respectively. The %ifmacro
directive checks for the existence of a multiline macro; the %ifnmacro,
%elifmacro, and %elifnmacro directives are supported. The %ifid,
%ifstr, and %ifnum directives check whether their argument is an identifier,
string, or numeric constant, respectively. As usual, NASM supports all optional forms
of %ifnXXX, %elifXXX, and %elifnXXX for all three directives.
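
As a hedged sketch of how these checks might be used (the macro name emit is ours), here is a macro that stores its argument differently depending on what kind of token it receives:

%macro emit 1
%ifstr %1
        db %1, 0        ; a string: store its characters and a terminating zero
%elifnum %1
        dd %1           ; a numeric constant: store it as a double word
%else
%error emit expects a string or a numeric constant
%endif
%endmacro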
In addition to those listed above, NASM supports the %ifctx directive and the
corresponding forms, but the explanation of its operation is rather complicated and we will not
discuss this directive.

3.5.5. Macro-repetitions
The NASM assembly macroprocessor allows you to repeatedly (cyclically) process
the same code fragment. This is achieved by the %rep (from the word repetition) and
%endrep directives. The %rep directive takes one mandatory parameter, which
means the number of repetitions. The code fragment between the %rep and
%endrep directives will be processed by the macroprocessor (and assembler)
as many times as the number of times specified in the %rep directive parameter. In
addition, the %exitrep directive may occur between the %rep and %endrep
directives, which terminates macro-repeat execution prematurely.
Let's consider a simple example. Suppose we need to describe a memory area
consisting of 100 consecutive bytes, with the first byte containing the number 50,
the second byte containing the number 51, etc., and the last byte containing
the number 149. Of course, you can just write a hundred lines of code:

        db 50
        db 51
        db 52
        ...
        db 148
        db 149

- but this is, firstly, tedious and, secondly, takes too much space in the program text. It
would be more correct to entrust the generation of this code to the macroprocessor,
using macro-repetition and macro-variable:

%assign n 50
%rep 100
        db n
%assign n n+1
%endrep

Upon encountering such a fragment, the macroprocessor will first associate the value 50 with the macro variable n, and then process the two lines between %rep and %endrep one hundred times; each pass over these lines generates the next line to be assembled - db 50, db 51, db 52, and so on; the number changes because the value of the macro variable n increases by one at each pass of the macro-repetition. In other words, as a result of processing this fragment by the macroprocessor, exactly one hundred lines of code of the kind shown above will be obtained, and it is these lines that will be assembled.
Let's consider a more complicated example. Suppose we need a memory area containing, sequentially as four-byte integers, all Fibonacci numbers [257] not exceeding 100 000. The corresponding sequence of dd directives can be generated using this code fragment:

fibonacci:
%assign i 1
%assign j 1
%rep 100000
%if j > 100000
%exitrep
%endif
        dd j
%assign k j+i
%assign i j
%assign j k
%endrep
fib_count equ ($-fibonacci)/4

[257] Recall that Fibonacci numbers are a sequence of numbers starting with two ones, each subsequent number of which is obtained by adding the previous two: 1, 1, 2, 3, 5, 8, 13, 21, 34, etc.

The label fibonacci will be associated with the start address of the generated
memory region, and the label fib_count will be associated with the total number
of numbers placed in this memory region (we have already encountered this technique,
see page 614).
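
For illustration (a sketch of ours, not from the examples above), code that traverses the generated table, summing the numbers, could look like this:

        mov esi, fibonacci      ; address of the first number
        mov ecx, fib_count      ; how many numbers the table holds
        xor eax, eax            ; clear the accumulator
sum_lp: add eax, [esi]          ; add the current number to the sum
        add esi, 4              ; advance to the next double word
        loop sum_lp             ; decrement ECX and repeat while not zero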

You can use macro-repeats not only to generate memory areas filled with numbers,
but also for other purposes. For example, suppose we have an array of 128 two-byte
integers:
array resw 128

and we want to write a sequence of 128 inc commands incrementing each of the
elements of this array by one. We can do it this way:
%assign a 0
%rep 128
        inc word [array + a]
%assign a a+2
%endrep

The reader might note that using 128 instructions in such a situation is irrational and it would
be more correct to use a runtime loop like this:

        mov ecx, 128
lp:     inc word [array + ecx*2 - 2]
        loop lp

In most cases this option is indeed preferable, because these three commands naturally occupy several tens of times less memory than a sequence of 128 inc commands; but you should keep in mind that such code runs about one and a half times slower, so in some cases using a macro loop to generate a sequence of identical commands (instead of a runtime loop) may make sense.

3.5.6. Multi-line macros and local labels


Now let's return to multiline macros; such macros generate a text fragment consisting of several lines rather than a fragment of a line. The description of a multiline macro also occupies several lines, enclosed between the %macro and %endmacro directives. In §3.5.2 we already considered the simplest examples of multiline macros, but for a slightly more complex macro the means described so far will not be enough. Suppose, for example, we want to describe a macro zeromem that takes two parameters as input - the address and the length of a memory area - and expands into code that fills this memory with zeros. Without thinking much about what is going on, we could write, for example, the following (incorrect!) code [258]:
%macro zeromem 2        ; (two parameters - address and length)
        mov ecx, %2
        mov eax, %1
lp:     mov byte [eax], 0
        inc eax
        loop lp
%endmacro

[258] Here and until the end of the paragraph, we use the EAX and ECX registers without saving their contents; we will assume that our macros follow the same conventions as subroutines written according to CDECL (see page 607).
NASM will accept this description and even allow us to make one macro call. If at least
two calls to the zeromem macro occur in our program, we will get an error message
when we try to translate the program - NASM will complain that we use the same label
(lp:) twice. Indeed, at each macro call, the macro processor will insert the whole body
of our macro definition instead of the call, only replacing %1 and %2 with the
corresponding parameters and keeping everything else unchanged. So, the string

lp: mov byte [eax], 0

containing the lp label will be encountered by the assembler (after macro processing) twice - or, more precisely, exactly as many times as the zeromem macro is called.
It is clear that some mechanism is needed to localize the label used within a multi-
line macro, so that such labels, obtained by calling the same macro in different places
of the program, do not conflict with each other. In NASM, this mechanism is called
"local labels in macros". To use it, you should start the label name with two %
characters - so in the above example, both occurrences of the label lp should be
replaced with %%lp. This label will be replaced with a new (non-repeating) identifier
in each subsequent macro call: the first time the zeromem macro is called,
NASM will replace %%lp with ..@1.lp, the second time with ..@2.lp, and so on.
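
For example, here is a small sketch of ours using this mechanism: a macro that replaces the value in EAX with its absolute value. Without %%, a second call of the macro would again produce a duplicate label:

%macro abs_eax 0
        test eax, eax   ; sets SF if the value in EAX is negative
        jns %%done      ; non-negative: nothing to do
        neg eax         ; negate the value
%%done:
%endmacro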
Let us note one more disadvantage of the above definition of zeromem. If, when calling this macro, the user (the programmer using our macro, or perhaps we ourselves) specifies the ECX register as the first parameter (the start address of the memory area) or the EAX register as the second parameter (the length of the memory area), the macro call will be successfully translated, but the program will not work as expected. Indeed, if you write something like

section .bss
array resb 256
arr_len equ $-array

section .text

        mov ecx, array
        mov eax, arr_len

zeromem ecx, eax


; ...

— then the beginning of the zeromem macro will expand into the following code:

        mov ecx, eax
        mov eax, ecx

; ...

- as a result of which, obviously, both registers ECX and EAX will contain the length of the array, and the address of its beginning will be lost. Most likely, such a program will crash after reaching this code fragment.
To avoid such problems, you could use conditional compilation directives, checking whether the first parameter happens to be the ECX register and the second the EAX register; but you can also do something simpler - load the parameter values into the registers by temporarily writing them to the stack, i.e. instead of

mov ecx, %2
mov eax, %1

write

        push dword %2
        push dword %1
        pop eax
        pop ecx

Finally, our macro definition will take the following form:

%macro zeromem 2        ; (two parameters - address and length)
        push dword %2
        push dword %1
        pop eax
        pop ecx
%%lp:   mov byte [eax], 0
        inc eax
        loop %%lp
%endmacro

3.5.7. Macros with variable number of parameters


When describing multiline macros using the %macro directive, the NASM assembler allows you to specify a variable number of parameters. This is done using the "-" character, which here acts as a range dash. For example, the directive

%macro mymacro 1-3

specifies a macro that accepts from one to three parameters, and the directive

%macro mysecondmacro 2-*

specifies a macro that allows an arbitrary number of parameters, not less than two.
When working with such macros, the designation %0 may be useful: during macro expansion the macro processor substitutes for it a number equal to the actual number of parameters.
Recall that the parameters of a multiline macro are denoted in its body as %1, %2, etc., but NASM does not provide indexing facilities (i.e., a way to extract the n-th parameter, where n is computed during macro substitution). How can parameters be used in this case, when even their number is not known in advance? The problem is solved by the %rotate directive, which allows you to renumber the parameters. Let's consider the simplest version of the directive:
%rotate 1

The numeric parameter indicates by how many positions the parameter numbers should be shifted. In this case it is the number 1, so the parameter previously designated %2 will after this directive have the designation %1; in turn, the former %3 will turn into %2, etc.; and the parameter which was first and had the designation %1 will, due to the "cyclicity" of our shift, receive a number equal to the total number of parameters. The designation %0 does not participate in the rotation and does not change in any way.
If the /rotate directive is given a negative parameter, it will perform a cyclic
shift in the opposite direction (to the left). Thus, after

%rotate -1

%1 will denote the parameter that was the last, %2 will denote the parameter that was the first (i.e., previously labeled %1), and so on.
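
To see %0, %rep and %rotate working together before we tackle pcall, consider a small sketch of ours: a macro that lays down a counted list of byte values:

%macro blist 1-*        ; any number of byte values
        db %0           ; first store how many values follow
%rep %0
        db %1           ; store the current first parameter
%rotate 1               ; the former %2 becomes %1, and so on
%endrep
%endmacro

A call like blist 5, 7, 9 expands into db 3 followed by db 5, db 7 and db 9; note that after the loop the parameters are back in their original order, because %0 single-position rotations make a full circle.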
Recall that earlier (see page 620) we promised to write the pcall macro, which allows us to form a call to a subroutine with any number of arguments in one line. Now, having at our disposal macros with a variable number of arguments and the %rotate directive, we are ready to do it. Our macro, which we will simply call pcall, will take as input the address of the procedure (the argument to the call command) and an arbitrary number of parameters to be placed on the stack. We will, as before, assume for simplicity that each parameter occupies exactly 4 bytes. Recall that the parameters must be placed on the stack in reverse order, starting from the last one. We will achieve this by using the %rep macro loop and the %rotate -1 directive, which at each step makes the (currently) last parameter number 1. The number of iterations of the loop is one less than the number of parameters passed to the macro, because the first parameter is the name of the procedure and it does not need to be stacked. After this loop, we will have to turn the last parameter into the first one once more (it will be the very first of all the parameters, i.e. the address of the procedure), make the call, and then insert an add command to clear the stack of the parameters. So, let's write:

%macro pcall 1-*        ; from one parameter to as many as you like
%rep %0 - 1             ; loop over all parameters except the first
%rotate -1              ; the last parameter becomes %1
        push dword %1
%endrep
%rotate -1              ; the procedure address becomes %1
        call %1
        add esp, (%0 - 1) * 4
%endmacro

If you now call this macro, for example, like this:

pcall myproc, eax, myvar, 27


- then the result of the substitution will be the following fragment:

        push dword 27
        push dword myvar
        push dword eax
        call myproc
        add esp, 12

as requested.

3.5.8. Macrodirectives for working with strings


The NASM assembler supports two directives for converting strings (string constants) during macro processing. They may be useful, for example, inside a multiline macro one of whose parameters is (must be) a string that has to be transformed somehow. The first of the directives, %strlen, allows you to determine the length of a string. The directive has two parameters: the first is the name of the macro variable to which the number corresponding to the string length should be assigned, and the second is the string itself. Thus, as a result of executing

%strlen sl 'my string'

the macro variable sl will get the value 9.


The second directive, %substr, allows you to select a character with a given number
from a string. For example, after executing

%substr var1 'abcd' 1
%substr var2 'abcd' 2
%substr var3 'abcd' 3

macro variables var1, var2 and var3 will get the values 'a', 'b' and 'c'
respectively, i.e. the effect will be the same as if we had written

%define var1 'a'
%define var2 'b'
%define var3 'c'

All this makes sense, as a rule, only when the directive receives as its argument either the name of a macro variable or the designation of a positional parameter of a multiline macro. Recall that all macro directives are executed during macro processing (before compilation, i.e. long before our program runs), so, of course, at the moment of the corresponding macro substitutions all the strings used must already be known.
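
As a closing sketch (ours, with the macro name pstring chosen arbitrarily), the two directives can be combined inside a multiline macro, for example to lay down a Pascal-style counted string:

%macro pstring 1        ; the parameter must be a string constant
%strlen plen %1         ; plen is (re)assigned at every call
        db plen         ; the length byte comes first
        db %1           ; then the characters themselves
%endmacro

so that pstring 'Hello' expands into db 5 followed by db 'Hello'.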

3.6. Interaction with the operating system


In this chapter, we will consider the means of interaction between the user program
and the operating system, which will allow us to abandon the use of macros from the
stud_io.inc file in the future, and, if desired, to create their analogues ourselves.
User tasks access the operating system kernel using so-called system calls, which in turn are implemented via software interrupts - if that term has any right to exist at all; such terminology is, however, characteristic of the i386 platform we are studying. To understand what this is all about, we will have to discuss in detail what interrupts actually are, what they are used for, and where the strange term "software interrupt" (which, note, does not interrupt anything in any way) comes from. Therefore, we will devote the first four paragraphs of this chapter to presenting the necessary theoretical information, and only then, having a ready base, will we consider the mechanism of system calls of the Linux and FreeBSD operating systems at the level of machine commands.
Figure 3.7. Simultaneous execution of tasks on one processor

Figure 3.8. Processor idle time in a single-task system

3.6.1. Multitasking and its main types


As mentioned in the introduction, multitasking (multiprogramming mode) is a
mode of operation of a computing system in which several programs can be executed
in the system simultaneously. For this, generally speaking, you do not need multiple
physical processors. A computer system may have only one processor, which does not
prevent the realization of the multiprogramming mode itself. Anyway, the number of
processors in the system is in general less than the number of simultaneously executed
programs. It is clear that a processor can execute only one program at each moment of
time. What, in this case, is meant by multiprogramming?
The seeming paradox is resolved by introducing the following definition of
simultaneity for the case of running programs (processes or tasks): two tasks running
on the same computer system are called simultaneous if their execution periods
(time interval from the moment of start to the moment of completion of each of
the tasks) fully or partially overlap.
In other words, if the processor, while working on only one task at any given instant, switches between several tasks, paying attention now to one and now to another, these tasks will be considered to be executed simultaneously according to our definition (see Fig. 3.7).
In the simplest case, multitasking allows you to solve the problem of CPU idle time
during I/O operations. Let's imagine a computer system in which a single task (for
example, calculation of a complex mathematical model) is running. At some moment
of time the task may require a data exchange operation with some external device (for
example, reading another block of input data or, on the contrary, writing final or
intermediate results). The speed of external devices (disks, etc.) is usually orders of
magnitude slower than the speed of the central processor, and in any case is by no
means infinite. Thus, to read a given block of data from a disk, you have to turn on the head drive to move the heads to the right position (to the right track) and wait for the disk itself to rotate to the right angle (to work with the given sector); then, while the sector passes under the head, the data written in this sector is read into the internal buffer of the disk controller [272]; finally, the read data should be placed in the memory area where the user program expects it to appear, and only then can control be returned to the program. During all this time (at least the time spent on moving the heads and waiting for the necessary phase of disk rotation) the central processor will at best be idle, and most likely it will have to continuously poll the controller in a loop for readiness (Fig. 3.8).

[272] Reading directly into RAM is theoretically possible, but technically difficult and rarely used.
All this does not create problems if we have only one task and there is nothing else
for the processor to do, but if besides the task that is already running, we have other
tasks waiting for their turn, then it is better to spend the CPU time on solving those other tasks rather than let it go to waste waiting for the end of I/O operations. That is exactly what multitasking operating systems do. In such a system, a queue is formed from the tasks that need to be solved. As soon as the active task requests an I/O operation, the operating system performs the actions necessary to start the device controllers executing the requested operation, or puts the requested operation into a queue if for some reason it cannot be started immediately; after that the active task is replaced by another one - either a new one (taken from the queue) or one started earlier that has not yet had time to complete. The replaced task is then considered to have entered the state of waiting for the I/O result, or the blocked state.
In the simplest case, a new active task remains in execution until it either terminates or requests an I/O operation in its turn. In this scheme the blocked task, at the end of its I/O operation, goes from the blocked state to the state of readiness for execution, but there is no immediate switch to it (see Fig. 3.9); this is due to the fact that the operation of changing the active task, generally speaking, consumes a noticeable amount of processor time.

Figure 3.9. Batch OS

Such a way of constructing multitasking, when the active task is changed only upon its termination or its request for an I/O operation, is called batch mode [273], and operating systems realizing this mode are called batch operating systems.

[273] The Russian-language term for this mode (literally "packet mode") is a well-established, though not very successful, translation of the English term "batch mode"; the word batch can also be translated as "deck" (in fact, originally decks of punched cards representing jobs were meant). This term should not be confused with words derived from the English word packet, which is also usually translated into Russian as "packet".

The batch multitasking mode is the most efficient from the point of view of using the processing power of the central processor; that is why batch mode is used to control supercomputers and other machines whose main purpose is large volumes of numerical computations.
With the appearance of the first terminals and dialog (in other words, interactive)
mode of work with computers, there was a need for other strategies for changing active
tasks, or, as it is commonly said, scheduling the CPU time. Indeed, a user dialoguing
with one or another program will hardly want to wait until some active task calculating,
say, an inverse matrix of the order of 1000x1000, finishes its work. At the same time,
a lot of CPU time is not required to service the dialog with the user: in response to each
user action (e.g., pressing a key), it is usually necessary to perform a set of actions
within a few milliseconds, while the user can create no more than three or four such
events per second even in the active typing mode (the speed of computer typing of 200
characters per minute is considered quite high). It would be illogical to wait for the user
to completely finish his dialog session: most of the time the processor could perform
arithmetic operations to calculate the matrix.

Time-sharing mode helps to solve this problem. In this mode, each task is allocated
a certain amount of work time, called a time quantum. At the end of this quantum, if
there are other tasks ready for execution in the system, the active task is forcibly
suspended and replaced by another task. The suspended task is placed in the queue of
tasks ready for execution and stays there while the other tasks work off their quanta;
then it gets another quantum of time to work again, and so on. Naturally, if the active
task has requested an I/O operation, it is put into the blocking state (just like in the batch
mode). Tasks in the blocked state are not queued for execution and do not receive time
quanta until the I/O operation is completed (or another reason for the blocking
disappears) and the task moves to the ready for execution state.
There are various algorithms of execution queue support, including those in which
tasks are assigned a certain priority, expressed as a number. For example, a task can be
assigned two priority components - static and dynamic; the static component represents
the level of "importance" of execution of this particular task assigned by the
administrator, while the dynamic component is changed by the scheduler: while the
task is being executed, its dynamic priority decreases, while when it is in the execution
queue, the dynamic component of priority, on the contrary, increases. Out of several
tasks ready for execution, the one with the highest sum of priorities is selected, so that
sooner or later even the task with the lowest static priority will get control at the expense
of the increased dynamic priority.
Some operating systems, including early versions of Windows, used a strategy intermediate between batch mode and time-sharing. In these systems tasks were allocated a time quantum, as in time-sharing systems, but there was no forced change of the current task when the quantum expired; the system only checked whether the current task's quantum had expired when the task itself accessed the operating system (not necessarily for I/O). A task that did not need the services of the operating system could remain on the processor for as long as it wanted, just as in batch operating systems. This mode of operation is called non-preemptive (cooperative). It is not used in modern systems because it imposes too strict requirements on the programs running in the system; for example, in early versions of Windows any program engaged in long calculations blocked the work of the whole system, and a looped task made it necessary to reboot the computer.
Sometimes the time-sharing mode is also unsuitable. In some situations, such as
controlling the flight of an airplane, a nuclear reactor, an automatic production line,
etc., some tasks must be completed strictly before a certain point in time; for example,
if the autopilot of an airplane, receiving a signal from the pitch and roll sensors, takes
more time than allowed to calculate the necessary corrective action, the airplane may
lose control altogether.
When the tasks being performed (at least some of them) have strict time limits for
completion, real-time operating systems are used. Unlike time-sharing systems, the
task of a real-time scheduler is not to let all programs work for a certain amount of time,
but to ensure that each task is completed in the time allotted to it, and if this is
impossible - to remove the task, freeing the processor for those tasks that can still be
completed by the deadline. In real-time systems, what is more important is not the total
number of tasks solved in the system in a fixed amount of time (called system
performance), but the predictability of the execution time for each individual task.
Scheduling in real-time systems is a rather complex section of the computational
sciences, worthy of a separate book and obviously beyond the scope of our textbook. It
is unlikely that you will ever encounter real-time systems in practice, at least as a
programmer; if you do, you will need to spend time studying specialized literature, but
this is the case in any specific area of engineering.

3.6.2. Hardware support for multitasking


It is clear that in order to build a multitasking mode of operation of a computer
system, the hardware (first of all, the central processor itself) must have certain
properties. Some of them we have already mentioned in §3.1.2 - firstly, memory
protection, and secondly, division of machine commands into ordinary and privileged
ones, with the exception of the possibility to execute privileged commands in the
limited mode of operation of the central processor.
Indeed, when several programs are simultaneously located in the memory of the
machine, if no special measures are taken, one of the programs may modify the data or
code of other programs or the operating system itself. Even if we assume that there is
no malicious intent on the part of the developers of the programs being run, this will
not save us from accidental errors in programs, and such an error may, on the one hand,
lead to severe crashes of the whole system, and on the other hand - be completely
elusive, up to the absolute impossibility to establish which of the tasks is "guilty" of
what is happening. The point is that in order to detect and eliminate an error it is
necessary to reproduce the circumstances under which it occurs, and it is practically
impossible to recreate the state of the whole system with all the tasks running in it.
Obviously, we need means to limit the possibilities of a running program to access
memory areas occupied by other programs. Such protection can be implemented
programmatically only by interpreting the whole machine code of an executing
program, which is usually inadmissible for efficiency reasons. Consequently, hardware
support of memory protection is required to limit the current task's ability to access the
main memory.
As long as memory protection exists, the processor must support commands to
control that protection. If, again, no special measures are taken, such commands can be
executed by any of the running programs, removing memory protection or modifying
its configuration, which would make memory protection itself practically meaningless.
The problem under consideration concerns not only memory protection but also work with external devices. As we have already mentioned, to ensure normal interaction of all programs with I/O devices, the operating system must take direct work with the devices upon itself, providing user programs with an interface for requesting device-related services; user programs must not be able to address the devices directly. Therefore, it is necessary to prohibit user programs from executing the processor commands that read and write I/O ports. In general, when transferring control to a user program, the operating system must be sure that the task cannot (except by appealing to the operating system itself) perform any actions affecting the system as a whole.
The problem, as we already know [274], is solved by introducing two modes of CPU operation: privileged and restricted. In the literature, the privileged mode is often called "kernel mode" or "supervisor mode", while the restricted mode is also called "user mode" or simply unprivileged mode. We have chosen the term restricted mode as the one most accurately describing the essence of this mode of the central processor's operation without tying it to its use by operating systems. In privileged mode the processor can execute any existing commands. In restricted mode, execution of commands affecting the system as a whole is prohibited; only commands whose effect is limited to modifying data in memory areas not covered by memory protection are allowed. The operating system itself is executed in privileged mode; user programs are executed in restricted mode.

[274] See footnote 6 on p. 527.
As we noted in §3.1.2, a user program can only modify data in its allocated
memory; any other actions require a call to the operating system. This is ensured by the
CPU's support of a memory protection mechanism and limited mode of operation.
These two hardware requirements, however, are not yet sufficient to realize a
multitasking system.
Let's return to the I/O operation situation. In a single-task system (Figure 3.8 on
page 636), during the execution of an I/O operation, the CPU could continuously poll
the device controller for its readiness (whether the required operation has been
completed) and then prepare everything to resume the active task - in particular, copy
the read data from the controller buffer to the memory belonging to the task. It should
be noted that in this case the processor would be continuously busy during the I/O
operation, despite the fact that it would not be performing any useful computations.
This mode of interaction is called active waiting. Clearly, the processor time could have
been used more usefully.
When switching to the multitask processing shown in Fig. 3.9 on page 638,
another problem arises. When an I/O operation is completed, the processor is busy
executing the second task. Meanwhile, the moment an operation is completed, at a
minimum, the first task must be moved from the blocked state to the ready state; other
actions may be required, such as copying data from the controller buffer, resetting the
controller (e.g., turning off the disk motor), and, in more complex situations, initiating
another I/O operation that was previously deferred (this could be a read operation from
the same disk that was requested by another task while the first operation was still in
progress). All this has to be done by the operating system. But how will it know when
an I/O operation is completed if the processor is busy performing another task and does
not continuously poll the controller?
The problem can be solved by the interrupt device. In the case of a disk operation,
at the moment of its completion, the disk controller gives the CPU a certain signal
(electrical impulse) called an interrupt request. The central processor, having received
this signal, interrupts the execution of the active task and transfers control to the
operating system procedure that performs all actions necessary at the end of the I/O
operation. Such a procedure is called an interrupt handler. After the interrupt handler
procedure is completed, control is returned to the active task.
To realize batch multitasking it is enough to implement interrupts, memory
protection and two modes of processor operation at the hardware level. If it is necessary
to create a time-sharing system or even more so a real-time system, it is necessary to
have one more component in the hardware - a timer. The scheduler of a time-sharing
operating system needs the ability to track the expiration of time quanta allocated to
user programs; in a real-time system such a feature is also needed, and the requirements
for it are even more stringent: if a task active at that moment is not removed from the
processor in time, the scheduler risks not having time to allocate the required processor
time to more important programs, which is fraught with unpleasant consequences
(remember the example with the autopilot of an airplane). A timer is a relatively simple
device, the whole work of which is reduced to generating interrupts at regular intervals.
These interrupts allow the operating system to get control, analyze the current state of
the available tasks and, if necessary, change the active task.
So, in order to implement a multitasking operating system, the computer hardware
must support:
• interrupt machine;
• memory protection;

• privileged and restricted modes of central processor operation;


• timer.
The first three properties are necessary in any multitasking system, the last one may be
absent in case of batch layout, although in real existing systems the timer is always
present. It should be noted that only the timer is a separate device, the rest are features
of the CPU.
Theoretically, if a timer is present, it is possible to make the timer interrupt the only
interrupt in the system. The operating system, having received control as a result of such an
interrupt, will have to poll all active controllers of external devices for the completion of the
operations performed. In reality, such a scheme causes many problems, first of all, with
efficiency, and the benefit of its use is not obvious.
3.6.3. Interruptions and exceptions
The modern term "interrupt" has evolved quite far from its original meaning;
novice programmers are often surprised to find that some interrupts do not interrupt
anything at all. It would be somewhat difficult to give a strict definition of an interrupt.
Instead, let's try to explain the nature of the different types of interrupts and find what
they have in common that justifies the existence of the term.
Interrupts in the original sense are already familiar to us from the previous
paragraph. Certain devices in a computer system may function independently of the
CPU; from time to time they may need the attention of the operating system, but the
single CPU (or, no better, all the CPUs in the system) may be busy processing a user
program at just such a moment. Hardware (or external) interrupts were designed to
solve this problem. To support hardware interrupts, the processor has dedicated pins;
an electrical pulse applied to such a pin is perceived by the processor as a notification
that a device needs attention. In modern shared-bus architectures, one of the bus tracks
is used to request an interrupt.
The sequence of events when an interrupt occurs and is processed looks like this:
• The device that needs the attention of the processor sets an "interrupt request"
signal on the bus;
• the processor brings the execution of the current program to a point where
execution can be interrupted so that it can be restarted from the same place; the
processor then puts an "interrupt acknowledgement" signal on the bus and blocks
other interrupts;
• Upon receiving an interrupt acknowledgement, the device transmits a number
on the bus that identifies the device; this number is called the interrupt number;
• the processor stores somewhere (for example, in the stack of the active task) the current values of the instruction counter and the flag register; this minimal saving of state is necessary because otherwise the execution of the very first instruction of the interrupt handler would change (corrupt) both of these registers, making it impossible to return from the handler transparently (i.e., unnoticeably for the interrupted task); the other registers can be saved by the interrupt handler itself;
• the privileged mode of operation of the central processor is established, after
which control is transferred to the entry point of the procedure in the operating
system, called, as we have already mentioned, an interrupt handler; the address
of the handler can be previously read from special memory areas or calculated
in some other way.
Recall that it is possible to switch from the privileged to the restricted mode of operation
of the CPU with a simple command, because in the privileged mode all the capabilities
of the processor are available; at the same time, the transition from the restricted (user)
mode back to the privileged mode cannot be made with the help of an ordinary
command, because it would make the very existence of the privileged and restricted
modes meaningless. In this respect, an interrupt is also interesting because the CPU
mode becomes privileged before it is processed.
The timer mentioned above is probably the simplest of all external devices: all it
does is to issue interrupt requests at regular intervals (for example, 1000 times per
second on the processors we are considering).
Let us now consider the following question: what should the CPU do if the active
task required dividing an integer by zero? It is clear that further program execution
makes no sense: the result of division by zero cannot be represented by any integer, so
the variable that was supposed to contain the result of the division will contain garbage
at best; the final results will most likely be irrelevant. Trying to notify the program of
what has happened by setting some flag is obviously also pointless: if the programmer
did not check the divisor for zero before performing the division, it is unlikely that he
will check the value of some flag after the division.
The processor cannot terminate the current task by itself, it is too complex and
depends on the operating system implementation. All it has to do is to hand over control
to the operating system, notifying it of what has happened. The operating system will
decide what to do with the emergency task on its own. Obviously, it is required to
switch to the privileged mode and transfer control to the operating system code; before
that it is desirable to save registers (at least the instruction counter and the flag register);
even if the task will not be continued from the same place under any circumstances
(and the processor, generally speaking, has no right to assume this), the values of the
registers may be useful for the operating system to analyze the incident. Moreover,
somehow the operating system should be informed of the reason why control was
transferred to it; besides division by zero, such reasons can be a violation of memory
protection, an attempt to execute a forbidden or non-existent instruction, etc. All such
situations are called exceptions; they have a common property: the processor (no matter
for what reasons) cannot execute the next instruction.
It is easy to see that the actions to be performed by the processor when an exception
occurs are very similar to the hardware interrupt case discussed earlier. The main
difference is that there is no bus communication (interrupt request and
acknowledgement): information about these events occurs inside the processor, not
outside it. In terms of hardware implementation, exceptions can be much simpler than
hardware interrupts, since they always occur at a certain phase of instruction execution;
the reader will find details in the book [11]. The rest of the exception handling steps
repeat the hardware interrupt handling steps almost verbatim.
Another cardinal difference between the situation of invalid task actions and a
hardware interrupt is, in fact, the presence of a task responsible for what is happening,
and the task is a planning unit, i.e. the execution of the task can be suspended and then
continued from the same place. This allows the operating system to handle exception
handling quite differently from hardware interrupts. Exception handling is said to occur
in the context of the user task.
Despite the differences, handling situations in which the processor is unable to
execute an instruction for one reason or another is similar to hardware interrupts, if only
in that there must be a handler somewhere in memory to which control must be
transferred when the situation occurs, and the address where the handler is located must
be configurable, but, of course, such configuration must be a privileged action, so that
only the operating system can tell the processor where to transfer control when it is
needed. The developers of x86 processors took a fairly simple path here. All handlers
for both hardware interrupts and exceptions are numbered from 0 to 255; a special
area is allocated in RAM for storing the so-called interrupt descriptor table. This table
contains entries of eight bytes each; each entry corresponds to its handler and contains information on how exactly to transfer control to it - at which address, into which segment [275], whether hardware interrupts should be temporarily blocked, etc.

[275] Of course, we remember that operating systems typically do not use the segment-based virtual memory component of the i386, but the component itself does not go anywhere.

Since the numbering of handlers is end-to-end and includes both hardware interrupts and exceptions, we will not find strange the terminology actually introduced
by the creators of x86: they call exceptions internal interrupts. This terminology is
justified by the fact that the cause of an external interrupt is outside the CPU, while the
cause of an internal interrupt is inside the CPU. It should be noted that this terminology
is not usually used when describing other processors: hardware (aka external) interrupts
are simply called interrupts, and exceptions are called exceptions, or traps, or whatever.
Let us emphasize that internal interrupts do not interrupt anything or anyone!
Their name (even if we call them so and not exceptions) is justified only by the fact that
they use the same numbering and handler organization system as hardware interrupts.
In reality, when an internal interrupt occurs, the handler, although located in the
operating system kernel as part of the operating system, is executed as part of the task
as a scheduling unit, so it would be incorrect to consider the task interrupted in any
sense at all; but even if it were not, it would be strange to consider that the task was
interrupted by someone, when in fact the task itself caused the accident by its own
actions. Note that hardware interrupts quite obviously do interrupt the current task.

3.6.4. System calls and "program interrupts"


As mentioned above, the user task is not allowed to do anything other than convert
the data in its allocated memory. All actions that affect the world outside the task are
performed through the operating system. Consequently, we need a mechanism that
allows the user task to call the kernel of the operating system for some services. Recall
that the appeal of a user task to the kernel of the operating system for services is
called a system call. It is clear that by its essence a system call is a transfer of control
from a user task to the operating system kernel. However, there are two problems here. First, the kernel operates in privileged mode, while the user task operates in restricted mode. Second, the kernel address space is usually inaccessible to the user task (moreover, these addresses may not exist in the task's address space at all). And even if it were accessible, allowing a user task to transfer control to an arbitrary point in the kernel would be a bit strange.
So, to execute a system call, we need to change the execution mode from user mode
to privileged mode and transfer control to some entry point in the operating system. All
this must be initiated by the user task, i.e. it requires some special CPU instruction. On
different architectures, the corresponding instruction may be called trap, svc (short for "supervisor call"), and so on. On some architectures, including the modern 64-bit "heir" of the i386 under consideration, this instruction is called simply syscall, i.e. "system call". Of course, the operating system determines where (and how) control will be transferred in this case, and of course there must be some code fragment specially intended for processing system calls (i.e. a handler).
We have already seen something similar when considering hardware and internal interrupts. The creators of x86 processors decided not to invent a separate mechanism for system calls; instead, they introduced into the instruction system the int command (from the word "interrupt"), which was originally intended literally to force a call to an interrupt handler; at that time, however, x86 had neither privileged mode nor virtual memory. The command (or rather its effect) was called a software interrupt. On the i386, not just any handler can be called with the int command, but only one specially designed for this purpose.
So, if we accept the terminology typical for x86 processors, we can distinguish
three types of interrupts: external (aka hardware), internal and software interrupts,
which can be considered a special case of internal interrupts. The difference between
software interrupts and the others is that they occur at the initiative of the user task,
while the other interrupts occur without the user's knowledge: external interrupts - at
the request of external devices, internal interrupts - when it is impossible to execute the
next command of the active program. If we use the term "software interrupt", we can
say that system calls are realized through software interrupts.
The fact that a software interrupt certainly does not interrupt anything seems obvious, and on close inspection the term "software interrupt" itself looks like an oxymoron; it is no wonder that when describing architectures other than x86, most often only "real" (read: hardware) interrupts are called interrupts; as already mentioned, instead of the term "internal interrupt" the term "exception" or "trap" is then used, and instead of "software interrupt" one simply speaks of a system call, without distinguishing between the call itself and the mechanism of its realization.
In any case, privilege escalation (transition from restricted mode to privileged
mode) is possible only if control is simultaneously transferred to a predetermined entry
point, and the addresses of possible entry points can only be configured in privileged
mode. By setting the addresses of its own procedures as all provided handlers, the
operating system is guaranteed that when the mode of operation is changed to
privileged mode, the operating system's own code will take control, and only such of
its code as is specifically designed for this purpose. Execution of user code in privileged
mode is completely excluded. Using the terminology of "three types of interrupts", we
can say that mode change from restricted to privileged mode occurs only when an
interrupt (of any of the three types) occurs, whereas if we call interrupts only "real"
interrupts, we have to say that the CPU mode changes to privileged mode in three cases:
interrupt, exception, and system call.
Agreements on how exactly a system call should be made, how to pass parameters
to it, what interrupt to use, how to get the result of execution, etc., vary from system to
system. Even if we are talking about two members of the Unix family (Linux and
FreeBSD) running on the same i386 hardware platform, the low-level implementation
of system calls is quite different. The next two paragraphs are devoted to describing the system call conventions of these two systems [276]; if you wish, you can read only the one of these two paragraphs pertaining to the system you are using.

[276] Naturally, in their versions for i386; versions designed for other hardware architectures are organized differently.


It should be noted that Unix systems are designed mainly for C programming. For
this language, libraries are supplied with the system to facilitate working with system
calls - in particular, for each system call a library function ("wrapper") is provided,
allowing you to access the services of the kernel as an ordinary subroutine. System calls
in Unix OS have names that coincide with the names of the corresponding wrapper
functions from the C library. Unfortunately, this C-orientation leads to some
inconveniences when working in assembly language. For example, system calls can
change their numbers from system to system: for example, getppid in Linux has
number 64, and in FreeBSD - number 39. Programmers working in C may not think
about this, because in any Unix system they only need to call a regular function named
getppid, and the specific execution of the system call is assigned to the library
included with the system, so a program written by a C programmer using getppid
will be successfully compiled in any system and work the same way.
When we write in assembly language, we don't have a library of system calls, we
have to specify the call number explicitly, so in the text intended for Linux we have to
use the number 64, and for FreeBSD - 39. It turns out that the source text will be
good for one system and wrong for the other. The same is the case with some numeric
constants that calls receive as input. The macro processor with its conditional
compilation directives can partially help us out, or we can limit ourselves to one system
(which is not really the right thing to do). Fortunately, FreeBSD and Linux systems are
still similar in many ways; the numerical values associated with system calls overlap.
Anyway, forewarned is forearmed.

3.6.5. Linux system call convention


The i386-based Linux kernel uses program interrupt number 80h for system calls.
The system call number is passed to the kernel via the EAX register; if the system call
accepts parameters, they are located in the EBX, ECX, EDX, ESI, EDI, and (in
very rare cases) EBP registers; note that all system call parameters are four-byte
values, either integer or address. The result of the call is returned via the EAX register,
with a value between 0FFFFF000h and 0FFFFFFFFh indicating an error and representing the conditional error code.
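
For instance, the getpid call, which takes no parameters at all and has number 20 on i386 Linux, fits this convention in the most minimal way possible:

        mov eax, 20     ; system call number: getpid
        int 80h         ; on return, EAX holds the ID of our process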
Let's consider for example the write system call, which allows you to output
data through one of the open I/O streams, including writing to an open file, as well as
to the standard output stream (in common parlance, "to the screen"). This system call
takes three parameters: the I/O stream descriptor (number), the memory address where
the data to be output is located, and the amount of that data in bytes. In Linux for i386,
the write call is number 4.
The standard output stream in Unix has the descriptor 1 (more precisely, an output
stream numbered 1 is considered standard output). For example, to print a line "to
the screen", which is what the PRINT macro does, we would need to put the number
4 in EAX, the number 1 in EBX, the line address in ECX, and the line length
in EDX, and then issue an int 80h command to initiate a software interrupt.

Another important system call is called _exit and is used to terminate a


program. It has the number 1 and takes one parameter, which is the termination code
familiar from Pascal's halt (see §2.4.2). Recall that programs use the termination
code to tell the operating system whether they have successfully completed their task:
if everything went as expected, code 0 is used; if errors occurred, codes 1, 2,
etc. are used.
Knowing all this, we can write a program that prints a line and terminates
immediately afterwards; we don't need the stud_io.inc file and its macros
anymore:

        global _start

        section .data
msg     db "Hello world", 10
msg_len equ $-msg

        section .text
_start: mov eax, 4              ; call write
        mov ebx, 1              ; standard output
        mov ecx, msg
        mov edx, msg_len
        int 80h
        mov eax, 1              ; call _exit
        mov ebx, 0              ; code for "success"
        int 80h
Some system calls do not fit into this convention; for example, the llseek call has a 64-bit
parameter and returns a 64-bit number too. What the kernel and the library do in such cases
is a question we will leave outside the scope of our book.

3.6.6. FreeBSD OS system call convention


The FreeBSD OS convention is a bit more complicated. This system also uses the
80h interrupt and accepts the system call number through the EAX register, but all
call parameters are not passed through registers but through the stack, similar to the
way parameters are passed to subroutines according to C conventions, i.e. in reverse
order (see page 601). As in Linux, all call parameters are four-byte values. The result
of a system call is returned via the EAX register, but an error is indicated by the CF flag being set rather than by the value falling into a special range (as in Linux). If CF is clear, the call was successful and its result is in EAX; if the flag is set, an error occurred and the error code is written in EAX.
There is one other very unobvious feature to consider. The FreeBSD kernel assumes
that control is transferred to it by invoking the following procedure:

kernel:
        int 80h
        ret

If we have such a procedure, all we need to do to call the kernel is to put the
parameters on the stack just like a normal procedure, put the call number in EAX,
and call kernel; the call command will put the return address on the stack,
which will be on the top of the stack when the program interrupt is executed, and the
parameters will be on the stack below the top. The FreeBSD kernel takes this into
account and does nothing with the number at the top of the stack (because this number
- the return address from the kernel procedure - has nothing to do with the call
parameters), and retrieves the actual parameters from the stack below the top (from
positions [esp+4], [esp+8], etc.).
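
For illustration, a write call through such a kernel procedure might look like this (a sketch of ours; msg and msg_len as in the earlier examples):

        push dword msg_len
        push dword msg
        push dword 1    ; standard output
        mov eax, 4      ; call number: write
        call kernel     ; pushes the return address, then int 80h runs
        add esp, 12     ; remove the three parameters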
When working in assembly language, it is not necessary to separate the interrupt
call into a separate subroutine; it is enough to put an additional "double word" into the
stack before the int command, for example, by executing the push eax
command (or any other 32-bit register). After executing the system call and returning
from it, you should remove from the stack everything that was put there; this is done,
as well as when calling ordinary subroutines, by increasing the ESP register by the
required value with a simple add command.
In describing the Linux convention in the previous paragraph, we used the write
and _exit calls for illustration (see page 650). A similar program for FreeBSD would
look as follows:

global _start

section .data
msg db "Hello world", 10
msg_len equ $-msg

section .text
_start:
        push dword msg_len
        push dword msg
        push dword 1            ; standard output
        mov eax, 4              ; write
        push eax                ; anything
        int 80h
        add esp, 16             ; 4 double words
        push dword 0            ; code for "success"
        mov eax, 1              ; call _exit
        push eax                ; anything
        int 80h

We did not clear the stack after the _exit system call because it does not return
control anyway. In this example, we do not handle errors, assuming that writing to the
standard output stream is always successful (this is generally not true, but programmers
often ignore it). If we wanted to handle errors "fairly", the first instruction after int
80h would have to be jc or jnc, which make a conditional jump depending on the
state of the CF flag; otherwise we risk that some following instruction will set this
flag as part of its results, and the indication of the error will be lost. In Linux it was a
bit easier: there it was enough not to touch the EAX register, and nothing would be
lost.
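Schematically, such a check under FreeBSD might look like this (a sketch; the
write_error label and what is done there are left to the reader):

        int     80h             ; make the system call
        jc      write_error     ; test CF before anything else can spoil it
        add     esp, 16         ; success: clean up the stack
        ; ... normal execution continues here ...

write_error:
        ; an error occurred; its code is in EAX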

3.6.7. Examples of system calls


In the above examples, we looked at the _exit and write system calls; recall
that _exit has number 1 and takes one parameter, the completion code, and the
write call has number 4[277] and takes three parameters, namely the number
("descriptor") of the output stream (1 for a standard output stream), the address of the
memory area where the output data is located, and the amount of this data.
To enter data (both from files and from the standard input stream, i.e. "from the
keyboard"), the read call, number 3, is used. Its parameters are similar to
the write call: the first parameter is the number of the input stream descriptor (for
standard input the descriptor 0 is used), the second parameter is the address of the
memory area where the read data should be placed, and the third parameter is the
number of bytes to be read. Naturally, the memory area whose address we pass in the
second parameter must be at least as large as the number passed in the third parameter.
It is very important to analyze the value returned by the read call (recall that
immediately after the call this value is contained in the EAX register). If the reading
was successful, the call returns a strictly positive number - the number of bytes read,
which, of course, cannot exceed the number "ordered" through the third parameter, but
may well be less (for example, we demanded 200 bytes, but only 15 were actually
read). The case when read returns 0 is very important - it indicates
that an "end-of-file" situation has occurred in the input stream being used. When
reading from files, it means that the whole file has been read and there is no more data
in it; recall that when typing from the keyboard in Unix, you can simulate the "end of
file" situation by pressing the Ctrl-D key combination.
Remember that a program that uses the read call and does not analyze its
result is obviously incorrect. Indeed, in this case we cannot know how many of the
first bytes of our memory area contain actually read data, and how many of the
remaining bytes continue to contain arbitrary "garbage" - hence, any meaningful work
with this data is impossible.
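Under the Linux convention, a typical analysis of the read result might look,
schematically, like this (a sketch; buffer and the two labels are assumed to be
defined elsewhere):

        mov     eax, 3          ; read
        mov     ebx, 0          ; descriptor 0: standard input
        mov     ecx, buffer     ; where to place the data
        mov     edx, 200        ; how many bytes we ask for
        int     80h
        cmp     eax, 0
        jl      read_error      ; "negative" value: an error occurred
        jz      end_of_file     ; zero: the end-of-file situation
        ; otherwise EAX contains the number of bytes actually read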
When reading, as with other system calls, an error can occur. As we have seen, in
Linux this is detected by the "negative" value of the EAX register after returning from
the call, or more specifically, by a value between 0fffff000h and 0ffffffffh;
FreeBSD uses the CF flag (carry flag): if the call succeeds, this flag will be clear on
exit; if an error occurs, the flag will be set. This applies to the read call, to the
previously discussed write call (we did not handle error situations so as not to
complicate our examples, but this does not mean that errors cannot occur), and to all
other system calls.

[277] At least on Linux and FreeBSD systems; hereafter, unless explicitly stated
otherwise, it is assumed that this is true at least for those two systems.
When a program is started, it usually has I/O streams numbered 0 (standard input),
1 (standard output), and 2 (the error reporting stream) open, so we can apply the
read call to descriptor 0 and the write call to descriptors 1 and 2. Often, however,
a task requires the creation of other I/O streams, such as those for reading and writing
files on disk. Before we can work with a file, we need to open it, as a result of which
we get another I/O stream with its own number (descriptor). This is done using the
open system call, number 5. The call accepts three parameters. The first parameter is
the address of a text string specifying the name of the file; the name must end with a
zero byte, which serves as a terminator. The second parameter is a number specifying
the mode of use of the file (read, write, etc.); the value of this parameter is formed as a
bit string in which each bit represents a particular feature of the mode, e.g., write-only
access, permission to create a new file if it does not exist, etc. Unfortunately, the
arrangement of these bits is different for Linux and FreeBSD; some of the flags,
together with their descriptions and numerical values, are given in Table 3.4.

Table 3.4. Some flags for the second parameter of the open call

  name       description                                 Linux   FreeBSD
  O_RDONLY   read only                                   000h    000h
  O_WRONLY   write only                                  001h    001h
  O_RDWR     read and write                              002h    002h
  O_CREAT    allow file creation                         040h    200h
  O_EXCL     require that the file be created            080h    800h
  O_TRUNC    if the file exists, destroy its contents    200h    400h
  O_APPEND   if the file exists, append at the end       400h    008h

Note that
two variants for this parameter are the most common. The first is opening a file for
reading only, in both systems under consideration this case is set by the number 0.
The second case is opening a file for writing, when a file is created if it was not there,
and if it was, its old contents are lost (in C programs this is set by the combination
O_WRONLY|O_CREAT|O_TRUNC). For Linux the corresponding numerical value
is 241h, for FreeBSD it is 601h. The third parameter of the open call is used only
when a file is created and specifies access rights for it (see §1.2.13). In most cases it
should be set to the octal number 0666q, which corresponds to read and write
permissions for all users of the system; 0600q (owner-only permissions) is less
frequently used, and other values are almost never used; we will learn why this is so in
the second volume of our book (see §5.2.3).
For an open call, it is especially important to analyze its return value and check
if an error has occurred. The call may fail for a variety of reasons, most of which the
programmer can neither prevent nor predict: for example, someone may unexpectedly
erase a file we intended to open for reading, or deny us access to the directory where
we intended to create a new file. So, after executing the open call, we need to check
if the EAX register contains a value between 0fffff000h and 0ffffffffh (in
Linux) or if the CF flag is raised (in FreeBSD). If the call succeeds, the EAX register
contains the descriptor of the open file (input or output stream). It is this descriptor that
should now be used as the first parameter in the read and write calls to the file.
As a rule, this value should be copied immediately after the call to the memory area
specially allocated for it.
When all actions with the file are completed, it should be closed. This is done using
the close call, which has the number 6. The call takes one parameter equal to the
file descriptor of the file to be closed. The I/O stream with this descriptor then ceases
to exist; subsequent calls to open may use the same descriptor number again.
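For illustration, opening a file for reading and then closing it under the Linux
convention might look like this (a sketch; the file name input.txt is arbitrary, and
in real code the result of open must, of course, be checked before use):

section .data
fname   db      "input.txt", 0  ; the name ends with a zero byte

section .text
        mov     eax, 5          ; open
        mov     ebx, fname      ; address of the name string
        mov     ecx, 0          ; O_RDONLY
        int     80h             ; the descriptor (or an error) is now in EAX
        mov     ebx, eax        ; the descriptor becomes the parameter
        mov     eax, 6          ; close
        int     80h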
A Unix process can find out its own number (the so-called process ID) with the getpid
call, and the number of its "parent process" (the one that created it) with the getppid call.
The getpid call on both systems in question is number 20, while the getppid
call is number 64 on Linux and number 39 on FreeBSD. Both calls take no
parameters; the requested number is returned as the result of the call via the EAX
register. Note that these two calls always complete successfully; there is no place for
errors to occur.
The kill system call (number 37) allows you to send a signal to a process
with a given number[278]. The call takes two parameters: the first specifies the process
number, the second the number of the signal; in particular, signal #15
(SIGTERM) instructs the process to terminate (but the process can intercept this signal
and terminate not immediately, or not terminate at all), while signal #9 (SIGKILL)
destroys the process, and this signal can neither be intercepted nor ignored.
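As a sketch, under the Linux register convention a process could send SIGTERM to
itself like this:

        mov     eax, 20         ; getpid
        int     80h             ; now EAX contains our own PID
        mov     ebx, eax        ; first parameter: the process number
        mov     eax, 37         ; kill
        mov     ecx, 15         ; second parameter: SIGTERM
        int     80h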
Unix family operating system kernels support hundreds of different system calls;
interested readers can find information about these calls on the Internet or in specialized
literature. Note that to familiarize yourself with information about system calls, it is
desirable to know the C programming language, and it is much easier to work at the
level of system calls using the C language. Moreover, some system calls in some
systems may not be supported by the kernel, but instead emulated by C library
functions, which makes their use in assembly language programs almost impossible. In
this connection, it is appropriate to recall that we are considering assembly language

for educational, not practical, purposes. Programs intended for practical use are better
written in C or other suitable languages.

[278] In fact, it is possible to send a signal to a group of processes or even to all
processes in the system at once; we will postpone a detailed description of all this -
both the kill call and the process groups themselves - until the next volume.

3.6.8. Access to command line parameters


Working on Unix systems, we use command-line parameters all the time; we
discussed this in detail in the introduction (see §1.2.6), and we have even written
programs in Pascal that receive information through their command line (see §2.6.12);
if the term "command-line parameters" still makes you even the slightest bit unsure, it
is worth going back to the previous part of the book and practicing some more.
When starting a program, the operating system allocates a special memory area in
the address space of the newly created task (in the stack segment, to be more precise),
in which the words that make up the command line are located. For convenience,
information about the addresses of these words, together with their total number, is
placed on the stack of the task to be started (at least, Linux and FreeBSD do it this way,
although theoretically other agreements on parameter passing are possible), and then
control is passed to our program. At the moment when the program starts executing
from the _start label, at the top of the stack (i.e. at address [esp]) is a four-byte
integer equal to the number of command line items (including the program name), at
the next stack position (at address [esp+4]) is the address of the memory area
containing the name by which the program was called, then (at address [esp+8]) is
the address of the first parameter, then the address of the second parameter, and so on.
Each element of the command line is stored in memory as a string (array of characters),
bounded on the right by a zero byte.
For example, let's consider a program that prints all its command line elements
(including the zeroth one, i.e. the program name). We will not use the
stud_io.inc tools, because we know how to
do without them. Our program will be suitable for both Linux and FreeBSD. Since
system calls in these systems are executed differently, we will use conditional
compilation directives to select one or another text. These directives will assume that
when compiling under Linux we define (on the NASM command line) the OS_LINUX
macro symbol, and when running under FreeBSD we define the OS_FREEBSD
symbol. When working under Linux, our example (let's call it cmdl.asm) will need
to be compiled using the command

nasm -f elf -dOS_LINUX cmdl.asm

and when working under FreeBSD - with the command

nasm -f elf -dOS_FREEBSD cmdl.asm

To use the write call we need to know the length of each string to be printed, so for
convenience we will write the strlen subroutine, which receives the address of
the string as a parameter through the stack and returns the length of this string through
the EAX register (assuming that the end of the string is marked with a zero byte). The
subroutine will follow the CDECL convention: for its internal needs it will use the EAX
and ECX registers, which according to CDECL it has the right to corrupt; it will use
the EBP register as the reference point of the stack frame, as is usually done, restoring
it on exit; and it will not touch the other registers.
Using strlen, we will write the print_str subroutine, which will receive
the address of the string as the first and only parameter, determine its length by calling
strlen, and output the resulting string to the standard output stream using the
write system call. In this subroutine we need the string address twice - the first time
we will pass it to the strlen subroutine and the second time to the system call. It
will have to be copied from the stack to a register anyway, so we'll leave it in the register
and not access the stack a second time; but since we're using CDECL, we should assume
that the subroutine being called will mess up EAX, ECX, and EDX. In fact, we know
that strlen does not mess up EDX, but we will not use this knowledge, otherwise
there is a risk that sometime in the future we will change the strlen code, seemingly
staying within CDECL, but print_str will no longer work; so when calling
subroutines we should not use knowledge of their internals, but instead use general
rules. With this in mind, we use the EBX register to store the string address, which will
have to be saved at the beginning of the subroutine and restored at the end; by the way,
if you make a system call according to Linux rules, EBX will still have to be
corrupted (not so for FreeBSD).
In addition to the command line parameters, we will have to print line feed characters.
Here we will take an approach that is not quite optimal in terms of performance, but
saves us a dozen lines of code: we will describe in memory a string consisting of a
single line feed character (i.e. a memory area of two bytes: the first is a line feed
character with code 10, the second is a terminating zero) and print this string with the
help of the print_str subroutine that we already have.
Of course, a special subroutine calling write for a single byte would work faster,
because it would not have to calculate the length of the string. If we were after overall
performance, we should not call the OS kernel twice for each string; even one such call per
string is too much. It would be better to form one large array in memory, copying into it the
contents of all the command line parameters and placing line feed characters where necessary,
and then print it all with a single system call. System calls are expensive, because they require
context switching and involve a number of complex actions performed in the kernel. The
problem is that with such optimization the program text would grow about fivefold and would
lose considerably in clarity.
We'll call the string consisting of a single line feed nlstr and put it right at the
beginning of the .text section. We can do this because our program doesn't change
this memory location; if it did, we'd have to put the string in the .data section.
The main program, starting with the _start label, will place the number of
command line parameters in the EBX register, and in the ESI register will place a
pointer to the place in the stack where the address of the next command line parameter
to be printed is located. It would be more logical to use ECX for the counter, but it can
and will be corrupted by called subroutines, whereas EBX is required to be restored
by CDECL. At each iteration of the loop, ESI will be incremented by 4 to indicate
the next position on the stack, and EBX will be decremented to indicate that there is
one less string to print. The complete text will turn out like this:

;; cmdl.asm ;;
global _start

section .text
nlstr   db      10, 0

strlen:                                 ; arg1 == address of the string
        push    ebp
        mov     ebp, esp
        xor     eax, eax
        mov     ecx, [ebp+8]            ; arg1
.lp:    cmp     byte [eax+ecx], 0
        jz      .quit
        inc     eax
        jmp     short .lp
.quit:
        pop     ebp
        ret

print_str:                              ; arg1 == address of the string
        push    ebp
        mov     ebp, esp
        push    ebx                     ; will be spoiled
        mov     ebx, [ebp+8]            ; arg1 (in ebx as well)
        push    ebx
        call    strlen
        add     esp, 4                  ; the length is now in eax
%ifdef OS_FREEBSD
        push    eax                     ; length
        push    ebx                     ; arg1
        push    dword 1                 ; stdout
        mov     eax, 4                  ; write
        push    eax                     ; extra dword
        int     80h
        add     esp, 16
%elifdef OS_LINUX
        mov     edx, eax                ; edx now contains the length
        mov     ecx, ebx                ; arg1, was stored in ebx
        mov     ebx, 1                  ; stdout
        mov     eax, 4                  ; write
        int     80h
%else
%error please define either OS_FREEBSD or OS_LINUX
%endif
        pop     ebx
        mov     esp, ebp
        pop     ebp
        ret

_start: mov     ebx, [esp]              ; argc
        mov     esi, esp
        add     esi, 4                  ; argv
again:  push    dword [esi]             ; argv[i]
        call    print_str
        add     esp, 4
        push    dword nlstr
        call    print_str
        add     esp, 4
        add     esi, 4
        dec     ebx
        jnz     again

%ifdef OS_FREEBSD
        push    dword 0                 ; success
        mov     eax, 1                  ; _exit
        push    eax                     ; extra dword
        int     80h
%else
        mov     ebx, 0                  ; success
        mov     eax, 1                  ; _exit
        int     80h
%endif

3.6.9. Example: Copying a file


Let's consider another example of a program actively interacting with the
operating system. This program will receive through its command line parameters the
names of two files - the original and the copy - and will create a copy with the given
name from the given original. Our program will work
quite simply: after checking that it is indeed passed two parameters, it will try to open
the first file for reading, the second file - for writing, and if it succeeds, it will cyclically
read data from the first file in portions of 4096 bytes until the "end of file" situation
occurs. Immediately after reading each chunk, the program will write the read to the
second file. The real cp command, intended for copying files, is much more
complicated, but we don't need the extra complexity for the tutorial example.
It is clear that our program will have to make extensive use of system calls. The
matter is complicated by the fact that we would like, of course, to write a program that
will successfully compile and run under both Linux and FreeBSD. As we saw in the
example program from the previous paragraph, this requires a rather cumbersome
framing of each system call with conditional compilation directives. The previous
example, which contained only two system calls, could have been written without much
thought to this problem, which we did; a program with more than a dozen calls to the
operating system is a different matter. In order not to clutter the source code with
monotonous but voluminous (and thus distracting) constructions, we will write one
multi-line macro that will make the system call (or rather, generate assembly code to
execute the system call). The text of this macro will contain all the differences in the
organization of system calls for Linux and FreeBSD. The macro will accept an arbitrary
number of parameters, not less than one; the first parameter will specify the number of
the system call, the rest - the values of the system call parameters. Note that under Linux
our macro will refuse to work with more than six parameters because they will not fit
into the registers; for FreeBSD we will not impose such a restriction.
We will make our macro conform to the CDECL convention: when using it, we
have to assume that the EAX, ECX and EDX registers will be corrupted, while all
others will retain their value. For FreeBSD, CDECL compliance will work by itself,
since its system calls only use the EAX register, while for Linux we have to take some
measures. If the total number of macro parameters is more than one (i.e. at least one
parameter is passed to the system call), we will need to save on the stack and restore
the EBX register at the end of the macro; if the total number of parameters is more
than four, we will also save and restore ESI, EDI and EBP.
The issue of values returned by the system call deserves a separate consideration.
As we already know, in Linux only the EAX register is used for this purpose, with a
special range for error codes among its possible values, whereas in FreeBSD the CF
flag is also used, and if it is raised, then an error has occurred and EAX contains its
code. Both variants imply certain difficulties in processing: under FreeBSD the CF
flag is easily spoiled by subsequent instructions, so it must be checked immediately
after returning from the call (in our case this means that it must be checked in the body
of the macro), while for Linux we have to write a somewhat cumbersome check of
whether a number falls within a given range.
Taking advantage of the fact that the ECX register can still (according to CDECL
convention) be corrupted, we introduce a convention of our own that the macro will
follow on both systems. If the system call completes successfully, its result will be in the EAX
register, and ECX will be zero; if an error occurs, the ECX register will
contain its code (fortunately, error codes in both systems are never zero), and the EAX
register will then contain the number -1. This will make it easier to check the
success of the system call in the program text (after calling our macro).
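Once the macro is written, a call through it can be checked, schematically, like this
(a sketch; fname and the open_ok label are arbitrary):

        kernel  5, fname, 0     ; try to open a file for reading
        cmp     eax, -1         ; -1 indicates an error
        jne     open_ok
        ; here ECX contains the error code
open_ok:
        ; here EAX contains the new descriptor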
When passing parameters to the macro and allocating them to the appropriate
registers (in the Linux version), we will use a technique we have already seen (see the
comment on page 632) - putting all parameters on the stack and then fetching them into
the necessary registers. In the FreeBSD version we don't need any register allocation,
but we do need to put the parameters on the stack for use by the system call itself. In
both cases, we can start the body of the macro by putting all its parameters on the stack
(in reverse order, so that we don't have to reorder them in the FreeBSD version). To do
this, we will use the %rotate directive just as we did when writing the pcall
macro (see page 634).
After that, in the FreeBSD version it is enough to put the call number into EAX
and you can give control to the kernel; in the Linux version it is not so simple, we
have to extract parameters from the stack and place them in registers, and different sets
of registers will be used for different number of parameters; to handle all this correctly
we will have to write a number of nested conditional compilation directives that are
triggered depending on the number of parameters passed to the macro.
After returning from a system call, our actions also differ depending on the
operating system used. In general, there are more differences between the two
implementations of the macro than there is in common, so we will separate them
completely with conditional compilation directives to make it clearer. We will call the
macro itself kernel. Perhaps it would be more logical to call it something else, but, for
example, the most natural word for it - syscall - denotes the mnemonics of a
machine command, although not present on all i386-compatible processors, but known
to the NASM assembler, so we should not use this word.
Since the structure of macro directives is quite confusing, we will use structural
indents for them, and we will place the mnemonics of machine commands two tabs
from the left edge of the screen so that they do not get mixed up with the directives.
Our final macro will look like this:

%macro  kernel 1-*
 %ifdef OS_FREEBSD
  %rep %0
   %rotate -1
                push    dword %1
  %endrep
                mov     eax, [esp]
                int     80h
                jnc     %%ok
                mov     ecx, eax
                mov     eax, -1
                jmp     short %%q
%%ok:           xor     ecx, ecx
%%q:            add     esp, %0 * 4
 %elifdef OS_LINUX
  %if %0 > 1
                push    ebx
   %if %0 > 4
                push    esi
                push    edi
                push    ebp
   %endif
  %endif
  %rep %0
   %rotate -1
                push    dword %1
  %endrep
                pop     eax
  %if %0 > 1
                pop     ebx
   %if %0 > 2
                pop     ecx
    %if %0 > 3
                pop     edx
     %if %0 > 4
                pop     esi
      %if %0 > 5
                pop     edi
       %if %0 > 6
                pop     ebp
        %if %0 > 7
         %error "Can't do Linux syscall with 7+ params"
        %endif
       %endif
      %endif
     %endif
    %endif
   %endif
  %endif
                int     80h
                mov     ecx, eax
                and     ecx, 0fffff000h
                cmp     ecx, 0fffff000h
                jne     %%ok
                mov     ecx, eax
                neg     ecx
                mov     eax, -1
                jmp     short %%q
%%ok:           xor     ecx, ecx
%%q:
  %if %0 > 1
   %if %0 > 4
                pop     ebp
                pop     edi
                pop     esi
   %endif
                pop     ebx
  %endif
 %else
  %error Please define either OS_LINUX or OS_FREEBSD
 %endif
%endmacro

The text of the macro is, of course, quite long, but this is compensated for by reducing
the size of the main code. For example, when talking about system call conventions,
we gave the code of a program that prints one line in the Linux (page 651) and FreeBSD
(page 652) versions. Using the kernel macro, we can write like this:

section .data
msg     db      "Hello world", 10
msg_len equ     $-msg

section .text
global _start
_start: kernel  4, 1, msg, msg_len
        kernel  1, 0

and that's all; this program will compile and work correctly under both systems - you
just need to remember to pass NASM the -dOS_LINUX or -dOS_FREEBSD flag.
One more thing will depend on the system used in our program. When opening the
copied file for reading, the second parameter of the open call should be O_RDONLY,
which is zero on both systems in question; but when opening the target file for writing,
we will have to use a combination of O_WRONLY, O_CREAT, and O_TRUNC, two
of which, as discussed on page 655, have different numerical values on Linux and
FreeBSD. The second
parameter of the open system call should in this case be 241h in Linux and 601h
in FreeBSD (see Table 3.4). In order not to remember the differences between the
two supported systems, we will introduce a special symbol-label, the value of which
will depend on the system for which the translation is performed:

%ifdef OS_FREEBSD
openwr_flags equ 601h
%else ; assume it's Linux
openwr_flags equ 241h
%endif

Now let's form the variable section. We will need a buffer for temporary data storage,
into which we will read the next portion of data from the first file to write it to the
second file. In addition, we will also place file descriptors in variables. We could also
use registers, but we would lose in clarity. We will call the corresponding variables
fdsrc and fddest. Finally, for convenience, we will create variables for storing
the number of command line parameters and the address of the beginning of the array
of pointers to command line parameters, calling these variables argc and argvp.
All these variables do not require initial values and can therefore be located in the .bss
section:

section .bss
buffer  resb 4096
bufsize equ  $-buffer
fdsrc   resd 1
fddest  resd 1
argc    resd 1
argvp   resd 1
When launching our program, the user may specify the wrong number of command line
parameters; the file specified as the data source may not be available or may not exist;
finally, we may not be able to open the file specified as the target file for writing for
some reason. In the first case, we should explain to the user what parameters to run our
program with, in the other two cases we should simply inform him about the error. Our
program will print its error messages to the standard diagnostic stream, with
descriptor 2. We will place all three error messages in the .data section as initialized
variables:

section .data
helpmsg db 'Usage: copy <src> <dest>', 10
helplen equ $-helpmsg
err1msg db "Couldn't open source file for reading", 10
err1len equ $-err1msg
err2msg db "Couldn't open destination file for writing", 10
err2len equ $-err2msg

Now let's start writing the .text section, i.e. the program itself. First of all, let's
make sure that we have exactly two parameters passed to us, for this purpose we will
extract from the stack the number at the top of the stack, which denotes the number of
command line elements, and put it into the argc variable. Just in case we
save the address of the current stack top in the argvp variable, but we won't
extract anything else from the stack, so we will have an array of addresses of command
line element strings in the stack area. Let's check that the variable argc contains
the number 3 - a correct command line in our case should consist of three elements:
the name of the program itself and two parameters. If the number of parameters is
incorrect, print an error message to the user and exit:

section .text
global _start
_start:
        pop     dword [argc]
        mov     [argvp], esp
        cmp     dword [argc], 3
        je      .args_count_ok
        kernel  4, 2, helpmsg, helplen
        kernel  1, 1
.args_count_ok:

Our next action should be to open the file, whose name is specified by the first
command line parameter, for reading. We remember that the argvp variable contains
the address in memory (stack memory), starting from which the addresses of command
line items are located. Let's extract the address from argvp into the ESI register,
then take the four-byte value at address [esi+4] - this will be the address of the first
parameter of the command line, i.e. the line specifying the name of the file to be read
and copied. To store the address, we will use the EDI register, and then make a call to
open. We will have to use two parameters - the actual address of the file name and the
mode of its use, which will be 0 (O_RDONLY) in this case. The result of the system
call must be checked. Recall that the kernel macro is designed so that the EAX
value equal to -1 indicates an error, and any other value indicates the successful
execution of the call; when applied to the open call, the result of successful
execution is the descriptor of a new I/O stream, in this case it is the input stream
associated with the copied file. In case of success, we save the obtained descriptor in
the fdsrc variable; in case of failure, we generate an error message and exit.

        mov     esi, [argvp]
        mov     edi, [esi+4]
        kernel  5, edi, 0               ; O_RDONLY
        cmp     eax, -1
        jne     .source_open_ok
        kernel  4, 2, err1msg, err1len
        kernel  1, 2
.source_open_ok:
        mov     [fdsrc], eax
Now it's time to open the second file for writing. To retrieve its name from memory we
will use ESI and EDI registers in the same way, after that we will execute the system
call open, in case of error we will display a message and exit, in case of success
we will save the descriptor in the variable fddest. The open call here will look a
bit more complicated. First of all, the mode of opening for writing, as discussed above,
depends on the system and will be set by the symbol-label openwr_flags.
Secondly, since it is possible to create a new file, our system call must also receive the
third parameter, which, as we have noted earlier, is usually equal to 666q. Taking all
this into account, we will get the following code:

        mov     esi, [argvp]
        mov     edi, [esi+8]
        kernel  5, edi, openwr_flags, 0666q
        cmp     eax, -1
        jne     .dest_open_ok
        kernel  4, 2, err2msg, err2len
        kernel  1, 3
.dest_open_ok:
        mov     [fddest], eax

Now let's write the main loop. In it, we will read from the first file, analyze the result,
and if the end of the file is reached (value 0 in EAX) or an error occurs (value -1),
we will exit the loop, and if the reading is successful, we will write all the read
(that is, as many bytes from the buffer memory area as read has read; this
number is contained in EAX) to the second file. Since read cannot return a number
larger than its third parameter (4096 in our case), we can combine the error and
end-of-file situations using the condition EAX ≤ 0.

.again: kernel  3, [fdsrc], buffer, bufsize
        cmp     eax, 0
        jle     .end_of_file
        kernel  4, [fddest], buffer, eax
        jmp     .again

We have exited the loop by moving to the .end_of_file label; sooner or later our
program, having reached the end of the first file, will move to this label, after which we
will only have to close both files by calling close and terminate the program:

.end_of_file:
        kernel  6, [fdsrc]
        kernel  6, [fddest]
        kernel  1, 0

Note that we have made all labels in the main program, except for the _start
label, local (their names start with a dot). It is not necessary to do so, but this
approach to labels (all labels that are not supposed to be accessed from somewhere far
away should be made local) allows us to avoid problems with name conflicts in larger
programs.
The full text of our example can be found in the file copy.asm.

3.7. Separate compilation


We have already encountered building a program from separate modules when we
studied Pascal (see §2.14.1). Recall the basic idea of separate compilation: each module
is compiled separately, the compilation results in a file in some intermediate format,
then all these files are linked together to form an executable. The advantage is that
building an executable file from individual modules in an intermediate representation
is much faster (in some cases by several orders of magnitude) than translating the
individual modules into this intermediate representation; when making changes to the
program source code, you can retranslate only those modules that have been affected, and
for the rest use intermediate files obtained earlier, saving valuable programmer time.
Pascal compilers usually use some "proprietary" intermediate representation of the
compiled modules, but for low-level programming languages - assembly language and
C, which we will study later - this format is fixed and is called object code; we have
already met this concept (see §3.1.4). As we have seen, the linker, also known as the
link editor (the ld program), is used to build the executable, but so far we have always run
ld to create an executable from a single (main) module; in this chapter we will learn
how to use it to finally build an executable from an arbitrary set of object modules.
As we mentioned when discussing Pascal modules, a very important property of a
module is that it has its own namespace, which allows us to hide from other modules
the names used by our module for internal purposes, and thus avoid accidental name
conflicts. In assembly language terms, this means that labels entered in a module will
only be visible from elsewhere in the same module, unless we specifically declare them
"global"; recall that in NASM assembly language this is done with the global
directive. It often happens that a module introduces several dozens, and sometimes
hundreds of labels, but all of them are needed only in the module itself, and only one
or two procedures are needed from the rest of the program. This practically eliminates
the problem of name conflict: labels with the same names may appear in different
modules, and it doesn't bother us in any way, unless they are global. Technically, this
means that when translating the source code of a module into object code, all labels
except those declared as global labels disappear, so that the object file contains only
information about the names of global labels.
Hiding implementation details (so-called encapsulation) can be used as the
simplest "foolproofing", preventing other programmers from using our module's
features in a different way than we intended; of course, such protection can be easily
circumvented if desired, but our colleagues might at least think about whether they are
doing the right thing. In addition, local names can be left out of the technical
documentation and can be changed without fear that something will "break" in other
modules.
3.7.1. Support of modules in NASM
The NASM assembler supports modular programming by introducing two basic
concepts: global labels and external labels. We are already familiar with the former:
such labels are declared with the global directive and, as we already know, differ
from the usual ones in that information about them is included in the module object file
and becomes visible to the system linkage editor. As for external labels, on the contrary,
these are labels that we expect other modules to introduce. Most often it is simply the
name of a subprogram (less often a global variable) that is described somewhere in
another module, but which we need to refer to. To make this possible, we must inform
the assembler of the existence of this label. Indeed, during translation the assembler
sees the text of only one module and knows nothing about the fact that other modules
declare certain labels, so if we try to access a label from another module without
informing the assembler of its existence, we will get an error message. To declare
external labels, the NASM assembler introduces the extern directive. For example,
if we write a module in which we want to refer to the myproc procedure, but the
procedure itself is described elsewhere, we should write the extern directive to inform
the assembler about it:

extern myproc

Such a string tells the assembler literally the following: "the myproc label exists even
though it is not in the current module; if you encounter this label, just generate the
appropriate object code, and the link editor will substitute the specific address for the
label".

3.7.2. Example
As a multi-module example, we will write a simple program that asks the user for
his name and then greets him by name. This time we will organize string handling the
way it is usually done in C programs: we will use the null byte as a sign of the end of
the string. We have already encountered this representation of strings when we studied
command line parameters (§3.6.8) and even wrote the strlen subroutine that
calculates the length of a string; we will need it this time as well.
The head program will depend on two main subroutines, putstr and getstr,
each of which will be placed in a separate module. The putstr subroutine will need
to calculate the length of the string in order to print the entire string in one call to the
operating system; for this calculation we will use the familiar strlen, which we
will also put into a separate module. Another module will contain a subroutine that
performs the _exit call; we will call it quit. All modules will be
named the same as the subroutines they contain: putstr.asm, getstr.asm,
strlen.asm and quit.asm.
To organize system calls, we use the kernel macro, which we described on
page 663. We will also put it in a separate file, but this file cannot be a full-fledged
module. Indeed, a module is a unit of translation, while a macro, in general, cannot be
translated into anything: as we noted earlier, macros completely disappear during
translation and there is nothing left of them in the object code. This is understandable,
because macros are a set of instructions not for the processor, but for the assembler
itself, and in order for a macro to be of any use, the assembler must, of course, see the
macro definition wherever it encounters a reference to the macro. That's why we will
include the file containing our kernel macro into the other files with the %include
directive at the macro-processing stage (unlike modules, which are assembled into a
single whole with the help of the link editor much later, after translation is
completed). We will call this file kernel.inc; we may well start with it, by opening
it for editing and typing in the macro definition given on page 663; nothing else needs
to be typed into this file.
Next we will write the strlen.asm file. It will look like this:

;; asmgreet/strlen.asm ;;
global strlen

section .text
; procedure strlen
; [ebp+8] == address of the string
strlen: push    ebp
        mov     ebp, esp
        xor     eax, eax
        mov     ecx, [ebp+8]    ; arg1
.lp:    cmp     byte [eax+ecx], 0
        jz      .quit
        inc     eax
        jmp     short .lp
.quit:  pop     ebp
        ret

The first line of the file indicates that the strlen label will be defined in this module
and that this label should be made visible from other modules. It is better to place
global and extern directives at the very beginning of the module text for clarity.
We will not comment the code of the procedure in detail, as we are already familiar
with it.

With the strlen procedure at our disposal, let's write the putstr.asm
module. The putstr procedure will call strlen to calculate the length of the
string and then call the write system call; the new procedure will differ from the
print_str procedure we wrote in the example that prints command line
arguments by using the kernel macro.

;; asmgreet/putstr.asm ;;
%include "kernel.inc"           ; we need the kernel macro
global putstr                   ; the module describes putstr
extern strlen                   ; and itself uses strlen

section .text
; procedure putstr
; [ebp+8] == address of the string
putstr: push    ebp             ; normal start of
        mov     ebp, esp        ; a subroutine
        push    dword [ebp+8]   ; call strlen to
        call    strlen          ; calculate the string length
        add     esp, 4          ; the result is now in EAX
        kernel  4, 1, [ebp+8], eax      ; call write
        mov     esp, ebp        ; normal termination of
        pop     ebp             ; a subroutine
        ret

Now it is the turn of the most complex module - getstr. The getstr procedure
will receive as input the address of the buffer in which the read string should be placed,
as well as the length of this buffer to prevent it from overflowing if the user thinks of
typing a string that will not fit in the buffer. To simplify the implementation, we will
read the string one character at a time. Of course, real programs don't do this, because
a system call is quite expensive in terms of program execution time, and it's a bit
wasteful to spend it on a single character; but our goal now is not to get an efficient
program, so we can make our lives a bit easier.
The getstr subroutine will use the EDX register to store the address of the
current position in the buffer and the ECX register to store the total number of
characters read; at the beginning of the loop, ECX will be incremented by one
and its new value will be compared with the value of the second argument of our
procedure (i.e. the buffer size). This will allow us, in case of a threat of buffer overflow,
to terminate the execution of the procedure by writing a limiting zero to the end of the
buffer - there is still enough space for it in the buffer, because we will write it in this
case instead of reading the next symbol. Register EDX, we will also increase by one,
but already at the end of the cycle, after reading the next character and checking
whether it is not a character of the end of the line. When the end of the line is detected,
we will transfer control outside the loop without incrementing EDX, so that the limit
zero will be written to the buffer over the end-of-line character. There is also a third
case in which the character reading loop will be terminated - an "end of file" situation
occurs on standard input; in this case, no character will be read into the next buffer cell,
but instead a zero will be written into that cell.
Since our procedure will only use the ECX, EDX and AL registers, the CDECL
convention will be followed without any extra effort. The kernel macro is also
written in accordance with CDECL and may corrupt the values of the EAX, ECX and
EDX registers; we do not keep anything in EAX for long, using only its low byte
(AL) for short-term storage of the read character's code, to compare it with the line feed
code; but ECX and EDX will have to be saved on the stack before calling kernel
and restored afterwards. The complete getstr.asm module will look like this:

;; asmgreet/getstr.asm ;;
%include "kernel.inc"           ; we need the kernel macro
global getstr                   ; getstr is exported

section .text
getstr:                         ; arg1 - buffer address, arg2 - its length
        push    ebp             ; standard start of
        mov     ebp, esp        ; a procedure
        xor     ecx, ecx        ; ECX -- count of characters read
        mov     edx, [ebp+8]    ; EDX -- current address in the buffer
.again: inc     ecx             ; increase the counter immediately
        cmp     ecx, [ebp+12]   ; and compare it with the buffer size
        jae     .quit           ; if there's no room -- get out
        push    ecx             ; save the registers ECX
        push    edx             ; and EDX
        kernel  3, 0, edx, 1    ; read 1 character into the buffer
        pop     edx             ; restore EDX
        pop     ecx             ; and ECX
        cmp     eax, 1          ; did the system call return 1?
        jne     .quit           ; if not, we're out
        mov     al, [edx]       ; code of the character just read
        cmp     al, 10          ; is it the line feed code?
        je      .quit           ; if so, we're out
        inc     edx             ; increment the current address
        jmp     .again          ; continue the loop
.quit:  mov     [edx], byte 0   ; write the terminating 0
        mov     esp, ebp        ; standard termination of
        pop     ebp             ; a procedure
        ret

Now let's write the simplest of our modules - quit.asm:

;; asmgreet/quit.asm ;;
%include "kernel.inc"
global quit

section .text
quit:   kernel  1, 0

All the subroutines are ready; let's start writing the head module, which we will call
greet.asm. Since all system calls are made from within subroutines, we won't need
the kernel macro in the head module (and, therefore, the inclusion of the
kernel.inc file). We will describe the text of the messages generated by the program,
as usual, in the form of initialized strings in the .data section; we should only
remember that in this program all strings must be terminated by a zero byte. We will
place the buffer for reading the string in the .bss section. The .text section
will consist entirely of subroutine calls.
;; asmgreet/greet.asm ;;
global _start                   ; this is the head module
extern putstr                   ; it uses the subroutines
extern getstr                   ; putstr, getstr
extern quit                     ; and quit

section .data                   ; describe the messages
nmq     db      'Hi, what is your name?', 10, 0
pmy     db      'Pleased to meet you, dear ', 0
exc     db      '!', 10, 0

section .bss                    ; allocate memory for the buffer
buf     resb    512
buflen  equ     $-buf

section .text                   ; the beginning of the head program
_start:
        push    dword nmq       ; call putstr for nmq
        call    putstr
        add     esp, 4
        push    dword buflen    ; call getstr with
        push    dword buf       ; parameters buf and buflen
        call    getstr
        add     esp, 8
        push    dword pmy       ; call putstr for pmy
        call    putstr
        add     esp, 4
        push    dword buf       ; call putstr for the string
        call    putstr          ; entered by the user
        add     esp, 4
        push    dword exc       ; call putstr for exc
        call    putstr
        add     esp, 4
        call    quit            ; call quit

So, our working directory now contains the files kernel.inc, strlen.asm,
putstr.asm, getstr.asm, quit.asm and greet.asm. To get a working
program, we need to call NASM for each of the modules separately (remember that
kernel.inc is not a module):
nasm -f elf -dOS_LINUX strlen.asm
nasm -f elf -dOS_LINUX putstr.asm
nasm -f elf -dOS_LINUX getstr.asm
nasm -f elf -dOS_LINUX quit.asm
nasm -f elf -dOS_LINUX greet.asm

Note that the -dOS_LINUX flag is needed only for those modules that use
kernel.inc, so we could have omitted it when compiling strlen.asm and
greet.asm. However, practice shows that it is easier to always specify such flags
than to remember which modules need them and which do not.
The result of NASM work will be five files with the suffix ".o" representing object
modules of our program. To combine them into an executable file, we will call the ld
linkage editor (on 64-bit systems do not forget to add the -m elf_i386 flag):

ld greet.o strlen.o getstr.o putstr.o quit.o -o greet

The result this time will be an executable file called greet, which we will run as
usual with the ./greet command:

avst@host:~/work$ ./greet
Hi, what is your name?
Andrey Stolyarov
Pleased to meet you, dear Andrey Stolyarov!
avst@host:~/work$

3.7.3. Object code and machine code


The above examples show that each object module, among other things, is
characterized by a list of symbols (in assembler terms, labels) that it provides to other
modules, as well as by a list of symbols that other modules must provide to it.
Literally translating the names of the corresponding directives (global and
extern) from English, we could call such symbols "global" and "external"; more
often, however, they are called "exported" and "imported".

It is clear that when translating the source code, the assembler, seeing the reference
to an external label, cannot replace this label with a specific address, because it does
not know it - after all, the label is defined in another module, which the assembler does
not see. All the assembler can do is to leave a free space for such an address in the final
code and write information into the object file, which will allow the link editor to
arrange all the missing addresses when their values are already known. On closer
examination it turns out that the assembler cannot replace labels with specific addresses
not only in case of references to external labels, but never at all. The point is that, since
the program consists of several (as many as you like) modules, the assembler, when
translating one of them, cannot predict which module will be the last one in the final
program, what size all the preceding modules will be and, therefore, cannot know in
which memory area (even virtual memory) the code that the assembler is generating
now will be located.
Obviously, the linking editor does not see the source code of modules, and cannot
see it, since it is intended to link modules derived by different compilers from source
code in, quite possibly, different programming languages. Consequently, all the
information that is required for the final transformation of object code into executable
machine code must be written to the object file. The object code, which is obtained as
a result of assembly, is a kind of "semi-finished product" of machine code, in which
instead of absolute (numerical) addresses there is information about how to calculate
these addresses and where they should be placed in the code.
Note that you can find out information about the symbols contained in an object
file using the nm program. As an exercise, try applying this program to the object files
of modules you have written (or modules from the above examples) and interpret the
results.
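For example, applied to the putstr.o module from our example, nm might print
something like this (the exact form of the output may differ between systems); the
letter T marks a symbol defined in the module's code section, and U marks an
undefined, i.e. imported, symbol:

avst@host:~/work$ nm putstr.o
00000000 T putstr
         U strlen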

3.7.4. Libraries
Most often, programs are not written "from scratch", as we have done in most
examples, but use sets of ready-made subroutines in the form of libraries. Naturally,
such subroutines are included in modules, and it is more convenient to have the modules
themselves in precompiled form, so as not to waste time on compilation; of course, it
is useful to have the source code of these modules available, but libraries are used more
often in precompiled form. Generally speaking, there are different kinds of program
libraries; for example, there are macro libraries, which of course cannot be precompiled
and exist only as source code. Here, however, we will consider a narrower concept,
namely what is meant by the term "library" at the link editor level.
From a technical point of view, a subroutine library is a file that combines some
number of object modules and, as a rule, contains tables for accelerated search of
symbol names in these modules.
Note one important property of object files: each of them can be included in the
final program only in its entirety or not included at all. This means, for example, that
if you have combined several subroutines in one module, and someone needs only one
of them, the executable file will still contain the code of the whole module (i.e. all
subroutines). It is worth keeping this in mind when dividing a library into modules; for
example, system libraries supplied with operating systems, compilers, etc. are usually
organized according to the "one function, one module" principle.
Libraries are compiled from individual object modules using specially designed
programs. In Unix, the corresponding program is called ar. Its original purpose was
not limited to creating libraries (the very name ar means "archiver"), so when you
call the program, you must specify with a command line parameter what you want it to
do. For example, if we wanted to combine all modules of the greet program into a
library (except, of course, the main module, which cannot be used in other programs),
we could do it with the following command:

ar crs libgreet.a strlen.o getstr.o putstr.o quit.o

The choice of file name for a library should be noted separately. The .a suffix (from
the word archive) is considered standard for static library files in Unix, but there is
more to it than that. There is a rather unobvious convention that you should add not
only a suffix to the library name (which is clear and familiar), but also a prefix - these
three letters lib. So in this case the library name is just greet, while the name of
the file containing the library is libgreet.a; this file will be the result of running ar.
After that, you can link the greet program with the link editor by specifying the
name of the library file:

ld greet.o libgreet.a

But you can do something else: specify with the -l flag the name of the library (in
our case just greet), and with the -L flag the directory where a library with this
name should be searched for (in our case the current one):

ld greet.o -l greet -L .

This approach is convenient for libraries installed in the system, because the linker
knows the system directories itself and the -L flag is not needed.
Unlike a monolithic object file, a library, while packed into a single file, continues
to be a set of object modules from which the link editor selects only those it needs to
satisfy unresolved links. This will be discussed in more detail in the next paragraph.

3.7.5. Algorithm of the linkage editor


The linkage editor is given a list of objects on the command line, each of which can
be either an object file or a library. Object files can be specified by filename only,
whereas libraries can be specified in two ways: either by explicitly specifying a
filename or, using the -l flag, by specifying a library name, which can be
simplistically understood as the library filename with the prefix lib and the suffix .a
stripped away[279]. When using the -l flag, the linkage editor tries to find a library
file corresponding to the given library name.


In its work, the link editor uses two lists of symbols: a list of known (resolved)
symbols and a list of unresolved references. The first list contains symbols exported by object
modules (in our texts in NASM assembly language we marked such symbols with the
global directive), the second list contains symbols to which there are already

[279] We do not consider here the case of so-called shared libraries, whose files have
a .so suffix; the concept of dynamic loading requires additional discussion that is
beyond the scope of our book.
references, i.e. there are modules importing these symbols (for NASM these are
symbols declared with the extern directive and then used), but which have not yet
appeared in any of the modules as exported.
The linkage editor starts by initializing the list of resolved symbols as empty and
the list of unresolved references as containing just one label, the entry point (by
default, the _start label), and proceeds step by step, from left to right, through the
list of objects specified on its command line. If the next specified object is an object
file, the linkage editor "accepts" it into the generated executable file. All symbols
exported by this module are entered into the list of known symbols; if some of them
were present in the list of unresolved references, they are removed from it. Symbols
imported by the
module are added to the list of unresolved references, unless they appear in the list of
known symbols at that time. Object code from the module is accepted by the link editor
for further conversion into executable code and insertion into an executable file.
If the next object in the list specified on the command line is a library, the link
editor's actions will be more complex and flexible, since there may be no need to accept
all the modules that make up the library. First of all, the link editor will check the list
of unresolved links; if the list is empty, the library will be completely ignored as
unnecessary. Usually, however, the list is not empty in this situation (otherwise the
programmer would not have specified the library), and the next action of the link editor
is to try to find modules in the library that export one or more characters with names
appearing on the current list of unresolved references; if such a module is found, the
link editor "accepts" it, modifies the symbol lists accordingly, and starts examining the
library from the beginning, and so on, until no suitable module can be found among
those remaining unaccepted. Then the link editor stops examining the library and
moves on to the next object in the list. As a result, only those modules are taken from
the library that are needed to satisfy the symbol import needs of the preceding modules,
plus possibly those modules needed by already accepted modules from the same
library. Thus, when
building the greet program from the previous paragraph, the linkage editor first took
the getstr, putstr, and quit modules from the libgreet.a library because
they contained characters imported by the previously accepted greet.o module;
then the linkage editor took the strlen module as well, because the putstr
module needed it.
The link editor generates error messages and refuses to continue building the executable in two main cases. The first occurs when the list of objects (modules and libraries) is exhausted but the list of unresolved references is not empty, i.e. at least one of the accepted modules refers, as an external reference, to a symbol that never appeared as exported in any of the modules; such an error situation is called an undefined reference. The second error situation is the appearance, in the next accepted module, of an exported symbol that is already in the list of known symbols at that point; in other words, two or more accepted modules export the same symbol. This is called a name conflict. (Modern link editors, in order to please careless programmers, allow some cases of name conflict not to be considered an error; this is used, for example, by C++ compilers. Try not to rely on such features if you can avoid it.)
Interestingly, the link editor never goes backwards in its progress through the object list: if a module in a library was not accepted when the editor got to that library, it will never be accepted later, even if one of the subsequent modules imports a symbol that could have been resolved by accepting more modules from the previously processed library. An important consequence of this fact is that object modules should be specified before the libraries these modules need. A second important consequence is that libraries should never depend on each other circularly, i.e. if one library uses the features of a second library, the second library must not use the features of the first. If such cross-dependencies occur, the two libraries should be merged into a single one, although it is better to first consider whether some of the dependencies can be eliminated, even at the cost of duplicating functionality.
One more important remark. As long as libraries do not depend on each other at
all, we don't have to worry too much about the order of parameters for the linkage
editor: it is enough to first specify all the object files that make up our program in any
order, and then, again in any order, list all the necessary libraries. If dependencies
between libraries appear, the order of their specification becomes important, and if it is
not observed, the program will not be built. As you can see, library dependencies, even
if they are not cross-referenced, pose certain problems; it must be said that the order of
arguments for the link editor is by no means the most serious of these problems,
although it is the most obvious. Therefore, before relying on the capabilities of another
library, you should think carefully and repeatedly; it is better not to allow such
dependencies at all, i.e., you should try to design libraries so that they never use the
capabilities of other libraries. If this does happen, you should consider merging such
libraries into one - but this is also not always the right thing to do, since the features of
any library must be logically unified in some way.
Knowing how the link editor works will be useful not only (and not so much) in assembly language programming, but also in practical work in high-level languages, especially C and C++. If you ignore the contents of this paragraph, you risk, on the one hand, overloading your executables with unnecessary (unused) content and, on the other hand, designing your libraries in such a way that you eventually get confused by them.
To conclude the discussion of the link editor, we should note that it is (as a program) quite complex, although you may never need most of its functionality. In any case, ld recognizes several dozen different command-line options and can even handle special linker scripts that control linking. We will not look at the scripts, just as we will not look at most of the options; we will focus on just a few that, first, you may actually need and, second, give an idea of what can be controlled and how.
We have already seen several flags: -l allows you to link a library by its short name (without specifying the full file name and path); -L adds new directories to the beginning of the directory list where the linker looks for the library files requested with -l. The -o flag specifies the name of the resulting file (usually an executable). The -m flag specifies the architecture (if you like, the "platform") for which the build is performed; for example, we have repeatedly mentioned that when building 32-bit programs on 64-bit systems you should specify the -m elf_i386 switch.
The -nostdlib flag removes the "system" directories from the search list, leaving only those you specify with -L; the -e option allows you to set an entry-point label other than _start (for example, if you put "-e gogogo" on the linker command line, the label gogogo will be used instead of _start). The -y option takes a symbol name as a parameter; with this option the linker prints a message about every mention of the specified symbol it encounters in object files and libraries.
Later, when programming in C and C++, you may find useful the -static flag, which prohibits the use of dynamic library versions and makes the resulting executable statically linked, independent of anything external, and the -s flag (from the word strip), which removes all "unnecessary" information (mainly debugging information) from the resulting file. When programming in assembly language we did not use any dynamic libraries, and we did not put debugging information into object files either, although that is possible if you pass the -g flag to nasm; it is similar in meaning to the same flag of Free Pascal (see §2.13.4, page 502).
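Putting several of these flags together, a hypothetical build of a 32-bit program with a non-standard entry point and a stripped executable might look like this (the file names are, of course, arbitrary):

    ld -m elf_i386 -s -e gogogo -o prog prog.o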

3.8. Floating-point arithmetic


So far we have considered only commands for working with integers. Meanwhile,
back in the introduction we talked about calculations in floating-point numbers (see
§1.4.3), and when studying Pascal we even worked with them ourselves (remember the
real type). Recall that a floating-point number is usually considered to represent
some value approximately, and rounding errors occur during arithmetic operations; this
is the inevitable cost of representing inherently continuous quantities in a discrete way.
The early x86 processors (up to and including the 80386) had no floating-point capabilities of their own; floating point could either be emulated in software, which was very slow, or an additional chip called an arithmetic coprocessor could be installed in the computer: the 8087 for the 8086, the 80287 for the 80286 and, finally, the 80387 for the 80386. Computers based on the 386 were almost all equipped with a coprocessor; there was no demand for machines without one, because the slight reduction in system cost did not compensate for the disgustingly slow performance of the machine on computational tasks of any noticeable size. The last processor in the line designed for a coprocessor as a separate chip was the 486SX; in the 486DX the coprocessor circuits were incorporated into the same physical chip as the main processor. Nevertheless, from the point of view of a running program, the arithmetic coprocessor is still (to this day) a separate processor with its own system of registers, quite different from those of the main processor, with its own flags, which have to be copied into the main flag register by special instructions, and with its own peculiar principles of operation.

3.8.1. Floating-point number formats


We discussed the representation of floating-point numbers in the introductory section (see §1.4.3). Recall that this representation consists of three parts: the sign bit, the biased order (exponent) and the mantissa. The hardware platform we are considering uses three formats of floating-point numbers: single, double and extended precision (see Table 3.5). The mantissa t usually satisfies 1 ≤ t < 2, which makes it possible not to store its integer part, implying that it equals 1; in the extended-precision format, however, the integer part of the mantissa is stored explicitly.

Table 3.5. Floating-point number formats

  Name ("precision")     size (bits)   order (bits)   order bias   mantissa (bits)
  single precision            32             8             127          23 +
  double precision            64            11            1023          52 +
  extended precision          80            15           16383          64 ++

  + The integer part of the mantissa is not stored and is assumed to be 1.
  ++ The integer part of the mantissa is stored (one bit is allocated for it):
     the high bit of the mantissa is its integer part (usually equal to 1).

The arithmetic value of a floating-point number is defined as

    (-1)^s * 2^(p-b) * t,

where s is the sign bit, p is the order value (read as an unsigned integer), b is the order bias for the given format, and t is the mantissa.
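For example, in the single-precision format the number 1.0 is stored with s = 0, p = 127 and t = 1.0 (the integer part implied), giving (-1)^0 * 2^(127-127) * 1.0 = 1.0, while -0.5 is stored with s = 1, p = 126 and t = 1.0, giving (-1)^1 * 2^(126-127) * 1.0 = -0.5.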
In all formats, an order consisting of only zeros or only ones marks a special form of number. Of all these forms, only ordinary zero, whose representation consists entirely of zeros - in the sign, in the order and in the mantissa - is an ordinary number. Zero was included among the "special cases" for one simple reason: it obviously cannot be represented by a number with a mantissa between 1 and 2. All the other "special cases" indicate that something has gone wrong; the only question is how serious the "wrong" is.
In particular, a number whose sign bit is set to one while all other bits - both in the mantissa and in the order - are zero means "minus zero". The occurrence of "minus zero" in calculations indicates that the actual result should be a negative number so small in absolute value that it cannot be represented with even one significant bit in the mantissa. The same situation can, of course, occur with "plus zero" (a positive number too small in absolute value), but an ordinary zero may still be a genuine zero rather than the result of a rounding error, whereas "minus zero" is always the result of such an error. How "serious" it is depends on the problem being solved. In most applied calculations the difference between zero and, say, the number 2^(-1000), or even 2^(-30), is of no importance; such small values can usually simply be neglected: for example, if calculations show that a car is moving at a speed of 10^(-10) km/h, then in any reasonable sense of the word we should consider the car to be standing still.
A non-zero mantissa with a zero order means a denormalized number. In this case it is assumed that the unbiased order is equal to its minimum allowed value (-126, -1022 or -16382; i.e., the order is computed as if the order field contained a one) and that the integer part of the mantissa is zero. The inclusion of denormalized numbers in the IEEE-754 standard has caused and continues to cause serious criticism, because it dramatically complicates the implementation of processors, and of software too, without giving any practical gain. Be that as it may, the coprocessor can (i.e., is physically capable of) carrying out calculations with denormalized numbers, but if you find denormalization in your calculations, it would be more correct to change the units of measurement used - for example, to carry out a calculation of capacitor capacitance in picofarads instead of farads, and to represent the diameter of a gear in some mechanism in millimeters instead of, say, light years.
However, even if you measure microbes in units more suitable for galaxies, you will not get into the denormalization area - for that you need something more serious. Judge for yourself. In astronomy the unit of distance called the parsec is quite popular; it is a little over three light years. Distances to the most distant objects that astronomers still somehow manage to deal with are measured in gigaparsecs; it is probably safe to say that the gigaparsec is the largest unit of length used in practice. A gigaparsec is 3.0857*10^26 meters. Let us now pass to the other end of the scale; we will leave microbes alone and consider viruses at once. The size of a common influenza virion is somewhere around 100 nm, i.e. 10^(-7) meters. Knowing this, we find that the influenza virion is about 3.25*10^(-33) gigaparsecs in diameter. The machine (i.e., binary) order needed to represent such a number is about -112, which means that even with single-precision numbers we would still have some 14 powers of two in reserve before hitting the denormalization region - enough to break the unfortunate virion into individual atoms. To tell the truth, if we wanted to measure an electron in those same gigaparsecs, single-precision numbers would indeed run out of orders, but nobody prevents us from taking double-precision numbers, especially since everyone usually works in them anyway; and there, having at our disposal orders down to 2^(-1022), we can easily express the Planck length (within the framework of modern physical concepts, the smallest possible length, i.e. nothing can be shorter) taking as the unit, say, the diameter of the observable part of the Universe; this, however, turns out to be "only" something around 10^(-62), i.e. the machine order is minus two hundred and something, which is far from a thousand. All this helps to evaluate the mental abilities (and, most importantly, the degree of irresponsibility) of the authors of IEEE-754 who, having already described a tool for measuring quantities far smaller than could ever be meaningful (which by itself is normal: everything should be done with a reserve, and the width of the machine order does not affect the complexity of implementation), nevertheless invented denormalized numbers, sharply complicating for their sake both the implementation of processors and the rules of working with them.
An order consisting of only ones is used to denote various kinds of "non-numbers": plus infinity and minus infinity (the mantissa consists of zeros; the sign bit indicates whether the infinity is positive or negative), as well as indefinite values, the "quiet non-number" (quiet NaN), the "signaling non-number" (signaling NaN), and even the "unsupported number" (there are ones in the mantissa; the kind of non-number depends on their specific location). In normal calculations nothing of this kind can occur; for example, "infinities" arise when division by zero is performed while the processor is configured not to initiate an exceptional situation (an internal interrupt) and not to give control to the operating system. In normal calculations the processor always initiates an exception on division by zero and in other similar circumstances. The use of calculations that do not stop when operations are performed on obviously incorrect source data is a separate and rather nontrivial subject that we will not consider.
Since the coprocessor can handle real numbers stored in memory in any of the three formats listed above, the assembler has to support notations for eight-byte and ten-byte memory areas. To indicate such operand sizes in instructions, the NASM assembler provides the keywords qword (from quad word, "quadruple word") and tword (from ten-byte word). There are also corresponding pseudo-commands for describing data (dq specifies an eight-byte value, dt a ten-byte value), as well as for reserving uninitialized memory (resq reserves a specified number of eight-byte elements, rest a specified number of ten-byte elements).
In the pseudo-commands dd and dq, floating-point numbers can be used alongside the usual integer constants, while dt allows only floating-point initializers. NASM distinguishes floating-point numbers from integers by the presence of a decimal point; "scientific notation" is also allowed, so that, for example, 1000.0 can be written as 1.0e3, 1.0e+3 or 1.0E3, and 0.001 as 1.0e-3, and so on. NASM also supports writing floating-point constants in hexadecimal notation, but this feature is rarely used.
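For example, a data section using these pseudo-commands might look like this (the labels and values here are arbitrary):

section .data
pi      dq 3.141592653589793  ; an eight-byte (double precision) value
half    dd 0.5                ; a four-byte (single precision) value
big     dt 1.0e+100           ; a ten-byte (extended precision) value

section .bss
buf     resq 16               ; room for 16 eight-byte numbers
tmp     rest 1                ; room for one ten-byte number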
Note that the coprocessor itself, unless interfered with, performs all operations on extended-precision numbers, and uses the other formats only when loading values from memory and storing them back.

3.8.2. Structure of the arithmetic coprocessor


The arithmetic coprocessor has eight 80-bit registers for storing numbers, which we will conventionally designate R0, R1, ..., R7; the registers form a kind of stack: one of the registers Rn is considered the top of the stack and is designated ST0, the next one is designated ST1, and so on, with R0 considered to follow R7 (for example, if R7 currently plays the role of ST4, then the role of ST5 is played by R0, ST6 by R1, etc.).

Fig. 3.10. Arithmetic coprocessor registers

Fig. 3.10 shows the situation when register R3 is declared the top of the stack; the role of the top can be played by any of the registers Rn. When a new value is pushed onto this stack, all the values already stored there remain in place, and only the number of the register playing the role of the top changes: if a new value is pushed onto the stack shown in the figure, the role of the top, ST0, passes to register R2, register R3 becomes ST1, and so on. When a value is removed from the stack, the opposite happens. Note that these registers can be accessed only by their current position in the stack, i.e. by the names ST0, ST1, ..., ST7. You cannot address them by their permanent numbers (R0, R1, ..., R7); the processor does not provide this possibility.
The designations ST0, ST1, ..., ST7 correspond to NASM conventions. Other
assemblers use other designations; in particular, MASM and some other assemblers
designate the arithmetic coprocessor registers using parentheses: ST(0), ST(1), ..., ST(7),
and these are the designations most often found in the literature. Do not be surprised by this.
The course of calculations is controlled through the service registers CR, SR and TW, whose structure is shown in Fig. 3.11. These registers consist of separate flag bits, plus several two-bit fields and one three-bit field. For completeness we will now describe all the bits of these registers; if something remains unclear, do not worry - you can always return to this paragraph later.
The SR (status register) contains a number of flags describing, as the name implies, the state of the arithmetic coprocessor. In particular, bits 13, 12 and 11 (three bits in total) contain a number from 0 to 7, called TOP, which indicates which of the Rn registers is currently considered the top of the stack. The flags C0 (bit 8), C2 (bit 10) and C3 (bit 14) correspond in meaning to the CPU flags CF, PF and ZF; there is also a flag C1, but it is rarely used.
       15  14  13  12  11-10  9-8   7    6   5   4   3   2   1   0
CR:     -   -   -  IC   RC    PC   IEM   -   PM  UM  OM  ZM  DM  IM

       15  14  13-11  10   9   8   7   6   5   4   3   2   1   0
SR:     B  C3   TOP   C2  C1  C0  IR  SF  PE  UE  OE  ZE  DE  IE

       15-14  13-12  11-10  9-8   7-6   5-4   3-2   1-0
TW:    tag7   tag6   tag5   tag4  tag3  tag2  tag1  tag0

Fig. 3.11. Bits of the CR, SR and TW registers

The lower six digits of the SR register indicate special situations: loss of precision
(PE), too large or too small result of the last operation (OE and UE, overflow,
underflow), division by zero (ZE), denormalization (DE), invalid operation (IE).
They are joined by the SF bit, indicating stack overflow or anti-overflow (SF). All
these bits will be discussed in detail in §3.8.7.
The IR (interrupt request) flag indicates the occurrence of an unmasked
exceptional situation resulting in the initiation of an internal interrupt; you can only see
this flag set in an interrupt handler within the operating system, so it does not concern
us. Finally, bit B (busy) means that the coprocessor is currently busy with
asynchronous execution of the command. It should be said that in modern processors
this bit cannot be seen set except in the interrupt handler.
The control register CR also consists of individual flags but, unlike the status register, these flags are usually set by the program and are designed to control the coprocessor, i.e. to set its mode of operation. The lower six bits of this register correspond to the same special situations as the lower six bits of SR, but they are designed not to signal the occurrence of these situations but to mask them: if the corresponding bit contains a one, the special situation does not lead to an internal interrupt but only sets a bit in the SR register. Bits 11 and 10 (RC, rounding control) set the mode of rounding the result of an operation: 00 - to the nearest number, 01 - downward, 10 - upward, 11 - toward zero (that is, truncating the absolute value).
Bits IC (12) and IEM (7) of the CR register are not used in modern processors. Bits 9 and 8 (PC, precision control) set the precision of the operations performed: 00 - 32-bit numbers, 10 - 64-bit numbers, 11 - 80-bit numbers (the last mode is used by default, and the need to change it arises very rarely).
The tag register TW contains two bits describing the state of each of the registers R0-R7: 00 - the register contains a number, 01 - the register contains zero, 10 - the register contains a special kind of value (a NaN, an infinity or a denormalized number), 11 - the register is empty. Initially all eight registers are marked as empty; as numbers are pushed onto the stack, the corresponding registers are marked as filled, and when numbers are removed from the stack, they are marked as empty again. This makes it possible to track stack overflow and stack underflow - the situations when a ninth number is pushed onto the stack (and there is nowhere to put it) or, on the contrary, an attempt is made to pop a number from an empty stack.
The service registers FIP and FDP are designed to store the address of the last machine instruction executed by the coprocessor and the address of its memory operand; they are used by the operating system when analyzing the causes of an error (exceptional) situation.
The mnemonics of all machine commands related to the arithmetic coprocessor begin with the letter f, from floating ("floating point"). Most of these commands have no operands or have one operand, but there are also commands with two operands. An operand can be a coprocessor register, denoted STn, or an operand of the "memory" type. The coprocessor does not support immediate operands, i.e. floating-point numbers written directly in the instruction code.

3.8.3. Data exchange with the coprocessor


The fld (float load) command, which has one operand, allows a number to be
written to the register stack from a specified location, which can be an operand of type
"memory" of size dword, qword or tword, or the STn register. For example, the
command
fld st0

creates a copy of the stack top, and the command


fld qword [matrix+ecx*8]

loads onto the stack, from the array matrix of eight-byte numbers, the element whose index is stored in the ECX register. In doing so, the TOP value in the SR register is decremented, so that the top of the stack moves; the old top becomes known as ST1, and so on.
To retrieve a result from the coprocessor (from the top of the register stack) you can use the commands fst and fstp, each of which has one operand. Most often it is an operand of the "memory" type, but a register from the stack may also be specified, for example ST6; it is only important that this register be empty. The main difference between the two commands is that fst simply copies the number at the top of the stack (i.e. in the ST0 register), while fstp pops the number off the stack, marking ST0 as free and incrementing the value of TOP. Actually, the letter "p" in the name of the fstp command stands for the word pop.
For some reason, the fst command cannot work with 80-bit operands of the
"memory" type, while fstp has no such limitation. There is one more thing to note:
the command

fstp st0

first writes the contents of ST0 into itself and then pushes ST0 out of the stack, so
that the effect of this command is to destroy the value at the top of the stack. This is
usually done if the number at the top of the stack is not needed for further calculations.
It is often necessary to convert an integer into floating-point format and vice versa. The fild command takes an integer from memory and pushes it onto the coprocessor stack (converted, of course, to floating-point format). The command has one operand, necessarily of the "memory" type, of size word, dword or qword (the latter here means an eight-byte integer). The fist and fistp commands perform the opposite action: they take the number located in ST0, round it to an integer in accordance with the current rounding mode and write the result into memory at the address specified by the operand. Similarly to fst and fstp, the fist command does not modify the stack in any way, while fistp removes the number from the stack. The operand of fistp can be of size word, dword or qword, while fist can work only with word and dword.
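As a small illustration (the labels here are arbitrary), the following fragment converts an integer to a double-precision number and then rounds a floating-point value back to an integer according to the current rounding mode:

section .data
ival    dd 42                ; a four-byte integer
dval    dq 0.0               ; an eight-byte floating-point number
ires    dd 0                 ; room for the rounded result

section .text
        fild dword [ival]    ; push 42 onto the coprocessor stack
        fstp qword [dval]    ; store it as a double, popping the stack
        fld qword [dval]     ; load it back
        fistp dword [ires]   ; round to an integer and pop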
The fxch command swaps the contents of the stack top (ST0) and any other
STn register that is specified as its operand. The registers must not be empty. Most
often fxch is used to swap ST0 and ST1, in which case the operand can be
omitted.
The coprocessor supports a number of commands that load frequently used constants onto the stack: fld1 (loads 1.0), fldz (loads +0.0), fldpi (loads π), fldl2e (loads log2 e), fldl2t (loads log2 10), fldln2 (loads ln 2), fldlg2 (loads lg 2). All of these commands have no operands; as a result of executing each of them, the value of TOP is decremented and the corresponding value appears in the new register ST0. The current rounding mode determines in which direction the loaded approximation differs from the exact mathematical value.

3.8.4. Arithmetic commands


The simplest way to perform the four arithmetic operations on the coprocessor is provided by the commands fadd, fsub, fsubr, fmul, fdiv and fdivr with one operand, which can be an operand of the "memory" type of size dword or qword. The fadd and fmul commands respectively add and multiply the ST0 register by the operand; the fsub command subtracts the operand from ST0; the fdiv command divides ST0 by the operand; fsubr, on the contrary, subtracts ST0 from the operand, and fdivr divides the operand by ST0. In all cases the result is written back to ST0. All six commands can also be used without an operand, in which case ST1 plays the role of the operand.
All of the above commands also have a form with two operands, and only STn
registers can act as both operands, and one of them must be ST0 (but it can be either
the first or the second operand). In this case, the commands perform the specified action
on the first and second operands and place the result in the first operand.
In addition, each of these commands has a "popping" form, denoted faddp, fsubp, fsubrp, fmulp, fdivp and fdivrp respectively; in this form the commands always have two register operands STn, and the second operand must be ST0. After the operation is executed and the result is written into the first operand, these commands remove ST0 from the stack, i.e. it is marked as empty and the TOP value is increased by one; the number displaced from the stack is not written anywhere. For example, "fsubp st1, st0" subtracts the value of ST0 from the value of ST1 and puts the result into ST1; then the former ST0 is removed from the top of the stack, so that the computed difference ends up at the (new) top of the stack.
Commands in the popping form can also be written without operands, in which case ST1 and ST0 are used as the operands; the action can then be described by the phrase "take two operands off the stack, perform the specified operation on them, push the result back onto the stack". Let us emphasize that the value taken from the top of the stack (from ST0) is used as the right argument of the operation, while the left argument is the value taken from the stack next (i.e. from ST1). Since both original values are removed from the stack and the new value is pushed, the latter naturally appears as the new stack top (ST0), located where the ST1 register used to be.
Some programmers consider only this form of the commands worthy of use. Indeed, any arithmetic expression can be calculated this way, unless it contains too many nested parentheses (otherwise we would run out of stack depth). To do this, the expression must be represented in so-called reverse Polish notation (RPN), in which the operands are written first, followed by the operation sign; the operands may themselves be arbitrarily complex expressions, also written in RPN. For example, the expression (x + y) * (1 - z) would be written in RPN as: x y + 1 z - *. Let x, y and z be defined as memory areas (variables) of size qword containing floating-point numbers. Then, to compute our expression, we can simply translate the RPN form into assembly language, each element of the RPN becoming a single instruction:
fld qword [x]    ; x
fld qword [y]    ; y
faddp            ; +
fld1             ; 1
fld qword [z]    ; z
fsubp            ; -
fmulp            ; *

The result of the calculation will be in ST0. However, using other forms of arithmetic
commands can shorten the program text considerably; as it is easy to see, the following
fragment does exactly the same thing:
fld qword [x]
fadd qword [y]
fld1
fsub qword [z]
fmulp

The single-operand commands fiadd, fisub, fisubr, fimul, fidiv and fidivr are sometimes useful as well; they perform the corresponding arithmetic operation on ST0 and their operand, which must be of the "memory" type of size word or dword and is treated as an integer.
An arbitrary arithmetic expression can be converted from the traditional infix (parenthesized) form into RPN using a rather simple algorithm known as Dijkstra's algorithm (also called the "shunting-yard" algorithm). The original expression is scanned from left to right, and the successive elements of the RPN are written out as soon as possible. During the scan an auxiliary stack is used, which at times holds operation signs and opening parentheses. The algorithm is organized so that operands (constants and variables) and closing parentheses never enter the stack.
At the start, we make the first position of the original expression current, consider the RPN empty, clear the stack and push an opening parenthesis onto it. Then we examine the current element of the expression (as long as elements remain):
• if this element is an operand (variable or constant), write it out as the next element of the RPN;
• if this element is an opening parenthesis, push it onto the stack;
• if this element is a closing parenthesis, pop elements off the stack and write them out as the next elements of the RPN until an opening parenthesis is at the top of the stack; pop that parenthesis as well, but do not write it into the RPN;
• finally, if this element is an operation sign, then:
  • if an opening parenthesis is at the top of the stack, or the priority of the operation at the top of the stack is lower than the priority of the current operation, push the current operation onto the stack;
  • if, on the contrary, the top of the stack holds an operation of the same or higher priority than the current one, pop that operation off the stack and write it out as the next element of the RPN, then compare the current operation with the new top again (more than one operation may end up being popped in this way); when no operations of the same or higher priority remain at the top of the stack, push the current operation onto the stack.
When the expression has been read completely (no elements remain), pop elements off the stack and write them out as the next elements of the RPN until an opening parenthesis is at the top of the stack. That parenthesis is popped as well, but not written into the RPN. Check that the stack is now empty: if it is not, the expression contains an unclosed parenthesis; if, on the contrary, the stack became empty before the expression ended, the expression contained more closing parentheses than opening ones.
For example, the translation of the expression a - b + c * d proceeds as follows:
• push a parenthesis onto the stack;
• write out the operand a;
• push the minus onto the stack;
• write out the operand b;
• since the priority of plus is not higher than the priority of minus, pop the minus off the stack and write it out (the RPN now contains a b -);
• since the top of the stack is again a parenthesis, push the plus onto the stack;
• write out the operand c (the RPN now contains a b - c);
• since the top of the stack now holds a plus while the next element of the expression is a multiplication sign, and multiplication has a higher priority than addition, push the multiplication onto the stack (the stack now contains the multiplication, the plus and the parenthesis);
• write out the operand d, obtaining the RPN a b - c d;
• since the expression is over, pop elements off the stack one by one and write them out until a parenthesis is met: first the multiplication, then the plus, obtaining the RPN a b - c d * +;
• remove the parenthesis from the stack; since after that the stack is empty and the expression is exhausted too, the translation has completed successfully.
If the source expression contains unary operations, they should be re-designated in some way (for example, unary minus can be marked with @ or some other symbol), because on the stack and in the RPN, unlike in the source expression, we would not be able to tell from context which minus - unary or binary - is meant. Otherwise everything is done in the same way, bearing in mind that the priority of any unary operation is usually higher than the priority of all binary operations.
To conclude the conversation about simple arithmetic, let us mention three more commands. The fabs command computes the absolute value of ST0, the fchs command (from change sign) changes the sign of ST0 to the opposite, and the frndint command rounds ST0 to an integer according to the current rounding mode. In each case the result is written back to ST0. All three commands exist in only one form - without operands.
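A tiny sketch of their use: the following fragment replaces the value of some qword variable x with its absolute value rounded to an integer (under the current rounding mode):

fld qword [x]     ; load x
fabs              ; ST0 = |x|
frndint           ; round to an integer
fstp qword [x]    ; store the result back, popping the stack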
We leave the commands fprem, fprem1, fscale and fxtract for inquisitive readers to study on their own.

3.8.5. Commands for calculating mathematical functions


The fsin, fcos and fsqrt commands calculate the sine, cosine and square
root of a number lying in ST0, respectively, and the result is placed back in ST0.
The fsincos command is a bit more complicated: it extracts a number from the
stack, calculates its sine and cosine and puts them on the stack, so that the sine ends up
in ST1, the cosine in ST0, and the total number on the stack is one more than it
was before the command was executed.
The fptan command, which computes the tangent, behaves somewhat exotically. It takes its argument from ST0, computes the tangent and puts the result back into ST0, but after that it pushes the number 1.0 onto the stack, so that the stack ends up holding one number more than before the command was executed, with the one in ST0 and the computed tangent in ST1. The purpose of all these dances is to simplify the calculation of the cotangent: it can now be obtained with the familiar fdivrp command; if the cotangent is not needed, we can get rid of the one by dividing by it, i.e. with the fdivp command, or by simply throwing it off the stack with fstp st0.
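For example (a sketch; angle and t here are assumed to be qword variables, the angle given in radians), the tangent itself can be obtained like this:

fld qword [angle]  ; ST0 = angle
fptan              ; now ST1 = tan(angle), ST0 = 1.0
fstp st0           ; throw the 1.0 off the stack
fstp qword [t]     ; store the tangent, popping the stack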
The fpatan command computes arctan(y/x), where x is the value in ST0 and y the value in ST1. These two numbers are removed from the stack and the result is pushed onto it, so that the stack ends up holding one number fewer than before. The sign of the result coincides with the sign of y, and the absolute value of the result does not exceed π.
In addition, the coprocessor provides the commands f2xm1, fyl2x and fyl2xp1, which are interesting in that, looking at a description of the form "the command does this and that", it is not easy to guess why it does so, i.e. why such a command is needed at all.
We start with the fyl2x command, which computes y * log2(x), where x is the value of ST0 and y is the value of ST1; both values are removed from the stack and the result is pushed onto it, so that in the end there is one number fewer on the stack than before, with the result of the calculation on top. Everything seems clear here except for the role of the operand y, but that, too, turns out to be quite simple. The point is that in real life we have to calculate logarithms to all kinds of bases, not just base 2, and here a formula well known from the school curriculum comes to the rescue:

    log_a(x) = log_b(x) / log_b(a)

Since our processor can compute only binary logarithms, the role of b will naturally be played by two; to work with an arbitrary base a other than 2, we compute y = 1/log2(a) in advance and use it as the second operand of the fyl2x instruction, thereby saving both the division and the repeated computation of the coefficient (the number y).
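As a sketch of this technique: the decimal logarithm corresponds to y = lg 2 = 1/log2(10), a constant which the coprocessor happens to provide directly (fldlg2); here x and res are assumed to be qword variables:

fldlg2             ; ST0 = lg 2 = 1/log2(10)
fld qword [x]      ; ST0 = x, ST1 = lg 2
fyl2x              ; ST0 = lg 2 * log2(x) = lg x
fstp qword [res]   ; store the decimal logarithm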
Having dealt with fyl2x and feeling in a good mood, we run into the next command, fyl2xp1, and the mood instantly evaporates. This command works the same way as the previous one, except that it computes y * log2(x + 1). In addition, the documentation says that the absolute value of x must not exceed 1 - √2/2, otherwise the result is undefined.
To understand why the creators of the coprocessor needed such a monster, recall that log_a(1) is zero for any base a; hence, if the argument is close to one, its logarithm is close to zero. Now imagine that your argument of the logarithm is very close to one - say, differing from one by the value eps = 2^(-100). This value by itself is perfectly representable even in single-precision (four-byte) numbers thanks to the machine order, because the machine order for them, taking the bias into account, ranges from -126 to 127; but if we try to represent the argument of the logarithm itself, i.e. (by the conditions of our problem) the number 1 + eps, even as an extended-precision number, its machine order will be zero, and the length of the mantissa will not be enough to include even one significant bit of eps; in other words, we simply lose the difference between one and the required argument: the argument becomes equal to one, the result (the logarithm of one) becomes zero, and that is the end of our calculations. It turns out that the ordinary logarithm command is useless in a close neighborhood of one; this is where fyl2xp1 comes in handy: its argument is not the number to be logarithmed but that number's difference from one. The result will, of course, be "close to zero", but - again thanks to the machine order - all the available mantissa bits will be used to represent the best possible approximation of the exact result. The restriction on the absolute value of the operand is also clear now: if the operand is larger, we are already far enough from one and can take logarithms by ordinary means.
The f2xm1 command computes 2^x - 1, where x is the value of ST0, and writes the result back to ST0. The absolute value of the argument must not exceed 1, otherwise the result is undefined. Here newcomers not versed in the subtleties of computational mathematics are usually stumped by the question of why the subtraction of one is needed. You can guess the answer by carefully re-reading the description of fyl2xp1 (try it before reading further!), but if you cannot, remember that a^0 = 1 for any positive a (recall, just in case, that for non-positive bases the power function is undefined: only integer exponents make sense there), so for values of x close to zero, a^x - including, of course, 2^x - is close to one. If, when computing the power function in the neighborhood of zero, we tried to represent the result as an ordinary floating-point number (of whatever precision), its machine order would always be zero because of the result's proximity to one, so the loss of precision (the moment when the result ceases to differ from one at all) would come very quickly, since the mantissa is not that long (even an extended-precision number has "only" 64 bits of it, so that 1 + 2^(-65) can no longer be distinguished from one). If, instead, the result of the calculation is not the value 2^x itself but its difference from one, the use of the machine order allows us, even with single-precision numbers (whose machine order goes down to -126), not only to retain significance but even to stay out of the denormalization area, while extended-precision numbers with their 15-bit order let us feel fine even for x as small as 2^(-16000).

None of the commands described in this paragraph has operands.
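To illustrate (a sketch; x and res are assumed to be qword variables with |x| <= 1), here is how 2^x itself can be obtained when the difference from one is not needed:

fld qword [x]      ; ST0 = x, |x| <= 1
f2xm1              ; ST0 = 2^x - 1
fld1               ; push 1.0
faddp              ; ST0 = 2^x
fstp qword [res]   ; store the result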

3.8.6. Comparison and processing of its results


The general idea of comparison, and of acting on its results, is the same for floating-point numbers as for integers: first a comparison is performed and flags are set according to its result, then a conditional jump instruction is used depending on the state of the flags. Everything is somewhat complicated by the fact that the arithmetic coprocessor has its own system of flags, while the main processor has no conditional jump commands driven by those flags. Therefore, the usual scheme has to be extended with a step that sets the main processor's flags in accordance with the current state of the coprocessor's flags.
Comparison can be performed with the commands fcom, fcomp and fcompp.
The fcom and fcomp commands have one operand, either of the "memory" type of
size dword or qword, or the STn register; the operand can be omitted, in which
case ST1 will be the operand. The commands compare ST0 to its operand (or to
ST1 if no operand is specified). The fcomp command differs from fcom in
that it pushes ST0 out of the stack. The fcompp command, which has no
operand, compares ST0 to ST1 and pushes them both off the stack.
As a result of comparison commands, the C3 and C0 flags in the SR register
(see page 686) are set as follows: if the numbers being compared are equal, C3 is
set to one, C0 is reset to zero; otherwise, C3 is reset, and if the first of
the numbers being compared (i.e., the number in the ST0 register) is less than
the second (set by the operand or the ST1 register), C0 is set to one, and if it
is greater, it is reset. Flag C3 is thus similar in meaning to flag ZF, and flag C0 to
flag CF (when comparing unsigned integers).
In fact, the comparison commands also set the C2 flag: if everything is in order, it is reset to zero; if the numbers are not comparable (for example, both numbers are "plus infinity", or one of them is a non-number) and the coprocessor is configured not to initiate interrupts in such situations, then C2 is set to one.
To use the result of the comparison for a conditional jump, the flags from SR must be copied into the FLAGS register of the main processor. This is done with the commands

fstsw ax
sahf
The first of them copies SR into the AX register, and the second loads some (not all!) flags from AH into FLAGS. In particular, after executing these two commands, the value of flag C3 is copied to ZF and the value of C0 to CF (and, for completeness, C2 is copied to PF), which fully meets our needs: now we can use for the conditional jump any of the commands provided for unsigned integers: ja, jb, jae, jbe, jna, etc. (see Table 3.3 on page 575). Let us emphasize once again: the use of these commands is justified only by the fact that, after executing fstsw and sahf, the result of the comparison is in the flags CF and ZF; there is nothing else in common between floating-point numbers and unsigned integers, generally speaking.
Suppose, for example, we have variables a, b and m of size qword containing floating-point numbers, and we want to put the smaller of a and b into m. This can be done as follows:

fld qword [b]    ; b to the top of the stack
fld qword [a]    ; now a in ST0, b in ST1
fcom             ; compare them
fstsw ax         ; copy the flags to AX
sahf             ; and from there to FLAGS
ja lpa           ; if a > b - jump
fxch             ; otherwise swap the numbers
lpa:             ; now the larger one is in ST0, the smaller in ST1
fstp st0         ; eliminate the unnecessary larger one
fstp qword [m]   ; write the smaller one into memory
The "unnecessary" number could be removed from the stack in other ways: instead of the penultimate command one could give two commands - first ffree st0, which marks the ST0 register as free, and then fincstp, which increments the TOP value by one. These commands are discussed in §3.8.9.
The ficom and ficomp commands, which always have one operand of the
"memory" type of word or dword size and treat this operand as an integer, may
also be useful in some cases. Otherwise, they are similar to the fcom and fcomp
commands: the first operand of the comparison is ST0, and flags C3, C2 and C0
are set according to the results of the comparison. The ficomp command, unlike
ficom, pushes ST0 out of the stack. Finally, the ftst command, which has no
operands, compares the top of the stack to zero.

3.8.7. Exceptional situations and their handling


As a result of floating-point calculations, exceptional situations may arise; in some cases they indicate an error in the program or in the input data, while in others they may reflect perfectly normal features of the computation process. There are six such situations: invalid operation, denormalization, division by zero, overflow, underflow and loss of precision.
Invalid Operation (#I) can mean one of two things: an invalid operand or a stack error. The invalid operand situation is recorded when "non-numbers" are used as operands, when a square root or logarithm of a negative number is taken, and so on. A stack error means an attempt to push a new number onto a full stack (i.e. when all eight registers are occupied), an attempt to pop a number when there are no numbers on the stack, or an attempt to use a register that is currently empty as an operand. The auxiliary SF flag makes it possible to distinguish a stack error from an invalid operand.
Denormalization (Denormalized, #D) is fixed when attempting to perform an
operation on a denormalized number, or when attempting to load a denormalized
number of ordinary or double (but not extended) precision from memory into a
coprocessor register.
Division by zero (Zero divide, #Z) means literally that: when a division command was executed, the divisor turned out to be zero.
Overflow (#O) occurs when the result of an operation is too large to be represented as a floating-point number of the required size. This can happen, for example, when transferring a number from the internal ten-byte representation into a four- or eight-byte one (say, with the fst command), if the number does not fit into the new representation.
Underflow (#U) means that the result of an operation is so small in absolute value that it cannot be represented as a floating-point number of the size required by the instruction (including when executing the fst instruction with a "memory" operand of size qword or dword). The difference between #U and #D (denormalization) is that #U concerns the result of a calculation or of a conversion to another format, rather than an originally denormalized operand.
Loss of Precision (#P) occurs when the result of an operation cannot be represented
accurately by the means available; in most cases, this is perfectly normal.
As discussed in §3.8.2, in each of the CR and SR registers, the lower six bits
correspond to exceptional situations in the order in which they are listed: bit #0
corresponds to an invalid operation, bit #1 corresponds to denormalization, etc.; bit #5
corresponds to loss of precision. In addition, in the SR register, bit #6 is set if the
invalid operation that caused bit #0 to be set is due to a stack error. The CR register
bits control what the processor should do when an exceptional situation occurs. If the corresponding bit is reset, an internal interrupt is initiated when the exception occurs (see §3.6.3). If the bit is set, the exception is considered masked and the processor will not initiate any interrupt when it occurs; instead, it will try to synthesize the most sensible result it can (e.g., when dividing by zero the result will be an "infinity" of the appropriate sign; when precision is lost, the result will be rounded, according to the current rounding mode, to a number representable in the format used, and so on).
When any exceptional situation occurs, the coprocessor sets the corresponding bit (flag) in the SR register to one. If the situation is not masked, this bit helps the operating system's interrupt handler to understand what happened; if the situation is masked and no interrupt occurs, the raised flags can be used by the program itself to track the exceptions that have occurred. Note that these flags never reset themselves; they can only be cleared explicitly, which is done with the fclex command. The commands for interacting with the CR and SR registers are discussed in detail in §3.8.9.
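For example (a sketch assuming the exceptions are masked, so that no interrupt occurs; the label and the variable x are arbitrary), after a computation one can test bit 5 of SR, the PE flag, to find out whether precision was lost:

fclex              ; clear the sticky exception flags
fld qword [x]      ; perform the computation of interest
fsqrt
fstsw ax           ; copy SR into AX
test ax, 100000b   ; bit 5 is the PE (loss of precision) flag
jnz precision_lost ; handle the inexact result if necessary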
3.8.8. Exceptions and the wait command
There is one non-obvious peculiarity associated with the handling of exceptional
situations: the instruction, the execution of which led to an exception, only raises a flag
in the SR register, but does not initiate an internal interrupt, even if the corresponding
flag in CR is not set. The coprocessor remains in this state until execution of the
next instruction begins. The problem here may arise if the instruction that caused the
exception uses an operand located in memory (for example, an integer), and between
this instruction and the following instruction of the arithmetic coprocessor there is an
instruction executed by the main processor that changes the value of the operand placed
in memory. In this case, by the time an internal interrupt is initiated, the value that
caused the interrupt will have been lost. For example, if, during the execution of the instruction sequence

fimul dword [k]
mov [k], ecx
fsqrt

the fimul operation results in an overflow that is to be followed by an internal interrupt, then this interrupt will occur only when the coprocessor reaches the fsqrt instruction, by which time the operating system will no longer be able to find out what value of the operand (the variable k) caused the exception. In this example the problem is solved by changing the order of the commands:

fimul dword [k]
fsqrt
mov [k], ecx

The processor supports a special command fwait (or simply wait; these are two designations for the same machine instruction), which checks the coprocessor status register for unmasked exceptions and initiates an interrupt if necessary. This command is worth using when the last f-command may have caused an exception and you do not intend to perform any further operations with floating-point numbers.
It is interesting that some mnemonics of coprocessor commands actually
correspond to two machine commands: first comes the wait command, then the
command that performs the desired action. An example of such mnemonics is the
already familiar fstsw: it is actually two commands - wait and fnstsw; if
necessary, you can use fnstsw separately, without waiting, but to do so you need to
understand exactly what you are doing. The fclex command from the previous
paragraph is organized the same way: this designation corresponds to the machine
commands wait and fnclex. The fnstsw and fnclex commands are
examples of arithmetic coprocessor commands that do not check for unhandled
exceptions before doing their main work.

3.8.9. Coprocessor control


Recall that the coprocessor operation mode can be controlled through the CR
(control register), and according to the results of operations, the processor sets the
contents of the SR (status register), which can be analyzed. Finally, the current state
of the registers that make up the stack is reflected in the register TW (tag word). The
structure of these registers was discussed in §3.8.2.
The fstcw, fnstcw, and fldcw commands are provided for working with
the CR register. The fstcw instruction, as usual, means two machine instructions
wait and fnstcw. All three instructions have a single operand, which can only be
an operand of type "memory" of size word. The first two instructions write the contents
of the CR register to a specified location in memory, the last instruction, on the
contrary, loads the contents of the CR register from memory. For example, with the
following commands we can set the rounding mode "towards zero" instead of the
default mode "to the nearest" (note that we will allocate four bytes in the stack so as not
to disturb its alignment, but we will use only two):

sub esp, 4                        ; allocate memory on the stack
fstcw [esp]                       ; fetch the CR contents into it
or word [esp], 0000110000000000b  ; force bits 11 and 10 to ones
fldcw [esp]                       ; load the result back into CR
add esp, 4                        ; free the memory

The contents of the SR register can be obtained with the familiar fstsw instruction,
the operand of which can be either the AX register (and nothing else) or the "memory"
type of word size. There is also an instruction fnstsw, and fstsw is a
designation for two machine instructions wait and fnstsw. Note that the reverse
operation (loading a value) for SR is not provided, which is quite logical: this register
is needed to analyze what is happening. Nevertheless, some commands affect this
register directly. For example, the TOP value can be increased by one with the
fincstp command and decreased by one with the fdecstp command (both
commands have no operands). These commands should be used with caution, because
they do not change the "busy" status of stack registers; in other words, fdecstp
causes ST0 to become an "empty" register, while fincstp causes ST7 to
become "busy" (because it is the former ST0). Another active action with the SR
register that can be performed by the programmer is clearing the exception flags. Such
clearing is performed by the commands fclex (clear exceptions) and fnclex, which we have already mentioned in the previous paragraph.
It is recommended to always execute the fclex command before the fldcw command; otherwise it may happen that writing to the CR register unmasks one of the exceptions whose flag is already raised, immediately causing an interrupt.
The TW register cannot be directly read or written, but there is one instruction
that can directly affect it. It is called ffree, has one operand - the STn register, and
its action is to mark a given register as "free" (or "empty"). In particular, the following
commands remove a number from the stack top "to nowhere":

ffree st0
fincstp

If you do not know (or have doubts about) the state of the arithmetic coprocessor when you start your calculations, but know for sure that its registers contain no information useful to you, you can bring it back to its initial state using the finit or fninit command (finit is a notation for wait + fninit, see §3.8.8). The CR register is set to 037Fh (rounding to nearest, highest available precision, all exceptions masked); the SR register is zeroed, which means TOP = 0 and all flags, including the exception flags, are cleared; the FIP and FDP registers are also zeroed, and the TW register is filled with ones, which corresponds to an empty stack. The registers that make up the stack are not changed in any way, but since TW is filled with ones, they are all considered free (containing no numbers).
With the fsave command you can save the entire state of the coprocessor, i.e. the contents of all its registers, to a memory area so as to restore it later. This is useful if you need to temporarily suspend some computational process, perform auxiliary calculations, and then return to the suspended one. Saving requires a memory area 108 bytes long; the fsave command has one operand of the "memory" type, and its size does not need to be specified. The fsave mnemonic actually stands for two machine commands, wait and fnsave. After saving the state in memory, the coprocessor is reset in the same way as by the finit command (see above), so there is no need to issue finit separately after fsave. To restore the previously saved state of the coprocessor you can use the frstor command; like fsave, it takes one operand of the "memory" type, whose size likewise does not need to be specified.
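A minimal sketch of such a suspension might look as follows (the buffer name fpu_state is, of course, our own):

section .bss
fpu_state resb 108          ; room for the complete coprocessor state

section .text
fsave [fpu_state]           ; save everything; the coprocessor is reset as by finit
; ... auxiliary calculations ...
frstor [fpu_state]          ; return to the suspended computation
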
Sometimes it is necessary to save or restore only the auxiliary registers of the
coprocessor. This is done by the commands fstenv, fnstenv and fldenv using a
memory area of 28 bytes; a detailed description of these commands will be omitted.
To conclude the conversation about the coprocessor, let's mention the fnop
command. As you can guess, this is a very important command: it does nothing.
Concluding remarks
Of course, we have not considered even a tenth of the i386 processor's capabilities, and if we count the extensions that appeared in later processors (for example, the MMX registers), the share of what we have studied is even more modest. However, we can already write programs in assembly language, and that gives us experience of programming in terms of machine instructions, which, as was said in the preface, is a necessary condition for quality programming in any language at all: you cannot create good programs without understanding what is really going on.

Readers who wish to learn more about the i386 hardware platform can consult the technical documentation and reference books that are available in abundance on the Internet. I would, however, like to warn them that the i386 processor (partly "thanks" to the heavy legacy of the 8086) has one of the most chaotic and illogical instruction systems in the world; this becomes especially noticeable once we leave the cozy world of the restricted mode and the "flat" memory model that the operating system has carefully arranged for us, and come face to face with the construction of segment descriptors, the ridiculous jumps between protection rings, and the other "charms" of the platform that the creators of modern operating systems have to struggle with.
So if you are seriously interested in low-level programming, we would advise you to study other architectures, such as the SPARC or ARM processors. Then again, curiosity is no vice, and if you are prepared for certain difficulties, find any reference book on the i386 and study it to your heart's content :-)
http://www.stolyarov.info
Study edition

Andrey Viktorovich STOLYAROV

PROGRAMMING: INTRODUCTION TO THE PROFESSION

Second edition, corrected and supplemented, in three volumes

Volume I: BASICS OF PROGRAMMING

Drawing and cover design by Elena Domennova


Proofreader Ekaterina Yasinitskaya

Printed from the finished camera-ready copy

Signed for printing on 16.02.2021.

Format 60×90 1/16. Conventional printed sheets: 44. Print run: 500 (1-200) copies. Publication № 022.

Publishing house MAKS Press LLC

Publishing license ID № 00510 of 01.12.99
119992 GSP-2, Moscow, Leninskie Gory,
Lomonosov Moscow State University, 2nd academic building, room 527.
Tel. 8(495)939-3890/91. Tel./Fax 8(495)939-3891

Printed, in full conformity with the quality of the materials provided, at OOO "Photoexpert",
115201, Moscow, Kotlyakovskaya str. 3, bldg. 13.
Andrey Viktorovich Stolyarov (born 1974) is a PhD in Physics and Mathematics and a PhD in Philosophy, an Associate Professor at the Department of Algorithmic Languages, Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University.

1. Introduction to Pascal and the beginnings of programming
2. Processor capabilities and assembly language
3. C programming
4. Operating system objects and services
5. Networks and protocols
6. Parallel programs and shared data
7. The system kernel: a look behind the scenes
8. Paradigms in the programmer's thinking
9. The C++ language, OOP and ADT
10. Non-destructive paradigms
11. Compilation, interpretation, scripting

http://www.stolyarov.info
