PLEASE READ CAREFULLY BEFORE USING THIS BOOK (see next page).
The text in this file corresponds exactly to the printed version of the book.
Electronic versions of this and other books by the author can be obtained at
http://www.stolyarov.info.
A. V. STOLYAROV
PROGRAMMING
INTRODUCTION TO THE PROFESSION
second edition
in three volumes
MAKS PRESS
Moscow - 2021
UDC 519.683+004.2+004.45
BBK 32.97.
С81
ISBN 978-5-317-06574-4 (vol. I)
ISBN 978-5-317-06573-7
Table of Contents
Preface one, philosophical ................................................................................ 10
Preface two, methodological ............................................................................. 20
Can you learn to be a programmer .......................................................... 21
Self-learning isn't easy either .................................................................. 22
There's a way out, or "Why Unix" ........................................................... 23
Reason one is math.................................................................................. 24
Reason two is psychological ................................................................... 25
Reason three is ergonomic ...................................................................... 27
Reason four is pedagogical...................................................................... 27
Language defines thinking ...................................................................... 29
How to ruin a good idea and how to save it ............................................ 36
Preface three, parting ........................................................................................ 39
Structure of the book and conventions used in the text ................................ 43
1. Preliminary information 49
1.1. Computer: what it is ................................................................................ 49
1.1.1. A bit of history ........................................................................... 49
1.1.2. Processor, memory, bus.............................................................. 63
1.1.3. Principles of central processing unit operation .... 66
1.1.4. External devices ......................................................................... 67
1.1.5. Memory Hierarchy ..................................................................... 69
1.1.6. Summary .................................................................................... 71
1.2. How to use a computer properly ............................................................. 72
1.2.1. Operating systems and types of user
interface ...................................................................................... 72
1.2.2. History of Unix OS .................................................................... 82
1.2.3. Unix on the home machine .......................................................... 86
1.2.4. First session in the computer ...................................................... 89
1.2.5. Directory Tree. Working with files ............................................. 91
1.2.6. Command and its parameters ..................................................... 95
1.2.7. File name templates .................................................................... 98
1.2.8. Command history and autocompletion of file names ............ 99
1.2.9. Task management ..................................................................... 100
1.2.10. Running in the background .................................................... 105
1.2.11. Redirecting I/O streams.......................................................... 106
1.2.12. Text editors ............................................................................ 108
1.2.13. File permissions ..................................................................... 115
1.2.14. Electronic documentation (man command) .... 118
1.2.15. Command files in Bourne Shell ............................................. 119
1.2.16. Environment variables............................................................ 126
1.2.17. Session logging ...................................................................... 128
1.2.18. Graphics subsystem in Unix OS ............................................. 128
1.3. Now a little math................................................................................... 136
1.3.1. Elements of combinatorics ....................................................... 136
1.3.2. Positional number systems ....................................................... 152
1.3.3. Binary logic .............................................................................. 164
1.3.4. Types of infinity ....................................................................... 170
1.3.5. Algorithms and Computability ................................................. 175
1.3.6. Algorithm and its properties ..................................................... 185
1.3.7. Sequencing has nothing to do with it ........................... 193
1.4. Programs and Data ................................................................................ 196
1.4.1. On measuring the quantity of information ............................... 196
1.4.2. Machine representation of integers ........................................... 204
1.4.3. Floating point numbers.............................................................. 210
1.4.4. Texts and languages ................................................................. 211
1.4.5. Text as a data format. Encodings .............................................. 215
1.4.6. Binary and textual data .............................................................. 221
1.4.7. Machine code, compilers and interpreters . 224
The picture is completed by my extreme sensitivity in the choice of means of PR and other
publicity¹: I am categorically not ready to tolerate anything even remotely resembling spam;
after all, my dissertation in philosophy was entitled "Information Freedom and Information
Violence", and it grew out of the study of a particular question: the specific reasons why spam
cannot be considered a manifestation of freedom of speech.
Meanwhile, I needed to raise a substantial sum: 600,000 rubles (recall that this was the
beginning of 2015). Half of this sum was meant to partially compensate my working hours,
which would allow me to stay afloat without wasting time on casual part-time jobs; the other
half was to pay for publishing the paper book. Posting on my website stolyarov.info, I had
little hope for anything, but I devised a system of incentives for those who would support me
financially: for a donation of 300 rubles I promised a mention in the list of sponsors to be
printed in the book; for a donation of 500 rubles, a "branded" CD with the author's autograph
(I note that in the end not a single copy of such a disc was ever claimed, so this idea was
apparently quite unsuccessful); for a donation of 1500 rubles, a copy of the paper book, again
with an autograph; and from 3000 rubles, a gift edition of the book, to be produced in a print
run matching the number of such donations.
¹ Details about my attitude to the so-called "social networks" are outlined in my article
"Theater of Content Absurdity. Social Networks: The History of One Terminological
Deformation", which can easily be found on the Internet with the help of search engines.
Almost immediately - before the first donations came in, anyway - several people asked me
what would happen to the money if the required amount was not raised; I responded by writing
a separate page where I promised to do at least something in any case, even if I received no
donations at all. Specifically, I promised that if the collected donations totaled less than
25,000 rubles, I would still write the part of the book devoted to the C language and publish
it as a separate book, plus once again revise the text of my C++ book and republish it for the
fourth time. If donations totaled from 25 to 55 thousand, I promised to also revise and
republish my old book on NASM; from 55 to 100 thousand, to revise and republish "Introduction
to Operating Systems"; from 100 to 120 thousand, to write the part devoted to Pascal and
publish it as a separate book. Finally, if the threshold of 120 thousand was reached, I
promised to write the whole book and continue raising money to make its publication possible.
I set September 1, 2015 as the decision date, while the events described took place at the
beginning of January: the project announcement was dated January 7, 2015.
After the announcement there were two quiet days; then the first donation arrived, but it
could hardly be counted: it came from an old friend of mine, known under the nickname Gremlin,
who had decided to support the project. The merry-go-round did not really start until
January 10, when I received seven donations in a single day, totaling over 14 thousand. I
would like to take this opportunity to sincerely thank Grigory Kraynov, who not only sent the
second donation but also took the trouble to bring the project to the attention of the general
public through the notorious "social networks".
The first milestone of 25,000 was reached on January 12, the second (55,000) on January 16;
on February 4 the amount exceeded 100,000, and on February 10 the magical 120,000, so all my
"backup options" became irrelevant at once; now the book had to be finished one way or another
("as a carcass or as a stuffed bird", as the Russian idiom goes).
Of course, things were not so rosy; by spring the first wave of donations had finally dried up,
so I even had to stop working on the manuscript to earn money on the side. In the summer I
managed to announce the project on the Linux.Org.Ru site, with special thanks to its owner
Maxim Valyansky for permission to do so; the announcement generated a second wave of
donations. In the first year, the project went into the negative many times and came back out
again, and until the last minute it was unclear whether there would be enough money for the
publication and in what form.
Work on the manuscript was going surprisingly briskly at that time. By February 23, 2015 -
a measly month and a half after the start - I had already finished the manuscript of the C part, a
month later I finished the introductory part; then I stopped for a couple of months, because the
project just went into negative territory and I had other things to do. When I resumed work in
early June, by September 1 I had written the Pascal part, designed to "tell the story of
programming". I was most afraid of this part at the beginning of the project, because I had no
experience in teaching Pascal to students, and private lessons with high school students
preparing for the Unified State Exam are not quite the same thing. But, as the saying goes,
the road is mastered by the one who walks it; having become the largest TeX file I had ever
edited, the part of the manuscript
devoted to basic programming skills brought a fair dose of creative satisfaction, and at the same
time marked the end of the manuscript of the parts that had to be created from scratch - only the
parts that had already existed before as separate books remained.
At the beginning of November 2015 the manuscript was completed in its originally planned form.
The result made a strange impression: seven parts (introduction, Pascal, assembler, C,
operating systems, parallel programming and C++) ran to 1103 pages, so the book came out too
thick. In any case I had no money to publish it yet, and I was also actively looking for an
artist who could draw a decent cover for this ugliness. While several of my friends were
reading the manuscript for obvious errors, I had a growing feeling that I did not like it as
it was: it was missing several chapters that I really wanted to have (in particular, chapters
on working with the terminal driver and the ncurses library), and the further I went, the more
I wanted to include parts on Tcl and Tcl/Tk, on various exotic languages like Lisp and Prolog,
and to show how OOP can be used to create graphical interfaces.
Thinking about what to do next led to the idea of turning the book into a three-volume set,
which would allow me to publish the first parts I was sure of right away, and to continue working
on anything that seemed too raw. The idea was published on the website on December 15 and
seemed to be unobjectionable, so I concentrated on preparing the first volume, which included
only the introduction and the Pascal part. Besides many revisions of the text, an important part
of this preparation was the drawing and design of the cover. My friends introduced me to the
designer Elena Domennova, who brilliantly realized the idea I had formed. I borrowed the plot
of the drawing - a globe standing on three crutches on the back of a Bug, which rides a
square-wheeled bicycle across a field strewn with rakes while a crazy fish with wings and
webbed feet flies around - from a work that can easily be found on the Internet under the
title "The Amazing World of Programming". The original was a felt-tip-pen drawing on a
whiteboard, which someone had taken the trouble to photograph. Many thanks to the author of
the original drawing for the idea, which gave me a considerable dose of good mood.
Unfortunately, I still do not know who created that drawing, but I hope that if its author
ever sees the cover of my first volume, he will like the remake Elena and I made :-).
The first volume went to the printer on March 2, 2016. By that time, a little more than 400
thousand rubles had been collected in donations; taking into account the compensation for my
time (557 hours of which had been spent by that time) and the costs of publishing, the project
"flew into the minus" by almost 34 thousand rubles. Technically, I had the manuscript of the
second volume ready, but no money to publish it with. Moreover, after I finished the chapter
about ncurses and a few other fragments, the second volume had swelled considerably, up to
650 pages. The last straw was the nagging feeling that I did not like the text about operating
systems in its current form; after all, the book "Introduction to Operating Systems" was ten
years old by then, and the further I went, the more I wanted to take it apart and rebuild it.
As a result, I decided to include in the second volume only the parts about assembler and the
C language (under the general title "Low-Level Programming"), and to take the rest -
essentially everything that had grown out of that old book - bring it up to date, supplement
it with a full-length part devoted to computer networks, and publish it as a separate, third
volume entitled "Systems and Networks".
Despite all the difficulties, the publication of the first volume felt like an important
victory, above all from an ideological and philosophical point of view: it had proved possible
to publish it at the expense of the donations collected, retaining control over the copyright
and thus escaping the "copyrast" pen built for authors by the publishing and media industry.
There were, however, reasons for dissatisfaction. I myself, of course, had no intention of
stopping at Pascal, but this was not at all obvious to a public that had seen only the first
volume; naturally, there was a fair number of haters claiming that "outdated" Pascal is
foisted on students only by those who know nothing else. The publication of the second volume
clearly had to be accelerated.
In April 2016 I was invited to teach the course "Computer Architecture and Assembly Language"
at the MSU branch in Yerevan; of course, I did not manage to publish the volume "Low-Level
Programming" before the start of the trip - and, I must say, it is very good that I did not.
Delivering the lectures on which the corresponding chapter of the manuscript was based showed
that there were things to correct: nine years earlier - in the spring of 2007, when I taught
the same course in Tashkent (out of which, in fact, the book about NASM grew) - I did not yet
know some curious things, such as the CDECL calling convention and much else; while lecturing
in Yerevan, I clearly saw how some of the examples in the text should be reworked and where
the emphasis should be shifted.
When I returned from Yerevan, I spent the whole of May 2016 finalizing the parts about NASM
and C; the second volume went to print in early June. The fact that the project was still deep
"in the minus" did not stop me: I really wanted, as they say, to "close the gestalt" and at
the same time to demonstrate that Pascal is used in my book certainly not because I supposedly
know nothing else; if anything, I know C much better, and it is C, not Pascal, that I have
been teaching to students every year since 2000. The manuscript was burning my hands, and I
did not want to postpone the publication of the second volume.
So the second volume went to press, and the project was down almost 190,000 - already far more
than I could afford. After the marathon publication of the first two volumes I needed a break
anyway, so I honestly announced on the website that I would return to work on the third volume
once the project was out of the minus. The break, however, did not last long: on September 29,
2016, having received a record-breaking donation of the fantastic sum of 99,999 rubles, I
could not go on shirking work.
In the end, the logic of presentation used in the old book on operating systems had to be
broken completely. To describe some aspects of the kernel's operation in detail, the reader
had to be already familiar with the problems that arise when accessing shared data; but, of
course, it would not do at all to put the part about parallel programming before the part
about operating systems as a phenomenon. I had to "saw apart" the material of the essentially
finished part: to tell separately how the kernel looks from the viewpoint of user tasks and
what services it provides to the user-programmer at the level of system calls, and to make a
separate part about the kernel's internal operation and the implementation of system calls.
To this was added an independent part about computer networks; it started from an already
existing chapter on network sockets, grew into a primer on the protocols of the TCP/IP stack
and, together with a lot of pictures, swelled to almost a hundred pages.
After the dizzying speed with which the text of the first two volumes was produced, work on
the third was unexpectedly slow, not least because of the long search for the correct sequence
of presentation, but also because some of the material was outside my area of confident
knowledge and some issues had to be studied carefully. Teachers, following Richard Feynman,
often say: if you want to master a subject seriously, teach a course of lectures on it. I can
now say from my own experience that writing your own book on the subject is an even more
reliable method; I once lectured on operating systems for several years, but I certainly
learned even more about the subject in the course of writing the book.
In June 2017 the manuscript of the third volume was finally completed and submitted to the
proofreader, and on July 14, 2017 - the day when the sacramental number of seconds since
01.01.1970 passed one and a half billion - the volume went to press; I could not deny myself
the pleasure of mentioning this coincidence when discussing the time system call. The project,
which had been in the financial plus for some time, again "sank into the minus", though this
time not so deeply: by 117 thousand. The total time of work on the manuscript had reached
975 hours by this point - almost twice the miserable five hundred hours I had originally
planned; then again, I had not originally intended to turn an old 200-page book on operating
systems into a full 400-page volume, and there was a lot in the first two volumes (after the
idea of publishing one big book was abandoned) that might not have been there.
The most painful was the last, fourth volume, under the general title "Paradigms". Originally
it was not planned at all - except for the part that my book on C++, which by then had gone
through three editions, was to be turned into. As the book grew into a four-volume set, I
realized what I wanted to see in the fourth volume besides the notorious C++. The result
turned out quite logical: the first volume covers the very basics (at the level of an
"advanced" school), the second how everything actually works, the third the "systems"
material, and the fourth the "applied science", where efficiency is not always critical and
one can afford all sorts of "liberties" with the style of thinking.
The amount of work to be done was, frankly, daunting. At first I did not touch the manuscript
because the project was "in the negative", but on October 2, 2017, that excuse disappeared:
book sales and continuing donations brought the project into the plus, where it has remained
ever since. Even then I could not immediately bring myself to continue working: in the spring,
while finishing the third volume, I had completely "burned out"; I did not manage to recover
over the summer, then came a "fun" fall semester, and only in January-February 2018, when the
students had passed their exams and gone on vacation and teachers could afford to do nothing
for a month, did I manage to more or less come to my senses.
At the beginning of the semester, students persuaded me to republish "Introduction to C++"
once again (for the fourth time), which required, as they say, "restoring the context", and I
took advantage of this to continue working on the fourth volume. I had to start, as usual, by
rearranging the structure of the parts. Initially I planned four parts: on C++; on some widget
library for creating GUIs (imagine, I even thought it would be Qt - perish the thought); on
scripting, using Tcl as an example (plus Tcl/Tk at the same time); and on "exotic" languages
like Lisp and Prolog, where, in fact, I planned to cover, as they say, the topic of paradigms.
Almost at once it became clear that putting off the talk about paradigms to the very end of
the volume was an absolutely crazy idea; the book on C++ had contained a short preface devoted
to paradigms, but such a superficial text did not fit a big book in which paradigms were to
become one of the main subjects of discussion. So there appeared the part "about paradigms in
general", the first in the fourth volume and the ninth in the overall numbering. The part on
C++, which according to the initial plan was to be the seventh, ended up being the tenth.
My own understanding of the programmer's reality did not stand still either. In 2018 the
article "Pure Compilation as a Programming Paradigm" was published, co-authored with my
graduate student Anna Anikina and my by then former graduate student Oleg Frantsuzov; I
thought it right to develop the topic of interpreted and compiled execution in the book, and
the part about scripting was best suited for this purpose - but in that form it had to come
after the discussion of the "exotic" (and mostly interpreted) languages, so these two parts
changed places. In order not to multiply the parts, I decided to attach the material about
graphical interfaces and the application of OOP to them to the part about C++; that is how the
structure of the fourth volume got its final form. I also suddenly realized how exactly the C
language disfigures the thinking of novice programmers when it is used as the first language
in training; the result of this "enlightenment" was the section "Conceptual Difference between
C and Pascal", included in the part about paradigms.
On June 10, 2019, I had to withdraw the hard copies of the first and second volumes from open
sale, keeping just enough to fulfill my obligations to the donors. The manuscript of the
fourth volume, which by then had been almost two years in the making, was still far from
publication, and I could not even predict its date; it was clear that the work had to be
accelerated. But there was no time for it: I was busy with matters that had nothing to do with
the book. Nevertheless, armed with the notes of my lectures for the course "Programming
Paradigms", I got down to the part about "non-destructive paradigms". The part "about
paradigms in general" was already finished by that time; in the final part the description of
Tcl and Tcl/Tk was complete, and the most interesting thing - compilation and interpretation
as paradigms - still remained; the material about graphical interfaces was still "hanging",
but at least it was already clear that the FLTK library would serve as its basis, since by
then I had even had to build a small commercial project on it.
The "non-destructive" part showed me convincingly, as the Russian idiom goes, what a pound of
misery costs. It would seem that here are the lectures - my own lectures! - on all these
languages: put them into literary form and be done. Not so. In lectures one could get away
with suggesting that students "google" how to write real programs with existing
implementations of Common Lisp, Scheme and Prolog, rather than the worthless toys that
students usually run from inside the interpreter. For the book this option did not work, and I
had to look into the issue in detail, examining different implementations; for a long time I
refused to believe that things were so bad with them. Work on this part stretched over half a
year and was finished only on December 12; at some point I had to admit that no acceptable
implementations of Refal remained and abandon the chapter about that language, otherwise it
would have taken even more time. On the bright side, I finally dotted the i's regarding
currying and the fixed point combinator; the chapter on lazy computation has separate sections
on both of these intricate entities.
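For readers unfamiliar with the two "intricate entities" just mentioned, they can be sketched in a few lines; a minimal illustration (using Python here purely for brevity - the book itself treats these topics in the context of Lisp-family languages and lazy computation):

```python
# Currying: turning a two-argument function into a chain of
# one-argument functions, each returning the next.
def add(a):
    return lambda b: a + b

add2 = add(2)          # a partially applied function
assert add2(3) == 5

# A fixed-point combinator (the Z variant, which works under strict
# evaluation): it builds a recursive function out of a non-recursive
# one, with no explicit self-reference anywhere.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(
              lambda x: f(lambda v: x(x)(v)))

# 'rec' stands for the function being defined - Z ties the knot.
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
assert fact(5) == 120
```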
After finishing this part I managed, taking advantage of the winter exam session and the
vacation, to complete rather quickly, first, the chapter on creating graphical interfaces with
FLTK and, second, the "philosophical" aspects of the final part: the treatment of
interpretation, compilation and scripting as distinct paradigms. On February 9, 2020, the last
"to be finalized" mark was dropped
from the manuscript. The fourth volume turned out to be the thickest of all, 656 pages (vs. 464,
496, and 400 pages of the previous volumes). It took almost another month to proofread and
prepare for printing; the volume went to press only on March 5, when the coronavirus panic was
growing worldwide. I would like to take this opportunity to once again thank Alla Nikolaevna
Matveeva and other employees of MAKS Press, who literally snatched the finished volume from
the printing house on the last day before the "general closure" - March 27, 2020.
It is noteworthy that the total time of work on the manuscript amounted to 1703 hours, of
which 728 hours were spent on the fourth volume. At the time of sending the fourth volume to
print, the total amount of donations collected amounted to 1,173,798 rubles.
In the meantime, the first two volumes were long gone, there were very few copies of the
third volume left; working copies of the first three volumes were full of marks, plus there was a
serious ideological revision due to the epiphany about Pascal, C, and side effects. All this made
me think that the second edition should not be delayed. The announcement of the start of work
on the second edition, which also mentioned plans for a problem book to accompany the whole
text, was published on the website on May 13, 2020; alas, I could not immediately force myself
to start on the manuscript - the coronavirus madness going on in the country was not conducive
to constructive activity. In May I had only one meaningful working day, during which the
structure of the book was reorganized: the four volumes, very uneven in size, became three
volumes of more or less similar length. All I managed to do in June was work the notes
accumulated in the first volume into the manuscript, in two "approaches to the barbell", as
weightlifters say. More or less steady work began only towards the end of July - and was
almost immediately interrupted by vacation; fortunately, it was this two-week shock "rest", in
the form of a grade-five whitewater trip, that got my brain more or less back into working
order.
Initially I thought that preparing the manuscript of the second edition would take about
200 hours, well, maybe a little more. As usual, reality made its own adjustments: the
preparation of the first volume alone took more than 140 of those notorious hours, and I only
managed to finish it on November 11. Around the same time another noteworthy thing happened: I
finally decided that everyone who could claim their "prize" set of first-edition books had
already done so, and put the remaining eight sets on sale; despite the deliberately inflated
prices, the books sold out in just over a week. After two more months and another 196 hours of
working time, the last "to be revised" notes were thrown out of the new second volume. There
remained the third volume, made from the old fourth; but it had been published last, so its
list of desired revisions and corrections had not yet had time to swell, and one could hope it
would go faster. And so it did: reworking the last volume took a little over two weeks and
"only" about 50 hours. At the same time I figured out how to make the subject index common to
all three volumes; what I did not expect was that getting it into relative order would take
another forty hours or so.
Anyway, on February 4, 2021, I deemed the book ready for the second edition. Since the
beginning of the project, more than 2200 working hours have been spent on it, 500+ of them on
preparing the text of the second edition. The volume of donations received has passed
1,750,000 rubles, but, unfortunately, the publication of the paper book threw the balance back
"into the minus"; notably, this happened for the first time since 2018, when the project
returned from negative territory after the publication of the third volume (when the fourth
volume was published, the project did not go into the minus - there was enough money).
I would like to say thank you to everyone who reported errors in the text of the
published volumes; special thanks to Ekaterina Yasinitskaya for her heroic proofreading
work, which borders on a feat, and to Elena Domennova for the beautiful covers. I would
also like to thank Leonid Chayka for his high praise of the book in his popular video blog.
And, of course, I am deeply grateful to all those who participated in financing the project,
thus making it possible. Below is the list of donors (except for those who preferred to
remain incognito):
Gremlin, Grigoriy Kraynov, Arseniy Sher, Vasily Taranov, Sergey Setchenkov, Valeria Shakirzyanova,
Katerina Galkina, Ilya Lobanov, Suzana Tevdoradze, Oksana Ivanova, Julia Kulikova, Kirill Sokolov,
jeckep, Anna Sergeevna Kulyova, Marina Ermakova, Maxim Olegovich Perevedentsev, Ivan Sergeevich
Kostarev, Evgeny Dontsov, Oleg Frantsuzov, Stepan Kholopkin, Artem Sergeevich Popov, Alexander
Bykov, I. B. Beloborodov, Kim Maxim, artyrian, Igor Elman, Ilyushkin Nikita, Kalsin Sergey
Alexandrovich, Evgeny Zemtsov, Shramov Georgiy, Vladimir Lazarev, eupharina, Nikolay Korolev,
Goroshevsky Aleksey Valerievich, Lemenkov D.D., Forester, say42, Anya "canja" F., Sergey,
big_fellow, Dmitry Volkanov, Tanechka, Tatiana 'Vikora' Alpatova, Andrey Belyaev, Andrey Loshkins
(Alexander and Daria), Kirill Alexeev, kopish32, Ekaterina Glazkova, Oleg "burunduk3" Davydov,
Dmitry Kronberg, yobibyte, Mikhail Agranovsky, Alexander Shepelev, G.Nerc=Y.uR, Vasily Artemyev,
Smirnov Denis, Pavel Korzhenko, Ruslan Stepanenko, Tereshko Grigory Yuryevich 15e65d3d,
Lothlorien, vasiliandets, Maxim Filippov, Gleb Semenov, Pavel, unDEFER, kilolife, Arbichev, Ryabinin
Sergey Anatolievich, Nikolay Ksenev, Kuchin Vadim, Maria Trofimova, igneus, Alexander Chernov,
Roman Kurynin, Andrey Vlasov, Boris Dergachev, Aleksey Alekseevich, Georgy Moshkin, Vladimir
Rutsky, Roman Fedulov, Denis Shadrin, Anton Panfyorov, os80, Ivan Zubkov, Konstantin Arkhipenko,
Alexander Asiryan, Dmitry S. Guskov, Toigildin Vladislav, Masutacu, D.A.X., Kaganov Vladislav,
Anastasia
Nazarova, Gena Ivan Evgenievich, Linara Adylova, Alexander, izin, Nikolay Podonin, Julia
Korukhova, Evgeniya Kuzmenkova, Sergey "GDM" Ivanov, Andrey Shestimerov, vap, Tatyana
Gratsianova, Yuri Menshov, nvasil, V. Krasnykh, Ogryzkov Stanislav Anatolievich, Buzov Denis
Nikolaevich, capgelka, Volkovich Maxim Sergeevich, Vladimir Ermolenko, Goryachaya Ilona
Vladimirovna, Polyakova Irina Nikolaevna, Anton Khvan, Ivan K., Aleksey Salnikov, Aleksey
Shcheslavsky, Roman Zolotarev, Konstantin Glazkov, Sergey Cherevkov, Andrey Litvinov,
Shubin M.V., Alexey Syschenko, Nikolay Kurto, Dmitry Kovrigin
Anatolievich, Andrey Kabanets, Yuri Skursky, Dmitry Belyaev, Baranov
Vitaly, Sergey Novikov, maxon86, mishamm, Spiridonov Sergey
Vyacheslavovich, Sergey Cherevkov, Kirill Filatov, Chaplygin Andrey, Victor
Nikolayevich Ostroukhov, Nikolay Bogdanov, Baev Alen, Ploskov Alexander,
Sergey Matveev a.k.a. stargrave, Ilya, aykar, Oleg Bartunov, micky_madfree,
Alexey Kurochkin aka kaa37, Nikolay Smolin, I, JDZab, Kravchik Roman,
Dmitry Machnev, bergentroll, Ivan A. Frolov, Alexander Chashchin, Muslimov Ya,
Sedar, Maxim Sadovnikov, Yakovlev S.D., Rustam Kadyrov, Nabiev Marat,
Pokrovsky Dmitry Evgenievich, Zavorin Alexander, Pavlochev Sergey Yuryevich,
Rustam Yusupov, Noko Anna, Andrey Voronov, Lisitsa Vladimir,
Alexey Kovura, Chaika Leonid Nikolaevich, Koroban Dmitry, Alexey Veresov,
suhorez, Olga Sergeyevna Tsaun, Sergey Boborykin,
Olokhtonov Vladimir, Alexander Smirnitsky, Maxim Klochkov, Anisimov Sergey,
Vadim Vadimovich Chemodurov, rumiantcev, babera, Artyom Korotchenko,
Evgeny Shevkunov, Alexander Smirnitsky, Artyom Shutov, Zaseev Zaurbek
Slobodnyuk, Yan Zaripov, Vitaly Bodrenkov, Alexander Sergienko,
Denis Kuzakov, Fluffy Bumblebee, Sergey Spivak, suuuumcaa, Gagarin, Valery Gainullin, Alexander
Makhayev (mankms), VD, A.B. Likhachev, Col_Kurtz, Dmitry Sergeevich H., Anatoly Kamchatnov,
Evgeny Tabakaev, Alexander Troshenkov, Andrey Malyuga, Andrey Sorokin, Ivan Burkin, Alexander
Logunov, moya_bbf3, Vilnur_Sh, Alexander Kipnis, Oleg G. Geier, Vladimir Isayev (fBSD),
Filimonov Sergey Vladimirovich, vsudakou, AniMath, Danilov Evgeny, Vorobyov V. S., mochalov,
Kamchatka LUG, Sergey Loginov, Artem Chistyakov, A&A Sulimovs, Denis Denisov, Andrey Sutupov,
kuddai, Aleksey Ozeritsky, alexz, Vladimir Tsoi, Vladimir Berdovshchikov, Sergey Dmitrichenko, Danil
Ivantsov, D.A. Zamyslov, Vladimir Khalamin, Maxim Karasev (begs), ErraticLunatic,
A. E. Artemiev, FriendlyElk, Alexey Spasyuk, Konstantin Andrievsky
Andreyevich, Vladislav Vyacheslavovich Sukachev, Artyom Abramov, maxon86, Sokolov
Pavel Andreyevich, Alexey N, Nikita Gulyaev, Evgeny Bodiul, rebus_x
The experience of this project has made me rethink my attitude to reality in many
ways, and in some respects even believe in humanity. I can hardly think of another equally
convincing proof that my work is in demand and that I do not waste my time on my books
in vain. But the main conclusion from the success of our project with you, dear donators,
is that we can really do without copyright parasites and the institution of so-called
"copyright" (and in fact purely publishing) law in general. The creators of free
software in their field have shown this long ago; in the field of fiction this fact is also
practically obvious, as evidenced by the multitude of "samizdat" sites on the Internet and
the abundance of amateur translations of foreign "art"; the book you are holding in your
hands is another very clear nail in the coffin of the traditional (i.e. copyright) publishing
and media business built on information violence, and a very serious step towards
building a free information society, I would like to congratulate you and myself once
again on this very convincing, albeit small, victory.
The only refuge for amateur programmers suddenly turned out to be web
development. Unfortunately, once people start in this field, they usually stay in it for
good. The difference between the scripts that make up websites and serious programs can
be compared, perhaps, with the difference between a moped and a dump truck; moreover,
having got used to the "forgiving" style of scripting languages like PHP, most neophytes
are fundamentally unable to switch to programming in strict languages, even in some
Java, not to mention C, while the intricacies of C++ remain beyond the horizon of
understanding for such people. Web coders, as a rule, call themselves programmers and
are often even paid well, without realizing what real programming is and what they have
lost for themselves.
² In fact, one side effect out of the three can be "saved", but it is then no longer as essential
and may also (though not necessarily) lead to a loss of efficiency.
Side effects supposedly "come in many forms", and that is indeed true: among all side
effects one can single out those that are side effects only because of the structure of
the C language and would not be such in other languages... but such arguments
have no effect on a person who "thinks in C". I will take the liberty of saying that
the well-known "Cishness of the brain" consists precisely in taking side effects
for granted and in perceiving the term "side effect" itself as carrying no negative connotation.
Despite all of the above, it is necessary to learn C and low-level programming. A
programmer who does not know C is unlikely to be taken seriously by sensible employers,
even if the candidate is not required to write in C, and there are reasons for that. A person
who does not feel, at a subconscious level, how exactly the computer does this or that
simply cannot write high-quality programs, no matter how high-level the languages he or
she uses. It is also best to learn the basics of interaction with the operating system in C;
nothing else gives the full picture.
The necessity of learning C, once postulated, brings us back to the problem of pointers.
They must be mastered before starting to learn C, and for this purpose we need a
language which a) has pointers, full-fledged ones, without any garbage collection; b) lets
the learner do without pointers until he or she is more or less ready to perceive them; and
c) rewards the use of pointers with new possibilities, i.e. creates a real need for them.
Note that without the last requirement one could use C++ in conjunction with the STL,
but this requirement cannot be thrown away: we have already discussed above what
happens to a beginner who is given containers before low-level data structures.
Only Pascal satisfies all three points simultaneously; this language allows you to
approach pointers smoothly and from afar, neither using nor introducing them until the
learner's level is sufficient; at the same time, once introduced, pointers in Pascal exhibit
almost all the properties of "real" pointers, except for address arithmetic. A search for
another language with similar pointer-teaching capabilities has proved fruitless; there
seems to be no alternative to Pascal.
On the other hand, if we treat learning Pascal as a preparatory stage before C, we
can leave out some of its features, such as set types, the with statement and nested
subroutines, to save time. It should be remembered that the purpose of learning here is
not "the Pascal language" but programming. There is absolutely no point in drilling the
student on formal syntax, operator precedence tables, and other such nonsense: the
desired output is not knowledge of Pascal, which the student may never need again, but
the ability to write programs. The highest barriers in the student's way here are, firstly,
those same pointers and, secondly, recursion, which can also be learned on the example
of Pascal. Note that the CRT module, fondly loved by our teachers (so much so that the
sacramental "uses crt;" can often be seen in programs that use none of its features,
even in textbooks), works fine in Free Pascal under Linux and *BSD, allowing you to
create full-screen terminal programs; in C this is much harder, even a professional
usually needs a few days to more or less come to grips with the ncurses library.
Using Pascal also removes the problem of side effects. Assignment here is a
statement, not an operation; there is a clear and unambiguous distinction between
functions and procedures, plus a separate procedure-call statement, so a Pascal program
can be written without a single side effect. Unfortunately, existing implementations
destroy this aspect of conceptual purity by allowing functions to be called for the sake of
a side effect (this was forbidden in the original Pascal) and by introducing a number of
library functions that have side effects; but if one understands the direction to take, these
shortcomings are easily circumvented.
Another "inevitability" is assembly language programming. Here we have something
quite reminiscent of the well-known mutually exclusive paragraphs. On the one hand, it
is better never to write anything in assembly language at all, except for short fragments
in operating system kernels (for example, entry points to interrupt handlers and all kinds
of virtual memory management) and in microcontroller firmware. Everything else is
better written in that same C: execution efficiency does not suffer at all, and in some
cases even improves thanks to optimization, while the savings in labor can reach dozens
of times. Most programmers will never encounter a single "assembly"
task in their entire lives. On the other hand, assembly language experience is absolutely
necessary for a skilled programmer; in the absence of such experience, people do not
understand what they are doing. Since assembly languages are almost never used in
practice, the only chance to get some experience is during the learning period, and
therefore it is clear that there is no way we can neglect assembly language.
Learning assembly language can also demonstrate what the operating system kernel
is, why it is needed, and how to interact with it; a system call no longer seems magical
when you have to do it manually at the level of machine commands. Since the goal here,
again, is not to learn a specific assembly language or even assembly programming as
such, but only to understand how the world works, there is certainly no need to provide
the student with ready-made libraries that do all the work, in particular translating a
number into its textual representation; on the contrary, by writing a simple assembly
program that reads two numbers from the standard input stream, multiplies them and
outputs the resulting product, the student will understand and feel much more than if he
or she were offered to write something large and complex in the same assembly
language with the translation from text to number and back done by some macro library.
Here the student should also see how subroutines with local variables and recursion are
organized (no, not on the example of the factorial, which is contrived and has long bored
everyone, but rather on the example of matching a string against a pattern or something
similar), how a stack frame is built, and what calling conventions exist.
If you have to learn assembly language programming anyway, it is logical to do it
before learning C, because it helps you to understand why C is the way it is: this, to put
it mildly, strange language becomes not so strange if you consider it as a substitute for
assembly language. Address arithmetic, assignment as an operation, separate increment
and decrement operations, and much more: all this is easier not only to understand but
also to accept if you already know what programming looks like at the level of
CPU instructions. On the other hand, the idea of starting to learn programming from
assembly language is not even worth discussing; it is an obvious absurdity.
With this in mind, there is a rather unambiguous chain of languages for initial training:
Pascal, assembly language, C. You can add something to this chain at any place, but it
seems that you can neither remove elements from it nor rearrange them.
Knowing C, we can return to the study of the phenomenon called the operating system
and its capabilities from the point of view of a programmer creating user programs. Our
student already understands what a system call is, so we can tell him what they are, using
the level of terminology characteristic of this subject area - namely, the level of describing
system calls in terms of C functions. File I/O, Unix process control (which, by the way,
is organized in a much simpler and clearer way than in other systems), ways of process
interaction - all these are not only concepts demonstrating the structure of the world, but
also new opportunities for the student to develop his own ideas leading to independent
developments. Mastering sockets, and the unexpected discovery of how easy it is to write
programs that communicate with each other over a network, inspires great enthusiasm
in students.
At some point in the course it is worth mentioning shared data and multithreaded
programming, emphasizing that it is better not to work with threads even if you know
how; in other words, you should know how to work with threads, if only to make a
conscious decision not to use them. At the same time, any qualified programmer needs to
understand why mutexes and semaphores are needed, where the need for mutual exclusion
comes from, what a critical section is, and so on; otherwise, when reading a
description of, say, the Linux or FreeBSD kernel architecture, a person will simply not
understand what is being discussed.
It is curious that this is the traditional sequence of programming courses at the VMK
faculty: in the first semester the course "Algorithms and algorithmic languages" is
supported by practice in Pascal, in the second semester the lecture course is called
"Computer architecture and assembly language", and the lectures in the third semester -
"Operating systems" - imply practice in C. The fourth semester is a bit more complicated;
the lecture course there is called "Programming Systems" and is built as a rather strange
combination of an introduction to the theory of formal grammars and object-oriented
programming using C++ as an example. I would venture to say that C++ is not a very
good language for a first acquaintance with OOP, and in general this language has no
place in the core course programs: those students who will become professional
programmers can (and do) master C++ by themselves, while for those who will work in
related or other specialties a superficial acquaintance with a narrowly specialized
professional tool, which C++ undoubtedly is, adds neither general outlook nor
understanding of the world.
The situation with the readers of this book is somewhat different: those who are not
interested in programming as an activity will simply not read it, and those who
originally wanted to become programmers but changed their plans upon closer
acquaintance with this kind of activity will most likely quit reading somewhere in the
second or third part and will not reach the end in any case. At the same time, those who
do finish this book in its entirety - that is, future professionals - may find useful not the
C++ language as such, which can be learned from any of the hundreds of existing
books, but rather the special view of this language that your humble servant always
tries to convey to students at the fourth-semester seminars: a view of C++ not as a
professional tool, but as a unique phenomenon among existing programming languages,
as C++ was before it was hopelessly spoiled by the authors of the standards. Therefore,
C++ is among the languages described in this book; the peculiarities of my approach to
it are described in the preface to the corresponding part.
How to ruin a good idea and how to save it
Unfortunately, small things and particularities make the series of programming
courses adopted at VMK hopelessly far from perfect. Thus, in the first-semester lectures
students are for some reason taught so-called "standard Pascal", a monster suitable only
for intimidation and not found in nature; at the same time, in the seminars the very same
students are forced to program in Turbo Pascal under MS-DOS - a dead system in a dead
environment - which, on top of that, has nothing to do with the "standard" Pascal
described in the lectures. Moreover, the lectures are built as if the goal were not to learn
how to program but to study in detail the Pascal language as such, and in its obviously
dead version at that: a lot of time is spent on the formal description of the syntax, and it
is repeatedly emphasized in what sequence the declaration sections must go (any
actually existing version of Pascal allows declarations to be arranged in any order, and
there is certainly nothing good in following the conventions of standard Pascal on this
point).
Things are a little better in the second semester. Until recently, assembly language
was also demonstrated under MS-DOS, i.e. the instruction system of 16-bit Intel
processors (8086 and 80286) was studied. Programming under an obviously dead system
relaxed the students and cultivated a contemptuous attitude towards the subject (and this
attitude was often transferred to the teachers).
In 2010, one of the three streams at VMK started an experiment to introduce a new
program. We should give the authors of the experiment their due: they eliminated the
rotting corpse of MS-DOS from the educational process; unfortunately, along with the
outright deadness the experimenters also threw out the Pascal language, starting the
training with C. This could have been conditionally acceptable if all applicants entering
the first year had at least rudimentary programming experience: for a person who has
already seen pointers, C is not that difficult. Alas, even the USE in computer science
cannot guarantee even the most rudimentary programming skills in the entrants: it is
quite possible to pass it with a positive score without knowing how to program at all, not
to mention that pointers are not included in the school computer science curriculum
(and, accordingly, in the USE program). Most freshmen come to the faculty with
absolutely zero understanding of what programming is and what it looks like; C becomes
the first programming language of their lives, and the result is an outright disaster.
By the way, the author of these lines warned the ideologists of the notorious
experiment about the problem of pointers, but of course nobody listened to him. Now he
can only state with a certain amount of grim satisfaction that everything ended exactly as
it should have ended - the use of input-output via cin/cout in classes on supposedly
"pure C" and the annual output in the form of many dozens of new "programmers" who
do not know what pure C is and do not understand what they are doing.
It is interesting that at some point the lecturers reading the "Computer Architecture"
course on the remaining two streams were, as they say, pinned to the wall by the demand
to modernize the course, but this, to put it bluntly, did not help at all.
The lecturers proudly announced the transition to the 32-bit architecture, as if the size of
a machine word could change anything; assembly is now studied under Windows, and
for the workshop a monstrous wrapper had to be made, supposedly allowing window
applications to be created in assembly; the fact that executable files built with this
wrapper are four or more megabytes in size is enough to understand what all this
actually has to do with the study of assembly language and its role in the surrounding
world. The situation is no better on the first ("experimental") stream, where NASM
under Linux is studied: programs there are written using I/O from the standard C
library - and also with a wrapper of its own, hiding, in particular, the program's entry
point. Most students who have passed this workshop are convinced that a process
should end with the ret instruction.
With some stretch one can agree that in reality it does not matter much which
assembly language is studied, the main thing being to grasp the logic of working with
registers and memory areas; but what guided the authors of both versions of the course
as currently taught is quite impossible to understand. At the end of the course, students
usually do not understand what an interrupt is, and almost nobody knows what a stack
frame looks like; it is not quite clear what the whole semester is spent on and what the
benefit of this version of the course is.
Everything gets more or less back to normal only in the second year, where Unix is
used (previously FreeBSD, now, unfortunately, Linux, since the technical services of the
faculty seem to have failed to support FreeBSD) and pure C is studied in this environment,
which is ideally suited for C. However, before that, two whole semesters are spent, to put
it mildly, with questionable efficiency.
The order of programming disciplines in the junior years adopted at VMK seems
potentially the most successful of any encountered - were it not for the above-mentioned
"trifles". The persistent unwillingness of some teachers to give up Windows, and of
others to accept that purely technical training is out of place at a university, puts the
future of the whole concept in doubt. All the steps taken to "modernize" the courses and
workshops in recent years (more precisely, during the author's entire time at the faculty)
have turned out to be purely destructive, undermining the fundamental character of
programmer training at VMK and turning it either into barely sensible vocational
training, or into a meaningless phenomenon akin to the well-known cargo cult.
The book you are holding in your hands is an attempt by its author to preserve, at least
in some form, a unique methodological experience which is in danger of total oblivion.
In conclusion, I feel it is my duty to give my fellow teachers fair warning about one
important thing. If you want to use this book in the classroom, your own primary way of
interacting with computers in your daily life should be (become?) the command line. The
book is designed for the student to use the command line in preference to graphical
interfaces - that's the only way they have a chance to take the steps listed above toward a
profession. If you copy files, shuffle them between directories and the like by means of
graphical file managers, you can hardly convince your students that the command line is
more efficient and convenient, because you do not believe it yourself. In that case, this
book is useless to you.
Preface three, parting words
This preface, the last of three, is addressed to those for whom the book was written -
to those who have decided to study programming, that is, one of the most fascinating
kinds of human intellectual activity.
For a long time, the smartest and most skillful people have wanted to create something
that works by itself; before the advent of electricity, this was available only to mechanical
watchmakers. In the XVIII century, Pierre Jacquet-Droz³ created several unique
mechanical dolls, which he called "automatons": one of these dolls plays five different
melodies on a small organ, pressing the necessary keys with her fingers; the organ was
made especially for her, yet it really is controlled by its keys. Another doll draws quite
complex pictures on paper, any of three given ones. Finally, the last and most complex
doll, the "Writing Boy" or "Calligrapher", writes a phrase on paper, dipping a goose
quill into an inkwell; the phrase consists of forty letters and is "programmed" by
turning a special wheel. This mechanism, completed in 1772, consists of more than six
thousand parts.
Of course, the hardest part of building such an automaton is to come up with all its
mechanics, to find the combination of parts that will make the mechanical arms make
such complex and precise movements; no doubt the creator of the Writing Boy was a
unique genius in the field of mechanics. But once you are dealing with mechanics, genius
alone is not enough. Pierre Jacquet-Droz had to make each of the six thousand parts,
milling them out of metal with fantastic precision; of course, some of the work was done
by the hired workers of his workshop, but the fact remains that, apart from the genius of
the designer of such mechanical products, their appearance requires a huge amount of
human labor, and one that cannot be called creative.
Jacquet-Droz's automatons are a kind of extreme illustration of the possibilities of the
creative human mind combined with the investment of a great deal of routine labor in the
manufacture of material parts; but the same principle can be observed in almost any kind
of engineering activity. A brilliant architect can draw a sketch of a beautiful palace and
even develop its detailed design, but the palace will never appear unless there are those
willing to pay for the labor of thousands of people involved in the whole chain of
production of building materials and then in the construction itself. A genius designer can
invent a new car or airplane, which will remain an idea until thousands of other people
agree (most likely for money, which must also come from somewhere) to produce all the
necessary parts and units, and then, combining them all together, to conduct a cycle of
tests and improvements. Everywhere creative technical genius stumbles upon the material
prose of life; we see with our own eyes the results of the work of ingenious designers, if
the resistance of the material environment can be overcome, but we can only guess how
many equally ingenious ideas have been wasted without ever finding an opportunity to be
embodied in metal, plastic or stone.
With the advent of programmable computers, it has become possible to create
something that works by itself, avoiding the complexities associated with material
³ A number of sources say "Droz", but this is incorrect: the last letter in the French
surname Droz is not pronounced.
embodiment. The design of a house, airplane, or automobile is just a formal description,
which must then be used to create the automobile or house itself, otherwise it will be of
no use. A computer program is also a formal description of what should happen, but,
unlike technical projects, the program itself is the finished product. If Pierre Jacquet-Droz
could have materialized his ideas simply by making drawings, he would surely have
surprised the public with something far more complex than the "Writing Boy". It is no
exaggeration to say that programmers have exactly this opportunity; perhaps
programming is the most creative of all engineering professions, and it attracts not only
professionals but also a great many amateurs. The eternal question of what there is more
of in programming - technique or art - has not been settled in anyone's favor and is
unlikely ever to be.
The flight of engineering thought, unbound by production routine, inevitably leads to
increasing complexity of programming as a discipline, and this is the reason for some
peculiarities of this unique profession. It is known that a programmer cannot be taught, a
person can become a programmer only by himself or not at all. Higher education is
desirable because a good knowledge of mathematics, physics and other sciences puts
brains in order and sharply increases the potential for self-development; however, we
should admit that all this is desirable but not obligatory. "Programming" subjects studied
at the university may be useful, providing information and skills that would otherwise
have to be found independently; but, observing the development of future programmers,
we can say quite definitely that the role of "programming" subjects in this development
is much more modest than is commonly believed: without a teacher, a future programmer
would find everything he needs by himself, and indeed he does, since the efforts of
teachers cover at best a quarter of his need for specialized knowledge.
Being a university teacher myself, I have to admit that I know many excellent
programmers who have non-core higher education (chemical, medical, philological) or
even no diploma at all; on the other hand, being a professional programmer, though now,
perhaps, a former one, I must say that core university education certainly helped me in
terms of professional growth, but in general, I made myself a programmer by myself,
another option is simply impossible. So, higher education for a programmer is desirable
but not obligatory, but self-study, on the contrary, is categorically necessary: if a potential
programmer does not make himself, others will not make a programmer out of him at all.
The book you are reading now is the result of an attempt to gather together the basic
information you need when learning programming on your own, so that you don't have to
search for it in various places and sources of dubious quality. Of course, you can become
a programmer without this book; there are many different paths you can take to eventually
come to an understanding of programming; this book will show you certain waypoints,
but even with that in mind, your path to your goal will remain yours alone, unique and
different from others.
This book alone will not be enough to become a programmer; all you can get out of
it is a general understanding of what programming is as a human activity and how
approximately it should be done. Besides, this book will remain an absolutely useless pile
of paper for you if you decide to just read it without trying to write programs on a
computer. One more thing: this book will not teach you anything if the Unix command
line is not your primary means of everyday work with your machine.
The explanation for this is very simple. To become a programmer, you first have to
start writing programs so that they work; then at some point you have to switch from
sketches to trying to extract some usefulness from your own programs; then you have to
take the last crucial step - to bring the usefulness of your programs to such a level that
someone other than you starts using them. Writing any useful program with a graphical
interface is quite difficult - you have to be a programmer to do it, but to become one, you
have to start writing useful programs. This vicious circle can be broken by dropping the
graphical interface from consideration, but programs that have no graphical interface and
yet are useful only exist in Unix OS, nowhere else.
Unfortunately, there is one more not very pleasant thing, which it would be better to
take into account from the very beginning. Not everyone can become a programmer, and
it is not a matter of intelligence or "ability", but of individual inclinations.
Programming is very hard work that requires extreme intellectual tension, and only
those relatively rare perverts who are able to enjoy the process of creating computer
programs can endure this torture. It is quite possible that in the course of studying this
book you will realize that programming is "not your thing"; that's okay, there are many
other good professions in the world. If this book "only" allows you to realize in time that
this is not your path and not to spend the best years of your life on fruitless attempts to
study at a university in some programming specialty - well, this is a great deal in itself:
the best years wasted will never be returned to you, and the sooner you realize what you
need (or rather, don't need), the better.
But enough about sad things. The first, introductory part of this book contains
information that you will need later in programming, but which does not require
programming exercises in itself. It can take you from one day to several weeks to learn
the introductory part; during this time, try to install some Linux or BSD system (FreeBSD,
OpenBSD, or any other system - of course, if you can manage to install it) on your
computer and start using this system in your daily work. For this purpose, you can use
almost any old computer that has not yet crumbled into rusty ashes; you are unlikely to
find a "live" Pentium-1 nowadays, but a Pentium-II class machine from the late 1990s is
enough to run some of the actively supported Linux distributions. By the way, you can
treat the appearance of the necessary operating system in your household as a test of your
own readiness to go further: if three or four weeks have passed and there is still nothing
Unix-like on your computers, do not deceive yourself; you simply do not need to make any
further attempts to "learn to program".
Once you have Unix at your disposal, start by trying to do as much of your normal
"computer stuff" as possible in it. Yes, you can listen to music, watch photos and videos,
access the Internet, have adequate substitutes for the usual office applications, you can
do everything. At first, it may be unfamiliar and hard to use; don't worry about it
Don't worry, this period will soon pass. When you get to the beginning of the second part
of our book, take your text editor and Pascal compiler in your hands and try it out. Try,
try, try, try, try, try! Know that your computer will not explode from your programs, so
try harder. Try this and that, try this and that. If some task seems interesting to you - solve
it, it will be more useful than the tasks from a problem book. And remember: all of this
should be "fun"; it is useless to torture programming.
To all those who are not afraid, I sincerely and wholeheartedly wish you success. I
have spent more than six years on this book; I hope it was not in vain.
Part III, also included in the first volume, is devoted to assembly language
programming; together with the next part, devoted to the C language and located in the
second volume, it is intended to demonstrate an important phenomenon, conventionally
called low-level programming.
It will be appropriate to say a few words for those who doubt the necessity of learning
low-level as such. The difference between programmers who know how to program in
assembly language and C and those who don't is really the difference between those who
know what they are doing and those who don't. The statement that in modern conditions
"you can do without it" is partly true: among the people who get paid for writing programs
you can find those who don't know how to work with pointers, those who don't know the
machine representation of integers, and those who don't understand the word "stack"; it
is also true that all these people find quite well-paid positions for themselves. This is all
true; but to conclude that low-level programming is "unnecessary" would be at least
strange. The very possibility of writing programs without fully understanding one's own
actions is created by software that cannot itself be written in such a style; this software,
usually called system software, obviously has to be developed by someone. And the statement
that "you don't need a lot of system developers" seems quite ridiculous: there is an
objective shortage of qualified people, i.e. the demand for them exceeds the supply, so
you need them, at any rate, more than there are; well, the fact that you need them in
general less than those to whom high qualification in their work is not so critical has no
relevance here at all, because what matters is the ratio of supply and demand, not the
volume of demand as such.
Assembly language and C have one very important thing in common: it is absolutely
impossible to work in either of them without a thorough understanding of what is going on.
A student who cannot "pull off" machine-level programming can always go into web
development, computer support of business processes, and other similar areas, but that is
no reason not to try to teach everyone serious programming from the start.
While the C language remains among actively used professional tools, assembly
language code in modern conditions is written very rarely and only in very specific cases;
the vast majority of programmers never encounter a single assembly language task in their
entire lives. Nevertheless, the skill of and experience in working at the level of machine
instructions is vital to understanding what is going on, which makes the study of assembly
language strictly necessary. We can consider that all the material in the first volume (i.e. the
first three parts of the book) is united by one property: most likely, it will not be directly
applicable in your future professional practice, but you cannot become a good
programmer without it. That's why the volume is called "Programming Basics".
The second volume, as already mentioned, opens with Part IV, devoted to the C
language; knowing this language is very important in itself, but we will need it, among
other things, to master the later parts of the second volume, in which all the examples are
written in C.
In Part V, we will learn about the main "visible" objects of the operating system and
how to interact with them through system calls; this part includes material on file I/O,
process management, interprocess communication, and terminal driver management.
The discussion of core system services continues in Part VI on computer networks.
Any data transfer over a network is, of course, also made possible only by the operating
system. Experience has shown that the simplicity of the socket interface and the ease with
which Unix allows you to create programs that communicate with each other over a
network literally delights many students and dramatically increases the "degree of
enthusiasm". The material on sockets is preceded by a small "literacy" on networks in
general, the TCP/IP protocol stack is considered, and examples of application layer
protocols are given.
Part VII describes what problems may arise when several "actors" (running programs
or instances of one and the same program) simultaneously access one and the same
portion of data, whether it is a RAM area or a disk file. This is exactly the situation that
arises if you use so-called threads: independent streams of parallel execution within one
instance of a running program. We must admit that this part of the book is written not so
much to teach the reader how to use threads (although all the information necessary for
that purpose is there) as to convince him or her that threads should not be used; but even
if the reader, following the author, decides never to use multithreaded programming for
anything, the material of this part will remain useful. First, such a decision should be
made consciously, with the ability to give arguments in its favor. Secondly, shared data
occurs not only in multithreaded programs: multi-user databases are an example of this,
and sooner or later any professional programmer will face a task of this kind. Besides,
working with shared data is unavoidable in the operating system kernel, so it would be
difficult to explain some aspects of its internal structure without a preliminary discussion
of shared data, critical sections and mutual exclusion.
The volume ends with Part VIII, which attempts to explain how an operating system
works from the inside out. Here we will learn about virtual memory models, talk about
CPU time scheduling, and how I/O is actually organized (i.e., at the OS kernel level,
where it all really happens).
All this can be - again conditionally - combined by the term "system programming";
the C language as the most suitable for creating system programs also belongs to this area,
so don't be surprised that the part devoted to this language appeared in the volume under
the general title "Systems and Networks" together with the material devoted to the
operating system and computer networks.
The third and final volume of the book is entitled "Paradigms". The programming
languages discussed in the first and second volumes - Pascal, assembly language and C -
are often referred to as so-called von Neumann languages4, because their construction is
4 The rules of the Russian language make the spelling of the adjective "von Neumannian" and
especially its antonym "non-von-Neumannian" problematic. If we follow the letter of the rules, "von
Neumannian" should be spelled separately, but the author of these lines feels a strong inner protest against
such a spelling; as for "non-von-Neumannian", no correct spelling exists for it at all: any variant violates
some rule. Here and hereafter we use the fused spelling; if you like, consider it the author's orthography.
C, and other von Neumann languages are used to. This leads to the emergence of a variety
of programming paradigms.
The first part of the third volume, numbered IX, discusses programming paradigms
(and paradigms in general) as a phenomenon. Here, the reader will find explanations of
what paradigms are and what this phenomenon looks like in programming; examples of
private paradigms are discussed, including those that the reader has already met before
(recursion, event-driven programming, etc.) and an overview of "big" paradigms, such as
functional, logic, and object-oriented programming. Most of the examples are based on
the C language, and only to demonstrate logic programming the Prolog language is used
with appropriate explanations.
Part X is devoted to the C++ language and the paradigms of object-oriented
programming and abstract data types. C++ is presented, to use "modern" terms, as a
truncated subset which does not include the "features" imposed on the world by
standardization committees; more details about the choice of the C++ subset are given
in §10.2. The main material of this part has been published several times before as a
separate book; as a kind of "bonus", the part includes a chapter on building graphical user
interfaces in C++ using the FLTK library.
Part XI is entirely devoted to the alternative view of programs, which assumes that
nothing changes during execution - new information may appear (and does appear), but,
having once appeared, any data object remains unchanged until it disappears (leaves the
scope of visibility). Here we will finally get acquainted with functional and logical
programming, for which we will consider "exotic" programming languages of "very high
level" - Lisp, Scheme, Prolog and Hope.
In the final part of our book, Part XII, the strategies of program execution,
interpretation and compilation, are considered as programming paradigms of a sort. The part begins
with a look at the command-scripting language Tcl, whose interpreted nature is
unquestionable; it is in this capacity that we are interested in it. The study of Tcl comes
with one more "bonus" related to GUIs, but not directly related to the study of paradigms
- a brief acquaintance with the Tcl/Tk library, which allows you to build simple GUI
programs very quickly "on your knees". Having completed the study of Tcl, we will
devote the rest of this section to the peculiarities of programmer's thinking, conditioned
by the chosen strategy of program execution, and discuss the limits of what is permissible
when applying interpretation and compilation.
Note that before you start studying the material in Volume 3, especially the C++
part, you should have some programming experience. Your programs must reach
volumes measured in thousands of lines, and they must have third-party users; only then
will you understand what object-oriented programming is and why it is needed. Haste in
this matter is fraught with irreversible consequences for one's thinking. As they say,
forewarned is forearmed.
In the text of all three volumes, there are fragments typed in reduced sans serif font.
When reading the book for the first time, you can safely skip such passages; some of them
may contain forward references and are intended for readers who already know something
about programming. Examples of what not to do are marked with a special sign in the margin.
Newly introduced concepts are set in bold italics. In addition, the text uses italics for semantic
emphasis and bold type to highlight facts and rules that should not be forgotten,
otherwise there may be problems with subsequent material.
At the end of volume three you will find a general subject index; for each term it is
indicated in which volume and on which page it appears in the text: for example, 2:107
means that the term you are interested in can be found on page 107 of the second volume.
The home page for this book on the Internet is located at
http://www.stolyarov.info/books/programming_intro
Here you can find an archive of the example programs given in the book, as well as an
electronic version of the book itself. For the examples included in the archive, the file
names are given in the text.
Part I
Preliminary information
5 This follows even from its name: the English word "computer" literally means "calculator",
while the official Russian term, EVM, is an abbreviation of "electronic computing machine".
§ 1.1. Computer: what it is 53
The machine was capable of addition and subtraction; overflow was indicated by the
ringing of a bell. The machine has not survived to this day, but a working copy was created
in 1960. According to some reports, Schickard's machine may not have been the very first
mechanical calculating machine: Leonardo da Vinci's sketches (XVI century) depicting a
calculating mechanism are known. It is not known whether this mechanism was ever
embodied in metal.
The oldest surviving counting machine is Blaise Pascal's arithmometer, created in
1645. Pascal began work on the machine in 1642 at the age of 19. The inventor's father
dealt with tax collection and had to perform long, grueling calculations; with his
invention, Blaise Pascal hoped to make his father's work easier. The first sample had five
decimal disks, that is, could work with five-digit numbers. Later, machines with up to
twenty disks were created. Addition on Pascal's machine was simple from the operator's
point of view: you entered first the first summand, then the second; subtraction, however,
required the so-called nine's complement method.
If we have (for example) only five digits, the carry into the sixth digit, as well as the borrow
from it, is safely lost, which allows us to perform an addition of another number instead of
a subtraction. For example, if we want to subtract the number 134 (that is, 00134) from the
number 500 (that is, with five digits, from 00500), we can add the number 99866 instead.
If we had a sixth digit, we would get 100366, but since there is no sixth digit, the result is
00366, which is exactly what we need. As is easy to guess, the "magic" number 99866 is
obtained by subtracting our subtrahend from 100000; from the point of view of arithmetic, instead
of the operation x - y we perform x + (100000 - y) - 100000, and the last subtraction happens by itself
thanks to the carry into the nonexistent sixth digit.
The trick here is that obtaining the number 100000 - y from the number y turns out to be
unexpectedly simple. Let us rewrite the expression 100000 - y in the form 99999 - y + 1. Since the
number y has at most five digits, the columnwise subtraction 99999 - y proceeds without a single
borrow: each digit of the number y is simply replaced by the digit that complements it to nine.
It remains only to add one, and the job is done. In our example, the digits 00134 are
replaced by their complements 99865; we then add one and get the "magic" 99866,
which we added to 500 instead of subtracting 134.
On Pascal's arithmometers subtraction was performed in a slightly trickier way. First the
operator had to dial the nine's complement of the minuend (the number 99999 - x; for our example
it is 99499). To make this possible, the drums with the digits of the result, visible through special
windows, carried two digits each, the main digit and its complement to nine, and the machine
itself was equipped with a bar which covered the "unneeded" row of digits so that it did not
distract the operator. To the dialed nine's complement the subtrahend was then added, in our
example 00134, producing the number 99999 - x + y. The operator, however, kept looking at the
complementary digits, which displayed 99999 - (99999 - x + y), that is, exactly x - y. For the
numbers in our example the result of the addition would be 99633, whose nine's complement,
the number 00366, is the correct result of the operation 500 - 134.
Nowadays this method looks to us like a trick: curious, but not very necessary in
modern realities. But we will meet complements, including the addition of one,
again when we discuss the representation of negative integers in a computer.
Thirty years later, the famous German mathematician Gottfried Wilhelm Leibniz built
a mechanical machine capable of performing addition, subtraction, multiplication and
division; multiplication and division were performed on this machine in much the same way
as we multiply and divide in a column: multiplication as a sequence of additions, and
division as a sequence of subtractions. In some sources one can find the claim that the
machine was supposedly able to extract square and cube roots; in fact this is not true, it is
simply much easier to compute a root when you have a device for multiplication than
without one.
The history of mechanical arithmometers lasted for quite a long time and ended in the
second half of the 20th century, when mechanical calculating devices were replaced by
electronic calculators. One common property of arithmometers is important for our
historical excursion: they could not perform calculations consisting of more than one
action without human participation; meanwhile, solving even relatively simple problems
requires performing long sequences of arithmetic operations. Of course, arithmometers
eased the labor of human calculators, but there remained the need to write out intermediate
results on paper and to enter them manually with wheels and levers or, in later versions,
with buttons.
The English mathematician Charles Babbage (1792-1871) drew attention to the fact
that the labor of calculators could be automated completely14; in 1822 he proposed
the design of a more complex device known as a difference machine. This machine was
to interpolate polynomials by the finite difference method, which would automate the
construction of tables of a variety of functions. Having secured the support of the English
government, Babbage began work on the machine in 1823, but the technical difficulties
he encountered somewhat exceeded his expectations. The story of this project is told
differently by different sources, but all agree that the total amount of government
subsidies reached a sum enormous for the time, £17,000; some authors add that Babbage
spent a comparable amount of his own fortune. The fact is that Babbage never built a
working machine, and in the course of the project, which dragged on for almost two
decades, he himself cooled towards his idea, concluding that the method of finite differences
was only one (albeit important) of a huge number of calculation problems; the next
machine conceived by the inventor was to be universal, that is, adjustable to solve any
problem.
In 1842, having failed to obtain any working device, the British government refused
to finance Babbage's activities any further. Based on the principles proposed by Babbage,
the Swede Georg Scheutz completed the construction of a working difference
machine in 1843, and in the following years built several more copies, one of which he sold to the
British government and another to the government of the United States. At the end of the
20th century two copies of Babbage's difference machine were built from his original
drawings, one for the Science Museum in London and the other for the Computer History
Museum in California, thus demonstrating that Babbage's difference machine could have
worked had it been completed.
However, in historical terms it is not the difference machine that is most interesting,
but the universal computing machine conceived by Babbage, which he called analytical.
The complexity of this machine was such that Babbage could not even complete its
14 In fact, an earlier description of a difference machine is known, in a book by the German
engineer Johann Müller published in 1788. It is not known whether Babbage used the ideas from
this book.
drawings; the conceived device exceeded the capabilities of the technology of that time,
and his own capabilities as well. Be that as it may, it was in Babbage's work on the analytical
machine that, first, the idea of program control emerged, i.e. the execution of actions prescribed
by a program; and, second, actions appeared that were not directly related to arithmetic:
the transfer of data (intermediate results) from one storage device to another and the
execution of certain actions depending on the results of data analysis (e.g. comparison).
In the same year in which the British government stopped funding the difference
machine project, Babbage gave a lecture at the University of Turin, devoted mainly to
the analytical machine; the Italian mathematician and engineer Federico Luigi Menabrea
published an abstract of this lecture in French15. At Babbage's request, Lady Augusta Ada
Lovelace translated this abstract into English, supplying her translation with extensive
commentaries16 much larger than the article itself. One section of these comments contains
a complete set of commands for computing Bernoulli numbers on an analytical machine;
this set of commands is considered the first computer program ever written, and Ada
Lovelace herself is often referred to as the first programmer. Interestingly enough, Ada
Lovelace, while pondering the possibilities of the analytical machine, was already able to
look into the future of computers; among other things, her comments contained the
following fragment: "The essence and purpose of the machine will change from the
information we put into it. The machine will be able to write music, paint pictures, and
show science in ways we have never seen before." In effect, Ada Lovelace observed that the
machine conceived by Babbage could be seen as a tool for processing information in the
broad sense, while the solution of computational mathematical problems is only a special
case of such processing.
If a working difference machine, as mentioned above, was nevertheless
built in the middle of the 19th century, although not by Babbage, the idea of a
programmable computer was almost a hundred years ahead of the state of the art: the first
working program-controlled computers appeared only in the second quarter of the 20th
century. At present it is considered that chronologically the first programmable computer
was Z1, built by Konrad Zuse in Germany in 1938; the machine was completely
mechanical, electricity was used only in the motor that drove the mechanisms in motion.
The Z1 used binary logic, and the elements that calculated logical functions such as
conjunction, disjunction, etc., were realized as sets of metal plates with clever cutouts.
For the interested reader, we can recommend to find a video on the Internet demonstrating
these elements on an enlarged model: the impression made by their work is certainly
worth the time spent.
The Z1 machine was not very reliable: the mechanisms often jammed, distorting the
result, so it saw no practical use; but it was followed a year later by the Z2, which used the
same mechanics to store information (the "memory") but performed computational
operations using electromagnetic relays. Both machines carried out instructions read
from punched tape; they were unable to rewind the tape, which severely limited their
15 Menabrea published quite a lot of his works in French, which in those days was more popular as an
Figure 1.1. Radio tube (double triode) in action (left); circuit of a trigger on two triodes (right)5
By taking two triodes and connecting the anode of each triode to the grid of the other, we
obtain a device called a trigger. It can be in one of two stable states: current flows through one of
the two triodes (it is said to be open), and because of this there is a potential on the grid of the
second triode that prevents current from flowing through it (that triode is closed). By
briefly applying a negative potential to the grid of the open triode, we stop the current through it;
as a result the second triode opens and thereby closes the first; in other words, the triodes switch
roles and the trigger goes into the opposite stable state. A trigger can be used, for example, to
store a single bit of information. Other ways of connecting triodes make it possible to build logic
gates realizing conjunction, disjunction and negation. All this allows radio tubes to be used to
build an electronic computing device.
Having no mechanical parts, machines built on vacuum tubes worked much faster, but
the tubes themselves are rather unreliable: the bulb can lose its seal, and the filament
heating the cathode burns out over time. One of the first programmable computers, ENIAC,
contained 18,000 tubes, and the machine could only work while all of them were in good
order. Despite unprecedented measures taken to improve reliability, the machine had to be
repaired constantly.
5 Photo from Wikipedia; the original can be downloaded from
https://en.wikipedia.org/wiki/File:Dubulttriode_darbiibaa.jpg. Images used here and hereafter from
Wikimedia Foundation sites are licensed for redistribution under various Creative Commons licenses;
detailed information, as well as the original images in substantially better quality, can be obtained from
the respective web pages. In what follows we omit the detailed notice, limiting ourselves to a reference
to the web page containing the original image.
ENIAC was created by the American scientist John Mauchly and his student J. Presper Eckert;
the work was started during World War II and financed by the military, but, fortunately for the
creators of the machine, they were not able to complete it before the end of the war, so the
project was declassified. The pioneers of tube computing from Great Britain were less lucky:
the Colossus Mark I and Colossus Mark II machines, built in the strictest secrecy, were destroyed
after the end of the war on Churchill's personal order17, and their creator Tommy Flowers, obeying
the same order, was forced to burn all the design documentation, which made it impossible to
recreate the machines. The general public became aware of this project only thirty years later,
and its participants were deprived of deserved recognition and effectively excluded
from the worldwide development of computer technology. By the time the project was declassified,
the achievements of the Colossus creators were only of historical interest, and most of them
had been lost when the machines and documentation were destroyed.
It is often claimed that the Colossus machines were designed to decrypt messages
encrypted by the German electromechanical encryption machine Enigma, and that the famous
mathematician Alan Turing, one of the founders of the theory of algorithms, participated in the
project (and all but led it). This is not true: Turing took no part in the Colossus project,
and the machine built with his direct participation and really intended for breaking Enigma codes
was called the Bombe; it was purely electromechanical and, strictly speaking, was not a computer,
any more than Enigma itself was. Tommy Flowers' machines were designed to break ciphertexts
produced by the Lorenz SZ machine, whose cipher was much more resistant to attack than the
Enigma cipher and did not yield to electromechanical methods.
However, Tommy Flowers did have a chance to work under Turing for some time in one of
the British cryptanalytic projects, and it was Turing who recommended Flowers' candidacy for
the Lorenz SZ-related project.
Computers built on vacuum tubes are usually called first-generation computers; note
that only electronic computers are divided into generations, and mechanical and
electromechanical calculating machines of all kinds are not included in this classification. In
particular, Konrad Zuse's machines were not electronic, so they are counted neither among
"first-generation computers" nor, in this classification, among computers at all.
The capabilities of the machines of this era were very limited: the bulky element base
forced designers to make do with meager (by modern standards) memory capacities.
Nevertheless, it is to the first generation that one of the most important inventions in
the history of computing belongs: the stored-program principle, which states that the
program, in the form of a sequence of instruction codes, is stored in the same memory as
the data, the memory itself being homogeneous, so that instruction codes do not
fundamentally differ from data. Machines conforming to this principle are traditionally
called von Neumann machines in honor of John von Neumann.
The history of the name is rather peculiar. One of the first electronic machines to store a
program in memory was the EDVAC computer; it was built by Mauchly and Eckert, familiar
to us from ENIAC, who were already discussing and designing the new machine during
the construction of ENIAC. John von Neumann, who participated as a scientific consultant in the
Manhattan Project18, became interested in the ENIAC project because the Manhattan Project
required huge amounts of calculation, on which a whole army of female calculators worked
using mechanical arithmometers. Naturally, von Neumann took an active part in the discussions
with Mauchly and Eckert about the architectural principles of the new machine (EDVAC); in 1945 he
summarized the results of the discussions in a written document known as the First Draft of a
17 There is a version that Churchill's goal was to prevent publicity of the fact that he had known
in advance about the mass bombing of Coventry from intercepted ciphertexts but did nothing, in order
not to give away Britain's ability to read German ciphers; historians, however, differ on this point.
Moreover, the version about the conscious sacrifice of Coventry is refuted by a number of testimonies
of direct participants in the events of that time. The question of why it was necessary to destroy the
cipher-breaking equipment after the end of the war remains open.
The American project to build an atomic bomb.
18
§ 1.1. Computer: what it is 59
Report on the EDVAC machine. Von Neumann did not consider the document complete: in this
version, the text was intended only for discussion by members of the Moushley and Eckert
research group, which included, among others, Herman Goldstein. The prevailing version of
historical events is that it was Goldstein who commissioned the reprinting of the manuscript
document, putting only von Neumann's name on its title page (which is formally correct, since
von Neumann was the author of the text, but not quite correct in the light of scientific traditions,
since the ideas set forth in the document were the result of collective work), and then, having
reproduced the document, sent out several dozen copies to interested scientists. It was this
document that intentionally linked von Neumann's name with the relevant architectural principles,
although von Neumann does not appear to be the author (at least not the sole author) of most of
the ideas presented there. Later von Neumann built another machine, the IAS, in which he
embodied the architectural principles outlined in the "message".
There are many interesting stories associated with the computational work done for the
Manhattan Project; some of them were described by another participant in the project, the
famous physicist Richard Feynman, in his book "Surely You're Joking, Mr. Feynman!" [7].
There is, in particular, the following fragment:
As for Mr. Frankel, who started all this activity, he began to suffer from the computer
disease that everyone who works with computers today knows about. It is a very serious
disease, and it makes work impossible. The trouble with computers is that you play with
them. They are so wonderful, there are so many possibilities: if it's an even number you
do this, if it's an odd number you do that, and very soon you can do more and more
sophisticated things on a single machine, if you are clever enough.

After a while the whole system broke down. Frankel paid no attention to it; he was no
longer in charge. The system was very, very slow, while he sat in his room figuring out
how to make one of the tabulators automatically print the arctangent of X: the tabulator
would turn on, print the columns, then, bang, bang, bang, compute the arctangent
automatically by integration and produce the whole table in one operation.

Absolutely useless. We already had arctangent tables. But if you have ever worked with
computers, you understand what a disease it is: the delight of seeing how much can be
done.
Unfortunately, our time is too different from the time when Feynman worked on the Manhattan
Project, and even from the time when he wrote his book. Not everyone who deals with computers
now is aware of the existence of this "computer disease": computers have become all too
commonplace, and most people find computer games much more fun than "playing" with the
computer itself and its capabilities. Feynman was quite right that "everyone who works with
computers" knew about this disease; it is just that in those days there were no "end users",
and everyone who worked with computers was a programmer. Strange as it may seem, it is
precisely this "disease" that turns a person into a programmer. If you want to become a
programmer, try to catch the disease described by Feynman.
One way or another, the stored-program principle was a definite breakthrough in the field
of computing. Before it, machines were programmed either with punched tape, like Konrad
Zuse's machines, or with jumpers and toggle switches, like ENIAC; physically setting up a
program, that is, rearranging all the jumpers and flipping the toggle switches, took several
days, after which the computation itself would run for an hour or two, and then the machine
had to be reprogrammed all over again. Programs in those days were not so much written as
invented, because in essence a program was not a sequence of instructions but a scheme for
connecting the machine's units.
Storing a program in memory in the form of instructions made it possible, first, not to
spend enormous amounts of time changing programs: a program could be read from an external
medium (punched tape or a deck of punched cards), placed in memory, and executed, and all
this happened fairly quickly; of course, preparing the program, that is, thinking it up and
then transferring it to punched tape or cards, still took a long time, but that did not use
up the time of the machine itself, which cost a great deal of money. Second, using the same
memory both for instruction codes and for the data being processed made it possible to treat
a program as data and to create programs that operate on other programs. Such now-familiar
phenomena as compilers and operating systems would have been unthinkable on machines that
did not meet the definition of a von Neumann machine.
Strictly speaking, von Neumann's architectural principles include not only the stored-program
principle but also a number of other properties of a computer; we will return to this point
in §3.1.1.
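The "program as data" idea can be sketched in a few lines. The example below is in Python, which is not the language used elsewhere in this book; it merely illustrates that a program stored as ordinary data in memory can be inspected and modified by another program before being executed.

```python
# A tiny illustration of "program as data": the "program" here is Python
# source text held in an ordinary string variable, i.e. plain data.
source = "result = 2 + 3"            # a program, stored as data
patched = source.replace("+", "*")   # treat the program text as data: edit it
namespace = {}
# now execute the modified program
exec(compile(patched, "<memory>", "exec"), namespace)
print(namespace["result"])           # prints 6
```

A compiler does essentially the same thing on a larger scale: it reads one program as input data and produces another program as output.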
Meanwhile, the memory capacity of computers had managed to grow somewhat; for example, the
already mentioned IAS machine of John von Neumann had 512 memory cells of 40 bits each. But
while the Americans continued to build computers aimed exclusively at scientific and
engineering numerical calculations, even if with a stored program, in Britain at the same
time there were people who paid attention to the potential of computing machines for
processing information beyond the narrow domain of "calculation". The first, or at any rate
one of the first, computers originally designed for purposes broader than numerical
calculation is considered to be LEO I, developed by the British company J. Lyons & Co.; it
is noteworthy that this firm, engaged in the food supply, restaurant and hotel business, had
nothing to do with the engineering industry. In 1951 the newly built computer took over a
large part of the company's accounting and financial analysis, with the actual calculations
as such making up a noticeable, but by no means the largest, share of the operations
performed by the machine. Accepting input data from punched cards and outputting results to
a text printing device, the machine made it possible to automate the preparation of payrolls
and other similar documents. Ada Lovelace's prophecy slowly began to come true: the object
of the computer program's work was information, and mathematical calculation was an
important, but by no means the only, way of processing it.
Meanwhile, the advancing level of technology was inevitably bringing a revolution in
computer construction closer. The main innovation behind this revolution was the
semiconductor transistor, an electronic component whose behavior in a circuit is very
similar to that of the vacuum-tube triode. The transistor, like the triode, has three
contacts, usually called the "base", "emitter" and "collector" (or "gate", "source" and
"drain" for the kind of transistor called a field-effect transistor). When the voltage at
the base changes with respect to the emitter (at the gate with respect to the source), the
current between the emitter and the collector (between the source and the drain) changes.
In analog electronics both the triode tube and the semiconductor transistor are used to
amplify a signal, because the currents flowing between the anode and cathode of the triode,
or between the emitter and collector of the transistor, can be much stronger than the
control signals applied to the grid or the base, respectively. In digital circuits power
plays no role; what matters here is the control effect as such. In particular, just as with
two triodes, two transistors can be used to build a flip-flop, with the current flowing
through one transistor cutting off
the other, and vice versa.
The first working transistor is believed to have been created in 1947 at Bell Labs, with
William Shockley, John Bardeen and Walter Brattain credited as the inventors; a few years
later they were awarded the Nobel Prize in Physics. Early transistors were bulky, unreliable
and inconvenient to work with, but rapid improvements in crystal-growing technology made it
possible to mass-produce transistors which, compared to vacuum tubes, were, first, quite
tiny; second, they did not require cathode heating, so they also consumed much less
electricity; finally, again compared to tubes, transistors were virtually trouble-free: of
course, they too sometimes failed, but that was an emergency, whereas tube burnout was a
routine event, and the tubes themselves were regarded as consumables rather than permanent
parts of the design.
The second major invention that determined the change of computer generations was
magnetic-core memory. A bank of such memory (Fig. 1.2, right) was a rectangular grid of
wires with ferrite rings at its nodes; each ring stored one bit of information, replacing
the bulky circuit of three or four vacuum tubes used for the same purpose in
first-generation computers. Computers built on solid-state electronic components, primarily
transistors, are commonly referred to as second-generation computers. While first-generation
computers occupied entire buildings, a second-generation machine fit in a single room; power
consumption dropped dramatically, while the capabilities, above all the amount of RAM,
increased significantly. The reliability of the machines also improved, since transistors
fail much less often than vacuum tubes, and the monetary cost of computers fell
substantially. The first fully transistorized computing machines were built in 1953, and in
1954 IBM released the IBM 608 Transistor Calculator, commonly called the first commercial
all-transistor computing device.
The next major change in the approach to building computers was made possible by the
invention of integrated circuits: semiconductor devices in which a single crystal houses
several (in modern conditions, up to several billion) elements such as transistors, diodes,
resistors and capacitors. Computers based on integrated circuits are considered to be of
the third generation; despite their still very high cost, it became possible to mass-produce
these machines, in runs of up to tens of thousands of units. The central processor of such a
computer was a cabinet or pedestal full of electronics. As the technology improved,
microchips became more and more compact, and their total number in a CPU steadily decreased.
In 1971 the next transition of quantity into quality took place: chips appeared that
contained an entire central processor. Which chip became the first microprocessor in history
is debatable; most often the Intel 4004 is named, of which we can at least say for certain
that it was the first microprocessor available on the market. According to some sources,
priority should be given to the MP944 chip used in the avionics of the F-14 fighter jet, but
the general public, as usual, knew nothing about that development until 1997.
The advent of microprocessors made it possible to "package" a computer into a
desktop device known as a "personal computer". From this point on, it is customary to
count the history of the fourth generation of computers, which continues to this day.
Strange as it may seem, no qualitatively new improvements have been proposed in the
half-century or so since. The Japanese "fifth-generation computer" project did not yield
significant results, especially since its bet was placed not on the technological
development of the hardware base but on an alternative direction of software development.

[8] Photo from Wikimedia Commons; http://commons.wikimedia.org/wiki/File:KL_CoreMemory_Macro.jpg.
As can be seen, nowadays computers are used to process any information that can be recorded
and reproduced. Besides the traditional databases and texts to which electronic information
processing was reduced in the middle of the 20th century, computers successfully process
recorded sound, images and video; there are attempts to process tactile information, albeit
in an embryonic state: in practical use so far there are only Braille displays for the
blind, but engineers have not given up trying to create all kinds of electronic gloves and
other similar devices. The situation with taste and smell is much worse: at the current
level of technology, taste and smell information cannot be processed at all; but there is
no doubt that if a way to record and reproduce tastes and smells is ever found, computers
will be able to work with these kinds of information as well.
Of course, computers are sometimes also used for numerical calculations; there is even a
special industry producing so-called supercomputers, designed exclusively for solving
large-scale computational problems. Modern supercomputers have tens of thousands of
processors and in most cases are produced in single copies; on the whole, supercomputers
are rather an exception to the general rule, while most applications of computers have very
little in common with numerical calculation. The question naturally arises: why, then, are
computers still called computers? Wouldn't it be better to use some other term, for example,
call them info-analyzers or info-processors? Strange as it may seem, this is completely
unnecessary; the point is that it is not only numbers and formulas that can be computed. If
we recall the notion of a mathematical function, we immediately find that both its domain
and its range can be sets of arbitrary nature. As is well known, information can be
processed only if it is represented in some objective form; moreover, digital computers
require a discrete representation of information, and this is nothing other than
representation in the form of strings of symbols over some alphabet, or simply texts; note
that it is precisely this representation of arbitrary information that is considered in the
theory of algorithms. With this approach, any transformation of information turns out to be
a function from a set of texts to a set of texts, and any processing of information becomes
the computation of a function. So computers are still engaged in computation, albeit
computation not of numbers but of arbitrary information.
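The point above can be made concrete with a trivial sketch (in Python, which is not the language used elsewhere in this book): a function whose domain and range are sets of texts, computed by a program in exactly the same sense as an arithmetic formula would be.

```python
# "Computing" non-numeric information: a function from texts to texts.
# Sorting the words of a sentence is a computation just as much as
# evaluating an arithmetic expression is.
def transform(text: str) -> str:
    return " ".join(sorted(text.split()))

print(transform("banana apple cherry"))  # prints: apple banana cherry
```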
1.1.2. Processor, memory, bus
There's a machine standing in our hall,
a bus runs right through it all,
and along the bus, to and fro,
information runs nonstop.
The internal structure of almost all modern computers is based on the same principles.
The basis of the computer is the common bus, which is, roughly speaking, a number (several
dozen) of parallel conductors called lines. Connected to the bus are the central processing
unit (CPU), random-access memory (RAM), and controllers that allow the rest of the
computer's devices to be driven. The CPU communicates with the rest of the computer through
the bus; RAM and the controllers are designed to ignore any information passing over the bus
except what is addressed to that particular memory bank or controller. To make this
possible, a portion of the bus lines is allocated for the address; this portion is called
the address bus. The lines that carry data form the data bus, and the lines that carry
control signals form the control bus. From the circuitry point of view, each line can be in
the state of logical one (the line is "pulled up" to the supply voltage of the circuit) or
zero (the line is connected to "ground", i.e. the zero voltage level); a certain combination
of zeros and ones on the address bus constitutes an address, and all devices other than the
CPU are allowed to work with the bus only when the state of the address bus corresponds to
their address; the rest of the time they pay no attention to the bus and transmit nothing to
it, so as not to interfere with the CPU's work with other devices.
RAM, or simply memory, consists of identical memory cells [19], each of which has its own
unique address distinguishing it from the others. All cell addresses technically possible
on a given computer form the address space; its size is determined by the number of lines
on the address bus: if there are N such lines, then there are 2^N possible addresses (to a
reader who has not studied combinatorics this may seem unobvious; in that case, come back
here after §1.3.1, where all the necessary information is given).

Many computers use virtual addresses to work with memory; in this case the address space we
have just discussed, which is actually the set of possible states of the address bus, is
called physical (as opposed to the virtual address space, which is formed by virtual
addresses). We will return to the discussion of virtual memory in §3.1.2.

[19] It is useful to know that the English term memory cell is in reality used in a quite
different sense: it denotes a circuit that stores a single bit. For a cell in the sense in
which we use the term here, English sources use the phrase memory location (or addressable
location).
Only two operations can be performed on a memory cell: writing a value to it and reading a
value from it. To perform them, the CPU places the address of the desired cell on the
address bus and then, using the control bus, transmits an electrical impulse that causes the
selected cell, that is, the one whose address is on the bus and no other, to transfer its
contents to the data bus (a read operation) or, conversely, to set its new contents
according to the state of the data bus (a write operation); the old contents of the cell
are lost in the process. Information is transmitted over the data bus in parallel: for
example, if a cell contains, as is usually the case, eight binary digits (zeros and ones),
then eight lines are used to transfer them during reading and writing; when performing a
read operation the memory cell must set these lines to the logic levels corresponding to
the digits stored in it, and during a write operation, conversely, set the stored digits in
accordance with the logic levels on the data lines. To store larger values, several memory
cells in a row, that is, with neighboring addresses, are often used, and the width of the
data bus is usually sufficient to transfer the contents of several cells simultaneously.
It should be noted that RAM is an electronic device that requires power to function.
When the power is turned off, the information stored in the memory cells is
immediately and irretrievably lost.
Computer memory should never be confused with disk storage devices, where files are stored.
The CPU can interact with memory directly via the bus; it cannot work with disks and other
devices by itself: for that, special and rather complex programs called drivers must be run
on the CPU. Drivers organize work with the disk and other external devices by exchanging
certain control information with the controllers.
Some memory blocks may physically be permanent (read-only) memory rather than RAM. Such
memory does not support the write operation, i.e. its contents cannot be changed, at least
not by CPU operations; on the other hand, information written into such memory is not erased
when the power is turned off. The CPU does not distinguish between RAM cells and permanent
memory cells in any way, because they behave exactly the same during a read operation.
Usually, when a computer is manufactured, a program is written into the permanent memory
that tests the computer's hardware and prepares it for operation. This program starts
running when the computer is switched on; its job is to find a place from which the
operating system can be loaded, load it, and hand over control; everything else, including
running user programs, is then up to the operating system. We will talk about operating
systems in detail later; for now, let us just note that an operating system is a program and
nothing else, i.e. it is written by programmers; it differs from all other programs only in
that, having started on the computer before them, it gets access to all the computer's
capabilities, while every other program is started by the operating system, and started in
such a way that it does not get direct access to those capabilities.
Note that the operating system is responsible, among other things, for organizing work
with all external devices, so it contains all the necessary drivers for this purpose.
Disks connected to a computer may contain a large amount of varied information, and in order
not to get lost in this abundance, the operating system organizes the storage of information
on disks in the form of files: units of information storage that have human-readable names.
Every computer user knows about the existence of files, having to work with them not just
every day but every time some information needs to be put into storage or, conversely,
previously stored information needs to be used. Not everyone understands, however, that
files reside on disks, and only on disks. There are no files in memory; they are simply not
needed there, because memory allocation changes constantly depending on the needs of running
programs. This is handled by the same operating system, and only it knows which memory areas
are currently in use and for what. Even running programs are not privy to this internal
kitchen: each of them disposes only of the areas given to it; as for the human user, there
is no need to know about memory allocation at all, since nothing useful could be done with
that knowledge anyway. Hence no names are required for memory areas; the operating system
identifies them for itself as it sees fit.
where the next instruction to be executed is located. The processor works by running an
instruction processing cycle over and over. At the beginning of this cycle, the address is
taken from the instruction counter [20] and the code of the next instruction is read from
the memory cells located at that address. Immediately thereafter, the instruction counter
changes its value to point to the next instruction in memory: for example, if the
instruction just read occupied three memory cells, the instruction counter is increased by
three. The processor circuits then decode the code and perform the actions it prescribes:
this may be, for example, the familiar instruction "take the contents of two registers, add
them, and put the result back into one of the registers", or "copy a number from one
register to another", and so on. When the actions prescribed by the instruction have been
carried out, the processor returns to the beginning of the instruction processing cycle, so
that the next pass of the cycle executes the next instruction, and so on ad infinitum (or
rather, until the processor is shut down).

[20] The corresponding English terms are program counter and instruction pointer; the name
"counter" may seem ill-chosen here, since it does not count anything, but the point is that
the meaning of the English word counter is much wider.
Some machine instructions can change the order of instruction execution by instructing the
processor to move to another place in the program (that is, simply put, by explicitly
changing the current value of the instruction counter). Such instructions are called jump
instructions. Conditional and unconditional jumps are distinguished: a conditional jump
instruction first checks the truth of some condition and performs the jump only if the
condition holds, while an unconditional jump simply forces the processor to continue
executing instructions from a given address, without any checks. Processors usually also
support jumps that remember the return point; these are used to call subroutines.
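The fetch-decode-execute cycle and a conditional jump can be sketched as a toy interpreter. The example is in Python (not the book's language), and its three opcodes (0 = HALT, 1 = ADD, 2 = JNZ) are invented purely for this sketch; a real processor has many more, but the cycle is the same.

```python
# A toy von Neumann processor: program and data share one memory list;
# the instruction counter (pc) is advanced after each fetch, and a
# conditional jump changes it explicitly.
def run(memory):
    pc = 0   # instruction counter
    acc = 0  # a single register (an "accumulator")
    while True:
        op = memory[pc]                     # fetch
        if op == 0:                         # HALT: stop, result in acc
            return acc
        elif op == 1:                       # ADD addr: acc += memory[addr]
            acc += memory[memory[pc + 1]]
            pc += 2                         # the instruction took two cells
        elif op == 2:                       # JNZ addr: jump if acc != 0
            pc = memory[pc + 1] if acc != 0 else pc + 2

# ADD cell 5, ADD cell 6, HALT; cells 5 and 6 hold the data 7 and 35
print(run([1, 5, 1, 6, 0, 7, 35]))        # prints 42
# A loop: acc = 3, then repeatedly add -1, jumping back while acc != 0
print(run([1, 7, 1, 8, 2, 2, 0, 3, -1]))  # prints 0 after three passes
```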
running a certain program, and this program needs to write to a file on the disk [22], then
for this purpose the program will address the operating system with a request to write
such-and-such data to such-and-such a file; the operating system will calculate in which
place of the disk the corresponding file is or should be located and, addressing in its turn
the driver (i.e., actually, a separate part of itself), will request that certain data be
written to a certain place on the disk; then the driver, armed with its knowledge of the
capabilities of the disk controller, will first perform

[21] In fact, in this case there are two different buses in the computer - one for memory
cells and one for controllers.
[22] In fact, in the case of the disk, things will be a bit more complicated; we'll leave a
more detailed
1.1.6. Summary
So, let us summarize: a computer is based on a common bus to which RAM and the CPU are
connected; external devices, including hard disks and other drives, as well as keyboards,
monitors, sound devices, and in general everything you are used to seeing in a computer that
is neither CPU nor memory, are also connected to the common bus, only not directly but
through special circuits called controllers. The processor can work with memory by itself;
to work with all other devices, special programs called drivers are required. Disk storage
devices serve for long-term storage of information, where it is usually organized in the
form of the files you are familiar with; files can hold both data and programs, but to run
a program or to process data, either must first be loaded into RAM.
Among all programs, a special place is occupied by the one called the operating system: it
is launched first and gets full access to all the capabilities of the computer hardware,
while all other programs are launched under the management (and supervision) of the
operating system and do not have direct access to the hardware; to perform actions that do
not reduce to transforming information in their allocated memory, programs have to address
the operating system.
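Even a "simple" file write is such a chain of requests. A minimal sketch in Python (not the book's language): the `os.open`, `os.write` and `os.close` functions here are thin wrappers over the corresponding system calls, i.e. explicit requests to the operating system.

```python
# Writing to a file = asking the operating system, step by step.
import os
import tempfile

fd, path = tempfile.mkstemp()   # ask the OS to create and open a file
os.write(fd, b"hello")          # ask the OS to write bytes to it
os.close(fd)                    # ask the OS to close the descriptor
with open(path, "rb") as f:     # ask the OS to open it again for reading
    data = f.read()
os.unlink(path)                 # ask the OS to remove the file
print(data)                     # prints b'hello'
```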
used at all if it doesn't have a graphical user interface, but this is largely just a
consequence of certain corporations' propaganda. Up until the mid-1990s graphical user
interfaces were not as widespread as they are now, which did not prevent people from using
computers, and even now many users prefer to copy files and view the contents of disks using
two-panel file managers such as Far Manager or Total Commander, whose ideological
predecessor, Norton Commander, worked in text mode. Curiously, even the traditional window
interface, with its ability to resize windows, move them around the screen, partially
overlap them, and so on, was often implemented, at the turn of the 1980s and 1990s, without
any graphics at all, on the screen of an alphanumeric monitor.
However, both Norton Commander with all its later clones and the window interfaces that used
text mode (very popular in the days of MS-DOS), although they do not use graphics as such,
are still based on the same basic principle as the now-familiar "icons and menus"
interfaces: they use the screen space to place so-called interface elements, or widgets,
which usually include menus, buttons, checkboxes and radio buttons, fields for entering
text, and static explanatory labels; graphical mode somewhat expands the repertoire of
widgets with windows, pictograms ("icons"), all sorts of sliders, indicators and whatever
other elements the developer had imagination enough to devise. Meanwhile, observing the work
of professionals, programmers and system administrators, especially those who use Unix
systems, one can notice another approach to human-computer interaction: the command line. In
this mode the user enters commands from the keyboard, prescribing the execution of certain
actions, and the computer executes these commands and displays the results on the screen;
once upon a time this was called the dialog mode of working with a computer, in contrast to
batch mode, in which operators formed packages of tasks received from programmers in
advance, and the computer
[37] An end user is usually defined as a person who uses computers to solve tasks that are
not related to the further use of computers; for example, a secretary or a designer using a
computer is an end user, but a programmer is not, because the tasks he solves are aimed at
organizing the work of other users (or even himself) with the computer. It is quite possible
that the end users will be the people using the program the programmer is currently writing.
processed these tasks in due course.
Initially, the dialog mode of working with computers was built around so-called teletypes
[38], which were essentially electromechanical typewriters connected to a communication
line. The original purpose of teletypes was to transmit text messages over a distance: not
so long ago urgent messages were sent by telegram, delivered to the addressee's home by a
letter carrier, and the received telegram was a strip of printed text output by a teletype,
cut with scissors and pasted onto a solid backing. The teletype worked quite simply:
whatever the operator typed on the keyboard was transmitted to the communication line, and
whatever came from the communication line was printed on paper. For example, if two
teletypes were connected to each other, their operators could "talk" to each other; in
fact, that is how telegrams were transmitted, except that the communication lines between
teletypes were switched automatically, in much the same way as wireline telephone lines,
and in some cases telephone lines themselves were used for the connection. Telegrams have
been almost completely displaced by the development of digital communication networks:
mobile telephony and the Internet.
The idea of connecting a teletype to a computer dates back to the first generation of
computers; teletypes were mass-produced for telegraphy and were available on the market, so
they did not need to be developed, and computer engineers of that time had other things to worry
about. When working with a computer in dialog mode via a teletype, the operator typed a
command on the keyboard, and the computer's response was printed on the paper tape.
Interestingly, this mode of operation "lasted" for a surprisingly long time: it was
completely eliminated from practice only by the end of the 1970s.

Fig. 1.4. ASR-33 teletype with punched-tape punch and reader [15]
Using a teletype as a computer access device had an obvious disadvantage: it consumed a lot
of paper. It was this that initially drove the mass transition from traditional teletypes to
alphanumeric terminals, equipped with a keyboard and a display device (screen) based on a
cathode-ray tube; everything the operator typed on the keyboard was transmitted to the
communication line, as with a teletype, and the information received
[38] It is interesting to note that in Russian the word "teletype" became firmly established
as a generic name for such devices, while English-language sources more often use the term
"teleprinter"; the point is that Teletype was a registered trademark of one of the
manufacturers of such equipment.
from the communication line was displayed on the screen, thus avoiding wasting paper.
Saving paper is by no means the only or even the main advantage of the screen over paper tape: on the screen you can change the image in any place at any time. Control character sequences known as escape sequences (after the special character Escape, whose code is 27) were introduced for terminals almost immediately; on receiving such a sequence, the terminal would move the cursor to a specified position on the screen, change the color of the output text, and so on.
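Modern terminal emulators still honor such sequences; the ANSI color codes below are a small, widely supported sketch (the specific codes are assumptions of the example: 31 selects a red foreground, 0 resets the attributes):

```shell
# \033 is the Escape character (code 27); "[31m" switches the text
# color to red, "[0m" resets the terminal attributes to normal.
printf '\033[31mred text\033[0m normal text\n'
```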
Alphanumeric terminals are no longer produced; if necessary, any laptop equipped with a serial port or a USB-serial adapter can play this role, provided the appropriate software is run on it. Incidentally, this is how the initial configuration of the above-mentioned server machines that have no video card is done: the system administrator connects his work computer via a COM port to the server machine being configured and runs a terminal emulator. This allows the operating system to be booted from external media and installed on the server machine, and communication with the local network and remote access tools to be set up; further configuration, as well as management during operation, is usually done remotely over the network.
15 Photo from Wikipedia, see http://en.wikipedia.org/wiki/File:ASR-33_at_CHM.agr.jpg.
panel file managers that continue the tradition of the famous Norton Commander: the
text-based Midnight Commander, as well as the graphical gentoo (not to be confused with
the Linux distribution of the same name), Krusader, emelFM2, Sunflower, GNOME
Commander, Double Commander, muCommander, and so on. Nevertheless, many professionals prefer to work with files - copy them, rename them, sort them into separate directories 40, move them from disk to disk, delete them - using command-line commands. This is explained by one very simple fact: it really is more convenient and faster.
Interestingly, command line facilities are also present in Windows family systems; you can get a terminal window with a corresponding prompt there by pressing the famous "Start" button, selecting "Run" from the menu and entering the three letters "cmd" as the command name; but the standard Windows command line is very primitive and inconvenient to use, and most users are not even aware of its existence. It's
not suitable for professionals either, so in the Windows world even they have to make do
with graphical interfaces, using the command line only on rare occasions, usually related
to system maintenance. Programmers who are used to Unix systems and for one reason
or another are forced to work with Windows often install command line interpreters
ported from Unix; for example, such an interpreter is included in the MinGW package.
Of course, the command line requires some memorization, but there are not many
17 Ibid., see http://en.wikipedia.org/wiki/File:DEC_VT100_terminal.jpg
40 Nowadays the term "folder" is in common use; this term, which actually denotes an element of the graphical interface - the "box of icons" - is not acceptable for naming a file system object containing file names. In particular, folders are not necessarily represented on disk in any way, and their icons do not have to correspond to files; at the same time, you can work with files and directories without any "folders" - neither two-pane file managers nor the command line imply any "folders". In this book we use correct terminology; we treat the terms "directory" and "catalogue" as synonyms, and the word "folder" appears in the text only when we need to remind you of its inappropriateness.
commands to be memorized; meanwhile, graphical interfaces, despite all the claims about their "intuitive comprehensibility", also require a lot of memorization: consider the use of the Ctrl and Shift keys in combination with the mouse when selecting items (this, at least, is quite simple, because the result is immediately visible) and when copying files, moving them and creating "shortcuts". Learning to work with graphical interfaces from scratch, i.e. when the trainee has no experience with computers at all, turns out to be harder than learning to work with command line tools; the general public is slowly ceasing to notice this simply because nowadays people get used to graphical interfaces from pre-school age due to their widespread use - which, in turn, is more a result of the efforts of the PR departments of certain commercial corporations than a consequence of any merits of such interfaces, which are in fact rather questionable. Often a user gets used not to graphical interfaces in general, but to one particular version of them, and finds himself completely helpless when, for example, switching to another version of the operating system.
Of course, before command line tools became really convenient, they had to come a long way. Modern command line interpreters "remember"
several hundred of the last commands entered by the user and allow you to quickly and
effortlessly find the desired command among the memorized ones; in addition, they allow
you to edit the entered command using the "arrow" keys, "guess" the name of the file by
the first letters entered, some variants of the command line interface give the user
contextual clues as to what else can be written in this part of the entered command, etc.
Working with such a command line can be many times and even dozens of times faster
than performing the same actions with the help of any "tricked out" graphical interface.
Imagine, for example, that you have returned from a trip to Paris and want to copy photos
from your camera card to your computer. Commands
cd Photoalbum/2015
mkdir Paris
cd Paris
mount /mnt/flash
cp /mnt/flash/dcim/* .
umount /mnt/flash
taking autocompletion of file names and the command history into account, you can type these in six or seven seconds without much hurry, since you won't have to type most of the text at all. Photoalbum is probably the only name in your home directory that starts with Ph, so you can type just those two letters, press the Tab key, and the command interpreter will add the name Photoalbum with a slash after it; the same applies to the command "mount /mnt/flash" (the mnt directory is most likely the only one in the root directory that starts with m, and its flash subdirectory is most likely the only one that starts with f); instead of "cp /mnt/flash/dcim/* ." an experienced user will type "cp !:1/dcim/* .", and the interpreter will substitute the first argument of the previous command for "!:1", i.e. "/mnt/flash"; the command "umount /mnt/flash" does not need to be typed at all: it is enough to type "u!m" (the text of the last command starting with m will
be substituted for !m), or simply press the up arrow twice and add the letter u to the
beginning of the command mount /mnt/flash that appears on the screen.
If you perform the same actions through the icon interface, you will first need to click
the mouse to get to the contents of the card, then, using the mouse in combination with
the Shift key, mark the entire list of files, right-click the mouse to call the context menu,
select "copy", then find (using the same mouse clicks) the Photoalbum/2015
directory, call the context menu again, create the Paris subdirectory, double-click it,
and finally, calling the context menu for the third time, select "paste". Even if you do
everything quickly, this procedure will take you at least twenty or thirty seconds, if not
more. But this, strange as it may seem, is not the main thing. If you, for example, very
often copy photos to your disk, then using the command line this procedure can be
automated by writing a so-called script - an ordinary text file consisting of commands.
For our example, the script might look like this:
#!/bin/sh
cd Photoalbum/2015
mkdir $1
cd $1
mount /mnt/flash
cp /mnt/flash/dcim/* .
umount /mnt/flash
If you now save this script under some name, e.g. getphotos, then the next time you need
to copy new photos (e.g. when you return from Milan), all you have to do is give the
command
./getphotos Milan
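One step is worth spelling out: a freshly created script file is not executable, so before the first run of ./getphotos you have to mark it executable (a one-time step; the file name getphotos follows the example above):

```shell
# allow execution of the script; this is needed only once,
# right after the file has been created
chmod +x getphotos
```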
This trick does not work with graphical interfaces: unlike commands, mouse movements
and clicks cannot be formally described, at least not in a way that is simple enough for
practical use.
Note that it is also better to launch graphical/window programs from the command
line rather than using all sorts of menus. For example, if you know the address of the site
you want to go to, the easiest way is to give a command consisting of the browser's name, e.g. firefox 41; no particular problems arise with typing it, especially taking autocompletion into account - on the author's computer, for example, it was enough to type only the letters fir and press Tab; well, the address of the site you would still have to type on the keyboard, only, perhaps, not on the command line but in the appropriate browser window.
The expressive power of modern command interpreters is remarkable: for example,
you can use the text output from one command as part of another command, not to
mention that the result of one program can be sent to the input of another program, and
thus build a whole chain of information transformations called a pipeline. Each program
that appears in your system, including those written by you personally, can be used in an
infinite number of combinations with other programs and built-in tools of the command
line interpreter itself; at the same time you can use programs of other authors for such
purposes that their authors did not even suspect. In general, if the possibilities of a
graphical user interface are limited by the imagination of its developer, the
possibilities of a properly organized command line are limited only by the
capabilities of the computer.
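As a small sketch of both mechanisms (the file names and the /usr/include directory here are arbitrary examples, not anything prescribed by the text): a pipeline that counts header files, and a command substitution that embeds one command's output into another command:

```shell
# pipeline: ls lists the names, grep keeps those ending in .h,
# wc -l counts the remaining lines
ls /usr/include | grep '\.h$' | wc -l

# command substitution: the output of date becomes part of echo's argument
echo "Today is $(date)"
```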
In any case, these possibilities are certainly worth exploring. Of course, overcoming
the influence of the propaganda of "software" monsters and convincing all computer users
to switch to the command line is an unrealistic task in modern conditions; but, as long as
you are reading this book, apparently, you are not quite an ordinary user. So, for an IT
professional, fluent command-line skills are practically mandatory; the absence of
these skills drastically reduces your value as a specialist. In addition, the command line
interface proves extremely useful during initial programming training - as a teaching aid, if you will. The reasons for this are detailed in the "methodological preface", but that text may be hard going for a non-specialist; in that case the author can only ask the reader to take the importance of the command line on faith for a while - not for long, it will soon become clear to you anyway.
Our entire book is written assuming that you have a Unix system installed on your
computer and that you are using a command-line interface to work with your computer;
the remainder of this chapter is devoted to how to do this. Although we've already
mentioned it in the preface, we feel it's worth repeating: if you want to learn anything
from this book, the command-line interface should become your primary way of working
with computers on a daily basis, and you should do so as soon as possible.
41 Paying tribute to the popularity of firefox, the author nevertheless considers it necessary to note that he himself stopped using this browser in 2018 due to the unjustified "weighting" of its graphical interface and switched to palemoon.
was Ken Thompson. According to one legend, he was interested at the time in a then-new area of programming - computer games. Because of the high cost of computing equipment in those days, Thompson had certain difficulties using computers for entertainment purposes, so he took an interest in a PDP-7 machine available at Bell Labs; this machine was already obsolete and, as a consequence, there were not many other claimants to it. Thompson was not satisfied with the standard system software supplied for that machine, and, drawing on the experience gained in the MULTICS project, he wrote his own operating system for the PDP-7. Initially, Thompson's system was a dual-task system, i.e. it allowed two independent processes to run - matching the number of terminals connected to the PDP-7 [2].
The name UNICS (similar to MULTICS) was jokingly suggested by Brian
Kernighan. The name stuck, only the last letters of CS were later replaced by a single X
(the pronunciation remained the same). Ken Thompson was joined in the system's development by Dennis Ritchie, and together they moved it to the more advanced PDP-11 minicomputer. It was then that the idea arose of rewriting the system (or at least as much of it as possible) in a high-level language. Thompson tried to use a truncated dialect of the BCPL language, which he called "B" (pronounced "bee"), but that language was too primitive: it did not even have structures (compound data types). Ritchie proposed extending the language; to name the resulting language, the authors took the next letter of the English alphabet after "B" - the letter "C".
In 1973, the system created by Thompson was rewritten in C. For that time, this was
a more than dubious step: the prevailing view was that high-level programming was
fundamentally incompatible with the level of operating systems. Time showed, however,
that this very step determined the tendencies of industry development for many years. The
C programming language and the Unix operating system retain their popularity almost
half a century after the described events. Apparently, the reason is that Unix turned out to
be the first operating system rewritten in a high-level language, and C became that
language.
In 1974, Thompson and Ritchie published an article in which they described their
achievements. The PDP-11 was a very popular machine at the time, installed in many
universities and other organizations, so after the article was published there were many
people who wanted to try the new system. At this point in history, the special position of
AT&T played an important role: anti-trust restrictions prevented it from participating in
the computer business, or any business outside of telephony. Therefore, copies of Unix
with source code were made available to everyone on a non-commercial basis. According
to one legend, Ken Thompson signed each copy, recorded on a reel-to-reel tape, with the
words "love, ken" [3]. The next big step was to port Unix to a new architecture. This idea
was proposed by Dennis Ritchie and Steve Johnson and tested on the Interdata 8/32
machine. As part of this project, Johnson developed a portable C compiler, which was
nearly the first portable compiler in history. The porting of the system was completed in
1977.
The most important contribution to the development of Unix came from researchers
at UC Berkeley. One of the most popular branches of Unix, BSD, now represented by
FreeBSD, NetBSD, OpenBSD, and BSDi, was created there; in fact, the acronym BSD
stands for Berkeley Software Distribution. Unix-related research began there in 1974;
Ken Thompson's lectures at Berkeley between 1975 and 1976 played a role. The first
version of BSD was created in 1977 by Bill Joy.
In 1984, anti-trust restrictions were lifted from AT&T after one of its divisions was
split up; AT&T management began the rapid commercialization of Unix, and the free
distribution of Unix source code was stopped. The following years were marked by
confrontations and exhausting litigation between Unix developers, in particular between
AT&T and BSDi, which tried to continue development on the basis of BSD. The
ambiguities over the legal status of BSD stalled the development of the Unix community
as a whole. Beginning in 1987, work was done in Berkeley to remove AT&T's proprietary
code from the system. The legal disputes were not resolved until 1993, when AT&T sold
its Unix division (Unix Software Labs, USL) to Novell; the latter's lawyers identified
three of the 18,000 (!) files in dispute and reached a settlement with UC Berkeley that
resolved the dispute.
While the Unix developers were busy squabbling, the market was flooded with cheap
Intel-based computers and Microsoft operating systems. The Intel 80386 processor,
introduced in 1986, was suitable for Unix; there were also attempts to port BSD to the
i386 platform, but (not least because of legal problems) nothing was heard of these
developments until early 1992.
Another interesting line of events can be traced back to 1984, when Richard Stallman
founded the Free Software Foundation and issued a corresponding ideological manifesto.
The nascent social movement set itself the goal of creating a free operating system to
begin with. Reportedly, it was Stallman who in 1987 convinced researchers at Berkeley
to purge BSD of code owned by AT&T. Stallman's supporters managed to create a
substantial number of free software tools, but without a completely free OS kernel, the
goal was still far off. This did not change until the early 1990s. In 1991, Linus Torvalds,
a Finnish student, began work on a Unix-like operating system kernel for the i386
platform, without using code from other operating systems.
According to Torvalds himself, his creation was first conceived as a terminal emulator for remote access to a university computer: the corresponding Minix program did not satisfy him. Wanting at the same time to understand how the i386 works, Torvalds decided to write his terminal emulator as a program independent of any operating system. A terminal emulator handles two opposing data streams, and to process them Torvalds wrote a CPU time scheduler, which does essentially the same thing as the schedulers in the kernels of multitasking operating systems. Later he needed to transfer files, so the terminal emulator acquired a disk driver; eventually the author was surprised to find that he was writing an operating system [4].
Torvalds published his interim results openly on the Internet, which allowed first
dozens and then hundreds of volunteers to join the development.
The new operating system was named Linux after its creator. It is noteworthy that this
name was given to the system by one of the project's third-party participants. Torvalds
himself planned to name the system "Freax". The first publicly available code (version
0.01) appeared in 1991, the first official version (1.0) - in 1994, the second - in 1996. As
Linus Torvalds himself points out, the court war between AT&T and the University of
Berkeley, which prevented the distribution of BSD on i386, played an important role in
Linux's meteoric rise. Linux got a big head start, eventually leaving BSD in second place:
nowadays BSD systems are less common, although they are still actively used. Torvalds'
kernel solved a major problem for the Richard Stallman-led social movement: a
completely free operating system was finally available. Moreover, Torvalds decided to
use Stallman's GNU GPL license for the kernel, so Stallman and his associates only had
to declare that they had achieved their goal.
The current Linux kernel source code includes code written by tens of thousands of
people. One of the consequences of this is that it is fundamentally impossible to "buy"
Linux: the kernel, as a copyrighted work, has too many copyright holders to be able to
talk seriously about making any kind of agreement with all of them. The only license
under which the Linux kernel can be used is the GNU GPL v.2 license, originally (at
Stallman's suggestion) adopted for the kernel source code; one of the features of this
license is that every programmer who makes a copyright contribution to the kernel accepts
the terms of the GNU GPL by the very fact of such a contribution, that is, he agrees to
make the results of his work available to all comers under its terms.
Nowadays, the trademark "Unix" is not used to name specific operating systems.
Instead, it refers to Unix-like operating systems, which form a whole family. The most
popular are Linux, represented by several hundred variants of distributions from various
vendors, and (with some margin) FreeBSD. Both systems are freely distributed. In
addition, we should mention commercial systems of the Unix family, among which the
most famous are SunOS/Solaris and AIX.
After nearly half a century of history, Unix - no longer as a specific operating system,
but as a general approach to building them - does not look obsolete at all, although it has
undergone virtually no revolutionary changes since the mid-1970s. Even the creation of
the X Window graphical add-on did not significantly change the fundamentals of Unix.
Note that the notorious Android is nothing more than Linux with its own graphical shell (custom-made for Google); something similar holds for Apple computers: Mac OS X is a descendant of BSD.
If you are in a computer lab, you will probably receive brief instructions on how to
log in from your teacher or the computer lab system administrator, along with your login
name (login) and password (password). So, enter your login name and password. If you
make a mistake, the system will display a Login incorrect message, which can
mean either a typo in the login name or an incorrect password. Note that the case of letters
is important in both cases, so the reason why the system does not accept the password
could be, for example, an accidentally pressed CapsLock key.
You will need a command line interpreter to work with the system. When using
remote terminal access (for example, using the putty program), the command line is
the only means of working with the system that is available to you. An invitation will
appear as soon as you enter the correct name and password. If you are working in a Unix
terminal class and logging in using a text console, you will also be prompted immediately
after entering a valid username and password, but in this case you have the option of
running one of the possible graphical window interfaces. This is more convenient if only
because you can open several windows at the same time. To start the X Window graphical
shell, you need to give the command startx 42. It is also possible to log in to the system
using the GUI at once; this option is possible both when working with a local machine
and when using remote access. Once you have a working graphical shell, you should run
one or more instances of xterm or some equivalent; they look like graphical windows
in which the command prompt runs.
Your first action on the system, unless it is your personal computer, should be to
change your password. Depending on the system configuration, this may require the
passwd command or (in rare cases) some other command; your system administrator
will probably tell you. Type this command (without parameters). The system will ask you
first for your old password, then (twice) for your new password. Note that nothing is
displayed on the screen when you enter the password. The password you come up with
must be at least eight characters long and must contain upper and lower case Latin letters,
numbers and punctuation marks. The password should not be based on a natural language
word or your login name. However, you should come up with a password that you can
easily remember. The easiest way is to take some memorable phrase containing
punctuation marks and numerals and build a password based on it (numerals are
transferred by numbers, the first letters are taken from other words, and the letters
corresponding to nouns are capitalized, the rest are lowercase). For example, the proverb
"One with a plough, seven with a spoon" can be used to "make" the password "1sS,7sL." (the letters here come from the Russian wording of the proverb). One last thing: do not share your password with anyone and never let anyone work in the system under your name. Phrases like "I don't care", "I trust my friends" or "I don't have anything secret there anyway" are amateurishness and thoughtlessness in the worst sense of the word, and you will realize it as you gain experience.
42 Some systems may require a different command; contact your system administrator for information.
lizzie@host:~$ pwd
/home/lizzie
You can find out which files are in the current directory by using the ls command:
lizzie@host:~$ ls
work tmp
Unix file names can contain any number of dots in any position, i.e., for example,
a.b..c...d....e is a perfectly valid file name. Names starting with a dot
correspond to "invisible" files; the ls command does not show them unless
specifically requested. To see all files, including invisible ones, add the -a parameter:
lizzie@host:~$ ls -a
. .. .bash_history work tmp
Some of the names shown may correspond to subdirectories of the current directory,
others may have special meanings. To make it easier to distinguish files by type, you can use the -F switch:
lizzie@host:~$ ls -aF
./ ../ .bash_history work/ tmp/
Now we see that all names except .bash_history correspond to directories. Note that "." is a reference to the current directory itself, and ".." is a reference to the directory containing the current directory (in our example, /home).
You can move to a different directory with the cd command:
lizzie@host:~$ pwd
/home/lizzie
lizzie@host:~$ cd tmp
lizzie@host:~/tmp$ pwd
/home/lizzie/tmp
lizzie@host:~/tmp$ cd ..
lizzie@host:~$ pwd
/home/lizzie
lizzie@host:~$ cd /usr/include
lizzie@host:/usr/include$ pwd
/usr/include
lizzie@host:/usr/include$ cd /
lizzie@host:/$ pwd
/
lizzie@host:/$ cd
lizzie@host:~$ pwd
/home/lizzie
The last example shows that the cd command without specifying a directory makes the
user's home directory current, as it was immediately after logging in.
Table 1.1. Commands for working with files
cp      file copying
mv      renaming or moving a file
rm      file deletion
mkdir   directory creation
rmdir   directory deletion
touch   creating a file or setting a new modification time
less    viewing the contents of a file page by page
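A short session exercising these commands (the names drafts and note.txt are arbitrary examples):

```shell
mkdir drafts                            # create a directory
touch drafts/note.txt                   # create an empty file
cp drafts/note.txt drafts/copy.txt      # copy the file
mv drafts/copy.txt drafts/renamed.txt   # rename it
rm drafts/note.txt drafts/renamed.txt   # delete both files
rmdir drafts                            # remove the now-empty directory
```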
work/progs/../../work/../../vasya/photos/../photos/mars.jpg
or
photos/../photos/../photos/../photos/mars.jpg
use the minus sign, but never start a file name with it, because almost all commands that work with files treat words starting with a minus as option switches (and then it becomes awkward to do anything with such a file; in fact, you can do anything with it, but you need to know and remember how). All the other "tricky" characters in file names can also cause unexpected problems; all of these problems are actually quite easy to solve, but avoiding a problem is always easier than solving it.
Later we will consider a number of constructs that put several commands on one line; the semicolon is interesting only because it is the simplest of them. It is important to realize that the ";" symbol is not part of either the first command or the second.
Each command consists of words, the first of which the interpreter considers to be the
name of the command, the rest of which are its arguments; in our example, ls, pwd,
and echo are the command names, -a is the argument we used to ask ls
to show "invisible" files, and the word abrakadabra is the argument of
echo, which it simply printed (this command is designed to print its arguments). In
the previous paragraph, we used arguments to specify file and directory names;
commands and programs can give their command-line arguments a wide variety of meanings; from the interpreter's point of view, however, all of these arguments are nothing more than strings.
Since we are talking about words, we can guess that the space character is a bit special
from the interpreter's point of view - it is used to separate words from each other; the tab
character can also theoretically play the same role, but when working with modern
command interpreters in dialog mode, the tab is usually used for something else - when
the interpreter receives this character, it tries to complete the word we enter for us (we
will discuss this possibility in detail in §1.2.8). The number of spaces between two words
can be arbitrary, it doesn't affect anything; you can see this with the example of two echo
commands:
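A minimal pair of echo commands illustrating this (the words are arbitrary):

```shell
echo one two three
echo one     two     three
# both commands print: one two three
```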
As we can see, inserting extra spaces doesn't change anything. But what if we need a parameter that contains spaces - for example, if you encounter a file that has a space in its name? Even if you strictly follow the advice we gave at the end of the previous paragraph and never use spaces in file names, other computer users (especially those who are used to Windows and don't know that there is anything else in the world) are often not so careful, so a file with spaces in its name may be given to you on a flash drive, sent to you by mail, downloaded from an Internet site, and so on.
The command line interpreter provides three basic ways to deprive a character of its special role: "escape" it with a backslash "\", enclose it in apostrophes, or enclose it in double quotes. For example, if a friend with whom you recently traveled to Paris gave you a flash drive with photos and you discovered on it a directory named Photos from Paris, it would be best to rename the directory right away, replacing the spaces with underscores; you can do this with one of the following commands:
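Any one of the three quoting mechanisms does the job; a sketch (the first line merely sets up the directory for demonstration):

```shell
mkdir 'Photos from Paris'                    # set-up for the demonstration
mv Photos\ from\ Paris Photos_from_Paris     # escaping each space; equivalently:
# mv 'Photos from Paris' Photos_from_Paris   # apostrophes
# mv "Photos from Paris" Photos_from_Paris   # double quotes
```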
Similarly, if you want to use the echo command to print words separated by more than
one space, you can arrange that with quotes too:
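For instance:

```shell
echo "one     two"
# prints: one     two    (the inner spaces survive inside the quotes)
```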
Escaping can also achieve this, but it is inconvenient: you have to put a backslash before each space. It should be noted that such methods deprive not only the space, but also other
"tricky" characters of special meaning. In particular, the semicolon usually separates one
command from another, but if you need it by itself, as an ordinary character, any of the
three methods will do:
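For example, to make echo print a literal semicolon:

```shell
echo one\; two
echo 'one;' two
echo "one;" two
# each of the three prints: one; two
```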
Note that the escape, apostrophe, and double-quote characters themselves disappear once
they have accomplished their mission, so that the programs and commands we run do not
see them (unless we make them stand for themselves).
Within apostrophes, only the apostrophe itself has a special meaning - it is considered the closing character, and it cannot be deprived of this role; all other characters - quotes, backslash, anything at all - denote themselves and have no special meaning:
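For example:

```shell
echo 'quotes " dollar $HOME backslash \ all literal'
# prints: quotes " dollar $HOME backslash \ all literal
```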
If you need an apostrophe character as itself, there are two options: either escape it or use
double quotes:
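For example:

```shell
echo \'
echo "'"
# each command prints a single apostrophe: '
```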
Double quotation marks differ from apostrophes in that they deprive not all special
characters of special meaning. In particular, the escape character works inside double
quotes:
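For example:

```shell
echo "a \"quoted\" word"
# prints: a "quoted" word
```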
Looking ahead, we note that inside double quotes the characters "!", "`" (backquote) and "$" also retain their special meaning.
It is very important to realize that apostrophes and quotation marks only change the
meaning of the characters inside, but do not separate those characters into separate words;
you can even use different kinds of quotation marks in the same word:
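For example (the letters are arbitrary):

```shell
echo ab'cd'"ef"
# prints: abcdef   (all three pieces form a single word)
```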
But with their help you can create an empty word by putting two double quotes or two
apostrophes in a row ("" or '').
43 The reader accustomed to the traditional terminology of the Windows world may be surprised by the use of the term "suffix" instead of "extension". The point here is that in MS-DOS and early Windows the "extension" of a file was intrinsically linked to its type and was treated separately from the name, whereas in Unix systems the ending of a file name never played such a role and has always been just part of the name.
that is, it passes the word to the command as if it were not a pattern at all. This feature
should be used with caution; most command interpreters other than the Bourne Shell do
not have this feature.
The second good feature of the interpreter, which makes life much easier for the user,
is that the interpreter remembers the history of the commands you have entered, and it
saves this history in a file when you finish a session, so you can use your commands the
next day or a week later. If the command you need was given recently, you can return it
to the screen by pressing the up-arrow key; if you accidentally skip over the desired
command during this "upward movement", you can go back by pressing the down-arrow,
which is quite natural. You can edit any command from the saved history using the
familiar left and right arrows, Home, End and Backspace keys in their usual role.
The entire saved history can be viewed using the history command; in most cases
it is more convenient to combine it with the less pager, i.e. give the command
history | less. Here you will see that each of the commands memorized by the
interpreter has a number; you can repeat any of the old commands, knowing its number,
by using an exclamation mark; for example, !137 will execute the command stored in
history as number 137. Note that "!!" denotes the last command entered, and
"!!:0", "!!:1", etc. denote individual words of it; an individual word can be extracted not
only from the last command: for example, !137:2 denotes the second word
of command number 137; "!abc" denotes the last command starting with the string
abc, and individual words can be extracted here as well.
Finally, you can search for a substring in the history. To do this, press Ctrl-R (from
the word reverse) and start typing your substring. As you type letters, the interpreter will
find more and more old commands containing the typed substring. If you press Ctrl-R
again, you will get the next (i.e., even older) command containing the same substring.
If you get confused while editing a command, searching the history, etc., you can reset
the input string at any time by pressing Ctrl-C; this is much faster than, for example,
deleting all the entered characters by pressing Backspace.
Once you start actively using the tools listed here, you will soon find that the effort
you spend typing commands has dropped at least twentyfold. Do not neglect
these features! Mastering them takes about five minutes, and the time saved will
amount to hundreds of hours.
avst@host:~$ ps
  PID TTY          TIME CMD
 2199 pts/5    00:00:00 bash
 2241 pts/5    00:00:00 ps
As you can see, the default command only gives you a list of processes running in that
particular session. Unfortunately, the flags of the ps command vary greatly from version
to version (in particular, between *BSD and Linux). You should consult the documentation for
that particular OS for detailed information; here we will limit ourselves to noting that the
"ps ax" command will list all existing processes, and the "ps axu" command will
§ 1.2. How to use a computer properly 127
additionally show information about process owners . 44
In some cases, the top program, which works interactively, may be useful. It
displays a list of the most active processes on the screen, updating it once a second. To
exit the top program, enter the letter q.
You can remove a process by means of a so-called signal; note that this is exactly what
happens when you press the Ctrl-C and Ctrl-\ combinations mentioned above: the
terminal driver sends a signal to the process. Each signal has its own number, name
and some predefined role; for now we cannot say much more about signals, because the
concept of "sending a signal to a process" cannot be explained properly without going
into details we do not yet need. It is enough to know, firstly, that a process can be sent a signal with a
given number (or name); secondly, that a process can decide how to react to most signals,
including not reacting to them at all; and thirdly, that there are such signals over which
processes have no control; this allows you to kill a process for sure.
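To make this tangible, here is a small sketch (the message texts are our own): a shell process may intercept SIGINT with the trap built-in, while SIGKILL cannot be intercepted at all.

```shell
#!/bin/sh
# Intercept SIGINT: instead of dying, the process prints a message
trap 'echo "caught SIGINT, ignoring it"' INT
kill -INT $$        # send SIGINT to ourselves ($$ is our own PID)
echo "still alive"
# trap has no effect on SIGKILL: that signal cannot be intercepted
```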
Ctrl-C and Ctrl-\ send the SIGINT and SIGQUIT signals to the active
process, respectively (for clarity, note that they are numbered 2 and 3, but you don't need
to remember that). Usually both of these signals will cause the process to terminate
immediately; if not, it is likely that your process has intercepted them and you will have
to use the non-interceptable SIGKILL signal (#9) to remove it. The kill command
allows you to send an arbitrary signal to a process, but before you use it, you need to know
the number of the process you want to kill. To do this, you usually open another terminal
window and give the command ps ax in it; the list that appears will show both process
numbers and their command lines, which usually allows you to find out the number of the
process you want to kill. For example, if you have written a prog program, run it, and
it hangs so badly that no key combinations help, you will likely find a line like this near the
end of the ps ax output:

 2763 pts/2    R      0:05 ./prog

The line is identified by the program name (in this case ./prog) and by the process
number at the beginning of the line (here it is 2763). Knowing this number, we can use
the kill command, but remember that by default it sends a SIGTERM signal (#15) to
the specified process, which can also be intercepted by the process. You can specify a
different signal either by number or by name (TERM, KILL, INT, etc.). The following
two commands are equivalent; both send the SIGKILL signal to process 2763:
kill -9 2763
kill -KILL 2763
Very rarely a process does not disappear even after this. This can happen in only two cases.
First, it can be a so-called zombie process, which has actually already terminated but remains
in the system because its immediate ancestor (the one who started it) is in no hurry to ask the
operating system for information about the circumstances of its descendant's termination.
Second, the process may be stuck in the so-called "uninterruptible sleep" state, usually while
waiting for some input-output operation that cannot complete.
[44] This is true for Linux and FreeBSD; in other operating systems, such as SunOS/Solaris, the ps
command behaves differently.
You can find out which of the two situations is the case from the output of the same ps ax
command. A zombie process is marked with the letter Z in the STAT column and the word
"defunct" in the command line column, like this:

 2763 pts/2    Z      0:00 [prog] <defunct>

A process in the "uninterruptible sleep" state can be recognized by the letter D in the STAT
field:

 2764 pts/2    D      0:00 ./prog

If any process stays in this state for even a few seconds, it is a reason to check whether
everything is okay with your computer.
avst@host:~$ jobs
[1]+ Running updatedb &
When the task is completed, the command interpreter will inform us about it. In case of
successful completion, the message will look like this:

[1]+  Done                    updatedb &
If, on exit, the program informs the operating system that it does not consider its
execution successful (this rarely happens with updatedb, but much more often with
other programs), the message will have a different form:
[1]+ Exit 1 updatedb &
Finally, if the background process is removed by a signal, the message will be something
like this (for the SIGTERM signal):
[1]+ Terminated updatedb &
When sending signals to processes that are background tasks of a particular shell instance,
you can refer to them by background task number instead of process number by prefixing
the number with "%". Thus, the command "kill %2" sends a SIGTERM signal to the second
background task. The "%" symbol without a number denotes the last of the background
tasks.
If a task is already running in the foreground and we do not want to wait for it to
finish, we can turn it into a background task. To do this, press Ctrl-Z, which
suspends the current task. Then the bg (background) command resumes the suspended
task, but this time in the background.
It is also possible to make any of the background and suspended tasks the current one (i.e.
the one the command interpreter is waiting for). This is done with the fg (foreground)
command.
Remember: the Ctrl-Z combination does not kill the active task, it only temporarily
suspends its execution. This is especially important for those used to working with
"console" programs in Windows, where this combination has a completely different
meaning. If you are used to that, it is time to break the habit.
Note that background execution is especially useful when running a windowed
application, such as a web browser, a text editor running in a separate window (e.g.
geany), or just another instance of xterm. When we run such a program, we usually
don't want our command interpreter to wait for it to finish without accepting new
commands from us.
1.2.11. Redirecting I/O streams
In Unix systems, running programs communicate with the outside world through so-
called I/O streams; each such stream allows a sequence of bytes to be received externally
(input) or, conversely, transmitted externally (output), and these bytes may come from
the keyboard, from a file, from a communication channel with another program, from a
hardware device, or from a communication partner through a computer network;
similarly, they may be output to the screen, to a file on disk, to a communication channel,
transmitted to a hardware device, or sent out through a computer network. A program can
handle several I/O streams simultaneously, distinguishing them by numbers; these
numbers are called descriptors.
Virtually all Unix programs follow the convention that an I/O stream with descriptor
0 is a standard input stream, a stream with descriptor 1 is a standard output stream, and a
stream with descriptor 2 is a stream for outputting error messages. When accepting and
passing data through standard streams, most programs make no assumptions about what
a particular stream is actually associated with. This allows the same programs to be used
both for terminal operations and for reading from and/or writing to a file. Command
interpreters, including the classic Bourne Shell, provide facilities for controlling the I/O
of running programs. The symbols <, >, >>, >& and | are used for this purpose
(see Table 1.2).
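A tiny illustration of the three standard descriptors (the strings, and the "hello" fed to standard input, are made up):

```shell
# Run a small script with "hello" piped into its standard input:
echo hello | sh -c '
  echo "normal output"          # descriptor 1: standard output
  echo "error message" >&2      # descriptor 2: error message stream
  read line                     # descriptor 0: standard input
  echo "read: $line"
'
```

Redirecting descriptors 1 and 2 to different files would separate the normal output from the error message, while the script itself stays unchanged.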
Unix usually has a less program that allows you to page through the contents of
files using the up arrow, down arrow, PgUp, PgDn, etc. keys for scrolling. The same
program allows you to page through the text submitted to it for standard input. Using the
less program is useful if the information given by any of the programs you run
does not fit on the screen. For example, the command
ls -lR | less
will allow you to view a list of all files in the current directory and all its subdirectories.
Note that many programs print all error messages and warnings to the diagnostic
stream. To page through the messages of such a program (for example, the C
compiler gcc), you must issue a command that merges the diagnostic stream with
the standard output stream and pipes the merged result to the input of less:
gcc -Wall -g myprog.c -o myprog 2>&1 | less
If for some reason you are not interested in the information flow produced by some
program, you can redirect it to the /dev/null pseudo-device: anything directed
there simply disappears. For example, the following command will generate a list of all
files
Table 1.2. Examples of I/O redirections
cmd1 > file1            run the cmd1 program, directing its output to the file
                        file1; if the file exists, it will be overwritten from
                        scratch; if it does not exist, it will be created
cmd1 >> file1           run the cmd1 program, appending its output to the end
                        of the file1 file; if the file does not exist, it will
                        be created
cmd2 < file2            run the cmd2 program, giving it the contents of file2
                        as standard input; if the file does not exist, an error
                        will occur
cmd3 > file1 < file2    run the cmd3 program, redirecting both input and
                        output
cmd1 | cmd2             run cmd1 and cmd2 simultaneously, feeding data from
                        the standard output of the first to the standard input
                        of the second (the so-called pipeline)
cmd4 2> errfile         send the stream of error messages to the errfile file
cmd5 2>&1 | cmd6        merge the standard output and diagnostic streams of
                        cmd5 and direct them to the standard input of cmd6
on your system, except for those directories it does not have permission to read; all error
messages will be ignored:
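The command itself did not survive in this copy; a plausible reconstruction (the find utility and the file name list.txt are our assumptions) is:

```shell
# List every file on the system into list.txt,
# discarding complaints about unreadable directories:
find / > list.txt 2> /dev/null
```

Here the standard output (the list of file names) goes into list.txt, while the error messages vanish into /dev/null.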
Device files, which include /dev/null, are a separate and rather serious topic, the detailed
review of which we postpone until the second volume. For now we need only one of these
files in daily life, /dev/null, and only to send everything unnecessary there. Just in case,
keep in mind that all device files reside in the /dev directory, whose name is derived from
the English word devices.
format; second, the editor must not automatically format paragraphs of text (MS Word,
for example, is unsuitable for this reason); and third, the editor must use a monospaced
font, i.e., a font in which all characters have the same width. The easiest way to find out
whether an editor meets this last requirement is to type a string of ten Latin letters m and a
string of ten Latin letters i below it. In an editor using a monospaced font, the resulting
text will look like this:
mmmmmmmmmm
iiiiiiiiii
whereas in an editor that uses a proportional font (and is therefore unsuitable for
programming), the second line will come out noticeably narrower than the first.
For editors that work in a terminal window, this requirement is met automatically; as
for text editors that open their own graphical windows, we would not recommend using
them anyway.
vim editor
The vim (Vi Improved) editor is a clone of vi, the classic text editor for Unix-like
operating systems.
[Fig. 1.6. Moving the cursor in vim using the alphabetic keys]
Working in editors of this family may seem a bit inconvenient to a novice user, as they
are fundamentally different from the menu-driven on-screen text editors that most users are
accustomed to. At the same time, many programmers working under Unix systems prefer
these editors, because for a person who knows their basic functions, this variant of the
interface is the most convenient for working on program text. Moreover, as the experience
of the author of these lines shows, all this applies not only to programs: the text of the book
you are reading was typed in the vim editor, as were the texts of all the author's other books.
If you find vim too difficult to master, there are other text editors available, two of which
are described below. For readers who choose not to learn vim, here is the sequence of
keystrokes to exit vim: if you accidentally start vim, in almost any situation you can press
Escape, then type :qa!, which will exit the editor without saving your changes.
To start the vim editor, just give the command vim myfile.c. If the myfile.c file
does not exist, it will be created the first time you save changes. The first thing you should
realize when working with vim is that it has two modes of operation: text input mode and
command mode. As soon as you start, you are in command mode. In this mode, any keystrokes
will be taken as commands to the editor, so if you try to enter text, you won't like the result.
You can move through the text in command mode using the arrow keys, but more
experienced vim users prefer to use j, k, h, and l to move down, up, left, and right,
respectively (see Figure 1.6). To make it easier to remember these letters, note that the four
keys you need are located next to each other on the keyboard, with the h key on the left and
the l key on the right, which are used to move left and right; the letter j looks a bit like
a down arrow, and is used to move down; the only thing left is k, which is used to move up.
Table 1.4. File commands of the vim editor
:w          save the edited file
:w <name>   write the file under a new name
:w!         save, ignoring (if possible) the read-only flag
:wq         save the file and exit
:q          exit the editor (if the file has not been modified since the last save)
:q!         exit without saving, discarding the changes made
:r <name>   read the contents of the <name> file and paste it into the text being edited
:e <name>   start editing another file
:ls         show the list of editable files (active buffers)
:b <N>      move to buffer number N
The reason for this choice is that in Unix the arrow keys generate a sequence of bytes beginning
with the Esc code (27); any such sequence can be perceived by the editor as a request to switch
to command mode followed by several character commands, and the only way to distinguish the
Esc sequence generated by an arrow key from the same sequence typed by the user is to measure
the time between the arrival of the Esc code and the byte following it. When working over a slow
link (for example, when editing a file remotely over a slow or unstable network), this method
can be frustrating.
A few of the most commonly used commands are listed in Table 1.3. The i, a, o, and O
commands put you in text entry mode. Everything you enter from the keyboard is now treated
as text to be inserted. Naturally, it is possible to use the Backspace key in its usual role. In most
cases it is also possible to use the arrow keys, but in some versions of vim, with some
peculiarities of customization, as well as when working through a slow connection channel, the
editor may not respond correctly to arrows. In this case, you need to exit input mode to navigate
through the text. Exit the input mode and return to the command mode by pressing the Escape
key.
To search by text, you can use (in command mode) the sequence /<text>, ending it by
pressing Enter. Thus, /myfun will position the cursor at the nearest occurrence of the string
myfun in your text. You can repeat the search by typing / and pressing Enter immediately.
You can move to a line with a given number (for example, to a line for which the compiler
has generated an error message) by typing a colon, the line number and pressing Enter.
Commands to save, load files, exit, etc. are also available via colon. (see Table 1.4).
When working with several files simultaneously, the Ctrl-^ combination allows you
to quickly switch between the two most recently edited files. By default, the editor requires the
current file to be saved first, which is not always convenient; this can be overridden by setting the
hidden option with :set hidden. By the way, this and other commands can be written
to the .vimrc file in your home directory, so that they are always executed when you start
the editor.
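For example, a minimal ~/.vimrc might look like this (the last two settings are our own illustrative additions, not something the text above prescribes):

```vim
set hidden      " allow switching buffers without saving the current file
syntax on       " enable syntax highlighting
set number      " show line numbers
```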
The commands for selecting blocks and working with blocks deserve special mention. To
start selection of a fragment consisting exclusively of whole lines, use the V command; to
select a fragment consisting of an arbitrary number of characters, use the v command. The
selection boundary is set by arrows or by the corresponding h, j, k, and l commands.
The selected block can be deleted with the d command and copied with the y command.
In both cases, the selection is deselected and the text fragment that was under the selection is
placed in a special buffer. The contents of the buffer can be inserted into the text with the
commands p (after the cursor) and P (before the cursor). Text can also be placed in the buffer
without selection. Thus, all commands that delete certain text fragments (x, dd, dw, d$, etc.)
place the deleted text in the buffer. The yy, yw, y$ commands put the current line, the current
word, and the characters from the cursor to the end of the line into the buffer, respectively.
If you decide to seriously learn vim, we strongly recommend that you go through the
vimtutor tutorial program, which usually appears on your system along with vim itself.
Nano Editor
The Nano editor has become extremely popular in the world of "pop" Linux in the last
ten years: some popular distributions install this very editor as the default one. The history of its
appearance is quite interesting. Around the turn of the century, the Pine e-mail client,
which worked in a terminal window, was quite popular in Unix systems. The built-in text editor
of this client, originally intended for editing e-mails, was released at some point as a separate
program called "Pico" from the words Pine Composer; as for the Nano editor, it is a clone of
Pico, implemented from scratch by the GNU project members because of doubts about the
license purity of Pico. This editor is not intended for programming, but it does have a number
of purely "programmer" functions, such as syntax highlighting, automatic indentation, etc. To start
this editor, as usual, you use its name as a command and the name of the file to be edited as an
argument:
nano myfile.pas
There are no tricky "modes" in this editor, you can immediately type text using the arrow keys,
PgUp/PgDn, Home, End, Backspace and Del in their usual role. All these keys have "single-
byte" alternatives in case of working through a slow communication line, but, unlike the same
vim commands, the location of the corresponding letters on the keyboard is not quite convenient
- for example, to move the cursor to the right, left, up and down you can use the combinations
Ctrl-F, Ctrl-B, Ctrl-P and Ctrl-N - from the words forward, backward, previous,
next; perhaps the arrows are much more convenient.
In the lower part of the screen there are two lines of hints; it is worth remembering
at once that the symbol "^" there stands for Ctrl, i.e., for example, "^C" corresponds to pressing
Ctrl-C. By the way, this combination in Nano is used in a very useful, though unexpected
(for such a key) role: it shows the row and column number corresponding to the
current cursor position (the mnemonic word here is current [position]). It is worth paying
attention to the combinations Ctrl-O (write Out) - to save the edited file and Ctrl-X -
to exit the editor (if the text contains unsaved changes, the editor will offer to save it).
You may not immediately notice that the editor asks questions in many cases, and for this
purpose it uses the third line from the bottom, just above the prompts. In particular, when you
try to save a file, it always asks if you want to save it under this name; usually you just press
Enter (or enter a different name), but you have to notice that the editor wants something from
you first. Try saving your file right away and pay attention to the bottom of the screen, then
you probably won't miss the editor's question next time.
The Ctrl-_ combination (typed as Ctrl-Shift-minus on most keyboards) is extremely
useful in programming: the editor will prompt you for the row number and column number
where you want to go; the two numbers are entered separated by a comma, or you can enter
only the row number and press Enter.
Another "secret knowledge" worth arming yourself with is the way to copy a text fragment
here. To do this, you first delete a line or several lines in a row from the text using Ctrl-K
(strangely enough, from the word cut - it's just that the letter C was already occupied), and then
press Ctrl-U (uncut) to paste the newly deleted fragment back into the text. Naturally,
between
Table 1.5. The most common commands of the joe editor
Ctrl-K D    save file
Ctrl-K X    save and exit
Ctrl-C      exit without saving (bail out)
Ctrl-Y      delete the current line
Ctrl-K B    mark the beginning of a block
Ctrl-K K    mark the end of a block
Ctrl-K C    copy the selected block to a new location
Ctrl-K M    move the selected block to a new location
Ctrl-K Y    deselect
Ctrl-K L    go to a line by its number
Ctrl-_      undo (typed as Ctrl-Shift-minus)
Ctrl-^      redo the canceled action
Ctrl-K F    search for text
Ctrl-L      repeat the search
with these actions you can move the cursor where you want it to go. Ctrl-U can be pressed
several times, which allows, firstly, to "multiply" text fragments, and, secondly, if you want to
copy a fragment rather than move it, you can first (immediately after deleting it) paste it in its
place, and then go to the desired location and paste the same fragment again.
You can learn about Nano's additional features from the built-in description, which is
invoked by Ctrl-G or the traditional F1 key.
Editor Joe
Another popular Unix text editor is called Joe, from Jonathan's Own Editor. To run it, just
give the command joe myfile.c. If the file myfile.c does not exist, it will be created
the first time you save changes. Unlike the vim editor, the joe interface is more similar to the
text editors most users are used to. The arrow keys, Enter, Backspace, and others work in their
usual roles, and the Delete key is usually available as well. Commands to the editor are given
using key combinations, most of which begin with Ctrl-K. In particular, Ctrl-K h will
show a memo of the most used editor commands at the top of the screen (see Table 1.5).
File access rights are often written as an octal number [45], usually three-digit, sometimes
four-digit. In this case the lowest (last) digit corresponds to the rights of all other users, the
middle digit to the rights of the group, and the highest (usually the first) digit denotes the
rights of the owner. Execution rights are set by the lowest bit of each digit (value 1), write
rights by the next bit (value 2), and read rights by the highest bit (value 4); these values are
summed, i.e., for example, read and write rights are denoted by 6 (4 + 2) and read and execute
rights by 5 (4 + 1). The permissions for the /bin/cat file from our example can be encoded
with the octal number 0755. [46]
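This arithmetic can be checked directly in the shell (the file name demo.txt is arbitrary, and stat -c is the GNU coreutils form of the command):

```shell
touch demo.txt
chmod 640 demo.txt     # 6 = 4+2 (read+write) for owner, 4 (read) for group, 0 for others
ls -l demo.txt         # the permission string starts with -rw-r-----
stat -c %a demo.txt    # prints 640 on systems with GNU stat
rm demo.txt
```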
For directories, the interpretation of permissions bits is slightly different. Read permissions
allow you to view the contents of a directory. Write permissions allow you to modify the
directory, i.e. create and destroy files in it; in particular, you can delete even someone else's
file, one to which you have no access rights at all: it is enough to have write permission for
the directory itself. As for
the "execute" bit, for a directory this bit means the ability to use the contents of the directory in
any way, including, for example, opening files in the directory. Thus, if a directory has read
permissions but no execute permissions, we can browse it, but we can't use what we see; this
is a rather pointless situation, it's not usually done this way. On the contrary, if we have execute
rights but no read rights, we can only open a file from that directory if we know the exact name
of the file. There is no way for us to know the name, because we have no way to browse the
directory. This option is sometimes used by system administrators; however, in most cases,
read and execute permissions for a directory are set and removed together.
[45] Details about number systems will be given in §1.3.2; in principle, it is not necessary to understand the
octal number system to work with file access rights: it is enough to remember which type of rights corresponds
to which value (4, 2, 1) and that the final designation of the access mode is their sum, which, of course, turns out
to be a digit from 0 to 7.
[46] Note that the number is written with a leading zero; by the rules of C this means that the number is
written in octal, and since professional Unix users are very fond of this language, they usually write octal and
hexadecimal numbers following the C conventions without saying so, i.e. assuming that they will be
understood.
The remaining three (higher) digits of the permission word are called SetUid Bit (04000),
SetGid Bit (02000) and Sticky Bit (01000). If an executable file has the SetUid Bit set,
the program will be executed with the rights of the file's owner (most often the root user),
regardless of which user ran it. The SetGid Bit works similarly, making the program run
under the file's group instead of the group of the user running it. For example, the SetUid Bit
is typically set for the passwd program. Sticky Bit on simple files is ignored by modern
systems. For directories, the SetGid Bit means that whichever user creates a file in that directory,
the "owner group" for that file will be set to the same group as the directory itself. Sticky Bit
means that even if a user has write permission to a given directory, they can only delete their own
(owned) files - this is used to create public storage locations like the /tmp directory. The SetUid
Bit on directories is ignored on most systems. We will return to the discussion of permissions in
Volume 2.
The chmod command is used to change file permissions. For example, the command

chmod 644 myfile.c

sets the myfile.c file to read-and-write permissions for the owner and read-only permissions
for everyone else.
Permissions can also be specified as a mnemonic string of the form
[ugoa][+-=][rwxsXtugo]. The letters u, g, o and a at the beginning stand for the owner
(user), the group, others, and all at once, respectively; "+" stands for adding new
permissions, "-" for removing old ones, and "=" for setting exactly the specified permissions
and removing all others. After the sign, the letters r, w, x mean, as you might guess, read,
write and execute rights, the letter s means the SetUid/SetGid bits (it makes sense for the
owner and the group), t means the Sticky Bit, and the letters u, g and o to the right of the
action sign mean the rights already set for the owner, group and others, respectively. The
letter X (capitalized) means
to set/unset the execution bit only for directories, and for those files for which at least someone
has execution rights. If the chmod command is used with the -R flag, it recursively changes
the permissions of all files in the given directory and all its subdirectories. For example, the
command chmod a+x myscript will make the myscript file executable; the command
chmod go-rwx * will remove all but the owner's permissions from all files in the current
directory. The following command can be very useful:
chmod -R u+rwX,go=rX ~
just in case you accidentally mess up the permissions on your home directory; this command
will probably restore everything to a satisfactory state. To explain, this command sets all files
in your home directory and all its subdirectories to read and write permissions for the owner;
for directories, as well as files for which execution is allowed to anyone, the owner is also
assigned execution rights. Read permissions are set for the group and other users, execution
permissions are set for executable files and directories, and all other permissions are removed.
man 2 write
will output exactly the document devoted to the write system call, since section #2 contains
reference documents on system calls. Let's list the other sections of the system reference book:
• 1 - Unix OS user commands (such commands as ls, rm, mv, etc. are described in this
section);
• 2 - Unix OS kernel system calls;
• 3 - C library functions (this section can be referred to, for example, for information about
the sprintf function);
• 4 - device file descriptions;
• 5 - descriptions of system configuration file formats;
• 6 - game programs;
• 7 - general concepts (for example, man 7 ip will give useful information about
programming using TCP/IP);
• 8 - Unix system administration commands (for example, in this section you will find a
description of the mount command for mounting file systems).
The directory may also contain other sections, not necessarily labeled with a number; for
example, when the Tcl language interpreter is installed in the system, its reference pages are
usually organized into a separate section with a name of its own.
#!/bin/sh
Of course, /bin/sh is not the only possible interpreter. For example, a Perl program may
start with the line
#!/usr/bin/perl
- if, of course, the Perl interpreter is installed in the system. To turn a plain text file into an
executable script, it is enough to form its first line from the characters " #! " and the path
to the interpreter, and set the x bit in the file access rights (see §1.2.13). In the simplest case,
after the header line, a script file contains the commands to be executed, one per line (empty
lines are ignored); we have already given examples of such scripts in §1.2.1 (see page 80), but
did not explain how to handle them. Let's try to fill this gap.
If we remember that the echo command prints its command line arguments, we can
write a script that prints a poem. Let's take any text editor and create a file humpty.sh
containing the following text:
#!/bin/sh
echo "Humpty Dumpty sat on a wall,"
echo "Humpty Dumpty had a great fall."
echo "All the king's horses and all the king's men"
echo "Couldn't put Humpty together again."
inconvenient, to say the least. Therefore, a slightly different convention from the usual
conventions for file names applies to file names containing programs to be run. Absolute and
relative names-any names containing at least one slash-work the same way as regular file
names, but short names-names without slashes-are not considered file names in the current
directory, but command names. The command interpreter either executes such commands
itself or, if there is no built-in command with such a name, searches for the executable file in
system directories; we will learn more about which directories are considered "system" in this
sense later.
This is why we can run the same text editor with just its short name, vim, even though
it is not in the current directory; but for the same reason we cannot use short names to run
programs in the current directory, so we have to artificially turn the short name into a relative name by adding the obligatory "./". This is how we'll have to run our own programs when
we start writing them. Of course, we could run our example differently, either by an absolute
name (something like /home/vasya/humpty.sh) or by some more complicated relative
name - in particular, if we are in the directory /home/vasya, it has a subdirectory called
work, and we put the script in that subdirectory, we could run it with the command
work/humpty.sh. One thing is important: the command name must contain at least one
slash, otherwise the system will try to find a command with that name in the system directories,
fail, and generate an error.
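Putting the pieces together, the whole life cycle of such a script, from creation to launch, can be sketched like this (the poem is shortened to two lines here):

```shell
# Recreate the humpty.sh script from the text (shortened to two lines)
cat > humpty.sh <<'EOF'
#!/bin/sh
echo "Humpty Dumpty sat on a wall,"
echo "Humpty Dumpty had a great fall."
EOF

chmod +x humpty.sh   # set the x bit so the file becomes runnable
./humpty.sh          # "./" turns the short name into a relative one
```

Without the "./" the interpreter would search only the system directories, fail to find humpty.sh there, and report an error.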
Let us note one more important point. In most systems, the length of the first line of a script is
limited, and in some systems the limits are quite severe - only 32 bytes. You can pass a command line
parameter to the interpreter (the interpreter itself, as a program), but, alas, not more than one; the
system will pass the name of the script file as the second argument. We don't need this now, but in
Volume 3, as we study a variety of interpreted languages, we will encounter some inconvenience
because of these limitations.
Of course, scripts are not limited to simple sequences of commands; the Bourne Shell
interpreter allows you to execute certain commands depending on the results of condition
checks, organize loops, use variables to store information, and so on. We will now look at some
of these features, but there is one difficulty here: not all readers have enough experience to
understand what is going on. If you haven't experienced programming (in any language at all),
Actually, the command line interpreter handles the ls command and some other commands by itself
without running any external programs, but just in case, all these commands are also available as a separate
program.
the rest of this paragraph may seem confusing and abstruse. There is nothing wrong with that,
just skip it and come back here later, after learning the basics of programming using Pascal as
an example - at least after you have finished chapter 2.2. Keep in mind that the Bourne Shell
scripting language is quite specific, because it belongs to the group of command-scripting
languages, in which programs longer than two or three hundred lines are not usually written
(and if they are written, it means in most cases that something went wrong in the management
of a particular project); you should certainly not start learning programming with this language.
You should continue reading this paragraph only if you already have experience of writing
working programs in any language, even small ones.
Like many other programming languages, Bourne Shell allows you to store information in so-called
variables - if you will, to associate some information with some name that can be accessed later to
use the previously stored information. Variables in the Bourne Shell language have names consisting
of Latin letters, numbers, an underscore and always starting with a letter. The value of a variable can
be any string of characters. To assign a value to a variable, you must give an assignment command,
for example:
I=10
MYFILE=/tmp/the_file_name
MYSTRING="Here are several words"
Note that there should not be spaces in the variable name and around the equal sign (assignment
character), otherwise the command will not be considered as an assignment, but as an ordinary
command in which the assignment character is one of the parameters. If there are spaces in the string
acting as a value, the string itself should be enclosed in quotes; if there are no spaces in the value, the
quotes can be omitted.
To refer to a variable, the $ sign is used: it indicates that the value of the variable should be substituted in place of its name. For example, the command
echo $I $MYFILE $MYSTRING
will print the string "10 /tmp/the_file_name Here are several words". You can
enclose variable names in curly braces to make a concatenated text from variable values; for example,
the command "echo ${I}abc" will print "10abc".
The $(( )) construct is used to perform arithmetic operations. For example, the command
"I=$(( $I + 7 ))" will increase the value of the variable I by seven. Inside the double brackets
you can omit the variable reference sign - the interpreter treats as a variable name any word that
cannot be interpreted otherwise in an arithmetic expression (as a number or as an operation symbol).
Spaces are also optional in most cases, so you can just write " I=$((I+7))", the effect will be the
same.
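A small self-checking sketch of these substitutions, using the same variables as above:

```shell
I=10
MYFILE=/tmp/the_file_name
MYSTRING="Here are several words"
echo $I $MYFILE "$MYSTRING"   # prints: 10 /tmp/the_file_name Here are several words
echo ${I}abc                  # braces separate the name from the text: 10abc
I=$((I+7))                    # arithmetic substitution
echo $I                       # prints: 17
```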
Special variables $0, $1, $2 and so on are used to access the command line arguments of the script itself, with $0 representing the name of the script as specified by the user at startup. The special variable $# expands to an integer: the number of arguments. For example, if you
create a script argdemo.sh with the following text:
#!/bin/sh
# argdemo.sh
Note the "empty" parameter in the last echo command. It is needed to prevent echo from processing
command-line options starting with a minus in the parameters after it. The command accepts a few
such parameters, but only before the beginning of the normal parameters that need to be printed; if
you don't put an empty parameter at the beginning, it is possible for something starting with minus to
affect echo's operation by passing a third parameter to the script.
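The argdemo.sh listing above breaks off after its header; a hypothetical reconstruction, consistent with the note about the "empty" parameter but not taken from the book, might be:

```shell
cat > argdemo.sh <<'EOF'
#!/bin/sh
# argdemo.sh -- a hypothetical reconstruction, not the book's original listing
echo "Run as $0 with $# argument(s)"
# the leading empty parameter keeps echo from treating a first
# argument such as -n as an option of echo itself
echo "" "$1" "$2" "$3"
EOF
chmod +x argdemo.sh
./argdemo.sh one two
```

Run with arguments starting with a minus (say, ./argdemo.sh -n foo) to see that the empty first parameter makes echo print them literally.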
Bourne Shell supports subroutines, which we will not consider for the sake of space, but just in
case we note that within a subroutine the variables $1, $2, etc. denote not the script command
line arguments, but the values of the parameters with which the subroutine was called; $#
corresponds to their number.
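Just in case, a tiny sketch of such a subroutine (greet is a made-up name):

```shell
# greet is a made-up demonstration function
greet() {
    # inside a subroutine, $1 and $# refer to its own arguments
    echo "Hello, $1! You passed $# argument(s)."
}

greet World extra
# prints: Hello, World! You passed 2 argument(s).
```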
Recall (see §1.2.6) that the $ symbol retains its special meaning inside double quotes, but loses
it inside apostrophes.
To go further, we need to know that any commands executed in the system have the property of
terminating successfully or unsuccessfully. For this purpose, programs, no matter what language they
are written in, when they terminate, inform the operating system about the success of their work in the
form of a so-called termination code; formally, this code is a number from 0 to 255, with zero being
considered a success code and all other numbers being considered unsuccessful. We will learn how
exactly all this happens later, but what is important for us now is the very fact that the termination code
exists, because in the Bourne Shell any condition - for a branch or for a loop - actually represents the
execution of a command, and its successful completion is considered as a logical truth, and its
unsuccessful completion is considered as false.
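The termination code of the most recent command can be inspected through the standard special variable $?; a small sketch:

```shell
true
echo "termination code of true: $?"             # prints 0

false || echo "termination code of false: $?"   # prints 1
```

The || after false is there only so that the failing command does not abort the fragment; inside the echo arguments, $? still holds the code that false returned.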
The most common way to do this is to use the test command built into the Bourne Shell
interpreter, which can check various assumptions. If the assumption is correct, the command will
terminate with a zero (successful) return code, otherwise with a code of one (unsuccessful). A synonym for the test command is the opening square bracket symbol; in this case the command requires a closing square bracket as its last parameter (as a sign that the expression is over), which lets you write the expression being tested visually, enclosing it in square brackets. Here are some examples.
[ -f "file.txt" ]
# whether a file named file.txt exists
[ "$I" -lt 25 ]
# the value of variable I is less than 25
[ "$A" = "abc" ]
# the value of variable A is the string abc
[ "$A" != "abc" ]
# the value of variable A is not the string abc
Note that the same thing could have been written differently, without using square brackets, but not as clearly:
test -f "file.txt"
test "$I" -lt 25
test "$A" = "abc"
test "$A" != "abc"
Of course, not only test, but also any other command can act as a condition. For example:
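The book's own example is not reproduced above; a plausible sketch using grep (an illustrative choice, not necessarily the author's) is:

```shell
echo "Humpty Dumpty sat on a wall," > poem.txt

# grep -q terminates successfully (code 0) when the pattern is found,
# so the command itself can serve as the condition
if grep -q wall poem.txt; then
    echo "the word is there"
fi
```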
In addition to branching, the Bourne Shell language supports more complex constructs, including
loops. For example, the following fragment will print all numbers from 1 to 100:
I=1
while [ $I -le 100 ]; do
echo $I
I=$(( I + 1 ))
done
The -le option accepted by the test command is derived from the words less or equal; -lt is used for "strictly less", and -gt and -ge for "greater" and "greater than or equal", respectively.
The while construct is not the only loop option available in the Bourne Shell; a second loop
construct is built with the word for, specifying the name of a variable that must run through all values
in a given word list, the in keyword, and the word list itself, ending with a semicolon; the loop
body is framed with the same words do and done. Thus, the following loop
will print the English names of the rainbow colors (in a column, since each echo command ends its output with a newline). The for loop is especially convenient when combined with filename
substitution (see §1.2.7).
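The loop itself did not survive in the text above; it presumably looked something like this (the exact word list is an assumption):

```shell
for COLOR in red orange yellow green blue indigo violet; do
    echo $COLOR
done
```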
Information about the success of command execution can be used not only in if and while
constructs, but also with the help of the so-called logical connectives && and ||, corresponding to
the logical operations "and" and "or". As usual, logical truth corresponds to the successful completion
of the command, and falsehood - to the unsuccessful one; the operation of the bindings is based on
the fact that at certain values of the first operand of conjunction and disjunction the general result is
clear without calculating the second operand: if the first operand of conjunction is false, the result
(falsehood) is already known, you can do nothing further, and in the same way you can do nothing if
the first operand of disjunction is true. Simply put, the command line
cmd1 && cmd2
will cause the interpreter to execute cmd1 first; cmd2 will be executed only if cmd1 completes successfully. Conversely, the command line
cmd1 || cmd2
executes cmd2 only when cmd1 fails. Note that the pipeline symbol binds tighter than the connectives, so a line such as
cmd1 && cmd2 | cmd3
represents a conjunction of cmd1 and the cmd2 | cmd3 pipeline as a whole. The "truth" value of a pipeline is determined by the success or failure of the last of its component commands. As usual, the order of operations can be changed by using parentheses, e.g.:
( cmd1 && cmd2 ) | cmd3
In this example, the standard output of cmd1 and of cmd2 (if, of course, the latter is executed at all) will be directed to the standard input of cmd3.
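A self-contained sketch of these connectives in action (demo_dir and no_such_dir are made-up names):

```shell
mkdir -p demo_dir && echo "directory ready"      # echo runs only if mkdir succeeded

cd no_such_dir 2>/dev/null || echo "cd failed, as expected"
```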
All of the features listed here are available not only in scripts but also in a normal session, i.e. we can enter, for example, a loop construct as a normal command and the loop will be executed immediately; this is not so convenient but can be useful. The Bourne Shell language contains a lot of
other features that we will not cover here. For more detailed information about programming in this
language, you should refer to specialized literature (e.g., [1]).
path to the user's home directory; the LANG variable, which is used by multilingual
ru_RU.KOI8-R
In addition, the interpreter provides the ability to copy variable values back to the environment
using the export command:
PATH=$PATH:/sbin:/usr/sbin
export PATH
or just
export PATH=$PATH:/sbin:/usr/sbin
Internal variable assignments, such as those we used in the command files in the previous
paragraph, do not affect the environment by themselves.
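A quick sketch showing that only exported variables reach child processes (both variable names are made up):

```shell
ONLY_LOCAL=1
EXPORTED_VAR=hello
export EXPORTED_VAR

# sh -c starts a child process: it inherits only exported variables
sh -c 'echo "child sees: [$ONLY_LOCAL] [$EXPORTED_VAR]"'
# prints: child sees: [] [hello]
```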
The variable can be removed from the environment using the unset and export
commands:
unset MYVAR
export MYVAR
Modifying the environment affects the execution of all commands that we give to the
interpreter, because the processes started by the interpreter inherit a modified set of
environment variables. In addition, if necessary, you can run an individual command with an
environment modified just for it. This is done in the following way:
VAR=value command
For example, to change user information, including the command interpreter used, you can use
the chfn command, which can be implemented in different ways: in some systems it
asks the user a series of questions, and in others it offers to edit a certain text, from which it
then extracts the desired values. To edit text, the vi text editor is launched by default, which
is not convenient for all users. You can get out of this situation, for example, in the following
way:
§ 1.2. How to use a computer properly 147
EDITOR=joe chfn
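The same per-command override can be tried with a self-contained sketch (GREETING is a made-up variable, and sh -c stands in for an arbitrary command):

```shell
# a per-command environment override; the variable does not persist
GREETING=hi sh -c 'echo "child sees: $GREETING"'   # prints: child sees: hi
echo "our own shell sees: [$GREETING]"             # prints: our own shell sees: []
```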
xterm &
twm
In this case, xterm will run first (we run it just in case, so that we can work even if the
window manager has an inconvenient configuration), followed by the twm window manager.
Note that xterm runs in the background (the & sign is placed at the end of the first line for
this purpose). This is done so that you don't have to wait for it to finish to run twm.
If the graphical shell starts automatically at system boot, and you log in by entering your name
and password in the graphical input form (this input form is drawn for you by the so-called display
manager), you will need the .xsession file instead of .xinitrc to set up your session. In fact,
it is organized in the same way as .xinitrc, i.e. commands from it are executed after starting
your work session; you should only take into account that you give the startx command from an
existing work session (even if it is not a graphical one), where environment variables affecting the
work of many programs are already set up; when logging in through the display manager, it is the
.xsession file that is responsible for setting up the environment in the work session. Some
recommend that the commands for starting the programs that make up your work session be written
in the .xinitrc file, and that the .xsession file be built from two commands: executing a standard script describing your work environment (.profile) and running .xinitrc. The contents of .xsession are then as follows:
. ~/.profile
. ~/.xinitrc
In this case, there's a good chance you'll get a completely identical working environment both when
you launch X Window via startx and when you log in via the display manager.
All window managers existing nowadays can be divided into those that try to implement
the desktop metaphor in addition to their main functions (window management), and those that
do not. The difference between them is huge; the former are usually even called desktop
environments (DE). These include Gnome, KDE, xfce, MATE, Cinnamon, Unity; "regular"
window managers are represented by such programs as IceWM, Fluxbox, Window Maker,
BlackBox, the already mentioned twm, as well as mwm, fvwm, AfterStep and many others.
Strictly speaking, not all DEs are window managers - some of them include their own window
manager as a separate program, e.g. xfce's window manager is xfwm, and Gnome's window manager
can even be changed. However, usually a window manager included in a DE is not designed to work
separately from its DE.
The history of the "desktop metaphor" is rather peculiar. It is believed to have been
invented back in 1970 at Xerox PARC, Xerox's research center; Alan Kay is credited as the
author. The first experimental computer with a graphical interface - Xerox Alto - appeared in
1973. It should be understood that at that time computers were still "big", the rapid transition
to the fourth generation of computers was almost ten years away; punch cards were actively
used to work with computers, they were just beginning to be overtaken by alphanumeric
terminals, and a graphical monitor, even equipped with a "mouse" manipulator (so familiar to
us now) was perceived by professionals as exotic. There were no end-users in the modern sense
at that time; any computer at all was "exotic" for the general public.
The first realization of a "desktop" for a computer available to mass consumers appeared
only a decade later - in 1983 - on the Commodore 64 computer, and there the graphical interface
was not the main one. As the main user interface "desktop" was used in 1984 by the creators of
§ 1.2. How to use a computer properly 150
Apple Macintosh. We can not say that this approach immediately became popular; on the
market flooded with "IBM-compatible" computers, the first shell with DE - Windows 1.0 -
appeared in 1985, but tangible popularity of systems of this line reached only in the early 1990s.
It is understandable: all this graphical machinery was disgustingly slow, so computers first had to become fast enough, and while they were not, the public got used to working in text mode (although not with the command line: the command line in MS-DOS was too primitive for serious work). Despite titanic advertising efforts, the marketers could not convince the mass user that Windows was exactly what the user needed.
Unfortunately, nowadays most users cannot imagine working with a computer in any other
way than in the desktop metaphor; this partly explains the even sadder fact that almost all
popular Linux distributions, unless they are specifically tweaked after installation, offer some
variant of the Desktop Environment by default; one of the justifications for this is said to be the
desire to make it easier for users to migrate from the Windows world.
Of course, nothing good comes of it. To begin with, all DEs without exception require a lot of resources and, which is quite natural for programs of this class, often "slow down" quite noticeably, especially on "old" computers. Meanwhile, the graphical shell is an auxiliary tool
whose duty is to provide the user with an opportunity to run programs that solve his tasks; it is
for the sake of these programs, called application programs, that the computer itself, the
operating system, and the graphical shell exist; a situation in which auxiliary programs take
away valuable resources from applications looks strange, to put it mildly.
Keep in mind that if you do allow any DE to start in your system, it will feel obliged to
create a whole bunch of subdirectories in your home directory with names like Desktop,
Downloads, Documents, Music, Photos and so on - note that they are completely
empty. By the way, if you now throw some files into the Desktop directory, they will
immediately appear as icons on your main screen (outside of windows), which is, in fact, the
"desktop" in the sense of DEs running Unix. And be thankful if these directories end up named in English rather than in the system's local language, and without spaces in the names, because localized names with spaces cause strange problems with programs that do not expect them, and difficulties when trying to deal with all this from the command line.
As already mentioned, in order to learn programming (and in general to increase your own
efficiency when working with computers), you should make the command line interface your
main tool; Desktop Environment programs only get in the way of this, so it is advisable to
immediately replace the environment that your Linux distribution has installed by default with
one of the "lightweight" window managers. Fortunately, many of these are included in most
distributions, although they are not installed by default. The author of this book would venture
to recommend that you try IceWM first, but don't get "hooked" on it; when you get more or
less used to the new environment, you should definitely try other window managers.
If you are logging in in graphical mode, i.e. through the display manager (in most cases
this will be the case, although the system can always be reconfigured to not run the display
manager), see if the display manager itself, in addition to providing a username and password,
also provides some sort of menu or other means of selecting the desired window manager; the
appropriate choice is usually called something like session type in English and "session type"
or something similar in Russian. If there is no such option, you will have to use the .xinitrc
and .xsession files mentioned above; for example, you can write something like this in
.xinitrc
xterm &
icewm
(however, xterm is not necessary here, just icewm), and in the .xsession file, as we
suggested above, this:
. ~/.profile
. ~/.xinitrc
You can, however, do it even simpler - create a single .xsession file, write a single word
icewm in it, and with a good chance you will be satisfied with the result.
Note the "&" symbol. We run the xterm program in the background, so that the old xterm
instance (with which we issue the command) does not have to wait for it to finish: otherwise,
starting a new xterm would make no sense, because we would not be able to use the
old one while it is running.
The xterm program has a well-developed system of options. For example, the command
xterm -bg black -fg gray
will launch the terminal emulator on a black background with gray letters (the same set of colors is usually used in a text console).
In most cases, a window that is partially hidden can be fully displayed (raised to the top
level) by clicking on its title bar (rather than anywhere in the window, as you may be used to).
Your settings may also allow you to do the reverse - to "drown" the window by showing what's
below it; this is usually done by right-clicking on the title bar. To move a window across the
screen, you can also use its title bar: just put the mouse cursor over the title bar, press (and keep
the left button pressed), select a new window position and release the button. If the window
title is not visible (for example, it is hidden under other windows), the same operation can be
done using vertical and horizontal parts of the window frame, except for selected areas in the
corners of the frame; these corner areas are used to resize the window, i.e. when you drag them
with the mouse, not the whole window is moved, but only the corner you have captured.
If you lose the window you need, you can usually find it easily by right-clicking in an
empty space on the screen - this will bring up a menu consisting of a list of existing windows.
In most cases, window managers support so-called virtual screens, on each of which you
can place your own windows. This is useful if you are working with a large number of windows
at the same time. The virtual screens map, which shows the virtual screens, is usually located
in the lower right corner of the screen; to switch to the virtual screen you need, just click on the
corresponding place on the map. Classic window managers (unlike IceWM, by the way) usually
consider all available virtual screens as parts of one "big screen", so, for example, a window
can be located partially on one virtual screen and partially on another. This is especially useful
when for some reason it is desirable to make a window larger than the size of your (real, if you
will, physical) monitor.
From the windows in which a particular text is displayed, it is usually possible to copy that
text to other windows. To do this, just select the text with the mouse; many programs running
under X Window do not have a special "copy" operation: the text that is selected is copied.
However, even if copy and paste operations are provided as separate entities - through menus
or hotkeys (for example, it is so in browsers and office applications), these operations belong
to another, parallel scheme of text copying, and do not cancel automatic copying of everything
that is selected. You can paste the selected text with the third (middle) mouse button. Most
likely, your mouse has a "wheel" for scrolling; note that this wheel can be pressed from top to
bottom without scrolling, and then it will work as a regular (third) button, which is what you
actually need. Besides, if, for example, you don't want to reach for the mouse, you can try
pressing the Shift-Ins key combination, most likely it will lead to the same result.
1.3. Now a little math
In this chapter we will consider very brief information from the field of mathematics,
without knowledge and understanding of which you will definitely have problems during
further reading of this book (and learning programming). Most of this information relates to
so-called discrete mathematics, which is completely ignored in the school mathematics
curriculum, but in recent years has become part of the school computer science curriculum.
Unfortunately, the way these things are usually presented in school leaves us no choice but to
tell them ourselves.
Vasya and Petya decided to play spies. For this purpose Vasya was not lazy to
screw three colored bulbs to the window of his room, which can be seen from
outside if they are lit, and each of the bulbs can be lit independently of the others.
Vasya has screwed the bulbs on tightly, so it is impossible to change their places.
How many different signals can Vasya send to Petya with the help of his bulbs, if
the variant "none of them is lit" is also considered as a signal? Vasya doesn't know
exactly when Petya will decide to look at his window, so all kinds of variants with
Morse code and other similar signaling systems are not suitable: Vasya needs to
put the lights in the position corresponding to the signal to be transmitted, and
leave them in that position for a long time, so that Petya will definitely notice the
signal.
Many readers will probably give the correct answer without too much thought: eight; however,
what is of interest here is not how to calculate the answer (by raising a two to the right degree),
but why the answer is calculated in this way. To find out, we start with a trivial case: when
there is only one light bulb. Obviously, two different signals can be transmitted here: one of
them will be indicated by the bulb being on, and the other by the bulb being off.
Let us now add another light bulb. If, for example, this second bulb is always on, we can
transmit only two signals as before: "first bulb on" and "first bulb off". But nobody prevents
the second bulb from being turned off; in this position we will also have two different signals:
"first bulb on" and "first bulb off". The one to whom the signals are intended, in our task Petya,
can look at both bulbs, that is, consider the state of both of them. The first two signals (when
the second bulb is on) will be different for him from the second two signals (when the second
bulb is off). In total, therefore, we will get the possibility of transmitting four different signals: off-off, off-on, on-off, and on-on.
Let's equip these four signals with numbers from 1 to 4 and add one more bulb, the third
one. If we turn it on, we can transmit four different signals (by turning the first two bulbs on
and off). If we turn it off, we will get four more signals, which will be different from the first
four; in total we will get eight different signals. No one forces us to stop there; numbering the
existing eight signals with numbers from 1 to 8 and adding a fourth bulb, we get 8 + 8 = 16
signals. The reasoning can be generalized: if with the help of n bulbs we can transmit N signals,
then adding a bulb with the number n +1 doubles the number of possible signals (i.e. they get
2N), because the first N we get with the help of the originally available bulbs when the new
one is turned off, and the second N we get (with the same available bulbs) if the new one is
turned on.
It will be useful to consider the degenerate case: no bulbs, i.e., n = 0. Of course, you can't
play spies in this way, but the case is nevertheless important from the mathematical point of
view. To the question "how many signals can be transmitted with 0 light bulbs", most people
will answer "none", but, strangely enough, this answer is unfortunate. Indeed, our "signals"
distinguish one situation from another, or, more precisely, they correspond to some different
situations. To be even more precise, we can see that there are actually infinitely many
"situations", it's just that in signaling we ignore some factors, thereby combining many
situations into one. For example, our young spy Vasya could signal "all the lights are off" as
"I'm not home"; the other seven combinations in our friends' signaling could mean "I'm doing
homework", "I'm reading a book", "I'm eating", "I'm watching TV", "I'm sleeping", "I'm doing
something else", and finally, "I'm not doing anything, I have nothing to do at all, so come visit".
If we look carefully at this list, we can notice that in any of the situations further clarifications
are possible: the signal "I am eating" can equally denote the situations "I am having lunch", "I
am having dinner", "I am eating a delicious cake", "I am trying to overcome unpalatable and
disgusting peppers", etc. etc.; "I am doing my homework" can equally well mean "I am solving
math problems", "I am coloring the maps assigned in geography", or "I am blowing off my
neighbor Katya's Russian exercises". Possible variants are "I'm doing my homework and I'm
feeling good, so I'll get it all done soon" and "I'm doing my homework, but I have a
stomachache, so homework will take longer today." Each of the possible signals somewhat
reduces the overall uncertainty, but, of course, does not eliminate it.
Let's go back to our degenerate example. Not having a single light bulb, we cannot
distinguish situations from each other at all, but does that mean that we don't have any situations
at all? Obviously not: our young spies are still engaged in something or, on the contrary, not
engaged, it is just that our degenerate version of signaling does not allow us to distinguish these
situations. Simply put, we have combined all possible situations into one, completely removing
any certainty; but we have combined them into one situation, not zero of them.
Now everything becomes clear: with zero bulbs we have one possible signal, and adding each new bulb doubles the number of signals, so that N = 2^n, where n is the number of bulbs and N is the number of possible signals. In passing, we note that sometimes the above reasoning helps to better understand why k^0 = 1 for any k > 0.
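The doubling argument translates directly into a tiny computation; here is a sketch in the shell arithmetic of §1.2:

```shell
N=1                    # zero bulbs: exactly one (empty) signal
for BULB in 1 2 3; do
    N=$((N * 2))       # each added bulb doubles the number of signals
done
echo $N                # prints: 8
```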
The problem about the number of signals transmitted by n light bulbs, each of which may
or may not be lit, is equivalent to many other problems; before we get into the dry math, here
is another formulation:
Masha has a brooch, a chain, earrings, a ring and a bracelet. Every time she
leaves the house, Masha thinks long and hard about which jewelry to wear this
time and which to leave at home. How many choices does she have?
To understand that this is the same problem, let's introduce an arbitrary assumption that is not
related to the essence of the problem and does not affect this essence in any way: let our young
spy Vasya from the previous problem turned out to be Masha's younger brother and decided to
tell his friend Petya what jewelry his sister has put on this time. To do this, he has to add two
more bulbs to the three already existing ones, so that there are as many bulbs as Masha has
jewelry. The first bulb out of five will indicate whether Masha is wearing a brooch, the second
bulb will indicate whether Masha is wearing a chain, and so on, one bulb for each piece of
jewelry Masha has. We already know that the number of signals transmitted by the five bulbs is 2^5 = 32; it is obvious that this number is exactly equal to the number of possible combinations of jewelry. In its general form the problem looks like this:

Given a set of n elements, how many different subsets does this set have?

The answer 2^n is easy to remember, and, unfortunately, that is what they usually do at school; the result of this "cheap and cheerful" approach to learning math is the ease with which a student can be completely confused by any non-standard formulation of a problem. Here is an example:
Dima has four different-colored cubes. By placing them one on top of the other,
Dima builds "towers", and the one cube with nothing on it is also considered a
"tower" by Dima; in other words, Dima's tower has a height from 1 to 4 cubes.
How many different "towers" can be built from the available cubes, one at a time?
If only you knew, dear reader, how many high school students, without thinking at all, give the answer 2^4 to this problem! By the way, some of them, noticing that the empty tower is excluded by the terms of the problem, "improve" their answer by subtracting the "forbidden" variant and get 2^4 - 1. That does not make it any more correct, because this is simply not the problem in which a two is raised to the power n; but to notice that, one must understand why a two is raised to the power n in "that" problem, and schoolchildren who have memorized the "magic" 2^n have fatal trouble with exactly that.
By the way, the correct answer to this problem is 64, but the solution has nothing to do with raising two to the sixth power; if there were three cubes, the answer would be 15, and for five cubes the correct answer is 325. The point here, of course, is that in this problem it matters not only which cubes the tower consists of, but also in what order the cubes that make up the tower are arranged. Since for towers of more than one cube different variants can be obtained simply by swapping cubes, the resulting combinations are much more numerous than if we considered possible sets of cubes without regard to order.
Before proceeding to the problems in which permutations are essential, let us consider a
couple more problems on the number of variants without permutations of elements. The first
of them we will "make up" from the original problem about young spies:
Daddy brought Vasya a cheap Chinese lamp that can either be turned off, just glow or
blink. Vasya immediately screwed the lamp to his window, which already has three
ordinary light bulbs. How many different signals can Vasya transmit to Petya now that
he has improved his spy equipment?
The problem is, of course, quite elementary, but it is interesting in one respect. If a person understands how (and why exactly so) the problem with three ordinary light bulbs is solved, he will have no trouble with the "tricky" lamp and its three different states; but if the problem is given to an average schoolboy who was taught the formula N = 2^n without any explanation of where it came from, he will quite likely get stuck on this new problem. And it is solved by the same reasoning: we can transmit eight different signals while the Chinese lamp is extinguished; the same number of signals while it is just lit; and the same number again while it is blinking. In total, 8 + 8 + 8 = 3 · 8 = 24. This case shows how much more valuable the scheme of a formula's derivation is than the formula itself, and now is the time to note that in combinatorics it is always so; moreover, combinatorial formulas are simply harmful to memorize, it is better to derive them every time, and they are all so simple that you can derive them in your head. If you memorize any formula from the field of combinatorics, you run the risk of applying it inappropriately, as the above-mentioned schoolchildren do when trying to solve the problem about towers of cubes.
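The same case-by-case reasoning is easy to check by brute force. Here is a minimal sketch; the encoding (2 states per ordinary bulb, 3 for the Chinese lamp) is our own and not part of the problem statement:

```python
from itertools import product

# Three ordinary bulbs (2 states each) plus the Chinese lamp (off / lit / blinking).
states_per_lamp = [2, 2, 2, 3]
signals = list(product(*(range(s) for s in states_per_lamp)))
print(len(signals))  # 24 = 2 * 2 * 2 * 3
```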
Another task on the same topic looks like this:
Olya has blanks for flags in the shape of a plain rectangle, a triangle, and a rectangle with a cut-out; Olya also has patches of other colors in the shape of a circle, a square, a triangle and a star. Olya decided to make a lot of flags for the holiday; how many different flags can she make by sewing one of the available patches onto one of the available blanks?
This problem is also, one could say, a standard textbook problem, so most people who at least roughly know what we are talking about will simply multiply the two numbers and get the absolutely correct answer: 12. What is much more interesting here is not how to solve this particular problem, but in how many ways it can be done. To begin with, note that our reasoning with the light bulbs works remarkably well here too: indeed, if Olya had only rectangular blanks, she could make as many different flags as she has different patches, i.e. four. If she had only triangular blanks, she could also make four different flags, and the same holds if all her blanks were rectangles with a cut-out. But the first four variants differ from the second four, and the third four differ from both the first and the second by the shape of the blank; therefore, the total number of variants is 3 · 4 = 12.
Another line of reasoning may be even more interesting. Let's make a table with three columns for the three different blanks and four rows for the four different patches. In each column we place the flags made from the corresponding blank, and in each row the flags made with the corresponding patch (see Fig. 1.7). For anyone whose brain has formed the abstraction of multiplication, it is immediately obvious that there are 3 · 4 = 12 cells with flags; interestingly, this insight is akin to the concept of the area of a rectangle, only for the discrete case.
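The table reasoning maps directly onto a Cartesian product; a quick sketch (the shape names are merely labels of our choosing):

```python
from itertools import product

blanks = ["rectangle", "triangle", "rectangle with cut-out"]
patches = ["circle", "square", "triangle", "star"]
# Every (blank, patch) pair is one cell of the 3-by-4 table.
flags = list(product(blanks, patches))
print(len(flags))  # 12
```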
Consider another problem similar to the one we left for later:
Dima has a box with cubes of four different colors. By placing the cubes one on
top of the other, Dima builds "towers" up to four cubes high, and Dima also
considers one cube with nothing on it to be a "tower". How many different
"towers" can be built from the available cubes? It is considered that Dima has as
many cubes of each color as he wants.
Despite the apparent similarity (a large part of the text here was simply copied), this problem
is much simpler than its previous version, where there were only four cubes. However, again,
if you don't understand how combinatorial results are obtained, this problem is impossible to
solve, because standard formulas don't work for it. The correct answer here is 340; we invite
the reader to demonstrate how this answer was obtained.
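For readers who want to check their own derivation, the sketch below merely brute-forces the count (it enumerates towers rather than deriving the formula; the encoding of colors as numbers 0..3 is ours):

```python
from itertools import product

colors = range(4)
# A tower of height h is an ordered tuple of h colors; cubes may repeat.
towers = [t for h in range(1, 5) for t in product(colors, repeat=h)]
print(len(towers))  # 4 + 16 + 64 + 256 = 340
```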
So far, all the problems we have considered have been solved without taking permutations
into account; we have not solved the only problem in which permutations turned out to be
essential. We will start our discussion of problems with permutations with the canonical
problem of the number of possible permutations. As usual, let us first formulate it in the
schoolboy way:
Kolya has seven American billiard balls with different numbers (for example, "solid"
from one to seven) in his bag. How many different ways can Kolya put them in a row on
the shelf?
There are two ways to arrive at the correct answer, and we will consider them both. Let's start
with a trivial variant: there is only one ball, how many "ways" can we put it on the shelf?
Obviously, there is only one way. Now let's take two balls; it doesn't matter what their numbers
are, the result doesn't change, but let them be balls numbered 2 and 6 for definiteness.
Obviously, there are two ways to arrange them on the shelf: "two on the left, six on the right"
and "six on the left, two on the right". The first way is called direct, the second way is called
reverse, because the numbers of balls from left to right in this case do not increase, but, on the
contrary, decrease.
Now let's add a third ball (for example, let it be number 3) and see how many ways there are. We can choose the leftmost ball in three ways: put the two on the left, put the three on the left, or put the six on the left.

Figure 1.8. Permutations of three balls

Whichever ball we choose, the remaining two balls on the
remaining two positions can be placed in the two ways already known to us, direct and reverse; in other words, for each choice of the leftmost ball there are two choices for the rest,
i.e. there are six choices in total (Fig. 1.8). Note that permutations are usually numbered in this
order: first they are sorted in ascending order of the first element (i.e., first come permutations
in which the first element has the smallest number, and at the end - permutations where the
number of the first element is the largest), then all permutations having the same first element
are sorted in ascending order of the second element, and so on.
If now we add a fourth ball (let it be the ball with number 5), we will get four ways of
choosing the leftmost one of them, and with each such way the rest of the balls can be arranged
in the six ways already known to us; the total number of permutations for the four balls is 24.
Now we are hopefully ready to generalize: if for k - 1 balls there are M_{k-1} possible permutations, then by adding the k-th ball we increase the number of permutations by a factor of k, i.e. M_k = k · M_{k-1}. Indeed, when the k-th ball is added, the total number of balls becomes k, i.e. the very first (say, the leftmost) ball can be chosen in k ways, and the rest, by our assumption, can be arranged (with the leftmost one fixed) in M_{k-1} ways. Since we started from the fact that for one ball there is one possible permutation, i.e. M_1 = 1, the total number of permutations of k balls equals the product of all natural numbers from 1 to k:

    M_k = k · M_{k-1} = k · (k - 1) · M_{k-2} = ... = k · (k - 1) · ... · 2 · 1

As is well known, this number is called the factorial of k and is denoted "k!"; in fact, the definition of the factorial of a natural number k is "the number of permutations of k elements", and the fact that the factorial equals the product of the numbers from 1 to k is a corollary.
For the problem formulated above, the answer will thus be 7! = 5040 combinations.
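The answer is easy to verify by direct enumeration, for example with Python's itertools (a check, not a substitute for the reasoning):

```python
from itertools import permutations
from math import factorial

balls = range(1, 8)
arrangements = list(permutations(balls))  # every ordering of the seven balls
print(len(arrangements), factorial(7))  # both 5040
```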
This result can be reached in another way. Consider a bag containing seven balls and seven
empty positions on the shelf. We can choose a ball to fill the first empty position in seven ways;
whichever ball we choose, there will be six in the bag. In other words, when one position has
already been filled, we have six choices to fill the second position, regardless of which of the
seven possible ways the first empty position was filled. Thus, for each of the seven ways to fill
the first position we have six ways to fill the second position, and the total number of ways to
fill the first two positions is 7 · 6 = 42. There are five balls left in the bag, i.e. for each of the 42 combinations of the first two balls there are five variants of the third; the total number of variants for the first three balls is 42 · 5 = 210. For each such combination we then have four ways of choosing the next ball, because four balls are left in the bag, and so on. We can choose the penultimate ball from those remaining in the bag in two ways, and the last one in one way. It turns out that the total number of ways of arranging the seven balls is

    7 · 6 · 5 · 4 · 3 · 2 · 1 = 7! = 5040

Repeating the same reasoning for the case of k balls, we arrive at the already familiar expression

    k · (k - 1) · (k - 2) · ... · 3 · 2 · 1 = k!

only this time we come to it moving from larger numbers to smaller ones, rather than vice versa as in the previous reasoning. Note that both lines of reasoning will be useful for understanding further calculations.
Let us now consider the intermediate problem, starting, as usual, with a special case:
Kolya still has seven American billiard balls in his bag with numbers from one to
seven. Vasya showed Kolya a small shelf on which only three balls can fit. How
many different ways can Kolya fill this shelf with balls?
Undoubtedly, the reader will easily find the answer by repeating the first three steps of the above reasoning: we can choose the first of the three balls in seven ways, the second in six, and the third in five; the answer is 7 · 6 · 5 = 210 ways. This number can be written using the factorial symbol:

    7 · 6 · 5 = (7 · 6 · 5 · 4 · 3 · 2 · 1) / (4 · 3 · 2 · 1) = 7! / 4!
In the general case, when we have n items (elements of a set) and need to compose an ordered set (tuple) of length k from them, we have

    n · (n - 1) · ... · (n - k + 1) = (n · (n - 1) · ... · (n - k + 1) · (n - k) · ... · 2 · 1) / ((n - k) · ... · 2 · 1) = n! / (n - k)!

This quantity, called the number of arrangements of n elements taken k at a time, is denoted A_n^k (read "A from n to k") in the Russian-language literature. In the English-language literature, as well as in some Russian-language sources, the notation (n)_k is used, called the falling factorial.
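The scheme "keep only k factors of the factorial" can be written down directly; a sketch (the function name `placements` is ours, and since Python 3.8 the standard library's `math.perm` computes the same value):

```python
from math import perm

def placements(n, k):
    """Ordered k-element arrangements out of n items:
    n * (n-1) * ... * (n-k+1), i.e. n! / (n-k)!."""
    result = 1
    for factor in range(n, n - k, -1):
        result *= factor
    return result

print(placements(7, 3), perm(7, 3))  # both 210
```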
Now is probably a good time to solve the problem that we formulated on page 139 but did not solve. Let's recall its condition:
Dima has four different-colored cubes. By putting them one on top of another, Dima
builds towers, and Dima considers one cube, on which nothing stands, to be a tower too;
in other words, Dima's tower has a height from 1 to 4 cubes. How many different towers
can be built from the available cubes?
Clearly, there will be 4! = 24 towers of four cubes. There will be the same number of towers of three cubes: each of them is obtained from exactly one tower of height 4 by removing the top cube, and whether Dima keeps this cube in his hands or puts it away instead of placing it on top does not change the number of combinations. There will be 4 · 3 = 12 towers of two cubes, and 4 towers of one cube, one per available cube. In total, 24 + 24 + 12 + 4 = 64, and this is the answer to the problem.
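The four summands can also be obtained by enumeration, since `itertools.permutations` with a length argument yields exactly the arrangements counted above (a check of the arithmetic, nothing more):

```python
from itertools import permutations

cubes = range(4)  # the four different-colored cubes, encoded as numbers
towers = [t for h in range(1, 5) for t in permutations(cubes, h)]
print(len(towers))  # 4 + 12 + 24 + 24 = 64
```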
Now we come close to another classical problem, for the sake of which, in general, the
whole conversation about permutations was started. As usual, we start with a special case:
In Kolya's bag there are the same seven American billiard balls with numbers from one
to seven. Vasya gave Kolya an empty bag and asked him to put any three balls into it.
How many different ways can Kolya fulfill Vasya's request?
This problem differs from the previous problem about Kolya and Vasya in that the balls in the
bag are obviously intermingled; in other words, we are no longer interested in the order of the
elements in the final combinations. To understand how this problem is solved, let us imagine
that Kolya is also interested in how many ways he can choose three balls out of the available
seven without taking order into account, and he first wrote down on a piece of paper all 210
variants obtained when solving the previous problem, where instead of a bag there was a shelf,
i.e. all the possible variants of the placements from seven to three, taking into account the order
of elements. Knowing that the variants differing only in the order of the elements will now have
to be considered as identical, Kolya decided to see how many times the combinations consisting
of balls with numbers 1, 2 and 3 occur among the 210 combinations written out. Having
carefully looked through his records, Kolya found six such combinations: 123, 132, 213, 231, 312 and 321. Deciding to check some other set of balls, Kolya looked through his list for
combinations using balls with numbers 2, 3, and 6; he found six such combinations: 236, 263,
326, 362, 623, and 632 (these combinations are already familiar to us from Figure 1.8 on page
143).
At this point in his research, Kolya began to guess (hopefully together with us) that the
same thing would happen for any set of balls. In fact, the list of 210 combinations includes all
possible choices of three balls out of seven, taking into account their order; as a consequence,
whatever three balls out of seven we take, our list will contain, again, all combinations
consisting of these three balls, that is, simply, all permutations of the chosen three balls; well,
as we know, there are 3! = 3 · 2 · 1 = 6 permutations of three elements. It turns out that each of the combinations we are interested in is represented in the list six times; in other words, the list is exactly six times longer than the result we need. All that remains is to divide 210 by 6, and we get the answer to the problem: 210 / 6 = 7! / (4! · 3!) = 35.
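Kolya's experiment is cheap to reproduce in code; the sketch below writes out all 210 ordered triples and checks that every unordered triple appears exactly 3! = 6 times:

```python
from itertools import combinations, permutations

balls = range(1, 8)
ordered = list(permutations(balls, 3))    # 210 arrangements, order matters
unordered = list(combinations(balls, 3))  # 35 combinations, order ignored
# Every 3-element set occurs 6 times among the ordered arrangements.
assert all(sum(1 for t in ordered if sorted(t) == list(c)) == 6
           for c in unordered)
print(len(ordered), len(unordered))  # 210 35
```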
In the general case, we are interested in how many ways we can choose k items from the available n without taking their order into account; the corresponding value is called the number of combinations of n elements taken k at a time and is denoted C_n^k (read "C from n choose k"; the letter C comes from the word "combinations"). Repeating the above reasoning for the general case, we note that if we consider all (n)_k arrangements (which differ from combinations in that the order of elements is considered important), then each combination will be represented among them k! times, once for each possible permutation of its k elements; i.e. the number (n)_k = n!/(n - k)! exceeds the sought C_n^k exactly k! times. All that is left is to divide (n)_k by k!, and we get the most important formula of school combinatorics:

    C_n^k = n! / (k! (n - k)!)
And now we will tell you something that you were hardly ever told at school: under no circumstances, not for anything in the world, should you memorize this formula! If you do happen to memorize it, or if you managed to learn it by heart before you got your hands on our book, try to forget it like a bad nightmare. The point is that remembering this formula is simply dangerous: it is tempting to apply it without much thought to any combinatorial problem in which any two numbers appear; in most cases such an application will be erroneous and will give wrong results.
Instead of memorizing the formula itself, memorize the scheme of its derivation. When you really need to find the number of combinations of n initial elements taken k at a time, you can derive the formula for C_n^k in your mind while you are writing it down: write "C_n^k =", draw a fraction bar, and run through the following in your mind: the total number of permutations of n elements is n!, but we only need k factors of the factorial, so we remove the extra factors by dividing by (n - k)!; what we get is the number of combinations taking order into account, but we don't care about order, so each combination has been counted k! times; divide by k! and get what we need. This approach will keep you from misusing the formula, because you will know exactly what the formula means and what it can be used for.
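The derivation scheme translates line by line into code; a sketch (the name `choose` is ours, and since Python 3.8 the standard library offers `math.comb` for the same value):

```python
from math import comb, factorial

def choose(n, k):
    with_order = factorial(n) // factorial(n - k)  # keep only k factors of n!
    return with_order // factorial(k)              # forget the order: divide by k!

print(choose(7, 3), comb(7, 3))  # both 35
```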
It is interesting that the same formula can be derived by another reasoning. Imagine that
we first put n balls in a row on the shelf, then separated the first k of them and poured them into
a bag, and poured the remaining n - k into another bag; of course, in each bag the balls get mixed up. How many such (final) configurations are possible, in which the balls sit in two bags and the order in which they were poured into each bag does not concern us? Let's follow the scheme of reasoning already familiar to us: initially we had n! combinations, but among them every k! became indistinguishable because k balls were mixed in the first bag, so we are left with n!/k! combinations (after the first k balls were poured into the first bag and mixed there, while the remaining n - k balls had not yet been poured anywhere); but among these combinations every (n - k)! then also became indistinguishable due to the mixing of balls in the second bag. In total, n! exceeds the desired C_n^k by k! (n - k)! times, i.e. C_n^k = n! / (k! (n - k)!).

This reasoning is remarkable because, unlike the previous one, it is symmetric: both factors in the denominator of the fraction are obtained in the same way. The problem does indeed have a certain symmetry: which balls to pour into the other bag can obviously be chosen in the same number of ways as which balls to leave in the original bag. This is expressed by the identity C_n^k = C_n^{n-k}.
For the degenerate cases one takes by convention C_n^0 = C_n^n = 1 for any natural n, and this convention, as is easy to see, is quite natural. Indeed, C_n^0 corresponds to the answer to the question "in how many ways can zero balls out of the n available be transferred to another bag". Obviously, there is only one way: we simply do nothing, all the balls remain in the original bag, and the second bag stays empty. Things are almost as simple with C_n^n: "in how many ways can we pour n balls from a bag with n balls into another bag?" Naturally, exactly one: we take the second bag, pour into it everything found in the first bag, and the job is done.
The numbers C_n^k are called binomial coefficients, because with them we can write the general form of the expansion of Newton's binomial:

    (a + b)^n = Σ_{k=0}^{n} C_n^k a^{n-k} b^k

For example, (a + b)^5 = a^5 + 5a^4 b + 10a^3 b^2 + 10a^2 b^3 + 5a b^4 + b^5, with the numbers 1, 5, 10, 10, 5 and 1 representing C_5^0, C_5^1, C_5^2, C_5^3, C_5^4 and C_5^5. Interestingly, most professional mathematicians consider everything here so obvious that they do not condescend to any explanations; meanwhile, the rest of the public, including most people who have a higher technical education but are not professional mathematicians, see no connection between the problem of pouring balls from bag to bag and the expansion of Newton's binomial; when asked where the combinations of balls in the binomial formula came from, they usually answer with the sacramental "it just so happened", apparently believing it to be mere coincidence.
Meanwhile, to see our "problem about bags of balls" in the expansion of the binomial, it is enough to notice that the question is not how to expand the binomial into a sum (that question is too general and concerns far more than combinations), but what coefficient will stand before each term of the expansion.
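That the coefficient at each term counts choices of brackets can be checked mechanically: expand (a + b)^n without collecting like terms, then count how many raw products contain exactly k letters b (a verification sketch; `math.comb` supplies C_n^k):

```python
from itertools import product
from math import comb

n = 5
# Each raw summand of (a + b)^n picks the letter a or b from every bracket.
raw_terms = list(product("ab", repeat=n))
for k in range(n + 1):
    count = sum(1 for term in raw_terms if term.count("b") == k)
    assert count == comb(n, k)  # the coefficient of a^(n-k) b^k
print([comb(n, k) for k in range(n + 1)])  # [1, 5, 10, 10, 5, 1]
```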
Just in case, let us recall how the school "opening of brackets" works when one sum is multiplied by another. To multiply the sum (a_1 + a_2 + ... + a_n) by the sum (b_1 + b_2 + ... + b_m), you first multiply each term of the first sum by the first term of the second sum (here, by b_1), obtaining a_1 b_1 + a_2 b_1 + ... + a_n b_1; then you do the same with the second term of the second sum, obtaining a_1 b_2 + a_2 b_2 + ... + a_n b_2, and so on for each term of the second sum; all the resulting chains of terms are then added together. The result is a sum consisting of n·m terms representing all possible products of the form a_i b_j. In particular, (a + b)(c + d) = ac + bc + ad + bd.
    1
    1  1
    1  2  1
    1  3  3  1
    1  4  6  4  1
    1  5 10 10  5  1
    1  6 15 20 15  6  1

Figure 1.9. Pascal's triangle (row n contains the coefficients C_n^0, C_n^1, ..., C_n^n)
It is clear that if you do not collect like terms when opening the brackets in the expression (a + b)^n, you will get 2^n summands in the final sum. For example:

    (a + b)(a + b)(a + b)(a + b) = aaaa + aaab + aaba + aabb + abaa + abab + abba + abbb + baaa + baab + baba + babb + bbaa + bbab + bbba + bbbb

For the sake of clarity, we have not used exponents here. Each summand of the final expansion is a product in which either a or b is taken from each initial "bracket", and the sum itself consists of all possible such summands. It is not difficult to see that there are exactly 2^n of them, for from each bracket we must take either a or b, i.e. we get the familiar problem about the young spies and the light bulbs; but that is not what matters here.
After collecting like terms, we obviously obtain a sum of monomials of the form M a^{n-k} b^k (recall that in our example n = 4, but this is only an illustration of the general reasoning), and it remains to find out what M equals; it is easy to guess that M is the answer to the question: in how many ways can we choose, out of all n "brackets", those k "brackets" from which we take the summand b, taking the summand a from the rest. In this formulation it becomes clear that this is, in fact, our problem about the balls: instead of balls we have "brackets"; instead of moving a ball into the other bag we choose the summand b from a "bracket", and instead of leaving a ball in the original bag we choose the summand a from a "bracket". In particular, in our example the monomial a^2 b^2 occurs six times in the expansion: aabb, abab, abba, baab, baba and bbaa.
Pascal's triangle has many interesting properties, which we will not enumerate here, because we have already gotten a bit carried away. Let us note only one of them: the sum of the numbers in any row of Pascal's triangle is 2^n, where n is the number of the row if the rows are numbered from zero (i.e., n corresponds to the power of the binomial whose expansion coefficients make up this row). In other words, Σ_{k=0}^{n} C_n^k = 2^n. This property, too, has a completely trivial combinatorial meaning, which we suggest the reader find on his own as an exercise.
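The row-sum property is a one-liner to check numerically (which does not spoil the combinatorial exercise above):

```python
from math import comb

for n in range(10):
    row_sum = sum(comb(n, k) for k in range(n + 1))
    assert row_sum == 2 ** n  # each row of Pascal's triangle sums to 2^n
print("rows 0..9 check out")
```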
To conclude the discussion of combinatorics, let us consider another textbook
problem:
Seven chess players are participating in a chess tournament, and it is
assumed that each of them will play exactly one game with each other.
How many games will be played in total?
It is clear that each of the seven must play six games, one with each of the other participants of the tournament. But the following phrase for some reason drives many novice combinatorics students into a stupor: since two people participate in each game, the total number of games will be half of 7 · 6, i.e. (7 · 6)/2 = 21 games will be played.
Since there are often difficulties with this "two people participate in each game",
we will have to give some explanations, and we will give them in two ways. First of
all, let's remember that chess players in competitions necessarily write down all moves,
and both participants of each game do it; the filled-in protocols are then handed over to
the judges. Imagine now that each of the tournament participants has prepared one
protocol form for each upcoming game. It is clear that each of them prepared six such forms, so 6 · 7 = 42 forms were prepared in total. Then the chess players met in the hall on the day of the tournament and began to play; after each game its participants handed their protocols to the judges, i.e. after each game the judges received two protocols. At the end of the tournament all 42 protocols obviously end up with the judges, and since the judges received two protocols per game, there were half as many games, i.e. 21.
There is a second variant of the explanation. The results of sports competitions
under the so-called "round robin system", where everyone plays exactly one game with
everyone else, are usually presented in the form of a tournament table, with a row and
a column for each participant in the tournament. Diagonal cells of the table, i.e. such
cells, which stand at the intersection of the row and column corresponding to one
player, are shaded, because nobody is going to play with himself. Further, if, for
example, player B and player D played a game and B won, then it is considered that
the game ended with the score 1:0 in favor of B; in his row at the intersection with
column D is entered the result "1:0", while in the row D at the intersection with column
B is entered the result "0:1" (see Fig. 1.10).
It is obvious that at first there were 7 · 7 = 49 cells in the table, but seven of them
were immediately painted over and there are 42 cells left; at the end of each game two
cells are filled in, i.e. after 21 games all cells will be filled in and the tournament will
be over.
        A     B     C     D     E     F     G
    A   ×
    B         ×           1:0
    C               ×
    D         0:1         ×
    E                           ×
    F                                 ×
    G                                       ×

Fig. 1.10. Tournament table
Translated into purely mathematical language, this problem turns into the problem of the number of edges in a complete graph. Recall that a graph is a finite set of abstract vertices together with a finite set of unordered pairs of vertices, which are called edges; a graph is drawn as a picture in which vertices are shown as points and edges as lines connecting the corresponding vertices. A complete graph is a graph in which any two vertices are connected by an edge, and by exactly one edge. A complete graph with n vertices contains n(n - 1)/2 edges; indeed, each vertex has (n - 1) incident edges, i.e. there are n(n - 1) "edge ends" in the graph, but since each edge has two ends, the total number of edges is n(n - 1)/2.
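Both formulations, games and edges, amount to choosing unordered pairs, which is easy to confirm (the numbering of players is an arbitrary labeling of ours):

```python
from itertools import combinations
from math import comb

n = 7
games = list(combinations(range(n), 2))  # one game per unordered pair of players
print(len(games), n * (n - 1) // 2, comb(n, 2))  # all equal 21
```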
three-digit decimal numbers, from 0 to 999. In this case, we can initially assume that there are
an infinite number of digits, just that all of them, except for the first (lowest) k, contain zeros.
When adding one to the number n^k - 1 (in our example, to the number 999), we exhaust the possibilities of k digit positions and have to use one more position, the (k + 1)-th. It makes no sense to use the new position before this moment, because all smaller numbers can be represented with only k positions, and if we moved into the next position earlier, we would get more than one representation for the same number. But when all combinations of the lower positions have been exhausted, we have no option left but to use the next position. The logical thing to do in this next position is to start with the smallest possible digit, i.e. one, and to zero out all the lower positions to "start over"; thus, a one in the (k + 1)-th position must correspond to the total number of combinations that can be obtained in the first k positions.
The fact that all mankind now uses the base-10 system is nothing more than an
accident: the base of the number system corresponds to the number of fingers on our
hands. Working with this system seems to us "simple" and "natural" only because we
get used to it from early childhood; in fact, as we will see later, counting in binary is much easier: no multiplication table is needed there at all (it simply does not exist), and long multiplication, so hated by schoolchildren in the lower grades, turns in binary into a trivial procedure of "writing out with shifts". Back in the
17th century, Gottfried Wilhelm Leibniz, who was the first in history to describe the
binary number system in the form in which it is known now, noticed this circumstance
and stated that the use of the decimal system is a fatal mistake of mankind.
Anyway, we can, if we wish, use any number of digits, starting from two, to build a positional number system; if we follow the traditional approach and, having n digits, assign them the values from 0 to (n - 1), then such a number system can be worked with in much the same way as the familiar decimal system. For example, in a system with base n, the notation 1000 denotes the number n^3; in the base-five system it is 125. You just need to remember one important thing. The number system determines how a number is written, but the number itself and its properties do not depend on the number system: a prime number will always remain prime, an even number will always remain even, and 5 · 7 will be 35 regardless of what digits (even Roman numerals!) we use to write these numbers.
Before proceeding to consider other systems, let us note two properties of the
ordinary decimal notation of a number that generalize without change to number
systems on a different base. The first of these follows directly from the definition of
positional notation. If a number is represented by the digits d_k d_{k-1} ... d_1 d_0, then its numerical value will be d_k · 10^k + d_{k-1} · 10^{k-1} + ... + d_1 · 10 + d_0; for example, for the number 3275, its value is 3 · 10³ + 2 · 10² + 7 · 10 + 5. The
second property requires a slightly longer explanation, but, by and large, it is not more complicated: if we divide a number by 10 with a remainder, then divide the quotient by 10 again, and so on until the quotient becomes zero, the written-out remainders will be all the digits that make up this number, starting with the least significant one. For example, divide 3275 by 10 with a remainder: we get 327 and 5 in the remainder; divide 327 by 10, we get 32 and 7 in the remainder; divide 32, we get 3 and 2 in the remainder; divide 3 by 10, we get 0 and 3 in the remainder. The sequence of remainders 5, 7, 2, 3 is exactly the digits of the number 3275.
Both of these properties generalize to the number system on an arbitrary base; we only need to use the corresponding base instead of 10 in the calculations. For example, for the base-7 notation 1532₇ the numerical value will be 1 · 7³ + 5 · 7² + 3 · 7 + 2 = 611 (of course, we perform all calculations in the decimal system, because it is easier for us). Now let's try to find out what digits make up the base-7 notation of the number 611, for which we will successively perform several divisions by 7 with a remainder. The result of the first division will be 87, with a remainder of 2; the result of the second division will be 12, with a remainder of 3; the result of the third division will be 1, with a remainder of 5; the result of the fourth division will be 0, with a remainder of 1. So, the base-7 notation of the number 611 consists of the digits 2, 3, 5, 1, listed starting from the least significant one; that is, this notation is 1532₇ (we have already seen it somewhere).
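Both properties translate directly into code. A small Python sketch (function names ours), with digits listed from the most significant one, as written on paper:

```python
def from_base(digits, base):
    """Value of a number given by its digits (most significant first)."""
    value = 0
    for d in digits:
        value = value * base + d   # Horner-style accumulation
    return value

def to_base(n, base):
    """Digits of n in the given base, obtained by repeated division."""
    digits = []                    # remainders come out least significant first
    while n > 0:
        n, r = divmod(n, base)
        digits.append(r)
    return digits[::-1] or [0]

print(from_base([1, 5, 3, 2], 7))  # 611
print(to_base(611, 7))             # [1, 5, 3, 2]
```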
⁵⁷ Other approaches are possible; for example, a number system using three digits whose values are 0, 1, and -1 is quite often mentioned in the literature; we will leave such number systems outside the scope of our book, but the interested reader can easily find descriptions of them in other sources.
As we see, the first of the two formulated properties of the positional notation
allows us to translate a number from any number system into the one in which we are
used to perform calculations (for us it is the system on base 10), and the second property
- to translate a number from the notation we are used to (i.e. decimal) into a notation in
an arbitrary number system.
Note that when converting a number from "some other" system to decimal, we can save on multiplications by representing the value simply as a sum of the powers of the base that correspond to the non-zero digits; in the binary case no multiplication is needed at all, since every digit is either 0 or 1. For example, for 1001101₂ the one-digits correspond to 2⁶, 2³, 2² and 2⁰; adding 64, 8, 4 and 1, we get 77, and this is the number we are looking for.
There are two ways to convert from decimal to binary. The first is the traditional one: divide the original number in half with a remainder, writing out the resulting remainders until zero remains in the quotient. Since dividing in half is not difficult to do in one's head, the whole operation is usually carried out by drawing a vertical line on the paper; on the left (and from top to bottom) write first the original number, then the results of the divisions, and on the right write out the remainders. For example, converting the number 103 to binary yields: 51 and 1 as a remainder; 25 and 1 as a remainder; 12 and 1 as a remainder; 6 and 0 as a remainder; 3 and 0 as a remainder; 1 and 1 as a remainder; 0 and 1 as a remainder (see Figure 1.11, left). All that remains is to write out the remainders, looking from bottom to top, and we get 1100111₂. Similarly, for the number 76 we get 38 and 0, 19 and 0, 9 and 1, 4 and 1, 2 and 0, 1 and 0, 0 and 1; writing out the remainders, we get 1001100₂ (ibid., right).
There is another way, based on knowing the powers of two. At each step we choose the greatest power of two not exceeding the remaining number, write a one in the corresponding digit, and subtract that power from the number. Suppose, for example, we needed to convert the number 757 into binary. The greatest power of two that does not exceed it is the ninth (512); subtracting it leaves 245. The next power of two is the seventh (128, since 256 does not fit); that leaves 117. Continuing in exactly the same way, subtract 64, leaving 53; subtract 32, leaving 21; subtract 16, leaving 5; subtract 4, leaving 1; subtract 1 (the zeroth power of two), leaving 0. The result is 1011110101₂. This method is especially convenient if the original number slightly exceeds one of the powers of two: for example, the number 260 is converted to binary almost instantly: 256 + 4 = 100000100₂.
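The greedy powers-of-two method is easy to mechanize; a Python sketch (the function name is ours):

```python
def to_binary_by_powers(n):
    """Greedy conversion: repeatedly subtract the largest power of two
    not exceeding what is left, writing 1 or 0 for each power."""
    bits = []
    p = 1
    while p * 2 <= n:
        p *= 2                 # largest power of two not exceeding n
    while p > 0:
        if p <= n:
            bits.append('1')   # this power fits: write a one, subtract it
            n -= p
        else:
            bits.append('0')   # this power does not fit
        p //= 2
    return ''.join(bits) or '0'

print(to_binary_by_powers(757))  # 1011110101
print(to_binary_by_powers(260))  # 100000100
```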
Since, as we have already mentioned, programmers often use the number systems of base 16 and (a little less often) 8 as a shorthand for binary numbers, there is often a need to convert from binary to octal or hexadecimal notation and vice versa. Fortunately, if the base of one number system is a natural power n of the base of another, then one digit of the first system corresponds to exactly n digits of the second. In practice this property is applied only to conversions between binary and bases 8 and 16, although it would be possible, for example, to convert numbers between ternary and base-9 notation in the same way; it is just that neither ternary nor base-9 notation has been widely used in practice.
To convert a number from octal to binary, each digit of the original number is replaced by the corresponding three binary digits (see Table 1.6). For example, for the number 3741₈ these would be the groups of digits 011 111 100 001; the insignificant leading zero is then dropped, giving 11111100001₂.
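The digit-group correspondence can be sketched in Python (names ours; the same idea works for hexadecimal with groups of four):

```python
# each octal digit corresponds to exactly three binary digits
OCT_TO_BIN = {str(d): format(d, '03b') for d in range(8)}

def oct_to_bin(s):
    """Replace every octal digit by its three-bit group."""
    bits = ''.join(OCT_TO_BIN[c] for c in s)
    return bits.lstrip('0') or '0'   # drop insignificant leading zeros

def bin_to_oct(bits):
    """Pad to a multiple of three bits, then read off groups of three."""
    bits = bits.zfill((len(bits) + 2) // 3 * 3)
    return ''.join(str(int(bits[i:i+3], 2)) for i in range(0, len(bits), 3))

print(oct_to_bin('3741'))         # 11111100001
print(bin_to_oct('11111100001'))  # 3741
```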
The reverse conversion of a fractional number from decimal to binary is also easy,
but it is a bit more difficult to explain why it is done in this way. To begin with, we
separately convert the integer part of the number, write out what we get, and forget
about it, leaving only the decimal fraction, which is obviously smaller than one. Now
we need to find out how many halves (one or none) we have in this fractional part. To
do this, it is enough to multiply it by two. In the resulting number, the integer part can
be equal to zero or one; this is the desired "number of halves" in the original number.
Whatever the obtained integer part is, we write it out as another binary digit, and
remove it from the working number, because we have already taken it into account in
the result. The remaining number is again a fraction, obviously smaller than one,
because we have just cut off the whole part; we multiply this fraction by two to
determine the "number of quarters", write it out, cut it off, multiply it by two, determine
the "number of eighths", and so on.
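The doubling procedure can be sketched in Python; we use exact fractions rather than floating point, since floats would distort the working numbers (the function name is ours):

```python
from fractions import Fraction

def frac_to_binary(frac, max_digits=20):
    """Binary digits of a fraction 0 <= frac < 1, by repeated doubling:
    the integer part after each doubling is the next binary digit."""
    x = Fraction(frac)
    digits = []
    while x != 0 and len(digits) < max_digits:
        x *= 2
        digits.append(int(x))   # the "number of halves/quarters/eighths..."
        x -= int(x)             # cut off the part already accounted for
    return '0.' + ''.join(map(str, digits))

print(frac_to_binary('0.40625'))  # 0.01101
```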
For example, for the already familiar number 5.40625, the conversion back to
⁵⁸ Hereinafter, in writing positional fractions, we will use a period to separate the fractional part, rather than a comma, as is usually done in Russian-language literature. The period has always been used in this role in English texts, and as a result it is used in all existing programming languages. When programming, you should accustom yourself to the idea that there is no such thing as a "decimal comma", only a "decimal point".
binary would look like this. We immediately translate the integer part as an ordinary
integer, get 101, write out the result, put a binary dot and forget about the integer part
of our original number. We're left with 0.40625. Multiply it by two and we get 0.8125.
Since the integer part is zero, we write out the digit 0 in the result (right after the decimal
point) and continue the process. Multiplying 0.8125 by two gives 1.625; write out a one
in the result, remove it from the working number (we get 0.625), multiply it by two to
get 1.25, write out a one, multiply 0.25 by two to get 0.5, write out a zero, multiply it
by two to get 1.0, write out a one. This is the end of the translation, because we still
have a zero in the working number, and, of course, no matter how many times we
multiply it, we will get only zeros; note that in principle we have the right to do so -
because to the obtained binary fraction we can add to the right side as many zeros as
we want, all of them will be insignificant. The written-out result is 101.01101₂, which, as we have seen, is the binary representation of the number 5.40625.
It will not always be so favorable; in most cases you will get an infinite (but of
course periodic) binary fraction. To understand why this happens so often, it is enough
to remember that any finite or periodic decimal fraction can be represented as an irreducible simple fraction with an integer numerator and a natural denominator; in fact, this is the definition of a rational number. It is not difficult to see that those and only those rational numbers are representable as finite binary fractions whose denominator is a power of two. Of course, a similar restriction is present for decimal fractions, and in general in a number system on any base, but in the general case it is formulated in a milder way: a rational number represented as an irreducible fraction m/n is representable as a finite fraction in the system on base N if and only if some integer power of the number N is divisible by n. In particular, the fraction 1/4 can be represented as a finite decimal because 10² = 100 is divisible by 4; more generally, an irreducible simple fraction turns into a finite decimal fraction if and only if its denominator decomposes into prime factors in the form 2^k · 5^t: in this case we only need to take the greater of k and t and use it as the power to which to raise 10.
In the case of the binary system, things are tighter: a power of two, whatever it is, is divisible without remainder only by other powers of two. As applied to the conversion from decimal to binary, note that any finite decimal fraction is a number of the form m/10^k; in order for only twos to remain in the denominator, the numerator must be divisible by five the required number of times. Thus, the 5.40625 considered in the example above is 540625/100000, and the numerator 540625 happens to be divisible without remainder by 5⁵ = 3125 (the result of the division is 173), so after cancelling the fives, only a power of two remains in the denominator, which allows us to write this number as a finite binary fraction. But of course this is not always the case; in most cases (in particular, always when the last significant digit of the decimal fraction differs from 5) the resulting binary fraction will be infinite, although periodic. In such a case, you must follow the above procedure of successive multiplication by two until you get a working number that you have already seen; this will mean that you have hit a period; recall that a periodic fraction is written by putting its period in parentheses. For
example, for the number 0.35₁₀ we get 0.7 (write out 0), 1.4 (write out 1, leave 0.4), 0.8 (write out 0), 1.6 (write out 1, leave 0.6), 1.2 (write out 1, leave 0.2) and, finally, 0.4, which we already saw four steps ago. Hence, the period of the fraction is four digits long, and the result is 0.35₁₀ = 0.01(0110)₂.
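Spotting a working number we have already seen is exactly cycle detection; a Python sketch with exact fractions (the function name is ours):

```python
from fractions import Fraction

def frac_to_binary_periodic(x):
    """Return (prefix, period) of the binary expansion of a Fraction
    0 <= x < 1, detecting the period by remembering every working number."""
    seen = {}          # working number -> position at which it occurred
    digits = []
    while x not in seen:
        seen[x] = len(digits)
        x *= 2
        digits.append(int(x))
        x -= int(x)
        if x == 0:     # finite fraction: no period at all
            return ''.join(map(str, digits)), ''
    start = seen[x]    # the period begins where this number first appeared
    return (''.join(map(str, digits[:start])),
            ''.join(map(str, digits[start:])))

print(frac_to_binary_periodic(Fraction(35, 100)))  # ('01', '0110')
```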
Since periodic fractions "pop up" quite often when converting between number systems, it is useful to be able to determine which simple fraction corresponds to a given periodic fraction. Recall that it is not difficult to turn a finite decimal fraction into a simple one: set the integer part aside (in the worst case we will add it to the numerator after multiplying it by the denominator) and count the digits in the fractional part; the simple fraction we need will then have the form M/10^k, where M is the integer obtained by writing out the digits of the fractional part, and k is the number of these digits; it remains only to reduce the obtained fraction, if possible, and the job is done. For example, the number 17.325 has an integer part of 17 and a three-digit fractional part, so it equals 17 + 325/10³; bringing everything to a common denominator gives 17325/1000, which after reduction is 693/40.
For a number system on an arbitrary base, everything is done in exactly the same way, except that the base of the number system is used instead of 10. For example, for 10.1001₂ we have M = 1001₂ and k = 4, so the fractional part is 1001₂/10000₂; adding the integer part (10₂ · 10000₂ + 1001₂ = 101001₂), we get 101001₂/10000₂ (all numbers here, of course, are binary). Note that in the binary case the fraction is always irreducible, unless the fractional part originally contained insignificant trailing zeros; indeed, if the fractional part ends with a one, the integer in the numerator will be odd, while the denominator will always be a power of two.
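The finite case is a one-liner with exact fractions; a Python sketch (names ours), with digits given most significant first:

```python
from fractions import Fraction

def finite_positional_to_fraction(int_digits, frac_digits, base):
    """Whole part plus M / base**k, exactly as described in the text."""
    whole = 0
    for d in int_digits:
        whole = whole * base + d
    m = 0
    for d in frac_digits:
        m = m * base + d
    k = len(frac_digits)
    return whole + Fraction(m, base ** k)

# 10.1001 in binary: integer part 10₂ = 2, fractional part 1001₂/10000₂
print(finite_positional_to_fraction([1, 0], [1, 0, 0, 1], 2))  # 41/16
```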
But what if the fraction is infinite, even though it is periodic? They don't tell you
about this at school (unless it is a math school). At first glance, the situation may seem
hopeless, because the number k turns out to be "infinity"; but in reality there is nothing
difficult here, you just need to use a different technique. Let's start with the decimal
case to make it easier to understand how it works. First of all, let's discard the integer
part, and with it those digits of the fractional part that are not included in the period of
the fraction - we already know how to deal with them.
We will be left with only the infinitely repeating digits, the period itself. Let us denote the number in question by x and see what x · 10^k is, where k is the length of the period. Since the fraction is infinite, it will be the number obtained by writing out the digits of the period before the decimal point, with an "infinite tail" equal to x appended after it; to "discard" this tail, it is enough to subtract x, so that the number x · 10^k - x will always be a finite fraction (and if the period began with the first place after the decimal point, it will be an integer). It remains to find x by solving a trivial equation.
For example, consider the number 7.3327327327327... = 7.3(327). The part that we can already handle (7.3) we put aside, leaving 0.0(327), which we denote by x. The length of the period is k = 3, so we multiply by 10³ = 1000. We have 1000x = 32.7(327); to remove the accursed periodic "tail", we subtract x from both parts of the equality and get 999x = 32.7, so x = 327/9990. Remembering that we still have the 7.3 set aside, we turn it into 73/10 and get that the original number is 73/10 + 327/9990 = (73 · 999 + 327)/9990 = 73254/9990 = 12209/1665.
As usual, in a system on an arbitrary base everything happens in exactly the same way, only we need to replace 10 with the desired base. For
example, let's try to deal with the fraction 0.01(0110)₂, which appeared in our calculations a few paragraphs above. We put 0.01₂ aside, leaving 0.00(0110)₂, which we denote by x; the length of the period is 4, the base of the system is 2, so we will multiply by 10000₂ (i.e. by 16). We have:
10000₂ · x = 1.10(0110)₂
10000₂ · x - x = 1.10₂
1111₂ · x = 1.1₂
11110₂ · x = 11₂
x = 11₂ / 11110₂

Since 0.01₂ = 1/100₂ (all digits here are binary), we get:

0.01(0110)₂ = 1/100₂ + 11₂/11110₂ = 1111₂/111100₂ + 110₂/111100₂ = 10101₂/111100₂
In the decimal system this is 21/60 = 7/20 = 0.35, which is exactly the number from which we obtained the periodic fraction 0.01(0110)₂ above. We could also reduce the resulting simple fraction without converting it to decimal (10101₂/111100₂ = 111₂/10100₂) using, say, Euclid's algorithm for finding the greatest common divisor: taking two numbers, at each step subtract the smaller one from the larger one until they become equal. It is also curious to note here that 0.00(0110)₂ is nothing else
[Figure: column multiplication of 110001₂ by 1101₂, shown on the left with the zero rows written out and on the right with them omitted.]
Fig. 1.12. Multiplication in a column in the binary system
than one tenth; indeed, 0.35 = 0.25 + 0.1, and 0.25 is 1/4, that is, the same 0.01₂ that we "put aside".
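The equation above, solved once and for all, says that the pure period 0.(P) equals M/(base^k - 1), where M is the integer written by the period's digits; a Python sketch of this closed form (names ours):

```python
from fractions import Fraction

def periodic_to_fraction(prefix, period, base):
    """Value of 0.prefix(period) in the given base, as an exact fraction.
    Solves x * base**k - x = (finite number), i.e. x = M / (base**k - 1),
    then shifts the result past the non-repeating prefix."""
    k = len(period)
    m = 0
    for d in period:
        m = m * base + d
    x = Fraction(m, base ** k - 1)   # value of 0.(period) by itself
    p = 0
    for d in prefix:
        p = p * base + d
    shift = base ** len(prefix)
    return Fraction(p, shift) + x / shift

print(periodic_to_fraction([0, 1], [0, 1, 1, 0], 2))  # 7/20
```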
At this point, the reader may ask a quite reasonable question: how is it that we so dashingly perform operations on binary numbers? The answer will be worthy of Captain Obvious: we do it with the ordinary "column" arithmetic, the same one that is undoubtedly familiar to the reader from elementary school; it is only necessary to remember that we have a different base and only two digits at our disposal. Say, for addition, having written two binary numbers one above the other, in each column from right to left we count the ones. If there are none, we write zero; if there is one (1+0 or 0+1), we write one. If there are two ones, we can no longer write the result as a single binary digit, and we get the old school "write zero, keep one in mind". Taking the carry (that "in mind") into account, there can be three ones in a column, and then we get "write one, keep one in mind". This is how we, for example, got 1111₂ + 110₂ = 10101₂ (be sure to try it yourself!). Subtraction in a column in the binary system is also no more difficult than in decimal; you only need to remember that when borrowing from the higher digit, in the lower one you get two (10₂), and not ten at all.
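The column addition just described, carries and all, can be sketched in Python (the function name is ours):

```python
def add_binary(a, b):
    """Column addition of two binary strings, right to left, with carry."""
    res = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        s = carry
        if i >= 0:
            s += int(a[i]); i -= 1
        if j >= 0:
            s += int(b[j]); j -= 1
        res.append(str(s % 2))   # the digit we write
        carry = s // 2           # the "one in mind"
    return ''.join(reversed(res))

print(add_binary('1111', '110'))  # 10101
```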
A remark on the fraction addition above: the common denominator 111100₂ was obtained as 1111₂ · 10₂ · 10₂ (if something is unclear here, note the following fact: the school trick of "appending zeros" works in any number system, not only in decimal). The numerator of the first fraction had to be multiplied by 1111₂, but since that numerator was just a one, the multiplication caused no problems; the second numerator was not so simple, it had two ones (decimal 3), but we only had to multiply it by 10₂, i.e. simply append a zero, which we did.
Now let's see what column multiplication turns into in binary. On pg. 153 we
mentioned that Leibniz called the decimal system the fatal mistake of mankind; it is
suggested that he said this when he saw how easy it was to multiply numbers in binary.
We hope that the reader remembers how multi-digit decimal numbers are
multiplied in a column: the longer one is written out at the top, the shorter one at the
bottom, and the whole "upper" number is multiplied separately by each digit of the
"lower" one, writing out the result each time one digit to the left of the previous one. In
the binary system, everything is the same with one important difference: since there are
only two digits, the "upper" number has to be multiplied at each step either by zero
(which, as we understand, is quite simple) or by one (also, to put it bluntly, nothing
complicated). Zero chains can be written or not written; multiplication by one is
reduced, obviously, to mechanical rewriting of the first ("top") multiplier. The main
thing is not to get confused with shifts.
For example, let's try to multiply the numbers 49₁₀ = 110001₂ and 13₁₀ = 1101₂ in binary. Having written the numbers one above the other, we multiply the first multiplier (110001₂) first by one (that is, we simply write it out), then by zero (which is even easier: we write out the corresponding number of zeros), then two more times by one (see Fig. 1.12, left). Adding up the resulting column, we get 1001111101₂; let the reader check for himself that this answer is correct. The rows of zeros can be omitted, and then the column will look as in Fig. 1.12, right.
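"Writing out with shifts" is literally what the following Python sketch does (the function name is ours):

```python
def mul_binary(a, b):
    """Column multiplication: for every 1-digit of the second factor,
    add the first factor shifted left by that digit's position."""
    x, y = int(a, 2), 0
    for pos, digit in enumerate(reversed(b)):
        if digit == '1':
            y += x << pos        # "writing out with shifts"
    return bin(y)[2:]

print(mul_binary('110001', '1101'))  # 1001111101  (49 * 13 = 637)
```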
Reasoning from false premises has sometimes worked remarkably well and even produced correct results. In particular, it is known that the Montgolfier brothers lifted the first balloon in history into the air by filling it with smoke from a mixture of straw and wool: they regarded straw as the vegetative principle of life and wool as the animal principle, which, in their opinion, should lead to the possibility of flight; and the flight actually took place, despite the fact that the animal and vegetative principles had nothing to do with it. In other words, it is possible to obtain a perfectly correct result starting from completely false premises. Accordingly, an implication is false only if its left argument is true and its right argument is false (a lie cannot follow from the truth); in all other cases the implication is considered true.
If we denote the set {0, 1} by the letter B⁵⁹, then the logical operations of two arguments are simply the functions from B × B to B; among them, in particular, are Peirce's arrow (the negation of disjunction) and the Sheffer stroke (the negation of conjunction):

x ↓ y = ¬(x ∨ y)        x | y = ¬(x & y)

⁵⁹ From the word Boolean, after the English mathematician George Boole, who first proposed the algebraic treatment of logic.
The function labeled "=" is called equivalence: it is true when its arguments are equal and false when they differ. It is easy to see that equivalence is the negation of "exclusive or". Three more functions remain: the "reverse implication" (x ⇐ y = y → x) and the "greater than" and "less than" functions, which are the negations of the two implications. In total, starting with the constants, we have listed exactly 16 functions, that is, all of them.
If we consider logical functions of three arguments, their domain of definition, the set B × B × B, consists of eight triples {(0,0,0), (0,0,1), (0,1,0), ..., (1,1,0), (1,1,1)}; as a consequence, a logical function of three arguments is defined by a set of eight values, and the total number of such functions is 2⁸ = 256. In the general case, a function of n arguments is defined by a set of 2ⁿ values, so that there are 2^(2ⁿ) such functions in total: for four arguments this gives 2¹⁶ = 65536, for five, 2³² = 4294967296, and so on. If something in this paragraph seems unclear, reread the paragraph on combinatorics; if that doesn't help either, be sure to find someone who can explain to you what's going on here. It's not about the number of functions, of course; but if you don't understand something here, you definitely have problems with simple combinatorics, and that's no good.
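The count 2^(2ⁿ) can be verified by brute force: enumerate every possible value table over the 2ⁿ input tuples. A Python sketch (the function name is ours):

```python
from itertools import product

def boolean_functions(n):
    """Every function of n Boolean arguments is a table of 2**n values,
    so there are 2**(2**n) of them in total; count them by enumeration."""
    points = list(product((0, 1), repeat=n))       # the domain B x ... x B
    tables = product((0, 1), repeat=len(points))   # every possible value table
    return sum(1 for _ in tables)

print(boolean_functions(2))  # 16
print(boolean_functions(3))  # 256
```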
Returning to functions of two arguments, we note that conjunction and disjunction are in many ways analogous to multiplication and addition: thus, conjunction with zero, like multiplication by zero, always yields zero; conjunction with one, like multiplication by one, always yields the original argument; disjunction with zero, like addition with zero, also always yields the original argument. Because of this similarity, mathematicians often omit the conjunction sign in formulas, writing simply "xy" or "x · y" instead of "x & y" or "x ∧ y". In particular:

x & x = x        x ∨ x = x        x & ¬x = 0        x ∨ ¬x = 1        ¬¬x = x

These, of course, do not need to be memorized either. The reader is invited to find the corresponding reasoning on his own.
The so-called de Morgan laws deserve special mention:

¬(x ∨ y) = ¬x & ¬y        ¬(x & y) = ¬x ∨ ¬y

For some reason, the fact that these relations are nominal, that is, they bear the name of the person who supposedly discovered them, frightens many beginners: if someone had to "discover" these relations, and the discovery was even named after its author, then surely there is no way around rote learning. Meanwhile, everything here is actually quite elementary. The first relation: what do we need to make a disjunction false? If at least one argument is true, the whole disjunction becomes true; therefore, for it to be false, both arguments must be false, i.e. x must be false and y must be false. The second relation: what do we need to make a conjunction false? Apparently, it is enough for at least one of its arguments to be false. So much for all of de Morgan's "great and terrible" laws.
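Since the arguments range over only four pairs of values, both laws can be checked exhaustively; a Python sketch:

```python
from itertools import product

# check both of de Morgan's laws on all four pairs of truth values,
# encoding NOT x as 1 - x
for x, y in product((0, 1), repeat=2):
    assert (1 - (x | y)) == (1 - x) & (1 - y)   # not(x or y) == not x and not y
    assert (1 - (x & y)) == (1 - x) | (1 - y)   # not(x and y) == not x or not y
print("both laws hold on all inputs")
```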
To conclude the review of binary logic, let us give one recommendation. If you
have to solve a problem in which a certain logical formula is given and you are asked
to do something with it, first make sure that only conjunction, disjunction and negation
are used in the formula. If this is not the case, immediately get rid of all other operation
signs, reducing them to the first three; note that this can always be done. Take, for
example, the mysterious implication. For it to be true, it is sufficient for the first
argument to be false (for anything can follow from a lie); similarly, it is sufficient for
the second argument to be true (for truth can follow from both truth and falsehood). We
obtain that

x → y = ¬x ∨ y

Similarly, if you encounter "exclusive or", replace it with one of the following: "one of the arguments must be true, but not both at once" or "one must be false and the other true, or vice versa":

x ⊕ y = (x ∨ y) & ¬(x & y)        x ⊕ y = x¬y ∨ ¬xy

Note that the second form is obtained from the first by opening the brackets:

(x ∨ y) & ¬(x & y) = (x ∨ y)(¬x ∨ ¬y) = x¬x ∨ y¬x ∨ x¬y ∨ y¬y = ¬xy ∨ x¬y
By the way, we can write any logical function of any number of variables in a similar form just by looking at its truth table. Having chosen the rows where the value of the function is 1, for each such row we write out a conjunct consisting of all the variables, where the variables that equal zero in this row enter the conjunct with a negation sign. All the conjuncts obtained (there will be as many of them as there are argument tuples on which the function equals one) are then joined by disjunction signs. For example, for Peirce's arrow the corresponding expression consists of the single conjunct ¬x¬y, and for the Sheffer stroke, of three conjuncts: ¬x¬y ∨ ¬xy ∨ x¬y. If, say, we consider a function of three arguments f(x, y, z) which equals one on the four tuples {(0,0,0), (0,0,1), (0,1,1), (1,1,1)} and zero on all the others, then the corresponding expression for it will be ¬x¬y¬z ∨ ¬x¬yz ∨ ¬xyz ∨ xyz. This form of writing a logical function is called disjunctive normal form (DNF).
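The recipe for building a DNF from a truth table is mechanical enough to code directly; a Python sketch (names ours, with ~ standing in for negation):

```python
from itertools import product

def dnf(table, names):
    """Disjunctive normal form of a function given by its truth table:
    one conjunct per input tuple on which the function equals 1."""
    conjuncts = []
    for args in product((0, 1), repeat=len(names)):
        if table[args]:
            lits = [n if v else '~' + n   # zero-valued variables are negated
                    for n, v in zip(names, args)]
            conjuncts.append(' & '.join(lits))
    return ' | '.join(conjuncts)

# Peirce's arrow (NOR): true only when both arguments are 0
nor = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0}
print(dnf(nor, ['x', 'y']))  # ~x & ~y
```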
also countable. To number them, imagine a coordinate half-plane where the values of the denominator of a fraction are laid off along the horizontal axis (recall that the denominator must be natural, that is, starting from one) and the values of the numerator along the vertical axis. In other words, we need to come up with some numbering for the integer points of the coordinate plane lying to the right of the vertical axis, and each number must be numbered only once: for example, the fractions 1/2, 2/4, 3/6, etc. must not be assigned different numbers, because all of them denote the same number. This, however, is simple: when numbering we should, firstly, skip all reducible fractions, and secondly, all fractions with numerator 0 except the very first such fraction, 0/1, which will denote zero. The simplest example of such a numbering is constructed by "corners" diverging from the origin. We give the number one to the fraction 0/1. On the first "corner" fall the fractions 1/1, 0/2 and -1/1; the fraction 0/2 we, as agreed, skip, while the other two (the numbers 1 and -1) get numbers two and three. Moving along the next "corner", we number: 1/2 (#4), 2/1 (#5), 0/3 (skip), -1/2 (#6) and -2/1 (#7). On the "corner" after that we already have to skip reducible fractions: 1/3 (#8), 2/2 (skip), 3/1 (#9), 0/4 (skip), -1/3 (#10), -2/2 (skip), -3/1 (#11). Continuing the process "to infinity", we will assign natural
numbers to all the fractions, i.e. to all rational numbers, so this set is also countable. The situation is different with infinite decimal fractions: suppose that we have somehow managed to number all of them, and consider a fraction having zero
⁶⁰ Just in case, we remind the reader that an irrational number is a number that cannot be represented as a fraction m/n, where m is an integer and n is a natural number; an example of such a number is √2. It is very important not to make the common but no less monstrous mistake of declaring all infinite decimal fractions, such as those of 1/3 or 1/7, to be "irrational". The "infinity" of periodic fractions is actually a consequence of the choice of the number system and has nothing to do with the properties of the number itself; for example, in a number system with base 21, both of these fractions have a finite "21-ary" representation, while 1/2 becomes an infinite fraction.
integer part; as its first digit after the decimal point we take any digit except that which
is the first digit in fraction No. 1; as the second digit - any digit except that which is the
second digit in fraction No. 2, and so on. The resulting fraction will obviously be different from every fraction in our numbering, because it differs from the fraction with an arbitrary number n at least in the n-th digit after the decimal point. It turns out that in our (infinite!) numbering this new fraction has no number, and this does not
depend on how exactly we tried to number the fractions. It is easy to see that there are
an infinite number of such "unaccounted" fractions, although it is not so important. So,
no numbering can cover the whole set of infinite decimal fractions. The proof scheme
used here is called the Cantor diagonal method in honor of the German mathematician
Georg Cantor, who invented it.
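The "corners" numbering of the rationals described earlier can be sketched in Python (the function name and the exact traversal order within a corner are ours, matching the walk-through above):

```python
from fractions import Fraction
from math import gcd

def enumerate_rationals(limit):
    """First `limit` rationals in the 'corners' order: number one goes to
    0/1; corner k holds p/q with p + q = k + 1, then 0/(k+1), then the
    mirrored negatives; reducible and repeated-zero fractions are skipped."""
    out = [Fraction(0)]                  # number one goes to 0/1
    k = 1
    while len(out) < limit:
        corner = [(p, k + 1 - p) for p in range(1, k + 1)]
        corner.append((0, k + 1))
        corner += [(-p, q) for p, q in corner[:-1]]
        for p, q in corner:
            if p == 0 or gcd(abs(p), q) != 1:
                continue                 # skip extra zeros and reducibles
            out.append(Fraction(p, q))
            if len(out) == limit:
                break
        k += 1
    return out

print([str(f) for f in enumerate_rationals(7)])
# ['0', '1', '-1', '1/2', '2', '-1/2', '-2']
```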
The set of infinite decimal fractions is said to have the power of the continuum; mathematicians denote it by the symbol ℵ₁ ("aleph-one"). To understand how "much" this
is, let us conduct a mental experiment, especially since its results will be useful to us
when we consider the theory of algorithms. Suppose we have an alphabet , i.e. a finite
set of "symbols", whatever these symbols may be: they may be letters, numbers, any
signs at all, but they may just as well be elements of any nature, as long as they are a
finite set. Let us denote the alphabet by the letter A. Let us now consider the set of all finite chains composed of the symbols of the alphabet A, that is, the set of finite sequences of the form a₁, a₂, ..., a_k, where each aᵢ ∈ A. Such a set, including the empty chain (a chain of length 0), is denoted by A*. It is not difficult to see that, since the alphabet is finite and every single chain is finite too (although we do not limit their length, i.e. we can consider chains a billion symbols long, a trillion, a trillion trillion, and so on), the whole set of chains will be countable. Indeed, let the alphabet include n symbols. The empty chain will be numbered 1; the chains of one symbol will get numbers 2 to n + 1; the chains of two symbols, of which there are n², will get numbers n + 2 to n² + n + 1; similarly, the chains of three symbols will start with n² + n + 2 and end with n³ + n² + n + 1, and so on.
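This numbering of chains is easy to compute explicitly; a Python sketch (the function name is ours), grouping chains by length exactly as above:

```python
def chain_number(chain, alphabet):
    """Number of a finite chain over a finite alphabet: the empty chain
    gets 1, the n one-symbol chains get 2..n+1, and so on."""
    n = len(alphabet)
    number = 1
    # count all strictly shorter chains: 1 + n + n**2 + ...
    for length in range(len(chain)):
        number += n ** length
    # then the position of this chain among chains of the same length
    index = 0
    for ch in chain:
        index = index * n + alphabet.index(ch)
    return number + index

print(chain_number('', 'ab'))    # 1
print(chain_number('a', 'ab'))   # 2
print(chain_number('bb', 'ab'))  # 7
```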
Let us now consider an alphabet consisting of all the characters ever used in books
on Earth. Let's include Latin, Cyrillic, Greek alphabet, Arabic letters, exotic alphabets
used in Georgian and Armenian, all Chinese, Japanese and Korean characters,
Babylonian cuneiform, hieroglyphics of ancient Egypt, Scandinavian runes, add
numbers and all mathematical signs, think for a while if we have not forgotten
something, add everything that we remember or that our acquaintances advise us. It is
clear that such a super-alphabet, despite its rather impressive size, will still be finite.
Let us denote it by the letter V and see what we end up with in the set of chains V*
(recall that we also consider only finite chains).
Obviously, this set will include all books ever written on Earth. Not only that, it will also include all books that have not yet been written but someday will be⁶¹; all
⁶¹ The obvious objection here is that symbols may appear in the future which we have not included in V; but on closer examination nothing is changed by this: it is enough to devise some way of designating such symbols by means of the symbols already existing. For example, we can introduce some special symbol followed by ordinary decimal digits to indicate the number of the "new" symbol, and assign these numbers to all new symbols as they appear. Moreover, we could even consider an alphabet of two characters, 0 and 1, with all other characters encoded by combinations of zeros and ones; in fact, in the memory of computers everything is exactly like that.
§1.3. Now a little math 172
books that have never been and will never be written, but theoretically could have been
written; as well as books that no one would ever write - for example, all books whose
text is a chaotic sequence of symbols from different writing systems, where the German
letter "B" is adjacent to the pre-revolutionary Russian "yat", and Japanese characters
are interspersed with Babylonian cuneiform.
If this is not enough for you, imagine a book as thick as the radius of the Solar
system, with a sheet diagonal reaching from us to the neighboring galaxy, and typeset
in the usual 12-point font; it is clear that it is physically impossible to make such a book,
but, nevertheless, the set V* will include not just "this book": it will include all such
books, differing from each other by at least one symbol. And not only these books - the
Universe is infinite! Who's stopping us from imagining a book a billion light years in
size? And all such books?
If your imagination has not yet failed you, you can only be envied. And after all this,
we remember that the set V* is "only" countable! To get a continuum, we need to
consider not just books of ridiculously huge size, we need infinite books, the kind that
have no size at all, that never end, no matter how much we move along the text; in
comparison with this, a book with a format of a billion light years turns out to be
something like a child's toy. Moreover, such an "infinite book" itself, taken separately,
does not yet give a continuum, it is only one; no, we need to consider all infinite books,
and then we will see that the considered set (of infinite books) has the power of a
continuum. But it is not quite clear how exactly we are going to "consider" these infinite
books, especially directly in all their variety, if we cannot even imagine one such book.
For those readers who want to test their imagination, we would advise them to search the
Internet for articles about the so-called Graham's number. If, say, we cut the whole observable
part of the Universe into Planck volumes (currently considered in physics to be the smallest
possible volume, one that cannot be divided into parts) and imagine a number whose decimal
record occupies the whole Universe (or rather, all of its part at least somehow available for
observation from the Earth), with each decimal digit of the record the size of one Planck
volume, the resulting monster still cannot hold a candle to Graham's number; even if in each
Planck volume of our Universe we "cram" one more such universe and fill all these universes
to the brim with decimal digits - it will not bring us much closer to Graham's number.
Imagination fails long before the description of the number reaches the finish line, despite
the fact that the number is quite correctly formulated in mathematical language. Of course,
the record of Graham's number is also included in our V*, which is understandable - this
number is mathematically defined (i.e. there really exists a text giving its definition) and
is a solution to a clearly formulated problem.
Infinite texts, like infinite fractions, you might say, play in a different league - there, trying
to imagine something like that is simply futile from the start.
It is quite obvious that it is impossible to operate with continuous infinities in any
constructive sense; countable infinities represent an absolute limit not only for the
human brain but for any other brain, unless such a brain happens to be infinite in itself.
Frankly speaking, even when working with countable infinities, we never consider the
infinities themselves; we simply state that whatever the element N is, there will always
be an element N + 1; in other words, whatever set of elements we have already examined,
we can always point out another element that belongs to the set but has not yet been
examined. Here, by and large, there is no "infinity" at all; there is
simply our refusal to consider some "ceiling" above which for some reason it is
forbidden to jump.
At the same time, in mathematics we consider not only continuum infinities, but
also infinities exceeding the continuum: the simplest example of such an infinity is the
set of all sets of real numbers (its power is denoted by ℵ2). Of course, such constructions
do not carry any constructive sense; as long as we set ourselves applied tasks, the
appearance of a continuum infinity in our reasoning should serve as a kind of "warning
light": be careful, going beyond the limits of the constructive. Does this mean that math
is somehow bad? Of course not; it's just that math doesn't have to be constructive at all.
Mathematicians are constantly testing the limits of human thinking, and for this alone
they should be thanked, as well as for the fact that it is mathematicians who provide the
general public with the means to develop brain capacity; for the sake of this effect alone
- the development of one's own intellectual capacity - mathematics is certainly worth
studying. It's just that not every mathematical model is suitable for applied or, if you
will, engineering purposes; that in itself is neither bad nor good, it's just a fact.
The situation is even more curious with the question whether there exist sets which cannot
be numbered (i.e. are uncountable) but which are "smaller" than the continuum; in other
words, whether there are any other infinities between the countable and the continuum. The
existence of such infinities
can neither be proved nor disproved, i.e. we have the right to think that such sets do not exist,
but just as we have the right to think that they do. It is clear, however, that it will not be possible
to construct such a set, that is, to describe the elements of which it would consist, the way
we described natural numbers for countable sets and infinite decimal fractions for the continuum; if
this were possible, the existence of such sets would be proved, and this is impossible (and
this very impossibility, strangely enough, has been proved). That is, even if we assume that
such sets exist, nobody will let us "touch" them.
Both the initial data and the results of a computer's work can be represented as texts
over some alphabet[62], i.e. as elements of the set A* already familiar to us from the
previous paragraph. Taking this into account, we can consider that the computer's work
always consists in calculating some function of the form A* → A* (the expression X → Y
denotes a function with the domain of definition X and the range of values Y).
[62] Recall that an alphabet can be understood as an arbitrary finite set, usually consisting
of at least two elements, although in some problems alphabets of one symbol are also
considered. An alphabet cannot be empty.
In doing so, we can easily notice a certain problem. If we can calculate the value
of a function (in any sense), then obviously we can somehow write down our
intermediate calculations, i.e., we can represent the calculation as a text. Moreover, if
a function can be computed in some sense, then we can represent in the form of a text
the rule itself of how this function should be computed. The set of all possible texts, as
we already know (see page 172), is countable. Meanwhile, by means of the same
Cantor diagonal method, it is easy to see that the set of all functions over the natural
numbers has the power of the continuum; if we replace texts, i.e. elements of the set
A*, by their numbers (and this can be done by virtue of the countability of the set A*),
it turns out that functions of the form A* → A* also form a continuum, whereas the set of
all possible rules for calculating a function is no more than countable, because they can
be written in the form of texts. Consequently, it is not possible to specify for each such
function the rule by which it will be computed; some functions are said to be
computable, while others are not.
Even if we consider only functions whose domain of definition is the set of natural
numbers and whose domain of values is the set of decimal digits from 0 to 9, such a set
of functions will be a continuum: indeed, to each such function we can mutually
uniquely correspond an infinite decimal fraction with zero integer part, taking f (1) as
the first digit, f (2) as the second, f (27) as the twenty-seventh, and so on, and the set of
infinite decimal fractions, as we have already seen, is uncountable, i.e. a continuum. It
is clear that if we extend the domain of values to all natural ones, the functions will not
become less; however, they will not become "more" either - they will remain the same
continuum. At the same time, the computable functions, let us remind you, are not more
than countable infinity, because each of them corresponds to a description, i.e. a text,
and the set of all texts is countable. It turns out that there are far more "natural"
functions than there are computable ones, whatever we mean by computability - provided
only that we mean that the rules of constructive computation can be written down.
Since infinite binary fractions are also a continuum, we can simplify the set of functions
under consideration even more, leaving only two variants of the result: 0 and 1, or "false" and
"true". Such functions, which, taking a natural argument, produce truth or falsehood, also turn
out to be a continuum, from which it immediately follows that the set of all sets of natural
numbers is also uncountable (has the power of the continuum): indeed, every such function
defines a set of natural numbers, and, conversely, to every set of natural numbers there
corresponds a function which produces truth for the elements of the set and falsehood for
the numbers not included in it. We shall not need this result in what follows, but it is so
beautiful that it would be barbaric not to mention it.
A computer performs its calculations by obeying a program that embodies a certain
constructive procedure, or algorithm. Simply put, in order for a computer to be useful
to us in any way - and it can only be useful by creating one piece of information from
another - we need someone who knows exactly how to get "another" piece of
information from this "one" piece of information, and knows it so well that he or she
can make the computer put this knowledge into practice without direct control from the
owner of the original knowledge; this person, in fact, is called a programmer. As it is
easy to guess, an algorithm is the very rule by which a function is computed; we can
say that a function should be considered computable if there is an algorithm for its
computation.
The simplicity of this reasoning is actually deceptive; the concepts of algorithm
and computable function turn out to be very complicated. Thus, in almost any school
computer science textbook you will find a definition of an algorithm - not an
explanation of what we are talking about, not a story about the subject, but a definition,
a short phrase like "an algorithm is this and that" or "an algorithm is called this and
that". Such a definition is usually highlighted in big bold font, surrounded by a frame,
provided with some pictogram with an exclamation mark - in short, everything is done
to convince both students and their teachers that it should be memorized by heart.
Unfortunately, such definitions are good only as a rote-learning exercise. In fact, there
is no such thing as a definition of an algorithm, that is, whatever "definition" is given,
it will be obviously wrong: every author who gives such a definition makes a factual
error the moment he decides to start formulating it, and it does not matter at all what
the final formulation will be. There is no correct definition, not in the sense that "there
may be different definitions" or even that "we do not know the exact definition now,
but perhaps we will someday"; on the contrary, we know for sure that there is no
definition of an algorithm, and there cannot be, because any such definition,
whatever it may be, would take the basis out from under an entire section of
mathematics - the theory of computability.
To understand how this happened, we need another excursion into history. In the
first half of the 20th century, mathematicians became interested in the question of how,
among all the theoretical variety of mathematical functions, to identify those that a
person using any mechanical or any other devices can calculate. This was initiated by
David Hilbert, who in 1900 formulated a list of unsolved (at that time) mathematical
problems known as "Hilbert problems"; the problem of solving an arbitrary
Diophantine equation, known as Hilbert's tenth problem, later turned out to be
unsolvable, but to prove this fact it was necessary to create a theory formalizing the
notion of "solvability": without this, it is impossible to say anything definite about the
set of problems that can be constructively solved, or about what should be understood
by constructiveness. Problems of problem solvability (in other words, computability of
functions) were dealt with by such famous mathematicians as Kurt Gödel, Stephen
Kleene, Alonzo Church and Alan Turing.
Functions operating with irrational numbers had to be discarded immediately.
Irrationals turned out to be "too many"; in the previous paragraph we gave some
explanations why continuous infinities are not suitable for constructive work.
Since continuum infinities do not admit constructive computation, which limits us to
countable sets, Gödel and Kleene proposed to consider for theoretical investigations only
functions of natural arguments (possibly several) whose values are also natural
numbers; if necessary, any function working on arbitrary countable sets (including,
importantly for us, the set A*) can be reduced to such "natural" functions by replacing
elements of the sets by their numbers.
Common sense suggests that even such functions are not always computable;
reference to common sense is required here because we have not yet understood (and,
strictly speaking, will not understand) what a "computable function" is. Nevertheless,
as already mentioned, functions of the form N → N (where N is the set of natural
numbers) form a continuum, while algorithms seem to be no more than a countable set;
the total number of possible functions, even when we consider "just" natural functions
of a natural argument, is "much larger" than the number of functions that can somehow
be computed.
Studying the computability of functions, Gödel, Kleene, Ackermann and other
mathematicians came up with a class of so-called partially recursive functions. The
definition of this class consists of a basic set of very simple initial functions (the
constant, the increment by one, and projections - functions of several arguments whose
value is one of the arguments) and operators, i.e., operations on functions that allow
new functions to be constructed (the composition, primitive recursion, and minimization
operators); a partially recursive function is any function that can be constructed from
the listed initial functions with the help of the listed operators. The word "partially"
in the name of the class indicates that this class necessarily includes functions that
are defined only on some set of numbers[63], while for numbers not included in this set
they are not defined, i.e. cannot be computed. Note that the epithet "recursive" in this
context means that the functions are expressed one through another - possibly even each
through itself, though not necessarily. As we will see later, in programming the meaning
of the term "recursion" is somewhat narrower.
Numerous attempts to extend the set of computable functions by introducing new
operations were not successful: each time it was proved that the class of functions
defined by new sets of operations turns out to be the same as the already known class
of partially recursive functions, and all new operations are safely (though in some cases
rather cunningly) expressed through the existing ones.
Alonzo Church abandoned further attempts to extend this class and stated that it
seems to be precisely the partially recursive functions that correspond to the notion of
a computable function in any reasonable understanding of computability. This claim is
called Church's thesis. Note that Church's thesis cannot be regarded as a theorem - it
cannot be proved, since we have no definition of a computable function, much less a
definition of a "reasonable understanding". But why not, you may ask, provide some
definition so that Church's thesis is provable? The answer here is very simple. By
turning Church's thesis into a supposedly proven fact, we would be depriving
ourselves, quite unreasonably, of the prospects for further research on
computability and various mechanisms of computation.
So far, all attempts to create a set of constructive operations richer than the one
proposed earlier have failed: each time it turns out that the class of functions is exactly
the same. It is quite possible that this will always be the case, that is, the class of
computable functions will never be expanded; this is what Church's thesis asserts. But
this cannot be proved, if only because it is not quite clear what a "constructive
operation" is and what their set is. Hence, there is always the possibility that in the
future someone will come up with a set of operations that will be more powerful than
the basis for partially recursive functions. In this case, Church's thesis will be refuted,
[63] Why this is so important, we shall learn a little later; see the discussion on page 190.
or, more precisely, a new thesis will appear in its place, similar to the existing one, but
referring to a different class of functions. Let us emphasize that a definition of a
computable function will not appear even then: the fact that the class of computable
functions has been extended once cannot by itself mean that it cannot be extended further.
With a bit of a stretch we may consider that the class of partially recursive functions with
all its properties represents an abstract mathematical theory like Euclid's geometry or, say,
probability theory, while the notion of computability as such lies outside mathematics, being a
property of our Universe (the "real world") along with the speed of light, the law of universal
gravitation, and the like. Church's thesis in this case turns out to be a kind of scientific
hypothesis about how the real world works; everything finally falls into place if we remember
that, according to Karl Popper's theory of scientific knowledge, hypotheses are never true, but
only not yet refuted, and the researcher must take into account that any hypothesis, no matter
how much supporting evidence it finds, may be disproved in the future. Church's thesis states
that any function
that can be constructively computed is in the class of partially recursive functions; no one has
yet been able to disprove this statement, and we therefore accept it as true. Note, Popper's
falsification criterion applies perfectly well to Church's thesis. Indeed, we can (and easily
enough) specify such an experiment, the positive result of which would disprove Church's
thesis: it is enough to construct some constructive automaton that would compute a function
that does not belong to the class of partially recursive functions.
The formal theory of algorithms is constructed in much the same way as the theory
of computability. An algorithm is said to be a constructive realization of some
transformation from an input word to a result word, and both the input word and the
result word are finite chains of symbols in some alphabet. In other words, in order to
be able to discuss an algorithm, we must first fix some alphabet A, and then algorithms
will turn out to be constructive realizations of the already familiar transformations of the
form A* → A*, that is, simply speaking, realizations (if you like, constructive rules of
computation) of functions of one argument, whose argument is a word composed of symbols
of the alphabet and whose result is also such a word. Of course, all
this can in no way be considered a definition of an algorithm, since it relies on such
expressions as "constructive realization", "constructive rules of computation", and
these "terms" themselves remain undefined. Continuing the analogy, we note that not
every such transformation can be realized by an algorithm, because there is a continuum
of such transformations, while algorithms are, of course, no more than a countable set,
because whatever we understand by an algorithm, we, in any case, mean that it can be
written down in some way, i.e., represented as a finite text, and the set of all possible
texts is countable. Moreover, it would not be quite right to identify an algorithm with
the transformation it performs, since two different algorithms can perform the same
transformation; we will return to this question shortly.
Alan Turing, one of the founders of the theory of algorithms, proposed a formal
model of an automaton known as a Turing machine. This automaton has a tape, infinite
in both directions, in each cell of which a character of the alphabet can be written or
the cell can be empty. Along the tape moves a head, which can be in one of several
predetermined states; one of these states is designated as the initial one (the head is in
it when work begins), and another as the final one (upon entering it the machine finishes
its work). Depending on the current state and the character in the current cell, the
machine can:
• write to the current cell any character of the alphabet in place of the one currently
written there, including the same character, i.e. leave the cell contents
unchanged;
• change the state of the head to any other state, including remaining in the state
the head was in before;
• move one position to the right, one position to the left, or remain in the current
position.
A program for a Turing machine, more commonly referred to simply as a "Turing
machine", is represented as a table specifying what the machine should do for each
combination of the current symbol and the current state; the symbols are marked
horizontally, the states are marked vertically (or vice versa), and three values are
recorded in each cell of the table: a new symbol, a new state, and the next move (left,
right, or stay put). An input word is recorded on the tape before the operation is started;
if, after some number of steps, the machine has moved to a final state, the word now
recorded on the tape is said to be the result of the operation.
Turing's thesis states that whatever reasonable understanding of an algorithm is,
any algorithm corresponding to this understanding can be realized as a Turing machine.
This thesis is confirmed by the fact that many attempts to create a "more powerful"
automaton have failed: for each created formalism (formal automaton) it is possible to
specify how to build a Turing machine analogous to it. Many such formalisms have
been constructed: normal Markov algorithms, all kinds of automata with registers,
and variations on the theme of the Turing machine itself - Post's machine,
machines with several tapes, and so on. Each such "algorithmic formalism", being
considered as a concretely defined working model instead of the "elusive" notion of an
algorithm, turns out to be in one way or another useful for the development of theory,
and in some cases - for practical application; there are, in particular, programming
languages based on the lambda calculus (which should be referred rather to the theory
of computable functions), as well as languages whose computational model resembles
Markov algorithms. Thus for each such formalism it was proved that it can be realized
on a Turing machine, and a Turing machine can be realized on it.
Nevertheless, it is impossible to prove Turing's thesis, because it is impossible to
define what a "reasonable understanding of the algorithm" is; this does not exclude the
theoretical possibility that sometime in the future Turing's thesis will be disproved: for
this purpose it is enough to propose some formal automaton corresponding to our
understanding of the algorithm (i.e. constructively realizable), but having
configurations that do not translate into a Turing machine. The fact that no one has yet
managed to propose such an automaton does not formally prove anything: what if
someone is luckier?
All this is very similar to the situation with computable functions, partially
recursive functions and Church's thesis, and this similarity is not accidental. As we have
already noted, all transformations of the form A* → A* can be turned into
transformations of the form N → N by replacing an element of the set A* (i.e. a word)
with its number (which can be done since the set A* is countable), and vice versa by
replacing the number of the word with the word itself. Moreover, it is proved that any
transformation realized by a Turing machine can be given as a partially recursive
function, and any partially recursive function can be realized as a Turing machine.
Does this mean that "computable function" and "algorithm" are the same thing?
Formally speaking, no, and there are two reasons for this. First, both notions are
undefined, so it is impossible to prove their equivalence, as well as to disprove it.
Secondly, as already mentioned, these notions are somewhat different in their content:
if two functions written in different ways have the same domain of definition and
always give the same values for the same arguments, it is usually considered that we
are talking about two records of the same function, whereas when applied to two
algorithms, we speak about the equivalence of two different algorithms.
An excellent example of such algorithms is the solution of the famous problem of the
Towers of Hanoi. The problem involves three rods, one of which has N flat disks of different
sizes on it in the form of a kind of pyramid (the largest disk at the bottom and the smallest at
the top). In one move we can move one disk from one rod to another, and if the rod is empty,
any disk can be placed on it, but if there are already some disks on the rod, we can place only
the smaller disk on top of the larger one, but not vice versa. It is impossible to take several
disks at once, that is, we move only one disk per move. The task is to move all disks from one
rod to another in the smallest possible number of moves, using the third one as an intermediate
one.
It is well known that the problem is solved in 2^N - 1 moves, where N is the number of
disks, and, moreover, the recursive[64] algorithm for solving the problem is well known,
as outlined, for example, in Ya. Perelman's book "Living Mathematics" [5], first published
in 1934.
The recursion basis can be the transfer of one disk in one move from the source rod to the
target rod, but it is even easier to use as a basis the degenerate case when the problem has
already been solved and nothing needs to be transferred anywhere, i.e. the number of disks
to be moved is zero. If we need to move N disks, we use our own algorithm (i.e., call it
recursively) to first move N - 1 disks from the source rod to the intermediate rod, then
move the largest disk from the source rod to the target rod, and again call ourselves to
move N - 1 disks from the intermediate rod to the target rod.
To implement this algorithm in the form of a program, we need to come up with some
rules for recording moves. This is quite simple: since only one disk is moved, the move is
represented as a pair of rod numbers: from which one and to which one we are going to move
the next disk. The initial data for our program will be the number of disks. We will write the
texts of this program in Pascal and C later, when we have studied these languages sufficiently;
for the impatient reader we can suggest to look at §2.11.2, where the solution in Pascal is
given, and for the solution in C we will have to turn to the second volume of our book (§4.3.22).
Note, ahead of time, that the recursive subroutine that performs the actual solution of the
problem will take eight lines in both languages, including the header and operator brackets,
and everything else that we have to write is auxiliary actions that check the correctness of the
input data and translate them from textual to numerical representation.
Let us now consider another solution to the same problem, this time without recursion.
Perelman did not give this solution, so it is far less widely known[65]; at the same time, in
words this solution is described much more simply than the recursive one. So, on odd
moves (the first, third, fifth and so on) the smallest disk (i.e. disk #1) moves "in a circle":
from the first rod to the second, from the second to the third, from the third to the first,
and so on - or, on the contrary, from the first to the third, from the third to the second,
from the second to the first, and so on. The choice of the "direction" of this circle depends
on the total number of disks: if it is even, we go in the "natural" direction, i.e.
1 → 2 → 3 → 1 → ...; if it is odd, we go around the circle in "reverse": 1 → 3 → 2 → 1 → ...
The disk to move on even moves is determined unambiguously by the fact that we must not
touch the smallest disk, and there is only one way to make a move without touching it: we
simply look at the two rods that do not have the smallest disk on them and make the only
possible move between them.
[64] When we talk about algorithms or program fragments, recursion refers to the use of such
an algorithm (or such a program fragment) by itself to solve a simpler case of a problem.
[65] The author does not claim the laurels of the inventor of this variant of the algorithm: he
remembers distinctly that he was told this solution in his student years by one of the senior
students who taught a special seminar in his group, but it is difficult to remember who exactly
it was.
Strange as it may seem, the computer program embodying this variant of the puzzle
solution turns out to be much (more than ten times!) more complicated than the one given
above for the recursive case. The point here is the formal realization of the phrase "look at
the two rods and make a single move". A person solving the puzzle will indeed look at the rods,
compare which of the upper disks is smaller, and move that disk. To do the same in a program,
we would have to remember which disks are currently present on which rod, which is not too
difficult if you know how to work with single-linked lists, but still difficult enough not to insert
the text of this program into a book - at least not in its entirety. The recursive solution was so
easy because we didn't remember which disks were present where, we knew what moves to
make without it.
What we can say for sure is that, given the same input data, this program will print the
same moves as the previous (recursive) program, although the program itself will obviously
be written quite differently. The terminological problem that arises at this point is as follows.
Obviously, we are talking about two different programs (such programs are called equivalent).
But are we talking about two different algorithms or about the same algorithm written in
different words? In most cases, the "obvious" answer is that these are two different algorithms,
albeit equivalent, i.e. realizing the same function (of course, computable).
Nevertheless, it is usually considered that the theory of algorithms and the theory
of computable functions are more or less the same thing. Common sense suggests that
this is correct: both represent some "constructive" transformation from some countable
set to itself, only the sets are considered different: in one case it is the set of chains over
some alphabet, in the other case it is simply the set of natural numbers. The numbering
of chains makes it easy to pass from one to the other without violating
"constructiveness", whatever it is; hence, we are dealing with the same essence, only
expressed in different terms. Whether to distinguish transformations only in terms of
their "external manifestations", as in the theory of computability, or in terms of their
"concrete embodiment", as in the theory of algorithms, is ultimately a matter of
tradition. Moreover, many authors use the combined notion of the "Church-Turing thesis",
implying that the two theses should not be separated at all: they speak of the same thing,
only in different terms.
After all that has been said, any definition of an algorithm is perplexing at best:
if we gave such a definition, we would thereby scrap the theory of computability
and the theory of algorithms, Church's thesis together with Turing's thesis, numerous
attempts to build something more complex - for example, a Turing machine with
multiple tapes - together with the proofs of their equivalence to the original machine,
the work of hundreds, perhaps even thousands of researchers; and the definition would
still turn out to be either wrong, or so vague that it would be impossible to use.
Does this mean that the notion of algorithm cannot be used at all because of its
indefiniteness? Of course not. First, if we speak about mathematics, we often use
concepts that have no definitions: for example, we cannot define a point, a line and a
§1.3. Now a little math 181
plane, but this fact does not cancel geometry in any way. Secondly, if we speak about
the sphere of engineering and technical knowledge, strict definitions are very rare here
in general, and this does not bother anyone. Finally, we should take into account the
theses of Church and Turing: with them taken into account, the notion of an
algorithm acquires quite rigorous mathematical content; one need only
remember where this content came from, i.e., not forget the role of the theses of
Church and Turing in our theory.
input, such an algorithm would produce the numbers 2 and 3 as the result, whereas for
the equation x² − x − 1 = 0 it would, using one or another textual notation of the square
root, produce the expressions (1 + √5)/2 and (1 − √5)/2 as the answer. It is important to realize that
such an algorithm will not operate on the numerical value of √5 at any point, since that value
has no discrete representation.
Note that the number √5 has a rather simple "analog" representation: it is enough to fix
a certain standard of unit length, for example, a centimeter, and take a rectangle with sides 1
and 2; its diagonal will then represent exactly the root of five. In other words, we can construct
(with compass and ruler, if you will) a segment that is exactly √5 times longer than the
unit length. If you think about it a bit, you can, starting from initial integer values, obtain √5 as a
force acting on some body, as a volume of liquid in a vessel, and as similar analog physical
quantities. Well, algorithms do not work with anything like that; as we have
already said, the whole theory of computability is built exclusively on integers, and computers
operate with integers (eventually).
In principle, so-called analog computers are known, working just with continuous
physical processes; initial, intermediate and resulting data in such machines are represented
by values of electrical quantities, most often voltage; but the functioning of analog computers
has nothing to do with algorithms.
The third fundamental property inherent in any algorithm is not so obvious; it is
called determinism and consists in the fact that an algorithm does not leave any
"freedom of choice" to the executor: the prescribed procedure can be followed in one
and only one way. The only thing that can affect the algorithm's execution is the initial
data; in particular, given the same initial data, the algorithm always produces the same
results.
Readers who have already had a taste of practical programming may notice that programs,
especially games (and also, for example, cryptographic programs), often deviate from
this rule by using random number generators. In fact, this in no way contradicts the theory of
algorithms: random numbers, wherever they come from, should simply be considered a
part of the initial data.
The properties of finiteness (in the sense that every algorithm has a finite
representation), discreteness and determinism are inherent in all algorithms
without exception, i.e., if something does not possess any one of these properties, it is
obviously not an algorithm. Of course, these properties should be regarded rather as
axioms, i.e. statements that are accepted despite their unprovability: it is simply
postulated that algorithms, whatever is meant by them, are always like this. With some stretch
one could claim that an algorithm must also be understandable to its executor, but this
is already borderline: understandability is a property not of the algorithm but of
its record, and an algorithm and its record are not the same thing; moreover, one
algorithm can have infinitely many records even within the same record system: for
example, if you rename all the variables in a program or swap its subroutines around, the
algorithm will not change.
Along with the mandatory properties listed above, an algorithm may have (or
may well lack) certain particular properties, such as the mass property, completability
(applicability to individual input words or to all possible input words), correctness and
usefulness (whatever those mean), and so on. Perhaps these properties are desirable, but
nothing more: they are characteristics that an individual algorithm may or may not possess,
and in many cases such possession cannot (provably cannot!) even be verified. Unfortunately,
there are many literary sources of doubtful quality (which, sadly, include some school
textbooks of computer science) where the mandatory properties of algorithms are piled into
one heap with the particular ones, generating an absolutely fantastic muddle.
Let us start with the mass property of an algorithm, as the simplest one. This
property is usually understood in the sense that an algorithm should be able to solve a
family of problems rather than a single problem; this is why an algorithm is made
dependent on input data. The mass property is obviously verifiable, i.e., looking at
a particular algorithm, it is easy to determine whether it possesses this property or not; but that
is about all. The property is in no way obligatory: an algorithm that does not depend
on input words at all does not cease to be an algorithm. Just as there are constants among
computable functions, so among algorithms there are generators of a single result. By the
way, the famous "Hello, world" program belongs to this category - the example of
a first program in a programming language so beloved by many authors; all
it does is print the phrase "Hello, world!" and terminate. Incidentally, we too will
start learning Pascal, and then C, with this very program. Obviously, a program that
always prints the same phrase does not depend on any input data at all and, therefore,
lacks the mass property. If we considered this property mandatory for an algorithm,
we would have to conclude that programs like this one do not implement any algorithms.
If, say, we consider ordinary triangles in the plane, we can notice that every triangle obeys
the triangle inequality (that is, the sum of the lengths of any two sides is obviously greater than
the length of the third side), and also its sum of angles is always equal to 180°; these are
properties of all triangles without exception. Besides, among all triangles there are also right-
angled triangles for which Pythagoras' theorem is satisfied; of course, Pythagoras' theorem is
a very important and useful property, but it would be absurd to demand that it hold for
all triangles. The same is true of the mass property of algorithms.
When reasoning about the properties of algorithms fails to draw an explicit
distinction between mandatory properties, i.e., properties inherent in all algorithms
without exception, and particular properties, which an algorithm may or may not
possess, such reasoning results in gross factual errors; we will now discuss the
most popular of them. This discussion will allow us, first, not to repeat the mistakes,
whose popularity does not in any way cancel their grossness, and, second, in the course
of the discussion we will learn some more interesting aspects of the theory of
algorithms.
It is surprisingly common to encounter the claim that any algorithm must possess
the properties of "correctness" and "completability", in other words, that any algorithm
must always terminate in a finite number of steps, and not just terminate, but produce
a correct result. Of course, we would like all algorithms (and, therefore, all computer
programs) never to hang and never to make any errors; as they say, wanting is not
harmful. In reality, things are quite different: such a "paradise-ideal" state of affairs turns
out to be fundamentally unattainable, not just technically but, as we will soon
see, purely mathematically. One might as well demand that all cars be equipped with
perpetual motion machines and that the number π be equal to three to make
counting easier.
Let us start with "correctness". It is obvious that this notion is simply impossible to
formalize; moreover, it is often impossible to specify any criterion for checking
"correctness" at all when developing an algorithm, or, even worse, different people may
use different criteria when evaluating the same algorithm. This property of the subject
area is well known to professional programmers: one and the same behavior of a
computer program may seem correct to its author and not only incorrect to the
customer, but even outrageous in some cases. No formal descriptions, no detailed
testing can correct this situation. Among programmers it is believed, not without reason,
that there are no "correct" programs, only programs in which no errors
have been found yet - which does not mean there are none. Some time ago
the topics of formal verification of programs, proof-oriented programming and the like were
popular among programmer-researchers; no one achieved any encouraging results in this field.
Clearly, an algorithm S′ can be either self-applicable or not self-applicable. Consider its
application to itself, that is, S′(S′). Suppose S′ is self-applicable; then the result of S(S′)
is true, and hence S′(S′) will, according to its own definition, go into an infinite loop,
i.e., S′ will not be self-applicable, which contradicts the assumption. Suppose now that
the algorithm S′ is not self-applicable. Then the result of S(S′) will be false, so that S′(S′)
will safely terminate, that is, S′ will turn out to be self-applicable, which again contradicts
the assumption. Since both cases lead to a contradiction, S′ can be neither self-applicable
nor non-self-applicable; hence, it simply does not exist. As a consequence, the algorithm S
does not exist either, otherwise S′ could be written (no matter which of the algorithmic
formalisms we use: both branching and the infinite loop are available in all of them).
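The reasoning above can be condensed into a C-like pseudocode sketch (S and S′ are hypothetical; the whole point of the proof is that no such S can actually be written):

```
/* Hypothetical: S(p, x) returns true if algorithm p terminates on input x. */
bool S(algorithm p, word x);

/* The diagonal algorithm built on top of S: */
void S_prime(word p)
{
    if (S(p, p))          /* would p terminate when applied to itself? */
        while (1) { }     /* ...then loop forever */
    /* otherwise terminate immediately */
}

/* S_prime(S_prime) terminates if and only if it does not terminate -
   a contradiction, so neither S_prime nor S can exist. */
```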
Let us return to the property of "completability". Usually, when it is mentioned,
there is no mention of "input words", i.e., authors who attribute this property to an
algorithm seem to mean that the algorithm should be, following the classical
terminology, applicable to any input word. Meanwhile, in fact, it is impossible (in the
general case) to check even the applicability of the algorithm to one single input word,
let alone to the entire infinite set of such words. Of course, we can specify trivial cases
of such algorithms: for example, an algorithm that always produces the same value
regardless of the input word and does not analyze this input word in any way will be
obviously applicable to any input word. However, not only in the "general case", but
67 Here we are reasoning according to a simplified scheme - in particular, we do not distinguish
between an algorithm and its record; moreover, we talk about "an algorithm in general", whereas from
the formal point of view we should use one of the strictly defined formalisms, say the same Turing
machine. All this can easily be made rigorous, but the conciseness of the exposition would suffer greatly,
and our task now is not to give a formal proof, but to explain why things are the way they are.
also in all the cases that are "interesting" enough to include almost any useful computer
program, we cannot say anything definite on the subject of "completability":
algorithmic intractability is a stubborn thing. If we consider "completability" to be an
a priori property of any algorithm (that is, if we assume that something that does not
have this property is not an algorithm), then we will not be able to give examples of
algorithms at all, except for the most trivial ones, and we will certainly miss out on a
large and highly interesting class of algorithms. In other words, if we include
"completability" in the notion of an algorithm, as textbook authors often do, then with
such a notion of "algorithm" we would, in the vast majority of cases, be unable to say
definitively whether what we are looking at is an algorithm or not, i.e., we would simply be
unable to distinguish algorithms from non-algorithms; what good is such a notion?
The situation starts to look even more piquant if we take into account that in the theory of
computable functions the possible indeterminacy of a function on a subset of the domain of its
definition plays an extremely important role, and any attempts to further define functions like
"consider another function which is equal to zero at all points at which the original function is
not defined" fail miserably.
To see why this is so, consider the following rather simple reasoning. Since we have
agreed that any constructive computation is specified by a finite record over some finite
alphabet, it follows that there is a countable set of computable functions, i.e. there exists
some numbering of all computable functions. Let us now try to introduce a notion of a
computable function of one argument such that any function computable in this sense is
defined on the whole set of natural numbers. Clearly, since we are talking about computable
functions, these functions again form at most a countable set. Let us denote them all by f1,
f2, f3, ..., fn, ...; in other words, let the numbered sequence fn exhaust the set of computable
(in our sense) functions. Consider now the function e(n) = fn(n) + 1. It is clear that in any
reasonable sense such a function is computable, but at the same time it differs from each
of the fn, that is, it turns out not to be included in the set of computable functions⁶⁸.
As a consequence, we must either agree that adding one can make a
computable function incomputable, or accept as a given that when considering computable
functions (whatever we mean by them) we cannot restrict ourselves to functions that
are defined everywhere. Note that if the functions of the family fn do not have to be
everywhere defined, then there is no contradiction: indeed, if the function fk is not defined
at the point k, then e(n), defined in the above way, is also undefined at the point k, and, as a
consequence, its difference from fk is no longer guaranteed by its definition; in other words,
e(n) may coincide with any of those functions fk which are not defined at the point
corresponding to their own number, and consequently "gets the right" not to go beyond
the set {fn}.
The theory of computable functions considers, among other things, a certain "simplified"
class of functions - the so-called primitive recursive functions. They differ from
partial recursive functions (see page 178) by the absence of the minimization operator, i.e.,
the functions are constructed from the same basic functions (the constant, the unit
increase and the projections) and the operators of superposition and primitive recursion. All
functions constructed in this way are obviously defined for all values of their arguments;
there is simply nowhere for them to "loop". In the class of partial recursive functions it is
the minimization operator that introduces the possibility of "indefiniteness".

68 For the reader interested in computable functions, we can recommend, for a start, the book by V.
Boss "From Diophantus to Turing" [6], which contains a rather successful popular overview of the relevant
mathematical theory. There are, of course, also more specialized texts. We leave a more detailed
presentation of the theory of algorithms and computability outside the scope of this book, so as not
to attempt to embrace the boundless.
It would seem that this class is quite broad, and, if we talk about the arithmetic of integers,
we have the (deceptive) impression that it covers "almost everything". To give an example of
a partially recursive function that would not be primitive-recursive, Wilhelm Ackermann had to
invent a function that was later called the Ackermann function; this function of two arguments
grows so rapidly that all its meaningful values fit into a small table, beyond which these
numbers exceed the number of atoms in the universe.
If, however, we return from the realm of integer functions to the world of algorithms, it
turns out that, once we take into account the numbering of input and output chains, the
number of atoms in the Universe is not such a large number after all, while an example of a
function which, being partial recursive, is not primitive recursive "suddenly" turns out to
be any computer program that uses "non-arithmetic" loops, i.e., in fact, ordinary while
loops, for which at the moment of entering the loop it is impossible to determine (except
by actually running the loop) how many iterations it will take. The same goes for a program
using recursion for which it is impossible to know in advance, at the moment of entry, how
many nesting levels there will be⁶⁹.
For someone experienced in writing computer programs, two things are quite obvious.
On the one hand, if a program is written using only arithmetic loops, such as for in Pascal,
and without any recursion (or using "primitive" recursion, for which the number of nested calls
does not exceed a given number), it will be possible to estimate (from above) the execution
time of such a program in advance. There is simply no place for such programs to loop, so no
matter what data are input, such a program will surely terminate in a finite number of steps.
On the other hand, alas, we cannot write any useful program this way. As soon as we
try to solve a practical problem, we have to resort either to a while loop, or to recursion
without a "level counter", or to a backward jump (goto) - which is really the same while
loop, only worse, so it is better to use a loop instead of the jump. Well, together with such
"nondeterministic repetition", sources of indefiniteness, the algorithmic unsolvability of the
halting problem and other delights known to users under the general name of "glitches" seep
into our program. Looked at from this angle, it turns out that the general "glitchiness" of
programs is not a consequence of programmers' carelessness, as is commonly believed, but
rather a mathematical property of the subject area: as soon as we add at least one
nondeterministic repetition to our program (be it a non-arithmetic loop or non-primitive
recursion), we thereby move the algorithm implemented by our program from the class of
primitive recursive functions to the class of partial recursive functions.
Thus, the indeterminacy of some computable functions at some points,
or, what is the same, the inapplicability of some algorithms to some
input words, is a fundamental property of any models of constructive
computation, almost trivially following from the very foundations of the theory. Not only can
nothing be done about it, but we should not do anything about it, otherwise we will lose much
more than we gain: our programs will become "correct" and completely stop "glitching", but
they will also become useless.
69 Note that the "primitive recursion operator" uses an integer parameter that is decremented by one
on each recursion, so that the number of remaining recursions is always exactly equal to this parameter.
1.3.7. Sequencing has nothing to do with it
Among the "definitions" of an algorithm which, despite their inherent incorrectness,
are still found in the literature - and more often than we would like - variations on the
theme of a "sequence of actions" prevail. In reality, an algorithm in the general
case may have nothing in common with any sequence of actions, at least not one
described explicitly.
Indeed, we have already mentioned that a variety of formal systems are used to
study algorithms: among them, for example, a Turing machine (represented by a state
transition table) and partial recursive functions (represented by a combination of basic
functions and operators). Algorithmic formalisms also include normal Markov
algorithms (sets of rules for rewriting words), the lambda calculus (systems of
functions of one argument over some expression space), the unlimited register
machine (URM), and many other models. Among all these formalisms, an explicitly
written-out sequence of actions is inherent only in the URM; meanwhile,
it has been repeatedly proved that all these formalisms are pairwise equivalent, i.e.,
they specify the same class of possible computations. In other words, any algorithm
can be represented in any of the available formal systems, most of which do not
involve any description (at least not an explicit one) of a sequence of actions.
If we return from mathematical heaven to the sinful programmer's earth, we will find that
the embodiment of an algorithm in the form of a computer program will
not always be a description of a sequence of actions. Everything here depends
on the programming paradigm used, i.e. on the style of thinking used by the programmer; in
many respects this style is determined by the programming language. Thus, when working in
Haskell and other functional programming languages, we do not have any actions at all, we
have only functions in a purely mathematical sense - functions that calculate some value on
the basis of given arguments. The calculation of such a function is expressed through other
functions, and this expression has nothing to do with actions and their sequences. In most
cases, for Haskell programs it is impossible to predict in which sequence the calculations will
be performed, because the language implements the so-called lazy semantics, which allows
you to postpone the execution of a calculation until the result is needed for another calculation.
While in Haskell we at least specify how to compute the result, even if not in the form of
a specific "sequence of actions", when working in programming languages with declarative
semantics, we do not pay attention at all to how to search for the required solution; we only
specify what properties it should have, and leave it to the system to find it. Among such
languages, the most famous is the relational language Prolog: a program in this language is
written as a set of logical statements, and its execution is a proof of a theorem (or rather, an
attempt to disprove it).
Finally, the now super-popular paradigm of object-oriented programming is also based
not on "action sequences" but on some kind of message exchange between abstract objects.
If we speak of sequences of actions, a program is written as such only in so-called
imperative programming languages, also sometimes called "von Neumann" languages. Such
languages include, for example, Pascal, C, BASIC, and quite a few others; however, the
popularity of the von Neumann style is due solely to the prevailing approach to computer
architecture, and not at all to any "simplicity" or "naturalness" of imperative constructions.
It is clear that programs in any really existing programming language have the property
of constructive computability, otherwise there could not exist a practical implementation of
such a language - and it does exist. Thus it is obvious that a computer program is always an
algorithm, no matter how it is written; the definition of an algorithm as a sequence of
actions is therefore no good and can lead to quite dangerous misconceptions. Incidentally,
the authors of such definitions mislead themselves as well: this seems to be the source of the
frequently encountered but completely insane claim that "programming languages are divided
into algorithmic and non-algorithmic". In reality, of course, any programming language is
algorithmic. True, one could imagine a programming language that does not
possess the property of determinism - based, say, on heuristics - and, as a consequence,
guarantees neither the correctness of programs nor even their stability, making the results
of program execution utterly unpredictable. Strange as it may seem, such approaches to
solving certain problems do exist (take the same neural networks), but fortunately it has not
come to creating such programming languages: it is hard to work with unpredictable results,
and such methods, in particular, completely rule out debugging, so the practical potential of
non-algorithmic computation is highly doubtful (greetings to the fans of neural networks).
However, when speaking of "non-algorithmic" languages, the authors of textbooks of
dubious quality usually mean an entity much simpler: namely, all languages in which a
program is written otherwise than in the form of the same "sequence of actions". Strictly
speaking, "non-algorithmic" should probably be considered all modern languages, except
assembly language, because even the imperative C and Pascal allow recursive programming
and, as a consequence, writing a program in a form in which no specific sequence of actions
can be seen.
Note that to express the same idea about "division" of programming languages we can
offer quite correct wording: for example, we can say that programming languages are divided
into imperative and non-imperative, which, of course, will require at least a brief explanation
of the term "imperative programming", and here "sequence of actions" has just the right to
appear - but already in application to programs in concrete programming languages, not to
abstract algorithms.
In defense of their picture of the world, those who like to define an algorithm as a
sequence of actions quite often resort to the claim that the theory of algorithms deals
with "the wrong kind of algorithms", and that computability theory supposedly has nothing
to do with programming at all. Such a claim turns out to be somewhat difficult to refute: on
closer inspection it proves to be purely terminological, and disputes whose central
point is the meaning of this or that term are a thankless occupation.
Nevertheless, something can and even should be said on this topic.
Let us start with algorithmic unsolvability: one can appeal to the "abstract mathematical
nature" of the theory of algorithms as much as one likes, but just suggest to a more or
less competent programmer that he develop an add-on for the operating system that would
automatically find "hung" programs and stop them - and, if the programmer is
worth anything in terms of qualification, we will immediately hear that he will not take on
such a task, and neither will anyone else, and whoever does take it on will not solve it
anyway, because the problem is algorithmically unsolvable. It turns out that in this case the
algorithm was of exactly "the right kind".
Moreover, the algorithmic unsolvability of the applicability problem (the halting
problem) inspires programmers to quickly and efficiently dismiss out of hand any ideas
related to proving properties of an arbitrary program: if it is impossible even to formally
predict whether it will stop or not, what can we say about more interesting properties.
In spite of this, one only has to mention computability theory, or some term from it
such as "partial recursive function", and for some reason the risk of running into an
interlocutor claiming that "this is not the right kind of algorithm" increases dramatically. In
principle, this phenomenon is easily explained: algorithmic unsolvability is usually
demonstrated without going into the maze of computability theory or even mentioning
it at all, and in most cases it is done exactly as we did on page 189. The proof of the
unsolvability of the self-applicability problem is practically trivial, so
most future programmers who have seen it at least once have no difficulty in
grasping the essence of algorithmic unsolvability. The theory of computability is
another matter: it is a very specific subject, and most programmers have never even heard
the term "partial recursive function", let alone know what it means. Of course, it
is much easier to dismiss the incomprehensible than to look into it.
Moreover, we have already shown above that the classes of functions introduced
in the theory of computability have a direct relation to practical programming (see the
discussion on page 192), namely, that primitive-recursive functions correspond to the
class of programs without "unpredictable" loops, while partial-recursive functions
correspond to the whole set of computer programs. Will there be anyone willing to
argue that an algorithm in the mathematical sense is the "wrong" algorithm?
For some reason, this task puzzles even many programmers; meanwhile, it is easy to
see that the initial area of the treasure hunt was 60 × 80 m², and after the discovery of
the new information it was reduced to 30 × 20 m², i.e. exactly eight times. This is the
reduction of uncertainty; since one bit halves the uncertainty, we are dealing with three bits
of information.
Often in problems similar to this one, we talk about the probabilities of occurrence of some
events, and then that a message indicating the occurrence of some combination of such
events has been received, and ask what is the information capacity of the received message.
For example:
On the first day of the table tennis tournament Vasya had to play first with Petya
and then with Kolya. Watching the practice games that took place before the
tournament, the fans estimated that Vasya played with Kolya on approximately
§ 1.4. Programs and data 205
equal terms, but Petya beat Vasya 75% of the time.
Masha could not attend the tournament, but she was rooting for Vasya and
asked her friends to tell her the results of both of his games. Some time later,
she received an SMS message saying that Vasya had beaten both of his
opponents. (1) What is the information capacity of this message? (2) If Vasya
had lost both games and Masha had been informed about it, what would be the
information capacity of the message?
Such problems are no more difficult to solve than the problem about the archeologists in the
courtyard of a castle, but instead of a courtyard we have the space of elementary outcomes,
and instead of an area we have a probability. The probability of Vasya's victory in the first
game, according to the conditions of the problem, is 1/4, and the probability of victory in the
second game is 1/2; since for combinations of independent events the probabilities are
multiplied, it turns out that the probability that Vasya will win both games is 1/4 · 1/2 = 1/8.
Consequently, the information that Vasya won both games reduces uncertainty eight times,
and the information value of the message is 3 bits.
The probability that Vasya will lose both games is 3/8, so here the information value of the
message will be, firstly, lower, since a more probable event has occurred, and, secondly, this
time the sought number will be not only non-integer but even irrational: since the uncertainty
this time decreases 8/3 times, the information value of the message, if still measured in bits,
will be log₂(8/3) = 3 − log₂3.
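The calculation above can be sketched in a few lines of Python; this is only an illustration, and the helper name `information_bits` is ours, not the book's:

```python
from math import log2

def information_bits(p: float) -> float:
    """Information content (in bits) of a message reporting an
    event that occurs with probability p: log2(1/p)."""
    return -log2(p)

# Vasya beats Petya with probability 1/4 and Kolya with 1/2;
# the games are independent, so both wins have probability 1/8.
print(information_bits(1/4 * 1/2))   # 3.0 bits
# Both losses have probability 3/4 * 1/2 = 3/8:
print(information_bits(3/4 * 1/2))   # about 1.415 bits, i.e. 3 - log2(3)
```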
In addition to the term "bit", the term "byte" is also used, usually referring to eight
bits. Oddly enough, this was not always the case: there were computers that had a
minimum memory location of, for example, seven bits rather than eight, and on these
computers a byte was usually just the amount of information stored in one such
location; they even introduced the special term octet to denote exactly eight bits.
Fortunately, you are unlikely to encounter in practice an understanding of a byte other
than eight bits, but it is worth keeping in mind the history of the term in any case.
An eight-bit byte can take on one of 256 possible values; these values are usually
interpreted as either a number between 0 and 255, or a number between -128 and 127.
Eight-bit memory cells are well suited for storing the letters that make up a text if that
text is written in a language that uses an alphabet like Latin; a so-called character code
is used to represent each individual letter, and there are significantly fewer different
codes than there are possible byte values. With multilingual texts everything is a bit
worse: for Cyrillic there are still enough codes, but, for example, with hieroglyphs this
approach is no longer suitable. We will come back to the question of text encoding.
Since we are talking about memory cells, it should be noted that memory cells are
used to store any information that is processed by a computer, including programs or,
more precisely, command codes that make up programs. When the range of values of
one cell is not enough, several memory cells are used in a row, and we no longer speak
of a cell, but of a memory area.
It is important to realize that the memory cell itself "does not know" how the
information stored in it should be interpreted. Let's consider this on the simplest
example. Suppose we have four consecutive memory cells whose contents correspond
to the hexadecimal numbers 41, 4E, 4E and 41 (the corresponding decimal numbers
are 65, 78, 78, 65). The information contained in such a memory area can be interpreted
equally well as the integer 1095650881; as a fractional number (a so-called
floating-point number), approximately 12.894; as a text string containing the name 'ANNA';
or, finally, as a sequence of machine commands. In particular, on i386 platform processors,
these will be the commands conventionally labeled inc ecx, dec esi, dec esi,
inc ecx; we will discuss what these commands do in the third part of the book.
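The same four bytes can be reinterpreted directly with Python's struct module; a small sketch, with the bytes read in big-endian order so that the values match the ones named in the text (the float is approximate):

```python
import struct

data = bytes([0x41, 0x4E, 0x4E, 0x41])   # the four cells from the example

# As a 32-bit unsigned integer:
print(struct.unpack('>I', data)[0])      # 1095650881

# As a 32-bit floating-point number:
print(struct.unpack('>f', data)[0])      # roughly 12.894

# As an ASCII text string:
print(data.decode('ascii'))              # ANNA
```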
The history of units of measurement of large amounts of information is rather
peculiar. Buying a flash key in a store, we usually pay attention to its capacity, which
nowadays is usually expressed in gigabytes, denoted by "GB"; the reader has
undoubtedly encountered other units of this kind many times, such as the kilobyte,
megabyte, terabyte, etc. Remembering that a "byte" is not always 8 bits, we may notice
that such units of memory capacity are not quite logical, but this is only half
the trouble: machines whose memory cell differs from eight bits have not been seen for
nearly half a century, and, of course, absolutely all current storage media
(at least those that can be bought in a store) work with eight-bit bytes. What is much
worse is that there are two different opinions about what a kilobyte, a megabyte, and a
gigabyte actually are.
Long before the era of mass computerization, which began around the mid-1980s,
people who built and worked with computers had a need to characterize the RAM
capacity of different machines in some brief way. Recall that in §1.1.2 we mentioned
the address bus, which, like other parts of the bus, consists of a number of tracks, and
each track can carry a single bit, either a logical one or a logical zero; if the address bus
contains N tracks, then this bus allows a total of 2^N addresses to be distinguished; hence,
2^N is exactly what the memory limit on a computer using this bus will be.[87]
In real life, memory is usually smaller than the bus allows: the bus tracks are not
as expensive as the memory itself, so they are laid out with a margin when the
processor is built. Memory itself usually consists of banks, each of which has its own
connection to the bus, so that some tracks of the address bus select the bank to be used,
while other tracks select the cell inside that bank. The number of memory banks
connected to a particular computer does not have to be a power of two or even an even
number: three of them can be connected, for example; but the banks themselves, for
convenience, always contain a number of cells equal to a power of two, since
otherwise the address space of the computer would cease to be continuous, that is, the
addresses to which memory cells correspond would be mixed with addresses that
cannot be used because the corresponding cells are simply absent from the machine.
It is very inconvenient to work with such a "piecewise" address space.
Anyway, the number of memory cells is always closely related to powers of two,
although it is not always such a power itself. For example, "in the times when computers
were big and programs were small", memory could consist of banks of, say, 2^13 = 8192
cells each; this is still fine, programmers usually remember the powers of two, but what if
there are three such banks connected to the machine? Or seven? Looking at the numbers
24576 and 57344, one is unlikely to realize that these are actually 3 · 2^13 and 7 · 2^13.
It is not known who first noticed the closeness of the numbers 1000 and 2^10 = 1024
[87] More specifically, the cell limit; see footnote 9 on page 63.
and suggested that 1024 cells be denoted by the term kilobyte. In fact, it is not even a
fact that "byte" was in this story from its very beginning; say, if memory cells on some
machine consisted of 39 bits, they were not usually called "bytes"; with such cells,
usually the machine word (i.e. the size of the data portion processed by the processor
in one operation) coincided with the size of the cell. If such a machine had 2048 cells,
specialists said that the memory capacity was "2 K", sometimes going so far as to
explain that they meant "2 K words"; it was clear to everyone (that is, in fact, to all
other specialists) that the "K" stood for "kilo", but it was not 1000 as in other fields, but
1024. This is quite logical, considering that memory sizes have almost never been
multiples of 1000, but they have almost always been multiples of 1024 (actually, history
also knows machines with decimal addressing, such as IBM 702 and IBM 705, but this,
as they say, passed quickly). With the use of this "K" numbers become clearer; in our
example in the paragraph above we can say about a bank that its capacity is 8 K, and
memory capacities with three and seven such banks are 24 K and 56 K respectively, it
is enough to remember the multiplication table to understand what is going on.
Bytes, apparently, appeared a little later, when computers began to actively process
text information, and it became clear that it was expensive to spend a long (30-40 bits)
word to store the code of one character, and to reduce the machine word to 8, 7, or even
6 bits - it's just absurd. The logical next step was the transition to cells smaller than the
machine word - for example, a word could correspond to two, four or eight cells. This
finally fixed the kilobyte as a unit of memory size measurement, and when the byte size
actually lost its uncertainty and "froze" in its eight-bit version, it became possible to
use the same unit to measure the amount of information (not everyone realizes that this
is not the same as the number of cells in the computer memory).
With the growth of volumes, megabytes (1024 KB, or 2^20 bytes) naturally appeared,
followed by gigabytes, but at some point experts and ordinary computer users faced a
rather unpleasant phenomenon coming from the marketing departments of hardware
manufacturers. Hard disks suddenly appeared on the market whose volumes seemed
to be designated in gigabytes (GB), but a gigabyte was understood not as 2^30 but as 10^9
bytes. Manufacturers justified this by the fact that the prefix "giga-" according to
international standards means 10^9 (a billion), and that the "jargon" of computer
specialists was not their concern.[88]
Alas, the marketers had something to fight for here. If at the level of kilobytes the
difference between the two understandings of the unit of measurement is
insignificant (2.4%, actually), with gigabytes the difference reaches almost
7.5%, and if you try to interpret a terabyte as a power of two, the result will differ from
the decimal ("metric") one by 10%, which is quite a lot.
The problem here is that even in the field of measuring quantities of information,
computer scientists themselves did not always apply powers of two. For example, the
throughput of digital communication channels, which is not technically tied to bytes,
was usually measured in bits per second, and since serial bit transmission is not
tied to powers of two either, all sorts of kilobits, megabits and gigabits have been
denoted by the corresponding powers of ten, not powers of two, since the early
computer networks.
[88] Here, by the way, is another role of standards: to justify moral freaks in their moral ugliness.
Already in the mid-nineties there were proposals to introduce new prefixes to
denote powers of 1024 in units of measurement (as opposed to the traditional powers
of a thousand). Standardizers immediately jumped on this idea, and thus the "standard"
prefixes appeared: kibi- (from the words kilo and binary), mebi-, gibi-, tebi-, pebi- and even
exbi- (2^60), zebi- (2^70) and, pardon the expression, yobi- (2^80). The designations for these
prefixes have also been standardized: Ki, Mi, Gi, Ti, Pi, Ei, Zi and Yi. At the same
time it was proposed to denote bytes with a capital letter "B", and bits with the whole
word "bit", so that when you see a plain "B" you do not have to guess whether
bits or bytes were meant. For example, a gibibyte, i.e. 2^30 bytes, should according to
these rules be denoted "GiB".
The only thing left to do was to convince the general public that "from now on" a
kilobyte is equal to 1000 bytes, and here the standardizers received a passive but very
effective response from the public; simply put, for the first ten years or so the public
ignored this innovation completely. The author of these lines first heard about "kibibits"
near the end of the noughties; the largest companies, vulnerable to the activities of
standardizers, reluctantly started using "new units" with footnotes around 2012-2013
to avoid lawsuits for misleading the public by using traditional units in the "new
meaning" (which, as we understand, is smaller than what people are used to). The funny
thing is that even among the standardizers there is no complete agreement on what
constitutes a kilobyte, megabyte, and gigabyte; when it comes to RAM capacity, these
units are almost always used in their traditional meanings (1024, 1024^2, 1024^3 bytes).
To be fair, the "new" prefixes have one undoubted advantage: there is no ambiguity
here, provided, of course, that you know what they mean at all; that is, there are many
people in the world who do not know what "GiB" is, but there are hardly any people
who believe that it is 10^9 bytes.
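The growing discrepancy between the decimal and binary readings of these prefixes is easy to tabulate; a quick sketch:

```python
# Relative difference between the binary and decimal readings
# of each prefix: 2**p2 / 10**p10 - 1.
for name, p10, p2 in [('kilo/kibi', 3, 10), ('mega/mebi', 6, 20),
                      ('giga/gibi', 9, 30), ('tera/tebi', 12, 40)]:
    print(f'{name}: {2**p2 / 10**p10 - 1:.1%}')
# kilo/kibi: 2.4%, mega/mebi: 4.9%, giga/gibi: 7.4%, tera/tebi: 10.0%
```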
256 we will get "all zeros", i.e. just zero; the familiar transfer to a non-existent digit
has taken place. In general, when we use a positional number system with base
N to represent positive integers and limit the number of digits to k,
the maximum number we can represent is N^k − 1; thus, in our example with the
counter there were five decimal digits, and the maximum number was 99999 = 10^5 − 1,
while in the example with the eight-bit cell the system was binary, there were eight digits,
so the maximum number was 2^8 − 1 = 255.
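The formula N^k − 1 is easy to check directly; a sketch, with a helper name of our own choosing:

```python
def max_value(base: int, digits: int) -> int:
    """Largest number representable with the given number of digit
    positions in a positional system with the given base."""
    return base ** digits - 1

print(max_value(10, 5))   # 99999 (the five-digit mechanical counter)
print(max_value(2, 8))    # 255   (the eight-bit memory cell)
```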
Some high-level programming languages allow you to operate with any large integers, as
long as you have enough memory, but we will not consider such tools for now: it is important
for us to understand how the computer works, while high-level languages try to hide the
computer device from us as much as possible. Let us only note that this feature, usually called
long arithmetic, significantly reduces the speed of integer calculations. Programming
languages that support long arithmetic will be discussed in the last volume of this book.
We have already noted that one cell usually consists of eight digits and can store a
number from 0 to 255, but if you want to work with numbers from a larger range,
use several consecutive memory cells, and here the question arises quite unexpectedly,
in what order to place the parts of the representation of one number in neighboring
memory cells. Two different approaches to byte ordering are used on different
machines. One approach, called little-endian , assumes that the first goes the lowest
89
byte of the number, then the bits are arranged in ascending order, the highest byte is the
last. The second approach, called big-endian, is the exact opposite: the highest byte of
the number comes first, and the lowest byte is placed last in memory. For example, the
number 7500 in hexadecimal is written as 1D4C16. If you represent it as a 32-bit (4-byte)
integer on a computer using the big-endian approach, the four-cell memory area storing
this number will be filled as follows: the first two bytes (with the lowest addresses) will
be set to 00, the next (third) byte will be set to 1D, and the last, fourth byte will
be set to 4C: 00 00 001D4C. If the same number is written to the memory of a
computer using the little-endian approach, the values of the individual bytes in the
corresponding memory area will be in the opposite order: 4C 1D 00 00 00. Most
computers in use today use the "little-endian" order, that is, they store the least
significant byte first, although "big-endian" machines are also sometimes found.
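The two byte orders can be observed directly with Python's struct module; a small sketch using the number 7500 from the text:

```python
import struct
import sys

n = 7500                               # 1D4C in hexadecimal
print(struct.pack('>I', n).hex(' '))   # big-endian:    00 00 1d 4c
print(struct.pack('<I', n).hex(' '))   # little-endian: 4c 1d 00 00

# The byte order of the machine this runs on:
print(sys.byteorder)                   # most often 'little'
```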
Let us now see what to do if we need to work not only with positive numbers. It is
clear that some other way of interpreting combinations of binary digits is required, such
that some of the combinations are considered to represent a negative number. In such
cases we will say that the cell or memory area stores a signed integer, in contrast to the
previous case when we speak of an unsigned integer.
In the early days of computing, different approaches to representing negative
integers were tried, for example, storing the sign of a number as a separate bit. It turned
out, however, that this made it inconvenient to implement even the simplest operations,
addition and subtraction, because the sign bits of both summands had to be taken into account.
[89] The terms big-endian and little-endian were introduced by Jonathan Swift in Gulliver's Travels
to denote the irreconcilable supporters of breaking eggs from the blunt end and from the sharp end,
respectively. In Russian, these names were usually translated as "blunt-enders" and "sharp-enders".
Arguments in favor of one or the other architecture indeed often resemble a holy war between the
sharp-enders and the blunt-enders.
Computer creators quickly came to use the so-called two's complement representation (known in Russian literature as "additional code").[90]
To understand how the additional code works, let's go back to our example with a
mechanical counter. In most cases, such roller chains can spin both forward and
backward, and if scrolling forward gave us an addition of one, then scrolling backward
will subtract one. Let us now have all rollers set to zero and unscroll the counter
backwards. The result will be 99999; it is understandable, because when we added one
to 99999 we got 00000, and now we have done the reverse operation. It is said that we
have borrowed from a non-existent digit: as in the case of transfer to a non-existent
digit, if we had another roll, everything would be correct (e.g. 100000-1 = 99999), but
it is not there. The same thing happens in binary: if there were zeros in all digits of a
cell (00000000) and we subtracted a one, we get all ones: 11111111; if we now add a
one again, we get zeros in all digits again. This logically leads us to the idea of using
as a representation of the number -1 the ones in all digits of a binary number,
no matter how many such digits we have. Thus, if we are working with eight-bit
numbers, 11111111 now means -1, not 255; if we are working with sixteen-bit
numbers, 1111111111111111 now means, again, -1, not 65535, and so on.
Continuing the operation of subtracting a unit over an eight-bit cell, we will come
to the conclusion that to represent the number -2 we should use 11111110
(previously it was 254), to represent -3 - 11111101 (previously it was 253) and
so on. In other words, we voluntaristically declared some combinations of binary digits
to represent negative numbers instead of positive ones, and always the new (negative)
value of a combination of digits is obtained from the old (positive) one by subtracting
the number 256 from it: 255 − 256 = −1, 254 − 256 = −2, and so on (the number 256
is 2^8, and our reasoning is true only for the special case of eight-bit numbers;
in the general case, the number 2^n must be subtracted from the old value, where n is the
digit capacity used). The question remains at what point to stop, that is, to stop counting
numbers as negative; otherwise, getting carried away, we can reach 00000001 and
declare that it is not 1 at all, but -255. The following convention is accepted: if a
set of binary digits is considered as a representation of a signed number, then the
combinations whose highest bit is 1 are considered negative, and the other
combinations are considered non-negative. It turns out that the negative number with
the largest absolute value is represented by a one in the high bit and zeros in all others; in the
eight-bit case, this is 10000000, i.e. −128. If you subtract one from this number, you
get 01111111; this combination (a zero in the high bit, ones in all the others) is considered
the representation of the largest signed number and for the eight-bit case represents, as
you can easily see, the number 127. As you have already guessed, adding one to this
number will again give the negative number with the largest absolute value. Here we
see the two simplest cases of overflow.
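The reinterpretation described above can be sketched in Python; the helper name `to_signed` is ours, and the function simply applies the rule that combinations with the high bit set stand for value − 2^n:

```python
def to_signed(value: int, bits: int = 8) -> int:
    """Reinterpret an unsigned value as a two's-complement signed
    number: combinations with the high bit set represent value - 2**bits."""
    return value - 2 ** bits if value >= 2 ** (bits - 1) else value

print(to_signed(0b11111111))  # -1   (unsigned: 255)
print(to_signed(0b11111110))  # -2   (unsigned: 254)
print(to_signed(0b10000000))  # -128 (the most negative eight-bit number)
print(to_signed(0b01111111))  # 127  (the largest positive one)
```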
The role of overflow in signed integer arithmetic is similar to the role of carry to a
non-existent digit and borrow from it in unsigned arithmetic: both are the result of a
lack of digit capacity to represent the result of an operation (addition or subtraction). In
the general case of overflow, the sum of two positive numbers turns out to be "negative"
or, conversely, the sum of negative numbers turns out to be "positive". Unless special
whose representation is described in the previous paragraph, computers can also handle
fractional numbers, called floating-point numbers. Such numbers involve the separate
storage of the mantissa M (a binary fraction from the interval 1 ≤ M < 2) and the
machine order P, an integer denoting the power of two by which the mantissa is to be
multiplied. A separate bit s is allocated for the sign of the number: if it is equal to 1,
the number is considered negative, otherwise positive. The final number is thus
N = (−1)^s · M · 2^P. A set of particular agreements on the format of floating-point
numbers, known as the IEEE-754 standard, is currently used by almost all processors
capable of working with fractional numbers, despite the fact that the standard itself is
an example of a jumble of extremely unsuccessful technical solutions.
Since the integer part of the mantissa is always 1, it can be left out, so all available
bits are used to store the digits of the fractional part (there are exceptions to this rule,
but we will not discuss them for now). Different ways of storing the machine order
were used at different times: a signed integer in two's complement, a separate bit for the
sign of the order, etc.; IEEE-754 prescribes storing the machine order as an offset unsigned
integer: the corresponding bits are treated as an unsigned integer, from which, to obtain
the machine order, one subtracts a constant called the machine order offset. The specific
number of bits allocated to the order and mantissa depends on the size of the whole
number; for example, in an eight-byte floating-point number, the first bit (as in any
other) stores the sign, the next 11 bits - the order (the order offset in this case is 1023),
and 52 bits remain for the mantissa.
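The layout of an eight-byte number can be inspected directly; a sketch in Python, with a helper of our own that extracts the three fields just described:

```python
import struct

def decompose(x: float):
    """Split an IEEE-754 double into its sign bit, machine order
    (the 11-bit biased exponent minus the offset 1023) and the
    52-bit mantissa field."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    sign = bits >> 63
    order = ((bits >> 52) & 0x7FF) - 1023
    mantissa_field = bits & ((1 << 52) - 1)
    return sign, order, mantissa_field

print(decompose(1.0))   # (0, 0, 0): +1.0 * 2**0
print(decompose(-6.0))  # (1, 2, 2**51): -1.5 * 2**2
```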
It is clear that even the simplest arithmetic operations on floating-point numbers
are much more complicated than on integers. For example, to add or subtract two such
numbers, you must first bring them to the same order; for this purpose, the mantissa of
the number whose machine order is smaller than the other is shifted to the right by the
required number of positions. Then the actual addition or subtraction is performed, and
then, if the result is less than one or greater than or equal to 2, it is subjected to
normalization, that is, change the order and simultaneously shift the mantissa so that
the value of the number has not changed, but the mantissa again began to meet the
[91] In fact, with integers one can obviously represent rational numbers (as a numerator/denominator
fraction), or use the concept of so-called fixed-point numbers, where an integer is considered to
represent not whole units but, say, hundred-thousandths of the processed quantity.
condition 1 ≤ M < 2. A similar normalization is done in multiplication and division of
numbers.
When shifting to the right, the lower bits of the mantissa that have no place in the
allocated digits are simply lost. The difference between the result and what would have
been obtained if nothing had been discarded is called a rounding error. Generally
speaking, rounding error in operations with binary (as well as with decimal) fractions
is inevitable, no matter how many digits we have allocated for storing the mantissa,
because even an ordinary division of two integers, which form an irreducible fraction
and the divisor is not a power of two, will result in a periodic binary fraction (see page
160), for the "exact" storage of which we would need an infinite amount of memory.
The binary fraction representation of numbers such as 1/3, 1/5 or 1/10 is infinite (though
periodic), so significant bits must inevitably be discarded when converting them to
floating-point format. Therefore, calculations over floating-point numbers almost
always give not an exact result, but an approximate one. In particular, programmers
consider it wrong to try to compare two floating-point numbers for strict equality,
because the numbers may not be "equal" just because of rounding errors.
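The comparison pitfall is easy to demonstrate; a short sketch using the classic 0.1 + 0.2 example:

```python
import math

a = 0.1 + 0.2
print(a == 0.3)                # False: both sides carry rounding errors
print(abs(a - 0.3) < 1e-9)     # True: the usual remedy is a tolerance
print(math.isclose(a, 0.3))    # True: the same idea, relative tolerance
```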
By the way, the author of these lines has repeatedly met the statement that any
calculations on a computer are made with errors; as you can guess, the source of this
nonsense is the same school textbooks of computer science. Don't believe it, you are being
deceived! Computer calculations in integers are absolutely accurate.
1966 by the American scientist Joseph Weizenbaum. For a person sitting at a computer
keyboard, programs like "Eliza"[92] can create the impression that there is a live person
"on the other side", although people who know what is going on can, in fact, quite easily
distinguish a person from the program by deliberately creating a conversational situation
in which the program's lack of "perfection" shows.
into the meaning of the text at all, i.e. they do not perform semantic analysis of the
interlocutor's lines; instead, they analyze the structure of the received phrases and
themselves use the words received from the user to construct phrases whose meaning
they do not understand.
The range of existing formal languages is quite wide; several thousand
programming languages alone have been created in the history of computers, although
not all of them exist now - many programming languages have lost all their supporters
and died, as they say, a natural death. However, there are also at least several hundred
supported programming languages, i.e. languages in which we could write programs,
if we wanted to, and then execute these programs on computers.
In addition to these, so-called markup languages are widely used to design texts;
perhaps the best known of these is HTML, which is used to create hypertext pages on
Internet sites. Note that in many school textbooks you can find a completely insane
statement that HTML is supposedly a programming language; don't believe it.
Finally, generally speaking, any computer program that accepts information in the form
of text as input defines, by the very fact of its existence, a certain formal language
consisting of all the texts on which this program works without errors. Such languages
are often very primitive and, as a rule, have no name.
When discussing formal languages, sometimes doubts arise as to whether a
language can be considered a programming language, i.e., a language in which
programs are written. In some cases, the answer depends on the answer to the question
of what a "program" is, and this question is not as simple as it seems at first sight; in
any case, there are situations when different people give different answers to the
question whether a certain text is a program or not. For example, when working with
databases, the SQL query language is quite popular; there are claims that writing
queries in this language constitutes programming, but there is also an opposite opinion.
To eliminate terminological ambiguity, a narrower term can be introduced;
languages in which any algorithm can be written are called algorithmically complete
languages. Since, as we remember, the notion of algorithm has no definition, any of
[92] According to one version, the program was named after the heroine of the play Pygmalion by
Bernard Shaw; there is indeed something in common, because in the play Professor Higgins teaches
Eliza to pronounce words correctly, but at first completely overlooks the other aspects that distinguish
a lady of high society from a flower girl.
the introduced formalisms is used instead, most often a Turing machine; from the
formal point of view, an algorithmically complete language is a language in which an
interpreter (if you like, a simulator or model) of a Turing machine can be written. Some
authors prefer to call such languages "Turing-complete" rather than "algorithmically
complete" for greater precision. It is worth noting that algorithmically complete
languages often turn out to be languages that were never intended for writing
programs in the first place. For example, the book you are reading has been prepared
using the TeX markup language created by Donald Knuth; this language consists mainly
of commands that specify text features such as font shape, size and position of headings
and illustrations, etc., i.e. it was not originally designed for programming; nevertheless,
TeX is algorithmically complete, although not all of its users know this.
128. The first thirty-two of them, from 0 to 31, were assigned the role of control codes,
such as line feed (10), carriage return (13), tab (9) and others. Code 32 was assigned to
the space character; further in the table (see Fig. 1.14) come punctuation marks and
mathematical symbols, occupying codes 33 to 47; codes 48 to 57 correspond to the
Arabic numerals from 0 to 9, so that if we subtract 48 from the code of a digit, we
always get its numerical value (we will make use of this later).
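The digit-code trick can be seen in one line of Python; a tiny sketch:

```python
# Digits '0'..'9' occupy ASCII codes 48..57, so subtracting the code
# of '0' (which is 48) from a digit's code yields its numeric value.
for ch in '0507':
    print(ch, ord(ch), ord(ch) - 48)
# e.g. '7' has code 55, and 55 - 48 = 7
```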
Uppercase Latin letters are located in the ASCII table in positions 65 through 90 in
alphabetical order, and positions 97 through 122, again in alphabetical order, are
occupied by lowercase letters. As it is easy to see, the letters are arranged in the table
in such a way that the binary representation of an uppercase letter differs from the
binary representation of a lowercase letter by exactly one bit (the sixth, i.e. the
penultimate bit). This was done on purpose to make it easier to bring all characters of
the processed text to the same case, for example, to perform a case-insensitive search.
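The single-bit difference between the cases can be checked directly; a small sketch in Python:

```python
# 'A' (65) and 'a' (97) differ only in the bit with value 32,
# so toggling that single bit switches the case of a Latin letter.
print(bin(ord('A')), bin(ord('a')))  # 0b1000001 0b1100001
print(chr(ord('A') ^ 32))            # a
print(chr(ord('q') & ~32))           # Q (clearing the bit forces uppercase)
```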
The remaining free positions are occupied by a variety of characters that, at the
time, seemed more useful than others to the members of the working group that created
the table. The last displayed character in the ASCII table is code 126 (that's a tilde,
"~"), and code 127 is considered control, as are codes 0 through 31; this code, called
RUBOUT, was originally intended to cross out a character. The point is that in those
days characters were often represented by punched holes in paper tape, with a
punched hole corresponding to a one and an unpunched position to a zero; a tape
designed for storing seven-bit text was exactly seven positions wide, so each character
was represented by exactly one row of holes. The binary representation of the number
127 consists of seven ones, i.e. if you "punch" this code over any row of the
tape, you get seven holes regardless of what was there originally. When reading
the tape, code 127 was to be skipped, on the assumption that there used to be a
character in this place which was later crossed out.
The mass transition to eight-bit bytes in computers created a persistent association
between byte and character in programming, because, naturally, a byte came to be used
to store one ASCII code. At the same time, the appearance of the eighth bit
allowed the creation of a number of different extensions of the ASCII table, which used
codes in the 128-255 range. In particular, at different times and in different systems,
five (!) different ASCII extensions providing for Russian letters (the Cyrillic alphabet)
were used. Historically, the first of them was the KOI8 encoding (the Russian
abbreviation stands for "eight-bit information exchange code"), which came into use
back in the mid-1970s on machines of the ES computer series. The main disadvantage
of this encoding is a somewhat "strange" order of letters, completely different from the
alphabetical one: "ЮАБЦДЕФГХ...". This makes it difficult to sort strings, requiring
an extra conversion step, whereas if
the characters in the table are arranged in alphabetical order, sorting can be performed
by simply comparing the character codes in an arithmetic sense.
This order of letters in the KOI8 table has a very definite reason, which becomes
clear if you write out the rows of the resulting extended ASCII table with 16 elements
per row. It turns out that Cyrillic letters in KOI8 are in the second half of the 256-code
table in exactly the same positions in which their Latin counterparts are located in the
first half of the table (i.e. in the original ASCII table). The point is that in the past there
were (and still are) often situations when someone forcibly "discards" the eighth (high)
bit in text data. KOI8 is the only Russian encoding that retains readability in such a
situation. For example, the phrase "good afternoon" will turn into "DOBRYJ
DENX" when the eighth bit is discarded; it is not very convenient to read such text, but
still possible. That's why COI8 encoding for a long time was considered the only
acceptable in telecommunications and surrendered its position only under the onslaught
of Unicode, which will be discussed below. It should be noted that the Russian alphabet,
containing 33 letters, "slightly failed" to fit into two consecutive rows of 16 codes; the "unlucky" one this time was the letter "ё", which was "evicted" in KOI8: its lowercase and uppercase versions were assigned the codes A3₁₆ and B3₁₆, while all the other letters occupy the continuous code range from C0₁₆ to FF₁₆.
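This bit-stripping behaviour is easy to check. The sketch below uses Python purely as an illustration (the koi8-r codec is part of its standard library); it encodes a phrase in KOI8 and clears the eighth bit of every byte, just as an old seven-bit channel would:

```python
# Encode a Russian phrase in KOI8 and discard the eighth (high) bit of
# every byte; the lowercase Cyrillic letters turn into their uppercase
# Latin counterparts, so the text stays readable.
text = "добрый день"
koi8 = text.encode("koi8-r")
stripped = bytes(b & 0x7F for b in koi8)
print(stripped.decode("ascii"))  # DOBRYJ DENX
```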
In the MS-DOS era, i.e. in the 1980s and early 1990s, the most popular Cyrillic
encoding on personal computers was the so-called "alternative" Russian encoding, also
known as cp866 (code page #866). In it, it should be said, the characters were arranged not so badly: the alphabetical order was preserved, so the encoding could be used for sorting without intermediate transformations (except for the letter "ё", which was again unlucky); however, while the uppercase Russian letters in cp866 stand in one continuous row, between the first sixteen and the second sixteen lowercase letters lie the codes of 48 pseudo-graphic characters. It is interesting that this encoding is still used
today - for example, in some versions of Windows it is used when working with console
applications; it was also used as the only Cyrillic encoding in OS/2 family systems.
During the mass migration of users to Windows systems, many people were
unpleasantly surprised to learn that these systems used another Cyrillic encoding,
completely different from either cp866 or KOI8. It was the cp1251 encoding, which at the same time contained characters from other Cyrillic-based alphabets - Ukrainian, Belarusian, Serbian, Macedonian and Bulgarian - but whose creators forgot about, for example, Kazakh and many other non-Slavic languages that use the Cyrillic script. It should be noted that the letter "ё" was once again unlucky: it ended up outside the main code area.
Apple Macintosh computers also used, and still use, their own encoding, the so-called MacCyrillic. Another "standard" worth mentioning is ISO 8859-5, whose code
table differs from all of the above; however, this standard, created by yet another
committee with a completely unclear purpose, has never been used anywhere.
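The incompatibility of all these encodings is easy to demonstrate. The snippet below (Python used only as an illustration; the codecs named are in its standard library) prints the byte assigned to one and the same lowercase letter "м" in four of them:

```python
# One and the same letter gets four different codes in four encodings:
# koi8-r CD, cp866 AC, cp1251 EC, iso8859-5 DC.
for name in ("koi8-r", "cp866", "cp1251", "iso8859-5"):
    code = "м".encode(name)[0]
    print(f"{name:10} {code:02X}")
```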
With the growing number of characters known and used in computer information processing, a need arose for some order in this area, and in 1987 work began on the Unicode registry, which is often mistakenly taken for an encoding. In fact, the basis of Unicode is the so-called universal character set (UCS), that is, simply a registry of known characters in which each character is assigned a number; as for encodings, there are at least four of them based on Unicode: UCS-2, UCS-4 (aka UTF-32), UTF-8 and UTF-16.
The UCS-2 and UCS-4 encodings use two bytes and four bytes, respectively, to store a single character. Since by now there are more than 110,000 characters in the Unicode registry, UCS-2 cannot represent every character (two bytes can hold only 65536 different values, which has long been insufficient) and is now considered obsolete; as for UCS-4, people try not to use it because of its excessive and unproductive memory consumption: indeed, one byte is enough to store most characters, and more than three bytes are hardly ever needed: it would be strange to expect the registry to outgrow the sixteen million codes that three bytes can hold.
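The memory overhead of the four-byte representation is easy to see; a short Python illustration (assuming nothing beyond the standard codecs):

```python
# Ten characters: seven ASCII plus three Cyrillic letters.
s = "Hello, мир"
print(len(s.encode("utf-32-le")))  # 40 bytes: four per character, mostly zeros
print(len(s.encode("utf-8")))      # 13 bytes: one per ASCII character, two per Cyrillic
```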
It should be noted that multibyte encodings are devoid of most of the above-mentioned advantages of textual data representation; they can be considered text only with a very high degree of suspicion: indeed, they even depend on the byte order of the integer representation. Therefore, standards require that in a file containing "text" in any of the Unicode-based encodings the first two (or four) bytes be a pseudo-character with the code FEFF₁₆ (the byte order mark), which allows a program reading the text to understand what order the bytes are in. As a result, the concatenation of two such "texts" does not necessarily represent a text, because the two source texts may be in different byte orders, and the notorious FEFF₁₆ found in the middle of the text is not perceived as an indication of a byte-order change; likewise, a fragment of a "text" in a multibyte encoding does not have to be a correct text.
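The role of the FEFF₁₆ pseudo-character can be shown in a couple of lines of Python (illustrative only; the utf-16 decoder in its standard library honours the byte order mark):

```python
# The same three letters stored in both byte orders; the leading
# FEFF pseudo-character (byte order mark) tells the decoder which is which.
little = b"\xff\xfe" + "мир".encode("utf-16-le")
big = b"\xfe\xff" + "мир".encode("utf-16-be")
print(little.decode("utf-16"))  # мир
print(big.decode("utf-16"))     # мир
```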
The only exception is the UTF-8 encoding, which continues to use a sequence of bytes rather than multibyte integers to encode text. Moreover, the codes 0 through 127 in it coincide with ASCII: if a byte begins with a zero bit, then this byte by itself is a one-byte character code. Thus, plain text in the traditional ASCII representation is correct from the point of view of UTF-8 and is interpreted the same way in both ASCII and UTF-8. For characters not included in ASCII, UTF-8 uses sequences of two, three or four bytes; if it ever becomes necessary, sequences of five or six bytes are possible as well, but so far they are not needed: Unicode simply does not have that many characters.
In UTF-8, a byte starting with the bits 110 means that we are dealing with a character whose code is represented by two bytes; the first byte of a three-byte character starts with the bits 1110; the first byte of a four-byte character starts with the bits 11110. Additional bytes (the second and subsequent ones in the representation of a character) begin with the bits 10. Thus, a two-byte code carries 11 useful bits out of 16, a three-byte code 16 out of 24, and a four-byte code 21 out of 32.
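These bit patterns can be observed directly; for example, in Python (used here only as an illustration):

```python
# The Cyrillic letter "Ж" has code point 416 (hexadecimal), which needs
# 11 bits, so UTF-8 spreads it over two bytes: 110xxxxx 10xxxxxx.
data = "Ж".encode("utf-8")
print(data.hex())                  # d096
print([f"{b:08b}" for b in data])  # ['11010000', '10010110']
```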
Text represented in UTF-8 is independent of byte order in integers; a fragment of such
text remains correct, except perhaps for "chunks" of a single character representation at the
beginning and end; the union of two or more UTF-8 texts remains UTF-8 text. UTF-8 has only
one obvious disadvantage: since variable-length codes are used for a single character, to
retrieve a character by its number, you have to look through all the text preceding it. In addition,
UTF-8 text has a disadvantage common to all Unicode-based encodings: no one can
guarantee you that on a computer where someone tries to read text generated in UTF-8, the
font you use will have the correct images for all the characters you use. Indeed, it is not easy
to create a font that represents one hundred and ten thousand different characters.
The strangest of all Unicode encodings is UTF-16: it uses two-byte numbers, but sometimes two such numbers are needed to represent one character. This encoding has no advantages over the others but combines all of their disadvantages; in particular, if we are in principle satisfied with a variable-length character code, it would be much better to use UTF-8, which does not depend on byte order, is also compatible with ASCII, and has no disadvantages compared to UTF-16. Nevertheless, it is UTF-16 that is used in the Windows line of systems starting with Windows 2000; this is because the earlier systems of the line, starting with Windows NT, used UCS-2, which also assumes two-byte codes, and, as we already know, two bytes turned out to be insufficient to represent all the Unicode characters.
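The "sometimes two numbers" case is exactly what happens with characters beyond the first 65536 codes; a small Python illustration:

```python
# The musical G clef, code point 1D11E (hexadecimal), lies outside the
# range a single 16-bit number can address, so UTF-16 spends two
# 16-bit units (a so-called surrogate pair) - four bytes - on it.
clef = "\U0001D11E"
print(len(clef.encode("utf-16-le")))  # 4
print(len(clef.encode("utf-8")))      # 4 in UTF-8 as well
```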
Anyway, text representation based on Unicode encodings, except for UTF-8, cannot be
considered as text data in the original programming sense at all; UTF-8 is somewhat better in
this respect, but while ASCII characters are known to be present and displayed correctly in
any operating environment, the same cannot be said of other characters. Therefore, in some
cases, the use of characters not included in the ASCII set is considered
undesirable or outright prohibited; for example, this applies to the texts of computer programs in any programming language.
which we will write the sequence of numbers 1000, 2001, 3002, ..., 100099 - that is, a total of 100 numbers, each next one greater than the previous one by 1001. The trick is to write these numbers into one file in text representation, one per line, and into the other file in the form of four-byte integers, exactly as they are represented in the computer's memory when performing all sorts of calculations. We'll call the files numbers.txt and numbers.bin.

93. These files can be found in the archive of examples for this book, as are the Pascal programs that create them.
As we can see, the binary file is exactly 400 bytes in size, which is quite understandable: we wrote 100 numbers of 4 bytes each into it. The size of the text file is 592 bytes, i.e. a little more; it could have been smaller if we had written into it, for example, the numbers from 1 to 100, i.e. numbers whose representation consists of fewer digits. Let's see where the number 592 comes from. The numbers from 1000 to 9008 have four digits in their text representation, and there are nine such numbers; the numbers from 10009 to 99098 are written with five digits each, and we have ninety of them; the last number of our sequence, 100099, is represented by six digits, and there is one such number. In addition, each number is written on a separate line, and lines, as we know, are terminated by a line feed character (the character with code 10); since there are exactly one hundred numbers, there are also one hundred line feed characters, one after each number (including the last one: in general, it is considered that a correct text file should end with a correct line, i.e. its last character should be a line feed). In total we have 4·9 + 5·90 + 6·1 + 100 = 36 + 450 + 6 + 100 = 592 bytes, which is what we saw in the output of the command ls -l.
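The whole arithmetic can be replayed in a few lines. The sketch below is a Python illustration (the book's own example programs are in Pascal); it builds both representations in memory and checks their lengths:

```python
import struct

nums = [1000 + i * 1001 for i in range(100)]             # 1000, 2001, ..., 100099
text_form = "".join(f"{n}\n" for n in nums)              # one number per line
bin_form = b"".join(struct.pack("<i", n) for n in nums)  # little-endian 4-byte integers
print(len(text_form.encode("ascii")))  # 592
print(len(bin_form))                   # 400
```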
The numbers.txt file can be printed, for example (the beginning of the output is elided):

[...]
98097
99098
100099
avst@host:~/work$
It can also be opened in any text editor - if we do that, we will see the same hundred
numbers, or rather their decimal representation, and we can make changes if we want.
In other words, we can work with the numbers.txt file using our usual text tools.
The numbers.bin file is another matter. If you try to print it, you will see some blatant gibberish - and we'll be lucky if our terminal doesn't fall into disrepair, having accidentally received, among all this nonsense, control sequences that reprogram its behavior. However, the terminal can be brought back to its senses at any moment with the reset command. Anyway, we see that numbers.bin is not intended for humans; the same can be said about any file that is not a text file.
The hexdump program allows you to print the byte values of a given file (in hexadecimal notation) and, if you specify the additional -C option, also shows the characters corresponding to the bytes - or rather, those of them that can be displayed on the screen.
The leftmost column here contains the addresses, i.e. the byte offsets from the beginning of the file; they increase by 10₁₆ from line to line, since each line shows sixteen consecutive bytes. Next come the bytes themselves, and in the column on the right are the characters whose codes are contained in the corresponding bytes (or a dot if the character cannot be displayed). Looking at this column, we see the familiar gibberish.
Returning to the contents of the bytes, let's remember that the numbers we have are, by the conditions of the problem, firstly, four-byte numbers and, secondly, that our computer belongs to the little-endian class, i.e. the least significant byte of a number comes first. Taking the first four bytes of our dump, we see e8 03 00 00; rearranging them in reverse order, we get 000003e8; converting from hexadecimal to decimal, we have (taking into account that E stands for 14): 3·16² + 14·16¹ + 8·16⁰ = 768 + 224 + 8 = 1000, which is, as we remember, the first number written to the file. Just in case, let's do the same for the last four bytes of our dump: 03 87 01 00; rearranging them, we get 00018703₁₆ = 1·16⁴ + 8·16³ + 7·16² + 0·16¹ + 3·16⁰ = 65536 + 32768 + 1792 + 0 + 3 = 100099, which is the last number of our sequence.
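Instead of rearranging the bytes by hand, one can let a program do it; in Python, for instance, the struct module interprets a little-endian four-byte group directly (shown purely as an illustration of the same arithmetic):

```python
import struct

# "<i" means: a four-byte signed integer, least significant byte first.
first = struct.unpack("<i", bytes([0xE8, 0x03, 0x00, 0x00]))[0]
last = struct.unpack("<i", bytes([0x03, 0x87, 0x01, 0x00]))[0]
print(first, last)  # 1000 100099
```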
The right column contains quite readable text, which is understandable, because this file is a text file; here the hexdump program had to replace only the line feed characters with dots. Looking through the byte values starting from the first one, we see 31₁₆ (decimal 49) - the ASCII code of the digit 1, then three times 30₁₆ - the ASCII code of zero, then 0A₁₆ (decimal 10) - the line feed character, and so on.
This representation is convenient for humans, but it is more difficult for programs
to work with it; in order to perform any calculations, numbers have to be converted into
a representation that can be handled by the CPU. Of course, this transformation, as we
will see later, is not difficult at all, it is just necessary to remember about it and, of
course, to understand what, when and in what form we have a representation.
94. Whether such a possibility exists depends on the particular processor; for example, Pentium processors can, bypassing the registers, add a given number to the contents of a given memory location and perform some other operations on memory, while the SPARC processors used in computers from Sun Microsystems could only copy the contents of a memory location into a register or, conversely, the contents of a register into a memory location, and could perform no other operations on memory locations.
A program in memory is stored in the form of numeric codes denoting certain operations, and a special processor register, called the program counter or instruction pointer, determines from which memory location the next instruction code should be retrieved. The processor performs the instruction-processing cycle: it retrieves the next instruction code from memory, increments the instruction counter, decodes the retrieved code, performs the prescribed actions, then again retrieves the next instruction from memory, and so on ad infinitum.
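The cycle can be modelled in a dozen lines. The sketch below is a toy illustration in Python with an invented instruction set (1 = PUSH with an operand in the next cell, 2 = ADD, 0 = HALT); real processors follow the same fetch-increment-decode-execute rhythm, only with far richer instruction sets:

```python
memory = [1, 5, 1, 7, 2, 0]   # PUSH 5; PUSH 7; ADD; HALT
pc = 0                        # program counter: where the next instruction lives
stack = []
while True:
    opcode = memory[pc]       # fetch the next instruction code from memory
    pc += 1                   # increment the instruction counter
    if opcode == 0:           # HALT: stop the cycle
        break
    elif opcode == 1:         # PUSH: the operand occupies the following cell
        stack.append(memory[pc])
        pc += 1
    elif opcode == 2:         # ADD: replace the two topmost values with their sum
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
print(stack)  # [12]
```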
The representation of a program consisting of machine instruction codes and, as a
consequence, "understandable" to a central processor, is called machine code. The
processor can easily decipher such command codes, but it is very difficult for a human
to memorize them, especially since in many cases the required number has to be
calculated by substituting chains of binary bits into certain positions of the code. Here, for example, the two bytes written in hexadecimal as 01 D8 (the corresponding decimal values are 1 and 216) denote, on Pentium processors, the command "take the number from the register EAX, add to it the number from the register EBX, and put the result of the addition back into the register EAX". Remembering the two numbers 01 D8 is not difficult, but there are several hundred different commands on the Pentium processor, and, besides, only the first byte (01) here is the command proper, while the second (D8) we have to calculate in our minds, remembering (or learning from a reference book) that the lowest three bits of this byte denote the first register (the first summand and also the place where the result should be written), the next three bits denote the second register, and the highest two bits must both be ones, which means that both operands are registers. Knowing (or, again, having looked up in the reference book) that the number of the EAX register is 0 and the number of the EBX register is 3, we can now write down the binary representation of our byte: 11 011 000 (spaces are inserted for clarity), which gives 216 in decimal notation and the desired D8 in hexadecimal notation.
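The calculation just described is mechanical enough to hand over to a program; a Python illustration of assembling the second byte of this command:

```python
# Two high bits 11 mean "both operands are registers"; the next three
# bits hold the second register (the source), and the lowest three the
# first register (the destination, which also receives the result).
EAX, EBX = 0, 3
modrm = (0b11 << 6) | (EBX << 3) | EAX
print(f"01 {modrm:02X}")  # 01 D8
```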
If we need to recall a piece of our program written two days ago, to read it, we will
have to manually arrange bytes into their constituent bits and, checking the reference
book, remember what command does what. If a programmer is forced to compose
programs this way, he will not write anything useful in his whole life, because in any
program, even the smallest but practically applicable, there will be several thousands
of such commands, and the largest programs consist of tens of millions of machine
commands.
When working with high-level programming languages, such as Pascal, C, Lisp,
etc., the programmer is given the opportunity to write a program in a form that is
understandable and convenient for a human, but not for a central processor. The CPU
cannot execute such a program, and in order to make the program written in a high-
level language execute, one of two possible ways of program translation has to be used.
These two ways are called compilation and interpretation.
In the first case, you use a compiler: a program that takes as input the text of a program in a high-level programming language and outputs equivalent machine code (generally speaking, a compiler is a program that translates programs from one language into another; translation into machine code is only a special case, though a very important one). For example, in the next part of the book we will write programs in Pascal; after typing the program text (the so-called source code) and saving it in a file, we will run the Pascal compiler, which, after reading the text of our program, will either produce error messages or, if the program is written in accordance with the rules of the Pascal language, will create its equivalent in the form of an executable file containing the machine code of our program. By running our program, we will ask the operating system to load this machine code into RAM and transfer control to it, causing the processor to perform the actions we described in the Pascal text.
It should be emphasized that a compiler is also a program written in some
programming language; in particular, our Pascal compiler is itself, oddly enough,
written in Pascal, and its creators use the previous version of their own compiler to
compile each subsequent version.
The second way of executing programs written in high-level languages is called
interpretation. An interpreter program loads the source text of a program from a file
specified to it and performs the actions prescribed by this text step by step without
translating anything. Modern interpreters usually create their own internal
representation of the executed program for convenience and to increase the speed of
work, but this representation has nothing to do with machine code.
Here we should make an important remark. From one school textbook to another wanders the utterly nonsensical claim that an interpreter supposedly translates a program into machine code step by step and immediately executes this code. If you have been told something like this, or if you have read it yourself in some book written by who knows whom (even one bearing the Ministry of Education stamp - it happens), don't believe it: you are being deceived once again! This technique of execution is possible
in principle and even has a name - JIT-compilation (the abbreviation JIT is formed
from the English words just in time ), but it is relatively difficult to implement; for
example, it has to circumvent the restrictions imposed by the operating system, which,
unless special measures are taken, does not allow you to write anything to the memory
areas storing the machine code of the program being executed, and does not allow you
to transfer control to the memory areas whose contents the program can change (that is, to execute their contents as a sequence of machine instructions). Because of the technical problems
that arise, JIT-compilation is not so often used; by the way, many programmers do not
consider this variant of program execution as interpretation at all. A normal interpreter
does not have to be so tricky to execute our program, because it is a program in itself
and can simply perform the necessary actions without translating them into any code;
it is ten times easier to create such an interpreter than a JIT-compiler that translates
program fragments into machine code at runtime. The strange person who first put the
phrase about step-by-step translation into machine code with immediate execution into
the textbook obviously never wrote any programs himself, otherwise such an idea
would not have occurred to him.
We have already encountered one interpreter: it is a command interpreter that
handles the commands we type on the command line. As we saw in §1.2.15, it is
possible to write real programs in the language of this interpreter, called Bourne Shell.
We will not consider other interpreters for the moment, deferring familiarity with them
until Volume 3; but in general, interpreted execution is characteristic of a wide range
of programming languages, such as Perl, Python, Ruby, Lisp, and many others.
It is interesting to note that nowadays the boundaries between compilation and
interpretation are gradually blurring. Many compilers do not translate a program into
machine code, but into some intermediate representation, usually called "bytecode";
this is how Java and C# compilers work. In many cases, compilers generate code that
interprets some part of the intermediate representation of the program at runtime. On
the other hand, interpreters also translate the program into an intermediate
representation in order to increase its efficiency, but then they execute it themselves.
There are compilers that seem to create a separate executable file, but a close look at
this file reveals that it contains the entire interpreter of the program's internal
representation plus the representation itself.
In any case, compiled execution often uses elements of interpretation and
interpreted execution uses elements of compilation, and the question arises as to where
to draw the line between these two approaches and whether such a line exists at all. We
venture to propose a rather simple answer to this question, which allows us to say in
each particular case whether it is compiled or interpreted execution. During the
execution of an interpreted program the interpreter itself has to be in memory,
while the compiler is needed only at the stage of compilation, and the program can be
executed without its participation. Note that not all specialists agree with this view (in particular, with the fact that JIT-compilers should then be classified as interpreters rather than compilers). In general, the question of the boundary between interpretation and compilation is quite complex and entails a whole trail of methodological problems; in the third volume of our book we will devote a separate and rather large part to it.
Programming in high-level languages is convenient but, unfortunately, not always
applicable. The reasons may be very different. For example, a high-level language may
not take into account some peculiarities of a particular processor, or the programmer
may not be satisfied with the particular way in which the compiler implements certain
constructions of the source language with the help of machine codes. In these cases, it
is necessary to abandon the high-level language and compose a program in the form of
a specific sequence of machine commands. However, as we have already seen, it is very
difficult to compose a program directly in machine codes. This is where a program
called an assembler comes to the rescue.
An assembler is a special case of a compiler: a program that takes as input a text
containing human-friendly symbols of machine commands and translates these
symbols into a sequence of corresponding machine command codes that can be
understood by the processor. Unlike machine commands themselves, their symbols,
also called mnemonics, are relatively easy to memorize. Thus, the command from the example given earlier, whose code, as we found out with some difficulty, is 01 D8, looks like this in conventional assembly notation:

add eax, ebx

96. Here and below, the notation of the NASM assembler is used unless otherwise stated; the NASM assembler will be discussed in detail in Part 3 of our book.
Here we don't need to memorize the numeric code of the command or calculate operand encodings in our minds; we just need to remember that the word add denotes addition and that, in such commands, the first operand (not necessarily a register - it can be a memory location) is the first summand, the second operand (a register, a memory location, or just a number) is the second summand, and the result is always written in the place of the first summand. The language of such notations (mnemonics) is called assembly language.
Assembly language programming is fundamentally different from programming in
high-level languages. In a high-level language (like Pascal), we specify only general
instructions, and the compiler is free to choose how to execute them - for example,
which registers and memory locations to use for storing intermediate results, which
algorithm to use for executing some non-trivial instruction, and so on. In order to
optimize performance, the compiler can rearrange instructions, replace one instruction
with another - as long as the result remains unchanged. In an assembly language program, we specify exactly and unambiguously which machine instructions our program will consist of, and the assembler (unlike a high-level language compiler) has no freedom of choice.
Unlike machine codes, mnemonics are accessible to humans, that is, a programmer
can work with mnemonics without much difficulty, but this does not mean that
programming in assembly language is easy. An action that would take us one high-
level language statement to describe may require dozens, if not hundreds, of assembly
language lines, and in some cases even more. The point is that a high-level language
compiler contains a large set of ready-made "recipes" for solving often arising small
problems and provides all these "recipes" to the programmer in the form of convenient
high-level constructs; assembly language contains nothing of the kind, so we have only
the processor's capabilities at our disposal.
It is interesting that there can be several different assemblers for one processor. At
first glance this seems strange, because one and the same processor cannot work with different systems of machine code (so-called instruction sets). In fact, there is nothing strange here; just remember what an assembler really is. The instruction set of a processor,
of course, cannot change (unless you take a different processor). However, for the same
commands it is possible to invent different designations; for example, the already
familiar command add eax, ebx in the designations proposed by AT&T will
look like addl %ebx, %eax - the mnemonic is different, the registers are labeled differently, and the operands come in a different order, although the resulting machine code is, of course, exactly the same: 01 D8. Besides, when programming in
assembly language, we usually write not only mnemonics of machine commands, but
also directives, which are direct orders to the assembler. Following such directives, the
assembler can reserve memory, declare this or that label visible from other program
modules, proceed to generation of another program section, calculate (right during
assembly) some expression, and even "write" a program fragment in assembly language
(following, of course, our instructions), which it will process later. The set of such
directives supported by the assembler can also be different, both in terms of capabilities
and syntax.
Since an assembler is nothing more than a program written by quite ordinary programmers, no one prevents other programmers from writing their own assembler program, which indeed often happens. The NASM assembler discussed in our book is one
of the assemblers that exist for the 80x86 family of processors; there are other
assemblers for these processors.
Part 2
A Pascal program itself, like a program in almost any programming language, is a text in ASCII representation, which we discussed in §1.4.4. Consequently, we will have to use some text editor to write the program; we described several of them in §1.2.12. We will save the result in a file whose name ends with the suffix ".pas", which conventionally marks the text of a Pascal program.

97. Pascal interpreters also exist, but they are used very rarely.
98. There are exceptions to this rule, but they are so rare that you can quite safely ignore them.
99. Recall that the Unix family of systems does not have the file-name "extension" that many users are accustomed to; the word "suffix" means much the same thing: a few characters at the very end of a name that
§2.1 First programs 233
We will then run the compiler, which in our case is called fpc from the words Free Pascal
Compiler, and if all is well, the result of our exercises will be an executable file that we can
run.
For the sake of clarity and convenience, we will create an empty directory before starting our experiments and conduct all of them in it. In the examples of dialogs with the computer, here and below we will reproduce the command-line prompt, consisting of the user name (avst), the machine name (host), the current directory (recall that the user's home directory is denoted by the "~" symbol) and the "$" sign traditionally used in the prompt. So, let's begin:
avst@host:~$ mkdir firstprog
avst@host:~$ cd firstprog
avst@host:~/firstprog$ ls
avst@host:~/firstprog$
We created the directory firstprog, entered it (i.e. made it the current directory), and,
using the ls command, verified that it is empty, i.e. does not contain any files. If something
here is not quite clear, please reread §1.2.5 immediately.
Now it's time to launch a text editor and type the program text in it. The author of these
lines prefers the vim editor, but if you don't want to learn it at all (which, admittedly, is
not very easy), you can just as easily use other editors, such as joe or nano. In all cases,
the principle of starting a text editor is the same. The editor itself is just a program that has a
name to run, and this program should be given (as a parameter) the name of the file you want
to edit. If such a file does not already exist, the editor will create it the first time you save it.
The program we will write will be very simple: all it will do is print the same string on the screen every time. Of course, there is no practical use for such a program, but right now it is more important to get some program working, just to make sure that we can do it. In our example the program will print the phrase Hello, world!, so we will name the file with its source code hello.pas. So let's run the editor (substitute joe or nano for vim if you want):

vim hello.pas
Now we need to type the program text. It will look like this:

program hello;
begin
    writeln('Hello, world!')
end.

99 (continued). ...are separated from the main name by a period; however, unlike other systems, Unix does not consider a suffix to be anything other than just a piece of the name.
100. Once again, let us remind you that the term "folder" should not be used!
101. It is more correct to speak, of course, not about the screen but about the standard output stream (see §1.2.11); we will allow ourselves to use the terms loosely for the time being, so as not to complicate the understanding of what is going on.
102. The tradition of starting to learn a programming language with a program that prints this exact phrase was introduced by Brian Kernighan for C a long time ago; it is not a bad tradition in itself, so you can follow it when learning Pascal as well. However, you can use any phrase you like.
Now it is time to give some explanations. The first line of the program is the so-called header,
which shows that all the text that follows represents a program (the word program), which
its author named hello (actually, you can write here any name consisting of Latin letters,
numbers and the underscore character, but the name must begin with a letter; we will call
such names identifiers). The header ends with a semicolon. Generally speaking, modern implementations of Pascal allow you to omit the header, but we will not use this possibility: a program without a header does not look as clear.
The word begin, which we wrote on the second line, means beginning in English; in
this case it means the beginning of the main part of the program, but our program is so simple
that it actually consists of this "main part" alone; later we will write programs in which the
"main part" will be quite small compared to everything else.
The next line, writeln('Hello, world!'), does exactly what our program was written for: it prints "Hello, world!". The word "write" means just that, and the mysterious addition "ln" comes from the word line and means that after printing, a line feed is performed, i.e. the output moves to a new line (we will consider this point in detail later). The resulting word writeln is used in Pascal to denote an output operator with a line feed, and the parentheses list everything the operator should output; in this case, it is a single string.
The reader already familiar with Pascal may object that most sources do not call writeln an
operator, but a "built-in procedure"; but this is not quite correct, because the word "procedure"
(without the epithet "built-in") denotes a subroutine that is written by the programmer himself, and
built-in procedures should be called procedures that the programmer could write, but does not need
to do so, because the compiler already contains them. We cannot write anything like writeln with
its variable number of parameters and output formatting directives; in other words, if this "built-in
procedure" did not exist in the language, we could not make it ourselves. Writeln and other
similar entities have their own syntax (a colon after the argument, followed by an integer width in
characters), so it is a part of the Pascal language that is handled by the compiler in its own way, not
according to generalized rules. In this situation, it seems much more logical to call writeln an
operator rather than something else. To be fair, the compiler treats the words write, writeln, read, readln, etc. as ordinary identifiers, not as reserved words, which is a strong argument against classifying these entities as operators; on the other hand, Free Pascal classifies these words as modifiers, a category that also includes, for example, break and continue, which no one calls anything other than operators.
The line intended for printing is enclosed in apostrophes to show that this text fragment
stands for itself and not for any language constructs. If we had not put apostrophes, the
compiler would have tried to understand what we mean by the word "Hello" and, failing to
find any suitable meaning, would have issued an error message, and would not have translated
our program into machine code in the end; but since the word is enclosed in apostrophes, it
stands for itself and nothing else, so the compiler does not need to think about anything. A
sequence of characters enclosed in apostrophes and specifying a text string is called a string
literal or string constant.
The last line of our program consists of the word end and a dot. The word end means
"end" in English (again, in this case, the end of the main part of the program). Pascal rules
require the program to end with a dot, either to be sure or just for beauty.
So, the text is typed; save it and exit the text editor (in vim we press Esc, then colon, wq, Enter; in nano - Ctrl-O, Enter, Ctrl-X; in joe - Ctrl-K, then x; in the future we will not explain how to do this or that in different editors, because all this has already been discussed in §1.2.12). When we get the command line prompt again, we make sure that our directory is
no longer empty - it contains the file we just typed:
avst@host:~/firstprog$ ls
hello.pas
avst@host:~/firstprog$
For the sake of clarity, we'll give a command that shows the contents of the file; it's best not to do this with large texts, but our program consists of only four lines, so we can afford to experiment a little. The command itself is called cat, as we already know, and its parameter is the name of the file:
avst@host:~/firstprog$ cat hello.pas
program hello;
begin
    writeln('Hello, world!')
end.
avst@host:~/firstprog$
By the way, as long as there is only one file in our directory, you don't have to type its name on the keyboard: press the Tab key, and the command line interpreter will write the name for us.
Now that we are sure that we have the file and that it contains what we expect, we can run the compiler. Recall that it is called fpc; as usual, it needs a parameter, and this is, once again, our source file name:
avst@host:~/firstprog$ fpc hello.pas
The compiler will print a number of lines that vary slightly depending on the particular version
of the compiler. If among them there is a string similar to
/usr/bin/ld: warning: link.res contains output sections; did you forget
-T? - you can safely ignore it, as well as all other lines, unless they contain the word
Error, Fatal, warning or note. An error message (with the word Error or
Fatal) means that the text you've fed to the compiler doesn't follow the rules of the Pascal
language, so you won't get compilation results - the compiler just doesn't know what to do.
Warnings (with the word warning) are issued if the program text formally complies with
the language requirements, but for some reason the compiler thinks that the result will not
work as you expected (most likely, incorrectly); the only exception is the above warning about
output sections, it is not actually issued by the compiler, but by the ld (linker)
program it calls, and we are not concerned with this warning. Finally, comments (messages
with the word note) are issued by the compiler if some part of the program seems strange
to it, although it should not, in its opinion, lead to incorrect operation.
For example, if we wrote writenl instead of writeln and then forgot to put a
period at the end of the program, we would see, among other things, messages like these:
hello.pas(3,12) Error: Identifier not found "writenl"
hello.pas(5) Fatal: Syntax error, "." expected but "end of file" found
The first message informs us that the compiler does not know the word writenl, so nothing good will come of our program; the second message means that the file ended before the compiler reached the final dot, and this upset it so much that it refuses to consider our program any further (this is how Fatal differs from plain Error).
Pay attention to the numbers in brackets after the file name; hello.pas(3,12)
means that the error was detected in the file hello.pas, in line 3, column 12, and
hello.pas(5) means an error in line 5 - there is no such line in our program, but by the
time the compiler detected that the file unexpectedly ran out, line 4 was left behind, and the
fact that there is nothing in line 5 is another matter.
Line numbers given together with error messages and warnings are very valuable
information that allows us to quickly find the place in our program where the compiler did
not like something. Regardless of which text editor you use, it is desirable to immediately
understand how to find a line by number in it, otherwise programming will be difficult.
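Since the compiler's diagnostics are just lines of text, the shell tools we already know can sift them when the output gets long. Here is a small sketch; the sample lines below are canned stand-ins rather than real fpc output, so their exact wording is an assumption:

```shell
# Feed grep a canned sample of compiler-like output and keep only
# the lines that mention Error, Fatal or Warning:
printf '%s\n' \
  'Free Pascal Compiler version 3.2.2' \
  'hello.pas(3,12) Error: Identifier not found "writenl"' \
  'Linking hello' \
| grep -E 'Error|Fatal|Warning'
```

In practice you would pipe the real compiler through the same filter, e.g. fpc hello.pas 2>&1 | grep -E 'Error|Fatal|Warning'.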
You can verify the success of the compilation by once again viewing the contents of the current directory with the ls command:
avst@host:~/firstprog$ ls
hello  hello.o  hello.pas
avst@host:~/firstprog$
As you can see, there are more files. We are not very interested in the file hello.o, it is
a so-called object module, which the compiler feeds to the linker and then forgets to delete
for some reason; but the file called simply hello without the suffix is what we started the
compiler for: an executable file, that is, simplistically speaking, a file containing the machine
code corresponding to our program. To be sure, let's try to get more information about these
files:
avst@host:~/firstprog$ ls -l
total 136
-rwxr-xr-x 1 avst avst 123324 2015-06-04 19:57 hello
-rw-r--r-- 1 avst avst   1892 2015-06-04 19:57 hello.o
-rw-r--r-- 1 avst avst     47 2015-06-04 19:57 hello.pas
avst@host:~/firstprog$
In the first column, we see that the hello file has execution rights set (the letter "x"). At
the same time, we notice how much larger this file is than the source text: the file hello.pas
occupies only 47 bytes (by the number of characters in it), while the resulting executable file
"weighs" more than 120 kilobytes. Actually, everything is not so bad: as we will soon see, the resulting executable file will grow much more slowly than the source program. It's just
that the compiler is forced to stuff into the executable file everything that is needed to perform
various input/output operations at once, and we don't use these possibilities yet.
All that remains is to run the resulting file. This is done in the following way:
avst@host:~/firstprog$ ./hello
Hello, world!
avst@host:~/firstprog$
If you have a question about why you should always write "./hello" rather than just "hello", recall that we have already dealt with this when studying command scripts; see page 120 for a detailed explanation. Briefly, a name with no slashes in it is used
in Unix to run commands from system directories (listed in PATH, see §1.2.16), and our working
directory is not one of them; so we necessarily need an absolute or relative name with at least one
slash in it. The name "." is present in any directory and refers to the directory itself, which is what
we need.
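This rule is easy to check experimentally. The sketch below builds a tiny throwaway script; the directory and script names are made up for the demonstration, and it assumes /tmp/pathdemo is not in your PATH:

```shell
# Create a directory with an executable script named "greet":
mkdir -p /tmp/pathdemo
cd /tmp/pathdemo
printf '#!/bin/sh\necho from the current directory\n' > greet
chmod +x greet

# A bare name is looked up only in the PATH directories, so this fails:
greet 2>/dev/null || echo 'greet: not found via PATH'

# A name containing a slash is treated as a path, so this works:
./greet
```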
Naturally, there can be more than one operator in a program; in the simplest case, they
will be executed one after another. Let's consider such a program for example:
program hello2;
begin
    writeln('Hello, world!');
    writeln('Good bye, world.')
end.
Now we have not one operator in the program, as before, but two; note that we put a semicolon between the operators. A semicolon is placed at the end of an operator to separate it from the next one - if, of course, there is a "next" operator; if there is not, there is no need for a semicolon. The word end is not an operator, so it is usually not preceded by a semicolon.
If this program is compiled and run, it will print first (as a result of the first statement)
"Hello, world!" and then (as a result of the second statement) "Good bye,
world.":
avst@host:~/firstprog$ ./hello2
Hello, world!
Good bye, world.
avst@host:~/firstprog$
Let's go back a bit and explain in a bit more detail the letters "ln" in the name of the
writeln operator, which, as we have already said, mean line feed. Pascal also has the
write operator, which works in exactly the same way as writeln, but does not perform
a line feed at the end of the output operation. Let's try to edit our first program by removing
the letters "ln":
program hello;
begin
    write('Hello, world!')
end.
After that, let's run the fpc compiler again to update our executable and see how our
program will work now. The picture on the screen will look something like this:
avst@host:~/firstprog$ ./hello
Hello, world!avst@host:~/firstprog$
This time, as before, our program printed "Hello, world!", but it did not perform a line feed; so when, after its completion, the command line interpreter printed its next prompt, the prompt appeared on the same line as the output of our program.
Here is another example. Let's write such a program:
program NewlineDemo;
begin
    write('First');
    writeln('Second');
    write('Third');
    writeln('Fourth')
end.
Let's save this program to a file, for example, nldemo.pas, compile and run it:
avst@host:~/firstprog$ fpc nldemo.pas
avst@host:~/firstprog$ ./nldemo
FirstSecond
ThirdFourth
avst@host:~/firstprog$
In the program, we output the first and third words using the write operator, while the
second and fourth words are output using writeln, i.e. with a line feed. The effect of this
is quite obvious: the word "Second" is printed on the same line as "First"
(after which the program did not do a line feed), so they merged together; the same happened
with the words "Third" and "Fourth".
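Incidentally, the write/writeln distinction has a direct analogue in the shell's printf utility, which moves to a new line only where you write \n explicitly; the following is merely an illustration in the shell, not Pascal:

```shell
# printf without \n behaves like write; printf with \n behaves like writeln:
printf 'First'; printf 'Second\n'
printf 'Third'; printf 'Fourth\n'
```

Running these four commands produces the same FirstSecond and ThirdFourth lines as the Pascal program above.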
Let us make one important terminological remark. The word "operator" as used in this book is an example of a rather unfortunate Russian translation of an English term: in the original, this entity is called a statement, which would be more correctly rendered as "statement" or "sentence". The problem is that the word operator also exists in English, and in programming it denotes what Russian terminology calls operations - addition, subtraction, multiplication, division, comparison and so on.
It is interesting that in mathematics the English word operator has passed into Russian
terminology, practically preserving (except that it has somewhat narrowed) its original meaning; the
reader may be familiar with such terms as linear operator, differentiation operator, etc.
In any case, the term "operator" in the Russian-language programming vocabulary is firmly fixed
as a designation of such a programming language construct that prescribes some action - not a
calculation, but exactly an action. How it happened is not important now; unfortunately, it creates
problems when programming languages use the word operator - in English, of course, and not
in Russian. There is no such word in Pascal, but in the third volume of our book we will study C++,
where the word operator is used quite actively. That's why it is useful to remember that in
programming the Russian word "operator" and the English word "operator" denote completely
different entities.
Before we finish talking about the simplest programs, let's mention one more very
important point - the way we have arranged different parts of our source code in relation to
each other. The header
and the words begin and end, which denote the beginning and end of the main
part of the program, we wrote at the beginning of the line, while the operators, which make
up the main part, we moved to the right, putting four spaces before them; in addition, we
placed all these elements each on a separate line.
Interestingly, the compiler does not need all this at all. We could write, for example, like this:
program hello; begin writeln('Hello, world!') end.
or like this:
program
hello; begin writeln(
'Hello, world!') end.
As for our second program, which has more operators, there's also a lot more room for messing around - for example, you could write something like
program hello2; begin writeln('Hello, world!');
writeln('Good bye, world.') end.
From the compiler's point of view, nothing would change; moreover, as long as we are talking
about very primitive programs, it is unlikely that anything will change from our own point of
view, unless our aesthetic sense is offended. However, the situation changes dramatically for more or less complex programs. The reader will soon see for himself that program texts are
rather hard to understand; in other words, to understand what a program does and how it does
it from the available program text, one has to make intellectual efforts that in many cases
exceed the efforts required to write a program text from scratch. If the text is also written in
the wrong way, it turns out to be impossible to understand it, as Bulgakov's famous character
said. The most interesting thing is that you can get hopelessly lost not only in someone else's
program, but also in your own, and sometimes it happens before the program is finished.
There is nothing easier (and more frustrating) than getting lost in your own code before you
have had time to write anything.
In practice, programmers spend almost more time reading programs - both their own and other people's - than writing them, so, quite naturally, the readability of program code is always given the most careful attention. Experienced programmers say that program text is intended first of all for human reading and only secondly for computer processing. Practically all existing programming languages give the programmer a certain freedom in code layout, which makes it possible to design even the most complex program so that it is understandable at a glance to any reader familiar with the language used; but with the same success it is possible to make a very simple program incomprehensible even to its own author.
To improve the readability of programs, a number of techniques are used, which together make up a competent style of code layout, and one of the most important points here is the use of structural indentation. Almost all modern programming languages allow an arbitrary number of whitespace characters - spaces or tabs - at the beginning of any line of text, which allows you to lay out a program so that its structure can be "grasped" by an unfocused eye, without reading. As we will soon see, the structure of a program is formed by
the principle of nesting one thing into another; the technique of structural indentation allows
us to emphasize the structure of a program by simply shifting to the right any nested constructs
relative to what they are nested in.
In our simplest programs, there is only one level of nesting: the writeln and write
operators are nested in the main part of the program. In this case, neither the header nor the
main part itself is nested in anything, so we wrote them starting from the leftmost column of
characters in the program text, and shifted the writeln and write operators to
the right to show that they are nested in the main part of the program. The only thing we
have chosen rather arbitrarily here is the size of the structural indentation, which in our
examples throughout this book will be four spaces. In reality, many projects (in particular, in
the Linux kernel) use the tab character for structural indentation, and exactly one. You may
also see a two-space indentation, which is the indentation used in the code of programs
produced by the Free Software Foundation (FSF). It is very rare to use three spaces; this
indentation size is sometimes found in programs written for Windows. Other indentation sizes
should not be used, and there are a number of reasons for this. One space is too small to emphasize blocks visually: the left edge of the text is perceived as almost smooth and does not serve its purpose. Indentation wider than four spaces is difficult to enter: runs of more than five spaces have to be counted as you type, which slows down your work, and even five spaces turn out to be too many.
If you use more than one tab per indentation level, there is almost no horizontal space left on the screen.
Once you have chosen one indentation size, it should be adhered to throughout the
program text. This also applies to other decisions made about the design style: the source text
of the program must be stylistically homogeneous, otherwise its readers will have to
constantly rearrange their perception, which is rather tedious.
(A curious exception to this statement is Python, which does not allow an arbitrary number of spaces at the beginning of a line - on the contrary, it imposes a strict syntax requirement on the number of such spaces, which corresponds to the principles of structural indentation. In other words, most programming languages allow structural indentation, whereas Python requires it.)
§ 2.2. Expressions, variables and operators
Suppose we want to know the product of 175 and 113 (of course, it is much easier to use a calculator or the arithmetic capabilities of a command interpreter, see §1.2.15, but that is not important now). The following program will do the job:
program mult175_113;
begin
    writeln(175*113)
end.
After compiling and running this program, we will see the answer 19775 on the screen.
The "*" symbol in Pascal (and in most other programming languages) denotes the
multiplication operation, and the construct "175*113" is an example of an arithmetic
expression . We could make this program a little more "user-friendly" by showing in its output
what the answer is actually referring to:
program mult175_113;
begin
    writeln('175*113 = ', 175*113)
end.
Here we have given the writeln operator two arguments to print: a string and an
arithmetic expression. We can specify as many arguments as we like, listing them, as in this example, separated by commas. If we now compile and run the new version of the program, it will look like this:
avst@host:~/firstprog$ ./mult175_113
175*113 = 19775
avst@host:~/firstprog$
Note the fundamental (from the compiler's point of view) difference between characters included in the string literal and characters outside it: while the expression 175*113 outside the string literal was evaluated, the compiler did not try to treat the same characters between the apostrophes - and thus included in the string literal - as an expression, or in any capacity other than just characters. Therefore, in accordance with our instructions, the program
first printed one by one all characters from the given string literal (including, as you can easily
see, spaces), and then the next argument of the writeln operator, which is the expression
with the value 19775.
Of course, multiplication is by no means the only arithmetic operation that Pascal
supports. Addition and subtraction in Pascal are denoted by the natural "+" and "-" signs,
so that, for example, the expression 12 + 105 will result in 117, and the expression 25 - 10 will result in 15. Just as in writing mathematical formulas, operations
in Pascal expressions have different priorities; for example, the priority of multiplication in
Pascal, as in mathematics, is higher than the priority of addition and subtraction, so the value
of the expression 10 + 5*7 will be 45, not 105: when calculating this expression,
multiplication is done first, i.e. 5 is multiplied by 7, resulting in 35, and only then
the resulting number is added to ten. Just as in mathematical formulas, in Pascal's expressions
we can use parentheses to change the order of operations: (10 + 5) * 7 will result
in 105.
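The same precedence and parenthesization rules hold in the shell's arithmetic expansion mentioned in §1.2.15, so you can double-check such expressions right at the prompt:

```shell
# Multiplication binds tighter than addition; parentheses override that:
echo $((10 + 5*7))      # prints 45
echo $(((10 + 5)*7))    # prints 105
```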
Note that Pascal also provides unary (i.e. having one argument) operations "+" and "-": you can, for example, write -(5 * 7) and get -35; the unary operation "-", as you would expect, changes the sign of the number to the opposite. The unary operation "+" is also provided, but it makes no sense: its result is always equal to the argument.
The division operation is somewhat more complicated. The usual mathematical division
is indicated by the slash "/"; just remember that the division operation is different from
multiplication, addition, and subtraction: even if its arguments are integers, the result
generally cannot be expressed as an integer. This leads to a somewhat unexpected effect for
beginners. For example, if we write the writeln(14/7) operator in a program, we may
not even immediately recognize the number 2 in its output:
2.0000000000000000E+0000
To understand what the problem is and why writeln did not print just "2", we will need a rather lengthy explanation introducing the concept of expression type. Since this is one of the fundamental concepts in programming and we will not do without it in the future, let's try to deal with it right now.
Note first that all the numbers we have written in the above examples are integers; if we
had written something like 2.75, we would have been talking about a different type of
number, a so-called floating-point number. You and I have looked at the representation of
both types of numbers in detail (see §1.4.2 and 1.4.3) and have seen that they are stored and
handled quite differently. Mathematically, "2" and "2.0" are the same number, but in
programming they are quite different things because they have different representations, so
that operations on them require different sequences of machine commands. Moreover, integer
2 can be represented as a two-byte, one-byte, four-byte or even eight-byte integer, signed or
unsigned. All these are also examples of situations where different types are involved. In Pascal, each type has its own name; for example, signed integers can be of type shortint,
integer, longint, and int64 (respectively one-byte, two-byte, four-byte, and eight-
byte signed integers), the corresponding unsigned types are called byte, word,
longword, and qword, and floating-point numbers are usually of type real (although
Free Pascal supports other types of floating-point numbers).
As it often happens in engineering disciplines, the notion of type is poorly definable: even
if we try to give such a definition, it is likely to be in some way inconsistent with reality. Most
often in the literature one can find the statement that a type is a set of possible values, but if
we accept such a definition, we cannot explain the difference between 2 and 2.0; therefore,
the notion of type should include not only a set of values, but also the machine representation
of these values accepted for a given type. In addition, many authors emphasize the important
role of the set of operations defined for a given type, and this is, in principle, reasonable. By
stating that an expression type fixes the set of values, their machine representation and
the set of operations defined on these values, we are likely to be close to the truth.
Types (and expressions) are not only numeric. For example, the boolean type,
intended for working with logical expressions, has only two values: true and false; the
set of operations on values of this type includes the familiar conjunction, disjunction,
negation, and "exclusive or". The 'Hello, world!' used in our very first program is an example of a value of yet another, string, type. (Of course, if we simply write a number in a program, such an entry - a so-called numeric constant - also has a type of its own.)
By the way, we can tell the output operators how many characters we want to allocate for printing a number and how many of them for decimal places. To do this, after the numeric expression the write and writeln operators accept a colon, an integer (how many characters in total), another colon, and another integer (how many decimal places). For example, if you write writeln(14/7:7:3), it will print 2.000, and since that is only five characters, two more spaces will be printed before it.
In addition to ordinary division, Pascal provides for integer division, known from school
mathematics as division with remainder. For this purpose, two more operations are
introduced, denoted by the words div and mod, which mean division (with the remainder
discarded) and the remainder of such division. For example, if we write
writeln(27 div 4, ' ',27 mod 4);
- it will print two numbers separated by a space: "6 3".
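Shell arithmetic offers the same pair of operations, written / and %; for non-negative integers they behave like Pascal's div and mod, so the example above can be cross-checked at the prompt:

```shell
# Integer quotient and remainder of dividing 27 by 4:
echo $((27 / 4)) $((27 % 4))    # prints: 6 3
```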
Each variable is designated by an identifier, which in this case is called a variable name. For example, we can name a variable "x", "counter", "p12",
"LineNumber" or "grand_total". Later we will encounter variables that have no
names, but we are still a long way off from that. Note that Pascal does not distinguish between
uppercase and lowercase letters, i.e. according to Pascal's rules the words "LineNumber",
"LINENUMBER", "linenumber" and "LiNeNuMBeR" denote one and the same
variable; another issue is that using different spelling variants for one and the same identifier
is considered by programmers to be extremely bad taste.
A variable has a value associated with it at any given time; a variable is said to store a
value or a value is said to be contained in a variable. If a variable name occurs in an arithmetic
expression, a so-called variable reference is performed, where the variable value is
substituted into the expression instead of the variable name.
Pascal is classified as a strictly typed programming language; this means, in particular,
that each variable in a Pascal program has a strictly defined type. In the previous paragraph
we considered the concept of expression type; a variable type can be understood, on the one
hand, as the type of an expression consisting of a single reference to this variable, and, on the
other hand, as the type of an expression whose value can be stored in such a variable. For
example, the most popular type in Pascal programs, which is called integer, implies that variables of this type are used to store integer values, can contain numbers from -32768 to 32767 (that is, two-byte signed integers), and a reference to such a variable will also, of course, be an expression of type integer.
Before a variable can be used in a program, it must be described. To do this, a variable
description section is inserted between the header and the main part of the program; this
section begins with the word "var" (from the word variables), followed by one or more
variable descriptions and their types. For example, the variable descriptions section may look
like this:
var
    x: integer;
    y: integer;
    flag: boolean;
The values of variables are stored in the computer's RAM, or rather, in the part of it that is allocated to our program. Note also that you can start an identifier with an underscore, but in Pascal programs this is not usually done. Several variables of the same type can be described at once, listing their names separated by commas:
var
    x, y, z: integer;
Pascal provides several different ways to put a value into a variable. For example, you can set the initial value of a variable directly in the description section; this is called initialization:
var
    x: integer = 500;
If you don't do this, the variable will still contain some value, but it is impossible to predict what value: it can be arbitrary garbage. Such a variable is called uninitialized. Using a "garbage" value is inherently wrong, so the compiler, when it sees such a use, issues a warning; unfortunately, it does not always manage to: in some cases a program may be written so "cunningly" that it will refer to an uninitialized variable, but the compiler will simply not notice it. Sometimes it happens the other way around: the compiler issues a warning even though the variable in fact always receives a value before it is used.
The value of a variable can be changed at any time by executing a so-called assignment
operator. In Pascal, an assignment is denoted by ":=", to the left of which is written the
variable to which the new value is to be assigned, and to the right is written the expression
whose value is to be entered into the variable. For example, the operator
x := 79
(read "put x equal to 79") will put the value 79 into the variable x, and from the moment
this operator is executed, expressions containing the variable x will use this value. The old
value of the variable, whatever it may be, is lost when the assignment is executed. The
operation of assignment can be illustrated by the following example:
program assignments;
var
    x: integer = 25;
begin
    writeln(x);
    x := 36;
    writeln(x);
    x := 49;
    writeln(x)
end.
When the first of the writeln statements is executed, the variable x contains the
initial value given in the description, that is, the number 25; this is what will be printed. Then
the assignment operator on the next line will change the value of x; the old value of 25 will
be lost, and the variable will now contain the value 36, which will be printed by the
second writeln operator; after that, the variable will contain the value 49, and the
last writeln operator will print it. In general, the execution of this program
will look like this:
avst@host:~/firstprog$ ./assignments
25
36
49
avst@host:~/firstprog$
Another example of assignment is somewhat more difficult to understand:
x := x + 5
Here, the value of the expression to the right of the assignment sign is calculated first; since
the assignment itself has not yet occurred, the old value of x, the value that was in
this variable just before the operator was executed, is used in the calculation. Then the
calculated value, which for our example will be 5 more than the value of x, is put back
into the variable x, that is, roughly speaking, as a result of the execution of this
operator, the value contained in the variable x becomes 5 more: if it was 17, it
becomes 22, if it was 100, it becomes 105, and so on.
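The "old value on the right, new value on the left" mechanics is not specific to Pascal; the shell's arithmetic assignment works the same way, which makes for a quick illustration at the prompt:

```shell
# x gets 17, then is increased by 5 using its old value:
x=17
x=$((x + 5))
echo "$x"    # prints 22
```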
As a somewhat more meaningful example, let us write a program that reads a number and prints its square:
program square;
var
    x: integer;
begin
    read(x);
    x := x*x;
    writeln(x)
end.
As you can see, the program uses one variable of the integer type, which is called x.
The main part of the program includes three operators. The first of them, read, prescribes
to read an integer number from the keyboard and then put the read number into the variable
x. The program execution will stop until the user enters the number, and due to certain
peculiarities of the terminal operation mode (or rather, in our case - its software emulator,
which repeats the peculiarities of real terminals) the program will "see" the entered number
no sooner than the user presses the Enter key.
The second operator in our example is an assignment operator; the expression on its right
side takes the current value of the variable x (i.e. the value that has just been read from the
keyboard by the read operator), multiplies it by itself, i.e. squares it, and puts the result
back into the variable x. The third operator - the familiar writeln - prints the result.
Execution of this program may look like this:
avst@host:~/firstprog$ ./square
25
625
avst@host:~/firstprog$
Immediately after starting, the program "freezes", so that an inexperienced user may think it has hung; in fact the program is just waiting for the user to enter the required number.
In the example above, the number 25 was entered by the user, and the number 625 was
given by the program.
By the way, now is a good time to show why the expression "input from the keyboard" is
not quite true and it would be more correct to speak about "input from the standard input
stream". First of all, let's create a text file containing one line and a number in this line, let it
be 37 for a change. We'll call the file num.txt. To create it, you can use the same text
editor that you use to enter program texts, but you can do it in a simpler way - for example,
like this:
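The command itself is missing from this copy; as explained just below, it used echo with output redirection, so presumably it was:

avst@host:~/firstprog$ echo 37 > num.txt
avst@host:~/firstprog$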
Now let's run our square program by redirecting the input from the num.txt file to it:
avst@host:~/firstprog$ ./square < num.txt
1369
avst@host:~/firstprog$
The number 1369 is the square of the number 37; our program printed it. As we can see,
it did not enter the original number from the keyboard - it read it from the num.txt file in
accordance with our instructions. It is easy to make sure of it: edit the num.txt file, replacing
the number 37 with some other number, and run the square program again with
redirection from the file, as shown in the example; this time the program will print the square
of the number you entered into the file.
In our example, we first used the system command echo with redirected output to
generate a file containing a number, and then ran our program with redirected input from that
file. Roughly the same results can be achieved without any intermediate files by running the
echo command at the same time as our program with a so-called pipeline (see §1.2.11);
in this case, the output of the echo command will go directly to the input of our program.
This is done in the following way:
avst@host:~/firstprog$ echo 37 | ./square
1369
avst@host:~/firstprog$ echo 49 | ./square
2401
avst@host:~/firstprog$
Let's redirect the output to the result.txt file:
avst@host:~/firstprog$ echo 37 | ./square > result.txt
avst@host:~/firstprog$
This time nothing was displayed on the screen at all, but the result was written to a file, which
is easy to verify:
avst@host:~/firstprog$ cat result.txt
1369
avst@host:~/firstprog$
Now we know from our own experience: output "to the screen" is not always on the
screen, and input "from the keyboard" is not always performed from the keyboard.
That is why it is more correct to speak about output to the standard output stream and about
input from the standard input stream. Let us emphasize that the program itself does not know
where its input is coming from and where its output is directed to, all redirections are
made by the command line interpreter just before launching our program.
2.2.5. Beware of insufficient bit width!
Once Ilya Muromets came to chop off the head of the
Serpent Gorynych. He came and cut off the Snake's head.
And the Snake grew two heads. Ilya Muromets cut off
two heads of the Snake, and the Snake grew four. He
chopped off four heads, and the Snake grew eight heads.
Ilya Muromets chopped and chopped and chopped the
heads, and when he had finally chopped off a total of
65535 heads, the Serpent Gorynych died. Because he was
sixteen-bit.
Let's continue the experiments with the square program started in the previous
paragraph, but this time we will take larger numbers.
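The transcript of these runs is missing from this copy; judging from the discussion that follows (the third run fed the program 200 and produced -25536, and 181 is the largest number handled correctly), it presumably looked something like this, with the first two inputs being our guesses:

avst@host:~/firstprog$ ./square
100
10000
avst@host:~/firstprog$ ./square
181
32761
avst@host:~/firstprog$ ./square
200
-25536
avst@host:~/firstprog$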
If everything seemed to be OK with the first two runs, something obviously went wrong on
the third one. To understand what was going on, let's remember that variables of the
integer type in Pascal can take values from -32768 to 32767; but the square of the
number 200 is 40000, so it simply doesn't fit into a variable of the integer type!
Hence the ridiculous result, which is also negative.
The result, despite its absurdity, is very simple to explain. We already know that we are dealing
with a signed integer and that an overflow has occurred (see §1.4.2, page 208). The digit capacity of our
numbers is 16 bits, so the overflow results in a final number that is 2^16 = 65536 less than it should be.
The correct result of the multiplication would be 200^2 = 40000, but the overflow reduced it by 65536,
so that 40000 - 65536 = -25536; this is exactly what we see in the example.
The largest number that our program processes correctly is 181, its square is 32761;
the square of 182, which is 33124, does not fit into the digit capacity of the
integer type. But everything is not so terrible, we just need to apply another type of
variable. The most obvious candidate for the role of such a type is longint, which has a
32-bit capacity; variables of this type can take values from -2147483648 to
2147483647 (i.e. from -2^31 to 2^31 - 1). It is enough to change one word in the program: just replace integer with
longint:
and the capabilities of our program (after its recompilation) will increase dramatically:
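The modified listing and the demonstration run are absent from this copy; presumably only the declaration changed (x: longint instead of x: integer), and a run went something like this (40000 squared is 1600000000, which fits comfortably into 32 bits):

avst@host:~/firstprog$ ./square
200
40000
avst@host:~/firstprog$ ./square
40000
1600000000
avst@host:~/firstprog$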
though, of course, he who seeks will always find: the maximum possible number exists for int64 as well131.

131 While numbers of the longint type can be found in almost any Pascal implementation, the int64 type is a peculiarity of our chosen implementation (i.e. Free Pascal), so if you try to use it in other versions of Pascal, there is a high probability that it will not be there.

The last thing we can do to extend the range of numbers is to replace signed numbers with unsigned ones. It won't do much, we'll gain only one bit, since there are no numbers with a bitness greater than 64 in Pascal (at least not in Free Pascal). So, let's change int64 to qword (from quad word, i.e. "quadruple word"; a "word" on x86 architectures traditionally means 16 bits) and try it:
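The demonstration run is missing here; with qword, a number whose square is too large for int64 now works, so it presumably looked something like this (4000000000 squared is 16000000000000000000, which exceeds the int64 maximum but still fits into 64 unsigned bits):

avst@host:~/firstprog$ ./square
4000000000
16000000000000000000
avst@host:~/firstprog$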
Since the maximum possible value of our variable is now 2^64 - 1 = 18446744073709551615, we
can predict at which number the program will fail. The square of the number 2^32 is 2^64, which is one
more than allowed. Therefore, the largest number that our program can still square is
2^32 - 1 = 4294967295. Check:
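The check transcript is absent from this copy; the arithmetic above determines what it must have shown: 4294967295 squared is 18446744065119617025, while squaring 4294967296 overflows to exactly zero. Presumably:

avst@host:~/firstprog$ ./square
4294967295
18446744065119617025
avst@host:~/firstprog$ ./square
4294967296
0
avst@host:~/firstprog$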
This is the somewhat unexpected effect of overflow or, to put it more strictly, of a carry into a
nonexistent bit, because this time we are dealing with unsigned numbers. Do you remember the joke
about the Serpent Gorynych?
program sequence;
begin
writeln('First');
readln;
writeln('Second');
readln;
writeln('Third')
end.
Let's explain that the readln operator works in much the same way as the familiar read
operator, with the difference that, having read everything that was required, it necessarily
waits for the end of the line at the input. Since in our example we have not specified
parameters for this operator, it will do just that - waiting for the end of the line on input, i.e.
simply waiting for the Enter key to be pressed. If our program is compiled and run, it will
type the word "First" and stop waiting; nothing else will happen until you press Enter. The
program has completed its first statement and started executing the second; it will not
complete until the line feed character is read on the input.
When you press Enter, you will see that the program "comes to life" and prints the word
"Second", then stops again: the first of the two readln operators has completed, after which
the second writeln has run and printed the word "Second", and then the second readln
has started executing. Like the first one, it will run until you press Enter. When you press
Enter a second time, you will see that the program has printed "Third" and terminated:
first the penultimate statement (readln) completed, and then the last one was executed.
In many computer science and programming textbooks, especially school textbooks, literally all
example programs end with this readln; sometimes readln at the end of a program becomes
so habitual that students and even some teachers begin to perceive it "as furniture", completely
forgetting what it is actually needed for at the end of the program. This whole semi-shamanic practice
started with the fact that most schools use systems of the Windows family as a teaching aid, and
since it is very difficult to write a normal program for Windows, the programs written are "console" ones
or, to be more precise, simply DOS programs. Of course, students are expected to run their programs
exclusively from an integrated environment like Turbo Pascal or whatever the school is rich in, and
nobody thinks it necessary to bother about the correct configuration of this environment. As a result,
when launching a program created in an integrated environment, the operating system opens an MS -
DOS emulator window for executing such a program, but this window automatically disappears as
soon as the program ends; naturally, we simply do not have time to read what our program has
printed.
Note that this "problem" can be overcome in many different ways, such as setting up the
environment so that the window does not close, or simply starting a command line session and
executing compiled programs from it; unfortunately, teachers instead prefer to show students how to
insert a completely irrelevant statement into programs, which is actually needed to keep the window
open until the user presses Enter; however, this is usually not explained to students.
Fortunately, we do not use Windows or integrated environments, so these problems do not
concern us. We should also note that the ridiculous readln at the end of every program is far from
being the only problem that arises when using integrated environments. For example, having got used to
launching a program for execution from the integrated environment by pressing the corresponding key,
students lose sight of the notion of an executable file, and with it of the compiler and the role it plays;
many of yesterday's students are sure that compilation is needed to check the program for errors, and
for nothing else.

Figure 2.1. Block diagrams for a simple sequence and for complete and incomplete branching
As an independent exercise, take any collection of English-language poems and write a
program132 that will type some sonnet line by line, each time waiting for you to press Enter.
The sequence of operators in a program is often depicted schematically in the form of so-
called flowcharts. A flowchart for a simple sequence of operators is shown in Fig. 2.1 (left).
Note that common actions in flowcharts are traditionally depicted as a rectangle, the
beginning and the end of the program fragment under consideration are indicated by a small
circle, and the condition check is indicated by a rhombus; more about this in the next
paragraph.
Let us now write a program that prints the modulus (absolute value) of an entered
number133. As you know, the modulus is equal to the number itself if the number is non-
negative, and for negative numbers the modulus is obtained by changing the sign. For
simplicity we will work with integers. The program will look like this:
program modulo;
var
    x: integer;
begin
    read(x);
    if x > 0 then
        writeln(x)
    else
        writeln(-x)
end.

132 It is very important that the poems should be in English. The use of Cyrillic characters in program texts is inadmissible, despite the fact that compilers usually allow it. Proper design of a program capable of "speaking Russian" requires studying the possibilities of special libraries that allow creating multilingual programs; all non-English messages are then kept not in the program itself, but in special external files.

133 Actually, Pascal has a built-in function to calculate the modulus, but we'll ignore that fact here.
As you can see, the first step here is to read the number; the read number is placed in the
variable x. Then, if the entered number is strictly greater than zero (the condition x > 0
is met), the number itself is printed; otherwise, the value of the expression -x is printed,
i.e. the number obtained from the initial one by changing the sign.
What is noteworthy here is that the entire construct, from if to writeln(-x), is a
single operator, but a complex one, because it contains the operators writeln(x)
and writeln(-x) within it.
To put it more strictly, the if operator is composed as follows. First, we write the if
keyword, which tells the compiler that our program will now contain a branching construct.
Then we write a condition, which is a so-called logical expression; such expressions are
similar to simple arithmetic expressions, but the result of their calculation is not a number,
but a logical value (a value of the boolean type), i.e. true or false. In
our example, the logical expression is formed by a comparison operation, denoted ">"
("greater than").
After the condition, we must write the keyword then; the compiler recognizes by it
that our logical expression has ended. Next comes the operator that we want to execute if the
condition is met; in our case, this operator is writeln(x). In principle, the conditional
operator (i.e. the if operator) can end here if we want incomplete branching; but if we
want to make branching complete, we write the keyword else, followed by another
operator specifying the action we want to perform if the condition is false. The syntax of the
if statement can be expressed as follows:
if <condition> then <operator1> [ else <operator2> ]

Square brackets here denote the optional part.
Our modulus calculation can be written with incomplete branching, and even a bit shorter:
program modulo;
var
x: integer;
begin
read(x);
if x < 0 then
x := -x;
writeln(x)
end.
Here we have changed the condition in the if statement to "x is strictly less than
zero" and in this case we put into the variable x the number obtained from the old value
of x by changing the sign; if x was non-negative, nothing happens. Then, regardless of
whether the condition is false or true, the writeln operator is executed, which prints what
is finally in the variable x.
Note how the indentation is arranged in our examples. The operators nested in the if,
i.e. being a part of it, are shifted to the right relative to what they are nested in by the already
familiar four spaces; in the style we have chosen, we get a total of eight spaces, i.e. two
indentation sizes134. In our example, these operators themselves are on the second level of
nesting.
Let us note one more important point. Beginners often make a rather typical mistake -
they put a semicolon before else; the program fails to compile after that. The point is that
in Pascal, the semicolon, as we have already mentioned, separates one operator from another;
when the compiler sees the semicolon, it considers that the next operator is over, and in this
case it is an if. Since the word else has no meaning by itself and can appear in the
program only as a part of the if operator, and this operator has already ended from
the compiler's point of view, the word else encountered afterwards leads to an error. Let
us repeat once again: in Pascal programs, no semicolon is placed before else as part of
the if statement!
if a > b then
begin
    t := a;
    a := b;
    b := t
end

134 Recall that the indentation size you choose can be two spaces, three, four, or exactly one tab character; see page 242 for an explanation.
Let's emphasize again that the whole construction consisting of the words begin, end and
everything between them is a single operator - also, of course, belonging to the "complex"
ones, because it includes other operators.
Pay attention to the layout of code fragments containing a compound operator!
There are three different acceptable ways to arrange the construct shown above; in our
example, we moved begin to the line following the if statement header, but we did
not move it relative to if; as for end, we wrote it in the same column where the construct
that end closes begins.
The second popular way of arranging such a construction differs in that begin is left
on the same line as the if header:
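The example listing is missing here; applying the described placement to the same swap fragment gives, presumably:

if a > b then begin
    t := a;
    a := b;
    b := t
end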
Note that end remains where it was! You may think that it closes if; you may still insist
that it closes begin, but the horizontal position of the word end must in any case
coincide with the position of the line containing the thing (whatever it is) that this end
closes. In other words, end must be indented exactly the same way as the line containing
the thing that end closes. Following this rule allows you to "grasp" the overall structure of
the program with an unfocused eye, which is very important when working with source code.
The third variant is used comparatively rarely, when the word begin is shifted to a separate
nesting level, and what is enclosed in the compound operator is shifted even further. It looks like this:
if a > b then
    begin
        t := a;
        a := b;
        b := t
    end
We will not recommend the use of this style for a number of reasons, but in principle it is acceptable.
program modulo;
var
x: integer;
negative: boolean;
begin
read(x);
negative := x < 0;
if negative then
x := -x;
writeln(x)
end.
Here the variable negative after assignment will contain the value true if the
number entered by the user (the value of the variable x) is less than zero, and false
otherwise. After that we use the negative variable as a condition in the if
statement.
The Pascal language allows operations on logical values that correspond to the basic
functions of logic algebra (see §1.3.3). These operations are denoted by the keywords not
(negation), and (logical "and", conjunction), or (logical "or", disjunction) and xor
("excluding or"). For example, we could use the assignment operator flag := not flag
to change the value of the logical variable flag to the opposite value; we can check
whether the integer variable k contains a number written with one digit by using the logical
expression (k >= 0) and (k <= 9). Pay attention to the brackets! The point here
is that in Pascal the priority of logical connectives, including the and operation, is higher
than the priority of comparison operations, just as the priority of multiplication and division
is higher than the priority of addition and subtraction; if you don't put brackets, the
expression k >= 0 and k <= 9 will be "parsed" by the Pascal compiler according
to the priorities as if we wrote k >= (0 and k) <= 9, which will cause a compilation
error.
Pascal allows you to write any complex logical expressions; for example, if the variable
c is of type char, then the expression
((c >= 'A') and (c <= 'Z')) or ((c >= 'a') and (c <= 'z'))
will let you know if its current value is a Latin letter. Here we could do without brackets
around and, since the priority of and is higher than or anyway, but this is a case
where eliminating redundant brackets would not add any clarity to the expression.
Sometimes, in the programs of novice programmers, you can find branching and loop
conditions formed by comparing a logical variable with a logical value, something like

if flag = true then
Formally, from the point of view of the Pascal compiler, this is not an error, because logical
values, like any other values, can be checked for equality or inequality to each other.
Nevertheless, you should never write like this; if you find this in your code, it means that you
probably haven't realized what logical expressions and logical values are and what their role
in programming is; it is also possible that you don't fully understand the essence of the
condition in branching and loop operators, because any such condition is a logical expression
and nothing else. If B is a logical variable or some more complex logical expression, then
instead of B = true you should write just B, and instead of B = false you
should use the negation operation, i.e. write not B. If the resulting expression does
not seem very clear after such a substitution, it makes sense to look for a more adequate name
for the logical variable you are using. For example, if your variable is simply called flag,
then writing if flag = true then may seem even more logical and understandable
than if flag then, but if instead of the impersonal word "flag", which can denote
any logical value at all, you use something more meaningful and relevant to the task at hand
- for example, found or exists, if you were looking for something, or some
negative, as in our example above, everything will become much clearer: if found
then looks more natural than if found = true then.
There are three different loop statements in Pascal; the simplest of them is the while
loop. The header of this statement specifies a logical expression that will be evaluated before
executing each iteration of the loop; if the result of the evaluation is false, the execution of
the loop will be terminated immediately; if the result is true, the body of the loop, defined by
a single (possibly compound) statement, will be executed. Since the condition is checked each
time before the body is executed, this loop is called a preconditioned loop. The beginning of
the construct is marked with the while keyword, while the body is separated from
the condition by the do keyword. The syntax of the while statement can be
summarized as follows:
while <condition> do <operator>
Let's say, for example, that we need to display the same "Hello, world!" message on
the screen, not once but twenty times. Naturally, this can and must be done with the help
of a loop, and the while loop is quite suitable for this purpose135; we just need to
figure out how to set the loop condition and how to change something in the program state so
that the loop is executed exactly twenty times and the condition is false on the twenty-first
check. The easiest way to achieve this is to simply count how many times the loop has already
been executed, and when it has been executed twenty times, not to execute it again. You can
organize such counting by introducing an integer variable to store the number of iterations
that have already been executed. Initially we will put zero into this variable, and at the end
of each iteration we will increase it by one; as a result, at each moment of time this variable
will be equal to the number of iterations executed so far. This variable is often called a
loop counter.

135 Looking ahead, we should note that Pascal provides another control construct, the for loop, precisely for situations when the number of iterations is known exactly when entering the loop.
In addition to incrementing the variable by one, we need to do one more thing in the body
of the loop, which is actually what the loop is all about, i.e. printing the string. It turns out
that we need two operators in the loop body, and the while statement syntax provides
for only one operator as the loop body; but we already know that this is not a problem - we
just need to combine all the operators we need into one compound operator using the begin
and end operator brackets. Our entire program will look like this:
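The hello20 listing itself has not survived in this copy; following the conventions described here (counter prepared before the loop, incremented at the end of the body), it was presumably along these lines:

program hello20;
var
    i: integer;
begin
    i := 0;                     { prepare for the first iteration }
    while i < 20 do
    begin
        writeln('Hello, world!');
        i := i + 1              { prepare for the next iteration }
    end
end.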
Note that in the body of the loop we have placed the print statement first, and only then the
assignment statement that increments the variable i by one. For this particular task, nothing
would have changed if we had swapped them; however, we should not do so. As practice
shows, it is better - safer in terms of possible errors - to always follow one rather simple
convention: the preparation of variable values for the first iteration of a while loop
should take place just before the loop, and the preparation of values for the next iteration
should be located at the very end of the loop body. In this case, the preparation for the first
iteration consists in assigning zero to the loop counter, and the preparation for the next
iteration consists in incrementing the counter by one; we put the i := 0 operator before
the while loop itself, and the i := i + 1 operator last in its body.
We can approach this question in another way. Since the variable i stores the number of
lines printed so far (zero at first, then one more each time), it is quite logical to print the next
line first, and only then take this fact into account by increasing the variable by one.
The value of the loop counter can be used not only in the condition, as we did in the
hello20 program, but also in the body of the loop. Suppose we are interested in the squares
of integers from 1 to 100; we can print them, for example, like this:
program square100;
var
i: integer;
begin
i := 1;
while i <= 100 do
begin
writeln(i * i);
i := i + 1
end
end.
It will not be very convenient to use the result of this program, because it will place each number on
a separate line. We can improve it by replacing writeln with write; but if we do so without
taking any additional steps, i.e., simply remove the letters ln and run the program as it is, the result
may be quite disconcerting:
avst@host:~/work$ ./square100
14916253649648110012114416919622525628932436140044148452957662567672978484190096
11024108911561225129613691444152116001681176418491936202521162209230424012500260
12704280929163025313632493364348136003721384439694096422543564489462447614900504
15184532954765625577659296084624164006561672468897056722573967569774479218100828
18464864988369025921694099604980110000avst@host:~/work$
The point is that the write operator fulfills our will literally: if we demanded to print a number,
it will print the digits that make up the decimal notation of this number, and nothing else - no
spaces or any other separators. If we look carefully at the output, we can see that the digits that make
up the decimal notation of the numbers 1, 4, 9, 16, etc. have not gone anywhere, just that the
numbers are not separated from each other.
It is very easy to solve this problem, just tell the write operator that we want it to print a space
character after each number. In addition, at the end of the program, i.e. after the loop, it is desirable
to add a writeln operator so that the program, before terminating, outputs a line break, and the
command line prompt after its completion appears on a new line, not merged with the
printed numbers. The whole program will look like this:
program square100;
var
i: integer;
begin
i := 1;
while i <= 100 do
begin
write(i * i, ' ');
i := i + 1
end;
writeln
end.
Let us now consider an example of such a loop for which we do not know the number of
iterations in advance. Suppose we are writing a program that at some point must ask the user
his year of birth, and we need to check whether the entered number can really represent the
year of birth. Let's assume that the user's year of birth cannot be less than 1900136. In addition,
we will assume that the user's year of birth cannot be greater than 2020, since one-year-old
children do not know how to use a computer; if enough time has passed by the time you are
reading this book, you can adjust these values yourself.
136 At the time this book was written in 2016, there were only two people left on Earth about whom it was reliably known that they were born before 1900; when the second edition was being prepared for publication in 2021, there were sadly no such people left.

Either way, we need to ask the user to enter their year of birth; if we are not satisfied with
the entry, we need to tell the user that they appear to have made a mistake and ask them to
repeat the entry. In the descriptions section, we can provide a year variable:
var
year: integer;
As for the dialog with the user itself, it can be implemented as follows:
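The listing is missing from this copy; judging from the sample dialogue below, it presumably combined a while loop with the prompts shown (the exact message texts are taken from that dialogue):

write('Please type in your birth year: ');
readln(year);
while (year < 1900) or (year > 2020) do
begin
    writeln(year, ' is not a valid year!');
    write('Please try again: ');
    readln(year)
end;
writeln('The year ', year, ' is accepted. Thank you!')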
With such a program, for example, the following dialogue can take place:
Please type in your birth year: 1755
1755 is not a valid year!
Please try again: -500
-500 is not a valid year!
Please try again: 2050
2050 is not a valid year!
Please try again: 1974
The year 1974 is accepted. Thank you!
Note that the loop may not run even once if the user immediately enters a normal year; on the
other hand, the user may be stubborn, so strictly speaking we cannot know what the maximum
number of iterations of our loop is. We can assume that, say, a billion iterations is still too
many for the user's patience, but what exactly is the upper limit? A hundred? A thousand?
Assuming that a user can still be patient for 1000 iterations but not for 1001 looks rather
ridiculous, and any other specific number will look equally ridiculous in this role; it is easier
not to make any assumptions at all.
Fig. 2.2. Block diagrams of loops with precondition and postcondition
repeat
readln(year)
until (year >= 1900) and (year <= 2020)
Let's look at another example. The following loop inputs numbers from the keyboard and
adds them until the total sum is greater than 1000:
sum := 0;
repeat
readln(x);
sum := sum + x
until sum > 1000
The syntax of the repeat operator can be represented as follows:

repeat <operators> until <condition>
Usually the repeat operator occurs much more rarely in programs than the while operator, but you
should know of its existence anyway; sooner or later this construct will come in handy.
program hello20for;
var
i: integer;
begin
for i := 1 to 20 do
writeln('Hello, world!')
end.

The construct for i := 1 to 20 do means that the variable i will be
used as a loop variable, its initial value will be 1 and its final value will be 20, i.e. it
must run through all values from 1 to 20, and for each such value the loop body will be
executed. Since there are 20 such values, the body will be executed twenty times, which is
what we need. If you compare the resulting program with the one we wrote on page 266, you
will notice that its text is much more compact; moreover, for a person who is already used
to the for syntax, this version is much easier to understand.
Let's now rewrite the square100 program, taking as a basis the variant that prints
numbers over a space. Using a for loop, the same effect can be achieved in the following
way:
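The corresponding listing is absent from this copy; presumably it reused the write(i * i, ' ') form of the while version:

program square100;
var
    i: integer;
begin
    for i := 1 to 100 do
        write(i * i, ' ');
    writeln
end.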
As you can see, you can use the value of the loop variable in the for loop body. It should
be noted at once that this value can be accessed, but it should never be changed. Changing
the loop variable during loop execution is the prerogative of the for operator itself, and
attempts to interfere with its operation may lead to unpredictable consequences. Besides, there
is one more restriction related to the loop variable: after the for loop is completed,
the value of the loop variable is considered undefined, i.e. we should not assume that this
variable will be equal to a specific number. Of course, some value will be there, but it may
depend on the compiler version and even on the place where the loop is encountered in the
program. Simply put, the compiler's creators do not pay any attention to what value to leave
in the loop variable after the loop is finished and can leave anything there.
In both of our examples, the loop variable changed in the upward direction, but you can
make it run values in the opposite direction, from larger to smaller. To do this, the word to
is replaced by the word downto. For example, the program
program countdown;
var
i: integer;
begin
for i := 10 downto 1 do
write(i, '... ');
writeln('start!')
end.

prints the string

10... 9... 8... 7... 6... 5... 4... 3... 2... 1... start!
Let us now consider another problem: printing a "slash" of asterisks, a figure in which each line contains a single asterisk shifted one position further to the right than in the previous line, like this:
*
 *
  *
   *
    *
     *
      *
       *
(to save space, we have shown only eight lines, although there should be 24). To do this, in principle, is very simple: we need to print 24 lines, and in each line we first print a certain number of spaces, then print an asterisk and move to a new line. In the very first line we don't print any spaces at all, we print an asterisk right away; we can consider that we print zero spaces. Each subsequent line contains one more space than the previous one. If we consider that the lines are numbered from 1 to 24, it is easy to see that the line with number n should contain n - 1 spaces.
It is clear that we should output the lines in a loop, one iteration per line. Since at the moment of entering the loop we know exactly how many iterations there will be, we should use the for loop. Let's call the loop variable n; its value will correspond to the line number. The loop should look like this¹⁹:
for n := 1 to 24 do
begin
{ print the desired number of spaces }
writeln('*')
end
§ 2.2. Expressions, variables and operators 296
program StarSlash;
var
    n, m: integer;
begin
    for n := 1 to 24 do
    begin
        for m := 1 to n - 1 do
            write(' ');
        writeln('*')
    end
end.
¹⁹ The curly brackets in Pascal mean a comment, i.e. a text fragment that is intended exclusively for the human reader and is completely ignored by the compiler. Hereafter, we will sometimes write comments in Russian. In a textbook such liberty is acceptable, but in real programs you should never do this: firstly, Cyrillic characters are not included in ASCII; secondly, the world language of communication among programmers is English. If comments are written in a program at all, they should be written in English, and if possible without errors; otherwise it is better not to write them at all.
Let's consider a slightly more complicated task - to display a figure of approximately this
kind:
   *
  * *
 *   *
*     *
 *   *
  * *
   *
This figure is often called a "diamond". This time we will read the height of the figure from
the keyboard, i.e. we will ask the user to tell us how high he wants to see the "diamond".
Further analysis will be required.
First of all, we note that the height of our figure is always an odd number, so if the user enters an even number, we will have to ask him to repeat the input; the same should probably be done if the entered number is negative. An odd number, as is well known, can be represented as 2n + 1, where n is an integer; the top of our figure will consist of n + 1 rows. For the figure shown above, the height is seven lines, and n is three.
Now we need to figure out how many spaces and where to print to get the shape we are
looking for. Note that when printing the very first line we have to print n spaces, when printing the second line n - 1 spaces, and so on; when printing the last, (n + 1)-th line, no spaces are needed at all (we can consider that we print zero spaces).
The situation with spaces after the first asterisk is a bit more complicated. The first line
doesn't need any such spaces at all, there is only one asterisk there; but further on there is a
rather interesting process: in the second line you need to print one space (and after it a second
asterisk), in the third line - three spaces, in the fourth - five, and so on, each time two spaces
more. It is not difficult to guess that for the line number k (k > 1) the required number of
spaces is expressed by the formula 1 + 2(k - 2) = 2k - 3. It is interesting that with some stretch
we can consider this formula "true" also for the case k = 1, where it gives -1: if the operation "print t spaces" were additionally defined for negative t as "move back by the corresponding number of positions", it would turn out that, having printed the first asterisk, we go back one position and print the second asterisk exactly on top of the first one. However, it is much easier to simply check the value of k and, if it equals one, print no more spaces or asterisks after the first asterisk, but output a line feed at once.
The complete printing of the line with number k should look like this: first we print n + 1 - k spaces, then an asterisk; after that, if k is equal to one, we simply issue a line feed and consider the line finished; otherwise we print 2k - 3 spaces, then an asterisk, and only after that we output a line feed. We need to do all this for k from 1 to n + 1, where n is the "half-height" of our "diamond".
Once the top of the figure has been printed, we need to somehow output the bottom of
the figure as well. We could continue numbering the lines and derive formulas for the number
of spaces in each line numbered n + 1 < k ≤ 2n + 1, which is in principle not that difficult; however, we can do it even simpler by noticing that the lines we print now are exactly the same as in the upper part of the figure: we first print the same line as the n-th, then the same as the (n - 1)-th, and so on. The simplest way is to perform exactly the same printing procedure for each line as described in the paragraph above, only this time the line numbers run through all values from n down to 1, in reverse order.
Our program will consist of three main parts: entering the number that determines the height of the figure, printing the top part of the figure, and printing the bottom part. Here is its text (recall that div and mod denote the quotient and the remainder of integer division):
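The listing below is a sketch consistent with the preceding analysis (the program name and the exact prompt text are assumptions); the inner loops print the numbers of spaces counted above, and the bodies of the two outer loops are deliberately identical:

```pascal
program Diamond;
var
    n, k, h, i: integer;
begin
    { part one: entering the height }
    repeat
        write('Enter the diamond''s height (positive odd): ');
        readln(h)
    until (h > 0) and (h mod 2 = 1);
    n := h div 2;
    { part two: the top of the figure, lines 1 to n+1 }
    for k := 1 to n + 1 do
    begin
        for i := 1 to n + 1 - k do      { n + 1 - k leading spaces }
            write(' ');
        write('*');
        if k = 1 then
            writeln
        else
        begin
            for i := 1 to 2*k - 3 do    { 2k - 3 inner spaces }
                write(' ');
            writeln('*')
        end
    end;
    { part three: the bottom, the same lines from n down to 1 }
    for k := n downto 1 do
    begin
        for i := 1 to n + 1 - k do
            write(' ');
        write('*');
        if k = 1 then
            writeln
        else
        begin
            for i := 1 to 2*k - 3 do
                write(' ');
            writeln('*')
        end
    end
end.
```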
It is easy to notice a very serious drawback of our program: the loops for drawing the upper and lower parts of the figure differ only in the header, while their bodies are exactly the same. Programmers generally consider this unacceptable: if a program contains two or more copies of the same code and we want to correct one of these fragments (for example, because we found an error in it), most likely we will have to correct them all, which leads to unproductive labor costs (and that is only half the trouble) and provokes errors, because we will edit some of the copies and forget the others. We can deal with this problem only with the help of so-called subroutines, which are the subject of the next section.
Bitwise operations can be divided into two types: logical operations performed on the individual bits (on all of them at the same time), and shifts. Thus, the not operation applied to an integer (as opposed to the already familiar not operation applied to a value of type boolean) results in a number in which every bit is the opposite of the corresponding bit of the original number. For example, if the variables x and y are of type integer, then after executing the operators
x := 75;
y := not x;
the variable y will contain the number -76, whose machine representation, 1111111110110100, is a bitwise inversion of the 16-bit representation of the number 75 (0000000001001011). Note that if x and y were of type word, i.e. an unsigned type of the same bit width, the result in the variable y would be 65460; the machine representation of this number as a 16-bit unsigned value coincides with that of the number -76 as a 16-bit signed value.
The and, or, and xor operations, already familiar to us from §2.2.9, work in a similar way on integers. All these operations are binary, that is, they require two operands; when we apply them to integers, the corresponding logical operations ("and", "or", "exclusive or") are performed simultaneously on the first bits of both operands, on their second bits, and so on; the resulting bits are assembled into an integer of the same type (and, consequently, of the same bit width), which becomes the result of the whole operation.
For example, an eight-bit unsigned representation (i.e., a representation of the byte type)
for the numbers 42 and 166 would be 00101010 and 10100110, respectively; if
we have variables x, y, p, q, and r of the byte type, then after the assignments of
x := 42;
y := 166;
p := x and y;
q := x or y;
r := x xor y;
the variable p will contain the number 34 (binary 00100010), q the number 174 (10101110), and r the number 140 (10001100).
Shift operations move all the bits of a number to the left (the shl operation) or to the right (shr) by a given number of positions. When shifting to the left, the high-order bits disappear and zero bits are appended on the right; for integers this is equivalent to multiplication by a power of two. For example, the result of the expression 1 shl 5 will be the number 32, and the result of 21 shl 3 will be 168.
When shifting to the right, the low-order bits disappear and zero bits are appended on the left. For unsigned integers this is equivalent to division by a power of two with the remainder discarded; the same is true for positive numbers even when they are represented as signed ones, but for negative numbers the equivalence to division fails. This is understandable: if you remember how signed integers are represented in the computer (see page 206), it is obvious that the result of any such shift to the right is a positive number, because the sign bit will contain zero. The built-in functions SarShortint, SarSmallint, SarLongint and SarInt64 make it possible to remedy the situation; we leave them for the reader to study on his own.
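To see all of these operations at work, one can run a small test program; this example is ours, not the book's, but it uses only the operations just described (the type word keeps the results 16-bit and unsigned):

```pascal
program BitwiseDemo;
var
    x, y: word;
begin
    x := 75;
    y := not x;              { bitwise inversion }
    writeln(y);              { 65460 for the unsigned type word }
    writeln(42 and 166);     { 34,  i.e. 00100010 }
    writeln(42 or 166);      { 174, i.e. 10101110 }
    writeln(42 xor 166);     { 140, i.e. 10001100 }
    writeln(1 shl 5);        { 32 }
    writeln(21 shl 3);       { 168 }
    writeln(168 shr 3)       { 21 }
end.
```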
const
hello = 'Hello world!';
part = copy(hello, 3, 7);
We will discuss the copy function and other tools for working with strings in §2.6.11.
The mechanism of named constants allows you to associate a name, i.e. an identifier,
with a certain constant value (compile-time constant) and use this identifier instead of the
value written explicitly throughout the program text. This is done in the constants description
section, which can be placed anywhere between the header and the beginning of the main part
of the program, but usually programmers place the constants section as close to the beginning
of the file as possible - for example, right after the program header. The point here is that the
values of some constants can be (and are) the most frequently changed part of the program,
and placing constants at the very beginning of the program saves time and intellectual effort
when editing them.
For example, consider the program hello20for (see page 271); it outputs the message "Hello, world!" to the screen (to the standard output stream) and does it 20 times. This task can be generalized in an obvious way: the program outputs a given message
a given number of times. We know from the school physics course that it is best to solve
almost any problem in a general form, and to substitute specific values at the very end, when
the general solution has already been obtained. The same way can be done in programming.
In fact, what will change in the program if we want to change the message being output? And if we want to output the message not 20 times but 27? The answer is obvious: only the corresponding literals will change. In such a short program as hello20for, of
course, it is not difficult to find these literals; but if the program consists of at least five
hundred lines? Five thousand? And this is far from the limit: in the largest and most complex
computer programs, the number of lines is in the tens of millions.
In this case, the specific values of the constants defined in the code, on which the program's execution depends, are essentially arbitrary: this is clearly indicated by the fact that the problem obviously generalizes to arbitrary values. It is logical to expect that we might want to change the values
of constants without changing anything else in the program; in this sense, constants are like
knobs on a variety of technical devices. Named constants make such "tuning" easier: if
without their use literals are scattered all over the code, then by giving each constant its own
name, we can collect all the "tuning parameters" at the beginning of the program text,
annotating them if necessary. For example, instead of the program hello20for we can
write the following program:
program MessageN;                 { message_n.pas }
const
    message = 'Hello, world!';    { what to print }
    count = 20;                   { how many times }
var
    i: integer;
begin
    for i := 1 to count do
        writeln(message)
end.
As you can see, the constant descriptions section consists of the const keyword followed
by one or more constant descriptions; each such description consists of the name (identifier)
of the new constant, an equal sign, an expression specifying the value of the constant (this
expression itself must be a compile-time constant), and a semicolon. From the moment the
compiler processes such a description, the identifier introduced by this description will be
replaced by the constant value associated with it in further program text. The constant name
itself is, quite naturally, also considered a compile-time constant; as we will see later (e.g.,
when we study arrays), this fact is quite important.
The usefulness of named constants is not limited to facilitating "customization" of the
program. For example, it often happens that one and the same constant value occurs in several
different places in the program, and it makes sense that if you change it in one of the places,
you should also (synchronously) change all the other places where the same constant occurs.
For example, if we are writing a program that controls an automated luggage locker consisting of individual cells, we will certainly need to know how many cells we have. This number will be involved, for example, in counting the free cells, in all sorts of user interface elements where one cell has to be selected from all the available ones, and in much more. It is clear that the number meaning the total number of cells will occur time and again in different parts of the program.
If now the engineers suddenly decide to design the same storage chamber with a few more
cells, we will have to go through our whole program looking for the cursed number that must
be changed everywhere. It is easy to guess that such things are an inexhaustible source of
errors: if, say, one and the same number occurs thirty times in a program, we can be sure that
from the first look through we will "catch" only twenty such occurrences and miss the rest.
The situation becomes more complicated if the program contains two different, mutually independent parameters that happen to be equal to the same number; for example, our luggage locker has 26 cells, and the receipt printer attached to it has 26 characters per line, and both numbers occur directly in the program text. If one of these parameters has
to be changed, we can be sure that we will not only miss some of the occurrences of the
required parameter, but we will also change the parameter that did not need to be changed
once or twice.
It is quite different if the number of cells of our luggage locker is explicitly mentioned only once, at the very beginning of the program, while everywhere else a constant name such as LockerBoxCount is used. It is very easy to change the value of such a parameter, because the value itself is written in exactly one place in the program; the risk of changing something else by mistake also disappears.
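With the 26 cells from the example above, such a description could look like this:

```pascal
const
    LockerBoxCount = 26;    { total number of cells in the locker }
```

Everywhere else in the code one then writes LockerBoxCount instead of 26, e.g. `for i := 1 to LockerBoxCount do`.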
There is one more very important advantage of named constants: it is much easier to
understand a program created using them. Put yourself, for example, in the place of a person
who somewhere in the wilds of a long (say, several thousand lines) program stumbles upon
the number 80, written just like that, in digits in an explicit form - and, of course, without
comments. What is this "80", what does it correspond to, where did it come from? Maybe it
is the age of the program author's grandfather? Or the number of floors in a skyscraper on
New York's Broadway? Or the maximum number of characters allowed in a line of text
displayed on the screen? Or the room temperature in degrees Fahrenheit?
After spending a considerable amount of time, the reader of such a program may notice
that 80 is part of the network address, the so-called port, when establishing a connection
with some remote server; remembering that the port with this number is usually used for web
servers, it will be possible to guess that the program is trying to get something from
somewhere via HTTP protocol (and, by the way, it is not a fact that the guess will be correct).
How long will it take to analyze this? A minute? Ten minutes? An hour? It depends, of course,
on the complexity of a particular program; but if the DefaultHttpPortNumber
identifier had been used instead of the number 80 in the program, there would have been no
need to waste time at all.
In most cases, the rules of program code design simply forbid numbers written in explicit
form to appear in the program (outside the constants description section), except for 0, 1
and (sometimes) -1; all other numbers must be given names. In some organizations programmers are prohibited from using not only bare numbers but also bare string literals in the depths of the program code: all strings the program needs must be placed at its beginning and named, and these names must be used in the rest of the text.
The constants we have considered are called untyped constants in Pascal, because their type
is not specified in their description; it is inferred when they are used. In addition to them, Pascal (at
least its dialects related to the famous Turbo Pascal, including our Free Pascal) also provides typed
constants, for which the type is explicitly specified in the description. Unlike untyped constants, typed
constants are not compile-time constants; moreover, in the compiler's default mode, the values of
such constants are allowed to be changed at runtime, making the use of the name "constant"
questionable. We leave the history of the origin of this strange entity out of our book; the interested
reader can easily find the relevant materials on his own. Typed constants are not needed in our
course and we will not consider them. Anyway, in case you come across examples of programs that
use typed constants, something like
const
message: string = 'Hello, world!';
count: integer = 20;
remember that this is not the same as constants without type indication, and is similar in behavior
to an initialized variable rather than a constant.
2.3. Subprograms
By the word "subprogram" (or "subroutine") programmers mean a separate part of a program, one having its own beginning, its own end, and even its own variables, intended for solving some part of a task. Almost any fragment of the main program or of another subroutine can be moved into a subroutine of its own; in the place where the code now residing in the subroutine used to be, we write the so-called subroutine call, consisting of its name and (in most cases) a list of parameters passed to it. Through the parameters we can pass to the subroutine any information it needs for its work.
Moving parts of the code into subroutines allows us to avoid code duplication, because the same subroutine, once written, can be called as many times as we like from different parts of the program. By specifying different parameter values at each call, we can adapt the same subroutine to solving a whole family of similar problems, saving even more on the amount of code that has to be written.
Experienced programmers know that saving code size is not the only reason for using subroutines. Programs very often contain subroutines that are called exactly once, which, of course, does not reduce the size of the written code at all; on the contrary, it increases it, because setting up a subroutine requires a few extra lines of its own. Such "one-time"
subroutines are written to reduce the complexity of human perception of the program. By
correctly separating code fragments into subroutines and replacing them with their names in
the main program, we allow the reader of our program (and, by the way, mostly ourselves)
not to think about minor details when working with the main program.
Pascal provides two types of subroutines: procedures and functions. A procedure can
contain almost any set of actions, but it is important to make sure that these actions are
somehow related to each other, otherwise such a procedure will be of little use. Launching a
procedure is a separate operator in the program, which is called the procedure call operator.
As for functions, their main task is to calculate a certain value (by a formula or otherwise); they are called from expressions, and their calls are themselves expressions.
As we will soon see, procedures and functions look very similar. Originally in Pascal, functions
could only be called from expressions, but the creators of Turbo Pascal removed this "inconvenient"
restriction, so that functions can be used instead of procedures. In some other programming
languages there are only procedures or (much more often) only functions; you may encounter the
statement that dividing subroutines into two types is redundant and meaningless.
In reality, the absence of division into procedures and functions distorts the perception of the
most important phenomenon in programming - side effects, provokes their thoughtless application
and eventually cripples the programmer's thinking, which has already been mentioned in the prefaces
(see page 31). We will come back to this question many times.
2.3.1. Procedures
Subroutines, whether procedures or functions, are described in the program between its
header and the main program, i.e. in the familiar description section. To create a first
impression of the subject, let's take a very simple, albeit strange, example: let's rewrite our
very first program hello (see page 234), putting the only action it contains into a procedure.
It will look like this:
program HelloProc;
procedure SayHello;
begin
writeln('Hello, world!')
end;
begin
SayHello
end.
This program first describes a procedure named SayHello, which does all the work; the main program consists of a single action, a call of this procedure. During program execution, a subroutine call proceeds as follows. The computer remembers the address of the memory location where the call instruction was encountered¹³⁷, then transfers control to the called subroutine, i.e. proceeds to execute its machine code. When the subroutine terminates, the return address memorized before the call is used to return control to the place where the call originated or, more precisely, to the instruction immediately following the call.
In general, the structure of the procedure text is very similar to the structure of the whole
program: it consists of a header, a section of local descriptions (in our elementary example this section is absent, or rather empty, but in the next example we will see what it looks like when it contains something) and an analog of the "main part", the so-called body,
which looks exactly the same: it starts with the word begin, ends with the word end, and contains operators inside. There are, however, some differences: the text of a procedure ends with a semicolon instead of a dot, and its header may contain a list of formal parameters (the SayHello procedure has no such list, but this happens relatively rarely).
¹³⁷ The return address of a subroutine is stored in the so-called hardware (machine) stack; we will return to this in the next part of our book, which is devoted to computer architecture and assembly language.
Our example illustrates what a subroutine is, but it does not show why it is needed:
compared to the original program with which we began our acquaintance with Pascal, the
new program is almost twice as long and much less understandable, so that it seems as if we
have gained nothing, but on the contrary, lost. This is true, but only for the reason that we
have put too simple an action into the procedure.
Now let's return to the diamond program from the previous paragraph (see page 276) and try to make it more understandable. First note that the program contains, in several places, a loop printing a required number of spaces; let's move this action into a procedure. The number of spaces differs from place to place, so we will pass it to the procedure as a parameter. In Pascal, subroutine parameters are written in parentheses immediately after the subroutine name, in a list very similar to a list of variable descriptions. In our case there will be just one parameter: the number of spaces to be printed. We'll call the procedure PrintSpaces and the parameter count; the procedure header will look like this:
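With the name and the parameter chosen above, the header reads:

```pascal
procedure PrintSpaces(count: integer);
```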
Now let's remember that to print a given number of spaces we need a for loop, and in
it we need a variable of integer type as a loop counter. We can say for sure about this
variable that it does not concern anyone and nothing outside of our procedure, in other words,
it is such a detail of the procedure implementation that we do not need to know, unless we
write and edit the procedure itself. Most existing programming languages, including Pascal, allow us in such cases to describe local variables, which are accessible (and visible at all) only inside a single subroutine. For this purpose a subprogram has its own description section which, like the description section of the main program, is located between the header and the word begin. Our entire procedure will look like this:
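A sketch of the whole procedure, with the loop counter i as a local variable:

```pascal
procedure PrintSpaces(count: integer);
var
    i: integer;     { local: invisible outside the procedure }
begin
    for i := 1 to count do
        write(' ')
end;
```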
Let us emphasize that the names i and count in this procedure are local, i.e. they do
not affect the rest of the program: we can, for example, describe variables (and not only
variables) with the same names in any place, and perhaps of other types, and it will not lead
to any bad consequences.
Having described the procedure, we can now call it anywhere in the program by writing
something like PrintSpaces(k), and k spaces will be printed. The name of the
procedure, with a list of parameters enclosed in parentheses if necessary, is, in fact, the
procedure call statement.
Before rewriting the diamond program, let's recall the remark we made right after writing it: the bodies of the two loops in that program turned out to be exactly the same, which is something to avoid, but without subroutines we could not help it. Now we have subroutines at our disposal, so let's fix this drawback at the same time. Recall that in the bodies of both loops we printed the next line of a figure with "half-height" n, and did it in the following way: first print n + 1 - k spaces, then an asterisk; then, if k > 1, print 2k - 3 spaces and another asterisk, and finally output a line feed. As we can see, to perform these steps we need to know two values, k and n, and this is enough to print the desired line without paying any attention to what is going on around, including which of the two loops (i.e. which phase of printing the figure) the program is currently in.
We will print a separate line of our figure in a procedure called
PrintLineOfDiamond; we will pass the numbers k and n to the procedure through
parameters. After replacing the bodies of both loops with calls of this procedure, the main
program will become quite short; the whole program will look like this:
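The program header and the two procedures described above might read as follows (the program name here is an assumption; the bodies follow the analysis given earlier, and the main part of the program continues below):

```pascal
program Diamond2;

procedure PrintSpaces(count: integer);
var
    i: integer;
begin
    for i := 1 to count do
        write(' ')
end;

procedure PrintLineOfDiamond(k, n: integer);
begin
    PrintSpaces(n + 1 - k);    { leading spaces }
    write('*');
    if k = 1 then
        writeln                { the top line has a single asterisk }
    else
    begin
        PrintSpaces(2*k - 3);  { spaces between the two asterisks }
        writeln('*')
    end
end;
```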
var
n, k, h: integer;
begin
repeat
    write('Enter the diamond''s height (positive odd): ');
    readln(h)
until (h > 0) and (h mod 2 = 1);
n := h div 2;
for k := 1 to n + 1 do
PrintLineOfDiamond(k, n);
for k := n downto 1 do
PrintLineOfDiamond(k, n)
end.
Despite the abundance of overhead lines (the description of each procedure costs at least three extra lines: one for the header, one for the word begin and one for the word end) and the empty lines inserted between subroutines for clarity, the new version of the program still turned
out to be four lines shorter than the previous one, and if we compare the length of their texts
in bytes, it turns out that we saved almost a quarter of the volume. However, such a modest
saving is only due to the primitive nature of the problem to be solved; in more complex cases,
the savings due to the use of subroutines can reach tens, hundreds, thousands of times, so that
it is simply unthinkable to create serious programs without dividing them into subroutines.
Before moving on, let us note one technical point. As in the description of ordinary
variables, in the description of subprogram parameters, parameters of the same type can be
listed comma-separated, specifying their type once; this is what we did when describing the
PrintLineOfDiamond procedure. If we need parameters of different types, the type is
specified for them separately, and a semicolon is placed between the descriptions of
parameters of different types. For example, if we want to improve the PrintSpaces
procedure so that it can print not only spaces but also any other characters, we can pass the
required character through a parameter:
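For example, such a procedure might look like this (the name PrintChars is our choice; the parameters are the character ch and the repetition count count):

```pascal
procedure PrintChars(ch: char; count: integer);
var
    i: integer;
begin
    for i := 1 to count do
        write(ch)
end;
```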
Here, the parameters ch and count have different types, so a semicolon appears between
their descriptions in the header.
Sometimes there are procedures that do not require parameters; an example of such a
procedure is SayHello, which we described at the beginning of this paragraph. The Pascal
language allows in such cases not to write a parameter list either in the procedure description
or when calling it - the parameters disappear together with the parentheses in which their list
should be enclosed. Looking ahead, we note that this is true for functions as well.
The list of parameters specified when describing a subprogram - the one consisting of
variable names and their types - is often called the list of formal parameters, and the list of
values (or, more precisely, in the general case - expressions, the calculation of which gives
the values sought) specified when calling the subprogram - the list of actual parameters. We
do not use these terms, but it is useful to remember their existence.
2.3.2. Functions
In addition to procedures, Pascal also provides another type of subroutines - so-called
functions. At first glance, functions are very similar to procedures: they too consist of a
header, a description section, and a main part, just as local variables can be described in them,
etc.; but, unlike procedures, functions are designed to compute a value and are called from
arithmetic expressions. A function returns control to the calling fragment of the program not
just for nothing, but by communicating a calculated value; a function is said to return a value.
Let's consider the simplest example: a function that raises a floating-point number to the cube:
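Such a function can be written as follows:

```pascal
function Cube(x: real): real;
begin
    Cube := x * x * x
end;
```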
As you can see, the function header starts with the keyword function, then, as for a
procedure, the name and the list of parameters are written down; after it, the type of the return
value must be specified (with a colon) for the function, i.e., the type of the value our function
is intended to calculate. In this case, the function calculates a number of type real. Like a
procedure, a function can contain (but, like our Cube, may not contain) a section of local
descriptions, and its main part (the body of the function) is written between the words begin
and end, followed by a semicolon.
As we remember, a procedure call is a separate special kind of operator. This is not the
case with functions. They are called in the same way as procedures - by writing a name and,
if necessary, a list of parameter values in brackets after it; but, unlike procedures, this is no
longer an operator. A function call is an expression that has the same type as the return
value specified for the function. Thus, "Cube(2.7)" is an expression of type real. Of
course, this expression, as well as any other, can be a part of more complex expressions; for
example, in a program you may encounter something like
a := Cube(b+3.5) - 17.1;
where a and b are variables of type real; during the calculation of the expression in the right-hand part of the assignment, control will be temporarily given to the Cube function, and its parameter x will receive the value of the expression b+3.5; when the function finishes its work, the number it has calculated will be used as the minuend of the subtraction, and the result of the subtraction will be stored in the variable a.
Of course, the function can be called in a simpler way, for example, like this:
a := Cube(10);
but in such a situation it would be better to just write the number 1000, so as not to waste time calculating it every time this operator is executed.
Free Pascal, like its inspiration Turbo Pascal, allows you to call a function in the same way as a
procedure - with a separate statement, ignoring its return value. We won't do that; moreover, we
strongly discourage you from doing so.
Like a procedure, a function is a subroutine, that is, it is a separate code fragment to which
§ 2.3. Subprograms 310
control is temporarily transferred when it is called. After finishing its work, a function returns
control just like a procedure, and there is only one fundamental difference: before returning
control to the one who called it, the function must fix the value that it was called to calculate.
The function "gives" this value to the caller together with the returned control, so
programmers say that the function returns a value, and this value itself is called the return value.
As we have seen in the example, the return value is specified in the function code by a
special kind of assignment operator, which has the name of the function itself in the left part
instead of the variable name. Our function Cube consists of this operator, but there are more
complicated cases. For example, the sequence of Fibonacci numbers is well known, in which
the first two elements are equal to ones, and each subsequent one is equal to the sum of the
two previous ones: 1, 1, 2, 3, 5, 8, 13, 21, 34, ... The function that calculates the Fibonacci
number by its number could look like this (given the rapid growth of these numbers, we will
use integer type numbers for their numbering, and longint type numbers for working
with the Fibonacci numbers themselves):
function Fibonacci(n: integer): longint;
var
    i: integer;
    p, q, r: longint;
begin
    if n <= 0 then
        Fibonacci := 0
    else
    begin
        q := 0;
        r := 1;
        for i := 2 to n do
        begin
            p := q;
            q := r;
            r := p + q
        end;
        Fibonacci := r
    end
end;
Some explanations will be appropriate here. The basic algorithm implemented in our function
operates under the condition that the number passed to the function through the parameter n
is not less than one. At each step, the two previous Fibonacci numbers and the current one
are stored in the variables p, q and r. Before starting the work, q contains 0 and r contains
the number 1, which corresponds to the Fibonacci numbers with the numbers zero and one;
we agree that the variable r always contains the current Fibonacci number, and at the start
the number of the current number is taken to be one. The loop "shifts" the
variables p, q and r by one position along the sequence: p receives what was in q
(the second-to-last number becomes the third-to-last one), q receives what was in
r (the last number becomes the second-to-last one), and r receives the sum of p and q,
i.e. the next Fibonacci number is computed. In other words, each iteration of the loop
increases the number of the current number by one, and the current number itself appears in
the variable r each time. The loop is executed as many times as it takes for the number of
the current number to equal the value of the parameter n: if n is equal to one, the loop
is not executed at all, if it is equal to two, the loop runs one iteration, and so on. The resulting
number r is returned as the final value of the function.
It is easy to see that all this can only work for parameter values (numbers of numbers) of
one and more, so we had to handle the case n <= 0 separately; this is what the if
statement is for. This allows our example to illustrate another important point: in the body of
a function there can be more than one assignment operator specifying the value that the
function will return, but each time our function is called, one and only one such operator must
be triggered, i.e. you cannot specify some return value and then "change your mind" and
specify another one.
A function can also return a value of boolean type (see §2.2.9). It is clear that a call to such a function will be a logical
expression; however, a crucial feature escapes the attention of many beginners: a call to a
function that returns boolean can itself serve as a condition in branching and loop
operators. This, in particular, allows you to avoid cluttering the text with complex (especially
multi-line) conditions by placing them in separate functions.
For example, if we need to check, as in the example on page 264, whether a variable c
(of type char) contains a Latin letter, the header of an operator (e.g., if) that uses such
a condition will be quite cumbersome:
if ((c >= 'A') and (c <= 'Z')) or ((c >= 'a') and (c <= 'z'))
then
This is exactly the case when you should not be lazy and write a logical function, calling it,
for example, IsLatinLetter:
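A minimal version of such a function, simply wrapping the condition shown above (the exact formatting is our own choice), might look like this:

```pascal
function IsLatinLetter(c: char): boolean;
begin
    { true exactly when c is an uppercase or lowercase Latin letter }
    IsLatinLetter :=
        ((c >= 'A') and (c <= 'Z')) or ((c >= 'a') and (c <= 'z'))
end;
```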
Our if will now be much more concise and, strangely enough, clearer:
if IsLatinLetter(c) then
Besides, once a function is written, we can use it elsewhere in the program if necessary. Just
in case, we remind you that if you notice that you want to write something like
if IsLatinLetter(c) = true then
then this is a reason to reread the discussion on page 264 and finally rid yourself of the
bad habit of comparing logical values with constants, especially with the constant true.
The only exception is the family of so-called file variable types, which cannot be passed to subroutines
by value, returned from functions, or even simply assigned. We will consider these variables in Chapter 2.9,
devoted to working with files.
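The discussion that follows refers to a procedure p with a single by-value integer parameter x; a hypothetical minimal sketch (its body is our own invention and is immaterial here) might be:

```pascal
procedure p(x: integer);
begin
    { x is a local copy of whatever value was passed in }
    writeln('p received the value ', x)
end;
```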
We can call it by specifying an arbitrary expression at the call point as a parameter value, as
long as it results in an integer (of any integer type), for example:
a := 15;
p(2*a + 7);
In such a call, the value of the expression 2*a + 7 will be calculated first; the obtained
number 37 will be entered into the local variable (parameter) x of the procedure p,
after which the body of the procedure will be executed. It is obvious that from inside the
procedure we cannot influence the value of the expression 2*a + 7 and even less the value
of the variable a, about which we know nothing at all, being inside the procedure. We can,
in principle, assign new values to the variable x (although some programmers consider this
a bad style), but when the procedure is finished, x will simply disappear along with all the
information we have written into it.
Some confusion among beginners is caused by the fact that you can (that is, no one
prevents you from) specifying the name of a variable of the corresponding type as a parameter
value:
a := 15;
p(a);
This case is no different from the previous one: the reference to the variable a is nothing
more than a special case of an expression, the result of the calculation of this expression is
the number 15, it is entered into the local variable x, and then the body of the procedure
is executed. We can say that variable x has become a copy of a, but this copy is not
connected to its original; actions on the copy do not affect the original in any way, i.e., as in
the previous case, we cannot affect the value of variable a from within the procedure.
Meanwhile, in some cases it is convenient to be able to change the value(s) of one or
more variables located at the call point from within the subroutine. For example, this may be
necessary if our subroutine calculates more than one value, and the caller needs all these
values. We can use a function to pass a single value from the subroutine to the caller, as
we did for raising a number to a cube; but what if we wanted to write a subroutine that, for
a given number, computes its square, cube, fourth and fifth powers all at once? For
this simple example a "head-on" solution exists: write not one
function but four; but if we call them all in sequence, the result will be ten multiplications,
whereas four multiplications are enough to solve the problem. In more complicated cases,
a "head-on" solution may not exist at all.
This is where parameter-variables come to the rescue, which differ from parameter-
values in that in this case it is not the value that is passed to the subprogram, but the variable
as such. The name of the parameter during the execution of the subprogram becomes
synonymous with the variable that was specified as a parameter in the call point, and
everything that is done with the parameter name inside the subprogram actually happens with
this variable.
In the subprogram header, the word var is placed before the descriptions of parameter-variables
and acts up to the nearest semicolon or the closing parenthesis; for example, a procedure
computing several powers at once might have the following header and body:
procedure powers(x: real; var quad, cube, fourth, fifth: real);
begin
    quad := x * x;
    cube := quad * x;
    fourth := cube * x;
    fifth := fourth * x
end;
The powers procedure has five parameters; when calling it, you can specify an arbitrary
expression of type real as the first parameter, but the other four parameters require
specifying a variable, and a variable of type real and no other. This restriction is quite
understandable: inside the powers procedure, the identifiers quad, cube, fourth and
fifth will be synonyms for what we specify at the call point with the corresponding
parameters, and they are handled inside the procedure as variables of type real; if we try to
use a variable of any other type (i.e., using a different machine representation to store the
value), we will get complete chaos at the output, so the Pascal compiler does not allow such
things.
By the way, you can even use an integer expression; Pascal silently converts integers to floating-point
numbers if necessary, but to convert backwards, you have to explicitly specify how the conversion should be
performed: with rounding or with discarding the fractional part. We will talk about this in the future.
A correct call to such a subroutine could look like this:
var
p, q, r, t: real;
begin
{ ... }
powers(17.5, p, q, r, t);
As a result of such a call, the variables p, q, r, and t will contain the second, third,
fourth, and fifth powers of the number 17.5.
Beginners are often confused by the word "variable"; a superficial perception of what is
going on may give the impression that only an identifier naming the variable can be
substituted for the parameter-variable in a call. But in reality it is not so; not all variables in
Pascal have identifier names, there are variables that are part of other variables, and there are
variables with no name at all. We haven't met them yet, but we will.
Passing information from a subroutine "out" is not the only use for parameter variables.
For example, later we will consider variables that are quite large; copying such a variable
would take too long, so that if you pass them by value, the whole program might be too slow.
When passing a variable (however large) to a subroutine via a parameter-variable, no
copying takes place, which makes it possible to use parameter-variables for optimization.
In addition, Pascal provides one special kind of variables (so-called file variables) that cannot
even be assigned, let alone copied; such variables can be passed to subroutines only through
parameter variables and in no other way.
The original version of Pascal proposed by Wirth did not provide such a possibility; the description
sections there had a strictly fixed order, and the variable description section had to come before the procedure
and function description sections. Current implementations of Pascal have no such restriction.
the second one uses it. So, you should not do this. As far as possible, all communication
with subroutines should be maintained through parameters: transfer information to
subroutines through parameter-values, and from subroutines "outside" all information
should be transferred in the form of values returned by functions and, if necessary,
through parameter-variables.
The main reason for this lies, as it often happens, in the peculiarities of human perception
of a program. If the work of a procedure or function depends only on its parameters, it is much
easier to imagine the work of the program and understand what is going on; if the values of
global variables interfere, you will have to remember about them, that is, every time you look
at the call of a subprogram with some parameters, you will have to take into account that its
work will also depend on the values in some global variables; the parameters at the point of
call are clearly visible, which is not the case with global variables. Quite often it looks as if
some subprogram suddenly changes its behavior (most often from correct to incorrect), and
it is not easy to understand that global variables are to blame. It is said that global variables
accumulate state; unexpected changes in the behavior of some or other parts of the program
appear to be a consequence of this accumulation.
Besides, global variables can be accessed from many different places in the program, so,
for example, if you find out during debugging that someone managed to put a value into a
global variable that nobody expected there, you may spend a lot of time trying to find out
where in your program it happened; this is especially true when creating large programs where
several programmers work on them at the same time.
There is another reason why global variables should be avoided whenever possible. There
is always a possibility that an object that is currently one in your program will need to be
"multiplied". For example, if you are implementing a game and your implementation has a
game board, it is very, very likely that you may need two game boards in the future. If your
program works with a database, you can (and should) assume that sooner or later you will
need to open two or more such databases simultaneously (for example, to change the format
of data representation). The series of examples can be continued indefinitely. If we now
assume that the information critical for working with your database (or game field, or any
other object) is stored in a global variable and all subroutines are tied to the use of this
variable, you will not be able to make a "meta-transition" from one object instance to several.
Now the first six lines of the main part of the program can be replaced by a single assignment:
n := NegotiateSize div 2;
As you can see, the function eventually calculates the value, but it also outputs something to
the screen, reads something from the keyboard; all these I/O operations are nothing but its
side effects.
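As a rough illustration of such a function (the name NegotiateSize is taken from the call above; the prompt text and the validation rule are our own assumptions), it might look like this:

```pascal
{ Computes a value, but also performs I/O along the way: }
{ these read/write operations are its side effects.      }
function NegotiateSize: integer;
var
    x: integer;
begin
    write('Enter the size: ');
    readln(x);
    while x < 1 do
    begin
        write('The size must be positive, try again: ');
        readln(x)
    end;
    NegotiateSize := x
end;
```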
To tell the truth, it is not at all clear why we should do it this way, although people who
are used to some other programming languages create such functions surprisingly often. Let
us draw the reader's attention to the fact that in terms of simplifying the program text, we can
achieve almost the same success by using a procedure rather than a function, thus avoiding
the use of side effects altogether:
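A procedure achieving the same simplification could be sketched as follows (the parameter name res follows the text below; the prompt text is an assumption):

```pascal
{ The same dialogue, but as a procedure: the result comes    }
{ back through a parameter-variable, not as a return value.  }
procedure NegotiateSize(var res: integer);
begin
    write('Enter the size: ');
    readln(res)
end;
```

At the call point this would take two lines, say NegotiateSize(s); followed by n := s div 2; with the halving done at the call site.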
If you don't like the second line here, you can do the halving inside the procedure when
assigning a value to the res variable; actually, we pulled the halving out of the subroutine
(function first, then procedure) in our example purely for clarity reasons, to show that calling
a function with a side-effect can be part of a more complex expression - it's illustratively more
effective than just calling the function in the right-hand side of the assignment.
Many programmers sincerely believe that this solution is absolutely equivalent to the
previous one, also in terms of side effects, and that a "side effect" is in itself any change of
anything anywhere, i.e. any assignment, any input or output operation, etc. is a side effect.
Moreover, someone will probably try to convince you of the same thing. Don't fall for it!
They are deluded. Neither assignment nor I/O by itself has anything to do with side effects,
at least as long as we work in Pascal and refrain from writing functions (not procedures, that's
important!) that perform I/O or assignments to variables other than our own local variables.
The roots of mass misconceptions about the essence of side effects grow from the
popularity of the C and C++ languages, in which, strange as it may seem, this is really true; when we
get to the study of C in the second volume, we will find out that there the whole program
execution consists of side effects, that is, literally completely, 100%, and this is not an
exaggeration - formally it is true. But first of all, there are no procedures there, and secondly
(which may completely discourage a person with insufficient experience) assignment is not
an operator, but an arithmetic operation.
Strange as it may seem, there are situations when the use of side effects (note,
consciously) can be justified, including in Pascal programs, i.e. it would be a mistake to say
that side effects should never be used at all. This is why Pascal allows functions that have
side effects. But in every such situation, another solution can be found that does not require
side effects; in general, Pascal allows you to do without side effects at all, and this is one of
its undoubted advantages. Since our task now is to learn to program well, we will try to take
advantage of this feature of Pascal and get used to the fact that an expression is always
evaluated just for the sake of its result. Later, when we learn C, where it is fundamentally
impossible to work without side effects, the habits formed now will help us distinguish
between unavoidable and often harmless side effects, many of which would not be so in other
languages at all, and inappropriate side effects, the application of which turns the program
into a rebus.
2.3.7. Recursion
Subprograms, as we already know, can be called from each other, but the matter is not
limited to this: a subprogram can call itself if necessary - both directly, when the body of the
subprogram explicitly contains a call to itself, and indirectly, when one subprogram calls
a second, the second possibly a third, and so on, until one of them calls the first one
again.
Beginners at the mention of recursion are usually concerned about local variables; it turns
out that there is no problem, because the local variables of a subroutine are created when
the subroutine is called and disappear when the subroutine returns control; a recursive
call will create a "new set" of local variables, and so on each time. If a procedure describes a
variable x and that procedure has called itself ten times, x will exist in eleven instances.
It is important to realize that recursion must end sooner or later, and for this purpose it is
necessary that each subsequent recursive call should solve the same problem, but for at least
a slightly simpler case. Finally, it is obligatory to identify the so-called recursion basis - such
a simple (usually trivial or degenerate) case in which further recursive calls are no longer
needed. If this is not done, the recursion will be infinite; but since each recursive call
consumes memory - for storing the return address, for placing parameter values, for local
variables - when the program goes into infinite recursion, sooner or later the available memory
will run out and the program will crash.
We will give a very simple example of recursion. In §2.3.1, we wrote the PrintChars
procedure that prints a given number of identical characters; the character itself and the
desired number of characters are passed through parameters (see page 289). This procedure
can be implemented using recursion instead of a loop. For this purpose, we should note that,
first, the case when the required number of characters is zero is a degenerate case in which
nothing needs to be done; second, if the case is not degenerate, then printing n characters is
the same as printing first one character and then (n - 1) characters, and the task "print (n - 1)
characters" is quite suitable as a "slightly simpler" case of the same problem. A recursive
implementation would look like this:
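Consistent with the calls shown below (the parameter names are our own choice), the recursive implementation could read:

```pascal
procedure PrintChars(c: char; n: integer);
begin
    if n > 0 then             { n = 0 is the degenerate case: do nothing }
    begin
        write(c);             { print one character...                   }
        PrintChars(c, n - 1)  { ...and recurse for the remaining n - 1   }
    end
end;
```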
As you can see, for the case where there is nothing to print, our procedure does nothing, but
for all other cases it prints one character, so that there is one less character to print; the
procedure uses itself to print the remaining characters. For example, if we call
PrintChars('*', 3), the procedure will print an asterisk and call
PrintChars('*', 2); the "new version" of the procedure will print an asterisk and
call PrintChars('*', 1), which will print an asterisk and call PrintChars ('*', 0); this
last call will do nothing and terminate, the previous one will also terminate, the one before it
will also terminate, and finally our original call will terminate. As you can easily see, the
asterisk will be printed three times.
It is often useful to use the reverse recursion, when a subroutine first makes a recursive
call and then performs some other actions. For example, suppose we have a task to print
(separated by spaces for clarity) the digits that make up the decimal representation of a given
number. It is not a problem to detach the lowest digit from the number: it is the remainder of
division by 10. All other digits of the number can be extracted by repeating the same process
for the original number divided by ten with discarding the remainder. The recursion can be
based on the case of zero: for zero we will not print anything. If the number 0 absolutely
must produce a printed zero, this special case can be handled by writing another
procedure that prints a zero for the zero argument and calls our procedure for any other
number. Suppose we write something like
procedure PrintDigitsOfNumber(n: integer);
begin
    if n > 0 then
    begin
        write(n mod 10, ' ');
        PrintDigitsOfNumber(n div 10)
    end
end;
If we call it, for example, PrintDigitsOfNumber(7583), then it will not print exactly
what we want: "3 8 5 7". The digits are correct, but the order is reversed. It's
understandable, because we "chopped off" the lowest digit first and printed it immediately,
and so on, so they are printed from right to left. But the problem is solved with just one small
change:
procedure PrintDigitsOfNumber(n: integer);
begin
    if n > 0 then
    begin
        PrintDigitsOfNumber(n div 10);
        write(n mod 10, ' ')
    end
end;
Here we have swapped the recursive call and the print operator. Since returns from recursion
occur in the reverse order of entering recursion, the digits will now be printed in the reverse
order of the order in which they were "split", i.e., the same call will print "7 5 8 3", which
is what we needed.
Not only procedures, but also functions can be recursive. For example, in the following
example, the ReverseNumber function calculates the number obtained from the initial
number by "flipping backwards" its decimal entry, while the recursion itself takes place in the
auxiliary function DoReverseNumber:
function DoReverseNumber(n, m: longint): longint;
begin
    if n = 0 then
        DoReverseNumber := m
    else
        DoReverseNumber :=
            DoReverseNumber(n div 10, m * 10 + n mod 10)
end;
In 1968, the Dutch scientist and programmer Edsger Dijkstra, in his article "Go To
Statement Considered Harmful", proposed abandoning the practice of uncontrolled "jumps"
between different places in the code in order to make programs clearer. Two years before that, the Italians
Corrado Böhm and Giuseppe Jacopini had formulated and proved a theorem, now commonly
referred to as the structured programming theorem; this theorem states that any algorithm
represented by a flowchart can be transformed into an equivalent algorithm (i.e. one that
produces the same output words on the same input words) using a superposition of only three
"elementary constructions": direct succession, in which one action is executed first and then
another; incomplete branching, in which an action is either executed or skipped depending on
a condition; and the loop with a precondition, in which an action is repeated as long as a
condition holds.
In practice, it is common to add also full branching and a loop with a postcondition; we have
already seen all these basic constructs in Figures 2.1 and 2.2 (see pages 258 and 270). All of
these basic constructs have one very important thing in common: each of them has exactly
one entry point and exactly one exit point.
By the way, you should not think that getting hopelessly confused is impossible in modern
conditions; it is not an exaggeration to say that almost every programmer has at least once
got hopelessly confused in his own code. Many beginners start to take the structure of their
code more seriously only after such an incident.
§ 2.4. Program design 321
The mysterious word "superposition" in this case means that each rectangle denoting
(according to flowchart rules) some action can be replaced by a more detailed (or more
formal) description of this action, i.e., in turn, by a flowchart fragment that is also constructed
as one of the basic constructions. Such replacement is called detailing; the reverse
replacement of a correct flowchart fragment (i.e., a fragment that has one entry point, one exit
point, and is constructed as one of the basic constructs) by a single rectangle ("action") is
usually called generalization. Actually, the very possibility to generalize arises as a result of
observing the rule about one input point and one output point; a rectangle, which denotes a
single action in block diagrams, also has exactly one input and exactly one output, which
allows you to replace any basic structural programming construct with one rectangle, i.e. to
generalize it. This, in turn, allows you to make any flowchart simpler and simpler by hiding
minor details, until it as a whole is simple enough to understand at a glance.
While generalization is usually required when studying an existing program, detailing,
on the contrary, is widely used when creating new programs. One of the most popular
strategies of writing program code, which is called top-down step-by-step detailing, is to start
writing a program from its main part, and instead of some isolated fragments write so-called
stubs; a stub can be either a simple comment like "this is where this and that should happen"
or a call to a subroutine (procedure or function), for which only a header and an empty body
are written (for functions, as a rule, an operator specifying some return value is added). Then
the stubs are gradually replaced by working code, and new stubs appear, of course.
It is easy to see that each stub corresponds to a rectangle, which denotes a certain complex
action in the flowchart, which should be replaced by a more detailed fragment of the flowchart
in the process of detailing. By the way, we have already used top-down step-by-step detailing
when creating the StarSlash program (see page 274).
Fibonacci := r
end;
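Assembled in full from the earlier Fibonacci function (a reconstruction; only the ending is quoted above), the exit-based variant would read:

```pascal
function Fibonacci(n: integer): longint;
var
    i: integer;
    p, q, r: longint;
begin
    if n <= 0 then
    begin
        { the special case is handled and dismissed at once }
        Fibonacci := 0;
        exit
    end;
    q := 0;
    r := 1;
    for i := 2 to n do
    begin
        p := q;
        q := r;
        r := p + q
    end;
    Fibonacci := r
end;
```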
If in the previous variant we had to divide the whole body of the function into two branches
of the if operator and remember about it throughout the whole body of the function, here
we first process the "special case", and if it is this case, we fix the return value and
immediately terminate execution. Then we safely forget about the already processed case;
this technique allows us not to enclose the rest of the code (which actually implements
everything the function was written for) in the else branch.
The benefit of using the exit operator becomes more obvious as the number of special
cases grows. Suppose, for example, we need to write a subroutine that solves a quadratic
equation. It will get the coefficients of the equation through parameters; the result that our
procedure will return through parameter-variables will consist of three values: a logical one
(whether roots were found) and two values of type real, the roots themselves. We will not
treat the case of coinciding roots specially; we will simply assign the same number to both
variables.
There are two special cases. First, the coefficient of the second degree may be zero, in
which case the equation is not a quadratic equation and cannot be solved as a quadratic
equation. Second, the discriminant may be negative. If we do not use the early exit operator,
the procedure will look like this:
The reader familiar with complex numbers may notice that the roots of a quadratic equation always exist;
solving a quadratic equation in complex numbers is not very difficult, but now we have other goals in mind, so
we will solve it the "schoolboy way", in real numbers.
{ the procedure name SolveSquare is chosen for illustration }
procedure SolveSquare(a, b, c: real; var ok: boolean; var x1, x2: real);
var
    d: real;
begin
    if a = 0 then
        ok := false
    else
    begin
        d := b*b - 4*a*c;
        if d < 0 then
            ok := false
        else
        begin
            d := sqrt(d);
            x1 := (-b - d) / (2*a);
            x2 := (-b + d) / (2*a);
            ok := true
        end
    end
end;
The most interesting thing - the actual solution of the equation - was "buried" at the third
level of nesting, and in general, the control structure of our procedure looks quite scary.
Using the exit operator, we can rewrite it in a slightly different way:
begin
    if a = 0 then
    begin
        ok := false;
        exit
    end;
    d := b*b - 4*a*c;
    if d < 0 then
    begin
        ok := false;
        exit
    end;
    d := sqrt(d);
    x1 := (-b - d) / (2*a);
    x2 := (-b + d) / (2*a);
    ok := true
end;
Obviously, the text of the procedure has become much clearer, which is greatly facilitated
by reducing the length of its body by one and a half times (it was 15 lines, now it is 10).
In practice, there are subroutines with a considerably larger number of special cases - there
may be five, ten, whatever; if we try to write such a subroutine without exit, we simply
will not have enough screen width for structural indents. Besides, organizing the processing
of special cases with the help of nested if constructions is not quite correct even at the
ideological level: the general case, which is obviously "more important" than all the special
cases considered separately, is processed somewhere in the depth of the control structure,
which distracts attention from it and contradicts its main role.
The whole program can also be terminated prematurely; this is done with the halt
operator. This operator can be used anywhere in the program, including any subroutine, but
it should be done with care. For example, novice programmers are very fond of "handling"
any erroneous situations by issuing an error message and immediately terminating the
program; to understand why this should not be done, it is enough to imagine a text editor that
would react in such a radical way to any incorrect keystroke.
The version of the halt operator included in Free Pascal has two forms: the normal form,
where the word halt is simply written in the program, and the parametric form, in which case an
integer expression is specified in parentheses after the word halt, i.e., something like
halt(1) is written. The parameter of the halt operator, if it is specified, sets the termination
code for our program, which allows us to tell the operating system whether we think our program
was successful or not. Code 0 means successful termination, codes 1, 2, etc. are considered
by the operating system as errors. Theoretically, any number from 0 to 255 (single-byte unsigned)
can be used as a termination code, but usually large numbers are not used in this role - in most cases
the termination code does not exceed 10.
The halt operator without parameters is equivalent to the halt(0) operator, i.e. it
corresponds to successful termination.
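For example (a small sketch; ParamCount is Free Pascal's standard count of command-line arguments, and the usage text is illustrative), a program might refuse to run without arguments like this:

```pascal
if ParamCount < 1 then
begin
    writeln('usage: myprog <filename>');
    halt(1)   { termination code 1 tells the operating system we failed }
end;
```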
Finally, the operators for loop termination (break) and a separate loop iteration
(continue), which came into Pascal from the C language, are often useful. We can't
illustrate these operators on simple tasks, as we have considered so far, because they are not
needed there; but very soon we will encounter more complex tasks where break and
continue will allow us to simplify the program text considerably.
2.4.3. Unconditional transitions
The goto unconditional jump operator allows you to transfer control to another
point in the program at any moment or, if you like, to continue execution of the program from
another place. Operators to which unconditional jumps are made are marked with so-called
labels, which can be ordinary identifiers or unsigned numbers; the compiler supports
the latter for the sake of compatibility with older Pascal dialects. Since a program with
identifier labels is obviously easier to read than one with numeric labels, numbers are
not usually used as labels nowadays.
Labels should be listed in the description section using the label keyword to form the
label description section; the labels themselves are listed comma-separated, with a
semicolon at the end, for example:
label
Quit, AllDone, SecondStage;
Usually, label descriptions are inserted immediately before the word begin or before the
variable description section. It should be noted that Pascal prohibits "jumping" from one
subroutine to another, "jumping" into a subroutine from the main program and "jumping" into
the main program from subroutines, so it makes no sense to make labels "global". Labels used
inside a subprogram should be described in the description section of that subprogram, and
labels required in the main part of the program - just before it starts.
To mark an operator with a label, the label is written before the text of the operator,
separated from it by a colon, e.g. point: writeln('we are here'); the jump
itself then looks like this:
goto point;
In the literature, you can often find the claim that the goto operator supposedly "must
never be used" because it makes programs confusing. In most cases this is true, but there
are two (not one, not three, but exactly two) situations in which the use of goto is not
only acceptable, but even desirable.
The first of the two situations is very simple: jumping out of repeatedly nested control
constructs, for example, loops. The break and continue operators specially designed
for this purpose can handle the exit from one loop, but what to do if you need to "jump out"
from, say, three loops nested within each other? Strictly speaking, you can do without
goto even here: you can insert checks of some special flag into the loop conditions, set
the flag in the innermost loop and execute break;, and then all the loops will terminate.
Note that in most cases the flag will have to be checked not only in the loop conditions,
but also in some parts of the loop bodies, with ifs testing the same flag. It would be
strange, to say the least, to claim that all this clutter is clearer than a
single goto operator (provided, of course, that the name of the label is chosen well and
corresponds to the situation in which the transition is made to it).
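As an illustration of this first situation, here is a sketch of our own (the array and the value sought are made up for the example):

```pascal
program FindIn3d;  { a sketch: jumping out of three nested loops at once }
label
    found;
var
    a: array [1..4, 1..4, 1..4] of integer;
    i, j, k, x: integer;
begin
    { fill the array with some values }
    for i := 1 to 4 do
        for j := 1 to 4 do
            for k := 1 to 4 do
                a[i, j, k] := i * 100 + j * 10 + k;
    x := 231;
    for i := 1 to 4 do
        for j := 1 to 4 do
            for k := 1 to 4 do
                if a[i, j, k] = x then
                    goto found;    { leave all three loops in one step }
    writeln('not found');
    halt(1);
found:
    writeln('found at ', i, ' ', j, ' ', k)
end.
```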
The second situation is a bit more difficult to describe, since we have not yet considered
either files or dynamic memory. Nevertheless, try to imagine that you are writing a subroutine
which, at the beginning of its work, takes a certain resource for a while and then gives it back.
This scheme of operation is quite common; in English, releasing acquired resources before
finishing the work is called cleanup. The need to perform cleanup before exiting a
subroutine presents no problem if we have only one exit
point; the problems start if somewhere in the middle of the subroutine text there is a need to
end it early with exit. An attempt to do without goto will lead to the fact that everywhere
just before termination, i.e. before exit and before the end of the subroutine body, you will
have to duplicate the code of all operations that perform cleanup. Duplicating code is usually
not good: if we now change the beginning of the subroutine, adding or removing operations
to capture a resource, there is a high probability that out of the resulting several identical
fragments that perform cleanup, we will fix only some of them and forget about the rest.
That's why in such a situation we usually do something else: we put a label before the cleaning
operations located at the end of the subroutine (as a rule, it is called quit or cleanup),
and instead of exit we make goto to this label.
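A sketch of this cleanup pattern (AcquireResource, FirstStageOk, SecondStage and ReleaseResource are hypothetical placeholders, not real routines):

```pascal
{ a sketch of the cleanup pattern; the four routines named here are
  hypothetical placeholders standing in for real resource handling }
procedure DoWork;
label
    quit;
begin
    AcquireResource;
    if not FirstStageOk then
        goto quit;       { early exit, but the cleanup still happens }
    SecondStage;
quit:
    ReleaseResource      { the single cleanup point }
end;
```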
Note that in both cases, the transition is made "down" in the code, i.e. forward in the
execution sequence (the label is in the program text below the goto operator) and "out" of
the control constructs. If you feel the urge to goto backwards, it means that you are creating
a loop, and there are special operators for loops: try using while or repeat/until. If
you want to jump inside a control construct, it means that something has gone completely
wrong and you need to understand the reasons for such strange desires; note that Pascal will
not allow such a thing.
     .0  .1  .2  .3  .4  .5  .6  .7  .8  .9  .A  .B  .C  .D  .E  .F
0.  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
1.  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
2.  SPC  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
3.   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
4.   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5.   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6.   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7.   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~  DEL
Figure 2.3. Hexadecimal ASCII codes
In this chapter we will consider a somewhat simplified situation: assume that one byte is
sufficient to store the code of a single character. ASCII and its many eight-bit extensions (see
§1.4.5) fit into this picture; moreover, many programs written for single-byte character codes
will work quite well with UTF-8. As for the "quite correct" handling of multibyte characters,
this is a topic for a separate discussion, and a rather complex one at that; a detailed study of
this issue would distract us from more pressing tasks, so we will leave these problems outside
the scope of this book.
character enclosed in apostrophes denotes itself. In addition, you can define a character by
its code (in decimal): for example, #10 means the line feed character, and #55 is
exactly the same as '7' (as the table shows, the code of the digit seven is 37 in
hexadecimal, i.e., 55 in decimal). To specify control characters with codes from 1 to 26,
you can also use the so-called caret notation: ^A means the character with code 1 (i.e., the
same as #1), ^B means the same as #2, and so on; instead of #26 you can write ^Z.
For example, as we already know very well, the writeln operator, having printed all
its arguments, produces a line feed character at the end, which moves the cursor to the next
line. However, no one prevents us from issuing the line feed character without writeln; in
particular, instead of the familiar
writeln('Hello, world!')
we could write
write('Hello, world!', #10)
or
155 If your system uses a Unicode-based encoding, it is easy to make the mistake of placing between
apostrophes a character whose code occupies more than one byte; the compiler cannot handle that. The
universal recipe here is very simple: use only characters from the ASCII set in the program text and,
if necessary, place everything else in separate files.
§ 2.5. Symbols and their codes; text data 321
write('Hello, world!', ^J)
(in both of these cases, the string is output first, and then the line feed character separately);
looking ahead, we note that you can do even trickier things by "driving" the line feed character
directly into the string itself in one of the following ways:
write('Hello, world!'#10)
or
write('Hello, world!'^J)
(here in both cases write prints only one string, but that string contains a line feed character
at the end).
The apostrophe character serves as the delimiter for literals representing both single
characters and strings; if the "'" character itself must be specified, it is doubled, which
tells the compiler that the apostrophe character itself is meant, and not the end of the
literal. For example, the phrase "That's fine!" is written in a Pascal program as
'That''s fine!'. If we need not a string but the single apostrophe character, the
corresponding literal looks like this: ''''; the first apostrophe marks the beginning of
the literal, the next two encode the apostrophe character itself, and the last one ends the
literal.
Comparison operations are defined on characters just as on numbers; in fact, it is simply
the character codes that are compared: for example, the expression 'a' < 'z' is true,
while '7' > 'Q' is false. This, together with the contiguous arrangement in the
ASCII table of the characters of certain categories, makes it possible to determine with a
single logical expression whether a character belongs to such a category; thus, the expression
(c >= 'a') and (c <= 'z')
is the Pascal rendering of the question "is the character in variable c a lowercase Latin letter";
similar expressions for uppercase letters and digits look like (c >= 'A') and (c
<= 'Z') and (c >= '0') and (c <= '9').
At run time, it is possible to obtain the numerical value of the code of an existing
character, i.e., of an expression of type char; the built-in function ord serves this
purpose156. If we have variables described as
var
c: char;
n: integer;
then the assignment n := ord(c) will be correct, and the code of the character stored in
the variable c will be placed into the variable n. The reverse operation, obtaining a
character from a given code, is performed by the built-in function chr; the assignment
c := chr(n) will put into the variable c the character whose code (as an ordinary
number) is in n, provided that this number is in the range from 0 to 255; the result of
chr for other argument values is undefined. Of course, chr(55) is the same as #55
or '7'; but, unlike literals, the chr function allows us to construct a character whose code
we did not know when writing the program, or whose value changes as the program runs.
156 In fact, ord can do more than this; we will consider its full capabilities when we discuss the
generalized notion of ordinal types.
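The program listing discussed in the next paragraph is missing from this copy; a sketch consistent with that discussion (rows indexed by i, columns by j, each character obtained as chr(i*16 + j)) might look like this:

```pascal
program AsciiTable;  { a sketch; the original listing did not survive here }
var
    i, j: integer;
begin
    for i := 2 to 7 do                   { rows of printable characters }
    begin
        write(i, '.');                   { row label, as in the figure }
        for j := 0 to 15 do
            write(' ', chr(i * 16 + j)); { the character with code i*16 + j }
        writeln
    end
end.
```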
As you can see, here chr is applied to the expression i*16 + j: each row contains
16 characters, i holds the row number and j the column number, so this expression is
exactly the code of the desired character; it only remains to turn this code into a
character, which we do with the help of chr. The output of the program looks like this:
    .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
2.      !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /
3.   0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ?
4.   @  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
5.   P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _
6.   `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
7.   p  q  r  s  t  u  v  w  x  y  z  {  |  }  ~
2.5.2. Character input
Let's consider the opposite problem, when we have to get the code of a character unknown
at the moment of writing the program. We already know that we can use read to read,
for example, an integer from the keyboard; but what happens if our program tries to do this
and the user enters some gibberish? Usually, when entering numbers, the read operator
checks if the user input is correct, and if the user makes a mistake and enters something that
doesn't match the program's expectations, it displays an error message and terminates the
program. It looks like this:
Runtime error 106 at $080480C4
$080480C4
$08061F37
If we know that the program is written in Free Pascal, we can find (for example, on the
Internet) information about what 106 means, but that's all; if the program is intended for a
user who cannot program himself, such a diagnostic is useless to him and can only spoil his
mood; besides, as was said before, it is not a good idea to terminate the program because of
just any error.
We can "take the initiative" and tell the compiler that we will handle errors ourselves.
This is done by inserting a rather strange-looking {$I-} directive into the program text
("I" from the word input, "-" means that we disable built-in diagnostic messages for user
input errors). After that, we can always find out whether the next input operation was
successful or not; the built-in function IOResult is used for this purpose. If this function
returns 0, the operation was successful; a nonzero value indicates that an error has occurred.
In our case, if the user enters gibberish instead of a number, IOResult will return the
above-mentioned 106. For example, a program multiplying two integers could look like this
(taking IOResult into account):
program mul;
var
    x, y: longint;
begin
    {$I-}
    read(x, y);
    if IOResult = 0 then
        writeln(x * y)
    else
        writeln('I couldn''t parse your input')
end.
Of course, the phrase "I couldn't parse your input" looks friendlier than the frightening
"Runtime error 106", but it doesn't solve the problem completely. We don't know
whether the user made a mistake when entering the first
number or the second one, which character caused the error, in which input position it
happened - in fact, we don't know anything at all, except that the user entered something
indecipherable instead of a number. This deprives us of the possibility to give the user a
diagnostic message, the informational value of which would be at least a little higher than the
sacramental "user, you are wrong".
For the trivial case of entering two integers this is tolerable, but when parsing more
complex text, especially files containing text in some formal language, such meager
diagnostics are useless: the program must explain to the user in detail what exactly the error
is and where it was made, otherwise working with such a program will be absolutely impossible.
The only option when we retain full control over what is happening and can make our
program's reaction to errors as flexible as we want is to refuse the services of the read
operator to turn the digits entered by the user (i.e. the textual representation of a number) into
a number and do it all by ourselves, reading the user input character by character.
Reading an integer, if we decided to implement it ourselves, should be placed in a
procedure for convenience. In the simplest case we can entrust this procedure with issuing
the error message itself, although this will somewhat narrow its applicability: we are
unlikely to be able to reuse the same procedure in our other programs, because there this
particular message may not fit the general style, or it may need to be issued not by printing
to the screen but in some cleverer way, for example, with a dialog box; but for the sake of
simplicity we will do it this way. To make it more universal, let our procedure work with
numbers of the longint type; we will call it ReadLongint.
Before we start writing the procedure, note that it may succeed, in which case it must
somehow inform the caller of the number read; but the user may enter something that cannot
be interpreted as a number, in which case the procedure will have nothing to tell us about the
number read, but will still have to inform the caller that it failed. The procedure will send the
results of its work "outward" through the parameter variables , of which there will be two:
157
of type boolean to notify about success/failure and of type longint to pass the read
number. If the number is correct, the procedure will put the value true into the first
variable and the read number into the second variable; if the user made a mistake, the
procedure will print a message about it and put false into the first variable, while the
second variable will not be touched at all, because there is nothing to put into it - the number
has not been read. The caller can call the procedure again if he wants to.
We will use two considerations when forming a number from the resulting characters.
First, as we have seen, the codes of characters-digits in the ASCII table are consecutive,
starting from 48 (code zero) and ending with the number 57 (code nine); this allows us to
obtain the numerical value of the character-digit by subtracting the code zero from the code
of the character in question. Thus, ord('5') - ord('0') equals 5 (53 - 48), ord('8')
- ord('0') in the same exact way equals 8, and so on.
Secondly, you can compose the numerical value of the decimal notation of a number from
the individual values of its digits, looking through these digits from left to right, by following
a fairly simple algorithm. To begin with, we need to create a variable in which the desired
number will be formed, and put zero there. Then, after reading the next digit, we increase the
already accumulated number tenfold, and to what we get, add the numerical value of the
freshly read digit. For example, when reading the number 257 we have zero in the variable
before reading starts; after reading the digit "2" the new value is 0 · 10 + 2 = 2, after
reading the digit "5" we get 2 · 10 + 5 = 25, and after reading the last digit we get
25 · 10 + 7 = 257, as required.
It remains to note that for character-by-character reading we can use the familiar
read operator, specifying a variable of type char as its parameter. The final text
will look like this:
{ char2num.pas }
procedure ReadLongint(var success: boolean; var result: longint);
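The body of the procedure is missing from this copy; the following is a sketch reconstructed from the surrounding description (details such as the exact whitespace handling and the wording of the error message are assumptions, and a leading minus sign is not handled):

```pascal
{ A sketch of the missing body of ReadLongint; the original char2num.pas
  listing is not reproduced here, so the details are assumptions based on
  the surrounding description. }
procedure ReadLongint(var success: boolean; var result: longint);
var
    c: char;
    n: longint;
begin
    n := 0;
    read(c);
    while (c = ' ') or (c = #9) or (c = #10) do  { skip leading whitespace }
        read(c);
    if (c < '0') or (c > '9') then
    begin
        writeln('Sorry, this is not a number');
        readln;                 { discard the rest of the erroneous line }
        success := false;
        exit
    end;
    while (c >= '0') and (c <= '9') do
    begin
        n := n * 10 + (ord(c) - ord('0'));  { accumulate the value }
        if eof then
            break;
        read(c)
    end;
    result := n;
    success := true
end;
```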
Note that when an error is detected, we not only report it, but also execute the readln
statement; called without parameters, this statement will remove all characters up to the
nearest line feed from the input stream; in other words, if we detect an error in user input, we
discard the entire line in which the error was detected. Try experimenting with our procedure,
making various errors in various quantities (including several errors in one line), first as we
present it, and then by removing the readln operator; its purpose will probably become
obvious to you.
To demonstrate work with the ReadLongint procedure, let's write a program that
will ask the user for two integers and output their product. Its main part can look like this, for
example:
var
    x, y: longint;
    ok: boolean;
begin
    repeat
        write('Please type the first number: ');
        ReadLongint(ok, x)
    until ok;
    repeat
        write('Please type the second number: ');
        ReadLongint(ok, y)
    until ok;
    writeln(x, ' times ', y, ' is ', x * y)
end.
In fact, in Unix systems the tradition of organizing dialog with the user is somewhat different: it is
considered that the program should not ask the user questions and should not say anything at all, as
long as everything goes as it should; something should be said only in case of errors. Applied to our
program, this means that write statements that issue input prompts should be simply
removed.
If, in addition, we turn our ReadLongint procedure into a function that returns a logical value
(instead of passing it through a parameter variable), we get an obvious source of side effects in the
program (see §2.3.6), but we can rewrite our main part of the program much shorter:
var
    x, y: longint;
begin
    while not ReadLongint(x) do ;
    while not ReadLongint(y) do ;
    writeln(x, ' times ', y, ' is ', x * y)
end.
This program is remarkable in that the bodies of both loops are empty; the semicolon character plays
the role of an empty statement, or rather, it ends the while loop statement, leaving it with an
empty body. As you can guess, this is made possible by a side effect of the ReadLongint
function; this side effect, which consists of input/output operations and putting a value into a
parameter variable, is what makes up the actual loop body.
In general, this is an example of how not to do things. The loop header side-effect technique is
often used in other programming languages, including C, which we will study in the second volume
of this book. Pascal programmers use it less often; side effects are not held in high regard
in Pascal at all. As we study C, we will try to show in which (not so frequent, it must be said) cases side effects
in the loop header are not a trick, but a justifiable programming technique; but until then, we strongly
advise you to continue, as we agreed in §2.3.6, to avoid using side effects.
The conclusion of this paragraph can be expressed in one phrase: character-by-
character input and analysis of textual information is the most universal approach to its
processing.
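The FilterOk program referred to below does not appear in this excerpt; judging by the description (it answers "Ok" to every input line and says "Good bye" at end of file), it might look something like this sketch:

```pascal
program FilterOk;  { a sketch; the original listing is missing here }
var
    c: char;
begin
    while not eof do
    begin
        read(c);
        if c = #10 then       { a complete line has been read }
            writeln('Ok')
    end;
    writeln('Good bye')
end.
```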
You can check the correct operation of this program not only with the help of a file (that
is not very interesting: it will print "Ok" as many times as there are lines in the file,
and you can hardly count them), but also with ordinary keyboard input. We should
remember (see §1.2.9) that Unix terminal drivers are able to simulate the end of a file; if the
terminal is not reconfigured, it does this when the Ctrl-D key combination is pressed. If
we run the FilterOk program without redirecting input and start typing arbitrary strings,
the program will say "Ok" in response to each input string; when we get bored, we can press
Ctrl-D and the program will correctly terminate with a "Good bye". Of course, we
could simply "kill" the program by pressing Ctrl-C instead of Ctrl-D, but then it
would not give us any "Good bye".
Now let's write a more interesting filter that counts the length of each input string and
outputs the result when the string ends. As in the previous case, we don't need to store the
whole string; we will read the text character by character, and to store the current value of the
string length we will create a variable count; when reading any character except the end of
the string, we will increment this variable by one, and when reading the end of the string we
will output the accumulated value and reset the variable to zero to calculate the length of the
next string. It is also important not to forget to zero our variable at the very beginning of the
program, so that the length of the very first line is also calculated correctly. All together it
will look like this:
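The listing itself is missing from this copy; a sketch matching the description above:

```pascal
program LineLength;  { a sketch reconstructed from the description }
var
    c: char;
    count: integer;
begin
    count := 0;                  { the first line's length starts at zero }
    while not eof do
    begin
        read(c);
        if c = #10 then
        begin
            writeln(count);      { the line has ended: report its length }
            count := 0           { and start counting the next one }
        end
        else
            count := count + 1
    end
end.
```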
Let's consider a more complex example. Suppose we need a filter program that selects lines
starting with non-space characters from the input text and prints them only, and ignores lines
starting with a space or tab, as well as empty lines. It may feel like reading the whole line is
necessary here, but this time it is not. In fact, we only need to remember whether we are
printing the current line or not; there are times when we don't know whether to print the
current line yet - this happens if we haven't read a single character of the line yet. To store
both conditions, we will use two boolean variables: one, which we'll call know158, will
remember whether we know whether or not to print the current line, and the second,
print, will only be used when know is "true", in which case it will indicate whether
or not we are printing
the current line.
After reading a character, we first check whether it is a line feed character. If so, we
first check whether the current line is being printed, and if so, we print the line feed
character; we then put the value "false" into the know variable to indicate that we are
starting the next line, which we don't yet know whether to print.
If the character read is not a line feed character, we have two options. If we don't know
yet whether the current line is printable, it's time to find out: depending on whether the space
character (or tab) is read or not, we put the value "false" or "true" into the variable print,
and then we put "true" into the variable know, because now we know for sure whether
the current line is printable or not.
Next, whatever the read character is, we know whether to print it or not: indeed, even if
we didn't know it just now, we already know it. If necessary, we print the character, and that's
where the body of the loop ends.
At the beginning of the program, we must remember to state that we don't yet know
whether the first line will be printed; this is done by putting "false" into the know
variable. To be on the safe side, we also put "false" into the print variable, otherwise
the compiler would warn that this variable may be used uninitialized (this is not actually
the case here, but the situation is too complicated for the compiler to see that). The full
text of the program will look like this:
program SkipIndented; { skip_indented.pas }
var
    c: char;
    know, print: boolean;
begin
    know := false;
    print := false;
    while not eof do
    begin
        read(c);
        if c = #10 then
        begin
            if know and print then
                writeln;
            know := false
        end
        else
        begin
            if not know then
            begin
                print := (c <> ' ') and (c <> #9);
                know := true
            end;
            { by this point always know = true }
            if print then
                write(c)
        end
    end
end.
It will be interesting to test this program by giving it its own source code as input:
avst@host:~/work$ ./skip_indented < skip_indented.pas
program SkipIndented; { skip_indented.pas }
var
begin
end.
avst@host:~/work$
158 Here it would be more correct to use a variable of an enumerated type with three values ("yes",
"no", "don't know"), but we have not covered enumerated types yet.
Indeed, in our program only these four lines start at the leftmost position, while the
rest are shifted to the right (structural indentation is used) and consequently begin
with spaces.
When analyzing the text of our program, you may get the misleading impression that if
you enter the text not from a file but from the keyboard, the characters output by the program
will be mixed with the characters entered by the user. In fact, it is not so: the terminal gives
our program the text entered by the user not one character at a time, but in whole lines, that
is, by the time our program reads the first character of the line, the user has already entered
the whole line, so that the characters will appear on the screen in the next line. You can check
this for yourself.
Of course, filter programs are much more complex, and it is often necessary to store in
memory not only entire strings, but also the entire input text (this is how the sort program,
for example, is forced to do). Later, when we get acquainted with dynamic memory
management, we will learn to write such programs ourselves.
program SimpleSum;
var
sum, count, n: longint;
begin
sum := 0;
count := 0;
while not eof do { bad idea! }
begin
read(n);
sum := sum + n;
count := count + 1
end;
writeln(count, ' ', sum)
end.
- and are surprised to find that it works "somehow wrong": one has to press Ctrl-D
several times for the program to finally calm down, and the count of numbers reported by
the program is greater than the number of numbers actually entered.
Figuring out what's wrong here can be tricky, but we'll give it a try; to do so, we'll
need to understand in detail what read actually does when we ask it to read a number,
but that shouldn't be a problem since we already did it ourselves in §2.5.2. So, the first
thing read does is skip whitespace characters, that is, it reads characters from the
input stream one at a time until it finds the first digit of the number.
Having found this digit, read begins to read the digits that make up the
representation of the number, one by one, until it again comes across the space
character, and in the process of this reading accumulates the value of the number in the
same way as we did, by a sequence of multiplications by ten and adding the value of
the next digit.
Note that the "end of file" situation does not have to occur immediately after
reading the last number. In fact, it almost never does; remember that text data is a
sequence of lines, and at the end of each line is a line feed character. If the data input
to our program is a valid text, then after the very last number in this text there should
be a line feed (this is if the user has not left a dozen or two insignificant spaces after
the number, and no one forbids him to do so). It turns out that at the moment when
read finishes reading the last number from the input stream, the numbers have
already run out, but the characters in the stream have not yet. As a consequence, eof
still thinks that nothing has happened and gives "false"; as a result, our program does
one more read, which, when trying to read the next character, safely stops at the
end of the file. Its behavior in this situation is a bit unexpected - without producing any
errors, it simply pretends to read the number 0 (it is not clear why this is done, but
the reality is like this). Hence the discrepancy between the number of entered numbers
calculated by the program and the number of numbers actually entered by the user
(although the sum is correct, because adding an extra zero does not change it; but this
is just lucky for us - imagine what would happen if we calculated the product instead
of the sum).
And this is not the end of the story. In the case when input is not from a real file
(which may indeed run out) but from the keyboard, where the end-of-file situation has
to be simulated by pressing Ctrl-D, it is quite possible to continue reading
after the end-of-file situation occurs; in our case this means that read,
having hit the end-of-file situation, has consumed it, so eof will not see
it. That's why you have to press Ctrl-D twice for the program
to terminate. When working with real files, this effect is not observed, because once a
§ 2.5. Symbols and their codes; text data 332
file has ended, the associated input stream remains in the end-of-file situation
"forever".
Anyway, we have a problem, and we need a means to solve it; Free Pascal provides
us with such a means: the SeekEof function. Unlike the usual eof, this function
first reads and discards all whitespace characters; if it finally "hits" the end of the
file, it returns "true", and if it finds a non-whitespace character, it returns "false". At
the same time, the found non-whitespace character is "returned" to the input stream, so that
the next read will start from it.
The above "incorrect" program turns into a correct one with a single correction -
we need to replace eof in the loop header with SeekEof and everything will
work exactly as we want it to.
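Applied to the SimpleSum program above, the correction amounts to this:

```pascal
program SimpleSum;  { the corrected version: SeekEof instead of eof }
var
    sum, count, n: longint;
begin
    sum := 0;
    count := 0;
    while not SeekEof do    { skips trailing whitespace before testing }
    begin
        read(n);
        sum := sum + n;
        count := count + 1
    end;
    writeln(count, ' ', sum)
end.
```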
Free Pascal's implementation of SeekEof until recently contained a completely
ridiculous bug that made it impossible to work with any input streams other than real disk files
and terminal (keyboard) input. In particular, programs using SeekEof could not work as
part of pipelines. The author of the book was informed about this by readers soon after the
first volume of the first edition was published.
Interestingly, the bug was introduced into the code around 2001; apparently, someone
tried to report it to the Free Pascal developers several times, but instead of fixing the bug,
they preferred to write in the documentation that this function is not designed to work with
streams other than disk files; in other words, they declared the bug a feature, in the best
traditions. The creators of Free Pascal were silent about why such a function is needed at all
and how it is suggested to write programs that read numbers from text streams.
In September 2020, while preparing the book for its second edition, yours truly decided
to try to convince this company that SeekEof should be fixed. During the discussion, the
author got the clear impression that there was not a single person on the Free Pascal
development team who understood why the SeekEof function was needed, and in general,
what text input streams are (especially in Unix systems) and how they are handled. You can
read more about this story on the author's website, and there are links to forum discussions
there as well. To convince one particular hard-headed character that the world really works
this way, the author had to use heavy artillery: dig up the original Borland Pascal 7.0 box with
all the books in it and quote, as they say, the original source - the original description of
SeekEof, which, among other things, said what this function was for. Not only that, but I had
to extract from the archives of 25 years ago the distribution kit of the same BP 7.0, find the
RTL sources, find the file containing the implementation of four functions at once - Eof,
SeekEof, Eoln and SeekEoln - and clearly demonstrate to these strange people that
the original version of SeekEof never did all those idiotic things that the code of its
FreePascal implementation contained for some reason. Apparently, those who tried to report
this bug earlier didn't have a box of Borland Pascal in their closet, and the Free Pascal
maintainers wouldn't settle for anything less.
The most interesting thing is that the same character, with whom your humble servant
spent dozens of hours in fruitless discussion, reluctantly agreed to fix the SeekEof function
itself, but "at the same time" completely broke some other function in the code. He refused to
discuss his idiotic actions, and there were no other Free Pascal maintainers who had access
to the code and were willing to cooperate.
Anyway, as the second edition of the book goes to press in early 2021, the last official
release of Free Pascal is still 3.2.0, in which SeekEof still contains someone else's
nineteen-year-old idiocy. To get a fixed version of SeekEof, you need to download an archive of the
latest version under development from freepascal.org and work with it; no one can
say when the next official release will be, of course, but there is at least some hope that
SeekEof will work correctly in the next version.
This whole story, unfortunately, shows that it would be extremely imprudent to use Free
Pascal as a professional tool. The author would not have had anything to do with this project
if it weren't for the fact that there are no other live implementations of Pascal in the world right
now.
Just in case, let's also mention the SeekEoln function, which returns "true"
when the end-of-line character is reached. Like SeekEof, it reads and discards
whitespace characters. This function may be needed, for example, if the input data
format involves groups of numbers, varying in count, placed on separate lines.
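To make the intended use concrete, here is a small sketch (the program and variable names are ours) that sums the numbers on each input line separately, however many numbers the line contains:

    program LineSums;
    var
        x, sum: integer;
    begin
        while not SeekEof do
        begin
            sum := 0;
            while not SeekEoln do
            begin
                read(x);          { read the next number on this line }
                sum := sum + x
            end;
            readln;               { consume the end-of-line marker }
            writeln(sum)
        end
    end.

SeekEof skips blanks and line breaks while looking for the next number, whereas SeekEoln skips blanks only within the current line; this difference is exactly what makes the nested loops work.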
2.6. Pascal's type system
2.6.1. Built-in types and custom types
The variables and expressions we have used so far belong to different types
(integer and longint, real, boolean and some others), but all these types
have one thing in common: they are built into the Pascal language, that is, we don't
need to explain to the compiler what these types are, the compiler already knows about
them.
Pascal does not stop at built-in types: it allows us to create new types of
expressions and variables ourselves. In most cases new types are given names, that is,
identifiers, just like variables, constants and subroutines; in some cases anonymous
types are also used, i.e., as the name implies, types that are not named, but their use
in Pascal is somewhat limited. Any types introduced by the programmer (i.e., not
built-in) are called user-defined types because they are introduced by the user of the
compiler; clearly, this user is the programmer, who of course should not be confused
with the user of the resulting program.
To describe user-defined types and their names, type description sections
starting with the keyword type are used, just as variable description sections start
with the word var. As in the case with variables, the type description section can be
located in the global area or locally in a subprogram (between the subprogram header
and its body). Like variables, types described in a subprogram are visible only in that
subprogram, while types described outside subprograms (i.e. in the global area) are
visible in the program from the place where the type is described until the end of the
program.
The simplest variant of a new type is a synonym of some type we already have,
including a built-in type; for example, we can describe the type MyNumber as a
synonym of the type real:
type
MyNumber = real;
Variables of type MyNumber can now be declared, and you can do everything with them
that you can do with variables of type real; moreover, you can mix them in expressions
with variables and other values of type real, assign them to each other, and so on.
At first glance, the introduction of
such a synonym may seem pointless, but sometimes it turns out to be useful. For
example, while writing some relatively complex program, we may be unsure what bit
width of integers will be enough for a particular subtask. If, say, we decide that
ordinary two-byte integers are enough for us, and then (during testing or even
operation of the program) it turns out that their bit width is insufficient and the
program works with errors due to overflows, then to replace the integers with
longints or even int64s we will have to look through the program text carefully
to identify the variables that should work with numbers of higher bit width and change
their types; in doing so, we risk missing something or, on the contrary, turning into
a four-byte variable one for which two bytes are enough.
Some programmers cope with this problem, as they say, on the cheap: they simply
use longint always and for everything. But on closer examination this approach
turns out to be not too successful: firstly, longint's bitness may be
insufficient too and int64 will be needed, and secondly, sometimes thoughtless
use of variables with a bitness higher than required leads to a noticeable memory
overrun (for example, when working with large arrays) and slowing down of the
program (if you use int64 instead of integer in a 32-bit system, the program's
speed may drop several times).
The introduction of synonym types allows us to deal with this problem more
elegantly. Suppose we are writing a traffic simulator in which each object participating
in our simulation has its own unique number, and we also need loops that run through
all such numbers; at the same time, we don't know for sure whether 32767 objects will
be enough to achieve the required simulation goals or not. Instead of guessing which
type to use - integer or longint, we can introduce our own type, or rather,
type name:
type
SimObjectId = integer;
Now we will use the SimObjectId type name (instead of integer) wherever
we need to store and process the simulation object identifier, and if we suddenly realize
that the number of objects in the model is dangerously close to 32000, we can replace
integer with longint in one place in the program - in the description of the
SimObjectId type - and the rest will happen automatically. By the way, if we
never need a negative value of the object identifier during the program, we can replace
integer with unsigned word, which is more suitable here.
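As a sketch (the loop bounds here are made up), the synonym type is then used in every declaration that holds an object identifier, so that one edit to the type section changes them all at once:

    program SimDemo;
    type
        SimObjectId = integer;   { change this one line to longint if needed }
    var
        id: SimObjectId;
    begin
        for id := 1 to 10 do     { process objects 1 through 10 }
            write(id, ' ');
        writeln
    end.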
Another simple kind of user-defined type is the range type, which restricts a
variable to a contiguous subrange of values of an existing type. For example:

type
    digit10 = 0..9;
var
    d: digit10;
The variable d described in this way can take only ten values: integers from zero to
nine. Otherwise, you can work with this variable in the same way as with other integer
variables.
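A small sketch of the consequences: with range checking enabled (the {$R+} directive in Free Pascal), an attempt to store a value outside the range should abort the program with a run-time error:

    program RangeDemo;
    {$R+}   { enable range checking }
    type
        digit10 = 0..9;
    var
        d: digit10;
    begin
        d := 7;        { fine: within 0..9 }
        writeln(d);
        d := d + 5     { 12 is outside 0..9: range check error at run time }
    end.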
It is worth noting that (at least for Free Pascal) the machine representation of a number
in such a variable is the same as the machine representation of an ordinary integer, so the
size of a variable of a range type equals the size of the smallest variable of a built-in
integer type that can accept all values of the given range. Thus, a variable of type
digit10 will occupy one byte; a variable of the range type 15000..15010 will occupy
two bytes, oddly enough, because the smallest type that can accept all values from this
range is the integer type. Similarly, a variable of the range type 60000..60010 will
also occupy two bytes, since a variable of type word can take all these values, and so on. At
first glance, this seems a bit odd, since both of these ranges involve only 11 different values
each; clearly, one byte would be sufficient. But the point is that by using a single byte in these
situations, the compiler would be forced to represent the numbers in the ranges differently
than regular integers, and for every arithmetic operation on the numbers in the ranges, it
would have to insert an additional addition or subtraction into the machine code to bring the
machine representation to the form handled by the CPU. There is nothing impossible in this,
but in modern conditions program performance is almost always more important than the
amount of memory used.
Range types are not limited to integers; for example, we can specify a subset of
characters:
type
LatinCaps = 'A'..'Z';
We will return to this issue when we study the concept of an ordinal type. For now, let
us note one more very important point: when defining a range, only compile-time
constants can be used to specify its boundaries (see §2.2.15).
Another simple case of a user-defined type is the so-called enumerated type; an
expression of this type can take one of the values listed explicitly in the type
description. These values themselves are specified by identifiers. For example, to
describe the colors of a rainbow, we could specify the type
type
RainbowColors =
(red, orange, yellow, green, blue, indigo,
violet);
var
rc: RainbowColors;
The variable rc can take one of the values listed in the type description; for example,
you could do this:
rc := green;
The values listed in an enumerated type description are numbered in the order of their
appearance, starting from zero, so that the number of the next value exceeds the number
of the previous one by one.
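As a quick illustration (looking ahead to the ord function, which we discuss below), this numbering can be observed directly:

    program RainbowDemo;
    type
        RainbowColors =
            (red, orange, yellow, green, blue, indigo, violet);
    var
        rc: RainbowColors;
    begin
        for rc := red to violet do
            write(ord(rc), ' ');    { prints 0 1 2 3 4 5 6 }
        writeln
    end.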
In classical Pascal, constants specifying an enumerated type are equal only to
themselves and nothing else; the sentence "explicitly set the value of a constant in an
enumeration" is simply meaningless from the point of view of classical Pascal. At the same
time, modern dialects of Pascal, including Free Pascal, under the influence of C (in which
similar constants are simply integers), have somewhat changed this and allow explicitly
setting numbers for constants in enumerations, giving rise to a strange (if not outright
nonsensical) type that is considered an ordinal type but does not allow succ and pred
to be used, and in fact gives nothing useful, because (unlike C, again) Pascal has tools
specifically designed to describe compile-time constants (see page 279), and not only
integer ones. We will not consider such "explicit number values", just as we will not
consider many other facilities available in Free Pascal.
Note that a constant specifying the value of an enumerated type can be used only
in one such type, otherwise the compiler would not be able to determine the type of an
expression consisting of this constant alone. Thus, describing the RainbowColors
type in a program, and then forgetting about it and describing (for example, to simulate
traffic signals) something like
type
Signals = (red, yellow, green);
we will get a compilation error, because the identifiers red, yellow and green are
already taken as values of the RainbowColors type. If both types are really needed in
the same program, we have to invent distinct identifiers for their values, for example:

type
RainbowColors = (
RcRed, RcOrange, RcYellow, RcGreen,
RcBlue, RcIndigo, RcViolet
);
Signals = (SigRed, SigYellow, SigGreen);
It is time to introduce the notion of an ordinal type, which we have mentioned before.
A type is called ordinal if:
1. all possible values of this type are assigned integer numbers, and this is done in
some natural way;
2. comparison operations are defined on values of this type, and the element with
the lower number is considered to be the smaller one;
3. for a type it is possible to specify the smallest and the largest value; for all values
except the largest, the next value is defined, and for all values except the
smallest, the previous value is defined; the sequence number of the previous
value is one less, and the next value is one more than the number of the initial
value.
Pascal has a built-in ord function to calculate the ordinal number of an element
(recall that we met this function when we worked with character codes; see §2.5.1),
and pred and succ functions to calculate the previous and next elements,
respectively. All these functions are already familiar to us, but now we can move from
special cases to general ones and explain what these functions actually do and what
their scope of application is.
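A few concrete values, as a sketch:

    program OrdDemo;
    begin
        writeln(ord('A'));               { 65, the code of the letter A }
        writeln(succ('A'));              { B }
        writeln(pred(true));             { FALSE }
        writeln(ord(false), ' ', ord(true))   { 0 1 }
    end.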
The ordinal types are:
• boolean type; its smallest value false has the number 0, and its largest
value true has the number 1;
• char type; its element numbers correspond to character codes, the smallest
element is #0, the largest element is #255;
• integer types, both signed and unsigned (note that the Free Pascal compiler does
not consider the 64-bit integer types, i.e. int64 and qword, to be ordinal; this
restriction stems from an arbitrary decision of the compiler's creators and has no
grounds other than some simplification of the implementation);
• any enumerated type.
At the same time, the real type is not an ordinal type; this is quite understandable,
because floating-point numbers (actually binary fractions) do not allow any "natural"
numbering. The functions ord, pred and succ cannot be applied to values of the
real type, as this will cause a compilation error. In general, no types other
than those listed above, i.e. except boolean, char, integers (up to and including 32
bits), enumerated types and ranges, are ordinal in Pascal.
The concept of an ordinal type is very important in Pascal, because in many
situations any ordinal type is allowed, but no other type is allowed. We have already
seen one such situation: a range can be specified as a subrange of any ordinal type -
and no other. In the future, we will also encounter other situations where ordinal types
must be used.
2.6.4. Arrays
The Pascal language allows you to create complex variables, which differ from
simple variables in that they themselves consist of variables (of course, of a different,
"simpler" type). There are two main types of complex variables: arrays and records;
we will start with arrays.
An array in Pascal is a complex variable consisting of several variables of the
same type, called array elements. Array elements are referred to by so-called indices:
values of some ordinal type, most often ordinary integers, but not necessarily. The
index is written in square brackets after the array name; that is, if, for example,
a is a variable of an array type indexed by integers, then a[3] is the element of
the array a having the number (index) 3. In square brackets you can specify not
only a constant, but also an arbitrary expression of the corresponding type, which
allows you to compute indices during program execution (without this feature, arrays
would make no sense).
Since an array is essentially a variable, even a "complex" one, this variable must
have a type; it can be described and named like other types. For example, if we plan to
use arrays of one hundred numbers of type real in a program, the corresponding
type can be described as follows:
type
real100 = array [1..100] of real;
Now real100 is a type name; a variable of this type will consist of one hundred
variables of type real (array elements), which are provided with their numbers
(indices), and integers from 1 to 100 are used as such indices. By entering a type
name, we can describe variables of this type, for example:
var
a, b: real100;
Described in this way a and b are arrays; they consist of the elements a[1], a[2],
..., a[100], b[1], b[2], ..., b[100]. For example, if for some reason we
need to study the behavior of sine in the vicinity of zero, we can start by forming the
sequence of numbers 1, 1/2, 1/4, ..., 1/2^99 in the elements of array a:
a[1] := 1;
for i := 2 to 100 do
a[i] := a[i-1] / 2;
and then enter the corresponding sine values into the elements of the array b:

for i := 1 to 100 do
    b[i] := sin(a[i]);

after which both arrays can be printed side by side:

for i := 1 to 100 do
    writeln(a[i], ' ', b[i]);
Note that here we perform all operations on array elements, not on arrays themselves;
but an array can be treated as a whole if necessary, because it is a variable. In particular,
arrays can be assigned to each other:
a := b;
But you should think carefully first, because this assignment copies all the information
from the memory area occupied by one array to the memory area occupied by another
array, and this can be relatively time-consuming, especially if you do it often, say, in
a loop. Similarly, care should be taken when passing arrays as a parameter to
procedures and functions; we'll come back to this issue. It is important to remember
that only arrays of the same type can be assigned to each other; if two arrays are of
different types, even if described in exactly the same way, they cannot be assigned.
As we have already mentioned, in principle it is possible not to give a name to a
type, but to describe a variable of that type at once; the type itself will be considered
anonymous in this case. This also applies to array types: for example, we could do the
following:
var
a, b: array [1..100] of real;
— and in a simple task where no more arrays of this type are assumed, we will not
notice any difference at all. But if at some point, for example, locally in some
procedure, we need one more such array, and we describe it:
var
c: array [1..100] of real;
— then this array will be incompatible with arrays a and b, i.e. they cannot be
assigned to each other, even though they are described in exactly the same way. The
point is that formally they belong to different types: every time the compiler sees such
a variable description, another type is actually created, but it is not given a name, so in
our example the first such type was created when describing arrays a and b, and
the second one (though exactly the same, but new) - when describing array c.
The indication of index change limits deserves special attention. The syntactic
construction "1..100" definitely reminds us of something: range types were
described in exactly the same way, and this is not accidental. In Pascal, you can use
any ordinal type to index arrays, and when describing an array type, you actually
specify the type to which the index value should belong, which is most often a range,
and anonymous - but not necessarily. Thus, we could describe the real100 type in
a more detailed way:
type
from1to100 = 1..100;
real100 = array [from1to100] of real;
Here we first describe the range type explicitly, giving it the name "from1to100",
and then use this name when describing the array. There are arrays whose indexes are
not ranges, but something else. For example, if we have balls of seven colors of the
rainbow in our task, and we process the colors using the enumerated type
RainbowColors (see page 336), and at some point we need to count how many
balls of each color there are, an array indexed by that enumerated type may be convenient:

type
    BallCounters = array [RainbowColors] of integer;

Here are two more examples of array types with unusual index types:

type
    CharCounters = array [char] of integer;
    PosNegAmount = array [boolean] of real;
The first of these types defines an array of 256 elements (of type integer)
corresponding to all possible values of type char (i.e. all possible characters); such
an array may be needed, for example, when analyzing the frequency of text. The
second type assumes two elements of type real, corresponding to logical values; to
understand where such a strange construction can be useful, imagine a geometric
problem associated with calculating the sums of areas of some figures, and the
summation should be performed separately for the figures corresponding to some
condition and for all others.
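Here is a sketch of that idea with made-up "areas" (the threshold of 10 is arbitrary); note that the boolean value of the condition itself serves as the array index:

    program AreaSums;
    var
        sums: array [boolean] of real;
        area: real;
        i: integer;
    begin
        sums[false] := 0;
        sums[true] := 0;
        for i := 1 to 5 do
        begin
            area := i * i;                        { a made-up "figure area" }
            sums[area > 10] := sums[area > 10] + area
        end;
        writeln('small: ', sums[false]:0:1);      { 1 + 4 + 9 = 14.0 }
        writeln('large: ', sums[true]:0:1)        { 16 + 25 = 41.0 }
    end.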
When describing array variables, you can initialize them, i.e., set the initial values
of their elements. These values are listed in parentheses separated by commas, for
example:
type
arr5 = array[1..5] of integer;
var
a: arr5 = (25, 36, 49, 64, 81);
Let's note one more important point. Each array element is a full-fledged variable.
You can not only assign values to them, but also, for example, pass them to procedures
and functions via variable parameters (if you don't remember what this is, be sure to
read §2.3.4). Looking ahead, we note that, like any variable, array elements have an
address in memory, and this address can be learned and used; we will learn how to do
this in Chapter 2.10.
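A minimal sketch (the names are ours): an array element passed through a var-parameter is changed exactly like a standalone variable would be:

    program ElemDemo;
    type
        real100 = array [1..100] of real;
    var
        a: real100;

    procedure DoubleIt(var x: real);
    begin
        x := x * 2
    end;

    begin
        a[5] := 3.5;
        DoubleIt(a[5]);       { the element itself is changed }
        writeln(a[5]:0:1)     { prints 7.0 }
    end.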
Arrays are indispensable in such problems where it is obvious from the condition
that it is necessary to maintain many values of the same type differing by numbers.
Let's consider the following problem as an example.
A computer science Olympiad was being held in the city N. There were a lot of
participants expected, so it was decided that they would register for the
Olympiad right in their schools. Since there are only 67 schools in the city, and
not more than two or three dozen students from each school can take part in the
Olympiad, the organizers decided to arrange the numbering of the participants'
cards in the following way: the number consists of three or four digits, and the
two lower digits set the number of the participant among the students of the
same school, and the higher one or two digits - the number of the school; for
example, in school No. 5 the future participants of the Olympiad were given
cards with the numbers 501, 502, ..., and in school No. 49 - with numbers
4901, 4902, etc.
Schoolchildren who came to the Olympiad presented their cards to the
organizers, and the organizers entered the following lines into the text file
olymp.txt: first they typed the card number, and then, after a space, the
surname and first name of the participant. Naturally, the participants appeared
at the Olympiad in a completely arbitrary order, so the file could contain, for
example, the following fragment:
On the day of the Olympiad there was a soccer match between popular teams in
the city, as a result of which not all students who registered in their schools
ended up coming to the Olympiad - some of them preferred to go to the stadium.
It is necessary to find out from which schools the largest number of participants
came.
At first glance, this task may seem difficult to solve, because we haven't yet learned
how to work with files and how to process strings in Pascal, but strangely enough, we
don't need to. We can ignore the names of students, because we only need the school
number, which is extracted from the number of the participant's card by integer
division of this number by 100. As for the files, we can compensate for our lack of
knowledge by knowing how to redirect the standard input stream: in the program we
will read the information using the usual means - the readln operator (after reading
a number, this operator will reset all the information to the end of the line, which is
what we need), and we will run the program by redirecting the standard input from the
olymp.txt file. We will read until the file is finished; we already know how to do
this from §2.5.3.
While reading, we will have to count the number of students for each of the 67
schools in the city, i.e. we will need to maintain 67 variables at the same time; arrays
are invented just for such situations. To be on the safe side, we will put the number 67
in a named constant at the beginning of the program. In the same way, we will put the
maximum allowed number of students from one school into a named constant; it
corresponds to the number by which we need to divide the card number to get the
school number (in our case it is 100):
program OlympiadCounter;
const
MaxSchool = 67;
MaxGroup = 100;
In principle, we need exactly one array in our program, so we could leave its type
anonymous, but we will not do that; instead, let us describe the type of our array and give it a name:
type
CountersArray = array [1..MaxSchool] of integer;
The variables we will need are, first, the array itself; second, we will need integer
variables to loop through the array, to read card numbers from the input stream, and to
store the school number. Let's describe these variables:
var
Counters: CountersArray; i, c, n: integer;
Now we can write the main part of the program, which we start by zeroing out all the
elements of the array; indeed, so far we haven't seen a single participant card, so the
number of participants from each school should be zero:
for i := 1 to MaxSchool do
    Counters[i] := 0;
{$I-}
while not eof do
begin
readln(c);
if IOResult <> 0 then
begin
writeln('Incorrect data');
halt(1)
end;
n := c div MaxGroup;
if (n < 1) or (n > MaxSchool) then begin
writeln('Illegal school id: ', n, ' [', c, ']');
halt(1)
end;
Counters[n] := Counters[n] + 1;
end;
The next step of processing the obtained information is to determine what number of
students from the same school is a "record", i.e. simply to determine the maximum
value among the elements of the Counters array. This is done as follows. To begin
with, we will declare school #1 to be the "record" school, no matter how many students
arrive from there (even if none). To do this, we will put its number, i.e. the number 1,
into the variable n; this variable will store the number of the school that is currently
(for now) considered to be a "record" school. Then we will look through the
information for all other schools, and every time the number of students from the next
school exceeds the number of students from the school that has been considered the
"record" school so far, we will assign to variable n the number of the new
"record" school:
n := 1;                       { give the record to the first school }
for i := 2 to MaxSchool do    { go through the rest }
    if Counters[i] > Counters[n] then   { new record? }
        n := i;               { update the number of the "record" school }
By the end of this loop, all counters will have been looked through, so that variable n
will contain the number of one of the schools from which the maximum
number of students came; in general, there may be more than one such school (for
example, the Olympiad may have had 17 participants from schools #3, #29 and #51,
and fewer from all other schools). What remains to be done is what the program is
written for: to print the numbers of all schools from which exactly as many students
came as from the one whose number is in the variable n. This is quite simple: we look
through all schools in order, and if there are as many students from this school as from
the n'th school, we print its number:
for i := 1 to MaxSchool do
if Counters[i] = Counters[n] then writeln(i)
The only thing left to do is to end the program with the word end and a dot. The
whole text of the program is as follows:
program OlympiadCounter;  { olympcount.pas }
const
    MaxSchool = 67;
    MaxGroup = 100;
type
    CountersArray = array [1..MaxSchool] of integer;
var
    Counters: CountersArray;
    i, c, n: integer;
begin
    for i := 1 to MaxSchool do
        Counters[i] := 0;
    {$I-}
    while not eof do
    begin
        readln(c);
        if IOResult <> 0 then
        begin
            writeln('Incorrect data');
            halt(1)
        end;
        n := c div MaxGroup;
        if (n < 1) or (n > MaxSchool) then
        begin
            writeln('Illegal school id: ', n, ' [', c, ']');
            halt(1)
        end;
        Counters[n] := Counters[n] + 1
    end;
    n := 1;
    for i := 2 to MaxSchool do
        if Counters[i] > Counters[n] then
            n := i;
    for i := 1 to MaxSchool do
        if Counters[i] = Counters[n] then
            writeln(i)
end.
2.6.5. Records

Besides arrays, Pascal offers one more kind of complex variable: the record, which,
unlike an array, consists of components (called fields) that may be of different types
and are referred to by names rather than by indices. Suppose, for example, that we
process information about checkpoints of a sports course, storing for each checkpoint
its number, a penalty value, its geographic coordinates and a flag showing whether the
checkpoint is hidden; the corresponding type could be described as follows:

type
    CheckPoint = record
        n, penalty: integer;
        latitude, longitude: real;
        hidden: boolean;
    end;

Here the Checkpoint identifier is the name of the new type, record is the
keyword for the record, and the description of the record fields follows in the same
format as we describe variables in the var sections: first the variable names (in this
case, fields) separated by commas, then a colon, the type name, and a semicolon. A
variable of type Checkpoint would thus be a record with integer fields n and
penalty, latitude and longitude fields of type real, and a
hidden field of logical type. Let's describe such a variable:
var
cp: CheckPoint;
As in the case of arrays, when working with a record, most, if not all, of the actions are
performed on its fields; the only thing that can be done with the record itself as a single
entity is assignment. But the fields themselves (again, like array elements) are full-
fledged variables; you can do everything with them that is usually done with variables
- you can assign values to them, pass them to subroutines via variable parameters, etc.
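A small sketch of working with the fields through dot notation (the coordinate values here are made up):

    program CpDemo;
    type
        CheckPoint = record
            n, penalty: integer;
            latitude, longitude: real;
            hidden: boolean;
        end;
    var
        cp: CheckPoint;
    begin
        cp.n := 1;
        cp.penalty := 0;
        cp.latitude := 55.75;     { made-up coordinates }
        cp.longitude := 37.61;
        cp.hidden := false;
        writeln('cp ', cp.n, ': ', cp.latitude:0:2, ' ', cp.longitude:0:2)
    end.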
Returning to the checkpoints: suppose a course can contain at most 75 of them; let us
put this number into a named constant:

const
    MaxCheckPoint = 75;
and after describing the Checkpoint type (as done in the previous paragraph)
describe a type for an array that can hold information about all checkpoints in the
course and describe a variable of that type:
type
    CheckPointArray = array [1..MaxCheckPoint] of CheckPoint;
var
track: CheckPointArray;
The resulting data structure in the track variable corresponds to our idea of a table.
When we build a table on paper, we usually write down the names of its columns at
the top, and then we have rows containing "records". In the data structure just built,
the role of the table header with column names is played by the field names - n,
latitude, longitude, hidden and penalty. The table rows correspond to
array elements: track[1], track[2], etc., each of which represents a
Checkpoint record, and a separate cell of the table is a field of the corresponding
record: track[7].latitude, track[63].hidden, etc.
An array element can also be another array. This situation is so frequent that it has
its own name - "multidimensional array", and a special syntax to describe it: when
specifying an array type, we can write not one type of index in square brackets, but
several, separated by a comma. For example, the following type descriptions are
equivalent:
type
array1 = array [1..5, 1..7] of integer;
array2 = array [1..5] of array [1..7] of integer;
Similarly, we could describe, say, a type for square matrices of real numbers:

type
    matrix5x5 = array [1..5, 1..5] of real;
Multidimensional arrays can also be fields of records, just as records can be their elements.
There are no formal restrictions on the depth of nesting of type descriptions in Pascal, but you
should not get carried away with it: computer memory is by no means infinite, and in addition,
some specific situations may impose additional restrictions. For example, you should think
ten times before making a huge array of a local variable in a subroutine: local variables are
located in stack memory, which may be much smaller than you expected.
In order for a type to be used when passing a parameter to a subprogram, this type must be
described in the type description section and named:
type
MyRange = 1..100;
The same can be said about returning a value from a function: only types with names are
suitable for this.
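A sketch of the rule (the function and its name are made up): the named type is accepted as a parameter type, while writing the range directly in the header would be rejected by the compiler:

    program ParamDemo;
    type
        MyRange = 1..100;

    function Half(x: MyRange): integer;    { allowed: the parameter type is named }
    begin
        Half := x div 2
    end;

    { function Bad(x: 1..100): integer;  -- would not compile: anonymous type }

    begin
        writeln(Half(80))     { prints 40 }
    end.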
The second point worth mentioning concerns passing into subroutines (and returning
from functions) values of complex types that occupy a large amount of memory. Pascal, in
principle, does not forbid to do this: you can pass a large record or array into a procedure, and
the program will successfully pass compilation and even work somehow. You only need to
remember two things. First, as already mentioned, local variables (including value
parameters) are located in memory in the area of the hardware stack, where there may be little
space. Secondly, copying large amounts of data itself takes non-zero time, which you can feel
very well if you do such things often - for example, calling a subroutine in a loop, each time
passing it a large-sized array as a value parameter.
Therefore, if possible, you should refrain from passing values that occupy a significant
amount of memory into subroutines; if you cannot avoid it, it is better to use var-parameter
even if you do not intend to change anything in the passed variable. The point is that when
you pass any value through a var-parameter, no copying takes place. We have already said
that the local name (the var-parameter's name) becomes, for the duration of the subroutine's
work, a synonym of the variable specified in the call; note that this synonym is realized at the
machine code level by passing an address, and an address does not take much space: 4 bytes on
32-bit systems and 8 bytes on 64-bit systems.
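A sketch (the sizes and names are ours) of the recommended style: the array is passed by address even though the function does not modify it:

    program VarDemo;
    const
        N = 10000;
    type
        BigArray = array [1..N] of real;
    var
        data: BigArray;
        i: integer;

    { a var-parameter passes only the address, not all 80000 bytes of the array }
    function SumAll(var a: BigArray): real;
    var
        i: integer;
        s: real;
    begin
        s := 0;
        for i := 1 to N do
            s := s + a[i];
        SumAll := s
    end;

    begin
        for i := 1 to N do
            data[i] := 1;
        writeln(SumAll(data):0:0)    { prints 10000 }
    end.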
So, if you work with variables that have a significant size, it is best not to pass them
as parameters to subroutines at all; if you do have to do so, you should pass them as var-
parameters; and only in a very extreme case (which usually does not occur) you can try
to pass such a variable by value and hope that everything will be fine.
The question naturally arises as to which variables are considered to have a "significant
size". Of course, a size of 4 or 8 bytes is not "significant". You may not worry too much about
copying 16 bytes or even 32 bytes, but if the size of your type is any larger, passing objects
of this type by value first becomes "undesirable", then, somewhere at the level of 128 bytes,
"highly undesirable", and somewhere at the 512-byte level it becomes plainly unacceptable, even
though the compiler will not object. If you think of passing a data structure that occupies
kilobytes or more by value, at least try not to show your source code to anyone: there is a
great risk that people will not want to do business with you anymore after seeing such a trick.
Suppose we declare a variable of type real and write:

var
   r: real;
begin
   { ... }
   r := 15;
- then, strangely enough, nothing terrible will happen: the variable r will contain the value
15.0, which is absolutely correct and natural for such a situation. In the same way, you can
assign to a variable of type real the value of an arbitrary integer expression, including
those calculated during program execution; the value assigned will be, of course, a floating-
point number, the integer part of which is equal to the value of the expression, and the
fractional part is zero.
Beginners often don't think about what actually happens in this situation; meanwhile, as
we know, the machine representations of integer 15 and floating-point 15.0 are quite
different from each other. A normal assignment, where you put the value of an expression of
exactly the same type into a variable, is reduced (at the level of machine code) to simply
copying information from one place to another; assigning an integer to a floating-point
variable cannot be realized in this way, you have to convert one representation into another.
This is how we come to the notion of type conversion.
The case of "magical conversion" of an integer into a fractional one refers to the so-called
implicit type conversions; this is what is said when the compiler converts the type of an
expression by itself, without direct instructions from the programmer. The possibilities of
implicit conversions in Pascal are rather modest: you are allowed to implicitly convert
numbers of different integer types into each other and into floating-point numbers, plus you
can convert floating-point numbers themselves into each other, of which there are also several
types - single, double and extended, and the familiar real is a synonym of
one of the three, in our case double. A little later we will meet the implicit conversion of a character into a string, but that is all: the compiler won't convert anything else on its own initiative. Implicit conversions occur not only in assignments, but also in subroutine
calls: if a procedure expects a parameter of type real, we can easily specify an integer
expression when calling it, and the compiler will understand us.
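For instance, a sketch of such a call (the procedure here is made up for illustration):

```pascal
program implconv;

procedure ShowReal(x: real);
begin
   writeln(x:0:1)
end;

var
   n: integer;
begin
   n := 15;
   ShowReal(n);     { an integer expression: the compiler converts it to real }
   ShowReal(n + 1)  { works for computed expressions as well }
end.
```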
Note that the compiler agrees to "magically transform" an integer number into a floating-
point number, but it refuses to do the conversion in the opposite direction: if we try to assign
a value to an integer variable that is of floating-point type, an error will be generated during
compilation. The point here is that there are different ways to convert a fractional number
into an integer one, and the compiler refuses to choose such a way for us; we have to specify
what specific way the number should be rid of the fractional part. There are actually two
options: conversion by discarding the fractional part and by mathematical rounding; the first
option uses the built-in trunc function, the second option uses round. So, if we have
three variables:

var
   i, j: integer;
   r: real;

then after executing, say, the statements r := 15.7; i := trunc(r); j := round(r), variable i will contain the number 15 (the result of "stupid" discarding of the fractional part), and variable j the number 16 (the result of rounding to the nearest integer). In fact, the
functions round and trunc do not perform type conversion, but some kind of
calculation, because in the general case (when the fractional part is non-zero) the obtained
value differs from the initial one. These functions are only indirectly related to the
conversation about type conversions: it would not be quite fair to say that the corresponding
implicit conversion is forbidden and not explain what to do if it is required.
Let us note one important point. Implicit transformations can be applied to expression
values, but not to variables; this means that when passing a variable to a subprogram via
var-parameter, the type of the variable must exactly match the type of the
parameter, no liberties are allowed here; for example, if your subprogram has a var-
parameter of type integer, you cannot pass it a variable of type longint, or
of type word, or of any other type - only integer. This has a simple explanation: the
machine code of the subroutine is generated for a variable of the integer type and
does not and cannot know anything about what actually happened at the call point, so if
the compiler allowed passing, for example, a variable of the longint type instead of
integer, the subroutine would have no way of knowing that it was given a variable of the
wrong type. Everything is much simpler with value conversions (unlike variables): the
compiler does all conversions at the call point, and the body of the subroutine gets the type it
is waiting for.
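A short sketch of this restriction (the names are illustrative):

```pascal
program varparam;

procedure AddTen(var x: integer);
begin
   x := x + 10
end;

var
   n: integer;
   m: longint;
begin
   n := 5;
   AddTen(n);      { allowed: the types match exactly }
   writeln(n);     { prints 15 }
   m := 5;
   { AddTen(m); }  { if uncommented, this is a compile-time error:
                     a longint variable cannot be passed as an
                     integer var-parameter }
end.
```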
In addition to implicit conversions, Free Pascal¹⁶¹ also supports explicit type conversions, and not only values but also variables can be converted; for example, the following construct is allowed:

¹⁶¹ Following the Turbo Pascal and Delphi compilers, but unlike the classic Pascal variants where there was
var
c: char;
begin
{ ... }
byte(c) := 65;
Of course, not all types can be converted to each other. Free Pascal allows explicit conversions in
two cases: (1) ordinal types can be converted to each other without restriction, and this applies not
only to integers, but also to char, boolean, and enumerated types; and (2) types that have the
same machine representation size can be converted to each other. As for the conversion of variable
types, the first case does not work for them, and size matching is required.
You should be careful with explicit type conversions, because what happens will not always
coincide with your expectations. For example, it is better to get the code of a symbol using the ord
function, and to create a symbol by a given code using chr, which are specially designed for this
purpose. If you are not quite sure that you need explicit type conversion or
know how to do without it, it is better not to use it at all. Note that you can
always do without such conversions, but in some complicated situations the possibility to do the
conversion allows you to save labor costs; the other question is that you will not meet such tricky
cases for a long time.
program HelloString;
var
   hello: array [1..30] of char;
begin
   hello := 'Hello, world!';
   writeln(hello)
end.

This program will successfully pass compilation and even, at first glance, work normally,
printing the sacramental "Hello, world!". But only at first glance; in fact, after the usual
phrase, the program will print 17 more characters with code 0, and only then will print a
line feed and terminate; you can easily verify this in various ways: redirect the output to a file
and look at its size, run the program as a pipeline in conjunction with the familiar wc
program, apply hexdump (you can apply it to the output at once, or to the resulting
file). The character with the code 0, being printed, does not show itself on the screen, even
the cursor does not move anywhere, so these 17 "extra" zeros are not visible - but they are
there and can be detected. What is especially unpleasant is that formally this invisible "null
character" has no right to occur in text data, so the output of our program is no longer, formally
speaking, a correct text.
It is easy to guess where these parasitic "zeros" came from: the phrase "Hello,
world!" consists of 13 characters, and we have declared an array of 30 elements. The
assignment operator, having copied 13 characters from the string literal into the array
elements hello[1], hello[2], ..., hello[13], filled the other elements with zeros for lack
of anything better.
Of course, you can fix the program and make it correct, for example, like this (recall that
the break operator stops loop execution prematurely):
program HelloString;
var
hello: array [1..30] of char;
i: integer;
begin
hello := 'Hello, world!';
for i := 1 to 30 do
begin
if hello[i] = #0 then
break;
write(hello[i])
end;
writeln
end.
but this is very cumbersome - once again we have to process the string one character at a
time!
The situation when we need to process a string without knowing its length in advance is
quite typical. Imagine that you need to ask a user what his name is; one will answer with a
laconic "Vova", and another will say that he is no less than Ostap-Suleiman-Berta-Maria
Bender-Zadunaisky. The question whether this can be predicted at the stage of program
writing should be considered rhetorical. In this connection, it is desirable to have some
flexible tool for working with strings that takes into account this fundamental property of a
string - to have unpredictable length. Besides, the concatenation operation (joining one string
to another) is very often performed on strings, and it is desirable to designate it in such a way
that its invocation would be unburdensome for the programmer. By the way, a special case of
concatenation - adding a single character to a string - is so common that once you get used to
it, you will remember it nostalgically in the future when working in C (where this action
requires much more effort).
Anyway, to solve most of the problems that arise when working with strings, Pascal - or
rather, its later dialects, including Turbo Pascal and, of course, our Free Pascal - provides a
special family of types, which will be the subject of the next paragraph.
Note that as part of a string literal we can represent not only characters that have a printed
representation, but also any other characters through their codes. In the examples we have
already met strings ending with a line feed character, such as 'Hello'#10; "tricky"
characters can be inserted not only at the beginning or end of a string literal, but also at any
place in the string, for example, 'one'#9'two' - here two words are separated by a tab
character. In general, within a string literal, you can arbitrarily alternate sequences of
characters enclosed in apostrophes and characters specified by their codes; for example,
'first'#10#9'second'#10#9#9'third'#10#9#9#9'fourth'.
there is a valid string literal, the result of printing it (due to tabs and line feeds inserted into
it) will look like this:
first
second
third
fourth
Characters with codes from 1 to 26 can also be represented as ^A, ^B, ..., ^Z; this is justified by the fact that, as already mentioned, characters with the corresponding codes are generated by the combinations Ctrl-A, Ctrl-B, etc. when entered from the keyboard. In particular, the literal from our example can be written as follows:
'first'^J^I'second'^J^I^I'third'^J^I^I^I'fourth'
2.6.10. Type string
The string type, introduced in Pascal specifically for working with strings, is
actually a special case of an array of elements of the char type, but it is a rather nontrivial
case. First of all, we should note that when describing a variable of the string type, you
may or may not specify the limit of the string size, but the string will not become "infinite":
its maximum length is limited to 255 characters. For example:
var
s1: string[15];
s2: string;
Variable s1 described in this way can contain a string up to and including 15 characters
long, and variable s2 can contain a string up to 255 characters long. Specifying a number
greater than 255 is not allowed, it will cause an error, and there is a rather simple explanation:
the string type assumes that the string length is stored in a separate byte, and a
byte, as we remember, cannot store a number greater than 255.
A variable of the string type occupies one byte more than the maximum length of
the stored string: for example, our variables s1 and s2 will occupy 16 and 256 bytes,
respectively. A variable of the string type can be handled as a simple array of elements
of the char type, with indexing of elements containing characters of the string starting from
one (for s1 it will be elements s1[1] through s1[15], for s2 - elements s2[1]
through s2[255]), but - somewhat unexpectedly - one more element with index 0 is found
in these arrays. This is the byte used to store the length of the string; since the array elements
must be of the same type, this byte, when accessed, is of the same type as the other elements,
i.e. char. For example, if you perform an assignment
s1 := 'abrakadabra';
then the expression s1[1] will be equal to 'a', the expression s1[5] will be equal
to 'k'; since the word "abrakadabra" has 11 letters, the last meaningful element will
be s1[11], which is also equal to 'a'; the values of elements with larger indices are undefined (they may contain anything at all). Finally, the element
s1[0] will contain the length, but since s1[0] is an expression of char type, it would be wrong to say that it will be equal to 11; in fact, it will be equal to the character with code 11, which is denoted as #11 or ^K, and the length of the string can be obtained
by calculating the expression ord(s1[0]), which will give the desired 11. However,
there is a more common way to find out the string length: use the built-in length
function, which is specially designed for this purpose.
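To illustrate, continuing the example with s1:

```pascal
program strlen;
var
   s1: string[15];
begin
   s1 := 'abrakadabra';
   writeln(ord(s1[0]));   { 11: the length byte, accessed as a char }
   writeln(length(s1));   { 11: the preferred way to get the length }
   writeln(s1[1], s1[5])  { prints: ak }
end.
```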
Regardless of the length limit, all variables and expressions of the string type in Pascal are assignment compatible, i.e. we can make the compiler execute both s2 := s1 and s1 := s2, and in the second case the string may be cut off during assignment; in fact, the possibilities of s2 are somewhat wider: nothing prevents this variable from containing a string, say, 50 characters long, but s1 cannot contain more than 15 characters.
What is especially nice is that when assigning strings, as well as when passing them by
value to subroutines, and when returning from functions, only the significant part of the
variable is copied; for example, if a variable declared as string without a length limit is a
string of three characters, only those three characters (plus the byte containing the length) will
be copied, even though the entire variable occupies 256 bytes.
It is even more interesting that if your subprogram accepts a var-parameter of
type string, then you can submit any variable of type string as this parameter,
including such a variable for which the length is limited in the description. This exception to
the general rule (about identical coincidence of the variable type with the var-parameter
type) looks ugly and unsafe, but turns out to be very convenient in practice.
Variables of the string type can be "stacked" using the " + " character, which
for strings means connecting them to each other. For example, the program
program abrakadabra;
var
   s1, s2: string;
begin
   s1 := 'abra';
   s2 := s1 + 'kadabra';
   writeln(s2)
end.
will print, as you can guess, the word "abrakadabra".
Strings can be empty, i.e., containing no characters. The length of such a string is zero;
the literal denoting an empty string looks like "''" (two apostrophe characters placed next to
each other); this literal should not be confused with "' '", which denotes a space character
(or a string consisting of a single space character).
In almost all cases, expressions of the char type can be implicitly converted to the
string type - a string containing exactly one character. This is especially
convenient in combination with the addition operation. Thus, the program
program a_z; { a_z.pas }
var
   s: string;
   c: char;
begin
   s := '';
   for c := 'A' to 'Z' do
      s := s + c;
   writeln(s)
end.

will print all the capital Latin letters from 'A' to 'Z' on one line. The current length of a string can also be changed directly using the built-in SetLength procedure: if, for example, the variable s contained the string "abrakadabra", then after the call SetLength(s, 4) the variable s will contain the string "abra". Note that SetLength can also increase the length
of the string, which will result in "garbage" at the end of the string - unintelligible characters that were
in that memory location before the string was placed there; therefore, if you decide to increase the
length of the string using SetLength, it is best to immediately fill all of its "new" elements with
something meaningful. Note that only the current length of the string changes, but in no way the size
of the memory area allocated for this string. In particular, if you describe a string variable with ten
characters
s10: string[10];
356 The Pascal language and the beginnings of programming
and then try to set its length to more than 10, you will not succeed: the s10 variable cannot
contain a string longer than ten characters.
All of the procedures and functions listed below until the end of the
paragraph perform actions that you must be able to do yourself; we strongly
discourage the use of these tools until you are fluent with strings at the
character-by-character level. You should not start using each of the
functions and procedures listed below until you are satisfied that you can
do the same thing "manually".
The built-in functions LowerCase and UpCase take an expression of type string and return the same string, except that the Latin letters are converted to lower or upper case, respectively (in other words, the first function replaces uppercase letters in the string with lowercase ones, and the second, on the contrary, replaces lowercase letters with uppercase ones).
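For example (assuming the Free Pascal variants of these functions that accept a whole string, as described above):

```pascal
program cases;
begin
   writeln(LowerCase('Hello, World!'));  { hello, world! }
   writeln(UpCase('Hello, World!'))      { HELLO, WORLD! }
end.
```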
The copy function takes three parameters as input: string, start position and number of
characters, and returns a substring of the given string starting from the given start position, with the
length as the given number of characters, or less if there were not enough characters in the original
string. For example, copy('abrakadabra', 3, 4) will return the string 'raka' and
copy('foobar', 4, 5) will return the string 'bar'.
The delete procedure also takes as input a string (this time via a parameter variable,
because the string will be changed), the starting position and the number of characters, and deletes
the given number of characters from this string (right on the spot, i.e. in the variable you passed as
a parameter), starting from the given position (or to the end of the string, if there are not enough
characters). For example, if the variable s contained the same "abrakadabra", then after
performing delete(s, 5, 4) the variable s will contain "abrabra"; but if we had applied
delete(s, 5, 100), we would have gotten just "abra".
The built-in insert procedure inserts one string into another. The first parameter specifies the
string to be inserted; the second parameter specifies the string type variable into which the specified
string is to be inserted. Finally, the third parameter (an integer) specifies the position from which the
insertion should be performed. For example, if the variable s contains the string "abcdef"
and then insert('PQR', s, 4), the variable s will contain the string "abcPQRdef".
The pos function accepts two strings as input: the first one specifies the substring to search
for, the second one specifies the string to search in. It returns an integer equal to the position of the
substring in the string if it is found, or 0 if it is not found. For example, pos('kada',
'abrakadabra') will return 5, and pos('aaa', 'abrakadabra') will return 0.
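The four routines just described can be exercised together in one short sketch:

```pascal
program strops;
var
   s: string;
begin
   s := 'abrakadabra';
   writeln(copy(s, 3, 4));              { raka }
   delete(s, 5, 4);                     { remove "kada" in place }
   writeln(s);                          { abrabra }
   insert('PQR', s, 4);                 { insert starting at position 4 }
   writeln(s);                          { abrPQRabra }
   writeln(pos('kada', 'abrakadabra')); { 5 }
   writeln(pos('aaa', 'abrakadabra'))   { 0 }
end.
```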
The val procedure, which constructs a number of type longint, integer or byte
from its string representation, can be very useful. The first parameter to the procedure is a string that
must contain the text representation of the number (possibly with some spaces before it); the second
parameter must be a variable of type longint, integer, or byte; the third parameter is another
variable, always having
word type. If everything was successful, the procedure will put the obtained number into the second
parameter, and the number 0 into the third parameter; if there was an error (i.e. the string did
not contain a correct representation of the number), the number of the position in the string where
the conversion to a number failed is put into the third parameter, and the second parameter remains
undefined in this case.
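A sketch of a typical use of val, for both the successful and the failing case:

```pascal
program valdemo;
var
   n: longint;
   err: word;
begin
   val('123', n, err);
   if err = 0 then
      writeln('got the number ', n)     { this branch runs }
   else
      writeln('failed at position ', err);
   val('12x4', n, err);
   writeln(err)                         { 3: conversion broke at the "x" }
end.
```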
Pascal also provides a means of converting a number back to a string representation. If decimal
representation is required, you can use the str pseudo-procedure; here you can use width specifiers in the same way as when printing with the write statement (see page 246). For example, str(12.5:9:3, s) will put the string "   12.500" (with three spaces at the beginning to make it exactly nine characters long) into the variable s.
There are also built-in tools for translating to binary, octal and hexadecimal, but this time they
are almost ordinary functions called BinStr, OctStr and HexStr. Unlike str, they work only
with integers and are functions, i.e. they return the received string as their value, not via a parameter.
For some reason, all three take two parameters: the first is an integer of arbitrary type, the second is the number of characters in the resulting string.
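For example (all three functions pad the result with zeros up to the requested number of characters):

```pascal
program bases;
begin
   writeln(BinStr(10, 8));    { 00001010 }
   writeln(OctStr(64, 4));    { 0100 }
   writeln(HexStr(255, 4))    { 00FF }
end.
```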
Here the command line consists of four words: the program name itself, the word "abra",
the word "schwabra", and the word "kadabra".
In a Pascal program these parameters are available using the built-in functions
ParamCount and ParamStr; the first function, without receiving any parameters,
returns an integer corresponding to the number of parameters without taking into account the
program name (in our example it will be number 3); the second function takes an integer as
input and returns a string corresponding to the command line parameter with a given
number. In this case, the program name is considered to be parameter number 0 (i.e. it can
be found out using the ParamStr(0) expression), and the others are numbered from one
to the number returned by the ParamCount function.
Let's write an example program that prints all elements of its command line, no matter
how many there are:
program cmdline;
var
   i: integer;
§ 2.7. Selection operator 358
begin
for i := 0 to ParamCount do
   writeln('[', i, ']: ', ParamStr(i))
end.
Once compiled, we can try this program in action, for example:
avst@host:~/work$ ./cmdline abra schwabra kadabra
[0]: /home/avst/work/cmdline
[1]: abra
[2]: schwabra
[3]: kadabra
avst@host:~/work$ ./cmdline
[0]: /home/avst/work/cmdline
avst@host:~/work$ ./cmdline "one two three"
[0]: /home/avst/work/cmdline
[1]: one two three
avst@host:~/work$
Note that a three-word phrase enclosed in double quotes was treated as a single parameter.
This has nothing to do with the Pascal language; it is a property of the command interpreter,
which we discussed in detail in §1.2.6.
case nv of
   1: begin
      {...}
   end;
   2: begin
      {...}
   end;
   3: begin
      {...}
   end;
   {...}
   14: begin
      {...}
   end
end
The determining factor here is that the case-expression has an ordinary integer type, and the
variants are designated by numbers 1, 2, 3 and so on, sometimes up to quite large numbers
(the author of these lines has seen such constructions with 30 or more variants). So, they fire
you from your job for such programming, and rightly so. Indeed, how is such code supposed to be read? What, for example, does the value 3 mean in this case? You can find the answer only by digging through the program up and down, finding out where the value of the variable nv comes from and in which cases it takes which values; this costs the reader of the program a lot of time. But that's not the worst of it. The author of the program can confuse values of this kind himself, returning, say, the number 7 instead of the number 5 from some function by mistake. In general, a program whose internal logic is based on "variant numbers" gets out of hand surprisingly quickly.
The right thing to do in such cases is to use not the numbers of these variants, which tell nobody anything, but the values of an enumerated type specially introduced for the purpose (see §2.6.2), choosing meaningful identifiers.
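A hedged sketch of what this looks like; the enumerated type and its values are invented purely for illustration:

```pascal
program menudemo;
type
   MenuAction = (ActOpen, ActSave, ActQuit);
var
   action: MenuAction;
begin
   action := ActSave;
   case action of
      ActOpen: writeln('opening');
      ActSave: writeln('saving');    { this branch runs }
      ActQuit: writeln('quitting')
   end
end.
```

Unlike bare numbers, the identifiers tell the reader immediately what each alternative means.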
It is necessary to note the influence of the selection operator on the size of subroutines.
As we have already mentioned, an ideal subroutine is no more than 25 lines long; if you
encounter a selection operator in your subroutine, in most cases you will not fit into this limit.
In principle, this is the case when it is quite acceptable to exceed the specified size, but not by
much. Nesting selection operators inside one another is almost always unacceptable. If the implementation of all or some of the alternatives in your selection operator is that complicated, you should move each alternative into a separate subroutine, and the selection operator itself will then consist of their calls.
had nothing in common with terminals in Unix, the screen was controlled through BIOS
interface or even by direct access to video memory. Now all this is of archaeological interest,
except that the creators of Free Pascal set as one of their goals to achieve full compatibility
with Turbo Pascal, including the crt module; as a result, the version of Free Pascal that you
and I use on Unix systems contains its own implementation of the crt module. This implementation supports all the functions that were present in the MS-DOS versions of Turbo Pascal, but
implements them, of course, by means of escape sequences and terminal driver
reprogramming.
It should be noted that the interface of the crt module is much simpler than that of the
same ncurses library and is much better suited for beginners. It is this module that we
will use.
To make the module's features available in the program, we need to tell the compiler that
we are going to use it. This is usually done immediately after the program header, before all
other sections, including before the constants section (though not necessarily), for example:
program tetris;
uses crt;
Before we start discussing the module's features, we should make one caveat. As soon as a
program written using the crt module is run, it will immediately reprogram the terminal to
suit its needs; among other things, this means that the life-saving Ctrl-C combination will
no longer work, so if your program hangs or you simply forget to provide a correct way to tell
it that it is time to terminate, you will have to remember how to kill processes from a nearby
terminal window. Perhaps you should reread §1.2.9.
Note that for programs written using the crt module, I/O redirections make no sense at
all; the whole thing just won't work.
¹⁶² From the words Cathode Ray Tube, i.e. the picture tube, aka "kinescope". The liquid-crystal "flat" monitors, which have now completely replaced kinescope monitors, did not exist at that time.
§ 2.8. Full screen programs 363
and ScreenHeight. These variables are also entered by the crt module; when the
program starts, the module writes into them the actual number of available characters in a line
and the lines themselves.
With GotoXY at our disposal, we can already do something interesting. Let's write
a program that displays our traditional phrase "Hello, world!", but it does it not within
our dialog with the command interpreter, as before, but in the center of the screen, cleared of
all extraneous text. Having printed the inscription, let's move the cursor back to the upper left
corner so that it doesn't spoil the picture, wait five seconds (this can be done using the delay
procedure, whose argument is an integer expressed in thousandths of a second; it is also
provided by the crt module), clear the screen again and finish the work. The duration of
the delay, as well as the text of the message to be output, will be placed at the beginning of
the program as named constants.
It remains to calculate the coordinates for printing the message. The vertical coordinate
is simply half the screen height (ScreenHeight, divided in half); for the horizontal
coordinate, we subtract the length of our message from the screen width (ScreenWidth),
and again divide the remaining space in half. With this approach, the difference between the
upper and lower margins, as well as between right and left, will not exceed one; we can't
achieve the best anyway, because the output in alphanumeric mode is possible only in
accordance with the available characters, you can't move the text by half a character either
horizontally or vertically. By the way, we should not forget that we will also need integer
division, using the div operation.
So, writing:
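A sketch consistent with the description above (the constant names and the use of the standard crt procedure ClrScr for clearing the screen are assumptions):

```pascal
program hellocrt;
uses crt;
const
   Greeting = 'Hello, world!';
   DelayDuration = 5000;    { five seconds, in milliseconds }
var
   x, y: integer;
begin
   ClrScr;                                       { clear the screen }
   x := (ScreenWidth - length(Greeting)) div 2;  { center horizontally }
   y := ScreenHeight div 2;                      { center vertically }
   GotoXY(x, y);
   write(Greeting);
   GotoXY(1, 1);                                 { park the cursor }
   delay(DelayDuration);
   ClrScr
end.
```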
Note that the current coordinates of the cursor can be obtained using the WhereX and WhereY
functions; these functions take no parameters. If we use GotoXY to try to move the cursor to
an existing position, it will be at that position, whereas if we try to move it to a position that does not
exist on our screen, the resulting current coordinates will be anything but what we expect. In addition
to GotoXY, the current position of the cursor is naturally changed by output operations (usually
the write operator is used together with the crt module).
Unfortunately, these tools have a very serious limitation: if the user resizes the window in which
the program is running, the program will not know about it; the ScreenWidth and
ScreenHeight values will remain as they were set by the crt module at startup. The source
of this limitation is quite obvious: back when the crt module was invented, the screen could not be
resized.
2.8.3. Dynamic input
To organize keyboard input by one key, as well as to handle all sorts of "tricky" keys like
"arrows", F1 - F12 and other such things, the crt module provides two functions:
KeyPressed and ReadKey. Neither function takes parameters. The KeyPressed
function is quite simple: it returns the logical value true if the user has pressed a key whose
code you haven't read yet, and false if the user hasn't pressed anything.
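A sketch of a typical polling loop built on KeyPressed (the "useful work" here is replaced by a short delay):

```pascal
program polldemo;
uses crt;
var
   c: char;
begin
   repeat
      { the program could do useful work here... }
      delay(50)          { ...instead we just sleep a little }
   until KeyPressed;
   c := ReadKey;
   writeln('key code: ', ord(c))
end.
```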
The ReadKey function is a bit more complicated. It allows you to get the code of the
next key pressed; if you call ReadKey before the user has pressed something, the function
will block¹⁶³ until the user deigns to press something; if the key has already been pressed, the function will return control immediately. It should be emphasized that the ReadKey call,
while returning another code, removes it from the incoming buffer, i.e. this function has a side
effect.
The return value type of the ReadKey function is an ordinary char, and if the user
presses any key with a letter, number or punctuation mark, exactly this character will be
returned. The same is true for space (' '), tab (#9), Enter (#13; note that it is not #10,
although in some other version it may be #10), Backspace (#8), Esc (#27). The
combinations Ctrl-A, Ctrl-B, Ctrl-C, ..., Ctrl-Z give the codes 1, 2, 3, ..., 26, and the combinations Ctrl-[, Ctrl-\ and Ctrl-] give the codes 27, 28 and 29.
The ReadKey function is quite tricky with other service keys, such as arrow keys,
Insert, Delete, PgUp, PgDown, F1-F12. In the days of MS-DOS, the creators of the crt
module took a rather unobvious and not quite beautiful solution: they used so-called "extended
codes". In practice, it looks like this: the user presses a key, the ReadKey function in the
program returns the symbol #0 (symbol with code 0), which means that the function must
be immediately called for the second time; it returns control immediately, and the symbol
returned by the function in this repeated call identifies the pressed special key. For example, the "left arrow" is encoded by the pair 0 and 75, the "right arrow" by 0 and 77, the "up arrow" and "down arrow" by 0 and 72 and by 0 and 80, respectively. The following simple program will allow you to find out which keys correspond to which codes:
¹⁶³ Beginners in this case often say "hangs", but this is wrong: when something "hangs", the only way to get it out of this state is by extraordinary measures like killing the process, whereas simply waiting for an event (in this case a keystroke) and resuming as soon as the event occurs is not called hanging, but blocking.
program keycodes;
uses crt;

procedure GetKey(var code: integer);
var
   c: char;
begin
   c := ReadKey;
   if c = #0 then
   begin
      c := ReadKey;
      code := -ord(c)
   end
   else
      code := ord(c)
end;

var
   i: integer;
begin
   repeat
      GetKey(i);
      writeln(i)
   until i = ord(' ')
end.
To get a general idea of the possibilities offered by dynamic input, we will first write a
program that, like hellocrt, will display the text "Hello, world!" in the middle
of the screen, but which can then be moved with the arrow keys; exit the program by any key
that has a normal (not extended) code.
In the constants section we will have only the text of the message. To display the message
and remove it from the screen, we will write the ShowMessage and HideMessage
procedures; the latter will display a number of spaces equal to the length of the message at the
desired position. The basis of the program will be a relatively short procedure
MoveMessage, which accepts five parameters: two integer variables - the current
coordinates of the message on the screen; the message itself as a string; two integers dx and
dy, specifying the change of x and y coordinates.
In the main program we will make a pseudo-infinite loop in which we will read the key
codes. If a normal "non-extended" code is read (the GetKey procedure has written a positive
number to the variable), the loop will be interrupted with the break operator, and the
program will end after clearing the screen. If the extended code is read (the number in the
variable is negative), then, if it corresponds to one of the four arrow keys, the
MoveMessage procedure will be called with the corresponding parameters; the program
will ignore the other keys with extended codes. The full text of the program is as follows:
This program has a serious drawback: it does not keep track of valid values for coordinates,
so we can easily "push" the message off the screen; after that it will always appear in the upper
left corner. We will suggest the reader to fix this as an exercise.
Let's consider a more complex example. Our next program will display an asterisk symbol
(*) in the middle of a blank screen. At first the symbol will be stationary, but if you press any
of the four arrows, the symbol will start moving in the specified direction at a rate of ten
characters per second. Pressing the other arrows will change the direction of its movement,
and pressing the spacebar will stop it. The Escape key will end the program.
In the constants section we will again have DelayDuration equal to 100, i.e. 100
milliseconds (a tenth of a second). This is the time interval that will pass between two movements of the star.
Taking into account the experience of the previous program, we will collect all the data
specifying the current state of the star into one record, which we will pass to the
procedures as a var-parameter. This data includes the current coordinates of the star, as
well as the direction of motion specified by the familiar dx and dy values. The type for this
record will be called simply star. The procedures ShowStar and HideStar, receiving
a single parameter (a record of type star) will show the star on the screen and remove it
by typing a space in this place; the procedure MoveStar will move the star by one
position in accordance with the dx and dy values. For convenience, we will also describe
the SetDirection procedure, which puts the specified values into the dx and dy fields.
The main part of the program will first set the initial values for the asterisk and display it
in the middle of the screen; the program will then enter a pseudo-infinite loop in which, if the
user has not pressed any keys (i.e. KeyPressed has returned false), MoveStar
will be called and delayed; since nothing else needs to be done in this case, the loop body
will be terminated prematurely with the continue statement (recall that, unlike
break, the continue statement prematurely terminates only one iteration of the
loop body, not the entire loop). When the code of one of the arrows is received,
SetDirection will be called with the corresponding parameter values, when the code of
the space character (32) is received, the asterisk will be stopped by calling SetDirection
with zero dx and dy, when Escape (27) is received, the loop will be terminated by the
break operator. All together will look like this (here and below we omit the body of the
GetKey procedure to save space, it is the same in all examples):
program MovingStar; { movingstar.pas }
uses crt;

const
    DelayDuration = 100;

type
    star = record
        CurX, CurY, dx, dy: integer;
    end;

{ the GetKey procedure, identical to the previous example, goes here }

procedure ShowStar(var s: star);
begin
    GotoXY(s.CurX, s.CurY);
    write('*')
end;

procedure HideStar(var s: star);
begin
    GotoXY(s.CurX, s.CurY);
    write(' ')
end;

procedure SetDirection(var s: star; x, y: integer);
begin
    s.dx := x;
    s.dy := y
end;

procedure MoveStar(var s: star);
begin
    HideStar(s);
    s.CurX := s.CurX + s.dx;
    if s.CurX > ScreenWidth then s.CurX := 1 else if s.CurX < 1
        then s.CurX := ScreenWidth;
    s.CurY := s.CurY + s.dy;
    if s.CurY > ScreenHeight then s.CurY := 1 else if s.CurY < 1
        then s.CurY := ScreenHeight;
    ShowStar(s)
end;

var
    s: star;
    c: integer;

begin
    clrscr;
    s.CurX := ScreenWidth div 2;
    s.CurY := ScreenHeight div 2;
    s.dx := 0;
    s.dy := 0;
    ShowStar(s);
    while true do begin
        if not KeyPressed then begin
            MoveStar(s);
            delay(DelayDuration);
            continue
        end;
        GetKey(c);
        case c of
            -75: SetDirection(s, -1, 0);
            -77: SetDirection(s, 1, 0);
            -72: SetDirection(s, 0, -1);
            -80: SetDirection(s, 0, 1);
            32: SetDirection(s, 0, 0);
            27: break
        end
    end;
    clrscr
end.
Alphanumeric terminals of the later models (for example, the DEC VT340, production of which was discontinued
only in the second half of the 1990s) formed a color image on the screen and supported
escape sequences that set the text color and the background color.
Unfortunately, the interface of our module crt reveals these possibilities not to the full
extent; the fact is that the prototype of this module from Turbo Pascal was designed for the
standard text mode of so-called IBM-compatible computers, where everything was relatively
simple: each sign corresponded to two one-byte cells of video memory, the first byte
contained the character code, the second - the color code, and four bits of this byte set the
color of the text, three bits - the background color, which made it possible to use only eight
different colors for the background and sixteen - for the text. Even the capabilities of
alphanumeric terminals produced in those times were wider, not to mention modern emulator
programs.
Since the crt module introduced in Free Pascal was developed primarily to maintain
compatibility with its prototype, its interface repeats the features of the prototype's interface
and does not provide any broader features. Such features could be used using the video
module, but it is much more complicated to work with it, and our tasks are only educational;
perhaps, if you seriously want to write full-screen programs for an alphanumeric terminal, it
would be better to learn C and use the ncurses library. However, as you will soon see, the
capabilities of the crt module are quite enough to create quite interesting effects; at the
same time, it is much easier to master it.
We have only two main tools here: the TextColor procedure sets the text color, and
the TextBackground procedure sets the background color. The colors themselves are
set by constants described in the crt module; they are listed in Table 2.1. Note that all 16
constants listed in the table can be used to set the text color, while only the eight constants in
the left column can be used to set the background color. For example, if you execute
TextColor(Yellow);
TextBackground(Blue);
write('Hello');
then the word "Hello" will be printed in yellow letters on a blue background, and this
combination will be used for all text output until you change the text or background color
again. Alternatively, you can make the output text blink; to do this, when you call
TextColor, you add the Blink constant to its argument; you can do this using
ordinary addition, although a bitwise "or" operation would be more appropriate. For example,
the text displayed on the screen after executing TextColor(Blue or Blink) will
be blue and blinking.
The described tools have a fundamental disadvantage: the text and background color
settings remain in effect after your program is terminated, and the crt module has no means
to find out what color of text and background is set now (in particular, at the moment of
launching your program), so if we use only the crt module tools, we will not be able to
restore the terminal settings, and after our program is terminated, the user will have to "bring
the terminal back to normal", for example, with the reset command, or close the window
altogether. However, this problem can be solved by using the terminal capabilities directly.
The statement

write(#27'[0m');

will output an escape sequence to the terminal (literally, a sequence of character codes starting
with the Escape code, i.e. 27; see page 366) that corresponds to restoring the terminal's
"default" settings; in particular, for terminal emulators used under X Window, the settings that
were in effect when the emulator was started are restored.
The following program demonstrates all possible combinations of text and background
colors, filling the screen with asterisks according to the following rules: in each line the color
of the asterisks themselves (i.e. text color) is the same, it is chosen new for each line; the
width of the line is divided into equal (as far as possible) parts corresponding to all possible
background colors; for each screen position before printing the asterisk, the text color is set
corresponding to the color of the current line, and the blink attribute is added to the
asterisks in positions with even numbers, making them blink.
All possible color values will be placed in the AllColors array; for convenience,
we introduce ColorCount and BGColCount constants corresponding to the total
number of all colors and background colors. The MakeScreen procedure will be
responsible for selecting the text color for each line; it will loop through all screen line
numbers and for each line call the MakeLine procedure with two parameters: line number
and selected color value. MakeLine will calculate the column width for each possible
background color and loop through all positions of the row, setting the appropriate colors for
each of them and printing an asterisk. If the column width turns out to be less than one, we
force it to one to avoid a division by zero further on. Note that we will not print an asterisk
in the last position of the last line to avoid scrolling the entire screen; alas, the crt
module does not allow printing a character in this position in such a way that nothing moves anywhere.
To prevent the program from terminating immediately after forming the "picture" (so that
we wouldn't have time to see the picture), we need to perform some input operation; we don't
want to use the notorious ReadKey, but it doesn't make sense to drag our (rather
cumbersome) GetKey procedure into the program to use it in such a trivial situation, so
we'll use the usual readln operator to "pause" the program; the user will have to press the
Enter key to exit the program.
The text of the program turns out like this (we advise the reader to see for himself how
exactly the necessary values of positions and indexes of the array of colors are obtained with
the help of div and mod operations):
program ColorsDemo; { colordemo.pas }
uses crt;
const
ColorCount = 16;
BGColCount = 8;
var
AllColors: array [1..ColorCount] of word =
(
Black, Blue, Green, Cyan,
Red, Magenta, Brown, LightGray,
DarkGray, LightBlue, LightGreen, LightCyan, LightRed,
LightMagenta, Yellow, White
);
begin
MakeScreen;
readln;
write(#27'[0m');
clrscr end.
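The listing above calls MakeScreen, but its body (and that of MakeLine) did not make it into the text. The following is a possible reconstruction based on the description; the exact index arithmetic with div and mod is an assumption:

```pascal
procedure MakeLine(y: integer; color: word);
var
    colwidth, x, bg: integer;
begin
    colwidth := ScreenWidth div BGColCount;
    if colwidth < 1 then
        colwidth := 1;          { avoid division by zero }
    for x := 1 to ScreenWidth do begin
        if (y = ScreenHeight) and (x = ScreenWidth) then
            break;              { never print in the very last position }
        bg := (x - 1) div colwidth;
        if bg >= BGColCount then
            bg := BGColCount - 1;
        GotoXY(x, y);
        TextBackground(AllColors[bg + 1]);
        if x mod 2 = 0 then     { even positions blink }
            TextColor(color or Blink)
        else
            TextColor(color);
        write('*')
    end
end;

procedure MakeScreen;
var
    y: integer;
begin
    for y := 1 to ScreenHeight do
        MakeLine(y, AllColors[(y - 1) mod ColorCount + 1])
end;
```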
creating "toys" is the so-called random number sensor, which allows you to make each game
session different from others, to introduce variety and healthy doses of unpredictability into
the game.
Of course, not all games require elements of randomness; but you should not be in a hurry.
Even a program that plays chess can- By the way, if you have not noticed such a desire, it may well
36
mean that you are studying programming in vain. May the reader forgive me for another reminder that not
everyone should become a programmer; but if, having received at your disposal a toolkit sufficient for creating
your own "toy", you did not feel the desire to rush immediately to make such a toy, then, most likely, the
programming process does not give you pleasure - and this, as we discussed in the prefaces, almost certainly
means that the work of a programmer will turn into a natural torture for you. However, it's up to you.
If he always plays the same opening, he will bore the user to death.
Generating truly random numbers that cannot be predicted is a science; however,
programmers have long since learned how to deal with this task, and modern operating
systems (including Linux) include tools for generating random numbers based on
unpredictable events, such as intervals between packet arrivals from the local computer
network and between keystrokes on the keyboard, fluctuations in the speed of hard disks, and
so on. Such tools are usually required in serious cases, for example, when generating secret
cryptographic keys - in general, in situations where the ability of an outsider to predict the
sequence of "random" numbers generated on our computer could cost us dearly.
Game programs do not create such situations, unless we are going to play with someone
for money, and a lot of it. In most cases, simply no one will try to predict the sequence of
numbers generated in a game program or in some screensaver, because such a prediction, even
if it can be made, will not bring the predictor any benefit at all comparable to the cost of time
spent (and in most cases - none at all). Therefore, programmers often replace the generation
of "real" random numbers with sequences of pseudo-random numbers.
The general principle of pseudorandom number generation is as follows. There is a certain
variable, usually called the random seed; every time a random number is required, the next
value of this variable is computed from its previous value by some tricky formula and stored
back into it. The random number itself is obtained from the current value of the random seed
by some other formula, usually a much less tricky one.
Sequences of pseudo-random numbers have one undoubted advantage: if such a need
arises, they can be repeated, starting from the same random seed value as the last time.
Sometimes this is required when debugging programs. However, in most cases we need the
opposite: to make the sequence new and "unpredictable" each time, taking into account that
nobody will seriously try to predict it. For this purpose, at the start of the program the
random seed is filled with some value that can be expected to be new every time - for example,
the current value of the system time, measured as the number of seconds since January 1, 1970,
or something similar.
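As an illustration of this principle (this is not the formula Free Pascal itself uses), a classic linear congruential generator can be sketched like this; the constants 25173 and 13849 are a textbook choice, assumed here purely for demonstration:

```pascal
program LcgDemo; { a toy pseudorandom generator, for illustration only }
var
    seed: longint = 1;   { the "random seed" }
    i: integer;

function NextRandom(range: integer): integer;
begin
    { the "tricky formula": one linear congruential step }
    seed := (seed * 25173 + 13849) mod 65536;
    { the second, much less tricky formula }
    NextRandom := seed mod range
end;

begin
    for i := 1 to 5 do
        write(NextRandom(10), ' ');
    writeln
end.
```

Starting from the same seed value reproduces exactly the same sequence, which is precisely why pseudorandom sequences are convenient when debugging.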
Free Pascal provides built-in facilities for generating a sequence of pseudorandom
numbers. To fill the random seed at the beginning of the program, you must call the
randomize procedure; it will put a "random" number (actually, just the value of the current
time) into a global variable called randseed, but that's beside the point - this variable
should not be accessed directly. It should be emphasized that randomize must be called
exactly once; if, for example, you start calling it every time you need another random number,
it is likely that all the numbers generated in the program within one second will be the same.
To get a random number, you should use the random function, which is present in two
variants: it can be called without parameters, and then it will return a number of type real
on the half-interval from 0 to 1 (including zero, but not including one); if you need an integer,
the random function is called with one (integer, or more precisely - of type longint)
parameter, and it returns an integer number from 0 to the value passed by the parameter, but
not including this value. For example, random(5) will return the number 0, 1, 2, 3 or 4.
Note that the random function also has a side effect: before returning a random number, it
changes the randseed variable to return a different number the next time it is called.
To demonstrate the capabilities of the pseudorandom number generator, let's write a
simple program that will gradually fill the initially empty screen with multicolored stars; the
position and color of the next star will be chosen randomly using the random function,
and a short (e.g., 20 ms) delay will be made between the output of two stars for better effect.
The program will look like this:
The literal translation of random seed does not reflect the actual meaning of this word
combination because of the numerous figurative meanings of the word seed, and the author of the book has not
met an adequate translation into Russian; it would be possible, perhaps, to translate seed with the word
"seedling", but it is easier to leave the word combination random seed as it is.
const
DelayDuration = 20;
ColorCount = 16;
var
AllColors: array [1..ColorCount] of word =
(
Black, Blue, Green, Cyan, Red, Magenta, Brown,
LightGray, DarkGray, LightBlue, LightGreen, LightCyan,
LightRed, LightMagenta, Yellow, White
);
var
x, y, col: integer;
begin
randomize;
clrscr;
while not keypressed do begin
x := random(ScreenWidth) + 1;
y := random(ScreenHeight) + 1;
if (x = ScreenWidth) and (y = ScreenHeight) then
continue;
col := random(ColorCount) + 1;
gotoxy(x, y);
TextColor(AllColors[col]);
write('*');
delay(DelayDuration)
end;
write(#27'[0m');
clrscr
end.
2.9. Files
2.9.1. General information
We have already managed to work with files through standard input and output streams,
relying on the fact that the necessary file will be "slipped" to us by the user at the start of our
program, redirecting input or output by means of the command interpreter. But of course, the
program itself can work with files, as long as it has sufficient permissions to do so.
In order to work with the contents of a file, it must first be opened. To do this, the program
issues a request to the operating system, declaring its intention to start working with the file;
usually such a request specifies which file our program is interested in (i.e. specifies the file name)
and what the program is going to do with it (the mode of working with the file: read-only,
write-only, read and write, or append to the end). Once the file is successfully opened, a new input or
output stream associated with the file on the disk specified when the file was opened becomes
available to our program along with the standard input/output streams we already know.
The operations that can be done with such a stream are generally similar (and at the operating
system level, simply the same) to those that we can perform with the standard streams: mostly,
of course, reads and writes, although there are others.
I/O streams associated with newly opened files must be distinguished from each other and
from the standard streams. The Pascal language provides so-called file variables for this purpose;
there is a whole family of special file types to describe such variables. It should be noted that
a file type differs significantly from other types; the most noticeable difference is that file-type
variables are the only form in which file-type "values" exist, i.e. file type values exist only
as "something that is somehow stored in file variables" and nothing else; they cannot even be
assigned. You can pass file variables to subroutines only through var-parameters. This
may seem unusual, because before we always talked about values of a given type and
expressions, the calculation of which gives such values, and variables of the same type were
considered just as a storage of the corresponding value. With file types it's the opposite: we
have only file variables; we can guess that they store something, and even say that they
probably store a "file type value", but all these talks will be nothing more than abstract
philosophy, because there are no means of working with such values in isolation from the
variables storing them in Pascal. In other words, file-type variables allow us to distinguish
between any number of simultaneously active (open) I/O streams, but that is all: we cannot use
file variables for anything else.
Depending on how we are going to work with the file, we have to choose a specific type
of file variable. Here we have three possibilities:
• work with a file assuming that it is text; for this purpose, a file variable of type text
is used;
• work with a file as an abstract sequence of bytes, being able to write and read any of
its fragments using the so-called block read and block write operations; this will
require a file variable, the type of which is called file;
• assume that a file consists of fixed-length blocks of information that correspond to the
machine representation in memory of values of some type; here we need a so-called
typed file, for which Pascal supports a whole family of user-entered types, such as file
of integer, file of real, or (more often) file of myrecord, where
myrecord is the name of the type-record described earlier.
Note that we can work with one and the same file in at least two, and often all three of these
ways; the way we choose to work depends not on the file, but on the task at hand.
Regardless of the type of file variable we use, before we start working with a file, we need
to assign a file name to the variable; this is done by calling the assign procedure.
For example, if we want to work with the text file data.txt located in the current
directory, in the variable description section we have to write something like

var
    f: text;

and somewhere in the program place the call

assign(f, 'data.txt');
We emphasize that this call simply associates the name 'data.txt' with a file variable.
The assign procedure does not try to open the file or even check if it exists (which is
understandable, because we may be about to create a new file). Of course, you can use not
only constants, but also any string expressions as a file name; if the name starts with the
"/" character, it is considered an absolute file name and will be counted from the root
directory of our system, if the name starts with any other character, it is considered relative
and counted from the current directory; however, this is not due to Pascal, but to Unix
operating systems.
Since beginners often get confused between file name and file variable name, let us
emphasize once again that these are two completely different, initially unrelated entities. A
file name is what it (i.e. a file) is called on disk, the name by which the operating system
knows it. When we write a program, we may not know at all what the file name will be during
its operation: perhaps the user will give us the file name, or we will get it from some other
sources, perhaps we will even read it from another file; this often happens in real tasks.
On the other hand, the name of a file variable is the name of the variable and nothing
more. We are free to name our variables as we wish; if we rename the variables but do not
change anything else in the program, the behavior of our program will not change, because
variable names have no effect on this behavior. And, of course, the file variable name we
choose has nothing to do with which file on our disk we are going to use.
The relationship between a file variable name and a file name on disk begins to exist only
after the assign procedure is called; moreover, no one forbids calling this procedure
again, breaking the old relationship and establishing a new one.
After the file variable has been assigned a file name, we can try to open the file for further
work with it. If we are going to read information from a file, we should open it using the
reset procedure; in this case the file must already exist (if it does not exist, an error will
occur), and work with it will start from its initial position, i.e. the first read operation will
retrieve data from the very beginning of the file, the next operation will retrieve the next
portion of data, and so on. An alternative way to open a file is to use the rewrite procedure.
In this case, the file
does not have to exist: if it does not exist, it will be created; if it already exists, all the
information in it will be destroyed and the work will start "from scratch". Text files can also
be opened for appending, this is done with the append procedure; this procedure does
not work for typed and block files.
It should be taken into account that the operation of opening a file is always fraught with
errors, and the built-in diagnostics that the Free Pascal compiler is able to insert into our
programs is notable for its incomprehensibility; therefore, it is highly desirable to disable the
built-in I/O error handling by specifying the already familiar {$I-} directive in the
program, and to organize error handling independently, using the value of
IOResult (see page 319).
To read and write text and typed files we use the already familiar read and write
operators, and for text (but not typed) files we can also use readln and writeln; the
only difference is that, working with files instead of standard I/O streams, we specify a file
variable as the first argument in these operators. For example, if we have a file variable f
and a variable x of type integer, we can write something like
Note that the fundamental absence of influence of the specific variable names chosen by the programmer on
the program behavior is one of the key features of compiled programming languages, which include Pascal,
among others.
write(f, x);

- and if f is of type text, the text representation of the number stored in x (i.e. a
sequence of bytes with the character codes of the digits) will be written to the corresponding file,
whereas if f is a typed file, exactly two bytes will be written - the machine
representation of a number of type integer. Similarly, the familiar eof and SeekEof
functions are used (the latter only for text files): when working with files, these functions
take one argument - a file variable - so we can write something like "while not eof(f)
do".
For working with block files read and write are not suitable, instead
BlockRead and BlockWrite procedures are used, which we will consider later in the
paragraph devoted to this type of files.
When the file is finished, it should be closed by calling the close procedure. The file
variable can then be used further to work with another (or even the same) file; if you reset
or rewrite the same file variable after closing the file, a file with the same name will
be opened, but you can reassign the name by calling assign again.
To conclude the introductory paragraph, here is the text of the program that writes the
same phrase Hello, world! to the text file hello.txt:
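The listing itself is missing at this point; a minimal version consistent with the description (the more careful variant with error checking is shown below) can look like this:

```pascal
program HelloFile; { hellofile.pas }
var
    f: text;
begin
    assign(f, 'hello.txt');   { bind the variable to the file name }
    rewrite(f);               { create (or truncate) the file }
    writeln(f, 'Hello, world!');
    close(f)
end.
```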
After running such a program, a 14 byte hello.txt file (13 message characters and a line
feed) will appear in the current directory, which can be viewed, for example, with the cat
command:
avst@host:~/work$ ./hellofile
avst@host:~/work$ ls -l hello.txt
-rw-r--r-- 1 avst avst 14 2015-07-18 18:50 hello.txt
avst@host:~/work$ cat hello.txt
Hello, world!
avst@host:~/work$
It would be a mistake to say that the same file will be opened: in the time that elapses between close
and reset/rewrite, someone else may have renamed or deleted our file and written a completely
different one to disk under its name.
In fact, the right thing to do, of course, is to be a little more careful:
program HelloFile;
const
message = 'Hello, world!';
filename = 'hello.txt';
var
f: text;
begin
{$I-}
assign(f, filename);
rewrite(f);
if IOResult <> 0 then
begin
writeln('Couldn''t open file ', filename);
halt(1)
end;
writeln(f, message);
if IOResult <> 0 then
begin
writeln('Couldn''t write to the file');
halt(1)
end;
close(f)
end.
The first error message is much more important than the second one: files are very often not
opened for reasons beyond our control, whereas if a file is opened, writing to it will succeed
in most cases (though not always, of course: for example, the disk may run out of space).
An open file, if it is a simple disk file, is characterized by its current position, which is
usually set to the beginning of the file when it is opened, and to the end of the file when a text
file is opened using the append procedure. Each input or output operation shifts the current
position forward by as many bytes as the number of bytes that were input or output. Therefore,
for example, successive read operations from the same file will not read the same data, but
successive portions of the data in the file, one after the other. In some cases, the current
position of an open file can be changed.
Text files do not allow forced changes to the current position and do not involve alternating
read and write operations; such files should be written as a whole, from beginning to end,
sometimes in several steps (in this case, the file is opened for appending using append).
There is no going back when writing text files; if something needs to be changed at the
beginning or in the middle of an existing text file, the whole file is overwritten. Therefore, for
text files, the reset procedure opens the file in read-only mode, and the rewrite
and append procedures open the file in write-only mode.
The peculiarities of the textual representation of data require extra care when performing
reads "to the end of the file". We have already discussed this in detail in §2.5.4 for the case of
a standard input stream; when reading from a plain text file, similar problems arise, solved by
the same SeekEof function, only in this case it is called with one parameter, which is a file
variable. Recall that the SeekEof function actually checks if there are still meaningful
(non-space) characters in the stream; for this purpose it reads and discards all space characters,
and if during this reading and discarding an "end of file" situation occurs, the function returns
"true"; if a meaningful character is found, this character is returned back to the stream (it is
considered unread so that it can be used by the next read), and the function itself returns
"false". A "file" version is also provided for the SeekEoln function, which similarly
"searches" for the end of a string, i.e. checks if something else meaningful can be read from
the current string.
Suppose, for example, we have a text file, whose name we get through the command line
argument; the file consists of lines, each of which contains one or more floating-point
numbers. We need to multiply the numbers on one line, and add the results of these
multiplications and output the result. For example, for a file containing
2.0 3.0 5.0
0.5 12.0
the result should be the number 36.0. The corresponding program can be written as
follows:
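One way to write it, consistent with the description (the program name and output format are assumptions):

```pascal
program SumOfProducts; { the name is an assumption }
var
    f: text;
    x, prod, sum: real;
begin
    {$I-}
    assign(f, ParamStr(1));   { the file name comes from the command line }
    reset(f);
    if IOResult <> 0 then begin
        writeln('Couldn''t open file ', ParamStr(1));
        halt(1)
    end;
    sum := 0;
    while not SeekEof(f) do begin
        prod := 1;
        while not SeekEoln(f) do begin
            read(f, x);
            prod := prod * x   { multiply the numbers on one line }
        end;
        readln(f);             { remove the line feed from the stream }
        sum := sum + prod
    end;
    close(f);
    writeln(sum:0:1)
end.
```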
Pay attention to readln(f) after the read/multiply loop. It is inserted to remove the line
feed character from the input stream; if this operator is removed, the program will simply
"hang".
It is clear that SeekEof and SeekEoln functions can be used only for text files; for
any other data formats such functions simply do not make sense, because both data separation
by spaces and data spacing on different lines are obviously phenomena possible only when
working with text representation.
It is worth noting that the standard input and standard output streams also have their own names
- Free Pascal provides global variables of type text for them. A standard output stream can be
referred to by the name output; for example, writeln(output, 'Hello') is the
same as just writeln('Hello'). Similarly, a standard input stream is referred
to by the name input, so you can write read(input, x) instead of just read(x).
These variables can be useful, for example, if you are writing a subroutine that will output data in text
form but you don't know in advance whether the data will go to a text file or to a standard
stream; in this case, you can give the subroutine a parameter of type text and pass it either an open
file variable or output.
Influenced by the C language, Free Pascal has also incorporated other names for global
variables that denote standard streams: stdin (same as input) and stdout (same as
output). In addition, Free Pascal allows output to the standard error reporting stream (see §1.2.11),
also called the diagnostic stream, which is labeled ErrOutput or stderr.
If you compare this program with the GenerateNumTxt program from the previous
paragraph, you will find that almost nothing has changed in the text: the program name has
changed, the suffix in the file name has changed (.bin instead of .txt), the write
operator is used instead of writeln and, finally, the most important thing: the file variable
in the previous program was of type text, while in this program it is of type file of
longint.
In principle, a file can consist of records of almost any type; only file types cannot be
used, and it is strongly discouraged (although possible) to use pointers, which we will consider
in the next chapter. In particular, using the file of char type you can open a text file,
or indeed any file at all, as a typed file, because any file consists of bytes.
Very often a record is used as the element type of a typed file. For example, when
creating a program for working with topographic maps, we could use a file that contains points
on the terrain, each specified by latitude and longitude and carrying a name. For this purpose
we could describe such a type:
type
NamedPoint = record
latitude, longitude: real;
name: string[15];
end;
var
f: file of NamedPoint;
To create such a file you can use the rewrite procedure; to open an existing file, the
reset procedure. There is no open-for-append operation for typed files.
Unlike text files, which consist of lines of different sizes, records of typed files have a
fixed size, which allows you to alternate read and write operations to any place in an existing
file. You can change the current position in an open typed file using the seek procedure,
which must be passed two parameters: a file variable and the record number (the very first
record in the file has the number 0). For example, the following two lines:
seek(f, 15);
write(f, rec);
will write the rec record to position #15, regardless of which file positions we have
worked with before. This can be used to modify individual records of an existing file, which
is especially important for files of significant size because it allows us to avoid rewriting
them completely. For example, let's say we have a file consisting of NamedPoint records and
we need to take the record number 705 from this file and change its name (i.e. the value of
the name field) to the string 'Check12'. To do this, we can read this record into a
variable of the NamedPoint type (we will assume that we have such a variable and it is
called np), change the value of the name field and write the resulting record to the same
place:
seek(f, 705);
read(f, np);
np.name := 'Check12';
seek(f, 705);
write(f, np);
Note that we had to call seek again before writing; the point is that after the read
operation the current position in the open file f corresponded to the record following the
one just read, i.e. in this case record #706, and we had to correct this.
Generally speaking, not all files that can be opened support changing the current position; those
I/O streams for which the very notion of a "current position" makes sense are called positionable.
These include, in particular, ordinary disk files; but for streams associated with the terminal (keyboard
input, screen output) or with the /dev/null pseudo-device mentioned in §1.2.11, the current
position is not defined. We will discuss the "positionability" of I/O streams in detail in Volume 2.
Since typed files allow alternating read and write operations, by default, procedures that
open a typed file open it in "read and write" mode. This applies to both reset and
rewrite: the only difference between them is that rewrite creates a new file, and if
there is already a file with that name, it eliminates its old contents; reset does neither, and
if there is no file with that name, an error is thrown.
This approach can cause problems, for example, when working with a file that the
program should only read, but the program does not have enough authority to write it. In such
a situation, an attempt to open the file for reading and writing will fail, i.e. both reset and
rewrite will fail. The problem is solved by a global variable called filemode; by
default it contains the value 2, which means that typed files are opened for reading and
writing. If we write 0 to this variable, files will be opened in read-only mode, which will
allow us (using the reset procedure) to successfully open a file that we don't have write
privileges for, but we do have read privileges for; of course, we will only be able to read such
a file. Very rarely there is a situation when we have write permissions to a file, but no read
permissions. In this case we need to set the filemode variable to 1 and use rewrite
to open the file.
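For example (the file name here is hypothetical), opening a file for reading only could look like this:

```pascal
var
    f: file of longint;
    n: longint;
begin
    assign(f, 'data.bin');  { hypothetical name of a file we may only read }
    filemode := 0;          { 0: read-only; 1: write-only; 2 (default): read/write }
    reset(f);               { now succeeds even without write permission }
    read(f, n);
    close(f)
end.
```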
Besides text and typed files, there are also untyped files, for which the file variable is
described without specifying an element type:
var
f: file;
As with other file types, untyped files are named using the assign procedure, and opened
by calling the familiar reset and rewrite procedures, but these procedures have
a second parameter when working with untyped files - an integer indicating the block size. It
is very important not to forget about this parameter, because "by default" (i.e. if you forget to
specify the second parameter) the block size will be 128 bytes, which usually does not
correspond to our purposes. It is not clear why such a "default" is adopted; as we have already
mentioned, the most common size of a "block" is one byte, which is the most universal.
Just as with typed files, both the reset and rewrite routines attempt to open a file
in read-and-write mode by default; this can be affected by changing the value of the global
variable filemode, as described in the previous paragraph.
To read from and write to untyped files, the BlockRead and BlockWrite procedures
are used. They are very similar to each other: each receives four parameters. The first is a
file variable. The second is a variable of arbitrary type and size (except for file variables)
into which the information read from the file will be placed (for BlockRead) or from which
the information to be written will be taken (for BlockWrite). The third is an integer
specifying the number of blocks to read or write; of course, the product of this number and
the block size in use must never exceed the size of the variable given by the second parameter.
Finally, the fourth parameter is a variable of type longint, int64, word or integer,
into which the procedure stores the number of blocks it actually managed to read or write.
This result may be less than what we asked for; this happens most often when reading, when
less information is left in the file than we are trying to read. For example:
const
BufSize = 100;
var
f: file;
buf: array[1..BufSize] of char;
res: integer;
begin
{ ... }
BlockRead(f, buf, BufSize, res);
{ ... }
BlockWrite(f, buf, BufSize, res);
One special case is very important for us and arises only with BlockRead: if, after the
call, the variable passed as the last parameter contains the value 0, it means the "end of
file" situation has occurred.
In principle, you can leave the fourth parameter out; then any discrepancy between the result
and the expectation will cause an error. Doing so is strongly discouraged, especially when
reading: there is nothing actually wrong with the file running out of data or the end of the file
being reached, and in general the fourth parameter allows you to write more flexible programs.
For example, let's consider a program that copies one file to another, getting their names
from the command line parameters. We will use untyped file variables for both source and
destination files; we will open the first file in read-only mode and the second in write-only
mode. We will read the file in fragments of 4096 bytes (4 Kb), and this size will be set
to a constant; we will use an array of byte type elements of the corresponding size as a
buffer, i.e. a variable where the read information is placed.
We will write to the target file at each step exactly as much information as was read from
the source file. When the "end of file" situation occurs, we will immediately terminate the
read/write loop, and we will have to do it before writing, i.e. from the middle of the loop body;
we will use the break operator for this purpose, and make the loop itself "infinite". After
the loop is finished, we will naturally have to close both files. Since we already know about
the existence of the ErrOutput variable denoting the error message stream, we will output
all such messages into this stream as it should be. After detecting errors, we will terminate the
program with code 1 to show the operating system that something went wrong. The complete
program will look like this:
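The program itself did not survive in this copy of the text; a sketch matching the description above (the program name FileCopy and the error message wording are our own assumptions) might be:

```pascal
program FileCopy;
const
    BufSize = 4096;
var
    src, dst: file;
    buf: array [1..BufSize] of byte;
    res, written: integer;
begin
    if ParamCount < 2 then
    begin
        writeln(ErrOutput, 'Usage: filecopy <src> <dst>');
        halt(1)
    end;
    assign(src, ParamStr(1));
    filemode := 0;                         { open the source read-only }
    {$I-} reset(src, 1); {$I+}            { block size: one byte }
    if IOResult <> 0 then
    begin
        writeln(ErrOutput, 'Couldn''t open ', ParamStr(1));
        halt(1)
    end;
    assign(dst, ParamStr(2));
    filemode := 1;                         { open the target write-only }
    {$I-} rewrite(dst, 1); {$I+}
    if IOResult <> 0 then
    begin
        writeln(ErrOutput, 'Couldn''t create ', ParamStr(2));
        close(src);
        halt(1)
    end;
    while true do                          { "infinite" loop, left with break }
    begin
        BlockRead(src, buf, BufSize, res);
        if res = 0 then                    { end of file reached }
            break;
        BlockWrite(dst, buf, res, written) { write exactly as much as was read }
    end;
    close(src);
    close(dst)
end.
```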
var
p: ^integer;
q: ^real;
then p will be able to store the address of a variable of type integer, and q will
be able to store the address of a variable of type real.
The address of a variable can be obtained using the address-taking operation, which is
denoted by "@"; for example, the expression @x gives the address of variable x. In
particular, if we describe a variable of type real and a variable of type pointer to real:
var
r: real;
p: "real;
then it will be possible to put the address of the first variable into the second one:
p := @r;
Just in case, let us emphasize that the address taking operation can be applied to any
variable, not only to a variable that has an identifier name. For example, it can be used to
get the address of an array element or a record field.
Pointers and address expressions in general would be completely useless if it were
impossible to do something with a memory region (i.e., in most cases, with a variable)
knowing only its address. For this purpose, the dereference operation is used (often
rendered by the unpretentious calque "dereferencing", although we could use, for example,
the term "address reference"). This operation is denoted by the already familiar symbol
"^", which is placed after the pointer name (or, generally speaking, after an arbitrary
address expression, which can be, for example, a call of a function returning
an address). Thus, after the address assignment from the above example, the expression p^
will denote "what p points to", which in this case is the variable r. In particular, the operator
p^ := 25.7;
will put the value 25.7 into the memory located at the address stored in p (i.e. simply
into the variable r), and the operator
writeln(p^);
will print its current value.
(Footnote: In the original version of Pascal there was no address-taking operation, which, in
our opinion, complicates not only the work but also the explanations; fortunately, in modern
versions of Pascal this operation is always present.)
§2.10. Addresses, pointers and dynamic memory 423
then it will be possible to put an address of any type into such a variable; moreover, the value
of this variable can be assigned to a variable of any pointer type, which is actually fraught
with errors: for example, you can put the address of a variable of the string type into
ar, then forget about it and assign its value to a pointer to integer; if you now try
to work with what that pointer points to, nothing good will come of it, because this address
actually contains not an integer but a string. That's why you should be extremely careful
when working with untyped pointers, and it's better not to use them at all unless you seriously
need it. Note that such a need will arise for you not soon, if at all: the real need to use untyped
pointers appears when creating non-trivial data structures, which may be needed only in large
and complex programs.
We might not have mentioned untyped pointers at all in our book, were it not for the fact
that the result of the address-taking operation is an untyped address. Why the creators of Turbo
Pascal, who first introduced the address-taking operation into this language, did it this way
remains unclear (in C, for example, the same operation is perfectly capable of producing a
typed address). This aspect of the compiler's behavior can be corrected, however, by inserting
the {$T+} directive into the program (for example, at its beginning).
There is another important case of an untyped address expression - the built-in constant
nil. It denotes an invalid address, i.e. one where no variable can be located in memory, and
is assigned to variables of pointer types to show that the pointer does not point anywhere at
the moment. The constant nil is sometimes called a null pointer , although it is not strictly
speaking a pointer, since a pointer is such a variable.
If we try to extract the "dry residue" from this paragraph, we get the following:
• if t is some type, then ^t is a "pointer to t" type;
• if x is an arbitrary variable, the expression @x means "address of variable x" (untyped
by default, but under the {$T+} directive it has the type "pointer to T", where T is
the type of the variable x);
• if p is a pointer (or other address expression), then p^ denotes "what p points to";
• the word nil denotes a special "null address" used to show that a pointer does not
point to anything at the moment.
If, for example, we have described a pointer
var
p: ^string;
then we can now create a dynamic variable of type string; this is done using new:
new(p);
When executing this new, first, a memory area of 256 bytes will be allocated from the heap,
which will become our new (dynamic) variable of the string type; second, the address of
this memory area will be stored in the p variable. Thus, we now have an unnamed
variable of the string type, and the only way to access it is through its address: the expression
p^ corresponds to this variable. We can, for example, put a value into this variable:
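The statement itself is missing from this copy; it would be an ordinary assignment through the dereferenced pointer, for example (the string value is our own):

```pascal
p^ := 'Hello';   { store a value in the dynamic variable via its address }
```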
The structure obtained in the program memory can be schematically represented as shown in
Fig. 2.4.
Deleting a dynamic variable that we no longer need is done using the dispose
pseudoprocedure; its parameter is the address of the variable to be disposed of:
dispose(p);
Strangely enough, the value of the pointer p does not change; the only thing that happens
is that the memory occupied by the variable p^ is returned to the heap; in other words,
the status of this memory area changes: instead of being occupied, it is marked as free and
available for allocation on demand (by one of the following calls to new). Of course, the
value of the pointer p must not be used after that, because we have informed the heap manager
that we are no longer going to work with the variable p^.
It is important to realize that a dynamic variable is not bound to a specific pointer. For
example, if we have two pointers:
var
p, q: ^string;
we can execute
new(p);
q := p;
and then work with the resulting variable, referring to it as q^ just as well as p^; indeed, the address of
our variable is now in both pointers. Moreover, we can occupy the pointer p for something
else, and work with the previously allocated variable only through q, and, when the time
comes, delete this variable using, again, the pointer q:
dispose(q);
What is important here is the address itself (i.e. the value of the address), not which of the
pointers this address currently lies in.
Another very important point is that if you are careless, you can easily lose the address of
a dynamic variable. For example, if we create a variable using new(p), work with it, and
then execute new(p) again without deleting the variable, the heap manager will allocate a
new variable and store its address in the pointer p; as always in such cases, the old value of
the variable p will be lost, but the address of the first allocated dynamic variable was stored
there, and we have no other way to access it!
A dynamic variable that we forgot to free and that is no longer pointed to by any pointer
becomes so-called garbage (this term should not be confused with junk, which in programming
usually means meaningless data rather than lost memory). Some programming languages
provide so-called garbage collection,
which ensures automatic detection of such variables and their return to the heap, but Pascal
has no such thing, and this, in general, is even good. The thing is that garbage collection
mechanisms are quite complex and often trigger at the most inopportune moment,
"suspending" for some time the execution of our program; for example, at the DARPA Grand
Challenge robot car competition in 2005, one of the cars running Linux and Java programs
flew at a speed of about 100 km/h into a concrete wall, and one of the possible causes was a
garbage collector that was not activated in time.
Anyway, Pascal doesn't have garbage collection, so we need to be careful about deleting
unnecessary dynamic variables; otherwise, we can run out of available memory very quickly.
By the way, programmers refer to the generation of garbage as a memory leak. Note
that memory leaks indicate only one thing: the program author's carelessness and inattention.
There can be no excuses for memory leaks, and if someone tries to tell you the opposite, do
not believe it: such a person simply does not know how to program.
The material of this paragraph may leave you a bit perplexed. In fact, why describe a
pointer (for example, to a string, as in our example) and then do some kind of new, if you
can immediately describe a variable of type string and work with it as usual?
Working with dynamic variables makes some sense if these variables have a relatively
large size, for example, several hundred kilobytes (it can be an array of records, some fields
of which are also arrays, etc.); it is simply dangerous to describe a variable of such a size as a
regular local variable, as there may not be enough stack memory, but there will be no problems
with placing it in the heap; besides, if you have many such variables, but not all of them are
needed at the same time, it may be useful to create and delete them However, all these
examples are, frankly speaking, a bit fanciful; pointers are nearly
are never used in this way. The full potential of pointers is revealed only when creating so-
called linked dynamic data structures, which consist of separate variables of the "record" type,
and each such variable contains one or more pointers to other variables of the same type. We
will consider some of such data structures in the following paragraphs.
A naive attempt to describe the type of a list element runs into a problem: the next field
needs a pointer to a type that has not been described yet:
type
item = record
data: integer;
next: ^item; { error! item type is not yet described }
end;
At the same time, Pascal allows us to describe a pointer type using a type name that has not
yet been introduced; as a consequence, we can give a separate name to the item pointer
type before we describe the item type itself, and use that name when describing the item,
for example:
type
itemptr = ^item;
item = record
data: integer;
next: itemptr;
end;
Such a description will not cause any objections from the compiler. Having item and
itemptr types, we can describe a pointer for working with a list:
var
first: itemptr;
and the list itself, shown in the figure, can be created, for example, like this:
new(first);
first^.data := 25;
new(first^.next);
first^.next^.data := 36;
new(first^.next^.next);
first^.next^.next^.data := 49;
first^.next^.next^.next := nil;
Constructions like first^.next^.data may scare you off; that's a normal reaction, but
we can't afford to stay scared for long, so let's figure out what is going on here. Since
the pointer to the first element is called first, the expression first^ denotes the first
element itself. Since this element is a record of two fields, its fields are accessed,
as with any record, via a dot and a field name; thus, first^.data is the data field of the
first
element, which contains an integer (in the figure and in our example it is the number 25), while
first^.next is a pointer to the next (second) element of the list. In turn,
first^.next^ is the second element of the list itself, and so on (see Fig. 2.6).
If something here remains unclear, there is no point in moving on! First get an
understanding of what is going on, otherwise you will not understand anything in the
further text.
Of course, this is not how lists are handled in most cases; the example with the
dreaded first^.next^.next^.data chain is given more for illustrative
purposes. Usually, when working with a singly-linked list, one of two things is done: either
the elements are always placed at the beginning of the list (this is done with an auxiliary
pointer), or two pointers are stored, one at the beginning and one at the end of the list, and the
elements are always placed at the end. Before we show you how it is done, we will offer you
two problems that we strongly recommend you to try to solve yourself, at least before you
read the rest of this paragraph. The point is that once you have figured out how to do it
yourself, you will never forget it, and working with lists will never cause you problems again;
if you start right away with the examples we will give next, it will be much harder to figure
out how to do it yourself, because even the most strong-willed people often cannot resist the
temptation to write "by analogy" (i.e., without fully understanding what is going on).
So here's the first task; it will require creating a single-linked list and adding items to the
beginning of it:
Write a program that reads integers from the standard input stream until the situation
"end of file" occurs, then prints all the entered numbers in reverse order. The number
of numbers is unknown in advance, it is forbidden to introduce explicit restrictions on
this number.
The second task will also require the use of a single-linked list, but we will have to add new
elements to the end of the list, for which we need to work with the list through two pointers:
the first will store the address of the first element of the list, the second - the address of the
last element.
Write a program that reads integers from the standard input stream until
the situation "end of file" occurs, after which it prints all entered numbers
twice in the order in which they were entered. The number of numbers is
unknown in advance, it is forbidden to introduce explicit restrictions on
this number.
Keep in mind that you may consider such problems solved no sooner than you have a program,
written by yourself and meeting the conditions, working on your computer (and working
correctly). Even then the problem is not necessarily solved, because you may have missed
some important cases during testing or simply misinterpreted the results obtained; but
if there is no working program at all, there can be no question of the problem being solved.
In the hope that you have at least tried to solve the proposed tasks, we will continue our
discussion of working with lists. First of all, let us note one extremely important point. If the
context of the problem to be solved implies that you start working with an empty list, i.e. a
list that does not contain a single element, be sure to turn your pointer into a correct empty
list by storing the value nil in it. Beginners often forget this and get constant
crashes as a result.
Adding an element to the beginning of a singly-linked list is done in three steps. First,
using an auxiliary pointer, we create (in dynamic memory) a new list element. Then we fill
this element; in particular, we make its pointer to the next element point to the element of the
list which is now (for now) the first, and after adding a new one it will become the second,
i.e. just the next element after the new one. The necessary address, as it is easy to guess, is in
the pointer to the first element of the list, and we will assign it to the next field in the new
element. Finally, the third step is to recognize the new element as the new first element of the
list; this is done by entering its address into the pointer storing the address of the first element.
Schematically what is happening is shown in Fig. 2.7 on the example of the same list of
integers, consisting of elements of type item, and the address of the first element
of the list is stored in the pointer first; at first, this list contains elements storing
the numbers 36 and 49 (Fig. 2.7, "a)"), and we need to put a new element containing the
number 25 in its beginning. For this purpose, we introduce an additional pointer, which will
be called tmp from the English temporary, which means "temporary":
var
tmp: itemptr;
In the first step we, as mentioned, create a new element:
new(tmp);
The resulting situation is shown in Fig. 2.7, "b)". Nothing has happened to the list yet, the
created new element does not affect it in any way. The element itself is still very "unfinished":
both of its fields contain incomprehensible garbage, which is shown in the figure by "?!"
symbols. It's time to make the second step - to fill in the fields of the new element. In the
data field we will need to enter the number 25, while the next field should
indicate the element that (after including the new element in the list) will become the next
after the new one; this is, in fact, the element that is now the first in the list, i.e. its address is
stored in the pointer first, and we will assign it to the next field:
tmp^.data := 25;
tmp^.next := first;
The state of our data structure after these assignments is shown in Fig. 2.7, "c)". All that
remains is to declare the new (and fully prepared for its role) element as the first element of
the list by assigning its address to the first pointer; since this address is stored in tmp,
everything turns out to be quite simple:
first := tmp;
The result is the situation shown in Fig. 2.7, "d)". Forgetting about our temporary pointer and
the fact that the first element of the list was just "new", we get exactly the state of the list we
were aiming for.
This three-step procedure of adding a new element to the beginning has one remarkable
property: for the case of an empty list, the sequence of actions to add the first element is
completely the same as the sequence of actions to add a new element to the beginning of
a list that already contains elements. The corresponding sequence of states of the data
structure is shown in Fig. 2.8: at first the list is empty ("a)"); at the first step we create a new
element ("b)"); at the second step we fill it ("c)"); the last action makes this new element the
first element of the list ("d)"). All this works only if we remembered to make the list
correct before starting to work with it by putting the value nil in the first pointer!
Fig. 2.8. Adding an element to the beginning of an initially empty list
If we keep putting the items at the beginning of the list, they will be placed in the list in
the opposite order to the order in which they were put there; this is what is required in the first
of the two problems proposed earlier. The solution will consist of two loops: the first of them
will read numbers until the "end of file" situation occurs, adding each number just read to
the beginning of the list; after that it only remains to go through the list from its beginning
to the end, printing the numbers contained in it. The whole program will
look like this:
program Numbers1;
type
itemptr = ^item;
item = record
data: integer;
next: itemptr;
end;
var
first, tmp: itemptr;
n: integer;
begin
first := nil; { make the list correctly empty! }
while not SeekEof do { loop reading numbers }
begin
read(n);
new(tmp); { created }
tmp^.data := n; { fill }
tmp^.next := first;
first := tmp { include in list }
end;
tmp := first; { now traverse the list, printing the numbers }
while tmp <> nil do
begin
writeln(tmp^.data);
tmp := tmp^.next
end
end.
In this text, we have never used dispose; here, freeing memory makes little sense, because
immediately after the first (and only) use of a constructed list, the program is terminated, so
all the memory allocated to it in the system is freed; of course, this includes the heap. In more
complex programs, such as those that build one list after another and so many times, memory
freeing should never be forgotten. You can free memory from a single-linked list by using a
loop in which at each step the first element is removed from the list, and so on until the list is
empty. There is a trick to this too. If you just dispose(first) the first element of the
list will cease to exist, which means we won't be able to use its fields, but the address of the
second element is stored there. Therefore, before destroying the first element, we must
memorize the address of the next element; after that, the first element is destroyed and
the pointer first is assigned the address of the next element stored in the auxiliary
pointer. The corresponding loop looks like this:
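The loop itself is absent from this copy of the text; following the description above (with tmp again serving as the auxiliary pointer), it would presumably look like this:

```pascal
while first <> nil do
begin
    tmp := first^.next;   { memorize the address of the next element }
    dispose(first);       { destroy the first element }
    first := tmp          { the next element becomes the first }
end;
```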
You can also do something else: remembering the address of the first element in the auxiliary
pointer, exclude this element from the list by changing first accordingly, and then
delete it:
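The code for this second variant is also missing here; per the description, it would presumably be:

```pascal
while first <> nil do
begin
    tmp := first;         { remember the current first element }
    first := first^.next; { exclude it from the list }
    dispose(tmp)          { now delete it }
end;
```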
Figure 2.9. Two ways of deleting the first element of a single-linked list
Fig. 2.9 shows the first method on the left and the second on the right. You can decide for
yourself which one is better; there is no difference in efficiency between them.
As you can see, the condition of the second problem differs from the condition of the first
one mainly in the order in which the items are printed. In principle, no one prevents us from
building the list "backwards", as for the first problem, and then "reverse" it by a separate loop
(try to figure out how to do it yourself as an exercise); but it would be more correct to build
the list in the required order at once, for which we need to add items to the end of the list, not
to its beginning.
If a single-linked list is to be incremented "from the end", it is usual to store not one
pointer for this purpose, as in the previous examples, but two pointers: to the first element of
the list and to the last element. When the list is empty, both pointers must be set to nil. The
procedure of adding a new element to such a list (to its end) looks even simpler than the just
discussed procedure of adding it to the beginning: if, for example, the pointer to the last
element is called last, then we can create a new element right where we need it, i.e. put its
address in last~.next, then move the pointer last to the newly created last element
and fill its fields after that; if we need to add the number 49 to the list, we can do it this
way (see Fig. 2.10):
new(last^.next);
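The remaining lines of this fragment were lost in this copy; following the described sequence (move last to the new element, then fill its fields), they would presumably be:

```pascal
last := last^.next;   { move last to the newly created last element }
last^.data := 49;     { fill its fields }
last^.next := nil     { the new last element has no successor }
```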
As is often the case, the simplification has its price. Unlike insertion at the beginning, where
only the first pointer is stored for the list, insertion of the very first element when two
pointers are kept is a special case requiring a completely different set of actions: in this
situation the last pointer does not point anywhere, because the last element does not yet
exist (no element exists at all), so the variable last^.next that we used so dashingly
does not exist either. The correct sequence of actions for putting the number 25 into a list
that was empty before will now be as follows:
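The fragment is missing from this copy; given the remark that its last two lines match the previous fragment, it would presumably be:

```pascal
new(first);       { the new element becomes the first one... }
last := first;    { ...and also the last one }
last^.data := 25;
last^.next := nil
```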
As we can see, the last two lines remained exactly the same as before (we purposely made
it so), while the first two lines changed completely. This can be generalized: if the number
to be put into the list is in the variable n, and we don't know whether the list contains at
least one element by that time, the correct code for appending an element would be as follows:
implementation details of our abstract object lead to changes neither in the interface itself nor
in the explanations of how to use it, i.e. can be painlessly ignored by those programmers who
will use our product.
One approach to creating good interfaces is to first think of an interface based on the set
of operations that the user expects from us, and then try to write an implementation that will
allow the procedures in our interface to work as intended. Since we start thinking about the
implementation after the interface is created, such an interface is very likely to be independent
of a particular implementation, which is what we need.
Let's try to apply this to the case of a stack; for definiteness, let it be a stack of integers of the longint type. It is clear that, no matter how the stack is implemented, we will have to store at least some information (if it is implemented as a singly-linked list, this will be a pointer to the first element of the list, but we don't want to think about that yet). We can store information only in variables, and if we remember the existence of variables of the "record" type, we will notice that, if we wish, we can store any information (of reasonable size) in one variable; we just need to choose a suitable type for it. So, whatever the implementation of our stack turns out to be, it is necessary and sufficient to describe one variable of some type for each stack (of which there can be as many as we like); not knowing what the implementation will be, we cannot say what type it will be, but that does not prevent us from naming it. Since we are talking about a stack of numbers of the longint type, let's call this type StackOfLongints.
The basic operations on the stack are traditionally denoted by the words push (adding a
value to the stack) and pop (extracting a value). We implement both of these operations as
procedures. At first glance it would be logical to call them something like StackOfLongintsPush and StackOfLongintsPop, but looking carefully at the resulting long names, we may (quite naturally) doubt that they will be convenient to use. Therefore, we will replace the cumbersome StackOfLongints with the short SOL and call the procedures SOLPush and SOLPop respectively. We will provide each procedure with two parameters. The first of them in both cases will be a variable parameter representing the stack itself; the second parameter of SOLPush will be the number to be pushed, while for SOLPop it will be a variable parameter that receives the extracted value.
[184] The abbreviation API, formed from the words application programming interface, i.e. "application
Trying to extract a value from the stack when it is empty is a known error. To avoid such an error, we will add to our procedures a logical function SOLIsEmpty, which receives the stack as a parameter and returns true if it is empty and false otherwise. Of course, the function will not change the stack, but we will still pass the stack through a variable parameter to avoid copying the object, which might be cumbersome; looking ahead, we note that our "stack object" will be a simple pointer, but at the current stage of interface design we don't "know" this yet.
Let's agree to completely exclude SOLPop calls for an empty stack when working with our subroutines [186]; for this purpose we will always need to check with SOLIsEmpty whether
the stack is empty before calling the SOLPop procedure (unless the presence of values in
the stack is obvious - for example, immediately after executing SOLPush, but this is rarely
the case).
As a matter of fact, we're almost done inventing the interface. Let's remember that a variable, if nothing is assigned to it, can contain arbitrary garbage, and add a procedure that turns a freshly declared variable into a correct empty stack; let's call it SOLInit. That's it, the interface is ready. Remembering the method of step-by-step detailing, in which empty subroutines are written first, we can fix our interface in the form of a program fragment:

    type
        StackOfLongints = ;

    procedure SOLInit(var stack: StackOfLongints);
    begin
    end;

    procedure SOLPush(var stack: StackOfLongints; n: longint);
    begin
    end;

    procedure SOLPop(var stack: StackOfLongints; var n: longint);
    begin
    end;

    function SOLIsEmpty(var stack: StackOfLongints): boolean;
    begin
    end;
Of course, this fragment won't even compile because we haven't specified what
StackOfLongints is; but we have to start somewhere. Now that the interface is
completely fixed, we can start implementing it.

[185] Except for the case when there is not enough memory; but modern operating systems are such that in this case our program will not even know about the problem: it will simply be killed, automatically.

[186] If it comes to writing documentation for our procedures and functions, such agreements should definitely be reflected in it.

Since we wanted to use a singly-linked list, we need a type for its links:
type
    LongItemPtr = ^LongItem;
LongItem = record
data: longint;
next: LongItemPtr;
end;
Now we can start "fleshing out" our "stub" interface subroutines. To begin with, let's note that
only a pointer to the beginning of the list is needed to work with the stack, so we can use a
variable of type LongItemPtr as StackOfLongints. Let's reflect this fact:
type
StackOfLongints = LongItemPtr;
The stub subroutines can now be filled in:

    procedure SOLInit(var stack: StackOfLongints);
    begin
        stack := nil
    end;

    procedure SOLPush(var stack: StackOfLongints; n: longint);
    var
        tmp: LongItemPtr;
    begin
        new(tmp);
        tmp^.data := n;
        tmp^.next := stack;
        stack := tmp
    end;

    procedure SOLPop(var stack: StackOfLongints; var n: longint);
    var
        tmp: LongItemPtr;
    begin
        tmp := stack;
        n := tmp^.data;
        stack := tmp^.next;
        dispose(tmp)
    end;

    function SOLIsEmpty(var stack: StackOfLongints): boolean;
    begin
        SOLIsEmpty := stack = nil
    end;

With these procedures, the solution to the problem of outputting an input sequence of numbers in reverse order, which we considered in the previous paragraph, becomes noticeably more elegant:
    var
        s: StackOfLongints;
        n: longint;
    begin
        SOLInit(s);
        while not SeekEof do
        begin
            read(n);
            SOLPush(s, n)
        end;
        while not SOLIsEmpty(s) do
        begin
            SOLPop(s, n);
            writeln(n)
        end
    end.
The full text of our example can be found in the sol.pas file.
[187] Many programmers prefer, when a subroutine call may result in an error, to make the subroutine not a procedure but a function returning a logical value. For example, we could make SOLPop a function:

    function SOLPop(var stack: StackOfLongints; var n: longint): boolean;

This would make the second loop from our example noticeably shorter:

    while SOLPop(s, n) do
        writeln(n)

The disadvantage of this solution, as you can guess, is that the function has a side effect here. We will refrain from using such techniques until the specifics of the C language force us to do so. By the way, the first edition of this book did exactly that.
Let's try to apply the same methodology to the queue. It is clear that we will definitely need some variable to organize the queue; let its type be called QueueOfLongints. The basic operations on a queue, unlike those of the stack, are usually called put and get; like the stack, the queue will need an initialization procedure and a function to find out whether the queue is empty. The following stubs can be written to fix the interface:

    type
        QueueOfLongints = ;

    procedure QOLInit(var queue: QueueOfLongints);
    begin
    end;

    procedure QOLPut(var queue: QueueOfLongints; n: longint);
    begin
    end;

    procedure QOLGet(var queue: QueueOfLongints; var n: longint);
    begin
    end;

    function QOLIsEmpty(var queue: QueueOfLongints): boolean;
    begin
    end;
For the implementation we will use a list of items of the same LongItem type, but note that we need two pointers of type LongItemPtr to represent the queue; to combine them into one variable, we will use a record:

    { qol.pas }
    type
        QueueOfLongints = record
            first, last: LongItemPtr;
        end;

The initialization procedure sets both pointers to nil:

    procedure QOLInit(var queue: QueueOfLongints);
    begin
        queue.first := nil;
        queue.last := nil
    end;
This code, complete with a demo main program, is in the qol.pas file.
    tmp := first;
    while tmp <> nil do
    begin
        { actions with element tmp^ }
        tmp := tmp^.next
    end
— at the moment of working with an element, we do not remember where the previous element is located in memory and cannot change the value of its next field. We can try to deal with this problem by storing the address of the previous element in the loop variable:
    tmp := first;
    while tmp^.next <> nil do
    begin
        { actions with element tmp^.next }
        tmp := tmp^.next
    end
— but then the first element of the list will have to be processed separately; this, in turn, will
entail special processing for the case of an empty list. Besides, this fragment itself can be
executed only if the list is not empty, otherwise it will crash when trying to calculate the
condition in the loop header; the list may be empty as a result of throwing out the first
elements, so an additional check will be needed here as well. As a result, to remove all
negative values we get something like the following fragment:

    while (first <> nil) and (first^.data < 0) do
    begin
        tmp := first;
        first := first^.next;
        dispose(tmp)
    end;
    if first <> nil then
    begin
        tmp := first;
        while tmp^.next <> nil do
        begin
            if tmp^.next^.data < 0 then
            begin
                tmp2 := tmp^.next;
                tmp^.next := tmp2^.next;
                dispose(tmp2)
            end
            else
                tmp := tmp^.next
        end
    end

It is hard to call the resulting solution beautiful: the fragment is cumbersome because of the two loops and the voluminous ifs, and it is rather difficult to read. Meanwhile, when deleting the first element, exactly the same actions are in fact performed as when deleting any other element, only in the first case they are performed on the pointer first, and in the second case on tmp^.next.
A non-trivial technique involving a pointer to a pointer allows us to reduce the size of the solution and make it clearer. If our list link type is called item, as before, and the pointer type to it is called itemptr, then such a pointer to a pointer is declared as follows:

    var
        pp: ^itemptr;
The resulting variable pp is intended for storing the address of the pointer to the item,
so it can equally well contain both the address of our first and the address of the next
field located in any of the items of the list. This is exactly what we are trying to achieve; we
organize the loop through the list with the removal of negative items by using pp as a loop
variable, and pp will first contain the address of first, then the address of next from
the first link of the list, from the second link, and so on (see Fig. 2.11). In general, pp will at
every moment point to the pointer where the address of the current (considered) link of the
list is located. The initial value of the pp variable will be obtained as a result of the obvious
assignment pp := @first, the transition to the consideration of the next element will
look a bit more complicated:
    pp := @(pp^^.next);
If the two "caps" after pp seem too complicated, remember that pp^ is what pp points to, i.e., as we agreed, the pointer to the current element; hence pp^^ is the current element itself, and pp^^.next is its next field (i.e., just the pointer to the next element); we take its address, put that address into pp, and then we're ready to work with the next element.
The transition to the next element should be performed only if the current element has not been deleted; if the element has been deleted, the transition to the next one happens by itself: although the address in pp will not change, the address stored in the pointer pp^ (whether it is first or some next field, it doesn't matter) will change.
It remains to understand what the condition of such a loop should be. The loop should end when there is no next element, i.e. when the pointer to the next element contains nil. This can be either the first pointer (if the list is empty) or the next field of the last element in the list (if there is at least one element). This condition can be checked by the expression pp^ <> nil. Finally (assuming that the variable tmp is of type itemptr, as before, and the variable pp is of type ^itemptr), our loop will take the following form:
    pp := @first;
    while pp^ <> nil do
    begin
        if pp^^.data < 0 then
        begin
            tmp := pp^;
            pp^ := pp^^.next;
            dispose(tmp)
        end
        else
            pp := @(pp^^.next)
    end
Comparing this solution with the previous one, we can see that it repeats almost verbatim the loop that served as the main loop there, but we have got rid of the separate loop for removing items from the beginning and of the separate checks for list emptiness. The pointer to a pointer allowed us to generalize the main loop, removing the need to handle special cases.
Using a pointer to a pointer allows us not only to delete elements from any place in the list, but also to insert elements at any position: more precisely, at the place currently pointed to by the pointer that the working pointer points to. If the working pointer contains the address of the first pointer, the insertion will be performed at the beginning of the list; if it contains the address of the next field of one of the links, the insertion will take place after that link, i.e. before the next one. It can also be an insertion at the end of the list, if the working pointer holds the address of the next field of the last link. For example, if the working pointer is still called pp, the auxiliary pointer of type itemptr is called tmp, and the number to be added is stored in the variable n, then inserting a link with the number n at the position marked by pp will look like this:
    new(tmp);
    tmp^.next := pp^;
    tmp^.data := n;
    pp^ := tmp;
As you can see, this fragment repeats verbatim the procedure for inserting an element at the beginning of a singly-linked list that we saw on page 413, except that pp^ is used instead of first; as in the case of deleting an element, this method of inserting is a generalization of the procedure for inserting at the beginning. The generalization
is based on the fact that any "tail" of a singly-linked list is also a singly-linked list, only instead
of first, the next field of the element preceding such a "tail" is used for it. We will
use this fact later; among other things, it allows a rather natural application of recursion to the
processing of single-linked lists.
By being able to insert links into arbitrary places in a list, we can, for example, work with
a list of numbers sorted in ascending order (or descending order, or by any other criterion).
At the same time, we need to make sure that each element is inserted exactly in the right
position of the list, so that the list remains sorted after such an addition.
This is done quite simply. If, as before, the number to be inserted is stored in the variable
n, and the list to the beginning of which first points is sorted in ascending order, then
inserting a new element can be done, for example, in the following way:
    pp := @first;
    while (pp^ <> nil) and (pp^^.data < n) do
        pp := @(pp^^.next);
    new(tmp);
    tmp^.next := pp^;
    tmp^.data := n;
    pp^ := tmp;
Only the first three lines are of interest here; we have already seen the other four. The while loop finds the right place in the list or, more precisely, the pointer pointing to the link before which the new element must be inserted. As usual, we start from the beginning; here this means that the address of the pointer first is written into the working pointer. There may be no next element at all, either because the list is empty or because all the values stored in it are less than the one being inserted; in this case the pointer pp^ (i.e. the one our working pointer points to) will have the value nil, hence the first part of the condition in the loop. In such a situation we have nowhere to go further: the new element must be inserted right here.
The second part of the loop condition ensures that we find the right position: as long as
the next element of the list is smaller than the inserted one, we need to move further down the
list.
Here we can pay attention to one important peculiarity of evaluating logical expressions. If, for example, the pointer pp^ is equal to nil, an attempt to access the variable pp^^.data would lead to an immediate program crash, because the record pp^^ simply does not exist. Fortunately, such an access will not happen. This is because Free Pascal uses "lazy" (short-circuit) semantics when evaluating logical expressions: as soon as the value of an expression is already known, its remaining subexpressions are not evaluated. In this case there is an and between the two parts of the condition, so if the first part is false, the whole expression is false, and there is no need to evaluate the second part.
It is useful to know that the original Pascal, as described by Niklaus Wirth, did not have this property: when any expression was evaluated there, all its subexpressions were evaluated. If Free Pascal did the same, we would have had to go to a lot of trouble, because evaluating the condition (pp^ <> nil) and (pp^^.data < n) would be guaranteed to crash if pp^ was nil. It must be said that the break operator was not present in the original Pascal either; instead of it one had to use goto. However, the original Pascal did not have the address-of operation either, so our "pointer to pointer" technique could not be applied there at all.
Free Pascal allows you to use the "classical" semantics of evaluating logical expressions, where
all subexpressions are necessarily evaluated. This is enabled by the {$B+} directive, and disabled
by the {$B-} directive. Fortunately, it is the {$B-} mode that is used by default. Most likely, you
will never need to change this mode in your practice; if you think you do, think again.
Let us note one more point, just in case. For some reason, the authors of many textbooks are afraid to talk about the address-of operation, and without it the technique described here is inaccessible. Instead, such a strange construction as a list with a "key link" is occasionally considered: only the next field of the key link is used (in the role in which we use the separate pointer first), and all this only so that for any link in use one can speak of a "pointer to the previous element" (for the first element this "previous" turns out to be the key link). Needless to say, such solutions are not applied in real life.
For a doubly-linked list it is usual to store two pointers: to the first element of the list and to the last element. An example of a doubly-linked list is shown in Fig. 2.12.
A doubly-linked list can be constructed from links described, for example, as follows:

    type
        item2ptr = ^item2;
        item2 = record
            data: integer;
            prev, next: item2ptr;
        end;
As with the word next for the pointer to the next element, the word prev (from the English previous) traditionally denotes the pointer to the previous element; of course, you can use another name, but then your program will be harder for other people to understand.
It should be said that doubly-linked lists are used somewhat less often than singly-linked ones, because any insertion or deletion of a link requires twice as many pointer operations, and the links themselves become larger due to the presence of two pointers instead of one, i.e. they occupy more memory. On the other hand, doubly-linked lists have a number of undoubted advantages. First of all, there is the obvious symmetry, which allows the list to be traversed both forwards and backwards; in some problems this is important.
The second undoubted advantage of a doubly-linked list is less obvious but sometimes even more important: knowing the address of any link of a doubly-linked list, we can find all its links in memory. Incidentally, the pointer-to-a-pointer technique discussed in the previous paragraph is never necessary when working with a doubly-linked list. Indeed, let the current link be pointed to by a pointer current; if the prev field in the current link is nil, then it is the first link, and when inserting a new element to the left of it, first must be changed; otherwise there is a previous link whose next field must be changed, and we know where it is: the expression current^.prev^.next gives exactly the pointer to be changed. In addition, the current^.prev field must be changed. Inserting to the right of the current link is done in the same way: first change current^.next^.prev, or, if it does not exist (i.e. the current link is the last in the list), then last; then change current^.next.
If the first, last and current links of a doubly-linked list are pointed to by the pointers first, last and current respectively, the new number is in the variable n, and the temporary pointer is called tmp, then inserting a new link to the left of the current one looks like this:

    new(tmp);
    tmp^.prev := current^.prev;
    tmp^.next := current;
    tmp^.data := n;
    if current^.prev = nil then
        first := tmp
    else
        current^.prev^.next := tmp;
    current^.prev := tmp;

Inserting to the right of the current link is symmetrical:

    new(tmp);
    tmp^.prev := current;
    tmp^.next := current^.next;
    tmp^.data := n;
    if current^.next = nil then
        last := tmp
    else
        current^.next^.prev := tmp;
    current^.next := tmp;
Deleting the current link is done along the same lines: depending on whether the neighbours exist, we change either first or the next field of the previous link, and either last or the prev field of the next link:

    if current^.prev = nil then
        first := current^.next
    else
        current^.prev^.next := current^.next;
    if current^.next = nil then
        last := current^.prev
    else
        current^.next^.prev := current^.prev;
    dispose(current);

This fragment has a disadvantage: after its execution the pointer current points to a non-existent (destroyed) element. This can be avoided by destroying the element through a temporary pointer, having first moved current to the previous or the next element (before destroying the current one), depending on whether we traverse the list forwards or backwards.
Adding an element to the beginning of a doubly-linked list, taking into account the possible special case, can look like this:

    new(tmp);
    tmp^.data := n;
    tmp^.prev := nil;
    tmp^.next := first;
    if first = nil then
        last := tmp
    else
        first^.prev := tmp;
    first := tmp;
Adding to the end is done in the same way (with the direction reversed):

    new(tmp);
    tmp^.data := n;
    tmp^.prev := last;
    tmp^.next := nil;
    if last = nil then
        first := tmp
    else
        last^.next := tmp;
    last := tmp;
We can generalize the above procedure for inserting to the left of the current link by agreeing that if the current link does not exist (i.e., the insertion is "to the left of a nonexistent link"), then it is an insertion at the end of the list. It will look like this:
    new(tmp);
    if current = nil then
        tmp^.prev := last
    else
        tmp^.prev := current^.prev;
    tmp^.next := current;
    tmp^.data := n;
    if tmp^.prev = nil then
        first := tmp
    else
        tmp^.prev^.next := tmp;
    if tmp^.next = nil then
        last := tmp
    else
        tmp^.next^.prev := tmp;
A similar generalization of insertion to the right of the current link, where "insertion to the right of a nonexistent link" means insertion at the beginning, is done almost the same way; only the lines filling the fields tmp^.prev and tmp^.next change (in our fragment these are lines two through six).
We do not provide schematic diagrams of pointer changes for all these cases, leaving the
visualization of what is happening to the reader as a very useful exercise.
Doubly-linked lists allow us to create an object commonly called a deque [188]. A deque is an abstract object that supports four operations: add to the beginning, add to the end, extract from the beginning, and extract from the end. A value added to the beginning of a deque can be immediately retrieved back (by extracting from the beginning), but if we extract values from the end, the value just added to the beginning will be retrieved after all the other values stored in the deque; the situation is symmetrical for adding to the end and extracting from the beginning.
When only the "add to beginning" and "extract from beginning" operations are used, the deque turns into a stack (likewise when adding and extracting at the end), and when "add to beginning" and "extract from end" are used, into a queue; but you should not use a deque as a stack or a queue, because for those there are much simpler implementations: a stack and a queue can be implemented with a singly-linked list, while a deque cannot (or, rather, one could use two such lists, but this is very inconvenient). Note that in English the corresponding operations are usually called push front, push back, pop front and pop back.
As usual, we can start the implementation of a deque with a stub. For example, for a deque storing numbers of the longint type the stub could be as follows (as before, we assume that before calling the extraction procedures we always check whether the object is empty):
type
    LongItem2Ptr = ^LongItem2;
LongItem2 = record
data: longint;
prev, next: LongItem2Ptr;
end;
LongDeque = record
first, last: LongItem2Ptr;
end;
[188] In fact, the origin of the term "deque" is not so obvious. Originally, the object in question was called a double-ended queue; these three words were shortened by English-speaking programmers first to dequeue, and then to deque.
    procedure LongDequeInit(var deque: LongDeque);
    begin
    end;

    procedure LongDequePushFront(var deque: LongDeque; n: longint);
    begin
    end;

    procedure LongDequePushBack(var deque: LongDeque; n: longint);
    begin
    end;

    procedure LongDequePopFront(var deque: LongDeque; var n: longint);
    begin
    end;

    procedure LongDequePopBack(var deque: LongDeque; var n: longint);
    begin
    end;

    function LongDequeIsEmpty(var deque: LongDeque): boolean;
    begin
    end;
We leave it to the reader to implement all these procedures and functions.
of a "leaf" node (in the tree in the figure this height is three). Note that a binary tree of height h can contain up to 2^h - 1 nodes; for example, a tree of height 20 can contain more than a million values, and a search will take at most 20 comparisons; if the same values were stored in a list, each search would require half a million comparisons on average.
Of course, on closer examination everything turns out to be not so simple and rosy. A tree is not always as densely filled with nodes as in our figure. An example of such a situation is shown in Fig. 2.14: with height 5, the tree there contains only nine nodes out of 31 possible. Such a tree is called unbalanced. In the worst case the tree can take a degenerate form, when its height equals the number of elements; this happens, for example, if the values inserted into the tree arrive already sorted in ascending or descending order. Note, however, that what is the worst case for a tree is the only possible case for a list. Besides, there are tree balancing algorithms that rebuild a search tree, reducing its height to the minimum possible; these algorithms are quite complex, but they can be implemented.
We will return to binary search trees in the chapter devoted to "advanced" uses of recursion. The point is that all basic operations on a binary tree, except balancing, are many times (!) easier to write with recursion than without it. For now, let us note that trees, of course, are not only binary; in the general case the number of descendants of a tree node can be made arbitrarily large or even changed dynamically.
For example, there is a well-known method of storing text strings in which each node of the tree can have as many descendants as there are letters in the alphabet; when searching, the descendant of the current node is selected according to the next letter of the string. The height of such a tree equals the length of the longest string stored in it, and it needs no balancing.
The search time in a list is proportional to the length of the list; the search time in a balanced binary tree is proportional to the binary logarithm of the number of nodes. For cases when the search time is critical and the amount of stored data is large, there is an approach in which the search time does not depend on the number of stored elements at all: it remains constant whether there are a dozen values or ten million. The approach is called a hash table.
To build a hash table, a so-called hash function is used. In essence, it is just some arithmetic function that produces an integer for a given value of the search key, and this number should be difficult to predict; it is said that the hash function should behave like a well-distributed random variable. At the same time, the hash function must depend only on the value of the key, i.e. it must always take the same value on the same key.
The hash table itself is an array whose size is chosen to exceed the expected number of records; initially, all elements of the array are marked as free. The value of the hash function is used to determine the array position where the record with the given key should be located; this is done by simply taking the remainder of the hash value divided by the array length. To reduce the probability of two keys hitting the same position, the size of the array used for the table is usually chosen to be a prime number [189].
If, when a new element is entered into the hash table, the calculated position is free, the new record is stored in that position. If the calculated position is free when searching for a key, it is considered that there is no record with this key in the table. A more interesting question is what to do if a position is occupied but the key of the record in it differs from the key we need; this situation means that two different keys, despite all our efforts, give the same remainder when the hash function is divided by the array length. This is called a collision. There are two main methods of collision resolution. The first is very simple: the array stores not the records themselves but pointers that form a list (an ordinary singly-linked list) of the records whose keys give the same hash value (or rather, not the same hash, but the same remainder of its division by the array length). Despite its apparent simplicity, the method has a serious drawback: the lists themselves are quite bulky, and working with them takes time.
The second way of resolving collisions is trickier: if a position in the table is occupied by a record whose key does not match the one being sought, the next position is probed, and if it is occupied too, the one after it, and so on. When a new record is entered into the table, it is simply placed in the first free position found (starting from the position given by the hash function); when searching, the records are examined one by one in search of the required key, and if a free position is encountered, the required key is considered absent from the table. A disadvantage of this method is a rather unobvious algorithm for deleting records from the table. The simplest way is as follows: if the positions immediately following the deleted record are occupied, find the first free position in the table, then take each record located between the deleted one and that free position, temporarily remove it from the table, and insert it back by the usual rule (the record may end up earlier, or in exactly the same place). Clearly, this may take quite a lot of time. There is a more efficient deletion procedure that actually does the same thing but in some cases finishes before it reaches an empty position (see, e.g., [10], §6.4, "Algorithm R"); this procedure is harder to explain, and it is easy to make a mistake if it is not clearly understood.
It should be noted that both methods (the first to a lesser extent, the second to a greater one) are sensitive to how full the table is. It is considered that a hash table should be no more than about two-thirds full, otherwise the constant linear searching (through the lists or through the table itself) will negate all the benefits of hashing. If there are too many records in the table, it must be rebuilt with a new size: for example, one can double the current size, take the nearest prime number above the result, and declare it the new table size. Rebuilding a table is an extremely expensive operation, because every record from the old table has to be entered into the new one by the usual rule, with the hash function computed and the remainder taken, and there are no shortcuts for this.
One way or another, building hash tables requires arrays with unknown, dynamically determined, lengths. The original Pascal language did not include such a facility; modern dialects, including Free Pascal, support dynamic arrays, but we will not consider them; if you wish, you can learn this tool on your own.

[189] Recall, just in case, that a prime number is a natural number that is divisible only by one and by itself.

§2.11. More on recursion
Such declarations allow us not to write the subroutine body yet, usually because not all the
subroutines called from that body have been described in the program so far. The declaration
tells the compiler, first, the name of the subroutine and, second, all the information necessary
to check the correctness of calls to it and to generate the machine code that performs such
calls (this may require conversions of expression types, so the types of the subroutine's
parameters must be known to the compiler at the point of call for more than just correctness
checking).
When constructing mutual recursion, we first provide in the program a declaration (with
the word forward) of the first of the subroutines involved in it, then describe (completely,
with a body) the second of these subroutines, and then give a complete description of the
first. In both bodies the compiler can see the names of both subroutines and all the
information needed to call them, which is exactly what we needed.
Clearly, the declared subroutine must be described later; if this is not done, an error will
occur at compile time.
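To make this concrete, here is a minimal sketch of mutual recursion using forward; the two procedures and their nameses are invented for illustration, not taken from the book:

```pascal
procedure PrintB(n: integer); forward;

procedure PrintA(n: integer);
begin
    if n > 0 then
    begin
        writeln('A: ', n);
        PrintB(n - 1)     { calls PrintB, which is only declared so far }
    end
end;

procedure PrintB(n: integer);
begin
    if n > 0 then
    begin
        writeln('B: ', n);
        PrintA(n - 1)     { calls back into PrintA }
    end
end;
```

The forward declaration lets the compiler check and compile the call to PrintB inside PrintA before PrintB's body has been seen.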
be formalized in the form of a procedure which, for clarity, receives four parameters: the
number of the source rod (source), the number of the target rod (target), the number
of the intermediate rod (transit) and the number of disks (n). The procedure itself will
be called solve. During its work it will print lines like "1: 3 -> 2" or "7: 1 -> 3",
which mean, respectively, the transfer of disk number 1 from the third rod to the second and
of disk number 7 from the first to the third. The main program, having converted the
command line parameter into a number (let's not be like school teachers and read this
number from the keyboard; that is inconvenient and simply silly), will call the solve
subroutine with the appropriate arguments:
var
n, code: integer;
begin
if ParamCount < 1 then
begin
writeln(ErrOutput, 'No parameters given');
halt(1)
end;
val(ParamStr(1), n, code);
if (code <> 0) or (n < 1) then
begin
writeln(ErrOutput, 'Invalid token count');
halt(1)
end;
solve(1, 3, 2, n)
end.
Note that the procedure that solves the problem consists of only eight lines, three of which are
service lines.
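For reference, such an eight-line recursive solve procedure might look as follows; this is a reconstruction from the description above (move n-1 disks aside, move the largest disk, move n-1 disks on top of it), not necessarily the book's exact text:

```pascal
procedure solve(source, target, transit, n: integer);
begin
    if n > 1 then
        solve(source, transit, target, n - 1);   { clear the largest disk }
    writeln(n, ': ', source, ' -> ', target);    { move the largest disk  }
    if n > 1 then
        solve(transit, target, source, n - 1)    { put the rest on top    }
end;
```

The header, begin and end are the three "service" lines; the other five do the actual work.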
Let us now try to do without recursion. First, let us recall the algorithm we gave on page
183: on odd moves the smallest disk is moved "in a circle": with an even total number of
disks, in the "forward" order, i.e. 1 → 2 → 3 → 1 → ...; with an odd total number, in the
"reverse" order, i.e. 1 → 3 → 2 → 1 → ... As for even moves, on them we do not touch the
smallest disk, as a result of which the move is unambiguous.
An attempt to implement this algorithm in the form of a program encounters an
unexpected obstacle. For a human, an action like "look at the rods, find the smallest disk and,
without touching it, make the only possible move" is so simple that we execute such an
instruction without thinking for a second; in a program, however, we have to remember which
rod currently has which disks on it and perform many comparisons, taking into account that
the rods may turn out to be empty.
In the archive of examples to this book you will find the corresponding program in a file
named hanoi2.pas. Here we do not give its text in order to save space. Let us only note
that to store information about the location of disks on the rods, we used three single-linked
lists in this program, one for each rod, and the first list at the beginning of the work contains
numbers from n to 1 in reverse order, where n is the number of disks; to store pointers to the
first elements of these lists, we use an array of three corresponding pointers. To solve the
problem, we start a loop that runs as long as at least one "disk" (i.e., a list element) is present
in lists #1 and #2. We simply compute the motion of the smallest disk on odd-numbered
moves by formulas, for which we use the move number. In particular, the number of the rod
from which the disk must be taken is calculated as follows for an even total number of
disks:
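The formulas themselves can be sketched as follows; this is one plausible reconstruction, assuming the forward cycle 1 → 2 → 3 → 1 for an even total number of disks (the variable names k, m, src and dst are ours):

```pascal
{ k is the (odd) overall move number;                       }
{ m is the ordinal number of this move of the smallest disk }
m := (k + 1) div 2;
src := (m - 1) mod 3 + 1;   { rod the smallest disk is taken from }
dst := m mod 3 + 1;         { rod it is put on (forward order)    }
```

A quick check for two disks: on move 1 we get 1 -> 2, on move 3 we get 2 -> 3, which matches the known optimal solution; for an odd total number of disks the two formulas would swap places to give the reverse cycle.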
The case of even-numbered moves is more complicated. To calculate such a move, we must
first find out on which of the rods the smallest disk is located and exclude this rod from
consideration. Then, for the two remaining rods, we need to determine in which direction the
disk is moved between them, taking into account that one of them may be empty (in which
case the disk is moved to it from the other rod), or both may contain disks, in which case the
smaller of the two top disks is moved onto the rod where the larger one lies. The body of the
subroutine containing these actions, despite their apparent simplicity, took 15 lines.
The total length of the program was 111 lines (against 27 for the recursive solution). If
we discard the empty and service lines, as well as the text of the main part of the program,
which is practically the same in both cases (only the parameters of the solve procedure call
differ), and count only the significant lines that implement the solution, the recursive version
had eight such lines (the text of the solve subroutine), while the new solution has 87 lines.
In other words, the solution has become more complicated by more than an order of
magnitude!
Now let's try to make a recursion-free solution that does not use the "tricky" algorithm
above. Instead, recall that to move all disks from one rod to another, we must first move
all but the largest disk to the intermediate rod, then move the largest disk to the target rod,
and finally move all the other disks from the intermediate rod to the target one. Although this
description of the algorithm is obviously recursive, it is possible to implement it without
recursion; in principle, this is true for any algorithm, i.e. recursion can always be replaced
by a loop; the only question is how difficult that is.
The difficulty is that at each moment we have to remember what we are moving to where
and for what purpose. For example, in the process of solving the problem for four disks, at
some point we move the first (smallest) disk from the second rod to the third, to move two
disks from the second rod to the first, to be able to move the third disk from the second rod to
the third, to move three disks from the second rod to the third (since we have already moved
the fourth disk there), to move all four disks from the first to the third rod. This can be
somewhat more conveniently diagrammed "from the end":
• we solve the problem of transferring four disks from the first rod to the third one, and
we have already removed all disks to the second rod, transferred the fourth disk to the
third rod and now we are in the process of transferring all other disks there, i.e.
• we solve the problem of transferring three disks from the second rod to the third one,
and now we are trying to free the third disk in order to transfer it to the third rod, for
which purpose
• we solve the problem of transferring two disks from the second rod to the first one, and
now we are trying to free the second disk in order to transfer it to the first rod, for which
purpose
• transfer the first disk from the second rod to the third.
Obviously, here we are dealing with tasks that are characterized by information about how
many disks we want to move, from where, to where, and what state we are in. We have a
variable number of tasks, so we'll have to use some kind of dynamic data structure, the easiest,
I guess, is a regular list. For each task we have to store the number of disks and two numbers
of rods, as well as the state of things, and we can distinguish three such states:
1. we have not yet done anything at all; the next action in this case is to clear the largest
of our disks, removing all that are smaller than it to the intermediate rod;
2. we have already cleared the largest disk; now we need to move it, and then move all the
disks smaller than it on top of it;
3. we have already solved the task, so we can take it off the list.
In principle, one of these states (the last one) could be dispensed with, but only at the
expense of program clarity. To denote the states, we will use an enumerated type with three
possible values, each denoting what should be done next when we see the given task again:
type
    TaskState = (StClearing, StLargest, StFinal);
    ptask = ^task;
    task = record            { one pending subtask            }
        amount: integer;     { how many disks to move         }
        src, dst: integer;   { from which rod to which        }
        state: TaskState;
        next: ptask
    end;
var
    first, tmp: ptask;
To solve the problem we need to move all disks (n pieces) from the first rod to the third one.
Let's formalize this as a task and put the resulting task in the list (as a single item), considering
that we haven't done anything yet, so we'll specify StClearing as the task state:
new(first);
first^.amount := n;
first^.src := 1;
first^.dst := 3;
first^.state := StClearing;
first^.next := nil;
Next, a loop is organized which runs until the task list is empty. At each step of the loop, the
actions to be performed depend on the state of the task at the head of the list. If it is in
the StClearing state, then, in the case when this task requires moving more than one
disk, another task must be added to move n - 1 disks to the intermediate rod; if the current
task was created to move only one disk, this is not needed. In both cases the current
task itself is moved to the next state, StLargest; that is, when we see it next time
(which will be either after all the smaller disks have been moved or, if there is only one disk,
right at the next step), we will need to move the largest of the disks implied by this task and
proceed to the final stage.
If at the head of the list there is a task in the StLargest state, the first thing we do is
move the largest of the disks to which this task applies; the number of this disk coincides
with the number of disks in the task. By "move a disk" we mean here that we simply print the
appropriate message. After that, if there is more than one disk in the task, we need to
add a new task to move all the smaller disks from the intermediate rod to the target rod; if
there is only one disk, this is not needed. In any case, the current task is put into the
StFinal state, so that the next time we see it at the head of the list, it can be eliminated.
The next problem we face is the calculation of the intermediate rod number; we have
provided a separate function in the program for this purpose:
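The function itself was elided here; a sketch of what it might look like follows (the name IntermRod matches its later use in the loop, while the body relies on the standard observation that the three rod numbers sum to 6):

```pascal
function IntermRod(a, b: integer): integer;
begin
    { the rods are numbered 1, 2 and 3, so their numbers sum to 6; }
    { given two distinct rods, this yields the remaining one       }
    IntermRod := 6 - a - b
end;
```

For example, IntermRod(1, 3) yields 2, the rod standing between the source and the target.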
Using this function, the basic "problem solving" loop looks like this:
while first <> nil do begin
    case first^.state of
        StClearing:
            begin
                first^.state := StLargest;
                if first^.amount > 1 then
                begin
                    new(tmp);
                    tmp^.src := first^.src;
                    tmp^.dst := IntermRod(first^.src, first^.dst);
                    tmp^.amount := first^.amount - 1;
                    tmp^.state := StClearing;
                    tmp^.next := first;
                    first := tmp
                end
            end;
        StLargest:
            begin
                first^.state := StFinal;
                writeln(first^.amount, ': ', first^.src,
                    ' -> ', first^.dst);
                if first^.amount > 1 then
                begin
                    new(tmp);
                    tmp^.src := IntermRod(first^.src, first^.dst);
                    tmp^.dst := first^.dst;
                    tmp^.amount := first^.amount - 1;
                    tmp^.state := StClearing;
                    tmp^.next := first;
                    first := tmp
                end
            end;
        StFinal:
            begin
                tmp := first;
                first := first^.next;
                dispose(tmp)
            end;
    end
end;
The full program text can be found in the file hanoi3.pas; note that its size is 90 lines,
of which 70 are significant lines of the implementation of the solve procedure (including
auxiliary subroutines). The result is somewhat better than in the previous case, but still,
compared to the recursive solution, the difference is almost an order of magnitude.
end;
idxs := idxs + 1;
idxp := idxp + 1
end
end;
begin
if ParamCount < 2 then
begin
writeln(ErrOutput, 'Two parameters expected');
halt(1)
end;
if Match(ParamStr(1), ParamStr(2)) then
writeln('yes') else
writeln('no')
end.
When working with this program, note that the characters "*" and "?" are treated by the
command interpreter in a special way: it, too, considers it its duty to perform some matching
against the pattern (see §1.2.7 for details). The easiest and safest way to avoid trouble with
special characters is to enclose your parameters in apostrophes; the command interpreter does
not perform any substitutions inside them, and the apostrophes themselves will disappear
when passed to your program, i.e. you will not see them as part of the strings returned by
the ParamStr function. A session of work with the match_pt program may look like this:
et cetera, et cetera.
2.11.4. Recursion when working with lists
As already mentioned, a singly-linked list is recursive in nature: one can consider that a
list is either empty or consists of a first element followed by a list. This property can be
exploited by processing singly-linked lists with recursive subroutines, the empty list almost
always serving as the recursion basis. Suppose, for example, we have a simple list of integers
consisting, as in the previous examples, of items of type item:
type
    itemptr = ^item;
    item = record
        data: integer;
        next: itemptr
    end;
Let's start with a simple calculation of the sum of numbers in such a list. Of course, we can
loop through the list, as we have done many times before:
function ItemListSum(p: itemptr): integer;
var
    sum: integer;
    tmp: itemptr;
begin
    tmp := p;
    sum := 0;
    while tmp <> nil do begin
        sum := sum + tmp^.data;
        tmp := tmp^.next
    end;
    ItemListSum := sum
end;
Now let's try to take advantage of the fact that the sum of an empty list is zero, and the sum
of a non-empty list is equal to the sum of its remainder to which the first item is added. The
new (recursive) implementation of the ItemListSum function is as follows:
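The recursive version was elided here; reconstructed from the description just given (the sum of an empty list is zero, the sum of a non-empty list is its first item plus the sum of the remainder), it might look like this:

```pascal
function ItemListSum(p: itemptr): integer;
begin
    if p = nil then
        ItemListSum := 0                          { recursion basis     }
    else
        ItemListSum := p^.data + ItemListSum(p^.next)  { head + remainder }
end;
```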
To be honest, if your list contains several million records, it's better not to do this because you
may run out of stack, but on the other hand, lists are not usually used to store millions of
records. If you are not threatened by stack overflow, then successful recursive solutions,
strange as it may seem, may work even faster than "traditional" loops.
We will continue with the procedure of deleting all elements of a list. How this is done
with a loop we have considered in detail earlier; now let us note that to delete an empty list
nothing needs to be done, while to delete a non-empty list we should free the memory of its
first element and delete its remainder. Of course, the remainder should be deleted first,
so as not to lose the pointer to that remainder when deleting the first element. Let's write:
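Following that description, a sketch of the recursive deletion could be (the procedure name DisposeItemList is ours):

```pascal
procedure DisposeItemList(p: itemptr);
begin
    if p <> nil then
    begin
        DisposeItemList(p^.next);   { delete the remainder first       }
        dispose(p)                  { only then free the first element }
    end
end;
```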
We can't say that we gain much in code size here, but the fact that recursive list deletion is
easier to read is undoubted: you don't even need to draw diagrams to understand it.
Let us now consider a more complicated example. On p. 430 we looked at an example of
code inserting a new element into a list of integers sorted in ascending order while preserving
the sorting. To solve this problem cyclically, we needed a pointer to a pointer. Note now that
since we need to insert a new element into the sorted list, we have three possible cases:
• the list is empty - you need to insert an element into it first;
• the list is non-empty, and its first element is larger than the one to be inserted - we need
to insert a new element at the beginning;
• the list is non-empty, and the first element is less than or equal to the inserted one - it
should be inserted into the rest of the list.
As we have already discussed, if the pointer to the first element of the list is passed to a
subroutine as a parameter-variable, such a subroutine will be able to insert and delete elements
anywhere in the list, including at the beginning. Besides, it will be useful to remember that
inserting an element into an empty singly-linked list and inserting an element at the
beginning of a non-empty list are performed in exactly the same way, which allows us to
combine the first two cases into one, which will serve as the basis for recursion. Note that the
role of the "pointer to the first element" for the list that is the remainder of the original list is
played by the next field in the first element of the original list. Taking all this into account,
the procedure that inserts a new element into the sorted list while preserving sorting will look
like this:
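A sketch following the three cases above (empty list or a larger first element: insert at the head; otherwise recurse into the remainder); the procedure name is ours:

```pascal
procedure InsertSorted(var p: itemptr; val: integer);
var
    tmp: itemptr;
begin
    if (p = nil) or (p^.data > val) then
    begin                           { empty list, or insertion at the head }
        new(tmp);
        tmp^.data := val;
        tmp^.next := p;
        p := tmp
    end
    else
        InsertSorted(p^.next, val)  { insert into the remainder;            }
                                    { p^.next plays the "first pointer" role }
end;
```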
Compare this with the code on page 430 and its explanations. Comments, as they say, are
unnecessary.
To work with it, we need a pointer, which is often called the root of the tree; however, the
root is just as often the initial node itself, not the pointer to it. From the context it is usually
easy to understand what we mean. Let us describe the root pointer as follows:
var
    root: TreeNodePtr = nil;

Now we can start describing the basic tree operations. Since we will have to use recursion
actively, we will formalize each action as a subroutine. To understand roughly how it will
look, let's assume that the tree has already been built and write a function that calculates
the sum of the values in all its nodes. For this
purpose, note that for an empty tree such a sum is obviously equal to zero; if the tree is non-
empty, then to calculate the sum we will have to first calculate the sum of the left subtree,
then the sum of the right subtree, add them together and add the number from the current
node. Since the left and right subtrees are the same trees as the whole tree, but with fewer
nodes, we can use the same function we are writing to calculate subtree sums:
function SumTree(p: TreeNodePtr): longint;
begin
    if p = nil then
        SumTree := 0
    else
        SumTree := SumTree(p^.left) + p^.data + SumTree(p^.right)
end;
As a matter of fact, almost everything we do with a tree follows the same scheme: an empty
tree is used as a degenerate case for the recursion basis, and then recursive calls are made for
the left and/or right subtree. Let's continue our set of examples with a subroutine that adds a
new element to the tree; when the tree is empty, we need to create the element "right here",
i.e. change the pointer that serves as the root pointer for this tree; with this in mind, we will
pass the pointer to the procedure as a parameter-variable. If the tree is not empty, i.e. at least
its root element exists, then three cases are possible. First, the element to be added may be
strictly smaller than the one in the root element; then the addition should be done in the left
subtree. Second, it may be strictly larger, in which case the right subtree should be used. The
third variant spoils the whole picture by adding a special case: the numbers may be equal.
Depending on the task, different approaches to further actions are possible in such a situation:
sometimes an error is generated when keys match, sometimes some counter is incremented to
show that the given key was entered one time more, sometimes nothing is done at all. We will
stop at informing the caller that the addition cannot be performed; for this purpose, we will
provide our procedure with a parameter of type boolean, into which it will write "true"
when the value is successfully added, and "false" when there is a key conflict. The result is
the following:
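The procedure itself was elided here; a sketch matching the description (the field names data, left and right follow their use elsewhere in the section) could be:

```pascal
procedure AddToTree(var p: TreeNodePtr; val: longint; var ok: boolean);
begin
    if p = nil then
    begin                       { the place for the new node is found }
        new(p);
        p^.data := val;
        p^.left := nil;
        p^.right := nil;
        ok := true
    end
    else
    if val < p^.data then
        AddToTree(p^.left, val, ok)
    else
    if val > p^.data then
        AddToTree(p^.right, val, ok)
    else
        ok := false             { key conflict }
end;
```

Since p is a parameter-variable, the recursive calls on p^.left and p^.right let the procedure change the very pointer that should point at the new node.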
If you try to write a procedure that determines whether a given number exists in a given tree,
you will get something very similar. Actually, it would be more correct to formalize this
subroutine as a function, because it doesn't "do" anything in the sense of changes, it just
calculates the result; but now it is more important to show the resulting similarity between the
two procedures:
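The second procedure was elided here; shaped to mirror the adding procedure above, it might look like this (the names IsInTree and found are ours):

```pascal
procedure IsInTree(p: TreeNodePtr; val: longint; var found: boolean);
begin
    if p = nil then
        found := false
    else
    if val < p^.data then
        IsInTree(p^.left, val, found)
    else
    if val > p^.data then
        IsInTree(p^.right, val, found)
    else
        found := true
end;
```

The branching structure is identical to AddToTree's; only the actions in the leaves differ.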
The similarity of the two procedures is not accidental, because in both cases the search is
performed. To search for the required position, it seems worth writing one generalized
subroutine, which can be used to implement both addition and presence check: this subroutine
will search for a position in the tree for a given tree and value, where a node with this value
should be (but is not necessarily) located. This "position" is nothing but the address of the
pointer that points to the corresponding node or should point to it if the node does not exist
yet. For a change, let's still formalize this subroutine as a function, because it simply calculates
its result without changing anything anywhere. Since the pointer address will be the return
value of the function, we will have to describe and name the corresponding value type:
type
    TreeNodePos = ^TreeNodePtr;
Our function must in some cases return the address of the pointer given to it - if the tree is
empty, and also if the root element of the tree contains the number we are looking for. To
return such an address, it must at least be known, and for this purpose we will pass the pointer
as a parameter variable; the name of such a parameter, as we remember, becomes synonymous
with the variable used as a parameter at the call point during the execution of the subroutine,
so that the address taking operation applied to the parameter name will give us the address of
this variable. Secondly, the cases of an empty tree and equality of the sought value to the value
at the root node can now be combined: in the case of equality we have found the position
where the corresponding number is located, and in the case of an empty subtree - the position
where it should be located. The caller can distinguish between these two cases by checking
whether the pointer located at the address obtained from the function is equal to nil or not.
The text of the function is as follows:
function SearchTree(var p: TreeNodePtr; val: longint): TreeNodePos;
begin
    if (p = nil) or (p^.data = val) then
        SearchTree := @p
    else
    if val < p^.data then
        SearchTree := SearchTree(p^.left, val)
    else
        SearchTree := SearchTree(p^.right, val)
end;
Using this function, we can rewrite two previously written subroutines in a new way, making
them noticeably shorter. There will be no resemblance between their texts now, because
everything they had in common we took out in SearchTree; so now nothing prevents us
from formalizing IsInTree as a function after all. The result will be like this:
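The rewritten subroutines were elided here; a possible reconstruction (the exact text is in treedemo.pas) is:

```pascal
procedure AddToTree(var root: TreeNodePtr; val: longint; var ok: boolean);
var
    pos: TreeNodePos;
begin
    pos := SearchTree(root, val);
    if pos^ = nil then
    begin                       { the position is free: create the node }
        new(pos^);
        pos^^.data := val;
        pos^^.left := nil;
        pos^^.right := nil;
        ok := true
    end
    else
        ok := false             { a node with this key already exists }
end;

function IsInTree(var root: TreeNodePtr; val: longint): boolean;
begin
    IsInTree := SearchTree(root, val)^ <> nil
end;
```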
A small demonstration program using these functions can be found in the file
treedemo.pas. Unfortunately, we have to remind you of a serious disadvantage of binary
search trees: a tree may become unbalanced if the order in which the stored values are
entered into it is unfortunate. There are several approaches to constructing a binary search
tree in which unbalance either does not occur at all or is quickly eliminated, but the story
about them is far beyond the scope of our book.
In case the reader has an irresistible desire to learn tree balancing algorithms, we will take
the liberty of drawing his attention to several books in which the relevant issues are discussed
in detail. We should start with N. Wirth's textbook "Algorithms and Data Structures" [8]; the
presentation in this book is characterized by brevity and understandability, since it is intended
for beginners. If this is not enough, try the huge book by Cormen, Leiserson and Rivest [9];
finally, for the strong-hearted there is also the four-volume book by Donald Knuth, the third
volume of which [10] contains a detailed analysis of all kinds of data structures oriented
towards sorting and searching.
As an exercise, we venture to suggest that the reader try to write subroutines for working
with trees without recursion. Note that the tree search is realized relatively simply and is a bit
like inserting an element into the right place of a singly-linked list, as it was shown in §2.10.6.
The algorithm for traversing the tree, which is needed, for example, to calculate the sum of
the tree's elements, is much more complicated; but even here we have already considered
something similar when we solved the problem of the Towers of Hanoi (§2.11.2). In the program we
began discussing on p. 445, we had to memorize the path by which we arrived at the need to
do this or that action, and then go back and continue the interrupted activity. Traversing the
tree is essentially the same thing: for each node we reach, we have to remember where we
came from, as well as what we have already done at that node, or, more precisely, what we
will have to do when we return to that node again.
If it doesn't work, that's okay, this task is quite difficult; we'll come back to it when we
learn the C language.
2.12. More about program design
If builders built houses the way programmers write
programs, the first woodpecker that flew in would destroy
civilization.
Weinberg's second law
We have already repeatedly drawn the reader's attention (see page 241) to the fact that
a program text cannot be written just anyhow: it is intended first of all for the human reader
and only secondarily for the compiler, and if you forget this, not only will other people fail
to understand anything in your program, but, which is especially vexing, you risk getting
lost in your own program before you finish it.
There are a number of simple but very important rules that improve the comprehensibility
and readability of a program. Let's try to formulate the most important of them.
Anyone working with computer program texts will understand at least a not very complex text
in English, because this is the language in which documentation for various libraries, all
sorts of specifications and descriptions of network protocols are written, and in which many
books on programming are published; of course, many books and other texts are translated into
Russian (as well as into French, Japanese, Hungarian, Hindi and other national languages), but
it would be unreasonable to expect that any text you need will be available in Russian.
Three important requirements follow from this. First, any identifiers in the program
must consist of English words or be abbreviations of English words. If you have forgotten
the word you need, don't hesitate to look it up in the dictionary. Let us emphasize that the
words should be English - not German, not French, not Latin and especially not Russian
"translit" (the latter is generally considered by professionals to be extremely bad taste).
(We will leave aside the question of whether this is good or bad and confine ourselves to
stating the fact.)
It should be noted, however, that doctors and pharmacists all over the world have a tradition of filling
prescriptions and some other medical documents in Latin, and, for example, the official language of the
Universal Postal Union is French; in any case, the existence of a common professional language of
communication is useful in many ways.
Secondly, the comments in the program must be written in English; as already mentioned,
it is better not to write comments at all than to try to write them in a language other than
English. And finally, the user interface of the program should be either English or
"international", i.e. allowing translation into any language.
By now some readers may have asked a legitimate question: "What should I do if I don't
know English?" The answer is trivial: learn it, urgently. A programmer who cannot write
English more or less competently (let alone understand it) is professionally unfit in modern
conditions, however unpleasant that may sound.
It remains to ask what to do if, according to the requirements, the program must
communicate with the user in a language other than English; by the rules formulated above
this means it must be made international, but how is this done correctly? The basic
principle is quite simple: all messages in languages other than English must be placed in
separate files external to your program. The program will then read these files, analyze
them and give the user the messages obtained from them; this does not violate our
principles in any way, because a program may process absolutely any data, including texts
in any language, and the prohibition on languages other than English concerns only the text
of the program itself.
Free Pascal even contains special tools for creating international programs, but we will
not consider these tools, to save time and effort, in the hope that the reader will not stop
at learning Pascal. In the second volume of our book we will discuss one of the libraries
designed for creating international C programs.
In all Pascal programs in this part of the book we always moved the word begin to the
next line after the statement header (while or if) and wrote it starting at the same position
where the header begins, i.e. we did not give the statement brackets an extra indent (Figure
2.15, left). Two other options are allowed. The word begin can be left on the same line as
the statement header, with end placed exactly under the beginning of the statement (not under
the word begin!), as shown in the middle of the same figure; this is how C programs will
be laid out in the relevant parts of our book. Finally, you can move begin to the next line
but give the statement brackets an indent of their own (this option is shown on the right in
the figure). When using the last option, an indentation of two spaces is usually chosen,
because otherwise the horizontal space on the screen is quickly exhausted; as we have already
mentioned, this option is acceptable, but we will not recommend it.
Just remember that the body should always be placed on a separate line, and in the case of an
if statement, this applies to both branches. Thus, the following variant is
unacceptable:
For this case we will also recommend indenting the labels (we did exactly that in our
examples), but we will leave the final decision to the reader.
Finally, if we not only move begin to the next line but also give the compound
statement an indent of its own, it will look like this:
case Operation of
   '+':
      begin
         writeln('Addition');
         c := a + b
      end;
   '-':
      begin
         writeln('Subtraction');
         c := a - b
      end;
   else
      begin
         writeln('Error');
         c := 0
      end
end

case Operation of
'+':
   begin
      writeln('Addition');
      c := a + b
   end;
'-':
   begin
      writeln('Subtraction');
      c := a - b
   end;
else
   begin
      writeln('Error');
      c := 0
   end
end
For this case, our recommendation would be the opposite: if we equip compound
operators with a separate shift, it is better not to shift the labels in the selection operator,
the result will be more aesthetically pleasing.
It is easy to see that with this approach seven or eight branches will be enough to run
out of horizontal screen space; meanwhile, many more branches may be needed. More
importantly, this (formally seemingly correct) style of formatting misleads the reader
of the program as to the relationship between the branches of the construction: in fact,
these branches all have the same nesting rank. If in doubt, try swapping the branches.
Obviously, the program's operation will not change in any way, and if so, the assumption
that, for example, the first branch is "more important" than the second, and the second
"more important" than the third, is incorrect. But they are shifted to different positions!
The problem can be explained in another way. It is clear that such a chain of if's
is a generalization of the selection (case) statement and serves the same purpose; the
only difference is expressive power: if is not limited to comparing an expression of
an ordinal type with constants. But the branches of the case statement are written at
the same level of nesting, so it is quite logical for the branches of such a chain of
if's to be located at the same indentation level as well.
This is achieved by treating the keywords else and if standing next to each other as
a single unit. Regardless of the style you have chosen, you can write else if on one
line separated by a space, or put them on different lines starting at the same
position. In particular, if you move begin onto a separate line, you can arrange the
above fragments like this (with if on a new line each time):
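A sketch of this variant, using the same hypothetical commands (`cmd`, `SaveFile`, `LoadFile`, `QuitProgram`) that appear in the fragments below, might be:

```pascal
if cmd = 'Save' then
begin
    writeln('Saving...');
    SaveFile
end
else
if cmd = 'Load' then
begin
    writeln('Loading...');
    LoadFile
end
else
if cmd = 'Quit' then
begin
    writeln('Good bye...');
    QuitProgram
end
else
begin
    writeln('Unknown command')
end
```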
or like this (with if on the same line as the preceding else; this is also acceptable
and probably even more correct, since we agreed to treat else and if as a single
whole):
    if cmd = 'Save' then
    begin
        writeln('Saving...');
        SaveFile
    end
    else if cmd = 'Load' then
    begin
        writeln('Loading...');
        LoadFile
    end
    else if cmd = 'Quit' then
    begin
        writeln('Good bye...');
        QuitProgram
    end
    else
    begin writeln('Unknown command') end
If you decide to leave begin on the same line with the headers of operators, the
design of the chain of if's can be formalized as follows:
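For the variant where else and if each start a new line, a sketch (with the same hypothetical commands) might be:

```pascal
if cmd = 'Save' then begin
    writeln('Saving...');
    SaveFile
end
else
if cmd = 'Load' then begin
    writeln('Loading...');
    LoadFile
end
else
if cmd = 'Quit' then begin
    writeln('Good bye...');
    QuitProgram
end
else begin
    writeln('Unknown command')
end
```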
or this:
    if cmd = 'Save' then begin
        writeln('Saving...');
        SaveFile
    end else if cmd = 'Load' then begin
        writeln('Loading...');
        LoadFile
    end else if cmd = 'Quit' then begin
        writeln('Good bye...');
        QuitProgram
    end else begin
        writeln('Unknown command')
    end
We emphasize that everything said in this paragraph applies only to the case when the
else branch consists of exactly one if statement. If this is not the case, the
usual rules of branching operator formatting should be applied.
The label here is in the leftmost position only because that is where the enclosing
structure (in this case it is a subroutine) is placed. The label can also occur not at the
top level, for example:
The above variant is the most popular, but it is not the only acceptable variant. Quite
often the labeled operator is written in the same line as the label, like this:
{ ... }
quit: dispose(q);
dispose(p)
end;
This option looks good if the label - along with the colon and the space after it - takes
up less space horizontally than the selected indentation size, allowing horizontal
alignment for operators:
{ ... }
q: dispose(a);
dispose(b)
end;
Since a one-letter label name by itself does not look too nice, this style is usually used
in combination with tab indentation (the maximum length of the label name is 6
characters).
§ 2.12. More on program design 484
Some programmers prefer to treat the label simply as a part of the labeled operator,
without distinguishing it as a special entity; the operator is shifted as usual, but this
time together with the label. The end of the above subroutines in this style would look
like this:
{ ... }
quit: dispose(q);
dispose(p)
end;
Sometimes the label is shifted, but the labeled operator is moved to the next line,
roughly like this:
{ ... }
quit:
dispose(q);
dispose(p)
end;
The main disadvantage of such solutions is that the label merges with the
surrounding "landscape" and ceases to be visible as a special point in the code structure;
we will take the liberty to recommend refraining from this style, but nevertheless, we
will leave the decision to the reader.
Thou shalt not go beyond eighty columns in thy file // The sacred rule of the eightieth column.
smaller, it will be inconvenient to write programs, especially in structured languages
where structural indentation is required; for example, even the simplest programs will
not fit into a width of 40 characters. On the other hand, programs with significantly
longer lines are hard to read, even if the corresponding lines fit on a screen or a sheet
of paper. The reason for this is purely ergonomic and is connected with the necessity
to move one's eyes left and right. The reader can easily verify that in any
typographically produced book the width of the type area does not exceed 75
characters: the recommended length of a book line is 50-65 characters, and lines of
up to 75 characters are considered acceptable, but no more; this "magic" limit was
known to book publishers long before the computer era. The 80-column punched
cards that came to hand turned out to be well suited for representing lines of text:
the first four columns were usually reserved for the line number, the fifth contained
a space separating the number from the content, and exactly 75 positions remained
for the line itself.
With the modern size of displays, their graphic resolution, and the ability to sit
close to them without harm to health, many programmers see nothing wrong with
editing text with a window width substantially larger than 80 characters. From the point
of view of ergonomics, this solution is not quite successful; it is advisable either to
make the font larger, so that your eyes get tired less, or to use the screen width to place
several windows with the possibility of simultaneous editing of different files - this will
make navigation in your code more convenient, because the code of complex programs
usually consists of many files, and you often have to make changes in several of them
at the same time. Programming-oriented window text editors such as geany, gedit,
kate, etc. routinely show the right border line on the screen, just at the level of the
80th character.
Many programmers prefer not to open a text editor window wider than 80
characters; moreover, many programmers use text editors that work in a terminal
emulator, such as vim or emacs; both editors have graphical versions, but not all
programmers like these versions. Quite often in the process of program operation there
is a need to view and even edit source code on a remote machine, and the quality of
communication (or security policy) may not allow the use of graphics, and then the
window of an alphanumeric terminal becomes the only available tool. There are known
software tools designed to work with program source texts (for example, detecting
differences between two versions of the same source text), which are implemented on
the assumption that source text lines do not exceed 80 characters in length.
Often a program listing may need to be printed on paper. The presence of long lines
in this situation presents you with an unpleasant choice. You can make the long lines
fit on paper in one line - either by reducing the font size, using a wider sheet of paper,
or using a "landscape" orientation - but this leaves most of the paper area blank, and
makes the listing harder to read; if you trim the lines by simply dropping a few
right-hand positions, you risk missing something important; finally, if you have the
lines wrapped automatically, the readability of the resulting paper listing will be
even worse than that of the original text on the screen, which is already none too
good.
IBM produced various punched cards, not only 80-column ones, back in the 1930s,
much earlier than the first computers; they were intended for sorting machines
(tabulators), which were used, in particular, for processing statistical information.
However, as early as the 19th century, analogs of punched cards were used to
control weaving looms.
The conclusion from all of the above is quite obvious: whatever text editor you use,
you should not allow lines longer than 80 characters to appear in your program. In fact,
it is always advisable to keep within 75 characters; this will allow a programmer using
a vim editor with line numbering enabled to work comfortably with your text, for
example; such source code will produce a nice and easy-to-read listing with numbered
lines.
Some code style guides allow you to exceed the line length limit "in exceptional cases".
For example, the design style set for the Linux kernel categorically forbids text messages to
be spread over several lines, and for this case it is said that it is better if a line of source text
"goes beyond" the set limit. The reason for this prohibition is quite obvious. The Linux kernel
is an extremely extensive program, and it is difficult to navigate its source code. It is often
necessary to find out exactly which fragment of the source code caused a particular message
to appear in the system log, and the easiest way to find the appropriate place is a simple text
search, which of course will not work if the message we are trying to find is spread over several
text constants located in different lines of the source.
However, exceeding the allowable line length remains undesirable. The same Linux
kernel code design guidelines have additional restrictions on this issue - for example, there
must be "nothing significant" beyond the right border of the screen, so that a person viewing
the program cursorily and not seeing text to the right of the border will not miss some important
property of the program. It may take some serious experience to determine whether your case
is acceptable. Therefore, the best option is still to treat the 80-character boundary as
a strict requirement, i.e. one that allows no exceptions; as practice shows, this can
always be achieved by splitting an expression well, shortening a text message, or
reducing the nesting level by moving parts of the algorithm into auxiliary
subroutines.
In addition to the standard screen width, attention should also be paid to the screen
height. As mentioned above, subprograms should be kept as small as possible to fit the
screen height; the question remains as to what this "screen height" should be. The
traditional answer to this question is 25 lines, although there are variations (e.g., 24
lines). It should not be assumed that the screen will be larger; however, as already
mentioned, the length of the subroutine in some cases has the right to slightly exceed
the height of the screen, but not by much.
Of course, there is no real lack of space in the line here; there is only a lack of
imagination. In such a situation, a programmer who has passed the beginner stage
will write a function that checks whether a given character belongs to the predefined
set, and the if header will contain a call to this function:
if IsPunctuation(a) then
A more experienced programmer will use a ready-made function from the standard
library, and an even more experienced programmer may claim that the standard
function is too complicated because it depends on the locale settings in the
environment, and go back to the version with his own function. Anyway, there is no
problem with header length.
Unfortunately, problems are not always solved so easily. Multiline headings, no
matter how hard we try to overcome them, still sometimes occur in the program.
Unfortunately, there is no unambiguous answer how to deal with them; we will consider
one of the possible options, which seems to us the most practical and meets the task of
program readability.
So, if the header of a complex operator has to be spread over several lines,
then:
• break the expression in the header into several lines; it is preferable to
  break at a "top-level" operation, which is usually the logical connective
  and or or;
• shift each subsequent header line relative to the first header line by the
  normal indentation size;
• regardless of the number of simple operators in the body, be sure to
  enclose the body of your operator in operator brackets, i.e. make it a
  compound operator;
• regardless of the style you normally use, move the opening operator
  bracket onto the next line, so that it serves as a visual separator between
  the header lines and the body lines of your operator.
All together it will look something like this:
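A sketch following these rules, using the same loop that appears just below (the names `TheCollection` and `SkipSpace` come from that example), might be:

```pascal
while (TheCollection^.KnownSet^.First = nil) and
    (TheCollection^.ToParse^.First <> nil) and
    (TheCollection^.ToParse^.First^.First^.s = ' ') do
begin
    SkipSpace(TheCollection)
end;
```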
This option works fine if you do not indent the compound operator relative to the
header; if you prefer that ("third") style of formatting, you may be advised to move
the word then, do, or of onto the next line, like this:
    while (TheCollection^.KnownSet^.First = nil) and
        (TheCollection^.ToParse^.First <> nil) and
        (TheCollection^.ToParse^.First^.First^.s = ' ')
    do
    begin
        SkipSpace(TheCollection)
    end;
The role of the visual separator here is played by the final keyword of the header.
If you use a style that leaves the opening operator bracket on the same line as the
header, you can use yet another formatting option: as in the previous example, move
the last token of the header onto a separate line, and leave the opening bracket on
that same line:
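For instance, the same loop might then look roughly like this (a sketch; the final keyword do carries the opening begin):

```pascal
while (TheCollection^.KnownSet^.First = nil) and
    (TheCollection^.ToParse^.First <> nil) and
    (TheCollection^.ToParse^.First^.First^.s = ' ')
do begin
    SkipSpace(TheCollection)
end;
```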
This style is not to everyone's liking, but you don't have to follow it; even if you leave
the operator bracket on the title line everywhere else, for a multi-line title you may well
make an exception and format it as shown in the first example in this paragraph.
Let's consider other situations when a line may not fit into the allotted horizontal
space. I would like to note at once that the best way to deal with such situations is to
avoid them. Often code style guides are written under the assumption that a
programmer can always avoid an undesirable situation, and they simply do not say
what to do if the situation occurs anyway; such an omission leads programmers to
get out of the situation in whatever way they can, often in different ways even
within the same program. To avoid this, we will give some examples of what to do
when a long line refuses to get shorter.
Suppose an overly long expression appears on the right-hand side of an assignment.
The first thing to try is breaking the line at the assignment sign. If there is
something long enough to the left of the assignment, this may help, for example:
    MyArray[f(x)].ThePtr^.MyField :=
        StrangeFunction(p, q, r) + AnotherStrangeFunction(z);
Note that the expression to the right of the assignment is not only moved to the next
line, but also moved to the right by the indentation size. If the expression still doesn't
fit on the screen after that, you can start splitting it too, and it is best to do it by the
signs of the lowest-priority operations, for example:
    MyArray[f(x)].ThePtr^.MyField :=
        StrangeFunction(p, q, r) * SillyCoeff +
        AnotherStrangeFunction(z) / SecondSillyCoeff +
        JustAVariable;
It may happen that even after this, the screen is still too narrow for your expression.
Then you can try to start splitting the subexpressions included in your expression into
several lines; their parts should, in turn, be moved one more indentation, so that the
expression is more or less easy to read, as far as it is possible to talk about readability
for such a monstrous expression:
    MyArray[f(x)].ThePtr^.MyField :=
        StrangeFunction(p,
            q, r) + AnotherStrangeFunction(z) *
            FunctionWhichReturnsCoeff(z) *
            AnotherSillyFunction(z) + JustAVariable;
Of course, if in real life you had to add up 26 variables like this, it is a reason to wonder
why you don't use an array; here we give the sum of simple variables for illustration
only, in real life you will have something more complex instead of variables.
The situation when there is a simple variable name or even an expression to the left
of the assignment, but a short one, so that breaking the string by the assignment sign
gives no (or almost no) advantage, deserves a separate discussion. Of course, the
expression to the right of the assignment is still best broken down by lower-priority
operations; the only question is at what position to start each subsequent line. There are
exactly two answers to this question: you can either shift each next line by one indent,
as in the examples above, or you can place its beginning exactly below the beginning
of the expression in the first line of our assignment (right after the assignment sign).
Compare, here is an example of the first option:
    MyArray[n] := StrangeFunction(p, q, r) *
        SillyCoeff + AnotherStrangeFunction(z) /
        AnotherCoeff + JustAVariable;
And this is how the same code looks if you choose the second option:
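A sketch of the second option, with every continuation line aligned right after the assignment sign (same names as in the example above):

```pascal
MyArray[n] := StrangeFunction(p, q, r) *
              SillyCoeff + AnotherStrangeFunction(z) /
              AnotherCoeff + JustAVariable;
```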
Both variants are acceptable, but they have significant drawbacks. The first option loses
to the second in clarity, but the second option requires a non-standard indentation size
for the second and subsequent lines, which turns out to depend on the length of the
expression to the left of the assignment. Note that this (second) option is completely
unsuitable if you use tabs as an indentation, because this alignment can only be
achieved with spaces, and you should never mix spaces and tabs.
If the disadvantages of both options seem significant to you, you can make it a rule
to always translate the line after the assignment sign, if the whole operator does not fit
on one line. This variant (discussed at the beginning of the paragraph) is free from both
disadvantages, but requires the use of an extra line; however, there is an unlimited
supply of lines in the Universe. It will look like this:
    MyArray[n] :=
        StrangeFunction(p, q, r) * SillyCoeff +
        AnotherStrangeFunction(z) / AnotherCoeff +
        JustAVariable;
The next case that needs to be considered is a subprogram call that is too long. If you
cannot fit the parameters into one line when calling a procedure or function, the line
will naturally have to be broken, and this is usually done after another comma
separating the parameters from each other. As in the case of an expression scattered
over several lines, the question arises as to which position to start the second and
subsequent lines from, and there are two options: to shift them either by the indentation
size or so that all parameters are written "in column". The first option looks like this:
    VeryGoodProcedure('This is the first parameter',
        'Another parameter', YetAnotherParameter, More +
        Parameters * ToCome);
The second option for the above example would look like this:
    VeryGoodProcedure('This is the first parameter',
                      'Another parameter',
                      YetAnotherParameter, More + Parameters
                      * ToCome);
Note that this option, as well as a similar option for formatting expressions, is not
suitable when using tabs as an indentation size: only spaces can achieve such alignment,
and spaces and tabs should not be mixed.
If for one reason or another you don't like either option, we can suggest one more,
which is rarely used although it looks quite logical: treat the subprogram name
together with the parentheses as an enclosing construct, and the parameters as
nested elements. In this case, our example will look like this:
    VeryGoodProcedure(
        'This is the first parameter',
        'Another parameter', YetAnotherParameter, More +
        Parameters * ToCome
    );
It often happens that the subprogram header is too long. In this situation, you should
first of all carefully consider the possibilities of its reduction, while allowing, among
other things, the option of changing the division of the code into subprograms. As we
have already mentioned, subroutines with six or more parameters are very hard to use,
so if a large number of parameters has caused the header to "swell", you should consider
whether it is possible to change your architecture in such a way as to reduce this number
(perhaps at the cost of introducing more subroutines).
The next thing to pay attention to is the names (identifiers) of the parameters. Since
these names are local for your subprogram, they can be made short, up to two or three
letters. Of course, we deprive these names of self-explanatory power, but in any case,
the subprogram header is usually provided with a comment, at least a short one, and
the corresponding explanations about the meaning of each parameter can be included
in this comment.
Sometimes, even after all these tricks, the header still does not fit into 79 characters.
Most likely, you will have to spread the parameter list over several lines, but before
doing that, you should try moving the beginning and the end of the header onto
separate lines. For example, you can write the word procedure or function on a
separate line (the next line is then not indented!). In addition, the return value type
specified at the end of a function header can also be moved to a separate line
together with the colon, but this line should be shifted so that the return type ends
up somewhere below the end of the parameter list (even if you use tabs). The point
is that the reader of your program expects to see the return type there (somewhere
on the right), and it would take extra effort to find it on the left of the next line
instead. All together it may look like this:
    procedure
    VeryGoodProcedure(fpar: integer; spar: MyBestPtr; str: string);
    begin
        {...}
    end;

    function
    VeryGoodFunction(fpar: integer; spar: MyBestPtr; str: string)
                                            : ExcellentRecordPtr;
    begin
        {...}
    end;
If this does not help and the header is still too long, the only option left is to
split the parameter list into parts. Naturally, line feeds are inserted between the
descriptions of the individual parameters. If several parameters have the same type and
are listed comma-separated, it is desirable to leave them on the same line and place line
breaks after the semicolon after the type name. In any case, the question remains as to
the horizontal placement (shift) of the second and subsequent lines. As for the cases of
a long expression and a long subprogram call discussed above, there are three options
here. First, you can start the parameter list on one line with the subprogram name, and
shift the subsequent lines by the indentation size. Second, you can start the list on the
same line as the subprogram name, and shift the subsequent lines so that all parameter
descriptions start at the same position (this case is not suitable when tabulation is used
for formatting). Finally, it is possible, considering the name of the subprogram and the
opening parenthesis as the header of a complex structure, to shift the description of the
first parameter to the next line, shifting it by the indentation size, placing the rest of the
parameters below it, and placing the parenthesis closing the parameter list on a separate
line in the first position (under the beginning of the header).
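The third of these options, for example, might be sketched like this (reusing the hypothetical VeryGoodProcedure header from above):

```pascal
procedure VeryGoodProcedure(
    fpar: integer;
    spar: MyBestPtr;
    str: string
);
```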
The case of a long string constant stands somewhat apart in our list. Of course, the
worst thing you can do is to escape the line feed character, continuing the string
literal at the beginning of the next line of code. Don't ever do that:
But this is not quite right either. A single text message that is output as a single line
(i.e., does not contain line feed characters among the output text) should not be split
into different lines of code at all (see the remark on page 475). There are two more
ways to deal with the string literal length.
First of all, no matter how trivial it may seem, you should consider whether it is
possible to shorten the phrase contained in the line without losing its meaning. As you
know, brevity is the sister of talent. For example, for the example under consideration,
the following variant is possible:
We have left the meaning of the English phrase unchanged, but now it fits into a code
line quite normally, contrary to its own content.
Secondly (if you don't want to shorten anything), you may notice that some string
constants would fit in a line of code if the operator containing them started in the
leftmost position, i.e. if there were no structural indentation. In such a situation, it is
quite easy to deal with a stubborn constant: just give it a name - for example, describe
it in the constants section:
    const TheLongString =
        'This string could be too long if it was placed in the code';
    { ... }
    writeln(TheLongString);
Unfortunately, there are times when none of the above methods works. Then the only
thing left to do is to follow the rules in the Linux Kernel Coding Style Guide and leave
a line longer than 80 characters in the code. Just make sure that this length does not
exceed the limits of reasonableness. So, if the resulting line of code "exceeds" 100
characters and you think that none of the above mentioned methods can be used to
defeat the malicious constant, you probably only think so; the author of these lines has
never seen a situation in which a string constant could not fit into the usual 80
characters, let alone a hundred.
MyProcedure( a, b[ idx + 5 ], c );
When calling procedures and functions, a space is usually not put between the name
of the called subroutine and the opening parenthesis, just as no space is put between
the name of an array and the opening square bracket of an indexing operation.
A somewhat separate issue is the question of which arithmetic operations should
be separated by spaces, and how - on one side or both sides. One of the most popular
and clear recommendations is as follows: symbols of binary operations should be
separated by spaces on both sides, symbols of unary operations should not be
separated by spaces. It should be taken into account that the operation of selecting a
field from a complex variable (in Pascal it is a point) is not a binary operation, because
on the right side it has not an operand, but the name of the field, which cannot be the
value of the expression. We emphasize that this is the most popular style, but by no
means the only one; it is possible to follow completely different rules, for example, to
space out binary operations of the lowest priority (i.e., operations of the "top" level) in
any expression, and not to space out the rest, etc.
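Under the most popular rule just described, an expression might be written, for example, like this (a sketch; all the names are hypothetical):

```pascal
{ binary operations get spaces on both sides; }
{ the unary minus and the field-selection dot do not: }
d := b * b - 4 * a * c;
x1 := (-b + sqrt(d)) / (2 * a);
writeln(rec.field + arr[i]);
```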
2.12.10. Selecting names (identifiers)
The general rule for choosing names is fairly obvious: identifiers should be
chosen according to what they are used for. Some authors argue that identifiers must
always be meaningful and consist of several words. In fact, it is not always so: if a
variable performs a purely local task and its application is limited to several program
lines, the name of such a variable may well consist of a single letter. In particular, an
integer variable that plays the role of a loop variable is most often called simply i,
and there is nothing wrong with that. But single-letter variables are appropriate only
when it is clear from the context unambiguously (and without additional efforts to
analyze the code) what it is and why it is needed, and then, perhaps, only in those rare
cases when the variable contains some physical quantity traditionally denoted by such
a letter - for example, temperature may well be stored in the variable t, and spatial
coordinates in the variables x, y and z. A pointer can be called p or ptr, a string
str, a variable for temporary storage of some value tmp; a variable whose value
will become the result of a function is often called result or simply res, the name
sum suits an accumulator variable well, and so on.
It is important to realize that such brevity is suitable only for local identifiers, i.e.
those whose scope of visibility is limited - for example, to a single subprogram. If an
identifier is visible in the whole program, it must be long and clear - at least to avoid
conflicts with identifiers from other subsystems. To understand what we are talking
about, imagine a program being worked on by two programmers, one of them
dealing with a temperature sensor and the other with a clock; both temperature and
time are traditionally denoted by the letter t, but if our programmers use this
circumstance to name globally visible objects, problems are guaranteed: a program
with two different global variables of the same name has no chance of passing the
linking stage.
Moreover, when it comes to globally visible identifiers, length and verbosity alone
do not guarantee absence of problems. Let's say we need to write a function that polls
a temperature sensor and returns the received value; if we call it GetTemperature,
formally everything seems to be fine, but in fact, with a very good probability we need
to find out the temperature previously written to a file or simply stored somewhere in
the program memory in another subsystem, and the GetTemperature identifier is
quite suitable for such an action too. Unfortunately, there is no universal recipe for
avoiding such conflicts, but we can still give some advice: when choosing a name for
a globally visible object, consider whether such a name could stand for something
else. In the example under consideration, the GetTemperature identifier can be
offered two or three alternative roles at once, so it should be recognized as unsuccessful.
For example, the ScanTemperatureSensor identifier could be more successful,
but only if it is used to work with all temperature sensors your program deals with - for
example, if such a sensor is known to be the only one, or if the
ScanTemperatureSensor function receives a number or other sensor identifier
as input. If your function is intended to measure, for example, the temperature in the
cabin of a car, and there is also a sensor, say, of the coolant temperature in the engine,
then you should add another word to the function name so that the resulting name
identifies what is happening unambiguously, for example:
ScanCabinTemperatureSensor.
• the more hopeless it looked, the more stupid the mistake seems;
• the computer doesn't do what you want it to do, it does what you asked it to do;
• a correct program works correctly under any conditions; an incorrect one
sometimes works too;
• it would be better if it didn't;
• if the program works, it doesn't mean anything;
• if the program "crashes", you should be happy: the error has manifested itself,
so it can now be found;
• the louder the rumble and brighter the special effects when the program
"crashes", the better - it is much easier to look for a noticeable error;
• if there is definitely an error in the program, but the program still works, you are
out of luck - this is the nastiest case;
• neither the compiler, nor the library, nor the operating system is at fault;
• no one wishes you harm, but if you perish, no one will be upset either;
• it's actually not that bad - it's worse;
• the first written line of the future program text makes the debugging stage
inevitable;
• if you are not ready for debugging - don't start programming;
• the computer won't explode; but no one promised you more than that.
In computer labs one can often observe students who, having written some text in a
programming language, having successfully compiled it, and having made sure that the
result does not meet the task at hand, stop any constructive activity and switch, for
example, to another task, usually with a similar result. They usually explain such a
strange choice of strategy with the sacramental phrase "well, I wrote it, but it doesn't
work", and it is pronounced with an intonation that implies that the program itself is to
blame, as well as the teacher, the computer, the weather in Africa, the Argentinean
ambassador to Sweden, or the cafeteria lady, but certainly not the speaker, because he
wrote the program.
In such a situation, one simple principle should be immediately recalled: the
computer does exactly what is written in the program. This fact seems trivial, but it is
immediately followed by the second one, namely: if the program does not work
properly, it is not written properly. With this in mind, the statement "I wrote the
program" requires clarification: it would be more correct to say "I wrote the wrong
program".
Clearly, writing an incorrect program, even one that compiles successfully, is
certainly not a noteworthy endeavor; after all, the simplest Pascal text that compiles
successfully consists of only two words and one dot: "begin end." Of course,
this program doesn't solve the problem at hand - but a program that "was written and
doesn't work" doesn't solve anything either, so how is it any better?
Another situation is no less typical, also occurring mostly on test papers and
expressed by the phrase "I wrote everything, but I didn't have time to debug it". The
problem here lies in the content of the word "everything": the authors of such
programs often do not even suspect how far they actually were from solving the
problem.
§ 2.13. Testing and debugging 471
One can even understand the feelings of a beginner who has struggled to write a
program text and found that the program does not want to meet his expectations at all.
The very process of writing a program, usually called "coding", still seems very
difficult to a beginner, so subconsciously the author of such a program expects at least
some reward for "successfully" overcoming difficulties, and what the computer does in
the end resembles not a reward but a mockery.
Experienced programmers perceive all this quite differently. First, they know for
sure that coding, i.e. the process of writing the program text itself, is only a small part
of the various activities called programming, and not just a small part, but also the
easiest. Secondly, having the experience of creating programs, a programmer
understands well that at the moment when the program text has been successfully
compiled at last, nothing ends, but on the contrary - the most laborious phase of
program creation begins, called, as we have already guessed, debugging. Debugging
takes more effort than coding, requires much more sophisticated skills, and the main
thing is that it can take several times longer, and this is not an exaggeration at all.
Being psychologically ready for debugging, the programmer rationally calculates
his forces and time, so that errors detected at the first program launch do not discourage
him: that is how it should be! It would be rather strange if even a moderately complicated program did not
show errors at the first launches. The beginner's problem may be that, forgetting about
the upcoming debugging, he spent all his time and energy on writing the first version
of the text; when it comes to the most interesting part, there is no more time or energy.
Mountaineers who climb serious mountains follow one crucial principle: the goal
of climbing is not to reach the summit, but to get back. Those who forget this principle
often die on the descent after reaching the coveted summit. Of course, programming is
not so cruel - in any case, you will not die here; but if your goal is to write a program
that does what you want it to do, you need to be ready to spend two thirds of your time
and energy on debugging and not on anything else.
It is impossible to avoid debugging, but following some simple rules can make it
much easier. So, the most important thing: try to check the work of separate parts of
the program as you write them. Two laws work for you here at once: firstly, it is
much easier to debug code you have just written than code you have already forgotten;
secondly, the complexity of debugging grows nonlinearly as the amount of text to be
debugged increases. By the way, there is a rather obvious consequence of this rule: try
to divide your program into subroutines so that they depend on each other as little
as possible; among other benefits, this approach will make it easier to test parts of your
program separately. We can suggest an even more general principle: when you write
code, think about how you will debug it. Debugging does not forgive carelessness at
the coding stage; even such a "harmless" design violation as a short body of a branch
or loop statement left on the same line as the statement header can cost you a lot of
wasted nerves.
We have already seen the second rule at the beginning of the paragraph in the form
of a simple and succinct phrase: the error is always in the wrong place. If you have
an "intuitive certainty" that the effect you are observing can be caused, of course, only
by an error in this procedure, in this loop, in this fragment - do not believe your
intuition. In general, intuition is a wonderful thing, but it does not work during program
debugging. The explanation is very simple: if your intuition was worth something
against this particular error, you would not make this error. So, before trying to fix this
or that fragment, you should be objectively (not "intuitively") sure that the error is
located here. Remember: in a place of the program where you know that you can make
a mistake, you are most likely not to make a mistake; on the contrary, the most intricate
errors appear exactly where you could not expect them; by the way, that's why they
appear there.
Objective methods of error localization include debug printing and step-by-step
execution under the control of a debugger program; if you hope to do without them, it
is better not to start writing programs at all; and here we come to the third rule: the
method of staring during debugging practically does not work. No matter how
much you stare at your text, the result will be expressed by the phrase "everything
seems to be right, but why doesn't it work?". Again, the reason is quite obvious: if you
could see your own error, you wouldn't make a mistake. There are two more
considerations in favor of the ineffectiveness of the "close look" method: you will look most
attentively at those fragments of your code where you expect to find an error and, as
we already know, the error is probably not there; besides, there is such a well-known
phenomenon as "blurred view" - even if you look directly at the line of the program
where the error is made, you will hardly notice the error. The reason for this effect is
also easy to understand: you have just written this code fragment yourself using some
considerations that seem correct to you, and, as a result, the code fragment itself
continues to seem correct to you even if it actually contains an error. So, don't drag
your feet and don't waste precious time on "carefully studying" your own code:
you will have to debug the program anyway.
One more thing to note here: rewriting the program again is generally a good thing,
but it won't help you avoid debugging; most likely, you will just make mistakes in the
same places again. As for rewriting separate program fragments, it is even worse: the
error will probably be in a fragment other than the one you decide to rewrite.
The next rule is as follows: don't hope that the error is somewhere outside your
program. Of course, the compiler and the operating system are also programs, and
there are errors in them too, but hundreds of thousands of other users have already
caught all the simple errors there. The probability of running into an unknown error in
system software is much lower than the chance of winning a jackpot in some lottery.
As long as your programs do not exceed several hundred lines, you may consider that
they are simply not in the same weight category: to expose a bug in the compiler itself,
something much more intricate is needed. And by the time your programs
become complex enough, you will realize that trying to blame it on the compiler and
the operating system looks rather ridiculous.
One more thing to consider: if you cannot find a bug in your program yourself,
no one else will find it. Students often ask the same question: "Why doesn't my
program work?" Your humble servant, hearing this question, usually, in turn, asks who
they think he is: a psychic, a telepath or a clairvoyant. In most cases it is more difficult
for an outsider to understand your program than to write a similar program from scratch.
Besides, it is your program; you have made it yourself, you can figure it out yourself.
It is interesting that in the vast majority of cases the student asking this question has
not even tried to do anything to debug his program.
We will try to finish this paragraph in a positive way. Debugging is unavoidable
and very hard but, as often happens, the debugging process turns out to be a very
exciting, even addictive activity. Some programmers declare that the debugger is their
favorite computer game: no strategy game, solitaire or arcade gives such a variety of
puzzles and food for the brain, no flying game or shooter leads to the release of so
much adrenaline, and no successfully completed quest brings as much
satisfaction as a bug successfully found and destroyed in the program. As it is easy to
guess, it is all a matter of your personal attitude to what is happening: try to
perceive debugging as a game played against the bug, and you will see that even this aspect
of programming can be enjoyable.
2.13.2. Tests
If you find that your program contains an error, it means that you have run it at
least once and most likely fed it with some data; however, the latter is not necessary, in
some cases programs "crash" immediately after startup, before they have time to read
anything. Anyway, you have already started testing your program; it is worth saying a
few words about the organization of this work.
Beginners, as a rule, "test" their programs very simply: run them, type some input
data and see what happens. This approach is really no good but, alas, this becomes
clear neither immediately nor to everyone; the author has met professional programming
teams, with specially hired testers on staff, who do exactly this from
morning till night: they run the program under test this way and that, typing various data into
various input forms, pressing buttons and making other movements in the hope that
sooner or later they will stumble upon some inconsistency. When programmers make
changes to a program, the work of such "testers" starts from the very beginning,
because, as we know, anything can break with any changes.
A similar picture can be observed in computer classes: students type the input data on
the keyboard every time they run their programs, so that the same text gets typed ten,
twenty, forty times. Seeing this, one cannot help wondering when such a student will
finally become too lazy to keep doing this nonsense.
To understand where such a student is wrong and how to act correctly, let's
remember that the standard input stream is not necessarily associated with the
keyboard; when we start a program, we can decide where it will read information from.
We have already used this in testing very simple programs (see page 252); even for a
program that needs only one integer as input, we chose not to enter this number every
time, but used the echo command. Of course, we still have to type the number when
we form a command, but the command itself remains in the history that the command
interpreter remembers for us, so we don't need to type the same number a second time:
instead, we use the up arrow or Ctrl-R search (see §1.2.8) to repeat the command
we've already entered.
Of course, you can use the command interpreter's capabilities to store and re-run
test cases only in the simplest cases; if you approach the matter correctly, you should
create a set of tests represented in some objective form - usually in the form of a file or
several files - to check the program's operation.
By test we mean, and this is very important, all the information needed to run
the program or some part of it: input data that exercises one or another aspect of its
functioning, plus whatever is required to check the result and give a verdict on whether
everything worked correctly or not. A test may consist of data alone: for example, in one file we can put the data
that the program should input, in another file - what we expect to get at the output. A
more complex test may include a special test program code - this is how we have to act
when testing separate parts of a program, for example, its separate subroutines. Finally,
complex tests are designed in the form of whole programs - such programs that run the
program under test themselves, give it some data as input and check the results.
Suppose we have decided to write a program for reducing fractions: it receives two
integers as input, the numerator and the denominator, and prints two numbers as the
result: the numerator and the denominator of the same fraction reduced to its simplest
possible form.
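The book implements this program in Pascal; purely as an illustration of what such a reducer does, here is a minimal shell sketch (the function name reduce is an assumption of this illustration, and it deliberately handles positive inputs only, which matters below):

```shell
#!/bin/sh
# A naive fraction reducer sketched in shell: a hypothetical
# stand-in for the Pascal program discussed in the text.
# Works for positive integers only.
reduce() {
    a=$1
    b=$2
    # Euclid's algorithm: compute the gcd of the two numbers
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$(($1 / a)) $(($2 / a))"
}

reduce 25 15     # prints: 5 3
reduce 7 12      # prints: 7 12
reduce 100 2000  # prints: 1 20
```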
When the program is written and passes compilation, further actions of a novice
programmer who has never thought about the correct organization of testing may look
like this:
newbie@host:~/work$ ./frcancel
25 15
5 3
newbie@host:~/work$ ./frcancel
7 12
7 12
newbie@host:~/work$ ./frcancel
100 2000
1 20
newbie@host:~/work$
In most cases, beginners are satisfied with this, thinking that the program is "correct",
but the task of fraction reduction is not as simple as it seems at first glance. If the
program is written in the most straightforward way, it will most likely not work for negative numbers. Our
beginner can be told about it by his more experienced friend or, if it happens in the
classroom, by the teacher; if he tries to run his program and give it something "with
minus" as input, its author can make sure that his seniors are right and the program, for
example, "loops" (this is what will happen with the simplest implementation of Euclid's
algorithm, which does not take into account the peculiarities of mod operation
for negative operands). Of course, fixing the program is not a problem, but another
thing is more important: any fixes may "break" what worked before, so the new version
of the program will have to be tested from the beginning, i.e. the test runs that have
already been done will have to be repeated, each time entering numbers from the
keyboard.
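The mod pitfall just mentioned is easy to observe directly. In Free Pascal, as in C and in POSIX shell arithmetic, the remainder takes the sign of the dividend, which is exactly what a naive Euclid loop written for non-negative numbers does not expect. A two-line shell check (an illustration, not the book's code):

```shell
#!/bin/sh
# In C-style arithmetic (shared by Pascal's mod and POSIX shell),
# the sign of the remainder follows the dividend:
echo $(( -7 % 3 ))   # prints -1, not 2
echo $(( 7 % -3 ))   # prints 1
```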
As the reader has probably guessed by now, testing performed by a more
experienced programmer could look like this:
newbie@host:~/work$ echo 25 15 | ./frcancel
5 3
newbie@host:~/work$ echo 7 12 | ./frcancel
7 12
newbie@host:~/work$ echo 100 2000 | ./frcancel
1 20
newbie@host:~/work$
This approach is undoubtedly better than the previous one, but it is also still far from
being a full-fledged test, because to repeat each of the tests you will have to find it in
the history manually, and after running it you will have to spend time checking if the
numbers are printed correctly or not.
To understand how to properly organize testing of the program, let's first note that
each test consists of four numbers: two of them are given to the program as input, and
the other two are needed to compare the result printed by the program with them. Such
a test can be written in a single line; thus, the three tests in our example are expressed
as the following lines:
25 15 5 3
7 12 7 12
100 2000 1 20
It remains to invent some mechanism that, having a set of tests in this form, will run
the program under test the required number of times without our participation, feed it
with test data and check the correctness of the results. In our situation, the easiest way
to accomplish this is to write a script in the command interpreter language; if you don't
remember how to do this, reread §1.2.15.
To understand how our script will look like, let's imagine that the four numbers that
make up the test are located in the variables $a, $b, $c, and $d. You can "run" the
test with the command "echo $a $b | ./frcancel"; but we don't just need
to run the program, we need to compare the result with the expected result, for which
we also need to put the result into a variable. For this purpose we can use assignment
and "back apostrophes", which, as we remember, substitute the result of the command
execution:
res=`echo $a $b | ./frcancel`
The result in the $res variable can be compared with the expected result, and if a
mismatch is detected, the user can be informed about it:
if [ x"$c $d" != x"$res" ]; then
    echo TEST $a $b FAILED: expected "$c $d", got "$res"
fi
The read command built into the interpreter will help us to "drive" the numbers
from the tests into the variables $a, $b, $c and $d; this command takes the names
of variables (without the "$" sign) as parameters, reads a string from its input stream,
splits it into words and "arranges" these words into the specified variables; if there are
more words, the last variable will contain the rest of the string consisting of all the
"extra" words. The read command has a useful property: if the next line is read
successfully, it is completed successfully, and if the thread has run out - unsuccessfully.
This allows you to use it to organize a while loop like this:
while read a b c d; do
    # the body of the loop will go here
done
In the body of such a loop, variables $a, $b, $c and $d sequentially take the first,
second, third and fourth word of the next string as their values. Note that each of our
tests is a four-word string (the words are numbers, but they also consist of characters;
there is nothing but strings in scripting languages). In the body of the loop we will place
the above if, which runs a single test, and it only remains to figure out how to
feed the sequence of our tests to the standard input of the resulting construct. To do this,
we can use a redirection of the "here document" kind, which is done with the "<<" sign.
This sign is followed by some word (a "stop word"), then the text to be fed to the
command is written, and this text ends with a line consisting entirely of the stop word.
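In isolation, a here document simply becomes the standard input of the command it is attached to; a tiny illustration (cat and the stop word END are arbitrary choices here):

```shell
#!/bin/sh
# The two lines between "<<END" and the line "END" are fed to cat
# as its standard input.
cat <<END
25 15 5 3
7 12 7 12
END
```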
All together it will look something like this:
#!/bin/sh
# frcancel_test.sh
while read a b c d; do
    res=`echo $a $b | ./frcancel`
    if [ x"$c $d" != x"$res" ]; then
        echo TEST $a $b FAILED: expected "$c $d", got "$res"
    fi
done <<END
25 15 5 3
7 12 7 12
100 2000 1 20
END
In spite of its primitive nature, it is already a real full-fledged test suite with the ability
to run tests automatically. As you can see, adding a new test is reduced to writing one
more line before END, but that's not the main thing; more important is that running
all the tests doesn't require any effort from us, we just run the script and see if it
produces anything. If it doesn't produce anything - the run was successful. The general
rule, which we have actually already followed, is as follows: a test may be hard to
write, but it should be easy to run.
The point here is this. Tests are needed not only to try to detect errors right after
writing a program, but also to assure ourselves to some reasonable extent that we have
not broken anything by making changes. That's why debugging should never be
considered finished, and tests should never be thrown away (for example, erased) - they
will come in handy many times; and, of course, we should count on the fact that after
any slightest noticeable changes in the program we will have to check it with all the
tests we have, and it is understandably a bit expensive to do it manually.
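To see the harness's failure reporting in action without the Pascal program itself, one can substitute a stand-in; the function frcancel_stub below is a deliberately broken "reducer" invented for this illustration (it returns its input unchanged), so every fraction that is not already in lowest terms triggers a report:

```shell
#!/bin/sh
# A deliberately broken stand-in for ./frcancel: it performs no
# reduction at all, so only tests whose fraction is already in
# lowest terms stay silent.
frcancel_stub() { read n d; echo "$n $d"; }

while read a b c d; do
    res=`echo $a $b | frcancel_stub`
    if [ x"$c $d" != x"$res" ]; then
        echo "TEST $a $b FAILED: expected $c $d, got $res"
    fi
done <<END
25 15 5 3
7 12 7 12
100 2000 1 20
END
```

Running this prints FAILED lines for 25 15 and 100 2000, while the test 7 12 passes silently.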
If another test reveals an error in the program, do not rush to fix something. First
of all, you should consider whether you can simplify the test where the error occurred,
i.e. whether you can write a test that shows the same error but is simpler. Of course,
this does not mean that the more complex test should be thrown away. Tests should not
be thrown away at all. But the simpler the test is, the fewer factors that can affect the
program's work and the easier it will be to find an error. By successively simplifying
one test, you can create a whole family of tests, and perhaps the simplest of them will
not show the error. This is no reason to stop: take the simplest test that still fails
and try to simplify it in some other direction. In any case, all the created tests, regardless of
whether they show some error right now or not, are valuable for further debugging:
some of them can show you the way to find an error, while others can show you where
you should not look for an error.
There is a special approach to writing programs called test first. In this
approach, you first write tests, run them, make sure they fail, and only then write the
program text that makes the tests pass. If the programmer
starts to think that the program is still not written the way it should be, he should first write a
new test which, by failing, will give objective confirmation of the "incorrectness"
of the program, and only then change the program so that both the new test and all the old
ones pass. Writing program text for any purpose other than satisfying the existing
tests is completely excluded.
In this approach, tests are mostly written not for the whole program, but for each of its
procedures and functions, for subsystems including several interconnected procedures, etc.
Following the "test first" principle allows you to edit the program more boldly, without fear of
spoiling it: if we spoil something, the tests that stopped working will tell us about it; if something
spoils, but no test stops working, it means that the test coverage is insufficient and we need
to write more tests.
Of course, you don't have to follow this approach, but knowing it exists and having it in
mind will be at least helpful.
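The idea can be shown in miniature with shell functions (double and check are names invented for this sketch, not anything from the book): the check is written and run before the code exists, fails, and then passes once the code is written:

```shell
#!/bin/sh
# Test first, in miniature: the check exists before the code does.
check() {
    if [ "$(double 21 2>/dev/null)" = "42" ]; then
        echo PASS
    else
        echo FAIL
    fi
}

check                               # prints FAIL: no double yet

double() { echo $(( $1 * 2 )); }    # now we write the code

check                               # prints PASS
```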
{$IFDEF DEBUG}
writeln('DEBUG: x = ', x, ' y = ', y);
{$ENDIF}
As long as there is a directive defining the DEBUG symbol at the beginning of the
program, such a writeln statement will be taken into account by the compiler as
usual; but if you remove the {$DEFINE DEBUG} directive, the DEBUG symbol
becomes undefined, and everything between {$IFDEF DEBUG} and {$ENDIF}
will simply be ignored by the compiler. Note that the DEFINE directive doesn't
even have to be removed completely, just remove the "$" character from it, and it will
turn into an ordinary comment, and all debug printing will be disabled; if debug printing
is needed again, it will be enough to insert the character in its place. But you can do
even better: do not insert the DEFINE directive into the program at all, and define the
symbol from the compiler command line if necessary. To do this, just add the
"-dDEBUG" switch to the command, like this:
fpc -dDEBUG myprog.pas
By doing so, we can compile our program with or without using debug print without
changing the source code. You will understand why this may be important when you
start using version control.
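Shell scripts admit an analogous trick, though at run time rather than compile time; in this sketch (the name debug_print and the use of a DEBUG environment variable are conventions of the illustration, not of the book) the debug output stays in the text but is silent by default:

```shell
#!/bin/sh
# A run-time analogue of {$IFDEF DEBUG} for shell scripts: debug
# messages go to stderr, and only when DEBUG is set in the
# environment, e.g.  DEBUG=1 ./myscript.sh
debug_print() {
    if [ -n "$DEBUG" ]; then
        echo "DEBUG: $*" >&2
    fi
}

x=5
y=7
debug_print "x = $x y = $y"    # silent unless DEBUG is set
echo "result: $((x + y))"      # prints: result: 12
```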
In some cases, when debugging, you may want to know what the current state of a
complex data structure is - a list, a tree, a hash table, or something even more complex.
There is nothing to be afraid of in such situations: it is simply
worth writing a procedure specially designed to print out the current state of the data
structure we need. Such a procedure can be enclosed in a conditionally compiled
fragment, so that the procedure is not included in the version of the executable file
without debug printing and does not increase the amount of machine code.
To work with the gdb debugger, the program must be compiled with debugging
information included; in Free Pascal this is requested with the -g switch:
fpc -g myprog.pas
The resulting executable file will be much larger in size, but we will be able to see
fragments of our source code and use variable names when executing our program
under the control of the debugger.
The gdb debugger is a program that has its own built-in command line
interpreter; we perform all actions with our program by giving commands to the
debugger. The debugger can work in different modes - in particular, it can be connected
to an already running program (process), and also with its help you can figure out in
what place of the program and for what reasons the crash occurred , but we will be
201
enough to deal with only one, the most popular mode, in which the debugger itself starts
our program and controls the course of its execution, obeying our commands. The
command line built into gdb is equipped with editing functions, autocompletion
(of variable and subroutine names, of course, rather than file names), and storage and
search of the history of entered commands, so working with gdb turns out to be quite
convenient - provided we know how to do it.
You can start the debugger to work in this mode by specifying the name of the
executable file as a parameter:
gdb ./myprog
If we need to pass some command line arguments to our program, this is done with the
--args switch, for example:
gdb --args ./myprog abra schwabra kadabra
(in this example, the myprog program will be run with three command line arguments:
abra, schwabra and kadabra).
Once started, the debugger will report its version and some other information and
issue its command line prompt, usually looking like this:
(gdb)
We can start program execution with the start command, in this case the debugger
will start our program, but will stop it at the first statement of the main part, not allowing
it to do anything; further execution will take place under our control. We can do the
opposite: give the run command, then the program will start and run as normal.
The condition command attaches a condition to a breakpoint; for example,
"condition 5 i < 100" indicates that the program should stop at breakpoint #5 only if the value of the i variable is
less than one hundred. The "info breakpoints" command allows you to find
out what breakpoints you have, what conditions are set for them, their ignore counts, and so on.
You can view the values of variables when the program is stopped by using the
inspect command. If necessary, the "set var" command allows you to change
the value of a variable, although this is relatively rarely used; for example, "set var
x=50" will force the variable x to be set to 50.
In a program that actively uses subroutines, the bt (short for backtrace)
command can be very useful. This command shows which subroutines have been called
(but not yet completed), with what parameters they were called, and from where in the
program. For example, while debugging the hanoi2 program (see §2.11.2), the bt
command might produce:
(gdb) bt
#0 MOVELARGER (RODS=...) at hanoi2.pas:52
#1 0x080483d9 in SOLVE (N=20) at hanoi2.pas:91
#2 0x08048521 in main () at hanoi2.pas:110
This means that the MOVELARGER procedure (called MoveLarger in the program
text) is active now, the current line is line 52 in the hanoi2.pas file; the
MoveLarger procedure was called from the SOLVE procedure
(Solve), the call is located in line 91. Finally, Solve was called from the main
part of the program (denoted by the word main; the point here is that gdb is mainly
C-oriented, and in C the role of the main part of the program is played by a function
named main); this call is located in line 110.
The first number in each output line of the bt command is the frame number.
Using this number we can switch between the contexts of the listed subroutines, for
instance in order to look at the values of variables at the points where the calls further
down the stack were made. Thus, in our example, the frame 1 command will allow us to look
at the point in the Solve procedure where it calls MoveLarger. After the frame
command, the list and inspect commands can be used to provide
information related to the current position of the selected frame.
Another useful command is call; it allows you to call any of your subroutines
with specified parameters at any time. Unfortunately, there are some limitations here;
gdb doesn't know anything about Pascal strings, for example, so if your subroutine
requires a string as one of its parameters, you can call it by specifying some suitable
variable as a parameter, but you can't specify a specific string value.
Exiting the debugger is done with the quit command, or you can arrange an
"end of file" situation by pressing Ctrl-D. In addition, it is useful to know that the
debugger has a help command, although it is not so easy to work with.
As mentioned at the beginning of this paragraph, gdb can be used in different
modes. For example, if you have already started your program and it behaves
incorrectly, but you don't want to repeat the actions that caused this behavior, or you
are not sure if you can recreate the existing situation, you can connect the debugger to
an existing process. To do this, of course, you need to find out the process number; we
described how to do this in §1.2.9. Next, gdb is run with two parameters: the name
of the executable file and the process number, e.g.:
gdb ./myprog 2742
The debugger needs the executable file to take debugging information from it, i.e.
information about variable names and source code line numbers. After a successful
connection, the debugger pauses the process and waits for your instructions; you can
use the bt command to find out where you are and how you got there, use the
inspect command to view the current values of variables, and use the break,
cont, step, next, etc. commands. After exiting the debugger, the process will
continue execution, unless, of course, you killed it during debugging.
The last of the three gdb modes, core-file analysis mode, is not needed when working
with Free Pascal, because Free Pascal creates executables so that they intercept operating
system signals indicating an emergency, generate an error message, and terminate - which
from the system's point of view looks like a correct termination, not an emergency, and does
not result in the creation of a core file. We will return to the study of gdb in the second volume;
for programs written in C, we will definitely need to analyze core files.
{$I myfile.pas}
This will work the same as if you had pasted the entire contents of myfile.pas
right there instead of this line.
Partitioning the program text into separate files joined by the translator
removes some of the problems but, unfortunately, not all of them, because such a set
of files remains, as programmers say, a single unit of translation; in other words, we can
only compile them all together, in one go. Meanwhile, although modern compilers work
quite fast, the most serious programs reach such volumes that their complete
recompilation may take several hours and sometimes even days. If we had to wait a
day (or even a couple of hours, which is quite enough) after making any change, even the most
insignificant one, it would be absolutely impossible to work. Besides,
programmers almost always use so-called libraries - sets of ready-made subprograms
that are changed very rarely and, accordingly, it would be silly to spend time on
recompiling them over and over. Finally, problems are caused by ever-recurring
name conflicts: the larger the code volume, the more distinct global identifiers (at the
very least, subroutine names) it requires, so the probability of accidental collisions grows,
and with translation in one step there is almost nothing you can do about it.
All these problems are solved by the technique of separate compilation. Its essence
is that the program is created as a set of separate parts, each of which is compiled
separately. Such parts are called translation units, or modules. Most programming
languages, including Pascal, assume that modules are individual files. Usually, a set of
logically related subroutines is formed as a separate translation unit; everything
necessary for their operation is also placed in the module - for example, global
variables, if any, as well as all sorts of constants and so on. Each module is compiled
separately; the translation of each of them results in an intermediate file containing
so-called object code, and such files are combined into a ready executable file
with the help of the link editor (linker); the link editor usually works so fast that
rebuilding the executable file from the intermediate files each time does not create
any significant problems.
Object code is a kind of blank for machine code: program fragments are represented in it
by sequences of instruction codes, but some addresses in these codes may be left unfilled,
because they were not known at compilation time; the final transformation of object code
into machine code is the task of the link editor.
§ 2.14. Modules and separate compilation 485
A very important property of a module is that it has its own namespace: when
creating a module, we can decide which of the names introduced in it will be visible from
other modules and which will not; it is said that a module exports some of the names
introduced in it. It often happens that a module introduces several dozen, sometimes
hundreds of identifiers, but all of them turn out to be needed only within the module
itself, while the rest of the program needs to refer to only one or two subroutines, and
it is their names that the module exports. This removes the problem of name conflicts:
identically named objects may appear in different modules, and this does not bother us
in any way as long as they are not exported. Technically, this means that when the
source code of a module is translated into object code, all identifiers other than the
exported ones disappear.
A module file begins with a header consisting of the keyword unit and an identifier
naming the module:
unit myunit;
Unlike the identifier appearing in the program header, which has no effect on anything
at all, the module identifier (name) is a very important thing. First of all, this
name identifies the module in other translation units, including the main program; to
get the module's capabilities at your disposal, you must place the uses directive,
already familiar to us from the chapter on full-screen programs, in the program (and,
if necessary, in another module, but more on that later). The crt module we used
earlier comes with the compiler, but connecting it is not fundamentally different from
connecting modules we write ourselves:
program MyProgram1;
uses myunit;
You can use several uses directives, or you can list several modules, comma-separated,
in one such directive, for example:
uses crt, myunit;
The easiest thing to do is to make the module name specified in its header match
the main part of the file name or, more precisely, to give the module's source file a
name formed from the module name by adding the suffix ".pp"; for example, the
module mymodule is easiest to store in the file mymodule.pp. This convention can
be circumvented, but we will not discuss that possibility.
The further text of the module should consist of two parts: the interface, labeled
with the keyword interface, and the implementation, labeled with the keyword
implementation. In the interface part we describe everything that will be visible
from other translation units that use this module, and for subroutines in the interface
part we put only their headers; besides subroutines, the interface part can also
describe constants, types and global variables (but do not forget that it is better not
to use global variables at all).
In the implementation, we must, first, write all subroutines whose headers are
placed in the interface part; second, we can describe here any objects that we do not
want to show to the "outside world" (i.e. other modules); these can be constants,
variables, types, and even subroutines whose headers were not placed in the interface
part.
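To make the split concrete, here is a minimal hypothetical module (not one of the book's examples; the unit name and everything in it are invented for illustration): a counter whose variable lives in the implementation, so other translation units can change it only through the exported subroutines.

```pascal
unit counter;  { a hypothetical sketch: counter.pp }
interface
procedure Increment;
function Current: longint;
implementation
var
    count: longint = 0;   { hidden: this name is not exported }
procedure Increment;
begin
    count := count + 1
end;
function Current: longint;
begin
    Current := count
end;
end.
```

Any program that says uses counter can call Increment and Current, but cannot touch count directly.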
The main idea of dividing a module into an interface and an implementation is that
we have to tell the programmers who will use our module about all the features of
the interface in detail, otherwise they will simply be unable to use it. When we create
documentation for our module, we have to describe in it all the names that the interface
section introduces. Moreover, once someone starts using our module (note that this
also applies to the case when we use it ourselves), we have to do our best to keep the
rules of using the names introduced in the interface unchanged: we can add new
types or subroutines, but if we contemplate changing something that was already there,
we have to think twice, because all programs using our module will "break".
Everything is much simpler with the implementation. We don't need to tell our
module's users about it, and we don't need to include it in the documentation either 203;
we can change it at any time without fear that anything other than our module itself
will break.
Among other things, possible name conflicts must be taken into account. If all the
names used in a module are visible throughout the program, and the program itself is
large enough, the problem of accidental name conflicts ("oh, it seems someone has
already used this name for a completely different procedure in a completely different
place") gives programmers a lot of headaches, especially if some modules are used
simultaneously in different programs. Obviously, hiding in the module those names
that are not intended for direct use from other translation units drastically reduces
the probability of such accidental coincidences.
A module's own namespace solves not only the problem of name conflicts, but
also the problem of simple "foolproofing", which is especially relevant in large
projects in which several people take part. If the author of a module does not intend
this or that procedure to be called from other modules, or intends a variable to be
changed only by procedures of the same module, it is enough for him not to put the
corresponding names into the interface, and there is nothing to worry about: other
programmers will simply be unable to access them, purely technically.
In general, hiding the implementation details of a subsystem in a program is called
encapsulation; it allows programmers to modify the code of modules more boldly,
without fear that other modules will stop working: it is enough to
203 In fact, the implementation is also often documented, but such documentation is intended not for
the users of the module, but for those programmers who work in the same team with us and who may
need to debug or improve our module.
keep the names exported in the interface unchanged and working.
Like the main program file, the module file ends with the keyword end and a dot.
Before that, you can insert a so-called initialization section: write the word begin,
a few statements, and only then end with the dot; these statements will be executed
before the main part of the program starts. Doing this makes sense mainly if you have
global variables, so if you do everything right, you won't need the initialization section
for a very long time, perhaps never, unless you decide to make Free Pascal your main
tool.
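A hypothetical sketch of a module with an initialization section (the unit and its contents are invented, not one of the book's examples):

```pascal
unit tracing;  { hypothetical sketch: tracing.pp }
interface
procedure Trace(msg: string);
implementation
var
    calls: longint;
procedure Trace(msg: string);
begin
    calls := calls + 1;
    writeln('[', calls, '] ', msg)
end;
begin                { the initialization section }
    calls := 0       { executed before the main program starts }
end.
```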
As an example, let's return to our binary search tree from §2.11.5 and try to put
everything needed to work with it into a separate module. The interface will consist
of two types, the tree node itself and a pointer to it, i.e. the TreeNode and
TreeNodePtr types, as well as two subroutines: the AddToTree procedure and the
IsInTree function. We may notice that the "generalized" SearchTree function
and the TreeNodePos type it returns are peculiarities of the implementation that the
module's user does not need to know about: what if we later want to change this
implementation? Therefore, the TreeNodePos type will be described in the
implementation part of the module, and of the subroutine headers only AddToTree
and IsInTree, but not SearchTree, will be present in the interface part. It will
look like this:
unit lngtree;              { lngtree.pp }
interface
type
    TreeNodePtr = ^TreeNode;
    TreeNode = record
        data: longint;
        left, right: TreeNodePtr;
    end;
procedure AddToTree(var p: TreeNodePtr; val: longint; var ok: boolean);
function IsInTree(p: TreeNodePtr; val: longint): boolean;
implementation
type
    TreeNodePos = ^TreeNodePtr;
{ ... the SearchTree, AddToTree and IsInTree bodies from §2.11.5 go here ... }
end.
To demonstrate this module at work, let's write a small program that will read
from the keyboard requests of the form "+ 25" and "? 36"; a request of the first
kind will be executed by adding the specified number to the tree, and in response to
a request of the second kind the program will print Yes or No depending on whether
the specified number is in the tree or not. The program will look like this:
program UnitDemo;          { unitdemo.pas }
uses lngtree;
var
    root: TreeNodePtr = nil;
    c: char;
    n: longint;
    ok: boolean;
begin
    while not eof do
    begin
        readln(c, n);
        case c of
            '?': begin
                if IsInTree(root, n) then
                    writeln('Yes!')
                else
                    writeln('No.')
            end;
            '+': begin
                AddToTree(root, n, ok);
                if ok then
                    writeln('Successfully added')
                else
                    writeln('Couldn''t add!')
            end;
        else
            writeln('Unknown command "', c, '"')
        end
    end
end.
It is enough to run the compiler once to compile the whole program:
fpc unitdemo.pas
The lngtree.pp module will be compiled automatically, and only if required.
The result will be two files: lngtree.ppu and lngtree.o. If the module's source
text is changed, the compiler will recompile the module the next time the whole
program is rebuilt; if it is left untouched, only the main program will be recompiled.
The compiler finds out whether the module needs to be recompiled by comparing
the last-modification times of the lngtree.pp and lngtree.ppu files: if the first
one is newer (or if the second one simply does not exist), the compilation is
performed; otherwise the compiler considers it unnecessary and skips it. However,
nothing prevents you from compiling the module "manually" by issuing a separate
command:
fpc lngtree.pp
2.14.2. Using modules from each other
Quite often modules need to use the capabilities of other modules. The simplest
such case occurs when a subroutine of one module needs to call a subroutine of
another module. Similarly, it may be necessary to use, in the body of a subroutine,
the name of a constant, type, or global variable that was introduced by another
module. None of these cases creates any difficulties; just insert the uses directive
into the implementation section (usually right after the word implementation),
and all the features provided by the interface of the module specified in the directive
will be available to you.
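As a sketch (both unit names and the PrintLine subroutine are hypothetical, invented for illustration), a module whose implementation, but not interface, depends on another module might look like this:

```pascal
unit report;   { hypothetical sketch: report.pp }
interface
procedure ReportValue(x: longint);
implementation
uses lowlvl;   { assumed to be another module of ours exporting PrintLine }
procedure ReportValue(x: longint);
begin
    PrintLine(x)  { a name from lowlvl: usable here, but not in the interface }
end;
end.
```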
Things are a bit worse if you have to make your module's interface dependent on
another module. Fortunately, such cases are much rarer, but they are still possible. For
example, you may need to describe a new type in the interface part of your module
based on a type introduced in another module (e.g. a record type was introduced in one
module, and another module introduces a type array of such records, etc.). You may
just as well need, when creating a new array type, to refer to a constant introduced by
another module; finally, you may need in your interface routines a parameter of a type
described in another module, or a return value of such a type from a function, or,
finally, just a global variable having a type that came from another module. All these
situations have a common feature: in the interface part of your module, you use a name
introduced by another module.
In principle, there are no particular problems here either: it is enough to place the
uses directive in the interface part (usually right after the word interface) or at
the very beginning of the module, right after its header; the effect will be exactly the
same. It should only be taken into account that such a dependency, unlike a
dependency at the implementation level, gives rise to certain restrictions: the interface
parts of two or more modules cannot depend on each other crosswise or "in a circle".
In general, cross-dependencies between modules should be avoided in any case,
but sometimes they are still necessary; at least make sure that your modules use each
other's features only in their implementations, not in their interfaces. Programmers
try to avoid dependencies between interfaces even when they are not
cross-dependencies, but this is not always possible.
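For illustration, a hypothetical module whose interface depends on another module (the vectors unit and its vector3 type are assumptions, not the book's examples); note the uses directive placed in the interface part:

```pascal
unit vecarray;  { hypothetical sketch: vecarray.pp }
interface
uses vectors;   { needed here: the type below is built from vectors' type }
type
    triple = array [1..3] of vector3;
implementation
end.
```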
2.14.3. Module as an architectural unit
When dividing program code into modules, you should keep in mind several rules.
First of all, all features of one module must be logically related to each other.
When a program consists of two or three modules, we can still remember how parts of
the program are distributed among the modules, even if this distribution is not subject
to any logic. The situation changes dramatically when the number of modules reaches
at least a dozen; meanwhile, programs consisting of hundreds of modules are quite
common; moreover, you can easily find programs with thousands and even tens of
thousands of modules. You can navigate in such an ocean of code only if the program
implementation is not just scattered among modules, but is divided into subsystems
according to some logic, each of which consists of one or several modules.
To check if you are doing the correct breakdown into modules, ask yourself a
simple question about each module (and about each subsystem consisting of several
modules): what exactly is this module (this subsystem) responsible for? The answer
should consist of a single phrase, as in the case of subprograms. If you can't give such
an answer, then most likely your principle of division into modules needs correction.
In particular, if a module is responsible not for one task, but for two unrelated tasks, it
would be logical to consider splitting this module into two.
There is one more point related to global identifiers. Pascal does not have separate
namespaces for global objects, so to avoid possible name conflicts, all globally visible
identifiers belonging to one subsystem (a module or some logically united set of modules)
are often given a common prefix denoting this subsystem. For example, if you create a
module for working with complex numbers, it makes sense to start all exported identifiers
of such a module with the word Complex: something like ComplexAddition,
ComplexMultiplication, ComplexRealPart, etc. This is not so relevant in
small programs, but in large projects name conflicts can become a serious problem.
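A hypothetical sketch of this prefix convention (the unit and its subroutines are invented for illustration):

```pascal
unit complexm;  { hypothetical sketch: complexm.pp }
interface
type
    complex = record
        re, im: real;
    end;
{ every exported name carries the Complex prefix }
function ComplexAddition(a, b: complex): complex;
function ComplexRealPart(c: complex): real;
implementation
function ComplexAddition(a, b: complex): complex;
var
    r: complex;
begin
    r.re := a.re + b.re;
    r.im := a.im + b.im;
    ComplexAddition := r
end;
function ComplexRealPart(c: complex): real;
begin
    ComplexRealPart := c.re
end;
end.
```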
204 In object-oriented programming languages, an object is also added to this list.
Part 3
exactly the same way (with an equals sign), but this fact does not help us in any way: we
cannot adequately estimate the resource consumption of this or that operation without
understanding how and what the processor does. A programmer who has no experience
of working at the level of processor instructions simply does not know what he is
actually doing: when inserting an operation into a program in a high-level language,
he often does not realize how complex a task he is setting before the processor. As a
result, we have huge programs that are discouraging in their inefficiency, such as
office document automation applications that feel "cramped" in four gigabytes of RAM
and for which a processor many orders of magnitude faster than the supercomputers of
the eighties turns out to be "too slow".
Experience shows that a professional computer user, be it a programmer or a system
administrator, may not know something, but by no means can he afford not to
understand how a computer system is organized at all its levels, from electronic logic
circuits to cumbersome application programs 205. Not understanding something, we
leave room in our rear for the "feeling of magic": on some almost subconscious level
we keep suspecting that something uncanny is going on in there and that it could not
have been done without a couple of wizards with magic wands. Such a feeling is
categorically inadmissible for a professional; on the contrary, a professional must
understand (and intuitively feel) that the device he is dealing with was created by
people like himself and that there is nothing "magical" or "unknowable" about it.
If the goal is to achieve this level of understanding, it does not matter at all what
particular architecture and assembly language you study. Once you know one assembly
language, you can start writing in any other language after spending two or three hours
(or even less) studying reference information; but the important thing is that, by being
able to think in terms of machine commands, you will always know what actually
happens when you execute your programs.
1 For those who know C++, let us explain: what happens if you apply an assignment operation to an
object of type list<string> containing two or three thousand elements?
In spite of all the above, it seems necessary to explain the choice of a particular
architecture. The material of the "assembler" part of our book is based on the
instruction set of the x86 processor family, and we will use the 32-bit variant of this
architecture, the so-called i386 instruction set. At the time of this writing, 32-bit
computers of the x86 family have been almost completely replaced by computers based
on 64-bit processors 206, but fortunately these processors can execute programs in
32-bit mode. We will not study the 64-bit instruction set itself, and there is a certain
reason for that. All the available descriptions of this system are built on the principle
of enumerating its differences from the 32-bit case, so it turns out that we need to
study the 32-bit instruction set first in any case. But having studied it, we will already
have reached our goal: we will have gained experience of working in assembly
language; the further transition to the 64-bit case is possible, but for our purposes it is
somewhat excessive. Even the 32-bit instruction set we will study far from in all its
(nightmarish) splendor: about a tenth of the capabilities of the processor under study
will be enough for us to write programs.
There is one more reason to study the 32-bit architecture. One of the main technical
inventions that one should come to understand through assembly programming is the
so-called stack frame, used when interfacing subroutines written in high-level
languages. The x86_64 architecture doubled the number of general-purpose registers
available to the program, which is good in itself, but this number of registers is almost
always enough to pass to any subroutine all of its parameters; the stack frame
degenerates, losing half of its content: local data and the return address are still stored
on the stack, but parameter values are not. Note that the savings here are not so great,
because calling another subroutine will require the same registers, so they will still
have to be stored on the stack, just not in the parameter area, which no longer exists,
but in the local data area; something is really saved only in the case of subroutines that
do not call anyone themselves. In any case, understanding how a stack frame is
organized in its full version is obligatory for a good programmer, and from this point
of view the 32-bit architecture turns out to be a better candidate for the role of a
teaching aid.
It will be appropriate to say a few words regarding the choice of a particular
assembler. As you know, there are two main approaches to assembly language syntax
for x86 processors: AT&T syntax and Intel syntax. The same processor instruction is
written quite differently in these two systems: Intel syntax lists the destination operand
first, while AT&T syntax lists it last, prefixes register names with the % sign, and
appends an operand-size suffix to the instruction mnemonic.
206 The author considers it appropriate to note that much of this text was prepared for printing on the
EEEPC-901; this netbook is equipped with a single-core 32-bit processor, which is nevertheless
sufficient to handle all the tasks the author encounters under normal circumstances.
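For illustration, here is a standard contrast between the two notations (the concrete instruction is my choice for the sketch, not necessarily the example the author had in mind):

```nasm
; Intel syntax (as used by NASM): destination operand first, bare register names
mov eax, ebx          ; copy the contents of EBX into EAX
; the same instruction in AT&T syntax (GNU as) would be written as
;   movl %ebx, %eax   ; source first, "%" prefixes, size suffix "l" (32 bits)
```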
AT&T syntax is traditionally more popular in the Unix environment, but this creates
certain problems for the task at hand. Tutorials oriented to assembly language
programming in Intel syntax do exist, while AT&T syntax is described only in special
(reference) technical literature, which is not intended for teaching. In addition, one
should take into account the longstanding dominance of MS-DOS as a platform for
courses of this kind; all of this makes Intel syntax much more familiar to teachers
(and, oddly enough, to some students as well) and better supported. There are two
main assemblers available for Unix that support Intel syntax: NASM ("Netwide
Assembler"), developed by Simon Tatham and Julian Hall, and FASM ("Flat
Assembler"), created by Tomasz Grysztar. It is difficult to make a clear choice between
them. Our book deals with the NASM assembly language, including its specific macro
tools; this choice is not based on any particular considerations and is essentially
arbitrary.
, after which the register will contain the address of the instruction following the
current one;
• decode the instruction code fetched from memory and perform the action
corresponding to this code.
For some reason, many students are stumped at an exam by the question of how the
CPU knows which machine instruction (out of the millions of instructions in
memory 207) to execute right now; the correct answer is quite trivial: the address of
the required machine instruction is in the instruction pointer.
207 In particular, the instruction codes of the processor we are going to study can occupy from
1 to 15 cells.
§ 3.1. Introductory information 524
The processor in no way tries
to penetrate into the logic of the program being executed, into what the results of the
program as a whole should be; it only follows the once and for all established cycle:
read the code, increment the instruction pointer, execute the instruction, start again. The
automatic incrementing of the address in the instruction pointer register causes the
machine instructions that make up the program to be executed one after the other in the
sequence in which they are written in the program (and located in memory).
When the instruction pointer contains the address of a particular location in RAM,
it is said that control is located at this memory location (or, which is the same thing,
at this point of the program). The logic of this term is based on the fact that the
processor's actions are subordinate to machine instructions (controlled by them), with
the processor fetching the next instruction from memory at the address in the
instruction pointer.
To organize the familiar branching, loops, and subroutine calls, machine
instructions are used that forcibly change the contents of the instruction pointer; as a
result, the instruction sequence is broken and program execution continues from
another place, from the instruction whose address has been stored in the register. This
is called a transition, or control transfer (to another section of the machine code). Note
that the instruction execution cycle discussed above involves first incrementing the
instruction pointer and then executing the instruction, so if an instruction performs a
control transfer, it writes the new address into the instruction pointer on top of the
address already there, which was calculated during the automatic incrementing.
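The cycle just described can be modeled in a few lines. The following toy "machine" is entirely invented for illustration (made-up operation codes, a tiny memory); it shows why a control transfer simply overwrites the already-incremented pointer:

```pascal
program fetchcycle;  { a toy model of the fetch-execute cycle, invented for
                       illustration; not a description of any real processor }
const
    opInc = 1; opJmpBack = 2; opHalt = 3;
var
    memory: array [0..5] of integer = (opInc, opInc, opJmpBack, 0, opHalt, 0);
    ip: integer = 0;          { the "instruction pointer" }
    acc: integer = 0;         { an "accumulator" register }
    code: integer;
    running: boolean = true;
begin
    while running do
    begin
        code := memory[ip];   { fetch the instruction code }
        ip := ip + 1;         { increment the instruction pointer first }
        case code of
            opInc:
                acc := acc + 1;
            opJmpBack:        { a conditional control transfer }
                if acc < 4 then
                    ip := memory[ip]  { overwrite ip: the "jump" }
                else
                    ip := ip + 1;     { skip the operand: no transition }
            opHalt:
                running := false;
        end
    end;
    writeln(acc)  { prints 4 }
end.
```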
We have already discussed transitions in the introductory part of our book (see 67).
It was also said that control transfer instructions are unconditional and conditional:
the former simply put a given address into the instruction pointer, while the latter first
check whether a condition is met and, if it is not, do nothing, i.e. no transition takes
place and execution continues as usual with the next instruction. It is conditional
transitions that make it possible to organize branching, as well as loops whose duration
depends on conditions; unconditional transition instructions play a rather auxiliary,
although very important, role.
Most processors also support a transition with return-address memorization, used
to call subroutines. When performing such a transition, the address in the instruction
pointer is first stored somewhere in memory (as we will see later, the hardware stack
is used for this purpose) and only then replaced with a new address, usually the address
of the beginning of the machine code of a procedure or function. Since by the time an
instruction is executed (in this case, the jump instruction with return-address
memorization) the address of the next instruction is already in the instruction pointer,
it is exactly the address of the next instruction that will be memorized. When the
subroutine finishes its work, it returns control, that is, it places into the instruction
pointer the value that was memorized when it was called, so that in the calling part of
the program execution continues from the instruction following the memorizing
transition instruction. The instruction that does this is usually called a return
instruction.
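In NASM notation this pair of instructions is call and ret; a minimal sketch, assuming 32-bit Linux and its exit system call (the Linux-specific details here are assumptions, introduced for illustration):

```nasm
; a minimal sketch of call/ret, assuming 32-bit Linux (NASM syntax)
        global  _start
section .text
subr:   ret                  ; put the memorized address back into the pointer
_start: call    subr         ; memorize the next instruction's address and jump
        mov     eax, 1       ; execution resumes here after ret;
        mov     ebx, 0       ; eax=1, ebx=0: the Linux exit system call
        int     80h          ; terminate via the operating system
```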
Let us repeat once again that control transfer is understood as a forced change
of the address located in the instruction pointer register (instruction counter). This
is worth remembering.
over access to devices, and such conflicts would, of course, lead to crashes. To limit
what a user task 208 can do, the creators of CPUs declared some of the available
machine instructions privileged. The processor can operate either in privileged mode,
also called supervisor mode, or in restricted mode (also called task mode or user
mode) 210. In restricted mode, privileged instructions are not available; in privileged
mode, the processor can execute all available instructions, both regular and privileged.
The operating system, of course, executes in privileged mode, and switches the
processor to restricted mode when control is transferred to a user task. The processor
can return to privileged mode only if control is returned to the operating system; this
precludes
execution of user program code in privileged mode. Privileged instructions include
instructions that interact with external devices 209; this category also includes
instructions used to configure the memory protection mechanisms and some other
instructions that affect the operation of the system as a whole. All such "global"
actions are considered the prerogative of the operating system.
208 The term "task" is, strictly speaking, quite complex, but, simplifying, a task can be understood
as a program started for execution under the control of an operating system; in other words, when
a program is started on the system, a task is created.
209 There are exceptions to this rule, such as displaying graphical information on the display,
but in such a case the device must be assigned to one user task and made strictly unavailable to
other tasks.
210 In fact, the i386 processor and its descendants have not two but four modes, also called rings
of protection, but in reality operating systems use only ring zero (the highest possible privilege level)
and ring three (the lowest privilege level).
When working under a
multitasking operating system, a user task is only allowed to transform information
in its allocated area of RAM. The task performs all interaction with the outside world
through calls to the operating system. The task cannot even display a string on the
screen by itself: it has to ask the operating system to do it. Such a request by a user
task to the operating system for some service is called a system call. Interestingly,
only the operating system is able to terminate a task; this becomes obvious if we
remember that the task itself is an object of the operating system: it is the operating
system that loads the program code into memory, allocates memory for data, sets up
protection, starts the task, and provides it with processor time; when the task
terminates, its memory has to be marked as free, processor time must no longer be
allocated to it, and so on, and only the operating system can do all this, of course.
Thus, a correct user task cannot do without system calls, because it needs to call the
operating system even just to terminate.
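A sketch of what such requests look like at the machine level, assuming the 32-bit Linux conventions (system call number in EAX, arguments in EBX/ECX/EDX, interrupt 80h; these conventions are an assumption for the sketch, not something the text has introduced yet):

```nasm
; hypothetical sketch (32-bit Linux): a task cannot print a string by itself --
; it asks the operating system via the write system call, and terminates
; via the exit system call
        global  _start
section .data
msg:    db      "hello", 10  ; the string, with a trailing newline
section .text
_start: mov     eax, 4       ; system call number 4: write
        mov     ebx, 1       ; file descriptor 1: standard output
        mov     ecx, msg     ; address of the string
        mov     edx, 6       ; its length in bytes
        int     80h          ; ask the operating system to do the output
        mov     eax, 1       ; system call number 1: exit
        mov     ebx, 0       ; the exit code
        int     80h          ; ask the operating system to terminate the task
```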
One more important point that should be mentioned before starting to study a
particular processor is the presence of a virtual memory mechanism in our runtime
environment. Let's try to understand what it is. As we have already mentioned, RAM
is divided into cells of equal capacity (in our case each cell contains 8 bits of data),
and each cell has its own serial number. It is this number that the processor uses, when
working with memory cells via the common bus, to distinguish them from one another.
Let's call this number the physical address of the memory cell. Initially, memory cells
had no addresses other than physical ones. It was physical addresses that were used in
the machine code of programs, and they were called simply "addresses", without the
qualifying word "physical". With the development of multitasking computing
systems it turned out that due to a number of reasons the use of physical addresses is
inconvenient. For example, a program in machine code that uses physical addresses of
memory cells will not be able to work in another memory area - and in a multitasking
situation it may turn out that the area we need is already occupied by another task. There
are other reasons as well, which we will return to in Volume 2 when we examine the
operating system principles.
Modern processors use two kinds of addresses. The processor itself works with
memory using the physical addresses already familiar to us, but programs running on
the processor use quite different addresses: virtual addresses. A virtual address is a
number from some abstract virtual address space. On i386 processors, virtual
addresses are 32-bit integers, that is, the virtual address space is the set of integers
from 0 to 2³² - 1;
8008, an eight-bit processor, followed by the more advanced Intel 8080 in 1974.
Interestingly, the 8080 used different operation codes, but programs written in assembly
language for the 8008 could be translated for the 8080 without changes. Intel designers
supported a similar "source code compatibility" for the Intel 8086 16-bit processor that
appeared in 1978. Released a year later, the Intel 8088 processor was practically the
same device, differing only in the bit size of the external data bus (for the 8088 it was
8 bits, for the 8086 - 16 bits). It was the 8088 processor that was used in the IBM PC
computer, which gave rise to the numerous and incredibly popular family 212 of
machines, still called IBM PC-compatible or simply IBM-compatible.
211 Recall that a machine word is the portion of information processed by a processor in one go.
212 The popularity of IBM-compatible machines is a very controversial phenomenon; many
other architectures of substantially better design could not survive in a market flooded by
IBM-compatible computers, which were cheaper because of their mass production. Anyway, this is
the situation now and
The 8086 and 8088 processors did not support memory protection and had no
division of instructions into regular and privileged, so it was impossible to run a full
multitasking operating system on computers with these processors 213. The same was
the case with the 80186 processor released in 1982. Compared to its predecessors, this
processor was much faster, because some operations that previous processors
performed in microcode now had hardware implementations; the clock frequency also
increased. The processor included some subsystems that previously had to be provided
by additional chips, such as an interrupt controller and a direct memory access
controller. In addition, the processor's instruction set was expanded with additional
instructions; for example, it became possible to push all general-purpose registers onto
the stack with a single instruction. The address bus of the 8086, 8088 and 80186
processors was 20 bits wide, which allowed addressing no more than 1 MB of RAM
(2²⁰ cells).
The same year, 1982, saw the 80286 processor, the last 16-bit processor in the
series. This processor supported the so-called protected mode of operation and the
segmented virtual memory model, which implied, among other features, memory
protection; four protection rings made it possible to prohibit user tasks from
performing actions affecting the system as a whole, which is necessary for running a
multitasking operating system. The address bus received four additional bits,
increasing the maximum amount of directly addressable memory to 16 MB.
True multitasking operating systems were created only for the next processor in the
line, the 32-bit Intel 80386, for short referred to simply as "I386". This processor, mass
production of which began in 1986, differed from its predecessors by increasing
registers to 32 bits, significantly expanding the instruction system, increasing the
address bus to 32 bits, which allowed to directly address up to 4 GB of physical
memory. The addition of support for paged organization of virtual memory, best suited
for multitasking, completed the picture. It was with the appearance of the I386 that the
so-called IBM-compatible computers finally became full-fledged computing systems.
At the same time the I386 fully preserved compatibility with the previous processors of
its series, which explains its register system, rather strange at first sight. For example,
the general-purpose registers of the 8086-80286 processors were called AX, BX, CX and
DX and contained 16 bits of data each; the I386 and later processors of the line
have registers containing 32 bits each, called EAX, EBX, ECX and EDX (the
letter E stands for "extended"). The lower 16 bits of each of these
registers retain their old names (AX, BX, CX and DX, respectively) and are still
available for operation without their "extended" parts.
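To make this register overlap concrete, here is a small sketch (not part of the book's example program; the values are arbitrary, and the byte-sized names AL, AH and so on are discussed later in this chapter) showing how the "extended" registers share their lower bits with the old 16-bit names:

```nasm
        mov     eax, 12345678h  ; EAX now holds 12345678h
        mov     bx, ax          ; BX = 5678h, the lower 16 bits of EAX
        mov     cl, al          ; CL = 78h, the lowest byte of EAX
        mov     ch, ah          ; CH = 56h, the second byte of EAX
```

Any change to AX, AL or AH is immediately visible in EAX, because they are not separate registers but names for parts of the same one.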
Further development of the x86 processor family up to 2003 was purely
quantitative: speed was increased, new commands were added, but there were no
fundamental changes in the architecture. In 2001 the alliance of Hewlett-Packard and

%include "stud_io.inc"

        global  _start

        section .text
_start: mov     eax, 0
again:  PRINT   "Hello"
        PUTCHAR 10
        inc     eax
        cmp     eax, 5
        jl      again
        FINISH
Let's try to explain something now. The first line of the program contains the
%include directive, which instructs the assembler to insert in place of the directive
itself the entire contents of some file, in this case, the file stud_io.inc. This file
is also written in assembly language and contains descriptions of PRINT, PUTCHAR
and FINISH macros, which we will use respectively to print a line, to move to the
next line on the screen and to terminate the program. Upon seeing the
%include directive, the assembler will read the file with the macro descriptions,
allowing us to use them.
It is important to note that the %include directive must be placed in the program
text before macro names are encountered. The assembler looks through the text from
top to bottom. Initially, it knows nothing about macros and will not be able to process
them if it is not informed about them. After looking through the file containing the
macro descriptions, the assembler remembers these descriptions and continues to
remember them until the translation is complete, so we can use them in the program,
but not before the assembler knows about them. That is why we put the %include
directive at the very beginning of the program: now macros can be used in the whole
program text.
After the %include directive we see a line with the word global; this is
also a directive, we will return to it a little later.
The next line of the program contains the section directive; let us try to explain
its meaning. The Unix executable file is designed in such a way that it stores machine
commands in one place, and initialized data (i.e., data that is given an initial value
directly in the program) in another place, and finally, the third place contains
information about how much memory the program will need for uninitialized data. The
corresponding parts of the executable file are called sections. When loading the
executable file into memory, the operating system creates separate memory areas (so-
called segments) for machine code (taking our section containing machine code as a
basis), for data (here initialized and uninitialized data are combined; in general, a
§ 3.1. Introductory information 531
segment may consist of several sections), and for the stack (no sections correspond to
this segment).
Based on our program text, the assembler generates separate images (i.e., future
memory contents) for each of the sections; we must place our executable code in one
section, descriptions of memory areas with a given initial value in another section, and
descriptions of memory areas without initial values in a third section. The
corresponding sections are called .text, .data, and .bss. The stack section is
formed by the operating system without our participation, so it is not mentioned in
assembly language programs. In our simple program, we only need the .text
section; the directive under consideration tells the assembler to start forming this
section. In the future, when we consider more complex programs, we will have to deal
with all three sections.
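As a sketch only (the data directives db and resb used here are explained later in this chapter, and the labels are made up), a program touching all three sections could be laid out as follows:

```nasm
        section .text           ; machine commands
        global  _start
_start: mov     al, [greeting]  ; code may refer to data by its label
        mov     [answer], al
        ; ... the rest of the code ...

        section .data           ; initialized data, stored in the executable file
greeting db     'G'

        section .bss            ; uninitialized data; only its size is stored
answer  resb    1
```

The point of the sketch is only the division into sections: code in .text, data with initial values in .data, and memory that merely needs to exist in .bss.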
Next in the program we see the line
_start: mov eax, 0
The word mov refers to an instruction that causes the processor to send some data
from one location to another. The instruction is followed by two parameters called
operands; for the mov instruction, the first operand specifies where the
data should be copied to, and the second operand specifies what data should be copied
there. In this particular case, the command requires the number 0 (zero) to be written
to the EAX register [214]. The value stored in the EAX register will be used as a loop
counter, that is, it will indicate how many times we have already typed the word
"Hello"; clearly, at the beginning of the program execution this counter should be zero,
since we have not typed anything yet.
So, the line in question means telling the processor to put a zero in the EAX; but
what's the mysterious "_start:" at the beginning of the line?
The word _start (the underscore in this case is part of the word) is a so-called
label. Let's first try to explain what these labels are "in general", and then we'll tell you
why we need a label in this particular case.
The mov eax,0 command is converted into machine code by the assembler
(see §§1.1.3 and 1.4.7). Note for clarity that the machine code of this command consists
of five bytes: b8 00 00 00 00, the first of which specifies the actual
action "place a given number in a register" together with the number of the EAX register.
The other four bytes (all together) specify the number to be placed in the register; in
this case it is the number 0. During the execution of the program, this code will be
located in some area of RAM (in this case - in five consecutive cells). In some cases,
we need to know what address a particular memory location will have; in the case of
commands, we may need the address, for example, to force the processor to transfer
[214] The reader already experienced in assembly language programming may notice that it is
"more correct" to do this with a completely different command, xor eax, eax, because it
achieves the same effect faster and with less memory consumption; however, for a simple tutorial
example, such a trick requires too long an explanation. We will come back to this issue later
and will certainly consider this and other similar tricks.
control to this location in the program (i.e., to make a conditional or unconditional jump
here).
Of course, RAM can store not only commands but also data. We usually call
memory areas intended for data variables and name them almost the same way as in
high-level programming languages, including Pascal. Naturally, we need to know the
address of the beginning of the memory area allocated for a variable. The address,
as we have already mentioned, is written with eight hexadecimal digits.

Let us now return to our program, to the line

again:  PRINT   "Hello"

As you can easily guess, the word again at the beginning of the line is another
label. For the word Hello to be printed five times, we have to return to this point
in the program four more times; hence
the name of the label. The word PRINT is the name of the macro, and the string
"Hello" is the parameter of the macro. The macro itself is described, as already
mentioned, in the file stud_io.inc. When our assembler sees the macro name and
parameter, it will substitute them with a number of commands and directives, the
execution of which will eventually result in the "Hello" string being displayed on the
screen.
It is very important to realize that PRINT has nothing to do with the
capabilities of the CPU. We have already mentioned this fact several times, but
nevertheless we will repeat it again: PRINT is not the name of any CPU command,
the CPU cannot print anything. The program line we are considering is not a command
but a directive, also called a macro call. Obeying this directive, the assembler will form
a fragment of assembly language text (note for clarity that in this case this fragment
will consist of 23 lines in the case of Linux and 15 lines for FreeBSD) and will translate
this fragment itself, producing a sequence of machine instructions. These instructions
will contain, among other things, a request to the operating system for the data
output service (the write system call). The set of macros, including the PRINT macro, is
introduced for convenience at first, while we do not yet know how to access the
operating system. Later we will learn this, and then the macros described in
stud_io.inc will not be needed; moreover, we will learn to create such macros
ourselves.
Let's go back to the text of our example. The next line looks like
PUTCHAR 10
inc eax
Here we see the machine command inc, meaning an order to increment a given
register by 1. In this case, the EAX register is incremented. Recall that in the EAX
register we have agreed to store information about how many times the word "Hello"
has already been printed. Since the execution of the previous two lines of the program,
containing calls to the PRINT and PUTCHAR macros, eventually led to the printing
of the word "Hello", we should reflect this fact in the register, which we do. Curiously,
the machine code of this command is very short - only one byte (hexadecimal 40,
decimal 64).
Next in our program is the compare command:
cmp eax, 5
The machine command for comparing two integers is denoted by the mnemonic cmp,
from the English "to compare". In this case, the contents of the EAX register
and the number 5 are compared. The result of the comparison is written to a
special processor register called the flags register. This makes it possible, in
particular, to perform a conditional jump depending on the result of the previous
comparison, which we do in the next line of the program:
jl again
Here jl (jump if less) is a mnemonic for the machine command of conditional jump,
which is executed if the previous comparison yielded the result "the first operand is less
than the second operand", i.e. in our case - if the number in the EAX register is less
than 5. In terms of our task, this means that the word "Hello" has been printed less than
five times, so we need to continue printing it, for which we make a transition (transfer
control) to the command marked with the again label.
If the result of the comparison was anything other than "less than", the jl
instruction will do nothing, so the processor will move on to the next
instruction in the sequence. This will happen if the word "Hello" has already been typed
five times, i.e. just when it is time to end the loop. After the loop ends, our initial task
is solved, so it is time to terminate the program as well. This is the purpose of the next
line of the program:
FINISH
The word FINISH stands for, as already noted, a macro that unfolds into a sequence
of commands that executes a request to the operating system to terminate the execution
of our program.
We only need to go back to the beginning of the program and consider the line
global _start
The word global is a directive that requires the assembler to consider some label to
be "global", that is, as if visible from the outside (strictly speaking, visible from outside
the object module; we will consider this concept later). In this case, the _start label
is declared global. As we already know, this is a special label that marks the entry point
into the program, i.e. the place in the program where the operating system should
transfer control after the program is loaded into RAM. It is clear that this marker must
be visible from the outside, which is achieved by the global directive.
So, our program consists of three parts: preparation, a loop, the beginning of which
is marked again, and the final part, which consists of a single line FINISH. Before
starting the loop, we put the number 0 in the EAX register, then at each iteration
of the loop we print the word "Hello", translate the line, increment the EAX register by
one, compare it with the number 5; if the EAX register still contains a number
less than five, we go back to the beginning of the loop (i.e. to the mark again),
otherwise we exit the loop and terminate the program execution.
To try this program, as they say, in practice, you need to arm yourself with some
text editor, type this program and save it in a file with a name ending with ".asm" -
this is how files containing assembly language source code are usually called.
Suppose we have saved the program text in the file hello5.asm. To get the
executable file, we need to perform two actions. The first is to run the NASM
assembler, which will build an object module using the source text we have given. An
object module is not yet an executable file. As we already know from the Pascal part,
large programs usually consist of a whole set of source files called modules, plus we
may want to use someone else's third-party routines organized into libraries. Each
module is compiled separately, resulting in an object file; its contents are so-called
object code, which requires another stage of processing to turn into machine code ready
for execution by the processor. To produce an executable file, all the object files
obtained from the modules must be linked together, libraries must be attached to them,
and all links (addresses) must be finalized, thus turning the object code into machine
code; this is done by the linker, also called the link editor.
Our program consists of only one module and does not need any libraries, but this
does not exclude the assembly (linking) stage. This is the second action needed to build
the executable: run the linker to build the executable from the object file. It is at this
stage that the _start label will be used; let us clarify that the global directive
does not just make the label "visible from the outside": it causes the assembler to
insert information about the label into the object file, where the linker can see it.
So, first, we call the NASM assembler:

nasm -f elf hello5.asm

The "-f elf" flag tells the assembler to produce an object file in ELF format
(executable and linkable format), which is the format used in our system for object
and executable files [216]. The result of running the assembler will be the file
hello5.o containing the object module. Now we can run the linker, which is called ld:

ld hello5.o -o hello5

If you are working with a 64-bit operating system, which is most likely the case these days,
you will have to add another flag so that the linker builds a 32-bit executable; in particular,
for GNU ld on Linux it would look like this:

ld -m elf_i386 hello5.o -o hello5

[216] This is true at least for modern versions of the Linux and FreeBSD operating systems.
Other systems may require a different format for object and executable files.
To find out what hardware platform you are dealing with, you can use the command "uname
-a". This command will produce one rather long line of text, near the end of which you will
find the hardware architecture designation: i386, i586, i686, x86 indicate 32-bit
processors, while x86_64, amd64, etc. indicate 64-bit processors. It may also turn out
that your computer does not belong to the i386 family at all, which may be indicated by
something like armv6l (for example, this is the designation of the Raspberry Pi
architecture), but in this case it has a completely different instruction set and
most likely there is no NASM assembler for it at all. There is nothing you can do about it;
you will have to find another computer.
With the -o flag (from the word output) we have set the name of the
executable file (hello5, this time without the suffix). Let's run it with the command
"./hello5". If we haven't made any mistakes, we will see five lines of Hello.
%define STUD_IO_LINUX
;%define STUD_IO_FREEBSD

The semicolon symbol denotes a comment here, i.e. the assembler sees the first of these two
lines and ignores the second as a comment. To adapt the file for FreeBSD, we need to take
the first line out of action and enable the second one. To do this, we remove the semicolon
at the beginning of the second line and put it at the beginning of the first line. The result
looks like this:

;%define STUD_IO_LINUX
%define STUD_IO_FREEBSD
After this edit, your stud_io.inc file is ready to run under FreeBSD.
In the program we described in the previous paragraph, we used the PRINT,
PUTCHAR and FINISH macros. In addition to these three macros, our
stud_io.inc file also supports the GETCHAR macro, so there are four
macros in total.
The PRINT macro is designed to print a string; its argument must be a string in
apostrophes or double quotes, it cannot print anything else.
The PUTCHAR macro is designed to print a single character. As an argument it
accepts the character code written as a number or as the character itself taken in quotes
or apostrophes; you can also use a single-byte register as an argument to this macro -
AL, AH, BL, BH, CL, CH, DL or DH. You cannot use other registers as an
argument to PUTCHAR! In addition, the argument of this macro can be an effective
address enclosed in square brackets; then the character code will be taken from the
memory location at this address.
The GETCHAR macro reads a character from the standard input stream (from
the keyboard). After reading, the character code is written to the EAX register; since a
character code always fits in one byte, it can be retrieved from the AL register, and the
remaining bits of EAX will be zero. If there are no more characters (the familiar
end-of-file situation has been reached; remember that in Unix it can be simulated by
pressing Ctrl-D), the value -1 will be written to EAX (hexadecimal
FFFFFFFF, that is, all 32 bits of the register equal one). This macro does
not accept any parameters.
The FINISH macro terminates program execution. This macro can be called
without parameters, or it can be called with a single numeric parameter that specifies
the termination code (see page 310).
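As an additional sketch built only from the macro behavior just described (assuming the same stud_io.inc file is available), here is a program that copies its standard input to standard output character by character:

```nasm
%include "stud_io.inc"
        global  _start
        section .text
_start: GETCHAR                 ; next character code into EAX
        cmp     eax, -1         ; -1 signals end of file
        je      quit
        PUTCHAR al              ; print the character just read
        jmp     _start
quit:   FINISH
```

Note that PUTCHAR is given the AL register: GETCHAR leaves the character code in the low byte of EAX, and AL is one of the single-byte registers PUTCHAR accepts.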
Usually, when working in assembly language, a column the width of one tab is allocated
for labels, and, of course, it is the tab character (not spaces!) that is placed before each
command (including after labels). Some programmers prefer to give labels two tabs;
this allows longer labels to be used without having to allocate separate lines for them:
fill_memory: jecxz fm_q
fm_lp:       mov   [edi], al
             inc   edi
             loop  fm_lp
fm_q:        ret
Quite often you can find a style that allocates a separate column for the instruction
mnemonic. It looks like this:

fill_memory:    jecxz   fm_q
fm_lp:          mov     [edi], al
                inc     edi
                loop    fm_lp
fm_q:           ret
Unfortunately, this style "breaks" when using, for example, macros with names longer
than seven characters, because the name of such a macro should be written, quite
naturally, in the command column, but there is not enough space in this column.
However, an exception to the rule can be made for macros.
It should be emphasized that a number of general principles of code design apply
to assembly language in the same way as to any other programming language. Let us
try to enumerate them.
First of all, let us remind you that the program text must consist exclusively of
ASCII characters; any characters not included in the ASCII set are not allowed in the
program text even in comments, not to mention string constants or even identifiers. We
have already mentioned this point several times, notably in the footnotes on pages 258
and 274. Among other things, it was stated that comments should be written in English,
otherwise it is better not to write them at all; for assembly language this
recommendation is not suitable, it is absolutely impossible to do without comments
here, but only one thing follows from this: if you have any problems with English, you
should solve them immediately, and at first use a dictionary.
As with any program, assembly language text is subject to the eighty-column rule:
your program lines must not exceed 79 characters in length. The reasons for this
were discussed in detail in §2.12.7.
Of course, we should not forget about choosing meaningful names for labels,
especially since in assembly programs most of the identifiers entered by the
programmer are global; the rules for dividing code into separate subroutines, as well as
into modules and subsystems, are also completely independent of the language used
and apply to assembly language just like everywhere else; here we can advise you to
return to the part about Pascal again and reread §§2.4.4, 2.12.10, 2.14.3 and 2.14.4.
§ 3.2. Basics of the I386 command system 546

[Figure: the register set of the I386 processor. The 32-bit general-purpose registers
EAX, EBX, ECX and EDX each contain a 16-bit register (AX, BX, CX, DX) as their lower
half, which in turn splits into a high and a low byte (AH/AL, BH/BL, CH/CL, DH/DL).
The index and pointer registers ESI, EDI, EBP and ESP contain SI, DI, BP and SP as
their lower 16 bits. The figure also shows the instruction pointer EIP, the FLAGS
register (the lower 16 bits of EFLAGS), and the segment registers CS, SS, DS, ES,
FS and GS.]
The processor commands that can be used only in privileged mode and,
consequently, are used only by the operating system, will not be considered at all: a
story about them would take up too much space, and to try them in practice, one would
have to write one's own operating system. This is not necessary for the intended
educational purposes, and if desired, the reader can use the reference literature to learn
more about the processor's capabilities.
and contains initialized data, i.e. such global variables for which the initial value is set
in the program. The second section is called the uninitialized data section or BSS
section and is denoted by ".bss"; as the name makes clear, this section is for
variables for which no initial value is specified [234]. The BSS section differs from the data
section in one important feature: since the contents of the data section at the moment of
program start must be as specified by the program, the executable file has to store its
image (as a whole), whereas for the BSS section it is enough to store only its size.
Already at runtime, a task may ask the operating system to increase the data segment;
this creates a new memory area that can be used to store dynamic variables (see
§2.10.3). However, the way memory is allocated for the heap (recall that this is the
name of the memory area in which dynamic variables are placed) depends on the
operating system and the compiler used; often a separate segment is created for the
heap.
We will not consider working with dynamic memory in assembly language, but for
inquisitive readers, we will inform you that in Linux additional memory allocation is done by the
brk system call, which can be found in the technical documentation of the kernel; this call
allows you to change (usually increase) a data segment. FreeBSD allocates additional
memory using the mmap system call, which is unfortunately much more complicated,
especially for assembly language programs; we will return to this call in Volume 2 after we
learn more about C. The result of mmap's operation is a separate data segment.
The third main segment is the so-called stack segment; it is needed to store local
variables in subroutines and return addresses from subroutines. We still have a detailed
story about the stack ahead of us, for now we will just note that this segment is also
writable; its availability for execution depends on the particular operating system and
even on the particular version of the kernel: for example, in most versions of Linux it
is possible to pass control to the stack segment, but a special "patch" [235] to the kernel
source code removes this possibility. This segment can also grow in size as needed, and this
happens automatically (as opposed to growing the data segment, which must be
requested from the operating system explicitly). The stack segment is always present in a
user task; its initial contents depend only on the program's startup parameters. The
executable file does not contain any information about this segment, so no sections
correspond to it.

[233] Data; read "data".

[234] Originally the abbreviation BSS stood for Block Started by Symbol, which was due to the
peculiarities of one old assembler. Nowadays programmers prefer to read BSS as Blank Static
Storage.

[235] Programmers use the English word patch to denote a file containing a formal list of
differences between two versions of a program's source code; having the initial version of the
source code and such a file, one can use a special program to obtain the modified version,
which allows sending each other not the whole program but only the file containing the
necessary changes. The word patch literally translates as "patch", but programmers have not
adopted this translation; there is no established Russian-language term corresponding to the
English patch, and in most cases the direct transliteration from English is used.

Similarly, you can use the assembler to create an image of a memory area that
contains data rather than commands. To do this, we need to tell the assembler how
much memory we need for certain purposes, and possibly set the values that will be
placed in this memory before the program starts.

Using our instructions, the assembler will form a separate image of memory
containing commands (the image of the .text section) and a separate image of
memory containing initialized data (the image of the .data section), and will
also calculate how much memory we need whose initial value we do not care about, so
that no image needs to be formed for it, only its total size specified (the size
of the .bss section). The assembler will write all this into a file with object code, and
the linker will form an executable file from such files (possibly several), which
contains, besides the actual machine code, first, the data to be written into memory
before the program starts and, second, an indication of how much memory the program
will need beyond the memory occupied by the machine code and initial data. To tell
the assembler in which section a given fragment of the memory image being formed
should be placed, we use the section directive in the assembly language program;
for example, the line

section .text

means that the result of processing the subsequent lines should be placed in the code
section, and the line
section .bss
causes the assembler to switch to forming the uninitialized data section. Section
switching directives can occur as many times as you like in a program - we can form
part of one section, then part of another, then return to forming the first section.
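For instance (a sketch with made-up labels, using the data directive db described just below), a program may alternate sections like this:

```nasm
        section .data
msg1    db      'first'
        section .text
        ; ... some commands ...
        section .data           ; back to forming the data section
msg2    db      'second'
```

The assembler simply appends each fragment to the image of whichever section is currently being formed.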
We can inform the assembler about our memory needs by using memory
reservation directives, which are divided into two types: directives for reserving
uninitialized memory and directives for setting initial data. Usually, both types of
directives are preceded by a label so that it can be used to refer to the address in memory
where the assembler has allocated the required cells for us.
Memory reservation directives (of both kinds) in NASM assembler use bytes,
words, as well as double and quadruple words as memory capacity units, and there is
a small problem with this terminology. We have mentioned several times that the I386
processor we are studying is a 32-bit processor, that is, its machine word (see
§1.4.7) is 32 bits. We know from §3.1.3 that the previous processors of the same line (up
to and including the 80286) were 16-bit, that is, they had a machine word of 16 bits.
Programmers working with these processors at the level of machine instructions were
accustomed to call two bytes of information a "word", and four bytes were called a
"double word". When the word size doubled with the release of the next processor,
programmers did not change the usual terminology, and they could hardly do it. Thus,
our NASM assembler can generate machine code for all x86 processors - not only for
32-bit processors, but also for 16-bit and 64-bit ones; it would be strange to change the
meaning of the term "word" as a unit of memory quantity measurement every now and
then, because forming an image of sections which do not contain machine commands
(.data and .bss sections) is not connected with the type of the processor used at
all.
All this creates a certain confusion: we remember that the machine word is 32 bits,
i.e. four bytes, but we use the word word in assembly language programs to denote
a memory area of two bytes; a four-byte memory area is called a "double word"
(dword), and "quadwords" (qword, eight-byte areas) are also
occasionally used.
Uninitialized memory reservation directives tell the assembler to allocate a given
number of memory locations, and nothing is specified beyond the number. We do not
require the assembler to fill the allocated memory with any specific values; it is enough
that the memory is available at all. The resb directive is used to reserve a specified
number of single-byte cells; the resw directive is used to reserve memory
for a certain number of "words", i.e. two-byte values (for example, short integers); the
resd directive is used for "double words" (four-byte values); the directive
is followed (as a parameter) by a number indicating the number of values for which we
reserve memory. As we have already mentioned, a label is usually placed before the
memory reservation directive. Initial values are set with the directives db, dw and
dd, which place into memory one-byte, two-byte and four-byte values listed as their
parameters. For example, the line

fibon   dw      1, 1, 2, 3, 5, 8, 13, 21

will reserve memory for eight two-byte "words" (i.e. 16 bytes in total), with the first
two "words" containing the number 1, the third word containing the number 2, the
fourth word containing the number 3, etc. The fibon label will be associated with
the address of the first byte of the memory allocated and filled in this way.
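For comparison, a sketch of pure reservations without initial values (the labels are hypothetical) could look like this:

```nasm
        section .bss
buffer  resb    64              ; 64 one-byte cells
shorts  resw    10              ; 10 two-byte "words" (20 bytes)
vector  resd    4               ; 4 "double words" (16 bytes)
```

Here the labels buffer, shorts and vector are associated with the addresses of the reserved areas, but nothing is said about their contents at program start.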
Numbers can be specified not only in decimal, but also in hexadecimal, octal, and
binary. Hexadecimal numbers in NASM assembler can be specified in three ways: by
adding the letter h to the end of the number (e.g., 2af3h), or by writing
the $ symbol before the number ($2af3), or by putting 0x symbols before the
number, as in C (0x2af3). When using the $ symbol, care must be taken that the
number immediately following the $ is a number, not a letter, so if the number
begins with a letter, you must add a 0 (e.g., write $0f9, not $f9). Similarly, you
should watch the first character when using the letter h: for example, a21h will be
perceived by the assembler as an identifier, not as a number, so you should write
0a21h; but with the number 2fah such a problem does not arise initially, because
the first character in its record is a digit. An octal number is denoted by adding the letter
o or q after the number (e.g., 634o, 754q). Finally, a binary number is denoted by
the letter b (10011011b).
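The notations just listed can be combined freely; a sketch with arbitrary labels:

```nasm
h1      dw      2af3h           ; hexadecimal, trailing letter h
h2      dw      $2af3           ; hexadecimal, leading $
h3      dw      0x2af3          ; hexadecimal, C-style prefix
h4      dw      0a21h           ; leading zero needed: a21h alone would be an identifier
o1      dw      634o            ; octal, letter o
o2      dw      754q            ; octal, letter q
b1      db      10011011b       ; binary
```

All four hexadecimal spellings of the same value are interchangeable; which one to use is purely a matter of taste and habit.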
Character codes and text strings deserve special mention. As we already know, to
work with text data, each character is assigned a character code - a small positive
integer. We are already familiar with the ASCII encoding table (see §1.4.5). To prevent
the programmer from having to memorize the codes corresponding to printed characters
(letters, numbers, etc.), the assembler allows you to write the character itself instead of
the code by enclosing it in apostrophes or double quotes. Thus, the directive
fig7 db '7'
will place in memory a byte containing the number 55, the code of the character
"seven", and the address of this cell will be associated with the label fig7. We can
also write a whole string at once, for example, like this:
In this case, the welmsg address will contain a string of 22 characters (i.e. an array
of single-byte cells containing the codes of the corresponding characters). As already
mentioned, NASM allows using both single quotes (apostrophes) and double quotes,
so that the next string is completely similar to the previous one:
Within double quotes, apostrophes are treated as a normal character; the same can be
said for the double quote character within apostrophes. For example, the phrase "So I
say: "Don't panic!"" can be set as follows:
panic   db      'So I say: "Don', "'", 't panic!"'
Here we first used an apostrophe to mark the beginning of a string literal, so that the
double-quote character marking the beginning of direct speech entered our string as a
simple character. Then, when we needed an apostrophe in the string, we closed the
single quotes and used the double quotes to type the apostrophe character inside them.
At the end, we used apostrophes again to set the rest of our phrase, including the double-
quote character ending the direct speech.
Note that strings in single and double quotes can be used not only with the db directive,
but also with the dw and dd directives, but you need to take into account some subtleties,
which we will not consider.
When writing programs, initial data directives are usually placed in the .data
section (i.e., the .data section directive is placed before the data
description), while memory reservation directives are allocated to the .bss
section. This is due to the already mentioned difference in their nature: initialized
data must be stored in the executable file, while for uninitialized data it is enough to
specify their total number. The .bss section, as we remember, differs from .data
in that only a size specification is stored in the executable; in other words, the size of
the executable does not depend on the size of the .bss section. Thus, if we add a
directive to the .data section
db "This is a string"
— then the size of the executable file will increase by 16 bytes (we have to store the
string "This is a string" somewhere), whereas if we add the directive to the
.bss section
resb 16
— the assembler will allocate 16 bytes of memory, but the size of the executable file
will not change at all.
We can also place the directives for setting the initial data in the .text section, so that
they will end up in the code segment during operation; we just need to remember that then
this data cannot be changed during the program operation. But if we have a large array in our
program that we don't need to change (some table of constants or, more often, some text that
our program must print), it is more advantageous to place this data in the code segment,
because if users run many instances of our program at the same time, they will have one code
segment for all of them and we will save memory. It is clear that this saving is possible only
for immutable data. Remember that an attempt to change the contents of a code segment at
runtime will cause the program to crash!
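As a sketch of this idea (the label name and the text are arbitrary), an immutable message can be placed directly in the code section:

```nasm
section .text
usage   db      "usage: myprog <filename>", 10  ; read-only text in the code segment
        ; ...
global _start
_start:
        ; the program may read the bytes at usage, but must never write
        ; to them: an attempted write would crash the program
```

The message is then shared between all running instances of the program, just as the machine code itself is.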
Assembler allows us to use any commands and directives in any sections. In particular,
we can put machine commands in the data section, and they will be translated into the
corresponding machine code as usual, but we will not be able to transfer control to this code.
Still, in some exotic cases this may make sense (indeed, we can treat programs as data, so
there are programs that work with machine code as data), so the assembler will silently
execute our instructions, generating machine code that is never executed. If the assembler
encounters memory reservation directives ( resb, resw, etc.) in the .data section, it will
also do its job, but in this case a warning message will be issued; indeed, the situation is a bit
strange, because it increases the size of the executable file without any effect, although it does
not lead to any fatal consequences. Directives to reserve uninitialized memory in a code
section will look even stranger: indeed, if the initial value is not specified and we cannot change
this memory, then no meaningful value will ever get into such memory, and what good is it
then! Nevertheless, the assembler will continue the translation even in this case, issuing only
a warning message. A warning message will also be generated if the BSS section contains
anything other than uninitialized memory reservation directives: the assembler knows for sure
that the image generated for this section will have nowhere to write to. Even though the
assembler will continue working in all of the above cases after issuing a warning, it is more
correct to assume that you have made a mistake and correct the program.
For example, the command

mov eax, ebx

copies data from the EBX register to the EAX register. It is important to note
that the mov command only copies data without performing any conversions;
for data conversions there are other commands.
In the examples discussed above, we have seen at least two uses of the mov
command:
mov eax, ebx
mov ecx, 5
The first variant copies the contents of one register into another register, while the
second variant puts into the register some number specified directly in the command
itself (in this case the number 5). This example shows that operands are of different
types. If the operand is the name of the register, we speak of a register operand; if the
value is specified directly in the instruction itself, such an operand is called a direct
operand.
In fact, in this case we should not even talk about different types of operands, but about
two different commands, which are simply denoted by the same mnemonics. The two mov
commands from our example are translated into completely different machine codes, with
the first one occupying two bytes in memory and the second one five bytes, four of which are
used to place the immediate operand.
In addition to direct and register operands, there is a third type of operand - an
address operand, also called a memory operand. In this case, the operand in one way
or another specifies the address of the memory location or area to be dealt with by the
command. It should be remembered that in NASM assembly language the "memory"
operand is always denoted by square brackets, in which the address itself is written.
In the simplest case, the address is given explicitly, i.e. in the form of a number; usually
when programming in assembly language we use labels instead of numbers, as already
mentioned. For example, we can write:
section .data
; ...
count dd 0
section .text
; ...
mov [count], eax
- this mov command will indicate copying of data from the EAX register to the
memory area marked with the count label, and, for example, the command

mov edx, [count]

will, on the contrary, indicate copying from memory at address count to register
EDX. To understand the role of square brackets, consider the command

mov edx, 40f2a008h

Now it is obvious that this is just a familiar form of the mov instruction with a direct
operand, i.e. this instruction writes the number 40f2a008 into the EDX register
without looking into whether this number is the address of any memory location or not.
If we add square brackets, we are talking about accessing memory at a given address,
i.e. the number will be used as the address of the memory area where the value to be
handled (in this case, put into the EDX register) is located. The instruction

mov ebx, [eax]

means "take the value in the EAX register, use that value as an address, access memory
at that address, fetch four bytes from there, and write them to the EBX register",
whereas the instruction

mov ebx, eax

meant, as we have already seen, simply "copy the contents of the EAX register to the
EBX register".
Let's consider a small example. Suppose we have an array of single-byte elements
intended for storing a string of characters, and we need to put the code of the '@'
character into each element of this array. Let's see what code fragment we can use to
do it (let's use the commands we already know from the example on page 533, adding
to them the decrement command dec, which decrements its operand by one):
section .bss
array   resb 256            ; array of 256 bytes

section .text
        ; ...
        mov  ecx, 256       ; number of elements -> into the counter (ECX)
        mov  edi, array     ; array address -> into EDI
        mov  al, '@'        ; required code -> into the one-byte AL
again:  mov  [edi], al      ; enter the AL code into the next element
        inc  edi            ; increment the address
        dec  ecx            ; decrease the counter
        jnz  again          ; if it is not zero, repeat the loop
Here we used the ECX register to store the number of iterations of the loop that still
remain to be executed (initially 256, at each iteration we decrease it by one, reaching
zero - end the loop), and to store the address we used the EDI register, in which before
entering the loop we put the address of the beginning of the array (the label array),
and at each iteration we increase it by one, thus moving to the next cell.
An attentive reader may notice that the code fragment is not quite rationally written. First,
you could have used only one variable register, either by comparing it not to zero but to the
number 256, or by viewing the array from the end. Secondly, it is not quite clear why the AL
register was used to store the character code, because you could have used the direct
operand directly in the command that puts the value into the next element of the array.
All this is true, but then we would have to use, first, an explicit indication of the operand
size, which we have not discussed yet; and, second, we would have to use the cmp
command or complicate the command of assigning the initial value of the address. By using
a code that is not quite rational here, we were able to limit ourselves to fewer explanations
that distract attention from the main task.
Thus, the address for memory access is not always predetermined; we can calculate
the address already during program execution, write the result of calculations into the
processor register and use indirect addressing. The address at which the next machine
instruction will address the memory (no matter whether this address is explicitly set or
calculated) is called the executive address. Above we have considered the situations
when the address is calculated, the result of calculations is stored in a register and it is
the value stored in the register that is used as the executive address. For the convenience
of programming, the i386 processor allows you to set the executive address so that it is
calculated during instruction execution.
More specifically, we can require the processor to take some predetermined value
(perhaps zero), add to it a value stored in one of the registers, then take a
value stored in another register, multiply it by 1, 2, 4, or 8, and add the result to the
existing address.

[Fig. 3.2. General form of the executive address: a constant, plus any of the eight
general-purpose registers, plus any register except ESP multiplied by 1, 2, 4 or 8]

For example, we can write

mov     eax, [array+ebx+edi*2]
By executing this instruction, the processor will add the number specified by the
array label[236] with the contents of the EBX register and the doubled contents of the
EDI register, use the result of the addition as the executive address, fetch 4 bytes from
the memory area at this address and copy them to the EAX register. Each of the three
summands used in the executive address is optional, so we can use only two summands
or just one - as we have done so far.
It is important to realize that the expression in square brackets cannot be arbitrary
in any way. For example, we cannot take three registers, we cannot multiply one
register by 2 and another by 4, we cannot multiply by numbers other than 1, 2, 4, and
8, we cannot multiply two registers with each other, or subtract a register value instead
of adding it, etc. A general view of the executive address is shown in Fig. 3.2; as can
be seen, ESP cannot be used as the register to be multiplied by a factor; however, any
of the eight general-purpose registers can be used as the register whose value is simply
added to a given address.
The assembler allows certain liberties with address writing, as long as it can correctly
convert the address into a machine instruction. First, the summands can be arranged in any
order. Second, you can use two or more constants instead of one: the assembler itself will add
them up and write the result into the resulting machine command. Finally, you can multiply a
register by 3, 5 or 9: if you write, for example, [eax*5], the assembler will "translate" it as
[eax+eax*4]. Of course, if you try to write [eax+ebx*5], the assembler will generate
an error, because you have already used the summand it needs.
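To illustrate, here are a few addresses the assembler will and will not accept (a small sketch; the register choices are arbitrary):

```nasm
mov     eax, [ebx+ecx*4+8]      ; fine: base + index*4 + constant
mov     eax, [8+ecx*4+ebx]      ; the same command: summands in any order
mov     eax, [ecx*5]            ; accepted: translated as [ecx+ecx*4]
;mov    eax, [eax+ebx+ecx]      ; error: three registers are not allowed
;mov    eax, [esp*2]            ; error: ESP cannot be the scaled register
```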
To understand why such a complex type of executive address may be needed, it is
enough to imagine a two-dimensional array consisting, for example, of 10 lines, each
of which contains 15 four-byte integers. Let's call this array matrix, putting the
label of that name at its beginning.
[236] Recall that a label is nothing but a designation of some number, in this case, most likely, the
address of the beginning of the array.
To access the elements of the N-th line of such an array, we can calculate the offset
from the beginning of the array to the beginning of this N-th line (to do this we need to
multiply N by the length of the line, which is 15 * 4 = 60 bytes), put the result of
calculations, say, in EAX, then in another register (for example, in EBX) put the
number of the desired element in the line - and the executive address of the form
[matrix+eax+4*ebx] will exactly show us the place in memory where the
desired element is located.
The processor's ability to calculate the effective address can be used separately
from memory access if desired. For this purpose, the lea instruction is provided (the
name is derived from the words load effective address). The command has two
operands, and the first one must be a register operand (2 or 4 bytes in size), and the
second one must be an operand of the "memory" type. The command does not make
any memory access; instead, the address calculated in the usual way for the second
operand is entered into the register specified by the first operand. If the first operand is
a two-byte register, the lower 16 bits of the calculated address will be written to the
register. For example, the command

lea eax, [1000+ebx+ecx*8]
will take the value of the ECX register, multiply it by 8, add to it the value of the EBX
register and the number 1000, and the result will be entered into the EAX
register. Of course, you can use a label instead of a number. The restrictions on the
expression in brackets are exactly the same as in other cases of using an operand of the
"memory" type (see Figure 3.2 on page 561).
Let us emphasize once again that the lea command only calculates an address
without accessing memory, despite the use of an operand of the "memory" type. It can
be used for ordinary arithmetic calculations, including those not related to addresses,
and, I must say, sometimes it can be very convenient. As we will see later, integer
multiplication commands are rather cumbersome, so, say, if you want to multiply by
three, five or nine, the easiest way to do it is to use lea.
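For example, multiplications by 3, 5 and 9 can be sketched like this:

```nasm
lea     eax, [eax+eax*2]        ; EAX := EAX * 3
lea     ebx, [ebx+ebx*4]        ; EBX := EBX * 5
lea     ecx, [ecx+ecx*8]        ; ECX := ECX * 9
lea     edx, [esi+esi*4]        ; EDX := ESI * 5, the source register stays intact
```

Note that, unlike the arithmetic commands, lea does not change any flags and can place the result in a register other than its source.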
Thus, a command like

mov [x], 25

will be rejected as incorrect: it is not clear whether you mean a byte with value 25, a
"word" with value 25, or a "double word" with value 25. Nevertheless, a
command like the one above may well be necessary, and the processor knows how to
execute such a command. To use such an instruction, we need to explain to the
assembler exactly what we mean by putting a size specifier before any of the operands
- the word byte, word, or dword, meaning byte, word, or double word
(i.e., size 1, 2, or 4 bytes), respectively. For example, to write the number 25 into a
four-byte memory location at address x, we can write
mov [x], dword 25
or

mov dword [x], 25

Interestingly, the commands mov [x], dword 25 and mov [x], dword x
use the same machine code for the operation and differ only in the value of the second
operand, which is direct in both cases; indeed, the label x will be replaced by an
address, that is, simply a number.
The command

add eax, ebx

means "take a value from the EAX register, add to it a value from the EBX
register, and write the result back to the EAX register". The command

sub [x], ecx

means "take a four-byte number from memory at address x, subtract from it the value
from the ECX register, write the result back into memory at the same address". The
command
add edx, 12
will increment by 12 the contents of the EDX register, and the command

add dword [x], 12

will do the same with the four-byte memory location at address x; note that this time
we had to explicitly specify the size of the operand, as discussed in the previous paragraph.
Interestingly, the add and sub instructions do not care whether we consider
their operands to be signed or unsigned numbers[237]. Adding and subtracting signed and
unsigned numbers is done exactly the same from an implementation point of view, so
that when adding and subtracting, the processor may not (and does not) know
whether it is working with signed or unsigned numbers. It is the programmer's
responsibility to remember which numbers are meant.
The add and sub commands set the values of the OF, CF, ZF, and SF flags
according to the result obtained (see page 548). The ZF flag is set if the last operation
results in zero, otherwise the flag is cleared; it is clear that the value of this flag is
meaningful for both signed and unsigned numbers, because the representation of zero
is the same for them.
The flags SF and OF make sense to consider only when working with signed
numbers. SF is set if a negative number is obtained, otherwise it is reset. The
processor sets this flag by copying the high bit of the result into it; for signed numbers
the high bit, as we know, corresponds to the sign of the number. The OF flag is set if
an overflow has occurred, which means that the sign of the obtained result does not
correspond to the one that should be obtained based on the mathematical sense of
operations - for example, if the result of adding two positive ones is negative or vice
versa. Clearly, this flag has no meaning for unsigned numbers.
Finally, the CF flag is set if (in terms of unsigned numbers) there is a carry
from a higher digit or a borrow from a non-existent digit. In terms of meaning, this flag
is analogous to OF when applied to unsigned numbers (the result does not fit into
the operand size or is negative). CF has no meaning for signed numbers.
Without knowing what numbers are meant, the processor sets all four flags based
on the results of the add and sub instructions; the programmer must use those that
correspond to the meaning of the operation performed.
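A small sketch may help to see how the same result sets CF and OF differently depending on the interpretation of the operands:

```nasm
mov     al, 200         ; as unsigned: 200; as signed: -56
add     al, 100         ; AL := 44
; unsigned view: 200 + 100 = 300 does not fit into 8 bits, so CF = 1
; signed view: -56 + 100 = 44, the sign is correct, so OF = 0
; the processor sets both flags; the programmer picks the relevant one
```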
The presence of the carry flag allows you to organize addition and subtraction of
unsigned numbers that do not fit into the registers in a way reminiscent of school
addition and subtraction "in column" - the so-called addition and subtraction with
carry. The i386 processor has adc and sbb commands for this purpose. By their
operation and properties they are completely similar to the add and sub
commands, but they differ from them in that they take into account the value of the
[237] We discussed signedness and unsignedness of integers in the introductory section, see §1.4.2;
if you don't feel confident in handling these terms, be sure to reread that section and understand the
issue; if necessary, find someone who can explain it to you. Otherwise you run the risk of not
understanding anything at all.
carry flag (CF) at the moment when the operation is started. The adc command adds
the carry flag value to its final result; the sbb command, on the other hand, subtracts
the carry flag value from its result. After the result is generated, both commands set
all the flags, including CF, according to the new result.
Here is an example. Suppose we have two 64-bit integers and the first one is written
into the EDX (high 32 bits) and EAX (low 32 bits) registers and the second one is
written into the EBX and ECX registers in the same way. Then these two numbers
can be added by the commands

add eax, ecx    ; add the low parts
adc edx, ebx    ; now the high ones, taking the carry into account

If we need to subtract, it is done by the commands

sub eax, ecx    ; subtract the low parts
sbb edx, ebx    ; now the high ones, taking the borrow into account
Multiplication and division instructions may seem to be organized in a very awkward
way for the programmer. The reason for this seems to be that the creators of the i386
processor and its predecessors acted here primarily out of considerations of
convenience in implementing the processor itself.
It should be said that multiplication and division cause some difficulties not only
for processor designers, but also for programmers, not only because of the
inconvenience of the corresponding commands, but also because of their very nature.
First, unlike addition and subtraction, multiplication and division are performed quite
differently for signed and unsigned numbers, so different commands are necessary.
Second, interesting things happen with operand sizes. In multiplication, the size
(number of significant bits) of the result can be twice the size of the original operands
(not larger by one single bit, as in addition and subtraction), so if we don't want to lose
information, a single register is not enough: we need an additional register to store
the higher bits of the result. With division the situation is even more interesting: if the
modulus of the divisor exceeds 1, the size of the result will be smaller than the size
of the dividend (to be more precise, the number of significant bits of the result of binary
division does not exceed n - t + 1, where n and t are the numbers of significant bits of
the dividend and the divisor respectively), so it is desirable to be able to make the
dividend longer than the divisor and the result. In addition, integer division gives not one but two
numbers as the result: a quotient and a remainder. It is desirable to combine finding the
quotient and remainder in one operation, otherwise it may lead to double execution (at
the level of electronic circuits) of the same operations.
All integer multiplication and division commands have only one operand[239],
which specifies the second multiplier in multiplication commands and the divisor in
division commands, and this operand can be of register or memory type, but not direct.
In the role of the first multiplier and the dividend, as well as the place for recording
the result, implicit operands are used: the registers AL, AX, EAX and, if necessary,
the register pairs DX:AX and EDX:EAX (recall that the letter A stands for the word
"accumulator"; this special role of the EAX register was discussed on page 546).
The mul command is used to multiply unsigned numbers, and the imul
command is used to multiply signed numbers. In both cases, depending on the digit
capacity of the operand (the second multiplier), the first multiplier is taken from the
register AL (for a one-byte operation), or AX (for a two-byte operation), or EAX
(for a four-byte operation), and the result is placed in the register AX (if the
operands were one-byte), or in the register pair DX:AX (for a two-byte operation), or
[238] Some processors, even modern ones, do not have these operations at all, and the reason for this is
the high complexity of their hardware implementation. On such processors, multiplication has to be
performed "manually", by binary long multiplication; usually a subroutine is created for this purpose.
[239] In fact, there is an exception to this rule: the signed integer multiplication command imul has
two-operand and even three-operand forms, but we will not consider them: they are even harder to
use than the usual one-operand form.
in the register pair EDX:EAX (for a four-byte operation). This can be visualized more
clearly in the form of a table (see Table 3.1).

Table 3.1. Location of the implicit operand and the results of integer multiplication
and division operations, depending on the size of the explicit operand

   size          multiplication              division
  (bits)    multiplier    result     dividend   quotient  remainder
    8           AL          AX          AX         AL        AH
   16           AX        DX:AX       DX:AX       AX         DX
   32          EAX      EDX:EAX     EDX:EAX      EAX        EDX
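For example, an unsigned multiplication whose result does not fit into 32 bits might be sketched as follows (the values here are arbitrary):

```nasm
mov     eax, 100000
mov     ebx, 100000
mul     ebx             ; EDX:EAX := EAX * EBX = 10^10 = 2540be400h
                        ; EDX = 2, EAX = 540be400h; CF = OF = 1,
                        ; since the high half of the result is non-zero
```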
The mul and imul commands reset the CF and OF flags if the upper half
of the result is not actually used, i.e. all significant bits of the result fit in the lower half.
For mul this means that all digits of the higher half of the result contain zeros, for
imul - that all digits of the higher half of the result are equal to the high bit of
the lower half of the result, i.e. the whole result, whether it is the AX register or
register pairs DX:AX, EDX:EAX, is a sign extension of its lower half (respectively
registers AL, AX or EAX). Otherwise, CF and OF are set (both of them). The
values of the other flags are undefined after execution of mul and imul; this means
that nothing meaningful can be said about their values, and different processors may
set them differently, and even as a result of executing the same instruction on the same
processor with the same operand values, the flags may (at least theoretically) get
different values.
To divide (and find the remainder of the division) integers, the div command (for
unsigned numbers) and idiv (for signed numbers) are used. The only operand of the
command, as mentioned above, specifies the divisor. Depending on the size of this
divisor (1, 2 or 4 bytes), the dividend is taken from register AX, register
pair DX:AX or register pair EDX:EAX, the quotient is placed in register AL, AX or
EAX, and the remainder of the division in registers AH, DX or EDX, respectively
(see Table 3.1). The quotient is always rounded towards zero (for unsigned and positive
values, downwards; for negative values, upwards). The sign of the remainder
calculated by the idiv command always coincides with the sign of the dividend, and
the absolute value (modulus) of the remainder is always strictly less than the modulus
of the divisor. The values of the flags after integer division are undefined.
The situation when the divisor contains the number 0 at the moment of execution of the
div or idiv instruction deserves special consideration. It is known that division by zero is
impossible, and the processor does not have its own means to report an error. Therefore, the
processor initiates a so-called exception, also called an internal interrupt, which results in
the operating system taking control; in most cases, it reports an error and terminates the
current task as an emergency. The same thing will happen if the result of the division
does not fit into the allotted bits: for example, if we put 10h into EDX and any other
number, even just 0, into EAX, and try to divide this value (i.e. the hexadecimal
1000000000, which is 2^36) by, say, 2 (writing it, for example, into EBX to make the
division 32-bit), the result (2^35) will not fit into the 32 bits, and the processor will
have to initiate an exception. We will talk more about exceptions (internal
interrupts) in §3.6.3.
In integer division of signed numbers, it is often necessary to extend the dividend
before dividing: if we work with one-byte numbers, from a one-byte dividend located
in AL we must first make a two-byte dividend located in AX, for which we must
put 0 into the high half of AX if the number is non-negative and FF (hexadecimal)
if it is negative. In other words, we actually need to fill the high half of AX with the
sign bit from AL. This can be done with the cbw (convert byte to word) command.
Similarly, the cwd (convert word to doubleword) command extends the number in the
AX register to the DX:AX register pair, that is, it fills all bits of the DX register with
the sign bit of AX. The cwde (convert word to doubleword, extended) command
extends the same register AX to the register EAX, filling the upper 16 bits of that
register with the sign bit. Finally, the cdq (convert doubleword to quadword) command
extends EAX to the register pair EDX:EAX, filling all bits of the EDX register with
the sign bit of EAX. The scope of these commands is not limited to integer division,
especially when it comes to cwde. The cbw, cwd, cwde, and cdq commands have
Note that when dividing unsigned numbers, no special commands are needed to
extend the dividend: it is sufficient simply to zero out its high part, be it AH,
DX or EDX.
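A short sketch of signed division, with the dividend extended by cdq first (the values are arbitrary):

```nasm
mov     eax, -17
cdq                     ; EDX:EAX := sign-extended -17 (EDX = 0ffffffffh)
mov     ecx, 5
idiv    ecx             ; quotient EAX = -3 (rounded towards zero),
                        ; remainder EDX = -2 (same sign as the dividend)
```

Forgetting the cdq here would leave garbage in EDX and produce either a wrong quotient or an exception.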
Before discussing the means of performing jumps available in the i386 instruction
system, we should first note that all control transfer instructions are divided into three
types depending on the "range" of the transfer.
• Far transitions involve transferring control to a program fragment
located in another segment[240]. Since we use the "flat" memory model under Unix,
we will not need such transitions.
• Near transitions transfer control within the same segment; in fact, such transitions
represent an explicit change of the EIP value. In the "flat" memory model, these
are the kind of transitions that we can use to jump to an arbitrary location in our
address space.
[240] Here, segments in the processor's "understanding" are meant; see the remark on page 546.
• Short jumps are used for optimization if the point to be jumped to is no more
than 127 bytes forward or 128 bytes backward from the current instruction. In
the machine code of such an instruction, the offset is specified by only one byte,
hence the corresponding limitation.
The type of transition can be specified explicitly by putting the word short or near
after the command (the assembler understands the word far, of course, but we
don't need it). If you don't do this, the assembler chooses the default transition type,
and for unconditional transitions it is near, which usually suits us, but for
conditional transitions it is short by default, which creates certain difficulties.
The unconditional jump command is called jmp (from the word jump). The
command has one operand, which defines the
address where control should be transferred. Most often we use the form of the jmp
command with a direct operand, i.e. the address specified directly in the command; of
course, we do not specify a numeric address, which we usually do not know, but a label.
It is also possible to use a register operand (in this case, the transition is made at an
address taken from a register) or an operand of the "memory" type (the address is read
from a double word located at a given position in memory); such transitions are called
indirect, as opposed to direct, for which the address is specified explicitly. Here are
some examples:

jmp  begin      ; direct jump: the address is given by a label
jmp  eax        ; indirect jump to the address in the EAX register
jmp  [vector]   ; indirect jump to the address stored in memory at vector

Conditional jump instructions, to which we now turn, have their own peculiarities:
far ("long-range") conditional jumps are still not supported, but we don't need them anyway.
Another non-trivial point is that all conditional jump instructions allow only a direct
operand (usually just a label). You cannot get the address for such a transition either
from a register or from memory. Usually this is not necessary, but if you still need it,
you can make a transition by the opposite condition two commands ahead, and the next
command to put an unconditional transition; it will turn out that we will safely jump
over this unconditional transition, if the initial condition of the transition is not met,
and, conversely, perform the transition, if the condition was met.
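The trick described above can be sketched like this (the label is arbitrary); suppose we want to jump to the address in EAX whenever ZF is set:

```nasm
        jnz     skip    ; opposite condition: jump over the indirect jump
        jmp     eax     ; indirect jump, taken only when ZF = 1
skip:
        ; execution continues here when ZF = 0
```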
As with unconditional jumps, when translating conditional jump instructions into machine
code, the operand is not an address as such, but the difference between the memory position
to which the jump should be made and the instruction following the current one, i.e. relative
addressing is used.
The simplest conditional jump commands jump to a specified address depending
on the value of a single flag. The names of these commands are formed from the letter
J (from the word jump), the first letter of the flag name (e.g., Z for the flag ZF) and
possibly the letter N (from the word "not") inserted between them, if the transition is
to be made under the condition that the flag is zero. All these commands are
summarized in Table 3.2. Recall that we discussed the meaning of each of the flags on page 548.
Such conditional jump commands are usually placed immediately after an
arithmetic operation (e.g., immediately after the cmp command, see page 567). Thus,
two commands
cmp eax, ebx
jz are_equal
can be read as an order "compare values in registers EAX and EBX and if they are
equal, switch to the are_equal label".
If we need to compare two numbers for equality, everything is quite simple: just
use the ZF flag, as in the previous example. But what do we do if we are interested,
for example, in the condition a < b? First, of course, we apply the command
cmp a, b
The command will compare its operands - more precisely, it will subtract the value of
b from a and set the flag values accordingly. What follows, as we will see in a moment,
is a bit more complicated.
If a and b are signed numbers, then at first glance everything is simple: subtracting
a - b under the condition a < b gives a strictly negative number, so the sign flag (SF,
sign flag) must be set and we can use the js or jns command. But the result might
not fit into the operand length (for example, 32 bits, if we compare 32-bit numbers),
i.e. an overflow might occur! In this case, the value of the SF flag will be opposite
to the true sign of the result, but the OF flag (overflow flag) will be raised. In other
words, if the condition a < b is fulfilled, then after the comparison (or subtraction) two combinations of flag values are possible: SF=1 and OF=0 (there was no overflow, the result is negative), or SF=0 and OF=1 (the stored result looks positive, but only because of overflow; the true result is negative). In other words, we are interested in the case where the SF and OF flags differ from each other: SF≠OF. For such a case,
the i386 processor has an instruction jl (from the words jump if less than, "jump if
less than"), also denoted by the mnemonic jnge (jump if not greater or equal, "jump
if not greater or equal").
Let us now consider the situation when numbers a and b are unsigned. As we have
already discussed in §3.2.7 (see page 565), it makes no sense to consider the flags OF
and SF after arithmetic operations on unsigned numbers, but it makes sense to consider
the flag CF (carry flag), which is set to one if the arithmetic operation results in a
carry from a higher digit (for addition) or a borrow from a non-existent digit (for
subtraction). This is exactly what we need here: if a and b are considered unsigned and
a < b, such a borrowing will occur when subtracting a - b, so we can use the value of
the CF flag, i.e. execute the jc command, which has the synonyms jb (jump if
below, "jump if below") and jnae (jump if not above or equal, "jump if not above or
equal") specifically for this situation.
When we are interested in "greater than" and "less than or equal to" relations, we
have to include the ZF flag, which (for both signed and unsigned numbers) indicates
equality of the arguments of the preceding cmp command.
All commands of conditional transitions by the result of arithmetic comparison are
given in Table 3.3.
Table 3.3. Conditional jump commands by the results of an arithmetic comparison (cmp a, b)

command   meaning              condition   jump condition    synonyms
equality:
je        equal                a = b       ZF=1              jz
jne       not equal            a ≠ b       ZF=0              jnz

inequalities for signed numbers:
jl        less                 a < b       SF≠OF             jnge (not greater or equal)
jle       less or equal        a ≤ b       SF≠OF or ZF=1     jng (not greater)
jg        greater              a > b       SF=OF and ZF=0    jnle (not less or equal)
jge       greater or equal     a ≥ b       SF=OF             jnl (not less)

inequalities for unsigned numbers:
jb        below                a < b       CF=1              jc, jnae (not above or equal)
jbe       below or equal       a ≤ b       CF=1 or ZF=1      jna (not above)
ja        above                a > b       CF=0 and ZF=0     jnbe (not below or equal)
jae       above or equal       a ≥ b       CF=0              jnc, jnb (not below)
and we want to calculate the sum of its elements. This can be done using the following
code fragment:
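The fragment in question may have looked approximately like this; the array label and the element count of 1000 are taken from the variant discussed below, everything else is our guess:

```nasm
        mov ecx, 1000       ; number of elements
        mov eax, 0          ; the sum is accumulated here
        mov esi, 0          ; offset of the current element
lp:     add eax, [array+esi]
        add esi, 4
        dec ecx
        jnz lp
```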
dec eax
jnz lp
In the same way, you can write two commands for the ECX register:
dec ecx
jnz lp
The advantage of the loop lp command over these two commands is that its machine code takes up less memory; paradoxically, however, on most modern processors it executes more slowly.
In the example with an array you can do without ESI, using just a counter:

        mov ecx, 1000
        mov eax, 0
lp:     add eax, [array+4*ecx-4]
        loop lp
There are two interesting points here. First, we have to pass the array from the end to the
beginning. Secondly, the executive address in the add command has a somewhat strange
appearance. Indeed, the ECX register runs from 1000 to 1 (the loop is not executed for
the zero value), while the addresses of the array elements run from array+4*999 to
array+4*0, so we should multiply by 4 not ECX but (ECX-1). Such a multiplication is impossible in an address expression, but we can instead simply subtract 4. At first glance this contradicts what was said in §3.2.5 regarding
the general form of the executive address (the constant summand must be one or none), but
in fact the NASM assembler will subtract the value 4 from the array value right during
translation and translate it in this form, so that the constant summand will be one in the final
machine code.
Let us now consider two additional conditional jump commands. The jcxz (jump
if CX is zero) command makes a jump if the CX register contains zero. Flags are not
taken into account. Similarly, the jecxz command makes a transition
if zero is contained in the ECX register. As with the loop command,
this transition is always short. To understand why these commands were introduced,
imagine that the ECX register already contains zero when you enter the loop.
Then the loop body will be executed first, and then the loop instruction will decrease
the counter by one, as a result of which the counter will be equal to the maximum
possible unsigned integer (its binary notation consists of all ones), so that the loop body will be executed 2^32 times, whereas by its meaning it probably should not have been executed at all. To avoid such troubles, you can put the jecxz
command before the loop:
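For example (labels ours):

```nasm
        jecxz quit      ; ECX is zero: bypass the loop entirely
lp:     ; ... loop body ...
        loop lp
quit:
```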
To complete the picture, let us mention two modifications of the loop command. The
loope command, also called loopz, makes a transition if the ECX register is non-
zero after it has been decremented by one and the ZF flag is set, while the loopne
command (or, what is the same thing, loopnz) - if the ECX register is non-zero and
the ZF flag is reset. The ECX register is decremented by these commands in any case,
i.e. even when ZF is "wrong". As can be guessed, the letter "e" here means equal
and the letter "z" means zero.
The command "test eax, eax" is often used instead of "cmp eax, 0". In particular, the
command "test eax, eax " is often used instead of "cmp eax, 0" to check
for equality to zero, which takes less memory and works faster.
In addition to commands that work on each bit of the operand (operands) and
realize logical operations, it is often necessary to use bit shift operations that work on
all bits of the operand at once, simply by shifting them. The simple bitwise shift
commands shr (shift right) and shl (shift left) have two operands, the first of which
specifies what to shift and the second of which specifies how many bits to shift. The
first operand can be a register operand or of the "memory" type (in the second case the
bit size must be specified). The second operand can be either direct, that is, a number
from 1 to 31 (in fact, you can specify any number, but only the lower five digits will be
used), or the CL register; no other registers can be used. When executing these
instructions with the CL register as the second operand, the processor ignores
all but the low five CL bits. The register itself, of course, does not change.
The scheme of shifting by 1 bit is as follows. When shifting to the left, the high bit
of the shifted number is transferred to the CF flag, the other bits are shifted to the left
(i.e. the bit with the number p gets the value that the bit with the number p-1 had before the operation), and zero is written to the low bit. When shifting to the right, on the
contrary, the lowest bit is written to the CF flag, all bits are shifted to the right (i.e. the
bit with the number p gets the value that the bit with the number p +1 had before the
operation), zero is written to the high bit.
Note that for unsigned numbers, a shift to the left by n bits is equivalent to multiplication by 2^n, and a shift to the right is equivalent to integer division by 2^n with discarding the remainder. It is interesting that for signed numbers the situation with the
shift to the left is absolutely similar, but the shift to the right for any negative number
will give a positive, because the sign bit will be written to zero. Therefore, along with
the commands of simple shift, the commands of arithmetic bitwise shift sal (shift
arithmetic left) and sar (shift arithmetic right) are also introduced. The sal
command does the same thing as the shl command (they are actually the same
machine command). As for the sar command, it works similarly to the shr
By convention, we assume that the bits are numbered from right to left starting from zero; for example, in a 32-bit number the low-order bit is numbered 0 and the high-order bit is numbered 31.
command, except that the value in the high bit is kept the same as it was before the
operation; thus, if we consider the shifted bit string as a record of a signed integer, the
sar operation will not change the sign of the number (positive will remain positive,
negative will remain negative). In other words, the arithmetic right shift by n bits is equivalent to dividing a signed integer by 2^n, with the result rounded toward minus infinity rather than toward zero. The operations of simple and arithmetic shifts are shown schematically in Fig. 3.3.
Bit-shift commands are much faster than multiplication and division commands;
moreover, they are much easier to handle: you can use any registers, so you don't have
to think about releasing the accumulator. That's why programmers almost always use
bitwise shift commands when multiplying and dividing by powers of two. High-level
language compilers also try to use shifts instead of multiplication and division when
translating arithmetic expressions.
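For example, a couple of typical uses:

```nasm
shl eax, 3      ; EAX := EAX * 8
shr ebx, 2      ; unsigned division: EBX := EBX / 4
sar ecx, 4      ; signed ECX divided by 16 (rounding down)
```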
In addition to the above, the i386 processor also supports "complex" bit-shift
instructions shrd and shld, which work through two registers; cyclic bit-shift
instructions ror and rol; cyclic bit-shift instructions through the CF flag - rcr
and rcl. The commands working with individual bits of their operands - bt, bts,
btc, btr, bsf and bsr - can be very useful. We will not consider all these
commands; if desired, the reader can learn them on his own using reference books.
Let's consider an example of a situation in which it is reasonable to use bitwise
operations. Bit strings are convenient for representing subsets of a finite number of
initial elements; simply put, we have a finite set of objects (for example, employees of
some enterprise, or toggle switches on some control panel, or simply numbers from 0
to N) and we need a program to be able to represent a subset of this set: which employees are at work now; which toggle switches on the control panel are set to "on"; which of the N athletes participating in a marathon have passed the next checkpoint. The most obvious representation for a subset of a set of N elements is a memory area containing N binary digits (so, if the set can include numbers from 0 to 511, we need 512 bits, i.e. 64 single-byte cells), where each of the N possible elements is assigned
one bit, and this bit will be equal to one if the corresponding element is included in the
subset, and zero otherwise. Each of the N objects is said to be assigned one of two
statuses: either "included in the set" (1) or "not included in the set" (0).
So, let us require a subset of a set of 512 elements; these can be completely arbitrary
objects, we are only interested in the fact that each of them has a unique number - a
number from 0 to 511. To store such a set, we will describe an array of 16 double words
(recall that a double word contains 32 bits, i.e. it can store the status of 32 different
objects). As usual, we will consider the array elements as numbered (or having indices)
from 0 to 15. The array element with index 0 will store the status of objects with
numbers from 0 to 31, the element with index 1 - the status of objects with numbers
from 32 to 63, etc. At the same time, within the element itself we will consider the bits
numbered from right to left, that is, the lowest digit will have the number 0, the highest
- the number 31. For example, the status of the object with the number 17 will be stored
in the 17th bit of the zero element of the array; the status of the object with the number
37 - in the 5th bit of the first element; the status of the object with the number 510 - in
the 30th bit of the 15th element of the array. In general, to find out by the number of
the object X, in which bit of which element of the array stores its status, it is enough to
divide X by 32 (the number of bits in each element) with a remainder. The quotient
will correspond to the number of the element in the array, the remainder will correspond
to the number of the bit within this element. This could be done with the div command, but it is better to remember that the number 32 is a power of two (2^5), so if we take the lower five bits of the number X, we will get the remainder of its division by 32, and if we perform a bitwise shift to the right by 5 positions, the result will be equal to
the desired quotient. For example, suppose the number X is stored in the register EBX, and we need to find the element number and the bit number within the element. Both numbers fit into a single byte (more precisely, the element number does not exceed 15 and the bit number does not exceed 31), so we can place the results in single-byte registers; let these be BL (for the bit number) and BH (for the array element number).
Since putting any new values into BL and BH will spoil the contents of the EBX register as an integer, it would be logical to first copy the number somewhere else, such as EDX, and then zero all bits of EBX except the five low-order ones. After that, both the value of EBX as an integer and the value of its lowest byte, the register BL, will be equal to the desired remainder of the division; then we shift EDX to the right, and the result, which fits entirely in the lowest byte of the register EDX, that is, in the register DL, we copy to BH:
mov edx, ebx
and ebx, 11111b    ; take the 5 low-order bits
shr edx, 5         ; divide by 32
mov bh, dl
However, the same can be done in a shorter way, without using additional registers,
because all the bits we need are in EBX from the beginning. The lower five digits of
the number X are the remainder we need from division, and the quotient we need is
the next few (in this case, no more than four) digits. When the number X was placed into EBX, these digits occupied positions starting from the fifth, and we need them to be in the register BH, which is nothing other than the second byte of the register EBX; so it is enough to shift the entire contents of EBX to the left by three positions, and the desired quotient will neatly "land" in BH. After that we shift the contents of BL back to the right by the same three bits, which at the same time clears its high bits:
shl ebx, 3
shr bl, 3
Having learned how to convert the object number to the array element number and the
number of digits in the element, let's return to the original problem. First, let's describe
the array:
section .bss
set512 resd 16
Now we have a suitable memory area, and the label set512 is associated with the
address of its beginning. Somewhere at the beginning of the program (and perhaps not
only at the beginning) we will probably need a set clearing operation, i.e. such a set of
commands after which the status of all elements is zero (no element is included in the
set). To do this, it is enough to put zeros into all array elements, for example, like this:
section .text
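One way to write such a clearing loop (a sketch; the label is ours), with the mov command executed 16 times as ECX runs from 16 down to 1:

```nasm
        mov ecx, 16
clear:  mov dword [set512+4*ecx-4], 0
        loop clear
```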
The mov command here will be executed 16 times - with values in ECX from 16
to 1, hence the cumbersome expression in the executive address.
Let us now have the number of element X in the EBX register, and we need to
add the element to the set, i.e. set the corresponding bit to one. To do this, we first find
the bit number of the array element and calculate the mask - a number in which only
one bit (just the one we need) is equal to one, and the other bits are zeros. Then we will
find the required array element and apply the "or" operation to it and to the mask, the
result of which we will put back into the array element. In this case, the bit we need in
the element will be equal to one, while the other bits will not change. To calculate the
mask, we will take the one and shift it to the left by the required number of bits. Recall
that of the registers only CL can be the second argument of bit shift commands, so it
makes sense to calculate the bit number in CL at once. So, let's write:
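A possible version of this fragment (a sketch; the register allocation is ours, the number X is expected in EBX, which is destroyed in the process):

```nasm
        mov edx, ebx
        and edx, 11111b     ; bit number within the element
        mov cl, dl          ; the shift count must be in CL
        shr ebx, 5          ; index of the array element
        mov eax, 1
        shl eax, cl         ; the mask: a single one in the needed position
        or  [set512+4*ebx], eax   ; set the bit, leave the rest intact
```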
Let's consider another example. Suppose we need to count how many elements are
included in an array. To do this, we will have to look through all the array elements and
count the single bits in each of them. The easiest way to do this is to load a value from
an array element into a register, and then shift the value to the right by one bit and each
time check if there is a one in the low-order bit; this can be done exactly 32 times, but
it is easier to finish when there is zero left in the register. We will look through the array
from the end, indexing by ECX: this will allow us to use the jecxz command. We
will use the EBX register as the result counter, and use EAX to analyze the array
elements.
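A sketch of such a counting fragment (labels ours):

```nasm
        xor ebx, ebx        ; counter of one-bits
        mov ecx, 16         ; index of the current element
elem:   jecxz done          ; all elements processed
        mov eax, [set512+4*ecx-4]
bits:   test eax, eax
        jz next             ; no one-bits left in this element
        shr eax, 1          ; the low bit goes to CF
        adc ebx, 0          ; add the carried-out bit to the counter
        jmp bits
next:   dec ecx
        jmp elem
done:
```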
and need to fill it with zeros, we can apply the following code:
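Assuming a 1024-byte buffer labelled buf (as in the rep stosb variant below), the fragment could look like this:

```nasm
        xor al, al          ; the byte to be stored
        mov edi, buf
        mov ecx, 1024
        cld                 ; DF=0, EDI will be incremented
lp:     stosb               ; [edi] := AL, EDI := EDI + 1
        loop lp
```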
The lodsb, lodsw and lodsd commands, on the other hand, read a byte, word
or double word from memory at the address located in the ESI register and
place the read into the AL, AX or EAX register, after which they increment or
decrement the value of the ESI register by 1, 2 or 4. Using these commands
with the rep prefix is usually pointless, since we will not be able to insert any other
actions between successive executions of the string command that process the value
read and placed in the register. Using lods series commands without a prefix, on the
contrary, can be very useful. For example, suppose we have an array of four-byte
numbers
and we need to count the sum of its elements. This can be done as follows:
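For example (the array label and the element count here are assumptions):

```nasm
        cld                 ; walk the array forwards
        mov esi, array      ; hypothetical label of the array
        mov ecx, 1000       ; hypothetical number of elements
        xor ebx, ebx        ; the sum
lp:     lodsd               ; EAX := [esi], ESI := ESI + 4
        add ebx, eax
        loop lp
```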
If you just want to copy data from one memory location to another, the movsb,
movsw, and movsd commands are very convenient. These commands copy a
byte, word or double word from memory at address [esi] to memory at address
[edi] and then increment (or decrement) both registers ESI and EDI by 1, 2 or
4 respectively.
The stosX and movsX commands can be prefixed with rep. The command
prefixed with rep will be executed as many times as the number in the ECX register
(except for stosw and movsw; if they are prefixed, the CX register will be used).
With the rep prefix, we can rewrite the example above for stosb without using
the label:
xor al, al
mov edi, buf
mov ecx, 1024
cld
rep stosb
Note that rep also allows for a zero initial value of ECX, in which case the string
command is not executed even once (unlike the loop command, where the zero
case has to be considered separately).
The stosX commands are often used in conjunction with lodsX, in which
case the rep prefix cannot be used because it refers to only one machine command
(in fact, it is part of the command; it is the F3 byte, which literally precedes the
command code itself, i.e., it is placed right before it). The movsX commands are quite
different; they are most often used exactly with the prefix. For example, if we have two
string arrays
buf1 resb 1024
buf2 resb 1024
and you want to copy the contents of one of them into the other, you can do it this way:
mov ecx, 1024
mov esi, buf1
mov edi, buf2
cld
rep movsb
Thanks to the ability to change the direction of work (with the help of DF) we can copy
partially overlapping memory areas. Suppose, for example, the buf1 array contains the string "This is a string" and we need to insert the word "long" before the word "string". To do this, we must first copy the memory area starting from the address [buf1+10] five bytes forward to make room for the word "long" and a space. Obviously, we can only do this copying from the end to the beginning, otherwise some of the letters will be erased before we copy them. If the word "long" (together with the space) is contained in the buffer buf2, we can insert it into the phrase in buf1 in the following way:
std
mov edi, buf1+15+5
mov esi, buf1+15
mov ecx, 6
rep movsb
mov esi, buf2+4
mov ecx, 5
rep movsb
Let us explain that the length of the source string is 16 characters, so the address
buf1+15 is the address of the last letter in the string - g in the word string.
Having copied six characters, i.e. the whole word string, to a new position, we
changed the address of the "source" (buf2+4 is the address of the space in the
string "long ") and continued copying.
In addition to those listed above, the i386 processor implements the cmpsb,
cmpsw, and cmpsd (compare string), and scasb, scasw, and scasd (scan
string) commands. The scas series commands compare an accumulator (AL, AX, or
EAX, respectively) to a byte, word, or double word at address [edi], setting flags
like the cmp command, and increment/decrement the EDI. The cmps commands
compare bytes, words, or double words in memory at [esi] and [edi], set
flags, and increment/decrement both registers.
The rep prefix has no meaning for these commands, but with the scasX and cmpsX commands you can use the repz and repnz prefixes (also called repe
and repne), which, in addition to decreasing and checking the ECX register (or
CX, if the command is two-byte), also check the value of the ZF flag and continue
working only if this flag is set (repz/repe) or reset (repnz/repne).
It is interesting to note that the repe/repz prefix has exactly the same machine code as the rep prefix used with the stosX and movsX instructions; the processor
"understands" whether or not to check the flag depending on which instruction this prefix
precedes - for stos and movs the flag is not checked, for scas and cmps it
is checked. The repne/repnz prefix has a separate code. In addition to the above
commands, the rep prefix can be used to prefix some other commands that extract and write
information to I/O ports, but they are privileged commands, so we do not consider them.
For example, if we need to find the letter 'a' in a mystr character array of size
mystr_len, we can proceed as follows:
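A sketch of such a search (the not_found label is ours):

```nasm
        mov edi, mystr
        mov ecx, mystr_len
        mov al, 'a'
        cld
        repne scasb         ; scan while not equal and ECX > 0
        jne not_found       ; ZF=0 after the scan: no 'a' in the array
        dec edi             ; EDI stopped one position past the match
not_found:
```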
The lahf command copies the contents of the flag register to the AH
register: the CF flag is copied to the lowest bit of the register (bit
number 0), the PF flag to bit number 2, the AF flag to bit number
4, the ZF flag to bit number 6 and finally the SF flag to bit number 7,
i.e. the highest bit. The other bits are left undefined.
The movsx (move with sign extension) and movzx (move with zero extension) commands allow you to combine copying with widening. Both commands have two operands; the first must be a register, the second can be a register or an operand of the "memory" type, and the first operand must be longer than the second: you can copy from a byte to a word or a double word, or from a word to a double word. The movzx command fills the missing bits with zeros, and the movsx command fills them with the value of the high bit of the original operand.
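For example (the label ch is ours):

```nasm
movzx eax, byte [ch]    ; byte -> double word, the high bits are zeroed
movsx ebx, ax           ; word -> double word, the sign bit is copied
```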
Quite interesting is the cpuid command available on Pentium and later
processors, which can be used to find out which processor model our program is
running on and what features this processor supports; a detailed description of the
command can be found on the Internet or in reference books, we will not give it here,
but it is useful to remember the fact of its existence.
We will also mention the commands xlat (convenient for recoding text data
through the recoding table), bswap (allows you to rearrange the bytes of a given 32-
bit register in reverse order, first appeared in the 80486 processor), aaa, aad, aam
and aas (allow you to perform arithmetic operations on binary-coded decimal numbers, in which each half-byte represents a decimal rather than a hexadecimal digit).
The xchg command allows you to swap the values of its two operands. One of
them is any of the general-purpose registers, the other is a register or operand of the
"memory" type with the same size. The register operand can be specified first or second
- as you can easily guess, it has no effect on anything. When both operands of the
command are registers, in principle this command does not give any serious
possibilities, but if one of the operands is a memory location, the use of xchg allows
you to put some value into this memory in one indivisible action, and store the value
that was there before in a register. Why it is so important to provide indivisibility of
memory actions, we will learn in the VII part of our book, which will be devoted to
parallel programming.
Indivisibility can also be achieved with some other commands - more specifically, with all commands that assume that some value is retrieved from memory, a new value is calculated from it, and the result is written back to the same memory location; for example, commands like "inc dword [x]", "neg byte [b]", "sub [m], eax" and the like can be made indivisible. To do so, the command must be prefixed with the lock prefix, i.e. we write something like this:
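For example, using the first of the commands just mentioned:

```nasm
lock inc dword [x]      ; an indivisible increment of the variable x
```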
As with the rep, repe, and repne prefixes we used in conjunction with string
commands in §3.2.13, the lock prefix is a single byte that is added to the front of the
command's machine code. For clarity, we note that this is byte F0, but it is not necessary to
remember this. By executing an instruction with this prefix, the processor sets a special flag
on the control bus that prohibits other processors (and other actors like DMA controllers, which
we will postpone until the last part of the second volume) from any work with RAM; this, in fact,
ensures the indivisibility of the operation. You should only realize that such a command can
be executed dozens of times slower than the same command without the prefix. Well, for the
xchg command the processor forbids all access to memory without any lock prefix,
simply because xchg is specially designed for use in conditions when atomicity of the
operation is required.
Consideration of the command system cannot be considered complete without the
nop command. It performs a very important action: it does nothing. Its name itself is
derived from the word no operation.
On most existing architectures the stack "grows downwards", i.e. in the direction of decreasing
addresses, but you can find processors on which the stack "grows upwards", and such
processors where the direction of stack growth can be chosen, and even such processors
where the stack is organized cyclically.
The stack can be used, for example, to temporarily store register values; if a certain
register stores a value needed for further calculations, and we need to temporarily use
this register for something else, the easiest way to get out of the situation is to store the
value of the register in the stack, then use the register for other needs, and then retrieve
the stored value from the stack back to the register. This is quite convenient, but another
thing is much more important: the stack is used when calling subroutines to store
return addresses, to pass actual parameters to subroutines and to store local
variables. It is the use of the stack that allows implementing the recursion mechanism,
when a subroutine can directly or indirectly call itself.
For example, the command mov eax, [esp] will copy the four-byte value from the top of the stack to the EAX register.
Of course, you can work this way not only with the top, but also with any data in the stack,
because it is a normal memory area; the only restriction here is that the free part of the stack
- cells with addresses smaller than the current esp value - should not be used, because
the operating system may use this memory between the quanta of time allocated to your
process, and thus corrupt your data. For example, this is exactly what happens when so-called signals are processed, and it can happen at a completely random moment: you don't know between which commands control will be taken away from your program to let other programs run, and at that moment a signal may arrive. We will discuss signal handling in detail
in Volume 2.
As mentioned above, the stack is very convenient to use for temporary storage of
values from registers:
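For example:

```nasm
push eax            ; save the value of EAX on the stack
; ... EAX is now free for other work ...
pop eax             ; restore the saved value
```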
Let's consider a more complicated example. Let the register ESI contains the address
of a string of characters in memory, and it is known that the string ends with a byte with
a value of 0 (but we do not know what is the length of the string) and we need to
"reverse" this string, that is, to write its constituent characters in reverse order in the
same memory location; the zero byte, which plays the role of a terminator, of course, remains in place and is not copied anywhere. One way to do this is to sequentially write the
character codes to the stack, and then go through the string from beginning to end again,
retrieving characters from the stack and writing them to the cells that make up the string.
Since one-byte values cannot be written to the stack, and two-byte values are
possible but undesirable, we will write four-byte values using only the low-order byte.
Of course, it is possible to do everything more rationally, but now we are more
interested in the clarity of our illustration. We will use the EBX register for
intermediate storage, and only its low byte (BL) will contain useful information, but we
will write the whole EBX to the stack and retrieve it from the stack. The task will be
solved in two cycles. Before the first cycle we will put a zero in the ECX register, then
at each step we will extract a byte at address [esi+ecx] and place this byte (as part
of a double word) in the stack, incrementing ECX by one, and so on until the extracted byte turns out to be zero, which according to the task conditions means the end of the string. As a result, all non-zero elements of the string will be in the stack,
and in the register ECX will be the length of the string.
Since the number of iterations (string length) for the second loop is known in
advance and is already contained in ECX, we organize this loop using the loop
command. Before entering the loop, we will check whether the string is empty (i.e. whether ECX is zero), and if it is, we will immediately go to the end of our fragment. Since the value in ECX will be decreasing, while we need to traverse the string in the forward direction, along with ECX we will use the EDI register, which at the beginning will be set equal to ESI (that is, pointing to the beginning of the string), and at each iteration we will shift it. So, we write:
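A sketch of this fragment following the plan above (labels ours):

```nasm
        xor ebx, ebx        ; only BL will carry useful data
        xor ecx, ecx        ; character counter
rd:     mov bl, [esi+ecx]
        test bl, bl
        jz rd_done          ; zero byte: end of the string
        push ebx            ; push the character as part of a dword
        inc ecx
        jmp rd
rd_done:
        jecxz rev_done      ; empty string: nothing to do
        mov edi, esi        ; EDI walks the string forwards
wr:     pop ebx
        mov [edi], bl
        inc edi
        loop wr
rev_done:
```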
As a result, 256 bytes of memory starting from the address specified by the my_array
label will be filled with the '@' character code (number 64).
value by simply forwarding the value from EBP to it so that it points to the return
address again.
Another question naturally arises: what if other subroutines also use the EBP
register for the same purpose? In this case, the first call of another subroutine will ruin
our work. Of course, we can save EBP in the stack before calling each subroutine, but
since there are usually many more subroutine calls in a program than subroutines
themselves, it is more economical to follow a simple rule: each subroutine should save
the old EBP value itself and restore it before returning control. The stack is also used
to save the EBP value. The saving is performed by a simple command push ebp
immediately after receiving control, so that the old EBP value is placed in the stack
immediately after the return address of the subroutine, and this address of the stack top
is used as the "anchor point"; to set it up, the next command is mov ebp,esp. As a
result, the EBP register points to the place in the stack where its own (EBP's) saved
value is located; if we now access the memory at address [ebp+4], we will find
there the address of return from the subroutine, and the parameters stored in the stack
before calling the subroutine are available at addresses [ebp+8], [ebp+12],
[ebp+16], etc. Memory for local variables is allocated by simply subtracting the
required number from the current ESP value; so, if we need 16 bytes for local
variables, we should execute the command sub esp,16 immediately after saving
§3.3 Stack, subroutines, recursion 603
EBP and copying the ESP contents into it; if (for the sake of simplicity) all our local
variables also occupy 4 bytes, they will be available at addresses [ebp-4], [ebp-
8], etc. The structure of a stack frame with three four-byte parameters and four four-
byte local variables is shown in Fig. 3.5.
Let us repeat that at the beginning of its work, according to our conventions, each
subroutine must execute the standard prologue described above, and undo its effect before returning control.
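Putting the agreement together, a subroutine skeleton might look like this (the 16 bytes of locals are assumed purely for illustration):

```nasm
my_proc:
        push    ebp               ; save the caller's EBP
        mov     ebp, esp          ; anchor point: [ebp+4] is the return address
        sub     esp, 16           ; allocate 16 bytes for local variables
        ; ... body: parameters at [ebp+8], [ebp+12], ...
        ; ...       locals at [ebp-4], [ebp-8], ...
        mov     esp, ebp          ; release the locals
        pop     ebp               ; restore the caller's EBP
        ret
```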
The i386 processor supports special commands for servicing stack frames. Thus, at the
beginning of a subroutine, instead of the three commands given above, you could give one
command "enter 16, 0", and instead of two commands before ret you could write
leave. The problem, oddly enough, is that enter and leave are slower than the
corresponding set of simple commands, so they are almost never used; if we disassemble
machine code generated by a Pascal or C compiler, we are likely to find exactly those
commands at the beginning of any procedure or function, as shown above, and nothing like
enter. The only justification for the existence of enter and leave commands may be
their shortness (for example, the machine command leave occupies only one byte in
memory), but nowadays nobody usually thinks about saving memory on machine code;
performance is usually more important.
Let us make one more important remark. When working under Unix OS, we don't
have to worry about stack availability or stack size. The operating system creates a
stack automatically at the start of any task, and during its execution it increases the size
of memory available for the stack if necessary: as the top of the stack moves "up" (i.e.
in the direction of decreasing addresses) through the virtual address space, the operating
system puts more and more new pages of physical memory to correspond to virtual
addresses. That is why in Figs. 3.4 and 3.5 we have depicted the top edge of the stack
as something fuzzy. However, after lightweight processes (threads) appeared in Unix
systems, a rather strict limit was imposed on the stack size: 8 MB. If this limit is
exceeded, the system kernel will kill your process as if it had crashed.
procedure or function call was translated from Pascal in the form of a series of
commands to put values on the stack, and the values were put in the natural (for
humans) order - from left to right; then the call command was inserted into the code.
When such a procedure receives control, the values of actual parameters are placed in
the stack from bottom to top, i.e. the last parameter is placed closer to the frame
reference point (available at address [ebp+8]). This, in turn, implies that to access
the first (and any other) parameter, a Pascal procedure or function must know the
total number of these parameters, since the location of the n-th parameter in the stack
frame depends on the total number. Thus, if a procedure has three four-byte parameters,
the first of them will appear in the stack at address [ebp+16], while if there are
five of them, the first one will be found at address [ebp+24]. This is why Pascal
does not allow creating procedures or functions with a variable number of arguments,
so-called variadic subroutines (which is quite normal for an educational language, but
not quite acceptable for a professional language). As we discussed in §2.1 (see the note
on page 235), all sorts of writeln, readln, and other entities that resemble
variadic subroutines are actually part of the Pascal language itself, i.e., they should
be considered statements rather than procedures.
The creators of the C language took a different path. When translating a C function
call, the parameters are placed on the stack in reverse order, so that the first of them (if
there is one, of course) is always available at address [ebp+8], the second at address
[ebp+12], and so on, regardless of the total number of parameters. This allows
you to create variadic functions; in particular, the C language itself does not include
any functions at all, but the "standard" library provides a number of functions that
assume a variable number of arguments (such as printf, scanf, etc.), and all
these functions are also written in C (you can't do this in Pascal).
On the other hand, the absence of variadic subroutines in Pascal allows us to put
the care of stack cleaning on the caller. Indeed, a Pascal subroutine always knows how
much space the actual parameters occupy in its stack frame (since for each subroutine
this amount is set once and for all and cannot change) and, accordingly, can take care
of stack cleaning. As we have already mentioned, there are more subroutine calls in any
program than subroutines themselves, so by shifting the care of stack cleaning from the
caller to the called one, a certain memory saving (the number of machine instructions)
is achieved. When using C conventions, such saving is impossible, because a subroutine
in the general case does not know and cannot know how many parameters are passed to
251
Pascal compilers don't have to do this; for example, the familiar Free Pascal tries to pass parameters
through registers, and only if there are not enough registers does it place the remaining parameters on
the stack, but the order is indeed "direct".
252
Different situations use different ways of fixing the number of parameters; for example, the
printf function finds out how many parameters to fetch from the stack by analyzing the format
string, and the execlp function fetches arguments until it hits a null pointer, but both are just special
cases.
it, so the care of clearing the stack of parameters remains on the caller; usually it is done
simply by increasing the ESP value by a number equal to the total length of the actual
parameters. For example, if a proc1 subroutine takes three four-byte parameters (let's
call them a1, a2, and a3) as input, its call would look something like this:
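Under the C conventions just described, such a call might be sketched as follows (a1, a2 and a3 stand here for wherever the actual values come from):

```nasm
        push    dword a3          ; parameters go on the stack right to left
        push    dword a2
        push    dword a1
        call    proc1
        add     esp, 12           ; the caller removes 3*4 bytes of parameters
```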
In the case of Pascal conventions, the last command (add) is unnecessary, the one
being called takes care of everything. The i386 processor even has a special form of the
ret instruction with one operand for this purpose (we used ret without operands
in the examples above). This operand, which can only be direct and is always two bytes
long ("word"), specifies the amount of memory (in bytes) occupied by the function
parameters. For example, a procedure that accepts three four-byte parameters through
the stack, the Pascal compiler will end with the command
ret 12
This command, like the usual ret command, will retrieve the return address from the
stack and pass control over it, but in addition (at the same time) will increase the ESP
value by a given number (in this case 12), relieving the caller of the obligation to clear
the stack.
Both Pascal and C compilers organize the return of values from functions through
registers, and the "most important" register is used - for i386 it is EAX, the authors
of the compilers were unanimous in this. To be more precise, integer values that fit into
this register (i.e. no more than four-byte values) are returned through EAX. Eight-
byte integers are returned through the register pair EDX:EAX; looking ahead, we note
that floating-point numbers are returned through the "main" register of the arithmetic
coprocessor. Only if the returned value is not a number - for example, you need to return
a record (record in Pascal, struct in C) - the return is done through the
memory provided by the caller, and the caller must pass the address of this memory to
the subroutine through the stack along with the parameters.
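For instance, a hypothetical sum2 function that returns the sum of its two four-byte stack parameters would simply leave the result in EAX:

```nasm
sum2:                             ; hypothetical example, not from the book's listings
        push    ebp
        mov     ebp, esp
        mov     eax, [ebp+8]      ; first parameter
        add     eax, [ebp+12]     ; plus the second one
        pop     ebp
        ret                       ; the result stays in EAX
```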
first_proc:
        ; ...
.cycle:
        ; ...
second_proc:
        ; ...
.cycle:
        ; ...
third_proc:
If, for example, a string is located in memory labeled string and a pattern is
located in memory labeled pattern, the call to the match subroutine will look like
this:
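Assuming match follows the C conventions described above and takes the string address as its first parameter and the pattern address as its second (the parameter order here is an assumption), the call might be sketched as follows:

```nasm
        push    dword pattern     ; second parameter, pushed first
        push    dword string      ; first parameter
        call    match
        add     esp, 8            ; the caller removes the two parameters
```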
After that the result of the match (0 or 1) will be in the EAX register. The text of this
example together with the main program using command line parameters can be found
in the file match.asm.
Note that at the beginning of the subroutine, when trying to jump to the .false label,
we had to explicitly specify that the jump is "near": the .false label
turned out to be a little farther away from the jump command than is acceptable for a "short"
jump. See the discussion on page 572.
This section will be devoted entirely to a study of the NASM assembler, beginning
with a brief description of the command-line keys used to run it and continuing with a
more formal description of its language syntax than before. After that, we will devote
a separate section to the macro processor.
§ 3.4. Main features of the NASM assembler
3.4.1. Command line keys and options
As we have already mentioned, when calling the nasm program, you must specify
the name of the file containing the assembly language source code, and in addition, it
is usually necessary to specify the keys specifying the mode of operation. We are already
familiar with one of these keys - -f; let us remind you that it specifies the format of
the resulting code. In our case, the elf format is always used. Interestingly, if you
do not specify this key, the assembler will create the output file in a "raw" format, i.e.,
simply put, it will convert our commands into a binary representation and write them
to a file in this form. We cannot run such a file under operating systems, but if, for
example, we wanted to write a program to be placed in the boot sector of a disk, the
"raw" format would be just what we need.
The -o key specifies the name of the file to which the translation result should
be written. If we use the elf format, we can trust NASM to choose the file name:
it will drop the .asm suffix from the source file name and replace it with .o,
which is what we need in most cases. If for some reason we prefer a different
name, we can specify it explicitly with -o.
We will need the -d key after learning the macro processor; it is used to define a
macro symbol in case we don't want to edit the source code to do so. For example,
-dSYMBOL has the same effect as inserting the line %define SYMBOL
at the beginning of the program, and -dSIZE=1024 will not only define the SIZE
symbol, but also assign it the value 1024, as the %define SIZE 1024
directive would do. We'll come back to this on page 626.
The ability to generate a so-called listing - a detailed report of the assembler about
the work done - is very interesting from the cognitive point of view. The listing includes
lines of source code with information about the addresses used and the final code
generated as a result of processing each source line. Listing generation is triggered by
the -l switch, followed by a file name. As an example, take any assembly language
program and translate it with the -l flag; so, if your program is called prog.asm, try
using the command
nasm -f elf -l prog.lst prog.asm
The text of the listing will be placed in the prog.lst file. Be sure to look through
the resulting file and figure out what's going on; if you don't understand something, find
someone who can help you figure it out.
The -g switch, which requires NASM to include so-called debugging
information in the translation results, can be very useful. When this key is specified,
NASM inserts into the object file, in addition to the object code, information about the
name of the source file, line numbers in it, etc. All this information is completely useless
for the program operation, especially since it can be several times larger than the
"useful" object code. However, if your program does not work as you expect, compiling
with the -g flag will allow you to use a debugger (e.g. gdb) to execute the program
step by step, which in turn will allow you to figure out what is going on.
§ 3.4. Main features of the NASM assembler 613
Another useful key is -e; it instructs NASM to run our source code through the
macro processor, output the result to the standard output stream (simply put, to the
screen), and rest easy. This mode of operation can be useful if we made a mistake when
writing a macro and can't find our mistake; when we see the result of macro-processing
our program, we will most likely understand what went wrong and why.
NASM supports other command-line keys; those who wish to learn them for
themselves can consult the documentation.
A line of text in NASM assembly language consists (in the general case) of four fields:
label, command name, operands, and comment; the label, command name, and
comment are optional fields. As for the operands, the requirements for them are
imposed by the command: if the command name is missing, the operands are missing too,
and if a command is specified, the operands must correspond to it. All four fields may
also be missing, in which case the line is empty. The assembler ignores empty lines,
but we can use them to visually separate parts of a program.
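For illustration (the label name and comment text here are arbitrary), a line using all four fields at once could look like this:

```nasm
start:  mov     eax, 4            ; label, command name, operands, comment
```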
A word consisting of Latin letters, digits, and the characters '_', '$', '#', '@',
'~', '.', and '?' can be used as a label; a label can begin only with a
letter or with one of the characters '_', '?', and '.'. As we recall from §3.3.8, labels starting with a
dot are considered local. In addition, in some cases, you can precede the label name
with a '$' character; this is usually used if you want to create a label whose name is
the same as the name of a register, command, or directive. This may be necessary if
your program is made up of modules written in different programming languages; then
other modules may well have labels that match assembly keywords, and you will need
to refer to them somehow. The assembler is case-sensitive in label names, so, for
example, 'label', 'LABEL', 'Label', and 'LaBeL' are four different labels. A
colon character may be placed after the label, but this is not
mandatory. As noted, programmers usually put colons after labels to which control can
be transferred, and do not put colons after labels that designate memory regions.
Although the assembler does not require this, the program is clearer when this
convention is used.
The command name field, if present, may contain a machine command designation
(possibly with a prefix such as rep or lock), or pseudo-commands - directives of a
special kind (we have already considered some of them and will come back to this
issue), or, finally, the name of a macro (we have also met with these, for example,
PRINT used in the examples; a separate paragraph will be devoted to the creation of
macros). Unlike labels, the assembler does not distinguish letter case in
the names of machine commands and pseudo-commands, so we can equally well write
mov, MOV, Mov, and even mOv, although we should not write it that way, of course.
Macro names, as well as label names, are case-sensitive.
The requirements for the contents of the operand field depend on which specific
command, pseudo-command, or macro is specified in the command field. If there is
more than one operand, they are separated by a comma. Register names often have to
be used in the operand field, and they are case-insensitive, just as machine command
names are.
The reader who is confused about where case matters and where it does not
should remember one simple rule: the NASM assembler does not distinguish
between upper and lower case letters in all words that it defines itself: in
instruction names, register names, directives, pseudo-commands, operand lengths
and jump types (the words byte, dword, near, etc.); but it treats upper
and lower case letters as different in the names that the user (a programmer
writing in assembly language) introduces: in labels and macro names.
Let us note one more NASM property related to operand writing. An operand of
type "memory" is always written using square brackets. This is not the case for
some other assemblers, which causes constant confusion.
A comment is indicated by a semicolon (";"). Starting from this symbol, the
assembler ignores all text up to the end of the line, which allows you to write anything
you want there. It is usually used to insert explanations into the program text for those
who will have to read the text.
four equ 4
We have defined the label four, specifying the number 4. Now, for example,
mov eax, four
is the same as
mov eax, 4
It is worth recalling that any label is nothing more than a number, but when a program
line containing the mnemonic of a machine instruction or a memory allocation directive
is labeled, the corresponding memory address (which is nothing more than a number)
is associated with such a label, whereas the equ directive allows you to specify a
number explicitly.
The equ directive is often used to associate with some name (label) the length
of an array just specified with a db, dw, or any other directive. This is accomplished
by using the pseudo-label $, which in each line where it appears denotes the current
address253. For example, you could write it like this:
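The example itself could look, say, like this (the message text is an assumption, chosen so that the length comes out to 19 bytes):

```nasm
msg     db      "Hello, dear world!", 10   ; 18 characters plus a newline
msglen  equ     $ - msg                    ; 19, computed at assembly time
```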
The expression $-msg, which is the difference of two numbers known to the assembler,
will be calculated directly at assembly time. Since $ means the current
address after the string has been described, and msg means the address of the
beginning of the string, their difference is exactly equal to the length of the string (19
in our example). We will return to the computation of expressions during assembly in
§3.4.5.
The times directive allows you to repeat a command (or pseudo-command) a
specified number of times. For example,
times 4096 db '*'
specifies a memory area of 4096 bytes filled with the '*' character code, just as
4096 identical lines containing the directive db '*' would do.
Sometimes the incbin pseudo-command can be useful; it allows you to create a
memory area filled with data from some external file. We will not consider it in detail; the
interested reader can study this directive by referring to the documentation.
253
More specifically, $ denotes the current offset relative to the beginning of the section.
3.4.4. Constants
Constants in NASM assembly language fall into four categories: integers,
character constants, string constants, and floating point numbers.
As already mentioned, integer constants can be specified in the decimal, binary,
hexadecimal, and octal number systems. If you simply write a number consisting of
digits, perhaps with a minus sign as the first character, the assembler will treat it
as decimal; how to specify constants in other number systems is explained in
detail on page 554.
Character constants and string constants are very similar to each other; indeed,
wherever a string constant should be used, a character constant can be used. The
difference between string and character constants is only in their length: a character
constant is a constant that fits within the length of a "double word" (i.e., contains no
more than 4 characters) and can therefore be considered an alternative record of an
integer (or bit string). Both character and string constants can be written using double
quotes and apostrophes. This allows you to use apostrophe and quote characters
themselves in strings: if a string contains a quote character of one type, it is enclosed in
quotes of another type (see the example on page 555).
Character constants containing fewer than 4 characters are considered synonymous with
integers, with the low bytes equal to the character codes of the constant and the missing high
bytes filled with zeros. When using character constants, it should be remembered that integers
in computers with i386 processors are written in reverse byte order, that is, the least significant
byte comes first. At the same time, according to the meaning of the string (and character
constant), the code of the first letter should be placed in memory first. That's why, for example,
the constant 'abcd' is equivalent to the number 64636261h: 64h is the code of
the letter d, 61h is the code of the letter a, and in both cases the byte with the value
61h is first and 64h is last. In some cases, the assembler treats as string constants
even constants which are short enough to be considered character constants. This happens,
for example, if the assembler sees a constant more than one character long among the
parameters of the db directive, or a constant more than two characters long among the parameters
of the dw directive.
Floating-point constants that specify fractional numbers are syntactically
distinguished from integer constants by the presence of a decimal point. Note that
integer constant 1 and the constant 1.0 have nothing in common! For clarity, we
note that the bit pattern of the single-precision floating-point number 1.0 (a
representation that takes up 4 bytes, just as for an integer) coincides with that of the integer
3f800000h (1065353216 in decimal notation). A floating-point constant can also
be specified in exponential form, using the letter e or E. For example, 1.0e-5 is
the same as 0.00001. Note that the decimal point is still required.
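This is easy to check with the assembler itself: the two directives below (a small sketch, not one of the book's listings) fill memory with exactly the same four bytes.

```nasm
as_int  dd      3f800000h         ; the integer 1065353216
as_flt  dd      1.0               ; single-precision 1.0: the same bit pattern
```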
• * is multiplication;
• / and % - integer division and remainder of division (for unsigned integers);
• // and %% - integer division and remainder of division (for signed integers);
• &, |, ^ - bitwise "and", "or", "exclusive or" operations;
• << and >> - bitwise shift operations to the left and right;
• unary operations - and + are used in their usual role: - changes the sign
of a number to the opposite sign, + does nothing;
• The unary operation ~ denotes bitwise negation.
When using % and %% operations, it is necessary to leave a space character after the
operation sign so that the assembler does not confuse them with macro directives (we
have already used macro directives in the examples, and we will consider them in detail
later).
Another unary operation, seg, is not applicable for us due to the absence of segments in
the "flat" memory model.
Unary operations have the highest priority, followed by multiplication, division and
remainder operations, and addition and subtraction operations have even lower priority.
Next in descending order of priority are the shift operations, the & operation,
then the ^ operation, and then the | operation, which has the lowest
priority. You can change the order of operations by enclosing part of the expression in
parentheses.
3.4.6. Critical expressions
The assembler analyzes the source text in two passes. The first pass calculates the
size of all machine commands and other data to be placed in program memory; as a
result, the assembler determines what numerical value should be assigned to each of
the labels encountered in the program text. The second pass generates the actual
machine code and other memory contents. The second pass is needed so that, for
example, it is possible to refer to a label that appears in the text later than the reference
to it: when the assembler sees a label, say, in the jmp command, before the actual
command marked with this label is encountered, it cannot generate code on the first
pass because it does not know the numerical value of the label. On the second pass, all
the values are already known and there are no problems with code generation.
All this has a direct relation to the mechanism of expression evaluation. It is clear
that an expression containing a label can be evaluated by the assembler on the first pass
only if the label was in the text before the expression to be evaluated; otherwise, the
evaluation of the expression has to be postponed until the second pass. There is nothing
wrong with this as long as the value of the expression does not affect the size of the
instruction, the memory area allocated, etc., i.e. the numerical values to be assigned to
further labels encountered do not depend on the value of the expression. If this
condition is not fulfilled, the impossibility to calculate the expression on the first pass
will lead to the impossibility to fulfill the task of the first pass - to determine the
numerical values of all labels. Moreover, in some cases no number of passes would
help even if the assembler could do it. The NASM assembler documentation gives such
an example:
times (label-$) db 0
label: db 'Where am I?'
Here, the line with the times directive should create as many null bytes as the number
of cells the label label is from the line itself - but the label label
is just as many cells away from the line as the number of null bytes that will be
created. So how many should there be?!
This makes it necessary to introduce the notion of a critical expression: it is an
expression evaluated during assembly whose value the assembler needs to know during
the first pass. The assembler considers as critical any expression that in one way or
another affects the size of anything in memory, and therefore may affect the values of
labels entered later. Only numeric constants can be used in critical expressions, as well
as labels defined higher up in the program text than the expression in question. This
ensures that the expression can be evaluated on the first pass.
Besides the times directive argument, the critical category includes, for
example, expressions in the arguments of the pseudo-commands resb, resw, etc., and
in some cases expressions in effective addresses, which may affect the final size
of the assembled instruction. Thus, the commands "mov eax,[ebx]", "mov
eax,[ebx+10]" and "mov eax,[ebx+10000]" generate 2 bytes, 3 bytes and
6 bytes of code, respectively, because the effective address in the first case occupies
only 1 byte, in the second case 2 bytes because of the single-byte number included in
it, and in the last case 5 bytes, of which four are used to represent the number 10000;
but how much memory will the command "mov eax,[ebx+label]" take up
if the value of label has not been defined yet? However, these difficulties can be
avoided if you explicitly specify the size of the displacement inside the effective address with
byte, word or dword. For example, if you write "mov eax,[ebx+dword label]",
then, even if the label value is not yet known, its length (and, consequently, the
length of the entire machine instruction) is already fixed.
3.5. Macro tools and macro processor
3.5.1. Basic Concepts
A macroprocessor is a software tool that takes as input some text and, using the
instructions given in the text itself, partially transforms it, giving as output, in turn, a
text that no longer has instructions for transformation. As applied to programming
languages, a macroprocessor is a converter of the program source text, usually
combined with a compiler; the result of the macroprocessor's work is a text in the
programming language, which is then processed by the compiler according to the rules
of the language (see Fig. 3.6).
It can also happen that the macro processor performs a transformation of the
program text without seeing any macro name, obeying more direct instructions
expressed in the form of macro directives. We already know one such macro directive:
it is the %include directive, which orders the macro processor to replace the directive
itself with the contents of the file specified as the directive's parameter.
Thus, the familiar line
%include "stud_io.inc"
254
The term "macro expansion" is not a very successful calque of the English term macro expansion.
push edx
push dword mylabel
push dword 517
call myproc
add esp, 12
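Judging from the description that follows, the definition of pcall1 would be along these lines (a reconstruction; the exact listing is not reproduced here):

```nasm
%macro pcall1 2
        push    dword %2          ; the single argument of the procedure
        call    %1                ; the procedure named by the first parameter
        add     esp, 4            ; remove the argument from the stack
%endmacro
```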
We have described a multiline macro named pcall1, which has two parameters: the
name of the called procedure for the call command and the argument of the
procedure to be placed on the stack. The lines written between the %macro and
%endmacro directives make up the body of the macro - a template for the text that
should result from macro substitution. In the body of the macro you can use the
parameters specified when the macro is called - they are %1, %2, etc. up to %9, with
%0 representing the total number of parameters; there can be more than nine
parameters, but more on that later. In our example, the macro substitution will be quite
simple: the macro processor only needs to replace occurrences of %1 and %2 with
the first and second parameters specified in the macro call. If after such a definition
the program text contains a line of the following form
pcall1 proc, eax
- the macro processor will treat this line as a macro call and perform macro
substitution according to our macro definition, taking the word proc as the first
parameter and the word eax as the second, and substituting them
for %1 and %2. The result looks like this:
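Assuming pcall1 pushes its second parameter, calls the first, and then removes 4 bytes from the stack (as its description suggests), the substitution result would be:

```nasm
        push    dword eax
        call    proc
        add     esp, 4
```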
%macro pcall2 3
push %3
push %2
call %1
add esp, 8
%endmacro
%macro pcall3 4
push %4
push %3
push %2
call %1
add esp, 12
%endmacro
Of course, this macro, unlike the previous ones, does not reduce the size of the program
in any way, but it will allow us to make all calls to subprograms uniformly. We leave
the description of macros pcall4, pcall5, etc. up to pcall8 to the reader as
an exercise; at the same time, for self-checking, answer the question why we propose
to stop at pcall8 and not, for example, at pcall9 or pcall12.
The example we examined used a multiline macro; as we have seen, calling a
multiline macro syntactically looks just like using machine commands or pseudo-
commands: the macro name is written instead of the command name, followed by the
parameters, separated by commas. A multi-line macro is always converted to one or
more lines in assembly language. But what if, for example, we need to generate a part
of a line with a macro rather than a fragment of several lines? Such a need also
arises quite regularly. For example, in the example given in §3.3.9, we can see that
inside subroutines we often need to use constructions like [ebp+12], [ebp-4],
etc. to refer to subroutine parameters and its local variables. It is not so difficult to get
used to these constructions; but we can go another way, using one-line macros. Let's
start by writing the following macro definitions:
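The definitions themselves might look like this (a sketch assuming, as elsewhere in these examples, that every parameter and local variable occupies four bytes):

```nasm
%define arg1 ebp+8                ; first parameter
%define arg2 ebp+12               ; second parameter
%define loc1 ebp-4                ; first local variable
%define loc2 ebp-8                ; second local variable
%define arg(n) ebp+4+4*(n)        ; parameter number n, in general form
```

With these, writing [arg1] expands to [ebp+8], and [arg(7)] to [ebp+32].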
or like this (if, for example, the described macros were not enough)
mov [arg(7)], edx
In principle, we could include the square brackets inside the macros, so that we don't have to write
them every time, changing, for example, the definition of the arg1 macro to
%define arg1 [ebp+8]
We have not done this for reasons of clarity. The NASM assembler supports, as we know, the
convention that any memory access is formalized with square brackets; if there are no square
brackets, we deal with a direct or register operand. A programmer who is used to this
convention will have to make extra efforts when reading the program to remember that arg1
in this case is not a label but the name of a macro, so it is a memory access that is performed
here, not loading the label address into the register. Such things do not help the program
comprehensibility at all. Keep in mind that you yourself, even if you are the author of the
program, may forget completely what was meant in a few days, and then saving two characters
(brackets) will cost you invaluable time.
255 Here and later in our examples we assume that all procedure parameters and all local variables are
always "double words", i.e., 4 bytes in size; in reality, of course, this is not always the case, but we are
now more concerned with the illustrative value of the example.
If a one-line macro is described anew, then from that point on the macro processor
will use the new definition instead of the old one. With this in mind, the same
macro name can mean different things in different places of the program and be
expanded into different text fragments. Moreover, a macro can be removed altogether
by using the %undef directive; upon encountering such a directive, the macro
processor will immediately "forget" that the macro exists. An interesting question is
what happens if one macro definition uses a call to another macro, and the latter is
subsequently redefined.
If we use the familiar %define directive to describe a one-line macro A and
use a call of macro B in its body, this macro call is not expanded in the
directive itself; the macro processor leaves the occurrence of B as it is until it
encounters a call of macro A. When the macro substitution for A is executed, its
result will contain B, and the macro processor will in turn perform a macro substitution
for it. Obviously, this will use the definition of macro B that is current at the moment A
is substituted (not at the moment it is defined).
Let's explain the above with an example. Suppose we have entered two macro
definitions and then used one of them:
%define thenumber 25
%define mkvar dd thenumber
var1 mkvar
— then the macro processor will first perform a macro substitution for mkvar,
obtaining the line
var1 dd thenumber
— and from it, in turn, by macro substitution of thenumber, will get the line
var1 dd 25
If we now write
%define thenumber 36
var2 mkvar
— then the result of the macro processor's work will be a line containing exactly the number
36:
var2 dd 36
— even though we have not changed the mkvar macro itself: at the first step, dd
thenumber will be obtained as last time, but thenumber now has the value 36,
and it will be substituted. This macro substitution strategy is called "lazy".
However, the NASM assembler also allows you to use another strategy, called eager, for
which the %xdefine directive is provided. This directive is completely analogous
to the %define directive, with the only difference that if macro calls occur in the
body of the macro being described, the macro processor performs their macro substitutions
immediately, i.e. right at the moment the macro definition is processed, without waiting
for the macro being described to be called. Thus, if in the above example we replace
the %define directive in the description of the mkvar macro with %xdefine:
%define thenumber 25
%xdefine mkvar dd thenumber
var1 mkvar
%define thenumber 36
var2 mkvar
— then both var1 and var2 will receive the value 25. The redefinition of the thenumber
macro cannot affect the operation of the mkvar macro now, because this time the body
of the mkvar macro does not contain the word thenumber: while processing the
definition of mkvar, the macro processor substituted its value (25) in place of that word.
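Since NASM itself is needed to observe this difference, a small model may help. The following Python sketch reproduces the lazy and eager behavior described above; the dictionary-based expander is our own illustrative construction, not NASM's actual machinery.

```python
# A toy model of one-line macro expansion, contrasting the lazy %define
# strategy with the eager %xdefine strategy.
macros = {}

def define(name, body):              # %define: store the body as written
    macros[name] = body

def xdefine(name, body):             # %xdefine: expand the body right away
    macros[name] = expand(body)

def expand(text):
    # substitute known macro names until none remain in the text
    changed = True
    while changed:
        changed = False
        words = text.split()
        for name, body in macros.items():
            if name in words:
                text = " ".join(body if w == name else w for w in words)
                changed = True
                break
    return text

define("thenumber", "25")
define("mkvar", "dd thenumber")
var1 = expand("mkvar")               # lazy: mkvar still refers to thenumber
define("thenumber", "36")
var2 = expand("mkvar")               # the redefinition is visible

macros.clear()
define("thenumber", "25")
xdefine("mkvar", "dd thenumber")     # 25 is substituted at definition time
define("thenumber", "36")
var3 = expand("mkvar")               # the redefinition changes nothing
```

Running the model, var1 and var2 reflect the lazy behavior (the redefinition of thenumber is visible), while var3 shows that under the eager strategy the value 25 was frozen into the body of mkvar.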
Sometimes it is necessary to associate with a macro name not just a string, but a
number resulting from the calculation of an arithmetic expression. The NASM
assembler allows you to do this using the %assign directive. Unlike %define and
%xdefine, this directive not only performs all substitutions in the body of the macro
definition, but also attempts to evaluate the body as an ordinary integer arithmetic
expression. If this fails, an error is reported. Thus, if you write in the program first
%assign var 25
and then
%assign var var+1
— then as a result the value 26 will be associated with the macro name var, and it
will be substituted wherever the macro processor encounters the word var in the further
program text.
Macro names introduced by the %assign directive are commonly
referred to as macro variables. This is an important tool that allows you to give
the macro processor what amounts to an entire program, the result of which can be assembly language
256 The name is a loose rendering of the English word lazy and is partially justified by the fact that the
macro processor seems to be too "lazy" to perform a macro substitution (in this case, of the macro
thenumber) until it is forced to do so.
text of, generally speaking, arbitrary length.
%ifdef DEBUG_PRINT
PRINT "Entering suspicious section"
PUTCHAR 10
%endif
;
; here comes the "suspicious" part of the program
;
%ifdef DEBUG_PRINT
PRINT "Leaving suspicious section"
PUTCHAR 10
%endif
It is now enough to add at the beginning of the program the line
%define DEBUG_PRINT
Then at startup NASM will "see" and compile the fragments of our source code
enclosed between %ifdef and %endif; when we find the error and don't need
the debug print anymore, it will be enough to remove this %define from the
beginning of the program or even just put a comment sign in front of it:
;%define DEBUG_PRINT
This saves us from having to delete the %define directive from the source text and
then insert it anew.
Returning to the two-customer situation, we can envision constructions like the
following in the program:
%ifdef FOR_PETROV
;
; here's the code for Petrov only.
%elifdef FOR_SIDOROV
;
; and here, just for Sidorov
%else
; if neither symbol is defined,
; abort compilation and generate an error message
%error Please define either FOR_PETROV or FOR_SIDOROV
%endif
(as you can easily guess, the %elifdef directive allows you to shorten a record
that would otherwise require %else and %ifdef, saving one %endif at the same time). When
compiling such a program, it will be necessary to specify the -dFOR_PETROV or
-dFOR_SIDOROV command-line option; otherwise NASM will start processing the fragment
located after %else and will generate an error message when it encounters the %error
directive.
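The logic of this conditional block can be modeled outside the assembler. In the following Python sketch, a set stands in for NASM's table of defined symbols, and the function name and the returned strings are purely illustrative:

```python
# Which fragment survives preprocessing depends on which symbols are defined.
def select_fragment(defined):
    if "FOR_PETROV" in defined:        # %ifdef FOR_PETROV
        return "code for Petrov"
    elif "FOR_SIDOROV" in defined:     # %elifdef FOR_SIDOROV
        return "code for Sidorov"
    else:                              # %else ... %error
        raise ValueError("Please define either FOR_PETROV or FOR_SIDOROV")

a = select_fragment({"FOR_PETROV"})    # like assembling with -dFOR_PETROV
b = select_fragment({"FOR_SIDOROV"})   # like assembling with -dFOR_SIDOROV
```

With neither symbol defined, the function raises an error, just as the %error directive aborts the assembly.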
In addition to checking for the presence of a macro name, it is also possible to
check for its absence (i.e. the exact opposite condition). This is
done with the %ifndef (if not defined) directive. As with %ifdef, there is a
shortened version of the %else construct for %ifndef: the %elifndef
directive.
It is not only the presence or absence of a macro that can be used to specify the
condition under which a fragment should or should not be compiled; NASM supports
other conditional compilation directives. The most common is the %if directive, in
which the condition is specified by an arithmetic-logic expression that is evaluated at
compile time. We have already met such expressions in §3.4.5; to form logical
expressions, the set of allowed operations is extended by =, <, >, >=, <=, in their
usual sense, the operation "not equal" can be specified by the symbol <>, as in Pascal,
or by the symbol !=, as in C; a C-like form of writing the operation "equal" in the
form of two equal signs == is also supported. In addition, the logical connectives
&& ("and"), || ("or") and ^^ ("exclusive or") are supported. Note that all expressions
used in the %if directive are considered critical (see §3.4.6). As with all other
%if-directives, there is a shortened form of the %else construct, the %elif directive.
Let us briefly list the other conditional directives supported by NASM. The
%ifidn and %ifidni directives take two comma-separated arguments and
compare them as strings, making macro substitutions in the text of the arguments if
necessary. The code fragment following these directives is translated only if the strings
are equal, and %ifidn requires an exact match, whereas %ifidni ignores case
and considers, for example, the strings foobar, FooBar and FOOBAR to be
the same. The %ifnidn and %ifnidni directives can be used to check the
opposite condition; all four directives have %elif forms: %elifidn,
%elifidni, %elifnidn and %elifnidni, respectively. The %ifmacro
directive checks for the existence of a multiline macro; the %ifnmacro,
%elifmacro, and %elifnmacro directives are supported. The %ifid,
%ifstr, and %ifnum directives check whether their argument is an identifier,
string, or numeric constant, respectively. As usual, NASM supports all optional forms
of %ifnXXX, %elifXXX, and %elifnXXX for all three directives.
In addition to those listed above, NASM supports the %ifctx directive and the
corresponding forms, but the explanation of its operation is rather complicated and we will not
discuss this directive.
3.5.5. Macro-repetitions
The NASM assembly macroprocessor allows you to repeatedly (cyclically) process
the same code fragment. This is achieved by the %rep (from the word repetition) and
%endrep directives. The %rep directive takes one mandatory parameter, which
means the number of repetitions. The code fragment between the %rep and
%endrep directives will be processed by the macroprocessor (and assembler)
as many times as the number of times specified in the %rep directive parameter. In
addition, the %exitrep directive may occur between the %rep and %endrep
directives, which terminates macro-repeat execution prematurely.
Let's consider a simple example. Suppose we need to describe a memory area
consisting of 100 consecutive bytes, with the first byte containing the number 50,
the second byte containing the number 51, etc., and the last byte containing
the number 149. Of course, you can just write a hundred lines of code:
db 50
db 51
db 52
...
db 148
db 149
- but this is, firstly, tedious and, secondly, takes too much space in the program text. It
would be more correct to entrust the generation of this code to the macroprocessor,
using macro-repetition and macro-variable:
%assign n 50
%rep 100
db n
%assign n n+1
%endrep
Upon encountering such a fragment, the macroprocessor will first associate the value
50 with the macro variable n, then it will examine the two lines between %rep and
%endrep one hundred times; each examination of these lines will lead to the
generation of the next line db 50, db 51, db 52, etc. to be assembled; the number
changes due to the fact that the value of the macro variable n changes (increases by
one) at each macro-repeat pass. In other words, as a result of processing this fragment
by the macroprocessor, exactly one hundred lines of code will be obtained as shown
above, and it is these lines that will be assembled.
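The result of this macro-repeat is easy to model; the following Python sketch (our own illustration, not NASM machinery) builds the same hundred lines:

```python
# The macro-repeat above, modeled in Python: one hundred "db N" lines
# for N from 50 to 149.
n = 50                       # %assign n 50
lines = []
for _ in range(100):         # %rep 100
    lines.append(f"db {n}")  # db n
    n += 1                   # %assign n n+1
```

The first generated line is "db 50" and the last one is "db 149", exactly as in the hand-written variant.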
Let's consider a more complicated example. Suppose we need a memory area
containing, sequentially as four-byte values, all Fibonacci numbers not exceeding
100 000:
fibonacci
%assign i 1
%assign j 1
%rep 100000
%if j > 100000
%exitrep
%endif
dd j
%assign k j+i
%assign i j
%assign j k
%endrep
257 Recall that Fibonacci numbers are a sequence of numbers starting with two ones, each subsequent
number of which is obtained by adding the previous two: 1, 1, 2, 3, 5, 8, 13, 21, 34, etc.
fib_count equ ($-fibonacci)/4
The label fibonacci will be associated with the start address of the generated
memory region, and the label fib_count will be associated with the total number
of numbers placed in this memory region (we have already encountered this technique,
see page 614).
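To check the logic of this macro loop, one can model it in Python; the variable names below mirror the macro variables above, everything else is our own illustration:

```python
# The Fibonacci macro loop modeled in Python: emit every Fibonacci number
# not exceeding 100 000 and count the emitted values.
i, j = 1, 1                  # %assign i 1 / %assign j 1
emitted = []
for _ in range(100000):      # the same generous bound as %rep 100000
    if j > 100000:           # %if j > 100000
        break                # %exitrep
    emitted.append(j)        # dd j
    i, j = j, i + j          # %assign k j+i / %assign i j / %assign j k
fib_count = len(emitted)     # fib_count equ ($-fibonacci)/4
```

Note that with this initialization the second of the two leading ones is never emitted: the first dd already outputs j = 1, after which j becomes 2, so each value appears in memory once.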
You can use macro-repeats not only to generate memory areas filled with numbers,
but also for other purposes. For example, suppose we have an array of 128 two-byte
integers:
array resw 128
and we want to write a sequence of 128 inc commands incrementing each of the
elements of this array by one. We can do it this way:
%assign a 0
%rep 128
inc word [array + a]
%assign a a+2
%endrep
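The unrolled sequence generated by this macro loop can be modeled the same way (again a Python illustration of what the macro processor emits):

```python
# The code generated by the macro loop above: 128 "inc word" instructions
# with byte offsets 0, 2, 4, ..., 254.
a = 0                                          # %assign a 0
inc_lines = []
for _ in range(128):                           # %rep 128
    inc_lines.append(f"inc word [array + {a}]")
    a += 2                                     # %assign a a+2
```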
The reader might note that using 128 instructions in such a situation is irrational, and that it
would be more correct to use a short runtime loop instead. In most cases this option is indeed
preferable, because the three commands of such a loop will naturally occupy several tens of
times less memory than a sequence of 128 inc commands; but you should keep in mind that
such code will run about one and a half times slower, so in some cases using a macro loop
to generate a sequence of identical commands (instead of a runtime loop) may be meaningful.
258 Here and until the end of the paragraph, we use the EAX and ECX registers without saving their
contents; we will assume that our macros follow the same conventions as subroutines written according
to CDECL (see page 607).
%macro zeromem 2 ; (two parameters: address and length)
        mov ecx, %2
        mov eax, %1
lp:     mov byte [eax], 0
        inc eax
        loop lp
%endmacro
NASM will accept this description and even allow us to make one macro call. If at least
two calls to the zeromem macro occur in our program, we will get an error message
when we try to translate the program - NASM will complain that we use the same label
(lp:) twice. Indeed, at each macro call, the macro processor will insert the whole body
of our macro definition instead of the call, only replacing %1 and %2 with the
corresponding parameters and keeping everything else unchanged. So, if the program
contains the lines
section .bss
array resb 256
arr_len equ $-array
section .text
        zeromem array, arr_len
— then the beginning of the zeromem macro will expand into the following code:
        mov ecx, arr_len
        mov eax, array
In the %macro directive, instead of an exact number of parameters, one can write
a range: the directive
%macro name 1-3
specifies a macro that accepts from one to three parameters, and the directive
%macro name 2-*
specifies a macro that allows an arbitrary number of parameters, not less than two.
When working with such macros, the designation %0 may be useful: in its place
the macro processor substitutes a number equal to the actual number of parameters
during macro expansion.
Recall that the arguments of a multiline macro are denoted in its body as %1, %2,
etc., but NASM does not provide indexing facilities (i.e., a way to extract the n-th
parameter, where n is computed during macro substitution). How can parameters be used,
then, if even their number is not known in advance? The problem is solved by the
directive %rotate, which allows you to renumber the parameters. Let's consider the
simplest version of the directive:
%rotate 1
The numeric parameter indicates by how many positions the parameter numbers should
be shifted. In this case it is the number 1, so the parameter previously designated
%2 will, after this directive, have the designation %1; in turn, the former %3 will
turn into %2, etc.; and the parameter which was the first and had the
designation %1 will, due to the "cyclicity" of our shift, receive a number equal to the
total number of parameters. The designation %0 does not participate in the rotation
and does not change in any way.
If the %rotate directive is given a negative parameter, it will perform a cyclic
shift in the opposite direction. Thus, after
%rotate -1
%1 will denote the parameter that used to be the last, %2 will denote the parameter
that used to be the first (i.e., labeled %1), and so on.
Recall that earlier (see page 620) we promised to write the pcall macro, which
allows us to form a call to a subroutine with any number of arguments in one line. Now,
having at our disposal macros with a variable number of arguments and the %rotate
directive, we are ready to do it. Our macro, which we will simply call pcall, will take
as input the address of the procedure (the argument of the call command) and an arbitrary
number of parameters to be placed on the stack. We will, as before, assume for
simplicity that each parameter occupies exactly 4 bytes. Recall that the parameters must
be placed on the stack in reverse order, starting from the last one. We will achieve this
by using the %rep macro loop and the %rotate -1 directive, which at each step makes
the last (at that moment) parameter number 1. The number of iterations of the
loop is one less than the number of parameters passed to the macro, because the first
parameter is the name of the procedure and does not need to be pushed onto the stack.
After this loop, we will have to turn the last parameter into the first one again (it will
be the very first of all the parameters, i.e. the address of the procedure) and issue the
call, and then insert an add command to clear the stack of the parameters. So, let's write:
A call of the form
pcall myproc, eax, myvar, 27
will then expand into
push dword 27
push dword myvar
push dword eax
call myproc
add esp, 12
as requested.
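Since the full listing of pcall is not reproduced in this copy, the following Python model (our own sketch; rotating a list imitates the %rotate -1 directive) shows how the described sequence of rotations yields exactly the expansion just quoted:

```python
# A model of the described pcall expansion: params[0] is the procedure
# address, the rest are arguments; rotating the list by -1 imitates
# %rotate -1, so arguments are pushed in reverse order.
def pcall(params):
    params = list(params)
    out = []
    for _ in range(len(params) - 1):         # one push per argument
        params = params[-1:] + params[:-1]   # %rotate -1: last becomes %1
        out.append(f"push dword {params[0]}")
    params = params[-1:] + params[:-1]       # one more turn: %1 is the address
    out.append(f"call {params[0]}")
    out.append(f"add esp, {4 * (len(params) - 1)}")
    return out

code = pcall(["myproc", "eax", "myvar", "27"])
```

After three rotations and pushes the list returns to its original order on the fourth rotation, so the procedure address is back in position 1 for the call command.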
macro variables var1, var2 and var3 will get the values 'a', 'b' and 'c'
respectively, i.e. the effect will be the same as if we had written
All this makes sense, as a rule, only if you get either the name of a macro variable or the
designation of a positional parameter in a multiline macro as a directive argument.
Recall that all macro directives are executed during macro-processing (before
compilation, i.e. long before the execution of our program), so, of course, at the time of the
corresponding macro substitutions, all strings used must already be known.
program expects them to appear, and only then return control to it. During all this time
(at least the time spent on moving the head and waiting for the necessary phase of disk
rotation) the central processor will be at best idle, and most likely it will have to
continuously poll the controller in a loop for readiness (Fig. 3.8).
All this does not create problems if we have only one task and there is nothing else
for the processor to do; but if, besides the task that is already running, we have other
tasks waiting for their turn, then it is better to spend the CPU time on solving other
tasks rather than letting it go to waste waiting for the end of I/O operations. That's
what multitasking operating systems do. In such a system, a queue is formed from the
tasks that need to be solved. As soon as the active task requests an I/O operation, the
operating system performs the necessary actions to start the device controllers
executing the requested operation, or puts the requested operation in a queue if it
cannot be started immediately for some reason; after that the active task is replaced by
another one, either new (taken from the queue) or one that ran earlier but did not have
time to complete. The replaced task is considered to have entered the state of waiting
for the I/O result, or the blocked state.
In the simplest case, a new active task remains in the execution mode until it either
terminates or requests an I/O operation in its turn. In this case, the blocked task at the
end of the I/O operation goes from the blocked state to the state of readiness for
execution, but there is no switch to it (see Fig. 3.9); this is due to the fact that
[Fig. 3.9: task 1 passes through blocking and readiness while task 2 runs]
the operation of changing the active task, generally speaking, consumes a lot of
processor time. Such a way of multitasking construction, when the active task is
changed only in case of its termination or request for I/O operation, is called batch
mode , and operating systems realizing this mode are called batch operating systems.
The batch multitasking mode is the most efficient from the point of view of using the
processing power of the central processor, that is why the batch mode is used to control
272 Reading directly into RAM is theoretically possible, but technically difficult and rarely used.
273 The Russian term «пакетный режим» is a well-established, though not very apt, translation of the
English term "batch mode"; the word batch can also be translated as "deck" (in fact, originally it
referred to the decks of punched cards representing jobs). This term should not be confused with words
derived from the English word packet, which is also usually rendered in Russian as «пакет».
§ 3.6. Interaction with the operating system 649
supercomputers and other machines, the main purpose of which is large volumes of
numerical calculations.
With the appearance of the first terminals and dialog (in other words, interactive)
mode of work with computers, there was a need for other strategies for changing active
tasks, or, as it is commonly said, scheduling the CPU time. Indeed, a user dialoguing
with one or another program will hardly want to wait until some active task calculating,
say, an inverse matrix of the order of 1000x1000, finishes its work. At the same time,
a lot of CPU time is not required to service the dialog with the user: in response to each
user action (e.g., pressing a key), it is usually necessary to perform a set of actions
within a few milliseconds, while the user can create no more than three or four such
events per second even in the active typing mode (the speed of computer typing of 200
characters per minute is considered quite high). It would be illogical to wait for the user
to completely finish his dialog session: most of the time the processor could perform
arithmetic operations to calculate the matrix.
Time-sharing mode helps to solve this problem. In this mode, each task is allocated
a certain amount of work time, called a time quantum. At the end of this quantum, if
there are other tasks ready for execution in the system, the active task is forcibly
suspended and replaced by another task. The suspended task is placed in the queue of
tasks ready for execution and stays there while the other tasks work off their quanta;
then it gets another quantum of time to work again, and so on. Naturally, if the active
task has requested an I/O operation, it is put into the blocking state (just like in the batch
mode). Tasks in the blocked state are not queued for execution and do not receive time
quanta until the I/O operation is completed (or another reason for the blocking
disappears) and the task moves to the ready for execution state.
There are various algorithms for maintaining the execution queue, including those in which
tasks are assigned a certain priority, expressed as a number. For example, a task can be
assigned two priority components - static and dynamic; the static component represents
the level of "importance" of execution of this particular task assigned by the
administrator, while the dynamic component is changed by the scheduler: while the
task is being executed, its dynamic priority decreases, while when it is in the execution
queue, the dynamic component of priority, on the contrary, increases. Out of several
tasks ready for execution, the one with the highest sum of priorities is selected, so that
sooner or later even the task with the lowest static priority will get control at the expense
of the increased dynamic priority.
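The two-component priority scheme can also be sketched; the particular numeric adjustments below are invented for illustration, real schedulers use more elaborate rules:

```python
# A toy model of the static+dynamic priority scheme described above:
# the ready task with the largest static+dynamic sum runs; running
# decreases the dynamic component, waiting increases it.
def pick_and_run(tasks):
    # tasks maps a name to a [static, dynamic] priority pair
    chosen = max(tasks, key=lambda t: tasks[t][0] + tasks[t][1])
    for name, prio in tasks.items():
        if name == chosen:
            prio[1] -= 2          # running lowers the dynamic component
        else:
            prio[1] += 1          # waiting raises it
    return chosen

tasks = {"batch": [1, 0], "editor": [5, 0]}
history = [pick_and_run(tasks) for _ in range(8)]
```

Even though "batch" has a much lower static priority than "editor", its growing dynamic component guarantees that it appears in the history, i.e. sooner or later it gets the processor.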
Some operating systems, including early versions of Windows, used a strategy
intermediate between batch mode and time-sharing mode. In these systems, tasks were
allocated a time quantum, as in time-sharing systems, but there was no forced change of the
current task when the time quantum expired; the system only checked to see if the current
task's time quantum had expired when the task accessed the operating system (not
necessarily for I/O). A task that did not need the services of the operating system could remain
on the processor for as long as it wanted, just as in batch operating systems. This mode of
operation is called non-preemptive. It is not used in modern systems because it imposes too
strict requirements on the programs running in the system; for example, in early versions of
Windows any program engaged in long calculations blocked the work of the whole system,
and a looped task led to the need to reboot the computer.
Sometimes the time-sharing mode is also unsuitable. In some situations, such as
controlling the flight of an airplane, a nuclear reactor, an automatic production line,
etc., some tasks must be completed strictly before a certain point in time; for example,
if the autopilot of an airplane, receiving a signal from the pitch and roll sensors, takes
more time than allowed to calculate the necessary corrective action, the airplane may
lose control altogether.
When the tasks being performed (at least some of them) have strict time limits for
completion, real-time operating systems are used. Unlike time-sharing systems, the
task of a real-time scheduler is not to let all programs work for a certain amount of time,
but to ensure that each task is completed in the time allotted to it, and if this is
impossible - to remove the task, freeing the processor for those tasks that can still be
completed by the deadline. In real-time systems, what is more important is not the total
number of tasks solved in the system in a fixed amount of time (called system
performance), but the predictability of the execution time for each individual task.
Scheduling in real-time systems is a rather complex branch of computer science,
worthy of a separate book and obviously beyond the scope of our textbook. It
is unlikely that you will ever encounter real-time systems in practice, at least as a
programmer; if you do, you will need to spend time studying specialized literature, but
this is the case in any specific area of engineering.
The central processor supports (at least) two modes of operation: privileged and
restricted. In the literature, the privileged mode is often called
"kernel mode" or "supervisor mode". The restricted mode is also called "user mode" or
simply unprivileged mode. We have chosen the term restricted mode as the most
accurate one describing the essence of this mode of central processor's operation
without being bound to its use by operating systems. In privileged mode the processor
can execute any existing commands. In restricted mode, execution of commands
affecting the system as a whole is prohibited; only commands whose effect is limited
to modifying data in memory areas not covered by memory protection are allowed. The
operating system itself is executed in privileged mode; user programs are executed in
restricted mode.
As we noted in §3.1.2, a user program can only modify data in its allocated
memory; any other actions require a call to the operating system. This is ensured by the
CPU's support of a memory protection mechanism and limited mode of operation.
These two hardware requirements, however, are not yet sufficient to realize a
multitasking system.
Let's return to the I/O operation situation. In a single-task system (Figure 3.8 on
page 636), during the execution of an I/O operation, the CPU could continuously poll
the device controller for its readiness (whether the required operation has been
completed) and then prepare everything to resume the active task - in particular, copy
the read data from the controller buffer to the memory belonging to the task. It should
be noted that in this case the processor would be continuously busy during the I/O
operation, despite the fact that it would not be performing any useful computations.
This mode of interaction is called active waiting. Clearly, the processor time could have
been used more usefully.
When switching to the multitasking mode shown in Fig. 3.9 on page 638,
another problem arises. When an I/O operation is completed, the processor is busy
executing the second task. Meanwhile, the moment an operation is completed, at a
minimum, the first task must be moved from the blocked state to the ready state; other
275 Of course, we remember that operating systems typically don't use the segmented virtual memory
276 Naturally, the version for i386; versions designed for other hardware architectures are organized
differently.
stream numbered 1 is considered standard output). For example, to print a line "to
the screen", which is what the PRINT macro does, we would need to put the number
4 in EAX, the number 1 in EBX, the line address in ECX, and the line length
in EDX, and then issue an int 80h command to initiate a software interrupt.
global _start
section .data
msg     db "Hello world", 10
msg_len equ $-msg
section .text
_start: mov eax, 4       ; call write
        mov ebx, 1       ; standard output
        mov ecx, msg
        mov edx, msg_len
        int 80h
        mov eax, 1       ; call _exit
        mov ebx, 0       ; code for "success"
        int 80h
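As a cross-check of this convention, the same write call can be made through Python's thin wrapper over the kernel interface (os.write; descriptor 1 is standard output). os._exit(0) would be the analogue of the _exit call; we do not invoke it here so that the script can continue running:

```python
# write(2) through Python's wrapper: descriptor 1 is the standard output
# stream, and the call returns the number of bytes actually written.
import os

msg = b"Hello world\n"
written = os.write(1, msg)
```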
Some system calls do not fit into this convention; for example, the llseek call has a 64-bit
parameter and returns a 64-bit number too. What the kernel and the library do in such cases
is a question we will leave outside the scope of our book.
kernel:
int 80h
ret
If we have such a procedure, all we need to do to call the kernel is to put the
parameters on the stack just like a normal procedure, put the call number in EAX,
and call kernel; the call command will put the return address on the stack,
which will be on the top of the stack when the software interrupt is executed, while
the parameters will be on the stack below the top. The FreeBSD kernel takes this into
account and does nothing with the number at the top of the stack (because this number
- the return address from the kernel procedure - has nothing to do with the call
parameters), and retrieves the actual parameters from the stack below the top (from
positions [esp+4], [esp+8], etc.).
When working in assembly language, it is not necessary to separate the interrupt
call into a separate subroutine; it is enough to put an additional "double word" into the
stack before the int command, for example, by executing the push eax
command (or any other 32-bit register). After executing the system call and returning
from it, you should remove from the stack everything that was put there; this is done,
as well as when calling ordinary subroutines, by increasing the ESP register by the
required value with a simple add command.
In describing the Linux convention in the previous paragraph, we used the write
and _exit calls for illustration (see page 650). A similar program for FreeBSD would
look as follows:
global _start
section .data
msg db "Hello world", 10
msg_len equ $-msg
section .text
_start:
push dword msg_len
push dword msg
push dword 1 ; standard output
mov eax, 4 ; write
push eax ; anything
int 80h
add esp, 16     ; 4 double words
push dword 0    ; code for "success"
mov eax, 1 ; call _exit
push eax ; anything
int 80h
We did not clear the stack after the _exit system call because it does not return
control anyway. In this example, we do not handle errors, assuming that writing to the
standard output stream is always successful (this is generally not true, but programmers
often ignore it). If we wanted to handle errors "fairly", the first instruction after
int 80h should be a jc or jnc instruction, which makes a conditional jump
depending on the state of the CF flag; otherwise we risk that the next instruction
will set this flag according to its own results and the indication of the error will be
lost. In Linux it was a bit simpler: it was enough not to touch the EAX register and
nothing would be lost.
The write call has number 4 and takes three parameters, namely the number
("descriptor") of the output stream (1 for a standard output stream), the address of the
memory area where the output data is located, and the amount of this data.
To enter data (both from files and from the standard input stream, i.e. "from the
keyboard"), the read call, number 3, is used. Its parameters are similar to
the write call: the first parameter is the number of the input stream descriptor (for
standard input the descriptor 0 is used), the second parameter is the address of the
memory area where the read data should be placed, and the third parameter is the
number of bytes to be read. Naturally, the memory area whose address we pass in the
second parameter must be at least as large as the number passed in the third parameter.
It is very important to analyze the value returned by the read call (recall that
immediately after the call this value is contained in the EAX register). If the reading
was successful, the call returns a strictly positive number: the number of bytes read,
which, of course, cannot exceed the number "ordered" via the third parameter, but
may well be less (for example, we requested 200 bytes, but only 15 were actually
read). The case when read returns the number 0 is very important: it indicates
that an "end-of-file" situation has occurred in the input stream being used. When
reading from files, it means that the whole file has been read and there is no more data
in it; recall that when typing from the keyboard in Unix, you can simulate the "end of
file" situation by pressing the Ctrl-D key combination.
Remember that a program that uses the read call and does not analyze its
result is obviously incorrect. Indeed, in this case we cannot know how many of the
first bytes of our memory area contain actually read data, and how many of the
remaining bytes continue to contain arbitrary "garbage" - hence, any meaningful work
with this data is impossible.
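The checking discipline described above can be sketched in C as follows; the helper name read_fully is invented for the illustration, and the loop also handles the case where read returns fewer bytes than requested:

```c
#include <unistd.h>

/* Read up to "want" bytes from descriptor fd, looping because read()
 * may return fewer bytes than requested.  Returns the total number of
 * bytes actually read (possibly 0 on immediate end-of-file), or -1 on
 * error. */
ssize_t read_fully(int fd, char *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n == -1)
            return -1;   /* error reported by the kernel */
        if (n == 0)
            break;       /* end-of-file situation */
        got += n;
    }
    return (ssize_t)got;
}
```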
When reading, as with other system calls, an error can occur. As we have seen, in
Linux this is detected by the "negative" value of the EAX register after returning from
a call or, more specifically, by a value between fffff000h and ffffffffh;
FreeBSD uses the CF flag (carry flag): if the call succeeds, the flag will be reset on
exit, and if an error occurs, the flag will be set. This applies to the read call, the
previously discussed write call (we did not handle error situations so as not to
complicate our examples, but this does not mean that errors cannot occur), and all
other system calls.

277 At least on Linux and FreeBSD systems; hereafter, unless explicitly stated otherwise, it is
assumed that this is true for at least those two systems.
When a program is started, it usually has I/O streams numbered 0 (standard input),
1 (standard output), and 2 (the error reporting stream) already open, so we can apply
the read call to descriptor 0 and the write call to descriptors 1 and 2. Often,
however, a task requires the creation of other I/O streams, such as those for reading
and writing files on disk. Before we can work with a file, we need to open it, as a
result of which we will have another I/O stream with its own number (descriptor).
This is done using the open
system call number 5. The call accepts three parameters. The first parameter is the
address of a text string specifying the name of the file; the name must end with a
zero byte, which serves as a terminator. The second parameter is a number specifying the
mode of use of the file (read, write, etc.); the value of this parameter is formed as a bit
string in which each bit represents a particular feature of the mode, e.g., write-only
accessibility, permission to create a new file if it does not exist, etc. Unfortunately, the
arrangement of these bits is different for Linux and FreeBSD; some of the flags are
Table 3.4. Some flags for the second parameter of the open call

name       description                                Linux   FreeBSD
O_RDONLY   read only                                  000h    000h
O_WRONLY   write only                                 001h    001h
O_RDWR     read and write                             002h    002h
O_CREAT    allow creation of the file                 040h    200h
O_EXCL     require that the file be created           080h    800h
O_TRUNC    if the file exists, destroy its contents   200h    400h
O_APPEND   if the file exists, append at the end      400h    008h
together with their descriptions and numerical values are given in Table 3.4. Note that
two variants of this parameter are the most common. The first is opening a file for
reading only; in both systems under consideration this case is denoted by the number 0.
The second is opening a file for writing, when the file is created if it did not exist,
and if it did, its old contents are lost (in C programs this is set by the combination
O_WRONLY|O_CREAT|O_TRUNC). For Linux the corresponding numerical value
is 241h, for FreeBSD it is 601h. The third parameter of the open call is used only
when a file is created and specifies access rights for it (see §1.2.13). In most cases it
should be set to the octal number 0666q, which corresponds to read and write
permissions for all users of the system; 0600q (owner-only permissions) is less
frequently used, and other values are almost never used; we will learn why this is so in
the second volume of our book (see §5.2.3).
For an open call, it is especially important to analyze its return value and check
if an error has occurred. The call may fail for a variety of reasons, most of which the
programmer can neither prevent nor predict: for example, someone may unexpectedly
erase a file we intended to open for reading, or deny us access to the directory where
we intended to create a new file. So, after executing the open call, we need to check
whether the EAX register contains a value between fffff000h and ffffffffh (in
Linux) or whether the CF flag is raised (in FreeBSD). If the call succeeds, the EAX register
contains the descriptor of the open file (input or output stream). It is this descriptor that
should now be used as the first parameter in the read and write calls to the file.
As a rule, this value should be copied immediately after the call to the memory area
specially allocated for it.
When all actions with the file are completed, it should be closed. This is done using
the close call, which has the number 6. The call takes one parameter equal to the
file descriptor of the file to be closed. The I/O stream with this descriptor then ceases
to exist; subsequent calls to open may use the same descriptor number again.
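In C, the open / write / close sequence just described can be sketched like this; the helper name write_line and the error-handling shape are this sketch's own choices, not the book's:

```c
#include <fcntl.h>
#include <unistd.h>

/* Open a file for writing (creating it if necessary, truncating it
 * otherwise), write one line into it, and close it.  Returns 0 on
 * success, -1 on failure. */
int write_line(const char *path, const char *line, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return -1;            /* could not open: report failure */
    ssize_t n = write(fd, line, len);
    close(fd);                /* the descriptor number may now be reused */
    return n == (ssize_t)len ? 0 : -1;
}
```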
A Unix process can find out its own number (the so-called process ID) with the getpid
call, and the number of its "parent" (the process that created it) with the getppid call.
The getpid call on both systems in question is number 20, while the getppid
call is number 64 on Linux and number 39 on FreeBSD. Both calls take no
parameters; the requested number is returned as the result of the call via the EAX
register. Note that these two calls always complete successfully; there is no place for
errors to occur.
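From C these two calls look as follows; the wrapper function show_pids is made up for the illustration:

```c
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our own process ID and for the ID of the parent
 * process.  Both calls always succeed, so no error checking is needed. */
void show_pids(pid_t *self, pid_t *parent)
{
    *self = getpid();     /* our own process ID */
    *parent = getppid();  /* ID of the process that created us */
}
```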
The kill system call (number 37) allows you to send a signal to a process
with a given number. The call takes two parameters: the first specifies the process
number, the second specifies the number of the signal; in particular, signal #15
(SIGTERM) instructs the process to terminate (but the process can intercept this signal
and terminate not immediately, or not terminate at all), while signal #9 (SIGKILL)
destroys the process, and this signal can neither be intercepted nor ignored.
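A small C sketch of kill: signal number 0 is a special case this sketch relies on (not mentioned in the text above), in which nothing is delivered and the kernel only checks that the target process exists and may be signalled; the helper name process_exists is invented here:

```c
#include <sys/types.h>
#include <signal.h>
#include <unistd.h>

/* "Send" signal 0 to the process with the given ID: no signal is
 * actually delivered, the call only checks that the process exists
 * and that we are allowed to signal it. */
int process_exists(pid_t pid)
{
    return kill(pid, 0) == 0;
}
```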
Unix family operating system kernels support hundreds of different system calls;
interested readers can find information about these calls on the Internet or in specialized
literature. Note that to familiarize yourself with information about system calls, it is
desirable to know the C programming language, and it is much easier to work at the
level of system calls using the C language. Moreover, some system calls in some
systems may not be supported by the kernel, but instead emulated by C library
functions, which makes their use in assembly language programs almost impossible. In
this connection, it is appropriate to recall that we are considering assembly language
278 In fact, it is possible to send a signal to a group of processes or even to all processes in the
system at once; we will postpone a detailed description of all this - both the kill call and the
process groups themselves - until the next volume.
for educational, not practical, purposes. Programs intended for practical use are better
written in C or other suitable languages.
To use the write call we need to know the length of each string to be printed, so for
convenience we will describe the strlen subroutine, which receives the address of
the string as a parameter through the stack and returns the length of this string through
the EAX register (assuming that the end of the string is marked with a zero byte). The
subroutine will follow the CDECL convention: for its internal needs it will use the EAX
and ECX registers, which according to CDECL it has the right to corrupt, it will use
the EBP register as a reference point of the stack frame, as it is usually done, and will
restore it on exit, and will not touch the other registers.
Using strlen, we will write the print_str subroutine, which will receive
the address of the string as the first and only parameter, determine its length by calling
strlen, and output the resulting string to the standard output stream using the
write system call. In this subroutine we need the string address twice - the first time
we will pass it to the strlen subroutine and the second time to the system call. It
will have to be copied from the stack to a register anyway, so we'll leave it in the register
and not access the stack a second time; but since we're using CDECL, we should assume
that the subroutine being called will mess up EAX, ECX, and EDX. In fact, we know
that strlen does not mess up EDX, but we will not use this knowledge, otherwise
there is a risk that sometime in the future we will change the strlen code, seemingly
staying within CDECL, but print_str will no longer work; so when calling
subroutines we should not use knowledge of their internals, but instead use general
rules. With this in mind, we use the EBX register to store the string address, which will
have to be saved at the beginning of the subroutine and restored at the end; by the way,
if you make a system call according to Linux rules, EBX will still have to be
corrupted (not so for FreeBSD).
In addition to command line parameters, we will have to print line feed characters.
Here we will take a not quite optimal way in terms of performance, but we will save a
dozen lines of code: we will describe in memory a string consisting of a single line feed
character (i.e. a memory area of two bytes: the first is a line feed character with
code 10, the second is a terminating zero) and print this string with the help of the
print_str subroutine that we already have.
Of course, a special subroutine calling write for a single byte would work faster,
because it would not have to calculate the length of the string. If we were really concerned
with overall performance, we should not call the OS kernel twice for each string; even one
such call per string is too many. It would be better to form one large array in memory,
copying the contents of all the command line parameters into it and placing line feed
characters where necessary, and then print it all with one system call. System calls are
expensive, because they require context switching and involve a number of complex actions
performed in the kernel. The problem is that with such an optimization the program text
would grow fivefold and lose quite seriously in clarity.
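The batching idea just described can be sketched in C; the function name print_all and the fixed 4096-byte buffer are this sketch's own assumptions:

```c
#include <string.h>
#include <unistd.h>

/* Copy several strings into one buffer, placing a line feed after each,
 * and print the whole thing with a single write() call.  Returns the
 * number of bytes written, or -1 if the strings do not fit. */
ssize_t print_all(const char **strs, int count)
{
    char buf[4096];
    size_t used = 0;
    for (int i = 0; i < count; i++) {
        size_t len = strlen(strs[i]);
        if (used + len + 1 > sizeof(buf))
            return -1;           /* would not fit: give up in this sketch */
        memcpy(buf + used, strs[i], len);
        used += len;
        buf[used++] = '\n';      /* line feed after each string */
    }
    return write(1, buf, used);  /* one system call for everything */
}
```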
We'll call the string consisting of a single line feed nlstr and put it right at the
beginning of the .text section. We can do this because our program does not change
this memory location; if it did, we would have to put it in the .data section.
The main program, starting with the _start label, will place the number of
command line parameters in the EBX register, and in the ESI register will place a
pointer to the place in the stack where the address of the next command line parameter
to be printed is located. It would be more logical to use ECX for the counter, but it can
and will be corrupted by called subroutines, whereas EBX is required to be restored
by CDECL. At each iteration of the loop, ESI will be incremented by 4 to indicate
the next position on the stack, and EBX will be decremented to indicate that there is
one less line to print. The complete text will turn out like this:
;; cmdl.asm ;;
global _start
section .text
nlstr   db 10, 0

strlen:                         ; arg1 == address of the string
        push ebp
        mov ebp, esp
        xor eax, eax
        mov ecx, [ebp+8]        ; arg1
.lp:    cmp byte [eax+ecx], 0
        jz .quit
        inc eax
        jmp short .lp
.quit:  pop ebp
        ret

_start: mov ebx, [esp]          ; argc
        mov esi, esp
        add esi, 4              ; argv
again:  push dword [esi]        ; argv[i]
        call print_str
        add esp, 4
        push dword nlstr
        call print_str
        add esp, 4
        add esi, 4
        dec ebx
        jnz again
%ifdef OS_FREEBSD
        push dword 0            ; success
        mov eax, 1              ; _exit
        push eax                ; extra dword
        int 80h
%endif
%macro kernel 1-*
%ifdef OS_FREEBSD
        %rep %0
        %rotate -1
        push dword %1
        %endrep
        mov eax, [esp]
        int 80h
        jnc %%ok
        mov ecx, eax
        mov eax, -1
        jmp short %%q
%%ok:   xor ecx, ecx
%%q:    add esp, %0 * 4
%elifdef OS_LINUX
        %if %0 > 1
        push ebx
        %if %0 > 4
        push esi
        push edi
        push ebp
        %endif
        %endif
        %rep %0
        %rotate -1
        push dword %1
        %endrep
        pop eax
        %if %0 > 1
        pop ebx
        %if %0 > 2
        pop ecx
        %if %0 > 3
        pop edx
        %if %0 > 4
        pop esi
        %if %0 > 5
        pop edi
        %if %0 > 6
        pop ebp
        %if %0 > 7
        %error "Can't do Linux syscall with 7+ params"
        %endif
        %endif
        %endif
        %endif
        %endif
        %endif
        %endif
        int 80h
        mov ecx, eax
        and ecx, 0fffff000h
        cmp ecx, 0fffff000h
        jne %%ok
        mov ecx, eax
        neg ecx
        mov eax, -1
        jmp short %%q
%%ok:   xor ecx, ecx
%%q:
        %if %0 > 1
        %if %0 > 4
        pop ebp
        pop edi
        pop esi
        %endif
        pop ebx
        %endif
%else
%error Please define either OS_LINUX or OS_FREEBSD
%endif
%endmacro
The text of the macro is, of course, quite long, but this is compensated for by reducing
the size of the main code. For example, when talking about system call conventions,
we gave the code of a program that prints one line in the Linux (page 651) and FreeBSD
(page 652) versions. Using the kernel macro, we can write like this:
section .data
msg     db "Hello world", 10
msg_len equ $-msg

section .text
global _start
_start: kernel 4, 1, msg, msg_len
        kernel 1, 0
and that's all; this program will compile and work correctly under both systems,
you just need to remember to specify the NASM flag -dOS_LINUX or
-dOS_FREEBSD.
One more thing will depend on the system used in our program. When opening the
copied file for reading, the second parameter of the open call should be O_RDONLY,
which is zero on both systems in question; but when opening the target file for writing,
we will have to use a combination of O_WRONLY, O_CREAT, and O_TRUNC, two
of which, as discussed on page 655, have different numerical values on Linux and
FreeBSD. The second parameter of the open system call should in this case be 241h
in Linux and 601h in FreeBSD (see Table 3.4). In order not to have to remember the
differences between the two supported systems, we will introduce a special symbolic
label whose value will depend on the system for which the translation is performed:
%ifdef OS_FREEBSD
openwr_flags equ 601h
%else ; assume it's Linux
openwr_flags equ 241h
%endif
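On Linux the same numeric identity can be checked from C; this sketch assumes the Linux flag values listed in Table 3.4 (it would not hold on FreeBSD):

```c
#include <fcntl.h>

/* On Linux, O_WRONLY (001h) | O_CREAT (040h) | O_TRUNC (200h)
 * is exactly the 241h used as openwr_flags in the text. */
int openwr_flags_value(void)
{
    return O_WRONLY | O_CREAT | O_TRUNC;
}
```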
Now let's form the variable section. We will need a buffer for temporary data storage,
into which we will read the next portion of data from the first file to write it to the
second file. In addition, we will also place file descriptors in variables. We could also
use registers, but we would lose in clarity. We will call the corresponding variables
fdsrc and fddest. Finally, for convenience, we will create variables for storing
the number of command line parameters and the address of the beginning of the array
of pointers to command line parameters, calling these variables argc and argvp.
All these variables do not require initial values and can therefore be located in the .bss
section:
section .bss
buffer  resb 4096
bufsize equ $-buffer
fdsrc   resd 1
fddest  resd 1
argc    resd 1
argvp   resd 1
When launching our program, the user may specify the wrong number of command line
parameters; the file specified as the data source may not be available or may not exist;
finally, we may not be able to open the file specified as the target file for writing for
some reason. In the first case, we should explain to the user what parameters to run our
program with, in the other two cases we should simply inform him about the error. Our
program will write error messages to the standard diagnostic stream, descriptor 2.
We will place all three error messages in the .data section as initialized
variables:
section .data
helpmsg db 'Usage: copy <src> <dest>', 10
helplen equ $-helpmsg
err1msg db "Couldn't open source file for reading", 10
err1len equ $-err1msg
err2msg db "Couldn't open destination file for writing", 10
err2len equ $-err2msg
Now let's start writing the .text section, i.e. the program itself. First of all, let's
make sure that exactly two parameters have been passed to us; for this purpose we
extract the number at the top of the stack, which denotes the number of command
line elements, and put it into the argc variable. Just in case, we also save the address
of the current stack top in the argvp variable, but we won't extract anything else
from the stack, so the array of addresses of the command line element strings will
remain in the stack area. Let's check that the argc variable contains the number 3:
a correct command line in our case should consist of three elements, the name of
the program itself and two parameters. If the number of parameters is incorrect,
we print an error message to the user and exit:
section .text
global _start
_start:
        pop dword [argc]
        mov [argvp], esp
        cmp dword [argc], 3
        je .args_count_ok
        kernel 4, 2, helpmsg, helplen
        kernel 1, 1
.args_count_ok:
Our next action should be to open the file, whose name is specified by the first
command line parameter, for reading. We remember that the argvp variable contains
the address in memory (stack memory), starting from which the addresses of command
line items are located. Let's extract the address from argvp into the ESI register,
then take the four-byte value at address [esi+4] - this will be the address of the first
parameter of the command line, i.e. the line specifying the name of the file to be read
and copied. To store the address, we will use the EDI register, and then make a call to
open. We will have to use two parameters - the actual address of the file name and the
mode of its use, which will be 0 (O_RDONLY) in this case. The result of the system
call must be checked. Recall that the kernel macro is designed so that the EAX
value equal to -1 indicates an error, and any other value indicates the successful
execution of the call; when applied to the open call, the result of successful
execution is the descriptor of a new I/O stream, in this case it is the input stream
associated with the copied file. In case of success, we save the obtained descriptor in
the fdsrc variable; in case of failure, we generate an error message and exit.
Now let's write the main loop. In it, we will read from the first file, analyze the result,
and if the end of the file is reached (value 0 in EAX) or an error occurs (value -1),
we will exit the loop, and if the reading is successful, we will write all the read
(that is, as many bytes from the buffer memory area as read has read; this
number is contained in EAX) to the second file. Since read cannot return a number
larger than its third parameter (4096 in our case), we can combine the error and
end-of-file situations using the condition EAX <= 0.
We have exited the loop by moving to the .end_of_file label; sooner or later our
program, having reached the end of the first file, will move to this label, after which we
will only have to close both files by calling close and terminate the program:
.end_of_file:
        kernel 6, [fdsrc]
        kernel 6, [fddest]
        kernel 1, 0
Note that we have made all labels in the main program, except for the _start
label, local (their names start with a dot). It is not necessary to do so, but this
approach to labels (all labels that are not supposed to be accessed from somewhere far
away should be made local) allows us to avoid problems with name conflicts in larger
programs.
The full text of our example can be found in the file copy.asm.
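The same copying logic can be sketched in C; the helper name copy_file is invented here, and for brevity the sketch does not check the result of each write, which a real program would also do:

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy src to dst: read a chunk into the buffer, stop on end-of-file
 * (read returns 0) or error (-1), otherwise write out exactly as many
 * bytes as were read.  Returns 0 on success, -1 on failure. */
int copy_file(const char *src, const char *dst)
{
    char buffer[4096];
    int fdsrc = open(src, O_RDONLY);
    if (fdsrc == -1)
        return -1;
    int fddest = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fddest == -1) {
        close(fdsrc);
        return -1;
    }
    ssize_t n;
    while ((n = read(fdsrc, buffer, sizeof(buffer))) > 0)
        write(fddest, buffer, n);   /* write exactly n bytes back out */
    close(fdsrc);
    close(fddest);
    return n == 0 ? 0 : -1;         /* 0 means a clean end-of-file */
}
```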
extern myproc
Such a line tells the assembler literally the following: "the myproc label exists even
though it is not in the current module; if you encounter this label, just generate the
appropriate object code, and the link editor will substitute the specific address for the
label".
3.7.2. Example
As a multi-module example, we will write a simple program that asks the user for
his name and then greets him by name. This time we will organize string handling the
way it is usually done in C programs: we will use the null byte as a sign of the end of
the string. We have already encountered this representation of strings when we studied
command line parameters (§3.6.8) and even wrote the strlen subroutine that
calculates the length of a string; we will need it this time as well.
The head program will depend on two main subroutines, putstr and getstr,
each of which will be placed in a separate module. The putstr subroutine will need
to calculate the length of the string in order to print the entire string in one call to the
operating system; for this calculation we will use the familiar strlen, which we
will also put into a separate module. Another module will contain
subroutine that organizes the _exit call; we will call it quit. All modules will be
named the same as the subroutines they contain: putstr.asm, getstr.asm,
strlen.asm and quit.asm.
To organize system calls, we use the kernel macro, which we described on
page 663. We will also put it in a separate file, but this file cannot be a full-fledged
§ 3.7. Separate translation 676
module. Indeed, a module is a unit of translation, while a macro, in general, cannot be
translated into anything: as we noted earlier, macros completely disappear during
translation and there is nothing left of them in the object code. This is understandable,
because macros are a set of instructions not for the processor, but for the assembler
itself, and in order for a macro to be of any use, the assembler must, of course, see the
macro definition wherever it encounters a reference to the macro. That's why we will
connect the file containing our kernel macro to other files with the %include
directive at the stage of macroprocessing (unlike modules, which are assembled into a
single whole with the help of the link editor much later - after the translation is
completed). We will call this file kernel.inc; we may well start with it by opening
it for editing and typing in the macro definition given on page 663; nothing else
needs to be typed in this file.
Next we will write the strlen.asm file. It will look like this:
;; asmgreet/strlen.asm ;;
global strlen
section .text
; procedure strlen
; [ebp+8] == address of the string
strlen: push ebp
        mov ebp, esp
        xor eax, eax
        mov ecx, [ebp+8]        ; arg1
.lp:    cmp byte [eax+ecx], 0
        jz .quit
        inc eax
        jmp short .lp
.quit:  pop ebp
        ret
The first line of the file indicates that the strlen label will be defined in this module
and that this label should be made visible from other modules. It is better to place
global and extern directives at the very beginning of the module text for clarity.
We will not comment the code of the procedure in detail, as we are already familiar
with it.
With the strlen procedure at our disposal, let's write the putstr.asm
module. The putstr procedure will call strlen to calculate the length of the
string and then call the write system call; the new procedure will differ from the
print_str procedure we wrote in the example that prints command line
arguments by using the kernel macro.
;; asmgreet/putstr.asm ;;
%include "kernel.inc"           ; we need the kernel macro
global putstr                   ; the module describes putstr
extern strlen                   ; and itself uses strlen
section .text
; procedure putstr
; [ebp+8] == address of the string
putstr: push ebp                ; normal start of
        mov ebp, esp            ; a subroutine
        push dword [ebp+8]      ; call strlen to
        call strlen             ; calculate the string length
        add esp, 4              ; the result is now in EAX
        kernel 4, 1, [ebp+8], eax   ; call write
        mov esp, ebp            ; normal termination of
        pop ebp                 ; a subroutine
        ret

Now it is the turn of the most complex module, getstr. The getstr procedure
will receive as input the address of the buffer in which the read string should be placed,
as well as the length of this buffer to prevent it from overflowing if the user thinks of
typing a string that will not fit in the buffer. To simplify the implementation, we will
read the string one character at a time. Of course, real programs don't do this, because
a system call is quite expensive in terms of program execution time, and it's a bit
wasteful to spend it on a single character; but our goal now is not to get an efficient
program, so we can make our lives a bit easier.
The getstr subroutine will use the EDX register to store the address of the
current position in the buffer and the ECX register to store the total number of
characters read; at the beginning of the loop, ECX will be incremented by one
and its new value will be compared with the value of the second argument of our
procedure (i.e. the buffer size). This will allow us, in case of a threat of buffer overflow,
to terminate the execution of the procedure by writing a terminating zero at the end of
the buffer: there is still room for it, because in this case we write it instead of reading
the next character. The EDX register will also be incremented by one, but at the end
of the loop, after the next character has been read and checked for being the end-of-line
character. When the end of the line is detected, we will transfer control outside the loop
without incrementing EDX, so that the terminating zero is written into the buffer over
the end-of-line character. There is also a third case in which the character reading loop
terminates: an "end of file" situation occurs on standard input; in this case no character
is read into the next buffer cell, and a zero is written into that cell instead.
Since our procedure will only use the ECX, EDX and AL registers, the CDECL
convention will be followed without any extra effort. The kernel macro is also
written in accordance with CDECL and may corrupt the values of the EAX, ECX and
EDX registers; we do not store anything long-term in EAX, we only use its low byte
(AL) for short-term storage of the read character's code, to compare it with the line
feed code; but ECX and EDX will have to be saved on the stack before calling kernel
and restored afterwards. The complete getstr.asm module will look like this:
;; asmgreet/getstr.asm ;;
%include "kernel.inc"           ; we need the kernel macro
global getstr                   ; getstr is exported
section .text
getstr:                         ; arg1 -- buffer address, arg2 -- length
        push ebp                ; standard start of
        mov ebp, esp            ; a procedure
        xor ecx, ecx            ; ECX -- read count
        mov edx, [ebp+8]        ; EDX -- current address in the buffer
.again: inc ecx                 ; increase the counter immediately
        cmp ecx, [ebp+12]       ; and compare with the buffer size
        jae .quit               ; if there's no room -- get out
        push ecx                ; save registers ECX
        push edx                ; and EDX
        kernel 3, 0, edx, 1     ; read 1 character into the buffer
        pop edx                 ; restore EDX
        pop ecx                 ; and ECX
        cmp eax, 1              ; did the system call return 1?
        jne .quit               ; if not, we're out
        mov al, [edx]           ; the code of the read character
        cmp al, 10              ; is it the line feed code?
        je .quit                ; if so, we're out
        inc edx                 ; increment the current address
        jmp .again              ; continue the loop
.quit:  mov [edx], byte 0       ; enter the terminating zero
        mov esp, ebp            ; standard termination of
        pop ebp                 ; a procedure
        ret
Now let's write the simplest of our modules, quit.asm:

;; asmgreet/quit.asm ;;
%include "kernel.inc"
global quit
section .text
quit:   kernel 1, 0
All the subroutines are ready; let's start writing the head module, which we will call
greet.asm. Since all calls to system calls are placed in subroutines, we won't need
the kernel macro in the head module (and, therefore, the inclusion of the
kernel.inc file). We will describe the text of messages generated by the program
as usual in the form of initialized strings in the .data section; we should only
remember that in this program all strings must have a zero byte limiting them. We will
place the buffer for reading the string in the .bss section. The .text section
will consist of nothing but subroutine calls.
;; asmgreet/greet.asm ;;
global _start                   ; this is the head module
extern putstr                   ; it uses the subroutines
extern getstr                   ; putstr, getstr
extern quit                     ; and quit

section .data                   ; describe the messages
nmq     db 'Hi, what is your name?', 10, 0
pmy     db 'Pleased to meet you, dear ', 0
exc     db '!', 10, 0

section .bss                    ; allocate memory for the buffer
buf     resb 512
buflen  equ $-buf

section .text                   ; the beginning of the head program
_start: push dword nmq          ; call putstr for nmq
        call putstr
        add esp, 4
        push dword buflen       ; call getstr with
        push dword buf          ; parameters buf and buflen
        call getstr
        add esp, 8
        push dword pmy          ; call putstr for pmy
        call putstr
        add esp, 4
        push dword buf          ; call putstr for the string
        call putstr             ; entered by the user
        add esp, 4
        push dword exc          ; call putstr for exc
        call putstr
        add esp, 4
        call quit               ; call quit
So, our working directory now contains the files kernel.inc, strlen.asm,
putstr.asm, getstr.asm, quit.asm and greet.asm. To get a working
program, we need to call NASM for each of the modules separately (remember that
kernel.inc is not a module):
nasm -f elf -dOS_LINUX strlen.asm
nasm -f elf -dOS_LINUX putstr.asm
nasm -f elf -dOS_LINUX getstr.asm
nasm -f elf -dOS_LINUX quit.asm
nasm -f elf -dOS_LINUX greet.asm

Note that the -dOS_LINUX flag is needed only for those modules that use
kernel.inc, so we could have omitted it when translating strlen.asm and
greet.asm. However, practice shows that it is easier to always specify such flags
than to remember which modules need them and which do not.
The result of NASM's work will be five files with the suffix ".o" representing the
object modules of our program. To combine them into an executable file, we will call
the ld link editor (on 64-bit systems do not forget to add the -m elf_i386 flag):

ld -o greet greet.o strlen.o putstr.o getstr.o quit.o

The result this time will be an executable file called greet, which we will run as
usual with the ./greet command:
avst@host:~/work$ ./greet
Hi, what is your name?
Andrey Stolyarov
Pleased to meet you, dear Andrey Stolyarov!
avst@host:~/work$
It is clear that when translating the source code, the assembler, seeing the reference
to an external label, cannot replace this label with a specific address, because it does
not know it - after all, the label is defined in another module, which the assembler does
not see. All the assembler can do is to leave a free space for such an address in the final
code and write information into the object file, which will allow the link editor to
arrange all the missing addresses when their values are already known. On closer
examination it turns out that the assembler cannot replace labels with specific addresses
not only in case of references to external labels, but never at all. The point is that, since
the program consists of several (as many as you like) modules, the assembler, when
translating one of them, cannot predict which module will be the last one in the final
program, what size all the preceding modules will be and, therefore, cannot know in
which memory area (even virtual memory) the code that the assembler is generating
now will be located.
Obviously, the link editor does not see the source code of the modules, and cannot
see it, since it is intended to link modules produced by different compilers from source
code in, quite possibly, different programming languages. Consequently, all the
information that is required for the final transformation of object code into executable
machine code must be written to the object file. The object code, which is obtained as
a result of assembly, is a kind of "semi-finished product" of machine code, in which,
instead of absolute (numerical) addresses, there is information about how to calculate
these addresses and where they should be placed in the code.
Note that you can find out information about the symbols contained in an object
file using the nm program. As an exercise, try to apply this program to the object files
of modules you have written (or modules from the above examples) and interpret the
results.
3.7.4. Libraries
Most often, programs are not written "from scratch", as we have done in most
examples, but use sets of ready-made subroutines in the form of libraries. Naturally,
such subroutines are included in modules, and it is more convenient to have the modules
themselves in precompiled form, so as not to waste time on compilation; of course, it
is useful to have the source code of these modules available, but libraries are used more
often in precompiled form. Generally speaking, there are different kinds of program
libraries; for example, there are macro libraries, which of course cannot be precompiled
and exist only as source code. Here, however, we will consider a narrower concept,
namely what is meant by the term "library" at the link editor level.
From a technical point of view, a subroutine library is a file that combines some
number of object modules and, as a rule, contains tables for accelerated search of
symbol names in these modules.
Note one important property of object files: each of them can be included in the
final program only in its entirety or not included at all. This means, for example, that
if you have combined several subroutines in one module, and someone needs only one
of them, the executable file will still contain the code of the whole module (i.e. all
subroutines). It is worth keeping this in mind when dividing a library into modules; for
example, system libraries supplied with operating systems, compilers, etc. are usually
organized according to the "one function, one module" principle.
Libraries are compiled from individual object modules using specially designed
programs. In Unix, the corresponding program is called ar. Its original purpose was
not limited to creating libraries (the very name ar means "archiver"), so when you
call the program, you must specify with a command line parameter what you want it to
do. For example, if we wanted to combine all modules of the greet program into a
library (except, of course, the main module, which cannot be used in other programs),
we could do it with a command like this:

ar rcs libgreet.a getstr.o putstr.o strlen.o quit.o
The choice of file name for a library should be noted separately. The .a suffix (from
the word archive) is considered standard for static library files in Unix, but there is
more to it than that. There is a rather unobvious convention that you should add not
only a suffix to the library name (which is clear and familiar), but also a prefix - these
three letters lib. So in this case the library name is just greet, while the name of
the file containing the library is libgreet.a; this file is what ar will produce.
After that, you can link the greet program with the link editor by specifying the
name of the library file:
ld greet.o libgreet.a
But you can do something else: specify the name of the library (in our case just
greet) with the -l flag, and the directory where a library with this name should
be sought (in our case the current one) with the -L flag:
ld greet.o -l greet -L .
This approach is convenient for libraries installed in the system, because the linker
knows the system directories itself and the -L flag is not needed.
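The naming convention just described is easy to model. Below is a small Python sketch (the helper name find_static_library is made up for this illustration) of how a linker might turn "-l greet" plus a list of -L directories into the file name libgreet.a; the real ld, of course, also knows the system directories and, unless told otherwise, prefers shared library variants.

```python
import os
import tempfile

def find_static_library(name, search_dirs):
    # Mimic '-l NAME': prepend 'lib', append '.a', then try each
    # '-L' directory in order and return the first existing file.
    filename = "lib" + name + ".a"
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None   # the link would fail: the library cannot be found

# Usage: create an empty libgreet.a in a scratch directory and resolve it.
with tempfile.TemporaryDirectory() as scratch:
    open(os.path.join(scratch, "libgreet.a"), "w").close()
    print(find_static_library("greet", [scratch]))   # path ending in libgreet.a
    print(find_static_library("absent", [scratch]))  # None
```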
Unlike a monolithic object file, a library, though packed into a single file, remains
a collection of object modules, from which the link editor selects only those it needs to
satisfy unresolved references. This will be discussed in more detail in the next paragraph.
279) We do not consider here the case of so-called shared libraries, whose files have
the .so suffix; the concept of dynamic loading requires additional discussion that is
beyond the scope of our book.
Unresolved references are symbols that are imported by some module (for NASM
these are symbols declared with the extern directive and then used) but have not
yet been encountered in any of the modules as exported.
The link editor starts by initializing the list of known symbols as empty and the
list of unresolved references as containing only the entry point label (by default, the
_start label), and proceeds step by step from left to right through the list of objects
specified on its command line. If the next specified object is an object file, the
link editor "accepts" it into the executable file being generated. All symbols exported by
this module are entered into the list of known symbols; if some of them were present in
the list of unresolved references, they are removed from it. Symbols imported by the
module are added to the list of unresolved references, unless they already appear in the
list of known symbols. The object code from the module is accepted by the link editor
for further conversion into executable code and insertion into the executable file.
If the next object in the list specified on the command line is a library, the link
editor's actions are more complex and flexible, since there may be no need to accept
all the modules that make up the library. First of all, the link editor checks the list
of unresolved references; if the list is empty, the library is ignored entirely as
unnecessary. Usually, however, the list is not empty in this situation (otherwise the
programmer would not have specified the library), and the next action of the link editor
is to try to find modules in the library that export one or more symbols with names
appearing on the current list of unresolved references; if such a module is found, the link
editor "accepts" it, modifies the symbol lists accordingly, and starts looking through the
library again, and so on, until none of the remaining unaccepted modules in the library
fits. Then the link editor stops examining the library and moves on to the next
object in the list. As a result, only those modules are taken from the library that are
needed to satisfy the symbol imports of the preceding modules, plus possibly
those modules needed by already accepted modules from the same library. Thus, when
building the greet program from the previous paragraph, the link editor first took
the getstr, putstr, and quit modules from the libgreet.a library, because
they contained symbols imported by the previously accepted greet.o module;
then it took the strlen module as well, because the putstr module needed it.
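The procedure described above is straightforward to model. The following Python sketch reproduces the resolution pass on the greet example; the data representation is invented for the illustration (a module is a tuple of name, exported symbols, and imported symbols; a library is simply a list of modules), and a real link editor may accept library modules in a slightly different order.

```python
def link(objects, entry="_start"):
    # One left-to-right pass over the command-line objects, maintaining
    # the list of known symbols and the list of unresolved references.
    known = set()
    unresolved = {entry}
    accepted = []

    def accept(module):
        name, exports, imports = module
        accepted.append(name)
        known.update(exports)
        unresolved.difference_update(exports)
        unresolved.update(imports - known)

    for obj in objects:
        if isinstance(obj, list):            # a library
            progress = True
            while progress:                  # rescan until nothing new is taken
                progress = False
                for module in obj:
                    name, exports, _ = module
                    if name not in accepted and exports & unresolved:
                        accept(module)
                        progress = True
        else:                                # a plain object file: always taken
            accept(obj)
    return accepted, unresolved

greet_o = ("greet.o", {"_start"}, {"getstr", "putstr", "quit"})
libgreet = [("getstr", {"getstr"}, set()),
            ("putstr", {"putstr"}, {"strlen"}),
            ("strlen", {"strlen"}, set()),
            ("quit",   {"quit"},   set())]

accepted, unresolved = link([greet_o, libgreet])
print(accepted)      # strlen is taken because putstr needs it
print(unresolved)    # empty set: the build succeeds
```

If the library were listed before greet.o, the pass would skip it (at that moment the only unresolved symbol is _start, which no library module exports), and the link would end with undefined references; hence the rule that object modules come before the libraries they need.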
The link editor generates error messages and refuses to continue building the
executable in two main cases. The first occurs when the list of objects (modules
and libraries) is exhausted but the list of unresolved references is not empty, i.e., at least
one of the accepted modules refers, as an external reference, to a symbol that never
appeared in any of the modules; such an error situation is called an undefined reference.
The second case is the appearance, in the next accepted module, of an exported symbol
that at this point is already in the list of known symbols; in other words, two
or more accepted modules export the same symbol. This is called a name conflict.

280) Modern link editors, in order to please careless programmers, allow some cases of name
conflict not to be considered an error; this is used, for example, by C++ compilers. Try to avoid
relying on such features as much as possible.
Interestingly, the link editor never goes backwards in its progress through the
object list: if a module in a library was not accepted when the editor got to that
library, it will not be accepted later, even if an imported symbol that this module could
have satisfied appears in one of the subsequent modules. An important consequence of
this fact is that object modules should be specified before the libraries those modules
need. A second important consequence is that libraries should never depend on each
other mutually: if one library uses the features of a second library, the second must not
use the features of the first. If such cross-dependencies do occur, the two libraries should
be merged into a single library, although it is better to first consider whether some of the
dependencies can be eliminated, even at the cost of duplicating functionality.
One more important remark. As long as libraries do not depend on each other at
all, we don't have to worry too much about the order of parameters for the linkage
editor: it is enough to first specify all the object files that make up our program in any
order, and then, again in any order, list all the necessary libraries. If dependencies
between libraries appear, the order of their specification becomes important, and if it is
not observed, the program will not be built. As you can see, dependencies between
libraries, even when they are not mutual, pose certain problems; it must be said that the
order of arguments for the link editor is by no means the most serious of these problems,
although it is the most obvious. Therefore, before relying on the capabilities of another
library, you should think carefully and repeatedly; it is better not to allow such
dependencies at all, i.e., you should try to design libraries so that they never use the
capabilities of other libraries. If this does happen, you should consider merging such
libraries into one - but this is also not always the right thing to do, since the features of
any library must be logically unified in some way.
Knowing how the link editor works will be useful not only (and not so much) in
assembly language programming, but also in practical work with high-level programming
languages, especially C and C++. If you do not take into account the contents of this
paragraph, you risk, on the one hand, overloading your executables with unnecessary
(unused) content and, on the other hand, designing your libraries in such a way that you
eventually get confused by them.
To conclude the discussion of the link editor, we should note that it is quite a
complex program, although you may never need most of its functionality. In any case, ld
recognizes several dozen different command-line options and can even process special
linker scripts that control the linking. We will not look at the scripts, just as we will not
look at most of the options; we will focus on just a few that, first, you may actually need
and, second, give an idea of what can be controlled and how.
We have already seen several flags: -l allows you to link a library by its short
name (without specifying the full file name and path); -L adds new directories to the
beginning of the directory list in which the linker will look for library files requested
with -l. The -o flag specifies the name of the resulting file (usually an executable). The
-m flag specifies the architecture (the "platform", if you like) for which the build is
performed; for example, we have repeatedly mentioned that when building 32-bit
programs on 64-bit systems you should specify the -m elf_i386 switch.
The -nostdlib flag removes the "system" directories from the search list,
leaving only those you specify with -L; the -e option allows you to set an entry point
label other than _start (for example, if you include gogogo in the
-e linker command line, the label gogogo will be used instead of _start). The
-u option takes a symbol name as a parameter; the linker enters this symbol into its
list of unresolved references, which, in particular, forces it to pull the corresponding
module out of a library even if nothing else refers to that symbol.
In the future, when programming in C and C++, you may find useful the -static
flag, which prohibits the use of dynamic (shared) versions of libraries and makes the
resulting executable file statically linked, independent of anything external, and the -s
flag (from the word strip), which removes all "unnecessary" information (mainly debugging
information) from the resulting file. When programming in assembly language we did not
use any dynamic libraries, and we did not put debugging information into object files
either, although that is possible if you specify the -g flag to nasm, which is similar in
meaning to the same flag of Free Pascal (see §2.13.4, page 502).
(see Table 3.5). The mantissa usually satisfies the condition 1 <= t < 2, which makes
it possible not to store its integer part, implying that it is equal to 1; in the
extended-precision format, however, the integer part of the mantissa is stored
explicitly (one bit is allocated for it). The arithmetic value of a floating-point
number is defined as

    (-1)^s * 2^(p-b) * t,

where s is the sign bit, p is the value of the order (as an unsigned integer), b is the
order offset for the given format, and t is the mantissa.
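For the single-precision format (b = 127, 23 stored mantissa bits) the formula can be checked directly in Python; the function name decode_single is made up here, and only normalized numbers are handled in this sketch.

```python
import struct

def decode_single(bits):
    # Evaluate (-1)^s * 2^(p-b) * t for an IEEE-754 single (b = 127).
    s = bits >> 31                 # sign bit
    p = (bits >> 23) & 0xFF        # the order, as an unsigned integer
    fraction = bits & 0x7FFFFF     # stored fractional bits of the mantissa
    assert 0 < p < 255             # i.e. not zero and not a special form
    t = 1 + fraction / 2**23       # the implied integer part: 1 <= t < 2
    return (-1)**s * 2**(p - 127) * t

# Compare with the machine representation of -6.25 (= -1.5625 * 2^2):
bits = struct.unpack("<I", struct.pack("<f", -6.25))[0]
print(decode_single(bits))         # -6.25
```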
In all formats, an order consisting of only zeros or only ones is treated as a sign
of a special form of number. Of all these forms, only ordinary zero, whose
representation consists of zeros alone (in the sign, in the order, and in the mantissa),
turns out to be an ordinary number. Zero was included among the "special cases" for one
simple reason: it obviously cannot be represented by a number with a mantissa between
1 and 2. All other "special cases" indicate that something has gone wrong; the only
question is how serious the "wrong" is.
In particular, a number whose sign bit is one and whose remaining bits, both in the
mantissa and in the order, are zero means "minus zero". The occurrence of "minus
zero" in calculations indicates that in its place there should in fact be a negative number
so small in absolute value that it cannot be represented with even one significant bit in
the mantissa. The same situation can occur with "plus zero" (a positive number too
small in absolute value), but an ordinary zero may still really be a zero and not the result
of a rounding error, while "minus zero" is always the result of such an error. How
"serious" it is depends on the problem being solved. In most applied calculations the
difference between zero and, for example, the number 2^-1000, or even 2^-30, is of no
importance; such small values can usually simply be neglected: if calculations show
that a car is moving at a speed of 10^-10 km/h, then in any reasonable sense of the word
we should consider that this car is standing still.
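In Python, whose float is an IEEE-754 double-precision number, minus zero is easy to produce and observe; a small sketch:

```python
import math
import struct

# The two zeros compare equal, yet differ in exactly one bit, the sign:
plus_bits  = struct.unpack("<Q", struct.pack("<d",  0.0))[0]
minus_bits = struct.unpack("<Q", struct.pack("<d", -0.0))[0]
print(hex(plus_bits))    # 0x0
print(hex(minus_bits))   # 0x8000000000000000
print(-0.0 == 0.0)       # True: arithmetic comparison cannot tell them apart

# The sign nevertheless survives and can be inspected:
print(math.copysign(1.0, -0.0))   # -1.0

# Minus zero appears exactly as described in the text: a negative value
# too small in absolute value to keep even one significant mantissa bit:
print(-1e-300 * 1e-300)  # -0.0
```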
A non-zero mantissa with a zero order means a denormalized number. In this case it
is assumed that the order, with the offset taken into account, equals its minimum
allowed value (-126, -1022, -16382; the value of the order field without the offset is
taken to be one), and the integer part of the mantissa is zero.

281) The corresponding English terms are single precision, double precision and extended precision.

§ 3.8. Floating-point arithmetic 696

The appearance of denormalized numbers in the IEEE-754 standard has caused, and
continues to cause, serious criticism: it dramatically complicates the implementation
both of processors and of software without giving any practical gain. Be that as it may,
the coprocessor can (i.e., is physically capable of) carrying out calculations with
denormalized numbers; but if you do find denormalization in your calculations, it is
more correct to change the units of measurement used: for example, to calculate a
capacitor's capacitance in picofarads instead of farads, or to represent the diameter of
a gear in millimeters instead of, say, light years.
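The boundary is easy to probe in Python, whose float is an IEEE-754 double (minimum order -1022):

```python
import sys

smallest_normal = sys.float_info.min     # 2**-1022, about 2.2e-308
smallest_denormal = 2.0 ** -1074         # the very last denormalized value
print(smallest_normal)
print(smallest_denormal)                 # 5e-324
print(smallest_denormal / 2)             # 0.0: there is nothing below it

# Denormals lose mantissa bits: push a normal number deep into the
# denormal range and back, and it does not survive intact.
x = sys.float_info.min * 1.5             # a perfectly normal number
print((x / 2**52) * 2**52 == x)          # False: bits were lost down there
print((1.5 / 2**52) * 2**52 == 1.5)      # True: in the normal range nothing is lost
```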
However, even if you measure microbes in units more suitable for galaxies, you will not
get into the denormalization region; for that you need something more serious. Judge for
yourself. In astronomy the unit of distance called the parsec is quite popular; it is a little
over three light years. Distances to the most distant objects that astronomers still somehow
manage to deal with are measured in gigaparsecs; it is probably safe to say that the
gigaparsec is the largest unit of length used in practice. A gigaparsec is 3.0857*10^26
meters. Let us now pass to the other end of the scale; we will leave microbes alone and
consider viruses at once. The size of a common influenza virion is somewhere around
100 nm, i.e. 10^-7 meters. Knowing this, we find that the influenza virion is about
3.25*10^-33 gigaparsecs in diameter. The machine (i.e., binary) order needed to represent
such a number is -112, meaning that even for single-precision numbers we would still have
14 powers of two left before hitting the denormalization region: enough to break the
unfortunate virion into individual atoms. To tell the truth, if we wanted to measure an
electron in the same gigaparsecs, we would indeed run out of orders of magnitude in a
single-precision number, but nobody prevents us from taking a double-precision number,
especially since everyone usually works in them anyway; and there, having at our disposal
orders down to 2^-1022, we can easily express the Planck length (within the framework of
modern physical concepts, the smallest possible length, i.e., nothing can be shorter), taking
as our unit, say, the diameter of the observable part of the Universe; it will turn out to be
"only" about 10^-62, i.e., the machine order will be minus two hundred and something,
which is far from a thousand. All this helps to evaluate the mental abilities (and, most
importantly, the degree of irresponsibility) of the authors of IEEE-754 who, having already
described a tool for measuring quantities much smaller than can ever be meaningful (which
is normal, everything should be done with a reserve, and the bit width of the machine order
hardly affects the complexity of implementation), nevertheless invented denormalized
numbers, sharply complicating for their sake both the implementation of processors and the
rules for working with them.
An order consisting of only ones is used to denote various kinds of "non-numbers":
plus infinity and minus infinity (the mantissa consists of zeros; the sign bit indicates
whether the infinity is positive or negative), as well as indeterminacy, the "quiet
non-number", the "signaling non-number", and even the "unsupported number" (there are
ones in the mantissa; the kind of non-number depends on their specific location). In
ordinary calculations nothing of this kind can occur; for example, "infinities" arise when
division by zero is performed but the processor is configured not to initiate an
exceptional situation (an internal interrupt) and not to give control to the operating system.
In ordinary calculations the processor always initiates an exception on division by zero
and in other similar circumstances. The use of calculations that do not stop when
operations are performed on obviously incorrect source data is a separate, rather nontrivial
subject that we will not consider.
Since the coprocessor can handle real numbers stored in memory in any of the three
formats listed above, the assembler has to support designations for eight-byte and ten-
byte memory areas. To indicate such operand sizes in instructions, the NASM
assembler provides the keywords qword (from quad word, "quadruple word") and
tword (from ten-byte word). There are also corresponding pseudo-instructions for
describing data (dq specifies an eight-byte value, dt a ten-byte one), as
well as for reserving uninitialized memory (resq reserves a specified number
of eight-byte elements, rest a specified number of ten-byte ones).
In the dd and dq pseudo-instructions, floating-point numbers can be used along
with ordinary integer constants, while dt accepts only floating-point numbers as
initializers. NASM distinguishes floating-point numbers from integers by the presence
of a decimal point; "scientific notation" is allowed, so that, for example, 1000.0 can be
written as 1.0e3, 1.0e+3, or 1.0E3, and 0.001 as 1.0e-3, and so on.
NASM also supports writing floating-point constants in hexadecimal notation, but this
feature is rarely used.
Note that the coprocessor itself, if not interfered with, performs all operations on
extended-precision numbers, and uses numbers of the other formats only when loading
and storing them.
is the top of the stack and is designated ST0, the next one ST1, and so on; R0 is
considered to follow R7 (for example, if R7 is currently designated ST4, then the
role of ST5 is played by register R0, ST6 sits in R1, etc.). Fig. 3.10 shows a
situation in which register R3 has been declared the top of the stack; the role of the
top can be played by any of the Rn registers. When a new value is pushed onto this
stack, all the values already stored there stay in place; only the number of the register
playing the role of the top changes: if a new value is pushed onto the stack shown in
the figure, the role of the top, ST0, passes to register R2, register R3 becomes
ST1, and so on. When a value is removed from the stack, the opposite happens. Note
that these registers can be accessed only by their current number in the stack, that is,
by the names ST0, ST1, ..., ST7. You cannot address them by their permanent
numbers (R0, R1, ..., R7); the processor does not provide that possibility.
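This circular renaming is easy to model. The following Python sketch (the class name FPUStack is invented here) keeps eight physical registers and a TOP number, exactly as described above:

```python
class FPUStack:
    """Toy model of the coprocessor register stack: eight physical
    registers R0..R7, addressed only relative to TOP as ST0..ST7."""
    def __init__(self):
        self.regs = [None] * 8   # physical registers; None means "empty"
        self.top = 0             # number of the register acting as ST0

    def push(self, value):
        self.top = (self.top - 1) % 8    # TOP moves, values stay in place
        self.regs[self.top] = value

    def pop(self):
        value = self.regs[self.top]
        self.regs[self.top] = None       # mark the register empty again
        self.top = (self.top + 1) % 8
        return value

    def st(self, n):
        # The name STn refers to the physical register (TOP + n) mod 8.
        return self.regs[(self.top + n) % 8]

fpu = FPUStack()
fpu.push(1.5)                  # goes into R7, which now plays ST0
fpu.push(2.5)                  # goes into R6; the old top is now ST1
print(fpu.st(0), fpu.st(1))    # 2.5 1.5
print(fpu.pop())               # 2.5
print(fpu.st(0))               # 1.5 again
```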
The designations ST0, ST1, ..., ST7 follow NASM conventions. Other
assemblers use other designations; in particular, MASM and some other assemblers
denote the arithmetic coprocessor registers using parentheses: ST(0), ST(1), ...,
ST(7), and these are the designations most often found in the literature. Do not be
surprised by this.
The service registers CR, SR and TW are used to control the course of
calculations; their structure is shown in Fig. 3.11. These registers consist of separate
flag bits and also contain several two-bit fields and one three-bit field. For completeness
we will now describe all the bits of these registers; if something remains unclear, do
not worry, you can always return to this paragraph later.
The SR (state register) contains a number of flags describing, as the name
implies, the state of the arithmetic coprocessor. In particular, bits 13, 12 and 11 (three
bits in total) contain a number from 0 to 7, called TOP, which indicates which of the
Rn registers is currently considered to be the top of the stack. Flags C0 (bit 8), C2
(bit 10) and C3 (bit 14) correspond in meaning to the CPU flags CF, PF and ZF;
there is also a flag C1, but it is rarely used.
Bit layout of the service registers (Fig. 3.11; bit numbers 15..0):
CR: bit 12 IC; bits 11-10 RC; bits 9-8 PC; bit 7 IEM; bits 5-0 PM, UM, OM, ZM, DM, IM.
SR: bit 15 B; bit 14 C3; bits 13-11 TOP; bit 10 C2; bit 9 C1; bit 8 C0; bit 7 IR; bit 6 SF; bits 5-0 PE, UE, OE, ZE, DE, IE.
The lower six bits of the SR register indicate exceptional situations: loss of precision
(PE), too large or too small a result of the last operation (OE and UE, overflow and
underflow), division by zero (ZE), denormalization (DE), and invalid operation (IE).
They are joined by the SF bit, which indicates stack overflow or stack underflow. All
these bits will be discussed in detail in §3.8.7.
The IR (interrupt request) flag indicates the occurrence of an unmasked
exceptional situation resulting in the initiation of an internal interrupt; this flag can
only be seen set inside an interrupt handler within the operating system, so it does not
concern us. Finally, bit B (busy) means that the coprocessor is currently busy with
asynchronous execution of an instruction. It should be said that in modern processors
this bit, too, cannot be seen set except in an interrupt handler.
The control register CR also consists of individual flags, but, unlike the status
register, these flags are usually set by the program and are intended to control the
coprocessor, that is, to set its mode of operation. The lower six bits of this register
correspond to the same exceptional situations as the lower six bits of SR, but they are
intended not to signal the occurrence of these situations but to mask them: if the
corresponding bit contains a one, the exceptional situation does not lead to an internal
interrupt, but only to the setting of the corresponding bit in the SR register. Bits 11
and 10 (RC, rounding control) set the mode of rounding the result of an operation:
00 to the nearest number, 01 downward, 10 upward, 11 toward zero (that is, so as
to reduce the absolute value).
Bits IC (12) and IEM (7) of the CR register are not used in modern processors.
Bits 9 and 8 (PC, precision control) set the precision of the operations performed:
00 for 32-bit numbers, 10 for 64-bit numbers, 11 for 80-bit numbers (the default
mode, which rarely needs changing).
The tag register TW contains two bits to indicate the state of each of the registers
R0-R7: 00 means the register contains a number, 01 that it contains zero, 10 that it
contains a special kind of value (a non-number, an infinity, or a denormalized
number), and 11 that the register is empty. Initially all eight registers are marked
as empty; as numbers are pushed onto the stack, the corresponding registers are marked
as filled, and when numbers are removed from the stack, as empty again. This makes it
possible to track stack overflow and stack underflow: situations in which a ninth number
is pushed onto the stack (and there is nowhere to put it) or, on the contrary, an attempt
is made to extract a number from an empty stack.
The service registers FIP and FDP store the address of the last machine instruction
executed by the coprocessor and the address of its operand; they are used by the
operating system when analyzing the causes of an error (exceptional) situation.
The mnemonics of all machine instructions related to the arithmetic coprocessor
begin with the letter f, from the word floating (as in "floating point"). Most of these
instructions have no operands or have one operand, but there are also instructions with
two operands. An operand can be a coprocessor register, denoted STn, or an operand
of the "memory" type. The coprocessor does not support immediate operands, i.e.,
floating-point numbers written directly in the instruction.
For example, the command

fld qword [matrix + 8*ecx]

loads onto the stack the element whose index is stored in the ECX register, from the
array matrix of eight-byte numbers. At the same time the value of the TOP field
in the SR register is decremented, so that the top of the stack moves; the old top is
now named ST1, and so on.
To retrieve the result from the coprocessor (from the top of the register stack) you
can use the fst and fstp instructions, which take one operand. Most often it is
an operand of the "memory" type, but a register from the stack may also be specified,
for example ST6; it is only important that this register be empty.
The main difference between the two instructions is that fst simply reads the
number at the top of the stack (i.e., in the ST0 register), while fstp pops the
number off the stack, marking ST0 as free and incrementing the value of TOP.
In fact, the letter "p" in the name fstp stands for the word pop.
For some reason the fst instruction cannot work with 80-bit operands of the
"memory" type, while fstp has no such limitation. One more thing to note:
the instruction

fstp st0

first writes the contents of ST0 into itself and then pushes ST0 off the stack, so
that the effect of this instruction is to destroy the value at the top of the stack. This is
usually done when the number at the top of the stack is not needed for further calculations.
It is often necessary to convert an integer into floating-point format and
vice versa. The fild instruction takes an integer from memory and pushes it
onto the coprocessor stack (in floating-point format, of course). The instruction has one
operand, necessarily of the "memory" type, of word, dword or qword size
(the latter here meaning an eight-byte integer). The fist and fistp instructions
perform the opposite action: they take the number located in ST0, round it to an integer
in accordance with the current rounding mode, and write the result into memory at the
address specified by the operand. As with the fst and fstp instructions, the
fist instruction does not modify the stack in any way, while the fistp
instruction removes the number from the stack. The operand of the fistp instruction
can be of word, dword or qword size, while fist can work only
with word and dword.
The fxch command swaps the contents of the stack top (ST0) and any other
STn register that is specified as its operand. The registers must not be empty. Most
often fxch is used to swap ST0 and ST1, in which case the operand can be
omitted.
The coprocessor supports a number of instructions for loading frequently used
constants onto the stack: fld1 (loads 1.0), fldz (loads +0.0), fldpi (loads π),
fldl2e (loads log₂e), fldl2t (loads log₂10), fldln2 (loads ln 2), fldlg2
(loads lg 2). All these instructions have no operands; as a result of the execution
of each of them, the value of TOP is decremented and the corresponding value
appears in the new register ST0. The current rounding mode determines in which
direction the loaded approximate value will differ from the exact mathematical value.
operands are written first, then the operation sign; the operands may themselves be
arbitrarily complex expressions, also written in POLIZ. For example, the expression
(x + y) * (1 - z) would be written in POLIZ as: x y + 1 z - *. Let x, y and z
be described as memory areas (variables) of size qword containing floating-point
numbers. Then, to compute our expression, we can simply translate the POLIZ record
into assembly language, each element of the POLIZ record becoming a single instruction:

fld qword [x]   ; x
fld qword [y]   ; y
faddp           ; +
fld1            ; 1
fld qword [z]   ; z
fsubp           ; -
fmulp           ; *
The result of the calculation will be in ST0. However, using other forms of the
arithmetic instructions can shorten the program text considerably; as is easy to see,
the following fragment does exactly the same thing:

fld qword [x]
fadd qword [y]
fld1
fsub qword [z]
fmulp
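The scheme "one POLIZ element, one instruction" works because the coprocessor stack behaves exactly like the stack of a POLIZ evaluator. Here is a Python sketch of such an evaluator (the function name eval_poliz is made up for the illustration):

```python
def eval_poliz(tokens, variables):
    # Operands are pushed; an operation pops two values (the topmost
    # plays the role of ST0, the next of ST1) and pushes the result.
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b,
           "/": lambda a, b: a / b}
    stack = []
    for token in tokens:
        if token in ops:
            b = stack.pop()              # like ST0
            a = stack.pop()              # like ST1
            stack.append(ops[token](a, b))
        else:                            # a variable name or a numeric constant
            stack.append(variables[token] if token in variables
                         else float(token))
    return stack[0]                      # the result stays on top, as in ST0

# (x + y) * (1 - z) with x = 2, y = 3, z = 0.5:
print(eval_poliz("x y + 1 z - *".split(),
                 {"x": 2.0, "y": 3.0, "z": 0.5}))   # 2.5
```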
The single-operand instructions fiadd, fisub, fisubr, fimul, fidiv and
fidivr are sometimes useful; they perform the corresponding arithmetic operation on
ST0 and their operand, which must be of the "memory" type, of word or dword size,
and contain an integer.

144) It is useful to know that in English this would be RPN, from the words reverse
polish notation; the
Since our processor can only compute binary logarithms, the role of b will naturally
be played by two; to work with an arbitrary base a other than 2, we need to compute
y = logₐ2 in advance and use it as the second operand of the fyl2x instruction, saving
ourselves both the extra multiplication and the repeated computation of the coefficient
(the number y).
Having dealt with fyl2x and being in a good mood, we come across the next
instruction, fyl2xp1, and all our good mood vanishes in a flash. This instruction works
the same way as the previous one, only it computes y * log₂(x + 1). In addition, the
documentation says that the absolute value of x must not exceed 1 - √2/2, otherwise
the result is undefined.
To understand why the creators of the coprocessor needed such a monster, recall
that logₐ1 is zero for any base; hence, if the argument of a logarithm is close to one,
the logarithm itself is close to zero. Now imagine that your argument of the logarithm
is very close to one, differing from one by, say, ε = 2^-100. This value itself is perfectly
§ 3.8. Floating-point arithmetic 707

the result back to ST0. The argument must not be greater than 1 in absolute value, otherwise the result is undefined. Here newcomers who are not adept in the intricacies of computational math are usually stumped by the question of why the subtraction of one is necessary. You can guess it by carefully reading the description of fyl2xp1 (try it before you read further!), but if you can't guess it, remember that a^0 = 1 for any positive a; well, for values of x close to zero, a^x (including, of course, 2^x) is close to
one. If, when calculating the power function in the neighborhood of zero, we try to represent the result as an ordinary floating-point number (of any precision), the machine exponent will be zero all the time because of the result's proximity to one, so the loss of precision (when the result of the calculations ceases to differ from one at all) will come very quickly, because the mantissa is not that long (even a number of the highest precision has "only" 64 bits of it, so 1 + 2^(-65) cannot be distinguished from one), whereas if the result of the calculation is not the value 2^x itself but its difference from one, the use of the machine exponent will allow us, even when working with single-precision numbers, not only not to lose significance, but even not to enter the area of denormalization, while extended-precision numbers with their 15-bit exponent will give us the opportunity to feel fine when approaching one as closely as, say, 2^(-16000).
Recall, just in case, that for non-positive bases the power function is undefined (only integer exponents make sense for a negative base).
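To illustrate, here is a minimal sketch (the variable x is hypothetical) that restores 2^x from the f2xm1 result; the subtraction of one is undone only at the very end, after the dangerous neighborhood of one has been left behind:

```nasm
section .data
x       dq  0.0000001           ; some value with |x| <= 1

section .text
        fld     qword [x]       ; ST0 = x
        f2xm1                   ; ST0 = 2^x - 1, computed without losing precision
        fld1                    ; push 1.0
        faddp                   ; ST0 = 2^x
```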
fstsw ax
sahf
The first of them copies SR to the AX register, and the second loads some (not all!) flags from AH into FLAGS. In particular, after executing these two commands, the value of flag C3 is copied to ZF and the value of C0 to CF, which fully meets our needs: now we can use for the conditional jump any of the commands provided for unsigned integers: ja, jb, jae, jbe, jna, etc. (see Table 3.3 on page 575). Let us emphasize once again: the use of these commands is justified only by the fact that after executing fstsw and sahf the result of the comparison is in the flags CF and ZF; generally speaking, there is nothing else in common between floating-point numbers and unsigned integers.
Suppose, for example, we have variables a, b and m of size qword containing floating-point numbers, and we want to put the smaller of a and b into m. This can be done as follows:
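The original listing does not survive in this copy of the text; a sketch consistent with the fstsw/sahf technique just described could look like this:

```nasm
        fld     qword [a]       ; ST0 = a
        fld     qword [b]       ; ST0 = b, ST1 = a
        fcom    st1             ; compare ST0 (b) with ST1 (a)
        fstsw   ax              ; copy SR to AX
        sahf                    ; C3 -> ZF, C0 -> CF
        jb      .keep           ; b < a: the smaller one is already in ST0
        fxch    st1             ; otherwise bring a to the top
.keep:  fstp    qword [m]       ; store the smaller value into m
        fstp    st0             ; drop the remaining number from the stack
```

Note the final fstp st0: it stores the stack top into itself and pops, which is the usual way to discard a value so that no "busy" register is left behind.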
Overflow (#O) occurs when the result of an operation is too large to be represented as a floating-point number of the required size. This can happen, for example, when transferring a number from the internal ten-byte representation to a four- or eight-byte representation with the fst command, if the number does not fit into the new representation.
Underflow (#U) means that the result of the next operation is so small in absolute value that it cannot be represented as a floating-point number of the size specified in the instruction (including when executing the fst instruction with a "memory" operand of qword or dword size). The difference between #U and #D (denormalization) is that #U refers to the result of a calculation or of a conversion to another format, not to an originally denormalized operand.
Loss of Precision (#P) occurs when the result of an operation cannot be represented
accurately by the means available; in most cases, this is perfectly normal.
As discussed in §3.8.2, in each of the CR and SR registers, the lower six bits
correspond to exceptional situations in the order in which they are listed: bit #0
corresponds to an invalid operation, bit #1 corresponds to denormalization, etc.; bit #5
corresponds to loss of precision. In addition, in the SR register, bit #6 is set if the
invalid operation that caused bit #0 to be set is due to a stack error. The CR register
bits control what the processor should do when an exceptional situation occurs. If the
corresponding bit is reset, an internal interrupt will be initiated when an exception
occurs (see §3.6.3). If the bit is set, the exception is considered masked and the
processor will not initiate any interrupts when the exception occurs; instead, it will try
to synthesize a relevant result as much as possible (e.g., when dividing by zero, the
result will be an "infinity" of the appropriate sign; when accuracy is lost, the result will
be rounded to a number representable in the format used, according to the set rounding
mode, etc.).
When any exceptional situation occurs, the coprocessor sets the corresponding bit
(flag) in the SR register to one. If the situation is not masked, this bit is useful to the
operating system in the interrupt handler to understand what happened; if the situation
is masked and no interrupt occurs, the set flags can be used in the program to track
down the exceptions that occurred. Note that these flags do not reset themselves, they
can only be reset explicitly, and this is done with the fclex command. The
commands for interacting with the CR and SR registers will be discussed in detail in
§3.8.9.
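As a sketch of this flag-tracking approach (labels and variables hypothetical; the #Z exception is assumed masked, as it is after initialization): the zero-divide flag occupies bit #2 of SR, so after a suspicious division it can be tested directly:

```nasm
        fld     qword [x]
        fdiv    qword [y]       ; raises the #Z flag in SR if y is 0.0
        fstsw   ax              ; copy SR to AX
        test    ax, 100b       ; bit #2 is the zero-divide flag
        jnz     had_zero_divide ; the flag was raised; handle it
        ; ... normal path ...
```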
3.8.8. Exceptions and the wait command
There is one non-obvious peculiarity associated with the handling of exceptional
situations: the instruction, the execution of which led to an exception, only raises a flag
in the SR register, but does not initiate an internal interrupt, even if the corresponding
flag in CR is not set. The coprocessor remains in this state until execution of the
next instruction begins. The problem here may arise if the instruction that caused the
exception uses an operand located in memory (for example, an integer), and between
this instruction and the following instruction of the arithmetic coprocessor there is an
instruction executed by the main processor that changes the value of the operand placed
in memory. In this case, by the time an internal interrupt is initiated, the value that
caused the interrupt will have been lost. Consider, for example, the sequence of instructions

fimul dword [k]
mov [k], ecx
fsqrt

If fimul raises an exception here, the interrupt will only be initiated when fsqrt begins executing, and by that time mov will already have destroyed the value of k that caused the exception.
The processor supports a special command fwait (or simply wait, these are two
designations for the same machine instruction), which checks the coprocessor status
register for unmasked exceptions and initiates an interrupt if necessary. This command
is worth using if the last f-command may have caused an exception and you do not
intend to perform any more operations with floating-point numbers.
It is interesting that some mnemonics of coprocessor commands actually
correspond to two machine commands: first comes the wait command, then the
command that performs the desired action. An example of such mnemonics is the
already familiar fstsw: it is actually two commands - wait and fnstsw; if
necessary, you can use fnstsw separately, without waiting, but to do so you need to
understand exactly what you are doing. The fclex command from the previous
paragraph is organized the same way: this designation corresponds to the machine
commands wait and fnclex. The fnstsw and fnclex commands are
examples of arithmetic coprocessor commands that do not check for unhandled
exceptions before doing their main work.
The contents of the SR register can be obtained with the familiar fstsw instruction, whose operand can be either the AX register (and nothing else) or an operand of the "memory" type of word size. There is also an instruction fnstsw, and fstsw is a
designation for two machine instructions wait and fnstsw. Note that the reverse
operation (loading a value) for SR is not provided, which is quite logical: this register
is needed to analyze what is happening. Nevertheless, some commands affect this
register directly. For example, the TOP value can be increased by one with the
fincstp command and decreased by one with the fdecstp command (both
commands have no operands). These commands should be used with caution, because
they do not change the "busy" status of the stack registers; as a result, fdecstp causes ST0 to become an "empty" register, while fincstp makes ST7 "busy" (because it is the former ST0). Another active action on the SR register that can be performed by the programmer is clearing the exception flags. Such
clearing is performed by the commands fclex (clear exceptions) and fnclex,
which we have already mentioned in the previous paragraph.
It is recommended to always execute the fclex command before the fldcw command: otherwise it may happen that writing the CR register "unmasks" one of the exceptions whose flag is already raised, causing an interrupt.
The TW register cannot be directly read or written, but there is one instruction
that can directly affect it. It is called ffree, has one operand - the STn register, and
its action is to mark a given register as "free" (or "empty"). In particular, the following
commands remove a number from the stack top "to nowhere":
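The commands themselves appear to be missing from this copy; the pair matching the description consists of freeing the top register and then stepping TOP past it:

```nasm
        ffree   st0             ; mark the register holding ST0 as "empty"
        fincstp                 ; TOP += 1: the former ST1 becomes ST0
```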
If you do not know (or have doubts about) the state of the arithmetic coprocessor when you start calculations, but you know for sure that its registers do not contain any information useful to you, you can bring it back to the initial state using the finit or fninit command (finit is a notation for wait + fninit, see §3.8.8). The CR register is set to 037Fh (rounding to nearest, highest available precision, all exceptions masked); the SR register is zeroed, which means TOP = 0 and all flags, including the exception flags, are cleared; the FIP and FDP registers are also zeroed, and the TW register is filled with ones, which corresponds to an empty stack; the registers that make up the stack are not changed in any way, but since TW is filled with ones, they are all considered free (containing no numbers).
With the fsave command you can save the entire state of the coprocessor,
i.e. the contents of all its registers, in the memory area to restore it later. This is useful
if you need to temporarily stop some computational process, perform some auxiliary
calculations, and then return to the pending one. To save, you need a memory area 108
bytes long; the fsave command has one operand, it is an operand of type "memory",
and you don't need to specify its size. The fsave mnemonic actually stands for two
machine commands, wait and fnsave. After saving the state in memory, the
coprocessor is "reset" in the same way as with the finit command (see above), so
there is no need to give the finit command separately after fsave. To restore
the previously saved state of the coprocessor you can use the frstor command; like fsave, this command has one operand of the "memory" type, for which the size does not need to be specified either.
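A sketch of this save/restore cycle (the label fpu_state is hypothetical):

```nasm
section .bss
fpu_state   resb 108            ; 108 bytes for the full coprocessor state

section .text
        ; ... the main computation is in progress ...
        fsave   [fpu_state]     ; save everything; the coprocessor is reset as by finit
        ; ... auxiliary calculations on a clean coprocessor ...
        frstor  [fpu_state]     ; bring the saved state back
        ; ... the pending computation continues ...
```

Because fsave resets the coprocessor, the auxiliary calculations start with an empty stack and default control settings, exactly as after finit.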
Sometimes it is necessary to save or restore only the auxiliary registers of the
coprocessor. This is done by the commands fstenv, fnstenv and fldenv using a
memory area of 28 bytes; a detailed description of these commands will be omitted.
To conclude the conversation about the coprocessor, let's mention the fnop
command. As you can guess, this is a very important command: it does nothing.
Concluding remarks
Of course, we have not even considered a tenth of the i386 processor's capabilities,
and if we talk about extensions of its capabilities that appeared in later processors (for
example, MMX registers), the share of what we have studied will be even more
modest. However, we can already write programs in assembly language, and this will allow us to gain experience of programming in terms of machine instructions, which, as was said in the preface, is a necessary condition for quality programming in any language at all: you cannot create good programs without understanding what is really going on.
Readers who wish to learn more about the i386 hardware platform can consult
technical documentation and reference books, which are more than sufficiently
available on the Internet. However, I would like to warn those who wish to do so that the i386 processor (partly "thanks" to the heavy legacy of the 8086) has one of the most chaotic and illogical instruction systems in the world; this becomes especially noticeable once we leave the cozy world of restricted mode and the "flat" memory model, which the operating system has carefully arranged for us, and come face to face with segment descriptor generation, ridiculous jumps between protection rings, and other "charms" of the platform that the creators of modern operating systems have to struggle with.
So if you are seriously interested in low-level programming, we can advise you to
study other architectures, for example, SPARC or ARM processors. However, curiosity
is not a vice in any case, and if you are ready for some difficulties - find any reference
book on i386 and study it to your heart's content :-)