“Every program depends on algorithms and data structures, but few programs depend
on the invention of brand new ones.”
– Kernighan & Pike
“I will, in fact, claim that the difference between a bad programmer and a good one is
whether he considers his code or his data structures more important. Bad programmers
worry about the code. Good programmers worry about data structures and their
relationships”.
– Linus Torvalds
We study data structures so that we can learn to write more efficient programs. But why
must programs be efficient when new computers are faster every year? The reason is that our
ambitions grow with our capabilities. Instead of rendering efficiency needs obsolete, the
modern revolution in computing power and storage capability merely raises the efficiency
stakes as we attempt more complex tasks.
The quest for program efficiency need not and should not conflict with sound design
and clear coding. Creating efficient programs has little to do with “programming tricks” but
rather is based on good organization of information and good algorithms. A programmer who
has not mastered the basic principles of clear design is not likely to write efficient programs.
Conversely, concerns related to development costs and maintainability should not be used as
an excuse to justify inefficient performance. Generality in design can and should be achieved
without sacrificing performance, but this can only be done if the designer understands how to
measure performance and does so as an integral part of the design and implementation process.
Most computer science curricula recognize that good programming skills begin with a strong
emphasis on fundamental software engineering principles. Then, once a programmer has
learned the principles of clear program design and implementation, the next step is to study the
effects of data organization and algorithms on program efficiency.
In short, Program = Algorithm + Data Structure.
Chapter 1: Definitions
There are many different terms used in computer science. Some of these can have different
meanings among the various textbooks and programming languages.
Data: data is a collection of facts, concepts, or instructions in a formalized manner suitable
for communication or processing. Processed data is called information. A data item is either the
value of a variable or a constant. A data item that has no subordinate data items is called an
elementary item. Data may be organized in many different ways in a computer's memory or
on storage devices. The logical or mathematical model of a particular organization of data is
called a data structure. A data structure can be:
Static or dynamic
Linear or non-linear.
A static data structure is one that has a fixed size: the size can neither increase
nor decrease during the execution of the program. For this reason, the size is predefined.
Examples: arrays, records.
A dynamic data structure, on the other hand, is one that can grow or shrink as needed
to contain the data you want to store. That is, you can allocate new storage when it is needed
and discard that storage when you have finished using it. Dynamic data structures generally
consist of simple data storage structures linked by means of pointers or
references.
Examples: linked lists, graphs, and trees.
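The idea of storage cells linked by references can be illustrated with a minimal singly linked list. This is only a sketch: the names Node, LinkedList, push_front, pop_front and to_list are hypothetical and chosen for illustration, not taken from the text.

```python
class Node:
    """One storage cell: a value plus a reference to the next cell."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    """A dynamic structure: it grows and shrinks one node at a time."""

    def __init__(self):
        self.head = None   # reference to the first node, or None if empty
        self.size = 0

    def push_front(self, value):
        # Allocate new storage only when it is needed.
        self.head = Node(value, self.head)
        self.size += 1

    def pop_front(self):
        # Unlink the first node; its storage can then be reclaimed.
        if self.head is None:
            raise IndexError("pop from empty list")
        node = self.head
        self.head = node.next
        self.size -= 1
        return node.value

    def to_list(self):
        # Follow the references from node to node.
        out = []
        node = self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```

No size is fixed in advance here, which is the contrast with a static structure such as an array.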
Linearity: a data structure is linear if its elements are adjacent to each other in a single
sequence. Each element other than the first and the last has exactly two neighbouring elements
to which it is connected, its previous and next members.
Example: arrays.
A data structure is non-linear if one element can be connected to more than two adjacent
elements.
Example: trees.
Algorithms are used to manipulate the data contained in a data structure, as in searching and
storing. Data structures and algorithms are concerned with the coding phase of the life cycle of a
software project, which includes analysis, design, coding, testing, and maintenance.
Abstraction is a mechanism for separating the properties of an object and restricting
the focus to those relevant in the current context. The user of the abstraction does not have to
understand all of the details in order to utilize the object, but only those relevant to the current
task or problem.
A collection is a group of values with no implied organization or relationship between
the individual values. Sometimes we may restrict the elements to a specific data type such as a
collection of integers or floating-point values.
A container is any data structure or abstract data type that stores and organizes a
collection. The individual values of the collection are known as elements of the container and
a container with no elements is said to be empty. The organization or arrangement of the
elements can vary from one container to the next as can the operations available for accessing
the elements.
A sequence is a container in which the elements are arranged in linear order from front
to back, with each element accessible by position. Throughout the text, we assume that access
to the individual elements based on their position within the linear order is provided using the
subscript operator.
A sorted sequence is one in which the position of the elements is based on a prescribed
relationship between each element and its successor. For example, we can create a sorted
sequence of integers in which the elements are arranged in ascending or increasing order from
smallest to largest value.
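As a small illustration of a sorted sequence, the sketch below keeps a Python list in ascending order while values are inserted. It relies on the standard library function bisect.insort; the helper name insert_sorted and the sample values are assumptions made for this example.

```python
import bisect


def insert_sorted(seq, value):
    """Insert value into seq so that seq stays in ascending order."""
    bisect.insort(seq, value)


values = []
for v in [42, 7, 19, 7]:
    insert_sorted(values, v)

# Every element is now less than or equal to its successor.
print(values)
```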
In computer science, the term list is commonly used to refer to any collection with a
linear ordering. The ordering is such that every element in the collection, except the first one,
has a unique predecessor and every element, except the last one, has a unique successor. By
this definition, a sequence is a list, but a list is not necessarily a sequence since there is no
requirement that a list provide access to the elements by position.
Chapter 2: Fundamental data types
In mathematics it is customary to classify variables according to certain important
characteristics. Clear distinctions are made between real, complex, and logical variables or
between variables representing individual values, or sets of values, or sets of sets, or between
functions, functionals, sets of functions, and so on. This notion of classification is equally if not
more important in data processing. We will adhere to the principle that every constant, variable,
expression, or function is of a certain type. This type essentially characterizes the set of values
to which a constant belongs, or which can be assumed by a variable or expression, or which can
be generated by a function.
The primary characteristics of the concept of data type are as follows:
A data type determines the set of values to which a constant belongs, or which may be
assumed by a variable or an expression, or which may be generated by an operator or a
function.
The type of a value denoted by a constant, variable, or expression may be derived from
its form or its declaration without the necessity of executing the computational process.
Each operator or function expects arguments of a fixed type and yields a result of a fixed
type. If an operator admits arguments of several types (e.g., + is used for addition of
both integers and real numbers), then the type of the result can be determined from
specific language rules.
Variables and data types are introduced in a program in order to be used for computation.
To this end, a set of operators must be available. For each standard data type, a programming
language offers a certain set of primitive, standard operators, and likewise with each structuring
method a distinct operation and notation for selecting a component. The task of composition of
operations is often considered the heart of the art of programming. However, it will become
evident that the appropriate composition of data is equally fundamental and essential.
Whereas the slash denotes ordinary division resulting in a value of type REAL, the operator
DIV denotes integer division resulting in a value of type INTEGER. If we define the quotient
q = m DIV n, and the remainder r = m MOD n, the following relations hold, assuming n > 0:
q * n + r = m and 0 ≤ r < n.
Examples:
 31 DIV 10 =  3      31 MOD 10 = 1
−31 DIV 10 = −4     −31 MOD 10 = 9
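The two relations can be checked mechanically. The sketch below uses Python, whose // and % operators round the quotient toward negative infinity and therefore agree with DIV and MOD as defined here when n > 0. The function name div_mod is hypothetical. Languages that truncate toward zero, such as C, give −3 and −1 for the second example instead.

```python
def div_mod(m, n):
    """Return (q, r) with q * n + r == m and 0 <= r < n, assuming n > 0."""
    q = m // n   # floor division plays the role of DIV
    r = m % n    # modulo plays the role of MOD
    return q, r


for m, n in [(31, 10), (-31, 10)]:
    q, r = div_mod(m, n)
    # The defining relations from the text hold for both examples.
    assert q * n + r == m and 0 <= r < n
    print(f"{m} DIV {n} = {q}, {m} MOD {n} = {r}")
```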
P      Q      P & Q  P OR Q  ¬P
TRUE   TRUE   TRUE   TRUE    FALSE
TRUE   FALSE  FALSE  TRUE    FALSE
FALSE  TRUE   FALSE  TRUE    TRUE
FALSE  FALSE  FALSE  FALSE   TRUE
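The table above can be regenerated from the Boolean operators themselves. This is a minimal Python sketch in which and, or and not stand in for &, OR and ¬; the function name truth_table is an assumption.

```python
from itertools import product


def truth_table():
    """Return rows (P, Q, P & Q, P OR Q, not P) in the order of the table."""
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append((p, q, p and q, p or q, not p))
    return rows


for row in truth_table():
    print(*row)
```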
The definition of such types introduces not only a new type identifier, but at the same time the
set of identifiers denoting the values of the new type. These identifiers may then be used as
constants throughout the program, and they enhance its understandability considerably. If, as
an example, we introduce variables s, d, r, and b
VAR s: sex
VAR d: weekday
VAR r: rank
VAR b: BOOLEAN
then the following assignment statements are possible:
s:= male
d:= Sunday
r:= major
b:= TRUE
Evidently, they are considerably more informative than their counterparts
s:= 1    d:= 7    r:= 6    b:= 2
which are based on the assumption that s, d, r, and b are defined as integers and that the
constants are mapped onto the natural numbers in the order of their enumeration. Furthermore,
a compiler can check against the inconsistent use of operators. For example, given the
declaration of s above, the statement s:= s+1 would be meaningless.
If, however, we recall that enumerations are ordered, then it is sensible to introduce operators
that generate the successor and predecessor of their argument. We therefore postulate the
following standard operators, which assign to their argument its successor and predecessor
respectively: INC(x) DEC(x)
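A rough Python counterpart of an ordered enumeration with successor and predecessor operators is sketched below. The Weekday members and their order are assumed (Monday first, so that Sunday maps to 7 as in the integer example), and succ and pred are hypothetical helpers. Unlike INC(x) and DEC(x), they return the neighbouring value rather than updating their argument, because Python functions cannot rebind the caller's variable.

```python
from enum import Enum


class Weekday(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7


def succ(x):
    """Return the member that follows x in declaration order."""
    members = list(type(x))
    i = members.index(x)
    if i + 1 >= len(members):
        raise ValueError(f"{x.name} has no successor")
    return members[i + 1]


def pred(x):
    """Return the member that precedes x in declaration order."""
    members = list(type(x))
    i = members.index(x)
    if i == 0:
        raise ValueError(f"{x.name} has no predecessor")
    return members[i - 1]


d = Weekday.SATURDAY
d = succ(d)          # d is now Weekday.SUNDAY
print(d.name, d.value)
```

With a plain Enum, an expression such as Weekday.MONDAY + 1 raises a TypeError, which mirrors the point that arithmetic on enumeration values is meaningless.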
As shown in the figure below, the identifiers s_1, s_2, ..., s_n introduced by a record type definition
are the names given to the individual components of variables of that type. As components of
records are called fields, the names are field identifiers. They are used in record selectors applied
to record structured variables. Given a variable x: T, its i-th field is denoted by x.s_i. Selective
updating of x is achieved by using the same selector denotation on the left side in an assignment
statement:
x.s_i := e
where e is a value (expression) of type T_i. Given, for example, the record variables z, d, and p
declared above, the following are selectors of components:
z.im (of type REAL)
d.month (of type INTEGER)
p.name (of type Name)
p.birthdate (of type Date)
p.birthdate.day (of type INTEGER)
The example of the type Person shows that a constituent of a record type may itself be
structured. Thus, selectors may be concatenated. Naturally, different structuring types may also
be used in a nested fashion. For example, the i-th component of an array being a component of
a record variable r is denoted by r.a[i], and the component with the selector name s of the i-th
record structured component of the array a is denoted by a[i].s.
It is a characteristic of the Cartesian product that it contains all combinations of elements
of the constituent types. But it must be noted that in practical applications not all of them may
be meaningful. For instance, the type Date as defined above includes the 31st April as well as
the 29th February 1985, which are both dates that never occurred. Thus, the definition of this
type does not mirror the actual situation entirely correctly; but it is close enough for practical
purposes, and it is the responsibility of the programmer to ensure that meaningless values never
occur during the execution of a program.
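One way for the programmer to take on that responsibility is to check a record before using it. The sketch below models Date as a Python dataclass and delegates the calendar rules to the standard datetime.date constructor. The field names day and month appear in the selectors above; the field year and the method is_valid are assumptions made for this example.

```python
import datetime
from dataclasses import dataclass


@dataclass
class Date:
    day: int
    month: int
    year: int   # assumed field name; not shown in the excerpt above

    def is_valid(self):
        """Return True if the three fields name a real calendar date."""
        try:
            datetime.date(self.year, self.month, self.day)
        except ValueError:
            return False
        return True


# The record type itself accepts every combination of field values.
print(Date(31, 4, 1985).is_valid())   # 31st April
print(Date(29, 2, 1985).is_valid())   # 29th February 1985
print(Date(29, 2, 1984).is_valid())   # 29th February of a leap year
```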
The following short excerpt from a program shows the use of record variables. Its
purpose is to count the number of persons represented by the array variable family that are both
female and single:
VAR count: INTEGER;
    family: ARRAY N OF Person;
count := 0;
FOR i := 0 TO N-1 DO
  IF (family[i].sex = female) & (family[i].marstatus = single) THEN
    INC(count)
  END
END
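For comparison, the same count can be written in Python with records modelled as dataclasses. This is a sketch under assumptions: the field names sex and marstatus follow the excerpt, while the enumeration member MARRIED, the name field as a plain string, and the function name count_single_women are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Sex(Enum):
    MALE = 1
    FEMALE = 2


class MarStatus(Enum):
    SINGLE = 1
    MARRIED = 2   # assumed member, for illustration only


@dataclass
class Person:
    name: str
    sex: Sex
    marstatus: MarStatus


def count_single_women(family):
    """Count the persons that are both female and single."""
    count = 0
    for person in family:
        if person.sex is Sex.FEMALE and person.marstatus is MarStatus.SINGLE:
            count += 1
    return count


family = [
    Person("A", Sex.FEMALE, MarStatus.SINGLE),
    Person("B", Sex.MALE, MarStatus.SINGLE),
    Person("C", Sex.FEMALE, MarStatus.MARRIED),
    Person("D", Sex.FEMALE, MarStatus.SINGLE),
]
print(count_single_women(family))
```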
The record structure and the array structure have the common property that both are
random access structures. The record is more general in the sense that there is no requirement
that all constituent types must be identical. In turn, the array offers greater flexibility by
allowing its component selectors to be computable values (expressions), whereas the selectors
of record components are field identifiers declared in the record type definition.
Chapter 3: Arrays
The most basic structure for storing and accessing a collection of data is the array.
Arrays can be used to solve a wide range of problems in computer science. Most programming
languages provide this structured data type as a primitive and allow for the creation of arrays
with multiple dimensions.
The array is probably the most widely used data structure; in some languages it is even
the only one available. An array consists of components which are all of the same type, called
its base type; it is therefore called a homogeneous structure. The array is a random-access
structure, because all components can be selected at random and are equally quickly accessible.
In order to denote an individual component, the name of the entire structure is augmented by
the index selecting the component. This index is to be an integer between 0 and n-1, where n is
the number of elements, the size, of the array.
TYPE T = ARRAY n OF T0
Examples
TYPE Row = ARRAY 4 OF REAL
TYPE Card = ARRAY 80 OF CHAR
TYPE Name = ARRAY 32 OF CHAR
A particular value of a variable
VAR x: Row
with all components satisfying the equation $x_i = 2^{-i}$, may be visualized as shown in Figure 3.
Figure 3: Array of type Row with $x_i = 2^{-i}$.
VAR a: ARRAY N OF INTEGER
(* sum of all elements *)
sum := 0;
FOR i := 0 TO N-1 DO sum := a[i] + sum END
(* largest element max and its index k *)
k := 0; max := a[0];
FOR i := 1 TO N-1 DO
  IF max < a[i] THEN k := i; max := a[k] END
END.
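The two loops translate almost line for line into Python. The sketch below assumes a non-empty list a; the function name sum_and_max is hypothetical.

```python
def sum_and_max(a):
    """Return (sum of elements, index of the first maximum, maximum)."""
    total = 0
    for i in range(len(a)):
        total = a[i] + total

    k = 0
    largest = a[0]
    for i in range(1, len(a)):
        # The strict comparison keeps the first of several equal maxima.
        if largest < a[i]:
            k = i
            largest = a[k]
    return total, k, largest


print(sum_and_max([3, 9, 2, 9]))
```

Because the index i is a computed value, each element is reached directly, which is the property that makes the array a random-access structure.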
In a further example, assume that a fraction f is represented in its decimal form with k-1 digits,
i.e., by an array d such that
$f = \sum_{i=0}^{k-1} d_i \cdot 10^{-i}$
or
$f = d_0 + 10^{-1} \cdot d_1 + 10^{-2} \cdot d_2 + \ldots + 10^{-(k-1)} \cdot d_{k-1}$
Now assume that we wish to divide f by 2. This is done by repeating the familiar division
operation for all k-1 digits $d_i$, starting with i=1. It consists of dividing each digit by 2 taking
into account a possible carry from the previous position, and of retaining a possible remainder
r for the next position:
r := 10*r + d[i];
d[i] := r DIV 2;
r := r MOD 2
This algorithm is used to compute a table of negative powers of 2. The repetition of halving to
compute $2^{-1}, 2^{-2}, \ldots, 2^{-N}$ is again appropriately expressed by a FOR statement, thus leading to
a nesting of two FOR statements.
PROCEDURE Power(VAR W: Texts.Writer; N: INTEGER);
  (*compute decimal representation of negative powers of 2*)
  VAR i, k, r: INTEGER;
    d: ARRAY N OF INTEGER;
BEGIN
  FOR k := 0 TO N-1 DO
    Texts.Write(W, "."); r := 0;
    FOR i := 0 TO k-1 DO
      r := 10*r + d[i]; d[i] := r DIV 2; r := r MOD 2;
      Texts.Write(W, CHR(d[i] + ORD("0")))
    END ;
    d[k] := 5; Texts.Write(W, "5"); Texts.WriteLn(W)
  END
END Power.
The resulting output text for N = 10 is
.5
.25
.125
.0625
.03125
.015625
.0078125
.00390625
.001953125
.0009765625
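The procedure can be transcribed into Python to reproduce the table. This is a sketch rather than the original program: the function name negative_powers_of_two is an assumption, and the lines are collected in a list instead of being written through Texts.Writer.

```python
def negative_powers_of_two(n):
    """Return the decimal representations of 2^-1, 2^-2, ..., 2^-n as strings."""
    d = [0] * n          # d[i] holds the i-th digit after the decimal point
    lines = []
    for k in range(n):
        r = 0            # remainder carried into the next position
        digits = []
        for i in range(k):
            r = 10 * r + d[i]
            d[i] = r // 2
            r = r % 2
            digits.append(str(d[i]))
        # Halving always leaves a remainder of 1 here, which contributes
        # the final digit 5.
        d[k] = 5
        digits.append("5")
        lines.append("." + "".join(digits))
    return lines


for line in negative_powers_of_two(10):
    print(line)
```

Working on an array of single digits avoids floating-point arithmetic entirely, so every digit printed is exact.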
II. The importance of arrays
You will notice the array structure looks very similar to the list structure. That's because the
two structures are both sequences that are composed of multiple sequential elements that can
be accessed by position. But there are two major differences between the array and the list.
First, an array has a limited number of operations, which commonly include those for array
creation, reading a value from a specific element, and writing a value to a specific element. The
list, on the other hand, provides a large number of operations for working with the contents of
the list. Second, the list can grow and shrink during execution as elements are added or removed
while the size of an array cannot be changed after it has been created.
You may be wondering, if the list structure is a mutable sequence type, why are we
bothering to discuss the array structure, much less plan to implement an abstract data type for
working with arrays? The short answer is that both structures have their uses. There are many
problems that only require the use of a basic array in which the number of elements is known
beforehand and the flexible set of operations available with the list is not needed.
The array is best suited for problems requiring a sequence in which the maximum
number of elements is known up front, whereas the list is the better choice when the size of
the sequence needs to change after it has been created. As you will learn later in the chapter, a
list contains more storage space than is needed to store the items currently in the list. This extra
space, the size of which can be up to twice the necessary capacity, allows for quick and easy
expansion as new items are added to the list. But the extra space is wasteful when using a list
to store a fixed number of elements. For example, suppose we need a sequence structure with
100,000 elements. We could create a list with the given number of elements
using the replication operator:
values = [ None ] * 100000
But underneath, depending on the implementation, this can result in the allocation of space for up to 200,000 elements, half of which
will go to waste. In this case, an array would be a better choice. The decision as to whether an
array or list should be used is not limited to the size of the sequence structure. It also depends
on how it will be used. The list provides a large set of operations for managing the items
contained in the list. Some of these include inserting an item at a specific location, searching
for an item, removing an item by value or location, easily extracting a subset of items, and
sorting the items. The array structure, on the other hand, only provides a limited set of
operations for accessing the individual elements comprising the array. Thus, if the problem at
hand requires these types of operations, the list is the better choice.
length(): Returns the length or number of elements in the array.
getitem(index): Returns the value stored in the array at element position index. The
index argument must be within the valid range. Accessed using the subscript operator.
setitem(index, value): Modifies the contents of the array element at position index to
contain value. The index must be within the valid range. Accessed using the subscript
operator.
clearing(value): Clears the array by setting every element to value.
iterator(): Creates and returns an iterator that can be used to traverse the elements of
the array.
Some computer scientists consider the array a physical structure and not an abstraction since
arrays are implemented at the hardware level. But remember, there are only three basic
operations available with the hardware-implemented array. As part of our Array ADT, we have
provided for these operations but have also included an iterator and operations for obtaining the
size of the array and for setting every element to a given value. In this case, we have provided
a higher level of abstraction than that provided by the underlying hardware-implemented array.
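A minimal sketch of such an Array ADT in Python is given below. It is one possible realization, not necessarily the implementation intended by the text. It assumes that the operations map onto Python's special methods (__len__, __getitem__, __setitem__, __iter__), that clearing(value) is exposed as a method named clear, and that the fixed-size storage comes from the standard ctypes module.

```python
import ctypes


class Array:
    """A fixed-size, one-dimensional array of object references."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("array size must be > 0")
        self._size = size
        # ctypes.py_object * size is an array type with exactly size slots.
        self._elements = (ctypes.py_object * size)()
        self.clear(None)   # give every slot a defined initial value

    def __len__(self):
        # length(): number of elements in the array
        return self._size

    def __getitem__(self, index):
        # getitem(index): read through the subscript operator
        self._check(index)
        return self._elements[index]

    def __setitem__(self, index, value):
        # setitem(index, value): write through the subscript operator
        self._check(index)
        self._elements[index] = value

    def clear(self, value):
        # clearing(value): set every element to value
        for i in range(self._size):
            self._elements[i] = value

    def __iter__(self):
        # iterator(): traverse the elements from first to last
        for i in range(self._size):
            yield self._elements[i]

    def _check(self, index):
        if not 0 <= index < self._size:
            raise IndexError("array index out of range")


a = Array(5)
a[2] = 7
print(len(a), list(a))
```

Unlike a list, this structure offers no way to append or remove elements, so its size stays exactly what was requested at creation.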