
Chapter 1

Introduction to Data Structure and Algorithms

Chapter Objectives

In this chapter, you will learn what data structures and algorithms are and how they work. At the end of this chapter, students will be able to:
 Apply the properties of data structures and algorithms in order to understand how data is accessed and represented in a computer
 Develop a scheme to define a data type, an abstract data type, and a data structure
 Develop self-reliance and enough confidence to understand the process of problem solving, and learn the basic mathematical functions used in analyzing algorithms

Introduction to Data Structure and Algorithms

 A computer is an electronic machine used for data processing and manipulation.
 When a programmer collects data for processing, all of it must be stored in the computer’s main memory.
 In order to make the computer work we need to know

a. How data is represented in the computer.
b. How data is accessed.
c. How to solve a problem step by step.
 Almost every enterprise application uses various types of data structures in one way or another. This topic will give you a good understanding of the data structures needed to grasp the complexity of enterprise-level applications and the need for algorithms and data structures.
What is Data Structure?
 Data structure is a representation of the logical relationship existing between individual elements of data.
 Data structure is a way of organizing all data items that considers not only the elements stored but also their relationship to each other.
 We can also define data structure as a mathematical or logical model of a particular organization of data items.
 The representation of a particular data structure in the main memory of a computer is called the storage structure.
 The storage structure representation in auxiliary memory is called the file structure.
 It is defined as the way of storing and manipulating data in organized form so that it can be used efficiently.
 Data structures are the programmatic way of storing data so that data can be used efficiently.
 A data structure is a systematic way to organize data in order to use it efficiently.
 Data structure mainly specifies the following four things
a. Organization of data
b. Accessing methods
c. Degree of associativity
d. Processing alternatives for information
 Algorithm + Data Structure = Program
 The study of data structures covers the following points
a. Amount of memory required for storage.
b. Amount of time required for processing.
c. Representation of data in memory.
d. Operations performed on that data.



Characteristics of a Data Structure

 Correctness − Data structure implementation should implement its interface correctly.


 Time Complexity − Running time or the execution time of operations of data structure must be
as small as possible.
 Space Complexity − Memory usage of a data structure operation should be as little as possible.

Need for Data Structure

As applications get more complex and data-rich, there are three common problems that applications face nowadays.

 Data Search − Consider an inventory of 1 million (10^6) items in a store. If the application has to search for an item, it must look through all 1 million (10^6) items every time, slowing down the search. As data grows, search becomes slower.
 Processor speed − Processor speed, although very high, falls short if the data grows to billions of records.
 Multiple requests − As thousands of users can search data simultaneously on a web server, even a fast server can fail while searching the data.

To solve the above-mentioned problems, data structures come to the rescue. Data can be organized in a data structure in such a way that not all items need to be searched, and the required data can be found almost instantly, as the sketch below suggests.
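As a rough sketch of this idea (the inventory map and item names below are illustrative assumptions, not part of the text), keeping the items in a hash-based structure lets a lookup skip the scan over all one million entries:

import java.util.HashMap;
import java.util.Map;

public class InventorySearch {
    public static void main(String[] args) {
        // Hypothetical inventory: item id -> item name.
        Map<Integer, String> inventory = new HashMap<>();
        for (int id = 0; id < 1_000_000; id++) {
            inventory.put(id, "item-" + id);
        }

        // A linear search would examine up to 1 million entries;
        // a hash-based structure finds the item in (expected) constant time.
        String found = inventory.get(743_210);
        System.out.println(found); // item-743210
    }
}

With a plain array or list, the same lookup could touch every element; the hash map trades a little extra memory for near-constant lookup time.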

What is Algorithm?

 It is a step-by-step procedure, which defines a set of instructions to be executed in a certain


order to get the desired output.
 Algorithms are generally created independent of underlying languages
 An algorithm can be implemented in more than one programming language

Execution Time Cases

There are three cases which are usually used to compare the execution times of various data structures in a relative manner; a linear-search sketch illustrating them follows this list.

 Worst Case − This is the scenario where a particular data structure operation takes the maximum time it can take. If an operation's worst-case time is ƒ(n), then this operation will not take more than ƒ(n) time, where ƒ(n) is a function of n.
 Average Case − This is the scenario depicting the average execution time of an operation of a data structure. If an operation takes ƒ(n) time on average, then m operations will take mƒ(n) time.
 Best Case − This is the scenario depicting the least possible execution time of an operation of a data structure. If an operation's best-case time is ƒ(n), then the operation will take at least ƒ(n) time.
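A minimal linear-search sketch (the array contents are assumptions chosen only for illustration) makes the three cases concrete:

public class LinearSearch {
    // Returns the index of key in a, or -1 if it is not present.
    static int search(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) {
                return i;          // found after i + 1 comparisons
            }
        }
        return -1;                 // worst case: n comparisons
    }

    public static void main(String[] args) {
        int[] a = {7, 3, 9, 1, 5};
        System.out.println(search(a, 7));  // best case: key at index 0, 1 comparison
        System.out.println(search(a, 5));  // key at the end: n comparisons
        System.out.println(search(a, 4));  // worst case: not present, n comparisons
    }
}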

From the data structure point of view, the following are some important categories of algorithms −

 Search − Algorithm to search an item in a data structure.


 Sort − Algorithm to sort items in a certain order.
 Insert − Algorithm to insert item in a data structure.
 Update − Algorithm to update an existing item in a data structure.
 Delete − Algorithm to delete an existing item from a data structure.

Characteristics of an Algorithm

Not all procedures can be called an algorithm. An algorithm should have the following characteristics −



 Unambiguous − Algorithm should be clear and unambiguous. Each of its steps (or phases), and
their inputs/outputs should be clear and must lead to only one meaning.
 Input − An algorithm should have 0 or more well-defined inputs.
 Output − An algorithm should have 1 or more well-defined outputs, and should match the
desired output.
 Finiteness − Algorithms must terminate after a finite number of steps.
 Feasibility − Should be feasible with the available resources.
 Independent − An algorithm should have step-by-step directions, which should be independent
of any programming code.

How to Write an Algorithm?

There are no well-defined standards for writing algorithms. Rather, it is problem and resource
dependent. Algorithms are never written to support a particular programming code.

All programming languages share basic code constructs like loops (do, for, while), flow control (if-else), etc. These common constructs can be used to write an algorithm.

We usually write algorithms in a step-by-step manner, but this is not always the case. Algorithm writing is a process and is carried out after the problem domain is well defined. That is, we should know the problem domain for which we are designing a solution.

Example

Let's try to learn algorithm-writing by using an example.

Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP

Algorithms tell the programmers how to code the program. Alternatively, the algorithm can be written
as −
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP

In the design and analysis of algorithms, the second method is usually used to describe an algorithm. It makes it easy for the analyst to analyze the algorithm while ignoring all unwanted definitions, and to observe what operations are being used and how the process flows.

Writing step numbers is optional. A translation of the ADD algorithm into code is sketched below.
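As a minimal sketch (the method name add and the sample values 10 and 20 are assumptions, not part of the pseudocode), the second form of the algorithm maps almost line for line onto code:

public class Add {
    // Step 1 − START ADD
    static int add(int a, int b) {
        int c = a + b;                     // Step 3 − c ← a + b
        return c;
    }

    public static void main(String[] args) {
        int a = 10, b = 20;                // Step 2 − get values of a & b
        System.out.println(add(a, b));     // Step 4 − display c (prints 30)
    }                                      // Step 5 − STOP
}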

We design an algorithm to get a solution to a given problem. A problem can be solved in more than one way.




Hence, many solution algorithms can be derived for a given problem. The next step is to analyze those proposed solution algorithms and implement the most suitable one.

Algorithm Analysis

Efficiency of an algorithm can be analyzed at two different stages, before implementation and after
implementation. They are the following −

 A Priori Analysis − This is a theoretical analysis of an algorithm. Efficiency of an algorithm is measured by assuming that all other factors, for example, processor speed, are constant and have no effect on the implementation.
 A Posteriori Analysis − This is an empirical analysis of an algorithm. The selected algorithm is implemented in a programming language and then executed on a target machine. In this analysis, actual statistics such as running time and space required are collected.

We shall learn about a priori algorithm analysis. Algorithm analysis deals with the execution or running
time of various operations involved. The running time of an operation can be defined as the number of
computer instructions executed per operation.
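A rough sketch of a posteriori analysis (the input size and the choice of Arrays.sort as the measured operation are assumptions made for illustration) is simply to time the operation on the target machine:

import java.util.Arrays;
import java.util.Random;

public class EmpiricalTiming {
    public static void main(String[] args) {
        int[] data = new Random(42).ints(1_000_000).toArray();

        long start = System.nanoTime();   // start the clock
        Arrays.sort(data);                // the operation being measured
        long elapsed = System.nanoTime() - start;

        System.out.println("sort took " + elapsed / 1_000_000 + " ms");
    }
}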

Algorithm Complexity

Suppose X is an algorithm and n is the size of the input data. The time and space used by algorithm X are the two main factors that decide the efficiency of X.

 Time Factor − Time is measured by counting the number of key operations such as comparisons
in the sorting algorithm.
 Space Factor − Space is measured by counting the maximum memory space required by the
algorithm.

The complexity of an algorithm f(n) gives the running time and/or the storage space required by the
algorithm in terms of n as the size of input data.

Space Complexity

Space complexity of an algorithm represents the amount of memory space required by the algorithm in
its life cycle. The space required by an algorithm is equal to the sum of the following two components −

 A fixed part, which is the space required to store certain data and variables that are independent of the size of the problem. For example, simple variables and constants used, program size, etc.
 A variable part, which is the space required by variables whose size depends on the size of the problem. For example, dynamic memory allocation, recursion stack space, etc.



Space complexity S(P) of any algorithm P is S(P) = C + Sp(I), where C is the fixed part and Sp(I) is the variable part of the algorithm, which depends on instance characteristic I. Following is a simple example that tries to explain the concept −
Algorithm: SUM(A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop

Here we have three variables (A, B, and C) and one constant (10). Hence S(P) = 1 + 3. The actual space further depends on the data types of the given variables and constants, and is multiplied accordingly.
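As a hedged illustration of the two components (the method names are assumptions), an iterative sum uses only a fixed number of variables, while a recursive version adds a variable part of one stack frame per element:

public class SpaceDemo {
    // Fixed space: a few scalar variables, independent of n.
    static int iterativeSum(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Variable space: the recursion stack grows with n.
    static int recursiveSum(int[] a, int i) {
        if (i == a.length) return 0;
        return a[i] + recursiveSum(a, i + 1);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        System.out.println(iterativeSum(a));    // 15
        System.out.println(recursiveSum(a, 0)); // 15
    }
}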

Time Complexity

Time complexity of an algorithm represents the amount of time required by the algorithm to run to
completion. Time requirements can be defined as a numerical function T(n), where T(n) can be
measured as the number of steps, provided each step consumes constant time.

For example, addition of two n-bit integers takes n steps. Consequently, the total computational time is
T(n) = c ∗ n, where c is the time taken for the addition of two bits. Here, we observe that T(n) grows
linearly as the input size increases.
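A small sketch of that count (representing each n-bit integer as a boolean array, least significant bit first, is an assumption made only for illustration): the loop body runs exactly n times, so T(n) = c ∗ n.

public class BitAddition {
    // Adds two n-bit numbers given as boolean arrays, least significant bit first.
    static boolean[] add(boolean[] x, boolean[] y) {
        int n = x.length;                    // assume x.length == y.length == n
        boolean[] sum = new boolean[n + 1];
        boolean carry = false;
        for (int i = 0; i < n; i++) {        // executed exactly n times
            sum[i] = x[i] ^ y[i] ^ carry;
            carry = (x[i] && y[i]) || (carry && (x[i] ^ y[i]));
        }
        sum[n] = carry;
        return sum;
    }

    public static void main(String[] args) {
        // 3 (011) + 5 (101), least significant bit first
        boolean[] x = {true, true, false};
        boolean[] y = {true, false, true};
        boolean[] s = add(x, y);
        for (boolean bit : s) System.out.print(bit ? 1 : 0);
        System.out.println();                // prints 0001, i.e. 8 with LSB first
    }
}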

Asymptotic analysis of an algorithm refers to defining the mathematical bounds/framing of its run-time performance. Using asymptotic analysis, we can conclude the best-case, average-case, and worst-case scenarios of an algorithm.

Asymptotic analysis is input bound, i.e., if there is no input to the algorithm, it is concluded to work in constant time. Other than the "input", all other factors are considered constant.

Asymptotic analysis refers to computing the running time of any operation in mathematical units of computation. For example, the running time of one operation may be computed as f(n) and that of another operation as g(n²). This means the running time of the first operation will increase linearly with the increase in n, while the running time of the second operation will increase quadratically as n increases. The running times of the two operations will be nearly the same if n is small.

Usually, the time required by an algorithm falls under three types −

 Best Case − Minimum time required for program execution.


 Average Case − Average time required for program execution.
 Worst Case − Maximum time required for program execution.

Big Oh Notation, Ο
The notation Ο(n) is the formal way to express the upper bound of an algorithm's running time. It
measures the worst case time complexity or the longest amount of time an algorithm can possibly take
to complete.

Following is a list of some common asymptotic notations using big Oh notation −

constant − Ο(1)
logarithmic − Ο(log n)
linear − Ο(n)
n log n − Ο(n log n)
quadratic − Ο(n²)
cubic − Ο(n³)
polynomial − n^Ο(1)
exponential − 2^Ο(n)
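As a hedged sketch (the array size and the counted operations are illustrative), the first few orders of growth correspond to familiar pieces of code:

public class GrowthRates {
    public static void main(String[] args) {
        int n = 1_000;
        int[] a = new int[n];

        // Ο(1): constant time — a single array access.
        int first = a[0];

        // Ο(n): linear time — one pass over the array.
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[i];
        }

        // Ο(n²): quadratic time — a nested pass, n * n iterations.
        long pairs = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                pairs++;
            }
        }

        System.out.println(first + " " + sum + " " + pairs); // 0 0 1000000
    }
}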



Data Structures are normally classified into two broad categories
a. Primitive Data Structure
b. Non-primitive data Structure

STRUCTURED DATA TYPES - collections of data items which are non-atomic. These data types are “derived” from the simple data types.

Data type - a particular kind of data item, as defined by the values it can take, the programming language used, or the operations that can be performed on it.

Primitive Data Structure


 Primitive data structures are basic structures and are directly operated upon by machine
instructions.
 Primitive data structures have different representations on different computers.
 Integers, floats, characters, and pointers are examples of primitive data structures.

These data types are available in most programming languages as built-in types; a brief sketch follows the list below.
a. Integer: A data type for whole numbers, i.e. values without a fractional part.
b. Float: A data type used for storing fractional numbers.
c. Character: A data type used for character values.
d. Pointer: A variable that holds the memory address of another variable is called a pointer.
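A brief sketch of these types (Java is used here only for illustration; since Java exposes object references rather than raw pointers, the pointer case is approximated with an array reference):

public class PrimitiveExamples {
    public static void main(String[] args) {
        int count = 42;              // integer: whole numbers only
        float price = 19.95f;        // float: fractional numbers
        char grade = 'A';            // character: a single character value

        int[] data = {1, 2, 3};
        int[] alias = data;          // reference: the closest analogue to a pointer in Java
        alias[0] = 99;
        System.out.println(count + " " + price + " " + grade);
        System.out.println(data[0]); // 99 — both names refer to the same array
    }
}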

Primitive Data Types

• Representation format for each type balances compactness, range, accuracy, ease of
manipulation, and standardization
• For any language there is the need to provide different storage representations for integers and
floating point numbers.
• In the case of integers, a variety of sizes allows for the most efficient use of memory for a given
task. For example, the byte type (8-bit) is often useful for I/O tasks while long types (64-bit) are
needed for representing very large values. In between are the 16-bit short and the 32-bit int
types. The integer types use two's complement signed representations.
• For floating point there are two widths available. The IEEE 754 floating point standard is used for
the 32-bit float and 64-bit double types.

Historically, characters were typically represented with 7-bit ASCII codes, which allowed for 128
characters. A large number of 8-bit encodings exist to provide both standard ASCII and

characters needed for particular languages. However, full internationalization with a single code requires many more characters than are available in a single byte. So a two-byte character type, char, is used to hold the Unicode representation. (Even larger 4-byte encodings exist, but Java's char remains 16 bits; larger code points are represented as pairs of chars.) The char type can also represent 16-bit unsigned integers.

 A Boolean (true/false) type is needed for the many logic operations carried out in almost any
program.

Java comes with 8 primitive data types in all. These include four integer types, two floating-point types, a character type, and a Boolean. Below is a table of the primitive types with their specifications.
Primitive Data Types
Type     Values                   Default  Size     Range
byte     signed integers          0        8 bits   -128 to 127
short    signed integers          0        16 bits  -32768 to 32767
int      signed integers          0        32 bits  -2147483648 to 2147483647
long     signed integers          0        64 bits  -9223372036854775808 to 9223372036854775807
float    IEEE 754 floating point  0.0      32 bits  +/-1.4E-45 to +/-3.4028235E+38, +/-infinity, +/-0, NaN
double   IEEE 754 floating point  0.0      64 bits  +/-4.9E-324 to +/-1.7976931348623157E+308, +/-infinity, +/-0, NaN
char     Unicode character        \u0000   16 bits  \u0000 to \uFFFF
boolean  true, false              false    1 bit (stored in a 32-bit integer)  NA
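These figures can be checked against the constants the Java standard library already provides; a quick sketch:

public class PrimitiveRanges {
    public static void main(String[] args) {
        System.out.println("byte   " + Byte.SIZE + " bits  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short  " + Short.SIZE + " bits  " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int    " + Integer.SIZE + " bits  " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long   " + Long.SIZE + " bits  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        System.out.println("float  " + Float.SIZE + " bits  " + Float.MIN_VALUE + " to " + Float.MAX_VALUE);
        System.out.println("double " + Double.SIZE + " bits  " + Double.MIN_VALUE + " to " + Double.MAX_VALUE);
        System.out.println("char   " + Character.SIZE + " bits");
    }
}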

DATA REPRESENTATION

 How numbers and other data, such as characters, are represented in memory
 Representing data is of great practical importance. Ideally a single memory representation, or
type, could represent all data including numbers, characters and boolean values.

Computer memory and transfer rates, however, are not infinite and designers must strike a
compromise between the widest possible range of values and conserving memory and maximizing
speed.

Also, the data representations must use base 2 as dictated by the underlying binary hardware.

Goals of Computer Data Representation

• Compactness and range


– Describes the number of bits used to represent a numeric value
– A more compact data representation format is less expensive to implement in computer hardware
– Users and programmers prefer a large numeric range
• Accuracy
– Precision of representation increases with number of data bits used
• Ease of manipulation

– Manipulation means executing processor instructions (addition, subtraction, equality comparison) on the representation
– Ease of manipulation translates into machine efficiency
– Processor efficiency depends on the complexity of these operations
• Standardization
– Ensures correct and efficient data transmission
– Flexibility to combine hardware from different vendors with minimal communication
problems

MEMORY REPRESENTATION

1. CHARACTER MEMORY REPRESENTATION

Scheme: based on the assignment of a numeric code to each of the characters in the character set.
Standard Coding Schemes:
1. ASCII (American Standard Code for Information Interchange)
2. EBCDIC (Extended Binary Coded Decimal Interchange Code)

e.g.: 1) char x = ‘A’

A = 41₁₆ = 65₁₀ = 1000001₂
0 1000001

2) signed char Y = -65;

65₁₀ = 1000001₂
1 1000001

3) “HI”
0 1001000 0 1001001

H  I
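A short sketch confirming these codes (Integer.toBinaryString drops the leading sign/zero bit, so only the 7 magnitude bits are printed):

public class CharCodes {
    public static void main(String[] args) {
        char x = 'A';
        System.out.println((int) x);                    // 65
        System.out.println(Integer.toHexString(x));     // 41
        System.out.println(Integer.toBinaryString(x));  // 1000001

        for (char c : "HI".toCharArray()) {
            System.out.println(c + " = " + Integer.toBinaryString(c)); // H = 1001000, I = 1001001
        }
    }
}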
UNICODE
 Unicode provides a unique number for every character, no matter what the platform, no matter
what the program, no matter what the language. The Unicode Standard has been adopted by
such industry leaders as Apple, HP, IBM, JustSystems, Microsoft, Oracle, SAP, Sun, Sybase,
Unisys and many others. Unicode is required by modern standards such as XML, Java,
ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement
ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other
products. The emergence of the Unicode Standard, and the availability of tools supporting it,
are among the most significant recent global software technology trends. In Unicode, a letter
maps to something called a code point.
 Every platonic letter in every alphabet is assigned a magic number by the Unicode consortium
which is written like this: U+0639. This magic number is called a code point. The U+ means
"Unicode" and the numbers are hexadecimal. U+0639 is the Arabic letter Ain. The English

letter A would be U+0041. You can find them all using the charmap utility on Windows
2000/XP or visiting the Unicode web site. Say we have a string:

Hello

which, in Unicode, corresponds to these five code points:

U+0048 U+0065 U+006C U+006C U+006F
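A quick sketch that prints those code points:

public class CodePoints {
    public static void main(String[] args) {
        // Each code point of the string, printed in the U+XXXX notation.
        "Hello".codePoints()
               .forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println(); // U+0048 U+0065 U+006C U+006C U+006F
    }
}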

2. INT (SHORT INTEGERS)

3 Schemes:
#1. Sign magnitude
Left most bit is used as a sign (1 is negative, 0 is positive) and the remaining bits
are used to store the magnitude.
#2. 2’s Complement
Nonnegative integers are represented as in the sign magnitude notation. The
representation of a negative number –n is obtained by first finding the base-two
representation for n, complementing it, and then adding one to the result.
#3. Excess or Biased Notation
The representation of an integer as a string of n bits is formed by adding the bias 2^(n-1) to the integer and representing the result in base two.
Examples:
Scheme #1:
a] int n = 65;
0 0000000 01000001

sb magnitude

b] int n = -65;
1 0000000 01000001

sb magnitude

Scheme #2:
a] int n = 65;
0 0000000 01000001

sb magnitude

b] int n = -65;
0 0000000 01000001 (+65)
1 1111111 10111110 1’s complement
+ 1
1 1111111 10111111 2’s complement

sb

Scheme #3:
n = # of bits used to represent the integer
2^(n-1) = excess/bias
n = 16 bits
excess = 2^(16-1) = 2^15 = 32768

a] int n = 65;

  32768
+    65
  32833 excess notation of +65

binary: 1 0000000 01000001

b] int n = -65;
  32768
+  (-65)
  32703 excess notation of -65

binary: 0 1111111 10111111
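A short sketch checking the two's-complement pattern with the standard library (the excess-32768 helper is an illustrative assumption, not a library call):

public class ShortRepresentation {
    public static void main(String[] args) {
        short n = -65;

        // Two's-complement bit pattern of the 16-bit value.
        System.out.println(Integer.toBinaryString(n & 0xFFFF)); // 1111111110111111

        // Excess-32768 (biased) representation, illustrative helper only.
        int biased = n + 32768;                              // 32703
        System.out.println(Integer.toBinaryString(biased));  // 111111110111111 (leading zero dropped)
    }
}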

3. float
data type      bits = sb + expo + mantissa
float            32 =  1 +  8 + 23
double           64 =  1 + 11 + 52
long double      80 =  1 + 15 + 64 (the 64-bit significand includes an explicit integer bit)

exponent bias (normalized values)

formula: bias = 2^(n-1) – 1, where n is the number of exponent bits

float        127   = 2^(8-1) – 1  = 2^7 – 1  = 128 – 1
double       1023  = 2^(11-1) – 1 = 2^10 – 1 = 1024 – 1
long double  16383 = 2^(15-1) – 1 = 2^14 – 1 = 16384 – 1

float sb biased exponent mantissa

31 30……………………23 22………….0



double sb biased exponent mantissa

63 62……………………52 51………….0

long double sb biased exponent ib mantissa

79 78……………………64 63 62………….0

Procedure:
a. binary form
b. normalized form
c. calculate the biased exponent

Example:
1] float f = 10.25;
a. 1010.01
b. 1.01001 * 2^3 (the leading 1 is dropped; only the fraction 01001 goes into the mantissa)
c. biased expo:

  127
+   3
  130 = 10000010

sb biased expo mantissa
0 10000010 01001000000000000000000
2] float f = 26.375;
a. 11010.011
b. 1.1010011 * 2^4
c.  127
  +   4
    131 = 10000011

sb biased expo mantissa
0 10000011 10100110000000000000000

3] float f = -0.375;
a. 0.011 (magnitude; the sign goes into the sign bit)
b. 1.1 * 2^-2
c.  127
  + (-2)
    125 = 01111101

sb biased expo mantissa
1 01111101 10000000000000000000000
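These encodings can be checked directly; a quick sketch:

public class FloatBits {
    static void show(float f) {
        // Raw IEEE 754 bit pattern: 1 sign bit, 8 exponent bits, 23 mantissa bits.
        int bits = Float.floatToIntBits(f);
        String s = String.format("%32s", Integer.toBinaryString(bits)).replace(' ', '0');
        System.out.println(f + " = " + s.substring(0, 1) + " "
                             + s.substring(1, 9) + " " + s.substring(9));
    }

    public static void main(String[] args) {
        show(10.25f);   // 0 10000010 01001000000000000000000
        show(26.375f);  // 0 10000011 10100110000000000000000
        show(-0.375f);  // 1 01111101 10000000000000000000000
    }
}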

