
PG Pathshala

Subject: BIOPHYSICS

Paper No. and Title: Paper 14: Biophysics

Module No. and Title: Module 31: Introduction to Data Structure and Algorithms

Principal Investigator:
Prof. M.R. Rajeswari
Professor, Department of Biochemistry
All India Institute of Medical Sciences, New Delhi

Co-Principal Investigator: Prof. T. P. Singh


Distinguished Biotechnology Research Professor
All India Institute of Medical Sciences, New Delhi

Paper Coordinator: Dr. M. R. Rajeswari


Professor, Department of Biochemistry
All India Institute of Medical Sciences
New Delhi, India

Content Writer: Dr. Himanshu Narayan Singh


Professor, Department of Biochemistry
All India Institute of Medical Sciences
New Delhi, India

Content Reviewer: Dr. Naidu Subbarao


Associate Professor
Centre for Computational Biology and Integrative Sciences
Jawaharlal Nehru University, New Delhi, India

Review:
1. A basic introduction to data structures and algorithms is presented, briefly covering primary and secondary data types. Some basic searching and sorting algorithms are presented.
2. Apart from the selection of topics, the presentation of topics is understandable.
3. It would give students a reasonable background in simple data structures and algorithms, which they can extend by further exploration of the introduced ideas.

Biological data is expanding at an exponential rate, and handling such a huge amount of data is a tough task for scientists. To address this problem, it is important to store data in a well-organized manner so that it can be retrieved and used most productively.

To deal with huge data, computer scientists have developed various data structures to store digital information efficiently, and various algorithms to process it. These data structures and algorithms are now widely used to deal with major biological problems such as genome sequencing, structure prediction and drug design.

Learning Outcome

1. What is a data structure
2. What is an algorithm
3. Introduction to Searching and Sorting Algorithms

1. Data Structure
Data structures are the programmatic way of storing data so that it can be used efficiently. Almost every enterprise application uses various types of data structures in one way or another. This module introduces the data structure concepts needed to understand the complexity of enterprise-level applications and the need for algorithms and data structures.

Data can be of various types, such as integer, float etc., and can be stored in various ways. Therefore, on the basis of the type of data and how it is stored, data structures are divided mainly into two types:

A. Primary Data Structures


B. Derived Data Structures

[Figure: Classification of data structures.
Primary Data Structure: Integer, Float, Character or String, Pointer.
Derived Data Structure: Array, Linked List, Stack and Queue.]

A. Primary Data Structures


Primary data structures are the built-in (primitive) data structures, which mainly include four types of data.

(i) Integer: these are whole numbers, which can be positive or negative.
I = (…, 4, 3, 2, 1, 0, -1, -2, -3, -4, …)
where I is the set of integers.

(ii) Float: these are decimal numbers, which can be positive or negative.
F = (…, 4.1, 3.1, 2.1, 1.1, 0.1, -1.1, -2.1, -3.1, -4.1, …)
where F is the set of float numbers.

(iii) Character or String: these are letters of the alphabet, or any combination of letters, known as a string.
C = (‘a’, ‘b’, ‘c’, ‘d’, ‘efg’, ‘xyzabc’)
where C is the set of characters and strings.

(iv) Pointer: it is a programming language object, whose value points to another value
stored elsewhere in the computer memory using its address.

001   Points to   1008
002               1009
003               1010
004               1011

In the above example, the pointer (shown on the left) points to memory location 003 (shown by the arrow).

B. Derived Data Structures

There are various complex data structures, which are derived from the primary ones and used to store large connected datasets. These data types are normally built by combining primary or built-in data types and the associated operations on them. All these data structures allow us to perform different operations on data. We select a data structure based on the type of operation required.

The derived data structures are mainly divided into three categories:
(i) Array: it is a container which can hold a fixed number of items, all of which should be of the same type. Most data structures make use of arrays to implement their algorithms.

An array comprises two components:

a) Element: an item stored in an array is called an element.
b) Index: the location of each element in an array has a numerical index, which is used to identify the element.

An array can be defined as below:

int array [10] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}

Here int is the type, array is the name, 10 is the size, and the values in braces are the elements.

Elements   10   20   30   40   50   60   70   80   90   100
Index      0    1    2    3    4    5    6    7    8    9

Figure: Location of elements in array
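The element and index idea above can be tried out with a short Python sketch. This is only an illustration: a Python list is used as a stand-in for the array, and unlike the fixed-size array described here it can grow or shrink. The variable names are assumptions made for this example.

```python
# A Python list standing in for the ten-element integer array above.
array = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

first = array[0]            # element stored at index 0
last = array[9]             # element stored at index 9
position = array.index(40)  # index at which the element 40 is stored

print(first, last, position)  # prints: 10 100 3
```

Indexes start at 0, so the tenth element is found at index 9.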

(ii) Linked-list:
It is a data structure which stores sequential information in items that are connected to each other via links.

A linked-list comprises the following components:
a) A series of nodes
b) Each node has two parts: (i) Data and (ii) Pointer: the address of the next node

A linked-list can be represented as below:

Address   000   001   002   003
Data      10    20    30    40
Pointer   001   002   003   004

Figure: Location of elements in a linked-list
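A minimal Python sketch of the node structure described above is given here. It assumes that each node keeps a reference to the next node instead of a numeric address, and that the last node links to None. The names Node, build_linked_list and to_list are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    data: int
    next: Optional["Node"] = None  # link to the next node, None at the end


def build_linked_list(values):
    """Create a chain of nodes from a sequence and return the first node."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def to_list(head):
    """Follow the links from the head and collect the data of each node."""
    items = []
    node = head
    while node is not None:
        items.append(node.data)
        node = node.next
    return items


head = build_linked_list([10, 20, 30, 40])
print(to_list(head))  # prints: [10, 20, 30, 40]
```

Reaching the third element requires following two links from the head, which is the main practical difference from indexing into an array.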

(iii) Stack and Queue

A stack is an ordered list of items of a similar data type, such as integer, float, string etc., with a predefined capacity. It allows addition or removal of data in a particular order. Every time an element is added, it goes on top of the stack; the only element that can be removed is the one at the top of the stack, just like a pile of objects. Therefore, it is also called a “last in, first out” (LIFO) structure.

[Figure: Stack Data Structure, showing the elements 1001, 1002, 1003 and 1004 listed from top to bottom]
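The push and pop behaviour of a stack can be sketched in Python as follows. This is an illustrative sketch, not a fixed-capacity implementation: a Python list is used as the stack, the values are borrowed from the figure, and the order in which they are pushed is an assumption.

```python
stack = []

# Push: each new element goes on top of the stack.
for item in [1001, 1002, 1003, 1004]:
    stack.append(item)

# Pop: only the element currently on top can be removed.
top = stack.pop()

print(top)    # prints: 1004
print(stack)  # prints: [1001, 1002, 1003]
```

The element pushed last (1004) is the first one removed, which is the last in, first out order.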

A queue is a linear data structure in which an element is inserted at one end, called the REAR (also called the tail), and an existing element is deleted from the other end, called the FRONT (also called the head). This makes the queue a FIFO (first in, first out) data structure, which means that the element inserted first will also be removed first.

[Figure: Queue Data Structure, showing the elements 1001, 1002, 1003, 1004 and 1005 in a row between REAR and FRONT]
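The FIFO behaviour of a queue can be sketched in Python with collections.deque from the standard library. The values are borrowed from the figure and the insertion order is an assumption made for this illustration.

```python
from collections import deque

queue = deque()

# Enqueue: new elements are inserted at the REAR.
for item in [1001, 1002, 1003, 1004, 1005]:
    queue.append(item)

# Dequeue: elements are removed from the FRONT.
first_out = queue.popleft()

print(first_out)    # prints: 1001
print(list(queue))  # prints: [1002, 1003, 1004, 1005]
```

The element inserted first (1001) is also the first one removed.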

2. Algorithm

An algorithm is a step-by-step procedure which defines a set of instructions to be executed in a certain order to get the desired output. Algorithms are generally created independently of the underlying languages, i.e. an algorithm can be implemented in more than one programming language.

An algorithm should have the following characteristics:

(a) It should be clear and unambiguous. Each of its steps (or phases), and their inputs/outputs, should be clear and must lead to only one meaning.
(b) It should have 0 or more well-defined inputs.
(c) It should have 1 or more well-defined outputs, which should match the desired output.
(d) It must terminate after a finite number of steps.
(e) It should be feasible with the available resources.
(f) It should have step-by-step directions which are independent of any programming code.

There are no well-defined standards for writing algorithms. Rather, it is problem and resource
dependent. Algorithm writing is a process and is executed after the problem domain is well-
defined. That is, we should know the problem domain, for which we are designing a solution.

Example: To design an algorithm to add two numbers.

Solution:

Step 1: define two numbers: integer a, b
Step 2: assign values to integers: a = 2; b = 3
Step 3: add values of integers a and b: a + b
Step 4: store output of Step 3 to a variable c
Step 5: print c = 5
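Since an algorithm is independent of any programming language, the five steps above can be written in any of them. One possible Python version is sketched here; the function name add is an assumption made for this illustration.

```python
def add(a, b):
    """Add two integers and return the result."""
    c = a + b  # Steps 3 and 4: add the values and store the output in c
    return c


result = add(2, 3)   # Steps 1 and 2: a = 2, b = 3
print("c =", result)  # Step 5, prints: c = 5
```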

Algorithms are generally written in pseudocode, which comprises assignment operators, conditional statements and programming loops. Below are brief descriptions of the elementary commands that we use in the pseudocode in this module.

(a) Assignment Pseudocode

x ← y

In the above expression, two variables, ‘x’ and ‘y’, and the assignment operator ‘←’ are used. The expression indicates that the variable x is assigned the value of y.

Example: x ← 1; y ← 2; z ← x + y

The sum of the variables x and y is assigned to the new variable z. Therefore, the value of z is 1 + 2 = 3.

(b) Conditional pseudocode

if else statement
   if X is true
      Y
   else
      Z

In the above pseudocode, if statement X is true, the instructions Y are executed; otherwise the instructions Z are executed.

Example

X ← marks of the student

X(‘Pass’, ‘Fail’)
   if X > 40
      return ‘Pass’
   else
      return ‘Fail’

In the above example, the variable X holds the marks of the student, and the result is either ‘Pass’ or ‘Fail’. If the student scores more than 40, it returns ‘Pass’, otherwise ‘Fail’. For example, 78(‘Pass’, ‘Fail’) returns ‘Pass’.
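The pass/fail example can be sketched in Python as follows. The function name grade is hypothetical; the threshold of 40 is taken from the example above.

```python
def grade(marks):
    """Return 'Pass' if the marks are greater than 40, otherwise 'Fail'."""
    if marks > 40:
        return "Pass"
    else:
        return "Fail"


print(grade(78))  # prints: Pass
print(grade(40))  # prints: Fail
```

Because the condition is strictly "greater than", a score of exactly 40 returns Fail.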

(c) Programming loop

Loops cause a program to execute a certain block of code repeatedly until the test condition becomes false. Loops are used to perform repetitive tasks in programming.

for loop
   for i ← x to y
      Z

In the above pseudocode, the for-loop repeats the instructions Z once for each value of the variable ‘i’ in the range x to y.

Example:

sum ( n )
   sum ← 0
   for i ← 1 to n
      sum ← sum + i
   return sum

In the above example, the function sum calculates the sum of the ‘n’ numbers from 1 to n. For instance, sum(5) returns 1+2+3+4+5 = 15.
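A Python sketch of the same for-loop is given here. The function is named sum_to (a hypothetical name) to avoid hiding Python's built-in sum, and range(1, n + 1) is used because range excludes its upper limit.

```python
def sum_to(n):
    """Return 1 + 2 + ... + n using a for loop."""
    total = 0
    for i in range(1, n + 1):  # i takes the values 1, 2, ..., n
        total = total + i
    return total


print(sum_to(5))  # prints: 15
```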

while loop

   while X is true
      Y

In the above pseudocode, the while-loop repeats the instructions Y as long as the condition X is true.

Example:

sum ( x )
   i ← 1
   sum ← 0
   while i ≤ x
      sum ← sum + i
      i ← i + 1
   return sum

In the above example, the function sum adds up the numbers as long as the number i is less than or equal to x. For instance, sum(5) returns 1+2+3+4+5 = 15. The while loop repeats the addition while the number is less than or equal to 5. When i reaches 6, it no longer satisfies the condition and the loop terminates.
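A minimal Python sketch of summation with a while loop follows, assuming the intent is to add the integers from 1 up to and including x. The name sum_while is hypothetical.

```python
def sum_while(x):
    """Return 1 + 2 + ... + x using a while loop."""
    i = 1
    total = 0
    while i <= x:        # the loop stops as soon as i exceeds x
        total = total + i
        i = i + 1
    return total


print(sum_while(5))  # prints: 15
```

Unlike the for loop, the counter i has to be initialised and incremented explicitly; forgetting the increment would make the loop run forever.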

3. Searching and Sorting Algorithms

Various algorithms have been developed by computer scientists to store, organize and process data. In this module, we discuss two of the most extensively used kinds of algorithms, described below:

(i) Searching Algorithm

Two basic searching algorithms are discussed here:

Linear Search

Linear search is a very simple search algorithm. In this type of search, a sequential search is made over all items one by one. Every item is checked, and if a match is found, that particular item is returned; otherwise the search continues till the end of the data collection.

10 20 30 40 50 60 70 80

In the above example, to search for any number in the data structure, the search starts from the first element. If the required number matches, the search stops; otherwise it scans each element in turn until one matches. For example, to search for the number 50, it first compares the first element, 10, with the number, then the second element, 20, and so on. Once the number matches (shown in blue colour), the search stops.

Algorithm
Linear Search ( Array A of n elements, Value x)
Step 1: Set i to 1
Step 2: if i > n then go to step 7
Step 3: if A[i] = x then go to step 6
Step 4: Set i to i + 1
Step 5: Go to Step 2
Step 6: Print Element x Found at index i and go to step 8
Step 7: Print element not found
Step 8: Exit

Pseudocode

procedure linear_search (list, value)
   for each item in the list
      if item == value
         return the item's location
      end if
   end for
   return not found
end procedure
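The linear search procedure can be sketched in Python as below. Two choices here are assumptions of this sketch rather than part of the module: positions are counted from 0, as is usual in Python, and -1 is returned when the value is not found.

```python
def linear_search(items, value):
    """Return the index of the first item equal to value, or -1 if absent."""
    for index, item in enumerate(items):
        if item == value:
            return index
    return -1


data = [10, 20, 30, 40, 50, 60, 70, 80]
print(linear_search(data, 50))  # prints: 4
print(linear_search(data, 55))  # prints: -1
```

In the worst case every element is compared once, so the work grows in proportion to the number of elements.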

Binary Search

This search algorithm works on the principle of divide and conquer. For this algorithm to work properly, the data collection must be sorted.

Binary search looks for a particular item by comparing it with the middle item of the collection. If a match occurs, the index of the item is returned. If the middle item is less than the required item, the item is searched for in the sub-array to the right of the middle item; otherwise it is searched for in the sub-array to the left of the middle item (assuming ascending order). This process continues on the sub-array until the item is found or the size of the sub-array reduces to zero.

The algorithm for binary search is described below, using the steps to find the number 50 in the given array.

Step 1: Identify the array in which the element is to be searched.

[Figure: array of 8 elements, positions 1 2 3 4 5 6 7 8]

Step 2: Sort the array in ascending order.

Element    10   20   30   40   50   60   70   80
Position   1    2    3    4    5    6    7    8

Step 3: Determine the middle position of the array by using the formula below.

$$\text{Middle position} = \frac{\text{Total number of array elements}}{2}$$

In the array, the total number of elements is 8, so the middle position is 4 (element 40). If the number of elements in the array is odd, the result is a decimal, and only its integer part is taken as the position.

Step 4: The number 40 is compared with the required element 50. Since 40 is less than 50, the algorithm searches only the elements to the right of 40.

Step 5: To the right of 40 there are 4 elements. Repeat Step 3 for these 4 elements. The middle position is the 6th, element 60.

Step 6: The number 60 is compared with the required number. Since 60 is greater than 50, only the elements to its left are considered further. Now only one element is left, and it is the required number, so the algorithm terminates.
Pseudocode

Procedure binary_search
   A ← sorted array
   n ← size of array
   x ← value to be searched

   Set lowerBound = 1
   Set upperBound = n

   while x not found

      if upperBound < lowerBound
         EXIT: x does not exist.

      /* middle of the current sub-array, integer part only */
      set midPoint = lowerBound + (upperBound - lowerBound) / 2

      if A[midPoint] < x
         set lowerBound = midPoint + 1

      if A[midPoint] > x
         set upperBound = midPoint - 1

      if A[midPoint] = x
         EXIT: x found at location midPoint

   end while

end procedure
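A Python sketch of binary search is given here. It assumes that the input is already sorted in ascending order, that positions are counted from 0, and that -1 signals "not found"; these conventions and the function name are choices made for this illustration. The middle position is computed from the current lower and upper bounds, so it always falls inside the sub-array still being searched.

```python
def binary_search(sorted_items, value):
    """Return the index of value in an ascending list, or -1 if absent."""
    lower = 0
    upper = len(sorted_items) - 1
    while lower <= upper:
        mid = lower + (upper - lower) // 2  # middle of the current sub-array
        if sorted_items[mid] < value:
            lower = mid + 1   # continue in the right half
        elif sorted_items[mid] > value:
            upper = mid - 1   # continue in the left half
        else:
            return mid
    return -1


data = [10, 20, 30, 40, 50, 60, 70, 80]
print(binary_search(data, 50))  # prints: 4
print(binary_search(data, 55))  # prints: -1
```

Each comparison halves the remaining sub-array, so eight elements need at most four comparisons.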

(ii) Sorting Algorithm

Sorting refers to arranging data in a particular order, and a sorting algorithm specifies how to do so. Using sorting algorithms, data can be sorted in either increasing or decreasing order.

In this module we will discuss two sorting algorithms:

• Bubble Sort
• Insertion Sort

Bubble Sort:
Bubble sort is a simple, comparison-based sorting algorithm in which each pair of adjacent elements is compared and the elements are swapped if they are not in order.

Bubble sort is explained below:

An unsorted array with four elements is taken.

Elements   40   10   30   20
Position   1    2    3    4

Bubble sort starts with the first two elements, comparing them to check which one is greater. In this example, it compares 40 and 10 and swaps their positions.

Elements   10   40   30   20
Position   1    2    3    4

After sorting the first two elements, it compares the next two elements, and if the element at the 2nd position is greater than the element at the 3rd position, they are swapped. In this case, 40 is swapped with 30. The same process is repeated until all the numbers are sorted in order (shown below).

Elements   10   30   40   20
Elements   10   30   20   40
Elements   10   20   30   40
Position   1    2    3    4

Algorithm

begin BubbleSort(list)
   n ← number of elements in list
   repeat
      swapped ← false
      for i ← 1 to n - 1
         if list[i] > list[i+1]
            swap(list[i], list[i+1])
            swapped ← true
         end if
      end for
   until swapped is false
   return list
end BubbleSort
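A Python sketch of bubble sort follows. A single pass over the list only moves the largest element to the end, so this sketch repeats the pass until a full pass makes no swap. The function name, the decision to sort a copy rather than the original list, and the input [40, 10, 30, 20] are assumptions made for this illustration.

```python
def bubble_sort(items):
    """Return a new list with the items sorted in ascending order."""
    result = list(items)  # work on a copy, leave the input unchanged
    n = len(result)
    for pass_end in range(n - 1, 0, -1):
        swapped = False
        for i in range(pass_end):
            if result[i] > result[i + 1]:
                # adjacent elements are out of order, so swap them
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:
            break  # no swap in this pass: the list is already sorted
    return result


print(bubble_sort([40, 10, 30, 20]))  # prints: [10, 20, 30, 40]
```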

Insertion Sort

This is an in-place, comparison-based sorting algorithm. Here, a sub-list is maintained which is always sorted. For example, the lower part of an array is kept sorted. An element which is to be inserted into this sorted sub-list has to find its appropriate place and be inserted there. Hence the name insertion sort.

Insertion sort is explained below:

An unsorted array with four elements is taken.

[Figure: unsorted array, positions 1 2 3 4]

Insertion sort first compares the initial two elements. If the first element is greater than the second, it swaps them; otherwise they remain as they are. In the present example, the two elements are sorted and now form the sorted sub-list.

[Figure: array after sorting the first two elements, positions 1 2 3 4]

Insertion sort then compares the next two elements and sorts them into ascending order as in the previous step. This process is repeated until the whole list is sorted, as described below.

Elements   10   30   40   20
Position   1    2    3    4

[Figures: remaining steps until the array is fully sorted, positions 1 2 3 4]

Algorithm

Step 1 − If it is the first element, it is already sorted.

Step 2 − Pick the next element.

Step 3 − Compare it with all elements in the sorted sub-list.

Step 4 − Shift all the elements in the sorted sub-list that are greater than the value to be sorted.

Step 5 − Insert the value.

Step 6 − Repeat until the list is sorted.

Step 7 − END

Pseudocode

procedure insertionSort( A : array of items )
   int holePosition
   int valueToInsert

   /* array indices start at 0, so the last index is length(A) - 1 */
   for i = 1 to length(A) - 1 inclusive do:

      /* select value to be inserted */
      valueToInsert = A[i]
      holePosition = i

      /* locate hole position for the element to be inserted */
      while holePosition > 0 and A[holePosition-1] > valueToInsert do:
         A[holePosition] = A[holePosition-1]
         holePosition = holePosition - 1
      end while

      /* insert the number at hole position */
      A[holePosition] = valueToInsert

   end for
end procedure
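The insertion sort pseudocode translates almost line by line into Python. The sketch below assumes 0-based indexing and sorts a copy of the input rather than the original list; the function name is hypothetical, and the input is the intermediate array 10 30 40 20 shown in the description above.

```python
def insertion_sort(items):
    """Return a new list with the items sorted in ascending order."""
    a = list(items)  # work on a copy, leave the input unchanged
    for i in range(1, len(a)):
        value_to_insert = a[i]
        hole = i
        # shift larger elements of the sorted sub-list one place to the right
        while hole > 0 and a[hole - 1] > value_to_insert:
            a[hole] = a[hole - 1]
            hole -= 1
        # insert the value at the hole position
        a[hole] = value_to_insert
    return a


print(insertion_sort([10, 30, 40, 20]))  # prints: [10, 20, 30, 40]
```

The part of the list to the left of index i is always sorted, which matches the sorted sub-list described above.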

Summary

A data structure is a scheme for organizing data; in other words, it is an arrangement of data in a computer's memory in such a way that the data is quickly available to the processor for the required calculations. A data structure should be seen as a logical concept that must address two fundamental concerns: first, how the data will be stored, and second, what operations will be performed on it. As a data structure is a scheme for data organization, its functional definition should be independent of its implementation.
