You are on page 1of 51

Chapter 2

The Big Picture


1

Overview

The big picture answers to several questions.

What are data structures?

What data structures do we study?

What are Abstract Data Types?

Why Object-Oriented Programming (OOP) and


Java for data structures?

How do I choose the right data structures?

2.1 What Are Data Structures

2.1 What Are Data Structures

A data structure is an aggregation of data components


that, together, constitute a meaningful whole.

The components themselves may be data structures.


Stops at some atomic unit.

Atomic or primitive type A data type whose elements


are single, non decomposable data items (can be
broken into parts)
Composite type A data type whose elements are
composed of multiple data items

(ex: take tow integers (simple elements) x, y to form a


point (x, y)
4

2.2 What Data Structures Do We


Study?

An array is an aggregation of entries that are


arranged in contiguous fashion with the
provision of a single-step random access to any
entry.

There are numerous other situations where more


sophisticated data structures are required.

Data structure categories:

Linear
Non-linear

Category is based on how the data is conceptually


organized or aggregated.

2.2 What Data Structures Do We


Study?

Linear structures

List, Queue and Stack are linear collections, each


of them serves as a repository in which entries may
be added or removed at will.

Differ in how these entries may be accessed once they


are added.

2.2 What Data Structures Do We


Study?

List

The List is a linear collection of entries in which


entries may be added, removed, and searched for
without restrictions.

Two kinds of list:

Ordered List
Unordered List

2.2 What Data Structures Do We


Study?

Queue

Entries may only be removed in the order in which


they are added.

First in First out (FIFO) data structures


No search for an entry in the Queue

2.2 What Data Structures Do We


Study?

Stack

Entries may only be removed in the reverse order in


which they are added.

Last In, First Out (LIFO)


No search for an entry in the Stack.

2.2 What Data Structures Do We


Study?

Trees:

A tree is a nonlinear structure with a unique starting


node (the root), in which each node is capable of
having many child nodes, and in which a unique
path exists from the root to every other node. Trees
are useful for representing hierarchical relationships
among data items.

Root The top node of a tree structure; a node with


no parent

10

Tree

11

Not Tree

12

2.2 What Data Structures Do We


Study?

Binary Tree

Binary tree A tree in which each node is capable


of having two child nodes, a left child node and a
right child node

Leaf A tree node that has no children

13

A Binary Tree

14

2.2 What Data Structures Do We


Study?

General Tree

Models a hierarchy such as the organizational


structure of a company, or a family tree.

A non-linear arrangement of entries, it is a generalization


of the binary tree structure, hence the name.

15

2.2 What Data Structures Do We


Study?

Binary Search Tree


A binary tree in which the key value in any node
is greater than the key value in its left child and
any of its descendants (the nodes in the left
subtree) and less than the key value in its right
child and any of its descendants (the nodes in
the right subtree)

16

Binary Search Tree

17

2.2 What Data Structures Do We


Study?

AVL Tree

Height-balanced, binary search tree.

AVL Tree derives its importance from the fact that it


speeds up this search process to a remarkable degree.

18

2.2 What Data Structures Do We


Study?

Heap as a Priority Queue

A priority queue is a specialization of the FIFO


Queue.

Entries are assigned priorities.


The entry with the highest priority is the one to leave first.

The heap is a special type of binary tree.

19

Complete binary tree

a complete binary tree is one in which every


level but the last must have the maximum
number of nodes possible at that level.

The last level may have fewer than the maximum


possible nodes, but they should be arranged from
left to right without any empty spots.

20

Heap

21

2.2 What Data Structures Do We


Study?

Hash Table:

Hash Functions: A function used to manipulate the


key of an element in a list to identify its location in
the list

Hashing: The technique for ordering and accessing


elements in a list in a relatively constant amount of
time by manipulating the key to identify its location
in the list

Hash table: Term used to describe the data


structure used to store and retrieve elements using
hashing
22

Using a hash function to Determine the


Location of the Element in an Array

23

2.2 What Data Structures Do We Study?

Graphs

Graph: A data structure that consists of a set of models and a set of edges
that relate the nodes to each other

Vertex: A node in a graph

Edge (arc): A pair of vertices representing a connection between two nodes


in a graph

Two kinds of graphs:

Undirected graph: A graph in which the edges have no direction


Directed graph (digraph): A graph in which each edge is directed from one vertex to
another (or the same) vertex

A general tree is a special kind of graph, since a hierarchy is a special


system of relationships among entities.

Graphs may be used to model systems of physical connections such as computer


networks, airline routes, etc., as well as abstract relationships such as course prerequisite structures.
Standard graph algorithms answer certain questions we may ask of the system.

24

Graphs

25

Adjacent vertices: Two vertices in a graph that


are connected by an edge
Path: A sequence of vertices that connects two
nodes in a graph
Complete graph: A graph in which every vertex
is directly connected to every other vertex
Weighted graph: A graph in which each edge
carries a value
26

27

Data Type

In a modern computer, data consists fundamentally of binary bits, but


meaningful data is organized into primitive data types such as integer,
real, and boolean and into more complex data structures such as arrays
and binary trees.
These data types and data structures always come along with
associated operations that can be done on the data. For example, the
32-bit int data type isdefined both by the fact that a value of type int
consists of 32 binary bits and by the fact that two int values can be
added, subtracted, multiplied, compared, and so on.
An array is defined both by the fact that it is a sequence of data items
of the same basic type, and by the fact that it is possible to directly
access each of the positions in the list based on its numerical index.
So the idea of a data type includes a specification of the possible
values of that type together with the operations that can be
performed on those values.
28

2.3 What are Abstract Data Types?

An algorithm is an abstract idea, and a program is an


implementation of an algorithm. Similarly, it is useful to
be able to work with the abstract idea behind a data
type or data structure, without getting bogged down in
the implementation details.
The abstraction in this case is called an abstract data
type."
An abstract data type specifies the values of the type,
but not how those values are represented as
collections of bits, and it specifies operations on those
values in terms of their inputs, outputs, and effects
rather than as particular algorithms or program code.
29

2.3 What are Abstract Data Types?

We are all used to dealing with the primitive data types


as abstract data types.
It is quite likely that you don't know the details of how
values of type double are represented as sequences
of bits. The details are, in fact, rather complicated.
However, you know how to work with double values by
adding them, multiplying them, truncating them to
values of type int, inputting and outputting them, and
so on.
To work with values of type double, you only need to
know how to use these operations and what they do.
You don't need to know how they are implemented.
30

2.3 What are Abstract Data Types?


Abstract data type (ADT) A data type whose
properties (domain and operations) are
specified independently of any particular
implementation

A data structure is an abstract data type.

i.e. The primitive data types built into a language integer,


real, character and boolean.

We do not at all worry about its internal representation.

We know we can perform certain operations on integers


that are guaranteed to work the way they are intended to.

+, -, *, / language designers and compiler writers were


responsible for designing the interface for these data types, as
well as implementing them.
31

2.3 What are Abstract Data Types?

Exactly how these operations are implemented


by the compiler in the machine code is of no
concern.

On a different machine, the behavior of an integer


does not change, even though it's internal
representations may change.

As the programmers were concerned, the primitive


data types were abstract entries.

32

2.3 What are Abstract Data Types?

When you create your own new data types(stacks,


queues, trees, and even more complicated structures)
you are necessarily deeply involved in the data
representation and in the implementation of the
operations.
However, when you use these data types, you don't
want to have to worry about the implementation. You
want to be able to use them the way you use the built-in
primitive types, as abstract data types.
This is in conformance with one of the central principles
of software engineering: encapsulation, also known as
information hiding, modularity, and separation of
interface from implementation.
33

2.3 What are Abstract Data Types?

A data structure consisting of several layers of


abstraction.

A Stack (of integers) may be built using a List (of


integers), which in turn may be built using an array
(of integers).

Integer and array are language-defined types.


Stack and List are user-defined.

34

2.3 What are Abstract Data Types?

35

2.3 What are Abstract Data Types?

Since an ADT makes a clean separation


between interface and implementation, the user
only sees the interface and therefore does not
need tamper with the implementation.

The responsibility of maintaining the implementation


is separated from the responsibility of maintaining th
code that uses an ADT.

This makes the code easier to maintain.

ADTs may be used many times in various contexts.

A List ADT may be used directly in application code, or


may be used to build another ADT, such as the Stack.
36

2.3 What are Abstract Data Types?

Object-oriented programming is paradigm that


addresses exactly these issues.

37

2.4 Why OOP and Java for Data


Structures?

OOP paradigm views the program as a system


of interacting objects.

Objects in a program may model physical objects, or


abstract entities.

An object encapsulates state and behavior.

For instance, the state of an integer, is its current value, it


behavior is the set of arithmetic operations that may be
applies to integers.

A class is analogous to the ADT, an object is


analogous to a variable of an ADT.

The user of an object is called its client.

38

2.4 Why OOP and Java for Data


Structures?

A stack may be built using a List ADT.

A stack interface defines four operations:

push

pop

get-Size

isEmpty

39

2.4 Why OOP and Java for Data


Structures?

The stack object contains a List object which


implements its state, and the behavior of the
Stack object is implemented in terms of the List
object's behavior.

40

2.4 Why OOP and Java for Data


Structures?

Reuse by inheritance.

Data structures may be built by inheriting from other


data structures; if data structure A inherits from data
structure B, then every A object is is-A B object.

AVL Tree can inherit from a Binary Search Tree, since


every AVL Tree is is-A special, height balanced, binary
search tree.

Every binary search tree is not an AVL tree.

41

2.4 Why OOP and Java for Data


Structures?

Inheriting means sharing the implementation


code.
A language that implements the OOP paradigm
automatically ensures:

Separation of interface from implementation by


encapsulation

Code reuse inheritance when permitted.

42

2.5 How Do I choose the Right Data


Structures?

The interface of operations that is supported by


a data structure is one factor to consider when
choosing between several available data
structures.

The efficiency of the data structures:

How much space does the data structure occupy?


What are the running times of the operation in its
interface?

43

2.5 How Do I choose the Right Data


Structures?

Example

Implementing a printer queue, requires a queue


data structure.

Maintains a collection of entries in no particular order.


An unordered list would be the appropriate data structure
in this case.

It is not too difficult to fit the requirements of the


application to the operations supported by a data
structure.

It is more difficult to choose from a set of candidate data


structures that all meet the operational requirements.
44

2.5 How Do I choose the Right Data


Structures?

Important, and sometimes contradictory factor


to consider:

The running time of each operation in the interface.

A data structure with the best interface with the best fit
may not necessarily be the best overall fit, if the running
times of its operations are not up to the mark.
When we have more than one data structure
implementation whose interfaces satisfy our
requirements, we may have to select one based on
comparing the running times of the interface operations.
Time is traded off for space,

i.e. more space is consumed to increase speed, or a reduction in


speed is traded for a reduction in the space consumption.

45

2.5 How Do I choose the Right Data


Structures?

Time-space tradeoff

We are looking to buy the best implementation of a


stack.

StackA. Does not provide a getSize operation.

i.e. there is not single operation that a client can use to get the
number of entries in StackA.

StackB. Provides a getSize operation, implemented in


the manner we discussed earlier, transferring entries back
and forth between two stacks.
StackC. Provides a getSize operation, implemented as
follows: a variable called size is maintained that is
incremented every time an entry is pushed, and
decremented every time an entry is popped.

46

2.5 How Do I choose the Right Data


Structures?

Three situations:

Need to maintain a large number stacks, with no


need to find the number of entries.

Need to maintain only one stack, with frequent need


to find the number of entries.

Need to maintain a large number of stacks. With


infrequent need to find the number of entries.

47

2.5 How Do I choose the Right Data


Structures?

Situation 1, StackA fits the bill.

Tempting to pick StackC, simply because we may


want to play conservative: what if we need getSize
in the future?

48

2.5 How Do I choose the Right Data


Structures?

Situation 2, StackB or Stack C.

Need to use getSize.

getSize in StackB is more time-consuming than that


in StackC.

We need only one stack, the additional size variable


used by StackC is not an issue.

Since we need to use getSize frequently, it is better


to with StackC.

49

2.5 How Do I choose the Right Data


Structures?

Situation 3 presents a choice between StackB


and StackC.

If getSize calls are infrequent, we may choose to go


with StackB and suffer a loss in speed.

The faster getSize delivered by StackC is at the


expense of an extra variable per stack, which may
add up to considerable space consumption since
we plan to maintain a number of stacks.

50

2.5 How Do I choose the Right Data


Structures?

getSize in StackB is more time-consuming that


that in StackC.

How can we quantify the time taken in either case?

For each data structure we study, we present the


running time of each operation in its interface.

51

You might also like