
Sorting, Searching and Hashing

Bruno MARTIN, University of Nice - Sophia Antipolis
mailto:Bruno.Martin@unice.fr
http://deptinfo.unice.fr/~bmartin/mathmods.html


Implementation with Ruby features

It uses the ideas of the quicksort:

    def qsort
      return self if empty?
      select { |x| x < first }.qsort +
        select { |x| x == first } +
        select { |x| x > first }.qsort
    end

How can we replace the select operator from Ruby?

Quick Sort

Invented by C.A.R. Hoare in 1960; easy to implement, a good general-purpose internal sort.
It is a divide-and-conquer algorithm:

take at random an element in the array, say v
divide the array into two partitions:
  one contains elements smaller than v
  the other contains elements greater than v
put the elements ≤ v at the beginning of the array (say, index between 1 and m − 1) and the elements ≥ v at the end of the array (index between m + 1 and N); then you have found the place to put v between the two partitions (at position m)
recursively call QuickSort on [a_0, ..., a_(m−1)] and [a_(m+1), ..., a_(N−1)]
stop when the partition is reduced to a single element


Algorithm of Quick Sort

For example, the random element can be the leftmost or the rightmost element; we choose the rightmost.
"Our" QuickSort runs on an array [a_left, ..., a_right]:

    def quick!(left,right)
      if left < right
        m = self.partition(left,right)
        self.quick!(left, m-1)
        self.quick!(m+1, right)
      end
    end
Algorithm of the Partition of the Array

Scan (index i) from the left until you find an element ≥ v (a[i] ≥ v)
Scan (index j) from the right until you find an element ≤ v (a[j] ≤ v)
Both elements are obviously out of place: swap a[i] and a[j]
Continue until the scan pointers cross (j ≤ i)
Exchange v (a[right]) with the element a[i]

    until j<=i do
      i+=1 until self[i]>=v   # scans for i: self[i]>=v
      j-=1 until self[j]<=v   # scans for j: self[j]<=v
      if i<=j
        self.swap!(i,j)       # exchange both elements
        i+=1; j-=1            # modify indexes: clean recursion
      end
    end


Example: [3,5,1,2,4].qsort!

Best seen from /Users/bmartin/Documents/Enseignement/Mathmods/Programs with trirapide!

The big picture

    def qsort!
      def lqsort(left,right)          # sort from left to right
        if left<right
          v,i,j=self[right],left,right
          until j<=i do
            i+=1 until self[i]>=v     # scans for i: self[i]>=v
            j-=1 until self[j]<=v     # scans for j: self[j]<=v
            if i<=j
              self.swap!(i,j)         # exchange both elements
              i+=1; j-=1              # modify indexes: clean recursion
            end
          end
          self.lqsort(left,j)         # sort left part
          self.lqsort(i,right)        # sort right part
        end
      end
      self.lqsort(0,self.length-1)
      self
    end


Quick Sort

We test that neither i nor j crosses the array bounds left and right.
Because v = self[right], you are sure that the loop on i stops at least when i = right.
But if v = self[right] happens to be the smallest element between left and right, the loop on j might pass the left end of the array.
To avoid the tests, you can choose another solution:

Take three elements in the array: the leftmost, the rightmost and the middle one
Sort them
Put the smallest at the leftmost position, the greatest at the rightmost position and the middle one as v
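The median-of-three pivot choice just described can be sketched in Ruby (a sketch, not the lecture's own code; `swap!` is the helper used in the slides, reproduced here so the snippet is self-contained):

```ruby
class Array
  # Exchange the elements at positions i and j (the slides' helper).
  def swap!(i, j)
    self[i], self[j] = self[j], self[i]
    self
  end

  # Order self[left], self[mid], self[right] in place; the median lands
  # at mid and can then serve as the pivot v.
  def median_of_three!(left, right)
    mid = (left + right) / 2
    swap!(left, mid)   if self[mid]   < self[left]
    swap!(left, right) if self[right] < self[left]
    swap!(mid, right)  if self[right] < self[mid]
    self[mid]
  end
end
```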
Quick Sort on Average-Case Partitioning

Average performance of Quick Sort is about 1.38 N log N: a very efficient algorithm with a very small constant.
Quick Sort is a divide-and-conquer algorithm which splits the problem in two recursive calls and "combines" the results.
Divide-and-conquer is a good method every time you can split your problem into smaller pieces and combine the results to obtain the global solution.
But divide-and-conquer leads to an efficient algorithm only when the problem is divided without overlap.


Quick Sort on Worst-Case Partitioning

Quick Sort is very inefficient on already sorted sets: O(N²)
Suppose a[0], ..., a[N − 1] sorted without equal elements
At the first call v = a[N − 1]
The while on i continues until i = N − 1 and stops because a[N − 1] = v: the sort does N comparisons
The while on j stops on j = N − 2 because a[N − 2] < v: 1 comparison
We exchange a[N − 1] with itself: 1 exchange
We call QuickSort on a[0], ..., a[N − 2] and on the empty right part, which immediately stops
So (N + 1) + N + (N − 1) + ... + 2 = N(N + 3)/2
QuickSort is in O(N²) on sorted sets

C_N: average number of comparisons for sorting N elements:

    C_N = N + 1 + (1/N) Σ_{k=1..N} (C_{k−1} + C_{N−k})

N + 1 comparisons during the two inner whiles: N − 1 + 2 (2 when i and j cross)
Plus the average number of comparisons on the two sub-arrays:
((C_0 + C_{N−1}) + (C_1 + C_{N−2}) + ... + (C_{N−1} + C_0))/N
By symmetry: C_N = N + 1 + (2/N) Σ_{k=1..N} C_{k−1}
Subtract: N·C_N − (N−1)·C_{N−1} gives N·C_N = (N + 1)·C_{N−1} + 2N
Divide both sides by N(N + 1) to obtain the recurrence:

    C_N/(N+1) = C_{N−1}/N + 2/(N+1)
              = C_{N−2}/(N−1) + 2/N + 2/(N+1)
              = ... = C_2/3 + 2 Σ_{k=4..N+1} 1/k

Approximation: C_N/(N+1) ≈ 2 Σ_{k=1..N} 1/k ≈ 2 ∫_1^N dx/x ≈ 2 ln N

    C_N ≈ 2 N ln N = 2 ln(2) N log N ≈ 1.38 N log N


Intuition for the performance of quick sort

Quicksort running time depends on whether the partitioning is balanced
The worst-case partitioning occurs when the partitioning produces one region with 1 element and one with N − 1 elements: O(N²)
The best-case partitioning occurs when the partitioning produces two regions with N/2 elements (C_N = N + 2 C_{N/2}): O(N log N)

(Figure: the worst-case recursion tree is a chain of depth N with subproblem sizes N, N−1, N−2, ..., 1, each level costing up to N; the best-case tree is balanced with depth log N, sizes halving at each level, each level costing N.)
Lower Bound for Sorting

Is it possible to sort an array of size N in fewer than N log N operations?
If you use element comparisons: it is impossible.
You need to model your computation problem:

You express each sort by a decision tree where each internal node represents the comparison between two elements
The left child corresponds to the negative answer and the right child to the positive one
Each leaf represents a given permutation


Overview

1 Searching
2 Hashing

Representing the decision tree model

Set to sort: {a1, a2, a3}; the corresponding decision tree is:

                     a1 > a2
                    /       \
             a2 > a3         a1 > a3
            /       \       /       \
    (a1,a2,a3)  a1 > a3  (a2,a1,a3)  a2 > a3
               /      \             /      \
      (a1,a3,a2) (a3,a1,a2) (a2,a3,a1) (a3,a2,a1)

The decision tree to sort N elements has N! leaves (all possible permutations)
A binary tree with N! leaves has a height of order log(N!), which is approximately N log N (Stirling)
N log N is a lower bound for sorting


Introduction to Searching

Searching: a fundamental operation in many tasks: retrieving a particular piece of information among a large amount of stored data
The stored data can be viewed as a set
Information is divided into records with a key field used for searching
Goal of Searching: find the records whose key matches a given searched key
Dictionaries and symbol tables are two examples of data structures needed for searching
Operations of Searching

The time complexity often depends on the structure given to the set of records (e.g. lists, sets, arrays, trees, ...)
So, when programming a searching algorithm on a structure, one often needs to provide operations like insertion, deletion and sometimes sorting of the set of records
In any case, the time complexity of the searching algorithm might be sensitive to operations like comparison of keys, insertion of one record in the set, shift of records, exchange of records, ...


Sequential Searching in a Sorted List is in O(N)

Sequential searching in a sorted list uses approximately N/2 comparisons for both a successful and an unsuccessful search
The (average) complexity of the successful search in sorted lists equals the successful search on an array in the average case
For unsuccessful searches:
The search can be ended by each of the elements of the list
We do 1 comparison if the searched key is less than the first element, ..., N + 1 comparisons if the key is greater than the last one (the sentinel)
(1 + ... + (N + 1))/N = (N + 1)(N + 2)/2N
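Sequential search over a sorted collection can be sketched as follows (a sketch with the list represented as a Ruby array; the slides' sentinel variant instead appends a key larger than any searched key, so its loop needs no bounds test):

```ruby
# Sequential search in a sorted array: stop as soon as an element >= key
# is reached, so an unsuccessful search can also end early.
def seq_search(sorted, key)
  i = 0
  i += 1 while i < sorted.length && sorted[i] < key
  (i < sorted.length && sorted[i] == key) ? i : nil
end
```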

Sequential Searching in an Array is O(N)

Sequential searching in an array uses
N + 1 comparisons for an unsuccessful search in the best, average and worst case
(N + 1)/2 comparisons for a successful search on average(1)
Suppose that the records have the same probability to be found:
We do 1 comparison to find the first one, ..., N to find the last one
On average: (1 + 2 + ... + N)/N = N(N + 1)/2N = (N + 1)/2

(1) average = mean = (sum of all the entries)/(number of entries)


An Elementary Searching Algorithm: the Binary Search

When the set of records gets large and the records are ordered to reduce the searching time, use a divide-and-conquer strategy:
Divide the set into two parts
Determine in which part the key might belong
Repeat the search on this part of the set
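The three divide-and-conquer steps above can be sketched as an iterative Ruby method (a sketch, not the lecture's own code):

```ruby
# Iterative binary search over a sorted array.
def binary_search(sorted, key)
  left, right = 0, sorted.length - 1
  while left <= right
    mid = (left + right) / 2
    case key <=> sorted[mid]
    when 0  then return mid       # found: successful search
    when -1 then right = mid - 1  # key can only be in the left half
    else         left  = mid + 1  # key can only be in the right half
    end
  end
  nil # unsuccessful search
end
```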
Application to numerical analysis

For finding an approximation of the zeroes of a continuous function by the

Theorem (Intermediate value theorem)
If the function f(x) = y is continuous on [a, b] and u is a number s.t. f(a) < u < f(b), then there is a c ∈ [a, b] s.t. f(c) = u.

If one can evaluate the sign of f((a + b)/2):
Let f be strictly increasing on [a, b] with f(a) < 0 < f(b)
The binary search allows one to find y s.t. f(y) = 0:
1 start with the pair (a, b)
2 evaluate v = f((a + b)/2)
3 if v < 0, replace a by (a + b)/2; otherwise replace b by (a + b)/2
4 iterate on the new pair until the difference between the values is less than an arbitrary given precision


Performance of Binary Search

Proof 1:
Consider the tree of the recursive calls of the search
At each call the array is split into two halves
The tree is a full binary tree
The number of comparisons equals the tree height: log₂ N

Proof 2:
The number of comparisons at step N equals the number of comparisons in one subarray plus 1, because you compare with the root
Solve the recurrence C_N = C_{N/2} + 1 for N ≥ 2 with C_1 = 0 → log N
With N = 2^n: C_{2^n} = C_{2^(n−1)} + 1 = ... ⇒ C_{2^n} = n = log₂ N
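The four steps of the zero-finding procedure can be sketched like this (a sketch assuming f is given as a block and is strictly increasing with f(a) < 0 < f(b)):

```ruby
# Bisection: halve the interval (a, b) until it is shorter than the
# requested precision eps, keeping the sign change inside it.
def bisect(a, b, eps)
  while b - a > eps
    m = (a + b) / 2.0
    if yield(m) < 0
      a = m   # the zero lies in (m, b)
    else
      b = m   # the zero lies in (a, m)
    end
  end
  (a + b) / 2.0
end

root = bisect(0.0, 2.0, 1e-9) { |x| x * x - 2 }  # approximates sqrt(2)
```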

Performance of Binary Search

Binary Search uses approximately log N comparisons for both successful and unsuccessful search in the best, average and worst case
The number of comparisons is maximal when the search is unsuccessful


Order of magnitude

Searching on the average case:
A successful sequential search in a set of 10000 elements takes 5000 comparisons
A successful binary search in the same set takes 14 comparisons
BUT inserting an element:
In an array takes 1 operation
In a sorted array takes N operations: to find the place and shift right the other elements
Elementary Searching Algorithm: Interpolation Searching

Dictionary search: if the word begins with B you look near the beginning, and if the word begins with T you turn a lot of pages.
Suppose you search for the key k. In the binary search you cut the array in the middle:

    middle = left + (1/2) × (right − left)

In the interpolation search you take the values of the keys into account by replacing 1/2 with a better progression:

    position = left + ((k − A[left].key) / (A[right].key − A[left].key)) × (right − left)


Outline

1 Searching
2 Hashing
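The position formula above can be sketched directly (a sketch using plain numeric values as keys, rather than records with a key field as in the slides):

```ruby
# Interpolation search in a sorted numeric array: probe the position
# estimated from the key values instead of the middle.
def interpolation_search(a, k)
  left, right = 0, a.length - 1
  while left <= right && k >= a[left] && k <= a[right]
    break if a[right] == a[left] # all remaining keys equal: check directly
    pos = left + ((k - a[left]) * (right - left)) / (a[right] - a[left])
    return pos if a[pos] == k
    if a[pos] < k then left = pos + 1 else right = pos - 1 end
  end
  left <= right && a[left] == k ? left : nil
end
```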

Performance of the Interpolation Search

The interpolation search uses approximately log(log N) comparisons for both (un)successful search in the array
But interpolation search heavily depends on the fact that the keys are well distributed over the interval
The method requires some computation; for small sets the log N of binary search is close to log(log N)
So interpolation search should be used for large sets, in applications where comparisons are particularly expensive, or for external methods where access costs are high


Hashing

Hashing is a completely different method of searching
The idea is to access the record in a table directly using its key - the same way an index accesses an entry in an array
We use a hash function that computes a table index from the key
Basic operations: insert, remove, search
Hashing

The steps in hashing:
1 compute a hash function which maps keys to table addresses
Since there are more records (N) than indexes (M) in the table, two or more keys may hash to the same table address: it's the collision problem
2 the collision resolution process
Good hash functions should uniformly distribute entries in the table
If the function uniformly distributes the keys, the complexity of searching is approximately divided by the table's size


Why does M have to be prime?

An example of hash function is

    hash(key) = (key[0] × (2^k)^0 + key[1] × (2^k)^1 + ... + key[n] × (2^k)^n) mod M

Suppose you choose M = 2^k; then
XXX mod M is unaffected by adding multiples of 2^k to XXX
hash(key) = key[0]: the hash only depends on the 1st char of the key
The simplest way to ensure that the hash function takes all the characters of a key into account is to take M prime
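The degenerate case can be checked numerically (a sketch; in the formula above key[0] carries the lowest weight, so with M = 2^k only the first character survives the modulo; the sample strings and moduli are chosen here for illustration):

```ruby
# The slides' polynomial hash: the i-th character's alphabet rank is
# weighted by (2^5)^i = 32^i, then reduced mod m.
def poly_hash(key, m)
  key.chars.each_with_index.sum { |c, i| (c.ord - 'A'.ord + 1) * 32**i } % m
end

# With m = 32, every term beyond the first is a multiple of 32:
poly_hash("AX", 32) == poly_hash("AZ", 32)  # true: only 'A' matters
# With a prime m, the other characters influence the index too:
poly_hash("AX", 31) == poly_hash("AZ", 31)  # false here
```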

Transform Keys into Integers in [[0, M − 1]]

If your key is already a large integer:
choose M to be a prime and compute key mod M
If your key is an uppercase character string:
encode each char in a 5-bit code (5 bits (2^5 = 32) are required to encode 26 items): each letter is encoded by the binary value of its rank in the alphabet
compute the modulo of the corresponding decimal value

Example
ABC ⇒ 00001 00010 00011 ⇒ 1 × (2^5)^2 + 2 × (2^5)^1 + 3 × (2^5)^0 = 1091 ⇒ 1091 mod M ⇒ table index


How to Handle the Collision Process

We have an array of size M - called the hash table - and a hash function which gives for any key a possible entry in this array
Problem: decide what to do when 2 keys hash to the same address
A first simple method is to build for each table entry a linked list of records whose keys hash to the same entry
Colliding records are chained together: we call it separate chaining
At initialization, the hash table is an array of M pointers to empty linked lists
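The ABC example can be reproduced with a short encoder (a sketch; here the first character is the most significant, as in the example, and M = 101 is an arbitrary prime chosen for illustration):

```ruby
# Pack each letter's alphabet rank (A=1, ..., Z=26) into 5 bits,
# first character most significant.
def encode(key)
  key.chars.inject(0) { |acc, c| acc * 32 + (c.ord - 'A'.ord + 1) }
end

encode("ABC")        # => 1091, as on the slide
encode("ABC") % 101  # table index for a prime table size M = 101
```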
Example

(Figure of a chained hash table; not recoverable from the extraction.)


Searching Performances

Good hash functions uniformly distribute entries over the table
Searching expected values are in O(α) (α = N/M, the table's filling rate):

Unsuccessful: Q⁻(M, N) = (1/M) Σ_{i=1..M} (1 + |L_i|) = 1 + α, since Σ|L_i| = N and the searched element ∉ L_i

Successful: searching for an element in the table costs the same as inserting it when only the elements inserted before it were already in the table:

    Q⁺(M, N) = (1/N) Σ_{i=0..N−1} Q⁻(M, i) = (1/N) Σ_{i=0..N−1} (1 + i/M) = 1 + α/2 − 1/(2M)

The interest of hashing is that it is efficient and easy to program

Searching a record in a Hash Table with linked lists

Main operation on a HashTable: search a record with its key:
compute the hash value of the key: hash(key) = i
access the linked list at position i: HashTable[i]
if there's more than your record in the list, you have collisions
searching becomes a search in a list: iterate on each record, comparing the keys
unsuccessful search: you iterate down the list without finding your record
Operations of insertion and removal of records in a Hash Table become linked list operations


Alternative proof for successful search

x_i is the i-th element inserted into the table and k_i = key[x_i]
X_ij = 1{h(k_i) = h(k_j)} for all i, j (indicator random variable)
Simple uniform hashing: Pr{h(k_i) = h(k_j)} = 1/M ⇒ E[X_ij] = 1/M
Expected number of elements examined in a successful search:

    E[(1/N) Σ_{i=1..N} (1 + Σ_{j=i+1..N} X_ij)]    (1)

Σ_{j=i+1..N} X_ij = number of elements inserted after x_i into the same slot as x_i.

    (1) = (1/N) Σ_{i=1..N} (1 + Σ_{j=i+1..N} E[X_ij]) = (1/N) Σ_{i=1..N} (1 + Σ_{j=i+1..N} 1/M)
        = 1 + (1/(NM)) Σ_{i=1..N} (N − i) = 1 + (1/(NM)) (Σ_{i=1..N} N − Σ_{i=1..N} i)
        = 1 + (1/(NM)) (N² − N(N+1)/2) = 1 + (N − 1)/(2M)
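A minimal separate-chaining table can be sketched as follows (a sketch: Ruby arrays stand in for the linked lists, and Ruby's built-in `hash` method stands in for the course's hash function):

```ruby
# Separate chaining: each of the M slots holds the list of
# [key, value] records whose keys hash there.
class ChainedHash
  def initialize(m)
    @m = m
    @table = Array.new(m) { [] }  # M pointers to empty lists
  end

  def index(key)
    key.hash % @m  # stand-in hash function
  end

  def insert(key, value)
    @table[index(key)] << [key, value]  # chain the colliding record
  end

  def search(key)
    pair = @table[index(key)].find { |k, _| k == key }
    pair && pair[1]  # nil on an unsuccessful search
  end

  def remove(key)
    @table[index(key)].reject! { |k, _| k == key }
  end
end
```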
Expected cost – interpretation

If N = O(M), then α = N/M = O(M)/M = O(1):
searching takes constant time on the average
insertion is O(1) in the worst case
deletion takes O(1) worst-case time for doubly linked lists
hence, all dictionary operations take O(1) time on average with hash tables with chaining


Searching and Inserting in Linear Probing

If the place HashTable[hash(key)] = i is already busy:
If the keys match, the search is successful
Else there is a collision: you search at the next place i + 1
If that place is free, the search is unsuccessful and you have found a place to insert your record
Else if the keys match, the search is successful
If the keys differ, try the next position i + 2
But be careful: the position after i is (i + 1) mod M
And check that the table is not full, otherwise the iteration won't terminate
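The probing rules above can be sketched like this (a sketch; it assumes the caller never fills the table completely, as the slide warns, and uses Ruby's built-in `hash` in place of the course's hash function):

```ruby
# Linear probing: open addressing in parallel arrays of size M,
# probing (i + 1) mod M until the key or a free place is found.
class ProbingHash
  def initialize(m)
    @m = m
    @keys = Array.new(m)  # nil marks a free place
    @vals = Array.new(m)
  end

  def insert(key, value)
    i = key.hash % @m
    i = (i + 1) % @m until @keys[i].nil? || @keys[i] == key
    @keys[i] = key
    @vals[i] = value  # caller must keep the table from filling up
  end

  def search(key)
    i = key.hash % @m
    @m.times do
      return nil      if @keys[i].nil?   # free place: unsuccessful
      return @vals[i] if @keys[i] == key # keys match: successful
      i = (i + 1) % @m                   # collision: try the next place
    end
    nil # table full and key absent
  end
end
```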

Another structure for Hash Table: Linear Probing

When the number of elements N can be estimated in advance, you can avoid using any linked list
You store the N records in a table of size M > N
Empty places in the table help you with collision resolution
This is called linear probing


Example

(Figure not recoverable from the extraction.)
Problem with Linear Probing

Suppose you would like to perform the operation of deletion
To delete an element in the Hash Table, you search it, remove it from the array, and the place is free again. Is it so simple?
Suppose key1 and key2 (different) hash to the same address i:
you insert key1 first, at position i
you try to insert key2 at position i, find it busy, and finally insert it at position i + 1
now you delete key1: the place i becomes free
you search for key2: it hashes to the free position i, so the search is unsuccessful, but key2 is in the table
A place may therefore have three statuses: free, busy and deleted


Eliminating the Clustering Problem

Instead of examining each successive entry, we use a second hash function to compute a fixed increment for the probe sequence (instead of using 1 as in linear probing)
Depending on the choice of the second hash function, the program may not work: obviously an increment of 0 leads to an infinite loop
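The second hash function can be sketched like this (a sketch; M = 13 and this particular h2 are arbitrary choices for illustration - the point is that the increment is never 0 and, with M prime, every probe sequence visits the whole table):

```ruby
M = 13  # prime table size, chosen for this sketch

# Probe sequence for double hashing: start at h1 and step by a
# key-dependent fixed increment h2 in 1..M-1 (never 0).
def probe_sequence(key, m = M)
  h1 = key.hash % m
  h2 = 1 + key.hash % (m - 1)  # second hash function, never 0
  (0...m).map { |t| (h1 + t * h2) % m }
end
```

Because m is prime and 1 ≤ h2 ≤ m − 1, gcd(h2, m) = 1, so the m probes are all distinct.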

Performances in Hash Table with linear probing

This hashing works because it guarantees that when you search for a particular key, you look at every key that hashes to the same table address
In linear probing, when the table begins to fill up, you also look at other keys: 2 different collision sets may be stuck together: the clustering problem
Linear probing is very slow when tables are almost full, because of the clustering problem
And when the table is full you cannot continue to use it


Conclusion on Hashing

Hashing is a classical problem in CS: various algorithms have been studied and are widely used
There are many empirical and analytic results that make the utility of hashing evident for a broad variety of applications
Hashing is preferred to binary tree searches for many applications because it is simple to implement and can provide very fast, constant searching times when space is available for a large enough table
Hashing in Ruby

    zip = Hash.new
    zip = {"06000" => "Nice", "06100" => "Nice", "06110" => "Le Cannet",
           "06130" => "Grasse", "06140" => "Coursegoules",
           "06140" => "Tourrettes sur Loup", "06140" => "Vence",
           "06190" => "Rocquebrune Cap Martin", "06200" => "Nice",
           "06230" => "Saint Jean Cap Ferrat",
           "06230" => "Villefranche sur Mer"}

(A Ruby hash keeps a single value per key, so the repeated keys "06140" and "06230" retain only the last value given for them.)

    zip["06300"] = "Nice" # adds a new entry

    zip.keys   => ["06140", "06130", "06230", "06110", "06000", "06100",
                   "06200", "06300", "06190"]

    zip.values => ["Vence", "Grasse", "Villefranche sur Mer", "Le Cannet",
                   "Nice", "Nice", "Nice", "Nice", "Rocquebrune Cap Martin"]

    zip.select { |key,val| val == "Nice" } => [["06000", "Nice"],
                   ["06100", "Nice"], ["06200", "Nice"], ["06300", "Nice"]]

    zip.index "Nice" => "06000"

    zip.each {|k,v| puts "#{k}/#{v}"} =>
    06140/Vence
    06130/Grasse
    06230/Villefranche sur Mer
    06110/Le Cannet
    06000/Nice
    06100/Nice
    06200/Nice
    06300/Nice
    06190/Rocquebrune Cap Martin