0 Up votes0 Down votes

8 views32 pagesOct 28, 2010

© Attribution Non-Commercial (BY-NC)

PDF, TXT or read online from Scribd

Attribution Non-Commercial (BY-NC)

8 views

Attribution Non-Commercial (BY-NC)

- Google Interview Preparation
- Ppt
- Unit 6
- Unit6- Searching Techniques
- 05987184
- An Efficient Approach of Fast External Sorting Algorithm in Data Warehouse
- Cmpt 125 Study Notes
- Algorithm Book
- 09 Parallel II 11 02 Ann
- MELJUN CORTES Midterm Exam ITC16
- lec01_algorithmanalysis
- [IJCST-V3I5P21]:Niharika Singh, DR. R. G. Tiwari
- 00.Syllabus
- Google Interview Preparation guide
- CUDPP Slides
- quiz1
- Tutorial 2 (1)
- Study Guide
- Lesson1-5.5.pdf
- 1150 Ishaan

You are on page 1of 32

Parallel Computing

George Karypis

Sorting

Outline

Background

Sorting Networks

Quicksort

Bucket-Sort & Sample-Sort

Background

Input Specification

Each processor has n/p elements

A ordering of the processors

Output Specification

Each processor will get n/p consecutive elements of

the final sorted array.

The “chunk” is determined by the processor ordering.

Variations

Unequal number of elements on output.

In general, this is not a good idea and it may require a shift to

obtain the equal size distribution.

Basic Operation:

Compare-Split Operation

Single element per processor

Sorting Networks

Sorting is one of the fundamental problems in

Computer Science

For a long time researchers have focused on the

problem of “how fast can we sort n elements”?

Serial

nlog(n) lower-bound for comparison-based sorting

Parallel

O(1), O(log(n)), O(???)

Sorting networks

Custom-made hardware for sorting!

Hardware & algorithm

Mostly of theoretical interest but fun to study!

Elements of Sorting Networks

Key Idea:

Perform many comparisons in

parallel.

Key Elements:

Comparators:

Consist of two-input, two-output

wires

Take two elements on the input

wires and outputs them in sorted

order in the output wires.

Network architecture:

The arrangement of the

comparators into interconnected

comparator columns

similar to multi-stage networks

developed.

Bitonic sorting network

Θ(log2(n)) columns of

comparators.

Bitonic Sequence

graphically represented 6

4

by lines as follows: 2

1

0

Why Bitonic Sequences?

A bitonic sequence can be “easily” sorted in

increasing/decreasing order.

Bitonic

Split

s s1 s2

• Both s1 and s2 are bitonic sequences.

• So how can a bitonic sequence be sorted?

An example

Bitonic Merging Network

A comparator network that

takes as input a bitonic

sequence and performs a

sequence of bitonic splits

to sort it.

+BM[n]

A bitonic merging

network for sorting in

increasing order an n-

element bitonic

sequence.

-BM[n]

Similar sort in decreasing

order.

Are we done?

Given a set of elements, how do we re-arrange them into

a bitonic sequence?

Key Idea:

Use successively larger bitonic networks to transform the set into

a bitonic sequence.

An example

Complexity

How many columns of

comparators are required

to sort n=2l elements?

i.e.,

depth d(n) of the

network?

Bitonic Sort on a Hypercube

One-element-per-processor case

How do we map the algorithm onto a hypercube?

What is the comparator?

How do the wires get mapped?

pairs of wires that are inputs

to the various comparators?

Illustration

Communication Pattern

Algorithm

Complexity?

Bitonic Sort on a Mesh

One-element-per-processor case

How do the wires get mapped?

Why?

Row-Major Shuffled Mapping

Complexity?

Can we do better?

What is the lowest bound of sorting on a mesh?

communication performed by each process

More than one element per

processor

Hypercube

Mesh

Bitonic Sort Summary

Quicksort

Parallel Formulation

How about recursive decomposition?

Is it a good idea?

We need to do the partitioning of the array around

a pivot element in parallel.

What is the lower bound of parallel

quicksort?

What will it take to achieve this lower bound?

Optimal for CRCW PRAM

One element per processor

Arbitrary resolution of the concurrent writes.

Views the sorting as a two-step process:

(i) Constructing a binary tree of pivot elements

(ii) Obtaining the sorted sequence by performing an inorder

traversal of this binary tree.

Building the Binary Tree

Complexity?

Practical Quicksort

Shared-memory

Data resides on a shared array.

During a partitioning each

processor is responsible for a

certain portion.

Array Partitioning:

Select & Broadcast pivot.

Local re-arrangement.

Is this required?

Global re-arrangement.

Efficient Global Rearrangement

Practical Quicksort

Complexity

personalized communication is not cross-bisection bandwidth limited.

A word on Pivot Selection

Selecting pivots that lead to balanced

partitions is importance

height of the tree

effective utilization of processors

Sample Sort

Generalization of bucket sort with data-driven sampling

n/p elements per-processor.

Each processor sorts is local elements.

Each processor selects p-1 equally spaced elements from its

own list.

The combined p(p-1) set of elements are sorted and p-1 equally

spaced elements are selected from that list.

Each processor splits its own list according to these splitters into

p buckets.

Each processor sends its ith bucket to the ith processor.

Each processor merges the elements that it receives.

Done.

Sample Sort Illustration

Sample Sort Complexity

Assumes

a serial sort

- Google Interview PreparationUploaded byNelliud D. Torres
- PptUploaded byHamsa Girish
- Unit 6Uploaded byHarsha Naidu
- Unit6- Searching TechniquesUploaded byBabu
- 05987184Uploaded bySahilYadav
- An Efficient Approach of Fast External Sorting Algorithm in Data WarehouseUploaded byJournal of Computing
- Cmpt 125 Study NotesUploaded byMonique Lai
- Algorithm BookUploaded bybcdchetan
- 09 Parallel II 11 02 AnnUploaded byKrutarth Patel
- MELJUN CORTES Midterm Exam ITC16Uploaded byMELJUN CORTES, MBA,MPA
- lec01_algorithmanalysisUploaded byElcabrito
- [IJCST-V3I5P21]:Niharika Singh, DR. R. G. TiwariUploaded byEighthSenseGroup
- 00.SyllabusUploaded byelemaniaq
- Google Interview Preparation guideUploaded bySp Uday
- CUDPP SlidesUploaded bybkr
- quiz1Uploaded byEshan Gupta
- Tutorial 2 (1)Uploaded bystacy
- Study GuideUploaded byYazan Aswad
- Lesson1-5.5.pdfUploaded byProjectlouis
- 1150 IshaanUploaded byIshaan
- Deep Sort 2014Uploaded byArpit Desai
- IEEE A4 FormatUploaded bySultan Khan
- Bitonic SortUploaded byIonutz Gabriel
- slides6-analysis.pdfUploaded by252966576
- SDA2_ern.pptUploaded byToni Tabi Seunghyun
- 08a-DivideAndConquerUploaded byfrankjamison
- 07 SortingUploaded byPradipta Paul
- Sorting Algorithms-Sesion 6Uploaded byLuigi Garcia Cueva
- p1Uploaded byshreya1402
- 12 SortingUploaded byapi-3799621

- Crossbar NwUploaded byAbhishek Aggarwal
- Bi Tonic SortUploaded bybuenoandre
- Sequential and Parallel Sorting AlgorithmsUploaded bykesharwanidurgesh
- 3sumUploaded byAakash Parkash
- Switching and Queuing Delay Models 240Uploaded byrohit
- Sorting Bitonic SortUploaded byhonestman_usa
- Cole Parallel Mergesort SIAM 1988Uploaded bydrakseh
- lect10dasfasUploaded bySilverius Irwan
- Billion-scale Similarity Search With GPUsUploaded byergodic
- Parallel Algorithm Lecture NotesUploaded byDeepti Gupta
- Shell SortUploaded byAmit Swami
- Unplugged ActivityUploaded bypinocchio

## Much more than documents.

Discover everything Scribd has to offer, including books and audiobooks from major publishers.

Cancel anytime.