Discussion, explanation, and description of the binar sort algorithm. The algorithm is a linear, general-purpose sorting algorithm. Version 2 of the original binar sort algorithm monograph.


by

William F. Gilreath

(will@williamgilreath.com)

May 2011

Binar Sort: A Linear Generalized Sorting Algorithm

Abstract

Sorting is a common and ubiquitous activity for computers. It is not surprising that there exists a

plethora of sorting algorithms. For all the sorting algorithms, it is an accepted performance limit

that sorting algorithms are linearithmic O(N lg N). The linearithmic lower bound in performance

stems from the fact that the sorting algorithms use the ordering property of the data. The sorting

algorithm uses comparison by the ordering property to arrange the data elements from an initial

permutation into a sorted permutation.

Linear O(N) sorting algorithms exist, but they use a priori knowledge of the data to exploit a specific property, and thus have greater performance. In contrast, the linearithmic sorting algorithms are generalized by using a universal property of data, relative ordering by comparison, but have a linearithmic performance lower bound. The tradeoff in sorting algorithms is generality for performance, determined by the property chosen to sort the data elements.

A general-purpose, linear sorting algorithm seems implausible in the context of the tradeoff of performance for generality. There is an implicit assumption that only the property of ordering is universal. But ordering is not the only universal property of data elements. The binar sort is a general-purpose sorting algorithm that utilizes this other universal property to sort with linear performance.

Keywords: comparison, comparison sorting, encoding, ordering, linear, linearithmic, sort, sorting, universal property.


1. Introduction

A. Sorting

Sorting is a frequent, ubiquitous computational activity on computers, often in tandem with other algorithms such as search. The interest in and ubiquity of sorting have led to the development of a diverse number of sorting algorithms. But this raises the question: “What exactly is sorting?”

Sorting Formalism

A sorting formalism independent of any specific sorting algorithm defines sorting without a specific method to sort a collection of data elements. Knuth defines sorting [Knuth 1998] as:

The records:

$$R_1, R_2, \ldots, R_N$$

are supposed to be sorted into non-decreasing order of their keys $K_1, K_2, \ldots, K_N$, essentially by discovering a permutation p(1) p(2)…p(N) such that:

$$K_{p(1)} \le K_{p(2)} \le \cdots \le K_{p(N)}$$

Knuth’s definition of sorting defines sorting in terms of the end result, a specific permutation out of the N! permutations possible for a collection of records. The definition is more mathematical than algorithmic. It describes the resulting organization among the data elements mathematically, but not the mechanism that produces the result. The actual means is the specific sorting algorithm.
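As a small illustration of the definition, the following Python sketch finds a permutation p of the record indices and checks the non-decreasing condition on the keys. The function names and the sample keys are hypothetical, indices are zero-based rather than one-based, and Python's built-in `sorted` is used only to obtain some valid permutation; it says nothing about how the binar sort works.

```python
def sorting_permutation(keys):
    """Return indices p such that keys[p[0]] <= keys[p[1]] <= ... (zero-based)."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def is_nondecreasing(keys, p):
    """Check the condition K_p(1) <= K_p(2) <= ... <= K_p(N) for permutation p."""
    return all(keys[p[i]] <= keys[p[i + 1]] for i in range(len(p) - 1))


keys = [0xB, 0x4, 0x0, 0x7]          # illustrative keys only
p = sorting_permutation(keys)
print(p, is_nondecreasing(keys, p))
```

The definition constrains only the result: any algorithm that produces a permutation passing `is_nondecreasing` has sorted the records.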

Linearithmic Sorting

Most sorting algorithms are comparison-based sorting algorithms. The comparison sorting algorithms include such algorithms as the merge sort, quick sort, and heap sort. These sorting algorithms use comparison to arrange the elements in the sorted permutation, and are general-purpose in nature.

The comparison sorting algorithms have a well-known theoretical [Johnsonbaugh and Schaefer 2004] performance limit: a lower bound for sorting that is linearithmic, or O(N lg N), in complexity. This theoretical lower bound follows from the comparison sorting algorithm using comparison to arrange the data elements. A decision tree that distinguishes the N! possible permutations of N elements has at least N! leaves, so its height is at least lg(N!), which grows as N lg N. Informally, each element costs on the order of lg N comparisons to place, and there are N elements. Hence the theoretical lower bound is the product of the two costs involved, O(N lg N), which is linearithmic complexity.


Linear Sorting

Most, but not all, sorting algorithms are linearithmic and use comparison as the operation to organize the data elements. There are also linear sorting algorithms, of O(N) complexity, that do not use comparison. Such linear algorithms include the radix sort, hash sort, and counting sort.

Linear sorting algorithms use a priori knowledge of the data elements, a specific property that is required to sort linearly. The linear sorting algorithms use a function to map each data element to its position using that specific property.
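To make the role of a priori knowledge concrete, here is a minimal counting sort sketch in Python. The parameter `max_value` is the assumed prior knowledge (every element is an integer in the range 0 to `max_value`); the function name and sample data are illustrative and not taken from the monograph.

```python
def counting_sort(values, max_value):
    """Sort non-negative integers in O(N + max_value) time without comparisons.

    max_value is the a priori knowledge: every value must lie in 0..max_value.
    """
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1                      # map each element directly to its bucket
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)      # emit each value as often as it was seen
    return result


print(counting_sort([3, 1, 4, 1, 5], max_value=5))
```

Without the bound `max_value` the bucket table cannot be allocated, which is why such an algorithm is tied to a specific property of the data.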

The linear sorting algorithms are not general-purpose, as they are tied to a specific property. Compared to (no pun intended) linearithmic sorting algorithms, the linear algorithms are special-purpose. In sorting algorithms, there is the classic engineering tradeoff of performance with generality for linear with linearithmic, respectively.

Sorting is algorithmic, but the algorithm has a key operation based upon a property of the data elements. Linearithmic sorting algorithms use a comparison operation, and that operation is based upon the universal property of ordering. Linear sorting algorithms use a mapping operation based upon a specific property of the data elements.

The tradeoff of generality for performance in sorting algorithms is treated as an absolute. But it rests on an implicit presumption that ordering is the only universal property of data elements. However, ordering is not the only universal property of data elements.

B. Data

Long before computers were invented and constructed, data existed in quantity. Files, cards, and records were kept as the industrial revolution led to a revolution in data: production figures, records on employees, and government records for taxes, the census of the population, and so forth. Hence modern civilization is characterized by the growth of information in the form of data, in a variety of forms for different and varied uses. Now there is a vast, dizzying ocean of data, with more information accumulated, analyzed, mined, and processed.

Data on a Computer

In order to use data on a computer, the data must be represented in a form the computer can work with natively. Roth explains, “...computers require processing of data which contains numbers, letters, and other symbols such as punctuation marks. In order to transmit such alphanumeric data to or from a computer, or store it internally, in a computer, each symbol must be represented by a binary code.” [Roth 1992, pp. 13-14]

Data comes in the form of integers, ordinals, characters, strings, reals, and Boolean values. Computers are binary, and only work on binary numbers, the binary digits or bits of a binary number. Hence an integer must be represented in binary, a Boolean value as binary, and all others as a binary number.

Encoding Data

To represent data on a computer as a binary number, or more generally in binary form, the data must be encoded. Every kind of data (integers, ordinals, reals, characters, strings, and Booleans) must be encoded.

Floyd explains, “In order to communicate, we need not only numbers, but also letters and other symbols. In the strictest sense, alphanumeric codes are codes that represent numbers and alphabetic characters (letters). Most such codes however, also represent symbols and various instructions necessary for conveying information.” [Floyd 1990 p. 81].

Encoding is the system of representing each data type in binary form by a consistent method. Many varieties of encoding exist, some from well before computers were invented, such as Morse code or the Baudot code. Other encodings are for efficient communication, such as Gray code and Excess-3, or for error detection, such as Hamming code. [Floyd 1990 p. 80] Null and Lobur summarize encoding as “...how these internal values can be converted to a form that is meaningful to humans. The manner in which this is done depends on both the coding system used by computer and how the values are stored and retrieved.” [Null and Lobur 2003 p. 68]

Different encoding standards are used, such as ASCII code [ANSI 1986], UTF-8 [Yergeau 2003], and Unicode [Unicode 2006] for characters, IEEE-754-1985 [IEEE 1985] for floating point numbers, and two’s complement for integers. Encoding is universal because computers require a ubiquitous form of representation to interchange data. But encoding is perceived as a hardware feature, which is the reason encoding is not readily apparent as a universal property. Data (from the software perspective) automatically exists in the computer, and the details of the data are a hardware, not software, focus. An integer or Boolean exists in the programming language, and thus in the computer, but the fact that it is specifically encoded into a binary number is abstracted away.

2. real (as float or double) - IEEE-754-1985

3. Boolean - binary word (each bit is a Boolean value of true or false)

4. character - ASCII, EBCDIC, Unicode, UTF-8 (string is an array or linear structure of

characters)
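The encodings listed above can be inspected directly. The following Python sketch prints the bit patterns for a two's complement integer, an IEEE 754 double, and a UTF-8 character. The helper names and the 8-bit integer width are assumptions chosen for illustration; only standard-library calls (`struct.pack`, `str.encode`, `format`) are used.

```python
import struct


def bits_of_int(value, width=8):
    """Two's complement bit string of a signed integer in `width` bits."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def bits_of_double(value):
    """IEEE 754 double-precision bit string, most significant byte first."""
    return "".join(f"{byte:08b}" for byte in struct.pack(">d", value))


def bits_of_text(text, encoding="utf-8"):
    """Bit string of each byte of the encoded text, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


print(bits_of_int(5))        # 00000101
print(bits_of_int(-5))       # 11111011
print(bits_of_double(1.0))   # sign 0, exponent 01111111111, fraction all zeros
print(bits_of_text("A"))     # 01000001
```

Whatever the data type, the element reduces to a fixed sequence of bits, which is the property the rest of the discussion relies on.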

The ordering property, in the form of a comparison function or operator, is not the only universal property in data. Frequently in object-oriented software development, a class must define a comparison method to determine the ordering of instances of the class as objects. From a mathematical perspective, numbers such as real numbers and integers have an ordering, but from an algorithmic perspective the ordering is not always given. Thus ordering is not the only universal property, and the tradeoff in sorting algorithms of performance to generality is not immutable.

The question is then: “What is the alternate universal property of data?” The answer seems obvious once stated, but is easily overlooked. The answer is the encoding, that is, how a data element is represented in the computer.

The basic data elements of integers, reals, and characters are all encoded in a binary form on a

computer. Computers are binary in operation and the representation or encoding is in bits--binary

digits. But for computers to exchange and share data the representation must be agreed upon--the

encoding, particularly for characters and strings of characters.

Given that the encoding is universal in a binary form, the germane question is: “How does this

relate to a sorting algorithm?” Simply put, finding another universal property allows for

generality, and encoding is not ordering. Hence the algorithm using encoding is not comparison

sorting, and the linearithmic O(N lg N) lower bound is inapplicable.

Utilizing the encoding, a sorting algorithm can map the data elements instead of comparing one data element to another, which is the basis for a general-purpose linear sorting algorithm. A function or operator is needed that uses the encoding to arrange the data elements from an initial random permutation into a sorted permutation. The sorting algorithm that does so is the binar sort algorithm.

Ubiquity of Encoding

An obvious realization is that while Unicode is widely used, there is no one, single, universal encoding of data, particularly for character data. It would seem that using the encoding as a universal property to sort upon is contradictory, as there is no universal encoding. No universal encoding implies that there is no universal property, hence that the sorting is not general-purpose.

While there is no one universal encoding for data, all the various encodings can be translated or converted to one another. Even a proprietary encoding can be translated, out of the need to work with the bulk of existing software, unless the originator of the proprietary encoding intends to implement all the various forms of software to work with the encoding. Translation and conversion are a necessity for the use of existing software. No matter how elegant and efficient an encoding, the bulk of software exists using standardized encodings of data.

The universality is that all encodings are translatable to one another, so encoding is universal in that each encoding can be used on a different platform or system.

More formally, for a given encoding domain $E_x$ there exists a translation function $F_{xy}$ that translates the encoding domain $E_x$ into the encoding range $E_y$. A character or glyph $c_{E_x}$ in encoding $E_x$ is mapped by $F_{xy}$ into the encoding $E_y$ as a character $c_{E_y}$.

When $c_{E_x} \in E_x$ and $c_{E_y} \in E_y$, then for $F_{xy}$ it follows that $F_{xy} : E_x \to E_y$, and therefore $F_{xy}(c_{E_x}) = c_{E_y}$.

When an encoding is a subset of another, larger encoding, translation for the subset (a partial mapping function for $E_x$ mapped by $F_{xy}$ to $E_y$) is still possible, with all characters outside the subset mapped to a default character in the range encoding.
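A translation function with a default character can be sketched in a few lines of Python. The function name `translate` and the choice of `?` as the default character are assumptions made for illustration; `cp037` is one of the EBCDIC codecs shipped with Python's standard library.

```python
def translate(text, target_encoding, default="?"):
    """Re-encode text, substituting `default` for characters the target lacks."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(target_encoding)
        except UnicodeEncodeError:
            # Character lies outside the target encoding: map it to the default.
            out += default.encode(target_encoding)
    return bytes(out)


print(translate("naïve", "ascii"))   # the non-ASCII letter becomes the default
print(translate("A", "cp037"))       # the same letter in an EBCDIC code page
```

The first call shows the partial mapping onto a smaller encoding; the second shows the same character taking a different bit pattern under a different encoding.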


2. Algorithm Synopsis

The binar sort partitions the data elements of the initial array into lower and upper sub-arrays. The algorithm extracts the nth bit from a given data element, and then uses that bit to map the element into the lower or upper sub-array.

The two sub-arrays are created within the original array, starting at the extreme endpoints and progressing towards each other in the middle of the initial array.

The process continues on each created sub-array using the next bit position in the data elements. This process continues until either the last bit in the data element is reached, or the size of a sub-array is one element.

At the base case for termination, the resulting array cannot be partitioned into more sub-arrays, or there are no more bits for extraction. The original array is now in a sorted permutation.


3. Operation of Algorithm

The binar sort algorithm operates both iteratively and recursively in place on the original array of

data elements.

1. Evaluate the passed parameters for termination.

2. Initialize starting array bounds.

3. Partition the array into sub-arrays.

4. Determine recursive call on sub-arrays.

The first step is to evaluate the passed parameters for termination of the algorithm, to determine

if a recursive base case has been reached. When one of the two possible base cases of the binar

sort is reached, the recursion terminates, and the call returns without any further operation on the

passed parameters.

The two criteria for the base case of the recursion are:

1. The last bit of the data element has been reached.

2. The size of a sub-array is one element.

The first case is reaching the end of the bits of a data element. In effect, there are no more bits to extract to use for partitioning. The second case is the check of the bounds of the array parameter for any elements to partition into a sub-array. For an array of one element, there is no point in partitioning, as the element is in its final position. In either case, the operation of the algorithm terminates.

When the binar sort algorithm does not terminate, the operation proceeds to initialization. The passed array bounds are copied into working variables, and the passed bounds themselves are retained for use later in the operation of the algorithm. The working variables track the changing boundaries of the lower and upper sub-arrays during partitioning. After initializing the working boundaries, the operation proceeds to partition.

The heart of the binar sort algorithm is the continual partitioning of the original array. The partition step of the algorithm divides the elements of the original array into lower and upper sub-arrays.

1. Extract the nth bit from the data element in selected position.

2. Map the data element in the correct sub-array.

3. Adjust sub-array boundary to encompass element.

The initial start position used in the operation is the lower index position of the array. The starting position acts as the locus of partitioning the array. The data element at the locus position is the working element mapped into a sub-array.

Bit Extraction

Before partitioning the working data element, the nth bit is extracted to determine the sub-array to which the element maps. Bit extraction uses a shift operation to the left by n bits, and a bit mask. The bit mask is a literal value that depends on the data element word size in bits, and selects the most significant bit of the word. The result is either all bits zero, or the value of the literal bit mask.

For a data element of a nybble, or 4 bits, the bit mask is hexadecimal 0x8, or binary 1000₂, an integer value of 8.

The bit is extracted from the shifted data element by applying the bit mask with a bitwise logical AND operation.

The resulting value for a nybble is either hexadecimal 0x0 (binary 0000₂, the integer 0), or hexadecimal 0x8 (binary 1000₂, the integer 8), which is non-zero. Thus the result of a bit extraction for a data element is essentially a zero or non-zero (the literal bit mask) value.
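The extraction described here can be written as a one-line function. This is a minimal Python sketch for the 4-bit case; the names `WORD_BITS`, `MASK`, and `extract_bit` are hypothetical, and bit position 0 is assumed to mean the most significant bit.

```python
WORD_BITS = 4                   # a nybble
MASK = 1 << (WORD_BITS - 1)     # 0x8, selects the most significant bit


def extract_bit(element, n):
    """Shift left by n bits, then mask: the result is 0 or MASK (non-zero)."""
    return (element << n) & MASK


# 0xB is binary 1011, so the bits from most to least significant are 1, 0, 1, 1.
print([extract_bit(0xB, n) for n in range(WORD_BITS)])   # [8, 0, 8, 8]
```

Only the zero versus non-zero distinction matters to the partitioning step; the value 8 is simply the mask itself.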

Placement of Element

The placement of the data element depends upon the bit value from the bit extraction. The data element selected is always at the boundary of the lower sub-array, so depending on the extracted bit value there are two possibilities:

1. Data element is in the correct position in the lower sub-array, extend sub-array boundary.

2. Data element is in the wrong position, exchange with the element in the upper sub-array.

With an extracted bit value of zero, the data element is in the correct position in the lower sub-array, and nothing is exchanged. For a non-zero extracted bit value, the data element is exchanged or swapped with the data element at the boundary of the upper sub-array. In both cases, the array bounds are then adjusted after the data element is placed.

Once the data element is placed in the correct sub-array, the sub-array boundaries are adjusted to encompass the element. For the lower sub-array, the boundary index is incremented, and for the upper sub-array, the boundary index is decremented. The lower and upper array boundaries approach each other as each data element is partitioned.
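One partitioning pass over a single bit position can be sketched as follows. This is an illustrative Python reading of the description, not the author's code: the function name, the parameter names, the return value, and the sample nybble values are all assumptions.

```python
def partition(array, lower, upper, n, word_bits=4):
    """Partition array[lower..upper] in place on bit n (0 = most significant).

    Returns the index of the first element of the upper sub-array.
    """
    mask = 1 << (word_bits - 1)
    while lower <= upper:
        if ((array[lower] << n) & mask) == 0:
            # Bit is zero: element already belongs to the lower sub-array.
            lower += 1
        else:
            # Bit is non-zero: exchange with the element at the upper boundary.
            array[lower], array[upper] = array[upper], array[lower]
            upper -= 1
    return lower


nybbles = [0xB, 0x4, 0x0, 0x0, 0x7, 0xA, 0xC, 0xE]   # illustrative values
split = partition(nybbles, 0, len(nybbles) - 1, n=0)
print([format(x, "X") for x in nybbles], split)
```

Each loop iteration either advances the lower boundary or retreats the upper boundary by one, so a pass over m elements performs exactly m bit extractions.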

Partition Repetition

The partitioning process continues iteratively for each data element in the array. The iterative process continues until the array bounds cross, at which point all the elements are partitioned into the sub-arrays for a bit position. The remaining step is to determine the recursive continuation of the operation of the algorithm.

The last step of the binar sort is recursive continuation of the algorithm on the result of partitioning. The partitioning creates one or two sub-arrays. With one sub-array, no partitioning occurred, which is pass through. With two sub-arrays, the partitioning process successfully divided the data elements of the original array.

The continuation of the algorithm must determine the results of partitioning. The original array bounds are evaluated against the index of the sub-array bounds after partitioning, and partitioning continues recursively on each resulting sub-array with the next bit position.
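Putting the four steps together gives a compact recursive routine. The sketch below is one possible Python rendering under stated assumptions: keys are non-negative integers that fit in `word_bits` bits, bit position 0 is the most significant bit, and all names are hypothetical rather than taken from the author's Java, C#, or C implementations.

```python
def binar_sort(array, lower=0, upper=None, n=0, word_bits=4):
    """Sort array[lower..upper] in place by partitioning on successive bits."""
    if upper is None:
        upper = len(array) - 1
    # Step 1: base cases. No bits left, or at most one element.
    if n >= word_bits or lower >= upper:
        return
    # Step 2: working copies of the bounds; the passed bounds are retained.
    lo, hi = lower, upper
    mask = 1 << (word_bits - 1)
    # Step 3: partition on bit n.
    while lo <= hi:
        if ((array[lo] << n) & mask) == 0:
            lo += 1
        else:
            array[lo], array[hi] = array[hi], array[lo]
            hi -= 1
    # Step 4: recurse on the lower and upper sub-arrays with the next bit.
    # An empty or single-element side returns immediately (pass through
    # leaves one side empty and the other covering the whole range).
    binar_sort(array, lower, hi, n + 1, word_bits)
    binar_sort(array, lo, upper, n + 1, word_bits)


data = [0xB, 0x4, 0x0, 0x0, 0x7, 0xA, 0xC, 0xE]   # illustrative values
binar_sort(data)
print([format(x, "X") for x in data])
```

Signed integers, reals, and strings would need their bit patterns mapped so that bit order matches the intended ordering; that mapping is outside this sketch.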

The binar sort algorithm works on different data types (such as character, integer, ordinal, real, and string) of different data bit sizes. The word size of the data element is a constant c of the algorithm. In illustration of the algorithm, a nybble, or a 4-bit word size (a single hexadecimal digit), is used.

Consider an initial array of eight nybble values to sort into an ordered permutation. The bit mask for the bit extraction is hexadecimal 0x8 (binary 1000₂), or an integer value of 8.

The first pass at partitioning divides the original array (in the center) into a lower sub-array (on the left) and an upper sub-array (on the right). Initially, before any partitioning, the configuration of the array, lower sub-array, and upper sub-array is:

The selected element is 'B', and the bit extracted is from the most significant position, which is non-zero. The element is not in the correct position, so it is swapped with the element at the boundary of the upper sub-array, the 'E' element. After the exchange and array boundary adjustment, the configuration is:

The selected element is 'E', and the bit extracted is non-zero. Again, the element is not in the correct position, so it is swapped with the 'C' element. The configuration is:

The selected element is 'C', and the bit extracted is non-zero. The element is not in the correct position, and so is swapped with the 'A' element. The configuration is:

The selected element is 'A', and the bit extracted is non-zero. The element is not in the correct position, and so is swapped with the '7' element. The configuration is:

The selected element is '7', and the bit extracted is zero. The element is in the correct position, and so only the array bounds are adjusted. The configuration is:

The selected element is '4', and the bit extracted is zero. Again, the element is in the correct position, and so only the array bounds are adjusted. The configuration is:

The selected element is '0', and the bit extracted is zero. The element is in the correct position, and so only the array bounds are adjusted. The configuration is:

Again the selected element is '0', and by the same process the final configuration is:

The algorithm continues on each sub-array as an array, further partitioning them on the next bit position.

The partitioning of the lower sub-array is on the second bit position. As before, the initial configuration is:

The selected element is '7', and the extracted bit is non-zero. The element is placed with an exchange, and the configuration is:

The selected element is '0', and the extracted bit is zero. The element is in the correct position, and the configuration is:

The selected element is '4', and the extracted bit is non-zero. The element is placed with an exchange, and the configuration is:

The selected element is '0', and the extracted bit is zero. The element is swapped into position (with itself, as there is only one element), and the configuration is:

The resulting lower and upper sub-array elements are in the correct positions. Partitioning nonetheless continues, but as pass through, resulting in only one sub-array. The remaining third and fourth bit positions are evaluated, and the recursion terminates when the last bit position is reached.

The partitioning of the upper sub-array is on the second bit position. As before, the initial configuration is:

The selected element is 'A', and the extracted bit is zero. The element is in the correct position,

and the configuration is:


The selected element is 'C', and the extracted bit is non-zero. The element is swapped into the

correct position, and the configuration is:

The selected element is 'B', and the extracted bit is zero. The element is in the correct position,

and the configuration is:

Lastly, the selected element is 'E', and the extracted bit is non-zero. The element is swapped (with itself, as there is only one element) and the array bounds are adjusted. The final configuration is:

For the lower sub-array, on the third and fourth bits the elements are already in the correct position, the case of pass through. For the upper sub-array, the elements are positioned on the third bit, but then the recursion terminates, as the sub-array size is one element.

The partitioning process starts from an initial array of nybbles, or hexadecimal digits, and continues until the end of the last bit (pass through), or until the sub-array is only one element in size. Consider the array configuration into sub-arrays for each bit position, starting from an initial array. The configuration for each pass is:

On the partitioning with the second and third bit, the sub-array with [E C] is partitioned into

two sub-arrays of a single element. Thus, for the remaining bit positions, no partitioning occurs

as the recursion terminates.

However, the other sub-arrays are the case of pass through, where no data elements are partitioned. This continues until the last bit position is reached, and the recursion terminates. Both recursive base cases are used in the overall partitioning process.

After the last bit position is evaluated for the three sub-arrays that are partitioned but result in

pass through, the overall array is in the final configuration of:


The configuration is, by Knuth's formal definition, a sorted permutation of the original array of

elements. The partitioning on the sub-arrays operates in place, within the index boundaries for

sub-arrays within the overall array.

5. Analysis of Algorithm

The binar sort is linear in both space and time complexity. The algorithm requires no extra storage space or memory, and performance time is proportional to the number of data elements.

Space Analysis

The binar sort is an in-place sorting algorithm; hence the same initial array is used for each recursive invocation of the algorithm for partitioning. The sub-arrays are determined by the passed parameters as index boundaries within the array. The only change is in the variables for each recursive call of the binar sort.

The array of N elements and a constant number of variables c for each recursive invocation is the space complexity S:

O(S) = N + c

As there are multiple invocations through recursion, the constant is the sum over each recursive call, or for i total recursive calls:

$$O(S) = N + \sum_{j=0}^{i-1} c_j$$

$$\sum_{j=0}^{i-1} c_j = i\,c_j = c$$

Effectively the sum of the constants c_j for i recursive calls is simply another constant c, because the number of recursive calls active at any one time is bounded by the constant number of bits in a data element. In Big-Oh notation it remains a constant c, so the space complexity for the binar sort is:

O(S) = N + c = O(N)

The space complexity of the binar sort is linear or O(N), proportional to the number of data elements.

Time Analysis

The analysis of the performance time of sorting algorithms typically considers different cases, from an optimal to a worst case. Each case is simply a different permutation for which the particular sorting algorithm has better or worse performance. Frequently, the permutations of the data for the different cases are an ascending sorted permutation, a descending sorted permutation, and a random permutation.

This approach to sorting algorithm analysis is used for the comparison-based sorting algorithms. An unasked question is "Why use various permutations for performance time analysis?" The answer is that comparison-based sorting algorithms do a relative comparison for positioning a data element. Each comparison results in a relative positioning of the data element in the overall array of data elements. Thus, the initial permutation of the data elements can impact sorting performance.

Why consider this particular point for time analysis of a sorting algorithm? Simply, the binar sort does not use comparison to place an element in position, but the extracted bit values. For the constant c total bits, or word size, of a data element, each bit constitutes 1/c of the process of correctly positioning a data element within the array. Each positioning is absolute within the overall array, not relative to the other data elements. Effectively, the initial permutation of the data elements is immaterial to the performance time of the binar sort.

In the analysis of the performance time, considering worst and optimal cases is irrelevant, as the binar sort is not comparison-based and so is independent of any particular permutation of the data elements.

Analysis of the binar sort algorithm performance time does not use specific cases. The analysis approach is to consider each discrete step, and take the overall total performance time O(T) as the sum over the steps. Thus the initial analysis of the performance time of the binar sort is:

O(T) = O(T0) + O(T1) + O(T2) + O(T3)

Each discrete step has its own performance time, and for each discrete step the performance time is:

1. O(T0) - Evaluate the passed parameters for termination.

2. O(T1) - Initialize starting array bounds.

3. O(T2) - Partition array into sub-arrays.

4. O(T3) - Determine recursive call on sub-arrays.

Except for the partition step of the algorithm, all the other steps are constant in performance time. For the last step, involving a logical decision on the recursive call, the complexity of the evaluation is constant for the decision, and the greater performance time for two recursive calls is used as the constant. Substituting a constant for each step, the performance time of the binar sort becomes:

O(T) = c0 + c1 + O(T2) + c3

The expression simplifies the constants to a single constant c, and then is:

O(T) = O(T2) + c

Partitioning Analysis

The analysis of partitioning is not constant, as the partitioning step in the algorithm is iterative. The iterative step processes all elements in the array and, for the recursive continuation, the sub-arrays, which are divisions of the overall array. For a single data element, partitioning takes a constant performance time c for the bit extraction and the determination of which sub-array receives the element. But this process is repeated N times, once for each data element. This means that the performance time for the partitioning step is:

O(T2) = cN

Substituting this into the original performance time expression for the algorithm gives:

O(T) = cN + c

But for the Big-Oh time complexity, this expression simplifies to:

O(T) = N

Recursion Analysis

One important consideration in the performance time analysis is that the expression, as the sum of the performance time for each discrete step, covers one operation of the algorithm on a single bit position. This means the performance time expression for a single bit position is:

O(T) = 1N

For each recursive pass, each bit of each data element is accessed once, and there are a constant

number c of bits (and for other data types with variants, such as a string of characters, there is an

overall average length for a data element that is independent of the number of data elements).

This means that the recursion will occur for each bit position, so that the performance time is:

O(T) = cN

Regardless of the permutation of the data elements, the binar sort in the worst case will access a data element once for each bit. Hence for N data elements with a constant number of bits c, the performance time is O(N). The binar sort algorithm operates independently of the permutation of the data elements, so it avoids a worst or optimal case. Thus, the binar sort algorithm remains consistently linear in performance time.
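The bound of cN bit accesses can be checked empirically by counting extractions. The Python sketch below is an instrumented variant written for this purpose: it uses an explicit stack in place of recursion (a substitution made here for convenience, not something the text prescribes), assumes non-negative integer keys, and uses hypothetical names throughout.

```python
def count_extractions(array, word_bits):
    """Sort a copy of `array` by bit partitioning and count bit extractions."""
    data = list(array)
    mask = 1 << (word_bits - 1)
    count = 0
    stack = [(0, len(data) - 1, 0)]          # (lower, upper, bit position)
    while stack:
        lower, upper, n = stack.pop()
        if n >= word_bits or lower >= upper:
            continue                          # base case: nothing to partition
        lo, hi = lower, upper
        while lo <= hi:
            count += 1                        # one bit extraction per iteration
            if ((data[lo] << n) & mask) == 0:
                lo += 1
            else:
                data[lo], data[hi] = data[hi], data[lo]
                hi -= 1
        stack.append((lower, hi, n + 1))
        stack.append((lo, upper, n + 1))
    return data, count


values = [0xB, 0x4, 0x0, 0x0, 0x7, 0xA, 0xC, 0xE]   # illustrative values
result, extractions = count_extractions(values, word_bits=4)
print(result, extractions, "<=", 4 * len(values))
```

At each bit position the sub-arrays are disjoint, so the extractions at that position total at most N, and over c positions at most cN. Single-element sub-arrays stop early, so the count is usually below the bound.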

6. Applied Performance

The binar sort is implemented in both managed code (in Java and C#) and native code (using C) to evaluate the applied performance of the algorithm. The code was executed on different platforms: an AMD 64-bit PC with Windows 7, a PowerPC eMac running Mac OS X, an Intel Core Duo MacBook, and a Linux Pentium 4 PC. The managed code was executed on the Common Language Runtime (CLR) of .NET, and the Java Virtual Machine (JVM).

The same data sets were used for all the implementations of the binar sort. The tests examined three cases:

1. Unsorted integers

2. Sorted integers

3. Partially sorted integers
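
The three cases could be generated along the following lines. This is a sketch only: the monograph does not describe how its data sets were produced, so the function names, the small pseudo-random generator, and in particular the definition of "partially sorted" (an ascending sequence in which every stride-th element is overwritten with a pseudo-random value) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Small linear congruential generator so that runs are repeatable
 * (hypothetical choice; any repeatable generator would serve). */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Case 1: unsorted integers, pseudo-random values from the given seed. */
void fill_unsorted(uint32_t a[], size_t n, uint32_t seed)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = lcg_next(&seed);
    }
}

/* Case 2: sorted integers, ascending 0, 1, 2, ... */
void fill_sorted(uint32_t a[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = (uint32_t)i;
    }
}

/* Case 3: partially sorted integers (assumed definition): ascending,
 * with every stride-th element replaced by a pseudo-random value. */
void fill_partially_sorted(uint32_t a[], size_t n, size_t stride, uint32_t seed)
{
    fill_sorted(a, n);
    if (stride == 0) return;                 /* nothing to disturb */
    for (size_t i = 0; i < n; i += stride) {
        a[i] = lcg_next(&seed);
    }
}
```

Using a fixed seed keeps the data identical across the managed and native implementations, which matches the stated intent of using the same data sets everywhere.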

In each case, the test was executed, and the performance time and the size of the data were written to a data log file. The test sizes varied in increments of blocks of 2^16 = 65,536 integers and blocks of 2^20 = 1,048,576 integers.

The measured performance is linear, with a statistical covariance ranging from 0.9994 to 0.9999 across the performance graphs. The applied implementation of the binar sort is consistent with the theoretical expectation of linear O(N) performance.

Chart of Test for Binar Sort Performance at 0.9999 Covariance on Megabyte Block

Summary of Charts

The charts of data size N against time T each show a straight line. As the size of the data increases linearly, the time to sort the data increases linearly in proportion to a constant. The covariance varies from 0.9994 to 0.9999, and so is at least 0.999 for each equation. Hence the equations are statistically accurate for the binar sort performance.

When the equations of data size to sort time are plotted, the lines are congruent, and some are parallel to each other.

The equations shift in position along the axes, but the plots of the lines are consistent. This indicates that the binar sort performance is consistently linear, with the equation constants changing with the data sizes and the runtime platform.
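
An equation of data size to sort time of the form T = aN + b can be obtained from logged measurements by an ordinary least-squares fit. The C sketch below is one standard way to do this and is not taken from the monograph. The type line_fit_t and the function fit_line are hypothetical names, and the goodness-of-fit value computed here is the squared correlation, which may or may not be the statistic the charts report as covariance.

```c
#include <stddef.h>

typedef struct {
    double slope;       /* a in T = a*N + b */
    double intercept;   /* b in T = a*N + b */
    double r_squared;   /* squared correlation of the fit; 1.0 is an exact line */
} line_fit_t;

/* Ordinary least-squares fit of times[] against sizes[] over count samples.
 * Assumes count >= 2 and that neither series is constant. */
line_fit_t fit_line(const double sizes[], const double times[], size_t count)
{
    double sum_n = 0.0;
    double sum_t = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum_n += sizes[i];
        sum_t += times[i];
    }
    double mean_n = sum_n / (double)count;
    double mean_t = sum_t / (double)count;

    double s_nn = 0.0;   /* sum of squared deviations of N */
    double s_tt = 0.0;   /* sum of squared deviations of T */
    double s_nt = 0.0;   /* sum of cross deviations */
    for (size_t i = 0; i < count; i++) {
        double dn = sizes[i] - mean_n;
        double dt = times[i] - mean_t;
        s_nn += dn * dn;
        s_tt += dt * dt;
        s_nt += dn * dt;
    }

    line_fit_t fit;
    fit.slope = s_nt / s_nn;
    fit.intercept = mean_t - fit.slope * mean_n;
    fit.r_squared = (s_nt * s_nt) / (s_nn * s_tt);
    return fit;
}
```

A change of platform or block size would then show up as a different slope or intercept, while a value of r_squared close to 1 would indicate that the measurements still lie on a line.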

7. Future Work

There are two major areas for further work with the binar sort:

1. Variations

2. Optimization

Algorithm Variants

The binar sort is recursive and executes on a serial processor. This leads to the opportunity for:

1. Purely iterative version of the algorithm.

2. Parallel version of the algorithm.

The purely iterative variant replaces recursion with iteration. Its implementation is more an investigation into what is possible, and so remains an open research question. An iterative version would remove the overhead in time and space of activation records on the runtime stack of the environment. A comparison of recursive and iterative implementations of the binar sort would show any performance improvement.
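
One conventional way to remove the recursive calls is to keep the pending sub-array bounds on an explicit stack. The sketch below illustrates that technique only. It is not the author's iterative variant, it still uses auxiliary storage (so it may not count as purely iterative in the sense intended above), and the names frame_t and binar_sort_iterative are hypothetical. Elements are assumed to be unsigned 32-bit values.

```c
#include <stdint.h>

typedef struct {
    int low;    /* lower bound of the pending sub-array */
    int high;   /* upper bound of the pending sub-array */
    int pos;    /* bit position to partition on (0 = most significant) */
} frame_t;

/* Binar sort of list[0..n-1] with an explicit stack instead of recursion. */
void binar_sort_iterative(uint32_t list[], int n)
{
    /* Each pop removes one frame and pushes at most two frames at the next
     * bit position, so at most 33 frames are pending for 32-bit elements. */
    frame_t stack[64];
    int top = 0;

    stack[top++] = (frame_t){0, n - 1, 0};
    while (top > 0) {
        frame_t f = stack[--top];
        if (f.pos == 32 || f.high < f.low + 1) continue;   /* nothing left to do */

        int lo = f.low;
        int hi = f.high;
        while (lo <= hi) {
            if ((list[lo] >> (31 - f.pos)) & 1u) {
                uint32_t temp = list[hi];
                list[hi] = list[lo];
                list[lo] = temp;
                hi--;
            } else {
                lo++;
            }
        }

        if (lo == f.high + 1) {
            stack[top++] = (frame_t){f.low, f.high, f.pos + 1};
        } else {
            stack[top++] = (frame_t){f.low, lo - 1, f.pos + 1};
            stack[top++] = (frame_t){lo, f.high, f.pos + 1};
        }
    }
}
```

The fixed-size stack replaces the activation records of the recursive version, which is the overhead the comparison described above would measure.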

The binar sort executes on a serial processor, and thus has potential for parallelization. A parallel version of the binar sort would let each processor focus on the partition that corresponds to its processor identity. The interesting feature is that, from the initial array of data elements, each parallel operation is completely independent of the others. Theoretically, the performance time of the linear O(N) algorithm improves to O(N/P), where P is the total number of processors.

The binar sort is efficient as a linear O(N) algorithm, but optimizations are possible in the operation of the algorithm. Two such possible optimizations are:

1. Optimize for pass through when partitioning.

2. Optimize for determining if elements are in sorted order.

One potential optimization addresses pass through when partitioning, where essentially no data elements are partitioned into sub-arrays. The cost of a recursive call to continue partitioning within the same bounds for the next bit position is avoided by iteration. Instead of making a recursive call, the partitioning proceeds directly to the next bit position. This also simplifies the recursive continuation step, as the recursion is then always for two sub-arrays, never just a single one.
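
The pass-through optimization might look as follows. This is a sketch under assumptions rather than a tested design from the monograph: the name binar_sort_pt is hypothetical, elements are treated as unsigned 32-bit values, and a pass that places every element in the upper sub-array is handled the same way as one that places every element in the lower sub-array.

```c
#include <stdint.h>

/* Binar sort of list[low..high] from bit position pos (0 = most significant),
 * iterating to the next bit position whenever a pass splits nothing. */
void binar_sort_pt(uint32_t list[], int low, int high, int pos)
{
    while (pos < 32 && high >= low + 1) {
        int lo = low;
        int hi = high;
        while (lo <= hi) {
            if ((list[lo] >> (31 - pos)) & 1u) {
                uint32_t temp = list[hi];
                list[hi] = list[lo];
                list[lo] = temp;
                hi--;
            } else {
                lo++;
            }
        }

        if (lo == high + 1 || lo == low) {
            pos++;          /* pass through: same bounds, next bit, no call */
            continue;
        }

        /* a real split: recursion is always on two non-empty sub-arrays */
        binar_sort_pt(list, low, lo - 1, pos + 1);
        binar_sort_pt(list, lo, high, pos + 1);
        return;
    }
}
```

For small values stored in 32-bit words, most of the leading bit positions are pass throughs, so this form avoids a chain of recursive calls that would each do no partitioning.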

The illustration of the operation of the binar sort demonstrated that the algorithm continues its recursive invocation even when the entire array is already in a sorted permutation. This wastes processing time and adds recursion overhead without any further need. A viable optimization is to check within the original array bounds whether the data elements are in sorted order, and to indicate when a sorted permutation exists.

The difficulty with this optimization is to avoid unnecessary checks when the array is unlikely to be sorted, while still detecting a sorted array early enough to avoid unnecessary recursive partitioning. For example, a sorted-order check could be performed after several occurrences of pass through while partitioning. If the array is in sorted order, the recursive calls in the other sub-arrays would terminate.
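
The heuristic of checking for sorted order after several pass throughs can be sketched as below. Everything specific here is an assumption for illustration: the threshold PASS_LIMIT of four consecutive pass throughs, the names is_sorted, binar_sort_checked, and partition_passes, and the choice to apply the check to the current sub-array bounds. Elements are treated as unsigned 32-bit values.

```c
#include <stdint.h>

#define PASS_LIMIT 4        /* hypothetical: pass throughs before a sorted check */

long partition_passes = 0;  /* illustrative counter of partitioning passes */

/* Returns 1 if list[low..high] is in non-decreasing order, otherwise 0. */
static int is_sorted(const uint32_t list[], int low, int high)
{
    for (int i = low; i < high; i++) {
        if (list[i] > list[i + 1]) return 0;
    }
    return 1;
}

void binar_sort_checked(uint32_t list[], int low, int high, int pos)
{
    int passes = 0;   /* consecutive pass throughs seen within these bounds */

    while (pos < 32 && high >= low + 1) {
        int lo = low;
        int hi = high;
        partition_passes++;
        while (lo <= hi) {
            if ((list[lo] >> (31 - pos)) & 1u) {
                uint32_t temp = list[hi];
                list[hi] = list[lo];
                list[lo] = temp;
                hi--;
            } else {
                lo++;
            }
        }

        if (lo == high + 1 || lo == low) {
            pos++;                                       /* pass through */
            if (++passes == PASS_LIMIT) {
                if (is_sorted(list, low, high)) return;  /* nothing left to do */
                passes = 0;                              /* not sorted: keep going */
            }
            continue;
        }

        binar_sort_checked(list, low, lo - 1, pos + 1);
        binar_sort_checked(list, lo, high, pos + 1);
        return;
    }
}
```

The check itself costs one linear scan, so the threshold trades the cost of scans on unsorted data against the partitioning passes saved on sorted data. Choosing that threshold well is the difficulty noted above.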

8. Conclusion

The binar sort is a linear, unstable, and recursive sorting algorithm. The binar sort algorithm is independent of the initial data permutation. By using the bits of the encoded data elements, the binar sort arranges the data elements into a sorted permutation. Utilizing the encoding makes the binar sort universal, and also makes it a linear algorithm with no extreme performance cases that depend on the initial permutation.

Beyond the features of the binar sort algorithm are some intriguing realizations. One is that generality and performance are not the hard and fast tradeoff that is frequently presumed. The binar sort, by its nature, illustrates that such a tradeoff is not immutable. The binar sort algorithm also illustrates that the linearithmic, comparison-based sorting algorithms carry an implicit presumption that ordering is the only universal property feasible for generalized sorting. This implicit presumption is flawed: the other universal property, shared by all computers at the time of this writing, is the encoding of data.

In the analysis of the binar sort algorithm, the permutation of the data is irrelevant to the performance of the binar sort. In contrast, comparison-based sorting algorithms are sensitive to the permutation of the data set to sort. By using encoding instead of ordering, the binar sort avoids this sensitivity to the permutation of the data. The sensitivity is an accepted defect in existing comparison-based sorting algorithms, but again it is only an implicit presumption that it is necessary.

The binar sort algorithm breaks new ground, but its development is far from complete or finished. Further work is necessary for improvement and optimization of the binar sort. A purely iterative variant and a parallel version of the algorithm are just two possibilities for future work. Future developments are anticipated, possibly including other algorithms inspired by the binar sort.

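void BinarSort(int lower, int upper, int pos, int[] array)
{
    //base case: bit positions exhausted, or fewer than two elements
    if(pos == 33 || upper < lower + 1) return;

    int lo = lower; //array lower bound position
    int hi = upper; //array upper bound position

    while(lo <= hi)
    {
        //test the bit at position pos, counting from the most significant bit
        if(((array[lo] << pos) & 0x80000000) == 0)
        {
            lo++;
        }
        else
        {
            int temp  = array[hi];
            array[hi] = array[lo];
            array[lo] = temp;
            hi--;
        }//end if
    }//end while

    if(lo == upper + 1)
    {
        BinarSort(lower, upper, pos + 1, array);
    }
    else
    {
        BinarSort(lower, lo - 1, pos + 1, array);
        BinarSort(lo, upper, pos + 1, array);
    }//end if
}//end BinarSort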

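void bsort(const int low, const int high, const int pos, int list[])
{
    //base case: all 32 bit positions done, or fewer than two elements;
    //a shift count of 32 is undefined in C, so stop at 32
    if(pos == 32 || high < low + 1) return;

    int lo = low;
    int hi = high;

    while(lo <= hi)
    {
        //test the bit at position pos, counting from the most significant bit;
        //the cast avoids left-shifting a negative signed value
        if( ((unsigned int)list[lo] << pos) & 0x80000000u )
        {
            int temp = list[hi];
            list[hi] = list[lo];
            list[lo] = temp;
            hi--;
        }
        else
        {
            lo++;
        }//end if
    }//end while

    if(lo == high + 1)
    {
        bsort(low, high, pos+1, list);
    }
    else
    {
        bsort(low, lo-1, pos+1, list);
        bsort(lo, high, pos+1, list);
    }//end if
}//end bsort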

References

[ANSI 1986] American National Standards Institute, "Coded Character Set - 7-bit American Standard Code for Information Interchange", ANSI X3.4, 1986.

[Floyd 1990] Floyd, Thomas L. Digital Fundamentals, 4th edition. Merrill Publishing Company, Columbus, OH, 1990.

[IEEE 1985] IEEE Computer Society. IEEE Standard for Binary Floating-Point Arithmetic, IEEE Std 754-1985, 1985.

[Johnsonbaugh and Schaefer 2004] Johnsonbaugh, Richard and Schaefer, Marcus. Algorithms. Prentice-Hall, Upper Saddle River, New Jersey, 2004.

[Knuth 1998] Knuth, Donald E. The Art of Computer Programming, 2nd edition. Addison-Wesley, Reading, Massachusetts, 1998.

[Null and Lobur 2003] Null, Linda and Lobur, Julia. The Essentials of Computer Organization and Architecture. Jones and Bartlett Publishers, Sudbury, MA, 2003.

[Roth 1992] Roth, Jr., Charles A. Fundamentals of Logic Design, 4th edition. West Publishing Company, New York, NY, 1992.

[Unicode 2006] The Unicode Standard, Version 5.0, Fifth Edition, The Unicode Consortium, Addison-Wesley Professional, 2006.

[Yergeau 2003] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.
