
Burrows–Wheeler transform

The Burrows–Wheeler transform (BWT, also called block-sorting compression) rearranges a character string into runs of similar characters. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters by techniques such as move-to-front transform and run-length encoding. More importantly, the transformation is reversible, without needing to store any additional data except the position of the first original character. The BWT is thus a "free" method of improving the efficiency of text compression algorithms, costing only some extra computation. The Burrows–Wheeler transform is an algorithm used to prepare data for use with data compression techniques such as bzip2. It was invented by Michael Burrows and David Wheeler in 1994 while Burrows was working at DEC Systems Research Center in Palo Alto, California. It is based on a previously unpublished transformation discovered by Wheeler in 1983. The algorithm can be implemented efficiently using a suffix array, thus reaching linear time complexity.[1]

Class: preprocessing for lossless compression. Data structure: string. Worst-case performance: O(n) time, O(n) space.

Description
When a character string is transformed by the BWT, the transformation permutes the order of the
characters. If the original string had several substrings that occurred often, then the transformed string will
have several places where a single character is repeated multiple times in a row.

For example:

Input SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES

Output TEXYDST.E.IXIXIXXSSMPPS.B..E.S.EUSFXDIIOIIIT[2]

The output is easier to compress because it has many repeated characters. In this example the transformed
string contains six runs of identical characters: XX, SS, PP, .., II, and III, which together make 13 out
of the 44 characters.
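
The runs can be counted directly; the following minimal Python sketch (not part of the original article) verifies the figures above using only the standard library:

from itertools import groupby

transformed = "TEXYDST.E.IXIXIXXSSMPPS.B..E.S.EUSFXDIIOIIIT"
runs = [(ch, len(list(group))) for ch, group in groupby(transformed)]
repeated = [(ch, n) for ch, n in runs if n > 1]
print(repeated)                      # the six runs of identical characters
print(sum(n for _, n in repeated))   # 13 characters fall inside these runs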

Example
The transform is done by sorting all the circular shifts of a text in lexicographic order and by extracting the
last column and the index of the original string in the set of sorted permutations of S.

Given an input string S = ^BANANA| (step 1 in the table below), rotate it N times (step 2), where N = 8 is the length of the string S counting also the symbol ^ representing the start of the string and the | character representing the 'EOF' pointer; these rotations, or circular shifts, are then sorted lexicographically (step 3). The output of the encoding phase is the last column L = BNN^AA|A after step 3, and the index (0-based) I of the row containing the original string S, in this case I = 6.
Transformation

1. Input:
^BANANA|

2. All rotations:
^BANANA|
|^BANANA
A|^BANAN
NA|^BANA
ANA|^BAN
NANA|^BA
ANANA|^B
BANANA|^

3. Sort into lexical order:
ANANA|^B
ANA|^BAN
A|^BANAN
BANANA|^
NANA|^BA
NA|^BANA
^BANANA|
|^BANANA

4. Take the last column of the sorted rotations (reading downwards):
B N N ^ A A | A

5. Output:
BNN^AA|A

The following pseudocode gives a simple (though inefficient) way to calculate the BWT and its inverse. It
assumes that the input string s contains a special character 'EOF' which is the last character and occurs
nowhere else in the text.

function BWT (string s)
    create a table, where the rows are all possible rotations of s
    sort rows alphabetically
    return (last column of the table)

function inverseBWT (string s)
    create empty table
    repeat length(s) times
        // first insert creates first column
        insert s as a column of table before first column of the table
        sort rows of the table alphabetically
    return (row that ends with the 'EOF' character)

Explanation
To understand why this creates more-easily-compressible data, consider transforming a long English text
frequently containing the word "the". Sorting the rotations of this text will group rotations starting with "he
" together, and the last character of that rotation (which is also the character before the "he ") will usually be
"t", so the result of the transform would contain a number of "t" characters along with the perhaps less-
common exceptions (such as if it contains "ache ") mixed in. So it can be seen that the success of this
transform depends upon one value having a high probability of occurring before a sequence, so that in
general it needs fairly long samples (a few kilobytes at least) of appropriate data (such as text).

The remarkable thing about the BWT is not that it generates a more easily encoded output—an ordinary
sort would do that—but that it does this reversibly, allowing the original document to be re-generated from
the last column data.

The inverse can be understood this way. Take the final table in the BWT algorithm, and erase all but the
last column. Given only this information, you can easily reconstruct the first column. The last column tells
you all the characters in the text, so just sort these characters alphabetically to get the first column. Then, the
last and first columns (of each row) together give you all pairs of successive characters in the document,
where pairs are taken cyclically so that the last and first character form a pair. Sorting the list of pairs gives
the first and second columns. Continuing in this manner, you can reconstruct the entire list. Then, the row
with the "end of file" character at the end is the original text. Reversing the example above is done like this:
Inverse transformation

Input

BNN^AA|A

Add 1 Sort 1 Add 2 Sort 2

B A BA AN
N A NA AN
N A NA A|
^ B ^B BA
A N AN NA
A N AN NA
| ^ |^ ^B
A | A| |^

Add 3 Sort 3 Add 4 Sort 4

BAN ANA BANA ANAN
NAN ANA NANA ANA|
NA| A|^ NA|^ A|^B
^BA BAN ^BAN BANA
ANA NAN ANAN NANA
ANA NA| ANA| NA|^
|^B ^BA |^BA ^BAN
A|^ |^B A|^B |^BA

Add 5 Sort 5 Add 6 Sort 6

BANAN ANANA BANANA ANANA|
NANA| ANA|^ NANA|^ ANA|^B
NA|^B A|^BA NA|^BA A|^BAN
^BANA BANAN ^BANAN BANANA
ANANA NANA| ANANA| NANA|^
ANA|^ NA|^B ANA|^B NA|^BA
|^BAN ^BANA |^BANA ^BANAN
A|^BA |^BAN A|^BAN |^BANA

Add 7 Sort 7 Add 8 Sort 8

BANANA| ANANA|^ BANANA|^ ANANA|^B
NANA|^B ANA|^BA NANA|^BA ANA|^BAN
NA|^BAN A|^BANA NA|^BANA A|^BANAN
^BANANA BANANA| ^BANANA| BANANA|^
ANANA|^ NANA|^B ANANA|^B NANA|^BA
ANA|^BA NA|^BAN ANA|^BAN NA|^BANA
|^BANAN ^BANANA |^BANANA ^BANANA|
A|^BANA |^BANAN A|^BANAN |^BANANA

Output

^BANANA|
Optimization
A number of optimizations can make these algorithms run more efficiently without changing the output.
There is no need to represent the table in either the encoder or decoder. In the encoder, each row of the
table can be represented by a single pointer into the strings, and the sort performed using the indices. In the
decoder, there is also no need to store the table, and in fact no sort is needed at all. In time proportional to
the alphabet size and string length, the decoded string may be generated one character at a time from right
to left. A "character" in the algorithm can be a byte, or a bit, or any other convenient size.
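
As a rough illustration of such a decoder, the following Python sketch (an outline under stated assumptions, not the exact procedure from Burrows and Wheeler's paper) rebuilds the text from right to left using only character counts, given the last column and the index of the row holding the original string (the pointer discussed below):

from collections import Counter

def ibwt_counting(last: str, idx: int) -> str:
    """Sketch: rebuild the text right to left without sorting any table.
    `last` is the BWT output, `idx` the row index of the original string."""
    n = len(last)
    seen = Counter()
    occ = []                       # occ[i]: occurrences of last[i] within last[:i]
    for c in last:
        occ.append(seen[c])
        seen[c] += 1
    first = {}                     # first[c]: first row whose rotation starts with c
    total = 0
    for c in sorted(seen):
        first[c] = total
        total += seen[c]
    out = []
    row = idx
    for _ in range(n):             # each step recovers one character, right to left
        c = last[row]
        out.append(c)
        row = first[c] + occ[row]
    return "".join(reversed(out))

# e.g. ibwt_counting("BNN^AA|A", 6) returns "^BANANA|"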

One may also make the observation that mathematically, the encoded string can be computed as a simple modification of the suffix array, and suffix arrays can be computed with linear time and memory. The BWT can be defined with regard to the suffix array SA of text T as (1-based indexing):

BWT[i] = T[SA[i] − 1],  if SA[i] > 1
BWT[i] = $,             if SA[i] = 1

where $ denotes the 'EOF' character appended to T.[3]
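
The following Python sketch illustrates this definition; it assumes the text already ends with a unique smallest sentinel character (here '$') and, for brevity, builds the suffix array with a naive quadratic sort rather than a linear-time construction:

def bwt_from_suffix_array(t: str) -> str:
    """Sketch: compute the BWT from the suffix array of t, following
    BWT[i] = T[SA[i] - 1]; t must end with a unique smallest sentinel."""
    sa = sorted(range(len(t)), key=lambda i: t[i:])   # toy suffix array construction
    return "".join(t[i - 1] if i > 0 else t[-1] for i in sa)

# e.g. bwt_from_suffix_array("banana$") returns "annb$aa"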

There is no need to have an actual 'EOF' character. Instead, a pointer can be used that remembers where in
a string the 'EOF' would be if it existed. In this approach, the output of the BWT must include both the
transformed string, and the final value of the pointer. The inverse transform then shrinks it back down to the
original size: it is given a string and a pointer, and returns just a string.
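
A minimal sketch of this pointer-based interface (the function name is illustrative, not from the original paper):

def bwt_with_pointer(s: str) -> tuple[str, int]:
    """Sketch: transform without an EOF character; return the last column
    together with the index of the row holding the original string."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    last = "".join(row[-1] for row in rotations)
    return last, rotations.index(s)

# e.g. bwt_with_pointer("banana") returns ("nnbaaa", 3)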

A complete description of the algorithms can be found in Burrows and Wheeler's paper, or in a number of
online sources.[1] The algorithms vary somewhat by whether EOF is used, and in which direction the
sorting was done. In fact, the original formulation did not use an EOF marker.[4]

Bijective variant
Since any rotation of the input string will lead to the same transformed string, the BWT cannot be inverted
without adding an EOF marker to the end of the input or doing something equivalent, making it possible to
distinguish the input string from all its rotations. Increasing the size of the alphabet (by appending the EOF
character) makes later compression steps awkward.

There is a bijective version of the transform, by which the transformed string uniquely identifies the
original, and the two have the same length and contain exactly the same characters, just in a different
order.[5][6]

The bijective transform is computed by factoring the input into a non-increasing sequence of Lyndon
words; such a factorization exists and is unique by the Chen–Fox–Lyndon theorem,[7] and may be found in
linear time.[8] The algorithm sorts the rotations of all the words; as in the Burrows–Wheeler transform, this
produces a sorted sequence of n strings. The transformed string is then obtained by picking the final
character of each string in this sorted list. The one important caveat here is that strings of different lengths are not ordered in the usual way; instead, each string is conceptually repeated forever, and the infinite repetitions are compared. For example, "ORO" precedes "OR" because "OROORO..." precedes "OROROR...".
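
Both steps can be sketched in Python: Duval's algorithm produces the Lyndon factorization in linear time, and infinite repetitions can be compared using the fact that the repetition of u precedes the repetition of v exactly when u + v < v + u (the function names are illustrative):

from functools import cmp_to_key

def lyndon_factorization(s: str) -> list[str]:
    """Duval's algorithm: factor s into a non-increasing sequence of Lyndon words."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:                       # emit one or more copies of the word found
            factors.append(s[i:i + j - k])
            i += j - k
    return factors

def bwts(s: str) -> str:
    """Sketch of the bijective transform: sort all rotations of all Lyndon words
    by their infinite repetitions, then take the last character of each."""
    rotations = [w[i:] + w[:i] for w in lyndon_factorization(s) for i in range(len(w))]
    # the repetition of u precedes the repetition of v exactly when u + v < v + u
    rotations.sort(key=cmp_to_key(lambda u, v: (u + v > v + u) - (u + v < v + u)))
    return "".join(rot[-1] for rot in rotations)

# e.g. lyndon_factorization("^BANANA") gives ['^', 'B', 'AN', 'AN', 'A']
# and bwts("^BANANA") gives "ANNBAA^"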

For example, the text "^BANANA|" is transformed into "ANNBAA^|" through these steps (the | character indicates the 'EOF' pointer in the original string). The EOF character is unneeded in the bijective transform, so it is dropped during the transform and re-added to its proper place in the file.

The string is broken into Lyndon words, so that the words in the sequence are non-increasing under the comparison method above. (Note that we're sorting '^' as succeeding other characters.) "^BANANA" becomes (^) (B) (AN) (AN) (A).
Bijective transformation

Input:
^BANANA|

All rotations of the Lyndon words:
^^^^^^^^... (^)
BBBBBBBB... (B)
ANANANAN... (AN)
NANANANA... (NA)
ANANANAN... (AN)
NANANANA... (NA)
AAAAAAAA... (A)

Sorted alphabetically:
AAAAAAAA... (A)
ANANANAN... (AN)
ANANANAN... (AN)
BBBBBBBB... (B)
NANANANA... (NA)
NANANANA... (NA)
^^^^^^^^... (^)

Last character of each rotated Lyndon word (reading downwards): A N N B A A ^

Output:
ANNBAA^|

Inverse bijective transform


Input

ANNBAA^

Add 1 Sort 1 Add 2 Sort 2

A A AA AA
N A NA AN
N A NA AN
B B BB BB
A N AN NA
A N AN NA
^ ^ ^^ ^^

Add 3 Sort 3 Add 4 Sort 4

AAA AAA AAAA AAAA
NAN ANA NANA ANAN
NAN ANA NANA ANAN
BBB BBB BBBB BBBB
ANA NAN ANAN NANA
ANA NAN ANAN NANA
^^^ ^^^ ^^^^ ^^^^

Output

^BANANA

Up until the last step, the process is identical to the inverse Burrows–Wheeler process, but here it will not
necessarily give rotations of a single sequence; it instead gives rotations of Lyndon words (which will start
to repeat as the process is continued). Here, we can see (repetitions of) four distinct Lyndon words: (A),
(AN) (twice), (B), and (^). (NANA... doesn't represent a distinct word, as it is a cycle of ANAN....) At this
point, these words are sorted into reverse order: (^), (B), (AN), (AN), (A). These are then concatenated to
get

^BANANA
The Burrows–Wheeler transform can indeed be viewed as a special case of this bijective transform; instead of the traditional introduction of a new letter from outside our alphabet to denote the end of the string, we can introduce a new letter, placed at the beginning of the string, that compares as preceding all existing letters. The whole string is now a Lyndon word, and running it through the bijective process will therefore result in a transformed result that, when inverted, gives back the Lyndon word, with no need for reassembling at the end.

Relatedly, the transformed text will only differ from the result of BWT by one character per Lyndon word;
for example, if the input is decomposed into six Lyndon words, the output will only differ in six characters.
For example, applying the bijective transform gives:

Input SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES

Lyndon words SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES

Output STEYDST.E.IXXIIXXSMPPXS.B..EE..SUSFXDIOIIIIT

The bijective transform includes eight runs of identical characters. These runs are, in order: XX, II, XX,
PP, .., EE, .., and IIII.

In total, 18 characters are used in these runs.

Dynamic Burrows–Wheeler transform


When a text is edited, its Burrows–Wheeler transform will change. Salson et al.[9] propose an algorithm
that deduces the Burrows–Wheeler transform of an edited text from that of the original text, doing a limited
number of local reorderings in the original Burrows–Wheeler transform, which can be faster than
constructing the Burrows–Wheeler transform of the edited text directly.

Sample implementation
This Python implementation sacrifices speed for simplicity: the program is short, but takes more than the
linear time that would be desired in a practical implementation. It essentially does what the pseudocode
section does.

Using the STX/ETX control codes to mark the start and end of the text, and using s[i:] + s[:i] to
construct the ith rotation of s, the forward transform takes the last character of each of the sorted rows:

def bwt(s: str) -> str:
    """Apply Burrows–Wheeler transform to input string."""
    assert "\002" not in s and "\003" not in s, "Input string cannot contain STX and ETX characters"
    s = "\002" + s + "\003"  # Add start and end of text marker
    table = sorted(s[i:] + s[:i] for i in range(len(s)))  # Table of rotations of string
    last_column = [row[-1:] for row in table]  # Last characters of each row
    return "".join(last_column)  # Convert list of characters into string

The inverse transform repeatedly inserts r as the left column of the table and sorts the table. After the whole
table is built, it returns the row that ends with ETX, minus the STX and ETX.

def ibwt(r: str) -> str:
    """Apply inverse Burrows–Wheeler transform."""
    table = [""] * len(r)  # Make empty table
    for _ in range(len(r)):
        table = sorted(r[i] + table[i] for i in range(len(r)))  # Add a column of r
    s = next((row for row in table if row.endswith("\003")), "")  # Find the row that ends with the ETX character
    return s.rstrip("\003").strip("\002")  # Get rid of the start and end markers
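
The two functions invert each other; a quick round-trip check on the example string from the description section:

text = "SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES"
assert ibwt(bwt(text)) == text  # the round trip recovers the original exactly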

Following implementation notes from Manzini, it is equivalent to use a simple null character suffix instead.
The sorting should be done in colexicographic order (string read right-to-left), i.e. sorted(...,
key=lambda s: s[::-1]) in Python.[4] (The above control codes actually fail to satisfy EOF
being the last character; the two codes are actually the first. The rotation holds nevertheless.)

BWT applications
As a lossless preprocessing step for compression, the Burrows–Wheeler transform offers the important quality that its encoding is reversible, and hence the original data may be recovered exactly from the compressed result. This lossless quality has made the transform a building block for algorithms with different purposes in mind. To name a few, the Burrows–Wheeler transform is used in algorithms for sequence alignment, image compression, and data compression. The following is a compilation of some uses given to the Burrows–Wheeler transform.

BWT for sequence alignment

The advent of next-generation sequencing (NGS) techniques at the end of the 2000s decade has led to
another application of the Burrows–Wheeler transformation. In NGS, DNA is fragmented into small pieces,
of which the first few bases are sequenced, yielding several millions of "reads", each 30 to 500 base pairs
("DNA characters") long. In many experiments, e.g., in ChIP-Seq, the task is now to align these reads to a
reference genome, i.e., to the known, nearly complete sequence of the organism in question (which may be
up to several billion base pairs long). A number of alignment programs, specialized for this task, were
published, which initially relied on hashing (e.g., Eland, SOAP,[10] or Maq[11]). In an effort to reduce the
memory requirement for sequence alignment, several alignment programs were developed (Bowtie,[12]
BWA,[13] and SOAP2[14]) that use the Burrows–Wheeler transform.

BWT for image compression

The Burrows–Wheeler transformation has proved to be fundamental for image compression applications.
For example, Collin et al.[15] presented a compression pipeline based on the application of the Burrows–Wheeler transformation followed by inversion, run-length, and arithmetic encoders. The pipeline developed in this case is known as Burrows–Wheeler transform with an inversion encoder (BWIC). BWIC has been shown to outperform well-known and widely used algorithms such as Lossless JPEG and JPEG 2000, reducing the final compressed size of radiography medical images by on the order of 5.1% and 4.1% respectively. The improvements are achieved by combining BWIC with a pre-BWIC scan of the image in a vertical snake order fashion. More recently, additional works such as that of Devadoss and Sankaragomathi[16] have shown that the Burrows–Wheeler transform used in conjunction with the move-to-front transform (MTF) achieves near-lossless compression of images.

BWT for compression of genomic databases


Cox et al.[17] presented a genomic compression scheme that uses BWT as the algorithm applied during the first stage of compression of several genomic datasets, including human genomic information. Their work proposed that BWT compression could be enhanced by including a second-stage compression mechanism called same-as-previous encoding ("SAP"), which makes use of the fact that suffixes of two or more prefix letters could be equal. With the compression mechanism BWT-SAP, Cox et al. showed that the 135.5 GB genomic database ERA015743 compresses by around 94%, to 8.2 GB.

BWT for sequence prediction

BWT has also proved useful for sequence prediction, a common area of study in machine learning and natural-language processing. In particular, Ktistakis et al.[18] proposed a sequence prediction scheme called SuBSeq that exploits the lossless compression of data of the Burrows–Wheeler transform. SuBSeq exploits the BWT by extracting the FM-index and then performing a series of operations called backwardSearch, forwardSearch, neighbourExpansion, and getConsequents in order to search for predictions given a suffix. The predictions are then ranked by weight and put into an array, from which the element with the highest weight is given as the prediction of the SuBSeq algorithm. SuBSeq has been shown to outperform state-of-the-art algorithms for sequence prediction both in terms of training time and accuracy.

References
1. Burrows, Michael; Wheeler, David J. (1994), A block sorting lossless data compression
algorithm (http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.html), Technical
Report 124, Digital Equipment Corporation
2. "adrien-mogenet/scala-bwt" (https://github.com/adrien-mogenet/scala-bwt/blob/master/src/m
ain/scala/me/algos/bwt/BurrowsWheelerCodec.scala). GitHub. Retrieved 19 April 2018.
3. Simpson, Jared T.; Durbin, Richard (2010-06-15). "Efficient construction of an assembly
string graph using the FM-index" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881401).
Bioinformatics. 26 (12): i367–i373. doi:10.1093/bioinformatics/btq217 (https://doi.org/10.109
3%2Fbioinformatics%2Fbtq217). ISSN 1367-4803 (https://www.worldcat.org/issn/1367-480
3). PMC 2881401 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881401).
PMID 20529929 (https://pubmed.ncbi.nlm.nih.gov/20529929).
4. Manzini, Giovanni (1999-08-18). "The Burrows–Wheeler Transform: Theory and Practice" (h
ttps://people.unipmn.it/manzini/papers/mfcs99x.pdf) (PDF). Mathematical Foundations of
Computer Science 1999: 24th International Symposium, MFCS'99 Szklarska Poreba,
Poland, September 6-10, 1999 Proceedings. Springer Science & Business Media.
ISBN 9783540664086. Archived (https://ghostarchive.org/archive/20221009/https://people.u
nipmn.it/manzini/papers/mfcs99x.pdf) (PDF) from the original on 2022-10-09.
5. Gil, J.; Scott, D. A. (2009), A bijective string sorting transform (http://bijective.dogma.net/00yy
y.pdf) (PDF)
6. Kufleitner, Manfred (2009), "On bijective variants of the Burrows–Wheeler transform", in
Holub, Jan; Žďárek, Jan (eds.), Prague Stringology Conference (http://www.stringology.org/e
vent/2009/p07.html), pp. 65–69, arXiv:0908.0239 (https://arxiv.org/abs/0908.0239),
Bibcode:2009arXiv0908.0239K (https://ui.adsabs.harvard.edu/abs/2009arXiv0908.0239K).
7. *Lothaire, M. (1997), Combinatorics on words, Encyclopedia of Mathematics and Its
Applications, vol. 17, Perrin, D.; Reutenauer, C.; Berstel, J.; Pin, J. E.; Pirillo, G.; Foata, D.;
Sakarovitch, J.; Simon, I.; Schützenberger, M. P.; Choffrut, C.; Cori, R.; Lyndon, Roger; Rota,
Gian-Carlo. Foreword by Roger Lyndon (2nd ed.), Cambridge University Press, p. 67,
ISBN 978-0-521-59924-5, Zbl 0874.20040 (https://zbmath.org/?format=complete&q=an:087
4.20040)
8. Duval, Jean-Pierre (1983), "Factorizing words over an ordered alphabet", Journal of
Algorithms, 4 (4): 363–381, doi:10.1016/0196-6774(83)90017-2 (https://doi.org/10.1016%2F
0196-6774%2883%2990017-2), ISSN 0196-6774 (https://www.worldcat.org/issn/0196-677
4), Zbl 0532.68061 (https://zbmath.org/?format=complete&q=an:0532.68061).
9. Salson M, Lecroq T, Léonard M, Mouchard L (2009). "A Four-Stage Algorithm for Updating a
Burrows–Wheeler Transform" (https://doi.org/10.1016%2Fj.tcs.2009.07.016). Theoretical
Computer Science. 410 (43): 4350–4359. doi:10.1016/j.tcs.2009.07.016 (https://doi.org/10.1
016%2Fj.tcs.2009.07.016).
10. Li R; et al. (2008). "SOAP: short oligonucleotide alignment program" (https://doi.org/10.109
3%2Fbioinformatics%2Fbtn025). Bioinformatics. 24 (5): 713–714.
doi:10.1093/bioinformatics/btn025 (https://doi.org/10.1093%2Fbioinformatics%2Fbtn025).
PMID 18227114 (https://pubmed.ncbi.nlm.nih.gov/18227114).
11. Li H, Ruan J, Durbin R (2008-08-19). "Mapping short DNA sequencing reads and calling
variants using mapping quality scores" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2577
856). Genome Research. 18 (11): 1851–1858. doi:10.1101/gr.078212.108 (https://doi.org/10.
1101%2Fgr.078212.108). PMC 2577856 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC25
77856). PMID 18714091 (https://pubmed.ncbi.nlm.nih.gov/18714091).
12. Langmead B, Trapnell C, Pop M, Salzberg SL (2009). "Ultrafast and memory-efficient
alignment of short DNA sequences to the human genome" (https://www.ncbi.nlm.nih.gov/pm
c/articles/PMC2690996). Genome Biology. 10 (3): R25. doi:10.1186/gb-2009-10-3-r25 (http
s://doi.org/10.1186%2Fgb-2009-10-3-r25). PMC 2690996 (https://www.ncbi.nlm.nih.gov/pm
c/articles/PMC2690996). PMID 19261174 (https://pubmed.ncbi.nlm.nih.gov/19261174).
13. Li H, Durbin R (2009). "Fast and accurate short read alignment with Burrows–Wheeler
Transform" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705234). Bioinformatics. 25
(14): 1754–1760. doi:10.1093/bioinformatics/btp324 (https://doi.org/10.1093%2Fbioinformati
cs%2Fbtp324). PMC 2705234 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705234).
PMID 19451168 (https://pubmed.ncbi.nlm.nih.gov/19451168).
14. Li R; et al. (2009). "SOAP2: an improved ultrafast tool for short read alignment" (https://doi.or
g/10.1093%2Fbioinformatics%2Fbtp336). Bioinformatics. 25 (15): 1966–1967.
doi:10.1093/bioinformatics/btp336 (https://doi.org/10.1093%2Fbioinformatics%2Fbtp336).
PMID 19497933 (https://pubmed.ncbi.nlm.nih.gov/19497933).
15. Collin P, Arnavut Z, Koc B (2015). "Lossless compression of medical images using
Burrows–Wheeler Transformation with Inversion Coder" (https://ieeexplore.ieee.org/docume
nt/7319012). 2015 37th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC). Vol. 2015. pp. 2956–2959.
doi:10.1109/EMBC.2015.7319012 (https://doi.org/10.1109%2FEMBC.2015.7319012).
ISBN 978-1-4244-9271-8. PMID 26736912 (https://pubmed.ncbi.nlm.nih.gov/26736912).
S2CID 4460328 (https://api.semanticscholar.org/CorpusID:4460328).
16. Devadoss CP, Sankaragomathi B (2019). "Near lossless medical image compression using
block BWT–MTF and hybrid fractal compression techniques" (https://link.springer.com/articl
e/10.1007/s10586-018-1801-3). Cluster Computing. 22: 12929–12937. doi:10.1007/s10586-
018-1801-3 (https://doi.org/10.1007%2Fs10586-018-1801-3). S2CID 33687086 (https://api.s
emanticscholar.org/CorpusID:33687086).
17. Cox AJ, Bauer MJ, Jakobi T, Rosone G (2012). "Large-scale compression of genomic
sequence databases with the Burrows–Wheeler transform". Bioinformatics. Oxford
University Press. 28 (11): 1415–1419. arXiv:1205.0192 (https://arxiv.org/abs/1205.0192).
doi:10.1093/bioinformatics/bts173 (https://doi.org/10.1093%2Fbioinformatics%2Fbts173).
PMID 22556365 (https://pubmed.ncbi.nlm.nih.gov/22556365).
18. Ktistakis R, Fournier-Viger P, Puglisi SJ, Raman R (2019). "Succinct BWT-Based Sequence
Prediction" (https://figshare.com/articles/conference_contribution/Succinct_BWT-based_Seq
uence_prediction/10200137). Database and Expert Systems Applications. Lecture Notes in
Computer Science. Vol. 11707. pp. 91–101. doi:10.1007/978-3-030-27618-8_7 (https://doi.or
g/10.1007%2F978-3-030-27618-8_7). ISBN 978-3-030-27617-1. S2CID 201058996 (https://
api.semanticscholar.org/CorpusID:201058996).

External links
Article by Mark Nelson on the BWT (http://marknelson.us/1996/09/01/bwt/)
A Bijective String-Sorting Transform, by Gil and Scott (http://bijective.dogma.net/00yyy.pdf)
Yuta's openbwt-v1.5.zip contains source code for various BWT routines including BWTS for
bijective version (https://web.archive.org/web/20170306035431/https://encode.ru/attachmen
t.php?attachmentid=959&d=1249146089)
On Bijective Variants of the Burrows–Wheeler Transform, by Kufleitner (https://arxiv.org/abs/
0908.0239)
Blog post (http://google-opensource.blogspot.com/2008/06/debuting-dcs-bwt-experimental-b
urrows.html) and project page (https://code.google.com/p/dcs-bwt-compressor/) for an open-
source compression program and library based on the Burrows–Wheeler algorithm
MIT open courseware lecture on BWT (Foundations of Computational and Systems Biology)
(https://www.youtube.com/watch?v=P3ORBMon8aw)
League Table Sort (LTS) or The Weighting algorithm to BWT by Abderrahim Hechachena (ht
tps://github.com/abderrahimh/ARahim)
