Welcome to the home page of our deep-shallow suffix array and BWT construction algorithm. This algorithm has been developed by Giovanni Manzini (Università del Piemonte Orientale) and Paolo Ferragina (Università di Pisa). ADetails and experiments on this algorithm have been published Algorithmica, 40(1), 2004.
All the known algorithms for building the suffix array either require a large amount of space or are inefficient when the input string contains many repeated substrings. Our algorithm was designed to overcome this unpleasant dichotomy by deploying the following idea: use a ``shallow'' sorter for the suffixes with a small common prefix, and a ``deep'' sorter for the suffixes with a large common prefix. The algorithm is very fast in practice and uses a very small amount of space---in addition to the space required by the suffix array itself (hence, it is named lightweight).
The source code of our lightweight suffix-sorting algorithm is
available under GNU GPL. It has been tested only under Linux, but it
should work on any platform supporting GCC. You are encouraged to download it and report any problem to the
authors. The archive includes also an algorithm to compute the
Burrows-Wheeler Tranform of a string (the basic step of the
bzip2 compressor) and some algorithms to compute the LCP
array (published on SWAT 2004).
Note. A recent paper by Stefan
Burkhardt and Juha Karkkainen (Fast lightweight suffix array
construction and checking, Proc. CPM '03, Springer-Verlag LNCS
n. 2676) describes a new approach to lightweight suffix array
construction which is theoretically superior to our
approach. Anyone interested in lightweight suffix array construction
should definitely look at this paper!