Misplaced Pages

Burstsort

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
This article includes a list of general references, but it lacks sufficient corresponding inline citations. Please help to improve this article by introducing more precise citations. (July 2017) (Learn how and when to remove this message)
Burstsort
ClassSorting algorithm
Data structureTrie
Worst-case performanceO(wn)
Worst-case space complexityO(wn)
Optimal?

Burstsort and its variants are cache-efficient algorithms for sorting strings. They are variants of the traditional radix sort but faster for large data sets of common strings, first published in 2003, with some optimizing versions published in later years.

Burstsort algorithms use a trie to store prefixes of strings, with growable arrays of pointers as end nodes containing sorted, unique, suffixes (referred to as buckets). Some variants copy the string tails into the buckets. As the buckets grow beyond a predetermined threshold, the buckets are "burst" into tries, giving the sort its name. A more recent variant uses a bucket index with smaller sub-buckets to reduce memory usage. Most implementations delegate to multikey quicksort, an extension of three-way radix quicksort, to sort the contents of the buckets. By dividing the input into buckets with common prefixes, the sorting can be done in a cache-efficient manner.

Burstsort was introduced as a sort that is similar to MSD radix sort, but is faster due to being aware of caching and related radixes being stored closer to each other due to specifics of trie structure. It exploits specifics of strings that are usually encountered in real world. And although asymptotically it is the same as radix sort, with time complexity of O(wn) (w – word length and n – number of strings to be sorted), but due to better memory distribution it tends to be twice as fast on big data sets of strings. It has been billed as the "fastest known algorithm to sort large sets of strings".

References

  1. ^ Sinha, R.; Zobel, J. (2005). "Cache-conscious sorting of large sets of strings with dynamic tries" (PDF). Journal of Experimental Algorithmics. 9: 1.5. CiteSeerX 10.1.1.599.861. doi:10.1145/1005813.1041517. S2CID 10807318.
  2. "Burstsort: Fastest known algorithm to sort large set of strings | Hacker News".

External links

Sorting algorithms
Theory
Exchange sorts
Selection sorts
Insertion sorts
Merge sorts
Distribution sorts
Concurrent sorts
Hybrid sorts
Other
Impractical sorts
Category:
Burstsort Add topic