A Detailed Explanation of Bucket Sort in Java (Data Structures and Algorithms)

This article describes a Java implementation of bucket sort, shared for your reference as follows.

Basic idea:

Suppose the input consists of real numbers uniformly distributed over the interval [0, m), generated by some random process. Divide [0, m) into n equal-sized subintervals (buckets), distribute the n input elements among these buckets, sort the elements within each bucket, and then concatenate the buckets in order. Given input 0 ≤ A[1..n] < m, an auxiliary array B[0..n-1] holds pointers to the buckets (linked lists). The n records are distributed among the buckets; if more than one record falls into the same bucket, those records must be sorted within the bucket. Finally, listing the records in each bucket in order yields the sorted sequence.

The bucket-keyword mapping function

bIndex = f(k), where bIndex is the index into the bucket array B (i.e., the bIndex-th bucket) and k is a keyword of the sequence to be sorted. The key to efficient bucket sorting lies in the mapping function, which must satisfy: if k1 < k2, then f(k1) ≤ f(k2). In other words, every element in B[i] must be no greater than every element in B[i+1]. Clearly, the choice of mapping function depends closely on the characteristics of the data itself. Consider the following example:

Suppose the keys to be sorted are k = {49, 38, 35, 97, 76, 73, 27, 49}, all in the range 0 to 99. We therefore set up 10 buckets and define the mapping function f(k) = k / 10 (integer division). The first key, 49, is then placed in bucket 4 (49 / 10 = 4). Each key is dropped into its bucket in turn, and each non-empty bucket is then sorted, for example with quicksort.
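To make the mapping concrete, here is a small Java check of the example above (the class and method names are just for illustration):

```java
public class BucketIndexDemo {

    // The mapping function from the example: f(k) = k / 10 (integer division).
    public static int f(int k) {
        return k / 10;
    }

    public static void main(String[] args) {
        int[] keys = {49, 38, 35, 97, 76, 73, 27, 49};
        for (int k : keys) {
            System.out.println(k + " -> bucket " + f(k));
        }
        // 49 -> bucket 4, 38 -> bucket 3, 35 -> bucket 3, 97 -> bucket 9, ...
    }
}
```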

Once the data in each B[i] is output in sequence, the fully ordered sequence is obtained.

The core code of the algorithm is as follows:
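As a minimal, self-contained sketch (not the author's original listing), assuming keys in the range [0, 100), 10 buckets, and f(k) = k / 10 as in the example above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BucketSort {

    // Sorts keys assumed to lie in [0, 100), using 10 buckets and f(k) = k / 10.
    public static int[] bucketSort(int[] keys) {
        final int bucketCount = 10;
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
        // Distribute each key into its bucket via the mapping function.
        for (int k : keys) {
            buckets.get(k / 10).add(k);
        }
        // Sort each bucket, then concatenate the buckets in order.
        int[] sorted = new int[keys.length];
        int pos = 0;
        for (List<Integer> bucket : buckets) {
            Collections.sort(bucket); // a stable sort keeps equal keys in input order
            for (int k : bucket) {
                sorted[pos++] = k;
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] keys = {49, 38, 35, 97, 76, 73, 27, 49};
        System.out.println(java.util.Arrays.toString(bucketSort(keys)));
        // [27, 35, 38, 49, 49, 73, 76, 97]
    }
}
```

Because `Collections.sort` is a stable merge sort and the bucket mapping is monotone, the two equal keys (49) keep their relative input order in the output.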

Cost analysis of bucket sort

Bucket sort uses a functional mapping to eliminate almost all of the comparison work. Computing f(k) is, in effect, the counterpart of the partitioning step in quicksort: it splits a large amount of data into blocks (buckets) that are already roughly ordered relative to one another. Only the small amount of data within each bucket then needs to be compared and sorted.

The time complexity of bucket sort on n keywords breaks into two parts:

(1) Computing the bucket mapping function for each keyword in a loop, which costs O(n). (2) Sorting the data in each bucket with a comparison-based sorting algorithm, which costs O(Σ Ni * logNi), where Ni is the number of elements in the i-th bucket.
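Written out, the total cost of the two parts is:

```latex
T(n) = O(n) + O\!\left(\sum_{i=0}^{m-1} N_i \log N_i\right),
\qquad \text{where } \sum_{i=0}^{m-1} N_i = n .
```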

Clearly, part (2) determines the performance of bucket sort. Minimizing the amount of data in each bucket is the only way to improve efficiency (since the best average time complexity achievable by comparison-based sorting is O(n * logn)). Therefore, we should try to achieve the following two points:

(1) The mapping function f(k) should distribute the n data items evenly across the m buckets, so that each bucket holds about n/m items. (2) Increase the number of buckets as much as possible. In the extreme case, each bucket holds only one item, which completely avoids comparison-based sorting within the buckets. Of course, this is not easy to do: when the data volume is huge, such an f(k) requires an enormous number of buckets and wastes space badly. This is a trade-off between time cost and space cost.

For N items to be sorted and M buckets, with N/M items per bucket on average, the average time complexity of bucket sort is:

O(N)+O(M*(N/M)*log(N/M))=O(N+N*(logN-logM))=O(N+N*logN-N*logM)

When N = M, i.e., in the limiting case of one item per bucket, bucket sort reaches its best efficiency of O(N).

Summary: the average time complexity of bucket sort is linear, O(N + C), where C = N * (logN - logM). For the same N, the larger the number of buckets M, the higher the efficiency, with a best-case time complexity of O(N). Of course, the space complexity of bucket sort is O(N + M); if the input data volume is huge and the bucket count is large, the space cost is undoubtedly expensive. In addition, bucket sort is stable (provided the sort used within each bucket is stable).

In short, three points:

1. Bucket sort is stable (when the in-bucket sort is stable).
2. Bucket sort is the fastest of the common sorting algorithms, faster than quicksort in most cases.
3. Bucket sort is very fast but also consumes a lot of space; it is essentially the most space-consuming sorting algorithm.

Supplement: among search algorithms, the best time complexity of comparison-based search is also O(logn), as in binary search, balanced binary trees, red-black trees, and so on. A hash table, however, offers O(C) constant-time search efficiency, reaching O(1) when there are no collisions. So: don't hash tables and bucket sort share the same underlying idea?

Indeed they do, but bucket sort places special demands on the data. If the range of possible values is huge, it is obviously impractical to allocate hundreds of millions of buckets. Bucket sort therefore has its limitations: it is suited to cases where the set of possible element values is small.

