Introduction to cardinality sorting and its implementation in Java language
basic thought
Radix sort is developed on the basis of bucket sort. Both sorts are advanced implementations of allocation sort. The basic idea of distributive sort: the sorting process does not need to compare keywords, but realizes sorting through the processes of "allocation" and "collection". Their time complexity can reach linear order: O (n).
Cardinality sorting is a stable sorting algorithm, but it has some limitations:
1. Keywords can be decomposed. 2. The number of key words in the record is less. If it is dense, it is better. 3. If it is a number, it is better to be unsigned, otherwise it will increase the corresponding mapping complexity. You can sort the positive and negative separately first.
Let's take a look at the bucket sort.
Bucket sorting is also called binport. Its basic idea is to set several buckets, scan the records to be sorted R [0], R [1],..., R [n-1], load all the records with keywords in a certain range into the kth bucket (allocation), and then connect the head and tail of each non empty bucket according to the sequence number (Collection).
For example, to sort 52 shuffled playing cards according to points a < 2 <... < J < Q < K, 13 "buckets" need to be set. When sorting, each card is put into the corresponding bucket according to points, and then these buckets are connected from head to tail to get a pair of cards arranged in the order of increasing points.
In bucket sorting, the number of buckets depends on the value range of keywords. Therefore, bucket sorting requires that the type of keyword is limited, otherwise infinite buckets may be required.
Generally, it is unpredictable how many records with the same keyword are stored in each bucket, so the bucket type should be designed as a linked list.
In order to ensure the stability of sorting, the connection in the process of packing and collection must be carried out according to the principle of first in first out.
For bucket sorting, the time of allocation process is O (n); The time of the collection process is O (m) (the linked list is used to store the input records to be sorted) or O (M + n). Therefore, the bucket sorting time is O (M + n). If the order of magnitude of the number of buckets m is O (n), the bucket sorting time is linear, i.e. o (n).
The time complexity of most of the above-mentioned sorting algorithms is O (N2), and the time complexity of some sorting algorithms is O (nlogn). Bucket sorting can achieve o (n) time complexity. But the disadvantages of bucket sorting are: first, the space complexity is relatively high and the additional overhead is large. Sorting has the space overhead of two arrays, one is to store the array to be sorted, and the other is the so-called bucket. For example, if the value to be sorted is from 0 to M-1, m buckets are required, and this bucket array needs at least m spaces. Secondly, the elements to be sorted must be within a certain range, and so on.
Cardinality sorting is an improvement on bucket sorting, which makes bucket sorting suitable for a larger set of element values, rather than improving performance.
We still use the example of playing cards to illustrate. A card consists of two keywords: design and color (peach < heart < plum < Square) + face value (a < 2 < 3 < 4 <... < K). If the size of a card is first determined by the suit, and the cards of the same suit are determined by numbers. We have two algorithms to solve this problem.
That is, if the designs and colors of the two cards are different, regardless of the face value, the card with low design and color is smaller than that with high design and color. Only in the case of the same design and color, the size relationship is determined by the face value. This is multi key sorting.
In order to get the sorting results, we discuss two sorting methods.
Method 1: the designs and colors were sorted and divided into four groups, namely plum blossom group, square group, red heart group and black heart group. Then sort each group by face value. Finally, connect the four groups.
Method 2: first give 13 number groups (No. 2, No. 3,..., No. a) according to 13 face values, put the cards into the corresponding number group according to the face values, and divide them into 13 piles. Then four number groups (plum blossom, square, red heart and black heart) are given according to the color. Take out the cards in group 2 and put them into the corresponding color group respectively, and then take out the cards in group 3 and put them into the corresponding color group,... In this way, the four color groups are orderly according to the face value, and then connect the four color groups in turn.
Multi key sorting is carried out successively from the most primary key to the most secondary key or from the most secondary key to the most primary key. There are two methods:
The most significant digital first method (MSD method for short):
1) Firstly, the sequence is sorted and grouped according to K1, and the sequence is divided into several subsequences. In the records of the same group of sequences, the key K1 is equal.
2) Then each group is sorted into subgroups according to K2, and then the following key codes continue to be sorted and grouped until the subgroups are sorted according to the most minor key code KD.
3) Then connect the groups to get an ordered sequence. The first method introduced in the sorting of playing cards by color and face value is MSD method.
Least significant digital first (LSD) method for short:
1) Start sorting from KD, then sort KD-1 and repeat it in turn until it is sorted by K1 and divided into the smallest subsequence.
2) Finally, connect the sub sequences to get an ordered sequence. The second method introduced in the sorting of playing cards by color and face value is LSD method.
A single keyword of numeric or character type can be regarded as a multi keyword composed of multiple digits or characters, At this time, it can be taken This process is called cardinality sorting, in which the number of possible values of each number or character is called cardinality. For example, the cardinality of the decor of playing cards is 4 and the cardinality of the face value is 13. When sorting playing cards, you can sort them by decor or face value first. When sorting by decor, you can divide them into 4 in the order of red, black, square and flower Stack (distribute), then stack them together (collect) in this order, then divide them into 13 stacks (distribute) in the order of face value, and then stack them together (collect) in this order. In this way, the playing cards can be arranged in order by secondary distribution and collection.
In the process of "allocation collection", it is necessary to ensure the stability of sorting.
The idea of cardinality sorting is to allocate each group of keywords in the data to be sorted in order. For example, the following waiting list:
135、242、192、93、345、11、24、19
We divide the bits, tens and hundreds of each value into three keywords, for example:
135 - > K1 (bits) = 5, K2 (TENS) = 3, K3 (hundreds) = 1.
Then, starting from the lowest bit (starting from the second keyword), allocate buckets for K1 keywords of all data (because each number is 0-9, so the bucket size is 10), and then output the data in the bucket in turn to obtain the following sequence.
(11) , (242, 192), (93), (24), (135, 345), (19) (sort from the lowest keyword, ignore the hundreds and tens, and allocate according to the numbers in the individual bits)
Then, the bucket allocation for K2 is carried out for the above sequence, and the output sequence is:
(11, 19), (24), (135), (242, 345), (192, 93) (sort the second keyword by referring to the last keyword, ignore the hundreds and individual bits, and allocate according to the number on the TENS)
Finally, for bucket allocation of K3, the output sequence is:
(011, 019, 024, 093), (135, 192), (242), (345) (refer to the second keyword to sort the highest keyword, ignore ten bits and one bit, and allocate according to the number above the hundred bit)
Sort complete.
Java implementation
The printing results are as follows:
algorithm analysis
At first glance, the execution efficiency of cardinality sorting seems unbelievable. All we have to do is copy the original data items from the array to the linked list, and then copy them back. If there are 10 data items, there are 20 copies, and the process is repeated for each bit. If you sort 5-digit numbers, you need 20 * 5 = 100 copies. If there are 100 data items, there will be 200 * 5 = 1000 copies. The number of copies is directly proportional to the number of data items, i.e. o (n). This is the most efficient sorting algorithm we have seen.
Unfortunately, the more data items, the longer keywords are required. If the data items are increased by 10 times, the keywords must be increased by one bit (one more round of sorting). The number of copies and the number of data items are directly proportional to the keyword length, and the keyword length can be considered as the logarithm of N. therefore, in most cases, the execution efficiency of cardinal sorting regresses to o (n * logn), Similar to quick sort.
summary
The above is all about the introduction of cardinality sorting and the implementation of Java language. I hope it will be helpful to you. Interested friends can continue to refer to other related topics on this site. If there are deficiencies, please leave a message to point out.