Random enumeration of hash tables in Ocaml

Sorry, it's a long question I decided to explain the background of the problem first, because my problem may have other solutions If you are in a hurry, please read the following questions

(edit – at the same time, I added some attempts to solve the problem. The fourth has my final conclusion, you can jump directly to it.)

context

I have a hash table with about 20K pairs (key (I), value (I)) I want to generate a random list like this

[(key(213),value(213));(key(127),value(127));(key(89),value(89));...]

The limitation is that once I select key (213) as the first element of the list, not all keys can follow it (I have some other functions to 'decide' which can decide whether a key can be the next key. List or not) So, I want to select a random next element and check if it's appropriate - in the above example, I selected the key (127) If this element is rejected by my 'decide' function, I want to choose another one at random But I don't want to choose the same rejected one, because I know it will be rejected again and it will not only be inefficient. If only a few keys can become the next one, I will also take a risk. It takes a long time until I find the right key. Note that it can be repeated, for example

[(key(213),value(213));(key(213),value(213));(key(78),value(78));...]

This is possible as long as the 'decision' function accepts the key (213) as the next one in the list So what I need is a way to randomly enumerate (key, value) pairs in the hash table Whenever I have to select a key, I create an enumeration. I consume it by using the 'decision' function to check each new element (so there will be no repetition). When I find one, I add it to the list and continue to increment the list The problem is that I don't want the hash table enumeration to be the same every time I hope it's random (this is related to the structure of my search space in a specific problem, which is irrelevant here.)

I can certainly do this by generating random integers and using only lists - that's what I'm doing right now However, since this is a problem I often encounter, I wonder if there are some random enumeration tools for hash tables somewhere

problem

Does the hash table have some random enumeration functions? I know the function bathashtbl Enum (battery Library), but I think it will always give me the same hash table and the same enumeration (is this correct?) In addition, there seems to be nothing of any kind in the bat hashtbl module I'll be interested in something like that

random_enum: ('a,'b) t -> int -> ('a * 'b) Enum.t

When a hash table and some integers are provided as seeds of the random generator, it will give different random enumerations of the hash table Any ideas?

Thanks for your help!

Best, surikator

First attempt

I came up with this after Niki suggested in the comments and looked at more details through the battery store

let rand_enum ht n =
BatRandom.init n;
let hte = BatHashtbl.enum ht
in let s = BatRandom.shuffle hte (* This returns*)
in Array.to_list s

What type

val rand_enum : ('a,'b) BatHashtbl.t -> int -> ('a*'b) list

It uses Fisher Yates algorithm for shuffling running in O (n) It returns a list instead of an enumeration, which is annoying because it means that even if I use Rand_ Enum is satisfied with the third element of the list, and the function will still calculate the random enumeration of the entire 20K element Hash table

Best, surikator

Second attempt

I define the module rndhashtblenum as

(* Random Hashtable Enumeration Module *)
type ('a,'b) t = {
   ht:('a,'b) BatHashtbl.t;
   mutable ls:('a*'b) list;
   f: (('a,'b) BatHashtbl.t -> ('a*'b) list)}

let shuffle ht =
  let hte = BatHashtbl.enum ht
  in let s = BatRandom.shuffle hte
  in Array.to_list s

let create ht n = (BatRandom.init n; {ht=ht;ls=shuffle ht;f=shuffle})

let rec next re =
match re.ls with
    | [] -> re.ls<-(re.f re.ht);next re
    | h::t -> re.ls<-t; h

It has a new type T for randomly enumerating hash tables. This type stores the hash table we want to enumerate, the list we will enumerate, and the function to calculate a new enumeration list (from the hash table) once we run out of lists Once the list runs out, when we request a new random element of the hash table, type T is automatically placed in the new random list created from the hash table

Therefore, using the above module, if we want to randomly enumerate hash tables, we just need to:

let re = RndHashtblEnum.create ht 1236

Create a random enumeration of the hash table HT using random seed 1236 (in this code, I assume that the hash table is defined earlier) and then we can write

let (k,v) = RndHashtblEnum.next re

Get the next (k, V) pair from the random enumeration

One question we might ask is whether this is actually fair randomness, because the next time I need random enumeration, I use the rest of the list to randomly enumerate the hash table Well, that's not the case If my hash table has 1000 elements and I am satisfied with the results after extracting 5 random numbers, I know that these 5 elements will not be extracted in the next 995 (second group extraction) So this is not a fair randomness The situation is even worse It is likely that in the next 1000 extracts (995 in this list and 5 in the next enumeration list), some elements will not be overwritten On average, the algorithm is fair, but it is always unfair

Best, surikator

Third attempt

Hello, we meet again,

Including Niki's use of batarray With the suggestion of enum and the fundamental change of random part of the algorithm, I propose a new and improved version of rndhashtblenum module The recommendations are:

(* Improved Random Hashtable Enumeration Module *)
type ('a,'b) t = {ht:('a,'b) BatHashtbl.t; mutable enum:('a*'b) BatEnum.t; enum0: ('a*'b) BatEnum.t}

let shuffle ht =
let hte = BatHashtbl.enum ht
in let s = BatRandom.shuffle hte
in BatArray.enum s

let create ht n =
let e = shuffle ht
in (BatRandom.init n; {ht=ht;enum=BatEnum.clone e;enum0=e})

let rec next re =
match BatEnum.get re.enum with
    | None -> re.enum<-re.enum0; next re
    | Some e -> e

This new module eliminates the (stupid) cost of passing arrays to lists and uses only the Fisher Yates algorithm at the beginning - so in the long run, we can consider that the contribution of Fisher Yates bits is O (1)

The new version is now fair in terms of randomness It's not easy to see. It took me a while to realize it Suppose the hash table has 1000 entries In the new version, we always use the same enum (enum0 – fixed when creating random enums using the "create" function) This means that when we try to find the next element in the final list, because some keys in the hash table must meet the "decision" function (otherwise we will not be able to continue the algorithm, we will only stop), it will do so somewhere between articles 0 and 999 Suppose it's on entry 300 Now, since we have selected this key, in order to determine the next key in the final list, our enumeration will continue to use the remaining 700 elements, and then continue to use the next 300 in the same copy List Therefore, 700 and 300 generate exactly 1000 in the hash table This means that we will always consider each entry in the hash table only once Another thing is that whenever we try to find a key to enter the list, we can find the label on entry 300, entry 734 or other content, because the decision function actually depends on which previous keys are selected This is on the final list Therefore, every time we restart looking for the elements of the final list in the hash table, we start with the random elements of the hash table

Sorry, if this is not very clear It's hard to explain. =)

Thanks for all the comments

Best, surikator

Fourth and final attempt – this is my solution

Hey, again,

Sharing gasche's concerns about variable fields and enumerations and all the strange side effects that might come from there, I decided to forget to use the off the shelf solution of the available hash table library and write my things using ordinary lists I also brought laziness to avoid generating random lists, only some of which will be used (so there are some useful lazy things to use as you suggest, Niki)

I created this type

type 'a node_t =
   | ENil
   | ECons of 'a * 'a list * 'a t
and 'a t = ('a node_t) Lazy.t

For lazy random enumeration of lists Each enumeration is empty (enil) or not (econs), in which case it has three parts: (1) the element of the current focus, (2) the remaining available elements of the enumeration, and (3) another enumeration to continue this enumeration

You can then use the create function to obtain a random enumeration of the list

let rec create ls =
lazy(   match ls with
    | [] -> ENil
    | h::t -> let n = Random.int (List.length ls)
              in let newx,rest=remove ls n
          in ECons(newx,rest,create t))

The auxiliary deletion function has been defined to extract the nth element of the list and return a pair (x, LS), where x is the extracted element and LS is the new list without extracted element For the sake of completeness, I also added the code to delete the function here

let rec remove ls n =
let rec remove_ ls acc k n =
    match ls with
        | []        -> raise (Failure "remove")
        | h::t  -> if k=n
            then    h,List.rev_append acc t
            else remove_ t (h::acc) (k+1) n
in remove_ ls [] 0 n

We can now define a very simple function to generate the next state of the random enumeration and get the actual element in each state of the enumeration Those are

exception End_of_enum
let next e =
match Lazy.force e with
    | ENil -> raise End_of_enum
    | ECons(x,ls,t) -> t
let rec get e =
match Lazy.force e with
    | ENil -> raise End_of_enum
    | ECons(x,t) -> x

OK, so far, I've just made a random list If we want to enumerate hash tables, we can use

let rand_enum ht =
let ls = Hashtbl.fold (fun k v acc -> (k,v) :: acc) ht []
in create ls

To obtain the random enumeration of pairs in the hash table, we can use next and obtain (key, value) pairs Collapse, but it's just a method to get the hash table of all (key, value) pairs in the list (thanks to Pascal's answer in this question)

This ends the whole hash table enumeration For completeness, I have also added solutions to the overall problem I am trying to solve, which are explained in the "context" above If you remember, the problem is to (1) randomly generate a list of (key, value) pairs from the hash table, and (2) determine whether (key, value) can be attached to the decision function specific pair list of some hash tables Since the whole generation process may never terminate, in order to ensure termination, I think there is a third parameter that makes sense. This function tells us whether the process should be stopped (we should ensure that it returns true at some time). The whole process terminates)

The function generate might look like this

let generate ht d stop =
let rec gen1 d fst e =
    if d (List.rev fst) (get e)
                then (get e)::fst
                else gen1 d fst (next e)
in let rec generate_ ht d stop acc =
            let e = rand_enum ht
            in  if stop acc
                        then acc
                        else    try generate_ ht d stop (gen1 d acc e)
                          with End_of_enum -> generate_ ht d stop (List.tl acc)
in generate_ ht d stop []

Thank you very much for all the useful comments It's really helpful

Good luck, surikator

Solution

I have two suggestions The first is to change your Rand_ Enum function, so it returns an enum t:

let rand_enum ht n =
BatRandom.init n;
let hte = BatHashtbl.enum ht
in Array.enum (BatRandom.shuffle hte)

It's not very different (it still calculates all 20K random enumerations) but it's closer to what you originally wanted

Alternatively, you can always get the source code of hashtbl and use Rand_ The enum function recompiles it However, this may not be so different, because hashtbl is implemented as an array. If you want to avoid repeated errors, you may eventually use shuffle

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>