Re: Constructing a random permutation on the fly
- From: David Eather <eather@xxxxxxxxxx>
- Date: Mon, 15 Jun 2009 17:15:08 +1000
David Eather wrote:
Tom Anderson wrote:Hello!
This question is not really about cryptography, but it has a cryptographyish flavour, so i thought i'd ask it here. If there's a better place to ask it, please let me know.
I have a list of N items, where N is about 100 000, and a few thousand or more users. I would like to present each user with all the items in a random order, where the order is different for each user. I only need to go through each user's randomised list of items one by one, rather than having random access into the list, but the process of going through them could take days, weeks, perhaps even months.
I could put all the items in an array, shuffle them, and go through them from first to last, storing the shuffled array for as long as it's needed. Or i could put the numbers 0 ... N-1 in an array, shuffle and store that, and go through that, using the numbers and indices into a master list of items. However, i would like to avoid having to store great big lists of numbers if at all possible.
So, what can i do?
<snip> </snip>
To create a random permutation of size N.
1. Find the number of minimum bits (n-bits) required for N
n-bits = int( log(N) / log(2) +1)
for N = 100000, n-bits = 17 bits
2. Find the required length of the user key
The MINIMUM length of user key (or seed) is
seed_length = int( log(N**2) / log(2) +1)
for N = 100000, seed_length = 34 bits
The length of key is to make sure that everyone has a unique seed. Note that this is a minimum length and it means that the chance of a matching key pair amongst all users is less than 50%.
I suggest that a more conservative seed length is at least
seed_length = int( log(N**3) / log(2) +1)
for N = 100000 this suggests seed_length = 50 bits
Use any convenient key length that is longer than minimum requirement (eg. in the N=100000 example 40-bits would OK. But 64-bits would be a good and more conservative choice.
3. Create a quality random number generator whose output is in the range 0 to N-1 (pseudo code follows)
Get_Random:
increment iteration_counter
random_candidate = Hash( key || iteration_counter)
random_candidate = Random_Candidate AND (2**n-bits) -1
if random_candidate >= N-1 then Get_Random
Else return random_candidate
The variable iteration_counter is a stateful variable with local scope. It ensure the hash function behaves like a random number generator. Otherwise, if the hash function hit a fixed point it would then output a never ending stream of identical numbers, which is not a behavior expected of a random number generator.
Also, if the hash function gave two identical outputs for two different keys, then from that matching point onwards, both keys would produce a identical output. Using the iteration counter stops the above from happening. An alternative is to use a bigger (and stronger) hash function so these probabilities become so remote as to be negligible, but the iteration counter is quicker and makes the probabilities of this undesirable behavior zero.
The hash function is just a standard hash function that provides the properties you require, such as size and/or cryptographic security. The internal state of the hash function should be as large or larger than the minimum seed_length.
Some examples of useful hash functions would be CRC-32, CRC-64, MD-4, MD-5, SHA-1 or SHA-2. In the N=100000 example you could not use CRC-32 without possible problems (32-bits is too small an internal state when for N = 100000, seed_length = 34 bits). My preferences would be to use MD-4 for any non-cryptographic purpose (it is faster than CRC-32 but with a much larger state of 128-bits) and to use SHA-2 using 256-bits, 384-bits or 512-bits as required for any generator that requires cryptographic strength.
The || symbol simply means concatenation. The input to the hash function is 0xKEY || 0xIteration_Counter. If required certain types of off-line cryptographic attacks can be made more difficult by also concatenating a “salt” at this point.
The AND function is a bit-wise AND which clears all unnecessary MSB's in random_candidate variable. This limits the range of possible outputs for random-candidate so that candidates will pass with better than a 50% probability in the long run (76.3% in the 100000 sample case). This method does not introduce any bias and is required for speed.
Note that the output sequence is exactly the same for each key.
4. To create the required permutation from the key.
dim permutation_array(N-1)
for i = 0 to n-1 step 1
permutation_array(i) = i.
for i = 0 to n-1 step 1
j = Get_Random(key)
swap permutation_array(i), permutation_array(j)
The first “for i” loop sets permutation_array into a known default condition. In this case it is 0,1,2,3 … N-1. If you didn't do this you could not directly create the required permutation from just the key.
Completing the second “for i” loop produces the shuffled permutation. Pointer i sequentially selects each element in the permutation_array and the swap function exchanges that element with the element selected by the pseudo random pointer j.
5. Done.
If I have not expressed something clearly or I have made an error then please let me know.
.
- Follow-Ups:
- Re: Constructing a random permutation on the fly
- From: Maaartin
- Re: Constructing a random permutation on the fly
- From: David Eather
- Re: Constructing a random permutation on the fly
- References:
- Constructing a random permutation on the fly
- From: Tom Anderson
- Re: Constructing a random permutation on the fly
- From: David Eather
- Constructing a random permutation on the fly
- Prev by Date: shifts for 64-bit lfsr?
- Next by Date: Re: Job Finished - Adacrypt
- Previous by thread: Re: Constructing a random permutation on the fly
- Next by thread: Re: Constructing a random permutation on the fly
- Index(es):
Relevant Pages
|
Loading