RandomBitStream

© 2006,2007,2008 John Abbott
GNU Free Documentation License, Version 1.2



index page

User documentation for RandomBitSteam

The class RandomBitSteam is supposed to model a binary random variable (with 50% probability of being "true" and 50% of being "false", and with independent samples). See RandomLongStream for a way to generate random numbers.

There are three ways of creating a new RandomBitSteam object:

    RandomBitStream RBS1;      // default ctor, seeded with 1
    RandomBitStream RBS2(n);   // seed generator with abs(n)
    RandomBitStream RBS3(0);   // seed generator from current time

Note that seeding from current time will produce different results each time the program is run; this is likely desirable unless you're trying to debug a randomized algorithm. Conversely, if you create more than one RandomBitStream object with the same seed, they will each produce exactly the same sequence of bits.

Once you have created a RandomBitStream you may perform the following operations on it:

    *RBS             // get the current value of RBS (as a boolean).
    ++RBS            // advance to next value of RBS.
    RBS++            // advance to next value of RBS **INEFFICIENTLY**.
    sample(RBS)      // advance RBS and then return new value; same as ``*++RBS``
    prob(P, RBS)     // use bits from RBS to produce a boolean with prob P of being
                     // true; on average uses less than 2 samples from RBS per call.
    out << RBS       // print some information about RBS.
    RBS.myIndex()    // number of times RBS has been advanced,
                     // same as the number of random bits generated.

Note that a RandomBitStream supports input iterator syntax.

Note prob should work correctly even with probabilities close to 0 (e.g. < 1/10^10).

You may assign or create copies of RandomBitStream objects; the copies acquire the complete state of the original, so will go on to produce exactly the same sequence of bits as the original will produce.

Maintainer documentation for RandomBitSteam

The idea is very simple: use the pseudo-random number generator of GMP to generate a random integer, and then give out the bits of this integer one at at time; when the last bit has been given out, get a new random integer from the GMP generator. The random integer is kept in the data member myBuffer, and myBitIndex is used to read the bits one at a time.

Among the data members, there are two mild surprises: mySeed and myCounter. I put these in to help any poor blighter who has to debug a randomized algorithm, and who may want to "fast forward" the RandomBitSteam to the right place.

The data member myState holds all the state information used by the GMP generator. Its presence makes the ctors, dtor and assignment messier than they would have been otherwise.

The advancing and reading member functions (i.e. operator++ and operator*) are inline for efficiency, as is the sample function. The condition for refilling myBuffer is when the index goes beyond the number of bits held in myBuffer.

myFillBuffer also sets the data member myBitIndex to zero; it seemed most sensible to do this here.

The function prob is nifty; if you think about it for a moment, it is obviously right (and economical on random bits). It would be niftier still if the probability were specified as an unsigned long -- on a 64-bit machine this ought to be sufficient for almost all purposes. If the requested probability is of the form N/2^k for some odd integer N, then the average number of bits consumed by prob is 2-2^(1-k), which always lies between 1 and 2. If the requested probability is 0 or 1, no bits are consumed.

Bugs, Shortcomings and other ideas

Should the syntax for seeding from current time be more visibly different from the other ctor calls?

Is run-time speed so important here? How many algorithms really consume millions of random bits? Surely the computations on the random bits/values should exceed the cost of generating them, shouldn't it?

Is it really worth having inline functions here? operator++() is long enough that it is hard to justify keeping it inline.

Should sample advance before or after getting the value?

Is the information printed by myOutputSelf adequate? Time will tell.