Random In a Stream

Exercise

Recall that a stream is, informally, a possibly infinite sequence of elements which can be read from one at a time, and can’t be rewound (without using extra memory). That is, it’s an iterator that only has a next method.

Given a stream of length $n$ (where $n$ is not known), uniformly choose a random element from that stream using $O(1)$ space and $O(n)$ time.

Solution

The following solution, in Common Lisp, uses a list as the representation for our stream. However, any data type could be used as long as we can tell if we are at the end of a stream (i.e., null) and how to get the next element of the stream (i.e., pop).

(defun random! (seq)
  (labels ((rec (rand-elt count)
             (if (null seq)
                 rand-elt
                 (let ((next (pop seq)))
                   (rec (if (zerop (random count))
                            next
                            rand-elt)
                        (1+ count))))))
    (rec (pop seq) 2)))

The general strategy is this: as we pull items out of the stream, increment a counter (thereby “labeling” the item), and select that as our “new” item with a probability inversely proportional to the counter.

More precisely: Suppose we are at the $k$th item, then with a probability of $1/k$, choose that item as our new random item $r$. When we have exhausted the stream or we have iterated through $n$ elements, our random item will be $r$.

We can prove that the $k$th item in the stream of length $n$ has a probability of $P(k,n)=1/n$. We do so inductively.

Base Case. For a stream of length $n=1$, it is clear that we have a probability $P(1,1)=1$.

General Case. Suppose we have read $n-1$ elements, and suppose $r$ is a uniformly random element from the items enumerated by $1$ and $n-1$. The probability that $r$ will be set to the $n$th element is $1/n$, by definition. The chance that $r$ will not be updated is $(n-1)/n$. In this case, the probability that any particular (suppose the $k$th) previous stream element will remain in $r$ is \begin{equation*}
P(k,n-1)\cdot(1-P(n,n)) = \frac{1}{n-1}\cdot\frac{n-1}{n} = \frac{1}{n}.
\end{equation*}
Therefore, the probability of $r$ being updated is the same as the probability any previous element has a chance of being $r$.

By induction, for a stream of length $n$, $P(k,n) = 1/n$.