For a given probability space, the measurement of a rarer event yields more information content than that of a more common one. Thus, self-information is a strictly decreasing (antitone) function of the probability of the event under observation.
Intuitively, more information is gained from observing an unexpected event—it is "surprising".
For example, if there is a one-in-a-million chance of Alice winning the lottery, her friend Bob will gain significantly more information from learning that she won than that she lost on a given day. (See also: Lottery mathematics.)
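Plugging in the numbers makes the asymmetry concrete. The following minimal Python sketch applies the definition $\operatorname{I}(\omega) = -\log_2 \operatorname{P}(\omega)$ derived later in this article to the one-in-a-million odds above (the exact probability is illustrative):

```python
import math

# Self-information I(p) = -log2(p), in bits.
p_win = 1e-6          # one-in-a-million chance of winning (from the example)
p_lose = 1 - p_win

print(f"I(win)  = {-math.log2(p_win):.2f} bits")   # ~19.93 bits: very surprising
print(f"I(lose) = {-math.log2(p_lose):.2e} bits")  # ~1.4e-6 bits: almost none
```

Learning that Alice won carries roughly twenty bits of information; learning that she lost carries almost none.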
The corresponding property for likelihoods is that the log-likelihood of independent events is the sum of the log-likelihoods of each event. Interpreting log-likelihood as "support" or negative surprisal (the degree to which an event supports a given model: a model is supported by an event to the extent that the event is unsurprising, given the model), this states that independent events add support: the information that the two events together provide for statistical inference is the sum of their independent information.
This measure has also been called surprisal, as it represents the "surprise" of seeing the outcome (a highly improbable outcome is very surprising). This term (as a log-probability measure) was coined by Myron Tribus in his 1961 book Thermostatics and Thermodynamics.
When the event is a random realization of a variable, the self-information of the variable is defined as the expected value of the self-information of the realization; this expected value is the Shannon entropy of the variable.
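As a concrete sketch of that expectation, the snippet below computes the expected self-information of a fair six-sided die roll; the die and the base-2 logarithm are illustrative choices, not part of the definition:

```python
import math

def self_information(p: float) -> float:
    """Self-information -log2(p), in bits, of an outcome with probability p."""
    return -math.log2(p)

# Expected self-information over all realizations of a fair six-sided die;
# this expectation is the Shannon entropy of the variable.
probs = [1 / 6] * 6
entropy = sum(p * self_information(p) for p in probs)
print(f"H = {entropy:.3f} bits")  # log2(6) ~ 2.585 bits
```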
To verify this, the 6 outcomes $(X, Y) \in \{(k, k)\}_{k=1}^{6}$ correspond to the event that both dice rolled the same value, with total probability $6 \cdot \tfrac{1}{36} = \tfrac{1}{6}$. These are the only outcomes for which the identity of which die rolled which number is faithfully preserved, because both numbers are the same. Without knowledge to distinguish the dice, the remaining $\binom{6}{2} = 15$ unordered combinations correspond to one die rolling one number and the other die rolling a different number, each having probability $\tfrac{2}{36} = \tfrac{1}{18}$. Indeed, $6 \cdot \tfrac{1}{36} + 15 \cdot \tfrac{1}{18} = 1$, as required.
Unsurprisingly, the information content of learning that both dice rolled the same particular number is greater than the information content of learning that one die rolled one number and the other rolled a different number. Take for examples the events $A_k = \{(X, Y) = (k, k)\}$ and $B_{j,k} = \{(X, Y) \in \{(j, k), (k, j)\}\}$ for $1 \le j < k \le 6$. For example, $A_2 = \{(X, Y) = (2, 2)\}$ and $B_{3,4} = \{(X, Y) \in \{(3, 4), (4, 3)\}\}$.
The information contents are

$$\operatorname{I}(A_2) = -\log_2\!\left(\tfrac{1}{36}\right) \approx 5.170\ \text{bits}$$

$$\operatorname{I}(B_{3,4}) = -\log_2\!\left(\tfrac{1}{18}\right) \approx 4.170\ \text{bits}$$

Let $\mathrm{Same}$ be the event that both dice rolled the same value and $\mathrm{Diff}$ be the event that the dice differed. Then $\operatorname{P}(\mathrm{Same}) = \tfrac{1}{6}$ and $\operatorname{P}(\mathrm{Diff}) = \tfrac{5}{6}$. The information contents of the events are

$$\operatorname{I}(\mathrm{Same}) = -\log_2\!\left(\tfrac{1}{6}\right) \approx 2.585\ \text{bits}$$

$$\operatorname{I}(\mathrm{Diff}) = -\log_2\!\left(\tfrac{5}{6}\right) \approx 0.263\ \text{bits}$$
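These probabilities and information contents can be checked by brute-force enumeration; a minimal Python sketch, where the event names mirror those used above:

```python
import math
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 equally likely ordered rolls

p_same = sum(1 for x, y in rolls if x == y) / len(rolls)  # 6/36 = 1/6
p_diff = 1 - p_same                                       # 5/6

print(f"I(Same) = {-math.log2(p_same):.3f} bits")   # ~2.585
print(f"I(Diff) = {-math.log2(p_diff):.3f} bits")   # ~0.263
print(f"I(A_2)  = {-math.log2(1 / 36):.3f} bits")   # doubles (2, 2): ~5.170
print(f"I(B_34) = {-math.log2(1 / 18):.3f} bits")   # unordered {3, 4}: ~4.170
```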
By definition, information is transferred from an originating entity possessing the information to a receiving entity only when the receiver had not known the information a priori. If the receiving entity had previously known the content of a message with certainty before receiving the message, the amount of information of the message received is zero.
When the content of a message is known a priori with certainty, with probability of 1, there is no actual information conveyed in the message. Only when the advance knowledge of the content of the message by the receiver is less than 100% certain does the message actually convey information.
Accordingly, the amount of self-information contained in a message conveying content informing an occurrence of event $\omega_n$ depends only on the probability of that event:

$$\operatorname{I}(\omega_n) = f(\operatorname{P}(\omega_n))$$

for some function $f(\cdot)$ to be determined below. If $\operatorname{P}(\omega_n) = 1$, then $\operatorname{I}(\omega_n) = 0$. If $\operatorname{P}(\omega_n) < 1$, then $\operatorname{I}(\omega_n) > 0$.

Further, by definition, the measure of self-information is nonnegative and additive. If a message informing of event $C$ is the intersection of two independent events $A$ and $B$, then the information of event $C$ occurring is that of the compound message of both independent events $A$ and $B$ occurring. The quantity of information of the compound message $C$ would be expected to equal the sum of the amounts of information of the individual component messages $A$ and $B$ respectively:

$$\operatorname{I}(C) = \operatorname{I}(A \cap B) = \operatorname{I}(A) + \operatorname{I}(B).$$

Because of the independence of events $A$ and $B$, the probability of event $C$ is

$$\operatorname{P}(C) = \operatorname{P}(A \cap B) = \operatorname{P}(A) \cdot \operatorname{P}(B).$$

However, applying the function $f(\cdot)$ results in

$$f\big(\operatorname{P}(A) \cdot \operatorname{P}(B)\big) = f(\operatorname{P}(A)) + f(\operatorname{P}(B)).$$

The class of functions $f(\cdot)$ having the property

$$f(x \cdot y) = f(x) + f(y)$$

is the logarithm function of any base. The only operational difference between logarithms of different bases is that of different scaling constants, so we may take

$$f(x) = K \log(x)$$

for some constant $K$. Since the probabilities of events are always between 0 and 1 and the information associated with these events must be nonnegative, this requires $K < 0$.

Taking into account these properties, the self-information $\operatorname{I}(\omega_n)$ associated with outcome $\omega_n$ of probability $\operatorname{P}(\omega_n)$ is defined as:

$$\operatorname{I}(\omega_n) = -\log(\operatorname{P}(\omega_n)) = \log\!\left(\frac{1}{\operatorname{P}(\omega_n)}\right)$$
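The requirements of the derivation are easy to sanity-check numerically. A minimal Python sketch using the base-2 logarithm, i.e. one admissible choice $K = -1/\ln 2$; the probabilities are arbitrary illustrative values:

```python
import math

def self_information(p: float) -> float:
    """I(p) = -log2(p): the definition above with a base-2 logarithm (K = -1/ln 2)."""
    return -math.log2(p)

# A certain event (probability 1) conveys no information.
assert self_information(1.0) == 0.0

# Additivity for independent events: f(P(A) * P(B)) = f(P(A)) + f(P(B)).
p_a, p_b = 0.5, 0.25  # arbitrary example probabilities
assert math.isclose(self_information(p_a * p_b),
                    self_information(p_a) + self_information(p_b))
```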
The smaller the probability of event $\omega_n$, the larger the quantity of self-information associated with the message that the event indeed occurred. If the above logarithm is base 2, the unit of $\operatorname{I}(\omega_n)$ is the bit; this is the most common practice. When using the natural logarithm (base $e$), the unit is the nat. For the base-10 logarithm, the unit of information is the hartley.
As a quick illustration, the information content associated with an outcome of 4 heads (or any specific outcome) in 4 consecutive tosses of a coin would be 4 bits (probability 1/16), and the information content associated with getting a result other than the one specified would be ~0.09 bits (probability 15/16). See below for detailed examples.
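Those figures, and the unit conversions from the preceding paragraph, can be reproduced directly; a short Python sketch:

```python
import math

p_specific = (1 / 2) ** 4  # any specific outcome of 4 fair coin tosses: 1/16
p_other = 1 - p_specific   # any result other than the specified one: 15/16

print(f"{-math.log2(p_specific):.2f} bits")  # 4.00 bits
print(f"{-math.log2(p_other):.4f} bits")     # ~0.0931 bits

# The same quantity expressed in the other units mentioned above.
print(f"{-math.log(p_specific):.3f} nats")        # natural log: ~2.773
print(f"{-math.log10(p_specific):.3f} hartleys")  # base 10: ~1.204
```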
^ R. B. Bernstein and R. D. Levine (1972). "Entropy and Chemical Change. I. Characterization of Product (and Reactant) Energy Distributions in Reactive Molecular Collisions: Information and Entropy Deficiency". The Journal of Chemical Physics 57, 434–449.
^ Myron Tribus (1961). Thermostatics and Thermodynamics: An Introduction to Energy, Information and States of Matter, with Engineering Applications. New York: D. Van Nostrand. pp. 64–66.
^ Thomas M. Cover, Joy A. Thomas (1991). Elements of Information Theory. p. 20.