Search View Information Theory

To find a specific word, name, or topic in this article, select the option in your Web browser for finding within the page. In Internet Explorer, this option is under the Edit menu.

The search seeks the exact word or phrase that you type, so if you don’t find your choice, try searching for a key word in your topic or recheck the spelling of a word or name.

Information Theory
I. Introduction

Information Theory, application of mathematical principles to the problems of transmitting and storing information. Information theory stems originally from the work of American mathematician and electrical engineer Claude E. Shannon and, in particular, his classic paper “A Mathematical Theory of Communication,” published in 1948 in the Bell System Technical Journal. Shannon rewrote the paper and, with Warren Weaver, published it in book form as The Mathematical Theory of Communication in 1949. Shannon was interested in the problems of communicating information accurately from one place to another. He decided to approach communication from a mathematical point of view.

Information theory focuses on the problems inherent in sending and receiving messages and information. The theory is based on the idea that communication involves uncertain processes, both in the selection of the message to be transmitted and in the transmission of the message itself. Information theory provides a way to measure this uncertainty precisely.

In information theory, the actual meaning of the message is unimportant. Instead, the important qualities of communication are the amount of information that the message contains, the accuracy of the transmission, and the quality of the reception. All of these values are represented mathematically, so different messages and different communication systems can be compared, studied, and improved.

Information theory measures the amount of information in a message by using units called bits, short for binary digits, which use only the numbers 0 and 1 (see Number Systems). Information theory is useful because it provides a way to find the minimum number of bits required to communicate a given message. Information theory can also determine the maximum rate, in bits per second, at which a given communication channel can transmit reliable information.

Information theory is primarily a theoretical study. However, it has had a profound impact on the design of practical data communication and storage systems, such as telephones and computers. The theory can be applied to both the transmission and the storage of messages, because storage is nothing more than transmission in time. For example, both making a telephone call to a friend in another city and tape recording a message for a friend to play later in the day involve the same issues of sending and receiving messages. In information theory, no fundamental distinction is made between these two types of problems.

II. Parts of a Communication System

Any time a message is sent from a sender to a receiver, the different parts of the communication system can be represented by the accompanying schematic figure titled 'Elements of a Communication System,' adapted from Shannon’s work on information theory. The model he devised to represent a communication system always consists of five major parts: the information source, the transmitter, the channel, the receiver, and the destination.

The information source produces (or selects) the message or the sequence of messages to be transmitted to the destination. For example, the information source could be a distant spacecraft and the message could be an image of a planet, or the information source could be a rock-and-roll band and the message could be a new song.

The transmitter converts the message into a signal suitable for transmission over the channel. For example, the transmitter could be the spacecraft telecommunication system that converts a photograph of Jupiter into a television signal. Another example would be the recording studio’s audio equipment, which converts the rock-and-roll song into a sequence of tiny pits on the mirrorlike surface of a compact disc (CD).

The channel is the medium that is used to transmit the signal. The channel is often noisy, in the sense that when the signal arrives at the receiver, it may contain noise or static, or it may be slightly garbled. For example, the channel could be the millions of kilometers of empty space between Jupiter and Earth, with noise arising because the received signal is so weak. Or it could be the surface of a CD, with noise occurring because of fingerprints, dust, or scratches on the surface.

The receiver is a device that reconstructs (either exactly or approximately) the message from the received signal. It could be a large dish antenna or the electronics in a CD player.

The destination is the person (or thing) for which the message is intended. For example, the destination could be a teenager interested in planetary science or an astronomer interested in rock and roll.

Information theory is the mathematical study of these five components, individually and in combination. Existing communication systems can be studied this way, and new systems can be designed based on the knowledge gained. For example, information theory can provide a way to measure the amount of information produced by a source or to measure the ability of a noisy channel to transmit information reliably. In addition, the theory provides the theoretical basis for data compression, which is a way to squeeze more information into a message by eliminating redundancy, or parts of the message that do not contain any important information. Information theory also offers guidelines for the engineering design of transmitters and receivers.

III. Measuring Information: The Bit

In any communication system the message produced by the source is one of several possible messages. The receiver will know what these possibilities are but will not know which one has been selected. Shannon observed that any such message can be represented by a sequence of fundamental units called bits, consisting of either 0s or 1s. The number of bits required for a message depends on the number of possible messages: the more possible messages (and hence the more initial uncertainty at the receiver), the more bits required.

As a simple example, suppose a coin is flipped and the outcome (heads or tails) is to be communicated to a person in the next room. The outcome of the flip of a coin can be represented using one bit of information: 0 for heads and 1 for tails. Similarly, the outcome of a football game might also be represented with one bit: 0 if the home team loses and 1 if the home team wins. These examples emphasize one of the limitations of information theory—it cannot measure (and does not attempt to measure) the meaning or the importance of a message. It requires the same amount of information to distinguish heads from tails as it does to distinguish a win from a loss: one bit.

For an example with more than two outcomes, more bits are required. Suppose a playing card is chosen at random from a 52-card deck, and the suit chosen (hearts, spades, clubs, or diamonds) is to be communicated. Communicating the chosen suit (one of four possible messages) requires two bits of information, using the following simple scheme:

IV. Information Content and Entropy

A fundamental problem in information theory is to find the minimum average number of bits needed to represent a particular message selected from a set of possible messages. Shannon solved this problem by using the notion of entropy. The word entropy is borrowed from physics, in which entropy is a measure of the disorder of a group of particles. In information theory disorder implies uncertainty and, therefore, information content, so in information theory, entropy describes the amount of information in a given message. Entropy also describes the average information content of all the potential messages of a source. This value is useful when, as is often the case, some messages from a source are more likely to be transmitted than are others.

A. Information Content of a Message

If there are a number of possible messages, then each one can be expected to occur a certain fraction of the time. This fraction is called the probability of the message. For example, if an ordinary coin is tossed, the two outcomes, heads or tails, will each occur with probability ½. Similarly, if a card is selected at random from a 52-card deck, each of the four suits will occur with probability ¼.

Shannon determined that the information content of a message is inversely related to its probability of occurrence. The more unlikely a message is, the more information it contains. Based on this concept, the exact information value of a message can be determined mathematically. If the probability of a given outcome is denoted by p, then Shannon defined the information content (I) of that message, measured in bits, to be equal to the base 2 logarithm (log2) of the reciprocal of p. (The log2 of a given number is the exponent that must be given to the number 2 in order to obtain the given number. Log 2 of 8 = 3, for example, because 23 = 8.) The relation between information content and probability can be represented by the equation

I = log21/p

The graph below illustrates a plot of this function. The horizontal axis shows the probability p; p is a number between 0 (an impossible event) and 1 (a certain event). The vertical axis shows the information content.

As can be seen in the graph, the less likely or probable an event, the more information it conveys. Here is a short table giving several values of p and the corresponding information content I:

The more possibilities a source contains, the more information it contains and, therefore, the more bits needed to represent the information content. For example, if a card is picked at random from a 52-card deck and the exact value of the card is to be communicated, then each possibility has probability 1/52. According to the formula I = log21/p, each possible outcome has an information content equal to log252, which equals 5.70044.

B. Entropy of a Source

The entropy of a source is the average information content from all the possible messages. In the example of drawing a card from a 52-card deck, each card has the same chance of being drawn. Since each outcome is equally likely, the entropy of this source is also 5.70044 bits. However, the information content from most sources is more likely to vary from message to message. Information content can vary because some messages may have a better chance of being sent than others have. Messages that are unlikely convey the most information, while messages that are highly probable convey less information.

The following example demonstrates the entropy of a source when the outcomes are not equally likely. Consider a deck of 8 cards, consisting of 4 spades, 2 hearts, 1 diamond, and 1 club. The probability of each suit being drawn at random is no longer equal. The probability and the corresponding information content of drawing a given card are listed in the following table:

In this case the entropy, or average information content, of the system is exactly:

(1/2 × 1) + (1/4 × 2) + (1/8 × 3) + (1/8 × 3) = 1.75 bits.

Even though there are four possible messages, some are more likely than others, so it actually takes less than two bits to specify the information content of this system. To see how to represent this particular source using an average of 1.75 bits, consider the following binary code:

Here the most likely outcome (spades) is represented by a short code and the least likely outcomes are represented by longer codes. Mathematically, the average is exactly 1.75 bits, the source entropy. Information theory states that no further improvement is possible.

V. Channel Capacity and Noise

Information theory is often used to design communication systems. An important property of a transmission system is its channel capacity. Sometimes known as the Shannon limit, channel capacity is a measure of the ultimate speed or rate at which a channel can transmit information reliably. The capacity of a particular system can be approached but never exceeded.

Channel capacity is measured in bits per second for a transmission channel and in bits per centimeter (or per square centimeter) for storage channels. Information theory says that if the transmitter is properly designed, information can be transmitted perfectly reliably at any speed up to the channel’s capacity. However, if capacity is exceeded, then inevitably the message will be received at the destination with errors.

One factor that decreases capacity is noise, or unwanted information. Most communication channels introduce some amount of noise, such as radio static, into the message. Noise interferes with messages and often causes errors to occur during transmission. For example, a noisy channel that is used to transmit zeros and ones might occasionally change a zero into a one or a one into a zero. In such a case, the message 10001 might be received as 11001. Information theory provides a way to measure the severity of the noise by measuring the capacity of a channel.

The graph below shows the fraction of received bits that are erroneous as a function of the transmission rate, measured in multiples of capacity. For example, if the capacity is 1,000 bits per second, a rate of 0.5 equals a transmission speed of 500 bits per second. The surprising thing about the graph below is that the fraction of bits that are erroneous need not increase as the transmission speed increases until capacity is reached. However, once capacity is exceeded, errors are inevitable. For example, at a speed of 99 percent of capacity, no errors have to occur, but at twice capacity at least 11 percent of the bits will be erroneous, no matter how well the system is designed.

However, even when transmission rates are below capacity, noise usually introduces some errors. To combat the effects of noise, engineers strive to build systems that will transmit messages close to capacity over a noisy channel without introducing errors. To do so, they use error-correcting coding. If the information source is a random sequence of bits, a device in the transmitter called the encoder uses these data bits to carefully calculate a number of check bits. The idea is that when the data bits and the check bits are combined, a strong pattern appears. The data bits and the check bits are then transmitted together over the noisy channel. The combination of data bits and check bits is usually called a code word.

Noise in the channel may corrupt both the data bits and the check bits. However, Shannon showed that if the check bits are computed in just the right way, the pattern in the code word will be so strong that the pattern will almost always be recognizable despite the channel noise. A pattern-recognizing device, which is called a decoder, forms part of the receiver. Its job is to reconstruct the original data, using the noisy data and the noisy check bits as clues.

Shannon did not show how the encoder and decoder should be designed; he only showed that the use of check digits is possible. Later generations of mathematicians and engineers went on to design practical error-correcting systems so that it is now possible to communicate reliably at speeds near capacity.

VI. Data Compression

Information theory provides a method for determining exactly how many bits are required to specify a given message to a given precision. This method is called the theory of data compression or, more technically, rate distortion theory. Often the message selected at the source need not (or cannot) be transmitted perfectly to the destination. For example, in a telephone conversation, extremely high sound quality is not necessary. It is usually sufficient that the two parties recognize and understand each other. Similarly, if the message is a scientific measurement—for example, a measurement of the number p = 3.14159265…—it is not possible to transmit all of the digits in a finite amount of time. However, a useful approximation of p can be transmitted with a relatively small number of bits.

The general idea of data compression theory is illustrated in the graph below. The horizontal axis measures the distortion, or imprecision, that is tolerable in a given message. The vertical axis gives the minimum possible number of bits required to specify the message with this distortion. The graph shows that as the acceptable distortion becomes smaller and smaller, the required number of bits becomes larger and larger. Conversely, as the allowed distortion becomes larger, the required number of bits decreases. Ultimately, the number of required bits becomes zero. The number becomes equal to zero when the allowed distortion can be achieved by merely guessing at the message.

Consider the following simple example. If the message is the outcome of the toss of an ordinary coin and it is acceptable to be wrong 50 percent of the time, then no bits are required. In this case, one can simply guess heads and be sure of being right 50 percent of the time, on average.

The relationship shown by the rate-distortion curve represents the absolute minimum number of bits, on the average, required to represent a message with a given distortion. The exact shape of the curve depends on the details of the particular situation. Shannon’s Fundamental Theorem of Data Compression states that, in principle, it is possible to compress a message to a given level, but no more. Exactly how this compression should be done he did not say, but later researchers developed practical techniques for data compression based on Shannon’s theorems. These techniques use mathematical algorithms, or repeated computations, to compress data. Many of these methods come very close to achieving the ultimate limits set forth by information theory.

VII. Information Theory Applications

Information theory began as a theoretical science. Nevertheless, its insights led to a revolution in the design of digital transmission and storage systems. Three major areas that have directly benefited from information theory are transmission systems, storage systems, and the Internet.

The first transmission systems to be influenced by information theory were spacecraft communication systems. Since the late 1960s, long-range space probes, such as Pioneer, Voyager, Galileo, and Cassini, have used digital communication systems enhanced by advances made through research in information theory. In the early days of space exploration, transmission speeds for signals from distant probes were measured in tens of bits per second. Thanks in large part to information theory, these rates have increased to hundreds of thousands, and in some cases millions, of bits per second. Computer modems have also benefited from information theory. In the 1980s modems often operated at speeds no faster than 300 bits per second. By the late 1990s, modems routinely operated at speeds up to 56,000 bits per second.

Computer storage systems are also designed using the guidelines provided by information theory. The random-access memory, or RAM, in modern computers would be impossible without error-control coding designed by information theorists. High-capacity hard disks and CD ROMs are similarly protected. Many of today’s consumer electronic devices would also be impossible without information theory. Recording engineers have used concepts such as channel capacity and entropy to guide the design of compact disc, DAT (digital audio tape), and DVD (digital video disc) players and recorders.

The Internet and the World Wide Web are computer networks that store and transmit large amounts of data. Sending and receiving large amounts of information accurately over these networks require large amounts of channel capacity. Information theory, especially its data compression algorithms, has played a large part in making the Internet practical. For example, sending and receiving still or moving color images require large amounts of memory and would ordinarily overwhelm the capacity of the Internet. With data compression algorithms, large images can be reduced to an efficient and manageable size, making rapid exchanges of information possible.