Variable-Length Codes


  1. INTRODUCTION
    The codes we have looked at so far have all used a fixed length, and they are called block codes from the fact that the messages are sent as fixed-length blocks in the stream of symbols. The Morse code mentioned earlier is an example of a variable-length code. We now examine this class of codes in more detail. The advantage of such a code is that it can be more efficient, in the sense that to represent the same information we can use fewer digits on the average. To accomplish this we need to know something about the statistics of the messages being sent. If every symbol is as likely as every other one, then the block codes are about as efficient as any code can be. But if some symbols are more probable than others, then we can take advantage of this feature and make the most frequent symbols correspond to the shorter encodings and the rare symbols to the longer encodings. This is exactly what the Morse code does: the letter E occurs most frequently in English and corresponds to the encoded symbol "dot".

    However, variable-length codes bring with them a fundamental problem: at the receiving end, how do you recognize each symbol of the code? In a binary system, for example, how do you recognize the end of one code word and the beginning of the next? If the probabilities of occurrence of the individual symbols differ sufficiently, then, as we shall see, variable-length encoding can be significantly more efficient than block encoding.

    In this chapter we will take advantage only of the frequencies of occurrence of the individual symbols being sent, and neglect any larger structure the messages may have. An example of slightly larger-scale structure is the fact that in English the letter Q is usually followed by the letter U. We will not take advantage of such correlations in the messages being sent.


  2. UNIQUE DECODING
    We need to make clear when we are talking about the symbols to be sent and when we are talking about the symbols used by the signaling system. We will refer to the source symbols when we mean the symbols (such as the letters of the English alphabet) that are to be sent, and to the code alphabet when we mean the symbols used in sending (such as 0 and 1 in the binary system). In general, we will assume that the source alphabet has q symbols, s1, s2, …, sq, and that the code alphabet has r symbols (r for the radix of the system).

    The first property that we need is unique decodability: the received message must have a single possible interpretation. Consider a code in which the source alphabet S has four symbols, encoded in binary as follows:


    s1 = 0

    s2 = 01

    s3 = 11

    s4 = 00


    The particular received message 0011 could be one of these two:


    0011 = s4, s3  or  s1, s1, s3


    thus the code is not uniquely decodable. While this property of unique decodability is not always absolutely essential, it is usually highly desirable.
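
    To make the ambiguity concrete, here is a minimal sketch in Python (the function name parses is my own) that recursively finds every way a received string can be split into code words of this code:

        CODE = {"s1": "0", "s2": "01", "s3": "11", "s4": "00"}

        def parses(received, prefix=()):
            """Yield every sequence of source symbols that encodes to `received`."""
            if not received:
                yield prefix
            for symbol, word in CODE.items():
                if received.startswith(word):
                    yield from parses(received[len(word):], prefix + (symbol,))

        print(list(parses("0011")))   # [('s1', 's1', 's3'), ('s4', 's3')]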

    In order to get our thoughts clear we make a formal definition:

    Definition. The nth extension of a code is simply all possible concatenations of n symbols of the original source code.


    This is also called the nth direct product of the code. The definition is necessary because the messages we send appear to the receiver as concatenations of the encoded symbols of the source alphabet, and the receiver needs to decide which sequence of source symbols was sent. For unique decodability, no two encoded concatenations can be the same, even ones coming from different extensions. Clearly, only if every such sequence is unique can we have a uniquely decodable signaling system. This is a necessary and sufficient condition, but it is hardly usable in this form.
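
    As a sketch of the brute-force test this definition suggests (the helper name is mine, and checking only up to a cutoff n is of course not a complete test), we can build every extension up to the nth and look for two different source sequences with the same encoding:

        from itertools import product

        def collisions_up_to(code_words, n):
            """Group source-symbol sequences by their encodings, over all
            extensions from the 1st to the nth; any group of two or more
            sequences is a failure of unique decodability."""
            seen = {}
            for k in range(1, n + 1):
                for seq in product(range(len(code_words)), repeat=k):
                    encoding = "".join(code_words[i] for i in seq)
                    seen.setdefault(encoding, []).append(seq)
            return {e: s for e, s in seen.items() if len(s) > 1}

        # The code above: the string "00" is both s4 alone and s1 s1.
        print(collisions_up_to(["0", "01", "11", "00"], 2)["00"])
        # [(3,), (0, 0)]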


  3. INSTANTANEOUS CODES

    Consider the following code:

    s1 = 0

    s2 = 10

    s3 = 110

    s4 = 111

    Now consider how the receiver would decode messages sent in this code. It would set up what is equivalent to a finite automaton or, if you prefer, a decision tree. Starting in the initial state, the first binary digit received causes a branch, either to the terminal state s1 if the digit is a 0, or else to a second decision point if it is a 1. From the second decision point the next binary digit branches to the terminal state s2 if a 0 is received, and to a third decision point if it is a 1. The third decision point goes to the terminal state s3 if the third digit is a 0, and to the terminal state s4 if it is a 1. Each terminal state would, of course, emit its symbol and then return control to the initial state. Note that each bit of the received stream is examined only once, and that the terminal states of this tree are the four source symbols s1, s2, s3, and s4.
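
    A sketch of this decoder in Python (the representation is mine): the dictionary plays the role of the terminal states, and the growing string state is the current decision point.

        TREE = {"0": "s1", "10": "s2", "110": "s3", "111": "s4"}

        def decode(bits):
            symbols, state = [], ""
            for bit in bits:               # each bit is examined only once
                state += bit               # take one branch of the tree
                if state in TREE:          # a terminal state: emit the symbol
                    symbols.append(TREE[state])
                    state = ""             # return control to the initial state
            return symbols

        print(decode("0101100"))           # ['s1', 's2', 's3', 's1']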

    In this example the decoding is instantaneous: when a complete symbol is received, the receiver immediately knows it and does not have to look further before deciding what message symbol it has received. No encoded symbol of this code is a prefix of any other symbol. This particular code is also what is called a comma code, since the binary digit 0 marks the end of a symbol (together with the fact that in this case no symbol is longer than three digits).

    The following code is uniquely decodable but is not instantaneous, because you do not know when one symbol ends without looking further:

    s1 = 0

    s2 = 01

    s3 = 011

    s4 = 111

    We can see the trouble: some code words are prefixes of other code words; that is, they are the same as the beginning part of some other encoded symbol. Now consider the string


    01111…1111

    It can only be decoded by first going to the end and then, working backward, identifying each run of three 1's as an s4 until the first symbol is reached; only then can that first symbol be identified as s1, s2, or s3. Thus at each point the receiver cannot decide whether it already has a complete word or must wait for more digits to complete the word. The simplest way to decode a message in this particular code is to start at the back end of the received message. This puts a severe burden on storage and also causes a time delay.

    It is clearly both necessary and sufficient for an instantaneous code that no code word si be a prefix of another code word sj. If we had to deal with a noninstantaneous code, then the decision tree would have a structure in which, on realizing that it had received a complete word some time ago, it would emit the proper source symbol and then go to an appropriate place in the decision tree, not necessarily the start. As the example above shows, the tree could require potentially infinite storage.
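
    The prefix condition is easy to test mechanically; a sketch (the function name is mine):

        def is_instantaneous(code_words):
            """True exactly when no code word is a prefix of another."""
            for i, w in enumerate(code_words):
                for j, v in enumerate(code_words):
                    if i != j and v.startswith(w):
                        return False       # w is a prefix of v
            return True

        print(is_instantaneous(["0", "10", "110", "111"]))  # True
        print(is_instantaneous(["0", "01", "011", "111"]))  # False: 0 prefixes 01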


  4. CONSTRUCTION OF INSTANTANEOUS CODES
    It is clear that of all uniquely decodable codes, the instantaneous ones are preferable to those that are not, and since it will turn out that they cost us nothing extra, it is worth concentrating on them. Let us, therefore, explore the problem of constructing them.

    Given that we are to construct a code for five symbols si in the source alphabet S, and that the code alphabet is binary, we can assign


    s1 = 0

    s2 = 10

    s3 = 110

    s4 = 1110

    s5 = 1111

    to get an instantaneous code.

    In this construction the use of 0 for the first symbol reduced the number of possibilities available later. Instead, let us use two digits for each of the first two symbols, s1 = 00 and s2 = 01. We can then use s3 = 10. With two symbols yet to go, we cannot use s4 = 11; instead, we must take s4 = 110, which leaves us with s5 = 111. We get the code

    s1 = 00

    s2 = 01

    s3 = 10

    s4 = 110

    s5 = 111

    This code is clearly instantaneous, since no symbol is a prefix of any other symbol, and the decoding tree is easily constructed.

    Which of these two codes is the better? This depends on the frequencies of occurrence of the symbols si. By "better" we mean, of course, that the sent messages will on the average be shorter. We will investigate this more closely; evidently "efficiency" must depend on the probabilities of the various symbols being used in the messages being sent.
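
    As a sketch of this dependence (the two probability distributions here are my own, purely illustrative), we can compute the average number of binary digits per symbol for each code:

        lengths_a = [1, 2, 3, 4, 4]   # s1=0, s2=10, s3=110, s4=1110, s5=1111
        lengths_b = [2, 2, 2, 3, 3]   # s1=00, s2=01, s3=10, s4=110, s5=111

        def average_length(probs, lengths):
            return sum(p * l for p, l in zip(probs, lengths))

        skewed  = [0.6, 0.2, 0.1, 0.05, 0.05]   # s1 dominates
        uniform = [0.2] * 5                     # all symbols equally likely

        print(average_length(skewed, lengths_a), average_length(skewed, lengths_b))
        # ~1.7 vs ~2.1 -- the first code wins when s1 dominates
        print(average_length(uniform, lengths_a), average_length(uniform, lengths_b))
        # ~2.8 vs ~2.4 -- the second code wins when symbols are equally likely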


  5. SHORTENED BLOCK CODES

    Returning for the moment to the earlier block codes: if we had exactly 2^m code words in a binary system, then we could use m digits to represent each symbol. But suppose that the number of symbols is not an exact power of the radix. To see what can happen, consider the case of five symbols. Of the eight binary words of length 3

    000

    001

    010

    011

    100

    101

    110

    111

    We can drop any three. But if we drop 001, 011, and 101, we can shorten three branches of the decoding tree and still have instantaneous decodability. We will have

    s1 = 00

    s2 = 01

    s3 = 10

    s4 = 110

    s5 = 111

    Instead of this choice we can drop 001, 010, and 011, and shorten only one branch of the tree, to get the code

    s1 = 0

    s2 = 100

    s3 = 101

    s4 = 110

    s5 = 111

    In both cases there are now no unused terminals, and therefore K = 1, where K is the sum of 2^(-li) over the code-word lengths li. We will call these shortened block codes; they are essentially block codes with small modifications.
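
    A quick check of the claim that K = 1 for both codes (a sketch; kraft_sum is my own name):

        def kraft_sum(code_words, radix=2):
            """K = sum of radix**(-l_i) over the code-word lengths l_i."""
            return sum(radix ** -len(w) for w in code_words)

        print(kraft_sum(["00", "01", "10", "110", "111"]))   # 1.0
        print(kraft_sum(["0", "100", "101", "110", "111"]))  # 1.0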


  6. HUFFMAN CODES

Now for the first time we will make use of the probabilities of the various symbols being sent. As in the Morse code, we want the most frequent symbols to have the shortest encodings. If the probability of the ith symbol is pi and the length of its encoding is li, then the average length of the code is

    L = \sum_{i=1}^{q} p_i l_i

With no loss of generality the pi may be taken in decreasing order. If the lengths li are not in the opposite order, that is, if we do not have both

    p_1 \ge p_2 \ge \cdots \ge p_q

and

    l_1 \le l_2 \le \cdots \le l_q

then the code is not efficient, in the sense that we could get a shorter average length by reassigning the code representations of the symbols s1, s2, …, sq. To prove this assertion, suppose that for some m and n with m < n we have the two conditions

    p_m > p_n \quad \text{and} \quad l_m > l_n

In computing the average length originally we have, among others, the two terms

    p_m l_m + p_n l_n

By interchanging the encoded symbols for sm and sn, we get the corresponding terms

    p_m l_n + p_n l_m

Subtracting the old from the new, we have the change in average length due to the reassignment:

    (p_m l_n + p_n l_m) - (p_m l_m + p_n l_n) = (p_m - p_n)(l_n - l_m)

From the assumptions above this is a negative number: we decrease the average code length if we interchange the encoded symbols for sm and sn. We therefore assume from now on that the two running inequalities above both hold.
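
A numeric sketch of the interchange (the numbers are arbitrary): with p_m > p_n but l_m > l_n, swapping the two encodings changes the average length by (p_m - p_n)(l_n - l_m), a negative amount.

    p_m, p_n = 0.4, 0.1
    l_m, l_n = 3, 1   # the longer word wrongly assigned to the likelier symbol

    before = p_m * l_m + p_n * l_n       # contribution before the swap
    after  = p_m * l_n + p_n * l_m       # contribution after the swap
    print(after - before)                # ~ -0.6
    print((p_m - p_n) * (l_n - l_m))     # ~ -0.6, the same factored form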

We begin our examination of Huffman coding with encoding into the binary alphabet; the general base r alphabet will be examined later. We will use "source symbols" for the input symbols si and "code alphabet" for the alphabet we are encoding into.

The first thing to prove is that the two least frequent source symbols of an efficient code have encodings of the same length. For an instantaneous code we know that no encoded symbol is a prefix of another; thus the leading digits of the qth (longest) encoded symbol cannot coincide with the (q-1)st encoded symbol. Therefore any digits of the qth encoded symbol beyond the length of the (q-1)st can be dropped, and no confusion will result in the decoding tree. Thus two of the longest encoded symbols have the same length, and from the decoding tree they may be taken to differ only in the last digit.

The proof of the encoding properties, as well as the method of encoding, is a reduction at each stage to a shorter code. We simply combine the two least probable symbols of the source alphabet into a single symbol whose probability is the sum of the two corresponding probabilities; we then have to encode a source alphabet with one less symbol. Repeating this step by step, we get down to the problem of encoding just two symbols of a source alphabet, which is easy: merely use 0 and 1. Now, going backward, one of these two symbols is to be split into two symbols, and this can be done by appending a second digit, 0 for one of them and 1 for the other. At the next stage of going back, one of the three symbols is to be split into two in the same way. And so it goes. From the special case given below, the general case should be obvious.
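
Here is a short sketch of that reduction process in Python (not Hamming's own code; the representation is mine): merge the two least probable symbols repeatedly, then split them again on the way back by appending 0 and 1. Ties may be broken differently, giving a different but equally short code.

    import heapq
    from itertools import count

    def huffman(probabilities):
        """Return {symbol: code word} for a dict of symbol probabilities."""
        tick = count()                      # tie-breaker so heap entries compare
        heap = [(p, next(tick), sym) for sym, p in probabilities.items()]
        heapq.heapify(heap)
        while len(heap) > 1:                # reduce: merge the two least probable
            p1, _, a = heapq.heappop(heap)
            p2, _, b = heapq.heappop(heap)
            heapq.heappush(heap, (p1 + p2, next(tick), (a, b)))
        codes = {}
        def split(node, word):              # go backward: append 0/1 at each split
            if isinstance(node, tuple):
                split(node[0], word + "0")
                split(node[1], word + "1")
            else:
                codes[node] = word or "0"   # a lone symbol still gets one digit
        split(heap[0][2], "")
        return codes

    print(huffman({"s1": 0.4, "s2": 0.2, "s3": 0.2, "s4": 0.1, "s5": 0.1}))
    # e.g. {'s2': '00', 's3': '01', 's4': '100', 's5': '101', 's1': '11'}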

How do we know that this process generates an efficient code? Suppose that there were a shorter code, with average length L' satisfying

L' < L

Let us compare the two decoding trees. In an efficient binary code all the terminals are occupied and there are no "dead branches". (A dead branch would enable us to shorten the code by deleting the corresponding binary digit in all the terminals that pass through the useless decision point.)

If there are only two symbols of maximum length in a tree, then they must have their last decision node in common, and they must be the two least probable symbols. Before we reduce the tree, these two symbols contribute to the average length the amount

    (p_{q-1} + p_q) l_q

If there are more than two symbols of the maximum length, we can use the following proposition: symbols having the same length may be interchanged without changing the average code length. Using this, we can bring the two least probable symbols together so that they share their final decision node. After the reduction the combined symbol has length l_q - 1 and contributes (p_{q-1} + p_q)(l_q - 1); thus the reduction has shortened the code by the amount

    p_{q-1} + p_q
