INTERNATIONAL CENTRE FOR MECHANICAL SCIENCES
COURSES AND LECTURES - No. 136
IMRE CSISZAR
HUNGARIAN ACADEMY OF SCIENCES, BUDAPEST
INFORMATION TRANSMISSION WITH
SYMBOLS OF DIFFERENT COST
COURSE HELD AT THE DEPARTMENT
OF AUTOMATION AND INFORMATION
JUNE 1972
SPRINGER-VERLAG WIEN GMBH 1972
This work is subject to copyright.
All rights are reserved,
whether the whole or part of the material is concerned,
specifically those of translation, reprinting, re-use of illustrations,
broadcasting, reproduction by photocopying machine
or similar means, and storage in data banks.
© 1972 Springer-Verlag Wien
Originally published by Springer-Verlag Wien-New York 1972
ISBN 978-3-211-81136-8 ISBN 978-3-7091-2866-4 (eBook)
DOI 10.1007/978-3-7091-2866-4
PREFACE
These notes represent the material of the author's lectures at the CISM Summer courses in Udine, 1972.
The author is indebted to Prof. L. Sobrero, Secretary General of CISM, for having invited him to give these lectures, and also to Prof. G. Longo, whose enthusiastic work in organizing this information theory course was a main factor of its success.
Udine, July 1972
A basic problem of information theory is the reliable transmission of messages at the smallest possible cost. The cost is most often measured by the number of symbols needed for the transmission (the length of the encoded message). It may happen, however, that the different symbols have different costs (e.g. different durations, as the Morse symbols dot, dash and space in telegraphy), in which case the problem of minimizing the average cost differs from that of minimizing the average number of symbols in the encoded message. This series of talks is devoted to problems of that kind. The results will very clearly demonstrate the fundamental role of Shannon's entropy as the measure of the amount of information. Moreover, our investigations will lead to significant consequences even when specialized to the simplest case of equal symbol costs, admittedly the case most often met in modern digital communication; this means that the theory to be developed may be of interest also to those who are not concerned with actually different symbol costs.
Chapter 1.
THE SIMPLEST CODING PROBLEM
Let X be a (finite or countably infinite) set, let Y be a finite set consisting of d elements, and let Y* denote the set of all finite sequences (strings) of letters of the alphabet Y (i.e., of elements of the set Y). Let g : X → Y* be an encoding of the elements of X by (variable-length) code words, having the prefix property. Suppose that probabilities p(x) are associated with the elements of X (p(x) ≥ 0, Σ_{x∈X} p(x) = 1); then the average code word length is
(1.1)   L = \sum_{x \in X} p(x)\, \|g(x)\| ,

where \|u\| denotes the length of the string u ∈ Y*. It is known that

(1.2)   L \geq \frac{H}{\log_2 d} ,   where   H = - \sum_{x \in X} p(x) \log_2 p(x) ,

and a prefix code exists with

(1.3)   L < \frac{H}{\log_2 d} + 1 .
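As a quick numerical illustration of (1.1)-(1.3) in the equal-cost case, the following sketch (in Python) computes H and L for a small binary prefix code and checks both bounds; the distribution and the code words are assumptions chosen for this example only, not data from the text.

import math

# Illustrative source distribution and binary prefix code (d = 2).
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
g = {"a": "0", "b": "10", "c": "110", "d": "111"}

H = -sum(px * math.log2(px) for px in p.values())   # entropy, cf. (1.2)
L = sum(px * len(g[x]) for x, px in p.items())      # average length (1.1)

d = 2
assert L >= H / math.log2(d) - 1e-12                # lower bound (1.2)
assert L < H / math.log2(d) + 1                     # achievability (1.3)
print(f"H = {H:.3f} bits, L = {L:.3f} symbols per message")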
If the symbols y ∈ Y are of (possibly) different costs c(y) (c(y) > 0), then (1.1) is to be replaced by the average code word cost

(1.4)   L = \sum_{x \in X} p(x)\, c(g(x)) ,   where   c(u) = \sum_{i=1}^{n} c(y_i)   if   u = y_1 \ldots y_n \in Y^* .
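For concreteness, here is a minimal sketch of how the average cost (1.4) is evaluated; the Morse-like symbol costs, the source distribution and the code words are illustrative assumptions, not taken from the text.

# Illustrative per-symbol costs (e.g. durations): dot, dash, word space.
cost = {".": 1.0, "-": 3.0, " ": 2.0}

def string_cost(u):
    """Cost c(u) of a string u, i.e. the sum of its symbol costs."""
    return sum(cost[y] for y in u)

# Hypothetical source distribution and prefix code over {., -, space}.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
g = {"a": ".", "b": "-.", "c": "- "}

# Average code word cost (1.4): L = sum over x of p(x) * c(g(x)).
L = sum(px * string_cost(g[x]) for x, px in p.items())
print(f"average cost L = {L:.2f} cost units per message")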
We sketch a proof of (1.2) which easily generalizes to yield a similar lower bound for (1.4).

For each u ∈ Y*, let p(u) be the probability of having a code word beginning with the string u:

(1.5)   p(u) = \sum_{x \,:\, g(x) \text{ begins with } u} p(x) ,

and let p(y|u) denote the (conditional) probability that the next symbol will be y:

(1.6)   p(y \mid u) = \frac{p(uy)}{p(u)} .
The average amount of information given by the symbol following u is

(1.7)   I(u) = - \sum_{y \in Y} p(y \mid u) \log_2 p(y \mid u) .

Intuition suggests, and an easy calculation checks, that

(1.8)   H = - \sum_{x \in X} p(x) \log_2 p(x) = \sum_u p(u)\, I(u) ,

where the last summation is over those u ∈ Y* which are proper prefixes of at least one code word g(x).

Since the average amount of information conveyed by one of d alternatives is upper bounded by log_2 d, from (1.8) we obtain

(1.9)   H \leq \log_2 d \sum_u p(u) ;

but \sum_u p(u) = \sum_{x \in X} p(x)\, \|g(x)\| , thus (1.9) is equivalent to (1.2).
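The identity (1.8) and the relation Σ_u p(u) = L are easy to check numerically. The sketch below (reusing the same kind of toy binary code as before; all concrete numbers are assumptions for illustration) enumerates the proper prefixes u, computes p(u) and I(u), and compares Σ_u p(u) I(u) with H.

import math
from collections import defaultdict

# Toy source and binary prefix code, for illustration only.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
g = {"a": "0", "b": "10", "c": "110", "d": "111"}

# p(u): probability that the code word begins with u, cf. (1.5),
# accumulated over all proper prefixes (including the empty string).
pu = defaultdict(float)
for x, px in p.items():
    for k in range(len(g[x])):
        pu[g[x][:k]] += px

def info(u):
    """Average information of the symbol following u, eq. (1.7)."""
    total = 0.0
    for y in "01":
        puy = sum(px for x, px in p.items() if g[x].startswith(u + y))
        if puy > 0:
            q = puy / pu[u]                 # conditional probability (1.6)
            total -= q * math.log2(q)
    return total

H = -sum(px * math.log2(px) for px in p.values())
rhs = sum(pu[u] * info(u) for u in pu)                 # right side of (1.8)
L = sum(px * len(g[x]) for x, px in p.items())         # average length (1.1)

print(f"H = {H:.6f}, sum_u p(u) I(u) = {rhs:.6f}")     # equal, by (1.8)
print(f"sum_u p(u) = {sum(pu.values()):.6f} = L = {L:.6f}")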
To obtain a similar result for (1.4), the average amount of information should be upper bounded in terms of the average cost.

Lemma 1. For an arbitrary probability assignment p(y), y ∈ Y, we have

(1.10)   - \sum_{y \in Y} p(y) \log_2 p(y) \leq \log_2 w_0 \sum_{y \in Y} p(y)\, c(y) ,

where w_0 is the (unique) positive number satisfying

(1.11)   \sum_{y \in Y} w_0^{-c(y)} = 1 .
Proof. From the convexity of the function f(t) = t log_2 t it easily follows that for arbitrary positive numbers a_i and b_i summing up to a and b, respectively, the inequality

(1.12)   \sum_i a_i \log_2 \frac{a_i}{b_i} \geq a \log_2 \frac{a}{b}

holds. We shall refer to (1.12) as the basic inequality. (To prove (1.12), apply the convexity inequality f(t) ≥ f(t_0) + f'(t_0)(t - t_0) with f(t) = t log_2 t to t = a_i/b_i, t_0 = a/b, multiply both sides by b_i and sum up for i.)

Applying the basic inequality to p(y) and w_0^{-c(y)} in the role of a_i and b_i, respectively, we obtain

(1.13)   \sum_{y \in Y} p(y) \log_2 \left[ p(y)\, w_0^{c(y)} \right] \geq 0 ,

which is equivalent to (1.10).
Remark. The upper bound (1.10) is accurate in the sense that equality obtains for a particular probability assignment (namely, for p(y) = w_0^{-c(y)}).
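Equation (1.11) rarely has a closed-form solution, but since Σ_y w^{-c(y)} is strictly decreasing in w for positive costs, w_0 is easy to find numerically. A minimal sketch by bisection (the Morse-like costs and the function name are assumptions made for this example):

import math

def channel_root(costs, tol=1e-12):
    """Find w0 > 0 with sum over y of w0**(-c(y)) = 1, cf. (1.11).

    The left-hand side is strictly decreasing in w0 (all costs positive),
    so the root is unique and bisection applies.
    """
    f = lambda w: sum(w ** (-c) for c in costs) - 1.0
    lo, hi = 1.0, 2.0
    while f(hi) > 0:                 # enlarge the bracket if necessary
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative Morse-like costs: dot = 1, dash = 3, space = 2.
w0 = channel_root([1.0, 3.0, 2.0])
print(f"w0 = {w0:.6f}, log2(w0) = {math.log2(w0):.6f}")

The quantity log_2 w_0 is the constant C appearing in Theorems 1 and 2 below; with equal unit costs it reduces to log_2 d.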
Theorem 1. For any prefix code g : X → Y*, the average code word cost given by (1.4) is lower bounded by

(1.14)   L \geq \frac{H}{C} ,   where   C = \log_2 w_0 .

Proof. Applying Lemma 1 to the conditional probabilities p(y|u), see (1.6), from (1.8) we obtain

(1.15)   H = \sum_u p(u)\, I(u) \leq \log_2 w_0 \sum_u \sum_{y \in Y} p(u)\, p(y \mid u)\, c(y) .

But the double sum on the right of (1.15) equals L = \sum_{x \in X} p(x)\, c(g(x)) ; thus (1.15) is equivalent to (1.14).
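Continuing the toy example used above (all concrete numbers remain illustrative assumptions), the bound (1.14) can be checked directly:

import math

cost = {".": 1.0, "-": 3.0, " ": 2.0}            # illustrative symbol costs
p = {"a": 0.5, "b": 0.3, "c": 0.2}               # illustrative source
g = {"a": ".", "b": "-.", "c": "- "}             # a prefix code over {., -, space}

# w0 from (1.11), by bisection on [1, 2] (valid here since the sum at w = 1 exceeds 1).
lo, hi = 1.0, 2.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if sum(mid ** (-c) for c in cost.values()) > 1.0:
        lo = mid
    else:
        hi = mid
w0 = 0.5 * (lo + hi)
C = math.log2(w0)

H = -sum(px * math.log2(px) for px in p.values())
L = sum(px * sum(cost[y] for y in g[x]) for x, px in p.items())
print(f"H/C = {H / C:.3f} <= L = {L:.3f}")       # lower bound (1.14)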
Theorem 2. There exists a prefix code g : X → Y* such that the average code word cost satisfies

(1.16)   L < \frac{H}{C} + c_{\max} ,   where   c_{\max} = \max_{y \in Y} c(y) .
Proof. A code with average cost satisfying (1.16) can be constructed by a simple modification of the Shannon-Fano method. Let the probabilities p(x) be arranged into a non-increasing sequence p_1 ≥ p_2 ≥ ... ; divide the interval (0,1) into consecutive intervals of length p_i, and let the left endpoints of these subintervals represent the corresponding x ∈ X. Divide the interval (0,1) into subintervals of length w_0^{-c(y)}, cf. (1.11); then divide each such subinterval containing at least two points representing different elements x ∈ X into subintervals of length proportional to w_0^{-c(y)}, etc. The code word of each x ∈ X is determined by the sequence of subintervals containing the point representing x; the length of the final subinterval is clearly

w_0^{-c(y_1)} w_0^{-c(y_2)} \cdots w_0^{-c(y_n)} = w_0^{-c(g(x))} ,   where   g(x) = y_1 \ldots y_n .

The length of the preceding subinterval was greater than p_i = p(x) (otherwise it would have contained no point representing a different x), which means w_0^{-(c(g(x)) - c(y_n))} > p_i = p(x), and a fortiori w_0^{-(c(g(x)) - c_{\max})} > p(x). Taking logarithms, multiplying by p(x) and summing up for all x ∈ X, we obtain

-(L - c_{\max}) \log_2 w_0 > \sum_{x \in X} p(x) \log_2 p(x) = -H ,

proving (1.16).
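The interval-splitting construction in the proof lends itself to a direct implementation. The sketch below is an illustration only: the function name, the source distribution and the Morse-like costs are assumptions, and plain floating point is used rather than exact arithmetic. It builds such a code and checks it against the bounds (1.14) and (1.16).

import math

def different_cost_code(p, cost, w0):
    """Sketch of the interval-splitting construction from the proof of Theorem 2.

    p    : dict source symbol -> probability (positive, summing to 1)
    cost : dict channel symbol y -> cost c(y) > 0
    w0   : positive root of (1.11), sum over y of w0**(-c(y)) = 1
    Returns dict source symbol -> code word over the channel alphabet.
    """
    xs = sorted(p, key=p.get, reverse=True)          # p1 >= p2 >= ...
    points, s = [], 0.0
    for x in xs:                                     # left endpoints of the p_i-intervals
        points.append((s, x))
        s += p[x]

    ys = list(cost)
    rel = [w0 ** (-cost[y]) for y in ys]             # relative subinterval lengths

    code = {}
    def build(pts, lo, hi, prefix):
        if len(pts) == 1:                            # subinterval isolates one point
            code[pts[0][1]] = prefix
            return
        a = lo
        for i, y in enumerate(ys):                   # subdivide proportionally to w0**(-c(y))
            b = hi if i == len(ys) - 1 else a + (hi - lo) * rel[i]
            inside = [(t, x) for (t, x) in pts if a <= t < b]
            if inside:
                build(inside, a, b, prefix + y)
            a = b

    build(points, 0.0, 1.0, "")
    return code

# Illustrative data (assumptions for this example): Morse-like costs.
cost = {".": 1.0, "-": 3.0, " ": 2.0}
p = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

# Solve (1.11) for w0 by bisection on [1, 2].
lo, hi = 1.0, 2.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if sum(mid ** (-c) for c in cost.values()) > 1.0:
        lo = mid
    else:
        hi = mid
w0 = 0.5 * (lo + hi)
C = math.log2(w0)

g = different_cost_code(p, cost, w0)
H = -sum(px * math.log2(px) for px in p.values())
L = sum(px * sum(cost[y] for y in g[x]) for x, px in p.items())
print(g)
print(f"H/C = {H / C:.3f} <= L = {L:.3f} < H/C + c_max = {H / C + max(cost.values()):.3f}")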
Theorem 3. For block-to-variable-length encodings of a discrete memoryless source of entropy rate H = - \sum_{x \in X} p(x) \log_2 p(x),