Kolmogorov complexity


HAL Id: hal-00347376

https://hal.archives-ouvertes.fr/hal-00347376

Submitted on 15 Dec 2008

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Bruno Durand, Alexandre Zvonkine

To cite this version:

Bruno Durand, Alexandre Zvonkine. Kolmogorov complexity. É. Charpentier, A. Lesne, N. Nikolski. Kolmogorov's Heritage in Mathematics, Springer-Verlag, pp.281-299, 2007. ⟨hal-00347376⟩


Kolmogorov Complexity

By Bruno Durand and Alexander Zvonkin

The term complexity has different meanings in different contexts. Computational complexity measures how much time or space is needed to perform some computational task. On the other hand, the complexity of description (also called Kolmogorov complexity) is the minimal number of information bits needed to define (describe) a given object. It may well happen that a short description requires a lot of time and space to follow it and actually construct the described object. However, when speaking about Kolmogorov complexity, we usually ignore this problem and count only the description bits.

As was usual for him, Kolmogorov published, in 1965, a short note [10] that started a new line of research. Aside from the formal definition of complexity, he also suggested using this notion in the foundations of probability theory. His idea was quite simple:

An object is random if it has maximal possible complexity.

The definition of complexity uses the notion of an algorithm; this unexpected marriage of two a priori distant domains (in our case, probability theory and the theory of algorithms) is also a typical trait of Kolmogorov's work.

1.1 Algorithms

The notion of an algorithm is quite recent. In 1912 (when neither computers nor programming languages existed) Émile Borel (see [19]) used the phrase "a formal and precise automatic rule describing an object" for what we would now call an algorithm.(1) However, a mathematical theory of algorithms was developed only in the 1930s (by Turing, Gödel, Post, Church, Kleene and others). The key observation was the existence of a universal algorithm (see below); it allows one to prove easily that some problems (e.g., the so-called halting problem, which asks whether a given algorithm terminates on a given input) are undecidable (cannot be solved by algorithms). Note that to prove the non-existence of an algorithm that solves a certain problem we need a mathematically precise definition of this notion. Once it appeared, this notion became a subject of the theory of algorithms, also called the theory of recursive functions or the theory of computability.

The remaining part of this section discusses some aspects of the notion of algorithm; the reader not interested in these details may skip it and proceed directly to Section 1.2.

It is rather difficult to give a mathematical definition that captures the intuitive idea of an algorithm in its full generality; instead, we may define a specific class of algorithms and claim that this class is representative, i.e., that any algorithm is equivalent to a certain algorithm in this class. (By the way, one of these classes was suggested by Kolmogorov.)

1.1.1 Models of computation

A model of computation formally describes some specific class of algorithms (the class of objects used as input/output data, how they are processed, etc.). Some computational models resemble programming languages while others look more like a hardware description. In any case, we assume that computational resources are unlimited (and forget that in real programming languages integers are usually bounded, processor architecture has a fixed word length, etc.).

(The study of resources (time and space) needed to solve a given problem is a different field called computational complexity. Let us note that an important notion in this field, NP-completeness, was introduced at the beginning of the 1970s independently by three researchers, one of whom, Leonid Levin, is Kolmogorov's student. The first publications by Levin were about Kolmogorov complexity [21]. His short biography and a brief account of how Kolmogorov influenced him may be found in the book [17].)

(1) The history of the term algorithm is interesting in itself. This word is a derivative of the name of a medieval Persian savant Al-Khwārizmī (787-c. 850) who was the author of a book through which the Europeans learned the positional number system and the rules of arithmetic operations (addition, multiplication, etc.). The name of Al-Khwārizmī (which means "from Khorezm", a town in Uzbekistan today called Khiva) was transliterated in Latin as Algorithmus. The term algorithms meant at the beginning the rules of the four arithmetic operations. Then by extension it acquired the meaning of any systematic method of computation. Leibniz called algorithms the set of rules for computing differentials and integrals. It is only gradually that the word acquired its modern meaning; one hundred years ago this process was not yet finished.

Which computational model is the best one? This depends on our purposes. If we want to write real programs, it is natural to use a real computer and an appropriate programming language. On the other hand, if we want to prove theorems it would be more convenient to work with an abstract model of computation; a very simple model, with a small number of primitives, would then be better. However, there is no canonical model adapted for proofs since different models are more suitable for different results.

The most popular model is the Turing machine. It is rather easy to prove the universality of this model; however, we have to deal with many details concerning tapes, symbols, representation of the transition table, etc. There are many versions of Turing machines; the most common one was, by the way, presented by Post and not by Turing.
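To give an idea of the details involved, here is a minimal sketch of a single-tape machine simulator in Python. The dictionary format of the transition table, the state names and the sample machine (which appends one stroke to a number written in unary) are illustrative assumptions, not a construction taken from the text.

```python
from typing import Dict, Tuple

# (state, symbol) -> (symbol to write, head move, next state); hypothetical format
Table = Dict[Tuple[str, str], Tuple[str, int, str]]

BLANK = "_"

def run_turing_machine(table: Table, tape_input: str, start: str = "scan",
                       halt: str = "halt", max_steps: int = 10_000) -> Tuple[str, int]:
    """Simulate the machine; return (final tape contents, number of steps)."""
    tape = dict(enumerate(tape_input))   # sparse tape: position -> symbol
    pos, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached; the machine may not halt")
        symbol = tape.get(pos, BLANK)
        write, move, state = table[(state, symbol)]
        tape[pos] = write
        pos += move
        steps += 1
    cells = [tape.get(i, BLANK) for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(BLANK), steps

# Sample machine: walk right over the strokes, then write one more stroke.
INCREMENT: Table = {
    ("scan", "1"): ("1", +1, "scan"),
    ("scan", BLANK): ("1", +1, "halt"),
}

result, steps = run_turing_machine(INCREMENT, "111")
print(result, steps)
```

Even this trivial machine needs one step per input symbol plus one more, whereas adding 1 is a single operation in most programming languages.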

Recursive functions à la Church give a more mathematical and attractive model, though the proofs of certain basic theorems become somewhat discouraging if not frightening.

Markov algorithms are similar to rewriting systems for strings with termination conditions; this model is difficult to manipulate (but well suited for the proof of the undecidability of word problems).

The RAM (random access machines) model resembles von Neumann-style computers...

When teaching the theory of algorithms, one may choose a different approach and not fix any specific model but rely directly on the intuition of algorithms. More formally, it means that we have to accept some properties of algorithms used in the proofs as axioms. Then we do not need to go into the cumbersome details of a specific computational model; the price is, however, that the list of axioms is open (e.g., if during the proof we need to establish the computability of some function, we just describe informally its computation and then add a new axiom saying that this function is computable).

1.1.2 All models of computation are equivalent

Why do we believe that this or that computational model correctly reflects the intuitive notion of an algorithm? This statement is usually called the Church thesis (for a given computation model): it claims that any computable function (computed by an algorithm in the informal sense) is computable in this model. This assertion is not a mathematical one; it is a belief concerning the notion of intuitive computability. On the other hand, we can prove that these assertions for different computation models are equivalent, since it turns out that the class of computable functions is the same for different existing models (Turing machines, recursive functions, etc.).

The name given to the thesis is rather inappropriate. Church claimed that all intuitively computable total functions are computable in his model. A long controversy [...] equivalence theorem for two different models (recursive functions à la Church and Turing machines) was established by Turing in his seminal article, and the thesis in its most general form was formulated by Post. Therefore, a more appropriate name would be the Church-Turing-Post thesis.

All this was done in the 1930s, so why might Kolmogorov want to suggest a different computation model in the 1950s? His motivation could be reconstructed as follows. Though all computation models mentioned above are equivalent, the translation between them sometimes replaces one step in one model by a long sequence of steps in another one. For example, an addition may be an elementary operation in some programming language while its implementation by a Turing machine requires many steps.

Kolmogorov wanted to find a model whose steps are elementary in the sense that they do not allow natural decomposition into a sequence of simpler steps. On the other hand, he tried to find a most general (and natural) model among these models. This means that elementary steps of any other model (if they are indeed elementary according to our intuition) should not require further decomposition when translated into Kolmogorov's model.

1.1.3 Kolmogorov-Uspensky machines

The model suggested by Kolmogorov was later called Kolmogorov-Uspensky machines. These machines are not related to Kolmogorov complexity, but they are related to Kolmogorov himself; hence we say a couple of words about them.

The configuration (state of the computation) of a Kolmogorov-Uspensky machine is a graph; some node of this graph is declared to be active. The program for the machine is a list of rules that say how this active part should be transformed and when the processing halts. So the computation step is indeed local; it deals with a finite size neighborhood of the active node. On the other hand, the topological structure of the computation can become rather complicated. This may be considered as a disadvantage of the model since it allows some actions that are hard to perform in a physical space. (For example, a Kolmogorov-Uspensky machine can create a labeled tree that provides an unreasonably fast access to an exponential amount of information.) So one may want to restrict somehow the class of allowed graphs [19, 8, 1]. Later a version of this model was considered by Schönhage (who used directed graphs with unlimited in-degrees). It seems pertinent to mention here the development of the GASM (Gurevich Abstract State Machines) which were inspired by Kolmogorov-Uspensky machines but have other goals and do not play a specific role in the classical computability theory. The first complete description of Kolmogorov-Uspensky machines may [...]

1.1.4 Universality

Now we are accustomed to the idea that the same processor can be used to perform different tasks if provided with a suitable program. However, this idea of universal computation was a nontrivial and very important step in the development of the first real computers.

The same idea can be formally expressed as follows: there exists a universal computable function $U$ of two arguments $p$ and $x$. The universality means that we can obtain any computable function of $x$ by fixing an appropriate first argument $p$ (a program for this function).

Why does a universal function exist? Imagine an interpreter of an arbitrary programming language that considers its first argument $p$ as a program and executes this program using $x$ as its input.
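The interpreter picture can be sketched in a few lines of Python. The convention used here (a program is Python source text that defines a function named `f`) is an assumption made for this illustration only; any other programming language and calling convention would do.

```python
def U(p: str, x: int) -> int:
    """Universal function sketch: treat p as Python source defining f(x), run it on x.

    May never return if the program p does not terminate on x.
    """
    namespace: dict = {}
    exec(p, namespace)          # "load" the program p
    return namespace["f"](x)    # execute it on the input x

# Two hypothetical programs; fixing the first argument of U selects the function.
square = "def f(x):\n    return x * x\n"
successor = "def f(x):\n    return x + 1\n"

print(U(square, 7))
print(U(successor, 7))
```

Fixing `p = square` turns `U` into the squaring function, and fixing `p = successor` turns it into the successor function, which is exactly the universality property.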

1.1.5 Non-computable functions

The existence of a universal computable function immediately brings us to a paradox. Consider the function $F(p) = U(p, p) + 1$. This (unary) function is computable since $U$ is. It should then have a program associated to it (since $U$ is universal); let us denote this program by $q$. What happens if we apply program $q$ to itself? By definition of $U$ this gives $U(q, q)$. On the other hand, since $q$ is a program for $F$, the same result must be equal to $F(q) = U(q, q) + 1$. So we get $U(q, q) = F(q) = U(q, q) + 1$, and this seems impossible.

The only way to explain this paradox is to recall that certain computations may never terminate, so a program may compute a non-total function. And the contradiction disappears if $U(q, q)$ is not defined.

A similar argument shows that the halting problem is undecidable: there is no algorithm that gets a program $p$ and an input $x$ and tells whether $U(p, x)$ is defined (i.e., whether the program $p$ terminates on input $x$).
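The diagonal argument behind this statement can be imitated in Python. In the sketch below, programs are Python callables rather than strings, and a non-terminating computation is modelled by raising a hypothetical `Diverges` exception so that the example itself terminates; both choices are simplifying assumptions. Given any claimed decider `halts`, we build a program on which the decider is wrong.

```python
class Diverges(Exception):
    """Stands in for an infinite loop so that the sketch itself terminates."""

def make_diagonal(halts):
    """Given a claimed decider halts(prog, x) -> bool, build a program that
    does the opposite of what the decider predicts about it."""
    def diagonal(prog):
        if halts(prog, prog):
            raise Diverges("would loop forever here")
        return 0  # halt immediately
    return diagonal

def refutes(halts) -> bool:
    """Return True if the decider is wrong about the diagonal program run on itself."""
    d = make_diagonal(halts)
    predicted = halts(d, d)
    try:
        d(d)
        actually_halts = True
    except Diverges:
        actually_halts = False
    return predicted != actually_halts

# Two naive candidate deciders; each one is refuted by its own diagonal program.
print(refutes(lambda prog, x: True))
print(refutes(lambda prog, x: False))
```

The same construction defeats any deterministic candidate, since the diagonal program consults the decider about itself and then does the opposite.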

1.1.6 Back to algorithms

Returning to practice, let us note that the notion of a computable function captures only one aspect of algorithmic practice. For example, the behavior of a real-time algorithm (such as an operating system) is a more complicated thing than a mere function. The choice of a correct mathematical model for this class of algorithms (very important for practice) is a well studied but not fully solved problem of theoretical computer science.

1.2 Descriptions and sizes

Any information may be encoded as a bit string (a finite sequence of bits). For [...]

Binary strings are also called words in the alphabet $B = \{0, 1\}$, and the set of all binary strings is denoted as $B^*$. We identify $B^*$ with the set $\mathbb{Z}_+ \setminus \{0\} = \{1, 2, 3, \ldots\}$ using the lexicographic order. (The empty word is associated with 1, then $0 \mapsto 2$, $1 \mapsto 3$, $00 \mapsto 4$, $01 \mapsto 5$, etc.: a string $u$ is associated with a natural number that has binary representation $1u$. For example, the word 00 corresponds to the number $100_2$, i.e., 4.)

The length $|u|$ of a binary word $u$, i.e., the number of letters in it, is then equal to the integral part $\lfloor \log u \rfloor$ of the binary logarithm of the number associated with $u$. (Note that $|u|$ stands for the length of the word $u$ and not for the absolute value of the corresponding integer.)
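The correspondence between words and positive integers, and the relation between length and binary logarithm, can be checked with a short Python sketch. The function names are chosen for this illustration.

```python
def word_to_number(u: str) -> int:
    """Map a binary word u to the integer whose binary representation is 1u."""
    return int("1" + u, 2)

def number_to_word(m: int) -> str:
    """Inverse map: drop the leading 1 of the binary representation of m >= 1."""
    if m < 1:
        raise ValueError("only positive integers correspond to words")
    return bin(m)[3:]   # bin(m) looks like '0b1...'; skip '0b1'

for u in ["", "0", "1", "00", "01"]:
    m = word_to_number(u)
    # |u| equals floor(log2 m), which is m.bit_length() - 1
    print(repr(u), m, len(u) == m.bit_length() - 1)
```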

Definition 1.2.1. Let $f : B^* \to B^*$ be a computable function. We define the complexity of $x \in B^*$ with respect to $f$ as

$$
K_f(x) =
\begin{cases}
\min |t| \text{ such that } f(t) = x; \\
\infty \text{ if such } t \text{ does not exist.}
\end{cases}
$$

In other terms, we call descriptions of $x$ (with respect to $f$) all strings $t$ such that $f(t) = x$; then the complexity $K_f(x)$ is defined as the length of the shortest description.
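For a function $f$ that terminates on every input, the definition can be turned into a brute-force search. The Python sketch below assumes that $f$ is total and that a bound `max_len` on the description length is given (for a partial $f$ a naive search could hang on a non-terminating description); the decompressor `doubling` is a hypothetical example.

```python
from itertools import product
from typing import Callable, Optional

def complexity(f: Callable[[str], str], x: str, max_len: int) -> Optional[int]:
    """Length of the shortest t with f(t) == x, searching |t| <= max_len.

    Returns None if no description is found within the bound (the true value
    may be larger, or infinite if x has no description at all).
    """
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            if f("".join(bits)) == x:
                return length
    return None

def doubling(t: str) -> str:
    """Hypothetical decompressor: output the description twice."""
    return t + t

print(complexity(doubling, "010010", max_len=6))
print(complexity(doubling, "0100", max_len=6))
```

With respect to `doubling`, the string 010010 has the description 010, while 0100 has no description at all, so its complexity with respect to this $f$ is infinite.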

The main problem with this definition is that the complexity depends on the choice of $f$. It is unavoidable, but the theorem stated below (due to Kolmogorov but already present, in an informal way, in the paper of Solomonoff [18]) explains in which way this dependence can be limited. This theorem was later independently proved by Chaitin but does not appear in his first papers on the subject [2, 3]; the priority claims have provoked a long and futile controversy explained in [13].

Theorem 1.2.1 (Existence of an optimal function). There exists a computable function $f_0$ (called optimal function) such that for any other computable function $f$ there exists a constant $C$ such that

$$
\forall x \quad K_{f_0}(x) \le K_f(x) + C. \tag{1.2.1}
$$

(Note that the constant $C$ may depend on $f$ but not on $x$.)

Proof. Let $t$ be a shortest description of $x$ with respect to $f$, i.e., $f(t) = x$. Then $f_0$ uses as a description of $x$ the pair $(p, t)$ where $p$ is a program that computes the function $f$. In this pair $p$ has $|p|$ bits and $t$ has $|t|$ bits, so the total number of bits is $|p| + |t|$, i.e., $|p| + K_f(x)$. So we let $C = |p|$.

Remark 1.2.1. This argument needs some refinement. We cannot use the pair $(p, t)$ directly; we need to encode it by a single string. Not any encoding will work. An appropriate encoding may encode $p$ in a very inefficient way; this only increases the constant $C$. On the other hand, it is essential to be able to encode $t$ without any loss of space since an encoding of $t$ which demands, say, $\alpha |t|$ bits with $\alpha > 1$ leads to the complexity $\alpha K_f(x) + C$ instead of $K_f(x) + C$.
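One encoding with the required property can be sketched in Python as follows. The specific scheme (write every bit of $p$ twice, then the separator 01, then $t$ verbatim) is a common choice assumed here for illustration; the text does not specify which encoding is used. It spends $2|p| + 2$ bits on $p$ and exactly $|t|$ bits on $t$.

```python
def encode_pair(p: str, t: str) -> str:
    """Encode (p, t) as one binary string: double each bit of p, add '01', append t."""
    return "".join(b + b for b in p) + "01" + t

def decode_pair(s: str) -> tuple:
    """Recover (p, t) from encode_pair(p, t)."""
    i, p_bits = 0, []
    while s[i:i + 2] in ("00", "11"):   # doubled bits belong to p
        p_bits.append(s[i])
        i += 2
    if s[i:i + 2] != "01":
        raise ValueError("not a valid pair encoding")
    return "".join(p_bits), s[i + 2:]

p, t = "1011", "000111"
code = encode_pair(p, t)
print(code, len(code) == 2 * len(p) + 2 + len(t))
print(decode_pair(code) == (p, t))
```

With this encoding the constant in (1.2.1) becomes $2|p| + 2$ rather than $|p|$, which is harmless since it still does not depend on $x$.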


Corollary 1.2.1. If $f_1$ and $f_2$ are two optimal functions then there exists a constant $C$ such that

$$
\forall x \quad |K_{f_1}(x) - K_{f_2}(x)| \le C. \tag{1.2.2}
$$

Proceeding from this corollary, we choose some optimal function $f_0$ and fix it. The subscript $f_0$ in $K_{f_0}$ is then suppressed. However, after doing this we still have in mind that in fact the Kolmogorov complexity is defined only up to a bounded additive term.

Definition 1.2.2. The Kolmogorov complexity $K(x)$ is the complexity $K_{f_0}(x)$ with respect to some optimal function $f_0$. The complexity $K(x)$ is defined up to a bounded additive term.

Proposition 1.2.1.

$$
K(x) \le |x| + C; \quad \text{or, equivalently,} \quad K(x) \le \log x + C. \tag{1.2.3}
$$

Proof. It suffices to let $f(x) = x$ in (1.2.1), i.e., to use $x$ itself as a description of $x$.

Proposition 1.2.2 (Distribution of complexities). Consider all binary strings of length $n$. The fraction of strings $x$ of length $n$ such that $K(x) < n - k$ does not exceed $2^{-k}$.

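Proof. The number of strings of length $n$ is $2^n$ while the number of (potential) descriptions of length less than $n - k$ is

$$
1 + 2 + \ldots + 2^{n-k-1} < 2^{n-k}.
$$

Hence fewer than $2^{n-k}$ of the $2^n$ strings of length $n$ can have such a description, which is a fraction below $2^{-k}$.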
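The counting in this proof is easy to check numerically. The Python sketch below uses exact rational arithmetic; the function name and the sample values of $n$ and $k$ are chosen for illustration.

```python
from fractions import Fraction

def max_fraction_compressible(n: int, k: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings with complexity < n - k.

    Counts all binary strings of length < n - k (the potential descriptions)
    and divides by the number 2**n of strings of length n.
    """
    descriptions = sum(2 ** i for i in range(n - k))   # 1 + 2 + ... + 2**(n-k-1)
    return Fraction(descriptions, 2 ** n)

for k in (1, 8, 20):
    bound = max_fraction_compressible(64, k)
    print(k, float(bound), bound < Fraction(1, 2 ** k))
```

For instance, fewer than one string in a million (of any fixed length) can be compressed by more than 20 bits.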

There exist strings of length $n$ whose complexity is at least $n$ (they are often called incompressible strings). Indeed, there are $2^n$ strings of length $n$ and at most $1 + 2 + \ldots + 2^{n-1} = 2^n - 1$ potential descriptions of length less than $n$.

One may ask for an example of an incompressible string. However, it is not possible to find an incompressible string of length $n$ effectively (having $n$ as input). Indeed, if it were possible, a string generated by this algorithm would have complexity $\log n + c$ since we need to specify $n$ (about $\log n$ bits) and the algorithm itself (constant number of bits), and $\log n + c$ is less than $n$ for all sufficiently large $n$.

Incompressible strings are a useful tool in theoretical computer science (automata theory, formal languages, etc.).

Today everybody uses software for data compression and decompression; this [...]

However, the Kolmogorov complexity theory may still provide useful hints: for example, if a software advertisement claims that the latest version of a super-compressor compresses every file by a certain factor, you had better avoid this product.
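The point can be observed with an ordinary compressor. The Python sketch below uses the standard `zlib` module as a stand-in for "data compression software"; it is only a practical proxy and does not compute Kolmogorov complexity. The sample sizes and the seed are arbitrary choices.

```python
import random
import zlib

rng = random.Random(0)                      # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(10_000))
pattern = b"ab" * 5_000                     # same length, highly regular

for name, data in (("noise", noise), ("pattern", pattern)):
    packed = zlib.compress(data, 9)
    assert zlib.decompress(packed) == data  # lossless round trip
    print(name, len(data), "->", len(packed))
```

The regular file shrinks dramatically, while the pseudo-random one does not shrink at all. By the counting argument above, no lossless compressor can shorten every file.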

Finally, to prepare for the next section (on Gödel's incompleteness theorem), we present a variation on a well known theme of busy beavers. Initially the busy beaver numbers were defined as follows. Consider Turing machines that have at most $n$ states and whose tape alphabet consists of two symbols (say, "blank" and "stroke"). We start such a machine on the blank tape. Some machines do not terminate at all. For the machines that terminate we count the number of steps; let $T(n)$ be the maximal number of steps among the terminating machines with at most $n$ states.

Evidently, $T(n)$ is an increasing function of $n$ since we consider all machines that have at most $n$ states. It grows very fast; in fact, it grows faster than any computable function (does not have a computable upper bound). Indeed, if a computable upper bound $f(n)$ exists, it may be used to solve the halting problem, since we know that if a machine with $n$ states does not terminate after $f(n)$ steps, it will never terminate. So no computable function, even a fast growing one, like $n!^{\,n!^{\,\cdot^{\,\cdot^{\,\cdot^{\,n!}}}}}$ ($n!$ levels), is an upper bound for $T(n)$.
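For very small $n$ the busy beaver numbers can be explored by brute force. The Python sketch below enumerates all machines with $n$ states and two symbols, under conventions that are assumptions of this example: a separate halting state, with the halting transition counted as a step. It also uses a step cutoff `cap`. In general no computable cutoff can be correct, which is exactly the point made above, so the function only gives a lower bound on $T(n)$.

```python
from itertools import product

HALT = -1  # hypothetical marker for the halting state

def run(table, cap):
    """Run a machine on the blank tape; return its step count, or None if it
    has not halted within `cap` steps."""
    tape, pos, state = {}, 0, 0
    for step in range(1, cap + 1):
        write, move, nxt = table[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if nxt == HALT:
            return step
        state = nxt
    return None

def max_halting_steps(n, cap):
    """Largest step count among n-state, 2-symbol machines halting within cap."""
    keys = [(s, b) for s in range(n) for b in (0, 1)]
    actions = [(w, m, nxt) for w in (0, 1) for m in (-1, 1)
               for nxt in list(range(n)) + [HALT]]
    best = 0
    for choice in product(actions, repeat=len(keys)):
        steps = run(dict(zip(keys, choice)), cap)
        if steps is not None and steps > best:
            best = steps
    return best

for n in (1, 2):
    print(n, max_halting_steps(n, cap=30))
```

For $n = 1$ and $n = 2$ the search is small (64 and 20736 machines) and, under the stated conventions, yields 1 and 6 steps. The number of machines grows so quickly that the same approach is already impractical for slightly larger $n$.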

But here we consider a different (but related) fast-growing function. Let us define $\delta(n)$ as the biggest integer that has complexity less than $n$. It exists since the number of descriptions of size less than $n$ is finite. By definition we have $n \le K(x)$ for any $x > \delta(n)$, e.g., for $x = \delta(n) + 1$. If the function $\delta$ were computable we would have $K(\delta(n) + 1) \le \log n + C$ since $n$ might serve as a description of $\delta(n) + 1$. The contradiction is evident. Hence, $\delta$ is not computable. In a similar way we can prove that $\delta$ grows faster than any computable function. (It suffices to replace $\delta(n)$ in the preceding inequalities by any computable upper bound for $\delta$.)
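The contradiction can be spelled out in one line. Here $\delta$ denotes the fast-growing function just defined and $C$ the constant from the argument above; the display is a sketch under the (false) assumption that this function is computable.

```latex
\[
  n \;\le\; K\bigl(\delta(n)+1\bigr) \;\le\; \log n + C
  \qquad \text{for all } n,
\]
\[
  \text{whereas} \quad n > \log n + C
  \quad \text{for all sufficiently large } n .
\]
```

The left inequality comes from the definition of the function, the right one from its assumed computability, and the two cannot hold together once $n$ is large.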

1.3 Gödel's theorem

1.3.1 It is proved that one cannot prove everything

The function $K(x)$ is not computable. How can we use it? For example, to prove theorems. Maybe the most remarkable example is the proof of Gödel's incompleteness theorem. Roughly speaking, this theorem claims that not all truths are provable. Mathematics has its intrinsic limits: there exist propositions that are true but impossible to prove.

We propose to you a more concrete form of a proposition that is true but unprovable; it was suggested by Gregory Chaitin [4].