Algorithmic information theory:
a gentle introduction
alexander.shen@lirmm.fr, www.lirmm.fr/~ashen
LIRMM CNRS & University of Montpellier
February 2016, IHP
Standard example
Four-letter alphabet {a,b,c,d}: two bits per letter suffice.
Different frequencies: 1/8, 1/8, 1/4, 1/2. Better encoding: 000, 001, 01, 1.
In general: log(1/p_i) bits for a letter with frequency p_i; on average H = Σ_i p_i log(1/p_i) bits per letter.
“Statistical regularities can be used for compression”
Other types of regularities: block frequencies (50% compression if only the blocks aa, bb, cc, dd occur).
Non-statistical regularities: the binary expansion of π is highly compressible.
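A quick numeric check (a sketch in Python, not from the slides): for these frequencies the prefix code 000, 001, 01, 1 achieves exactly the entropy bound of 1.75 bits per letter.

```python
from math import log2

# Frequencies of a, b, c, d and the lengths of the codewords 000, 001, 01, 1.
freqs = [1/8, 1/8, 1/4, 1/2]
code_lengths = [3, 3, 2, 1]

# Shannon entropy H = sum of p_i * log(1/p_i)
H = sum(p * log2(1/p) for p in freqs)

# Average codeword length of the prefix code
avg = sum(p * l for p, l in zip(freqs, code_lengths))

print(H, avg)  # both equal 1.75 bits: this code meets the entropy bound exactly
```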
Algorithmic complexity
Goal: to define the amount of information in an individual object (genome, picture, . . . ).
Idea: “amount of information = number of bits needed to define (describe, specify, . . . ) a given object”.
“Define” is vague:
THE MINIMAL POSITIVE INTEGER THAT CANNOT BE DEFINED IN FEWER THAN A THOUSAND ENGLISH WORDS
More precise version: the algorithmic complexity of x is the minimal length of a program that produces x.
In other words, the “compressed size” of x —
but we do not care about compression; only decompression matters.
Kolmogorov complexity and decompressors
decompressor = any partial computable function V from {0,1}* to {0,1}* (we define complexity of strings)
decompressor = programming language (without input): if V(x) = z, we say that “x is a program for z” (“description” of z, “compressed version” of z, etc.)
Given a decompressor V, we define the complexity of a string z w.r.t. this decompressor:
C_V(z) = min { |x| : V(x) = z },    with min ∅ = +∞
Can one achieve something with this trivial definition?!
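The definition can be made concrete with a toy example (a sketch; this particular decompressor V is invented for illustration): C_V(z) is found by brute-force search over descriptions in order of length.

```python
from itertools import product

def complexity(V, z, max_len=10):
    """C_V(z) = min{|x| : V(x) = z}, by scanning binary strings x by length.
    Returns None (standing for +infinity) if no description up to max_len works."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            x = "".join(bits)
            if V(x) == z:
                return n
    return None

# Toy partial decompressor: x is a number n written in binary (no leading
# zeros, nonempty), and V(x) is a run of n zeros; elsewhere V is undefined.
def V(x):
    if x == "" or (len(x) > 1 and x[0] == "0"):
        return None
    return "0" * int(x, 2)

print(complexity(V, "0" * 6))  # shortest description is "110", so C_V = 3
```

Strings outside the range of this V (anything containing a 1) have infinite complexity w.r.t. it, which is why the choice of decompressor matters.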
Function C_V
Without computability: let V be an arbitrary partial function from strings to strings.
Which functions C_V do we obtain?
#{z : C_V(z) = k} ≤ 2^k
This condition is necessary and sufficient:
use k-bit strings as “descriptions” (“compressed versions”) of the strings z with C(z) = k.
For every z one can trivially find V that makes C_V(z) = 0 (map the empty string Λ to z).
So what? Even if we restrict V to computable partial functions, can we get anything non-trivial?
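The counting bound can be checked mechanically for any particular V (the V below is a made-up example): there are only 2^k descriptions of length k, so at most 2^k strings can have complexity exactly k.

```python
from itertools import product
from collections import Counter

# An arbitrary partial function V from strings to strings (invented example):
# defined on even-length inputs only, where it reverses its argument.
def V(x):
    return x[::-1] if len(x) % 2 == 0 else None

max_len = 8
# Compute C_V(z) for every z in the range of V by scanning descriptions
# in order of length; the first hit is the shortest description.
C = {}
for n in range(max_len + 1):
    for bits in product("01", repeat=n):
        z = V("".join(bits))
        if z is not None and z not in C:
            C[z] = n

counts = Counter(C.values())  # how many strings have complexity exactly k
for k, cnt in counts.items():
    assert cnt <= 2**k
print(dict(counts))
```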
An easy exercise
For every two decompressors V0 and V1 there exists some V such that
C_V(z) ≤ min(C_{V0}(z), C_{V1}(z)) + O(1) for all z.
“V is (almost) as good as each of V0, V1.”
Take V(0x) = V0(x) and V(1x) = V1(x):
first we specify which decompressor to use, and then the short program for this decompressor.
This construction preserves computability.
“Practical application”: a zipped file starts with a header that specifies the compression method (2^k methods for a k-bit header).
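The construction above can be sketched with finite toy decompressors given as dictionaries (the strings below are made-up data); the O(1) overhead is exactly the one header bit.

```python
# Toy decompressors as dictionaries (partial functions); invented examples.
V0 = {"0": "hello", "11": "world"}
V1 = {"": "hello", "101": "abc"}

def combine(V0, V1):
    """V(0x) = V0(x), V(1x) = V1(x): one header bit selects the decompressor."""
    V = {}
    for x, z in V0.items():
        V["0" + x] = z
    for x, z in V1.items():
        V["1" + x] = z
    return V

def C(V, z):
    """Complexity of z w.r.t. a finite decompressor V (min over preimages)."""
    lengths = [len(x) for x, out in V.items() if out == z]
    return min(lengths) if lengths else float("inf")

V = combine(V0, V1)
for z in ["hello", "world", "abc"]:
    # C_V(z) <= min(C_V0(z), C_V1(z)) + 1
    assert C(V, z) <= min(C(V0, z), C(V1, z)) + 1
print(C(V, "hello"))  # 1: the program "1" (header bit + empty V1-program)
```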
Optimal decompressors
For every sequence V0, V1, . . . of decompressors there exists some V such that C_V(z) ≤ C_{V_i}(z) + 2 log i + c for some c and for all i and z.
V is almost as good as every V_i (and the price to pay is moderate, only O(log i)).
Proof: prepend V_i-programs with a self-delimiting description of i (say, i in binary with all bits doubled, terminated by 01).
Applied to a computable enumeration of all computable V_i, this gives the “Kolmogorov–Solomonoff theorem”: there exists an optimal computable decompressor that is almost as good as any other computable one.
C_U for such an optimal U is called “algorithmic complexity” (or Kolmogorov complexity) and denoted by C.
“Application”: self-extracting archives.
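The self-delimiting encoding from the proof is easy to implement (a sketch; the encode/decode helpers are illustrative names): doubling the bits of i and terminating with 01 lets a decoder recover i and know exactly where the V_i-program starts, at a cost of 2 log i + O(1) bits.

```python
def encode(i):
    """Self-delimiting code for i: each bit of bin(i) doubled, then '01'."""
    return "".join(b + b for b in bin(i)[2:]) + "01"

def decode(s):
    """Read one self-delimited number from the front of s; return (i, rest)."""
    bits, pos = "", 0
    while s[pos] == s[pos + 1]:  # equal pairs are doubled bits of the number
        bits += s[pos]
        pos += 2
    assert s[pos:pos + 2] == "01"  # unequal pair: the terminator
    return int(bits, 2), s[pos + 2:]

# A combined program: self-delimited index i = 5, then the V_5-program "110".
payload = encode(5) + "110"   # bin(5) = 101, so encode(5) = "110011" + "01"
i, program = decode(payload)
assert (i, program) == (5, "110")
print(len(encode(5)))  # 8 bits = 2*|bin(5)| + 2
```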
Some natural properties of C
C(x) ≤ |x| + O(1). Indeed, compare the optimal decompressor U with x ↦ x.
“Algorithmic transformations do not create new information”: if A is some computable function, then C(A(x)) ≤ C(x) + c_A for some c_A and all x (here c_A depends on A but not on x).
There are fewer than 2^n objects of complexity less than n. Indeed, the number of descriptions shorter than n does not exceed 1 + 2 + 4 + . . . + 2^{n−1} < 2^n.
Some objects are highly compressible, e.g., C(0^n) ≤ log n + c. Indeed, consider the algorithmic transformation bin(n) ↦ 0^n. But most are not: the fraction of strings x of length n such that C(x) < n − c is less than 2^{−c}.
Law of nature: tossing 8000 coins, you get a sequence of 1000 bytes that has zip-compressed length at least 900. Does it follow from the known laws of physics (and how, if it does)?
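This “law of nature” is easy to observe empirically (a sketch, not from the slides; pseudo-random bytes with a fixed seed stand in for coin tosses): a general-purpose compressor cannot shrink random data, in line with the counting argument.

```python
import random
import zlib

# 1000 pseudo-random bytes stand in for 8000 coin tosses (seed fixed so the
# experiment is reproducible).
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(1000))

compressed = zlib.compress(data, 9)
print(len(compressed))  # stays near 1000: random data barely compresses

# The counting argument behind this, on small numbers: fewer than 2^(n-c)
# descriptions are shorter than n - c bits, so at most a 2^-c fraction of
# the 2^n strings of length n can be compressed by c bits.
n, c = 20, 5
assert 2**(n - c) / 2**n == 2**-c
```

Note that zlib may even add a few bytes of overhead on incompressible input, which is exactly what the counting argument predicts for most strings.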
alexander.shen@lirmm.fr, www.lirmm.fr/~ashen Algorithmic information theory: a gentle introduction
Some natural properties of C
C(x)≤ |x|+O(1).
Indeed, compare the optimal decompressor U withx 7→x
“algorithmic transformations do not create new information”: if Ais some computable function, thenC(A(x))≤C(x) +cA for somecA and all x (herecA depends on Abut not on x) there is less than 2n objects of complexity less thann. Indeed, the number of descriptions shorter thann does not exceed 1 + 2 + 4 +. . .+ 2n−1 <2n
Some objects are highly compressible, e.g., C(0n)≤logn+c Indeed, consider the algorithmic transformation bin(n)7→0n but most are not: the fraction of strings x of length n such that C(x)<n−c is less than 2−c
Law of nature: tossing8000coins, you get a sequence of1000 bytes that has zip-compressed length at least 900. Does it follow from the known laws of physics (and how if it does)?
alexander.shen@lirmm.fr, www.lirmm.fr/~ashen Algorithmic information theory: a gentle introduction
Some natural properties of C
C(x)≤ |x|+O(1).
Indeed, compare the optimal decompressorU withx 7→x
“algorithmic transformations do not create new information”: if Ais some computable function, thenC(A(x))≤C(x) +cA for somecA and all x (herecA depends on Abut not on x) there is less than 2n objects of complexity less thann. Indeed, the number of descriptions shorter thann does not exceed 1 + 2 + 4 +. . .+ 2n−1 <2n
Some objects are highly compressible, e.g., C(0n)≤logn+c Indeed, consider the algorithmic transformation bin(n)7→0n but most are not: the fraction of strings x of length n such that C(x)<n−c is less than 2−c
Law of nature: tossing8000coins, you get a sequence of1000 bytes that has zip-compressed length at least 900. Does it follow from the known laws of physics (and how if it does)?
alexander.shen@lirmm.fr, www.lirmm.fr/~ashen Algorithmic information theory: a gentle introduction
Some natural properties of C
C(x)≤ |x|+O(1).
Indeed, compare the optimal decompressorU withx 7→x
“algorithmic transformations do not create new information”:
if Ais some computable function, thenC(A(x))≤C(x) +cA for somecA and all x (herecA depends on Abut not on x)
there is less than 2n objects of complexity less thann. Indeed, the number of descriptions shorter thann does not exceed 1 + 2 + 4 +. . .+ 2n−1 <2n
Some objects are highly compressible, e.g., C(0n)≤logn+c Indeed, consider the algorithmic transformation bin(n)7→0n but most are not: the fraction of strings x of length n such that C(x)<n−c is less than 2−c
Law of nature: tossing8000coins, you get a sequence of1000 bytes that has zip-compressed length at least 900. Does it follow from the known laws of physics (and how if it does)?
alexander.shen@lirmm.fr, www.lirmm.fr/~ashen Algorithmic information theory: a gentle introduction
Bad news

We defined the complexity function C, but in fact it is defined only up to an O(1) change
… unless you declare your favourite programming language to be "the right one".
So questions like "is C(0^1000) < 15?" or "which is bigger: C(010) or C(101)?" do not make sense.
Theorem: the function C(·) is not computable (and does not even have an unbounded computable lower bound).
Proof: if it were, the string x_n, "the first string that has complexity at least n", would have complexity at least n and at most O(log n) at the same time (since it is obtained algorithmically from n).
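The proof is a Berry-paradox search, and its structure can be made concrete. A sketch, with one loudly flagged assumption: `toy_complexity` below is a hypothetical stand-in (zlib length in bits) for the genuinely uncomputable C; the threshold 76 is chosen only so the toy search terminates quickly.

```python
import itertools
import zlib

def toy_complexity(s: bytes) -> int:
    """Hypothetical stand-in for C(s): zlib output length in bits.
    The real C is NOT computable; this is merely an upper bound."""
    return 8 * len(zlib.compress(s, 9))

def first_string_with_complexity_at_least(n: int) -> bytes:
    """The search from the proof: enumerate strings in length order and
    return the first one whose (toy) complexity is at least n.
    If C itself were computable, this x_n would have C(x_n) >= n yet be
    computed from n alone, so C(x_n) = O(log n) -- a contradiction."""
    for length in itertools.count(1):
        for tup in itertools.product(range(256), repeat=length):
            s = bytes(tup)
            if toy_complexity(s) >= n:
                return s

x = first_string_with_complexity_at_least(76)
print(x, toy_complexity(x))
```

With the toy measure no contradiction arises, of course: the search happily returns a short string, precisely because zlib length is a computable overestimate of C, not C itself.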
Digression: Gödel and Chaitin

Gödel: not all true statements are provable.
Chaitin: not all true statements of the form "C(x) > m", where x is a specific string and m is a specific number, are provable.
Moreover, such statements are provable only for m not exceeding some constant.
Why? If not, consider the function m ↦ y_m = (the first discovered string with complexity provably exceeding m).
The complexity of y_m is at least m (assuming only true statements are provable).
The complexity of y_m is at most log m + O(1), since y_m is obtained from m by an algorithmic transformation.
Second-order digression: the axiomatic power of statements of this form.
Good news

Still, asymptotic statements make sense.
E.g., one can define the "effective Hausdorff dimension" of an individual infinite bit sequence x_0 x_1 … as
lim inf_n C(x_0 … x_{n−1}) / n
(Hausdorff dimension for a singleton?!)
Theorem: if x = x_0 x_1 … is obtained by independent trials of a Bernoulli distribution (p, 1−p), then with probability 1 the effective Hausdorff dimension of x is H(p).
Finite version: if p is the frequency of 1s in an n-bit string x, then C(x) ≤ nH(p) + O(log n).
Even for a genome (or a long novel) the notion of complexity makes sense: different "natural" programming languages give complexities that are 10^2–10^5 apart (the length of a compiler).
Parallels with classical information theory

H(X,Y) ≤ H(X) + H(Y)
Computation: concavity of the logarithm.
In AIT: if there is a short program computing x and another short program computing y, they can be combined into a program that computes the pair (x, y) (some encoding of it).
Complexity of a pair = complexity of its encoding (a change of encoding is a computable transformation, so it changes complexity only by O(1)).
C(x, y) ≤ C(x) + C(y) + O(log(C(x) + C(y)))
The logarithmic overhead is needed to separate the two programs.
Why such different arguments for parallel statements?
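The subadditivity bound has a compressor-level analogue that is easy to check. A sketch, again with zlib length as a stand-in upper bound for C: compressing a concatenation never costs much more than compressing the parts separately (the small slack plays the role of the logarithmic separation overhead).

```python
import os
import zlib

def K_zip(s: bytes) -> int:
    """Upper bound on C(s) via zlib, in bytes (illustration only)."""
    return len(zlib.compress(s, 9))

x = os.urandom(1000)   # incompressible part
y = b"\x00" * 1000     # highly compressible part

pair = K_zip(x + y)              # one "program" for the pair
separate = K_zip(x) + K_zip(y)   # two programs, combined naively

print(pair, separate)  # pair is at most separate plus a small overhead
```

For zlib the overhead is just stream headers and checksums; for C it is the O(log(C(x)+C(y))) bits needed to mark where the first program ends and the second begins.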