Statistical mechanics and visual signal processing

HAL Id: jpa-00247030

https://hal.archives-ouvertes.fr/jpa-00247030

Submitted on 1 Jan 1994

Marc Potters, William Bialek

To cite this version:

Marc Potters, William Bialek. Statistical mechanics and visual signal processing. Journal de Physique I, EDP Sciences, 1994, 4 (11), pp.1755-1775. ⟨10.1051/jp1:1994219⟩. ⟨jpa-00247030⟩

Classification Physics Abstracts

05.20 87.10 87.32

Marc Potters (1,2,*) and William Bialek (1)

(1) NEC Research Institute, 4 Independence Way, Princeton, New Jersey 08540, U.S.A.
(2) Department of Physics, Princeton University, Princeton, New Jersey 08544, U.S.A.

(Received 27 January 1994, accepted in final form 28 July 1994)

Abstract. We show how to use the language of statistical field theory to address and solve problems in which one must estimate some aspect of the environment from the data in an array of sensors. In the field theory formulation the optimal estimator can be written as an expectation value in an ensemble where the input data act as an external field. Problems at low signal-to-noise ratio can be solved in perturbation theory, while high signal-to-noise ratios are treated with a saddle-point approximation. These ideas are illustrated in detail by an example of visual motion estimation which is chosen to model a problem solved by the fly's brain. The optimal estimator has a rich structure, adapting to various parameters of the environment such as the mean-square contrast and the correlation time of contrast fluctuations. This structure is in qualitative accord with existing measurements on motion sensitive neurons in the fly's brain, and the adaptive properties of the optimal estimator may help resolve conflicts among different interpretations of these data. Finally we propose some crucial direct tests of the adaptive behavior.

(*) Present address: Dipartimento di Fisica, Università di Roma I, Piazzale Aldo Moro 2, 00185 Roma, Italy. potters@roma1.infn.it.

1. Introduction.

Imagine walking along a busy city street. We are almost unaware of the myriad tasks which our brains are performing: we use a combination of visual and vestibular signals to keep ourselves upright, sensors in our feet and leg muscles help adjust our stride to the terrain, we listen for cars and people behind us, vision helps us recognize the café we are approaching and identifies our friend who will meet us there, and all these senses combine to provide us with a trajectory which reaches our goal and avoids obstacles. All of these tasks involve signal processing, and we have the qualitative impression that we (and other animals) are quite good at solving these problems. We show that statistical mechanics provides a natural language for formulating and solving signal processing problems, and that the statistical mechanics models provide a predictive theory of how real brains solve the corresponding problems.

1.1 PHYSICAL LIMITS AND BIOLOGICAL SIGNAL PROCESSING. Signal processing is in essence a physics problem. As an example, in vision the precise formulation of any task must begin with the fact that images are blurred by diffraction and corrupted by photon shot noise. These irreducible limitations in the quality of the input signal limit the reliability with which the brain (or any device) can estimate what is really going on in the visual environment: we can never be truly certain of what we are looking at. There are physical limitations from the hardware as well: cells of a certain size are bound to generate electrical noise, and signals are passed from one point to another along cables of limited information capacity. These physical limitations to the reliability of computation are actually relevant to the operation of real brains. Indeed there are many cases where the performance of the nervous system is essentially equal to the limit that one calculates from first principles [1, 2]; examples range from photon counting in toads and humans [3] to visual motion estimation in flies [4, 5] to acoustic coding in frogs and crickets [6] and echo delay estimation in bats [7]. These observations strongly suggest that a theory of optimal signal processing will help us understand the computational strategies of brains.

1.2 CHOOSING A PROBLEM. Of the many examples of signal processing in the nervous system, our discussion is motivated most directly by the problem of motion estimation in the fly's visual system. In many insects the visual system produces movement signals which are used to control the flight muscles and thereby stabilize flight. In the case of flies, visually guided flight behavior has been studied both by measuring flight paths during natural behaviors [8] and by examining the torques produced by flies hanging from a torsion balance [9, 10]. Input to the motion computation comes from a single class of photoreceptor cells which are arrayed in the regular lattice of the compound eye, and the signal and noise properties of these cells are extremely well characterized. The output of the movement computation can be monitored in a handful of cells [11], and destruction of these few neurons produces clear deficits in the fly's optomotor behavior [12]. Cells of essentially identical structure occur in each fly, and these cells have responses to visual stimuli which are quantitatively reproducible from individual to individual. Thus the cells can be named and numbered based on either structure or function; under favorable conditions one can record from the cell H1, which codes wide field, horizontal movements of the visual field, for periods of up to five days [13]. The accessibility of quantitative measurements at each of several layers in the nervous system (photoreceptors, second order neurons, motion-sensitive cells, flight behavior) makes the fly a nearly ideal testing ground for theories of neural signal processing.

1.3 RELATION TO PREVIOUS WORK. Signal processing is an enormous field with a diverse set of applications [14]. At the core of this field is the description of random functions, whether they be functions of time (e.g., sounds) or space (e.g., images). Nonetheless, the bulk of the literature on signal processing does not seem to make use of the functional integral methods which seem so natural from the perspective of statistical mechanics. In 1988, however, Zweig and Lackner [15] studied a model signal processing problem, reconstructing the configuration of a randomly moving plate using a set of sparse and noisy observations of displacements at particular points along the plate. Choosing the random motions from the Boltzmann distribution, they emphasized the connection to statistical mechanics.

Traditional approaches to the problem of "optimal estimation" begin by defining a class of estimation strategies and then use variational methods to search this class for the best estimator. This approach goes back to Wiener and Kolmogorov [27, 28], who found the optimal linear filters for recovering and extrapolating signals in noisy backgrounds. The Wiener-Kolmogorov results essentially close a broad class of questions related to optimal estimation by linear filters, but leave completely open problems where the optimal estimator may be a non-linear function of the input data. We shall see that the statistical mechanics approach gives us tools for going beyond linear (or lowest order non-linear) estimators. Rather than searching through a limited class of estimators for a local optimum, the mapping to a statistical mechanics problem allows us to construct the globally optimal estimator.

In the case of vision, it is widely recognized that unconstrained variational approaches to estimation result in ill-posed problems [16]. The "regularization" of these ill-posed problems can be given a probabilistic interpretation, in which the optimal estimator reflects a compromise between fitting the available data and taking account of a priori knowledge about the expected distribution of incoming signals; this is the same structure described by Zweig and Lackner in their model problem. Different regularization terms thus represent different hypotheses about the statistical structure of the visual environment, but there has been very little experimental work to characterize these statistics [17]. Following Geman and Geman [18], there has been considerable focus on "Markov random field" models for image statistics, which are equivalent to Boltzmann distributions with local Hamiltonians, but there has been relatively little work exploiting the full statistical mechanics structure of even these simple models. In particular, if one takes the statistical models seriously as models of the problems that are solved by real brains, there is the clear prediction that the signal processing strategies of the brain must adapt to the statistical structure of the environment.

The connection between statistical mechanics and signal processing has been used in previous work to study vision at very low photon flux, where the low signal-to-noise ratio (SNR) leads to a universal "pre-processing" stage in any estimation task [19, 21]. The structure of this pre-processing step is in good agreement with the responses of cells in the first stage of retinal signal processing, and as far as we know this is the first example of a successful parameter-free prediction of neural responses. Most sensory systems encode incoming signals in sequences of discrete pulses, termed spikes or action potentials, and the problem of decoding these pulse sequences can again be given a statistical mechanics formulation [21]; the resulting predictions concerning the algorithms for optimal decoding have been confirmed in experiments on a wide variety of organisms [22]. Finally, the low SNR limit of motion estimation in fly vision has also been studied [2, 23, 24] and this analysis was crucial in establishing that the fly does make optimal estimates, but we shall see that by restricting attention to low SNR one misses a large amount of structure which is likely relevant under natural conditions. Attempts to go beyond the low SNR perturbation theory have been confined to model problems [25]. For a more detailed description of the present work, the reader is referred to [26].

2. Simple examples of signal processing.

A typical signal processing problem consists of extracting information from the output of a device corrupted by noise. The signal might be encoded in the data in a complex way or might even be present only statistically (as in the case of prediction of time series). The task is then, given the statistical properties of the quantities of interest, to find the optimal estimate of the signal using the data. To do so, one must decide on a definition of the quality of an estimate. There are two obvious choices: maximum likelihood and least mean-square error. The estimator which provides least mean-square error is the conditional mean, that is the average value of the signal given the data.
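As a brief reminder of why the conditional mean is the least mean-square estimator (a standard argument, written in notation assumed here for illustration: $\hat{s}(y)$ is any candidate estimator and $\langle\cdot\rangle_y$ denotes an average over P[s|y]), the error at fixed data splits into a term that does not depend on the estimator and a term that vanishes only at the conditional mean:

```latex
\begin{align}
\left\langle \left(s - \hat{s}(y)\right)^2 \right\rangle_y
  &= \left\langle \left(s - \langle s \rangle_y\right)^2 \right\rangle_y
   + \left(\langle s \rangle_y - \hat{s}(y)\right)^2 , \\
\hat{s}_{\rm LMS}(y) &= \langle s \rangle_y = \int ds\, s\, P[s|y] .
\end{align}
```

The cross term drops out because $\langle s - \langle s\rangle_y \rangle_y = 0$, so no variational search over estimators is needed.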

There are other cost functions whose minimizations lead to different estimators, but these two have a natural connection to statistical mechanics and field theory: least mean-square error estimators are just expectation values, while maximum-likelihood estimators correspond to classical or mean-field approximations. Notice that since the least mean-square error estimator is an expectation value, we can find "optimal" estimators without doing any further variational calculations. Before we proceed with our main problem, let us explain our approach on a few simple models where everything can be computed exactly.

2.1 ONE VARIABLE ESTIMATION. Consider the following problem: estimate a signal consisting of a single number s using the output y of a detector contaminated by noise, $y = s + \eta$. Since the noise is random and the true signal is unknown to the observer, this problem is a probabilistic one. To find the best estimate of the signal, we need to construct the probability of s given the output y. We use Bayes' theorem to write down the conditional probability of s given y:

$$P[s|y] = \frac{P[y|s]\,P[s]}{P[y]}. \tag{1}$$

P[y|s] describes the detection process: how the output is related to the input signal and the probabilistic nature of the noise. P[s] reflects the a priori knowledge of the signal, and P[y] is independent of s and thus serves as a normalization.

Suppose first that both signal and noise are chosen from independent Gaussian distributions with variance S and N respectively. We find

$$P[s|y] = \sqrt{\frac{S+N}{2\pi NS}}\,\exp\left[-\frac{(S+N)\left(s - \frac{S}{S+N}\,y\right)^2}{2NS}\right]. \tag{2}$$

The conditional probability distribution is thus also Gaussian with mean $\frac{S}{S+N}\,y$ and variance $\frac{NS}{S+N}$. In this example, the maximum and the mean of the conditional probability distribution are equal and are given by

$$s_e = \frac{S}{S+N}\,y. \tag{3}$$

The best estimate of the input is thus a linear function of the detector output.
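A quick numerical check can make the linear estimator of equation (3) concrete. The following is a minimal sketch, not part of the original analysis: the values of S and N, the sample size, and the helper names `linear_estimate` and `posterior_variance` are illustrative assumptions. It draws zero-mean Gaussian signal and noise samples, scans estimators of the form g·y, and compares the empirically best gain with S/(S+N).

```python
import numpy as np

def linear_estimate(y, S, N):
    """Conditional mean of s given y for zero-mean Gaussian signal (variance S)
    and independent Gaussian noise (variance N), as in equation (3)."""
    return S / (S + N) * y

def posterior_variance(S, N):
    """Variance of the Gaussian conditional distribution P[s|y]."""
    return N * S / (S + N)

# Assumed illustrative values (not taken from the paper).
S, N = 4.0, 1.0
rng = np.random.default_rng(0)
s = rng.normal(0.0, np.sqrt(S), size=200_000)           # signal samples
y = s + rng.normal(0.0, np.sqrt(N), size=s.size)        # detector output

# Scan linear estimators g*y and record the mean-square error of each.
gains = np.linspace(0.0, 1.0, 101)
mse = np.array([np.mean((s - g * y) ** 2) for g in gains])
best_gain = gains[np.argmin(mse)]

# Error of the estimator of equation (3) on the same samples.
mse_optimal = np.mean((s - linear_estimate(y, S, N)) ** 2)

print("best gain on grid:", best_gain, " predicted:", S / (S + N))
print("optimal MSE:", mse_optimal, " predicted:", posterior_variance(S, N))
```

Under these assumptions one expects the best gain to sit close to S/(S+N) and the minimum error close to NS/(S+N), up to sampling fluctuations and the resolution of the gain grid.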

To illustrate the emergence of non-linear estimators, consider a small modification of our toy model. Suppose now that the signal is positive and taken from an exponential distribution with mean 1/a and that the noise is Gaussian and additive but has a variance proportional to the signal (i.e. $N = \beta s$). The analog of equation (2) is now

$$P[s|y] = \frac{1}{Z}\,\frac{1}{\sqrt{s}}\,\exp\left[-as - \frac{(y-s)^2}{2\beta s}\right], \tag{4}$$

where Z is an s-independent normalization factor. The maximum likelihood estimator, which maximizes equation (4), is given by

$$s_{ML} = \frac{\sqrt{\beta^2 + 4(1+2a\beta)\,y^2} - \beta}{2(1+2a\beta)}. \tag{5}$$

To compute the least mean-square estimator, we need to average s with respect to P[s|y]. After some algebra and a trip to our favorite integral table we find

$$s_{LMS} = \frac{\beta}{1+2a\beta} + \frac{|y|}{\sqrt{1+2a\beta}}. \tag{6}$$

In this example, the two definitions of optimal estimation lead to different results; both are non-linear functions of the output variable y. This of course comes from the non-Gaussian nature of the signal and the dependence of the noise upon the signal. Amusingly, in this case the optimal estimator is not even monotonic in the detector output y.
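The two non-linear estimators can be checked against a brute-force evaluation of the posterior. The sketch below is only illustrative: it assumes the closed forms as we read them from equations (5) and (6), uses arbitrary parameter values a = 1 and β = 0.5, and the function names and the integration grid are our own choices. It evaluates the unnormalized posterior of equation (4) on a grid in s, then reads off its maximum and its mean.

```python
import numpy as np

def log_posterior(s, y, a, beta):
    """Unnormalized log of equation (4): exponential prior with mean 1/a,
    Gaussian noise with variance beta * s."""
    return -0.5 * np.log(s) - a * s - (y - s) ** 2 / (2.0 * beta * s)

def s_ml(y, a, beta):
    """Closed-form maximizer of equation (4), as read from equation (5)."""
    c = 1.0 + 2.0 * a * beta
    return (np.sqrt(beta**2 + 4.0 * c * y**2) - beta) / (2.0 * c)

def s_lms(y, a, beta):
    """Closed-form conditional mean, as read from equation (6)."""
    c = 1.0 + 2.0 * a * beta
    return beta / c + np.abs(y) / np.sqrt(c)

def numeric_estimates(y, a, beta, s_max=60.0, n=600_001):
    """Posterior mode and mean from a uniform grid in s (assumed grid choice)."""
    s = np.linspace(1e-6, s_max, n)
    logp = log_posterior(s, y, a, beta)
    w = np.exp(logp - logp.max())        # rescale to avoid overflow
    mode = s[np.argmax(w)]
    mean = np.sum(s * w) / np.sum(w)     # uniform grid: spacing cancels
    return mode, mean

a, beta = 1.0, 0.5                        # assumed illustrative values
for y in (2.0, -2.0):
    mode, mean = numeric_estimates(y, a, beta)
    print(y, mode, s_ml(y, a, beta), mean, s_lms(y, a, beta))
```

Because the posterior depends on y only through y², the outputs y = 2 and y = −2 give the same estimates, which is the non-monotonic behavior noted in the text.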

2.2 CAUSAL LINEAR FILTERING. Let us now consider the estimation of continuous signals by doing the time dependent version of our Gaussian toy model. Suppose now that s(t) and $\eta(t)$ are functions of time chosen from Gaussian distributions with power spectra $S(\omega)$ and $N(\omega)$. We find the conditional probability distribution

$$P[s|y] = \frac{1}{Z}\exp\left[-\frac{1}{2}\int\frac{d\omega}{2\pi}\,\frac{|y(\omega)-s(\omega)|^2}{N(\omega)} - \frac{1}{2}\int\frac{d\omega}{2\pi}\,\frac{|s(\omega)|^2}{S(\omega)}\right], \tag{7}$$

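with Z a normalization constant. To find $\langle s(\omega)\rangle_y$ we compute the "equation of motion" and find in the time domain:

$$s_e(t) = \int_{-\infty}^{\infty} d\tau\, g(\tau)\, y(t-\tau) \quad\text{where}\quad g(\tau) = \int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\, e^{-i\omega\tau}\,\frac{S(\omega)}{S(\omega)+N(\omega)}. \tag{8}$$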

We see that the problem of estimating continuous functions of time is just the problem of estimating the independent Fourier components; for each component we have an equation identical to equation (3) for the estimation of a single variable. It should be clear that this simplicity is tied to the assumption of Gaussian distributions for both the signal and the noise.
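The component-by-component structure of the acausal filter is easy to try out on sampled data. The following is a rough sketch under assumptions of our own, none of which come from the paper: a Lorentzian signal spectrum with correlation time `tau_c`, white noise, a periodic record of `n` samples, and the discrete Fourier transform standing in for the continuous one. Each Fourier component of y is multiplied by S/(S+N), exactly as in the single-variable case.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dt = 4096, 0.01                        # assumed record length and time step
omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)

tau_c = 0.2                               # assumed signal correlation time
S = 1.0 / (1.0 + (omega * tau_c) ** 2)    # assumed Lorentzian signal spectrum
N = np.full_like(omega, 0.05)             # assumed white noise spectrum (same units)

def sample(spectrum):
    """Real Gaussian series whose Fourier components have variance
    proportional to `spectrum` (the same constant for every call)."""
    white = rng.normal(size=n)
    return np.fft.irfft(np.fft.rfft(white) * np.sqrt(spectrum), n=n)

s = sample(S)                             # signal
y = s + sample(N)                         # noisy detector output

# Acausal estimate: scale each Fourier component of y by S/(S+N).
gain = S / (S + N)
s_e = np.fft.irfft(gain * np.fft.rfft(y), n=n)

mse_raw = np.mean((y - s) ** 2)           # error if y itself is used as the estimate
mse_filtered = np.mean((s_e - s) ** 2)    # error of the filtered estimate
print(mse_raw, mse_filtered)
```

The filter here uses the whole record, past and future, so it is the discrete analog of the acausal g(τ) and not a causal estimator.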

The filter $g(\tau)$ is acausal, but any real device which takes y(t) as an input and returns an estimate of the signal s(t) must be causal. The optimal causal estimator would require knowledge of the output y(t) only up to the present or, more generally, up to a small delay time $\tau_0$ in the future. The delay time $\tau_0$ can be negative, corresponding to a predictor instead of an estimator. This problem was solved by Wiener [27] and Kolmogorov [28] who assumed linear filtering and found that the optimal causal filter only required knowledge of the signal and noise power spectra. We will assume that the signal and noise are chosen from Gaussian distributions and find that the optimal estimator is a linear filter.

The task is to estimate $s(-\tau_0)$ given the output up to time t = 0. We write the estimator as the conditional average of the random variable $s(-\tau_0)$ upon observing y(t < 0). This is equivalent to averaging first with respect to the conditional probability of s given all of y and then averaging over $y_+(t)$ using the conditional probability $P[y_+(t)|y_-(t)]$ where

$$y_+(t) = \theta(t)\,y(t), \qquad y_-(t) = \theta(-t)\,y(t), \tag{9}$$

with $\theta(t)$ the Heaviside step function.

We need to write down the probability distribution of $y_+(t)$ given $y_-(t)$,

$$P[y_+(t)|y_-(t)] = \frac{P[y(t)]}{P[y_-(t)]}. \tag{10}$$

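The key trick [27] is to write

$$\psi(\omega)\,\psi^*(\omega) = \frac{1}{N(\omega)+S(\omega)}, \tag{11}$$

where $\psi(\omega)$ is chosen such that it has no poles in the upper half plane. The probability distribution in the time domain is then

$$P[y(t)] = \frac{1}{Z}\exp\left[-\frac{1}{2}\int_{-\infty}^{\infty} d\tau \left|\int\frac{d\omega}{2\pi}\,y(\omega)\,\psi(\omega)\,e^{-i\omega\tau}\right|^2\right]. \tag{12}$$

The function $z(\tau)$ defined by

$$z(\tau) = \int\frac{d\omega}{2\pi}\,y(\omega)\,\psi(\omega)\,e^{-i\omega\tau} \tag{13}$$

is causal (i.e. it only depends on the values of y(t) for $t < \tau$).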
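For a concrete feel of the factorization in equation (11), consider a toy case that is entirely our own assumption and not taken from the paper: a Lorentzian signal spectrum $S(\omega) = 2a\sigma^2/(\omega^2+a^2)$ and white noise $N(\omega) = N_0$. Then $S+N = N_0(\omega^2+b^2)/(\omega^2+a^2)$ with $b^2 = a^2 + 2a\sigma^2/N_0$, and one factor with its only pole in the lower half plane is $\psi(\omega) = (\omega + ia)/[\sqrt{N_0}\,(\omega + ib)]$. The sketch below checks numerically that this candidate satisfies equation (11); the constants and function names are hypothetical.

```python
import numpy as np

# Assumed toy parameters (not from the paper).
SIGMA2, A, N0 = 1.0, 2.0, 0.1

def signal_spectrum(omega):
    """Assumed Lorentzian signal power spectrum."""
    return 2.0 * A * SIGMA2 / (omega**2 + A**2)

def noise_spectrum(omega):
    """Assumed white noise power spectrum."""
    return np.full_like(omega, N0)

# With these spectra, S + N = N0 * (omega^2 + B^2) / (omega^2 + A^2).
B = np.sqrt(A**2 + 2.0 * A * SIGMA2 / N0)

def psi(omega):
    """Candidate factor of 1/(S+N); its only pole is at omega = -i*B."""
    return (omega + 1j * A) / (np.sqrt(N0) * (omega + 1j * B))

omega = np.linspace(-50.0, 50.0, 2001)
total = signal_spectrum(omega) + noise_spectrum(omega)

# Largest deviation of |psi|^2 * (S + N) from 1 over the frequency grid.
max_error = np.max(np.abs(np.abs(psi(omega)) ** 2 * total - 1.0))
pole = -1j * B                            # location of the pole of psi

print(max_error, pole)
```

For spectra that are not simple rational functions the factorization generally has to be done numerically, which this sketch does not attempt.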

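Separating the contributions from $\tau < 0$ and $\tau > 0$, we have

$$P[y_-, y_+] = \frac{1}{Z}\exp\left[-\frac{1}{2}\int_{-\infty}^{0} d\tau \left|\int\frac{d\omega}{2\pi}\,y_-(\omega)\,\psi(\omega)\,e^{-i\omega\tau}\right|^2 - \frac{1}{2}\int_{0}^{\infty} d\tau \left|\int\frac{d\omega}{2\pi}\,\left(y_-(\omega)+y_+(\omega)\right)\psi(\omega)\,e^{-i\omega\tau}\right|^2\right], \tag{14}$$

$$P[y_-] = \int [dy_+]\,P[y_-, y_+]. \tag{15}$$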

To do the integral over $y_+(t)$ we change variables to $z_+(\tau)$ defined as $z(\tau)$ for $\tau > 0$. We notice that this is a linear transformation, and since $\psi(\tau)$ is causal the transformation matrix is lower triangular. Therefore the Jacobian of this transformation is independent of $y_-(t)$. The Gaussian integral over $z_+(\tau)$ gives an overall factor that can be absorbed in the normalization of our probability distribution. $P[y_-]$ is then given by the first term on the r.h.s. of equation (14) which can be substituted in equation (10) to give

$$P[y_+(t)|y_-(t)] = \frac{1}{Z}\exp\left[-\frac{1}{2}\int_{0}^{\infty} d\tau \left|\int\frac{d\omega}{2\pi}\,\left(y_-(\omega)+y_+(\omega)\right)\psi(\omega)\,e^{-i\omega\tau}\right|^2\right]. \tag{16}$$

As mentioned earlier, we first average s using P[s|y] and then average the result with respect to $P[y_+|y_-]$. The result of the first averaging is the acausal estimator (8) evaluated at $t = -\tau_0$,

$$s_{nc}(-\tau_0) = \int\frac{d\omega}{2\pi}\,e^{i\omega\tau_0}\,y(\omega)\,S(\omega)\,\psi(\omega)\,\psi^*(\omega). \tag{17}$$

Since $s_{nc}(-\tau_0)$ is linear in y(t), averaging with respect to (16) amounts to replacing y(t) by its average value which solves the equation of motion:

$$\int\frac{d\omega}{2\pi}\,\left(y_-(\omega)+y_+(\omega)\right)\psi(\omega)\,e^{-i\omega\tau} = 0 \quad\text{for } \tau > 0. \tag{18}$$