
multiple layers, the data to be encoded effectively follows the i.i.d. Gaussian noise structure.

2.5 Conclusions

In this chapter, we reviewed some fundamental concepts from signal and image modeling and provided several important instances of the existing solutions from the literature.

We framed all of these attempts under the Bayesian framework, which systematically merges two sources of knowledge: the data and the prior.

While the data can be incorporated into the Bayesian objective in various ways, we saw that most of the variations that differentiate algorithms come from the way the prior knowledge is incorporated into the formulation, and from the consequences this has on how the solution is actually obtained through mathematical optimization.

We argued that the choice of the prior directly influences the quality of the solution w.r.t. the available training samples. In particular, we noticed that methods from the literature can roughly be divided into two broad categories: the basic and the composite models, as we termed them. While the basic models are more intuitive to understand and analyze, faster to train, and perhaps require fewer samples, the composite models, on the other hand, can benefit from the availability of larger amounts of training data and can provide better solutions under this regime.

Basic models are more common in the signal processing community and can roughly be divided into the two families of synthesis and analysis priors. While each of them has its own particularities, excellent theoretical treatments already exist for both. The composite models, on the other hand, have developed largely within the machine learning and deep learning communities. This has provided us with excellent practical know-how, leading to an advanced technology capable of achieving promising results.

We observed, however, that there is a noticeable gap between the two. In particular, we do not seem to be able to develop composite models by building on top of the basic ones in a systematic way. In fact, the performance of basic models seems to have somewhat saturated, while our understanding of composite models remains mostly practical.

The strategy of this thesis for signal modeling is to develop composite models by repeatedly invoking basic ones. We realize this idea using the additive residual structure, which is rooted in the concept of successive refinement of information. We pursue this idea in the next part of the thesis.

Part II

Algorithms

Chapter 3

Single-layer architectures

In chapter 2, we gave a general overview of the signal modeling literature and saw how, for a broad variety of tasks, similar ideas for signal decomposition and prior modeling can be framed under the Bayesian paradigm. We further sketched a general picture of the strategies of this thesis in using priors and how they relate to the considered literature.

In this chapter, we first conceptualize a general framework in section 3.1 that encompasses most of the objectives and ideas followed in this thesis. Later, in the third part of the thesis, different flavors of these ideas show up when addressing several applications. Inspired by the signal processing literature, we next pursue solutions to these general problems by making them more concrete within two general strategies: the synthesis and the analysis prior models.

Our synthesis model treatment leads us to the Variance-Regularized K-means (VR-Kmeans) algorithm in section 3.2 and our analysis formulation leads to the Sparse Ternary Codes (STC) of section 3.3. We start the development of these algorithms by assuming an underlying probability distribution for the data. We then lift these assumptions and gradually shift to more data-dependent solutions.

While the algorithms developed in this chapter follow a structure that we refer to as a “single-layer architecture”, we will see their limitations and make them more intricate and powerful in chapter 4, where we discuss “multi-layer architectures”.

3.1 General objective: encoder-decoder pair

For many purposes relevant to this thesis, it is very useful to encapsulate different objectives under the “encoder-decoder” split. This is defined as follows:

Consider an encoder $Q[\cdot]: \Re^n \rightarrow \mathcal{X}^m$ that assigns a code $x = Q[f]$ to a vector $f \in \Re^n$. The idea is to limit the entropy of the representation from $\Re^n$ to a lower-entropic space $\mathcal{X}^m$, which is not necessarily a Hilbert space, perhaps for some coding or mapping efficiency.

Furthermore, for some applications, we might be interested in efficiently storing and indexing this representation in memory. Therefore, we may also choose a discretized alphabet for $x$.
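As a purely illustrative sketch of such an encoder with a discretized alphabet $\mathcal{X} = \{-1, 0, +1\}$ (and not the specific encoders developed later in this chapter), one could threshold random projections of $f$; the projection matrix `W`, the threshold `lmbda`, and the dimensions below are arbitrary assumptions:

```python
import numpy as np

def encode_ternary(f, W, lmbda):
    """Toy encoder Q[.] : R^n -> {-1, 0, +1}^m (illustrative only).

    Projects f with W and keeps only the signs of projections whose
    magnitude exceeds the threshold lmbda, yielding a discretized,
    lower-entropy code.
    """
    z = W @ f                               # linear projection to R^m
    x = np.sign(z) * (np.abs(z) > lmbda)    # ternary quantization to {-1, 0, +1}
    return x.astype(np.int8)

# Example usage with arbitrary dimensions n = 64, m = 32.
rng = np.random.default_rng(0)
n, m = 64, 32
W = rng.standard_normal((m, n)) / np.sqrt(n)   # assumed random projection
f = rng.standard_normal(n)
x = encode_ternary(f, W, lmbda=0.5)
```

The sparsity of the resulting code (how many entries are zeroed by the threshold) is one example of a constraint $\Omega\{x\}$ in the formulation below.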

For this basic setup, a general optimization objective would be to minimize some cost function $c(\cdot,\cdot): \Re^n \times \Re^n \rightarrow \Re$ that measures the deviation of the data w.r.t. some (perturbed) observation $q$ as:
\[
\underset{Q[\cdot]}{\text{minimize}} \;\; c(f, q) \quad \text{s.t.} \quad \Omega\{x\},\ \Omega_{Q[\cdot]},
\tag{3.1}
\]
where $\Omega\{x\}$ and $\Omega_{Q[\cdot]}$ represent sets of constraints on the code and the encoder, respectively.

Given the code $x$, for a certain set of tasks like compression, we are interested in reconstructing the original $f$, either exactly or approximately. Therefore, we accordingly define a decoder $Q^{-1}[\cdot]: \mathcal{X}^m \rightarrow \Re^n$ that reconstructs $f$ by decoding $x$, denoted as $\hat{f} = Q^{-1}[x]$.

We may then focus on the quality of reconstruction within a trade-off with a set of constraints $\Omega_{Q[\cdot], Q^{-1}[\cdot]}$ on both the encoder and the decoder. This idea can be formalized as:
\[
\underset{Q[\cdot],\, Q^{-1}[\cdot]}{\text{minimize}} \;\; d_{\mathrm{E}}(f, \hat{f}) \quad \text{s.t.} \quad \Omega\{x\},\ \Omega_{Q[\cdot], Q^{-1}[\cdot]},
\tag{3.2}
\]

where $d_{\mathrm{E}}$ is the Euclidean distortion measure between two vectors $f$ and $\hat{f}$, and is defined as:
\[
d_{\mathrm{E}}(f, \hat{f}) \triangleq \frac{1}{n}\|f - \hat{f}\|_2^2,
\tag{3.3}
\]
and whose expected value is a fundamental property of an encoding and is referred to as the distortion, which is defined as in Eq. 3.4a, if the distribution is known; or as in Eq. 3.4b, if training samples are available instead:
\[
D = \mathbb{E}\big[d_{\mathrm{E}}(F, \hat{F})\big].
\tag{3.4a}
\]
\[
\hat{D} = \frac{1}{N}\sum_{i=1}^{N} d_{\mathrm{E}}(f_i, \hat{f}_i).
\tag{3.4b}
\]
Depending on the code constraints, i.e., $\Omega\{x\}$, the codes need different numbers of bits to represent them. In other words, $\Omega\{x\}$ specifies the rate of encoding, another fundamental property of an encoding scheme, which is defined as in Eq. 3.4c:
\[
R = \frac{1}{n}\mathbb{E}\big[\#\ \text{bits used for encoding}\big].
\tag{3.4c}
\]
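To make these definitions concrete, the following minimal sketch pairs a nearest-codeword encoder with a codebook-lookup decoder and computes the empirical distortion of Eq. 3.4b together with the rate of Eq. 3.4c under a fixed-length code. The random codebook `C`, the dimensions, and the fixed-length coding assumption are illustrative choices, not any of the trained codebooks discussed later in this chapter:

```python
import numpy as np

def encode(F, C):
    """Encoder Q[.]: assign each row of F to the index of its nearest codeword in C."""
    d2 = ((F[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # pairwise squared distances
    return d2.argmin(axis=1)

def decode(x, C):
    """Decoder Q^{-1}[.]: reconstruct each sample as its assigned codeword."""
    return C[x]

rng = np.random.default_rng(0)
n, N, k = 16, 1000, 64                  # dimension, number of samples, codebook size
F = rng.standard_normal((N, n))         # samples f_1, ..., f_N
C = rng.standard_normal((k, n))         # assumed (untrained) random codebook

x = encode(F, C)
F_hat = decode(x, C)

# Empirical distortion (Eq. 3.4b) and rate of a fixed-length code (Eq. 3.4c).
D_hat = ((F - F_hat) ** 2).sum(axis=1).mean() / n
R = np.log2(k) / n
print(f"D_hat = {D_hat:.3f}, R = {R:.3f} bits/dimension")
```

With a fixed-length code over $k$ codewords, each sample costs $\log_2 k$ bits, so the rate is simply $\log_2(k)/n$ bits per dimension; a codebook trained on the data would typically achieve a lower $\hat{D}$ at the same rate.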

While it is desirable to reduce both the rate of encoding, i.e., to have more compact codes, and the distortion of reconstruction, i.e., to remain more faithful to the data, these are in fact conflicting requirements for any source of information and under any encoding scheme.
