
1.4 Our main contributions

1.4.2 Highlights

Here we highlight some of the contributions of this thesis in the order of their appearance:

1. Introducing the rate-allocation regularization into the formulation of K-means, solving the optimization problem, analyzing its solution and showing its efficiency in avoiding over-fitting in high-dimensional scenarios.

2. Introducing the framework of Sparse Ternary Codes (STC) as a universal encoding-decoding mechanism for lossy compression of correlated vectors, solving for optimal parameters and characterizing its rate-distortion performance.

3. Developing the introduced synthesis-based and analysis-based prior models under the successive refinement principle as the RRQ and the ML-STC frameworks, and expanding their operational rate-distortion regime to arbitrary values.

4. Proposing a range of effective choices to target different trade-offs of sample complexity and model capacity for both families of methods. In particular, the RRQ fills the spectrum of possibilities by changing the regularization from infinity to zero, while the ML-STC becomes more data-dependent rather than prior-based by shifting to the ML-STC-Procrustean, STNets and STNets-Procrustean frameworks.

5. Introducing a novel neural network architecture, which is developed from basic components and is capable of significantly reducing the training time and sample complexity by benefiting from pre-training of the basic components, as well as fine-tuning using back-propagation.

6. Proposing a systematic way to back-propagate in the presence of the non-differentiable quantizer function used in the network, i.e., the ternarizing operator, as a result of studying the information concentration properties of this function.

7. Defining the notion of coding gain for similarity search using information-theoretic measures, as a systematic way to measure how efficiently the triple trade-off between memory, complexity and performance is achieved in the problem of similarity search, as well as showing the superiority of the STC as a viable alternative to binary hashing.

8. Proposing a decoding mechanism for STC that significantly reduces the computational complexity w.r.t. the exhaustive search, as well as its extension to multiple layers.

9. Proposing a middle-ground solution between the two families of existing solutions in the similarity search literature, which benefits from efficient search in the space of codes while at the same time refining the results with accurate estimates of distances, thanks to its excellent rate-distortion performance.

10. Performing learning-based image compression in two scenarios, high-resolution natural images and domain-specific facial images, showing promising compression results and advocating the idea of learned compression as an alternative to data-agnostic solutions.

11. Injecting the effective prior of learned compression into the problem of denoising of domain-specific data and showing superior performance w.r.t. the state-of-the-art under very noisy conditions.

12. Investigating compressibility as a prior to solve inverse problems and proposing an effective iterative algorithm to achieve it. The algorithm decouples the optimization of the compression network from the optimization that solves the inverse problem; it is hence very flexible, can be used in many practical scenarios, and is able to invoke any compression paradigm as its underlying engine.

Chapter 2

Image models: literature overview

To achieve different objectives and to target different applications, a multitude of processing tasks need to be performed on signals. These tasks, within their application context, try to make sense of signals in one way or another. Focusing on images in particular, well-known examples of these tasks are “image restoration”, “image compression”, “compressive sensing”, “image recognition” and “(content-based) image retrieval”.

Image restoration involves cases where a physical phenomenon has degraded the quality of the given image, e.g., as in “image denoising”, where noise has contaminated the image, “image de-blurring”, where the image at hand is blurred, “image inpainting”, where parts of the image have been lost or degraded, and “image super-resolution”, where the resolution of the given image is lower than desired. In all these tasks, the objective is to undo the degradation process, perhaps approximately. Therefore, these tasks are also referred to as “inverse problems”.

Image compression involves finding a more compact representation for an image than the direct representation of its pixel values would occupy in memory. Finding compressive representations is an important focus of this thesis, for which we develop different solutions in Chapters 3 and 4. For images in particular, we provide image compression solutions later in Chapter 6.

Compressive sensing tries to reduce the number of measurements an imaging sensor has to take in order to reproduce an image with a certain quality. This is important, e.g., for applications like medical tomography, where the acquisition process is slow, expensive and exposes the patient to radiation. Given the under-sampled observations, recovering the original image is the primary objective of this application.

Image recognition involves assigning a semantic label to an image. The procedure is based on a training phase, where different images with assigned labels are presented to the algorithm, and then a test phase, where the labels of some other similar images are to be predicted based on the examples seen during the training phase.

In content-based image retrieval, usually without the availability of categorical labels or keywords, a query image is presented to the retrieval system, which then searches for similar images within a (usually large) collection of images in the system’s database. Chapter 5 is dedicated to similarity search within these databases, reviewing fundamental concepts and providing our contributions.

These tasks seem very diverse, take different forms, and are studied even in different communities. Obviously, not all of them fit within the scope of this thesis. However, it is important to point out that they all use similar principles for their solutions, so an understanding of these common principles might turn out to be mutually beneficial for these applications. Next, in section 2.1, we use the Bayesian framework to conceptualize and unify such efforts. This is a useful start to understand how these problems are posed.

In sections 2.2 and 2.3, we then provide a generalist literature overview of signal modeling efforts within the signal processing and machine learning communities. Finally, section 2.4 positions the ideas used in this thesis with respect to the literature.

2.1 Bayesian framework

Almost all attempts at signal modeling can be interpreted under the Bayesian framework, either explicitly, or through some of its variations like the empirical Bayes, where signal priors are learned from the data.

The general Bayesian principle involves incorporating and merging two components: first, the evidence or observations, i.e., the given data; and second, the prior beliefs, i.e., the signal models or signal decompositions.

Suppose we are given an observation q from which we want to infer an underlying phenomenon f. Within a probabilistic setup involving randomness, this task can be posed as finding the posterior probability distribution p(f|q). While this might be impractical to calculate directly, the Bayes rule provides us with an alternative:

$$p(f|q) = \frac{p(f)\,p(q|f)}{p(q)},$$

where p(f) is a (subjective) prior belief about that underlying phenomenon and is injected into our observations along with the likelihood p(q|f), which is usually much easier to handle than the direct p(f|q). The estimation of f can then be formulated as:

$$\hat{f} = \underset{f}{\arg\max}\; p(f|q) = \underset{f}{\arg\max}\; \frac{p(f)\,p(q|f)}{p(q)}.$$
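As a concrete illustration, the following is a minimal sketch of this MAP estimation for a toy denoising problem, assuming a Gaussian prior and a Gaussian likelihood; the function name and the variance values are hypothetical and only serve to make the formula tangible.

```python
import numpy as np

# A minimal sketch of MAP estimation, assuming a Gaussian prior
# f ~ N(0, sigma_f^2 I) and Gaussian likelihood q = f + n with
# n ~ N(0, sigma_n^2 I). Under these assumptions, maximizing
# p(f)p(q|f) is equivalent to minimizing
#   ||q - f||^2 / (2 sigma_n^2) + ||f||^2 / (2 sigma_f^2),
# whose closed-form solution is a simple shrinkage of the observation.

def map_estimate(q, sigma_f, sigma_n):
    """Return argmax_f p(f|q) for the Gaussian prior/likelihood above."""
    shrinkage = sigma_f**2 / (sigma_f**2 + sigma_n**2)
    return shrinkage * q

# Usage: denoise a noisy observation of a smooth signal.
rng = np.random.default_rng(0)
f_true = np.sin(np.linspace(0, 2 * np.pi, 100))
q = f_true + 0.5 * rng.standard_normal(100)
f_hat = map_estimate(q, sigma_f=f_true.std(), sigma_n=0.5)
print("MSE before:", np.mean((q - f_true) ** 2),
      "MSE after:", np.mean((f_hat - f_true) ** 2))
```

The point of the sketch is that the prior p(f) biases the estimate away from the raw observation; richer priors, such as the learned compression priors developed in this thesis, play the same role in more elaborate settings.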

We next try to elaborate on these two components of the Bayesian framework through a very generalist and non-exhaustive narrative of signal modeling literature.
