
Nonlinear transform learning: model, applications and algorithms


Thesis

Reference

Nonlinear transform learning: model, applications and algorithms

KOSTADINOV, Dimche

Abstract

The principles of modeling nonlinearities are essential for many real-world problems. Their treatment plays a central role and influences not only the quality of the solution but also the computational complexity and the gains in the trade-offs that may be involved, all of which are in high demand across a variety of applications, such as active content fingerprinting, image restoration, supervised and unsupervised learning of discriminative representations for image recognition tasks, and clustering methods. In this thesis, a novel generalized nonlinear transform model is proposed and studied. Our main interest and core element is the nonlinear transform expressed by a two-step operation consisting of a linear mapping followed by an element-wise nonlinearity. To that end, depending on the application considered, probabilistic interpretations are developed, and generalizations and special cases are proposed and considered.

An [...]

KOSTADINOV, Dimche. Nonlinear transform learning: model, applications and algorithms. Thèse de doctorat : Univ. Genève, 2018, no. Sc. 5335

URN : urn:nbn:ch:unige-1185338

DOI : 10.13097/archive-ouverte/unige:118533

Available at:

http://archive-ouverte.unige.ch/unige:118533

Disclaimer: layout of this document may differ from the published version.


UNIVERSITÉ DE GENÈVE FACULTÉ DES SCIENCES

Department of Computer Science, Professor S. Voloshynovskiy

Nonlinear Transform Learning:

Model, Applications and Algorithms

THESIS

presented to the Faculté des sciences of the Université de Genève to obtain the degree of Doctor of Science, specialization in computer science

by

Dimche Kostadinov

from

Strumica (Macedonia)

Thesis no. 5335

GENÈVE

Repro-Mail - Université de Genève

2018


NONLINEAR TRANSFORM LEARNING:

MODEL, APPLICATIONS AND ALGORITHMS

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE OF UNIVERSITY OF GENEVA

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Dimche Kostadinov

May 2019


© Copyright by Dimche Kostadinov 2019. All Rights Reserved.


In memory of my father

To all that I care about, with love and eternal appreciation


Acknowledgements

I would like to thank my supervisor, Prof. Sviatoslav Voloshynovskiy, for giving me the opportunity to work on this PhD Thesis, and for all his encouragement, consideration and involvement at all times, the discussions, the insights and all the rest of his support. I would like to thank my jury members Prof. Karen Egiazarian, Prof. Teddy Furon, Prof. Sylvain Sardy and Prof. Stéphane Marchand-Maillet for their careful reading, valuable suggestions and comments.

I would like to thank Taras Holotyak for the time he spent in discussions and in reading my draft concepts, as well as for providing valuable comments. I would like to thank Sohrab Ferdowsi for taking the time to participate in discussions regarding my presentations, elaborations and outlines of many of my ideas on the board in our office. I'm thankful to Maurits Diephuis for being involved in providing comments on the English writing of many papers. In addition, I would like to thank Behrooz Razeghi for his interest and enthusiasm in some of the concepts over several works and for providing comments on certain details and clarifications. I want to mention all the rest of my colleagues from our Stochastic Information Processing Group, who have directly or indirectly provided support in terms of discussions, comments and suggestions related to the work in this Thesis.

I would like to thank the head of our Computer Vision and Multimedia Laboratory, Prof. Thierry Pun, for his supportive attitude and for enabling a great working environment, as well as Prof. Stéphane Marchand-Maillet and Prof. Alexandros Kalousis for providing me with insights and suggestions related to some machine learning aspects.

I would like to thank Fokko Beekhof and Farzad Farhadzadeh for their initial help during my relocation to Geneva. I would like to thank Boris Petrov Lambrev for his help with the translation of the abstract. I would also like to thank the rest of the colleagues of the Computer Vision and Multimedia Laboratory, with whom I have spent a wonderful time and made professional as well as personal bonds. I'm thankful to the many colleagues with whom I had coffee breaks with interesting and re-energizing conversations. I would like to thank Edgar Francisco Roman, Sohrab Ferdowsi, Ke Sun, Majid Yazdani and Michal Muszynski for the delightful company and the great hangouts.

Finally, I would like to thank my mother, my two sisters and their families for their never-ending support, and I would like to thank my wife.


Abstract

Modeling of nonlinearities is essential for many real-world problems, where its treatment plays a central role and impacts not only the quality of the solution but also the computational complexity. It arises in a variety of applications, including active content fingerprinting, image restoration, supervised and unsupervised discriminative representation learning for image recognition tasks and clustering.

In this thesis, we introduce and study a novel generalized nonlinear transform model. In particular, our main focus and core element is the nonlinear transform that is expressible by a two-step operation consisting of a linear mapping followed by an element-wise nonlinearity. To that end, depending on the considered application, we unfold probabilistic interpretations, propose generalizations and extensions, and take into account special cases.

An approximation to the empirical likelihood of our nonlinear transform model provides a learning objective, where we not only identify and analyze the corresponding trade-offs, but also give information-theoretic as well as empirical risk connections considering the addressed objectives in the respective problem formulations. We introduce a generalization that extends an integrated maximum marginal principle over the approximation to the empirical likelihood, which allows us to address optimal parameter estimation. In this scope, depending on the modeled assumptions w.r.t. an application objective, the implementation of the maximum marginal principle enables us to efficiently estimate the model parameters, where we propose approximate and exact closed-form solutions and present iterative algorithms with convergence guarantees.

Numerical experiments empirically validate the nonlinear transform model, the learning principle, and the algorithms for active content fingerprinting, image denoising, estimation of robust and discriminative nonlinear transform representations for image recognition tasks, and our clustering method that is performed in the nonlinear transform domain. At the time of thesis preparation, our numerical results demonstrate advantages over the state-of-the-art methods of the corresponding category regarding the learning time, the run time and the quality of the solution.


Résumé

The principles of modeling nonlinearities are essential for many real-world problems. Their treatment plays a central role and influences not only the quality of the solution but also the computational complexity and the gains in the trade-offs that may be involved, all of which are in high demand across a variety of applications, such as active content fingerprinting, image restoration, supervised and unsupervised learning of discriminative representations for image recognition tasks, and clustering methods.

In this thesis, a novel generalized nonlinear transform model is proposed and studied. Our main interest and core element is the nonlinear transform expressed by a two-step operation consisting of a linear mapping followed by an element-wise nonlinearity. To that end, depending on the application considered, probabilistic interpretations are developed, and generalizations and special cases are proposed and considered.

An approximation to the empirical likelihood of the nonlinear transform provides the learning objective, where not only are the corresponding trade-offs identified and analyzed, but information-theoretic as well as empirical risk connections are also given, considering the objectives addressed in the respective problem formulations. The introduction of a generalization that extends an integrated maximum marginal principle over the approximation to the empirical likelihood makes it possible to address optimal parameter estimation. In this scope, depending on the modeled assumptions with respect to an application objective, the implementation of the maximum marginal principle makes it possible to efficiently estimate the model parameters, where approximate and exact closed-form solutions are proposed, as well as iterative algorithms with convergence guarantees.

Numerical experiments confirm the validity of our NT model, the learning principle, and the algorithms for active content fingerprinting, image denoising, the estimation of a robust and discriminative nonlinear transform representation for image recognition tasks, and the clustering method performed in the nonlinear transform domain. At the time of thesis preparation, our numerical results show advantages over the corresponding state-of-the-art methods regarding the learning time, the run time and the quality of the solution.

Table of contents

List of figures

List of tables

1 Introduction
1.1 Scope of the Thesis
1.2 Thesis Outline
1.3 Main Contributions

2 Modeling and Estimation of Nonlinear Transform
2.1 Sparse Synthesis Model vs Nonlinear Transform Model
2.1.1 Sparse Synthesis Model
2.1.2 Nonlinear Transform Model
2.1.3 Application Perspectives of the NT Model

3 Estimation and Learning of NT Based Modulation for ACFP
3.1 Passive Content Fingerprinting
3.2 Active Content Fingerprint
3.3 ACFP with Predefined Linear Feature Map
3.3.1 Contributions
3.3.2 Reduction to Constrained Projection and Closed Form Solution
3.3.3 Giving Up Distortion or Exact Feature Descriptor Properties
3.4 Joint ACFP and Linear Feature Map Learning
3.4.1 Contributions
3.4.2 Problem Formulation
3.4.3 Learning Algorithm (AFIL)
3.4.4 Local Convergence Analysis
3.5 ACFP Using Latent Data Representation, Extractor and Reconstructor
3.5.1 Contributions
3.5.2 ACFP-LR vs ACFP
3.5.3 PCFP, ACFP and ACFP-LR: Feature Generation
3.5.4 ACFP-LR: Reconstruction from Latent Variables
3.5.5 Problem Formulation
3.5.6 Reduced Problem Under Linear Feature Extraction and Reconstruction
3.6 Computer Simulations
3.6.1 Numerical Experiments Setup
3.6.2 Measures
3.6.3 ACFP
3.6.4 AFIL
3.6.5 ACFP-LR
3.7 Summary

4 Learning NT for Image Denoising
4.1 Contributions
4.2 Related Work
4.2.1 Sparse Models for Image Denoising
4.3 Learning an Overcomplete and Sparsifying Transform with Exact and Approximate Closed Form Solutions
4.3.1 Problem Formulation
4.3.2 Two Step Iteratively Alternating Algorithm
4.3.3 Local Convergence
4.3.4 Image Denoising With εCAT
4.4 Numerical Evaluation
4.5 Summary

5 Learning Robust and Discriminative NT Representation for Image Recognition
5.1 Decision Aggregation Over Local NT Representations for Robust Face Recognition
5.1.1 Contributions
5.1.2 Related Work
5.1.3 Decisions Aggregation Over Local NT Representation
5.1.4 Asymptotic Analysis of Computational Cost and Memory Usage
5.1.5 Numerical Evaluation
5.2 Supervised and Unsupervised Learning of Sparse and Discriminative NT for Image Recognition
5.2.1 Contributions
5.2.2 Related Work
5.2.3 Nonlinear Transform Model
5.2.4 Discriminative Prior
5.2.5 Learning Nonlinear Transform with Priors
5.2.6 A Solution by Iterative Alternating Algorithm
5.2.7 Evaluation of Algorithm Properties, Discrimination Quality and Recognition Accuracy
5.3 Unsupervised Learning of Discrimination Specific, Self-Collaborative and NT Model for Image Recognition
5.3.1 Motivations, NT Model and Learning Strategy Outline
5.3.2 Contributions
5.3.3 Related Work
5.3.4 Target Specific Self-Collaboration Model
5.3.5 Joint Learning of NTs with Discrimination Specific Self-Collaboration
5.3.6 The Learning Algorithm
5.3.7 Evaluation of the Proposed Approach
5.4 Summary

6 Clustering with NT and Discriminative Min-Max Assignment
6.1 Approach Outline and Contributions
6.1.1 Joint Modeling of NTs with Priors
6.1.2 Simultaneous Cluster and NT Representation Assignment
6.1.3 Contributions
6.2 Related Work
6.3 Joint Modeling of Nonlinear Transforms With Priors
6.3.1 Nonlinear Transforms Modeling
6.3.2 Priors Modeling
6.4 Problem Formulation and Learning Algorithm
6.4.1 Problem Formulation
6.4.2 The Learning Algorithm
6.5 Evaluation of the Proposed Approach
6.5.1 Data Sets, Algorithm Setup and Performance Measures
6.5.2 Numerical Experiments
6.6 Summary

7 Conclusions
7.1 NT Model and IMM Principle
7.2 NT for ACFP
7.3 NT for Image Denoising
7.4 NT for Image Recognition
7.5 Clustering with NT

Appendix A
A.1 Proof of Theorem 1
A.2 Proof of Theorem A.1

Appendix B
B.1 Proof for the Global Optimal Solution
B.2 Proof for the Approximate ε-Close Closed Form Solution

Appendix C
C.1 Proof of Theorem 3
C.2 Proof of Theorem 4
C.3 Proposition of Likelihood w.r.t. Similarity and Dissimilarity
C.4 Proof that Constrained Likelihood Estimation is MAP
C.5 Proof Regarding the Discrimination Density Interpretation
C.6 Proof of the Implicit Form in Supervised Case and the Closed Form Solution
C.7 Proof of the Implicit Form in Unsupervised Case and the Closed Form Solution

Appendix D
D.1 Proof for the Closed Form Solution w.r.t. NT Representation
D.2 Proof for the Closed Form Solution w.r.t. Similarity Related Parameter
D.3 Proof for the Solution w.r.t. Dissimilarity Related Parameter

References

List of figures

1.1 The problems addressed in this thesis are based on the unified concept of nonlinear transform modeling and learning.

3.1 Local ACFP framework.

3.2 Local ACFP framework with approximation to the linear map.

3.3 A general scheme for joint ACFP modulation and linear feature map learning.

3.4 A general scheme for ACFP-LR using a latent representation, extractor and reconstructor functions.

4.1 The evolution of a) the transform error $\|AX - Y\|_F^2$, b) the $-\mathrm{Tr}\{AXY^T\}$ and its lower bound approximation $-\mathrm{Tr}\{AG\}$, c) the conditioning number and d) the expected mutual coherence $\mu(A)$ while learning the transform matrix $A$ on overlapping 8×8 noisy image blocks (equivalently $N = 64$) from the Cameraman image, where $M$ was set to 80 and the sparsity level was set to 36.

4.2 The evolution of the normalized transform error $\frac{\|AX - Y\|_F^2}{L}$, where $L$ is the total number of samples $x_l$, $l \in \{1, \dots, L\}$, under a) sparsity levels $s \in \{4, 10, 16, 22, 28, 24, 40, 46, 52, 58, 64, 70\}$ and b) amounts of data expressed in percentage from the total amount of data while learning the transform matrix $A$ on overlapping 8×8 noisy image blocks (equivalently $N = 64$) from the Cameraman image, where $M$ was set to 80 and the sparsity level was set to 36.

5.1 The illustration of the clustering over local blocks.

5.2 An illustration of the code $s_{j,k}$ construction for subject $k$ at image patch location $j$.

5.3 An illustration of the recognition based on aggregation over local bag-of-word decisions that use the local NT representations.

5.4 Recognition results under basic fusion with $\ell_1/\ell_2$ constrained projection, soft thresholding and hard thresholding and weighted fusion with $\ell_1/\ell_2$ constrained projection, soft thresholding and hard thresholding.

5.5 Recognition results under varying number of training samples and varying number of codebook codes.

5.6 Comparative recognition results using Extended Yale B and AR.

5.7 Comparative recognition results using PUT and FARET data sets.

5.8 Comparative recognition results under random corruption and continuous occlusion.

5.9 An illustration of the idea about our NT transform, where we used different colors to denote the spaces of the data samples from different classes in the original and transform domain. The goal of our NT is to achieve discrimination by taking into account a minimum information loss on the linear map and discrimination prior with a discrimination measure defined on the support intersection for the NT representations.

5.10 The evolution of the approximations $C_1 = R^{P}_{1}(X)$ and $C_2 = D^{P}_{1}(X)$, their ratio $C_1/C_2$ and the discrimination power $\log(C_1/C_2) = I_t$ during the learning of the nonlinear transform with transform dimension $M = 9100$.

5.11 The conditioning number $\kappa_n(A) = C_n(A)$ and the expected mutual coherence $\mu(A)$ for the learned linear map $A$ in the NT at different transform dimensionality $M \in Q$.

5.12 The approximation $C_2 = R^{P}_{1}(X)$ and the discrimination power $I_t$ on a subset of the transform data using learned NT at different transform dimensionality $M \in Q$.

5.13 The recognition results and the discrimination power on the Extended Yale B and MNIST databases, respectively, using a NT with different dimensionality $M$ and linear SVM classifier on top of the transform representation.

5.14 The expected loss $E\left[\frac{\|z_{c,k}\|_2^2}{M}\right] = E\left[\frac{\|A x_{c,k} - y_{c,k}\|_2^2}{M}\right]$ and the discrimination power on the Extended Yale B and MNIST databases, respectively. The transform representation $Y$ is obtained by using a nonlinear transform $T_P$ with different dimensionality.

5.15 The expected mutual coherence $\mu(A)$ and the conditioning number $\kappa_n(A) = \frac{\lambda_{\max}}{\lambda_{\min}}$ for the learned transform matrix $A$ at dimensionality $M \in Q_U$.

5.16 The evolution of the discrimination power $I_t$ for 100 algorithm iterations on a subset of the transform data using UNT at transform dimension $M = 5884$.

5.17 An illustration of the idea about our NT transform with self-collaboration relations that takes discrimination specific objective into account.

6.1 An illustration of the cluster assignment based on a similarity measure $d(\cdot, \cdot)$ between $x_i$ and the clusters $d_j$, $j = \{1, \dots, 8\}$, i.e., $\hat{j} = 7 = \arg\min_j m(x_i, d_j)$.

6.2 An illustration of the proposed simultaneous cluster and NT representation assignment. $q_i = A x_i$ is the linear transform representation, $y|\{c_1, c_2\}$ is the NT representation, and $\{\boldsymbol{\tau}_{c_1}, \boldsymbol{\nu}_{c_2}\}$ are element-wise nonlinearity parameters with discrimination role. There are in total 4 NT representations, determined by all pairs $\{c_1, c_2\} \in \{1, 2\} \times \{1, 2\}$. Simultaneously, the data point $x_i$ is assigned to cluster index $c = 2(c_1 - 1) + c_2 = (2 - 1)2 + 2 = 4$ and the NT representation is estimated as $y_i = y|\{2, 2\}$ based on the discriminating min-max similarity/dissimilarity score.

6.3 The evolution of a) the objective related to the problem of simultaneous cluster and NT representation assignment, b) the expected NT error and c) the expected discrimination min-max functional score per iteration for the proposed algorithm on the ORL [132], COIL [105], E-YALE-B [47] and AR [96] database.

List of tables

2.1 The nonlinear transform model and the applications considered in this Thesis.

3.1 The $p_e$ using PCFP under varying AWGN noise, varying JPEG compression levels and projective transformation with QF level of 5.

3.2 DWR and $p_e$ using varying ACFP modulation under varying AWGN noise, varying JPEG compression levels and affine transformation with QF level of 5.

3.3 DWR and $p_e$ under PCFP using varying AWGN noise, JPEG quality factor and affine transformation with QF=5 for the feature maps $F$, $F$, $F_r$ and $F_I$.

3.4 DWR and $p_e$ using varying ACFP modulation under varying AWGN noise, JPEG quality factor and projective transformation with QF=5 for the feature maps $F$ and $F_I$.

3.5 DWR and $p_e$ using varying ACFP modulation under varying AWGN noise, JPEG quality factor and affine transformation with QF=5 for the feature maps $F$, $F$ and $F_r$.

3.6 DWR and $p_e$ using varying ACFP modulation under varying AWGN noise, JPEG quality factor and affine transformation with QF=5.

3.7 DWR and $p_e$ under different Additive White Gaussian Noise (AWGN) levels.

3.8 DWR and $p_e$ under different JPEG Quality Factor (QF).

3.9 DWR and $p_e$ under JPEG quality factor QF=5 and affine transformation.

3.10 DWR and $p_e$ for ACFP-LR under affine transform and extremely low QF and high AWGN levels.

4.1 Denoising performance in PSNR, where $\sigma$ is the noise standard deviation.

4.2 The execution time in minutes and the percentage of the used image data.

4.3 The PSNR for the εCAT algorithm learned on percentage of the available noisy image data with noise level $\sigma = 10$.

5.1 Computational complexity in big $O(\cdot)$ notation.

5.2 Memory usage.

5.3 a) The conditioning number $\kappa_n(A)$ and the expected mutual coherence $\mu(A)$ for the learned linear map $A$ in the NT, and the execution time $t_e$ [min] in minutes of the proposed algorithm for 28 iterations using NT with dimensionality $M = 19000$. b) The discrimination power in the original domain $I_O$, after a transform with random linear map $I_{RT}$, after using learned sparsifying transform $I_{ST}$ and after using learned NT $I_{NT}$.

5.4 a) The conditioning number $\kappa_n(A)$ and the expected mutual coherence $\mu(A)$ for the learned linear map $A$ in the NT, and the execution time $t_e$ [min] in minutes of the proposed algorithm for 28 iterations using NT with dimensionality $M = 19000$. b) The discrimination power in the original domain $I_O$, after a transform with random linear map $I_{RT}$, after using learned sparsifying transform $I_{ST}$ and after using learned NT $I_{NT}$.

5.5 a) The discrimination power $I$ for the methods DLSI [118], FDDL [162], COPAR [148], LRSDL [149] and the proposed NT, and the recognition results using a nonlinear transform with different dimensionality $M$ and linear SVM classifier on top of the transform representation for the Extended Yale B and MNIST database.

5.6 The recognition results and the learning time in hours on all databases, respectively. We show the k-NN accuracy on the original data (OD) representation and the k-NN accuracy on the NT representation where the UNT is learned in the unsupervised case and has dimension $M = 5884$.

5.7 a) The discrimination power for the sparse representations of the methods DLSI [118], FDDL [162], COPAR [148] and LRSDL [149] and the proposed method UNT, b), c) The recognition results on the Extended Yale B and MNIST for the methods DLSI [118], FDDL [162], COPAR [148] and LRSDL [149] compared to the kNN results on the UNT representations learned in the unsupervised setup.

5.8 Recognition accuracy comparison between state-of-the-art methods and 1) k-Nearest Neighbor (kNN) search and 2) linear SVM [61] (l-svm) that use the Sparsifying Nonlinear Transform (sNT) representations from our model on extracted HOG [32] image features. We use our algorithm to learn the model on the HOG features. Then we get the sNT representations with dimensionality 7300 for the respective training and test sets. The training sNT representations are used to estimate the SVM parameters and the recognition is performed using the learned SVM on the test sNT representations.

5.9 The cumulative expected mutual coherence $\frac{1}{L}\sum_l \mu(A_l)$ and the cumulative conditioning number $\frac{1}{L}\sum_l \kappa_n(A_l)$ for the linear maps $A_l$, $l \in \{1, \dots, 6\}$, with dimensions $6570 \times N$, where $N$ is the dimensionality of the input data.

5.10 The learning time in hours on the databases AR, YALE B, COIL20 and NORB using our model with dimension $LM = 6570$, number of self-collaboration components $L = 9$, and dimension per self-collaboration component $M = 730$.

5.11 The discrimination power in the original domain, after random transform, after learned sparsifying transform and after learned self-collaborating target specific nonlinear transform with dimension $M = 6570$.

5.12 The recognition results on the databases AR, YALE B, COIL20 and NORB, using k-NN on the raw image data (raw) and the sparse representations from our model (p) with dimension $M = 6570$.

5.13 The discrimination power and the recognition results on the Extended Yale B and MNIST databases for the methods DLSI [118], FDDL [162], COPAR [148], LRSDL [149], the proposed model on raw image data p and the proposed model on extracted HOG [32] image features HOG-p.

5.14 Recognition accuracy comparison between state-of-the-art methods and 1) k-Nearest Neighbor (k-nn) search and 2) linear SVM [61] (l-svm) that use the Sparsifying NT (sNT) representations from our model on extracted HOG [32] image features. We use our algorithm to learn the model on the HOG features. Then we get the sNT representations with dimensionality 9800 for the respective training and test sets. Considering the obtained result for database SVHN, we note that the unlabeled training data from the respective database was not used during the learning of the corresponding model.

6.1 The computational efficiency per iteration $t$ [sec] for the proposed algorithm, the conditioning number $\kappa_n(A)$ and the expected mutual coherence $\mu$ for the linear map $A$.

6.2 The clustering performance over the databases COIL, ORL, E-YALE-B and AR evaluated using the Cluster Accuracy (CA) and the Normalized Mutual Information (NMI) metrics.

6.3 Comparative results between state-of-the-art [89], [168], [164], [54] and [73], and the proposed method (∗).

6.4 The k-NN accuracy results using assigned NT representations and original data (OD) representation.

Chapter 1 Introduction

"A perspective from a transform through projection, one may see it as a formal notion, others, as the essence of life..."

D. Kostadinov

Nowadays, in many areas, including signal processing, machine learning, artificial intelligence and computer vision, a commonly encountered issue is the presence of data uncertainty in the form of noise or data variability, caused by the inevitable imperfections in the data acquisition process. It is therefore essential to have convenient data representations, mappings and transforms that make it possible to process, analyze, modulate, recognize, classify and cluster the data both accurately and efficiently.

In general, we can distinguish two main types of mappings and transforms. The first type characterizes transforms that, when applied to the data, introduce small changes with respect to defined constraints. This approach covers a range of applications in which content modulation prior to content distribution/reproduction is appropriate, such as content authentication, identification and recognition. The second type describes transforms that do not introduce changes to the original data, but when applied to the data result in transform representations that satisfy task specific objectives. The latter type of transform is widely used in applications like image denoising, recognition, classification and clustering.

Nonetheless, in both cases, one aims to efficiently express the original data representation with another convenient data representation, which can result from:

- A carefully and appropriately chosen analytic and predefined transform or
- A data-adaptive learned transform.

The advantage of the latter over the former is its ability to adapt to the given data. To use the former, stricter statistical data properties have to be known in advance, which might be restrictive for practical usability.

Nonetheless, in order to model and efficiently estimate a task-relevant, useful and information-preserving transform representation that satisfies certain properties, prior knowledge in the form of a more "loose" assumption usually has to be taken into account. One of the fundamental concepts that was widely exploited in the past decade, addressing data-adaptive processing and data analysis, is the sparse data representation.

That is, given a data sample $x \in \Re^N$ and a set of vectors $D = [d_1, \dots, d_M] \in \Re^{N \times M}$ (formally known as a frame¹), a sparse representation $y \in \Re^M$ for $x$ over $D$ is one that uses a sparse (small) number of vectors $d_i \in \Re^N$ from $D$ to represent $x$.
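The definition above can be illustrated with a small numerical sketch. It is not taken from the thesis: the sizes `N`, `M`, the sparsity level `s` and the random frame `D` are hypothetical values chosen only to show what "a small number of vectors from $D$" means in practice.

```python
import numpy as np

# Hypothetical sizes, chosen for illustration only.
N, M, s = 8, 16, 3            # data dimension, number of frame vectors, sparsity level
rng = np.random.default_rng(0)

# A random overcomplete frame D = [d_1, ..., d_M] with unit-norm columns.
D = rng.standard_normal((N, M))
D /= np.linalg.norm(D, axis=0)

# A sparse representation y: only s of its M entries are nonzero.
y = np.zeros(M)
support = rng.choice(M, size=s, replace=False)
y[support] = rng.standard_normal(s)

# The data sample x is represented by the s selected frame vectors only.
x = D @ y
print("nonzeros in y:", np.count_nonzero(y), "out of", M)
```

Here `x` lies in the span of only `s` columns of `D`, even though `D` contains `M > N` vectors in total.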

Although sparsity is crucial for the modeling and solving of many inverse problems that are encountered across different signal processing, machine learning and artificial intelligence tasks, the sparsity assumption alone is not enough to encompass the full extent of requirements in applications like active content fingerprinting (ACFP) and recognition.

On the other hand, even if we rely only on the most widely used synthesis [2] sparse model, a disadvantage might be the computational complexity, since the synthesis sparse model can have high computational complexity when the input data dimension or the sparse representation dimension is high.
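To make the complexity remark concrete, the sketch below estimates a synthesis sparse representation with ISTA (iterative soft-thresholding), a standard solver for the $\ell_1$-regularized least-squares problem $\min_y \frac{1}{2}\|x - Dy\|_2^2 + \lambda\|y\|_1$. ISTA is used here only as a familiar example; it is an assumption of this sketch and not necessarily the solver considered in the thesis. The sizes, `lam` and `n_iter` are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Element-wise soft thresholding: shrink magnitudes by tau, zero out the rest."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(D, x, y, lam):
    """Synthesis objective 0.5 * ||x - D y||_2^2 + lam * ||y||_1."""
    return 0.5 * np.sum((x - D @ y) ** 2) + lam * np.sum(np.abs(y))

def ista(D, x, lam, n_iter=200):
    """Illustrative ISTA solver for the synthesis sparse model."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / (largest singular value of D)^2
    y = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ y - x)             # two matrix-vector products per iteration
        y = soft_threshold(y - step * grad, step * lam)
    return y

# Hypothetical problem instance.
rng = np.random.default_rng(0)
N, M = 16, 32
D = rng.standard_normal((N, M))
D /= np.linalg.norm(D, axis=0)
x = rng.standard_normal(N)
lam = 0.1

y_hat = ista(D, x, lam)
print("objective at zero:", objective(D, x, np.zeros(M), lam))
print("objective at y_hat:", objective(D, x, y_hat, lam))
```

Each iteration costs two matrix-vector products with $D$, so the total cost grows with both the dimensions $N$, $M$ and the number of iterations, which is the kind of overhead the paragraph above refers to.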

To address the aforementioned challenges, extensions and alternative models have to be taken into account, and additional priors and assumptions on the representation properties have to be considered, modeled and explored in order to fulfill task specific demands like:

- Low complexity estimate
- Optimal trade-offs
- Robustness
- Low estimate variation w.r.t. a task specific objective
- Discrimination.

All of these are very important for active content fingerprinting (ACFP), image denoising, the estimation of discriminative representations in image recognition/classification tasks, and clustering methods. In this thesis, to address the above open issues:

- We introduce a novel generalized nonlinear transform (NT) model

- We demonstrate the usefulness of the NT model across several applications.

¹ A set of $M$ orthonormal vectors with vector dimensionality $N$ equal to $M$ is said to form a basis set for that vector space. A frame of an inner product space is a generalization of a basis of a vector space to sets that may be linearly dependent.

Fig. 1.1 The problems addressed in this thesis are based on the unified concept of nonlinear transform modeling and learning.

Our model allows not only to address different task specific constraints on the transform rep- resentations, but also offers a probabilistic interpretation and provides information-theoretic connections, as well as enables the considered approximation of the model log likelihood to be related to the empirical risk in the corresponding learning objective.

Our parametric model will vary depending on the different assumptions and prior constraints, which are application driven. At the basic level, a common component is the low complexity nonlinear transform, which is expressible by a linear mapping followed by an element-wise nonlinearity. Regarding the estimation of the NT representation, the key difference between our NT model and the commonly used synthesis model with constraints is that we do not explicitly address the reconstruction of the data by a sparse linear combination. Rather, we address a constrained projection problem and estimate the NT representation as its solution. Our approach has a number of advantages that will be presented and explained in this Thesis.

1.1 Scope of the Thesis

In the scope of this Thesis, using special cases and extensions of our nonlinear transform model, we address:

- The active content fingerprinting (ACFP) problem

- The image denoising, as one particular representative of the restoration problems

- The estimation of sparse and discriminative NT representations useful for image recognition/classification tasks

- Nonlinear transform domain clustering.

The addressed problems in this Thesis are summarized in Figure 1.1.

In the ACFP, we use a model that represents a special case of our generalized nonlinear transform model. Since the focus is on a special type of ACFP modulation, the NT representation appears in the linear system, which has to be solved in order to estimate the optimal distortion component that has to be added to the original data.

In image denoising, the sparsifying transform model is another special case of our generalized nonlinear transform model. To study the optimal solution for the sparsifying transform model with a non-structured overcomplete transform matrix, we focus on a problem formulation that addresses a trade-off between (a) the alignment of the gradients of the approximate objective and the original objective and (b) the tightness of the lower bound to the original objective. Using this trade-off accelerates the local convergence of the solution and leads to a satisfactory solution under a small amount of training data.

Sparsity alone does not guarantee that the resulting representation will be discriminative.

To the best of our knowledge, we provide the first work that extends the sparsifying model for learning sparse and discriminative representations while offering a high degree of freedom in modeling² and imposing constraints other than the sparsity constraint on the representation.

Considering an estimate with low variability w.r.t. a discrimination specific objective that is found through the use of self-collaboration, we extend our generalized nonlinear transform model and explore a discrimination centered, collaboration structured and sparse modeling.

In the final part of this Thesis, we jointly model and learn multiple NT transforms with explicit consideration of discrimination specific parameters. We propose a novel clustering principle, where we focus on measures that reflect a notion of joint similarity and dissimilarity score between a data point and a set of data points. Finally, we develop a concept that allows unsupervised discrimination and clustering not in the original data domain, but instead in a nonlinear transform domain, where a nonlinear transform model is used.

1.2 Thesis Outline

In Chapter 2, we present the commonly used synthesis model, give its probabilistic interpretation and provide the related inverse problem for the estimation of the synthesis representation.

² Many nonlinearities, e.g., ReLU, $\ell_p$-norms, elastic net-like, $\ell_1/\ell_2$-norm ratio, binary encoding, ternary encoding, etc., can be modeled as a generalized nonlinear transform representation.

Afterwards, we introduce our generalized nonlinear transform model, which is a unified base for all our NT modeling across the considered applications in this thesis. In addition, we introduce the related direct problem, i.e., the constrained projection problem, which has a central role in the estimation of the NT representation.

In Chapter 3, we introduce and describe the active content fingerprinting concept and explain the differences compared to passive content fingerprinting (PCFP). Then, by taking an approximation of the negative logarithm of a special case of our NT model, we introduce the generalized problem formulation as a form of min-max problem. Under a linear modulation and predefined linear feature map, we show a reduction to a constrained projection problem and provide the optimal solution. We also address an approximation of the predefined linear feature map in order to find appropriate trade-offs between the modulation distortion and feature robustness. Afterwards, we address a problem formulation, where we jointly learn the linear map and estimate the modulation distortion in order to attain a low modulation distortion and high feature robustness. In our numerical evaluation, our efficient solution demonstrates significant improvements compared to PCFP. Finally, we extend the basic concept of ACFP by focusing on a redundant content representation and include extractor and reconstructor functions, which we name as ACFP-LR. In the latter, we present the problem formulation and show a reduction to a constrained projection problem, which has an efficient solution. Our numerical evaluation shows that ACFP-LR has superior performance compared to the rest of the analyzed schemes.

In Chapter 4, we address the learning problem for the data adaptive transform that provides a sparse representation in a space with dimensions larger than or equal to the dimensions of the original space. We show that the sparsifying transform model represents one reduced form of the generalized NT model. We present an iterative, alternating algorithm that has two steps: (i) transform update and (ii) sparse coding. In the transform update step, we focus on a novel problem formulation based on a lower bound of the objective function that addresses a trade-off between (a) how well the gradients of the approximate objective and the original objective are aligned and (b) how close the lower bound is to the original objective. This not only allows us to propose an approximate closed form solution, but also makes it possible to find an update that can lead to accelerated local convergence and that achieves a satisfactory solution under a small amount of data. Since the approximate closed form solution in the transform update preserves the gradient and the sparse coding step uses the exact closed form solution, we show that the resulting algorithm is convergent. On the practical side, we evaluate our algorithm in an image denoising application. We demonstrate promising performance together with advantages in training data requirements, accelerated local convergence and computational complexity.

Chapter 5 consists of three major sections. In the first section of Chapter 5, we consider the face recognition problem from both machine learning and information coding perspectives, adopting an alternative way of visual information encoding and decoding through estimation of a robust NT representation. Our model for recognition is based on multilevel vector quantization (MVQ), which is conceptually equivalent to a bag-of-words method and bears similarity to a convolutional neural network (CNN). We introduce an alternative aggregation method over local bag-of-words decisions from locally estimated robust NT representations w.r.t. learned centroids over local blocks. Moreover, we relate the local NT representation to a corresponding likelihood vector. We present a generalization of a sparse likelihood approximation, give connections to the maximum a posteriori (MAP) estimate, and show connections to common techniques such as hard and soft thresholding. We evaluate our approach by extensive numerical simulation on face image databases, where we show improvements and competitive results w.r.t. the state-of-the-art methods, while having low computational complexity.

In the second section of Chapter 5, we describe and explain our NT model, where we take into consideration the modeling and learning of a nonlinear transform that is parameterized by a linear map and generalized element-wise nonlinearity. In our modeling of the NT, we introduce the minimum information loss and discriminative priors for the respective linear map and sparse representations. During training, we estimate the model parameters by minimizing an approximation for the negative logarithm of the model. We propose an efficient iterative algorithm with convergence guarantee that alternates between two steps, which have approximate and exact closed form solutions. Given a test data sample, we estimate a sparse representation using the learned model parameters, which represents a solution to a low complexity constrained projection problem. The efficiency of the proposed approach, together with the potential usefulness of the NT representations is validated by numerical experiments in a supervised and unsupervised image recognition setup. The evaluation demonstrates advantages in comparison to the state-of-the-art methods of the same category, regarding learning time, the discriminative quality and the recognition accuracy.

In the third section of Chapter 5, we present an extension of our base NT model to another NT model for learning collaboration structured, discriminative and sparse representations.

The idea is to model a collaboration corrective functionality between multiple nonlinear transforms in order to reduce the uncertainty in the estimate. The focus is on the joint estimation of data-adaptive NTs that take into account a collaboration component w.r.t. a discrimination target. The joint model includes the minimum information loss, collaboration corrective and discriminative priors. The model parameters are learned by minimizing the negative logarithm of the learning model, where we propose an efficient solution by an iterative, coordinate descent algorithm. Numerical experiments validate the potential of the proposed learning principle. The preliminary results show advantages in comparison to the state-of-the-art methods.

In Chapter 6, we present a novel clustering concept based on (i) jointly learned NTs with minimum information loss and discriminative priors and (ii) min-max assignment over NT representations. In the common clustering algorithms a data point in the original data space is assigned to clusters based on the similarity correspondence. In contrast, we propose a simultaneous cluster and NT representation assignment principle based on evaluating a min-max score that approximates discriminative log likelihood in the transform domain.

Numerical experiments on an image clustering task validate the potential of the proposed approach. The evaluation shows advantages in comparison to the state-of-the-art clustering methods regarding the learning time and the used clustering performance measures.

In Chapter 7, the conclusions summarize this Thesis.

1.3 Main Contributions

The main contributions of this thesis are summarized as follows:

- We introduce a generalized nonlinear transform model that, contrary to the synthesis model, which is based on data reconstruction, relies on a constrained data projection

- We generalize the integrated maximum marginal (IMM) principle by taking into consideration the negative logarithm of the learning model for estimation of the NT model parameters, which enables efficient solutions for a number of applications

- We propose several novel active content fingerprinting (ACFP) schemes under linear modulation and linear feature maps, where we show optimal closed form solutions and efficient algorithms with convergence guarantees by utilizing a special case of our NT model

- We study the sparsifying transform model with an overcomplete transform matrix for an image denoising application. The considered model represents a reduction and a special case of our NT model. We propose not only an alternating algorithm with an approximate and exact closed form solution and convergence guarantees, but also introduce a novel problem formulation that addresses a trade-off between accelerated local convergence and a satisfactory solution under a small amount of data

- We propose novel strategies for learning discriminative and robust NT representations that are useful for image recognition tasks in supervised and unsupervised setups. In addition, we consider task-centric self-collaboration. The estimation of the NT model parameters is based on our generalized IMM principle, which allows an efficient solution with convergence guarantee to be implemented by an iterative alternating algorithm

- We present a novel clustering concept based on (i) jointly learned NTs with minimum information loss and discriminative priors and (ii) min-max assignment over NT representations, where we introduce the simultaneous cluster and NT representation assignment principle, which is based on evaluating a score that approximates discriminative log likelihood in the transform domain.


Chapter 2

Modeling and Estimation of Nonlinear Transform

In this chapter, we outline the well known synthesis model and its corresponding inverse problem. Then, we introduce our nonlinear transform model and its corresponding direct problem. Along this way, we also highlight the differences in the modeling approaches and the corresponding problems.

2.1 Sparse Synthesis Model vs Nonlinear Transform Model

2.1.1 Sparse Synthesis Model

As the name suggests, the main idea behind this model, which is used in many areas, is to synthesize a data vector from a set of defined vectors that represent some dictionary.

Deterministic Formulation. In the most general case, according to the synthesis model, a data sample $\mathbf{x}_i \in \Re^N$ of dimensionality $N$ is approximated by a linear combination $\mathbf{y}_i \in \Re^M$ (referred to as a sparse data representation) of a few words (frame vectors), $\|\mathbf{y}_i\|_0 \ll M$, from a dictionary (frame¹) $\mathbf{D} \in \Re^{N \times M}$, as:

\[ \mathbf{x}_i = \mathbf{D}\mathbf{y}_i + \mathbf{v}_i, \tag{2.1} \]

where $\mathbf{v}_i \in \Re^N$ denotes the approximation error, which is usually assumed to be Gaussian.

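¹ A matrix $\mathbf{D} \in \Re^{N \times M}$ is said to be overcomplete if $M > N$. Equivalently, if the number $M$ of columns $\mathbf{d}_m \in \Re^N$ in $\mathbf{D}$ is bigger than the dimensionality $N$ of $\mathbf{d}_m$, i.e., $M > N$, one might also say that the set of vectors $\{\mathbf{d}_1, \mathbf{d}_2, ..., \mathbf{d}_M\}$ is linearly dependent and that this set forms a frame.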

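Probabilistic Formulation. In a probabilistic sense, we consider that $\mathbf{x}_i$ and $\mathbf{y}_i$ are random vectors and that $\mathbf{D}$ is a random matrix. A conditional probability distribution of $\mathbf{x}_i$ given the dictionary $\mathbf{D}$ can be expressed as:

\[ p(\mathbf{x}_i|\mathbf{D}) = \int_{\mathbf{y}_i \in \Re^M} p(\mathbf{x}_i, \mathbf{y}_i|\mathbf{D})\,d\mathbf{y}_i = \int_{\mathbf{y}_i \in \Re^M} p(\mathbf{x}_i|\mathbf{y}_i, \mathbf{D})\,p(\mathbf{y}_i|\mathbf{D})\,d\mathbf{y}_i, \tag{2.2} \]

where $p(\mathbf{x}_i|\mathbf{y}_i, \mathbf{D})$ models the relation (2.1), i.e.:

\[ p(\mathbf{x}_i|\mathbf{y}_i, \mathbf{D}) \propto \exp\left( -\frac{1}{\beta_0} \|\mathbf{x}_i - \mathbf{D}\mathbf{y}_i\|_2^2 \right), \tag{2.3} \]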

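where $\beta_0$ is a scaling parameter. In the prior term $p(\mathbf{y}_i|\mathbf{D})$, it is usually assumed that $\mathbf{y}_i$ is independent of $\mathbf{D}$, i.e., $p(\mathbf{y}_i) = p(\mathbf{y}_i|\mathbf{D})$. Moreover, assuming that the entries in the representation $\mathbf{y}_i$ are i.i.d. and follow a Laplace distribution, we have that:

\[ p(\mathbf{y}_i) \propto \exp\left( -\frac{1}{\beta_1} \|\mathbf{y}_i\|_1 \right), \tag{2.4} \]

where $\|.\|_1$ denotes the $\ell_1$-norm and $\beta_1$ is a scaling parameter.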

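Learning The Model Parameters. Given $CK$ data samples $\mathbf{X} = [\mathbf{x}_1, ..., \mathbf{x}_{CK}]$, we model the conditional probability $p(\mathbf{X}|\mathbf{D})$, which under the independence assumption between the data samples $\mathbf{x}_i$ decomposes as:

\[ p(\mathbf{X}|\mathbf{D}) = \prod_{i=1}^{CK} p(\mathbf{x}_i|\mathbf{D}). \tag{2.5} \]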

Moreover, instead of just working with p(X|D), we can use the Bayes’ rule and consider an approximative posterior:

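\[ p(\mathbf{D}|\mathbf{X}) \propto p(\mathbf{X}|\mathbf{D})\,p(\mathbf{D}), \tag{2.6} \]

where we disregard the prior $p(\mathbf{X})$, while the prior $p(\mathbf{D})$ on the dictionary $\mathbf{D}$ is defined as:

\[ p(\mathbf{D}) \propto \exp(-\Omega_S(\mathbf{D})), \tag{2.7} \]

where $\Omega_S(.)$ is the prior measure that defines the properties of the dictionary $\mathbf{D}$.

Under the above considerations, the maximum a posteriori (MAP) estimates of $\mathbf{D}$ and $\mathbf{Y} = [\mathbf{y}_1, ..., \mathbf{y}_{CK}]$ can be expressed as:

\[ \{\hat{\mathbf{Y}}, \hat{\mathbf{D}}\} = \arg\max_{\mathbf{Y},\mathbf{D}} p(\mathbf{D}|\mathbf{X}) \simeq \arg\max_{\mathbf{Y},\mathbf{D}} p(\mathbf{X}|\mathbf{D})\,p(\mathbf{D}), \tag{2.8} \]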

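or equivalently, taking the negative logarithm of $p(\mathbf{X}|\mathbf{D})\,p(\mathbf{D})$, the problem reduces to:

\[
\begin{aligned}
\{\hat{\mathbf{Y}}, \hat{\mathbf{D}}\} &\simeq \arg\min_{\mathbf{Y},\mathbf{D}} \left[ -\log p(\mathbf{X}|\mathbf{D}) - \log p(\mathbf{D}) \right] = \arg\min_{\mathbf{Y},\mathbf{D}} \left[ -\sum_{i=1}^{CK} \log p(\mathbf{x}_i|\mathbf{D}) - \log p(\mathbf{D}) \right] \\
&= \arg\min_{\mathbf{Y},\mathbf{D}} \left[ -\sum_{i=1}^{CK} \log \int_{\mathbf{y}_i \in \Re^M} p(\mathbf{x}_i|\mathbf{y}_i, \mathbf{D})\,p(\mathbf{y}_i)\,d\mathbf{y}_i - \log p(\mathbf{D}) \right].
\end{aligned}
\tag{2.9}
\]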

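The estimation of $\hat{\mathbf{Y}}$ and $\hat{\mathbf{D}}$ is still difficult to compute due to the integration over $\mathbf{y}_i$. If we replace $p(\mathbf{y}_i, \mathbf{x}_i|\mathbf{D})$ with its extreme value, then we end up with the following problem:

\[ \{\hat{\mathbf{Y}}, \hat{\mathbf{D}}\} = \arg\min_{\mathbf{D}} \sum_{i=1}^{CK} \min_{\mathbf{y}_i} \left[ \frac{1}{2} \|\mathbf{x}_i - \mathbf{D}\mathbf{y}_i\|_2^2 + \lambda_1 \|\mathbf{y}_i\|_1 \right] + \Omega_S(\mathbf{D}), \tag{2.10} \]

where we assumed that $\{\frac{1}{\beta_0}, \frac{1}{\beta_1}\} = \{\frac{1}{2}, \lambda_1\}$.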
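The step from (2.9) to (2.10) can be written out per data sample. The following worked derivation is a sketch that substitutes the Gaussian likelihood (2.3) and the Laplace prior (2.4) into the point-wise (maximum) approximation of the integral; the constant $c$, which collects the normalization terms that do not depend on $\mathbf{y}_i$, is notation introduced here only for illustration.

```latex
\begin{align*}
-\log \int_{\mathbf{y}_i \in \Re^M} p(\mathbf{x}_i|\mathbf{y}_i,\mathbf{D})\,p(\mathbf{y}_i)\,d\mathbf{y}_i
  &\approx -\log \max_{\mathbf{y}_i} \, p(\mathbf{x}_i|\mathbf{y}_i,\mathbf{D})\,p(\mathbf{y}_i) \\
  &= \min_{\mathbf{y}_i} \left[ \frac{1}{\beta_0}\|\mathbf{x}_i-\mathbf{D}\mathbf{y}_i\|_2^2
     + \frac{1}{\beta_1}\|\mathbf{y}_i\|_1 \right] + c \\
  &= \min_{\mathbf{y}_i} \left[ \frac{1}{2}\|\mathbf{x}_i-\mathbf{D}\mathbf{y}_i\|_2^2
     + \lambda_1\|\mathbf{y}_i\|_1 \right] + c .
\end{align*}
```

Summing these terms over $i = 1, ..., CK$ and adding the negative logarithm of the dictionary prior (2.7) gives, up to additive constants, the objective in (2.10).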

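- Sparse Representation Estimation. Assuming that the dictionary $\mathbf{D}$ is given, then (2.10), per individual sparse representation $\mathbf{y}_i$, reduces to:

\[ \hat{\mathbf{y}}_i = \arg\min_{\mathbf{y}_i} \frac{1}{2} \|\mathbf{x}_i - \mathbf{D}\mathbf{y}_i\|_2^2 + \lambda_1 \|\mathbf{y}_i\|_1, \tag{2.11} \]

which represents an inverse problem w.r.t. $\mathbf{y}_i$ that is also known as a constrained regression problem.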
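For a general dictionary, the inverse problem (2.11) is typically solved iteratively. The sketch below uses the iterative soft thresholding algorithm (ISTA), a standard solver chosen here purely for illustration and not a method attributed to this thesis. It assumes plain Python lists instead of a numerical library; the helper names, the toy dictionary `D`, the sample `x`, the weight `lam` and the step size are all hypothetical.

```python
def matvec(D, y):
    """Return D @ y for a matrix D stored as a list of rows."""
    return [sum(d * v for d, v in zip(row, y)) for row in D]


def rmatvec(D, r):
    """Return D^T @ r for a matrix D stored as a list of rows."""
    return [sum(D[n][m] * r[n] for n in range(len(D))) for m in range(len(D[0]))]


def soft_threshold(v, tau):
    """Element-wise soft thresholding, the proximal map of tau * ||.||_1."""
    out = []
    for a in v:
        if a > tau:
            out.append(a - tau)
        elif a < -tau:
            out.append(a + tau)
        else:
            out.append(0.0)
    return out


def objective(D, x, y, lam):
    """Value of 0.5 * ||x - D y||_2^2 + lam * ||y||_1, the cost in (2.11)."""
    r = [a - b for a, b in zip(x, matvec(D, y))]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in y)


def ista(D, x, lam, step, n_iter=200):
    """Approximately minimise the cost in (2.11) by iterative soft thresholding.

    The step size should not exceed 1 / ||D||_2^2, so that the cost does not increase.
    """
    y = [0.0] * len(D[0])
    for _ in range(n_iter):
        r = [a - b for a, b in zip(matvec(D, y), x)]   # residual D y - x
        g = rmatvec(D, r)                              # gradient of the quadratic term
        y = soft_threshold([yi - step * gi for yi, gi in zip(y, g)], step * lam)
    return y


# Toy overcomplete dictionary (N = 2, M = 3); the values are illustrative only.
D = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.8]]
x = [1.0, 0.1]
lam = 0.2
# ||D||_2^2 <= ||D||_F^2 = 3, so step = 1/3 is a safe choice for this D.
y_hat = ista(D, x, lam, step=1.0 / 3.0)
```

Every iteration costs two matrix-vector products with $\mathbf{D}$, which illustrates why the synthesis model becomes expensive when $N$ or $M$ is large.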

2.1.2 Nonlinear Transform Model

In this thesis, we focus on a model that describes a generalized nonlinear transform representation $\mathbf{y}_i$ for the data sample $\mathbf{x}_i$.

Deterministic Formulation. We express our nonlinear transform model as:

\[ \mathbf{A}\mathbf{x}_i = \mathbf{y}_i + \mathbf{z}_i, \tag{2.12} \]

where $\mathbf{A} \in \Re^{M \times N}$ is the linear map of the nonlinear transform, $\mathbf{y}_i$ is the nonlinear transform representation and $\mathbf{z}_i \in \Re^M$ is the nonlinear transform error vector. In contrast to the synthesis model, in the nonlinear transform model, one assumes that the nonlinear transform representation $\mathbf{y}_i$ results from applying a generalized element-wise nonlinearity to $\mathbf{A}\mathbf{x}_i$ that is parameterized by $\boldsymbol{\theta}$, i.e.,

\[ \mathbf{y}_i = f_{\boldsymbol{\theta}}(\mathbf{A}\mathbf{x}_i), \tag{2.13} \]

where $\boldsymbol{\theta}$ are parameters, which allow us not only to consider a notion of sparsity, but also to take into account robustness or discrimination.

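- Examples. One simple example of such a transform is a sparsifying transform model [119], where the parameter $\boldsymbol{\theta} = \lambda\mathbf{1} \in \Re^M$, $\lambda \in \Re$, with a hard thresholding function $f_h$ that acts as a nonlinear transform on the $m$-th entry $(\mathbf{A}\mathbf{x}_i)(m)$ of $\mathbf{A}\mathbf{x}_i$, i.e.:

\[
f_h((\mathbf{A}\mathbf{x}_i)(m)) =
\begin{cases}
(\mathbf{A}\mathbf{x}_i)(m), & \text{if } |(\mathbf{A}\mathbf{x}_i)(m)| > \lambda, \\
0, & \text{otherwise},
\end{cases}
\tag{2.14}
\]

Another example is a soft thresholding function, i.e.:

\[
f_s((\mathbf{A}\mathbf{x}_i)(m)) =
\begin{cases}
(\mathbf{A}\mathbf{x}_i)(m) - \lambda, & \text{if } (\mathbf{A}\mathbf{x}_i)(m) > \lambda, \\
(\mathbf{A}\mathbf{x}_i)(m) + \lambda, & \text{if } -(\mathbf{A}\mathbf{x}_i)(m) > \lambda, \\
0, & \text{otherwise},
\end{cases}
\tag{2.15}
\]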

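which can be compactly expressed as:

\[ f_s(\mathbf{A}\mathbf{x}_i) = \mathrm{sign}(\mathbf{A}\mathbf{x}_i) \odot \max(|\mathbf{A}\mathbf{x}_i| - \lambda\mathbf{1}, \mathbf{0}), \tag{2.16} \]

where $\mathrm{sign}$ is a sign function and $\odot$ is the Hadamard product. The third example is the ternary encoding:

\[ f_t(\mathbf{A}\mathbf{x}_i) = \mathrm{sign}(\mathbf{A}\mathbf{x}_i) \odot \mathrm{sign}(\max(|\mathbf{A}\mathbf{x}_i| - \lambda\mathbf{1}, \mathbf{0})). \tag{2.17} \]

Another interesting example is the ReLU activation function, which is commonly used in deep neural networks, i.e.:

\[ f_{\mathrm{ReLU}}(\mathbf{A}\mathbf{x}_i) = \max(\mathbf{A}\mathbf{x}_i, \mathbf{0}). \tag{2.18} \]

Probabilistic Formulation. To introduce a probabilistic interpretation of a nonlinear transform model, we will consider the marginal probability distribution, which we express as: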

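\[ p(\mathbf{y}_i|\mathbf{x}_i, \mathbf{A}) = \int_{\boldsymbol{\theta}} p(\mathbf{y}_i, \boldsymbol{\theta}|\mathbf{x}_i, \mathbf{A})\,d\boldsymbol{\theta}. \tag{2.19} \]

Furthermore, we can use the chain rule, which leads to:

\[ p(\mathbf{x}_i, \mathbf{y}_i, \boldsymbol{\theta}|\mathbf{A}) = p(\mathbf{y}_i, \boldsymbol{\theta}|\mathbf{x}_i, \mathbf{A})\,p(\mathbf{x}_i|\mathbf{A}) = p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})\,p(\mathbf{y}_i, \boldsymbol{\theta}|\mathbf{A}), \tag{2.20} \]

and assume that $p(\mathbf{x}_i|\mathbf{A}) = p(\mathbf{x}_i)$.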
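Returning to the element-wise nonlinearities of (2.14)-(2.18), a minimal Python sketch can make their effect concrete. It assumes plain lists instead of a numerical library, applies each nonlinearity to $\mathbf{A}\mathbf{x}_i$ as in (2.13), and reads the ternary encoding as keeping the sign of the entries that survive the threshold, so that the output takes values in {-1, 0, +1}. The function names, the toy linear map `A`, the sample `x` and the threshold `lam` are hypothetical and not taken from the thesis.

```python
def linear_map(A, x):
    """Return A @ x for a matrix A stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def sign(v):
    """Scalar sign function with sign(0) = 0."""
    return (v > 0) - (v < 0)


def hard_threshold(u, lam):
    """Keep entries whose magnitude exceeds lam and zero the rest, cf. (2.14)."""
    return [v if abs(v) > lam else 0.0 for v in u]


def soft_threshold(u, lam):
    """Shrink every entry towards zero by lam, cf. (2.15) and (2.16)."""
    return [sign(v) * max(abs(v) - lam, 0.0) for v in u]


def ternary(u, lam):
    """Map entries to {-1, 0, +1}: the sign of entries whose magnitude exceeds lam."""
    return [sign(v) if abs(v) > lam else 0 for v in u]


def relu(u):
    """Zero out the negative entries, cf. (2.18)."""
    return [max(v, 0.0) for v in u]


def nonlinear_transform(A, x, nonlinearity, *params):
    """Compute y = f_theta(A x), cf. (2.13): a linear map followed by a nonlinearity."""
    return nonlinearity(linear_map(A, x), *params)


# Toy linear map with M = 3 rows and N = 2 columns; the values are illustrative only.
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, -1.0]]
x = [0.5, 2.0]      # A x = [0.5, 2.0, -1.5]
lam = 1.0

y_hard = nonlinear_transform(A, x, hard_threshold, lam)   # [0.0, 2.0, -1.5]
y_soft = nonlinear_transform(A, x, soft_threshold, lam)   # [0.0, 1.0, -0.5]
y_tern = nonlinear_transform(A, x, ternary, lam)          # [0, 1, -1]
y_relu = nonlinear_transform(A, x, relu)                  # [0.5, 2.0, 0.0]
```

In all four cases the representation is obtained with one matrix-vector product and one pass over the $M$ entries, which is the low complexity property that the model relies on.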

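We are interested in modeling $p(\boldsymbol{\theta}, \mathbf{y}_i|\mathbf{x}_i, \mathbf{A})$, where under Bayes' rule, we focus on the proportional form that we express as:

\[ p(\mathbf{y}_i, \boldsymbol{\theta}|\mathbf{x}_i, \mathbf{A}) \propto p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})\,p(\mathbf{y}_i, \boldsymbol{\theta}|\mathbf{A}). \tag{2.21} \]

In the simplest case, $p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})$ models the residual vector $\mathbf{z}_i = \mathbf{A}\mathbf{x}_i - \mathbf{y}_i$ as:

\[ p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A}) \propto \exp\left( -\frac{1}{\beta_0} \|\mathbf{A}\mathbf{x}_i - \mathbf{y}_i\|_2^2 \right), \tag{2.22} \]

where $\beta_0$ is a scaling parameter. We note that any additional knowledge about the residual vector $\mathbf{z}_i$ can be used and added in the model $p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})$.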

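In order to simplify the consideration, we neglect the dependence on $\mathbf{A}$ by assuming that:

\[ p(\mathbf{y}_i, \boldsymbol{\theta}|\mathbf{A}) = p(\mathbf{y}_i, \boldsymbol{\theta}) \propto \exp\left( -\frac{1}{\beta_1} m(\boldsymbol{\theta}, \mathbf{y}_i) \right), \tag{2.23} \]

where $m(.): \Re^M \times \Re^M \rightarrow \Re$ is a measure and $\beta_1$ is a scaling parameter. The motivation behind the use of such a parametric prior $p(\mathbf{y}_i, \boldsymbol{\theta})$ on $\mathbf{y}_i$ is to accommodate a class of assumptions related to sparsity, robustness and/or discrimination.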

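Learning The Model Parameters. Given $CK$ data samples $\mathbf{X} = [\mathbf{x}_1, ..., \mathbf{x}_{CK}]$, we consider the following learning model:

\[
\begin{aligned}
p(\mathbf{Y}, \mathbf{A}|\mathbf{X}) = p(\mathbf{Y}|\mathbf{A}, \mathbf{X})\,p(\mathbf{A}|\mathbf{X}) &= \prod_{i=1}^{CK} \left[ \int_{\boldsymbol{\theta}} p(\boldsymbol{\theta}, \mathbf{y}_i|\mathbf{x}_i, \mathbf{A})\,d\boldsymbol{\theta} \right] p(\mathbf{A}|\mathbf{x}_i) \\
&\propto \prod_{i=1}^{CK} \left[ \int_{\boldsymbol{\theta}} p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})\,p(\mathbf{y}_i, \boldsymbol{\theta})\,d\boldsymbol{\theta} \right] p(\mathbf{A}|\mathbf{x}_i),
\end{aligned}
\tag{2.24}
\]

where we use a simplification for the prior on the linear map $\mathbf{A}$, i.e., $p(\mathbf{A}|\mathbf{x}_i) = p(\mathbf{A})$, and we define it as:

\[ p(\mathbf{A}) \propto \exp(-\Omega(\mathbf{A})), \tag{2.25} \]

with $\Omega(.)$ denoting the prior measure which defines the properties of the rows of $\mathbf{A}$.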

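Minimizing the exact negative logarithm of our learning model (2.24) over $\mathbf{Y}$, $\boldsymbol{\theta}$ and $\mathbf{A}$ is difficult, since we have to integrate in order to compute the marginal and the partitioning function of the prior $p(\mathbf{y}_i, \boldsymbol{\theta})$. Instead of minimizing the exact negative logarithm of the marginal $\int_{\boldsymbol{\theta}} p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})\,p(\mathbf{y}_i, \boldsymbol{\theta})\,d\boldsymbol{\theta}$, we consider minimizing the negative logarithm of its maximum point-wise estimate, i.e.,

\[ \int_{\boldsymbol{\theta}_{est}} p(\mathbf{x}_i|\boldsymbol{\theta}_{est}, \mathbf{y}_i, \mathbf{A})\,p(\mathbf{y}_i, \boldsymbol{\theta}_{est})\,d\boldsymbol{\theta}_{est} \leq D\,p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})\,p(\mathbf{y}_i, \boldsymbol{\theta}), \tag{2.26} \]

where we assume that $\boldsymbol{\theta}$ are the parameters for which $p(\mathbf{x}_i|\boldsymbol{\theta}_{est}, \mathbf{y}_i, \mathbf{A})\,p(\mathbf{y}_i, \boldsymbol{\theta}_{est})$ has the maximum value and $D$ is a constant. Furthermore, we use the proportional relation (2.21) and, by disregarding the partitioning function related to the prior $p(\mathbf{y}_i, \boldsymbol{\theta})$, we end up with the following problem formulation: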

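\[
\{\hat{\mathbf{Y}}, \hat{\boldsymbol{\theta}}, \hat{\mathbf{A}}\} = \arg\min_{\mathbf{Y}, \boldsymbol{\theta}, \mathbf{A}} \sum_{i=1}^{CK} \Big( \overbrace{\tfrac{1}{2} \|\mathbf{A}\mathbf{x}_i - \mathbf{y}_i\|_2^2}^{-\log p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})} + \overbrace{\lambda_1 m(\boldsymbol{\theta}, \mathbf{y}_i)}^{-\log p(\mathbf{y}_i, \boldsymbol{\theta})} \Big) + \overbrace{\Omega(\mathbf{A})}^{-\log p(\mathbf{A})},
\tag{2.27}
\]

where $\{2, \frac{1}{\lambda_1}\} = \{\beta_0, \beta_1\}$.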

We note that, in general, depending on the measures used to describe $p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})$ and $p(\boldsymbol{\theta}, \mathbf{y}_i)$, even the exact minimization w.r.t. the point-wise estimate might still be difficult to compute. The reason is that, in order for $p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})$ and $p(\boldsymbol{\theta}, \mathbf{y}_i)$ to be proper probability distributions, they have to contain partitioning functions, which can be exactly evaluated only by integrating over the involved parameters $\mathbf{x}_i$, $\mathbf{y}_i$, $\boldsymbol{\theta}$ and $\mathbf{A}$.

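Alternatively, the maximization of our learning model $\prod_{i=1}^{CK} \int_{\boldsymbol{\theta}} p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})\,p(\mathbf{y}_i, \boldsymbol{\theta})\,d\boldsymbol{\theta}\,p(\mathbf{A}|\mathbf{x}_i)$ over any of the variables $\mathbf{y}_i$, $\boldsymbol{\theta}$ and $\mathbf{A}$ can be seen as an approximative form of integrated marginal maximization (IMM) [116] of this product over the respective $\mathbf{y}_i$, $\boldsymbol{\theta}$ or $\mathbf{A}$, which can be summarized by the following steps:

- Approximative maximization of $p(\mathbf{x}_i|\mathbf{y}_i, \boldsymbol{\theta}, \mathbf{A})$ with prior $p(\boldsymbol{\theta}, \mathbf{y}_i)$ over $\mathbf{y}_i$,

- Approximative maximization of $p(\mathbf{x}_i|\mathbf{y}_i, \boldsymbol{\theta}, \mathbf{A})$ with prior $p(\boldsymbol{\theta}, \mathbf{y}_i)$ over $\boldsymbol{\theta}$ and

- Approximative maximization of $\prod_{i=1}^{CK} p(\mathbf{x}_i|\mathbf{y}_i, \boldsymbol{\theta}, \mathbf{A})$ with prior $p(\mathbf{A})$ over $\mathbf{A}$.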

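- NT Representation Estimation. Assuming that the linear map $\mathbf{A}$ and the parameters $\boldsymbol{\theta}$ are given, the exact estimation of $\mathbf{y}_i$ w.r.t. our model is equivalent to computing the minimum of the negative logarithm of $p(\mathbf{x}_i|\boldsymbol{\theta}, \mathbf{y}_i, \mathbf{A})\,p(\boldsymbol{\theta}, \mathbf{y}_i)$, i.e.:

\[ \hat{\mathbf{y}}_i = \arg\min_{\mathbf{y}_i} \left[ -\log p(\mathbf{x}_i|\mathbf{y}_i, \boldsymbol{\theta}, \mathbf{A}) - \log p(\boldsymbol{\theta}, \mathbf{y}_i) \right], \tag{2.28} \]

where we again point out that it might be difficult to compute, depending on the chosen measures that describe $p(\mathbf{x}_i|\mathbf{y}_i, \boldsymbol{\theta}, \mathbf{A})$ and $p(\boldsymbol{\theta}, \mathbf{y}_i)$, which involve a possible integration in the corresponding partitioning functions for $p(\mathbf{x}_i|\mathbf{y}_i, \boldsymbol{\theta}, \mathbf{A})$ and $p(\boldsymbol{\theta}, \mathbf{y}_i)$.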
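One concrete instance of (2.28) can be sketched under two assumptions that are made here only for illustration: the likelihood is the Gaussian residual model (2.22) with the scaling of (2.27), and the measure is $m(\boldsymbol{\theta}, \mathbf{y}_i) = \|\mathbf{y}_i\|_1$. The problem then reads $\min_{\mathbf{y}_i} \frac{1}{2}\|\mathbf{A}\mathbf{x}_i - \mathbf{y}_i\|_2^2 + \lambda_1\|\mathbf{y}_i\|_1$, which separates over the entries of $\mathbf{y}_i$ and is solved exactly by soft thresholding $\mathbf{A}\mathbf{x}_i$ at level $\lambda_1$. The Python sketch below computes this closed form and compares its cost against randomly perturbed candidates; the function names, the toy `A`, `x` and `lam` are hypothetical.

```python
import random


def linear_map(A, x):
    """Return A @ x for a matrix A stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def nt_cost(A, x, y, lam):
    """Cost 0.5 * ||A x - y||_2^2 + lam * ||y||_1 (assumed l1 instance of (2.28))."""
    u = linear_map(A, x)
    fit = 0.5 * sum((a - b) ** 2 for a, b in zip(u, y))
    return fit + lam * sum(abs(v) for v in y)


def nt_estimate(A, x, lam):
    """Closed form minimiser of nt_cost over y: soft thresholding of A x."""
    y = []
    for v in linear_map(A, x):
        if v > lam:
            y.append(v - lam)
        elif v < -lam:
            y.append(v + lam)
        else:
            y.append(0.0)
    return y


# Toy linear map with M = 3 rows and N = 2 columns; the values are illustrative only.
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, -1.0]]
x = [0.5, 2.0]
lam = 1.0

y_hat = nt_estimate(A, x, lam)
best = nt_cost(A, x, y_hat, lam)

# Numerical sanity check: no randomly perturbed candidate should have a lower cost.
rng = random.Random(0)
min_gap = min(
    nt_cost(A, x, [v + rng.uniform(-1.0, 1.0) for v in y_hat], lam) - best
    for _ in range(1000)
)
```

In contrast to the inverse problem (2.11), no iterations are needed in this instance: the estimate costs one matrix-vector product and one element-wise operation, which is why the estimation is referred to as a direct, constrained projection problem.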
