
7.1.3 Compressive sensing of auto-regressive sources

We now consider the problem of noisy compressive sensing as another instance of inverse problems that can be addressed with the idea of “compressibility as a prior”. Let us take the case of auto-regressive sources, in particular AR(1) sources.

Suppose we have trained a network on such data. As an example, let us pick the ML-STC of section 4.3.1 and train it on an AR(1) source, as we did in Fig. 4.5.

Now that this network is trained, our objective is to recover the under-sampled and noisy measurements of instances from the test set, i.e., to recover f from the under-sampled and noisy q = Tf + p, where T ∈ ℝ^(l×n) is a fat matrix (i.e., l < n) with Gaussian random entries and p is white Gaussian noise with variance σ_P².
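As a concrete illustration of this measurement model, the following minimal sketch generates one AR(1) test sample and its noisy, under-sampled observation. The parameter values (n = 512, l = 256, ρ = 0.99, σ_P² = 1) match the experiments reported below, but the random seed, the 1/√l scaling of T, and the unit-variance normalization of the source are illustrative assumptions rather than the exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n, l = 512, 256           # signal dimension and number of measurements (l < n)
rho, sigma_p = 0.99, 1.0  # AR(1) correlation factor and noise std (sigma_P^2 = 1)

# One AR(1) test sample: f[i] = rho * f[i-1] + innovation (approx. unit-variance process)
f = np.empty(n)
f[0] = rng.standard_normal()
for i in range(1, n):
    f[i] = rho * f[i - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal()

# Fat Gaussian measurement matrix and the noisy under-sampled observation q = T f + p
T = rng.standard_normal((l, n)) / np.sqrt(l)
q = T @ f + sigma_p * rng.standard_normal(l)
```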

Fig. 7.3 Image denoising in highly noisy regimes (σ_P² = 0.7): (a) noisy image (PSNR = 3.10 dB), (b) denoised with BM3D (PSNR = 20.25 dB), (c) denoised with Eq. 7.2 when initialized with BM3D (PSNR = 21.35 dB).

We apply the iterative algorithm of Eq. 7.2 to recover the data. This algorithm is initialized with f^(0) = T^† q, where T^† = T^T (T T^T)^(−1) is the Moore-Penrose pseudo-inverse of the fat matrix T, used to back-project q to f. Note that this is the optimal reconstruction without regularization, i.e., for µ = 0.
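Eq. 7.2 itself is not reproduced in this section; purely as a hedged sketch, the recovery loop below assumes the objective has the form ½‖Tf − q‖² + ½µ‖f − C(f)‖², with the trained compression-decompression network C treated as a black box that is re-evaluated at every gradient step. The function name `compress_decompress` and the default values of µ, τ, and the iteration count are illustrative placeholders.

```python
import numpy as np

def recover(q, T, compress_decompress, mu=100.0, tau=1e-3, n_iter=40):
    """Gradient-descent recovery under an assumed objective
    0.5*||T f - q||^2 + 0.5*mu*||f - C(f)||^2 (a plausible reading of Eq. 7.2)."""
    # Pseudo-inverse back-projection as initialization (the mu = 0 solution)
    f = T.T @ np.linalg.solve(T @ T.T, q)
    for _ in range(n_iter):
        grad_fidelity = T.T @ (T @ f - q)
        # The compressor output is treated as a constant within each step
        grad_prior = mu * (f - compress_decompress(f))
        f = f - tau * (grad_fidelity + grad_prior)
    return f
```

Treating C(f) as constant within each step keeps the update a plain gradient step, which matches the black-box interpretation of the compressor discussed in the conclusions; the slower initialization f^(0) = T^T q mentioned further below can be substituted in the same loop.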

Fig. 7.4 shows a sample of f from the test set, as well as its recovery using pseudo-inversion and our Eq. 7.2. It is clear that our regularization produces AR(1)-like solutions, while the solution returned by pseudo-inversion does not have such structure.

Fig. 7.4 Noisy compressive sensing of AR(1) data with n = 512 using the iterative algorithm of Eq. 7.2: the original sample, the recovery by pseudo-inversion, and the recovery by the proposed method. The compressibility prior is imposed using the ML-STC with µ = 100. The observation is sampled with l = 256 and contaminated with noise of variance σ_P² = 1.

Fig. 7.5 shows the evolution of the MSE over iterations for three different sub-samplings, l = 32, 128, 256, with noise variance σ_P² = 1, data dimension n = 512, and source correlation factor ρ = 0.99.


Fig. 7.5 Noisy compressive sensing of AR(1) data with the iterative algorithm of Eq. 7.2: MSE versus iteration for l = 32, 128, 256. The compressibility prior is imposed using the ML-STC with µ = 100.

This simple idea appears to be very effective, and the reconstruction error decreases for all values of l. We do not provide any convergence guarantee, but we observe that when initialized with something other than the pseudo-inverse, e.g., with f^(0) = T^T q, convergence takes several more iterations, yet the algorithm converged in all of our experiments. This, however, depends on µ, the step-size τ, and the operating rate of the compressor.

7.2 Conclusions

This chapter presented some of our ongoing work, as well as some promising directions for future research based on learned compression.

Among various possibilities, we focused on one application in which learning to compress is useful beyond the task of compression itself. In particular, we investigated compressibility as a prior for solving inverse problems and imposed it as a regularization on the data-fidelity term. This was achieved using a black-box interpretation of the compression-decompression network, which led to a simple gradient descent on the objective.

We saw two variants of this problem. First, we considered image denoising, where the JPEG2000 codec was used to apply the compressibility prior. We saw that this can be useful for denoising natural images in highly noisy regimes. In particular, we improved on BM3D by more than 1 dB of PSNR in these regimes.

The second variant of inverse problems that we investigated was noisy compressive sensing. We experimented with auto-regressive sources, where we trained an ML-STC on a set of clean AR(1) examples. This model was then used in our iterative algorithm to recover under-sampled and noisy measurements of such data. We saw that it is possible to significantly improve upon pseudo-inversion using the proposed algorithm.

These algorithms can be improved in many ways. For image denoising, instead of JPEG2000, emphasis should be put on learned compression schemes, where training is done on clean images. This can be extended to other inverse tasks, e.g., image super-resolution, image inpainting or deblurring.

Our results on compressive sensing were promising but preliminary. We did not provide any recovery or convergence guarantees. We leave this, along with many other interesting possibilities, for future investigation.

Concluding remarks

This thesis studied several aspects of data and signal processing, identified a number of issues, and proposed solutions for them. The focus was on vectorial data, in particular images and image descriptors, and on the considerations that arise when dealing with them in large-scale scenarios. The similarity search and compression aspects were highlighted, where efficient and useful representations are to be learned from examples of the data.

Our central consideration when modeling the data and learning the representations was compactness. This led us to carefully study and optimize the rate-distortion behavior of the proposed data models. We then argued that this optimization is useful beyond compression itself and can benefit several other tasks in signal and image processing.

In the first part of the thesis, we started by reviewing a diverse range of methods and ideas from the signal processing and machine learning literature. We interpreted these attempts under the general framework of Bayesian modeling and categorized them into two rough divisions of basic and composite models. We discussed their properties and benefits and their behavior under different sample regimes. This helped us set up our main strategy for the design and development of the models: start with basic models under simple, assumption-based setups where the data distribution is assumed, and then gradually lift the assumptions and rely more on the data samples as the models evolve from basic to composite ones.

This was pursued in the second part of the thesis, where we developed the algorithmic infrastructure based on the two families of basic prior models, i.e., synthesis and analysis models. The development of synthesis models first led us to the VR-Kmeans algorithm, which regularizes the standard K-means by imposing a rate-allocation criterion in learning the codebooks. We showed that this is very helpful in avoiding over-fitting, particularly in high-dimensional settings.

The criterion of having discrete representations led us to the development of the analysis model as the Sparse Ternary Codes (STC) framework, which is inspired by ℓ0 regularization of the data in the projected domain. The information-theoretic properties of this encoding were studied, and several possibilities for the reconstruction of these codes were investigated.

Studying the rate-distortion properties of these two frameworks led us to conclude that basic analysis or synthesis modeling is not enough. In particular, we noticed that we cannot operate at high rates, and hence obtain high-fidelity representations, while keeping fast encoding-decoding procedures based on these two basic models. We therefore concluded that composite solutions built on these basic models should be preferred.

This transition from basic to composite modeling was based on the framework of successive refinement from information theory. In particular, we chose additive residual-based encoding to enhance our basic models. The VR-Kmeans was developed into the RRQ framework, which maintains a very good rate-distortion performance at arbitrary rates and for any number of layers, without over-fitting.

Based on the two practical decoding schemes proposed for the reconstruction of STC, its multi-layer extensions were the ML-STC and the ML-STC-Procrustean. While the former is more assumption-based, the latter relies more on data samples and learns multiple layers of optimal transformations from the data using a Procrustean approach.

These composite models, however, are trained layer-by-layer, without accounting for the anti-causal errors between layers. For the analysis-based models, i.e., the ML-STC and the ML-STC-Procrustean, another prominent evolution is still possible. These models can be considered as neural network structures and can hence benefit from the back-propagation technique to jointly optimize all layers. This was hindered by a technical issue regarding the non-differentiability of the discretizing ternary operator. Thanks to the favorable information-preservation properties of ternary encoding, however, the issue was solved using a smooth surrogate without causing approximation errors.
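The exact surrogate used in the thesis is not reproduced in this summary; purely as an illustration of the idea, a common smooth relaxation of the ternary operator looks like the sketch below, where `lam` is the ternarization threshold and `beta` controls how closely the relaxation approaches the hard operator (both are illustrative names).

```python
import numpy as np

def ternarize(x, lam):
    """Hard ternary operator: sign(x) where |x| > lam, and 0 otherwise."""
    return np.sign(x) * (np.abs(x) > lam)

def soft_ternarize(x, lam, beta=10.0):
    """An illustrative smooth surrogate (not necessarily the one used in the
    thesis): differentiable everywhere, and it recovers the hard ternary
    operator in the limit beta -> infinity, so it can stand in for
    ternarize() during back-propagation."""
    return 0.5 * (np.tanh(beta * (x - lam)) + np.tanh(beta * (x + lam)))
```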

This completed the evolution of our models into what we termed STNets: a neural structure with discrete representations that is pre-trained layer-by-layer using our sample-efficient and theoretically sound recipes, and is then fine-tuned with the back-propagation technique, benefiting from the practical insights and infrastructure available in the deep-learning community. We thus have a range of training possibilities to choose from, depending on the available sample size and computational budget.

The third part of the thesis considered the applications of these algorithms in three directions. First, the problem of fast similarity search was addressed. We categorized the efforts in the literature into two families: binary hashing, which benefits from fast search in the space of codes but degrades search quality, and VQ-based methods, which achieve high search performance but do not benefit from fast search within the codes. We proposed that a third approach is possible using our ML-STC model, which performs a two-stage search procedure. The first stage performs a very efficient search in the space of ternary codes, which we showed possess superior coding gain w.r.t. binary codes. While this stage prunes the majority of database candidates, the second stage refines the list of candidates by reconstructing the candidate codes and performing a floating-point search on them. This is also very efficient thanks to the rate-distortion behavior that we have optimized. We performed million-scale experiments on public databases showing superior performance.
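To make the two-stage procedure concrete, the following hedged sketch shows its overall shape. The `encode` and `reconstruct` callables stand in for the trained ML-STC encoder and decoder, the database codes would normally be precomputed and indexed rather than recomputed per query, and the shortlist size is an illustrative parameter.

```python
import numpy as np

def two_stage_search(query, database, encode, reconstruct, shortlist=100, top_k=10):
    """Stage 1: prune candidates with fast matching in the ternary-code domain.
       Stage 2: re-rank the shortlist with floating-point distances to the
       reconstructions of the candidate codes."""
    q_code = encode(query)
    db_codes = np.array([encode(x) for x in database])  # precomputed in practice
    scores = db_codes @ q_code                          # cheap ternary correlation
    candidates = np.argsort(-scores)[:shortlist]
    recons = np.array([reconstruct(db_codes[i]) for i in candidates])
    dists = np.linalg.norm(recons - query, axis=1)
    return candidates[np.argsort(dists)[:top_k]]
```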

The second application we addressed was the learned compression of images. We first argued that, under many scenarios, it can be beneficial to give up on traditional image compression codecs and develop data-adaptive solutions instead. Two cases for learned image compression were considered. We first experimented with high-resolution color natural images and showed that, even when skipping several procedures of the compression pipeline, we can achieve results comparable to the JPEG2000 codec when training the ML-STC-Procrustean on 200 such images. We then argued for the necessity of avoiding block-wise division of images by measuring the loss in coding efficiency due to breaking the global spatial structure of the images into patches.

We then experimented with domain-specific images, where a lot of redundancy is shared between the images of interest. In particular, we showed that we can surpass JPEG2000 in the compression of facial images using a simple RRQ, without performing entropy coding. As a further step beyond compression, we tested whether this trained model can be beneficial for other tasks, in particular image denoising. Interestingly, we showed that by merely compressing and decompressing noisy facial images, we can outperform BM3D in denoising highly noisy images, since the network has seen samples of clean images and can inject priors about face structure during reconstruction.

This led us to investigate further in this direction. The third application of the thesis was thus to consider the usefulness of compression beyond itself. We presented some of our ongoing research and showed that it is possible to regularize inverse problems with compressibility as a prior, i.e., to encourage solutions that are more compressible under the (trained) model.

This was achieved using an iterative algorithm that considers the compression network as a black box that can be evaluated multiple times during the iterations. In particular, we showed that highly noisy images denoised by BM3D can be improved by more than one decibel in PSNR using our proposed algorithm when the compression engine is JPEG2000. We then targeted the noisy compressive sensing of auto-regressive sources and showed that we can significantly improve upon pseudo-inversion.

Future works

The thesis leaves a lot of directions for further research. Instead of the ML-STC that we used for simplicity, the STNets framework can further enhance the rate-distortion behavior by benefitting from more training examples. Moreover, instead of using image descriptors, the STNets framework can very well be used to train useful features directly from raw pixels.

The image compression pipeline proposed can be improved in many ways. We did not implement several important steps like entropy coding. As was concluded, patch-based encoding of images should be replaced with pyramidal decompositions. This way, the loss of the global content of images, as well as the problem of variable-size images, will be resolved. Our proposed whitening procedure can also be improved in many ways. For example, the global 2D-DCT can be followed by local processing in order to make the independence assumption on the sub-bands more realistic. Finally, the training can benefit from larger sample sizes.

Our presented results for solving inverse problems with a compressibility prior were preliminary and ongoing. Many aspects should be considered, e.g., how much can be gained with learned compression instead of JPEG2000 for image denoising, convergence guarantees, momentum-based solvers instead of simple gradient descent, recovery guarantees for compressive sensing, and the study of inverse problems other than denoising and compressive sensing.

Beyond these aspects, we can imagine other directions that can benefit from the frameworks developed in this thesis. An important step is to model the sequential codes from different layers of the ML-STC using sequence modelers like recurrent neural networks (RNNs). This can open up many possibilities; e.g., tasks like generative modeling or image classification can be addressed by adding a parallel RNN to the STNets and forming a composite cost function to be optimized jointly.
