
4.2.3 Rate-distortion behavior

We now experiment with the RRQ to see how it can target arbitrarily high rates. Fig. 4.3 compares the distortion-rate curves of the RRQ with the RQ, on the same data as in the experiments of Fig. 3.2.

Fig. 4.3 Distortion-rate curves for the RRQ (based on VR-Kmeans) and the RQ (based on K-means) for AR(1) Gaussian sources with ρ = 0, 0.5, 0.99, corresponding to (a) i.i.d., (b) mildly correlated and (c) highly correlated sources. Results are averaged over 5 independent experiments.

Clearly, the RQ based on the standard K-means quickly over-fits.4 Its regularized version, however, reduces the train-test gap, as prescribed by µ.

We also experiment with PQ and OPQ codes. It is interesting to see that, as the data becomes more correlated, they suffer more from the independence assumption they impose on the sub-codebooks. Moreover, as the rate increases, i.e., as the number of partitions becomes larger, this disparity grows further for PQ and OPQ. The RRQ does not suffer from this issue, and we see a noticeable performance improvement w.r.t. PQ.

Later, in chapter 6, we will successfully use the RRQ for image compression, applying it to whitened images, as will be clarified in section 6.3.

4.3 Multi-Layer Sparse Ternary Codes

In the previous section, we mentioned the limitations of the single-layer structures in terms of their operational rate. Inspired by the successive refinement framework in information theory and the RQ from signal processing, we then extended them to multiple layers. In particular,

4 This is a demonstration of what we quoted from [61] in section 4.2.1.3, regarding the fact that the RQ is usually limited to 2 layers.

Fig. 4.4 ML-STC architecture: at each layer l = 1, · · · , L, the residual f[l−1] is encoded by Q[·][l] into the code x[l], decoded back by Q−1[·][l] into the approximation f̂[l−1], and the new residual f[l] = f[l−1] − f̂[l−1] is passed on to the next layer.

we proposed the RRQ algorithm as a straightforward, residual-based extension of the VR-Kmeans, which solves the rate-limitation issue and targets arbitrarily high rates with reasonable distortion.

The K-means, the VR-Kmeans, and hence the RQ and the RRQ, are synthesis structures; it is of no surprise, however, that similar concepts are valid for analysis models as well. In fact, we have already seen in section 3.3.2.2 that for the analysis-based STC framework, as the rate increases, its allocation to different dimensions deviates from optimality, an inevitable consequence of the design constraints that leads to poor performance at high rates, as seen in Fig. 3.7.

How do we circumvent this issue with the analysis model, and in particular with the STC? It is natural to ask whether solutions similar to those for the synthesis models also exist for the STC.

In fact, while the RQ solution of section 4.2 has the shape of a synthesis model, the theoretical framework of additive successive refinement imposes no constraint on the encoding-decoding construction. We therefore use this straightforward idea next in section 4.3.1 and validate it in practice in the experiments of section 4.3.3.

4.3.1 ML-STC algorithm

We are not aware of any existing solution within analysis-shaped encoding schemes that takes advantage of successive refinement ideas. We now use this idea to develop the Multi-Layer STC (ML-STC) from the single-layer STC structure of section 3.3.2.2.

This idea, identical to the residual encoding of the previous section, is expressed by the recursion rule of Eq. 4.7 and is illustrated in Fig. 4.4:

$$
\begin{aligned}
\mathbf{x}^{[l]} &= Q\big(\mathbf{f}^{[l-1]}\big) = \varphi^{[l]}_{\lambda_X}\big(\mathbf{A}^{[l]}\mathbf{f}^{[l-1]}\big) \odot \boldsymbol{\beta}^{[l]}, \\
\hat{\mathbf{f}}^{[l-1]} &= Q^{-1}\big(\mathbf{x}^{[l]}\big) = \mathbf{A}^{[l]\top}\mathbf{x}^{[l]}, \\
\mathbf{f}^{[l]} &= \mathbf{f}^{[l-1]} - \hat{\mathbf{f}}^{[l-1]},
\end{aligned}
\tag{4.7}
$$

where the superscripts denote the layer index l = 1, · · · , L. The input to the algorithm at layer l is f[l−1], which is the residual of the approximation from layer l − 1 and is initialized as f[0] = f. The rest of the procedure is the same as in the single-layer case.
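To make the recursion concrete, the following NumPy sketch encodes a single (whitened) vector through L layers along the lines of Eq. 4.7; the function and variable names are illustrative, and the projections A[l], weights β[l] and threshold λ are assumed to be already trained.

```python
import numpy as np

def ternarize(y, lam):
    """Sketch of the ternarization: keep the sign of the entries whose
    magnitude exceeds the threshold, set the rest to zero."""
    return np.sign(y) * (np.abs(y) > lam)

def ml_stc_encode(f, A_list, beta_list, lam):
    """Multi-layer encoding following the recursion of Eq. 4.7.
    f         : whitened input vector of dimension n
    A_list    : per-layer projection matrices A[l] (n x n)
    beta_list : per-layer weighting vectors beta[l] (length n)
    Returns the ternary codes x[l] and the additive reconstruction f_hat."""
    residual = f.copy()                          # f[0] = f
    codes, f_hat = [], np.zeros_like(f)
    for A, beta in zip(A_list, beta_list):
        x = ternarize(A @ residual, lam) * beta  # x[l] = phi_lambda(A[l] f[l-1]) . beta[l]
        f_hat_l = A.T @ x                        # f_hat[l-1] = A[l]^T x[l]
        f_hat += f_hat_l                         # additive (successive) refinement
        residual = residual - f_hat_l            # f[l] = f[l-1] - f_hat[l-1]
        codes.append(x)
    return codes, f_hat
```

Decoding then amounts to summing the per-layer reconstructions A[l]^T x[l], in line with the additive successive-refinement view.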

We saw earlier in section 3.3.2.4 that, in order not to deviate from optimality, the STC should only be operated at lower rates. Therefore, we choose the ternarization threshold λ in all the $\varphi^{[l]}_{\lambda_X}(\cdot)$'s to be very high, such that they produce very sparse x[l]'s.
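As a rough illustration, if a projected coefficient is approximately Gaussian with standard deviation σ, the expected fraction of non-zero entries produced by the ternarization is 2Q(λ/σ), with Q(·) the standard Gaussian tail function; choosing, e.g., λ = 3σ keeps only about 0.3% of the components, giving very sparse ternary codes.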

Algorithm 7 summarizes the ML-STC.

Algorithm 7 ML-STC

Input: Training set F (whitened), # of layers L and threshold λ (assumed equal for all layers)

Output: Projections A[l]'s, weighting vectors β[l]'s and codes X[l]'s, for l = 1, · · · , L

1: F[0] ← F
2: F̂ ← 0, the all-zero matrix of size (n × N)
3: for l = 1, · · · , L do
4: Estimate the covariance matrix C[l−1]_F from the input samples F[l−1]
5: UΣU^T ← EIG(C[l−1]_F)    Eigenvalue decomposition
6: A[l] ← U^T
7: Compute the weighting vector β[l] from Σ, as in section 3.3.2.2
8: X[l] ← ϕ_λ(A[l]F[l−1]) ⊙ β[l]    Sparse ternary encoding (Eq. 4.7)
9: F̂[l−1] ← A[l]^T X[l]
10: F̂ ← F̂ + F̂[l−1]    Current approximation
11: F[l] ← F − F̂    Residual (input to the next stage's quantizer)
12: end for

Although the main building block of the training of this algorithm is the eigenvalue decomposition of the covariance matrices (using both eigenvectors and eigenvalues), the algorithm does not rely solely on the second-order statistical properties of the data. This is because the PCA is applied successively and on the residuals of encoding, rather than once on the input data. In fact, as the number of layers increases, the residuals, and hence the inputs to the next layers, i.e., the F[l]'s, become closer and closer to white Gaussian noise.
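For concreteness, a rough NumPy sketch of this training loop is given below. As in Algorithm 7, each layer takes A[l] as the eigenvector basis of the residual covariance; the per-dimension weights β[l] are fit here by a simple least-squares regression of the projected residuals onto their ternary supports, which is an illustrative stand-in for the weighting rule of section 3.3.2.2 rather than the exact formula.

```python
import numpy as np

def ternarize(Y, lam):
    """Elementwise ternarization: sign of entries above the threshold, else 0."""
    return np.sign(Y) * (np.abs(Y) > lam)

def ml_stc_train(F, n_layers, lam):
    """Sketch of the ML-STC training loop (cf. Algorithm 7).
    F is the (n x N) whitened training matrix.  Each layer projects the current
    residual onto the eigenvectors of its covariance, ternarizes, and passes the
    reconstruction error on to the next layer."""
    n, N = F.shape
    residual = F.copy()                      # F[0] = F
    F_hat = np.zeros((n, N))                 # cumulative approximation
    A_list, beta_list, X_list = [], [], []
    for l in range(n_layers):
        C = np.cov(residual)                 # covariance of the current residual
        eigval, U = np.linalg.eigh(C)        # C = U Sigma U^T (ascending eigenvalues)
        A = U[:, ::-1].T                     # A[l] = U^T, eigenvalues in descending order
        Y = A @ residual                     # projected residuals
        T = ternarize(Y, lam)                # sparse ternary support
        # eigval (Sigma) would feed the weighting rule of section 3.3.2.2;
        # here beta[l] is fit by per-dimension least squares instead (illustrative).
        beta = np.sum(Y * T, axis=1) / np.maximum(np.sum(T * T, axis=1), 1e-12)
        X = T * beta[:, None]                # weighted ternary codes X[l]
        F_hat += A.T @ X                     # F_hat <- F_hat + A[l]^T X[l]
        residual = F - F_hat                 # input to the next layer
        A_list.append(A); beta_list.append(beta); X_list.append(X)
    return A_list, beta_list, X_list, F_hat
```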

We evaluate the performance of this algorithm in sections 4.3.3 and 4.4, and will utilize it in practice later in chapter 5.

4.3.2 ML-STC-Procrustean algorithm

We saw earlier in section 3.3.2.3 that by abandoning some of the assumptions of the linear decoding of section 3.3.2.2 and relying more on the training data instead of the priors, we can formulate the learning procedure as in Eq. 3.37 and solve it with an alternating minimization approach based on the orthogonal Procrustes framework. This led us to Algorithm 4, which we dubbed the Procrustean-STC.

In the multi-layer case, exactly as with the ML-STC framework, we extend the Procrustean-STC algorithm by training over the residuals of the previous encoding. We term this new algorithm ML-STC-Procrustean and summarize it in Algorithm 8.

Algorithm 8 ML-STC-Procrustean

Input: Training set F (whitened), # of layers L and threshold λ (assumed equal for all layers)

Output: Projections A[l]'s, weighting vectors β[l]'s and codes X[l]'s, for l = 1, · · · , L

1: F[0] ← F
2: F̂ ← 0, the all-zero matrix of size (n × N)
3: for l = 1, · · · , L do
4: A[l], β[l], X[l] ← Procrustean-STC(F[l−1], λ)    From Algorithm 4
5: F̂[l−1] ← A[l]^T X[l]
6: F̂ ← F̂ + F̂[l−1]    Current approximation
7: F[l] ← F − F̂    Residual (input to the next stage's quantizer)
8: end for
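The multi-layer extension itself is just a thin residual loop around the single-layer trainer. A minimal sketch is given below, assuming a procrustean_stc(F, lam) routine implementing Algorithm 4 that returns (A, beta, X) for one layer; this interface is an assumption made here for illustration, not the exact signature of the thesis implementation.

```python
import numpy as np

def ml_stc_procrustean_train(F, n_layers, lam, procrustean_stc):
    """Sketch of Algorithm 8: repeatedly run the single-layer Procrustean-STC
    (Algorithm 4) on the residual of the cumulative approximation.
    `procrustean_stc(F, lam)` is assumed to return (A, beta, X) for one layer."""
    n, N = F.shape
    residual = F.copy()                      # F[0] = F
    F_hat = np.zeros((n, N))                 # cumulative approximation
    layers = []
    for l in range(n_layers):
        A, beta, X = procrustean_stc(residual, lam)  # train layer l on the residual
        F_hat += A.T @ X                     # F_hat <- F_hat + A[l]^T X[l]
        residual = F - F_hat                 # input to the next stage's quantizer
        layers.append((A, beta, X))
    return layers, F_hat
```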

We will evaluate this algorithm in section 4.4.

4.3.3 Rate-distortion behavior

Let us see how the basic ML-STC of Algorithm 7 performs at higher rates. We take the same experimental setup as in Fig. 3.7, where our performance was limited to very low rates.

This is illustrated in Fig. 4.5, where we see that the ML-STC covers the entire rate regime and provides very reasonable distortion.