
10.3 Approximate Least Squares Based Strategies

Note that all we want is to find a sparse matrix $M$ such that the functional $f(M) = \|A - CM\|_F^2$ is minimized, or more precisely, approximately minimized.

A similar problem has been studied recently in the preconditioning community to compute a sparse approximate inverse matrix $M$ for a nonsingular coefficient matrix $A$ when solving a sparse linear system $Ax = b$. The sparse matrix $M$ is computed by minimizing

$$f(M) = \min_{M \in \mathcal{G}} \|I - AM\|_F^2, \qquad (10.3)$$

subject to certain constraints on the sparsity pattern of $M$. Here the sparsity pattern of $M$ is restricted to a (usually unknown a priori) subset $\mathcal{G}$.

So, with the sparse approximate inverse computation in mind, for a term-document matrix $A$ we can minimize the functional

$$f(M) = \min_{M \in \mathcal{G}} \|A - CM\|_F^2 \qquad (10.4)$$

with the constraint that $M$ is sparse. The most important part of this minimization procedure is determining the sparsity pattern constraint set $\mathcal{G}$, which gives the sparsity pattern of $M$.
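For concreteness, here is a minimal sketch of evaluating this functional for a given candidate $M$ (SciPy sparse matrices and the name `objective` are illustrative assumptions, not from the original text); the strategies below differ only in how the sparsity pattern of $M$ is chosen before this quantity is approximately minimized:

```python
import scipy.sparse as sp

def objective(A, C, M):
    """Evaluate f(M) = ||A - C M||_F^2 from Eq. (10.4).

    A : m x n term-document matrix (scipy.sparse)
    C : m x k concept matrix (scipy.sparse)
    M : k x n candidate sparse decomposition matrix (scipy.sparse)
    Names and the sparse-matrix types are illustrative assumptions."""
    R = A - C @ M                       # residual matrix
    return sp.linalg.norm(R, "fro") ** 2
```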

10.3.1 Computational Procedure

For the moment, suppose that a sparsity pattern set $\mathcal{G}$ for $M$ is given somehow. The minimization problem (10.4) then decouples into $n$ independent subproblems:

$$\|A - CM\|_F^2 = \sum_{j=1}^{n} \|(A - CM)e_j\|_2^2 = \sum_{j=1}^{n} \|a_j - Cm_j\|_2^2, \qquad (10.5)$$

where $a_j$ and $m_j$ are the $j$th columns of the matrices $A$ and $M$, respectively ($e_j$ is the $j$th unit vector). It follows that the minimization problem (10.5) is equivalent to minimizing the individual functions

$$\|Cm_j - a_j\|_2, \qquad j = 1, 2, \ldots, n, \qquad (10.6)$$

with certain restrictions placed on the sparsity pattern of $m_j$. In other words, each column of $M$ can be computed independently, which certainly opens the possibility for a parallel implementation. Since we assume the sparsity pattern of $m_j$ (and of $M$) is given, i.e., only a few entries of $m_j$, say $n_2$ of them at certain locations, are allowed to be nonzero, the rest of the entries of $m_j$ are forced to be zero. Denote the $n_2$ nonzero entries of $m_j$ by $\bar{m}_j$ and the $n_2$ columns of $C$ corresponding to $\bar{m}_j$ by $C_j$. Since $C$ is sparse, the submatrix $C_j$ has many rows that are identically zero. After removing the zero rows of $C_j$, we obtain a reduced matrix $\bar{C}_j$ with $n_1$ rows. The individual minimization problem (10.6) is thus reduced to a much smaller least squares problem of order $n_1 \times n_2$,

$$\|\bar{C}_j \bar{m}_j - \bar{a}_j\|_2, \qquad j = 1, 2, \ldots, n, \qquad (10.7)$$

in which $\bar{a}_j$ consists of the entries of $a_j$ corresponding to the remaining rows of $C_j$. We note that the matrix $\bar{C}_j$ is now a very small rectangular matrix. It has full rank if the matrix $C$ does.

There are a variety of methods available to solve the small least squares problem (10.7). Assume that $\bar{C}_j$ has full rank. Since $\bar{C}_j$ is small, the easiest way is probably to perform a QR factorization of $\bar{C}_j$,

$$\bar{C}_j = Q_j \begin{pmatrix} R_j \\ 0 \end{pmatrix},$$

where $Q_j$ is an orthogonal matrix, so that $Q_j^{-1} = Q_j^T$, and $R_j$ is an $n_2 \times n_2$ upper triangular matrix. The least squares problem (10.7) is then solved by first computing $\bar{c}_j = Q_j^T \bar{a}_j$ and then obtaining the solution as $\bar{m}_j = R_j^{-1} \bar{c}_j(1{:}n_2)$. In this way, $\bar{m}_j$ can be computed independently for each $j = 1, 2, \ldots, n$. This yields an approximate decomposition matrix $M$ that minimizes $\|CM - A\|_F$ for the given sparsity pattern.
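As a concrete illustration of the per-column procedure, here is a minimal NumPy/SciPy sketch (the function name `solve_column` and its arguments are our own, and $C$ is assumed to be stored in a SciPy sparse format); it extracts $\bar{C}_j$ and $\bar{a}_j$ and solves (10.7) with a thin QR factorization:

```python
import numpy as np
import scipy.sparse as sp

def solve_column(C, a_j, pattern_j):
    """Solve the reduced least squares problem (10.7) for one column.

    C         : m x k sparse concept matrix (scipy.sparse, CSC assumed)
    a_j       : dense m-vector, the j-th column of A
    pattern_j : indices of the n2 entries of m_j allowed to be nonzero
    Returns the full k-vector m_j with zeros outside pattern_j.
    All names are illustrative, not from the original text."""
    Cj = C[:, pattern_j].toarray()           # the n2 selected columns of C
    rows = np.flatnonzero(Cj.any(axis=1))    # drop identically zero rows
    Cbar = Cj[rows, :]                       # reduced n1 x n2 matrix
    abar = a_j[rows]                         # matching entries of a_j
    # Thin QR solve: Cbar = Q R, then m̄_j = R^{-1} (Qᵀ ābar)
    Q, R = np.linalg.qr(Cbar)                # assumes full rank, n1 >= n2
    mbar = np.linalg.solve(R, Q.T @ abar)
    m_j = np.zeros(C.shape[1])
    m_j[pattern_j] = mbar                    # scatter back into full column
    return m_j
```

Since each call is independent of the others, the $n$ columns can be solved in parallel, as noted above.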

The remaining problem for constructing a sparse approximate decomposition matrix $M$ is choosing or deciding on a good sparsity pattern for $M$. Here we introduce the static sparsity pattern (SSP) and dynamic sparsity pattern (DSP) approaches.

They are based on similar strategies proposed in the preconditioning community [5, 9–11]. The difference between the static and dynamic strategies is that static sparsity patterns are decided before the matrix construction phase (a priori) and remain unchanged during the computation, while dynamic sparsity patterns are adjusted adaptively during the approximate decomposition matrix construction phase.

10.3.2 Static Sparsity Pattern (SSP)

In the preconditioning field, some heuristic strategies have been developed for choosing suitable sparsity patterns for $M$. A particularly useful and effective strategy is to use the sparsity pattern of the coefficient matrix $C$ or $C^T$. Chow [4] offers the strategy of using the sparsity pattern of $C$ as the sparsity pattern for $M$. The difficulty in choosing a static sparsity pattern in information retrieval lies in the fact that, to the best of our knowledge, no study has been done to find a suitable sparsity pattern. This work ventures into a non-traditional application of computational numerical linear algebra: approximate decomposition matrix computation in information retrieval.

Once a good sparsity pattern is chosen or found, static sparsity pattern algorithms are relatively easier to implement than dynamic sparsity pattern algorithms (to be discussed later) [4, 12, 13].

The knowledge from the preconditioning field can be exploited for choosing a suitable sparsity pattern for our application. Note that the concept matrix $C$ describes the relationship between the term vectors and the concept vectors. If a term is related to a concept vector, this relationship may be maintained, in some sense, in the approximate decomposition matrix. However, this line of reasoning is more difficult to apply here than in the preconditioning field, because the dimensions of the matrix $C$ and those of $M$ do not match: the dimensions of $C$ are $m \times k$ and those of $M$ are $k \times n$. To make such an approach practically useful, several auxiliary strategies based on the sparsity pattern of $C$ and the entry values of both $C$ and $A$ are proposed.

Our first strategy is based on a numerical computation, the vector-vector product: $c_i m_j = a_{ij}$, where $c_i$ is the $i$th row of $C$, $m_j$ is the $j$th column of $M$, and $a_{ij}$ is the entry in the $i$th row and $j$th column of $A$. The sparsity pattern of $m_j$ is given in this way: if $a_{ij}$ is the largest entry in the $j$th column of $A$, the sparsity pattern of the $j$th column of $M$, $m_j$, is the same as that of the $i$th row of $C$, i.e., $c_i$. Here we use small matrices to illustrate our ideas. Suppose we have three matrices: $C_{4 \times 3}$, $M_{3 \times 5}$, and $A_{4 \times 5}$. The pattern of $CM = A$ is depicted in Eq. (10.9).

Here, “x” denotes a nonzero entry and “-” denotes an undefined pattern. We determine the sparsity pattern of $M$ column by column. First, find the largest entry in each column of $A$; suppose these are $a_{31}$, $a_{12}$, $a_{43}$, $a_{14}$, and $a_{25}$ in Eq. (10.9). Then the sparsity pattern of $m_1$, the first column of $M$, is the same as that of the third row of $C$, $c_3$; the sparsity pattern of $m_2$ is the same as that of $c_1$; and $m_3$ and $m_4$ have the same sparsity patterns as $c_4$ and $c_1$, respectively. Proceeding in this way yields the sparsity pattern of the whole matrix $M$.

Since there may be more than one largest entry in a column of the term-document matrix $A$, the following rules may be applied when choosing the largest term.

– Start the above procedure from the column of $A$ that has the smallest number of nonzero entries.

– Do not use the sparsity pattern of the same row of $C$ more than once, if possible.

Based on this strategy, the sparsity ratio of $M$ is almost the same as that of $C$.

To improve accuracy and robustness, the first strategy can be applied again to the second largest entries in each column of $A$. For example, the second largest entry in the second column is $a_{32}$. Comparing the pattern of the third row of $C$, $(x\ x\ 0)$, with that of the second column of $M$, $(x\ 0\ x)$, we simply fill in more nonzero entries in the second column of $M$ based on the nonzero positions in the third row of $C$. Now the sparsity pattern of the second column of $M$ is $(x\ x\ x)$. This strategy can be repeated a couple of times as needed. The matrix $M$ may become denser; however, it might be more accurate and robust. We can also control the number of fill-ins, as in preconditioning techniques.
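The SSP strategy just described, including the two tie-breaking rules and the optional second-largest-entry fill-in, can be summarized in a short sketch. The function name `ssp_pattern`, the `levels` parameter, and the use of dense arrays are our own illustrative choices:

```python
import numpy as np

def ssp_pattern(C, A, levels=1):
    """Sketch of the static sparsity pattern (SSP) strategy.

    The pattern of column j of M is copied from the row c_i of C for
    which a_ij is largest in column j of A; levels > 1 repeats this
    with the next-largest entries to add fill-in.  Dense inputs are
    used for clarity; names and arguments are illustrative."""
    k = C.shape[1]
    n = A.shape[1]
    pattern = np.zeros((k, n), dtype=bool)    # sparsity pattern of M
    # Rule 1: start from the column of A with the fewest nonzeros.
    order = np.argsort((A != 0).sum(axis=0))
    used = set()                              # Rule 2: avoid reusing rows of C
    for j in order:
        ranked = np.argsort(-np.abs(A[:, j])) # row indices by decreasing |a_ij|
        fresh = [i for i in ranked if i not in used]
        for i in (fresh if fresh else list(ranked))[:levels]:
            pattern[:, j] |= (C[i, :] != 0)   # union with the pattern of c_i
            used.add(i)
    return pattern
```

With `levels=1` the resulting pattern has roughly the sparsity ratio of $C$, as noted above; larger values trade density for accuracy.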

10.3.3 Dynamic Sparsity Pattern (DSP)

The dynamic sparsity pattern strategies first compute an approximate decomposition matrix by solving the least squares problem (10.4) with respect to an initial sparsity pattern guess. Then this sparsity pattern is updated according to some rules and is used as the new sparsity pattern guess for solving (10.4). The approximate decomposition matrix computation may be repeated several times until some stopping criteria are satisfied. Different update rules lead to different dynamic sparsity pattern strategies. One useful rule, suggested by Grote and Huckle [5] in computing sparse approximate inverse preconditioners, adds to the sparsity pattern $S$ of the current approximation the candidate indices that can most effectively reduce the residual $g = C(\cdot, S)\bar{m}_j - a_j$.

The candidate indices are chosen from the set

$$\beta = \{ j \notin S \mid C(\alpha, j) \neq 0 \},$$

where $\alpha = \{ i \mid g(i) \neq 0 \}$. For each candidate we have the one-dimensional minimization problem

$$\min_{u_j} \|g + u_j (Ce_j - a_j)\|_2, \qquad j \in \beta,$$

where $e_j$ is the $j$th unit vector. Denoting $l_j = Ce_j - a_j$, the above minimization problem has the solution

$$u_j = -\frac{g^T l_j}{\|l_j\|_2^2}.$$

For each $j$, we compute the two-norm of the new residual as

$$\rho_j = \left( \|g\|_2^2 - \frac{(g^T l_j)^2}{\|l_j\|_2^2} \right)^{1/2}.$$
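A small NumPy sketch of this candidate evaluation (dense matrices; the function name `candidate_gains` and its arguments are our own, not from the original text):

```python
import numpy as np

def candidate_gains(C, a_j, g, S):
    """For each candidate index j' not in S, compute the step u_j and
    the new residual norm rho_j from the one-dimensional problem above.
    Dense inputs for clarity; names are illustrative."""
    alpha = np.flatnonzero(g)                  # rows where g is nonzero
    out = {}
    for jp in range(C.shape[1]):
        if jp in S or not C[alpha, jp].any():  # candidate set beta
            continue
        l = C[:, jp] - a_j                     # l_j = C e_j - a_j
        ll = l @ l
        if ll == 0.0:                          # degenerate direction; skip
            continue
        gl = g @ l
        u = -gl / ll                           # minimizer u_j
        rho = np.sqrt(max(g @ g - gl * gl / ll, 0.0))
        out[jp] = (u, rho)
    return out
```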

Then we can choose the most profitable indices $j$, namely those which lead to the smallest new residual norms $\rho_j$. The procedure to augment the sparsity structure $S$ is as follows.

Dynamic Sparsity Pattern Construction Algorithm

Given the maximum number of update steps $n_s > 0$, a stopping tolerance $\varepsilon$, an integer $\mu > 0$, and an initial diagonal sparsity pattern $S$;

Loop:
    compute $\rho_j$ for all indices $j \in \beta$;
    compute the mean $\lambda$ of $\{\rho_j\}$;
    add at most $\mu$ indices with $\rho_j < \lambda$ to $S$;
Until $\|r\|_2 < \varepsilon$ or the number of steps exceeds $n_s$.
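Putting the pieces together, here is a sketch of this loop for a single column of $M$, reusing `candidate_gains` from the previous sketch. All names, default values, and the single-index initial pattern are our own assumptions, not the authors' implementation:

```python
import numpy as np

def dsp_column(C, a_j, ns=10, eps=1e-2, mu=5, start=0):
    """Dynamic sparsity pattern loop for one column m_j of M (a sketch).

    Alternates a reduced least squares solve over the current pattern S
    with the rho_j-based pattern update; requires candidate_gains() from
    the previous sketch.  Dense C; names and defaults are illustrative."""
    S = {start}                        # initial one-entry pattern guess
    for _ in range(ns):
        idx = sorted(S)
        mbar, *_ = np.linalg.lstsq(C[:, idx], a_j, rcond=None)  # solve (10.7)
        g = C[:, idx] @ mbar - a_j     # current residual
        if np.linalg.norm(g) < eps:
            break
        gains = candidate_gains(C, a_j, g, S)
        rhos = {jp: rho for jp, (_, rho) in gains.items()}
        if not rhos:                   # no admissible candidates left
            break
        lam = float(np.mean(list(rhos.values())))
        new = sorted((jp for jp in rhos if rhos[jp] < lam),
                     key=rhos.get)[:mu]
        if not new:                    # nothing beats the mean; stop early
            break
        S.update(new)                  # add at most mu profitable indices
    else:                              # pattern grew on the last step: re-solve
        idx = sorted(S)
        mbar, *_ = np.linalg.lstsq(C[:, idx], a_j, rcond=None)
    m_j = np.zeros(C.shape[1])
    m_j[idx] = mbar                    # scatter solved entries into full column
    return m_j
```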

Barnard et al. released an MPI implementation, SPAI 3.0 [14], for computing the sparse approximate inverse preconditioner of a nonsingular matrix. We modified this code for our computation of the sparse approximate decomposition matrix.