
Efficient computation of the spatially-variant model

Oliver Whyte, Josef Sivic, Andrew Zisserman and Jean Ponce


4.4.2 Updating the latent image

The latent image update (the third step) in the algorithm of Cho & Lee (2009) is performed in a similar way to many non-blind deblurring algorithms (Xu & Jia 2010, Krishnan & Fergus 2009) by solving a linear least-squares problem of the form

$$\min_f \; \|a * f - g\|_2^2 + \alpha\|d_x * f\|_2^2 + \alpha\|d_y * f\|_2^2, \qquad (4.19)$$

where $d_x$ and $d_y$ represent horizontal and vertical derivative filters, and $\alpha$ is a regularisation weight. Note that in Cho & Lee’s (2009) formulation, the data term also takes into account the derivatives of $f$ and $g$ as well as the intensities (i.e. $\|a * (d^{(q)} * f) - d^{(q)} * g\|_2^2$ is also penalised, for various derivative filters $d^{(q)}$).

To apply our blur model, we simply replace the convolution in the data reconstruction term and update the latent image as

$$\min_f \; \|Af - g\|_2^2 + \alpha\|d_x * f\|_2^2 + \alpha\|d_y * f\|_2^2, \qquad (4.20)$$

where A is given by Eq. (4.14). We minimise Eq. (4.20) using conjugate-gradient descent (Shewchuk 1994).
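As an illustration of how a problem of the form of Eq. (4.20) can be minimised with conjugate gradients, the following is a minimal matrix-free sketch. It assumes the blur is supplied as two callables (`apply_A`, `apply_At`, hypothetical names for the forward operator and its transpose) and uses circular finite differences as stand-ins for $d_x$ and $d_y$; it is not the authors' implementation.

```python
import numpy as np

def conjugate_gradient(apply_H, b, n_iter=200, tol=1e-12):
    """Solve H x = b for a symmetric positive-definite H, given only x -> H x."""
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - H x for the starting point x = 0
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(n_iter):
        if rs < tol:                  # squared residual norm is small enough
            break
        Hp = apply_H(p)
        step = rs / np.vdot(p, Hp).real
        x += step * p
        r -= step * Hp
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def dx(f):
    return np.roll(f, -1, axis=1) - f   # horizontal difference, circular boundary

def dxT(u):
    return np.roll(u, 1, axis=1) - u    # transpose of dx

def dy(f):
    return np.roll(f, -1, axis=0) - f   # vertical difference, circular boundary

def dyT(u):
    return np.roll(u, 1, axis=0) - u    # transpose of dy

def update_latent_image(apply_A, apply_At, g, alpha, n_iter=200):
    """Minimise ||A f - g||^2 + alpha ||dx f||^2 + alpha ||dy f||^2
    by running CG on the normal equations."""
    def apply_H(f):
        return apply_At(apply_A(f)) + alpha * (dxT(dx(f)) + dyT(dy(f)))
    return conjugate_gradient(apply_H, apply_At(g), n_iter)
```

Every CG iteration calls `apply_A` and `apply_At` once each, which is why the cost of these two products dominates the running time.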

Note that at this point, we are not able to take full advantage of the speed optimisations proposed by Cho & Lee (2009), due to their use of Fourier transforms to compute convolutions and to perform direct minimisation of Eq. (4.19) in the frequency domain. However, in the following section we will describe an efficient approximation of the spatially-variant blur model which enables this.

4.5 Efficient computation of the spatially-variant model

Due to the additional computational expense incurred by using a spatially-variant blur model instead of a spatially-invariant one, both blind and non-blind deblurring under this model can be very time consuming. As seen in the previous section, MAP-type deblurring algorithms typically involve solving linear least-squares problems, minimising $\|Af - g\|_2^2$ with respect to $f$, and $\|Bw - g\|_2^2$ with respect to $w$. As defined in Eq. (4.14) and (4.15), the matrices $A$ and $B$ involve sums over a set of homographies $\{T^{(k)}\}$, where the size $K$ of the set can be large, of the order of hundreds or thousands.

In iterative minimisation algorithms for solving such least-squares problems, we must repeatedly compute arbitrary matrix-vector multiplications involving $A$ and $B$. For images with millions of pixels these matrices are generally too large to fit into memory, and the matrix-vector products must be computed on the fly, by explicitly warping the image for each homography $T^{(k)}$. This is by far the biggest bottleneck in the use of our model, and means that both blind PSF estimation and non-blind deblurring are significantly slower than for spatially-invariant blur.
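To make the cost of computing the product on the fly concrete, here is a deliberately simplified sketch in which each warp is an integer circular translation. This is an assumption made purely for illustration: the model in the text uses homographies $T^{(k)}$, and the function and parameter names below are hypothetical.

```python
import numpy as np

def blur_as_sum_of_warps(f, weights, shifts):
    """Blurred image as a weighted sum of warped copies of the sharp image f.

    weights : one weight per warp (the blur kernel w)
    shifts  : one (dy, dx) integer translation per warp, standing in for T^(k)
    """
    g = np.zeros_like(f, dtype=float)
    for w_k, (shift_y, shift_x) in zip(weights, shifts):
        if w_k == 0.0:
            continue                          # only non-zero weights cost anything
        # One full-image warp per non-zero weight: O(N) work, repeated K times.
        g += w_k * np.roll(f, shift=(shift_y, shift_x), axis=(0, 1))
    return g
```

The loop touches every pixel once per non-zero weight, so the work grows with the product of the number of pixels and the number of warps.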

To reduce the running time of the whole deblurring process, in this section we propose an efficient approximation to the blur model from Eq. (4.12), based on the locally-uniform ‘efficient filter flow’ proposed by Hirsch, Sra, Schölkopf & Harmeling (2010).

We begin by describing the approximation in Section 4.5.1, before demonstrating in Section 4.5.2 how it can be used to compute the matrix-vector products necessary for the kernel update step (Eq. (4.18)) very quickly. In Section 4.5.3 we describe how this approach allows us to update the sharp image (Eq. (4.20)) directly in the frequency domain (instead of using iterative methods). The approximation allows blind deblurring to be performed an order of magnitude faster than when using the exact forward model.

Note that concurrently with our work, Hirsch, Schuler, Harmeling & Schölkopf (2011) proposed a similar model for efficiently computing spatially-variant camera-shake blur.

4.5.1 A locally-uniform approximation for camera shake

Hirsch et al. (2010) observe that in some cases of spatially-variant image blur, the blur may vary slowly and smoothly across the image. In these cases, it is reasonable to approximate spatially-variant blur as locally-uniform. Following this observation, they propose a model for spatially-variant blur, whereby the sharp image $f$ is covered with a coarse grid of $P$ overlapping patches, each of which is modelled as having a spatially-invariant blur. The overlapping patches ensure the blur varies smoothly, while allowing the forward model to be computed using $P$ small convolutions. Hirsch et al. (2010) assign each patch $r$ a spatially-invariant blur filter $a^{(r)}$, and their model is given by:

$$g = \sum_{r=1}^{P} C^{(r)\top}\big(a^{(r)} * (m \circ C^{(r)} f)\big), \qquad (4.21)$$

where $C^{(r)}$ is a matrix that crops the $r$th patch of the image $f$ (thus $C^{(r)\top}$ reinserts it via zero-padding). The vector $m$ is the Bartlett–Hann window, and $\cdot \circ \cdot$ represents the Hadamard (element-wise) product. Note that this model can be computed very efficiently by computing the discrete Fourier transforms of each patch and filter using the fast Fourier transform (FFT), multiplying them element-wise in the frequency domain, and then taking the inverse discrete Fourier transform of the result. Under varying assumptions, different authors have also proposed locally-uniform models of spatially-variant blur, which take similar forms to Eq. (4.21) (Nagy & O’Leary 1998, Vio, Nagy, Tenorio & Wamsteker 2005, Tai, Du, Brown & Lin 2010).
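The locally-uniform model described above (crop a patch, window it, convolve it with that patch's filter, and add it back into place) can be sketched as follows. This is a minimal illustration under several assumptions that are not from the source: square patches on a regular grid, filters stored at patch size with their origin at index `[0, 0]`, and circular convolution inside each patch, which is only adequate when the filter is small relative to the window taper. All function names are hypothetical.

```python
import numpy as np

def bartlett_hann(n):
    """Bartlett-Hann window of length n (standard closed form)."""
    t = np.arange(n) / (n - 1) - 0.5
    return 0.62 - 0.48 * np.abs(t) + 0.38 * np.cos(2 * np.pi * t)

def patch_grid(shape, patch, step):
    """Top-left corners of overlapping square patches on a regular grid."""
    height, width = shape
    return [(y, x)
            for y in range(0, height - patch + 1, step)
            for x in range(0, width - patch + 1, step)]

def locally_uniform_blur(f, filters, window, corners):
    """Sum over patches of: reinsert( filter * (window o crop(f)) )."""
    patch = window.shape[0]
    g = np.zeros_like(f, dtype=float)
    for (y, x), a in zip(corners, filters):
        windowed = window * f[y:y + patch, x:x + patch]      # m o C f
        blurred = np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(windowed)))
        g[y:y + patch, x:x + patch] += blurred               # C^T reinserts the patch
    return g
```

With a delta filter in every patch the output reduces to the input multiplied by the summed windows, which gives a simple sanity check on the cropping and reinsertion.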

In their original work, Hirsch et al. (2010) parameterise the blur using a separate filter $a^{(r)}$ for each patch $r$. Likewise Harmeling et al. (2010), who apply this model to single-image camera shake removal, also estimate a separate filter per patch using the MAP algorithm of Cho & Lee (2009). One weakness of this approach is that in textureless regions, the algorithm of Cho & Lee may fail, and so heuristics are needed to encourage similarity between neighbouring filters, and to detect and replace failed local kernel estimates.

Given the forward blur model for camera shake in Eq. (4.12), which is parameterised by a single set of weights $w$, we can in fact write each $a^{(r)}$ in terms of $w$. For each patch $r$, we choose $a^{(r)}$ to be the point spread function for the central pixel $i_r$, which is given by the $i_r$th row of $A$. Since $A$ is linear in $w$, we can construct a matrix $J^{(r)}$ such that $a^{(r)} = C^{(r)} J^{(r)} w$. The elements of each $J^{(r)}$ are simply a rearrangement of the elements of the matrices $T^{(k)}$; $J^{(r)}_{jk} = T^{(k)}_{i_r j}$.


(a) Approximately 3×4 patches (b) Approximately 6×8 patches (c) Approximately 12×16 patches (d) Exact model

Figure 4.5 Approximating spatially-variant blur by combining uniformly-blurred, overlapping patches. Using the model described in Section 4.5.1, we can efficiently compute approximations to the spatially-variant blur model in Eq. (4.12). With a small number of patches (a), the PSF at each pixel is visibly the sum of different blurs from overlapping patches. As more patches are used (b–c), the approximation becomes increasingly close to the exact model (d); at 12×16 patches it is almost indistinguishable.

Having written each filter $a^{(r)}$ in terms of $w$, we can then substitute this into Eq. (4.21) to obtain the following approximation of the forward model from Eq. (4.12):

$$g = Af = Bw \approx \sum_{r=1}^{P} C^{(r)\top}\Big(\big(C^{(r)} J^{(r)} w\big) * \big(m \circ C^{(r)} f\big)\Big). \qquad (4.22)$$

This equation allows the forward model to be computed quickly using only a handful of small convolutions, which can be performed efficiently in the frequency domain.

Figure 4.5 shows how the quality of the locally uniform approximation varies with the number of patches being used, compared to the exact model. In all our experiments, we use a grid of 6×8 patches.

4.5.2 Updating the blur kernel

In iterative algorithms for estimating the blur kernel $w$ from Eq. (4.18), we typically need to compute the gradient of $\|Bw - g\|_2^2$ with respect to $w$, which involves computing $B^\top y$ for arbitrary $y$ and $B^\top B$. In addition to being able to compute the forward model quickly, Eq. (4.22) provides us with a fast approximate way of computing these products using one convolution per patch:

$$B^\top y \approx [\ldots]$$

where $\cdot \otimes \cdot$ represents 2D correlation, and the function XCorrMatrix constructs the full cross-correlation matrix from a cross-correlation vector between two signals by replicating and shifting elements. A cross-correlation matrix $M$ for a pair of signals $u$ and $v$ stores their inner-product at all possible translations of both signals. Assuming appropriate boundary conditions, each row of $M$ is simply a shifted version of the cross-correlation $u \otimes v$.
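The reason correlation appears in these products is that the transpose of a convolution operator is a correlation. The sketch below checks this identity numerically for a single patch, assuming circular boundary conditions; the function names are illustrative and not taken from the source.

```python
import numpy as np

def circ_conv(a, p):
    """Circular 2D convolution of a with p, computed with the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(p)))

def circ_corr(p, y):
    """Circular 2D correlation of p with y: c[k] = sum_n p[n] * y[n + k]."""
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(p)) * np.fft.fft2(y)))

def adjoint_gap(a, p, y):
    """Difference between <a * p, y> and <a, p (x) y>; zero up to rounding."""
    return np.sum(circ_conv(a, p) * y) - np.sum(a * circ_corr(p, y))
```

Here `a` plays the role of a patch filter, `p` of the windowed sharp patch, and `y` of the cropped vector being multiplied by the transpose. A gap of zero confirms that applying the transpose of "convolve with `p`" to `y` costs one correlation, that is, one FFT-based product per patch.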

4.5.3 Updating the latent image: fast, non-iterative non-blind deconvolution

The locally-uniform approximation allows the forward model and its derivatives to be computed much faster using the FFT, such that we can quickly compute the $A^\top y$ and $A^\top A$ products needed to perform gradient-based optimisation of the sharp image in Eq. (4.20). However, for spatially-invariant blur, the fastest way of updating the sharp image (in Eq. (4.19)) is not using iterative methods such as conjugate-gradient descent, but rather using non-iterative frequency-domain deconvolution, which we outline below. In this section we extend this method to handle spatially-variant blur, via the locally-uniform approximation.

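When blur is spatially-invariant, using Parseval’s theorem, Eq. (4.19) can be transformed into $N$ independent 1D quadratic minimisations in the frequency domain, allowing the solution to be obtained directly by pixel-wise division in the frequency domain (Gamelin 2001):

$$\hat{f} = \mathcal{F}^{-1}\!\left(\frac{\mathcal{F}(a)^{*} \circ \mathcal{F}(g)}{\mathcal{F}(a)^{*} \circ \mathcal{F}(a) + \alpha\big(\mathcal{F}(d_x)^{*} \circ \mathcal{F}(d_x) + \mathcal{F}(d_y)^{*} \circ \mathcal{F}(d_y)\big)}\right), \qquad (4.25)$$

where $\mathcal{F}(\cdot)$ takes the 2D discrete Fourier transform (computed using the fast Fourier transform), $\mathcal{F}^{-1}(\cdot)$ the inverse Fourier transform, $^{*}$ denotes complex conjugation, and the division is element-wise.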
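The pixel-wise frequency-domain division that solves Eq. (4.19) can be sketched in a few lines of NumPy. The sketch assumes circular boundary conditions, simple two-tap difference filters for $d_x$ and $d_y$, and a blur filter that sums to a non-zero value (so the denominator is non-zero at the zero frequency); `psf_to_otf` and `deconvolve_l2` are hypothetical helper names.

```python
import numpy as np

def psf_to_otf(kernel, shape):
    """Zero-pad a small filter to the image size, move its centre to [0, 0],
    and return its 2D discrete Fourier transform."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, shift=(-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def deconvolve_l2(g, a, alpha):
    """Closed-form minimiser of ||a*f - g||^2 + alpha(||dx*f||^2 + ||dy*f||^2)
    under circular boundary conditions."""
    Fa = psf_to_otf(a, g.shape)
    Fdx = psf_to_otf(np.array([[1.0, -1.0]]), g.shape)
    Fdy = psf_to_otf(np.array([[1.0], [-1.0]]), g.shape)
    numerator = np.conj(Fa) * np.fft.fft2(g)
    denominator = np.abs(Fa) ** 2 + alpha * (np.abs(Fdx) ** 2 + np.abs(Fdy) ** 2)
    return np.real(np.fft.ifft2(numerator / denominator))
```

No iteration is involved: the cost is a handful of FFTs, which is what makes this route attractive compared with conjugate gradients when the blur is spatially-invariant.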

However, for spatially-variant blur, the locally-uniform approximation does not immediately permit this. Even though each patch has a spatially-invariant blur, the fact that the patches overlap means that the reconstruction of any blurred pixel will involve several patches. This can be seen by inserting the locally-uniform model into the non-blind deblurring problem in Eq. (4.20), and noting that the sum over patches lies inside the data-reconstruction term:

$$\min_f \; \Big\|\sum_{r=1}^{P} C^{(r)\top}\big(a^{(r)} * (m \circ C^{(r)} f)\big) - g\Big\|_2^2 + \alpha\|d_x * f\|_2^2 + \alpha\|d_y * f\|_2^2. \qquad (4.26)$$

Thus, this equation cannot be minimised independently for each patch.


In order to be able to estimate each patch independently, one simple solution is to use Jensen’s inequality to obtain an upper bound on Eq. (4.26), taking the sum over patches outside the data-reconstruction term, and expanding the regularisation terms to also penalise individual patches. Having done this, we minimise the upper bound instead of the original cost. We define the set of blurred patches $\{g^{(r)}\}$ and sharp patches $\{f^{(r)}\}$ […] for each patch. This can be done using the direct frequency-domain method of Eq. (4.25). Next, we estimate the full deblurred image $\hat{f}$ that best matches the deblurred patches $\{\hat{f}^{(r)}\}$ by minimising $\sum_{r=1}^{P} \|m \circ C^{(r)} f - \hat{f}^{(r)}\|_2^2$. This problem can be solved independently for each pixel, yielding the following solution:

$$\hat{f} = \frac{\sum_r C^{(r)\top}\big(m \circ \hat{f}^{(r)}\big)}{\sum_r C^{(r)\top}\big(m \circ m\big)}, \qquad (4.28)$$

where the division is element-wise.
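The per-pixel recombination of Eq. (4.28) amounts to a weighted average of the overlapping deblurred patches. A minimal sketch follows, assuming square patches identified by their top-left corners; the small `eps` guard for pixels where the summed squared window is zero is an addition for numerical safety and is not part of the source, and the function name is hypothetical.

```python
import numpy as np

def recombine_patches(patches, window, corners, shape, eps=1e-12):
    """Merge deblurred patches into one image:
    sum_r reinsert(window o patch_r) / sum_r reinsert(window o window)."""
    patch = window.shape[0]
    numerator = np.zeros(shape)
    denominator = np.zeros(shape)
    for (y, x), deblurred in zip(corners, patches):
        numerator[y:y + patch, x:x + patch] += window * deblurred
        denominator[y:y + patch, x:x + patch] += window * window
    # eps only matters where no patch contributes any weight.
    return numerator / np.maximum(denominator, eps)
```

If every deblurred patch is exactly the windowed crop of some image, the recombination returns that image wherever the summed squared window is positive, which is a convenient consistency check.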

In Figure 4.6 we compare this method of non-blind deblurring to other possibilities for solving Eq. (4.20) with spatially-variant camera shake blur. The fast independent method produces results which are visually very similar to those obtained using the exact model, in a significantly shorter amount of time. Using this direct deconvolution method to update the latent image via Eq. (4.20) during blind deblurring provides an additional speed improvement, compared to the use of conjugate-gradient descent with the approximate forward model.

As a demonstration of the speed-up achievable with this approximation, the examples in Figures 4.4, 4.6 and 4.7 all took more than three hours for blind deblurring using the exact model (implemented in Matlab and C), compared to under six minutes using the approximation presented in this section (implemented in Matlab). The results shown in this chapter were all produced using the approximation. The reduction in computational complexity can be quantified by comparing the exact model in Eq. (4.13) with the approximate model in Eq. (4.22). Evaluating Eq. (4.13) requires $O(NK)$ operations, where $N$ is the number of pixels in the image and $K$ is the number of non-zeros in $w$, whereas Eq. (4.22) requires $O(N \log N)$ operations (due to the use of the FFT to compute the convolution). For a camera shake blur, $O(K) \sim O(\sqrt{N})$, and thus the approximation provides a significant reduction in computational complexity. For a complete discussion of the computational complexity, as well as a full derivation of the equations presented here, please refer to Whyte (2012).
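As a rough worked example of the two operation counts, the sketch below plugs in the 1024×768 image size quoted for Figure 4.6 and a hypothetical kernel with 1000 non-zero weights (an assumed value, chosen to match the "hundreds or thousands" mentioned earlier). Constant factors are ignored, so the ratio is indicative only and should not be read as a predicted speed-up.

```python
import math

def operation_counts(n_pixels, n_nonzero_weights):
    """Leading-order operation counts, ignoring constant factors."""
    exact = n_pixels * n_nonzero_weights           # O(NK): one warp per non-zero weight
    approximate = n_pixels * math.log2(n_pixels)   # O(N log N): FFT-based convolutions
    return exact, approximate

N = 1024 * 768   # image size quoted for Figure 4.6
K = 1000         # hypothetical number of non-zero kernel weights
exact, approximate = operation_counts(N, K)
ratio = exact / approximate   # equals K / log2(N)
```

For these values the ratio is simply $K / \log_2 N$, so it grows linearly with the number of non-zero kernel weights.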

Application to other non-blind deblurring algorithms

The fast, non-iterative method for non-blind deblurring presented in this section can be used as a building block in more sophisticated non-blind deblurring algorithms.

A number of recent algorithms, which place sparse-gradient priors on the deblurred image (Krishnan & Fergus 2009, Wang, Yang, Yin & Zhang 2008, Shan et al. 2008), or which use more robust data-reconstruction terms (Yan, Zhang & Yin 2009, Xu & Jia 2010), involve solving sub-problems of the form of Eq. (4.19), which has a quadratic data-reconstruction term and quadratic regularisation. As such, the non-iterative method presented in this section can be used to extend these more sophisticated algorithms to handle spatially-variant blur, without a substantial increase in processing time. The results shown in Section 4.6 use the algorithm of Krishnan & Fergus (2009), modified in this way, to perform the final non-blind deblurring.

(a) Blurred image and estimated blur kernel (b) Conjugate gradient descent (20 iterations), exact model, 442 s (c) Conjugate gradient descent (20 iterations), approximate model, 137 s (d) Direct independent deconvolution of patches, 4 s

Figure 4.6 Least-squares non-blind deblurring using the exact and approximate forward models. Given a blurred image of size 1024×768 and blur kernel (a), this figure shows the results and computation times for least-squares deconvolution with $\ell_2$ gradient regularisation, using (b) conjugate-gradient descent (CG) with the exact forward model, (c) CG with the approximate forward model, and (d) direct deconvolution using the approach described in Section 4.5.3. The results are visually similar using all three methods. Using CG with the approximate forward model is much faster than with the exact model; however, the direct approach takes only a fraction of the time of either of these.

(a) Blurred image (b) Some local PSFs for (d), magnified (c) MAP-$\ell_2$, uniform (d) MAP-$\ell_2$, non-uniform

Figure 4.7 Blind deblurring of real camera shake. The result of blind deblurring on a real camera shake image (a), captured with a shutter speed of 1 s, using the MAP approach of Cho & Lee with both the uniform and non-uniform blur models. Also shown in (b) are some of the local PSFs generated from the blur kernel in (d) at various points in the image. In the blurred image, most of the text on the book cover is too blurred to read. Deblurring the image with the uniform blur model (c) allows some of the text on the cover of the book to be read; however, after deblurring with our non-uniform model (d), all but the smallest text becomes legible. The estimated kernels for the two models are shown inset in the deblurred results. The blur kernel in (d) covers ±0.4 in $\theta_X$ and $\theta_Y$, and ±0.9 in $\theta_Z$. Reproduced from Whyte et al. (2012) with permission, © Springer 2012.