
6.5 Extensions to low-light imaging

Our hybrid-speed sensor assumes sufficient light. Under low light, however, the high-speed cameras can incur severe noise due to insufficient exposure and produce incorrect blur kernel estimates. In this section, we briefly discuss our recent work on extending the hybrid-speed sensor to low-light imaging. We modify the sensor by combining a single high-speed monochrome (HS-M) camera, a pair of high-resolution monochrome (HR-M) cameras, and a single high-resolution RGB color (HR-C) camera. The HR-M cameras use a wide aperture and a relatively slow shutter to acquire images with a low level of noise, whereas the HS-M camera uses a fast shutter to capture motion-blur-free images of fast-moving objects. The HR-C camera captures the color information of the scene and uses a slow shutter to reduce color noise. We call this sensor the heterogeneous sensor.

Figure 6.9 Our heterogeneous sensor for low-light imaging. (a) The heterogeneous sensor (camera array) consists of four cameras: a Grasshopper (monochrome), two Flea2 (monochrome), and a Flea2 (color); (b) and (c) show the mounting of the sensor for indoor and outdoor applications. [Adopted from Li et al. (2013).]

6.5.1 Sensor construction

Figure 6.9 shows the construction of the heterogeneous sensor: it consists of one Point Grey Grasshopper high-speed monochrome (HS-M) camera (top left), one Flea2 high-resolution color (HR-C) camera (top right), and two Point Grey Flea2 high-resolution monochrome (HR-M) cameras (bottom). All the cameras are equipped with the same Rainbow 16 mm C-mount F1.4 lens. We mount the four cameras on a T-slotted aluminum grid, which is then mounted on two conventional tripods for indoor applications. To deal with the long working range of outdoor applications, we also build a giant "tripod" from a six-foot step ladder to hold the camera array grid, as shown in Figure 6.9(c). Since the observable regions of the cameras overlap substantially, we use Zhang's algorithm (Zhang 2000) directly for camera calibration.
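
For readers who want to reproduce the calibration step, the sketch below shows one way to apply Zhang's method to a set of checkerboard images using OpenCV, whose calibrateCamera routine implements this algorithm. The board geometry, checker size, and file paths are illustrative assumptions rather than values from our setup.

    # Minimal sketch: intrinsic calibration of one camera via Zhang's method,
    # as implemented by OpenCV. Board geometry and file names are assumed.
    import glob
    import cv2
    import numpy as np

    board_size = (9, 6)      # inner corners of the checkerboard (assumed)
    square_size = 0.025      # checker edge length in meters (assumed)

    # 3D coordinates of the board corners in the board's own frame
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    objp *= square_size

    obj_points, img_points = [], []
    for fname in glob.glob("calib/hsm_*.png"):   # hypothetical calibration set
        gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_points.append(objp)
            img_points.append(corners)

    # Zhang's method: recover intrinsics K, distortion, and per-view extrinsics
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    print("reprojection RMS:", rms)

The same procedure is repeated for each camera; extrinsics between camera pairs can then be recovered from views of the same board.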

We build a single workstation with an Intel SC5400BASE-NA chassis to control and synchronize these cameras and to stream the image data to the storage device. All the cameras are connected to this workstation via PCI-E FireWire adaptors. The use of a FireWire bus allows us to synchronize the cameras using Point Grey software. In our system, the HS-M camera captures 640×480×8 bit images at 120 fps, the two HR-M cameras work at 1024×768×8 bit at 30 fps, and the Flea2 color camera captures color images of size 1024×768×24 bit at 7.5 fps. To stream these ∼100 MB/s of image data, we connect the workstation to an external SATA disk array as the data storage device, which is capable of writing data at 500 MB/s when configured as RAID 0, as shown in Figure 6.9(b).

Figure 6.10 Simultaneous deblurring and denoising using our heterogeneous sensor. [Adopted from Li et al. (2013).]
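
As a quick sanity check on the quoted bandwidth, the snippet below tallies the raw data rate implied by the frame sizes and frame rates listed above (a back-of-the-envelope figure that ignores packet and file-system overhead).

    # Back-of-the-envelope aggregate data rate for the heterogeneous sensor.
    # Each entry: (width, height, bytes per pixel, frames per second, cameras).
    streams = [
        (640, 480, 1, 120, 1),    # HS-M, 8-bit monochrome
        (1024, 768, 1, 30, 2),    # two HR-M, 8-bit monochrome
        (1024, 768, 3, 7.5, 1),   # HR-C, 24-bit color
    ]
    total = sum(w * h * bpp * fps * n for w, h, bpp, fps, n in streams)
    print(f"aggregate rate: {total / 1e6:.1f} MB/s")   # about 102 MB/s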

To improve the SNR in low-light imaging, we choose to use two HR-M cameras since, for the same sensor, they gather significantly more light than the HR-C camera. This is because monochrome sensors do not use a Bayer color filter array, which typically blocks approximately two-thirds of the incoming light at every pixel. Monochrome sensors are also responsive to a much wider spectral range and hence have higher light efficiency. In addition, we use wide apertures on the monochrome cameras.

However, the large apertures give the resulting images a shallow DoF, i.e. they exhibit strong defocus blur in out-of-focus regions. Our solution is to use two HR-M cameras focused on different parts of the scene to extend the DoF.

6.5.2 Processing pipeline

In our heterogeneous sensor, every color frame $I^{hrc}_i$ captured by the HR-C camera maps to four pairs of synchronized high-resolution grayscale frames $I^{hrm}_{p,j}$ ($p = 4i, \ldots, 4i+3$; $j = 0, 1$) captured by the two HR-M cameras, and to 16 synchronized low-resolution grayscale images $I^{hsm}_q$ ($q = 16i, \ldots, 16i+15$) captured by the HS-M camera. Figure 6.10 shows our system pipeline for fusing the imagery data. Specifically, we use the HR-M images as the spatial prior to denoise the HS-M and HR-C images. We then estimate motion flow using the denoised HS-M sequence to deblur the HR-C images.
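
Because the three streams run at 7.5, 30, and 120 fps, the frame indices are related by simple integer factors. The helper below makes the grouping explicit; the function name is ours, and the index conventions follow the notation above.

    # Map an HR-C frame index i to its synchronized HR-M and HS-M frame indices.
    # The ratios follow the frame rates: 30/7.5 = 4 and 120/7.5 = 16.
    def synchronized_indices(i):
        hrm = [(p, j) for p in range(4 * i, 4 * i + 4) for j in (0, 1)]  # 4 pairs
        hsm = list(range(16 * i, 16 * i + 16))                           # 16 frames
        return hrm, hsm

    hrm, hsm = synchronized_indices(0)
    assert len(hrm) == 8 and len(hsm) == 16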

The most challenging task in the pipeline is to denoise the HS-M images using the HR-M camera pair. Recall that the HR-M and HS-M sensors use different exposure settings; we therefore first conduct feature-preserving tone mapping (Mertens, Kautz & Van Reeth 2007) on the low dynamic range HS-M image to match the intensity level of the HR-M images. The tone-mapped HS-M image has enhanced contrast but still exhibits strong noise. Hence we use the HR-M images to denoise the HS-M ones.
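
The exact feature-preserving tone-mapping step is beyond a short snippet; as a simplified stand-in, the sketch below matches the histogram of the underexposed HS-M frame to that of an HR-M reference, which brings the two intensity levels into rough agreement. Note that this histogram matching replaces, for illustration only, the exposure-fusion-based approach of Mertens et al. (2007) used in our pipeline.

    # Simplified intensity matching: histogram-match a noisy HS-M frame (2-D
    # array) to an HR-M reference. A stand-in for feature-preserving tone mapping.
    import numpy as np

    def match_exposure(hsm_frame, hrm_reference):
        src = hsm_frame.ravel()
        ref = np.sort(hrm_reference.ravel())
        ranks = np.argsort(np.argsort(src))        # rank of each source pixel
        quantiles = ranks / max(src.size - 1, 1)   # map ranks to [0, 1]
        matched = np.interp(quantiles, np.linspace(0.0, 1.0, ref.size), ref)
        return matched.reshape(hsm_frame.shape)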

However, the three sensors are separated by relatively large baselines and their images exhibit strong parallax. We therefore conduct a patch-based denoising scheme: for each noisy patch in the HS-M image, we first locate its corresponding patches in the HR-M images via multi-view block matching, and then impose them as spatial priors in total variation (TV) based denoising (Li et al. 2013).
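
To make the matching step concrete, the sketch below shows a stripped-down version of it: for a noisy patch in the (tone-mapped) HS-M image, search a local window of one HR-M image for the best-matching clean patch, which would then serve as a spatial prior in the TV-based denoiser. The patch size and search radius are illustrative; the full method in Li et al. (2013) matches against both HR-M views and accounts for the parallax between the cameras.

    # Stripped-down multi-view block matching: find, for a noisy HS-M patch,
    # the best-matching patch in one HR-M image. Patch size and search radius
    # are illustrative choices, not the values used in the actual system.
    import numpy as np

    def best_matching_patch(noisy, clean, y, x, patch=8, radius=16):
        """Return the clean patch best matching noisy[y:y+patch, x:x+patch]."""
        target = noisy[y:y + patch, x:x + patch]
        best, best_cost = None, np.inf
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy <= clean.shape[0] - patch and 0 <= xx <= clean.shape[1] - patch:
                    cand = clean[yy:yy + patch, xx:xx + patch]
                    cost = np.sum((cand - target) ** 2)   # SSD after intensity matching
                    if cost < best_cost:
                        best, best_cost = cand, cost
        return best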

The analysis of motion flows in the HS-M sequence also enables the separation of static versus moving regions in the HR-C image. For the static regions, we only need to conduct denoising, and we apply the same TV-based denoising scheme using patches from the HR-M cameras. To deblur the moving regions, we adopt an approach similar to Section 6.3 to estimate the PSF in the HR-C image. The main difference is that the HR-C image is corrupted by both blur and noise under low light. As a result, direct deconvolution can lead to large errors. Our approach is to apply a simultaneous deblur/denoise scheme (Wang, Yang, Yin & Zhang 2008, Krishnan & Fergus 2009, Li et al. 2013).
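
For reference, the TV deconvolution model of Wang et al. (2008) that underlies this step minimizes a data-fidelity term plus a total-variation prior; augmenting it with the patch prior obtained from the HR-M images gives an energy of roughly the following form, where the patch operator P, the prior patches Q, and the weight γ are our shorthand for the formulation in Li et al. (2013):

    \min_{I} \; \sum_{x} \|\nabla I(x)\|_{2}
        \;+\; \frac{\mu}{2}\,\| K \otimes I - B \|_{2}^{2}
        \;+\; \frac{\gamma}{2}\,\| P(I) - Q \|_{2}^{2}

Here B is the blurred, noisy HR-C observation, K the estimated PSF, and I the latent sharp image; setting K to the identity reduces the energy to the pure denoising problem used for the static regions.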

6.5.3 Preliminary results

We have conducted preliminary experiments on our heterogeneous sensor.

Synthetic scenes

To compare our hybrid denoising algorithm with BM3D quantitatively, we generate a pseudo multi-camera array and render the scene using Autodesk 3ds Max. Specifically, the HR-M cameras are rendered at a resolution of 1024×768 with aperture f/8, the HS-M camera at 640×480 with aperture f/22, and the HR-C camera at 1024×768, also with aperture f/22. All cameras have a focal length of 135 mm. We construct a scene with a paper cup moving at a constant velocity of 66.18 cm/s parallel to the image plane of the sensor. We focus one HR-M camera on the front of the table and the second one on the background orchid. To simulate motion blur, we render 16 frames from the HR-C camera and average them as the motion-blurred HR-C image. To simulate noisy HS-M and HR-C images, we add zero-mean Gaussian noise of variance 0.09 and 0.03, respectively.
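
The simulation of the degraded inputs can be summarized in a few lines: the motion-blurred HR-C frame is the average of the 16 rendered sub-frames, and the noisy observations are obtained by adding zero-mean Gaussian noise with the variances stated above. The sketch below assumes the rendered frames are already loaded as floating-point arrays in [0, 1].

    # Synthesize the degraded inputs from rendered frames (floats in [0, 1]).
    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_motion_blur(hrc_subframes):
        """Average the 16 rendered HR-C sub-frames to emulate motion blur."""
        return np.mean(hrc_subframes, axis=0)

    def add_gaussian_noise(image, variance):
        """Add zero-mean Gaussian noise and clip back to the valid range."""
        noisy = image + rng.normal(0.0, np.sqrt(variance), image.shape)
        return np.clip(noisy, 0.0, 1.0)

    # Variances follow the synthetic experiment: 0.09 for HS-M, 0.03 for HR-C.
    # blurred_hrc = add_gaussian_noise(simulate_motion_blur(hrc_subframes), 0.03)
    # noisy_hsm   = add_gaussian_noise(hsm_frame, 0.09)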

Figure 6.11 shows our denoised HS-M images. We compare our approach with BM3D (Dabov, Foi, Katkovnik & Egiazarian 2007) using the peak signal-to-noise ratio (PSNR) in decibels (dB). Our algorithm not only outperforms BM3D by 0.24 dB but also better preserves fine details, such as the texture of the table, the text on the paper cup, and the contour of the background orchid. In Figure 6.12, we show our deblurring/denoising result on the synthetic HR-C image. We first estimate the motion information of the paper cup from the denoised HS-M image sequence and then compute the PSF to conduct non-blind deconvolution of the HR-C image. Finally, we apply our hybrid denoising technique. The deblurring/denoising process is applied to individual color channels and the results are then combined. Our approach is able to partially recover the text on the moving cup, even though it is corrupted by noise. It also increases the PSNR from 17.69 dB to 22.95 dB on the HR-C image.
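
The PSNR values quoted above follow the standard definition; for images normalized to [0, 1], a minimal implementation is:

    # Peak signal-to-noise ratio in dB for images normalized to [0, 1].
    import numpy as np

    def psnr(estimate, ground_truth, peak=1.0):
        mse = np.mean((estimate - ground_truth) ** 2)
        return 10.0 * np.log10(peak ** 2 / mse)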

Figure 6.11 Patch-based denoising on a sample image in a synthetic scene. From left to right: the synthesized low-light image captured by the HS-M camera, the BM3D denoising result, and our result. [Adopted from Li et al. (2013).]

Figure 6.12 Image deblurring/denoising on a sample HR-C image. From left to right: the motion-blurred HR-C image, our deblurring/denoising result, and closeup views of the moving object. [Adopted from Li et al. (2013).]

Real scenes

Figure 6.13 shows an indoor scene of a toy train moving in front of a cluttered background. We focus one HR-M camera on the toy train and chocolate box at the front, and the second on the book in the background. Defocus variations due to the large apertures can be clearly observed in (b) and (c). The use of large apertures, however, significantly reduces the noise. In fact, the HR-M images exhibit very low noise and hence are suitable for denoising the HS-M and HR-C images using our patch-based denoising.

Specifically, we first boost the contrast of the HS-M image (a) via feature-preserving tone mapping; the result is shown in (d). However, the closeup views show that the contrast-enhanced result is still severely degraded. By applying our denoising algorithm, we can significantly reduce the noise while preserving local details such as the text on the background book and the contour of the foreground train. In contrast, the results using BM3D appear overly smooth and exhibit artifacts near edge boundaries. For example, the closeup view (e) shows that the BM3D result exhibits a distinctive diagonal scanline noise pattern (possibly caused by the rolling shutter).


Figure 6.13 Patch-based denoising on a sample image of an indoor scene. (a) shows the low-light image captured by the HS-M camera; (b) and (c) show the large-aperture stereo image pair from the HR-M cameras; (d) shows the contrast-enhanced version of (a); (e) and (f) show the denoising results of (d) by BM3D and by our method, respectively. [Adopted from Li et al. (2013).]

In Figure 6.14, we show the deblurred result on the HR-C image. We first use the denoised HS-M sequence to estimate the PSF of the motion-blurred region in the HR-C image and then simultaneously denoise/deblur the region. In this example, we apply the multi-view block matching algorithm only on the green channel, because a typical Bayer array contains twice as many green sensors as red or blue ones, and the green channel is therefore less noisy. Our approach is able to reconstruct the contours of the moving train, partially recover the text on it, and significantly reduce the noise.
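
As an illustration of the PSF estimation step, if the motion recovered from the denoised HS-M sequence is roughly linear over the HR-C exposure, the kernel can be rasterized as a short line segment. The sketch below builds such a kernel from an assumed image-plane velocity (in pixels per second) and exposure time; it is a simplification of the flow-based PSF estimation described in Section 6.3.

    # Rasterize a linear-motion PSF from an image-plane velocity (pixels/s)
    # and an exposure time. A simplified stand-in for flow-based PSF estimation.
    import numpy as np

    def linear_motion_psf(velocity, exposure, samples=64):
        vx, vy = velocity
        dx, dy = vx * exposure, vy * exposure               # displacement over the exposure
        size = max(int(np.ceil(max(abs(dx), abs(dy)))) | 1, 3)  # odd kernel size
        psf = np.zeros((size, size))
        c = size // 2
        for t in np.linspace(-0.5, 0.5, samples):           # sample along the blur streak
            xi, yi = int(round(c + t * dx)), int(round(c + t * dy))
            if 0 <= xi < size and 0 <= yi < size:
                psf[yi, xi] += 1.0
        return psf / psf.sum()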

6.6 Discussion and summary

We have presented two multi-sensor fusion techniques for reducing motion blur under sufficient-light and low-light conditions. Our solution leverages the latest advances in high-speed, high-resolution, and multi-spectral sensors by integrating multiple types of sensors into a unified imaging system. The multi-sensor fusion solution eliminates the need for the co-axial setups or shift-invariant kernels required by hybrid-imaging systems. It actively exploits the parallax across the sensors for simultaneous depth estimation and image deblurring/denoising. A downside of our solution, however, is that the system tends to be bulky and less portable, since three or more sensors are generally needed.

Figure 6.14 Deblurring/denoising result on a sample HR-C frame. Left: the HR-C frame (contrast-enhanced) corresponding to the HS-M/HR-M sequences in Figure 6.13; right: the deblurred/denoised result. [Adopted from Li et al. (2013).]

Our hybrid-speed sensor aims to robustly handle fast motion under sufficient lighting conditions. The sensor consists of a pair of high-speed color (HS-C) cameras and a single high-resolution color (HR-C) camera. The HS-C cameras are able to capture fast motion with little motion blur. They also form a stereo pair and can estimate a low-resolution depth map. We estimate the motion flows in the HS-C cameras and then warp them, using the depth map, onto the HR-C camera to obtain the PSFs for motion deblurring. The HR-C image, once deblurred, is then used to super-resolve the depth map.

We have also demonstrated preliminary work on extending our hybrid-speed sensor to low-light imaging. We configured a heterogeneous sensor by combining a high-speed monochrome (HS-M) camera, two high-resolution monochrome (HR-M) cameras, and a single high-resolution RGB color (HR-C) camera. The HR-M cameras use large apertures to gather more light. The HS-M camera captures fast motion without motion blur but produces noisy images. The HR-C camera provides color information about the scene using a slow shutter but incurs strong motion blur. We strategically fuse the heterogeneous imagery data from the sensors to conduct simultaneous denoising and deblurring. We are currently conducting more thorough experiments on both synthetic and real scenes.

Our multi-sensor fusion solutions can potentially be applied to a range of imaging tasks beyond deblurring. For example, if we ignore motion blur, we can simplify the hybrid-speed sensor to a hybrid-resolution stereo camera by coupling one high-resolution color camera with one low-resolution monochrome camera. Our recent work (Yu, Thorpe, Yu, Grauer-Gray, Li & Yu 2011) has shown that such sensors can be used to recover high-quality depth maps at interactive speeds and, in addition, to synthesize dynamic refocusing effects. Our hybrid-speed sensor can also be fine-tuned to acquire high-resolution, high-speed 3D videos. In particular, we can adjust the frame rate and the resolution of the sensors to enable more accurate PSF estimation and reliable deblurring. More sophisticated super-resolution techniques, such as dictionary-learning-based methods, can be used to generate high-resolution depth maps for creating 3D video content.

Acknowledgements

Part of this chapter is based on the work that appeared in Li et al. (2008) and we gratefully acknowledge IEEE for their permission to reproduce large portions here.

References

Baker, S. & Matthews, I. (2004). Lucas–Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3), 221–55.

Ben-Ezra, M. & Nayar, S. (2004). Motion-based motion deblurring. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 689–98.

Ben-Ezra, M. & Nayar, S. K. (2003). Motion deblurring using hybrid imaging. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 657–64.

Bergen, J. R., Anandan, P., Hanna, K. J. & Hingorani, R. (1992). Hierarchical model-based motion estimation. In Proceedings of the Second European Conference on Computer Vision, pp. 237–52.

Boykov, Y. & Funka-Lea, G. (2006). Graph cuts and efficient N-D image segmentation. International Journal of Computer Vision, 70(2), 109–31.

Dabov, K., Foi, A., Katkovnik, V. & Egiazarian, K. (2007). Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8), 2080–95.

Eisemann, E. & Durand, F. (2004). Photography enhancement via intrinsic relighting. ACM Transactions on Graphics, 23(3), 673–8.

Fergus, R., Singh, B., Hertzmann, A., Roweis, S. T. & Freeman, W. T. (2006). Removing camera shake from a single photograph. ACM Transactions on Graphics, 25(3), 787–94.

Kolmogorov, V. & Zabih, R. (2002). Multi-camera scene reconstruction via graph cuts. In Proceedings of the 7th European Conference on Computer Vision, Part III, pp. 82–96.

Kopf, J., Cohen, M. F., Lischinski, D. & Uyttendaele, M. (2007). Joint bilateral upsampling. ACM Transactions on Graphics, 26(3), 96:1–10.

Krishnan, D. & Fergus, R. (2009). Fast image deconvolution using hyper-Laplacian priors. In Neural Information Processing Systems Conference, 22, 1–9.

Li, F., Ji, Y. & Yu, J. (2013). A Hybrid Camera Array for Low Light Imaging. University of Delaware Technical Report UD-CIS-2013-01.

Li, F., Yu, J. & Chai, J. (2008). A hybrid camera for motion deblurring and depth map super-resolution. In Computer Vision and Pattern Recognition, pp. 1–8.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

Mertens, T., Kautz, J. & Van Reeth, F. (2007). Exposure fusion. In IEEE 15th Pacific Conference on Computer Graphics and Applications, pp. 382–90.

Petschnigg, G., Szeliski, R., Agrawala, M., Cohen, M., Hoppe, H. & Toyama, K. (2004). Digital photography with flash and no-flash image pairs. ACM Transactions on Graphics, 23(3), 664–72.

Tai, Y.-W., Du, H., Brown, M. S. & Lin, S. (2010). Correction of spatially varying image and video motion blur using a hybrid camera. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(6), 1012–28.

Tomasi, C. & Manduchi, R. (1998). Bilateral filtering for gray and color images. In Proceedings of the 6th International Conference on Computer Vision, pp. 839–46.

Wang, Y., Yang, J., Yin, W. & Zhang, Y. (2008). A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences, 1(3), 248–72.

Yang, Q., Yang, R., Davis, J. & Nister, D. (2007). Spatial-depth super resolution for range images. In Computer Vision and Pattern Recognition, pp. 1–8.

Yitzhaky, Y., Mor, I., Lantzman, A. & Kopeika, N. S. (1998). Direct method for restoration of motion-blurred images. Journal of the Optical Society of America A: Optics, Image Science & Vision, 15(6), 1512–19.

Yu, Z., Thorpe, C., Yu, X., Grauer-Gray, S., Li, F. & Yu, J. (2011). Dynamic depth of field on live video streams: a stereo solution. In Computer Graphics International, pp. 1–9.

Zhang, Z. (2000). A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330–4.
