HAL Id: hal-01559030
https://hal.archives-ouvertes.fr/hal-01559030
Submitted on 17 Jul 2017
Implementation of an IMU Aided Image Stacking Algorithm in a Digital Camera for Unmanned Aerial Vehicles
Ahmad Audi, Marc Pierrot-Deseilligny, Christophe Meynard, Christian Thom
To cite this version:
Ahmad Audi, Marc Pierrot-Deseilligny, Christophe Meynard, Christian Thom. Implementation of an
IMU Aided Image Stacking Algorithm in a Digital Camera for Unmanned Aerial Vehicles. Sensors,
MDPI, 2017, 17 (7), pp. 1646. ⟨hal-01559030⟩
Implementation of an IMU Aided Image Stacking Algorithm in a Digital Camera for Unmanned Aerial Vehicles
Ahmad Audi 1,*, Marc Pierrot-Deseilligny 2, Christophe Meynard 1 and Christian Thom 1
1 Université Paris-Est, IGN, LaSTIG, LOEMI, 73 Avenue de Paris, 94160 Saint-Mandé, France;
ahmad.audi@ign.fr (A.A.); christophe.meynard@ign.com (C.M.); christian.thom@ign.com (CH.T.)
2 Université Paris-Est, IGN, LaSTIG, LOEMI, ENSG, 6-8 Avenue Blaise Pascal, 77420 Champs-sur-Marne, France; Marc.Pierrot-Deseilligny@ensg.eu (M.P-D.)
* Correspondence: ahmad.audi@ign.fr
Version July 7, 2017 submitted to Entropy
Abstract: Images acquired with a long exposure time using a camera embedded on UAVs (Unmanned Aerial Vehicles) exhibit motion blur due to the erratic movements of the UAV. The aim of the present work is to acquire several images with a short exposure time and to use an image processing algorithm to produce a stacked image with an equivalent long exposure time. Our method is based on a feature-based image registration technique. The algorithm is implemented on the light-weight IGN (Institut national de l'information géographique) camera, which has an IMU (Inertial Measurement Unit) sensor and a SoC (System on Chip)/FPGA (Field-Programmable Gate Array). To obtain the correct parameters for the resampling of the images, the proposed method accurately estimates the geometrical transformation between the first and the Nth images. Feature points are detected in the first image using the FAST (Features from Accelerated Segment Test) detector; homologous points in the other images are then obtained by template matching, using an initial position that benefits greatly from the presence of the IMU sensor. The SoC/FPGA in the camera is used to speed up some parts of the algorithm in order to achieve real-time performance, as our ultimate objective is to write only the resulting image, to save bandwidth on the storage device. The paper includes a detailed description of the implemented algorithm, a resource usage summary, processing times, resulting images and block diagrams of the described architecture. The resulting stacked image obtained for real surveys does not seem visually impaired. An interesting by-product of this algorithm is the 3D rotation between poses estimated by a photogrammetric method, which can be used to recalibrate the IMU gyrometers in real time. Timing results demonstrate that the image resampling part of this algorithm is the most demanding processing task and should also be accelerated in the FPGA in future work.

Keywords: UAVs, Stacking, Inertial Measurement Unit, Image processing, Photogrammetry, Real-time, Co-design.
1. Introduction
Over the last decade, Unmanned Aerial Vehicles (UAVs) have been widely used for civil applications [1], especially for environmental surveys [2], [3]. Their growing popularity has led the LOEMI team of IGN/LaSTIG to design and produce an ultra-light smart digital aerial camera, more suitable for photogrammetric applications than consumer cameras. This camera was originally used to carry out photogrammetric and metrologic surveys using UAVs, and some research works based on it are already underway, such as [4] and [5]. The next targeted applications involve shots under cloudy conditions, night-time surveys and narrow spectral bandwidth imagery, which usually imply a long exposure time. Images acquired with a long exposure time using a camera embedded on a UAV may exhibit motion blur due to the erratic movements of the UAV. This paper presents an image stacking algorithm that can provide an accurate blur-free composite image of photogrammetric quality with an equivalent long exposure time. An initial presentation of this work was given in [6].

In this context, the main part of our algorithm is based on feature-based image registration [7] [8] [9]. This technique consists of feature detection, feature matching, transformation estimation, and image transformation and resampling. The ultimate objective of our work is to achieve real-time processing of the algorithm, so that only the resulting image needs to be saved, therefore ensuring a better overall frame rate. The inertial measurement unit mounted inside the camera can give the relative orientations of the camera during shooting, quantifying an initial amount of motion between poses. This considerably speeds up the expensive matching of feature points, one of the most time-consuming tasks, making our stacking algorithm more appropriate for real-time performance. Images are acquired at a high frame rate, which implies minor deformation between images.

Two approaches are introduced in this paper. In the first approach, the UAV is assumed to be quasi-stationary during the exposures: only camera attitude variations are considered, and only the image deformation due to camera rotation is corrected. One interesting by-product of this method is a better estimation of the variation in the orientation of the camera between shots, which can be used to recalibrate the IMU gyrometers. Other sensors, such as lidar sensors on the UAV, can then benefit from the resulting better quality of the IMU data. The second approach uses the 2D planar homography transformation, taking into account not only the 3D rotation of the camera but also its translation, when the scene is nearly planar.

Our algorithm works completely automatically, whether offline or online. Tests have been carried out on several types of UAVs, including the Copter 1B UAV equipped with the IGN camera, to assess the accuracy of the method, and some of the results are presented.
2. Related Works
This work relates to two fields: image stacking and visual-inertial fusion techniques. The stacking technique creates a single image by combining many images that may be superimposed. One stacking method is HDR (High Dynamic Range) imaging, which combines multiple images with different exposure levels in order to obtain a single image with a large dynamic range [10], [11], [12]. These works assume that the images are superimposable, differing from our method on this basis.

Many visual-inertial fusion techniques have been proposed in the literature; we present some of them here. The authors of [13] suggest using inertial measurement sensors to measure the camera's acceleration and angular velocity during exposure, which can then be used to estimate the blur function and deconvolve the blurred image to achieve sharp results. Their method only works for very small movements, and many cited limitations, such as sensor accuracy and noise, constrain its usage. In the same context, the IMU is used in [14] during aerial shooting by a camera embedded on an aircraft to eliminate blur and provide a deblurred image. The authors' objective is close to ours, but their method only covers very small movements: their results show that images with more than 7 pixels of motion blur record minor improvements. Other works address the same objective using deblurring algorithms without IMU data: [15], [16] and [17].

Some studies in the literature expand on the combination of IMU and computer vision for tracking purposes. For example, a vision-assisted algorithm is combined to improve the positioning and tracking performance of mobile devices for SLAM (Simultaneous Localization And Mapping) applications [18]. Parts of this approach seem similar to ours: it uses IMU gyrometer measurements to estimate rotation angles and improve the accuracy of trajectory estimation, but it also uses reference nodes as control points, identified in the vision processing phase, to correct the propagation of IMU errors. Therefore, this method is not optimal if no reference nodes are available along the trajectory. The authors of [19], [20] and [21] use a combination of inertial sensors and a camera for motion tracking in augmented reality applications. Most sensor fusion works use inertial sensors to aid computer vision methods, making the system more accurate and more robust.

Some parts of our work have also been explored in other domains. For example, a real-time video stabilization algorithm for unmanned aerial vehicles is proposed in [22], but this method excludes any aiding sensors such as an IMU to improve the accuracy of motion estimation or reduce the tracking time of features in consecutive frames.
3. Materials and Methods
3.1. Hardware

The system presented in Figure 1 consists of a UAV equipped with an IGN lightweight photogrammetric camera [23]. The camera used to implement our algorithm is shown in Figure 2. This camera was developed at the French National Mapping Agency (IGN) by the LOEMI team.
Figure 1. IGN camera with the Copter 1B UAV developed by Wikhydro.
Figure 2. The CMOS sensor (CMV20000, 20 MP), the 6-axis IMU and the embedded intelligence (SoC/FPGA Zynq 7030 chip).
It is based on a global-shutter 20 MP CMOS sensor (CMV20000) from CMOSIS and a Zynq 7030 SoC/FPGA from Xilinx, allowing some image processing algorithms to be implemented in hardware, making them real-time compliant. It has both embedded intelligence, in the form of two ARM Cortex-A9 CPUs running a Linux OS and application software, and a programmable logic subsystem originally used to acquire image data from the sensor. A standard 35 mm optical lens is used for this purpose. The camera prototype we used was fitted with a Bayer sensor and has a resolution of 20 MP (5120×3840). Our work is intended for a hyperspectral imaging system consisting of several monochrome IGN cameras; we therefore use the 5 MP (2560×1920) grayscale images obtained from the original pixel stream by averaging each green pair of the Bayer pattern. Recently, LOEMI team engineers integrated a GNSS (Global Navigation Satellite System) receiver, which can be used in future work to take the UAV translation during shots into account.
Figure 3. The IMU and the CMOS sensor inside the IGN camera.
The ICM-20608-G IMU sensor from InvenSense used in the camera is a low-cost MEMS (Micro-Electro-Mechanical System) device. It measures angular speed around the three axes with low latency and at a high frequency. The main disadvantage of MEMS technology is the reduced performance in terms of accuracy and stability: the slowly evolving bias is the main type of error affecting the measurements, as it accumulates over time due to the integration of angular velocity. We will show that it can be estimated by sensor fusion and then eliminated from the subsystem. A laboratory thermal calibration is used to eliminate the temperature effects caused by the dissipation of heat inside the camera (75 °C). The IMU sensor is tightly fixed inside the camera to ensure that both sensors experience the same 3D rotation, as shown in Figure 3. We use the IMU subsystem to obtain an estimation of the absolute 3D rotation $R_{imu}^{n}$ of each pose acquired by the CMOS sensor.
3.2. Calibration of the Camera
In photogrammetric processes, cameras should be calibrated in order to take accurate 3D geometric measurements from their images. This calibration involves estimating their internal orientation. It is obtained using a camera self-calibration bundle adjustment method via MicMac, a free open-source photogrammetric software package developed by IGN [24]. The method used in MicMac is based on SIFT points and their matching relationships in stereo images; it is applied to each pair of multiple-view images taken with an uncalibrated camera. We use the standard radial distortion model [25] in our work.
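As a reminder of what a radial model of this kind looks like (the distortion centre, the number of coefficients and their values are estimated by the self-calibration and are not detailed here, so the expression below is only a generic sketch):

$$r^2 = (u - c_u)^2 + (v - c_v)^2, \qquad \begin{pmatrix} u_d \\ v_d \end{pmatrix} = \begin{pmatrix} c_u \\ c_v \end{pmatrix} + \bigl(1 + k_1 r^2 + k_2 r^4 + \dots\bigr) \begin{pmatrix} u - c_u \\ v - c_v \end{pmatrix},$$

where $(c_u, c_v)$ is the distortion centre and the $k_i$ are the radial coefficients.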
3.3. Images and Data Processing
This algorithm is implemented and executed in the camera and is intended to benefit from FPGA acceleration. It was initially developed on a desktop PC.
3.3.1. Algorithm
The architecture of the algorithm is illustrated in Figure 4.

Figure 4. Architecture of the algorithm.

• The camera attitude is computed using measurements from the IMU subsystem.
• A burst of 10 images is acquired by the camera and saved in RAM at the maximum frame rate of the CMOS sensor (30 images/s); the time difference between successive images is 33 ms.
• A reasonable number of feature points is detected in the first image. This part is implemented in hardware.
• For each of the following images:
  1. The predicted positions of the feature points in this image are computed using the IMU orientation, to speed up the next step.
  2. Template matching is used to determine the accurate homologous positions of these feature points.
  3. (a) The 3D rotation between the two poses is estimated using the least squares method with outlier elimination. This information may possibly be used as feedback in the IMU subsystem to improve its overall quality by determining the sensor drifts in real time.
     (b) The 2D homography transformation between the two images is estimated using the least squares method with outlier elimination.
  4. The image is resampled in the geometry of the first image.
  5. It is stacked.
• The stacked image is saved.
3.3.2. Features Detection
Feature detection is a fundamental task in computer vision applications such as motion detection, object recognition and tracking, 3D reconstruction and visual odometry. A feature can be defined as a pattern or distinct structure with special properties found in an image. Most of the time, these features are used in further processing steps. In our work, the feature detection algorithm was mainly selected on the basis of its ability to find a reasonable number of good features in a single image (leading by correlation to unambiguous matches in the other images) and of the possibility of implementing it easily in hardware. After analyzing the main existing feature detectors [26] [27] [28], we selected the FAST detector developed by [29].

This detector is many times faster in software than other existing detectors [30]; it is conceptually simple, so it can be easily implemented in hardware. In our application, the stability of the detector is not a concern, because feature points are only detected in the first image and then searched for by template matching in the other images. Another advantage is that FAST requires far fewer resources than other detectors while offering sufficient performance for our application.

On the other hand, the FAST detector depends, as many other detectors do, on a threshold, which must be set according to the image noise level, especially as FAST has been shown to be sensitive to high-level noise [26] [27]. However, in our camera, the noise level is made constant in RMS with respect to gray levels by using non-linear LUTs (Look-Up Tables), which translate the 12 bits of pixel data into 8 bits. We can then use a threshold value independent of the image brightness level and content.
3.3.3. Features Selection Approach
The authors of [31] show that FAST tends to detect a high number of feature points in highly textured zones but not in poorly textured ones. This means that the feature points may be concentrated in highly textured zones only, or in a specific zone of the image. This non-uniform distribution of points makes the estimation of the transformation (rotation or homography) between images less accurate [32]. To limit the high number of feature points in highly textured zones and to ensure, if possible, that even poorly textured zones obtain feature points when a low threshold value is used, we use a grid-adapted feature selection approach that maintains a compromise between achieving a more uniform distribution of feature points over the image and keeping the hardware implementation simple.

This approach involves partitioning the image into blocks and retaining only the first feature found by the detector in each block (see the sketch after Figure 5). It is easy to implement in hardware and improves the uniformity of the distribution of the features over the image, as shown in Figure 5: no feature points are detected in very poorly textured zones, but a grid-like pattern is observed in highly textured zones, where a high number of potential feature points exists.
Figure 5. Uniform distribution of features over the image. The threshold intensity value of the detector is set to 7. This image was acquired with a low exposure time but brightened artificially to clearly show the feature points. The grid used for the grid-adapted feature selection is 100×100 blocks.
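A minimal software sketch of this selection rule follows (the hardware selector described in the next sections applies the same logic directly on the pixel stream; the 100×100 grid is the value quoted above, and the feature list is assumed to be in detection order):

```python
import numpy as np

def grid_select(features, image_shape, grid=(100, 100)):
    """Keep only the first feature falling in each block of a grid.

    features    : iterable of (row, col) positions, in detection (scan) order
    image_shape : (height, width) of the image
    grid        : number of blocks (rows, cols)
    """
    h, w = image_shape
    block_h = int(np.ceil(h / grid[0]))
    block_w = int(np.ceil(w / grid[1]))
    taken = np.zeros(grid, dtype=bool)      # one flag per block, as in the hardware selector
    selected = []
    for r, c in features:
        br, bc = r // block_h, c // block_w
        if not taken[br, bc]:
            taken[br, bc] = True
            selected.append((r, c))
    return selected
```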
3.3.4. The FAST Algorithm
The FAST detector is based on a 7×7 pixel neighborhood centered on the candidate pixel p. The size (7×7) of the window is chosen in accordance with the original implementation of the FAST detector. The original FAST-N algorithm compares the intensity of a feature candidate p with each pixel on a 16-pixel Bresenham circle surrounding p. The candidate pixel is detected as a feature point if N contiguous pixels on the Bresenham circle of radius r are all brighter or all darker than p by a threshold t. Originally, N was chosen to be 12, as shown in Figure 6.
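To make the test concrete, here is a small software sketch of the FAST-12 decision for one candidate pixel (both the brighter and darker cases are tested here, whereas the simplified hardware version described below tests only the brighter case):

```python
import numpy as np

# 16 offsets (row, col) of the radius-3 Bresenham circle, in clockwise order.
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]

def has_n_contiguous(mask, n=12):
    """True if the circular boolean sequence contains n contiguous True values."""
    doubled = np.concatenate([mask, mask])   # unroll the circle to handle wrap-around
    run = 0
    for v in doubled:
        run = run + 1 if v else 0
        if run >= n:
            return True
    return False

def is_fast12_corner(img, r, c, t):
    """FAST-12 test for pixel (r, c) of a 2D grayscale image with threshold t."""
    p = float(img[r, c])
    ring = np.array([float(img[r + dr, c + dc]) for dr, dc in CIRCLE])
    return has_n_contiguous(ring > p + t, 12) or has_n_contiguous(ring < p - t, 12)
```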
Hardware Implementation
Processing high-resolution images is considerably expensive in terms of computing. The SoC/FPGA integrated into the IGN lightweight photogrammetric camera can provide a suitable solution to speed up this processing. Thanks to this SoC, which combines dual ARM cores and an FPGA, hardware and software can be co-designed, allowing some processing-intensive tasks to be executed in the programmable logic array and leaving the CPUs free for other tasks. Several hardware implementations of FAST have been proposed. For example, the authors of [33] not only test whether candidate pixels are feature points, but also compute a corner score for every detected feature, which is used in another module called NMS (Non-Maximal Suppression) to reduce the occurrence of adjacent features. Here, we present a simpler hardware implementation of FAST aiming to detect good feature points for the next processing stages, with the addition of the grid-adapted feature selection.

Figure 6. FAST-12 image feature detector illustration.
Hardware Architecture
The design has the following characteristics:

• Image dimensions and threshold are configured by software.
• Input pixel format: 8-bit greyscale value.
• The input rate is constant at 1 pixel/clock. For now, the image is read from memory for debugging reasons. The design could be improved to process the stream of pixels directly from the sensor to achieve real-time performance.
• The output (x, y) coordinates of the feature points (2×16 bits) are written to memory.
The design is divided into two modules: the FAST detector and the selector. The FAST detector module computes a validity flag for each pixel under investigation. The selector module picks the first valid feature detected by the FAST detector module in each block of the grid-adapted image. The "FAST detector" module is itself divided into two modules: the "(7×7) window" and the "corner test".
We need simultaneous access to the 16 pixels located on the Bresenham circle and the candidate centre pixel, requiring a 7×7 window containing the 17 pixels under investigation. For this purpose, we use 6 arrays as FIFO buffers whose depth is equal to the width of the image. Furthermore, we use (7×7) 8-bit registers to preserve the intensity values of the window pixels to be processed by the "corner test" module.

The "corner test" module only tests for 12 contiguous pixels whose intensity value is greater than the candidate pixel intensity value plus the threshold; the opposite case (pixels with an intensity value less than the candidate value minus the threshold) is not implemented. This is sufficient to obtain a reasonable number of good feature points. In the first clock cycle, the intensity value of each pixel located on the Bresenham circle is compared with the intensity value of the centre candidate pixel plus the threshold. The output of these 16 comparisons forms a 16-bit wide logic sequence of '0' and '1': '1' if the intensity value of the pixel in question is greater than the constant, and '0' in all other cases. In the next clock cycle, the 16-bit sequence is simultaneously compared with the 16 rotations of 12 contiguous '1's and 4 contiguous '0's. The output (validation signal) of the "corner test" module is '1' if this sequence matches one of the 16 combinations, and '0' in all other cases. The output coordinates are therefore delayed by two clock cycles.
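As a cross-check of this logic, a small software emulation of the two-cycle test (the pattern constants follow directly from the description above; this is not the HDL itself):

```python
def rotl16(x, n):
    """Rotate a 16-bit word left by n bits."""
    return ((x << n) | (x >> (16 - n))) & 0xFFFF

# The 16 valid patterns: 12 contiguous '1's (and 4 contiguous '0's), in every rotation.
VALID_PATTERNS = {rotl16(0x0FFF, n) for n in range(16)}

def corner_test(ring, centre, threshold):
    """Emulate the hardware corner test: build the 16-bit comparison word
    (bit i = 1 if ring[i] > centre + threshold) and match it against the
    16 rotations of twelve contiguous ones."""
    word = 0
    for i, v in enumerate(ring):             # first "clock cycle": 16 comparators
        if v > centre + threshold:
            word |= 1 << i
    return word in VALID_PATTERNS            # second "clock cycle": pattern match
```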
The "selector" module uses a flag register to deactivate the other valid features in each block. At the beginning of each grid row, the flag register is initialized to '0'. This module also detects the end of the image by testing the row and column input data. At the end of the image (last block), it sets the column and row output data to '0' and the feature-valid output to '1', so that the end of the feature list can be detected.
The Global Design
Figure 7. The global design of the grid-based FAST detector implementation.

The global design presented in Figure 7 was implemented and tested on the Xilinx/Zynq of the camera. Our solution uses the standard DDR RAM memory to store both the image and the coordinates of the feature points. The AXI bus is used to connect the processor system and the memory: it provides direct access to the DDR memory. The AXI-Stream bus transmits stream data between modules in master/slave modes. As a first step, to verify the design of this algorithm, the image in RAM is read by the first DMA engine and transmitted to the "grid-based FAST detector" module. The resulting feature point coordinates are written back to memory by the second DMA engine. The execution time of the global design is 0.024 s, as shown in Table 2; this is the time taken to transfer the (2560×1920) pixels from the RAM. In the final design, the process will be applied directly to the stream of pixels coming from the sensor.
Resource Usage for the FAST Detector Design
Table 1 lists the FPGA resources used by the implemented design. Compared with other proposed hardware implementations, our design uses far fewer FPGA resources. As an example, the architecture presented in [33] uses about 10 times more look-up tables, because it not only tests whether the candidate pixel is a feature point, but also calculates a corner quality score used in another module to select, among adjacent feature points, the one with the maximum score.
Table 1. Resource usage of the FAST detector design in the Xilinx/Zynq-7030 FPGA.

Resource                     Used    Available    Utilization (%)
Slice LUTs                   232     78600        0.30
Slice registers              472     157200       0.30
Memory (blocks of 36 KB)     6       265          2.26
Clocking                     1       32           3.13
3.4. Estimation of Predicted Positions of Homologous Points in Images
Using the internal orientation parameters produced by the optical calibration process and the 3D rotations between poses from the IMU, we can compute the predicted positions of the homologous points of the detected features in the other images, as illustrated in Figure 8.

Figure 8. The various steps required to obtain the approximate initial homologous point in another image. $R_{imu}^{n}$ is the 3D rotation obtained from the IMU measurements.
Let us consider a 2D point $P_i^1(u,v)$ in the plane of the first image. First, the geometric distortion is removed; $P_i^1(u,v)$ is then transformed into a 3D vector $\vec{V}_i^{\,1}(x,y,z)$ in the camera coordinate system, whose origin is at the optical centre of the camera. After applying the inverse of the 3D rotation $R_{imu}^{n}$ to $\vec{V}_i^{\,1}(x,y,z)$, we obtain a new vector $\vec{V}_i^{\,n}(x',y',z')$ that is projected into the plane of the Nth image, giving the corresponding 2D point $P_i^n(u',v')$ after re-applying the geometric distortion. The accuracy of the estimated 3D camera rotation $R_{imu}^{n}$ depends on the accuracy of the IMU. In fact, the main limitation of low-cost MEMS IMUs is the drift due to accumulated uncorrected bias. Consequently, this rotation $R_{imu}^{n}$ only provides an approximate position of the homologous points in the other images, as shown in Figure 9. We consider this predicted position as the centre of a search area in which the correct position is found by template matching, as illustrated in Figure 10. This step brings the algorithm closer to real-time performance.

Figure 9. Positions of the features predicted by the IMU for all images.

Figure 10. Rotation between three images.
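A compact sketch of this prediction for one feature point is given below. The distortion removal and re-application are omitted for brevity, f (focal length in pixels) and (cu, cv) (principal point) stand for the internal orientation of Section 3.2, and the convention assumed here is that the inverse of the IMU rotation maps rays from the first pose to pose n, as in the text:

```python
import numpy as np

def predict_position(p1, R_n_imu, f, cu, cv):
    """Predict the homologous position in image n of a feature point p1 = (u, v)
    detected in the first image, using the rotation given by the IMU.
    Geometric distortion is ignored in this sketch."""
    u, v = p1
    v1 = np.array([u - cu, v - cv, f])     # 3D ray of the point in the frame of image 1
    vn = R_n_imu.T @ v1                    # inverse rotation (orthonormal matrix => transpose)
    u_pred = cu + f * vn[0] / vn[2]        # re-projection into the plane of image n
    v_pred = cv + f * vn[1] / vn[2]
    return u_pred, v_pred
```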
3.5. Template Matching with ZNCC
Once the centres of the search areas in the other images have been estimated, the candidate homologous points can be found precisely by template matching, as shown in Figure 11 and Figure 12. Because the deformations between images are small, the correlation scores are high (close to the maximum correlation value of 1), so we apply a simple filter keeping only the pairs with a correlation score higher than 0.85, in order to eliminate possible outliers. We chose ZNCC (Zero-mean Normalized Cross-Correlation) [34] because it is a simple and reliable measure of the similarity between image patches and is not influenced by variations in brightness and contrast [35]. ZNCC does not perform well with significant rotation and change of scale between two images, but one of our assumptions in this work is that the rotation between images is small.

The feature point window and the 7×7 FAST detector window should have the same size, so as to obtain an accurate measurement of the degree of similarity between image patches through the correlation process. The size of the search areas depends on the accuracy/bias of the IMU gyrometers: it should be chosen large enough (11×11 pixels) between the first and second images to accommodate prediction errors, and can then be reduced for the subsequent images once the bias has been eliminated. This correlation could be implemented in hardware to enable real-time processing [36].

Figure 11. Feature points detected using the FAST algorithm in the first image; the threshold chosen here is 7. The number of grid blocks is defined as 100×100; image resolution: 2560×1920 pixels.
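For reference, a minimal sketch of the ZNCC score and of the exhaustive search over an 11×11 area (patch sizes and border handling are simplified assumptions):

```python
import numpy as np

def zncc(a, b, eps=1e-12):
    """Zero-mean Normalized Cross-Correlation between two patches of identical shape.
    Returns a value in [-1, 1]; insensitive to affine changes of brightness/contrast."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def match_in_search_area(template, image, centre, radius=5):
    """Offset (dr, dc) inside a (2*radius+1)^2 search area centred on `centre`
    that maximizes the ZNCC with `template` (odd-sized patch assumed, and the
    whole search area assumed to lie inside the image)."""
    h, w = template.shape
    best, best_off = -2.0, (0, 0)
    r0, c0 = centre
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = r0 + dr, c0 + dc
            patch = image[r - h // 2: r + h // 2 + 1, c - w // 2: c + w // 2 + 1]
            score = zncc(template, patch)
            if score > best:
                best, best_off = score, (dr, dc)
    return best_off, best
```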
3.5.1. The Sub-Pixel Correlation
One of the drawbacks of ZNCC is that its resolution is limited to one pixel, i.e. the peak location coordinates have integer values. Matching precision can, however, be improved by interpolating the cross-correlation surface to a higher spatial resolution in order to locate the correlation peak with sub-pixel accuracy [37]. The sub-pixel location can be approximated by fitting a two-dimensional model through the correlation coefficients around the peak location. Several models have been proposed in the research community, such as parabola fitting and Gaussian fitting [38] [39] [40]. In our work, we choose the simple parabolic curve fitting model that computes the sub-pixel peak location separately for the two orthogonal directions in which the curve reaches its maximum. The results presented in Figure 15 demonstrate that, with this simple model, the sub-pixel approach improves the precision and accuracy of ZNCC-based image matching, eliminating an average of 0.15 pixels² of residual variance with respect to pixel-level matching, so it is not necessary to use a more complex fitting model. Rounding a uniformly distributed random variable causes an error with a variance of 1/12 pixels²; since we have two independent variables x and y, the overall error has its variance increased by 1/6 pixels², i.e. about 0.17. We see that our method corrects nearly all of this noise.

Figure 12. Search areas obtained from the IMU gyrometers between the first image and the 10th image. The green points represent the homologous points obtained by template matching. The size of each search area is 11×11 pixels; image resolution: 2560×1920 pixels.
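Applied independently along the two axes around the integer peak, the parabolic refinement reduces to the classical three-point vertex formula; a minimal sketch:

```python
def parabolic_offset(c_m1, c_0, c_p1):
    """Sub-pixel offset of the extremum of a parabola fitted through three
    correlation scores at positions -1, 0, +1 around the integer peak:
    delta = (c_m1 - c_p1) / (2*(c_m1 - 2*c_0 + c_p1)), clamped to [-0.5, 0.5]."""
    denom = c_m1 - 2.0 * c_0 + c_p1
    if denom == 0.0:
        return 0.0
    return max(-0.5, min(0.5, 0.5 * (c_m1 - c_p1) / denom))

def refine_peak(scores, peak_r, peak_c):
    """Refine an integer correlation peak (peak_r, peak_c) in a 2D score array
    by fitting the parabola separately along the rows and the columns."""
    dr = parabolic_offset(scores[peak_r - 1, peak_c], scores[peak_r, peak_c], scores[peak_r + 1, peak_c])
    dc = parabolic_offset(scores[peak_r, peak_c - 1], scores[peak_r, peak_c], scores[peak_r, peak_c + 1])
    return peak_r + dr, peak_c + dc
```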
3.6. Estimation of Transformation between Images
Now that we have good homologous pairs, the next step is to establish the best estimate of a geometrical relation mapping pixel coordinates from one image to another. Two geometrical transformations can be used here: the 3D rotation between poses and the 2D homography between images. Which of the two is selected depends on the real movement of the camera.

The 3D rotation is used when the UAV is quasi-stationary and only attitude variations are considered; it improves the robustness of the system because it has only three degrees of freedom. This approach is not optimal if the real movement of the camera is not a pure rotation. Moreover, in this case, it is not possible to recalibrate the IMU gyrometers directly from this transformation.

The 2D homography is used when the camera movement consists of both rotation and translation. This transformation has eight degrees of freedom, which means that it can partially absorb the translation movement of the camera. On the other hand, this approach is more sensitive to errors and does not work perfectly when the scene is not planar.
3.6.1. Estimation of the 3D Camera Rotation Using the Least Squares Method
Given a set of 3D vectors $\vec{V}_i^{\,1}(x,y,z)$ and the corresponding set of 3D vectors $\vec{V}_i^{\,n}(x',y',z')$, the 3D rotation $R_{img}^{n}$ is estimated using the least squares method, which minimizes the errors between the estimated 3D vectors $\hat{\vec{V}}_i^{\,n}$ and the renormalized 3D vectors $\vec{V}_{i,corr}^{\,n}$ resulting from the correlation. The IMU can be used to obtain an initial solution, so only one iteration is needed to get close enough to the true solution; several more iterations are then used to automatically eliminate the outliers due to false correlations.

The relative 3D rotation between $R_{img}^{n}$ and $R_{imu}^{n}$ is used to estimate, in real time, most of the drift of the IMU bias, and thus to improve the accuracy of the estimation of the consecutive 3D rotation $R_{img}^{n+1}$.
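The estimator used here is an iterative least squares initialized with the IMU rotation. As an illustrative alternative for the same problem (finding the rotation that best aligns two bundles of direction vectors), the sketch below uses the classical closed-form SVD (Kabsch) solution with a simple rejection loop; the 0.1° threshold is an arbitrary assumption:

```python
import numpy as np

def fit_rotation(V1, Vn, inlier_mask=None):
    """Least-squares rotation R (3x3) minimizing sum ||Vn_i - R @ V1_i||^2 over
    corresponding unit vectors, via the classical SVD (Kabsch) solution.
    V1, Vn: (N, 3) arrays of normalized vectors."""
    if inlier_mask is not None:
        V1, Vn = V1[inlier_mask], Vn[inlier_mask]
    H = V1.T @ Vn                                    # 3x3 covariance of the correspondences
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # enforce det(R) = +1
    return Vt.T @ D @ U.T

def iterate_with_rejection(V1, Vn, n_iter=3, thresh_deg=0.1):
    """Re-estimate the rotation a few times, discarding correspondences whose
    angular residual exceeds a threshold (a simple stand-in for the outlier
    elimination described above)."""
    mask = np.ones(len(V1), dtype=bool)
    R = np.eye(3)
    for _ in range(n_iter):
        R = fit_rotation(V1, Vn, mask)
        cosines = np.clip(np.sum((V1 @ R.T) * Vn, axis=1), -1.0, 1.0)
        mask = np.degrees(np.arccos(cosines)) < thresh_deg
    return R, mask
```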
3.6.2. Estimation of the 2D Homography Using the Least Squares Method
With the 2D planar homography model, the geometrical transformation can be calculated easily using the least squares method, from the 2D points obtained by the FAST detector in the first image and their homologous points found by template matching in the Nth image. Outliers with errors greater than 3 pixels are automatically eliminated in order to obtain a robust estimation of the best fitting homography.
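The exact parameterization of the least squares is not detailed here, so the following sketch uses the standard DLT formulation of the same estimation, together with the pixel residual used for the 3-pixel outlier rejection:

```python
import numpy as np

def fit_homography(pts1, ptsN):
    """Least-squares estimate of the 3x3 homography H mapping pts1 -> ptsN
    (both (N, 2) arrays, N >= 4), using the standard DLT formulation."""
    A = []
    for (u, v), (up, vp) in zip(pts1, ptsN):
        A.append([-u, -v, -1, 0, 0, 0, up * u, up * v, up])
        A.append([0, 0, 0, -u, -v, -1, vp * u, vp * v, vp])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = Vt[-1].reshape(3, 3)                 # null-space vector = least-squares solution
    return H / H[2, 2]

def reprojection_errors(H, pts1, ptsN):
    """Pixel residuals used for the 3-pixel outlier rejection."""
    proj = np.hstack([pts1, np.ones((len(pts1), 1))]) @ H.T
    proj = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(proj - ptsN, axis=1)
```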
3.7. Resampling Images and Generating the Stacked Image
We use bilinear interpolation to find, in the Nth image, the resampled value of each pixel of the first image. The stacked intensity values are then saved with a coefficient that can be used to encode the values of the stacked image on 8 bits instead of 12 bits and, optionally, to increase the brightness of the image. This stage is the most computationally intensive task of the algorithm, as shown in Table 2. However, the process of mapping each pixel of the first image to the other images can be greatly accelerated by using a bilinear approximation. This approach consists of dividing the first image into M×N blocks, computing for each block the bilinear function that matches the real transformation on the corners of the block, and then using nearest-neighbour resampling for each pixel of the other images. We thus significantly reduce the number of arithmetic operations and memory accesses. This method is very effective in software, and we currently aim to implement it in hardware because it uses fewer resources (especially DSP resources) and requires fewer memory accesses.
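A software sketch of this block-wise approximation follows; the exact warp function (rotation or homography composed with the distortion model) is passed as a parameter, and the 64-pixel block size and the accumulation details are illustrative assumptions:

```python
import numpy as np

def resample_block_approx(first_shape, other_image, warp, block=64):
    """Resample `other_image` into the geometry of the first image.

    The exact mapping `warp(r, c) -> (r', c')` is evaluated only at the four
    corners of each block; inside the block it is replaced by a bilinear blend
    of these corner mappings, and pixel values are fetched with
    nearest-neighbour resampling."""
    h, w = first_shape
    out = np.zeros(first_shape, dtype=np.float64)
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            r1 = min(r0 + block, h) - 1          # last row/col of the block (inclusive)
            c1 = min(c0 + block, w) - 1
            corners = np.array([warp(r, c) for r, c in
                                [(r0, c0), (r0, c1), (r1, c0), (r1, c1)]])
            rr, cc = np.meshgrid(np.arange(r0, r1 + 1), np.arange(c0, c1 + 1), indexing="ij")
            fr = (rr - r0) / max(r1 - r0, 1)     # fractional position inside the block
            fc = (cc - c0) / max(c1 - c0, 1)
            rp = (corners[0, 0] * (1 - fr) * (1 - fc) + corners[1, 0] * (1 - fr) * fc +
                  corners[2, 0] * fr * (1 - fc) + corners[3, 0] * fr * fc)
            cp = (corners[0, 1] * (1 - fr) * (1 - fc) + corners[1, 1] * (1 - fr) * fc +
                  corners[2, 1] * fr * (1 - fc) + corners[3, 1] * fr * fc)
            ri = np.clip(np.rint(rp).astype(int), 0, other_image.shape[0] - 1)
            ci = np.clip(np.rint(cp).astype(int), 0, other_image.shape[1] - 1)
            out[r0:r1 + 1, c0:c1 + 1] = other_image[ri, ci]   # nearest-neighbour fetch
    return out
```

The stacked image is then simply the running sum of the first image and the resampled versions of the following images, rescaled by the coefficient mentioned above.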
4. Results
We performed a few experiments to quantify the accuracy and performance of our algorithm, using the hardware described in Section 3. Several datasets were acquired by the IGN photogrammetric camera; some of them are shown here. Each dataset consists of a sequence of ten images acquired at the maximum frame rate of the CMOS sensor (30 images/s), a wireless signal being transmitted to trigger the acquisition while the UAV is hovering. The exposure time is set to 1 ms to be closer to hyperspectral images with a low dynamic range.
4.1. Timing Results (in seconds)
All tests were run on two platforms: first, an HP Z210 workstation with an Intel Xeon 3.30 GHz processor and 8.0 GB of RAM; second, the IGN photogrammetric camera with its 666 MHz dual-core ARM Cortex-A9 processor.
Table 2. Timing results in seconds. (1): initial geometrical transformation with bilinear interpolation; (2): accelerated geometrical transformation using the bilinear function method with bilinear interpolation; (3): accelerated geometrical transformation using the bilinear function method with nearest-neighbour resampling.

                          Feature detection   Matching (402 points) &     Resampling (rotation)       Resampling (homography)
                          (402 points)        estimation of 3D rotation   (1)      (2)      (3)       (1)
PC (Soft)                 0.21                0.08                        4.72     0.94     0.66      4.00
PC (Soft/2 threads)       –                   –                           2.46     0.42     0.38      2.06
Camera (Soft)             1.64                0.83                        36.94    7.24     5.28      25.17
Camera (Hard)             0.02                –                           –        –        –         –
Camera (Soft/2 threads)   –                   –                           18.48    3.81     2.65      12.57
4.2. Accuracy of IMU

Figure 13 shows the 3D rotations respectively measured by the IMU and estimated by the photogrammetric method. We chose here a dataset in which the drift of the IMU bias is significant.

Figure 13. The two 3D rotations of the camera between poses (quaternion components qx, qy, qz from the IMU and from the photogrammetric estimation, and their differences).
4.3. Accuracy of the Transformation Estimation

4.3.1. Case of Pure-Rotation Camera Movement

Figure 14, Figure 15 and Figure 16 illustrate a case in which the UAV experienced a pure rotation movement. In order to explain the increase of the RMS errors over the images, we separate the calculation of the errors between grass regions and road regions.

Figure 14. Residuals of the estimation of the 3D rotation and of the 2D homography, with and without sub-pixel correlation, between the first image and all Nth images. (a) RMS errors for the 3D rotation; (b) RMS errors for the 2D homography.
Figure 15. RMS² difference between ZNCC without sub-pixel accuracy and ZNCC with sub-pixel accuracy, between the first image and all Nth images. (a) 3D rotation; (b) 2D homography.
4.3.2. Case of Non-Pure-Rotation Camera Movement

Figure 17 and Figure 18 illustrate a case in which the UAV underwent a Z-translation movement caused by a free fall.
4.4. Quality of the Resulting Image

Figure 19 shows the final stacked image resulting from our algorithm. The red part shows the rotation experienced by the camera between the first and the last image. It is possible here to use a brightness factor to illuminate the image.

Figure 16. Residuals of the 3D rotation estimation between the first image and all other images in the grass and road areas.

Figure 17. Residuals obtained using the transformation estimation between the first image and the 10th image in the case of a non-pure rotation movement of the camera; red arrows represent the outliers. (a) The case of the 3D rotation; (b) the case of the 2D homography.
5. Discussions
5.1. Timing Results

Table 2 shows that the resampling part of the algorithm is the most time-consuming task for both types of transformation. Benefiting from the dual-core processor in the camera, we used the OpenMP library to accelerate the image resampling part; with the number of threads set to 2, the processing time of this part was halved. The acceleration factor when using the nearest-neighbour resampling method is 7. This part of the process should be accelerated even further with the help of the programmable logic (FPGA) to achieve real-time performance.
Figure 18. Residuals of the estimation of the 3D rotation and the 2D homography between the first image and all other images in the case of UAV translation.

Figure 19. The stacked image. The red part is the non-common part between the first image and the 10th image.
5.2. Accuracy of IMU
Our results show that the errors of the components increase slowly and linearly over time, implying that the bias is constant over the duration of the sequence. This bias can then be estimated by computing the 3D rotation linking the attitude estimated by the photogrammetric method and the one obtained by the IMU. This rotation can then be used to obtain a better initial rotation for the following images, reducing the size of the search areas and therefore the correlation time, or to correct the bias values in the IMU subsystem, which could be beneficial to other sensors (e.g. LIDAR) on the UAV.

Figure 20. Example illustrating two subsections of two images: (a) is obtained by averaging the ten images, (b) is the stacked image. The angular motion between images is 1.2 degrees, corresponding to 59 pixels.
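A rough sketch of the bias estimation described above (the composition order and axis conventions are assumptions; dt is the time elapsed between the two poses):

```python
import numpy as np

def rotation_to_vector(R):
    """Axis-angle vector (radians) of a 3x3 rotation matrix (log map)."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return axis / (2.0 * np.sin(angle)) * angle

def gyro_bias_estimate(R_img, R_imu, dt):
    """Gyro-bias estimate (rad/s) from the discrepancy between the rotation
    estimated photogrammetrically (R_img) and the one integrated from the IMU
    (R_imu) over the elapsed time dt."""
    R_err = R_img @ R_imu.T        # residual rotation not explained by the IMU
    return rotation_to_vector(R_err) / dt
```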
5.3. Accuracy of Estimation of Transformations
5.3.1. Case of Pure Rotation Movement of the Camera

For both transformations, the RMS errors are less than 0.5 pixel for all the processed images. Our results presented in Figure 14 also show a clear improvement of accuracy when the sub-pixel correlation is used, for both transformations. Figure 15 demonstrates that the difference in variance between the sub-pixel correlation and the non-sub-pixel correlation is almost equal to 2/12 pixels², corresponding to the variance of the uniform rounding noise, which means that the sub-pixel correlation works as expected. Additionally, the estimation of the homography has smaller residuals than the rotation, consistent with the fact that the former has more degrees of freedom.

In both cases, the RMS values of the errors increase linearly over the images. Figure 16 shows that this is due to the movement of the grass next to the road: the residuals in the road region are constant and significantly lower. It also indicates that the noise of our process is only of the order of 0.2 pixels RMS.
5.3.2. Case of Non-pure Rotation Movement of the Camera
In this case, the estimation of the rotation with our photogrammetric method does not work optimally because the optical centre is moving, so the residual errors are larger than expected. They converge towards the centre of the image, as shown in Figure 17a, because the UAV is in free fall. These residuals, which cannot be compensated by the rotation, can be absorbed by the 2D homography, which includes a scale factor: Figure 18 demonstrates that the residual errors of the estimated homography are constant over the images and much smaller than those of the 3D rotation. However, as expected, they are larger in zones where the surface departs from the average plane of the scene, such as the rooftop shown in Figure 17b. This means that, for these parts of the image, the stacked image will be slightly blurred.

Thanks to the RMS error, we have a quality-control factor that helps our processing to determine which type of transformation should be used to obtain the best results.
5.4. Quality of Resulting Image
Figure 19 demonstrates visually that no motion blur mars the stacked image. Figure 20 shows that our method can be used to eliminate the motion blur caused by the erratic movements of the camera. Images can be improved to the point where the blur is almost undetectable, even if the movement of the camera is complex (Figure 9), i.e. when the trajectories of the predicted features are not the same over the entire image. Our method can be applied to images whose angular motion blur exceeds several pixels, unlike de-convolution methods [14], which have strict limits on the amount of angular motion.
6. Conclusion and Future Works
An image processing stacking algorithm has been presented, which consists of several steps. First, feature points are detected in the first image and matched using ZNCC correlation with the help of the IMU measurements; then, the 3D rotation (and/or the 2D homography) between sequential images is estimated using a photogrammetric method, which leads to a better accuracy. This geometrical transformation (3D rotation or 2D homography) is then used to resample the images, and the 3D rotation in particular can be used to recalibrate the IMU gyrometers by estimating their bias. This algorithm benefits from the presence of an FPGA to speed up some tasks, such as detecting feature points. The proposed method was tested with the IGN camera embedded on a UAV, and the experimental results show the efficiency and accuracy of this method when the UAV is quasi-stationary. The IMU data processing part is already real-time and can be used for other applications.

In the future, we will implement a real-time version by accelerating the image resampling part in the FPGA, since it is the most demanding part of the processing. We will refine all parameters to obtain good results in less time. We will also integrate the translation of the UAV in our model using GNSS/IMU data, which may later be used to correct translation deformation. Other sensors, such as lidar or other synchronized cameras, can benefit from the resulting better quality of the IMU data. In fact, this methodology is intended to be used on an airborne prototype imaging system with different narrow spectral channels, composed of 10 synchronized IGN cameras, which will make it possible to obtain images with an equivalent long exposure time.
Bibliography
1. Boucher, P. Domesticating the Drone: The Demilitarisation of Unmanned Aircraft for Civil Markets. Science and Engineering Ethics 2015, 21, 1393–1412.
2. Colomina, I.; Molina, P. Unmanned aerial systems for photogrammetry and remote sensing: A review. ISPRS Journal of Photogrammetry and Remote Sensing 2014, 92, 79–97.
3. Ham, Y.; Han, K.K.; Lin, J.J.; Golparvar-Fard, M. Visual monitoring of civil infrastructure systems via camera-equipped Unmanned Aerial Vehicles (UAVs): a review of related works. Visualization in Engineering 2016, 4, 1.
4. Daakir, M.; Pierrot-Deseilligny, M.; Bosser, P.; Pichard, F.; Thom, C.; Rabot, Y. Study of lever-arm effect using embedded photogrammetry and on-board GPS receiver on UAV for metrological mapping purpose and proposal of a free ground measurements calibration procedure. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2016, XL-3/W4, 65–70.
5. Tournadre, V.; Pierrot-Deseilligny, M.; Faure, P.H. UAV linear photogrammetry. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2015, XL-3/W3, 327–333.
6. Audi, A.; Pierrot-Deseilligny, M.; Meynard, C.; Thom, C. Implementation of real-time functionalities in the hardware of an intelligent ultra-light camera specialized for aerial photography. GeoUAV - ISPRS Geospatial Week 2015.
7. Le Moigne, J.; Netanyahu, N.S.; Eastman, R.D., Eds. Image Registration for Remote Sensing; Cambridge University Press: Cambridge, New York, 2010.
8. Lakshmi, K.D.; Vaithiyanathan, V. Study of Feature based Image Registration Algorithms for Navigation of Unmanned Aerial Vehicles. Indian Journal of Science and Technology 2015, 8.
9. Patel, P.; Shah, V. Image Registration Techniques: A Comprehensive Survey. International Journal of Innovative Research and Development 2014.
10. Mann, S.; Picard, R.W. On Being 'undigital' With Digital Cameras: Extending Dynamic Range By Combining Differently Exposed Pictures. Society for Imaging Science and Technology Proceedings 1995, pp. 442–448.
11. Debevec, P.E.; Malik, J. Recovering high dynamic range radiance maps from photographs. Society for Imaging Science and Technology Proceedings 1997, pp. 369–378.
12. Robertson, M.A.; Borman, S.; Stevenson, R.L. Dynamic Range Improvement Through Multiple Exposures. Society for Imaging Science and Technology Proceedings 1999, pp. 159–163.
13. Joshi, N.; Kang, S.B.; Zitnick, C.L.; Szeliski, R. Image Deblurring Using Inertial Measurement Sensors. ACM Trans. Graph. 2010, 29, 30:1–30:9.
14. Shah, C.A.; Schickler, W. Automated blur detection and removal in airborne imaging systems using IMU data. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2012, XXXIX-B1, 321–323.
15. Sieberth, T.; Wackrow, R.; Chandler, J.H. Automatic isolation of blurred images from UAV image sequences. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2013, XL-1/W2, 361–366.
16. Sieberth, T.; Wackrow, R.; Chandler, J.H. UAV image blur – its influence and ways to correct it. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2015, XL-1/W4, 33–39.
17. Klein, G.; Drummond, T. Tightly Integrated Sensor Fusion for Robust Visual Tracking. Proc. British Machine Vision Conference (BMVC'02), 2002, pp. 787–796.
18. Tsai, F.; Chang, H.; Su, A.Y.S. Combining MEMS-based IMU data and vision-based trajectory estimation. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2014, XL-4, 267–271.
19. Hol, J.D.; Schön, T.B.; Gustafsson, F.; Slycke, P.J. Sensor Fusion for Augmented Reality. 9th International Conference on Information Fusion, Florence, 2006.
20. You, S.; Neumann, U. Fusion of vision and gyro tracking for robust augmented reality registration. Proceedings IEEE Virtual Reality 2001, pp. 71–78.
21. Caarls, J.; Jonker, P.; Persa, S. Sensor Fusion for Augmented Reality. In Ambient Intelligence: First European Symposium, EUSAI 2003, Veldhoven, The Netherlands, November 3–4, 2003, Proceedings; Aarts, E.; Collier, R.W.; van Loenen, E.; de Ruyter, B., Eds.; Springer Berlin Heidelberg: Berlin, Heidelberg, 2003; pp. 160–176.
22. Wang, Y.; Hou, Z.; Leman, K.; Chang, R. Real-Time Video Stabilization for Unmanned Aerial Vehicles. MVA 2011 IAPR Conference on Machine Vision Applications, June 13–15, 2011, Nara, Japan, 2011.
23. Martin, O.; Meynard, C.; Pierrot-Deseilligny, M.; Souchon, J.P.; Thom, C. Réalisation d'une caméra photogrammétrique ultralégère et de haute résolution. Colloque Drones et moyens légers aéroportés d'observation, Montpellier, France, 2014.
24. Pierrot-Deseilligny, M.; Clery, I. APERO, an open source bundle adjusment software for automatic calibration and orientation of set of images. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2011, XXXVIII-5/W16, 269–276.
25. Hartley, R.; Kang, S.B. Parameter-free Radial Distortion Correction with Centre of Distortion Estimation. Technical report, 2005.
26. Pena, M. A Comparative Study of Three Image Matching Algorithms: Sift, Surf, and Fast; BiblioBazaar, 2012.
27. Le, X.; Gonzalez, R. Pattern-based corner detection algorithm. Proceedings of the 6th International Symposium on Image and Signal Processing and Analysis (ISPA 2009), 2009, pp. 238–243.