HAL Id: hal-01559030
https://hal.archives-ouvertes.fr/hal-01559030
Submitted on 17 Jul 2017
Implementation of an IMU Aided Image Stacking Algorithm in a Digital Camera for Unmanned Aerial Vehicles
Ahmad Audi, Marc Pierrot-Deseilligny, Christophe Meynard, Christian Thom
To cite this version:
Ahmad Audi, Marc Pierrot-Deseilligny, Christophe Meynard, Christian Thom. Implementation of an
IMU Aided Image Stacking Algorithm in a Digital Camera for Unmanned Aerial Vehicles. Sensors,
MDPI, 2017, 17 (7), pp. 1646. ⟨hal-01559030⟩
Implementation of an IMU Aided Image Stacking Algorithm in a Digital Camera for Unmanned Aerial Vehicles
Ahmad Audi 1,*, Marc Pierrot-Deseilligny 2, Christophe Meynard 1 and Christian Thom 1
1 Université Paris-Est, IGN, LaSTIG, LOEMI, 73 Avenue de Paris, 94160 Saint-Mandé, France;
ahmad.audi@ign.fr (A.A.); christophe.meynard@ign.com (C.M.); christian.thom@ign.com (CH.T.)
2 Université Paris-Est, IGN, LaSTIG, LOEMI, ENSG, 6-8 Avenue Blaise Pascal, 77420 Champs-sur-Marne, France; Marc.Pierrot-Deseilligny@ensg.eu (M.P-D.)
* Correspondence: ahmad.audi@ign.fr
Version July 7, 2017 submitted to Entropy
Abstract: Images acquired with a long exposure time using a camera embedded on UAVs (Unmanned Aerial Vehicles) exhibit motion blur due to the erratic movements of the UAV. The aim of the present work is to acquire several images with a short exposure time and to use an image processing algorithm to produce a stacked image with an equivalent long exposure time. Our method is based on a feature-based image registration technique. The algorithm is implemented on the light-weight IGN (Institut national de l'information géographique) camera, which has an IMU (Inertial Measurement Unit) sensor and a SoC (System on Chip)/FPGA (Field-Programmable Gate Array). To obtain the correct parameters for the resampling of the images, the proposed method accurately estimates the geometrical transformation between the first and the Nth images. Feature points are detected in the first image using the FAST (Features from Accelerated Segment Test) detector; homologous points in the other images are then obtained by template matching, using an initial position that benefits greatly from the presence of the IMU sensor. The SoC/FPGA in the camera is used to speed up some parts of the algorithm in order to achieve real-time performance, as our ultimate objective is to write only the resulting image, to save bandwidth on the storage device. The paper includes a detailed description of the implemented algorithm, a resource usage summary, processing times, resulting images and block diagrams of the described architecture. The resulting stacked image obtained for real surveys does not seem visually impaired. An interesting by-product of this algorithm is the 3D rotation between poses estimated by a photogrammetric method, which can be used to recalibrate the IMU gyrometers in real time. Timing results demonstrate that the image resampling part of this algorithm is the most demanding processing task and should also be accelerated in the FPGA in future work.

Keywords: UAVs, Stacking, Inertial Measurement Unit, Image processing, Photogrammetry, Real-time, Co-design.
1. Introduction
Over the last decade, Unmanned Aerial Vehicles (UAVs) have been widely used for civil applications [1], especially for environmental surveys [2], [3]. Their growing popularity has led the LOEMI team of IGN/LaSTIG to design and produce an ultra-light smart digital aerial camera, more suitable for photogrammetric applications than consumer cameras. This camera was originally used to carry out photogrammetric and metrologic surveys using UAVs, and some research works based on it are already underway, such as [4] and [5]. The next targeted applications involve shots under cloudy conditions, night-time surveys and narrow spectral bandwidth imagery, which usually imply a long exposure time. Images acquired with a long exposure time using a camera embedded on a UAV may exhibit motion blur due to the erratic movements of the UAV. This paper presents an image stacking algorithm that can provide an accurate blur-free composite image of photogrammetric quality with an equivalent long exposure time. An initial presentation of this work was given in [6].

In this context, the main part of our algorithm is based on feature-based image registration [7] [8] [9]. This technique consists of feature detection, feature matching, transformation estimation, and image transformation and resampling. The ultimate objective of our work is to achieve real-time processing of the algorithm, so that only the resulting image needs to be saved, therefore ensuring a better overall frame rate. The inertial measurement unit mounted inside the camera can give the relative orientations of the camera during shooting, quantifying an initial amount of motion between poses. This considerably speeds up the expensive matching of feature points, one of the most time-consuming tasks, making our stacking algorithm more appropriate for real-time performance. Images are acquired at a high frame rate, which implies minor deformation between images.

Two approaches are introduced in this paper. In the first approach, the UAV is assumed to be quasi-stationary during the exposures: only camera attitude variations are considered, and only the image deformation due to camera rotation is corrected. One interesting by-product of this method is a better estimation of the variation in the orientation of the camera between shots, which can be used to recalibrate the IMU gyrometers. Other sensors, such as lidar sensors on the UAV, can then benefit from the resulting better quality of the IMU data. The second approach uses the 2D planar homography transformation, taking into account not only the 3D rotation of the camera but also its translation, when the scene is nearly planar.

Our algorithm works completely automatically, whether offline or online. Tests have been carried out on several types of UAVs, including the Copter 1B UAV equipped with the IGN camera, to assess the accuracy of the method, and some of the results are presented.
2. Related Works
This work relates to two fields: image stacking and visual-inertial fusion techniques. The stacking technique creates a single image by combining many images that may be superimposed. One stacking method is HDR (High Dynamic Range) imaging, which combines multiple images with different exposure levels in order to obtain a single image with a large dynamic range [10], [11], [12]. These works assume that the images are superimposable, differing from our method on this basis.

Many visual-inertial fusion techniques have been proposed in the literature; we present some of them here. The authors of [13] suggest using inertial measurement sensors to measure the camera's acceleration and angular velocity during exposure, which can then be used to estimate the blur function and deconvolve the blurred image to achieve sharp results. Their method only works for very small movements, and many cited limitations, such as sensor accuracy and noise, constrain its usage. In the same context, the IMU is used in [14] during aerial shooting by a camera embedded on an aircraft to eliminate blur and provide a deblurred image. The authors' objective is close to ours, but their method only covers very small movements: their results show that images with more than 7 pixels of motion blur record minor improvements. Other works address the same objective using deblurring algorithms without IMU data: [15], [16] and [17].

Some studies in the literature expand on the combination of IMU and computer vision for tracking purposes. For example, a vision-assisted algorithm is combined to improve the positioning and tracking performance of mobile devices for SLAM (Simultaneous Localization And Mapping) applications [18]. Parts of this approach seem similar to ours: it uses IMU gyrometer measurements to estimate rotation angles and improve the accuracy of trajectory estimation, but it also uses reference nodes as control points, identified in the vision processing phase, to correct the propagation of IMU errors. Therefore, this method is not optimal if no reference nodes are available along the trajectory. The authors of [19], [20] and [21] use a combination of inertial sensors and a camera for motion tracking in augmented reality applications. Most sensor fusion works use inertial sensors to aid computer vision methods, making the system more accurate and more robust.

Some parts of our work have also been explored in other domains. For example, a real-time video stabilization algorithm for unmanned aerial vehicles is proposed in [22], but this method excludes any aiding sensors such as an IMU to improve the accuracy of motion estimation or reduce the tracking time of features in consecutive frames.
3. Materials and Methods
3.1. Hardware

The system presented in Figure 1 consists of a UAV equipped with an IGN lightweight photogrammetric camera [23]. The camera used to implement our algorithm is shown in Figure 2. This camera was developed at the French National Mapping Agency (IGN) by the LOEMI team.
Figure 1. IGN camera with the Copter 1B UAV developed by Wikhydro.
Figure 2. The CMOS sensor (CMV20000, 20 MP), the 6-axis IMU and the embedded intelligence (SoC/FPGA Zynq 7030 chip).
It is based on a global-shutter 20 MP CMOS sensor (CMV20000) from CMOSIS and a Zynq 7030 SoC/FPGA from Xilinx, allowing some image processing algorithms to be implemented in hardware, making them real-time compliant. It has both embedded intelligence, in the form of two ARM Cortex-A9 CPUs running a Linux OS and application software, and a programmable logic subsystem originally used to acquire image data from the sensor. A standard 35 mm optical lens is used for this purpose. The camera prototype we used was fitted with a Bayer sensor and has a resolution of 20 MP (5120×3840). Our work is intended for a hyperspectral imaging system consisting of several monochrome IGN cameras; we therefore use the 5 MP (2560×1920) grayscale images obtained from the original pixel stream by averaging each green pair of the Bayer pattern. Recently, LOEMI team engineers integrated a GNSS (Global Navigation Satellite System) receiver, which can be used in future work to take the UAV translation during shots into account.
Figure 3. The IMU and the CMOS sensor inside the IGN camera.
The ICM-20608-G IMU sensor from InvenSense used in the camera is a low-cost MEMS (Micro-Electro-Mechanical System) device. It measures angular speed around the three axes with low latency and at a high frequency. The main disadvantage of MEMS technology is the reduced performance in terms of accuracy and stability: the slowly evolving bias is the main type of error affecting the measurements, as it accumulates over time due to the integration of angular velocity. We will show that it can be estimated by sensor fusion and then eliminated from the subsystem. A laboratory thermal calibration is used to eliminate the temperature effects caused by the dissipation of heat inside the camera (75 °C). The IMU sensor is tightly fixed inside the camera to ensure that both sensors experience the same 3D rotation, as shown in Figure 3. We use the IMU subsystem to obtain an estimation of the absolute 3D rotation $R_{imu}^{n}$ of each pose acquired by the CMOS sensor.
3.2. Calibration of the Camera
In photogrammetric processes, cameras should be calibrated in order to take accurate 3D geometric measurements from their images. This calibration involves estimating their internal orientation. It is obtained using a camera self-calibration bundle adjustment method via MicMac, a free open-source photogrammetric software package developed by IGN [24]. The method used in MicMac is based on SIFT points and their matching relationships in stereo images; it is applied to each pair of multiple-view images taken with an uncalibrated camera. We use the standard radial distortion model [25] in our work.
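As a reminder of what a radial model of this kind looks like (the distortion centre, the number of coefficients and their values are estimated by the self-calibration and are not detailed here, so the expression below is only a generic sketch):

$$r^2 = (u - c_u)^2 + (v - c_v)^2, \qquad \begin{pmatrix} u_d \\ v_d \end{pmatrix} = \begin{pmatrix} c_u \\ c_v \end{pmatrix} + \bigl(1 + k_1 r^2 + k_2 r^4 + \dots\bigr) \begin{pmatrix} u - c_u \\ v - c_v \end{pmatrix},$$

where $(c_u, c_v)$ is the distortion centre and the $k_i$ are the radial coefficients.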
3.3. Images and Data Processing
This algorithm is implemented and executed in the camera and is intended to benefit from FPGA acceleration. It was initially developed on a desktop PC.
3.3.1. Algorithm
The architecture of the algorithm is illustrated in Figure 4.

Figure 4. Architecture of the algorithm.

• The camera attitude is computed using measurements from the IMU subsystem.
• A burst of 10 images is acquired by the camera and saved in RAM at the maximum frame rate of the CMOS sensor (30 images/s); the time difference between successive images is 33 ms.
• A reasonable number of feature points is detected in the first image. This part is implemented in hardware.
• For each of the following images:
  1. The predicted positions of the feature points in this image are computed using the IMU orientation, to speed up the next step.
  2. Template matching is used to determine the accurate homologous positions of these feature points.
  3. (a) The 3D rotation between the two poses is estimated using the least squares method with outlier elimination. This information may possibly be used as feedback in the IMU subsystem to improve its overall quality by determining the sensor drifts in real time.
     (b) The 2D homography transformation between the two images is estimated using the least squares method with outlier elimination.
  4. The image is resampled in the geometry of the first image.
  5. It is stacked.
• The stacked image is saved.
3.3.2. Features Detection
Feature detection is a fundamental task in computer vision applications such as motion detection, object recognition and tracking, 3D reconstruction and visual odometry. A feature can be defined as a pattern or distinct structure with special properties found in an image. Most of the time, these features are used in further processing steps. In our work, the feature detection algorithm was mainly selected on the basis of its ability to find a reasonable number of good features in a single image (leading by correlation to unambiguous matches in the other images) and of the possibility of implementing it easily in hardware. After analyzing the main existing feature detectors [26] [27] [28], we selected the FAST detector developed by [29].

This detector is many times faster in software than other existing detectors [30]; it is conceptually simple, so it can be easily implemented in hardware. In our application, the stability of the detector is not a concern, because feature points are only detected in the first image and then searched for by template matching in the other images. Another advantage is that FAST requires far fewer resources than other detectors while offering sufficient performance for our application.

On the other hand, the FAST detector depends, as many other detectors do, on a threshold, which must be set according to the image noise level, especially as FAST has been shown to be sensitive to high-level noise [26] [27]. However, in our camera, the noise level is made constant in RMS with respect to gray levels by using non-linear LUTs (Look-Up Tables), which translate the 12 bits of pixel data into 8 bits. We can then use a threshold value independent of the image brightness level and content.
3.3.3. Features Selection Approach
The authors of [31] show that FAST tends to detect a high number of feature points in highly textured zones but not in poorly textured ones. This means that the feature points may be concentrated in highly textured zones only, or in a specific zone of the image. This non-uniform distribution of points makes the estimation of the transformation (rotation or homography) between images less accurate [32]. To limit the high number of feature points in highly textured zones and to ensure, if possible, that even poorly textured zones obtain feature points when a low threshold value is used, we use a grid-adapted feature selection approach that maintains a compromise between achieving a more uniform distribution of feature points over the image and keeping the hardware implementation simple.

This approach involves partitioning the image into blocks and retaining only the first feature found by the detector in each block (see the sketch after Figure 5). It is easy to implement in hardware and improves the uniformity of the distribution of the features over the image, as shown in Figure 5: no feature points are detected in very poorly textured zones, but a grid-like pattern is observed in highly textured zones, where a high number of potential feature points exists.
Figure 5. Uniform distribution of features over the image. The threshold intensity value of the detector is set to 7. This image was acquired with a low exposure time but brightened artificially to clearly show the feature points. The grid used for the grid-adapted feature selection is 100×100 blocks.
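A minimal software sketch of this selection rule follows (the hardware selector described in the next sections applies the same logic directly on the pixel stream; the 100×100 grid is the value quoted above, and the feature list is assumed to be in detection order):

```python
import numpy as np

def grid_select(features, image_shape, grid=(100, 100)):
    """Keep only the first feature falling in each block of a grid.

    features    : iterable of (row, col) positions, in detection (scan) order
    image_shape : (height, width) of the image
    grid        : number of blocks (rows, cols)
    """
    h, w = image_shape
    block_h = int(np.ceil(h / grid[0]))
    block_w = int(np.ceil(w / grid[1]))
    taken = np.zeros(grid, dtype=bool)      # one flag per block, as in the hardware selector
    selected = []
    for r, c in features:
        br, bc = r // block_h, c // block_w
        if not taken[br, bc]:
            taken[br, bc] = True
            selected.append((r, c))
    return selected
```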
3.3.4. The FAST Algorithm
The FAST detector is based on a 7×7 pixel neighborhood centered on the candidate pixel p. The size (7×7) of the window is chosen in accordance with the original implementation of the FAST detector. The original FAST-N algorithm compares the intensity of a feature candidate p with each pixel on a 16-pixel Bresenham circle surrounding p. The candidate pixel is detected as a feature point if N contiguous pixels on the Bresenham circle of radius r are all brighter or all darker than p by a threshold t. Originally, N was chosen to be 12, as shown in Figure 6.
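To make the test concrete, here is a small software sketch of the FAST-12 decision for one candidate pixel (both the brighter and darker cases are tested here, whereas the simplified hardware version described below tests only the brighter case):

```python
import numpy as np

# 16 offsets (row, col) of the radius-3 Bresenham circle, in clockwise order.
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]

def has_n_contiguous(mask, n=12):
    """True if the circular boolean sequence contains n contiguous True values."""
    doubled = np.concatenate([mask, mask])   # unroll the circle to handle wrap-around
    run = 0
    for v in doubled:
        run = run + 1 if v else 0
        if run >= n:
            return True
    return False

def is_fast12_corner(img, r, c, t):
    """FAST-12 test for pixel (r, c) of a 2D grayscale image with threshold t."""
    p = float(img[r, c])
    ring = np.array([float(img[r + dr, c + dc]) for dr, dc in CIRCLE])
    return has_n_contiguous(ring > p + t, 12) or has_n_contiguous(ring < p - t, 12)
```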
Hardware Implementation
Processing high-resolution images is considerably expensive in terms of computing. The SoC/FPGA integrated into the IGN lightweight photogrammetric camera can provide a suitable solution to speed up this processing. Thanks to this SoC, which combines dual ARM cores and an FPGA, hardware and software can be co-designed, allowing some processing-intensive tasks to be executed in the programmable logic array and leaving the CPUs free for other tasks. Several hardware implementations of FAST have been proposed. For example, the authors of [33] not only test whether candidate pixels are feature points, but also compute a corner score for every detected feature, which is used in another module called NMS (Non-Maximal Suppression) to reduce the occurrence of adjacent features. Here, we present a simpler hardware implementation of FAST aiming to detect good feature points for the next processing stages, with the addition of the grid-adapted feature selection.

Figure 6. FAST-12 image feature detector illustration.
Hardware Architecture
The design has the following characteristics:

• Image dimensions and threshold are configured by software.
• Input pixel format: 8-bit greyscale value.
• The input rate is constant at 1 pixel/clock. For now, the image is read from memory for debugging reasons. The design could be improved to process the stream of pixels directly from the sensor to achieve real-time performance.
• The output (x, y) coordinates of the feature points (2×16 bits) are written to memory.
The design is divided into two modules: the FAST detector and the selector. The FAST detector module computes a validity flag for each pixel under investigation. The selector module picks the first valid feature detected by the FAST detector module in each block of the grid-adapted image. The "FAST detector" module is itself divided into two modules: the "(7×7) window" and the "corner test".
We need simultaneous access to the 16 pixels located on the Bresenham circle and the candidate centre pixel, requiring a 7×7 window containing the 17 pixels under investigation. For this purpose, we use 6 arrays as FIFO buffers whose depth is equal to the width of the image. Furthermore, we use (7×7) 8-bit registers to preserve the intensity values of the window pixels to be processed by the "corner test" module.

The "corner test" module only tests for 12 contiguous pixels whose intensity value is greater than the candidate pixel intensity value plus the threshold; the opposite case (pixels with an intensity value less than the candidate value minus the threshold) is not implemented. This is sufficient to obtain a reasonable number of good feature points. In the first clock cycle, the intensity value of each pixel located on the Bresenham circle is compared with the intensity value of the centre candidate pixel plus the threshold. The output of these 16 comparisons forms a 16-bit wide logic sequence of '0' and '1': '1' if the intensity value of the pixel in question is greater than the constant, and '0' in all other cases. In the next clock cycle, the 16-bit sequence is simultaneously compared with the 16 rotations of 12 contiguous '1's and 4 contiguous '0's. The output (validation signal) of the "corner test" module is '1' if this sequence matches one of the 16 combinations, and '0' in all other cases. The output coordinates are therefore delayed by two clock cycles.
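As a cross-check of this logic, a small software emulation of the two-cycle test (the pattern constants follow directly from the description above; this is not the HDL itself):

```python
def rotl16(x, n):
    """Rotate a 16-bit word left by n bits."""
    return ((x << n) | (x >> (16 - n))) & 0xFFFF

# The 16 valid patterns: 12 contiguous '1's (and 4 contiguous '0's), in every rotation.
VALID_PATTERNS = {rotl16(0x0FFF, n) for n in range(16)}

def corner_test(ring, centre, threshold):
    """Emulate the hardware corner test: build the 16-bit comparison word
    (bit i = 1 if ring[i] > centre + threshold) and match it against the
    16 rotations of twelve contiguous ones."""
    word = 0
    for i, v in enumerate(ring):             # first "clock cycle": 16 comparators
        if v > centre + threshold:
            word |= 1 << i
    return word in VALID_PATTERNS            # second "clock cycle": pattern match
```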
The "selector" module uses a flag register to deactivate the other valid features in each block. At the beginning of each grid row, the flag register is initialized to '0'. This module also detects the end of the image by testing the row and column input data. At the end of the image (last block), it sets the column and row output data to '0' and the feature-valid output to '1', so that the end of the feature list can be detected.
The Global Design
Figure 7. The global design of the grid-based FAST detector implementation.

The global design presented in Figure 7 was implemented and tested on the Xilinx/Zynq of the camera. Our solution uses the standard DDR RAM memory to store both the image and the coordinates of the feature points. The AXI bus is used to connect the processor system and the memory: it provides direct access to the DDR memory. The AXI-Stream bus transmits stream data between modules in master/slave modes. As a first step, to verify the design of this algorithm, the image in RAM is read by the first DMA engine and transmitted to the "grid-based FAST detector" module. The resulting feature point coordinates are written back to memory by the second DMA engine. The execution time of the global design is 0.024 s, as shown in Table 2; this is the time taken to transfer the (2560×1920) pixels from the RAM. In the final design, the process will be applied directly to the stream of pixels coming from the sensor.
Resource Usage for the FAST Detector Design
Table 1 lists the FPGA resources used by the implemented design. Compared with other proposed hardware implementations, our design uses far fewer FPGA resources. As an example, the architecture presented in [33] uses about 10 times more look-up tables, because it not only tests whether the candidate pixel is a feature point, but also calculates a corner quality score used in another module to select, among adjacent feature points, the one with the maximum score.
Table 1. Resource usage of the FAST detector design in the Xilinx/Zynq-7030 FPGA.

Resource                     Used    Available    Utilization (%)
Slice LUTs                   232     78600        0.30
Slice registers              472     157200       0.30
Memory (blocks of 36 KB)     6       265          2.26
Clocking                     1       32           3.13
3.4. Estimation of Predicted Positions of Homologous Points in Images
Using the internal orientation parameters produced by the optical calibration process and the 3D rotations between poses from the IMU, we can compute the predicted positions of the homologous points of the detected features in the other images, as illustrated in Figure 8.

Figure 8. The various steps required to obtain the approximate initial homologous point in another image. $R_{imu}^{n}$ is the 3D rotation obtained from the IMU measurements.
Let us consider a 2D point $P_i^1(u,v)$ in the plane of the first image. First, the geometric distortion is removed; $P_i^1(u,v)$ is then transformed into a 3D vector $\vec{V}_i^{\,1}(x,y,z)$ in the camera coordinate system, whose origin is at the optical centre of the camera. After applying the inverse of the 3D rotation $R_{imu}^{n}$ to $\vec{V}_i^{\,1}(x,y,z)$, we obtain a new vector $\vec{V}_i^{\,n}(x',y',z')$ that is projected into the plane of the Nth image, giving the corresponding 2D point $P_i^n(u',v')$ after re-applying the geometric distortion. The accuracy of the estimated 3D camera rotation $R_{imu}^{n}$ depends on the accuracy of the IMU. In fact, the main limitation of low-cost MEMS IMUs is the drift due to accumulated uncorrected bias. Consequently, this rotation $R_{imu}^{n}$ only provides an approximate position of the homologous points in the other images, as shown in Figure 9. We consider this predicted position as the centre of a search area in which the correct position is found by template matching, as illustrated in Figure 10. This step brings the algorithm closer to real-time performance.

Figure 9. Positions of the features predicted by the IMU for all images.

Figure 10. Rotation between three images.
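A compact sketch of this prediction for one feature point is given below. The distortion removal and re-application are omitted for brevity, f (focal length in pixels) and (cu, cv) (principal point) stand for the internal orientation of Section 3.2, and the convention assumed here is that the inverse of the IMU rotation maps rays from the first pose to pose n, as in the text:

```python
import numpy as np

def predict_position(p1, R_n_imu, f, cu, cv):
    """Predict the homologous position in image n of a feature point p1 = (u, v)
    detected in the first image, using the rotation given by the IMU.
    Geometric distortion is ignored in this sketch."""
    u, v = p1
    v1 = np.array([u - cu, v - cv, f])     # 3D ray of the point in the frame of image 1
    vn = R_n_imu.T @ v1                    # inverse rotation (orthonormal matrix => transpose)
    u_pred = cu + f * vn[0] / vn[2]        # re-projection into the plane of image n
    v_pred = cv + f * vn[1] / vn[2]
    return u_pred, v_pred
```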
3.5. Template Matching with ZNCC
Once the centres of the search areas in the other images have been estimated, the candidate homologous points can be found precisely by template matching, as shown in Figure 11 and Figure 12. Because the deformations between images are small, the correlation scores are high (close to the maximum correlation value of 1), so we apply a simple filter keeping only the pairs with a correlation score higher than 0.85, in order to eliminate possible outliers. We chose ZNCC (Zero-mean Normalized Cross-Correlation) [34] because it is a simple and reliable measure of the similarity between image patches and is not influenced by variations in brightness and contrast [35]. ZNCC does not perform well with significant rotation and change of scale between two images, but one of our assumptions in this work is that the rotation between images is small.

The feature point window and the 7×7 FAST detector window should have the same size, so as to obtain an accurate measurement of the degree of similarity between image patches through the correlation process. The size of the search areas depends on the accuracy/bias of the IMU gyrometers: it should be chosen large enough (11×11 pixels) between the first and second images to accommodate prediction errors, and can then be reduced for the subsequent images once the bias has been eliminated. This correlation could be implemented in hardware to enable real-time processing [36].

Figure 11. Feature points detected using the FAST algorithm in the first image; the threshold chosen here is 7. The number of grid blocks is defined as 100×100; image resolution: 2560×1920 pixels.
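For reference, a minimal sketch of the ZNCC score and of the exhaustive search over an 11×11 area (patch sizes and border handling are simplified assumptions):

```python
import numpy as np

def zncc(a, b, eps=1e-12):
    """Zero-mean Normalized Cross-Correlation between two patches of identical shape.
    Returns a value in [-1, 1]; insensitive to affine changes of brightness/contrast."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def match_in_search_area(template, image, centre, radius=5):
    """Offset (dr, dc) inside a (2*radius+1)^2 search area centred on `centre`
    that maximizes the ZNCC with `template` (odd-sized patch assumed, and the
    whole search area assumed to lie inside the image)."""
    h, w = template.shape
    best, best_off = -2.0, (0, 0)
    r0, c0 = centre
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = r0 + dr, c0 + dc
            patch = image[r - h // 2: r + h // 2 + 1, c - w // 2: c + w // 2 + 1]
            score = zncc(template, patch)
            if score > best:
                best, best_off = score, (dr, dc)
    return best_off, best
```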
3.5.1. The Sub-Pixel Correlation
One of the drawbacks of ZNCC is that its resolution is limited to one pixel, i.e. the peak location coordinates have integer values. Matching precision can, however, be improved by interpolating the cross-correlation surface to a higher spatial resolution in order to locate the correlation peak with sub-pixel accuracy [37]. The sub-pixel location can be approximated by fitting a two-dimensional model through the correlation coefficients around the peak location. Several models have been proposed in the research community, such as parabola fitting and Gaussian fitting [38] [39] [40]. In our work, we choose the simple parabolic curve fitting model that computes the sub-pixel peak location separately for the two orthogonal directions in which the curve reaches its maximum. The results presented in Figure 15 demonstrate that, with this simple model, the sub-pixel approach improves the precision and accuracy of ZNCC-based image matching, eliminating an average of 0.15 pixels² of residual variance with respect to pixel-level matching, so it is not necessary to use a more complex fitting model. Rounding a uniformly distributed random variable causes an error with a variance of 1/12 pixels²; since we have two independent variables x and y, the overall error has its variance increased by 1/6 pixels², i.e. about 0.17. We see that our method corrects nearly all of this noise.

Figure 12. Search areas obtained from the IMU gyrometers between the first image and the 10th image. The green points represent the homologous points obtained by template matching. The size of each search area is 11×11 pixels; image resolution: 2560×1920 pixels.
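Applied independently along the two axes around the integer peak, the parabolic refinement reduces to the classical three-point vertex formula; a minimal sketch:

```python
def parabolic_offset(c_m1, c_0, c_p1):
    """Sub-pixel offset of the extremum of a parabola fitted through three
    correlation scores at positions -1, 0, +1 around the integer peak:
    delta = (c_m1 - c_p1) / (2*(c_m1 - 2*c_0 + c_p1)), clamped to [-0.5, 0.5]."""
    denom = c_m1 - 2.0 * c_0 + c_p1
    if denom == 0.0:
        return 0.0
    return max(-0.5, min(0.5, 0.5 * (c_m1 - c_p1) / denom))

def refine_peak(scores, peak_r, peak_c):
    """Refine an integer correlation peak (peak_r, peak_c) in a 2D score array
    by fitting the parabola separately along the rows and the columns."""
    dr = parabolic_offset(scores[peak_r - 1, peak_c], scores[peak_r, peak_c], scores[peak_r + 1, peak_c])
    dc = parabolic_offset(scores[peak_r, peak_c - 1], scores[peak_r, peak_c], scores[peak_r, peak_c + 1])
    return peak_r + dr, peak_c + dc
```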
3.6. Estimation of Transformation between Images
Now that we have good homologous pairs, the next step is to establish the best estimate of a geometrical relation mapping pixel coordinates from one image to another. Two geometrical transformations can be used here: the 3D rotation between poses and the 2D homography between images. Which of the two is selected depends on the real movement of the camera.

The 3D rotation is used when the UAV is quasi-stationary and only attitude variations are considered; it improves the robustness of the system because it has only three degrees of freedom. This approach is not optimal if the real movement of the camera is not a pure rotation. Moreover, in this case, it is not possible to recalibrate the IMU gyrometers directly from this transformation.

The 2D homography is used when the camera movement consists of both rotation and translation. This transformation has eight degrees of freedom, which means that it can partially absorb the translation movement of the camera. On the other hand, this approach is more sensitive to errors and does not work perfectly when the scene is not planar.
3.6.1. Estimation of the 3D Camera Rotation Using the Least Squares Method
Given a set of 3D vectors $\vec{V}_i^{\,1}(x,y,z)$ and the corresponding set of 3D vectors $\vec{V}_i^{\,n}(x',y',z')$, the 3D rotation $R_{img}^{n}$ is estimated using the least squares method, which minimizes the errors between the estimated 3D vectors $\hat{\vec{V}}_i^{\,n}$ and the renormalized 3D vectors $\vec{V}_{i,corr}^{\,n}$ resulting from the correlation. The IMU can be used to obtain an initial solution, so only one iteration is needed to get close enough to the true solution; several more iterations are then used to automatically eliminate the outliers due to false correlations.

The relative 3D rotation between $R_{img}^{n}$ and $R_{imu}^{n}$ is used to estimate, in real time, most of the drift of the IMU bias, and thus to improve the accuracy of the estimation of the consecutive 3D rotation $R_{img}^{n+1}$.
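The estimator used here is an iterative least squares initialized with the IMU rotation. As an illustrative alternative for the same problem (finding the rotation that best aligns two bundles of direction vectors), the sketch below uses the classical closed-form SVD (Kabsch) solution with a simple rejection loop; the 0.1° threshold is an arbitrary assumption:

```python
import numpy as np

def fit_rotation(V1, Vn, inlier_mask=None):
    """Least-squares rotation R (3x3) minimizing sum ||Vn_i - R @ V1_i||^2 over
    corresponding unit vectors, via the classical SVD (Kabsch) solution.
    V1, Vn: (N, 3) arrays of normalized vectors."""
    if inlier_mask is not None:
        V1, Vn = V1[inlier_mask], Vn[inlier_mask]
    H = V1.T @ Vn                                    # 3x3 covariance of the correspondences
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # enforce det(R) = +1
    return Vt.T @ D @ U.T

def iterate_with_rejection(V1, Vn, n_iter=3, thresh_deg=0.1):
    """Re-estimate the rotation a few times, discarding correspondences whose
    angular residual exceeds a threshold (a simple stand-in for the outlier
    elimination described above)."""
    mask = np.ones(len(V1), dtype=bool)
    R = np.eye(3)
    for _ in range(n_iter):
        R = fit_rotation(V1, Vn, mask)
        cosines = np.clip(np.sum((V1 @ R.T) * Vn, axis=1), -1.0, 1.0)
        mask = np.degrees(np.arccos(cosines)) < thresh_deg
    return R, mask
```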
3.6.2. Estimation of the 2D Homography Using the Least Squares Method
With the 2D planar homography model, the geometrical transformation can be calculated easily using the least squares method, from the 2D points obtained by the FAST detector in the first image and their homologous points found by template matching in the Nth image. Outliers with errors greater than 3 pixels are automatically eliminated in order to obtain a robust estimation of the best fitting homography.
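The exact parameterization of the least squares is not detailed here, so the following sketch uses the standard DLT formulation of the same estimation, together with the pixel residual used for the 3-pixel outlier rejection:

```python
import numpy as np

def fit_homography(pts1, ptsN):
    """Least-squares estimate of the 3x3 homography H mapping pts1 -> ptsN
    (both (N, 2) arrays, N >= 4), using the standard DLT formulation."""
    A = []
    for (u, v), (up, vp) in zip(pts1, ptsN):
        A.append([-u, -v, -1, 0, 0, 0, up * u, up * v, up])
        A.append([0, 0, 0, -u, -v, -1, vp * u, vp * v, vp])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = Vt[-1].reshape(3, 3)                 # null-space vector = least-squares solution
    return H / H[2, 2]

def reprojection_errors(H, pts1, ptsN):
    """Pixel residuals used for the 3-pixel outlier rejection."""
    proj = np.hstack([pts1, np.ones((len(pts1), 1))]) @ H.T
    proj = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(proj - ptsN, axis=1)
```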
3.7. Resampling Images and Generating the Stacked Image
We use bilinear interpolation to find, in the Nth image, the resampled value of each pixel of the first image. The stacked intensity values are then saved with a coefficient that can be used to encode the values of the stacked image on 8 bits instead of 12 bits and, optionally, to increase the brightness of the image. This stage is the most computationally intensive task of the algorithm, as shown in Table 2. However, the process of mapping each pixel of the first image to the other images can be greatly accelerated by using a bilinear approximation. This approach consists of dividing the first image into M×N blocks, computing for each block the bilinear function that matches the real transformation on the corners of the block, and then using nearest-neighbour resampling for each pixel of the other images. We thus significantly reduce the number of arithmetic operations and memory accesses. This method is very effective in software, and we currently aim to implement it in hardware because it uses fewer resources (especially DSP resources) and requires fewer memory accesses.
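A software sketch of this block-wise approximation follows; the exact warp function (rotation or homography composed with the distortion model) is passed as a parameter, and the 64-pixel block size and the accumulation details are illustrative assumptions:

```python
import numpy as np

def resample_block_approx(first_shape, other_image, warp, block=64):
    """Resample `other_image` into the geometry of the first image.

    The exact mapping `warp(r, c) -> (r', c')` is evaluated only at the four
    corners of each block; inside the block it is replaced by a bilinear blend
    of these corner mappings, and pixel values are fetched with
    nearest-neighbour resampling."""
    h, w = first_shape
    out = np.zeros(first_shape, dtype=np.float64)
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            r1 = min(r0 + block, h) - 1          # last row/col of the block (inclusive)
            c1 = min(c0 + block, w) - 1
            corners = np.array([warp(r, c) for r, c in
                                [(r0, c0), (r0, c1), (r1, c0), (r1, c1)]])
            rr, cc = np.meshgrid(np.arange(r0, r1 + 1), np.arange(c0, c1 + 1), indexing="ij")
            fr = (rr - r0) / max(r1 - r0, 1)     # fractional position inside the block
            fc = (cc - c0) / max(c1 - c0, 1)
            rp = (corners[0, 0] * (1 - fr) * (1 - fc) + corners[1, 0] * (1 - fr) * fc +
                  corners[2, 0] * fr * (1 - fc) + corners[3, 0] * fr * fc)
            cp = (corners[0, 1] * (1 - fr) * (1 - fc) + corners[1, 1] * (1 - fr) * fc +
                  corners[2, 1] * fr * (1 - fc) + corners[3, 1] * fr * fc)
            ri = np.clip(np.rint(rp).astype(int), 0, other_image.shape[0] - 1)
            ci = np.clip(np.rint(cp).astype(int), 0, other_image.shape[1] - 1)
            out[r0:r1 + 1, c0:c1 + 1] = other_image[ri, ci]   # nearest-neighbour fetch
    return out
```

The stacked image is then simply the running sum of the first image and the resampled versions of the following images, rescaled by the coefficient mentioned above.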
4. Results
We performed a few experiments to quantify the accuracy and performance of our algorithm, using the hardware described in Section 3. Several datasets were acquired by the IGN photogrammetric camera; some of them are shown here. Each dataset consists of a sequence of ten images acquired at the maximum frame rate of the CMOS sensor (30 images/s), a wireless signal being transmitted to trigger the acquisition while the UAV is hovering. The exposure time is set to 1 ms to be closer to hyperspectral images with a low dynamic range.
4.1. Timing Results (in seconds)
All tests were run on two platforms: first, an HP Z210 workstation with an Intel Xeon 3.30 GHz processor and 8.0 GB of RAM; second, the IGN photogrammetric camera with its 666 MHz dual-core ARM Cortex-A9 processor.
Table 2. Timing results in seconds. (1): initial geometrical transformation with bilinear interpolation; (2): accelerated geometrical transformation using the bilinear function method with bilinear interpolation; (3): accelerated geometrical transformation using the bilinear function method with nearest-neighbour resampling.

                          Feature detection   Matching (402 points) &     Resampling (rotation)       Resampling (homography)
                          (402 points)        estimation of 3D rotation   (1)      (2)      (3)       (1)
PC (Soft)                 0.21                0.08                        4.72     0.94     0.66      4.00
PC (Soft/2 threads)       –                   –                           2.46     0.42     0.38      2.06
Camera (Soft)             1.64                0.83                        36.94    7.24     5.28      25.17
Camera (Hard)             0.02                –                           –        –        –         –
Camera (Soft/2 threads)   –                   –                           18.48    3.81     2.65      12.57
4.2. Accuracy of IMU

Figure 13 shows the 3D rotations respectively measured by the IMU and estimated by the photogrammetric method. We chose here a dataset in which the drift of the IMU bias is significant.

Figure 13. The two 3D rotations of the camera between poses (quaternion components qx, qy, qz from the IMU and from the photogrammetric estimation, and their differences).
4.3. Accuracy of the Transformation Estimation

4.3.1. Case of Pure-Rotation Camera Movement

Figure 14, Figure 15 and Figure 16 illustrate a case in which the UAV experienced a pure rotation movement. In order to explain the increase of the RMS errors over the images, we separate the calculation of the errors between grass regions and road regions.

Figure 14. Residuals of the estimation of the 3D rotation and of the 2D homography, with and without sub-pixel correlation, between the first image and all Nth images. (a) RMS errors for the 3D rotation; (b) RMS errors for the 2D homography.
Figure 15. RMS² difference between ZNCC without sub-pixel accuracy and ZNCC with sub-pixel accuracy, between the first image and all Nth images. (a) 3D rotation; (b) 2D homography.
4.3.2. Case of Non-Pure-Rotation Camera Movement

Figure 17 and Figure 18 illustrate a case in which the UAV underwent a Z-translation movement caused by a free fall.
4.4. Quality of the Resulting Image

Figure 19 shows the final stacked image resulting from our algorithm. The red part shows the rotation experienced by the camera between the first and the last image. It is possible here to use a brightness factor to illuminate the image.

Figure 16. Residuals of the 3D rotation estimation between the first image and all other images in the grass and road areas.

Figure 17. Residuals obtained using the transformation estimation between the first image and the 10th image in the case of a non-pure rotation movement of the camera; red arrows represent the outliers. (a) The case of the 3D rotation; (b) the case of the 2D homography.
5. Discussions
5.1. Timing Results

Table 2 shows that the resampling part of the algorithm is the most time-consuming task for both types of transformation. Benefiting from the dual-core processor in the camera, we used the OpenMP library to accelerate the image resampling part; with the number of threads set to 2, the processing time of this part was halved. The acceleration factor when using the nearest-neighbour resampling method is 7. This part of the process should be accelerated even further with the help of the programmable logic (FPGA) to achieve real-time performance.
Figure 18. Residuals of the estimation of the 3D rotation and the 2D homography between the first image and all other images in the case of UAV translation.

Figure 19. The stacked image. The red part is the non-common part between the first image and the 10th image.
5.2. Accuracy of IMU
Our results show that the errors of the components increase slowly and linearly over time, implying that the bias is constant over the duration of the sequence. This bias can then be estimated by computing the 3D rotation linking the attitude estimated by the photogrammetric method and the one obtained by the IMU. This rotation can then be used to obtain a better initial rotation for the following images, reducing the size of the search areas and therefore the correlation time, or to correct the bias values in the IMU subsystem, which could be beneficial to other sensors (e.g. LIDAR) on the UAV.

Figure 20. Example illustrating two subsections of two images: (a) is obtained by averaging the ten images, (b) is the stacked image. The angular motion between images is 1.2 degrees, corresponding to 59 pixels.
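A rough sketch of the bias estimation described above (the composition order and axis conventions are assumptions; dt is the time elapsed between the two poses):

```python
import numpy as np

def rotation_to_vector(R):
    """Axis-angle vector (radians) of a 3x3 rotation matrix (log map)."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return axis / (2.0 * np.sin(angle)) * angle

def gyro_bias_estimate(R_img, R_imu, dt):
    """Gyro-bias estimate (rad/s) from the discrepancy between the rotation
    estimated photogrammetrically (R_img) and the one integrated from the IMU
    (R_imu) over the elapsed time dt."""
    R_err = R_img @ R_imu.T        # residual rotation not explained by the IMU
    return rotation_to_vector(R_err) / dt
```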
5.3. Accuracy of Estimation of Transformations
5.3.1. Case of Pure Rotation Movement of the Camera

For both transformations, the RMS errors are less than 0.5 pixel for all the processed images. Our results presented in Figure 14 also show a clear improvement of accuracy when the sub-pixel correlation is used, for both transformations. Figure 15 demonstrates that the difference in variance between the sub-pixel correlation and the non-sub-pixel correlation is almost equal to 2/12 pixels², corresponding to the variance of the uniform rounding noise, which means that the sub-pixel correlation works as expected. Additionally, the estimation of the homography has smaller residuals than the rotation, consistent with the fact that the former has more degrees of freedom.

In both cases, the RMS values of the errors increase linearly over the images. Figure 16 shows that this is due to the movement of the grass next to the road: the residuals in the road region are constant and significantly lower. It also indicates that the noise of our process is only of the order of 0.2 pixels RMS.
5.3.2. Case of Non-pure Rotation Movement of the Camera
In this case, the estimation of the rotation with our photogrammetric method does not work optimally because the optical centre is moving, so the residual errors are larger than expected. They converge towards the centre of the image, as shown in Figure 17a, because the UAV is in free fall. These residuals, which cannot be compensated by the rotation, can be absorbed by the 2D homography, which includes a scale factor: Figure 18 demonstrates that the residual errors of the estimated homography are constant over the images and much smaller than those of the 3D rotation. However, as expected, they are larger in zones where the surface departs from the average plane of the scene, such as the rooftop shown in Figure 17b. This means that, for these parts of the image, the stacked image will be slightly blurred.

Thanks to the RMS error, we have a quality-control factor that helps our processing to determine which type of transformation should be used to obtain the best results.
5.4. Quality of Resulting Image
Figure 19 demonstrates visually that no motion blur mars the stacked image. Figure 20 shows that our method can be used to eliminate the motion blur caused by the erratic movements of the camera. Images can be improved to the point where the blur is almost undetectable, even if the movement of the camera is complex (Figure 9), i.e. when the trajectories of the predicted features are not the same over the entire image. Our method can be applied to images whose angular motion blur exceeds several pixels, unlike de-convolution methods [14], which have strict limits on the amount of angular motion.
6. Conclusion and Future Works
An image processing stacking algorithm has been presented, which consists of several steps. First, feature points are detected in the first image and matched using ZNCC correlation with the help of the IMU measurements; then, the 3D rotation (and/or the 2D homography) between sequential images is estimated using a photogrammetric method, which leads to a better accuracy. This geometrical transformation (3D rotation or 2D homography) is then used to resample the images, and the 3D rotation in particular can be used to recalibrate the IMU gyrometers by estimating their bias. This algorithm benefits from the presence of an FPGA to speed up some tasks, such as detecting feature points. The proposed method was tested with the IGN camera embedded on a UAV, and the experimental results show the efficiency and accuracy of this method when the UAV is quasi-stationary. The IMU data processing part is already real-time and can be used for other applications.

In the future, we will implement a real-time version by accelerating the image resampling part in the FPGA, since it is the most demanding part of the processing. We will refine all parameters to obtain good results in less time. We will also integrate the translation of the UAV in our model using GNSS/IMU data, which may later be used to correct translation deformation. Other sensors, such as lidar or other synchronized cameras, can benefit from the resulting better quality of the IMU data. In fact, this methodology is intended to be used on an airborne prototype imaging system with different narrow spectral channels, composed of 10 synchronized IGN cameras, which will make it possible to obtain images with an equivalent long exposure time.
Bibliography
1. Boucher, P. Domesticating the Drone: The Demilitarisation of Unmanned Aircraft for Civil Markets. Science and Engineering Ethics 2015, 21, 1393–1412.
2. Colomina, I.; Molina, P. Unmanned aerial systems for photogrammetry and remote sensing: A review. ISPRS Journal of Photogrammetry and Remote Sensing 2014, 92, 79–97.
3. Ham, Y.; Han, K.K.; Lin, J.J.; Golparvar-Fard, M. Visual monitoring of civil infrastructure systems via camera-equipped Unmanned Aerial Vehicles (UAVs): a review of related works. Visualization in Engineering 2016, 4, 1.
4. Daakir, M.; Pierrot-Deseilligny, M.; Bosser, P.; Pichard, F.; Thom, C.; Rabot, Y. Study of lever-arm effect using embedded photogrammetry and on-board GPS receiver on UAV for metrological mapping purpose and proposal of a free ground measurements calibration procedure. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2016, XL-3/W4, 65–70.
5. Tournadre, V.; Pierrot-Deseilligny, M.; Faure, P.H. UAV linear photogrammetry. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2015, XL-3/W3, 327–333.
6. Audi, A.; Pierrot-Deseilligny, M.; Meynard, C.; Thom, C. Implementation of real-time functionalities in the hardware of an intelligent ultra-light camera specialized for aerial photography. GeoUAV - ISPRS Geospatial Week 2015.
7. Le Moigne, J.; Netanyahu, N.S.; Eastman, R.D., Eds. Image Registration for Remote Sensing; Cambridge University Press: Cambridge, New York, 2010.
8. Lakshmi, K.D.; Vaithiyanathan, V. Study of Feature based Image Registration Algorithms for Navigation of Unmanned Aerial Vehicles. Indian Journal of Science and Technology 2015, 8.
9. Patel, P.; Shah, V. Image Registration Techniques: A Comprehensive Survey. International Journal of Innovative Research and Development 2014.
10. Mann, S.; Picard, R.W. On Being 'undigital' With Digital Cameras: Extending Dynamic Range By Combining Differently Exposed Pictures. Society for Imaging Science and Technology Proceedings 1995, pp. 442–448.
11. Debevec, P.E.; Malik, J. Recovering high dynamic range radiance maps from photographs. Society for Imaging Science and Technology Proceedings 1997, pp. 369–378.
12. Robertson, M.A.; Borman, S.; Stevenson, R.L. Dynamic Range Improvement Through Multiple Exposures. Society for Imaging Science and Technology Proceedings 1999, pp. 159–163.
13. Joshi, N.; Kang, S.B.; Zitnick, C.L.; Szeliski, R. Image Deblurring Using Inertial Measurement Sensors. ACM Trans. Graph. 2010, 29, 30:1–30:9.
14. Shah, C.A.; Schickler, W. Automated blur detection and removal in airborne imaging systems using IMU data. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2012, XXXIX-B1, 321–323.
15. Sieberth, T.; Wackrow, R.; Chandler, J.H. Automatic isolation of blurred images from UAV image sequences. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2013, XL-1/W2, 361–366.
16. Sieberth, T.; Wackrow, R.; Chandler, J.H. UAV image blur – its influence and ways to correct it. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2015, XL-1/W4, 33–39.
17. Klein, G.; Drummond, T. Tightly Integrated Sensor Fusion for Robust Visual Tracking. Proc. British Machine Vision Conference (BMVC'02), 2002, pp. 787–796.
18. Tsai, F.; Chang, H.; Su, A.Y.S. Combining MEMS-based IMU data and vision-based trajectory estimation. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2014, XL-4, 267–271.
19. Hol, J.D.; Schön, T.B.; Gustafsson, F.; Slycke, P.J. Sensor Fusion for Augmented Reality. 9th International Conference on Information Fusion, Florence, 2006.
20. You, S.; Neumann, U. Fusion of vision and gyro tracking for robust augmented reality registration. Proceedings IEEE Virtual Reality 2001, pp. 71–78.
21. Caarls, J.; Jonker, P.; Persa, S. Sensor Fusion for Augmented Reality. In Ambient Intelligence: First European Symposium, EUSAI 2003, Veldhoven, The Netherlands, November 3–4, 2003, Proceedings; Aarts, E.; Collier, R.W.; van Loenen, E.; de Ruyter, B., Eds.; Springer Berlin Heidelberg: Berlin, Heidelberg, 2003; pp. 160–176.
22. Wang, Y.; Hou, Z.; Leman, K.; Chang, R. Real-Time Video Stabilization for Unmanned Aerial Vehicles. MVA 2011 IAPR Conference on Machine Vision Applications, June 13–15, 2011, Nara, Japan, 2011.
23. Martin, O.; Meynard, C.; Pierrot-Deseilligny, M.; Souchon, J.P.; Thom, C. Réalisation d'une caméra photogrammétrique ultralégère et de haute résolution. Colloque Drones et moyens légers aéroportés d'observation, Montpellier, France, 2014.
24. Pierrot-Deseilligny, M.; Clery, I. APERO, an open source bundle adjusment software for automatic calibration and orientation of set of images. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2011, XXXVIII-5/W16, 269–276.
25. Hartley, R.; Kang, S.B. Parameter-free Radial Distortion Correction with Centre of Distortion Estimation. Technical report, 2005.
26. Pena, M. A Comparative Study of Three Image Matching Algorithms: Sift, Surf, and Fast; BiblioBazaar, 2012.
27. Le, X.; Gonzalez, R. Pattern-based corner detection algorithm. Proceedings of the 6th International Symposium on Image and Signal Processing and Analysis (ISPA 2009), 2009, pp. 238–243.