
Perceptual hashing-based movement compensation applied to in vivo two-photon microscopy

Mémoire

Grégory Sadetsky

Maîtrise en Biophotonique

Maître ès sciences (M.Sc.)


Résumé

Le mouvement animal, présent lors d’expériences in vivo effectuées à l’aide de microscopie à effet deux photons, nuit à l’observation de phénomènes biologiques et à l’analyse subséquente des flux vidéos acquis. Ceci s’explique entre autres par le fait que, dû au sectionnement optique, tout déplacement dans l’axe z (perpendiculaire au plan d’imagerie) modifie drastiquement l’image et ne permet qu’une observation instable de l’échantillon examiné.

En appliquant une fonction de hachage aux images acquises, nous produisons des vecteurs décrivant les qualités perceptuelles de ces images ; ces vecteurs peuvent alors servir à comparer les images une à une, en temps réel. Ces comparaisons nous permettent de réunir les images en groupes correspondant à des plans z distincts.

Ainsi, du processus de hachage, de comparaison et de groupage d’images résulte une méthode logicielle de compensation de mouvement en temps réel qui peut être utilisée dans le cadre d’expériences biologiques en laboratoire.


Abstract

Animal movement during in vivo two-photon microscopy experiments hinders efforts at observing biological phenomena and the subsequent analysis of the acquired video streams. One of the reasons for this is that, due to optical sectioning, any displacement in the z-axis (perpendicular to the plane of imaging) dramatically changes the collected image and thus provides the experimenter with an unstable view of the imaged sample.

By applying a hashing function on the acquired video frames, we produce vectors embodying the images’ perceptual qualities; these vectors can then be used to compare the frames one to another, in real-time. These comparisons allow us to group similar images in clusters corresponding to distinct z-planes.

In effect, the process of perceptually hashing, comparing and grouping video frames provides us with software-based, real-time movement compensation which can be used in a biological laboratory setting.


Contents

Résumé
Abstract
Contents
List of Tables
List of Figures
Remerciements

1 Introduction
1.1 Movement compensation
1.2 Movement and its effect on microscopy
1.3 Shot noise in two-photon microscopy
1.4 Algorithms to calculate the similarity of images
1.5 Hypothesis and objectives
1.6 Deliverable

2 Assessment of the pHash algorithm in two-photon microscopy
2.1 Preamble
2.2 Image acquisition
2.3 Robustness to XYZ translation, rotation and noise
2.4 Use of pHash distance for grouping
2.5 Computing time
2.6 Conclusion

3 Implementation of the radial variance hashing algorithm for real-time two-photon microscopy
3.1 Shot noise reduction
3.2 Improvement of the normalized cross-correlation registration by pHash
3.3 Integration of the pHash algorithm into iPhoton

Bibliography


List of Tables

2.1 Confusion table. The threshold was set to achieve about 95% specificity in both cases.

2.2 Speed of the operations required for real-time image comparison using normalized cross-correlation and pHash.

List of Figures

1.1 Steps of radial hash calculation

1.2 Comparison of images before and after the blur function has been applied. High-frequency features have been removed. The right side shows a zoom on a feature of the image. This zoom was created using the nearest neighbor method.

1.3 The Radon transform. For each angle φ of a given set, the sum of the pixels traversed by a line will be captured at the sensor.

1.4 Schematic overview of the 180 parallel lines being projected through the original image in 90 steps. The top portion of each diagram corresponds to the image, while the bottom portion corresponds to the resulting, temporary image formed from the radial projection lines. The blue and red projection lines passing through the image correspond to the blue and red lines in the temporary projection image. Both blue and red projection lines are calculated as part of the same looping iteration to minimize computation time.

1.5 Radial variance hashing of 3 images. From top to bottom for each image: the source image, the temporary image formed from the radial projection lines, the variance vector and the corresponding discrete cosine transform. Notice that sample images i and ii have more in common visually, a fact that is reflected in the equivalent projection line transform, variance and DCT results.

2.1 Comparative robustness of the pHash and normalized cross-correlation algorithms to three common image deformation types.

2.2 Robustness to z-axis movement. Colors represent different ROIs. The mean values of the normalized cross-correlation fit and the radial variance hashing similarity score were calculated for a thousand pairs of images separated by a known z-axis displacement δz.

2.3 Distribution of the cross-correlation fit between pairs of images. The top panel (actual positives) counts the number of pairs of images taken from the same ROI at the same z-axis position whereas the bottom panel counts pairs of images taken from different ROIs altogether (actual negatives). If the cross-correlation fit between a pair of images were a good measure of their similarity, we would expect the distribution of the actual positives to be skewed to the right and the distribution of the actual negatives to be skewed to the left.

2.4 Distribution of the delta pHash between pairs of images. The top panel (actual positives) counts the number of pairs of images taken from the same ROI at the same z-axis position whereas the bottom panel counts pairs of images taken from different ROIs altogether (actual negatives). We can observe that the delta pHash between a pair of images is a good measure of their similarity since the distribution of actual positives is skewed to the right and the distribution of actual negatives is skewed to the left.

2.5 Comparison of the normalized cross-correlation and the radial variance hashing algorithms for comparing pairs of images. In the case of the NCC, images are compared directly in 13.21 ms. In the case of the pHash algorithm, once a hash has been calculated for an image, further comparisons take less than a hundredth of a ms.

3.1 Shot-noise reduction and sharpness of images when time-sequential or similarity-sequential images are compensated on the x/y axes and averaged. Panel A shows the single reference image; panel B uses 10 time-sequential images and panel C uses 10 similarity-sequential images with a radial variance hashing similarity score threshold of 0.708 when compared to the reference image. A zoom on part of a structure is shown on the right side of each panel. Scale bars are 25 µm and 4 µm on the left and right sides, respectively.

3.2 Demonstration of the possible misalignment when images from different z-stacks are aligned using the normalized cross-correlation algorithm. The reference image is shown on the bottom row while the 9 previous images (numbered from -9 to -1) are shown on the top rows. Images with a radial variance hashing similarity score below 0.708 when compared to the reference image are framed in red. These images are considered as belonging to a different z-stack and would not have been aligned to the reference image. Scale bar is 25 µm.


L’imaginaire est ce qui tend à devenir réel


Foreword

The author wishes to express his deepest gratitude to his supervisor Professor Daniel Côté for the opportunity to work on this mémoire, his unwavering dedication, helpful advice and ongoing support.

The author wishes to express sincere thanks to his co-supervisor Dr. Mario Méthot for his intellectual and moral support.

The author also wishes to thank his committee members, Professor Yves De Koninck and Professor Simon Rainville, as well as his supervisor and co-supervisor, for generously agreeing to serve as examiners and for providing constructive and useful comments during his Master’s seminar.

In addition, a very special thanks to Émilie Chamma for her invaluable help in performing in vivo experiments.

The author would like to extend his appreciation to all of his research group colleagues for their encouragement, enthusiasm and wise counsel.

A word of heartfelt thanks to Ann-Julie Rhéaume for her unfailing friendship and guidance.

Last, but not least, the author wishes to thank his family for their loving encouragement and support.


Chapter 1

Introduction

The advent of optical microscopy has revolutionized biology. With a simple optical construction, Robert Hooke in 1665 made possible the visualization of cellular and sub-cellular details by the human eye. Since cellular structures are on the order of microns, optical microscopy is the only modality that can offer the required high spatial resolution. Moreover, by making use of modalities such as fluorescence, optical microscopy can offer the very high specificity needed to pursue complex cellular imaging studies, with contrast based on genetically-encoded proteins, fluorescent antibodies, or organic dyes that target membranes or other organelles.

Yet widefield optical microscopy, as originally presented by Hooke, can suffer from a large amount of background light, since there is no rejection of out-of-focus light. This becomes highly problematic for thick tissue, since light from above and below the object plane is backscattered into the objective and appears as a blur. To remove this background, one needs to cut the tissue into very fine slices approximately 10 microns thick, which is possible but time-consuming and incompatible with live specimens. For imaging cell cultures, this is not a problem since there is little to no scattering above and below the object plane.

To alleviate this problem, two “optical sectioning strategies” have emerged. The first and most common strategy is confocal microscopy: instead of flooding the sample with light and imaging with a camera, a laser is focused and scanned onto the sample while the focal spot is imaged onto a pinhole placed right in front of a detector (typically a photomultiplier tube). The pinhole removes the light that does not come from the focal spot. This results in an optically-sectioned image that is reconstructed by raster scan. The second and more recent strategy, introduced by Denk et al. [1990], is to use ultrashort pulses to capitalize on the nonlinear interactions between the light and the sample at high irradiance. Since the irradiance (number of photons per second per unit area) increases as the beam approaches the focal spot and then decreases again, the maximum irradiance at the focal spot will maximize nonlinear interactions and the excitation of fluorescence through a nonlinear process such as two-photon excited fluorescence. The result is that fluorescence is only excited at the focal spot. With a large-area detector close to the objective, one can collect the light knowing it originates from the focal spot and reconstruct the image from the raster scan. Two-photon microscopy was an improvement upon the previous technology since it increased the emitted light at the focal point compared to the rest of the sample and removed the constraint of having to place a pinhole before the light detector to exclude the contribution of out-of-focus planes. The combination of this technology with fast scanning rates and fluorescent molecular probes made it possible to study living structures, especially live animals.

However, microscopy in live animals brings its own set of challenges, the most important being the movement of the animal. Indeed, since the thickness of the optical section is on the order of a few microns, any displacement will have a significant impact on the plane being imaged. Live imaging studies of the spinal cord have been particularly difficult due to the combination of the heartbeat and breathing of the animal. Surgical strategies have been devised to mechanically stabilize the animal during imaging (cranial windows with a stereotaxic holder, for instance) and other active devices have also been invented. Although these techniques can be very powerful, they are also time-consuming for experimenters to master and lack wide applicability.

Below, we discuss in more detail the various strategies to compensate for the animal’s movement, including the one retained for this work: a numerical analysis that rapidly classifies images into sets of similar or dissimilar images.

1.1 Movement compensation

An M × N pixel image can be thought of as a vector with M × N components. Completely comparing two images will thus require at least M × N elementary operations; in the case of 500 × 500 pixel images, this represents 250,000 comparison operations. This can be prohibitive if the comparison operation requires non-negligible time to compute. On the other hand, if one reduces an M × N pixel image to a Q-sized vector, the computation time may be reduced by as much as the reduction factor M × N / Q. This reduction, however, cannot be performed without regard to the nature of the image: movement and shot noise (a salt-and-pepper appearance) are inherent properties of live animal microscopy, and therefore any image reduction strategy must not depend strongly on these two aspects. Hence, we would like to extract image features rather than work with the images themselves. Before discussing the data reduction strategy retained in this project, a section will describe animal movement as well as shot noise as experienced in live animal microscopy. Then, image reduction strategies will be discussed in general terms.
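As a rough illustration (using the 748 × 500 pixel frames acquired later in this work and the 40-coefficient hashes described in section 1.4.4), the best-case reduction factor would be:

$$\frac{M \times N}{Q} = \frac{748 \times 500}{40} = 9350$$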


1.2 Movement and its effect on microscopy

The use of optically sectioned imaging techniques (confocal microscopy, multiphoton microscopy) provides exquisitely well defined images of a single plane, but is obviously sensitive to the exact position of that plane. Typical z-sectioning in a microscope ranges from 1-5 µm, which means that any movement on that order will affect the plane of imaging and sample a different plane than the one desired. In live animal microscopy, the breathing of the animal and the beating of its heart always lead to small-scale movement. This movement can be minimized a priori in the laboratory, with strategies such as better surgical and mounting protocols or real-time movement compensation, or a posteriori, with software operations on images.

1.2.1 In-laboratory strategies

Movement compensation in the laboratory allows the experimenter to get the most out of the limited imaging time with the anesthetized animal (usually less than four hours). By adjusting the imaging immediately upon movement, the number of useful images acquired per session for a single plane is maximized. This gives more time to look for regions of interest (ROIs), validate data, and attempt new experiments.

Spinal stabilization devices are available, as well as drugs inducing deep anesthesia [Davalos et al., 2008]. These technologies can minimize respiratory-induced movement, but they may also interfere with the biological processes under study.

A successful movement compensation approach, based on real-time monitoring of the tissue surface and immediate adjustment of the imaged sample’s position, allowed the imaging of a living rat’s spinal cord at the millisecond time scale. The device developed by Laffray et al. [2011] maintains a constant distance between the objective and the tissue by illuminating the sample with an independent off-axis light beam. The lateral displacement of its reflection is an indicator of the sample’s z-position, which is used to adjust a nanopositioner onto which the objective is mounted, effectively closing the feedback loop.

However, these strategies may not be compatible with all microscopy setups, require the acquisition and use of hardware, and limit the videos to the recorded plane. In fact, movement can be beneficial in the sense that it provides many different planes to be analyzed. If one can select and categorize images, it may be possible to extract more information from a moving sample by a posteriori analysis.

1.2.2 A posteriori strategies

Being able to compensate for the animal’s movement after the experiment, as this movement is represented in the video stream, can also be very useful. It allows the analysis of previously acquired video streams and the creation of video sequences in which the studied biological processes can be seen more clearly and without interruption.

It is possible to use software methods to compensate for the animal’s movement in real-time or in post-processing. Furthermore, a software solution requires no hardware acquisition or adjustment, and it can gather and discern images originating from different z-levels.

Movement on the x and y axes can be compensated by realigning every entering image with the previous one. This makes lateral movement of the target appear seamless to the experimenter. In 2008, our lab developed a system allowing image acquisition at the video rate of 30 frames per second (fps), which includes software for real-time movement correction in the x-y plane [Veilleux et al., 2008]. The real-time correction is performed by finding the maximum of the normalized cross-correlation (NCC) coefficient between the current and the previous image and displacing the new image in the direction of the resulting vector T.

This technique gave satisfactory results when movement was limited to the x/y axes. However, movement on the z-axis (perpendicular to the plane of imaging) proved more challenging. The reason for this is that while displacement on the x-y plane will generate images with large regions of similarity, movement on the z-axis will change the image dramatically if its amplitude is greater than the optical depth.

We thus posit that similar images acquired using two-photon microscopy are likely to come from the same optical slice whereas dissimilar images are likely to come from different slices. An algorithm capable of comparing the current image to a great number of previously acquired images would allow the real-time exclusion of images falling outside a z-stack range chosen by the experimenter and greatly improve movement compensation.

1.3 Shot noise in two-photon microscopy

Shot noise originates from the statistical nature of photon emission and detection. The number of photons emitted per unit of time fluctuates randomly in such a way that the root-mean-square deviation in the number of photons emitted (N) is the square root of the average number. As a result, the signal-to-noise ratio (SNR) is given by:

$$\mathrm{SNR} = \frac{N}{\sqrt{N}} = \sqrt{N}$$

In two-photon laser scanning fluorescence microscopy, the very faint photon signal often makes it necessary to increase the signal-to-noise ratio. Indeed, the excited volume emits very few photons. Combining a number of images has the same effect on the SNR as increasing the total number of photons emitted (N), i.e., it increases the SNR proportionally to the square root of N.


On a still image, the noise will be distributed randomly while the signal will remain in place, so averaging around ten subsequent images will increase this ratio three-fold. However, on a moving target, the signal will move and the averaged image will be blurred. Grouping similar images can help circumvent this problem by averaging aligned images from the same optical slice.
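As an illustration of this square-root behaviour (a simulation of ours, not one of the thesis experiments), the following numpy sketch models an idealized, perfectly still sample under pure Poisson shot noise and compares the SNR of a single frame to that of a 10-frame average:

```python
# Minimal shot-noise simulation: averaging K frames of a still sample
# should improve the SNR by roughly sqrt(K).
import numpy as np

rng = np.random.default_rng(0)
signal = np.full((500, 748), 10.0)   # mean photon count per pixel

def snr(stack):
    """Estimate the SNR of the averaged stack as mean over std of pixels."""
    avg = stack.mean(axis=0)
    return avg.mean() / avg.std()

single = rng.poisson(signal, size=(1, *signal.shape))
ten = rng.poisson(signal, size=(10, *signal.shape))

print(snr(single))  # ~ sqrt(10), about 3.2
print(snr(ten))     # ~ 10, i.e. roughly a three-fold improvement
```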

1.4 Algorithms to calculate the similarity of images

The similarity or likeness of two images is often expressed as a distance, which can be calculated in many ways.

To be suitable for the purposes of movement compensation and increasing the signal-to-noise ratio, the distance between two images must discriminate dissimilar images while being robust to many types of transformation: rotation, x-y translation, noise, etc. For an algorithm to be considered robust, a small change in an image (a translation, for instance) should have only a small effect on the resulting distance (similarity) score.

Furthermore, if movement compensation is going to be done in real-time while acquiring a 30 fps video stream, it must be possible to calculate dozens of distances for each image coming in, which means that the calculation must not take more than a few milliseconds. The speed of comparison is of the utmost importance.

1.4.1 Intensity cross-correlation

To establish the alignment between two images, one can be shifted relative to the other while trying to maximize the product of the two images’ pixel values.

Given two images I0(x, y) and I1(x, y), the cross-correlation is calculated as follows:

$$I_0(x, y) \star I_1(x, y) = \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} I_0(i, j)\, I_1(x + i,\, y + j) \tag{1.1}$$

where W and H are the width and height of the images.

A further optimization lies in the fact that this operation can be done in frequency space – this corresponds to the Correlation Theorem.

Multiplying the complex conjugate of the Fourier transform of I0, symbolized by F{I0}*, by the Fourier transform of I1, F{I1}, gives the Fourier transform of the cross-correlation:

$$\mathcal{F}\{I_0 \star I_1\} = \mathcal{F}\{I_0\}^{*} \cdot \mathcal{F}\{I_1\} \tag{1.2}$$

The Fourier transforms of the two images, the calculation of the complex conjugate of one of them, and the inverse Fourier transform of the multiplication result are computed in our in-house microscopy software iPhoton using hardware-accelerated functions available through the Apple Accelerate framework.

The location of the maximum cross-correlation score yields a displacement vector between the two images that are being compared. By applying the inverse of this vector to the second image by shifting its pixels, we register it to the first image.
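As an illustrative sketch (not iPhoton’s actual code; the function names are ours), the frequency-space registration described above can be written in a few lines of numpy:

```python
# FFT-based cross-correlation registration via the correlation theorem.
# Assumes two single-channel images of equal size as numpy arrays.
import numpy as np

def find_displacement(i0, i1):
    """Return the (dy, dx) shift of i1 relative to i0, taken at the peak
    of the circular cross-correlation computed in frequency space."""
    corr = np.fft.ifft2(np.conj(np.fft.fft2(i0)) * np.fft.fft2(i1)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Interpret shifts beyond half the image size as negative displacements
    if dy > i0.shape[0] // 2:
        dy -= i0.shape[0]
    if dx > i0.shape[1] // 2:
        dx -= i0.shape[1]
    return dy, dx

def register(i0, i1):
    """Shift i1 by the inverse of the measured displacement so that it is
    registered onto i0 (circular shift; borders wrap around)."""
    dy, dx = find_displacement(i0, i1)
    return np.roll(i1, shift=(-dy, -dx), axis=(0, 1))
```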

An intrinsic problem of intensity cross-correlation for deriving the similarity between two images is that it is sensitive to differences in pixel intensity. Therefore, other algorithms may be better suited to our purpose of getting a similarity score between two images.

1.4.2 Phase cross-correlation

While intensity cross-correlation operates in the image domain, phase cross-correlation finds the transformation parameters for the registration of the images while working in the Fourier domain. The phase cross-correlation between two images I0 and I1 is given by:

$$\mathrm{PCorr}(I_0, I_1) = \mathcal{F}^{-1}\left\{ \frac{\mathcal{F}\{I_0\}^{*}}{|\mathcal{F}\{I_0\}|} \cdot \frac{\mathcal{F}\{I_1\}}{|\mathcal{F}\{I_1\}|} \right\} \tag{1.3}$$

This method has the advantage of not being sensitive to pixel intensity, but has reduced overall performance in terms of its comparative abilities.

1.4.3 Normalized cross-correlation

Normalized cross-correlation is considered to be the gold standard for image comparison. It is used in the video acquisition software of the system of Veilleux et al. [2008] for the purpose of real-time correction of x- and y-axis displacement.

$$\mathrm{NCC}(\mathbf{u}) = \frac{\sum_i I_0(\mathbf{x}_i)\, I_1(\mathbf{x}_i + \mathbf{u})}{\sqrt{\sum_i I_0(\mathbf{x}_i)^2}\,\sqrt{\sum_i I_1(\mathbf{x}_i + \mathbf{u})^2}} \tag{1.4}$$

where u = (u, v) is the displacement between the two images [Szeliski, 2011].

The idea behind this formula is to maximize the product (or cross-correlation) between the two aligned images.

This method is very efficient for the correction of movement on the X/Y axis. However, relative to other existing algorithms, it is time-consuming and not robust to other types of image deformations encountered in two-photon microscopy, namely movement on the z-axis, rotation and noise.


All of the above methods for comparing two images perform the comparison before reducing the amount of data. Consequently, they are computationally intensive and not suitable for real-time microscopy applications.

1.4.4 Perceptual hashing

Perceptual hashing functions are designed to generate different hash values (n-dimensional vectors) for different image inputs, in such a way that the distance between two vectors is proportional to the perceptual distance between the two corresponding images. Perceptually similar images will generate close hashes, while dissimilar images will generate hashes remote from each other.

Perceptual hash functions are widely used to authenticate and classify multimedia files but, to our knowledge, their potential in the field of fluorescence microscopy has not yet been assessed. One of the advantages of these functions is that once a hash value has been calculated for an image, this compact value can be compared to that of other images very efficiently. The comparison of two hashes relies on a cross-correlation, but since the hash value of an image is several orders of magnitude smaller than the original image, the computing time required to compare two hashes is significantly smaller than the time required to compute an NCC comparison of two images.

Among the many perceptual hashing algorithms available, we chose to use the radial variance based hash [De Roover et al., 2005, Standaert et al., 2005] because of its good performance in benchmark tests [Zauner, 2010] measuring speed and robustness against the types of deformation we are likely to encounter in the context of in vivo two-photon microscopy, namely translation, rotation and noise.

The perceptual hashing of an image can only be done for one image channel at a time. In our case, the biologically relevant information comes from the two-photon fluorescence channel. The reflectance confocal channel is not taken into account.

We will attempt to demonstrate that perceptual hashing yields accurate similarity scores while being fast enough for real-time microscopy applications such as movement compensation through image classification.

The radial variance hash algorithm

Given two images to compare (Image1 and Image2), the radial variance hash algorithm is done in three steps:

1. Calculate the radial hash of Image1

2. Calculate the radial hash of Image2

3. Compute the distance (peak of cross-correlation) between the two hashes

Figure 1.1: Steps of radial hash calculation.

The calculation of the radial hash of one image is done in three main steps, each of which is described in more detail below and illustrated in Figure 1.1:

1. Blur the image

2. Construct a radial variance vector from the resulting image

3. Apply the Discrete Cosine Transform to the radial variance vector

Figure 1.2: Comparison of images before and after the blur function has been applied. High-frequency features have been removed. The right side shows a zoom on a feature of the image. This zoom was created using the nearest neighbor method.

Blur. Applying a blur filter passes the lower frequencies through while attenuating the higher ones. This keeps the important, low-frequency features of the image (which we are interested in matching) while getting rid of high-frequency noise (whose random source varies from image to image).

We apply a Gaussian blur with a standard deviation of 1.0. A Gaussian blur corresponds to the convolution of a 2-D Gaussian distribution with the image’s pixel values (see Figure 1.2).

Radial Variance Vector. The radial variance vector can be obtained by applying a modified version of the Radon transform to an image. In its original form, the Radon transform computes the sum of pixels over a set of straight parallel line projections passing through a sample for a number of angles (see Figure 1.3 for an overview and Figure 1.4 for a schematic overview of the parallel projection lines passing through the image). In the medical field, the main application of the Radon transform is to reconstruct images (using the inverse Radon transform) formed as part of tomographic imaging processes, such as computerised tomography or magnetic resonance imaging [Deans, 2007].

Figure 1.3: The Radon transform. For each angle φ of a given set, the sum of the pixels traversed by a line will be captured at the sensor.

In Standaert et al. [2005], the authors proposed a modification of the Radon transform called the Radial Soft Hash Algorithm (RADISH). This algorithm captures, in the resulting vector, the variance of the pixels along each projection line instead of their sum. This way, luminance discontinuities are better captured along the projection lines than with the original Radon transform. In an image, these discontinuities correspond to edges that are orthogonal to the projection direction. Hence, the variance is expected to capture relevant information about the visual content of an image.

The RADISH feature vector R[φ] is defined as follows:

$$R[\phi] = \frac{\sum_{(x,y)\in\Gamma(\phi)} I^2(x,y)}{\#\Gamma(\phi)} - \left(\frac{\sum_{(x,y)\in\Gamma(\phi)} I(x,y)}{\#\Gamma(\phi)}\right)^2$$

where Γ(φ) denotes the set of pixels (x, y) on the projection line corresponding to a given angle φ, I(x, y) is the luminance value of pixel (x, y) and #Γ(φ) is the number of pixels in Γ(φ).

Figure 1.4: Schematic overview of the 180 parallel lines being projected through the original image in 90 steps. The top portion of each diagram corresponds to the image, while the bottom portion corresponds to the resulting, temporary image formed from the radial projection lines. The blue and red projection lines passing through the image correspond to the blue and red lines in the temporary projection image. Both blue and red projection lines are calculated as part of the same looping iteration to minimize computation time.

Discrete Cosine Transform. As the last step of an image’s hash calculation, a Discrete Cosine Transform (DCT), a special case of a discrete Fourier transform, is applied to the resulting radial variance vector to form a compact and robust image feature vector. The 40 lowest-frequency coefficients are extracted from the DCT’s result to form the final hash. This approach was proposed by De Roover et al. [2005], who demonstrated that the addition of a DCT improves the qualities of a calculated perceptual hash for an image.
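To make these steps concrete, here is a simplified Python sketch of the whole hash calculation. It samples one central projection line per angle with nearest-neighbour rounding, and is only an approximation of the actual implementation by Zauner [2010]:

```python
# Simplified radial variance hash: blur, per-angle line variance (RADISH),
# then a DCT compacted to the 40 lowest-frequency coefficients.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.fft import dct

def radial_variance_hash(image, n_angles=180, n_coeffs=40):
    img = gaussian_filter(image.astype(float), sigma=1.0)  # blur step
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = min(cy, cx)
    t = np.linspace(-r, r, int(2 * r))  # sample positions along the line
    variances = np.empty(n_angles)
    for k in range(n_angles):
        phi = np.pi * k / n_angles
        # Nearest-neighbour samples of the line through the image center
        ys = np.clip(np.rint(cy + t * np.sin(phi)).astype(int), 0, h - 1)
        xs = np.clip(np.rint(cx + t * np.cos(phi)).astype(int), 0, w - 1)
        variances[k] = img[ys, xs].var()  # R[phi]
    # Keep only the lowest-frequency DCT coefficients as the hash
    return dct(variances, norm='ortho')[:n_coeffs]
```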

Distance calculation

Finally, the distance between two radial variance hash vectors is defined as the peak of the cross-correlation between them. For hashes p and q, each consisting of N (40) coefficients, it is defined as:

$$d(p, q) = \max_{i \in \{0, \ldots, N-1\}} \frac{\sum_{j=0}^{N-1} (p_j - \bar{p})\,(q_{(N+j-i) \bmod N} - \bar{q})}{\sqrt{\sum_{j=0}^{N-1} (p_j - \bar{p})^2 \sum_{j=0}^{N-1} (q_{(N+j-i) \bmod N} - \bar{q})^2}} \tag{1.5}$$

The entire radial variance hashing process is summarized in Figure 1.5.
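A numpy sketch of this distance, assuming the two hashes come from the radial_variance_hash function sketched above:

```python
# Hash distance (Eq. 1.5): peak of the normalized circular cross-correlation
# between two hash vectors p and q of equal length.
import numpy as np

def hash_distance(p, q):
    p = p - p.mean()
    q = q - q.mean()
    denom = np.sqrt((p ** 2).sum() * (q ** 2).sum())
    # Try every circular shift of q and keep the best correlation score
    return max(np.dot(p, np.roll(q, i)) / denom for i in range(len(p)))
```

A score close to 1 indicates perceptually similar images; the similarity thresholds discussed in chapter 2 are applied to this value.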

Overall, the characteristics of the pHash (perceptual hashing) algorithm make it promising for z-axis movement compensation as well as noise reduction in the context of in vivo two-photon microscopy. We will explore these topics in the next two chapters of this document.

1.5 Hypothesis and objectives

The hypotheses and objectives of this project were the following:

Hypothesis 1: The radial variance hashing algorithm will allow us to group images based on their z-axis position with more accuracy, robustness and speed than the normalized cross-correlation algorithm.

Objectives:

– Compare the robustness to translation, rotation and noise of the radial variance hash and normalized cross-correlation algorithms.

– Compare the performance of the hashing and NCC algorithms in grouping images from the same z-axis.

– Compare the speed of the hashing and NCC algorithms at comparing two images.

Hypothesis 2: Using the radial variance hashing algorithm in real-time in vivo microscopy will reduce shot noise and allow the concurrent visualization of multiple stabilized z-planes.

Objectives:

– Reduce shot noise by averaging ten images from the same optical slice.

– Compare the sharpness of images obtained by averaging time-consecutive images and images coming from the same optical slice.

– Integrate an implementation of the radial variance hash algorithm into iPhoton.

– Implement an algorithm to separate a video stream into a number of stabilized optical slices in real-time.

Figure 1.5: Radial variance hashing of 3 images. From top to bottom for each image: the source image, the temporary image formed from the radial projection lines, the variance vector and the corresponding discrete cosine transform. Notice that sample images i and ii have more in common visually, a fact that is reflected in the equivalent projection line transform, variance and DCT results.

1.6 Deliverable

At the end of this project, we expect to have a tool that will:

1. determine whether or not an image is similar to a reference image for averaging,

2. group images in real-time for averaging,

3. allow the concurrent display of multiple video streams corresponding to different optical slices.

Chapter 2

Assessment of the pHash algorithm in two-photon microscopy

2.1 Preamble

Many different laboratory situations will benefit from the tool developed for this mémoire. As an example, we will demonstrate its applicability and characterize its limits with a blood vessel imaging modality. This model is used to study the breakdown of the blood-brain barrier in live animals before they show clinical symptoms.

2.2 Image acquisition

The multimodal imaging system allows for the simultaneous acquisition and separation of different signals based on their wavelengths. The acquired images’ resolution was 748 pixels wide by 500 pixels high. We used a 30 fps multimodality imaging system designed specifically for live animal imaging and cell tracking, which has been described in detail in Veilleux et al. [2008]:

Briefly, the heart of the system is the fast laser scanning mechanism and its timing circuitry. The approach for the scanning platform is based on the original design of Rajadhyaksha and Webb. It consists of a galvanometer-mounted mirror and a spinning polygonal mirror that produce a unidirectional raster scan pattern. The horizontal (or fast) axis is scanned by the spinning polygonal mirror (DT-36-290-025, Lincoln Laser) at a speed of 480 revolutions per second, yielding a line scanning rate of approximately 18 kHz. The vertical (or slow) axis scan is performed by the galvanometer-based optical scanner (6240H, Cambridge Technology) coupled to a 15 mm clear aperture silver-coated mirror. To obtain the raster scan, the galvanometer-mounted mirror is synchronously scanned vertically at 30 Hz.

Light excitation is provided by a femtosecond Ti-Sapphire laser (Maitai HP, Spectra-Physics, Irvine, CA), operating at 80 MHz with a pulse duration of about 140 fs. The excitation beam is focused onto the sample via a 40× 0.8 NA Olympus objective.

The emitted fluorescence is first separated from the excitation with a primary dichroic filter (FF735-Di01, Semrock). Then the signal arising from the Texas Red is filtered with the emission filter (FF02-632/22-25, Semrock) and finally focused onto a photomultiplier tube (R3896, Hamamatsu).

The reflectance signal, arising from confocal backscattering, is acquired through an avalanche photodiode (APD).

The analog signals from the PMT and APD detectors and the synchronization signals, VSYNC and HSYNC, are sent to a video acquisition board (VGA2PCIe, Epiphan). A typical pixel clock of 10 MHz, internally generated by the frame grabber board, is used; this corresponds to a pixel dwell time of 100 ns.

2.2.1 Reference

In order to quantify typical movement in all three dimensions due to animal breathing and heartbeat, two series of experiments were performed with the same animals, alive and after sacrifice. After live acquisition, the data set provided a reference to quantify precisely this movement and the accuracy of the algorithm.

Experiments took place in the course of 2 days, March 20th and 21st 2013, at the Centre de recherche de l’Institut universitaire en santé mentale de Québec.

Two mice were prepared each day, in accordance with the Animal Care Committee (CPAUL) rules. All mice were anesthetized using isoflurane. Thirty minutes after isoflurane was first inhaled, Texas Red dextran was injected intravenously. One of the four mice died shortly after the Texas Red was injected; no images were acquired for this animal. The mice were mounted on a micromanipulator platform (MPC-385, Sutter) whose resolution (the minimal microstep size is 62.5 nanometers) allowed us to keep track of the z-position of the platform.

For each biological replicate (mouse), 10 to 30 regions of interest (ROIs) were chosen to cover a range of differing biological image types. The ROIs were chosen to encompass one or a few branchings of blood vessels in the spinal cord.

For each ROI, video acquisition and recording to disk at 30 frames per second were done for at least 20 seconds, acquiring at least 600 frames in the process. Each frame was composed of two channels: a structural channel based upon the reflectance signal (assigned to the blue channel) and the fluorescent dye signal (assigned to the green channel). The duration of the acquisition allowed us to see multiple instances of the mice breathing, at a typical 1 Hz breathing frequency.

After the acquisition of video streams was done, the mice were killed by an overdose of ketamine-xylazine. Once the mice were confirmed to be dead, we found the ROIs which were recorded while the mouse was alive, and then proceeded to acquire video streams of the same, now still, ROIs for more than 20 z-level steps, each step separated from the next by a distance of 2 µm. We made sure to acquire still ROI video streams above the structure (in terms of the z-position), throughout the structure, and below the structure of interest.

2.3 Robustness to XYZ translation, rotation and noise

To confirm that the radial variance hashing algorithm (the "pHash algorithm") can operate within our experimental setting, we checked its robustness against three types of image distortion common in in vivo two-photon microscopy, namely translation and rotation in the image plane, and noise. These distortions were applied to 20 images sourced from randomly chosen ROIs from different experiments.

The distortions were applied to the images using a custom-made Python script (using the Python Imaging Library (PIL) package) and compared to their original counterparts using the pHash algorithm as well as normalized cross-correlation (NCC).

2.3.1 Translation in XY parallel to image plane

To generate translated images, we determined the comparison image size by subtracting the translation distance (the same distance was used in the x and y axes) from the total image size. Thus, for a source 748 × 500 image and an example translation of 100 pixels, we created two 648 × 400 images: one whose corners were (0,0) and (648,400) in the original image, and another cropped image bound by the (100,100) and (748,500) corners. These two images were then compared using pHash and NCC for a range of translation values (0 to 180 pixels).

2.3.2 Translation in Z perpendicular to image plane

To assess robustness to movement on the z-axis of the pHash and the NCC algorithms, we compared pairs of images taken from the same ROI at approximately the same time with a z-axis position shift (δz) ranging from 0 to 20 microns in steps of 2 microns.

2.3.3 Rotation on the image’s plane

To generate rotated images, we used the Python PIL package’s “rotate” function upon the original images. This function was called with a number of angle values; it mapped the rotated pixels using a “nearest neighbour” strategy.


Figure 2.1: Comparative robustness of the pHash and normalized cross-correlation algorithms to three common image deformation types.

2.3.4 Noise

To generate noisy images, we generated random images of the same size as the originals. Pixel values for each noise image were generated using the Python numpy package’s numpy.random.normal function, for which the scale parameter was varied. The noise and original images were summed pixel-by-pixel, and the resulting distorted image was compared to the original using both comparison algorithms.
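A sketch of the three distortion generators described above, assuming PIL and numpy (parameter ranges mirror those shown in Figure 2.1):

```python
# Distortion pipeline sketch: overlapping crops for translation,
# nearest-neighbour rotation, and additive Gaussian noise.
import numpy as np
from PIL import Image

def translated_pair(img, d):
    """Two overlapping crops of img, offset by d pixels in x and y."""
    w, h = img.size
    return img.crop((0, 0, w - d, h - d)), img.crop((d, d, w, h))

def rotated(img, angle):
    """Rotate by the given angle; PIL resamples by nearest neighbour."""
    return img.rotate(angle)

def noisy(img, scale):
    """Add zero-mean Gaussian noise with the given standard deviation."""
    arr = np.asarray(img, dtype=float)
    arr += np.random.normal(loc=0.0, scale=scale, size=arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```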

2.3.5 Results

Figure 2.1 shows the comparative robustness of the normalized cross-correlation and of the pHash algorithm to the three image deformations described above. We see that the NCC performs better than pHash in the case of translation in the plane of the image. As for rotation and noise, pHash shows itself to be the better-suited comparison algorithm, as it is more robust to these two types of distortion.

Regarding z-translation, as can be seen in Figure 2.2, the cross-correlation fit cannot discriminate between images in the same z-stack and images in different z-stacks. Indeed, scores lie approximately between 0.1 and 0.2 regardless of the distance between two images (see Figure 2.3 for a distribution of the scores).

By contrast, the distance between the pHash values of two images (also known as the delta pHash) is informative on whether two images are close together on the z-axis. This feature of the pHash score will allow us to define a score above which images are less than 2 microns apart (see section 2.4.1). As images get further apart on the z-axis, the delta pHash between them seems to decrease more rapidly for some ROIs than for others, following the regularity of the actual structure being observed.

Figure 2.2: Robustness to z-axis movement. Colors represent different ROIs. The mean values of the normalized cross-correlation fit and the radial variance hashing similarity score were calculated for a thousand pairs of images separated by a known z-axis displacement δz.

2.4 Use of pHash distance for grouping

Z-stack grouping allows software-based compensation of movement by selectively showing images from a stream similar to a representative image taken from a z-stack.

To assess the performance of the pHash algorithm in this task, we needed spinal cord images with known relative z-coordinates, which could not be obtained from a living animal. To circumvent this problem, we acquired images from 10 to 30 regions of interest (ROIs), at least 600 time points each, for each mouse while it was alive, and repeated the image acquisition process for the same ROIs once it was dead.

Figure 2.3: Distribution of the cross-correlation fit between pairs of images. The top panel (actual positives) counts the number of pairs of images taken from the same ROI at the same z-axis position, whereas the bottom panel counts pairs of images taken from different ROIs altogether (actual negatives). If the cross-correlation fit between a pair of images were a good measure of their similarity, we would expect the distribution of the actual positives to be skewed to the right and the distribution of the actual negatives to be skewed to the left.

We compared the pHash algorithm to the normalized cross-correlation, considered the gold standard for determining image similarity, on four main features: sensitivity, specificity, robustness and speed.

2.4.1 Sensitivity and specificity

In the context of grouping microscopy images by their similarity, a sensitive method will have a high proportion of images from the same optical slice grouped as such (high true positive rate) while a specific method will rarely group images from different optical slices together (low false positive rate).

In order to keep a balance between these two important features of our grouping algorithms, we compared the normalized cross-correlation and pHash scores of two sets of 1500 pairs of images representing actual positives and actual negatives. Actual positives are images taken from the same ROI at the same z-axis position at a slightly different time. Actual negatives are images taken from different ROIs. Figures 2.3 and 2.4 show the distribution of the similarity scores of both algorithms on actual positives and actual negatives.

From these data, a threshold for calling two images similar was set for both algorithms (0.144 for the normalized cross-correlation and 0.708 for the pHash algorithm) such that there would be 5% type I error (false positives). The corresponding type II error (false negatives) was 72.07% for the cross-correlation and 4.93% for the pHash algorithm (Table 2.1).

Figure 2.4: Distribution of the delta pHash between pairs of images. The top panel (actual positives) counts the number of pairs of images taken from the same ROI at the same z-axis position whereas the bottom panel counts pairs of images taken from different ROIs altogether (actual negatives). We can observe that the delta pHash between a pair of images is a good measure of their similarity since the distribution of actual positives is skewed to the right and the distribution of actual negatives is skewed to the left.

This demonstrates that the pHash algorithm is much better at discriminating dissimilar images than the normalized cross-correlation.

Furthermore, as seen in Figure 2.2, the pHash score stays above the 0.708 threshold for at least a 2 micron z-distance in all ROIs. This means that the 0.708 threshold could be used to filter images belonging to a 2 micron thick z-stack.

Considering this, in our experimental context, the pHash algorithm can associate images based on their z-axis position within a distance of 2 microns. By contrast, the z-axis threshold distance allowing the cross correlation algorithm to group two images is below 2 microns.

                  Cross correlation (threshold = 0.144)       pHash (threshold = 0.708)
                  Predicted positive   Predicted negative     Predicted positive   Predicted negative
Actual positive   30.0 ± 3.6%          70.0 ± 2.3%            95.2 ± 1.8%          4.8 ± 1.9%
Actual negative   3.73 ± 2.0%          96.3 ± 1.0%            8.3 ± 3.2%           91.6 ± 2.0%

Table 2.1: Confusion table. The threshold was set to achieve about 95% specificity in both cases.

2.5 Computing time

When the radial variance hashing algorithm is used for movement compensation in real-time, each newly acquired image is compared to a large number of other images in the stream (see section 3.4). This process takes place in two steps. The first step is to calculate the radial variance hash of the entering image; the hashing function is only called once per entering image, and the hash value is stored in memory. In the second step, the similarity score between the hash value of the entering image and the hash values of a large number of other images is computed. Since the distance function is called many times per entering image, it must be very fast to be amenable to real-time processing.
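A minimal sketch of this hash-once, compare-many pattern, reusing the hypothetical radial_variance_hash and hash_distance helpers sketched in chapter 1:

```python
# Each entering frame is hashed once (expensive), then compared against
# the stored hashes of all previous frames (cheap).
stored_hashes = []

def process_frame(frame):
    h = radial_variance_hash(frame)  # computed once per frame
    scores = [hash_distance(h, prev) for prev in stored_hashes]
    stored_hashes.append(h)
    return scores  # similarity of the new frame to every previous one
```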

By contrast, it is impossible to store a value that will speed up the comparison between two images when the normalized cross-correlation is used. The entire algorithm has to be run for each pair of images (see Figure 2.5).

In the context of two-photon microscopy at 30 fps, the image processing steps done for every newly acquired image must run in less than 33 ms. In the following section, we will compare the speed of the different parts of the algorithms we discussed concerning movement compensation on the z-axis.

2.5.1 Normalized cross-correlation

The normalized cross-correlation between two images can be calculated most efficiently by taking advantage of the correlation theorem, which states that multiplying the Fourier transform of one function by the complex conjugate of the Fourier transform of the other gives the Fourier transform of their correlation [Lyon, 2010]. To implement it, one needs to take both signals into the frequency domain, form the complex conjugate of one of the signals, multiply, then take the inverse Fourier transform.

iPhoton implements this normalized cross-correlation using Apple’s optimized Accelerate framework. This framework makes use of the graphics processing unit (GPU), which is able to do this computation in less time than a software-based, central processing unit (CPU) implementation. The graphics card used for our tests was an NVIDIA GeForce GT 330M. The algorithm was run on 300 images taken from different video streams acquired during our experiment in order to get a representative mean. Each image consisted of a unique 8-bit channel of 500 × 748 pixels acquired via a VGA2PCIe (Epiphan) card.

Figure 2.5: Comparison of the normalized cross-correlation and the radial variance hashing algorithms for comparing pairs of images. In the case of the NCC, images are compared directly in 13.21 ms. In the case of the pHash algorithm, once a hash has been calculated for an image, further comparisons take less than a hundredth of a ms.

Operation                                                      Processing time (ms)
Comparison of two images using normalized cross-correlation    13.21
Calculation of the radial variance hash of one image           11.148
Comparison of two radial variance hashes                       0.0093

Table 2.2: Speed of the operations required for real-time image comparison using normalized cross-correlation and pHash.

2.5.2 pHash

The running time of the two steps of the pHash algorithm (the radial variance hashing function and the calculation of the similarity score between two hash values) was measured separately using Zauner [2010]’s implementation of the algorithm on a 2.66 GHz Intel Core i7 processor. In both cases, the algorithm was run on 200 images in order to get a representative mean.

It should be noted that no hardware accelerator was used in our implementation of pHash. Such hardware, as well as the use of specialized functions included in standard image processing libraries (e.g., OpenCV or Apple’s Accelerate framework), could have improved the speed of our implementation. Furthermore, pHash Pro, a commercial product built on top of the open source implementation we used in this work, claims to achieve speeds up to 50 times faster. As the speed of the radial variance hashing algorithm was not a limiting factor for our demonstrations, we chose not to use the Pro version of the software. However, in another experiment where many other unrelated calculations would have to be done for each entering image, the speed of the hashing function could become limiting and justify the use of the aforementioned specialized functions, hardware accelerators or pHash Pro.

2.5.3 Results

The computing time of the NCC as well as the computing time of both parts of the pHash algorithm are shown in Table 2.2. It appears that the radial variance hashing algorithm runs much faster than the normalized cross-correlation, especially when many comparisons have to be made.

For example, suppose an experimenter wants to compare the entering image of a video stream to the previously acquired images of the stream and has a maximum of 33 ms to do so. Using normalized cross-correlation, it would only be possible to compare the entering image to 2 images (for a total computation time of 26.42 ms). By contrast, if pHash were used instead, a hash value would be calculated for each acquired image and stored in memory. The size of these hashes is minimal: 40 bytes in the case of the radial variance hashing algorithm we have been studying. This operation takes 11.148 ms and happens only once per image. After that initial computation, the hashes can be compared directly, which is far more efficient than comparing full images: one comparison takes less than one hundredth of a ms. During the 18.852 ms left for computation, it would theoretically be possible to make about 2,000 comparisons with previously acquired images (see the schema in Figure 2.5).
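As a consistency check against the timings reported in Table 2.2:

$$\frac{18.852\ \text{ms}}{0.0093\ \text{ms per comparison}} \approx 2027\ \text{comparisons}$$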

The speed with which comparisons can be made using this method allows the grouping of similar images in real-time. Depending on the needs of the experimenter, they can choose to view one or many distinct optical slices during and after the experiment (see section 3.4).

2.6 Conclusion

In summary, feature hashing has been shown to be more robust to rotation and noise, and highly usable, in terms of speed, for discriminating different biological images in real-time with high specificity and sensitivity.

One weakness of the pHash algorithm is its relative sensitivity to translation. Indeed, cross-correlation remains the most robust algorithm against this kind of distortion. This could affect the performance of pHash in the task of grouping images from the same z-stack if the images also have an important translation between them. It would be tempting to try to lessen the problem by realigning each new entering image using cross-correlation before calculating pHash scores. In our experience, a drawback would be that when images are too dissimilar, cross-correlation sometimes behaves unpredictably and produces wrong alignments, which would in turn decrease the performance of pHash.

In the following chapter, we will show an implementation of pHash used in two-photon microscopy, with an example from blood-brain barrier studies in the live spinal cord.


Chapter 3

Implementation of the radial variance hashing algorithm for real-time two-photon microscopy

3.1 Shot noise reduction

As we discussed in section 1.3, averaging a few similar images reduces shot-noise and thus increases the signal-to-noise ratio (SNR).

In a live video stream, instead of showing the entering image, it is possible to show the average of the entering image combined with a few previously recorded images.

In the experimental setup we used for acquiring images (see section 2.2), the averaging of 10 images gave satisfying results. As the SNR follows the square root of the number of images, averaging 10 images gives approximately a three-fold improvement of this ratio.

The simplest way to choose which images to average is to take the last 10 sequential images from the stream. This works well when the target is stabilized or if its movement is physically compensated. However, when the target is moving on the z-axis, the perceptual differences between the averaged images will blur (lower the spatial resolution of) the resulting image. By contrast, selecting, for the averaging function, images perceptually similar to the entering image (and thus most likely to originate from the same optical slice) will also increase the SNR while avoiding the loss of spatial resolution.
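A sketch of this similarity-filtered averaging, assuming the helper functions sketched in chapter 1, the 0.708 threshold derived in chapter 2, and frames held as single-channel numpy arrays (in practice, hashes would be cached and frames registered on x/y before averaging):

```python
# Average the entering frame with up to 9 perceptually similar previous
# frames instead of the 9 most recent ones.
import numpy as np

def averaged_view(reference, previous, n=10, threshold=0.708):
    ref_hash = radial_variance_hash(reference)
    similar = [f for f in previous
               if hash_distance(ref_hash, radial_variance_hash(f)) >= threshold]
    stack = [reference] + similar[-(n - 1):]  # most recent similar frames
    return np.mean(np.stack(stack), axis=0)
```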

Figure 3.1 is based on a live video stream acquired using the same experimental method as described in section 2.2. Panel A shows an example of a single entering image (the reference image) along with a zoom on a structural element. Although a structure is clearly recognizable, there is a lot of shot noise in and around the structure and its edges are not sharp. The image in panel B was formed by combining the reference image with the 9 previous images in the stream, regardless of their similarity to the reference image. The 10 images were aligned on the x/y axes using the normalized cross-correlation algorithm (see section 2.5.1) and then averaged. In this image, the signal-to-noise ratio is increased in comparison to the reference image. However, since there is no compensation for movement on the z-axis, the structure’s edges are not as sharp as they could be. In panel C, instead of selecting the 10 previous images of the stream, we selected 10 previous images by filtering out those whose similarity score, compared to the reference image, was lower than the previously-mentioned 0.708 threshold. By doing this, only images very similar to the reference image (most likely because they were from the same z-stack; see section 2.4) were gathered, aligned on the x/y axes and averaged. This filtering has the effect of reducing the shot noise without blurring the structures’ edges.

3.1.1 Computing time

The computing time required to calculate the average of 10 images consisting of a single 8-bit channel of 500 * 748 pixels was measured using functions provided by Apple's Accelerate framework. The vImageConvert_Planar8toPlanarF function was used to convert the 8-bit channel image into a 32-bit floating point one, while the vDSP_vadd function was used to sum the images. The graphics card used by these hardware-accelerated functions was an NVIDIA GeForce GT 330M. The process was repeated 10 times, with an average computing time of 3.64 ms.
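As an illustration, a minimal C sketch of this averaging step is shown below, using the two Accelerate calls named above. The use of vDSP_vsdiv for the division and vImageConvert_PlanarFtoPlanar8 for the conversion back to 8 bits is our assumption about the surrounding steps, not a description of iPhoton's actual code; buffer dimensions are hard-coded for clarity, frames are assumed tightly packed, and error handling is omitted.

    #include <stdlib.h>
    #include <Accelerate/Accelerate.h>

    #define N_FRAMES 10
    #define WIDTH    748   /* orientation assumed: 748 wide, 500 high */
    #define HEIGHT   500

    /* Average N_FRAMES planar 8-bit frames into an 8-bit result.
       Each frames8[i] is assumed tightly packed (rowBytes == WIDTH). */
    void average_frames(const vImage_Buffer frames8[N_FRAMES],
                        const vImage_Buffer *result8)
    {
        const vDSP_Length n = (vDSP_Length)WIDTH * HEIGHT;
        float *sum = calloc(n, sizeof(float)); /* running sum, starts at zero */
        float *tmp = malloc(n * sizeof(float));
        vImage_Buffer tmpF = { tmp, HEIGHT, WIDTH, WIDTH * sizeof(float) };

        for (int i = 0; i < N_FRAMES; i++) {
            /* 8-bit -> 32-bit float, mapping [0, 255] onto [0.0, 255.0] */
            vImageConvert_Planar8toPlanarF(&frames8[i], &tmpF,
                                           255.0f, 0.0f, kvImageNoFlags);
            vDSP_vadd(sum, 1, tmp, 1, sum, 1, n);   /* sum += tmp */
        }

        float count = (float)N_FRAMES;
        vDSP_vsdiv(sum, 1, &count, sum, 1, n);      /* sum /= N_FRAMES */

        /* Back to 8 bits for display. */
        vImage_Buffer sumF = { sum, HEIGHT, WIDTH, WIDTH * sizeof(float) };
        vImageConvert_PlanarFtoPlanar8(&sumF, result8, 255.0f, 0.0f,
                                       kvImageNoFlags);
        free(sum);
        free(tmp);
    }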

3.2 Improvement of the normalized cross-correlation registration by pHash

When the normalized cross-correlation algorithm is used to register images from a video stream without first filtering out those belonging to different z-stacks, the images may be misaligned. If pixel position and intensity are used as a proxy for the measurement of some biological process of interest (blood-brain barrier permeability, for example), this misalignment can lower the validity of the experimental results.

To illustrate this problem, we used a series of 10 consecutive images during which there was movement on the z-axis caused by the mouse's breathing. As would be the case if we were to naively average an entering image with the 9 previous images to increase the signal-to-noise ratio (see section 3.1), we used the normalized cross-correlation algorithm to register each image to the last one, which serves as both the entering image and the reference image (see figure 3.2).

Since most of these images are not on the same z-stack, it would be impossible to align them correctly in the x/y plane. Forcing the normalized cross-correlation algorithm to do so leads to aberrations where the images are shifted in the x/y plane when they shouldn't be. The most obvious cases in our example are images -5 and -7 of figure 3.2.


Figure 3.1: Shot-noise reduction and sharpness of images when time-sequential or similarity-sequential images are compensated on the x/y axes and averaged. Panel A shows the single ref-erence image; Panel B uses 10 time-sequential images and panel C uses 10 similarity-sequential images with a radial variance hashing similarity score threshold of 0.708 when compared to the reference image. A zoom on part of a structure is shown on the right side of each panel. Scale bars are 25 µm and 4 µm on the left and right sides, respectively.


Figure 3.2: Demonstration of the possible misalignment when images from different z-stacks are aligned using the normalized cross-correlation algorithm. The reference image is shown on the bottom row while the 9 previous images (numbered from -9 to -1) are shown on the top rows. Images with a radial variance hashing similarity score when compared to the reference image below 0.708 are framed in red. These images are considered as belonging to a different z-stack and would not have been aligned to the reference image. Scale bar is 25 µm.


If the hashing algorithm had been used to filter out images not on the same z-stack as the reference image, all the images framed in red would have been excluded. This includes the misaligned images (especially images -5 and -7) as well as images where different parts of the structure could be observed more accurately if they were grouped with similar images rather than with the reference image (see the algorithm described in section 3.4.1).
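In code, this filtering step reduces to a simple guard placed in front of the registration call. The sketch below uses placeholder names (Image, Hash, radial_hash, hash_similarity, ncc_register) standing in for the corresponding hashing and registration routines; they are illustrative declarations, not an existing API, and the feature-vector length is chosen arbitrarily.

    #define SIMILARITY_THRESHOLD 0.708

    typedef struct Image Image;              /* opaque frame type (placeholder) */
    typedef struct { double c[40]; } Hash;   /* radial variance feature vector;
                                                length chosen for illustration  */

    Hash   radial_hash(const Image *img);                 /* placeholders */
    double hash_similarity(const Hash *a, const Hash *b);
    void   ncc_register(Image *img, const Image *ref);

    /* Align `img` to `ref` in x/y only if both frames appear to come from
       the same optical slice; return 1 if the alignment was performed. */
    int register_if_same_slice(Image *img, const Image *ref)
    {
        Hash h_img = radial_hash(img);
        Hash h_ref = radial_hash(ref);

        if (hash_similarity(&h_img, &h_ref) < SIMILARITY_THRESHOLD)
            return 0;              /* likely a different z-plane: skip it */

        ncc_register(img, ref);    /* same z-plane: safe to align in x/y  */
        return 1;
    }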

3.3 Integration of the pHash algorithm into iPhoton

An implementation of the radial-variance-based hash function [Zauner, 2010] was integrated into iPhoton, the microscopy software used in our lab.

iPhoton [Veilleux et al., 2008] is a Mac OS X microscopy software written in-house by our laboratory's principal investigator Daniel Côté. It makes the most of available software- and hardware-accelerated image processing libraries in order to acquire video streams from a variety of video acquisition boards, process these streams in real-time and simultaneously write them to disk. iPhoton's flexible architecture allows the implementation and use of additional real-time image processing modules, named FilterUnits. FilterUnits enabled by the user are called upon the acquisition of every video frame to process the image; these units can either produce a new video stream from the resulting processed images, or can generate other types of data such as streams of floating point numbers. In order to maintain the real-time capabilities of the software, FilterUnits must process each image in less time than a frame's lifetime, which at 30 fps (the typical acquisition rate) corresponds to 33 milliseconds. In order to "play nice" with the other units, which require computational time as well, FilterUnits must limit their processing time as much as possible.
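iPhoton's actual FilterUnit interface is not reproduced in this document; purely as an illustration of the contract described above, a hypothetical C declaration of such a per-frame module could look as follows.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical sketch of a per-frame processing module; the real
       iPhoton FilterUnit API may differ in every detail. */
    typedef struct {
        const char *name;
        /* Called once per acquired frame (planar 8-bit pixels).
           Must return well under the 33 ms frame lifetime at 30 fps,
           leaving as much time as possible for the other enabled units. */
        void (*process_frame)(const uint8_t *pixels,
                              size_t width, size_t height, void *user_data);
        void *user_data;
    } FilterUnit;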

The use of hardware-accelerated libraries (such as Apple’s Accelerate Framework), which many existing FilterUnits take advantage of, reduces the computation time and allows the simulta-neous use of multiple units.

An implementation of the radial variance hashing algorithm was integrated into iPhoton as a FilterUnit. When enabled, this FilterUnit creates new video streams for images which are deemed similar, given a user-configurable similarity threshold. Each video stream can be viewed in real-time in a separate user interface window, allowing the experimenter to view separate groups of similar images, these groups corresponding to images from different z-stack planes.

3.4 Image grouping in real-time using the pHash algorithm

The ultimate goal of this project was to allow experimenters to get a clearer and more stable view of their specimen than what is possible with our current acquisition system. By integrating similarity-based grouping based on radial variance hashing similarity scores into iPhoton, we were able to add the benefits associated with z-axis stacking on top of all the features already included in the software, such as filters and x/y stabilization.

We used the algorithm described in section 3.4.1 to provide the experimenter with a video stream split into groups representing different z-stacks, each of which has an increased signal-to-noise ratio and is stabilized in the z-axis when compared to the original video stream. Essentially, every entering image is either displayed as part of an existing video stream formed from images similar to one another, or it is displayed alone as the start of a new group.

3.4.1 Algorithm

The pseudo-code of the algorithm used to separate z-stacks in real time within iPhoton is detailed in Algorithm 1.

    Create a first group with the first image of the video stream;
    foreach Image do
        bestGroupScore = 0;
        bestGroup = None;
        Calculate and store the pHash value of Image;
        foreach Group do
            currentScore = pHash score between Image and the first image of Group;
            if currentScore ≥ bestGroupScore then
                bestGroupScore = currentScore;
                bestGroup = Group;
            end
        end
        // At this point, we know to which group the image is most similar
        // (bestGroup) and a score was assigned to this group (bestGroupScore);
        if bestGroupScore ≥ 0.708 then
            Align Image with the latest image of bestGroup using cross-correlation;
            Assign Image to bestGroup;
            Average Image with the last 9 images of bestGroup;
            Display Image in bestGroup's video stream;
        else
            if number of groups < MAX-NUMBER-OF-GROUPS then
                Create a new group with Image;
                Display Image in a new video stream;
            else
                Reject Image;
            end
        end
    end

Algorithm 1: Real-time image sorting within iPhoton
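A compact C rendering of Algorithm 1 is sketched below. It reuses the placeholder declarations from the sketch in section 3.2 (Image, Hash, radial_hash, hash_similarity, ncc_register) and adds hypothetical group helpers; only the control flow is meant to mirror the algorithm, not iPhoton's actual implementation.

    #define MAX_NUMBER_OF_GROUPS 10
    #define SIMILARITY_THRESHOLD 0.708

    typedef struct {
        Hash   first_hash;   /* hash of the group's first image          */
        Image *frames[10];   /* ring buffer of the group's latest frames */
        int    count;
    } Group;

    /* Hypothetical helpers for group bookkeeping, averaging and display. */
    const Image *group_latest(const Group *g);
    void         group_append(Group *g, Image *img);
    Image       *group_average(const Group *g, int n);
    void         display_in_stream(int stream, const Image *img);

    static Group groups[MAX_NUMBER_OF_GROUPS];
    static int   n_groups = 0;

    void handle_frame(Image *img)
    {
        Hash h = radial_hash(img);

        /* Find the group whose first image is most similar to this frame. */
        double best_score = 0.0;
        int    best = -1;
        for (int g = 0; g < n_groups; g++) {
            double s = hash_similarity(&h, &groups[g].first_hash);
            if (s >= best_score) { best_score = s; best = g; }
        }

        if (best >= 0 && best_score >= SIMILARITY_THRESHOLD) {
            /* Same optical slice: align, store, average, display. */
            ncc_register(img, group_latest(&groups[best]));
            group_append(&groups[best], img);
            display_in_stream(best, group_average(&groups[best], 10));
        } else if (n_groups < MAX_NUMBER_OF_GROUPS) {
            /* New slice: open a new group and a new display stream. */
            groups[n_groups].first_hash = h;
            groups[n_groups].count = 0;
            group_append(&groups[n_groups], img);
            display_in_stream(n_groups, img);
            n_groups++;
        }
        /* else: group limit reached; the frame is rejected. */
    }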


3.4.2 Results

Usually, after the experimenter has selected an ROI, the whole depth of the region will be covered in the first seconds of the video stream, a period sufficient to encompass at least one heartbeat and one full breath of the living animal. During this period, a few similar-image groups will be created; they will remain stable as long as the experimenter stays on the same region of interest.

Every new image being acquired will be displayed in one of several distinct grouped video streams. Since new images added to existing grouped streams will, with 95% confidence, have a similarity score above 0.708 when compared to the first image of the group to which they are assigned by the grouping algorithm, these groups will appear stable and will most likely represent slices less than 2 µm thick (see section 2.4.1). As a result, the z-axis movement of the living animal will be transparent to the experimenter.

The algorithm described in section 3.4.1 is suitable for real-time display. At 30 fps, there are about 33 ms available for computation between two images. The important steps taking place for every entering image are the following, in decreasing order of computing time:

– Alignment of the new image with the last image of the group by means of normalized cross-correlation (13.2 ms)

– Calculation of the radial variance hashing value of the entering image (11.1 ms)

– Averaging of the 10 last images of a group for display (3.6 ms)

An entering image also needs to be compared with the first image of each already existing group. At 0.01 ms per comparison, about one hundred such comparisons can be made per millisecond of the time remaining per frame. Since groups represent video streams designed to be viewed simultaneously by an experimenter, having more than 10 groups is not practical. To make sure the stream would not be interrupted by too large a number of comparisons, a constant representing the maximum number of groups allowed, MAX-NUMBER-OF-GROUPS, was included in the algorithm and set to 10. The number of groups never exceeded 10 in the single-ROI video streams acquired during our experiments.

By restricting the number of groups to 10, we also cap the computing time spent on comparisons at 0.1 ms.

This leaves about 5 ms for the other processes implemented in iPhoton, such as displaying the image, writing to disk and other image processing operations.

An example live video stream split into 8 groups can be seen at the following URL: http://cafeine.crulrg.ulaval.ca/grouping.mov. The video is divided into panels: one shows the incoming stream, while the 8 other divisions show the 8 groups formed in real-time by the algorithm described in section 3.4.1.


Conclusion

Biological imaging suffers from animal movement, which leads to images that appear to shift from side to side or change depth, preventing their averaging and analysis. Many different strategies have been developed over the years, but most suffer from poor robustness or long computation times, making them unusable in real-time in vivo applications.

We therefore selected and implemented a perceptual hash algorithm appropriate for biological images. We have shown that the algorithm is sensitive enough to differentiate images that appear different to the user while being robust to small translations (on the x, y and z axes), rotations and the presence of shot noise. The radial variance hashing algorithm compares favorably to normalized cross-correlation in terms of robustness to rotation and noise and performs better at grouping images based on the z-axis coordinate. Moreover, once the hash of an image has been calculated, the comparison between two images is many times faster than what can be achieved with an algorithm based on cross-correlation.

An implementation of the radial variance algorithm was integrated into iPhoton, which allowed us to demonstrate its applicability to in vivo microscopy. The speed with which the comparisons, and thus the grouping, can take place allows the algorithm to be run in real time on the acquired video stream. Grouping microscopy images by optical slice allows the averaging of the most similar images, effectively reducing shot noise. These properties make it possible to concurrently display sharp video streams of different optical slices in real time. Now that we have demonstrated the ability of the pHash algorithm to group images belonging to the same z-stack, a natural next step would be to process a video stream in real-time to model the biological structures under observation in three dimensions.


Bibliography

D. Davalos, J. K. Lee, W. B. Smith, B. Brinkman, M. H. Ellisman, B. Zheng, and K. Akassoglou. Stable in vivo imaging of densely populated glia, axons and blood vessels in the mouse spinal cord using two-photon microscopy. Journal of Neuroscience Methods, 169(1):1–7, Mar. 2008. ISSN 01650270. doi: 10.1016/j.jneumeth.2007.11.011.

C. De Roover, C. De Vleeschouwer, F. Lefebvre, and B. Macq. Robust image hashing based on radial variance of pixels. In IEEE International Conference on Image Processing, 2005. ICIP 2005, volume 3, pages III–77–80, 2005. doi: 10.1109/ICIP.2005.1530332.

S. R. Deans. The Radon transform and some of its applications. Dover Publications, 2007.

W. Denk, J. H. Strickler, and W. W. Webb. Two-photon laser scanning fluorescence microscopy. Science, 248(4951):73–76, Apr. 1990. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.2321027. PMID: 2321027.

S. Laffray, S. Pagès, H. Dufour, P. De Koninck, Y. De Koninck, and D. Côté. Adaptive movement compensation for in vivo imaging of fast cellular dynamics within a moving tissue. PLoS ONE, 6(5):e19928, May 2011. doi: 10.1371/journal.pone.0019928.

D. Lyon. The discrete fourier transform, part 6: Cross-correlation. Journal of Object Technology, 9(2), 2010.

F. X. Standaert, F. Lefebvre, G. Rouvroy, B. Macq, J. J. Quisquater, and J. D. Legat. Practical evaluation of a radial soft hash algorithm. In International Conference on Information Technology: Coding and Computing, 2005. ITCC 2005, volume 2, pages 89–94, 2005. doi: 10.1109/ITCC.2005.229.

R. Szeliski. Computer vision: algorithms and applications. Springer, London; New York, 2011. ISBN 9781848829343.

I. Veilleux, J. A. Spencer, D. P. Biss, D. Côté, and C. P. Lin. In vivo cell tracking with video rate multimodality laser scanning microscopy. IEEE Journal of Selected Topics in Quantum Electronics, 2008.

C. Zauner. Implementation and benchmarking of perceptual image hash functions. Master's thesis, Upper Austria University of Applied Sciences, Hagenberg Campus, 2010.
