List of Figures - Automatic non linear metric learning : Application to gesture recognition

1 Architecture of the original SNN. Two networks sharing the same set of weights W take two samples X and Y as input in order to compute an error relative to a similarity function defined on the output space (here, cosine similarity), using the respective output vectors O_X and O_Y.

2 Spring systems. On the left, two similar samples attract each other, while on the right two dissimilar samples repel each other, with a maximum distance, shown as a dotted line, beyond which the points are far enough apart to no longer repel each other. Extracted from [HCL06].

3 3D representation of the effect of an update step on the norm of the outputs for a pair (O_X₁, O_X₂) at time t. The samples are shown with the same norm for simplicity. The grey cone corresponds to the equipotential surface for the function

1.1 User talking and pointing to items on the screen with the Put-That-There interface. Extracted from [Bol80].

1.2 Relation between Ambient Intelligence (AmI) and other areas in Computing Science. Extracted from [AM07].

1.3 Five key properties of metric learning algorithms. Extracted from [BHS13].

2.1 Representation of the three axes of MEMS sensors, relative to the smartphone device.

2.2 Schematics for a one-dimensional MEMS accelerometer. Extracted from [ins].

2.3 Schematics for a one-axis vibrating structure MEMS gyrometer. Extracted from [GK03].

2.4 Resonance modes of the tuning fork gyrometer. Extracted from [ZSZA08].

2.5 Single driving structure of 3-axis digital gyroscopes. Extracted from [STM].

2.6 Representation of the visual hull, result of 3D space carving. Extracted from [GSFP06].

2.7 Photograph of a Kinect device.

2.8 Depth image, corresponding inferred most likely body part labels and

2.9 Example of a recursive construction of a primitive model. Gesture primitives, formed by two end-points (EP) which are local extrema for the acceleration signal, are recursively decomposed in inter-primitive points (IP), located at equal distance from their parents. Extracted from [CCB+06].

2.10 Example of a left-to-right HMM, with two forward connections.

2.11 Extracted from [RK04]. Example of DTW computation between two similar, out-of-phase signals Q and C, represented in A. B corresponds to the warping matrix, with the minimum-weight warping path between the two signals. C is the representation of the resulting alignment.

2.12 Decomposition of the successive computations operated by an artificial neuron.

2.13 Representation of a MLP with one hidden layer, and the forward connections between neurons.

2.14 Representation of the architecture and activation maps of a Convolutional Neural Network. Extracted from [DBLG14]. Convolutional layers c_i are followed by sub-sampling layers s_i, which allow a progressive dimensionality reduction and feature computation for the last output layer operating the classification.

2.15 Extracted from [Sch]. Representation of the LSTM cell. The CEC is marked in red, while the output, input and forget gates are in green. The peephole connection corresponds to the connection between the CEC, whose outputs are delayed by one time step, and the gates. The blue nodes show the multiplications operated by the gates.

3.1 Architecture of the original Siamese Neural Network. Two identical neural networks (NN) with shared weights W take simultaneously two input samples X and Y to compute the error relative to a cosine-based objective function, thanks to the respective outputs O_X and O_Y.

3.2 Spring system between similar examples (solid circles) and dissimilar examples (hollow circles). (a) Attract-only springs. (b) Loss function and gradient associated with similar pairs. (c) Repulse-only springs with margin m. (d) Loss function and gradient associated with dissimilar pairs. Extracted from [HCL06].

3.3 Representation of the effect of an update step on the norm of the projections for a pair of outputs (O^t₁, O^t₂) at the epoch t in 3D. The centre of the sphere corresponds to the origin of the output space. The grey cone represents the equipotential surface for the function cos_O₁. For simplification, O^t₁ and O^t₂ were given the same norm.

3.4 Area of a 2D polytope implied by a vertex of two vectors a and b.

4.1 Snapshots from all the actions available in the Berkeley MHAD are displayed together with the corresponding point clouds obtained from the Kinect depth data. Actions (from left to right): jumping, jumping jacks, bending, punching, waving two hands, waving one hand, clapping, throwing, sit down/stand up, sit down, stand up. Extracted from [OCK+13].

4.2 Evolution of the mean output norm for cosine-based SNNs.

4.3 Evolution of the mean output norm for scalar product-based SNNs.

4.4 Evolution of the mean output norm for Polar Sine Metric-based SNNs.

4.5 Evolution of the Classification score for the cosine-based SNN.

4.6 Evolution of the Classification score for the scalar product-based SNN.

4.7 Evolution of the Classification score for the Polar Sine Metric-based SNN.

4.8 Representations of the gestures in the Orange Dataset.

4.9 SNN-scal, SNN-psine, DTW and MLP comparison on R1.

4.10 SNN-scal, SNN-psine, DTW and MLP comparison on R2.

4.11 Close-up of Fig. 4.10 around the 41.7% landmark.

4.12 SNN-scal, SNN-psine, DTW and MLP rejection details on R2.

B.1 3D plot of the training outputs distribution after training the Siamese network on 3 flick classes. Three different representations of the output space are available, with different rotations, and projections on the XY horizontal plane, and XZ and YZ vertical planes.

B.2 3D plot of the test outputs distribution after training the Siamese network on 3 flick classes, with outputs from one unknown class in black. Three different representations of the output space are available, with different rotations, and projections on the XY horizontal plane, and XZ and YZ vertical planes.

C.1 Raw accelerometer signals for 10 recordings of the "Flick East" gesture for one user.

C.2 Accelerometer signals for 10 recordings of the "Flick East" gesture for

From the document Automatic non linear metric learning : Application to gesture recognition (pages 32-36)