Visual Tracking of Deforming Objects Using Physics-based Models


HAL Id: hal-03179253

https://hal.inria.fr/hal-03179253

Submitted on 24 Mar 2021



Agniva Sengupta, Alexandre Krupa, Eric Marchand

To cite this version:

Agniva Sengupta, Alexandre Krupa, Eric Marchand. Visual Tracking of Deforming Objects Using Physics-based Models. ICRA 2021 - IEEE International Conference on Robotics and Automation, May 2021, Xi’an, China. pp.14178-14184. ⟨hal-03179253⟩



Agniva Sengupta¹, Alexandre Krupa¹, Eric Marchand¹

Abstract— In this paper, we propose a framework for tracking the deformation of soft objects using an RGB-D camera by utilizing a physically-based model of the considered object. A coarse 3D template of the object being tracked is the only prior information required by the proposed method. The proposed approach does not rely on accurate knowledge of the material properties of the object being tracked. We integrate a computer-vision-based tracking methodology with a physical-model-based deformation representation without requiring expensive numerical optimization for minimizing non-linear error terms. The proposed approach enables deformation tracking by joint minimization of a geometric error and a direct photometric intensity error while utilizing the co-rotational Finite Element Method (FEM) as the underlying deformation model. The proposed method has been validated both on synthetic data (with groundtruth) and real data.

I. INTRODUCTION

Physically based models have been used to analyze deformations of objects of complex shapes since the inception of structural mechanics. Some of the recent computer-vision based approaches have tried to leverage the advantages of physics based models for deformation tracking. Surprisingly, these physical-model based approaches often fall short of other competing deformation models in terms of accuracy [1]. Many deformation tracking methodologies used by the computer vision and robotics community utilize geometric regularizers, or an assumption of isometry or conformality, as a deformation model. However, structural mechanics offers many highly evolved methodologies for modelling deformation in complex shaped objects. In this paper, we propose a method of non-rigid object tracking using an RGB-D camera that combines the minimization of a geometric error, which penalizes the model to pointcloud distance, and a photometric error, which penalizes the inter-frame photometric intensity difference. A coarse 3D template of the object being tracked is the only prior input necessary for the proposed approach.

¹Univ Rennes, Inria, IRISA, CNRS, Rennes, France. E-mail: {agniva.sengupta, alexandre.krupa, eric.marchand}@irisa.fr.

The entire observable surface of the deforming object is tracked in 3D using the data from the RGB-D camera. The deformation model used for representing the underlying physics is based on FEM. Thanks to the physics based model, it is possible to track large deformations that do not preserve the volume of the object. The proposed method is robust to occlusion and the tracking accuracy is not dependent on a priori knowledge of the physical properties of the deforming object. However, if the physical parameters are known a priori, we demonstrate that the approach can also be utilized for estimating the external contact forces applied on the surface of the deforming object. The following are the main contributions of this paper:

• We propose a new methodology for deformation tracking using joint optimization of a geometric and a photometric error defined on the surface model of the deforming object. A non-linear least squares optimization based method, with an analytic expression of the Jacobian relating the variation of the geometric and visual errors to the displacement of the vertices of the mechanical mesh, is proposed to track the deformation of objects.

• The method proposed here has also been utilized to track external forces applied on the surface of deforming objects. Such applications are an important advantage of using physically-based models for deformation tracking.

II. BACKGROUND

A method to track deformable objects by applying virtual forces on the simulation of a physically based model was proposed in [2]. The physics of the deforming object was modelled as a collection of linked rigid bodies or particles and the tracking was formulated as an Expectation-Maximization (EM) problem. The qualitative results in [2] are impressive, but there are a few sequences where the tracking suffers noticeable drift in heavily occluded regions. It is not trivial to modify the approach of [2] to utilize a Newtonian optimization scheme for minimizing the visual error. [3] improved the approach of [2] by considering co-rotational FEM. However, in [3], no attempt was made to tackle the deformation tracking problem using conventional optimization, since the gradient estimation of the high dimensional optimization step was considered to be intractable. In this context, it must be noted that physical model free, RGB-D based deformation tracking approaches such as [4] depend heavily on the assumption that the deformation is isometric. Consequently, these approaches are not preferable when there is a significant change in the volume of the deforming object. In [5], the authors utilized co-rotational FEM to track deformation by purely geometric matching of the pointcloud with the 3D object model, producing highly accurate tracking results at frame-rate. However, the approach of [5] was also sensitive to occlusion and large inter-frame motion, and demonstrated a tendency to drift along the tangent of large planar surfaces, which can be intuitively explained as an inherent drawback of tracking solely using point-to-point or point-to-plane correspondences. In [6], deformation tracking using a physically based model has been formulated as a conventional optimization problem, but only using depth information. Moreover, the optimization in [6] requires expensive simulation of object deformation using FEM for computing the gradient of the visual error, which is unsuitable for deformation tracking at frame-rate. [7] proposes a force tracking methodology based on the approach of [6], which is slow and does not utilize the photometric information for tracking.

III. METHODOLOGY

The proposed method tracks deformable objects using an RGB-D camera with the help of a coarse 3D template of the deforming object. The proposed approach can be summarized by the following iterative steps:

• Rigidly track the deforming object w.r.t. the camera

• Analyze the residual error after rigid tracking to determine the region which is likely to have deformed

• Minimize a combination of geometric and photometric error terms to track the deformation using a set of selected vertices of the model in those deformed regions

We now describe the entire approach in detail.

A. Notation

Throughout the rest of the paper, the notation $Q$ is used to denote the vertices of the volumetric mechanical tetrahedral mesh K, while $P^S = (P^S_X, P^S_Y, P^S_Z)$ denotes the vertices of the surface model S. The mapping $S \mapsto K$ is done via barycentric mapping. Following standard convention, the object to camera homogeneous transformation matrix is given as

$${}^{C}T_{O} = \begin{bmatrix} {}^{C}R_{O} & {}^{C}t_{O} \\ 0 & 1 \end{bmatrix}$$

and this is utilized to align K and S to the actual object, where ${}^{C}R_{O}$ and ${}^{C}t_{O}$ are the rotation matrix and translation vector respectively. Between two consecutive data frames, the pose of the object in the current frame $(p)$ with respect to the pose in the previous frame $(p-1)$ is $q = ({}^{p}t_{p-1}, \theta u)$, where $\theta$ and $u$ are the angle and axis of the rotation ${}^{p}R_{p-1}$. The time derivative of $q$ is given as ${}^{p}v_{p-1} = \delta q$, where $v \in se(3)$ is the velocity screw. The color image for the p-th data frame is given by $I_p$. The perspective projection function $\Pi(\cdot)$ is used to project the 3D point $P$ to its corresponding 2D coordinates $(u, v)$ in the image plane, such that

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \Pi\left(\begin{bmatrix} P \\ 1 \end{bmatrix}\right) = K_p\, \Pi_p\, {}^{C}T_{O} \begin{bmatrix} P \\ 1 \end{bmatrix},$$

where $K_p$ is the camera calibration matrix in the standard notation and $\Pi_p = \begin{bmatrix} I_{3\times3} & 0_{3\times1} \end{bmatrix}$. The function $I(u, v)$ gives the grayscale intensity at that pixel position and $(\nabla I^u_p, \nabla I^v_p)$ gives the corresponding image gradient. $(f_x, f_y)$ are the focal lengths in pixel units.
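As an illustration of the projection function $\Pi(\cdot)$, the following is a minimal Python sketch under a plain pinhole model. The homogeneous relation above holds up to scale, so the sketch divides by the depth explicitly. The function name `project` and the principal point `(cx, cy)` are assumptions introduced for this example; the paper only names the calibration matrix $K_p$ and the focal lengths $(f_x, f_y)$.

```python
def project(point_obj, R, t, fx, fy, cx, cy):
    """Project a 3D point from the object frame to pixel coordinates (u, v).

    R, t      : object-to-camera rotation (3x3 nested list) and translation.
    fx, fy    : focal lengths in pixel units.
    cx, cy    : principal point (an assumption of this sketch).
    """
    # Object frame -> camera frame: P_c = R * P_o + t
    pc = [sum(R[r][k] * point_obj[k] for k in range(3)) + t[r] for r in range(3)]
    if pc[2] <= 0.0:
        raise ValueError("point is not in front of the camera")
    # Pinhole projection: divide by depth, then apply the intrinsics
    u = fx * pc[0] / pc[2] + cx
    v = fy * pc[1] / pc[2] + cy
    return u, v


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u, v = project([1.0, 2.0, 4.0], IDENTITY, [0.0, 0.0, 0.0],
               fx=100.0, fy=100.0, cx=50.0, cy=40.0)
print(u, v)  # 75.0 90.0
```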

B. Deformation Model

We employ FEM [8] to model the underlying physics of the non-rigid object. Let the four vertices of a particular element of the tetrahedral mechanical mesh be denoted by $Q_1$, $Q_2$, $Q_3$ and $Q_4$, and let the centroid be a point $Q_C$. To obtain the displacement of the vertices as a result of the application of an external force on the object, a second order differential equation of the following nature needs to be solved:

$$M\ddot{Q} + D\dot{Q} + KQ = F_{ext} + F_e \tag{1}$$

where $M$ and $D$ represent the mass and damping matrices of the model respectively, and $K$ is the global stiffness matrix. $F_{ext}$ denotes the external forces acting on the vertices and $F_e$ is the internal elastic force vector acting on the vertices of the tetrahedral elements. A linear solver based on conjugate gradient descent [9] is used to solve Eqn. (1). Co-rotational FEM [10] is a modification of linear FEM that allows robust handling of strong deformations in the mesh. If the model undergoes a large deformation, let the centroid of the tetrahedral element at this deformed configuration be denoted by a point $Q^R_C$. An arbitrary vertex $Q_x$ of the mechanical mesh in the undeformed frame gets deformed to $\tilde{Q}_x$ in the new frame (variables with the $\tilde{\cdot}$ notation are expressed in the corotated frame $Q^R_C$). With the co-rotational formulation [11], it can be shown that

$$\tilde{d} = R^\top\left(U - Q_C + Q_x\right) - Q_x$$

given that $R$ is the rotation matrix, $U$ is the displacement of $Q_x$, expressed in the object frame of $Q_C$, and $\tilde{d}$ is the displacement of $Q_x$ in the corotated frame $Q^R_C$. Applying a similar deformation to the vertices of the tetrahedron, the internal elastic forces can be represented by:

$$F_e = R_e K_e \tilde{U}^R \tag{2}$$

Here, $\tilde{U}^R = (\tilde{Q}_1, \tilde{Q}_2, \tilde{Q}_3, \tilde{Q}_4)$, $K_e$ represents the stiffness matrix and $R_e$ is the $12\times12$ block diagonal matrix of four $3\times3$ rotation matrices $R$ stacked diagonally.

C. Visual Tracking

We track the deforming object by minimizing two error terms defined on the surface of S. The minimization is done using a set of control handles C on K, such that the entire deformation becomes optimizable by regulating the displacement of C. C are those vertices of K that shall be controlled to minimize the error terms defined on S. Any displacement of C gets propagated to all vertices of K using the deformation model. We are interested in minimizing a geometric error function $E_{depth}$ and a photometric error function $E_{photo}$. The combined error term can be represented as:

$$E_S = \begin{bmatrix} E_{depth} & E_{photo} \end{bmatrix}^\top \tag{3}$$

$E_S$ is minimized using a set of displacement vectors on C. For the sake of simplicity, we describe the proposed mechanism of minimizing $E_S$ with the help of a single control handle, i.e., the u-th control handle of C, denoted by $u_C$. However, to optimize in a non-linear least squares fashion, we need to obtain the gradient of the stacked cost function. Numeric estimation can be computationally expensive, as determining the node displacements typically involves multiple iterations of conjugate gradient descent for solving an equation of the nature of Eqn. (1). One of the key contributions of this paper is the proposed mechanism for minimizing $E_S$ with non-linear least squares optimization, along with a strategy for analytically estimating the Jacobian.

Let us analyze the case of an arbitrary 3D point $P$ of the pointcloud, which lies on or near the triangle $(P^S_i, P^S_{i+1}, P^S_{i+2})$ of S. Let us assume that a small displacement $\Delta u_C = (\Delta u_{C_X}, \Delta u_{C_Y}, \Delta u_{C_Z})$ produces the displacements $\Delta x$, $\Delta y$ and $\Delta z$ on the j-th surface plane of S (due to the underlying deformation of the mechanical mesh), comprising the three vertices $P^S_i$, $P^S_{i+1}$ and $P^S_{i+2}$ respectively. This vector of vertex positions is given by:

$$\vartheta_j = \begin{bmatrix} (P^S_i)^\top & (P^S_{i+1})^\top & (P^S_{i+2})^\top \end{bmatrix} \tag{4}$$

Subsequently, the gradient of the error $E_S(P)$ w.r.t. $u_C$ is given by the Jacobian:

$$J_S(P) = \frac{\partial E_S(P)}{\partial u_C} = \underbrace{\frac{\partial E_S(P)}{\partial \vartheta_j}}_{1\times9}\; \underbrace{\frac{\partial \vartheta_j}{\partial u_C}}_{9\times3} \tag{5}$$

The term $\frac{\partial \vartheta_j}{\partial u_C}$ is somewhat similar to the classical strain-displacement matrix, as expressed conventionally in FEM literature [12]. The details of how $\frac{\partial E_S(P)}{\partial \vartheta_j}$ can be derived in our case are described in Sec. III-E.

Computing $\frac{\partial \vartheta_j}{\partial u_C}$ can be done by estimating the deformation of the mechanical model K only once for every new control handle explored by our proposed approach. For the j-th control handle $C_j$, this is done by successively displacing the vertex in K corresponding to $C_j$ by a small distance along the positive direction of the X, Y and Z axes. This estimation step produces three deformed meshes per control handle, which can be directly utilized for determining $\frac{\partial \vartheta_j}{\partial u_C}$ using forward finite differences. Assuming that $u_C$ points to the c-th vertex of K, the values of $\frac{\partial P^S_i}{\partial u_C}$ can be mapped to a matrix $\Gamma_O$ by a mapping function $C$ such that $\Gamma_O = C\left(\frac{\partial P^S_i}{\partial u_C}\right)$, where:

$$\frac{\partial \vartheta_j}{\partial u_C} = \left[ \frac{\partial P^S_i}{\partial u_C},\; \frac{\partial P^S_{i+1}}{\partial u_C},\; \frac{\partial P^S_{i+2}}{\partial u_C} \right] \tag{6}$$

Here, $C$ represents the mapping $\frac{\partial P^S_i}{\partial u_C} \mapsto \Gamma_O$. Assuming that $M$ denotes the number of vertices in K and S, $C$ can be expressed as:

$$\Gamma_O = C\left(\frac{\partial P^S_i}{\partial u_C}\right) = \sum_{i=0}^{M} \sum_{c=0}^{M} \frac{\partial P^S_i}{\partial u_C} \otimes J^{i,c}_{M\times M} \tag{7}$$

where $\otimes$ gives the Kronecker product and $J^{i,c}_{M\times M}$ denotes the conventional single-valued matrix of dimension $[M\times M]$ for the c-th control handle, such that:

$$\left(J^{i,c}_{M\times M}\right)_{x,y} = \begin{cases} 1 & \text{if } x = i \text{ and } y = c,\; \forall x, y \in M \\ 0 & \text{else} \end{cases} \tag{8}$$

and the $x$ and $y$ in the subscript of $(\cdot)_{x,y}$ denote the row and column index of an element of the matrix. Based on the matrix indices, $\Gamma_O$ can be represented as:

$$\left(\Gamma_O\right)_{aM+i,\,bM+c} = \left(\frac{\partial P^S_i}{\partial u_C}\right)_{a,b} \;\Big|\; \forall i, c \in M \wedge a, b \in \{0, 1, 2\} \tag{9}$$

We term the matrix $\Gamma_O$ the influence matrix (since it denotes the influence of the displacement of a vertex on its neighbors). It can be conveniently computed offline once per object model and stored. During tracking, $\Gamma_O$ is loaded from memory when required.
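The forward finite-difference estimation and the index layout of Eqn. (9) can be illustrated with a short Python sketch. The callable `deform` is a hypothetical stand-in for the FEM solve (it maps a handle position to the deformed surface vertices), and `toy_deform`, `influence_block` and `place_block` are names introduced only for this example; the toy model simply moves each vertex by a fixed fraction of the handle displacement.

```python
def influence_block(deform, handle_rest, vertex_index, step=1e-3):
    """Forward-difference estimate of the 3x3 block dP_i / du_C."""
    base = deform(handle_rest)[vertex_index]
    block = [[0.0] * 3 for _ in range(3)]
    for b in range(3):  # displace the handle along +X, +Y, +Z in turn
        moved = list(handle_rest)
        moved[b] += step
        displaced = deform(moved)[vertex_index]
        for a in range(3):
            block[a][b] = (displaced[a] - base[a]) / step
    return block


def place_block(gamma_o, block, i, c, num_vertices):
    """Write a 3x3 block into the influence matrix using the layout of Eqn. (9)."""
    for a in range(3):
        for b in range(3):
            gamma_o[a * num_vertices + i][b * num_vertices + c] = block[a][b]


def toy_deform(handle):
    """Hypothetical deformation model: vertex k follows the handle with weight w_k."""
    rest_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    weights = [1.0, 0.5]
    return [[p[k] + w * handle[k] for k in range(3)]
            for p, w in zip(rest_vertices, weights)]


M = 2  # number of vertices in the toy model
gamma_o = [[0.0] * (3 * M) for _ in range(3 * M)]
block = influence_block(toy_deform, [0.0, 0.0, 0.0], vertex_index=1)
place_block(gamma_o, block, i=1, c=0, num_vertices=M)
```

With a real deformation model, each call to `deform` is one FEM simulation, which is why the result is worth caching per control handle as described above.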

For estimating Eqn. (5), we derive a new matrix $R^\Gamma$ such that:

$$R^\Gamma_{3M\times3M} = {}^{C}R_{O} \otimes I_{M\times M} \tag{10}$$

where $I_{M\times M}$ is an identity matrix of dimension $[M\times M]$. $R^\Gamma$ allows us to rotate the influence matrix to the current camera reference frame. This transformed matrix $\Gamma_C$ is given by $\Gamma_C = R^\Gamma \Gamma_O$. We then derive a new matrix $\Gamma$ by thresholding $\Gamma_C$ such that its value at an arbitrary index $m, n$ associated with the i-th vertex $P^S_i$ is given by:

$$\Gamma_{m,n} = \begin{cases} 1 & \text{if } \varepsilon > 3\alpha\,\|\Delta u_C\|_2 \\ 0 & \text{else} \end{cases} \tag{11}$$

given that $\varepsilon = \sum_{a=0}^{2} \sum_{b=0}^{2} \left(\left(\Gamma_C\right)_{m+a,\,n+b}\right)^2$, where $\Delta u_C$ is the calibrating deformation (implying that $\|\Delta u_C\|$ is a constant for a particular $\Gamma_O$). $\alpha$ is a tunable parameter which regulates the area for which the value of $\Gamma$ will be 1.
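The rotation of Eqn. (10) and the thresholding step can be sketched in plain Python as follows. This is an illustration under stated assumptions: the 3×3 block of a vertex/handle pair is gathered with the index layout of Eqn. (9), the threshold is passed in as a single number rather than derived from $\alpha$ and $\|\Delta u_C\|$, and all function names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two nested-list matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def block_energy(gamma_c, i, c, num_vertices):
    """Sum of squared entries of the 3x3 block linking vertex i to handle c."""
    return sum(gamma_c[a * num_vertices + i][b * num_vertices + c] ** 2
               for a in range(3) for b in range(3))


def binarize(gamma_c, i, c, num_vertices, threshold):
    """Return 1 when the block energy exceeds the threshold, else 0."""
    return 1 if block_energy(gamma_c, i, c, num_vertices) > threshold else 0


M = 2
identity_m = [[1.0 if r == c else 0.0 for c in range(M)] for r in range(M)]
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # example object-to-camera rotation

# Toy influence matrix: vertex 1 follows handle 0 at half amplitude.
gamma_o = [[0.0] * (3 * M) for _ in range(3 * M)]
for a in range(3):
    gamma_o[a * M + 1][a * M + 0] = 0.5

gamma_c = matmul(kron(rot_z_90, identity_m), gamma_o)  # Gamma_C = R^Gamma * Gamma_O
mask_near = binarize(gamma_c, i=1, c=0, num_vertices=M, threshold=0.5)
mask_far = binarize(gamma_c, i=0, c=0, num_vertices=M, threshold=0.5)
```

Because a rotation preserves the sum of squared entries of each block, the binary mask in this sketch does not depend on the camera pose, only on how strongly a vertex responds to the handle.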

This thresholding of Eqn. (11) is done because we intend to utilize the point-to-plane distance as the error term $E_{depth}$. It is known classically that point-to-plane distance minimization using a planar (or nearly planar) surface model can result in the model ‘sliding’ over the 3D data [13]. The calibrating deformation $\Delta u_C$ produces larger deformation in vertices closer to the vertex index $c$, i.e., $\Gamma_C$ highly prioritizes vertices close to $c$. This, in turn, causes the Jacobian in Eqn. (5) to effectively consider only a small patch of surface in the vicinity of the c-th vertex, while neglecting the small magnitude of deformation observed in the vertices farther away from the c-th vertex. To overcome this problem, we propose to binarize $\Gamma_C$ to $\Gamma$ for obtaining the gradient of $E_{depth}$. However, the Jacobian for $E_{photo}$ can be obtained directly from $\Gamma_C$ without modification.

D. Determining the Control Handles

It is possible to estimate suitable positions for the control handles by analyzing the cost function. Given the two objective functions $E_{depth}$ and $E_{photo}$, we define two vectors $Z^{depth}$ and $Z^{photo}$, such that their values at the index corresponding to the 2D image point $\Pi(P_l)$ are given by:

$$Z^{depth}_{\Pi(P_l)} = (W_{depth})_{l,l}\,E_{depth}(P_l), \qquad Z^{photo}_{\Pi(P_l)} = (W_{photo})_{l,l}\,E_{photo}(P_l) \tag{12}$$

for the l-th point $P_l$ of the pointcloud. $Z^{depth}$ and $Z^{photo}$ form vectorized images whose pixel intensities correspond to the error values of $E_{depth}$ and $E_{photo}$ respectively. $W_{depth}$ and $W_{photo}$ are the diagonal weighting matrices derived from $E_{depth}$ and $E_{photo}$ using a Tukey based m-estimator [14]. The combined error matrix $Z_\Pi$ on the image plane is obtained by:

$$Z_\Pi = \left\langle \mathrm{vec}^{-1}_{H\times W}\left(Z^{depth}\right) \right\rangle \odot \left\langle \mathrm{vec}^{-1}_{H\times W}\left(Z^{photo}\right) \right\rangle \tag{13}$$

where $\odot$ denotes the Hadamard product and $\langle\cdot\rangle$ gives the normalized matrix. $\mathrm{vec}^{-1}_{H\times W}(\cdot)$ denotes the vector to matrix map $\mathbb{R}^{HW} \mapsto \mathbb{R}^{H\times W}$, where $H$ and $W$ are the height and width of the image respectively. A median blur with a $3\times3$ kernel is applied on $Z_\Pi$, followed by a linear thresholding, and the output of this operation is grouped into multiple clusters. The centroid of each cluster is associated with the nearest projection of the visible vertices of K on the image plane. These associated vertex indices of K are identified as the control handles for that particular frame.
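The combination of the two error images in Eqn. (13) can be sketched as below. The normalization $\langle\cdot\rangle$ is not spelled out in the text, so dividing by the largest absolute value is an assumption of this example; the median blur and the clustering step are omitted, and the function names are hypothetical.

```python
def normalize(image):
    """Scale absolute values to [0, 1] by the largest magnitude (assumed normalization)."""
    peak = max(abs(x) for row in image for x in row)
    if peak == 0.0:
        return [[0.0 for _ in row] for row in image]
    return [[abs(x) / peak for x in row] for row in image]


def combined_error_map(z_depth, z_photo):
    """Hadamard (element-wise) product of the two normalized error images."""
    nd, nph = normalize(z_depth), normalize(z_photo)
    return [[d * p for d, p in zip(row_d, row_p)] for row_d, row_p in zip(nd, nph)]


def candidate_pixels(z_map, level):
    """Pixels (row, col) whose combined error exceeds a fixed level."""
    return [(r, c) for r, row in enumerate(z_map)
            for c, value in enumerate(row) if value > level]


# Toy 2x2 error images (already reshaped from vectors to H x W).
z_depth = [[0.0, 2.0], [4.0, 0.0]]
z_photo = [[1.0, 0.0], [2.0, 0.0]]
z_pi = combined_error_map(z_depth, z_photo)
handles_2d = candidate_pixels(z_pi, level=0.5)
```

The product keeps only the pixels where both the geometric and the photometric residuals are large, which is what makes them plausible seeds for control handles.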

E. Non-rigid Error Minimization

Assuming that an estimate of ${}^{p}T_{p-1}$ has already been obtained (using the method described in Sec. III-F), a combination of a depth based geometric error and a direct photometric error is minimized to track the deforming object. We now define the error term that needs to be minimized for a single point $P$ of the pointcloud at the p-th data frame. The point-to-plane distance based geometric error is given by:

$$E_{depth}(P^p) = n_j \cdot P^p - d_j \tag{14}$$

assuming that the j-th surface plane of S (where $n_j$ is the normal and $d_j$ is the distance to origin) corresponds with the point $P$ on the image plane. We propose a photometric error term defined on the (p-1)-th frame by:

$$E_{photo}(P^{p-1}) = I_p(P^{p-1}_e) - I_{p-1}(P^{p-1}) \tag{15}$$

The updated point position $P^{p-1}_e$ is determined by a barycentric map

$$P^{p-1}_e = \begin{bmatrix} (P^S_i)' & (P^S_{i+1})' & (P^S_{i+2})' \end{bmatrix} B$$

such that $P^{p-1}$ corresponds with the triangle $(P^S_i, P^S_{i+1}, P^S_{i+2})$ and $(P^S_i)'$ gives the updated vertex position of $P^S_i$ when subject to the update $\vartheta$, given that $B$ is a column vector denoting the barycentric coordinates of $P^{p-1}$ w.r.t. $P^S_i$, $P^S_{i+1}$ and $P^S_{i+2}$.
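Both building blocks of this subsection are small enough to sketch directly. The following Python example evaluates the point-to-plane residual of Eqn. (14) and the barycentric update of a surface point; the function names are introduced for illustration and the plane normal is assumed to be of unit length.

```python
def point_to_plane(point, normal, d):
    """Signed distance n . P - d of a point to a plane with unit normal."""
    return sum(n * p for n, p in zip(normal, point)) - d


def barycentric_point(v0, v1, v2, bary):
    """Point with barycentric coordinates (b1, b2, b3) in the triangle (v0, v1, v2)."""
    return [bary[0] * a + bary[1] * b + bary[2] * c for a, b, c in zip(v0, v1, v2)]


# Plane z = 1 and a point half a unit above it.
residual = point_to_plane([0.3, 0.2, 1.5], normal=[0.0, 0.0, 1.0], d=1.0)

# The same barycentric coordinates applied to the rest and to the moved triangle.
bary = [0.5, 0.25, 0.25]
p_rest = barycentric_point([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], bary)
p_moved = barycentric_point([0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0], bary)
```

Keeping `bary` fixed while the triangle vertices move is what lets a point sampled in the previous frame follow the deforming surface before its intensity is compared in the current image.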

Eqn. (3) can be slightly modified to:

$$E_S = \begin{bmatrix} E_{depth} & \mu E_{photo} \end{bmatrix}^\top \tag{16}$$

as the combined cost function, where $\mu = \beta \frac{\|E_{depth}\|}{\|E_{photo}\|}$ is used for bringing the geometric and photometric error terms to the same scale and $0.9 \le \beta \le 1.2$ is a tunable parameter for weighting the relative influence of the photometric cost function.
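A minimal sketch of this scaling is given below. The text does not state which norm is used, so the Euclidean norm is an assumption here, as is returning zero when the photometric residual vanishes; `photometric_scale` is a hypothetical helper name.

```python
import math


def photometric_scale(e_depth, e_photo, beta=1.0):
    """mu = beta * ||E_depth|| / ||E_photo|| using Euclidean norms (assumed)."""
    norm_depth = math.sqrt(sum(e * e for e in e_depth))
    norm_photo = math.sqrt(sum(e * e for e in e_photo))
    if norm_photo == 0.0:
        return 0.0  # assumption: no photometric signal means no photometric term
    return beta * norm_depth / norm_photo


e_depth = [3.0, 4.0]   # geometric residuals (norm 5)
e_photo = [6.0, 8.0]   # intensity residuals (norm 10)
mu = photometric_scale(e_depth, e_photo, beta=1.0)
stacked = e_depth + [mu * e for e in e_photo]  # stacked error in the order of Eqn. (16)
```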

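The combined cost function that we seek to minimize is Eqn. (16) w.r.t. $u_C$, the displacement of the control handle. The Jacobian relating the change of $u_C$ to the change of $E_S$ is obtained by utilizing Eqn. (10) and Eqn. (11) in Eqn. (5), such that:

$$J = \frac{\partial E_S}{\partial u_C} = \begin{bmatrix} \dfrac{\partial E_{depth}}{\partial \xi}\,\Gamma & \mu\,\dfrac{\partial E_{photo}}{\partial \xi}\,\Gamma_C \end{bmatrix}^\top \tag{17}$$

where:

$$\frac{\partial E_{depth}}{\partial \xi} = \begin{bmatrix} -n_j^\top - n_l^\top A \left(P^S_{i+2} - P^S_{i+1}\right)_\times \\ -n_l^\top A \left(P^S_i - P^S_{i+2}\right)_\times \\ -n_l^\top A \left(P^S_{i+1} - P^S_i\right)_\times \end{bmatrix}^\top \tag{18}$$

given that:

$$A = \frac{1}{\|n_j\|}\left(I_{3\times3} - n_j n_j^\top\right) \tag{19}$$

provided $(\cdot)_\times$ gives the skew-symmetric matrix and:

$$n_j = \left(P^S_{i+2} - P^S_i\right) \times \left(P^S_{i+1} - P^S_i\right) \tag{20}$$

and $n_l = P^p_l - P^S_i$. On the other hand:

$$\frac{\partial E_{photo}}{\partial \xi} = \begin{bmatrix} \nabla I^u_p & \nabla I^v_p \end{bmatrix}^\top \begin{bmatrix} b_1 & 0 \\ 0 & b_1 \\ b_2 & 0 \\ 0 & b_2 \\ b_3 & 0 \\ 0 & b_3 \end{bmatrix}^\top \begin{bmatrix} G^{i}_{P^S} \\ G^{i+1}_{P^S} \\ G^{i+2}_{P^S} \end{bmatrix} \tag{21}$$

where:

$$G^{q}_{P^S} = \begin{bmatrix} \dfrac{f_x}{(P^S_q)_Z} & 0 & -f_x \dfrac{(P^S_q)_X}{(P^S_q)^2_Z} \\ 0 & \dfrac{f_y}{(P^S_q)_Z} & -f_y \dfrac{(P^S_q)_Y}{(P^S_q)^2_Z} \end{bmatrix} \quad \forall q \in \{i, (i+1), (i+2)\} \tag{22}$$

and $B = (b_1, b_2, b_3)$ are the barycentric coordinates.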

The optimization method of our choice is a Levenberg-Marquardt (LM)-like method [15], and the update is given by:

$$\delta u_C = -\left(H + \lambda I_{N\times N}\right)^{-1} J^\top W^\top W E_S \tag{23}$$

where $H = J^\top W^\top W J$ is the approximation of the Hessian and $W = \mathrm{diag}\left(W_{depth}, W^{\mu}_{photo}\right)$, where $W_{depth}$ and $W^{\mu}_{photo}$ are obtained using the Tukey based m-estimator on $E_{depth}$ and $\mu E_{photo}$ respectively. $\lambda$ is a scaling factor and $N$ is the size of the error vector $E_S$. The new position $u_C + \delta u_C$ of $C_j$ is to be applied on the mechanical model such that it deforms to minimize the errors from (14) and (15). Apart from $\alpha$ in Eqn. (11), $\beta$ related to Eqn. (16) and $\lambda$ (and excluding the clustering mechanism of Sec. III-D), there are no parameters involved in the non-rigid visual tracking approach presented here.
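A single weighted, damped update in the spirit of Eqn. (23) can be sketched in plain Python as follows. This is an illustration rather than the authors' implementation: the robust weights are taken as given instead of being computed with a Tukey estimator, the damping identity is sized to match $H$, and the small Gaussian-elimination solver and the name `lm_step` are introduced only for this example.

```python
def lm_step(J, E, w, lam):
    """Return delta = -(H + lam*I)^-1 J^T W^T W E with H = J^T W^T W J.

    J   : N x n Jacobian (nested lists), E : N residuals,
    w   : N diagonal entries of W, lam : damping factor.
    """
    rows, n = len(J), len(J[0])
    w2 = [wi * wi for wi in w]
    H = [[sum(w2[k] * J[k][r] * J[k][c] for k in range(rows)) for c in range(n)]
         for r in range(n)]
    g = [sum(w2[k] * J[k][r] * E[k] for k in range(rows)) for r in range(n)]
    # Augmented system (H + lam*I | -g)
    A = [[H[r][c] + (lam if r == c else 0.0) for c in range(n)] + [-g[r]]
         for r in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    delta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(A[r][c] * delta[c] for c in range(r + 1, n))
        delta[r] = (A[r][n] - tail) / A[r][r]
    return delta


J = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E = [1.0, 2.0, 3.0]
undamped = lm_step(J, E, w=[1.0, 1.0, 1.0], lam=0.0)
damped = lm_step(J, E, w=[1.0, 1.0, 1.0], lam=1.0)
```

With `lam = 0` the step is a Gauss-Newton step; increasing `lam` shortens the step and pulls it towards the steepest-descent direction, which is the usual trade-off in LM-type schemes.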

F. Initialization

The rigid pose of the object in the camera’s reference frame is updated at the beginning of each frame using an approach similar to [16], by jointly minimizing a geometric error and a sparse feature based error, given by:

$$e^D({}^{p}v_{p-1}) = \left(\left({}^{p}R_{p-1} P^{p-1} + {}^{p}t_{p-1}\right) \cdot n_j\right) - d_j \tag{24}$$

$$e^K({}^{p}v_{p-1}) = \begin{pmatrix} \Pi(P^{p-1})_x - x^* \\ \Pi(P^{p-1})_y - y^* \end{pmatrix} \tag{25}$$

where $P^{p-1}$ represents an arbitrary 3D point from the last data frame, which has been matched to the j-th plane of the object’s 3D model, denoted by the normal vector $n_j = (n^X_j, n^Y_j, n^Z_j)$ and the distance to origin $d_j$. $\left(\Pi(P^{p-1})_x, \Pi(P^{p-1})_y\right)$ is the projection of the same arbitrary point in the (p-1)-th image, while $(x^*, y^*)$ is the same image point matched in the p-th image using Harris corner features. ${}^{p}v_{p-1} \in se(3)$ gives the velocity twist between the (p-1)-th frame and the p-th frame.

G. Force Tracking

Next, the method of Sec. III-C - III-E is used to estimate the deforming forces acting on an object. Two additional pieces of information are required for force tracking: the material properties of the object being tracked, i.e., Young’s modulus, Poisson’s ratio, Rayleigh stiffness etc., and the approximate points of contact on the object. The force tracking experiments have been performed by tracking the tool applying the force using fiducial markers. Once a point $P_P$ has been identified as the point of contact, its nearest mechanical mesh vertex $Q_P$ is obtained using a nearest neighbor search. Thereafter, the force applied on $P_P$ is given as $\sum_i F_{Q_i}$, for all indices $i$ such that $Q_i$ is a neighbor of $Q_P$, wherein the force vectors $F_{Q_i}$ are derived using (2).

Fig. 1: Variance of H with Y and σ for the two simulated objects, cuboid and icosphere (color coded with the value of H).
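The contact-force summation of Sec. III-G can be sketched as follows. The per-vertex elastic forces are assumed to be already available (in the paper they come from Eqn. (2)), the adjacency table `neighbors` is a hypothetical input, and whether $Q_P$ itself counts as its own neighbor is left to whoever builds that table.

```python
import math


def nearest_vertex(point, vertices):
    """Index of the mesh vertex closest to the contact point (brute-force search)."""
    return min(range(len(vertices)), key=lambda k: math.dist(point, vertices[k]))


def contact_force(contact_point, vertices, neighbors, elastic_forces):
    """Sum the elastic force vectors over the neighbors of the nearest vertex."""
    q_p = nearest_vertex(contact_point, vertices)
    total = [0.0, 0.0, 0.0]
    for i in neighbors[q_p]:
        for k in range(3):
            total[k] += elastic_forces[i][k]
    return total


vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}             # hypothetical mesh adjacency
elastic_forces = [[0.0, 0.0, -1.0], [0.0, 0.0, -2.0], [0.0, 0.0, -3.0]]
force = contact_force((0.9, 0.1, 0.0), vertices, neighbors, elastic_forces)
```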

IV. RESULTS

The results can be roughly divided into three categories:

• For the simulated objects with groundtruth, we focus on comparing the fundamental concept of the non-rigid tracking approach proposed here with comparable state-of-the-art methods. We also use challenging simulated sequences to quantitatively validate the robustness of our approach;

• For deformation tracking on real data, we emphasize the capability of our algorithm to accurately track large volumetric deformations in soft objects;

• We provide experimental validation of the force tracking approach proposed in Sec. III-G.

A. Simulated Results: Validation with Groundtruth

We base our simulated results on two objects¹, a cuboid and an icosphere, as shown in Fig. 3 and Fig. 2. The simulated data is generated using the Blender software [17] and the object deformations are generated manually using harmonic coordinates with B-spline basis [18]. Some quantitative results on synthetic sequences are expressed as a percentage of the largest diagonal of the bounding box of the objects (2.884m for the cuboid and 3.408m for the icosphere), and hence the values are unitless.

¹All data available at: github.com/icra2021/VisualDeformationTracking

First, we demonstrate that the difference in accuracy between keeping the influence matrix Γ constant (C. Γ) and re-computing and updating Γ numerically at each frame (U. Γ) is not significant. To establish this proposition, we run tests on the synthetic data with and without holding Γ constant for both standard, linear FEM (SF) and co-rotational FEM (CRF). The results, summarized in Table I, are expressed in terms of the Hausdorff distance (H) from groundtruth (GT). It can be clearly seen that the variation in tracking accuracy is between 0 and 0.59 % for both the sequences tested here. This comes in exchange for a large improvement in runtime (up to > 53% improvement in per-frame time requirement, see Sec. IV-D for time requirements of the implementation) of the entire algorithm, since multiple FEM simulations per iteration of LM are highly expensive. We also use these sequences for comparing the proposed approach to [19] and [6]. The co-rotational FEM based version of the proposed approach outperforms [19] and [6] in both the sequences. Fig. 2 shows the visual comparison of all the approaches tested on the synthetic sequences.

TABLE I: Tracking accuracy for two synthetic sequences in terms of Hausdorff distance between tracking output and GT

| Method | Γ | Cube | Icosphere |
|---|---|---|---|
| SF | Constant | 5.76 % | 4.83 % |
| SF | Updated | 5.75 % | 5.42 % |
| CRF | Constant | 2.14 % | 5.25 % |
| CRF | Updated | 2.14 % | 5.34 % |
| [19] | n/a | 2.93 % | 7.93 % |
| [6] | n/a | 3.28 % | 6.67 % |
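For readers who want to reproduce the style of metric reported in Table I, the following Python sketch computes a symmetric Hausdorff distance between two vertex sets and expresses it as a percentage of the bounding-box diagonal. It is a simplified, vertex-only version written for illustration; the evaluation tool actually used for the reported numbers may measure distances differently (for example against mesh surfaces), and the function names are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def bbox_diagonal(points):
    """Length of the diagonal of the axis-aligned bounding box."""
    lo = [min(p[k] for p in points) for k in range(3)]
    hi = [max(p[k] for p in points) for k in range(3)]
    return math.dist(lo, hi)


ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
tracked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
h = hausdorff(tracked, ground_truth)
h_percent = 100.0 * h / bbox_diagonal(ground_truth)
```

Dividing by the bounding-box diagonal makes the value unitless, which matches how the synthetic results above are normalized.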

Using a specific control handle and maintaining all other parameters constant, the variation of the tracking accuracy (in terms of H) of the method proposed in this paper w.r.t. the Young’s modulus (Y) and Poisson’s ratio (σ) is shown in Fig. 1. It is clear that the tracking accuracy does not vary significantly with a change in Young’s modulus. This observation is in accordance with [6]. However, the tracking accuracy varies significantly with a change in Poisson’s ratio. This is because materials with a Poisson’s ratio in the range of 0.35−0.45 are highly ductile in nature and hence significantly harder to track.

Fig. 3 shows the output of the proposed approach on two challenging deformation sequences on the cuboid and icosphere. The two objects were subjected to large volumetric deformation and were occluded using large floating objects in the synthetic scene. The mean value of H is 0.0696 m for the cuboid and 0.0836 m for the icosphere, while the value of H for every frame of the sequence is shown in Fig. 4.

Fig. 2: Comparison between GT and tracking output for the two synthetic sequences cube and icosphere (columns: input, [19], [6], SF with U. Γ and C. Γ, CRF with U. Γ and C. Γ). The red edges and vertices show the object model from the GT; the black ones are from the tracking output.

B. Experiments on Real Data

The real data has been captured using an Intel RealSense D435 camera. Three objects were used: a block of sponge, a rugby ball (cut in half) and a soft dice. These objects were strongly deformed from the top. The output of tracking is shown visually² in Fig. 5, while the evolution of the point-to-plane distance across the two sequences has been logged in Fig. 6. The sponge and ball demonstrate highly accurate tracking, with a mean geometric error (from Eqn. (14)) of −0.81 mm and −0.18 mm respectively, and a standard deviation of 0.47 mm and 0.62 mm. This accuracy is highly robust to occlusion and demonstrates the suitability of the proposed approach to robotic manipulation of soft objects. The dice undergoes a slightly smaller deformation, but shows a high mean accuracy of ∼0.61 mm.

²For a detailed video of the results, please visit: youtu.be/ScJnz j4-cs

Fig. 3: The cuboid and icosphere are subjected to large deformation ((a) & (b)) with large, floating objects flying across the scene (as occlusion). The tracking output is shown in (c) and (e), while $E_{depth}$ is shown on the surface of S in (d) and (f).

C. Force Tracking

To validate the force tracking methodology, we use a 6-DOF anthropomorphic robot arm (Viper 850 from ADEPT) fitted with an ATI Gamma IP65 force/torque sensor and a 3D-printed stylus as an end-effector distal tool. The robot is used only for its force sensor, which provides a GT of the force.

We use the sponge and the ball to validate the force tracking method. The Young’s modulus of the sponge and the ball is determined by repeated indentation tests and is found to be 460 kPa and 160 kPa respectively. Next, these objects are subjected to a strong deformation using the robot’s 3D-printed end-effector, and the point of contact on the object is tracked using pre-trained markers (fitted to the probe) from the image data. The results of the force tracking approach, as summarized in Fig. 7, show an accuracy of ∼97% for the sponge and of ∼90% for the ball when the deformation is optimally observable.

D. Implementation

The approach proposed in this paper has been implemented in C++ on an Intel Xeon CPU working at 3.70 GHz. Utilizing only a single core of the computer (without using a GPU), the un-optimised code was able to achieve a runtime of 350 ms - 550 ms per frame while tracking deformations using the proposed approach, which suggests that real-time performance at frame rate may be achievable.

Fig. 4: The value of H (in m) across all the frames of the two sequences shown in Fig. 3 (horizontal axis: no. of data frame).

Fig. 5: The color and depth image inputs for the ball, sponge and dice sequences are shown in (a), (b), (f), (g), (k) & (l). The tracked models are shown in (c), (h) & (m). (d), (i) & (n) show the object model overlaid on the image, while (e), (j) & (o) show the geometric error for the visible surfaces, derived using [20].

Fig. 6: Value of the weighted geometric error, i.e., $\|W_{depth}E_{depth}\|$, across the iterations of optimization for the sequences in Fig. 5 (panels: Sponge, Toy, Dice, Ball; axes: error in m against iterations).

Fig. 7: Setup and results for force tracking. The red plot is the force measurement from the robot’s force sensor and the blue plot is the estimated force reported by the proposed approach

V. CONCLUSION

The paper presented here describes a new method for combining geometric and photometric visual error minimization with FEM to create an accurate and fast deformation tracking method. The algorithm has been tested on synthetic and real data and has been shown to outperform state-of-the-art methods in tracking accuracy for generic deforming objects. The algorithm proposed here can be extended to other, more complex physical models without loss of generality. The method proposed here has been demonstrated as a reliable visual force tracking system when the material properties of the object being tracked are available.

ACKNOWLEDGEMENT

We gratefully acknowledge the support from The Research Council of Norway through participation in GentleMAN (299757) project.


REFERENCES

[1] Sebastian Hoppe Nesgaard Jensen, Alessio Del Bue, Mads Emil Brix Doest, and Henrik Aanæs. A benchmark and evaluation of non-rigid structure from motion. arXiv preprint arXiv:1801.08388, 2018.

[2] John Schulman, Alex Lee, Jonathan Ho, and Pieter Abbeel. Tracking deformable objects with point clouds. In 2013 IEEE International Conference on Robotics and Automation, pages 1130–1137, 2013.

[3] Bin Wang, Longhua Wu, KangKang Yin, Uri M Ascher, Libin Liu, and Hui Huang. Deformation capture and modeling of soft objects. ACM Trans. Graph., 34(4):94–1, 2015.

[4] Richard A Newcombe, Dieter Fox, and Steven M Seitz. Dynamicfusion: Reconstruction and tracking of non-rigid scenes in real-time. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 343–352, 2015.

[5] Antoine Petit, Vincenzo Lippiello, and Bruno Siciliano. Real-time tracking of 3d elastic objects with an rgb-d sensor. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3914–3921, 2015.

[6] Agniva Sengupta, Alexandre Krupa, and Eric Marchand. Tracking of non-rigid objects using rgb-d camera. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, pages 3310–3317, 2019.

[7] Agniva Sengupta, Romain Lagneau, Alexandre Krupa, Eric Marchand, and Maud Marchal. Simultaneous tracking and elasticity parameter estimation of deformable objects. In IEEE Int. Conf. on Robotics and Automation, pages 10038–10044, 2020.

[8] A. Nealen, M. Muller, R. Keiser, E. Boxerman, and M. Carlson. Physically based deformable models in computer graphics. Computer Graphics Forum, Vol 25:809–836, 2006.

[9] F. Faure, C. Duriez, H. Delingette, J. Allard, B. Gilles, S. Marchesseau, H. Talbot, H. Courtecuisse, G. Bousquet, I. Peterlik, et al. Sofa: A multi-model framework for interactive physical simulation. In Soft tissue biomechanical modeling for computer assisted surgery, pages 283–321. 2012.

[10] M. Muller and M. Gross. Interactive virtual materials. In Proceedings of Graphics interface, pages 239–246. Canadian Human-Computer Communications Society, 2004.

[11] Carlos A Felippa. A systematic approach to the element-independent corotational dynamics of finite elements. Technical report, Technical Report CU-CAS-00-03, Center for Aerospace Structures, 2000.

[12] Michael Friswell and John E Mottershead. Finite element model updating in structural dynamics, volume 38. Springer Science & Business Media, 2013.

[13] Yang Chen and Gérard G Medioni. Object modeling by registration of multiple range images. Image Vision Comput., 10(3):145–155, 1992.

[14] DA Freedman and Persi Diaconis. On inconsistent m-estimators. The Annals of Statistics, pages 454–461, 1982.

[15] Jorge J Moré. The levenberg-marquardt algorithm: implementation and theory. In Numerical analysis, pages 105–116. Springer, 1978.

[16] Souriya Trinh, Fabien Spindler, Eric Marchand, and François Chaumette. A modular framework for model-based visual tracking using edge, texture and depth features. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 89–96. IEEE, 2018.

[17] Blender Online Community. Blender - a 3D modelling and rendering package. Blender Foundation, Stichting Blender Foundation, Amsterdam, 2018.

[18] Pushkar Joshi, Mark Meyer, Tony DeRose, Brian Green, and Tom Sanocki. Harmonic coordinates for character articulation. ACM Transactions on Graphics (TOG), 26(3):71–es, 2007.

[19] Antoine Petit, Vincenzo Lippiello, Giuseppe Andrea Fontanelli, and Bruno Siciliano. Tracking elastic deformable objects with an rgb-d sensor for a pizza chef robot. Robotics and Autonomous Systems, 88:187–201, 2017.

[20] Cloudcompare (version 2.11.1) [gpl software]. Retrieved from http://www.cloudcompare.org/, 2020.