1 INTRODUCTION
Visual tracking of targeted objects has received significant attention in the vision community. In the last decade, a number of robust tracking strategies that can track targets in complex scenes have been proposed. One such successful paradigm is the particle filter (PF) [(Isard and Blake, 1998), (Doucet et al., 2000)]. The most important property of a PF is its ability to handle complex, multi-modal (non-Gaussian) posterior distributions. Such distributions are approximated by a cloud of particles. However, the number of particles required to adequately approximate the distribution grows exponentially with the dimensionality of the state space. PFs are computationally expensive, as the number of particles needs to be large for good performance in terms of robustness and precision. Moreover, the observation models are often built on complex appearance models, and as a result the trackers have difficulty operating in real time, especially when the number of targets increases. Currently, to the best of our knowledge, the literature lacks works that showcase the graphics processing unit (GPU) to improve the performance of multi-object tracking (MOT) algorithms. Hence, this paper aims to fill this void by presenting a performance comparison and assessment of a MOT algorithm running on a GPU. The tracking is achieved by a decentralized particle filter with a rich target appearance model. We demonstrate that the precision of the GPU implementation is

Recently, efficient particle implementations of PHD filtering have been proposed [18], [10], [12]. These approaches include the estimation of the number of observed tracks. To this end, particle clustering is necessary to identify tracks. However, such a procedure is non-trivial in urban scenarios where objects move in close proximity to each other. In our variant of the particle-based PHD filter, we avoid the problems of clustering and cardinality estimation by initializing tracks with a fixed number of particles constantly attached to them. Our method does not claim superior multi-object tracking performance; however, it facilitates the integration of contextual information, as particles are easily affected by context.

{thi-lan-anh.nguyen|furqan.khan|farhood.negin|francois.bremond}@inria.fr
Abstract
Appearance-based multi-object tracking (MOT) is a challenging task, especially in complex scenes where objects have similar appearance or are occluded by the background or other objects. Such factors motivate researchers to propose effective trackers that satisfy both real-time processing and object-trajectory recovery criteria. In order to handle both requirements, we propose a robust online multi-object tracking method that extends the features and methods proposed for re-identification to MOT. The proposed tracker combines a local and a global tracker in a comprehensive two-step framework. In the local tracking step, we use frame-to-frame association to generate online object trajectories. Each object trajectory is called a tracklet and is represented by a set of multi-modal feature distributions modeled by GMMs. In the global tracking step, occlusions and mis-detections are recovered by a tracklet bipartite association method based on learning a Mahalanobis metric between GMM components using the KISSME metric learning algorithm. Experiments on two public datasets show that our tracker performs well compared to state-of-the-art tracking algorithms.
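As a hedged illustration of the metric-learning step mentioned in the abstract, the sketch below implements the standard KISSME estimator (the Mahalanobis matrix is the difference of the inverse covariances of similar-pair and dissimilar-pair feature differences, projected onto the PSD cone); the synthetic data and all variable names are assumptions, not the paper's actual GMM-component features.

```python
import numpy as np

def kissme(diffs_similar, diffs_dissimilar):
    """KISSME metric learning sketch: M = inv(cov of similar-pair diffs)
    - inv(cov of dissimilar-pair diffs), clipped to the PSD cone."""
    cov_s = diffs_similar.T @ diffs_similar / len(diffs_similar)
    cov_d = diffs_dissimilar.T @ diffs_dissimilar / len(diffs_dissimilar)
    m = np.linalg.inv(cov_s) - np.linalg.inv(cov_d)
    w, v = np.linalg.eigh(m)                      # eigenvalue clipping
    return v @ np.diag(np.clip(w, 0.0, None)) @ v.T

def mahalanobis_sq(x, y, m):
    """Squared learned distance between two feature vectors."""
    d = x - y
    return float(d @ m @ d)

# Synthetic pairwise differences: similar pairs vary little, dissimilar a lot.
rng = np.random.default_rng(1)
M = kissme(rng.normal(0.0, 0.5, (200, 3)), rng.normal(0.0, 2.0, (200, 3)))
```

In a tracker, small learned distances between tracklet features would support merging tracklets across occlusions.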

For each of these variants and each ∆t value, we use the hyper-optimization procedure discussed previously to find the best set of parameters.
MOTA values and IDS are indicated in Fig. 4. First of all, they show that the proposed LINF1 variant outperforms the other variants both in terms of MOTA and IDS. The L1 variant performs poorly in our multi-frame data association context, especially concerning IDS. When using these representations, each detection is represented by only a few similar detections. This promotes short tracks of highly similar detections rather than long tracks through the whole sliding window. The two other appearance models, AppNN and AppMEAN, yield more

This work presents cross-classification clustering (henceforth 3C), a technique that extends single-object classification approaches, simultaneously and efficiently classi[...]

Abstract—Following the tracking-by-detection paradigm, multiple object tracking deals with challenging scenarios such as occlusions or even missing detections; priority is often given to quality measures rather than speed, and a good trade-off between the two is hard to achieve. Based on recent work, we propose a fast, lightweight tracker able to predict target positions and re-identify targets at once, whereas this is usually done in two sequential steps. To do so, we combine a bounding-box regressor with a target-oriented appearance learner in a newly designed, unified architecture. This way, our tracker can infer the targets' image pose but also provide us with a confidence level about target identity.

A very popular method for local data association is bipartite matching. The exact solution can be found via the Hungarian algorithm [11, 2]. These methods are computationally inexpensive, but can deal only with short-term occlusions. An example of a global method is the extension of bipartite matching to network flow [16, 3]. Given the object detections at each frame, a directed acyclic graph is formed and the solution is found through a minimum-cost flow algorithm. These algorithms reduce trajectory fragments and improve trajectory consistency, but lack robustness to identity switches of close or intersecting trajectories. To overcome ID switches, the paper [15] proposes global data association using a model which is close to the real-world tracking scenario. It incorporates both motion and appearance features into a generalized minimum clique graph. They form a k-partite graph, where all pairwise relationships between detections in the video are considered. The track of a person forms a clique, and MOT is formulated as a constrained maximum-weight clique problem. In order to globally optimize the tracks, the entire sequence must be provided beforehand. The weights balancing motion and appearance features are set manually. The algorithm overcomes identity switches for intersecting trajectories; however, if the appearances of pedestrians walking in the same direction are similar, ID switches remain.
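The local bipartite-matching step can be sketched with SciPy's Hungarian solver; the gating threshold `max_cost` and the cost values below are hypothetical, and a real tracker would derive the costs from motion and appearance cues as discussed above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost, max_cost=0.7):
    """Match track rows to detection columns by minimum total cost.

    Pairs whose cost exceeds max_cost (an assumed gating threshold) are
    discarded, leaving unmatched tracks/detections for occlusion handling
    or track initialization.
    """
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    unmatched_tracks = set(range(cost.shape[0])) - {r for r, _ in matches}
    unmatched_dets = set(range(cost.shape[1])) - {c for _, c in matches}
    return matches, unmatched_tracks, unmatched_dets

# Example: 3 tracks vs 3 detections; track 2 has no sufficiently good match.
cost = np.array([[0.1, 0.9, 0.8],
                 [0.8, 0.2, 0.9],
                 [0.9, 0.9, 0.95]])
m, ut, ud = associate(cost)
```

Track 2 and detection 2 stay unmatched here, which is exactly the situation a global method (network flow, clique-based association) is then asked to resolve.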

We can see that our system works better when we combine the output of the LSTM with the output of the detector. It also works better than matching alone.

7.7 Multi-object tracking: matching
In each step, the detections have to be matched with the current tracks because it is possible to have multiple instances of an object in a frame. This can be done in many different ways. To avoid spending too much time on this problem, we take a common approach: applying the Hungarian algorithm [1] to the two sets of boxes with the distance:
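The distance itself is cut off in this excerpt; a minimal sketch of one common choice, 1 − IoU between boxes, fed to the Hungarian algorithm, might look as follows (the box format and function names are assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_boxes(tracks, detections):
    """Hungarian matching of track boxes to detection boxes, minimizing
    the total 1 - IoU distance over all assigned pairs."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

# Two tracks, two detections listed in swapped order.
tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 21, 31, 31), (1, 1, 11, 11)]
pairs = match_boxes(tracks, dets)
```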

5.3. Evaluation metrics
In order to evaluate the performance of our multi-object tracking method, we adopt the widely used CLEAR MOT metrics [1], including multiple object tracking accuracy (MOTA↑) and multiple object tracking precision (MOTP↑), which penalize false positives (FP↓), false negatives (FN↓) and identity switches (IDSw↓). The trajectory-based metrics are also computed: the number of trajectories in the ground truth (GT); the ratio of mostly tracked trajectories (MT↑; a ground-truth trajectory covered by a tracking hypothesis for at least 80% of its length is regarded as mostly tracked); the ratio of mostly lost trajectories (ML↓; a ground-truth trajectory covered by a tracking hypothesis for at most 20% of its length is regarded as mostly lost); and the number of times a trajectory is fragmented (FM↓). Here, ↑ indicates that higher scores are better, and ↓ indicates that lower scores are better.
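The two headline metrics reduce to simple ratios once the per-frame error counts have been aggregated; a minimal sketch (aggregation over frames is assumed to have been done already):

```python
def mota(fp, fn, idsw, num_gt):
    """CLEAR MOT accuracy: 1 minus the sum of false positives, misses and
    identity switches over the total number of ground-truth objects
    across all frames. Can be negative when errors exceed num_gt."""
    return 1.0 - (fp + fn + idsw) / num_gt

def motp(total_overlap, num_matches):
    """CLEAR MOT precision: average localization score (e.g. box overlap)
    over all matched track/ground-truth pairs."""
    return total_overlap / num_matches
```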

Quantifying the uncertainty in object estimates maintained by a catalog provides an important advantage to various operations, some inherent to the tracking activity itself, such as the prediction of an orbital trajectory, and some peripheral, such as the assessment of the probability of collision between two objects. For this reason, casting the SSA estimation problem in a Bayesian paradigm has become a topic of growing interest but remains relatively unexplored so far. Besides some nontraditional approaches in the context of tracking [1,2], it has been addressed chiefly through either track-based [3,4] or set-based approaches [5–11]. Popular track-based solutions include the multiple hypothesis tracking (MHT) and joint probabilistic data association (JPDA) filters; they follow an intuitive construction in which sequences of observations that may represent the data originating from a single specific object are maintained as tracks. They do not maintain a probabilistic description of the dynamical evolution of the population of objects, and rely on heuristics and expert knowledge to determine, for example, at which point a stream of observations is considered sufficient evidence for the creation of a new track, or at which point a track is considered lost. Set-based solutions [12,13], on the other hand, approach the multi-object estimation problem in a holistic manner and incorporate all the sources of uncertainty (size of the population of objects, individual states, observation process, etc.) in a unified probabilistic framework. The construction of specific multi-object tracking algorithms follows a principled derivation exploiting the finite set statistics (FISST) framework [13] and relies on specific modeling choices for the sources of uncertainty rather than on heuristics. Popular set-based solutions such as the probability hypothesis density [14] or cardinalized probability hypothesis density [15] filters

In this paper, we discuss the multi-object negotiation model, where subsets of a set of objects are under negotiation, and show that this is a special case of the multi-issue negotiation model. Under the multi-object model, we demonstrate a technique for learning the opponent's preferences over subsets during a negotiation. One setting where such a negotiation might take place is in the realm of privacy. A website might request several items of personal information from a user in order to complete a transaction, and negotiation can take place to determine which subset of those items is suitable to the website and the user. Here, a partial order over the opponent's preferences is known. In particular, the receiver of the items (assuming that the items are desirable) will necessarily prefer offer a to offer b if a is a superset of b. The reverse is true for the giver. However, if neither is a subset of the other, it is not immediately clear which is preferred. To fill in these missing preferences, we can observe or predict that users typically behave in one of several ways. Our method uses a Bayesian classification technique that decides in which of these predefined classes a new opponent's total ordering is likely to reside. This decision is based on the opponent's offers made thus far in the negotiation. The ultimate goal is to learn the total order over the opponent's preferences as well as possible so that an effective negotiation strategy can be devised. The paper is organized as follows. In Section 2 we formalize the framework in which we consider our negotiations, and define the multi-object negotiation model. A protocol from the literature that can be used for such a negotiation model is then discussed. Section 3 presents our classification scheme for opponents' preferences, and we show how the technique can be extended for use in the more general model of multi-issue negotiation.
Section 4 then sheds some light on the effectiveness of our technique by describing experimental results, and finally Sections 5 and 6 offer conclusions and discuss plans for future work.
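A small illustration of the known part of the preference order, the superset relation over offers, might look as follows (a sketch under assumed names, not the paper's implementation); the `None` case is exactly the gap the Bayesian classifier is asked to fill.

```python
def receiver_prefers(a, b):
    """Partial order over offers (sets of information items) from the
    receiver's point of view: offer a is preferred to offer b when a is a
    strict superset of b; None means the two offers are incomparable and
    the preference must be learned from the opponent's behaviour."""
    a, b = frozenset(a), frozenset(b)
    if a > b:          # strict superset: receiver gets strictly more items
        return True
    if a < b:          # strict subset: receiver gets strictly fewer items
        return False
    return None        # incomparable (or equal): no known preference
```

The giver's partial order is simply the reverse of this relation.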

Table 2. Tracking results for the TUD-Stadtmitte sequence. The best values are printed in red.
Fig. 4. Tracking of the motion of 3 persons in the subway video over time. The tracking is correct even with low person resolution and low person contrast (in the second image).
Subway video The video of the third test belongs to a European project (anonymized). The test sequence contains 36,006 frames and lasts 2 hours. Figure 4 illustrates the correct tracking of three persons with low resolutions over time. In the second image, the contrast of the two persons with tracker IDs 4 and 5 (corresponding to the cyan and pink trajectories) with respect to the surrounding background is very low. They occlude each other in several frames but are still correctly tracked (see the right image).

Isard adapted the particle filter for object tracking in his Condensation algorithm [2]. It belongs to the family of Bayesian filters, and is based on estimating the posterior distribution of the object state using a set of weighted particles. Each particle corresponds to a hypothetical state of the object (typically describing its position and velocity), while the weight represents the importance of the particle. The particle filter is composed of three steps: (1) Observation, to measure how well each particle fits the model; (2) Resampling, to keep the best particles; and (3) Propagation, to diffuse the set of particles according to a dynamical model. In tracking problems, the observation step is usually based on object visual representations. Different representations have been used to model the target object: Isard [2] models the borders of the object using parametric B-spline curves; Nummiaro [3] uses colour histograms; Pérez [4] splits the object into different parts, each modelled by a colour histogram; Maggio [5] uses smoothed histograms of gradient orientation for rotation robustness, combined with colour histograms to form a particle filter using two features. Brasnett [6] also uses
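The three steps enumerated above can be sketched as a minimal 1-D Condensation-style filter; the Gaussian pseudo-likelihood below is an assumption standing in for a real appearance model such as a colour histogram, and the state is position only.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, observe, motion_std=2.0):
    """One Condensation-style iteration over 1-D position particles.

    observe(x) returns a likelihood for each particle (here a toy
    Gaussian; a real tracker would score an appearance model).
    """
    # (1) Observation: weight each particle by how well it fits the model.
    weights = observe(particles)
    weights = weights / weights.sum()
    # (2) Resampling: keep good particles in proportion to their weight.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    # (3) Propagation: diffuse with the dynamical (random-walk) model.
    return particles + rng.normal(0.0, motion_std, size=len(particles))

# Toy target at x = 50 with a Gaussian pseudo-likelihood of width 5.
particles = rng.uniform(0, 100, size=500)
for _ in range(20):
    particles = particle_filter_step(
        particles, lambda x: np.exp(-0.5 * ((x - 50.0) / 5.0) ** 2))
```

After a few iterations the cloud concentrates around the target, which is the multi-modal posterior approximation the surrounding text describes.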

leader-follower approach. In addition, to obtain a reactive and distributed control architecture, only the robot's local frame was used. Moreover, an online object detection method, performed by the multi-robot system, is presented. The objective is to localize the position of the leader in the formation. Leader detection is obtained using a heuristic method that exploits the range data from the group of cooperating robots to obtain the parameters of an ellipse [25]. This ellipse completely surrounds the leader to be followed. Collisions with objects were not addressed in this work; they will be the subject of future work.

I. MOTIVATIONS
Detecting, localizing or following targets is at the core of numerous robotic applications, in both adversarial and cooperative contexts. Much work has been devoted in various research communities to such problems, which are often referred to as "pursuit-evasion" problems. This very evocative term actually encompasses a variety of scenarios that pertain either to mono- or multi-robot contexts, considering either a single target or multiple targets, and whose objective is either to detect, to capture or to track them. On the other hand, other similar problems are named differently and use specific vocabulary, e.g., surveillance, search or tracking. This is partly explained by the different application contexts considered (industrial, civilian or military), and by the fact that different communities tackle them from different standpoints (e.g., sensor data processing, symbolic or geometric task planning, task allocation, game theory, etc.).

The rigid tracking method used here is based on a monocular vision system. Local tracking is performed via a 1D oriented-gradient search along the normal of the contours at a specified sampling distance. This 1D search provides real-time performance. Local tracking provides a redundant group of distance-to-contour features which are used together to calculate the global pose of the object. The use of redundant measures allows the elimination of noise and leads to high estimation precision. These local measures form an objective function which is minimized via a non-linear minimization procedure using virtual visual servoing (VVS) [3].
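The idea of minimizing a redundant set of measures can be illustrated with a generic Gauss-Newton sketch; this is not the VVS formulation of [3], and the toy residual (a noise-free line fit) merely stands in for the distance-to-contour features.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=20):
    """Generic Gauss-Newton least squares: x <- x - (J^T J)^{-1} J^T r.
    With many redundant measurements, the normal equations average out
    per-feature noise, which is the motivation for using a redundant
    group of features in pose estimation."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)
        J = jacobian(x)
        x = x - np.linalg.solve(J.T @ J, J.T @ r)
    return x

# Toy redundant system: recover (a, b) of y = a*t + b from 50 samples
# (noise-free here; real contour measurements would be noisy).
t = np.linspace(0.0, 1.0, 50)
y = 3.0 * t + 1.0
res = lambda x: x[0] * t + x[1] - y
jac = lambda x: np.stack([t, np.ones_like(t)], axis=1)
est = gauss_newton(res, jac, [0.0, 0.0])
```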

VIII. CONCLUSION
In this paper, we studied a visual tracking problem in which a team of robots equipped with cameras is charged with tracking the locations of targets moving on the ground. We discussed the sources of uncertainty that affect the quality of estimating the locations of ground targets using overhead images. We showed the infeasibility of tracking all targets while maintaining the optimal quality of tracking, or any factor of the optimal quality, at all times. We then formulated the target tracking problem where the goal is to assign trajectories to each robot in order to maximize the quality of tracking. When given a set of candidate robot trajectories, we showed how the problem can be posed as a combinatorial optimization problem. A simple and easy-to-implement greedy algorithm applied to this problem yields a 1/2 approximation. Finally, we presented results from simulations and preliminary experiments validating the sensing
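The greedy assignment described above can be sketched as follows; the marginal-gain oracle and the additive quality table in the example are hypothetical stand-ins for the paper's tracking-quality objective, under which greedy selection gives the stated 1/2 approximation.

```python
def greedy_assign(robots, candidates, gain):
    """Greedy trajectory assignment: repeatedly pick the (robot, trajectory)
    pair with the largest marginal gain in tracking quality, until every
    robot has a trajectory or no candidates remain. `gain(robot, cand,
    chosen)` is an assumed marginal-gain oracle."""
    chosen = {}
    remaining = set(candidates)
    while len(chosen) < len(robots) and remaining:
        best = max(
            ((r, c) for r in robots if r not in chosen for c in remaining),
            key=lambda rc: gain(rc[0], rc[1], chosen),
            default=None)
        if best is None:
            break
        r, c = best
        chosen[r] = c          # commit the best pair greedily
        remaining.discard(c)   # each trajectory is used at most once
    return chosen

# Toy additive quality table: robot "b" grabs trajectory 2 first (gain 6),
# then robot "a" takes trajectory 0 (gain 5).
quality = {"a": {0: 5.0, 1: 3.0, 2: 1.0}, "b": {0: 4.0, 1: 2.0, 2: 6.0}}
plan = greedy_assign(["a", "b"], [0, 1, 2],
                     lambda r, c, chosen: quality[r][c])
```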

A direct use of this approach in a moving-camera context is not easy, especially because we have no direct access to a moving-object segmentation. Moreover, a person detector cannot detect the larger region formed by the merging of two correlated pedestrians. However, stereo cameras can give information on the presence of obstacles. Thus an extension of their method could be built using a sliding-window approach for initial detection and stereo obstacle segmentation when multiple targets merge together. Another efficient approach for pedestrian tracking has been proposed recently by Ess et al. [11]. In their case, detection was done by a simple HOG detector. They also used an appearance model to discriminate the targets from each other, thus avoiding unlikely associations. Their appearance model was very simple: a color histogram computed inside an ellipse fitted to the bounding box.

In [3] and [4], an omnidirectional image is first unwrapped to a panoramic image using a cylindrical projection, on which the optical flow is calculated. In the former, a synthetic optical flow is generated by estimating the positions of the foci of expansion and contraction, and the calculated flow is compared to the generated one, while the latter estimates an affine transform on square subsegments to warp the image and perform the differentiation. In [5], the omnidirectional image is segmented into a set of perspective images and detection is done in the vein of [1], while the tracking is based on the particle filter. To perform the following, a control law based on minimizing an ad hoc following error is calculated. Compared to cylindrical projection geometry, working on the unit sphere offers the advantage of mitigating wrapping issues at the edges of the projection caused by cutting and unwrapping the cylinder.

Citation: Suleiman, Amr, Zhengdong Zhang, and Vivienne Sze. "A 58.6mW Real-Time Programmable Object Detector with Multi-Scale Multi-Object Support Using Deformable Parts Model on 1920x1080 Video at 30fps." In 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), Digest of Technical Papers, held 15-17 June 2016, Honolulu, HI. IEEE.