Context has also been considered as an important cue for action recognition [9, 28, 29]. The authors of [9] propose to learn neighborhood shapes of space-time features that are discriminative for a given action category, and recursively map the descriptors of the variable-sized neighborhoods into higher-level vocabularies, resulting in a hierarchy of space-time configurations. More recently, Wang et al. [28] introduced a contextual model to capture contextual interactions between interest points. Multiple channels of contextual features for each interest point are computed in multi-scale contextual domains with different shapes, where an individual context is represented by the posterior density of the particular feature class at the pixel location. Multiple kernel learning is then used to select the best combination of channels in a multi-channel SVM classification. In [29], objects and human body parts are considered as mutual context, and their interactions are modeled using random fields. The authors cast the learning task as a structure learning problem, by which the structural connectivity between objects, overall human poses, and different body parts is estimated.
Current group activity recognition approaches , , , , ,  typically tackle this problem by decomposing it into two parts: feature learning and relational reasoning. The first part focuses on learning person-specific visual features important for understanding individual actions. In the second part, pairwise relations are modeled to infer the group activities. Despite recent advances, these approaches still confuse visually similar group activities because they rely only on person-level appearance features in the feature learning part and ignore the contextual information present in videos. Consider the example shown in Fig. 1. It is challenging to differentiate the walking activity in the first case from the crossing activity in the second case using appearance features alone, as in both cases people are moving from one point to another. However, if we have additional cues indicating that a group of people is moving on a sidewalk in Fig. 1a vs. a road in Fig. 1b, the model can learn to distinguish these group activities. We term these additional cues contextual information and propose to integrate them with appearance features for group activity understanding.
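As a rough illustration of the proposed integration, appearance and context cues can be combined by attaching a shared scene-context descriptor to each person-level feature vector before relational reasoning. The sketch below is a minimal, hypothetical example (the feature dimensions and the plain concatenation strategy are assumptions, not the paper's actual architecture):

```python
import numpy as np

def fuse_features(appearance, context):
    """Concatenate per-person appearance features with a shared
    scene-context vector (shapes are illustrative assumptions)."""
    n = appearance.shape[0]
    # Broadcast the scene context to every person before relational reasoning.
    tiled = np.tile(context, (n, 1))
    return np.concatenate([appearance, tiled], axis=1)

appearance = np.random.rand(5, 128)   # 5 people, 128-d appearance features
context = np.random.rand(64)          # 64-d scene-context descriptor
fused = fuse_features(appearance, context)
print(fused.shape)  # (5, 192)
```

The relational-reasoning stage would then operate on the fused vectors instead of appearance alone.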
Moreover, to the best of our knowledge, no studies focus on fusing gait and thermal facial features to detect human emotion in a social robotics context. A very common scenario is when the human walks towards the social robot and stops in front of it to interact with it. In our work, we developed a multimodal emotion recognition method that uses thermal facial images during human-robot interaction and gait data recorded while walking towards the robot to recognize four emotional states (i.e., neutral, happy, angry, and sad). Offline emotion recognition has been widely developed by researchers; however, online testing of an emotion recognition model is challenging in a real-time HRI context. In this paper, we developed a new method based on a Random Forest (RF) model and the confusion matrices of two individual Random Forest models, using data from thermal facial images and gait. In addition, the IRL method is used with verbal human feedback in the learning loops in order to improve the performance of real-time emotion recognition. The experimental results show the effectiveness of IRL in multimodal emotion recognition.
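A minimal sketch of the confusion-matrix-based fusion idea is given below. The matrices, the per-class precision weighting, and the tie-breaking rule are illustrative assumptions; the actual fusion rule of the method may differ:

```python
import numpy as np

# Hypothetical confusion matrices (rows: true class, cols: predicted class)
# for the thermal-face and gait models over {neutral, happy, angry, sad}.
cm_face = np.array([[30, 3, 2, 1], [4, 28, 2, 2], [3, 2, 29, 2], [2, 3, 2, 29]])
cm_gait = np.array([[25, 5, 3, 3], [6, 24, 3, 3], [4, 4, 25, 3], [3, 4, 4, 25]])

def reliability(cm):
    """Per-class precision: how trustworthy a predicted label is."""
    return np.diag(cm) / cm.sum(axis=0)

def fuse(pred_face, pred_gait):
    """Keep the prediction whose model is more reliable for that class."""
    w_face = reliability(cm_face)[pred_face]
    w_gait = reliability(cm_gait)[pred_gait]
    return pred_face if w_face >= w_gait else pred_gait

print(fuse(1, 2))  # face model predicts "happy", gait predicts "angry"
```

Here the fused decision follows the face model, whose precision for "happy" exceeds the gait model's precision for "angry" in these hypothetical matrices.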
This application is intended to teach the recognition and learning of colors (only red, yellow, blue, and green) to children (aged 2 to 5, depending on the level of difficulty). The scenario is based on the French teaching syllabus for nursery schools. We asked a teacher to imagine one or more scenarios using an interactive tabletop and a set of objects, without giving any limits or constraints. The teacher proposed a simple application in which the children have to move a set of objects which have "lost their color" into the suitably colored frame (i.e., a "black and white" bee should be placed inside a yellow frame).
Context Models and Out-of-Context Objects
Myung Jin Choi, Antonio Torralba, Alan S. Willsky
The context of an image encapsulates rich information about how natural scenes and objects are related to each other. Such contextual information has the potential to enable a coherent understanding of natural scenes and images. However, context models have been evaluated mostly based on the improvement of object recognition performance, even though this is only one of many ways to exploit contextual information. In this paper, we present a new scene understanding problem for evaluating and applying context models. We are interested in finding scenes and objects that are "out-of-context". Detecting "out-of-context" objects and scenes is challenging because context violations can be detected only if the relationships between objects are carefully and precisely modeled. To address this problem, we evaluate different sources of context information, and present a graphical model that combines these sources. We show that physical support relationships between objects can provide useful contextual information for both object recognition and out-of-context detection.
To explore the dynamics of a wide variety of system behaviors based on collected event streams, there exist many advanced continuous-time modeling formalisms: for instance, continuous time Bayesian networks, Markov jump processes , Poisson networks and graphical event models (GEMs) . In this work we are interested in Recursive Timescale Graphical Models (or RTGEMs) , a sub-family of GEMs that presents advantages compared to the other formalisms. Appropriate learning and verification techniques should be adapted to the type of formalism that we wish to use. Standard model checking, for example, is used as a verification method . It has been applied to many formalisms but, to the best of our knowledge, never adapted to RTGEMs. Another valid solution for verification is approximation methods, such as Statistical Model Checking (SMC) , which is an efficient technique based on simulations and statistical results. SMC has been successfully applied to probabilistic graphical models such as dynamic Bayesian networks (DBNs) in . In the same way, SMC could easily be adapted to RTGEMs.
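The core of SMC is simple: simulate the model many times and estimate the probability that a property holds, with a statistical guarantee. The sketch below illustrates this under toy assumptions (a Poisson-like event stream and an ad hoc property stand in for an RTGEM and a temporal-logic query):

```python
import math
import random

def smc_estimate(simulate, holds, n_runs=2000, delta=0.05):
    """Statistical Model Checking sketch: estimate P(property) by
    simulation. `simulate` draws one trajectory, `holds` checks the
    property on it. Returns (estimate, half-width of a Hoeffding bound
    valid with probability at least 1 - delta)."""
    hits = sum(holds(simulate()) for _ in range(n_runs))
    p_hat = hits / n_runs
    eps = math.sqrt(math.log(2 / delta) / (2 * n_runs))
    return p_hat, eps

# Toy event stream: exponential inter-event times with rate 5;
# property = at least 3 events occur before time 1.0.
random.seed(0)
def simulate():
    t, events = 0.0, 0
    while True:
        t += random.expovariate(5.0)
        if t >= 1.0:
            return events
        events += 1

p_hat, eps = smc_estimate(simulate, lambda n: n >= 3)
print(f"P ~ {p_hat:.3f} +/- {eps:.3f}")
```

For an RTGEM, `simulate` would sample event trajectories from the learned model and `holds` would evaluate the property of interest on each trajectory.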
3.3. Data post-processing
Due to random displacements between the SEM ground truth and the optical image, the instance (attributes and labels) dataset had to be cleaned. To do this, a first description of the dataset is necessary. The dataset has a total of 786 655 instances covering 287 classes. The majority of those classes cannot be used due to their small number of occurrences. Furthermore, the imbalance of instances among categories affects the training phase of machine learning algorithms and their performance during the classification test (Yen and Lee, 2006). For example, we have a total of 2 instances for the "Actinolite, Plagioclase" class and 16 566 occurrences for the "Plagioclase, None" category. In addition, the classification algorithm used to process SEM data yields a category named "Unknown", in which particles with an ambiguous composition were not assigned a mineral name. This classification fails when the chemical composition of a mineral exceeds the specified tolerance in distance in the Euclidean hyperspace due to impurities, mixed signals, or spectral deconvolution issues. Consequently, all instances with the word "Unknown" in their label were excluded from learning, both because sand grains normally belong to a known mineral and to avoid contaminating the other classes. Also, particles identified as "Quartz" are overwhelmingly dominant (47 570 instances) but plagued with various color issues. Quartz is typically colorless and transparent; however, it may be stained by iron oxide coatings, tinted by internal structural damage, or loaded with submicroscopic inclusions that alter its apparent color. Being transparent and birefringent, light traversing the grains tends to disperse as in a prism into "rainbows". Furthermore, due to transparency, quartz particles may reflect colored light from neighboring grains.
Consequently, instances labeled as "Quartz" were eliminated from the dataset. Finally, to prove the computer vision and machine learning concept, classes that are not pure were excluded. For example, instances labeled as "Plagioclase, None" were considered pure and were preserved, while instances labeled as "Plagioclase, Magnetite" were not considered pure and were discarded. Once post-processed, 546 444 instances were retained, labeled into 9 classes. Among these instances, the "Background" class accounts for 468 431 instances.
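The cleaning rules above (drop "Unknown", drop "Quartz", keep only pure "Mineral, None" classes) can be sketched as a simple label filter; the instance values below are hypothetical:

```python
# Minimal sketch of the post-processing rules on hypothetical
# (label, attribute) instances.
instances = [
    ("Plagioclase, None", 0.41),
    ("Plagioclase, Magnetite", 0.12),
    ("Unknown, None", 0.55),
    ("Quartz, None", 0.33),
    ("Background, None", 0.02),
]

def keep(label):
    mineral, companion = [part.strip() for part in label.split(",")]
    if "Unknown" in label or mineral == "Quartz":
        return False          # ambiguous composition or quartz color issues
    return companion == "None"  # retain pure classes only

cleaned = [inst for inst in instances if keep(inst[0])]
print([label for label, _ in cleaned])  # ['Plagioclase, None', 'Background, None']
```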
The contextual features are evaluated at the same rate as the vision-based gesture recognizer.
We designed our experiments to demonstrate how contextual features can improve visual feedback recognition along the same two axes described in Chapter 2: embodiment and conversational capabilities. Our datasets include interactions with a robot, an avatar, and a non-embodied interface. We also experimented with different types of visual feedback: head nods, head shakes, and eye gaze aversion gestures. Finally, we tested our context-based recognition framework (described in Section 4.3) with two classification algorithms (SVM and FHCRF) to show the generality of our approach. The experiments were performed on three different datasets: MelHead, WidgetsHead, and AvatarEye. For the MelHead dataset, the goal is to recognize head nods and head shakes from human participants interacting with a robot. For the WidgetsHead dataset, the goal is to recognize head nods from human participants interacting with gesture-based widgets. For the AvatarEye dataset, the goal is to recognize eye gestures (gaze aversion) from human participants interacting with a virtual agent. For the MelHead and WidgetsHead datasets, multi-class SVMs were used to train and test the contextual predictor and multi-modal integrator, while FHCRF models were used for the AvatarEye dataset. Our goal in using both SVM and FHCRF is to show how our context-based framework generalizes to different learning algorithms.
A Spectral Database for the Recognition of Urban Objects in Kaunas City: Performance and Morphometric Issues
Sébastien Gadal 12 , Gintautas Mozgeris 3 , Donatas Jonikavicius 3 , Jurate Kamicaityte 4 , Walid Ouerghemmi 1
• The diversity of urban materials of Kaunas city, contemporary and historical urban structures, urban planning
To obtain better performance, several studies use ensemble methods such as , which combines the predictions of three models resulting from Support Vector Regression, Ensemble Tree, and Artificial Neural Network (ANN) algorithms for predicting the consumption of air conditioning in residential buildings. To ensure that their data meet the input requirements of the models, they perform a linear interpolation for missing and incorrect data. Then, to select the right feature set, they use a statistical measure. In , the authors propose an approach for detecting the current transportation mode of a user from his/her smartphone sensor data. They propose to divide the collected data into consecutive non-overlapping time sequences and to extract four features for each sequence and each sensor. Then, they combine multiple learners to improve their performance. In , the authors present an ensemble method that combines the predictions of three models resulting from DT, MLP, and Logistic Regression (LR) for human activity recognition. To determine the class of a new activity, they consider the predictions (i.e., classes) of the three models and choose the class with the highest number of votes. Their results show that ensemble learning can achieve significant improvements for activity recognition compared to what each learning algorithm can achieve individually. The same problem is also investigated in ; in this case, however, the authors combine the results of other classifiers such as MLP, SVM, and LogitBoost. In addition, they use a clustering method to select 18 relevant features from 24 features and obtain a good accuracy of 91.15%. In , the authors introduce a multi-class classification approach based on ultra-wideband sensor measurements and RF to detect when elderly people fall. The pre-processing phase includes filtering, feature extraction, stream windowing, change detection, and buffering.
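The majority-vote combination described above can be sketched as follows; the per-sample model predictions are hypothetical, and ties are broken by first-seen order (an assumption):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from several models by simple
    majority vote; Counter breaks ties by first-seen order."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-sample predictions from three base classifiers
# (standing in for DT, MLP, and LR).
dt  = ["walk", "run", "sit"]
mlp = ["walk", "sit", "sit"]
lr  = ["run",  "run", "sit"]

fused = [majority_vote(sample) for sample in zip(dt, mlp, lr)]
print(fused)  # ['walk', 'run', 'sit']
```

Each fused label is the class chosen by at least two of the three base models, mirroring the vote-counting rule described in the text.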
The classifier obtains the lowest error rate by setting the number of trees at 200.
Keywords: Handwriting Recognition, Historical Document, Transfer Learning, Deep Neural Network, Unlabeled Data

Abstract: In this work, we investigate handwriting recognition on new historical handwritten documents using transfer
learning. Establishing a manual ground-truth of a new collection of handwritten documents is time-consuming but needed to train and to test recognition systems. We want to implement a recognition system without performing this annotation step. Our research deals with transfer learning from heterogeneous datasets that have a ground-truth and share common properties with a new dataset that has no ground-truth. The main difficulties of transfer learning lie in changes in the writing style, the vocabulary, and the named entities over centuries and datasets. In our experiment, we show how a CNN-BLSTM-CTC neural network behaves, for the task of transcribing handwritten titles of plays of the Italian Comedy, when trained on combinations of various datasets such as RIMES, Georges Washington, and Los Esposalles. We show that the choice of the training datasets and the merging methods are decisive for the results of the transfer learning task.
In addition to a strong technological knowledge intensity, other attributes of the organizational context (lower part of figure 1) must be present. In order for organizational learning to occur, individuals within an organization must be given the opportunity to make the required changes to correct errors once they have been detected (what Argyris and Schon (1978) identify as double-loop learning). This necessitates an organizational culture which favors participation and openness, what Kanter (1983: 396) labels "organic" in opposition to "mechanistic" culture. Managers in such organizations favor participatory decision-making through formal and informal meetings and the active diffusion of information (Birley and Westhead, 1990). Workers' commitment to learning is encouraged and is reflected in various human resource practices such as performance appraisal (Hornsby and Kuratko, 1990) or through the existence of training practices (Snell and Dean, 1992). By developing an organizational climate conducive to change and creativity and committing to organizational learning, organizations promote employee motivation and skills, without which learning cannot occur.
An effective way to take semantic information into account is to re-evaluate (rescore) the best hypotheses of the ASR (N-best). For each word w of a hypothesis sentence, the recognition system provides an acoustic score p_acc(w) and a linguistic score p_ml(w). The best sentence is the one that maximizes the probability of the word sequence:

W* = argmax_W ∏_{w ∈ W} p_acc(w) · p_ml(w)
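A minimal sketch of such N-best rescoring is shown below; the log-linear combination and the lm_weight knob are illustrative assumptions, not the exact formula used by the system:

```python
import math

def rescore(nbest, lm_weight=0.7):
    """Rescore N-best hypotheses with a log-linear combination of
    acoustic and linguistic word scores. Each hypothesis is a list
    of (word, p_acc, p_ml) triples; lm_weight is a hypothetical knob."""
    def score(hyp):
        return sum(math.log(p_acc) + lm_weight * math.log(p_ml)
                   for _, p_acc, p_ml in hyp)
    return max(nbest, key=score)

# Two hypothetical hypotheses: "cap" is acoustically strong but
# semantically implausible (low linguistic score).
nbest = [
    [("the", 0.9, 0.8), ("cat", 0.6, 0.7)],
    [("the", 0.9, 0.8), ("cap", 0.7, 0.1)],
]
best = rescore(nbest)
print(" ".join(w for w, _, _ in best))  # the cat
```

The linguistic score overturns the acoustically better hypothesis, which is exactly the effect rescoring is meant to achieve.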
agent system that learns from demonstrations to control robotic devices and ESCHER, a multi-agent system for multi-criteria optimization.
Service robotics deals with the design of robotic devices whose objective is to provide adequate services to their users. User needs are multiple, dynamic, and sometimes contradictory. Providing a natural way to automatically adapt the behaviour of robotic devices to user needs is a challenging task. The complexity comes from the lack of a way to evaluate user satisfaction without evaluating a particular objective. A good way to handle this challenge is to use Learning from Demonstrations, a paradigm for dynamically learning new behaviours from demonstrations performed by a human tutor. With this approach, each action performed by a user on a device is seen as feedback. Through the natural process of demonstration, the user not only shows that the current device behaviour does not satisfy them, but also provides the adequate action to perform. Adaptive Learner by EXperiments (ALEX)  is a multi-agent system designed to face this challenge.
This paper addresses the problem of recognizing multiple rigid objects that are common to two images. We propose a generic algorithm that makes it possible to simultaneously decide whether one or several objects are common to the two images and to estimate the corresponding geometric transformations. The considered transformations include similarities, homographies, and epipolar geometry. We first propose a generalization of the a contrario formulation of the RANSAC algorithm proposed in [6]. We then introduce an algorithm for the detection of multiple transformations between images and show its efficiency in various experiments.

1. Introduction
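For concreteness, the RANSAC loop underlying such methods can be sketched for 2-D similarities (written in complex form, q = a*p + b); this minimal version uses a plain inlier count rather than the paper's a contrario criterion, and all data below are synthetic:

```python
import cmath
import random

def fit_similarity(p, q):
    """Solve q = a*p + b (complex form of a 2-D similarity)
    from two point correspondences."""
    (p1, p2), (q1, q2) = p, q
    a = (q1 - q2) / (p1 - p2)
    b = q1 - a * p1
    return a, b

def ransac_similarity(src, dst, iters=200, tol=0.05):
    """Minimal RANSAC: sample 2 correspondences, fit a similarity,
    count inliers; keep the model with the largest consensus set."""
    best, best_inliers = None, []
    for _ in range(iters):
        i, j = random.sample(range(len(src)), 2)
        a, b = fit_similarity((src[i], src[j]), (dst[i], dst[j]))
        inliers = [k for k in range(len(src))
                   if abs(a * src[k] + b - dst[k]) < tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (a, b), inliers
    return best, best_inliers

random.seed(1)
# Synthetic data: rotation by 30 degrees, scale 2, translation (1, -1),
# plus two gross outliers.
a_true = 2 * cmath.exp(1j * cmath.pi / 6)
b_true = complex(1, -1)
src = [complex(x, y) for x, y in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2), (3, 1)]]
dst = [a_true * z + b_true for z in src]
dst[4] += 5          # corrupt two correspondences
dst[5] -= 4j
model, inliers = ransac_similarity(src, dst)
print(len(inliers))  # 4
```

Detecting multiple transformations amounts to re-running such a loop on the residual (non-inlier) correspondences, with a stopping rule such as the a contrario test in place of the fixed tolerance.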
In this section we explore symmetry as a second edge attribute used together with v(a, b), as described below.
5.2 Symmetry attributes
As mentioned in the introduction, when no strict or exact symmetry is verified, it is meaningful to consider symmetry as a matter of degree, expressed by a symmetry measure. In our case, the regions are crisp sets but, since we have to deal with approximate symmetries, it is still of interest to use a symmetry measure instead of a boolean value. All the results obtained in the previous sections remain valid here, considering crisp sets as a particular case of fuzzy sets. Symmetry measures can be used to define a vertex attribute or an edge attribute. The first case applies if some objects of the scene are known to be approximately symmetrical. It is then possible to define a symmetry attribute as the orientation of the symmetry plane of the region and compare these orientations in the model and in the image to be recognized. Another option for such a scene is to compare the degree of symmetry of regions.
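As a toy illustration of such a measure on crisp regions, one can score bilateral symmetry about a fixed vertical axis by the intersection-over-union of a region and its mirror image (the grid, axis, and region below are assumptions, not the attributes used in the chapter):

```python
def symmetry_measure(region, width):
    """Degree of bilateral symmetry of a crisp region (a set of (x, y)
    cells) about the vertical axis of a width-cell grid: intersection-
    over-union of the region and its mirror image. 1.0 means the
    region is perfectly symmetric; 0.0 means no overlap at all."""
    mirrored = {(width - 1 - x, y) for x, y in region}
    return len(region & mirrored) / len(region | mirrored)

# A 4-cell region on a width-4 grid, symmetric except for one column.
region = {(0, 0), (3, 0), (1, 1), (1, 2)}
print(round(symmetry_measure(region, 4), 3))  # 0.333
```

A fuzzy-set version would replace the set operations with min/max over membership values, matching the formalism of the previous sections.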
Bei and Luan opened the way towards a "green" heterologous expression system when they found that tobacco mesophyll protoplasts are devoid of K+ inward currents
and demonstrated that the KAT1 Shaker channel could be heterologously expressed and subsequently characterised therein . The stable transformation protocol by Agrobacterium infiltration of leaf disks and subsequent regeneration of a plant , however, is time-consuming. Another drawback of this method is that ubiquitous expression of some transgenes may, in some instances, prevent the regeneration of the transformed plant that is required to obtain mesophyll protoplasts, or induce inopportune transcriptome modifications. On the other hand, the transient expression of protein-GFP fusions in tobacco cells has been used for about eight years to study the targeting of proteins [19,20], suggesting a possible use of these cells for the electrophysiological characterisation of electrogenic transport systems . Based on this, we developed a new procedure relying on transient transformation of tobacco mesophyll protoplasts. Vectors (available upon request) were engineered that allow selection of the transformed protoplasts (GFP reporter), expression of GFP-tagged or untagged proteins (for subcellular localisation and electrophysiological analyses, respectively), and co-expression of two different proteins in order to investigate their functional interactions. A PEG-mediated transformation protocol was adapted and the potential usefulness of the method was assessed by functional expression of the AKT1 channel, a result that had not been obtained in
V. CONCLUSION
Thanks to a thorough risk analysis, this paper identifies the top four realistic and cost-efficient security requirements for securing games based on active NFC objects against cheaters: secure communications between the object and the server as well as between two objects, signing the data stored on objects, and performing regular mandatory online checks of the objects. The objective of these requirements is to help game developers protect their games from players boosting the characteristics of their objects and playing with counterfeited objects.
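As a minimal sketch of the "sign data stored on objects" requirement, the server can attach an authentication tag to the object's payload and verify it on each read; the key, payload fields, and use of HMAC-SHA256 here are hypothetical choices for illustration, not the scheme mandated by the paper:

```python
import hashlib
import hmac

# Hypothetical server-side key; a real deployment would provision and
# protect this key properly.
SERVER_KEY = b"demo-key-not-for-production"

def sign_object(payload: bytes) -> bytes:
    """Tag the data stored on an NFC object so tampering is detectable."""
    return hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()

def verify_object(payload: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time before trusting
    the object's stored data."""
    return hmac.compare_digest(sign_object(payload), tag)

payload = b"object_id=42;strength=10"
tag = sign_object(payload)
print(verify_object(payload, tag))                       # True
print(verify_object(b"object_id=42;strength=99", tag))   # False: boosted stats
```

A cheater who rewrites the object's characteristics without the server key produces a payload whose tag no longer verifies, which is exactly the boosting attack the requirement targets.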