7 Essential Principles to Make Multimodal Sentiment Analysis Work in the Wild

Björn W. Schuller 1,2

1 Department of Computing, Imperial College London, United Kingdom
2 Chair of Complex & Intelligent Systems, University of Passau, Germany

bjoern.schuller@imperial.ac.uk

Abstract

Sentiment Analysis is increasingly carried out on multimodal data such as videos taken in everyday environments. This requires robust processing across languages and cultures when the aim is to mine opinions from the 'many'. Here, seven key principles are laid out to ensure high performance of such an automatic approach.

1 Introduction

Sentiment Analysis (SA) has recently found its way beyond pure text analysis [Cambria et al., 2015], as sentiment is increasingly expressed also via video 'micro blogs', short clips, and other formats. Such multimodal data is usually recorded 'in the wild', which challenges today's automatic analysers. For example, a video-posted opinion on a movie may contain scenes of that movie, requiring subject tracking, and background music may need to be overcome for speech recognition and voice analysis. Here, I provide 'essential principles' to make multimodal SA work despite such challenges.

2 The Principles

Seven selected recommendations to make a multimodal SA system 'ready for the wild' are given, each with a short statement:

Make it Multimodal – But Truly. Multimodal SA is often carried out in a late-fusion manner, as, e.g., (spoken) language, acoustics, and video analysis operate on different time scales and monomodal analysers prevail. However, recent advances in synergistic fusion allow heterogeneous information streams to be exploited further, e.g., by analysing cross-modal behaviour synchrony to reveal regulation.
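As an illustration, a minimal late-fusion sketch follows; the per-modality probability vectors and the weighting scheme are purely illustrative, not a specific system's interface.

```python
# Minimal late-fusion sketch (assumed interface): each monomodal analyser
# outputs a class-probability vector; fusion is a convex combination.
import numpy as np

def late_fusion(probs_per_modality, weights=None):
    """Weighted average of per-modality class-probability vectors."""
    probs = np.stack(probs_per_modality)        # (n_modalities, n_classes)
    if weights is None:
        weights = np.ones(len(probs_per_modality))
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()                    # normalise to sum to one
    return weights @ probs                      # fused (n_classes,) posterior

# Example: text is confident about 'positive'; audio and video are less so.
text_p  = np.array([0.1, 0.1, 0.8])            # [negative, neutral, positive]
audio_p = np.array([0.3, 0.4, 0.3])
video_p = np.array([0.2, 0.5, 0.3])
print(late_fusion([text_p, audio_p, video_p]))
```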

Make it Robust. Robustness is obviously key when handling real-world data. Effective denoising and dereverberation can these days be achieved by data-driven approaches such as (hierarchical) deep learning. Beyond that, recognition of occlusions, background noises, and the like should be used in the fusion to dynamically adjust the weights given to the modalities.
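A hedged sketch of this dynamic weighting idea: fusion weights shrink for modalities whose estimated signal quality is low. The quality scores stand in for the output of, e.g., an occlusion detector or an SNR estimate; all names are assumptions.

```python
# Hedged sketch: fusion weights derived from per-modality quality estimates
# (e.g., an occlusion detector for video, an SNR estimate for audio).
import numpy as np

def quality_weighted_fusion(probs_per_modality, quality_scores):
    """quality_scores in [0, 1]: 1 = clean signal, 0 = fully corrupted."""
    q = np.clip(np.asarray(quality_scores, dtype=float), 1e-6, None)
    w = q / q.sum()                              # dynamic, renormalised weights
    return w @ np.stack(probs_per_modality)

# Heavy background music: down-weight audio, trust text and video more.
text_p, audio_p, video_p = [0.1, 0.1, 0.8], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]
print(quality_weighted_fusion([text_p, audio_p, video_p],
                              quality_scores=[0.9, 0.2, 0.8]))
```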

Train it on Big Data. A major bottleneck for SA beyond textual analysis is the lack of suited 'big' (ideally multimodal) training data. While data is usually 'out there' (such as videos on the net), it is the labels that are lacking. Recent cooperative learning approaches, such as dynamic active learning and semi-supervised learning combined with (gamified) crowd-sourcing, can help to efficiently build a large training corpus. Smart pre-selection of suited material from large resources can further improve efficiency.
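A toy sketch of one such cooperative strategy, uncertainty-based active learning: the model repeatedly queries the unlabelled pool for the instances it is least certain about, which would then be sent for (crowd-sourced) labelling. Data, model, and batch size are illustrative assumptions.

```python
# Toy uncertainty-sampling loop: label a small seed set, then repeatedly
# query the pool items the classifier is least certain about.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(1000, 20))                    # unlabelled pool
y_oracle = (X_pool[:, 0] + 0.5 * X_pool[:, 1] > 0).astype(int)  # stands in for annotators

labelled = list(rng.choice(len(X_pool), size=20, replace=False))  # seed set
clf = LogisticRegression()

for _ in range(5):                                      # five query rounds
    clf.fit(X_pool[labelled], y_oracle[labelled])
    p_pos = clf.predict_proba(X_pool)[:, 1]
    uncertainty = -np.abs(p_pos - 0.5)                  # highest near p = 0.5
    ranked = np.argsort(uncertainty)[::-1]              # most uncertain first
    queries = [i for i in ranked if i not in labelled][:20]
    labelled.extend(queries)                            # 'crowd-sourced' labels arrive
```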

Make it Adapt. It has repeatedly been shown that in-domain training improves SA [Cambria et al., 2015]. Recent transfer learning provides a range of solutions for adapting to a new domain even if only little and/or unlabelled data exists for it. Subject adaptation is another key aspect.
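One simple transfer-learning recipe matching this principle is continued training (fine-tuning): pre-train on plentiful source-domain data, then update the same model with the few target-domain labels available. The sketch below uses synthetic data purely for illustration.

```python
# Fine-tuning sketch: pre-train on abundant source-domain data, then continue
# training on the scarce target-domain set. Data is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)
X_src = rng.normal(size=(5000, 10))                # large out-of-domain corpus
y_src = (X_src[:, 0] > 0).astype(int)
X_tgt = rng.normal(loc=0.5, size=(50, 10))         # few in-domain samples
y_tgt = (X_tgt[:, 1] > 0.5).astype(int)

clf = SGDClassifier(loss="log_loss", random_state=1)
clf.fit(X_src, y_src)                              # source-domain pre-training
for _ in range(20):                                # a few adaptation passes
    clf.partial_fit(X_tgt, y_tgt)                  # adapt to the new domain
```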

Make it Context-aware. Temporal context modelling can be learnt, e.g., by LSTM recurrent networks. Additional exploitation of knowledge sources can help resolve ambiguities. Also, automatic recognition of further traits and states of the subject expressing the sentiment, such as age, gender, ethnicity, personality, emotion, or health state, can add important information regarding the sentiment expressed.
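A minimal sketch of such temporal context modelling with an LSTM, mapping a sequence of per-frame multimodal feature vectors to a clip-level sentiment decision; all dimensions, and classifying from the last time step, are assumptions.

```python
# Sketch of temporal context modelling with an LSTM (PyTorch); feature and
# hidden sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ContextSentiment(nn.Module):
    def __init__(self, feat_dim=64, hidden=128, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)                      # context-aware hidden states
        return self.head(out[:, -1])               # decide from the final state

model = ContextSentiment()
frames = torch.randn(8, 50, 64)                    # 8 clips, 50 frames each
logits = model(frames)                             # (8, n_classes) sentiment logits
```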

Make it Multilingual. It seems obvious that multilingualism is an issue for text-based SA. However, it is just as much an issue for acoustic analysis, and can in principle even influence video-based SA owing to, e.g., varying lip movements. In fact, languages are often even mixed within real-world statements.
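A deliberately toy sketch of per-language model switching: identify the language first, then dispatch to a language-specific sentiment model. The detector and the models here are hypothetical placeholders, not any real library's API.

```python
# Hypothetical routing sketch: identify the language, then dispatch to a
# per-language model. Detector and models are toy placeholders.
def detect_language(text: str) -> str:
    # Stand-in heuristic; a trained language identifier would be used in practice.
    words = text.lower().split()
    return "de" if any(w in words for w in ("und", "nicht", "sehr")) else "en"

MODELS = {                                         # toy per-language scorers
    "en": lambda t: 1.0 if "great" in t.lower() else 0.0,
    "de": lambda t: 1.0 if "toll" in t.lower() else 0.0,
}

def multilingual_sentiment(text: str) -> float:
    return MODELS[detect_language(text)](text)

print(multilingual_sentiment("Der Film war sehr toll"))   # routed to the 'de' model
```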

Make it Multicultural. Cross-cultural SA has received comparatively little research, although it is clearly of crucial importance. Just as for multilingualism, a key requirement will be sufficient learning data. Then, models can be switched and transferred across languages and cultures.

3 Conclusion

Seven major requirements were highlighted on the way to truly robust multimodal sentiment analysis in adverse conditions in today's 'big data' era. Beyond these, a number of further issues need to be addressed to best exploit automated sentiment analysis, such as the provision of meaningful confidence measures and the optimal exploitation of 'internal' confidences.

Acknowledgments

The author was supported by the EU's H2020 Programme via the IAs #644632 (MixedEmotions) and #645094 (SEWA).

References

[Cambria et al., 2015] E. Cambria, B. Schuller, and Y. Xia. New Avenues in Opinion Mining and Sentiment Analysis. In Proc. IJCAI 2015, Buenos Aires, 2015.

