Hate speech and offensive language detection using transfer learning approaches


HAL Id: tel-03276023

https://tel.archives-ouvertes.fr/tel-03276023

Submitted on 1 Jul 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Hate speech and offensive language detection using transfer learning approaches

Marzieh Mozafari

To cite this version:

NNT : 2021IPPAS007

Hate Speech and Offensive Language Detection using Transfer Learning Approaches

Doctoral thesis of the Institut Polytechnique de Paris, prepared at Télécom SudParis

Doctoral school no. 626: École doctorale de l'Institut Polytechnique de Paris (ED IP Paris)

Doctoral specialty: Computer Science

Thesis presented and defended at Évry on 28/05/2021 by

MARZIEH MOZAFARI

Jury composition:

Daqing Zhang, Director of Studies, IMT, Télécom SudParis - France (President)
Gabriella Pasi, Professor, University of Milano-Bicocca - Italy (Reviewer)
Ioan Marius Bilasco, Associate Professor, Lille 1 University - France (Reviewer)
Elena Cabrio, Assistant Professor, Université Côte d'Azur - France (Examiner)
Christophe Cerisara, Research Scientist, CNRS - France (Examiner)
Daqing Zhang, Director of Studies, IMT, Télécom SudParis - France (Examiner)
Noel Crespi, Professor, IMT, Télécom SudParis - France (Thesis supervisor)
Reza Farahbakhsh


Dedication

To two of my best friends, Dr. Ali Jalilvand and Dr. Ardavan Afshar, who passed away during their PhD and could not defend their thesis, unfortunately...


"No one is born hating another person because of the color of his skin, or his background, or his religion. People must learn to hate, and if they can learn to hate, they can be taught to love, for love comes more naturally to the human heart than its opposite."


A Appendix


Chapter 2


Chapter 3

Social Media Content Analysis


$$D_{KL}(P, M) = \sum_{i} P(i) \log \frac{P(i)}{M(i)}$$
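As a minimal sketch of how the Kullback-Leibler divergence between two discrete distributions P and M can be computed, the function below follows the standard definition. The function name `kl_divergence`, the use of the natural logarithm, and the conventions for zero probabilities are assumptions made for illustration; this page of the thesis does not specify them.

```python
import math


def kl_divergence(p, m):
    """Return D_KL(P, M) = sum_i P(i) * log(P(i) / M(i)), using the natural log.

    p and m are sequences of probabilities over the same outcomes.
    """
    if len(p) != len(m):
        raise ValueError("p and m must have the same length")
    total = 0.0
    for p_i, m_i in zip(p, m):
        if p_i == 0.0:
            # By convention, a term with P(i) = 0 contributes nothing.
            continue
        if m_i == 0.0:
            # P puts mass where M has none: the divergence is infinite.
            return math.inf
        total += p_i * math.log(p_i / m_i)
    return total


# Hypothetical toy distributions over three outcomes.
p = [0.7, 0.2, 0.1]
m = [0.5, 0.3, 0.2]
print(kl_divergence(p, m))
```

The divergence is zero only when the two distributions coincide, and it is not symmetric in its arguments, so `kl_divergence(p, m)` and `kl_divergence(m, p)` generally differ.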


Chapter 4


Chapter 5


stage) and the learning rate of the outer loop β are initially set to 3e-5 and 6e-5, respectively. We use the Adam optimizer to update the parameters. The number of update steps in the inner loop is set to 10. During the first 30 epochs, we calculate the first-order derivatives, and for the rest of the training process we calculate the second-order derivatives in MAML. We evaluate on the samples in the $L_{val}$ set with 5 different seeds after each epoch and, to avoid overfitting, apply early stopping when the validation accuracy fails to improve for 5 epochs. In the few-shot setting, we choose $k \in \{4, 8, 16\}$ to evaluate how models generalize to a new target language with a limited number of labeled samples, $k$ per class.

For the XLM-R baseline, the maximum sequence length of the input sentences is set to 256; shorter inputs are padded with zero values and longer inputs are truncated to the maximum length. The model is fine-tuned with a batch size of 16 for 5 epochs. An Adam optimizer with a learning rate of 2e-5 is used to minimize the cross-entropy loss function. For the non-episodic baseline, the model is trained for 5 epochs on $L_{train}$ and evaluated after each epoch on the $L_{val}$ set.

Implementation. As the implementation and execution environment, we use the Lab-IA platform provided by the French National Centre for Scientific Research (CNRS), with an NVIDIA Tesla V100 GPU with 32 GiB of RAM (NVLink).

5.3.5.3 Results and discussions

In this section, we evaluate the trained models on hate speech and offensive language detection tasks in different languages.

Hate speech detection. In this task, we combine all datasets in each language, as reported in Table 5.7. Because no held-out benchmark test set exists for each dataset, after combining all datasets in each language we select 20% of the samples in each language as the test set by stratified sampling. To have a variety of tasks during the meta-training step, we leverage different languages with different hateful content, where all languages except two are selected as the training set. For example, to evaluate meta-learning models on Arabic as a target language with $k$ labeled samples per class, we consider one language for validation and the rest of the languages for training, where $L_{train}$ = {English, French, German, Indonesian, Spanish, Portuguese}, $L_{val}$ = {Italian}, and $L_{test}$ =
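The 20% stratified test split described in this passage could be sketched as follows. This is an illustration under stated assumptions, not the thesis code: the function name `stratified_split`, the `(text, label)` sample format, the fixed seed, and the toy corpus are all hypothetical. The idea is to hold out the same fraction of each class, so that the test set keeps the label proportions of the combined per-language data.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs into train and test parts.

    Each label contributes about `test_fraction` of its samples to the
    test part, so class proportions are preserved in both parts.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical toy corpus for one language: 10 "hate" and 40 "normal" samples.
corpus = [(f"hate sample {i}", "hate") for i in range(10)]
corpus += [(f"normal sample {i}", "normal") for i in range(40)]

train, test = stratified_split(corpus)
print(len(train), len(test))
```

In the setting described here, such a split would be applied separately to the combined data of each language before the languages are assigned to the training, validation, and test sets.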


Chapter 6

Conclusion and Future Work

Contents

6.1 Conclusion
6.1.1 Summary and Insights of Contributions
6.2 Future Work and Challenges


French title: Détection du Discours de Haine et du Langage Offensant utilisant des Approches de Transfer Learning

Keywords (translated from the French): Hate speech, Offensive language, Transfer learning, BERT, XLM-RoBERTa, Deep learning, Cross-lingual text classification, Few-shot learning, Meta learning, Social networks, Twitter

Abstract (translated from the French): One of the promises of social media platforms (such as Twitter and Facebook) is to provide a safe place for users to share their opinions and information. However, the rise of abusive behavior, such as online harassment or the presence of hate speech, is very real. In this thesis, we focus on hate speech, one of the most worrying phenomena on social media.

Given its rapid growth and serious negative effects, institutions, social media platforms, and researchers have tried to react as quickly as possible. Recent advances in natural language processing (NLP) and machine learning (ML) algorithms can be adapted to develop automatic methods for detecting hate speech in this domain.

The aim of this thesis is to study the problem of hate speech and abusive language detection on social media. We propose different approaches in which we adapt advanced transfer learning (TL) models and NLP techniques to automatically detect hate speech and abusive content, in both monolingual and multilingual settings. The first contribution concerns the English language only. First, we analyze user-generated textual content on Facebook by introducing a new framework that categorizes content in terms of similarity based on different features, namely lexical, topical, and semantic features. In addition, using Google's Perspective API, we measure and analyze the toxicity of the content. Next, we propose a TL approach for identifying hate speech that combines the unsupervised pre-trained model BERT (Bidirectional Encoder Representations from Transformers) with new supervised fine-tuning strategies. Finally, we study the effect of unintended bias in our pre-trained BERT model and propose a new generalization mechanism in the training data, reweighting the samples and then changing the fine-tuning strategies in terms of the loss function, to mitigate the racial bias propagated by the model. To evaluate the proposed models, we use three public datasets from Twitter.


Title : Hate Speech and Offensive Language Detection using Transfer Learning Approaches

Keywords : Hate Speech, Offensive Language, Transfer Learning, BERT, XLM-RoBERTa, Deep Learning, Cross Lingual Text Classification, Few-shot Learning, Meta Learning, Social Media, Twitter

Abstract : The great promise of social media platforms (e.g., Twitter and Facebook) is to provide a safe place for users to communicate their opinions and share information. However, concerns are growing that they also enable abusive behaviors, e.g., threatening or harassing other users, cyberbullying, hate speech, and racial and sexual discrimination. In this thesis, we focus on hate speech as one of the most concerning phenomena in online social media.

Given the rapid growth of online hate speech and its severe negative effects, institutions, social media platforms, and researchers have been trying to react as quickly as possible. Recent advances in Natural Language Processing (NLP) and Machine Learning (ML) algorithms can be adapted to develop automatic methods for hate speech detection in this area.

The aim of this thesis is to investigate the problem of hate speech and offensive language detection in social media, where we define hate speech as any communication criticizing a person or a group based on some characteristics, e.g., gender, sexual orientation, nationality, religion, or race. We propose different approaches in which we adapt advanced Transfer Learning (TL) models and NLP techniques to detect hate speech and offensive content automatically, in both monolingual and multilingual settings.

In the first contribution, we focus only on the English language. First, we analyze user-generated textual content on Facebook to gain a brief insight into the type of content by introducing a new framework able to categorize content in terms of topical similarity based on different features, namely lexical, topical, and semantic features. Furthermore, using the Perspective API from Google, we measure and analyze the toxicity of the content. Second, we propose a TL approach for identifying hate speech by employing a combination of the unsupervised pre-trained model BERT (Bidirectional Encoder Representations from Transformers) and new supervised fine-tuning strategies. Finally, we investigate the effect of unintended bias in our pre-trained BERT-based model and propose a new generalization mechanism in the training data by reweighting samples and then changing the fine-tuning strategies in terms of the loss function to mitigate the racial bias propagated through the model. To evaluate the proposed models, we use three publicly available datasets from Twitter.

In the second contribution, we consider a multilingual setting where we focus on low-resource languages for which little or no labeled data is available. First, we present the first corpus of Persian offensive language, consisting of 6,000 microblogs from Twitter, to deal with offensive language detection in Persian as a low-resource language in this domain. After annotating the corpus, we perform extensive experiments to investigate the performance of transformer-based monolingual and multilingual pre-trained language models (e.g., ParsBERT, mBERT, XLM-RoBERTa) in the downstream task. Furthermore, we propose an ensemble model to boost the performance of our model. Then, we expand our study into a cross-lingual few-shot learning problem and adapt a meta-learning-based approach to study few-shot hate speech and offensive language detection in low-resource languages, which allows hateful or offensive content to be predicted by observing only a few labeled data items in a specific target language. To evaluate the proposed model, we use diverse collections of publicly available corpora, comprising 15 datasets across 8 languages for hate speech and 6 datasets across 6 languages for offensive language. To the best of the author's knowledge, there have been very few attempts to use meta-learning approaches on hate speech detection tasks.

Institut Polytechnique de Paris