
Can we predict self-reported customer satisfaction from interactions?



HAL Id: hal-02134252

https://hal.archives-ouvertes.fr/hal-02134252

Submitted on 20 May 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Can we predict self-reported customer satisfaction from interactions?

Jeremy Auguste, Delphine Charlet, Géraldine Damnati, Frédéric Béchet, Benoit Favre

To cite this version:

Jeremy Auguste, Delphine Charlet, Géraldine Damnati, Frédéric Béchet, Benoit Favre. Can we predict self-reported customer satisfaction from interactions?. International Conference on Acoustics, Speech and Signal Processing, May 2019, Brighton, United Kingdom. ⟨hal-02134252⟩


Can we predict self-reported customer satisfaction from interactions?

Jeremy Auguste¹, Delphine Charlet², Géraldine Damnati², Frédéric Béchet¹, Benoit Favre¹

(1) {firstname.lastname}@univ-amu.fr

(2) {firstname.lastname}@orange.com

Summary

• Objective: Automatically evaluate customer satisfaction from conversation logs.

• Data: Contact center chat conversations and the customers’ satisfaction surveys.

• Method: Comparison of different classification schemes: 3-labels, 2 × 2-labels or 2-labels multitask classification. Definition of the Serious Error Rate metric to focus on problematic confusions.

• Results: Considering the classification of extreme opinions as two distinct tasks greatly improves the results on the neutral class.

Task and Motivations

Task: Evaluate the customer satisfaction from the logs of a human-human conversation.

Possible evaluations: Direct supervision using surveys filled in by the customers themselves, and indirect supervision by experts.

Problem: Customer surveys are not mandatory, and experts cannot evaluate every conversation.

Question: Can we retrieve subjective opinions such as the Net Promoter Score directly from conversation logs?

Orange conversation corpus

Chat data description:

• Technical and commercial assistance;

• 79,000 conversations with completed surveys;

• 140,000 unique tokens;

• Word Error Rate of 4.3% overall (10.1% for the Customers, 1.6% for the Agents).

Customer surveys

Customers answer the question “How likely would you be to recommend us to your family and friends?” on a scale from 0 to 10. Following Customer Relationship Management conventions, the answers are grouped into 3 categories: detractor, passive, and promoter.
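This grouping can be written as a tiny helper; a minimal Python sketch, assuming the standard NPS cut points (0-6 detractor, 7-8 passive, 9-10 promoter), which the poster does not state explicitly:

```python
def nps_category(score: int) -> str:
    """Map a 0-10 NPS answer to the three CRM categories.

    Cut points follow the usual NPS convention; the poster does not
    give the exact thresholds used on the Orange surveys.
    """
    if not 0 <= score <= 10:
        raise ValueError("NPS answers range from 0 to 10")
    if score <= 6:
        return "detractor"
    if score <= 8:
        return "passive"
    return "promoter"
```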

Classification schemes

• 3-label output: a single model outputs a probability distribution over the three classes; the highest-scoring class is the answer (e.g. 0.6 / 0.3 / 0.1 → detractor).

• 2 × 2-label output: two separate binary models, one answering “detractor: yes/no” and the other “promoter: yes/no”; a conversation rejected by both models is labeled passive.

• 2-label multitask output: a single model with shared layers and two binary output heads (detractor, promoter), using the same reject-to-passive rule; only used with the neural network classifiers.

Goal: Avoid confusions between detractors and promoters.
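As an illustration of the 2-label multitask scheme, here is a minimal PyTorch sketch with shared layers feeding two binary heads; the bag-of-words input and layer sizes are illustrative assumptions, not the CNN/RNN architectures actually evaluated:

```python
import torch
import torch.nn as nn

class MultitaskClassifier(nn.Module):
    """Shared layers with two binary heads (detractor, promoter)."""

    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        # Shared layers: both tasks see the same conversation encoding.
        self.shared = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.detractor_head = nn.Linear(hidden, 2)  # no / yes
        self.promoter_head = nn.Linear(hidden, 2)   # no / yes

    def forward(self, x: torch.Tensor):
        h = self.shared(x)
        return self.detractor_head(h), self.promoter_head(h)

def decide(det_logits: torch.Tensor, prom_logits: torch.Tensor) -> str:
    """Combine the two binary answers; 'no' from both rejects to passive."""
    det = bool(det_logits.argmax(-1).item())
    prom = bool(prom_logits.argmax(-1).item())
    if det and not prom:
        return "detractor"
    if prom and not det:
        return "promoter"
    return "passive"
```

The same decision rule applies to the 2 × 2-label scheme, except that the two heads then belong to two independently trained models.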

Classifiers

Different classification methods that consider dialogues differently:

• Support Vector Machine (SVM);

• Convolutional Neural Network (CNN);

• Long Short-Term Memory network with attention (RNN).
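For concreteness, a minimal scikit-learn sketch of a 3-label SVM baseline; the TF-IDF bag-of-words features are an assumption, since the poster does not detail the SVM's input representation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# One conversation log per string; labels come from the customer surveys.
svm_3_labels = make_pipeline(TfidfVectorizer(), LinearSVC())
# svm_3_labels.fit(conversations, labels)  # labels in {detractor, passive, promoter}
# predictions = svm_3_labels.predict(test_conversations)
```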

Evaluation metrics

Use of 3 different metrics:

• Accuracy: $\frac{\#\text{correct predictions}}{\#\text{samples}}$;

• F1-score: $F_1(l) = \frac{2 \times \text{Precision}(l) \times \text{Recall}(l)}{\text{Precision}(l) + \text{Recall}(l)}$;

• Serious Error Rate: percentage of confusions between the Detractor and the Promoter classes.
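A minimal sketch of the three metrics computed from a 3×3 confusion matrix; taking all samples as the Serious Error Rate denominator is an assumption, as the poster only defines that metric informally:

```python
import numpy as np

LABELS = ["detractor", "passive", "promoter"]

def metrics(confusion: np.ndarray) -> dict:
    """confusion[i, j] counts samples of true class LABELS[i] predicted as LABELS[j]."""
    total = confusion.sum()
    accuracy = np.trace(confusion) / total
    # Serious Error Rate: detractor <-> promoter confusions only.
    d, p = LABELS.index("detractor"), LABELS.index("promoter")
    serious_error_rate = (confusion[d, p] + confusion[p, d]) / total
    f1 = {}
    for i, label in enumerate(LABELS):
        precision = confusion[i, i] / max(confusion[:, i].sum(), 1)
        recall = confusion[i, i] / max(confusion[i, :].sum(), 1)
        f1[label] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return {"accuracy": accuracy,
            "serious_error_rate": serious_error_rate,
            "f1": f1}
```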

Results

Model                      Accuracy   Serious Error Rate

3-labels classification scheme
majority class             42.7       30.9
SVM                        56.9       14.7
CNN                        57.5       15.5
RNN                        57.5       15.8

2-labels+reject classification scheme
SVM 2x2 labels             52.7        6.2
CNN 2x2 labels             55.2        7.7
CNN 2 labels multitask     55.0        7.6
RNN 2x2 labels             53.5        6.5
RNN 2 labels multitask     53.5        6.5

[Figure: per-class F1-scores (Detractor, Passive, Promoter) of each model under the 3-labels, 2x2-labels, and 2-labels multitask schemes.]

2-labels schemes greatly improve the prediction of the passive class and greatly reduce confusions between extreme classes.

Contrastive experiment

Reducing the lexicon size to evaluate domain robustness:

[Figure: Macro F1 as a function of lexicon size (all, 308, 173, 128, 108, 92, 42 words) for the SVM, CNN, and RNN models.]

The lexicon is reduced by keeping only words occurring at least 10K to 100K times.
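A sketch of this lexicon reduction; the tokenization and the use of a single out-of-vocabulary placeholder are assumptions:

```python
from collections import Counter

def reduce_lexicon(conversations, min_count):
    """Keep only words occurring at least `min_count` times in the corpus;
    all other words are replaced by an out-of-vocabulary token."""
    counts = Counter(word for conv in conversations for word in conv)
    kept = {word for word, count in counts.items() if count >= min_count}
    return [[word if word in kept else "<unk>" for word in conv]
            for conv in conversations]
```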

Acknowledgements

Research supported by grants ANR-15-CE23-0003 (DATCHA), ANR-16-CONV-0002 (ILCB) and ANR-11-IDEX-0001-02 (A*MIDEX).
