
An Automatic Procedure for Generating Datasets for Conversational Recommender Systems

Alessandro Suglia1, Claudio Greco1, Pierpaolo Basile1, Giovanni Semeraro1, and Annalina Caputo2

1 Department of Computer Science, University of Bari Aldo Moro

2 ADAPT Centre, Trinity College Dublin, Dublin, Ireland
alessandro.suglia@gmail.com, claudiogaetanogreco@gmail.com, {firstname.lastname}@uniba.it, annalina.caputo@adaptcentre.ie

Abstract. Conversational Recommender Systems assist online users in their information-seeking and decision-making tasks by supporting an interactive process with the aim of finding the most appealing items according to the user preferences. Unfortunately, collecting the dialogue data needed to train these systems can be labour-intensive, especially for data-hungry Deep Learning models. Therefore, we propose an automatic procedure able to generate plausible dialogues from recommender system datasets.

People have information needs of varying complexity, which can be satisfied by an intelligent agent able to answer properly formulated questions, possibly taking into account the user's context and preferences. Conversational Recommender Systems (CRS) assist online users in their information-seeking and decision-making tasks by supporting an interactive process [1] with the aim of finding the most appealing items according to the user preferences.

Unfortunately, collecting the dialogue data required for the training phase of these systems can be labour-intensive, especially for the latest data-hungry Deep Learning models. For this reason, synthetic dialogue datasets can be extremely useful to bootstrap effective dialogue systems able to support a goal-oriented conversation with the user. Therefore, we propose an automatic procedure able to generate plausible dialogues directly from well-known recommender system datasets, exploiting data coming from the Linked Open Data Cloud and contextual information related to the user.

Given a user u and his/her set of binary preferences, we trained a decision tree from the preferences expressed by u towards items represented through binary Linked Open Data features extracted from the Wikidata3 knowledge base. In particular, each predicate-object pair is represented as a binary feature which is 1 if the knowledge base contains the triple (item, predicate, object) and 0 otherwise. The considered predicates are wdt:P57 (director), wdt:P161 (cast member), and wdt:P136 (genre)4. The dialogue generation procedure is an iterative algorithm which is executed until all user preferences have been used. At each step of the procedure, a top-n list composed of positive and negative items is generated by randomly choosing from the positive and negative preferences of the given user u. Then, the paths from the root of the decision tree to the consistently classified examples are exploited to generate a sequence of questions, randomly chosen according to a binomial distribution over the item features, to elicit user preferences. Depending on the percentage of positive items in the top-n list, a "refine" step is triggered which extends the dialogue with additional questions leading to a list of suggestions that contains only positive items.

The dialogue datasets generated from the MovieLens 1M and MovieTweetings datasets can be found at: http://github.com/swapUniba/ConvRecSysDataset. The source code of the automatic procedure for generating conversational recommender system datasets will be released upon acceptance of the paper.

3 http://www.wikidata.org
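To make the item encoding and decision tree step concrete, the following is a minimal sketch under stated assumptions: the (predicate, object) pairs for each item are supposed to have already been retrieved from Wikidata (e.g., via its SPARQL endpoint), human-readable labels are used in place of Wikidata ids, and the toy data, the names (item_triples, encode, etc.), and the use of scikit-learn are illustrative rather than taken from the released code.

```python
# Minimal sketch: encode items as binary (predicate, object) features taken
# from Wikidata and fit a decision tree on one user's binary preferences.
# All data and names below are illustrative, not the authors' released code.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical item descriptions: (predicate, object) pairs for
# wdt:P57 (director), wdt:P161 (cast member), wdt:P136 (genre).
item_triples = {
    "Speed":           {("wdt:P136", "action film"), ("wdt:P161", "Keanu Reeves")},
    "Erin Brockovich": {("wdt:P136", "biographical film"), ("wdt:P57", "Steven Soderbergh")},
    "Hairspray":       {("wdt:P136", "teen film"), ("wdt:P57", "John Waters")},
}
# Binary user preferences: 1 = liked, 0 = disliked.
user_prefs = {"Speed": 1, "Erin Brockovich": 0, "Hairspray": 1}

# One binary feature per distinct (predicate, object) pair.
features = sorted({pair for pairs in item_triples.values() for pair in pairs})
feature_index = {pair: i for i, pair in enumerate(features)}

def encode(item):
    """1 if the knowledge base contains (item, predicate, object), 0 otherwise."""
    row = [0] * len(features)
    for pair in item_triples[item]:
        row[feature_index[pair]] = 1
    return row

items = sorted(user_prefs)
X = [encode(item) for item in items]
y = [user_prefs[item] for item in items]

# Root-to-leaf paths of this tree drive the questions asked during generation.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
```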

Table 1 shows a conversation generated by applying the designed procedure to the well-known MovieLens 1M recommender system dataset. In the first part of the conversation, utterances that introduce the user are generated by exploiting the contextual information (such as age and occupation) included in the dataset.

Agent  Utterance
user   Hey
bot    What is your name?
user   I am Sofia
bot    How old are you?
user   I am 25
bot    What is your occupation?
user   sales/marketing
bot    Which are your favourite movies?
user   I love Speed
bot    I suggest you Erin Brockovich Witness Pocahontas Four Weddings and a Funeral Anna and the King. Do you like them?
user   I hate them
bot    What are your favourite directors?
user   I like John Waters
bot    What are your favourite actors?
user   I like Josef Sommer Wade Williams Marg Helgenberger Jeroen Krabbé
bot    What genres do you like?
user   My favourite genres are teen film romance film biographical film
bot    I suggest you Erin Brockovich Witness Ever After Simply Irresistible Hairspray. Do you like them?
user   I like them
bot    I am glad that you like them

Table 1. Conversation generated from the MovieLens 1M dataset. To ease reading, Wikidata ids have been replaced with the corresponding entities.
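The following is a minimal sketch of how the iterative generation loop described above could be organized, reusing the tree, encode, user_prefs, and features objects from the previous snippet; the question templates, the 0.5 question-keeping probability, and the refine threshold are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch of the iterative dialogue generation loop (assumed
# control flow, not the released implementation). Reuses tree, encode,
# user_prefs, and features from the previous snippet.
import random

PREDICATE_LABELS = {"wdt:P57": "director", "wdt:P161": "cast member", "wdt:P136": "genre"}

def questions_from_tree_path(tree, item_vector, features):
    """Map the decision tree path followed by an item to preference questions."""
    node_ids = tree.decision_path([item_vector]).indices
    questions = []
    for node in node_ids:
        feat = tree.tree_.feature[node]
        if feat >= 0:  # internal node; leaves are marked with a negative value
            predicate, obj = features[feat]
            questions.append(
                f"Do you like movies whose {PREDICATE_LABELS.get(predicate, predicate)} is {obj}?")
    return questions

def generate_dialogue(user_prefs, tree, encode, features, n=3, refine_threshold=0.5):
    """At each step draw a top-n list mixing positive and negative items,
    derive questions from decision tree paths, recommend the list, and run a
    'refine' step when too few suggested items are positive."""
    positives = {item for item, liked in user_prefs.items() if liked}
    remaining = set(user_prefs)
    dialogue = []
    while remaining:  # iterate until all user preferences have been used
        top_n = random.sample(sorted(remaining), min(n, len(remaining)))
        remaining -= set(top_n)
        for item in top_n:
            # Crude stand-in for the binomial sampling over item features.
            dialogue += [q for q in questions_from_tree_path(tree, encode(item), features)
                         if random.random() < 0.5]
        dialogue.append("I suggest you: " + ", ".join(top_n) + ". Do you like them?")
        liked = [item for item in top_n if item in positives]
        if liked and len(liked) / len(top_n) < refine_threshold:
            # "Refine" step: further questions lead to a positive-only list.
            dialogue.append("I suggest you: " + ", ".join(liked) + ". Do you like them?")
    return dialogue

# Example usage with the objects built in the previous snippet:
# for line in generate_dialogue(user_prefs, tree, encode, features):
#     print(line)
```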

In this work we have proposed an automatic procedure able to generate synthetic dialogue datasets starting from well-known datasets in the recommender systems field. The presented procedure is completely generic and can be applied to any dataset containing binary user preferences whose items have a corresponding identifier in the Linked Open Data Cloud.

References

1. Mahmood, T., Ricci, F.: Improving recommender systems with adaptive conversational strategies. In: Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, pp. 73–82. ACM (2009)

4 The prefix wdt stands for http://www.wikidata.org/prop/direct/
