• Aucun résultat trouvé

Estimation of Character Diagram from Open Movie Database Using Markov Logic Network

N/A
N/A
Protected

Academic year: 2022

Partager "Estimation of Character Diagram from Open Movie Database Using Markov Logic Network"

Copied!
4
0
0

Texte intégral

(1)

Estimation of Character Diagram from Open Movie Database using Markov Logic Network

Yuta Ohwatari, Takahiro Kawamura, Yuichi Sei, Yasuyuki Tahara, and Akihiko Ohsuga

Graduate School of Information Systems, University of Electro-Communications, Tokyo, Japan

{y-ohwatari@ohsuga.is.uec.ac.jp,kawamura@ohsuga.is.uec.ac.jp, sei@is.uec.ac.jp,tahara@is.uec.ac.jp,ohsuga@uec.ac.jp}

Abstract. In this paper, we propose the estimation method of interper- sonal relationships of characters from movie script databases on the Web using Markov Logic Network. By using Markov Logic Network, we can infer while allowing the violation of rules. In experiments, we confirmed that our proposed method can estimate favors between the characters in a movie with a precision of 69.8%.

Keywords: Markov Logic Network, Semantic Analysis, Open Movie Database

1 Introduction

Every year, a large number of movies have been released. If a user want to quickly know about a movie, he/she will see the summary of the movie. Therefore, It is effective summarization of a movie is required in order for better understanding of the movie.

An overview of our proposed method is illustrated in Figure 1. Our method is separated into the estimation of interpersonal relationships and the generation of character diagrams.

First, we prepared script data for learningand inferringby extracting who speak what to whom from a movie database. Then, we estimate the sentiment

!"#"$%&'&'(!

)*+$+)#"$,!&+($+%!

-.,, /"0(0,123-45!

67$,8"+$'&'(!

($79'!,67$%98+!

8"+$'&'(!

&'6"$$"!,

&'#"$:"$;7'+8,$"8+<7';*&:!

:$":$7)";;&'(!

67$,&'6"$$&'(!

&'6"$$&'(!

2+$=7>,?7(&),@"#A7$=!

($79'!,67$%98+!

B8)*"%C!

D;<%+<7',B("'#,76,E*+$+)#"$,-&+($+%! EBFGB1@,H,I";J,;&$K,

LM1NOP@,H,/Q0P5,G"88,

#*"%,A",A&;*,#7, 47+$!,+#,7')"0, EBFGB1@,H,I";J,;&$0!

?&=";/LM1OP@JEBFGB1@5, R0RRSRTUS,

?&=";/LM1OP@JLM1OP@5, R0RVRRTU,

?&=";/LM1OP@J@MGD5, R0RRTRTUW!

Fig. 1.Method workflow

(2)

polarity for lines in the script and the favorable impression between a speaker and a listener in a movie using Markov Logic Network. Finally, we generate the character diagram of a movie from the estimated interpersonal relationships.

A first-order knowledge base can be seen as a set of hard constraints on the set of possible worlds. However, the solution in the real world is often on the set of impossible worlds. In contrast, Markov Logic Network solves this problem by associating weight that reflects how strong a constraint is with each formulas.

Also, it is laborious to construct Markov Networks. Markov Logic Network can be viewed as a template for constructing Markov Networks. Markov Logic Network (MLN) is a probabilistic extension of a finite first-order logic[4], which makes up the disadvantages of Markov Networks and a first-order logic.

Note that we used the learning and inference algorithms provided in the open-source Alchemy1 as an implementation of the MLN in this paper.

2 Related Work

Tanaka et al[7] presents interpersonal relationships extracted from sentence struc- tures as a summary of a story. We considered that it is effective to present the re- lationships of characters as a summary. On the other hand, analysis of e-mails[1]

and estimation from co-occurrence of the name[2] are studies of estimating the relationships of persons in thereal world. However, it has not been studied about the estimation of interpersonal relationships offictional characters.

There are many studies using MLN, for example, entity resolution[5], in- formation extraction[3]. These studies focus on global constraints, and built a model by using MLN. We also targets a text and extracts infomation on global constraints.

3 Defined Rules

We defined rules to estimate interpersonal relationships for MLN. These rules determine the sentiment polarity for lines in the script using sentiment polarity for the word and favor between characters using the sentiment polarity for lines.

To use sentiment polarity of words, we incorporated as the Semantic Orientations of Words Dictionary that is built by Takamura et al[6]. This assigns a real value in the range from -1 to +1 to where the words assigned with values close to -1 are supposed to be negative, and the words assigned with values close to +1 are supposed to be positive. Vocabulary was extracted from WordNet2.

In this paper, we limited to two-valued attribute of positive(+1) and negative(1). A observed predicate is a predicate with all arguments given by inferring and training. A hidden predicate is a predicate with an argument not given by inferring but given by training. Observed predicates and hidden predi- cates in this paper are shown in Table 1.

1 http://alchemy.cs.washington.edu/

2 http://wordnet.princeton.edu

(3)

Table 1.Observed and hidden predicates

Predicate Description

Observed predicates

Line(text, speaker, listener)speakerspeaktexttolistener

Word(text, position, word) wordintextand the position isposition Wpol(word, pol) The sentiment polarity ofword ispol Hidden

predicates

Lpol(text, pol) The sentiment polarity oftextispol Likes(person, person) Favor

We describe some of the logical rules for each script line below. t and l is variable. A constant is enclosed in double quotes. Underscore means an arbitrary value. If (+) gets attached to the front of the variable, it is replaced by all the constants that is deployed from the actual data (grounding).

W ord(t, l,+w)∧W pol(+w,+p)⇒Lpol(t,+p) Line(t,+sp,+li)∧Lpol(t,”P”)⇒Likes(+sp,+li) Line(t,+sp,+li)∧Lpol(t,”N”)⇒ ¬Likes(+sp,+li)

...

4 Experiment on Relation Extraction

Datasets

In the experiment, we used movie script data from IMSDb: The Internet Movie Script Database3 on the Web. The title of movies used in the experiment are Back to the Future (1985), Good Will Hunting (1997), Harry Potter And The Sorcerer’s Stone (2001), The Lord of the Rings The Fellowship of the Ring (2001), and Star Wars Episode I The Phantom Menace (1999). The average number of lines and characters are 704.6 and 42.6, respectively.

Setting

We used a movie for testing, and the remaining 4 movies as training data.

We treated as true above the mean value of the probability, because estimation results are expressed in a probability. Note, we ask the person for a description ofLikespredicates in the training data that is familiar with the movies and has seen actually.

Result

The experimental results are shown in Table 2. The training time was about 19 hours in total, and the inferring time was about 3 hours in total. As a result, recall is lower than precision. In addition, Figure 2 shows an example of the generated character diagram from the estimated interpersonal relationships. In this figure, a node represents a person, an edge represents a relationship. The

3 http://www.imsdb.com/

(4)

information with edge shows the estimated probability of the predicateLikes() and the mean of the probability (like or not like). A dashed edge means false estimation. This figure generally represents the interpersonal relationships of Star Wars Episode I The Phantom Menace (1999).

Table 2.Estimated relationships and the number of grounded rules Precision Recall F-measure ground clauses

mean 69.752 45.332 53.392 46,203

!"#$%"&'$()#*$+,!

-#./0&!/%.#*#! 12-3!

4#,56&7#8*!

9:99;9<=>?"@5&*%$+A!

9:9BC9<;<?*%$+A!

9:9B<9<;C?*%$+A&

9:99D9<=2?"@5&*%$+A&

9:9B>9<;;?*%$+A!

9:9B=9<;B?*%$+A!

Fig. 2.Part of character diagram generated from the estimated relationship

5 Conculusion

In this paper, using MLN on the movie script database, we estimated the senti- ment polarity of script lines and the interpersonal relationships of the characters in a movie. In the experiments, we confirmed that our proposed method esti- mated favors between the characters in a movie with a precision of 69.8%. In the future, we will improve the model to achieve the higher accuracy.

Acknowledgements

This work was supported by JSPS KAKENHI Grant Numbers 24300005, 26330081, 26870201.

References

1. Adamic, L.A., Adar, E.: Friends and neighbors on the web. Social networks 25(3), 211–230 (2003)

2. Matsuo, Y., Tomobe, H., Hashida, K., Nakajima, H., Ishizuka, M.: Social network extraction from the web information. Transactions of the Japanese Society for Ar- tificial Intelligence 20(1), 46–56 (2005)

3. Poon, H., Domingos, P.: Joint inference in information extraction. In: AAAI. vol. 7, pp. 913–918 (2007)

4. Richardson, M., Domingos, P.: Markov logic networks. Machine learning 62(1-2), 107–136 (2006)

5. Singla, P., Domingos, P.: Entity resolution with markov logic. In: Data Mining, 2006. ICDM’06. Sixth International Conference on. pp. 572–582. IEEE (2006) 6. Takamura, H., Inui, T., Okumura, M.: Extracting semantic orientations of words

using spin model. In: Proc. the 43rd Annual Meeting on Association for Computa- tional Linguistics. pp. 133–140. Association for Computational Linguistics (2005) 7. Tanaka, S., Okabe, M., Onai, R.: Interactive narrative summarization. Workshop

on Interactive Systems and Software pp. 06–01–06–6 (2011)

Références

Documents relatifs

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

Unit´e de recherche INRIA Rennes, Irisa, Campus universitaire de Beaulieu, 35042 RENNES Cedex Unit´e de recherche INRIA Rhˆone-Alpes, 655, avenue de l’Europe, 38330 MONTBONNOT ST

Probabilities of errors for the single decoder: (solid) Probability of false negative, (dashed) Probability of false positive, [Down] Average number of caught colluders for the

One of the approaches that are largely used in natural language processing (NLP) to represent meanings is based on semantic frames. Seman- tic frames are composed of slots,

Goto, “A vocabulary-free infinity-gram model for nonparametric bayesian chord progression analysis,” in Proceedings of the International Symposium on Music Information

Based on this result, models based on deterministic graph- ings – or equivalently on partial measured dynamical systems – can be defined based on realisability techniques using

Mourchid, Movie script similarity using multilayer network portrait divergence, in International Conference on Complex Networks and Their Applications (Springer, Cham, 2020),

The focus of this paper is on the context-aware decision process which uses a dedicated Markov Logic Network approach to benefit from the formal logical representation of