
Example 2: Multiple Labels—Film Genres

A more complicated example of classification is identifying film genres. This type of classification is an excellent example of a task where the annotation guidelines can completely change the outcome of the annotation, and trying to apply the guidelines may lead to changes in the spec.

Remember that within the MATTER cycle there is also the MAMA (Model-Annotate-Model-Annotate) cycle. Chances are good (especially if you are starting your annotation project from scratch) that you will need to revise your model and guidelines at least a few times before you will be able to annotate your entire corpus. But don’t be discouraged; the more you refine your annotation, the easier it will be to define features for your ML algorithms later on.

If you recall from “Film Genre Classification” (page 70), when discussing the spec for a film genre classification task we used IMDb’s list of film genres, which included the following 26 genres:

Action, Adventure, Animation, Biography, Comedy, Crime, Documentary, Drama, Family, Fantasy, Film-Noir, Game-Show, History, Horror, Music, Musical, Mystery, News, Reality-TV, Romance, Sci-Fi, Sport, Talk-Show, Thriller, War, Western

This certainly seems like a reasonable list, and it comes from one of the most (if not the most) popular movie reference websites on the Internet, so at the very least it’s a good starting point for your spec. So let’s look at the list of questions to be answered for the guidelines.

What is the goal of the project?

To label film summaries with genre notations.

What is each tag called and how is it used?

We have 26 tags that can be applied to each summary as needed.

What parts of the text do you want annotated, and what should be left alone?

Each label will apply to the entire document.

How should the annotation be created?

Annotation software is probably the best way to apply multiple labels to a document.

Well, that was easy! Except…the answer to the second question, particularly the “how is it used” part, is quite underspecified. When labeling movie reviews as positive or negative, it’s probably enough to say (as a starting point, at least) that the label will be based on tone, and neutral reviews will be labeled as “negative.” However, genre labels are not all mutually exclusive, so annotators are going to need clearer guidelines for how and when to apply each one. One basic question that needs to be answered is: “Is there a maximum number of labels that can be applied to a document?” The answer to this question alone can completely differentiate one annotation task from another, even if each one is using the same spec; guidelines that specify a maximum of, say, two labels per document are likely going to return a vastly different corpus than guidelines that have no such limit. For the imaginary task we are describing here, our guidelines will not specify a limit to the number of tags.
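To see how such a guideline constraint could be checked mechanically, here is a minimal Python sketch; the genre subset, document ID, and MAX_LABELS setting are illustrative assumptions, not part of the spec:

# Minimal sketch: checking annotations against a per-document label limit.
# GENRES is a subset of the 26-genre spec, kept short for brevity;
# MAX_LABELS = None mirrors our imaginary guidelines, which set no limit.
GENRES = {"action", "adventure", "comedy", "drama", "romance", "sci-fi"}
MAX_LABELS = None  # set to, e.g., 2 to model a two-label-maximum guideline

def validate_annotation(doc_id, labels):
    """Raise an error if an annotation violates the spec or the guidelines."""
    unknown = set(labels) - GENRES
    if unknown:
        raise ValueError(f"{doc_id}: labels not in the spec: {sorted(unknown)}")
    if MAX_LABELS is not None and len(labels) > MAX_LABELS:
        raise ValueError(f"{doc_id}: {len(labels)} labels exceed the maximum of {MAX_LABELS}")

validate_annotation("die_hard_summary", ["action", "drama"])  # passes under no-limit guidelines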

However, while knowing how many labels can be applied partly answers the question of “how” the tags are used, it doesn’t address all of the aspects of “how” the tags are used.

Another aspect that has to be considered is when each tag will be used. In the positive/negative review task, each document was assigned a single label, and if a document wasn’t positive, it was negative: a dichotomy that’s fairly straightforward. Since there’s no limit to the number of genre tags (or even if there were a limit), annotators will need some clarification about when to apply which tags.

At first, the question of when to use each label seems straightforward. But consider the first two tags in the previous list: action and adventure. What, your annotators will want to know, is the difference? A quick Google search shows that this is hardly the first time this question has been asked, and the general consensus appears to be that action movies tend to be more violent, while adventure movies generally require that the protagonist be going on some sort of journey to a place or situation he has not dealt with previously.

However, the same Google search also reveals that other genre lists (such as the one Netflix uses) simply have one genre label called “Action-Adventure,” presumably due to a high level of overlap between the two labels.

Let’s assume that you’re satisfied with the aforementioned distinction between action and adventure, and you put them in your guidelines as definitions for your annotators to refer back to. So now they’ll (hopefully) label Die Hard as an action movie and Around the World in 80 Days as an adventure. But wait a minute, one of the first summaries for Die Hard on IMDb.com starts with “New York City Detective John McClane has just arrived in Los Angeles to spend Christmas with his wife. Unfortunately….” So is there a journey involved? The character does go to a different location, but a cop who is used to dealing with criminals going to a different city to deal with criminals doesn’t really meet the “new situation” clause of the adventure definition given earlier, so we can probably safely say that Die Hard doesn’t qualify as an adventure movie.

OK, that’s a good start. But what about Eat, Pray, Love? The main character clearly goes on a journey to new places and situations, but we suspect that most people wouldn’t consider it to be an adventure movie. So, maybe adventure movies also have to have some element of danger? Better amend the definition in your guidelines. Or maybe at this point you feel like trying to differentiate between the two is a bit tedious and/or pointless, and you’d rather amend your spec to have a single action-adventure category.

Believe it or not, we didn’t include the preceding few paragraphs simply because we enjoy nitpicking about movie genres. Rather, we included them because this discussion illustrates the kinds of questions you will need to answer for your annotators when you give them the guidelines for their task. The simplest approach to the task is to just give your annotators a pile of texts and tell them to put on whatever labels seem right to them, but don’t forget that an important part of an annotation task is reproducibility. If you simply tell annotators to label what they want, it’s unlikely that a different set of annotators will be able to give you the same (or even similar) results at a later date.

If you aren’t entirely sure what definition to give to each label, now would be a really good time to take another piece of advice that we’ve repeated a number of times (and will continue to repeat): do some research! One excellent book that we found on the subject is Barry Keith Grant’s Film Genre: From Iconography to Ideology (Wallflower Press, 2007). While not all of the theory in the book can necessarily be applied to an annotation task, looking at the different genres in terms of themes rather than simply looking at surface details can help clarify what makes a movie fit into a genre. For example, Western movies often are, in part, about exploring new frontiers and the pioneer spirit, a definition that might be more effective and relevant than one that specifies a Western has horses, people wearing cowboy hats, and at least one person who says “pardner” very slowly.

A closer look at the film genres also reveals that not all of the genres in the list are describing the same aspects of the film. While labels such as “Action,” “Adventure,” “Crime,” and “Romance” tell the reader something about the events that will take place in the film, the labels “Historical,” “Sci-Fi,” and “Fantasy” refer to the setting, and “Animation,” “Talk-Show,” and “Reality-TV” all describe the production circumstances.

Therefore, an altogether different approach to this task would be to break up these genres into categories (production, setting, etc.) and ask annotators to assign at least one label from each category. Assuming that the categories and labels are sufficiently well defined (which is not necessarily an easy task), the specific requirement of “at least X labels” may greatly improve the inter-annotator agreement (IAA) and reproducibility of your task. If you were to take this approach, you might create a DTD (Document Type Definition) that looks something like this:

<!-- Each category is an EMPTY element; its label is carried on an attribute. -->
<!ELEMENT setting EMPTY >
<!ATTLIST setting description ( historical | sci-fi | fantasy ) #IMPLIED >

<!ELEMENT production EMPTY >
<!ATTLIST production circumstances ( animation | documentary | game-show | musical | news | reality-tv | talk-show ) #IMPLIED >

<!ELEMENT content EMPTY >
<!ATTLIST content type ( action | adventure | biography | crime | drama | mystery | romance ) #IMPLIED >
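To make the format concrete, here is one possible annotated instance. The <film> root element and its title attribute are illustrative assumptions (the DTD above declares only the three category elements), and the film described, an animated science-fiction adventure, is hypothetical:

<!-- Hypothetical instance: the <film> root element and its title
     attribute are assumptions, not declared in the DTD above. -->
<film title="a-hypothetical-animated-sci-fi-adventure">
  <setting description="sci-fi" />
  <production circumstances="animation" />
  <content type="adventure" />
</film>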

Of course, this reorganized DTD is only a start: if you want to mandate that every movie be assigned a setting, then you’ll need at least one setting label that can describe a movie set in the present with no particular augmentations to reality. But this is another way to frame the genre task that might prove more useful, depending on the goal you’ve set.

Overall, if you have a task with many different labels that you want to use, it’s vital that you create clear definitions for each label and provide examples of when you want your annotators to use each of them (and, equally important, when you don’t want your annotators to use them). While this might seem more important when you are creating your own labels rather than relying on existing terms, you also want to make sure that your annotators’ judgments aren’t clouded by their own preconceptions about what a term means.
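One practical way to provide such “use when / don’t use when” examples is to keep them in a structured file alongside the guidelines, so annotators and adjudicators consult the same wording. The sketch below uses Python; the definitions are paraphrases of the action/adventure discussion above, not authoritative ones, and the field names are our own:

# Sketch of machine-readable guideline entries. The wording paraphrases
# the distinctions discussed in this section and is illustrative only.
GUIDELINE_ENTRIES = {
    "action": {
        "definition": "Violence or physical conflict drives the plot.",
        "use_for": ["Die Hard"],
        "do_not_use_for": ["Eat, Pray, Love"],
    },
    "adventure": {
        "definition": ("The protagonist journeys, with some element of danger, "
                       "to a place or situation not previously dealt with."),
        "use_for": ["Around the World in 80 Days"],
        "do_not_use_for": ["Die Hard"],
    },
}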

Another potential cause of confusion for annotators, aside from their knowledge about a term in the spec, is their knowledge of the material being annotated. If you are asking an annotator to create labels that describe a movie based on a written summary, but the annotator has seen the movie and feels that the summary is inaccurate, you will need to address whether he can use his own world knowledge to augment the information in the document he is labeling. For reproducibility and ML purposes, though, we strongly recommend against using outside knowledge or intuition as a source for annotation data.