Example 4: Link Tags—Semantic Roles

While the addition of extent tags to an annotation task requires clear guidelines on where those tags should start and end, link tags raise two new questions that the guidelines need to answer:

• What are the links connecting?

• When should a link be created?

These questions may seem quite straightforward, but recall our example of a temporal link annotation from the discussion of informativity and correctness in “Refining Your Goal: Informativity Versus Correctness” (page 35). Admittedly, that was a somewhat extreme example of how an annotation task can get out of hand, but it does illustrate the importance of having clear guidelines about when links are needed and when they should be created.

Fortunately, most annotation tasks have much clearer boundaries when it comes to linking extents than our sample temporal annotation did. For example, the semantic role task for films that we discussed in “Semantic Roles” (page 72) doesn’t have as much potential for getting completely blown out of proportion, although it too has potential for confusion. Remember that the task specified that semantic relationships between actors, characters, writers, directors, and movies would be annotated with the roles acts_in, acts_as, directs, writes, and character_in. So, for example, in the sentence “James Cameron directed Avatar” we would have a link between “James Cameron” (who would be tagged as a director) and “Avatar” (which would be tagged as a film_title), and the link would have the semantic role directs.
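To make the data structure concrete, here is a minimal sketch in Python of how such extents and links might be stored. The IDs, field names, and character offsets are illustrative assumptions for this example, not the exact format of any particular annotation spec:

    sentence = "James Cameron directed Avatar"

    # Extent tags: each marks a span of text with start/end offsets and a type.
    extents = [
        {"id": "e1", "start": 0,  "end": 13, "type": "director",   "text": "James Cameron"},
        {"id": "e2", "start": 23, "end": 29, "type": "film_title", "text": "Avatar"},
    ]

    # Link tag: connects two extents by ID and carries the semantic role.
    links = [
        {"id": "l1", "from": "e1", "to": "e2", "role": "directs"},
    ]

    # Sanity check: the offsets actually pick out the intended strings.
    for e in extents:
        assert sentence[e["start"]:e["end"]] == e["text"]

Notice that the link itself holds no text: it only refers to extents by ID, which is exactly why the guidelines must say which extents a link connects and when a link should exist at all.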

But even this straightforward example has a few places where the task could become more complicated. Let’s look at the sample review we saw in the preceding chapter (here the actors, writers, and directors are in bold, film titles are in italics, and characters are in constant width):

In Love, Actually, writer/director Richard Curtis weaves a convoluted tale about characters and their relationships. Of particular note is Liam Neeson (Schindler’s List, Star Wars) as Daniel, a man struggling to deal with the death of his wife and the relationship with his young stepson, Sam (Thomas Sangster). Emma Thompson (Sense and Sensibility, Henry V) shines as a middle-aged housewife whose marriage with her husband (played by Alan Rickman) is under siege by a beautiful secretary. While this movie does have its purely comedic moments (primarily presented by Bill Nighy as out-of-date rock star Billy Mack), this movie avoids the more in-your-face comedy that Curtis has presented before as a writer for Blackadder and Mr. Bean, presenting instead a remarkable, gently humorous insight into what love, actually, is.

While most of the semantic role annotations here are quite straightforward, there are a few pieces that might trip up conscientious annotators. For example, when creating writes links for Blackadder and Mr. Bean, should those titles be linked to the “Curtis” that appears in the same sentence, or should they be linked back to the “Richard Curtis” in the first sentence, because that’s his full name? Similarly, should every acts_in and character_in relationship for the movie being reviewed be linked back to the mention of the title in the first sentence, or should they be linked to (currently unannotated) phrases such as “this movie”? If Love, Actually were mentioned more than once in the review, should the annotators link actors and characters to the closest mention of the title, only to the first one, or to all of them?
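As a concrete illustration of what is at stake, here is a hypothetical sketch of the writes links under the two competing conventions; the extent IDs are invented for this example:

    extents = {
        "e1": "Richard Curtis",   # full name, first sentence
        "e5": "Curtis",           # bare surname, later sentence
        "e6": "Blackadder",
        "e7": "Mr. Bean",
    }

    # Convention A: always link back to the first (fullest) mention.
    links_first_mention = [
        {"from": "e1", "to": "e6", "role": "writes"},
        {"from": "e1", "to": "e7", "role": "writes"},
    ]

    # Convention B: link to the nearest mention in the same sentence.
    links_nearest_mention = [
        {"from": "e5", "to": "e6", "role": "writes"},
        {"from": "e5", "to": "e7", "role": "writes"},
    ]

Either convention can work; what matters is that the guidelines pick one and the annotators apply it consistently.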

We aren’t going to provide you with the answers to these questions, because there is no One True Answer to them. How you approach the annotation will depend on your goal and model, and simple trial and error with your guidelines and annotators will help you determine the most reasonable and useful answers for your task. However, don’t forget to check out guidelines for similar tasks for suggestions on what has worked for other people!

Annotators

A key component of any annotation project is, of course, the people who perform the annotation task. Clearly, this means that some thought needs to be put into who you find to create your annotations. To that end, we suggest being able to answer the following questions:

• What language or languages do your annotators need to know to perform your annotation task?

• Does your annotation task require any specialized knowledge to understand or perform?

• What are the practical considerations that need to be taken into account (money, time, size of the dataset, etc.)?

Let’s go through these one at a time.

What language or languages do your annotators need to know to perform your annotation task? And furthermore, how well do they need to know them?

Chances are, the answer to this question is pretty obvious, but it’s still worth specifying. If your task requires close reading of a text (e.g., anaphoric relationships, word sense disambiguation, or semantic roles), you may want to limit your annotators to native speakers of the language that you are annotating. For some annotations, however, you may be able to use nonnative speakers, and for some tasks they might even be preferred (e.g., if the purpose of the task is to learn about second-language learners’ perceptions of their new language). Regardless of what you decide, be sure to make any language preferences clear in any job postings or descriptions.

Does your annotation task require any specialized knowledge to understand or perform?

Aside from the language(s) the texts are in, is there any other outside knowledge that your annotators need to have to perform well on this task? If your task is one of POS tagging, finding annotators who are already familiar with those concepts (perhaps people who have taken a syntax course or two) will probably lessen the time needed to train them and increase inter-annotator agreement (IAA).

There are other factors that can affect what your annotators need to know to perform well at your annotation task, such as the actual source material. Biomedical and clinical annotations are areas that more and more Natural Language Processing (NLP) researchers are looking into, but it’s much easier for an annotator to identify and label gene expressions in scientific papers if she is already familiar with the concepts and vocabulary. Clinical documents such as hospital notes and discharge summaries can be even trickier: the text is so dense and jargon-filled that, chances are, you will need someone trained as an RN (if not an MD) to interpret any medical information you might be interested in.

If you do decide that you will be selecting annotators with certain skills or knowledge, be sure to keep track of that information and make it available to other people who use your corpus and guidelines. An annotation task’s reproducibility is increased when all the variables are accounted for, just like any other experiment!

What are the practical considerations that need to be taken into account?

One thing you need to consider when planning your annotation project and where to find annotators is that annotation takes time. Obviously, tasks that have a high density of tags, such as POS tagging, are time-consuming simply because there is a one-to-one ratio of tags to words. But more than that, most annotation tasks can only be done for a few hours at a time by most people. Annotation requires a lot of concentration and attention to detail, and if you expect your annotators to do it from 9:00 to 5:00 for days in a row, you will likely get very inconsistent annotations.

Annotation will speed up as your workers get used to the task, but make sure you allow enough time in your schedule for your annotators to do good work.
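To see why tag density translates directly into annotator effort, consider this toy sketch of POS-tagged data (the Penn Treebank-style tags shown are an assumption; your task may use a different tagset):

    # One POS tag per token: a 100,000-word corpus means 100,000
    # separate tagging decisions for each annotator.
    tokens   = ["James", "Cameron", "directed", "Avatar"]
    pos_tags = ["NNP",   "NNP",     "VBD",      "NNP"]
    assert len(tokens) == len(pos_tags)  # the one-to-one ratio noted above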

If you are expanding on an annotation task/guideline that already exists, it’s worth the time to train your annotators on data from the previous dataset. That way, you have a solid way to evaluate whether your annotators understand the given task, and you can make necessary adjustments to the guidelines without compromising your own dataset.

In theory, if you were on a tight schedule, you could simply hire and train more annotators to all work at the same time. However, as we will discuss further in “Evaluating the Annotations” (page 126), you need to make sure each file gets annotated at least twice (so that you can calculate IAA scores), and these things are generally easier to manage when you aren’t overwhelmed with annotators.
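As a preview of that discussion, here is a minimal sketch of one common IAA measure, Cohen’s kappa, computed over a doubly annotated file; the scikit-learn dependency and the toy labels are assumptions for illustration, not a prescription:

    from sklearn.metrics import cohen_kappa_score

    # One label per token from each of two annotators over the same text.
    annotator_a = ["director", "director", "O", "film_title"]
    annotator_b = ["director", "actor",    "O", "film_title"]

    kappa = cohen_kappa_score(annotator_a, annotator_b)
    # 1.0 means perfect agreement; 0 or below means no better than chance.
    print(f"Cohen's kappa: {kappa:.2f}")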

Also, even if your annotation guidelines have been repeatedly modified and perfected, annotators get better at a task the longer they have to adjust to it. The more time you allocate to getting the annotation done, the better your annotators will be able to acclimate to the task, and the more accurate their annotations will be.