4.2 Extending the Syntactic Coverage

4.2.7 Gap Threading

The technique used for passing gap information among the nonterminals in grammar rules outlined in the previous section has two problems:

1. Several versions of each rule, differing only in which constituent(s) the gap information is passed to, may be needed. For instance, a rule for building dative verb phrases

vp --> datv, np, pp.

would need two versions

vp(GapInfo) --> datv, np(GapInfo), pp(nogap).

vp(GapInfo) --> datv, np(nogap), pp(GapInfo).

so as to allow a gap to occur in either the NP or PP, as in the sentences

What did Alfred give to Bertrand?

Who did Alfred give a book to?

2. Because of the multiple versions of rules, sentences with no gaps will receive multiple parses. For instance, the sentence

Alfred gave a book to Bertrand.

would receive one parse using the first dative VP rule (with GapInfo bound to nogap) and another with the second dative VP rule.
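The spurious ambiguity described in the second point can be reproduced with a minimal sketch. Only the two vp rules come from the text; the pp rule, the toy lexicon, the gap marker gap, and the helper count_vp_parses/2 are assumptions made purely for illustration.

```prolog
% The two gap-passing versions of the dative VP rule from the text.
vp(GapInfo) --> datv, np(GapInfo), pp(nogap).
vp(GapInfo) --> datv, np(nogap), pp(GapInfo).

% Assumed toy rules and lexicon (not from the text).
pp(GapInfo) --> [to], np(GapInfo).

np(nogap) --> [alfred].
np(nogap) --> [bertrand].
np(nogap) --> [a, book].
np(gap)   --> [].            % an NP gap covers no words

datv --> [gave].

% count_vp_parses(+Words, -N): N is the number of ways Words parse as a VP.
count_vp_parses(Words, N) :-
    findall(GapInfo, phrase(vp(GapInfo), Words), Parses),
    length(Parses, N).
```

Under these assumptions, the query count_vp_parses([gave, a, book, to, bertrand], N) yields N = 2, one parse per rule version, whereas the gapped phrases [gave, to, bertrand] and [gave, a, book, to] each receive a single parse.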

An alternative method for passing gap information, sometimes referred to as gap threading, has been used extensively in the logic programming literature. It is based on data structures called difference lists.

Difference Lists

The encoding of sequences of terms as lists using the . operator and [ ] is so natural that it seems unlikely that alternatives would be useful. However, in certain cases, sequences may be better encoded with a data structure known as a difference list. A difference list is constructed from a pair of list structures, one of which is a suffix of the other. Every list is a suffix of itself. Also, if the list is of the form [Head|Tail] then every suffix of Tail is a suffix of the whole list. Thus the relation between lists and their suffixes is the reflexive transitive closure of the relation between lists and their tails.

We will use the binary infix operator “-” to construct a difference list from the two component lists. A difference list List-Suffix encodes the sequence of elements in List up to but not including those in Suffix. Thus the elements in List-Suffix are the list difference of the elements in List and the elements in Suffix. For instance, the sequence of elements ⟨1,2,3⟩ might be encoded as the list [1,2,3] or as any of the difference lists [1,2,3,4]-[4], [1,2,3]-[ ], or [1,2,3|X]-X.

We will be especially concerned with the most general difference-list encoding of a sequence, that is, the encoding in which the suffix is a variable. The final example of a difference-list encoding of the sequence ⟨1,2,3⟩ is of this form. Any other difference-list encoding of the sequence is an instance of [1,2,3|X]-X. Henceforth, the term “difference list” will mean a most general difference list. We will also use the terms front and back for the two components of a difference list. Note that the empty difference list is X-X.
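The claim that every other encoding is an instance of the most general one can be checked directly by unification. In the following sketch, the predicate name dl_to_list/2 is hypothetical, introduced only to show how a most general difference list is turned back into an ordinary list.

```prolog
% dl_to_list(+DiffList, -List)  (hypothetical helper): close off the back
% with [] to recover an ordinary list. For a most general difference list
% this binds the back variable to [].
dl_to_list(List-[], List).

% Example queries:
%
% ?- [1,2,3|X]-X = [1,2,3,4]-[4].
% X = [4]
%
% ?- dl_to_list([1,2,3|X]-X, List).
% X = [], List = [1,2,3]
```

The first query shows that [1,2,3,4]-[4] is obtained from [1,2,3|X]-X by instantiating X to [4]; the second shows that instantiating X to [ ] gives the encoding [1,2,3]-[ ].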

The difference-list encoding of sequences has one key advantage over the standard list encoding. Concatenation of difference lists is far simpler, requiring only a single unit clause.

Program 4.6

conc_dl(Front-Back1, Back1-Back2, Front-Back2).

The predicate conc_dl performs concatenation of difference lists by simply unifying the back of the first list with the front of the second. This engenders the following behavior:

This digital edition of Prolog and Natural-Language Analysis is distributed at no charge for noncommercial use by Microtome Publishing.

?- conc_dl([1,2,3|X]-X, [4,5|Y]-Y, Result).

Result = [1,2,3,4,5|Y]-Y

yes
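For comparison, ordinary list concatenation needs a recursive definition that walks down the first list. The sketch below places a two-clause conc/3 (written in its usual form, which is an assumption about the earlier definition) next to Program 4.6; conc_to_list/3 is a hypothetical helper added for illustration.

```prolog
% Ordinary concatenation: recursion over the first list.
conc([], List, List).
conc([Element|Rest], List, [Element|LongRest]) :-
    conc(Rest, List, LongRest).

% Program 4.6: difference-list concatenation as a single unit clause.
conc_dl(Front-Back1, Back1-Back2, Front-Back2).

% conc_to_list(+DL1, +DL2, -List)  (hypothetical helper): concatenate two
% difference lists and close the result with [] to get an ordinary list.
conc_to_list(DL1, DL2, List) :-
    conc_dl(DL1, DL2, List-[]).
```

Note that conc_dl works by binding the back variable of its first argument, so after the call that difference list is no longer most general: in the query above, X has become [4,5|Y].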

Actually, we have seen difference lists before. The use of pairs of string positions encoded as lists to encode the list between the positions is an instance of a difference-list encoding. We can see this more clearly by taking the encoding of grammar rules using explicit concatenation, as briefly mentioned in Chapter 1, and substituting difference-list concatenation. Using explicit concatenation, the rule

S → NP VP

would be axiomatized (as in Chapter 1) as

(∀u,v,w) NP(u) ∧ VP(v) ∧ conc(u,v,w) ⇒ S(w)

or in Prolog,

s(W) :- np(U), vp(V), conc(U, V, W).

Substituting difference-list concatenation, we have

s(W) :- np(U), vp(V), conc_dl(U, V, W).

and partially executing this clause with respect to the conc_dl predicate in order to remove the final literal, we get the following clause (with variable names chosen for obvious reasons):

s(P0-P) :- np(P0-P1), vp(P1-P).

Thus, we have been using a difference list encoding for sequences of words implicitly throughout our discussion of DCGs.
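To see the partially executed clause at work, it can be run against a small lexicon written directly as difference lists over words. The lexical facts below are assumptions for illustration only; the s clause is the one derived above.

```prolog
% The partially executed clause from the text.
s(P0-P) :- np(P0-P1), vp(P1-P).

% Assumed toy lexicon: each word is the difference between a list that
% starts with it and the remainder of that list.
np([alfred|P]-P).
np([bertrand|P]-P).
vp([halts|P]-P).

% Example queries:
%
% ?- s([alfred, halts]-[]).
% yes
%
% ?- s([alfred, halts, again]-Rest).
% Rest = [again]
```

The variables P0, P1, and P play exactly the role of the string positions in the earlier DCG translation, with each difference list standing for the words between two positions.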

Difference Lists for Filler-Gap Processing

We now turn to the use of difference lists in filler-gap processing. First, think of the gap information associated with each node as providing the list of gaps covered by the node whose corresponding fillers are not covered by it. Alternatively, this can be viewed as the list of gaps whose filler-gap dependency passes through the given node.

We will call this list the filler list of the node. For the most part the filler list of each constituent is the concatenation of the filler lists of its subconstituents. For instance, for the dative VP rule, we have

vp(FL) --> datv, np(FL1), pp(FL2), {conc_dl(FL1, FL2, FL)}.

We include in the concatenation only those constituents which might potentially include a gap. Again, we remove the explicit concatenations by partial execution, yielding

vp(F0-F) --> datv, np(F0-F1), pp(F1-F).

Similarly, other rules will display this same pattern.

s(F0-F) --> np(F0-F1), vp(F1-F).

vp(F0-F) --> tv, np(F0-F).

vp(F0-F0) --> iv.

...

We turn now to constituents in which a new filler or a new gap is introduced. For instance, the complement relative clause rule requires that a gap be contained in the S which is a sibling of the filler. It therefore states that the filler list of the S contains a single NP.

optrel(F-F) --> relpron, s([gap(np)|F]-F).

The rule introducing NP gaps includes a single NP filler marker on the NP, thereby declaring that the NP covers a single NP gap.

np([gap(np)|F]-F) --> [].
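The threaded rules above can be assembled into a small runnable grammar. In this sketch the pp rule, the lexicon, and the top-level predicates sentence/1 and relative/1 are assumptions added for illustration; the remaining rules are those given in the text. The top-level calls use the closed filler list []-[] to require that no gap is left without a filler.

```prolog
% Threaded rules from the text.
s(F0-F)     --> np(F0-F1), vp(F1-F).
vp(F0-F)    --> datv, np(F0-F1), pp(F1-F).
vp(F0-F)    --> tv, np(F0-F).
vp(F0-F0)   --> iv.
optrel(F-F) --> relpron, s([gap(np)|F]-F).

% Assumed lexical NPs (they cover no gaps), followed by the gap rule from
% the text.
np(F-F) --> [alfred].
np(F-F) --> [bertrand].
np(F-F) --> [a, book].
np([gap(np)|F]-F) --> [].

% Assumed PP rule and lexicon (not from the text).
pp(F0-F) --> [to], np(F0-F).
relpron  --> [that].
datv     --> [gave].
tv       --> [wrote].
iv       --> [halts].

% Hypothetical top-level predicates: a complete phrase has an empty filler list.
sentence(Words) :- phrase(s([]-[]), Words).
relative(Words) :- phrase(optrel([]-[]), Words).
```

Under these assumptions, sentence([alfred, gave, a, book, to, bertrand]) succeeds in exactly one way, since there is now only one dative VP rule. Both relative([that, alfred, gave, to, bertrand]) and relative([that, alfred, gave, a, book, to]) succeed, while relative([that, alfred, gave, a, book, to, bertrand]) fails because the relative clause contains no gap to match its filler.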

Island constraints can be added to a grammar using this encoding of filler-gap dependencies in two ways. First, we can leave out filler information for certain constituents, as we did for verbs and relative pronouns. More generally, however, we can mandate that a constituent not contain any gaps bound outside the constituent by making its filler list the empty list (i.e., F-F). For instance, the sentence formation rule above can be modified to make the subject of the sentence an island merely by unifying the two parts of its filler list.

s(F0-F) --> np(F0-F0), vp(F0-F).
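The effect of the island version can be observed by loading it side by side with the ordinary threaded rule. In the sketch below, the names s_island and optrel_island are hypothetical, chosen only so that both variants can coexist in one program, and the tiny lexicon is likewise an assumption.

```prolog
% Ordinary threaded sentence rule and the island variant from the text,
% under distinct (hypothetical) names.
s(F0-F)        --> np(F0-F1), vp(F1-F).
s_island(F0-F) --> np(F0-F0), vp(F0-F).

vp(F0-F0) --> iv.

% Relative-clause rule wrapped around each sentence rule.
optrel(F-F)        --> relpron, s([gap(np)|F]-F).
optrel_island(F-F) --> relpron, s_island([gap(np)|F]-F).

% Assumed lexical NP, followed by the gap rule from the text.
np(F-F) --> [alfred].
np([gap(np)|F]-F) --> [].

% Assumed lexicon.
relpron --> [that].
iv      --> [halts].
```

With the ordinary rule, phrase(optrel([]-[]), [that, halts]) succeeds by placing the gap in subject position. With the island rule, phrase(optrel_island([]-[]), [that, halts]) fails, because the subject's filler list is forced to be empty and so cannot account for the gap. The queries pass the closed filler list []-[]; with an unbound filler list, a Prolog system that omits the occurs check may instead build a cyclic term when the gap rule meets an F0-F0 argument.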

The gap-threading technique for encoding filler-gap dependencies solves many of the problems of the more redundant gap-passing method described earlier. Each unthreaded rule generates only one rule with appropriate gap information. The filler-list information is added in a quite regular pattern. Fillers, islands, and gaps are all given a simple treatment. All of these properties make the gap-threading technique conducive to automatic interpretation, as we will do in Section 6.3.3.

However, several problems with the gap-threading technique are known, most showing up only in rather esoteric constructions such as parasitic gap constructions, multiple gaps, crossing filler-gap dependencies, and so forth. Many of these problems can be handled by using more complex combinations of filler lists rather than simple concatenation. For instance, crossing dependencies can be handled by shuffling the filler lists of subconstituents to yield the filler list of the full constituent. Of course, this defeats the simple, elegant pattern of variable sharing that difference-list concatenation engenders.
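One plausible reading of "shuffling" is interleaving: the combined filler list contains the elements of both lists, with the internal order of each preserved. The following is a minimal sketch of such a relation over ordinary lists; it is an assumption about what is intended, not a definition taken from the text.

```prolog
% shuffle(?Xs, ?Ys, ?Zs): Zs is an interleaving of Xs and Ys that keeps
% the relative order of the elements within each of the two lists.
shuffle([], [], []).
shuffle([X|Xs], Ys, [X|Zs]) :-
    shuffle(Xs, Ys, Zs).
shuffle(Xs, [Y|Ys], [Y|Zs]) :-
    shuffle(Xs, Ys, Zs).

% Example query:
%
% ?- findall(Zs, shuffle([a], [b], Zs), All).
% All = [[a,b],[b,a]]
```

Unlike conc_dl, this relation is recursive and nondeterministic, which illustrates the cost noted above: the combination can no longer be compiled away into shared variables by partial execution.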
