• Aucun résultat trouvé

Graph-based Dependency Parsing (Chu-Liu-Edmonds algorithm) Sam Thomson (with thanks to Swabha Swayamdipta)

N/A
N/A
Protected

Academic year: 2022

Partager "Graph-based Dependency Parsing (Chu-Liu-Edmonds algorithm) Sam Thomson (with thanks to Swabha Swayamdipta)"

Copied!
54
0
0

Texte intégral

(1)

Graph-based Dependency Parsing

(Chu-Liu-Edmonds algorithm)

Sam Thomson (with thanks to Swabha Swayamdipta)

University of Washington, CSE 490u

February 22, 2017

(2)

Outline

I Dependency trees

I Three main approaches to parsing

I Chu-Liu-Edmonds algorithm

I Arc scoring / Learning

(3)

Dependency Parsing - Output

(4)

Dependency Parsing

TurboParser output from

http://demo.ark.cs.cmu.edu/parse?sentence=I%20ate%20the%20fish%20with%20a%20fork.

(5)

Dependency Parsing - Output Structure

A parse is anarborescence (akadirected rooted tree):

I Directed [Labeled] Graph

I Acyclic

I Single Root

I Connected and Spanning: ∃ directed path from root to every other word

(6)

Projective / Non-projective

I Some parses are projective: edges don’t cross

I Most English sentences are projective, but non-projectivity is common in other languages (e.g. Czech, Hindi)

Non-projective sentence in English:

and Czech:

Examples fromNon-projective Dependency Parsing using Spanning Tree Algorithms McDonald et al., EMNLP ’05

(7)

Dependency Parsing - Approaches

(8)

Dependency Parsing Approaches

I Chart (Eisner, CKY)

I O(n3)

I Onlyproduces projective parses

I Shift-reduce

I O(n) (fast!), but inexact

I “Pseudo-projective” trick can capture some non-projectivity

I Graph-based (MST)

I O(n2) for arc-factored

I Can produce projectiveandnon-projective parses

(9)

Dependency Parsing Approaches

I Chart (Eisner, CKY)

I O(n3)

I Onlyproduces projective parses

I Shift-reduce

I O(n) (fast!), but inexact

I “Pseudo-projective” trick can capture some non-projectivity

I Graph-based (MST)

I O(n2) for arc-factored

I Can produce projectiveandnon-projective parses

(10)

Dependency Parsing Approaches

I Chart (Eisner, CKY)

I O(n3)

I Onlyproduces projective parses

I Shift-reduce

I O(n) (fast!), but inexact

I “Pseudo-projective” trick can capture some non-projectivity

I Graph-based (MST)

I O(n2) for arc-factored

I Can produce projectiveandnon-projective parses

(11)

Graph-based Dependency Parsing

(12)

Arc-Factored Model

Every possible labeled directed edgee between every pair of nodes gets a score,score(e).

G =hV,Ei=

(O(n2) edges)

Example fromNon-projective Dependency Parsing using Spanning Tree Algorithms McDonald et al., EMNLP ’05

(13)

Arc-Factored Model

Every possible labeled directed edgee between every pair of nodes gets a score,score(e).

G =hV,Ei=

(O(n2) edges)

Example fromNon-projective Dependency Parsing using Spanning Tree Algorithms McDonald et al., EMNLP ’05

(14)

Arc-Factored Model

Best parse is:

A = arg max

A⊆G s.t.Aan arborescence

X

e∈A

score(e)

etc. . .

The Chu-Liu-Edmonds algorithm finds this argmax.

Example fromNon-projective Dependency Parsing using Spanning Tree Algorithms McDonald et al., EMNLP ’05

(15)

Arc-Factored Model

Best parse is:

A = arg max

A⊆G s.t.Aan arborescence

X

e∈A

score(e)

etc. . .

The Chu-Liu-Edmonds algorithm finds this argmax.

Example fromNon-projective Dependency Parsing using Spanning Tree Algorithms McDonald et al., EMNLP ’05

(16)

Arc-Factored Model

Best parse is:

A = arg max

A⊆G s.t.Aan arborescence

X

e∈A

score(e)

etc. . .

The Chu-Liu-Edmonds algorithm finds this argmax.

Example fromNon-projective Dependency Parsing using Spanning Tree Algorithms McDonald et al., EMNLP ’05

(17)

Chu-Liu-Edmonds

Chu and Liu ’65, On the Shortest Arborescence of a Directed Graph, Science Sinica

Edmonds ’67, Optimum Branchings, JRNBS

(18)

Chu-Liu-Edmonds - Intuition

Every non-ROOT node needs exactly 1 incoming edge

In fact, every connected component that doesn’t containROOT needs exactly 1 incoming edge

I Greedily pick an incoming edge for each node.

I If this forms an arborescence, great!

I Otherwise, it will contain a cycleC.

I Arborescences can’t have cycles, so we can’t keep every edge in C. One edge inC must get kicked out.

I C also needs an incoming edge.

I Choosing an incoming edge forC determines which edge to kick out

(19)

Chu-Liu-Edmonds - Intuition

Every non-ROOT node needs exactly 1 incoming edge

In fact, every connected component that doesn’t containROOT needs exactly 1 incoming edge

I Greedily pick an incoming edge for each node.

I If this forms an arborescence, great!

I Otherwise, it will contain a cycleC.

I Arborescences can’t have cycles, so we can’t keep every edge in C. One edge inC must get kicked out.

I C also needs an incoming edge.

I Choosing an incoming edge forC determines which edge to kick out

(20)

Chu-Liu-Edmonds - Intuition

Every non-ROOT node needs exactly 1 incoming edge

In fact, every connected component that doesn’t containROOT needs exactly 1 incoming edge

I Greedily pick an incoming edge for each node.

I If this forms an arborescence, great!

I Otherwise, it will contain a cycleC.

I Arborescences can’t have cycles, so we can’t keep every edge in C. One edge inC must get kicked out.

I C also needs an incoming edge.

I Choosing an incoming edge forC determines which edge to kick out

(21)

Chu-Liu-Edmonds - Intuition

Every non-ROOT node needs exactly 1 incoming edge

In fact, every connected component that doesn’t containROOT needs exactly 1 incoming edge

I Greedily pick an incoming edge for each node.

I If this forms an arborescence, great!

I Otherwise, it will contain a cycleC.

I Arborescences can’t have cycles, so we can’t keep every edge in C. One edge inC must get kicked out.

I C also needs an incoming edge.

I Choosing an incoming edge forC determines which edge to kick out

(22)

Chu-Liu-Edmonds - Intuition

Every non-ROOT node needs exactly 1 incoming edge

In fact, every connected component that doesn’t containROOT needs exactly 1 incoming edge

I Greedily pick an incoming edge for each node.

I If this forms an arborescence, great!

I Otherwise, it will contain a cycleC.

I Arborescences can’t have cycles, so we can’t keep every edge in C. One edge inC must get kicked out.

I C also needs an incoming edge.

I Choosing an incoming edge forC determines which edge to kick out

(23)

Chu-Liu-Edmonds - Intuition

Every non-ROOT node needs exactly 1 incoming edge

In fact, every connected component that doesn’t containROOT needs exactly 1 incoming edge

I Greedily pick an incoming edge for each node.

I If this forms an arborescence, great!

I Otherwise, it will contain a cycleC.

I Arborescences can’t have cycles, so we can’t keep every edge in C. One edge in C must get kicked out.

I C also needs an incoming edge.

I Choosing an incoming edge forC determines which edge to kick out

(24)

Chu-Liu-Edmonds - Intuition

Every non-ROOT node needs exactly 1 incoming edge

In fact, every connected component that doesn’t containROOT needs exactly 1 incoming edge

I Greedily pick an incoming edge for each node.

I If this forms an arborescence, great!

I Otherwise, it will contain a cycleC.

I Arborescences can’t have cycles, so we can’t keep every edge in C. One edge in C must get kicked out.

I C also needs an incoming edge.

I Choosing an incoming edge forC determines which edge to kick out

(25)

Chu-Liu-Edmonds - Intuition

Every non-ROOT node needs exactly 1 incoming edge

In fact, every connected component that doesn’t containROOT needs exactly 1 incoming edge

I Greedily pick an incoming edge for each node.

I If this forms an arborescence, great!

I Otherwise, it will contain a cycleC.

I Arborescences can’t have cycles, so we can’t keep every edge in C. One edge in C must get kicked out.

I C also needs an incoming edge.

I Choosing an incoming edge forC determines which edge to kick out

(26)

Chu-Liu-Edmonds - Recursive (Inefficient) Definition

defmaxArborescence(V , E ,ROOT):

””” returns best arborescence as a map from each node to its parent ”””

forvinV\ROOT:

bestInEdge[v]arg maxe∈inEdges[v]e.score ifbestInEdgecontains a cycle C:

# build a new graph whereCis contracted into a single node vCnewNode()

V0V∪ {vC} \C

E0← {adjust(e)foreE\C}

AmaxArborescence(V0, E0,ROOT)

return{e.originalforeA} ∪C\ {A[vC].kicksOut}

# each node got a parent without creating any cycles returnbestInEdge

defadjust(e):

e0copy(e) e0.originale ife.destC:

e0.destvC

e0.kicksOutbestInEdge[e.dest]

e0.scoree.scoree0.kicksOut.score elife.srcC:

e0.srcvC returne0

(27)

Chu-Liu-Edmonds

Consists of two stages:

I Contracting (everythingbeforethe recursive call)

I Expanding (everythingafterthe recursive call)

(28)

Chu-Liu-Edmonds - Preprocessing

I Remove every edge incoming toROOT

I This ensures thatROOTis in fact the root of any solution

I For every ordered pair of nodes,vi,vj, remove all but the highest-scoring edge from vi tovj

(29)

Chu-Liu-Edmonds - Contracting Stage

I For each non-ROOT nodev, set bestInEdge[v] to be its highest scoring incoming edge.

I If a cycleC is formed:

I contractthe nodes inC into a new nodevC

I edges outgoing from any node inC now get sourcevC I edges incoming to any node inC now get destinationvC I For each nodeu inC, and for each edgee incoming toufrom

outside ofC:

I sete.kicksOuttobestInEdge[u], and

I sete.scoreto bee.scoree.kicksOut.score.

I Repeat until every non-ROOT node has an incoming edge and no cycles are formed

(30)

An Example - Contracting Stage

V1

ROOT

V3 V2

a: 5 b: 1 c: 1

f : 5 d: 11

h: 9 e: 4

i: 8 g: 10

bestInEdge V1

V2 V3

kicksOut a

b c d e f g h i

(31)

An Example - Contracting Stage

V1

ROOT

V3 V2

a: 5 b: 1 c: 1

f : 5 d: 11

h: 9 e: 4

i: 8 g: 10

bestInEdge

V1 g

V2 V3

kicksOut a

b c d e f g h i

(32)

An Example - Contracting Stage

V1

ROOT

V3 V2

a: 5 b: 1 c: 1

f : 5 d: 11

h: 9 e: 4

i: 8 g: 10

bestInEdge

V1 g

V2 d

V3

kicksOut a

b c d e f g h i

(33)

An Example - Contracting Stage

V1

ROOT

V3 V2

a: 510 b: 111 c: 1

f : 5 d: 11

h: 910 e: 4

i: 811 g: 10

V4

bestInEdge

V1 g

V2 d

V3

kicksOut

a g

b d

c d e f g

h g

i d

(34)

An Example - Contracting Stage

V4

ROOT

V3

b:−10 c: 1

f: 5 a:−5

h:−1 e: 4 i:−3

bestInEdge

V1 g

V2 d

V3 V4

kicksOut

a g

b d

c d e f g

h g

i d

(35)

An Example - Contracting Stage

V4

ROOT

V3

b:−10 c: 1

f: 5 a:−5

h:−1 e: 4 i:−3

bestInEdge

V1 g

V2 d

V3 f

V4

kicksOut

a g

b d

c d e f g

h g

i d

(36)

An Example - Contracting Stage

V4

ROOT

V3

b:−10 c: 1

f: 5 a:−5

h:−1 e: 4 i:−3

bestInEdge

V1 g

V2 d

V3 f

V4 h

kicksOut

a g

b d

c d e f g

h g

i d

(37)

An Example - Contracting Stage

V4

ROOT

V3

b:−10− −1 c: 15

f: 5 a:−5− −1

h:−1 e: 4 i:−3

V5

bestInEdge

V1 g

V2 d

V3 f

V4 h

V5

kicksOut

a g, h

b d, h

c f

d e f g

h g

i d

(38)

An Example - Contracting Stage

V5 ROOT

b:−9 a:−4 c:−4

bestInEdge

V1 g

V2 d

V3 f

V4 h

V5

kicksOut

a g, h

b d, h

c f

d

e f

f g

h g

i d

(39)

An Example - Contracting Stage

V5 ROOT

b:−9 a:−4 c:−4

bestInEdge

V1 g

V2 d

V3 f

V4 h

V5 a

kicksOut

a g, h

b d, h

c f

d

e f

f g

h g

i d

(40)

Chu-Liu-Edmonds - Expanding Stage

After the contracting stage, every contracted node will have exactly onebestInEdge. This edge will kick out one edge inside the contracted node, breaking the cycle.

I Go through each bestInEdgee in the reverseorder that we added them

I lock down e, andremove every edge inkicksOut(e) from bestInEdge.

(41)

An Example - Expanding Stage

V5 ROOT

b:−9 a:−4 c:−4

bestInEdge

V1 g

V2 d

V3 f

V4 h

V5 a

kicksOut

a g, h

b d, h

c f

d

e f

f g

h g

i d

(42)

An Example - Expanding Stage

V5 ROOT

b:−9 a:−4 c:−4

bestInEdge

V1 a

g

V2 d

V3 f

V4 a

h

V5 a

kicksOut

a g, h

b d, h

c f

d

e f

f g

h g

i d

(43)

An Example - Expanding Stage

V4

ROOT

V3

b:−10 c: 1

f: 5 a:−5

h:−1 e: 4 i:−3

bestInEdge

V1 a

g

V2 d

V3 f

V4 a

h

V5 a

kicksOut

a g, h

b d, h

c f

d

e f

f g

h g

i d

(44)

An Example - Expanding Stage

V4

ROOT

V3

b:−10 c: 1

f: 5 a:−5

h:−1 e: 4 i:−3

bestInEdge

V1 a

g

V2 d

V3 f

V4 a

h

V5 a

kicksOut

a g, h

b d, h

c f

d

e f

f g

h g

i d

(45)

An Example - Expanding Stage

V1

ROOT

V3 V2

a: 5 b: 1 c: 1

f : 5 d: 11

h: 9 e: 4

i: 8 g: 10

bestInEdge

V1 a

g

V2 d

V3 f

V4 a

h

V5 a

kicksOut

a g, h

b d, h

c f

d

e f

f g

h g

i d

(46)

An Example - Expanding Stage

V1

ROOT

V3 V2

a: 5 b: 1 c: 1

f : 5 d: 11

h: 9 e: 4

i: 8 g: 10

bestInEdge

V1 a

g

V2 d

V3 f

V4 a

h

V5 a

kicksOut

a g, h

b d, h

c f

d

e f

f g

h g

i d

(47)

Chu-Liu-Edmonds - Notes

I This is a greedy algorithm with a clever form of delayed back-tracking to recover from inconsistent decisions (cycles).

I CLE is exact: it always recovers the optimal arborescence.

(48)

Chu-Liu-Edmonds - Notes

I Efficient implementation:

Tarjan ’77, Finding Optimum Branchings, Networks

Not recursive. Uses aunion-find(a.k.a. disjoint-set) data structure to keep track of collapsed nodes.

I Even more efficient:

Gabow et al. ’86, Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and Directed Graphs, Combinatorica

Uses a Fibonacci heapto keep incoming edges sorted. Finds cycles by following bestInEdgeinstead of randomly visiting nodes.

Describes how to constrain ROOT to have only one outgoing edge

(49)

Chu-Liu-Edmonds - Notes

I Efficient (wrong)implementation:

Tarjan ’77, Finding Optimum Branchings*, Networks

*corrected in Camerini et al. ’79, A note on finding optimum branchings, Networks

Not recursive. Uses aunion-find(a.k.a. disjoint-set) data structure to keep track of collapsed nodes.

I Even more efficient:

Gabow et al. ’86, Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and Directed Graphs, Combinatorica

Uses a Fibonacci heapto keep incoming edges sorted. Finds cycles by following bestInEdgeinstead of randomly visiting nodes.

Describes how to constrain ROOT to have only one outgoing edge

(50)

Chu-Liu-Edmonds - Notes

I Efficient (wrong)implementation:

Tarjan ’77, Finding Optimum Branchings*, Networks

*corrected in Camerini et al. ’79, A note on finding optimum branchings, Networks

Not recursive. Uses aunion-find(a.k.a. disjoint-set) data structure to keep track of collapsed nodes.

I Even more efficient:

Gabow et al. ’86, Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and Directed Graphs, Combinatorica

Uses a Fibonacci heapto keep incoming edges sorted.

Finds cycles by following bestInEdgeinstead of randomly visiting nodes.

Describes how to constrain ROOT to have only one outgoing edge

(51)

Arc Scoring / Learning

(52)

Arc Scoring

Features

can look at source (head), destination (child), and arc label.

For example:

I number of words between head and child,

I sequence of POS tags between head and child,

I is head to the left or right of child?

I vector state of a recurrent neural net at head and child,

I vector embedding of label,

I etc.

(53)

Learning

Recall that when we have a parameterized model, and we have a decoder that can make predictions given that model. . .

we can use structured perceptron, or structured hinge loss:

Lθ(xi,yi) = max

y∈Y{scoreθ(y) +cost(y,yi)} −scoreθ(yi)

(54)

Learning

Recall that when we have a parameterized model, and we have a decoder that can make predictions given that model. . .

we can use structured perceptron, or structured hinge loss:

Lθ(xi,yi) = max

y∈Y{scoreθ(y) +cost(y,yi)} −scoreθ(yi)

Références

Documents relatifs

• Find a memory mapping to provide conflict free access to all the memory banks at any time instance [11], [15], [16]. In the first category of decoder

In order to enforce R 1 over R 0 and thus require The lamp bulb is ok for the light to be on, we need the ability to make some subsumed information (namely, R 1 ) prevail over (or

Further an edge links two nodes of the line graph if and only if the corresponding edges of the original graph share a node.. Although line graph is the term most commonly used, it

We devise a constant-factor approximation algorithm for the maximization version of the edge-disjoint paths problem if the supply graph together with the demand edges form a

The main goal of this study is to holistically analyse the Edge computing architecture, a non-trivial extension of cloud computing [Sark16], which serves as a platform that

Patient preference for subcutaneous trastuzumab via handheld syringe versus intravenous infusion in HER2-positive early breast cancer: cohort 2 of the PrefHer study; abstract

By using the following theorem, we can reduce the disjoint paths problem to an equivalent smaller problem if the input graph has a large clique minor.. An outline of the proof

Fifteen years after the EU member states confirmed an EU accession perspective for all Western Balkan countries at the June 2003 Thessaloniki summit, the European Commission