
The thesis is divided into two main parts: in the first part we describe the work done on improving the efficiency of the HMT decoder, focusing on the Cube Pruning algorithm, while in the second part we present new training and decoding frameworks for Tree-based MT.

Chapter 2 contains the line of work on Cube Pruning. We start by introducing the topic in sections 2.2 and 2.4. We give a general formalization of the Cube Pruning problem in a monotonic search space in sections 2.4.1, 2.4.2 and 2.4.3. In sections 2.4.4 and 2.4.5, we show how CP relates to well-known problems such as Parsing and K-way Merge. In section 2.4.6, we describe a technique that generalizes a given CP algorithm to a search space with any number of dimensions. CP is often applied to approximately-monotonic search spaces, thus in section 2.4.7 we discuss how the algorithm is applied in these contexts as a heuristic solution. In section 2.5, we describe in detail the standard CP algorithm, using a formalism consistent with the rest of the presentation. In section 2.6, we present a set of algorithms that improve over standard Cube Pruning in terms of runtime while keeping the overall asymptotic complexity of O(k log k). Furthermore, in section 2.7, we present a linear-time algorithm that solves exactly a relaxation of the original CP problem and can be applied as a heuristic solution. In section 2.8, we apply the proposed CP algorithms to the task of MT and compare them to the standard Cube Pruning algorithm.

In Chapter 3 we present a novel Structured Prediction approach to HMT based on deterministic decoding and discriminative training. In sections 3.2 and 3.3, we introduce the core concepts and formalisms on which we base the design of our models, and we discuss the related work that inspired the novel ideas we introduce in Tree-based Machine Translation. In section 3.4 we introduce a novel deterministic undirected approach for HMT. Then, we introduce two alternative frameworks to train the parameters of the scoring function. In section 3.5, we describe the first proposed training framework, which focuses on optimizing each individual action selection given the local context. In section 3.6, we describe the second proposed training framework, which focuses on learning a decoding policy that optimizes a global loss function. We discuss and compare the two proposed frameworks in section 3.7. In section 3.8 we show how the features designed for standard bottom-up HMT can be adapted to the undirected approach, and we introduce a new feature from the class of undirected features. Finally, in section 3.9, we test the proposed models and compare them with standard HMT.

In Chapter 4, we give a literature review of the publications that have been most influential for the lines of work described in this dissertation. Chapter 5 contains conclusions and future research plans.

Cube Pruning

In this chapter we present a series of alternative algorithms to solve the Cube Pruning (CP) problem. The proposed algorithms leverage aggressive pruning and low memory usage to achieve lower complexity and faster execution times. Some of the proposed algorithms have a lower asymptotic complexity than the standard CP algorithm. Any of the proposed algorithms can replace the standard Cube Pruning algorithm in any of its applications, offering a different balance between speed and accuracy. From now on, we refer to the original CP algorithm as “Standard Cube Pruning” (SCP) to distinguish it from the algorithms we propose.

Any problem whose solution involves the application of the SCP algorithm can take advantage of the improvements described in this chapter, whether in monotonic search spaces, in search spaces with constant-slope dimensions, or in cases where these conditions hold only approximately, such as MT.

2.1 Chapter Outline

In Section 2.2 we review the history of the CP problem, from its origin to its recent adaptations to various tasks.

In Section 2.4, we give a generalized formalization of the Cube Pruning Problem for the Monotonic Search Space case, and then we extend it to the Approximately Monotonic Search Space case. Furthermore, we describe how the CP problem relates to other known problems and applications such as parsing and K-way merge.
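To make the K-way merge connection concrete, the classic heap-based merge of m sorted lists can be sketched as follows (an illustrative sketch in Python; the function name and interface are ours, not part of this thesis):

```python
import heapq

def kway_merge(lists, k):
    """Return the k smallest items across m sorted lists.

    The heap holds at most one frontier element per list, so every
    pop/push costs O(log m) and the whole merge costs O(k log m).
    """
    # Seed the heap with the head of each non-empty list.
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap and len(out) < k:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        if j + 1 < len(lists[i]):  # advance the frontier of list i
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return out
```

Cube Pruning can be viewed analogously, with each row of the score grid acting as a sorted list; the difference is that CP only lazily exposes neighbours of already-popped cells rather than seeding every row up front.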

In Section 2.5, we review Standard Cube Pruning solutions based on the Lazy Algorithm, and show that it runs with a complexity of O(k log k), where k is the beam size.
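As a reference point for the discussion that follows, the lazy, heap-driven strategy underlying SCP can be sketched for the two-dimensional monotonic case (a minimal illustrative sketch, assuming non-empty ascending-sorted inputs whose scores combine by simple addition; it is not the thesis's exact formalization):

```python
import heapq

def lazy_cube_pruning_2d(a, b, k):
    """k-best sums a[i] + b[j] from two ascending-sorted, non-empty lists.

    Monotonicity guarantees the best cell is (0, 0) and that each popped
    cell's unseen neighbours (i+1, j) and (i, j+1) cover all remaining
    candidates, so only O(k) cells are pushed, for O(k log k) total work.
    """
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    best = []
    while heap and len(best) < k:
        s, i, j = heapq.heappop(heap)
        best.append(s)
        for ni, nj in ((i + 1, j), (i, j + 1)):  # lazily expand neighbours
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return best
```

The `seen` set prevents a cell reachable from two predecessors from being pushed twice, which is what keeps the heap size proportional to k.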

In Section 2.6, we present a set of algorithms that improve over SCP. These algorithms keep the O(k log k) overall complexity, but they optimize different aspects of the selection and maintenance of candidate elements, and allow inspecting a smaller portion of the search space. We refer to these algorithms as Faster Cube Pruning (FCP) algorithms. We prove that FCP algorithms retrieve the same k-best as SCP in the monotonic search space case.

In Section 2.7, we present a linear-time CP approximation algorithm (LCP), which has O(k) complexity. LCP returns the same k-best as SCP in the cases where all dimensions but one have constant slope; otherwise it returns an approximation.

In general, the use of LCP instead of SCP leads to an asymptotic reduction of the complexity at the cost of a bounded loss in accuracy.

Finally, in Section 2.8, we test and empirically compare all the presented algorithms on an application with an approximately monotonic search space: a standard Machine Translation task with Language Model (LM) features. We measure the accuracy/speed balance that can be achieved with the different algorithms in different settings, show evidence of the LCP asymptotic speed advantage, and discuss the empirical results in detail.