
4.1.2 Transducers

A transducer is a formal representation of a specific algorithm used to transform a given input string into an output string according to some constraints. It consists of a set of states and a set of transitions between states. The states (vertices) and transitions (edges) form a directed graph. If the set of states is finite, the transducer is called a finite-state transducer. Each transition defines one symbol to be consumed on the input string and one symbol to be written on the output string. The set of symbols contains the special symbol ε, the empty symbol. Used on the input side, it does not consume a letter of the input string, allowing the input string to be shorter than the output string. Used on the output side, it does not produce a symbol on the output string, allowing the input string to be longer than the output string. Depending on the particular structure of the graph, some input strings cannot be transduced at all, and some input strings yield several possible output strings. A transducer with the latter property is non-deterministic (see Figure 4.3 for an example). We will rely massively on such transducers in our application. The input strings are Swiss German dialect words. For each of them, we will generate several hundred output strings, which are potential Standard German words.

4 These values can be considered as the negative logarithms of lexical probabilities (1 if the word is contained in the lexicon, 0 if it is not).

Figure 4.3: Graphic representation of a non-deterministic transducer. Initial states are represented with a thick border, final states with a double border. The transducer shown transduces the sequence ab to the sequences ba, bc, a, c.

Formally, a finite-state transducer is a tuple

$$T = \langle S, \Sigma, \Gamma, s_0, F, \delta \rangle$$

where $S$ is the set of states and $\Sigma$ and $\Gamma$ represent the input and output sets of symbols respectively. All three sets are finite. $s_0$ is the initial state (thus, $s_0 \in S$) and $F$ is the set of final states (thus, $F \subseteq S$). $\delta$ is the transition function defined as follows:

$$\delta : S \times (\Sigma \cup \{\varepsilon\}) \times (\Gamma \cup \{\varepsilon\}) \times S \to \{0, 1\}$$

For each tuple consisting of a start state, an input symbol, an output symbol, and an end state, this function defines whether such a transition exists (1) or not (0). It can be viewed as the characteristic function of the transition relation $\Delta$:

$$\Delta \subseteq S \times (\Sigma \cup \{\varepsilon\}) \times (\Gamma \cup \{\varepsilon\}) \times S$$

A path in $T$ is a sequence of transitions $\langle d_1, \ldots, d_n \rangle$ such that $\prod_{i=1}^{n} \delta(d_i) > 0$ and, for each $i$ ($1 \le i < n$), the start state of $d_{i+1}$ is identical to the end state of $d_i$. The input label of a path is the concatenation of the input symbols of the transitions defined by the path. The output label of a path is the concatenation of the output symbols of the transitions defined by the path. $T$ transduces a string $x \in \Sigma^*$ into a string $y \in \Gamma^*$ (notation: $x [T] y$) if there exists a path from the initial state $s_0$ to a final state $s_f \in F$ whose input label is $x$ and whose output label is $y$.
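To make the definition concrete, here is a minimal Python sketch (ours, not part of the formal apparatus; the names `transduce` and `DELTA` are our own) that encodes the transition relation of the transducer in Figure 4.3 as a set of tuples and recursively enumerates every output label $y$ with $x [T] y$. It assumes the transducer contains no ε-input cycles.

```python
EPS = ""  # the empty symbol ε

# Transition relation Δ ⊆ S × (Σ∪{ε}) × (Γ∪{ε}) × S, encoded as a set of
# (start state, input symbol, output symbol, end state) tuples.
# These are the transitions of the transducer shown in Figure 4.3.
DELTA = {
    (0, "a", "b", 1),
    (0, "a", EPS, 1),
    (1, "b", "a", 2),
    (1, "b", "c", 2),
}
INITIAL = 0
FINAL = {2}

def transduce(x, state=INITIAL, out=""):
    """Yield every output label y such that x [T] y holds."""
    if x == "" and state in FINAL:
        yield out
    for (s, i, o, t) in DELTA:
        if s != state:
            continue
        if i == EPS:                 # ε on the input side consumes nothing
            yield from transduce(x, state=t, out=out + o)
        elif x and x[0] == i:        # consume exactly one input symbol
            yield from transduce(x[1:], state=t, out=out + o)

print(sorted(set(transduce("ab"))))  # ['a', 'ba', 'bc', 'c']
```

Running the sketch reproduces the four outputs listed in the caption of Figure 4.3.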

In our experiments, the input and the output alphabets will be identical. As the writing conventions for dialects are inspired by Standard German orthography, there are no differences in the character sets. However, if we were to use dialect data in phonetic transcription, two different alphabets would need to be considered.

A transducer as defined above only gives binary answers: an input string is or is not transduced to an output string. If one input string yields several output strings, they are all equally likely. However, we may want to make more subtle distinctions. Weighted transducers associate numerical values with all transitions. By aggregating these transition weights, each input-output string pair is assigned a weight. These weights allow for ranking the output candidates of a fixed input string. There are essentially two interpretations of the transition weights. In the first case, the weights represent costs. Small weights represent low costs and therefore very likely transductions. Typically, the cost of a path is the sum of its transition costs. In the second case, the weights represent probabilities. Small weights represent small probabilities and therefore very unlikely transductions. The probability of a path is the product of its transition probabilities. We will use both types of weighted transducers, depending on the model.

In the formal description, weighted transducers differ from binary transducers only with respect to the transition function $\delta$, where $\mathbb{R}$ represents the most general interpretation:

$$\delta : S \times (\Sigma \cup \{\varepsilon\}) \times (\Gamma \cup \{\varepsilon\}) \times S \to \mathbb{R}$$

For the stochastic interpretation, the weights are real numbers in the interval [0, 1], with the sum of the weights of all transitions leaving a given state equal to 1. For the “cost” interpretation, the weights may be real numbers or integers.
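As a small illustration of the two regimes (a sketch with hypothetical weights, not data from our models), the following compares both aggregations. Reading each cost as a negative log probability, as the footnote on lexical weights suggests, makes the two rankings coincide.

```python
import math

def path_cost(weights):
    """Cost interpretation: aggregate by summation; lower is better."""
    return sum(weights)

def path_probability(weights):
    """Stochastic interpretation: aggregate by product; higher is better."""
    return math.prod(weights)

# Two hypothetical competing paths for the same input string,
# given as lists of transition costs.
costs = {"path1": [0.5, 0.7], "path2": [0.2, 1.4]}

# The same paths under the stochastic view, reading each cost
# as a negative log probability.
probs = {p: [math.exp(-w) for w in ws] for p, ws in costs.items()}

best_by_cost = min(costs, key=lambda p: path_cost(costs[p]))
best_by_prob = max(probs, key=lambda p: path_probability(probs[p]))
assert best_by_cost == best_by_prob == "path1"  # −log turns products into sums
```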

Some of our models use weighted memoryless transducers, a simplified variant of the weighted transducers defined above (for an example, see Figure 4.4). They do not allow transitions to depend on the previous transitions. In other words, they are not sensitive to the context of the preceding symbols – they are memoryless.

The advantage of this restriction is that memoryless transducers need only one state. This unique state is both initial and final; all transitions leave it and return to it. As it no longer carries any distinctive meaning, it can be omitted from the formal description. A weighted memoryless transducer is a tuple

$$T = \langle \Sigma, \Gamma, \delta \rangle$$

where $\Sigma$ and $\Gamma$ represent the input and output alphabets respectively. $\delta$ is the weighted transition function defined as follows:

$$\delta : (\Sigma \cup \{\varepsilon\}) \times (\Gamma \cup \{\varepsilon\}) \to \mathbb{R}$$

The maximal number of transitions in a transducer depends on the size of the alphabets and on the number of states. In the case of memoryless transducers, the number of states is fixed at the minimal value. This restricts the maximal number of transitions and yields transducers that are sufficiently simple to be trained with learning algorithms.

Figure 4.4: Example of a memoryless transducer implementing the Levenshtein distance algorithm on the alphabet {a, b}. Identity transitions (a:a, b:b) carry weight 0; substitutions (a:b, b:a), deletions (a:ε, b:ε) and insertions (ε:a, ε:b) carry weight 1. ε is represented by −.
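The weights in Figure 4.4 can be put to work directly. The following sketch (our illustration; `delta` and `min_cost` are names of our choosing) stores the memoryless transducer as a weight table δ(x, y) and uses standard dynamic programming to find the cheapest path transducing one string into another, which under these weights is exactly the Levenshtein distance.

```python
EPS = ""  # ε, written as "-" in Figure 4.4

def delta(x, y):
    """Weight table of the memoryless transducer in Figure 4.4: identity
    transitions (a:a, b:b) cost 0; substitutions, deletions (a:ε) and
    insertions (ε:b) cost 1."""
    return 0 if x == y else 1

def min_cost(src, dst):
    """Cheapest transduction of src into dst (= Levenshtein distance)."""
    n, m = len(src), len(dst)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + delta(src[i - 1], EPS)   # deletions only
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + delta(EPS, dst[j - 1])   # insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + delta(src[i - 1], EPS),             # delete
                d[i][j - 1] + delta(EPS, dst[j - 1]),             # insert
                d[i - 1][j - 1] + delta(src[i - 1], dst[j - 1]),  # match/substitute
            )
    return d[n][m]

print(min_cost("abba", "baba"))  # 2
```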

Transducers can be used in recognition or generation mode. In recognition mode, they take as input a pair of strings and return the weight associated with the corresponding transition path. In the case of non-weighted transducers, they return True if and only if there exists a corresponding transition path. In generation mode, the transducer takes only one string as input and returns all possible output strings defined by the transducer, together with their corresponding weights.
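Both modes can be phrased on top of the `transduce` sketch given after the formal definition above (again our own illustration; for a weighted transducer, `recognize` would return the aggregated path weight rather than a truth value).

```python
def recognize(x, y):
    """Recognition mode: is the pair (x, y) the label of some path?"""
    return y in set(transduce(x))    # reuses transduce() from the earlier sketch

def generate(x):
    """Generation mode: every output string defined for the input x."""
    return sorted(set(transduce(x)))

print(recognize("ab", "ba"))  # True
print(generate("ab"))         # ['a', 'ba', 'bc', 'c']
```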