
In the document Monographs in Computer Science (Page 193-197)

General Directional Top-Down Parsing

6.3 Breadth-First Top-Down Parsing

The breadth-first solution to the top-down parsing problem is to maintain a list of all possible predictions. Each of these predictions is then processed as described in Section 6.2 above, that is, if there is a non-terminal on top, the prediction stack is replaced by several new prediction stacks, as many as there are choices for this non-terminal. In each of these new prediction stacks, the top non-terminal is replaced by the corresponding choice. This prediction step is repeated for all prediction stacks it applies to (including the new ones), until all prediction stacks have a terminal on top.

For each of the prediction stacks we match the terminal in front with the current input symbol, and strike out all prediction stacks that do not match. If there are no prediction stacks left, the sentence does not belong to the language. So instead of one (prediction stack, analysis stack) pair, our automaton now maintains a list of (prediction stack, analysis stack) pairs, one for each possible choice, as depicted in Figure 6.7.

Fig. 6.7. An instantaneous description of our extended automaton: the matched input and the rest of the input, shared by all pairs, together with the list of pairs (analysis1, prediction1), (analysis2, prediction2), ...

The method is suitable for on-line parsing, because it processes the input from left to right. Any parsing method that processes its input from left to right and results in a leftmost derivation is called an LL parsing method. The first L stands for Left to right, and the second L for Leftmost derivation.

Now we almost know how to write a parser along these lines, but there is one detail that we have not properly dealt with yet: termination. Does the input sentence belong to the language defined by the grammar when, ultimately, we have an empty prediction stack? Only when the input is exhausted! To avoid this extra check, and to avoid problems about what to do when we arrive at the end of the sentence but have not finished parsing yet, we introduce a special so-called end marker #. This end marker is appended both to the end of the sentence and to the end of the prediction, so when both copies match we know that the prediction has been matched by the input and the parsing has succeeded.
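The predict and match steps, together with the end marker, can be turned into a small recognizer. The following is a minimal sketch, not the book's code; in particular the grammar encoding (a dict mapping each non-terminal to a list of right-hand sides, each a list of symbols) is this sketch's assumption.

```python
# Breadth-first top-down recognizer: maintain a list of prediction
# stacks; predict on top non-terminals, then match the top terminal of
# every stack against the current input symbol and drop the failures.
def recognize(grammar, start, sentence):
    end = '#'                            # end marker on input and prediction
    predictions = [[start, end]]
    for symbol in list(sentence) + [end]:
        # Predict: expand until every stack has a terminal on top.
        # (This loop does not terminate for left-recursive grammars.)
        worklist, ready = predictions, []
        while worklist:
            stack = worklist.pop()
            if stack[0] in grammar:      # non-terminal on top
                for rhs in grammar[stack[0]]:
                    worklist.append(list(rhs) + stack[1:])
            else:                        # terminal (or end marker) on top
                ready.append(stack)
        # Match: keep the stacks whose top symbol is the input symbol.
        predictions = [s[1:] for s in ready if s[0] == symbol]
        if not predictions:
            return False                 # no prediction matches: rejected
    return True                          # end markers matched: accepted
```

With a grammar in the spirit of the example below, such as S ---> AB, A ---> a | aA, B ---> bc (Figure 6.8's actual grammar may differ), this recognizer accepts aabc.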

Fig. 6.8. The breadth-first parsing of the sentence aabc

6.3.1 An Example

Figure 6.8 presents a complete breadth-first parsing of the sentence aabc. At first there is only one prediction stack: it contains the start symbol and the end marker; no symbols have been accepted yet (frame a). The step leading to (b) is a predict step; there are two possible right-hand sides, so we obtain two prediction stacks. The difference between the prediction stacks is also reflected in the analysis stacks, where the different suffixes of S represent the different right-hand sides predicted. Another predict step with multiple right-hand sides leads to (c). Now all prediction stacks have a terminal on top; all happen to match, resulting in (d). Next, we again have some predictions with a non-terminal in front, so another predict step leads us to (e).

The next step is a match step, and fortunately some matches fail; these are dropped, as they can never lead to a successful parse. From (f) to (g) is again a predict step. Another match in which again some matches fail leads us to (h). A further prediction results in (i), and then a match finally brings us to (j), a successful parse with the end markers matching.

The analysis is S2 A2 a A1 a B1 b c #

For now, we do not need the terminals in the analysis; discarding them gives S2 A2 A1 B1

This means that we get a leftmost derivation by first applying rule S2, then rule A2, etc., all the time replacing the leftmost non-terminal. Check:

S ---> AB ---> aAB ---> aaB ---> aabc
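This check can be replayed mechanically. The following is a small sketch; the rule names and the (left-hand side, right-hand side) encoding are this sketch's assumptions, as is the convention that non-terminals are single upper-case letters.

```python
# Replay a leftmost derivation from the rule numbers in the analysis:
# at each step, replace the leftmost non-terminal of the sentential
# form by the right-hand side of the named rule.
def leftmost_derive(rules, start, analysis):
    form = [start]
    steps = [''.join(form)]
    for name in analysis:
        lhs, rhs = rules[name]
        # the leftmost non-terminal must be the rule's left-hand side
        i = next(j for j, s in enumerate(form) if s.isupper())
        assert form[i] == lhs, "not a leftmost derivation"
        form[i:i+1] = list(rhs)          # apply the rule in place
        steps.append(''.join(form))
    return steps
```

Applying it to the analysis S2 A2 A1 B1, with S2: S ---> AB, A2: A ---> aA, A1: A ---> a, and B1: B ---> bc, reproduces the sentential forms S, AB, aAB, aaB, aabc above.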

The breadth-first method described here was first presented by Greibach [7]. However, in that presentation, grammars are first transformed into Greibach Normal Form, and the steps taken are like the ones our initial pushdown automaton makes; the predict and match steps are combined.

6.3.2 A Counterexample: Left Recursion

The method discussed above clearly works for this grammar, and the question arises whether it works for all context-free grammars. One would think it does, because all possibilities are systematically tried, for all non-terminals, in any occurring prediction. Unfortunately, this reasoning has a serious flaw, which is demonstrated by the following example: let us see if the sentence ab belongs to the language defined by the simple grammar

S ---> Sb | a

Our automaton starts off in the following state:

ab#

S#

As we have a non-terminal at the beginning of the prediction, we use a predict step, resulting in:

ab#

S1 Sb#

S2 a#

As one prediction again starts with a non-terminal, we predict again:

ab#

S1S1 Sbb#

S1S2 ab#

S2 a#

By now it is clear what is happening: we seem to have ended up in an infinite process that leads us nowhere. The reason is that we keep trying the S ---> Sb rule without ever reaching a state where a match can be attempted. This problem can occur whenever there is a non-terminal that derives an infinite sequence of sentential forms, all starting with a non-terminal, so that no matches can take place. As all the sentential forms in this infinite sequence start with a non-terminal, and the number of non-terminals is finite, there is at least one non-terminal A occurring more than once at the start of those sentential forms. So we have: A → ··· → Aα. A non-terminal that derives a sentential form starting with itself is called left-recursive.
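The runaway behaviour is easy to reproduce: repeatedly applying the predict step to S ---> Sb | a always leaves one prediction stack with S on top again. A sketch, using the same grammar encoding assumed earlier (a dict of non-terminal to right-hand sides):

```python
# Repeated predict steps for the left-recursive grammar S ---> Sb | a.
# Each round replaces a top non-terminal by all its right-hand sides;
# one stack always gets S back on top, so the process never closes.
grammar = {'S': [['S', 'b'], ['a']]}

stacks = [['S', '#']]
for _ in range(5):
    expanded = []
    for stack in stacks:
        if stack[0] in grammar:          # non-terminal on top: predict
            for rhs in grammar[stack[0]]:
                expanded.append(list(rhs) + stack[1:])
        else:                            # terminal on top: leave as is
            expanded.append(stack)
    stacks = expanded

# After five rounds there is still a stack S b b b b b #, and there
# will be one with S on top after any number of rounds.
```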

Left recursion comes in several kinds: we speak of immediate left recursion when there is a grammar rule A ---> Aα, as in the rule S ---> Sb; we speak of indirect left recursion when the recursion goes through other rules, for example A ---> Bα, B ---> Aβ.

Both these forms of left recursion can be concealed by ε-producing non-terminals; this causes hidden left recursion and hidden indirect left recursion, respectively. For example, in the grammar

S ---> ABc
B ---> Cd
B ---> ABf
C ---> Se
A ---> ε

the non-terminals S, B, and C are all left-recursive. Grammars with left-recursive non-terminals are called left-recursive as well.
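Left-recursive non-terminals, including the hidden kinds, can be found mechanically: compute the nullable non-terminals (those deriving ε), build the relation "Y can begin a sentential form derived from X", and report every non-terminal that can reach itself through that relation. The following is a sketch; the grammar encoding, with ε represented as an empty right-hand side, is this sketch's assumption, not the book's.

```python
# Detect left-recursive non-terminals, hidden left recursion included.
def left_recursive(grammar):
    # 1. Fixed point for nullability: a non-terminal is nullable if
    #    some right-hand side consists only of nullable symbols.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            if nt not in nullable and any(
                    all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(nt)
                changed = True
    # 2. X ---> A1 ... Ak Y ... with A1 ... Ak nullable means Y can
    #    begin a sentential form derived from X.
    begins = {nt: set() for nt in grammar}
    for nt, rhss in grammar.items():
        for rhs in rhss:
            for s in rhs:
                if s in grammar:
                    begins[nt].add(s)
                if s not in nullable:    # s always derives a symbol: stop
                    break
    # 3. A non-terminal is left-recursive iff it reaches itself.
    def reaches(a, b):
        seen, stack = set(), [a]
        while stack:
            for y in begins.get(stack.pop(), ()):
                if y == b:
                    return True
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False
    return {nt for nt in grammar if reaches(nt, nt)}
```

On the grammar above it reports exactly S, B, and C: the A in front of B is nullable, so B can begin a sentential form derived from S, and the cycle S, B, C closes through C ---> Se.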

If a grammar has no ε-rules and no loops, we could still use our parsing scheme if we add one extra step: if a prediction stack has more symbols than the unmatched part of the input sentence, it can never derive that sentence (every non-terminal derives at least one symbol), so it can be dropped. However, this little trick has one big disadvantage: it requires us to know the length of the input sentence in advance, so the method is no longer suitable for on-line parsing. Fortunately, left recursion can be eliminated: given a left-recursive grammar, we can transform it into a grammar without left-recursive non-terminals that defines the same language. As left recursion poses a major problem for any top-down parsing method, we will now discuss this grammar transformation.
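The length-pruning trick amounts to one extra test in the predict step: for an ε-free, loop-free grammar, any prediction stack longer than the unmatched input (end markers included) can be dropped, and this also bounds the predict step under left recursion, since every prediction makes the stack longer. A sketch under those assumptions, with the same grammar encoding as before:

```python
# Breadth-first recognizer with length pruning, for ε-free, loop-free
# grammars: a prediction stack longer than the unmatched input can
# never derive it, so dropping it is safe, and the predict step then
# terminates even for left-recursive grammars.
def recognize_pruned(grammar, start, sentence):
    end = '#'
    rest = list(sentence) + [end]        # unmatched input, end marker included
    predictions = [[start, end]]
    while rest:
        worklist, ready = predictions, []
        while worklist:
            stack = worklist.pop()
            if len(stack) > len(rest):
                continue                 # too long: prune
            if stack and stack[0] in grammar:
                for rhs in grammar[stack[0]]:
                    worklist.append(list(rhs) + stack[1:])
            else:
                ready.append(stack)
        symbol = rest.pop(0)
        predictions = [s[1:] for s in ready if s and s[0] == symbol]
        if not predictions:
            return False
    return True
```

On the left-recursive grammar S ---> Sb | a this now terminates, accepting ab and rejecting ba, at the price of needing the whole input, and hence its length, in advance.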
