Non-recursive implementation of predictive parsing


THE IDEA. Predictive parsing can be performed using a pushdown stack, avoiding recursive calls.

Initially the stack holds just the start symbol of the grammar.
At each step a symbol X is popped from the stack:
- if X is a terminal symbol then it is matched with lookahead and lookahead is advanced,
- if X is a nonterminal, then using lookahead and a parsing table (built from the FIRST and FOLLOW sets) a production is chosen and its right-hand side is pushed onto the stack, leftmost symbol on top.
This process goes on until the stack and the input string become empty. It is useful to have an end-of-stack symbol and an end-of-input symbol. We denote them both by $.

Figure 8 shows the structure of non-recursive predictive parsers. For each nonterminal A and each token a the entry M[A, a] of the parsing table contains either an A-production generating sentences starting with a or an error-entry.

**Figure 8:** The structure of non-recursive predictive parsers.

Example 17 Consider the grammar G given by:

S	$\longmapsto$	aAa \| BAa \| $\varepsilon$
A	$\longmapsto$	cA \| bA \| $\varepsilon$
B	$\longmapsto$	b

The parsing table is:

	a	b	c	$
S	S $\longmapsto$ aAa	S $\longmapsto$ BAa		S $\longmapsto$ $\varepsilon$
A	A $\longmapsto$ $\varepsilon$	A $\longmapsto$ bA	A $\longmapsto$ cA
B		B $\longmapsto$ b

where the empty slots correspond to error-entries. Consider the parsing of the word w = bcba.

Stack	Remaining input	Action
$S	bcba$	choose S $\longmapsto$ BAa
$aAB	bcba$	choose B $\longmapsto$ b
$aAb	bcba$	match b
$aA	cba$	choose A $\longmapsto$ cA
$aAc	cba$	match c
$aA	ba$	choose A $\longmapsto$ bA
$aAb	ba$	match b
$aA	a$	choose A $\longmapsto$ $\varepsilon$
$a	a$	match a
$	$	accept

THE ALGORITHM.

Algorithm 5

[Algorithm 5 is a boxed pseudo-code image in the original. Recoverable fragments: an initial configuration, a production with right-hand side $Y_1 Y_2 \cdots Y_k$, and a loop that runs until $X = \$$.]
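The stack-based procedure described in THE IDEA can be sketched in Python as follows. This is only an illustration under stated assumptions, not a transcription of Algorithm 5: the names `TABLE`, `NONTERMINALS` and `parse` are hypothetical, tokens are single characters, the table is the one of Example 17, and an empty tuple stands for $\varepsilon$.

```python
EPSILON = ()  # empty right-hand side (assumption: a tuple of symbols encodes a rhs)

# Parsing table of Example 17: (nonterminal, lookahead) -> right-hand side.
TABLE = {
    ("S", "a"): ("a", "A", "a"),
    ("S", "b"): ("B", "A", "a"),
    ("S", "$"): EPSILON,
    ("A", "a"): EPSILON,
    ("A", "b"): ("b", "A"),
    ("A", "c"): ("c", "A"),
    ("B", "b"): ("b",),
}
NONTERMINALS = {"S", "A", "B"}


def parse(word, table=TABLE, nonterminals=NONTERMINALS, start="S"):
    """Return the list of productions used, or raise SyntaxError."""
    tokens = list(word) + ["$"]   # input followed by the end marker
    stack = ["$", start]          # the top of the stack is the end of the list
    pos = 0
    output = []
    while True:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in nonterminals:
            rhs = table.get((top, lookahead))
            if rhs is None:       # a missing key plays the role of an error entry
                raise SyntaxError(f"no entry M[{top}, {lookahead}]")
            output.append((top, rhs))
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == lookahead:
            if top == "$":
                return output     # stack and input are both exhausted
            pos += 1              # match a terminal and advance
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")


for lhs, rhs in parse("bcba"):
    print(lhs, "->", " ".join(rhs) or "ε")
```

Calling `parse("bcba")` chooses the same five productions, in the same order, as the trace of Example 17.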

COMPUTING THE FIRST SETS. Recall that for any string $\alpha$ of symbols the set FIRST( $\alpha$ ) satisfies the following conditions for every terminal a and every string of symbols $\beta$:

FIRST( $\alpha$ ) $\subseteq$ V_T $\cup$ { $\varepsilon$ }
$\alpha$ $\;\stackrel{{\ast}}{{\Longrightarrow}}\;$ a $\beta$ iff a $\in$ FIRST( $\alpha$ )
$\alpha$ $\;\stackrel{{\ast}}{{\Longrightarrow}}\;$ $\varepsilon$ iff $\varepsilon$ $\in$ FIRST( $\alpha$ ).

For a symbol X $\in$ V_T $\cup$ V_N the set FIRST(X) can be computed as follows:

Algorithm 6

[Algorithm 6 is a boxed pseudo-code image in the original. Recoverable fragments: the input is a symbol X, and the last statement is FIRST(X) := FIRST(X) $\cup$ { $\varepsilon$ }.]

Comments about the computation of FIRST(X) with Algorithm 6.

The case where X is a terminal symbol is trivial.
Assume from now on that X is a nonterminal. The case where X $\longmapsto$ $\varepsilon$ follows immediately from the specifications of FIRST(X).
Assume that there is a production of the form X $\longmapsto$ X₁X₂⋯X_k where X₁, X₂, ..., X_k are grammar symbols. If $\varepsilon$ $\not\in$ FIRST(X₁) then the first letter of a word generated from X₁X₂⋯X_k is the first letter of a word generated from X₁, and thus this production contributes FIRST(X₁) to FIRST(X). If $\varepsilon$ $\in$ FIRST(X₁) but $\varepsilon$ $\not\in$ FIRST(X₂) then the first letter of a word generated from X₁X₂⋯X_k is either the first letter of a word generated from X₁ or the first letter of a word generated from X₂. And so on for X₃, ..., X_k. This explains the nested for loop.
The last if statement states that if $\varepsilon$ belongs to all FIRST(X_i) then $\varepsilon$ must be in FIRST(X). This statement is necessary since the nested for loop cannot add $\varepsilon$ to FIRST(X) even if $\varepsilon$ belongs to all FIRST(X_i).
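The comments above can be turned into a short Python sketch. It is not Algorithm 6 itself: instead of computing FIRST(X) for one symbol, it computes the FIRST sets of all symbols together by repeating the scan of the productions until nothing changes. The names `first_sets`, `GRAMMAR` and `TERMINALS` are hypothetical, and the grammar encoded here is the expression grammar of Example 19.

```python
EPS = "ε"


def first_sets(grammar, terminals):
    """grammar maps each nonterminal to a list of right-hand sides (tuples)."""
    first = {t: {t} for t in terminals}          # trivial case: X is a terminal
    first.update({n: set() for n in grammar})
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[lhs])
                for symbol in rhs:               # X_1, X_2, ..., X_k in order
                    first[lhs] |= first[symbol] - {EPS}
                    if EPS not in first[symbol]:
                        break                    # later X_i cannot contribute
                else:
                    # every X_i can derive ε (this also covers the empty rhs)
                    first[lhs].add(EPS)
                if len(first[lhs]) != before:
                    changed = True
    return first


TERMINALS = {"+", "*", "(", ")", "id"}
GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}

FIRST = first_sets(GRAMMAR, TERMINALS)
for nonterminal in GRAMMAR:
    print(f"FIRST({nonterminal}) =", sorted(FIRST[nonterminal]))
```

The `for ... else` clause plays the role of the last if statement: $\varepsilon$ is added only when the inner loop runs through every X_i without stopping.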

Algorithm 7

[Algorithm 7 is a boxed pseudo-code image in the original. Recoverable fragments: the input is a string X = X₁⋯ of symbols, and the last statement is FIRST(X) := FIRST(X) $\cup$ { $\varepsilon$ }.]

The principle of Algorithm 7 is similar to that of Algorithm 6.

If $\varepsilon$ $\not\in$ FIRST(X₁) then the first letter of a word generated from X₁X₂⋯X_k must be the first letter of a word generated from X₁.
If $\varepsilon$ $\in$ FIRST(X₁) but $\varepsilon$ $\not\in$ FIRST(X₂) then the first letter of a word generated from X₁X₂⋯X_k is either the first letter of a word generated from X₁ or the first letter of a word generated from X₂.
And so on for X₃, ..., X_k.
$\varepsilon$ belongs to FIRST(X₁X₂⋯X_k) iff it belongs to each FIRST(X_i).
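The four observations above translate almost line by line into code. The following Python sketch is an illustration rather than the text of Algorithm 7. The function name `first_of_string` is hypothetical, and the FIRST sets of the individual symbols are hard-coded from the expression grammar of Example 19.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of the string X_1 X_2 ... X_k, given FIRST of each symbol."""
    result = set()
    for symbol in symbols:
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:
            return result        # this X_i cannot vanish, so stop here
    result.add(EPS)              # every X_i can derive ε (or the string is empty)
    return result


# FIRST sets of the symbols of Example 19 (terminals map to themselves).
FIRST = {
    "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"},
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}

print(sorted(first_of_string(("T", "E'"), FIRST)))
print(sorted(first_of_string(("T'", ")"), FIRST)))
```

For the string T')  the result contains both * and ), because T' can derive $\varepsilon$ and the scan then moves on to the next symbol.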

THE FOLLOW SETS. In order to give an algorithm for building the parsing table we define for every A $\in$ V_N

FOLLOW(A) = $\displaystyle \left\{ a \in V_T \, {\cup} \, \{ \$ \} \ \mid \ S\$ \stackrel{\ast}{\Longrightarrow} {\alpha} A a r, \ \ {\alpha} \in (V_T \, {\cup} \, V_N)^{\ast}, \ \ r \in (V_T \, {\cup} \, V_N)^{\ast} \, {\cup} \, \{ {\$} \, \varepsilon \} \right\}$

(42)

In other words FOLLOW(A) is the set of the terminals that can appear immediately to the right of the nonterminal A in some sentential form. Moreover $ belongs to FOLLOW(A) if A is the rightmost symbol in some sentential form.

COMPUTING THE FOLLOW SETS.

Algorithm 8

[Algorithm 8 is a boxed pseudo-code image in the original. Recoverable fragments: the input is a grammar G = (V_T, ...), and the last statement sets done := true.]

The FOLLOW sets of all nonterminal symbols are computed together by the following process:

Initially all these sets are empty, except FOLLOW(S), which contains $.
There exist three rules that can increase FOLLOW(B) for a given nonterminal B. These rules are:
1. if (A $\longmapsto$ $\alpha$ B $\beta$ ) then FOLLOW(B) := FIRST( $\beta$ ) $\setminus$ { $\varepsilon$ } $\cup$ FOLLOW(B)
2. if (A $\longmapsto$ $\alpha$ B) then FOLLOW(B) := FOLLOW(B) $\cup$ FOLLOW(A)
3. if (A $\longmapsto$ $\alpha$ B $\beta$ ) and $\beta$ $\;\stackrel{{\ast}}{{\Longrightarrow}}\;$ $\varepsilon$ then FOLLOW(B) := FOLLOW(B) $\cup$ FOLLOW(A)
Let us call a pass one attempt to apply each rule to each nonterminal.
Then the algorithm scheme can be stated as follows: repeat a pass until a pass does not change any of the FOLLOW sets.
To implement this process in Algorithm 8 we use two boolean auxiliary variables:
- done, which becomes true when a pass could not increase any of the FOLLOW sets.
- building, which becomes true during a pass if at least one FOLLOW set could be increased.
In the pseudo-code of Algorithm 8 the value | FOLLOW(B) | denotes the number of elements of the set FOLLOW(B).
One could wonder whether Algorithm 8 could run forever, but the process has to stop. Indeed, each FOLLOW set contains at most t + 1 symbols where t is the number of terminals (the extra symbol is $). Each successful pass increases the number of elements of at least one FOLLOW set by at least one, so there are at most n(t + 1) successful passes, where n is the number of nonterminals. One more pass, which changes nothing, then ends the loop.
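The pass-based process described above can be sketched in Python. This is an illustration under assumptions, not the pseudo-code of Algorithm 8: it uses a single flag `done` instead of the pair done / building, the names `follow_sets` and `first_of_string` are hypothetical, and the grammar and FIRST sets are those of Example 20 (the if-then-else grammar), hard-coded for brevity.

```python
EPS, END = "ε", "$"


def first_of_string(symbols, first):
    """FIRST of a string of symbols, given FIRST of each symbol."""
    result = set()
    for symbol in symbols:
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:
            return result
    result.add(EPS)
    return result


def follow_sets(grammar, first, start):
    follow = {n: set() for n in grammar}
    follow[start].add(END)               # initially only FOLLOW(S) is non-empty
    done = False
    while not done:
        done = True                      # assume this pass changes nothing
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, symbol in enumerate(rhs):
                    if symbol not in grammar:
                        continue         # only nonterminals have FOLLOW sets
                    before = len(follow[symbol])
                    tail = first_of_string(rhs[i + 1:], first)
                    follow[symbol] |= tail - {EPS}       # rule 1
                    if EPS in tail:                      # rules 2 and 3
                        follow[symbol] |= follow[lhs]
                    if len(follow[symbol]) != before:
                        done = False     # a set grew, so another pass is needed
    return follow


GRAMMAR = {
    "S": [("i", "E", "t", "S", "S'"), ("a",)],
    "S'": [("e", "S"), ()],
    "E": [("b",)],
}
FIRST = {
    "i": {"i"}, "t": {"t"}, "e": {"e"}, "a": {"a"}, "b": {"b"},
    "S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"},
}

for nonterminal, symbols in follow_sets(GRAMMAR, FIRST, "S").items():
    print(f"FOLLOW({nonterminal}) =", sorted(symbols))
```

Rule 2 needs no separate case here: when B is the last symbol, the remaining string is empty, its FIRST set is { $\varepsilon$ }, and the test for rule 3 fires.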

Example 18 Consider the following grammar (with terminals a, b, c and nonterminals S, A)

S	$\longmapsto$	aAa \| bAa \| $\varepsilon$
A	$\longmapsto$	cA \| bA \| $\varepsilon$

The FIRST sets of the right side of the productions are given by

FIRST(aAa)	=	{a}
FIRST(bAa)	=	{b}
FIRST(cA)	=	{c}
FIRST(bA)	=	{b}
FIRST( $\varepsilon$ )	=	{ $\varepsilon$ }

The FOLLOW sets of the left side of the productions are given by

FOLLOW(S)	=	{$}
FOLLOW(A)	=	{a}

COMPUTING THE PARSING TABLE.

Algorithm 9

[Algorithm 9 is a boxed pseudo-code image in the original. Recoverable fragments: the input is a grammar G = (V_T, ...), and the last statement is M[A,a] := error.]

Algorithm 9 consists of three main steps.

The first for loop initializes each entry of the parsing table M to the empty set.
The second for loop fills the parsing table M by using the following rules.
1. If A $\longmapsto$ $\alpha$ with $\alpha$ $\neq$ $\varepsilon$ is a production and if a is a terminal such that a $\in$ FIRST( $\alpha$ ) then the production A $\longmapsto$ $\alpha$ is added to M[A, a].
2. If $\varepsilon$ $\in$ FIRST( $\alpha$ ) (which means $\alpha$ $\;\stackrel{{\ast}}{{\Longrightarrow}}\;$ $\varepsilon$ ) then the production A $\longmapsto$ $\alpha$ is added to M[A, b] for every b $\in$ FOLLOW(A). In particular if $\varepsilon$ $\in$ FIRST( $\alpha$ ) and $ $\in$ FOLLOW(A) then the production A $\longmapsto$ $\alpha$ is added to M[A,$].
The third for loop sets to error every empty entry of the parsing table M.
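The two filling rules can be sketched in Python as follows. This is an illustration, not the pseudo-code of Algorithm 9: the table is a dictionary whose missing keys play the role of error entries, so the first and third loops disappear. The names `build_table` and `first_of_string` are hypothetical, and the grammar, FIRST sets and FOLLOW sets are those of Example 19, hard-coded for brevity.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a string of symbols, given FIRST of each symbol."""
    result = set()
    for symbol in symbols:
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Return M as a dict: (nonterminal, token) -> set of right-hand sides."""
    table = {}
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            firsts = first_of_string(rhs, first)
            for a in firsts - {EPS}:                   # rule 1
                table.setdefault((lhs, a), set()).add(rhs)
            if EPS in firsts:                          # rule 2
                for b in follow[lhs]:
                    table.setdefault((lhs, b), set()).add(rhs)
    return table


GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = {
    "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"},
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}

M = build_table(GRAMMAR, FIRST, FOLLOW)
for (nonterminal, token), rhss in sorted(M.items()):
    for rhs in rhss:
        print(f"M[{nonterminal}, {token}]: {nonterminal} -> {' '.join(rhs) or EPS}")
```

Each entry is kept as a set of right-hand sides, matching the wording "is added to M[A, a]", so an entry that receives two productions remains visible instead of being overwritten.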

Example 19 Consider the following grammar (with terminals +, *, (, ), $\bf id$ and nonterminals E, E', T, T', F)

E	$\longmapsto$	TE'
E'	$\longmapsto$	+ TE' \| $\varepsilon$
T	$\longmapsto$	FT'
T'	$\longmapsto$	*FT' \| $\varepsilon$
F	$\longmapsto$	(E) \| $\bf id$

The FIRST sets of the right side (and some of their prefixes) of the productions are given by

FIRST(F)	=	{(, $\bf id$ }
FIRST(T)	=	{(, $\bf id$ }
FIRST(T')	=	{ * , $\varepsilon$ }
FIRST(E)	=	{(, $\bf id$ }
FIRST(E')	=	{ + , $\varepsilon$ }
FIRST(TE')	=	{(, $\bf id$ }
FIRST(FT')	=	{(, $\bf id$ }
FIRST((E))	=	{(}
FIRST(+ TE')	=	{ + }
FIRST(*FT')	=	{ * }

The FOLLOW sets of the left side of the productions are given by

FOLLOW(E)	=	{),$}
FOLLOW(E')	=	{),$}
FOLLOW(T)	=	{ + ,),$}
FOLLOW(T')	=	{ + ,),$}
FOLLOW(F)	=	{ + , * ,),$}

The parsing table is:

	$\bf id$	+	*	(	)	$
E	E $\longmapsto$ TE'			E $\longmapsto$ TE'
E'		E' $\longmapsto$ + TE'			E' $\longmapsto$ $\varepsilon$	E' $\longmapsto$ $\varepsilon$
T	T $\longmapsto$ FT'			T $\longmapsto$ FT'
T'		T' $\longmapsto$ $\varepsilon$	T' $\longmapsto$ *FT'		T' $\longmapsto$ $\varepsilon$	T' $\longmapsto$ $\varepsilon$
F	F $\longmapsto$ $\bf id$			F $\longmapsto$ (E)

Example 20 Consider the following grammar (with terminals a, b, e, i, t and nonterminals S, S', E)

S	$\longmapsto$	i E t S S' \| a
S'	$\longmapsto$	e S \| $\varepsilon$
E	$\longmapsto$	b

The FIRST sets of the right side of the productions are given by

FIRST( i E t S S')	=	{i}
FIRST(a)	=	{a}
FIRST(e S)	=	{e}
FIRST( $\varepsilon$ )	=	{ $\varepsilon$ }
FIRST(b)	=	{b}

Let's sketch the computation of the FOLLOW sets. Recall that the rules are:

if	A $\longmapsto$ $\alpha$ B $\beta$	then	FOLLOW(B) := FIRST( $\beta$ ) $\setminus$ { $\varepsilon$ } $\cup$ FOLLOW(B)
if	A $\longmapsto$ $\alpha$ B	then	FOLLOW(B) := FOLLOW(B) $\cup$ FOLLOW(A)
if	A $\longmapsto$ $\alpha$ B $\beta$ and $\beta$ $\;\stackrel{{\ast}}{{\Longrightarrow}}\;$ $\varepsilon$	then	FOLLOW(B) := FOLLOW(B) $\cup$ FOLLOW(A)

(43)

	Initialization	S $\longmapsto$ iEtSS' (first rule)	S $\longmapsto$ iEtSS' (second rule)
FOLLOW(S)	{$}	{e,$}	{e,$}
FOLLOW(S')	$\emptyset$	$\emptyset$	{e,$}
FOLLOW(E)	$\emptyset$	{t}	{t}

This leads to the following parsing table.

	a	b	e	i	$	t
S	S $\longmapsto$ a			S $\longmapsto$ iEtSS'
S'			S' $\longmapsto$ $\varepsilon$, S' $\longmapsto$ eS		S' $\longmapsto$ $\varepsilon$
E		E $\longmapsto$ b

The entry M[S', e] contains both:

S' $\longmapsto$ eS (since e $\in$ FIRST(e S))
S' $\longmapsto$ $\varepsilon$ (since $\varepsilon$ $\in$ FIRST( $\varepsilon$ ) and since e $\in$ FOLLOW(S'))

We know that this grammar is ambiguous. The ambiguity shows up here as a choice between two productions when an e (else) is seen.
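Entries holding more than one production can be found mechanically. The short Python sketch below is only an illustration: the name `multiply_defined` is hypothetical, the table of Example 20 is typed in by hand, each entry is a set of right-hand sides, and the empty tuple stands for $\varepsilon$.

```python
# Parsing table of Example 20: (nonterminal, token) -> set of right-hand sides.
# The empty tuple stands for ε; absent keys are error entries.
TABLE = {
    ("S", "a"): {("a",)},
    ("S", "i"): {("i", "E", "t", "S", "S'")},
    ("S'", "e"): {("e", "S"), ()},
    ("S'", "$"): {()},
    ("E", "b"): {("b",)},
}


def multiply_defined(table):
    """Return the entries of the table that hold more than one production."""
    return {key: rhss for key, rhss in table.items() if len(rhss) > 1}


for (nonterminal, token), rhss in multiply_defined(TABLE).items():
    choices = sorted(" ".join(rhs) or "ε" for rhs in rhss)
    print(f"M[{nonterminal}, {token}] holds: {choices}")
```

For this table the only entry reported is M[S', e].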

ERROR RECOVERY IN PREDICTIVE PARSING. A predictive parser attempts to match the nonterminals and the terminals in the stack with the remaining input. Therefore two types of conflicts can occur.

T-conflict: A terminal appearing on top of the stack does not match the following input token.
N-conflict: For a nonterminal B on top of the stack and the lookahead token b the entry M[B, b] of the parsing table is empty.

Panic-mode recovery is based on the idea of skipping symbols on the input string until a token in a selected set of synchronizing tokens appears. These synchronizing sets should be chosen such that the parser recovers quickly from errors that are likely to occur in practice. Here are some strategies for the above conflict cases.

T-conflict.

Skip (= ignore and advance) the token in the input string. Hence the synchronizing set consists here of all other tokens.

N-conflict.

Possible solutions.

Skip (= ignore and advance) the token b in the input string.
If M[B, b] is a blank entry labeled synch then skip (= pop and ignore) the nonterminal B. (Strictly speaking and according to the above definition, this is not a panic-mode recovery, but quite close in the spirit.)
Skip tokens from the input string until an element of FIRST(B) is reached, then continue parsing normally. So the synchronizing set here is FIRST(B).
Skip tokens from the input string until an element of FOLLOW(B) is reached, then skip B and continue parsing normally. (Again this is a variation of panic-mode recovery.)
Delimiters such as ; in C can be added to the two previous synchronizing sets.
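Some of these strategies can be combined in a small Python sketch. It is one possible arrangement, not a prescribed one, and it rests on several assumptions: the grammar and table are those of Example 17, the FOLLOW sets are typed in by hand, the name `parse_with_recovery` is hypothetical, and at the end of the input (where no token is left to skip) the symbol on top of the stack is dropped instead. On an N-conflict the parser skips tokens until M[B, b] is defined, or drops B when the lookahead is in FOLLOW(B).

```python
# Parsing table of Example 17; the empty tuple stands for ε.
TABLE = {
    ("S", "a"): ("a", "A", "a"),
    ("S", "b"): ("B", "A", "a"),
    ("S", "$"): (),
    ("A", "a"): (),
    ("A", "b"): ("b", "A"),
    ("A", "c"): ("c", "A"),
    ("B", "b"): ("b",),
}
NONTERMINALS = {"S", "A", "B"}
FOLLOW = {"S": {"$"}, "A": {"a"}, "B": {"a", "b", "c"}}


def parse_with_recovery(word):
    """Parse word and return the list of error messages (empty if none)."""
    tokens = list(word) + ["$"]
    stack = ["$", "S"]            # the top of the stack is the end of the list
    pos, errors = 0, []
    while stack:
        top, look = stack[-1], tokens[pos]
        if top in NONTERMINALS:
            if (top, look) in TABLE:
                stack.pop()
                stack.extend(reversed(TABLE[top, look]))
            elif look in FOLLOW[top] or look == "$":
                errors.append(f"N-conflict: dropped {top} at position {pos}")
                stack.pop()       # skip (= pop and ignore) the nonterminal
            else:
                errors.append(f"N-conflict: skipped {look!r} at position {pos}")
                pos += 1          # skip (= ignore and advance) the token
        elif top == look:
            stack.pop()           # match; the end marker is never advanced past
            if top != "$":
                pos += 1
        elif look == "$":
            # Assumption: no token is left to skip, so drop the terminal.
            errors.append(f"T-conflict: missing {top!r} at end of input")
            stack.pop()
        else:
            errors.append(f"T-conflict: skipped {look!r} at position {pos}")
            pos += 1
    return errors


for word in ("bcba", "axa", "bcb"):
    print(repr(word), parse_with_recovery(word))
```

The loop always terminates on this grammar: every step either advances in the input, which is finite, or pops a symbol without pushing, or expands a nonterminal of a grammar without left recursion.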


Marc Moreno Maza
2004-12-02