Bottom-up parsing

Example 21 Consider the grammar (with terminals a, b, c, d, e and nonterminals S, A, B)

S	$\longmapsto$	aABe
A	$\longmapsto$	Abc \| b
B	$\longmapsto$	d

Consider the input word w = abbcde. We reduce w to S by repeating the following algorithm.

Choose a string such that
- $\beta$ is the leftmost factor of w such that
- there exists a production A $\longmapsto$ $\beta$ .
Replace $\beta$ by A in w.

Warning! This algorithm is a first approach. (True algorithms for botton-up parsing do not follow exactly this scheme.) Well, our first-approach algorithms leads to the following sequence of transformations in the first column of the table below. The second column gives the production that could be used in order to rewrite the current word backwards. The last column gives the chosen production according to the above algorithm.

Current string	Qualifying productions	Chosen production
abbcde	A $\longmapsto$ b
	B $\longmapsto$ d	A $\longmapsto$ b
aAbcde	A $\longmapsto$ Abc
	A $\longmapsto$ b
	B $\longmapsto$ d	A $\longmapsto$ Abc
aAde	B $\longmapsto$ d	B $\longmapsto$ d
aABe	S $\longmapsto$ aABe	S $\longmapsto$ aABe
S

Observe that we have

S $\displaystyle \;\stackrel{{{\ rm }}}{{\Longrightarrow}}\;$ aABe $\displaystyle \;\stackrel{{{\ rm }}}{{\Longrightarrow}}\;$ aAde $\displaystyle \;\stackrel{{{\ rm }}}{{\Longrightarrow}}\;$ aAbcde $\displaystyle \;\stackrel{{{\ rm }}}{{\Longrightarrow}}\;$ abbcde

(44)

where $\alpha$ $\;\stackrel{{{\ rm }}}{{\Longrightarrow}}\;$ $\beta$ means that $\beta$ derives from $\alpha$ by a rightmost derivation.

BOTTOM-UP PARSING constructs a parse tree for an input string beginning at the leaves and working up towards the root.

HANDLE. Let G = (V_T, V_N, S, P) be a context-free grammar. Let n be a positive integer, let w, $\beta$ be two strings of symbols and let A be a nonterminal such that A $\longmapsto$ $\beta$ is a production of G. We say that (A $\longmapsto$ $\beta$ , n) is a handle for w if there exists a string of symbols $\gamma$ such that

More informally, a handle of w is a substring $\beta$ that matches a right side of a production A $\longmapsto$ $\beta$ such that the reduction of the handle corresponds to a step of the reverse of a rightmost derivation.

HANDLE PRUNING is the general approach used in shift-and-reduce parsing. The implementation of handle pruning involves the following data-structures:

Example 25 Consider the grammar G

E	$\longmapsto$	E + E
E	$\longmapsto$	EE*
E	$\longmapsto$	(E)
E	$\longmapsto$	$\bf id$

A rigthmost derivation of the sentence w = $\bf id$ + $\bf id$ * $\bf id$ is

S	$\displaystyle \;\stackrel{{{\ rm }}}{{\Longrightarrow}}\;$	E + E
	$\displaystyle \;\stackrel{{{\ rm }}}{{\Longrightarrow}}\;$	E + E*E
	$\displaystyle \;\stackrel{{{\ rm }}}{{\Longrightarrow}}\;$	E + E* $\displaystyle \bf id$
	$\displaystyle \;\stackrel{{{\ rm }}}{{\Longrightarrow}}\;$	E + $\displaystyle \bf id$ * $\displaystyle \bf id$
	$\displaystyle \;\stackrel{{{\ rm }}}{{\Longrightarrow}}\;$	$\displaystyle \bf id$ + $\displaystyle \bf id$ * $\displaystyle \bf id$

(46)

Let us sketch the actions a shift-and-reduce parser might take in parsing w to obtain the above rigthmost derivation. (Another shift-and-reduce parser might obtain another rigthmost derivation of w w.r.t. G.)

Stack	Input	Action to take
$	$\bf id$ + $\bf id$ $\bf id$ $*	shift
$ $\bf id$	+ $\bf id$ $\bf id$ $*	reduce with E $\longrightarrow$ $\bf id$
$E	+ $\bf id$ $\bf id$ $*	shift
$E +	$\bf id$ $\bf id$ $*	shift
$E + $\bf id$	* $\bf id$ $	reduce with E $\longrightarrow$ $\bf id$
$E + E	* $\bf id$ $	shift
$E + E*	$\bf id$ $	shift
$E + E $\bf id$*	$	reduce with E $\longrightarrow$ $\bf id$
$E + EE*	$	reduce with E $\longrightarrow$ EE*
$E + E	$	reduce with E $\longrightarrow$ E + E
$E	$	accept

Of course the decisions (especially for the 4th shift) made in the above table are not explained at this point.

Proof. This fact becomes clear when considering the possible forms of two successive steps in any rightmost derivation. These two steps can be of the form (type 1)

S $\displaystyle \;\stackrel{{{\ rm } \ {\ast}}}{{\Longrightarrow}}\;$ $\displaystyle \alpha$ A z $\displaystyle \;\stackrel{{{\ rm } \ {\ast}}}{{\Longrightarrow}}\;$ $\displaystyle \alpha$ $\displaystyle \beta$ B y z $\displaystyle \;\stackrel{{{\ rm } \ {\ast}}}{{\Longrightarrow}}\;$ $\displaystyle \alpha$ $\displaystyle \beta$ $\displaystyle \gamma$ y z

(47)

or the form (type 2)

S $\displaystyle \;\stackrel{{{\ rm } \ {\ast}}}{{\Longrightarrow}}\;$ $\displaystyle \alpha$ B x A z $\displaystyle \;\stackrel{{{\ rm } \ {\ast}}}{{\Longrightarrow}}\;$ $\displaystyle \alpha$ B x y z $\displaystyle \;\stackrel{{{\ rm } \ {\ast}}}{{\Longrightarrow}}\;$ $\displaystyle \alpha$ $\displaystyle \gamma$ x y z

(48)

In type 1, first the rightmost nonterminal A is replaced by the right side of the production A $\longrightarrow$ $\beta$ B y and then the rightmost nonterminal B is replaced by the right side of the production B $\longrightarrow$ $\gamma$ .
In type 2, the rightmost nonterminal A is replaced first by use of the rule A $\longrightarrow$ y (where y is a string of one or more terminals) and then B is replaced by using B $\longrightarrow$ $\gamma$ .

We proceed by induction. Let us assume that the handle B $\longrightarrow$ $\gamma$ appears on top of the stack. Then let us show that the handle A $\longrightarrow$ $\beta$ B y in type 1 (and the handle A $\longrightarrow$ y in type 2) appears on top also. Let us consider type 1 in reverse, where a shift-and-reduce parser has just reached the following configuration.

Stack	Input
$\alpha$ $\beta$ $\gamma$	y z

The parser now reduces the handle.

Stack	Input
$\alpha$ $\beta$ B	y z

Since B is the rightmost nonterminal, the right end of the handle of w = $\alpha$ $\beta$ Byz cannot be to the left of B. Therefore the parser can shift y.

Stack	Input
$\alpha$ $\beta$ B y	z

For case 2 in reverse, assume the following configuration has been reached

Stack	Input
$\alpha$ $\gamma$	x y z

Reduction leads to

Stack	Input
$\alpha$ B	x y z

Again B is the rightmost nonterminal so it needs to be involved.

$\qedsymbol$

S	$\longmapsto$	AC \| BC
A	$\longmapsto$	a
B	$\longmapsto$	a
C	$\longmapsto$	c