Next: Translation Schemes Up: Compiler Theory: Syntax-Directed Translation Previous: Syntax-Directed Definitions

Syntax Trees for Expressions

In this section, we consider the case where

the input language consists of arithmetic expressions (in a general sense that we will make precise below) like arithmetic expressions in the usual sense, or regular expressions, or S-expressions, etc.
the output intermediate representation for an input sentence is a tree reflecting the syntax structure of the sentence.

In this case,

we will forbid inherited attributes,
we will simplify the constructions of the parse tree and the annotated tree,
and we will avoid the construction of the dependency graph.

ARITHMETIC EXPRESSIONS. A language L over an alphabet $\Sigma$ is a language of arithmetic expressions (or expressions for short) if there exists

a partition $\Sigma$ = $\Sigma_{{{op}}}^{{}}$ $\cup$ $\Sigma_{{{val}}}^{{}}$ $\cup$ { $\bf ($ , $\bf )$ },
a grammar G generating L such that each production has one of the forms
- A $\longmapsto$ a for a $\in$ $\Sigma_{{{val}}}^{{}}$
- A $\longmapsto$ (B)
- A $\longmapsto$ $\alpha_{{1}}^{{}}$ $\alpha_{{2}}^{{}}$ ,... $\alpha_{{n}}^{{}}$ where n $\geq$ 2 and where the $\alpha_{{i}}^{{}}$ 's are grammar symbols and such that at most one of them belongs to $\Sigma_{{{op}}}^{{}}$ , the other symbols being nonterminals.

The elements of $\Sigma_{{{op}}}^{{}}$ are called operators and those of $\Sigma_{{{val}}}^{{}}$ are called values. The nonterminals on the right side of a production where an operator appears are called operands of the production.

The language L is said

binary infix if every production has the form A $\longmapsto$ a for a $\in$ $\Sigma_{{{val}}}^{{}}$ , A $\longmapsto$ (B) or A $\longmapsto$ BfC where f $\in$ $\Sigma_{{{op}}}^{{}}$ .
binary postfix if every production has the form A $\longmapsto$ a for a $\in$ $\Sigma_{{{val}}}^{{}}$ , A $\longmapsto$ (B) or A $\longmapsto$ BCf where f $\in$ $\Sigma_{{{op}}}^{{}}$ .
binary prefix if every production has the form A $\longmapsto$ a for a $\in$ $\Sigma_{{{val}}}^{{}}$ , A $\longmapsto$ (B) or A $\longmapsto$ fBC where f $\in$ $\Sigma_{{{op}}}^{{}}$ .

KEY IDEAS.

The first idea is to condense the double construction parse tree + annotated tree in a single construction. For arithmetic expressions this is quite easy as we shall see
The second idea is to avoid the duplications of nodes that stand for two (syntactically) equal subexpressions. For (arithmetic) expressions this is again quite easy by using dags.

CONSTRUCTING SYNTAX TREES FOR EXPRESSIONS. Each node in a syntax tree for an (arithmetic) expression is a record with several fields.

In the node for an operator, one field identifies the operator and the remaining fields contain pointers to the nodes of the operands. The operator is often called the label of the node.
In the leaf for an identifier, one field (the label) indicates that this is an identifier and another field is a pointer to the symbol-table entry for this identifier.
In the leaf for a value, one field (the label) indicates that this is a value (may be this field gives the type of the value) and another field is the value (or a pointer to this value).

We use here the following functions to create the nodes of syntax trees for expressions with binary operators. Each function returns a pointer to a newly created node.

make_node: (op, left, right) creates an operator node with label op and two fields containing pointers to left and right.
make_leaf: ( $\bf id$ , entry) creates an identifier leaf with label $\bf id$ and a field containing entry a pointer to the symbol-table entry for the identifier.
make_leaf: ( $\bf num$ , val) creates a number leaf with label $\bf num$ and a field containing val.

Example 7 The following syntax-directed definition allows the construction of a syntax tree for a binary infix arithmetic expression. The synthesized attributes E.ptr, T.ptr and F.ptr keep track of the pointers returned by the function calls.

Production	Semantic Rule
E $\longmapsto$ E₁ + T	E.ptr := make_node( $\bf +$ , E₁.ptr, T.ptr)
E $\longmapsto$ E₁ - T	E.ptr := make_node( $\bf -$ , E₁.ptr, T.ptr)
E $\longmapsto$ T	E.ptr := T.ptr
T $\longmapsto$ T₁F*	T.ptr := make_node( $\bf $ , T₁.ptr, F.ptr)*
T $\longmapsto$ F	T.ptr := F.ptr
F $\longmapsto$ (E)	F.ptr := E.ptr
F $\longmapsto$ $\bf id$	F.ptr := make_leaf( $\bf id$ , $\bf id$ .entry)
F $\longmapsto$ $\bf num$	F.ptr := make_leaf( $\bf num$ , $\bf num$ .val)

USING DAGS FOR SYNTAX TREES OF EXPRESSIONS.

A dag for an expression identifies the common subexpressions in the expression.
Like a syntax tree, a dag has a node for every subexpression of the expression.
The difference is that a node in a dag representing a common subexpression has more than one parent.
Thus, before constructing a node with label op and fields with pointers to left and right, the operation make_node(op, left, right) checks if such a node has already been constructed.

**Figure 10:** A dag for the expression a + a*(b - c) + (b - c)*d.
$\begin{figure}\htmlimage \centering\includegraphics[scale=.4]{dagForArithmeticSubexpression.eps} \end{figure}$

IMPLEMENTING DAGS FOR SYNTAX TREES OF EXPRESSIONS.

To construct dags for expressions in an efficient manner one could use an array of records.
To avoid (explicit) pointers we can refer to a node by its index (or position) in this array of records.
Each record implements a node (or leaf). For an operator node this record will contain
- the label of the operator,
- the indices of the children.
This kind of representations allows search for existing subexpressions in linear time.
This can be improved by using hash-tables.
- A hash-table will be in our case an array of buckets where a bucket is a linked list of nodes (each node being represented by a record as in the previous data structure).
- We assume that we are given a hash-function h which converts a label into a non-negative machine integer.
- To perform make_node(op, left, right) we first compute h(op) which returns the bucket number in which to search the node (and may be store the node if not yet present).
- In practice we expect each bucket to contain a small number of nodes.
- If $\alpha$ is the expected maximum length of a bucket, then search/insertion of nodes can be done in $\cal {O}$ ( $\alpha$ ) rather than in $\cal {O}$ (n) where n is the number of nodes currently stored.

Next: Translation Schemes Up: Compiler Theory: Syntax-Directed Translation Previous: Syntax-Directed Definitions

Marc Moreno Maza
2004-12-02