Next: Abstract Stack Machines
Up: Compiler Theory: Intermediate Code Generation
Previous: Compiler Theory: Intermediate Code Generation
AN INTERMEDIATE REPRESENTATION (IR) is a language for an abstract machine
(or a language that can be easily evaluated by an abstract machine)
- It should not include too much machine specific detail.
- It provides a separation between front and back ends which helps compiler portability.
Code generation and assignment of temporaries to registers are clearly separated
from semantic analysis.
- It allows optimization independent of the target machine.
Intermediate representations are by design more abstract and uniform,
so optimization routines are simpler.
Properties of good intermediate representations.
- Convenient to produce in the semantic analysis phase.
- Convenient to translate into code for the desired target architecture.
- Each construct must have a clear and simple meaning such that
optimizing transformations that rewrite the IR can be easily specified and implemented.
TREE-BASED INTERMEDIATE REPRESENTATIONS.
The most obvious way to present the information gained from lexical and syntax analysis is a syntax tree.
In C, the representation can be handled using a struct for each node:
struct node {
int nodetype;
struct node *field1;
struct node *field2;
...................
}
- The nodetype field contains a small integer to identify the
nonterminal corresponding to the node.
- The other fields point to subtrees.
- Unused fields may have null pointers.
Here's a C routine for allocating storage for the nodes.
struct node *mknode (int type,
struct node *f1,
struct node *f2,
struct node *f3)
{
struct node *t = (struct node *) malloc(sizeof(struct node));
t -> nodetype = type;
t -> field1 = f1;
t -> field2 = f2;
t -> field3 = f3;
return t;
}
The routine mknode can be used to build the syntax tree using
a syntax-directed definition where each non-terminal E has a
synthesized attribute E.ptr that points to the root of its syntax tree.
In practice we do not hold individual characters (letters or characters)
at leaves of the tree:
- i.e. we do not build subtrees for characters of a variable name.
- The lexer groups these together into strings or integers.
- thus we can use string or integer fields in the struct.
We can extend our definition of the struct node from earlier to include name and value fields
struct node {
int nodetype;
struct node *field1;
struct node *field2;
struct node *field3;
char *name;
int value;
}
Example 1
A syntax-directed definition to produce syntax trees for assignment statements
could be
Production |
Semantic Rule |
S : = E |
S.ptr := make_node('
', make_leaf(,
.entry), E.ptr) |
E E1 + E2 |
E.ptr := make_node('
, E1.ptr, E2.ptr) |
E E1*E2 |
E.ptr := make_node('
, E1.ptr, E2.ptr) |
E - E1 |
E.ptr := make_node('
', E1.ptr) |
E (E1) |
E.ptr := E1.ptr |
E |
E.ptr := make_leaf(,
.entry) |
POSTFIX NOTATION is a linearized representation of a syntax tree.
- It is a list of the nodes of the tree in which every node appears
immediately after its children.
- The edges of a syntax tree do not appear explicitely in postfix
notation.
- However they can be recovered from
- the order in which the nodes appear and
- the number of operands that the operator at a node expects.
- The recovery of edges is similar to the evaluation, using a stack,
of an expression in postfix notation.
Next: Abstract Stack Machines
Up: Compiler Theory: Intermediate Code Generation
Previous: Compiler Theory: Intermediate Code Generation
Marc Moreno Maza
2004-12-02