CS 4447 Compiler Theory
Assignment 2

Department of Computer Science
University of Western Ontario
December 3, 2010

This assignment is due Friday, December 3. It has a total value of 250 points, so it is possible to get upto 300/250 on it.

Please see the course outline for information on the late assignment penalty and the rules of ethical conduct.

Completed assignments should be packages as a .tar file, placed in a web-accessible location, and the URL E-mailed to pbourdyk@csd.uwo.ca and watt@csd.uwo.ca. You will receive a confirmation when your assignment has been uploaded.


The directory http://www.csd.uwo.ca/~watt/home/courses/2010-11/cs4447a/parts/classes has been updated to contain the following files that you may use and adapt for this assignment:

Top Level
Front End
C Code representation and tools
C type representation and tools
Utilities and tools

In some cases these classes are quite complete, and in others there are parts you still have to complete.

Question 1: The Symbol Table (60 points)

Write a visitor class that traverses a CCode object and determines where each identifier is bound. As you enter a scope (e.g. a file, a function body, a compound statement, a struct delcaration) you should create a table to keep track of the identifiers bound there. These tables should be kept as annotations on the relevant nodes of the CCode tree.

You may decide to keep separate tables for labels, struct declarations, variables etc, or you may decide to keep them all in one table with different kinds of entries. Remember, though, that it is legal in C to have a struct and an id and a label all with the same spelling. Each table should point to the relevant table in the enclosing scope.

As your visitor traverses the CCode tree, it should add entries to the relevant table when it sees id declarations, and look up entries in the table when it sees id uses. At the end of the traversal, each instance of every identifier should be annotated by a table entry that can be updated with information about that identifier. The declaration and each use of the same bound name should refer to the same table entry object. This fact will be used in questions 2 and 3.

Your program should report undeclared and multiply declared variables.

Question 2: Message Handling (40 points)

Create a MessageHandler class that can deal with different kinds of messages. It should handle at least three kinds, warnings, recoverable errors and fatal errors.

Modify your scanner and parser so that each generated CCode node has its source line number (if it hasn't got one already). You don't need to keep track of the input file or of the column number, although in a real compiler you would.

Modify your answer to question 1 so that if an identifier is undeclared or multiply declared, then you add a message to the message handler rather then aborting.

Add a reporting step at the end of your main program that sorts the messages in the MessageHandler by their line number and then prints them.

Question 3: Type Inference (70 points)

Write a second visitor class to traverse CCode trees. It should assume that the tree is properly annotated with symbol table nodes on the scope constructs and table entries on the identifiers. This visitor should add a CType annotation to each node that is an instance of CCodeExpr (or any of its subclasses). At the end of the traversal, each node in every expression (but not each node in statements or declarations) should have a type annotation. You should report type-errors in the input program.

Question 4: Intermediate Code (80 points)

Write a program (in the style of your choice) that converts a type-inferred CCode tree to LLVM code. Additional support material will be available on request from pbourdyk@csd.uwo.ca.

What to hand in

For each of questions 1-4 you should hand in a directory with full sources, documentation, test inputs and test outputs.

Question 5: Dataflow and Register Allocation (50 points)

Questions 5.1-5.4 are exercises from the text "Compilers: Principles, Techniques, & Tools" Second Edition, by Aho, Lam, Sethi and Ullman (Addison Wesley 2007).

Question 5.1: Flow Graphs (12 points)

Exercise 8.4.1 on page 531.

Question 5.2: Loops (12 points)

Exercises 9.6.1 and 9.6.2 on pages 669--670.

Question 5.3: Dataflow Analysis (12 points)

Exercises 9.2.1 and 9.2.2 on page 615.

Question 5.4: Register Allocation (12 points)

Exercise 8.8.1 on page 557.