The Compiler Course

Frequently Asked Questions

Joseph Bergin, Pace University, New York


This list of questions and answers will change frequently. (It is not currently up to date.)

When I unzip the course files I don't get the correct directory structure. What do I do?

If you use a command line unzipper, use the -d option. Otherwise, make sure your unzipper's preferences are set to preserve directories. Also make sure that the zip program you use preserves long file names. Some older programs truncate them to DOS (8.3) names, but Java requires the long names.
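If no suitable unzipper is at hand, Java itself can do the extraction. The following is a minimal sketch, not part of the course distribution, that uses java.util.zip to recreate every directory in the archive and keep the long file names. The class name CourseUnzipper is hypothetical, and it needs Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: extract a zip archive under targetDir,
// preserving its directory structure and long file names.
class CourseUnzipper {
    static void extract(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (InputStream raw = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(raw)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Full relative path of the entry, e.g. "gcl/Parser.frame".
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries that would land outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Bad zip entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    // Create parent folders first so the structure is preserved.
                    Files.createDirectories(out.getParent());
                    // Copies bytes until the end of the current entry.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
                zip.closeEntry();
            }
        }
    }
}
```

For example, CourseUnzipper.extract(Paths.get("course.zip"), Paths.get("C:/courses")) would rebuild the archive's folders under C:/courses (both paths are placeholders).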

How should I set up my working directory? (Obsolete - See Eclipse instructions, elsewhere)

For the small project you want a directory that contains the java file and the SAM2 and MACC2 executables. This can be anywhere on your disk. It should NOT be within the java distribution directory or the KAWA directory, however. You also want the test files in this directory. After you build your compiler you will have a subdirectory named micro as well.

For the large project you need a directory that contains all of the java files for that project as well as the SAM2 and MACC2 executables. You also need the newcoco.jar file and the batch file that executes it (newcoco.bat). You need your grammar file GCL.atg as well as any tests you want to run at the current time. After you build your compiler you will have a subdirectory named gcl as well. Finally, you will need the Parser.frame and Scanner.frame that were distributed in the gcl directory of the download. NOTE that these are NOT the same as the files with the same name in the COCOSources directory. Leave those where they are.
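A quick way to confirm the large-project working directory is complete is a small checker. This is a minimal sketch under the assumption that the file names listed above are exact; the SAM2 and MACC2 executables are left out because their file names differ by platform, so add them yourself. SetupCheck is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical checker: report which large-project files are missing.
class SetupCheck {
    // Names taken from the FAQ; add your SAM2/MACC2 executable names here.
    static final List<String> LARGE_PROJECT_FILES = Arrays.asList(
            "newcoco.jar", "newcoco.bat", "GCL.atg", "Parser.frame", "Scanner.frame");

    static List<String> missing(File workingDir, List<String> required) {
        List<String> result = new ArrayList<>();
        for (String name : required) {
            // isFile() is case sensitive on UNIX, case insensitive on Windows.
            if (!new File(workingDir, name).isFile()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        List<String> absent = missing(dir, LARGE_PROJECT_FILES);
        System.out.println(absent.isEmpty() ? "All files present." : "Missing: " + absent);
    }
}
```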

I have tried compiling MicroGCL in KAWA, but when I try to run it I get "exception in main".

You probably don't have the options set correctly in the Customize/Project Root Options menu. The first option (-d) needs to have a checkmark in the box and a period in the text field. Look again at the KAWA setup instructions.

You also may want/need to set the command line arguments (the input and output file names in microgcl, for example) in the Project/Interpreter Options menu. Don't forget to also check the associated check box there.

Suppose I don't want to use KAWA. Can I just use the command line compiler that comes with JDK?

Yes. Here is how to do it for microgcl.

To compile microgcl without kawa (bare machine) you want to use the command

javac -d .

The -d . option (note the period) is necessary. The compiler will create a new directory named micro, after the package the classes belong to, and put the .class files into it, one for each class in the java file. Note the capitalization. Java tools are case sensitive, just like the language.

To run the resulting executable you use the command (from the same working directory you compiled from)

java micro.MicroGCLCompiler microtes.x codefile

Note the capitalization, which is essential. This will run the compiler with input file microtes.x and produce an output file named codefile. You always want the output to be named codefile, since SAM2 requires its input to have this name; the name is hardcoded into the assembler.

The compile command for the big project is quite a bit more complex, since it requires naming all the source files. I will provide a batch file to ease this. The run command is similar, however.

NOTE. The codefile (the output of the compiler) is just a text file, though it has no extension. You may read it in any text editor.
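To make the command-line arguments concrete, here is a hypothetical sketch of the checks a compiler's main method might do on its two arguments (source file, output file). It is not the course's MicroGCLCompiler. The real class lives in package micro (that is what the micro. prefix in the run command means). The package line is left out here so the sketch compiles as a single file.

```java
import java.io.File;

// Hypothetical entry point, run like:  java CompilerArgs microtes.x codefile
class CompilerArgs {
    // Returns an error message, or null if the arguments look usable.
    static String check(String[] args) {
        if (args.length != 2) {
            return "Usage: java micro.MicroGCLCompiler <source> <output>";
        }
        if (!new File(args[0]).isFile()) {
            return "Cannot find source file " + args[0];
        }
        return null;
    }

    // SAM2 reads a hardcoded name: codefile, or CODEFILE under UNIX.
    static boolean outputNameOk(String outputName) {
        return outputName.equalsIgnoreCase("codefile");
    }

    public static void main(String[] args) {
        String error = check(args);
        if (error != null) {
            System.err.println(error);
            System.exit(1);
        }
        if (!outputNameOk(args[1])) {
            System.err.println("Warning: SAM2 expects its input to be named codefile.");
        }
        // ... scan and parse args[0], writing the generated code to args[1] ...
    }
}
```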

How do I run SAM2?

SAM2 is an assembler. Once you have the codefile from your compiler you can run it with just

sam2

sam2 will produce a new file called obj, which you can't read. Of course, you need a command line window to execute this command.

Note that you want both the SAM2 and MACC2 executables in your working directory, since this is where the files they need will be.

If your compiler produces incorrect assembly code, sam2 will give you errors. Otherwise it just runs silently, but produces the obj file. Note that the assembly code is case sensitive. Opcodes are all upper case and they must be as in the sam2 documentation.

If your version of Sam produces a textual listing then it is just that: a listing. The numbers at the left of each line are the machine addresses at which each instruction can be found in the final machine code program (obj). If it produces this listing it does so on standard output, so it can be redirected to a file. It still produces the obj file in any case.

NOTE. Run sam and macc from a command window.

NOTE. The obj file is not human readable. It is NOT a text file.

If you get (DOS/Windows) error 02 when running sam2 you either don't have codefile named properly or it is in the wrong directory.

If you are running under UNIX, then codefile needs to be named CODEFILE (all caps), since UNIX file names are case sensitive. Also, in UNIX the obj file will be capitalized (OBJ). Sam2 produces it this way and macc2 expects that name.
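The naming rule above, plus the "error 02" advice, can be captured in a few lines. This is a minimal sketch under the assumption that any system whose os.name does not start with "Windows" follows the UNIX rule; SamFiles and its methods are hypothetical names.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper: file names SAM2/MACC2 expect, and a presence check.
class SamFiles {
    // Assumption: anything that is not Windows follows the UNIX naming rule.
    static boolean isUnix(String osName) {
        return !osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    static String codefileName(String osName) {
        return isUnix(osName) ? "CODEFILE" : "codefile";
    }

    static String objName(String osName) {
        return isUnix(osName) ? "OBJ" : "obj";
    }

    // Returns a message describing a likely cause of "error 02", or null if present.
    static String checkPresent(File workingDir, String name) {
        File f = new File(workingDir, name);
        return f.isFile() ? null : name + " not found in " + workingDir.getAbsolutePath();
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        String problem = checkPresent(new File("."), codefileName(os));
        System.out.println(problem == null ? "Ready to run sam2." : problem);
    }
}
```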

There is a documentation file for sam2 in your distribution kit, and you should become familiar with it. It describes the target language of the compilers in both projects. Your compiler outputs will need to conform to the requirements of this document.

How do I run MACC2?

MACC2 is a machine simulator. Think of it as your "hardware." Once you have an obj file from SAM2 you can run macc2 with just

macc2

or with

macc2 <inputdata

or with

macc2 <inputdata >outputdata

where inputdata and outputdata are file names of your choice. The inputdata file should contain the data for any reads the original algorithm does, and outputdata will contain the results of any write statements. If you use the first form, the simulator (macc2) will wait for you to type inputs without prompting in any way. See pages 6 and 7 of the notes. The last form will save your outputs for inclusion in your reports.

If you get (DOS/Windows) error 02 when running macc2, you either don't have obj named properly, or it is in the wrong directory, or it is missing altogether.

There is a documentation file for macc2, but it mostly describes the machine itself and the internals of the machine language, so it isn't very useful.
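If you want to script the assemble-and-run step from Java rather than typing the commands, ProcessBuilder can do the same redirection as macc2 <inputdata >outputdata. This is a minimal sketch, assuming sam2 and macc2 can be found by the operating system (for example on the PATH; otherwise pass full paths) and that you have Java 7 or later, which is newer than the JDK versions mentioned elsewhere in this FAQ. RunMachine is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical runner: assemble codefile with sam2, then run obj with macc2.
class RunMachine {
    static ProcessBuilder sam2(File workingDir) {
        // sam2's error messages appear in this console.
        return new ProcessBuilder("sam2").directory(workingDir).inheritIO();
    }

    static ProcessBuilder macc2(File workingDir, String inputFile, String outputFile) {
        return new ProcessBuilder("macc2")
                .directory(workingDir)
                .redirectInput(new File(workingDir, inputFile))    // like <inputdata
                .redirectOutput(new File(workingDir, outputFile)); // like >outputdata
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        File dir = new File(".");
        sam2(dir).start().waitFor();   // produces obj from codefile
        int status = macc2(dir, "inputdata", "outputdata").start().waitFor();
        System.out.println("macc2 finished with exit status " + status);
    }
}
```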

What is COCO and how do I use it?

Coco is short for Compiler-Compiler. It is used only in the second (large) project. It translates a grammar (GCL.atg) into both a parser and a scanner. You have been provided with a description of coco in both MS Word and HTML formats. To run coco you must have two files, Scanner.frame and Parser.frame, in the working directory. Do not modify these files in any way, and do not edit the output of coco (the generated parser and scanner) in any way. The generated parser and scanner need to be part of your project, of course, and after running coco you may need to "build-all" files, rather than just "build-dirty" files.

The easiest way to run coco is to integrate it into Eclipse. Instructions for this are provided in the course materials.

Alternatively, you can run coco from the command line with the provided batch file newcoco.bat. It takes one optional argument, -pt; if it is given, the parser will be modified so that it prints out parse trees as it compiles a test program. The batch file is hard coded to build the grammar GCL.atg. All you need to type at the command line is

newcoco

or, to get parse trees,

newcoco -pt

NOTE. The distribution contains source code for COCO, but you don't need it. It is there for your information only. We will not need to rebuild COCO during this course or visit its source code.

There is a documentation file for COCO in your distribution kit.

Where can I learn more about Design Patterns?

There are now many books. My home page has a number of resources. One of the best places to start is with James (Cope) Coplien's Software Patterns. Another is Patterns and Software: Essential Concepts and Terminology by Brad Appleton.


Where can I learn more about compilers?

My links page has some resources.


Where can I learn more about Java?

The Java Tutorial by Campione and Walrath (Addison-Wesley) is a good source. There is a shorter overview in Java in a Nutshell by Flanagan (O'Reilly). My home page also has a lot of resources.

Sun has an online Java tutorial at


Where can I learn more about context free grammars and BNF?

Here are three links (somewhat abstract presentations, however)

You can also consult the book: Ravi Sethi, Programming Languages: Concepts and Constructs, 2ed, Addison-Wesley, 1996

Also look on the course wiki.


On page 13 of the notes, line 22 reads LD R0, #10. What is the significance of the # in this line of code?

The # before a number indicates an "immediate mode" operand (immed).
LD R0, #10
says to put a 10 into register 0.

LD R0, 10
says to put the contents of the memory cell with address 10 into register 0
These are quite different. Without the # you have direct memory mode (dmem).

LD R0, $A$
is also dmem mode since $A$ is just a name indicating an address.
There are 6 other memory modes also. We will discuss them later or you can read about them in the sam doc.
These memory address modes are fairly typical of assemblers. You probably used them in your assembly language course.

By the way, the R0 itself is direct register mode (dreg). The first operand in a two operand instruction is always dreg mode, but you can also use dreg for the second operand.

This and more is all explained in the SAM2 documentation in the docs directory in the distributed software.
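To see the difference between the modes concretely, here is a toy model in Java. It is not SAM2 or MACC2 code and ignores real instruction encoding; the memory and register counts are arbitrary assumptions. It only shows what each operand form means when LD executes.

```java
// Toy model of three operand modes; sizes are arbitrary, not MACC2's.
class AddressingDemo {
    final int[] memory = new int[64];
    final int[] registers = new int[16];

    // LD R0, #10  (immed): the operand itself is the value.
    void loadImmediate(int reg, int value) {
        registers[reg] = value;
    }

    // LD R0, 10   (dmem): the operand is an address; load what is stored there.
    void loadDirectMemory(int reg, int address) {
        registers[reg] = memory[address];
    }

    // dreg as the second operand: copy another register's contents.
    void loadDirectRegister(int reg, int sourceReg) {
        registers[reg] = registers[sourceReg];
    }

    public static void main(String[] args) {
        AddressingDemo m = new AddressingDemo();
        m.memory[10] = 42;
        m.loadImmediate(0, 10);
        System.out.println(m.registers[0]);   // prints 10
        m.loadDirectMemory(0, 10);
        System.out.println(m.registers[0]);   // prints 42
    }
}
```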

When I run my compiler I get an exception, perhaps Null Pointer Exception. What does this mean and how do I deal with it?

Java reference variables are like pointers. All Java objects are created dynamically with the new operator. If you declare a reference variable but never make it refer to an object, it holds null, and using it (calling a method or accessing a field through it) throws a NullPointerException.
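Here is a small illustration of the most common cause: a field that is declared but never created with new. The class and method names are hypothetical, loosely styled after a code generator.

```java
// Illustrative only: the buffer field is declared but never created.
class BrokenCodegen {
    private StringBuilder code;               // never assigned, so it stays null

    String emit(String instruction) {
        code.append(instruction).append('\n'); // NullPointerException here
        return code.toString();
    }
}

// The fix: create the object with new before anyone uses it.
class FixedCodegen {
    private final StringBuilder code = new StringBuilder();

    String emit(String instruction) {
        code.append(instruction).append('\n');
        return code.toString();
    }
}
```

Declaring the field final has a side benefit: the compiler then refuses to compile the class unless the field is assigned, so this kind of mistake is caught before the program runs.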

When an exception is not caught, Java prints a lot of information, including a trace of all outstanding method invocations. Here is an example. The current method is at the top and the earliest call is at the bottom (it is a stack, actually). The numbers are line numbers.

at micro.Codegen.extractExpr(
at micro.Codegen.generate_2(
at micro.Codegen.loadreg(
at micro.Semantic.genInfix(
at micro.Parser.expression(, Compiled Code)
at micro.Parser.statement(
at micro.Parser.statementList(, Compiled Code)
at micro.Parser.program(
at micro.Parser.parse(
at micro.Second.main(
Exception in thread "main" Process Exit...

Therefore, look on line 835 of your program (in file ) in method micro.Codegen.extractExpr. Somewhere on that line is a reference variable that was not properly initialized, giving the Null Pointer Exception. Other exceptions may not involve reference variables, but the line numbers are accurate. Look there for the error. You can trace back to see how the program got to the final function by going down the stack. extractExpr was called from generate_2 at line 861, which was called from loadreg line 808, etc.
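The same information is available from inside a program through Throwable.getStackTrace(). This minimal sketch (TraceDemo is a hypothetical name) shows that element 0 is the method that was running when the exception was thrown, the top line of the printed trace, and that the following elements are its callers.

```java
// Sketch: walk a stack trace in code instead of reading the printout.
class TraceDemo {
    static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement e : t.getStackTrace()) {
            // Class, method, and source line of each pending call, newest first.
            sb.append(e.getClassName()).append('.').append(e.getMethodName())
              .append(" line ").append(e.getLineNumber()).append('\n');
        }
        return sb.toString();
    }

    static void inner() {
        Object missing = null;
        missing.toString();   // throws NullPointerException
    }

    static void outer() {
        inner();              // inner's caller: the second line of the trace
    }

    public static void main(String[] args) {
        try {
            outer();
        } catch (NullPointerException e) {
            System.out.print(describe(e));
        }
    }
}
```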

How do I build my new Parser and Scanner once I edit my GCL.atg file? (Large project only)

There is a batch file named newcoco.bat in the distribution that will do this. It should be put in your working directory along with newcoco.jar. You can try to double click it to execute it. If the resulting window does not immediately disappear, then this will work for you. If it does disappear, you should instead open a command window, navigate to your working directory, and type newcoco. If your build is successful you will see something like this:

C:\courses\gcl 2001>java -classpath .;newcoco.jar Coco.Comp GCL.ATG
   Coco/R V1.0
   defPart deletable
   parser + scanner generated
   0 error(s) detected

The important line here is the one that tells you that the parser and scanner were generated. Otherwise you have errors in your grammar that must be corrected. When successful, your parser and scanner should have new dates as well.
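If you would rather drive Coco from Java, the following sketch runs the same command shown in the sample output above and looks for the "parser + scanner generated" line. It is an assumption-laden sketch: it does not reproduce the batch file's -pt handling, which is not shown here, and the grammar name should match your actual file (case matters on UNIX). RunCoco is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

// Hypothetical runner for:  java -classpath .;newcoco.jar Coco.Comp GCL.ATG
class RunCoco {
    static List<String> command(String grammar) {
        // File.pathSeparator is ";" on Windows and ":" on UNIX.
        String classpath = "." + File.pathSeparator + "newcoco.jar";
        return Arrays.asList("java", "-classpath", classpath, "Coco.Comp", grammar);
    }

    // True for the line that signals a successful build.
    static boolean generated(String cocoOutputLine) {
        return cocoOutputLine.trim().equals("parser + scanner generated");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command("GCL.ATG"))
                .directory(new File("."))
                .redirectErrorStream(true)   // merge error messages into the output
                .start();
        boolean ok = false;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
                ok |= generated(line);
            }
        }
        p.waitFor();
        System.out.println(ok ? "Parser and scanner regenerated." : "Check your grammar for errors.");
    }
}
```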

Note that for some unknown reason JDK 1.2.2 does not run newcoco correctly. You therefore need a different version of Java; JDK 1.5 or 1.6 is probably best.

Last Updated: December 28, 2008