Wednesday, June 5, 2013

Generating a C99 lexer & parser using antlr v4

This article describes how to use antlr v4 to generate a lexer and parser compatible with the C99 standard.


I. Why antlr v4

antlr v4 implements a new parsing algorithm, adaptive LL(*) (also written ALL(*)), and adds features such as automatic parse tree construction. It also ships with a test tool, TestRig, which can be used to check that the lexer and parser behave as expected.

So we use antlr v4 rather than antlr v3. It can be downloaded from http://www.antlr.org/download.html

II. Combined rules

antlr works from two sets of rules: lexer rules and parser rules.
The lexer takes a raw input stream (the source code) and outputs a token stream.
The parser takes the token stream and outputs a parse tree as the final result (so far).
Here both sets of rules are combined into a single grammar file with the extension .g4.
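
To make the two stages concrete, here is a minimal hand-written sketch in plain Java. It is not what antlr generates and it does not use the antlr runtime. The class ToyPipeline, its token pattern and the tiny "type name ;" declaration rule are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written toy, NOT antlr output: it only mirrors the lexer -> parser flow.
class ToyPipeline {

    // Stage 1 (lexer): raw characters in, flat list of tokens out.
    // Assumption: a token is an identifier/keyword or a single ';'.
    // Anything else (whitespace included) is silently skipped.
    static List<String> lex(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|;").matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Stage 2 (parser): token list in, tree out (as a LISP-style string).
    // Hypothetical rule: declaration : type name ';' ;
    static String parseDeclaration(List<String> tokens) {
        if (tokens.size() != 3 || !tokens.get(2).equals(";")) {
            throw new IllegalArgumentException("expected: <type> <name> ;");
        }
        return "(declaration (type " + tokens.get(0) + ") (name " + tokens.get(1) + ") ;)";
    }

    public static void main(String[] args) {
        List<String> tokens = lex("int counter;");
        System.out.println(tokens);                   // the token stream
        System.out.println(parseDeclaration(tokens)); // the parse tree
    }
}
```

The generated CLexer and CParser play the roles of lex and parseDeclaration here, only for the full C grammar instead of a single toy rule.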


III. Grammar file

I found that antlr v3 officially provides a set of grammars, including ones for C, Java and CSharp. Unfortunately these grammars won't work in antlr v4, because several options have become obsolete and several rules are not compatible with the new algorithm implemented in antlr v4.

To get it working, I made several modifications to the rules and ended up with this: https://www.dropbox.com/s/zxl39ft2xyh1q5o/C.g4


IV. Generating java classes

Use the following command line to generate the java classes from the grammar file (the antlr jar must be on the class path):
java org.antlr.v4.Tool C.g4

If this succeeds, at least these two files will be generated: CLexer.java and CParser.java.
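
If you want to check the result from a build script, something along these lines could do it. This is only a sketch: GeneratedFilesCheck and missingOutputs are hypothetical names, not part of antlr, and the check looks only for the two files named above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of antlr.
class GeneratedFilesCheck {

    // Returns the expected file names that are absent from presentFileNames.
    // Only <grammar>Lexer.java and <grammar>Parser.java are checked.
    static List<String> missingOutputs(Collection<String> presentFileNames, String grammarName) {
        List<String> missing = new ArrayList<>();
        for (String suffix : Arrays.asList("Lexer.java", "Parser.java")) {
            String expected = grammarName + suffix;
            if (!presentFileNames.contains(expected)) {
                missing.add(expected);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Look in the directory given as first argument, or the current one.
        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] names = dir.list(); // null if dir is not a readable directory
        List<String> present = names == null ? new ArrayList<String>() : Arrays.asList(names);
        System.out.println("missing: " + missingOutputs(present, "C"));
    }
}
```

Run in the directory that holds C.g4, an empty "missing" list means both files are there.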



V. Compile and test it

Compile the generated classes, then run TestRig on a C source file (main.c here):
javac.exe C*.java
java org.antlr.v4.runtime.misc.TestRig C translation_unit -gui -trace -diagnostics < main.c 1> test.log 2>&1

If nothing goes wrong, a window showing the parse tree appears after a short while, which means it works so far. The trace and diagnostic output ends up in test.log.
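
The redirections in that command line are shell syntax. If you would rather launch TestRig from java, the same thing can be sketched with ProcessBuilder. TestRigLauncher is a hypothetical name, and the sketch assumes "java" is on the PATH and that the antlr jar and the compiled classes are on the class path of the child process.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical launcher, not part of antlr.
class TestRigLauncher {

    // Builds (but does not start) the process for the TestRig command.
    static ProcessBuilder build(File source, File log) {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList(
                "java", "org.antlr.v4.runtime.misc.TestRig",
                "C", "translation_unit", "-gui", "-trace", "-diagnostics"));
        pb.redirectInput(source);      // same as: < main.c
        pb.redirectOutput(log);        // same as: 1> test.log
        pb.redirectErrorStream(true);  // same as: 2>&1
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = build(new File("main.c"), new File("test.log"));
        // Only print the command here; call pb.start() to actually run it.
        System.out.println(String.join(" ", pb.command()));
    }
}
```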


VI. Integrating into your own java project

First make sure your project has antlr-4.0-complete.jar on its class path; then you can add the generated classes to your project.
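
A quick way to confirm the jar is really visible at run time is to try to locate one of its classes. This is a small sketch: AntlrClasspathCheck and isOnClasspath are hypothetical names, and org.antlr.v4.runtime.Parser is used as the probe because it is the base class of generated parsers.

```java
// Hypothetical helper, not part of antlr.
class AntlrClasspathCheck {

    // True if the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, AntlrClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("antlr runtime found: "
                + isOnClasspath("org.antlr.v4.runtime.Parser"));
    }
}
```

If this prints false, the generated classes will not compile or load either, so it is worth fixing the class path first.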

Now you can write your own code to drive the lexer and parser:

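import org.antlr.v4.runtime.*;

// inputStream is any java.io.InputStream holding the C source;
// the ANTLRInputStream constructor may throw IOException
ANTLRInputStream input = new ANTLRInputStream(inputStream);
CLexer lexer = new CLexer(input);                         // characters -> tokens
CommonTokenStream tokens = new CommonTokenStream(lexer);

CParser parser = new CParser(tokens);                     // tokens -> parse tree
parser.setBuildParseTree(true);
ParserRuleContext tree = parser.translation_unit();       // translation_unit is the start rule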

We now have the parse tree.

You can use org.antlr.v4.runtime.tree.gui.TreeViewer to display the tree in your JFrame.

--- EOF ---
