Creating External DSLs using ANTLR and Java
In my previous post quite sometime back I had written about Internal DSLs using Java. In the book Domain Specific Languages by Martin Fowler, he discusses about another type of DSL called external DSLs in which the DSL is written in another language which is then parsed by the host language to populate the semantic model.
In the previous example I was discussing about creating a DSL for defining a graph. The advantage of using an external dsl is that any change in the graph data would not require recompilation of the program instead the program can just load the external dsl, create a parse tree and then populate the semantic model. The semantic model will remain the same and the advantage of using the semantic model is that one can make modification to the DSL without making much changes to the semantic model. In the example between Internal DSLs and external DSLs I have not modified the semantic model. To create an external DSL I am making use of ANTLR.
What is ANTLR?
The definition as given on the official site is:
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
The notable features of ANTLR from the above definition are:
- parser generator for structured text or binary files
- can build and walk parse trees
Semantic Model
In this example I will exploit the above features of ANTLR to parse a DSL and then walk through the parse tree to populate the Semantic model. To recap, the semantic model consists of Graph
, Edge
and Vertex
classes which represent a Graph and an Edge and a Vertex of the Graph respectively. The below code shows the class definitions:
001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050 051 052 053 054 055 056 057 058 059 060 061 062 063 064 065 066 067 068 069 070 071 072 073 074 075 076 077 078 079 080 081 082 083 084 085 086 087 088 089 090 091 092 093 094 095 096 097 098 099 100 101 102 103 104 105 106 107 108 109 110 | public class Graph { private List<Edge> edges; private Set<Vertex> vertices; public Graph() { edges = new ArrayList<>(); vertices = new TreeSet<>(); } public void addEdge(Edge edge){ getEdges().add(edge); getVertices().add(edge.getFromVertex()); getVertices().add(edge.getToVertex()); } public void addVertice(Vertex v){ getVertices().add(v); } public List<Edge> getEdges() { return edges; } public Set<Vertex> getVertices() { return vertices; } public static void printGraph(Graph g){ System.out.println( "Vertices..." ); for (Vertex v : g.getVertices()) { System.out.print(v.getLabel() + " " ); } System.out.println( "" ); System.out.println( "Edges..." ); for (Edge e : g.getEdges()) { System.out.println(e); } } } public class Edge { private Vertex fromVertex; private Vertex toVertex; private Double weight; public Edge() { } public Edge(Vertex fromVertex, Vertex toVertex, Double weight) { this .fromVertex = fromVertex; this .toVertex = toVertex; this .weight = weight; } @Override public String toString() { return fromVertex.getLabel() + " to " + toVertex.getLabel() + " with weight " + getWeight(); } public Vertex getFromVertex() { return fromVertex; } public void setFromVertex(Vertex fromVertex) { this .fromVertex = fromVertex; } public Vertex getToVertex() { return toVertex; } public void setToVertex(Vertex toVertex) { this .toVertex = toVertex; } public Double getWeight() { return weight; } public void setWeight(Double weight) { this .weight = weight; } } public class Vertex implements Comparable<Vertex> { private String label; public Vertex(String label) { this .label = label.toUpperCase(); } @Override public int compareTo(Vertex o) { return ( this .getLabel().compareTo(o.getLabel())); } public String getLabel() { return label; } public void setLabel(String label) { this .label = label; } } |
Creating the DSL
Lets come up with the structure of the language before going into creating grammar rules. The structure which I am planning to come up is something like:
1 2 3 4 5 | Graph { A -> B (10) B -> C (20) D -> E (30) } |
Each line in the Graph block represents an edge and the vertices involved in the edge and the value in the braces represent the weight of the edge. One limitation which I am enforcing is that the Graph cannot have dangling vertices i.e vertices which are not part of any edge. This limitation can be removed by slightly changing the grammar, but I would leave that as an exercise for the readers of this post.
The first task in creating the DSL is to define the grammar rules. These are the rules which your lexer and parser will use to convert the DSL into a Abstract Syntax tree/parse tree.
ANTLR then makes use of this grammar to generate the Parser, Lexer and a Listener which are nothing but java classes extending/implementing some classes from the ANTLR library. The creators of the DSL must make use of these java classes to load the external DSL, parse it and then using the listener populate the semantic model as and when the parser encounters certain nodes (think of this as a variant of SAX parser for XML)
Now that we know in very brief what ANTLR can do and the steps in using ANTLR, we would have to setup ANTLR i.e download ANTLR API jar and setup up some scripts for generating the parser and lexer and then trying out the language via the command line tool. For that please visit this official tutorial from ANTLR which shows how to setup ANTLR and a simple Hello World example.
Grammar for the DSL
Now that you have ANTLR setup let me dive into the grammar for my DSL:
1 2 3 4 5 6 7 | grammar Graph; graph: 'Graph {' edge+ '}' ; vertex: ID; edge: vertex '->' vertex '(' NUM ')' ; ID: [a-zA-Z]+; NUM: [0-9]+; WS: [ \t\r\n]+ -> skip; |
Lets go through these rules:
1 | graph: 'Graph {' edge+ '}' ; |
The above grammar rule which is the start rule says that the language should start with ‘Graph {‘ and end with ‘}’ and has to contain at lease one edge or more than one edge.
1 2 3 4 | vertex: ID; edge: vertex '->' vertex '(' NUM ')' ; ID: [a-zA-Z]+; NUM: [0-9]+; |
The above four rules say that a vertex should have atleast one character or more than one character. And an edge is defined as collection of two vertices separated by a ‘->’ and with the some digits in the ‘()’.
I have named the grammar language as “Graph” and hence once we use ANTLR to generate the java classes i.e parser and lexer we will end up seeing the following classes: GraphParser, GraphLexer, GraphListener and GraphBaseListener. The first two classes deal with the generation of parse tree and the last two classes deal with the parse tree walk through. GraphListener is an interface which contains all the methods for dealing with the parse tree i.e dealing with events such as entering a rule, exiting a rule, visiting a terminal node and in addition to these it contains methods for dealing with events related to entering the graph rule, entering the edge rule and entering the vertex rule. We will be making use of these methods to intercept the data present in the dsl and then populate the semantic model.
Populating the semantic model
I have created a file graph.gr in the resource package which contains the DSL for populating the graph. As the files in the resource package are available to the ClassLoader at runtime, we can use the ClassLoader to read the DSL script and then pass it on to the Lexer and parser classes. The DSL script used is:
1 2 3 4 5 6 7 | Graph { A -> B (10) B -> C (20) D -> E (30) A -> E (12) B -> D (8) } |
And the code which loads the DSL and populates the semantic model:
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | //Please resolve the imports for the classes used. public class GraphDslAntlrSample { public static void main(String[] args) throws IOException { //Reading the DSL script InputStream is = ClassLoader.getSystemResourceAsStream( "resources/graph.gr" ); //Loading the DSL script into the ANTLR stream. CharStream cs = new ANTLRInputStream(is); //Passing the input to the lexer to create tokens GraphLexer lexer = new GraphLexer(cs); CommonTokenStream tokens = new CommonTokenStream(lexer); //Passing the tokens to the parser to create the parse trea. GraphParser parser = new GraphParser(tokens); //Semantic model to be populated Graph g = new Graph(); //Adding the listener to facilitate walking through parse tree. parser.addParseListener( new MyGraphBaseListener(g)); //invoking the parser. parser.graph(); Graph.printGraph(g); } } /** * Listener used for walking through the parse tree. */ class MyGraphBaseListener extends GraphBaseListener { Graph g; public MyGraphBaseListener(Graph g) { this .g = g; } @Override public void exitEdge(GraphParser.EdgeContext ctx) { //Once the edge rule is exited the data required for the edge i.e //vertices and the weight would be available in the EdgeContext //and the same can be used to populate the semantic model Vertex fromVertex = new Vertex(ctx.vertex( 0 ).ID().getText()); Vertex toVertex = new Vertex(ctx.vertex( 1 ).ID().getText()); double weight = Double.parseDouble(ctx.NUM().getText()); Edge e = new Edge(fromVertex, toVertex, weight); g.addEdge(e); } } |
And the output when the above would be executed would be:
1 2 3 4 5 6 7 8 | Vertices... A B C D E Edges... A to B with weight 10.0 B to C with weight 20.0 D to E with weight 30.0 A to E with weight 12.0 B to D with weight 8.0 |
To summarize, this post creates a external DSL for populating the data for graphs by making use of ANTLR. I will enhance this simple DSL and expose it as an utility which can be used by programmers working on graphs.
The post is very heavy on concepts and code, feel free to drop in any queries you have so that I can try to address them for benefit of others as well.