Creating External DSLs using ANTLR and Java
In my previous post quite sometime back I had written about Internal DSLs using Java. In the book Domain Specific Languages by Martin Fowler, he discusses about another type of DSL called external DSLs in which the DSL is written in another language which is then parsed by the host language to populate the semantic model.
In the previous example I was discussing about creating a DSL for defining a graph. The advantage of using an external dsl is that any change in the graph data would not require recompilation of the program instead the program can just load the external dsl, create a parse tree and then populate the semantic model. The semantic model will remain the same and the advantage of using the semantic model is that one can make modification to the DSL without making much changes to the semantic model. In the example between Internal DSLs and external DSLs I have not modified the semantic model. To create an external DSL I am making use of ANTLR.
What is ANTLR?
The definition as given on the official site is:
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
The notable features of ANTLR from the above definition are:
- parser generator for structured text or binary files
- can build and walk parse trees
Semantic Model
In this example I will exploit the above features of ANTLR to parse a DSL and then walk through the parse tree to populate the Semantic model. To recap, the semantic model consists of Graph
, Edge
and Vertex
classes which represent a Graph and an Edge and a Vertex of the Graph respectively. The below code shows the class definitions:
public class Graph { private List<Edge> edges; private Set<Vertex> vertices; public Graph() { edges = new ArrayList<>(); vertices = new TreeSet<>(); } public void addEdge(Edge edge){ getEdges().add(edge); getVertices().add(edge.getFromVertex()); getVertices().add(edge.getToVertex()); } public void addVertice(Vertex v){ getVertices().add(v); } public List<Edge> getEdges() { return edges; } public Set<Vertex> getVertices() { return vertices; } public static void printGraph(Graph g){ System.out.println("Vertices..."); for (Vertex v : g.getVertices()) { System.out.print(v.getLabel() + " "); } System.out.println(""); System.out.println("Edges..."); for (Edge e : g.getEdges()) { System.out.println(e); } } } public class Edge { private Vertex fromVertex; private Vertex toVertex; private Double weight; public Edge() { } public Edge(Vertex fromVertex, Vertex toVertex, Double weight) { this.fromVertex = fromVertex; this.toVertex = toVertex; this.weight = weight; } @Override public String toString() { return fromVertex.getLabel() + " to " + toVertex.getLabel() + " with weight " + getWeight(); } public Vertex getFromVertex() { return fromVertex; } public void setFromVertex(Vertex fromVertex) { this.fromVertex = fromVertex; } public Vertex getToVertex() { return toVertex; } public void setToVertex(Vertex toVertex) { this.toVertex = toVertex; } public Double getWeight() { return weight; } public void setWeight(Double weight) { this.weight = weight; } } public class Vertex implements Comparable<Vertex> { private String label; public Vertex(String label) { this.label = label.toUpperCase(); } @Override public int compareTo(Vertex o) { return (this.getLabel().compareTo(o.getLabel())); } public String getLabel() { return label; } public void setLabel(String label) { this.label = label; } }
Creating the DSL
Lets come up with the structure of the language before going into creating grammar rules. The structure which I am planning to come up is something like:
Graph { A -> B (10) B -> C (20) D -> E (30) }
Each line in the Graph block represents an edge and the vertices involved in the edge and the value in the braces represent the weight of the edge. One limitation which I am enforcing is that the Graph cannot have dangling vertices i.e vertices which are not part of any edge. This limitation can be removed by slightly changing the grammar, but I would leave that as an exercise for the readers of this post.
The first task in creating the DSL is to define the grammar rules. These are the rules which your lexer and parser will use to convert the DSL into a Abstract Syntax tree/parse tree.
ANTLR then makes use of this grammar to generate the Parser, Lexer and a Listener which are nothing but java classes extending/implementing some classes from the ANTLR library. The creators of the DSL must make use of these java classes to load the external DSL, parse it and then using the listener populate the semantic model as and when the parser encounters certain nodes (think of this as a variant of SAX parser for XML)
Now that we know in very brief what ANTLR can do and the steps in using ANTLR, we would have to setup ANTLR i.e download ANTLR API jar and setup up some scripts for generating the parser and lexer and then trying out the language via the command line tool. For that please visit this official tutorial from ANTLR which shows how to setup ANTLR and a simple Hello World example.
Grammar for the DSL
Now that you have ANTLR setup let me dive into the grammar for my DSL:
grammar Graph; graph: 'Graph {' edge+ '}'; vertex: ID; edge: vertex '->' vertex '(' NUM ')' ; ID: [a-zA-Z]+; NUM: [0-9]+; WS: [ \t\r\n]+ -> skip;
Lets go through these rules:
graph: 'Graph {' edge+ '}';
The above grammar rule which is the start rule says that the language should start with ‘Graph {‘ and end with ‘}’ and has to contain at lease one edge or more than one edge.
vertex: ID; edge: vertex '->' vertex '(' NUM ')' ; ID: [a-zA-Z]+; NUM: [0-9]+;
The above four rules say that a vertex should have atleast one character or more than one character. And an edge is defined as collection of two vertices separated by a ‘->’ and with the some digits in the ‘()’.
I have named the grammar language as “Graph” and hence once we use ANTLR to generate the java classes i.e parser and lexer we will end up seeing the following classes: GraphParser, GraphLexer, GraphListener and GraphBaseListener. The first two classes deal with the generation of parse tree and the last two classes deal with the parse tree walk through. GraphListener is an interface which contains all the methods for dealing with the parse tree i.e dealing with events such as entering a rule, exiting a rule, visiting a terminal node and in addition to these it contains methods for dealing with events related to entering the graph rule, entering the edge rule and entering the vertex rule. We will be making use of these methods to intercept the data present in the dsl and then populate the semantic model.
Populating the semantic model
I have created a file graph.gr in the resource package which contains the DSL for populating the graph. As the files in the resource package are available to the ClassLoader at runtime, we can use the ClassLoader to read the DSL script and then pass it on to the Lexer and parser classes. The DSL script used is:
Graph { A -> B (10) B -> C (20) D -> E (30) A -> E (12) B -> D (8) }
And the code which loads the DSL and populates the semantic model:
//Please resolve the imports for the classes used. public class GraphDslAntlrSample { public static void main(String[] args) throws IOException { //Reading the DSL script InputStream is = ClassLoader.getSystemResourceAsStream("resources/graph.gr"); //Loading the DSL script into the ANTLR stream. CharStream cs = new ANTLRInputStream(is); //Passing the input to the lexer to create tokens GraphLexer lexer = new GraphLexer(cs); CommonTokenStream tokens = new CommonTokenStream(lexer); //Passing the tokens to the parser to create the parse trea. GraphParser parser = new GraphParser(tokens); //Semantic model to be populated Graph g = new Graph(); //Adding the listener to facilitate walking through parse tree. parser.addParseListener(new MyGraphBaseListener(g)); //invoking the parser. parser.graph(); Graph.printGraph(g); } } /** * Listener used for walking through the parse tree. */ class MyGraphBaseListener extends GraphBaseListener { Graph g; public MyGraphBaseListener(Graph g) { this.g = g; } @Override public void exitEdge(GraphParser.EdgeContext ctx) { //Once the edge rule is exited the data required for the edge i.e //vertices and the weight would be available in the EdgeContext //and the same can be used to populate the semantic model Vertex fromVertex = new Vertex(ctx.vertex(0).ID().getText()); Vertex toVertex = new Vertex(ctx.vertex(1).ID().getText()); double weight = Double.parseDouble(ctx.NUM().getText()); Edge e = new Edge(fromVertex, toVertex, weight); g.addEdge(e); } }
And the output when the above would be executed would be:
Vertices... A B C D E Edges... A to B with weight 10.0 B to C with weight 20.0 D to E with weight 30.0 A to E with weight 12.0 B to D with weight 8.0
To summarize, this post creates a external DSL for populating the data for graphs by making use of ANTLR. I will enhance this simple DSL and expose it as an utility which can be used by programmers working on graphs.
The post is very heavy on concepts and code, feel free to drop in any queries you have so that I can try to address them for benefit of others as well.