DSL based approach to input Graph data in graph theory based java programs
Most of us have coded some programs which deal with graph theory algorithms like finding the shortest path between two vertices, finding the minimum spanning tree for a given graph and so on. In each of these algorithms the programatic way to represent the graph is to used either adjacency matrix or adjacency list. Both of them are not very intuitive way of defining the graph input. The adjacency matrix for example can lead to errors if the entries are not made in the right columns and rows. Moreover at run time you are not very sure which row/column represents which edge and things get even more complicated when it comes to input for a graph with large number of vertices.
During my engineering studies I had implemented quite a few graph algorithms in Java and in all of them I had nested for-loops for taking the adjacency matrix input. Recently when I was reading Martin Fowlers DSL book, I hit upon the idea of create a DSL for providing graph input i.e a DSL which would allow the user to specify the vertices, edges and their weights. I picked the graph algorithms which I had implemented and just stripped of the adjacency matrix input and instead made use of the DSL which I created. The algorithm worked like a charm.
In this post I will show the valid syntax for the DSL by taking different graph inputs and showing the DSL for them. Then I will show you the library which I created which consists of the Semantic Model of the Graph, the parser and lexer of the DSL and a simple builder API which populates the semantic model from the DSL script. The parser and lexer were generated by using ANTLR and hence this library requires ANTLR Jar to be available in the classpath. And finally I will show how this DSL can be used to find the minimum spanning tree using Kruskals Algorithm.
DSL Syntax and some examples
The DSL for the below graph (g1):
Graph { A1 -> B2 (12.3) B2 -> C3(0.98) C3->D4 (2) D4 ->E5 (12.45) }
Note that there are different spaces between the elements in the DSL above. This is just to show the different ways in which the DSL can be written.
The DSL for the below graph(g2) would be:
Graph{ A1 -> B2 (12.3) B2 -> C3 (0.98) C3 -> D4 (2) E5 }
Note that there is no space between ‘Graph’ and ‘{‘. This is just to show different ways it can be written.
The DSL for the below graph (g3) would be:
Graph { A -> B (12.3) B -> C (0.98) C -> D (2) D -> E (12.45) }
Now to show some of the invalid DSL scripts:
Graph { 1A -> B (12.3) B -> C (0.98) }
The above is invalid because the vertex name begins with a digit.
Graph { }
The above is invalid because the Graph expects atleast one vertex to be defined. But it can have zero or more edges.
Library for DSL based graph input
I have made use of ANTLR which does all the task of creating lexer and parser for the grammar which I defined for my DSL. This way I need not worry about creating parser or worry about creating tokens from the DSL input script.
The parser and lexer classes along with the semantic model classes are package into a jar and this jar along with the ANTLR jar have to be included to make use of writing a DSL for graph input.
The structure of the DSL jar can be seen in the screenshot below:
The classes in the graph package correspond to the semantic model i.e these are generic classes and can be used irrespecitve of whether someone is using a DSL or not. The classes in graph.dsl correspond to the ANTLR generated java classes for lexer and parser.
The grammar which is used by ANTLR for lexical analysis and parsing is:
grammar Graph; graph: GRAPH_START (edge|vertex)+ GRAPH_END; edge: (vertex) TO (vertex) weight; vertex: ID; weight: '(' NUM ')'; GRAPH_START : 'Graph'([ ])*'{'; GRAPH_END : '}'; WEIGHT_START: '('; WEIGHT_END: ')'; TO: '->'; ID: ^[a-zA-Z][a-zA-Z0-9]*; NUM: [0-9]*[.]?[0-9]+; WS: [ \t\r\n]+ -> skip;
The above grammar has scope for improvement but as my first attempt I have tried to keep it to this level.
- Download the DSL jar from here(GraphDSL.jar).
- Download the ANTLR jar from here(antlr-4.1-complete.jar).
Note: This DSL is developed using ANTLR Version 4.
A recommended book for external DSLs using ANTLR is Language Implementation Patterns: Create Your Own DomainSpecific and General Programming Languages
Kruskals Algorithm to find minimum spanning tree
The graph which is used to test this algorithm implementation is:
and the DSL for the same is:
Graph { A -> B (7) B -> C (8) A -> D (5) B -> D (9) D -> E (15) D -> F (6) E -> F (8) E -> C (5) B -> E (7) E -> G (9) F -> G (11) }
Lets look at the implementation:
package kruskalsalgo; import java.util.ArrayList; import java.util.Collections; import java.util.Scanner; import graph.Edge; import graph.Graph; import graph.Vertex; import graph.GraphBuilder; import java.io.IOException; import java.util.Comparator; public class KruskalsAlgorithm { public static void main(String[] args) throws IOException { //Load the graph data from the DSL Graph g = new GraphBuilder().buildGraph("graph.gr"); ArrayList<Set> forest = new ArrayList<Set>(); ArrayList<Edge> finalEdgeSet = new ArrayList<Edge>(); //Creating disjoint set of vertices which represents the initial forest for (Vertex v : g.getVertices()) { Set newSet = new Set(); newSet.getVertexList().add(v); forest.add(newSet); //Creating Disjoint Sets } //sort the edges in the graph based on their weight Collections.sort(g.getEdges(), new Comparator<Edge>(){ public int compare(Edge o1, Edge o2) { return o1.getWeight().compareTo(o2.getWeight()); } }); for (Edge edge : g.getEdges()) { //Find in which set the vertices in the edges belong int rep1 = Set.findRep(edge.getFromVertex(), forest); int rep2 = Set.findRep(edge.getToVertex(), forest); //If in different sets then merge them into one set and pick the edge. if (rep1 != rep2) { finalEdgeSet.add(edge); Set.Union(rep1, rep2, forest); } } System.out.println("The Minimum Spanning tree is"); for (Edge edge : finalEdgeSet) { System.out.println("Vertex: " + edge.getFromVertex().getLabel() + " to Vertex: " + edge.getToVertex().getLabel()); } System.out.println(""); }//End of Main } class Set { private ArrayList<Vertex> vertexList; private int representative; static int count; public Set() { vertexList = new ArrayList<Vertex>(); this.representative = ++(Set.count); } //Find the set identifier in which the given vertex belongs to. public static int findRep(Vertex vertex, ArrayList<Set> forest) { int rep = 0; for (Set set : forest) { for (Vertex v : set.getVertexList()) { if (v.getLabel().equals(vertex.getLabel())) { return set.getRepresentative(); } } } return rep; } //Find the set given the step identifier. public static Set findSet(int rep, ArrayList<Set> forest) { Set resultSet = null; for (Set set : forest) { if (set.getRepresentative() == rep) { return set; } } return resultSet; } //Merge the set into another and remove it from the main set. public static void Union(int rep1, int rep2, ArrayList<Set> forest) { Set set1 = Set.findSet(rep1, forest); Set set2 = Set.findSet(rep2, forest); for (Vertex v : set2.getVertexList()) { set1.getVertexList().add(v); } forest.remove(set2); } public ArrayList<Vertex> getVertexList() { return vertexList; } public int getRepresentative() { return representative; } }
The above code loads the graph data from the dsl- graph.gr. The DSL scripts have to be placed in the resources package so that the DSL library can locate it.
The output for the above code:
The Minimum Spanning tree is Vertex: A to Vertex: D Vertex: E to Vertex: C Vertex: D to Vertex: F Vertex: A to Vertex: B Vertex: B to Vertex: E Vertex: E to Vertex: G
and showing the same diagramatically
Some suggestions: wouldn’t it be better to choose something else than Set as class name? Also, VertexList should be a List, not an ArrayList.