Neo4j: Finding all shortest paths
One of the Cypher language features we show in Neo4j training courses is the shortest path function which allows you to find the shortest path in terms of number of relationships between two nodes.
Using the movie graph, which you can import via the ‘:play movies’ command in the browser, we’ll first create a ‘KNOWS’ relationship between any people that have appeared in the same movie:
MATCH (p1:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(p2:Person) MERGE (p1)-[:KNOWS]-(p2)
Now that we’ve got that relationship we can easily find the shortest path between two people, say Tom Cruise and Tom Hanks:
MATCH (p1:Person {name: "Tom Hanks"}), (p2:Person {name: "Tom Cruise"}), path = shortestpath((p1)-[:KNOWS*]-(p2)) RETURN path
That works pretty well but what if we want to find the longest shortest path between any two people in the graph? We can calculate it like this:
MATCH (p1:Person), (p2:Person), path = shortestpath((p1)-[:KNOWS*]-(p2)) RETURN path ORDER BY LENGTH(path) DESC LIMIT 1
So that’s 6 hops which is actually the Bacon number – I expect we’d probably see a smaller maximum value if we imported all the movies.
And to round off the post what if we want to find the longest shortest path between the 10 people who acted in the most movies? We might start out with the following query which seems like it should do the job:
MATCH (p1:Person)-[:ACTED_IN]->() WITH p1, COUNT(*) AS appearances ORDER BY appearances DESC LIMIT 10 WITH p1 AS p1, p1 AS p2 MATCH path = shortestpath((p1)-[:KNOWS*]-(p2)) RETURN path ORDER BY LENGTH(path) DESC LIMIT 1
Unfortunately if we run that query we get no rows returned because ‘p1′ and ‘p2′ always refer to the same node. Instead we can calculate the shortest path between our hardest working people by creating a cross product using COLLECT and UNWIND:
MATCH (p1:Person)-[:ACTED_IN]->() WITH p1, COUNT(*) AS appearances ORDER BY appearances DESC LIMIT 10 WITH COLLECT(p1) AS ps UNWIND ps AS p1 UNWIND ps AS p2 MATCH path = shortestpath((p1)-[:KNOWS*]-(p2)) RETURN path ORDER BY LENGTH(path) DESC LIMIT 1
That’s all for now!
Reference: | Neo4j: Finding all shortest paths from our JCG partner Mark Needham at the Mark Needham Blog blog. |