Neo4j: Graphing the ‘My name is…I work’ Twitter meme
Over the last few days I’ve been watching with interest the chain of ‘My name is…’ tweets kicked off by DHH. As I understand it, the idea is to show that asking candidates to solve riddles or hard coding tasks on a whiteboard in interviews is ridiculous.
Hello, my name is David. I would fail to write bubble sort on a whiteboard. I look code up on the internet all the time. I don't do riddles.
— DHH (@dhh) February 21, 2017
Other people quoted that tweet and added their own piece, and yesterday Eduardo Hernacki suggested that traversing this chain of tweets seemed tailor-made for Neo4j.
https://twitter.com/eduardohki/status/836402386440761347
Michael was quickly on the scene and created a Cypher query which calls the Twitter API and creates a Neo4j graph from the resulting JSON response. The only tricky bit is creating a ‘bearer token’ but Jason Kotchoff has a helpful gist showing how to generate one from your Twitter consumer key and consumer secret.
Now that we’ve got our bearer token, let’s create a parameter to store it. Type the following in the Neo4j browser:
:param bearer: '<your-bearer-token-goes-here>'
Now we’re ready to query the Twitter API. We’ll start with the search API and find all tweets which contain the text ‘"my name" "I work"’. That will return a JSON response containing lots of tweets. We’ll then create a node for each tweet it returns, a node for the user who posted the tweet, a node for the tweet it quotes, and relationships to glue them all together.
We’re going to use the apoc.load.jsonParams procedure from the APOC library to help us import the data. If you want to follow along you can use a Neo4j sandbox instance which comes with APOC installed. For your local Neo4j installation, grab the APOC jar and put it into your plugins folder before restarting Neo4j.
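Before running the import it can be worth confirming that APOC is actually loaded. The following is a minimal sketch, assuming a reasonably recent APOC build in which both the `apoc.version()` function and the `apoc.help` procedure are available; run the two statements separately in the Neo4j browser.

```cypher
// Returns the installed APOC version; fails with an unknown-function error if APOC is not loaded
RETURN apoc.version() AS apoc_version;

// Looks up the procedure used in this post and shows its description
CALL apoc.help('jsonParams') YIELD name, text
RETURN name, text;
```

If the second statement returns no rows, the jar is probably not in the plugins folder or Neo4j has not been restarted since it was added.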
This is the query in full:
WITH 'https://api.twitter.com/1.1/search/tweets.json?count=100&result_type=recent&lang=en&q=' as url, {bearer} as bearer
CALL apoc.load.jsonParams(url + "%22my%20name%22%20is%22%20%22I%20work%22",{Authorization:"Bearer "+bearer},null) yield value
UNWIND value.statuses as status
WITH status, status.user as u, status.entities as e
WHERE status.quoted_status_id is not null

// create a node for the original tweet
MERGE (t:Tweet {id:status.id})
ON CREATE SET t.text=status.text,t.created_at=status.created_at,t.retweet_count=status.retweet_count, t.favorite_count=status.favorite_count

// create a node for the author + a POSTED relationship from the author to the tweet
MERGE (p:User {name:u.screen_name})
MERGE (p)-[:POSTED]->(t)

// create a MENTIONED relationship from the tweet to any users mentioned in the tweet
FOREACH (m IN e.user_mentions | MERGE (mu:User {name:m.screen_name}) MERGE (t)-[:MENTIONED]->(mu))

// create a node for the quoted tweet and create a QUOTED relationship from the original tweet to the quoted one
MERGE (q:Tweet {id:status.quoted_status_id})
MERGE (t)-[:QUOTED]->(q)

// repeat the above steps for the quoted tweet
WITH t as t0, status.quoted_status as status
WHERE status is not null
WITH t0, status, status.user as u, status.entities as e

MERGE (t:Tweet {id:status.id})
ON CREATE SET t.text=status.text,t.created_at=status.created_at,t.retweet_count=status.retweet_count, t.favorite_count=status.favorite_count

MERGE (t0)-[:QUOTED]->(t)

MERGE (p:User {name:u.screen_name})
MERGE (p)-[:POSTED]->(t)

FOREACH (m IN e.user_mentions | MERGE (mu:User {name:m.screen_name}) MERGE (t)-[:MENTIONED]->(mu))

MERGE (q:Tweet {id:status.quoted_status_id})
MERGE (t)-[:QUOTED]->(q);
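The import query looks nodes up with MERGE on `Tweet.id` and `User.name`, so it may help to create uniqueness constraints on those two properties before running it. This is an optional extra rather than a step from the original walkthrough. The sketch below uses the Neo4j 3.x constraint syntax, to match the `{bearer}` parameter style used here; newer Neo4j versions use a `CREATE CONSTRAINT ... FOR ... REQUIRE ...` form instead.

```cypher
// Assumption: Neo4j 3.x syntax; run each statement separately in the browser
// One Tweet node per tweet id, backed by an index that MERGE can use
CREATE CONSTRAINT ON (t:Tweet) ASSERT t.id IS UNIQUE;

// One User node per screen name
CREATE CONSTRAINT ON (u:User) ASSERT u.name IS UNIQUE;
```

With the constraints in place, re-running the import for another page of search results should not create duplicate tweets or users.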
The resulting graph looks like this:
MATCH p=()-[r:QUOTED]->() RETURN p LIMIT 25
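A quick sanity check on the import is to count what was created. The import sets `text` only on tweets that came back in the search response, with or without a nested quoted tweet, so a tweet that is known only by its `quoted_status_id` ends up as a bare node with just an `id`. The following sketch, which uses only the labels and properties from the import query, counts both kinds.

```cypher
// How many tweets were imported, and how many are placeholders without text?
MATCH (t:Tweet)
RETURN count(t) AS tweets,
       sum(CASE WHEN t.text IS NULL THEN 1 ELSE 0 END) AS tweets_without_text;

// How many QUOTED relationships link the tweets together?
MATCH (:Tweet)-[q:QUOTED]->(:Tweet)
RETURN count(q) AS quoted_relationships;
```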
A more interesting query would be to find the path from DHH to Eduardo which we can find with the following query:
MATCH path = (dhh:Tweet {id: 834146806594433025})<-[:QUOTED*]-(eduardo:Tweet {id: 836400531983724545})
UNWIND nodes(path) AS tweet
MATCH (tweet)<-[:POSTED]-(user)
RETURN tweet, user
This query:
- starts from DHH’s tweet
- traverses all QUOTED relationships until it finds Eduardo’s tweet
- collects all those tweets and then finds the author
- returns the tweet and the author
And this is the output:
I ran a couple of other queries against the Twitter API to hydrate some nodes that we hadn’t set all the properties on – you can see all the queries on this gist.
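Going back to the DHH-to-Eduardo path query, a tabular variant can make the chain easier to read than the graph view. The following is a sketch under the same assumptions as that query (the two tweet ids and the `QUOTED` and `POSTED` relationships created by the import). The `hop` column is introduced here purely for illustration, and `OPTIONAL MATCH` is used so that tweets whose author has not been loaded still appear in the result.

```cypher
// Walk the QUOTED chain between DHH's tweet and Eduardo's tweet
MATCH path = (dhh:Tweet {id: 834146806594433025})<-[:QUOTED*]-(eduardo:Tweet {id: 836400531983724545})
WITH nodes(path) AS tweets
// Number each tweet by its position in the path (0 = DHH's tweet)
UNWIND range(0, size(tweets) - 1) AS hop
WITH hop, tweets[hop] AS tweet
// Keep tweets even when no POSTED relationship exists for them yet
OPTIONAL MATCH (tweet)<-[:POSTED]-(user:User)
RETURN hop, user.name AS author, tweet.text AS text
ORDER BY hop;
```

Rows where `author` or `text` is null point at tweets that still need their properties filled in.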
For the next couple of days I also have a sandbox running at https://10-0-1-157-32898.neo4jsandbox.com/browser/. You can log in using the credentials readonly/twitter.
If you have any questions/suggestions let me know in the comments, @markhneedham on Twitter, or email the Neo4j DevRel team – devrel@neo4j.com.
Reference: Neo4j: Graphing the ‘My name is…I work’ Twitter meme from our JCG partner Mark Needham at the Mark Needham Blog.