Getting Started With Neo4j
Pseudocode to create our ‘Toy’ Network
Five Nodes
N1 = Tom
N2 = Harry
N3 = Julian
N4 = Michele
N5 = Josephine
Five Edges
e1 = Harry ‘is known by’ Tom
e2 = Julian ‘is co-worker of’ Harry
e3 = Michele ‘is wife of’ Harry
e4 = Josephine ‘is wife of’ Tom
e5 = Josephine ‘is friend of’ Michele
A simple text description of a graph
N1 - e1 -> N2
N2 - e2 -> N3
N2 - e3 -> N4
N1 - e4 -> N5
N4 - e5 -> N5
A more technical text description of a graph
N1:ToyNode - e1 -> N2:ToyNode
N2 - e2 -> N3:ToyNode
N2 - e3 -> N4:ToyNode
N1 - e4 -> N5:ToyNode
N4 - e5 -> N5
Even more technical pseudo-code
N1:ToyNode - ToyRelation -> N2:ToyNode
N2 - ToyRelation -> N3:ToyNode
N2 - ToyRelation -> N4:ToyNode
N1 - ToyRelation -> N5:ToyNode
N4 - ToyRelation -> N5
Pseudo-code approximating CYPHER code
N1:ToyNode {name: 'Tom'} - ToyRelation {relationship: 'knows'} -> N2:ToyNode {name: 'Harry'}
N2 - ToyRelation {relationship: 'co-worker'} -> N3:ToyNode {name: 'Julian', job: 'plumber'}
N2 - ToyRelation {relationship: 'wife'} -> N4:ToyNode {name: 'Michele', job: 'accountant'}
N1 - ToyRelation {relationship: 'wife'} -> N5:ToyNode {name: 'Josephine', job: 'manager'}
N4 - ToyRelation {relationship: 'friend'} -> N5
The actual CYPHER code to create our ‘Toy’ network
create (N1:ToyNode {name: 'Tom'}) - [:ToyRelation {relationship: 'knows'}] -> (N2:ToyNode {name: 'Harry'}),
(N2) - [:ToyRelation {relationship: 'co-worker'}] -> (N3:ToyNode {name: 'Julian', job: 'plumber'}),
(N2) - [:ToyRelation {relationship: 'wife'}] -> (N4:ToyNode {name: 'Michele', job: 'accountant'}),
(N1) - [:ToyRelation {relationship: 'wife'}] -> (N5:ToyNode {name: 'Josephine', job: 'manager'}),
(N4) - [:ToyRelation {relationship: 'friend'}] -> (N5)
;
View the resulting graph
match (n:ToyNode)-[r]-(m) return n, r, m
Delete all nodes that have edges, together with those edges
match (n)-[r]-() delete n, r
Delete all nodes which have no edges
match (n) delete n
Delete only ToyNode nodes which have no edges
match (n:ToyNode) delete n
Delete all edges
match (n)-[r]-() delete r
Delete only ToyRelation edges
match (n)-[r:ToyRelation]-() delete r
Selecting an existing single ToyNode node
match (n:ToyNode {name:'Julian'}) return n
Adding a Node Correctly
match (n:ToyNode {name:'Julian'})
merge (n)-[:ToyRelation {relationship: 'fiancee'}]->(m:ToyNode {name:'Joyce', job:'store clerk'})
Adding a Node Incorrectly
create (n:ToyNode {name:'Julian'})-[:ToyRelation {relationship: 'fiancee'}]->(m:ToyNode {name:'Joyce', job:'store clerk'})
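The difference between the two approaches above can be sketched outside Neo4j. This is a minimal Python illustration, assuming a toy in-memory store; the helper names add_node and find_nodes are hypothetical and only mimic the idea that create always makes new nodes while match reuses existing ones.

```python
# Toy in-memory stand-in for a graph store; all helper names are illustrative.
nodes = [{"id": 0, "name": "Julian", "job": "plumber"}]
edges = []


def add_node(**props):
    """Always append a new node, which is what create does."""
    node = {"id": len(nodes), **props}
    nodes.append(node)
    return node


def find_nodes(**props):
    """Return existing nodes whose properties include all of props, like match."""
    return [n for n in nodes if all(n.get(k) == v for k, v in props.items())]


# Correct: look up the existing Julian first, then attach Joyce to him.
julian = find_nodes(name="Julian")[0]
joyce = add_node(name="Joyce", job="store clerk")
edges.append((julian["id"], "fiancee", joyce["id"]))
julian_count_after_match = len(find_nodes(name="Julian"))

# Incorrect: creating the whole pattern makes a second Julian and a second Joyce.
julian2 = add_node(name="Julian")
joyce2 = add_node(name="Joyce", job="store clerk")
edges.append((julian2["id"], "fiancee", joyce2["id"]))
julian_count_after_create = len(find_nodes(name="Julian"))
```

After the first approach there is still one Julian; after the second there are two, which is the duplicate that the clean-up query is meant to remove.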
Correct your mistake by deleting the bad nodes and edge
match (n:ToyNode {name:'Joyce'})-[r]-(m) delete n, r, m
Modify a Node’s Information
match (n:ToyNode) where n.name = 'Harry' set n.job = 'drummer'
match (n:ToyNode) where n.name = 'Harry' set n.job = n.job + ['lead guitarist']
One way to "clean the slate" in Neo4j before importing (run both lines):
match (a)-[r]->() delete a,r
match (a) delete a
Script to Import Data Set: test.csv (simple road network)
For Windows use something like the following
[NOTE: replace any spaces in your path with %20, "percent twenty" ]
LOAD CSV WITH HEADERS FROM "file:///C:/coursera/data/test.csv" AS line
MERGE (n:MyNode {Name:line.Source})
MERGE (m:MyNode {Name:line.Target})
MERGE (n) -[:TO {dist:line.distance}]-> (m)
For Mac OS X use something like the following
[NOTE: replace any spaces in your path with %20, "percent twenty" ]
LOAD CSV WITH HEADERS FROM "file:///coursera/data/test.csv" AS line
MERGE (n:MyNode {Name:line.Source})
MERGE (m:MyNode {Name:line.Target})
MERGE (n) -[:TO {dist:line.distance}]-> (m)
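As a rough picture of what the import script does, the sketch below reads a few made-up rows in Python. Only the column names Source, Target and distance come from the script; the sample rows are hypothetical. It also shows that values read from a CSV file arrive as strings, which is why dist has to be converted to an integer before it can be summed.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the import script.
sample = io.StringIO("Source,Target,distance\nA,B,3\nA,C,5\nB,C,4\n")

nodes = set()  # one entry per distinct Name, as MERGE would leave it
edges = {}     # (source, target) -> dist; simplification: keyed by endpoints only

for line in csv.DictReader(sample):
    nodes.add(line["Source"])
    nodes.add(line["Target"])
    edges[(line["Source"], line["Target"])] = line["distance"]  # still a string

# Convert explicitly before doing arithmetic on the distances.
total_distance = sum(int(d) for d in edges.values())
```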
Script to import global terrorist data
LOAD CSV WITH HEADERS FROM "file:///Users/jsale/sdsc/coursera/data/terrorist_data_subset.csv" AS row
MERGE (c:Country {Name:row.Country})
MERGE (a:Actor {Name: row.ActorName, Aliases: row.Aliases, Type: row.ActorType})
MERGE (o:Organization {Name: row.AffiliationTo})
MERGE (a)-[:AFFILIATED_TO {Start: row.AffiliationStartDate, End: row.AffiliationEndDate}]->(o)
MERGE(c)<-[:IS_FROM]-(a);
Basic Graph Operations with CYPHER
Counting the number of nodes
match (n:MyNode)
return count(n)
Counting the number of edges
match (n:MyNode)-[r]->()
return count(r)
Finding leaf nodes:
match (n:MyNode)-[r:TO]->(m)
where not ((m)-->())
return m
Finding root nodes:
match (m)-[r:TO]->(n:MyNode)
where not (()-->(m))
return m
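The leaf and root queries above amount to two set differences. A minimal Python sketch, assuming a small hypothetical edge list (not the contents of test.csv):

```python
# Hypothetical directed edge list (source, target).
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]

sources = {s for s, _ in edges}
targets = {t for _, t in edges}

leaves = sorted(targets - sources)  # reached by an edge, but no outgoing edge
roots = sorted(sources - targets)   # start an edge, but no incoming edge
```

Here leaves is ['E'] and roots is ['A'].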
Finding triangles:
match (a)-[:TO]->(b)-[:TO]->(c)-[:TO]->(a)
return distinct a, b, c
Finding the 1st and 2nd neighbors of D:
match (a)-[:TO*..2]-(b)
where a.Name='D'
return distinct a, b
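The pattern [:TO*..2] matches nodes one or two hops away, in either direction because the pattern has no arrow. The same idea can be sketched in Python as a breadth-first search with a hop limit; the edge list and the function name within_hops are illustrative assumptions.

```python
from collections import deque

# Hypothetical edge list; direction is ignored, as in the undirected pattern.
edges = [("A", "B"), ("B", "D"), ("D", "E"), ("E", "F"), ("F", "G")]

adj = {}
for s, t in edges:
    adj.setdefault(s, set()).add(t)
    adj.setdefault(t, set()).add(s)


def within_hops(start, max_hops):
    """Map each node reachable in at most max_hops steps to its hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]  # the start node is not its own neighbor
    return seen


neighbors_of_d = within_hops("D", 2)
```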
Finding the labels of a node:
match (n)
where n.Name = 'Afghanistan'
return labels(n)
Finding the type of an edge:
match (n {Name: 'Afghanistan'})<-[r]-()
return distinct type(r)
Finding all properties of a node:
match (n:Actor)
return * limit 20
Finding loops:
match (n)-[r]->(n)
return n, r limit 10
Finding multigraphs:
match (n)-[r1]->(m), (n)-[r2]-(m)
where r1 <> r2
return n, r1, r2, m limit 10
Finding the induced subgraph given a set of nodes:
match (n)-[r:TO]-(m)
where n.Name in ['A', 'B', 'C', 'D', 'E'] and m.Name in ['A', 'B', 'C', 'D', 'E']
return n, r, m
Path Analytics with CYPHER
Viewing the graph
match (n:MyNode)-[r]->(m)
return n, r, m
Finding paths between specific nodes*:
match p=(a)-[:TO*]-(c)
where a.Name='H' and c.Name='P'
return p limit 1
*Your results might not match the video hands-on demo. If they differ, try the following query, which should return the shortest path between nodes H and P:
match p=(a)-[:TO*]-(c) where a.Name='H' and c.Name='P' return p order by length(p) asc limit 1
Finding the length between specific nodes:
match p=(a)-[:TO*]-(c)
where a.Name='H' and c.Name='P'
return length(p) limit 1
Finding a shortest path between specific nodes:
match p=shortestPath((a)-[:TO*]-(c))
where a.Name='A' and c.Name='P'
return p, length(p) limit 1
All Shortest Paths:
MATCH p = allShortestPaths((source)-[r:TO*]-(destination))
WHERE source.Name='A' AND destination.Name = 'P'
RETURN [n IN NODES(p) | n.Name] AS Paths
All Shortest Paths with Path Conditions:
MATCH p = allShortestPaths((source)-[r:TO*]->(destination))
WHERE source.Name='A' AND destination.Name = 'P' AND SIZE(NODES(p)) > 5
RETURN [n IN NODES(p) | n.Name] AS Paths, length(p)
Diameter of the graph:
match (n:MyNode), (m:MyNode)
where n <> m
with n, m
match p=shortestPath((n)-[*]->(m))
return n.Name, m.Name, length(p)
order by length(p) desc limit 1
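The diameter query takes the longest of all pairwise shortest paths, measured in hops. A minimal Python sketch of the same computation, assuming a small hypothetical directed graph and a helper named hop_distances:

```python
from collections import deque

# Hypothetical directed edge list (source, target).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]

adj = {}
for s, t in edges:
    adj.setdefault(s, []).append(t)
    adj.setdefault(t, [])


def hop_distances(start):
    """Breadth-first search: fewest hops from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Longest shortest path over all ordered pairs of distinct, connected nodes.
diameter, start, end = max(
    (d, s, t) for s in adj for t, d in hop_distances(s).items() if s != t
)
```

Pairs with no directed path between them are skipped, just as the query returns no row for them.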
Extracting and computing with node and properties:
match p=(a)-[:TO*]-(c)
where a.Name='H' and c.Name='P'
return [n in nodes(p) | n.Name] as Nodes, length(p) as pathLength,
reduce(s=0, e in relationships(p)| s + toInteger(e.dist)) as pathDist limit 1
Dijkstra's algorithm for a specific target node:
MATCH (from: MyNode {Name:'A'}), (to: MyNode {Name:'P'}),
path = shortestPath((from)-[:TO*]->(to))
WITH REDUCE(dist = 0, rel in relationships(path) | dist + toInteger(rel.dist)) AS distance, path
RETURN path, distance
Dijkstra's algorithm, single-source shortest paths (SSSP):
MATCH (from: MyNode {Name:'A'}), (to: MyNode),
path = shortestPath((from)-[:TO*]->(to))
WITH REDUCE(dist = 0, rel in relationships(path) | dist + toInteger(rel.dist)) AS distance, path, from, to
RETURN from, to, path, distance order by distance desc
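Note that shortestPath picks the path with the fewest relationships; the two queries above then add up dist along that path. When edge distances differ, the path with the fewest hops is not necessarily the one with the smallest total distance. For comparison, here is a minimal Python sketch of a weight-based single-source search in the sense of Dijkstra, assuming a small hypothetical weighted graph (the edge list is illustrative, not test.csv):

```python
import heapq

# Hypothetical weighted directed edges (source, target, dist).
edges = [("A", "B", 1), ("B", "P", 10), ("A", "C", 2), ("C", "D", 2), ("D", "P", 2)]

adj = {}
for s, t, w in edges:
    adj.setdefault(s, []).append((t, w))
    adj.setdefault(t, [])


def dijkstra(source):
    """Smallest total dist from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter route was already found
        for nxt, w in adj[node]:
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


distances = dijkstra("A")
```

In this example the two-hop route A, B, P has total distance 11, while the three-hop route A, C, D, P has total distance 6, so distances["P"] is 6.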
Graph not containing a selected node:
match (n)-[r:TO]->(m)
where n.Name <> 'D' and m.Name <> 'D'
return n, r, m
Shortest path over a Graph not containing a selected node:
match p=shortestPath((a {Name: 'A'})-[:TO*]-(b {Name: 'P'}))
where not('D' in [n in nodes(p) | n.Name])
return p, length(p)
Graph not containing the immediate neighborhood of a specified node:
match (d {Name:'D'})-[:TO]-(b)
with collect(distinct b.Name) as neighbors
match (n)-[r:TO]->(m)
where
not (n.Name in (neighbors+'D'))
and
not (m.Name in (neighbors+'D'))
return n, r, m
;
match (d {Name:'D'})-[:TO]-(b)-[:TO]->(leaf)
where not((leaf)-->())
return (leaf)
;
match (d {Name:'D'})-[:TO]-(b)<-[:TO]-(root)
where not((root)<--())
return (root)
Graph not containing a selected neighborhood:
match (a {Name: 'F'})-[:TO*..2]-(b)
with collect(distinct b.Name) as MyList
match (n)-[r:TO]->(m)
where not(n.Name in MyList) and not (m.Name in MyList)
return distinct n, r, m
Connectivity Analytics with CYPHER
Viewing the graph
match (n:MyNode)-[r]->(m)
return n, r, m
Find the outdegree of all nodes
match (n:MyNode)-[r]->()
return n.Name as Node, count(r) as Outdegree
order by Outdegree
union
match (a:MyNode)-[r]->(leaf)
where not((leaf)-->())
return leaf.Name as Node, 0 as Outdegree
Find the indegree of all nodes
match (n:MyNode)<-[r]-()
return n.Name as Node, count(r) as Indegree
order by Indegree
union
match (a:MyNode)<-[r]-(root)
where not((root)<--())
return root.Name as Node, 0 as Indegree
Find the degree of all nodes
match (n:MyNode)-[r]-()
return n.Name, count(distinct r) as degree
order by degree
Find degree histogram of the graph
match (n:MyNode)-[r]-()
with n as nodes, count(distinct r) as degree
return degree, count(nodes) order by degree asc
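The histogram query first counts edges per node and then counts nodes per degree. A minimal Python sketch of those two steps, assuming a small hypothetical edge list:

```python
from collections import Counter

# Hypothetical edge list; direction is ignored when counting degree.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

# Step 1: degree of each node (every edge contributes to both endpoints).
degree = Counter()
for s, t in edges:
    degree[s] += 1
    degree[t] += 1

# Step 2: number of nodes having each degree, ordered by degree.
histogram = sorted(Counter(degree.values()).items())
```

Here histogram is [(1, 1), (2, 2), (3, 1)]. As in the query, nodes with no edges never appear, because they are not matched by the edge pattern.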
Save the degree of the node as a new node property
match (n:MyNode)-[r]-()
with n, count(distinct r) as degree
set n.deg = degree
return n.Name, n.deg
Construct the Adjacency Matrix of the graph
match (n:MyNode), (m:MyNode)
return n.Name, m.Name,
case
when (n)-->(m) then 1
else 0
end as value
Construct the Normalized Laplacian Matrix of the graph (uses the deg property saved in the earlier step)
match (n:MyNode), (m:MyNode)
return n.Name, m.Name,
case
when n.Name = m.Name then 1
when (n)-->(m) then -1/(sqrt(toInteger(n.deg))*sqrt(toInteger(m.deg)))
else 0
end as value
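The case expression above follows the usual entry-by-entry definition of the normalized Laplacian: 1 on the diagonal, -1/sqrt(deg(n)*deg(m)) for adjacent nodes, and 0 otherwise. A minimal Python sketch of that definition follows, assuming a small hypothetical graph. Unlike the query, which tests the directed pattern (n)-->(m), this sketch treats edges as undirected, so the resulting matrix is symmetric.

```python
import math

# Hypothetical edge list, treated as undirected in this sketch.
edges = [("A", "B"), ("B", "C")]

names = sorted({n for e in edges for n in e})
neighbors = {n: set() for n in names}
for s, t in edges:
    neighbors[s].add(t)
    neighbors[t].add(s)
deg = {n: len(neighbors[n]) for n in names}


def laplacian_entry(n, m):
    """One entry of the normalized Laplacian for nodes n and m."""
    if n == m:
        return 1.0
    if m in neighbors[n]:
        return -1.0 / math.sqrt(deg[n] * deg[m])
    return 0.0


matrix = [[laplacian_entry(n, m) for m in names] for n in names]
```

For the path A, B, C the degrees are 1, 2 and 1, so the entry for (A, B) is -1/sqrt(2) and the entry for (A, C) is 0.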