Querying Knowledge Graphs
A working Cypher tutorial — MATCH patterns with label/property filters and WHERE, one-hop and multi-hop relationship traversal (co-actors), modifying the graph with CREATE/MERGE/DELETE (and why MERGE is the idempotent default), connecting from Python via the LangChain Neo4j Graph class, and the parse → plan → execute → project execution model.
Connecting from Python
The notebook talks to Neo4j through LangChain’s
Neo4jGraph class, which wraps the connection
and exposes a query() method that sends Cypher and returns a list of Python dicts. It
needs three connection settings (URL, username, password), loaded from the environment and never hard-coded:
from langchain_community.graphs import Neo4jGraph
kg = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD)
result = kg.query("MATCH (n) RETURN count(n)") # [{'count(n)': 171}]
How does kg.query() connect a Python app to Neo4j, and what three parameters does the connection need? KGR-2.6
The LangChain Neo4jGraph class wraps a Neo4j driver: you construct it with a URL, a username, and a password (loaded from environment variables, not hard-coded), and it handles connection pooling and auth. Its query() method sends a Cypher string to the database and returns the results as a list of Python dictionaries — the bridge between application code and the graph.
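A minimal sketch of the "loaded from the environment" step, assuming the settings live in variables named NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD (the names used in the snippet above). The helpers `load_neo4j_settings` and `connect` are hypothetical names for illustration, not part of LangChain.

```python
import os

REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_neo4j_settings(env=None):
    """Return Neo4jGraph keyword arguments read from environment variables.

    Hypothetical helper: it fails early and names every missing variable,
    instead of letting the connection attempt fail with a vaguer error.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "url": env["NEO4J_URI"],
        "username": env["NEO4J_USERNAME"],
        "password": env["NEO4J_PASSWORD"],
    }


def connect():
    """Build the graph wrapper; the import is deferred so the helper above
    can be used and tested without LangChain installed."""
    from langchain_community.graphs import Neo4jGraph

    return Neo4jGraph(**load_neo4j_settings())
```

Calling `connect()` with the three variables exported should give the same `kg` object as the snippet above. Passing a plain dict to `load_neo4j_settings` makes the validation easy to check without touching the real environment.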
The movie dataset
The training graph models Person and Movie nodes (171 nodes, 38 movies) joined by relationship types that make Chapter 1’s “identity through relationships” concrete — a person is an actor or a director purely by their relationships, not by a label:
| Relationship | Direction | Meaning |
| --- | --- | --- |
| ACTED_IN | Person → Movie | acted in |
| DIRECTED | Person → Movie | directed |
| WROTE | Person → Movie | wrote |
| REVIEWED | Person → Movie | reviewed |
| FOLLOWS | Person → Person | a reviewer follows another |
Person carries name, born; Movie carries title, tagline, released.
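The idea that a role comes from relationships, not labels, can be sketched in plain Python. The triples below are a small illustrative sample, not the training graph, and `ROLE_FOR` and `roles_of` are hypothetical names.

```python
# (person, relationship type, movie) triples: illustrative sample only.
TRIPLES = [
    ("Tom Hanks", "ACTED_IN", "Cast Away"),
    ("Robert Zemeckis", "DIRECTED", "Cast Away"),
    ("Tom Hanks", "ACTED_IN", "That Thing You Do"),
    ("Tom Hanks", "DIRECTED", "That Thing You Do"),
]

# Every person is just a Person node; the role is read off the relationship type.
ROLE_FOR = {
    "ACTED_IN": "actor",
    "DIRECTED": "director",
    "WROTE": "writer",
    "REVIEWED": "reviewer",
}


def roles_of(person, triples):
    """Derive a person's roles purely from their outgoing relationships."""
    return {
        ROLE_FOR[rel]
        for who, rel, _target in triples
        if who == person and rel in ROLE_FOR
    }


print(roles_of("Tom Hanks", TRIPLES))
print(roles_of("Robert Zemeckis", TRIPLES))
```

In this sample the same Person node is both an actor and a director, with no extra label needed to say so.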
Reading the graph
A Cypher MATCH specifies a pattern; the engine binds variables to every matching element. Filter by label inside the parentheses, match a property with braces, or add conditions with WHERE:
MATCH (n) RETURN count(n) AS totalNodes // all nodes (171)
MATCH (m:Movie) RETURN count(m) AS movies // filter by label (38)
MATCH (tom:Person {name: "Tom Hanks"}) RETURN tom // property match
MATCH (m:Movie) WHERE m.released > 1990 AND m.released < 2000 RETURN m.title
Write a Cypher query that returns the titles of all movies released after 2000, and say which clause does the filtering. KGR-2.1
MATCH (m:Movie) WHERE m.released > 2000 RETURN m.title. The MATCH clause binds every Movie node; the WHERE clause filters them to those with released > 2000; RETURN m.title projects just the title. (A comparison like this needs WHERE: inline {...} property matching only tests equality.)
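From Python, the filter values would usually be passed as query parameters, not formatted into the Cypher string. The sketch below assumes `query()` accepts a `params` dict, as `Neo4jGraph.query` in `langchain_community` does at the time of writing. `movies_released_between` and `FakeGraph` are hypothetical names; `FakeGraph` is a stand-in so the example runs without a database.

```python
def movies_released_between(kg, after, before):
    """Return titles of movies released strictly between two years.

    `kg` is any object with a query(cypher, params) method, such as the
    Neo4jGraph instance built earlier. WHERE still does the filtering;
    the years are supplied as the $after and $before parameters.
    """
    cypher = (
        "MATCH (m:Movie) "
        "WHERE m.released > $after AND m.released < $before "
        "RETURN m.title AS title"
    )
    rows = kg.query(cypher, params={"after": after, "before": before})
    return [row["title"] for row in rows]


class FakeGraph:
    """Hypothetical stand-in for Neo4jGraph: records calls, returns canned rows."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = []

    def query(self, cypher, params=None):
        self.calls.append((cypher, params))
        return self.rows


fake = FakeGraph([{"title": "The Matrix"}])
print(movies_released_between(fake, 1990, 2000))
```

Aliasing the column with `AS title` gives the returned dicts a stable key, so the caller does not depend on the literal expression text `m.title`.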
Traversing relationships
Extend the pattern with relationship syntax to follow connections. The arrow gives
direction — ACTED_IN goes Person → Movie:
// All movies Tom Hanks acted in
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(movie:Movie)
RETURN tom.name, movie.title
Multi-hop traversal is where graphs
earn their keep: chain relationships to reach indirect connections. To find Tom
Hanks’s co-actors, go forward to a shared movie, then backward (the reverse
arrow <-) from the co-actor:
// Co-actors: forward to the movie, backward from the co-actor
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActor:Person)
RETURN coActor.name, m.title
The shared node is the graph's JOIN
KGR-2.2 A two-hop pattern through a shared node, (a)-[:R]->(b)<-[:R]-(c), is the
canonical way to discover indirect connections; the middle node b is the graph
equivalent of a SQL JOIN, but written as a visual pattern instead of nested JOINs.
When it breaks: traversals explode combinatorially on high-degree nodes. With an
unfiltered start node, a movie with 50 actors yields 50 × 49 = 2,450 ordered co-actor pairs, and three hops can return
millions of rows. Always bound results with LIMIT or a label/property filter.
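The two-hop join and its growth can be mimicked over a plain edge list. This is a sketch on illustrative data, not the training graph, and it assumes each person has at most one ACTED_IN edge per movie; `co_actors` and `pair_rows` are hypothetical names.

```python
from collections import defaultdict

# Toy ACTED_IN edges as (person, movie) pairs: illustrative data only.
ACTED_IN = [
    ("Tom Hanks", "Cast Away"),
    ("Helen Hunt", "Cast Away"),
    ("Tom Hanks", "Apollo 13"),
    ("Kevin Bacon", "Apollo 13"),
    ("Bill Paxton", "Apollo 13"),
]


def co_actors(edges, name):
    """Mimic (p {name})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActor)."""
    cast = defaultdict(list)  # movie -> everyone with an ACTED_IN edge to it
    for person, movie in edges:
        cast[movie].append(person)

    rows = []
    for person, movie in edges:      # hop 1: forward from the person to a movie
        if person != name:
            continue
        for other in cast[movie]:    # hop 2: backward from the movie to its cast
            if other != name:        # Cypher never reuses one relationship in a match
                rows.append((other, movie))
    return rows


def pair_rows(cast_size):
    """Rows one movie contributes when the start node is left unfiltered."""
    return cast_size * (cast_size - 1)


print(co_actors(ACTED_IN, "Tom Hanks"))
print(pair_rows(50))  # 2450
```

The shared movie plays the role of the join key: the `cast` index is what a SQL engine would build for a self-join on the movie column.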
Modifying the graph
MATCH + RETURN reads; CREATE,
MERGE, and DELETE write.
Reads dominate in production, but graph-construction pipelines (Modules 4–6) live
on writes.
CREATE (andreas:Person {name: "Andreas"}) RETURN andreas // unconditional insert
MATCH (a:Person {name: "Andreas"}), (e:Person {name: "Emil Eifrem"})
MERGE (a)-[r:KNOWS]->(e) RETURN r // idempotent: only if absent
MATCH (p:Person {name: "Emil Eifrem"})-[r:ACTED_IN]->(:Movie)
DELETE r // remove a relationship
MERGE is the idempotent default
KGR-2.3 MERGE combines an existence check and a create in a single clause, so re-processing the
same data yields the same graph. The pipeline is idempotent, which is
exactly what you need when data arrives from unreliable sources (API retries,
replays, duplicate messages). Pair it with a uniqueness constraint, which also
guards against duplicates from concurrent writers, and use
ON CREATE SET / ON MATCH SET to set initial vs updated properties. When it
breaks: if you MERGE a pattern that includes a volatile property (a
changing timestamp), MERGE treats each value as new and creates a fresh element
every run. MERGE on the stable identifier instead, then ON MATCH SET the volatile
fields.
Compare CREATE and MERGE for a bulk-loading pipeline that may retry batches. Which is safe, and why? KGR-2.3
CREATE inserts unconditionally, so a retried batch duplicates nodes/relationships (or hits a constraint error). MERGE checks for the element first and creates only if absent, so re-processing the same data leaves the graph unchanged — it’s idempotent, the property a retry-prone pipeline needs. Use MERGE on a stable identifier and ON MATCH SET for volatile fields; reserve CREATE for guaranteed-new inserts where you want to skip the existence check.
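The retry behaviour can be illustrated with an in-memory stand-in. `ToyGraph` and `load_batch` are hypothetical names, and the model is much simpler than Neo4j's MERGE (single property key, no concurrency), so it only demonstrates the idempotence argument.

```python
class ToyGraph:
    """In-memory stand-in for a node store; not a Neo4j API."""

    def __init__(self):
        self.nodes = []  # each node is a dict of properties

    def create(self, props):
        """Like CREATE: always append, even if an identical node exists."""
        self.nodes.append(dict(props))

    def merge(self, key, value, on_create=None, on_match=None):
        """Like MERGE on one stable property, with ON CREATE / ON MATCH SET."""
        for node in self.nodes:
            if node.get(key) == value:
                node.update(on_match or {})   # found: only update
                return node
        node = {key: value, **(on_create or {})}
        self.nodes.append(node)               # absent: create once
        return node


def load_batch(graph, names, run_id, use_merge):
    """Load one batch; calling it again with the same names simulates a retry."""
    for name in names:
        if use_merge:
            graph.merge(
                "name", name,
                on_create={"first_seen": run_id},
                on_match={"last_seen": run_id},
            )
        else:
            graph.create({"name": name})


batch = ["Andreas", "Emil Eifrem"]
created, merged = ToyGraph(), ToyGraph()
for run_id in (1, 2):  # the second pass is the retried batch
    load_batch(created, batch, run_id, use_merge=False)
    load_batch(merged, batch, run_id, use_merge=True)

print(len(created.nodes))  # 4
print(len(merged.nodes))   # 2
```

The merge key here is only the stable `name`; the run identifier, which changes every run, is written through the ON MATCH branch. Including it in the merge key would make every retry look like a new node.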
How a query executes
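Understanding the pipeline helps you write fast queries: the engine parses the
query string, plans how to match the pattern, executes that plan, and projects the
returned values. Prefix a query with EXPLAIN to see the plan without executing it,
or with PROFILE to run it and see actual row counts per step.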
Trace what the engine does for MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p.name, m.title LIMIT 5. KGR-2.7
Parse the string into a pattern (a Person linked by ACTED_IN to a Movie). Plan: use the Person/Movie labels (and any index) to choose where to start and how to expand. Execute: find Person nodes, traverse outgoing ACTED_IN relationships to Movie nodes, binding p and m — stopping early thanks to LIMIT 5. Project: return only p.name and m.title for those 5 matches. Pattern spec → match → projection.
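The "stopping early thanks to LIMIT" step can be sketched with Python generators, which pull rows lazily in a loosely similar way. This is an analogy on illustrative data, not how Neo4j is implemented. The parse and plan stages are not modelled, and `run` and its inner functions are hypothetical names.

```python
from itertools import islice

# Toy graph: illustrative data only.
PEOPLE = ["Tom Hanks", "Helen Hunt", "Kevin Bacon", "Bill Paxton"]
ACTED_IN = {
    "Tom Hanks": ["Cast Away", "Apollo 13"],
    "Helen Hunt": ["Cast Away"],
    "Kevin Bacon": ["Apollo 13"],
    "Bill Paxton": ["Apollo 13"],
}


def run(limit):
    """Execute and project MATCH (p:Person)-[:ACTED_IN]->(m:Movie) ... LIMIT limit."""
    visited = []  # Person nodes the scan actually touched

    def scan_people():            # execute: scan nodes with the Person label
        for name in PEOPLE:
            visited.append(name)
            yield name

    def expand(people):           # execute: follow outgoing ACTED_IN edges
        for name in people:
            for title in ACTED_IN[name]:
                yield name, title

    # LIMIT: stop pulling rows once enough matches have been produced.
    matches = islice(expand(scan_people()), limit)
    # Project: keep only the two requested values per match.
    rows = [{"p.name": name, "m.title": title} for name, title in matches]
    return rows, visited


rows, visited = run(3)
print(rows)
print(visited)  # only the first two people were scanned
```

Because each stage is lazy, asking for three rows touches two Person nodes instead of all four, which is the kind of saving to look for in per-step row counts.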
Cypher pattern cheat-sheet
Nine patterns cover most knowledge-graph query needs and recur through every later module:
| Need | Cypher |
| --- | --- |
| Count all nodes | MATCH (n) RETURN count(n) |
| Filter by label | MATCH (m:Movie) RETURN m |
| Property match | MATCH (p:Person {name: "Tom"}) RETURN p |
| Conditional | MATCH (m:Movie) WHERE m.released > 2000 RETURN m |
| One hop | MATCH (p)-[:ACTED_IN]->(m) RETURN p, m |
| Two hops | MATCH (a)-[:R]->(b)<-[:R]-(c) RETURN a, c |
| Create node | CREATE (n:Label {k: "v"}) RETURN n |
| Merge relationship | MATCH (a),(b) MERGE (a)-[:R]->(b) |
| Delete relationship | MATCH ()-[r:R]->() DELETE r |
Why should Cypher variable names be meaningful, given the engine runs the query the same regardless? KGR-2.5
Because the names are for humans, not the engine: (tom:Person)-[:ACTED_IN]->(m:Movie) reads as its own documentation, while (n1)-[:ACTED_IN]->(n2) forces the reader to reconstruct what each variable means. In multi-hop queries the difference is the gap between a pattern you can scan and one you have to decode — meaningful names are a maintainability investment with zero runtime cost.
Summary
Cypher mirrors the graph: () nodes, [] relationships, arrows for direction.
MATCH binds patterns, labels and WHERE filter, dot notation projects.
Multi-hop traversals through a shared node are the graph’s JOIN, natural where SQL
needs nested self-JOINs. MERGE (not CREATE) is the idempotent default for
writes; plain DELETE refuses to remove a node that still has relationships, while
DETACH DELETE removes the node together with its relationships. These fundamentals
power every later module, where you build, query, and extend a knowledge graph
from SEC filings.
- Chapter 3 — preparing text for RAG: vector indexes and embeddings on the graph.
- Chapters 4–6 — constructing and expanding the SEC knowledge graph.
- Chapter 7 — an LLM writes this Cypher for you.