Knowledge Graph Fundamentals
The graph data model — nodes, relationships, properties, labels — and how it maps to graph theory; why graph traversal complements vector similarity search for RAG; "identity through relationships"; knowledge graphs vs tabular storage; and a first taste of Cypher text notation.
Why knowledge graphs matter for RAG
In a basic RAG system, documents become chunks, chunks become vectors, and retrieval is nearest-neighbour search by cosine similarity. That finds semantically similar text well, but it cannot discover structural connections between entities. Store those chunks in a knowledge graph and a new option opens: retrieve one chunk, then traverse relationships to reach connected chunks and entities that similarity search is unlikely to surface.
Name two retrieval strategies a knowledge graph enables beyond vector similarity search, and the kind of question each answers. KGR-1.2
Graph traversal (follow typed relationships from a starting node to connected entities — answers “what is connected to this?”) and multi-hop expansion (chain several relationships to reach indirectly-related context — answers “what is connected to the things connected to this?”). Vector search only answers “what text is similar?”, so the graph adds connection-based retrieval that embedding distance can’t.
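The two retrieval steps can be sketched in plain Python. This is a toy illustration under stated assumptions, not the course's implementation: the 2-D embeddings, the chunk and company identifiers, and the MENTIONS relationship type are all made up for the example, and the graph is an in-memory list rather than a database.

```python
import math

# Toy chunk embeddings (made-up 2-D vectors; real embeddings have far more dimensions).
EMBEDDINGS = {
    "chunk-1": [0.9, 0.1],
    "chunk-2": [0.2, 0.8],
    "chunk-3": [0.1, 0.9],
}

# Typed, directed relationships as (start, type, end) triples (hypothetical data).
RELATIONSHIPS = [
    ("chunk-1", "NEXT", "chunk-2"),
    ("chunk-2", "NEXT", "chunk-3"),
    ("chunk-1", "MENTIONS", "company-acme"),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_chunk(query_vector):
    """Vector search: return the chunk whose embedding is most similar to the query."""
    return max(EMBEDDINGS, key=lambda cid: cosine(query_vector, EMBEDDINGS[cid]))


def expand(start, hops):
    """Graph traversal: follow outgoing relationships for up to `hops` steps."""
    seen, frontier = {start}, {start}
    for _ in range(hops):
        frontier = {end for s, _type, end in RELATIONSHIPS if s in frontier} - seen
        seen |= frontier
    return seen - {start}


hit = nearest_chunk([1.0, 0.0])   # similarity picks the entry point
one_hop = expand(hit, hops=1)     # "what is connected to this?"
two_hop = expand(hit, hops=2)     # "what is connected to the things connected to this?"
```

Here chunk-3 is reached only through the second hop: its embedding is far from the query, so similarity alone would rank it last.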
The graph data model
- Nodes are data records representing entities. In Cypher text notation they’re written in parentheses: (:Person). Each node carries one or more labels (e.g. Person, Company, Movie) that group it with similar entities, plus zero or more properties.
- Relationships connect two nodes and are themselves rich records: a start and end node, a type (ACTED_IN, KNOWS, OWNS_STOCK_IN), a direction, and optional properties. They are written in square brackets with arrows.
// A Person named Andreas KNOWS a Person named Andrew, since 2024
(:Person {name: "Andreas"})-[:KNOWS {since: 2024}]->(:Person {name: "Andrew"})
// Patterns chain to express richer scenarios:
(:Person {name: "Andreas"})-[:TEACHES]->(:Course {title: "KG for RAG"})
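As a rough illustration of the data model, the sketch below represents nodes and relationships as Python dataclasses and renders a relationship in Cypher-style text notation. The class and function names (Node, Relationship, relationship_text, and so on) are hypothetical helpers for this example only; they are not part of Neo4j or any driver.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # zero or more properties


@dataclass
class Relationship:
    start: Node                                      # direction runs start -> end
    rel_type: str                                    # e.g. "KNOWS"
    end: Node
    properties: dict = field(default_factory=dict)   # optional properties


def props_text(properties):
    """Render a property map as ' {key: value, ...}', or '' when empty."""
    if not properties:
        return ""
    inner = ", ".join(f"{key}: {json.dumps(value)}" for key, value in properties.items())
    return " {" + inner + "}"


def node_text(node):
    """Render a node in parentheses, with each label after a colon."""
    labels = "".join(f":{label}" for label in sorted(node.labels))
    return f"({labels}{props_text(node.properties)})"


def relationship_text(rel):
    """Render start-[:TYPE {props}]->end."""
    return (
        f"{node_text(rel.start)}"
        f"-[:{rel.rel_type}{props_text(rel.properties)}]->"
        f"{node_text(rel.end)}"
    )


andreas = Node({"Person"}, {"name": "Andreas"})
andrew = Node({"Person"}, {"name": "Andrew"})
knows = Relationship(andreas, "KNOWS", andrew, {"since": 2024})
pattern = relationship_text(knows)
```

The point of the sketch is that a relationship is a full record in its own right: it has its own type, direction, and property map rather than being a bare pointer between two rows.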
Identity through relationships
KGR-1.3 A node’s role comes from its relationships, not from extra labels. A Person is an actor because it has an ACTED_IN relationship, a director because it has DIRECTED, a writer because it has WROTE; no separate Actor/Director/Writer labels are needed. Direction and type are the semantics. When it breaks: when a role needs role-specific properties (an actor’s per-film salary vs a director’s budget), relationship-only identity pushes those onto the relationship, which can get unwieldy at scale.
Explain 'identity through relationships'. Why does a Person node not need separate Actor/Director labels? KGR-1.3
A node’s role is encoded by the typed relationships it participates in, not by a label or type field. A Person with an ACTED_IN relationship is an actor; the same node with a DIRECTED relationship is a director — so adding Actor/Director labels would be redundant with the relationships and risk a “label explosion.” Direction and relationship type carry the semantic meaning.
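One way to picture this: roles can be computed from relationship types on demand instead of being stored as labels. The sketch below assumes a tiny in-memory list of triples with invented node identifiers; the ROLE_BY_TYPE mapping is an illustrative helper, not something the graph itself stores.

```python
# (start, type, end) triples; identifiers are illustrative, not from a real dataset.
RELATIONSHIPS = [
    ("person-1", "ACTED_IN", "movie-1"),
    ("person-1", "DIRECTED", "movie-2"),
    ("person-2", "WROTE", "movie-1"),
]

# Hypothetical mapping from relationship type to the role it implies.
ROLE_BY_TYPE = {"ACTED_IN": "actor", "DIRECTED": "director", "WROTE": "writer"}


def roles_of(node_id):
    """Derive a node's roles from the types of its outgoing relationships."""
    return {
        ROLE_BY_TYPE[rel_type]
        for start, rel_type, _end in RELATIONSHIPS
        if start == node_id and rel_type in ROLE_BY_TYPE
    }


person_1_roles = roles_of("person-1")   # the same node holds two roles at once
person_2_roles = roles_of("person-2")
movie_roles = roles_of("movie-1")       # direction matters: movies are only end nodes here
```

Adding a new role for person-1 means adding one relationship; no label on the node has to change.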
Graph-theory equivalences
The vocabulary maps directly onto classic graph theory. Interviewers use both sets of terms, so know each:
| Graph theory | KG term | Why the KG term is preferred |
| --- | --- | --- |
| Vertex | Node | More intuitive for data modelling |
| Edge | Relationship | Conveys richness (type, direction, properties) |
| Graph | Knowledge graph | Emphasises semantic meaning and structure |
| Adjacency | Traversal | Describes following relationships through data |
Knowledge graphs vs tabular storage
The core trade-off is relationship complexity, not data size. A relational database represents relationships with foreign keys and reconstructs them with JOINs; a graph makes relationships first-class, so multi-hop questions become traversals instead of nested JOINs.
| Dimension | Relational DB | Knowledge graph |
| --- | --- | --- |
| Schema | Fixed (DDL required) | Flexible (schema-optional) |
| Relationships | Foreign keys + JOINs | First-class records |
| Multi-hop queries | Expensive (nested JOINs) | Natural (traversal) |
| Vector search | Bolt-on (e.g. pgvector) | Native (vector indexes) |
| Query language | SQL | Cypher |
| Best for | Transactional, tabular | Connected, semantic data |
The cost intuition: a $k$-hop traversal touches roughly $d^k$ nodes, where $d$ is the average fan-out per node, but it only walks the local neighbourhood, whereas the SQL equivalent needs $k$ self-JOINs over the whole table.
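A small experiment makes the intuition concrete. The sketch below assumes a synthetic tree in which every non-leaf node has the same fan-out (3 here) and walks 2 hops from the root; the helper names and the chosen numbers are illustrative only. Counting every node within the hop limit gives 3 + 9 = 12, which is on the order of fan-out raised to the number of hops and far below the size of the whole graph.

```python
from collections import deque


def k_hop_neighbourhood(adjacency, start, k):
    """Breadth-first walk returning every node within k hops of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, []):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return set(depth) - {start}


def build_tree(fan_out, levels):
    """Build a complete tree where every non-leaf node has `fan_out` children."""
    adjacency, frontier, next_id = {}, [0], 1
    for _ in range(levels):
        new_frontier = []
        for parent in frontier:
            children = list(range(next_id, next_id + fan_out))
            next_id += fan_out
            adjacency[parent] = children
            new_frontier.extend(children)
        frontier = new_frontier
    return adjacency


tree = build_tree(fan_out=3, levels=5)
total_nodes = 1 + sum(3 ** level for level in range(1, 6))   # 364 nodes in the graph
reached = k_hop_neighbourhood(tree, start=0, k=2)            # only the local neighbourhood
```

The walk never looks at the other 351 nodes, so its cost depends on the neighbourhood, not on how large the graph has grown.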
On which single axis does the knowledge-graph-vs-relational decision most turn, and which way does each side point? KGR-1.4
Relationship complexity. If the data is highly interconnected with many-to-many or multi-hop relationships (and you query across them), the graph wins — traversal stays cheap where SQL JOINs explode. If relationships are simple foreign keys and the workload is tabular or pure similarity, a relational database (plus a vector index) is simpler and sufficient. Schema flexibility and native vector+graph queries are secondary tilts toward the graph.
A first taste of Cypher
Cypher is Neo4j’s declarative, pattern-matching query language: you draw the pattern you want and it returns every matching subgraph. The notation mirrors the whiteboard: () for nodes, [] for relationships, -> for direction.
// Find all people who acted in a movie
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, m.title
Read it as: find every Person connected to a Movie by an outgoing ACTED_IN relationship, and return the person’s name and the movie’s title. The pattern syntax is the graph structure, which makes Cypher queries largely self-documenting (Module 2 is a full Cypher tutorial; Module 7 has an LLM generate Cypher automatically).
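To see what the MATCH pattern asks for, here is a conceptual emulation in Python. It is only a sketch under simple assumptions: the nodes and relationships are invented placeholders held in memory, the match function is a hypothetical helper, and a real graph database would resolve the pattern with indexes and relationship pointers rather than a linear scan.

```python
# Invented placeholder data: two people and one movie.
NODES = {
    "p1": {"labels": {"Person"}, "name": "Person A"},
    "p2": {"labels": {"Person"}, "name": "Person B"},
    "m1": {"labels": {"Movie"}, "title": "Movie X"},
}
RELATIONSHIPS = [
    ("p1", "ACTED_IN", "m1"),
    ("p2", "DIRECTED", "m1"),
]


def match(start_label, rel_type, end_label):
    """Yield (start, end) node pairs matching (:start_label)-[:rel_type]->(:end_label)."""
    for start, kind, end in RELATIONSHIPS:
        if (
            kind == rel_type
            and start_label in NODES[start]["labels"]
            and end_label in NODES[end]["labels"]
        ):
            yield NODES[start], NODES[end]


# Counterpart of: MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p.name, m.title
acted = [(p["name"], m["title"]) for p, m in match("Person", "ACTED_IN", "Movie")]

# Changing only the relationship type changes the question being asked.
directed = [(p["name"], m["title"]) for p, m in match("Person", "DIRECTED", "Movie")]
```

Each returned row corresponds to one matching subgraph, which is why the query yields one row per person-movie pair rather than one row per person.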
In Cypher text notation, what encloses a node, what encloses a relationship, and how is direction shown? Write the pattern for a Person who DIRECTED a Movie. KGR-1.6
Nodes are in parentheses (), relationships in square brackets [:TYPE], and direction with an arrow -> (or <-). Pattern: (p:Person)-[:DIRECTED]->(m:Movie). Labels follow a colon inside the node; properties go in {} on either a node or a relationship.
Where knowledge graphs are used
Give two real-world uses of knowledge graphs outside RAG, and what the graph encodes in each. KGR-1.5
Two of: web-search knowledge cards (entities like people/companies linked by typed relationships, aggregated into a summary panel); e-commerce recommendations (purchase/view relationships traversed for “also bought” suggestions); financial analysis (companies, investors, and filings as connected entities). In each, the graph encodes entities and the typed relationships between them, enabling look-ups and traversals that tabular storage handles awkwardly.
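The “also bought” case is a two-hop traversal: from a product back to its buyers, then forward to the other products those buyers purchased. The sketch below assumes a made-up purchase list and a hypothetical also_bought helper; it illustrates the traversal shape only, not a production recommender.

```python
from collections import Counter

# (customer, product) purchase relationships; all names are made up.
PURCHASES = [
    ("alice", "laptop"), ("alice", "mouse"),
    ("bob", "laptop"), ("bob", "mouse"), ("bob", "dock"),
    ("carol", "mouse"), ("carol", "keyboard"),
]


def also_bought(product):
    """Hop 1: find the product's buyers. Hop 2: count their other purchases."""
    buyers = {customer for customer, item in PURCHASES if item == product}
    counts = Counter(
        item
        for customer, item in PURCHASES
        if customer in buyers and item != product
    )
    return counts.most_common()


suggestions = also_bought("laptop")
```

Carol's keyboard never appears in the result because she is not connected to the laptop; the traversal only reaches products that share a buyer with it.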
The course roadmap
The guide builds one capability per module onto the same graph (the course’s seven Modules map one-to-one to this guide’s Chapters — the terms are used interchangeably):
| Module | Topic | What gets added |
| --- | --- | --- |
| 1 | Fundamentals | Vocabulary: nodes, relationships, properties, labels |
| 2 | Querying | Cypher: MATCH, WHERE, CREATE, MERGE, DELETE |
| 3 | Preparing for RAG | Vector indexes, embeddings, similarity search |
| 4 | Graph construction | SEC 10-K chunks as nodes; vector search on the graph |
| 5 | Adding relationships | NEXT, PART_OF, SECTION; chunk-window retrieval |
| 6 | Expanding the graph | Form 13 data; Company/Manager nodes; full-text index |
| 7 | Chatting with the KG | LLM-generated Cypher; few-shot; combined retrieval |
Summary
A knowledge graph stores data as nodes and relationships, both carrying properties; labels group nodes, and relationship type + direction encode semantic meaning. That structure enables traversal-based retrieval that complements vector similarity — surfacing connections embeddings can’t. Cypher expresses queries as the very patterns you’d draw on a whiteboard. Reach for a graph when your questions are about connections across entities; for simple tabular look-ups, relational storage is simpler.
- Chapter 2: querying & Cypher, with MATCH/MERGE/WHERE against a real dataset.
- Chapters 3–6: building a knowledge graph from SEC filings (vectors, then relationships, then expansion).
- Chapter 7: graph-RAG chat, where an LLM writes the Cypher for you.