Part 1 Chapter 1 Last verified 2026-06-19

Knowledge Graph Fundamentals

The graph data model — nodes, relationships, properties, labels — and how it maps to graph theory; why graph traversal complements vector similarity search for RAG; "identity through relationships"; knowledge graphs vs tabular storage; and a first taste of Cypher text notation.

On this page
  1. Why knowledge graphs matter for RAG
  2. The graph data model
  3. Graph-theory equivalences
  4. Knowledge graphs vs tabular storage
  5. A first taste of Cypher
  6. Where knowledge graphs are used
  7. The course roadmap
  8. Summary

Why knowledge graphs matter for RAG

In a basic RAG system, documents become chunks, chunks become vectors, and retrieval is nearest-neighbour search by cosine similarity. That finds semantically similar text well, but it cannot discover structural connections between entities. Store those chunks in a knowledge graph and a new option opens: retrieve one chunk, then traverse relationships to reach connected chunks and entities that similarity search alone would not surface.

Name two retrieval strategies a knowledge graph enables beyond vector similarity search, and the kind of question each answers. KGR-1.2

Graph traversal (follow typed relationships from a starting node to connected entities — answers “what is connected to this?”) and multi-hop expansion (chain several relationships to reach indirectly-related context — answers “what is connected to the things connected to this?”). Vector search only answers “what text is similar?”, so the graph adds connection-based retrieval that embedding distance can’t.
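The two retrieval steps can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the chunk ids, the 2-D embeddings, and the helper names `nearest_chunk` and `expand` are all made up for the example, and a real system would use a vector index and a graph database instead of in-memory lists.

```python
import math

# Hypothetical toy data: chunk id -> (text, embedding). The vectors are invented.
chunks = {
    "c1": ("Company overview", [1.0, 0.0]),
    "c2": ("Risk factors", [0.0, 1.0]),
    "c3": ("Risk factors, continued", [0.6, 0.8]),
}

# Hypothetical typed relationships between chunks: (start, type, end).
relationships = [
    ("c1", "NEXT", "c2"),
    ("c2", "NEXT", "c3"),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_chunk(query_vec):
    """Vector step: return the id of the chunk most similar to the query."""
    return max(chunks, key=lambda cid: cosine(query_vec, chunks[cid][1]))


def expand(chunk_id, rel_type, hops):
    """Graph step: follow rel_type outward from chunk_id for up to `hops` hops."""
    found, frontier = [], [chunk_id]
    for _ in range(hops):
        frontier = [end for (start, t, end) in relationships
                    if t == rel_type and start in frontier]
        found.extend(frontier)
    return found


hit = nearest_chunk([0.9, 0.1])                  # similarity search picks one chunk
context = [hit] + expand(hit, "NEXT", hops=2)    # traversal adds connected chunks
print(context)
```

In this toy data the chunk `c2` is nearly orthogonal to the query vector, so similarity search alone would rank it last; it is reached only because a relationship connects it to the hit.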

The graph data model

  • Nodes are data records representing entities. In Cypher text notation they’re written in parentheses, with each label after a colon: (:Person). Each node carries one or more labels (e.g. Person, Company, Movie) that group it with similar entities, plus zero or more properties.
  • Relationships connect two nodes and are themselves rich records: a start and end node, a type (ACTED_IN, KNOWS, OWNS_STOCK_IN), a direction, and optional properties. They are written in square brackets between the two nodes, with an arrow showing the direction.

// Person named Andreas KNOWS Person named Andrew, since 2024
(:Person {name: "Andreas"})-[:KNOWS {since: 2024}]->(:Person {name: "Andrew"})

// Patterns chain to express richer scenarios:
(:Person {name: "Andreas"})-[:TEACHES]->(:Course {title: "KG for RAG"})
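To make the record structure concrete, here is a minimal Python sketch of the same model. The class names `Node` and `Relationship` and the helper `to_pattern` are assumptions made for illustration; they are not part of Neo4j or any driver API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                   # e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                                   # direction runs start -> end
    type: str                                     # e.g. "KNOWS"
    end: Node
    properties: dict = field(default_factory=dict)


def _props(properties):
    """Render a property map as {key: value, ...}, or nothing if it is empty."""
    if not properties:
        return ""
    return " {" + ", ".join(f"{k}: {json.dumps(v)}" for k, v in properties.items()) + "}"


def to_pattern(rel):
    """Render one relationship as a Cypher-style text pattern."""
    def node(n):
        return "(:" + ":".join(sorted(n.labels)) + _props(n.properties) + ")"
    return f"{node(rel.start)}-[:{rel.type}{_props(rel.properties)}]->{node(rel.end)}"


andreas = Node({"Person"}, {"name": "Andreas"})
andrew = Node({"Person"}, {"name": "Andrew"})
course = Node({"Course"}, {"title": "KG for RAG"})

knows = Relationship(andreas, "KNOWS", andrew, {"since": 2024})
teaches = Relationship(andreas, "TEACHES", course)

print(to_pattern(knows))
print(to_pattern(teaches))
```

The point of the sketch is that a relationship is a full record with its own type, direction, and properties, rather than a bare pointer between two rows.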
Key concept

Identity through relationships

KGR-1.3

A node’s role comes from its relationships, not from extra labels. A Person is an actor because it has an ACTED_IN relationship, a director because it has DIRECTED, a writer because it has WROTE; no separate Actor/Director/Writer labels are needed. Direction and type are the semantics. When it breaks: when a role needs role-specific properties (an actor’s per-film salary vs a director’s budget), relationship-only identity pushes those onto the relationship, which can get unwieldy at scale.

Explain 'identity through relationships'. Why does a Person node not need separate Actor/Director labels? KGR-1.3

A node’s role is encoded by the typed relationships it participates in, not by a label or type field. A Person with an ACTED_IN relationship is an actor; the same node with a DIRECTED relationship is a director — so adding Actor/Director labels would be redundant with the relationships and risk a “label explosion.” Direction and relationship type carry the semantic meaning.
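A small Python sketch shows how roles can be derived on demand from relationship types instead of being stored as labels. The node names are placeholders, and the `ROLE_BY_TYPE` mapping and `roles_of` helper are hypothetical names introduced only for this example.

```python
# Hypothetical relationship records: (start node, type, end node).
relationships = [
    ("Person A", "ACTED_IN", "Movie X"),
    ("Person A", "DIRECTED", "Movie X"),
    ("Person B", "WROTE", "Movie X"),
]

# Illustrative mapping from relationship type to the role it implies.
ROLE_BY_TYPE = {"ACTED_IN": "actor", "DIRECTED": "director", "WROTE": "writer"}


def roles_of(person):
    """Derive roles from outgoing relationship types; no role labels are stored."""
    return {ROLE_BY_TYPE[t] for (start, t, _end) in relationships
            if start == person and t in ROLE_BY_TYPE}


print(roles_of("Person A"))
print(roles_of("Person B"))
```

Adding a new role for a person means adding one relationship; the node itself never changes.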

Graph-theory equivalences

The vocabulary maps directly onto classic graph theory. Interviewers use both sets of terms, so know each:

| Graph theory | KG term | Why the KG term is preferred |
| --- | --- | --- |
| Vertex | Node | More intuitive for data modelling |
| Edge | Relationship | Conveys richness (type, direction, properties) |
| Graph | Knowledge graph | Emphasises semantic meaning and structure |
| Adjacency | Traversal | Describes following relationships through data |

Knowledge graphs vs tabular storage

The core trade-off is relationship complexity, not data size. A relational database represents relationships with foreign keys and reconstructs them with JOINs; a graph makes relationships first-class, so multi-hop questions become traversals instead of nested JOINs.

| Dimension | Relational DB | Knowledge graph |
| --- | --- | --- |
| Schema | Fixed (DDL required) | Flexible (schema-optional) |
| Relationships | Foreign keys + JOINs | First-class records |
| Multi-hop queries | Expensive (nested JOINs) | Natural (traversal) |
| Vector search | Bolt-on (e.g. pgvector) | Native (vector indexes) |
| Query language | SQL | Cypher |
| Best for | Transactional, tabular | Connected, semantic data |

The cost intuition: a $d$-hop traversal touches roughly $O(k^d)$ nodes, where $k$ is the average fan-out per node, but it only walks the local neighbourhood, whereas the SQL equivalent needs $d$ self-JOINs over the whole table.
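The fan-out estimate can be checked with a short breadth-first traversal. The sketch below assumes an idealised graph, a complete tree in which every node has exactly $k$ children, so that the count is exactly $k + k^2 + \dots + k^d$; real graphs have uneven fan-out and cycles, so treat the numbers as an intuition rather than a benchmark. The helper names `build_tree` and `nodes_within` are hypothetical.

```python
from collections import deque


def build_tree(k, depth):
    """Adjacency list for a complete k-ary tree; node 0 is the root."""
    adjacency, next_id, level = {0: []}, 1, [0]
    for _level in range(depth):
        new_level = []
        for parent in level:
            for _child in range(k):
                adjacency[parent].append(next_id)
                adjacency[next_id] = []
                new_level.append(next_id)
                next_id += 1
        level = new_level
    return adjacency


def nodes_within(adjacency, start, d):
    """Breadth-first traversal counting nodes reached in at most d hops."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == d:
            continue  # do not expand past the hop limit
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return len(seen) - 1  # exclude the start node itself


graph = build_tree(k=3, depth=5)            # 364 nodes in total
touched = nodes_within(graph, 0, d=2)       # 3 + 9 = 12 nodes
print(len(graph), touched)
```

The traversal visits 12 of the 364 nodes, and that count would stay the same if the rest of the graph were made arbitrarily larger, which is the sense in which traversal cost depends on the neighbourhood rather than on total data size.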

On which single axis does the knowledge-graph-vs-relational decision most turn, and which way does each side point? KGR-1.4

Relationship complexity. If the data is highly interconnected with many-to-many or multi-hop relationships (and you query across them), the graph wins — traversal stays cheap where SQL JOINs explode. If relationships are simple foreign keys and the workload is tabular or pure similarity, a relational database (plus a vector index) is simpler and sufficient. Schema flexibility and native vector+graph queries are secondary tilts toward the graph.

A first taste of Cypher

Cypher is Neo4j’s declarative, pattern-matching query language: you draw the pattern you want and it returns every matching subgraph. The notation mirrors the whiteboard: () for nodes, [] for relationships, -> for direction.

// Find all people who acted in a movie
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, m.title

Read it as: find every Person connected to a Movie by an ACTED_IN relationship, and return the person’s name and the movie’s title. The pattern syntax is the graph structure, which makes Cypher queries self-documenting. (Module 2 is a full Cypher tutorial; Module 7 has an LLM generate Cypher automatically.)

In Cypher text notation, what encloses a node, what encloses a relationship, and how is direction shown? Write the pattern for a Person who DIRECTED a Movie. KGR-1.6

Nodes are in parentheses (), relationships in square brackets [:TYPE], and direction with an arrow -> (or <-). Pattern: (p:Person)-[:DIRECTED]->(m:Movie). Labels follow a colon inside the node; properties go in {} on either a node or a relationship.
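To see what a pattern match does conceptually, here is a small in-memory Python sketch of a single-relationship pattern. It is not how Neo4j evaluates Cypher; the toy graph, the placeholder names, and the `match` helper are assumptions made for illustration.

```python
# Hypothetical toy graph; names and titles are placeholders.
nodes = {
    1: {"labels": {"Person"}, "name": "Person A"},
    2: {"labels": {"Person"}, "name": "Person B"},
    3: {"labels": {"Movie"}, "title": "Movie X"},
}
relationships = [
    (1, "ACTED_IN", 3),   # (start id, type, end id); direction is start -> end
    (2, "DIRECTED", 3),
]


def match(start_label, rel_type, end_label):
    """Yield (start, end) node pairs for (:start_label)-[:rel_type]->(:end_label)."""
    for start_id, t, end_id in relationships:
        start, end = nodes[start_id], nodes[end_id]
        if (t == rel_type and start_label in start["labels"]
                and end_label in end["labels"]):
            yield start, end


# Rough analogue of: MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p.name, m.title
actors = [(p["name"], m["title"]) for p, m in match("Person", "ACTED_IN", "Movie")]

# Rough analogue of: MATCH (p:Person)-[:DIRECTED]->(m:Movie) RETURN p.name, m.title
directors = [(p["name"], m["title"]) for p, m in match("Person", "DIRECTED", "Movie")]

print(actors, directors)
```

Swapping the labels, as in `match("Movie", "ACTED_IN", "Person")`, yields nothing, because the pattern is directional.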

Where knowledge graphs are used

Give two real-world uses of knowledge graphs outside RAG, and what the graph encodes in each. KGR-1.5

Two of: web-search knowledge cards (entities like people/companies linked by typed relationships, aggregated into a summary panel); e-commerce recommendations (purchase/view relationships traversed for “also bought” suggestions); financial analysis (companies, investors, and filings as connected entities). In each, the graph encodes entities and the typed relationships between them, enabling look-ups and traversals that tabular storage handles awkwardly.
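The e-commerce case can be sketched as a two-hop traversal. The purchase data and the `also_bought` helper below are invented for illustration, and a production recommender would weigh far more signals than raw co-purchase counts.

```python
from collections import Counter

# Hypothetical PURCHASED relationships: (customer, product).
purchased = [
    ("alice", "laptop"), ("alice", "mouse"),
    ("bob", "laptop"), ("bob", "mouse"), ("bob", "dock"),
    ("carol", "laptop"), ("carol", "dock"),
    ("dave", "phone"),
]


def also_bought(product):
    """Two hops: product <- its buyers -> their other products, ranked by count."""
    buyers = {c for c, p in purchased if p == product}
    counts = Counter(p for c, p in purchased if c in buyers and p != product)
    return counts.most_common()


print(also_bought("laptop"))
```

The first hop walks from the product back to its buyers, and the second hop walks from those buyers to everything else they bought, which is the traversal behind an "also bought" panel.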

The course roadmap

The guide builds one capability per module onto the same graph (the course’s seven Modules map one-to-one to this guide’s Chapters — the terms are used interchangeably):

| Module | Topic | What gets added |
| --- | --- | --- |
| 1 | Fundamentals | Vocabulary: nodes, relationships, properties, labels |
| 2 | Querying | Cypher: MATCH, WHERE, CREATE, MERGE, DELETE |
| 3 | Preparing for RAG | Vector indexes, embeddings, similarity search |
| 4 | Graph construction | SEC 10-K chunks as nodes; vector search on the graph |
| 5 | Adding relationships | NEXT, PART_OF, SECTION; chunk-window retrieval |
| 6 | Expanding the graph | Form 13 data; Company/Manager nodes; full-text index |
| 7 | Chatting with the KG | LLM-generated Cypher; few-shot; combined retrieval |

Summary

A knowledge graph stores data as nodes and relationships, both carrying properties; labels group nodes, and relationship type + direction encode semantic meaning. That structure enables traversal-based retrieval that complements vector similarity — surfacing connections embeddings can’t. Cypher expresses queries as the very patterns you’d draw on a whiteboard. Reach for a graph when your questions are about connections across entities; for simple tabular look-ups, relational storage is simpler.

  • Chapter 2: querying & Cypher (MATCH/MERGE/WHERE against a real dataset).
  • Chapters 3–6: building a knowledge graph from SEC filings (vectors, then relationships, then expansion).
  • Chapter 7: graph-RAG chat, where an LLM writes the Cypher for you.