Part 1 Chapter 7 Last verified 2026-06-19

Chatting with the Knowledge Graph

The capstone: the Extract-Enhance-Expand pattern for growing graphs, Address nodes with geospatial points and point.distance() search, teaching an LLM to generate Cypher with few-shot prompts, the GraphCypherQAChain for end-to-end natural-language querying, schema-hallucination guards, and the four retrieval paradigms (vector, full-text, traversal, geospatial) the finished graph unites.

On this page
  1. The pattern behind the whole course: Extract-Enhance-Expand
  2. A fourth paradigm: geospatial search
  3. Teaching an LLM to write Cypher
  4. Four retrieval paradigms, one engine
  5. Summary — and the course in one arc

The pattern behind the whole course: Extract-Enhance-Expand

Key concept

Extract-Enhance-Expand

KGR-7.1

Every module repeated one three-phase move:

  1. Extract — pull source data into nodes (text → Chunks; CSV → Company / Manager / Address nodes).
  2. Enhance — add indexes or computed properties (vector embeddings, full-text indexes, geospatial points).
  3. Expand — connect the new nodes to the existing graph (NEXT, PART_OF, FILED, OWNS_STOCK_IN, LOCATED_AT).

It’s a repeatable framework for growing any graph incrementally. When it breaks: Expand assumes the new data links to existing nodes by a shared identifier — when identifiers are absent or ambiguous (name-only matching), Expand needs an entity-resolution step the pattern doesn’t itself provide. [V] Verified

| Module | Extract | Enhance | Expand |
| --- | --- | --- | --- |
| M4 Documents | Chunk nodes | vector embeddings | (none) |
| M5 Structure | Form nodes | section index | NEXT, PART_OF, SECTION |
| M6 Investments | Company, Manager | full-text index | FILED, OWNS_STOCK_IN |
| M7 Geography | Address nodes | geospatial points | LOCATED_AT |
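The three phases, and the point where Expand breaks, can be sketched on toy in-memory data. This is a minimal illustration rather than the course's loading code: the CSV columns, the `company_id` identifier, and the company names are hypothetical, and plain dicts stand in for graph nodes.

```python
import csv
import io

# Hypothetical source rows; column names are assumptions made for this sketch.
CSV_TEXT = """company_id,city,latitude,longitude
C1,Santa Clara,37.35,-121.95
C2,San Jose,37.34,-121.89
C9,Palo Alto,37.44,-122.14
"""

# Existing graph: Company nodes keyed by a shared identifier (hypothetical ids).
companies = {
    "C1": {"companyName": "Company One"},
    "C2": {"companyName": "Company Two"},
}


def extract(csv_text):
    """Extract: turn each source row into an Address node (a plain dict here)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def enhance(addresses):
    """Enhance: add a computed property, here a (latitude, longitude) point."""
    for address in addresses:
        address["location"] = (float(address["latitude"]), float(address["longitude"]))
    return addresses


def expand(addresses, known_companies):
    """Expand: link new nodes to existing ones by the shared identifier."""
    relationships, unmatched = [], []
    for address in addresses:
        if address["company_id"] in known_companies:
            relationships.append((address["company_id"], "LOCATED_AT", address["city"]))
        else:
            # No shared identifier: this row would need entity resolution.
            unmatched.append(address["company_id"])
    return relationships, unmatched


relationships, unmatched = expand(enhance(extract(CSV_TEXT)), companies)
print(relationships)
print(unmatched)
```

The `unmatched` list is where the pattern stops helping: rows with no identifier in common with the existing graph are set aside rather than linked.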

Name the three phases of Extract-Enhance-Expand and give one example of each from the course. KGR-7.1

Extract — create nodes from source data (e.g. split filing text into Chunk nodes, or build Manager nodes from the Form 13 CSV). Enhance — add indexes or computed properties (vector embeddings on chunks, a full-text index on manager names, geospatial points on addresses). Expand — connect the new nodes into the existing graph with relationships (NEXT/PART_OF for chunks, FILED/OWNS_STOCK_IN for the investment data, LOCATED_AT for addresses).

A fourth paradigm: geospatial search

The graph is pre-expanded with Address nodes (via LOCATED_AT), each holding a geospatial point (latitude/longitude). That unlocks distance queries with point.distance():

MATCH (sc:Address {city: "Santa Clara"})
MATCH (com:Company)-[:LOCATED_AT]->(comAddr:Address)
WHERE point.distance(sc.location, comAddr.location) < 10000
RETURN com.companyName

Write a query to find companies within 20 km of San Jose, and say what unit point.distance returns. KGR-7.2

MATCH (sj:Address {city: "San Jose"})
MATCH (com:Company)-[:LOCATED_AT]->(a:Address)
WHERE point.distance(sj.location, a.location) < 20000
RETURN com.companyName

point.distance() returns metres, so 20 km is < 20000 (not 20). A point index on the location property keeps the range query fast.
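To make the unit concrete, the sketch below computes a great-circle distance in metres with the haversine formula and compares it against a 20 km threshold. It is an independent illustration on a spherical-Earth approximation, not Neo4j's implementation, and the city-centre coordinates are approximate values chosen for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Approximate city-centre coordinates, used only for illustration.
san_jose = (37.3382, -121.8863)
santa_clara = (37.3541, -121.9552)

distance_m = haversine_m(*san_jose, *santa_clara)
within_20_km = distance_m < 20_000  # metres compared with metres
print(round(distance_m), within_20_km)
```

Writing the threshold as 20 instead of 20_000 would silently ask for everything within 20 metres, which is the mistake the flashcard warns about.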

Teaching an LLM to write Cypher

The capstone capability: instead of hand-writing Cypher, give the LLM the schema plus a few examples and let it translate plain English into queries — a few-shot prompt turns the model into a “Cypher compiler”.

Task: Generate Cypher to query a graph database.
- Use ONLY the provided relationship types and properties.
- Do not hallucinate relationship types or properties.
Schema: {schema}

# What investment firms are in San Francisco?
MATCH (mgr:Manager)-[:LOCATED_AT]->(a:Address)
WHERE a.city = "San Francisco" RETURN mgr.managerName

Question: {question}
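Filling the two placeholders can be sketched with plain string formatting. This is a dependency-free stand-in for whatever prompt-template class the chain uses; the schema summary string and the `build_prompt` helper are hypothetical.

```python
CYPHER_PROMPT_TEMPLATE = """Task: Generate Cypher to query a graph database.
- Use ONLY the provided relationship types and properties.
- Do not hallucinate relationship types or properties.
Schema: {schema}

# What investment firms are in San Francisco?
MATCH (mgr:Manager)-[:LOCATED_AT]->(a:Address)
WHERE a.city = "San Francisco" RETURN mgr.managerName

Question: {question}"""

# Hypothetical schema summary; a real one would be read from the database.
SCHEMA = "(:Manager)-[:LOCATED_AT]->(:Address), (:Company)-[:LOCATED_AT]->(:Address)"


def build_prompt(schema, question):
    # str.format treats every {...} in the template as a placeholder, so a
    # few-shot example containing a Cypher property map would need its braces
    # doubled before being added to the template text.
    return CYPHER_PROMPT_TEMPLATE.format(schema=schema, question=question)


prompt = build_prompt(SCHEMA, "Which managers are in Santa Clara?")
print(prompt)
```

The example query in the template uses a WHERE clause rather than an inline property map, which conveniently avoids the brace-escaping issue.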

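That prompt drives a GraphCypherQAChain, which runs the full loop end to end: it generates Cypher from the question, executes the query against the graph, and turns the results into a natural-language answer.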
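The shape of that loop can be sketched with plain functions. This is not the chain's source code: `answer_question` and the three stand-in callables are hypothetical, and the fake row exists only so the sketch runs without an LLM or a database.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def answer_question(
    question: str,
    generate_cypher: Callable[[str], str],
    run_query: Callable[[str], List[Row]],
    summarize: Callable[[str, List[Row]], str],
) -> Dict[str, object]:
    """Generate a query, execute it, then phrase the rows as an answer."""
    cypher = generate_cypher(question)  # step 1: model call using the few-shot prompt
    rows = run_query(cypher)            # step 2: execute against the graph
    answer = summarize(question, rows)  # step 3: turn rows into natural language
    return {"cypher": cypher, "rows": rows, "answer": answer}


# Stand-ins so the sketch runs on its own.
def fake_generate(question: str) -> str:
    return (
        "MATCH (mgr:Manager)-[:LOCATED_AT]->(a:Address) "
        'WHERE a.city = "San Francisco" RETURN mgr.managerName'
    )


def fake_run(cypher: str) -> List[Row]:
    return [{"mgr.managerName": "Example Capital"}]


def fake_summarize(question: str, rows: List[Row]) -> str:
    names = ", ".join(str(row["mgr.managerName"]) for row in rows)
    return f"Firms found: {names}"


result = answer_question(
    "What investment firms are in San Francisco?",
    fake_generate,
    fake_run,
    fake_summarize,
)
print(result["answer"])
```

Returning the generated Cypher alongside the answer keeps the intermediate step inspectable, which helps when a generated query needs debugging.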

Key concept

Progressive few-shot: diversity beats quantity

KGR-7.5

Each progressively added example unlocks a new query shape — city filter, then point.distance geospatial, then full-text + SECTION document navigation — and the LLM generalises each to unseen questions. Two or three diverse examples (filter, aggregate, traverse) cover far more than ten similar ones; past that, returns diminish fast. Choose for pattern coverage, not count. [V] Verified
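Choosing for coverage rather than count can be sketched as a small selection step. The example pool, its hand-assigned shape labels, and the `select_for_coverage` helper are all hypothetical; the idea is only that a second example of an already covered shape adds little.

```python
from typing import Dict, List

# Hypothetical example pool; "shape" labels are assigned by hand for this sketch.
EXAMPLE_POOL: List[Dict[str, str]] = [
    {"shape": "city filter", "question": "What investment firms are in San Francisco?"},
    {"shape": "city filter", "question": "What investment firms are in Menlo Park?"},
    {"shape": "geospatial", "question": "What companies are near Santa Clara?"},
    {"shape": "city filter", "question": "What companies are in Santa Clara?"},
    {"shape": "full-text + traversal", "question": "What does Palo Alto Networks do?"},
]


def select_for_coverage(pool: List[Dict[str, str]], limit: int) -> List[Dict[str, str]]:
    """Keep the first example of each query shape, up to `limit` examples."""
    chosen: List[Dict[str, str]] = []
    seen = set()
    for example in pool:
        if len(chosen) >= limit:
            break
        if example["shape"] not in seen:
            chosen.append(example)
            seen.add(example["shape"])
    return chosen


selected = select_for_coverage(EXAMPLE_POOL, limit=3)
shapes = [example["shape"] for example in selected]
print(shapes)
```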

Why does adding one 'What does company X do?' example unlock document navigation for many other companies? KGR-7.5

The example demonstrates a query shape — full-text find the company, then traverse FILED → SECTION → Chunk to its business section — not a single hard-coded answer. The LLM generalises the pattern, so it can apply the same full-text-then-traverse structure to any company name. That’s progressive few-shot: each diverse example teaches one new capability the model reuses.

What is schema hallucination in LLM-generated Cypher, and how do you prevent it? KGR-7.6

Schema hallucination is when the LLM generates Cypher referencing relationship types or properties that don’t exist in the graph — plausible-looking but invalid. Prevent it by injecting the real schema into the prompt, instructing the model to use only those types and “do not hallucinate”, supplying few-shot examples that use the correct relationships, and validating the generated query before running it in production.
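The final guard, validating a generated query before running it, can be sketched as a check of labels and relationship types against an allow-list. This is a rough regex-based illustration, not a Cypher parser: it ignores properties, backtick-quoted names, and pattern-like text inside string literals. The hard-coded allow-lists use the names from this course; in practice they would be read from the live schema.

```python
import re

# Allow-lists for the sketch; in practice these would come from the database.
ALLOWED_LABELS = {"Chunk", "Form", "Company", "Manager", "Address"}
ALLOWED_RELATIONSHIPS = {"NEXT", "PART_OF", "SECTION", "FILED", "OWNS_STOCK_IN", "LOCATED_AT"}

# "(var:Label" for nodes and "[var:TYPE" for relationships; the variable is optional.
LABEL_PATTERN = re.compile(r"\(\s*\w*\s*:\s*([^)\s{]+)")
RELATIONSHIP_PATTERN = re.compile(r"\[\s*\w*\s*:\s*([^\]\s{*]+)")


def referenced_names(pattern, cypher):
    """Collect every name captured by `pattern`, splitting forms like A|B or A:B."""
    names = set()
    for match in pattern.finditer(cypher):
        names.update(name for name in re.split(r"[|:&]", match.group(1)) if name)
    return names


def find_schema_violations(cypher, labels=ALLOWED_LABELS, relationships=ALLOWED_RELATIONSHIPS):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for name in sorted(referenced_names(LABEL_PATTERN, cypher) - labels):
        problems.append(f"unknown label: {name}")
    for name in sorted(referenced_names(RELATIONSHIP_PATTERN, cypher) - relationships):
        problems.append(f"unknown relationship type: {name}")
    return problems


good = 'MATCH (mgr:Manager)-[:LOCATED_AT]->(a:Address) WHERE a.city = "San Francisco" RETURN mgr.managerName'
bad = "MATCH (m:Manager)-[:INVESTS_IN]->(c:Company) RETURN c.companyName"

print(find_schema_violations(good))
print(find_schema_violations(bad))
```

A query that fails the check can be rejected or sent back to the model with the list of problems, instead of being executed and failing (or returning nothing) at the database.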

Four retrieval paradigms, one engine

Key concept

The finished graph unites four paradigms

KGR-7.7

No single retrieval method answers every question — the graph’s power is holding all four in one query engine:

  1. Vector — semantic similarity for conceptual questions (“about cloud storage”).
  2. Full-text — keyword/string lookup for entities (“find Royal Bank of Canada”).
  3. Graph traversal — relationship following for structure (“who invests in NetApp?”).
  4. Geospatial — distance for location (“companies within 10 km of San Jose”).

(This list counts retrieval methods. Module 6’s “three search paradigms” counted index types: vector, full-text, and property. Exact property lookup is folded in here, alongside the newly added traversal and geospatial methods.) Complex questions compose them: “investors within 50 miles of NetApp” needs full-text + geospatial + traversal at once. [V] Verified

| Paradigm | Method | Example question |
| --- | --- | --- |
| Vector | db.index.vector.queryNodes | “Tell me about cloud storage” |
| Full-text | db.index.fulltext.queryNodes | “Find Palo Alto Networks” |
| Traversal | MATCH path patterns | “Who invests in NetApp?” |
| Geospatial | point.distance() | “Companies within 10 km of San Jose” |
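A composed question such as “investors within 50 miles of NetApp” needs one extra step before it reaches point.distance(): converting miles to metres. The sketch below does that conversion and pairs it with one possible composed query. The index name `fullTextCompanyNames`, the parameter names, and the Manager-to-Company direction of OWNS_STOCK_IN are assumptions made for illustration, and the query text is not executed here.

```python
METRES_PER_MILE = 1609.344  # international mile, exact by definition


def miles_to_metres(miles):
    """Convert miles to metres, the unit point.distance() compares against."""
    return miles * METRES_PER_MILE


# Assumed names: the full-text index and the $name / $max_metres parameters are
# illustrative; labels, relationship types, and properties follow the text.
INVESTORS_NEAR_COMPANY = """
CALL db.index.fulltext.queryNodes("fullTextCompanyNames", $name) YIELD node AS com
MATCH (mgr:Manager)-[:OWNS_STOCK_IN]->(com)
MATCH (com)-[:LOCATED_AT]->(comAddr:Address)
MATCH (mgr)-[:LOCATED_AT]->(mgrAddr:Address)
WHERE point.distance(comAddr.location, mgrAddr.location) < $max_metres
RETURN mgr.managerName
"""

params = {"name": "NetApp", "max_metres": miles_to_metres(50)}
print(params)
```

Each line of the query maps to one paradigm: the CALL is the full-text lookup, the MATCH clauses are traversal, and the WHERE clause is the geospatial filter.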

Name the four retrieval paradigms and give one question type suited to each. KGR-7.7

Vector — conceptual/semantic questions (“what is NetApp’s business?”). Full-text — entity lookup by name (“find Royal Bank of Canada”). Graph traversal — relationship/structural questions (“who are NetApp’s investors?”). Geospatial — location/distance questions (“companies within 10 km of San Jose”). Most real questions combine two or more.

Summary — and the course in one arc

This module added geospatial search (point.distance(), in metres, over Address nodes) and LLM-generated Cypher: a few-shot prompt plus the GraphCypherQAChain turn a plain-English question into a generated query, an execution, and a natural-language answer — guarded against schema hallucination by injecting the real schema. The finished graph unites four retrieval paradigms — vector, full-text, traversal, geospatial — a toolkit far richer than similarity search alone, all grown by the same Extract-Enhance-Expand move.

Across seven chapters the guide built one system: graph fundamentals (M1) → Cypher (M2) → vector search in the graph (M3) → constructing a graph from documents (M4) → relationships and chunk windows (M5) → expanding with a second dataset (M6) → conversational, multi-paradigm querying (M7). The throughline: retrieval by connection, not just similarity.