A B2B SaaS team asked us to look at their support assistant after a sales engineer tried to use it live on a customer call. The question was: "Which of our enterprise customers complained about the new pricing tier in Q4 last year, and what is their current MRR?"
The assistant returned a fluent paragraph about pricing complaints in general, with one specific customer name lifted from a chunk where they appeared in passing. The MRR was wrong. The complaint was actually from Q3, not Q4. The list of "enterprise customers" was missing two of the largest accounts.
The team's first instinct was to throw a bigger model at it. The actual problem was structural: the question required reasoning over relationships between entities that a chunked vector store had already destroyed at index time.
What chunking does to entity relationships
When a CRM corpus or a support ticket history goes through a standard RAG pipeline, the chunker treats the text as a flat sequence. A chunk that mentions "Acme Corp called about pricing in October" is embedded and stored on its own. A chunk from a different document that mentions "Acme Corp's MRR is $48,000" is stored separately. A third chunk, somewhere else again, says "Q4 starts in October." Nothing in the index records that all three are about the same account and the same quarter.
A vector retrieval over the question "enterprise customers complaining about pricing in Q4" might return the first chunk. It will not return the second chunk because it does not mention pricing. It will not return the third chunk because it does not mention Acme. The LLM gets one of the three pieces it needs and confidently makes up the other two.
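The failure is easy to reproduce as a toy. A runnable sketch using sentence-transformers (the library and model here are just one concrete choice; the chunk texts come straight from the example above):

```python
from sentence_transformers import SentenceTransformer

# The three chunks that together answer the question, indexed independently.
chunks = [
    "Acme Corp called about pricing in October.",
    "Acme Corp's MRR is $48,000.",
    "Q4 starts in October.",
]
query = "enterprise customers complaining about pricing in Q4"

model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(chunks + [query], normalize_embeddings=True)
chunk_vecs, query_vec = vecs[:-1], vecs[-1]

# Cosine similarity of each chunk to the query. The pricing chunk scores
# well; the MRR chunk and the Q4-definition chunk score far lower, so a
# top-1 or top-2 retrieval drops context the answer needs.
for chunk, score in sorted(zip(chunks, chunk_vecs @ query_vec), key=lambda p: -p[1]):
    print(f"{score:.2f}  {chunk}")
```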
This is not a model problem. A larger LLM hallucinates more confidently with the same incomplete context. The fix has to happen earlier in the pipeline.
Five signs the corpus needs a graph layer
Sapota's audit checklist for "is this a Graph RAG problem" is five questions. If three or more come back yes, we recommend adding a graph.
1. Do the questions involve more than one named entity?
"What products has customer X bought" names one entity. "What products has customer X bought that customers similar to X also returned" spans three entity sets (the customer, the similar customers, the products) linked by three relationships (bought, similar-to, returned). The second question is a graph query, not a search query, regardless of how it is phrased in natural language.
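In Cypher, the second question is an explicit two-hop traversal. A sketch using the official Neo4j Python driver, against a hypothetical schema with BOUGHT, SIMILAR_TO, and RETURNED relationships (schema, credentials, and customer name are illustrative):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Products customer X bought that customers similar to X also returned:
# two hops out from X, joined on the product node.
QUERY = """
MATCH (x:Customer {name: $name})-[:BOUGHT]->(p:Product)
MATCH (x)-[:SIMILAR_TO]->(other:Customer)-[:RETURNED]->(p)
RETURN DISTINCT p.name AS product
"""

with driver.session() as session:
    for record in session.run(QUERY, name="Acme Corp"):
        print(record["product"])
```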
2. Do the questions filter on relationships, not just attributes?
"Engineers with more than five years of experience" is an attribute filter. "Engineers who reported to managers who left in the last year" is a relationship filter. Vector search has no concept of relationships. Graph databases are built around them.
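The difference shows up directly in the query shape. Both queries below assume a hypothetical schema where engineers have a REPORTS_TO edge to managers and managers carry a left_at date; the attribute filter never leaves the node, while the relationship filter has to traverse an edge:

```python
# Attribute filter: a property test on a single node type. A metadata
# filter on a vector store can express this too.
ATTRIBUTE_FILTER = """
MATCH (e:Engineer)
WHERE e.years_experience > 5
RETURN e.name
"""

# Relationship filter: the condition lives on a different node, reachable
# only through an edge. No vector-store metadata filter can express the hop.
RELATIONSHIP_FILTER = """
MATCH (e:Engineer)-[:REPORTS_TO]->(m:Manager)
WHERE m.left_at >= date() - duration('P1Y')
RETURN e.name
"""
```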
3. Does the right answer depend on aggregating across many entities?
"Total revenue from accounts in the EMEA region" requires walking across all account entities, filtering by region, and summing. This is a SQL query disguised as natural language and should not be in a vector database at all. A graph database (or just a SQL agent) handles it cleanly.
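Written out, assuming a hypothetical Account node with region and mrr properties, the whole question is one aggregation:

```python
# "Total revenue from accounts in the EMEA region" as the aggregation it is.
EMEA_REVENUE = """
MATCH (a:Account {region: 'EMEA'})
RETURN sum(a.mrr) AS total_mrr
"""
```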
4. Do the source documents contain dense entity references?
CRM notes, meeting transcripts, customer interviews, sales call summaries: these documents are mostly references to people, accounts, products, and dates. Their meaning lives in the relationships between these entities, not in the prose. Chunking them preserves the prose and destroys the relationships.
5. Do the users ask follow-up questions that build on the previous answer?
"Show me the top 10 customers by revenue. Now show me which of those have open support tickets. Now show me the average resolution time for those tickets." Each follow-up adds a hop to the implied query. By the third follow-up, a vector retrieval is essentially random.
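In a graph, the three follow-ups compound into one query, each adding a hop or a clause. A sketch against a hypothetical Customer/Ticket schema:

```python
# Each conversational follow-up adds a clause to the implied query.
FOLLOWUPS_COMBINED = """
MATCH (c:Customer)
WITH c ORDER BY c.revenue DESC LIMIT 10       // follow-up 1: top 10 by revenue
MATCH (c)-[:HAS_TICKET]->(t:Ticket)
WHERE t.status = 'open'                       // follow-up 2: open tickets only
RETURN c.name,
       avg(t.resolution_hours) AS avg_hours   // follow-up 3: average resolution time
"""
```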
What we actually build
The default pattern Sapota ships for Graph RAG is hybrid: a vector store for unstructured content (long-form notes, articles, documentation) plus a graph database for entities and relationships extracted from that content.
The pipeline at index time:
- Run an entity-and-relationship extraction pass over the corpus using an LLM (Claude Sonnet or GPT-4o; gpt-4o-mini for cost-sensitive cases). Output is a list of (entity, relationship, entity) triples plus the source chunk for each; see the sketch after this list.
- Insert the triples into Neo4j (or LightRAG if the team wants the simpler embedded option). Each triple carries a reference to the source chunk in the vector store.
- Embed the original chunks into the vector store as usual.
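A minimal sketch of the extraction and insert steps (the embedding step is the standard vector-store write). The model id, prompt, and Entity/REL schema are placeholders, and a real pipeline adds validation and retries around the JSON parse:

```python
import json

import anthropic
from neo4j import GraphDatabase

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

PROMPT = """Extract (entity, relationship, entity) triples from the text below.
Respond with only a JSON list of objects with keys "head", "rel", "tail".

Text:
{chunk}"""

def extract_triples(chunk_text: str) -> list[dict]:
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; any capable extractor works
        max_tokens=1024,
        messages=[{"role": "user", "content": PROMPT.format(chunk=chunk_text)}],
    )
    return json.loads(msg.content[0].text)  # LLM JSON is not guaranteed: validate in production

def index_chunk(chunk_id: str, chunk_text: str) -> None:
    with driver.session() as session:
        for t in extract_triples(chunk_text):
            # MERGE keeps re-indexing idempotent. The relationship type is
            # stored as a property because Cypher cannot parameterize rel types.
            session.run(
                """
                MERGE (h:Entity {name: $head})
                MERGE (tl:Entity {name: $tail})
                MERGE (h)-[r:REL {type: $rel}]->(tl)
                SET r.source_chunk = $chunk_id
                """,
                head=t["head"], tail=t["tail"], rel=t["rel"], chunk_id=chunk_id,
            )
```

The source_chunk reference on each relationship is what lets query-time retrieval jump from a graph result back to the prose it came from.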
At query time:
- A planning LLM classifies the query as graph-shaped, vector-shaped, or hybrid.
- Graph-shaped queries are translated to Cypher (or the equivalent for LightRAG) and executed against the graph. The result is a set of entities plus the source chunks they came from.
- Vector-shaped queries hit the vector store as usual.
- Hybrid queries do both and merge: graph result narrows the candidate entities, vector retrieval pulls the supporting prose from the chunks linked to those entities.
The synthesis LLM gets both the structured graph result and the unstructured prose context, which is enough to answer multi-hop questions correctly.
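Reduced to a skeleton, the routing looks like the sketch below. Every helper (classify, translate_to_cypher, run_cypher, fetch_chunks, vector_search, synthesize) is a hypothetical stand-in for the corresponding LLM call or store client:

```python
def answer(question: str) -> str:
    route = classify(question)  # planning LLM returns "graph", "vector", or "hybrid"

    if route == "graph":
        entities, chunk_ids = run_cypher(translate_to_cypher(question))
        prose = fetch_chunks(chunk_ids)  # source chunks linked to the matched triples
    elif route == "vector":
        entities, prose = [], vector_search(question, k=8)
    else:  # hybrid: the graph narrows the candidates, the vectors supply the prose
        entities, chunk_ids = run_cypher(translate_to_cypher(question))
        prose = vector_search(question, k=8, restrict_to=chunk_ids)

    # Synthesis LLM sees the structured result and the supporting text together.
    return synthesize(question, entities=entities, context=prose)
```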
Why most teams skip Graph RAG
Three reasons, in our experience.
The setup looks scary. Neo4j has a learning curve. Cypher is a new query language. The entity extraction step requires LLM calls during indexing, which is not free. Compared to "spin up a Pinecone index and embed everything," the operational complexity is real.
The product team does not realize the queries are graph-shaped. Until a sales engineer or analyst tries to use the system for the questions they actually have, the team thinks the chatbot is for FAQ lookups. By the time the multi-hop questions surface, the architecture is already in production and the migration is painful.
Most tutorials and managed RAG services do not cover it. Vector RAG has the brand recognition. Graph RAG is the unsexy older sibling that solves the harder problem. It is the same dynamic as relational databases vs document stores in 2010, and it tends to resolve the same way: the boring older option turns out to be necessary for serious use cases.
When vector-only is genuinely enough
We do not always recommend adding a graph layer. The cases where vector-only RAG is the right call:
- The corpus is genuinely document-shaped (long-form prose, blog posts, technical articles) without dense entity references.
- The questions are mostly definitional or how-to, not multi-hop or comparative.
- The team does not have the engineering capacity to maintain a graph plus a vector store, and the simpler architecture's failure modes are acceptable.
- The product is early enough that the question distribution has not stabilized and over-engineering is the bigger risk.
A note on LightRAG and GraphRAG-Microsoft
There are two packaged options that reduce the operational lift: LightRAG (open source, from HKU) and Microsoft's GraphRAG. Both wrap entity extraction, graph construction, and graph-aware retrieval into higher-level APIs.
LightRAG is the lighter option and the one we use for projects that want graph capabilities without committing to a Neo4j cluster. GraphRAG-Microsoft is more opinionated and pulls in more of the Azure ecosystem. For teams already on Azure, it is a reasonable choice. For everyone else, LightRAG plus Neo4j when scale demands it is the path of least resistance.
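The LightRAG path is a few lines end to end. This follows the project's README at the time of writing; the API has shifted between releases, so treat it as the shape of the thing rather than a stable contract:

```python
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete

# insert() runs entity extraction and graph construction internally.
rag = LightRAG(working_dir="./graph_store", llm_model_func=gpt_4o_mini_complete)
with open("crm_notes.txt") as f:
    rag.insert(f.read())

# mode="hybrid" combines graph-aware and vector retrieval.
print(rag.query(
    "Which enterprise customers complained about pricing in Q4?",
    param=QueryParam(mode="hybrid"),
))
```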
If your AI cannot answer multi-hop questions
If your team has shipped a vector RAG system and the users keep asking questions that require connecting entities across documents, that is the failure mode we fix. Sapota runs a one-week graph readiness assessment that takes the production query log, classifies which queries are graph-shaped, and ships the entity extraction pipeline plus the hybrid retrieval as a working integration.
Reach out via the AI engineering page with three example questions that are failing in production. The diagnosis usually surfaces within the first call.