
Add Guide: Local Graph RAG with Verifiable Attribution #122

Open

bibinprathap wants to merge 1 commit into NirDiamant:main from bibinprathap:graph-rag-local-attribution

Conversation

bibinprathap commented Jan 3, 2026

Description

This PR adds a new technique demonstrating Graph RAG with Verifiable Attribution - addressing two key limitations of Vector RAG:

  1. Multi-hop reasoning: Vector RAG struggles when answers require connecting facts from multiple documents
  2. Verifiable attribution: Traditional RAG provides chunk-level attribution; Graph RAG provides sentence-level provenance
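
The difference can be sketched with a tiny NetworkX graph whose edges carry provenance metadata (the entities, documents, and sentences below are invented for illustration; this is not the notebook's code):

```python
# Minimal sketch: knowledge-graph edges that record the exact
# sentence they came from, enabling sentence-level attribution.
import networkx as nx

G = nx.DiGraph()
G.add_edge(
    "Prof. Miller", "Quantum Lab",
    relation="leads",
    source_doc="doc_1.txt",
    source_sentence="Prof. Miller leads the Quantum Lab.",
)
G.add_edge(
    "Quantum Lab", "Project Q",
    relation="runs",
    source_doc="doc_2.txt",
    source_sentence="The Quantum Lab runs Project Q.",
)

# A two-hop traversal connects facts from two documents, and each
# hop can be cited back to its exact source sentence.
path = nx.shortest_path(G, "Prof. Miller", "Project Q")
citations = [G[u][v]["source_sentence"] for u, v in zip(path, path[1:])]
print(path)       # ['Prof. Miller', 'Quantum Lab', 'Project Q']
print(citations)
```

A pure vector retriever would need both sentences to land in the same top-k window; the graph makes the connection explicit and keeps a citation per hop.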

Key Features

| Feature | Description |
| --- | --- |
| 🔒 Privacy-First | Runs entirely locally with Ollama (Llama 3.1) |
| 🔗 Multi-Hop Reasoning | Graph traversal connects disjoint facts |
| 📝 Sentence-Level Attribution | Every claim traces to its exact source sentence |
| 🔍 Hybrid Search | Vector search + graph traversal |

What's Included

  • all_rag_techniques/graph_rag_local_attribution.ipynb
    • Complete implementation with NetworkX (production uses Neo4j)
    • Sample multi-hop queries demonstrating the advantage
    • Comparison table: Graph RAG vs Vector RAG

When to Use Graph RAG

✅ Questions requiring multi-hop reasoning across documents
✅ Need for verifiable, sentence-level source attribution
✅ Privacy-critical deployments (no cloud APIs)
✅ Relationship discovery is important

References

  • VeritasGraph - Full production framework
  • Microsoft GraphRAG research

Summary by CodeRabbit

  • New Features

    • Added a local, privacy-preserving Graph RAG demo that produces attributed answers with verifiable sentence-level citations and multi-hop reasoning traces.
  • Documentation

    • Added a new notebook walkthrough and a public technique entry describing the Graph RAG workflow, hybrid retrieval, and citation approach.
    • Note: the new technique entry was inserted multiple times, affecting the techniques list ordering.


Copilot AI review requested due to automatic review settings January 3, 2026 09:30

coderabbitai bot commented Jan 3, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough


Adds a complete Jupyter notebook implementing a local Graph RAG workflow with verifiable, sentence-level attribution (NetworkX knowledge graph, local LLM via Ollama for extraction and generation, vector embeddings for semantic search, hybrid retrieval, and multi-hop traversal). It also updates the README, introducing a new technique entry that appears duplicated and renumbers subsequent items.

Changes

Cohort / File(s) Summary
Graph RAG notebook
all_rag_techniques/graph_rag_local_attribution.ipynb
New, fully implemented Jupyter notebook providing a self-contained local Graph RAG workflow: entity/relationship extraction via local LLM (Ollama), NetworkX knowledge graph with provenance, embedding-based vector search, multi-hop traversal, hybrid retrieval, and attributed answer generation with verifiable citations. Includes sample data, error handling, and comments about Neo4j integration.
README technique entries
README.md
Added a new public-facing technique entry "Local Graph RAG with Verifiable Attribution" (NetworkX + Ollama). The insertion is duplicated in multiple nearby positions and shifts numbering for subsequent techniques.

Sequence Diagram(s)

sequenceDiagram
  participant Doc as Documents
  participant Ingest as IngestPipeline
  participant LLM as Ollama(LLM)
  participant Embed as EmbeddingModel
  participant VecDB as VectorIndex
  participant KG as KnowledgeGraph(NetworkX)
  participant Generator as GeneratorLLM
  note right of Ingest: Ingestion extracts sentences + provenance
  Doc->>Ingest: provide documents (sentences)
  Ingest->>LLM: extract entities & relationships (JSON)
  LLM-->>Ingest: entities, relations, source spans
  Ingest->>Embed: compute embeddings for entities/sentences
  Embed-->>VecDB: store embeddings (semantic index)
  Ingest->>KG: add nodes/edges with provenance metadata
  note right of VecDB: Retrieval phase
  Generator->>VecDB: semantic search for query
  VecDB-->>Generator: top entity hits
  Generator->>KG: multi-hop traversal from entry entities
  KG-->>Generator: subgraph + provenance (citations)
  Generator->>LLM: produce attributed answer using subgraph + citations
  LLM-->>Generator: JSON-formatted AttributedAnswer (answer + citations + trace)
  Generator->>User: return attributed answer with verifiable citations

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble nodes and stitch the thread,
Sentences curled where facts are fed.
Local LLM hums, embeddings hop—
Citations found, the answers drop.
A graph, a trail, a rabbit's hop of code.

Pre-merge checks

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title references the main change (adding Local Graph RAG with Verifiable Attribution), but contains redundant/duplicated text that reduces clarity. Consider simplifying the title to avoid redundancy, e.g., 'Add Local Graph RAG with Verifiable Attribution' or 'Add Graph RAG with Sentence-Level Attribution'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


Copilot AI left a comment

Pull request overview

This PR intends to add a new Graph RAG technique focused on local deployment with verifiable sentence-level attribution. However, the implementation is incomplete—the notebook file contains only an empty JSON structure with no content.

Key Issues

  • The notebook file is empty and contains no implementation
  • No updates to README.md to document the new technique


@@ -0,0 +1,10 @@
{
"cells": [],
Copilot AI Jan 3, 2026

The notebook file is empty and contains no implementation. The PR description states "Complete implementation with NetworkX" and mentions sample multi-hop queries and comparison tables, but the file only contains the minimal JSON structure with an empty cells array. A complete notebook should include markdown documentation, code cells with the implementation, example queries, and comparison results as described in the PR description.

Suggested change
"cells": [],
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Graph RAG: Local Attribution with NetworkX\n",
"\n",
"This notebook demonstrates a **local attribution** approach for Graph-based Retrieval-Augmented Generation (Graph RAG) using [NetworkX](https://networkx.org/).\n",
"\n",
"We will:\n",
"\n",
"1. Construct a small example knowledge graph with entities and relationships.\n",
"2. Run **multi-hop queries** over the graph (e.g., from a source node to a target node).\n",
"3. Compute simple **local attribution scores** along the paths used to answer a query.\n",
"4. Build **comparison tables** to contrast different candidate paths and their attributions.\n",
"\n",
"The goal is to illustrate how graph structure can be used to attribute an answer to specific nodes and edges, which in turn can support explainability in RAG pipelines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Core imports for graph construction and analysis\n",
"import networkx as nx\n",
"import pandas as pd\n",
"from typing import List, Tuple, Dict, Any\n",
"\n",
"print(f\"NetworkX version: {nx.__version__}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Build an example knowledge graph\n",
"\n",
"We define a small directed knowledge graph of topics and concepts. Nodes carry descriptive text, and edges represent semantic relationships. This kind of graph can be used to ground answers in a RAG system."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def build_example_graph() -> nx.DiGraph:\n",
" \"\"\"Create a small directed knowledge graph with node and edge attributes.\"\"\"\n",
" G = nx.DiGraph()\n",
"\n",
" # Add nodes with text/context that might be retrieved in a RAG system\n",
" G.add_node(\n",
" \"LLMs\",\n",
" description=\"Large Language Models (LLMs) are neural networks trained to predict text.\",\n",
" type=\"concept\",\n",
" )\n",
" G.add_node(\n",
" \"Transformers\",\n",
" description=\"Transformers are neural network architectures based on self-attention.\",\n",
" type=\"concept\",\n",
" )\n",
" G.add_node(\n",
" \"Attention\",\n",
" description=\"Attention mechanisms allow models to focus on relevant parts of the input.\",\n",
" type=\"concept\",\n",
" )\n",
" G.add_node(\n",
" \"RAG\",\n",
" description=\"Retrieval-Augmented Generation (RAG) combines retrieval with generation.\",\n",
" type=\"method\",\n",
" )\n",
" G.add_node(\n",
" \"Graph RAG\",\n",
" description=\"Graph RAG leverages knowledge graphs for retrieval and reasoning.\",\n",
" type=\"method\",\n",
" )\n",
" G.add_node(\n",
" \"Knowledge Graphs\",\n",
" description=\"Knowledge graphs store entities and relations as labeled nodes and edges.\",\n",
" type=\"concept\",\n",
" )\n",
" G.add_node(\n",
" \"Explainability\",\n",
" description=\"Explainability concerns understanding why a model produced an output.\",\n",
" type=\"property\",\n",
" )\n",
" G.add_node(\n",
" \"Local Attribution\",\n",
" description=\"Local attribution assigns importance to specific inputs for a given output.\",\n",
" type=\"property\",\n",
" )\n",
"\n",
" # Add directed edges with relation types and base weights\n",
" G.add_edge(\"Transformers\", \"LLMs\", relation=\"used_in\", weight=1.0)\n",
" G.add_edge(\"Attention\", \"Transformers\", relation=\"core_mechanism\", weight=1.2)\n",
" G.add_edge(\"Knowledge Graphs\", \"Graph RAG\", relation=\"enable\", weight=1.5)\n",
" G.add_edge(\"RAG\", \"Graph RAG\", relation=\"specialization_of\", weight=1.0)\n",
" G.add_edge(\"LLMs\", \"RAG\", relation=\"combined_with\", weight=1.1)\n",
" G.add_edge(\"Graph RAG\", \"Explainability\", relation=\"improves\", weight=1.3)\n",
" G.add_edge(\"Local Attribution\", \"Explainability\", relation=\"supports\", weight=1.4)\n",
" G.add_edge(\"Knowledge Graphs\", \"Explainability\", relation=\"provides_structure_for\", weight=1.0)\n",
"\n",
" return G\n",
"\n",
"G = build_example_graph()\n",
"G"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Multi-hop queries on the graph\n",
"\n",
"We now implement helper functions to:\n",
"\n",
"- Find candidate **multi-hop paths** between two nodes.\n",
"- Filter and score these paths.\n",
"\n",
"In a RAG pipeline, these paths can correspond to chains of reasoning that justify an answer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def find_paths(\n",
" G: nx.DiGraph,\n",
" source: str,\n",
" target: str,\n",
" max_hops: int = 4,\n",
" max_paths: int = 10,\n",
") -> List[List[str]]:\n",
" \"\"\"Find simple paths from source to target up to a given length.\n",
"\n",
" This wraps networkx.all_simple_paths with a hop constraint and a limit on\n",
" the number of returned paths.\n",
" \"\"\"\n",
" all_paths: List[List[str]] = []\n",
" try:\n",
" for path in nx.all_simple_paths(G, source=source, target=target, cutoff=max_hops):\n",
" all_paths.append(path)\n",
" if len(all_paths) >= max_paths:\n",
" break\n",
" except nx.NodeNotFound: # raised when source/target is absent from G\n",
" return []\n",
" return all_paths\n",
"\n",
"def path_weight(G: nx.DiGraph, path: List[str]) -> float:\n",
" \"\"\"Compute a simple path score as the sum of edge weights.\"\"\"\n",
" w = 0.0\n",
" for u, v in zip(path[:-1], path[1:]):\n",
" w += G[u][v].get(\"weight\", 1.0)\n",
" return w\n",
"\n",
"def rank_paths_by_weight(\n",
" G: nx.DiGraph,\n",
" paths: List[List[str]],\n",
") -> List[Tuple[List[str], float]]:\n",
" \"\"\"Return paths with their scores, sorted by descending weight.\"\"\"\n",
" scored = [(p, path_weight(G, p)) for p in paths]\n",
" return sorted(scored, key=lambda x: x[1], reverse=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example multi-hop query\n",
"\n",
"Suppose we want to answer the question:\n",
"\n",
"> *How does local attribution relate to large language models via Graph RAG?*\n",
"\n",
"We can approximate this as finding multi-hop paths from `\"LLMs\"` to `\"Local Attribution\"` or from `\"Local Attribution\"` to `\"LLMs\"`, then interpret the paths as chains of reasoning."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"source_node = \"LLMs\"\n",
"target_node = \"Local Attribution\"\n",
"\n",
"paths_forward = find_paths(G, source=source_node, target=target_node, max_hops=5)\n",
"paths_backward = find_paths(G, source=target_node, target=source_node, max_hops=5)\n",
"\n",
"print(\"Paths LLMs -> Local Attribution:\")\n",
"for p in paths_forward:\n",
" print(\" \", \" -> \".join(p), \"(weight=\", path_weight(G, p), \")\")\n",
"\n",
"print(\"\\nPaths Local Attribution -> LLMs:\")\n",
"for p in paths_backward:\n",
" print(\" \", \" -> \".join(p), \"(weight=\", path_weight(G, p), \")\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Local attribution along paths\n",
"\n",
"In this simplified setting, **local attribution** assigns an importance score to each edge (and, by aggregation, to each node) along a path used to answer a query.\n",
"\n",
"Here, we use the edge weights as base importance, and normalize them to obtain local attribution scores for each path."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def edge_attributions_for_path(G: nx.DiGraph, path: List[str]) -> List[Dict[str, Any]]:\n",
" \"\"\"Compute normalized attribution scores for each edge on a path.\n",
"\n",
" The attribution is the edge weight divided by the total path weight.\n",
" \"\"\"\n",
" total_weight = path_weight(G, path)\n",
" contributions = []\n",
" if total_weight == 0:\n",
" return contributions\n",
"\n",
" for u, v in zip(path[:-1], path[1:]):\n",
" w = G[u][v].get(\"weight\", 1.0)\n",
" rel = G[u][v].get(\"relation\", \"related_to\")\n",
" attribution = w / total_weight\n",
" contributions.append(\n",
" {\n",
" \"source\": u,\n",
" \"target\": v,\n",
" \"relation\": rel,\n",
" \"weight\": w,\n",
" \"attribution\": attribution,\n",
" }\n",
" )\n",
" return contributions\n",
"\n",
"def node_attributions_for_path(G: nx.DiGraph, path: List[str]) -> Dict[str, float]:\n",
" \"\"\"Aggregate edge attributions into node-level attributions.\n",
"\n",
" Each node receives the sum of incoming and outgoing edge attributions along the path.\n",
" \"\"\"\n",
" edge_attrs = edge_attributions_for_path(G, path)\n",
" node_scores: Dict[str, float] = {n: 0.0 for n in path}\n",
" for e in edge_attrs:\n",
" node_scores[e[\"source\"]] += e[\"attribution\"] / 2.0\n",
" node_scores[e[\"target\"]] += e[\"attribution\"] / 2.0\n",
" return node_scores"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example: Attributions for a top-ranked path\n",
"\n",
"We now:\n",
"\n",
"1. Rank the candidate paths by their total edge weight.\n",
"2. Select the best path.\n",
"3. Compute edge-level and node-level attributions for that path."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use backward paths (Local Attribution -> LLMs) as an illustrative example\n",
"ranked_paths = rank_paths_by_weight(G, paths_backward)\n",
"if ranked_paths:\n",
" best_path, best_score = ranked_paths[0]\n",
" print(\"Best path (by weight):\", \" -> \".join(best_path), \"| score=\", best_score)\n",
"\n",
" edge_attrs = edge_attributions_for_path(G, best_path)\n",
" node_attrs = node_attributions_for_path(G, best_path)\n",
"\n",
" print(\"\\nEdge-level attributions:\")\n",
" for e in edge_attrs:\n",
" print(\n",
" f\" {e['source']} -[{e['relation']}]-> {e['target']}: \"\n",
" f\"weight={e['weight']:.2f}, attribution={e['attribution']:.3f}\"\n",
" )\n",
"\n",
" print(\"\\nNode-level attributions:\")\n",
" for n, score in node_attrs.items():\n",
" print(f\" {n}: {score:.3f}\")\n",
"else:\n",
" print(\"No paths found between Local Attribution and LLMs.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Comparison tables for paths and attributions\n",
"\n",
"To analyze different reasoning chains, we create comparison tables:\n",
"\n",
"- A table of candidate paths and their total scores.\n",
"- A detailed table of edges and attributions for a selected path.\n",
"\n",
"These tables can be used to compare alternative explanations in a Graph RAG pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def build_path_comparison_table(\n",
" G: nx.DiGraph,\n",
" paths: List[List[str]],\n",
") -> pd.DataFrame:\n",
" \"\"\"Return a DataFrame summarizing candidate paths and their scores.\"\"\"\n",
" rows = []\n",
" for idx, p in enumerate(paths):\n",
" rows.append(\n",
" {\n",
" \"path_id\": idx,\n",
" \"path\": \" -> \".join(p),\n",
" \"num_hops\": len(p) - 1,\n",
" \"total_weight\": path_weight(G, p),\n",
" }\n",
" )\n",
" return pd.DataFrame(rows).sort_values(\"total_weight\", ascending=False).reset_index(drop=True)\n",
"\n",
"def build_edge_attribution_table(\n",
" G: nx.DiGraph,\n",
" path: List[str],\n",
") -> pd.DataFrame:\n",
" \"\"\"Return a DataFrame with edge-level attributions for a single path.\"\"\"\n",
" rows = edge_attributions_for_path(G, path)\n",
" return pd.DataFrame(rows)\n",
"\n",
"# Build comparison tables for backward paths (Local Attribution -> LLMs)\n",
"path_table = build_path_comparison_table(G, [p for p, _ in ranked_paths]) if ranked_paths else pd.DataFrame()\n",
"path_table"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If we have at least one path, show edge-level attribution table for the best path\n",
"if ranked_paths:\n",
" best_path, _ = ranked_paths[0]\n",
" edge_table = build_edge_attribution_table(G, best_path)\n",
" display(edge_table) # bare expressions inside an if-block are not auto-rendered\n",
"else:\n",
" print(\"No paths available for attribution analysis.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Summary\n",
"\n",
"In this notebook, we:\n",
"\n",
"- Built a small **knowledge graph** using NetworkX.\n",
"- Executed **multi-hop queries** between nodes representing concepts and methods.\n",
"- Computed simple **local attribution** scores for edges and nodes along selected paths.\n",
"- Constructed **comparison tables** to analyze and rank candidate reasoning chains.\n",
"\n",
"While the example is intentionally small, the same patterns extend to larger graphs and can be integrated into Graph RAG pipelines to improve transparency and explainability."
]
}
],


coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2824e0e and f036f19.

📒 Files selected for processing (1)
  • all_rag_techniques/graph_rag_local_attribution.ipynb

- Complete Jupyter notebook implementation with NetworkX
- LLM-based entity/relationship extraction using Ollama
- Hybrid retrieval combining vector similarity and graph traversal
- Sentence-level attribution for verifiable citations
- Comparison with Vector RAG on multi-hop reasoning
- Updated README with new technique entry and documentation
bibinprathap force-pushed the graph-rag-local-attribution branch from f036f19 to c398c82 on January 3, 2026 09:38
coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

162-167: Fix duplicate entry numbering and add missing emoji.

Entry #31 appears twice in the table (lines 162-163), which breaks the sequential ordering. The new "Local Graph RAG with Attribution" entry is also missing the category emoji 🏗️.

Impact:

  • All entries from #31 onward need to be renumbered
  • Table of contents is inconsistent and confusing for users
🔎 Proposed fix
-| 31 | Advanced Architecture  | Local Graph RAG with Attribution | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/graph_rag_local_attribution.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/graph_rag_local_attribution.ipynb) |
-| 31 | Advanced Architecture 🏗️ | RAPTOR | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/raptor.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/raptor.ipynb) |
-| 32 | Advanced Architecture 🏗️ | Self-RAG | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/self_rag.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/self_rag.ipynb) |
-| 33 | Advanced Architecture 🏗️ | Corrective RAG (CRAG) | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/crag.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/crag.ipynb) |
+| 31 | Advanced Architecture 🏗️ | Local Graph RAG with Attribution | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/graph_rag_local_attribution.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/graph_rag_local_attribution.ipynb) |
+| 32 | Advanced Architecture 🏗️ | RAPTOR | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/raptor.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/raptor.ipynb) |
+| 33 | Advanced Architecture 🏗️ | Self-RAG | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/self_rag.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/self_rag.ipynb) |
+| 34 | Advanced Architecture 🏗️ | Corrective RAG (CRAG) | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/crag.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/crag.ipynb) |
 | 34 | Special Technique 🌟 | Sophisticated Controllable Agent | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/Controllable-RAG-Agent) |

Note: Entry #34 for "Sophisticated Controllable Agent" should also be renumbered to #35.

🧹 Nitpick comments (1)
README.md (1)

517-528: Good addition, but improve bullet point formatting for consistency.

The content accurately describes the new technique and aligns well with the notebook implementation. However, the bullet points in the "Implementation" section (lines 524-527) should use standard markdown list syntax for consistency with other entries in the README.

🔎 Suggested formatting improvement
     #### Implementation 
-     Build knowledge graphs from documents using LLM-based entity extraction
-     Combine vector similarity with graph traversal for hybrid retrieval
-     Generate responses with verifiable citations tracing back to source sentences
-     Compare Graph RAG vs Vector RAG on multi-hop reasoning tasks
+    - Build knowledge graphs from documents using LLM-based entity extraction
+    - Combine vector similarity with graph traversal for hybrid retrieval
+    - Generate responses with verifiable citations tracing back to source sentences
+    - Compare Graph RAG vs Vector RAG on multi-hop reasoning tasks
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f036f19 and c398c82.

📒 Files selected for processing (2)
  • README.md
  • all_rag_techniques/graph_rag_local_attribution.ipynb
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
README.md

162-162: Images should have alternate text (alt text)

(MD045, no-alt-text)


162-162: Images should have alternate text (alt text)

(MD045, no-alt-text)


518-518: Images should have alternate text (alt text)

(MD045, no-alt-text)


518-518: Images should have alternate text (alt text)

(MD045, no-alt-text)

🪛 Ruff (0.14.10)
all_rag_techniques/graph_rag_local_attribution.ipynb

136-136: Probable use of insecure hash functions in hashlib: md5

(S324)


315-315: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)


409-409: f-string without any placeholders

Remove extraneous f prefix

(F541)


463-463: f-string without any placeholders

Remove extraneous f prefix

(F541)


472-472: Loop control variable entity_id not used within loop body

Rename unused entity_id to _entity_id

(B007)

🔇 Additional comments (7)
all_rag_techniques/graph_rag_local_attribution.ipynb (7)

1-119: Excellent notebook structure and setup with graceful degradation.

The overview clearly articulates the value proposition of Graph RAG versus Vector RAG, and the setup handles missing dependencies gracefully. The OLLAMA_AVAILABLE flag enables the notebook to run in demo mode without requiring Ollama installation, which improves accessibility for users exploring the technique.


126-177: Well-designed LLM helper functions with proper error handling.

The configuration is flexible and the helper functions handle both normal operation and fallback scenarios appropriately. The format="json" parameter in call_llm_json ensures structured responses from Ollama, and the try-except block catches JSON parsing errors gracefully.


184-237: Well-designed data models with comprehensive provenance tracking.

The dataclass design is clean and captures all necessary information for verifiable attribution. The consistent tracking of source_doc and source_sentence across Entity, Relationship, and Citation models enables the sentence-level attribution promised in the PR objectives.
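
Provenance-carrying records of the kind described might be sketched as follows (the field sets are assumptions; the notebook's actual dataclasses may carry more):

```python
from dataclasses import dataclass

# Hypothetical provenance-carrying records: every entity and every
# citation keeps the document and exact sentence it was derived from.
@dataclass
class Entity:
    name: str
    source_doc: str
    source_sentence: str

@dataclass
class Citation:
    claim: str
    source_doc: str
    source_sentence: str

e = Entity(
    name="Prof. Miller",
    source_doc="doc_1.txt",
    source_sentence="Prof. Miller leads the Quantum Lab.",
)
print(e.source_sentence)
```

Carrying `source_doc` and `source_sentence` on every record is what lets a citation be verified by a human: the cited sentence can be located and read in the original document.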


259-398: Solid implementation of knowledge graph construction with provenance.

The KnowledgeGraphRAG class correctly extracts entities and relationships while preserving source attribution at every step. The LLM prompt explicitly requests exact source sentences, which is critical for verifiable attribution.

Note on MD5 usage (line 279): While static analysis flags MD5 as insecure, it's acceptable here since it's only used for generating entity IDs (non-cryptographic use case). The collision risk is negligible for typical RAG workloads.
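
A deterministic, non-cryptographic ID helper of the sort described might look like this (the helper name, normalization, and truncation length are assumptions, not the notebook's code):

```python
import hashlib

def entity_id(name: str) -> str:
    # MD5 here is only a deterministic ID generator, not a security
    # primitive, so the S324 finding does not apply in practice.
    # Normalizing the name makes "Prof. Miller" and "prof. miller"
    # resolve to the same graph node.
    return hashlib.md5(name.strip().lower().encode("utf-8")).hexdigest()[:12]

print(entity_id("Prof. Miller"))  # same name always yields the same ID
```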


400-485: Effective hybrid retrieval combining vector search and graph traversal.

The hybrid search implementation successfully addresses the multi-hop reasoning challenge:

  1. Vector search finds semantically relevant entry points
  2. BFS traversal expands to connected entities up to max_hops
  3. Full path tracking enables attribution back to sources

This design allows answering queries like "How does X relate to Y?" that require connecting information across multiple documents—a key advantage over pure vector RAG.
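
The three-step flow above can be sketched as a small BFS expansion that records the path to each reached node (the graph, function name, and hop budget are illustrative, not the notebook's actual code):

```python
from collections import deque

import networkx as nx

def expand_from_entry_points(G, entry_entities, max_hops=2):
    """BFS from vector-search entry points, keeping the path taken
    to each reached node so every fact can be attributed later."""
    results = {}
    queue = deque((e, [e]) for e in entry_entities if e in G)
    seen = {e for e, _ in queue}
    while queue:
        node, path = queue.popleft()
        results[node] = path
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted; do not expand further
        for nbr in G.successors(node):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, path + [nbr]))
    return results

G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "D")])
print(expand_from_entry_points(G, ["A"], max_hops=2))
# {'A': ['A'], 'B': ['A', 'B'], 'C': ['A', 'B', 'C']}
```

Note that `D` is excluded because it lies beyond the two-hop budget; in the reviewed design the recorded paths are what back each citation.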


487-563: Answer generation with verifiable citations is well-implemented.

The generate_attributed_answer method successfully links generated claims back to source sentences via:

  • Inline citation markers [N] in the prompt
  • Regex extraction of citation references
  • Mapping back to source documents, sentences, and graph paths

This provides the sentence-level attribution that differentiates this approach from chunk-based Vector RAG.
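
The regex-based citation mapping described above might look roughly like this (the `[N]` marker format comes from the review; the helper and sample strings are hypothetical):

```python
import re

def extract_citations(answer: str, citation_map: dict) -> list:
    """Pull [N] markers out of a generated answer and map each back
    to its source sentence; unknown markers are silently ignored."""
    refs = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return [citation_map[n] for n in refs if n in citation_map]

citation_map = {
    1: "Prof. Miller leads the Quantum Lab.",
    2: "The Quantum Lab runs Project Q.",
}
answer = "Prof. Miller influences Project Q [1] via the lab he leads [2]."
print(extract_citations(answer, citation_map))
```

Dropping markers that have no entry in the map is one reasonable policy; a stricter pipeline might instead flag them as hallucinated citations.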


570-850: Comprehensive examples and documentation complete the implementation.

The sample documents, demonstration queries, and comparison table effectively showcase Graph RAG's advantages for multi-hop reasoning. The queries like "How does Professor Miller's research influence the quantum computing project?" require connecting facts from multiple documents—exactly where Graph RAG excels.

Key strengths:

  • Sample data designed to demonstrate multi-hop scenarios
  • Clear comparison table showing when to use Graph RAG vs Vector RAG
  • Production guidance for scaling with Neo4j
  • Proper references to related work (VeritasGraph, Microsoft GraphRAG)

The implementation fully addresses the PR objectives: local/privacy-first execution with Ollama, multi-hop reasoning via graph traversal, and verifiable sentence-level attribution.

@bibinprathap bibinprathap reopened this Jan 3, 2026