Add Guide: Local Graph RAG with Verifiable Attribution (#122)
Conversation
> **Note**
> CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 **Walkthrough**

Adds a complete Jupyter notebook implementing a local Graph RAG workflow with verifiable, sentence-level attribution (NetworkX knowledge graph, local LLM via Ollama for extraction/generation, vector embeddings for semantic search, hybrid retrieval, and multi-hop traversal), and updates README entries, introducing a new technique entry that appears duplicated and renumbers subsequent items.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant Doc as Documents
    participant Ingest as IngestPipeline
    participant LLM as Ollama(LLM)
    participant Embed as EmbeddingModel
    participant VecDB as VectorIndex
    participant KG as KnowledgeGraph(NetworkX)
    participant Generator as GeneratorLLM
    note right of Ingest: Ingestion extracts sentences + provenance
    Doc->>Ingest: provide documents (sentences)
    Ingest->>LLM: extract entities & relationships (JSON)
    LLM-->>Ingest: entities, relations, source spans
    Ingest->>Embed: compute embeddings for entities/sentences
    Embed-->>VecDB: store embeddings (semantic index)
    Ingest->>KG: add nodes/edges with provenance metadata
    note right of VecDB: Retrieval phase
    Generator->>VecDB: semantic search for query
    VecDB-->>Generator: top entity hits
    Generator->>KG: multi-hop traversal from entry entities
    KG-->>Generator: subgraph + provenance (citations)
    Generator->>LLM: produce attributed answer using subgraph + citations
    LLM-->>Generator: JSON-formatted AttributedAnswer (answer + citations + trace)
    Generator->>User: return attributed answer with verifiable citations
```
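In code, the two phases in the diagram reduce to roughly the following shape. This is an illustrative sketch only: the function names, the naive substring "semantic search", and the plain-dict stores are placeholders, not the notebook's actual API.

```python
# Illustrative skeleton of the two phases in the diagram (all names are placeholders)

def ingest(documents, graph, vector_index):
    """Ingestion phase: record sentences with provenance and index them."""
    for doc_id, text in documents.items():
        for sentence in text.split(". "):
            # In the real notebook an LLM extracts entities/relations as JSON;
            # here we simply store each sentence under its source document.
            graph.setdefault(doc_id, []).append(sentence)
            vector_index.append((sentence, doc_id))  # stand-in for an embedding store

def answer(query, graph, vector_index):
    """Retrieval phase: find entry points, then return an answer with citations."""
    # Substring match stands in for vector similarity search
    hits = [doc for sent, doc in vector_index if query.lower() in sent.lower()]
    docs = sorted(set(hits))
    return {"answer": f"Based on {docs}", "citations": docs}

graph, vector_index = {}, []
ingest({"d1": "Ada wrote programs. Babbage built engines."}, graph, vector_index)
print(answer("Ada", graph, vector_index))
```

The point of the sketch is the separation of concerns: ingestion attaches provenance at write time so that retrieval can emit citations for free.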
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
**Pre-merge checks**

- ❌ Failed checks (1 inconclusive)
- ✅ Passed checks (2 passed)
Pull request overview
This PR intends to add a new Graph RAG technique focused on local deployment with verifiable sentence-level attribution. However, the implementation is incomplete—the notebook file contains only an empty JSON structure with no content.
Key Issues
- The notebook file is empty and contains no implementation
- No updates to README.md to document the new technique
```diff
@@ -0,0 +1,10 @@
 {
   "cells": [],
```

The notebook file is empty and contains no implementation. The PR description states "Complete implementation with NetworkX" and mentions sample multi-hop queries and comparison tables, but the file only contains the minimal JSON structure with an empty cells array. A complete notebook should include markdown documentation, code cells with the implementation, example queries, and comparison results as described in the PR description.

```suggestion
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Graph RAG: Local Attribution with NetworkX\n",
        "\n",
        "This notebook demonstrates a **local attribution** approach for Graph-based Retrieval-Augmented Generation (Graph RAG) using [NetworkX](https://networkx.org/).\n",
        "\n",
        "We will:\n",
        "\n",
        "1. Construct a small example knowledge graph with entities and relationships.\n",
        "2. Run **multi-hop queries** over the graph (e.g., from a source node to a target node).\n",
        "3. Compute simple **local attribution scores** along the paths used to answer a query.\n",
        "4. Build **comparison tables** to contrast different candidate paths and their attributions.\n",
        "\n",
        "The goal is to illustrate how graph structure can be used to attribute an answer to specific nodes and edges, which in turn can support explainability in RAG pipelines."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Core imports for graph construction and analysis\n",
        "import networkx as nx\n",
        "import pandas as pd\n",
        "from typing import List, Tuple, Dict, Any\n",
        "\n",
        "print(f\"NetworkX version: {nx.__version__}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1. Build an example knowledge graph\n",
        "\n",
        "We define a small directed knowledge graph of topics and concepts. Nodes carry descriptive text, and edges represent semantic relationships. This kind of graph can be used to ground answers in a RAG system."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def build_example_graph() -> nx.DiGraph:\n",
        "    \"\"\"Create a small directed knowledge graph with node and edge attributes.\"\"\"\n",
        "    G = nx.DiGraph()\n",
        "\n",
        "    # Add nodes with text/context that might be retrieved in a RAG system\n",
        "    G.add_node(\n",
        "        \"LLMs\",\n",
        "        description=\"Large Language Models (LLMs) are neural networks trained to predict text.\",\n",
        "        type=\"concept\",\n",
        "    )\n",
        "    G.add_node(\n",
        "        \"Transformers\",\n",
        "        description=\"Transformers are neural network architectures based on self-attention.\",\n",
        "        type=\"concept\",\n",
        "    )\n",
        "    G.add_node(\n",
        "        \"Attention\",\n",
        "        description=\"Attention mechanisms allow models to focus on relevant parts of the input.\",\n",
        "        type=\"concept\",\n",
        "    )\n",
        "    G.add_node(\n",
        "        \"RAG\",\n",
        "        description=\"Retrieval-Augmented Generation (RAG) combines retrieval with generation.\",\n",
        "        type=\"method\",\n",
        "    )\n",
        "    G.add_node(\n",
        "        \"Graph RAG\",\n",
        "        description=\"Graph RAG leverages knowledge graphs for retrieval and reasoning.\",\n",
        "        type=\"method\",\n",
        "    )\n",
        "    G.add_node(\n",
        "        \"Knowledge Graphs\",\n",
        "        description=\"Knowledge graphs store entities and relations as labeled nodes and edges.\",\n",
        "        type=\"concept\",\n",
        "    )\n",
        "    G.add_node(\n",
        "        \"Explainability\",\n",
        "        description=\"Explainability concerns understanding why a model produced an output.\",\n",
        "        type=\"property\",\n",
        "    )\n",
        "    G.add_node(\n",
        "        \"Local Attribution\",\n",
        "        description=\"Local attribution assigns importance to specific inputs for a given output.\",\n",
        "        type=\"property\",\n",
        "    )\n",
        "\n",
        "    # Add directed edges with relation types and base weights\n",
        "    G.add_edge(\"Transformers\", \"LLMs\", relation=\"used_in\", weight=1.0)\n",
        "    G.add_edge(\"Attention\", \"Transformers\", relation=\"core_mechanism\", weight=1.2)\n",
        "    G.add_edge(\"Knowledge Graphs\", \"Graph RAG\", relation=\"enable\", weight=1.5)\n",
        "    G.add_edge(\"RAG\", \"Graph RAG\", relation=\"specialization_of\", weight=1.0)\n",
        "    G.add_edge(\"LLMs\", \"RAG\", relation=\"combined_with\", weight=1.1)\n",
        "    G.add_edge(\"Graph RAG\", \"Explainability\", relation=\"improves\", weight=1.3)\n",
        "    G.add_edge(\"Local Attribution\", \"Explainability\", relation=\"supports\", weight=1.4)\n",
        "    G.add_edge(\"Knowledge Graphs\", \"Explainability\", relation=\"provides_structure_for\", weight=1.0)\n",
        "\n",
        "    return G\n",
        "\n",
        "G = build_example_graph()\n",
        "G"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2. Multi-hop queries on the graph\n",
        "\n",
        "We now implement helper functions to:\n",
        "\n",
        "- Find candidate **multi-hop paths** between two nodes.\n",
        "- Filter and score these paths.\n",
        "\n",
        "In a RAG pipeline, these paths can correspond to chains of reasoning that justify an answer."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def find_paths(\n",
        "    G: nx.DiGraph,\n",
        "    source: str,\n",
        "    target: str,\n",
        "    max_hops: int = 4,\n",
        "    max_paths: int = 10,\n",
        ") -> List[List[str]]:\n",
        "    \"\"\"Find simple paths from source to target up to a given length.\n",
        "\n",
        "    This wraps nx.all_simple_paths with a hop constraint and a limit on\n",
        "    the number of returned paths.\n",
        "    \"\"\"\n",
        "    all_paths: List[List[str]] = []\n",
        "    try:\n",
        "        for path in nx.all_simple_paths(G, source=source, target=target, cutoff=max_hops):\n",
        "            all_paths.append(path)\n",
        "            if len(all_paths) >= max_paths:\n",
        "                break\n",
        "    except (nx.NodeNotFound, nx.NetworkXNoPath):\n",
        "        # all_simple_paths raises NodeNotFound for missing endpoints\n",
        "        return []\n",
        "    return all_paths\n",
        "\n",
        "def path_weight(G: nx.DiGraph, path: List[str]) -> float:\n",
        "    \"\"\"Compute a simple path score as the sum of edge weights.\"\"\"\n",
        "    w = 0.0\n",
        "    for u, v in zip(path[:-1], path[1:]):\n",
        "        w += G[u][v].get(\"weight\", 1.0)\n",
        "    return w\n",
        "\n",
        "def rank_paths_by_weight(\n",
        "    G: nx.DiGraph,\n",
        "    paths: List[List[str]],\n",
        ") -> List[Tuple[List[str], float]]:\n",
        "    \"\"\"Return paths with their scores, sorted by descending weight.\"\"\"\n",
        "    scored = [(p, path_weight(G, p)) for p in paths]\n",
        "    return sorted(scored, key=lambda x: x[1], reverse=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Example multi-hop query\n",
        "\n",
        "Suppose we want to answer the question:\n",
        "\n",
        "> *How does local attribution relate to large language models via Graph RAG?*\n",
        "\n",
        "We can approximate this as finding multi-hop paths from `\"LLMs\"` to `\"Local Attribution\"` or from `\"Local Attribution\"` to `\"LLMs\"`, then interpret the paths as chains of reasoning."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "source_node = \"LLMs\"\n",
        "target_node = \"Local Attribution\"\n",
        "\n",
        "paths_forward = find_paths(G, source=source_node, target=target_node, max_hops=5)\n",
        "paths_backward = find_paths(G, source=target_node, target=source_node, max_hops=5)\n",
        "\n",
        "print(\"Paths LLMs -> Local Attribution:\")\n",
        "for p in paths_forward:\n",
        "    print(\"  \", \" -> \".join(p), \"(weight=\", path_weight(G, p), \")\")\n",
        "\n",
        "print(\"\\nPaths Local Attribution -> LLMs:\")\n",
        "for p in paths_backward:\n",
        "    print(\"  \", \" -> \".join(p), \"(weight=\", path_weight(G, p), \")\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3. Local attribution along paths\n",
        "\n",
        "In this simplified setting, **local attribution** assigns an importance score to each edge (and, by aggregation, to each node) along a path used to answer a query.\n",
        "\n",
        "Here, we use the edge weights as base importance, and normalize them to obtain local attribution scores for each path."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def edge_attributions_for_path(G: nx.DiGraph, path: List[str]) -> List[Dict[str, Any]]:\n",
        "    \"\"\"Compute normalized attribution scores for each edge on a path.\n",
        "\n",
        "    The attribution is the edge weight divided by the total path weight.\n",
        "    \"\"\"\n",
        "    total_weight = path_weight(G, path)\n",
        "    contributions = []\n",
        "    if total_weight == 0:\n",
        "        return contributions\n",
        "\n",
        "    for u, v in zip(path[:-1], path[1:]):\n",
        "        w = G[u][v].get(\"weight\", 1.0)\n",
        "        rel = G[u][v].get(\"relation\", \"related_to\")\n",
        "        attribution = w / total_weight\n",
        "        contributions.append(\n",
        "            {\n",
        "                \"source\": u,\n",
        "                \"target\": v,\n",
        "                \"relation\": rel,\n",
        "                \"weight\": w,\n",
        "                \"attribution\": attribution,\n",
        "            }\n",
        "        )\n",
        "    return contributions\n",
        "\n",
        "def node_attributions_for_path(G: nx.DiGraph, path: List[str]) -> Dict[str, float]:\n",
        "    \"\"\"Aggregate edge attributions into node-level attributions.\n",
        "\n",
        "    Each node receives the sum of incoming and outgoing edge attributions along the path.\n",
        "    \"\"\"\n",
        "    edge_attrs = edge_attributions_for_path(G, path)\n",
        "    node_scores: Dict[str, float] = {n: 0.0 for n in path}\n",
        "    for e in edge_attrs:\n",
        "        node_scores[e[\"source\"]] += e[\"attribution\"] / 2.0\n",
        "        node_scores[e[\"target\"]] += e[\"attribution\"] / 2.0\n",
        "    return node_scores"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Example: Attributions for a top-ranked path\n",
        "\n",
        "We now:\n",
        "\n",
        "1. Rank the candidate paths by their total edge weight.\n",
        "2. Select the best path.\n",
        "3. Compute edge-level and node-level attributions for that path."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Use backward paths (Local Attribution -> LLMs) as an illustrative example\n",
        "ranked_paths = rank_paths_by_weight(G, paths_backward)\n",
        "if ranked_paths:\n",
        "    best_path, best_score = ranked_paths[0]\n",
        "    print(\"Best path (by weight):\", \" -> \".join(best_path), \"| score=\", best_score)\n",
        "\n",
        "    edge_attrs = edge_attributions_for_path(G, best_path)\n",
        "    node_attrs = node_attributions_for_path(G, best_path)\n",
        "\n",
        "    print(\"\\nEdge-level attributions:\")\n",
        "    for e in edge_attrs:\n",
        "        print(\n",
        "            f\"  {e['source']} -[{e['relation']}]-> {e['target']}: \"\n",
        "            f\"weight={e['weight']:.2f}, attribution={e['attribution']:.3f}\"\n",
        "        )\n",
        "\n",
        "    print(\"\\nNode-level attributions:\")\n",
        "    for n, score in node_attrs.items():\n",
        "        print(f\"  {n}: {score:.3f}\")\n",
        "else:\n",
        "    print(\"No paths found between Local Attribution and LLMs.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4. Comparison tables for paths and attributions\n",
        "\n",
        "To analyze different reasoning chains, we create comparison tables:\n",
        "\n",
        "- A table of candidate paths and their total scores.\n",
        "- A detailed table of edges and attributions for a selected path.\n",
        "\n",
        "These tables can be used to compare alternative explanations in a Graph RAG pipeline."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def build_path_comparison_table(\n",
        "    G: nx.DiGraph,\n",
        "    paths: List[List[str]],\n",
        ") -> pd.DataFrame:\n",
        "    \"\"\"Return a DataFrame summarizing candidate paths and their scores.\"\"\"\n",
        "    rows = []\n",
        "    for idx, p in enumerate(paths):\n",
        "        rows.append(\n",
        "            {\n",
        "                \"path_id\": idx,\n",
        "                \"path\": \" -> \".join(p),\n",
        "                \"num_hops\": len(p) - 1,\n",
        "                \"total_weight\": path_weight(G, p),\n",
        "            }\n",
        "        )\n",
        "    return pd.DataFrame(rows).sort_values(\"total_weight\", ascending=False).reset_index(drop=True)\n",
        "\n",
        "def build_edge_attribution_table(\n",
        "    G: nx.DiGraph,\n",
        "    path: List[str],\n",
        ") -> pd.DataFrame:\n",
        "    \"\"\"Return a DataFrame with edge-level attributions for a single path.\"\"\"\n",
        "    rows = edge_attributions_for_path(G, path)\n",
        "    return pd.DataFrame(rows)\n",
        "\n",
        "# Build comparison tables for backward paths (Local Attribution -> LLMs)\n",
        "path_table = build_path_comparison_table(G, [p for p, _ in ranked_paths]) if ranked_paths else pd.DataFrame()\n",
        "path_table"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# If we have at least one path, show edge-level attribution table for the best path\n",
        "if ranked_paths:\n",
        "    best_path, _ = ranked_paths[0]\n",
        "    edge_table = build_edge_attribution_table(G, best_path)\n",
        "    display(edge_table)  # display() renders the table even inside an if-block\n",
        "else:\n",
        "    print(\"No paths available for attribution analysis.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 5. Summary\n",
        "\n",
        "In this notebook, we:\n",
        "\n",
        "- Built a small **knowledge graph** using NetworkX.\n",
        "- Executed **multi-hop queries** between nodes representing concepts and methods.\n",
        "- Computed simple **local attribution** scores for edges and nodes along selected paths.\n",
        "- Constructed **comparison tables** to analyze and rank candidate reasoning chains.\n",
        "\n",
        "While the example is intentionally small, the same patterns extend to larger graphs and can be integrated into Graph RAG pipelines to improve transparency and explainability."
      ]
    }
  ],
```
- Complete Jupyter notebook implementation with NetworkX
- LLM-based entity/relationship extraction using Ollama
- Hybrid retrieval combining vector similarity and graph traversal
- Sentence-level attribution for verifiable citations
- Comparison with Vector RAG on multi-hop reasoning
- Updated README with new technique entry and documentation
Branch updated: f036f19 → c398c82
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
162-167: **Fix duplicate entry numbering and add missing emoji.** Entry #31 appears twice in the table (lines 162-163), which breaks the sequential ordering. The new "Local Graph RAG with Attribution" entry is also missing the category emoji 🏗️.
Impact:
- All entries from #31 onward need to be renumbered
- Table of contents is inconsistent and confusing for users
🔎 Proposed fix
```diff
-| 31 | Advanced Architecture | Local Graph RAG with Attribution | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/graph_rag_local_attribution.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/graph_rag_local_attribution.ipynb) |
-| 31 | Advanced Architecture 🏗️ | RAPTOR | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/raptor.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/raptor.ipynb) |
-| 32 | Advanced Architecture 🏗️ | Self-RAG | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/self_rag.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/self_rag.ipynb) |
-| 33 | Advanced Architecture 🏗️ | Corrective RAG (CRAG) | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/crag.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/crag.ipynb) |
+| 31 | Advanced Architecture 🏗️ | Local Graph RAG with Attribution | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/graph_rag_local_attribution.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/graph_rag_local_attribution.ipynb) |
+| 32 | Advanced Architecture 🏗️ | RAPTOR | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/raptor.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/raptor.ipynb) |
+| 33 | Advanced Architecture 🏗️ | Self-RAG | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/self_rag.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/self_rag.ipynb) |
+| 34 | Advanced Architecture 🏗️ | Corrective RAG (CRAG) | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/crag.ipynb) [<img src="https://colab.research.google.com/assets/colab-badge.svg" height="20">](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/crag.ipynb) |
 | 34 | Special Technique 🌟 | Sophisticated Controllable Agent | [<img src="https://img.shields.io/badge/GitHub-View-blue" height="20">](https://github.com/NirDiamant/Controllable-RAG-Agent) |
```

Note: Entry #34 for "Sophisticated Controllable Agent" should also be renumbered to #35.
🧹 Nitpick comments (1)
README.md (1)
517-528: **Good addition, but improve bullet point formatting for consistency.** The content accurately describes the new technique and aligns well with the notebook implementation. However, the bullet points in the "Implementation" section (lines 524-527) should use standard markdown list syntax for consistency with other entries in the README.
🔎 Suggested formatting improvement
```diff
 #### Implementation
-- Build knowledge graphs from documents using LLM-based entity extraction
-- Combine vector similarity with graph traversal for hybrid retrieval
-- Generate responses with verifiable citations tracing back to source sentences
-- Compare Graph RAG vs Vector RAG on multi-hop reasoning tasks
+ - Build knowledge graphs from documents using LLM-based entity extraction
+ - Combine vector similarity with graph traversal for hybrid retrieval
+ - Generate responses with verifiable citations tracing back to source sentences
+ - Compare Graph RAG vs Vector RAG on multi-hop reasoning tasks
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- README.md
- all_rag_techniques/graph_rag_local_attribution.ipynb
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
README.md
162-162: Images should have alternate text (alt text)
(MD045, no-alt-text)
162-162: Images should have alternate text (alt text)
(MD045, no-alt-text)
518-518: Images should have alternate text (alt text)
(MD045, no-alt-text)
518-518: Images should have alternate text (alt text)
(MD045, no-alt-text)
🪛 Ruff (0.14.10)
all_rag_techniques/graph_rag_local_attribution.ipynb
136-136: Probable use of insecure hash functions in hashlib: md5
(S324)
315-315: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
409-409: f-string without any placeholders
Remove extraneous f prefix
(F541)
463-463: f-string without any placeholders
Remove extraneous f prefix
(F541)
472-472: Loop control variable entity_id not used within loop body
Rename unused entity_id to _entity_id
(B007)
🔇 Additional comments (7)
all_rag_techniques/graph_rag_local_attribution.ipynb (7)
1-119: **Excellent notebook structure and setup with graceful degradation.** The overview clearly articulates the value proposition of Graph RAG versus Vector RAG, and the setup handles missing dependencies gracefully. The `OLLAMA_AVAILABLE` flag enables the notebook to run in demo mode without requiring Ollama installation, which improves accessibility for users exploring the technique.
126-177: **Well-designed LLM helper functions with proper error handling.** The configuration is flexible and the helper functions handle both normal operation and fallback scenarios appropriately. The `format="json"` parameter in `call_llm_json` ensures structured responses from Ollama, and the try-except block catches JSON parsing errors gracefully.
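As a rough illustration of that pattern (not the notebook's exact code), a `call_llm_json`-style helper with the same graceful degradation might look like this. The model name and fallback payload are placeholders, and the broad `except` mirrors the demo-mode behavior the review describes:

```python
import json
from typing import Optional

def call_llm_json(prompt: str, model: str = "llama3.1",
                  fallback: Optional[dict] = None) -> dict:
    """Ask a local Ollama model for JSON; fall back gracefully if unavailable."""
    fallback = fallback if fallback is not None else {"entities": [], "relationships": []}
    try:
        import ollama  # optional dependency; absent in demo mode
        response = ollama.chat(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            format="json",  # constrain the model to emit valid JSON
        )
        return json.loads(response["message"]["content"])
    except Exception:
        # Covers a missing package, an unreachable server, and malformed JSON alike
        return fallback

result = call_llm_json("Extract entities from: 'Ada wrote the first program.'")
print(type(result).__name__)
```

Returning a typed fallback instead of raising keeps the downstream graph-construction code path exercisable even without a running Ollama server.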
184-237: **Well-designed data models with comprehensive provenance tracking.** The dataclass design is clean and captures all necessary information for verifiable attribution. The consistent tracking of `source_doc` and `source_sentence` across Entity, Relationship, and Citation models enables the sentence-level attribution promised in the PR objectives.
259-398: **Solid implementation of knowledge graph construction with provenance.** The `KnowledgeGraphRAG` class correctly extracts entities and relationships while preserving source attribution at every step. The LLM prompt explicitly requests exact source sentences, which is critical for verifiable attribution.
Note on MD5 usage (line 279): While static analysis flags MD5 as insecure, it's acceptable here since it's only used for generating entity IDs (non-cryptographic use case). The collision risk is negligible for typical RAG workloads.
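For the non-cryptographic ID derivation the note describes, a minimal sketch (the key format and 12-character truncation are illustrative choices, not the notebook's exact scheme) could be:

```python
import hashlib

def entity_id(name: str, entity_type: str) -> str:
    """Derive a stable node ID from an entity's normalized name and type."""
    key = f"{name.lower().strip()}::{entity_type}"
    # Non-cryptographic use: on Python 3.9+ you can pass usedforsecurity=False
    # to document intent and silence security linters like Ruff's S324.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return digest[:12]

print(entity_id("Ada Lovelace", "person"))
```

Normalizing before hashing means the same entity mentioned with different casing or whitespace maps to the same graph node, which is the property that matters here rather than collision resistance.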
400-485: **Effective hybrid retrieval combining vector search and graph traversal.** The hybrid search implementation successfully addresses the multi-hop reasoning challenge:

- Vector search finds semantically relevant entry points
- BFS traversal expands to connected entities up to `max_hops`
- Full path tracking enables attribution back to sources

This design allows answering queries like "How does X relate to Y?" that require connecting information across multiple documents, a key advantage over pure vector RAG.
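The entry-point-plus-traversal idea can be sketched independently of the notebook. Here the adjacency data is a hypothetical plain dict standing in for the NetworkX graph, and entry points stand in for vector-search hits:

```python
from collections import deque

# Hypothetical adjacency list standing in for the knowledge graph
GRAPH = {
    "Miller": ["quantum_lab"],
    "quantum_lab": ["qubit_project"],
    "qubit_project": ["error_correction"],
}

def expand_from(entry_points, max_hops=2):
    """Breadth-first expansion from vector-search hits, capped at max_hops."""
    seen = set(entry_points)
    frontier = deque((node, 0) for node in entry_points)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted for this branch
        for neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

subgraph = expand_from(["Miller"], max_hops=2)
print(sorted(subgraph))
```

With `max_hops=2`, the expansion reaches the project two hops away but stops before `error_correction`, which is exactly the budgeted multi-hop behavior the review highlights.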
487-563: **Answer generation with verifiable citations is well-implemented.** The `generate_attributed_answer` method successfully links generated claims back to source sentences via:

- Inline citation markers `[N]` in the prompt
- Regex extraction of citation references
- Mapping back to source documents, sentences, and graph paths
This provides the sentence-level attribution that differentiates this approach from chunk-based Vector RAG.
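The marker-extraction step can be sketched in a few lines. The answer text and the citation store below are hypothetical examples, not the notebook's data:

```python
import re

# Hypothetical generated answer with inline [N] citation markers
answer = (
    "Professor Miller leads the quantum lab [1], whose qubit project "
    "builds on her error-correction results [2][3]."
)

# Hypothetical citation store mapping marker numbers to provenance
sources = {
    1: {"doc": "faculty.txt", "sentence": "Miller leads the quantum lab."},
    2: {"doc": "projects.txt", "sentence": "The qubit project uses her codes."},
    3: {"doc": "papers.txt", "sentence": "Her 2021 paper introduced the codes."},
}

# Pull every [N] marker out of the answer, then resolve to source sentences
cited = [int(n) for n in re.findall(r"\[(\d+)\]", answer)]
citations = [sources[n] for n in cited if n in sources]
print(cited)
```

Guarding with `if n in sources` matters in practice, since an LLM can hallucinate a marker number that was never offered in the prompt.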
570-850: **Comprehensive examples and documentation complete the implementation.** The sample documents, demonstration queries, and comparison table effectively showcase Graph RAG's advantages for multi-hop reasoning. The queries like "How does Professor Miller's research influence the quantum computing project?" require connecting facts from multiple documents, exactly where Graph RAG excels.
Key strengths:
- Sample data designed to demonstrate multi-hop scenarios
- Clear comparison table showing when to use Graph RAG vs Vector RAG
- Production guidance for scaling with Neo4j
- Proper references to related work (VeritasGraph, Microsoft GraphRAG)
The implementation fully addresses the PR objectives: local/privacy-first execution with Ollama, multi-hop reasoning via graph traversal, and verifiable sentence-level attribution.
Description
This PR adds a new technique demonstrating Graph RAG with Verifiable Attribution, addressing two key limitations of Vector RAG: weak multi-hop reasoning across documents and the lack of verifiable, sentence-level source attribution.
Key Features
What's Included
- `all_rag_techniques/graph_rag_local_attribution.ipynb`

When to Use Graph RAG
✅ Questions requiring multi-hop reasoning across documents
✅ Need for verifiable, sentence-level source attribution
✅ Privacy-critical deployments (no cloud APIs)
✅ Relationship discovery is important
References
Summary by CodeRabbit
New Features
Documentation