GraphMemory
GraphRAG database - hybrid graph / vector db
Install / Use
/learn @bradAGI/GraphMemoryREADME
GraphMemory - GraphRAG Database
An embedded graph database for RAG and knowledge graph applications, powered by DuckDB. Vector similarity search, full-text search, hybrid search, merge/upsert, graph traversal, and a full GraphRAG retrieval pipeline — all in a single Python package.
Features
- Vector Search — HNSW-indexed nearest neighbors (L2, cosine, inner product)
- Full-Text Search — BM25-scored search across node properties
- Hybrid Search — Combined vector + text with configurable weights
- GraphRAG — Retrieval pipeline: hybrid search → graph expansion → context assembly → LLM Q&A
- Merge / Upsert — Deduplicate nodes by property keys and edges by (source, target, relation)
- Query Builder — Fluent, parameterized API with multi-hop traversal
- DSPy Extraction — Entity/relationship extraction from text via DSPy (optional)
- Graph Algorithms — PageRank, centrality, components via NetworkX (optional)
- Import / Export — JSON, CSV, GraphML
- Visualizer — Interactive D3.js force-directed graph in the browser
- Thread-Safe — Connection pooling, transactions, automatic retry with exponential backoff
Installation
pip install graphmemory
# Optional
pip install graphmemory[extraction] # DSPy extraction
pip install graphmemory[algorithms] # NetworkX algorithms
Quick Start
from graphmemory import GraphMemory, Node, Edge
graph = GraphMemory(database="graph.db", vector_length=3, distance_metric="cosine")
# Insert nodes
alice = Node(type="Person", properties={"name": "Alice", "role": "engineer"}, vector=[0.1, 0.8, 0.3])
bob = Node(type="Person", properties={"name": "Bob", "role": "manager"}, vector=[0.2, 0.7, 0.4])
graph.insert_node(alice)
graph.insert_node(bob)
# Insert edge
graph.insert_edge(Edge(source_id=alice.id, target_id=bob.id, relation="reports_to", weight=1.0))
# Vector search
nearest = graph.nearest_nodes(vector=[0.1, 0.8, 0.3], limit=5)
# Full-text search
results = graph.search_nodes("engineer", limit=10)
# Hybrid search
results = graph.hybrid_search("engineer", query_vector=[0.1, 0.8, 0.3], text_weight=0.5, vector_weight=0.5)
# Context manager
with GraphMemory(database="graph.db", vector_length=3) as graph:
graph.insert_node(alice)
Usage
Query Builder
# Filter by type and properties
results = graph.query().match(type="Person").where(role="engineer").execute()
# Multi-hop traversal
results = graph.query().traverse(source_id=alice.id, depth=2).execute()
# Paginate and order
results = graph.query().match(type="Person").order_by("name").limit(10).offset(0).execute()
# Query edges
edges = graph.query().match(type="Person").edges().execute()
Merge / Upsert
Insert-or-update nodes matched by property keys. Edges deduplicate on (source_id, target_id, relation).
from graphmemory import MergeStrategy
# Insert if no match, update if "name" matches an existing Person node
result = graph.merge_node(alice, match_keys=["name"])
print(result.created) # True = inserted, False = updated
# Bulk merge with strategy
results = graph.bulk_merge_nodes(nodes, match_keys=["name"], strategy=MergeStrategy.UPDATE)
# Edge merge
result = graph.merge_edge(edge)
results = graph.bulk_merge_edges(edges)
| Strategy | Behavior |
|----------|----------|
| UPDATE | Shallow merge — existing keys preserved, incoming keys added/overwritten (default) |
| REPLACE | Incoming properties fully replace existing |
| KEEP | Existing properties unchanged; only new nodes inserted |
GraphRAG Retrieval
Full pipeline: hybrid search → multi-hop graph expansion → token-aware context assembly → LLM generation.
# Retrieve context
result = graph.retrieve(query="Who leads ML?", query_vector=embedding, max_hops=2, max_tokens=4000)
print(result.context_text) # Prompt-ready string
print(result.token_estimate) # Token count estimate
# End-to-end Q&A
answer = graph.ask(query="Who leads ML?", query_vector=embedding, llm_callable=my_llm)
print(answer["answer"])
DSPy Extraction
Requires pip install graphmemory[extraction]. Uses DSPy typed predictors to extract entities and relationships from text.
from graphmemory.extraction import extract_and_store, extract_and_merge
import dspy
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
text = """George Washington was the first President. Thomas Jefferson
served as Secretary of State under Washington."""
# Extract and insert (may create duplicates on repeated calls)
nodes, edges = extract_and_store(graph, text)
# Extract and merge (deduplicates against existing graph)
node_results, edge_results = extract_and_merge(graph, text, match_keys=["name"])
| Function | Description |
|----------|-------------|
| extract_nodes(text) | Extract entity nodes from text |
| extract_edges(text, nodes) | Extract relationships between known nodes |
| extract(text) | Extract both nodes and edges |
| extract_and_store(graph, text) | Extract and insert into graph |
| extract_and_merge(graph, text, match_keys) | Extract and merge (deduplicated) |
Graph Algorithms
Requires pip install graphmemory[algorithms]. Powered by NetworkX.
from graphmemory.algorithms import pagerank, betweenness_centrality, connected_components, to_networkx
scores = pagerank(graph)
centrality = betweenness_centrality(graph)
components = connected_components(graph)
G = to_networkx(graph) # Export to NetworkX DiGraph
| Function | Description |
|----------|-------------|
| pagerank(graph, alpha=0.85) | PageRank scores for all nodes |
| betweenness_centrality(graph) | Betweenness centrality scores |
| degree_distribution(graph) | In/out/total degree per node |
| connected_components(graph) | Weakly connected components (largest first) |
| to_networkx(graph) | Export to networkx.DiGraph |
Import / Export
# Export
data = graph.export_graph(format="json") # also: "csv", "graphml", "json_string"
# Import
graph.import_graph(data, format="json")
Visualizer
Interactive D3.js force-directed graph visualization — opens in your browser with zero dependencies.
# Open in browser
graph.visualize()
# Save to file
graph.visualize(output="my_graph.html", open_browser=False)
Features: drag nodes, zoom/pan, hover to highlight connections, click for detail panel, search bar, filter by node type.
Data Models
| Model | Fields |
|-------|--------|
| Node | id: UUID, type: str, properties: dict, vector: list[float] |
| Edge | id: UUID, source_id: UUID, target_id: UUID, relation: str, weight: float |
| NearestNode | node: Node, distance: float |
| SearchResult | node: Node, score: float |
| TraversalResult | node: Node, depth: int, path: list[UUID] |
| MergeResult | node: Node, created: bool |
| EdgeMergeResult | edge: Edge, created: bool |
| RetrievalResult | query: str, contexts: list, context_text: str, token_estimate: int |
All IDs are auto-generated UUIDs. All models are Pydantic BaseModel instances.
API Reference
Connection
| Method | Description |
|--------|-------------|
| GraphMemory(database=None, vector_length=3, distance_metric='l2', hnsw_ef_construction=128, hnsw_ef_search=64, hnsw_m=16, auto_index=True) | Initialize. None = in-memory. HNSW index auto-created. |
| close() | Close connection (thread-safe, idempotent). |
| transaction() | Context manager for atomic operations. |
Nodes
| Method | Description |
|--------|-------------|
| insert_node(node) -> UUID | Insert a node. |
| bulk_insert_nodes(nodes) -> list[Node] | Bulk insert. |
| merge_node(node, match_keys, strategy=UPDATE) -> MergeResult | Insert or update by property match. |
| bulk_merge_nodes(nodes, match_keys, ...) -> list[MergeResult] | Bulk merge. |
| get_node(node_id) -> Node | Get by ID. |
| update_node(node_id, **kwargs) -> bool | Update fields. |
| delete_node(node_id) | Delete node and its edges. |
| bulk_delete_nodes(node_ids) | Bulk delete. |
| nodes_by_attribute(attr, value) -> list[Node] | Query by property. |
Edges
| Method | Description |
|--------|-------------|
| insert_edge(edge) | Insert an edge. |
| bulk_insert_edges(edges) | Bulk insert. |
| merge_edge(edge) -> EdgeMergeResult | Insert or update by (source, target, relation). |
| bulk_merge_edges(edges) -> list[EdgeMergeResult] | Bulk merge. |
| get_edge(edge_id) -> Edge | Get by ID. |
| update_edge(edge_id, **kwargs) -> bool | Update fields. |
| delete_edge(source_id, target_id) | Delete by endpoints. |
| bulk_delete_edges(edge_ids) | Bulk delete. |
Search
| Method | Description |
|--------|-------------|
| nearest_nodes(vector, limit) -> list[NearestNode] | Vector similarity search. |
| search_nodes(query_text, limit=10) -> list[SearchResult] | Full-text BM25 search. |
| hybrid_search(query_text, query_vector, ...) -> list[SearchResult] | Combined text + vector search. |
| create_index(ef_construction=None, ef_search=None, m=None) | Create/recreate HNSW index with tunable params. Auto-called on init. |
| compact_index() | Compact HNSW index to reclaim space after deletions. |
Retrieval
| Method | Description |
|--------|-------------|
| retrieve(query, query_vector, ...) -> RetrievalResult | Full GraphRAG retrieval pipeline. |
| ask(query, query_vector, llm_callable, ...) -> dict | Retrieval + LLM generation. |
Traversal
| Method | Description |
|--------|-------------|
| connected_nodes(node_id) -> list[Node] | All nodes connected to a node. |
| query() -> QueryBuilder | Flue
