AuraSDK
Cognitive memory for AI agents. Pure Rust, <1ms recall, 2.7MB, zero cloud. Patent Pending.
Your AI model is smart. But it forgets everything after every conversation.
AuraSDK is a local cognitive runtime that runs alongside any frozen model. It gives agents durable memory, explainability, governed correction, bounded recall reranking, and bounded self-adaptation through experience — all locally, without fine-tuning or cloud training.
```bash
pip install aura-memory
```

```python
from aura import Aura, Level

brain = Aura("./agent_memory")
brain.enable_full_cognitive_stack()  # activate all four bounded reranking overlays

# store what happens
brain.store("User always deploys to staging first", level=Level.Domain, tags=["workflow"])
brain.store("Staging deploy prevented 3 production incidents", level=Level.Domain, tags=["workflow"])

# recall — local retrieval with optional bounded cognitive reranking
context = brain.recall("deployment decision")  # <1ms, no API call

# inspect advisory hints produced from stored evidence
hints = brain.get_surfaced_policy_hints()
# → [{"action": "Prefer", "domain": "workflow", "description": "deploy to staging first"}]
```
No API keys. No embeddings required. No cloud. The model stays the same — the cognitive layer becomes more structured, more inspectable, and more useful over time.
⭐ If AuraSDK is useful to you, a GitHub star helps us get funding to continue development from Kyiv.
Why Aura?
| | Aura | Mem0 | Zep | Cognee | Letta/MemGPT |
|---|---|---|---|---|---|
| Architecture | 5-layer cognitive engine | Vector + LLM | Vector + LLM | Graph + LLM | LLM orchestration |
| Derived cognitive layers without LLM | Yes — Belief→Concept→Causal→Policy | No | No | No | No |
| Advisory policy hints from experience | Yes — bounded and non-executing | No | No | No | No |
| Learns from agent's own responses | Yes — bounded, auditable, no fine-tuning | No | No | No | No |
| Salience weighting | Yes — what matters persists longer | No | No | No | No |
| Contradiction governance | Yes — explicit, operator-visible | No | No | No | No |
| LLM required | No | Yes | Yes | Yes | Yes |
| Recall latency | <1ms | ~200ms+ | ~200ms | LLM-bound | LLM-bound |
| Works offline | Fully | Partial | No | No | With local LLM |
| Cost per operation | $0 | API billing | Credit-based | LLM + DB cost | LLM cost |
| Binary size | ~3 MB | ~50 MB+ | Cloud service | Heavy (Neo4j+) | Python pkg |
| Memory decay & promotion | Built-in | Via LLM | Via LLM | No | Via LLM |
| Trust & provenance | Built-in | No | No | No | No |
| Encryption at rest | ChaCha20 + Argon2 | No | No | No | No |
| Language | Rust | Python | Proprietary | Python | Python |
The Core Idea: Cheap Model + Aura > Expensive Model Alone
Fine-tuning costs thousands of dollars and weeks of work. RAG requires embeddings and a vector database. Context windows are expensive per token.
Aura gives you a third path: a local cognitive runtime that accumulates structured experience between conversations — free, local, sub-millisecond.
```
Week 1: GPT-4o-mini + Aura                 Week 1: GPT-4 alone
  → average answers                          → average answers

Week 4: GPT-4o-mini + Aura                 Week 4: GPT-4 alone
  → recalls your workflow                    → still forgets everything
  → surfaces patterns you repeat             → same cost per token
  → exposes explainability + correction      → no improvement
  → boundedly adapts from experience         → no durable learning
  → $0 compute cost                          → still billing per call
```
The model stays the same. The cognitive layer gets stronger. That's Aura.
Performance
Benchmarked on 1,000 records (Windows 10 / Ryzen 7):
| Operation | Latency | vs Mem0 |
|-----------|---------|---------|
| Store | 0.09 ms | ~same |
| Recall (structured) | 0.74 ms | ~270× faster |
| Recall (cached) | 0.48 µs | ~400,000× faster |
| Maintenance cycle | 1.1 ms | No equivalent |
Mem0 recall requires an embedding API call (~200ms+) + vector search. Aura recall is pure local computation.
What Ships Today
Aura's full cognitive recall pipeline is active and bounded:
```
Record → Belief (±5%) → Concept (±4%) → Causal (±3%) → Policy (±2%)
```
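As an illustrative sketch (not Aura's internal code), the bounded percentages above can be read as caps on multiplicative score adjustments: each overlay may move a record's baseline score by at most its stated fraction, so even all four phases together shift ranking only modestly.

```python
def apply_bounded_phase(score: float, adjustment: float, bound: float) -> float:
    """Clamp a phase's raw adjustment to +/-bound, then apply it multiplicatively."""
    clamped = max(-bound, min(bound, adjustment))
    return score * (1.0 + clamped)

# Per-phase caps from the pipeline: Belief 5%, Concept 4%, Causal 3%, Policy 2%
PHASES = [("belief", 0.05), ("concept", 0.04), ("causal", 0.03), ("policy", 0.02)]

def rerank(baseline: float, adjustments: dict) -> float:
    """Apply each phase's bounded adjustment in pipeline order."""
    score = baseline
    for name, bound in PHASES:
        score = apply_bounded_phase(score, adjustments.get(name, 0.0), bound)
    return score

# Even maximal positive signals in every phase lift a score by only ~14.7%
ceiling = rerank(1.0, {"belief": 1.0, "concept": 1.0, "causal": 1.0, "policy": 1.0})
```

The bound is what keeps the cognitive layers advisory: retrieval remains dominated by the baseline signal, never by a single overlay.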
Enable everything in one call:
```python
brain.enable_full_cognitive_stack()   # activates all four bounded reranking phases
brain.disable_full_cognitive_stack()  # back to raw RRF baseline
```
Or configure individual phases:
```python
brain.set_belief_rerank_mode("limited")    # belief-aware ranking
brain.set_concept_surface_mode("limited")  # concept annotations + bounded concept reranking
brain.set_causal_rerank_mode("limited")    # causal chain boost
brain.set_policy_rerank_mode("limited")    # policy hint shaping
```
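The raw baseline mentioned above is Reciprocal Rank Fusion. RRF itself is simple to sketch; the standalone function below uses the commonly cited constant `k = 60`, which is an assumption here, not necessarily Aura's value:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each item scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" wins: ranked well by both signals beats ranked first by only one
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]])
```

Because RRF works on ranks rather than raw scores, it fuses heterogeneous signals (keyword match, recency, tags) without any score normalization, which is part of why it needs no LLM or embedding call.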
Higher layers also expose advisory surfaced output:
- `get_surfaced_concepts()` — stable concept abstractions over repeated beliefs
- `get_surfaced_causal_patterns()` — learned cause→effect patterns
- `get_surfaced_policy_hints()` — advisory recommendations (Prefer / Avoid / Warn)
- no automatic behavior influence — all output is advisory and read-only
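Since hints never execute anything, the consuming agent decides what to do with each one. A small illustrative helper (not part of the SDK) that buckets hint dicts of the shape shown in the quick-start example:

```python
def partition_hints(hints):
    """Group advisory hints by action; the hints themselves never act."""
    by_action = {"Prefer": [], "Avoid": [], "Warn": []}
    for hint in hints:
        by_action.setdefault(hint["action"], []).append(hint)
    return by_action

hints = [
    {"action": "Prefer", "domain": "workflow", "description": "deploy to staging first"},
    {"action": "Warn", "domain": "workflow", "description": "risky pattern repeated"},
]
buckets = partition_hints(hints)
# buckets["Prefer"] holds the staging-first recommendation
```

An agent might inject `Prefer` hints into its prompt context and surface `Warn` hints to the operator, but that policy lives entirely outside Aura.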
Aura also ships operator-facing and plasticity-facing surfaces:
- explainability:
  - `explain_recall()`
  - `explain_record()`
  - `provenance_chain()`
  - `explainability_bundle()`
- governed correction:
- targeted retract/deprecate APIs
- persistent correction log
- correction review queue
- suggested corrections without auto-apply
- bounded autonomous plasticity:
  - `capture_experience()`
  - `ingest_experience_batch()`
  - maintenance-phase integration
- anti-hallucination guards
- plasticity risk scoring
- purge / freeze controls
- bounded v6 cognitive guidance:
- salience:
    - `mark_record_salience()`
    - `get_high_salience_records()`
    - `get_salience_summary()`
- reflection:
    - `get_reflection_summaries()`
    - `get_latest_reflection_digest()`
    - `get_reflection_digest()`
- contradiction and instability:
    - `get_belief_instability_summary()`
    - `get_contradiction_clusters()`
    - `get_contradiction_review_queue()`
- honest explainability support:
- unresolved-evidence markers in recall explanations
- bounded answer-support phrasing for agent / UI layers
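The contradiction surfaces above group records that assert conflicting values for the same subject. A minimal sketch of that clustering idea, using hypothetical `(subject, value)` pairs rather than Aura's real record format:

```python
from collections import defaultdict

def contradiction_clusters(records):
    """Group (subject, value) pairs; a subject with >1 distinct value is a cluster."""
    by_subject = defaultdict(set)
    for subject, value in records:
        by_subject[subject].add(value)
    return {s: sorted(v) for s, v in by_subject.items() if len(v) > 1}

records = [
    ("deploy_target", "staging"),
    ("deploy_target", "production"),
    ("ci_provider", "github-actions"),
]
clusters = contradiction_clusters(records)
# → {"deploy_target": ["production", "staging"]}
```

Surfacing clusters like this to an operator review queue, instead of silently overwriting one side, is what makes the governance explicit.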
How Memory Works
Aura organizes memories into 4 levels across 2 tiers. Important memories persist, trivial ones decay naturally:
```
CORE TIER (slow decay — weeks to months)
  Identity   [0.99]  Who the user is. Preferences. Personality.
  Domain     [0.95]  Learned facts. Domain knowledge.

COGNITIVE TIER (fast decay — hours to days)
  Decisions  [0.90]  Choices made. Action items.
  Working    [0.80]  Current tasks. Recent context.

SEMANTIC TYPES (modulate decay & promotion)
  fact           Default knowledge record.
  decision       More persistent than a standard fact. Promotes earlier.
  preference     Long-lived user or agent preference.
  contradiction  Preserved longer for conflict analysis.
  trend          Time-sensitive pattern tracked over repeated activation.
  serendipity    Cross-domain discovery record.
```
One call runs the lifecycle — decay, promotion, consolidation, and archival:
```python
report = brain.run_maintenance()  # background memory maintenance
```
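To make the lifecycle concrete, here is a toy model of one maintenance cycle using the per-level stability values from the table above. It is a sketch of the decay-and-promotion idea only; the real engine also handles consolidation, archival, and semantic-type modulation:

```python
# Per-cycle retention by level, from the memory table above
STABILITY = {"Identity": 0.99, "Domain": 0.95, "Decisions": 0.90, "Working": 0.80}

def maintain(records, promote_at=0.5, drop_below=0.05):
    """One toy cycle: decay each record by its level stability, drop faded
    records, and flag strong Working records as promotion candidates."""
    kept, promoted = [], []
    for level, strength in records:
        strength *= STABILITY[level]
        if strength < drop_below:
            continue  # decayed away
        if level == "Working" and strength >= promote_at:
            promoted.append((level, strength))
        kept.append((level, strength))
    return kept, promoted

kept, promoted = maintain([("Working", 0.9), ("Working", 0.05), ("Identity", 1.0)])
# the faded Working record is dropped; the strong one is flagged for promotion
```

The asymmetry in the stability constants is the whole mechanism: Identity survives hundreds of cycles while idle Working memory fades within a few.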
Key Features
Core Cognitive Runtime
- Fast Local Recall — Multi-signal ranking with optional embedding support
- Two-Tier Memory — Cognitive (ephemeral) + Core (permanent) with decay, promotion, and archival
- Semantic Memory Types — 6 roles (`fact`, `decision`, `trend`, `preference`, `contradiction`, `serendipity`) that influence memory behavior and insight generation
- Phase-Based Insights — Detects conflicts, trends, preference patterns, and cross-domain links
- Background Maintenance — Continuous memory hygiene: decay, reflect, insights, consolidation, archival
- Namespace Isolation — `namespace="sandbox"` keeps test data invisible to production recall
- Pluggable Embeddings — Optional embedding support: bring your own embedding function
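Namespace isolation can be pictured as a hard filter applied before any ranking. This standalone sketch shows the idea with hypothetical record dicts and a naive token match; it is not Aura's storage layout or recall algorithm:

```python
def recall(store, query_tokens, namespace="default"):
    """Return records in the given namespace sharing at least one query token."""
    hits = []
    for record in store:
        if record["namespace"] != namespace:
            continue  # sandbox data never leaks into production recall
        if set(query_tokens) & set(record["text"].lower().split()):
            hits.append(record["text"])
    return hits

store = [
    {"namespace": "default", "text": "User deploys to staging first"},
    {"namespace": "sandbox", "text": "test fixture: deploys to nowhere"},
]
prod = recall(store, ["deploys"])  # only the production record matches
```

Filtering before ranking (rather than after) means sandbox records cannot even influence the score distribution of production results.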
Trust & Safety
- **
