🧠 Semantica
A Framework for Building Context Graphs and Decision Intelligence Layers for AI
⭐ Give us a Star • 🍴 Fork us • 💬 Join our Discord • 🐦 Follow on X
Transform Chaos into Intelligence. Build AI systems with context graphs, decision tracking, and advanced knowledge engineering that are explainable, traceable, and trustworthy — not black boxes.
The Problem
AI agents today are capable but not trustworthy:
- No memory structure — agents store embeddings, not meaning. Retrieval is fuzzy; there's no way to ask why something was recalled.
- No decision trail — agents make decisions continuously but record nothing. When something goes wrong, there's no history to debug or audit.
- No provenance — outputs cannot be traced back to source facts. In regulated industries, this is a compliance blocker.
- No reasoning transparency — black-box answers with no explanation of how a conclusion was reached.
- No conflict detection — contradictory facts silently coexist in vector stores, producing unpredictable answers.
These aren't edge cases. They are the reason AI cannot be deployed in healthcare, finance, legal, and government without custom guardrails built from scratch.
The Solution
Semantica is the context and intelligence layer you add to your AI stack:
- Context Graphs — structured graph of entities, relationships, and decisions your agent builds as it works. Queryable, traceable, persistent.
- Decision Intelligence — every decision is a first-class object: recorded, linked causally, searchable by precedent, and analyzable for downstream impact.
- Provenance — every fact links to its source. W3C PROV-O compliant. Full lineage from ingestion to inference.
- Reasoning engines — forward chaining, Rete networks, deductive, abductive, and SPARQL reasoning. Explainable inference paths, not black-box answers.
- Deduplication & QA — conflict detection, entity resolution, and validation built into the pipeline.
Works alongside LangChain, LlamaIndex, AutoGen, CrewAI, and any LLM provider — Semantica is not a replacement, it's the accountability layer on top.
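Concretely, the difference from a plain vector store can be sketched in a few lines. The class below is a toy illustration of the context-graph idea (every fact carries a provenance pointer, so the graph can answer "why"), not Semantica's actual API; all names in it are invented for the example.

```python
from collections import defaultdict

class MiniContextGraph:
    """Toy context graph: typed nodes, labeled edges, and a
    provenance pointer per fact. Illustrative only."""

    def __init__(self):
        self.nodes = {}                 # id -> attributes
        self.edges = defaultdict(list)  # src -> [(relation, dst, source)]

    def add_entity(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_fact(self, src, relation, dst, source):
        # Every edge records the source it came from, so any answer
        # derived from the graph can be traced back to a document.
        self.edges[src].append((relation, dst, source))

    def why(self, src, relation):
        """Return (dst, source) pairs explaining a relation."""
        return [(d, s) for r, d, s in self.edges[src] if r == relation]

g = MiniContextGraph()
g.add_entity("acme", type="Company")
g.add_fact("acme", "headquartered_in", "Berlin", source="filing_2024.pdf")
print(g.why("acme", "headquartered_in"))
```

Unlike an embedding lookup, the retrieval here is exact and self-explaining: the answer arrives together with the document that justifies it.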
⚡ Quick Installation
`pip install semantica`
What's New in v0.3.0
First stable release — classified Production/Stable on PyPI. Ships across three stages: 0.3.0-alpha, 0.3.0-beta, and 0.3.0 stable.
| Area | Highlights |
|------|-----------|
| Context Graphs | Temporal validity windows (valid_from/valid_until), weighted BFS (min_weight), cross-graph navigation (link_graph, navigate_to, resolve_links) with full save/load persistence |
| Decision Intelligence | Complete lifecycle: record_decision → trace_decision_chain → analyze_decision_impact → find_similar_decisions; hybrid precedent search; PolicyEngine with versioned rules |
| KG Algorithms | PageRank, betweenness, community detection (Louvain), Node2Vec embeddings, link prediction, path finding — all returning structured dicts |
| Semantic Extraction | LLM relation extraction fixed (no silent drops); _match_pattern rewritten; duplicate relation bug removed; "llm_typed" metadata corrected |
| Deduplication v2 | blocking_v2/hybrid_v2 candidate generation (63.6% faster); two-stage prefilter (18–25% faster); semantic dedup v2 (6.98x faster) |
| Delta Processing | SPARQL-based incremental diff; delta_mode pipelines; snapshot versioning with prune_versions() |
| Export | RDF format aliases ("ttl", "json-ld", etc.); ArangoDB AQL export; Apache Parquet export (Spark/BigQuery/Databricks ready) |
| Pipeline | FailureHandler with LINEAR/EXPONENTIAL/FIXED backoff; PipelineValidator returning ValidationResult; retry loop fixed |
| Graph Backends | Apache AGE (SQL injection fixed), AWS Neptune, FalkorDB, PgVector (HNSW/IVFFlat indexing) |
| Tests | 886+ passing, 0 failures — 335 context, ~430 KG, 70 semantic extraction, 85 real-world E2E |
See RELEASE_NOTES.md for the full per-contributor breakdown and CHANGELOG for the complete diff.
Unreleased / Coming Next
| Area | Highlights |
|------|-----------|
| SHACL Constraints | OntologyEngine.to_shacl() auto-derives SHACL shapes from any OWL ontology; validate_graph() returns structured SHACLValidationReport with plain-English violation explanations; three quality tiers ("basic", "standard", "strict"); three output formats (Turtle, JSON-LD, N-Triples); 3-level inheritance propagation |
Features
Context & Decision Intelligence
- Context Graphs — structured graph of entities, relationships, and decisions; queryable, causal, persistent
- Decision tracking — record, link, and analyze every agent decision with `add_decision()` and `record_decision()`
- Causal chains — link decisions with `add_causal_relationship()`, trace lineage with `trace_decision_chain()`
- Precedent search — hybrid similarity search over past decisions with `find_similar_decisions()`
- Influence analysis — `analyze_decision_impact()` and `analyze_decision_influence()` — understand downstream effects
- Policy engine — enforce business rules with `check_decision_rules()`; automated compliance validation
- Agent memory — `AgentMemory` with short/long-term storage, conversation history, and statistics
- Cross-system context capture — `capture_cross_system_inputs()` for multi-agent pipelines
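The decision lifecycle above can be sketched in miniature. The class below mirrors the README's method names (`record_decision`, `trace_decision_chain`) but its signatures and storage are invented for illustration; consult the library docs for the real API.

```python
class MiniDecisionLog:
    """Toy decision tracker with causal links. Method names echo the
    feature list; signatures here are illustrative, not Semantica's."""

    def __init__(self):
        self.decisions = {}  # id -> {"description": ..., "caused_by": ...}

    def record_decision(self, decision_id, description, caused_by=None):
        self.decisions[decision_id] = {
            "description": description,
            "caused_by": caused_by,  # id of the upstream decision, if any
        }

    def trace_decision_chain(self, decision_id):
        """Walk caused_by links back to the root decision."""
        chain = []
        current = decision_id
        while current is not None:
            chain.append(current)
            current = self.decisions[current]["caused_by"]
        return list(reversed(chain))  # root first

log = MiniDecisionLog()
log.record_decision("d1", "use cached embeddings")
log.record_decision("d2", "skip re-ingestion", caused_by="d1")
log.record_decision("d3", "answer from cache", caused_by="d2")
print(log.trace_decision_chain("d3"))  # ['d1', 'd2', 'd3']
```

Because each decision is a first-class record rather than a log line, the chain behind any output can be replayed after the fact.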
Knowledge Graphs
- Knowledge graph construction — entities, relationships, properties, typed edges
- Graph algorithms — PageRank, betweenness centrality, clustering coefficient, community detection
- Node embeddings — Node2Vec embeddings via `NodeEmbedder`
- Similarity — cosine similarity via `SimilarityCalculator`
- Link prediction — score potential new edges via `LinkPredictor`
- Temporal graphs — time-aware nodes and edges
- Incremental / delta processing — update graphs without full recompute
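As an illustration of the kind of algorithm this module provides, here is a stdlib-only PageRank by power iteration over an adjacency dict. It shows the technique, not Semantica's implementation or return format.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Minimal PageRank by power iteration over {node: [out-neighbors]}.
    A sketch of the algorithm, not the library's API."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for src, targets in edges.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:
                # Dangling node: distribute its rank uniformly.
                for v in nodes:
                    new[v] += damping * rank[src] / n
        rank = new
    return rank

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
# "c" is linked from both "a" and "b", so it ends up ranked highest.
```

In a context graph, centrality scores like these surface the entities an agent's decisions keep revolving around.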
Semantic Extraction
- Entity extraction — named entity recognition, normalization, classification
- Relation extraction — triplet generation from raw text using LLMs or rule-based methods
- LLM-typed extraction — extraction with typed relation metadata
- Deduplication v1 — Jaro-Winkler similarity, basic blocking
- Deduplication v2 — `blocking_v2`, `hybrid_v2`, `semantic_v2` strategies with `max_candidates_per_entity`
- Triplet deduplication — `dedup_triplets()` for removing duplicate (subject, predicate, object) triples
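Triplet deduplication is the simplest of these to illustrate: drop repeated (subject, predicate, object) triples while keeping first-seen order. The function below is a stdlib sketch; the real `dedup_triplets()` may differ in signature and options.

```python
def dedup_triplets(triplets):
    """Remove duplicate (subject, predicate, object) triples while
    preserving first-seen order. Illustrative sketch only."""
    seen = set()
    unique = []
    for triple in triplets:
        key = tuple(triple)
        if key not in seen:
            seen.add(key)
            unique.append(triple)
    return unique

triples = [
    ("acme", "headquartered_in", "Berlin"),
    ("acme", "founded_in", "2009"),
    ("acme", "headquartered_in", "Berlin"),  # duplicate
]
print(dedup_triplets(triples))
```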
Reasoning Engines
- Forward chaining — `Reasoner` with IF/THEN string rules and dict facts
- Rete network — `ReteEngine` for high-throughput production rule matching
- Deductive reasoning — `DeductiveReasoner` for classical inference
- Abductive reasoning — `AbductiveReasoner` for hypothesis generation from observations
- SPARQL reasoning — `SPARQLReasoner` for query-based inference over RDF graphs
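Forward chaining over IF/THEN string rules and dict facts can be shown in a few lines. The loop below is a toy that also records its inference path; Semantica's `Reasoner` supports a richer rule syntax, so treat every name here as illustrative.

```python
def forward_chain(facts, rules):
    """Tiny forward-chaining loop: rules are 'IF x THEN y' strings,
    facts a dict of proposition -> bool. Illustrative only."""
    facts = dict(facts)
    trace = []  # explainable inference path, one entry per fired rule
    changed = True
    while changed:
        changed = False
        for rule in rules:
            cond, _, concl = rule[3:].partition(" THEN ")  # strip 'IF '
            if facts.get(cond) and not facts.get(concl):
                facts[concl] = True
                trace.append(f"{cond} -> {concl} (by '{rule}')")
                changed = True
    return facts, trace

facts, trace = forward_chain(
    {"is_mammal": True},
    ["IF is_mammal THEN is_animal", "IF is_animal THEN is_living"],
)
print(facts["is_living"], trace)
```

The `trace` list is the point: every derived fact is accompanied by the rule that produced it, which is what "explainable inference paths, not black-box answers" means in practice.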
Provenance & Auditability
- Entity provenance — `ProvenanceTracker.track_entity(id, source_url, metadata)`
- Algorithm provenance — `AlgorithmTrackerWithProvenance` tracks computation lineage
- Graph builder provenance — `GraphBuilderWithProvenance` records entity source lineage from URLs
- W3C PROV-O compliant — lineage tracking across all modules
- Change management — version control with checksums, audit trails, compliance support
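The core of entity provenance is a single mapping from entity to source. The sketch below imitates the `track_entity(id, source_url, metadata)` shape from the list above, but it is a toy in the spirit of PROV-O's `wasDerivedFrom`, not the `ProvenanceTracker` implementation.

```python
class MiniProvenance:
    """Toy provenance tracker: every entity records where it came
    from. Names and signatures are illustrative, not Semantica's."""

    def __init__(self):
        self.records = {}

    def track_entity(self, entity_id, source_url, metadata=None):
        self.records[entity_id] = {
            "source_url": source_url,
            "metadata": metadata or {},
        }

    def lineage(self, entity_id):
        """Answer 'where did this fact come from?' for audits."""
        rec = self.records.get(entity_id)
        return rec["source_url"] if rec else None

prov = MiniProvenance()
prov.track_entity("acme", "https://example.com/filing.pdf",
                  {"ingested_by": "WebIngestor"})
print(prov.lineage("acme"))
```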
Vector Store
- Backends — FAISS, Pinecone, Weaviate, Qdrant, Milvus, PgVector, in-memory
- Semantic search — top-k retrieval by embedding similarity
- Hybrid search — vector + keyword with configurable weights
- Filtered search — metadata-based filtering on any field
- Custom similarity weights — tune retrieval per use case
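Under the hood, semantic search is top-k retrieval by embedding similarity. The brute-force scan below shows the idea with cosine similarity; the listed backends replace this scan with ANN indexes such as HNSW or IVFFlat. All names are invented for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.hypot(*a) * math.hypot(*b))

def top_k(query, store, k=2):
    """Brute-force top-k semantic search over an in-memory store of
    (id, vector) pairs. Real backends use ANN indexes instead."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in store]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

store = [("a", [1.0, 0.0]), ("b", [0.7, 0.7]), ("c", [0.0, 1.0])]
print(top_k([1.0, 0.1], store))  # 'a' points in the closest direction
```

Hybrid search extends this by mixing the cosine score with a keyword score under configurable weights.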
🌐 Graph Database Support
- AWS Neptune — Amazon Neptune graph database with IAM authentication
- Apache AGE — PostgreSQL graph extension with openCypher via SQL
- FalkorDB — native support; `DecisionQuery` and `CausalChainAnalyzer` work directly with FalkorDB row/header shapes
Data Ingestion
- File formats — PDF, DOCX, HTML, JSON, CSV, Excel, PPTX, archives
- Web crawl — `WebIngestor` with configurable depth
- Databases — `DBIngestor` with SQL query support
- Snowflake — `SnowflakeIngestor` with table/query ingestion, pagination, and key-pair/OAuth auth
- Docling — advanced document parsing with table and layout extraction (PDF, DOCX, PPTX, XLSX)
- Media — image OCR, audio/video metadata extraction
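As a minimal picture of tabular ingestion, the stdlib function below turns CSV rows into (subject, predicate, object) triples, one per non-subject column. It is far simpler than the ingestors listed above and shares no code with them.

```python
import csv
import io

def ingest_csv(text, subject_col):
    """Turn CSV rows into (subject, predicate, object) triples,
    one per non-subject column. A stdlib-only sketch."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row[subject_col]
        for col, value in row.items():
            if col != subject_col and value:
                triples.append((subject, col, value))
    return triples

data = "company,city,founded\nacme,Berlin,2009\n"
print(ingest_csv(data, "company"))
```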
Export Formats
- RDF — Turtle (`.ttl`), JSON-LD, N-Triples (`.nt`), XML via `RDFExporter`
- Parquet — `ParquetExporter` for entities, relationships, and full KG export
- ArangoDB AQL export
