🧠 Semantica
A Framework for Building Context Graphs and Decision Intelligence Layers for AI
⭐ Give us a Star • 🍴 Fork us • 💬 Join our Discord • 🐦 Follow on X
Transform Chaos into Intelligence. Build AI systems with context graphs, decision tracking, and advanced knowledge engineering that are explainable, traceable, and trustworthy — not black boxes.
The Problem
AI agents today are capable but not trustworthy:
- No memory structure — agents store embeddings, not meaning. Retrieval is fuzzy; there's no way to ask why something was recalled.
- No decision trail — agents make decisions continuously but record nothing. When something goes wrong, there's no history to debug or audit.
- No provenance — outputs cannot be traced back to source facts. In regulated industries, this is a compliance blocker.
- No reasoning transparency — black-box answers with no explanation of how a conclusion was reached.
- No conflict detection — contradictory facts silently coexist in vector stores, producing unpredictable answers.
These aren't edge cases. They are the reason AI cannot be deployed in healthcare, finance, legal, and government without custom guardrails built from scratch.
The Solution
Semantica is the context and intelligence layer you add to your AI stack:
- Context Graphs — structured graph of entities, relationships, and decisions your agent builds as it works. Queryable, traceable, persistent.
- Decision Intelligence — every decision is a first-class object: recorded, linked causally, searchable by precedent, and analyzable for downstream impact.
- Provenance — every fact links to its source. W3C PROV-O compliant. Full lineage from ingestion to inference.
- Reasoning engines — forward chaining, Rete networks, deductive, abductive, and SPARQL reasoning. Explainable inference paths, not black-box answers.
- Deduplication & QA — conflict detection, entity resolution, and validation built into the pipeline.
Works alongside LangChain, LlamaIndex, AutoGen, CrewAI, and any LLM provider — Semantica is not a replacement, it's the accountability layer on top.
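Concretely, the difference from a plain vector store can be sketched in a few lines. The class below is a toy illustration of the context-graph idea (every fact carries a provenance pointer, so the graph can answer "why"), not Semantica's actual API; all names in it are invented for the example.

```python
from collections import defaultdict

class MiniContextGraph:
    """Toy context graph: typed nodes, labeled edges, and a
    provenance pointer per fact. Illustrative only."""

    def __init__(self):
        self.nodes = {}                 # id -> attributes
        self.edges = defaultdict(list)  # src -> [(relation, dst, source)]

    def add_entity(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_fact(self, src, relation, dst, source):
        # Every edge records the source it came from, so any answer
        # derived from the graph can be traced back to a document.
        self.edges[src].append((relation, dst, source))

    def why(self, src, relation):
        """Return (dst, source) pairs explaining a relation."""
        return [(d, s) for r, d, s in self.edges[src] if r == relation]

g = MiniContextGraph()
g.add_entity("acme", type="Company")
g.add_fact("acme", "headquartered_in", "Berlin", source="filing_2024.pdf")
print(g.why("acme", "headquartered_in"))
```

Unlike an embedding lookup, the retrieval here is exact and self-explaining: the answer arrives together with the document that justifies it.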
⚡ Quick Installation
`pip install semantica`
What's New in v0.3.0
First stable release — classified Production/Stable on PyPI. Ships across three stages: 0.3.0-alpha, 0.3.0-beta, and 0.3.0 stable.
| Area | Highlights |
|------|-----------|
| Context Graphs | Temporal validity windows (valid_from/valid_until), weighted BFS (min_weight), cross-graph navigation (link_graph, navigate_to, resolve_links) with full save/load persistence |
| Decision Intelligence | Complete lifecycle: record_decision → trace_decision_chain → analyze_decision_impact → find_similar_decisions; hybrid precedent search; PolicyEngine with versioned rules |
| KG Algorithms | PageRank, betweenness, community detection (Louvain), Node2Vec embeddings, link prediction, path finding — all returning structured dicts |
| Semantic Extraction | LLM relation extraction fixed (no silent drops); _match_pattern rewritten; duplicate relation bug removed; "llm_typed" metadata corrected |
| Deduplication v2 | blocking_v2/hybrid_v2 candidate generation (63.6% faster); two-stage prefilter (18–25% faster); semantic dedup v2 (6.98x faster) |
| Delta Processing | SPARQL-based incremental diff; delta_mode pipelines; snapshot versioning with prune_versions() |
| Export | RDF format aliases ("ttl", "json-ld", etc.); ArangoDB AQL export; Apache Parquet export (Spark/BigQuery/Databricks ready) |
| Pipeline | FailureHandler with LINEAR/EXPONENTIAL/FIXED backoff; PipelineValidator returning ValidationResult; retry loop fixed |
| Graph Backends | Apache AGE (SQL injection fixed), AWS Neptune, FalkorDB, PgVector (HNSW/IVFFlat indexing) |
| Tests | 886+ passing, 0 failures — 335 context, ~430 KG, 70 semantic extraction, 85 real-world E2E |
See RELEASE_NOTES.md for the full per-contributor breakdown and CHANGELOG for the complete diff.
Unreleased / Coming Next
| Area | Highlights |
|------|-----------|
| SHACL Constraints | OntologyEngine.to_shacl() auto-derives SHACL shapes from any OWL ontology; validate_graph() returns structured SHACLValidationReport with plain-English violation explanations; three quality tiers ("basic", "standard", "strict"); three output formats (Turtle, JSON-LD, N-Triples); 3-level inheritance propagation |
Features
Context & Decision Intelligence
- Context Graphs — structured graph of entities, relationships, and decisions; queryable, causal, persistent
- Decision tracking — record, link, and analyze every agent decision with `add_decision()` and `record_decision()`
- Causal chains — link decisions with `add_causal_relationship()`, trace lineage with `trace_decision_chain()`
- Precedent search — hybrid similarity search over past decisions with `find_similar_decisions()`
- Influence analysis — `analyze_decision_impact()` and `analyze_decision_influence()` — understand downstream effects
- Policy engine — enforce business rules with `check_decision_rules()`; automated compliance validation
- Agent memory — `AgentMemory` with short/long-term storage, conversation history, and statistics
- Cross-system context capture — `capture_cross_system_inputs()` for multi-agent pipelines
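The decision lifecycle above can be sketched in miniature. The class below mirrors the README's method names (`record_decision`, `trace_decision_chain`) but its signatures and storage are invented for illustration; consult the library docs for the real API.

```python
class MiniDecisionLog:
    """Toy decision tracker with causal links. Method names echo the
    feature list; signatures here are illustrative, not Semantica's."""

    def __init__(self):
        self.decisions = {}  # id -> {"description": ..., "caused_by": ...}

    def record_decision(self, decision_id, description, caused_by=None):
        self.decisions[decision_id] = {
            "description": description,
            "caused_by": caused_by,  # id of the upstream decision, if any
        }

    def trace_decision_chain(self, decision_id):
        """Walk caused_by links back to the root decision."""
        chain = []
        current = decision_id
        while current is not None:
            chain.append(current)
            current = self.decisions[current]["caused_by"]
        return list(reversed(chain))  # root first

log = MiniDecisionLog()
log.record_decision("d1", "use cached embeddings")
log.record_decision("d2", "skip re-ingestion", caused_by="d1")
log.record_decision("d3", "answer from cache", caused_by="d2")
print(log.trace_decision_chain("d3"))  # ['d1', 'd2', 'd3']
```

Because each decision is a first-class record rather than a log line, the chain behind any output can be replayed after the fact.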
Knowledge Graphs
- Knowledge graph construction — entities, relationships, properties, typed edges
- Graph algorithms — PageRank, betweenness centrality, clustering coefficient, community detection
- Node embeddings — Node2Vec embeddings via `NodeEmbedder`
- Similarity — cosine similarity via `SimilarityCalculator`
- Link prediction — score potential new edges via `LinkPredictor`
- Temporal graphs — time-aware nodes and edges
- Incremental / delta processing — update graphs without full recompute
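As an illustration of the kind of algorithm this module provides, here is a stdlib-only PageRank by power iteration over an adjacency dict. It shows the technique, not Semantica's implementation or return format.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Minimal PageRank by power iteration over {node: [out-neighbors]}.
    A sketch of the algorithm, not the library's API."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for src, targets in edges.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:
                # Dangling node: distribute its rank uniformly.
                for v in nodes:
                    new[v] += damping * rank[src] / n
        rank = new
    return rank

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
# "c" is linked from both "a" and "b", so it ends up ranked highest.
```

In a context graph, centrality scores like these surface the entities an agent's decisions keep revolving around.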
Semantic Extraction
- Entity extraction — named entity recognition, normalization, classification
- Relation extraction — triplet generation from raw text using LLMs or rule-based methods
- LLM-typed extraction — extraction with typed relation metadata
- Deduplication v1 — Jaro-Winkler similarity, basic blocking
- Deduplication v2 — `blocking_v2`, `hybrid_v2`, `semantic_v2` strategies with `max_candidates_per_entity`
- Triplet deduplication — `dedup_triplets()` for removing duplicate (subject, predicate, object) triples
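Triplet deduplication is the simplest of these to illustrate: drop repeated (subject, predicate, object) triples while keeping first-seen order. The function below is a stdlib sketch; the real `dedup_triplets()` may differ in signature and options.

```python
def dedup_triplets(triplets):
    """Remove duplicate (subject, predicate, object) triples while
    preserving first-seen order. Illustrative sketch only."""
    seen = set()
    unique = []
    for triple in triplets:
        key = tuple(triple)
        if key not in seen:
            seen.add(key)
            unique.append(triple)
    return unique

triples = [
    ("acme", "headquartered_in", "Berlin"),
    ("acme", "founded_in", "2009"),
    ("acme", "headquartered_in", "Berlin"),  # duplicate
]
print(dedup_triplets(triples))
```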
Reasoning Engines
- Forward chaining — `Reasoner` with IF/THEN string rules and dict facts
- Rete network — `ReteEngine` for high-throughput production rule matching
- Deductive reasoning — `DeductiveReasoner` for classical inference
- Abductive reasoning — `AbductiveReasoner` for hypothesis generation from observations
- SPARQL reasoning — `SPARQLReasoner` for query-based inference over RDF graphs
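Forward chaining over IF/THEN string rules and dict facts can be shown in a few lines. The loop below is a toy that also records its inference path; Semantica's `Reasoner` supports a richer rule syntax, so treat every name here as illustrative.

```python
def forward_chain(facts, rules):
    """Tiny forward-chaining loop: rules are 'IF x THEN y' strings,
    facts a dict of proposition -> bool. Illustrative only."""
    facts = dict(facts)
    trace = []  # explainable inference path, one entry per fired rule
    changed = True
    while changed:
        changed = False
        for rule in rules:
            cond, _, concl = rule[3:].partition(" THEN ")  # strip 'IF '
            if facts.get(cond) and not facts.get(concl):
                facts[concl] = True
                trace.append(f"{cond} -> {concl} (by '{rule}')")
                changed = True
    return facts, trace

facts, trace = forward_chain(
    {"is_mammal": True},
    ["IF is_mammal THEN is_animal", "IF is_animal THEN is_living"],
)
print(facts["is_living"], trace)
```

The `trace` list is the point: every derived fact is accompanied by the rule that produced it, which is what "explainable inference paths, not black-box answers" means in practice.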
Provenance & Auditability
- Entity provenance — `ProvenanceTracker.track_entity(id, source_url, metadata)`
- Algorithm provenance — `AlgorithmTrackerWithProvenance` tracks computation lineage
- Graph builder provenance — `GraphBuilderWithProvenance` records entity source lineage from URLs
- W3C PROV-O compliant — lineage tracking across all modules
- Change management — version control with checksums, audit trails, compliance support
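The core of entity provenance is a single mapping from entity to source. The sketch below imitates the `track_entity(id, source_url, metadata)` shape from the list above, but it is a toy in the spirit of PROV-O's `wasDerivedFrom`, not the `ProvenanceTracker` implementation.

```python
class MiniProvenance:
    """Toy provenance tracker: every entity records where it came
    from. Names and signatures are illustrative, not Semantica's."""

    def __init__(self):
        self.records = {}

    def track_entity(self, entity_id, source_url, metadata=None):
        self.records[entity_id] = {
            "source_url": source_url,
            "metadata": metadata or {},
        }

    def lineage(self, entity_id):
        """Answer 'where did this fact come from?' for audits."""
        rec = self.records.get(entity_id)
        return rec["source_url"] if rec else None

prov = MiniProvenance()
prov.track_entity("acme", "https://example.com/filing.pdf",
                  {"ingested_by": "WebIngestor"})
print(prov.lineage("acme"))
```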
Vector Store
- Backends — FAISS, Pinecone, Weaviate, Qdrant, Milvus, PgVector, in-memory
- Semantic search — top-k retrieval by embedding similarity
- Hybrid search — vector + keyword with configurable weights
- Filtered search — metadata-based filtering on any field
- Custom similarity weights — tune retrieval per use case
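Under the hood, semantic search is top-k retrieval by embedding similarity. The brute-force scan below shows the idea with cosine similarity; the listed backends replace this scan with ANN indexes such as HNSW or IVFFlat. All names are invented for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.hypot(*a) * math.hypot(*b))

def top_k(query, store, k=2):
    """Brute-force top-k semantic search over an in-memory store of
    (id, vector) pairs. Real backends use ANN indexes instead."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in store]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

store = [("a", [1.0, 0.0]), ("b", [0.7, 0.7]), ("c", [0.0, 1.0])]
print(top_k([1.0, 0.1], store))  # 'a' points in the closest direction
```

Hybrid search extends this by mixing the cosine score with a keyword score under configurable weights.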
🌐 Graph Database Support
- AWS Neptune — Amazon Neptune graph database with IAM authentication
- Apache AGE — PostgreSQL graph extension with openCypher via SQL
- FalkorDB — native support; `DecisionQuery` and `CausalChainAnalyzer` work directly with FalkorDB row/header shapes
Data Ingestion
- File formats — PDF, DOCX, HTML, JSON, CSV, Excel, PPTX, archives
- Web crawl — `WebIngestor` with configurable depth
- Databases — `DBIngestor` with SQL query support
- Snowflake — `SnowflakeIngestor` with table/query ingestion, pagination, and key-pair/OAuth auth
- Docling — advanced document parsing with table and layout extraction (PDF, DOCX, PPTX, XLSX)
- Media — image OCR, audio/video metadata extraction
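As a minimal picture of tabular ingestion, the stdlib function below turns CSV rows into (subject, predicate, object) triples, one per non-subject column. It is far simpler than the ingestors listed above and shares no code with them.

```python
import csv
import io

def ingest_csv(text, subject_col):
    """Turn CSV rows into (subject, predicate, object) triples,
    one per non-subject column. A stdlib-only sketch."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row[subject_col]
        for col, value in row.items():
            if col != subject_col and value:
                triples.append((subject, col, value))
    return triples

data = "company,city,founded\nacme,Berlin,2009\n"
print(ingest_csv(data, "company"))
```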
Export Formats
- RDF — Turtle (`.ttl`), JSON-LD, N-Triples (`.nt`), XML via `RDFExporter`
- Parquet — `ParquetExporter` for entities, relationships, and full KG export
- ArangoDB AQL export
