Cosdata
Cosdata: A cutting-edge AI data platform for next-gen search pipelines. Features semantic search, hybrid capabilities, real-time scalability, and ML integration. Designed for immutability and version control to enhance AI projects.
📦 Table of Contents
- Overview
- Why Cosdata?
- Benchmarks
- Features
- Getting Started
- Client SDKs
- Documentation
- Contributing
- Contacts & Community
- Show Your Support
🚀 Overview
Cosdata is a next-generation retrieval infrastructure engineered for AI-native applications that demand relevance beyond simple vector similarity.
The Challenge
Traditional vector databases optimize for cosine similarity rather than what users actually find useful. Decades of search evolution prove that effective retrieval requires sophisticated ranking systems that understand context, incorporate multiple signals, and optimize for user satisfaction—not just mathematical proximity.
Our Solution
Built with immutability and version control at its core, Cosdata delivers a relevance-first architecture combining:
- Multi-Modal Retrieval: Seamlessly integrate BM25 full-text search, HNSW dense vectors, SPLADE learned sparse embeddings, and metadata-rich sparse vectors in a unified platform
- Context-Aware Capabilities: Leverage geofencing, hierarchical document organization, and explainable ranking that understands user intent and real-world complexity
- Enterprise-Grade Architecture: Benefit from colocated storage, streaming ingestion, transactional versioning, and comprehensive security features
Proven Impact
Organizations using Cosdata achieve 60-120% reduction in compute requirements while improving retrieval quality by 20-50% (NDCG@10). Our unified architecture eliminates external document stores and complex multi-database queries, reducing infrastructure costs and latency.
💡 Why Cosdata?
The Cosine Similarity Problem
Most vector databases treat retrieval as a pure similarity problem—if two embeddings are mathematically close in vector space, they must be relevant to each other. This assumption is fundamentally flawed.
High cosine similarity ≠ High relevance to users.
Cosine similarity measures the angle between embedding vectors—a mathematical distance determined by how a model was trained. But this metric has no inherent connection to what users actually find useful or relevant. Two documents can be mathematically similar while being practically useless for a user's information need, or vice versa.
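To make the point concrete, here is a minimal sketch of the metric itself, using only the Python standard library. The vectors are made-up toy values, not real embeddings, and the function is a textbook definition rather than anything taken from Cosdata's codebase.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy, made-up vectors (assumption: purely illustrative).
query = [1.0, 0.0]
doc_same_direction = [3.0, 0.0]   # same direction, different magnitude
doc_orthogonal = [0.0, 2.0]       # unrelated direction

same = cosine_similarity(query, doc_same_direction)
orthogonal = cosine_similarity(query, doc_orthogonal)
```

The score depends only on the direction of the two vectors. Nothing in the computation looks at freshness, authority, location, or whether the document answers the user's question, which is why it cannot stand in for relevance on its own.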
The Relevance-First Approach
True relevance requires understanding context, not just proximity.
Decades of search engine evolution—from Google's PageRank to modern recommendation systems—prove that effective retrieval demands:
- Multiple signals: Lexical matching, semantic understanding, metadata, recency, authority, and user context
- Ground truth from users: Real relevance comes from actual user behavior and expert judgments, not embedding distances
- Explainable ranking: Systems must show why results matter, not just that they're "similar"
- Business logic integration: Geographic constraints, temporal filters, hierarchical relationships, and domain-specific rules
How Cosdata Delivers Relevance
Cosdata is built from the ground up to optimize for user satisfaction, not mathematical convenience:
- Hybrid Multi-Modal Search: Combines BM25 lexical matching, dense vectors (HNSW), SPLADE learned sparse embeddings, and metadata-rich representations, letting each signal contribute what it does best
- Context-Aware Ranking: Native support for geofencing, hierarchical document structures, temporal filtering, and custom business logic without requiring everything to be embedded
- Explainable Results: Every result comes with transparent scoring showing semantic similarity contributions, metadata matches, geographic relevance, and hierarchical context
- Proven Quality Metrics: We measure success using NDCG (Normalized Discounted Cumulative Gain) and recall against human-judged relevance datasets like BEIR, not just precision against our own similarity rankings
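One common way to combine several ranked lists into a single result is reciprocal rank fusion (RRF). The sketch below is a generic illustration of that technique, not a description of how Cosdata fuses signals internally. The function name, the document IDs, and the constant `k=60` (a widely used default for RRF) are all assumptions made for this example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; each list is ordered best-first.

    Every document receives 1 / (k + rank) from each list it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical per-signal results (made-up IDs).
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Documents that rank well under both signals (`doc_b` and `doc_a` here) rise above documents that only one signal found, without requiring the raw scores of the two retrievers to be on the same scale.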
Real-World Impact
Organizations using Cosdata see:
- 20-50% improvement in retrieval quality (NDCG@10) compared to pure vector similarity approaches
- 60-120% reduction in compute requirements through efficient multi-modal indexing
- Sub-100ms response times while maintaining relevance quality
- Simplified architecture with colocated storage eliminating external document stores
Bottom line: Cosdata treats retrieval as a relevance problem, not a storage problem. We've learned from decades of search evolution to build infrastructure that understands what users actually need.
📊 Benchmarks
Cosdata delivers exceptional performance across all retrieval modalities. Our benchmarks use industry-standard datasets and compare against leading solutions to demonstrate real-world performance gains.
🔍 Full-Text Search (BM25)
Our custom BM25 implementation outperforms Elasticsearch with dramatically higher throughput and lower latency while maintaining comparable ranking quality.
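For readers unfamiliar with BM25, the following is a minimal sketch of the textbook Okapi BM25 scoring function with the smoothed IDF variant popularized by Lucene. It is not Cosdata's implementation. The parameter defaults (`k1=1.2`, `b=0.75`), the pre-tokenized toy corpus, and the function name are assumptions for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Rare terms get a higher weight.
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates, and long documents are penalized via b.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores

# Made-up, pre-tokenized corpus.
corpus = [
    ["vector", "search", "engine"],
    ["full", "text", "search"],
    ["cooking", "recipes"],
]
scores = bm25_scores(["vector", "search"], corpus)
```

The first document matches both query terms and scores highest, the second matches only the more common term, and the third matches nothing and scores zero.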
Performance Highlights
- Up to 151× higher QPS than Elasticsearch (SciFact dataset)
- Average 44× QPS improvement across multiple IR benchmark datasets
- Up to 12× faster indexing on large-scale datasets
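- Lower p50 latency on every tested dataset, and lower p95 latency on all but the three largest corpora (climate-fever, fever, and msmarco), where Elasticsearch has the lower p95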
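As a reading aid for these figures, the sketch below shows how a latency percentile and a throughput ratio can be computed. The latency samples are made up and the nearest-rank percentile method is an assumption, since the benchmark harness is not described here. The two QPS values are the SciFact numbers from the comparison table.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Made-up per-query latencies in milliseconds (assumption: illustrative only).
latencies_ms = [7, 8, 7, 9, 12, 7, 8, 30, 7, 8]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)

# Throughput ratio on SciFact, using the QPS values from the comparison table.
scifact_speedup = 40909 / 271
```

A single slow query barely moves p50 but dominates p95, which is why the two columns can tell different stories for the same system.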
Detailed Comparison: Cosdata vs. Elasticsearch
| Dataset | Corpus Size | System | Indexing (sec) | QPS | NDCG@10 | p50 (ms) | p95 (ms) |
|---------|-------------|--------|----------------|-----|---------|----------|----------|
| arguana | 8.7K | Cosdata | 0.1 | 2,167 | 0.40 | 9 | 15 |
| | | Elasticsearch | 1.4 | 263 | 0.48 | 44 | 74 |
| climate-fever | 5.4M | Cosdata | 40.6 | 135 | 0.13 | 106 | 379 |
| | | Elasticsearch | 522.8 | 84 | 0.14 | 162 | 263 |
| fever | 5.4M | Cosdata | 40.3 | 314 | 0.47 | 52 | 157 |
| | | Elasticsearch | 525.7 | 154 | 0.52 | 80 | 138 |
| fiqa | 57K | Cosdata | 0.5 | 4,942 | 0.25 | 7 | 12 |
| | | Elasticsearch | 6.7 | 251 | 0.25 | 39 | 60 |
| msmarco | 8.8M | Cosdata | 57.7 | 315 | 0.23 | 46 | 162 |
| | | Elasticsearch | 714.7 | 166 | 0.23 | 73 | 129 |
| nq | 2.6M | Cosdata | 19.3 | 483 | 0.29 | 30 | 81 |
| | | Elasticsearch | 243.2 | 197 | 0.29 | 59 | 100 |
| quora | 522K | Cosdata | 2.7 | 1,425 | 0.81 | 11 | 36 |
| | | Elasticsearch | 30.2 | 323 | 0.81 | 39 | 55 |
| scidocs | 25K | Cosdata | 0.3 | 13,338 | 0.16 | 7 | 12 |
| | | Elasticsearch | 3.6 | 319 | 0.15 | 33 | 48 |
| scifact | 5.2K | Cosdata | 0.1 | 40,909 | 0.69 | 7 | 13 |
| | | Elasticsearch | 1.0 | 271 | 0.68 | 34 | 51 |
| trec-covid | 171K | Cosdata | 1.7 | 2,219 | 0.61 | 10 | 18 |
| | | Elasticsearch | 22.1 | 110 | 0.62 | 57 | 88 |
| webis-touche2020 | 382K | Cosdata | 5.5 | 2,789 | 0.34 | 10 | 18 |
| | | Elasticsearch | 63.1 | 108 | 0.34 | 62 | 99 |
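Key Takeaway: Cosdata delivers dramatically higher throughput and lower median latency while keeping ranking quality (NDCG@10) within 0.01 of Elasticsearch on 9 of the 11 datasets. Elasticsearch scores higher on arguana (0.48 vs. 0.40) and fever (0.52 vs. 0.47).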
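For reference, NDCG@10 can be computed as in the sketch below. It uses the linear-gain form of DCG (some evaluations use an exponential gain instead) and builds the ideal ordering from the same list of judgments, which assumes that list contains every judged document for the query. The relevance grades are made up, and this is not the evaluation code behind the table.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevances (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the ranking divided by the DCG of the best possible ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0.0:
        return 0.0  # no relevant documents were judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg

# Made-up relevance grades in the order a system returned the documents.
perfect_order = ndcg_at_k([3, 2, 1, 0])
worst_order = ndcg_at_k([0, 1, 2, 3])
```

A ranking that places the most relevant documents first scores 1.0, and the score drops as relevant documents slide down the list, so a difference of 0.01 between two systems reflects only a small change in ordering quality.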
🎯 Dense Vector Search (HNSW)
Our HNSW implementation achieves industry-leading performance on large-scale vector datasets with high-dimensional embeddings.
Performance Highlights
- 1,758 QPS on 1 million records (1536 dimensions)
- ~42% faster than Qdrant
- ~54% faster than Weaviate
- ~146% faster than E
