Bhumi

⚡ Bhumi – The fastest AI inference client for Python, built with Rust for unmatched speed, efficiency, and scalability 🚀

Install / Use

/learn @justrach/Bhumi

<p align="center"> <img src="/assets/bhumi_logo.png" alt="Bhumi Logo" width="1600"/> </p>


๐ŸŒ BHUMI v0.4.82 - The Fastest AI Inference Client โšก

Introduction

Bhumi is the fastest AI inference client, built with Rust for Python. It is designed to maximize performance, efficiency, and scalability, making it the best choice for LLM API interactions.

Why Bhumi?

  • 🚀 Fastest AI inference client – Outperforms alternatives with 2-3x higher throughput
  • ⚡ Built with Rust for Python – Achieves high efficiency with low overhead
  • 🌐 Supports 9+ AI providers – OpenAI, Anthropic, Google Gemini, Groq, Cerebras, SambaNova, Mistral, Cohere, and more
  • 👁️ Vision capabilities – Image analysis across 5 providers (OpenAI, Anthropic, Gemini, Mistral, Cerebras)
  • 🔄 Streaming and async capabilities – Real-time responses with Rust-powered concurrency
  • 🔁 Automatic connection pooling and retries – Ensures reliability and efficiency
  • 💡 Minimal memory footprint – Uses up to 60% less memory than other clients
  • 🏗 Production-ready – Optimized for high-throughput applications with OpenAI Responses API support

Bhumi (भूमि) is Sanskrit for Earth, symbolizing stability, grounding, and speed, just like our inference engine, which ensures rapid and stable performance. 🚀

🆕 What's New in v0.4.82

✨ Major New Features

  • 🔷 Cohere Provider Support: Added Cohere AI with OpenAI-compatible /v1/chat/completions endpoint
  • 📡 Free-Threaded Python 3.13+ Support: True parallel execution without GIL for maximum performance
  • 🗑️ Removed orjson Dependency: Simplified dependencies using stdlib JSON for better compatibility
  • ⬆️ PyO3 0.26 Upgrade: Updated to latest PyO3 with modern Bound API and better performance
  • 🔧 Tokio 1.47: Latest async runtime for improved concurrency
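Whether you actually get GIL-free execution depends on the interpreter build, not just the version number. A stdlib-only check, independent of Bhumi (`sys._is_gil_enabled()` is a CPython 3.13+ API; the fallback covers older interpreters):

```python
import sys

def gil_status() -> str:
    """Report whether this interpreter can run Python threads truly in parallel."""
    # sys._is_gil_enabled() exists only on CPython 3.13+;
    # older interpreters always run with the GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    if check is None:
        return "GIL active (pre-3.13 build)"
    return "GIL active" if check() else "GIL disabled (free-threaded)"

print(gil_status())
```

On a standard (non-free-threaded) 3.13 build this still reports the GIL as active; only `python3.13t`-style builds can disable it.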

🛠 Technical Improvements

  • Enhanced OCR Integration: client.ocr() and client.upload_file() methods
  • Unified API: Single method handles both file upload and OCR processing
  • Better Error Handling: Improved timeout and validation for OCR operations
  • Production Ready: Optimized for high-volume document processing workflows

📊 OCR Capabilities

  • Document Types: PDF, JPEG, PNG, and more formats
  • Text Extraction: High-accuracy OCR with layout preservation
  • Structured Data: Extract tables, forms, and key-value pairs
  • Bounding Boxes: Precise text positioning and element detection
  • Multi-format Output: Markdown text + structured JSON data
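As a rough illustration of the "markdown + structured JSON" output style described above, an OCR result for a one-page invoice might carry both renderings side by side. This shape is hypothetical; the README does not document Bhumi's actual response schema:

```python
# Hypothetical OCR result shape (NOT Bhumi's documented schema): markdown for
# humans, structured blocks with bounding boxes for machines.
ocr_result = {
    "markdown": "| Item | Price |\n|------|-------|\n| Widget | $9.99 |",
    "pages": [
        {
            "page": 1,
            "blocks": [
                {
                    "type": "table",
                    "bbox": [72, 144, 540, 220],  # x0, y0, x1, y1
                    "cells": [["Item", "Price"], ["Widget", "$9.99"]],
                },
                {"type": "key_value", "bbox": [72, 240, 300, 260],
                 "key": "Invoice #", "value": "0042"},
            ],
        }
    ],
}

# Pull every table out of the structured half of the result.
tables = [b for p in ocr_result["pages"] for b in p["blocks"] if b["type"] == "table"]
print(len(tables))  # 1
```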

🆕 What's New in v0.4.8

✨ Major New Features

  • 🌐 8+ AI Providers: Added Mistral AI support with vision capabilities (Pixtral models)
  • 👁️ Vision Support: Image analysis across 5 providers (OpenAI, Anthropic, Gemini, Mistral, Cerebras)
  • 📡 OpenAI Responses API: Intelligent routing for new API patterns with better performance
  • 🔧 Satya v0.3.7: Upgraded with nested model support and enhanced validation
  • 🚀 Production Ready: Improved wheel building, Docker compatibility, and CI/CD

🛠 Technical Improvements

  • Cross-platform Wheels: Enhanced building for Linux, macOS (Intel + Apple Silicon), Windows
  • OpenSSL Integration: Proper SSL library linking for all platforms
  • Workflow Optimization: Disabled integration tests for faster releases
  • Bug Fixes: Resolved MAP-Elites buffer issues and Satya validation problems
  • Performance Optimizations: Improved MAP-Elites archive loading with orjson + Satya validation
  • Production Ready: Enhanced error handling and timeout protection

📊 Provider Support Matrix

| Provider   | Chat | Streaming | Tools | Vision | Structured |
|------------|------|-----------|-------|--------|------------|
| OpenAI     | ✅   | ✅        | ✅    | ✅     | ✅         |
| Anthropic  | ✅   | ✅        | ✅    | ✅     | ⚠️         |
| Gemini     | ✅   | ✅        | ✅    | ✅     | ⚠️         |
| Groq       | ✅   | ✅        | ✅    | ❌     | ⚠️         |
| Cerebras   | ✅   | ✅        | ✅*   | ✅     | ⚠️         |
| SambaNova  | ✅   | ✅        | ✅    | ❌     | ⚠️         |
| OpenRouter | ✅   | ✅        | ✅    | ❌     | ⚠️         |
| Cohere     | ✅   | ✅        | ✅    | ❌     | ⚠️         |

*Cerebras tools require specific models
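For programmatic provider selection, the matrix can be captured as plain data, e.g. to find providers that can handle an image request. This table is a hand-copied illustration, not an API that Bhumi exposes (note the matrix omits Mistral, which the vision list above counts as a fifth vision provider):

```python
# (chat, streaming, tools, vision) per provider, transcribed from the matrix above.
MATRIX = {
    "openai":     (True, True, True, True),
    "anthropic":  (True, True, True, True),
    "gemini":     (True, True, True, True),
    "groq":       (True, True, True, False),
    "cerebras":   (True, True, True, True),   # * tools require specific models
    "sambanova":  (True, True, True, False),
    "openrouter": (True, True, True, False),
    "cohere":     (True, True, True, False),
}

vision_capable = sorted(p for p, (_, _, _, vision) in MATRIX.items() if vision)
print(vision_capable)  # ['anthropic', 'cerebras', 'gemini', 'openai']
```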

Installation

No Rust compiler required! 🎊 Pre-compiled wheels are available for all major platforms:

pip install bhumi

Supported Platforms:

  • ๐Ÿง Linux (x86_64)
  • ๐ŸŽ macOS (Intel & Apple Silicon)
  • ๐ŸชŸ Windows (x86_64)
  • ๐Ÿ Python 3.8, 3.9, 3.10, 3.11, 3.12

Latest v0.4.8 release includes improved wheel building and cross-platform compatibility!

Quick Start

OpenAI Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("OPENAI_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="openai/gpt-4o",
        debug=True
    )
    
    client = BaseLLMClient(config)
    
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

⚡ Performance Optimizations

Bhumi includes cutting-edge performance optimizations that make it 2-3x faster than alternatives:

🧠 MAP-Elites Buffer Strategy (v0.4.8 Enhanced)

  • Ultra-fast archive loading with Satya v0.3.7 validation and stdlib JSON parsing
  • Trained buffer configurations optimized through evolutionary algorithms
  • Automatic buffer adjustment based on response patterns and historical data
  • Type-safe validation with comprehensive error checking
  • Secure loading without unsafe eval() operations
  • Nested model support for complex data structures
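The "automatic buffer adjustment" idea can be sketched in a few lines. This toy version just grows toward the largest recent response; the real trained MAP-Elites policy is far more involved and is not reproduced here:

```python
from collections import deque

class AdaptiveBuffer:
    """Toy response-driven buffer sizing (illustrative only, not Bhumi's policy)."""

    def __init__(self, initial=8192, lo=1024, hi=1 << 20, history=32):
        self.size = initial
        self.lo, self.hi = lo, hi
        self.recent = deque(maxlen=history)  # sliding window of response sizes

    def observe(self, response_bytes: int) -> int:
        self.recent.append(response_bytes)
        # Aim for ~2x the largest recent response to avoid mid-read reallocation,
        # clamped to sane bounds.
        self.size = max(self.lo, min(self.hi, 2 * max(self.recent)))
        return self.size

buf = AdaptiveBuffer()
buf.observe(500)             # small responses clamp to the 1024-byte floor
print(buf.observe(300_000))  # a large response grows the buffer: 600000
```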

📊 Performance Status Check

Check if you have optimal performance with the built-in diagnostics:

from bhumi.utils import print_performance_status

# Check optimization status
print_performance_status()
# 🚀 Bhumi Performance Status
# ✅ Optimized MAP-Elites archive loaded
# ⚡ Optimization Details:
#    • Entries: 15,644 total, 15,644 optimized
#    • Coverage: 100.0% of search space
#    • Loading: Satya validation + stdlib JSON parsing

๐Ÿ† Archive Distribution (v0.4.8 Enhanced)

When you install Bhumi, you automatically get:

  • Pre-trained MAP-Elites archive for optimal buffer sizing
  • Fast stdlib JSON parsing
  • Satya v0.3.7-powered type validation for bulletproof data loading
  • Performance metrics and diagnostics
  • Nested model support for complex configurations

Gemini Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="gemini/gemini-2.0-flash",
        debug=True
    )
    
    client = BaseLLMClient(config)
    
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

Cerebras Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("CEREBRAS_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="cerebras/llama3.1-8b",  # gateway-style model parsing is supported
        debug=True,
    )

    client = BaseLLMClient(config)

    response = await client.completion([
        {"role": "user", "content": "Summarize the benefits of Bhumi in one sentence."}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

Mistral AI Example (with Vision)

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("MISTRAL_API_KEY")

async def main():
    # Text-only model
    config = LLMConfig(
        api_key=api_key,
        model="mistral/mistral-small-latest",
        debug=True
    )
    
    client = BaseLLMClient(config)
    response = await client.completion([
        {"role": "user", "content": "Bonjour! Parlez-moi de Paris."}  # French language support
    ])
    print(f"Mistral Response: {response['text']}")

    # Vision model for image analysis
    vision_config = LLMConfig(
        api_key=api_key,
        model="mistral/pixtral-12b-2409"  # Pixtral vision model
    )
    
    vision_client = BaseLLMClient(vision_config)
    response = await vision_client.completion([
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="}}
            ]
        }
    ])
    print(f"Vision Analysis: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())
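The vision message above inlines the image as a data URL. Building one from local image bytes needs only the stdlib (this helper is illustrative, not part of Bhumi):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL suitable for an image_url part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# Typically you'd pass open("photo.png", "rb").read(); here we encode just the
# 8-byte PNG signature to keep the example self-contained.
url = to_data_url(b"\x89PNG\r\n\x1a\n")
print(url)  # data:image/png;base64,iVBORw0KGgo=
```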

Provider API: Multi-Provider Model Format

Bhumi unifies providers using a simple provider/model format in LLMConfig.model. Base URLs are auto-set for known providers; you can override with base_url.

  • Supported providers: openai, anthropic, gemini, groq, sambanova, openrouter, cerebras, mistral, cohere
  • Foundation providers use provider/model. Gateways like Groq/OpenRouter/SambaNova may use nested paths after the provider (e.g., openrouter/meta-llama/llama-3.1-8b-instruct).
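The provider prefix is just everything before the first "/"; a stdlib-only sketch of the split (illustrative, not Bhumi's internal routing code):

```python
def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model' at the first slash, keeping any nested gateway path."""
    provider, _, rest = model.partition("/")
    return provider, rest

print(split_model("openai/gpt-4o"))
print(split_model("openrouter/meta-llama/llama-3.1-8b-instruct"))
```

Note that gateway model IDs keep their own slashes intact: only the first segment names the provider.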
import os

from bhumi.base_client import BaseLLMClient, LLMConfig

# OpenAI
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o"))

# Anthropic
client = BaseLLMClient(LLMConfig(api_key=os.getenv("ANTHROPIC_API_KEY"), model="anthropic/claude-3-5-sonnet-latest"))

# Gemini (OpenAI-compatible endpoint)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GEMINI_API_KEY"), model="gemini/gemini-2.0-flash"))