# Pixeltable

Data infrastructure providing a declarative, incremental approach for multimodal AI workloads.
The only open source Python library providing declarative data infrastructure for building multimodal AI applications, enabling incremental storage, transformation, indexing, retrieval, and orchestration of data.
Quick Start | Documentation | API Reference | Starter Kit
## Installation

```bash
pip install pixeltable
```
## Demo

https://github.com/user-attachments/assets/b50fd6df-5169-4881-9dbe-1b6e5d06cede
## Quick Start
With Pixeltable, you define your entire data processing and AI workflow declaratively using computed columns on tables. Focus on your application logic, not the data plumbing.
```bash
# Installation
pip install -qU torch transformers openai pixeltable
```

```python
# Basic setup
import pixeltable as pxt

# Table with multimodal column types (Image, Video, Audio, Document)
t = pxt.create_table('images', {'input_image': pxt.Image})

# Computed columns: define transformation logic once; it runs on all data
from pixeltable.functions import huggingface

# Object detection with automatic model management
t.add_computed_column(
    detections=huggingface.detr_for_object_detection(
        t.input_image,
        model_id='facebook/detr-resnet-50'
    )
)

# Extract specific fields from the detection results
t.add_computed_column(detections_text=t.detections.label_text)

# OpenAI Vision API integration with built-in rate limiting and async management
from pixeltable.functions import openai

t.add_computed_column(
    vision=openai.vision(
        prompt="Describe what's in this image.",
        image=t.input_image,
        model='gpt-4o-mini'
    )
)

# Insert data directly from an external URL;
# this automatically triggers computation of all computed columns
t.insert(input_image='https://raw.github.com/pixeltable/pixeltable/release/docs/resources/images/000000000025.jpg')

# Query: all data, metadata, and computed results are persistently stored;
# structured and unstructured data are returned side by side
results = t.select(
    t.input_image,
    t.detections_text,
    t.vision
).collect()
```
## What Pixeltable Does
When you run the code above, Pixeltable handles storage, orchestration, indexing, versioning, and inference — automatically. Here's the lifecycle:
| You Write | Pixeltable Does |
|-----------|-----------------|
| `pxt.Image`, `pxt.Video`, `pxt.Document` columns | Stores media, handles formats, caches from URLs |
| `add_computed_column(fn(...))` | Runs incrementally, caches results, retries failures |
| `add_embedding_index(column)` | Manages vector storage, keeps index in sync |
| `@pxt.udf` / `@pxt.query` | Creates reusable functions with dependency tracking |
| `table.insert(...)` | Triggers all dependent computations automatically |
| `t.sample(5).select(t.text, summary=udf(t.text))` | Experiment on a sample: nothing stored, calls parallelized and cached |
| `table.select(...).collect()` | Returns structured + unstructured data together |
| (nothing, it's automatic) | Versions all data and schema changes for time travel |
That single workflow replaces most of the typical AI stack:
| Instead of … | Pixeltable gives you … |
|---|---|
| PostgreSQL / MySQL | `pxt.create_table()`: schema is Python, versioned automatically |
| Pinecone / Weaviate / Qdrant | `add_embedding_index()`: one line, stays in sync |
| S3 / boto3 / blob storage | `pxt.Image` / `Video` / `Audio` / `Document` types with caching; `destination='s3://…'` for cloud routing |
| Airflow / Prefect / Celery | Computed columns trigger on insert: no orchestrator needed |
| LangChain / LlamaIndex (RAG) | `@pxt.query` + `.similarity()` + computed column chaining |
| pandas / polars (multimodal) | `.sample()`, ephemeral UDFs, then `add_computed_column()` to commit: prototype to production |
| DVC / MLflow / W&B | Built-in `history()`, `revert()`, time travel (`table:N`), snapshots |
| Custom retry / rate-limit / caching | Built into every AI integration; results cached, only new rows recomputed |
| Custom ETL / glue code | Declarative schema: Pixeltable handles execution, caching, incremental updates |
On top of these, Pixeltable ships with built-in functions for media processing (FFmpeg, Pillow, spaCy), embeddings (sentence-transformers, CLIP), and 30+ AI providers (OpenAI, Anthropic, Gemini, Ollama, and more). For anything domain-specific, wrap your own logic with @pxt.udf. You still write the application layer (FastAPI, React, Docker).
Deployment options: Pixeltable can serve as your full backend (managing media locally or syncing with S3/GCS/Azure, plus built-in vector search and orchestration) or as an orchestration layer alongside your existing infrastructure.
## Where Did My Data Go?
Pixeltable workloads generate a variety of outputs: structured (such as bounding boxes for detected objects) and unstructured (such as generated images or video). By default, everything resides in your Pixeltable user directory, ~/.pixeltable. Structured data is stored in a Postgres instance there, while generated media (images, video, audio, documents) are stored outside the database as flat files in ~/.pixeltable/media. The media files are referenced by URL in the database, and Pixeltable provides the "glue" for a unified table interface over both structured and unstructured data.
In general, the user is not expected to interact directly with the data in ~/.pixeltable; the data store is fully managed by Pixeltable and is intended to be accessed through the Pixeltable Python SDK.
See Working with External Files for details on loading data from URLs, S3, and local paths.
## Key Principles
<details>
<summary><b>Store:</b> Unified Multimodal Interface</summary>
<br>

`pxt.Image`, `pxt.Video`, `pxt.Audio`, `pxt.Document`, `pxt.Json` – manage diverse data consistently.

```python
t = pxt.create_table(
    'media',
    {
        'img': pxt.Image,
        'video': pxt.Video,
        'audio': pxt.Audio,
        'document': pxt.Document,
        'metadata': pxt.Json
    }
)
```

</details>
<details>
<summary><b>Orchestrate:</b> Declarative Computed Columns</summary>
<br>

Define processing steps once; they run automatically on new/updated data. Supports API calls (OpenAI, Anthropic, Gemini), local inference (Hugging Face, YOLOX, Whisper), vision models, and any Python logic.

```python
# LLM API call
t.add_computed_column(
    summary=openai.chat_completions(
        messages=[{"role": "user", "content": t.text}], model='gpt-4o-mini'
    )
)

# Local model inference
t.add_computed_column(
    classification=huggingface.vit_for_image_classification(t.image)
)

# Vision analysis
t.add_computed_column(
    description=openai.vision(prompt="Describe this image", image=t.image)
)
```

→ Computed Columns · AI Integrations · Sample App: Prompt Studio

</details>
<details>
<summary><b>Iterate:</b> Explode & Process Media</summary>
<br>

Create views with iterators to explode one row into many (video→frames, doc→chunks, audio→segments).

```python
from pixeltable.iterators import DocumentSplitter

# Document chunking with overlap & metadata
# (assumes a 'docs' table with a pxt.Document column)
chunks = pxt.create_view(
    'chunks',
    docs,
    iterator=DocumentSplitter.create(
        document=docs.document, separators='token_limit', limit=300
    )
)
```

</details>