Deeplake
Deeplake is an AI Data Runtime for Agents. It provides serverless Postgres with a multimodal data lake, enabling scalable retrieval and training.
Install / Use
/learn @activeloopai/Deeplake
README
<p align="center"> <img src="https://i.postimg.cc/rsjcWc3S/deeplake-logo.png" width="400"/> </p>
<h1 align="center">Deep Lake: Database for AI</h1> <p align="center"> <a href="https://pypi.org/project/deeplake/"><img src="https://badge.fury.io/py/deeplake.svg" alt="PyPI version" height="18"></a> <a href="https://pepy.tech/project/deeplake"><img src="https://static.pepy.tech/badge/deeplake" alt="PyPI version" height="18"></a> </p> <h3 align="center"> <a href="https://docs.deeplake.ai/?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme"><b>Docs</b></a> • <a href="https://docs.deeplake.ai/latest/getting-started/quickstart/?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme"><b>Get Started</b></a> • <a href="https://docs.deeplake.ai/latest/api/dataset/?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme"><b>API Reference</b></a> • <a href="http://learn.activeloop.ai"><b>LangChain & VectorDBs Course</b></a> • <a href="https://www.activeloop.ai/resources/?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme"><b>Blog</b></a> • <a href="https://www.deeplake.ai/?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme"><b>Whitepaper</b></a> • <a href="http://slack.activeloop.ai"><b>Slack</b></a> • <a href="https://twitter.com/intent/tweet?url=https%3A%2F%2Factiveloop.ai%2F&via=activeloopai&text=Deep%20Lake%20is%20the%20Database%20for%20all%20AI%20data.%20Check%20it%20out%21&hashtags=DeepLake%2Cactiveloop%2Copensource"><b>Twitter</b></a> </h3>
What is Deep Lake?
Deep Lake is a Database for AI powered by a storage format optimized for deep-learning applications. Deep Lake can be used for:
- Storing and searching data plus vectors while building LLM applications
- Managing datasets while training deep learning models
Deep Lake simplifies the deployment of enterprise-grade LLM-based products by offering storage for all data types (embeddings, audio, text, videos, images, DICOM, PDFs, annotations, and more), querying and vector search, data streaming while training models at scale, data versioning and lineage, and integrations with popular tools such as LangChain, LlamaIndex, Weights & Biases, and many more. Deep Lake works with data of any size, is serverless, and lets you store all of your data in one place in your own cloud. Deep Lake is used by Intel, Bayer Radiology, Matterport, ZERO Systems, Red Cross, Yale, and Oxford.
Deep Lake includes the following features:
<details> <summary><b>Multi-Cloud Support (S3, GCP, Azure)</b></summary> Use one API to upload, download, and stream datasets to/from S3, Azure, GCP, Activeloop cloud, local storage, or in-memory storage. Compatible with any S3-compatible storage such as MinIO. </details>
<details> <summary><b>Native Compression with Lazy NumPy-like Indexing</b></summary> Store images, audio, and videos in their native compression. Slice, index, iterate, and interact with your data like a collection of NumPy arrays in your system's memory. Deep Lake lazily loads data only when needed, e.g., when training a model or running queries. </details>
<details> <summary><b>Dataloaders for Popular Deep Learning Frameworks</b></summary> Deep Lake comes with built-in dataloaders for PyTorch and TensorFlow. Train your model with a few lines of code - we even take care of dataset shuffling. :) </details>
<details> <summary><b>Integrations with Powerful Tools</b></summary> Deep Lake has integrations with <a href="https://github.com/hwchase17/langchain">LangChain</a> and <a href="https://github.com/jerryjliu/llama_index">LlamaIndex</a> as a vector store for LLM apps, <a href="https://wandb.ai/">Weights & Biases</a> for data lineage during model training, <a href="https://github.com/open-mmlab/mmdetection">MMDetection</a> for training object detection models, and <a href="https://github.com/open-mmlab/mmsegmentation">MMSegmentation</a> for training semantic segmentation models. </details>
<details> <summary><b>100+ most-popular image, video, and audio datasets available in seconds</b></summary> The Deep Lake community has uploaded <a href="https://app.activeloop.ai/datasets/activeloop?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme">100+ image, video and audio datasets</a> like <a href="https://app.activeloop.ai/activeloop/mnist-train?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme">MNIST</a>, <a href="https://app.activeloop.ai/activeloop/coco-train?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme">COCO</a>, <a href="https://app.activeloop.ai/activeloop/imagenet-train?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme">ImageNet</a>, <a href="https://app.activeloop.ai/activeloop/cifar100-test?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme">CIFAR</a>, <a href="https://app.activeloop.ai/activeloop/gtzan-genre?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme">GTZAN</a> and others. </details>
<details> <summary><b>Instant Visualization Support in the <a href="https://app.activeloop.ai/?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme">Deep Lake App</a></b></summary> Deep Lake datasets are instantly visualized with bounding boxes, masks, annotations, etc. in <a href="https://app.activeloop.ai/?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readme">Deep Lake Visualizer</a> (see below). </details>
🚀 How to install Deep Lake
Deep Lake can be installed using pip:
pip install deeplake
To access all of Deep Lake's features, please register in the Deep Lake App.
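After running the pip command above, it can be handy to confirm that the package is actually visible to the Python environment you intend to use. The following is a minimal sketch that relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is a hypothetical name chosen for illustration and is not part of Deep Lake.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Check the distribution name used in the pip command above.
version = installed_version("deeplake")
if version is None:
    print("deeplake is not installed; run: pip install deeplake")
else:
    print(f"deeplake {version} is installed")
```

Because the check reads package metadata instead of importing the library, it works the same way whichever Deep Lake release is installed.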
🧠 Deep Lake Code Examples by Application
Vector Store Applications
Using Deep Lake as a Vector Store for building LLM applications:
- Vector Store Quickstart
- Vector Store Tutorials
- LangChain Integration
- LlamaIndex Integration
- Image Similarity Search with Deep Lake
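For readers new to vector stores, the core operation behind these guides is nearest-neighbor search over embeddings. The sketch below illustrates that idea with a brute-force cosine-similarity ranking in plain Python. It is a conceptual illustration only: it does not use Deep Lake's API or index, and the three-dimensional embeddings and the names `cosine_similarity` and `top_k` are made up for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    records: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every record against the query and return the k best matches."""
    scored = [(text, cosine_similarity(query, emb)) for text, emb in records]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy records: (raw text, hand-written embedding). Real embeddings would
# come from an embedding model and have hundreds of dimensions.
records = [
    ("a photo of a cat", [0.9, 0.1, 0.0]),
    ("a photo of a dog", [0.8, 0.3, 0.1]),
    ("quarterly sales report", [0.0, 0.2, 0.9]),
]

results = top_k([1.0, 0.0, 0.0], records, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

A vector store keeps the raw data and its embedding side by side, as the tuples here do, so a search can return the original text or image along with its score.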
Deep Learning Applications
Using Deep Lake for managing data while training Deep Learning models:
- Deep Learning Quickstart
- Tutorials for Training Models
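The feature list earlier describes lazy, NumPy-like indexing and dataloaders that handle shuffling. The sketch below shows the general pattern in plain Python, assuming a simple key-to-bytes storage: samples are fetched only when indexed, and batches are drawn in a shuffled order. `LazyColumn`, `shuffled_batches`, and the in-memory `fake_storage` dictionary are hypothetical stand-ins for illustration and are not Deep Lake classes or functions.

```python
import random
from typing import Callable, Iterator, List, Sequence


class LazyColumn:
    """Hypothetical column whose samples are fetched only when indexed."""

    def __init__(self, keys: Sequence[str], loader: Callable[[str], bytes]):
        self._keys = list(keys)
        self._loader = loader
        self.loads = 0  # number of samples actually fetched so far

    def __len__(self) -> int:
        return len(self._keys)

    def __getitem__(self, index):
        # Slices return a list of samples, loading only the requested range.
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        self.loads += 1
        return self._loader(self._keys[index])


def shuffled_batches(
    column: LazyColumn, batch_size: int, seed: int = 0
) -> Iterator[List[bytes]]:
    """Yield batches in a shuffled order, loading each sample on demand."""
    order = list(range(len(column)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [column[i] for i in order[start:start + batch_size]]


# Stand-in for remote object storage: key -> encoded sample.
fake_storage = {f"img_{i}": f"pixels-{i}".encode() for i in range(10)}
column = LazyColumn(list(fake_storage), fake_storage.__getitem__)

first_three = column[0:3]          # fetches three samples, not all ten
loads_after_slice = column.loads

batches = list(shuffled_batches(column, batch_size=4))
print(loads_after_slice, [len(batch) for batch in batches])
```

Shuffling the index order instead of the data keeps the fetch lazy: nothing is loaded until a batch is requested.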
⚙️ Integrations
Deep Lake offers integrations with other tools in order to streamline your deep learning workflows. Current integrations include:
- LLM Apps
  - Use Deep Lake as a vector store for LLM apps. Our integration combines the LangChain VectorStores API with Deep Lake datasets as the underlying data storage. The integration is a serverless vector store that can be deployed locally or in a cloud of your choice.
📚 Documentation
Getting started guides, examples, tutorials, API reference, and other useful information can be found on our documentation page.
🎓 For Students and Educators
Deep Lake users can access and visualize a variety of popular datasets through a free integration with Deep Lake's App. Universities can get up to 1TB of data storage and 100,000 queries per month on the Tensor Database for free. Chat with us on our website to claim access!
👩‍💻 Comparisons to Familiar Tools
<details> <summary><b>Deep Lake vs Chroma</b></summary>Both Deep Lake and ChromaDB enable users to store and search vectors (embeddings) and offer integrations with LangChain and LlamaIndex. However, they are architecturally very different. ChromaDB is a Vector Database that can be deployed locally or on a server using Docker and will offer a hosted solution shortly. Deep Lake is a serverless Vector Store deployed on the user’s own cloud, locally, or in-memory. All computations run client-side, which enables users to support lightweight production apps in seconds. Unlike ChromaDB, Deep Lake’s data format can store raw data such as images, videos, and text, in addition to embeddings. ChromaDB is limited to light metadata on top of the embeddings and has no visualization. Deep Lake datasets can be visualized and version controlled. Deep Lake also has a performant dataloader for fine-tuning your Large Language Models.
</details> <details> <summary><b>Deep Lake vs Pinecone</b></summary>Both Deep Lake and Pinecone enable users to store and search vectors (embeddings) and offer integrations with LangChain and LlamaIndex. However, they are architecturally very different. Pinecone is a fully-managed Vector Database that is optimized for highly demanding applications requiring a search for billions of vectors. Deep Lake is serverless. All computations run client-side, which enables users to get started in seconds. Unlike Pinecone, Deep Lake’s data format can store raw data such as images, videos, and