Datahub
The Metadata Platform for your Data and AI Stack
The #1 Open Source AI Data Catalog
Enterprise-grade metadata platform enabling discovery, governance, and observability across your entire data ecosystem
<p align="center"> <a href="https://github.com/datahub-project/datahub/actions/workflows/build-and-test.yml"> <img src="https://github.com/datahub-project/datahub/actions/workflows/build-and-test.yml/badge.svg" alt="Build Status" /> </a> <a href="https://pypi.org/project/acryl-datahub/"> <img src="https://img.shields.io/pypi/v/acryl-datahub.svg" alt="PyPI Version" /> </a> <a href="https://pypi.org/project/acryl-datahub/"> <img src="https://img.shields.io/pypi/dm/acryl-datahub.svg" alt="PyPI Downloads" /> </a> <a href="https://hub.docker.com/r/linkedin/datahub-gms"> <img src="https://img.shields.io/docker/pulls/linkedin/datahub-gms.svg" alt="Docker Pulls" /> </a> <br /> <a href="https://datahub.com/slack?utm_source=github&utm_medium=readme&utm_campaign=github_readme"> <img src="https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social" alt="Join Slack" /> </a> <a href="https://www.youtube.com/channel/UC3qFQC5IiwR5fvWEqi_tJ5w"> <img src="https://img.shields.io/youtube/channel/subscribers/UC3qFQC5IiwR5fvWEqi_tJ5w?style=social&logo=youtube&label=Subscribe" alt="YouTube Subscribers" /> </a> <a href="https://datahub.com/blog/"> <img src="https://img.shields.io/badge/blog-read-red.svg?style=social&logo=medium" alt="DataHub Blog" /> </a> <a href="https://github.com/datahub-project/datahub/graphs/contributors"> <img src="https://img.shields.io/github/contributors/datahub-project/datahub.svg" alt="Contributors" /> </a> <a href="https://github.com/datahub-project/datahub/stargazers"> <img src="https://img.shields.io/github/stars/datahub-project/datahub.svg?style=social&label=Star" alt="GitHub Stars" /> </a> <a href="https://github.com/datahub-project/datahub/blob/master/LICENSE"> <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License" /> </a> </p> <p align="center"> <a href="https://datahub.com/free-trial/"><b>Free Cloud Trial</b></a> • <a href="https://docs.datahub.com/docs/quickstart"><b>Quick Start</b></a> • <a 
href="https://demo.datahub.com"><b>Live Demo</b></a> • <a href="https://docs.datahub.com"><b>Documentation</b></a> • <a href="https://datahub.com/slack"><b>Slack Community</b></a> • <a href="https://www.youtube.com/@datahubproject"><b>YouTube</b></a> </p> <p align="center"> <i>Built with ❤️ by <a href="https://datahub.com">DataHub</a> and <a href="https://engineering.linkedin.com">LinkedIn</a></i> </p><p align="center"> <a href="https://demo.datahub.com"> <img width="90%" src="https://raw.githubusercontent.com/datahub-project/static-assets/refs/heads/main/imgs/demos/datahub-tour.gif" alt="DataHub Product Tour" /> </a> </p> <p align="center"> <i>Search, discover, and understand your data with DataHub's unified metadata platform</i> </p>
🤖 NEW: Connect AI Agents to DataHub via Model Context Protocol (MCP)
<p align="center"> <a href="https://youtu.be/aVWJsw7RJ8c?t=568"> <img width="600" src="https://raw.githubusercontent.com/datahub-project/static-assets/refs/heads/main/imgs/demos/mcp-demo.gif" alt="DataHub MCP Demo - Query metadata with AI agents" /> </a> <br/> <i>▶️ Click to watch full demo on YouTube</i> </p>
Connect your AI coding assistants (Cursor, Claude Desktop, Cline) directly to DataHub and query metadata in natural language, for example "What datasets contain PII?" or "Show me lineage for this table".
Quick setup:
npx -y @acryldata/mcp-server-datahub init
What is DataHub?
🔍 Finding the right DataHub? This is the open-source metadata platform at datahub.com (GitHub: datahub-project/datahub). It was previously hosted at datahubproject.io, which now redirects to datahub.com. This project is not related to datahub.io, which is a separate public dataset hosting service. See the FAQ below.
DataHub is the #1 open-source AI data catalog that enables discovery, governance, and observability across your entire data ecosystem. Originally built at LinkedIn, DataHub now powers data discovery at thousands of organizations worldwide, managing millions of data assets.
The Challenge: Modern data stacks are fragmented across dozens of tools—warehouses, lakes, BI platforms, ML systems, AI agents, orchestration engines. Finding the right data, understanding its lineage, and ensuring governance is like searching through a maze blindfolded.
The DataHub Solution: DataHub acts as the central nervous system for your data stack—connecting all your tools through real-time streaming or batch ingestion to create a unified metadata graph. Unlike static catalogs, DataHub keeps your metadata fresh and actionable—powering both human teams and AI agents.
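To make the "unified metadata graph" idea concrete, here is a toy sketch in Python. It is not DataHub's data model or storage layer: the `MetadataGraph` class, its methods, and the dataset names are hypothetical and exist only for illustration. The only DataHub-specific piece is the `urn:li:dataset:(...)` identifier style.

```python
from collections import deque


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset identifier in the urn:li:dataset:(...) style."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


class MetadataGraph:
    """Hypothetical in-memory graph: nodes are URNs, edges point to upstreams."""

    def __init__(self) -> None:
        self.upstreams: dict[str, set[str]] = {}

    def add_lineage(self, downstream: str, upstream: str) -> None:
        """Record that `downstream` is derived from `upstream`."""
        self.upstreams.setdefault(downstream, set()).add(upstream)
        self.upstreams.setdefault(upstream, set())

    def all_upstreams(self, urn: str) -> list[str]:
        """Breadth-first walk over upstream edges; returns transitive upstreams."""
        seen: set[str] = set()
        queue = deque([urn])
        while queue:
            current = queue.popleft()
            for parent in self.upstreams.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        seen.discard(urn)  # ignore the start node if the graph has a cycle
        return sorted(seen)


# Example data (made up): a Kafka topic feeds a staging table, which feeds a report table.
raw = dataset_urn("kafka", "events.page_views")
staged = dataset_urn("snowflake", "analytics.stg_page_views")
report = dataset_urn("snowflake", "analytics.daily_traffic")

graph = MetadataGraph()
graph.add_lineage(staged, raw)
graph.add_lineage(report, staged)

for urn in graph.all_upstreams(report):
    print(urn)
```

Because metadata from every tool lands in one graph keyed by URN, a lineage question becomes a single traversal, whichever system each hop lives in.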

Why DataHub?
- 🚀 Battle-Tested at Scale: Born at LinkedIn to handle hyperscale data, now proven at thousands of organizations worldwide managing millions of data assets
- ⚡ Real-Time Streaming: Metadata updates in seconds, not hours or days
- 🤖 AI-Ready: Native support for AI agents via MCP, LLM integrations, and context management
- 🔌 Pioneering Ingestion Architecture: Flexible push/pull framework (widely adopted by other catalogs) with 80+ production-grade connectors extracting deep metadata—column lineage, usage stats, profiling, and quality metrics
- 👨‍💻 Developer-First: Rich APIs (GraphQL, OpenAPI), Python + Java SDKs, CLI tools
- 🏢 Enterprise Ready: Battle-tested security, authentication, authorization, and audit trails
- 🌍 Open Source: Apache 2.0 licensed, vendor-neutral, community-driven
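As a rough illustration of the GraphQL API mentioned above, the sketch below builds a dataset search request using only the Python standard library. Several details are assumptions to check against the DataHub docs for your deployment: the endpoint URL (`http://localhost:9002/api/graphql`, typical for a local quickstart), bearer-token authentication, and the exact shape of the `search` query. The helper names `build_search_request` and `search_datasets` are hypothetical.

```python
import json
import urllib.request

# Assumption: a local quickstart instance exposing GraphQL at this URL.
GRAPHQL_URL = "http://localhost:9002/api/graphql"

# Assumption: the query shape follows the DataHub GraphQL search API.
SEARCH_QUERY = """
query search($input: SearchInput!) {
  search(input: $input) {
    total
    searchResults { entity { urn type } }
  }
}
"""


def build_search_request(text: str, token: str, count: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST request that searches datasets by text."""
    payload = {
        "query": SEARCH_QUERY,
        "variables": {
            "input": {"type": "DATASET", "query": text, "start": 0, "count": count}
        },
    }
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def search_datasets(text: str, token: str) -> dict:
    """Send the request; requires a reachable DataHub instance and a valid token."""
    with urllib.request.urlopen(build_search_request(text, token), timeout=10) as resp:
        return json.load(resp)
```

Building the request separately from sending it lets you inspect or unit-test the payload without a running server. For anything beyond a quick experiment, the Python SDK is likely the better choice.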
🧠 The Context Foundation
Essential for modern data teams and reliable AI agents:
- Context Management Is the Missing Piece in the Agentic AI Puzzle - Why context management is essential for deploying reliable AI agents at scale
- Data Lineage: What It Is and Why It Matters - Understanding the map of how data flows through your organization
- What is Metadata Management? - A comprehensive guide for enterprise data leaders
📑 Table of Contents
- FAQ
- See DataHub in Action
- Quick Start
- Installation Options
- Architecture
- Use Cases & Examples
- Trusted By
- Ecosystem
- Community
- Contributing
- Resources
- License
❓ Frequently Asked Questions
<details> <summary><b>Is this the same project as datahub.io?</b></summary>No. datahub.io is a completely separate project — a public dataset hosting service with no affiliation to this project. DataHub (this project) is an open-source metadata platform for data discovery, governance, and observability, hosted at datahub.com and developed at github.com/datahub-project/datahub.
</details> <details> <summary><b>What happened to datahubproject.io?</b></summary>DataHub was previously hosted at datahubproject.io. That domain now redirects to datahub.com. All documentation has moved to docs.datahub.com. If you find references to datahubproject.io in blog posts or tutorials, they refer to this same project — just under its former domain.
</details> <details> <summary><b>Is this the DataHub that was built at LinkedIn?</b></summary>Yes. DataHub was originally built at LinkedIn to manage metadata at scale across their data ecosystem. LinkedIn open-sourced DataHub in 2020. It has since grown into an independent community project under the datahub-project GitHub organization, now hosted at datahub.com.
</details> <details> <summary><b>How do I install the DataHub metadata platform?</b></summary>pip install acryl-datahub
datahub docker quickstart
See the Quick Start section below for full instructions. The PyPI package is acryl-datahub.
