Metadata

Knowledge sharing - Metadata, metadata-lake

Metadata, metadatalake, Modern Metadata Stack (MMS)

Overview

This project intends to collect, analyze and synthesize reference material about metadata, in order to facilitate the implementation of metadatalakes. That is, this project is a first contribution to a Modern Metadatalake Stack (MMS), much like the initiatives around the rise of the Modern Data Stack (MDS).

Even though the members of the GitHub organization may be employed by some companies, they speak on their own behalf and do not represent these companies.

Other repositories of Data Engineering helpers

References

Articles

Metadata is king

  • Date: June 2025
  • Author: Dipankar Mazumdar (Dipankar Mazumdar on LinkedIn)
  • Link to the LinkedIn post: https://www.linkedin.com/posts/dipankar-mazumdar_lakehouse-dataengineering-softwareengineering-activity-7336406995798740992-eji9/

From Data Catalog to Data Marketplace

  • Title: From Data Catalog 📚 to Data Marketplace 🛒
  • Author: Jochen Christ (Jochen Christ on LinkedIn)
  • Date: Jan. 2025
  • Link to the LinkedIn post: https://www.linkedin.com/posts/jochenchrist_datamarketplace-datamarketplace-dataproducts-activity-7281953125140246528-BExu/
  • Link to the Data Mesh Manager blog post: https://datamesh-manager.com/learn/data-catalog-vs-data-marketplace

The Art of Discoverability

  • Title: The Art of Discoverability and Reverse Engineering User Happiness
  • Authors: Animesh Kumar and Travis Thompson
  • Date: Dec. 2024
  • Link to the article: https://moderndata101.substack.com/p/the-art-of-discoverability-and-reverse

Google paper - Big Metadata: When Metadata is Big Data

Introduction

In the past 10 years, as the modern data stack has matured and become mainstream, we’ve taken great leaps forward in data infrastructure. However, the modern data stack still has one key missing component: context. That’s where metadata comes in. In this increasingly diverse data world, metadata holds the key to the elusive promised land — a single source of truth. There will always be countless tools and tech in a team’s data infrastructure. By effectively collecting metadata, a team can finally unify context about all their tools, processes, and data.

But what actually is metadata, you ask? Simply put, metadata is “data about data”.

Today, metadata is everywhere. Every component of the modern data stack and every user interaction on it generates metadata. Apart from traditional forms like technical metadata (e.g. schemas) and business metadata (e.g. taxonomy, glossary), our data systems now create entirely new forms of metadata.

Cloud compute ecosystems and orchestration engines generate logs every second, called performance metadata. Users who interact with data assets and one another generate social metadata. Logs from BI tools, notebooks, and other applications, as well as from communication tools like Slack, generate usage metadata. Orchestration engines and raw code (e.g. SQL) used to create data assets generate provenance metadata.
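The categories described above can be sketched as a small data model. This is only an illustration: the names `MetadataKind`, `MetadataRecord` and `group_by_asset`, as well as the sample asset and tool names, are hypothetical and do not come from any particular metadata platform. The sketch shows how records emitted by different tools could be grouped per data asset, in order to unify context.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetadataKind(Enum):
    """Metadata categories named in the introduction above."""
    TECHNICAL = "technical"      # e.g. schemas
    BUSINESS = "business"        # e.g. taxonomy, glossary
    PERFORMANCE = "performance"  # logs from cloud compute and orchestration engines
    SOCIAL = "social"            # users interacting with data assets and one another
    USAGE = "usage"              # logs from BI tools, notebooks, communication tools
    PROVENANCE = "provenance"    # orchestration engines and raw code (e.g. SQL)


@dataclass
class MetadataRecord:
    asset: str                   # hypothetical identifier of the data asset
    kind: MetadataKind
    source: str                  # hypothetical name of the emitting tool
    payload: dict = field(default_factory=dict)


def group_by_asset(records):
    """Group records per asset, then per kind (assumed unification step)."""
    grouped = {}
    for record in records:
        grouped.setdefault(record.asset, {}).setdefault(record.kind, []).append(record)
    return grouped


records = [
    MetadataRecord("sales.orders", MetadataKind.TECHNICAL, "warehouse", {"columns": ["id", "amount"]}),
    MetadataRecord("sales.orders", MetadataKind.USAGE, "bi-tool", {"queries_last_7d": 42}),
    MetadataRecord("sales.orders", MetadataKind.PROVENANCE, "orchestrator", {"sql": "SELECT ..."}),
]
context = group_by_asset(records)
print(sorted(kind.value for kind in context["sales.orders"]))
```

A real metadata lake would persist such records and resolve asset identities across tools. The sketch only shows the shape of the data.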

Metadata lake

Frameworks

Hudi metadata table

  • Homepage: https://hudi.apache.org/docs/metadata/
  • Hudi GitHub repository: https://github.com/apache/hudi
  • Hudi tracks metadata about a table to remove bottlenecks in achieving great read/write performance, specifically on cloud storage.
    • Avoid list operations to obtain the set of files in a table
    • Expose column statistics for better query planning and faster queries
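The two benefits listed above can be illustrated with a conceptual sketch. It is not Hudi's implementation or API: `FileStats`, `prune_files` and the file names are hypothetical, and the "index" is a plain in-memory list standing in for a metadata table. The idea is that, when the file listing and per-file min/max column statistics are kept as metadata, a query planner can select candidate files without listing the storage and can skip files whose value range cannot match the filter.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file statistics for one column (hypothetical structure)."""
    path: str
    min_value: int
    max_value: int


def prune_files(stats, lower, upper):
    """Return the paths whose [min, max] range overlaps [lower, upper]."""
    return [s.path for s in stats if s.max_value >= lower and s.min_value <= upper]


# Stand-in for a metadata table: the file listing plus column statistics,
# read from metadata rather than obtained by listing cloud storage.
index = [
    FileStats("part-0001.parquet", 1, 100),
    FileStats("part-0002.parquet", 101, 200),
    FileStats("part-0003.parquet", 201, 300),
]

# Filter equivalent to: WHERE col BETWEEN 150 AND 180
candidates = prune_files(index, 150, 180)
print(candidates)  # ['part-0002.parquet']
```

Only one of the three files has to be read for this filter. The Hudi documentation linked above describes the actual metadata table and its indexes.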

DataHub

Acryl Data

  • Motto: Bring clarity to your data
  • Home page: https://www.acryldata.io/
  • Open source: no
  • Overview: Acryl Cloud is a comprehensive metadata platform that joins a best-in-class catalog with data observability. Built by the team behind DataHub (see above).

Metaphor

  • Motto: "Data Mastery for the Whole Company" / "A modern data catalog powered by social data intelligence and AI - from the creators of DataHub"
  • Home page: https://metaphor.io/
  • Open source: no
  • Articles on the principles:
    • The Grand Rewrite of DataHub, by Mars Lan et al, Sep. 2023 - https://metaphor.io/blog/the-grand-