
Fastdup

fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.


<!-- PROJECT LOGO --> <br /> <div align="left"> <a href="https://www.visual-layer.com" target="_blank" rel="noopener noreferrer" name="top"> <picture> <source media="(prefers-color-scheme: dark)" srcset="./gallery/logo_dark_mode.png" width=600> <source media="(prefers-color-scheme: light)" srcset="./gallery/Logo-fastdup-by-VL.png" width=600> <img alt="fastdup logo." src="./gallery/Logo-fastdup-by-VL.png"> </picture> </a> <br> <br> </div> <!-- <h3 align="left">Manage, Clean & Curate Visual Data - Fast and at Scale.</h3> -->


<!-- MARKDOWN LINKS & IMAGES --> <!-- https://www.markdownguide.org/basic-syntax/#reference-style-links --> <p align="left"> A powerful open-source tool for analyzing image and video datasets, created by the authors of <a href="https://github.com/dmlc/xgboost">XGBoost</a>, <a href="https://github.com/apache/tvm">Apache TVM</a> & <a href="https://github.com/apple/turicreate">Turi Create</a> - <a href="https://www.linkedin.com/in/dr-danny-bickson-835b32">Danny Bickson</a>, <a href="https://www.linkedin.com/in/carlos-guestrin-5352a869">Carlos Guestrin</a> and <a href="https://www.linkedin.com/in/amiralush">Amir Alush</a>.</p> <hr> <a href="https://visual-layer.readme.io/" target="_blank" rel="noopener noreferrer">Documentation</a> · <a href="#features--advantages" target="_blank" rel="noopener noreferrer">Features</a> · <a href="https://github.com/visual-layer/fastdup/issues/new/choose" target="_blank" rel="noopener noreferrer">Report Bug</a> · <a href="https://medium.com/visual-layer" target="_blank" rel="noopener noreferrer">Blog</a> · <a href="#getting-started" target="_blank" rel="noopener noreferrer">Quickstart</a> · <a href="#visual-layer-cloud" target="_blank" rel="noopener noreferrer">Visual Layer Cloud</a> <hr> <!-- <br /> <br /> <a href="https://discord.gg/tkYHJCA7mb" target="_blank" rel="noopener noreferrer"> <img src="https://img.shields.io/badge/DISCORD%20COMMUNITY-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Logo"> </a> <a href="https://visual-layer.readme.io/discuss" target="_blank" rel="noopener noreferrer"> <img src="https://img.shields.io/badge/DISCUSSION%20FORUM-slateblue?style=for-the-badge&logo=discourse&logoWidth=20" alt="Logo"> </a> <a href="https://www.linkedin.com/company/visual-layer/" target="_blank" rel="noopener noreferrer"> <img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="Logo"> </a> <a href="https://twitter.com/visual_layer" target="_blank" rel="noopener 
noreferrer"> <img src="https://img.shields.io/badge/X%20(TWITTER)-000000?style=for-the-badge&logo=x&logoColor=white" alt="Logo"> </a> <a href="https://www.youtube.com/@visual-layer" target="_blank" rel="noopener noreferrer"> <img src="https://img.shields.io/badge/-YouTube-black.svg?style=for-the-badge&logo=youtube&colorB=red" alt="Logo"> </a> <br /> <br /> -->

Getting Started

Install fastdup from PyPI:

pip install fastdup

More installation options are available here.

Initialize and run fastdup:

import fastdup

fd = fastdup.create(input_dir="IMAGE_FOLDER/")
fd.run()
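
A slightly fuller variant of the snippet above, as a minimal sketch: it checks that the folder actually contains image files before handing it to fastdup, and keeps fastdup's output in a dedicated directory. The helper names `count_images` and `run_fastdup`, the extension list, and the `fastdup_work_dir` default are illustrative assumptions. The `work_dir` argument of `fastdup.create` is also assumed here, so check the documentation of your installed version before relying on it.

```python
from pathlib import Path

# Extensions treated as images here are an assumption for illustration;
# fastdup's own list of supported formats may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff", ".webp"}


def count_images(folder):
    """Count files under `folder` (recursively) whose suffix looks like an image."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder!r} is not a directory")
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def run_fastdup(input_dir, work_dir="fastdup_work_dir"):
    """Run fastdup on `input_dir`, keeping its output in `work_dir` (hypothetical helper)."""
    if count_images(input_dir) == 0:
        raise ValueError(f"no image files found under {input_dir!r}")

    # Imported lazily so the pre-flight check works even without fastdup installed.
    import fastdup

    # `work_dir` is assumed to be accepted by fastdup.create; verify for your version.
    fd = fastdup.create(work_dir=work_dir, input_dir=input_dir)
    fd.run()
    return fd
```

Calling `run_fastdup("IMAGE_FOLDER/")` then returns the same `fd` object used in the rest of this section. The pre-flight check fails fast on a mistyped path instead of starting an analysis on an empty folder.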

Explore the results in an interactive web UI:

fd.explore()


Alternatively, visualize the result in a static gallery:

fd.vis.duplicates_gallery()    # gallery of duplicates
fd.vis.outliers_gallery()      # gallery of outliers
fd.vis.component_gallery()     # gallery of connected components
fd.vis.stats_gallery()         # gallery of image statistics (e.g. blur, brightness, etc.)
fd.vis.similarity_gallery()    # gallery of similar images
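
To make the idea behind the connected-components gallery more concrete, the sketch below groups images into clusters from a list of pairwise similarities using a union-find structure. It illustrates the general technique only and is not fastdup's implementation. The function name, the `(file_a, file_b, similarity)` tuple layout, and the `0.9` threshold are assumptions chosen for the example.

```python
from collections import defaultdict


def connected_components(pairs, threshold=0.9):
    """Group items linked by pairs whose similarity is >= threshold.

    `pairs` is an iterable of (item_a, item_b, similarity) tuples.
    Returns a sorted list of sorted groups with at least two members.
    """
    parent = {}

    def find(item):
        # Return the representative of the item's group, registering new items.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving keeps trees shallow
            item = parent[item]
        return item

    for a, b, similarity in pairs:
        if similarity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(list)
    for item in parent:
        groups[find(item)].append(item)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Toy input: the file names and scores are made up for illustration.
pairs = [
    ("a.jpg", "b.jpg", 0.97),
    ("b.jpg", "c.jpg", 0.95),
    ("d.jpg", "e.jpg", 0.50),  # below the threshold, so not linked
    ("f.jpg", "g.jpg", 0.99),
]
print(connected_components(pairs))
# [['a.jpg', 'b.jpg', 'c.jpg'], ['f.jpg', 'g.jpg']]
```

Note that `a.jpg` and `c.jpg` land in the same cluster even though they were never compared directly: components are transitive, which is why a component can contain images that are only loosely similar to each other.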

Check the quickstart tutorial for more information.

https://github.com/user-attachments/assets/738a329d-8063-4515-a961-f2527934a0ca

Features & Advantages

fastdup handles labeled/unlabeled datasets in image or video format, providing a range of features:

<div align="center" style="display:flex;flex-direction:column;"> <a href="https://www.visual-layer.com" target="_blank" rel="noopener noreferrer"> <img src="./gallery/fastdup_features_new.png" alt="fastdup" width="1000"> </a> </div>

What sets fastdup apart from other similar tools:

  • Quality: High-quality analysis to identify duplicates/near-duplicates, outliers, mislabels, broken images, and low-quality images.
  • Scale: Highly scalable, capable of processing 400M images on a single CPU machine. Scales up to billions of images.
  • Speed: Optimized C++ engine enables high performance even on low-resource CPU machines.
  • Privacy: Runs locally or on your cloud infrastructure. Your data stays where it is.
  • Ease of use: Works on labeled or unlabeled datasets in image or video format, with support for major operating systems such as macOS, Linux, and Windows.
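
As a rough intuition for what "near-duplicates" and "outliers" mean in the list above, the sketch below compares toy feature vectors with cosine similarity. An item whose nearest neighbor is very similar is flagged as a near-duplicate candidate; an item whose nearest neighbor is still dissimilar is flagged as an outlier candidate. This is a brute-force illustration under stated assumptions, not a description of fastdup's C++ engine. The function names, the two thresholds, and the 2-D vectors are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor_similarity(embeddings):
    """Map each name to its similarity with the most similar *other* item."""
    names = list(embeddings)
    best = {}
    for name in names:
        scores = [
            cosine_similarity(embeddings[name], embeddings[other])
            for other in names
            if other != name
        ]
        best[name] = max(scores) if scores else 0.0
    return best


def flag_candidates(embeddings, duplicate_threshold=0.95, outlier_threshold=0.5):
    """Return (near-duplicate candidates, outlier candidates), both sorted by name."""
    best = nearest_neighbor_similarity(embeddings)
    duplicates = sorted(n for n, s in best.items() if s >= duplicate_threshold)
    outliers = sorted(n for n, s in best.items() if s < outlier_threshold)
    return duplicates, outliers


# Made-up 2-D "embeddings"; real image features have far more dimensions.
embeddings = {
    "cat1": [1.0, 0.0],
    "cat2": [0.99, 0.05],
    "dog": [0.6, 0.8],
    "noise": [-1.0, 0.1],
}
print(flag_candidates(embeddings))
# (['cat1', 'cat2'], ['noise'])
```

The all-pairs loop is quadratic in the number of items, so it is only suitable for tiny examples like this one.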

Learn from Examples

Learn the basics of fastdup through interactive examples. View the notebooks on GitHub or nbviewer. Even better, run them on Google Colab or Kaggle, for free.

<table> <tr> <td rowspan="4" width="160"> <a href="https://visual-layer.readme.io/docs/quickstart"> <img src="./gallery/cat_dog_thumbnail.jpg" width="200"> </a> </td> <td rowspan="4"> <b>⚡ Quickstart:</b> Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here! <br> <br> <b>📌 Dataset:</b> <a href="https://www.robots.ox.ac.uk/~vgg/data/pets/">Oxford-IIIT Pet</a>. </td> <td align="center" width="80"> <a href="https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quickstart.ipynb"> <img src="./gallery/nbviewer_logo.png" height="30"> </a> </td> </tr> <tr> <td align="center"> <a href="https://github.com/visual-layer/fastdup/blob/main/examples/quickstart.ipynb"> <img src="./gallery/github_logo.png" height="25"> </a> </td> </tr> <tr> <td align="center"> <a href="https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/quickstart.ipynb"> <img src="./gallery/colab_logo.png" height="20"> </a> </td> </tr> <tr> <td align="center"> <a href="https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/quickstart.ipynb"> <img src="./gallery/kaggle_logo.png" height="25"> </a> </td> </tr> <!-- ------------------------------------------------------------------- --> <tr> <td rowspan="4" width="160"> <a href="https://visual-layer.readme.io/docs/finding-removing-duplicates"> <img src="gallery/duplicates_horses_thumbnail.jpg" width="200"> </a> </td> <td rowspan="4"> <b>🧹 Finding and Removing Duplicates:</b> Learn how to analyze an image dataset for duplicates and near-duplicates. <br> <br> <b>📌 Dataset:</b> <a href="https://www.robots.ox.ac.uk/~vgg/data/pets/">Oxford-IIIT Pet</a>. 
</td> <td align="center" width="80"> <a href="https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/finding-removing-duplicates.ipynb"> <img src="./gallery/nbviewer_logo.png" height="30"> </a> </td> </tr> <tr> <td align="center"> <a href="https://github.com/visual-layer/fastdup/blob/main/examples/finding-removing-duplicates.ipynb"> <img src="./gallery/github_logo.png" height="25"> </a> </td> </tr> <tr> <td align="center"> <a href="https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/finding-removing-duplicates.ipynb"> <img src="./gallery/colab_logo.png" height="20"> </a> </td> </tr> <tr> <td align="center"> <a href="https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/finding-removing-duplicates.ipynb"> <img src="./gallery/kaggle_logo.png" height="25"> </a> </td> </tr> <!-- ------------------------------------------------------------------- --> <tr> <td rowspan="4" width="160"> <a href="https://visual-layer.readme.io/docs/finding-removing-mislabels"> <img src="./gallery/food_thumbn
