GaMAC
No description available
Install / Use
/learn @ITMO-CODE-AI/GaMACREADME
<p align="center"><h1 align="center">GaMAC <img src="docs/gamac_itmo.jpg" width="44px" height="44px"></h1></p>
<p align="center">
<a href="https://itmo.ru/"><img src="https://raw.githubusercontent.com/aimclub/open-source-ops/43bb283758b43d75ec1df0a6bb4ae3eb20066323/badges/ITMO_badge.svg"></a>
<img src="https://img.shields.io/github/license/CTLab-ITMO/CoolPrompt?style=BadgeStyleOptions.DEFAULT&logo=opensourceinitiative&logoColor=white&color=blue" alt="license">
</p>
<p align="center">
</p>
<br>
GaMAC
<overview> GaMAC is a Python module for automated machine learning on clustering tasks with a GPU acceleraion. The project was started in 2024 by ITMO AI Laboratory of Information Technologies and Programming Faculty, and since then we are currently working on this project. </overview>Sponsored by Foundation for Promotion of Innovation.
Contents
Project catalog
├── data # External datasets
├── docs # Project documentation
├── gamac # Project module
| ├── algorithms # Implementations of clustering algorithms
| ├── bin # Models files
| ├── data # Data processing module
| ├── estimation # Estimation of clustering results module
| ├── meta # Meta-classifier module
| | ├── accessors # Markup data
| | ├── impl # Meta-classifier implementation
| | └── storage # Meta-classifier storage
| ├── pipeline # Algorithm search module
| ├── tests # Tests
| | ├── data # Data for tests
| | └── unit # Unit-tests
| └── autoclustering.py # Autoclustering main interface script
├── notebooks # Project notebooks
| ├── examples # Examples of running GaMAC
| | ├── basic_example.ipynb # Basic examples of GaMAC
| | └── example_on_realdata.ipynb # Examples of production cases
| └── experiments # Development experiments
| | ├── cvi_accuracy.ipynb # Evaluation of meta-classifier
| | ├── embedder_testing.ipynb # Evaluation of different image-text encoders
| | ├── experiment_on_optimizers.ipynb # Evaluation of different optimizers
| └── devops # Devops experiments
├── flake8
├── .gitignore
├── LICENSE
├── pyproject.toml
├── README_RU.md # Russian README
├── README.md # English README
└── requirements.txt
Minimal requirements
- Ubuntu 22.04 / WSL
- 4 CPU cores, 16 GB RAM;
- GPU: NVIDIA, CUDA 12.8 support, GPU memory size: 10 Gb
- Python>=3.12
Python dependencies
List of dependencies can be found in requirements.txt.
Installation and dependencies setup
Pre-setting for work with pyrfr
sudo apt install swig libboost-all-dev python3.12-dev
Install GaMAC
With pip
pip install -U --extra-index-url https://test.pypi.org/simple/ Gamac --extra-index-url https://download.pytorch.org/whl/cu128
With git
git clone https://github.com/ITMO-CODE-AI/GaMAC.git
cd GaMAC
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
Quick Start
- Run Gamac
Check examples in following notebook
1.1. Autoclustering with table, text and image data
from torchvision.datasets import CIFAR100
from gamac.autoclustering import Gamac
# Import data
cifar100 = CIFAR100('../data/cifar', download=True, train=False)
cifar_txt = [f'a photo of {cifar100.classes[img[1]]}' for img in cifar100]
cifar_img = [img[0] for img in cifar100]
cifar_table = pd.DataFrame(cifar100.targets)
result = Gamac().run(table=cifar_table, text=cifar_txt, image=cifar_img)
print(f'result.model: {result.model}')
print(f'clusters: {result.model.labels_}')
1.2. Autoclustering with only table data
import pandas as pd
from sklearn.datasets import load_digits
from gamac.autoclustering import Gamac
# Import data
data = load_digits(as_frame=True)
table = data['data']
result = Gamac().run(table=table, text=None, image=None)
print(f'result.model: {result.model}')
print(f'clusters: {result.model.labels_}')
1.3. Autoclustering with only text and image data
from torchvision.datasets import CIFAR100
from gamac.autoclustering import Gamac
# Import data
cifar100 = CIFAR100('../data/cifar', download=True, train=False)
cifar_txt = [f'a photo of {cifar100.classes[img[1]]}' for img in cifar100]
cifar_img = [img[0] for img in cifar100]
result = Gamac().run(table=None, text=cifar_txt, image=cifar_img)
print(f'result.model: {result.model}')
print(f'clusters: {result.model.labels_}')
Practical applications
- Computer Vision and Image Analysis
- Image analysis based on specific parameters (colors, brightness, contrast, etc.).
- Automatic product categorization by visual features (e-commerce).
- Natural Language Processing (NLP)
- Pattern detection in social media (sentiment analysis, thematic trends).
- Bioinformatics and Medical Diagnostics
- Identification of different cell and tissue types (e.g., in histology).
- Genomic data analysis for mutation pattern detection.
- Finance and Fintech
- Bank customer segmentation to identify groups of related borrowers.
- Anomaly detection in transactions (fraud, money laundering).
- Recommender Systems
- Content clustering (movies, music, products) to improve recommendations.
- Marketing and Behavioral Analytics
- Audience segmentation for targeted advertising.
- Geospatial Analysis
- Clustering points of interest (POI) for urban planning and logistics.
License
This project is protected under the Apache 2.0 License. For more details, refer to the LICENSE file.
