Upgini

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs

<h2 align="center"> <a href="https://upgini.com/">Upgini • Intelligent data search & enrichment for Machine Learning and AI</a></h2>
<p align="center"> <b>Easily find and add relevant features to your ML & AI pipeline from<br /> hundreds of public, community, and premium external data sources,<br /> including open & commercial LLMs</b> </p>
<p align="center"> <br /> <a href="https://colab.research.google.com/github/upgini/upgini/blob/main/notebooks/Upgini_Features_search%26generation.ipynb"><strong>Quick Start in Colab »</strong></a> | <a href="https://profile.upgini.com">Register / Sign In</a> | <a href="https://4mlg.short.gy/join-upgini-community">Slack Community</a> | <a href="https://forms.gle/pH99gb5hPxBEfNdR7"><strong>Propose a new data source</strong></a> </p>
<p align="center"> <a href="/LICENSE"><img alt="BSD-3 license" src="https://img.shields.io/badge/license-BSD--3%20Clause-green"></a> <a href="https://pypi.org/project/upgini/"><img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/upgini"></a> <a href="https://pypi.org/project/upgini/"><img alt="PyPI" src="https://img.shields.io/pypi/v/upgini?label=Release"></a> <a href="https://pepy.tech/project/upgini"><img alt="Downloads" src="https://static.pepy.tech/badge/upgini"></a> <a href="https://4mlg.short.gy/join-upgini-community"><img alt="Upgini slack community" src="https://img.shields.io/badge/slack-@upgini-orange.svg?logo=slack"></a> </p>

❔ Overview

Upgini is an intelligent data search engine with a Python library that helps you find and add relevant features to your ML pipeline from hundreds of public, community, and premium external data sources. Under the hood, Upgini automatically optimizes all connected data sources by generating an optimal set of ML features using large language models (LLMs), graph neural networks (GNNs), and recurrent neural networks (RNNs).

Motivation: for most supervised ML models, external data and features boost accuracy significantly more than any hyperparameter tuning. But the lack of automated, time-efficient enrichment tools for external data blocks the mass adoption of external features in ML pipelines. We want to radically simplify feature search and enrichment so that using external data becomes a standard approach, just as hyperparameter tuning is in machine learning today.

Mission: Democratize access to data sources for the data science community.

🚀 Awesome features

⭐️ Automatically find only relevant features that improve your model’s accuracy, not just features correlated with the target variable, which in 9 out of 10 cases yield zero accuracy improvement
⭐️ Automated feature generation from the sources: LLM‑based data augmentation, RNNs, and GNNs, plus ensembling across multiple data sources
⭐️ Automatic search key augmentation from all connected sources. If your search request lacks some search keys, such as postal/ZIP code, Upgini will try to derive them from the search keys you do provide, which broadens the search across all available data sources
⭐️ Calculate accuracy metrics and uplift after enriching an existing ML model with external features
⭐️ Check the stability of accuracy gain from external data on out-of-time intervals and verification datasets. Mitigate the risks of unstable external data dependencies in the ML pipeline
⭐️ Easy to use - a single request to enrich the training dataset with all of the keys at once:

<table> <tr> <td> date / datetime </td> <td> phone number </td> </tr> <tr> <td> postal / ZIP code </td> <td> hashed email / HEM </td> </tr> <tr> <td> country </td> <td> IP-address </td> </tr> </table>

⭐️ Scikit-learn-compatible interface for quick data integration with existing ML pipelines
⭐️ Support for most common supervised ML tasks on tabular data:

<table> <tr> <td><a href="https://en.wikipedia.org/wiki/Binary_classification">☑️ binary classification</a></td> <td><a href="https://en.wikipedia.org/wiki/Multiclass_classification">☑️ multiclass classification</a></td> </tr> <tr> <td><a href="https://en.wikipedia.org/wiki/Regression_analysis">☑️ regression</a></td> <td><a href="https://en.wikipedia.org/wiki/Time_series#Prediction_and_forecasting">☑️ time-series prediction</a></td> </tr> </table>
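As a rough sketch of how the single-request enrichment and the scikit-learn-style interface described above might fit together: the column names and the `missing_key_columns` and `enrich` helpers are hypothetical, while the `FeaturesEnricher` and `SearchKey` names and the `fit` / `transform` calls are taken from the library's published usage and should be checked against the version you have installed.

```python
# Hypothetical column names; replace them with the key columns of your own table.
# The values name SearchKey members matching the search key table above.
SEARCH_KEY_COLUMNS = {
    "signup_date": "DATE",
    "country": "COUNTRY",
    "zip_code": "POSTAL_CODE",
    "hashed_email": "HEM",
    "last_ip": "IP",
    "phone": "PHONE",
}


def missing_key_columns(available_columns, key_columns=SEARCH_KEY_COLUMNS):
    """Return the declared key columns that the table does not contain."""
    available = set(available_columns)
    return sorted(col for col in key_columns if col not in available)


def enrich(train_df, target_column, new_df):
    """Run one feature search on the training table, then enrich another table.

    Assumes `pip install upgini`, pandas DataFrames as inputs, and network
    access to the Upgini service. Illustrative only, not a reference usage.
    """
    # Imported lazily so the helper above works without the library installed.
    from upgini.features_enricher import FeaturesEnricher
    from upgini.metadata import SearchKey

    missing = missing_key_columns(train_df.columns)
    if missing:
        raise ValueError(f"search key columns not found: {missing}")

    search_keys = {
        column: getattr(SearchKey, kind)
        for column, kind in SEARCH_KEY_COLUMNS.items()
    }
    enricher = FeaturesEnricher(search_keys=search_keys)

    features = train_df.drop(columns=[target_column])
    enricher.fit(features, train_df[target_column])  # one request covering all keys
    return enricher.transform(new_df)  # same fit/transform shape as scikit-learn
```

Checking the key columns locally before calling `fit` is a design choice in this sketch, so that a misspelled column fails fast instead of after a remote search has started.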

⭐️ Simple Drag & Drop Search UI:
<a href="https://upgini.com/upgini-widget"> <img width="710" alt="Drag & Drop Search UI" src="https://github.com/upgini/upgini/assets/95645411/36b6460c-51f3-400e-9f04-445b938bf45e"> </a>
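The first point in the list above, that a feature can be correlated with the target and still add nothing, can be illustrated with a toy example. This is only a sketch with made-up numbers, not Upgini's selection algorithm: the external feature here is a linear copy of a feature the model already has, so it correlates strongly with the target but not at all with the residuals that the existing model leaves unexplained.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    return slope, mean_y - slope * mean_x


# Made-up data: the target is roughly twice the existing feature.
existing = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
target = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
# A candidate external feature that merely restates the existing one.
external = [3.0 * value + 1.0 for value in existing]

# What the current model (a line through `existing`) cannot explain.
slope, intercept = fit_line(existing, target)
residuals = [t - (slope * v + intercept) for v, t in zip(existing, target)]

corr_with_target = pearson(external, target)        # close to 1
corr_with_residuals = pearson(external, residuals)  # zero up to rounding error
```

A feature only helps if it explains part of what the current model gets wrong, which is why relevance has to be measured as accuracy uplift rather than as correlation with the target.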

🌎 Connected data sources and coverage

  • Public data: public sector, academic institutions, other sources through open data portals. Curated and updated by the Upgini team
  • Community‑shared data: royalty- or license-free datasets or features from the data science community (our users). This includes both public and scraped data
  • Premium data providers: commercial data sources verified by the Upgini team in real-world use cases

👉 Details on datasets and features

📊 Total: 239 countries and up to 41 years of history

|Data sources|Countries|History (years)|# sources for ensembling|Update frequency|Search keys|API Key required|
|--|--|--|--|--|--|--|
|Historical weather & Climate normals|68|22|-|Monthly|date, country, postal/ZIP code|No|
|Location/Places/POI/Area/Proximity information from OpenStreetMap|221|2|-|Monthly|date, country, postal/ZIP code|No|
|International holidays & events, Workweek calendar|232|22|-|Monthly|date, country|No|
|Consumer Confidence index|44|22|-|Monthly|date, country|No|
|World economic indicators|191|41|-|Monthly|date, country|No|
|Markets data|-|17|-|Monthly|date, datetime|No|
|World mobile & fixed-broadband network coverage and performance|167|-|3|Monthly|country, postal/ZIP code|No|
|World demographic data|90|-|2|Annual|country, postal/ZIP code|No|
|World house prices|44|-|3|Annual|country, postal/ZIP code|No|
|Public social media profile data|104|-|-|Monthly|date, email/HEM, phone|Yes|
|Car ownership data and Parking statistics|3|-|-|Annual|country, postal/ZIP code, email/HEM, phone|Yes|
|Geolocation profile for phone & IPv4 & email|239|-|6|Monthly|date, email/HEM, phone, IPv4|Yes|
|🔜 Email/WWW domain profile|-|-|-|-|

Know other useful data sources for machine learning? Give us a hint and we'll add it for free.

💼 Tutorials

Search for relevant external features & automated feature generation for a salary prediction task (use as a template)

  • The goal is to predict salary for a data science job posting based on information about the employer and job description.
  • Following this guide, you'll learn how to search and auto‑generate new relevant features with the Upgini library
  • The evaluation metric is Mean Absolute Error (MAE).
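For readers new to the metric, here is a minimal sketch of how MAE and the relative uplift from enrichment could be computed. The salary figures and the `relative_uplift` helper are invented for illustration and do not come from the tutorial notebook.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_uplift(baseline_error, enriched_error):
    """Share of the baseline error removed by the enriched model."""
    return (baseline_error - enriched_error) / baseline_error


# Invented salaries (in thousands) and two sets of predictions.
actual = [100.0, 120.0, 90.0, 150.0]
baseline_pred = [110.0, 100.0, 100.0, 130.0]  # model without external features
enriched_pred = [105.0, 110.0, 95.0, 140.0]   # model with external features

baseline_mae = mean_absolute_error(actual, baseline_pred)  # 15.0
enriched_mae = mean_absolute_error(actual, enriched_pred)  # 7.5
uplift = relative_uplift(baseline_mae, enriched_mae)       # 0.5, i.e. 50% lower MAE
```

Because lower MAE is better, uplift is defined here as the fraction of error removed. For metrics where higher is better, the sign convention would flip.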

Run Feature search & generation notebook inside your browser:

Open example in Google Colab  
