
VerticaPy

VerticaPy is a Python library that exposes scikit-like functionality for data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.


<p align="center"> <img src='https://raw.githubusercontent.com/vertica/VerticaPy/master/assets/img/logo.png' width="180px"> </p>

:star: 2023-12-01: VerticaPy secures 200 stars.

:loudspeaker: 2020-06-27: Vertica-ML-Python has been renamed to VerticaPy.

:warning: The following README is for VerticaPy 1.1.x and onwards, and so some of the elements may not be present in the previous versions.

:scroll: Some basic syntax can be found in the cheat sheet.

📰 Check out the latest newsletter here.

VerticaPy


<p align="center"> <img src='https://raw.githubusercontent.com/vertica/VerticaPy/master/assets/img/benefits.png' width="92%"> </p>

VerticaPy is a Python library with scikit-like functionality used to conduct data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning features. VerticaPy supports the entire data science life cycle, uses a 'pipeline' mechanism to sequence data transformation operations, and offers a range of graphical options. <br><br>

Table of Contents

<br>

Introduction

Vertica was the first real analytic columnar database and is still the fastest in the market. However, SQL alone isn't flexible enough to meet the needs of data scientists. <br><br> Python has quickly become the most popular tool in this domain, owing much of its flexibility to its high level of abstraction and its large, ever-growing set of libraries. Its accessibility has led to the development of popular and performant APIs, like pandas and scikit-learn, and a dedicated community of data scientists. Unfortunately, Python only works in-memory as a single-node process. This problem has led to the rise of distributed programming languages, but they too are limited as in-memory processes and, as such, cannot process all of your data at today's scale, and moving data for processing is prohibitively expensive. On top of all of this, data scientists must also find convenient ways to deploy their data and models. The whole process is time-consuming. <br><br> VerticaPy aims to solve all of these problems. The idea is simple: instead of moving data around for processing, VerticaPy brings the logic to the data. <br><br> 3+ years in the making, we're proud to bring you VerticaPy. <br><br> Main Advantages:

<ul> <li> Easy Data Exploration.</li> <li> Fast Data Preparation.</li> <li> In-Database Machine Learning.</li> <li> Easy Model Evaluation.</li> <li> Easy Model Deployment.</li> <li> Flexibility of using either Python or SQL.</li> </ul> <p align="center"> <img src='https://raw.githubusercontent.com/vertica/VerticaPy/master/assets/img/architecture.png' width="92%"> </p>

:arrow_up: Back to TOC <br>

Installation

To install <b>VerticaPy</b> with pip:

# Latest release version
root@ubuntu:~$ pip3 install verticapy[all]

# Latest commit on master branch
root@ubuntu:~$ pip3 install git+https://github.com/vertica/verticapy.git@master

To install <b>VerticaPy</b> from source, run the following command from the root directory:

root@ubuntu:~$ python3 setup.py install

A detailed installation guide is available at: <br>

https://www.vertica.com/python/documentation/installation.html
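Because this README targets VerticaPy 1.1.x and later, it can help to confirm which version was actually installed. The following is a minimal sketch that uses only the Python standard library; `installed_version` is a hypothetical helper name, not part of VerticaPy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="verticapy"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed VerticaPy version, or None if it is not installed.
print(installed_version())
```

If the printed value is `None`, the install did not land in the interpreter you are running.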

:arrow_up: Back to TOC <br>

Connecting to the Database

VerticaPy is compatible with several clients. For details, see the <a href='https://www.vertica.com/python/documentation/connection.html'>connection page</a>.<br>
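As a rough sketch of one way to connect, the example below builds a connection dictionary from environment variables and hands it to `vp.new_connection`. The environment variable names, the default values, and the connection name `VerticaDSN` are assumptions made for illustration; check the connection page for the options your client supports.

```python
import os


def build_conn_info(env=None):
    """Collect Vertica connection settings from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    return {
        "host": env.get("VERTICA_HOST", "localhost"),
        "port": int(env.get("VERTICA_PORT", "5433")),  # 5433 is Vertica's default port
        "database": env.get("VERTICA_DB", "demo"),
        "user": env.get("VERTICA_USER", "dbadmin"),
        "password": env.get("VERTICA_PASSWORD", ""),
    }


def connect_to_vertica(name="VerticaDSN"):
    # Imported lazily so build_conn_info() works without VerticaPy installed.
    import verticapy as vp

    # Stores the connection settings under `name` for use by VerticaPy.
    vp.new_connection(build_conn_info(), name=name)
```

Keeping credentials in the environment rather than in the notebook avoids committing passwords alongside analysis code.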

:arrow_up: Back to TOC <br>

Documentation

The easiest and most accurate way to find documentation for a particular function is to use the help function:

import verticapy as vp

help(vp.vDataFrame)

Official documentation is available at: <br>

https://www.vertica.com/python/documentation/

To generate documentation, please look at: <br> https://github.com/mail4umar/VerticaPy/blob/master/docs/Documentation%20Generation.md

:arrow_up: Back to TOC <br>

Use-cases

Examples and case-studies: <br>

https://www.vertica.com/python/examples/

:arrow_up: Back to TOC <br>

Highlighted Features

Themes

VerticaPy lets users customize their coding experience with two themes: Dark and Light.

Dark mode, ideal for night-time coding sessions, features a sleek and stylish dark color scheme, providing a comfortable and eye-friendly environment.

<p align="center"> <img src="https://github.com/vertica/VerticaPy/assets/46414488/8ee0b717-a994-4535-826a-7ca4db3772b5" width="70%"> </p>

On the other hand, Light mode serves as the default theme, offering a clean and bright interface for users who prefer a traditional coding ambiance.

<p align="center"> <img src="https://github.com/vertica/VerticaPy/assets/46414488/24757bfd-4d0f-4e92-9aca-45476d704a33" width="70%"> </p>

To switch themes:

import verticapy as vp

vp.set_option("theme", "dark")  # use "light" to switch back

VerticaPy's theme-switching option ensures that users can tailor their experience to their preferences, making data exploration and analysis a more personalized and enjoyable journey.

:arrow_up: Back to TOC <br>

SQL Magic

You can use VerticaPy to execute SQL queries directly from a Jupyter notebook. For details, see <a href='https://www.vertica.com/python/documentation/1.1.x/html/api/verticapy.jupyter.extensions.sql_magic.sql_magic.html#verticapy.jupyter.extensions.sql_magic.sql_magic'>SQL Magic</a>:

Example

Load the SQL extension.

%load_ext verticapy.sql

Execute your SQL queries.

%%sql
SELECT version();

# Output
# Vertica Analytic Database v24.4-0

:arrow_up: Back to TOC <br>

SQL Plots

You can create interactive, professional plots directly from SQL.

To create plots, simply provide the type of plot along with the SQL command.

Example

%load_ext verticapy.jupyter.extensions.chart_magic
%chart -k pie -c "SELECT pclass, AVG(age) AS av_avg FROM titanic GROUP BY 1;"
<p align="center"> <img src="https://github.com/vertica/VerticaPy/assets/46414488/7616ca04-87d4-4fd7-8cb9-015f48fe3c19" width="50%"> </p>

:arrow_up: Back to TOC <br>

Multiple Database Connection using DBLINK

From a single platform, you can access multiple databases (e.g. PostgreSQL, Vertica, MySQL, in-memory) using SQL and Python.

Example

%%sql
/* Fetch TAIL_NUMBER and CITY after Joining the flight_vertica table with airports table in MySQL database. */
SELECT flight_vertica.TAIL_NUMBER, airports.CITY AS Departing_City
FROM flight_vertica
INNER JOIN &&& airports &&&
ON flight_vertica.ORIGIN_AIRPORT = airports.IATA_CODE;

In the example above, the 'flight_vertica' table is stored in Vertica, whereas the 'airports' table is stored in MySQL. Wrapping a table name in the special symbols "&&&" tells VerticaPy to fetch that table from the associated external database. The best part is that all the aggregation is pushed to the databases (i.e. it is not done in memory)!

For more details on how to set up DBLINK, please visit the GitHub repo. To learn about using DBLINK in VerticaPy, check out the documentation page.
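To make the symbol convention concrete, the sketch below scans a query for table names wrapped in "&&&". It is an illustration only: `external_tables` is a hypothetical helper and does not reflect how VerticaPy itself parses or routes queries.

```python
import re


# Hypothetical helper for illustration only; VerticaPy's own parsing may differ.
def external_tables(query, symbol="&"):
    """Return table names wrapped in three repetitions of `symbol`."""
    marker = re.escape(symbol * 3)
    return re.findall(rf"{marker}\s*(\w+)\s*{marker}", query)


query = """
SELECT flight_vertica.TAIL_NUMBER, airports.CITY AS Departing_City
FROM flight_vertica
INNER JOIN &&& airports &&&
ON flight_vertica.ORIGIN_AIRPORT = airports.IATA_CODE;
"""

# Only 'airports' is marked as living in the external database.
print(external_tables(query))
```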

:arrow_up: Back to TOC <br>

Python and SQL Combo

VerticaPy has a unique place in the market because it allows users to use Python and SQL in the same environment.

Example

import verticapy as vp

selected_titanic = vp.vDataFrame(
    "(SELECT pclass, embarked, AVG(survived) FROM public.titanic GROUP BY 1, 2) x"
)
selected_titanic.groupby(columns=["pclass"], expr=["AVG(AVG)"])
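The example above averages per-group averages: the SQL subquery computes the survival rate per (pclass, embarked) pair, and the Python call then averages those rates per pclass. The sketch below reproduces the same two-step aggregation in plain SQL so the result can be inspected without a Vertica instance. It uses Python's built-in `sqlite3` as a stand-in for Vertica, a made-up six-row table, and an explicit `avg_survived` alias; none of these come from VerticaPy.

```python
import sqlite3

# In-memory SQLite database standing in for Vertica (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titanic (pclass INTEGER, embarked TEXT, survived INTEGER)")

# Made-up toy rows, not the real Titanic dataset.
rows = [
    (1, "S", 1), (1, "S", 0), (1, "C", 1),
    (2, "S", 0), (2, "S", 0), (2, "C", 1),
]
conn.executemany("INSERT INTO titanic VALUES (?, ?, ?)", rows)

# Inner query: average per (pclass, embarked). Outer query: average of those averages.
query = """
SELECT pclass, AVG(avg_survived) AS avg_of_avg
FROM (
    SELECT pclass, embarked, AVG(survived) AS avg_survived
    FROM titanic
    GROUP BY pclass, embarked
) x
GROUP BY pclass
ORDER BY pclass
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 0.75), (2, 0.5)]
```

Note that an average of group averages weights each group equally, so it generally differs from the plain per-class average over all rows.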

:arrow_up: Back to TOC <br>

Charts

VerticaPy comes integrated with three popular plotting libraries: matplotlib, highcharts, and plotly.

A gallery of VerticaPy-generated charts is available at:<br>

https://www.vertica.com/python/documentation/chart.html

<p align="center"> <img src="https://github.com/vertica/VerticaPy/assets/46414488/ac62df51-5f26-4b67-839b-fbd962fbaaea" width="70%"> </p>

:arrow_up: Back to TOC <br>

Complete Machine Learning
