Rumble

Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more

RumbleDB

With RumbleDB, you can easily query many nested, heterogeneous data formats such as JSON, CSV, Parquet, Avro, LibSVM, and text.

RumbleDB exposes a query language rather than a DataFrame API. This gives you more flexibility and productivity, and it matters because a lot of data simply does not fit in DataFrames.
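To make the DataFrame limitation concrete, here is a minimal sketch in plain Python (standard library only). It is not RumbleDB's API and not JSONiq. It only illustrates the kind of heterogeneous records the paragraph above refers to. The sample data and the helper names (`load_records`, `tags_of`, `type_profile`) are hypothetical and chosen for illustration.

```python
import json

# Hypothetical JSON Lines sample: "tags" is a string, an array, or absent,
# and "size" is sometimes a number and sometimes a string.
RAW = """\
{"name": "alpha", "tags": "x", "size": 3}
{"name": "beta", "tags": ["x", "y"], "size": "large"}
{"name": "gamma", "size": 7, "extra": {"nested": true}}
"""


def load_records(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def tags_of(record):
    """Normalize 'tags' to a list, whatever shape it arrived in."""
    tags = record.get("tags", [])
    return tags if isinstance(tags, list) else [tags]


def type_profile(records, field):
    """Collect the type names seen for a field; absent fields count as 'missing'."""
    return {
        type(r[field]).__name__ if field in r else "missing"
        for r in records
    }


records = load_records(RAW)

# A filter that tolerates all three shapes of "tags".
names_with_x = [r["name"] for r in records if "x" in tags_of(r)]

print(names_with_x)
print(sorted(type_profile(records, "tags")))
print(sorted(type_profile(records, "size")))
```

No single column type describes `tags` or `size` here, so a fixed tabular schema would have to drop or coerce values. A query language designed for nested, heterogeneous data can work on such records directly.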

You can query your data in place, from a local file system or from a data lake (Azure Blob Storage, Amazon S3, HDFS, etc.).

You can prepare, clean up, and validate your data, then feed it directly into your machine learning pipelines with RumbleDB ML.

Getting started: you will find a Jupyter notebook that introduces the JSONiq language on top of RumbleDB here. You can also run it locally if you prefer.

The documentation also contains an introduction specific to RumbleDB that explains how to read input datasets, but we have not converted it to Jupyter notebooks yet (this will follow).

The documentation of the latest official release is available here.

Contributors (Ghislain Fourny's students at ETH): Stefan Irimescu, Renato Marroquin, Rodrigo Bruno, Falko Noé, Ioana Stefan, Andrea Rinaldi, Stevan Mihajlovic, Mario Arduini, Can Berker Çıkış, Elwin Stephan, David Dao, Zirun Wang, Ingo Müller, Dan-Ovidiu Graur, Thomas Zhou, Olivier Goerens, Alexandru Meterez, Pierre Motard, Remo Röthlisberger, Dominik Bruggisser, David Loughlin, David Buzatu, Marco Schöb, Maciej Byczko, Abishek Ramdas, Matteo Agnoletto, Dwij Dixit, Omar Hammoud, Henrik Pätzold.