Congruity

The goal of this library is to provide a compatibility layer that makes it easier to adopt Spark Connect. The library is designed to be simply imported in your application and will then monkey-patch the existing API to provide the legacy functionality.
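To make the monkey-patching idea concrete, here is a minimal sketch of the general technique in plain Python. `Frame`, `legacy_count`, and `apply_patch` are hypothetical names chosen for illustration; they are not part of congruity or PySpark, and this is not how the library itself is implemented.

```python
class Frame:
    """Hypothetical class standing in for an API that lacks a legacy method."""

    def __init__(self, rows):
        self.rows = list(rows)


def legacy_count(self):
    # Legacy-style method implemented on top of what Frame already offers.
    return len(self.rows)


def apply_patch():
    # Attach the method to the existing class at runtime.
    # A patch module would typically run this once when it is imported.
    Frame.legacy_count = legacy_count


frame = Frame([1, 2, 3])
before = hasattr(frame, "legacy_count")  # the method does not exist yet
apply_patch()
after = frame.legacy_count()  # existing instances pick up the new method
print(before, after)
```

Because the method is attached to the class rather than to an instance, objects created before the patch also gain it, which is why a single import can be enough.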

Migrating a classic Spark application, which uses the full power and flexibility of the Spark APIs, to one that uses only the Spark Connect compatible DataFrame API can be challenging in many ways.

Non-Goals

This library is not intended to be a long-term solution. The goal is to provide a compatibility layer that becomes obsolete over time. In addition, we do not aim to provide compatibility for all methods and features, only for a select subset. Lastly, we do not aim to match the performance of the native RDD APIs.

Usage

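congruity is a compatibility library that bridges Spark JVM and Spark Connect. Install it with pip:

pip install spark-congruity

Then import the patch module in your application:

import congruity.patch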

Example

Here is code that works on Spark JVM:

from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
spark.sparkContext.parallelize(data).toDF()

This code doesn't work with Spark Connect out of the box. The congruity library patches the API under the hood, so the old syntax works on Spark Connect clusters as well:

import congruity.patch  # noqa: F401
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
spark.sparkContext.parallelize(data).toDF()
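One way to picture what such a patch could do for `parallelize(data).toDF()` is to route the call to `createDataFrame`, which is part of the DataFrame API. The sketch below uses stand-in classes so that it runs without a cluster; `FakeConnectSession`, `RDDAdapter`, and `ContextAdapter` are hypothetical names, and this is an illustration of the idea rather than congruity's actual implementation.

```python
class FakeConnectSession:
    """Stand-in for a Spark Connect session (hypothetical, not PySpark)."""

    def createDataFrame(self, rows):
        # A real session would return a DataFrame; a list of dicts is enough here.
        return [{f"_{i + 1}": value for i, value in enumerate(row)} for row in rows]


class RDDAdapter:
    """Holds local data and defers to the DataFrame API when asked."""

    def __init__(self, session, rows):
        self._session = session
        self._rows = list(rows)

    def toDF(self):
        return self._session.createDataFrame(self._rows)


class ContextAdapter:
    """Offers the old-style parallelize() entry point."""

    def __init__(self, session):
        self._session = session

    def parallelize(self, rows):
        return RDDAdapter(self._session, rows)


# The "patch": expose a sparkContext attribute on the session class.
FakeConnectSession.sparkContext = property(lambda self: ContextAdapter(self))

spark = FakeConnectSession()
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
df = spark.sparkContext.parallelize(data).toDF()
print(df[0])
```

The calling code keeps the legacy shape, while the work is done by the one method the stand-in session supports.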

Contributing

We very much welcome contributions to this project. The easiest way to start is to pick any of the RDD or SparkContext methods listed below and implement its compatibility layer. Once you have done that, open a pull request and we will review it.
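As a rough idea of what a contribution could look like, a missing method can sometimes be expressed through one that is already supported. The sketch below builds `flatMap` on top of `mapPartitions` using a hypothetical single-partition stand-in class, `MiniRDD`. It does not use congruity's internal classes, whose structure is not described here, so treat it as a pattern rather than a ready-made patch.

```python
from itertools import chain


class MiniRDD:
    """Hypothetical stand-in with a single partition; not congruity's RDD class."""

    def __init__(self, rows):
        self._rows = list(rows)

    def mapPartitions(self, func):
        # func receives an iterator over one partition and returns an iterable.
        return MiniRDD(func(iter(self._rows)))

    def collect(self):
        return list(self._rows)


def flat_map(self, func):
    # Apply func to each element and flatten the results, one partition at a time.
    return self.mapPartitions(lambda part: chain.from_iterable(map(func, part)))


# Attach the new method the same way a compatibility patch might.
MiniRDD.flatMap = flat_map

words = MiniRDD(["a b", "c"]).flatMap(lambda line: line.split(" ")).collect()
print(words)
```

Reusing an existing primitive keeps the new method small and means it inherits whatever behavior the primitive already has.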

What's supported?

RDD

| RDD | API | Comment |
|-----------------------------------|--------------------|-------------------------------------------------------------------|
| aggregate | :white_check_mark: | |
| aggregateByKey | :x: | |
| barrier | :x: | |
| cache | :x: | |
| cartesian | :x: | |
| checkpoint | :x: | |
| cleanShuffleDependencies | :x: | |
| coalesce | :x: | |
| cogroup | :x: | |
| collect | :white_check_mark: | |
| collectAsMap | :white_check_mark: | |
| collectWithJobGroup | :x: | |
| combineByKey | :x: | |
| count | :white_check_mark: | |
| countApprox | :x: | |
| countByKey | :x: | |
| countByValue | :x: | |
| distinct | :x: | |
| filter | :white_check_mark: | |
| first | :white_check_mark: | |
| flatMap | :x: | |
| fold | :white_check_mark: | First version |
| foreach | :x: | |
| foreachPartition | :x: | |
| fullOuterJoin | :x: | |
| getCheckpointFile | :x: | |
| getNumPartitions | :x: | |
| getResourceProfile | :x: | |
| getStorageLevel | :x: | |
| glom | :white_check_mark: | |
| groupBy | :white_check_mark: | |
| groupByKey | :white_check_mark: | |
| groupWith | :x: | |
| histogram | :white_check_mark: | |
| id | :x: | |
| intersection | :x: | |
| isCheckpointed | :x: | |
| isEmpty | :white_check_mark: | |
| isLocallyCheckpointed | :x: | |
| join | :x: | |
| keyBy | :white_check_mark: | |
| keys | :white_check_mark: | |
| leftOuterJoin | :x: | |
| localCheckpoint | :x: | |
| lookup | :x: | |
| map | :white_check_mark: | |
| mapPartitions | :white_check_mark: | First version, based on mapInArrow. |
| mapPartitionsWithIndex | :x: | |
| mapPartitionsWithSplit | :x: | |
| mapValues | :white_check_mark: | |
| max | :white_check_mark: | |
| mean | :white_check_mark: | |
| meanApprox | :x: | |
| min | :white_check_mark: | |
| name | :x: | |
| partitionBy | :x: | |
| persist | :x: | |
| pipe | :x: | |
| rand
