Dfply
dplyr-style piping operations for pandas dataframes
dfply
Version: 0.3.2
Note: Version 0.3.0 is the first big update in a while, and changes a lot of the "base" code. The
pandas-ply package is no longer being imported. I have coded my own version of the "symbolic" objects that I was borrowing from pandas-ply. Also, I am no longer supporting Python 2, sorry!
In v0.3, groupby has been renamed to group_by to mirror the dplyr function. If this breaks your legacy code, one possible fix is to add the line from dfply.group import group_by as groupby to your package imports.
The dfply package makes it possible to do R's dplyr-style data manipulation with pipes
in python on pandas DataFrames.
This is an alternative to pandas-ply
and dplython, which both engineer dplyr
syntax and functionality in python. There are probably more packages that attempt
to enable dplyr-style dataframe manipulation in python, but those are the two I
am aware of.
dfply uses a decorator-based architecture for the piping functionality and
to "categorize" the types of data manipulation functions. The goal of this
architecture is to make dfply concise and easily extensible, simply by chaining
together different decorators that each have a distinct effect on the wrapped
function. There is a more in-depth overview of the decorators and how dfply can be
customized below.
dfply is intended to mimic the functionality of dplyr. The syntax
is the same for the most part, but will vary in some cases as Python is a
considerably different programming language than R.
A good amount of the core functionality of dplyr is complete, and the remainder is
actively being added in. Going forward I hope to incorporate functionality that is
not directly part of dplyr into dfply as well. This is not
intended to be an absolute mimic of dplyr, but instead a port of the ease,
convenience and readability the dplyr package provides for data manipulation
tasks.
Expect frequent updates to the package version as features are added and any bugs are fixed.
- Overview of functions
- Embedded column functions
- Extending dfply with custom functions
- Advanced: understanding base dfply decorators
- Contributing
Overview of functions
The >> and >>= pipe operators
dfply works directly on pandas DataFrames, chaining operations on the data with
the >> operator, or alternatively starting with >>= for inplace operations.
from dfply import *
diamonds >> head(3)
carat cut color clarity depth table price x y z
0 0.23 Ideal E SI2 61.5 55.0 326 3.95 3.98 2.43
1 0.21 Premium E SI1 59.8 61.0 326 3.89 3.84 2.31
2 0.23 Good E VS1 56.9 65.0 327 4.05 4.07 2.31
You can chain piped operations, and of course assign the output to a new DataFrame.
lowprice = diamonds >> head(10) >> tail(3)
lowprice
carat cut color clarity depth table price x y z
7 0.26 Very Good H SI1 61.9 55.0 337 4.07 4.11 2.53
8 0.22 Fair E VS2 65.1 61.0 337 3.87 3.78 2.49
9 0.23 Very Good H VS1 59.4 61.0 338 4.00 4.05 2.39
Inplace operations are done with the first pipe as >>= and subsequent pipes
as >>.
diamonds >>= head(10) >> tail(3)
diamonds
carat cut color clarity depth table price x y z
7 0.26 Very Good H SI1 61.9 55.0 337 4.07 4.11 2.53
8 0.22 Fair E VS2 65.1 61.0 337 3.87 3.78 2.49
9 0.23 Very Good H VS1 59.4 61.0 338 4.00 4.05 2.39
When using the inplace pipe, the DataFrame is not required on the left hand
side of the >>= pipe and the DataFrame variable is overwritten with the
output of the operations.
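To make the pipe behaviour described above more concrete, here is a minimal sketch of how a >> pipe can be built in plain Python. It is not dfply's actual implementation: the Pipe class and these head and tail helpers are hypothetical stand-ins, and a plain list takes the place of a DataFrame. The sketch relies on Python's reflected operator __rrshift__, which is called when the left operand does not handle >> itself.

```python
class Pipe:
    """Hypothetical stand-in for a pipeable function; not dfply's own class."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, data):
        # Called for `data >> pipe` when `data` does not define `>>` itself.
        return self.func(data)

    def __rshift__(self, other):
        # `pipe_a >> pipe_b` composes two steps into a single step. This is
        # what lets `data >>= pipe_a >> pipe_b` work, because Python builds
        # the right-hand side before applying `>>=`.
        return Pipe(lambda data: other.func(self.func(data)))


def head(n):
    # Keep the first n rows.
    return Pipe(lambda rows: rows[:n])


def tail(n):
    # Keep the last n rows.
    return Pipe(lambda rows: rows[-n:])


rows = list(range(20))

lowest = rows >> head(10) >> tail(3)
print(lowest)  # [7, 8, 9]

# The inplace form rebinds the variable to the result of the chain.
rows >>= head(10) >> tail(3)
print(rows)  # [7, 8, 9]
```

In this sketch the inplace form needs no special method: Python falls back from >>= to the ordinary >> machinery and then assigns the result to the name on the left.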
The X DataFrame symbol
The DataFrame as it is passed through the piping operations is represented
by the symbol X. It records the actions you want to take (represented by
the Intention class), but does not evaluate them until the appropriate time.
Operations on the DataFrame are deferred. Selecting
two of the columns, for example, can be done using the symbolic X DataFrame
during the piping operations.
diamonds >> select(X.carat, X.cut) >> head(3)
carat cut
0 0.23 Ideal
1 0.21 Premium
2 0.23 Good
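The deferred evaluation described above can be illustrated with a small sketch. The Deferred class below is a hypothetical, heavily simplified stand-in for the idea behind the Intention class, not dfply's real code, and a dict of lists stands in for a DataFrame. Each operation on the symbol returns a new object that remembers what to do, and nothing runs until evaluate is called with actual data.

```python
class Deferred:
    """Hypothetical, simplified stand-in for a deferred 'symbolic' object."""

    def __init__(self, func=lambda data: data):
        self._func = func

    def __getattr__(self, name):
        # Record "look up column `name`" without touching any data yet.
        return Deferred(lambda data: self._func(data)[name])

    def __gt__(self, other):
        # Record an element-wise comparison to run later.
        return Deferred(
            lambda data: [value > other for value in self._func(data)]
        )

    def evaluate(self, data):
        # Only now is the recorded chain of operations applied.
        return self._func(data)


X = Deferred()

# Building the expression does not need any data.
expr = X.carat > 0.22

table = {
    "carat": [0.23, 0.21, 0.23],
    "cut": ["Ideal", "Premium", "Good"],
}

print(expr.evaluate(table))   # [True, False, True]
print(X.cut.evaluate(table))  # ['Ideal', 'Premium', 'Good']
```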
Selecting and dropping
select() and drop() functions
There are two selection functions that are inverses of each other: select and
drop. The select and drop functions accept string labels, integer positions,
and/or symbolically represented column names (X.column). They also accept symbolic "selection
filter" functions, which will be covered shortly.
The example below selects "cut", "price", "x", and "y" from the diamonds dataset.
diamonds >> select(1, X.price, ['x', 'y']) >> head(2)
cut price x y
0 Ideal 326 3.95 3.98
1 Premium 326 3.89 3.84
If you were instead to use drop, you would get back all columns besides those specified.
diamonds >> drop(1, X.price, ['x', 'y']) >> head(2)
carat color clarity depth table z
0 0.23 E SI2 61.5 55.0 2.43
1 0.21 E SI1 59.8 61.0 2.31
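As a rough illustration of how mixed selectors like the ones above could be resolved to column names, here is a sketch in plain Python. The resolve helper is hypothetical and is not part of dfply; plain strings stand in for symbolic columns such as X.price, and the column list is copied from the diamonds output shown earlier.

```python
COLUMNS = ["carat", "cut", "color", "clarity", "depth",
           "table", "price", "x", "y", "z"]


def resolve(columns, *selectors):
    """Turn a mix of labels, integer positions, and lists into column names."""
    names = []
    for selector in selectors:
        if isinstance(selector, (list, tuple)):
            # Nested lists are flattened recursively.
            names.extend(resolve(columns, *selector))
        elif isinstance(selector, int):
            # Integers are treated as positions in the column list.
            names.append(columns[selector])
        else:
            # Anything else is treated as a column label.
            names.append(str(selector))
    return names


selected = resolve(COLUMNS, 1, "price", ["x", "y"])
print(selected)  # ['cut', 'price', 'x', 'y']

# Dropping is the complement: every column that was not selected.
dropped = [name for name in COLUMNS if name not in selected]
print(dropped)  # ['carat', 'color', 'clarity', 'depth', 'table', 'z']
```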
Selection using the inversion ~ operator on symbolic columns
One particularly nice thing about dplyr's selection functions is that you can
drop columns inside of a select statement by putting a subtraction sign in front,
like so: ... %>% select(-col). The same can be done in dfply, but instead of
the subtraction operator you use the tilde ~.
For example, let's say I wanted to select any column except carat, color, and
clarity in my dataframe. One way to do this is to specify those for removal using
the ~ operator like so:
diamonds >> select(~X.carat, ~X.color, ~X.clarity) >> head(2)
cut depth table price x y z
0 Ideal 61.5 55.0 326 3.95 3.98 2.43
1 Premium 59.8 61.0 326 3.89 3.84 2.31
Note that if you are going to use the inversion operator, you must place it
prior to the symbolic X (or a symbolic such as a selection filter function, covered
next). For example, using the inversion operator on a list of columns will
result in an error:
diamonds >> select(~[X.carat, X.color, X.clarity]) >> head(2)
TypeError: bad operand type
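The reason the tilde works on a symbolic column but not on a list can be shown with a short sketch. The Column class and this select function are hypothetical and only mimic the behaviour described above; they are not dfply's implementation. Python's ~ operator calls the __invert__ method of its operand, which a custom class can define but the built-in list does not.

```python
COLUMNS = ["carat", "cut", "color", "clarity", "depth",
           "table", "price", "x", "y", "z"]


class Column:
    """Hypothetical symbolic column reference that supports `~`."""

    def __init__(self, name, inverted=False):
        self.name = name
        self.inverted = inverted

    def __invert__(self):
        # `~column` returns a copy flagged for exclusion.
        return Column(self.name, not self.inverted)


def select(columns, *selectors):
    excluded = {s.name for s in selectors if s.inverted}
    included = [s.name for s in selectors if not s.inverted]
    # With only inverted selectors, start from every column.
    base = included if included else list(columns)
    return [name for name in base if name not in excluded]


kept = select(COLUMNS, ~Column("carat"), ~Column("color"), ~Column("clarity"))
print(kept)  # ['cut', 'depth', 'table', 'price', 'x', 'y', 'z']

# A list has no __invert__, so applying ~ to it raises TypeError.
try:
    ~[Column("carat"), Column("color"), Column("clarity")]
    list_error = None
except TypeError as exc:
    list_error = str(exc)
print(list_error)
```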
