Numpysane

more-reasonable core functionality for numpy


* TALK
I just gave a talk about this at [[https://www.socallinuxexpo.org/scale/18x][SCaLE 18x]]. Here are the [[https://www.youtube.com/watch?v=YOOapXNtUWw][video of the talk]] and the [[https://github.com/dkogan/talk-numpysane-gnuplotlib/raw/master/numpysane-gnuplotlib.pdf]["slides"]].

* NAME
numpysane: more-reasonable core functionality for numpy

* SYNOPSIS
#+BEGIN_EXAMPLE
>>> import numpy as np
>>> import numpysane as nps

>>> a   = np.arange(6).reshape(2,3)
>>> b   = a + 100
>>> row = a[0,:] + 1000

>>> a
array([[0, 1, 2],
       [3, 4, 5]])

>>> b
array([[100, 101, 102],
       [103, 104, 105]])

>>> row
array([1000, 1001, 1002])

>>> nps.glue(a,b, axis=-1)
array([[  0,   1,   2, 100, 101, 102],
       [  3,   4,   5, 103, 104, 105]])

>>> nps.glue(a,b,row, axis=-2)
array([[   0,    1,    2],
       [   3,    4,    5],
       [ 100,  101,  102],
       [ 103,  104,  105],
       [1000, 1001, 1002]])

>>> nps.cat(a,b)
array([[[  0,   1,   2],
        [  3,   4,   5]],

       [[100, 101, 102],
        [103, 104, 105]]])

>>> @nps.broadcast_define( (('n',), ('n',)) )
... def inner_product(a, b):
...     return a.dot(b)

>>> inner_product(a,b)
array([ 305, 1250])
#+END_EXAMPLE

* DESCRIPTION

Numpy is a very widely used toolkit for numerical computation in Python. Despite its popularity, some of its core functionality is mysterious and/or incomplete. The numpysane library seeks to fill those gaps by providing its own replacement routines. Many of the replacement functions are direct translations from PDL (http://pdl.perl.org), a numerical computation library for Perl. The functions provided by this module fall into three broad categories:

  • Broadcasting support
  • Nicer array manipulation
  • Basic linear algebra

** Broadcasting
Numpy has limited support for broadcasting (http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html), a generic way to vectorize functions. A broadcasting-aware function knows the dimensionality of its inputs, and any extra dimensions in the input are automatically used for vectorization.

*** Broadcasting rules
A basic example is an inner product: a function that takes in two identically-sized 1-dimensional arrays (input prototype (('n',), ('n',)) ) and returns a scalar (output prototype () ). If one calls a broadcasting-aware inner product with two arrays of shape (2,3,4) as input, it would compute 6 inner products of length-4 each, and report the output in an array of shape (2,3).
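The shapes in this example can be checked with stock numpy. This is a minimal sketch that uses np.einsum as a stand-in for a broadcasting-aware inner product; it is not how numpysane implements broadcasting, and the array contents are arbitrary.

```python
import numpy as np

# Two inputs of shape (2,3,4). The trailing dimension (length 4) matches the
# prototype ('n',); the leading (2,3) dimensions are used for vectorization
a = np.arange(24).reshape(2, 3, 4)
b = a + 100

# einsum stands in for a broadcasting-aware inner product here: it sums over
# the trailing axis and keeps the leading ones
out = np.einsum('...i,...i->...', a, b)

print(out.shape)   # (2, 3): 6 inner products of length 4 each
```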

In short:

  • The most significant dimension in a numpy array is the LAST one, so the prototype of an input argument must exactly match a given input's trailing shape. So a prototype shape of (a,b,c) accepts an argument shape of (......, a,b,c), with as many or as few leading dimensions as desired.
  • The extra leading dimensions must be compatible across all the inputs. This means that each leading dimension must either
    • equal 1
    • be missing (thus assumed to equal 1)
    • equal some positive integer >1, consistent across all arguments
  • The output is collected into an array whose leading dimensions are the broadcasted extra (above-prototype) dimensions of the arguments, followed by the shape of each output slice

More involved example: A function with input prototype ( (3,), ('n',3), ('n',), ('m',) ) given inputs of shape

#+BEGIN_EXAMPLE
(1,5,    3)
(2,1,  8,3)
(        8)
(  5,    9)
#+END_EXAMPLE

will return an output array of shape (2,5, ...), where ... is the shape of each output slice. Note again that the prototype dictates the TRAILING shape of the inputs.
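The (2,5) result can be reproduced by hand with stock numpy. This is a sketch of the shape bookkeeping only, not numpysane's own code: the variable names are illustrative, and np.broadcast_shapes assumes numpy 1.20 or newer.

```python
import numpy as np

prototype = ((3,), ('n', 3), ('n',), ('m',))
shapes    = ((1, 5, 3), (2, 1, 8, 3), (8,), (5, 9))

# Strip the trailing dimensions claimed by each prototype. What remains are
# the extra leading dimensions used for broadcasting
extra_dims = [shape[:len(shape) - len(proto)]
              for shape, proto in zip(shapes, prototype)]
print(extra_dims)    # [(1, 5), (2, 1), (), (5,)]

# Stock numpy applies the same compatibility rules to plain shapes
outer_shape = np.broadcast_shapes(*extra_dims)
print(outer_shape)   # (2, 5)
```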

*** What about the stock broadcasting support?

The numpy documentation dedicates a whole page to explaining the broadcasting rules, but only a small number of numpy functions provide any broadcasting support. This support is inconsistent: most functions have none, and their documentation makes no mention of it. As a result, broadcasting is not a prominent part of the numpy ecosystem, and there's little user awareness that it exists.

*** What this module provides
This module contains functionality to make any arbitrary function broadcastable, in either C or Python. In both cases, the input and output prototypes are declared, and these are used for shape-checking and vectorization each time the function is called.

The functions can have either

  • A single output, returned as a numpy array. The output specification in the prototype is a single shape tuple
  • Multiple outputs, returned as a tuple of numpy arrays. The output specification in the prototype is a tuple of shape tuples

*** Broadcasting in python
This is invoked as a decorator, applied to any function. An example:

#+BEGIN_EXAMPLE
>>> import numpysane as nps

>>> @nps.broadcast_define( (('n',), ('n',)) )
... def inner_product(a, b):
...     return a.dot(b)
#+END_EXAMPLE

Here we have a simple inner product function to compute ONE inner product. The 'broadcast_define' decorator adds broadcasting-awareness: 'inner_product()' expects two 1D vectors of length 'n' each (same 'n' for the two inputs), vectorizing extra dimensions, as needed. The inputs are shape-checked, and incompatible dimensions will trigger an exception. Example:

#+BEGIN_EXAMPLE
>>> import numpy as np

>>> a = np.arange(6).reshape(2,3)
>>> b = a + 100

>>> a
array([[0, 1, 2],
       [3, 4, 5]])

>>> b
array([[100, 101, 102],
       [103, 104, 105]])

>>> inner_product(a,b)
array([ 305, 1250])
#+END_EXAMPLE
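For comparison, the same result can be written as an explicit loop in plain numpy. This sketch only illustrates what the broadcasted call computes for these particular inputs; the helper name inner_product_one is hypothetical, and the loop is not a description of how broadcast_define works internally.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = a + 100

def inner_product_one(a, b):
    # Computes ONE inner product of two 1D vectors
    return a.dot(b)

# Explicit loop over the single extra leading dimension (length 2)
looped = np.array([inner_product_one(a[i], b[i])
                   for i in range(a.shape[0])])
print(looped)   # [ 305 1250]
```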

Another related function in this module is broadcast_generate(). It's similar to broadcast_define(), but instead of adding broadcasting-awareness to an existing function, it returns a generator that produces tuples from a set of arguments according to a given prototype. Similarly, broadcast_extra_dims() is available to report the outer shape of a potential broadcasting operation.

Stock numpy has some rudimentary support for all this with its vectorize() function, but it assumes only scalar inputs and outputs, which severely limits its usefulness. See the docstrings for 'broadcast_define' and 'broadcast_generate' in the INTERFACE section below for usage details.

*** Broadcasting in C
The python broadcasting is useful, but it is a python loop, so the loop itself is computationally expensive if we have many iterations. If the function being wrapped is available in C, we can apply broadcasting awareness in C, which makes a much faster loop.

The "numpysane_pywrap" module generates code to wrap arbitrary C code in a broadcasting-aware wrapper callable from python. This is an analogue of PDL::PP (http://pdl.perl.org/PDLdocs/PP.html). This generated code is compiled and linked into a python extension module, as usual. This functionality is documented separately: https://github.com/dkogan/numpysane/blob/master/README-pywrap.org

After I wrote this, I realized there is some support for this in stock numpy:

https://docs.scipy.org/doc/numpy-1.13.0/reference/c-api.ufunc.html

Note: I have not tried using these APIs.

** Nicer array manipulation
Numpy functions that move dimensions around and concatenate matrices are unintuitive. For instance, a simple concatenation of a row-vector or a column-vector to a matrix requires arcane knowledge to accomplish reliably. This module provides new functions that can be used for these basic operations. These new functions do have well-defined and sensible behavior, and they largely come from the interfaces in PDL (http://pdl.perl.org). These all respect the core rules of numpy broadcasting:

  • LEADING length-1 dimensions don't affect the meaning of an array, so the routines handle missing or extra length-1 dimensions at the front

  • The inner-most dimensions of an array are the TRAILING ones, so whenever an axis specification is used, it is strongly recommended (sometimes required) to count the axes from the back by passing in axis<0

A high level description of the functionality is given here, and each function is described in detail in the INTERFACE section below. In the following examples, I use a function "arr" that returns a numpy array with given dimensions:

#+BEGIN_EXAMPLE
>>> from functools import reduce
>>> import numpy

>>> def arr(*shape):
...     product = reduce( lambda x,y: x*y, shape)
...     return numpy.arange(product).reshape(*shape)

>>> arr(1,2,3)
array([[[0, 1, 2],
        [3, 4, 5]]])

>>> arr(1,2,3).shape
(1, 2, 3)
#+END_EXAMPLE

*** Concatenation
This module provides two functions to do this

**** glue
Concatenates some number of arrays along a given axis ('axis' must be given in a kwarg). Implicit length-1 dimensions are added at the start as needed. Dimensions other than the glueing axis must match exactly. Basic usage:

#+BEGIN_EXAMPLE
>>> row_vector = arr(  3,)
>>> col_vector = arr(5,1,)
>>> matrix     = arr(5,3,)

>>> numpysane.glue(matrix, row_vector, axis = -2).shape
(6,3)

>>> numpysane.glue(matrix, col_vector, axis = -1).shape
(5,4)
#+END_EXAMPLE
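To show what glue saves the caller from, here is a sketch of the same two concatenations in stock numpy. It assumes the same shapes as above and builds the arrays directly instead of using the arr() helper. The row vector has to be given its leading length-1 dimension by hand.

```python
import numpy as np

row_vector = np.arange(3)                  # shape (3,)
col_vector = np.arange(5).reshape(5, 1)    # shape (5,1)
matrix     = np.arange(15).reshape(5, 3)   # shape (5,3)

# np.concatenate needs equal numbers of dimensions, so the row vector gets
# an explicit leading length-1 dimension first
with_row = np.concatenate((matrix, row_vector[np.newaxis, :]), axis=-2)
print(with_row.shape)   # (6, 3)

with_col = np.concatenate((matrix, col_vector), axis=-1)
print(with_col.shape)   # (5, 4)
```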

**** cat
Concatenate some number of arrays along a new leading axis. Implicit length-1 dimensions are added, and the logical shapes of the inputs must match. This function is a logical inverse of numpy array iteration: iteration splits an array over its leading dimension, while cat joins a number of arrays via a new leading dimension. Basic usage:

#+BEGIN_EXAMPLE
>>> numpysane.cat(arr(5,), arr(5,)).shape
(2,5)

>>> numpysane.cat(arr(5,), arr(1,1,5,)).shape
(2,1,1,5)
#+END_EXAMPLE
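A rough stock-numpy counterpart, given only as a sketch for comparison: np.stack joins arrays along a new leading axis, but the inputs must first be brought to a common shape by hand. np.broadcast_arrays is used for that here; note that it is more permissive than the shape matching described above, since it would also stretch a length-1 dimension to a longer one.

```python
import numpy as np

x = np.arange(5)                    # shape (5,)
y = np.arange(5).reshape(1, 1, 5)   # shape (1,1,5)

# Identically-shaped inputs stack directly
same = np.stack((x, x))
print(same.shape)    # (2, 5)

# Inputs that differ in leading length-1 dimensions must be aligned first
mixed = np.stack(np.broadcast_arrays(x, y))
print(mixed.shape)   # (2, 1, 1, 5)
```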

*** Manipulation of dimensions
Several functions are available, all being fairly direct ports of their PDL (http://pdl.perl.org) equivalents.

**** clump
Reshapes the array by grouping together 'n' dimensions, where 'n' is given in a kwarg. If 'n' > 0, then n leading dimensions are clumped; if 'n' < 0, then -n trailing dimensions are clumped. Basic usage:

#+BEGIN_EXAMPLE
>>> numpysane.clump( arr(2,3,4), n = -2).shape
(2, 12)

>>> numpysane.clump( arr(2,3,4), n = 2).shape
(6, 4)
#+END_EXAMPLE
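The two results above correspond to the following reshape calls in stock numpy. This is a sketch of the equivalent shape arithmetic for this one input, not clump's implementation.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# n = -2: merge the 2 trailing dimensions (3,4) into one of length 12
trailing = x.reshape(x.shape[:-2] + (-1,))
print(trailing.shape)   # (2, 12)

# n = 2: merge the 2 leading dimensions (2,3) into one of length 6
leading = x.reshape((-1,) + x.shape[2:])
print(leading.shape)    # (6, 4)
```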

**** atleast_dims
Adds length-1 dimensions at the front of an array so that all the given dimensions are in-bounds. Any axis<0 may expand the shape. Adding new leading dimensions (axis>=0) is never useful, since numpy broadcast
