bitarray: efficient arrays of booleans
This library provides an object type which efficiently represents an array of booleans. Bitarrays are sequence types and behave very much like usual lists. Eight bits are represented by one byte in a contiguous block of memory. The user can select between two representations: little-endian and big-endian. All functionality is implemented in C. Methods for accessing the machine representation are provided, including the ability to import and export buffers. This allows creating bitarrays that are mapped to other objects, including memory-mapped files.
Key features
- The bit-endianness can be specified for each bitarray object, see below.
- Sequence methods: slicing (including slice assignment and deletion), operations ``+``, ``*``, ``+=``, ``*=``, the ``in`` operator, ``len()``
- Bitwise operations: ``~``, ``&``, ``|``, ``^``, ``<<``, ``>>`` (as well as their in-place versions ``&=``, ``|=``, ``^=``, ``<<=``, ``>>=``).
- Fast methods for encoding and decoding variable bit length prefix codes.
- Bitarray objects support the buffer protocol (both importing and exporting buffers).
- Packing and unpacking to other binary data formats, e.g. ``numpy.ndarray``.
- Pickling and unpickling of bitarray objects.
- Immutable ``frozenbitarray`` objects which are hashable
- Sequential search
- Type hinting
- Extensive test suite with about 600 unittests
- Utility module ``bitarray.util``:

  - conversion to and from hexadecimal strings
  - generating random bitarrays
  - pretty printing
  - conversion to and from integers
  - creating Huffman codes
  - compression of sparse bitarrays
  - (de-) serialization
  - various count functions
  - other helpful functions
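As a brief illustration of some of the features listed above, the following sketch uses a few helpers from ``bitarray.util`` together with ``frozenbitarray``. It assumes a recent bitarray release (one in which ``ba2hex()`` returns a ``str``). The endianness is passed explicitly so the results do not depend on the default setting.

```python
from bitarray import bitarray, frozenbitarray
from bitarray.util import ba2hex, hex2ba, ba2int, int2ba

# Hexadecimal conversion (the length must be a multiple of 4).
a = bitarray('11110001', endian='big')
hex_str = ba2hex(a)
restored = hex2ba(hex_str, endian='big')

# Integer conversion: with big-endian bitarrays the first bit is
# the most significant one.
number = ba2int(bitarray('1011', endian='big'))
back = int2ba(number, endian='big')

# frozenbitarray objects are immutable and hashable, so they can
# serve as dictionary keys.
lookup = {frozenbitarray('101'): 'five'}
```

The round trips through ``hex2ba()`` and ``int2ba()`` recover the original bit patterns, and a new ``frozenbitarray('101')`` finds the same dictionary entry because equal frozen bitarrays hash equally.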
Installation
Python wheels are available on PyPI for all major platforms and Python versions, which means you can simply run:
.. code-block:: shell-session
$ pip install bitarray
Once you have installed the package, you may want to test it:
.. code-block:: shell-session
$ python -c 'import bitarray; bitarray.test()'
bitarray is installed in: /Users/ilan/bitarray/bitarray
bitarray version: 3.8.0
sys.version: 3.13.5 (main, Jun 16 2025) [Clang 18.1.8]
sys.prefix: /Users/ilan/miniforge
pointer size: 64 bit
sizeof(size_t): 8
sizeof(bitarrayobject): 80
HAVE_BUILTIN_BSWAP64: 1
default bit-endianness: big
machine byte-order: little
Py_GIL_DISABLED: 0
Py_DEBUG: 0
DEBUG: 0
.........................................................................
.........................................................................
................................................................
----------------------------------------------------------------------
Ran 595 tests in 0.165s
OK
The test() function is part of the API. It will return
a unittest.runner.TextTestResult object, such that one can verify that
all tests ran successfully by:
.. code-block:: python
import bitarray
assert bitarray.test().wasSuccessful()
Usage
As mentioned above, bitarray objects behave very much like lists, so there is not too much to learn. The biggest difference from list objects (apart from bitarrays being homogeneous) is the ability to access the machine representation of the object. When doing so, the bit-endianness matters; this issue is explained in detail in the section below. Here, we demonstrate the basic usage of bitarray objects:
.. code-block:: python
>>> from bitarray import bitarray
>>> a = bitarray() # create empty bitarray
>>> a.append(1)
>>> a.extend([1, 0])
>>> a
bitarray('110')
>>> x = bitarray(2 ** 20) # bitarray of length 1048576 (initialized to 0)
>>> len(x)
1048576
>>> bitarray('1001 011') # initialize from string (whitespace is ignored)
bitarray('1001011')
>>> lst = [1, 0, False, True, True]
>>> a = bitarray(lst) # initialize from iterable
>>> a
bitarray('10011')
>>> a[2] # indexing a single item will always return an integer
0
>>> a[2:4] # whereas indexing a slice will always return a bitarray
bitarray('01')
>>> a[2:3] # even when the slice length is just one
bitarray('0')
>>> a.count(1)
3
>>> a.remove(0) # removes first occurrence of 0
>>> a
bitarray('1011')
Like lists, bitarray objects support slice assignment and deletion:
.. code-block:: python
>>> a = bitarray(50)
>>> a.setall(0) # set all elements in a to 0
>>> a[11:37:3] = 9 * bitarray('1')
>>> a
bitarray('00000000000100100100100100100100100100000000000000')
>>> del a[12::3]
>>> a
bitarray('0000000000010101010101010101000000000')
>>> a[-6:] = bitarray('10011')
>>> a
bitarray('000000000001010101010101010100010011')
>>> a += bitarray('000111')
>>> a[9:]
bitarray('001010101010101010100010011000111')
In addition, a Boolean can be assigned to a slice, which sets every selected element to that value:
.. code-block:: python
>>> a = 20 * bitarray('0')
>>> a[1:15:3] = True
>>> a
bitarray('01001001001001000000')
This is easier and faster than:
.. code-block:: python
>>> a = 20 * bitarray('0')
>>> a[1:15:3] = 5 * bitarray('1')
>>> a
bitarray('01001001001001000000')
Note that in the latter case we have to create a temporary bitarray whose length must be known or calculated. Another example of assigning a Boolean to a slice is setting ranges:
.. code-block:: python
>>> a = bitarray(30)
>>> a[:] = 0 # set all elements to 0 - equivalent to a.setall(0)
>>> a[10:25] = 1 # set elements in range(10, 25) to 1
>>> a
bitarray('000000000011111111111111100000')
As of bitarray version 2.8, indices may also be lists of arbitrary
indices (like in NumPy), or bitarrays that are treated as masks,
see `Bitarray indexing <https://github.com/ilanschnell/bitarray/blob/master/doc/indexing.rst>`__.
Bitwise operators
Bitarray objects support the bitwise operators ``~``, ``&``, ``|``, ``^``,
``<<``, ``>>`` (as well as their in-place versions ``&=``, ``|=``, ``^=``,
``<<=``, ``>>=``). The behavior is very much what one would expect:
.. code-block:: python
>>> a = bitarray('101110001')
>>> ~a # invert
bitarray('010001110')
>>> b = bitarray('111001011')
>>> a ^ b # bitwise XOR
bitarray('010111010')
>>> a &= b # in-place AND
>>> a
bitarray('101000001')
>>> a <<= 2 # in-place left-shift by 2
>>> a
bitarray('100000100')
>>> b >> 1 # return b right-shifted by 1
bitarray('011100101')
The C language does not specify the behavior of negative shifts or of left shifts greater than or equal to the width of the promoted left operand. The exact behavior is compiler/machine specific. This Python bitarray library specifies the behavior as follows:
- the length of the bitarray is never changed by any shift operation
- blanks are filled by 0
- negative shifts raise ``ValueError``
- shifts greater than or equal to the length of the bitarray result in bitarrays with all values 0
It is worth noting that (regardless of bit-endianness) the bitarray left
shift (<<) always shifts towards lower indices, and the right
shift (>>) always shifts towards higher indices.
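The shift rules above can be modeled in a few lines of pure Python. This is only a sketch on strings of ``'0'`` and ``'1'`` characters, not the library's C implementation, and the helper names ``shift_left`` and ``shift_right`` are hypothetical.

```python
def _check(n):
    # Negative shift counts are rejected.
    if n < 0:
        raise ValueError("negative shift count")


def shift_left(bits, n):
    """Shift towards lower indices, filling with '0' on the right."""
    _check(n)
    n = min(n, len(bits))          # large shifts clear every bit
    return bits[n:] + '0' * n      # the length never changes


def shift_right(bits, n):
    """Shift towards higher indices, filling with '0' on the left."""
    _check(n)
    n = min(n, len(bits))
    return '0' * n + bits[:len(bits) - n]


left = shift_left('101000001', 2)
right = shift_right('111001011', 1)
cleared = shift_right('1011', 10)
```

The first two calls reproduce the ``a <<= 2`` and ``b >> 1`` results from the example session above, and the last call shows a shift count larger than the length producing all zeros.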
Bit-endianness
For many purposes the bit-endianness is not of any relevance to the end user
and can be regarded as an implementation detail of bitarray objects.
However, there are use cases when the bit-endianness becomes important.
These use cases involve explicitly reading and writing the bitarray buffer
using ``.tobytes()``, ``.frombytes()``, ``.tofile()`` or ``.fromfile()``,
as well as importing and exporting buffers. Also, a number of utility functions
in ``bitarray.util`` will return different results depending on
bit-endianness, such as ``ba2hex()`` or ``ba2int()``.
To better understand this topic, please read `bit-endianness <https://github.com/ilanschnell/bitarray/blob/master/doc/endianness.rst>`__.
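To make the idea concrete, here is a small pure-Python model of how eight bits map to one byte under each bit-endianness. It is a sketch for illustration, not the library's code, and the function name ``byte_value`` is hypothetical. In the big-endian representation the bit at index 0 is the most significant bit of the byte. In the little-endian representation it is the least significant.

```python
def byte_value(bits, endian):
    """Return the integer value of one byte built from 8 bits."""
    if len(bits) != 8:
        raise ValueError("exactly 8 bits are required")
    if endian not in ('big', 'little'):
        raise ValueError("endian must be 'big' or 'little'")
    value = 0
    for index, bit in enumerate(bits):
        # big: index 0 is bit position 7; little: index 0 is position 0
        position = 7 - index if endian == 'big' else index
        value |= int(bit) << position
    return value


big = byte_value('10000000', 'big')
little = byte_value('10000000', 'little')
```

The same sequence of bits therefore yields the byte ``0x80`` in one representation and ``0x01`` in the other, which is why methods that expose the buffer give endianness-dependent results.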
Buffer protocol
Bitarray objects support the buffer protocol. They can both export their
own buffer, as well as import another object's buffer. To learn more about
this topic, please read `buffer protocol <https://github.com/ilanschnell/bitarray/blob/master/doc/buffer.rst>`__. There is also an example that shows how
to memory-map a file to a bitarray: `mmapped-file.py <https://github.com/ilanschnell/bitarray/blob/master/examples/mmapped-file.py>`__
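Both directions can be sketched briefly. The example assumes a bitarray version that accepts the ``buffer`` keyword argument in the constructor. Big-endian is chosen explicitly so that index 0 corresponds to the most significant bit of the first byte.

```python
from bitarray import bitarray

# Importing a buffer: the bitarray shares memory with the bytearray,
# so changes made through the bitarray are visible in the bytearray.
data = bytearray(2)
a = bitarray(buffer=data, endian='big')
a[0] = 1

# Exporting a buffer: a memoryview exposes the bytes of a bitarray,
# and writes through the view change the bitarray.
b = bitarray('00000000', endian='big')
view = memoryview(b)
view[0] = 0x0f
view.release()   # release the export once it is no longer needed
```

After this runs, ``data`` holds the bytes ``0x80 0x00`` and ``b`` equals ``bitarray('00001111')``.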
Variable bit length prefix codes
The .encode() method takes a dictionary mapping symbols to bitarrays
and an iterable, and extends the bitarray object with the encoded symbols
found while iterating. For example:
.. code-block:: python
>>> d = {'H':bitarray('111'), 'e':bitarray('0'),
... 'l':bitarray('110'), 'o':bitarray('10')}
...
>>> a = bitarray()
>>> a.encode(d, 'Hello')
>>> a
bitarray('111011011010')
Note that the string 'Hello' is an iterable, but the symbols are not
limited to characters; in fact, any hashable Python object can be a symbol.
Taking the same dictionary, we can apply the .decode() method which will
return an iterable of the symbols:
.. code-block:: python
>>> list(a.decode(d))
['H', 'e', 'l', 'l', 'o']
>>> ''.join(a.decode(d))
'Hello'
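Decoding works without separators because no code in the dictionary is a prefix of another. The following pure-Python sketch shows the idea using plain strings of ``'0'`` and ``'1'``. It is an illustration only, not how the library implements ``.decode()``, and the function names are hypothetical.

```python
code = {'H': '111', 'e': '0', 'l': '110', 'o': '10'}


def encode_symbols(symbols, code):
    """Concatenate the code of each symbol."""
    return ''.join(code[symbol] for symbol in symbols)


def decode_bits(bits, code):
    """Read bits until they match a code, emit the symbol, repeat."""
    inverse = {value: key for key, value in code.items()}
    symbols, current = [], ''
    for bit in bits:
        current += bit
        if current in inverse:     # unambiguous for prefix codes
            symbols.append(inverse[current])
            current = ''
    if current:
        raise ValueError("input ends in the middle of a code")
    return symbols


encoded = encode_symbols('Hello', code)
decoded = ''.join(decode_bits(encoded, code))
```

With the codes from the example above, ``encoded`` is the same bit pattern as the bitarray shown earlier, and decoding returns the original string.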
The above dictionary ``d`` can be efficiently constructed using the function
``bitarray.util.huffman_code()``. I also wrote `Huffman coding in Python using bitarray <http://ilan.schnell-web.net/prog/huffman/>`__, which gives more
background information.
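As a hedged sketch of that workflow, the following builds a Huffman code from symbol frequencies and uses it for a round trip. The sample text and variable names are illustrative. The exact bit patterns assigned to each symbol are not shown because they depend on how ties between equal frequencies are resolved.

```python
from collections import Counter

from bitarray import bitarray
from bitarray.util import huffman_code

text = 'abracadabra'

# Map each symbol to its frequency, then build the prefix code.
code = huffman_code(Counter(text))

# Encode the text and decode it again with the same dictionary.
encoded = bitarray()
encoded.encode(code, text)
decoded = ''.join(encoded.decode(code))
```

Because ``'a'`` is by far the most frequent symbol in this text, it receives the shortest code, and the encoded bitarray is much shorter than eight bits per character.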
When the codes are large, and you have many decode calls, most time will be spent creating the (same) internal decode tree objects. In this case, it will b
