Atheris: A Coverage-Guided, Native Python Fuzzer

Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing both Python code and native extensions written for CPython. Atheris is based on libFuzzer. When fuzzing native code, Atheris can be used in combination with Address Sanitizer or Undefined Behavior Sanitizer to catch extra bugs.

Installation Instructions

Atheris supports Linux (32- and 64-bit) and Mac OS X, with Python versions 3.11 through 3.13. Python 3.10 and below are not supported by the current source version, but releases that support them remain available on PyPI.

You can install prebuilt versions of Atheris with pip:

pip3 install atheris

These wheels come with a built-in libFuzzer, which is fine for fuzzing Python code. If you plan to fuzz native extensions, you may need to build from source to ensure the libFuzzer version in Atheris matches your Clang version.

Building from Source

Atheris relies on libFuzzer, which is distributed with Clang. If you have a sufficiently new version of Clang on your PATH, installation from source is as simple as:

# Build latest release from source
pip3 install --no-binary atheris atheris
# Build development code from source
git clone https://github.com/google/atheris.git
cd atheris
pip3 install .

If you don't have Clang installed or it's too old, you'll need to download and build the latest version of LLVM. Follow the instructions in Installing Against New LLVM below.

Mac

Apple Clang doesn't come with libFuzzer, so you'll need to install a new version of LLVM from head. Follow the instructions in Installing Against New LLVM below.

Installing Against New LLVM

# Building LLVM
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build
cd build
cmake -DLLVM_ENABLE_PROJECTS='clang;compiler-rt' -G "Unix Makefiles" ../llvm
make -j 10  # This step is very slow

# Installing Atheris
CLANG_BIN="$(pwd)/bin/clang" pip3 install <whatever>

Using Atheris

Example

#!/usr/bin/python3

import atheris

with atheris.instrument_imports():
  import some_library
  import sys

def TestOneInput(data):
  some_library.parse(data)

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.
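Because any uncaught exception counts as a failure, one common pattern is for the harness to catch the exceptions the code under test is expected to raise on malformed input, so that only unexpected ones are reported. The following is a minimal sketch of that idea, not part of Atheris itself: it uses the standard library's json module as a stand-in for some_library, and main() is a hypothetical wrapper that imports Atheris lazily so the harness can be exercised without it.

```python
import json
import sys


def TestOneInput(data: bytes) -> None:
    """Feed one fuzzer-generated input to the parser under test."""
    try:
        json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Expected rejections of malformed input: not a bug, so swallow them.
        return
    # Any other exception propagates and is reported by Atheris as a failure.


def main() -> None:
    # Imported here (an illustrative choice) so TestOneInput can be called
    # directly in a unit test even when Atheris is not installed.
    import atheris

    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Calling main() starts the fuzzing loop. Calling TestOneInput directly with a saved input is a quick way to check whether that input still triggers a failure.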

Python coverage

Atheris collects Python coverage information by instrumenting bytecode. There are 3 options for adding this instrumentation to the bytecode:

You can instrument the libraries you import:
```
with atheris.instrument_imports():
  import foo
  from bar import baz
```
This will cause instrumentation to be added to foo and bar, as well as any libraries they import.

Or, you can instrument individual functions:

@atheris.instrument_func
def my_function(foo, bar):
  print("instrumented")

Or finally, you can instrument everything:
```
atheris.instrument_all()
```
Put this right before atheris.Setup(). This will find every Python function currently loaded in the interpreter, and instrument it. This might take a while.

Atheris can additionally instrument regular expression checks, e.g. re.search. To enable this feature, add atheris.enabled_hooks.add("RegEx") to your script before your code calls re.compile. Internally this will import the re module and instrument the necessary functions. This is currently an experimental feature.

Similarly, Atheris can instrument str methods; currently only str.startswith and str.endswith are supported. To enable this feature, add atheris.enabled_hooks.add("str"). This is currently an experimental feature.
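As a rough sketch of how the two experimental hooks might be wired into a fuzzer, the example below enables both before any instrumentation happens. The looks_like_ticket helper, the "TICKET-" format, and the main() wrapper are hypothetical and exist only for illustration. The regular expression is passed to re.search as a string inside the function, so it is first compiled after the hook has been enabled. Whether the hooks actually speed up discovery will depend on the target and the Atheris version.

```python
import re
import sys


def looks_like_ticket(text: str) -> bool:
    """Hypothetical check combining a str method and a regular expression."""
    return text.startswith("TICKET-") and re.search(r"-\d{4}$", text) is not None


def TestOneInput(data: bytes) -> None:
    text = data.decode("utf-8", errors="ignore")
    if looks_like_ticket(text):
        # Stand-in for a bug that only a well-formed input can reach.
        raise RuntimeError("reached the guarded code path")


def main() -> None:
    # Imported lazily (an illustrative choice) so the functions above can be
    # tested without Atheris installed.
    import atheris

    # Enable the experimental hooks before instrumenting and before the
    # pattern above is compiled for the first time.
    atheris.enabled_hooks.add("RegEx")
    atheris.enabled_hooks.add("str")

    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```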

Why am I getting "No interesting inputs were found"?

You might see this error:

ERROR: no interesting inputs were found. Is the code instrumented for coverage? Exiting.

You'll get this error if the first 2 calls to TestOneInput didn't produce any coverage events. Even if you have instrumented some Python code, this can happen if the instrumentation isn't reached in those first 2 calls (for example, because you have a nontrivial TestOneInput). You can resolve this by adding an atheris.instrument_func decorator to TestOneInput, using atheris.instrument_all(), or moving your TestOneInput function into an instrumented module.

Visualizing Python code coverage

Examining which lines are executed is helpful for understanding the effectiveness of your fuzzer. Atheris is compatible with coverage.py: you can run your fuzzer using the coverage.py module as you would for any other Python program. Here's an example:

python3 -m coverage run your_fuzzer.py -atheris_runs=10000  # Times to run
python3 -m coverage html
(cd htmlcov && python3 -m http.server 8000)

Coverage reports are only generated when your fuzzer exits gracefully. This happens if:

you specify -atheris_runs=<number>, and that many runs have elapsed.
your fuzzer exits by Python exception.
your fuzzer exits by sys.exit().

No coverage report will be generated if your fuzzer exits due to a crash in native code, or due to libFuzzer's -runs flag (use -atheris_runs). If your fuzzer exits via other methods, such as SIGINT (Ctrl+C), Atheris will attempt to generate a report but may be unable to (depending on your code). For consistent reports, we recommend always using -atheris_runs=<number>.

If you'd like to examine coverage when running with your corpus, you can do that with the following command:

python3 -m coverage run your_fuzzer.py corpus_dir/* -atheris_runs=$(( 1 + $(ls corpus_dir | wc -l) ))

This will cause Atheris to run on each file in corpus_dir, then exit. Note: Atheris uses an empty input as the first input even if there is no empty file in corpus_dir, which is why the run count is one more than the number of files. Importantly, if you leave off the -atheris_runs=$(( 1 + $(ls corpus_dir | wc -l) )) flag, no coverage report will be generated.
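The run-count arithmetic can also be done from Python, which may be convenient when the corpus replay is launched from a script. This is a small sketch under stated assumptions: coverage_command is a hypothetical helper, and unlike `ls corpus_dir | wc -l` it counts only regular files, skipping subdirectories.

```python
import os
import sys


def coverage_command(fuzzer_path: str, corpus_dir: str) -> list[str]:
    """Build the coverage.py command line for replaying a corpus once."""
    files = sorted(
        entry.path for entry in os.scandir(corpus_dir) if entry.is_file()
    )
    # One run per corpus file, plus one for the empty input Atheris tries first.
    runs = len(files) + 1
    return [
        sys.executable, "-m", "coverage", "run",
        fuzzer_path, *files, f"-atheris_runs={runs}",
    ]
```

The returned list can be passed to subprocess.run(); for a corpus of three files the last argument is -atheris_runs=4.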

Using coverage.py will significantly slow down your fuzzer, so only use it for visualizing coverage; don't use it all the time.

Fuzzing Native Extensions

In order for fuzzing native extensions to be effective, your native extensions must be instrumented. See Native Extension Fuzzing for instructions.

Structure-aware Fuzzing

Atheris is based on a coverage-guided, mutation-based fuzzer (libFuzzer). This has the advantage of not requiring any grammar definition for generating inputs, which makes setup easier. The disadvantage is that it is harder for the fuzzer to generate inputs for code that parses complex data types: such inputs are often rejected early, resulting in low coverage.

Atheris supports custom mutators (as offered by libFuzzer) to produce grammar-aware inputs.

Example (Atheris-equivalent of the example in the libFuzzer docs):

import sys
import zlib

import atheris

@atheris.instrument_func
def TestOneInput(data):
  try:
    decompressed = zlib.decompress(data)
  except zlib.error:
    return

  if len(decompressed) < 2:
    return

  try:
    if decompressed.decode() == 'FU':
      raise RuntimeError('Boom')
  except UnicodeDecodeError:
    pass

To reach the RuntimeError crash, the fuzzer needs to be able to produce inputs that are valid compressed data and satisfy the checks after decompression. It is very unlikely that Atheris will be able to produce such inputs: mutations on the input data will most probably result in invalid data that will fail at decompression time.
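To get a feel for how fragile compressed data is under byte-level mutation, the following standalone experiment flips every single bit of a compressed payload in turn and counts how many variants still decompress. It is only an illustration using the standard library, with a hypothetical helper name, and it does not model the mutations libFuzzer actually applies.

```python
import zlib


def count_surviving_bit_flips(payload: bytes) -> tuple[int, int]:
    """Return (survivors, total) over all single-bit flips of zlib.compress(payload)."""
    compressed = zlib.compress(payload)
    total = len(compressed) * 8
    survivors = 0
    for bit in range(total):
        mutated = bytearray(compressed)
        mutated[bit // 8] ^= 1 << (bit % 8)  # flip exactly one bit
        try:
            zlib.decompress(bytes(mutated))
        except zlib.error:
            continue  # rejected: bad header, corrupt stream, or checksum mismatch
        survivors += 1
    return survivors, total


survivors, total = count_surviving_bit_flips(b"FU")
print(f"{survivors} of {total} single-bit mutations still decompress")
```

Every flip in the two-byte header or the four-byte checksum trailer is rejected, so most blind mutations of a short stream never get past zlib.decompress.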

To overcome this issue, you can define a custom mutator function (equivalent to LLVMFuzzerCustomMutator). This example produces valid compressed data. To enable Atheris to make use of it, pass the custom mutator function to the invocation of atheris.Setup.

def CustomMutator(data, max_size, seed):
  try:
    decompressed = zlib.decompress(data)
  except zlib.error:
    decompressed = b'Hi'
  else:
    decompressed = atheris.Mutate(decompressed, len(decompressed))
  return zlib.compress(decompressed)

atheris.Setup(sys.argv, TestOneInput, custom_mutator=CustomMutator)
atheris.Fuzz()

As seen in the example, the custom mutator may request Atheris to mutate data using atheris.Mutate() (this is equivalent to LLVMFuzzerMutate).
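The example above ignores the max_size and seed arguments. As a hedged variation, the sketch below shows one way a mutator could use them: the seed drives a deterministic mutation, and the payload is shrunk until the compressed result fits in max_size, on the assumption that the output should respect that bound as libFuzzer's LLVMFuzzerCustomMutator contract expects. So that it runs without Atheris, it swaps atheris.Mutate() for a simple hand-written mutate_payload helper (hypothetical) built on Python's random module; in a real fuzzer atheris.Mutate() is likely the better choice.

```python
import random
import zlib


def mutate_payload(payload: bytes, rng: random.Random) -> bytes:
    """Apply one small random edit: overwrite, insert, or delete a byte."""
    buf = bytearray(payload)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def CustomMutator(data: bytes, max_size: int, seed: int) -> bytes:
    rng = random.Random(seed)  # same seed and input give the same output
    try:
        payload = zlib.decompress(data)
    except zlib.error:
        payload = b"Hi"  # fall back to a known-valid starting point
    else:
        payload = mutate_payload(payload, rng)

    out = zlib.compress(payload)
    # Halve the payload until the compressed form fits. An empty payload still
    # compresses to a few bytes, so a smaller max_size cannot be honored.
    while len(out) > max_size and payload:
        payload = payload[: len(payload) // 2]
        out = zlib.compress(payload)
    return out
```

It is passed to atheris.Setup in the same way as the mutator above, via custom_mutator=CustomMutator.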

You can experiment with custom_mutator_example.py and see that without the mutator Atheris would not be able to find the crash, while with the mutator this is achieved in a matter of seconds.

$ python3 example_fuzzers/custom_mutator_example.py --no_mutator
[...]
#2      INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 37Mb
#524288 pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 262144 rss: 37Mb
#1048576        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 349525 rss: 37Mb
#2097152        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 299593 rss: 37Mb
#4194304        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 279620 rss: 37Mb
[...]

$ python3 example_fuzzers/custom_mutator_example.py
[...]
INFO: found LLVMFuzzerCustomMutator (0x7f9c989fb0d0). Disabling -len_control by default.
[...]
#2      INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 37Mb
#3      NEW    cov: 4 ft: 4 corp: 2/11b lim: 4096 exec/s: 0 rss: 37Mb L: 10/10 MS: 1 Custom-
#12     NEW    cov: 5 ft: 5 corp: 3/21b lim: 4096 exec/s: 0 rss: 37Mb L: 10/10 MS: 7 Custom-CrossOver-Custom-CrossOver-Custom-ChangeBit-Custom-
 === Uncaught Python exception: ===
RuntimeError: Boom
Traceback (most recent call last):
  File "example_fuzzers/custom_mutator_example.py", line 62, in TestOneInput
    raise RuntimeError('Boom')
[...]

Custom crossover functions (equivalent to LLVMFuzzerCustomCrossOver) are also supported.
