Nutter

Testing framework for Databricks notebooks

Generate Convert Improve

Install / Use

/learn @microsoft/Nutter

About this skill

Quality Score

0/100

README

Nutter

Overview
Nutter Runner
Nutter CLI
Nutter CLI Syntax and Flags
- Run Command
- List Command
Integrating Nutter with Azure DevOps
Contributing
- Contribution Tips
- Contribution Guidelines

Overview

The Nutter framework makes it easy to test Databricks notebooks. The framework enables a simple inner dev loop and easily integrates with Azure DevOps Build/Release pipelines, among others. When data or ML engineers want to test a notebook, they simply create a test notebook called test_<notebook_under_test>.

Nutter has 2 main components:

Nutter Runner - this is the server-side component that is installed as a library on the Databricks cluster
Nutter CLI - this is the client CLI that can be installed both on a developers laptop and on a build agent

The tests can be run from within that notebook or executed from the Nutter CLI, useful for integrating into Build/Release pipelines.

Nutter Runner

Cluster Installation

The Nutter Runner can be installed as a cluster library, via PyPI.

For more information about installing libraries on a cluster, review Install a library on a cluster.

Nutter Fixture

The Nutter Runner is simply a base Python class, NutterFixture, that test fixtures implement. The runner runtime is a module you can use once you install Nutter on the Databricks cluster. The NutterFixture base class can then be imported in a test notebook and implemented by a test fixture:

from runtime.nutterfixture import NutterFixture, tag
class MyTestFixture(NutterFixture):
   …

To run the tests:

result = MyTestFixture().execute_tests()

To view the results from within the test notebook:

print(result.to_string())

To return the test results to the Nutter CLI:

result.exit(dbutils)

Note: The call to result.exit, behind the scenes calls dbutils.notebook.exit, passing the serialized TestResults back to the CLI. At the current time, print statements do not work when dbutils.notebook.exit is called in a notebook, even if they are written prior to the call. For this reason, it is required to temporarily comment out result.exit(dbutils) when running the tests locally.

The following defines a single test fixture named 'MyTestFixture' that has 1 TestCase named 'test_name':

from runtime.nutterfixture import NutterFixture, tag
class MyTestFixture(NutterFixture):
   def run_test_name(self):
      dbutils.notebook.run('notebook_under_test', 600, args)

   def assertion_test_name(self):
      some_tbl = sqlContext.sql('SELECT COUNT(*) AS total FROM sometable')
      first_row = some_tbl.first()
      assert (first_row[0] == 1)

result = MyTestFixture().execute_tests()
print(result.to_string())
# Comment out the next line (result.exit(dbutils)) to see the test result report from within the notebook
result.exit(dbutils)

To execute the test from within the test notebook, simply run the cell containing the above code. At the current time, in order to see the below test result, you will have to comment out the call to result.exit(dbutils). That call is required to send the results, if the test is run from the CLI, so do not forget to uncomment after locally testing.

Notebook: (local) - Lifecycle State: N/A, Result: N/A
============================================================
PASSING TESTS
------------------------------------------------------------
test_name (19.43149897100011 seconds)


============================================================

Test Cases

A test fixture can contain 1 or more test cases. Test cases are discovered when execute_tests() is called on the test fixture. Every test case is comprised of 1 required and 3 optional methods and are discovered by the following convention: prefix_testname, where valid prefixes are: before_, run_, assertion_, and after_. A test fixture that has run_fred and assertion_fred methods has 1 test case called 'fred'. The following are details about test case methods:

before_(testname) - (optional) - if provided, is run prior to the 'run_' method. This method can be used to setup any test pre-conditions
run_(testname) - (optional) - if provider, is run after 'before_' if before was provided, otherwise run first. This method is typically used to run the notebook under test
assertion_(testname) (required) - run after 'run_', if run was provided. This method typically contains the test assertions

Note: You can assert test scenarios using the standard assert statement or the assertion capabilities from a package of your choice.

after_(testname) (optional) - if provided, run after 'assertion_'. This method typically is used to clean up any test data used by the test

A test fixture can have multiple test cases. The following example shows a fixture called MultiTestFixture with 2 test cases: 'test_case_1' and 'test_case_2' (assertion code omitted for brevity):

from runtime.nutterfixture import NutterFixture, tag
class MultiTestFixture(NutterFixture):
   def run_test_case_1(self):
      dbutils.notebook.run('notebook_under_test', 600, args)

   def assertion_test_case_1(self):
     …

   def run_test_case_2(self):
      dbutils.notebook.run('notebook_under_test', 600, args)

   def assertion_test_case_2(self):
     …

result = MultiTestFixture().execute_tests()
print(result.to_string())
#result.exit(dbutils)

before_all and after_all

Test Fixtures also can have a before_all() method which is run prior to all tests and an after_all() which is run after all tests.

from runtime.nutterfixture import NutterFixture, tag
class MultiTestFixture(NutterFixture):
   def before_all(self):
      …

   def run_test_case_1(self):
      dbutils.notebook.run('notebook_under_test', 600, args)

   def assertion_test_case_1(self):
     …

   def after_all(self):
      …

Multiple test assertions pattern with before_all

It is possible to support multiple assertions for a test by implementing a before_all method, no run methods and multiple assertion methods. In this pattern, the before_all method runs the notebook under test. There are no run methods. The assertion methods simply assert against what was done in before_all.

from runtime.nutterfixture import NutterFixture, tag
class MultiTestFixture(NutterFixture):
   def before_all(self):
     dbutils.notebook.run('notebook_under_test', 600, args) 
      …

   def assertion_test_case_1(self):
      …

   def assertion_test_case_2(self):
     …

   def after_all(self):
      …

Guaranteed test order

After test cases are loaded, Nutter uses a sorted dictionary to order them by name. Therefore test cases will be executed in alphabetical order.

Sharing state between test cases

It is possible to share state across test cases via instance variables. Generally, these should be set in the constructor. Please see below:

class TestFixture(NutterFixture):
  def __init__(self):
    self.file = '/data/myfile'
    NutterFixture.__init__(self)

Running test fixtures in parallel

Version 0.1.35 includes a parallel runner class NutterFixtureParallelRunner that facilitates the execution of test fixtures concurrently. This approach could significantly increase the performance of your testing pipeline.

The following code executes two fixtures, CustomerTestFixture and CountryTestFixture in parallel.

from runtime.runner import NutterFixtureParallelRunner
from runtime.nutterfixture import NutterFixture, tag
class CustomerTestFixture(NutterFixture):
   def run_customer_data_is_inserted(self):
      dbutils.notebook.run('../data/customer_data_import', 600)

   def assertion_customer_data_is_inserted(self):
      some_tbl = sqlContext.sql('SELECT COUNT(*) AS total FROM customers')
      first_row = some_tbl.first()
      assert (first_row[0] == 1)

class CountryTestFixture(NutterFixture):
   def run_country_data_is_inserted(self):
      dbutils.notebook.run('../data/country_data_import', 600)

   def assertion_country_data_is_inserted(self):
      some_tbl = sqlContext.sql('SELECT COUNT(*) AS total FROM countries')
      first_row = some_tbl.first()
      assert (first_row[0] == 1)

parallel_runner = NutterFixtureParallelRunner(num_of_workers=2)
parallel_runner.add_test_fixture(CustomerTestFixture())
parallel_runner.add_test_fixture(CountryTestFixture())

result = parallel_runner.execute()
print(result.to_string())
# Comment out the next line (result.exit(dbutils)) to see the test result report from within the notebook
# result.exit(dbutils)

The parallel runner combines the test results of both fixtures in a single result.

Notebook: N/A - L

Related Skills

node-connect

340.5k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

84.2k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

340.5k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

commit-push-pr

84.2k

Commit, push, and open a PR