# Shablona

A template for small scientific python projects
Shablona is a template project for small scientific Python projects. The recommendations we make here follow the standards and conventions of much of the scientific Python ecosystem. Following these standards and recommendations will make it easier for others to use your code, and can make it easier for you to port your code into other projects and collaborate with other users of this ecosystem.
To use it as a template for your own project, click the green "use this template" button at the top of the front page of this repo.
First, let me explain all the different moving parts that make up a small scientific python project, and all the elements which allow us to effectively share it with others, test it, document it, and track its evolution.
## Organization of the project
The project has the following structure:
```
shablona/
  |- README.md
  |- shablona/
     |- __init__.py
     |- shablona.py
     |- due.py
     |- data/
        |- ...
     |- tests/
        |- ...
  |- doc/
     |- Makefile
     |- conf.py
     |- sphinxext/
        |- ...
     |- _static/
        |- ...
  |- setup.py
  |- .travis.yml
  |- .mailmap
  |- appveyor.yml
  |- LICENSE
  |- Makefile
  |- ipynb/
     |- ...
```
In the following sections we will examine these elements one by one. First, let's consider the core of the project: the code inside `shablona/shablona.py`. The code provided in this file is intentionally rather simple. It implements some simple curve-fitting to data from a psychophysical experiment. It's not too important to know what it does, but if you are really interested, you can read all about it here.
### Module code
We place the module code in a file called `shablona.py`, inside a directory called `shablona`. This structure is a bit confusing at first, but it is a simple way to arrange things so that when we type `import shablona as sb` in an interactive Python session, the classes and functions defined inside the `shablona.py` file are available in the `sb` namespace. For this to work, we also need to create a file called `__init__.py`, which contains code that imports everything in `shablona.py` into the namespace of the project:

```
from .shablona import *
```
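The mechanism can be demonstrated end-to-end with a throwaway package built in a temporary directory. Note that `mypkg` and `core` are hypothetical names standing in for `shablona` and `shablona.py`:

```python
import os
import sys
import tempfile

# Build a throwaway package that mirrors the shablona layout:
#   mypkg/__init__.py  ->  from .core import *
#   mypkg/core.py      ->  defines transform()
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "mypkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "core.py"), "w") as f:
    f.write("def transform(x):\n    return 2 * x\n")
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from .core import *\n")

# Importing the package now pulls core's names up to the top level:
sys.path.insert(0, tmp)
import mypkg

print(mypkg.transform(21))  # → 42
```

The star-import in `__init__.py` is what lets callers reach `transform` as `mypkg.transform`, without knowing which file inside the package defines it.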
In the module code, we follow the convention that all functions are either imported from other places, or are defined in lines that precede the lines that use that function. This helps readability of the code, because you know that if you see some name, the definition of that name will appear earlier in the file, either as a function/variable definition, or as an import from some other module or package.
In the case of the shablona module, the main classes defined at the bottom of the file make use of some of the functions defined in preceding lines.
Remember that code will probably be read more times than it will be written. Make it easy to read (for others, but also for yourself when you come back to it) by following a consistent formatting style. We strongly recommend following the PEP8 code formatting standard, and we enforce this by running a code-linter called flake8, which automatically checks the code and reports any violations of the PEP8 standard (and checks for other general code hygiene issues); see below.
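flake8 can read project-wide settings from a configuration file at the repository root; a minimal sketch (the values here are illustrative, not necessarily shablona's actual settings) might live in `setup.cfg`:

```ini
[flake8]
max-line-length = 79
exclude = doc/,build/
```

This keeps the linter's rules versioned alongside the code, so every contributor checks against the same standard.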
### Project Data
In this case, the project data is rather small, and recorded in `csv` files. Thus, it can be stored alongside the module code. Even if the data you are analyzing is too large to be tracked effectively with GitHub, you might still want to store some of it for testing purposes.
Either way, you can create a `shablona/data` folder in which you can organize the data. As you can see in the test scripts, and in the analysis scripts, this provides a standard file-system location for the data at:

```
import os.path as op
import shablona as sb
data_path = op.join(sb.__path__[0], 'data')
```
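The point of building paths this way is that scripts locate data relative to the package, never through hard-coded absolute paths. A self-contained sketch of the pattern, using a temporary directory as a stand-in for the installed package (the file name `example.csv` is made up for illustration):

```python
import os
import os.path as op
import tempfile

import numpy as np

# Stand-in for sb.__path__[0]; in the real project this would be the
# installed package directory.
pkg_dir = tempfile.mkdtemp()
data_path = op.join(pkg_dir, 'data')
os.makedirs(data_path)

# Write a small csv file, the way shablona ships its data:
np.savetxt(op.join(data_path, 'example.csv'), np.eye(2), delimiter=',')

# Any script can now locate the file through the package path:
arr = np.loadtxt(op.join(data_path, 'example.csv'), delimiter=',')
print(arr.shape)  # → (2, 2)
```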
### Testing
Most scientists who write software constantly test their code. That is, if you are a scientist writing software, I am sure that you have tried to see how well your code works by running every new function you write, examining the inputs and the outputs of the function, to see if the code runs properly (without error), and to see whether the results make sense.
Automated code testing takes this informal practice, makes it formal, and automates it, so that you can make sure that your code does what it is supposed to do, even as you go about making changes around it.
Most scientists writing code are not really in a position to write a complete specification of their software, because when they start writing their code they don't quite know what they will discover in their data, and these chance discoveries might affect how the software evolves. Nor do most scientists have the inclination to write complete specs: scientific code often needs to be good enough to cover our use-case, and not every possible use-case. Testing the code serves as a way to provide a reader of the code with a very rough specification, in the sense that it at least specifies certain input/output relationships that will certainly hold in your code.
We recommend using the pytest library for testing. The `py.test` application traverses the directory tree in which it is issued, looking for files with names that match the pattern `test_*.py` (typically, something like our `shablona/tests/test_shablona.py`). Within each of these files, it looks for functions with names that match the pattern `test_*`. Typically each function in the module would have a corresponding test (e.g. `test_transform_data`). This is sometimes called 'unit testing', because it independently tests each atomic unit in the software. Other tests might run a more elaborate sequence of functions ('end-to-end testing' if you run through the entire analysis), and check that particular values in the code evaluate to the same values over time. This is sometimes called 'regression testing'. We have one such test in `shablona/tests/test_shablona.py` called `test_params_regression`. Regressions in the code are often canaries in the coal mine, telling you that you need to examine changes in your software dependencies, the platform on which you are running your software, etc.
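A regression test can be as simple as pinning down output values observed in a known-good run. The sketch below uses `np.polyfit` as a stand-in for a project's own curve-fitting routine, and the pinned values are chosen for the illustration, not taken from shablona:

```python
import numpy as np
import numpy.testing as npt

def fit_params(x, y):
    # Stand-in for a model-fitting routine; a real project would
    # call its own curve-fitting function here.
    return np.polyfit(x, y, 1)

def test_params_regression():
    x = np.arange(5, dtype=float)
    y = 2.0 * x + 1.0
    params = fit_params(x, y)
    # Values recorded from a known-good run; if they drift, something
    # in the code or its dependencies has changed.
    npt.assert_almost_equal(params, [2.0, 1.0], decimal=6)

test_params_regression()
```

Such a test says nothing about whether the pinned values are *correct*, only that they have not silently changed, which is exactly the canary-in-the-coal-mine role described above.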
Test functions should contain assertion statements that check certain relations in the code. Most typically, they will test for equality between an explicit calculation of some kind and the return value of some function. For example, in the `test_cumgauss` function, we test that our implementation of the cumulative Gaussian function, evaluated at the mean minus 1 standard deviation, is approximately (1-0.68)/2, which is the theoretical value this calculation should have. We recommend using functions from the `numpy.testing` module (which we import as `npt`) to assert certain relations on arrays and floating point numbers. This is because `npt` contains functions that are specialized for handling numpy arrays, and they allow you to specify the tolerance of the comparison through the `decimal` key-word argument.
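A minimal sketch of why `npt` is preferable to bare `==` for floating point work (the arrays here are made up for illustration):

```python
import numpy as np
import numpy.testing as npt

a = np.array([0.1, 0.2, 0.3])
b = a + 1e-7  # a tiny numerical discrepancy, as from a different algorithm

# Bitwise equality would fail here, but assert_almost_equal tolerates
# differences smaller than the precision set by `decimal`:
npt.assert_almost_equal(a, b, decimal=6)

# A coarser scalar comparison, in the spirit of test_cumgauss:
npt.assert_almost_equal(0.15865, (1 - 0.68) / 2, decimal=2)
```

Both assertions pass silently; on failure, `npt` raises an `AssertionError` with a report of the actual and desired values, like the one shown below.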
To run the tests on the command line, change your present working directory to the top-level directory of the repository (e.g. `/Users/arokem/code/shablona`), and type:

```
py.test shablona
```
This will exercise all of the tests in your code directory. If a test fails, you will see a message such as:
```
shablona/tests/test_shablona.py .F...

=================================== FAILURES ===================================
________________________________ test_cum_gauss ________________________________

    def test_cum_gauss():
        sigma = 1
        mu = 0
        x = np.linspace(-1, 1, 12)
        y = sb.cumgauss(x, mu, sigma)
        # A basic test that the input and output have the same shape:
        npt.assert_equal(y.shape, x.shape)
        # The function evaluated over items symmetrical about mu should be
        # symmetrical relative to 0 and 1:
        npt.assert_equal(y[0], 1 - y[-1])
        # Approximately 68% of the Gaussian distribution is in mu +/- sigma, so
        # the value of the cumulative Gaussian at mu - sigma should be
        # approximately equal to (1 - 0.68/2). Note the low precision!
>       npt.assert_almost_equal(y[0], (1 - 0.68) / 2, decimal=3)
E       AssertionError:
E       Arrays are not almost equal to 3 decimals
E        ACTUAL: 0.15865525393145707
E        DESIRED: 0.15999999999999998

shablona/tests/test_shablona.py:49: AssertionError
====================== 1 failed, 4 passed in 0.82 seconds ======================
```
This indicates to you that a test has failed. In this case, the calculation is accurate up to 2 decimal places, but not beyond, so the `decimal` key-word argument needs to be adjusted (or the calculation needs to be made more accurate).
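If the looser precision is acceptable, the fix is to relax the assertion by one decimal place. The numbers below are copied from the failure report above:

```python
import numpy.testing as npt

actual = 0.15865525393145707   # value computed in the failing run
desired = (1 - 0.68) / 2       # theoretical value, 0.16

# The discrepancy is ~1.3e-3, so comparing to 2 decimal places passes:
npt.assert_almost_equal(actual, desired, decimal=2)
```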
As your code grows and becomes more complicated, you might develop new features that interact with your old features in all kinds of unexpected and surprising ways. As you develop new features of your code, keep running the tests, to make sure that you haven't broken the old features. Keep writing new tests for your new code, and recording these tests in your testing scripts. That way, you can be confident that even as the software grows, it still keeps doing correctly at least the few things that are codified in the tests.
We have also provided a `Makefile` that allows you to run the tests with more verbose and informative output from the top-level directory, by issuing `make test` from the command line.
