<p align="center"> <img style="width: 30%; height: 30%" src="cleanx-logo.svg"> </p>

CleanX

Zenodo DOI | License GPL-3 | JOSS Publication | PyPI Version | Anaconda | Sanity | Documentation | GitHub issues | GitHub Discussions

CleanX is an open-source Python library for exploring, cleaning, and augmenting large datasets of X-rays or certain other types of radiological images. The images can be extracted from DICOM files or used directly. The primary authors are Candace Makeda H. Moore, Oleg Sivokon, and Andrew Murphy.

Documentation

Online documentation is available at https://drcandacemakedamoore.github.io/cleanX/. You can also build up-to-date documentation yourself. It will be generated in the ./build/sphinx/html directory when you run the following commands:

python setup.py apidoc
python setup.py build_sphinx

Additional documentation for medical professionals with limited programming ability is available on the project wiki. For a high-level overview of some of the program's functionality, see the Jupyter notebooks in the workflow_demo folder.

Requirements

  • Python 3.7, 3.8, or 3.9. Python 3.10 has not been tested yet.
  • Ability to create virtual environments (recommended, but not strictly necessary)
  • tesserocr, matplotlib, pandas, and opencv
  • Optional but recommended: SimpleITK or pydicom for DICOM (.dcm) to JPG conversion
  • Anaconda is now supported, but not required
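
As a quick sanity check before installing, the requirements above can be tested from Python itself. The following is a minimal sketch, not part of CleanX: the function name check_requirements is hypothetical, and it only checks that the modules can be located, not that they work. Note that the opencv package is imported under the name cv2.

```python
import importlib.util
import sys

# Import names of the required packages listed above.
# The opencv package is imported as "cv2".
REQUIRED_MODULES = ("tesserocr", "matplotlib", "pandas", "cv2")

# Python versions the README states as tested.
TESTED_PYTHONS = ((3, 7), (3, 8), (3, 9))


def check_requirements(version=None):
    """Return (python_ok, missing_modules) for the given or current Python.

    This is a hypothetical helper, not a CleanX function.
    """
    if version is None:
        version = sys.version_info[:2]
    python_ok = tuple(version[:2]) in TESTED_PYTHONS
    missing = [
        name for name in REQUIRED_MODULES
        if importlib.util.find_spec(name) is None
    ]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_requirements()
    print("Python version tested with CleanX:", python_ok)
    print("Missing modules:", ", ".join(missing) or "none")
```

A False result for the Python version does not necessarily mean that CleanX will fail, only that the version is outside the tested range.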

Supported Platforms

CleanX is a pure Python package, but it has many dependencies on native libraries. We try to test it on as many platforms as we can to see whether the dependencies can be installed there. Below is the list of platforms that will potentially work. Please note that where python.org Python or Anaconda Python is stated as supported, this means that versions 3.7, 3.8 and 3.9 (but not 3.10) are supported.

AMD64 (x86-64)

|                   | Linux     | Win       | OSX       |
|:-----------------:|:---------:|:---------:|:---------:|
| python.org Python | Supported | Unknown   | Unknown   |
| Anaconda Python   | Supported | Supported | Supported |

ARM64

Unsupported at the moment on both Linux and OSX, but it's likely that support will be added in the future.

32-bit Intel and ARM

We don't know if either one of these is supported. There's a good chance that 32-bit Intel will work. There's a good chance that ARM won't. It's unlikely that support for ARM will be added in the future.
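
If you are unsure which of the categories above your machine falls into, Python's standard platform module can tell you. The sketch below is only an illustration: classify_machine is a hypothetical helper, and the list of machine names it recognizes is an assumption that may not cover every system.

```python
import platform


def classify_machine(machine):
    """Map a platform.machine() string to one of the categories above.

    Hypothetical helper; the recognized names are an assumption.
    """
    name = machine.lower()
    if name in ("x86_64", "amd64"):
        return "AMD64"
    if name in ("arm64", "aarch64"):
        return "ARM64"
    if name in ("i386", "i686", "x86") or name.startswith("arm"):
        return "32-bit Intel or ARM"
    return "unknown"


if __name__ == "__main__":
    print(platform.system(), platform.machine(),
          "->", classify_machine(platform.machine()))
```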

Installation

  • Setting up a virtual environment is desirable, but not absolutely necessary

  • If you created an environment, activate it

Anaconda Installation

  • Use the following conda command:
conda install -c doctormakeda -c conda-forge cleanx

You need to specify both channels because some cleanX dependencies exist in both the Anaconda main channel and conda-forge.

pip installation

  • Use pip as follows:
pip install cleanX

You can install some optional dependencies.

To have CLI functionality:

pip install cleanX[cli]

To have PyDicom installed and used to process DICOM files:

pip install cleanX[pydicom]

Similarly, if you want SimpleITK used to process DICOM files:

pip install cleanX[simpleitk]

The tesserocr package deserves a special mention. It is not possible to install the tesseract library from the PyPI server, because tesserocr is simply a binding to that library. You will need to install the library yourself. For example, on Debian-flavored Linux, this might work:

sudo apt-get install libleptonica-dev \
    tesseract-ocr-all \
    libtesseract-dev

We've heard that

brew install tesseract

works on Mac.
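
To confirm that the binding can find the native library after installation, you can ask tesserocr for the version of Tesseract it is linked against. This is a minimal sketch: tesseract_status is a hypothetical helper name, while tesserocr.tesseract_version() is a function of the tesserocr package.

```python
def tesseract_status():
    """Return the Tesseract version string, or None if tesserocr
    (or the native library it binds to) cannot be imported.

    Hypothetical helper, not part of CleanX.
    """
    try:
        import tesserocr
    except ImportError:
        return None
    return tesserocr.tesseract_version()


if __name__ == "__main__":
    status = tesseract_status()
    print(status if status is not None else "tesserocr is not importable")
```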

Getting Started

We will imagine a very simple scenario in which we need to automate normalization of our images. The images are stored in the directory /images/to/clean/ and they all have a jpg extension. We want the cleaned images to be saved in the cleaned directory.

Normalization here means ensuring that the lowest pixel value (the darkest part of the image) is as dark as possible and that the lightest part of the image is as light as possible.
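
To make this definition concrete, here is a minimal sketch of such a contrast stretch using NumPy. It is not the code CleanX uses for its Normalize step. It assumes an 8-bit grayscale image, and the function name normalize_min_max is hypothetical.

```python
import numpy as np


def normalize_min_max(image):
    """Stretch pixel values linearly so they span the full 0..255 range.

    Illustration only; assumes an 8-bit grayscale image.
    """
    pixels = np.asarray(image, dtype=np.float64)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:
        # A constant image has no contrast to stretch; return all zeros.
        return np.zeros(pixels.shape, dtype=np.uint8)
    scaled = (pixels - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


# The darkest pixel (50) becomes 0 and the lightest (200) becomes 255.
example = np.array([[50, 100], [150, 200]], dtype=np.uint8)
print(normalize_min_max(example))
```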

Docker

Docker images are available from Docker Hub. You should be able to run them using:

docker run --rm -v "$(pwd)":/cleanx drcandacemakedamoore/cleanx --help

The /cleanx directory in the image is intentionally left to be used as a mount point. The image, by default, runs as root, but doesn't require root privileges. In the future, it's possible that the image will come with a non-root user and will default to running as a non-root user.

Additionally, there is a Docker image with several examples in the form of Jupyter notebooks. To run this image:

docker run --rm -ti -p 8888:8888 \
    drcandacemakedamoore/cleanx-jupyter-examples

There is usually a more up-to-date image available, built from the develop branch (use at your own risk):

docker run --rm -ti -p 8888:8888 \
    drcandacemakedamoore/cleanx-jupyter-examples:develop

This will generate output similar to:

[I 12:59:52.383 NotebookApp] Writing notebook server cookie secret  \
to /home/jupyter/.local/share/jupyter/runtime/notebook_cookie_secret
[I 12:59:52.704 NotebookApp] Serving notebooks from local directory:\
/home/jupyter
[I 12:59:52.704 NotebookApp] Jupyter Notebook 6.4.11 is running at:
[I 12:59:52.705 NotebookApp] http://localhost:8888/?token=...
[I 12:59:52.705 NotebookApp]  or http://127.0.0.1:8888/?token=...
[I 12:59:52.705 NotebookApp] Use Control-C to stop this server and \
shut down all kernels (twice to skip confirmation).
[W 12:59:52.709 NotebookApp] No web browser found: could not locate\
runnable browser.
[C 12:59:52.709 NotebookApp] 

    To access the notebook, open this file in a browser:
        file:///.../nbserver-1-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=...
     or http://127.0.0.1:8888/?token=...

Copy the text that starts with http://127.0.0.1:8888 (including the token) and paste it into your browser's address bar. The demos should be fully operational (you may interact with them, re-evaluate them, change available parameters, etc.).

CLI Example

The normalization task described in Getting Started doesn't require writing any new Python code. We can accomplish it by calling the cleanX command like this:

mkdir cleaned

python -m cleanX images run-pipeline \
    -s Acquire \
    -s Normalize \
    -s "Save(target='cleaned')" \
    -j \
    -r "/images/to/clean/*.jpg"

Let's look at the command's options and arguments:

  • python -m cleanX uses Python's -m command-line option to run the cleanX package. All command-line arguments that follow this part are interpreted by cleanX.
  • images sub-command is used for processing images.
  • run-pipeline sub-command is used to start a Pipeline to process the images.
  • -s (repeatable) option specifies a Pipeline Step. Steps map to their class names as found in the cleanX.image_work.steps module. If the __init__ function of a step doesn't take any arguments, only the class name is necessary. If it does take arguments, they must be given as Python literals, using Python's named-argument syntax.
  • -j option instructs cleanX to create a journaling pipeline. Journaling pipelines can be restarted from the point where they failed or were interrupted.
  • -r option specifies the source for the pipeline. While we will normally want to start with the Acquire step, if the pipeline was interrupted, we need to tell it where to look for the initial sources.
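
To illustrate what "a class name, optionally followed by named arguments given as Python literals" means for the -s option, here is a sketch that parses such a string with Python's standard ast module. It is not cleanX's own parser: parse_step_spec is a hypothetical function, and rejecting positional arguments is an assumption based on the description above.

```python
import ast


def parse_step_spec(spec):
    """Split a step spec such as "Save(target='cleaned')" into
    (class_name, keyword_arguments).

    Hypothetical helper for illustration; not cleanX's implementation.
    """
    node = ast.parse(spec.strip(), mode="eval").body
    if isinstance(node, ast.Name):
        # Bare class name, e.g. "Acquire": the step takes no arguments.
        return node.id, {}
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if node.args:
            raise ValueError("only named arguments are accepted: " + spec)
        kwargs = {}
        for keyword in node.keywords:
            if keyword.arg is None:
                raise ValueError("** unpacking is not accepted: " + spec)
            # literal_eval accepts only literals, never arbitrary code.
            kwargs[keyword.arg] = ast.literal_eval(keyword.value)
        return node.func.id, kwargs
    raise ValueError("not a step specification: " + spec)


print(parse_step_spec("Acquire"))
print(parse_step_spec("Save(target='cleaned')"))
```

Because the arguments are restricted to literals, a specification like this can be read from the command line without evaluating arbitrary Python code.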

Once the command finishes, we should see the cleaned directory filled with images with the same names they had in the source directory.
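
If you want to verify this rather than inspect the directory by eye, a few lines of Python can compare the file names in the two directories. This is a minimal sketch using only the standard library. The function name missing_outputs is hypothetical, and the check looks at file names only, not at image contents.

```python
from pathlib import Path


def missing_outputs(source_dir, target_dir, pattern="*.jpg"):
    """Return sorted names of source files that have no file with the
    same name in target_dir.

    Hypothetical helper, not part of CleanX.
    """
    source_names = {p.name for p in Path(source_dir).glob(pattern)}
    target_names = {p.name for p in Path(target_dir).iterdir() if p.is_file()}
    return sorted(source_names - target_names)
```

For the scenario above, missing_outputs("/images/to/clean", "cleaned") should return an empty list once the pipeline has processed every image.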

Let's consider another simple task: batch-extraction of images from DICOM files:


mkdir extracted

python -m cleanX dicom extract \
