CleanX
Python library for exploring, cleaning, normalizing, and augmenting large datasets of radiological data.
CleanX is an open-source python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological images. The images can be extracted from DICOM files or used directly. The primary authors are Candace Makeda H. Moore, Oleg Sivokon, and Andrew Murphy.
Documentation
Online documentation is at https://drcandacemakedamoore.github.io/cleanX/. You can also build up-to-date documentation locally. It is generated in the ./build/sphinx/html directory by running the following commands:
python setup.py apidoc
python setup.py build_sphinx
Additional documentation for medical professionals with limited programming ability is available on the project wiki. For a high-level overview of some of the program's functionality, look at the Jupyter notebooks inside the workflow_demo folder.
Requirements
- Python 3.7, 3.8, 3.9. Python 3.10 has not been tested yet.
- Ability to create virtual environments (recommended, not absolutely necessary)
- `tesserocr`, `matplotlib`, `pandas`, and `opencv`
- Optional recommendation of `SimpleITK` or `pydicom` for DICOM/dcm to JPG conversion
- Anaconda is now supported, but not technically necessary
Supported Platforms
CleanX is a pure Python package, but it has many dependencies on native libraries. We try to test it on as many platforms as we can to see if the dependencies can be installed there. Below is the list of platforms that will potentially work. Please note that where python.org Python or Anaconda Python is stated as supported, it means that versions 3.7, 3.8 and 3.9 (but not 3.10) are supported.
AMD64 (x86)
|                             | Linux     | Win       | OSX       |
|:---------------------------:|:---------:|:---------:|:---------:|
| python.org Python           | Supported | Unknown   | Unknown   |
| Anaconda Python             | Supported | Supported | Supported |
ARM64
Unsupported at the moment on both Linux and OSX, but it's likely that support will be added in the future.
32-bit Intel and ARM
We don't know if either one of these is supported. There's a good chance that 32-bit Intel will work. There's a good chance that 32-bit ARM won't, and it's unlikely that support for it will be added in the future.
Installation
- Setting up a virtual environment is desirable, but not absolutely necessary
- Activate the environment
Anaconda Installation
- use command for conda as below
conda install -c doctormakeda -c conda-forge cleanx
You need to specify both channels because there are some cleanX dependencies that exist in both Anaconda main channel and in conda-forge
pip installation
- use pip as below
pip install cleanX
You can install some optional dependencies.
To have CLI functionality:
pip install cleanX[cli]
To have PyDicom installed and used to process DICOM files:
pip install cleanX[pydicom]
Similarly, if you want SimpleITK used to process DICOM files:
pip install cleanX[simpleitk]
The tesserocr package deserves a special mention. It is not
possible to install the tesseract library from the PyPI server:
tesserocr is simply a binding to the library. You will need to
install the library yourself. For example, on Debian-flavored Linux,
this might work:
sudo apt-get install libleptonica-dev \
tesseract-ocr-all \
libtesseract-dev
We've heard that
brew install tesseract
works on Mac.
Getting Started
We will imagine a very simple scenario, where we need to automate
normalization of the images we have. We stored the images in
directory /images/to/clean/ and they all have a jpg extension. We
want the cleaned images to be saved in the cleaned directory.
Normalization here means ensuring that the lowest pixel value (the darkest part of the image) is as dark as possible and that the lightest part of the image is as light as possible.
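To make this definition of normalization concrete, here is a minimal sketch of a linear contrast stretch. It is an illustration only, not cleanX's own implementation: the function name `normalize`, the 0 to 255 output range, and the use of nested Python lists in place of a real image array are all assumptions made to keep the example dependency-free.

```python
def normalize(image, lo=0, hi=255):
    """Linearly stretch pixel values so the darkest pixel becomes `lo`
    and the brightest becomes `hi`.

    `image` is a list of rows, each row a list of grayscale values.
    This is a hypothetical helper, not part of the cleanX API.
    """
    flat = [pixel for row in image for pixel in row]
    darkest, brightest = min(flat), max(flat)
    if brightest == darkest:
        # A uniform image has no contrast to stretch; return a copy as-is.
        return [list(row) for row in image]
    span = brightest - darkest
    return [
        [round(lo + (pixel - darkest) * (hi - lo) / span) for pixel in row]
        for row in image
    ]


# A low-contrast 2x2 "image" whose values only cover 50..200.
image = [[50, 100], [150, 200]]
print(normalize(image))  # [[0, 85], [170, 255]]
```

After the stretch, the darkest input value maps to 0 and the brightest to 255, which is what "as dark as possible" and "as light as possible" mean for 8-bit grayscale images. Production code would normally do the same arithmetic on a NumPy or OpenCV array.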
Docker
Docker images are available from Dockerhub. You should be able to run them using:
docker run --rm -v "$(pwd)":/cleanx drcandacemakedamoore/cleanx --help
The /cleanx directory in the image is intentionally left to be used
as a mount point. The image, by default, runs as root, but doesn't
require root privileges. In the future, it's possible that the image
will come with a non-root user and will default to running as a
non-root user.
Additionally, there is a Docker image with several examples in a form of Jupyter notebooks. To run this image:
docker run --rm -ti -p 8888:8888 \
drcandacemakedamoore/cleanx-jupyter-examples
There is usually a more up-to-date image available, built from the
develop branch (use at your own risk):
docker run --rm -ti -p 8888:8888 \
drcandacemakedamoore/cleanx-jupyter-examples:develop
This will generate output similar to:
[I 12:59:52.383 NotebookApp] Writing notebook server cookie secret \
to /home/jupyter/.local/share/jupyter/runtime/notebook_cookie_secret
[I 12:59:52.704 NotebookApp] Serving notebooks from local directory:\
/home/jupyter
[I 12:59:52.704 NotebookApp] Jupyter Notebook 6.4.11 is running at:
[I 12:59:52.705 NotebookApp] http://localhost:8888/?token=...
[I 12:59:52.705 NotebookApp] or http://127.0.0.1:8888/?token=...
[I 12:59:52.705 NotebookApp] Use Control-C to stop this server and \
shut down all kernels (twice to skip confirmation).
[W 12:59:52.709 NotebookApp] No web browser found: could not locate\
runnable browser.
[C 12:59:52.709 NotebookApp]
To access the notebook, open this file in a browser:
file:///.../nbserver-1-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=...
or http://127.0.0.1:8888/?token=...
Copy the text that starts with http://127.0.0.1:8888 (including the
token) and paste it into your browser's address bar. The demos should
be fully operational (you may interact with them, re-evaluate them,
change available parameters etc.)
CLI Example
The problem above doesn't require writing any new Python code. We can
accomplish our task by calling the cleanX command like this:
mkdir cleaned
python -m cleanX images run-pipeline \
-s Acquire \
-s Normalize \
-s "Save(target='cleaned')" \
-j \
-r "/images/to/clean/*.jpg"
Let's look at the command's options and arguments:
- `python -m cleanX` uses Python's command-line option for loading the `cleanX` package. All command-line arguments that follow this part are interpreted by `cleanX`.
- The `images` sub-command is used for processing of images.
- The `run-pipeline` sub-command is used to start a `Pipeline` to process the images.
- The `-s` (repeatable) option specifies a `Pipeline` `Step`. Steps map to their class names as found in the `cleanX.image_work.steps` module. If the `__init__` function of a step doesn't take any arguments, only the class name is necessary. If, however, it takes arguments, they must be given as Python literals, using Python's named-arguments syntax.
- The `-j` option instructs the command to create a journaling pipeline. Journaling pipelines can be restarted from the point where they failed or were interrupted.
- The `-r` option specifies the source for the pipeline. While, normally, we will want to start with the `Acquire` step, if the pipeline was interrupted, we need to tell it where to look for the initial sources.
Once the command finishes, we should see the cleaned directory filled
with images with the same names they had in the source directory.
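The rule for step arguments (a bare class name, or a class name followed by named arguments written as Python literals) can be illustrated with a small parser. This is only a sketch of the syntax rule using the standard `ast` module. The function `parse_step` is hypothetical and is not how cleanX itself necessarily reads the step option.

```python
import ast


def parse_step(spec):
    """Split a step spec such as "Save(target='cleaned')" into
    (class_name, keyword_arguments).

    Hypothetical helper for illustration; not part of the cleanX API.
    """
    node = ast.parse(spec, mode="eval").body
    if isinstance(node, ast.Name):
        # Bare class name: the step takes no arguments.
        return node.id, {}
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if node.args or any(kw.arg is None for kw in node.keywords):
            raise ValueError("only named arguments are accepted: %r" % spec)
        # literal_eval rejects anything that is not a plain Python literal.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        return node.func.id, kwargs
    raise ValueError("not a step spec: %r" % spec)


print(parse_step("Acquire"))                 # ('Acquire', {})
print(parse_step("Save(target='cleaned')"))  # ('Save', {'target': 'cleaned'})
```

Because the whole spec has to reach the program as a single argument, specs containing parentheses or quotes are wrapped in double quotes on the shell command line, as in the command above.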
Let's consider another simple task: batch-extraction of images from DICOM files:
mkdir extracted
python -m cleanX dicom extract \
