PDCP

PDCP is a tool for collecting and processing patients' radiotherapy imaging data from the Orthanc research PACS via its REST API.

Welcome to PDCP's documentation!

Overview (version 0.0.1)

Patient Data Collection and Processing (PDCP) is a tool for collecting and processing patients' radiotherapy imaging data from the Orthanc research PACS via its REST API.

Orthanc is a simple, standalone DICOM server (PACS) that supports the DICOM and DICOMweb protocols. In the AusCAT network, it is used to store patients' anonymized radiotherapy data exported from the hospital PACS.

PDCP is a Python module consisting of several classes used to prepare patient records. It follows an object-oriented design and uses inheritance so that code can be reused and specific functions extended. The patientImaging class is the main class; it contains the functions used to collect, validate, and track the data preparation process.

Other classes that inherit from patientImaging were created to handle different cancer sites and projects:

patientImagingCRDP: inherits from patientImaging; used to collect and process cancer patients' data from the AusCAT data centres, where four modalities must be linked (C: CT, R: RTSTRUCT, D: RTDOSE, P: RTPLAN).
patientImagingCRD: inherits from patientImaging; used to collect and process cancer patients' radiotherapy data where the RTPLAN cannot be obtained.
patientImagingCR: inherits from patientImaging; used to collect and process patient data where only the CT and the RTSTRUCT need to be linked (segmentation tasks).
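
The inheritance pattern described above can be sketched as follows. This is a minimal illustration and not PDCP's actual code: the class names, the `required_modalities` attribute, and both methods are hypothetical, chosen only to show how subclasses could differ in the modalities they link.

```python
class ImagingCollector:
    """Hypothetical base class holding the shared validation logic."""

    # Subclasses override this tuple with the modalities they need to link.
    required_modalities = ()

    def missing_modalities(self, available):
        """Return the required modalities that are absent from `available`."""
        return [m for m in self.required_modalities if m not in available]

    def is_complete(self, available):
        """True when every required modality is available."""
        return not self.missing_modalities(available)


class CollectorCRDP(ImagingCollector):
    required_modalities = ("CT", "RTSTRUCT", "RTDOSE", "RTPLAN")


class CollectorCRD(ImagingCollector):
    required_modalities = ("CT", "RTSTRUCT", "RTDOSE")


class CollectorCR(ImagingCollector):
    required_modalities = ("CT", "RTSTRUCT")


study_modalities = {"CT", "RTSTRUCT", "RTDOSE"}
print(CollectorCRDP().missing_modalities(study_modalities))  # ['RTPLAN']
print(CollectorCRD().is_complete(study_modalities))          # True
```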

Other classes/functions were implemented to facilitate the data preparation task:

ReadPatientImagingData: loads a patient's processed data into a Python dictionary ({'CT':,'masks':,etc}).
ROIP: a class used to generate the central slices of a patient's organs at risk (OARs) and target volumes (TVs).
DVH: a class used to generate dosimetry features for the patients.
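
To make the last two items concrete, here is a rough sketch of what "central slice" and "dosimetry features" could mean for a single structure. It does not reproduce the ROIP or DVH classes: the function names, the nested-list layout `[slice][row][column]`, the choice of features, and the 20 Gy threshold are all assumptions, and the dose grid is assumed to be already resampled onto the mask grid with equal voxel volumes.

```python
def central_slice_index(mask):
    """Index of the middle slice among those where the structure is present.

    Returns None if the structure is absent from every slice.
    """
    occupied = [k for k, sl in enumerate(mask) if any(any(row) for row in sl)]
    if not occupied:
        return None
    return occupied[len(occupied) // 2]


def roi_doses(dose, mask):
    """Collect the dose values of all voxels inside the mask."""
    return [
        d
        for dose_slice, mask_slice in zip(dose, mask)
        for dose_row, mask_row in zip(dose_slice, mask_slice)
        for d, m in zip(dose_row, mask_row)
        if m
    ]


def dose_features(dose, mask, thresholds=(20.0,)):
    """Min, max, mean dose and V_x (fraction of voxels receiving >= x Gy)."""
    values = roi_doses(dose, mask)
    if not values:
        return {}
    features = {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }
    for t in thresholds:
        features[f"V{t:g}"] = sum(v >= t for v in values) / len(values)
    return features


# Toy volume: 4 slices of 2x2 voxels; the structure occupies slices 1 to 3.
mask = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
    [[0, 1], [0, 0]],
]
dose = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[10.0, 1.0], [1.0, 1.0]],
    [[20.0, 30.0], [1.0, 1.0]],
    [[1.0, 40.0], [1.0, 1.0]],
]
print(central_slice_index(mask))  # 2
print(dose_features(dose, mask))
```

Plain lists keep the sketch dependency-free; with NumPy arrays the same selection is a single boolean-mask index.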

Several python packages were utilized to collect the data and to process the images:

requests: a third-party Python HTTP library used to retrieve the required files from an Orthanc server through its REST API.
pyorthanc: a Python library that wraps the Orthanc REST API and provides several utilities for manipulating data. It is used in this script to identify patient-related files. It was initially also used to retrieve data through functions such as get_instance_file() and get_instance_simplified_tags(), but it proved slower than direct calls made with requests.
concurrent.futures: a Python standard-library module used for threading and multiprocessing.
pydicom: a Python library for handling DICOM files. The Orthanc server returns files as byte arrays; pydicom functions are used to convert these arrays into pydicom FileDataset objects before they are saved into the patient's directory.
SimpleITK: a simplified, open-source interface to the Insight Segmentation and Registration Toolkit (ITK). It is used to resample the dose grid.
numpy: used to store radiotherapy data as 3D arrays.
pandas: a Python library for tabular data; DataFrames are used to store the image summaries in this script.
json: used to save patient notes while the records are collected and processed.
hdf5storage: used to save and load MATLAB files that contain the patient's data (not used).
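
As a hedged illustration of how requests, concurrent.futures, and pydicom can fit together, the sketch below downloads instance files through Orthanc's `/instances/{id}/file` route in a thread pool and parses the returned bytes. The function names, the default worker count, and the timeout are assumptions and are not taken from PDCP; the third-party imports are deferred so the module loads even where those packages are missing.

```python
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO


def instance_file_url(base_url, instance_id):
    """Orthanc serves the raw DICOM file of an instance at /instances/{id}/file."""
    return f"{base_url.rstrip('/')}/instances/{instance_id}/file"


def fetch_bytes(url, auth=None, timeout=30):
    """Download a response body as bytes with requests."""
    import requests  # third-party, imported lazily

    response = requests.get(url, auth=auth, timeout=timeout)
    response.raise_for_status()
    return response.content


def bytes_to_dataset(raw):
    """Parse DICOM bytes into a pydicom FileDataset."""
    import pydicom  # third-party, imported lazily

    return pydicom.dcmread(BytesIO(raw))


def fetch_many(base_url, instance_ids, fetch=fetch_bytes, max_workers=4):
    """Download several instances concurrently; returns {instance_id: bytes}.

    `fetch` is injectable so the function can be exercised without a server.
    """
    urls = [instance_file_url(base_url, i) for i in instance_ids]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each ID with its payload.
        return dict(zip(instance_ids, pool.map(fetch, urls)))
```

Against a live server, each payload could then be converted with `bytes_to_dataset(raw)` and written to the patient's directory with the dataset's `save_as(path)` method.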

What can PDCP do?

PDCP facilitates the collection and preparation of radiotherapy imaging data. PDCP can:

Query, retrieve, and validate patient imaging summaries from an Orthanc PACS.
Analyse associations in patient studies (linking the required modalities).
Retrieve patient imaging data into a local directory.
Prepare the records for use in various research questions (dosimetry analyses, contouring, image standardization).
Track the patients' data collection process and identify the reasons for excluding patients' data.
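
The tracking idea in the last item can be sketched as one JSON note file per patient that accumulates events. The layout below (an `events` list with `stage`, `status`, and `reason` fields) and the function name are hypothetical and are not PDCP's actual note format.

```python
import json
import tempfile
from pathlib import Path


def record_note(notes_dir, patient_id, stage, status, reason=None):
    """Append one event to the patient's JSON note file and return all notes."""
    path = Path(notes_dir) / f"{patient_id}.json"
    if path.exists():
        notes = json.loads(path.read_text())
    else:
        notes = {"patient_id": patient_id, "events": []}
    notes["events"].append({"stage": stage, "status": status, "reason": reason})
    path.write_text(json.dumps(notes, indent=2))
    return notes


# Demonstration in a temporary directory, with a made-up patient ID.
with tempfile.TemporaryDirectory() as notes_dir:
    record_note(notes_dir, "P001", "verification", "passed")
    notes = record_note(
        notes_dir, "P001", "linkage", "excluded", reason="no RTPLAN linked to RTSTRUCT"
    )
    print(len(notes["events"]))  # 2
```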

Data Collection Example (Public Data)

https://zenodo.org/record/5847536

Documentation

Linked Software Paper:

Haidar, A.; Aly, F.; Holloway, L. PDCP: A Set of Tools for Extracting, Transforming, and Loading Radiotherapy Data from the Orthanc Research PACS. Software 2022, 1, 215-222. https://doi.org/10.3390/software1020009

Code Documentation:

https://australiancancerdatanetwork.github.io/PDCP/html/index.html

Next Steps

Patient data are currently retrieved over HTTP. In future versions, we aim to use WebDAV, an extension of HTTP that allows remote web content authoring operations, so that DICOM tags and indexed images can be used from Orthanc directly. With WebDAV, data can be viewed by patient, study, or UID, which will help in managing DICOM data at the centres.
We aim to use Orthanc plugins such as serve-folders to track patient notes from web pages.
At this stage, patients with multiple studies require manual review to select the right study. We aim to automate this process.

Selecting Modalities in a Patient Study

The current linkage is performed without retrieving patients' data from the Orthanc server. Patients are verified before any data are retrieved. Further details about verification can be found in the documentation associated with the repo.
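
A linkage of this kind can work from metadata alone because DICOM objects reference each other by UID: an RTSTRUCT references the CT series it was drawn on, an RTPLAN references its structure set, and an RTDOSE references its plan. The sketch below follows those references over a list of summary rows. The row layout (`modality`, `series_uid`, `sop_uid`, `references`) is a simplification invented for this example; PDCP's own summaries and linking logic may differ.

```python
def link_modalities(summaries):
    """Return every complete CT -> RTSTRUCT -> RTPLAN -> RTDOSE chain."""
    by_modality = {}
    for row in summaries:
        by_modality.setdefault(row["modality"], []).append(row)

    ct_series = {row["series_uid"] for row in by_modality.get("CT", [])}
    chains = []
    for struct in by_modality.get("RTSTRUCT", []):
        if struct["references"] not in ct_series:
            continue  # structure set drawn on a CT we do not have
        for plan in by_modality.get("RTPLAN", []):
            if plan["references"] != struct["sop_uid"]:
                continue
            for dose in by_modality.get("RTDOSE", []):
                if dose["references"] == plan["sop_uid"]:
                    chains.append(
                        {
                            "CT": struct["references"],
                            "RTSTRUCT": struct["sop_uid"],
                            "RTPLAN": plan["sop_uid"],
                            "RTDOSE": dose["sop_uid"],
                        }
                    )
    return chains


# Made-up UIDs: one complete chain plus an orphan dose file.
summaries = [
    {"modality": "CT", "series_uid": "ct-1"},
    {"modality": "RTSTRUCT", "sop_uid": "rs-1", "references": "ct-1"},
    {"modality": "RTPLAN", "sop_uid": "rp-1", "references": "rs-1"},
    {"modality": "RTDOSE", "sop_uid": "rd-1", "references": "rp-1"},
    {"modality": "RTDOSE", "sop_uid": "rd-2", "references": "rp-9"},
]
print(link_modalities(summaries))
```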

Selecting a Patient Study

A study is selected if it contains the required modalities specified by the user.
A study is discarded if the selected CT series contains a large number of instances.
A study is discarded if it has the required RTSTRUCT but the RTSTRUCT contains none of the required keywords (e.g. a study whose RTSTRUCT only has the contour names PATIENT and ISO will not be used).
A study is discarded, and flagged for review, if it contains multiple CTs with multiple associations.
A study is discarded if its study name contains a keyword that should not be present (e.g. a breast cohort is being collected while the study name shows 'head and neck').
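
The rules above can be read as a sequence of checks. The following sketch encodes them under several assumptions: the `study` dictionary layout is hypothetical, keyword matching is taken to be a case-insensitive substring test, and the limit of 500 CT instances is an arbitrary placeholder because the text does not state what counts as "a large number". The configuration keys are the ones listed in Step 1 of the example.

```python
def contains_any(text, keywords):
    """Case-insensitive substring match against a list of keywords."""
    text = text.lower()
    return any(keyword.lower() in text for keyword in keywords)


def assess_study(study, config, max_ct_instances=500):
    """Return (selected, reason) for one study summary."""
    required = set(config["required_modalities_for_patient"])
    missing = required - set(study["modalities"])
    if missing:
        return False, f"missing modalities: {sorted(missing)}"
    if contains_any(study["description"], config["study_desc_should_not_contain"]):
        return False, "study description contains an excluded keyword"
    if study["ct_instance_count"] > max_ct_instances:
        return False, "CT series has too many instances"
    if "RTSTRUCT" in required and not any(
        contains_any(name, config["possibilities"]) for name in study["contour_names"]
    ):
        return False, "RTSTRUCT has none of the required contour keywords"
    if study["linked_ct_count"] > 1:
        return False, "multiple linked CTs; manual review required"
    return True, "selected"


config = {
    "required_modalities_for_patient": ["CT", "RTSTRUCT", "RTPLAN", "RTDOSE"],
    "study_desc_should_not_contain": ["head and neck"],
    "possibilities": ["ptv", "ctv"],
}
study = {
    "description": "Breast L",
    "modalities": ["CT", "RTSTRUCT", "RTPLAN", "RTDOSE"],
    "ct_instance_count": 180,
    "contour_names": ["PATIENT", "PTV_50", "Heart"],
    "linked_ct_count": 1,
}
print(assess_study(study, config))  # (True, 'selected')
```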

Example Data Collection

The example below shows the steps followed to prepare a cohort of 10 patients.

This example was run with patientImagingCRDP for a breast cohort, where the targeted linkage is CT --> RTSTRUCT --> RTPLAN --> RTDOSE.

Step 1: Setup Configuration File

Each data collection and preparation task requires a configuration file. The first step is to create the configuration as a JSON file.

Here is a brief overview of the required keys:

required_modalities_for_patient: a list of the required modalities, e.g. ['CT','RTSTRUCT','RTPLAN','RTDOSE'].
study_desc_should_not_contain: a list of keywords the patient study description should not contain.
study_desc_may_contain: a list of keywords a patient study description might contain.
possibilities: a list of keywords, at least one of which must be found in the RTSTRUCT contour names.
imagesummaries: a directory to hold the summaries of DICOM tags for the patient instances found in the targeted Orthanc server.
patientnotesdir: a directory to hold the patient notes.
datadictionary: a directory to hold the patients' records.
quarantine: a directory to hold patients with multiple studies. Some patients are rescanned during a course of treatment; others have multiple courses of treatment.
link_to_ids: a csv file that contains the patient IDs and their corresponding indexed Orthanc IDs.
ipport: the IP address and port of the Orthanc server.
username: the username of an account on the Orthanc server.
password: the password of an account on the Orthanc server. Leave empty if not needed.
CONNECTIONS: the number of connections to use when querying the server.
TIMEOUT: the total number of seconds to wait when sending a request to the server.
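
A configuration with these keys might look like the sketch below, which builds the JSON text and checks that no required key is missing. The key names come from the list above; every value (paths, keywords, the `ip:port` string format, the numeric types) is an illustrative assumption, and `load_config` is a hypothetical helper rather than part of PDCP. Port 8042 is Orthanc's default HTTP port.

```python
import json

REQUIRED_KEYS = [
    "required_modalities_for_patient", "study_desc_should_not_contain",
    "study_desc_may_contain", "possibilities", "imagesummaries",
    "patientnotesdir", "datadictionary", "quarantine", "link_to_ids",
    "ipport", "username", "password", "CONNECTIONS", "TIMEOUT",
]

# Example values only; adjust every entry to the local setup.
example_config = {
    "required_modalities_for_patient": ["CT", "RTSTRUCT", "RTPLAN", "RTDOSE"],
    "study_desc_should_not_contain": ["head and neck"],
    "study_desc_may_contain": ["breast"],
    "possibilities": ["ptv", "ctv"],
    "imagesummaries": "collection/imagesummaries",
    "patientnotesdir": "collection/patientnotes",
    "datadictionary": "collection/data",
    "quarantine": "collection/quarantine",
    "link_to_ids": "collection/ids/ids.csv",
    "ipport": "127.0.0.1:8042",
    "username": "",
    "password": "",
    "CONNECTIONS": 4,
    "TIMEOUT": 60,
}


def load_config(text):
    """Parse a JSON configuration and fail early if a required key is absent."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"configuration is missing keys: {missing}")
    return config


config_text = json.dumps(example_config, indent=2)
loaded = load_config(config_text)
print(loaded["required_modalities_for_patient"])
```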

Step 2: Prepare directories

Create a directory to host the patients' data and notes. This directory should include:

ids: a directory holding the .csv file that maps the patient IDs to their corresponding Orthanc IDs (link_to_ids in the config file).
imagesummaries: a directory holding the csv files that summarize each patient's imaging data in the targeted Orthanc server.
patientnotes: a directory holding the JSON objects that contain each patient's data collection and preparation notes.
data: a directory holding the patient data. A new directory is created for each patient.
quarantine: a directory holding data for quarantined patients. Each study is treated as if it were the only study associated with the patient, so once a study is selected it is manually moved into the data directory.
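
The layout above can be created in one call. This is a small sketch using only the directory names listed in this step; the function name and the root directory are placeholders.

```python
import tempfile
from pathlib import Path

SUBDIRECTORIES = ("ids", "imagesummaries", "patientnotes", "data", "quarantine")


def prepare_directories(root):
    """Create the collection sub-directories under `root` if they are missing."""
    root = Path(root)
    created = []
    for name in SUBDIRECTORIES:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created


# Demonstration in a temporary location.
with tempfile.TemporaryDirectory() as tmp:
    paths = prepare_directories(Path(tmp) / "collection")
    print([p.name for p in paths])
```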

Step 3: Retrieve Orthanc Ids

The third step is to retrieve the patients' associated IDs from the Orthanc server. The Orthanc server creates its own ID for each patient. These values are stored in an SQLite database by default, with the option of using other database back ends through plugins (e.g. PostgreSQL).

After running this script, each row in the generated .csv file contains a patient identifier and its corresponding Orthanc identifier, which the Orthanc server uses to index the patient's studies.
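
One way such a mapping could be produced is sketched here; it is not PDCP's own script. It relies on two Orthanc REST routes: `GET /patients` returns the list of Orthanc patient IDs, and `GET /patients/{id}` returns a JSON object whose `MainDicomTags` include `PatientID`. The function names and the CSV column headers are assumptions, and the HTTP call is injectable so the logic can be tried without a server.

```python
import csv


def requests_get_json(url, auth=None, timeout=30):
    """Fetch a URL and decode its JSON body with requests."""
    import requests  # third-party, imported lazily

    response = requests.get(url, auth=auth, timeout=timeout)
    response.raise_for_status()
    return response.json()


def list_patient_ids(base_url, get_json=requests_get_json):
    """Return (patient_id, orthanc_id) pairs for every patient on the server."""
    base_url = base_url.rstrip("/")
    rows = []
    for orthanc_id in get_json(f"{base_url}/patients"):
        details = get_json(f"{base_url}/patients/{orthanc_id}")
        rows.append((details["MainDicomTags"]["PatientID"], orthanc_id))
    return rows


def write_id_links(rows, csv_path):
    """Write the ID pairs to a CSV file with a header row."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["patient_id", "orthanc_id"])  # assumed column names
        writer.writerows(rows)
```

With a reachable server, `write_id_links(list_patient_ids("http://127.0.0.1:8042"), "ids/ids.csv")` would produce the file, where the address and the output path are placeholders.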

The script is currentl
