# Pixels
Facilitates simple, large-scale processing of HLS medical images, documents, and zip files. Includes the OHIF Viewer, two segmentation models, and interactive learning.
## pixels Solution Accelerator

- ✅ Ingest and index DICOM image metadata (`.dcm` files and zip archives)
- ✅ Analyze DICOM image metadata with SQL and machine learning
- ✅ View, segment, and label DICOM images with the OHIF Viewer, integrated into Lakehouse Apps and the Databricks security model
- ✅ Launch model training from the OHIF Viewer with a single button push
- ✅ Integrate NVIDIA's MONAI to automatically segment medical images and train custom models
- ✅ Leverage Databricks Model Serving with serverless GPU-enabled clusters for real-time segmentation
## Secure, Lakehouse-integrated DICOM viewer powered by OHIF

<img src="https://github.com/databricks-industry-solutions/pixels/blob/main/images/LHA_AUTOSEG.gif?raw=true" alt="MONAI_AUTOSEG"/>
## Run SQL queries over DICOM metadata
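Once the metadata is saved, standard SQL works over the indexed tags. A sketch, assuming the catalog was saved to a table named `main.pixels_solacc.object_catalog` with the DICOM JSON stored in a `meta` column (both names depend on your configuration, and the tag-extraction paths shown are illustrative):

```sql
-- Count distinct studies per modality from the extracted DICOM metadata
-- (0008,0060) = Modality, (0020,000D) = StudyInstanceUID
SELECT meta:['00080060'].Value[0] AS modality,
       count(DISTINCT meta:['0020000D'].Value[0]) AS study_count
FROM main.pixels_solacc.object_catalog
GROUP BY modality
ORDER BY study_count DESC;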
## Build dashboards over DICOM metadata

Add any extracted features, too!
## DICOM data ingestion is easy

```python
# import the Pixels Catalog (indexer) and DICOM transformers & utilities
from dbx.pixels import Catalog
from dbx.pixels.dicom import *

# catalog all your files
catalog = Catalog(spark)
catalog_df = catalog.catalog(<path>)

# extract the DICOM metadata
meta_df = DicomMetaExtractor(catalog).transform(catalog_df)

# save your work for SQL access
catalog.save(meta_df)
```
You'll find this example in the `01-dcm-demo` notebook.
## Architecture

The Pixels reference solution architecture outlines a data processing and analytics framework designed for healthcare imaging applications. Here's a breakdown of its components:
### Key Functional Areas

- **AI/BI Analytics**: Supports cohort building and natural-language-based analysis.
- **Lakehouse Apps**: Includes an OHIF Viewer for labeling, plus customer-specific applications.
- **Deep Learning**: Facilitates active learning and custom model training.
- **Realtime Inferencing**: Integrates MONAI (Medical Open Network for AI) segmentation with the OHIF Viewer. Customer-provided proprietary models can be easily plugged in.
### Data Flow: Batch, Incremental, Streaming (Lakeflow)

The architecture processes data in stages:

- **Acquire**: Data arrives in ADLS, S3, or GCS cloud storage, governed by Unity Catalog (UC) Volumes. Thanks to the composable nature of the solution accelerator, sources such as VNA, PACS, CIFS, and AWS HealthImaging can be added based on customer demand.
- **Ingest**: All DICOM files are ingested. Ingesting and producing NIfTI file formats is currently on the roadmap.
- **Extract & Index**: Unzips archives, storing the extracted DICOM files in a UC Volume. All DICOM metadata tags are extracted and stored in Databricks Data Intelligence Platform tables.
- **Protect – Metadata**: Applies PHI (Protected Health Information) redaction via format-preserving encryption to all necessary tags.
- **Protect – Image**: Ensures PHI redaction for pixel-level data. This is under active integration, based on work Databricks has done in previous solution accelerators.
- **Inferencing**: Utilizes industry-standard, pre-trained MONAI open-source models sponsored by NVIDIA. Customers can also fine-tune the MONAI models or bring their own segmentation or featurization models.
### Supporting Layers

- **Governance Layer**: Unity Catalog provides data access controls and automatic capture of data lineage (including models).
- **Customer's Cloud Storage**: Stores object indexes, folders, and ML models in open formats in the customer's account.
- **Open Access**: Provides APIs, SQL connections, Spark integration, and credential vending via Delta Sharing.
This architecture is designed to handle healthcare imaging data securely while enabling advanced analytics and AI-driven insights.
## DICOMweb Apps Reference

For the Databricks Apps architecture and operations guide (viewer app, gateway app, QIDO/WADO/STOW implementation, caching, metrics, and config reference), see `README_DICOMWEB.md`.

The notebook-driven OHIF/MONAI sections in this README remain valid for interactive workspace workflows. For production DICOMweb deployments with the split `dicom_web` + `dicom_web_gateway` Databricks Apps architecture, use `README_DICOMWEB.md` as the source of truth.
## Getting started

- To run this accelerator, clone this repo into a Databricks workspace.
- Attach a notebook to Serverless Compute or a cluster (DBR >= 14.3 LTS).
- Run `config/setup.py` from the notebook. This installs the pixels package into your workspace.
- If you need additional libraries to decode or encode DICOM pixel data, consult the pydicom pixel data decompression guide to pick the right optional codec package(s).
## Run pipeline as a job

1. Attach the `RUNME` notebook to Serverless Compute or a cluster (DBR >= 14.3 LTS).
2. Execute the notebook via Run-All. You can configure the notebook tasks run by the job in `job_json`. A multi-step job describing the accelerator pipeline will be created, and a link to it will be provided. The cost associated with running the accelerator is the user's responsibility.
## Incremental processing

Pixels can ingest DICOM files incrementally, in a streaming fashion, using Databricks Auto Loader. To enable incremental processing, set `streaming` and `streamCheckpointBasePath` as follows:

```python
catalog_df = catalog.catalog(path, streaming=True, streamCheckpointBasePath=<checkpointPath>)
```
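Conceptually, the checkpoint persists which files have already been processed, so repeated runs only pick up new arrivals. Here is a simplified, pure-Python illustration of that idea; the real mechanism is Spark Structured Streaming state, not a JSON file, and `discover_new_files` and its checkpoint layout are hypothetical:

```python
import json
from pathlib import Path

def discover_new_files(landing_dir: str, checkpoint_file: str) -> list[str]:
    """Return files not yet recorded in the checkpoint, then update it.

    Simplified stand-in for Auto Loader's incremental discovery: the
    checkpoint records already-processed files, so only new arrivals
    are returned on subsequent runs.
    """
    ckpt = Path(checkpoint_file)
    seen = set(json.loads(ckpt.read_text())) if ckpt.exists() else set()
    current = {str(p) for p in Path(landing_dir).rglob("*") if p.is_file()}
    new_files = sorted(current - seen)
    # persist the union so the next run skips everything seen so far
    ckpt.write_text(json.dumps(sorted(seen | current)))
    return new_files
```

Running this twice against the same directory returns each file exactly once, which is the behavior the checkpoint path provides for the stream.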
### Optional: managed file events with Auto Loader

For higher scalability, you can enable managed file events for file discovery instead of directory listing.

```python
catalog_df = catalog.catalog(
    path,
    streaming=True,
    streamCheckpointBasePath=<checkpointPath>,
    useManagedFileEvents=True,
    includeExistingFiles=True,
    allowOverwrites=False,
    maxFileAge="90 days"
)
```
Best practices:

- Use Unity Catalog Volumes or external locations governed by Unity Catalog.
- Ensure the stream runs at least once every 7 days to keep file events warm.
- Keep `allowOverwrites=False` unless upstream systems can overwrite files.
- Use `maxFileAge` to bound discovery windows for large or high-churn landing zones.
- Reuse a stable checkpoint path across runs to avoid reprocessing.
## Built-in unzip

Pixels automatically extracts zip files found in the defined volume path. If `extractZip` is not enabled, zip files are ignored. To enable the unzip capability, set `extractZip=True`. The `extractZipBasePath` parameter is optional; by default, files are extracted under the volume path + `/unzipped/`.

```python
catalog_df = catalog.catalog(path, extractZip=True, extractZipBasePath=<unzipPath>)
```
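The default-path behavior described above can be sketched in plain Python. This is an illustrative helper, not the pixels implementation; `extract_zip` and its return value are assumptions:

```python
import zipfile
from pathlib import Path

def extract_zip(zip_path: str, extract_zip_base_path: str = "") -> list[str]:
    """Extract a zip archive, mimicking the documented default:
    when no base path is given, extract under '<parent>/unzipped/'.
    Returns the paths of the extracted files.
    """
    zip_file = Path(zip_path)
    base = (Path(extract_zip_base_path) if extract_zip_base_path
            else zip_file.parent / "unzipped")
    target = base / zip_file.stem  # keep archives from colliding
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_file) as zf:
        zf.extractall(target)
        return [str(target / name) for name in zf.namelist()]
```

In pixels, the extracted `.dcm` files then flow through the same catalog/metadata pipeline as loose files.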
## Metadata Anonymization

Pixels provides a feature to anonymize DICOM metadata, ensuring patient privacy and compliance with regulations. This feature can be enabled during the cataloging process; an example can be explored in the `03-Metadata-DeIdentification` notebook. To enable metadata anonymization, use the following extractor:
```python
metadata_df = DicomMetaAnonymizerExtractor(
    catalog,
    anonym_mode="METADATA",
    fp_key=<fp_key>,      # hex string only: 128, 192, or 256 bits
    fp_tweak=<fp_tweak>,  # hex string only: 64 bits
    anonymization_base_path=<anonym_path>
).transform(catalog_df)
```
`fp_key` is the format-preserving encryption key that keeps the anonymization process consistent across runs. It is used to generate pseudonyms for sensitive data fields, so the same input value always maps to the same pseudonym. This preserves the ability to link records across datasets without revealing the original sensitive information.

`fp_tweak` is an optional parameter that adds a further layer of randomness to the pseudonymization process, enhancing privacy.

Setting `anonym_mode="METADATA"` anonymizes the DICOM metadata during ingestion, ensuring that sensitive patient information is not stored in the catalog. By default, the anonymized DICOM files are saved under the path given by `anonymization_base_path`.
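To make the determinism property concrete, here is a minimal sketch using HMAC-based pseudonymization. This is an illustrative stand-in, not the format-preserving encryption pixels actually uses (true FPE additionally preserves the length and character set of the input); the `pseudonymize` helper and its 16-character output are assumptions:

```python
import hmac
import hashlib

def pseudonymize(value: str, fp_key: bytes, fp_tweak: bytes = b"") -> str:
    """Deterministically map a sensitive value to a pseudonym.

    The same (value, key, tweak) triple always yields the same output,
    so records stay linkable across datasets without exposing the
    original. Unlike real format-preserving encryption, this does not
    keep the input's length or alphabet.
    """
    digest = hmac.new(fp_key, fp_tweak + value.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]
```

With a fixed key and tweak, the same patient identifier always produces the same token, which is exactly the cross-dataset linkability described above; changing the tweak produces an unrelated token.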
## Remove UN Tags

DICOM files can contain elements with Value Representation UN (Unknown): tags that could not be resolved to a specific VR during parsing. These tags often carry unstructured or proprietary data that can bloat the extracted metadata, cause serialization issues, or introduce noise in downstream analytics.

Pixels provides a built-in option to strip all UN VR elements from the dataset before metadata extraction. The removal is recursive, so UN elements nested inside sequences are removed as well.
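As a simplified model of that recursive removal (using nested dicts in place of a real pydicom dataset; the structure and the `strip_un_elements` helper are assumptions for illustration):

```python
def strip_un_elements(dataset: dict) -> dict:
    """Recursively drop elements whose VR is 'UN' from a dataset.

    Each element is modeled as {"vr": ..., "value": ...}; an 'SQ'
    (sequence) element's value is a list of nested datasets, which are
    cleaned recursively so nested UN elements are removed too.
    """
    cleaned = {}
    for tag, elem in dataset.items():
        if elem["vr"] == "UN":
            continue  # unknown VR: drop before metadata extraction
        if elem["vr"] == "SQ":
            elem = {"vr": "SQ",
                    "value": [strip_un_elements(item) for item in elem["value"]]}
        cleaned[tag] = elem
    return cleaned
```

The real implementation operates on parsed DICOM datasets, but the recursion over sequence items is the key point: UN elements are removed at every nesting level, not just the top one.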