NVIDIA DALI
===========

.. overview-begin-marker-do-not-remove

The NVIDIA Data Loading Library (DALI) is a GPU-accelerated library for data loading and pre-processing to accelerate deep learning applications. It provides a collection of highly optimized building blocks for loading and processing image, video, and audio data. It can be used as a portable drop-in replacement for built-in data loaders and data iterators in popular deep learning frameworks.

Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding, cropping, resizing, and many other augmentations. These data processing pipelines, which are currently executed on the CPU, have become a bottleneck, limiting the performance and scalability of training and inference.

DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline. Features such as prefetching, parallel execution, and batch processing are handled transparently for the user.
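DALI's executor handles prefetching for you. As a rough illustration of the idea only (plain Python, not DALI code and not how DALI implements it), the sketch below processes upcoming items on a background thread while the consumer works on the current one; a bounded queue of size ``depth`` limits how far ahead the producer can run. The names ``prefetch``, ``depth``, and ``_Failure`` are hypothetical.

.. code-block:: python

   import queue
   import threading
   from typing import Callable, Iterable, Iterator, TypeVar

   T = TypeVar("T")
   U = TypeVar("U")

   _DONE = object()  # sentinel: the producer has finished


   class _Failure:
       """Carries an exception from the worker thread to the consumer."""

       def __init__(self, exc: BaseException) -> None:
           self.exc = exc


   def prefetch(items: Iterable[T], process: Callable[[T], U], depth: int = 2) -> Iterator[U]:
       """Yield process(item) for each item, computing up to `depth` results ahead."""
       ready: "queue.Queue[object]" = queue.Queue(maxsize=depth)

       def worker() -> None:
           try:
               for item in items:
                   ready.put(process(item))  # blocks while `depth` results are waiting
           except Exception as exc:
               ready.put(_Failure(exc))  # forward the error instead of hanging the consumer
           ready.put(_DONE)

       threading.Thread(target=worker, daemon=True).start()
       while True:
           result = ready.get()
           if result is _DONE:
               return
           if isinstance(result, _Failure):
               raise result.exc
           yield result  # type: ignore[misc]

For example, ``list(prefetch(range(5), lambda x: x * x))`` returns ``[0, 1, 4, 9, 16]``: a single worker thread feeding a FIFO queue preserves the input order. DALI additionally overlaps its CPU and GPU stages and operates on whole batches, which this sketch does not attempt.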

In addition, deep learning frameworks each have their own data pre-processing implementations, which makes training and inference workflows harder to port and code harder to maintain. Data processing pipelines implemented with DALI are portable because they can easily be retargeted to TensorFlow, PyTorch, and PaddlePaddle.

.. image:: /dali.png
   :width: 800
   :align: center
   :alt: DALI Diagram

DALI in action:

.. container:: dali-tabs

   Pipeline mode:

   .. code-block:: python

      from nvidia.dali.pipeline import pipeline_def
      import nvidia.dali.types as types
      import nvidia.dali.fn as fn
      from nvidia.dali.plugin.pytorch import DALIGenericIterator
      import os

      # DALI_EXTRA_PATH should point to a local copy of https://github.com/NVIDIA/DALI_extra.
      # To run with different data, see the documentation of nvidia.dali.fn.readers.file.
      data_root_dir = os.environ['DALI_EXTRA_PATH']
      images_dir = os.path.join(data_root_dir, 'db', 'single', 'jpeg')


      # Placeholders: replace with your own loss, model, and optimization step.
      def loss_func(pred, y):
          pass


      def model(x):
          pass


      def backward(loss, model):
          pass


      @pipeline_def(num_threads=4, device_id=0)
      def get_dali_pipeline():
          images, labels = fn.readers.file(
              file_root=images_dir, random_shuffle=True, name="Reader")
          # decode and randomly crop on the GPU ("mixed": CPU input, GPU output)
          images = fn.decoders.image_random_crop(
              images, device="mixed", output_type=types.RGB)
          # the rest of the processing happens on the GPU as well
          images = fn.resize(images, resize_x=256, resize_y=256)
          images = fn.crop_mirror_normalize(
              images,
              crop_h=224,
              crop_w=224,
              # ImageNet mean/std, scaled to the 0-255 range of decoded 8-bit images
              mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
              std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
              mirror=fn.random.coin_flip())
          return images, labels


      train_data = DALIGenericIterator(
          [get_dali_pipeline(batch_size=16)],
          ['data', 'label'],
          reader_name='Reader'
      )


      # each iteration yields a list with one dict per pipeline, keyed by output name
      for data in train_data:
          x, y = data[0]['data'], data[0]['label']
          pred = model(x)
          loss = loss_func(pred, y)
          backward(loss, model)

   Dynamic mode:

   .. code-block:: python

      import os
      import nvidia.dali.types as types
      import nvidia.dali.experimental.dynamic as ndd
      import torch

      # DALI_EXTRA_PATH should point to a local copy of https://github.com/NVIDIA/DALI_extra.
      # To run with different data, see the documentation of ndd.readers.File.
      data_root_dir = os.environ['DALI_EXTRA_PATH']
      images_dir = os.path.join(data_root_dir, 'db', 'single', 'jpeg')


      # Placeholders: replace with your own loss, model, and optimization step.
      def loss_func(pred, y):
          pass


      def model(x):
          pass


      def backward(loss, model):
          pass


      reader = ndd.readers.File(file_root=images_dir, random_shuffle=True)

      for images, labels in reader.next_epoch(batch_size=16):
          # decode and randomly crop on the GPU
          images = ndd.decoders.image_random_crop(images, device="gpu", output_type=types.RGB)
          # the rest of the processing happens on the GPU as well
          images = ndd.resize(images, resize_x=256, resize_y=256)
          images = ndd.crop_mirror_normalize(
              images,
              crop_h=224,
              crop_w=224,
              mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
              std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
              mirror=ndd.random.coin_flip(),
          )

          # wrap the DALI batches as PyTorch tensors; the labels are moved to the GPU first
          x = torch.as_tensor(images)
          y = torch.as_tensor(labels.gpu())

          pred = model(x)
          loss = loss_func(pred, y)
          backward(loss, model)

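Both examples end with ``crop_mirror_normalize``. To make its arithmetic concrete, here is a plain-Python reference sketch for a single image stored as nested lists in HWC layout. It rests on assumptions worth checking against the operator documentation: the crop window is centered (we assume the default crop position is the image center), ``mirror`` means a horizontal flip, and the result is planar (CHW), which we take to be the default output layout. The function name ``crop_mirror_normalize_ref`` is hypothetical, and DALI's exact rounding of the crop window may differ.

.. code-block:: python

   from typing import List, Sequence

   Image = List[List[List[float]]]  # H x W x C, nested lists


   def crop_mirror_normalize_ref(
       img: Image,
       crop_h: int,
       crop_w: int,
       mean: Sequence[float],
       std: Sequence[float],
       mirror: bool,
   ) -> List[List[List[float]]]:
       """Return a C x H x W result: centered crop, optional horizontal flip, (x - mean) / std."""
       h, w = len(img), len(img[0])
       if crop_h > h or crop_w > w:
           raise ValueError("crop window larger than the image")
       y0 = (h - crop_h) // 2  # assumed: centered crop anchor
       x0 = (w - crop_w) // 2
       rows = [row[x0:x0 + crop_w] for row in img[y0:y0 + crop_h]]
       if mirror:
           rows = [row[::-1] for row in rows]  # horizontal flip: reverse each row
       # per-channel normalization, emitted channel-first (CHW)
       return [
           [[(px[c] - mean[c]) / std[c] for px in row] for row in rows]
           for c in range(len(mean))
       ]

With the ImageNet statistics used in the examples (scaled by 255 because decoded images are 8-bit), a red value of 255 maps to (255 - 0.485 * 255) / (0.229 * 255) = 0.515 / 0.229 ≈ 2.249.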
Highlights
----------

- Easy-to-use, functional-style Python API.
- Support for multiple data formats: LMDB, RecordIO, TFRecord, COCO, JPEG, JPEG 2000, WAV, FLAC, OGG, H.264, VP9, and HEVC.
- Portable across popular deep learning frameworks: TensorFlow, PyTorch, PaddlePaddle, and JAX.
- Supports CPU and GPU execution.
- Scalable across multiple GPUs.
- Flexible graphs let developers create custom pipelines.
- Extensible for user-specific needs with custom operators.
- Accelerates image classification (ResNet-50) and object detection (SSD) workloads, as well as ASR models (Jasper, RNN-T).
- Allows a direct data path between storage and GPU memory with `GPUDirect Storage <https://developer.nvidia.com/gpudirect-storage>`__.
- Easy integration with `NVIDIA Triton Inference Server <https://developer.nvidia.com/nvidia-triton-inference-server>`__ through the `DALI Triton Backend <https://github.com/triton-inference-server/dali_backend>`__.
- Open source.
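Scaling to several GPUs usually means running one pipeline per GPU, with each pipeline's reader reading a different shard of the dataset; DALI readers take ``shard_id`` and ``num_shards`` arguments for this purpose. The sketch below shows only the index arithmetic, assuming contiguous shards whose sizes differ by at most one sample. ``shard_bounds`` and ``all_shards`` are hypothetical helpers, not DALI functions, and DALI's readers have further options that can change which samples each shard sees.

.. code-block:: python

   from typing import List, Tuple


   def shard_bounds(shard_id: int, num_shards: int, dataset_size: int) -> Tuple[int, int]:
       """Half-open [start, end) range of sample indices read by one shard (one GPU)."""
       if num_shards <= 0 or not 0 <= shard_id < num_shards:
           raise ValueError("expected 0 <= shard_id < num_shards")
       start = dataset_size * shard_id // num_shards
       end = dataset_size * (shard_id + 1) // num_shards
       return start, end


   def all_shards(num_shards: int, dataset_size: int) -> List[Tuple[int, int]]:
       """Bounds for every shard, one entry per GPU/pipeline."""
       return [shard_bounds(i, num_shards, dataset_size) for i in range(num_shards)]


   if __name__ == "__main__":
       # 10 samples split across 4 GPUs: shard sizes differ by at most one
       print(all_shards(4, 10))  # [(0, 2), (2, 5), (5, 7), (7, 10)]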

.. overview-end-marker-do-not-remove

DALI success stories:

- `During Kaggle computer vision competitions <https://www.kaggle.com/code/theoviel/rsna-breast-baseline-faster-inference-with-dali>`__: `"DALI is one of the best things I have learned in this competition" <https://www.kaggle.com/competitions/rsna-breast-cancer-detection/discussion/391059>`__
- `Lightning Pose - state-of-the-art pose estimation research model <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168383/>`__
- `To improve the resource utilization in Advanced Computing Infrastructure <https://arcwiki.rs.gsu.edu/en/dali/using_nvidia_dali_loader>`__
- `MLPerf - the industry standard for benchmarking compute and deep learning hardware and software <https://developer.nvidia.com/blog/mlperf-hpc-v1-0-deep-dive-into-optimizations-leading-to-record-setting-nvidia-performance/>`__
- `"we optimized major models inside eBay with the DALI framework" <https://www.nvidia.com/en-us/on-demand/session/gtc24-s62578/>`__

DALI Roadmap
------------

`This GitHub issue <https://github.com/NVIDIA/DALI/issues/5320>`__ gives a high-level overview of our 2024 plan. Be aware that this roadmap may change at any time and that the order of its items does not reflect any priority.

We strongly encourage you to comment on the roadmap and give us feedback in that GitHub issue.

Installing DALI
---------------

To install the latest DALI release for the latest CUDA version (12.x)::

    pip install nvidia-dali-cuda120
    # or
    pip install --extra-index-url https://pypi.nvidia.com --upgrade nvidia-dali-cuda120

DALI requires an `NVIDIA driver <https://www.nvidia.com/drivers>`__ that supports the appropriate CUDA version. DALI builds based on CUDA 12 also require the `CUDA Toolkit <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`__ to be installed.
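After installing with pip, one way to confirm which DALI wheel is present, without importing it (and so without needing a GPU), is to query the package metadata with the Python standard library. This is a minimal sketch: the distribution name matches the ``pip install`` command above, and ``installed_version`` is a hypothetical helper name.

.. code-block:: python

   from importlib import metadata
   from typing import Optional


   def installed_version(dist_name: str = "nvidia-dali-cuda120") -> Optional[str]:
       """Return the installed version of a pip distribution, or None if it is absent."""
       try:
           return metadata.version(dist_name)
       except metadata.PackageNotFoundError:
           return None


   if __name__ == "__main__":
       version = installed_version()
       print(f"nvidia-dali-cuda120: {version or 'not installed'}")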

DALI comes preinstalled in the `TensorFlow <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow>`__, `PyTorch <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch>`__, and `PaddlePaddle <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/paddlepaddle>`__ containers on `NVIDIA GPU Cloud <https://ngc.nvidia.com>`__.

For other installation paths (TensorFlow plugin, older CUDA versions, nightly and weekly builds, etc.) and specific requirements, refer to the `Installation Guide <https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html>`__.

To build DALI from source, refer to the `Compilation Guide <https://docs.nvidia.com/deeplearning/dali/user-guide/docs/compilation.html>`__.

Examples and Tutorials
----------------------

An introduction to DALI can be found on the `Getting Started <https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/getting_started.html>`__ page.

More advanced examples can be found on the `Examples and Tutorials <https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/index.html>`__ page.

For interactive versions (Jupyter notebooks) of the examples, go to the `docs/examples <https://github.com/NVIDIA/DALI/blob/main/docs/examples>`__ directory.

Note: Depending on your version, select the `Latest Release Documentation <https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html>`__ or the `Nightly Release Documentation <https://docs.nvidia.com/deeplearning/dali/main-user-guide/docs/index.html>`__, which stays in sync with the main branch.

Additional Resources
--------------------

- GPU Technology Conference 2024; Optimizing Inference Model Serving for Highest Performance at eBay; Yiheng Wang: `event <https://www.nvidia.com/en-us/on-demand/session/gtc24-s62578/>`__
- GPU Technology Conference 2023; Developer Breakout: Accelerating Enterprise Workflows With Triton Server and DALI; Brandon Tuttle: `event <https://www.nvidia.com/en-us/on-demand/session/gtcspring23-se52140/>`__
- GPU Technology Conference 2023; GPU-Accelerating End-to-End Geospatial Workflows; Kevin Green: `event <https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s51796/>`__
- GPU Technology Conference 2022; Effective NVIDIA DALI: Accelerating Real-life Deep-learning Applications
