
s3dlio

Part of the sai3 project, s3dlio delivers multi-protocol storage access for AI/ML workflows, supporting PyTorch, TensorFlow, and JAX. It provides a CLI along with Rust and Python libraries, and supports S3, file, Azure Blob, and GCS backends using the latest Rust SDKs.

Install / Use

/learn @russfellows/S3dlio

s3dlio - Universal Storage I/O Library


High-performance, multi-protocol storage library for AI/ML workloads with universal copy operations across S3, Azure, GCS, local file systems, and DirectIO.

v0.9.86 — Redirect-following connector (S3DLIO_FOLLOW_REDIRECTS=1) for tacit NVIDIA AIStore support via S3; scheme-downgrade (HTTPS→HTTP) prevention active; 21 new redirect unit tests. Note: direct AIStore end-to-end testing has not been performed; cert-pinning security is pending (see docs/security/HTTPS_Redirect_Security_Issues.md).
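The scheme-downgrade prevention mentioned above can be illustrated with a short sketch. This is a conceptual illustration only (s3dlio's actual check lives in its Rust connector; the function name here is hypothetical):

```python
from urllib.parse import urlparse

def is_safe_redirect(original_url: str, redirect_url: str) -> bool:
    """Reject redirects that downgrade HTTPS to HTTP — a conceptual
    sketch of the scheme-downgrade prevention described above, not
    s3dlio's actual implementation."""
    orig = urlparse(original_url)
    dest = urlparse(redirect_url)
    # Allow same-scheme redirects and HTTP->HTTPS upgrades,
    # but never an HTTPS->HTTP downgrade.
    if orig.scheme == "https" and dest.scheme != "https":
        return False
    return dest.scheme in ("http", "https")

print(is_safe_redirect("https://s3.example.com/obj", "http://node1.example.com/obj"))   # False
print(is_safe_redirect("https://s3.example.com/obj", "https://node1.example.com/obj"))  # True
```

The same idea generalizes to any redirect-following client: compare the scheme of the redirect target against the scheme of the original request before following it.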

📦 Installation

Quick Install (Python)

# If using uv package manager + uv virtual environment:
uv pip install s3dlio

# If using pip without uv:
pip install s3dlio

Python Backend Profiles (PyPI vs Full Build)

  • If using uv package manager + uv virtual environment: uv pip install s3dlio.
  • If using standard pip without uv: pip install s3dlio.
  • The default published wheel is now S3-focused (Azure Blob and GCS are excluded).
  • If you want full backends (S3 + Azure Blob + GCS), build from source with:
# uv workflow:
uv pip install s3dlio --no-binary s3dlio --config-settings "cargo-extra-args=--features extension-module,full-backends"

# pip-only workflow:
pip install s3dlio --no-binary s3dlio --config-settings "cargo-extra-args=--features extension-module,full-backends"

You can still add a separate package name (for example s3dlio-full) later if you want a dedicated prebuilt full wheel distribution.

Maintainer note: for PyPI uploads, publish the default (./build_pyo3.sh) wheel unless intentionally releasing a separate distribution. full-backends is currently source-build only via the command above.

Building from Source (Rust)

System Dependencies

s3dlio requires a few system libraries to build. Only OpenSSL and pkg-config are required by default; HDF5 and hwloc are optional and enable extra features (HDF5 data-format support and NUMA topology awareness) but are not needed for the core library:

Ubuntu/Debian:

# Quick install - run our helper script
./scripts/install-system-deps.sh

# Or manually (required only):
sudo apt-get install -y build-essential pkg-config libssl-dev

# Optional - for NUMA topology support (--features numa):
sudo apt-get install -y libhwloc-dev

# Optional - for HDF5 data format support (--features hdf5):
sudo apt-get install -y libhdf5-dev

# All optional libraries at once:
sudo apt-get install -y libhdf5-dev libhwloc-dev cmake

RHEL/CentOS/Fedora/Rocky/AlmaLinux:

# Quick install
./scripts/install-system-deps.sh

# Or manually (required only):
sudo dnf install -y gcc gcc-c++ make pkg-config openssl-devel

# Optional - for NUMA topology support:
sudo dnf install -y hwloc-devel

# Optional - for HDF5 data format support:
sudo dnf install -y hdf5-devel

# All optional libraries at once:
sudo dnf install -y hdf5-devel hwloc-devel cmake

macOS:

# Quick install
./scripts/install-system-deps.sh

# Or manually (required only):
brew install pkg-config openssl@3

# Optional - for NUMA/HDF5 support:
brew install hdf5 hwloc cmake

# Set environment variables (add to ~/.zshrc or ~/.bash_profile):
export PKG_CONFIG_PATH="$(brew --prefix openssl@3)/lib/pkgconfig:$PKG_CONFIG_PATH"
export OPENSSL_DIR="$(brew --prefix openssl@3)"

Arch Linux:

# Quick install
./scripts/install-system-deps.sh

# Or manually (required only):
sudo pacman -S base-devel pkg-config openssl

# Optional - for NUMA/HDF5 support:
sudo pacman -S hdf5 hwloc cmake

WSL (Windows Subsystem for Linux) / Minimal Environments:

If you are building on WSL or any environment where libhdf5 or libhwloc may not be available, s3dlio builds without them by default. No extra libraries are required:

# Just the basics - works on WSL, Docker, CI, and minimal installs:
sudo apt-get install -y build-essential pkg-config libssl-dev
cargo build --release
# install Python package (no system HDF5/hwloc needed):
# uv workflow:
uv pip install s3dlio
# pip-only workflow:
pip install s3dlio

Install Rust (if not already installed)

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

Build s3dlio

# Clone the repository
git clone https://github.com/russfellows/s3dlio.git
cd s3dlio

# Build with default features (no HDF5 or NUMA required)
cargo build --release

# Build s3-cli with all cloud backends enabled (AWS + Azure + GCS)
cargo build --release --bin s3-cli --features full-backends

# Build s3-cli with GCS enabled only (plus default backends)
cargo build --release --bin s3-cli --features backend-gcs

# Build with NUMA topology support (requires libhwloc-dev)
cargo build --release --features numa

# Build with HDF5 data format support (requires libhdf5-dev)
cargo build --release --features hdf5

# Build with all optional features
cargo build --release --features numa,hdf5

# Run tests
cargo test

# Build Python bindings (optional)
./build_pyo3.sh

# Build Python bindings with full backends (S3 + Azure + GCS)
./build_pyo3.sh full

# Named profile form is also supported:
./build_pyo3.sh --profile full
./build_pyo3.sh --profile default

# Show profile/help usage
./build_pyo3.sh --help

Build Profile Quick Reference

Rust backend feature profiles:

  • Default build (cargo build --release): S3-focused default backend set.
  • GCS-enabled build (--features backend-gcs): enables GCS in addition to default set.
  • Full cloud build (--features full-backends): enables AWS + Azure + GCS.

Python wheel build profiles via build_pyo3.sh:

  • default or slim: AWS + file/direct; excludes Azure and GCS.
  • full: AWS + Azure + GCS + file/direct.
  • Positional and named forms are equivalent:
    • ./build_pyo3.sh full
    • ./build_pyo3.sh -p full
    • ./build_pyo3.sh --profile full

Optional extra Rust features for wheel builds can still be passed with EXTRA_FEATURES. Example: EXTRA_FEATURES="numa,hdf5" ./build_pyo3.sh full.

Note: NUMA support (--features numa) improves multi-socket performance but requires the hwloc2 C library. HDF5 support (--features hdf5) enables HDF5 data format generation but requires libhdf5. Both are optional and s3dlio is fully functional without them.

Platform support: s3dlio builds natively on Linux (x86_64, aarch64), macOS (x86_64 and Apple Silicon arm64), and WSL. Making numa and hdf5 optional was the key change for broad platform support — all remaining dependencies are pure Rust or use platform-independent system libraries (OpenSSL). To cross-compile Python wheels for Linux ARM64 from an x86_64 host, see build_pyo3.sh for instructions using the --zig linker. For macOS universal2 (fat binary covering both architectures), see the commented section in build_pyo3.sh.

✨ Key Features

  • High Performance: multi-GB/s reads and writes on platforms with sufficient network and storage bandwidth
  • Zero-Copy Architecture: bytes::Bytes throughout for minimal memory overhead
  • Multi-Protocol: S3, Azure Blob, GCS, file://, direct:// (O_DIRECT)
  • Python & Rust: Native Rust library with zero-copy Python bindings (PyO3), bytearray support for efficient memory management
  • Multi-Endpoint Load Balancing: RoundRobin/LeastConnections across storage endpoints
  • AI/ML Ready: PyTorch DataLoader integration, TFRecord/NPZ format support
  • High-Speed Data Generation: 50+ GB/s test data with configurable compression/dedup
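The multi-endpoint strategies named above (RoundRobin, LeastConnections) can be sketched in a few lines. This is a conceptual illustration of the two policies, not s3dlio's actual API or implementation:

```python
import itertools
import threading

class RoundRobinEndpoints:
    """Cycle through storage endpoints in order (conceptual sketch of a
    round-robin policy; not s3dlio's real code)."""
    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)
        self._lock = threading.Lock()

    def next_endpoint(self) -> str:
        with self._lock:
            return next(self._cycle)

class LeastConnectionsEndpoints:
    """Pick the endpoint with the fewest in-flight requests."""
    def __init__(self, endpoints):
        self._inflight = {ep: 0 for ep in endpoints}

    def acquire(self) -> str:
        ep = min(self._inflight, key=self._inflight.get)
        self._inflight[ep] += 1
        return ep

    def release(self, ep: str) -> None:
        self._inflight[ep] -= 1

rr = RoundRobinEndpoints(["http://ep1:9000", "http://ep2:9000"])
print([rr.next_endpoint() for _ in range(3)])  # alternates: ep1, ep2, ep1
```

Round-robin spreads load evenly when requests are uniform; least-connections adapts better when some requests (e.g., large-object GETs) take much longer than others.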

🌟 Latest Release

v0.9.86 (March 2026) - Redirect-following connector for tacit NVIDIA AIStore S3 support (S3DLIO_FOLLOW_REDIRECTS=1); HTTPS→HTTP scheme-downgrade protection active; 21 new redirect unit tests. Note: tested against the AIStore protocol spec but not against a live AIStore cluster. Certificate pinning is pending (see security doc).

Recent highlights:

  • v0.9.86 - Redirect follower for NVIDIA AIStore (S3 path); HTTPS→HTTP downgrade prevention; 21 new redirect tests; redirect security analysis documented
  • v0.9.84 - HEAD elimination (ObjectSizeCache); OnceLock env-var caching; lock-free range assembly; AWS_CA_BUNDLE_PATH → AWS_CA_BUNDLE; structured tracing
  • v0.9.80 - Python list hang fix (IMDSv2 legacy call removed); tracing deadlock fix (tokio::spawn → inline stream); async S3 delete/bucket helpers; deprecated Python APIs cleaned up
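The OnceLock-style env-var caching mentioned for v0.9.84 reads an environment variable once and reuses the result on every subsequent call. A Python analogue of the pattern (conceptual sketch only; the function name is hypothetical and this is not s3dlio's code):

```python
import os
from functools import lru_cache

@lru_cache(maxsize=None)
def follow_redirects_enabled() -> bool:
    """Read S3DLIO_FOLLOW_REDIRECTS once and cache the result, mimicking
    Rust's OnceLock pattern (conceptual sketch, not s3dlio's code)."""
    return os.environ.get("S3DLIO_FOLLOW_REDIRECTS", "0") == "1"

os.environ["S3DLIO_FOLLOW_REDIRECTS"] = "1"
print(follow_redirects_enabled())  # True
# Later changes to the env var are ignored — the first result is cached:
os.environ["S3DLIO_FOLLOW_REDIRECTS"] = "0"
print(follow_redirects_enabled())  # still True
```

Caching the first read avoids repeated `getenv` calls on hot paths, at the cost that the variable must be set before first use.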

📖 Complete Changelog - Full version history, migration guides, API details


📚 Version History

For detailed release notes and migration guides, see the Complete Changelog.


Storage Backend Support

Universal Backend Architecture

s3dlio provides unified storage operations across all backends with consistent URI patterns:

  • 🗄️ Amazon S3: s3://bucket/prefix/ - High-performance S3 operations (5+ GB/s reads, 2.5+ GB/s writes)
  • ☁️ Azure Blob Storage: az://container/prefix/ - Complete Azure integration with RangeEngine (30-50% faster for large blobs)
  • 🌐 Google Cloud Storage: gs://bucket/prefix/ or gcs://bucket/prefix/ - Production ready with RangeEngine and full ObjectStore integration
  • 📁 Local File System: `file:///path/to/directo
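With consistent URI patterns, backend dispatch reduces to parsing the scheme. A minimal sketch of the idea (illustrative only — s3dlio's real dispatch is implemented in Rust):

```python
from urllib.parse import urlparse

# Map URI schemes to backend names, mirroring the patterns listed above
# (conceptual sketch; not s3dlio's internal table).
SCHEME_TO_BACKEND = {
    "s3": "Amazon S3",
    "az": "Azure Blob Storage",
    "gs": "Google Cloud Storage",
    "gcs": "Google Cloud Storage",
    "file": "Local File System",
    "direct": "DirectIO (O_DIRECT)",
}

def backend_for(uri: str) -> str:
    scheme = urlparse(uri).scheme
    try:
        return SCHEME_TO_BACKEND[scheme]
    except KeyError:
        raise ValueError(f"unsupported URI scheme: {scheme!r}")

print(backend_for("s3://bucket/prefix/"))  # Amazon S3
print(backend_for("gs://bucket/prefix/"))  # Google Cloud Storage
print(backend_for("file:///data/train/"))  # Local File System
```

Because every backend is addressed the same way, application code can switch from local files to cloud storage by changing only the URI.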
