# HDF5 Go Library

**Pure Go implementation of the HDF5 file format - no CGo required**

A modern, pure Go library for reading and writing HDF5 files without CGo dependencies. HDF5 2.0.0 compatible, with production-ready read and write support.
## ✨ Features
- ✅ Pure Go - No CGo, no C dependencies, cross-platform
- ✅ Modern Design - Built with Go 1.25+ best practices
- ✅ HDF5 2.0.0 Compatibility - Read/Write: v0, v2, v3 superblocks | Format Spec v4.0 with checksum validation
- ✅ Full Dataset Reading - Compact, contiguous, chunked layouts with GZIP
- ✅ Rich Datatypes - Integers, floats, strings (fixed/variable), compounds
- ✅ Memory Efficient - Buffer pooling and smart memory management
- ✅ Production Ready - Read support feature-complete
- ✍️ Comprehensive Write Support - Datasets, groups, attributes + Smart Rebalancing!
## 🚀 Quick Start

### Installation

```bash
go get github.com/scigolib/hdf5
```
### Basic Usage

```go
package main

import (
	"fmt"
	"log"

	"github.com/scigolib/hdf5"
)

func main() {
	// Open HDF5 file
	file, err := hdf5.Open("data.h5")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	// Walk through the file structure
	file.Walk(func(path string, obj hdf5.Object) {
		switch v := obj.(type) {
		case *hdf5.Group:
			fmt.Printf("📁 %s (%d children)\n", path, len(v.Children()))
		case *hdf5.Dataset:
			fmt.Printf("📊 %s\n", path)
		}
	})
}
```

Output:

```text
📁 / (2 children)
📊 /temperature
📁 /experiments/ (3 children)
```
## 📚 Documentation

### Getting Started

- Installation Guide - Install and verify the library
- Quick Start Guide - Get started in 5 minutes
- Reading Data - Comprehensive guide to reading datasets and attributes

### Reference

- Datatypes Guide - HDF5-to-Go type mapping
- Troubleshooting - Common issues and solutions
- FAQ - Frequently asked questions
- API Reference - GoDoc documentation

### Advanced

- Architecture Overview - How it works internally
- Performance Tuning - B-tree rebalancing strategies for optimal performance
- Rebalancing API - Complete API reference for rebalancing options
- Examples - Working code examples (7 examples with detailed documentation)
## ⚡ Performance Tuning

When deleting many attributes, B-trees can become sparse (wasted disk space, slower searches). This library offers four rebalancing strategies:
### 1. Default (No Rebalancing)

Fast deletions, but the B-tree may become sparse.

```go
// No options = no rebalancing (matches the HDF5 C library's behavior)
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate)
```

**Use for:** append-only workloads, small files (<100 MB)
### 2. Lazy Rebalancing (10-100x Faster than Immediate)

Batch processing: rebalances when a threshold is reached.

```go
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
	hdf5.WithLazyRebalancing(
		hdf5.LazyThreshold(0.05),         // Trigger at 5% underflow
		hdf5.LazyMaxDelay(5*time.Minute), // Force rebalance after 5 min
	),
)
```

**Use for:** batch-deletion workloads, medium/large files (100-500 MB)
**Performance:** ~2% overhead, occasional 100-500 ms pauses
### 3. Incremental Rebalancing (Zero Pause)

Background processing: rebalances in a background goroutine.

```go
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
	hdf5.WithLazyRebalancing(), // Prerequisite!
	hdf5.WithIncrementalRebalancing(
		hdf5.IncrementalBudget(100*time.Millisecond),
		hdf5.IncrementalInterval(5*time.Second),
	),
)
defer fw.Close() // Stops the background goroutine
```

**Use for:** large files (>500 MB), continuous operations, TB-scale data
**Performance:** ~4% overhead, zero user-visible pauses
### 4. Smart Rebalancing (Auto-Pilot)

Auto-tuning: the library detects the workload and selects the optimal mode.

```go
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
	hdf5.WithSmartRebalancing(
		hdf5.SmartAutoDetect(true),
		hdf5.SmartAutoSwitch(true),
	),
)
```

**Use for:** unknown workloads, mixed operations, research environments
**Performance:** ~6% overhead, adapts automatically
### Performance Comparison

| Mode        | Deletion Speed                       | Pause Time         | Use Case                    |
|-------------|--------------------------------------|--------------------|-----------------------------|
| Default     | 100% (baseline)                      | None               | Append-only, small files    |
| Lazy        | 95% (10-100x faster than immediate!) | 100-500 ms batches | Batch deletions             |
| Incremental | 92%                                  | None (background)  | Large files, continuous ops |
| Smart       | 88%                                  | Varies             | Unknown workloads           |
Learn more:

- Performance Tuning Guide - Comprehensive guide with benchmarks, recommendations, and troubleshooting
- Rebalancing API Reference - Complete API documentation
- Examples - Four working examples demonstrating each mode
## 🎯 Current Status

HDF5 2.0.0 ready, with 88%+ library test coverage! 🎉

### ✅ Fully Implemented
- File Structure:
  - Superblock parsing (v0, v2, v3) with checksum validation (CRC32)
  - Object headers v1 (legacy HDF5 < 1.8) with continuations
  - Object headers v2 (modern HDF5 >= 1.8) with continuations
  - Groups (traditional symbol tables + modern object headers)
  - B-trees (leaf + non-leaf nodes for large files)
  - Local heaps (string storage)
  - Global heap (variable-length data)
  - Fractal heap (direct blocks for dense attributes) ✨ NEW
- Dataset Reading:
  - Compact layout (data in object header)
  - Contiguous layout (sequential storage)
  - Chunked layout with B-tree indexing
  - GZIP/Deflate compression
  - LZF compression (h5py/PyTables compatible) ✨ NEW
  - Filter pipeline for compressed data
- Datatypes (Read + Write):
  - Basic types: int8-64, uint8-64, float32/64
  - AI/ML types: FP8 (E4M3, E5M2), bfloat16 - IEEE 754 compliant ✨ NEW
  - Strings: fixed-length (null-terminated/null-padded/space-padded), variable-length (via global heap)
  - Advanced types: arrays, enums, references (object/region), opaque
  - Compound types: struct-like with nested members
- Attributes:
  - Compact attributes (in object header) ✨ NEW
  - Dense attributes (fractal heap foundation) ✨ NEW
  - Attribute reading for groups and datasets ✨ NEW
  - Full attribute API (Group.Attributes(), Dataset.Attributes()) ✨ NEW
- Navigation: full file tree traversal via Walk()
- Code Quality:
  - Test coverage: 88%+ in library packages (target: >70%) ✅
  - Lint issues: 0 (34+ linters) ✅
  - TODO items: 0 (all resolved) ✅
  - Official HDF5 test suite: 433 files, 100% pass rate ✅
- Security ✨ NEW:
  - 4 CVEs fixed (CVE-2025-7067, CVE-2025-6269, CVE-2025-2926, CVE-2025-44905) ✅
  - Overflow protection throughout (SafeMultiply, buffer validation) ✅
  - Security limits: 1 GB chunks, 64 MB attributes, 16 MB strings ✅
  - 39 security test cases, all passing ✅
## ✍️ Write Support - Feature Complete!

Production-ready write support with all features! ✅

**Dataset Operations:**

- ✅ Create datasets (all layouts: contiguous, chunked, compact)
- ✅ Write data (all datatypes including compound)
- ✅ Dataset resizing with unlimited dimensions
- ✅ Variable-length datatypes: strings, ragged arrays
- ✅ Filters (GZIP compression, Shuffle, Fletcher32 checksum)
- ✅ Array and enum datatypes
- ✅ References and opaque types
- ✅ Attribute writing (dense & compact storage)
- ✅ Attribute modification/deletion
**Links:**

- ✅ Hard links (full support)
- ✅ Soft links (symbolic references - full support)
- ✅ External links (cross-file references - full support)

**Read Enhancements:**

- ✅ Hyperslab selection (data slicing) - 10-250x faster!
- ✅ Efficient partial dataset reading
- ✅ Stride and block support
- ✅ Chunk-aware reading (reads only the needed chunks)
- ✅ ChunkIterator API - memory-efficient iteration over large datasets
**Validation:**

- ✅ Official HDF5 test suite: 100% pass rate (378/378 files)
- ✅ Production quality confirmed

**Future Enhancements:**

- ✅ LZF filter (read + write, pure Go) ✨ NEW
- ✅ BZIP2 filter (read only, stdlib)
- ⚠️ SZIP filter (stub - requires libaec)
- ⚠️ Thread safety with mutexes + SWMR mode
- ⚠️ Parallel I/O