Putils

putils is a utility library for debugging and profiling distributed AI model training workloads, with specialized support for NPU (Neural Processing Unit) and CUDA environments.

Generate Convert Improve

Install / Use

/learn @Kirrito-k423/Putils

About this skill

Quality Score

0/100

README

putils

English | 中文

English

Overview

putils is a utility library for debugging and profiling distributed AI model training workloads, with specialized support for NPU (Neural Processing Unit) and CUDA environments. The library provides non-invasive instrumentation tools that enable developers to inspect tensor states, monitor memory consumption, profile execution performance, and debug multi-process distributed training workflows without modifying core training code.

Key Features

Tensor state inspection and tracking (shape, hash, front elements, etc.)
Memory consumption monitoring and reporting
Execution performance profiling
Distributed training debugging support
Multiple integrated debugging tools (debug, cache, timer, etc.)
NPU and CUDA environment support
Low-intrusive design, minimal changes to training code

Tech Stack

Language: Python 100%
Supported Environments: NPU, CUDA
Related Tools: VizTracer (tracing), Backward Hook (gradient monitoring)

Installation

# Basic installation (core utilities only)
pip install putils

# Install with stack-sniffer support (requires py-spy)
pip install putils[stack-sniffer]

# Development installation
pip install -e .
pip install -e .[stack-sniffer]  # with stack-sniffer support
pip install -r requirements-dev.txt  # development dependencies

Project Structure

putils/
├── tools/
│   ├── python_stack_sniffer.py   # Main tool: Stack tracing & visualization
│   └── py_stack_snap.py          # Stack snapshot tool
├── debug.py                      # Debug utilities
├── memory.py                     # Memory monitoring
├── timer.py                      # Timing utilities
├── cache.py                      # Caching utilities
├── device.py                     # Device management
├── profiling.py                  # Profiling utilities
├── perf.py                       # Performance tools
├── pprint.py                     # Pretty printing
├── accuracy.py                   # Accuracy metrics
├── burn.py                       # Burn-in testing
├── dataset/                      # Dataset utilities
├── tests/                        # Unit tests
└── README.md

中文

简介

putils 是一个专为分布式AI模型训练工作负载设计的调试和性能分析工具库，特别支持NPU（神经处理单元）和CUDA环境。该库提供非侵入式的仪器工具，使开发者能够在不修改核心训练代码的情况下，检查张量状态、监控内存消耗、分析执行性能，以及调试多进程分布式训练工作流。

主要功能

张量状态检查与追踪（支持shape、hash、前10个元素等信息）
内存消耗监控和报告
执行性能分析和分析（perf profiling）
分布式训练调试支持
多种调试工具集成（debug、cache、timer等）
支持NPU和CUDA环境
低侵入式设计，最轻量化修改训练代码

技术栈

语言：Python 100%
支持环境：NPU、CUDA
相关工具：VizTracer（用于追踪）、Backward Hook（用于梯度监控）

安装

# 基础安装（仅核心工具）
pip install putils

# 安装 stack-sniffer 支持（需要 py-spy）
pip install putils[stack-sniffer]

# 开发模式安装
pip install -e .
pip install -e .[stack-sniffer]  # 包含 stack-sniffer 支持
pip install -r requirements-dev.txt  # 开发依赖

项目结构

putils/
├── tools/
│   ├── python_stack_sniffer.py   # 核心工具：栈追踪与可视化
│   └── py_stack_snap.py          # 栈快照工具
├── debug.py                      # 调试工具
├── memory.py                     # 内存监控
├── timer.py                      # 计时工具
├── cache.py                      # 缓存工具
├── device.py                     # 设备管理
├── profiling.py                 # 性能分析
├── perf.py                       # 性能工具
├── pprint.py                     # 格式化打印
├── accuracy.py                   # 精度指标
├── burn.py                       # 烧机测试
├── dataset/                      # 数据集工具
├── tests/                        # 单元测试
└── README.md

python_stack_sniffer.py 详解

English

python_stack_sniffer.py is a powerful stack tracing tool that periodically captures stack traces from Python processes and converts them to Chrome Tracing JSON format for visualization.

Core Features

Stack Trace Capture: Uses py-spy to capture stack traces from running Python processes
Chrome Tracing Format: Converts stack traces to Chrome Tracing JSON format for visualization in chrome://tracing
Automatic PID Discovery: Automatically discovers Python processes via npu-smi info (NPU environment)
NPU Monitoring: Records NPU AICore and HBM usage rates during tracing
CPU/Memory Monitoring: Records system CPU memory usage
Multi-thread Support: Can capture all threads or MainThread only
Auto-save: Supports periodic snapshots to prevent data loss during long-running tasks

Usage Examples

# Auto-discover PIDs from npu-smi (recommended for NPU environments)
python python_stack_sniffer.py -i 60 -o stack_trace.json --autosave-interval 60 --npu-usage --cpu-mem-usage --all-thread

# Manual PID list
python python_stack_sniffer.py -p 44002,44003 -i 0.2 -d 10 -o trace.json

# With NPU monitoring
python python_stack_sniffer.py -p 12345 -i 0.2 -d 10 -o trace.json --npu-usage

# Auto-save every 10 seconds
python python_stack_sniffer.py -p 1667631 -i 0.1 -o stack_trace.json --autosave-interval 10

Key Parameters

| Parameter | Description | |-----------|-------------| | -p/--pid | Process ID list (comma-separated or repeatable) | | -i/--interval | Sampling interval in seconds (default: 0.1) | | -o/--output | Output JSON file path | | -d/--duration | Duration to run in seconds | | --npu-usage | Enable NPU usage monitoring (AICore/HBM) | | --cpu-mem-usage | Enable CPU memory monitoring | | --all-threads | Capture all threads (default: MainThread only) | | --autosave-interval | Auto-save interval in seconds | | --autosave-snapshot-interval | Snapshot interval with time tag (default: 7200s) |

Output

The tool generates a Chrome Tracing compatible JSON file that can be opened in:

Chrome browser: chrome://tracing
VS Code with Chrome Trace Viewer extension
Any Chrome Tracing compatible viewer

python_stack_sniffer.py 详解

python_stack_sniffer.py 是一款强大的栈追踪工具，可以定期捕获Python进程的堆栈跟踪并将其转换为Chrome Tracing JSON格式进行可视化分析。

核心功能

栈跟踪捕获: 使用 py-spy 从正在运行的Python进程中捕获堆栈跟踪
Chrome追踪格式: 将堆栈跟踪转换为Chrome追踪JSON格式，可在 chrome://tracing 中可视化
自动PID发现: 通过 npu-smi info 自动发现Python进程（NPU环境）
NPU监控: 追踪期间记录NPU AICore和HBM使用率
CPU/内存监控: 记录系统CPU内存使用情况
多线程支持: 可捕获所有线程或仅主线程
自动保存: 支持定期快照，防止长时运行任务数据丢失

异步使用

datetime=$(date +%Y%m%d_%H%M%S)

pkill -f "python_stack_sniffer.py"
python /mnt/huawei/tanshaojie/putils/tools/python_stack_sniffer.py -i 2 -o stack_trace_${datetime}.json --autosave-interval 10 --npu-usage --cpu-mem-usage --all-thread --debug-pid-discovery 1>&2 >> tmp.log &

使用示例

# 从npu-smi自动发现PID（推荐NPU环境使用）
python python_stack_sniffer.py -i 60 -o stack_trace.json --autosave-interval 60 --npu-usage --cpu-mem-usage --all-thread

# 手动指定PID列表
python python_stack_sniffer.py -p 44002,44003 -i 0.2 -d 10 -o trace.json

# 开启NPU监控
python python_stack_sniffer.py -p 12345 -i 0.2 -d 10 -o trace.json --npu-usage

# 每10秒自动保存
python python_stack_sniffer.py -p 1667631 -i 0.1 -o stack_trace.json --autosave-interval 10

主要参数

| 参数 | 说明 | |------|------| | -p/--pid | 进程ID列表（逗号分隔或可重复） | | -i/--interval | 采样间隔（秒，默认0.1） | | -o/--output | 输出JSON文件路径 | | -d/--duration | 运行持续时间（秒） | | --npu-usage | 启用NPU使用率监控（AICore/HBM） | | --cpu-mem-usage | 启用CPU内存监控 | | --all-threads | 捕获所有线程（默认仅MainThread） | | --autosave-interval | 自动保存间隔（秒） | | --autosave-snapshot-interval | 带时间戳的快照间隔（默认7200秒） |

输出说明

工具生成兼容Chrome Tracing的JSON文件，可在以下工具中打开：

Chrome浏览器: chrome://tracing
VS Code配合Chrome Trace Viewer扩展
任何兼容Chrome Tracing的查看器

技术细节

使用 py-spy dump 命令获取栈跟踪，支持 --native 选项捕获原生栈
自动解析py-spy输出，提取线程信息、函数名、文件名、行号
转换为Chrome Tracing事件格式（B=begin, E=end, M=metadata, C=counter）
支持动态PID发现和移除，自动管理open stacks避免数据错误
记录各阶段耗时统计，便于性能分析和问题诊断

compare_perf 文档入口

compare_perf 的快速接入、CLI 工作流、mapping/cache、产物解读和故障排查已迁移到子目录文档:

putils/compare_perf/README.md

Related Skills

node-connect

354.3k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

112.3k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

354.3k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

354.3k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。