# GRALP (Generalized-depth Ray-Attention Local Planner)
GRALP trains a lightweight PPO local planner on a fully randomized, map-free GPU environment. Observations are vectorized generalized ray/depth (distance-to-obstacle) samples with kinematic history; actions are continuous planar velocity commands. The codebase also ships a one-command exporter that packages the trained policy into a standalone inference API.

## Quickstart

- Install dependencies

  ```bash
  pip install -r requirements.txt  # numpy + onnx + onnxruntime (CPU)
  # then install torch matching your device (needed for training + setup_api export), e.g.:
  pip install torch --index-url https://download.pytorch.org/whl/cpu
  # or
  pip install torch --index-url https://download.pytorch.org/whl/cu121
  # for GPU ONNX export/inference, swap to:
  # pip install onnxruntime-gpu
  ```

  `tools/setup_api.py` and `ppo_api` inference rely on `onnx` + `onnxruntime` (or `onnxruntime-gpu`). Optional: install `matplotlib` + `scipy` when running `tools/analyze_blank_ratio.py`.
- Configure
  - `config/env_config.json` — `limits`: linear/yaw caps (`vx_max`, `omega_max`). `sim`: `dt`, `safe_distance`, task redraw knobs (`task_point_max_dist_m`, `task_point_success_radius_m`, `task_point_random_interval_max`). `obs`: view geometry (`patch_meters`, `ray_max_gap` for the derived ray count) plus empty-ratio sampling controls (`blank_ratio_base`, `blank_ratio_randmax`, `blank_ratio_std_ratio`) and optional narrow-passage Gaussian distances (`narrow_passage_gaussian`, `narrow_passage_std_ratio`). `reward`: collision/progress/time/jerk weights and the `orientation_verify` gate.
  - `config/train_config.json` — `device`: `cuda:0` (default) or `cpu`; `env_config`: the env config file name to load. `sampling`: `batch_env` (e.g., 2048 GPU envs), `rollout_len`, `reset_each_rollout`. `ppo`: discount (`gamma`), `gae_lambda`, clipping, optimizer learning rates (`lr`, `value_lr`), `epochs`, `minibatch_size`, `entropy_coef`, `value_coef`, `max_grad_norm`, AMP toggles (`amp`, `amp_bf16`), `collision_done`, `log_std_min`/`log_std_max`. `model`: attention shape (`num_queries`, `num_heads`). `run`: `total_env_steps`, `ckpt_dir`, `log_interval`.
- Train

  ```bash
  python -m rl_ppo.ppo_train --train_config config/train_config.json
  ```

  On startup you can create a new checkpoint (`y`) or resume from the latest checkpoint under `run.ckpt_dir` (`n`).
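The config section above mentions that the ray count is derived from `ray_max_gap` rather than set directly. The sketch below shows one plausible derivation — pick enough rays that adjacent ray endpoints at the view radius are no more than `ray_max_gap` metres apart. This is a hypothetical reconstruction; the actual rule lives in `env/ray.py`, and treating `patch_meters` as a diameter is an assumption.

```python
import math

def derive_ray_count(patch_meters: float, ray_max_gap: float) -> int:
    """Hypothetical ray-count rule for n_rays == 0: cover the full
    360-degree view so that neighbouring ray endpoints at the view
    radius are at most ray_max_gap metres apart."""
    radius = patch_meters / 2.0          # assumption: patch_meters is a diameter
    circumference = 2.0 * math.pi * radius
    return max(1, math.ceil(circumference / ray_max_gap))

n_rays = derive_ray_count(8.0, 0.1)  # 4 m view radius, 10 cm endpoint gap -> 252 rays
```

Shrinking `ray_max_gap` raises the ray count (and the observation dimension), so it trades angular resolution against compute.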
## Standalone Inference Export

```bash
python tools/setup_api.py
```

- Requires `torch`, `onnx`, and `onnxruntime` (or `onnxruntime-gpu`); `pip install -r requirements.txt` plus the right torch wheel covers export and inference.
- Rebuilds `ppo_api/`: copies `tools/api_example`, syncs limits/dt/FOV/attention fields from `config/env_config.json` and `config/train_config.json`, picks the newest `.pt` under `run.ckpt_dir` (defaults to `runs/`), exports ONNX with the derived ray count, then cleans any new `.onnx` files left in the checkpoint folder.
- Use via `from ppo_api.inference import PPOInference`; set `execution_provider` in `ppo_api/config.json` to `cpu`/`cuda`/`tensorrt` (defaults to CPU). The generated `ppo_api/README.md` documents the validated inputs and IO layout.
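The `execution_provider` values map naturally onto onnxruntime's standard provider names (`CPUExecutionProvider`, `CUDAExecutionProvider`, `TensorrtExecutionProvider`). The mapping below is an assumed sketch of that translation, not the exact code inside `ppo_api`; later entries in each list act as fallbacks when a provider is unavailable:

```python
# Assumed mapping from ppo_api/config.json's execution_provider value to an
# onnxruntime provider priority list (later entries are fallbacks).
PROVIDER_MAP = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "tensorrt": ["TensorrtExecutionProvider", "CUDAExecutionProvider",
                 "CPUExecutionProvider"],
}

def resolve_providers(execution_provider="cpu"):
    try:
        return PROVIDER_MAP[execution_provider.lower()]
    except KeyError:
        raise ValueError(f"unknown execution_provider: {execution_provider!r}")

# Would then be passed along the lines of:
# ort.InferenceSession("policy.onnx", providers=resolve_providers("cuda"))
```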
## Repository Layout

- `env/`
  - `sim_gpu_env.py`: Batched randomized ray environment (`SimRandomGPUBatchEnv`) with per-step FOV resampling and task-point rewards.
  - `ray.py`: Ray-count utilities used when `n_rays == 0`.
  - `utils.py`: Logging helpers and JSON config loader.
- `rl_ppo/`
  - `ppo_train.py`: PPO training entrypoint.
  - `ppo_models.py`: Shared-encoder Gaussian policy and value head with tanh-squashed actions.
  - `encoder.py`: RayEncoder backbone (ray convolutions + multi-query, multi-head attention) that outputs a 256-d latent.
  - `ppo_buffer.py`: GAE-Lambda rollout buffer.
  - `ppo_utils.py`: Discounted-return helpers, checkpoint utilities, AMP guards, and reproducibility tools.
- `config/`
  - `env_config.json`: Environment, observation, and reward settings.
  - `train_config.json`: PPO hyperparameters and run configuration.
- `tools/`
  - `setup_api.py`: One-command exporter that builds a self-contained `ppo_api/` with the newest checkpoint and synced configs.
  - `api_example/`: Template for the exported inference package.
- `runs/`: Default checkpoint/output directory (created at runtime).
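The GAE-Lambda pass that `ppo_buffer.py` performs follows the standard recursion; the numpy sketch below shows the generic algorithm (not the repo's exact code), using `gamma` and `gae_lambda` as configured in `config/train_config.json`:

```python
import numpy as np

def gae_advantages(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Standard GAE-Lambda: delta_t = r_t + gamma*V_{t+1}*(1-d_t) - V_t,
    A_t = delta_t + gamma*lam*(1-d_t)*A_{t+1}, swept backwards over the rollout."""
    T = len(rewards)
    adv = np.zeros(T, dtype=np.float64)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        adv[t] = next_adv
        next_value = values[t]
    returns = adv + values  # value-function regression targets
    return adv, returns

adv, ret = gae_advantages([1.0, 1.0, 1.0], np.zeros(3), np.zeros(3), 0.0,
                          gamma=1.0, lam=1.0)
# with gamma = lam = 1 and zero values this reduces to reward-to-go: [3, 2, 1]
```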
## GPU Randomized Environment

- Per-step FOV resampling: Each GPU sub-environment redraws per-ray distances every step using an empty/obstacle mask derived from `blank_ratio_base` plus Gaussian jitter (`blank_ratio_randmax`, `blank_ratio_std_ratio`). Empty rays are filled with the full view radius while obstacle rays sample distances.
- Gaussian narrow passages (optional): When `narrow_passage_gaussian` is true, obstacle distances follow a half-Gaussian with std = `patch_meters * narrow_passage_std_ratio`, producing clustered close obstacles; otherwise distances are uniform within the view radius.
- Task points without global maps: Task points are sampled within `task_point_max_dist_m` and clipped to line-of-sight using the sampled rays; redraw cadence is controlled by `task_point_random_interval_max`.
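The redraw described above can be sketched as follows. This is an illustrative reconstruction, not the batched GPU code in `env/sim_gpu_env.py`: the clipping of the jittered blank ratio and the treatment of `patch_meters` as a diameter are assumptions.

```python
import numpy as np

def resample_rays(n_rays, patch_meters, blank_ratio_base, blank_ratio_randmax,
                  blank_ratio_std_ratio, narrow_passage_gaussian=False,
                  narrow_passage_std_ratio=0.25, rng=None):
    """Illustrative single-env version of the per-step FOV redraw."""
    rng = rng or np.random.default_rng(0)
    view_radius = patch_meters / 2.0      # assumption: patch_meters is a diameter
    # Empty ratio: base plus Gaussian jitter, kept in [0, 1].
    jitter = rng.normal(0.0, blank_ratio_randmax * blank_ratio_std_ratio)
    blank_ratio = float(np.clip(blank_ratio_base + jitter, 0.0, 1.0))
    empty = rng.random(n_rays) < blank_ratio        # per-ray empty/obstacle mask
    if narrow_passage_gaussian:
        # Half-Gaussian obstacle distances -> obstacles cluster near the robot.
        dists = np.abs(rng.normal(0.0, patch_meters * narrow_passage_std_ratio, n_rays))
        dists = np.minimum(dists, view_radius)
    else:
        dists = rng.uniform(0.0, view_radius, n_rays)  # uniform within view radius
    dists[empty] = view_radius            # empty rays see out to the full radius
    return dists

rays = resample_rays(128, 8.0, blank_ratio_base=0.5,
                     blank_ratio_randmax=0.2, blank_ratio_std_ratio=0.5)
```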
## Observation & Action

- Observation layout (dimension `R + 7`): `[rays_norm(R), sin_ref, cos_ref, prev_vx/lim, prev_omega/lim, Δvx/(2·lim), Δomega/(2·omega_max), dist_to_task/patch_meters]`.
- Actions are `(vx, vy, omega)`, clipped by `limits` each step. When only two columns are provided, `vy` is zeroed inside the environment.
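Assembling that layout looks roughly like the sketch below. The choice of normalizers is an assumption (`vx_max` for the `lim` of linear terms, `omega_max` for angular ones, `patch_meters` for distances including the rays); only the ordering and the `R + 7` dimension come from the layout above.

```python
import numpy as np

def build_obs(rays_m, sin_ref, cos_ref, prev_vx, prev_omega, dvx, domega,
              dist_to_task, vx_max, omega_max, patch_meters):
    """Assemble the R + 7 observation vector in the documented order.
    Normalization constants are illustrative assumptions."""
    rays_norm = np.asarray(rays_m, dtype=np.float32) / patch_meters
    tail = np.array([sin_ref, cos_ref,
                     prev_vx / vx_max, prev_omega / omega_max,
                     dvx / (2.0 * vx_max), domega / (2.0 * omega_max),
                     dist_to_task / patch_meters], dtype=np.float32)
    return np.concatenate([rays_norm, tail])

obs = build_obs(np.full(128, 2.0), 0.0, 1.0, 0.5, 0.1, 0.0, 0.0,
                3.0, vx_max=1.0, omega_max=1.5, patch_meters=8.0)
```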
## GRALP Network (policy/value)

- RayEncoder backbone (`rl_ppo/encoder.py`)
  - Ray branch: 1D depthwise-separable convolutions with GELU + squeeze-excite blocks to embed per-ray distances.
  - Attention fusion: multi-query, multi-head attention over the ray features; a pose/history MLP provides the query bias; outputs `[B, num_queries, d_model]` plus global averages.
  - Fusion head: concatenates attended rays, global averages, and mean queries → two-layer MLP → 256-d latent.
- Policy head (`rl_ppo/ppo_models.py`)
  - Linear map from the 256-d latent to the mean action, with a global learnable `log_std` clamped to `[log_std_min, log_std_max]`.
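Action sampling from this head reduces to a clamped-log-std Gaussian followed by the tanh squash mentioned under `ppo_models.py`. A framework-agnostic numpy sketch (the repo's torch code differs mechanically but follows the same math; the clamp bounds shown are placeholder values for `log_std_min`/`log_std_max`):

```python
import numpy as np

def sample_action(mean, log_std, log_std_min=-5.0, log_std_max=2.0, rng=None):
    """Clamped-log-std Gaussian sample squashed by tanh into (-1, 1);
    the environment then rescales by its velocity limits."""
    rng = rng or np.random.default_rng(0)
    log_std = np.clip(log_std, log_std_min, log_std_max)   # global learnable log_std
    raw = mean + np.exp(log_std) * rng.standard_normal(np.shape(mean))
    return np.tanh(raw)                                    # squashed action

a = sample_action(np.zeros(3), np.full(3, -1.0))  # 3-dim action: vx, vy, omega
```

The clamp keeps exploration noise bounded in both directions; the tanh guarantees actions stay inside the normalized limits regardless of the Gaussian sample.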
