3 skills found
feifeibear / Long Context Attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for long-context Transformer model training and inference. A conceptual sketch of the 2D layout appears after this list.
InternLM / InternEvo
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without requiring extensive dependencies.
Eugene29 / Megatron DeepSpeed ViT
A fork of Megatron-DeepSpeed with ViT bug fixes and model parallelism (TP, TP-SP, Ulysses, etc.) enabled for ViT. Pipeline parallelism is not yet enabled.
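
For context on the first entry, here is a minimal sketch, assuming PyTorch's torch.distributed DeviceMesh, of how a "unified" (hybrid, 2D) sequence-parallel layout splits the ranks into a ring dimension (sequence blocks passed ring-wise) and a Ulysses dimension (all-to-all over attention heads). The function name, mesh dimension names, and degrees below are illustrative assumptions, not the repository's actual API.

# Conceptual sketch of a 2D (hybrid) sequence-parallel rank layout.
# Assumes torch.distributed is already initialized; names are illustrative,
# not the long-context-attention repository's API.
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

def build_2d_sequence_parallel_groups(ulysses_degree: int, ring_degree: int):
    """Return (ring_group, ulysses_group) process groups over a 2D device mesh."""
    world_size = dist.get_world_size()
    assert world_size == ulysses_degree * ring_degree, "degrees must factor the world size"
    mesh = init_device_mesh(
        "cuda",
        (ring_degree, ulysses_degree),
        mesh_dim_names=("ring", "ulysses"),
    )
    # Each rank belongs to one Ulysses group (head all-to-all) and one ring group
    # (sequence-block ring). Attention runs Ulysses-style inside each ring stage,
    # which is the "hybrid/2D" combination the USP entry refers to.
    return mesh.get_group("ring"), mesh.get_group("ulysses")

The design point is that the Ulysses degree is capped by the number of attention heads, while the ring degree can scale further across nodes, so factoring the world size into the two dimensions lets long-context training use more devices than either scheme alone.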