115 skills found · Page 1 of 4
openai / Sparse Attention - Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
svg-project / Sparse VideoGen - [ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
Haiyang-W / DSVT - [CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"
cschenxiang / DRSformer - Learning A Sparse Transformer Network for Effective Image Deraining (CVPR 2023)
thu-ml / SLA - SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
microsoft / Swin3D - A shift-window based transformer for 3D sparse tasks
VITA-Group / SLaK - [ICLR 2023] "More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity"; [ICML 2023] "Are Large Kernels Better Teachers than Transformers for ConvNets?"
lucidrains / Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
DerrickXuNu / CoBEVT - [CoRL2022] CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
microsoft / SwinBERT - Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
NimbleEdge / Sparse Transformers - Sparse inference for transformer-based LLMs
facebookresearch / Mixture Of Transformers - Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025.
ThomasVonWu / SparseEnd2End - Open-sourced end-to-end perception deployment solution based on the vision sparse transformer paradigm.
joshyZhou / AST - Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration
JIA-Lab-research / SparseTransformer - A fast and memory-efficient library for sparse transformers with varying token counts (e.g., 3D point clouds).
Ephemeral182 / UDR S2Former Deraining - [ICCV'23] Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks
hihihihiwsf / AST - Adversarial Sparse Transformer for Time Series Forecasting
kyegomez / SwitchTransformers - Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
sharc-lab / Edge MoE - Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts
kyegomez / SparseAttention - PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers" (see the sketch after this list)
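
Several entries above (openai / Sparse Attention, kyegomez / SparseAttention) reference the strided attention pattern from "Generating Long Sequences with Sparse Transformers". Below is a minimal PyTorch sketch of that pattern, assuming a plain masked softmax attention; the function and parameter names are illustrative and do not come from any listed repository, and a dense boolean mask like this only shows the connectivity pattern, not the memory savings of the papers' blocked kernels.

import torch
import torch.nn.functional as F

def strided_sparse_mask(seq_len: int, stride: int) -> torch.Tensor:
    # Boolean mask: query i may attend to key j if j <= i (causal) and
    # either (i - j) < stride (local window) or (i - j) % stride == 0 (strided hops).
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (T, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, shape (1, T)
    causal = j <= i
    local = (i - j) < stride
    strided = (i - j) % stride == 0
    return causal & (local | strided)

def sparse_attention(q, k, v, stride: int = 16):
    # q, k, v: (batch, heads, seq_len, head_dim). Disallowed pairs are masked to -inf
    # before the softmax; the dense mask keeps O(T^2) cost, so this illustrates the
    # pattern only and is not an efficient sparse kernel.
    seq_len, head_dim = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    mask = strided_sparse_mask(seq_len, stride).to(q.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Tiny usage example (hypothetical shapes)
q = k = v = torch.randn(1, 2, 64, 32)
out = sparse_attention(q, k, v, stride=8)
print(out.shape)  # torch.Size([1, 2, 64, 32])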