SparK: the first successful BERT/MAE-style pretraining on any convolutional network

This is the official implementation of the ICLR paper Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling, which can pretrain any CNN (e.g., ResNet) in a BERT-style self-supervised manner. We've tried our best to make the codebase clean, short, easy to read, and state-of-the-art, relying on only minimal dependencies.
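For intuition, below is a minimal dense sketch of what BERT/MAE-style masked modeling looks like on a CNN. It is illustrative only: the tiny encoder/decoder and the `patch` / `mask_ratio` values are made-up stand-ins, and it deliberately omits SparK's actual ingredients (submanifold sparse convolution and the hierarchical UNet-style decoder); see the code under pretrain/ for the real pipeline.

```python
# Illustrative sketch only: dense masked image modeling on a CNN.
# All names and hyperparameters here are hypothetical, not SparK's.
import torch
import torch.nn as nn
import torch.nn.functional as F

patch, mask_ratio = 32, 0.6          # hypothetical mask granularity & ratio

encoder = nn.Sequential(             # stand-in for any CNN backbone (stride 8)
    nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Conv2d(128, 3 * 8 * 8, 1)    # predict an 8x8 RGB block per feature cell

x = torch.randn(2, 3, 224, 224)           # a batch of images
# 1) build a random patch mask (1 = visible, 0 = masked out)
g = 224 // patch                          # 7x7 grid of patches
keep = (torch.rand(2, 1, g, g) > mask_ratio).float()
mask = F.interpolate(keep, size=224, mode="nearest")
# 2) encode the masked image (dense conv here; SparK uses sparse conv so
#    masked positions are truly skipped and the mask pattern survives)
feat = encoder(x * mask)
# 3) decode back to pixels and regress only the masked patches (MAE-style loss)
pred = F.pixel_shuffle(decoder(feat), 8)  # back to (2, 3, 224, 224)
loss = (F.mse_loss(pred, x, reduction="none") * (1 - mask)).mean()
loss.backward()
```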
<!-- <p align="center"> --> <!-- <img src="https://user-images.githubusercontent.com/39692511/211496814-e6cb9243-833c-43d2-a859-d35afa96ed22.png" width=86% class="center"> --> <!-- </p> -->https://user-images.githubusercontent.com/39692511/226858919-dd4ccf7e-a5ba-4a33-ab21-4785b8a7833c.mp4
<br> <div align="center"> </div> <!-- <div align="center"> --> <!-- [[`pdf`](https://arxiv.org/pdf/2301.03580.pdf)] --> <!-- [[`bibtex`](https://github.com/keyu-tian/SparK#citation)] --> <!-- </div> -->🔥 News
- A brief introduction (in English) is available on our ICLR poster page! [📹Recorded Video, Poster, and Slides]
- On May 11th, another livestream was held on OpenMMLab & ReadPaper (bilibili)! [📹Recorded Video]
- On Apr. 27th (UTC+8 8pm), another livestream was held at OpenMMLab (bilibili)!
- On Mar. 22nd (UTC+8 8pm), another livestream was held at 极市平台 (bilibili)! [📹Recorded Video]
- A share on TechBeat (将门创投) took place on Mar. 16th (UTC+8 8pm) too! [📹Recorded Video]
- We were honored to be invited by Synced ("机器之心机动组 视频号" on WeChat) to give a talk about SparK on Feb. 27th (UTC+0 11am, UTC+8 7pm), welcome! [📹Recorded Video]
- This work got accepted to ICLR 2023 as a Spotlight (notable-top-25%).
- Other articles: [Synced] [DeepAI] [TheGradient] [Bytedance] [CVers] [QbitAI(量子位)] [BAAI(智源)] [机器之心机动组] [极市平台] [ReadPaper笔记]
🕹️ Colab Visualization Demo
Check pretrain/viz_reconstruction.ipynb to visualize the reconstructions of SparK-pretrained models, like:

<p align="center"> <img src="https://user-images.githubusercontent.com/39692511/226376648-3f28a1a6-275d-4f88-8f3e-cd1219882488.png" width=50%> </p>

We also provide pretrain/viz_spconv.ipynb, which shows the "mask pattern vanishing" issue of dense conv layers.
What's new here?
🔥 Pretrained CNN beats pretrained Swin-Transformer:

<p align="center"> <img src="https://user-images.githubusercontent.com/39692511/226844278-1dc1e13c-1f07-4b8f-9843-8c47fca47253.jpg" width=66%> </p>

🔥 After SparK pretraining, smaller models can beat un-pretrained larger models:

<p align="center"> <img src="https://user-images.githubusercontent.com/39692511/226861835-77e43c07-0a00-4020-9395-03e81bfe6959.jpg" width=72%> </p>

🔥 All models can benefit, showing a scaling behavior:

<p align="center"> <img src="https://user-images.githubusercontent.com/39692511/211705760-de15f4a1-0508-4690-981e-5640f4516d2a.png" width=65%> </p>

🔥 Generative self-supervised pretraining surpasses contrastive learning:

<p align="center"> <img src="https://user-images.githubusercontent.com/39692511/211497479-0563e891-f2ad-4cf1-b682-a21c2be1442d.png" width=65%> </p>

See our paper for more analysis, discussion, and evaluation.
Todo list
<details> <summary>catalog</summary>

- [x] Pretraining code
- [x] Pretraining tutorial for customized CNN models (Tutorial for pretraining your own CNN model)
- [x] Pretraining tutorial for customized datasets (Tutorial for pretraining your own dataset)
- [x] Pretraining Colab visualization playground (reconstruction, sparse conv)
- [x] Finetuning code
- [ ] Weights & visualization playground on huggingface
- [ ] Weights in timm

</details>
Pretrained weights (self-supervised; w/o decoder; can be directly finetuned)
Note: for network definitions, we directly use timm.models.ResNet and official ConvNeXt.
reso.: the image resolution; acc@1: ImageNet-1K finetuned acc (top-1)
| arch. | reso. | acc@1 | #params | flops | weights (self-supervised, without SparK's decoder) |
|:----------:|:-----:|:-----:|:-------:|:------:|:----------------------------------------------|
| ResNet50 | 224 | 80.6 | 26M | 4.1G | resnet50_1kpretrained_timm_style.pth |
| ResNet101 | 224 | 82.2 | 45M | 7.9G | resnet101_1kpretrained_timm_style.pth |
| ResNet152 | 224 | 82.7 | 60M | 11.6G | resnet152_1kpretrained_timm_style.pth |
| ResNet200 | 224 | 83.1 | 65M | 15.1G | resnet200_1kpretrained_timm_style.pth |
| ConvNeXt-S | 224 | 84.1 | 50M | 8.7G | convnextS_1kpretrained_official_style.pth |
| ConvNeXt-B | 224 | 84.8 | 89M | 15.4G | convnextB_1kpretrained_official_style.pth |
| ConvNeXt-L | 224 | 85.4 | 198M | 34.4G | convnextL_1kpretrained_official_style.pth |
| ConvNeXt-L | 384 | 86.0 | 198M | 101.0G | convnextL_384_1kpretrained_official_style.pth |
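As a sketch of how one of these checkpoints might be loaded for finetuning, assuming the "_timm_style" file is a plain timm-compatible state dict (the repo's own finetuning code is the authoritative reference; the filename below must match your download):

```python
# Minimal finetuning-setup sketch, under the assumption above.
import timm
import torch

model = timm.create_model("resnet50", num_classes=1000)  # timm.models.ResNet
state = torch.load("resnet50_1kpretrained_timm_style.pth", map_location="cpu")
state = state.get("module", state)  # some checkpoints nest weights under a key
# strict=False: the checkpoint has no decoder, and the classifier head of the
# new model is randomly initialized, so some keys may legitimately not match.
missing, unexpected = model.load_state_dict(state, strict=False)
print("missing:", missing)
print("unexpected:", unexpected)
```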
<details> <summary> <b> Pretrained weights (with SparK's UNet-style decoder; can be used to reconstruct images) </b> </summary>