SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

Install / Use

/learn @keyu-tian/SparK

SparK: the first successful BERT/MAE-style pretraining on any convolutional network

This is the official implementation of the ICLR paper Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling, which can pretrain any CNN (e.g., ResNet) in a BERT-style self-supervised manner. We've tried our best to keep the codebase clean, short, and easy to read, with state-of-the-art results and minimal dependencies.

https://user-images.githubusercontent.com/39692511/226858919-dd4ccf7e-a5ba-4a33-ab21-4785b8a7833c.mp4

🔥 News

A brief introduction (in English) is available on our ICLR poster page! [📹Recorded Video, Poster, and Slides].
On May 11th, another livestream was held on OpenMMLab & ReadPaper (bilibili)! [📹Recorded Video]
On Apr. 27th (UTC+8 8pm), another livestream was scheduled at OpenMMLab (bilibili)!
On Mar. 22nd (UTC+8 8pm), another livestream was held at 极市平台 (bilibili)! [📹Recorded Video]
A talk on TechBeat (将门创投) was given on Mar. 16th (UTC+8 8pm). [📹Recorded Video]
We were honored to be invited by Synced ("机器之心机动组视频号" on WeChat) to give a talk about SparK on Feb. 27th (UTC+0 11am, UTC+8 7pm). [📹Recorded Video]
This work was accepted to ICLR 2023 as a Spotlight (notable-top-25%).
Other articles: [Synced] [DeepAI] [TheGradient] [Bytedance] [CVers] [QbitAI(量子位)] [BAAI(智源)] [机器之心机动组] [极市平台] [ReadPaper笔记]

🕹️ Colab Visualization Demo

See pretrain/viz_reconstruction.ipynb to visualize the reconstructions of SparK-pretrained models.

We also provide pretrain/viz_spconv.ipynb that shows the "mask pattern vanishing" issue of dense conv layers.
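
To make the "mask pattern vanishing" issue concrete, here is a minimal pure-Python sketch; it is not taken from the SparK codebase. It assumes a toy 16×16 input, a block-wise mask with 4×4 patches, and a 3×3 averaging filter standing in for a dense conv layer. The sparse behavior is only emulated by re-applying the mask after each layer, which is an assumption made for illustration, and all function names are hypothetical.

```python
def box_conv3x3(x):
    """Dense 3x3 averaging filter with zero padding on a 2D list of floats."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += x[ii][jj]
            out[i][j] = total / 9.0
    return out


def apply_mask(x, keep):
    """Zero out every position where keep is False."""
    return [[v if k else 0.0 for v, k in zip(row, krow)]
            for row, krow in zip(x, keep)]


def zero_ratio(x):
    """Fraction of positions that are exactly zero (i.e. still look masked)."""
    cells = [v for row in x for v in row]
    return sum(1 for v in cells if v == 0.0) / len(cells)


size, patch = 16, 4
# Block-wise checkerboard mask: half of the 4x4 patches are kept (illustrative choice).
keep = [[(i // patch + j // patch) % 2 == 0 for j in range(size)]
        for i in range(size)]
image = [[1.0] * size for _ in range(size)]
masked = apply_mask(image, keep)

dense = masked          # plain dense convs: features leak into masked patches
sparse_like = masked    # emulated sparse convs: mask re-applied after each layer
for layer in range(1, 4):
    dense = box_conv3x3(dense)
    sparse_like = apply_mask(box_conv3x3(sparse_like), keep)
    print(f"layer {layer}: dense zero ratio = {zero_ratio(dense):.3f}, "
          f"sparse-like zero ratio = {zero_ratio(sparse_like):.3f}")
```

With dense layers the fraction of zeroed positions shrinks layer by layer, so the mask pattern is gradually erased; in the emulated sparse variant it stays at the original mask ratio.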

What's new here?

🔥 Pretrained CNN beats pretrained Swin-Transformer.

🔥 After SparK pretraining, smaller models can beat un-pretrained larger models.

🔥 All models can benefit, showing a scaling behavior.

🔥 Generative self-supervised pretraining surpasses contrastive learning.

See our paper for more analysis, discussions, and evaluations.

Todo list

<details> <summary>catalog</summary>

[x] Pretraining code
[x] Pretraining tutorial for customized CNN model (Tutorial for pretraining your own CNN model)
[x] Pretraining tutorial for customized dataset (Tutorial for pretraining your own dataset)
[x] Pretraining Colab visualization playground (reconstruction, sparse conv)
[x] Finetuning code
[ ] Weights & visualization playground in huggingface
[ ] Weights in timm

</details>

Pretrained weights (self-supervised; w/o decoder; can be directly finetuned)

Note: for network definitions, we directly use timm.models.ResNet and official ConvNeXt.

reso.: the image resolution; acc@1: ImageNet-1K finetuned acc (top-1)

| arch. | reso. | acc@1 | #params | flops | weights (self-supervised, without SparK's decoder) |
|:---:|:---:|:---:|:---:|:---:|:---|
| ResNet50 | 224 | 80.6 | 26M | 4.1G | resnet50_1kpretrained_timm_style.pth |
| ResNet101 | 224 | 82.2 | 45M | 7.9G | resnet101_1kpretrained_timm_style.pth |
| ResNet152 | 224 | 82.7 | 60M | 11.6G | resnet152_1kpretrained_timm_style.pth |
| ResNet200 | 224 | 83.1 | 65M | 15.1G | resnet200_1kpretrained_timm_style.pth |
| ConvNeXt-S | 224 | 84.1 | 50M | 8.7G | convnextS_1kpretrained_official_style.pth |
| ConvNeXt-B | 224 | 84.8 | 89M | 15.4G | convnextB_1kpretrained_official_style.pth |
| ConvNeXt-L | 224 | 85.4 | 198M | 34.4G | convnextL_1kpretrained_official_style.pth |
| ConvNeXt-L | 384 | 86.0 | 198M | 101.0G | convnextL_384_1kpretrained_official_style.pth |

<details> <summary> <b> Pretrained weights (with SparK's UNet-style decoder; can be used to recons
