# Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

This is the official repository of Frido. We now support training and testing for text-to-image, layout-to-image, scene-graph-to-image, and label-to-image on COCO/VG/OpenImage. Please stay tuned!

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis <br/>Wan-Cyuan Fan, Yen-Chun Chen, DongDong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang<br/>
## ☀️ Important updates
- [Nov 7, 2024] Microsoft's blob storage no longer allows anonymous downloads per the latest company-wide security policy. The pretrained weights may not be available right now. We are fixing this issue.
- [Nov 12, 2024] All checkpoints are now available for download from Google Drive.
## ☀️ News

We provide a web version of the demo here to help researchers better understand our work. The web demo contains multiple animations explaining the diffusion and denoising processes of Frido, along with more qualitative experimental results. Hope it's useful!
## 🐧 TODO

Frido codebase
- [x] Training code
- [x] Training scripts
- [x] Inference code
- [x] Inference scripts
- [x] Inference model weights setup
- [x] Evaluation code and scripts
- [x] Auto setup datasets
- [x] Auto download model weights
- [x] PLMS sampling tools
- [x] Web demo and framework animation
## Machine environment
- Ubuntu version: 18.04.5 LTS
- CUDA version: 11.6
- Testing GPU: Nvidia Tesla V100
## Requirements

A conda environment named `frido` can be created and activated with:

```shell
conda env create -f environment.yaml
conda activate frido
```
## Datasets setup
We provide two approaches to set up the datasets:
### 🎶 Auto-download

To automatically download the datasets and save them under the default path (`../`), please use the following scripts:

```shell
bash tools/datasets/download_coco.sh
bash tools/datasets/download_vg.sh
bash tools/datasets/download_openimage.sh
```
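For reference, the downloads are expected to produce a dataset root containing `coco`, `vg`, and `openimage` folders (see the file-structure section below). As a minimal sketch, the empty skeleton can be pre-created like this; note we use an illustrative local `./datasets` root here rather than the scripts' actual `../` default:

```shell
# Illustrative only: the download scripts save under ../ by default;
# ./datasets is an assumed local root used for this sketch.
DATASET_ROOT="./datasets"
mkdir -p "$DATASET_ROOT/coco/2014" "$DATASET_ROOT/coco/2017"
mkdir -p "$DATASET_ROOT/vg" "$DATASET_ROOT/openimage"
ls "$DATASET_ROOT"
```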
### 🎶 Manual setup
#### COCO 2014 split (T2I)

- We use the COCO 2014 split for the text-to-image task, which can be downloaded from the official COCO website.
- Please create a folder named `2014` and collect the downloaded data and annotations as follows.

<details><summary>COCO 2014 file structure</summary>

```
2014
├── annotations
│   └── captions_val2014.json
│   └── ...
└── val2014
    └── COCO_val2014_000000000073.jpg
    └── ...
```
</details>
#### COCO-stuff 2017

##### Standard split (Layout2I & Label2I)

- We follow TwFA and LAMA to perform layout-to-image experiments on COCO-stuff 2017, which can be downloaded from the official COCO website.
- Please create a folder named `2017` and collect the downloaded data and annotations as follows.

<details><summary>COCO-stuff 2017 split file structure</summary>

```
2017
├── annotations
│   └── captions_val2017.json
│   └── ...
└── val2017
    └── 000000000872.jpg
    └── ...
```
</details>
##### Segmentation challenge split (Layout2I & SG2I)

- We follow LDM and HCSS to perform layout-to-image experiments on the COCO-stuff segmentation challenge split, which can be downloaded from the official COCO website.
- Please make sure the `deprecated-challenge2017` folder is downloaded and saved in the `annotations` dir.
- Please create a folder named `2017` and collect the downloaded data and annotations as follows.

<details><summary>COCO 2017 Segmentation challenge split file structure</summary>

```
2017
├── annotations
│   └── deprecated-challenge2017
│   │   └── train-ids.txt
│   │   └── val-ids.txt
│   └── captions_val2017.json
│   └── ...
└── val2017
    └── 000000000872.jpg
    └── ...
```
</details>
#### Visual Genome (Layout2I & SG2I)

- We follow TwFA and LAMA to perform layout-to-image experiments on Visual Genome.
- We also follow Sg2Im and CanonicalSg2Im to conduct scene-graph-to-image experiments on Visual Genome.
- First, please use the download scripts in Sg2Im to download and pre-process the Visual Genome dataset.
- Second, please use the script `TODO.py` to generate a coco-style `vg.json` for both tasks, as shown below:

```shell
python3 TODO.py [VG_DIR_PATH]
```

- Please create a folder named `vg` and collect the downloaded data and annotations as follows.

<details><summary>Visual Genome file structure</summary>

```
vg
├── VG_100K
│   └── captions_val2017.json
│   └── ...
└── objects.json
└── train_coco_style.json
└── train.json
└── ...
```
</details>
#### OpenImage (Layout2I)

- We follow LDM and HCSS to perform layout-to-image experiments on OpenImage, which can be downloaded from the official OpenImage website.
- Please create a folder named `openimage` and collect the downloaded data and annotations as follows.

<details><summary>OpenImage file structure</summary>

```
openimage
├── train
│   └── data
│   │   └── *.jpg
│   └── labels
│   │   └── masks
│   │   └── detections.csv
│   └── metadata
│   │   └── classes.csv
│   │   └── image_id.csv
│   │   └── ...
├── validation
│   └── data
│   └── labels
│   └── metadata
└── info.json
```
</details>
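Because the OpenImage layout is the most deeply nested, it can help to pre-create the empty skeleton before moving downloads into place. A minimal `sh`-compatible sketch, with directory names taken from the tree above (the real `info.json` and CSV files come with the dataset):

```shell
# Create the empty OpenImage skeleton (names taken from the tree above).
for split in train validation; do
  mkdir -p "openimage/$split/data" "openimage/$split/labels" "openimage/$split/metadata"
done
mkdir -p openimage/train/labels/masks
: > openimage/info.json   # empty placeholder; replaced by the downloaded file
echo "created openimage skeleton"
```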
## File structure for dataset and code

Please make sure the file structure matches the layout below; otherwise, modify the config files so the paths match your setup.
<details><summary>File structure</summary>

```
datasets
├── coco
│   └── 2014
│   │   └── annotations
│   │   └── val2014
│   │   └── ...
│   └── 2017
│   │   └── annotations
│   │   └── val2017
│   │   └── ...
├── vg
├── openimage

Frido
└── configs
│   └── frido
│   │   └── ...
└── exp
│   └── t2i
│   │   └── frido_f16f8_coco
│   │   │   └── checkpoints
│   │   │   │   └── model.ckpt
│   └── layout2i
│   └── ...
└── frido
└── scripts
└── tools
└── ...
```
</details>
## Download pre-trained models

Microsoft's blob storage no longer allows anonymous downloads per the latest company-wide security policy, so the pretrained weights may not be available via the following script. Please kindly download the checkpoints from Google Drive instead.

The following table describes the tasks and models that are currently available. To auto-download (using azcopy) all model checkpoints of Frido, please use the following command:

```shell
bash tools/download.sh
```

You may also download them manually from the download links shown below.
| Task | Dataset | FID | Link (TODO) | Comments |
| --- | --- | --- | --- | --- |
| Text-to-image | COCO 2014 | 11.24 | Google drive | |
| Text-to-image (mini) | COCO 2014 | 64.85 | Google drive | 1000 images of mini-val; FID was calculated against corresponding GT images. |
| Text-to-image | COCO 2014 | 10.74 | Google drive | CLIP encoder from stable diffusion (not CLIP re-ranking) |
| Scene-graph-to-image | COCO-stuff 2017 | 46.11 | Google drive | Data preprocessing same as sg2im. |
| Scene-graph-to-image | Visual Genome | 31.61 | Google drive | Data preprocessing same as sg2im. |
| Label-to-image | COCO-stuff | 27.65 | Google drive | 2-30 instances |
| Label-to-image | COCO-stuff | 47.39 | Google drive | 3-8 instances |
| Layout-to-image | COCO (finetuned from OpenImage) | 37.14 | Google drive | FID calculated on 2,048 val images. |
| Layout-to-image (mini) | COCO (finetuned from OpenImage) | 121.23 | Google drive | 320 images of mini-val; FID was calculated against corresponding GT images. |
| Layout-to-image | OpenImage | 29.04 | [Google drive](https://github.com/davidhalladay/Frido/blob/main/to | |
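After a manual download from Google Drive, each checkpoint has to end up where the configs expect it (see the file-structure section above). A minimal sketch for the text-to-image model, assuming the `exp/t2i/frido_f16f8_coco` path from that section and using `model.ckpt` as an assumed name for the downloaded file:

```shell
# Place a manually downloaded checkpoint at the expected path.
# CKPT_SRC is an assumed name for the file fetched from Google Drive.
CKPT_SRC="${CKPT_SRC:-model.ckpt}"
DEST="exp/t2i/frido_f16f8_coco/checkpoints"
mkdir -p "$DEST"
if [ -f "$CKPT_SRC" ]; then
  cp "$CKPT_SRC" "$DEST/model.ckpt" && echo "installed $DEST/model.ckpt"
else
  echo "download $CKPT_SRC from Google Drive first"
fi
```

The same pattern applies to the other tasks (e.g. `layout2i`); only the experiment folder under `exp/` changes.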