DiLightNet

Official Code Release for [SIGGRAPH 2024] DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

<p align="center"> <h1 align="center">DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation</h1> <p align="center"> <a href="https://www.chong-zeng.com/"><strong>Chong Zeng</strong></a> · <a href="https://yuedong.shading.me/"><strong>Yue Dong</strong></a> · <a href="https://www.cs.wm.edu/~ppeers/"><strong>Pieter Peers</strong></a> · <a href="https://github.com/DQSSSSS"><strong>Youkang Kong</strong></a> · <a href="https://svbrdf.github.io/"><strong>Hongzhi Wu</strong></a> · <a href="https://scholar.google.com/citations?user=P91a-UQAAAAJ&hl=en"><strong>Xin Tong</strong></a> </p> <h2 align="center">SIGGRAPH 2024 Conference Proceedings</h2> <div align="center"> <img src="examples/teaser.png"> </div> <p align="center"> <br> <a href="https://dilightnet.github.io/"><strong>Project Page</strong></a> | <a href="https://arxiv.org/abs/2402.11929"><strong>arXiv</strong></a> | <a href="https://huggingface.co/dilightnet/DiLightNet"><strong>Model</strong></a> | <a href="https://huggingface.co/spaces/dilightnet/DiLightNet"><strong>Demo</strong></a> </p> </p>

DiLightNet is a novel method for exerting fine-grained lighting control during text-driven diffusion-based image generation. It uses a three-stage pipeline to control lighting during image generation: provisional image generation, foreground synthesis, and background inpainting. In this repo, we open-source the ControlNet model used in the second stage of DiLightNet: a neural network that takes a provisional image, a mask, and radiance hints as input and generates a foreground image under the target lighting. For the provisional image generation and background inpainting stages, you can use any off-the-shelf models (e.g. Stable Diffusion, Depth ControlNet, ...) or services (e.g. DALL·E 3, MidJourney, ...).


Environment Setup

We use the Blender Python binding bpy for radiance hint rendering. bpy requires at least Python 3.10, and the bpy version we use (3.6, LTS) supports only Python 3.10. We therefore recommend using conda to create a new environment with Python 3.10 along with the CUDA and PyTorch dependencies.

conda create --name dilightnet python=3.10 pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda==12.4 mkl==2023.1.0 -c pytorch -c nvidia
conda activate dilightnet
git clone https://github.com/iamNCJ/DiLightNet
cd DiLightNet
pip install -r requirements.txt

Usage

Load the NeuralTextureControlNet Module & Model Weights

from diffusers.utils import get_class_from_dynamic_module
NeuralTextureControlNetModel = get_class_from_dynamic_module(
    "dilightnet/model_helpers",
    "neuraltexture_controlnet.py",
    "NeuralTextureControlNetModel"
)
neuraltexture_controlnet = NeuralTextureControlNetModel.from_pretrained("DiLightNet/DiLightNet")

Inference with StableDiffusionControlNetPipeline Pipeline

The base model of DiLightNet is stabilityai/stable-diffusion-2-1, so you can set up an inference pipeline with our DiLightNet ControlNet model as follows.

import torch
from diffusers import StableDiffusionControlNetPipeline

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", controlnet=neuraltexture_controlnet,
)
cond_image = torch.rand((1, 16, 512, 512))  # dummy 16-channel conditioning input in (0, 1)
image = pipe("some text prompt", image=cond_image).images[0]

Please check the simple example for using with real condition images.

Input Format

The input tensor to the controlnet model should be a torch.Tensor of shape (BS, 16, H, W), range (0, 1), where H and W are the height and width of the image, respectively. The 16 channels are in the order of:

  • Provisional Image: torch.Tensor of shape (1, 3, H, W), range (0, 1)
  • Mask: torch.Tensor of shape (1, 1, H, W), range (0, 1)
  • Radiance Hints: torch.Tensor of shape (1, 12, H, W), in the order of diffuse, specular (r=0.05, r=0.13, r=0.34), range (0, 1)
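For concreteness, the 16-channel conditioning tensor above can be assembled by concatenating the three inputs along the channel axis (the tensor values here are random placeholders, not real images):

```python
import torch

h, w = 512, 512
provisional = torch.rand(1, 3, h, w)      # provisional image, range (0, 1)
mask = torch.rand(1, 1, h, w)             # foreground mask, range (0, 1)
radiance_hints = torch.rand(1, 12, h, w)  # diffuse + specular (r=0.05, 0.13, 0.34) hints

# Channel order: provisional image (3), mask (1), radiance hints (12)
cond_image = torch.cat([provisional, mask, radiance_hints], dim=1)
assert cond_image.shape == (1, 16, h, w)
```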

Input/Output Example

Inputs

| Provisional Image | Mask | Radiance Hints - Diffuse |
| --- | --- | --- |
| Provisional Image | Mask | Diffuse |
| Radiance Hints - Specular (r=0.05) | Radiance Hints - Specular (r=0.13) | Radiance Hints - Specular (r=0.34) |
| Specular 0.05 | Specular 0.13 | Specular 0.34 |

Outputs

| Output Image |
| --- |
| Output |

Integration Examples

Note: We have switched to DUSt3R for monocular metric depth estimation and camera intrinsics estimation. The results in the paper were produced with ZoeDepth and fixed camera intrinsics (fov=55.0). The released version should therefore produce better results, though they may differ slightly from the paper.

CLI: video generation with continuous lighting changing

Example

python3 infer_img.py --prov_img examples/provisional_img/futuristic_soldier.png --prompt "futuristic soldier with advanced armor weaponry and helmet" --env_map examples/env_map/grace.exr --out_vid ./output/soldier_grace.mp4

Please check the test script for more examples.

Available arguments

--prov_img str         Path to the provisional image (default: None)
--prompt str          Prompt for the generated images (default: )
--num_imgs_per_prompt int
                      Number of images to generate per prompt (default: 4)
--out_vid [str]       Path to the output video, defaults to the input image path (default: None)
--seed int            Seed for the generation (default: 3407)
--steps int           Number of steps for the diffusion process (default: 20)
--cfg float           CFG for the diffusion process (default: 3.0)
--fov [float]         Field of view for the mesh reconstruction, none for auto estimation from the image (default: None)
--mask_path [str]     Path to the mask for the image (default: None)
--use_sam bool, --nouse_sam bool
                      Use SAM for background removal (default: True)
--mask_threshold float
                      Mask threshold for foreground object extraction (default: 25.0)
--pl_pos_r float      Rotation radius of the point light (default: 5.0)
--pl_pos_h float      Height of the point light (default: 3.0)
--power float         Power of the point light (default: 1200.0)
--inpaint bool, --noinpaint bool
                      Inpaint the background of generated point light images (default: False)
--env_map [str]       Environment map for the rendering, defaults to None (white point light) (default: None)
--frames int          Number of frames for lighting controlled video (default: 120)
--use_gpu_for_rendering bool, --nouse_gpu_for_rendering bool
                      Use GPU for radiance hints rendering (default: True)
--cache_radiance_hints bool, --nocache_radiance_hints bool
                      Cache the radiance hints for the video (default: True)
--radiance_hints_path [str]
                      Path to pre-rendered radiance hints (default: None)

Generation Tips

  1. Foreground mask: By default we use U2Net to generate an initial mask and SAM to further refine it. If this doesn't work well on your image, you can provide a mask image with --mask_path. The mask image can be RGBA or grayscale; we directly use its last channel as the mask.
  2. Background inpainting: For environment map lighting, the script automatically inpaints the background with the background color of the environment map. For point light lighting, you can use --inpaint to inpaint the background of the generated images with the Stable Diffusion inpainting model, but we suggest manual intervention for better inpainting results.
  3. Randomness: Due to the ambiguity in the provisional image (e.g. shape, original lighting, fine-grained material properties), generated results can and should show diversity. You can try different seeds and prompts to get the desired results, just as with any diffusion model. The script generates num_imgs_per_prompt (default=4) images per prompt.
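The mask handling from tip 1 (use the last channel of an RGBA or grayscale image) can be sketched as a small helper; the function name is illustrative and not part of the repo:

```python
import numpy as np

def mask_from_image_array(img: np.ndarray) -> np.ndarray:
    """Extract an (H, W) float mask in (0, 1) from a grayscale or RGBA image array."""
    if img.ndim == 2:         # grayscale: the image itself is the mask
        mask = img
    else:                     # RGBA (or RGB): use the last channel, per tip 1
        mask = img[..., -1]
    return mask.astype(np.float32) / 255.0

# Example with a dummy RGBA image: an opaque square in the alpha channel
rgba = np.zeros((512, 512, 4), dtype=np.uint8)
rgba[128:384, 128:384, 3] = 255
mask = mask_from_image_array(rgba)
```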
