We propose a diffusion-based inverse rendering framework that decomposes a single RGB image into geometry, material, and lighting. Inverse rendering is inherently ill-posed, making it difficult to predict a single accurate solution. To address this challenge, recent generative methods instead aim to present a range of possible solutions. However, the objectives of finding a single accurate solution and generating diverse plausible solutions often conflict. In this paper, we propose a channel-wise noise scheduling approach that allows a single diffusion model architecture to serve both of these conflicting objectives. The two resulting diffusion models, trained with different channel-wise noise schedules, can respectively predict a single highly accurate solution and present multiple plausible solutions. Additionally, we introduce a technique for handling high-dimensional per-pixel environment maps in diffusion models. Experimental results on both synthetic and real-world datasets demonstrate the superiority and complementary nature of our two models, highlighting the importance of considering both accuracy and diversity in inverse rendering.
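To make the idea of channel-wise noise scheduling concrete, the sketch below illustrates one plausible reading in PyTorch: each channel of the stacked decomposition follows its own forward-noising schedule, so different schedule assignments bias a model toward accuracy or diversity on different components. The schedule parameterization (`alpha_bar`, `rate`), the channel layout, and all numeric values are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def alpha_bar(t, rate):
    # Cumulative signal retention at normalized time t in [0, 1].
    # Under this assumed parameterization, a smaller `rate` destroys
    # the signal earlier in the diffusion process.
    return torch.cos(0.5 * torch.pi * t.clamp(0.0, 1.0) ** rate) ** 2

def channelwise_noise(x0, t, rates):
    # x0:    clean decomposition, shape (B, C, H, W)
    # t:     normalized diffusion time, shape (B,)
    # rates: one schedule rate per channel, shape (C,)
    ab = alpha_bar(t.view(-1, 1, 1, 1), rates.view(1, -1, 1, 1))  # (B, C, 1, 1)
    eps = torch.randn_like(x0)
    # Standard DDPM-style forward step, but with a per-channel alpha_bar.
    xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
    return xt, eps

# Hypothetical channel layout: normals (3) | albedo (3) | roughness (1).
# Different per-group rate assignments would yield models trained toward
# either a single accurate solution or diverse plausible ones.
x0 = torch.randn(2, 7, 64, 64)
t = torch.rand(2)
rates = torch.tensor([2.0] * 3 + [1.0] * 3 + [0.5])
xt, eps = channelwise_noise(x0, t, rates)
```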