The rise of AI-generated content (AIGC) has revolutionized multimedia processing, including audio applications. The Room Impulse Response (RIR), which models sound propagation in an acoustic environment, plays a critical role in downstream tasks such as speech synthesis. Existing RIR generation methods, whether based on ray tracing or neural representations, fail to fully exploit the temporal dynamics inherent in RIRs. In this work, we propose a novel method for temporal modeling of RIRs through autoregressive learning. Our approach captures the sequential evolution of sound propagation by introducing a multi-scale generation mechanism that operates adaptively across varying temporal resolutions. Extensive evaluations demonstrate that our approach achieves T60 error rates of 4.1% and 5.3% on two real-world datasets, respectively, outperforming existing RIR generation methods. We believe our work opens up new directions for future research.
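To make the coarse-to-fine idea concrete, the sketch below illustrates one plausible reading of multi-scale autoregressive generation: an RIR is synthesized at progressively finer temporal resolutions, each scale conditioned on an upsampled version of the previous one and generated sample-by-sample. This is a toy illustration, not the paper's method; the hand-written exponential-decay "prior" and all constants stand in for a learned model and are assumptions for demonstration only.

```python
import numpy as np

def generate_rir_multiscale(scales=(64, 256, 1024), seed=0):
    """Toy coarse-to-fine autoregressive RIR generator (illustrative sketch).

    At each scale, samples are produced autoregressively, conditioned on the
    upsampled output of the previous (coarser) scale. A fixed exponential
    decay stands in for a learned, T60-controlled decay model.
    """
    rng = np.random.default_rng(seed)
    prev = None
    for n in scales:
        # Condition on the coarser scale, linearly upsampled to this resolution.
        if prev is None:
            cond = np.zeros(n)
        else:
            cond = np.interp(np.linspace(0, 1, n),
                             np.linspace(0, 1, len(prev)), prev)
        rir = np.empty(n)
        for t in range(n):
            # Autoregressive step: previous sample + conditioning + decaying noise.
            ar = 0.9 * rir[t - 1] if t > 0 else 1.0  # direct-path impulse at t = 0
            decay = np.exp(-5.0 * t / n)             # stand-in for learned decay
            rir[t] = 0.5 * ar + 0.5 * cond[t] + 0.05 * decay * rng.standard_normal()
        prev = rir
    return prev

rir = generate_rir_multiscale()
```

The generated signal starts near the direct-path impulse and decays toward the late reverberation tail, mimicking the temporal structure that the autoregressive formulation is meant to capture.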