A Hypernetwork-Based Approach to KAN Representation of Audio Signals


Authors: Patryk Marszałek, Maciej Rut, Piotr Kawa, Piotr Syga

Implicit neural representations (INRs) have gained prominence for efficiently encoding multimedia data, yet their application to audio signals remains limited. This study introduces the Kolmogorov-Arnold Network (KAN), a novel architecture that uses learnable activation functions, as an effective INR model for audio representation. KAN outperforms previous INRs on perceptual metrics, achieving the lowest Log-Spectral Distance (LSD) of 1.29 and the highest Perceptual Evaluation of Speech Quality (PESQ) score of 3.57 for 1.5 s audio clips. To extend KAN's utility, we propose FewSound, a hypernetwork-based architecture that enhances INR parameter updates. FewSound outperforms the state-of-the-art HyperSound, with improvements of 33.3% in MSE and 60.87% in scale-invariant signal-to-noise ratio (SI-SNR). These results establish KAN as a robust and adaptable audio representation with potential for scalability and for integration into various hypernetwork frameworks. The source code is available at https://github.com/gmum/fewsound.git.
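The core idea above is that a KAN, whose edges carry learnable univariate functions instead of fixed activations, can act as an INR that maps a time coordinate t to a waveform amplitude. The following is a minimal, hypothetical sketch under stated assumptions, not the FewSound code from the linked repository. RBFKANLayer, fit_inr and mse are illustrative names. Each edge function is built from fixed Gaussian radial basis functions, a swapped-in simplification of the B-splines (and residual SiLU branch) used in the original KAN formulation. The "audio" is a single sine period rather than a 1.5 s recording.

```python
import math
import random


class RBFKANLayer:
    """One KAN layer: every edge (input i -> output j) has its own learnable 1-D function.

    Assumption (not from the paper): each edge function is a weighted sum of fixed
    Gaussian bumps on a uniform grid, standing in for the B-splines of the original
    KAN; only the weights coef[j][i][k] are learnable.
    """

    def __init__(self, in_dim, out_dim, num_basis=8, lo=-1.0, hi=1.0, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        step = (hi - lo) / (num_basis - 1)
        self.centers = [lo + k * step for k in range(num_basis)]
        self.width = step  # bump width tied to the grid spacing
        # coef[j][i][k]: weight of basis k on the edge from input i to output j
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(num_basis)]
                      for _ in range(in_dim)] for _ in range(out_dim)]

    def basis(self, x):
        # Values of all Gaussian bumps at the scalar x
        return [math.exp(-((x - c) / self.width) ** 2) for c in self.centers]

    def forward(self, x):
        # y_j = sum_i phi_{j,i}(x_i): outputs are sums of learnable univariate functions
        feats = [self.basis(xi) for xi in x]
        return [sum(sum(c * b for c, b in zip(self.coef[j][i], feats[i]))
                    for i in range(self.in_dim))
                for j in range(self.out_dim)]


def mse(layer, ts, ys):
    # Mean squared error of a 1 -> 1 layer used as an INR t -> amplitude
    return sum((layer.forward([t])[0] - y) ** 2 for t, y in zip(ts, ys)) / len(ts)


def fit_inr(layer, ts, ys, lr=0.5, steps=500):
    # Plain gradient descent on MSE for a 1 -> 1 layer. The output is linear in coef,
    # so the loss is a convex quadratic and this step size keeps descent stable.
    n = len(ts)
    feats = [layer.basis(t) for t in ts]
    for _ in range(steps):
        preds = [layer.forward([t])[0] for t in ts]
        grads = [2.0 / n * sum((p - y) * f[k] for p, y, f in zip(preds, ys, feats))
                 for k in range(len(layer.centers))]
        for k, g in enumerate(grads):
            layer.coef[0][0][k] -= lr * g  # simultaneous update of all weights
    return mse(layer, ts, ys)


# Usage: fit one period of a sine "waveform" sampled at 64 time coordinates in [-1, 1]
ts = [-1.0 + 2.0 * i / 63 for i in range(64)]
ys = [math.sin(math.pi * t) for t in ts]
inr = RBFKANLayer(1, 1, num_basis=12)
before = mse(inr, ts, ys)
after = fit_inr(inr, ts, ys)
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

Only the basis weights are trained here, which keeps the example convex and dependency-free. A full KAN INR would stack several such layers and train them with an autodiff framework. In a hypernetwork setting such as FewSound, a separate network would output the coef values instead of fitting them per signal.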

Subjects: Sound, Computer Vision and Pattern Recognition, Audio and Speech Processing

Published: 2025-03-04 13:08:45 UTC

arXiv: 2503.02585
