This study explores the potential of Rhythm Formant Analysis (RFA) to capture long-term temporal modulations in dementia speech. Specifically, we introduce RFA-derived rhythm spectrograms as novel features for dementia classification and regression tasks. We propose two methodologies: (1) handcrafted features derived from rhythm spectrograms, and (2) a data-driven fusion approach that feeds the proposed RFA-derived rhythm spectrograms into a vision transformer (ViT) to obtain acoustic representations, which are combined with BERT-based linguistic embeddings. We compare both against existing feature sets. Notably, our handcrafted features outperform eGeMAPS, achieving a relative improvement of 14.2% in classification accuracy and comparable performance on the regression task. The fusion approach also improves performance: RFA rhythm spectrograms surpass Mel spectrograms in classification by a relative improvement of approximately 13.1%, while yielding regression scores comparable to the baselines. All code is available in our GitHub repository.
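As a rough illustration of what an RFA-style rhythm spectrogram involves, the following Python sketch (not the paper's implementation) computes a spectrogram of the low-frequency amplitude-envelope modulation of a speech recording over long sliding windows; the envelope rate, window length, hop size, and the 10 Hz modulation cutoff are illustrative assumptions, not values from this study.

```python
# Minimal sketch, assuming RFA-style processing: spectrum of the slow
# (< ~10 Hz) amplitude-envelope modulation of speech over sliding windows.
import numpy as np
from scipy.signal import hilbert, stft
import soundfile as sf

def rhythm_spectrogram(wav_path, env_sr=100, max_mod_hz=10.0,
                       win_s=3.0, hop_s=0.5):
    x, sr = sf.read(wav_path)
    if x.ndim > 1:                       # mix down to mono
        x = x.mean(axis=1)
    env = np.abs(hilbert(x))             # amplitude envelope
    env = env[::int(sr // env_sr)]       # crude downsample to env_sr Hz
    env = env - env.mean()               # remove DC before spectral analysis
    f, t, Z = stft(env, fs=env_sr,
                   nperseg=int(win_s * env_sr),
                   noverlap=int((win_s - hop_s) * env_sr))
    keep = f <= max_mod_hz               # keep only the low-frequency rhythm band
    return f[keep], t, np.abs(Z[keep])   # modulation freqs, frame times, magnitudes
```

The resulting low-frequency magnitude array is the kind of two-dimensional representation that could either be summarized into handcrafted statistics or passed as an image-like input to a ViT, as the two proposed methodologies describe.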