Prior work on speech-based depression detection typically trains and tests models on data collected under similar conditions. In practice, however, the training and test distributions often differ. Distributional shifts in speech can arise from many factors, such as differences in recording environment (e.g., background noise) and in speaker demographics (e.g., gender, age). These shifts can substantially degrade the performance of depression detection models. In this paper, we study test-time training (TTT) as a way to make depression detection models more robust to such shifts. Our results show that TTT can substantially improve model performance under several types of distributional shift, including those caused by (a) background noise, (b) gender bias, and (c) differences in data collection and curation procedures, where training and test samples come from different datasets.