2024-10-29 | | Total: 13
The analysis of neural power spectra plays a crucial role in understanding brain function and dysfunction. While recent efforts have led to the development of methods for decomposing spectral data, challenges remain in performing statistical analysis and group-level comparisons. Here, we introduce Bayesian Spectral Decomposition (BSD), a Bayesian framework for analysing neural spectral power. BSD allows for the specification, inversion, comparison, and analysis of parametric models of neural spectra, addressing limitations of existing methods. We first establish the face validity of BSD on simulated data and show how it outperforms an established method (FOOOF) for peak detection on artificial spectral data. We then demonstrate the efficacy of BSD on a group-level study of EEG spectra in 204 healthy subjects from the LEMON dataset. Our results not only highlight the effectiveness of BSD in model selection and parameter estimation, but also illustrate how BSD enables straightforward group-level regression of the effect of continuous covariates such as age. By using Bayesian inference techniques, BSD provides a robust framework for studying neural spectral data and their relationship to brain function and dysfunction.
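Parametric spectral models of the kind BSD and FOOOF work with typically write log power as an aperiodic (1/f-like) component plus Gaussian peaks. The sketch below builds such a generative model and a Gaussian log-likelihood, which is the term a Bayesian inversion would combine with priors. The functional forms and all names (`aperiodic`, `gaussian_peak`, `model_spectrum`, `gaussian_log_likelihood`) are assumptions in the FOOOF style. This is not BSD's actual model or inference scheme.

```python
import numpy as np

def aperiodic(freqs, offset, exponent, knee=0.0):
    """Aperiodic (1/f-like) component in log10 power: offset - log10(knee + f**exponent)."""
    return offset - np.log10(knee + freqs ** exponent)

def gaussian_peak(freqs, center, height, width):
    """Periodic component: a Gaussian bump added in log10 power."""
    return height * np.exp(-((freqs - center) ** 2) / (2 * width ** 2))

def model_spectrum(freqs, offset, exponent, peaks):
    """Log10 power = aperiodic component + one Gaussian per (center, height, width) peak."""
    log_power = aperiodic(freqs, offset, exponent)
    for center, height, width in peaks:
        log_power = log_power + gaussian_peak(freqs, center, height, width)
    return log_power

def gaussian_log_likelihood(observed, predicted, sigma):
    """Log-likelihood of observed log power under i.i.d. Gaussian noise with std sigma."""
    resid = observed - predicted
    n = resid.size
    return -0.5 * np.sum((resid / sigma) ** 2) - n * np.log(sigma * np.sqrt(2 * np.pi))

# Hypothetical example: 1-40 Hz at 0.1 Hz resolution, with and without a 10 Hz alpha peak.
freqs = np.linspace(1, 40, 391)
with_peak = model_spectrum(freqs, offset=1.0, exponent=1.5, peaks=[(10.0, 0.8, 1.5)])
no_peak = model_spectrum(freqs, offset=1.0, exponent=1.5, peaks=[])
```

Comparing a model with a peak against one without it would, in a full Bayesian treatment, use the model evidence rather than the raw likelihood.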
Accurate and robust recording and decoding from the central nervous system (CNS) is essential for advances in human-machine interfacing. However, technologies used to directly measure CNS activity are limited by their resolution, sensitivity to interference, and invasiveness. Advances in muscle recordings and deep learning allow us to decode the spiking activity of spinal motor neurons (MNs) in real time and with high accuracy. MNs represent the motor output layer of the CNS: they receive and sample signals originating in different regions of the nervous system and generate the neural commands that control muscles. The input signals to MNs can therefore be estimated from the MN outputs. Here we argue that peripheral neural interfaces using muscle sensors represent a promising, non-invasive approach to estimating some CNS activity that reaches the MNs but does not directly modulate force production. We also discuss the evidence supporting this concept, and the advances needed to consolidate and test MN-based CNS interfaces in controlled and real-world settings.
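A common proxy for estimating the shared input to a motor neuron pool from its output is to pool decoded spike trains into a cumulative spike train and low-pass smooth it. The sketch below uses a unit-area Hann window. The window length (`window_s=0.4`), the function name `estimate_common_input`, and the choice of smoothing are illustrative assumptions. They are not a method the abstract specifies.

```python
import numpy as np

def estimate_common_input(spike_trains, fs, window_s=0.4):
    """Estimate low-frequency common input from decoded motor-neuron spikes.

    spike_trains: (n_units, n_samples) binary array, 1 where a unit fired.
    Returns the smoothed cumulative discharge rate in spikes per second.
    """
    cumulative = spike_trains.sum(axis=0).astype(float)  # pooled (cumulative) spike train
    n_win = max(1, int(round(window_s * fs)))
    window = np.hanning(n_win)
    window /= window.sum()  # unit area: output is spikes per sample
    return np.convolve(cumulative, window, mode="same") * fs

# Hypothetical example: 5 units, each firing at 10 Hz with staggered phases (fs = 1 kHz).
fs = 1000
spikes = np.zeros((5, 5000))
for u in range(5):
    spikes[u, u * 20::100] = 1
rate = estimate_common_input(spikes, fs)  # about 50 spikes/s away from the edges
```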
Mouse and human brains have different functions that depend on their neuronal networks. In this study, we analyzed nanometer-scale three-dimensional structures of brain tissues of the mouse medial prefrontal cortex and compared them with structures of the human anterior cingulate cortex. The results indicated that mouse neuronal somata are smaller and neurites are thinner than those of human neurons. These structural features allow mouse neurons to be integrated in the limited space of the brain, although, according to cable theory, thin neurites would be expected to suppress distal connections. We implemented this mouse-mimetic constraint in the convolutional layers of a generative adversarial network (GAN) and a denoising diffusion implicit model (DDIM), which were then applied to image generation tasks using photo datasets of cat faces, cheese, human faces, and birds. The mouse-mimetic GAN outperformed a standard GAN in the image generation task using the cat face and cheese datasets, but underperformed on human faces and birds. The mouse-mimetic DDIM gave similar results, suggesting that the nature of the datasets affected the results. Analyses of the four datasets indicated differences in their image entropy, which should influence the number of parameters required for image generation. The preferences of the mouse-mimetic AIs coincided with the impressions commonly associated with mice. The relationship between the neuronal network and brain function should be investigated further by implementing other biological findings in artificial neural networks.
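The cable-theory argument can be stated with the standard length constant of a passive cylindrical neurite. Here $R_m$ is the specific membrane resistance ($\Omega\,\mathrm{cm}^2$), $R_i$ the axial resistivity ($\Omega\,\mathrm{cm}$), and $d$ the neurite diameter. Steady-state voltage in an infinite cable decays as $e^{-x/\lambda}$. Assuming the same membrane properties in both species (an assumption, not a measured result), the length constant scales with the square root of diameter:

```latex
\lambda = \sqrt{\frac{R_m\, d}{4 R_i}}, \qquad
V(x) = V_0\, e^{-x/\lambda}, \qquad
\frac{\lambda_{\text{thin}}}{\lambda_{\text{thick}}} = \sqrt{\frac{d_{\text{thin}}}{d_{\text{thick}}}}
```

For example, halving the diameter shortens $\lambda$ by a factor of $1/\sqrt{2} \approx 0.71$, so passive signals attenuate faster with distance along thinner neurites.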
We propose a differential geometric model of hypercolumns in the primary visual cortex V1 that combines features of the symplectic model of the primary visual cortex by A. Sarti, G. Citti and J. Petitot and of the spherical model of hypercolumns by P. Bressloff and J. Cowan. The model is based on classical results in Conformal Geometry.
Neural encoding of artificial neural networks (ANNs) links their computational representations to brain responses, offering insights into how the brain processes information. Current studies mostly use linear encoding models for interpretability, even though brain responses are often nonlinear. This has sparked interest in developing nonlinear encoding models that remain interpretable. To address this problem, we propose LinBridge, a learnable and flexible framework based on Jacobian analysis for interpreting nonlinear encoding models. LinBridge posits that the nonlinear mapping between ANN representations and neural responses can be factorized into a linear inherent component that approximates the complex nonlinear relationship, and a mapping bias that captures sample-selective nonlinearity. The Jacobian matrix, which reflects the rate of change of the outputs with respect to the inputs, enables the analysis of sample-selective mapping in nonlinear models. LinBridge employs a self-supervised learning strategy to extract both the linear inherent component and the nonlinear mapping biases from the Jacobian matrices of the test set, allowing it to adapt effectively to various nonlinear encoding models. We validate the LinBridge framework in the scenario of neural visual encoding, using computational visual representations from CLIP-ViT to predict brain activity recorded via functional magnetic resonance imaging (fMRI). Our experimental results demonstrate that: 1) the linear inherent component extracted by LinBridge accurately reflects the complex mappings of nonlinear neural encoding models; 2) the sample-selective mapping bias elucidates the variability of nonlinearity across different levels of the visual processing hierarchy. This study presents a novel tool for interpreting nonlinear neural encoding models and offers fresh evidence about the hierarchical distribution of nonlinearity in the visual cortex.
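The Jacobian idea can be illustrated with a simplified stand-in. Compute the per-sample Jacobian of an encoding model, then split it into a shared linear map plus per-sample deviations. In the sketch below the shared component is just the mean Jacobian and the deviations play the role of "mapping biases". LinBridge learns this factorization with a self-supervised strategy instead. The averaging shortcut, the toy `tanh` encoder, and the function names are assumptions for illustration only.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-5):
    """Central-difference Jacobian of f: R^d -> R^m at x; returns an (m, d) array."""
    x = np.asarray(x, dtype=float)
    m = f(x).shape[0]
    J = np.zeros((m, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

def linear_component_and_bias(f, X):
    """Shared linear map (mean Jacobian) and per-sample deviations from it."""
    jacobians = np.stack([numerical_jacobian(f, x) for x in X])  # (n, m, d)
    W_lin = jacobians.mean(axis=0)
    biases = jacobians - W_lin  # sample-selective part of the mapping
    return W_lin, biases

# Hypothetical toy encoder: 4-D "ANN features" -> 3 "voxels" through a tanh nonlinearity.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
X = rng.normal(size=(5, 4))
W_lin, biases = linear_component_and_bias(lambda x: np.tanh(A @ x), X)
```

For a purely linear encoder the deviations vanish. Larger deviations indicate more sample-selective nonlinearity.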
To better understand existing LLMs, we may examine the human mental (cognitive/psychological) architecture, its components, and its structures. Drawing on the psychological, philosophical, and cognitive science literature, it is argued that, within the human mental architecture, existing LLMs correspond well to implicit mental processes (intuition, instinct, and so on). However, the same literature indicates that, beyond such implicit processes, the human mental architecture also contains explicit processes with better symbolic capabilities. Various theoretical and empirical issues and questions in this regard are explored. Furthermore, it is argued that existing dual-process computational cognitive architectures (models of the human cognitive/psychological architecture) provide usable frameworks for fundamentally enhancing LLMs by introducing dual processes (both implicit and explicit) and, at the same time, can themselves be enhanced by LLMs. The result is a synergistic combination (in several different senses simultaneously).
In daily interactions, emotions are frequently conveyed and triggered through verbal exchanges. Sometimes, we must modulate our emotional reactions to align with societal norms. Among emotional words, taboo words represent a specific category that has been poorly studied. One intriguing question is whether these word categories can be predicted from EEG responses using machine learning methods. To address this question, a Support Vector Machine (SVM) was applied to decode word categories from Event Related Potentials (ERPs) in 40 native Italian speakers, using a set of 240 neutral, negative, and taboo words. Results indicate that the SVM classifier successfully distinguished between the three word categories, with significant differences in neural activity ascribed to the late positive potential, mainly detected over central-parietal-occipital and anterior right scalp areas in the 450-649 ms and 650-850 ms time windows. These findings are in line with the established distribution pattern of the late positive potential. Intriguingly, the study also revealed that word categories were still detectable in the regulate condition. This study extends previous results on cortical responses to taboo words and shows how machine learning methods can be used to predict word categories from EEG responses.
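Window-based ERP decoding usually starts by averaging each channel's amplitude inside the time windows of interest. The result is a trial-by-feature matrix that a classifier can consume. The sketch below extracts mean amplitudes in the two windows reported above (450-649 ms and 650-850 ms). The epoch layout, the `window_mean_features` name, and the use of plain window means are assumptions. The study's actual feature pipeline is not described in the abstract.

```python
import numpy as np

WINDOWS_MS = [(450, 649), (650, 850)]  # windows reported in the abstract

def window_mean_features(epochs, fs, tmin_ms, windows_ms=WINDOWS_MS):
    """Mean amplitude per channel and time window.

    epochs: (n_trials, n_channels, n_times) array whose first sample is at tmin_ms.
    Returns an (n_trials, n_channels * n_windows) feature matrix.
    """
    feats = []
    for start, stop in windows_ms:
        i0 = int(round((start - tmin_ms) * fs / 1000))
        i1 = int(round((stop - tmin_ms) * fs / 1000)) + 1  # include the stop sample
        feats.append(epochs[:, :, i0:i1].mean(axis=2))
    return np.concatenate(feats, axis=1)

# Hypothetical example: 2 trials, 3 channels, 500 Hz, epochs from -200 ms.
fs, tmin = 500, -200
t = tmin + np.arange(600) * 1000 / fs  # sample times in ms
epochs = np.broadcast_to(t, (2, 3, 600)).copy()  # each sample holds its own time value
features = window_mean_features(epochs, fs, tmin)  # shape (2, 6)
```

The resulting matrix could then be passed to a linear SVM, for example scikit-learn's `sklearn.svm.SVC(kernel="linear")`, under cross-validation.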
Sparse autoencoders have recently produced dictionaries of high-dimensional vectors corresponding to the universe of concepts represented by large language models. We find that this concept universe has interesting structure at three levels: 1) The "atomic" small-scale structure contains "crystals" whose faces are parallelograms or trapezoids, generalizing well-known examples such as (man-woman-king-queen). We find that the quality of such parallelograms and associated function vectors improves greatly when projecting out global distractor directions such as word length, which is efficiently done with linear discriminant analysis. 2) The "brain" intermediate-scale structure has significant spatial modularity; for example, math and code features form a "lobe" akin to functional lobes seen in neural fMRI images. We quantify the spatial locality of these lobes with multiple metrics and find that clusters of co-occurring features, at coarse enough scale, also cluster together spatially far more than one would expect if feature geometry were random. 3) The "galaxy" scale large-scale structure of the feature point cloud is not isotropic, but instead has a power law of eigenvalues with steepest slope in middle layers. We also quantify how the clustering entropy depends on the layer.
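The "galaxy"-scale claim concerns the eigenvalue spectrum of the feature point cloud. If the eigenvalues follow a power law in their rank, the log-log plot is a straight line, and its slope summarizes the anisotropy. The sketch below computes covariance eigenvalues and fits that slope by least squares. The plain `polyfit` over all ranks is an assumption. The paper's exact fitting range and procedure may differ.

```python
import numpy as np

def eigen_spectrum(points):
    """Eigenvalues of the covariance of a point cloud (n_points, dim), in descending order."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / (len(points) - 1)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

def power_law_slope(eigvals):
    """Least-squares slope of log(eigenvalue) against log(rank); 0 means isotropic."""
    eigvals = np.asarray(eigvals, dtype=float)
    eigvals = eigvals[eigvals > 0]
    ranks = np.arange(1, eigvals.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(eigvals), 1)
    return slope

# Hypothetical anisotropic cloud: 2000 points in 3-D with unequal spreads.
rng = np.random.default_rng(0)
pts = rng.normal(size=(2000, 3)) * np.array([3.0, 2.0, 1.0])
ev = eigen_spectrum(pts)
```

Computing this slope per layer would give the layer-wise comparison the abstract describes (steepest in middle layers).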
In (Linton, 2024) I present a new illusion (the 'Linton Stereo Illusion') that challenges our understanding of stereo vision. A vision scientist has shared their own analysis of the 'Linton Stereo Illusion' (titled: 'There is no challenge to our understanding of stereo vision: Response to Linton and Kriegeskorte (ECVP 2024 and ArXiv:2408.00770)') claiming that the 'Linton Stereo Illusion' is fully explained by Johnston (1991). I regard Johnston (1991) as one of the most important stereo vision papers in our young (< 200-year-old) field, and so this challenge requires a response. In this paper I explain why Johnston (1991) cannot explain the 'Linton Stereo Illusion'. Indeed, Johnston (1991) makes predictions that are the exact opposite of those observed in the 'Linton Stereo Illusion'. I also highlight a key concern with Johnston (1991)'s account that has so far been overlooked. Johnston (1991)'s account predicts that vergence eye movements will cause massive stereo distortions, leading to a world of unstable stereo perception. But this simply does not reflect our visual experience.
Machine learning techniques have enabled researchers to leverage neuroimaging data to decode speech from brain activity, with some impressive recent successes achieved by applications built using invasive devices. However, research requiring surgical implants has a number of practical limitations. Non-invasive neuroimaging techniques provide an alternative but come with their own set of challenges, the limited scale of individual studies being among them. Without the ability to pool recordings from different non-invasive studies, data on the order of magnitude needed to leverage deep learning techniques to their full potential remains out of reach. In this work, we focus on non-invasive data collected using magnetoencephalography (MEG). We use two different leading speech decoding models to investigate how an adversarial domain adaptation framework improves their ability to generalize across datasets. We successfully improve the performance of both models when training across multiple datasets. To the best of our knowledge, this study is the first application of feature-level, deep learning based harmonization to MEG neuroimaging data. Our analysis also offers further evidence of the impact of demographic features on neuroimaging data, demonstrating that participant age strongly affects how machine learning models solve speech decoding tasks using MEG data. Lastly, in the course of this study we produced a new open-source implementation of one of these models for the benefit of the broader scientific community.
The optimal training of a vision transformer for brain encoding depends on three factors: model size, data size, and computational resources. This study investigates these three pillars, focusing on the effects of data scaling, model scaling, and high-performance computing on brain encoding results. Using VideoGPT to extract efficient spatiotemporal features from videos and training a Ridge model to predict brain activity from these features, we conducted benchmark experiments with varying data sizes (10k, 100k, 1M, 6M) and different GPT-2 model configurations, varying hidden layer dimensions, number of layers, and number of attention heads. We also evaluated the effects of training models with 32-bit vs 16-bit floating point representations. Our results demonstrate that increasing the hidden layer dimensions significantly improves brain encoding performance, as evidenced by higher Pearson correlation coefficients across all subjects. In contrast, the number of attention heads does not have a significant effect on the encoding results. Increasing the number of layers yields some improvement in brain encoding correlations, but the trend is less consistent than that observed with hidden layer dimensions. The data scaling results show that larger training datasets lead to improved brain encoding performance, with the highest Pearson correlation coefficients observed for the largest dataset size (6M). These findings indicate that data scaling has a larger effect than model scaling on brain encoding performance. Finally, in the comparison of 32-bit and 16-bit floating-point precision, training with 16-bit precision yielded the same brain encoding accuracy as 32-bit while training 1.17 times faster, demonstrating its efficiency for high-performance computing tasks.
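The encoding step described above follows a standard pattern. Fit a ridge regression from stimulus features to brain responses, then score each voxel by the Pearson correlation between predicted and held-out responses. The sketch below uses the closed-form ridge solution on synthetic data. The random features stand in for VideoGPT features, and the fixed `alpha` is an assumption (in practice it is usually chosen by cross-validation).

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge: W = (X^T X + alpha I)^-1 X^T Y, with X: (n, p) and Y: (n, v)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ Y)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson r between true and predicted responses, one value per column (voxel)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))

# Hypothetical data: 600 samples of 20-D features driving 5 voxels plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))
W_true = rng.normal(size=(20, 5))
Y = X @ W_true + 0.5 * rng.normal(size=(600, 5))
W_hat = fit_ridge(X[:500], Y[:500], alpha=1.0)         # train split
r = voxelwise_pearson(Y[500:], X[500:] @ W_hat)        # test split, one r per voxel
```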
Identifying auditory attention by comparing auditory stimuli with the corresponding brain responses is known as auditory attention decoding (AAD). Most AAD algorithms rely on the so-called envelope entrainment mechanism, whereby auditory attention is identified by how the envelope of the auditory stream drives variation in the electroencephalography (EEG) signal. However, neural processing can also be decoded from endogenous cognitive responses, in this case neural responses evoked by attention to specific words in a speech stream. This approach is largely unexplored in the field of AAD and leads to a single-word auditory attention decoding problem, in which an EEG epoch time-locked to a specific word is labeled as attended or unattended. This paper presents a deep learning approach, based on EEGNet, to address this challenge. We conducted a subject-independent evaluation on an event-based AAD dataset with three different paradigms: word category oddball, word category with competing speakers, and competing speech streams with targets. The results demonstrate that the adapted model is capable of exploiting cognition-related spatiotemporal EEG features, achieving at least 58% accuracy for unseen subjects on the most realistic competing paradigm. To our knowledge, this is the first study to address this problem.
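In the single-word formulation, each EEG epoch is time-locked to a word onset and then labeled attended or unattended. The sketch below cuts such epochs from continuous EEG and drops words whose window would run past the recording. It returns the indices it kept so that labels stay aligned. The -0.2 to 0.8 s window and the function name are assumptions. The abstract does not state the epoch length used.

```python
import numpy as np

def word_epochs(eeg, fs, onsets_s, tmin_s=-0.2, tmax_s=0.8):
    """Cut word-locked epochs from continuous EEG.

    eeg: (n_channels, n_samples) array; onsets_s: word onset times in seconds.
    Returns (epochs, kept) where epochs is (n_kept, n_channels, n_times) and
    kept lists the indices of onsets whose full window fits inside the recording.
    """
    start_off = int(round(tmin_s * fs))
    stop_off = int(round(tmax_s * fs))
    epochs, kept = [], []
    for i, onset in enumerate(onsets_s):
        center = int(round(onset * fs))
        a, b = center + start_off, center + stop_off
        if a >= 0 and b <= eeg.shape[1]:
            epochs.append(eeg[:, a:b])
            kept.append(i)
    if not epochs:
        return np.empty((0, eeg.shape[0], stop_off - start_off)), kept
    return np.stack(epochs), kept

# Hypothetical example: 2 channels, 10 s at 100 Hz; the last word is too close to the end.
eeg = np.tile(np.arange(1000.0), (2, 1))
epochs, kept = word_epochs(eeg, fs=100, onsets_s=[1.0, 5.0, 9.9])
labels = np.array([1, 0, 1])[kept]  # hypothetical attended (1) / unattended (0) labels
```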
Introduction: Schizophrenia is a severe mental disorder, and early diagnosis is key to improving outcomes. Its complexity makes predicting onset and progression challenging. EEG has emerged as a valuable tool for studying schizophrenia, with machine learning (ML) increasingly applied for diagnosis. This paper assesses the accuracy of ML models for predicting schizophrenia and examines the impact of stress during EEG recording on model performance. We integrate acute stress prediction into the analysis, showing that overlapping conditions such as stress during recording can negatively affect model accuracy. Methods: Four XGBoost models were built: one for stress prediction, two to classify schizophrenia (one at rest and one during a task), and one to predict schizophrenia across both conditions. Explainable AI (XAI) techniques were applied to analyze the results. Experiments tested the generalization of the schizophrenia models using the healthy controls from their own datasets and independent health-screened controls. The stress model identified high-stress subjects, who were excluded from further analysis. A novel method was used to adjust EEG frequency band power to remove stress artifacts, improving predictive model performance. Results: Our results show that acute stress varies across EEG sessions, affecting model performance and accuracy. Generalization improved once these varying stress levels were considered and compensated for during model training. Our findings highlight the importance of thorough health screening and management of the patient's condition during the process. Stress induced during or by the EEG recording can adversely affect model generalization. This may require further preprocessing of the data, treating stress as an additional physiological artifact. Our proposed approach to compensating for stress artifacts in EEG training data showed a significant improvement in predictive performance.
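The stress adjustment above operates on EEG frequency band power. The sketch below shows one conventional way to compute absolute band powers from a single channel: integrate a Hann-windowed one-sided periodogram over each band. The band edges in `BANDS` are common conventions, not values from the paper. The paper's stress-compensation method itself is not described in the abstract and is not reproduced here.

```python
import numpy as np

# Conventional EEG band edges in Hz (assumed; definitions vary between studies).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_powers(signal, fs, bands=BANDS):
    """Absolute band power of a 1-D signal from a Hann-windowed one-sided periodogram."""
    n = signal.size
    window = np.hanning(n)
    spectrum = np.fft.rfft((signal - signal.mean()) * window)
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(window ** 2))  # density normalization
    if n % 2 == 0:
        psd[1:-1] *= 2  # double all bins except DC and Nyquist
    else:
        psd[1:] *= 2    # odd length: no Nyquist bin
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    df = freqs[1] - freqs[0]
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df for name, (lo, hi) in bands.items()}

# Hypothetical example: 10 s of a unit-amplitude 10 Hz sine at 250 Hz (power = 0.5).
fs = 250
t = np.arange(10 * fs) / fs
powers = band_powers(np.sin(2 * np.pi * 10 * t), fs)
```

Band powers like these, computed per channel, are the kind of features an XGBoost classifier could take as input.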