2025-05-21 | Total: 9
We introduce Perceptual-Initialization (PI), a paradigm shift in visual representation learning that incorporates human perceptual structure during the initialization phase rather than as a downstream fine-tuning step. By integrating human-derived triplet embeddings from the NIGHTS dataset to initialize a CLIP vision encoder, followed by self-supervised learning on YFCC15M, our approach demonstrates significant zero-shot performance improvements, without any task-specific fine-tuning, across 29 zero-shot classification and 2 retrieval benchmarks. On ImageNet-1K, zero-shot gains emerge after approximately 15 epochs of pretraining. Benefits are observed across datasets of various scales, with improvements manifesting at different stages of the pretraining process depending on dataset characteristics. Our approach consistently enhances zero-shot top-1 accuracy, top-5 accuracy, and retrieval recall (e.g., R@1, R@5) across these diverse evaluation tasks, without requiring any adaptation to target domains. These findings challenge the conventional wisdom of using human-perceptual data primarily for fine-tuning and demonstrate that embedding human perceptual structure during early representation learning yields more capable, vision-language-aligned systems that generalize immediately to unseen tasks. Our work shows that "beginning with you", starting with human perception, provides a stronger foundation for general-purpose vision-language intelligence.
Recurrent neural networks (RNNs) trained on neuroscience-inspired tasks offer powerful models of brain computation. However, typical training paradigms rely on open-loop, supervised settings, whereas real-world learning unfolds in closed-loop environments. Here, we develop a mathematical theory describing the learning dynamics of linear RNNs trained in closed-loop contexts. We first demonstrate that two otherwise identical RNNs, trained in either closed- or open-loop modes, follow markedly different learning trajectories. To probe this divergence, we analytically characterize the closed-loop case, revealing distinct stages aligned with the evolution of the training loss. Specifically, we show that the learning dynamics of closed-loop RNNs, in contrast to open-loop ones, are governed by an interplay between two competing objectives: short-term policy improvement and long-term stability of the agent-environment interaction. Finally, we apply our framework to a realistic motor control task, highlighting its broader applicability. Taken together, our results underscore the importance of modeling closed-loop dynamics in a biologically plausible setting.
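To illustrate why stability of the agent-environment interaction becomes an objective in closed loop, the following is a minimal sketch, not the paper's theory. It assumes a hypothetical linear environment `x[t+1] = A x[t] + B_env u[t]` driven by the RNN output `u[t] = W_out h[t]`, with RNN state `h[t+1] = W h[t] + W_in x[t]`. All names (`A`, `B_env`, `W`, `W_in`, `W_out`) are illustrative assumptions. In closed loop the joint map of `(x, h)` depends on the RNN weights, so its spectral radius changes as the weights are learned.

```python
import numpy as np

def closed_loop_matrix(A, B_env, W, W_in, W_out):
    """One-step linear map of the stacked state (x, h).

    Assumed (hypothetical) dynamics:
        x[t+1] = A x[t] + B_env @ W_out @ h[t]
        h[t+1] = W_in x[t] + W h[t]
    """
    top = np.hstack([A, B_env @ W_out])
    bottom = np.hstack([W_in, W])
    return np.vstack([top, bottom])

def spectral_radius(M):
    """Largest eigenvalue magnitude; values below 1 mean the loop is stable."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

# A scalar, slightly unstable environment (A = 1.1) and a scalar RNN.
A = np.array([[1.1]])
B_env = np.array([[1.0]])
W = np.array([[0.0]])
W_in = np.array([[1.0]])

# With a zero readout the RNN does not act on the environment.
no_feedback = closed_loop_matrix(A, B_env, W, W_in, np.array([[0.0]]))
# A negative readout feeds back and can stabilise the joint system.
with_feedback = closed_loop_matrix(A, B_env, W, W_in, np.array([[-0.5]]))

print(spectral_radius(no_feedback), spectral_radius(with_feedback))
```

In this toy case the spectral radius drops from 1.1 without feedback to about 0.71 with feedback, so the same readout weights that shape the policy also determine whether the interaction is stable.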
Large language models (LLMs) can sometimes report the strategies they actually use to solve tasks, but they can also fail to do so. This suggests some degree of metacognition -- the capacity to monitor one's own cognitive processes for subsequent reporting and self-control. Metacognitive abilities enhance AI capabilities but raise safety concerns, as models might obscure their internal processes to evade neural-activation-based oversight mechanisms designed to detect harmful behaviors. Given society's increased reliance on these models, it is critical that we understand the limits of their metacognitive abilities, particularly their ability to monitor their internal activations. To address this, we introduce a neuroscience-inspired neurofeedback paradigm designed to quantify the ability of LLMs to explicitly report and control their activation patterns. By presenting models with sentence-label pairs where labels correspond to sentence-elicited internal activations along specific directions in the neural representation space, we demonstrate that LLMs can learn to report and control these activations. The performance varies with several factors: the number of example pairs provided, the semantic interpretability of the target neural direction, and the variance explained by that direction. These results reveal a "metacognitive space" with dimensionality much lower than the model's neural space, suggesting LLMs can monitor only a subset of their neural mechanisms. Our findings provide empirical evidence quantifying metacognitive capabilities in LLMs, with significant implications for AI safety.
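As a rough sketch of how sentence-label pairs of this kind could be built, the code below projects per-sentence activation vectors onto a target direction and binarises the projection. This is an illustration under stated assumptions, not the authors' pipeline: the median threshold, the use of the first principal component as the direction, and the function names `direction_labels` and `first_pc` are all hypothetical.

```python
import numpy as np

def direction_labels(activations, direction):
    """Project activations (n_sentences, d) onto a direction and binarise.

    Assumption: labels are 1 when the projection exceeds the median.
    """
    unit = direction / np.linalg.norm(direction)
    scores = activations @ unit
    labels = (scores > np.median(scores)).astype(int)
    return scores, labels

def first_pc(activations):
    """First principal direction and the fraction of variance it explains."""
    centered = activations - activations.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values[0] ** 2 / np.sum(singular_values ** 2)
    return vt[0], float(explained)

# Toy activations standing in for one hidden-state vector per sentence.
rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 4))
direction, explained = first_pc(acts)
scores, labels = direction_labels(acts, direction)
```

The explained-variance value returned by `first_pc` corresponds to one of the factors the abstract names as affecting performance: the variance explained by the target direction.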
Biological brains learn continually from a stream of unlabeled data, while integrating specialized information from sparsely labeled examples without compromising their ability to generalize. Meanwhile, machine learning methods are susceptible to catastrophic forgetting in this natural learning setting, as supervised specialist fine-tuning degrades performance on the original task. We introduce task-modulated contrastive learning (TMCL), which takes inspiration from the biophysical machinery in the neocortex, using predictive coding principles to integrate top-down information continually and without supervision. We follow the idea that these principles build a view-invariant representation space, and that this can be implemented using a contrastive loss. Then, whenever labeled samples of a new class occur, new affine modulations are learned that improve separation of the new class from all others, without affecting feedforward weights. By co-opting the view-invariance learning mechanism, we then train feedforward weights to match the unmodulated representation of a data sample to its modulated counterparts. This introduces modulation invariance into the representation space, and, by also using past modulations, stabilizes it. Our experiments show improvements in both class-incremental and transfer learning over state-of-the-art unsupervised approaches, as well as over comparable supervised approaches, using as few as 1% of available labels. Taken together, our work suggests that top-down modulations play a crucial role in balancing stability and plasticity.
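The two ingredients described above, an affine modulation of a representation and a contrastive loss that pulls the unmodulated representation toward its modulated counterpart, can be sketched as follows. This is a minimal illustration that assumes an InfoNCE-style loss and a per-feature gain and bias; the function names `modulate` and `info_nce` and the temperature value are assumptions, not details taken from the paper.

```python
import numpy as np

def modulate(z, gain, bias):
    """Per-feature affine modulation of representations z (n, d)."""
    return gain * z + bias

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `anchors` should match row i of `positives`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy representations standing in for feedforward outputs.
rng = np.random.default_rng(0)
z = rng.normal(size=(16, 8))
gain = 1.0 + 0.1 * rng.normal(size=8)   # hypothetical learned modulation
bias = 0.1 * rng.normal(size=8)

# Unmodulated representations are the anchors, modulated ones the positives.
loss = info_nce(z, modulate(z, gain, bias))
```

In a training loop of this shape, only the feedforward network producing `z` would receive gradients from the loss, while `gain` and `bias` are learned separately from the labeled samples.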
Quantifying similarity between population spike patterns is essential for understanding how neural dynamics encode information. Traditional approaches, which combine kernel smoothing, PCA, and CCA, have limitations: smoothing kernel bandwidths are often empirically chosen, CCA maximizes alignment between patterns without considering the variance explained within patterns, and baseline correlations from stochastic spiking are rarely corrected. We introduce ReBaCCA-ss (Relevance-Balanced Continuum Correlation Analysis with smoothing and surrogating), a novel framework that addresses these challenges through three innovations: (1) balancing alignment and variance explanation via continuum canonical correlation; (2) correcting for noise using surrogate spike trains; and (3) selecting the optimal kernel bandwidth by maximizing the difference between true and surrogate correlations. ReBaCCA-ss is validated on both simulated data and hippocampal recordings from rats performing a Delayed Nonmatch-to-Sample task. It reliably identifies spatio-temporal similarities between spike patterns. Combined with Multidimensional Scaling, ReBaCCA-ss reveals structured neural representations across trials, events, sessions, and animals, offering a powerful tool for neural population analysis.
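The bandwidth-selection idea in item (3) can be sketched in a few lines. The code below is a simplified stand-in, not ReBaCCA-ss itself: it replaces continuum canonical correlation with a plain Pearson correlation between smoothed patterns, and it builds surrogates by shuffling time bins within each neuron. Both choices, and all function names, are assumptions made for illustration.

```python
import numpy as np

def smooth(spikes, bandwidth):
    """Gaussian-smooth binned spikes (n_neurons, n_bins); bandwidth in bins."""
    half = int(np.ceil(4 * bandwidth))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / bandwidth) ** 2)
    kernel /= kernel.sum()
    n_bins = spikes.shape[1]
    # Full convolution, then crop so the output stays aligned with the input.
    return np.array([np.convolve(row, kernel, mode="full")[half:half + n_bins]
                     for row in spikes])

def pattern_correlation(a, b):
    """Pearson correlation between two flattened patterns (stand-in metric)."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def shuffle_surrogate(spikes, rng):
    """Shuffle bins within each neuron: keeps spike counts, destroys timing."""
    return np.array([rng.permutation(row) for row in spikes])

def select_bandwidth(x, y, bandwidths, n_surrogates=20, seed=0):
    """Pick the bandwidth maximising true minus mean surrogate correlation."""
    rng = np.random.default_rng(seed)
    best = None
    for bw in bandwidths:
        true_r = pattern_correlation(smooth(x, bw), smooth(y, bw))
        surrogate_r = np.mean([
            pattern_correlation(smooth(shuffle_surrogate(x, rng), bw),
                                smooth(shuffle_surrogate(y, rng), bw))
            for _ in range(n_surrogates)
        ])
        gap = true_r - surrogate_r
        if best is None or gap > best[1]:
            best = (bw, float(gap))
    return best  # (bandwidth, corrected correlation)
```

Subtracting the surrogate correlation removes the baseline similarity that smoothed random spiking produces on its own, which otherwise grows with the kernel width.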
The present paper aims to develop a mathematical model of the visual perception of spatial information. Investigating how the spatial information of objects in physical space is encoded and decoded by neural processes in the brain is a challenging problem in theoretical neuroscience. In the past, researchers conjectured the existence of an abstract visual space where spatial information processing takes place. Based on several experimental findings, it was conjectured that this psychological manifold is non-Euclidean. However, the neural origin of the non-Euclidean character of the visual space was not made explicit in these models. In the present paper, we show that the neural mechanism, and specifically the Fisher information contained in the neural population code, plays the role of an energy-momentum tensor, creating a space-dependent metric tensor and thus a curved space described by a curvature tensor. The theoretical prediction of information geometry regarding the emergence of curved manifolds in the presence of Fisher information is verified in the present work in the domain of neural processing of spatial information at mid-level vision. Several well-known phenomena of visual optics are analyzed using the notion of a non-Euclidean visual space, the geodesics of that space, and the Fisher-Rao metric as the suitable psychometric distance.
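To make the link between a population code and a stimulus-dependent metric concrete, here is a one-dimensional sketch. It is not the paper's model: it assumes independent Poisson neurons with Gaussian tuning curves, for which the Fisher information is the standard expression J(s) = sum_i f_i'(s)^2 / f_i(s), and it measures distance between two stimuli as the integral of sqrt(J(s)). The tuning parameters and function names are hypothetical.

```python
import numpy as np

def fisher_information(s, centers, sigma=1.0, r_max=10.0):
    """Fisher information J(s) for independent Poisson neurons.

    Assumed tuning: f_i(s) = r_max * exp(-(s - c_i)^2 / (2 sigma^2)),
    so f_i'(s)^2 / f_i(s) = f_i(s) * (s - c_i)^2 / sigma^4.
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    diff = s[:, None] - np.asarray(centers, dtype=float)[None, :]
    rates = r_max * np.exp(-0.5 * (diff / sigma) ** 2)
    return np.sum(rates * diff ** 2 / sigma ** 4, axis=1)

def fisher_rao_length(s_start, s_end, centers, n_steps=1001, **tuning):
    """Length of the segment [s_start, s_end] under the metric sqrt(J(s))."""
    grid = np.linspace(s_start, s_end, n_steps)
    speed = np.sqrt(fisher_information(grid, centers, **tuning))
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(grid)))

# The same physical interval is "longer" where more neurons tile it.
dense = fisher_rao_length(0.0, 1.0, np.linspace(-5, 5, 41))
sparse = fisher_rao_length(0.0, 1.0, np.linspace(-5, 5, 11))
```

Under these assumptions, perceived distance depends on how densely the population covers a region of stimulus space, which is the one-dimensional analogue of a space-dependent metric.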
The perceptron has served as a prototypical neuronal learning machine in the physics community interested in neural networks and artificial intelligence, which included Gérard Toulouse as one of its prominent figures. It has also been used as a model of Purkinje cells of the cerebellum, a brain structure involved in motor learning, in the early influential theories of David Marr and James Albus. We review these theories and more recent developments in the field, and highlight questions of current interest.
Conventionally, the nerve impulse is assumed to be an electrical process, based on the observation that electrical stimuli produce an action potential as defined by Hodgkin and Huxley (1952) (HH). Consequently, investigations into the computation of nerve impulses have almost universally been directed at electrically observed phenomena. However, models of computation are fundamentally flawed and assume that an undiscovered timing system exists within the nervous system. In our view, it is synchronisation of the action potential pulse (APPulse) that effects computation. The APPulse, a soliton pulse, is a novel purveyor of computation and is a quantum mechanical pulse: i.e., it is a non-Turing synchronised computational event. Furthermore, the APPulse computational interactions change frequencies measured in microseconds, rather than milliseconds, producing effective, efficient computation. The HH action potential is a necessary component for entropy equilibrium, providing energy to open ion channels, but it is too slow to be functionally computational in a neural network. Here, we demonstrate that only quantum non-electrical soliton pulses converging to points of computation are the main computational structure, with synaptic transmission occurring at slower millisecond speeds. Thus, the APPulse accompanying the action potential is the purveyor of computation: a novel computational mechanism that is incompatible with Turing timed computation and artificial intelligence (AI).
A common view of sensory processing is as probabilistic inference of latent causes from receptor activations. Standard approaches often assume these causes are a priori independent, yet real-world generative factors are typically correlated. Representing such structured priors in neural systems poses architectural challenges, particularly when direct interactions between units representing latent causes are biologically implausible or computationally expensive. Inspired by the architecture of the olfactory bulb, we propose a novel circuit motif that enables inference with correlated priors without requiring direct interactions among latent cause units. The key insight lies in using sister cells: neurons receiving shared receptor input but connected differently to local interneurons. The required interactions among latent units are implemented indirectly through their connections to the sister cells, such that correlated connectivity implies anti-correlation in the prior and vice versa. We use geometric arguments to construct connectivity that implements a given prior and to bound the number of causes for which such priors can be constructed. Using simulations, we demonstrate the efficacy of such priors for inference in noisy environments and compare the inference dynamics to those experimentally observed. Finally, we show how, under certain assumptions on latent representations, the prior used can be inferred from sister cell activations. While biologically grounded in the olfactory system, our mechanism generalises to other natural and artificial sensory systems and may inform the design of architectures for efficient inference under correlated latent structure.