The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

Authors: Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas

Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms? Several recent studies, on tasks ranging from group operations to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex: small changes to model hyperparameters and initializations can induce discovery of qualitatively different algorithms from a fixed training set, and even learning of multiple different solutions in parallel. In modular addition, we specifically show that models learn a known *Clock* algorithm, a previously undescribed, less intuitive, but comprehensible procedure we term the *Pizza* algorithm, and a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for mechanistically characterizing the behavior of neural networks across the algorithmic phase space.
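For readers unfamiliar with the task, the following is a minimal sketch of a clock-style procedure for modular addition, written as ordinary Python rather than as network weights. It assumes the commonly described formulation in which each input is placed on a circle, the two angles are added through trigonometric identities, and the answer is the candidate whose angle best matches the sum. The function name `clock_add`, the single frequency `k`, and the modulus used in the usage line are illustrative assumptions, not details taken from the abstract; a trained model would typically combine several frequencies and approximate these steps with learned parameters.

```python
import math


def clock_add(a: int, b: int, p: int, k: int = 1) -> int:
    """Return (a + b) mod p by composing rotations on a circle.

    Illustrative sketch only: k is a hypothetical frequency and must be
    coprime with p so that the best-scoring candidate is unique.
    """
    w = 2 * math.pi * k / p

    # Embed each input as a point on the unit circle.
    cos_a, sin_a = math.cos(w * a), math.sin(w * a)
    cos_b, sin_b = math.cos(w * b), math.sin(w * b)

    # Add the two angles with the angle-sum identities (products of embeddings).
    cos_sum = cos_a * cos_b - sin_a * sin_b
    sin_sum = sin_a * cos_b + cos_a * sin_b

    # Score each candidate c by cos(w * (a + b - c)); the score peaks at
    # c = (a + b) mod p.
    scores = [
        cos_sum * math.cos(w * c) + sin_sum * math.sin(w * c)
        for c in range(p)
    ]
    return max(range(p), key=scores.__getitem__)


if __name__ == "__main__":
    p = 59  # an arbitrary small prime chosen for illustration
    print(clock_add(40, 30, p), (40 + 30) % p)
```

The sketch is only meant to make the phrase "Clock algorithm" concrete. It does not attempt to reproduce the Pizza algorithm, which the abstract names but does not specify.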

Subject: NeurIPS.2023 - Oral

S5wmbQc1We@OpenReview
