Scaled Signed Averaging Improves In-Context and Early Learning Benchmark Performance in Small Transformers

arXiv: 2508.14685

Authors: Omar Naim, Swarnadeep Bhar, Jérôme Bolte, Nicholas Asher

While large language models' abilities for in-context learning (ICL) have drawn much attention, we examine some of their limitations on semantic tasks involving quantifiers like "all" and "some", as well as on tasks with linear functions. We identify Softmax, the scoring function in the attention mechanism, as a contributing factor to these limitations. We propose scaled signed averaging (SSA), a novel alternative to Softmax, to mitigate these problems. We show that SSA significantly improves performance on our ICL tasks. In addition, SSA outperforms transformer models with Softmax on several early learning NLP benchmarks and linguistic probing tasks in zero- and few-shot settings.
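To make concrete where Softmax sits in attention, here is a minimal sketch of standard scaled dot-product attention for a single query, written in plain Python. The abstract does not give the SSA formula, so SSA itself is not implemented here. The `normalize` parameter is a hypothetical hook, introduced only for illustration, marking the step that an alternative scoring function would replace; the function names `softmax` and `attend` are likewise assumptions of this sketch, not the authors' code.

```python
import math
from typing import Callable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Standard Softmax: exponentiate, then normalize so the weights sum to 1.

    Scores are shifted by their maximum for numerical stability; this does
    not change the result.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    normalize: Callable[[Sequence[float]], List[float]] = softmax,
) -> List[float]:
    """Single-query scaled dot-product attention.

    `normalize` is a hypothetical hook (not from the paper): it is the point
    where Softmax is applied, and where a replacement would be swapped in.
    """
    d = len(query)
    # Raw attention scores: scaled dot product of the query with each key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Turn scores into mixing weights (Softmax by default).
    weights = normalize(scores)
    # Weighted combination of the value vectors.
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```

With Softmax, every weight is strictly positive and the weights always sum to 1, so the output is a convex combination of the value vectors. Any alternative passed as `normalize` only needs to map a list of scores to a list of weights of the same length.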

Subject: Computation and Language

Published: 2025-08-20 13:01:34 UTC
