Total: 1
While Large Language models' abilities for in-context learning (ICL) have drawn much attention, we examine some of its limitations on semantic tasks involving quantifiers like "all" and "some", as well as on tasks with linear functions. We identify Softmax, the scoring function in attention mechanism, as a contributing factor to these limitations. We propose scaled signed averaging (SSA), a novel alternative to Softmax to mitigate these problems. We show that SSA significantly improves performance on our ICL tasks. In addition, SSA outperforms transformer models with Softmax on several early learning NLP benchmarks and linguistic probing tasks on zero and few-shot settings.