E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Authors: Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang

Processing long contexts is increasingly important for Large Language Models (LLMs) in tasks such as multi-turn dialogue, code generation, and document summarization. This paper addresses the challenge of simultaneously achieving high long-context performance, low computational complexity, and compatibility with pretrained models, three goals collectively termed the "impossible triangle". We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this trade-off. E2LLM divides a long context into chunks, compresses each chunk into soft prompts using a pretrained text encoder, and aligns these representations with a decoder-only LLM via an adapter. To enhance the LLM's reasoning over these soft prompts, we employ two training objectives: encoder output reconstruction and long-context instruction fine-tuning. Extensive experiments show that E2LLM not only outperforms eight state-of-the-art (SOTA) methods in both effectiveness and efficiency on document summarization and question answering, but also achieves the best performance on LongBench v2 among models of comparable size.
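The chunk, compress, and align data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_encoder` (mean-pooled pseudo-embeddings) and `toy_adapter` (a random linear map) are hypothetical stand-ins for the pretrained text encoder and the trained adapter, and the chunk size, the dimensions, and the choice of one soft-prompt vector per chunk are all assumptions made for illustration.

```python
import random


def chunk_tokens(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_encoder(chunk, dim):
    """Hypothetical stand-in for a pretrained text encoder.

    Mean-pools a deterministic pseudo-embedding for each token id.
    """
    pooled = [0.0] * dim
    for tok in chunk:
        rng = random.Random(tok)  # same token id always yields the same vector
        for j in range(dim):
            pooled[j] += rng.uniform(-1.0, 1.0)
    return [v / len(chunk) for v in pooled]


def toy_adapter(vec, weight):
    """Hypothetical stand-in for the adapter: a linear map from the
    encoder dimension to the LLM embedding dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def compress_context(tokens, chunk_size=4, enc_dim=8, llm_dim=16, seed=0):
    """Return one soft-prompt vector per chunk (an assumption of this sketch)."""
    rng = random.Random(seed)
    # Randomly initialised weights; a real adapter would be trained.
    weight = [[rng.gauss(0.0, 0.1) for _ in range(enc_dim)] for _ in range(llm_dim)]
    chunks = chunk_tokens(tokens, chunk_size)
    return [toy_adapter(toy_encoder(c, enc_dim), weight) for c in chunks]


# Ten token ids with chunk_size=4 become three chunks, so the decoder
# would receive three soft-prompt vectors instead of ten token embeddings.
soft_prompts = compress_context(list(range(10)), chunk_size=4)
print(len(soft_prompts), len(soft_prompts[0]))  # prints: 3 16
```

In this sketch the sequence the decoder attends over shrinks from the number of tokens to the number of chunks, which is the source of the efficiency gain the abstract claims; how well the compressed vectors preserve information depends on the training objectives, which the sketch does not model.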

Subject: Computation and Language

Published: 2024-09-10 17:44:35 UTC

2409.06679
