This work investigates sentence-level models, i.e., models that operate on sentence representations rather than individual tokens, to study how sentence representations from various encoders influence downstream task performance, and which syntactic, semantic, and discourse-level properties are essential for strong performance. Our experiments encompass encoders with diverse training regimes and pretraining domains, as well as various pooling strategies, applied to multi-sentence tasks requiring coarse-to-fine-grained reasoning, including sentence ordering, sentiment classification, and natural language inference. We find that "less mature" representations (e.g., mean-pooled representations from BERT's first or last layer, or representations from encoders with limited fine-tuning) exhibit greater generalizability and adaptability to downstream tasks than representations from extensively fine-tuned models (e.g., SBERT or SimCSE). These findings are consistent across different pretraining seed initializations for BERT. Our probing analysis reveals that syntactic and discourse-level properties are stronger indicators of downstream performance than MTEB scores or decodability. Furthermore, sentence-level models are often more data- and time-efficient than token-level models, underscoring their potential for future research.
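As a minimal sketch of the layer-wise mean pooling referred to above (not the authors' exact pipeline), the snippet below extracts a sentence representation by mean-pooling token embeddings from BERT's first or last hidden layer using the Hugging Face transformers API; the model name and helper function are illustrative assumptions.

```python
# Sketch: mean-pooled sentence representations from a chosen BERT layer.
# Model name and function are illustrative, not the paper's exact setup.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def mean_pooled_embedding(sentence: str, layer: int = -1) -> torch.Tensor:
    """Mean-pool token states from one hidden layer (1 = first encoder layer, -1 = last)."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    hidden = outputs.hidden_states[layer]           # (1, seq_len, hidden_dim)
    mask = inputs["attention_mask"].unsqueeze(-1)   # (1, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)             # ignore padding positions
    return summed / mask.sum(dim=1)                 # (1, hidden_dim)

first_layer_emb = mean_pooled_embedding("The cat sat on the mat.", layer=1)
last_layer_emb = mean_pooled_embedding("The cat sat on the mat.", layer=-1)
```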