Let's CONFER: A Dataset for Evaluating Natural Language Inference Models on CONditional InFERence and Presupposition

Authors: Tara Azin, Daniel Dumitrescu, Diana Inkpen, Raj Singh

Natural Language Inference (NLI) is the task of determining whether a sentence pair represents entailment, contradiction, or a neutral relationship. While NLI models perform well on many inference tasks, their ability to handle fine-grained pragmatic inferences, particularly presupposition in conditionals, remains underexplored. In this study, we introduce CONFER, a novel dataset designed to evaluate how NLI models process inference in conditional sentences. We assess the performance of four NLI models, including two pre-trained models, to examine their generalization to conditional reasoning. Additionally, we evaluate Large Language Models (LLMs), including GPT-4o, LLaMA, Gemma, and DeepSeek-R1, in zero-shot and few-shot prompting settings to analyze their ability to infer presuppositions with and without prior context. Our findings indicate that NLI models struggle with presuppositional reasoning in conditionals, and fine-tuning on existing NLI datasets does not necessarily improve their performance.
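The zero-shot and few-shot settings mentioned above differ only in whether labeled demonstrations precede the query. The following is a minimal sketch of that difference, not the authors' prompt format: the instruction wording, the helper names (`NLIExample`, `build_prompt`, `parse_label`), and the example sentences are all hypothetical and are not drawn from CONFER.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: Optional[str] = None  # gold label, needed only for demonstrations


def format_example(ex: NLIExample, with_label: bool) -> str:
    """Render one premise/hypothesis pair, optionally followed by its label."""
    text = f"Premise: {ex.premise}\nHypothesis: {ex.hypothesis}\nLabel:"
    if with_label:
        text += f" {ex.label}"
    return text


def build_prompt(query: NLIExample, demonstrations: Sequence[NLIExample] = ()) -> str:
    """Zero-shot when `demonstrations` is empty, few-shot otherwise."""
    # Hypothetical instruction text, not the wording used in the paper.
    instruction = (
        "Decide whether the premise entails, contradicts, or is neutral "
        "toward the hypothesis. Answer with one word."
    )
    parts = [instruction]
    parts += [format_example(d, with_label=True) for d in demonstrations]
    parts.append(format_example(query, with_label=False))
    return "\n\n".join(parts)


def parse_label(response: str) -> Optional[str]:
    """Return the label that appears earliest in a model response, if any."""
    lowered = response.lower()
    hits = [(lowered.find(name), name) for name in LABELS if name in lowered]
    return min(hits)[1] if hits else None


# Illustrative items only (assumed, not taken from the dataset).
demos = [
    NLIExample("If it rains, the match will be cancelled.",
               "It is raining.", "neutral"),
    NLIExample("The match was cancelled.",
               "The match took place as planned.", "contradiction"),
]
# A conditional whose antecedent carries a possessive presupposition.
query = NLIExample("If Sam's sister visits, we will cook dinner.",
                   "Sam has a sister.")

zero_shot_prompt = build_prompt(query)
few_shot_prompt = build_prompt(query, demos)
```

Either prompt string would then be sent to a model, and `parse_label` applied to the reply to map it onto the three NLI classes.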

Subject: Computation and Language

Published: 2025-06-06 14:42:20 UTC

2506.06133
