Achieving appropriate human reliance on Artificial Intelligence (AI) systems remains a central challenge in Human-Computer Interaction. Confidence scores—indicators of an AI system’s certainty in its recommendations—have been proposed as a means to help users calibrate their trust in and reliance on AI Decision Support Systems (DSS). However, limited research has examined how well-calibrated versus miscalibrated confidence scores affect human decision-making. We report a study of the effects of confidence calibration on user reliance, decision accuracy, and perceived utility of an AI DSS. In a within-subjects experiment in which 184 participants solved logic puzzles, well-calibrated confidence scores significantly improved decision accuracy (+20%, 95% CI: [0.18, 0.23]), whereas miscalibrated scores yielded minimal accuracy gains (+2%, 95% CI: [-0.00, 0.04]) and increased vulnerability to automation bias and conservatism bias. Participants were more likely to accept AI recommendations expressed with high confidence, even when those recommendations were incorrect. Conversely, miscalibrated and low-confidence recommendations amplified conservatism bias, leading users to reject even accurate AI suggestions. Perceived utility of the AI system was higher when confidence levels were high (p < 0.001) and when confidence was well-calibrated (p = 0.002). These findings underscore the importance of designing AI systems with properly calibrated confidence cues to improve human-AI collaboration and mitigate reliance-related biases.