Improving Occupational ISCO Classification of Multilingual Swiss Job Postings with LLM-Refined Training Data

Authors: Ann-Sophie Gnehm, Simon Clematide

Classifying occupations in multilingual job postings is challenging due to noisy labels, language variation, and domain-specific terminology. We present a method that refines silver-standard ISCO labels by consolidating them with predictions from pre-fine-tuned models, using large language model (LLM) evaluations to resolve discrepancies. The refined labels are used in Multiple Negatives Ranking (MNR) training for SentenceBERT-based classification. This approach substantially improves performance, raising Top-1 accuracy on silver data from 37.2% to 58.3% and reaching up to 80% precision on held-out data, a gain of more than 30 points validated by both GPT and human raters. The model benefits from cross-lingual transfer, with particularly strong gains in French and Italian. These results demonstrate that LLM-guided label refinement can substantially improve multilingual occupation classification in fine-grained taxonomies such as CH-ISCO with 670 classes.
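For readers unfamiliar with the Multiple Negatives Ranking objective named in the abstract, the following is a minimal pure-Python sketch of its usual formulation: each posting embedding is scored against every label embedding in the batch, and the other items' labels serve as in-batch negatives. The function names, the toy two-dimensional embeddings, and the `scale=20.0` value are illustrative assumptions, not details taken from the paper, whose training setup is not described in this listing.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every positives[j] with j != i acts as an in-batch negative.
    `scale` is an assumed similarity temperature, not a value from the paper."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [scale * cosine(anchors[i], positives[j]) for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / n

# Toy batch: hypothetical posting embeddings and occupation-label embeddings.
postings = [[1.0, 0.0], [0.0, 1.0]]
labels_aligned = [[1.0, 0.0], [0.0, 1.0]]  # each posting matches its own label
labels_swapped = [[0.0, 1.0], [1.0, 0.0]]  # each posting matches the other label

loss_aligned = mnr_loss(postings, labels_aligned)
loss_swapped = mnr_loss(postings, labels_swapped)
```

In this toy batch the aligned pairing yields a loss near zero, while the swapped pairing is penalized heavily. That is the signal that pulls a posting's embedding toward its assigned label, and it is also why label noise in the training pairs matters for this objective.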

Subject: ACL.2025 - Findings

2025.findings-acl.1124@ACL
