WbfbT2BH6F@OpenReview

#1 TabNAT: A Continuous-Discrete Joint Generative Framework for Tabular Data

Authors: Hengrui Zhang, Liancheng Fang, Qitian Wu, Philip Yu

While autoregressive models dominate natural language generation, their application to tabular data remains limited due to two challenges: 1) tabular data contains heterogeneous types, whereas autoregressive next-token (distribution) prediction is designed for discrete data, and 2) tabular data is column permutation-invariant, requiring flexible generation orders. Traditional autoregressive models, with their fixed generation order, struggle with tasks like missing data imputation, where the target and conditioning columns vary. To address these issues, we propose Diffusion-nested Non-autoregressive Transformer (TabNAT), a hybrid model combining diffusion processes and masked generative modeling. For continuous columns, TabNAT uses a diffusion model to parameterize their conditional distributions, while for discrete columns, it employs next-token prediction with KL divergence minimization. A masked Transformer with bi-directional attention enables order-agnostic generation, allowing it to learn the distribution of target columns conditioned on arbitrary observed columns. Extensive experiments on ten datasets with diverse properties demonstrate TabNAT's superiority in both unconditional tabular data generation and conditional missing data imputation tasks.
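The order-agnostic conditioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sample_column_mask`, `split_row`, `categorical_kl_to_one_hot`) and the uniform mask-sampling scheme are assumptions made for illustration. The sketch shows how a random subset of columns could be hidden during training so that a model learns target columns conditioned on arbitrary observed columns, and why minimizing the KL divergence against a one-hot label for a discrete column reduces to the usual negative log-likelihood.

```python
import math
import random


def sample_column_mask(num_columns, rng):
    """Hypothetical helper: choose a random non-empty subset of columns to hide.

    Returns a list of booleans, True meaning the column is masked (a target).
    The uniform choice of subset size is an assumption, not taken from the paper.
    """
    num_masked = rng.randint(1, num_columns)  # inclusive on both ends
    masked = set(rng.sample(range(num_columns), num_masked))
    return [i in masked for i in range(num_columns)]


def split_row(row, mask):
    """Split one table row into observed (conditioning) and target columns."""
    observed = {i: v for i, (v, m) in enumerate(zip(row, mask)) if not m}
    target = {i: v for i, (v, m) in enumerate(zip(row, mask)) if m}
    return observed, target


def categorical_kl_to_one_hot(probs, true_index):
    """KL(one_hot || probs) for a discrete column.

    Only the true category contributes, so the value is -log(probs[true_index]),
    which is the standard cross-entropy loss.
    """
    return -math.log(probs[true_index])


if __name__ == "__main__":
    rng = random.Random(0)
    row = [37.5, "married", 52000.0, "private", 1]  # made-up mixed-type row
    mask = sample_column_mask(len(row), rng)
    observed, target = split_row(row, mask)
    print("observed columns:", observed)
    print("target columns:", target)
    print("loss:", categorical_kl_to_one_hot([0.25, 0.5, 0.25], 1))
```

For imputation, the same split would be fixed by the data rather than sampled: the missing columns play the role of targets and the present columns are the observed set. The diffusion component for continuous columns is omitted here.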

Subject: ICML.2025 - Poster