2506.09437

Sufficient digits and density estimation: A Bayesian nonparametric approach using generalized finite Pólya trees

Authors: Mario Beraha, Jesper Møller

This paper proposes a novel approach for statistical modelling of a continuous random variable $X$ on $[0,1)$, based on its digit representation $X = .X_1X_2\ldots$. In general, $X$ can be coupled with a random variable $N$ so that, if a prior on $N$ is imposed, $(X_1,\ldots,X_N)$ becomes a sufficient statistic and $.X_{N+1}X_{N+2}\ldots$ is uniformly distributed. In line with this fact, and focusing on binary digits for simplicity, we propose a family of generalized finite Pólya trees that induces a random density for a sample, which becomes a flexible tool for density estimation. Here, the digit system may be random and learned from the data. We provide a detailed Bayesian analysis, including a closed-form expression for the posterior distribution, which sidesteps the need for MCMC methods for posterior inference. We analyse the frequentist properties as the sample size increases, and provide sufficient conditions for consistency of the posterior distributions of the random density and $N$. We consider an extension to data spanning multiple orders of magnitude, and propose a prior distribution that encodes the so-called extended Newcomb-Benford law. Such a model shows promising results for density estimation of human-activity data. Our methodology is illustrated on several synthetic and real datasets.
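
To make the digit representation concrete, the following is a minimal sketch of the binary case only. It splits a number in $[0,1)$ into its first $n$ binary digits and the rescaled remainder, which corresponds to $.X_{n+1}X_{n+2}\ldots$ in the abstract's notation. The function name `binary_digits` and its parameters are illustrative assumptions, not part of the paper, and the sketch does not implement the Pólya tree model itself.

```python
def binary_digits(x, n):
    """Split x in [0, 1) into its first n binary digits and the remainder.

    Hypothetical helper for illustration; not taken from the paper.
    The remainder lies in [0, 1) and encodes the digits after position n.
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    digits = []
    for _ in range(n):
        x *= 2.0          # shift the binary point one place to the right
        d = int(x)        # the integer part is the next digit (0 or 1)
        digits.append(d)
        x -= d            # keep only the fractional part
    return digits, x


def from_digits(digits, remainder):
    """Rebuild x from its leading binary digits and the rescaled remainder."""
    n = len(digits)
    head = sum(d * 2.0 ** -(i + 1) for i, d in enumerate(digits))
    return head + remainder * 2.0 ** -n
```

For example, `binary_digits(0.8125, 2)` gives the digits `[1, 1]` with remainder `0.25`, and `from_digits([1, 1], 0.25)` recovers `0.8125`.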

Subjects: Methodology, Probability, Statistics Theory

Publish: 2025-06-11 06:38:34 UTC