The oxidation state (OS) is an essential chemical concept that embodies chemical intuition but cannot be computed from well-defined physical laws. We establish a data-driven paradigm, implemented as Tsinghua Oxidation States in Solids (TOSS), to explicitly compute the OSs in crystal structures as emergent properties of large datasets, based on Bayesian maximum a posteriori probability (MAP). TOSS employs two looping structures over a large dataset of crystal structures. The first obtains an emergent library of distance distributions, which serves as the foundation for chemically intuitive understanding. The second determines the OSs by minimizing, for each structure, a loss function based on MAP and the distance distributions of the whole dataset. Applied to a dataset of more than 1,000,000 crystal structures, TOSS delivers a superior success rate. Using the resulting OSs as training data, we further train a data-driven alternative to TOSS based on graph convolutional networks. We expect TOSS and the ML-model-based alternative to find a wide spectrum of applications. This work also provides an encouraging example of data-driven paradigms that explicitly compute chemical intuition to tackle complex problems in chemistry.
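To make the MAP step concrete, the following toy sketch shows one way a per-structure loss could be minimized over candidate OS assignments. It is not the TOSS implementation. The Gaussian form of the distance distributions, the charge-neutrality filter, the exhaustive search, the function names, and every numerical value are assumptions chosen for illustration.

```python
import itertools
import math

# Hypothetical distance library: (element, oxidation state) -> (mean, std) of the
# cation-oxygen distance in angstroms. The numbers are illustrative placeholders,
# not values from the TOSS library.
DISTANCE_LIBRARY = {("Fe", 2): (2.15, 0.08), ("Fe", 3): (2.02, 0.08)}

# Hypothetical prior probability of each oxidation state (assumed uniform here).
PRIORS = {("Fe", 2): 0.5, ("Fe", 3): 0.5}


def gaussian_nll(distance, mean, std):
    """Negative log-likelihood of one distance under an assumed Gaussian."""
    z = (distance - mean) / std
    return 0.5 * z * z + math.log(std * math.sqrt(2.0 * math.pi))


def map_oxidation_states(sites, library, priors, anion_charge):
    """Return (states, loss) minimizing the negative log-posterior.

    sites: list of (element, [neighbor distances]) for the cation sites.
    anion_charge: total charge of the anions, used as a neutrality constraint.
    Returns None if no charge-neutral assignment exists.
    """
    candidates = [
        sorted(os for (el, os) in library if el == element)
        for element, _ in sites
    ]
    best = None
    for states in itertools.product(*candidates):
        if sum(states) + anion_charge != 0:
            continue  # skip assignments that are not charge neutral
        loss = 0.0
        for (element, distances), os in zip(sites, states):
            mean, std = library[(element, os)]
            loss -= math.log(priors[(element, os)])
            loss += sum(gaussian_nll(d, mean, std) for d in distances)
        if best is None or loss < best[1]:
            best = (states, loss)
    return best


# Toy structure: three Fe sites and four O(2-) anions (total anion charge -8).
toy_sites = [
    ("Fe", [2.14, 2.16, 2.15]),
    ("Fe", [2.02, 2.03, 2.01]),
    ("Fe", [2.03, 2.02, 2.04]),
]
best_states, best_loss = map_oxidation_states(
    toy_sites, DISTANCE_LIBRARY, PRIORS, anion_charge=-8
)
```

In this toy case the site with the longer Fe-O distances is assigned the lower oxidation state, since that assignment gives the smallest loss among the charge-neutral candidates. Exhaustive enumeration is used only because the example is tiny; it would not scale to large unit cells.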