Guaranteeing that the maximum amount of information is extracted when analyzing noisy data is non-trivial. We introduce a general, data-driven approach that employs Shannon entropy as a transferable metric to quantify the maximum information extractable from noisy data via their clustering into statistically relevant micro-domains. We demonstrate the method's efficiency by analyzing, as a representative example, time-series data extracted from molecular dynamics simulations of water and ice coexisting at the solid/liquid transition temperature. The method quantifies both the information contained in the data distributions (the time-independent component) and the additional information gained by analyzing the data as time-series (i.e., by accounting for the information contained in the time-correlations of the data). The approach is also highly effective for high-dimensional datasets, where it clearly demonstrates that including components that are noisy and carry little information can be not merely useless but detrimental to maximum information extraction. The result is a general, robust, parameter-free approach, with quantitative metrics, for data analysis and for the study of any type of system from its data.
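To make the idea of an entropy-based clustering metric concrete, the following is a minimal Python sketch. It is not necessarily the authors' definition. It assumes that the information gained by a clustering can be estimated as the Shannon entropy of the full data histogram minus the population-weighted entropies of the per-cluster histograms. The function names `shannon_entropy` and `information_gain`, the bin count, and the synthetic two-state data are all illustrative assumptions.

```python
import numpy as np

def shannon_entropy(values, bins, value_range):
    """Shannon entropy (in bits) of a 1D sample, estimated from a histogram."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()  # drop empty bins: 0 * log(0) = 0
    return float(-np.sum(p * np.log2(p)))

def information_gain(values, labels, bins=32):
    """Assumed metric: H(all data) - sum_k f_k * H(cluster k), in bits.

    All histograms share the same bin edges so the entropies are comparable.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    value_range = (values.min(), values.max())
    h_total = shannon_entropy(values, bins, value_range)
    h_clustered = 0.0
    for k in np.unique(labels):
        member = values[labels == k]
        fraction = member.size / values.size  # population fraction f_k
        h_clustered += fraction * shannon_entropy(member, bins, value_range)
    return h_total - h_clustered

# Synthetic stand-in for a noisy two-state signal (hypothetical data).
rng = np.random.default_rng(0)
values = np.concatenate([
    rng.normal(-5.0, 0.5, 500),  # state A
    rng.normal(+5.0, 0.5, 500),  # state B
])

good_labels = (values > 0.0).astype(int)          # separates the two states
random_labels = rng.integers(0, 2, values.size)   # uninformative labelling

gain_good = information_gain(values, good_labels)
gain_random = information_gain(values, random_labels)
print(f"gain (meaningful clusters): {gain_good:.3f} bits")
print(f"gain (random labels):       {gain_random:.3f} bits")
```

Under these assumptions, a labelling that separates the two states recovers close to one bit, whereas a random labelling recovers almost nothing. This mirrors, in a toy setting, the notion that a clustering can be scored by how much information it extracts from the data.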