The analysis of flow around buildings has attracted significant research interest across various domains, including pedestrian safety, pollutant dispersion, natural ventilation, and building energy efficiency. While studies in these domains frequently rely on high-resolution computational fluid dynamics (CFD) data, predicting urban flow fields with machine learning (ML) models has emerged as a promising way to avoid the prohibitive cost of CFD simulations. However, open-source datasets for training such ML models remain scarce. In particular, publicly available two-dimensional datasets of urban flow fields are nearly non-existent, despite their potential value for the early development and debugging of data-driven models before scaling to computationally expensive three-dimensional datasets. To bridge this gap, this study presents a dataset of 3,000 two-dimensional urban flow simulations conducted using a lattice-Boltzmann method at three distinct Reynolds numbers. The dataset contains the time-averaged velocity fields. A key feature of the dataset is its high geometric diversity: each layout incorporates between three and six buildings with randomized sizes, positions, and rotation angles ranging from 0° to 90°. This variability enables the dataset to capture several critical flow characteristics, including wake formation, flow acceleration, shielding effects, and recirculation zones, across a wide range of urban canopy layouts. The large sample size and consistent simulation setup make the dataset particularly suitable for developing and benchmarking ML architectures. In addition, the dataset can support transfer-learning strategies in which models trained on large two-dimensional datasets are adapted to smaller, more computationally expensive three-dimensional datasets.