lou-resin@osdi22@USENIX

RESIN: A Holistic Service for Dealing with Memory Leaks in Production Cloud Infrastructure

Authors: Chang Lou ; Cong Chen ; Peng Huang ; Yingnong Dang ; Si Qin ; Xinsheng Yang ; Xukun Li ; Qingwei Lin ; Murali Chintalapati

Memory leaks are a notorious problem. Despite extensive efforts, addressing memory leaks in large production cloud systems remains challenging. Existing solutions incur high overhead, suffer from high inaccuracy, or both. This paper presents RESIN, a solution designed to holistically address memory leaks in production cloud infrastructure. RESIN takes a divide-and-conquer approach to the challenges. It first performs low-overhead detection with a robust bucketization-based pivot scheme to identify suspicious leaking entities. It then takes live heap snapshots at appropriate time points on carefully sampled leaking entities. RESIN analyzes the collected snapshots for leak diagnosis. Finally, RESIN automatically mitigates detected leaks. RESIN has been running in production in Microsoft Azure for 3 years. It reports on average 24 leak tickets each month with high accuracy and low overhead, and provides effective diagnosis reports. Its results translate into a 41× reduction of VM reboots caused by low memory.
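
The abstract does not spell out how the bucketization-based detection works, so the following is only a minimal sketch of the general idea, not RESIN's actual algorithm. It assumes that memory usage per entity is sampled periodically, mapped to coarse buckets so that small fluctuations are ignored, and that an entity is treated as suspicious when its bucket only ever moves upward by a noticeable amount. The bucket edges, the `min_rise` threshold, and the function names are all hypothetical.

```python
import bisect

# Hypothetical bucket upper edges in MB (an assumption, not from the paper).
BUCKET_EDGES_MB = [128, 256, 512, 1024]


def bucket_of(usage_mb):
    """Map a raw usage value to a coarse bucket index (0 = smallest)."""
    return bisect.bisect_right(BUCKET_EDGES_MB, usage_mb)


def suspicious_entities(snapshots, min_rise=2):
    """Return entities whose bucket never drops and rises by >= min_rise.

    snapshots: list of {entity_name: usage_mb} dicts, oldest first.
    Entities missing from any snapshot are skipped for simplicity.
    """
    flagged = []
    for entity in snapshots[0]:
        if not all(entity in snap for snap in snapshots):
            continue
        series = [bucket_of(snap[entity]) for snap in snapshots]
        never_drops = all(b >= a for a, b in zip(series, series[1:]))
        if never_drops and series[-1] - series[0] >= min_rise:
            flagged.append(entity)
    return sorted(flagged)


# Illustrative data: "a" grows steadily, "b" only fluctuates within a bucket.
samples = [
    {"a": 100, "b": 200},
    {"a": 300, "b": 210},
    {"a": 600, "b": 190},
]
result = suspicious_entities(samples)
```

With this made-up data, `a` moves through buckets 0, 2, 3 and is flagged, while `b` stays in bucket 1 and is not. Comparing bucket indices rather than raw values is what makes such a check tolerant of noise; a production scheme would also need to handle restarts, workload changes, and sampling across many hosts.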