#1 Breaking the Resource Monopoly from Industries: Sustainable and Reliable LLM Serving by Recycling Outdated and Resource-Constrained GPUs

Author: Tianlong Chen

In recent years, Large Language Model (LLM) agents, exemplified by models like ChatGPT and PaLM, have showcased remarkable prowess in various tasks, owing to their vast number of parameters and emergent in-context learning capabilities. Wide deployment of LLM serving on edge hardware, personal devices, and organization/enterprise IT infrastructures is expected to revolutionize global access to information, communication, automation, and creativity. However, because LLM parameters are extremely large-scale (LLaMA 3.1 contains 405 billion parameters, each a 2- or 4-byte floating-point number), LLM serving faces significant sustainability pressure: it requires the latest high-embodied-carbon hardware (e.g., GPUs, HBMs, memory, storage, and network hardware) and incurs high operational carbon emissions, leading to an alarming increase in carbon emissions and a high barrier to widespread deployment and practical application in various scenarios. Companies, organizations, and institutes usually already own complete general-purpose IT infrastructures consisting of large amounts of computing, memory, storage, and network hardware. Although these infrastructures are more than sufficient for existing applications, deploying and executing LLMs on them across a broad spectrum of serving platforms is challenging due to resource limitations. Purchasing the latest hardware, including GPUs such as the Nvidia H100 or H200, leads to considerable issues: 1) serious embodied carbon emissions during new hardware production, 2) no guaranteed reduction in operational carbon emissions without essential modeling and optimizations, 3) high economic and financial pressure, and 4) potentially tremendous waste of existing hardware resources. Therefore, exploring how to use existing hardware, especially outdated hardware, to collectively improve the environmental sustainability, efficiency, and reliability of LLM serving is both a trend and a necessity. A few pioneering examples include Microsoft's Project Natick, Google's TPU Pod Optimization, Alibaba's Cloud Server Repurposing, and Facebook's Network Hardware Reuse. In this talk, I will walk through my series of contributions and promising new directions, particularly emphasizing modularized LLM architecture (Part 1), in-storage sustainable computing (Part 2), and reliable serving against software and hardware attacks (Part 3).
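To make the scale claim concrete, the following is a minimal back-of-the-envelope sketch in Python of the weights-only memory footprint implied by 405 billion parameters at 2 or 4 bytes each, and of how many devices of a given capacity would be needed just to hold the weights. The per-device memory capacities are illustrative assumptions, not figures from the talk, and the arithmetic ignores KV cache, activations, and runtime overhead.

import math

N_PARAMS = 405e9  # LLaMA 3.1 405B parameter count cited in the abstract

def weight_footprint_gb(bytes_per_param: int) -> float:
    """Weights-only footprint in GB; ignores KV cache, activations, and overhead."""
    return N_PARAMS * bytes_per_param / 1e9

# Illustrative per-device memory capacities in GB (assumed, not from the abstract).
devices = {"H100 (80 GB HBM)": 80, "V100 (32 GB)": 32, "RTX 2080 Ti (11 GB)": 11}

for bytes_per_param in (2, 4):  # 2 bytes (FP16/BF16) or 4 bytes (FP32)
    total_gb = weight_footprint_gb(bytes_per_param)
    print(f"{bytes_per_param}-byte parameters -> {total_gb:.0f} GB of weights")
    for name, capacity_gb in devices.items():
        # Minimum device count just to hold the weights.
        print(f"  {name}: at least {math.ceil(total_gb / capacity_gb)} devices")

Under these assumptions, 2-byte parameters already require about 810 GB of weight storage, which is why serving such a model on outdated or resource-constrained GPUs demands the kinds of modularization and in-storage computing techniques the talk describes.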

Subject: AAAI.2025 - New Faculty Highlights