35101@AAAI

Total: 1

#1 Open-World Multimodal Understanding and Generation with Efficiently Finetuned Foundation Models

Author: Long Chen

The astonishing abilities of various pretrained foundation models (e.g., large language models (LLMs), vision-language models, and diffusion models) have revolutionized the direction of today’s AI research and development. In this talk, I will answer two questions. Q1: How can we efficiently train or fine-tune foundation models? Q2: How can we build strong open-world multimodal understanding and generation models on top of these pretrained foundation models?

Subject: AAAI.2025 - New Faculty Highlights