Fast and Efficient AI

Speaker
Yang You, Ph.D., Presidential Young Professor, National University of Singapore
Abstract:
Designing AI systems that are both fast and efficient has become a cornerstone for scaling modern machine learning to real-world impact. In this talk, I will present a unified view of my research contributions that push the limits of training and deploying large-scale AI models efficiently, without compromising performance.
I will first introduce LARS and LAMB, two widely adopted optimizers that enable stable, scalable training of extremely large neural networks by addressing the challenges of large-batch optimization. Notably, according to NVIDIA's public benchmarks, LAMB achieves up to a 17× speedup over state-of-the-art baselines for large-batch BERT pre-training, making large-scale training practically feasible.
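To make the layer-wise idea behind LARS and LAMB concrete, here is a minimal NumPy sketch of a single update step for one layer, following the update rules as they are commonly described. The function names, default hyperparameters, and the omission of LARS momentum are simplifying assumptions for illustration; this is not the reference implementation from the talk or from any library.

```python
import numpy as np


def lars_update(w, g, lr=0.1, trust_coef=0.001, weight_decay=1e-4):
    """One LARS step for a single layer (momentum omitted for brevity)."""
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    # Layer-wise local LR: trust_coef * ||w|| / (||g|| + weight_decay * ||w||).
    if w_norm > 0 and g_norm > 0:
        local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm)
    else:
        local_lr = 1.0  # fall back to the global LR for all-zero tensors
    return w - lr * local_lr * (g + weight_decay * w)


def lamb_update(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                eps=1e-6, weight_decay=0.01):
    """One LAMB step for a single layer; returns (new_w, new_m, new_v)."""
    # Adam-style first and second moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    update = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w
    # Layer-wise trust ratio rescales the Adam direction to the size of ||w||.
    w_norm = np.linalg.norm(w)
    u_norm = np.linalg.norm(update)
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return w - lr * trust_ratio * update, m, v


# Usage: a deliberately huge gradient still yields a step of length lr * ||w||.
rng = np.random.default_rng(0)
w = rng.normal(size=1000)
g = 100.0 * rng.normal(size=1000)
w_new, m, v = lamb_update(w, g, np.zeros_like(w), np.zeros_like(w), t=1)
print(np.linalg.norm(w_new - w) / np.linalg.norm(w))  # ≈ 1e-3 (the lr)
```

The design choice that matters is the per-layer trust ratio: each layer's step is scaled relative to its own weight norm, so one global learning rate cannot blow up layers whose gradients happen to be large. That is what lets very large batches, and the correspondingly large learning rates, train stably.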
I will then discuss my work on Sequence Parallelism and 2D Tensor Parallelism, which break through memory and communication bottlenecks by decomposing training workloads in novel ways, making it feasible to train trillion-parameter models on commodity hardware. Sequence Parallelism in particular has been adopted by leading organizations such as NVIDIA and Meta in their large-scale training frameworks, powering models like Megatron-LM and OPT.
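One way to picture how sequence parallelism splits attention is the ring-style scheme sketched below: each simulated device keeps one chunk of the sequence's queries, keys, and values; key chunks are passed around a ring so each device can build its rows of the score matrix, and value chunks are passed around a second time to accumulate the output. This is a single-process NumPy simulation under assumed names (`ring_self_attention`, equal-sized chunks, a single head, no masking); it is not the implementation used in any of the frameworks named above, and real systems send chunks between GPUs and overlap that communication with computation.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    """Reference: ordinary single-device scaled dot-product attention."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def ring_self_attention(q_chunks, k_chunks, v_chunks):
    """Simulate n devices, each holding one equal-sized sequence chunk."""
    n = len(q_chunks)
    c, d = q_chunks[0].shape
    # Phase 1: pass K chunks around the ring. At step s, device `dev`
    # holds the K chunk that originated on device (dev - s) % n.
    scores = [[None] * n for _ in range(n)]
    for step in range(n):
        for dev in range(n):
            src = (dev - step) % n
            scores[dev][src] = q_chunks[dev] @ k_chunks[src].T / np.sqrt(d)
    # Each device normalises its own rows over the *whole* key sequence.
    probs = [softmax(np.concatenate(scores[dev], axis=1)) for dev in range(n)]
    # Phase 2: pass V chunks around the ring and accumulate weighted sums.
    out = [np.zeros((c, v_chunks[0].shape[1])) for _ in range(n)]
    for step in range(n):
        for dev in range(n):
            src = (dev - step) % n
            p_block = probs[dev][:, src * c:(src + 1) * c]
            out[dev] += p_block @ v_chunks[src]
    return out


# Usage: split a 16-token sequence across 4 simulated devices.
rng = np.random.default_rng(0)
seq_len, d, n_dev = 16, 8, 4
q, k, v = (rng.normal(size=(seq_len, d)) for _ in range(3))
ring_out = np.concatenate(
    ring_self_attention(np.split(q, n_dev), np.split(k, n_dev), np.split(v, n_dev))
)
print(np.allclose(ring_out, full_attention(q, k, v)))  # True
```

In a real deployment each device would store only its own query chunk, one key or value chunk in flight, and a (seq_len / n) × seq_len slice of the score matrix, so per-device activation memory for attention shrinks roughly in proportion to the number of devices while the result stays identical to full attention.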
Next, I will share our recent advances in real-time video generation with Pyramid Attention Broadcast (PAB), which demonstrates how efficient architectural design principles can enable high-quality generative models to operate in real time, opening new directions for interactive AI applications.
Finally, I will highlight Colossal-AI, an open-source system that brings these algorithmic and system-level innovations together into a unified toolkit. With over 41,000 GitHub stars, Colossal-AI has become one of the world's most popular open-source projects for large-scale AI system design, widely adopted by academia and industry alike.
By connecting innovations across optimizers, parallelism strategies, system design, and generative AI, I hope to illustrate a coherent research vision for building the next generation of fast, efficient, and accessible AI systems.
Categories
Engineering, Lecture/Talk, Medicine, Research, Technology, Webcast