AI Seminar
Generative Models of Video
This event is free and open to the public.

Abstract:
Generative models of text, audio, image, and video have made immense strides in the past few years, with generation quality that is often indistinguishable from real data. In addition, these models have become widely accessible and increasingly useful for real-world applications.
In this talk, I will focus on generative models of images and video. I will begin with a brief historical overview of the key developments that led to today’s state-of-the-art models. Then, I will provide an intuitive explanation of diffusion models, exploring the underlying principles that contribute to their impressive performance. Building on this foundation, I will describe the various approaches that have culminated in the current generation of video diffusion models. Finally, I will highlight examples of incredible work by creative professionals, discuss current limitations of video generation, and outline potential directions for future research.
Bio: I am a Senior Research Scientist at Google DeepMind, where I work on generative models of video. My primary goal is to develop models that learn to understand and generate the intricate dynamics of the real world from massive amounts of video data. These world models should be capable of interpreting the dynamics within a given video, forecasting future events based on past observations, and recombining learned concepts into novel, simulated versions of reality. I received my PhD from the Computer Science & Engineering Department at the University of Michigan, Ann Arbor, under the supervision of Professor Honglak Lee. During my PhD, I focused mainly on building models for future frame prediction using self-supervised and supervised approaches. I also contributed to building world models that were successfully applied in model-based reinforcement learning.