Anhong Guo receives Google funding for live visual description platform for blind people
Anhong Guo, assistant professor of computer science and engineering at the University of Michigan, has received a $75,000 grant from the Google Academic Research Awards program for his project titled “WorldScribe: Context-Aware Live Visual Descriptions for Blind People.” Building on his extensive engagement with the disability community, Guo aims to revolutionize how users who are blind or have low vision perceive and interact with their environments.
The Google Academic Research Awards program supports pioneering research in computer science and related fields. Each year, Google funds a select number of proposals that demonstrate strong potential for technological impact and relevance to contemporary challenges in the field.
The WorldScribe project, led by Guo and CSE PhD student Ruei-Che Chang, tackles the complex challenge of providing rich, real-time visual descriptions to blind and low-vision users. By leveraging the latest advancements in vision-language models (VLMs) and large language models (LLMs), WorldScribe can generate detailed and contextually aware descriptions that are tailored to the user’s current environment and activities. The system aims to provide users with enhanced autonomy and independence as they navigate everyday scenarios, from busy streets to quiet indoor spaces.
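The full system is detailed in the team's UIST 2024 paper; purely as an illustration of the general idea, the sketch below shows how a vision-language model might be prompted on each camera frame to produce a brief, context-aware description. The function names here (capture_frame, describe_with_vlm) are hypothetical placeholders for this sketch, not WorldScribe's actual components.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """Stand-in for a camera frame (in practice, an image array from a phone or wearable)."""
    pixels: bytes

def capture_frame() -> Frame:
    """Hypothetical placeholder: read the latest frame from the user's camera."""
    return Frame(pixels=b"")

def describe_with_vlm(frame: Frame, prompt: str) -> str:
    """Hypothetical placeholder: call any vision-language model with an image and a prompt."""
    return "A quiet hallway with a door about three meters ahead."

def live_description(context: str) -> str:
    """Build a context-aware prompt and ask the VLM to describe the current frame."""
    frame = capture_frame()
    prompt = (
        "Describe this scene for a blind pedestrian. "
        f"Current context: {context}. "
        "Prioritize obstacles, people, and anything that changed recently; keep it brief."
    )
    return describe_with_vlm(frame, prompt)

if __name__ == "__main__":
    print(live_description(context="walking indoors toward an exit"))
```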
The proposed research is founded on three core activities: developing a live agent architecture that can handle multimodal input for generating descriptions; adapting these descriptions to varying user intents in real time; and embedding short-term and long-term memory in the system to provide contextual and personalized experiences. This comprehensive approach positions WorldScribe to effectively address the diverse needs of its users.
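To make the second and third activities more concrete, here is a loose sketch of what intent adaptation and a simple short-term/long-term memory could look like in code. The heuristics, field names, and preferences below are invented for illustration and are not taken from WorldScribe.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Toy memory: a short-term buffer of recent descriptions plus a long-term user profile."""
    short_term: deque = field(default_factory=lambda: deque(maxlen=5))
    long_term: dict = field(default_factory=dict)  # e.g., {"prefers": "brief descriptions"}

def adapt_to_intent(description: str, intent: str, memory: Memory) -> str:
    """Adjust framing and level of detail based on the user's intent and stored preferences."""
    if intent == "navigate":
        adapted = f"Ahead: {description}"       # emphasize spatial layout for wayfinding
    elif intent == "explore":
        adapted = f"Around you: {description}"  # richer, scene-level detail
    else:
        adapted = description
    if memory.long_term.get("prefers") == "brief descriptions":
        adapted = adapted.split(".")[0] + "."   # trim to the first sentence
    memory.short_term.append(adapted)           # remember what was already announced
    return adapted

memory = Memory(long_term={"prefers": "brief descriptions"})
print(adapt_to_intent("A crosswalk with a signal pole on the right. Cars are waiting.",
                      intent="navigate", memory=memory))
```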
The benefits of WorldScribe are manifold. By providing instant, detailed descriptions of the surroundings, the system gives individuals who are blind or have low vision a much fuller understanding of their environment. This, in turn, supports their safety, enhances their ability to socialize, and fosters greater independence in their daily lives.
“Providing rich and detailed descriptions in real time is a grand challenge in the accessibility space,” said Guo. “We saw an opportunity to leverage increasingly capable AI models to create automated and adaptive live descriptions.”
This work builds on the team’s previous successful projects, including VizLens and Lookout, which have already made notable advancements in assistive technology for people who are blind or have low vision. Guo and Chang’s recent paper introducing WorldScribe won a Best Paper Award at UIST 2024, a testament to the project’s promise. The potential of WorldScribe to transform how people experience and interact with the world aligns with Google’s mission to make information universally accessible and useful.
“By leveraging multimodal context and real-time interactions, our goal is to make the real world more accessible and navigable for blind and low-vision individuals,” added Guo.