AI Seminar

Benchmarking LLMs’ Judgments with No Gold Standard

Grant Schoenebeck, Associate Professor, School of Information, University of Michigan

Location: BBB 3725
Zoom: https://umich.zoom.us/j/97434198716
Meeting ID: 974 3419 8716
Passcode: aiseminar

Abstract:

With the advent of Large Language Models (LLMs), a key question is how to evaluate the text they produce. This talk will introduce GEM (Generative Estimator for Mutual Information), an evaluation metric for assessing language generation by LLMs, particularly in generating informative judgments, without the need for a gold standard reference. GEM broadens the scenarios where we can benchmark LLM generation performance, from traditional ones, like machine translation and summarization, where gold standard references are readily available, to subjective tasks without clear gold standards, such as academic peer review. GEM uses a generative model to estimate mutual information between candidate and reference responses, without requiring the reference to be a gold standard. This work builds upon previous work on mechanisms for information elicitation. Although natural language generation (NLG) evaluation may not seem related to mechanism design, this talk will make the connection clear.

Bio:

Grant Schoenebeck is an associate professor at the University of Michigan in the School of Information. His work has recently focused on developing and analyzing systems for eliciting and aggregating information from a diverse group of agents with varying information, interests, and abilities by combining ideas from theoretical computer science, machine learning, and economics (e.g., game theory, mechanism design, and information design). More generally, his recent work has been about incentives and (machine) learning in a variety of contexts. His research is supported by the NSF, including an NSF CAREER award. Before coming to the University of Michigan in 2012, he was a Postdoctoral Research Fellow at Princeton. Grant received his PhD at UC Berkeley, studied theology at Oxford University, and received his BA in mathematics and computer science from Harvard.

Organizer

AI Lab

Student Host

Martin Ziqiao Ma, AI Lab Seminar Tsar

Faculty Host

Wei Hu, Assistant Professor, Computer Science and Engineering, University of Michigan