MIDWEST SPEECH AND LANGUAGE DAYS

April 15-16, 2024

University of Michigan, Ann Arbor
The Michigan League (map)
Ann Arbor, MI, 48105

Midwest Speech and Language Days (MSLD) is a 2-day meeting that continues and expands upon the tradition of Illinois Speech Day and the Midwest Computational Linguistics Colloquium. After a pandemic-induced hiatus, MSLD returns in 2024 at the University of Michigan. This non-archival meeting invites presenters and attendees from Midwest universities and research institutions to discuss recent and in-progress work. The goal is to increase awareness of speech and language research going on in the region and to foster collaboration among sites.

Visiting Ann Arbor
We welcome all to the University of Michigan! The organizing committee and students have put together a list of things to see/do and places to eat around Ann Arbor.

Keynote Speakers:
Emma Strubell (Carnegie Mellon University)
Betsy Sneller (Michigan State University)
Eric Fosler-Lussier (The Ohio State University)
Hao Peng (University of Illinois Urbana-Champaign)

Organizers:
Steve Abney, University of Michigan
Dallas Card, University of Michigan
Joyce Chai, University of Michigan
David Jurgens, University of Michigan
Rada Mihalcea, University of Michigan
Emily Mower Provost, University of Michigan
VG Vinod Vydiswaran, University of Michigan
Lu Wang, University of Michigan
Justine Zhang, University of Michigan

Presentation Modalities:
Talks: Talk slots are 15 minutes and include 10 minutes for the talk and 5 for questions/switching presenters.
Posters: Posters can be at most 48″x36″ (width x height) in landscape orientation.

Schedule of Events

MONDAY, April 15

08:00-09:00 Breakfast/Registration at The Michigan League

09:00-09:15 Welcome

09:15-10:00 Keynote 1: Emma Strubell (CMU): “LLMs: Everything’s Different and Nothing Has Changed”

10:00-10:30 Coffee Break 1

10:30-11:30 Talk Session 1
10:30-10:45 Bohan Zhang, Yixin Wang, Paramveer Dhillon (Univ. Michigan) Causal Inference for Human-Language Model Collaboration
10:45-11:00 Christian Clark, William Schuler (The Ohio State) Categorial Grammar Induction from Raw Data
11:00-11:15 Achyutarama R Ganti, Steven R. Wilson, Wing-Yue Geoffrey Louie (Oakland University) Cross-Domain Classification of Educational Talk Turns
11:15-11:30 Joshua Gryzen, Yuliya Lierler (Univ. of Nebraska, Omaha) Evaluating Open-Source Large Language Models on bAbI-Tasks

11:30-12:30 Poster Session 1

12:30-14:00 Lunch

14:00-15:15 Talk Session 2
14:00-14:15 Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li (IIT) Can Large Language Model Agents Simulate Human Trust Behaviors?
14:15-14:30 Chenghao Yang, Allyson Ettinger (Univ. of Chicago) Can You Follow Me? Testing Situational Understanding in ChatGPT
14:30-14:45 So Yeon Min, Xavier Puig, Devendra Singh Chaplot, Tsung-Yen Yang, Akshara Rai, Priyam Parashar, Ruslan Salakhutdinov, Yonatan Bisk, Roozbeh Mottaghi (CMU) Situated Instruction Following
14:45-15:00 Fan Huang, Haewoon Kwak, Kunwoo Park, Jisun An (Indiana University) ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales?
15:00-15:15 Tunazzina Islam (Purdue) Uncovering Latent Arguments in Social Media Messaging by Employing a LLMs-in-the-Loop Strategy

15:15-16:15 Coffee Break 2 + Poster Session 2

16:15-17:00 Keynote 2: Betsy Sneller (MSU): “Real talk: Naturalistic speech in the MI Diaries corpus”

TUESDAY, April 16

08:00-09:00 Breakfast/Registration at the Michigan League

09:00-09:45 Keynote 3: Eric Fosler-Lussier (OSU): “Borderlands: When Speech Meets Text”

09:45-10:45 Talk Session 3
09:45-10:00 Ruiyi Wang, Haofei Yu, Wenxin Sharon Zhang, Zhengyang Qi, Maarten Sap, Graham Neubig, Yonatan Bisk, Hao Zhu (CMU) SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
10:00-10:15 Andy Yang (Notre Dame) Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages
10:15-10:30 Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su (The Ohio State) GPT-4V(ision) is a Generalist Web Agent, if Grounded
10:30-10:45 Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu (TTIC) Few-Shot Spoken Language Understanding Via Joint Speech-Text Models

10:45-12:15 Coffee Break 3 and Poster Session 3

12:15-13:30 Lunch

13:30-14:15 Talk Session 4
13:30-13:45 Meera Desai, Irene Pasquetto, Abigail Z Jacobs, Dallas Card (Univ. Michigan) An Archival Perspective on Pretraining Data
13:45-14:00 Clara Na, Ian Magnusson, Ananya Harsh Jha, Tom Sherborne, Emma Strubell, Jesse Dodge, Pradeep Dasigi (CMU) Approximating training data ablations for language models
14:00-14:15 Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang (Univ. of Minnesota) How Far Can We Extract Diverse Perspectives from Large Language Models?

14:15-15:00 Keynote 4: Hao Peng (UIUC): “Pushing the Boundaries of Length Generalization and Reasoning Capabilities of Open LLMs”

15:00-15:15 Closing

Poster Sessions

Posters are assigned numbers with specific places in the room, so please use the following guide. Most posters are presented in two sessions.

Poster Session 1

Muhammad S. Abdo, Damir Cavar The Hoosier Ellipsis Corpus: Building a Corpus of Ellipsis for Arabic Natural Language Processing

Yingshan Chang, Yasi Zhang, Jacob Zhiyuan Fang, Ying Nian Wu, Yonatan Bisk, Feng Gao Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

Chenghao Yang, Tuhin Chakrabarty, Karli R Hochstatter, Melissa N Slavin, Nabila El-Bassel, Smaranda Muresan Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts

Katsumi Ibaraki, Winston Wu, Lu Wang, Rada Mihalcea Analyzing Occupational Distribution Representation in Japanese Language Models

Jood Otey, Laura Biester, Steven R. Wilson Multilingual Error Analysis For Offensive Language

Santiago Castro, Amir Ziai, Avneesh Saluja, Zhuoning Yuan, Rada Mihalcea CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

Yijun Pan, Sushrita Rakshit, Daniel Tian, Hua Shen, Kenan Alkiek, David Jurgens Interpreting Spatial Reasoning Capabilities in Language Models

Kevin Christian Wibisono, Yixin Wang Causal Inference with Text Data via Maximizing Contrasts

Stephen Bothwell, Justin DeBenedetto, Theresa Crnkovich, Hildegund Muller, David Chiang Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines

Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Apostolos Dedeloudis, Jackson Sargent, David Jurgens POTATO: The Portable Text Annotation Tool

Aylin Ece Gunal, Djallel Bouneffouf Are LLMs Rational Agents? Iterated Prisoner’s Dilemma to Understand LLM Strategy

Damir Cavar, Ludovic Mompelat, Muhammad S. Abdo The Hoosier Ellipsis Corpus (HELC): Documenting Linguistic Dark Matter

Tejes Srivastava, Ju-Chieh Chou, Priyank Shroff, Karen Livescu, Christopher Graziul Speech Recognition for Analysis of Police Radio Communication

Anna Wegmann, Tijs van den Broek, Dong Nguyen What’s Mine becomes Yours: Defining, Annotating and Detecting Context-Dependent Paraphrases in News Interview Dialogs

Aparna Ananthasubramaniam, Daniel Romero, David Jurgens Using Text Classifiers to Study how Macroeconomic Context Moderates Socioeconomic Determinants of Suicide

Mingyue Huo Aligning Speech and Hum Pairs Based on Dynamic Time Warping

Mingqian Zheng, Jiaxin Pei, Lajanugen Logeswaran, Moontae Lee, David Jurgens Is “A Helpful Assistant” the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, Rada Mihalcea Tables as Images? Exploring the Strengths and Limitations of LLMs on Multimodal Representations of Tabular Data

Jason Yan, Tong Lin, Yanna Krupnikov, Kerri Milita, Sabina J Tomkins I’ve Seen That Before! Towards Understanding Hard News Exposure from Soft News Outlets

Alyssa Allen SQL explainability via LLM generated comments

Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea Governance of the Commons Benchmark for LLM Agents

Tong Lin, Jason Yan, Sabina J Tomkins Tab2Text: Transforming tabular data to text with LLMs

Grace LeFevre, Liam Frölund, Lori Beaman, Rob Voigt PLM-Augmented Rule-Based Classifiers: A Lightweight Method for Improving the Generalizability of Expert Knowledge in Novel Information Extraction Tasks

Jorge Fandinno, Yuliya Lierler tExplain: Information Extraction with Explanations

Ziru Chen, Michael White, Ray Mooney, Ali Payani, Yu Su, Huan Sun When is Tree Search Useful for LLM Planning? It Depends on the Discriminator

Canyu Chen, Kai Shu Can LLM-Generated Misinformation Be Detected?

Nam Ho Koh, Santiago Castro, Rada Mihalcea Text over Context: Navigating the Mirage of Model Gaslighting in Visual Language Frameworks

Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu Few-Shot Spoken Language Understanding Via Joint Speech-Text Models

Andy Yang Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages

Ruiyi Wang, Haofei Yu, Wenxin Sharon Zhang, Zhengyang Qi, Maarten Sap, Graham Neubig, Yonatan Bisk, Hao Zhu SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents

Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su GPT-4V(ision) is a Generalist Web Agent, if Grounded

Yanhong Li, Chenghao Yang, Allyson Ettinger When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models

Poster Session 2

Aarohi Srivastava, David Chiang BERTwich: Extending BERT’s Capabilities to Model Dialectal and Noisy Text

Yingshan Chang, Yasi Zhang, Jacob Zhiyuan Fang, Ying Nian Wu, Yonatan Bisk, Feng Gao Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

Trisha Thomas, Antje Stoehr, Ying Xu The role of bilingual proficiency in ASR performance of children’s speech

Katsumi Ibaraki, Winston Wu, Lu Wang, Rada Mihalcea Analyzing Occupational Distribution Representation in Japanese Language Models

Billy Dickson, Sahaj Singh Maini, Zoran Tiganj Combining LLMs and cognitive models of memory

Santiago Castro, Amir Ziai, Avneesh Saluja, Zhuoning Yuan, Rada Mihalcea CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

Yanhong Li, Jiawei Zhou Enhancing Language Modeling with Adaptive Chunk Distilled Generation

Kevin Christian Wibisono, Yixin Wang Causal Inference with Text Data via Maximizing Contrasts

Angana Borah, Aparna Garimella, Rada Mihalcea Towards Region-aware Bias Evaluation Metrics

Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Apostolos Dedeloudis, Jackson Sargent, David Jurgens POTATO: The Portable Text Annotation Tool

Jiazhao Li, Yijin Yang, Zhuofeng Wu, V.G. Vinod Vydiswaran, Chaowei Xiao ChatGPT as an Attack Tool: Stealthy Backdoor Attack via Blackbox Generative Model Trigger

Damir Cavar, Ludovic Mompelat, Muhammad S. Abdo The Hoosier Ellipsis Corpus (HELC): Documenting Linguistic Dark Matter

Anna Wegmann, Tijs van den Broek, Dong Nguyen What’s Mine becomes Yours: Defining, Annotating and Detecting Context-Dependent Paraphrases in News Interview Dialogs

Michelle YoungJin Kim, Junghwan Kim ABLE: Agency-BeLiefs Embedding to Address Stereotypical Bias Through Awareness Instead of Obliviousness

Mingyue Huo Aligning Speech and Hum Pairs Based on Dynamic Time Warping

Kenneth Sible, David Chiang Improving Rare Word Translation with Dictionaries and Attention Masking

Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, Rada Mihalcea Tables as Images? Exploring the Strengths and Limitations of LLMs on Multimodal Representations of Tabular Data

Tianliang Xu, Haofei Xu, Chang Ge, Justine Zhang, Sabina J Tomkins A large-scale dataset to examine political discourse in local governance

Alyssa Allen SQL explainability via LLM generated comments

Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang Universal Automatic Phonetic Transcription into the International Phonetic Alphabet

Tong Lin, Jason Yan, Sabina J Tomkins Tab2Text: Transforming tabular data to text with LLMs

Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

Jorge Fandinno, Yuliya Lierler tExplain: Information Extraction with Explanations

Seyed Ali Alavi Bajestan, Donald S Williamson Contrastive learning approach for source identification in auditory attention detection

Canyu Chen, Kai Shu Can LLM-Generated Misinformation Be Detected?

Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang How Far Can We Extract Diverse Perspectives from Large Language Models?

Achyutarama R Ganti, Steven R. Wilson, Wing-Yue Geoffrey Louie Cross-Domain Classification of Educational Talk Turns

Christian Clark, William Schuler Categorial Grammar Induction from Raw Data

Clara Na, Ian Magnusson, Ananya Harsh Jha, Tom Sherborne, Emma Strubell, Jesse Dodge, Pradeep Dasigi Approximating training data ablations for language models

Bohan Zhang, Yixin Wang, Paramveer Dhillon Causal Inference for Human-Language Model Collaboration

Joshua Gryzen, Yuliya Lierler Evaluating Open-Source Large Language Models on bAbI-Tasks

Chenghao Yang, Allyson Ettinger Can You Follow Me? Testing Situational Understanding in ChatGPT

Meera Desai, Irene Pasquetto, Abigail Z Jacobs, Dallas Card An Archival Perspective on Pretraining Data

Poster Session 3

Muhammad S. Abdo, Damir Cavar The Hoosier Ellipsis Corpus: Building a Corpus of Ellipsis for Arabic Natural Language Processing

Aarohi Srivastava, David Chiang BERTwich: Extending BERT’s Capabilities to Model Dialectal and Noisy Text

Chenghao Yang, Tuhin Chakrabarty, Karli R Hochstatter, Melissa N Slavin, Nabila El-Bassel, Smaranda Muresan Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts

Trisha Thomas, Antje Stoehr, Ying Xu The role of bilingual proficiency in ASR performance of children’s speech

Jood Otey, Laura Biester, Steven R. Wilson Multilingual Error Analysis For Offensive Language

Billy Dickson, Sahaj Singh Maini, Zoran Tiganj Combining LLMs and cognitive models of memory

Yijun Pan, Sushrita Rakshit, Daniel Tian, Hua Shen, Kenan Alkiek, David Jurgens Interpreting Spatial Reasoning Capabilities in Language Models

Yanhong Li, Jiawei Zhou Enhancing Language Modeling with Adaptive Chunk Distilled Generation

Stephen Bothwell, Justin DeBenedetto, Theresa Crnkovich, Hildegund Muller, David Chiang Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines

Angana Borah, Aparna Garimella, Rada Mihalcea Towards Region-aware Bias Evaluation Metrics

Aylin Ece Gunal, Djallel Bouneffouf Are LLMs Rational Agents? Iterated Prisoner’s Dilemma to Understand LLM Strategy

Jiazhao Li, Yijin Yang, Zhuofeng Wu, V.G. Vinod Vydiswaran, Chaowei Xiao ChatGPT as an Attack Tool: Stealthy Backdoor Attack via Blackbox Generative Model Trigger

Tejes Srivastava, Ju-Chieh Chou, Priyank Shroff, Karen Livescu, Christopher Graziul Speech Recognition for Analysis of Police Radio Communication

Yanhong Li, Chenghao Yang, Allyson Ettinger When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models

Aparna Ananthasubramaniam, Daniel Romero, David Jurgens Using Text Classifiers to Study how Macroeconomic Context Moderates Socioeconomic Determinants of Suicide

Michelle YoungJin Kim, Junghwan Kim ABLE: Agency-BeLiefs Embedding to Address Stereotypical Bias Through Awareness Instead of Obliviousness

Mingqian Zheng, Jiaxin Pei, Lajanugen Logeswaran, Moontae Lee, David Jurgens Is “A Helpful Assistant” the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

Kenneth Sible, David Chiang Improving Rare Word Translation with Dictionaries and Attention Masking

Jason Yan, Tong Lin, Yanna Krupnikov, Kerri Milita, Sabina J Tomkins I’ve Seen That Before! Towards Understanding Hard News Exposure from Soft News Outlets

Tianliang Xu, Haofei Xu, Chang Ge, Justine Zhang, Sabina J Tomkins A large-scale dataset to examine political discourse in local governance

Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea Governance of the Commons Benchmark for LLM Agents

Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang Universal Automatic Phonetic Transcription into the International Phonetic Alphabet

Grace LeFevre, Liam Frölund, Lori Beaman, Rob Voigt PLM-Augmented Rule-Based Classifiers: A Lightweight Method for Improving the Generalizability of Expert Knowledge in Novel Information Extraction Tasks

Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

Ziru Chen, Michael White, Ray Mooney, Ali Payani, Yu Su, Huan Sun When is Tree Search Useful for LLM Planning? It Depends on the Discriminator

Seyed Ali Alavi Bajestan, Donald S Williamson Contrastive learning approach for source identification in auditory attention detection

Nam Ho Koh, Santiago Castro, Rada Mihalcea Text over Context: Navigating the Mirage of Model Gaslighting in Visual Language Frameworks

Fan Huang, Haewoon Kwak, Kunwoo Park, Jisun An ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales?

Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li Can Large Language Model Agents Simulate Human Trust Behaviors?

Tunazzina Islam Uncovering Latent Arguments in Social Media Messaging by Employing a LLMs-in-the-Loop Strategy

So Yeon Min, Xavier Puig, Devendra Singh Chaplot, Tsung-Yen Yang, Akshara Rai, Priyam Parashar, Ruslan Salakhutdinov, Yonatan Bisk, Roozbeh Mottaghi Situated Instruction Following

Benjamin Roger Litterer, David Jurgens, Dallas Card When it Rains, it Pours: Modeling Media Storms and the News Ecosystem

Registration Details

Register to attend here. As with past MSLD events, registration is free, though space may be limited. We especially encourage junior scholars to submit their work. Please register by Monday, April 1.

Call for Abstracts:

MSLD invites abstracts of published work and work in progress. Abstracts are non-archival and no proceedings will be published. Submissions should not list the authors' names. Abstracts may be submitted as text and may optionally include a PDF of formatted text, tables, and figures. Because the meeting combines multiple research communities, there is no specific format or template for PDF submissions.

Abstracts may be submitted through OpenReview here. To be considered for a talk, abstracts must be submitted by midnight (AoE) on Sunday, March 24. Late abstracts may be considered for a poster. Submission decisions will be announced the week of March 25.

Travel and Accommodations:

We have secured three hotel room blocks at different locations in the city. Please see this map of the venue, hotel room blocks, additional lodging options, local restaurants and things to do. We anticipate offering some limited travel assistance for presenters and attendees who would otherwise be unable to attend. Please fill out the relevant parts of the registration form to indicate you might need such assistance.

Inn at the Michigan League ($180/night + tax)
Guests can book online at this link (use code SPEECH2024)
Room block ends: March 14

Bell Tower Hotel ($179/night + tax)
Guests can book online at this link
Room block ends: March 14

Microtel Inn & Suites by Wyndham Ann Arbor
1 Queen Bed, Non-Smoking (NQ1) – $109/night + Tax
2 Queen Beds, Non-Smoking (NQQ1) – $119/night + Tax
1 Queen Bed, Studio Suite, Non-Smoking (SNQ1) – $119/night + Tax
Guests will need to call 877-361-2512 for individual reservations and reference itinerary# 5136B420390585
Room block ends: March 14
The Microtel hotel is served by AATA routes 65 and 23, which run to the U-M campus.
Please see the campus travel pages for details on how to get to Ann Arbor and around the University of Michigan campus.

If you want to arrange your own lodging near the Briarwood mall, the buses to campus (via TheRide) include the 6, 24 and 62 routes; rideshares are also available.

If flying to MSLD, a cheaper option for traveling from/to the airport is the Michigan Flyer bus ($15 one way; $25 round trip from DTW to Ann Arbor).

Past Events:
MSLD 2019 at TTI-Chicago
MSLD 2018 at the University of Notre Dame
MSLD & MCLC 2017 at TTI-Chicago
MSLD & MCLC 2016 at Indiana University
MSLD 2015 at TTI-Chicago
MSLD 2014 at UIUC
MSLD 2013 at TTI-Chicago
Illinois Speech Day 2012 at TTI-Chicago
Illinois Speech Day 2011 at TTI-Chicago
Illinois Speech Day 2010 at TTI-Chicago
Illinois Speech Day 2009 at TTI-Chicago
MCLC 2009 at Indiana University
MCLC 2008 at Michigan State University
MCLC 2006 at UIUC
MCLC 2005 at The Ohio State University
MCLC 2004 at Indiana University