Midwest Speech and Language Days (MSLD) is a two-day meeting that continues and expands upon the tradition of Illinois Speech Day and the Midwest Computational Linguistics Colloquium. After a pandemic-induced hiatus, MSLD returns in 2024 at the University of Michigan. This non-archival meeting invites presenters and attendees from Midwest universities and research institutions to discuss recent and in-progress work. The goal is to increase awareness of speech and language research in the region and to foster collaboration among sites.
Visiting Ann Arbor
We welcome all to the University of Michigan! The organizing committee and students have put together a list of things to see and do and places to eat around Ann Arbor.
Keynote Speakers:
- Emma Strubell (Carnegie Mellon University)
- Betsy Sneller (Michigan State University)
- Eric Fosler-Lussier (The Ohio State University)
- Hao Peng (University of Illinois Urbana-Champaign)
Organizers:
- Steve Abney, University of Michigan
- Dallas Card, University of Michigan
- Joyce Chai, University of Michigan
- David Jurgens, University of Michigan
- Rada Mihalcea, University of Michigan
- Emily Mower Provost, University of Michigan
- VG Vinod Vydiswaran, University of Michigan
- Lu Wang, University of Michigan
- Justine Zhang, University of Michigan
Presentation Modalities:
- Talks: Talk slots are 15 minutes, including 10 minutes for the talk and 5 minutes for questions and switching presenters.
- Posters: Posters should be in landscape orientation and no larger than 48″ x 36″ (width x height).
Schedule of Events
MONDAY, April 15
08:00-09:00 Breakfast/Registration at the Michigan League
09:00-09:15 Welcome
09:15-10:00 Keynote 1: Emma Strubell (CMU): “LLMs: Everything’s Different and Nothing Has Changed”
10:00-10:30 Coffee Break 1
10:30-11:30 Talk Session 1
10:30-10:45 Bohan Zhang, Yixin Wang, Paramveer Dhillon (Univ. Michigan) Causal Inference for Human-Language Model Collaboration
10:45-11:00 Christian Clark, William Schuler (The Ohio State) Categorial Grammar Induction from Raw Data
11:00-11:15 Achyutarama R Ganti, Steven R. Wilson, Wing-Yue Geoffrey Louie (Oakland University) Cross-Domain Classification of Educational Talk Turns
11:15-11:30 Joshua Gryzen, Yuliya Lierler (Univ. of Nebraska, Omaha) Evaluating Open-Source Large Language Models on bAbI-Tasks
11:30-12:30 Poster Session 1
12:30-14:00 Lunch
14:00-15:15 Talk Session 2
14:00-14:15 Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li (IIT) Can Large Language Model Agents Simulate Human Trust Behaviors?
14:15-14:30 Chenghao Yang, Allyson Ettinger (Univ. of Chicago) Can You Follow Me? Testing Situational Understanding in ChatGPT
14:30-14:45 So Yeon Min, Xavier Puig, Devendra Singh Chaplot, Tsung-Yen Yang, Akshara Rai, Priyam Parashar, Ruslan Salakhutdinov, Yonatan Bisk, Roozbeh Mottaghi (CMU) Situated Instruction Following
14:45-15:00 Fan Huang, Haewoon Kwak, Kunwoo Park, Jisun An (Indiana University) ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales?
15:00-15:15 Tunazzina Islam (Purdue) Uncovering Latent Arguments in Social Media Messaging by Employing a LLMs-in-the-Loop Strategy
15:15-16:15 Coffee Break 2 + Poster Session 2
16:15-17:00 Keynote 2: Betsy Sneller (MSU): “Real talk: Naturalistic speech in the MI Diaries corpus”
TUESDAY, April 16
08:00-09:00 Breakfast/Registration at the Michigan League
09:00-09:45 Keynote 3: Eric Fosler-Lussier (OSU): “Borderlands: When Speech Meets Text”
09:45-10:45 Talk Session 3
09:45-10:00 Ruiyi Wang, Haofei Yu, Wenxin Sharon Zhang, Zhengyang Qi, Maarten Sap, Graham Neubig, Yonatan Bisk, Hao Zhu (CMU) SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
10:00-10:15 Andy Yang (Notre Dame) Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages
10:15-10:30 Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su (The Ohio State) GPT-4V(ision) is a Generalist Web Agent, if Grounded
10:30-10:45 Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu (TTIC) Few-Shot Spoken Language Understanding Via Joint Speech-Text Models
10:45-12:15 Coffee Break 3 and Poster Session 3
12:15-13:30 Lunch
13:30-14:15 Talk Session 4
13:30-13:45 Meera Desai, Irene Pasquetto, Abigail Z Jacobs, Dallas Card (Univ. Michigan) An Archival Perspective on Pretraining Data
13:45-14:00 Clara Na, Ian Magnusson, Ananya Harsh Jha, Tom Sherborne, Emma Strubell, Jesse Dodge, Pradeep Dasigi (CMU) Approximating training data ablations for language models
14:00-14:15 Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang (Univ. of Minnesota) How Far Can We Extract Diverse Perspectives from Large Language Models?
14:15-15:00 Keynote 4: Hao Peng (UIUC): “Pushing the Boundaries of Length Generalization and Reasoning Capabilities of Open LLMs”
15:00-15:15 Closing
Poster Sessions
Posters are assigned numbers corresponding to specific places in the room, so please use the following guide. Most posters are presented in two sessions.
Poster Session 1
- Muhammad S. Abdo, Damir Cavar The Hoosier Ellipsis Corpus: Building a Corpus of Ellipsis for Arabic Natural Language Processing
- Yingshan Chang, Yasi Zhang, Jacob Zhiyuan Fang, Ying Nian Wu, Yonatan Bisk, Feng Gao Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
- Chenghao Yang, Tuhin Chakrabarty, Karli R Hochstatter, Melissa N Slavin, Nabila El-Bassel, Smaranda Muresan Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts
- Katsumi Ibaraki, Winston Wu, Lu Wang, Rada Mihalcea Analyzing Occupational Distribution Representation in Japanese Language Models
- Jood Otey, Laura Biester, Steven R. Wilson Multilingual Error Analysis For Offensive Language
- Santiago Castro, Amir Ziai, Avneesh Saluja, Zhuoning Yuan, Rada Mihalcea CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
- Yijun Pan, Sushrita Rakshit, Daniel Tian, Hua Shen, Kenan Alkiek, David Jurgens Interpreting Spatial Reasoning Capabilities in Language Models
- Kevin Christian Wibisono, Yixin Wang Causal Inference with Text Data via Maximizing Contrasts
- Stephen Bothwell, Justin DeBenedetto, Theresa Crnkovich, Hildegund Muller, David Chiang Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines
- Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Apostolos Dedeloudis, Jackson Sargent, David Jurgens POTATO: The Portable Text Annotation Tool
- Aylin Ece Gunal, Djallel Bouneffouf Are LLMs Rational Agents? Iterated Prisoner’s Dilemma to Understand LLM Strategy
- Damir Cavar, Ludovic Mompelat, Muhammad S. Abdo The Hoosier Ellipsis Corpus (HELC): Documenting Linguistic Dark Matter
- Tejes Srivastava, Ju-Chieh Chou, Priyank Shroff, Karen Livescu, Christopher Graziul Speech Recognition for Analysis of Police Radio Communication
- Anna Wegmann, Tijs van den Broek, Dong Nguyen What’s Mine becomes Yours: Defining, Annotating and Detecting Context-Dependent Paraphrases in News Interview Dialogs
- Aparna Ananthasubramaniam, Daniel Romero, David Jurgens Using Text Classifiers to Study how Macroeconomic Context Moderates Socioeconomic Determinants of Suicide
- Mingyue Huo Aligning Speech and Hum Pairs Based on Dynamic Time Warping
- Mingqian Zheng, Jiaxin Pei, Lajanugen Logeswaran, Moontae Lee, David Jurgens Is “A Helpful Assistant” the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts
- Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, Rada Mihalcea Tables as Images? Exploring the Strengths and Limitations of LLMs on Multimodal Representations of Tabular Data
- Jason Yan, Tong Lin, Yanna Krupnikov, Kerri Milita, Sabina J Tomkins I’ve Seen That Before! Towards Understanding Hard News Exposure from Soft News Outlets
- Alyssa Allen SQL explainability via LLM generated comments
- Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea Governance of the Commons Benchmark for LLM Agents
- Tong Lin, Jason Yan, Sabina J Tomkins Tab2Text: Transforming tabular data to text with LLMs
- Grace LeFevre, Liam Frölund, Lori Beaman, Rob Voigt PLM-Augmented Rule-Based Classifiers: A Lightweight Method for Improving the Generalizability of Expert Knowledge in Novel Information Extraction Tasks
- Jorge Fandinno, Yuliya Lierler tExplain: Information Extraction with Explanations
- Ziru Chen, Michael White, Ray Mooney, Ali Payani, Yu Su, Huan Sun When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
- Canyu Chen, Kai Shu Can LLM-Generated Misinformation Be Detected?
- Nam Ho Koh, Santiago Castro, Rada Mihalcea Text over Context: Navigating the Mirage of Model Gaslighting in Visual Language Frameworks
- Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu Few-Shot Spoken Language Understanding Via Joint Speech-Text Models
- Andy Yang Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages
- Ruiyi Wang, Haofei Yu, Wenxin Sharon Zhang, Zhengyang Qi, Maarten Sap, Graham Neubig, Yonatan Bisk, Hao Zhu SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
- Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su GPT-4V(ision) is a Generalist Web Agent, if Grounded
- Yanhong Li, Chenghao Yang, Allyson Ettinger When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
Poster Session 2
- Aarohi Srivastava, David Chiang BERTwich: Extending BERT’s Capabilities to Model Dialectal and Noisy Text
- Yingshan Chang, Yasi Zhang, Jacob Zhiyuan Fang, Ying Nian Wu, Yonatan Bisk, Feng Gao Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
- Trisha Thomas, Antje Stoehr, Ying Xu The role of bilingual proficiency in ASR performance of children’s speech
- Katsumi Ibaraki, Winston Wu, Lu Wang, Rada Mihalcea Analyzing Occupational Distribution Representation in Japanese Language Models
- Billy Dickson, Sahaj Singh Maini, Zoran Tiganj Combining LLMs and cognitive models of memory
- Santiago Castro, Amir Ziai, Avneesh Saluja, Zhuoning Yuan, Rada Mihalcea CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
- Yanhong Li, Jiawei Zhou Enhancing Language Modeling with Adaptive Chunk Distilled Generation
- Kevin Christian Wibisono, Yixin Wang Causal Inference with Text Data via Maximizing Contrasts
- Angana Borah, Aparna Garimella, Rada Mihalcea Towards Region-aware Bias Evaluation Metrics
- Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Apostolos Dedeloudis, Jackson Sargent, David Jurgens POTATO: The Portable Text Annotation Tool
- Jiazhao Li, Yijin Yang, Zhuofeng Wu, V.G.Vinod Vydiswaran, Chaowei Xiao ChatGPT as an Attack Tool: Stealthy Backdoor Attack via Blackbox Generative Model Trigger
- Damir Cavar, Ludovic Mompelat, Muhammad S. Abdo The Hoosier Ellipsis Corpus (HELC): Documenting Linguistic Dark Matter
- Anna Wegmann, Tijs van den Broek, Dong Nguyen What’s Mine becomes Yours: Defining, Annotating and Detecting Context-Dependent Paraphrases in News Interview Dialogs
- Michelle YoungJin Kim, Junghwan Kim ABLE: Agency-BeLiefs Embedding to Address Stereotypical Bias Through Awareness Instead of Obliviousness
- Mingyue Huo Aligning Speech and Hum Pairs Based on Dynamic Time Warping
- Kenneth Sible, David Chiang Improving Rare Word Translation with Dictionaries and Attention Masking
- Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, Rada Mihalcea Tables as Images? Exploring the Strengths and Limitations of LLMs on Multimodal Representations of Tabular Data
- Tianliang Xu, Haofei Xu, Chang Ge, Justine Zhang, Sabina J Tomkins A large-scale dataset to examine political discourse in local governance
- Alyssa Allen SQL explainability via LLM generated comments
- Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang Universal Automatic Phonetic Transcription into the International Phonetic Alphabet
- Tong Lin, Jason Yan, Sabina J Tomkins Tab2Text: Transforming tabular data to text with LLMs
- Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
- Jorge Fandinno, Yuliya Lierler tExplain: Information Extraction with Explanations
- Seyed Ali Alavi Bajestan, Donald S Williamson Contrastive learning approach for source identification in auditory attention detection
- Canyu Chen, Kai Shu Can LLM-Generated Misinformation Be Detected?
- Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang How Far Can We Extract Diverse Perspectives from Large Language Models?
- Achyutarama R Ganti, Steven R. Wilson, Wing-Yue Geoffrey Louie Cross-Domain Classification of Educational Talk Turns
- Christian Clark, William Schuler Categorial Grammar Induction from Raw Data
- Clara Na, Ian Magnusson, Ananya Harsh Jha, Tom Sherborne, Emma Strubell, Jesse Dodge, Pradeep Dasigi Approximating training data ablations for language models
- Bohan Zhang, Yixin Wang, Paramveer Dhillon Causal Inference for Human-Language Model Collaboration
- Joshua Gryzen, Yuliya Lierler Evaluating Open-Source Large Language Models on bAbI-Tasks
- Chenghao Yang, Allyson Ettinger Can You Follow Me? Testing Situational Understanding in ChatGPT
- Meera Desai, Irene Pasquetto, Abigail Z Jacobs, Dallas Card An Archival Perspective on Pretraining Data
Poster Session 3
- Muhammad S. Abdo, Damir Cavar The Hoosier Ellipsis Corpus: Building a Corpus of Ellipsis for Arabic Natural Language Processing
- Aarohi Srivastava, David Chiang BERTwich: Extending BERT’s Capabilities to Model Dialectal and Noisy Text
- Chenghao Yang, Tuhin Chakrabarty, Karli R Hochstatter, Melissa N Slavin, Nabila El-Bassel, Smaranda Muresan Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts
- Trisha Thomas, Antje Stoehr, Ying Xu The role of bilingual proficiency in ASR performance of children’s speech
- Jood Otey, Laura Biester, Steven R. Wilson Multilingual Error Analysis For Offensive Language
- Billy Dickson, Sahaj Singh Maini, Zoran Tiganj Combining LLMs and cognitive models of memory
- Yijun Pan, Sushrita Rakshit, Daniel Tian, Hua Shen, Kenan Alkiek, David Jurgens Interpreting Spatial Reasoning Capabilities in Language Models
- Yanhong Li, Jiawei Zhou Enhancing Language Modeling with Adaptive Chunk Distilled Generation
- Stephen Bothwell, Justin DeBenedetto, Theresa Crnkovich, Hildegund Muller, David Chiang Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines
- Angana Borah, Aparna Garimella, Rada Mihalcea Towards Region-aware Bias Evaluation Metrics
- Aylin Ece Gunal, Djallel Bouneffouf Are LLMs Rational Agents? Iterated Prisoner’s Dilemma to Understand LLM Strategy
- Jiazhao Li, Yijin Yang, Zhuofeng Wu, V.G.Vinod Vydiswaran, Chaowei Xiao ChatGPT as an Attack Tool: Stealthy Backdoor Attack via Blackbox Generative Model Trigger
- Tejes Srivastava, Ju-Chieh Chou, Priyank Shroff, Karen Livescu, Christopher Graziul Speech Recognition for Analysis of Police Radio Communication
- Yanhong Li, Chenghao Yang, Allyson Ettinger When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
- Aparna Ananthasubramaniam, Daniel Romero, David Jurgens Using Text Classifiers to Study how Macroeconomic Context Moderates Socioeconomic Determinants of Suicide
- Michelle YoungJin Kim, Junghwan Kim ABLE: Agency-BeLiefs Embedding to Address Stereotypical Bias Through Awareness Instead of Obliviousness
- Mingqian Zheng, Jiaxin Pei, Lajanugen Logeswaran, Moontae Lee, David Jurgens Is “A Helpful Assistant” the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts
- Kenneth Sible, David Chiang Improving Rare Word Translation with Dictionaries and Attention Masking
- Jason Yan, Tong Lin, Yanna Krupnikov, Kerri Milita, Sabina J Tomkins I’ve Seen That Before! Towards Understanding Hard News Exposure from Soft News Outlets
- Tianliang Xu, Haofei Xu, Chang Ge, Justine Zhang, Sabina J Tomkins A large-scale dataset to examine political discourse in local governance
- Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea Governance of the Commons Benchmark for LLM Agents
- Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang Universal Automatic Phonetic Transcription into the International Phonetic Alphabet
- Grace LeFevre, Liam Frölund, Lori Beaman, Rob Voigt PLM-Augmented Rule-Based Classifiers: A Lightweight Method for Improving the Generalizability of Expert Knowledge in Novel Information Extraction Tasks
- Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
- Ziru Chen, Michael White, Ray Mooney, Ali Payani, Yu Su, Huan Sun When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
- Seyed Ali Alavi Bajestan, Donald S Williamson Contrastive learning approach for source identification in auditory attention detection
- Nam Ho Koh, Santiago Castro, Rada Mihalcea Text over Context: Navigating the Mirage of Model Gaslighting in Visual Language Frameworks
- Fan Huang, Haewoon Kwak, Kunwoo Park, Jisun An ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales?
- Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li Can Large Language Model Agents Simulate Human Trust Behaviors?
- Tunazzina Islam Uncovering Latent Arguments in Social Media Messaging by Employing a LLMs-in-the-Loop Strategy
- So Yeon Min, Xavier Puig, Devendra Singh Chaplot, Tsung-Yen Yang, Akshara Rai, Priyam Parashar, Ruslan Salakhutdinov, Yonatan Bisk, Roozbeh Mottaghi Situated Instruction Following
- Benjamin Roger Litterer, David Jurgens, Dallas Card When it Rains, it Pours: Modeling Media Storms and the News Ecosystem
Registration Details
Register to attend here. As with past MSLD events, registration is free, though space may be limited. We especially encourage junior scholars to submit their work. Please register by Monday, April 1.
Call for Abstracts:
MSLD invites abstracts of published work and work in progress. Abstracts are non-archival, and no proceedings will be published. Submissions should not include author names. Abstracts may be submitted as text and may optionally include a PDF with formatted text, tables, and figures. Because MSLD brings together multiple research communities, there is no required format or template for PDF submissions.
Abstracts may be submitted through OpenReview here. To be considered for a talk, abstracts must be submitted by midnight (AoE) on Sunday, March 24. Late abstracts may be considered for a poster. Submission decisions will be announced the week of March 25.
Travel and Accommodations:
We have secured three hotel room blocks at different locations in the city. Please see this map of the venue, hotel room blocks, additional lodging options, local restaurants and things to do. We anticipate offering some limited travel assistance for presenters and attendees who would otherwise be unable to attend. Please fill out the relevant parts of the registration form to indicate you might need such assistance.
Inn at the Michigan League ($180/night + tax)
Guests can book online at this link (use code SPEECH2024).
Room block ends: March 14
Bell Tower Hotel ($179/night + tax)
Guests can book online at this link.
Room block ends: March 14
Microtel Inn & Suites by Wyndham Ann Arbor
- 1 Queen Bed, Non-Smoking (NQ1): $109/night + tax
- 2 Queen Beds, Non-Smoking (NQQ1): $119/night + tax
- 1 Queen Bed, Studio Suite, Non-Smoking (SNQ1): $119/night + tax
Guests will need to call 877-361-2512 for individual reservations and reference itinerary# 5136B420390585.
Room block ends: March 14
The Microtel hotel is served by AATA routes 65 and 23, which provide service to the U-M campus. Please see the campus travel pages for details on how to get to Ann Arbor and around the University of Michigan campus.
If you arrange your own lodging near Briarwood Mall, TheRide bus routes 6, 24, and 62 run to campus; rideshares are also available.
If you are flying to MSLD, an inexpensive option for traveling between Detroit Metro Airport (DTW) and Ann Arbor is the Michigan Flyer bus ($15 one way; $25 round trip).
Past Events:
- MSLD 2019 at TTI-Chicago
- MSLD 2018 at the University of Notre Dame
- MSLD & MCLC 2017 at TTI-Chicago
- MSLD & MCLC 2016 at Indiana University
- MSLD 2015 at TTI-Chicago
- MSLD 2014 at UIUC
- MSLD 2013 at TTI-Chicago
- Illinois Speech Day 2012 at TTI-Chicago
- Illinois Speech Day 2011 at TTI-Chicago
- Illinois Speech Day 2010 at TTI-Chicago
- Illinois Speech Day 2009 at TTI-Chicago
- MCLC 2009 at Indiana University
- MCLC 2008 at Michigan State University
- MCLC 2006 at UIUC
- MCLC 2005 at The Ohio State University
- MCLC 2004 at Indiana University