Explore Computer Science Research (ExploreCSR) Program
My Experience with the ExploreCSR Program
by Grace O’Brien
Hi! My name is Grace O’Brien and I’m a rising senior at the University of Michigan. I am majoring in Pure Math and Spanish and minoring in Computer Science. This past semester I participated in Explore Computer Science Research (ExploreCSR), a program sponsored by Google in collaboration with Girls Encoded. ExploreCSR is a program designed to introduce undergraduate students from underrepresented groups to research in computer science and help build their confidence.
As the Fall 2020 semester began, I planned out my schedule and realized I would have time for another activity or project. As a math major, I have had experience with pure and applied math research, but I still wasn’t sure what type of career I wanted to pursue. I also greatly enjoyed my computer science and Spanish courses, so I started investigating how I could combine these various interests. After a bit of searching, I discovered the field of Computational Linguistics and Dr. Rada Mihalcea’s Language and Information Technologies (LIT) lab. I emailed Dr. Mihalcea to learn more about her research, and she suggested I apply to ExploreCSR.
ExploreCSR immediately interested me: navigating the world of computer science research had felt very intimidating, and the program seemed like a great way to dip my toes in and gain some experience. I was excited to meet other women interested in computer science and connect with mentors in the field.
Once I was accepted to ExploreCSR, I was paired with my mentor Allie Lahnala, a PhD student in the LIT lab. Allie and I got along great right off the bat. We have very similar academic interests and she was even in the same student organization (STEM Society) as me when she was an undergraduate. She was extremely supportive and welcomed me to the program with lots of ideas for topics we could research together. After a few weeks of literature review and acquainting myself with the current research, Allie and I decided to work on a project related to Computer-Assisted Language Learning, specifically for Second Language Acquisition of Spanish. This project has been very interesting to me because it falls right at the intersection of my interests; I get to use skills from all my areas of study. We found a Subreddit called r/WriteStreakES where posters learning Spanish write short responses to daily prompts and native Spanish speakers reply with corrections and suggestions to help improve their writing skills. For example, here is a post by a non-native speaker and its corresponding correction:
Using the data from this source, we are working on building a model to predict whether a sentence was written by a native or non-native speaker and, if non-native, whether the sentence contains an error. I started by reading through many of these posts to get an understanding of the data. I found that the most common type of error is mistaken adjective-noun agreement. Some native speakers give corrections like the above post, copying the text and making the necessary changes. However, others prefer to type a few bullet points with comments or explanations of how to use certain words. We decided that posts where the original text and the comment are well-aligned would be more helpful for us, as we can directly compare correct and incorrect sentences. To filter the posts to fit this criterion, we use Jaccard Similarity and Levenshtein Distance. Jaccard Similarity measures what proportion of the distinct words appearing in either of two strings of text appear in both. For example, the Jaccard Similarity between the following two strings is 0.5, as half of the distinct words across the two sentences are shared.
String 1: We ate dinner at the new restaurant.
String 2: We ate at a restaurant.
Number of words in common: 4
Number of distinct words in total: 8
Jaccard Similarity: 0.5
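The calculation above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function names `tokenize` and `jaccard_similarity` are hypothetical, and the choice to lowercase the text and strip ASCII punctuation before comparing words is an assumption (Spanish text would also need marks like ¿ and ¡ handled).

```python
import string


def tokenize(text: str) -> set:
    """Return the set of distinct words in `text`.

    Assumption: words are compared in lowercase with ASCII punctuation removed.
    """
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())


def jaccard_similarity(first: str, second: str) -> float:
    """Size of the shared word set divided by the size of the combined word set."""
    words_first, words_second = tokenize(first), tokenize(second)
    union = words_first | words_second
    if not union:
        # Convention chosen here: two empty strings count as identical.
        return 1.0
    return len(words_first & words_second) / len(union)


score = jaccard_similarity(
    "We ate dinner at the new restaurant.",
    "We ate at a restaurant.",
)
print(score)  # 4 shared words / 8 distinct words = 0.5
```

A score near 1 suggests the reply closely mirrors the original post, while a score near 0 suggests the reply is mostly commentary.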
Alternatively, Levenshtein Distance measures the minimum number of single-character edits (insertions, deletions, or substitutions) needed to convert one string of text into another. For example, by changing B to K, R to N, and inserting S, we can transform “Bitter” to “Kittens” in three steps.
String 1: Bitter
String 2: Kittens
Levenshtein Distance: 3
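For readers curious how this count can be computed, here is a minimal sketch of the standard dynamic-programming approach. It is an illustration under stated assumptions rather than the code used in the project: the function name `levenshtein_distance` is hypothetical, and the comparison is character-level and case-sensitive.

```python
def levenshtein_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target`."""
    # previous[j] = distance between the prefix of source processed so far
    # (initially the empty prefix) and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        # Turning the first i characters of source into an empty string takes i deletions.
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,                      # delete source_char
                current[j - 1] + 1,                   # insert target_char
                previous[j - 1] + substitution_cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("Bitter", "Kittens"))  # 3
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.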
These techniques allow us to pair a sentence written by a non-native speaker with the corrected sentence written by a native speaker, if one exists. Then, we train a binary classifier to identify whether each sentence was written by a native or non-native speaker. Currently, our classifier has only about 56% accuracy, so we are working to improve it by considering more linguistic features.
One of my favorite parts of ExploreCSR was the chance to connect with a mentor. I appreciated meeting Allie and learning more about what life as a graduate student looks like. Hearing firsthand what she is working on and the path she took to get here makes the prospect of attending graduate school seem much less intimidating. Unfortunately, due to the virtual semester, I didn’t have many opportunities to meet other undergraduate researchers in the program. However, I really enjoyed listening to everyone’s final presentations. It was fascinating to see how many different types of projects students came up with. Among the 30 students, no two projects were the same, which I think speaks to the breadth of topics within computer science. I particularly enjoyed the project investigating the implicit bias that is encoded into much of AI.
Although my career will likely not be closely related to my ExploreCSR project, participating in this program has definitely helped me develop research skills and build confidence as a woman in STEM. In the future, I plan to pursue a PhD in pure mathematics with the goal of working as a professor and researcher. I hope one day to be able to help organize a program like ExploreCSR to mentor the next generation of researchers so other women can benefit from the same opportunities I have had. Overall, participating in ExploreCSR was a wonderful experience for me and I’d recommend it to anyone interested in learning more about the wide variety of research areas in computer science.
About the ExploreCSR Program
by Allie Lahnala
When I began my PhD in CSE in Fall 2018, my advisor Professor Rada Mihalcea told me about a new program she was developing and asked if I wanted to be involved. That year, Google Research began offering exploreCSR (explore computer science research) awards at the start of the academic year to fund research initiatives that encourage historically marginalized students to pursue graduate studies and careers in computing research. According to the Computing Research Association’s 2020 Taulbee Survey, of the CS doctoral degree recipients in 2019-2020, 19.9% were female, and “the combined percentage of CS doctoral graduates who were American Indian or Alaska Native, Black or African American, Native Hawaiian/Pacific Islander, Hispanic, or Multiracial Non-Hispanic was 3.8 percent.” Prof. Mihalcea had just received an exploreCSR award with the idea to offer paid research opportunities in which students receive personal mentoring on a computing research project from experienced researchers (professors, research scientists, post-doctoral researchers, and senior doctoral students). The organizers within CSE would hold additional research skills workshops and socials for students to connect with each other (and eat tasty snacks). The program would provide an avenue for developing research skills and learning about careers in CS research through experience.
I thought about how such an opportunity might have impacted me and all the things I wish I had known before the start of my PhD. The idea of doing research had always appealed to me, but as an undergraduate struggling to keep up with my computer science courses and the financial costs of university studies, I had the idea that graduate school and computing research were beyond my reach. I had not known an undergraduate doing CS research who might have been able to tell me otherwise, and I was not even aware that tuition is funded and a stipend is provided for PhD students, and in many cases for Master’s students doing research as well. So when Prof. Mihalcea asked if I would be interested in helping organize the program, I was immediately invested, and have been involved each school year since 2018.
Any undergraduate is welcome to apply, whether they are a senior with several upper-level CS courses under their belt or a freshman who is still deciding their major and how computer science will fit in. Mainly, we look for candidates who show a budding curiosity or developed interests in CS research and are motivated by the issues of representation in the field. The selected exploreCSR participants always have a wide variety of interests, ranging from core computing aspects to the arts, language, social sciences, and medicine. The mentors conduct computing research across disciplines at the university, including the EECS department, the School of Information, and even the School of Music. With this variety of expertise, we try to pair each student with a mentor based on the student’s unique interests, in hopes that the students will research a topic that is most exciting for them.
At the end of the year, we hold a celebration where the students present what they have learned about computing research and are invited to share their work. Across nearly thirty students, there are nearly thirty new research challenges that most in the audience learn about for the first time. That is, the students not only embark on their own explorations of computing research, but also demonstrate to each other that there is a plethora of scientific challenges that require more minds like their own.