On the left is a list of AI-related projects and research I’ve conducted. On the right, the timeline contextualizes each project globally and personally, or provides some insight into what I was doing during the gaps in my work. I provide abstracts and links to related news articles, but if you would like to read a full paper or have further questions, please contact me.
This page roughly outlines the evolution of my relationship with AI. I first hand-coded a neural network from scratch my junior year of high school and taught myself NLP for my high school senior research project. Ever since, I have continued to foster cross-disciplinary collaboration, especially with the humanities, by leading workshops and conducting research that leverages AI methodologies to investigate questions relating to social politics, gender, Latin literature, and cultural linguistics. This research cultivated an understanding of the limitations of AI when implemented in social settings. After working at Microsoft and participating in FATE group meetings, I began shifting my focus to building ethical AI in clinical and health-related settings. Here, I apply my knowledge to create more equitable healthcare for under-resourced communities.
Using Machine Learning to Improve Community Health Worker Operations in Nigeria: A Collaboration with M’Care
MIT’s 6.7930 Machine Learning for Healthcare Graduate Course
Team: Darcy Kim, Beth Whittier, Joyce Luo, Angela Lin
May 2023
“In a remarkable collaboration between M’Care, a 2021 Health Security & Pandemics Solver team, and students from MIT, the landscape of child health care in Nigeria could undergo a transformative change, wherein the power of data is harnessed to improve child health outcomes in economically disadvantaged communities.” (See article linked on right for more!)
Abstract:
M’Care provides equitable access to healthcare services such as early detection, diagnosis, and treatment in Nigeria’s economically disadvantaged communities. They utilize live AI decision support to help community health extension workers provide personalized healthcare in rural communities.
In Nigeria, chronic malnutrition leading to impaired growth affects 32% of children, but only 1 out of 5 affected children receives treatment. Treatments and interventions for malnutrition under age 5 include disease prevention, specially formulated nutrient-dense therapeutic foods, and supplemental vitamins and micronutrient powders. Organizing the administration of interventions will help decrease early childhood malnutrition and its consequent health complications. M’Care is addressing this issue by running studies in which healthcare workers administer doses of micronutrient powder, vitamin A, and zinc to children under five. M’Care would like to leverage the data collected from their current operations to maximize the number of children to whom they can successfully deliver vitamin and nutrient interventions.

Our team was tasked with predicting whether beneficiaries (child-mother pairs) would “complete” their prescribed intervention series, as well as predicting the aggregate volume of contacts over time. We used various binary classification methods to predict whether beneficiaries would complete their next scheduled contact. This allows M’Care to target their reminders to beneficiaries who are at risk of missing their next scheduled contact. We also used time series models to predict the aggregate volume of contacts, which helps M’Care plan ahead for the resources (vitamins/nutrients, health workers) to have available at any given time. Our contributions are the following:
• Analyze data from M’Care’s current operations to understand current health worker workload and intervention completion rates
• Predict whether each beneficiary will complete their next recommended contact/dose of an intervention
• Predict aggregate volume of expected contacts over time
• Discuss implications and make recommendations to M’Care and their supporters, including the Ministry of Public Health and World Bank
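Below is a minimal sketch of the binary-classification step described in the abstract, using logistic regression as one plausible choice among the “various binary classification methods.” The file name and feature columns are illustrative assumptions, not M’Care’s actual schema.

```python
# Hedged sketch: predict whether a beneficiary completes their next scheduled
# contact. The CSV file and feature names are hypothetical stand-ins.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("beneficiary_contacts.csv")  # assumed export of contact records

features = ["child_age_months", "prior_contacts", "days_since_last_contact",
            "distance_to_clinic_km"]
X, y = df[features], df["completed_next_contact"]  # 1 = attended next scheduled dose

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

# Beneficiaries with a low predicted completion probability are the ones
# to target with reminders.
at_risk = X_test[clf.predict_proba(X_test)[:, 1] < 0.5]
```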
How Hearts Beat: Classifying Failing Hearts into HFrEF vs HFpEF using Image Embeddings from Chest X-Rays
MIT’s 6.8300 Advances in Computer Vision Course
Independent Project
May 2023
For this independent project, I built a deep learning model that can make the diagnosis and management of heart failure more equitable, using computer vision techniques on real medical data (EHR) and chest X-rays collected from Beth Israel Hospital in Boston, MA. I further contextualize and analyze my model within the socio-political issues inherent in the US medical system. I hope to continue building sensitive AIs that provide equitable care and reparations for systemic injustices in the medical system.
Abstract:
Heart failure is a major public health challenge with growing costs. Ejection fraction (EF) is a key metric for the diagnosis and management of heart failure. However, estimating EF using echocardiography is expensive and requires extensive expertise.
An existing deep learning model is able to identify EF from chest X-rays, which are quick, inexpensive, and require less expertise. However, this model requires enormous computational resources to train and has inequitable performance across race. This paper introduces a model that distinguishes EF in chest X-rays using image embeddings from the MIMIC dataset. Compared to the existing model trained for the same task on raw images, the proposed model performs similarly while significantly reducing the computational resources needed for training and updating. Upon further model analysis based on race, gender, and insurance, the proposed embedding model shows performance bias correlated with representation in the dataset. Finally, further discussion of model bias in clinical settings puts the model into the context of the American healthcare system.
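To make the core idea concrete, here is a hedged sketch of a classifier over precomputed chest X-ray embeddings rather than raw pixels. The file names, embedding source, and label encoding are assumptions, not the paper’s actual artifacts.

```python
# Sketch: HFrEF vs. HFpEF from fixed image embeddings. A small MLP over
# precomputed vectors trains in minutes on a CPU, versus the far larger
# cost of training a raw-image model end to end.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

X = np.load("cxr_embeddings.npy")  # hypothetical (n_studies, d) embedding matrix
y = np.load("labels.npy")          # assumed: 1 = reduced EF (HFrEF), 0 = preserved (HFpEF)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```

The same held-out predictions can then be stratified by race, gender, and insurance status to audit the performance gaps discussed above.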
Building AIs to Predict the Energy Landscapes of Proteins
Rocklin Lab @ Northwestern School of Medicine
Contributing to Rocklin Lab’s HDX Project
Summer 2023
As part of the Rosetta Commons REU program, I contributed research to the Rocklin Lab @ Northwestern School of Medicine (linked right). I creatively adapted Natural Language Processing, Graph Neural Networks, general AI methodology, and various data analysis tools to predict energy landscapes and conformations of proteins. Proteins sample different conformational states that are populated according to their relative energy. Changes in the energy landscape can alter the population of intermediate states and result in disease, and the work I did has implications for drug design and the treatment of conformational diseases.
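As a small worked example of that premise, the sketch below computes Boltzmann populations from relative energies; the state names and energy values are made up for illustration, not lab data.

```python
# Conformational states are populated according to their relative energies:
# p_i = exp(-E_i / kT) / Z. All numbers here are illustrative.
import numpy as np

kT = 0.593  # kcal/mol at ~298 K
energies = np.array([0.0, 1.2, 2.5])  # hypothetical relative energies (kcal/mol)
weights = np.exp(-energies / kT)
populations = weights / weights.sum()

for state, p in zip(["native", "intermediate", "unfolded"], populations):
    print(f"{state}: {p:.1%}")
# Shifting a state's relative energy (e.g. via a mutation or a bound drug)
# redistributes these populations, which is the link to disease and drug design.
```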
The poster below outlines my initial approach.

Microsoft Internship & FATE Group
Software Engineer and PM Intern @ Redmond, WA
FATE Group
Summer 2022
While interning at Microsoft, I developed a micro-feedback tool for Edge’s new tab page on a team with three other interns, using Fluent UI, CSS, and TypeScript. We designed the component to be modular, dynamic, and accessible, working in an Agile environment and collaborating with designers, the broader Edge team, and accessibility experts.
In addition, I participated in their Fairness, Accountability, Transparency, and Ethics (FATE) in AI team meetings. The FATE team develops collaborative techniques to facilitate the ethical implementation of AI. They study how societal issues like systemic bias and discriminatory behavior manifest in complex real-world socio-technical systems, and they develop both quantitative and qualitative techniques to help mitigate such issues and guide the design and application of those systems.
As impressive as their ideas are, when I talked to the people actually implementing language models in Microsoft’s products, these ideas were largely ignored (in my personal observation and limited experience). Despite devoting research to mitigating harms, Microsoft continues to produce technology that amplifies systemic violence using AI. The research is estranged from the product, and the product is estranged from the people who use it. Nonetheless, change takes time, and I greatly admire the work the FATE Group does. Their ideas continue to inspire me, and I have centered their practices for creating ethical AI in every project I have pursued since.
Using Word Vectors (NLP) to trace semantic shifts surrounding the language of LGBTQ communities in 20th-21st century US
Wellesley College’s LING 246 Corpus Linguistics Course
Independent Project
May 2022
Word embeddings are an ML method that represents each English word by a vector, where the geometry between vectors captures semantic relations between the corresponding words. The paper “Word embeddings quantify 100 years of gender and ethnic stereotypes” demonstrates “that word embeddings can be used as a powerful tool to quantify historical trends and social change. As specific applications, we develop metrics based on word embeddings to characterize how gender stereotypes and attitudes toward ethnic minorities in the United States evolved during the 20th and 21st centuries starting from 1910. Our framework opens up a fruitful intersection between machine learning and quantitative social science” (Garg et al. 2018).
For my project, I expand their methodology with techniques from various disciplines, including political science, corpus linguistics, and history, to explore how the language surrounding LGBTQ communities evolved through the 20th- and 21st-century United States.
Abstract:
This paper uses a combination of natural language processing, a subfield of artificial intelligence/computer science, and traditional corpus linguistic techniques to explore how the language surrounding LGBTQ communities evolved through the 20th- and 21st-century United States. I also provide historical and legal context to investigate whether significant shifts in the language surrounding LGBTQ people align with relevant historical events. The main analysis traces the semantic shift of the word “gay” through the Corpus of Historical American English over time. Using k-nearest neighbors, collocation, stereotype quantification from word embeddings, and concordance, I found that the language surrounding the word “gay” shifted significantly around 1960 and 1990. These two periods correspond with significant events in the gay rights movement and the AIDS epidemic. The discussion is supplemented with analysis of other words relating to the LGBTQ community (“homosexual,” “lesbian,” and “queer”) using similar methods.
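A hedged sketch of the nearest-neighbor portion of this analysis appears below: compare the closest words to “gay” in embeddings trained on different slices of COHA. The per-decade vector files are placeholders and would need to be trained or obtained separately.

```python
# Sketch: track the nearest neighbors of "gay" across decades of COHA.
# File paths are hypothetical; gensim loads word2vec-format text files.
from gensim.models import KeyedVectors

for decade in ["1950", "1960", "1990", "2000"]:
    vecs = KeyedVectors.load_word2vec_format(f"coha_{decade}.txt")  # assumed path
    print(decade, [w for w, _ in vecs.most_similar("gay", topn=10)])
```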
How Nasty was Nero Really?
A Computational Investigation of Original Latin Text Using ML Classifiers
Wellesley College’s CS 232 Artificial Intelligence Course
Independent Project
May 2022
Nero is regarded as the most infamous Roman emperor. Historical accounts cement his notoriety for slaughtering his loved ones, burning the city of Rome, and extreme tyranny. However, Rebecca Mead reports in her New Yorker article “How Nasty Was Nero Really?” that a 2021 show at the British Museum offered a less sensationalist account of Nero’s reign, since modern scholars recognize that many accounts of Nero’s atrocities closely resemble literary accounts of mythical events. In other words, the evils often associated with his reign might be grossly exaggerated for the sake of story and propaganda.
Abstract: I investigate, from a computational standpoint, the notion that the primary Latin sources on Nero more closely resemble myth than history. To do so, I created a neural network classification model, built on the Latin BERT language model, that determines whether a Latin text is historical or mythological. I then feed in primary Latin source texts on Nero so the model can determine whether the literature reads as more mythological or historical.
My results support the notion that some primary Latin sources on Nero resemble mythological texts more than historical ones. In other words, these computational conclusions reflect the conclusions drawn by traditional classical scholarship in this area of research.
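For the curious, here is a minimal sketch of the inference side of such a pipeline, assuming a Latin BERT checkpoint already fine-tuned on passages labeled historical vs. mythological. The checkpoint path, label order, and input excerpt are all placeholders.

```python
# Hedged sketch: score a Latin passage as historical vs. mythological with a
# fine-tuned sequence classifier. Paths and label order are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "path/to/latin-bert-finetuned"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

passage = "..."  # a primary-source Latin excerpt on Nero
inputs = tokenizer(passage, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print({"historical": probs[0, 0].item(), "mythological": probs[0, 1].item()})
```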
EASEL Lab
Wellesley College Research Lab
Research Assistant
August 2021 – May 2022
I joined the EASEL Lab my sophomore year at Wellesley, where I interrogated bias in AI language models under the mentorship of Prof. Carolyn Anderson and in collaboration with other students. We explored how representations of women change over time, using Wikipedia as the dataset and word vectors as a quantifying metric. I used state-of-the-art NLP techniques such as BERT and word2vec, big-data analysis, and social science methodologies.
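As a rough illustration of the word2vec side of that work, the sketch below trains vectors on a slice of text and probes how a target word relates to gendered anchor words; the corpus file and word choices are illustrative assumptions, not the lab’s actual setup.

```python
# Sketch: train word2vec on a pre-tokenized corpus slice and compare
# similarities to gendered anchor words. The file name is hypothetical.
from gensim.models import Word2Vec

# one pre-tokenized sentence per line
sentences = [line.split() for line in open("wikipedia_slice.txt", encoding="utf-8")]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)

print(model.wv.similarity("scientist", "she"),
      model.wv.similarity("scientist", "he"))
```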
ML Workshop for Humanities Students
Google Developer Student Club | Wellesley College Chapter
December 2020
As a member of my school’s Google Developer Student Club, I led a workshop for humanities majors on AI and how they could use it in their research. I had learned from my high school senior research project the importance of cross-disciplinary collaboration in AI research. By sharing my knowledge of this technology, I opened the door for collaboration with students with different skill sets and research interests. Preparing this workshop also taught me how to teach highly technical methodologies to people outside my discipline and make this knowledge more accessible beyond my field of study.
Using NLP and word vectors to trace how the portrayal of women in American novels changed in relation to politics and culture
Thomas Jefferson High School for Science and Technology
Senior Research Project
Fall 2019-Spring 2020
My senior year of high school, I used AI to trace how the portrayal of women in American novels changed over time. I adored English class, where I found windows into cultures across time and space by entering the world of a novel. I thought AI could allow me to practice this analysis over thousands of books instead of working through one at a time. My project did yield interesting results: women’s portrayal evolved alongside major political shifts such as women’s suffrage and the landmark SCOTUS cases overturning gender discrimination in the 1970s. However, the single narrative produced by the AI did not encompass the multiplicity of experiences of various women across America. I had to refer to specific examples in specific texts that weren’t represented by the AI’s narrative. Black, queer, Asian, and trans women’s stories were being erased in this generalized analysis. When a model takes big data as input and outputs a single answer, it forsakes the multitudes of experiences that are underrepresented in the data.
Hello (Artificial) World! My First AI Algorithms
Thomas Jefferson High School for Science and Technology
Independent Projects
2018-2019
During my junior year of high school, I built a neural network from scratch. I didn’t use any packages; I hand-coded the tedious matrix calculations for the feed-forward and backpropagation functions in Python. Doing this gave me a deep technical understanding of the math and statistics behind AI algorithms. I felt shock and joy each time I watched my code learn to solve increasingly complicated problems without my explicit instruction. I trained AIs to classify points as inside or outside high-dimensional organic regions, predict whether a passenger would survive the Titanic, and play a competitive game of Othello/Reversi.
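Below is a minimal sketch of the kind of network described, with the feed-forward and backpropagation math written out by hand. NumPy is used here only for array arithmetic (the original project avoided packages entirely), and the toy task of classifying points inside or outside a disk stands in for the “organic regions” problem.

```python
# One hidden layer, sigmoid activations, mean-squared-error gradients,
# all matrix math explicit. Task: is a 2D point inside a disk?
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
y = ((X ** 2).sum(axis=1) < 0.5).astype(float).reshape(-1, 1)

W1, b1 = rng.normal(0, 0.5, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
for _ in range(2000):
    # feed forward
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backpropagation: chain rule through the sigmoids, written out
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

print("training accuracy:", ((out > 0.5) == y).mean())
```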