On the left is a list of AI-related projects and research I’ve conducted. On the right, the timeline contextualizes each project globally and personally, or provides some insight into what I was doing during the gaps in my work. I provide abstracts and links to related news articles, but if you would like to read a full paper or have further questions, please contact me.
This page roughly outlines the evolution of my relationship with AI. I first hand-coded a neural network from scratch my junior year of high school and taught myself NLP for my high school senior research project. Ever since, I have continued to foster cross-disciplinary collaboration, especially with the humanities, by leading workshops and conducting research that leverages AI methodologies to investigate questions relating to social politics, gender, Latin literature, and cultural linguistics. This research cultivated an understanding of the limitations of AI when implemented in social settings. After working at Microsoft and participating in FATE group meetings, I began shifting my focus to building ethical AI in clinical and health-related settings. Here, I apply my knowledge to create more equitable healthcare for under-resourced communities.
Using Machine Learning to Improve Community Health Worker Operations in Nigeria: A Collaboration with M’Care
MIT’s 6.7930 Machine Learning for Healthcare Graduate Course
Team: Darcy Kim, Beth Whittier, Joyce Luo, Angela Lin
May 2023
“In a remarkable collaboration between M’Care, a 2021 Health Security & Pandemics Solver team, and students from MIT, the landscape of child health care in Nigeria could undergo a transformative change, wherein the power of data is harnessed to improve child health outcomes in economically disadvantaged communities.” (See article linked on right for more!)
Abstract:
M’Care provides equitable access to healthcare services such as early detection, diagnosis, and treatment in Nigeria’s economically disadvantaged communities. They utilize live AI decision support to help community health extension workers provide personalized healthcare in rural communities.
In Nigeria, chronic malnutrition leading to impaired growth affects 32% of children, but only 1 out of 5 affected children receives treatment. Treatments and interventions for malnutrition under age 5 include disease prevention, specially formulated nutrient-dense therapeutic foods, and supplemental vitamins and micronutrient powders. Organizing the administration of interventions will help decrease early childhood malnutrition and its consequent health complications. M’Care is addressing this issue by running studies in which healthcare workers administer doses of micronutrient powder, vitamin A, and zinc to children under five. M’Care would like to leverage the data collected from their current operations to maximize the number of children to whom they can successfully deliver vitamin and nutrient interventions.

Our team was tasked with predicting whether beneficiaries (child-mother pairs) would “complete” their prescribed intervention series, as well as predicting the aggregate volume of contacts over time. We used various binary classification methods to predict whether beneficiaries would complete their next scheduled contact. This allows M’Care to target their reminders to beneficiaries who are at risk of missing their next scheduled contact. We also used time series models to predict the aggregate volume of contacts, which helps M’Care plan ahead for the resources (vitamins/nutrients, health workers) to have available at any given time. Our contributions are the following:
• Analyze data from M’Care’s current operations to understand current health worker workload and intervention completion rates
• Predict whether each beneficiary will complete their next recommended contact/dose of an intervention
• Predict aggregate volume of expected contacts over time
• Discuss implications and make recommendations to M’Care and their supporters, including the Ministry of Public Health and World Bank
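Below is a minimal sketch of the binary-classification step described in the abstract, using logistic regression as one plausible choice among the “various binary classification methods.” The file name and feature columns are illustrative assumptions, not M’Care’s actual schema.

```python
# Hedged sketch: predict whether a beneficiary completes their next scheduled
# contact. The CSV file and feature names are hypothetical stand-ins.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("beneficiary_contacts.csv")  # assumed export of contact records

features = ["child_age_months", "prior_contacts", "days_since_last_contact",
            "distance_to_clinic_km"]
X, y = df[features], df["completed_next_contact"]  # 1 = attended next scheduled dose

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

# Beneficiaries with a low predicted completion probability are the ones
# to target with reminders.
at_risk = X_test[clf.predict_proba(X_test)[:, 1] < 0.5]
```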
How Hearts Beat: Classifying Failing Hearts into HFrEF vs HFpEF using Image Embeddings from Chest X-Rays
MIT’s 6.8300 Advances in Computer Vision Course
Independent Project
May 2023
For this independent project, I built a deep learning model that can make the diagnosis and management of heart failure more equitable, using computer vision techniques on real medical data (EHR) and chest X-rays collected from Beth Israel Hospital in Boston, MA. I further contextualize and analyze my model within the socio-political issues inherent in the US medical system. I hope to continue building sensitive AIs that provide equitable care and reparations for systemic injustices in the medical system.
Abstract:
Heart failure is a major public health challenge with growing costs. Ejection fraction (EF) is a key metric for the diagnosis and management of heart failure. However, estimating EF using echocardiography is expensive and requires extensive expertise.
An existing deep learning model is able to identify EF from chest X-rays, which are quick, inexpensive, and require less expertise. However, this model requires enormous computational resources to train and has inequitable performance across race. This paper introduces a model that distinguishes EF in chest X-rays using image embeddings from the MIMIC dataset. Compared to the existing model trained for the same task on raw images, the proposed model performs similarly while significantly reducing the computational resources needed for training and updating. Upon further model analysis based on race, gender, and insurance, the proposed embedding model shows performance bias correlated with representation in the dataset. Finally, further discussion of model bias in clinical settings puts the model into the context of the American healthcare system.
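To make the core idea concrete, here is a hedged sketch of a classifier over precomputed chest X-ray embeddings rather than raw pixels. The file names, embedding source, and label encoding are assumptions, not the paper’s actual artifacts.

```python
# Sketch: HFrEF vs. HFpEF from fixed image embeddings. A small MLP over
# precomputed vectors trains in minutes on a CPU, versus the far larger
# cost of training a raw-image model end to end.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

X = np.load("cxr_embeddings.npy")  # hypothetical (n_studies, d) embedding matrix
y = np.load("labels.npy")          # assumed: 1 = reduced EF (HFrEF), 0 = preserved (HFpEF)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```

The same held-out predictions can then be stratified by race, gender, and insurance status to audit the performance gaps discussed above.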
Building AIs to Predict the Energy Landscapes of Proteins
Rocklin Lab @ Northwestern School of Medicine
Contributing to Rocklin Lab’s HDX Project
Summer 2023
As part of the Rosetta Commons REU program, I contributed research to the Rocklin Lab @ Northwestern School of Medicine (linked right). I creatively adapted Natural Language Processing, Graph Neural Networks, general AI methodology, and various data analysis tools to predict energy landscapes and conformations of proteins. Proteins sample different conformational states that are populated according to their relative energy. Changes in the energy landscape can alter the population of intermediate states and result in disease, and the work I did has implications for drug design and the treatment of conformational diseases.
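As a small worked example of that premise, the sketch below computes Boltzmann populations from relative energies; the state names and energy values are made up for illustration, not lab data.

```python
# Conformational states are populated according to their relative energies:
# p_i = exp(-E_i / kT) / Z. All numbers here are illustrative.
import numpy as np

kT = 0.593  # kcal/mol at ~298 K
energies = np.array([0.0, 1.2, 2.5])  # hypothetical relative energies (kcal/mol)
weights = np.exp(-energies / kT)
populations = weights / weights.sum()

for state, p in zip(["native", "intermediate", "unfolded"], populations):
    print(f"{state}: {p:.1%}")
# Shifting a state's relative energy (e.g. via a mutation or a bound drug)
# redistributes these populations, which is the link to disease and drug design.
```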
The poster below outlines my initial approach.

Microsoft Internship & FATE Group
Software Engineer and PM Intern @ Redmond, WA
FATE Group
Summer 2022
While interning at Microsoft, I developed a micro-feedback tool for Edge’s new tab page on a team with three other interns, using Fluent UI, CSS, and TypeScript. We designed the component to be modular, dynamic, and accessible, working in an Agile environment and collaborating with designers, the broader Edge team, and accessibility experts.
In addition, I participated in their Fairness, Accountability, Transparency, and Ethics (FATE) in AI team meetings. The FATE team develops collaborative techniques to facilitate the ethical implementation of AI. They study how societal issues like systemic bias and discriminatory behavior manifest in complex real-world socio-technical systems, and they develop both quantitative and qualitative techniques to help mitigate such issues and guide the design and application of those systems.
As impressive as their ideas are, when I talked to the people actually implementing language models in Microsoft’s products, these ideas were largely ignored (in my personal observation and limited experience). Despite devoting research to mitigating harms, Microsoft continues to produce technology that amplifies systemic violence using AI. The research is estranged from the product, and the product is estranged from the people who use it. Nonetheless, change takes time, and I greatly admire the work the FATE Group does. Their ideas continue to inspire me, and I have centered their practices for creating ethical AI in every project I have pursued since.
Using Word Vectors (NLP) to trace semantic shifts surrounding the language of LGBTQ communities in 20th-21st century US
Wellesley College’s LING 246 Corpus Linguistics Course
Independent Project
May 2022
Word embeddings are an ML method that represents each English word by a vector, where the geometry between vectors captures semantic relations between the corresponding words. The paper “Word embeddings quantify 100 years of gender and ethnic stereotypes” demonstrates “that word embeddings can be used as a powerful tool to quantify historical trends and social change. As specific applications, we develop metrics based on word embeddings to characterize how gender stereotypes and attitudes toward ethnic minorities in the United States evolved during the 20th and 21st centuries starting from 1910. Our framework opens up a fruitful intersection between machine learning and quantitative social science” (Garg et al. 2018).
For my project, I expand their methodology with techniques from various disciplines, including political science, corpus linguistics, and history, to explore how the language surrounding LGBTQ communities evolved through the 20th- and 21st-century United States.
Abstract:
This paper uses a combination of natural language processing, a subfield of artificial intelligence/computer science, and traditional corpus linguistic techniques to explore how the language surrounding LGBTQ communities evolved through the 20th- and 21st-century United States. I also provide historical and legal context to investigate whether significant shifts in the language surrounding LGBTQ people align with relevant historical events. The main analysis traces the semantic shift of the word “gay” through the Corpus of Historical American English over time. Using k-nearest neighbors, collocation, stereotype quantification from word embeddings, and concordance, I found that the language surrounding the word “gay” shifted significantly around 1960 and 1990. These two periods correspond with significant events in the gay rights movement and the AIDS epidemic. The discussion is supplemented with analysis of other words relating to the LGBTQ community (“homosexual,” “lesbian,” and “queer”) using similar methods.
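A hedged sketch of the nearest-neighbor portion of this analysis appears below: compare the closest words to “gay” in embeddings trained on different slices of COHA. The per-decade vector files are placeholders and would need to be trained or obtained separately.

```python
# Sketch: track the nearest neighbors of "gay" across decades of COHA.
# File paths are hypothetical; gensim loads word2vec-format text files.
from gensim.models import KeyedVectors

for decade in ["1950", "1960", "1990", "2000"]:
    vecs = KeyedVectors.load_word2vec_format(f"coha_{decade}.txt")  # assumed path
    print(decade, [w for w, _ in vecs.most_similar("gay", topn=10)])
```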
How Nasty was Nero Really?
A Computational Investigation of Original Latin Text Using ML Classifiers
Wellesley College’s CS 232 Artificial Intelligence Course
Independent Project
May 2022
Nero is regarded as the most infamous Roman emperor. Historical accounts cement his notoriety for slaughtering his loved ones, burning the city of Rome, and extreme tyranny. However, Rebecca Mead reports in her New Yorker article “How Nasty Was Nero Really?” that a 2021 show at the British Museum offered a less sensationalist account of Nero’s reign, since modern scholars recognize that many accounts of Nero’s atrocities closely resemble literary accounts of mythical events. In other words, the evils often associated with his reign might be grossly exaggerated for the sake of story and propaganda.
Abstract: I investigate, from a computational standpoint, the notion that the primary Latin sources on Nero more closely resemble myth than history. To do so, I created a neural network classification model, built on the Latin BERT language model, that determines whether a Latin text is historical or mythological. I then feed in primary Latin source texts on Nero so the model can determine whether the literature reads as more mythological or historical.
My results support the notion that some primary Latin sources on Nero resemble mythological texts more than historical ones. In other words, these computational conclusions reflect the conclusions drawn by traditional classical scholarship in this area of research.
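For the curious, here is a minimal sketch of the inference side of such a pipeline, assuming a Latin BERT checkpoint already fine-tuned on passages labeled historical vs. mythological. The checkpoint path, label order, and input excerpt are all placeholders.

```python
# Hedged sketch: score a Latin passage as historical vs. mythological with a
# fine-tuned sequence classifier. Paths and label order are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "path/to/latin-bert-finetuned"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

passage = "..."  # a primary-source Latin excerpt on Nero
inputs = tokenizer(passage, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print({"historical": probs[0, 0].item(), "mythological": probs[0, 1].item()})
```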
EASEL Lab
Wellesley College Research Lab
Research Assistant
August 2021 – May 2022
I joined the EASEL Lab my sophomore year at Wellesley, where I interrogated bias in AI language models under the mentorship of Prof. Carolyn Anderson and in collaboration with other students. We explored how representations of women change over time, using Wikipedia as the dataset and word vectors as a quantifying metric. I used state-of-the-art NLP techniques such as BERT and word2vec, big-data analysis, and social science methodologies.
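As a rough illustration of the word2vec side of that work, the sketch below trains vectors on a slice of text and probes how a target word relates to gendered anchor words; the corpus file and word choices are illustrative assumptions, not the lab’s actual setup.

```python
# Sketch: train word2vec on a pre-tokenized corpus slice and compare
# similarities to gendered anchor words. The file name is hypothetical.
from gensim.models import Word2Vec

# one pre-tokenized sentence per line
sentences = [line.split() for line in open("wikipedia_slice.txt", encoding="utf-8")]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)

print(model.wv.similarity("scientist", "she"),
      model.wv.similarity("scientist", "he"))
```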
ML Workshop for Humanities Students
Google Developer Student Club | Wellesley College Chapter
December 2020
As a member of my school’s Google Developer Student Club, I led a workshop for humanities majors on AI and how they could use it in their research. I had learned from my high school senior research project the importance of cross-disciplinary collaboration in AI research. By sharing my knowledge of this technology, I opened the door for collaboration with students with different skill sets and research interests. Preparing this workshop also taught me how to teach highly technical methodologies to people outside my discipline and make this knowledge more accessible beyond my field of study.
Using NLP and word vectors to trace how the portrayal of women in American novels changed in relation to politics and culture
Thomas Jefferson High School for Science and Technology
Senior Research Project
Fall 2019-Spring 2020
My senior year of high school, I used AI to trace how the portrayal of women in American novels changed over time. I adored English class, where I found windows into cultures across time and space by entering the world of a novel. I thought AI could allow me to practice this analysis over thousands of books instead of working through one at a time. My project did yield interesting results: women’s portrayal evolved alongside major political shifts such as women’s suffrage and the landmark SCOTUS cases overturning gender discrimination in the 1970s. However, the single narrative produced by the AI did not encompass the multiplicity of experiences of various women across America. I had to refer to specific examples in specific texts that weren’t represented by the AI’s narrative. Black, queer, Asian, and trans women’s stories were being erased in this generalized analysis. When a model takes big data as input and outputs a single answer, it forsakes the multitudes of experiences that are underrepresented in the data.
Hello (Artificial) World! My First AI Algorithms
Thomas Jefferson High School for Science and Technology
Independent Projects
2018-2019
During my junior year of high school, I built a neural network from scratch. I didn’t use any packages; I hand-coded the tedious matrix calculations for the feed-forward and backpropagation functions in Python. Doing this gave me a deep technical understanding of the math and statistics behind AI algorithms. I felt shock and joy each time I watched my code learn to solve increasingly complicated problems without my explicit instruction. I trained AIs to classify points as inside or outside high-dimensional organic regions, predict whether a passenger would survive the Titanic, and play a competitive game of Othello/Reversi.
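Below is a minimal sketch of the kind of network described, with the feed-forward and backpropagation math written out by hand. NumPy is used here only for array arithmetic (the original project avoided packages entirely), and the toy task of classifying points inside or outside a disk stands in for the “organic regions” problem.

```python
# One hidden layer, sigmoid activations, mean-squared-error gradients,
# all matrix math explicit. Task: is a 2D point inside a disk?
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
y = ((X ** 2).sum(axis=1) < 0.5).astype(float).reshape(-1, 1)

W1, b1 = rng.normal(0, 0.5, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
for _ in range(2000):
    # feed forward
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backpropagation: chain rule through the sigmoids, written out
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

print("training accuracy:", ((out > 0.5) == y).mean())
```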