Projects
A selection of projects completed for undergraduate research, internships, and coursework.
This section of my website is currently under construction! I have listed brief summaries of projects I've completed, but am in the process of updating the corresponding detail pages for each.
Note: I changed my first name in late 2022, so some of my previous work is credited under a different name.
Work Projects
Center for Human-Compatible AI: A StrongREJECT for Empty Jailbreaks [ICLR 2024 Workshop on Reliable and Responsible Foundation Models; NeurIPS submission in progress]
I helped create a new benchmark for jailbreak attacks against large language models. It consists of 346 harmful prompts designed to evaluate whether jailbreak attacks enable malicious actors to use LLMs to cause harm, together with a novel autograding method that achieves SOTA performance at matching fine-grained human judgments of jailbreak quality.
I implemented intelligence evaluations using the Massive Multitask Language Understanding benchmark to show that many jailbreaks degrade model intelligence, decreasing their usefulness for malicious actors.
I finetuned language models to create a new autograder which is easy and cost-effective to run on a laptop CPU, with comparable performance to autograders implemented using the GPT-4 API.
Skills: large language models (LLMs), jailbreaking, benchmark creation, AI safety, AI evaluation, Python, Linux, LLM finetuning, machine learning, data analysis, paper writing
Center for Human-Compatible AI: Partially Observable Ethically Compliant Autonomous Systems [ICRA 2024]
I extended a framework for ethically compliant autonomous systems to handle partially observable environments (modeled as POMDPs rather than MDPs).
Skills: ethical AI, MDPs, POMDPs, traditional planning, linear programming, CPLEX, Python, Docker, Linux servers, research, paper writing
MIT Interactive Robotics Group: Multiagent RL Communication in Continuous Embedding Space [Under construction]
I implemented features for a project studying the effects of allowing agents in multiagent RL settings to communicate in a continuous embedding space analogous to the word embeddings used in language models.
Skills: reinforcement learning, training ML models, PyTorch, Python, data visualization, information theory, research
MIT Experimental Phonetics Lab: Learning Phonotactic Alternations [Under construction]
I wrote software for linguistics experiments on human participants to test their ability to learn certain sound patterns in an artificial language.
Skills: linguistics, human participants research, PsychoPy, PsychoJS, Pavlovia, Python, JavaScript, data processing, research
AbbVie: Dual Learning for Chinese-English Machine Translation of Medical Texts [Under construction]
I investigated the effects of using a dual learning technique to finetune machine translation models on low-resource topics such as medical texts.
Skills: natural language processing, training ML models, ML algorithms, machine translation, PyTorch, Python, Linux servers
Course Projects
MIT 9.60 Machine-Motivated Human Vision: Identifying Road Signs in Non-Optimal Conditions [Has project report]
I compared how machine vision systems and humans differ in their ability to identify road signs in non-optimal conditions, including occlusion by natural obstacles and steep angles of approach.
Skills: machine vision, training ML models, human participants research, Python, jsPsych, JavaScript, research
MIT 6.8610 Quantitative Methods for Natural Language Processing: Quantifying Multi-Class Bias in Word Embeddings [Has project report]
I devised a novel metric for measuring non-binary (multi-class) types of bias in word embeddings, such as racial bias.
Skills: natural language processing, machine learning, word embeddings, Python, research
MIT 9.66 Computational Cognitive Science: Noisy-channel Bayesian Inference in Mandarin Listening Tasks [Has project report]
After hearing a sentence with an implausible literal meaning, an English speaker may conclude that they misheard and that the speaker actually said a different, more plausible sentence, a process that can be modeled as Bayesian inference. I investigated whether Mandarin speakers do the same.
Skills: rational analysis, Bayesian analysis, human participants research, Python, data visualization, research
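As a rough illustration of the noisy-channel idea behind this project (a minimal sketch, not the model from my report): the listener weighs how plausible each candidate sentence is against how likely it would have been misheard as the sentence actually perceived. The example sentences, the function name noisy_channel_posterior, and all probabilities below are hypothetical values chosen only for illustration.

```python
def noisy_channel_posterior(prior, likelihood):
    """Return P(intended | heard) for each candidate intended sentence.

    prior[s]      -- P(s): how plausible it is that the speaker meant s
    likelihood[s] -- P(heard | s): how likely s would be perceived as
                     the sentence the listener actually heard
    """
    # Bayes' rule: posterior is proportional to prior * likelihood.
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical example: the heard sentence is implausible as spoken.
heard = "The mother gave the candle the daughter."
plausible = "The mother gave the candle to the daughter."

# Made-up numbers: the literal reading is very unlikely to be intended,
# but a dropped "to" is a fairly cheap perceptual error.
prior = {heard: 0.001, plausible: 0.999}
likelihood = {heard: 0.95, plausible: 0.05}

posterior = noisy_channel_posterior(prior, likelihood)
best_guess = max(posterior, key=posterior.get)
print(best_guess)
```

With these made-up numbers, the listener ends up favoring the plausible sentence even though it is not what they literally heard, because its much higher prior outweighs the low chance of mishearing.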
MIT 9.58 Projects in the Science of Intelligence: Effects of Audio Network Architecture on Network Metamers [Has project report]
I tested whether an alternate model architecture for speech recognition systems allows them to produce internal representations that are more recognizable to humans.
Skills: speech recognition, training ML models, TensorFlow, human participants research, research
Other Projects
References to AI Safety in International Policy Documents (External Link)
As part of a mentorship program in AI Governance, I compiled a listing of references to AI Safety in documents from Chinese government agencies and large companies.
Skills: AI Governance, policy research, Chinese, writing, communication