Projects

A selection of projects completed for undergraduate research, internships, and coursework.

This section of my website is currently under construction! I have listed brief summaries of the projects I've completed and am in the process of updating the detail page for each one.

Note: I changed my first name in late 2022, so some of my previous work is credited under a different name.

Work Projects

Center for Human-Compatible AI: A StrongREJECT for Empty Jailbreaks [ICLR 2024 Workshop on Reliable and Responsible Foundation Models; NeurIPS submission in progress]

I helped create a new benchmark for jailbreak attacks against large language models. It consists of 346 harmful prompts designed to evaluate whether jailbreak attacks enable malicious actors to use LLMs to cause harm, together with a novel autograding method that achieves state-of-the-art agreement with fine-grained human judgments of jailbreak quality.

I implemented intelligence evaluations using the Massive Multitask Language Understanding (MMLU) benchmark to show that many jailbreaks degrade model intelligence, which makes the jailbroken models less useful to malicious actors.

I finetuned language models to create a new autograder that is easy and cost-effective to run on a laptop CPU and performs comparably to autograders built on the GPT-4 API.

Skills: large language models (LLMs), jailbreaking, benchmark creation, AI safety, AI evaluation, Python, Linux, LLM finetuning, machine learning, data analysis, paper writing

Center for Human-Compatible AI: Partially Observable Ethically Compliant Autonomous Systems [ICRA 2024]

I extended a framework for ethically compliant autonomous systems to handle partially observable environments, which are modeled as POMDPs rather than MDPs.

Skills: ethical AI, MDPs, POMDPs, traditional planning, linear programming, CPLEX, Python, Docker, Linux servers, research, paper writing

MIT Interactive Robotics Group: Multiagent RL Communication in Continuous Embedding Space [Under construction]

I implemented features for a project studying the effects of allowing agents in multiagent RL settings to communicate in a continuous embedding space analogous to the word embeddings used in language models.

Skills: reinforcement learning, training ML models, PyTorch, Python, data visualization, information theory, research

MIT Experimental Phonetics Lab: Learning Phonotactic Alternations [Under construction]

I wrote software for linguistics experiments on human participants to test their ability to learn certain sound patterns in an artificial language.

Skills: linguistics, human participants research, PsychoPy, PsychoJS, Pavlovia, Python, JavaScript, data processing, research

AbbVie: Dual Learning for Chinese-English Machine Translation of Medical Texts [Under construction]

I investigated the effects of using a dual learning technique to finetune machine translation models on low-resource topics such as medical texts.

Skills: natural language processing, training ML models, ML algorithms, machine translation, PyTorch, Python, Linux servers

Course Projects

MIT 9.60 Machine-Motivated Human Vision: Identifying Road Signs in Non-Optimal Conditions [Has project report]

I compared how machine vision systems and humans differ in their ability to identify road signs in non-optimal conditions, including occlusion by natural obstacles and steep angles of approach.

Skills: machine vision, training ML models, human participants research, Python, jsPsych, JavaScript, research

MIT 6.8610 Quantitative Methods for Natural Language Processing: Quantifying Multi-Class Bias in Word Embeddings [Has project report]

I devised a novel metric for bias in word embeddings that covers types of bias spanning more than two classes, such as racial bias.

Skills: natural language processing, machine learning, word embeddings, Python, research

MIT 9.66 Computational Cognitive Science: Noisy-channel Bayesian Inference in Mandarin Listening Tasks [Has project report]

After hearing a sentence whose literal meaning is implausible, an English speaker may conclude that they misheard and that the speaker actually said a different, more plausible sentence, in effect performing Bayesian inference. I investigated whether Mandarin speakers do the same.

Skills: rational analysis, Bayesian analysis, human participants research, Python, data visualization, research

MIT 9.58 Projects in the Science of Intelligence: Effects of Audio Network Architecture on Network Metamers [Has project report]

I tested whether an alternate model architecture for speech recognition systems allows them to produce internal representations that are more recognizable to humans.

Skills: speech recognition, training ML models, TensorFlow, human participants research, research

Other Projects

References to AI Safety in International Policy Documents (External Link)

As part of a mentorship program in AI Governance, I compiled a list of references to AI Safety in documents from Chinese government agencies and large companies.

Skills: AI Governance, policy research, Chinese, writing, communication