Backup Transformer Heads are Robust to Ablation Distribution
Some quick and dirty mechanistic interpretability research into backup name mover heads for indirect object identification (IOI) in GPT-2 small, which won 2nd place in Apart Research's second Interpretability Hackathon.
Responsible for: Deciding on the research question, engineering and running the mean ablation experiments, evaluating and interpreting results.
MLAB Transformers From Scratch
A documented and unit-tested repo to help you learn how to build transformer neural network models from scratch.
Responsible for: Taking the transformer days from the original MLAB repo, creating a clean starter file with class/method stubs and clear docstrings, adding unit tests, and implementing a clean solution file.
A modest extension of prior work on unsupervised learning of human-interpretable, concept-based explanations for language models performing sentiment analysis. Performance is comparable to black-box baseline models, but the coherence of the discovered concepts is mixed.
Responsible for: Almost all of the code, most of the paper.
Levelling Up in AIS RE
A level-based guide for independently up-skilling in AI Safety Research Engineering that aims to give concrete objectives, goals, and resources to help anyone go from zero to hero.
Responsible for: Everything, though it draws upon knowledge from others listed in the Sources section.
Minitorch Self-Study Guide
I wrote this study guide while implementing Minitorch, a Python implementation of the core functionality of the popular PyTorch machine learning library, to get a better understanding of how autodifferentiation, tensors, and other PyTorchic things work. It is meant to help others learn as well.
Responsible for: All of the study guide, none of Minitorch.
A plugin to facilitate the use of deep reinforcement learning in Unreal Engine by exposing UE4 as an OpenAI Gym environment, plus a suite of deep RL sample projects to test the plugin.
Responsible for: Cleaning up and improving the plugin for release and creating all the sample projects, while working for Epic Games.