Artificial Intelligence

By Gabriel Mukobi

Machine learning, deep learning, reinforcement learning, and other AI projects with a focus on AI safety, including both technical research projects and community resources for helping others learn.

Backup Transformer Heads are Robust to Ablation Distribution

Quick-and-dirty mechanistic interpretability research into backup name mover heads for indirect object identification (IOI) in GPT-2 small. The project won 2nd place in Apart Research's second Interpretability Hackathon.

Responsible for: Deciding on the research question, engineering and running the mean ablation experiments, and evaluating and interpreting the results.

MLAB Transformers From Scratch

A documented and unit-tested repo to help you learn how to build transformer neural network models from scratch.

Responsible for: Taking the transformer days from the original MLAB repo, creating a clean starter file with class/method stubs and clear docstrings, adding unit tests, and implementing a clean solution file.

Concept-Based Explanations

A technique that extends prior work on unsupervised learning of human-interpretable, concept-based explanations for language models performing sentiment analysis. Performance is comparable to black-box baseline models, but the coherence of the discovered concepts is mixed.

Responsible for: Almost all of the code, most of the paper.

Levelling Up in AIS RE

A level-based guide for independently up-skilling in AI Safety Research Engineering that aims to give concrete objectives, goals, and resources to help anyone go from zero to hero.

Responsible for: Everything, though it draws upon knowledge from others listed in the Sources section.

Minitorch Self-Study Guide

I created this study guide while implementing Minitorch, a Python implementation of the core functionality of the popular PyTorch machine learning library. Working through it gave me a better understanding of how automatic differentiation, tensors, and other PyTorch internals work, and the guide is meant to help others learn the same.

Responsible for: All of the study guide, none of Minitorch.


A plugin to facilitate the use of deep reinforcement learning in Unreal Engine by exposing UE4 as an OpenAI Gym environment, plus a suite of deep RL sample projects to test the plugin.

Responsible for: Cleaning up and improving the plugin for release and creating all the sample projects while working for Epic Games.

CNNs for CGI Detection

A binary CNN classifier to distinguish between real photographic images and photorealistic computer-generated images that achieves 96% test accuracy on a custom dataset.

Responsible for: Approximately half of the work.