Hannah Erlebach
DPhil in Machine Learning @ FLAIR, University of Oxford
Oxford · UK
About
I’m a first-year DPhil student in machine learning at the University of Oxford, supervised by Jakob Foerster. Previously, I completed an MSc in Machine Learning at University College London and a BA in Mathematics at the University of Cambridge. I am funded by the Cooperative AI PhD Fellowship.
I’m broadly interested in exploring questions about minds: what they are, where they come from, and how they could be. I also care deeply about AI alignment and what it means for diverse minds to co-exist harmoniously.
You can contact me at hannah [dot] erlebach [at] gmail [dot] com.
Research
I’m currently thinking about goals: what they are and how they arise from a goal-less universe. Ultimately, I think that reward specification is an inescapable problem, and I’m curious about whether and how it’s possible to induce interesting behaviour outside of the reward-maximisation paradigm.
My previous research has focused on cooperation in language models and multi-agent reinforcement learning settings.
- DUA: Discovering Universal Attacks Using Foundation Models. Master's thesis for UCL MSc in Machine Learning, 2025.
- Guiding Evolution of Artificial Life Using Vision-Language Models. Nikhil Baid, Hannah Erlebach, Paul Hellegouarch and Frederico Wieser. Published at the Artificial Life Conference 2025. [arXiv]
- Mitigating Goal Misgeneralisation via Minimax Regret. Karim Abdel Sadek, Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger and Michael Dennis. Published at the Reinforcement Learning Conference 2025.
- RACCOON: Regret-based Adaptive Curricula for Cooperation. Hannah Erlebach and Jonathan Cook. Published in the CoCoMARL workshop at the Reinforcement Learning Conference 2024.
- Welfare Diplomacy: Benchmarking Language Model Cooperation. Gabriel Mukobi, Hannah Erlebach, Niklas Lauffer, Lewis Hammond, Alan Chan and Jesse Clifton. Published in the SoLaR workshop at NeurIPS 2023. [arXiv]