About
Maths undergrad working on AI safety.
I spend most of my time outside coursework on AI safety. Currently running the Nottingham AI Safety Initiative as a Kairos Pathfinder Fellow. Other things I’ve worked on along the way: a campus technical fellowship, a UK AI governance podcast, and some interpretability research.

The work
Where I’m trying to be useful
Two things at the moment.
The technical side: mechanistic interpretability. Most recently I've been working on how refusal directions degrade once published jailbreak frameworks bypass them, using PCA, SVD, and steering vectors. I've also worked with sparse autoencoders, and with RLHF and Constitutional AI on the alignment-training side.
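If "refusal direction" sounds abstract, here's the shape of the technique. This is a minimal sketch on synthetic data, not our project code: in practice the activations come from hooks on an open-source model's residual stream, and names like harmful_acts are placeholders I've made up for illustration.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-ins for residual-stream activations at one layer:
# rows are prompts, columns are model dimensions. In real work these
# come from forward hooks on an open-source LLM.
d_model = 512
true_dir = torch.zeros(d_model)
true_dir[0] = 1.0  # plant a known "refusal" axis in the toy data
harmless_acts = torch.randn(200, d_model)
harmful_acts = torch.randn(200, d_model) + 3.0 * true_dir

# Difference-of-means refusal direction: the average axis separating
# harmful-prompt activations from harmless ones.
refusal_dir = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
refusal_dir = refusal_dir / refusal_dir.norm()

def ablate(acts: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project the direction out of every activation (directional ablation)."""
    coeffs = acts @ direction                 # projection of each row
    return acts - coeffs[:, None] * direction

def steer(acts: torch.Tensor, direction: torch.Tensor, alpha: float) -> torch.Tensor:
    """Add a scaled copy of the direction: a steering vector."""
    return acts + alpha * direction

# Harmful activations separate cleanly along the direction; after
# ablation the separation vanishes, which is roughly what a successful
# bypass does to the model's internals.
print((harmful_acts @ refusal_dir).mean())                       # ~3
print((ablate(harmful_acts, refusal_dir) @ refusal_dir).mean())  # ~0
```

Difference-of-means is the simplest extractor; PCA and SVD are the usual alternatives when a single mean difference is too noisy to trust.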
The organisational side: running the Nottingham AI Safety Initiative as a Kairos Pathfinder Fellow. Designed the curriculum (built on BlueDot Impact’s Technical AI Safety course), brought together the cohort, got the university on board. Before that I co-founded AI Policy Pulse, a UK AI governance podcast.
To follow what we’re doing at NAISI, find us on Instagram and elsewhere via linktr.ee/nottsaisi.
Methods
- Mech interp
- PCA
- SVD
- Steering vectors
- Sparse autoencoders
- RLHF
- Constitutional AI
- Jailbreak analysis
- Grokking
Stack
- Python
- TypeScript
- React
- Firestore
Experience
Recent and notable
- 2025 →
Pathfinder Fellow
Kairos Pathfinder Fellowship
Co-founded the Nottingham AI Safety Initiative: designed the curriculum around BlueDot Impact's Technical AI Safety course, recruited the cohort, and got the university on board. I run a two-hour weekly technical fellowship for around ten students and a lecturer, covering interpretability, evaluations, and alignment training.
- Late 2025
Course participant
BlueDot Impact · Technical AI Safety
30-hour technical curriculum covering RLHF, Constitutional AI, data filtration, dangerous-capability evaluations, alignment faking, and mechanistic interpretability via sparse autoencoders. I now use it as the backbone of the Kairos fellowship I run at Nottingham. I also submitted a research proposal extending Nanda et al.'s analysis of grokking and Fourier-based algorithmic representations in transformers learning modular arithmetic (a sketch of that Fourier algorithm follows this list).
- Spring 2025
Technical AI Safety Researcher
Impact Research Groups
Five-person team investigating how refusal directions in open-source LLMs degrade when published jailbreak frameworks bypass them. I owned the long-prompt regime and its signal-extraction problems, applying PCA, SVD, and direction extraction and steering. Our team placed third in the cohort.
- 2024–25
Podcast Director (co-founder)
AI Policy Pulse
Co-founded a UK-focused AI governance podcast aimed at engaging Gen Z with catastrophic AI risks. Booked guests via the London Initiative for Safe AI and worked with academics to keep episodes technically grounded. Handled editing, publication, and marketing myself.
- Summer 2024
Course participant
BlueDot Impact · AI Alignment
Co-authored a project on how online AI-generated misinformation affects LLM predictions. The course later shaped the editorial direction of AI Policy Pulse.
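A footnote on the grokking proposal in the BlueDot entry above. Nanda et al. found that transformers grokking modular addition learn Fourier features and combine them with trig identities. Here's a hand-built sketch of that algorithm itself, with no trained model involved; the frequencies below are illustrative placeholders, not ones a real network would pick.

```python
import numpy as np

# Each input a gets represented as cos(2*pi*w*a/p), sin(2*pi*w*a/p)
# for a few key frequencies w; the network combines them with
# angle-addition identities. Here we run the algorithm by hand.
p = 113                       # the modulus used in the original paper
freqs = [14, 35, 41, 52, 73]  # illustrative "key frequencies" (placeholder)

a, b = 47, 92                 # a worked example input pair
logits = np.zeros(p)
for w in freqs:
    theta_a = 2 * np.pi * w * a / p
    theta_b = 2 * np.pi * w * b / p
    # cos(w(a+b)) and sin(w(a+b)) via the angle-addition identities...
    cos_ab = np.cos(theta_a) * np.cos(theta_b) - np.sin(theta_a) * np.sin(theta_b)
    sin_ab = np.sin(theta_a) * np.cos(theta_b) + np.cos(theta_a) * np.sin(theta_b)
    # ...then score each candidate answer c: the sum below equals
    # cos(w(a+b-c)), which peaks exactly at c = (a+b) mod p.
    c = np.arange(p)
    logits += cos_ab * np.cos(2 * np.pi * w * c / p) + sin_ab * np.sin(2 * np.pi * w * c / p)

assert logits.argmax() == (a + b) % p
print(int(logits.argmax()), (a + b) % p)  # both 26
```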
The credentialed long form lives on the CV (PDF).
Beyond the work
Other things
- Global Challenges Project: a programme on biosecurity and AI safety; the iteration I attended ran in Oxford, and it's where I met many of the people I now know in the UK AI safety community.
- Senior Non-Commissioned Officer, Army Cadets, planning and leading training for junior cadets.
- Volunteered at a local Nottingham charity, preparing and serving meals to students.
- Self-taught my way through Karpathy’s nanoGPT and nanoLLM series; regular practice on Codewars and LeetCode.
Get in touch
Find me here
Easiest way to reach me is to book a slot or email zach@zlevin.uk. If you’re a student thinking about AI safety, an organiser running an AI safety group at another university, or a researcher working on adjacent problems, I’d especially like to hear from you.
Looking for summer 2026 AI safety research positions and related programmes. Get in touch if you have something or know who I should be talking to.