Curriculum Vitae

Adrià Garriga-Alonso

Experience

2026–present
Technical Co-founder
Dokimasia · Remote
Building tools for value-aligned computing that shield users from unwanted and false information.
2023–2025
Research Scientist
FAR AI · Berkeley, CA
Led interpretability research (ACDC, ~400 citations). Managed team of 3, collaborated with 11 researchers. Built GPU infrastructure (8–80 GPUs, models >400B parameters), reducing costs by ~50%.
2022–2023
Member of Technical Staff
Redwood Research · Berkeley, CA
Correctness testing for optimizing compilers. Built fuzzing/property-based testing suite. Mentored 8 interns across 4 projects.
2021
Summer Research Fellow
Center on Long-Term Risk
Open-source game theory: agents that can read each other's source code.
2019
Research Intern
Microsoft Research Cambridge
Optimal choice and learning from partial observations in inverse RL. Supervisor: Dr. Sebastian Tschiatschek.

Education

2017–2021
PhD Machine Learning
University of Cambridge
Thesis: "Priors in finite and infinite Bayesian convolutional neural networks." Supervisor: Prof. Carl E. Rasmussen. First to show infinite CNNs converge to Gaussian processes.
2016–2017
MSc Computer Science
University of Oxford · Distinction
Thesis: "Probability density imputation of missing data with GMMs." Supervisor: Prof. Mihaela van der Schaar.
2012–2016
BSc Computer Science
Pompeu Fabra University · 1st in class (9.02/10)
Thesis: "Solving Montezuma's Revenge with planning and RL." la Caixa Fellowship (6.6% acceptance). María de Maeztu Award for best CS thesis in Spain.

Selected Publications

Towards Automated Circuit Discovery for Mechanistic Interpretability
A. Conmy, A. Mavor-Parker, A. Lynch, S. Heimersheim, A. Garriga-Alonso
NeurIPS 2023 Spotlight · ~400 citations
Deep Convolutional Networks as Shallow Gaussian Processes
A. Garriga-Alonso, L. Aitchison, C.E. Rasmussen
ICLR 2019 · ~330 citations
Causal Scrubbing: A Method for Rigorously Testing Interpretability Hypotheses
L. Chan, A. Garriga-Alonso, N. Goldowsky-Dill, R. Greenblatt, et al.
Alignment Forum 2022 · ~90 citations
Open Problems in Mechanistic Interpretability
L. Sharkey, B. Chughtai, [...], A. Garriga-Alonso, et al.
2025 · ~100 citations

Mentorship & Service

2024–present
MATS Program Mentor
Advised 10 scholars on mechanistic interpretability and RL, resulting in 3 NeurIPS papers and 5 workshop papers. Mentees now at Anthropic, METR, and Mistral.
2019–present
Reviewer
NeurIPS (2019 top-5% reviewer, 2020, 2025), ICLR (2020, 2021, 2026), ICML (2020, 2021, 2023, 2025), JMLR, and various workshops.
2019
Workshop Co-organizer
ICLR 2019 workshop: "Safe Machine Learning: Specification, Robustness and Assurance."

Awards

2017
Malmö Collaborative AI Challenge
1st & 3rd place (different categories). $20,000 Azure credits.
2016
la Caixa Foundation Fellowship
Full tuition and stipend for Oxford MSc. 6.6% acceptance rate.
2016
María de Maeztu Award
Best Computer Science Bachelor's thesis in Spain (reproducibility in software).

Skills

Languages & Frameworks: Python (PyTorch, JAX), C++, Rust
Areas: Mechanistic interpretability, Bayesian ML, Gaussian processes, RL, GPU infrastructure
Human Languages: Catalan (native), Spanish (native), English (fluent)