Curriculum Vitae

Adrià Garriga-Alonso

Experience

2026–present
Technical Co-founder
Dokimasia · Remote
Building tools for value-aligned computing that shield users from unwanted and false information.
2023–2025
Research Scientist
FAR AI · Berkeley, CA
Led interpretability research (ACDC, ~400 citations). Managed team of 3, collaborated with 11 researchers. Built GPU infrastructure (8–80 GPUs, models >400B parameters), reducing costs by ~50%.
2022–2023
Member of Technical Staff
Redwood Research · Berkeley, CA
Correctness testing for optimizing compilers. Built fuzzing/property-based testing suite. Mentored 8 interns across 4 projects.
2021
Summer Research Fellow
Center on Long-Term Risk
Open-source game theory: agents that can read each other's source code.
2019
Research Intern
Microsoft Research Cambridge
Optimal choice and learning from partial observations in inverse RL. Supervisor: Dr. Sebastian Tschiatschek.

Education

2017–2021
PhD Machine Learning
University of Cambridge
Thesis: "Priors in finite and infinite Bayesian convolutional neural networks." Supervisor: Prof. Carl E. Rasmussen. First to show infinite CNNs converge to Gaussian processes.
2016–2017
MSc Computer Science
University of Oxford · Distinction
Thesis: "Probability density imputation of missing data with GMMs." Supervisor: Prof. Mihaela van der Schaar.
2012–2016
BSc Computer Science
Pompeu Fabra University · 1st in class (9.02/10)
Thesis: "Solving Montezuma's Revenge with planning and RL." la Caixa Fellowship (6.6% acceptance). María de Maeztu Award for best CS thesis in Spain.

Selected Publications

Towards Automated Circuit Discovery for Mechanistic Interpretability
A. Conmy, A. Mavor-Parker, A. Lynch, S. Heimersheim, A. Garriga-Alonso
NeurIPS 2023 Spotlight · ~400 citations
Deep Convolutional Networks as Shallow Gaussian Processes
A. Garriga-Alonso, L. Aitchison, C.E. Rasmussen
ICLR 2019 · ~330 citations
Causal Scrubbing: A Method for Rigorously Testing Interpretability Hypotheses
L. Chan, A. Garriga-Alonso, N. Goldowsky-Dill, R. Greenblatt, et al.
Alignment Forum 2022 · ~90 citations
Open Problems in Mechanistic Interpretability
L. Sharkey, B. Chughtai, [...], A. Garriga-Alonso, et al.
2025 · ~100 citations

Mentorship & Service

2024–present
MATS Program Mentor
Advised 10 scholars on mechanistic interpretability and RL, resulting in 3 NeurIPS papers and 5 workshop papers. Mentees now at Anthropic, METR, and Mistral.
2019–present
Reviewer
NeurIPS (2019 top-5% reviewer, 2020, 2025), ICLR (2020, 2021, 2026), ICML (2020, 2021, 2023, 2025), JMLR, and various workshops.
2019
Workshop Co-organizer
ICLR 2019 workshop: "Safe Machine Learning: Specification, Robustness and Assurance."

Awards

2017
Malmö Collaborative AI Challenge
1st & 3rd place (different categories). $20,000 Azure credits.
2016
la Caixa Foundation Fellowship
Full tuition and stipend for Oxford MSc. 6.6% acceptance rate.
2016
María de Maeztu Award
Best Computer Science Bachelor's thesis in Spain (reproducibility in software).

Skills

Languages & Frameworks: Python (PyTorch, JAX), C++, Rust
Areas: Mechanistic interpretability, Bayesian ML, Gaussian processes, RL, GPU infrastructure
Human Languages: Catalan (native), Spanish (native), English (fluent)