Research
Papers
Still: Amortized KV Cache Compaction in a Single Forward Pass
arXiv preprint, 2026
From superposition to sparse codes: interpretable representations in neural networks
arXiv preprint, 2025
Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts
NeurIPS 2024, Foundation Models for Science (Oral)
Sparse autoencoders for dense text embeddings reveal hierarchical feature sub-structure
NeurIPS 2024, Scientific Methods for Understanding Deep Learning
Steering semantic search with interpretable features from sparse autoencoders
NeurIPS 2024, Foundation Model Interventions
Disentangling Dense Embeddings with Sparse Autoencoders
arXiv preprint
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models
arXiv preprint
AstroLLaMA: Towards specialised foundation models in astronomy
IJCNLP-AACL 2023
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets
Research Notes of the American Astronomical Society
Measuring Sharpness in Grokking
ICLR 2024, Bridging the Gap Between Practice and Theory Workshop
Baseten research
Post-training frontier legal agents with Baseten Research
April 2026, with Mudith Jayasekara, Matthew Blau, Aaron Ellis-Bloor, Niko Grupen, and Gabe Pereyra
Towards infinite context windows: neural KV cache compaction
March 2026, with Alex Sandomirsky and Harry Partridge
March 2026, with Max Kirkby
Repeated KV cache for long-running agents
February 2026
February 2026, with Max Kirkby and Mudith Jayasekara
If we can't design neat latent structures, then maybe we can Bitter Lesson it through self-study
January 2026
BYO SWE-grep: automatically train blazing fast search sub-agents on your knowledge base
October 2025, with Jonathon Liu
Lumina: building self-improving evaluation through customer-in-the-loop refinement
October 2025, with Harry Partridge, Max Kirkby, Jonathon Liu, Paras Stefanopoulos, and Mudith Jayasekara
Upweight the strategy, not the tokens: faster training with explicit reasoning through RGT
October 2025, with Harry Partridge and Mudith Jayasekara
Attention-based attribution: what your model is actually looking at
October 2025, with Jonathon Liu, Kimbrian Canavan, Max Kirkby, and Mudith Jayasekara
Training loss predicts evaluation performance, even for non-verifiable tasks
October 2025, with Max Kirkby
Robust, sample-efficient SFT with prompt mutations
October 2025, with Harry Partridge
Iterative SFT: dense reward learning
October 2025, with Jonathon Liu, Harry Partridge, Max Kirkby, and Mudith Jayasekara
Write small, learn forever: rank-1 LoRA for continual learning
October 2025, with Max Kirkby, Harry Partridge, and Jonathon Liu
September 2025, with Max Kirkby
Do transformers notice their own mistakes? Finding a linear hallucination detector inside LLMs
February 2025, with Mudith Jayasekara, Max Kirkby, Sviatoslav Chalnev, and Rune Chi Zhao
Resurrecting the salmon: seeing clearer inside LLMs with domain-specific SAEs
January 2025, with Mudith Jayasekara and Max Kirkby
Why mechanistic interpretability needs a paradigm inversion
January 2025, with Mudith Jayasekara and Max Kirkby