Posts
One-dimensional vs multi-dimensional features in interpretability
How to stop conflating with 768 dimensions
February 01, 2025
LLMs are really good at k-order thinking (where k is even)
You still need to tell a language model you want to cure cancer before it can help you cure cancer.
January 15, 2025
Information bounds in quantum gravity
How information theory links quantum mechanics and general relativity
January 13, 2025
Can quantised autoencoders find and interpret circuits in language models?
Using VQ-VAEs and categorical decision trees to do automatic circuit identification in LLMs.
March 01, 2024
Learning compressed representations and GPT-5 speculation
Why language models probably get too much from the abstraction we give them for free
March 01, 2024