Writing
Essays, notes, and rougher thinking.
The less formal side of the archive: model notes, build logs, and ideas that are still taking shape.
8 June 2026
GPT-2 --> Llama 3, Part 2: Optimizing for generation
A walkthrough of why autoregressive generation changes how transformer attention is computed, how a KV cache cuts repeated work, and why Llama-style grouped-query attention shrinks that cache's memory footprint at inference.
8 May 2026
GPT-2 --> Llama 3, One Improvement At A Time
A thinking-out-loud walkthrough of moving a GPT-2-style block toward Llama 3 with RoPE, RMSNorm, and SwiGLU.