Writing

Essays, notes, and rougher thinking.

The less formal side of the archive: model notes, build logs, and ideas that are still becoming sharper.

8 June 2026

GPT-2 --> Llama 3, Part 2: Optimizing for generation

A walkthrough of why generation changes transformer attention, how a KV cache cuts repeated work, and why Llama-style grouped-query attention shrinks inference memory.

8 May 2026

GPT-2 --> Llama 3, One Improvement At A Time

A thinking-out-loud walkthrough of moving a GPT-2-style block toward Llama 3 with RoPE, RMSNorm, and SwiGLU.