Hello


This site is where I’ll keep notes, projects, and writing on infrastructure for large-scale ML and LLM systems.

Topics I’m planning to write about:

  • Inference serving — scheduling, KV cache, batching, tail latency.
  • Distributed training — communication, parallelism strategies, failure modes.
  • GPU systems — kernels, memory hierarchies, profiling.
  • The weird debugging stories that don’t fit anywhere else.

First real post coming soon.