May 17, 2026

Hello

This site is where I’ll keep notes, projects, and writing on infrastructure for large-scale ML and LLM systems.

Topics I’m planning to write about:

Inference serving — scheduling, KV cache, batching, tail latency.
Distributed training — communication, parallelism strategies, failure modes.
GPU systems — kernels, memory hierarchies, profiling.
The weird debugging stories that don’t fit anywhere else.

First real post coming soon.