Hello
This site is where I’ll keep notes, projects, and writing on infrastructure for large-scale ML and LLM systems.
Topics I’m planning to write about:
- Inference serving — scheduling, KV cache, batching, tail latency.
- Distributed training — communication, parallelism strategies, failure modes.
- GPU systems — kernels, memory hierarchies, profiling.
- The weird debugging stories that don’t fit anywhere else.
First real post coming soon.