Mengjie Zhao

Infra engineer at LinkedIn working on latency-sensitive, real-time ML inference for ads scoring and ranking at internet scale. Currently going deep on LLM inference serving: paged attention, continuous batching, and disaggregated serving.

Notes, projects, and writing on infra for ML systems at scale.

CV · Writing · GitHub · LinkedIn · zmj0129@gmail.com

Selected projects

Project A (placeholder) — One sentence on the problem, the approach, and the result with a number. e.g. "Cut P99 inference latency 38% on a sharded LLM fleet by ..."
Project B (placeholder) — Short blurb on the next flagship project. Numbers > adjectives.
Project C (placeholder) — Open-source repo: what it is, why it exists, what to look at first.

More projects →

Recent writing

May 17, 2026 Hello

All writing →