Mengjie Zhao
Infra engineer at LinkedIn working on latency-sensitive, real-time ML inference for ads scoring and ranking at internet scale. Currently going deep on LLM inference serving: paged attention, continuous batching, and disaggregated serving.
Notes, projects, and writing on infra for ML systems at scale.
CV · Writing · GitHub · LinkedIn · zmj0129@gmail.com
Selected projects
- Project A (placeholder) — One sentence on the problem, the approach, and the result with a number. e.g. "Cut P99 inference latency 38% on a sharded LLM fleet by ..."
- Project B (placeholder) — Short blurb on the next flagship project. Numbers > adjectives.
- Project C (placeholder) — Open-source repo: what it is, why it exists, what to look at first.