Mengjie Zhao

Senior software engineer at LinkedIn, working across large-scale ads and ML infrastructure spanning retrieval, ranking, model serving, and the high-throughput, low-latency production systems underneath them. I take work end-to-end: system design, implementation, rollout, and the long tail of operating it reliably at ads scale.

Most of my recent work is close to production inference: serving paths, request routing, online experimentation, embedding pipelines, and rollout safety under tight latency budgets. I shipped differential serving, which A/B tests a new PyTorch GPU inference stack against the production CPU champion at request granularity, so an entirely new serving path can roll out without risking the one that's live. Earlier I built the stateful ads cache, which uses hybrid pull + push ingestion to maintain snapshot-level consistency and became the data foundation for nearline embedding inference and downstream real-time serving.

I tend to be the early adopter when an emerging ML workflow needs to become reliable infrastructure: ramping up in a new domain, defining system boundaries, and turning an ambiguous problem into a scalable service. These days that pull is toward LLM inference serving at scale, where I'm increasingly active in the open-source community and contributing to vLLM.
