Mengjie Zhao

Infra engineer at LinkedIn working on latency-sensitive real-time ML inference for ads scoring and ranking at internet scale. Currently going deep on LLM inference serving: paged attention, continuous batching, and disaggregated serving.

Notes, projects, and writing on infra for ML systems at scale.

CV · Writing · GitHub · LinkedIn · zmj0129@gmail.com

Selected projects

More projects →

Recent writing

All writing →