Job Responsibilities: Optimize the inference deployment of large models on computing clusters, focusing on multi-node, multi-GPU parallel inference, task scheduling, KV cache management, and other techniques to enhance inference performance...
Job Location: Hong Kong, Hong Kong