Job Description:
About the Team

The Seed Infrastructures team oversees distributed training, reinforcement learning frameworks, high-performance inference, and heterogeneous hardware compilation technologies for AI foundation models. As a project intern, you will engage in impactful short-term projects that give you a glimpse of real-world professional experience. You will gain practical skills through on-the-job learning in a fast-paced work environment and develop a deeper understanding of your career interests. Applications are reviewed on a rolling basis; we encourage you to apply early.

Responsibilities
- Contribute to AI compiler optimizations for training and inference workloads
- Develop and extend MLIR-based compiler passes for graph lowering, optimization, and code generation
- Optimize model execution on GPU and NPU accelerators, focusing on performance, memory efficiency, and scalability
- Support model deployment pipelines, including compilation, packaging, and runtime integration
- Assist with distributed training and inference acceleration, such as parallel execution, communication optimization, and runtime scheduling
- Benchmark, profile, and analyze the performance of large-scale models across different hardware backends
- Collaborate with researchers and engineers to translate model and system requirements into compiler and runtime improvements
Candidate Requirements:
Minimum Qualifications
- Currently pursuing a Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field
- Experience using or developing open-source frameworks for LLM inference, such as vLLM or SGLang
- Proficiency in at least one deep learning framework (e.g., PyTorch, Megatron, DeepSpeed, JAX), with experience in model inference workflows
- Understanding of modern computing systems, including hardware, storage, and networking, and how they impact ML workloads
- Familiarity with compilers, model optimization pipelines (e.g., PyTorch Dynamo), or related model execution workflows
- Ability to commit to working for 12 weeks in 2026

Preferred Qualifications
- Experience with distributed or large-scale ML systems, including training or inference pipelines and related optimizations (e.g., FSDP, DeepSpeed, Megatron, GSPMD)
- Experience with GPU/TPU/NPU programming and performance optimization, or high-performance computing and communication (e.g., CUDA, Triton, NCCL, RDMA)
- Understanding of AI compiler and model optimization stacks (e.g., torch.fx, PyTorch Dynamo, XLA, MLIR)
Source: Company website
Posted on: 11 Apr 2026 (verified 15 Apr 2026)
Type of offer: Internship
Industry: Internet / New Media
Languages: English