Offer Details

Software Engineer Project Intern (Model Infrastructure) - 2026 Start (BS/MS)

TikTok
San Jose, United States
Internship, IT/Technology, English

Job description:

About the Team

The TikTok Model Infrastructure team is the core engine powering the world's most engaged "For You" feed. We focus on the engineering efficiency and architectural evolution of recommendation models at an unprecedented scale. As we lead the industry's shift toward LLM2Rec and Large Recommendation Models (LRM), our mission is to build ultra-high-performance infrastructure that bridges the gap between massive data scale and extreme algorithmic complexity.

We tackle the industry's most demanding "frontier" challenges: managing Petabyte-scale distributed embedding states, optimizing thousand-node GPU clusters, and perfecting real-time Sparse/Dense streaming. Our work ensures that models with hundreds of billions of dense parameters, on par with the world's largest LLMs, can operate with millisecond-level latency.

We are seeking Software Engineering Interns to join the Model Infra team to redefine the performance boundaries of recommendation systems. In this role, you will focus on the efficiency of the entire model lifecycle. You will work on the convergence of generative AI and recommendation architecture, optimizing everything from the raw throughput of multi-billion-parameter dense blocks to the efficient retrieval of sparse features across massive distributed memory fabrics.

As a project intern, you will have the opportunity to engage in impactful short-term projects that provide you with a glimpse of professional real-world experience. You will gain practical skills through on-the-job learning in a fast-paced work environment and develop a deeper understanding of your career interests. Applications will be reviewed on a rolling basis, so we encourage you to apply early.

Responsibilities

- Engineering Efficiency at Scale: Drive the optimization of training and inference pipelines to maximize hardware utilization (MFU/HFU) for models featuring hundreds of billions of dense parameters.
- LLM2Rec Infrastructure: Architect specialized systems to support the integration of LLMs into the recommendation stack, focusing on memory-efficient attention mechanisms and advanced KV-cache management for long-sequence user modeling.
- Massive Sparse & Dense Streaming: Build and optimize high-concurrency engines for Petabyte-scale streaming training, handling continuous parameter updates and high-frequency data ingestion without compromising stability.
- Hardware-Aware Co-Design: Work closely with researchers to design next-generation recommendation architectures optimized for modern GPU/NPU interconnects, ensuring high-bandwidth utilization across the cluster.
- Distributed State Management: Innovate on how we store and synchronize massive model states across heterogeneous memory hierarchies (HBM, DDR, and NVMe).
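The "Distributed State Management" responsibility above can be pictured with a toy sketch: a two-tier parameter store that keeps hot embedding rows in a small fast tier and demotes cold rows to a larger slow tier, standing in for an HBM/DDR/NVMe hierarchy. This is a minimal illustration under assumed semantics (LRU promotion/eviction), not TikTok's actual infrastructure; all names are hypothetical.

```python
from collections import OrderedDict

class TieredParameterStore:
    """Toy two-tier store: a small 'fast' tier (think HBM) backed by a
    larger 'slow' tier (think DDR/NVMe). Rows are promoted on access and
    the least-recently-used row spills down when the fast tier is full."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()   # key -> embedding row, in LRU order
        self.slow = {}              # overflow tier

    def put(self, key, row):
        self.slow.pop(key, None)    # a fresh write always lands in fast
        self.fast[key] = row
        self.fast.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)   # mark as recently used
            return self.fast[key]
        row = self.slow.pop(key)         # miss: promote from slow tier
        self.fast[key] = row
        self._evict()
        return row

    def _evict(self):
        while len(self.fast) > self.fast_capacity:
            cold_key, cold_row = self.fast.popitem(last=False)
            self.slow[cold_key] = cold_row   # demote the coldest row

store = TieredParameterStore(fast_capacity=2)
store.put("user:1", [0.1, 0.2])
store.put("user:2", [0.3, 0.4])
store.put("user:3", [0.5, 0.6])   # spills user:1 to the slow tier
print("user:1" in store.slow)     # True: coldest row was demoted
print(store.get("user:1"))        # promoted back: [0.1, 0.2]
```

A production system would of course deal with serialization, asynchronous prefetch, and consistency across trainers; the sketch only shows the tiering idea.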

Candidate requirements:

Minimum Qualifications

- Currently pursuing an Undergraduate or Master's degree in Software Development, Computer Science, Computer Engineering, or a related technical discipline.
- Strong programming skills in C++ and Python.
- Solid understanding of Computer Architecture and the GPU software stack (CUDA, Triton, or NCCL).
- Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and a desire to "look under the hood" of model execution runtimes.
- A strong interest in solving system-level bottlenecks in large-scale distributed environments.

Preferred Qualifications

- Experience with Transformer-based architectures and 3D parallelism (TP/PP/DP).
- Deep understanding of the torch.compile stack, including TorchDynamo (graph acquisition) and TorchInductor (lowering).
- Hands-on experience writing high-performance kernels or optimizing collective communication (e.g., customizing NCCL/UCX).
- Familiarity with RDMA networking, high-performance storage, or specialized Parameter Server architectures.
- Success in programming competitions (e.g., ACM-ICPC) or contributions to prominent open-source AI infrastructure or high-performance computing projects.
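The "Parameter Server architectures" mentioned in the preferred qualifications revolve around one core idea: sparse embedding rows are hash-sharded across servers so every trainer deterministically agrees on which server owns a given feature. A minimal sketch of that sharding scheme, with in-memory dicts standing in for the servers (all names and the shard count are hypothetical):

```python
import hashlib

NUM_SHARDS = 4

def shard_of(feature_id: str) -> int:
    """Map a sparse feature id to a shard deterministically, so every
    trainer agrees on which parameter server owns the row."""
    digest = hashlib.md5(feature_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Hypothetical in-memory "parameter servers", one dict per shard.
shards = [dict() for _ in range(NUM_SHARDS)]

def push(feature_id: str, row):
    """Write (or overwrite) an embedding row on its owning shard."""
    shards[shard_of(feature_id)][feature_id] = row

def pull(feature_ids):
    """Batch lookup: group requests by shard, mimicking one RPC per server."""
    by_shard = {}
    for fid in feature_ids:
        by_shard.setdefault(shard_of(fid), []).append(fid)
    result = {}
    for shard_id, fids in by_shard.items():   # one "RPC" per shard
        for fid in fids:
            result[fid] = shards[shard_id].get(fid)
    return result

push("item:42", [0.5, -0.5])
push("user:7", [1.0, 2.0])
print(pull(["item:42", "user:7"]))
```

Real deployments replace the dicts with networked servers, batch the per-shard requests into actual RPCs, and handle replication and failover; the hashing and request-grouping pattern is the part that carries over.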

Source: Company website
Posted: 04 Apr 2026 (verified 09 Apr 2026)
Offer type: Internship
Sector: Internet / New Media
Duration: 6 months
Languages: English