Job Description:
Team Intro: Within the Seed-Infra-Training team, this sub-team is responsible for ByteDance's large model training platform. We internally support ByteDance's foundation large model training and generative AI business, covering pre-training and post-training of language models, multi-modal understanding, video generation, and more. We have built a multi-tenant, multi-cloud heterogeneous GPU computing platform for our customers, providing a set of stable, efficient, observable, and diagnosable framework and system platform components that help scale large model training to Wanka (10,000-GPU) scale and beyond.
Candidate Requirements:
Qualifications:
- Currently enrolled in a BS/MS program; solid grasp of distributed and parallel computing principles and familiarity with recent advances in computing, storage, networking, and hardware technologies.
- Familiarity with orchestration frameworks such as Kubernetes, Kubeflow, or Volcano.
- Proficiency in at least one deep learning framework (e.g., PyTorch, Megatron, DeepSpeed, vLLM).
- Experience with at least one major machine learning framework.
Preferred Qualifications:
- Knowledge of fault tolerance and system reliability.
- Experience with large-scale training and LLM systems.
- Background in AIOps and resource scheduling.
- Papers accepted at top conferences such as OSDI/SOSP/NSDI/ATC/EuroSys/SysML.
| Source: | Company website |
| Posted on: | 27 Nov 2025 (verified on 14 Dec 2025) |
| Employment type: | Internship |
| Sector: | Internet / New Media |
| Languages: | English |