Job Description:
Team Intro: Within the Seed-Infra-Training team, this sub-team is responsible for ByteDance's large model training platform. We internally support ByteDance's foundation large model training and generative AI business, covering pre-training and post-training of language models, multi-modal understanding, video generation, and more. We have built a multi-tenant, multi-cloud heterogeneous GPU computing platform for our customers, providing a set of stable, efficient, observable, and diagnosable framework and system platform components that help scale large model training to Wanka (10,000-GPU) scale and beyond.
Candidate Requirements:
Qualifications:
- Currently enrolled in a BS/MS program; solid grasp of distributed and parallel computing principles and familiarity with recent advances in computing, storage, networking, and hardware technologies.
- Familiarity with orchestration frameworks such as Kubernetes, Kubeflow, or Volcano.
- Proficiency in at least one deep learning framework (e.g., PyTorch, Megatron, DeepSpeed, vLLM).
- Experience with at least one major machine learning framework.
Preferred Qualifications:
- Knowledge of fault tolerance and system reliability.
- Experience with large-scale training and LLM systems.
- Background in AIOps and resource scheduling.
- Papers accepted at top conferences such as OSDI/SOSP/NSDI/ATC/EuroSys/SysML.
| Source: | Company website |
| Posted on: | 27 Nov 2025 (verified on 14 Dec 2025) |
| Employment type: | Internship |
| Sector: | Internet / New Media |
| Languages: | English |