Job description:
Team Intro: Within the Seed-Infra-Training team, this sub-team is responsible for ByteDance's large model training platform. We support ByteDance's foundation model training and generative AI businesses internally, covering pre-training and post-training of language models, multimodal understanding, video generation, and more. We have built a multi-tenant, multi-cloud, heterogeneous GPU computing platform for our customers, providing a set of stable, efficient, observable, and diagnosable framework, system, and platform components that help scale large model training to ten thousand GPUs ("Wanka") and beyond.
Candidate requirements:
Qualifications:
- Currently enrolled in a BS/MS program, with knowledge of distributed and parallel computing principles and of recent advances in computing, storage, networking, and hardware technologies
- Familiarity with orchestration frameworks such as Kubernetes, Kubeflow, or Volcano
- Proficiency in at least one deep learning framework (e.g., PyTorch, Megatron, DeepSpeed, vLLM)
- Experience with at least one major machine learning framework

Preferred Qualifications:
- Knowledge of fault tolerance and system reliability
- Experience with large-scale training and LLM systems
- Background in AIOps and resource scheduling
- Papers accepted at top conferences such as OSDI/SOSP/NSDI/ATC/EuroSys/SysML
| Source: | Company website |
| Posted: | 27 Nov 2025 (verified 14 Dec 2025) |
| Job type: | Internship |
| Industry: | Internet / New Media |
| Languages: | English |