Description:
Team Intro: Within the Seed-Infra-Training team, this sub-team is responsible for ByteDance's large model training platform. We support ByteDance's internal foundation model training and generative AI business, covering pre-training and post-training of language models, multi-modal understanding, video generation, and more. We have built a multi-tenant, multi-cloud, heterogeneous GPU computing platform for our customers, providing a set of stable, efficient, observable, and diagnosable framework, system, and platform components that help scale large model training to the 10,000-GPU level ("Wanka") and beyond.
Your profile:
Qualifications:
- Currently enrolled in a BS/MS program, with an understanding of distributed and parallel computing principles and knowledge of recent advances in computing, storage, networking, and hardware technologies
- Familiarity with orchestration frameworks such as Kubernetes, Kubeflow, or Volcano
- Proficiency in at least one deep learning framework (e.g., PyTorch, Megatron, DeepSpeed, vLLM)
- Experience with at least one major machine learning framework

Preferred Qualifications:
- Knowledge of fault tolerance and system reliability
- Experience with large-scale training and LLM systems
- Background in AIOps and resource scheduling
- Papers accepted at top conferences such as OSDI/SOSP/NSDI/ATC/EuroSys/SysML
| Source: | Company website |
| Date: | 27 Nov 2025 (verified on 15 Dec 2025) |
| Job type: | Internship |
| Sector: | Internet / New Media |
| Language skills: | English |