Student Researcher [Seed Vision - Multimodal Interaction & World Model Pretraining] - 2026 Start (PhD)

TikTok
San Jose, United States
Internship, Science/Research, English

Job description:

About the team

The Seed Multimodal Interaction and World Model team is dedicated to developing models with human-level multimodal understanding and interaction capabilities. The team also aspires to advance the exploration and development of multimodal assistant products. It employs pre-training and simulation technologies to model various environments of the virtual and physical worlds, providing foundational capabilities for multimodal interactive exploration.

We are looking for talented individuals to join us for an internship in 2026. PhD internships at ByteDance aim to provide students with the opportunity to actively contribute to our products and research, and to the organization's future plans and emerging technologies. Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts.

Applications will be reviewed on a rolling basis, so we encourage you to apply early. Please state your availability clearly in your resume (start date, end date).

Responsibilities:
- Contribute to research and engineering to advance world models and multimodal understanding, enhancing models' reasoning and generation capabilities.
- Design and prototype novel architectures that balance modeling performance, generalization, and efficiency.
- Help establish scaling laws and conduct systematic ablations to derive transferable insights across model families and tasks.

Candidate requirements:

Minimum Qualifications:
- Currently pursuing a PhD in Computer Vision, Machine Learning, or a related technical field.
- Familiarity with multimodal modeling, world models, or foundation model pretraining.
- Strong coding skills and hands-on experience with PyTorch or JAX.
- Experience with large-scale distributed training frameworks and GPU/TPU compute stacks.
- Demonstrated research ability, with publications in top-tier conferences such as CVPR, ICCV, ECCV, NeurIPS, ICML, or ICLR.

Preferred Qualifications:
- Experience working with transformer-based architectures, including dense and Mixture-of-Experts (MoE) models.
- Understanding of scaling behavior in foundation models and how to analyze it.
- Familiarity with data preparation pipelines for large-scale multimodal pretraining.

Source: Company website
Published: 09 Dec 2025 (checked on 05 Jan 2026)
Offer type: Internship
Sector: Internet / New Media
Languages: English