Student Researcher [Seed Vision - Multimodal Interaction & World Model Pretraining] - 2026 Start (PhD)

TikTok

San Jose, United States

Internship, Science/Research, English

Job description:

About the team

The Seed Multimodal Interaction and World Model team is dedicated to developing models with human-level multimodal understanding and interaction capabilities. The team also aspires to advance the exploration and development of multimodal assistant products. It employs pre-training and simulation technologies to model various environments of the virtual and physical world, providing foundational capabilities for multimodal interactive exploration.

We are looking for talented individuals to join us for an internship in 2026. PhD internships at ByteDance provide students with the opportunity to actively contribute to our products and research, and to the organization's future plans and emerging technologies. Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts.

Applications will be reviewed on a rolling basis, so we encourage you to apply early. Please state your availability clearly in your resume (start date, end date).

Responsibilities:
- Contribute to research and engineering to advance world models and multimodal understanding, enhancing models' reasoning and generation capabilities.
- Design and prototype novel architectures that balance modeling performance, generalization, and efficiency.
- Help establish scaling laws and conduct systematic ablations to derive transferable insights across model families and tasks.

Candidate requirements:

Minimum Qualifications:
- Currently pursuing a PhD in Computer Vision, Machine Learning, or a related technical field.
- Familiarity with multimodal modeling, world models, or foundation model pretraining.
- Strong coding skills and hands-on experience with PyTorch or JAX.
- Experience with large-scale distributed training frameworks and GPU/TPU compute stacks.
- Demonstrated research ability, with publications in top-tier conferences such as CVPR, ICCV, ECCV, NeurIPS, ICML, or ICLR.

Preferred Qualifications:
- Experience working with transformer-based architectures, including dense and Mixture-of-Experts (MoE) models.
- Understanding of scaling behavior in foundation models and how to analyze it.
- Familiarity with data preparation pipelines for large-scale multimodal pretraining.
