Student Researcher [Seed LLM Post Training - Reward Modeling] - 2026 Start (PhD)

TikTok
San Jose, United States
Internship, Science/Research, English

Job description:

About the team

The Seed LLM Post Training team researches cutting-edge post-training technologies and provides core post-training capabilities for unified multimodal large models. The team's goal is to research and explore next-generation techniques such as SFT, RM, RL, and self-learning during the post-training phase, while significantly improving key areas including reasoning, coding, agents, and omni models.

PhD internships at ByteDance give students the opportunity to actively contribute to our products and research, and to the organization's future plans and emerging technologies. Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts.

Applications will be reviewed on a rolling basis; we encourage you to apply early. Please state your availability clearly in your resume (start date, end date).

Responsibilities

- Design and train reward models that reflect nuanced human preferences in LLM outputs.
- Develop and evaluate components of a Reward Model System that integrates model predictions, verifier feedback, tool usage, and agent signals to produce reliable, generalizable reward estimates.
- Develop reward models to enhance controllability and instruction-following performance, especially in scenarios involving complex, multi-part user requests.
- Contribute to data selection and synthesis pipelines that improve post-training data quality, leveraging reward signals to expand the model's capabilities.
- Research scalable methods for learning from pairwise comparisons, rankings, or human demonstrations across diverse tasks.

Candidate requirements:

Minimum Qualifications:

- Currently pursuing a PhD in Computer Science, Machine Learning, or a related technical field.
- First-author publications in top-tier venues (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP).
- Research experience in reward modeling, human preference learning, or LLM post-training.
- Proficient in Python and deep learning frameworks such as PyTorch or JAX.
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.

Preferred Qualifications:

- Experience with RLHF, DPO, rejection sampling, or ranking-based supervision methods.
- Familiarity with model-based reward composition, verifier integration, or synthetic data pipelines.
- Understanding of how reward models interact with large-scale RL and agent systems.

Source: Company website
Published: 04 Nov 2025 (checked on 15 Dec 2025)
Offer type: Internship
Sector: Internet / New Media
Languages: English