Student Researcher [Seed LLM Post Training - Reward Modeling] - 2026 Start (PhD)

TikTok

San Jose, United States

Internship, Science/Research, English

Job description:

About the team

The Seed LLM Post Training team researches cutting-edge post-training technologies and provides core post-training capabilities for unified multimodal large models. The team's goal is to research and explore next-generation techniques for the post-training phase, such as SFT, RM, RL, and self-learning, while significantly improving key areas including reasoning, coding, agents, and omni models.

PhD internships at ByteDance provide students with the opportunity to actively contribute to our products and research, and to the organization's future plans and emerging technologies. Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts.

Applications will be reviewed on a rolling basis, so we encourage you to apply early. Please state your availability clearly in your resume (start date, end date).

Responsibilities:

- Design and train reward models that reflect nuanced human preferences in LLM outputs.
- Develop and evaluate components of a Reward Model System that integrates model predictions, verifier feedback, tool usage, and agent signals to produce reliable, generalizable reward estimates.
- Develop reward models to enhance controllability and instruction-following performance, especially in scenarios involving complex, multi-part user requests.
- Contribute to data selection and synthesis pipelines that improve post-training data quality, leveraging reward signals to expand the model's capabilities.
- Research scalable methods for learning from pairwise comparisons, rankings, or human demonstrations across diverse tasks.

Candidate requirements:

Minimum Qualifications:

- Currently pursuing a PhD in Computer Science, Machine Learning, or a related technical field.
- First-author publications in top-tier venues (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP).
- Research experience in reward modeling, human preference learning, or LLM post-training.
- Proficient in Python and deep learning frameworks such as PyTorch or JAX.
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.

Preferred Qualifications:

- Experience with RLHF, DPO, rejection sampling, or ranking-based supervision methods.
- Familiarity with model-based reward composition, verifier integration, or synthetic data pipelines.
- Understanding of how reward models interact with large-scale RL and agent systems.
