Job Description:
Imagine what you could do here. At Apple, great new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish! Are you passionate about music, movies, and the world of Artificial Intelligence and Machine Learning? So are we! Join our Human-Centered AI team for Apple Media Services. In this role, you'll represent the user perspective on new features, review and analyze data, and evaluate AI models powering everything from search and recommendations to other innovative features. Collaborate with Data Scientists, Researchers, and Engineers to drive improvements across our platforms.
We are looking for a Machine Learning Engineer focused on Evaluation & Insights for the Human-Centered AI team. In this role, you will bridge the gap between human perception and algorithmic performance, helping to evaluate and optimize Foundation Models and generative AI systems. You will architect robust evaluation frameworks, design scalable MLOps pipelines for model assessment, and translate qualitative failure modes into programmatic guardrails and training signals (e.g., SFT, RLHF/DPO). This role blends deep ML engineering expertise with strong analytical judgment to assess, interpret, and improve the behavior of advanced AI models. You will work cross-functionally with Software Engineering, Product, Research, and Responsible AI teams at Apple to ensure that our AI experiences are reliable, safe, and aligned with human expectations.
Bachelor's or Master's degree in Computer Science, Machine Learning, Artificial Intelligence, Cognitive Science, or a related technical field, with relevant industry experience in ML Engineering or Applied Research.
Advanced proficiency in Python and modern deep learning ecosystems (PyTorch, JAX, Hugging Face).
Strong ability to interpret unstructured model outputs (text, transcripts, embedding spaces) and synthesize qualitative findings into actionable engineering guidance and training objectives.
Hands-on experience developing, fine-tuning, or evaluating LLMs, multimodal models, and NLP systems.
Deep familiarity with AI quality metrics, hallucination detection techniques (e.g., SelfCheckGPT), model alignment (RLHF/DPO), and LLM-as-a-judge frameworks (e.g., G-Eval, DeepEval).
Knowledge of human factors, HCI, or cognitive science methodologies as applied to AI system design.
Proven experience building scalable ML inference pipelines, model-evaluation workflows, and structured rating frameworks for large-scale AI systems.
Experience building internal tools or automated pipelines for ML workflows using tools like MLflow, Weights & Biases, or similar platforms.
Strong familiarity with advanced prompt engineering, RAG architectures (vector databases, semantic search), and Fine-Tuning
| Source: | Company website |
| Posted on: | 12 Jun 2026 |
| Type of offer: | Graduate job |
| Industry: | Consumer Electronics |
| Languages: | English |