Job Description
Get to Know the Team
You will be embedded in a team at the frontier of AI evaluation. This is a technically novel and high-impact problem, as Grab's AI agents grow in complexity, becoming multi-modal, multi-agent, and multi-turn.
This is a paid internship role. You will report to our Engineering Manager II and be based onsite at our office in Petaling Jaya, Selangor.
Get to Know the Role
As an AI Engineer Intern on the Agents Platform team, you will work directly with the Senior AI Engineer to contribute to EvalsHub - Grab's platform for evaluating AI agents and LLM services. This is a hands-on engineering role where you will build features, improve eval infrastructure, and work with real agent traces from teams across Grab.
The Critical Tasks You Will Perform
* Improve features on EvalsHub - the platform used by AI teams across Grab to evaluate their agents and LLM services
* Design and implement evaluation pipelines, including support for multi-turn, multi-agent, and coding agent use cases
* Build and curate golden datasets and configure LLM-as-a-judge evaluators to identify and measure agent failure modes
* Analyze eval results to surface actionable insights for agent builders across Grab
* Collaborate with hero teams (chatbot builders, coding agent teams) to understand their evaluation needs and onboard them to the platform
* Contribute to research on evaluation strategies - e.g., multi-turn evals, model comparison workflows, trace rendering
* Document findings, methodologies, and best practices for the broader community
Required candidate profile:
Qualifications
What Essential Skills You Will Need
The must-haves
* Available to start from May 2026 onwards, for a minimum duration of 3 months
* Pursuing a degree in Computer Science, Software Engineering, AI/ML, or a related field
* Good programming skills in Python, Go; comfortable reading and writing production-quality code
* Familiarity with LLMs and AI agent concepts (prompting, tool use, agentic loops)
* Curiosity and rigour - you care about measuring things correctly, not just quickly
* Independent, and able to communicate findings clearly
The nice-to-haves
* Prior exposure to evaluation frameworks, benchmarking, or ML experimentation pipelines
* Experience with LLM tracing or observability tools (e.g., LangSmith, OpenTelemetry)
* Familiarity with coding agent frameworks (Claude Code, Codex, or similar)
* Experience with Temporal workflows or distributed task orchestration
* Contributions to open-source AI/ML projects
| Source: | Company website |
| Posted: | 21 May 2026 (verified 22 May 2026) |
| Job type: | Internship |
| Sector: | ICT / IT |
| Languages: | English |