Deep Learning Performance Architect Intern - 2025

NVIDIA
Beijing, China
Internship, IT/Technology, English

Description:

We are looking for a first-class Deep Learning Performance Architect to join us and shape the performance analysis infrastructure for GPUs. We build cutting-edge analysis tools and visualization frameworks that empower engineers to optimize GPU performance for Deep Learning and HPC workloads, spanning pre-silicon architectural exploration to post-silicon validation and optimization. Your work will directly shape the tools that define how NVIDIA GPUs are analyzed, tuned, and scaled for next-generation AI systems, and will influence next-generation GPU architectures.

What you'll be doing:
* Architect Performance Tooling: Develop infrastructure tools/libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
* Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities.
* AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure.
* Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to co-design performance-centric solutions.
* End-to-End Optimization: Create benchmarks to validate performance improvements across AI/HPC workloads and present actionable insights.

What we need to see:
* BS/MS+ in relevant discipline (CS, EE, Math)
* Proficiency in C/C++ (performance-critical coding) and Python (automation/scripting and AI/ML frameworks)
* Strong grasp of computer architecture (pipelines, memory hierarchies) and Operating System fundamentals
* Understanding of machine learning and data analysis basics, and of LLM techniques such as prompt engineering, fine-tuning, and vector databases
* Experience with performance modeling, architecture simulation, profiling, and analysis.
* Self-starter who thrives in dynamic environments and manages competing priorities effectively.

Ways to stand out from the crowd:
* Experience with developing HW performance debugging and analysis tools
* Familiarity with the system software stack (such as the CUDA driver) and CUDA kernel optimization, and an understanding of GPU architecture
* Familiarity with GPU performance profiling tools like Nsight Systems and Nsight Compute
* Practical experience or projects demonstrating LLM-based code generation, automated data analysis, or workflow assistants. Prior experience with agentic LLM frameworks like LangChain and LlamaIndex.
* Full-Stack Versatility: Skills in JavaScript, SQL, or UI/UX design for tool interfaces

Source: Company website
Date: 06 Jun 2025 (verified on 15 Jul 2025)
Job type: Internship
Sector: Consumer electronics
Language skills: English