Generative AI Research Intern - Image-to-Video Diffusion

Apple

Beijing, China

Internship, Science/Research, English



Job description:

We are seeking a talented Generative AI Research Intern to work on image-to-video (I2V) diffusion models. This role focuses on building next-generation generative systems that transform still images into high-quality, temporally consistent videos. You will work closely with researchers and engineers to explore cutting-edge diffusion and transformer-based architectures for video generation, motion modeling, and temporal representation learning.
You will:
- Lead experimental studies on image-to-video diffusion techniques and temporal generative modeling.
- Design, implement, and evaluate model variants for motion modeling and temporal consistency.
- Build experimental pipelines and run ablation studies and benchmarking experiments.
- Analyze results and summarize findings in internal research reports or presentations.
- Study and reproduce recent work in video diffusion and I2V generation.
- Participate in research discussions and propose new ideas.
Qualifications:
- Currently pursuing a Master's or Ph.D. degree in Computer Science, Electrical Engineering, Applied Mathematics, or a related field.
- Strong background in computer vision and deep learning, with research or project experience in diffusion models, flow matching, or transformer-based generative models applied to video generation or spatiotemporal tasks.
- Experience with, or a solid understanding of, image-to-video models, video generation, temporal modeling, or video representation.
- Proficiency in Python and PyTorch, with the ability to implement research code independently.
- Familiarity with model training workflows, ablation studies, and experimental result analysis.
- Familiarity with generative video evaluation metrics such as LPIPS and FVD, and with perceptual assessment methods.
- Strong motivation for independent research and the ability to drive projects with minimal supervision.
- Ability to read, reproduce, and critically analyze recent research papers.
Preferred qualifications:
- Understanding of video fundamentals such as motion estimation, optical flow, temporal consistency, or alignment techniques.
- Prior experience with large-scale training, multimodal models, or video datasets.
- Publications in top-tier conferences or journals (e.g., CVPR, ICCV, ECCV, NeurIPS, ICLR).
- Strong curiosity and motivation for independent research in generative video modeling.
