Job description:
Publication date: Apr 08, 2026, 12:15 AM
Large Language Models (LLMs) have been impacting our daily lives since their breakthrough in 2022 [Ouyang et al. 2022]. Although recent efforts have integrated reasoning steps into the training process of these models, they still struggle to reason and plan, remaining error-prone and hard to control. These limitations cannot be solved by advanced prompting alone (e.g. chain-of-thought (CoT) or agentic approaches such as ReAct [Yao et al. 2022]). Furthermore, LLMs frequently neglect grounding: the process by which human speakers establish mutual understanding. Typically, LLMs are overconfident and rarely repair or clarify an uncertainty. For Small Language Models, or Language Models (LMs) for short, these limitations are more pronounced, since they neither scale nor generalize well due to their limited number of parameters.
Reinforcement Learning from Human Feedback (RLHF) has proven remarkably effective at improving LLM capabilities, significantly reducing hallucinations and unsafe content [Ouyang et al. 2022; Havrilla et al. 2024]. DeepSeek has shown that combining deterministic and probabilistic rewards improves reasoning without degrading fluency. However, RLHF is not efficient for small LMs. Therefore, multi-agent reinforcement learning (MRL) has recently gained interest as a way to overcome the inherent limitations of LMs [Li et al. 2024; Lowe et al. 2017]. In this thesis, we propose to study multi-agent reinforcement learning by decomposing complex conversation tasks into three sub-tasks: grounding, reasoning and planning. The study will focus on small language models.
Objectives:
* To study MRL as a way to adjust the weights of specialized LM agents that work collaboratively in a multi-agent setting, going beyond classic solutions such as prompting (simple or advanced), RAG or fine-tuning.
* To apply MRL to complex tasks, such as those in public benchmarks and notably Orange's use cases (e.g. resolving technical problems in the network or in our products).
We will address the following challenges to reach these objectives:
* We will study how MRL can help LMs acquire complex capabilities.
* We need to find the optimal decomposition of a complex task into sub-tasks.
* Defining an adequate reward function is also a challenge. We need to evaluate the distinct rewards in the MRL setting and propose the most suitable ones.
MRL allows multiple specialized agents to work together. By cooperating, they overcome their individual limitations and solve complex tasks.
References:
Yao et al. 2022. ReAct: Synergizing reasoning and acting in language models.
Ouyang et al. 2022. Training language models to follow instructions with human feedback.
Li et al. 2024. More agents is all you need. TMLR.
Lowe et al. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. NeurIPS.
Havrilla et al. 2024. Teaching large language models to reason with reinforcement learning. ICML.
Skills (Technical and scientific) and soft skills
* You have experience in the fields of Artificial Intelligence, Machine Learning and particularly in deep learning.
* You have a strong background in mathematics (numerical optimization, statistics, probability, etc.).
* You are proficient in software development.
* You are proficient in reading, writing and speaking English.
* You are curious, attracted by new technologies, and ready to keep up with their evolution.
* You enjoy working in a team, within multidisciplinary projects, and contributing to a common goal, while being autonomous in your activities.
* You have good analytical and synthesis skills.
* Proficiency in one of the following deep learning tools is desired: Torch, PyTorch, TensorFlow, MXNet.
* You like to communicate the results of your work through written reports and oral presentations, preferably in English.
Required training (master's degree, engineering degree, PhD, scientific and technical field, etc.)
* Engineering degree and/or Research Master's degree, with knowledge in machine learning and in at least one of the fields listed above.
Desired experience (internships, etc.)
Initial experience implementing deep learning algorithms (for example, as part of an internship) is desirable.
You will join a team specialized in dialogue, where you will work with researchers, data scientists, architects, developers, PhD students and interns.
The ambition of the Innovation Division is to take Orange's innovation further and strengthen its technological leadership, by mobilizing our research capabilities to foster responsible, human-centered innovation, inform the Group's long-term strategic choices, and influence the global digital ecosystem.
We train the experts in the technologies of today and tomorrow, and ensure continuous improvement in the performance of our services and in our efficiency. Worldwide, the Innovation Division brings together 6,000 employees dedicated to research and innovation, including 740 researchers. With a global vision and a wide diversity of profiles (researchers, engineers, designers, developers, data scientists, sociologists, graphic designers, marketers, cybersecurity experts, and more), the women and men of Innovation listen to and serve countries, regions and business units to make Orange a trusted multi-service operator.
Within Innovation, you will join the Data & AI department. Its main mission is to make Orange a "data-driven" company that sets the Group's standards for data and artificial intelligence and facilitates the development of data use cases, products and services. This department is called upon to support the entire Orange group.
At Orange, only your skills matter.
Regardless of your age, gender, background, origin, religion, sexual orientation, disability, neurodiversity, or appearance, we actively encourage diversity within our teams, as it is a collective strength and a driver of innovation.
Orange is a disability-inclusive employer: please feel free to let us know about any specific needs you may have.
| Source: | Company website |
| Published on: | 11 Apr 2026 (verified on 15 Apr 2026) |
| Employment type: | Graduate Programme |
| Sector: | Telecommunications |
| Duration: | 36 months |
| Languages: | English |