Maxime Wabartha


PhD student,
Reasoning and Learning Lab, Mila,
School of Computer Science, McGill University
E-mail: maxime.wabartha@mail.mcgill.ca
Github: maxwab

About me

I am a machine learning researcher specializing in reinforcement learning, with nearly two years of industry research experience on the FAIR team at Meta and at RBC Borealis. During my PhD, I focused on improving the transparency of reinforcement learning agents and supervised learning neural networks at several scales. I designed a more interpretable architecture for robotics agents to better decompose their behavior, leading to a simulated robot solving mazes with a handful of linear sub-policies. I provided supervised learning models with a more discriminative out-of-distribution (OOD) detection mechanism, which lets them more accurately recognize situations they were not trained to handle. Finally, I studied the properties of embeddings that are useful for efficiently evaluating the value of recommender system policies from logged data.

With a strong background in exploratory research, I am comfortable working in emerging fields where the right questions are yet to be defined. Drawing on my industry experience and engineering training, I use large-scale computational resources to quickly validate ideas.

Research

Currently, my research interests include

  • Reinforcement learning

  • Transparency and interpretability

  • Representation learning

Recent Publications

* denotes an equal contribution.

  • Wabartha, M., Wilson, K., Evans, D., Sharifi-Noghabi, H. & Sylvain, T. (2025). “Investigating Action Embeddings for More Efficient Off-Policy Evaluation”. RecSys workshop on Causality, Counterfactuals & Sequential Decision-Making (CONSEQUENCES ’25).

  • Wabartha, M. & Pineau, J. (2025). “Object-Centric Concept Representation and Use in RL Agents”. Under revision.

  • Danesh, M., Wabartha, M., Pineau, J. & Lin, H. C. (2025). “Mitigating Distribution Shifts: Uncertainty-aware Offline-to-online Reinforcement Learning”. Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM ’25).

  • Wabartha, M. & Pineau, J. (2024). “Piecewise Linear Parametrization of Policies for Interpretable Deep Reinforcement Learning”. International Conference on Learning Representations (ICLR ’24).

Full list of publications.
A brief CV (last updated: 2026/02/04).

This website was created using Jemdoc.