Masked IRL: LLM-Guided Reward Disambiguation
from Demonstrations and Language

MIT CSAIL

ArXiv (TBD) | Code


How can robots learn reward functions that capture true human preferences when demonstrations and instructions are ambiguous?

Abstract


Robots can adapt to user preferences by learning reward functions from demonstrations, but with limited data, reward models often overfit to spurious correlations and fail to generalize. This happens because demonstrations show robots how to do a task but not what matters for that task, causing the model to focus on irrelevant state details. Natural language can more directly specify what the robot should focus on, and, in principle, disambiguate between many reward functions consistent with the demonstrations. However, existing language-conditioned reward learning methods typically treat instructions as simple conditioning signals, without fully exploiting their potential to resolve ambiguity. Moreover, real instructions are often ambiguous themselves, so naive conditioning is unreliable. Our key insight is that these two input types carry complementary information: demonstrations show how to act, while language specifies what is important. We propose Masked Inverse Reinforcement Learning (Masked IRL), a framework that uses large language models (LLMs) to combine the strengths of both input types. Masked IRL infers state-relevance masks from language instructions and enforces invariance to irrelevant state components. When instructions are ambiguous, it uses LLM reasoning to clarify them in the context of the demonstrations. In simulation and on a real robot, Masked IRL outperforms prior language-conditioned IRL methods by up to 15% while using up to 4.7 times less data, demonstrating improved sample-efficiency, generalization, and robustness to ambiguous language.


Method: Masked Inverse Reinforcement Learning

Masked IRL is a language-conditioned reward learning framework that reasons jointly over language and demonstrations. Given an ambiguous instruction and a user demonstration, an LLM first disambiguates the language in the context of a reference (shortest-path) trajectory. The clarified instruction is then passed to a second LLM that predicts which state dimensions are relevant for that preference, producing a binary state relevance mask.
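As a concrete illustration, the mask-prediction step might look roughly like the sketch below. The `call_llm` hook, prompt wording, and dimension names (following the semantic features used in the experiments below) are illustrative placeholders, not the exact prompts or interface used in the paper.

```python
# Minimal sketch of the state-relevance mask prediction step. The call_llm
# hook, prompt wording, and dimension names are illustrative placeholders.
STATE_DIMS = ["dist_to_table", "dist_to_human", "dist_to_laptop",
              "dist_to_face", "mug_orientation"]

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion request to an LLM."""
    raise NotImplementedError

def predict_state_mask(clarified_instruction: str) -> list[int]:
    """Ask the LLM which state dimensions matter for the given preference,
    returning a binary relevance mask (1 = relevant, 0 = irrelevant)."""
    prompt = (
        f"Instruction: '{clarified_instruction}'\n"
        f"State dimensions: {', '.join(STATE_DIMS)}\n"
        "For each dimension, answer 1 if it is relevant to the instruction "
        "and 0 otherwise. Reply with a comma-separated list of 0/1 flags."
    )
    reply = call_llm(prompt)
    return [int(tok.strip()) for tok in reply.split(",")]
```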

Instead of explicitly zeroing out masked dimensions, Masked IRL applies an implicit masking loss: it perturbs irrelevant state dimensions with random noise and penalizes changes in the predicted reward. This drives the reward model to become invariant to irrelevant features while remaining sensitive to the parts of the state that language indicates matter for the task. The overall objective combines a standard Maximum Entropy IRL loss with this masking loss.
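A minimal PyTorch sketch of this objective is given below, assuming a reward network that scores a state vector conditioned on a language embedding; the architecture, noise scale, and loss weighting are assumptions for illustration rather than the exact implementation.

```python
# Minimal PyTorch sketch of the implicit masking loss combined with a MaxEnt
# IRL term. Architecture, noise scale, and the loss weight are assumptions.
import torch
import torch.nn as nn

class LanguageConditionedReward(nn.Module):
    """Scalar reward from a state vector and a language embedding."""
    def __init__(self, state_dim: int, lang_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + lang_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states, lang_emb):
        lang_emb = lang_emb.expand(*states.shape[:-1], lang_emb.shape[-1])
        return self.net(torch.cat([states, lang_emb], dim=-1)).squeeze(-1)

def masking_loss(reward_net, states, lang_emb, relevance_mask, noise_scale=1.0):
    """Perturb only the irrelevant state dimensions with random noise and
    penalize any change in the predicted reward.

    relevance_mask: (state_dim,) tensor of 0/1 flags (1 = relevant).
    """
    irrelevant = 1.0 - relevance_mask.float()   # 1 where language says "ignore"
    perturbed = states + torch.randn_like(states) * noise_scale * irrelevant
    return (reward_net(states, lang_emb) - reward_net(perturbed, lang_emb)).pow(2).mean()

def maxent_irl_loss(reward_net, demo_states, candidate_states, lang_emb):
    """MaxEnt IRL surrogate: the demonstration's return should be high under a
    Boltzmann distribution over the demo and sampled alternative trajectories."""
    r_demo = reward_net(demo_states, lang_emb).sum()              # scalar return of the demo
    r_cands = reward_net(candidate_states, lang_emb).sum(dim=-1)  # (num_candidates,) returns
    logits = torch.cat([r_demo.unsqueeze(0), r_cands])
    return -(r_demo - torch.logsumexp(logits, dim=0))             # -log p(demo)

# Overall objective (lambda_mask is an assumed weighting coefficient):
#   loss = maxent_irl_loss(...) + lambda_mask * masking_loss(...)
```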

At test time, the reward model takes a new instruction and state as input and implicitly infers which state components are important through its language-conditioned architecture, enabling trajectory optimization for novel language-specified preferences.

Overview of the Masked IRL pipeline.
System overview of Masked IRL: LLM-based language disambiguation, state mask prediction, masking loss, and trajectory optimization.

Experiments

RQ1: Efficiency of the Masking Loss (Simulation)

We first evaluate Masked IRL in a PyBullet simulation of an object handover task with a Franka arm. Ground-truth rewards are linear combinations of five semantic features (distances to the table, human, laptop, human face, and mug orientation), and each preference is paired with a language instruction that refers to a subset of these features. We compare Masked IRL to a language-conditioned IRL baseline (LC-RL) and an explicit masking baseline that zeros out state dimensions indicated as irrelevant by the mask.
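For concreteness, a sketch of how such a ground-truth preference can be represented is shown below; the feature names, weights, and sign convention are illustrative.

```python
# Sketch of a simulated ground-truth preference: a linear combination of five
# semantic features paired with a language instruction that refers to a subset
# of them. Weights and sign conventions here are illustrative.
import numpy as np

FEATURES = ["dist_to_table", "dist_to_human", "dist_to_laptop",
            "dist_to_face", "mug_orientation"]

def ground_truth_reward(feature_values: np.ndarray, weights: np.ndarray) -> float:
    """R(s) = w . phi(s); sparse preferences use one nonzero weight,
    medium and dense preferences weight several features."""
    assert feature_values.shape == weights.shape == (len(FEATURES),)
    return float(weights @ feature_values)

# Example sparse preference: only the distance to the human matters.
w_sparse = np.array([0.0, -1.0, 0.0, 0.0, 0.0])  # negative weight: closer is better
instruction = "Stay close to the human."          # paired language instruction
```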

Average win rate vs number of demonstrations for each method.
Average win rate when comparing trajectories across sparse, medium, and dense rewards.

Masked IRL achieves higher win rates than LC-RL and remains robust under noisy LLM-generated masks, whereas explicit masking degrades sharply when the masks are imperfect. The masking loss improves sample efficiency: Masked IRL can match or exceed LC-RL performance with up to 4.7× fewer demonstrations.


Example Simulated Trajectories per Method

Below are example optimized trajectories in simulation.


Example Visualization of Learned Rewards

Below are visualizations of learned rewards on trajectory sets. Trajectories with higher reward are drawn in darker blue.


RQ2: Robustness to Ambiguous Language

To study robustness to underspecified instructions, we generate ambiguous commands (such as referent-omitted “Stay away” or expression-omitted “Table”) for sparse preferences that focus on a single feature. Masked IRL uses the LLM disambiguation step to propose clarified commands in context (e.g., “Stay away from the table” or “Stay close to the human”), and then derives state masks from these clarified instructions.
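A rough sketch of this disambiguation call is shown below, using the same placeholder `call_llm` hook as in the mask-prediction sketch above; the prompt wording is illustrative, not the paper's actual prompt.

```python
# Sketch of the LLM disambiguation step for ambiguous commands. The call_llm
# hook and prompt wording are placeholders.
def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion request to an LLM."""
    raise NotImplementedError

def disambiguate(ambiguous_instruction: str, demo_summary: str, reference_summary: str) -> str:
    """Clarify an underspecified command using the demonstration as context."""
    prompt = (
        f"Ambiguous instruction: '{ambiguous_instruction}'\n"
        f"User demonstration: {demo_summary}\n"
        f"Reference shortest-path trajectory: {reference_summary}\n"
        "Rewrite the instruction so that both the referent and the spatial "
        "relation are explicit, e.g. 'Stay away' -> 'Stay away from the table'."
    )
    return call_llm(prompt)
```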

Performance with ambiguous vs disambiguated instructions.
Disambiguated instructions (DI) improve state-mask quality and downstream reward prediction compared to using ambiguous commands directly (AI).

RQ3: Real-World Evaluation on a Franka Arm

Finally, we evaluate zero-shot transfer to a real Franka Emika Panda arm performing an object handover task with a human. We collect kinesthetic demonstrations for 50 language-labeled preferences and fit reward models using the same training procedure as in simulation. At test time, we optimize trajectories over a set of candidate motions and execute the trajectory that maximizes the learned reward.
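At a high level, this selection step could look like the sketch below; `reward_net`, `encode_language`, and the candidate motion set are assumed to come from the training pipeline, and the names are illustrative.

```python
# Sketch of test-time trajectory selection: score candidate motions with the
# learned language-conditioned reward and execute the best one.
import torch

def select_trajectory(reward_net, candidates, instruction, encode_language) -> int:
    """candidates: list of (T, state_dim) tensors; returns the index of the
    candidate motion with the highest predicted return under the instruction."""
    lang_emb = encode_language(instruction)  # (lang_dim,) language embedding
    with torch.no_grad():
        returns = torch.stack([reward_net(traj, lang_emb).sum() for traj in candidates])
    return int(returns.argmax())
```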

Trajectory optimization diagram.
Trajectory optimization selects the motion that maximizes the learned reward.
Zero-shot performance and regret on real robot.
Zero-shot performance on novel real-robot preferences: win rate, reward variance under irrelevant perturbations, and regret of optimized trajectories.

Masked IRL obtains higher win rates, lower reward variance when irrelevant state dimensions are perturbed, and substantially lower regret of optimized trajectories than LC-RL and explicit masking baselines, indicating better alignment with the underlying human preferences.


Example Real-Robot Executions per Method

For the test instruction “Stay close to the table surface and away from the human’s face.”:

Baseline LC-RL
LC-RL often fails to simultaneously satisfy both distance and safety constraints.
Baseline Explicit Mask (LLM Mask)
Explicit masking improves behavior but remains sensitive to mask errors.
Ours Masked IRL (LLM Mask)
Masked IRL maintains a safe distance from the human’s face while staying close to the table and keeping the cup upright.

BibTeX

@article{hwang2025maskedirl,
  title   = {Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language},
  author  = {Hwang, Minyoung and Forsey-Smerek, Alexandra and Dennler, Nathaniel and Bobu, Andreea},
  journal = {arXiv preprint},
  year    = {2025},
}