Masked IRL is a language-conditioned reward learning framework that reasons jointly over language and demonstrations. Given an ambiguous instruction and a user demonstration, an LLM first disambiguates the language in the context of a reference (shortest-path) trajectory. The clarified instruction is then passed to a second LLM that predicts which state dimensions are relevant for that preference, producing a binary state relevance mask.
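To make the two-stage pipeline concrete, here is a minimal sketch of how the disambiguation and mask-prediction steps could be wired together. The prompt wording, the `llm` callable, and the state-dimension names are illustrative assumptions, not the exact prompts or interfaces used by Masked IRL.

```python
from typing import Callable, List

def clarify_instruction(llm: Callable[[str], str],
                        instruction: str,
                        reference_traj_summary: str) -> str:
    """Stage 1: ask an LLM to resolve ambiguity in the instruction,
    given a textual summary of the reference (shortest-path) trajectory."""
    prompt = (
        f"Instruction: {instruction}\n"
        f"Reference trajectory: {reference_traj_summary}\n"
        "Rewrite the instruction so that it is unambiguous."
    )
    return llm(prompt)

def predict_state_mask(llm: Callable[[str], str],
                       clarified: str,
                       state_dims: List[str]) -> List[int]:
    """Stage 2: ask a second LLM which state dimensions are relevant
    and parse the answer into a binary state relevance mask."""
    prompt = (
        f"Clarified instruction: {clarified}\n"
        f"State dimensions: {', '.join(state_dims)}\n"
        "For each dimension, answer 1 if it is relevant to the preference, "
        "else 0, as a comma-separated list."
    )
    answer = llm(prompt)
    return [int(tok.strip()) for tok in answer.split(",")][: len(state_dims)]
```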
Instead of explicitly zeroing out masked dimensions, Masked IRL applies an implicit masking loss: it perturbs irrelevant state dimensions with random noise and penalizes changes in the predicted reward. This drives the reward model to become invariant to irrelevant features while remaining sensitive to the parts of the state that language indicates matter for the task. The overall objective combines a standard Maximum Entropy IRL loss with this masking loss.
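The sketch below illustrates the implicit masking loss under stated assumptions: the `reward_model(lang_emb, states)` call signature, the noise scale, and the loss weight `lam` are placeholders rather than the paper's exact hyperparameters.

```python
import torch

def masking_loss(reward_model, lang_emb, states, mask, noise_scale=1.0):
    """Perturb irrelevant state dimensions (mask == 0) with random noise
    and penalize any change in the predicted reward."""
    noise = noise_scale * torch.randn_like(states)
    perturbed = states + noise * (1.0 - mask)   # only irrelevant dims are moved
    r_clean = reward_model(lang_emb, states)
    r_perturbed = reward_model(lang_emb, perturbed)
    return ((r_clean - r_perturbed) ** 2).mean()

def total_loss(maxent_irl_loss, reward_model, lang_emb, states, mask, lam=0.1):
    """Overall objective: standard Maximum Entropy IRL loss plus the masking term."""
    return maxent_irl_loss + lam * masking_loss(reward_model, lang_emb, states, mask)
```

Because only the masked-out dimensions are perturbed, the penalty pushes the reward to be invariant exactly where language says the state does not matter, without hard-zeroing those inputs.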
At test time, the reward model takes a new instruction and state as input and implicitly infers which state components are important through its language-conditioned architecture, enabling trajectory optimization for novel language-specified preferences.
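As a rough illustration of test-time use, the following sketch refines a candidate trajectory by gradient ascent on the summed predicted reward; the gradient-based optimizer, step count, and learning rate are illustrative choices, not necessarily the planner used in the paper.

```python
import torch

def optimize_trajectory(reward_model, lang_emb, init_traj, steps=100, lr=1e-2):
    """Maximize the summed predicted reward over a trajectory's states.
    init_traj: (T, state_dim) tensor of waypoints for a new instruction."""
    traj = init_traj.clone().requires_grad_(True)
    opt = torch.optim.Adam([traj], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # The reward model is conditioned on the instruction embedding, so the
        # relevance of state components is inferred implicitly; no explicit
        # mask is needed at test time.
        loss = -reward_model(lang_emb, traj).sum()
        loss.backward()
        opt.step()
    return traj.detach()
```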