Workshop: Offline Reinforcement Learning

Sim-to-Real Interactive Recommendation via Off-Dynamics Reinforcement Learning

Junda Wu · Zhihui Xie · Tong Yu · Qizhi Li · Shuai Li


Interactive recommender systems (IRS) have received growing attention due to their awareness of long-term engagement and dynamic user preferences. Although the long-term planning perspective of reinforcement learning (RL) naturally fits the IRS setup, RL methods require a large amount of online user interaction, which is restricted by economic considerations. To train agents with limited interaction data, previous works often rely on simulators that mimic user behavior in real systems. This makes successful sim-to-real transfer a challenge: in practice, such transfer easily fails because user dynamics are highly unpredictable and sensitive to the type of recommendation task. To address this issue, we propose a novel method, S2R-Rec, to bridge the sim-to-real gap via off-dynamics RL. The goal is for a policy learned solely by interacting with the simulator to perform well in the real environment. To achieve this, we perform dynamics adaptation, which uses reward correction to calibrate the difference in state transitions between the simulator and the real environment. Furthermore, we reduce the discrepancy in item representations through representation adaptation. Instead of separating these into two stages, we propose to adapt the dynamics and the representations jointly, leading to a unified learning objective. Experiments on real-world datasets validate the superiority of our approach, which achieves an improvement of about 33.18% over the baselines.
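To illustrate what reward correction for dynamics adaptation can look like, here is a minimal sketch. It assumes the classifier-based correction commonly used in off-dynamics RL, where two domain classifiers estimate how likely a transition is to come from the real (target) environment rather than the simulator. The abstract does not specify the exact form used by S2R-Rec, so the function names and the inputs `p_target_sas` and `p_target_sa` are hypothetical.

```python
import math


def reward_correction(p_target_sas, p_target_sa, eps=1e-8):
    """Illustrative off-dynamics reward correction (assumed form, not S2R-Rec's).

    p_target_sas: hypothetical classifier output P(target | s, a, s').
    p_target_sa:  hypothetical classifier output P(target | s, a).
    The difference of their log-odds approximates
    log p_target(s' | s, a) - log p_sim(s' | s, a).
    """
    def logit(p):
        # Clamp to avoid log(0) when a classifier is fully confident.
        p = min(max(p, eps), 1.0 - eps)
        return math.log(p) - math.log(1.0 - p)

    return logit(p_target_sas) - logit(p_target_sa)


def corrected_reward(sim_reward, p_target_sas, p_target_sa):
    # Transitions that look implausible in the real environment are penalized;
    # transitions that look more real than simulated receive a bonus.
    return sim_reward + reward_correction(p_target_sas, p_target_sa)
```

Under this assumed form, uninformative classifiers (both outputs 0.5) leave the simulator reward unchanged, while a transition the classifier judges unlikely in the real environment lowers the reward the agent is trained on.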
