大规模推荐系统的贝叶斯非平稳线性匪徒

论文标题

大规模推荐系统的贝叶斯非平稳线性匪徒

Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems

论文作者

Ghoorchian, Saeed, Kortukov, Evgenii, Maghsudi, Setareh

论文摘要

利用上下文信息可以潜在地提高推荐系统的性能。在大数据时代，此类信息通常具有多个维度。因此，开发决策算法以实时应对这种高维环境至关重要。当决策者有多种项目需要推荐时，这特别具有挑战性。此外，由于缺乏对环境中的分配变化的鲁棒性，项目的“受欢迎程度或用户”偏好的变化可能会阻碍已部署的推荐系统的性能。在本文中，我们基于线性上下文多臂强盗框架来解决此问题。我们为线性强盗问题制定了决策政策，其中具有高维功能向量，大型武器和非平稳奖励生成过程。我们的汤普森采样策略使用随机投影降低了特征向量的维度，并将指数增加权重以随着时间的推移减少过去观察的影响。我们提出的推荐系统采用此政策来在线学习用户的项目偏好，同时最大程度地减少运行时。我们证明了一种遗憾的界限，它是缩小尺寸而不是原始维度的一个因素。为了通过数值评估我们提出的建议系统，我们将其应用于三个现实世界数据集。理论和数值结果证明了我们提出的算法在与最先进的情况下进行计算复杂性和遗憾性能之间的权衡。

Taking advantage of contextual information can potentially boost the performance of recommender systems. In the era of big data, such side information often has several dimensions. Thus, developing decision-making algorithms to cope with such a high-dimensional context in real time is essential. That is specifically challenging when the decision-maker has a variety of items to recommend. In addition, changes in items' popularity or users' preferences can hinder the performance of the deployed recommender system due to a lack of robustness to distribution shifts in the environment. In this paper, we build upon the linear contextual multi-armed bandit framework to address this problem. We develop a decision-making policy for a linear bandit problem with high-dimensional feature vectors, a large set of arms, and non-stationary reward-generating processes. Our Thompson sampling-based policy reduces the dimension of feature vectors using random projection and uses exponentially increasing weights to decrease the influence of past observations with time. Our proposed recommender system employs this policy to learn the users' item preferences online while minimizing runtime. We prove a regret bound that scales as a factor of the reduced dimension instead of the original one. To evaluate our proposed recommender system numerically, we apply it to three real-world datasets. The theoretical and numerical results demonstrate the effectiveness of our proposed algorithm in making a trade-off between computational complexity and regret performance compared to the state-of-the-art.

下载PDF全文

下载文献需遵守相关版权规定

论文标题