RecSys 2020 recommender systems paper

Recommending the Video to Watch Next: An Offline and Online Evaluation at YOUTV.de

Authors:
Panagiotis Symeonidis, Free University of Bozen-Bolzano, Italy, panagiotis.Symeonidis@unibz.it
Andrea Janes, Free University of Bozen-Bolzano, Italy, andrea.janes@unibz.it
Dmitry Chaltsev, Free University of Bozen-Bolzano, Italy, dmitry.chaltsev@gmail.com
Philip Giuliani, Keep in Mind, Italy, philip.giuliani@keepinmind.info
Daniel Morandini, Keep in Mind, Italy, daniel.morandini@keepinmind.info
Andreas Unterhuber, Keep in Mind, Italy, andreas.unterhuber@keepinmind.info
Ludovik Coba, Free University of Bozen-Bolzano, Italy, ludovik.Coba@unibz.it
Markus Zanker, Free University of Bozen-Bolzano, Italy, markus.Zanker@unibz.it

DOI and venue
DOI: https://doi.org/10.1145/3383313.3412257
RecSys '20: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil, September 2020

CCS Concepts
• Information systems → Collaborative filtering;
• Information systems → Recommender systems;

Keywords
session-based recommendation, offline and online evaluation

Abstract

The task “recommend a video to watch next?” has been in the focus of recommender systems’ research for a long time. However, adequately exploiting the clues hidden in the sequences of actions of user sessions in order to reveal users’ short-term intentions moved only recently into the focus of research. Based on a real-world application scenario, in this paper, we propose a Markov Chain-based transition probability matrix to efficiently reveal the short-term preferences of individuals. We experimentally evaluated our proposed method by comparing it against state-of-the-art algorithms in an offline as well as a live evaluation setting. In both cases our method not only demonstrated its superiority over its competitors, but exposed a clearly stronger engagement of users on the platform. In the online setting, our method improved the click-through rate by up to 93.61%. This paper therefore contributes real-world evidence for improving the recommendation effectiveness, by considering sequence-awareness, since capturing the short-term preferences of users is crucial in the light of items with a short life span such as tv programs (news, tv shows, etc.).

1. Introduction

YOUTV.de operates as an online video recording service that allows users to record high quality German television programs of their choice and view them whenever and wherever they want. Users of YOUTV.de can watch broadcasts directly after recording them online without any time delay (via streaming) or on the go by using an iOS or Android device. Personalized recommendations are therefore becoming a core functionality in order to enhance customer engagement and retention for such a service provider [11]. YOUTV.de provides the functionality of recommending which videos to watch next based on the sequence of tv programs that the users have interacted with (see at the bottom of Figure 1). Consumer studies suggest that typical visitors of an Internet TV website lose their interest after clicking on 10 to 20 titles (perhaps reviewing 3 in more detail) [6]. That is, users either find something interesting within a short time-span or the likelihood that they abandon the service increases significantly.

Figure 1. Next-video recommendations on YOUTV. The recommendations are shown in the module at the bottom, "Auch das könnte dir gefallen" (German), which translates to "You may also like".

Collaborative Filtering (CF) helps users in exploring videos that other users with similar behavior and taste have already identified in the past. That is, given a target user and her positively rated tv programs, a CF algorithm will identify their neighbourhood of similar users, i.e. user-based CF (UBCF) [27]. Another CF algorithmic variation is the item-based CF (IBCF) [28], where given a target user and her positively rated tv programs, the algorithm relies on the items’ similarities for the formation of a neighborhood of nearest items. Recently, session-based CF (SBCF) has been proposed by [12] also known as session-knn, where similar sessions to the current session of a user are identified. Please note that a session is a short-term interaction of a user with the system.

However, all the aforementioned methods are considered as non-sequential, since they learn a user’s preference on each individual item, and then they rank these items based on their score to provide recommendations. This functionality makes these methods good at capturing the general tastes of users by using the whole historical information (i.e. by aggregating a user’s complete log history). In contrast, sequence-aware recommender systems capture the transition relationship between two or more adjacent items in user log sequences (i.e. user sessions) so as to learn the sequential dynamics among items [24]. For example, many sophisticated sequence-based approaches were proposed that implement some form of sequence modelling based on Markov Chain Models (MCM) [1,21,23], which can capture the item transition probabilities and the very last user intentions inside a user session.

In this paper, we use MCM to provide session-based video recommendations by capturing the short-term preferences of users. To reveal a user's most recent intention, we analyse the item interactions inside her latest sessions (i.e., MCM-based item similarity) by building a Transition Probability Matrix (TPM). Moreover, we track the evolution of these preferences by using a sliding time window that assigns less importance to videos that are outdated or old. Thus, our model is continuously updated with the latest user clicks, which lets it adapt quickly to changes in user preferences. We ran offline and online experiments with our proposed method and other baselines. In both settings, our method performs best. In particular, in the online setting, our method, which profiles users in real time, improves the click-through rate by up to 93.61%.

The rest of the paper is organized as follows. Section 2 summarizes the related work. Section 3 provides the problem formulation, whereas Section 4 presents a motivation toy example. Section 5 presents our proposed methodology. Experimental results are given in Section 6, whereas we discuss some methodological challenges in Section 7. Finally, Section 8 concludes the paper.

2. Related Work

There are several works [1, 3, 4, 7, 21, 23, 34] in recommender systems that use sequential modelling based on MCMs. Zimdars et al. [34] were the first to point out the sequential nature of the recommendation process. Esiyok et al. [3] studied users' behaviour in the context of news categories by building an MCM on the Plista data set and by describing patterns in the evolution of news categories while users browse news articles online. Moreover, approaches for recommender systems that use Markov Decision Processes (MDPs) were published by Moling et al. [22] and Shani et al. [29]. A hybrid model that combines MCM with matrix factorization (MF) was proposed by Rendle et al. [26] and is denoted as Factorized Personalized Markov Chains (FPMC). FPMC is used for the next-basket recommendation problem: given the history of past shopping baskets, predict the content of the user's next basket. This approach combines MCM with MF using a three-dimensional tensor (user, current item, next item). Each entry in the tensor corresponds to an observed transition between two items performed by a specific user. The method then uses pairwise factorisation to predict the unobserved entries in the sparse tensor [20]. However, this approach is computationally intensive and does not scale well in real-world online situations.

Session-based recommendation usually refers to the case in which we only have anonymous sessions and are not able to build a user profile. Recently, session-based recommendations have been modelled with Recurrent Neural Networks (RNNs). Hidasi et al. [9] presented a recommender system based on the Gated Recurrent Unit (GRU), which learns when and how much to update the hidden state of the GRU model. However, a more recent study [12] has shown that a simple k-nearest neighbor scheme adapted for session-based recommendations (session-kNN) often outperforms the GRU4Rec model. Nevertheless, several adjustments proposed in recent years improve the performance of the initial RNN model [8, 10, 25, 30, 31]. The Neural Attentive Recommendation Machine (NARM) [16] introduced an attention mechanism to capture, within a session, both the user's (i) sequential behavior and (ii) main purpose. The Short-Term Attention/Memory Priority (STAMP) model [19] introduced another attention mechanism, which can effectively capture the long-term and short-term interests of a user in a session. Yuan et al. [33] proposed a convolutional generative model, which combines masked filters with 1D dilated convolutions to model the long-range item dependencies among user sessions.

Another research direction is the combination of collaborative filtering with content-based filtering for providing item recommendations. Related work [17] has shown that one way to increase accuracy is to consider the context of the user (i.e., time, location, mood, etc.). For example, Das et al. [2] generated recommendations based on collaborative filtering that takes into account the co-visitation count of articles, which is the number of times a news story was co-visited with other news stories in the user's click history. Later, Liu et al. [18] combined a content-based method with the collaborative filtering method previously developed for Google News [2] to generate personalised news recommendations. The hybrid method develops a Bayesian framework for predicting users' current news interests based on profiles learned from: (i) the target user's activity and (ii) the news trends demonstrated in the activity of all users, based on the categories that the news articles belong to. We use this method as a comparison partner in our experiments, since videos also have content that describes them and that can be combined with the collaborative data.

3. Problem Formulation

We are interested in building a recommender system that suggests videos to internet television viewers (see the "you may also like" section in Figure 1). The internet tv provider may update a small personalised top-N list of video recommendations (which may be shown inside a widget) every time a video is selected by the viewer, because the provider wants to keep the user on the website longer, both for advertising reasons and to fulfil the user's viewing desires. The system monitors how visitors react to the recommendations they receive in order to drive better suggestions and to predict their next click/item inside a session.

Let $U$ denote the growing set of users that visit the website, and $I$ the growing set of incoming tv programs/items. We keep track of users' actions over items on the website. In particular, whenever a user views one or more videos in a short period of time, we store these interactions in the database as a user session. These interactions with items are ordered. That is, for every item that belongs to a session we know whether it was selected first, second or last, and how long the user interacted with it. For instance, session $S_1(user = u_1, TimeStarted = t_1 \mid \{i_1, 20sec\}, \{i_2, 145sec\})$ indicates that within session $S_1$, started at timepoint $t_1$ by user $u_1$, item $i_1$ was selected first and watched for 20 seconds, and item $i_2$ was selected second and held the user's attention for 145 seconds. Table 1 summarises some basic symbols and notations that will be used later.
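As a rough illustration of the session record described above, the following Python sketch stores a session as a user, a start time, and an ordered list of (item, dwell time) pairs. The class name `Session`, its field names, and the use of seconds as the time unit are assumptions made for this example and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Session:
    """One user session: who, when it started, and the ordered item views."""
    user: str
    time_started: float  # assumed unit: seconds since some reference point
    events: List[Tuple[str, int]] = field(default_factory=list)  # (item, dwell seconds)

    def items(self) -> List[str]:
        """Items in the order in which they were selected."""
        return [item for item, _ in self.events]


# Session S_1 from the text: i_1 watched for 20 s, then i_2 for 145 s.
s1 = Session(user="u1", time_started=0.0, events=[("i1", 20), ("i2", 145)])
print(s1.items())     # ['i1', 'i2']
print(s1.events[-1])  # ('i2', 145)
```

Keeping the events in a list preserves the click order, which is what the transition counts later in the text rely on.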

Table 1. Symbols and their descriptions

| Symbol | Description |
| --- | --- |
| $U$ | Set of users, $U = \{U_1, U_2, \dots, U_n\}$ |
| $I$ | Set of items, $I = \{i_1, i_2, \dots, i_m\}$ |
| $S$ | Set of sessions, $S = \{S_1, S_2, \dots, S_N\}$ |
| $S_u$ | Set of sessions that belong to user $u$ |
| $S_i$ | Set of sessions that contain item $i$ |
| $S^{t_p}$, $S^{t_y}$ | Sets of sessions within a time window |
| $w$ | Size of the sliding time window |

4. Motivating Example

Our session-based recommender consists of two modules. The first one is the user profile updater module, which reads instances from the stream of sessions and combines them with earlier recorded information. In particular, our user profile updater assigns validity intervals to the elements of the session stream $S$. A sliding time window of size $w$ states that the processing at a point in time $t$ should take into account all events not older than $t-w$. Therefore, the profile updater assigns a (half-open) validity interval $(t-w, t]$ to an event that arises at time $t$. The second module is the recommender, which runs on top of the profile updater to deliver the top-$N$ recommended items to each user.
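The sliding-window rule can be sketched in a few lines of Python. This is only an illustration of the half-open interval $(t-w, t]$: the function name `valid_events`, the tuple layout, and the example timestamps are hypothetical, and the time unit is assumed to be the same for timestamps and for $w$.

```python
from typing import List, Tuple

# Each event is (timestamp, session_id); timestamps and w share one unit (assumed: hours).
Event = Tuple[float, str]


def valid_events(events: List[Event], t: float, w: float) -> List[Event]:
    """Keep events whose timestamp lies in the half-open interval (t - w, t]."""
    return [(ts, sid) for ts, sid in events if t - w < ts <= t]


# Made-up timestamps: with t = 8 and w = 4, only events after hour 4 survive.
stream = [(1.0, "S1"), (2.0, "S2"), (5.0, "S3"), (6.5, "S4"), (8.0, "S7")]
window = valid_events(stream, t=8.0, w=4.0)
print([sid for _, sid in window])  # ['S3', 'S4', 'S7']
```

An event exactly at $t-w$ is dropped, while an event exactly at $t$ is kept, which is what the half-open interval expresses.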

To better explain our approach, we use the graphical representation of our running example shown in Figure 2. We have 3 users and want to predict the video that user 2 will click next in his unfinished session (i.e., session $S_7$).

Figure 2. Visual representation of our toy example (users, items, sessions).

When computing similarities between target user 2 and the two other users, note that sessions $S_1$ and $S_2$ are ignored because they fall outside the valid time window interval $(t-w, t]$. This sliding window captures the notion of recency of tv programs. That is, a video may have a short life span and become obsolete quickly (e.g., the news broadcast of the previous day is already old). Thus, we should avoid recommending videos that are not recent. Note, however, that entertainment content such as movies, tv series, and tv shows can remain interesting for longer periods and has a longer life span. When two or more items are selected within one session, these items can be considered more similar to each other than items that were selected by the same user in different sessions. For instance, taking the actions of user 1 (i.e., $U_1$) into account, we can infer that item $i_4$ is more similar to items $i_6$ and $i_7$ than to item $i_9$, since $i_6$ and $i_7$ were selected inside the same session $S_4$ together with item $i_4$.

In our running example of Figure 2, session $S_7$ is still open and running for user $U_2$. Thus, items $i_2$ and $i_3$ of session $S_7$ can be matched against other sessions in order to make item recommendations to user $U_2$. As shown, user $U_3$ has also selected the same items as $U_2$ (inside the valid time window). He has also selected $i_8$, which could be a good recommendation for $U_2$. Note that $U_1$ has also selected exactly the same items as those of session $S_7$ of $U_2$. However, we cannot use session $S_1$ for training since it is not inside the specified time window.

In summary, items that are selected within the same session (intra-session Markov Chain-based item similarity) are considered to be more similar than those items which are selected by the same user within different sessions (inter-session item similarity). Intra-session Markov Chain-based item similarity thus reveals the short-term preference of the user and his intentions inside a session independently of other sessions.

5. Methodology

In this Section, we will introduce our Markov Chain-based algorithm in more detail. In particular, for each user session, we update an item transition probability matrix based on the subsequences between items.

5.1. Markov Chain

A Markov chain is a stochastic process over possible events that satisfies the Markov property: the probability of each event depends only on the present state and not on the previous states. A variation of MCM, denoted as Markov Chain Model of order $m$, states that the future state depends on the past $m$ states. A Hidden Markov Model (HMM) is an MCM with hidden states. Moreover, Markov Decision Processes (MDPs) extend MCMs: at each timepoint $t$, when the process is in state $x_t$, the decision maker may choose any action $a \in A_{x_t}$. The MDP responds at the next time step by randomly moving into a new state $x_{t+1}$ and giving the decision maker a corresponding reward $R(x_t, a, x_{t+1})$.

One way to represent a Markov Chain Model $\{X\}$ is a transition matrix $P$ with entries $P_{i,j}$, where row $i$ contains the probabilities of moving from state $x_i$ to each state $x_j$:

$$P_{i,j}=\mathbb{P}(X_{t+1}=x_j \mid X_t=x_i) \qquad\qquad \text{(1)}$$

Each row of the matrix is a probability vector, so its components sum to 1. For instance, the following transition matrix represents a model with three possible states:

$$P=\begin{bmatrix} 0.7 & 0.05 & 0.25 \\ 0.25 & 0.6 & 0.15 \\ 0.35 & 0.1 & 0.55 \end{bmatrix}$$

If the current state of the process is $x_2$, i.e., the second row of the matrix is considered, then the probability of a transition into state $x_3$ is $\mathbb{P}(X_{t+1}=x_3 \mid X_t=x_2)=0.15$, the entry in the second row and third column. Another way to represent an MCM is a transition diagram, that is, a weighted directed graph in which each vertex represents a state of the MCM and there is a directed edge from vertex $x_i$ to vertex $x_j$ if the transition probability $P_{i,j}>0$; this edge has the weight/probability $P_{i,j}$. An example of such a diagram can be found in Figure 3.

Figure 3. An example of a Markov chain transition diagram. The model has three states $\{x_1, x_2, x_3\}$. The numbers next to the directed edges are the transition probabilities between the states.
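The three-state transition matrix from the text can be checked with a short Python sketch: every row should sum to 1, and the entry in the second row and third column should be 0.15. The helper `step` is a hypothetical name introduced here; it advances a distribution over states by one transition.

```python
# Transition matrix from the text; P[i][j] is the probability of moving
# from state x_{i+1} to state x_{j+1} (Python indices start at 0).
P = [
    [0.70, 0.05, 0.25],
    [0.25, 0.60, 0.15],
    [0.35, 0.10, 0.55],
]

# Every row must be a probability vector.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-9

# Probability of x_2 -> x_3 in one step: second row, third column.
print(P[1][2])  # 0.15


def step(dist, P):
    """Advance a distribution over states by one transition: dist' = dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# Starting in x_2 with certainty, the next-step distribution is the second row.
print(step([0.0, 1.0, 0.0], P))
```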

5.2. Proposed Method

Based on Bayesian inference that assumes independence among the pieces of evidence, we can predict the items that will be included in the latest session $S_N$ of a user $u$ from the items that are already included in $S_N$. In particular, we can combine the Markov Chain-based transition probabilities between any two subsequent items in each distinct session of time window $t_p$ as follows:

$$p(j\in S_N \mid i_{1:m}\in S_N)\propto\prod_{i_k\in S_N,\,k=1,\dots,m}p(j\in S_N \mid i_k\in S_N) \qquad\qquad \text{(2)}$$

where $i_1, \dots, i_m$ are the items that user $u$ has already clicked in the current session $S_N$, and $j$ is the item to be predicted as the next recommended item in $S_N$.
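The product in Equation 2 can be illustrated with a small Python sketch. The pairwise probabilities below are invented for the example and are not taken from the paper or from Figure 2; the names `pairwise` and `combined_score` are likewise hypothetical. The score is the unnormalised right-hand side of Equation 2, which is enough for ranking candidates.

```python
import math
from typing import Dict, List

# Hypothetical pairwise probabilities p(j in S_N | i_k in S_N); the values are made up.
pairwise: Dict[str, Dict[str, float]] = {
    "i2": {"i8": 0.5, "i6": 0.2},
    "i3": {"i8": 0.4, "i6": 0.1},
}


def combined_score(candidate: str, session_items: List[str]) -> float:
    """Unnormalised right-hand side of Equation 2: product over the session's items."""
    return math.prod(pairwise[i].get(candidate, 0.0) for i in session_items)


session = ["i2", "i3"]
scores = {j: combined_score(j, session) for j in ("i8", "i6")}
best = max(scores, key=scores.get)
print(best)  # i8
```

One consequence of the product form is that a single zero factor removes a candidate entirely; whether and how the authors smooth such zeros is not stated in this section.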

The Markov Chain-based item Transition Probability Matrix (TPM) captures the transition probability between two subsequent events in a session. That is, we can simply count how often users viewed item $i_b$ immediately after viewing item $i_a$.

Let a session $S_n$ be a chronologically ordered sequence of item click events $S_n=(i_1,i_2,\dots,i_m)$ and let $S$ be the set of all sessions, $S=\{S_1,S_2,\dots,S_N\}$. Given a user's current session $S_N$ with $i_m$ being the last item in $S_N$, we can define the score for a recommendable item $j$ as follows:

$$\begin{aligned} p(j\in S_N \mid i_m\in S_N) &= score(j,i_m) \\ &= \frac{\sum_{S_n\in S,\,n=1,\dots,N}\sum_{i_k\in S_n} isSame(i_m,i_k)\cdot isSame(j,i_{k+1})}{\sum_{S_n\in S,\,n=1,\dots,N}\sum_{i_k\in S_n} isSame(i_m,i_k)} \end{aligned} \qquad \text{(3)}$$

Here the inner sums run over the items $i_k$ of $S_n$ that have a successor $i_{k+1}$. The function $isSame(i_a, i_b)$ indicates whether $i_a$ and $i_b$ refer to the same item:

$$isSame(i_a,i_b) = \begin{cases} 1, & \text{if } i_a=i_b \\ 0, & \text{if } i_a\neq i_b \end{cases}$$

Based on Equation 3, in our running example of Figure 2 the transition probability from item $i_4$ to item $i_6$ is equal to $\frac{1}{2}$. The numerator is one, since among all the sessions of time window $t_p$ there is only one case where $i_4$ is followed by $i_6$ (session $S_4$); the denominator is two, since there are two sessions where $i_4$ is followed by any other item (sessions $S_3$ and $S_4$). The Markov Chain-based transition probability matrix of our running example is presented in Table 2 (rows and columns containing only zeros are not shown).

Table 2. The Markov Chain-based transition probability matrix of our running example.

| From \ To | $i_1$ | $i_2$ | $i_3$ | $i_6$ | $i_7$ | $i_8$ |
| --- | --- | --- | --- | --- | --- | --- |
| $i_2$ | $0$ | $0$ | $\frac{1}{2}$ | $0$ | $0$ | $\frac{1}{2}$ |
| $i_3$ | $0$ | $1$ | $0$ | $0$ | $0$ | $0$ |
| $i_4$ | $\frac{1}{2}$ | $0$ | $0$ | $\frac{1}{2}$ | $0$ | $0$ |
| $i_6$ | $0$ | $0$ | $0$ | $0$ | $1$ | $0$ |
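The counting behind Equation 3 can be sketched in Python as follows. Since Figure 2 is not reproduced here, the session contents below are hypothetical: they were chosen only so that the resulting matrix agrees with Table 2 and with the sessions mentioned in the text ($S_3$, $S_4$, $S_7$). The function name `build_tpm` is also an assumption of this example.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List


def build_tpm(sessions: List[List[str]]) -> Dict[str, Dict[str, Fraction]]:
    """Count immediate successors (Equation 3) and normalise each row."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in sessions:
        # zip pairs each item with its successor; the last item has none.
        for current, nxt in zip(items, items[1:]):
            counts[current][nxt] += 1
    return {
        i: {j: Fraction(c, sum(row.values())) for j, c in row.items()}
        for i, row in counts.items()
    }


# Hypothetical in-window sessions, chosen only to be consistent with Table 2.
sessions = [
    ["i4", "i1"],        # assumed content of S_3
    ["i4", "i6", "i7"],  # S_4 (the text places i_4, i_6, i_7 in this session)
    ["i3", "i2", "i8"],  # assumed session of U_3
    ["i2", "i3"],        # S_7, the open session of U_2
]
tpm = build_tpm(sessions)
print(tpm["i4"]["i6"])  # 1/2
```

Using `Fraction` keeps the entries exact, so they can be compared directly with the values in Table 2.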

To summarise, the Markov Chain-based TPM infers similarity among items inside each session independently of other sessions. Note that the Markov Chain-based TPM similarity is more effective with smaller time window sizes $w$, which makes it more suitable for capturing short-term user preferences.

5.3. Recommendation List Creation

Our recommender module provides recommendations based on the Markov Chain-based TPM presented in previous section. For each target user u u u, the recommender checks the set of her recently viewed items I t p , u I_{t_p,u} Itp,u(i.e., the ones she has interacted with in the current time period t p t_p tp) and computes K i K_i Ki, which is the set of the k k k nearest items to each item i i i that belongs in I t p , u I_{t_p,u} Itp,u. Next, for each target user u u u in t p t_p tp and for each item j j j we compute a ranking score s c o r e ( t p , u , j ) score(t_p,u,j) score(tp,u,j) as follows: s c o r e ( t p , u , j ) = ∑ i ∈ I t p , u T P M ( i , j ) ∗ 1 ( j , K i ) (3) score(t_p,u,j)=\sum_{i\in I_{t_p,u}}TPM(i,j)*1_{(j,K_i)}\qquad\text{(3)} score(tp,u,j)=iItp,uTPM(i,j)1(j,Ki)(3)

我们的推荐模块基于上一节介绍的基于马尔可夫链的TPM提供推荐建议。对于每个目标用户 u u u,推荐程序检查其最近的观看项集合 I t p , u I_{t_p,u} Itp,u(即她在当前时间段 t p t_p tp内互动过的项),并计算 K i K_i Ki,它是 I t p , u I_{t_p,u} Itp,u i i i k k k最近的观看项集合。

接下来,对每个目标用户 u u u在时间窗口 t p t_p tp内的每个观看项 j j j,计算一个排名得分 s c o r e ( t p , u , j ) score(t_p,u,j) score(tp,u,j),如下: s c o r e ( t p , u , j ) = ∑ i ∈ I t p , u T P M ( i , j ) ∗ 1 ( j , K i ) (3) score(t_p,u,j)=\sum_{i\in I_{t_p,u}}TPM(i,j)*1_{(j,K_i)}\qquad\text{(3)} score(tp,u,j)=iItp,uTPM(i,j)1(j,Ki)(3)

where 1 j , K i 1_{j,K_i} 1j,Ki is an indicator function that is equal to 1 1 1 if the item j j j is present within the k-nearest neighbors of item i i i, and 0 0 0 otherwise. Moreover, T P M ( i , j ) TPM(i,j) TPM(i,j) is a function that returns a similarity score for two items i i i and j j j based on the Markov Chain-based TPM (transition probability matrix) that we computed in the previous section. Then, for each user we sort the items in decreasing score and recommend to her the top-N ones.

其中 1 ( j , K i ) 1_{(j,K_i)} 1(j,Ki)是一个指示函数,如果项 j j j处于项 i i i的k近邻中,则等于 1 1 1,否则等于 0 0 0

此外, T P M ( i , j ) TPM(i,j) TPM(i,j)是一个函数,它根据我们在上一节计算的基于马尔可夫链的TPM(转移概率矩阵)返回两个项 i i i j j j之间的相似性评分。然后对于每个用户,我们按照分数递减的顺序对推荐项进行排序,并向用户推荐前 N N N个项。
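The scoring rule of Eq. (3) can be illustrated with a minimal Python sketch. The nested-dictionary layout of the TPM, the function names `top_k_neighbors` and `recommend`, and the toy values are assumptions made for this example; they are not taken from the paper. The sketch also does not filter out items the user has already viewed, since the text does not say whether that is done.

```python
from collections import defaultdict


def top_k_neighbors(tpm, item, k):
    """Return K_i: the k items j with the highest TPM(item, j) values."""
    row = tpm.get(item, {})
    ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
    return {j for j, _ in ranked[:k]}


def recommend(tpm, recent_items, k=2, top_n=5):
    """Score candidates as in Eq. (3) and return the top-N item ids."""
    scores = defaultdict(float)
    for i in recent_items:                      # i ranges over I_{t_p,u}
        for j in top_k_neighbors(tpm, i, k):    # indicator 1_(j,K_i) equals 1
            scores[j] += tpm[i][j]              # add TPM(i, j)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [j for j, _ in ranked[:top_n]]


# Hypothetical TPM: tpm[i][j] is the similarity of item j to item i.
tpm = {
    "news": {"talk": 0.5, "sport": 0.3, "film": 0.2},
    "sport": {"talk": 0.4, "film": 0.6},
}
print(recommend(tpm, ["news", "sport"], k=2, top_n=2))  # ['talk', 'film']
```

Here "talk" collects 0.5 from "news" and 0.4 from "sport", so it outranks "film" (0.6), which is a neighbor of "sport" only.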

6. Experimental Evaluation

In this section, we perform an offline and an online evaluation of our proposed method against baselines and state-of-the-art comparison partners.

6.1. Offline Evaluation

6.1.1. Data Set Characteristics

For the offline evaluation, the data set was collected during 2 weeks in May 2018 (18/5/2018-1/6/2018). It contains 1,146,452 interactions/events on 63,897 videos by 18,447 unique users. The interactions of each session are logged with the following information: the user session's identifier, the interaction's time stamp and duration, and the tv program's textual content. As shown in Figure 4(a), fifty percent (50%) of the user sessions in the offline data set have only one video interaction. However, as can be seen in Figure 4(b), most users interact with two videos before leaving the service in the online evaluation scenario. As will be explained later, this happens because user satisfaction increased due to the new, better recommendation algorithm in the A/B testing phase.

Figure 4. Number of interactions per session for the (a) offline and (b) online evaluation.

Note that the data set used in the offline evaluation went through a cleaning procedure, which consisted of removing the sessions that contain only one video interaction, since no recommendations can be tested on such sessions and no video co-occurrence patterns can be identified to build a model. Detailed general statistics of the data sets used for the offline and the online evaluation are summarized in Table 3.

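Table 3. General statistics of the data sets

| | Offline (May 2018 data set) | Online (June 2019 data set) |
|---|---|---|
| Users | 18,447 | 22,375 |
| Interactions | 1,146,452 | 2,630,008 |
| Sessions | 317,972 | 363,040 |
| Videos | 63,897 | 132,895 |
| Average number of interactions per session | 3.60 | 7.24 |
| Average number of interactions per user | 62.15 | 117.54 |
| Average number of sessions per user | 17.23 | 16.23 |
| Average number of plays per session | 4.98 | 2.73 |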

Note that for the offline evaluation, the average number of interactions per session is 3.6, which seems adequate for building a recommendation model.
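The cleaning step described for the offline data set (dropping sessions with a single video interaction) can be sketched as follows. The dictionary layout of the sessions and the function names are assumptions for illustration only; the actual logging format at YOUTV is not described in this level of detail.

```python
def drop_single_interaction_sessions(sessions):
    """Keep only sessions that contain at least two video interactions."""
    return {sid: items for sid, items in sessions.items() if len(items) >= 2}


def mean_interactions_per_session(sessions):
    """Average number of interactions per session."""
    return sum(len(items) for items in sessions.values()) / len(sessions)


# Hypothetical sessions: session id -> ordered list of viewed video ids.
sessions = {"s1": ["a"], "s2": ["a", "b", "c"], "s3": ["b", "d"]}
cleaned = drop_single_interaction_sessions(sessions)
print(sorted(cleaned), mean_interactions_per_session(cleaned))  # ['s2', 's3'] 2.5
```

A session with one interaction offers no "next item" to predict and no transition to learn from, which is why such sessions are dropped before building or testing a model.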

6.1.2. Prequential Evaluation Protocol

In this section, we present the evaluation protocol used for the offline evaluation. It follows the protocol introduced by Jannach et al. [12, 20] for predicting the next item inside a session, which is also known as prequential evaluation in stream mining [24, 32].

As shown in Figure 5, the time frame of the available data is split into $N_t$ equal time periods $t_p$, and the data are split so that each period contains only the sessions that were made during that time period. We later use this split to aggregate the evaluation results for each time period. Note that $t_p$ is the size of the future horizon that we try to predict. In the next section, we try to identify the future horizon that we can predict most effectively (e.g., some hours or some days into the future).

Figure 5. Prequential evaluation.
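The split into equal time periods can be sketched in a few lines of Python. Assigning a session to a period by its start time, the tuple layout, and the use of plain timestamps in seconds are assumptions of this example rather than details given in the text.

```python
from collections import defaultdict


def split_into_periods(sessions, start, period_hours):
    """Group sessions by the index of the time period in which they start.

    sessions: iterable of (session_id, start_timestamp_seconds, items).
    """
    period_seconds = period_hours * 3600
    periods = defaultdict(list)
    for session_id, start_ts, items in sessions:
        index = int((start_ts - start) // period_seconds)
        periods[index].append((session_id, items))
    return dict(periods)


# Hypothetical sessions starting 0, 13 and 25 hours after the start of the log.
sessions = [
    ("s1", 0, ["a", "b"]),
    ("s2", 13 * 3600, ["c", "d"]),
    ("s3", 25 * 3600, ["a", "c"]),
]
periods = split_into_periods(sessions, start=0, period_hours=12)
print(sorted(periods))  # [0, 1, 2]
```

With 12-hour periods, a two-week log yields the 28 periods used later in the experiments.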

We also use the parameter $w$ to specify the window size on which the model is trained, which is labeled "Train data" in Figure 5. The parameter $w$ controls how far back into the past we go to exploit information. If $w$ is too large, the system is not sensitive to changes (concept drifts). If it is too small, there is not enough data to build a model that predicts the next items in a session. In the test phase we also use a parameter $v$ to specify how many views of the currently evaluated session are revealed before a recommendation is made. As shown in the "Test phase" rectangle in Figure 5, after the first prediction is tested, the data point is added to the model, and the process repeats as long as there are data points left in the session to be tested. Finally, we evaluate the precision (i.e., the number of hits divided by the number of recommended items) that we get when we recommend the top-5 videos for each next-item prediction inside a session.
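The test-then-train loop can be sketched as below. The model used here is a simple transition counter that stands in for any session-based recommender; it is not the paper's TPM. Retraining from scratch on the previous `w` periods and counting `top_n` recommended items per prediction (even when the model returns fewer) are further assumptions of this sketch.

```python
from collections import defaultdict


class TransitionCounts:
    """Stand-in model: counts item -> next-item transitions (not the paper's TPM)."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, prev_item, next_item):
        self.counts[prev_item][next_item] += 1

    def recommend(self, last_item, top_n):
        row = self.counts.get(last_item, {})
        ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:top_n]]


def prequential_precision(periods, w=1, v=1, top_n=5):
    """periods: list of time periods, each a list of sessions (lists of item ids)."""
    hits, recommended = 0, 0
    for p in range(w, len(periods)):
        model = TransitionCounts()
        for past in periods[p - w:p]:               # train on the previous w periods
            for session in past:
                for a, b in zip(session, session[1:]):
                    model.update(a, b)
        for session in periods[p]:
            for pos in range(v, len(session)):      # the first `pos` views are revealed
                recs = model.recommend(session[pos - 1], top_n)
                hits += session[pos] in recs        # test the prediction first ...
                recommended += top_n
                model.update(session[pos - 1], session[pos])  # ... then learn from it
    return hits / recommended if recommended else 0.0


periods = [[["a", "b", "c"]], [["a", "b"], ["b", "c", "a"]]]
print(prequential_precision(periods, w=1, v=1, top_n=5))  # 2 hits / 15 recommended
```

In the toy data, two of the three next-item predictions are hits, so precision is 2/15 because each prediction counts five recommended items.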

6.1.3. Sensitivity Analysis of the Proposed Method

In this section, we study the accuracy of the Markov Chain-based Transition Probability Matrix (TPM). We explore how the precision of this method changes as we vary the following parameters: (i) the time period split: $N_t=14$ time periods of $24$ hours, $28$ time periods of $12$ hours, $55$ time periods of $6$ hours, and $331$ time periods of $1$ hour; (ii) the time window size: $w = 1, 3, 9, 12, 15, 18, 21$. For all experiments we provide the top-5 recommended items.

As shown in Figure 6(a), recommendation accuracy changes drastically as we split the timeline of the data (i.e., the 14 days) into different time periods. As discussed earlier, the parameter $N_t$ controls how big the future horizon (that we try to predict) is. With this experiment, we want to identify how short or long this future horizon should be to obtain the best recommendation accuracy. Of course, this is related to the life span of the items and to how often users re-appear in the system or change their preferences. With the time window size set to $w=1$, when $N_t=331$ and $t_p=1$ hour, the precision we get when we recommend 5 items (i.e., precision@5) is less than $0.04$. When $N_t=28$ and $t_p=12$ hours, we get the best precision ($0.051$). Our prediction accuracy drops again when we consider $N_t=14$ and $t_p=24$ hours. Henceforth, we set $N_t=28$ and $t_p=12$ hours for the remaining experiments, which means that we predict best when we try to predict the next 12 hours.

Figure 6. Precision of the Markov Chain-based TPM in the offline evaluation for (a) different time periods $t_p = 1, 6, 12, 24$ hours and (b) different window sizes $w = 1, 3, 9, 12, 15, 18, 21$ (hours).

6.1.4. Comparison with Other Methods

In this section, we compare our Markov Chain-based TPM algorithm with the following baselines and state-of-the-art comparison partners, which represent different algorithmic families: collaborative filtering (IBCF), session-based filtering (Session-knn), neural session-based recommendation (GRU4Rec), and hybrid collaborative and content-based filtering (Cat-TPM).

(i) Recently Most Popular Items (Recently POPULAR): The Recently POPULAR baseline recommends the top-$N$ most clicked videos of the active/valid time period $t_p$.

(ii) Item-based Collaborative Filtering (IBCF) [2]: In IBCF, two items are considered similar if they are selected by similar users. In [2], IBCF considers the co-visitation count of news articles, which counts the number of times an item was co-visited (clicked before or after) with another item.

(iii) Session-knn [13]: The Session-knn method takes the set of user actions in the current session, e.g., two view events for certain items, and in a first step determines the $k$ most similar past sessions in the training data. Then, given the current session $s$, the set of its $k$ nearest neighbors $N_s$, and a function $sim(s_1,s_2)$ that returns a similarity score for two sessions $s_1$ and $s_2$, the score of a recommendable item $i$ is:

$$score_{KNN}(i,s)=\sum_{n\in N_s}sim(s,n)\times 1_n(i)\qquad\text{(4)}$$

where $1_n(i)=1$ if $n$ contains $i$ and $0$ otherwise. The similarity measure used by Jannach et al. [13] in their experiments is cosine similarity, since the best results were achieved when sessions were encoded as binary vectors over the item space.

(iv) GRU4Rec [9]: GRU4Rec is a neural network-based recommender system that uses Gated Recurrent Units (GRU), which learn when and how much to update the hidden state of the model. In particular, GRU4Rec is a recurrent neural network that modifies the basic GRU to better fit the prediction task by introducing session-parallel mini-batches, mini-batch-based output negative sampling, and a pairwise ranking loss function.

(v) Category-based TPM (Cat-TPM) [18]: In Cat-TPM, when a user selects two videos in a row, a transition from the category of the first video to the category of the second video is recorded. Cat-TPM combines content-based with collaborative filtering methods and was used to generate personalized Google News recommendations [18].
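To make the Session-knn score described in item (iii) concrete, here is a minimal Python sketch. It encodes sessions as sets of item ids, which is equivalent to binary vectors for the purpose of cosine similarity. The function names and the toy sessions are assumptions for illustration, and the sketch omits the sampling and efficiency tricks of the implementation in [13].

```python
import math


def cosine_binary(s1, s2):
    """Cosine similarity of two sessions encoded as binary item vectors."""
    a, b = set(s1), set(s2)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def session_knn_scores(current, past_sessions, k):
    """Sum sim(current, n) over the k nearest past sessions n that contain each item."""
    neighbors = sorted(past_sessions, key=lambda s: -cosine_binary(current, s))[:k]
    scores = {}
    for n in neighbors:
        sim = cosine_binary(current, n)
        for item in set(n):                 # indicator 1_n(item) equals 1
            scores[item] = scores.get(item, 0.0) + sim
    return scores


# Hypothetical current session and training sessions.
current = ["a", "b"]
past = [["a", "b", "c"], ["a", "d"], ["e", "f"]]
scores = session_knn_scores(current, past, k=2)
print(sorted(scores, key=lambda item: -scores[item]))
```

With $k=2$, the third past session shares no item with the current one and is not among the neighbors, so "e" and "f" receive no score at all.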

The parameters we used to evaluate the performance of all the aforementioned comparison partners follow those reported in the original papers and were tuned on our data set so as to get the best results for these methods.

Figure 7 reports the average precision of the compared algorithms over the two weeks of the offline data set for $N_t=28$ time splits, $t_p=12$ hours and sliding time window $w=18$. We run the experiments with the top-5 recommended tv programs. We tested the difference of means between the Markov Chain-based TPM and each of the remaining comparison partners and found it statistically significant based on a one-sided t-test at the $0.05$ level. As shown in Figure 7, our proposed approach has the best average precision over the 2 weeks among all comparison partners. This happens because we take into consideration the sequence of the items clicked inside a user session, which allows the Markov Chain-based TPM to better capture the recency of a user's interest (i.e., short-term preferences).

Figure 7. Precision comparison of the Markov Chain-based TPM with the different baselines and their combinations in the offline evaluation. Note that since we always recommend 5 items and try to predict the single next item, the maximum precision we can reach is 20% (not 100%).
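The 20% ceiling mentioned in the caption of Figure 7 follows directly from the definition of precision. As a short worked step, assuming that every prediction returns exactly $N=5$ items and that only the single next item counts as a hit:

$$\text{precision@}N=\frac{\#\text{hits}}{N}\le\frac{1}{N},\qquad N=5\;\Rightarrow\;\text{precision@}5\le\frac{1}{5}=20\%.$$

Under the same assumption, a precision@5 value can be read as a hit rate by multiplying it by 5. For example, the precision of $0.051$ reported in the sensitivity analysis corresponds to the next video being among the 5 recommended ones in roughly $0.051\times 5\approx 25.5\%$ of the predictions.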

As far as the remaining comparison partners are concerned, the "Recently POPULAR" baseline attains the worst results in terms of precision, since it cannot provide personalized recommendations. As expected, Session-knn attains better results than GRU4Rec, as already reported in [14]. IBCF does not attain very good results because there are not enough data to build its prediction model. That is, many users re-appear irregularly and very rarely, at time points that are far apart, which means that collaborative filtering cannot always build a model, since it requires users to appear in two consecutive time slots. In other words, IBCF is better suited to capturing the long-term preferences of a user and is less effective for items with a short life span, such as news stories and tv programs. We analyze this further in the discussion section. Moreover, as expected, Session-knn is far worse than the Markov Chain-based TPM, because it cannot adequately capture the latent associations among items inside the same session. Cat-TPM, which tries to combine collaborative filtering with content-based filtering (i.e., the category of the video is used), is also very ineffective, probably because it builds only the long-term profile of users and fails to identify their most recent short-term intentions. Note that in our experiments we also tested a combination of our Markov Chain-based TPM with Cat-TPM, but this surprisingly did not result in better recommendation accuracy.

6.2. Online Evaluation

The online A/B testing experiment was performed over 30 days between June and July 2019 (5/6/2019-4/7/2019). From the offline evaluation, we selected the two best single (non-hybrid) methods in terms of precision, which were our Markov Chain-based TPM and IBCF. We wanted to check their performance in a live experiment, to understand the user experience with our personalized tv program recommendations. In particular, we conducted experiments on a fraction of the live traffic at YOUTV, similarly to the procedure followed in the personalized Google News recommendation work [18]. The users were randomly assigned to a control group and a test group. The two groups had about the same number of users. The total number of users is 22,375, who made 2,630,008 interactions on 132,895 videos, as can be seen in Table 3. Note that the engagement of users with the service has increased drastically, since the average number of interactions per session increased to 7.24 from 3.6 in the previous year. That is, at least half of the users (i.e., the test group) in the online evaluation increased their interaction with the service due to the new recommendation algorithm. As can be seen in Figure 4(b), most users interact with two videos before leaving the service. In addition, the average number of interactions per session (+3.64) and per user (+55.39) increased markedly (almost doubled) compared with the offline data set, as can also be seen in Table 3.

The evaluation protocol was the following: when a logged-in YOUTV user (who had also explicitly enabled web history) visited the website, we recommended tv programs to him. In our experiment, the users in the control group get video recommendations from the existing collaborative filtering method (i.e., IBCF), while the new Markov Chain-based TPM provides recommendations to the test group. The metric used to measure the performance of both recommendation algorithms is the click-through rate (CTR), which is the ratio of the number of clicks to the number of views of the recommended tv programs. We calculated the CTR metric for each user on a daily basis. The performance of the control and test groups was derived by averaging the measurements of all the users in the corresponding group.
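The CTR computation described above (per user and per day, then averaged per group) can be sketched as follows. The event tuple layout, the function names, and the toy numbers are assumptions for illustration; the real logging schema is not given in the text.

```python
from collections import defaultdict


def daily_user_ctr(events):
    """events: iterable of (user, day, clicks, views) -> {(user, day): CTR}."""
    clicks, views = defaultdict(int), defaultdict(int)
    for user, day, c, v in events:
        clicks[(user, day)] += c
        views[(user, day)] += v
    # Skip user-days without any view of a recommended program.
    return {key: clicks[key] / views[key] for key in views if views[key] > 0}


def group_ctr(ctr_by_user_day, group_users):
    """Average the per-user daily CTR values over the users of one group."""
    values = [ctr for (user, _), ctr in ctr_by_user_day.items() if user in group_users]
    return sum(values) / len(values) if values else 0.0


# Hypothetical log: (user, day, clicks on recommendations, views of recommendations).
events = [("u1", 1, 1, 10), ("u2", 1, 2, 10), ("u3", 1, 3, 10), ("u4", 1, 5, 10)]
ctr = daily_user_ctr(events)
control = group_ctr(ctr, {"u1", "u2"})   # about 0.15
test = group_ctr(ctr, {"u3", "u4"})      # about 0.40
print(round(control, 2), round(test, 2))
```

Averaging per-user values gives every user the same weight, regardless of how many recommendations that user has seen.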

The CTR of the recommended videos is computed from the clicks on the recommended videos each time the user visits the YOUTV website. It directly measures the quality of the recommendations in terms of how many of them are clicked on, and thus liked, by the user. Figure 8 shows the CTR of the recommended videos for the control and test groups over the 30 days. The values are scaled so that the CTR of the control group on the first day is 1. As shown in the figure, the CTR in the test group is clearly higher than the CTR in the control group. This demonstrates that the proposed Markov Chain-based TPM recommendation method considerably improved the quality of the video recommendations. On average, the Markov Chain-based TPM method improves the CTR over the pre-existing collaborative IBCF method by 93.61% in this real-world setting. We tested the difference of the CTR means between the Markov Chain-based TPM and IBCF and found it statistically significant based on a one-sided t-test at the $0.05$ level. This is very strong evidence for the improvement due to the exploitation of the sequential interactions of users.

Figure 8. CTR of the recommended videos for the control and test groups over the 30 days of the online evaluation.
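The scaling used for Figure 8 and a relative CTR improvement can be illustrated with a short sketch. How the average improvement was aggregated in the experiment is not stated in the text, so the ratio of the mean CTRs used below is an assumption, as are the function names and the toy series.

```python
def scale_to_control_day_one(control, test):
    """Divide both daily CTR series by the control group's first-day CTR."""
    base = control[0]
    return [c / base for c in control], [t / base for t in test]


def mean_relative_improvement(control, test):
    """Relative improvement of the mean test CTR over the mean control CTR, in percent."""
    mean_c = sum(control) / len(control)
    mean_t = sum(test) / len(test)
    return 100.0 * (mean_t - mean_c) / mean_c


# Hypothetical daily CTR values for two days.
control_ctr = [0.02, 0.04]
test_ctr = [0.05, 0.07]
scaled_control, scaled_test = scale_to_control_day_one(control_ctr, test_ctr)
print(scaled_control[0], round(mean_relative_improvement(control_ctr, test_ctr), 2))
```

Because both series are divided by the same constant, the scaling hides the absolute CTR values but leaves the relative improvement unchanged.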

7. Discussion

In this section, we discuss the challenges of evaluating recommendations with an offline and an online evaluation protocol. We used the offline evaluation to select the two best recommendation algorithms, along with the parameters that would most likely generate good recommendations in a live experiment. In contrast to what is reported in [5], in our case we have verified that our method is better than IBCF in both the online and the offline experiments. Thus, based on our own findings, the offline experiments can reflect the relative performance of these two techniques. Of course, not many researchers have access to the log server of a real-life company. To address this, NewsReel [15] enabled an offline evaluation scenario similar to the online one by representing the data as a stream, which allows researchers to "replay" it offline. In other words, researchers can emulate the server used in online scenarios and carry out A/B tests for the news recommendation task.

In our live experiments we used the CTR metric, whereas in our offline evaluation we used the precision metric. The two metrics are generally considered similar, but there are differences. The main difference is that precision is computed for a specific number of recommended items, whereas CTR has no such parameter and instead uses the number of video views in the denominator. This means that precision generally drops as we increase the number of top-$N$ recommended items, whereas CTR shows no such trend. The acquired results therefore differ, but they should follow the same relative trends. CTR is used in industry because it is related to the revenue generated by online advertisement (i.e., pay per click over views), which is a way to justify the performance of a recommender system through its generated revenue or through other engagement objectives such as user clicks, user likes, and positive ratings. For example, to measure the engagement of users on social media (e.g., LinkedIn, Facebook, etc.) with a business's post, we simply compute the ratio of the sum of comments, clicks, and likes on this post to the number of impressions.
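The engagement ratio from the social media example above can be written down in a few lines. The function name and the toy counts are assumptions for illustration.

```python
def engagement_rate(comments, clicks, likes, impressions):
    """Ratio of the sum of comments, clicks and likes on a post to its impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return (comments + clicks + likes) / impressions


# Hypothetical post: 5 comments, 30 clicks and 15 likes over 1000 impressions.
print(engagement_rate(5, 30, 15, 1000))  # 0.05
```

Like CTR, this ratio has the number of times the content was shown in the denominator, not a fixed list length $N$ as in precision@$N$.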

One of the important findings of this work is that the Markov Chain-based TPM, which exploits information from sessions, proved to be very effective in capturing the short-term preferences of users, whereas IBCF is mainly adequate for capturing long-term user preferences. The fact that the items in our case study have a very short life span (e.g., news and tv programs) makes the Markov Chain-based TPM more suitable for capturing these short-term dynamics. Of course, for cases where items have a longer life span, such as movies or books, trying to capture the long-term and short-term dynamics together may be more effective. As discussed in section 3, the intra-session Markov Chain-based item similarity reveals the short-term preference of the user. From this standpoint, inter-session item similarity could probably also help in finding item similarities when sessions have a very small number of item interactions (i.e., a low average number of item interactions per session) and could capture the long-term user preferences. In a future extended version of this paper, we plan to also consider the inter-session item similarity.

Running online experiments on a real-world system with real users in real time is extremely complicated, and such experiments are tricky to configure and interpret. One could argue that the offline data set does not record what else the user saw on the screen when deciding on the next item. For example, if an existing system was providing recommendations on the screen, that could have biased the data based on "what" was recommended and "where" on the user's screen it was shown (e.g., center bias). Moreover, one could argue that in the online experiments there could have been some confounding factors, such as latency differences between the recommendation algorithms, or perhaps pre-existing differences in engagement between the two user groups under comparison, that could have affected the outcome. We note that the interface used to provide the recommended items is shown in Figure 1, which depicts the items that are similar to the target item the user interacts with.

Another challenge for the offline evaluation of recommender systems is identifying the size of the future horizon for our predictions and the size of the time window, i.e., how far back into the past we should go to exploit information for training our models. In this work, we have seen some evidence that these thresholds are very sensitive to the items' life span and to how often users re-appear in the system. However, further investigation is needed to prove this correlation. In any case, these factors should be taken into consideration to further optimise the recommendation accuracy.

Our method is content agnostic and can be applied to different recommendation domains, e.g., news recommendations in social media or even medicine recommendations for patients based on their Electronic Health Records (EHRs). Note that for each domain, our method only needs to sequentially process the user-item interactions. For example, in healthcare we can process the therapeutics (sessions) of patients (users) to consider the medicines (items) that were prescribed in each therapeutic.

8. Conclusion

In this paper, we proposed a Markov Chain-based TPM method to reveal the short-term intentions of individuals. We evaluated our method experimentally and compared it against baselines and state-of-the-art algorithms in an offline and an online evaluation setting. We have shown the superiority of our method over its competitors, specifically for the case where the items' life span is very short (e.g., news and tv programs). As future work, we would like to test how users perceive different explanation styles (i.e., item-based, user-based, session-based, etc.) when they receive an explanation along with a recommendation for reasons of transparency and accountability. Moreover, we plan to extend our algorithm to consider both the intra- and the inter-session item similarities, to deal with the problem of extreme data sparsity in user sessions.

References

[1] Shuo Chen, Joshua L. Moore, Douglas Turnbull, and Thorsten Joachims. 2012. Playlist prediction via metric embedding. In The 18th ACM International Conference on Knowledge Discovery and Data Mining, KDD’12, Beijing, China. ACM, 714–722. https://doi.org/10.1145/2339530.2339643

[2] Abhinandan S Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. 2007. Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th international conference on World Wide Web. ACM, 271–280.

[3] Cagdas Esiyok, Benjamin Kille, Brijnesh Johannes Jain, Frank Hopfgartner, and Sahin Albayrak. 2014. Users’ reading habits in online news portals. In Fifth Information Interaction in Context Symposium, IIiX ’14, Regensburg, Germany. ACM, 263–266. https://doi.org/10.1145/2637002.2637038

[4] Florent Garcin, Christos Dimitrakakis, and Boi Faltings. 2013. Personalized news recommendation with context trees. In Proceedings of the 7th ACM conference on Recommender systems, RecSys’13, Hong Kong, China. ACM, 105–112. https://doi.org/10.1145/2507157.2507166

[5] Florent Garcin, Boi Faltings, Olivier Donatsch, Ayar Alazzawi, Christophe Bruttin, and Amr Huber. 2014. Offline and online evaluation of news recommender systems at swissinfo.ch. In Proceedings of the 8th ACM Conference on Recommender systems. ACM, 169–176.

[6] Carlos A. Gomez-Uribe and Neil Hunt. 2015. The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Trans. Manage. Inf. Syst. 6, 4, Article 13 (Dec. 2015), 19 pages. https://doi.org/10.1145/2843948

[7] Qi He, Daxin Jiang, Zhen Liao, Steven C. H. Hoi, Kuiyu Chang, Ee-Peng Lim, and Hang Li. 2009. Web Query Recommendation via Sequential Query Prediction. In Proceedings of the 25th International Conference on Data Engineering, ICDE’09, Shanghai, China. IEEE, 1443–1454. https://doi.org/10.1109/ICDE.2009.71

[8] Balázs Hidasi and Alexandros Karatzoglou. 2017. Recurrent Neural Networks with Top-k Gains for Session-based Recommendations. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (2017).

[9] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based Recommendations with Recurrent Neural Networks. In Proceedings of the International Conference on Learning Representations (ICLR 2016), Vol. abs/1511.06939. arxiv:1511.06939 http://arxiv.org/abs/1511.06939

[10] Balázs Hidasi, Massimo Quadrana, Alexandros Karatzoglou, and Domonkos Tikk. 2016. Parallel Recurrent Neural Network Architectures for Feature-rich Session-based Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems(RecSys ’16). ACM, New York, NY, USA, 241–248. https://doi.org/10.1145/2959100.2959167

[11] Dietmar Jannach and Michael Jugovac. 2019. Measuring the business value of recommender systems. ACM Transactions on Management Information Systems (TMIS) 10, 4(2019), 1–23.

[12] Dietmar Jannach, Lukas Lerche, and Michael Jugovac. 2017. Adaptation and Evaluation of Recommendations for Short-term Shopping Goals. In Proceedings of the ninth ACM Conference on Recommender Systems(RecSys ’15). ACM, New York, NY, USA.

[13] D. Jannach and M. Ludewig. 2017. When Recurrent Neural Networks Meet the Neighborhood for Session-Based Recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems(RecSys ’17). ACM, New York, NY, USA, 306–310. http://doi.acm.org/10.1145/3109859.3109872

[14] Dietmar Jannach and Malte Ludewig. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems. 306–310.

[15] Benjamin Kille, Andreas Lommatzsch, Frank Hopfgartner, Martha Larson, and Arjen P de Vries. 2017. A stream-based resource for multi-dimensional evaluation of recommender algorithms. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1257–1260.

[16] Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 1419–1428.

[17] Lei Li, Li Zheng, Fan Yang, and Tao Li. 2014. Modeling and broadening temporal user interest in personalized news recommendation. Expert Systems with Applications 41, 7 (2014), 3168–3177.

[18] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. 2010. Personalized news recommendation based on click behavior. In Proceedings of the 15th international conference on Intelligent user interfaces. ACM, 31–40.

[19] Qiao Liu, Yifu Zeng, Refuoe Mokhosi, and Haibin Zhang. 2018. STAMP: short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1831–1839.

[20] Malte Ludewig and Dietmar Jannach. 2018. Evaluation of session-based recommendation algorithms. User Modeling and User-Adapted Interaction 28, 4-5 (2018), 331–390.

[21] Brian McFee and Gert R. G. Lanckriet. 2011. The Natural Language of Playlists. In Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR’11, Miami, Florida, USA. ISMIR, 537–542. http://ismir2011.ismir.net/papers/PS4-11.pdf

[22] Omar Moling, Linas Baltrunas, and Francesco Ricci. 2012. Optimal radio channel recommendations with explicit and implicit feedback. In Sixth ACM Conference on Recommender Systems, RecSys’12, Dublin, Ireland. ACM, 75–82. https://doi.org/10.1145/2365952.2365971

[23] Ioannis K. Paparrizos, Berkant Barla Cambazoglu, and Aristides Gionis. 2011. Machine learned job recommendation. In Proceedings of the 2011 ACM Conference on Recommender Systems, RecSys’11, Chicago, IL, USA. ACM, 325–328. https://doi.org/10.1145/2043932.2043994

[24] Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. 2018. Sequence-aware recommender systems. ACM Computing Surveys (CSUR) 51, 4 (2018), 1–36.

[25] Massimo Quadrana, Alexandros Karatzoglou, Balázs Hidasi, and Paolo Cremonesi. 2017. Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys ’17). ACM, New York, NY, USA, 130–137. https://doi.org/10.1145/3109859.3109896

[26] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, WWW’10, Raleigh, North Carolina, USA. ACM, 811–820. https://doi.org/10.1145/1772690.1772773

[27] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. 1994. GroupLens: An open architecture for collaborative filtering on netnews. In Proceedings of the Computer Supported Collaborative Work Conference. 175–186.

[28] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web (WWW ’01). ACM, New York, NY, USA, 285–295. https://doi.org/10.1145/371920.372071

[29] Guy Shani, David Heckerman, and Ronen I. Brafman. 2005. An MDP-Based Recommender System. Journal of Machine Learning Research 6 (2005), 1265–1295. http://www.jmlr.org/papers/v6/shani05a.html

[30] Elena Smirnova and Flavian Vasile. 2017. Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks. In Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems (DLRS 2017). ACM, New York, NY, USA, 2–9. https://doi.org/10.1145/3125486.3125488

[31] Yong Kiam Tan, Xinxing Xu, and Yong Liu. 2016. Improved Recurrent Neural Networks for Session-based Recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016). ACM, New York, NY, USA, 17–22. https://doi.org/10.1145/2988450.2988452

[32] J. Vinagre, A.M. Jorge, and J. Gama. 2014. Evaluation of recommender systems in streaming environments. In Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD 2014), held in conjunction with RecSys 2014. Silicon Valley, USA.

[33] Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M Jose, and Xiangnan He. 2019. A simple convolutional generative network for next item recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 582–590.

[34] Andrew Zimdars, David Maxwell Chickering, and Christopher Meek. 2001. Using Temporal Data for Making Recommendations. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence (UAI 2001). San Francisco, CA, USA, 580–588.
