RecSys 2020 Recommender Systems Paper: Recommending the Video to Watch Next: An Offline and Online Evaluation at YOUTV.de



Panagiotis Symeonidis, Free University of Bozen-Bolzano, Italy
Andrea Janes, Free University of Bozen-Bolzano, Italy
Dmitry Chaltsev, Free University of Bozen-Bolzano, Italy
Philip Giuliani, Keep in Mind, Italy
Daniel Morandini, Keep in Mind, Italy
Andreas Unterhuber, Keep in Mind, Italy
Ludovik Coba, Free University of Bozen-Bolzano, Italy
Markus Zanker, Free University of Bozen-Bolzano, Italy

RecSys '20: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil, September 2020

• Information systems → Collaborative filtering;
• Information systems → Recommender systems;

session-based recommendation, offline and online evaluation

Abstract

The task of recommending which video to watch next has long been a focus of recommender systems research. However, adequately exploiting the clues hidden in the sequences of actions within user sessions to reveal users’ short-term intentions has only recently moved into the focus of research. Based on a real-world application scenario, in this paper we propose a Markov Chain-based transition probability matrix to efficiently reveal the short-term preferences of individuals. We experimentally evaluated our proposed method against state-of-the-art algorithms in both an offline and a live evaluation setting. In both cases, our method not only outperformed its competitors but also led to clearly stronger user engagement on the platform. In the online setting, our method improved the click-through rate by up to 93.61%. This paper therefore contributes real-world evidence that sequence-awareness improves recommendation effectiveness, since capturing users’ short-term preferences is crucial for items with a short life span, such as TV programs (news, TV shows, etc.).





1. Introduction

YOUTV.de operates as an online video recording service that allows users to record high-quality German television programs of their choice and view them whenever and wherever they want. Users of YOUTV.de can watch broadcasts directly after recording them online without any delay (via streaming) or on the go using an iOS or Android device. Personalized recommendations are therefore becoming a core functionality for enhancing customer engagement and retention for such a service provider [11]. YOUTV.de recommends which videos to watch next based on the sequence of TV programs that the user has interacted with (see the bottom of Figure 1). Consumer studies suggest that typical visitors of an Internet TV website lose interest after clicking on 10 to 20 titles (perhaps reviewing 3 in more detail) [6]. That is, users either find something interesting within a short time span, or the likelihood that they abandon the service increases significantly.



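Figure 1. Next-video recommendations on YOUTV. The recommendations are shown in the module “Auch das könnte dir gefallen” (German) at the bottom, which translates to “You may also like”.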

Collaborative Filtering (CF) helps users explore videos that other users with similar behavior and taste have already identified in the past. That is, given a target user and her positively rated TV programs, a CF algorithm identifies her neighborhood of similar users, i.e., user-based CF (UBCF) [27]. Another CF variant is item-based CF (IBCF) [28]: given a target user and her positively rated TV programs, the algorithm relies on item similarities to form a neighborhood of nearest items. Recently, session-based CF (SBCF), also known as session-kNN, has been proposed [12]. It identifies sessions that are similar to the user's current session. Note that a session is a short-term interaction of a user with the system.
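The following minimal Python sketch makes the session-kNN idea concrete. It is not the implementation evaluated in [12]. It assumes that sessions are compared as binary item sets using cosine similarity, and that a candidate item's score is the sum of the similarities of the `k` most similar past sessions that contain it. The function names `session_similarity` and `session_knn_scores` and the toy sessions are hypothetical.

```python
from __future__ import annotations

from math import sqrt


def session_similarity(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two sessions treated as binary item vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def session_knn_scores(current: list[str], past: list[list[str]], k: int = 2) -> dict[str, float]:
    """Hypothetical session-kNN scorer: for every item of the k most similar
    past sessions that is not yet in the current session, sum the
    similarities of the sessions containing it."""
    cur = set(current)
    ranked = sorted(past, key=lambda s: session_similarity(cur, set(s)), reverse=True)
    scores: dict[str, float] = {}
    for s in ranked[:k]:
        sim = session_similarity(cur, set(s))
        if sim == 0.0:
            continue  # a session without overlap contributes nothing
        for item in set(s) - cur:
            scores[item] = scores.get(item, 0.0) + sim
    return scores


past_sessions = [["i2", "i3", "i8"], ["i4", "i6", "i7"], ["i1", "i9"]]
scores = session_knn_scores(["i2", "i3"], past_sessions)
print(scores)
```

In this toy input, only the first past session overlaps with the current one, so `i8` is the single candidate, with score $2/\sqrt{6}$.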


However, all the aforementioned methods are considered non-sequential, since they learn a user’s preference for each individual item and then rank these items by their score to provide recommendations. This property makes these methods good at capturing the general tastes of users from the whole historical information (i.e., by aggregating a user’s complete log history). In contrast, sequence-aware recommender systems capture the transition relationships between two or more adjacent items in user log sequences (i.e., user sessions) in order to learn the sequential dynamics among items [24]. For example, many sophisticated sequence-based approaches have been proposed that implement some form of sequence modelling based on Markov Chain Models (MCMs) [1, 21, 23]. These models can capture item transition probabilities and the most recent user intentions inside a user session.



In this paper, we use an MCM to provide session-based video recommendations by capturing the short-term preferences of users. To reveal the user's most recent intention, we analyse the item interactions inside her latest sessions (i.e., MCM-based item similarity) by building a Transition Probability Matrix (TPM). Moreover, we track the evolution of these preferences with a sliding time window that assigns less importance to videos that are outdated. Thus, our model is continuously updated with the latest user clicks, which lets it adapt quickly to changes in user preferences. We ran offline and online experiments with our proposed method and other baselines, and in both settings our method performed best. In particular, in the online setting, our method, which profiles users in real time, improved the click-through rate by up to 93.61%.



The rest of the paper is organized as follows. Section 2 summarizes the related work. Section 3 provides the problem formulation, whereas Section 4 presents a motivating toy example. Section 5 presents our proposed methodology. Experimental results are given in Section 6, and we discuss some methodological challenges in Section 7. Finally, Section 8 concludes the paper.


2. Related Work

Several works [1, 3, 4, 7, 21, 23, 34] in recommender systems use sequential modelling based on MCMs. Zimdars et al. [34] were the first to suggest the sequential nature of the recommendation process. Esiyok et al. [3] studied user behaviour in the context of news categories. They built an MCM on the Plista data set and described patterns in the evolution of news categories while users browse news articles online. Moreover, recommender approaches that use Markov Decision Processes (MDPs) have been published by Moling et al. [22] and Shani et al. [29]. Rendle et al. [26] proposed Factorized Personalized Markov Chains (FPMC), a hybrid model that combines MCMs with matrix factorization (MF). FPMC addresses the next-basket recommendation problem: given the history of past shopping baskets, predict the content of the user’s next basket. This approach combines MCM with MF using a three-dimensional tensor (user, current item, next item). Each entry in the tensor corresponds to an observed transition between two items performed by a specific user. The method then uses pairwise factorisation to predict the unobserved entries of the sparse tensor [20]. However, this approach is computationally intensive and does not scale well in real-world online settings.



Session-based recommendation usually refers to the case in which we only have anonymous sessions and cannot build a user profile. Recently, session-based recommendation has been modelled with Recurrent Neural Networks (RNNs). Hidasi et al. [9] presented GRU4Rec, a recommender system based on Gated Recurrent Units (GRUs), which learn when and how much to update their hidden state. However, a more recent study [12] has shown that a simple k-nearest-neighbor scheme adapted for session-based recommendation (session-kNN) often outperforms GRU4Rec. Nevertheless, several adjustments that improve the performance of the initial RNN model have been proposed in recent years [8, 10, 25, 30, 31]. The Neural Attentive Recommendation Machine (NARM) [16] uses an attention mechanism to capture both (i) the user’s sequential behavior and (ii) her main purpose within a session. The Short-Term Attention/Memory Priority (STAMP) model [19] proposes another attention mechanism that can effectively capture a user's long-term and short-term interests within a session. Yuan et al. [33] proposed a convolutional generative model that combines masked filters with 1D dilated convolutions to model long-range item dependencies in user sessions.



Another research direction combines collaborative filtering with content-based filtering to provide item recommendations. Related work [17] has shown that one way to increase accuracy is to consider the user's context (e.g., time, location, mood). For example, Das et al. [2] generated recommendations with collaborative filtering that takes into account the co-visitation count of articles. This count is the number of times a news story was co-visited with another news story in the user’s click history. Later, Liu et al. [18] combined a content-based method with the collaborative filtering method previously developed for Google News [2] to generate personalised news recommendations. The hybrid method develops a Bayesian framework that predicts users’ current news interests from profiles learned from (i) the target user’s activity and (ii) the news trends shown in the activity of all users, based on the categories of the news articles. We use this method as a comparison partner in our experiments, since videos also have descriptive content that could be combined with the collaborative data.
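The co-visitation count of Das et al. [2] can be pictured in simplified form with the short Python sketch below. It assumes that two stories are co-visited whenever they appear in the same user's click history. It ignores any time window or weighting the original system may apply. The function name `covisitation_counts` and the toy histories are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def covisitation_counts(click_histories: list[list[str]]) -> Counter:
    """For every unordered pair of stories, count the click histories
    in which both stories appear (a simplified co-visitation count)."""
    counts: Counter = Counter()
    for history in click_histories:
        # Deduplicate and sort so each pair is counted once per history
        # under a canonical (a, b) key with a < b.
        for a, b in combinations(sorted(set(history)), 2):
            counts[(a, b)] += 1
    return counts


histories = [["n1", "n2", "n3"], ["n2", "n1"], ["n3"]]
counts = covisitation_counts(histories)
print(counts[("n1", "n2")])  # 2
```

Using a `Counter` keyed by sorted pairs keeps the count symmetric: `(n2, n1)` and `(n1, n2)` map to the same entry.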


3. Problem Formulation

We are interested in building a recommender system that suggests videos to internet television viewers (see the “You may also like” section in Figure 1). The internet TV provider may update a small, personalised top-N list of video recommendations (possibly shown inside a widget) every time the viewer selects a video. The provider does this to keep users engaged on the website longer, both for advertising purposes and to fulfil their viewing desires. The system monitors how visitors react to the received recommendations in order to drive better suggestions, i.e., to predict their next click/item inside a session.


Let $U$ denote the growing set of users that visit the website, and let $I$ denote the growing set of incoming TV programs/items. We keep track of users’ actions on items on the website. In particular, whenever a user views one or more videos within a short period of time, we store these interactions in the database as a user session. These interactions are ordered: for every item in a session, we know whether it was selected first, second, or last, and how long the user interacted with it. For instance, session $S_1(user = u_1, TimeStarted = t_1 \mid \{i_1, 20\,sec\}, \{i_2, 145\,sec\})$ describes a session $S_1$ that user $u_1$ started at time point $t_1$. Item $i_1$ was selected first and viewed for 20 seconds, and item $i_2$ was selected second and held the user's attention for 145 seconds. Table 1 summarises the basic symbols and notation used later.
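As an illustration only, the session record described above could be held in a small data structure like the following Python sketch. The class and field names are hypothetical, and `time_started=0.0` is a placeholder, since the text gives $t_1$ only symbolically.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Interaction:
    item: str
    duration_sec: int  # how long the user interacted with the item


@dataclass
class Session:
    session_id: str
    user: str
    time_started: float  # placeholder start time; the text only names it t_1
    interactions: list[Interaction] = field(default_factory=list)

    def item_sequence(self) -> list[str]:
        """Items in the order in which they were selected."""
        return [x.item for x in self.interactions]


# Session S_1 from the text: i1 viewed for 20 s, then i2 for 145 s.
s1 = Session("S1", user="u1", time_started=0.0,
             interactions=[Interaction("i1", 20), Interaction("i2", 145)])
print(s1.item_sequence())  # ['i1', 'i2']
```

Storing interactions as an ordered list keeps both the selection order and the per-item duration, which is the information the text says is recorded.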


Table 1. Symbols and their descriptions

| Symbol | Description |
|---|---|
| $U$ | Set of users, $U = \{U_1, U_2, \ldots, U_n\}$ |
| $I$ | Set of items, $I = \{i_1, i_2, \ldots, i_m\}$ |
| $S$ | Set of sessions, $S = \{S_1, S_2, \ldots, S_N\}$ |
| $S_u$ | Set of sessions of user $u$ |
| $S_i$ | Set of sessions containing item $i$ |
| $S^{t_p}$, $S^{t_y}$ | Sets of sessions within a time window |
| $w$ | Size of the sliding time window |

4. Motivating Example

Our session-based recommender consists of two modules. The first is the user profile updater module, which reads instances from the stream of sessions and combines them with previously recorded information. In particular, the user profile updater assigns validity intervals to the elements of the session stream $S$. A sliding time window of size $w$ specifies that processing at time point $t$ should consider all events not older than $t-w$. Therefore, the profile updater assigns the (half-open) validity interval $(t-w, t]$ to an event that arose at time $t$. The second module is the recommender, which runs on top of the profile updater and delivers the top-$N$ recommended items to each user.
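The validity check performed by the profile updater can be sketched as follows. This illustrative Python sketch assumes that each session is represented by its start time and that time is a plain number (e.g., hours). `in_window`, `valid_sessions`, and the example times are hypothetical.

```python
from __future__ import annotations


def in_window(event_time: float, t: float, w: float) -> bool:
    """True if event_time lies in the half-open validity interval (t - w, t]."""
    return t - w < event_time <= t


def valid_sessions(session_times: dict[str, float], t: float, w: float) -> list[str]:
    """Ids of the sessions (keyed by start time, an assumption) that the
    profile updater would still consider at time t."""
    return [sid for sid, start in session_times.items() if in_window(start, t, w)]


times = {"S1": 1.0, "S2": 2.0, "S3": 5.0, "S4": 8.0}
print(valid_sessions(times, t=8.0, w=4.0))  # ['S3', 'S4']
```

The half-open interval excludes an event exactly at $t-w$ and includes one exactly at $t$.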



To better explain our approach, we use the graphical representation of our running example shown in Figure 2. We have 3 users and want to predict the video that user 2 will click next in his unfinished session (i.e., session $S_7$).


Figure 2. Visual representation of our toy example (users, items, sessions).

When computing similarities between target user 2 and the two other users, note that sessions $S_1$ and $S_2$ are ignored because they lie outside the valid time window $(t-w, t]$. This sliding window captures the notion of recency of TV programs: a video may have a short life span and become obsolete quickly (e.g., the news broadcast of the previous day is already old). Thus, we should avoid recommending videos that are not recent. Note, however, that entertainment content such as movies, TV series, and TV shows can remain interesting for longer periods and thus has a longer life span. When two or more items are selected within one session, these items can be considered more similar to each other than items selected by the same user in different sessions. For instance, taking the actions of user 1 (i.e., $U_1$) into account, we can infer that item $i_4$ is more similar to items $i_6$ and $i_7$ than to item $i_9$, since $i_6$ and $i_7$ were selected in the same session $S_4$ as $i_4$.


In our running example (Figure 2), session $S_7$ of user $U_2$ is still open. Thus, items $i_2$ and $i_3$ of session $S_7$ can be matched in order to recommend items to $U_2$. As shown, user $U_3$ has also selected the same items as $U_2$ (inside the valid time window). He has additionally selected $i_8$, which could be a good recommendation for $U_2$. Note that $U_1$ has also selected exactly the same items as those in session $S_7$ of $U_2$. However, we cannot use session $S_1$ for training since it is not inside the specified time window.


In summary, items that are selected within the same session (intra-session Markov Chain-based item similarity) are considered to be more similar than those items which are selected by the same user within different sessions (inter-session item similarity). Intra-session Markov Chain-based item similarity thus reveals the short-term preference of the user and his intentions inside a session independently of other sessions.


5. Methodology

In this Section, we will introduce our Markov Chain-based algorithm in more detail. In particular, for each user session, we update an item transition probability matrix based on the subsequences between items.


5.1. Markov Chain

A Markov chain is a stochastic model of a sequence of possible events that satisfies the Markov property: the probability of each event depends only on the present state and not on the previous states. A variant, the Markov Chain Model of order $m$, assumes that the future state depends on the past $m$ states. A Hidden Markov Model (HMM) is an MCM with hidden states. Moreover, Markov Decision Processes (MDPs) extend MCMs: at each time point $t$, when the process is in state $x_t$, the decision maker may choose any action $a \in A_{x_t}$. The MDP responds at the next time step by randomly moving into a new state $x_{t+1}$ and giving the decision maker a corresponding reward $R(x_t, a, x_{t+1})$.




One way to represent a Markov Chain Model $\{X\}$ is by a transition matrix $P$. Its entry $P_{i,j}$ is the probability of a transition from state $x_i$ to state $x_j$, so each row contains the transition probabilities out of one state:

$$P_{i,j} = \mathbb{P}(X_{t+1} = x_j \mid X_t = x_i) \qquad (1)$$


Each row of the matrix is a probability vector, i.e., its components sum to 1. For instance, the following transition matrix represents a model with three possible states:

$$P = \begin{bmatrix} 0.7 & 0.05 & 0.25 \\ 0.25 & 0.6 & 0.15 \\ 0.35 & 0.1 & 0.55 \end{bmatrix}$$

If the current state of the process is $x_2$, i.e., the second row of the matrix is considered, then the probability of a transition into state $x_3$ is $\mathbb{P}(X_{t+1}=x_3 \mid X_t=x_2) = P_{2,3} = 0.15$, the entry in the second row and third column. Another way to represent an MCM is a transition diagram. This is a weighted directed graph in which each vertex represents a state of the MCM, and there is a directed edge from vertex $x_i$ to vertex $x_j$ if the transition probability $P_{i,j} > 0$. This edge carries the weight (probability) $P_{i,j}$. An example of such a diagram is shown in Figure 3.


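Figure 3. An example Markov chain transition diagram. The model has three states $\{x_1, x_2, x_3\}$. The numbers next to the directed edges denote the transition probabilities between states.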
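The example matrix above can be checked in a few lines of plain Python: confirming that each row sums to 1, looking up $P_{2,3}$, and listing the edges of the corresponding transition diagram. This is only an illustrative sketch. The helper names `is_row_stochastic`, `transition_prob`, and `transition_diagram` are hypothetical, and states are indexed from 1 as in the text.

```python
P = [
    [0.70, 0.05, 0.25],
    [0.25, 0.60, 0.15],
    [0.35, 0.10, 0.55],
]


def is_row_stochastic(matrix, tol=1e-9):
    """Every row is a probability vector: non-negative entries summing to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def transition_prob(matrix, i, j):
    """P(X_{t+1} = x_j | X_t = x_i), using the 1-based state indices of the text."""
    return matrix[i - 1][j - 1]


def transition_diagram(matrix):
    """Edges (x_i, x_j, P_ij) of the weighted directed graph, one per P_ij > 0."""
    return [
        (f"x{i + 1}", f"x{j + 1}", p)
        for i, row in enumerate(matrix)
        for j, p in enumerate(row)
        if p > 0
    ]


print(is_row_stochastic(P))      # True
print(transition_prob(P, 2, 3))  # 0.15
```

Because every entry of this particular matrix is positive, its diagram contains all nine edges, including self-loops.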

5.2. Proposed Method

Using Bayesian inference that assumes independence among the pieces of evidence, we can predict which items will be included in the last session $S_N$ of a user $u$ from the items already included in $S_N$. In particular, we can use the following formula. It builds on the Markov Chain-based transition probabilities between any two subsequent items in each distinct session in time window $t_p$:

$$p(j \in S_N \mid i_{1:m} \in S_N) \propto \prod_{i_k \in S_N,\ k=1,\ldots,m} p(j \in S_N \mid i_k \in S_N) \qquad (2)$$

where $i_{1:m} = \{i_1, \ldots, i_m\}$ is the set of items that user $u$ has already clicked in the current session $S_N$, $i_k$ is the $k$-th of these items, and $j$ is the candidate item to be predicted as the next item in $S_N$.
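Under the independence assumption, Equation 2 multiplies one pairwise probability per clicked item. The Python sketch below is a hedged illustration. The probabilities in `p_next` are invented, and the helper `next_item_score` is hypothetical. Unseen pairs are given probability 0, which is only one possible choice; a real system might smooth them instead.

```python
from math import prod


def next_item_score(j, clicked, p_next):
    """Unnormalised score of candidate j under Eq. 2: the product of the
    pairwise probabilities p(j | i_k) over the items i_k already clicked.
    Pairs missing from p_next are treated as probability 0 (an assumption)."""
    return prod(p_next.get((i_k, j), 0.0) for i_k in clicked)


# Hypothetical pairwise probabilities p(j | i_k), keyed by (i_k, j).
p_next = {("i2", "i8"): 0.5, ("i3", "i8"): 0.4,
          ("i2", "i1"): 0.5, ("i3", "i1"): 0.6}
clicked = ["i2", "i3"]
ranking = sorted(["i1", "i8"], key=lambda j: next_item_score(j, clicked, p_next), reverse=True)
print(ranking)  # ['i1', 'i8']
```

Since Equation 2 is a proportionality, the scores need not be normalised to produce a ranking.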


The Markov Chain-based item Transition Probability Matrix (TPM) captures the transition probability between two subsequent events in a session. That is, we simply count how often users viewed item $i_b$ immediately after viewing item $i_a$.


Let a session $S_n$ be a chronologically ordered sequence of item click events $S_n = (i_1, i_2, \ldots, i_m)$, and let $S = \{S_1, S_2, \ldots, S_N\}$ be the set of all sessions. Given a user’s current session $S_N$ with $i_m$ being the last item in $S_N$, we define the score of a recommendable item $j$ as follows:

$$p(j \in S_N \mid i_m \in S_N) = score(j, i_m) = \frac{\sum_{S_n \in S,\ n=1,\ldots,N} \sum_{i_k \in S_n} isSame(i_m, i_k) \cdot isSame(j, i_{k+1})}{\sum_{S_n \in S,\ n=1,\ldots,N} \sum_{i_k \in S_n} isSame(i_m, i_k)} \qquad (3)$$

where the function $isSame(i_a, i_b)$ indicates whether $i_a$ and $i_b$ refer to the same item:

$$isSame(i_a, i_b) = \begin{cases} 1, & \text{if } i_a = i_b \\ 0, & \text{if } i_a \neq i_b \end{cases}$$


Based on Equation 3, in the running example of Figure 2, the transition probability from item $i_4$ to item $i_6$ equals $\frac{1}{2}$. Among all sessions of time window $t_p$, there is only one case where $i_4$ is followed by $i_6$ (session $S_4$). The denominator equals two, since there are two sessions in which $i_4$ is followed by some other item (sessions $S_3$ and $S_4$). The Markov Chain-based transition probability matrix of our running example is presented in Table 2 (rows and columns containing only zeros are omitted).



Table 2. Markov Chain-based transition probability matrix of the running example (rows: current item; columns: next item).

| TPM | $i_1$ | $i_2$ | $i_3$ | $i_6$ | $i_7$ | $i_8$ |
|---|---|---|---|---|---|---|
| $i_2$ | 0 | 0 | 1/2 | 0 | 0 | 1/2 |
| $i_3$ | 0 | 1 | 0 | 0 | 0 | 0 |
| $i_4$ | 1/2 | 0 | 0 | 1/2 | 0 | 0 |
| $i_6$ | 0 | 0 | 0 | 0 | 1 | 0 |
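Equation 3 amounts to counting item-to-item transitions. The following Python sketch builds the TPM from a list of item sequences. For the denominator, it follows the worked example's reading: only occurrences of $i_a$ that have a successor in the same session are counted. The sessions are hypothetical, because the text does not list the actual sessions of Figure 2. They were chosen only so that the resulting probabilities match Table 2. `build_tpm` is a hypothetical name.

```python
from collections import defaultdict


def build_tpm(sessions):
    """Estimate TPM[a][b] = (# times b immediately follows a) /
    (# times a is followed by any item), using only consecutive pairs
    inside each session, as in the worked example for Equation 3."""
    follow = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            follow[a][b] += 1
    tpm = {}
    for a, successors in follow.items():
        total = sum(successors.values())
        tpm[a] = {b: count / total for b, count in successors.items()}
    return tpm


# Hypothetical in-window sessions, chosen so that the counts reproduce Table 2.
sessions = [["i4", "i1"], ["i4", "i6", "i7"], ["i3", "i2", "i8"], ["i2", "i3"]]
tpm = build_tpm(sessions)
print(tpm["i4"])  # {'i1': 0.5, 'i6': 0.5}
```

Counting pairs per session keeps transitions from crossing session boundaries, which matches the intra-session notion of similarity in Section 4.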

To summarise, the Markov Chain-based TPM infers similarity among items inside each session independently of other sessions. Note that the Markov Chain-based TPM similarity is more effective with smaller time window sizes $w$, which makes it more suitable for capturing short-term user preferences.



5.3. Recommendation List Creation

Our recommender module provides recommendations based on the Markov Chain-based TPM presented in the previous section. For each target user $u$, the recommender checks the set of her recently viewed items $I_{t_p,u}$ (i.e., the items she has interacted with in the current time period $t_p$). For each item $i \in I_{t_p,u}$, it then computes the set $K_i$ of the $k$ nearest items to $i$. Next, for each target user $u$ in $t_p$ and for each item $j$, we compute a ranking score $score(t_p, u, j)$ as follows:

$$score(t_p, u, j) = \sum_{i \in I_{t_p,u}} TPM(i, j) \cdot \mathbb{1}_{(j, K_i)} \qquad (4)$$


where $\mathbb{1}_{(j, K_i)}$ is an indicator function that equals 1 if item $j$ is among the $k$ nearest neighbors of item $i$, and 0 otherwise. Moreover, $TPM(i, j)$ returns the similarity score of items $i$ and $j$ according to the Markov Chain-based TPM (transition probability matrix) computed in the previous section. Finally, for each user we sort the items in decreasing order of score and recommend the top-$N$ ones to her.
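The ranking step can be sketched in Python, with Table 2 written out as the TPM. The text does not say how the $k$ nearest items are chosen. The sketch assumes they are the $k$ items with the highest transition probability from $i$, with ties broken by item id. `k_nearest` and `recommend` are hypothetical names.

```python
def k_nearest(tpm, i, k):
    """K_i: the k items with the highest TPM(i, .). Ranking neighbours by
    transition probability, with ties broken by item id, is an assumption."""
    row = tpm.get(i, {})
    return set(sorted(row, key=lambda j: (-row[j], j))[:k])


def recommend(tpm, recent_items, k=2, n=5):
    """score(t_p, u, j) = sum over i in I_{t_p,u} of TPM(i, j) * 1[j in K_i];
    return the n highest-scoring items (ties broken by item id)."""
    scores = {}
    for i in recent_items:
        for j in k_nearest(tpm, i, k):
            scores[j] = scores.get(j, 0.0) + tpm[i][j]
    return sorted(scores, key=lambda j: (-scores[j], j))[:n]


# Table 2 written as a nested dict (zero entries omitted).
TPM = {
    "i2": {"i3": 0.5, "i8": 0.5},
    "i3": {"i2": 1.0},
    "i4": {"i1": 0.5, "i6": 0.5},
    "i6": {"i7": 1.0},
}
print(recommend(TPM, ["i4", "i6"]))       # ['i7', 'i1', 'i6']
print(recommend(TPM, ["i4", "i6"], k=1))  # ['i7', 'i1']
```

The sketch does not filter out items the user has already viewed ($i_6$ in the first call). Whether to do so is a design decision that the text leaves open.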


6. Experimental Evaluation

In this Section, we perform offline and online evaluations of our proposed method together with other baselines and state-of-the-art comparison partners.


6.1. Offline Evaluation

6.1.1. Data Set Characteristics

For the offline evaluation, the data set was collected over two weeks in May 2018 (18/5/2018 to 1/6/2018). It contains 1,146,452 interactions/events on 63,897 videos by 18,447 unique users. The interactions of each session are logged with the following information: the user session’s identifier, the interaction’s time stamp and duration, and the TV program’s textual content. As shown in Figure 4(a), fifty percent (50%) of the user sessions in the offline data set contain only one video interaction. However, as can be seen in Figure 4(b), in the online evaluation scenario most users interact with two videos before leaving the service. As explained later, this happens because user satisfaction increased thanks to the new, better recommendation algorithm introduced in the A/B testing phase. Note that the data set used in the offline evaluation was cleaned by removing sessions that contain only one video interaction. No recommendations can be tested on such sessions, and no video co-occurrence patterns can be identified from them to build a model. Detailed general statistics of the data sets used for the offline and online evaluation are summarized in Table 3.



Figure 4. Number of interactions per session for the (a) offline and (b) online evaluation.


Table 3. General statistics of the data sets.


Please note that for the offline evaluation, the average number of interactions per session is 3.6, which seems adequate for building a recommendation model.
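The cleaning step and the per-session average can be sketched as follows (Python, illustrative only). The text does not state whether the reported average of 3.6 was computed before or after cleaning, so the sketch makes no claim about reproducing it. The toy sessions are made up, and the function names are hypothetical.

```python
def clean_sessions(sessions):
    """Keep only sessions with at least two interactions: single-interaction
    sessions cannot be tested and contain no co-occurrence pattern."""
    return [s for s in sessions if len(s) >= 2]


def avg_interactions_per_session(sessions):
    """Mean number of interactions per session (0.0 for an empty list)."""
    return sum(len(s) for s in sessions) / len(sessions) if sessions else 0.0


# Toy sessions (item ids only); not the YOUTV data.
raw = [["a"], ["a", "b"], ["c", "d", "e", "f"], ["g"]]
cleaned = clean_sessions(raw)
print(len(cleaned), avg_interactions_per_session(cleaned))  # 2 3.0
```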


6.1.2. Prequential Evaluation Protocol

In this Section, we present the protocol used for the offline evaluation. It follows the protocol introduced by Jannach et al. [12, 20] for predicting the next item inside a session, which is also known as prequential evaluation in stream mining [24, 32].


As shown in Figure 5, the time frame of the available data is split into $N_t$ equal time periods $t_p$, and each period contains only the sessions that took place during it. We use this split to aggregate the evaluation results per time period. Note that $t_p$ is the size of the future horizon that we will be able to predict. In the next section, we try to identify the future horizon that can be predicted most effectively (e.g., some hours or some days into the future).




Figure 5. Prequential evaluation.

We also use the parameter $w$ to specify the size of the window on which the model is trained, labeled “Train data” in Figure 5. Parameter $w$ controls how far back into the past we go to exploit information. Please note that if $w$ is too large, the system is not sensitive to changes (concept drift); if it is too small, there is not enough data to build a model that predicts the next items in a session. In the test phase, we also use a parameter $v$ to specify how many views of the currently evaluated session are revealed before a recommendation is made. As shown in the “Test phase” rectangle in Figure 5, after the first prediction is tested, the data point is added to the model, and the process repeats until no data points remain to be tested in the session. Finally, we evaluate the precision (i.e., the number of hits divided by the number of recommended items) obtained when recommending the top-5 videos for each next-item prediction inside a session.
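The test phase described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: sessions are ordered lists of item ids, $w$ is treated as a number of preceding periods (the caption of Figure 6 gives $w$ in hours), and `PopularityModel` is a hypothetical stand-in model used only so the loop runs; any model exposing `fit`, `recommend`, and `add` could be plugged in.

```python
from collections import Counter


class PopularityModel:
    """Hypothetical stand-in model: recommends the most clicked items seen so far."""

    def __init__(self):
        self.counts = Counter()

    def fit(self, sessions):
        for session in sessions:
            self.counts.update(session)

    def recommend(self, revealed, k):
        # Skip items already viewed in the current session (an assumption).
        ranked = [item for item, _ in self.counts.most_common() if item not in revealed]
        return ranked[:k]

    def add(self, item):
        # Prequential step: the tested data point is added to the model.
        self.counts[item] += 1


def prequential_precision(periods, make_model, w, v=1, k=5):
    """Average precision@k over all next-item predictions in periods[w:].

    periods: list of periods, each a list of sessions (ordered lists of item ids).
    For test period p, a fresh model is trained on the w preceding periods.
    """
    precisions = []
    for p in range(w, len(periods)):
        model = make_model()
        model.fit([s for prev in periods[p - w:p] for s in prev])
        for session in periods[p]:
            # Reveal the first v views, then predict each following view in turn.
            for pos in range(v, len(session)):
                recs = model.recommend(session[:pos], k)
                hit = 1 if session[pos] in recs else 0
                precisions.append(hit / k)  # hits divided by the k recommended slots
                model.add(session[pos])
    return sum(precisions) / len(precisions) if precisions else 0.0
```

Retraining a fresh model per test period keeps the training data restricted to the sliding window of size $w$, which is what makes the window sensitive to concept drift as described above.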



6.1.3. Sensitivity Analysis of the Proposed Method

In this section, we study the accuracy of the Markov Chain-based Transition Probability Matrix (TPM). We explore how the precision of this method changes as we vary (i) the time period split: $N_t=14$ time periods of 24 hours, 28 time periods of 12 hours, 55 time periods of 6 hours, and 331 time periods of 1 hour; and (ii) the time window size: $w = 1, 3, 9, 12, 15, 18, 21$. For all experiments, we provide the top-5 recommended items.




As shown in Figure 6(a), recommendation accuracy changes drastically as we split the timeline of the data (i.e., the 14 days) into different time periods. As discussed earlier, parameter $N_t$ controls how long the future horizon that we try to predict is. With this experiment, we want to identify how short or long this horizon should be to obtain the best recommendation accuracy. Of course, this is related to the life span of the items and to how often users re-appear in the system (i.e., end a session and later start a new one) or change their preferences. Specifically, with the time window size set to $w=1$, when $N_t=331$ and $t_p=1$ hour, the precision we get when recommending 5 items (i.e., precision@5) is less than 0.04. When $N_t=28$ and $t_p=12$ hours, we obtain the best precision (0.051). Precision drops again when we consider $N_t=14$ and $t_p=24$ hours. Henceforth, we set $N_t=28$ and $t_p=12$ hours for the remaining experiments, which means that we predict best when we try to predict the next 12 hours.



Figure 6. Offline evaluation: precision of the Markov Chain-based TPM for different (a) time periods $t_p = 1, 6, 12, 24$ hours and (b) window sizes $w = 1, 3, 9, 12, 15, 18, 21$ (hours).

6.1.4. Comparison with Other Methods

In this section, we compare our Markov Chain-based TPM algorithm with the following baselines and state-of-the-art comparison partners, which represent different algorithmic families: collaborative filtering (IBCF), session-based nearest neighbors (Session-knn), recurrent neural networks (GRU4Rec), and hybrid collaborative and content-based filtering (Cat-TPM).


(i) Recently Most Popular Items (Recently POPULAR): this baseline recommends the top-N most clicked videos of the active/valid time period $t_p$.
(ii) Item-based Collaborative Filtering (IBCF) [2]: in IBCF, two items are considered similar if they are selected by similar users. In [2], IBCF considers the co-visitation count of news articles, i.e., the number of times an item was co-visited (clicked before or after) with another item.
(iii) Session-knn [13]: the Session-knn method takes the set of user actions in the current session, e.g., two view events for certain items, and in a first step determines the $k$ most similar past sessions in the training data. Then, given the current session $s$, the set $N_s$ of its $k$ nearest neighbors, and a function $sim(s_1,s_2)$ that returns a similarity score for two sessions $s_1$ and $s_2$, the score of a recommendable item $i$ is:

$$score_{KNN}(i,s)=\sum_{n\in N_s} sim(s,n)\times 1_n(i) \qquad (3)$$

where $1_n(i)=1$ if $n$ contains $i$ and $0$ otherwise. The similarity measure used by Jannach et al. [13] in their experiments is cosine similarity, since they found that the best results are achieved when sessions are encoded as binary vectors over the item space.
(iv) GRU4Rec [9]: GRU4Rec is a neural network-based recommender that uses Gated Recurrent Units (GRUs), which learn when and how much to update the hidden state of the model. In particular, GRU4Rec is a recurrent neural network that modifies the basic GRU to better fit the prediction task by introducing session-parallel mini-batches, mini-batch-based output negative sampling, and a pairwise ranking loss function.
(v) Category-based TPM (Cat-TPM) [18]: in Cat-TPM, when a user selects two videos in a row, a transition from the category of the first video to the category of the second video is recorded. Cat-TPM combines content-based and collaborative filtering methods to generate personalized Google News recommendations [18].
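Two of the baselines above can be sketched in a few lines. The sketch below implements Eq. (3) with cosine similarity over binary item vectors, plus a simplified co-visitation count in the spirit of IBCF. The function names are hypothetical, and the co-visitation count here treats items as co-visited when they occur in the same session, which is a simplification of the time-based co-visitation used in [2].

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cosine_binary(s1, s2):
    """Cosine similarity of two sessions encoded as binary item vectors."""
    a, b = set(s1), set(s2)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def session_knn_scores(current, past_sessions, k):
    """Eq. (3): score(i, s) = sum over n in N_s of sim(s, n) * 1_n(i)."""
    # N_s: the k past sessions most similar to the current session s.
    neighbors = sorted(past_sessions, key=lambda n: cosine_binary(current, n), reverse=True)[:k]
    scores = defaultdict(float)
    for n in neighbors:
        sim = cosine_binary(current, n)
        for item in set(n):  # 1_n(i) = 1 only for items contained in n
            scores[item] += sim
    return dict(scores)


def covisitation_counts(sessions):
    """Simplified IBCF co-visitation: how often two items occur in the same session."""
    counts = Counter()
    for session in sessions:
        for i, j in combinations(sorted(set(session)), 2):
            counts[(i, j)] += 1
    return counts
```

With the current session `["a", "b"]` and past sessions `["a", "b", "c"]`, `["a", "d"]`, `["e", "f"]`, the two nearest neighbors for $k=2$ are the first two sessions, so item `d` receives a score of 0.5 and item `e` receives no score at all.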


The parameters we used for all the aforementioned comparison partners follow those reported in the original papers and were tuned on our data set to obtain the best results for each method.


Figure 7 reports the average precision of the compared algorithms over the two weeks of the offline data set, for $N_t=28$ time splits, $t_p=12$ hours, and a sliding time window $w=18$. We run the experiments with the top-5 recommended TV programs. We tested the difference of means between the Markov Chain-based TPM and each of the other comparison partners, and found the differences statistically significant with a one-sided t-test at the 0.05 level. As shown in Figure 7, our proposed approach has the best average precision over the 2 weeks among all comparison partners. This is because it takes into account the order of the items clicked inside a user session, which allows the Markov Chain-based TPM to better capture the recency of the user's interest (i.e., short-term preferences).
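To illustrate how a transition probability matrix exploits the order of clicks within a session, here is a generic first-order Markov chain sketch. It is an assumption-laden illustration, not the paper's method: the authors' exact TPM construction is described in Section 3 and may differ, and `build_tpm` and `recommend_next` are hypothetical names.

```python
from collections import defaultdict


def build_tpm(sessions):
    """Row-normalized transition probabilities P(next = j | current = i),
    estimated from consecutive clicks inside sessions (first-order Markov chain)."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    tpm = {}
    for cur, row in counts.items():
        total = sum(row.values())
        tpm[cur] = {nxt: c / total for nxt, c in row.items()}
    return tpm


def recommend_next(tpm, last_item, k=5):
    """Top-k items with the highest transition probability from the last clicked item.
    Ties are broken alphabetically so the ranking is deterministic."""
    row = tpm.get(last_item, {})
    ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:k]
```

Unlike the set-based session similarity in Session-knn, the transitions `a → b` and `b → a` are counted separately here, which is what lets such a model react to the most recent click.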




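Figure 7. Offline evaluation: precision of the Markov Chain-based TPM compared with different baselines and their combinations. Please note that, since we always recommend 5 items and aim to predict the next item, the maximum precision we can achieve is 20% (not 100%).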
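The 20% ceiling follows directly from the definition of precision, since each next-item prediction has exactly one relevant item (the video actually watched next) and exactly 5 items are always recommended:

```latex
\text{precision@5} = \frac{\#\text{hits}}{5} \le \frac{1}{5} = 0.2
```

For instance, the precision of 0.051 reported in Section 6.1.3 for $N_t=28$ and $w=1$ corresponds to about 25.5% of this attainable maximum ($0.051 / 0.2$).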

Regarding the other comparison partners, the “Recently POPULAR” baseline attains the worst precision, since it cannot provide personalized recommendations. As expected, Session-knn attains better results than GRU4Rec, as already reported in [14]. IBCF does not attain very good results because there is not enough data to build its prediction model. That is, many users re-appear irregularly and rarely, at time points that are far apart, so collaborative filtering cannot always build a model, since users must appear in two consecutive time slots. In other words, IBCF is better suited to capturing the long-term preferences of a user and is less effective for items with a short life span, such as news stories and TV programs. We analyze this further in the Discussion section. Moreover, as expected, Session-knn is far worse than the Markov Chain-based TPM, because it cannot adequately capture the latent associations among items inside the same session. The Cat-TPM, which combines collaborative filtering with content-based filtering (i.e., it uses the category of the video), is also very ineffective, probably because it can build only the long-term profile of users and misses their most recent short-term intentions. Please note that we also tested a combination of our Markov Chain-based TPM with the Cat-TPM, but, surprisingly, this did not result in better recommendation accuracy.






6.2. Online Evaluation

The online A/B testing experiment was performed over 30 days between June and July 2019 (5/6/2019-4/7/2019). From the offline evaluation, we selected the two best single (non-hybrid) methods in terms of precision, namely our Markov Chain-based TPM and IBCF. We wanted to check their performance in a live experiment, to understand the user experience with our personalized TV program recommendations. In particular, we conducted experiments on a fraction of the live traffic at YOUTV, similarly to the procedure followed in the personalized Google News recommendation work [18]. The users were randomly assigned to a control group and a test group, and the two groups had about the same number of users. In total, 22,375 users had 2,630,008 interactions on 132,895 videos, as can be seen in Table 3. Please note that user engagement with the service increased drastically, since the average number of interactions per session rose from 3.6 in the previous year to 7.24. That is, at least half of the users in the online evaluation (i.e., the test group) increased their interaction with the service due to the new recommendation algorithm. As can be seen in Figure 4(b), most users interact with two videos before leaving the service. In addition, the average number of interactions per session (+3.64) and per user (+55.39) increased markedly (almost doubled) compared with the offline data set, as can also be seen in Table 3.
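The per-user increase quoted above can be reproduced from the totals of the two data sets; the per-session averages (3.6 and 7.24) are taken as reported, since the number of sessions is not given in the text:

```latex
\begin{aligned}
\text{interactions per user (online)}  &= \frac{2{,}630{,}008}{22{,}375} \approx 117.54\\
\text{interactions per user (offline)} &= \frac{1{,}146{,}452}{18{,}447} \approx 62.15\\
\Delta_{\text{user}}    &\approx 117.54 - 62.15 = 55.39 \quad (\text{ratio} \approx 1.89)\\
\Delta_{\text{session}} &= 7.24 - 3.6 = 3.64 \quad (\text{ratio} \approx 2.01)
\end{aligned}
```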





The evaluation protocol was as follows. When a logged-in YOUTV user (who has also explicitly enabled web history) visited the website, we recommended TV programs to them. Users in the control group received video recommendations from the existing collaborative filtering method (i.e., IBCF), while users in the test group received recommendations from the new Markov Chain-based TPM. The metric used to measure the performance of both recommendation algorithms is the click-through rate (CTR), i.e., the ratio of the number of clicks to the number of views of the recommended TV programs. We calculated the CTR for each user on a daily basis. The performance of the control and the test group was derived by averaging the measurements of all users in the corresponding group.
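The aggregation described above (per-user daily CTR, then averaged within each group) can be sketched as follows. The log format `(group, day, user_id, views, clicks)` and the function name are hypothetical, and skipping users with zero views on a day is an assumption, since CTR is undefined for them.

```python
from collections import defaultdict


def group_daily_ctr(events):
    """Average per-user daily CTR for each (group, day).

    events: iterable of (group, day, user_id, views, clicks) tuples, where views
    counts recommended TV programs shown and clicks how many of them were clicked
    (hypothetical log format).
    """
    per_user = defaultdict(lambda: [0, 0])  # (group, day, user) -> [views, clicks]
    for group, day, user, views, clicks in events:
        acc = per_user[(group, day, user)]
        acc[0] += views
        acc[1] += clicks
    ctrs = defaultdict(list)  # (group, day) -> list of per-user CTRs
    for (group, day, _user), (views, clicks) in per_user.items():
        if views > 0:  # CTR is undefined for users who saw no recommendations
            ctrs[(group, day)].append(clicks / views)
    return {key: sum(vals) / len(vals) for key, vals in ctrs.items()}
```

Averaging per-user CTRs gives each user the same weight regardless of activity; pooling all clicks and views of a group would instead weight heavy users more, so the two choices can yield different group values.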



The CTR of the recommended videos is computed from the clicks on the recommended videos each time the user visits the YOUTV website. It directly measures the quality of the recommendations by how many of them are clicked on (and thus presumably liked) by the user. Figure 8 shows the CTR of the recommended videos for the control and the test group over the 30 days. The values are scaled so that the CTR of the control group on the first day is 1. As shown in the figure, the CTR of the test group is clearly higher than that of the control group. This demonstrates that the proposed Markov Chain-based TPM considerably improved the quality of the video recommendations. On average, the Markov Chain-based TPM improves the CTR over the pre-existing IBCF method by 93.61% in this real-world setting. We tested the difference of CTR means between the Markov Chain-based TPM and IBCF and found it statistically significant with a one-sided t-test at the 0.05 level. This is strong evidence for the improvement gained by exploiting the sequential interactions of users.
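The scaling used in Figure 8 and a relative-improvement figure such as the reported 93.61% can be sketched as below. The text does not state exactly how the average improvement was computed; this sketch assumes it is the relative difference between the mean daily CTRs of the two groups. Both function names are hypothetical.

```python
def scale_to_control_day1(control, test):
    """Scale both daily CTR series so that the control group's first day equals 1."""
    base = control[0]
    return [c / base for c in control], [t / base for t in test]


def relative_improvement(control, test):
    """Percentage improvement of the mean test CTR over the mean control CTR
    (assumed definition; other averaging schemes are possible)."""
    mean_c = sum(control) / len(control)
    mean_t = sum(test) / len(test)
    return 100.0 * (mean_t - mean_c) / mean_c
```

Because both series are divided by the same constant, the scaling in Figure 8 hides absolute CTR values but leaves the relative improvement unchanged.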




Figure 8. Online evaluation: CTR of the recommended videos for the control and the test group over 30 days.

7. Discussion

In this section, we discuss the challenges of evaluating recommendations with an offline and an online evaluation protocol. We used the offline evaluation to select the two best recommendation algorithms, along with their parameters, that would most likely generate good recommendations in a live experiment. In contrast to what is reported in [5], in our case we verified that our method outperforms IBCF in both the online and the offline experiments. Thus, based on our findings, the offline experiments can reflect the relative performance of these two techniques. Of course, few researchers have access to the log server of a real-life company. To address this, NewsReel [15] enabled an offline evaluation scenario similar to the online one by representing the data as a stream, which allows researchers to “replay” it offline. In other words, researchers can emulate the server used in online scenarios and carry out A/B tests for the news recommendation task.




In our live experiments, we used the CTR metric, whereas in our offline evaluation we used the precision metric. The two metrics are generally considered similar, but there are differences. The main difference is that precision is computed for a specific number of recommended items, whereas CTR has no such parameter and instead uses the number of video views as its denominator. Consequently, precision tends to drop as the number N of top-N recommended items increases, whereas CTR shows no such trend; the absolute values of the two metrics therefore differ, but they should follow the same relative trends. CTR is used in industry because it is related to the revenue generated by online advertising (i.e., pay per click over views), and it provides a way to justify the performance of a recommender system through its generated revenue or other engagement objectives, such as user clicks, likes, and positive ratings. For example, to measure user engagement with a business's post on social media (e.g., LinkedIn, Facebook), one simply computes the ratio of the sum of comments, clicks, and likes on the post to its number of impressions.
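The contrast between the metrics can be made concrete with a small sketch (hypothetical function names). For next-item prediction with a single relevant item, precision@N equals $1/N$ once that item is inside the top-N list and 0 otherwise, so it shrinks as N grows beyond the hit's rank; CTR and the social-media engagement rate have no such cut-off.

```python
def precision_at_n(ranked, next_item, n):
    """Next-item precision@N: hits divided by the N recommended items."""
    return (1 if next_item in ranked[:n] else 0) / n


def ctr(clicks, views):
    """Click-through rate: clicks divided by views; it has no top-N parameter."""
    return clicks / views if views else 0.0


def engagement_rate(comments, clicks, likes, impressions):
    """Engagement of a social-media post: (comments + clicks + likes) / impressions."""
    return (comments + clicks + likes) / impressions if impressions else 0.0
```

With `ranked = ["a", "b", "c", "d", "e"]` and the next item `"b"`, precision@1 is 0, precision@2 is 0.5, and precision@5 is 0.2. This also shows why the decrease with N is a tendency rather than a strict rule: precision rises from 0 to 0.5 when N first reaches the rank of the hit.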




One of the important findings of this work is that the Markov Chain-based TPM, which exploits information from sessions, proves very effective at capturing the short-term preferences of users, whereas IBCF is mainly adequate for capturing long-term user preferences. The fact that in our case study the items have a very short life span (e.g., news and TV programs) makes the Markov Chain-based TPM more suitable for capturing these short-term dynamics. Of course, for cases where items have a longer life span, such as movies or books, capturing long-term and short-term dynamics together may be more effective. As discussed in Section 3, the intra-session Markov Chain-based item similarity reveals the short-term preferences of the user. From this standpoint, inter-session item similarity could probably help find item similarities when sessions have very few item interactions (i.e., a low average number of item interactions per session), and could capture long-term user preferences. In a future extended version of this paper, we plan to also consider inter-session item similarity.




Doing online experiments on a real-world system with real users in real time is extremely complicated and tricky to configure and interpret. One could argue that the offline data set does not record what else the user saw on the screen when deciding on the next item. For example, if an existing system was showing recommendations on the screen, it could have biased the data based on “what” was recommended and “where” on the user's screen it was shown (e.g., center bias). Moreover, one could argue that in the online experiments there could have been confounding factors, such as latency differences between the recommendation algorithms or pre-existing differences in engagement between the two compared user groups, that could have affected the outcome. We note that the interface used to present the recommended items is shown in Figure 1; it depicts the items that are similar to the target item the user interacts with.




Another challenge for the offline evaluation of recommender systems is identifying the size of the future horizon for our predictions and the time window size, i.e., how far back into the past we should go to exploit information for training our models. In this work, we have seen some evidence that these thresholds are very sensitive to the items' life span and to how often users re-appear in the system. However, further investigation is needed to establish this correlation. In any case, these factors should be taken into consideration to further optimise recommendation accuracy.



Our method is content-agnostic and can be applied to different recommendation domains, e.g., news recommendations in social media or even medicine recommendations for patients based on their Electronic Health Records (EHRs). Please note that for each domain, our method only needs to process the user-item interactions sequentially. For example, in healthcare we can process the therapies (sessions) of patients (users) and consider the medicines (items) prescribed in each therapy.



8. Conclusion

In this paper, we proposed a Markov Chain-based TPM method to reveal the short-term intentions of individuals. We experimentally evaluated our method and compared it against baselines and state-of-the-art algorithms in an offline and an online evaluation setting. We showed the superiority of our method over its competitors, specifically for the case where the items' life span is very short (e.g., news and TV programs). As future work, we would like to test how users perceive different explanation styles (e.g., item-based, user-based, session-based) when they receive an explanation along with a recommendation for reasons of transparency and accountability. Moreover, we plan to extend our algorithm to consider both intra- and inter-session item similarities, to deal with the problem of extreme data sparsity in user sessions.



References

[1] Shuo Chen, Joshua L. Moore, Douglas Turnbull, and Thorsten Joachims. 2012. Playlist prediction via metric embedding. In The 18th ACM International Conference on Knowledge Discovery and Data Mining, KDD’12, Beijing, China. ACM, 714–722.

[2] Abhinandan S Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. 2007. Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th international conference on World Wide Web. ACM, 271–280.

[3] Cagdas Esiyok, Benjamin Kille, Brijnesh Johannes Jain, Frank Hopfgartner, and Sahin Albayrak. 2014. Users’ reading habits in online news portals. In Fifth Information Interaction in Context Symposium, IIiX ’14, Regensburg, Germany. ACM, 263–266.

[4] Florent Garcin, Christos Dimitrakakis, and Boi Faltings. 2013. Personalized news recommendation with context trees. In Proceedings of the 7th ACM conference on Recommender systems, RecSys’13, Hong Kong, China. ACM, 105–112.

[5] Florent Garcin, Boi Faltings, Olivier Donatsch, Ayar Alazzawi, Christophe Bruttin, and Amr Huber. 2014. Offline and online evaluation of news recommender systems at swissinfo.ch. In Proceedings of the 8th ACM Conference on Recommender systems. ACM, 169–176.

[6] Carlos A. Gomez-Uribe and Neil Hunt. 2015. The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Trans. Manage. Inf. Syst. 6, 4, Article 13 (Dec. 2015), 19 pages.

[7] Qi He, Daxin Jiang, Zhen Liao, Steven C. H. Hoi, Kuiyu Chang, Ee-Peng Lim, and Hang Li. 2009. Web Query Recommendation via Sequential Query Prediction. In Proceedings of the 25th International Conference on Data Engineering, ICDE’09, Shanghai, China. IEEE, 1443–1454.

[8] Balázs Hidasi and Alexandros Karatzoglou. 2017. Recurrent Neural Networks with Top-k Gains for Session-based Recommendations. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (2017).

[9] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based Recommendations with Recurrent Neural Networks. In Proceedings of the International Conference on Learning Representations (ICLR 2016), Vol. abs/1511.06939. arXiv:1511.06939

[10] Balázs Hidasi, Massimo Quadrana, Alexandros Karatzoglou, and Domonkos Tikk. 2016. Parallel Recurrent Neural Network Architectures for Feature-rich Session-based Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys ’16). ACM, New York, NY, USA, 241–248.

[11] Dietmar Jannach and Michael Jugovac. 2019. Measuring the business value of recommender systems. ACM Transactions on Management Information Systems (TMIS) 10, 4 (2019), 1–23.

[12] Dietmar Jannach, Lukas Lerche, and Michael Jugovac. 2015. Adaptation and Evaluation of Recommendations for Short-term Shopping Goals. In Proceedings of the Ninth ACM Conference on Recommender Systems (RecSys ’15). ACM, New York, NY, USA.

[13] D. Jannach and M. Ludewig. 2017. When Recurrent Neural Networks Meet the Neighborhood for Session-Based Recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys ’17). ACM, New York, NY, USA, 306–310.

[14] Dietmar Jannach and Malte Ludewig. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems. 306–310.

[15] Benjamin Kille, Andreas Lommatzsch, Frank Hopfgartner, Martha Larson, and Arjen P de Vries. 2017. A stream-based resource for multi-dimensional evaluation of recommender algorithms. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1257–1260.

[16] Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 1419–1428.

[17] Lei Li, Li Zheng, Fan Yang, and Tao Li. 2014. Modeling and broadening temporal user interest in personalized news recommendation. Expert Systems with Applications 41, 7 (2014), 3168–3177.

[18] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. 2010. Personalized news recommendation based on click behavior. In Proceedings of the 15th international conference on Intelligent user interfaces. ACM, 31–40.

[19] Qiao Liu, Yifu Zeng, Refuoe Mokhosi, and Haibin Zhang. 2018. STAMP: short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1831–1839.

[20] Malte Ludewig and Dietmar Jannach. 2018. Evaluation of session-based recommendation algorithms. User Modeling and User-Adapted Interaction 28, 4-5 (2018), 331–390.

[21] Brian McFee and Gert R. G. Lanckriet. 2011. The Natural Language of Playlists. In Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR’11, Miami, Florida, USA. ISMIR, 537–542.

[22] Omar Moling, Linas Baltrunas, and Francesco Ricci. 2012. Optimal radio channel recommendations with explicit and implicit feedback. In Sixth ACM Conference on Recommender Systems, RecSys’12, Dublin, Ireland. ACM, 75–82.

[23] Ioannis K. Paparrizos, Berkant Barla Cambazoglu, and Aristides Gionis. 2011. Machine learned job recommendation. In Proceedings of the 2011 ACM Conference on Recommender Systems, RecSys’11, Chicago, IL, USA. ACM, 325–328.

[24] Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. 2018. Sequence-aware recommender systems. ACM Computing Surveys (CSUR) 51, 4 (2018), 1–36.

[25] Massimo Quadrana, Alexandros Karatzoglou, Balázs Hidasi, and Paolo Cremonesi. 2017. Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys ’17). ACM, New York, NY, USA, 130–137.

[26] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, WWW’10, Raleigh, North Carolina, USA. ACM, 811–820.

[27] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. 1994. GroupLens: An open architecture for collaborative filtering on netnews. In Proceedings of the Computer Supported Collaborative Work Conference. 175–186.

[28] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web (WWW ’01). ACM, New York, NY, USA, 285–295.

[29] Guy Shani, David Heckerman, and Ronen I. Brafman. 2005. An MDP-Based Recommender System. In Journal of Machine Learning Research (2005), Vol. 6. JMLR, 1265–1295.

[30] Elena Smirnova and Flavian Vasile. 2017. Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks. In Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems (DLRS 2017). ACM, New York, NY, USA, 2–9.

[31] Yong Kiam Tan, Xinxing Xu, and Yong Liu. 2016. Improved Recurrent Neural Networks for Session-based Recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016). ACM, New York, NY, USA, 17–22.

[32] J. Vinagre, A.M. Jorge, and J. Gama. 2014. Evaluation of recommender systems in streaming environments. In Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD 2014), held in conjunction with RecSys 2014. Silicon Valley, USA.

[33] Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M Jose, and Xiangnan He. 2019. A simple convolutional generative network for next item recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 582–590.

[34] Andrew Zimdars, David Maxwell Chickering, and Christopher Meek. 2001. Using Temporal Data for Making Recommendations. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence (UAI 2001). San Francisco, CA, USA, 580–588.
