
Jul 26, 2013

Learning to rank and recommender systems

A recommendation problem is essentially a ranking problem. Among a list of movies, which should rank higher in order to be recommended? Among the job candidates, which should LinkedIn display to recruiters? The task of recommendation can be viewed as creating a ranked list.

The classical approach to recommender systems is collaborative filtering, which uses similar users or similar items to make recommendations. Collaborative filtering was popularized by the Netflix Prize contest, held from 2006 to 2009, in which many teams around the world competed to build movie recommendations from movie ratings provided by Netflix.
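As a rough illustration of the "similar users" idea, here is a minimal sketch of user-based collaborative filtering in plain Python. The ratings, the user and movie names, and the choice of cosine similarity are all assumptions made for the example; this is not the method used by any Netflix contest team.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {movie: rating}.
ratings = {
    "ann": {"Alien": 5, "Heat": 4, "Ronin": 5, "Up": 1},
    "bob": {"Alien": 4, "Heat": 5},
    "cat": {"Alien": 1, "Heat": 2, "Up": 5, "Ronin": 2},
}

def cosine(u, v):
    """Cosine similarity between two users' rating vectors (unrated = 0)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    """Rank movies the user has not rated by the similarity-weighted
    average rating given by the other users."""
    totals, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        for movie, rating in their_ratings.items():
            if movie in ratings[user]:
                continue  # already rated, nothing to recommend
            totals[movie] = totals.get(movie, 0.0) + sim * rating
            weights[movie] = weights.get(movie, 0.0) + sim
    predicted = {m: totals[m] / weights[m] for m in totals if weights[m] > 0}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)

for movie, score in recommend("bob", ratings):
    print(movie, round(score, 2))
```

Here bob's tastes are closer to ann's than to cat's, so ann's ratings carry more weight in the predicted scores for the movies bob has not rated yet.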

Collaborative filtering has had real success, but it has limitations. The fundamental problem is how little information the user-item table captures. Each cell of this table holds either a rating or an aggregated activity score (such as purchases) for one item from one user. Richer signals such as browsing time, clicks, or external events are hard to capture in this table format.

A ranking approach to recommendation is much more flexible. It can incorporate all of this information as separate variables (features), so each signal enters the model explicitly. In addition, we can combine ranking with machine learning, allowing the ranking function to evolve over time based on data.

In the traditional approach to ranking, the ranking score is generated by fixed rules. For example, a page’s score depends on the links pointing to that page, its text content, and its relevance to the search keywords. Other information, such as the visitor’s location or the time of day, could also be part of the ranking formula. In this formula, the variables and their weights are predefined.
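A fixed-rule formula of this kind can be sketched as a weighted sum. The feature names, the weights, and the assumption that every feature is already scaled to the range 0 to 1 are all hypothetical, chosen only to show that nothing here is learned from data.

```python
# Hand-picked weights, fixed in advance and never updated from data.
# The values are illustrative only.
WEIGHTS = {"inbound_links": 0.5, "text_match": 0.3, "keyword_relevance": 0.2}

def rule_based_score(features):
    """Weighted sum of features, each assumed to be pre-scaled to [0, 1]."""
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)

# Hypothetical pages with hypothetical feature values.
pages = {
    "page_a": {"inbound_links": 0.9, "text_match": 0.2, "keyword_relevance": 0.4},
    "page_b": {"inbound_links": 0.3, "text_match": 0.8, "keyword_relevance": 0.9},
}

# Rank pages from highest to lowest score.
ranked = sorted(pages, key=lambda p: rule_based_score(pages[p]), reverse=True)
print(ranked)
```

Changing the ranking behavior here means editing `WEIGHTS` by hand, which is the limitation that learning the weights from user data is meant to address.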

The idea of learning to rank is to use actual user data to create the ranking function. The machine learning procedure for ranking has the following steps:
  1. Gather training data based on click information.
  2. Gather all attributes of each data point, such as item information, user information, and time of day.
  3. Create a training dataset with two classes: positive (click) and negative (no click).
  4. Apply a supervised machine learning algorithm (such as logistic regression) to the training data.
  5. The learned model is our ranking model.
  6. For any new data point, the ranking model assigns a probability between 0 and 1 that the item will be clicked (selected). We call this probability our “ranking score”.
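The steps above can be sketched end to end with a small logistic regression written in plain Python. The impressions, the three features, the candidate items, and the training settings (learning rate, number of epochs) are all assumptions made for illustration; in practice a library implementation such as scikit-learn's `LogisticRegression` would likely be used instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Step 4: fit logistic regression by batch gradient descent on log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def ranking_score(w, b, x):
    """Step 6: predicted click probability, used as the ranking score."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Steps 1-3: hypothetical impressions from a click log.
# Features: [item popularity, user-item topic match, shown in the evening]
X = [
    [0.9, 0.8, 1], [0.7, 0.9, 0], [0.8, 0.6, 1], [0.6, 0.7, 1],  # clicked
    [0.2, 0.1, 0], [0.3, 0.2, 1], [0.1, 0.4, 0], [0.4, 0.1, 0],  # not clicked
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

# Steps 4-5: the learned weights and bias are the ranking model.
w, b = train_logistic(X, y)

# Step 6: score new candidates and sort them by ranking score.
candidates = {
    "item_a": [0.85, 0.9, 1],
    "item_b": [0.2, 0.3, 0],
    "item_c": [0.5, 0.5, 0],
}
ranked = sorted(candidates, key=lambda k: ranking_score(w, b, candidates[k]),
                reverse=True)
print(ranked)
```

Because the weights come from the training data, retraining on newer click logs changes the ranking without anyone editing the formula by hand.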

The training data are constructed from user visit logs containing user clicks, or they may be prepared manually by human raters.

The learning to rank approach to recommendation has been adopted by Netflix and LinkedIn. It is fast and can be retrained repeatedly as new data arrives. It is behind the movie recommendations and connection recommendations we enjoy on these sites.

4 comments:

  1. This comment has been removed by the author.

  2. This comment has been removed by the author.

  3. Hi,

    I really liked this article. Could you please provide more related material on "Learn to rank". Actually I am working on a similar problem, it would be really helpful.

  4. So what call to action can you leave developers looking to augment or implement future recommendation systems? Can you provide an example of what an improved recommendation engine would look like? For example, how would you go about soliciting reviews for how good this article was?
