Jul 26, 2013

Learning to rank and recommender systems

A recommendation problem is essentially a ranking problem: Among a list of movies, which should rank higher in order to be recommended? Among the job candidates, who should LinkedIn display to the recruiters?  The task of recommendation can be viewed as creating a ranked list.

Classical approach to recommender systems is based on collaborative filtering. This is an approach using similar users or similar items to make recommendation.  Collaborative filtering is popularized by the Netflix contest from 2006 to 2009, when many teams around the world participated to create movie recommendation based on movie ratings provided by Netflix. 

While collaborative filtering has achieved certain success, it has its limitations. The fundamental problem is the limited information captured in user-item table. Each cell of this table is either a rating or some aggregated activity score (such as purchase) on an item (from a specific user). Complex information such as user browsing time, clicks, or external events is hard to capture in such table format.

A ranking approach to recommendation is much more flexible. It can incorporate all the information as different variables (features). Thus it is more explicit. In addition, we can combine ranking with machine learning, allowing ranking function evolve over time based on data. 

In traditional approach to ranking, a ranking score is generated by some fixed rules. For example, a page’s score depends on links pointing to that page, its text content, and its relevance to search keywords. Other information such as visitor’s location, or time of day could all be part of the ranking formula. In this formula, variables and their weights are pre-defined.

The idea of Learning to Rank is using actual user data to create a ranking function. The machine learning procedure for ranking has the following steps:
  1. Gather training data based on click information.
  2. Gather all attributes about each data point, such as item information, user information, time of day etc. 
  3. Create a training dataset that has 2 classes: positive (Click) and negative (no click).
  4. Apply a supervised machine learning algorithm (such as logistic regression) to the training data
  5. The learned model is our ranking model. 
  6. For any new data point, the ranking model assigns a probability score between 0 and 1 on whether the item will be clicked (selected). We call this probability score our “ranking score”. 

The training data are constructed from user visit logs containing user clicks or it may be prepared manually by human raters.

The learning to rank approach to recommendation has been adopted by Netflix and LinkedIn today. It is fast and can be trained repeatedly. It is behind those wonderful movie recommendations and connection recommendations we enjoy on these sites. 

Jul 9, 2013

Text Mining: Name Entity Detection (2)

In the machine learning approach for detecting named entities, we first create labeled data, which are sentences with each word tagged. The tags correspond to the entities of interest. For example,
May      Smith   was  born   in   May.
        Person  Person                        Date
Since an entity may have several words, we can use finer tags in IOB (in-out-begin) format, which indicates whether the word is the beginning (B) or inside (I) of a phrase. We will have tags for B-Person, I-Person, B-Date, I-Date and O (other).  With such tagging, the above example becomes
May          Smith      was  born  in    May.
B-Person  I-Person   O     O    O   B-Date

Our training data have every word tagged. Our goal is learning about this mapping and apply it a new sentence. In other words, we want to find a mapping from a word (given its context) to an entity label.

Each word can be represented as a set of features. A machine learning model maps input features to an output (entity class). Such features could include:
Word identity (the word itself)
Word position from the beginning.
Word position from the end.
the word before
the word after
Is the word capitalized?

Our training data would look like the following, where each row is a data point.
Word identity
position from the beginning
position to the end
word before
word after
Is capitalized?

Once we create the training data like the above table, we can then train a standard machine learning method such as SVM, decision tree or logistic regression to generate a model (which contains machine generated rules). We can then apply this model to any new sentence and detect entity types related to "Person" and "Date".

Note that we can only detect entity types contained in our training data. While this may seem a limitation, it allows us to discover new instances or values associated with existing entity types.

The approach we have discussed so far is a supervised learning approach, which depends heavily on human-labeled data. When manual labels are hard to come by, we can find ways to create training data by machine. Such approach is called semi-supervised learning, which holds a lot of promise when data get large.

Jul 8, 2013

Text Mining: Named Entity Detection

An interesting task of text mining is detecting entities in the text. Such entities could be a person, a company, a product, or a location. Since an entity is associated with a special name, it is also called Named Entity. For example, the following text contains 3 named entities:
       Apple has hired Paul Deneve as vice president, reporting to CEO Tim Cook.
The first term “Apple” indicates a company, and the second and third are persons.

Named entity detection (NER) is an important component in  social media analysis. It helps us to understand user sentiment on specific products. NER is also important for product search for E-commerce companies. It helps us to understand user search query related to certain products.

To map each name to an entity, one solution is using a dictionary of special names. Unfortunately, this approach has two serious problems. The first problem is that our dictionary is not complete. New companies are created and new products are sold every day. It is hard to keep track all the new names. The second problem is the ambiguity of associating a name to an entity. The following example illustrates this:
As Washington politicians argue about the budget reform, it is a good time to look back at George Washington’s time.
In this text, the first mention of “Washington” refers to a city, while the second mention refers to a person. The distinction of these two entities comes from their context.

To resolve ambiguity in entity mapping,  we can create certain rules to utilize the context. For example, we can create the following rules:
  1. When ‘Washington’ is followed by ‘politician’, then it refers to a city.
  2. When ‘Washington’ is preceded by ‘in’, then it refers to a city.
  3. When ‘Washington’ is preceded by ‘George’, then it refers to a person.
But such rules could be too many. For example, each of the following phrases would generate a different rule: “Washington mentality”, “Washington atmosphere”, “Washington debate” as well as “Washington biography” and “Washington example”. The richness of natural language makes the number of rules exploding and still susceptible to exceptions.

Instead of manually creating rules, we can apply machine learning. The advantage of machine learning is that it creates patterns automatically from examples. No rule needs to be manually written by humans. The machine learning algorithm takes a set of training examples, and chunk out its own model (that is comparable to rules). If we get new training data, we can re-train the machine learning algorithm and generate a new model quickly.

How does the machine learning approach work? I will discuss it in the next post.