with existing query log data.
There are two ways to improve a classifier: one is to select the most appropriate machine learning algorithm, and the other is to improve the features the classifier uses.
People have experimented with different machine learning algorithms for click prediction, including ranking SVM, binary SVM, probit regression, decision trees, and gradient boosted decision trees. We most often use logistic regression. This algorithm is fast and performs as well as SVM. In addition, it is less prone to overfitting.
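To make the logistic regression option concrete, here is a minimal sketch in plain Python, trained by stochastic gradient descent on log loss. The toy data, the feature choice (historical CTR and a scaled price), the function names, and the hyperparameters are all assumptions for illustration, not the setup described above. The L2 penalty is one common way to limit overfitting and is likewise an assumption.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(rows, labels, lr=0.1, epochs=200, l2=0.01, seed=0):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            error = p - y  # gradient of log loss with respect to the linear score
            for j in range(n_features):
                # The l2 term shrinks weights toward zero (assumed regularizer).
                weights[j] -= lr * (error * x[j] + l2 * weights[j])
            bias -= lr * error
    return weights, bias


def predict_click_probability(weights, bias, x):
    """Return the modeled probability of a click for feature vector x."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical toy data: [historical CTR, price scaled to 0..1]; label 1 = clicked.
rows = [[0.30, 0.2], [0.25, 0.5], [0.20, 0.4], [0.02, 0.3], [0.01, 0.5], [0.03, 0.4]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(rows, labels)
p_high = predict_click_probability(weights, bias, [0.28, 0.4])
p_low = predict_click_probability(weights, bias, [0.02, 0.4])
print(p_high, p_low)
```

In this toy setting the item with the higher historical CTR should receive the higher predicted click probability. A production system would more likely rely on an existing library implementation than on a hand-written loop.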
The second way to improve a classifier's performance is to use good features. For this purpose, people have tried various features for click prediction models. The most important feature is the historical click-through rate (CTR), when such data exist. However, on many websites, new items or documents are introduced daily, and these have no click history. In this case, many other features can be used.
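One way to handle items without click history is to pair the CTR feature with an indicator that says whether history exists, so the model can lean on its other features for new items. The sketch below assumes per-item click and impression counts are available. The function name, the feature names, and the 0.0 placeholder are illustrative choices, not something the text prescribes.

```python
def ctr_features(clicks, impressions):
    """Return historical CTR plus a flag saying whether click history exists."""
    if impressions > 0:
        return {"hist_ctr": clicks / impressions, "has_history": 1.0}
    # New item: CTR is undefined, so emit a placeholder value and clear the
    # flag; the model then has to rely on the remaining features.
    return {"hist_ctr": 0.0, "has_history": 0.0}


established_item = ctr_features(clicks=25, impressions=100)
new_item = ctr_features(clicks=0, impressions=0)
print(established_item)
print(new_item)
```

Keeping the flag separate lets the model distinguish an item that was shown often and never clicked from an item that has never been shown at all.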
Two major types of features are item features and user features. Item features describe a document or a product; for a product, examples are price and posting date. User features include both demographic information and behavioral information. Behavioral information covers user browsing behavior, such as dwell time and query reformulation. In addition, we can use text features from the user's query.
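The feature groups above could be assembled into a single feature dictionary along the following lines. This is only a sketch: the record layouts, the field names (`price`, `posted`, `age_group`, `dwell_time_sec`, `query_reformulations`), and the bag-of-words treatment of the query are all hypothetical.

```python
from datetime import date


def build_features(item, user, query, today):
    """Combine item, user, and query text features into one sparse dict."""
    features = {}

    # Item features: price and age of the posting in days.
    features["item_price"] = float(item["price"])
    features["item_age_days"] = float((today - item["posted"]).days)

    # User features, demographic: one-hot encode a categorical attribute.
    features["user_age_group=" + user["age_group"]] = 1.0

    # User features, behavioral: dwell time and number of query reformulations.
    features["dwell_time_sec"] = float(user["dwell_time_sec"])
    features["query_reformulations"] = float(user["query_reformulations"])

    # Text features: one binary indicator per lowercased query token.
    for token in query.lower().split():
        features["query_token=" + token] = 1.0

    return features


example = build_features(
    item={"price": 19.99, "posted": date(2024, 1, 3)},
    user={"age_group": "25-34", "dwell_time_sec": 42, "query_reformulations": 2},
    query="Red running shoes",
    today=date(2024, 1, 10),
)
print(example)
```

A dictionary keyed by feature name keeps the representation sparse, which suits query tokens and categorical attributes. It would need to be mapped to a fixed-length vector before being passed to a classifier.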