
Thursday, August 13, 2015

Active Learning for Ranking through Expected Loss Optimization







Abstract:

Learning to rank arises in many data mining applications, ranging from Web search engines and online advertising to recommendation systems. In learning to rank, the performance of a ranking model is strongly affected by the number of labeled examples in the training set; on the other hand, obtaining labeled examples for training data is very expensive and time-consuming. This creates a great need for active learning approaches that select the most informative examples for ranking; however, the literature still contains very limited work addressing active learning for ranking. In this paper, we propose a general active learning framework, Expected Loss Optimization (ELO), for ranking. The ELO framework is applicable to a wide range of ranking functions. Under this framework, we derive a novel algorithm, Expected Discounted Cumulative Gain (DCG) Loss Optimization (ELO-DCG), to select the most informative examples. We then investigate both query-level and document-level active learning for ranking and propose a two-stage ELO-DCG algorithm that incorporates both query and document selection into active learning. Furthermore, we show that the algorithm can flexibly handle the skewed grade distribution problem through a modification of the loss function. Extensive experiments on real-world Web search data sets have demonstrated the great potential and effectiveness of the proposed framework and algorithms.
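
The paper defines the expected DCG loss formally; as a rough, hedged illustration only, the Java sketch below (class and method names are our own, and the sampling-based approximation is an assumption, not the paper's exact derivation) scores the candidate documents of one query by how much DCG is lost, on average over sampled score vectors standing in for model uncertainty, when ranking by the mean predicted scores. Candidates with the largest expected loss would be the most informative to label.

import java.util.*;

/** Hedged sketch of an ELO-DCG-style informativeness score (names are illustrative, not from the paper). */
public class ExpectedDcgLoss {

    /** Standard DCG over a list of relevance grades, highest-ranked first. */
    static double dcg(double[] grades) {
        double sum = 0.0;
        for (int i = 0; i < grades.length; i++) {
            sum += (Math.pow(2, grades[i]) - 1) / (Math.log(i + 2) / Math.log(2));
        }
        return sum;
    }

    /** DCG obtained when documents are ranked by the given scores but judged by the given grades. */
    static double dcgOfRanking(double[] scores, double[] grades) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));
        double[] reordered = new double[grades.length];
        for (int rank = 0; rank < order.length; rank++) reordered[rank] = grades[order[rank]];
        return dcg(reordered);
    }

    /**
     * Expected DCG loss for one query: the average, over sampled score vectors, of the gap between
     * the best achievable DCG under that sample and the DCG of the ranking induced by the mean
     * scores. A larger value means labeling these documents is expected to be more informative.
     */
    static double expectedDcgLoss(double[] meanScores, List<double[]> sampledScores) {
        double loss = 0.0;
        for (double[] sample : sampledScores) {
            double best = dcgOfRanking(sample, sample);          // rank optimally for this sample
            double actual = dcgOfRanking(meanScores, sample);    // rank by mean scores, judge by sample
            loss += best - actual;
        }
        return loss / sampledScores.size();
    }

    public static void main(String[] args) {
        double[] mean = {0.9, 0.2, 0.6};
        List<double[]> samples = Arrays.asList(
            new double[]{1.0, 0.1, 0.5},
            new double[]{0.4, 0.3, 0.8});   // disagreement between samples drives the loss up
        System.out.println("expected DCG loss = " + expectedDcgLoss(mean, samples));
    }
}

In a real ELO-DCG implementation the sampled score vectors would come from the ranking model's predictive distribution (for example, an ensemble or bootstrap); here they are hard-coded purely to keep the sketch runnable.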











Existing System:

Learning to rank represents an important class of supervised machine learning tasks, whose goal is to automatically construct ranking functions from training data. As with many other supervised machine learning problems, the quality of a ranking function is highly correlated with the amount of labeled data used to train it.

Due to the complexity of many ranking problems, a large number of labeled training examples is usually required to learn a high-quality ranking function. However, in most applications, while it is easy to collect unlabeled samples, labeling them is very expensive and time-consuming.

Proposed System:

First, there is no notion of classification margin in ranking. Hence, many of the margin-based active learning algorithms proposed for classification tasks are not readily applicable to ranking. Furthermore, even straightforward active learning approaches, such as query-by-committee, have not been justified for ranking tasks under a regression framework.

Second, in most supervised learning settings, each data sample can be treated completely independently of the others. In learning to rank, data examples are not independent, though they are conditionally independent given a query. We need to account for this data dependence when selecting data and tailor active learning algorithms to the underlying learning-to-rank schemes.

Third, ranking problems are often associated with very skewed data distributions, for example relevance grade distributions in which high grades are rare; the sketch below illustrates both this point and the query-level dependence above.
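
To make these last two points concrete, here is another hedged Java sketch (the Candidate class, the grouping key, and the frequency-based reweighting are illustrative assumptions, not the paper's exact formulation): documents are grouped by their query before any selection is done, reflecting that they are only conditionally independent given the query, and the DCG-style gain is divided by the empirical grade frequency so that errors on rare high grades in a skewed distribution carry more weight.

import java.util.*;

/** Hedged sketch: query-level grouping plus a grade-reweighted gain for skewed label distributions. */
public class QueryGroupingSketch {

    /** A candidate document identified by its query; fields are illustrative only. */
    static final class Candidate {
        final String queryId;
        final String docId;
        final int grade;
        Candidate(String queryId, String docId, int grade) {
            this.queryId = queryId; this.docId = docId; this.grade = grade;
        }
    }

    /** Documents are only conditionally independent given a query, so selection works per query group. */
    static Map<String, List<Candidate>> groupByQuery(List<Candidate> pool) {
        Map<String, List<Candidate>> groups = new HashMap<>();
        for (Candidate c : pool) {
            groups.computeIfAbsent(c.queryId, k -> new ArrayList<>()).add(c);
        }
        return groups;
    }

    /**
     * One possible modification of the DCG gain for skewed grade distributions:
     * divide the usual 2^grade - 1 gain by the empirical frequency of that grade,
     * so mistakes on rare high grades cost more. This weighting is an assumption for illustration.
     */
    static double reweightedGain(int grade, Map<Integer, Double> gradeFrequency) {
        double freq = gradeFrequency.getOrDefault(grade, 1.0);
        return (Math.pow(2, grade) - 1) / Math.max(freq, 1e-6);
    }

    public static void main(String[] args) {
        List<Candidate> pool = List.of(
            new Candidate("q1", "d1", 0), new Candidate("q1", "d2", 4),
            new Candidate("q2", "d3", 1));
        System.out.println(groupByQuery(pool).keySet());               // two query groups: q1 and q2
        Map<Integer, Double> freq = Map.of(0, 0.7, 1, 0.2, 4, 0.01);   // skewed toward low grades
        System.out.println(reweightedGain(4, freq));                   // rare high grade gets a large gain
    }
}
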

Hardware Requirements:



•       System            : Pentium IV, 2.4 GHz

•       Hard Disk         : 40 GB

•       Floppy Drive      : 1.44 MB

•       Monitor           : 15-inch VGA colour

•       Mouse             : Logitech

•       RAM               : 256 MB




Software Requirements (Java version):


•       Operating system    : Windows XP

•       Front End           : JSP

•       Back End            : SQL Server

Software Requirements (.NET version):


•       Operating system    : Windows XP

•       Front End           : .NET

•       Back End            : SQL Server
