Learning to Rank Stabilisation
Learning to Rank (Letor) is the application of Machine Learning (ML) to Information Retrieval (IR), in particular to the problem of ranking. Each document is represented by a vector of features, which aim to distinguish between levels of relevance (in the simple binary case, between relevant and non-relevant documents). In the academic literature, Learning to Rank has been shown to outperform unsupervised ranking models such as TF-IDF and BM25, especially in document retrieval and web-page retrieval.
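To make the feature-vector idea concrete, here is a minimal Python sketch of the simplest kind of learned ranker. It is not the xapian-letor API. A linear model scores each document as the dot product of a weight vector with the document's feature vector, and the documents are then re-ranked by descending score. The Doc class, the three feature values per document and the weights are all hypothetical; in practice the weights would be learned from relevance-labelled training data.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """A document as a ranker sees it: an identifier plus a feature vector."""
    docid: str
    features: list[float]


def linear_score(weights: list[float], doc: Doc) -> float:
    """Score a document as the dot product of the weights and its features."""
    if len(weights) != len(doc.features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, doc.features))


def rerank(docs: list[Doc], weights: list[float]) -> list[Doc]:
    """Sort documents by descending score; sorted() keeps ties in input order."""
    return sorted(docs, key=lambda d: linear_score(weights, d), reverse=True)


# Hypothetical feature values (e.g. title match, body match, BM25 score)
# and hypothetical learned weights.
docs = [
    Doc("d1", [0.2, 1.5, 3.0]),
    Doc("d2", [1.0, 0.5, 4.0]),
    Doc("d3", [0.0, 0.1, 1.0]),
]
weights = [2.0, 0.5, 0.3]
print([d.docid for d in rerank(docs, weights)])  # ['d2', 'd1', 'd3']
```

A linear scorer is used here only because it is the shortest instance of the idea. Learning-to-rank methods differ mainly in how they learn the scoring function from training data.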
Xapian has an experimental Letor API, developed over several previous GSoC projects, only some of whose work has been merged. This project consolidates the work done so far into a stable, tested core of functionality that can be included in a future Xapian release. The work consists of the following steps:
- Creating a stable user-facing API
- Integrating work done on various branches
- Integrating a test-suite and writing automated tests for the user-facing API
- Writing some practical code examples and updating the documentation
Contributions
Through this project, xapian-letor reaches a usable state.
Merged
The following components have been merged into Xapian master:
- Updates to the user-facing API
  - Created the Feature class and its sub-classes, each of which handles calculating a single letor feature.
  - Created the FeatureList class, which creates FeatureVector objects by calling on Feature objects for feature values.
- Refactoring and clean-up of existing methods, and removal of unused methods and classes
  - Removed the RankList, Features and FeatureManager classes from the API.
  - Removed dead methods from the API.
- Bug fixes
- Integrating the automated test-suite
Link to commits: https://github.com/xapian/xapian/commits/master?author=ayshtmr
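The split between Feature and FeatureList described above can be sketched roughly as follows. This is a hypothetical Python illustration of the design, not the real C++ classes: the method names value and feature_vector, and the two example features, are invented here, and real letor features are computed from index statistics rather than raw term lists.

```python
from abc import ABC, abstractmethod


class Feature(ABC):
    """Computes a single letor feature value for a (query, document) pair."""

    @abstractmethod
    def value(self, query_terms: list[str], doc_terms: list[str]) -> float:
        ...


class QueryTermCount(Feature):
    """Hypothetical feature: total occurrences of query terms in the document."""

    def value(self, query_terms, doc_terms):
        return float(sum(doc_terms.count(t) for t in query_terms))


class NormalisedQueryTermCount(Feature):
    """Hypothetical feature: query-term occurrences divided by document length."""

    def value(self, query_terms, doc_terms):
        if not doc_terms:
            return 0.0
        return QueryTermCount().value(query_terms, doc_terms) / len(doc_terms)


class FeatureList:
    """Builds a feature vector by asking each Feature object for its value."""

    def __init__(self, features: list[Feature]):
        self.features = features

    def feature_vector(self, query_terms, doc_terms) -> list[float]:
        return [f.value(query_terms, doc_terms) for f in self.features]


flist = FeatureList([QueryTermCount(), NormalisedQueryTermCount()])
print(flist.feature_vector(["xapian", "rank"],
                           ["xapian", "is", "a", "xapian", "library"]))  # [2.0, 0.4]
```

Keeping each feature in its own subclass means a new feature can be added without touching the code that assembles the vectors.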
Merged PRs (previously open)
- PR#123 Integrate ListNET and NDCGScore
  - This PR integrates ListNETRanker and NDCGScore.
  - This PR also updates the "Usage Guide" and adds practical code examples showing how to use the core functionality of xapian-letor.
- PR#124 Disable backend build options and update test harness
  - This PR sets up a new test harness for xapian-letor, which uses the default database backend enabled by xapian-core.
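For a rough idea of what ListNETRanker and NDCGScore compute, here is a non-authoritative Python sketch of one common formulation of each: NDCG@k with gain (2^rel - 1) and discount log2(position + 1), and ListNET's top-one cross-entropy loss between the softmax of the relevance labels and the softmax of the model scores. The function names are hypothetical and are not the xapian-letor API; the C++ implementations may differ in details such as the gain function or cut-off handling.

```python
import math


def dcg(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain of the top-k labels, in ranked order."""
    # Position i (0-based) is rank i + 1, so its discount is log2(i + 2).
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg(relevances: list[float], k: int) -> float:
    """DCG normalised by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def softmax(xs: list[float]) -> list[float]:
    """Top-one probabilities; subtracting the max avoids overflow in exp."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(labels: list[float], scores: list[float]) -> float:
    """Cross entropy between the top-one distributions of labels and scores."""
    p_true = softmax(labels)
    p_model = softmax(scores)
    return -sum(p * math.log(q) for p, q in zip(p_true, p_model))


# Relevance labels of four documents, in the order a model ranked them.
print(round(ndcg([3, 2, 0, 1], k=4), 4))  # 0.9926
print(listnet_loss([3, 2, 0, 1], [2.5, 1.0, 0.2, 0.1]))
```

During training, a list-wise ranker such as ListNET adjusts its scoring function to reduce this loss across many queries, while NDCG is typically used to evaluate the resulting rankings.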
Queued PRs (now merged)
- Exception handling for xapian-letor (merged as part of PR#123)
  - This PR integrates exception handling for xapian-letor.
Future work
- Change the way Features request and get various statistics Ticket#733
- Returning MSet instead of sorted docids after re-ranking Ticket#734
- Writing automated tests for the API
- Putting Letor class methods directly under Ranker (merged as part of PR#125)
- Storing models as database metadata instead of a file (merged as part of PR#125)
- Integrating remaining rankers and scorers
  - SVMRanker (merged as part of PR#127)
  - ListMLE
  - AdaRank
  - ERR
- Testing for performance and scale with INEX2009 or similar data-set
- Revising existing documentation
- Python bindings