
Learning to Rank Stabilisation

Learning to Rank (Letor) is the application of Machine Learning (ML) to Information Retrieval (IR), in particular to the problem of ranking. Each document is represented by a vector of features. These features are chosen to distinguish levels of relevance between documents (in the simple binary case, between relevant and non-relevant documents). In the academic literature, Learning-to-Rank has been shown to perform better than unsupervised ranking models such as TF-IDF or BM25, especially in document retrieval and web-page retrieval.
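
To make the idea of ranking by feature vectors concrete, here is a minimal Python sketch. It is not xapian-letor code: the `Doc` class, the `score` and `rerank` helpers, the feature values and the weights are all hypothetical, and a linear model is assumed only because it is the simplest scoring function to show.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Doc:
    docid: int
    features: List[float]  # one value per feature (hypothetical values)


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear scoring function: dot product of learned weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def rerank(weights: Sequence[float], docs: Sequence[Doc]) -> List[int]:
    """Return docids sorted by descending model score."""
    ranked = sorted(docs, key=lambda d: score(weights, d.features), reverse=True)
    return [d.docid for d in ranked]


# Hypothetical feature values and weights, for illustration only.
docs = [
    Doc(docid=1, features=[0.2, 0.1]),
    Doc(docid=2, features=[0.9, 0.4]),
    Doc(docid=3, features=[0.5, 0.8]),
]
weights = [1.0, 0.5]
ranking = rerank(weights, docs)
print(ranking)
```

In a real Letor system the weights would be learned from labelled query/document pairs rather than written by hand.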

Xapian has an experimental Letor API, with work split across a few previous GSoC projects, only some of which has been merged. This project consolidates the work done so far into a stable, tested core of functionality that can be included in a future Xapian release. The work consists of the following steps:

  1. Creating a stable user-facing API
  2. Integrating work done on various branches
  3. Integrating a test-suite and writing automated tests for the user-facing API
  4. Writing some practical code examples and updating the documentation

Contributions

Through this project, xapian-letor attains a usable state.

Merged

The following components have been merged to xapian master:

  • Updates to the user-facing API
    • Created Feature class and sub-classes, each of which handles calculating a single Letor feature.
    • Created FeatureList class that does the work of creating FeatureVector objects by calling on Feature objects for feature values.
  • Refactoring and cleaning up existing methods, and removing unused methods and classes
    • Removed RankList, Features and FeatureManager classes from the API.
    • Removed dead methods from the API.
    • Bug fixes
  • Integrating the automated test-suite

Link to commits: https://github.com/xapian/xapian/commits/master?author=ayshtmr

PRs previously open (now merged)

  • PR#123 Integrate ListNET and NDCGScore
    • This PR integrates ListNETRanker and NDCGScore.
    • This PR also updates the "Usage Guide" and adds practical code examples showing how to use the core functionality of xapian-letor.
  • PR#124 Disable backend build options and update test harness
    • This PR sets up a new test harness for xapian-letor, which uses the default database backend enabled by xapian-core.
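
For readers unfamiliar with the two components named in PR#123, the sketch below shows the textbook quantities they are based on: NDCG as an evaluation score, and the ListNet top-one cross-entropy loss. It is a from-scratch Python illustration with hypothetical function names. The exact gain and discount variants used by NDCGScore and ListNETRanker in xapian-letor are not stated here, so the `(2^rel - 1) / log2(rank + 1)` form is an assumption.

```python
import math
from typing import List, Sequence


def dcg(labels: Sequence[float]) -> float:
    """Discounted cumulative gain, assuming the (2^rel - 1) / log2(rank + 1) form."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels))


def ndcg(labels_in_ranked_order: Sequence[float]) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    return dcg(labels_in_ranked_order) / ideal if ideal > 0 else 0.0


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(labels: Sequence[float], scores: Sequence[float]) -> float:
    """Cross entropy between top-one probabilities of the labels and of the scores."""
    p = softmax(labels)
    q = softmax(scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


# Relevance labels listed in the order a ranker returned the documents.
perfect = ndcg([2, 1, 0])
reversed_order = ndcg([0, 1, 2])
print(perfect, reversed_order)

# The loss is lower when model scores agree with the labels.
print(listnet_loss([2, 1, 0], [2.0, 1.0, 0.0]))
print(listnet_loss([2, 1, 0], [0.0, 1.0, 2.0]))
```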

PRs previously in line (now merged)

  • Exception handling for xapian-letor (merged as part of PR#123)
    • This PR integrates exception handling for xapian-letor.

Future work

  • Change the way Features request and get various statistics (Ticket#733)
  • Returning MSet instead of sorted docids after re-ranking (Ticket#734)
  • Writing automated tests for the API
  • Putting Letor class methods directly under Ranker (Merged as part of PR#125)
  • Storing models as database metadata instead of a file (Merged as part of PR#125)
  • Integrating remaining rankers and scorers
    • SVMRanker (Merged as part of PR#127)
    • ListMLE
    • AdaRank
    • ERR
  • Testing for performance and scale with INEX2009 or similar data-set
  • Revising existing documentation
  • Python bindings
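
Of the scorers listed above as remaining, ERR (Expected Reciprocal Rank) is compact enough to sketch. The Python function below follows the commonly cited cascade-model definition. The function name and the `max_label` parameter are hypothetical, and nothing here reflects how a future xapian-letor scorer would be written.

```python
from typing import Sequence


def err(labels_in_ranked_order: Sequence[int], max_label: int) -> float:
    """Expected Reciprocal Rank under the cascade user model."""
    score = 0.0
    p_reached = 1.0  # probability that the user reaches the current rank
    for rank, rel in enumerate(labels_in_ranked_order, start=1):
        # Probability that the document at this rank satisfies the user.
        r = (2 ** rel - 1) / (2 ** max_label)
        score += p_reached * r / rank
        p_reached *= 1.0 - r
    return score


# Same two documents, relevant one ranked first versus second.
print(err([2, 0], max_label=2))
print(err([0, 2], max_label=2))
```
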
Last modified on 19/01/17 05:59:59