
Learning to Rank Stabilization 2017

Xapian has an experimental Learning to Rank (Letor) API, built up over several previous GSoC projects. The aim of this project was to consolidate that earlier work on xapian-letor into a stable and usable Learning to Rank API. The project was split into the following goals:

  • Resolving #733: Replace Feature::Internal with FeatureList::Internal.
  • Resolving #734: Rankers should update MSet instead of returning sorted docids.
  • Writing an automated test suite for Learning to Rank.
  • Writing practical examples and updating the documentation.
  • Integrating ListMLE Ranker, Ada Ranker and ERR Scorer from vhasu's 2014 GSoC directory.
  • Performance testing with suitable datasets.

Work Done in this Project:

All my contributions can be accessed here. Apart from bug fixes and reformatting, the work done in GSoC 2017 can be summarized as follows:

  • Merged and Resolved #733: Feature::Internal was eliminated and replaced by FeatureList::Internal.
  • Merged and Resolved #734: Rankers now update the MSet instead of returning sorted docids.
  • An automated test suite was written, with tests covering xapian-letor extensively.
  • ListMLE was added to xapian-letor along with the relevant test.
  • ERR Scorer was added to xapian-letor along with the relevant test.
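
As a rough illustration of what the ListMLE ranker and the ERR scorer compute, here is a minimal standalone sketch. It does not use the xapian-letor API: the function names err_score and listmle_loss are hypothetical, and the gain mapping (2^g - 1) / 2^max_grade is the one from the usual ERR formulation, so the actual xapian-letor code may differ in details.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Expected Reciprocal Rank for a single ranked list (hypothetical helper).
// labels[i] is the graded relevance label of the document at rank i + 1.
// Assumption: gain mapping R = (2^g - 1) / 2^max_grade.
double err_score(const std::vector<int>& labels, int max_grade) {
    double err = 0.0;
    double p_continue = 1.0;  // probability that the user reaches this rank
    for (std::size_t i = 0; i < labels.size(); ++i) {
        double r = (std::pow(2.0, labels[i]) - 1.0) / std::pow(2.0, max_grade);
        err += p_continue * r / static_cast<double>(i + 1);
        p_continue *= 1.0 - r;
    }
    return err;
}

// ListMLE loss for a single query (hypothetical helper): the negative
// log-likelihood of the ground-truth ordering under a Plackett-Luce model.
// scores[i] is the model score of the document that the ground truth
// places at position i.
double listmle_loss(const std::vector<double>& scores) {
    double loss = 0.0;
    double lse = 0.0;  // log(sum over k >= i of exp(scores[k]))
    // Walk from the tail so the log-sum-exp can be updated incrementally.
    for (std::size_t i = scores.size(); i-- > 0;) {
        if (i + 1 == scores.size()) {
            lse = scores[i];
        } else {
            double hi = std::max(lse, scores[i]);
            double lo = std::min(lse, scores[i]);
            lse = hi + std::log1p(std::exp(lo - hi));  // numerically stable
        }
        loss += lse - scores[i];
    }
    return loss;
}
```

With these definitions, a list whose only relevant document sits at rank 1 scores higher under ERR than one where it sits at rank 2, and listmle_loss is smaller when the model scores agree with the ground-truth ordering than when they are reversed.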

Work in Progress:

  • Updating xapian-docsprint to add a section about Xapian Learning to Rank (PR #18).

Future Work:

  • Performance testing with suitable datasets (a great dataset which contains queries, documents and qrels can be accessed here). In particular, we are currently using fixed coefficients, which can be optimized further.
  • Reformatting existing xapian-letor files to meet the Xapian coding standards.
  • Add a regression to combine the scores given by different Rankers and then update the MSet.
  • A feature reduction technique can be added to eliminate redundant features.
  • Currently the xapian-core API allows updating an MSet only as a whole. We can modify this to allow updating individual entries in an MSet.
  • We can add parallelization to xapian-letor using OpenMP or OpenCL to improve performance.
  • Add Ada Rank to xapian-letor. Investigation showed that the existing code was not in a state to be merged, and the remaining work is probably about as much as writing it from scratch.
Last modified on 11/02/18 10:10:07