Learning to Rank Stabilization 2017
Xapian has an experimental Learning to Rank (Letor) API, developed across several previous GSoC projects. The aim of this project was to consolidate that earlier work on xapian-letor into a stable and usable Learning to Rank API. The project was split into the following goals:
- Resolving #733: Replace Feature::Internal with FeatureList::Internal.
- Resolving #734: Rankers should update MSet instead of returning sorted docids.
- Writing an automated test suite for Learning to Rank.
- Writing Practical Examples and updating the documentation.
- Integrating ListMLE Ranker, Ada Ranker and ERR Scorer from vhasu's 2014 GSoC directory.
- Performance testing with suitable datasets.
Work Done in this Project:
All my contributions can be accessed here. Apart from bug fixes and reformatting, the work done in GSoC 2017 can be summarized as follows:
- Merged and Resolved #733: Feature::Internal was eliminated and replaced by FeatureList::Internal.
- Merged and Resolved #734: Rankers now update the MSet instead of returning sorted docids.
- An automated test suite was written, with tests covering xapian-letor extensively.
- ListMLE was added to xapian-letor along with the relevant test.
- ERR Scorer was added to xapian-letor along with the relevant test.
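For readers unfamiliar with the ERR (Expected Reciprocal Rank) metric behind the ERR Scorer, the following is a minimal sketch of the standard formulation. It is not the xapian-letor implementation: the function name `expected_reciprocal_rank`, the `max_label` parameter, and the choice of normalising by the highest label are assumptions made for illustration only.

```python
def expected_reciprocal_rank(labels, max_label=None):
    """Compute ERR for relevance labels listed in ranked order.

    Hypothetical helper for illustration; not part of xapian-letor.
    labels: non-negative integer relevance grades, best-ranked document first.
    max_label: highest possible grade (assumed to be max(labels) if omitted).
    """
    if not labels:
        return 0.0
    if max_label is None:
        max_label = max(labels)

    err = 0.0
    p_continue = 1.0  # probability the user has not stopped before this rank
    for rank, grade in enumerate(labels, start=1):
        # Probability that the document at this rank satisfies the user.
        p_satisfied = (2 ** grade - 1) / (2 ** max_label)
        err += p_continue * p_satisfied / rank
        p_continue *= 1.0 - p_satisfied
    return err


if __name__ == "__main__":
    # A relevant document at rank 1 scores higher than the same document at rank 2.
    print(expected_reciprocal_rank([1, 0], max_label=1))
    print(expected_reciprocal_rank([0, 1], max_label=1))
```

Because each rank's contribution is discounted by the probability that an earlier document already satisfied the user, ERR rewards placing highly relevant documents first more strongly than a purely position-based discount would.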
Work in Progress:
- Updating xapian-docsprint to add a section about Xapian Learning to Rank (PR #18).
- Performance testing with suitable datasets (a great dataset containing queries, documents and qrels can be accessed here). In particular, we are currently using fixed coefficients, which can be optimized further.
- Reformatting existing xapian-letor files to meet Xapian coding standards.
- Add a regression to combine the scores given by different Rankers and then update the MSet.
- A feature reduction technique can be added to eliminate redundant features.
- Currently the xapian-core API only allows updating an MSet as a whole. We can modify this to allow updating individual entries in an MSet.
- We can add parallelization to xapian-letor using OpenMP or OpenCL to improve the performance.
- Add Ada Rank to xapian-letor: investigation showed that this was not in a state to be merged, and the remaining work is probably about as much as writing it from scratch.