Learning to Rank Stabilization 2017

Xapian has an experimental Learning to Rank (Letor) API, developed over several previous GSoC projects. The aim of this project was to consolidate that earlier work on xapian-letor into a stable and usable Learning to Rank API. The project was split into the following goals:

Resolving #733: Replace Feature::Internal with FeatureList::Internal.
Resolving #734: Rankers should update MSet instead of returning sorted docids.
Writing an automated test suite for Learning to Rank.
Writing Practical Examples and updating the documentation.
Integrating ListMLE Ranker, Ada Ranker and ERR Scorer from vhasu's 2014 GSoC directory.
Performance testing with suitable datasets.
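
For readers unfamiliar with ListMLE, the listwise loss it minimizes can be sketched as follows. This is a minimal illustration of the standard ListMLE objective (the negative Plackett-Luce log-likelihood of the ground-truth ordering), not the code from xapian-letor; the function name `listmle_loss` and its input convention (model scores already arranged in ground-truth order, most relevant first) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not part of xapian-letor.
// scores_in_true_order[i] is the model score of the document that the
// ground truth places at rank i (most relevant first).
// Returns sum_i ( log(sum_{k>=i} exp(s_k)) - s_i ).
double listmle_loss(const std::vector<double>& scores_in_true_order) {
    double loss = 0.0;
    // Running log-sum-exp over the suffix, built from the last rank upwards.
    double suffix_lse = -std::numeric_limits<double>::infinity();
    for (std::size_t i = scores_in_true_order.size(); i-- > 0;) {
        const double s = scores_in_true_order[i];
        assert(std::isfinite(s));
        // Numerically stable update: log(exp(suffix_lse) + exp(s)).
        const double hi = std::max(suffix_lse, s);
        const double lo = std::min(suffix_lse, s);
        suffix_lse = hi + std::log1p(std::exp(lo - hi));
        loss += suffix_lse - s;
    }
    return loss;
}
```

The loss is zero for a single document and shrinks as the model scores the truly more relevant documents higher than the ones ranked below them.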

Work Done in this Project:

All my contributions can be accessed here. Apart from bug fixes and reformatting, the work done in GSoC 2017 can be summarized as follows:

Merged and Resolved #733: Feature::Internal was eliminated and replaced by FeatureList::Internal.
Merged and Resolved #734: Rankers now update the MSet instead of returning sorted docids.
An automated test suite was written, with tests covering xapian-letor extensively.
ListMLE was added to xapian-letor along with the relevant test.
ERR Scorer was added to xapian-letor along with the relevant test.
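
As a point of reference for the ERR Scorer, Expected Reciprocal Rank is commonly defined as ERR = sum over ranks r of (1/r) * R_r * prod_{i<r} (1 - R_i), where R_i = (2^{g_i} - 1) / 2^{g_max} and g_i is the relevance grade of the document at rank i. The sketch below computes that formula directly. It is an illustration only: the function name `expected_reciprocal_rank` and the integer grade scale are assumptions for this example and do not reflect the xapian-letor Scorer interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of xapian-letor.
// grades[i] is the relevance grade of the document at rank i + 1;
// max_grade is the highest grade on the judging scale.
double expected_reciprocal_rank(const std::vector<int>& grades, int max_grade) {
    assert(max_grade >= 0);
    double err = 0.0;
    double p_reach = 1.0;  // probability that the user reaches this rank
    for (std::size_t i = 0; i < grades.size(); ++i) {
        assert(grades[i] >= 0 && grades[i] <= max_grade);
        // Probability that the document at this rank satisfies the user.
        const double r = (std::pow(2.0, grades[i]) - 1.0) / std::pow(2.0, max_grade);
        err += p_reach * r / static_cast<double>(i + 1);
        p_reach *= 1.0 - r;
    }
    return err;
}
```

With binary grades (max_grade = 1), a relevant document at rank 1 contributes 0.5, and the same document at rank 2 behind a non-relevant one contributes 0.25, which shows how the metric discounts relevant results that appear lower in the list.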

Work in Progress:

Updating xapian-docsprint to add a section about Xapian Learning to Rank (PR #18).

Future Work:

Performance testing with suitable datasets (a good dataset containing queries, documents and qrels can be accessed here). In particular, we currently use fixed coefficients, which could be optimized further.
Reformatting existing xapian-letor files to meet Xapian coding standards.
Add a regression to combine the scores given by different Rankers and then update the MSet.
A feature reduction technique can be added to eliminate redundant features.
Currently the xapian-core API allows updating an MSet only as a whole. This could be changed to allow updating individual entries in an MSet.
Parallelization could be added to xapian-letor using OpenMP or OpenCL to improve performance.
~~Add Ada Rank to xapian-letor.~~ -- investigation showed that this was not in a state to be merged, and the remaining work is probably about as much as writing it from scratch

Last modified on 11/02/18 10:10:07
