GSoC2011/LTR/Journal – Xapian

Community Bonding Week 1: April 25-May 1
Community Bonding Week 2: May 2-May 8
Community Bonding Week 3: May 9-May 15
Community Bonding Week 4: May 16-May 22
Coding Week 1: May 23-May 29
Coding Week 2: May 30-June 5
Coding Week 3: June 6-June 12
Coding Week 4: June 13-June 19
Coding Week 5: June 20-June 26
Coding Week 6: June 27-July 3
Coding Week 7: July 4-July 10 && Coding Week 8: July 11-July 17 (Midterm)
Coding Week 9: July 18-July 24
Coding Week 10: July 25-July 31
Coding Week 11: August 1-August 7
Coding Week 12: August 8-August 14
Coding Week 13: August 15-August 22 (Final evaluation)

Community Bonding Week 1: April 25-May 1

Community Bonding Week 2: May 2-May 8

Community Bonding Week 3: May 9-May 15

Community Bonding Week 4: May 16-May 22

Coding Week 1: May 23-May 29

Learnt to add my local changes to my branch. Details

Figured out how to fetch features from the existing Xapian API.

Created a temporary utility, questletor.cc, in core/examples, which extracts the defined features.

Have put all the direct and indirect methods used to calculate the features in questletor.cc, although it is not the proper place for them. Will move them to a proper place once the framework of Letor [Learning to Rank] is finalised.

Coding Week 2: May 30-June 5

Off

Coding Week 3: June 6-June 12

Searched the data of different evaluation forums (TREC, CLEF, FIRE, INEX, etc.) for a collection suitable for the task.

Settled on the INEX dataset because of its format (XML) and its quite general nature (Wikipedia articles).

Downloaded the data collection, queries and relevance judgements [~50 GB uncompressed in total].

Initially planned to use only one part of ~11 GB, but later learnt that it yields quite a small training file, so have indexed 3 parts totalling ~32 GB and 2,000,038 documents.

Indexing took 24 hours on a machine with 2 GB RAM and an Intel Core 2 Duo processor, though this could have been reduced by setting XAPIAN_FLUSH_THRESHOLD=40000.

Coding Week 4: June 13-June 19

Prepared the basic framework to fit the learning-to-rank weighting scheme into Xapian.

Prepared the Xapian::Letor class with declarations of the feature methods calculate_f1(), calculate_f2(), ..., calculate_f6(); the statistic-fetching methods whose results are passed in to calculate the features; and the prepare_training_file(), learn_model() and learn_score() methods.

Defined the prepare_training_file() method, which generates 'train.txt', the training file for machine learning.
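The layout of 'train.txt' is not recorded in this journal. As a rough sketch only, assuming the LETOR-style text format used by ranking SVM tools (`<label> qid:<id> 1:<value> 2:<value> ...`), one line of such a file could be produced as below. The `TrainingRow` struct and `format_training_line()` function are hypothetical names for illustration and are not part of Xapian::Letor.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row type: one (query, document) pair and its feature values.
struct TrainingRow {
    int relevance;                 // relevance judgement used as the label
    int query_id;                  // identifier of the query
    std::vector<double> features;  // feature values in order (f1, f2, ...)
};

// Format one row as "<label> qid:<id> 1:<f1> 2:<f2> ...".
// ASSUMPTION: LETOR-style layout; the branch's real format may differ.
std::string format_training_line(const TrainingRow& row) {
    // A row without any feature values carries nothing to learn from.
    assert(!row.features.empty());
    std::ostringstream out;
    out << row.relevance << " qid:" << row.query_id;
    for (std::size_t i = 0; i < row.features.size(); ++i) {
        // Feature indices are 1-based in this format.
        out << ' ' << (i + 1) << ':' << row.features[i];
    }
    return out.str();
}
```

Writing the file then amounts to calling this once per judged (query, document) pair and emitting each result on its own line.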

Coding Week 5: June 20-June 26

Until 22nd June:

Added the code to compute QueryLevelNorm for the feature values.

To know more about QueryLevelNorm, read Here.
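The exact formula used in the branch is not given in this journal. As an illustration, assuming the min-max form of query-level normalisation (each feature is rescaled to [0, 1] using only the documents retrieved for the same query), the step might look like this. `FeatureMatrix` and `query_level_normalise()` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Rows are the documents retrieved for ONE query; columns are features.
using FeatureMatrix = std::vector<std::vector<double>>;

// Rescale every feature column to [0, 1] using the minimum and maximum
// observed among this query's documents.
// ASSUMPTION: min-max scaling; the actual implementation may differ.
void query_level_normalise(FeatureMatrix& docs) {
    if (docs.empty()) return;
    const std::size_t n_features = docs.front().size();
    for (const auto& d : docs) {
        assert(d.size() == n_features);  // every document needs all features
    }
    for (std::size_t f = 0; f < n_features; ++f) {
        double lo = docs.front()[f];
        double hi = docs.front()[f];
        for (const auto& d : docs) {
            lo = std::min(lo, d[f]);
            hi = std::max(hi, d[f]);
        }
        const double range = hi - lo;
        for (auto& d : docs) {
            // A feature that is constant within the query carries no
            // ranking information, so it is mapped to 0.
            d[f] = range > 0.0 ? (d[f] - lo) / range : 0.0;
        }
    }
}
```

Normalising per query rather than over the whole collection keeps feature values comparable between queries whose raw statistics differ widely in scale.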

Coding Week 6: June 27-July 3

Completed the definition of the letor_score() method. It is the infrastructure to which we pass the initial MSet, which is then re-ranked according to the new scores assigned to the documents by the learned model.

With this, all the required methods are in place except letor_learn_model(), because it is quite dependent on the SVM tool we use.

Checked the performance and time statistics of some available SVM tools: RankSVM, TinySVM and libSVM. The performance of all of them is significantly better than the BM25 weighting scheme. RankSVM has some licensing issues, and TinySVM takes painfully long to learn, whereas libSVM is clear on both counts, with a 3-clause BSD license and real-time learning. So finally settled on libSVM.
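The re-ranking step described for letor_score() can be sketched as follows. This is a simplified stand-in, not the branch's code: a linear model (a dot product of weights and features) takes the place of the SVM prediction, and `RankedDoc` and `rerank()` are hypothetical names rather than Xapian types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one entry of the initial result set.
struct RankedDoc {
    unsigned docid;                // document identifier
    std::vector<double> features;  // normalised feature values
    double score;                  // score assigned by the learned model
};

// Score each document with a linear model and return the documents
// ordered by descending score.
// ASSUMPTION: a linear model replaces the real SVM prediction here.
std::vector<RankedDoc> rerank(std::vector<RankedDoc> docs,
                              const std::vector<double>& weights) {
    for (auto& d : docs) {
        assert(d.features.size() == weights.size());
        d.score = std::inner_product(d.features.begin(), d.features.end(),
                                     weights.begin(), 0.0);
    }
    // stable_sort keeps the initial order for documents with equal scores.
    std::stable_sort(docs.begin(), docs.end(),
                     [](const RankedDoc& a, const RankedDoc& b) {
                         return a.score > b.score;
                     });
    return docs;
}
```

Only the scoring line would change if a different model were plugged in; the sort over the initial result set stays the same.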

Have a look at the framework of the system here. https://trac.xapian.org/wiki/GSoC2011/LTR/LTRFramework

Coding Week 7: July 4-July 10 && Coding Week 8: July 11-July 17 (Midterm)

Studied the LibSVM 3.1 code and debugged it.

Incorporated libSVM successfully into the Xapian::Letor class.

Completed the methods letor_learn_model() and letor_score().

Coding Week 9: July 18-July 24

Started preparing the overview documentation for Learning-to-Rank.

Learnt how to write documentation using reStructuredText.

Merged trunk with branch successfully [thanks ojwb]. Details

Coding Week 10: July 25-July 31

Checked the feature calculations in the implementation against their mathematical conditions to see whether any of them are violated.

Computed the MAP and NDCG scores for the INEX queries on the INEX Ad-hoc data collection. Letor was found to significantly outperform the BM25 ranking scheme.

Details of the evaluation can be found Here.
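For reference, the two measures can be computed as in the sketch below, which uses their standard textbook definitions. The function names are hypothetical, and the NDCG here takes its ideal ordering from the retrieved list only, which is a simplifying assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Average precision for one query. relevant[i] says whether the document
// at rank i+1 is relevant; total_relevant is the number of relevant
// documents in the judgements for that query.
double average_precision(const std::vector<bool>& relevant,
                         std::size_t total_relevant) {
    if (total_relevant == 0) return 0.0;
    double sum = 0.0;
    std::size_t hits = 0;
    for (std::size_t i = 0; i < relevant.size(); ++i) {
        if (relevant[i]) {
            ++hits;
            sum += static_cast<double>(hits) / static_cast<double>(i + 1);
        }
    }
    return sum / static_cast<double>(total_relevant);
}

// MAP is the mean of the per-query average precision values.
double mean_average_precision(const std::vector<double>& ap_per_query) {
    if (ap_per_query.empty()) return 0.0;
    double sum = 0.0;
    for (double ap : ap_per_query) sum += ap;
    return sum / static_cast<double>(ap_per_query.size());
}

// Discounted cumulative gain over the top k graded relevance values.
double dcg(const std::vector<int>& gains, std::size_t k) {
    double total = 0.0;
    const std::size_t n = std::min(k, gains.size());
    for (std::size_t i = 0; i < n; ++i) {
        total += (std::pow(2.0, gains[i]) - 1.0) / std::log2(i + 2.0);
    }
    return total;
}

// NDCG@k: DCG divided by the DCG of the ideal (sorted) ordering.
// ASSUMPTION: the ideal ordering is built from the retrieved list only.
double ndcg(std::vector<int> gains, std::size_t k) {
    const double actual = dcg(gains, k);
    std::sort(gains.begin(), gains.end(), std::greater<int>());
    const double ideal = dcg(gains, k);
    return ideal > 0.0 ? actual / ideal : 0.0;
}
```

Comparing two ranking schemes then means running both over the same queries and judgements and comparing the resulting MAP and mean NDCG values.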

Coding Week 11: August 1-August 7

Removed redundancy in the code and tried to optimise it.

Stored the collection information needed for the feature calculation in the user metadata in omindex.cc, so that fetching it does not take much time at query time.

Coding Week 12: August 8-August 14

Coding Week 13: August 15-August 22 (Final evaluation)

Last modified on 01/26/16 10:10:43
