Table of Contents
- Community Bonding Week 1: April 25-May 1
- Community Bonding Week 2: May 2-May 8
- Community Bonding Week 3: May 9-May 15
- Community Bonding Week 4: May 16-May 22
- Coding Week 1: May 23-May 29
- Coding Week 2: May 30-June 5
- Coding Week 3: June 6-June 12
- Coding Week 4: June 13-June 19
- Coding Week 5: June 20-June 26
- Coding Week 6: June 27-July 3
- Coding Week 7: July 4-July 10 && Coding Week 8: July 11-July 17 (Midterm)
- Coding Week 9: July 18-July 24
- Coding Week 10: July 25-July 31
- Coding Week 11: August 1-August 7
- Coding Week 12: August 8-August 14
- Coding Week 13: August 15-August 22 (Final evaluation)
Community Bonding Week 1: April 25-May 1
Community Bonding Week 2: May 2-May 8
Community Bonding Week 3: May 9-May 15
Community Bonding Week 4: May 16-May 22
Coding Week 1: May 23-May 29
- Learnt to add my local changes to my branch. Details
- Figured out how to fetch features from the existing Xapian API.
- Created a temporary utility, questletor.cc, in core/examples that fetches the defined features.
- Put all the direct and indirect methods for calculating the features in questletor.cc, though it is not the proper place for them. Will move them to a proper place once the framework of Letor [Learning to Rank] is finalised.
Coding Week 2: May 30-June 5
Coding Week 3: June 6-June 12
- Searched the data of different evaluation forums, such as TREC, CLEF, FIRE and INEX, for a collection suitable for the task.
- Settled on the INEX dataset because of its format (XML) and its quite general nature (Wikipedia articles).
- Downloaded the data collection, queries and relevance judgements. [~50GB uncompressed in total]
- Planned to use only one part of ~11GB, but later learnt that this gives quite a small training file, so indexed 3 parts totalling ~32GB and 2,000,038 documents.
- Indexing took 24 hours with 2GB RAM and an Intel Core 2 Duo processor, though this could have been reduced by setting XAPIAN_FLUSH_THRESHOLD = 40000.
Coding Week 4: June 13-June 19
- Prepared the basic framework to fit the learning-to-rank weighting scheme into Xapian.
- Prepared the Xapian::Letor class, which declares: the feature methods calculate_f1(), calculate_f2(), ..., calculate_f6(); the statistic-fetching methods whose results are passed to the feature calculations; and the prepare_training_file(), learn_model() and learn_score() methods.
- Defined the prepare_training_file() method which can generate 'train.txt' file - training file for machine learning.
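As a rough illustration of what one line of such a training file might look like, here is a minimal sketch. It assumes the SVM-light/LETOR-style layout (`<label> qid:<id> 1:<f1> 2:<f2> ...`); the exact format of 'train.txt' is not stated above, and the helper name `format_training_line` and the sample values are hypothetical.

```python
def format_training_line(label, qid, features):
    """Format one (query, document) pair as '<label> qid:<qid> 1:<f1> 2:<f2> ...'.

    Assumption: the training file follows the SVM-light/LETOR convention,
    in which feature indices are 1-based.
    """
    pairs = " ".join(
        f"{index}:{value:.6f}" for index, value in enumerate(features, start=1)
    )
    return f"{label} qid:{qid} {pairs}"


if __name__ == "__main__":
    # Made-up relevance label, query id and six feature values, for illustration only.
    print(format_training_line(1, 20, [0.5, 0.25, 0.0, 1.0, 0.75, 0.125]))
```

Each document retrieved for a training query would contribute one such line, with the label taken from the relevance judgements.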
Coding Week 5: June 20-June 26
till 22nd June
- Added the code to compute QueryLevelNorm for the values of the features.
- To learn more about QueryLevelNorm, read here.
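The idea of query-level normalisation can be sketched as follows. This is not the code added to the branch: it assumes non-negative feature values and assumes that each feature is divided by its maximum over the documents retrieved for the same query (other variants use min-max scaling). The function name `query_level_norm` is hypothetical.

```python
from collections import defaultdict


def query_level_norm(rows):
    """Normalise features per query.

    rows: list of (qid, [feature values]) pairs, assumed non-negative.
    Returns a new list in which every feature is divided by its maximum
    over the documents that share the same qid.
    """
    # First pass: collect the per-query maximum of every feature.
    maxima = defaultdict(dict)
    for qid, features in rows:
        for i, value in enumerate(features):
            maxima[qid][i] = max(maxima[qid].get(i, 0.0), value)

    # Second pass: scale each value; a feature that is 0 everywhere stays 0.
    normalised = []
    for qid, features in rows:
        scaled = [
            value / maxima[qid][i] if maxima[qid][i] > 0 else 0.0
            for i, value in enumerate(features)
        ]
        normalised.append((qid, scaled))
    return normalised
```

Normalising within a query keeps feature values comparable across queries whose raw statistics differ widely in scale.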
Coding Week 6: June 27-July 3
- Completed the definition of the letor_score() method. It is the infrastructure to which we pass the initial MSet, which is then reranked according to the new scores assigned to the documents by the learned model.
- With this, all the required methods are in place except letor_learn_model(), because it's quite dependent on the SVM tool we use.
- Checked the performance and time statistics of some available SVM tools: RankSVM, TinySVM and libSVM. The performance of all of them is significantly better than the BM25 weighting scheme. RankSVM has some licensing issues and TinySVM takes painfully long to learn, while libSVM is clear on both counts, with a 3-clause BSD license and real-time learning. So finally settled on libSVM.
- Have a look at the framework of the system here. https://trac.xapian.org/wiki/GSoC2011/LTR/LTRFramework
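The reranking step described for letor_score() can be sketched in a few lines. This is only an illustration on plain Python lists: the real method works on a Xapian MSet and a model learned by an SVM tool, whereas `rerank` and the linear scoring function in the usage example are hypothetical stand-ins.

```python
def rerank(initial_results, score):
    """Reorder a first-pass result list by a learned model's score.

    initial_results: list of (docid, feature_vector) in first-pass order.
    score: callable mapping a feature vector to a float (the learned model).
    Returns the docids ordered by descending model score.
    """
    scored = [(score(features), docid) for docid, features in initial_results]
    # list.sort() is stable, so documents with equal scores keep their
    # first-pass relative order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [docid for _, docid in scored]


if __name__ == "__main__":
    # Hypothetical linear model standing in for the learned SVM model.
    weights = [1.0, 2.0]
    linear = lambda features: sum(w * x for w, x in zip(weights, features))
    first_pass = [(1, [0.1, 0.1]), (2, [0.0, 0.5]), (3, [0.9, 0.0])]
    print(rerank(first_pass, linear))
```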
Coding Week 7: July 4-July 10 && Coding Week 8: July 11-July 17 (Midterm)
- Understood the code of the LibSVM-3.1 and debugged it.
- Incorporated libSVM successfully in Xapian::Letor class.
- Completed the methods letor_learn_model() and letor_score().
Coding Week 9: July 18-July 24
- Started preparing the Overview Documentation of the Learning-to-Rank
- Learnt how to document using reStructuredText
- Merged trunk with branch successfully [thanks ojwb]. Details
Coding Week 10: July 25-July 31
- Checked the feature calculations in the implementation against their mathematical conditions to see whether any of them are violated.
- Computed the MAP and NDCG scores for INEX queries on the Ad-hoc data collection of INEX. Letor was found to significantly outperform the BM25 ranking scheme.
- Details of the evaluation can be found here.
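For readers unfamiliar with the two measures, here is a minimal sketch of how MAP and NDCG can be computed from ranked relevance lists. It is not the evaluation script used for the numbers above. It assumes binary relevance for MAP, uses the `gain / log2(rank + 1)` form of DCG (other variants exist), and derives the ideal ordering from the returned list only; all function names are hypothetical.

```python
import math


def average_precision(ranked_relevance, total_relevant=None):
    """ranked_relevance: 0/1 relevance of each returned document, in rank order.

    total_relevant: number of relevant documents for the query; if omitted,
    the number of relevant documents actually returned is used instead.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    denominator = total_relevant if total_relevant else hits
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(runs):
    """runs: one ranked 0/1 relevance list per query."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)


def ndcg(ranked_gains, k=None):
    """ranked_gains: graded relevance of each returned document, in rank order."""
    gains = ranked_gains[:k] if k else ranked_gains
    # Ideal ordering: the same gains sorted from most to least relevant.
    ideal = sorted(ranked_gains, reverse=True)[:len(gains)]
    dcg = sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

Comparing two ranking schemes then amounts to computing these scores over the same query set for each scheme's ranked output.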
Coding Week 11: August 1-August 7
- Removed redundancy in the code and tried to optimize it.
- Stored the collection information needed for the feature calculation in the user metadata in omindex.cc, so that the calculation costs little time at search time.
Coding Week 12: August 8-August 14
Coding Week 13: August 15-August 22 (Final evaluation)