Table of Contents
- GSOC MidTerm Evaluation Meeting
- Evaluation Module Meeting
- Meeting to Discuss Improvement in LMWeight and Bigram Hurting Performance for All
GSOC MidTerm Evaluation Meeting
Discussion on Query parser
- Since bigrams are treated similarly to ordinary terms, no new Query object type needs to be created; changes in the QueryParser alone will work.
- Need to work closely with Sehaj on the QueryParser.
- Bigrams needn't be added for love/hate terms, synonyms, wildcard expansion, etc., which are complex cases where bigrams don't matter.
- Make minimal changes and move on to evaluation, to be in a better position to assess the weighting scheme.
- jaylett discussed a possible problem with weighting a bigram query as OR, which also existed for Synonym. (Will be considered later, after evaluations.)
- Need to merge with upstream often.
- Smaller commits need to be made in Git.
- Branches need to be used.
- Problem with formatting of code (which was due to the GitHub document viewer).
- Rename UnigramLMWeight so there is a common class for bigram and unigram weighting in LM.
- Further work to be done was discussed: the evaluation module or the prefixing project.
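Since bigrams are handled like ordinary terms, the QueryParser change essentially amounts to emitting each adjacent pair of query terms as an extra term. A minimal sketch, independent of the actual Xapian code (the function name and the space-joined bigram convention are illustrative assumptions):

```python
def add_bigrams(terms):
    """Return the original unigrams plus each adjacent pair of terms
    joined into a single bigram term (joined with a space here)."""
    bigrams = [f"{a} {b}" for a, b in zip(terms, terms[1:])]
    return terms + bigrams

# e.g. ["information", "retrieval", "system"] additionally yields
# "information retrieval" and "retrieval system"
```

Complex query constructs (love/hate terms, synonyms, wildcards) would simply bypass this step, matching the decision above.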
Evaluation Module Meeting
Attendees: James Aylett, Gaurav Arora, Olly Betts
Aim: The meeting was primarily aimed at discussing the evaluation module in Xapian.
Relevant links: http://pastebin.com/DUVDV02w
Key Points Discussed :
Low Precision value
- Problem: The current evaluation done with the FIRE queries and dataset has very low accuracy.
Action: Manually calculate the evaluation results for one query and find the problem in the module.
Result: The module was checked on a randomly selected query (ID 136); the evaluation it performs is correct. A minor bug was found: one query was missed due to a minor flaw (now corrected).
Improving Output of Evaluation
- Discussion: Enhance the evaluation results and store them in a text file in pure column format, to be used for further analysis; this makes it easy to pull the data into something else for graphing, further calculation, or whatever.
Actions: Output every evaluation metric to a separate file in pure column format.
Result: Better analysis of the data.
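One possible shape for the pure-column output is one file per metric, one query per row, tab-separated. A sketch of this layout (the file names, query IDs and values below are illustrative assumptions):

```python
# Hypothetical per-query scores, keyed by metric name.
metrics = {
    "map":        [("q101", 0.231), ("q102", 0.187)],
    "rprecision": [("q101", 0.250), ("q102", 0.200)],
}

# One file per metric; each row is "<query-id>\t<score>", so the data
# loads directly into gnuplot, spreadsheets, or pandas.
for name, rows in metrics.items():
    with open(f"{name}.txt", "w") as out:
        for qid, value in rows:
            out.write(f"{qid}\t{value:.4f}\n")
```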
Redundancy with Xapian Module in Andy's code
- Problem: The evaluation module is built on Andy's TREC code, which predates TermGenerator and even QueryParser and implements that functionality itself. It therefore already lacks the bigram changes and will miss further upstream changes.
Action: Redirect the appropriate parts to TermGenerator and QueryParser.
Result: The module will not become outdated and won't miss upstream changes.
New Evaluation Scheme
- Discussion: New evaluation schemes - mainly graded evaluation and NDCG are left. Currently MAP, R-Precision, Precision@Rank, Precision@… are implemented; as per my reading, NDCG is mostly used for machine-learning work.
Actions: Not decided. (Parth's comments required.)
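For reference, NDCG discounts graded relevance by the log of the rank and normalises by the DCG of the ideal ordering. A minimal sketch (not tied to the module's actual code) over a list of graded gains in ranked order:

```python
import math

def dcg(gains):
    """Discounted cumulative gain: gain at rank i divided by log2(i + 2),
    so rank 1 is undiscounted and later ranks count progressively less."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """DCG of the ranking, normalised by the DCG of the ideal
    (descending-gain) ordering of the same judgements."""
    if not any(gains):
        return 0.0
    ideal = sorted(gains, reverse=True)
    return dcg(gains) / dcg(ideal)
```

An already-ideal ranking scores 1.0; pushing the only relevant document down the ranking lowers the score.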
Low Precision Reason
- Post-Discussion Topic: The pool developed by FIRE, at least for English, was very small, consisting of only two runs (developed by them), and moreover our run was not in the pool. That can be a reason for the low precision value, as many actually relevant documents at good ranks in our run are counted as non-relevant. This was observed while checking the module, since most queries have very few judged-relevant documents.
Actions: Not decided. (Suggestions needed from Parth, James, Olly and Dan.)
Meeting to Discuss improvement in LMWeight and Bigram Hurting Performance for all
Focus for Future
- Figuring out why bigrams and LM are not outperforming BM25 is important.
- Documentation can be carried out in a DocSprint after GSoC if something is left, so concentrate on getting the work correct.
- Tests might help in figuring out the problem; we may write them in parallel with the investigation (it should be done).
- Use the TREC collection for firmer results.
- Draw up a list of things to investigate when checking the bigrams and LM.
- Write a document on what is worth testing in the evaluation module.
- Go over the TODOs and see how much time they will require.