Weighting Schemes
Name | Vivek Pal |
IRC nick | vivekp |
Timezone | UTC +0530 |
Work hours | 0400 - 1200 UTC |
Official mentor | Gaurav |
Code repository | https://github.com/ivmarkp/xapian |
Proposal | https://summerofcode.withgoogle.com/dashboard/project/4994403428990976/overview/ |
Public GSoC page | https://summerofcode.withgoogle.com/projects/#4994403428990976 |
I'll be improving the existing weighting schemes in Xapian and adding support for a new normalization (Piv+) in the existing vector space model.
I'll also evaluate and compare the existing schemes with their improved counterparts for speed and retrieval effectiveness.
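To illustrate what the "+" variants change, here is a minimal sketch of the term-frequency component of BM25 next to its lower-bounded BM25+ counterpart, following the commonly published formulation. This is not Xapian's implementation: the function names, the default parameter values (`k1`, `b`, `delta`) and the plain-Python form are assumptions made for illustration only.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Length-normalized term-frequency component of BM25 (illustrative)."""
    # Longer-than-average documents get a larger normalizer, which
    # shrinks the contribution of each term occurrence.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return (k1 + 1.0) * tf / (norm + tf)


def bm25_plus_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75, delta=1.0):
    """BM25+ component: the BM25 value plus a constant lower bound delta.

    The bound applies only to terms that actually occur in the document.
    """
    if tf == 0:
        return 0.0
    return bm25_tf(tf, doc_len, avg_doc_len, k1, b) + delta


# For a very long document the plain BM25 component approaches zero,
# while the BM25+ component stays above delta.
long_doc_bm25 = bm25_tf(1, 1e9, 100.0)
long_doc_bm25_plus = bm25_plus_tf(1, 1e9, 100.0)
```

The point of the lower bound is that a matching term in a very long document still scores noticeably higher than a missing term, which plain length normalization does not guarantee.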
I'm planning to complete the following tasks by the end of GSoC 2016:
- Implement improved versions of the existing weighting schemes (BM25, PL2 and Dir) in Xapian as BM25+, PL2+ and Dir+ respectively.
- Implement a new normalization function (Piv+) for the existing vector space model weighting scheme, Tf-Idf.
- Evaluate the performance of the implemented functions on TREC dataset collections by calculating precision or recall, and MAP (mean average precision).
- Finally, compare the existing weighting functions with their improved counterparts based on this evaluation, so that users can see how the schemes differ.
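The MAP measure mentioned in the evaluation task can be sketched as follows. This is a minimal illustration of the standard definition, not the evaluation tooling used in the project; the function names and the input layout (a ranked list of document ids plus a set of relevant ids per query) are assumptions.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query.

    ranked_ids: document ids in the order the weighting scheme ranked them.
    relevant_ids: ids judged relevant for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank where a relevant document appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical rankings for two queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
map_score = mean_average_precision(runs)
```

Comparing two weighting schemes then amounts to computing this score for each scheme's rankings over the same queries and relevance judgments.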
Last modified on 08/19/16 15:03:54