Weighting Schemes
| Name | Vivek Pal |
| IRC nick | vivekp |
| Timezone | UTC+05:30 |
| Work hours | 0400 - 1200 UTC |
| Official mentor | Gaurav |
| Code repository | https://github.com/ivmarkp/xapian |
| Proposal | https://summerofcode.withgoogle.com/dashboard/project/4994403428990976/overview/ |
| Public GSoC page | https://summerofcode.withgoogle.com/projects/#4994403428990976 |
I'll be improving the existing weighting schemes in Xapian and adding support for a new normalization (Piv+) in the existing vector space model.
I'll also evaluate and compare the existing schemes with their improved counterparts in terms of speed and retrieval effectiveness.
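The core idea behind the "+" variants can be sketched for BM25. BM25+ adds a constant δ to the normalized term-frequency component, so a query term that does occur in a document never contributes almost nothing, however long that document is. The Python below is a minimal, hypothetical sketch. The function name `bm25_term_score`, the IDF form log((N + 1) / df), the defaults k1 = 1.2 and b = 0.75, and δ = 1.0 in the example are all assumptions for illustration. None of them reflect Xapian's `BM25Weight` API or its exact formula.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq,
                    k1=1.2, b=0.75, delta=0.0):
    """Score one query term against one document (illustrative sketch).

    delta=0.0 gives classic BM25; delta > 0 gives the BM25+ variant, which
    lower-bounds the contribution of any term that occurs in the document.
    The IDF form log((N + 1) / df) is an assumption for illustration.
    """
    if tf == 0:
        # Terms absent from the document contribute nothing in either variant.
        return 0.0
    idf = math.log((n_docs + 1) / doc_freq)
    # Document-length normalization: longer-than-average documents get a
    # larger denominator, shrinking the term-frequency component.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    tf_component = tf * (k1 + 1) / (tf + length_norm)
    # BM25+ adds delta here, so the score is at least idf * delta.
    return idf * (tf_component + delta)


# A single occurrence in a very long document (100x the average length):
classic = bm25_term_score(tf=1, doc_len=10_000, avg_doc_len=100,
                          n_docs=1_000, doc_freq=10)
plus = bm25_term_score(tf=1, doc_len=10_000, avg_doc_len=100,
                       n_docs=1_000, doc_freq=10, delta=1.0)
print(f"BM25: {classic:.3f}  BM25+: {plus:.3f}")
```

With classic BM25, the term's contribution shrinks toward zero as the document grows. A very long document that matches the term can therefore score about the same as one that lacks it entirely. The δ term keeps the contribution at or above δ·IDF.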
I'm planning to complete the following tasks by the end of GSoC 2016:
- Implement improved versions of the existing weighting schemes (BM25, PL2 and Dir) in Xapian as BM25+, PL2+ and Dir+ respectively.
- Implement a new normalization function (Piv+) for the existing TF-IDF vector space model.
- Evaluate the performance of the implemented functions on TREC dataset collections by calculating precision, recall and MAP (mean average precision).
- Finally, compare the existing weighting functions with their improved counterparts based on this evaluation, so that users can make an informed choice between them.
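The evaluation measures in the last two tasks can be illustrated with a small sketch. It assumes that each query has a ranked list of document IDs and a set of relevant IDs, such as those taken from TREC relevance judgments. The function names are hypothetical. Precision and recall are computed at a cutoff k. Average precision averages precision@i over every rank i that holds a relevant document. MAP is the mean of average precision across all queries.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top-k results of one query."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Mean of precision@i over every rank i that holds a relevant document.

    Relevant documents that were never retrieved count as zero, because we
    divide by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over a dict mapping query id -> (ranked list, set of relevant ids)."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel)
               for ranked, rel in runs.values()) / len(runs)


runs = {
    "q1": (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    "q2": (["d7", "d5"], {"d5"}),
}
print(precision_recall_at_k(*runs["q1"], k=2))
print(mean_average_precision(runs))
```

In practice, TREC runs are usually scored with NIST's `trec_eval` tool. It reports these measures, among many others, from a run file and a qrels file.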
Last modified on Aug 19, 2016, 3:03:54 PM
