Weighting Schemes

Name: Vivek Pal
IRC nick: vivekp
Timezone: UTC +0530
Work hours: 0400 - 1200 UTC
Official mentor: Gaurav
Code repository: https://github.com/ivmarkp/xapian
Proposal: https://summerofcode.withgoogle.com/dashboard/project/4994403428990976/overview/
Public GSoC page: https://summerofcode.withgoogle.com/projects/#4994403428990976

I'll be improving existing weighting schemes in Xapian and adding support for a new normalization (Piv+) in the existing vector space model.
I'll also evaluate and compare the existing schemes with their improved counterparts for speed and retrieval effectiveness.
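
As a rough illustration of what an "improved" scheme changes, the sketch below contrasts the classic BM25 term-frequency component with the lower-bounded BM25+ form described by Lv and Zhai. It is a minimal sketch, not Xapian code: the function names, the parameter defaults (k1, b, delta) and the omission of the IDF factor are all assumptions made for illustration.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Classic BM25 term-frequency component (IDF factor left out).

    All names and defaults here are illustrative, not Xapian's API.
    """
    if tf <= 0:
        return 0.0
    # Length normalisation: long documents increase the denominator.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return (k1 + 1.0) * tf / (norm + tf)


def bm25_plus_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75, delta=1.0):
    """BM25+ component: BM25 plus a constant lower bound for matching terms.

    delta is only added when the term actually occurs in the document,
    so a match in a very long document can never score close to zero.
    """
    if tf <= 0:
        return 0.0
    return bm25_tf(tf, doc_len, avg_doc_len, k1, b) + delta


if __name__ == "__main__":
    # One occurrence of a term in a document 1000x the average length.
    print(bm25_tf(1, 100000, 100))       # close to zero
    print(bm25_plus_tf(1, 100000, 100))  # at least delta
```

The point of the constant is that plain BM25 lets the contribution of a matching term shrink towards zero as the document grows, so a very long document containing the term can score about the same as a short document that lacks it. PL2+, Dir+ and Piv+ apply the same lower-bounding idea to their respective formulas.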

I'm planning to complete the following tasks by the end of GSoC 2016:

  1. Implement improved versions of the existing weighting schemes BM25, PL2 and Dir in Xapian, as BM25+, PL2+ and Dir+ respectively.
  2. Implement a new normalization function (Piv+) for the existing Tf-Idf vector space model.
  3. Evaluate the performance of the implemented functions on TREC dataset collections by calculating precision or recall, and MAP (mean average precision).
  4. Finally, compare the existing weighting functions with their improved counterparts based on this evaluation, which should be useful for users.
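
The evaluation measures named in task 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's evaluation code: the function names and the data layout (a ranked list of document ids per query, and a set of relevant ids per query, as TREC qrels would provide) are hypothetical, and each ranking is assumed to contain no duplicate ids.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents found in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant doc appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """Average of per-query AP; runs and qrels are dicts keyed by query id."""
    if not runs:
        return 0.0
    total = sum(average_precision(ranking, qrels.get(qid, ()))
                for qid, ranking in runs.items())
    return total / len(runs)


if __name__ == "__main__":
    runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
    qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}
    print(mean_average_precision(runs, qrels))
```

Running the same queries through an existing scheme and its "+" counterpart and comparing the resulting MAP values gives the kind of side-by-side comparison described in task 4.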

Last modified on 19/08/16 15:03:54