Weighting Schemes

Name: Vivek Pal
IRC nick: vivekp
Timezone: UTC+05:30
Work hours: 0400 - 1200 UTC
Official mentor: Gaurav
Code repository: https://github.com/ivmarkp/xapian
Proposal: https://summerofcode.withgoogle.com/dashboard/project/4994403428990976/overview/
Public GSoC page: https://summerofcode.withgoogle.com/projects/#4994403428990976

I'll be improving the existing weighting schemes in Xapian and adding support for a new normalization (Piv+) to the existing vector space model.
I'll also evaluate and compare the existing schemes with their improved counterparts for speed and retrieval effectiveness.
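
To illustrate what an "improved counterpart" looks like, here is a minimal sketch of a per-term BM25 weight next to its lower-bounded BM25+ form. It assumes the commonly cited BM25+ formulation, in which a constant delta is added to the term-frequency component. The function name, the parameter names and the default values (k1, b, delta) are hypothetical choices for illustration, and this is not Xapian's implementation.

```python
import math

def bm25_term(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75, delta=0.0):
    """Weight of one query term in one document.

    delta=0.0 gives a plain BM25-style weight; delta>0 gives the
    lower-bounded BM25+ variant (assumed formulation, see lead-in).
    Assumes df >= 1 and avg_doc_len > 0.
    """
    if tf <= 0:
        # A term that does not occur in the document contributes nothing.
        return 0.0
    idf = math.log((num_docs + 1) / df)
    # Document-length normalisation of the term frequency.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    tf_part = (k1 + 1) * tf / (norm + tf)
    # BM25+ adds delta so a matching term in a very long document
    # still scores clearly above a non-matching one.
    return idf * (tf_part + delta)

if __name__ == "__main__":
    args = dict(tf=1, doc_len=1_000_000, avg_doc_len=100, df=10, num_docs=1000)
    print("BM25 :", bm25_term(**args))
    print("BM25+:", bm25_term(**args, delta=1.0))
```

For a very long document the plain weight shrinks towards zero, while the delta term keeps the BM25+ weight at or above delta times the IDF; that gap is the behaviour the "+" variants are meant to correct.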

I'm planning to complete the following tasks by the end of GSoC 2016:

  1. Implement improved versions of the existing weighting schemes (BM25, PL2 & Dir) in Xapian as BM25+, PL2+ & Dir+ respectively.
  2. Implement a new normalization function (Piv+) for the existing vector space model, Tf-Idf.
  3. Evaluate the performance of the implemented functions on TREC dataset collections by calculating precision or recall, and MAP.
  4. Finally, compare the existing weighting functions with their improved counterparts based on this evaluation, as a reference for users.
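
The evaluation step can be sketched as follows: a minimal mean average precision (MAP) computation, assuming binary relevance judgements and a ranked list with no duplicate documents. The function names and the dictionary layout of `runs` and `qrels` are hypothetical; an actual evaluation on TREC collections would more likely rely on an established tool such as trec_eval.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list.

    ranked   -- document ids in rank order (assumed free of duplicates)
    relevant -- ids judged relevant for the query (binary judgements)
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document.
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)

def mean_average_precision(runs, qrels):
    """Mean of the per-query average precision.

    runs  -- {query id: ranked list of document ids}
    qrels -- {query id: set of relevant document ids}
    """
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0

if __name__ == "__main__":
    runs = {"q1": ["d1", "d2", "d3"], "q2": ["d9"]}
    qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}
    print(mean_average_precision(runs, qrels))
```

Running the same queries once with an existing scheme and once with its "+" counterpart, then comparing the two MAP values, gives the kind of side-by-side comparison described in task 4.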

Last modified on 08/19/16 15:03:54