New weighting schemes: project plan

Xapian already supports the vector space model used in TF-IDF weighting schemes. Some of the normalisations described by SMART are already implemented by subclassing Weight. We will implement more normalisations for this scheme: Entropy, Global frequency IDF, Changed-coefficient ATF1, Augmented average term frequency, Augmented log, Square root, Log-global frequency IDF, Incremented global frequency IDF and Square root global frequency IDF.
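To make the global-frequency family concrete, here is a minimal sketch of four of the IDF normalisations listed above, using the formulas as they are commonly stated. The function names and the parameter names collfreq (total occurrences of the term in the collection) and termfreq (number of documents containing the term) are illustrative assumptions, not Xapian's API, and the constants should be checked against the research paper before use.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, not part of Xapian.
// collfreq: total number of occurrences of the term in the collection.
// termfreq: number of documents that contain the term.

// Global frequency IDF: average occurrences per document containing the term.
double gfidf(double collfreq, double termfreq) {
    assert(termfreq > 0 && collfreq >= termfreq);
    return collfreq / termfreq;
}

// Log-global frequency IDF: log(collfreq / termfreq + 1).
double log_gfidf(double collfreq, double termfreq) {
    return std::log(gfidf(collfreq, termfreq) + 1.0);
}

// Incremented global frequency IDF: collfreq / termfreq + 1.
double incremented_gfidf(double collfreq, double termfreq) {
    return gfidf(collfreq, termfreq) + 1.0;
}

// Square root global frequency IDF: sqrt(collfreq / termfreq - 0.9).
// The ratio is at least 1, so the argument is never negative.
double sqrt_gfidf(double collfreq, double termfreq) {
    return std::sqrt(gfidf(collfreq, termfreq) - 0.9);
}
```

For a term that occurs 10 times across 5 documents, the ratio is 2, so the incremented variant gives 3 and the logarithmic variant gives log(3).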

These normalisations have been shown to be more effective than other popular weighting schemes in certain cases. (For details, please refer to the research paper.)

We will also implement the max and aug normalisations described by SMART. For this, the maximum term frequency statistic has to be made visible to our weighting scheme.
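The sketch below shows why the maximum term frequency is needed: both normalisations divide a term's frequency by the largest frequency of any term in the same document. It follows the usual SMART definitions (max: tf / max_tf; aug: 0.5 + 0.5 * tf / max_tf). The function names and the parameters wdf and max_wdf are assumptions made for illustration and do not reflect how Xapian would expose the statistic.

```cpp
#include <cassert>

// Hypothetical helpers, not part of Xapian.
// wdf:     within-document frequency of the term.
// max_wdf: largest within-document frequency of any term in the same document.

// SMART "max" normalisation: scales the frequency into (0, 1].
double max_norm(unsigned wdf, unsigned max_wdf) {
    assert(max_wdf > 0 && wdf <= max_wdf);
    return static_cast<double>(wdf) / max_wdf;
}

// SMART "aug" (augmented) normalisation: scales the frequency into [0.5, 1].
double aug_norm(unsigned wdf, unsigned max_wdf) {
    return 0.5 + 0.5 * max_norm(wdf, max_wdf);
}
```

With wdf = 2 and max_wdf = 4, max_norm gives 0.5 and aug_norm gives 0.75. Neither value can be computed from the term's own frequency alone, which is why the extra statistic must reach the weighting scheme.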

We will also add support for Weight::create() to xapian-evaluation. Using Weight::create() in xapian-evaluation would establish a standard format for specifying the weighting scheme, which is convenient for users.
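As an illustration of what a standard format could look like, the sketch below splits a scheme string into a name and its parameters. It assumes a string made of a scheme name followed by whitespace-separated parameters, such as "tfidf ntn". The SchemeSpec type and the parse_scheme function are hypothetical and are not Xapian's parser; the strings that Weight::create() actually accepts should be taken from the Xapian API documentation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of a parsed scheme string; not a Xapian type.
struct SchemeSpec {
    std::string name;                 // e.g. "tfidf"
    std::vector<std::string> params;  // e.g. {"ntn"}
};

// Split "name param1 param2 ..." on whitespace.
SchemeSpec parse_scheme(const std::string& spec) {
    std::istringstream in(spec);
    SchemeSpec result;
    in >> result.name;
    assert(!result.name.empty());  // a scheme name is required
    for (std::string param; in >> param; ) {
        result.params.push_back(param);
    }
    return result;
}
```

With a single string format, an evaluation run could select its weighting scheme from a configuration value without needing a separate option for each parameter.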

We will then use xapian-evaluation to compare the effectiveness of the new normalisations. The FIRE database will be used for this.

Last modified on 06/05/20 13:32:00