wiki:GSoC2016/Weighting/Journal

Community Bonding Week 1: April 25-May 1

  • Introduced myself to the community.
  • Clarifying doubts about the project through discussion on the mailing list.

Community Bonding Week 2: May 2-May 8

  • Discussed the testing process for the existing weighting functions on the mailing list.
  • Started a discussion with Gaurav about the xapian-evaluation module.

Community Bonding Week 3: May 9-May 15

  • Added details of the weighting functions to the Project Plan.
  • Trying to set up the xapian-evaluation module on my system.

Community Bonding Week 4: May 16-May 22

Semester exams started this week.

Coding Week 1: May 23-May 29

Exams continue. Will be getting back to project work this weekend.

Coding Week 2: May 30-June 5

  • Modified bm25weight.cc, which contains the implementation of the BM25 weighting function, to add support for the BM25+ weighting function.
  • Also modified the existing BM25 test cases to extend test coverage to BM25+.
  • Added the corresponding API in weight.h.
  • Opened a pull request on the GitHub repo for code review and feedback.
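The journal does not spell out the BM25+ formula. As a rough sketch, assuming the commonly used form of BM25+ (BM25 with a constant delta added to each matching term's normalized tf component) and an idf of ln((N+1)/df), the change to BM25 looks like this. The helper name and defaults are hypothetical; Xapian's own BM25Weight/BM25PlusWeight use their own idf and extra parameters (k3, min_normlen, etc.).

```python
import math


def bm25_plus_term_weight(tf, doc_len, avg_doc_len, df, num_docs,
                          k1=1.2, b=0.75, delta=1.0):
    """Per-term BM25+ score (hypothetical helper, not Xapian's API).

    Assumed form:
        idf * ((k1 + 1) * tf / (k1 * ((1 - b) + b * dl / avdl) + tf) + delta)
    with idf = ln((N + 1) / df). Setting delta = 0 gives plain BM25.
    """
    if tf == 0:
        # Only terms that actually occur in the document contribute.
        return 0.0
    idf = math.log((num_docs + 1) / df)
    length_norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    tf_part = (k1 + 1) * tf / (length_norm + tf)
    return idf * (tf_part + delta)


# One occurrence in a very long document: BM25 drives the score towards
# zero, while BM25+ keeps it at or above idf * delta.
bm25 = bm25_plus_term_weight(1, 100_000, 100, 10, 1000, delta=0.0)
bm25_plus = bm25_plus_term_weight(1, 100_000, 100, 10, 1000)
print(f"BM25: {bm25:.4f}  BM25+: {bm25_plus:.4f}")
```

The delta term is the whole point of the "+" variant: it lower-bounds the contribution of a matching term so that long documents are not over-penalized.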

Coding Week 3: June 6-June 12

  • Changed the approach to implementing BM25+ slightly, as the previous approach did not work out well.
  • Implemented a new subclass Xapian::BM25PlusWeight.
  • Wrote new test cases; all passing.
  • Pushed changes to update the previous PR.

Coding Week 4: June 13-June 19

  • Mostly used this week to make further improvements to the PR.
  • Improved test coverage of BM25PlusWeight by exploiting the fact that k2 should have no effect on the weight: k2 only affects the extra weight component, and BM25+ has no extra weight component.
  • Fixed some pesky issues related to tab-width in the code.
  • Also wrote an initial draft of the documentation for the BM25+ weighting function and opened a PR here.
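The k2 testing idea can be illustrated with a small sketch: if the total weight is the sum of the per-term weights plus an extra per-document component, and only that extra component uses k2, then a scheme with no extra component must give identical totals for every k2. The classic Okapi document-length correction below is an assumed stand-in for BM25's extra component; Xapian's exact formula may differ, and all function names are hypothetical.

```python
def okapi_length_correction(k2, query_len, doc_len, avg_doc_len):
    # Assumed Okapi-style extra component; depends on k2 and document length.
    return k2 * query_len * (avg_doc_len - doc_len) / (avg_doc_len + doc_len)


def bm25_total(term_weights, k2, query_len, doc_len, avg_doc_len):
    return sum(term_weights) + okapi_length_correction(
        k2, query_len, doc_len, avg_doc_len)


def bm25_plus_total(term_weights, k2, query_len, doc_len, avg_doc_len):
    # BM25+ has no extra component, so k2 is accepted but never used.
    return sum(term_weights)


def k2_has_no_effect(total_fn, k2_values=(0.0, 1.0, 5.0), **doc):
    """Hypothetical test helper: True if total_fn ignores k2 for this doc."""
    totals = {total_fn(k2=k2, **doc) for k2 in k2_values}
    return len(totals) == 1


# Made-up per-term weights for one document that is longer than average.
doc = dict(term_weights=[1.7, 0.4, 2.9], query_len=3,
           doc_len=250, avg_doc_len=100)
print(k2_has_no_effect(bm25_total, **doc))       # False
print(k2_has_no_effect(bm25_plus_total, **doc))  # True
```

Picking a document whose length differs from the average matters: for such a document BM25's total does change with k2, so the same check cleanly distinguishes the two schemes.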

Coding Week 5: June 20-June 26

  • Made the remaining changes to the PR to fix some formatting issues.
  • Successfully installed the xapian-evaluation module on my system after fixing some compilation issues. Will perform the evaluation once we have the FIRE data, hopefully next week or so.
  • Started looking at the implementation of the next weighting function, PL2+.

Coding Week 6: June 27-July 3 (Midterm deadline June 27)

  • Using the same approach as for BM25+, implemented a new subclass, Xapian::PL2PlusWeight, for the PL2+ weighting function.
  • Wrote new feature tests; all passing.
  • Pushed changes to update the PR here.
  • [Edit] As discussed in the mid-term meeting, documentation is deferred to later weeks, or until we have evaluation results for the new weighting schemes.

Coding Week 7: July 4-July 10

  • The xapian-evaluation module was successfully set up after quite a bit of effort. Thanks to Gaurav for all the help.
  • Ran evaluations of the existing schemes to get familiar with how it works.
  • Will evaluate the BM25+ and PL2+ weighting schemes in the coming week, alongside implementing Dir+.

Coding Week 8: July 11-July 17

  • Modified the Xapian::LMWeight subclass to support a new smoothing method, Dir+.
  • Added new tests to validate the changes.
  • Started looking at the implementation of the Piv+ normalization function.
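As a sketch of what Dir+ adds to Dirichlet smoothing, assuming the usual rank-equivalent form of the Dirichlet query-likelihood score plus a lower-bounding term ln(1 + delta / (mu * p(t|C))) for each query term that occurs in the document, the score can be written as below. This is not Xapian's LMWeight API; the function name, the dict-based inputs and the example values for mu and delta are all hypothetical.

```python
import math


def dir_plus_score(query_terms, doc_tf, doc_len, coll_prob, mu, delta):
    """Hypothetical Dir+ query-likelihood score (not Xapian's LMWeight API).

    query_terms: list of query terms (repeats count as query frequency)
    doc_tf:      dict term -> frequency of the term in the document
    coll_prob:   dict term -> p(t | collection)
    With delta = 0 this reduces to the rank-equivalent Dirichlet score.
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # only matched terms get the tf and delta parts
        denom = mu * coll_prob[term]
        score += math.log(1 + tf / denom) + math.log(1 + delta / denom)
    # Document-length part, counted once per query term occurrence.
    score += len(query_terms) * math.log(mu / (doc_len + mu))
    return score


args = (["xapian"], {"xapian": 3}, 500, {"xapian": 1e-4})
print(dir_plus_score(*args, mu=2000.0, delta=0.0))   # plain Dirichlet
print(dir_plus_score(*args, mu=2000.0, delta=0.05))  # Dir+
```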

Coding Week 9: July 18-July 24

  • Added support for more weighting schemes to the xapian-evaluation module. Pull request opened here.
  • Ran evaluation tests of existing and new weighting schemes alike.
  • Put the result files together in a GitHub Gist here for easy access.
  • Next up: implementing the Piv+ normalization function next week.

Coding Week 10: July 25-July 31

  • Started implementing support for pivoted normalization in the Tf-Idf weighting scheme.

Coding Week 11: August 1-August 7

  • Completed pivoted normalization support.
  • Added new test cases for the normalization strings "PPP", "Ptn", "nPn" and "ntP".
  • Evaluated and compared the above normalization strings along with the default tf-idf normalization, "ntn". Results published here.
  • Found that the "Ptn" normalization performs better than all the other tf-idf normalizations evaluated so far.
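In Xapian's TfIdfWeight the normalization string has three characters: wdf normalization, idf normalization and final weight normalization. A rough sketch of the first two positions is below, assuming 'P' in the wdf position means pivoted tf normalization (with a Piv+ style delta added when delta > 0) and 'P' in the idf position means ln((N+1)/df). These meanings, the slope value 0.2 and all function names are illustrative assumptions; the sketch does not model the third character and rejects anything other than 'n' there.

```python
import math


def wdf_norm(code, wdf, doc_len, avg_doc_len, slope=0.2, delta=0.0):
    """Hypothetical wdf normalization for the first character."""
    if code == "n":  # none: raw within-document frequency
        return float(wdf)
    if code == "P":  # pivoted; Piv+ when delta > 0
        if wdf == 0:
            return 0.0
        tf_part = 1 + math.log(1 + math.log(wdf))
        pivot = (1 - slope) + slope * doc_len / avg_doc_len
        return tf_part / pivot + delta
    raise ValueError(f"unsupported wdf normalization {code!r}")


def idf_norm(code, term_freq, num_docs):
    """Hypothetical idf normalization for the second character."""
    if code == "n":  # none
        return 1.0
    if code == "t":  # standard idf
        return math.log(num_docs / term_freq)
    if code == "P":  # pivoted idf (assumed form)
        return math.log((num_docs + 1) / term_freq)
    raise ValueError(f"unsupported idf normalization {code!r}")


def term_weight(scheme, wdf, doc_len, avg_doc_len, term_freq, num_docs,
                delta=0.0):
    if scheme[2] != "n":
        raise ValueError("this sketch only models final normalization 'n'")
    return (wdf_norm(scheme[0], wdf, doc_len, avg_doc_len, delta=delta)
            * idf_norm(scheme[1], term_freq, num_docs))


for scheme in ("ntn", "Ptn", "nPn"):
    print(scheme, term_weight(scheme, wdf=4, doc_len=250, avg_doc_len=100,
                              term_freq=20, num_docs=5000))
```

The pivoted wdf component grows only doubly-logarithmically with wdf and is divided by a length-based pivot, so long documents with many repeated terms are damped far more than under the raw "n" wdf.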

Coding Week 12: August 8-August 14

  • Started writing documentation for the new weighting schemes, as well as for previously implemented but undocumented schemes.
  • Made some significant tweaks to the BM25+ implementation following Olly's code review.
  • Classes started for the new semester.

Submit code and evaluations: August 15-August 23

  • Completed the documentation. PR (to be merged later on)
  • Working on final tweaks and code clean-up following the code review of the PL2+, Dir+ and Piv+ implementations.
Last modified on 18/08/16 00:30:45