wiki:GSoC2020/WeightingSchemes/Journal

Community Bonding Week 1: May 4-May 10

  • Went through the implementation of existing normalizations of Tf-idf weighting scheme.

Community Bonding Week 2: May 11-May 17

  • Getting familiar with the macros used in writing automated test cases.

Community Bonding Week 3: May 18-May 24

  • Writing some mock test cases.

Community Bonding Week 4: May 25-May 31

  • Filled in the FIRE access form and emailed it to Gaurav (waiting for a reply). Went through the Xapian evaluation files, which need cleaning up to use Weight::Create().

Coding Week 1: June 1-June 7

Day 1

  • Implement Global frequency IDF and Augmented log normalization. (Entropy was put on hold because it requires wdf of all terms).

Day 2

  • Add test cases for Augmented log normalization.

Day 3

  • Add test cases for Global frequency IDF.

Day 4

  • Understanding how different stats are made available to the weighting schemes. (Getting wdf still requires more clarification.)

Day 5

  • Implement Square root and Log-Global frequency IDF normalization.
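For reference, the four normalizations added this week can be sketched as below. This is only an illustration using the commonly cited textbook forms, not the code from the PR: the function names and the parameter names `collfreq` (total occurrences of the term in the collection) and `termfreq` (number of documents containing the term) are my own labels, and the natural log is an assumption.

```python
import math


def wdf_aug_log(wdf):
    """Augmented log wdf normalization: 0.2 + 0.8 * ln(1 + wdf), or 0 if wdf is 0."""
    return 0.2 + 0.8 * math.log(1 + wdf) if wdf > 0 else 0.0


def wdf_sqrt(wdf):
    """Square root wdf normalization: sqrt(wdf - 0.5) + 1, or 0 if wdf is 0."""
    return math.sqrt(wdf - 0.5) + 1 if wdf > 0 else 0.0


def idf_global_freq(collfreq, termfreq):
    """Global frequency IDF: collection frequency divided by document frequency.

    Assumes termfreq > 0, i.e. the term occurs in at least one document.
    """
    return collfreq / termfreq


def idf_log_global_freq(collfreq, termfreq):
    """Log-global frequency IDF: ln(collfreq / termfreq + 1)."""
    return math.log(collfreq / termfreq + 1)
```

For example, a term that occurs 12 times across 4 documents gets a global frequency IDF of 3.0, and its log-global variant is ln(4).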

Coding Week 2: June 8-June 14

Day 1

  • Add test cases for Square root and Log-Global frequency IDF normalization.

Day 2

  • Make PR for the changes.

Day 3

  • Understanding the implementation of wdfdocmax in Tf-idf-max norm branch.

Day 4

  • Add code to pass wdfdocmax to get_sumextra().

Coding Week 3: June 15-June 21

Day 1, Day 2

  • Update the tfidf-maxwdf-norm branch with master. Manage all the conflicts.

Day 3

  • Add code to pass wdfdocmax to Xapian::Internal::PostList::get_weight().

Day 4

  • Add honey backend support for wdfdocmax.

Day 5

  • Add remote backend support for wdfdocmax.

Coding Week 4: June 22-June 28

Day 1

  • Add Incremented Global frequency IDF and Square Root Global frequency IDF normalisations.
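As a rough sketch of the two IDF normalizations named above, assuming the usual textbook definitions (the function and parameter names are illustrative and not taken from the branch): `collfreq` is the total number of occurrences of the term in the collection and `termfreq` is the number of documents containing it.

```python
import math


def idf_incremented_global_freq(collfreq, termfreq):
    """Incremented global frequency IDF: collfreq / termfreq + 1.

    Assumes termfreq > 0.
    """
    return collfreq / termfreq + 1


def idf_sqrt_global_freq(collfreq, termfreq):
    """Square root global frequency IDF: sqrt(collfreq / termfreq - 0.9).

    Since collfreq >= termfreq for any indexed term, the ratio is at least 1
    and the value under the square root stays positive.
    """
    return math.sqrt(collfreq / termfreq - 0.9)
```

A term occurring 12 times across 4 documents gives 4.0 for the incremented form and sqrt(2.1) for the square root form.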

Day 2

  • Understanding the calculation of wdf for synonyms.

Day 3

  • Make changes suggested in the PR after review.

Day 4, Day 5

  • Add enums support for Tf-Idf Weight.

Day 6

  • Make some modifications in #301 to make it ready for review.
  • Make PR for enums support.

Coding Week 5: June 29-July 5 (first evaluation due July 3)

  • Make changes suggested in PR after review.
  • Add API docs for enum parameters.
  • Update timeline.

Coding Week 6: July 6-July 12

Day 1,2

  • Make changes in the new normalizations to use enums.

Day 3,4

  • Make changes in API docs for enum parameters.

Day 5

  • Add documentation in the user guide for enum parameters.

Coding Week 7: July 13-July 19

Day 1

  • Get access to FIRE database.

Day 2

  • Make changes in PR 302 suggested after review.

Day 3

  • Build xapian-evaluation. Learn about autoconf and automake tools.

Day 4

  • Make changes in PR 302 suggested after review.

Day 5

  • Try using xapian-evaluation tools like trec_index.

Coding Week 8: July 20-July 26

Day 1

  • Run xapian-evaluation with existing code.

Day 2

  • Add API tests to PR 302.

Day 3,4

  • Make changes in config.cc and config.h.

Day 5

  • Work on adding changes in trec_search.cc.

Coding Week 9: July 27-August 2 (second evaluation due July 31)

Day 1

  • Make changes suggested in PR 298.

Day 2,3

  • Build xapian-evaluation on xapian-master.

Day 4

  • Make changes in trec_search to support Weight::Create.

Day 5

  • Implement the remaining normalizations that have all required stats available.
  • Make changes in the xapian-evaluation readme.

Coding Week 10: August 3-August 9

Day 1

  • Make changes in PR 301.
  • Understanding how doclen is stored.

Day 2

  • Add docs for using enums to specify normalizations in the user guide.
  • Plan the storing of unique terms in glass backend to get the exact value.

Day 3

  • Plan the storing of unique terms in the honey backend to get the exact value. (Glass backend changes would have caused compatibility problems.)

Day 4

  • Start coding to store unique terms as chunked streams.
  • Split PR 301 into smaller changes.

Day 5

  • Make changes suggested after PR 309 review.
  • Understanding how to make wdfdocmax synonym aware.

Coding Week 11: August 10-August 16

Day 1

  • Fix the synonym bug noticed by olly in PR 309.

Day 2

Day 3

  • Understand the working of Honey Cursor.
  • Make changes in PR 310 suggested after review.

Day 4

  • Make changes in alldocpostlist of honey. Also, correct the code of UniqTermsChunk.

Day 5

  • Implement max-norm and aug-norm along with tests.
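To illustrate why these two normalizations need wdfdocmax, here is a minimal sketch assuming the standard definitions (max-norm as wdf / wdfdocmax, aug-norm as 0.5 + 0.5 * wdf / wdfdocmax). The function names and the sample document are hypothetical and not taken from the branch.

```python
def max_norm(wdf, wdfdocmax):
    """Max normalization: wdf scaled by the largest wdf in the same document."""
    return wdf / wdfdocmax if wdfdocmax > 0 else 0.0


def aug_norm(wdf, wdfdocmax):
    """Augmented normalization: 0.5 + 0.5 * wdf / wdfdocmax, or 0 if the term is absent."""
    if wdf == 0 or wdfdocmax == 0:
        return 0.0
    return 0.5 + 0.5 * wdf / wdfdocmax


# Hypothetical document: term -> wdf (within-document frequency).
doc_wdfs = {"xapian": 4, "weight": 2, "scheme": 1}

# wdfdocmax is the maximum wdf of any term in the document.
wdfdocmax = max(doc_wdfs.values())

normalized = {
    term: (max_norm(wdf, wdfdocmax), aug_norm(wdf, wdfdocmax))
    for term, wdf in doc_wdfs.items()
}
```

Both values depend on a per-document statistic rather than only on the term's own wdf, which is why wdfdocmax has to be stored by the backend and passed through to the weighting scheme.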

Coding Week 12: August 17-August 23

Day 1

  • Understand the working of honey_compact.cc.

Day 2,3

  • Update TfIdfWeight::create_from_parameters() to specify enum parameters.

Day 4

  • Understand basic Perl syntax: go through xapian-core/languages/collate-sbl. (This is the first time I am working with Perl!)

Day 5

  • Write a Perl script to match constant names to normalisations.

Submit code and evaluations: August 24-August 31

Last modified on 22/08/20 14:28:24