Table of Contents
- Community Bonding Week 1: May 6–May 12
- Community Bonding Week 2: May 13–May 19
- Community Bonding Week 3: May 20–May 26
- Community Bonding Week 4: May 27–June 2 (work begins May 30)
- Coding Week 3: June 3–June 9
- Coding Week 4: June 10–June 16
- Coding Week 5: June 17–June 23
- Coding Week 6: June 24–June 30 (evaluations: June 26–30)
- July 1 - August 14
- August 15 - August 18
- Coding Week 14: August 19–August 25 (evaluations: August 21–29)
- Final Evaluations: August 26–August 29
Community Bonding Week 1: May 6–May 12
Started work on KMeans by beginning an implementation of Elkan's k-means (KMeans that uses the triangle inequality to skip unnecessary distance computations and so improve performance). Also fixed a few issues in the current KMeans PR. Work in this period will be a little slow because of exams until the end of the month.
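To illustrate the idea behind the triangle inequality optimization, here is a minimal standalone sketch. It is not the code from the PR, and it covers only the simplest pruning rule from Elkan's method rather than the full algorithm with per-point bounds: if c is the best centre found so far for a point x and d(c, c') >= 2 * d(x, c), then c' cannot be closer to x than c, so d(x, c') never needs to be computed. The names `Point`, `Assignment`, `centre_distances` and `assign_with_pruning` are made up for this example, and Euclidean distance is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Precompute all centre-to-centre distances (done once per iteration).
std::vector<std::vector<double>> centre_distances(
        const std::vector<Point>& centres) {
    std::size_t k = centres.size();
    std::vector<std::vector<double>> dist(k, std::vector<double>(k, 0.0));
    for (std::size_t i = 0; i < k; ++i) {
        for (std::size_t j = i + 1; j < k; ++j) {
            dist[i][j] = dist[j][i] = euclidean(centres[i], centres[j]);
        }
    }
    return dist;
}

// The chosen centre for a point, plus how many point-to-centre
// distances were actually computed to find it.
struct Assignment {
    std::size_t centre;
    std::size_t distances_computed;
};

// Assign x to its nearest centre, skipping every centre j for which
// d(best, j) >= 2 * d(x, best). By the triangle inequality such a
// centre cannot be closer to x than the current best one.
Assignment assign_with_pruning(
        const Point& x,
        const std::vector<Point>& centres,
        const std::vector<std::vector<double>>& centre_dist) {
    assert(!centres.empty());
    std::size_t best = 0;
    double best_dist = euclidean(x, centres[0]);
    std::size_t computed = 1;
    for (std::size_t j = 1; j < centres.size(); ++j) {
        if (centre_dist[best][j] >= 2.0 * best_dist) {
            continue;  // pruned: no distance computation needed
        }
        double d = euclidean(x, centres[j]);
        ++computed;
        if (d < best_dist) {
            best_dist = d;
            best = j;
        }
    }
    return {best, computed};
}
```

With well separated centres, most points sit close to one centre, so most of the other centres are pruned and only one or two distances are computed per point.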
Community Bonding Week 2: May 13–May 19
Working on getting the KMeans PR (https://github.com/xapian/xapian/pull/149) into shape. I will continue to improve the API by moving to classes that hold reference-counted pointers to internal classes.
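As a rough illustration of what "classes that use refcounted pointers to internal classes" means, here is a small standalone sketch. It is not the code from the PR: `PointSet` is a hypothetical class invented for this example, and `std::shared_ptr` stands in for whatever reference-counted pointer type the library itself uses.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Public API class: it exposes only member functions and holds a
// reference-counted pointer to a hidden implementation class.
class PointSet {
    class Internal;                       // defined out of the public view
    std::shared_ptr<Internal> internal;   // stand-in refcounted pointer

  public:
    PointSet();
    void add(double value);
    std::size_t size() const;
};

// In a real library this definition would live in a private header or
// a .cc file, so changing its data members does not affect users of
// the public class.
class PointSet::Internal {
  public:
    std::vector<double> values;
};

PointSet::PointSet() : internal(std::make_shared<Internal>()) {}

void PointSet::add(double value) { internal->values.push_back(value); }

std::size_t PointSet::size() const { return internal->values.size(); }
```

Copying a `PointSet` copies only the pointer, so both copies refer to the same internal object, and the internal object is freed when the last copy goes away. This makes the API objects cheap to pass by value.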
Community Bonding Week 3: May 20–May 26
No work done because of university exams.
Community Bonding Week 4: May 27–June 2 (work begins May 30)
No work done because of university exams.
Coding Week 3: June 3–June 9
Work on moving the classes to PIMPL (pointer to implementation) implementations.
Coding Week 4: June 10–June 16
Work on the review of the PIMPL classes and start work on dimensionality reduction, since clustering with high dimensionality takes too long to run.
1) Removal of stopwords
2) Removal of other words which might not be important
3) Start implementing a way to test KMeans and other clusterers
Coding Week 5: June 17–June 23
Worked through the review of PR 149 (https://github.com/xapian/xapian/pull/149) by James and Olly and made all the necessary changes. Also started discussing how to approach stopword removal and stemming. Started a PR for stopword removal: https://github.com/richhiey1996/xapian/pull/2
I opened this PR against my own fork because PR 149 had not been merged yet.
Coding Week 6: June 24–June 30 (evaluations: June 26–30)
Started working on dimensionality reduction. PR 149 has finally been merged and closed. Goals for this week:
1) Make the stopword removal PR ready for merge.
2) Start a PR for stemming. Since we are discarding all unstemmed terms, it is important to also remove the 'stemmed stopwords' that exist in the document termlist. For this, as Olly suggested, it would be best to have a subclass of the Stopper class that stores and identifies the stemmed forms of the stopwords too (SimpleStopper doesn't do this). So I will be working on that and getting this PR ready for merge.
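The stopper idea in goal 2 can be sketched as below. This is a standalone illustration and not the code from the PR: `StemmedStopper`, `StemFn` and `strip_plural_s` are names invented for the example, and the toy stemmer only strips a trailing "s" so that the sketch runs without Xapian. In Xapian itself the class would presumably derive from Xapian::Stopper and use a Xapian::Stem object as the stemming function.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>

// Any callable that maps a word to its stem.
using StemFn = std::function<std::string(const std::string&)>;

// A stopper that records each stopword together with its stemmed
// form, so that stemmed terms in a termlist are recognised too.
class StemmedStopper {
    std::set<std::string> stop_words;
    StemFn stem;

  public:
    explicit StemmedStopper(StemFn stem_) : stem(std::move(stem_)) {}

    // Store both the original word and its stem.
    void add(const std::string& word) {
        stop_words.insert(word);
        stop_words.insert(stem(word));
    }

    // Return true if the term is a stopword or a stemmed stopword.
    bool operator()(const std::string& term) const {
        return stop_words.count(term) != 0;
    }
};

// Toy stemmer used only for this illustration: strips one trailing
// 's' from words longer than two characters.
std::string strip_plural_s(const std::string& word) {
    if (word.size() > 2 && word.back() == 's') {
        return word.substr(0, word.size() - 1);
    }
    return word;
}
```

With the toy stemmer, adding "was" stores both "was" and "wa", so a termlist that contains only stemmed terms still has the stemmed stopword filtered out.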
July 1 - August 14
Added stopword removal and stemming, and moved the RoundRobin clusterer from the public API to the tests.
August 15 - August 18
Work on the triangle inequality optimization and the ClusterEvaluation class, and start separate PRs for them as soon as possible.