Table of Contents
- Community Bonding Week 1 - 3: April 23 - May 14
- Coding Week 1: May 21-May 27
- Coding Week 2: May 28-June 3
- Coding Week 3: June 4-June 10
- Coding Week 4: June 11-June 17
- Coding Week 5: June 18-June 24
- Coding Week 6: June 26-July 1
- Coding Week 7: July 2-July 8
- Coding Week 8: July 9-July 15 (Midterm deadline July 13)
- Coding Week 9: July 17-July 22
- Coding Week 10: July 24-July 29
- Coding Week 11: July 31-August 5
- Coding Week 12: August 6-August 12
- Coding Week 13: August 13-August 20 (Final evaluation based on work up …
Community Bonding Week 1 - 3: April 23 - May 14
Issues for Unigram Implementation:
- Decision for Bounds of Unigram Weighting Scheme.
log(k*min(wdf_max/doc_length_lower_bound,1.0)) with checks for doc_length zero/
- Compiled the latest repository and fixed a small bug to make the code compatible with gcc 4.7 (my first accepted patch).
- Forked the Xapian git repository and learned how to push changes to it.
- Decision on clamping of negative values due to the log:
sum(i=1,...,n: if max(log(K.Pi), 0) == 0 then max(log(K.Pcollec.i), 0) else log(K.Pi))
- Decision to provide a user API to select the value of K and which smoothing to use, with sensible defaults.
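The clamping decision above can be sketched in a few lines. This is only an illustration of the rule as written in these notes, not the code that went into Xapian: the function name clamped_log_sum and its parameters are hypothetical, and the probabilities are assumed to be computed elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sums the clamped log weights of the query terms.
// p_doc[i]  : probability of term i under the document model (assumed input)
// p_coll[i] : probability of term i under the collection model (assumed input)
// k         : the user-selectable multiplier K from the notes
double clamped_log_sum(const std::vector<double>& p_doc,
                       const std::vector<double>& p_coll,
                       double k)
{
    assert(p_doc.size() == p_coll.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < p_doc.size(); ++i) {
        // Clamp the document contribution so it can never be negative.
        double doc_part = std::max(std::log(k * p_doc[i]), 0.0);
        if (doc_part == 0.0) {
            // The document term clamps to zero, so fall back to the
            // collection probability, itself clamped at zero.
            sum += std::max(std::log(k * p_coll[i]), 0.0);
        } else {
            sum += doc_part;
        }
    }
    return sum;
}
```

With K = 10, a term with document probability 0.5 contributes log(5), while a term with document probability 0.01 clamps to zero and falls back to its collection probability.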
Coding Week 1: May 21-May 27
Implementation of Uni-gram Language Model: WeeksBlogPost
- Decided which smoothing techniques to implement and documented the decision in NOTES.
- Handling of negative values from the log in the sum formula, and the bound for the optimization.
- Implementation of a parametric constructor for the Uni-gram Language Model.
- Addition of smoothing to the Uni-gram Language Model implementation.
- Addition of per-document statistics (number of unique terms in a document) to the Xapian architecture, similar to document length.
- Switched the back-end used for the implementation from Chert to Brass.
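For readers unfamiliar with smoothing in a unigram language model, here is a minimal sketch of two widely used techniques, Jelinek-Mercer and Dirichlet prior smoothing. This timeline does not say which techniques were chosen (that is recorded in NOTES), so these two are an assumption made for illustration, and the function and parameter names are hypothetical.

```cpp
#include <cassert>

// Jelinek-Mercer smoothing: linear interpolation between the document
// model and the collection model. lambda is the weight of the collection.
double jelinek_mercer(double wdf, double doc_len,
                      double coll_freq, double coll_len, double lambda)
{
    assert(doc_len > 0.0 && coll_len > 0.0);
    return (1.0 - lambda) * (wdf / doc_len) + lambda * (coll_freq / coll_len);
}

// Dirichlet prior smoothing: adds mu pseudo-occurrences to the document,
// distributed according to the collection model.
double dirichlet_prior(double wdf, double doc_len,
                       double coll_freq, double coll_len, double mu)
{
    assert(coll_len > 0.0 && doc_len + mu > 0.0);
    return (wdf + mu * (coll_freq / coll_len)) / (doc_len + mu);
}
```

Both functions return a non-zero probability for a term that is absent from the document (wdf = 0) as long as it occurs somewhere in the collection, which is the reason for smoothing in the first place.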
Coding Week 2: May 28-June 3
May 29
- Wrote documentation and a tutorial for the Unigram Language Model weighting class in the new non-generated documentation.
May 30
- Added 5 test cases for the UnigramLMWeight class and fixed the bugs they exposed.
May 31
- Tested the bindings for Xapian and wrote a smoke test for the Uni-gram implementation.
- Explored the code more rigorously to work out exactly how the Bi-gram implementation should fit in (notes on this will be added to the NOTES section).
June 1
- Generated a test coverage report for Xapian and, more interestingly, for the new unigramlmweight.cc source file (test coverage is 91.1%).
- Code exploration for Bigram Integration proposal bigramproposal
June 2
- Code exploration for Bigram Integration proposal
- Work of Bigram Integration Proposal completed.
June 3
- Changed index_text to create bi-grams and add them to documents. ReadMore
- Changed the API of the termgenerator class so that API users can enable bi-gram indexing, which is disabled by default. ReadMore
- Changed how stopwords that need to be removed are handled when forming bi-grams. ReadMore
- Added the DocumentBigramTerm class. ReadMore
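The idea behind bi-gram creation during indexing can be sketched as follows. This is a standalone illustration, not the change made to index_text: make_bigrams is a hypothetical helper, and both joining the two words with a single space and dropping stopwords before pairing are assumptions.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Builds bi-grams from a token sequence. Stopwords are dropped first, so
// each bi-gram joins two neighbouring non-stopword tokens.
std::vector<std::string> make_bigrams(const std::vector<std::string>& tokens,
                                      const std::set<std::string>& stopwords)
{
    std::vector<std::string> kept;
    for (const std::string& token : tokens) {
        if (stopwords.count(token) == 0) kept.push_back(token);
    }

    std::vector<std::string> bigrams;
    for (std::size_t i = 0; i + 1 < kept.size(); ++i) {
        // Exactly one separator, with no trailing space after the second word.
        bigrams.push_back(kept[i] + " " + kept[i + 1]);
    }
    return bigrams;
}
```

For the tokens "the quick brown fox" with "the" as a stopword, this yields "quick brown" and "brown fox".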
Coding Week 3: June 4-June 10
June 4
- Updated the document class to store bi-grams for the bi-gram list. ReadMore CodeCommit
- Updated the Database add_document_ path to start storing the data in tables. ReadMore CodeCommit
June 5
- Added a method to support access to bi-grams. ReadMore CodeCommit
June 6
- Added Bigramterm_iterator to iterate over the bi-grams. ReadMore CodeCommit
June 7
- Made internal document-level changes to the BrassBigramTermList table to add bi-grams and support access to them. ReadMore CodeCommit
- Added an UnImplementedMethod exception for the other backends such as chert and remote, as I couldn't find a workaround to disable them: they are tightly integrated into the xapian-delve binary and the remote backend.
June 8
- Added support to the inverter class for storing postlist changes for bigrams. CodeCommit
June 9
- Documented bi-gram indexing and access to the bigram termlist and postlist in the non-generated documentation. DocumentationCommit
- Adding methods to merge postlist changes of bigrams in the backend (will use the current postlist infrastructure, as it seems to work correctly).
June 10
- Checked the backend for errors against the existing regression tests. (Only one test, cursordelbug1, failed due to the newly added changes; it covers a previous bug at BugTicket.)
Weekend - off
Coding Week 4: June 11-June 17
June 11
- Deciding upon a method to access the posting list, based on the matcher infrastructure and the query calling infrastructure. (Since we now treat bi-grams as terms, the calling infrastructure is the same.)
- Analyzing wildcard query expansion for problems caused by storing bi-grams.
- Discussed what has been done and got a suggestion to change the implementation to "treat bi-grams as terms".
June 12
- Changed the implementation to "treat bi-grams as terms". CodeCommit
- Adjusted the Bigram Iterator to show just the bi-grams with the new implementation. "There should be some way to iterate the unigrams also, but left it for later as it is not very important (mentioned by Olly)." CodeCommit
June 13
- Discussion on what document statistics to integrate into the backend and how, and on the key under which to store the new per-document statistics. (The document statistics need to be stored in the backend under new keys as a posting list entry.)
June 14
- Implementation of document statistics in the backend (a PostList entry stores the document statistics). CodeCommit ReadMore
June 15
- GSoC Meetup (OFF)
June 16
- Implemented methods to access the stored document statistics. CodeCommit ReadMore
- Regression testing using the existing tests in the tests folder.
June 17
- Testing the backend for bugs based on the tests.
- Temporary fixes for bugs caused by the implementation of document statistics in the backend. CodeCommit ReadMore
Coding Week 5: June 18-June 24
June 18
- Analyzing and understanding changes required to query object and query parser for bi-gram implementation.
June 19
- Backend changes proposed by Olly for changing the key and the per-document stats.
- Analyzing and understanding changes required to query object and query parser for bi-gram implementation.
June 20
- Per-document statistics in the matcher infrastructure or query parser object (linkage to back-end functions). CodeCommit
- Fixing bugs in the implementation found during regression testing.
June 21
- Fixing bugs in the implementation found during regression testing. (Issues with the writable database: accessing the writable database, i.e. the changes in the inverter and elsewhere, was causing some tests to fail.) CodeCommit
- Fixed the one failing regression test, cursordelbug. CodeCommit
June 22
- Fixed a bug where the all-docs postlist was failing due to a typo in a writable database function. CodeCommit
- Updated remove document and replace document for the document statistics added in the back-end. CodeCommit
- Changed the document length of the term list to be the sum of wdf for unigrams + bigrams. CodeCommit
- Experiment on the time difference between using the document length from the termlist table and from the postlist table for get_eset. ReadMore
June 23
- Updated Brass compaction for the new additions to the backend.
- Regression tests for Brass compaction; fixed the bugs behind the failing compact* tests. CodeCommit
June 24
- Regression tests for Brass compaction; fixed the bugs behind the failing compact* tests. CodeCommit
June 25
- Follow the deprecation policy and discuss whether to add get_doclength() or roll back and add get_stats(). CodeCommit
Coding Week 6: June 26-July 1
June 26
- Understand the git merge workflow that jaylett suggested, understand its benefits, and work with it. GitHub
- Check whether the changes made to automake are necessary, as asked by jaylett, and reply. Mastercorrection
- Make sure the current master is up to date with the branch, everything compiles, and the test suite passes.
June 27
- Added GitHub best practices to the NOTES. GitPractices
- Write the document on query-level changes and ask for reviews (stretched to the next day). Archive
June 28
- Write the document on query-level changes and ask for reviews (stretched to the next day). Archive
- Implemented a test of bi-grams for group terms. CodeTestCommit
June 29
- Brushed up the changes and the remaining work, and prepared a checked roadmap of progress so far for the review meeting.
- Review Meeting.
June 30
OFF
July 1
OFF
Coding Week 7: July 2-July 8
July 2
- Stemming for bi-grams. CodeCommit
- Removed extra spaces from the end of bigrams. CodeCommit
- Made the addition of bi-grams in group terms more efficient by using a single iterator instead of two. CodeCommit
- Added functionality to select whether or not to add bigrams to the Query. CodeCommit
- Bigrams for the Group Query type are now handled in a single place, as bigrams are unaffected by multi auto synonyms. CodeCommit
July 3
- Added support for bigrams in Terms, i.e. for NEAR, PHRASE and ADJ queries. CodeCommit
- Discussion about refactoring of the language model Weight, evaluation, stats for title, body, etc.
July 4
- Review of work on Weight; fixed bugs in Weight.
July 5
- Refactored UnigramLMweight to LMWeight and fixed bugs in LMWeight. CodeCommit
July 6
OFF
July 7
- Adjust LMWeight and Weight to check for bigrams. CodeCommit
July 8
- Adjust LMWeight to be parametric for all three: unigram, bigram and mixture model.
Coding Week 8: July 9-July 15 (Midterm deadline July 13)
July 9
- Added support for including bigrams in Weight, and a constructor for the user to choose which gram model to use. CodeCommit
July 10
- Looked over evaluation module development in Terrier and Andy's TREC code.
July 11
- Planned the evaluation module along the lines of Terrier and decided to carry on with FIRE data for a while.
- Forked Andy's code, given by Olly. CodeCommit
- Hacked the code so that FIRE queries are parsed by the module in the same way as TREC queries. CodeCommit
July 12
- Hacked the code so that the FIRE dataset is indexed by the module in the same way as the TREC dataset. CodeCommit
- Completed a run of the code on the FIRE data and produced a result file for it.
- Started work on the evaluation module: the QRel assessments for a query can be stored in the class QRelInMemory. CodeCommit
- Defined the overall QRel class TRECQrel. CodeCommit
July 13
- Added a Load function to TRECQREL. CodeCommit
July 14 to July 16
- Using a TrecQrel object, it is now possible to load a Qrel file, check the status of a document for a query, and access all relevant documents. CodeCommit
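A rough sketch of what loading and querying relevance judgements involves is given below. It assumes the standard TREC qrels layout of four whitespace-separated columns per line (query id, iteration, document id, relevance). QrelStore and its methods are hypothetical names used for illustration; they are not the TRECQrel or QRelInMemory classes from the project.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical in-memory store of relevance judgements.
class QrelStore {
    std::map<std::string, std::set<std::string>> relevant_;

public:
    // Reads lines of the form: <query id> <iteration> <doc id> <relevance>.
    // Returns the number of judgements read; malformed lines are skipped.
    std::size_t load(std::istream& in) {
        std::size_t count = 0;
        std::string line;
        // getline also returns a final line that has no trailing newline.
        while (std::getline(in, line)) {
            std::istringstream fields(line);
            std::string query, iteration, doc;
            int relevance = 0;
            if (!(fields >> query >> iteration >> doc >> relevance)) continue;
            ++count;
            if (relevance > 0) relevant_[query].insert(doc);
        }
        return count;
    }

    // True if the document was judged relevant for the query.
    bool is_relevant(const std::string& query, const std::string& doc) const {
        auto it = relevant_.find(query);
        return it != relevant_.end() && it->second.count(doc) > 0;
    }

    // All documents judged relevant for the query (empty if none are known).
    const std::set<std::string>& relevant_docs(const std::string& query) const {
        static const std::set<std::string> none;
        auto it = relevant_.find(query);
        return it == relevant_.end() ? none : it->second;
    }
};
```

Only documents with a positive relevance value are stored, so a document judged non-relevant and a document never judged both report as not relevant.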
Coding Week 9: July 17-July 22
July 17 to July 18
- Added a generic Evaluation class and a class for ad hoc evaluation. CodeCommit
- Implemented basic MAP and a printing function for MAP in adhoc_eval.cc. CodeCommit
July 19 & July 20
- Improved the evaluation and the write-evaluation function of Adhoc; MAP now works correctly. CodeCommit
- Updated the Makefile for the new evaluation changes. CodeCommit
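For reference, MAP (mean average precision) can be computed as sketched below. This follows the standard definition and is not the code in adhoc_eval.cc; the function names are hypothetical, and the ranked list is assumed to contain no duplicate documents.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Average precision for one query: the mean of the precision values taken
// at each rank where a relevant document appears, divided by the total
// number of relevant documents (so unretrieved relevant documents count as 0).
double average_precision(const std::vector<std::string>& ranked,
                         const std::set<std::string>& relevant)
{
    if (relevant.empty()) return 0.0;
    double sum = 0.0;
    std::size_t hits = 0;
    for (std::size_t i = 0; i < ranked.size(); ++i) {
        if (relevant.count(ranked[i]) > 0) {
            ++hits;
            sum += static_cast<double>(hits) / static_cast<double>(i + 1);
        }
    }
    return sum / static_cast<double>(relevant.size());
}

// MAP: the arithmetic mean of the per-query average precision values.
double mean_average_precision(const std::vector<double>& per_query_ap)
{
    if (per_query_ap.empty()) return 0.0;
    double sum = 0.0;
    for (double ap : per_query_ap) sum += ap;
    return sum / static_cast<double>(per_query_ap.size());
}
```

For a ranking d1, d2, d3, d4 where d1 and d3 are relevant, the average precision is (1/1 + 2/3) / 2.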
July 21 to July 23
- Fixed the Makefile to remove the newly added executable file on clean. CodeCommit
- Added relevance precision. CodeCommit
- Fixed the Makefile, improved the display of evaluation results, and added statistics on relevant and retrieved documents. CodeCommit
Coding Week 10: July 24-July 29
July 24
- Added precision by rank; precision by recall is left to implement. CodeCommit
- Added precision at recall to the evaluations. CodeCommit
- IRC meeting to discuss the evaluation module.
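The two measures mentioned above can be sketched as follows, using their usual definitions: precision at rank k, and interpolated precision at a recall level (the highest precision at any rank whose recall reaches that level). Whether the evaluation module uses the interpolated form is an assumption, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Precision at rank k: fraction of the top k results that are relevant.
// If fewer than k documents were retrieved, the missing ones count as misses.
double precision_at_rank(const std::vector<std::string>& ranked,
                         const std::set<std::string>& relevant,
                         std::size_t k)
{
    if (k == 0) return 0.0;
    std::size_t limit = std::min(k, ranked.size());
    std::size_t hits = 0;
    for (std::size_t i = 0; i < limit; ++i) {
        if (relevant.count(ranked[i]) > 0) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(k);
}

// Interpolated precision at a recall level: the best precision observed at
// any rank where recall is at least recall_level (0.0 if never reached).
double precision_at_recall(const std::vector<std::string>& ranked,
                           const std::set<std::string>& relevant,
                           double recall_level)
{
    if (relevant.empty()) return 0.0;
    double best = 0.0;
    std::size_t hits = 0;
    for (std::size_t i = 0; i < ranked.size(); ++i) {
        if (relevant.count(ranked[i]) > 0) ++hits;
        double recall = static_cast<double>(hits) / static_cast<double>(relevant.size());
        double precision = static_cast<double>(hits) / static_cast<double>(i + 1);
        if (recall >= recall_level) best = std::max(best, precision);
    }
    return best;
}
```

For a ranking d1, d2, d3, d4 where d1 and d3 are relevant, precision at rank 2 is 1/2 and interpolated precision at full recall is 2/3.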
July 25 & July 26
- Checked the evaluation module manually for one query and fixed a bug.
- Corrected the missing last query in Qrel. CodeCommit
- Summary of the IRC meeting. ReadSummary
July 27
- Implemented writing of evaluation results in column format to make them easy to manipulate. CodeCommit
July 28 & 29
- Redirected query formation to the QueryParser module instead of building the query ourselves by splitting into words. CodeCommit
- Redirected indexing, stemming and stopword handling to the TermGenerator module of Xapian. CodeCommit
July 30
- Made the weighting scheme and bigram use configurable by the user through a config file. CodeCommit
- Compiled and produced results for all the weighting schemes, with and without bigrams. ResultDocument
Coding Week 11: July 31-August 5
July 31
- Removed an implementation bug from LMWeight (the bug was overwriting the actual weight value). CodeCommit
- Tested evaluation results for various parameters. ResultDocument
- Test for Bi-gram implementation in back-end.
August 1
- Reviewed code of Language Model Weighting Scheme.
August 2
- IRC meeting to discuss the evaluation module and problems with bigrams. MeetingNotes
August 3 - August 5
- Transit to Bangalore to join HP.
Coding Week 12: August 6-August 12
August 6
- Indexed the collection with stopwords and fixed a bug in the stopword implementation: the stopper was not configured correctly. CodeCommit
- Evaluation results with stopwords included. ResultDocument
August 7
- Found one cause of the low results: the log param was set to a high value. Checked whether a large log param hurts performance, and changing it improved performance.
August 8 - August 13
- Working on improving Uni-gram model.
Coding Week 13: August 13-August 20 (Final evaluation based on work up to August 20)
August 14 - August 15
- Working on finding and fixing bugs in the Bi-gram implementation.
August 16 - August 17
- Working on user documentation. DocumentationCommit
August 18 - August 19
- Solved a bug in QueryParser. CodeCommit
- Solved a bug in TermGenerator. CodeCommit
- Working on test cases for the Bigram model. CodeCommit
August 20
- Cleaning up the code.
- Writing pending test cases for bigrams.
Last modified on 01/26/16 10:10:43