TODOs: Bold ---> Pending Work
- Finding/analyzing the feasibility of, and changes needed for, adding the smoothing schemes Absolute Discounting, Dirichlet and Jelinek-Mercer. {Decided and added in Notes}
- Adding and implementing the discussed log trick, handling negative values, and clamping the log output. {Done as per discussions on IRC and the mailing list}
- Writing documentation for the Uni-gram class.
- Running API tests to check compatibility with the previous version.
- Adding a user API in uni-gram modelling for smoothing and log parameters. {Done; documentation is available in NOTES}
- Checking the Java and other bindings for correct working of the uni-gram LM weighting scheme.
- Bi-gram LM proposal on storage of bi-grams; bounds and log issues will be solved similarly to the uni-gram model.
- Implement the BigramTokenization and UnigramTokenization classes for Termgenerator and use them. (Not implemented; a different approach of in-place tokenization was followed instead)
- Implement the DocumentBigramTerm class.
- Update the Document class for storing bigrams.
- Update Database add_document to support bigrams.
- TermlistTable changes to store the termlist of bigrams.
- Documentation of indexing and accessing bigrams.
- Adding or changing PostListTable to store the postlist of bigrams. (Not required; uses the same infrastructure as Postlist)
- Analyzing changes to the query object and query parser for the bi-gram implementation.
- Per-document statistics in the matcher infrastructure or query parser object. (Linkage to back-end functions)
- Solving the bugs due to the writable back-end; solved the problem for uncommitted databases.
- Integrated the back-end compact with per-document statistics.
- Make a summary of the evaluation module.
- Add check_adhoceval to the evaluation module.
- ~Draw up a list of things to investigate to find why precision is low~
- ~Reply to Parsenjit sir asking for the TREC Collection~
- ~Review the Unigram and Bigram papers again to see if something was missed, causing the low precision~
- Review the TODO list and estimate how much time each item will require
- ~Writing tests for the Bi-gram language model, Bigram implementation, Unigram~
- Comments by Jaylett on last meeting
- ~Add last meeting to the Meetings Note~
- ~Index the FIRE Collection with Bigrams and Stopwords~
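The smoothing and log-clamping items above can be illustrated with a minimal sketch. It uses the textbook forms of Jelinek-Mercer and Dirichlet smoothing; the function names, parameter names and default values are assumptions made for illustration and are not taken from the project's code or Notes. The clamping step assumes that the "log trick" means multiplying the probability by a log parameter before taking the log and replacing any negative result with zero.

```python
import math


def jelinek_mercer(tf, doc_len, cf, coll_len, lam=0.7):
    """Textbook Jelinek-Mercer: mix document and collection estimates.

    tf/doc_len is the document model, cf/coll_len the collection model,
    lam (hypothetical default) is the weight given to the collection model.
    """
    return (1.0 - lam) * (tf / doc_len) + lam * (cf / coll_len)


def dirichlet(tf, doc_len, cf, coll_len, mu=2000.0):
    """Textbook Dirichlet prior smoothing with pseudo-count mu."""
    return (tf + mu * (cf / coll_len)) / (doc_len + mu)


def clamped_log_weight(prob, log_param):
    """Return log(log_param * prob), clamped so it is never negative.

    log_param is an assumed multiplier that shifts small probabilities
    above 1 before the log is taken.
    """
    if prob <= 0.0 or log_param <= 0.0:
        return 0.0
    return max(math.log(log_param * prob), 0.0)


# Illustrative numbers only: a term seen twice in a 100-term document
# and 50 times in a 100000-term collection.
p = jelinek_mercer(tf=2, doc_len=100, cf=50, coll_len=100000)
small = clamped_log_weight(p, log_param=1.0)     # log is negative, so clamped
large = clamped_log_weight(p, log_param=1000.0)  # stays positive
```

With a small log parameter every term weight in this example clamps to zero, so documents become indistinguishable; a larger multiplier keeps the weights positive and preserves their ordering.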
Things to Investigate for Bigram:
- Index the collection with StopWords and see if it improves. --> Improved the Performance
- Check Log Param if setting large value hurts the performance. --> Improved the Performance
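To make the stopword experiment concrete, here is a rough sketch of generating bigrams in place during a single pass over the tokens, with and without stopword filtering. The function `index_terms`, its signature and the space-joined bigram format are hypothetical and do not reflect the actual Termgenerator changes; the sketch only shows how the set of indexed bigrams differs between the two settings.

```python
def index_terms(tokens, stopwords=frozenset()):
    """Return (unigrams, bigrams) built in one pass over tokens.

    Tokens found in `stopwords` are skipped entirely, so the bigram is
    formed from the surviving neighbours on either side of the gap.
    """
    unigrams, bigrams = [], []
    prev = None
    for tok in tokens:
        tok = tok.lower()
        if tok in stopwords:
            continue
        unigrams.append(tok)
        if prev is not None:
            # Assumed storage format: the two terms joined by a space.
            bigrams.append(prev + " " + tok)
        prev = tok
    return unigrams, bigrams


tokens = ["State", "of", "the", "art"]

# Stopwords kept in the index: every adjacent pair is preserved.
_, with_stopwords = index_terms(tokens)

# Stopwords dropped: only one bigram survives, spanning the gap.
_, without_stopwords = index_terms(tokens, stopwords={"of", "the"})
```

Here `with_stopwords` holds three bigrams that mirror the original phrase, while `without_stopwords` holds a single bigram pairing words that were never adjacent in the text.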
Last modified on 08/19/12 20:12:23