[[TOC(inline)]]

== Community Bonding Week 2: April 28-May 4 ==

== Community Bonding Week 3: May 5-May 11 ==

== Community Bonding Week 4: May 12-May 19 ==

== Coding Week 1: May 20-May 25 ==

=== ''May 20'' ===
 * '''Basic Normalizer''': I started from scratch (I worked only a few hours, since I had to travel urgently).
branch: not yet.
documentation I needed: Snowball.

=== ''May 21'' ===
 * '''Test samples''': I browsed through many Arabic corpora and chose this one because it covers diverse topics: Motaz K. Saad and Wesam Ashour, "OSAC: Open Source Arabic Corpus", 6th ArchEng International Symposiums, EEECS’10: the 6th International Symposium on Electrical and Electronics Engineering and Computer Science, European University of Lefke, Cyprus, 2010.
branch: N/A.
documentation I needed: Omega.

=== ''May 22-23-24'' ===
 * '''Stopwords''': basic Arabic stopword list, containing about 10k words (counting all forms): [https://github.com/assem-ch/xapian/blob/291b3d3030c97c5bd349cb14c6b1868d88b56271/xapian-data/stopwords/arabic/asw.list link]
 * '''Stopwords''': I also included the stopword lists of other languages from the Snowball project, [http://snowball.tartarus.org/algorithms/english/stop.txt e.g. English stopwords].
 * '''Stopwords''': updated the Arabic stopword list: [https://raw.githubusercontent.com/assem-ch/xapian/stopwords/xapian-core/languages/stopwords/arabic/stop.txt Arabic stopword list]
   * removed many words that can also occur as ordinary (non-stop) words
   * removed the variant forms of words
 * '''Stopwords''': worked on loading stopwords from a file.
branch: [https://github.com/assem-ch/xapian/tree/stopwords stopwords]
documentation I needed: autotools, SWIG.

== Coding Week 2: May 26-June 1 ==

=== ''May 26'' ===
 * '''Testing environment''': I indexed the chosen corpus using Omega and tried searching and other operations on it.

=== ''May 27-28'' ===
 * '''Stopwords''': continued working on loading stopwords from a file.
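The Snowball stopword lists (such as the English `stop.txt` linked in the May 22-24 entry) put words at the start of lines and use `|` to start a comment. Below is a minimal sketch of reading such a file into a set, assuming that same format for every list, including the Arabic one (an assumption). `load_stopwords` is a hypothetical helper for illustration, not the code from the pull request.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not Xapian API): reads a Snowball-style stopword list.
// Assumed format: whitespace-separated words; '|' starts a comment that runs
// to the end of the line.
std::set<std::string> load_stopwords(std::istream& in) {
    std::set<std::string> words;
    std::string line;
    while (std::getline(in, line)) {
        // Drop everything from the '|' comment marker onwards.
        std::string::size_type bar = line.find('|');
        if (bar != std::string::npos) line.erase(bar);
        // Any remaining whitespace-separated tokens are stopwords.
        std::istringstream fields(line);
        std::string word;
        while (fields >> word) words.insert(word);
    }
    return words;
}
```

In practice the stream would be a `std::ifstream` opened on the list file, and the resulting set can be handed to `Xapian::SimpleStopper` through its iterator-range constructor, e.g. `Xapian::SimpleStopper stopper(words.begin(), words.end());`.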
Stopword-loading pull request: https://github.com/xapian/xapian/pull/35

=== ''May 29-30'' ===
 * '''Sphinx documentation''': finished the patch for the Sphinx documentation.
pull request: https://github.com/xapian/xapian/pull/34

== Coding Week 3: June 2-June 8 ==

=== ''June 2 - 4'' ===
TODO

=== ''June 5 - 7'' ===
 * '''Normalizer''': gathered the Unicode code points of the Arabic letters and proposed a prototype normalizer.
branch: [https://github.com/assem-ch/xapian/tree/normalizer_cpp normalizer_cpp]

== Coding Week 4: June 9-June 15 ==

=== ''June 9-10'' ===
 * '''Normalizer''': worked on the implementation of the normalizer. It works now; the next step is to integrate it. Example of normalization: مؤيًّدًا ==> مءيدا
branch: [https://github.com/assem-ch/xapian/tree/normalizer_cpp normalizer_cpp]

=== ''June 11-12'' ===
 * '''Romanization Converter''': implemented a converter for the [http://en.wikipedia.org/wiki/Buckwalter_transliteration Buckwalter transliteration system], e.g. qawol ==> قَوْل
branch: [https://github.com/assem-ch/xapian/tree/normalizer_cpp normalizer_cpp]

=== ''June 13'' ===
 * '''Stemmer''': basic structure of the Arabic stemmer: defining the letters.
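The June 9-10 normalization example removes the diacritics (tanween and shadda) and rewrites ؤ as bare hamza ء. The sketch below illustrates that kind of normalizer together with a table-driven Buckwalter-to-Arabic converter like the one from June 11-12. Assumptions: words are already decoded to UTF-32; folding ئ to ء, folding the hamzated alefs to plain alef, and removing tatweel are common choices that this journal does not confirm; `normalize_arabic` and `buckwalter_to_arabic` are hypothetical names, not the code in the `normalizer_cpp` branch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical normalizer for one Arabic word held as UTF-32.
// Assumed rules: drop tanween/harakat/shadda/sukun (U+064B..U+0652) and
// tatweel (U+0640), fold hamza carriers to bare hamza, fold hamzated alefs.
std::u32string normalize_arabic(const std::u32string& word) {
    std::u32string out;
    out.reserve(word.size());
    for (char32_t c : word) {
        if ((c >= 0x064B && c <= 0x0652) || c == 0x0640) {
            continue;  // diacritic or tatweel: remove
        }
        switch (c) {
            case 0x0624:  // WAW WITH HAMZA ABOVE
            case 0x0626:  // YEH WITH HAMZA ABOVE
                out += char32_t{0x0621};  // HAMZA
                break;
            case 0x0622:  // ALEF WITH MADDA ABOVE
            case 0x0623:  // ALEF WITH HAMZA ABOVE
            case 0x0625:  // ALEF WITH HAMZA BELOW
                out += char32_t{0x0627};  // ALEF
                break;
            default:
                out += c;
        }
    }
    return out;
}

// Hypothetical Buckwalter-to-Arabic converter (UTF-32 output).
// Characters missing from the table are copied through unchanged.
std::u32string buckwalter_to_arabic(const std::string& translit) {
    static const std::unordered_map<char, char32_t> table = {
        {'\'', 0x0621}, {'|', 0x0622}, {'>', 0x0623}, {'&', 0x0624},
        {'<', 0x0625}, {'}', 0x0626}, {'A', 0x0627}, {'b', 0x0628},
        {'p', 0x0629}, {'t', 0x062A}, {'v', 0x062B}, {'j', 0x062C},
        {'H', 0x062D}, {'x', 0x062E}, {'d', 0x062F}, {'*', 0x0630},
        {'r', 0x0631}, {'z', 0x0632}, {'s', 0x0633}, {'$', 0x0634},
        {'S', 0x0635}, {'D', 0x0636}, {'T', 0x0637}, {'Z', 0x0638},
        {'E', 0x0639}, {'g', 0x063A}, {'_', 0x0640}, {'f', 0x0641},
        {'q', 0x0642}, {'k', 0x0643}, {'l', 0x0644}, {'m', 0x0645},
        {'n', 0x0646}, {'h', 0x0647}, {'w', 0x0648}, {'Y', 0x0649},
        {'y', 0x064A}, {'F', 0x064B}, {'N', 0x064C}, {'K', 0x064D},
        {'a', 0x064E}, {'u', 0x064F}, {'i', 0x0650}, {'~', 0x0651},
        {'o', 0x0652}, {'`', 0x0670}, {'{', 0x0671},
    };
    std::u32string out;
    out.reserve(translit.size());
    for (char c : translit) {
        auto it = table.find(c);
        out += (it != table.end())
                   ? it->second
                   : static_cast<char32_t>(static_cast<unsigned char>(c));
    }
    return out;
}
```

With this table, `qawol` converts to قَوْل, and normalizing that result strips the fatha and sukun to leave قول. Keeping the transliteration as a lookup table makes it easy to swap in another scheme (for example ISO 233) without touching the loop.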
Stemmer branch: [https://github.com/assem-ch/xapian/compare/stemmer_snowball stemmer_snowball]

== Coding Week 5: June 16-June 22 ==

=== ''June 17'' ===
 * '''Romanization''': implemented the ISO 233 romanization standard. [https://github.com/assem-ch/xapian/commit/d9d154c286ae2140c0361186d7121bd9d42388c1 changes]

=== ''June 18'' ===
 * '''Stemmer''': prototype of an aggressive Arabic stemmer that strips prefixes and suffixes. [https://github.com/assem-ch/xapian/commit/3011ef28e79ec3dc0d93bbd7fb6e43fe86657c05 changes]

=== ''June 19-21'' ===
 * '''Word tagger''': read how word tagging is done in Naftawayh: https://pythonhosted.org/Naftawayh/
 * '''Stemmer / Rooter''': studied the stemming/rooting/lemmatization strategy of Tashaphyne: http://pythonhosted.org//Tashaphyne/

== Coding Week 6: June 23-June 29 (Midterm deadline June 27) ==

=== ''June 23'' ===
 * '''Issue #346: Python3 support''': debugged the issue and proposed a fix. [https://github.com/xapian/xapian/pull/50 pull request]

== Coding Week 7: June 30-July 6 ==

== Coding Week 8: July 7-July 13 ==

== Coding Week 9: July 14-July 20 ==

== Coding Week 10: July 21-July 27 ==

== Coding Week 11: July 28-August 3 ==

== Coding Week 12: August 4-August 10 ==

== Coding Week 13: August 11-August 18 (Final evaluation based on work up to August 18) ==