Table of Contents
- Community Bonding Week 2: April 28-May 4
- Community Bonding Week 3: May 5-May 11
- Community Bonding Week 4: May 12-May 19
- Coding Week 1: May 20-May 25
- Coding Week 2: May 26-June 1
- Coding Week 3: June 2-June 8
- Coding Week 4: June 9-June 15
- Coding Week 5: June 16-June 22
- Coding Week 6: June 23-June 29 (Midterm deadline June 27)
- Coding Week 7: June 30-July 6
- Coding Week 8: July 7-July 13
- Coding Week 9: July 14-July 20
- Coding Week 10: July 21-July 27
- Coding Week 11: July 28-August 3
- Coding Week 12: August 4-August 10
- Coding Week 13: August 11-August 18 (Final evaluation based on work up …
Community Bonding Week 2: April 28-May 4
Community Bonding Week 3: May 5-May 11
Community Bonding Week 4: May 12-May 19
Coding Week 1: May 20-May 25
May 20
- Basic Normalizer: I started from scratch (worked for only a few hours because of urgent travel).
branch: not yet. documentation I needed: snowball.
May 21
- Test samples: I browsed through many Arabic corpora and chose this one because it covers diverse topics:
Motaz K. Saad and Wesam Ashour, "OSAC: Open Source Arabic Corpus", 6th ArchEng International Symposiums, EEECS10 the 6th International Symposium on Electrical and Electronics Engineering and Computer Science, European University of Lefke, Cyprus, 2010.
branch: N/A. documentation I needed: Omega.
May 22-23-24
- Stopwords: basic Arabic stopword list containing about 10k words (counting all forms) (link)
- Stopwords: I also included stopword lists for other languages from the Snowball project, e.g. English stopwords
- Stopwords: updated the Arabic stopword list: removed many words that can also occur as non-stopwords, and removed the different inflected forms (Arabic stopword list)
- Stopwords: worked on loading stopwords from a file.
branch: stopword, documentation I needed: autotools, SWIG.
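A minimal sketch of loading stopwords from a file, not the code from the stopword branch. It assumes a plain-text UTF-8 file with whitespace-separated words and `|` starting a comment, as in the Snowball stopword lists; the function names are hypothetical.

```python
def parse_stopwords(lines):
    """Collect stopwords from an iterable of text lines."""
    words = set()
    for line in lines:
        # Assumed format: everything after '|' is a comment.
        text = line.split("|", 1)[0]
        # A line may hold zero, one, or several words.
        words.update(text.split())
    return words


def load_stopwords(path):
    """Read a UTF-8 stopword file (hypothetical path) into a set."""
    with open(path, encoding="utf-8") as handle:
        return parse_stopwords(handle)


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not stopwords."""
    return [token for token in tokens if token not in stopwords]


if __name__ == "__main__":
    sample = ["| sample Arabic list", "في | in", "من", "", "على إلى"]
    stopwords = parse_stopwords(sample)
    print(remove_stopwords("ذهب الولد إلى المدرسة".split(), stopwords))
```

Keeping the parsing separate from the file access makes the format easy to test without touching the disk.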
Coding Week 2: May 26-June 1
May 26
- Testing environment: I indexed the chosen corpus using Omega and tried searching and other operations on it.
May 27-28
- Stopwords: continued working on loading stopwords from a file.
pull-request: https://github.com/xapian/xapian/pull/35
May 29-30
- Sphinx documentation: finishing work on the Sphinx documentation patch.
pull-request: https://github.com/xapian/xapian/pull/34
Coding Week 3: June 2-June 8
June 2 - 4
TODO
June 5 - 7
- Normalizer: gathered the Unicode code points of the Arabic letters and proposed a prototype for the normalizer.
branch: normalizer_cpp
Coding Week 4: June 9-June 15
June 9-10
- Normalizer: worked on the implementation of the normalizer. It works now; the next step is to integrate it.
Example of normalization: مؤيًّدًا ==> مءيدا
branch: normalizer_cpp
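The example above suggests two steps: dropping the diacritics and replacing a hamza carrier with a bare hamza. A minimal sketch of that idea follows; it is not the code from the normalizer_cpp branch. The diacritic range (U+064B to U+0652) and the mapping of U+0626 are assumptions, since only the U+0624 case appears in the example.

```python
# Tashkeel marks: U+064B (fathatan) through U+0652 (sukun). Assumed range.
DIACRITICS = {code_point: None for code_point in range(0x064B, 0x0653)}

# Hamza carriers folded to bare hamza (U+0621).
# U+0624 comes from the example in the log; U+0626 is an assumption.
HAMZA_FORMS = {0x0624: 0x0621, 0x0626: 0x0621}

NORMALIZATION_TABLE = {**DIACRITICS, **HAMZA_FORMS}


def normalize(word):
    """Strip diacritics and fold hamza carriers in one pass."""
    return word.translate(NORMALIZATION_TABLE)


if __name__ == "__main__":
    # The word from the log, spelled out as code points:
    # meem, waw-with-hamza, yeh, fathatan, shadda, dal, fathatan, alef.
    sample = "\u0645\u0624\u064A\u064B\u0651\u062F\u064B\u0627"
    print(normalize(sample))
```

A single translation table keeps the pass linear in the word length and makes the rule set easy to extend.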
June 11-12
- Romanization converter: implementation of a converter for the Buckwalter transliteration system, e.g. qawol >> قَوْل
branch: normalizer_cpp
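A minimal sketch of a Buckwalter converter, not the implementation from the branch. It relies on the fact that the Buckwalter symbols follow the Unicode order of the Arabic block for U+0621 to U+063A and U+0641 to U+0652; the function names are hypothetical, and the less common symbols (such as dagger alef) are left out.

```python
def _build_table():
    """Map Buckwalter ASCII symbols to Arabic characters."""
    table = {}
    # Hamza forms and letters, U+0621 (hamza) through U+063A (ghain).
    for offset, symbol in enumerate("'|>&<}AbptvjHxd*rzs$SDTZEg"):
        table[symbol] = chr(0x0621 + offset)
    # Letters and diacritics, U+0641 (feh) through U+0652 (sukun).
    for offset, symbol in enumerate("fqklmnhwYyFNKaui~o"):
        table[symbol] = chr(0x0641 + offset)
    table["_"] = "\u0640"  # tatweel
    return table


TO_ARABIC = _build_table()
TO_BUCKWALTER = {arabic: symbol for symbol, arabic in TO_ARABIC.items()}


def buckwalter_to_arabic(text):
    """Convert Buckwalter text to Arabic; unknown characters pass through."""
    return "".join(TO_ARABIC.get(char, char) for char in text)


def arabic_to_buckwalter(text):
    """Convert Arabic text to Buckwalter; unknown characters pass through."""
    return "".join(TO_BUCKWALTER.get(char, char) for char in text)


if __name__ == "__main__":
    print(buckwalter_to_arabic("qawol"))
    print(arabic_to_buckwalter(buckwalter_to_arabic("qawol")))
```

Because the mapping is one-to-one, the reverse table is simply the inverted dictionary, so a round trip returns the original string.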
June 13
- Stemmer: basic structure of the Arabic stemmer: defining letters.
branch: stemmer_snowball
Coding Week 5: June 16-June 22
June 17
- Romanization: implementation of the ISO 233 romanization standard (changes)
June 18
- Stemmer: prototype of an aggressive Arabic stemmer for prefixes and suffixes (changes)
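A minimal sketch of prefix and suffix stripping, to illustrate the general idea only. The affix lists and the minimum stem length are illustrative assumptions and are not taken from the stemmer_snowball branch, which is written for Snowball rather than Python.

```python
# Illustrative affix lists (assumptions), longest prefixes first.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")
MIN_STEM_LENGTH = 3  # assumed lower bound so short words are left alone


def strip_affixes(word):
    """Remove at most one prefix and one suffix from a word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LENGTH:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    print(strip_affixes("والمعلمون"))
```

Stripping without any dictionary check is what makes such a stemmer aggressive: it can over-stem words whose own letters happen to match an affix.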
June 19-21
- Word tagger: read how word tagging is done in Naftawayh, https://pythonhosted.org/Naftawayh/
- Stemmer / Rooter: checked the stemming/rooting/lemmatization strategy in http://pythonhosted.org//Tashaphyne/
Coding Week 6: June 23-June 29 (Midterm deadline June 27)
June 23
- Issue #346: Python3 support: debugged the problem and proposed a fix (pull request)