The aim of this project is to use a Chinese word segmentation algorithm to split Chinese text into words and thereby improve search performance in Xapian.
23 – 29 May
Finish the code for the dictionary-based segmentation algorithm
30 May – 5 June
Prepare the code for handling names
6 – 12 June
Test the ability to recognize numbers and locations
13 – 19 June
Continue testing the ability to recognize numbers and locations
20 – 26 June
Finish the code for handling names
27 June – 3 July
Test the ability to recognize names
4 – 10 July
Start coding the recognition of high-frequency words
11 – 17 July
Finish coding the recognition of high-frequency out-of-vocabulary words
18 – 24 July
Test the efficiency of the analysis
25 – 31 July
Integrate the analysis into Xapian
1 – 7 August
Test Xapian as a whole
8 – 14 August
Further refine the tests and finish the documentation for the whole project
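
As an illustration of the dictionary-based segmentation scheduled for the first week, here is a minimal sketch in Python. It assumes forward maximum matching, a common dictionary-based technique. The plan above does not say which algorithm will be used, so this choice is an assumption. The function name `segment`, the parameter `max_word_len`, and the toy dictionary are all hypothetical, and nothing here uses the Xapian API.

```python
def segment(text, dictionary, max_word_len=None):
    """Split text into words by forward maximum matching.

    At each position the longest dictionary word starting there is taken;
    if no dictionary word matches, a single character is emitted.
    """
    if max_word_len is None:
        # Longest dictionary entry bounds how far ahead we need to look.
        max_word_len = max((len(w) for w in dictionary), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback, so
            # out-of-vocabulary text still yields tokens.
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"中文", "分词", "算法", "中文分词"}
print(segment("中文分词算法", toy_dictionary))  # ['中文分词', '算法']
```

Characters that fall through to the single-character fallback are where the later name, number, location and out-of-vocabulary recognition steps would apply.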