
Text-extraction libraries for Omega

Community Bonding Period (May 6–May 24):

  • Get to know the community, interact with the people.
  • Read and understand the Xapian code base, get to know all the relevant classes.
  • Try to solve existing issues and go through the code review process.
  • Research the candidate text-extraction libraries.
  • Make sure that everything is ready for coding.

Coding Week 1 (May 27–May 31):

  • Design, implement and get familiar with the classes needed to add the libraries and handle their errors.
  • Write proper documentation.
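
As a rough sketch of the shape such classes could take: the names below (`Extractor`, `ExtractionError`, `EXTRACTORS`, `extract_text`) are hypothetical and are not part of Omega, which is written in C++. Python is used here only to keep the illustration short. The idea shown is one wrapper class per library, plus a single entry point that turns library-specific failures into one common error type.

```python
from abc import ABC, abstractmethod


class ExtractionError(Exception):
    """Raised when a file cannot be converted to text (hypothetical name)."""


class Extractor(ABC):
    """Hypothetical interface: one subclass per wrapped third-party library."""

    @abstractmethod
    def extract(self, data: bytes) -> str:
        """Return the text content of a document given its raw bytes."""


class PlainTextExtractor(Extractor):
    """Stand-in for a real library wrapper; uses only the standard library."""

    def extract(self, data: bytes) -> str:
        return data.decode("utf-8")


# Hypothetical registry mapping a MIME type to the extractor that handles it.
EXTRACTORS = {"text/plain": PlainTextExtractor()}


def extract_text(mime_type: str, data: bytes) -> str:
    """Dispatch on MIME type and wrap any library failure in ExtractionError."""
    extractor = EXTRACTORS.get(mime_type)
    if extractor is None:
        raise ExtractionError(f"no extractor registered for {mime_type}")
    try:
        return extractor.extract(data)
    except ExtractionError:
        raise
    except Exception as exc:
        # Library-specific errors are wrapped so callers handle one type only.
        raise ExtractionError(f"{mime_type}: {exc}") from exc
```

With this layout, supporting a new format would mean writing one more `Extractor` subclass and adding an entry to the registry; the calling code would not change.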

Coding Weeks 2–11 (June 3–August 9):

During this period I will be adding support for the different file formats. Each format is estimated to require between 1 and 2 weeks, during which the following activities will be carried out:

  • Research the available libraries (compare them and discuss with the mentor which is the most appropriate).
  • Implement code to add the library to the project.
  • Test the code and fix any issues.
  • Write proper documentation and publish the changes in the project code.

At the end of each two-week block, the changes made and the choice of the next file format will be discussed with the mentor.

At least 7 formats will be implemented during this time (my personal goal is to implement at least 10).

Coding Week 12 (August 12–August 16):

  • Write proper documentation and samples of how to add support for a new file format.
  • Test all code of the project.
  • Make changes to the documentation if required.

A week is left free in case of any delay.

Last modified on 25/06/19 14:30:05