Text-Extraction Libraries
Work Product
The aim of this project is to add support for extracting text from various file formats during indexing by means of external libraries. The core of the work is the worker and assistant modules, which provide a way to integrate new libraries. These modules handle library errors and isolate each library in a subprocess, so that a failure in a library cannot crash omindex.
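The isolation idea can be illustrated with a minimal POSIX sketch. This is not the actual worker or assistant code: the names `Extractor` and `extract_isolated` are hypothetical, and the real modules may communicate differently. It only shows the general technique of running library code in a forked child process and treating a crash there as an ordinary failure in the parent.

```cpp
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical extractor signature: takes a file path, returns its text.
using Extractor = std::function<std::string(const std::string&)>;

// Run `extract` in a forked child process. Returns the extracted text, or
// std::nullopt if the child crashed, exited with a non-zero status, or a
// system call failed. The parent process survives in every case.
std::optional<std::string>
extract_isolated(const Extractor& extract, const std::string& path)
{
    int fds[2];
    if (pipe(fds) != 0) return std::nullopt;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return std::nullopt;
    }

    if (pid == 0) {
        // Child: call the library code and send the text back over the pipe.
        close(fds[0]);
        int status = 0;
        try {
            std::string text = extract(path);
            const char* p = text.data();
            size_t left = text.size();
            while (left > 0) {
                ssize_t n = write(fds[1], p, left);
                if (n <= 0) { status = 1; break; }
                p += n;
                left -= static_cast<size_t>(n);
            }
        } catch (...) {
            // An exception from the library becomes a failure exit status.
            status = 1;
        }
        close(fds[1]);
        _exit(status);
    }

    // Parent: read until the child closes its end of the pipe, then reap it.
    close(fds[1]);
    std::string out;
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        out.append(buf, static_cast<size_t>(n));
    close(fds[0]);

    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0) return std::nullopt;
    // A signal (e.g. SIGSEGV, SIGABRT) or non-zero exit means extraction failed.
    if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0) return std::nullopt;
    return out;
}
```

Under these assumptions, a caller would check the returned optional and simply skip the file when it is empty. A longer-lived helper process that handles many files would avoid paying for a fork per file, at the cost of a more involved protocol between the two processes.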
Pull Request and Commits
The main pull requests of this project are:
- Adding support for text-extraction libraries
- Improve Worker Communication
- Update documentation about how to add a new format
- Add Omindextest to Omega
These are the pull requests corresponding to the added libraries:
Link containing all merged commits
Please read the Notes for more information about this work.
Work in Progress
Currently, I am working on Omindextest and adding test cases to it. I would like to extend it to test other features of the program, as I consider automated testing of omindex really important.
Future Work
As future work, I think that improving Omindextest would be important. Adding more test cases that cover different features and improve the reliability of omindex is crucial for developing code that lasts in the long term.
Adding new formats and libraries is another possibility. There is documentation on how to do it, and it is advisable to choose popular formats and libraries with an active community.