Missing Documentation
This page is intended to list gaps in the documentation which should be addressed. Note that the new documentation (http://getting-started-with-xapian.readthedocs.org/en/latest/index.html) includes "todo" notes for missing documentation, showing where they can fit in and what should be covered.
Feel free to add to this list. Feel even freer to write these documents! If you do, either put them on the wiki, write them as reStructuredText and attach them to a Trac ticket, or create a pull request against the new documentation source (https://github.com/xapian/xapian-docsprint/). Even a rough-and-ready version is useful.
xapian-core
Getting Started Guide
- Add a note somewhere in the introductory documentation that UTF-8 is assumed wherever bytes need interpreting (if we do already, I failed to spot where).
- Discuss stemming and STEM_SOME in relation to the QueryParser.
- Mention TermIterator.skip_to in the context of pulling terms out of a Document at match render time. (Possibly different for bindings, where you might not usually use termlist_begin(); for instance, in Python skip_to() is available on the TermIter, got by iter(document).)
- How a QueryParser VRP/RP and a FieldProcessor for the same prefix can nicely coexist
- Build system stuff - autotools (XO_LIB_XAPIAN, XO_REQUIRE), xapian-config, pkg-config, cmake
- Write something on phrase queries
API documentation comments
- Check that TermGenerator.set_flags() mentions FLAG_STEMMING
- Add a note to API features which actually assume an encoding of UTF-8.
- Documentation comments are currently very unreliable as to what exceptions can be thrown by what methods.
- 1.3.2 added support for writing to a multi-database, but it seems this only got documented in NEWS
FAQ
- A FAQ on "how to deal with DatabaseModifiedError" would be helpful. Also a discussion somewhere of how to do concurrent rolling indexing while allowing search may be handy.
- A FAQ on caching ("trust the OS cache") may be helpful, as this seems to come up periodically. There's some helpful discussion in the scalability doc. There's also a long email by James (which may be somewhat out of date).
Internals
- We have wiki documentation for the FlintBackend, but it's outdated (it mildly suggests flint is the latest and greatest, when flint was dropped completely in 1.4.x), and there aren't similar pages for newer backends. Some of the key differences are noted in the xapian-core NEWS file (e.g. the v1.1 release notes), but more detailed information is available in some cases (e.g. a discussion by Olly about chert). More generally, having something on each backend (in an appendix of the user manual?), with clear upgrade instructions if people want to rebuild (via xapian-compact or similar), would be helpful. A wiki page about the next-stable backend would also be helpful, particularly if it noted (perhaps via a ticket search) known limitations or problems with the in-development version, for people trying it out.
Omega
- Document omega value usage, and something more concrete about using your own values
- scriptindex allows easily configurable indexing of data from diverse sources (e.g. indexing from SQL)
  - document dbi2omega, including environment variables:
    - DBUSER - user name to connect to the database with (defaults to $USER then $LOGNAME then "")
    - DBPASSWORD - password to connect to the database with (defaults to "")
    - DBIDRIVER - DBI driver to use (defaults to "mysql")
  - document mbox2omega
- crawling using ht://dig:
  - document htdig2omega
- crawling using GNU wget:
  - mirror web pages locally and then use omindex
  - supports resuming download after error, proxies, cookies
  - HOWTO style guide and/or wrapper script would be useful
  - Peter Masiar concluded ht://dig was more suitable - find out why...
- file formats which omindex understands
  - how to add new formats (this should be specifiable in a config file). See FAQ/OmegaNewFileFormat.
  - move FAQ/OmegaNewFileFormat into omega docs, and update for 1.4 where many filters can be specified without writing code.
- document what field values omindex sets (there's a partial list but it's out of date), and what field values the shipped templates support for people writing scriptindex index scripts.
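
As a starting point for the dbi2omega item above, the fallback order of its environment variables could be illustrated roughly as follows. This is only a sketch in Python of the defaults as listed on this page, not dbi2omega's actual code: the function name resolve_dbi2omega_settings is hypothetical, and treating an empty variable the same as an unset one is an assumption that should be checked against the script before it goes into the docs.

```python
import os


def resolve_dbi2omega_settings(env=None):
    """Hypothetical helper: resolve settings using the defaults listed above.

    Assumption: an empty value is treated the same as an unset one.
    """
    env = os.environ if env is None else env
    return {
        # DBUSER defaults to $USER, then $LOGNAME, then "".
        "user": env.get("DBUSER") or env.get("USER") or env.get("LOGNAME") or "",
        # DBPASSWORD defaults to "".
        "password": env.get("DBPASSWORD") or "",
        # DBIDRIVER defaults to "mysql".
        "driver": env.get("DBIDRIVER") or "mysql",
    }


if __name__ == "__main__":
    print(resolve_dbi2omega_settings())
```

Passing a plain dict instead of the real environment makes the fallback order easy to check, for example resolve_dbi2omega_settings({"LOGNAME": "bob"}) yields the user "bob" with an empty password and the "mysql" driver.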
Search::Xapian
- Search::Xapian::MSet::Tied has no POD documentation, but Search::Xapian::Enquire refers to it so the HTML ends up with a 404 link.