
Missing Documentation

This page is intended to list gaps in the documentation that should be addressed. Note that the new documentation (http://getting-started-with-xapian.readthedocs.org/en/latest/index.html) includes "todo" notes for missing documentation, showing where each piece would fit and what it should cover.

Feel free to add to this list. Feel even freer to write these documents! If you do, either put them on the wiki, write them as reStructuredText and attach them to a Trac ticket, or create a pull request against the new documentation source (https://github.com/jaylett/xapian-docsprint/). Even a rough-and-ready version is useful.

xapian-core

Getting Started Guide

  • Somewhere in the introductory documentation, add a note that UTF-8 is assumed wherever bytes need to be interpreted (if this is already mentioned, I failed to spot where).
  • Discuss stemming and STEM_SOME in relation to the QueryParser.
  • Mention TermIterator.skip_to in the context of pulling terms out of a Document at match render time. This may differ for the bindings, where you might not usually call termlist_begin(); for instance, in Python skip_to() is available on the TermIter obtained from iter(document).
  • Explain how a QueryParser VRP/RP (ValueRangeProcessor/RangeProcessor) and a FieldProcessor for the same prefix can coexist nicely.
  • Build system integration: autotools (XO_LIB_XAPIAN, XO_REQUIRE), xapian-config, pkg-config, CMake.
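
Document termlists come back in ascending term order, which is what makes skip_to() useful at render time: one call jumps straight to the first term carrying a given prefix instead of walking every term. The sketch below is a pure-Python model of those semantics for illustration only; it does not use the Xapian bindings, and ModelTermList, terms_with_prefix() and the "XA" author prefix are all hypothetical.

```python
from bisect import bisect_left


class ModelTermList:
    """Pure-Python model of skip_to() on a termlist; NOT the Xapian API."""

    def __init__(self, terms):
        # Like a Xapian termlist: each term once, in ascending sorted order.
        self._terms = sorted(set(terms))
        self._pos = 0

    def current(self):
        """Return the term at the current position, or None at the end."""
        return self._terms[self._pos] if self._pos < len(self._terms) else None

    def skip_to(self, target):
        """Advance to the first term >= target; never moves backwards."""
        self._pos = bisect_left(self._terms, target, lo=self._pos)
        return self.current()

    def advance(self):
        """Step to the next term (no-op once the end is reached)."""
        if self._pos < len(self._terms):
            self._pos += 1
        return self.current()


def terms_with_prefix(termlist, prefix):
    """Collect every term starting with prefix using a single skip_to()."""
    found = []
    term = termlist.skip_to(prefix)
    while term is not None and term.startswith(prefix):
        found.append(term)
        term = termlist.advance()
    return found


# "XA" is a hypothetical author prefix used only for this illustration.
tl = ModelTermList(["dog", "XAolly", "Ttitle", "cat", "XAjames", "Zcat"])
print(terms_with_prefix(tl, "XA"))  # ['XAjames', 'XAolly']
```

Because the list is sorted, all terms sharing a prefix are contiguous, so a single skip_to() followed by a short forward scan finds them all.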

API documentation comments

  • Check that TermGenerator.set_flags() mentions FLAG_STEMMING
  • Improve API docs for OP_VALUE_*.
  • Add a note to those API features which actually assume UTF-8 encoding.
  • Documentation comments are currently very unreliable about which exceptions each method can throw.
  • Version 1.3.2 added support for writing to a multi-database, but this seems to have been documented only in NEWS.

FAQ

  • A FAQ on how to deal with DatabaseModifiedError would be helpful. A discussion of how to do concurrent rolling indexing while still allowing searches may also be handy.
  • A FAQ on caching ("trust the OS cache") may be helpful, as the question comes up periodically. There's some useful discussion in the scalability document, and a long email by James (which may be somewhat out of date).
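
A DatabaseModifiedError FAQ would most likely centre on the usual remedy: call reopen() on the Database to move to the latest revision, then redo the operation. The helper below sketches that retry loop under stated assumptions: run_with_reopen() and the stub classes are hypothetical, and the exception class is passed in so the example runs without the bindings (with the Python bindings you would pass xapian.DatabaseModifiedError and a xapian.Database).

```python
class StubDatabaseModifiedError(Exception):
    """Hypothetical stand-in for xapian.DatabaseModifiedError."""


def run_with_reopen(db, operation, retry_exc, max_attempts=3):
    """Run operation(db); on retry_exc, call db.reopen() and try again.

    The operation should redo the whole search, since anything fetched
    before the reopen came from the old revision.
    """
    for attempt in range(max_attempts):
        try:
            return operation(db)
        except retry_exc:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            db.reopen()


class FakeDatabase:
    """Hypothetical database whose revision is stale until reopened."""

    def __init__(self):
        self.stale = True
        self.reopens = 0

    def reopen(self):
        self.stale = False
        self.reopens += 1


def search(db):
    if db.stale:
        raise StubDatabaseModifiedError("revision no longer available")
    return ["doc1", "doc2"]


db = FakeDatabase()
print(run_with_reopen(db, search, StubDatabaseModifiedError))  # ['doc1', 'doc2']
print(db.reopens)  # 1
```

Capping the attempts keeps a reader from looping forever if a writer commits faster than the search can complete.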

Internals

  • We have wiki documentation for the FlintBackend, but it's outdated: it mildly suggests that flint is the latest and greatest, when chert was the default in 1.2 and glass will be the default for 1.4, and there are no similar pages for newer backends. Some of the key differences are noted in the xapian-core NEWS file (e.g. the 1.1 release notes), and in some cases more detailed information is available elsewhere (e.g. a discussion by Olly about chert). More generally, it would be helpful to have something on each backend (perhaps in an appendix of the user manual), with clear upgrade instructions for people who want to rebuild their databases (via xapian-compact or similar). A wiki page about the next stable backend would also be helpful, particularly if it noted (perhaps via a ticket search) known limitations or problems with the in-development version, for people trying it out.

Omega

  • Document how Omega uses document values, and give more concrete guidance on using your own values.
  • scriptindex allows easily configurable indexing of data from diverse sources (e.g. indexing from SQL)
    • document dbi2omega, including environment variables:
      • DBUSER - user name to connect to the database with (defaults to $USER then $LOGNAME then "")
      • DBPASSWORD - password to connect to the database with (defaults to "")
      • DBIDRIVER - DBI driver to use (defaults to "mysql")
    • document mbox2omega
  • crawling using ht://dig:
    • document htdig2omega
  • crawling using GNU wget:
    • mirror web pages locally and then use omindex
    • supports resuming download after error, proxies, cookies
    • a HOWTO-style guide and/or a wrapper script would be useful
    • Peter Masiar concluded that ht://dig was more suitable; find out why.
  • file formats which omindex understands
    • how to add new formats (this should be specifiable in a config file). See FAQ/OmegaNewFileFormat.
    • move FAQ/OmegaNewFileFormat into the Omega docs, and update it for 1.4, where many filters can be specified without writing code.
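
The dbi2omega environment-variable defaults listed above form a simple fallback chain, which documentation could spell out precisely. The Python sketch below illustrates that chain; it is not dbi2omega's own code, the function name is hypothetical, and treating an empty variable the same as an unset one is an assumption about the script's behaviour.

```python
import os


def dbi2omega_connection_settings(env=None):
    """Resolve the documented dbi2omega defaults from an environment mapping.

    Assumption: an empty value is treated the same as an unset one.
    """
    if env is None:
        env = os.environ
    return {
        # DBUSER, else $USER, else $LOGNAME, else "".
        "user": env.get("DBUSER") or env.get("USER") or env.get("LOGNAME") or "",
        # DBPASSWORD, else "".
        "password": env.get("DBPASSWORD") or "",
        # DBIDRIVER, else "mysql".
        "driver": env.get("DBIDRIVER") or "mysql",
    }


print(dbi2omega_connection_settings({"LOGNAME": "olly"}))
# {'user': 'olly', 'password': '', 'driver': 'mysql'}
```

Passing an explicit mapping instead of reading os.environ directly keeps the resolution logic easy to test.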

Search::Xapian

  • Search::Xapian::MSet::Tied has no POD documentation, but Search::Xapian::Enquire refers to it, so the generated HTML ends up with a broken (404) link.
Last modified on 24/12/16 08:03:19