Release Overview for 1.4.3

This page contains a high-level description of the most notable changes in this release. For full details of user-visible changes, see the NEWS file in each module.

See also the full list of bug reports marked as fixed in this release.


  • MSet::snippet(): Favour candidate snippets which contain a greater diversity of matching terms by discounting the relevance of repeated terms using an exponential decay. A snippet which contains more terms from the query is likely to be better than one which contains the same term or terms multiple times, but a repeated term is still interesting, just less so with each additional appearance. Diversity issue highlighted by Robert Stepanek's patch in - testcases taken from his patch.
  • MSet::snippet(): New flag SNIPPET_EMPTY_WITHOUT_MATCH to get an empty snippet if there are no matches in the text passed in. Implemented by Robert Stepanek.
  • Round MSet::get_matches_estimated() to an appropriate number of significant figures. The algorithm used looks at the lower and upper bound and where the estimate sits between them, and then picks an appropriate number of significant figures. Thanks to Sébastien Le Callonnec for help sorting out a portability issue on OS X.
  • Improve the upper bound and the estimated number of matches for value ranges. The value slot frequency provides a tighter upper bound than Database::get_doccount(). The estimate is now calculated by working out the proportion of possible values between the slot lower and upper bounds which the range covers (assuming a uniform distribution). This seems to work fairly well in practice, and is certainly better than the crude estimate we were using: Database::get_doccount() / 2.
  • Handle arbitrary combinations of OP_OR under OP_NEAR/OP_PHRASE, partly addressing #508. Thanks to Jean-Francois Dockes for motivation and testing.
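
The discounting of repeated terms in MSet::snippet() described above can be illustrated with a small standalone sketch. This is not Xapian's implementation: the function name snippet_score, the per-term weights and the decay factor of 0.5 are all assumptions chosen for illustration. It only shows why a snippet with more distinct query terms tends to outscore one which repeats a single term, while each repeat still adds something.

```python
def snippet_score(terms, weights, decay=0.5):
    """Score a candidate snippet from the query terms it contains.

    Hypothetical helper, not part of the Xapian API. The n-th repeat of a
    term contributes weight * decay**n, so each repeat counts for less.
    The decay value of 0.5 is an assumption, not Xapian's actual constant.
    """
    seen = {}      # term -> number of times already counted
    score = 0.0
    for term in terms:
        if term not in weights:
            continue  # not a query term, contributes nothing
        count = seen.get(term, 0)
        score += weights[term] * decay ** count
        seen[term] = count + 1
    return score


weights = {"apple": 1.0, "pie": 1.0}
diverse = snippet_score(["apple", "pie"], weights)
repeated = snippet_score(["apple", "apple"], weights)
single = snippet_score(["apple"], weights)
```

With these assumed weights, the snippet containing both query terms scores higher than the one repeating "apple", which in turn still scores higher than a single occurrence.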

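The value range estimate described above (the proportion of the slot's bounds which the range covers, scaled by the value slot frequency) can be sketched as follows. This is a simplified illustration rather than Xapian's code: the function name estimate_range_matches is hypothetical, and values are treated as plain numbers, whereas Xapian's value slots hold byte strings.

```python
def estimate_range_matches(range_lo, range_hi, slot_lo, slot_hi, slot_freq):
    """Estimate how many documents a value range matches.

    Hypothetical helper, not part of the Xapian API. Assumes values are
    numeric and uniformly distributed between the slot's lower and upper
    bounds. slot_freq (the number of documents with a value in the slot)
    is the upper bound on the result.
    """
    # Clip the requested range to the bounds of the values actually present.
    lo = max(range_lo, slot_lo)
    hi = min(range_hi, slot_hi)
    if hi < lo:
        return 0  # the range lies entirely outside the slot's bounds
    if slot_hi == slot_lo:
        return slot_freq  # every value is identical and is inside the range
    proportion = (hi - lo) / (slot_hi - slot_lo)
    return round(slot_freq * proportion)


half = estimate_range_matches(25, 75, 0, 100, 1000)
outside = estimate_range_matches(200, 300, 0, 100, 1000)
everything = estimate_range_matches(0, 100, 0, 100, 1000)
```

Unlike the old Database::get_doccount() / 2 estimate, the result here depends on how much of the slot's value bounds the range actually covers, and never exceeds the slot frequency.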

  • omindex:
    • Add support for indexing vCard files if Perl and its Text::vCard module are available.
    • Don't use meta description as sample by default. Now that we have dynamic snippets (via $snippet), the body text is a better default. Also, generated HTML sometimes has unhelpful content in the meta description. To get the previous behaviour, use the new omindex command line option: --sample=description
Last modified on 27/01/17 04:38:18