
Release Overview for 1.4.8

This page contains a high-level description of the most notable changes in this release. For full details of user-visible changes, see the NEWS files in each module.

See also the full list of bug reports marked as fixed in this release.

API:

  • Add new stemming mode STEM_SOME_FULL_POS. This stores positional information for both stemmed and unstemmed terms, allowing NEAR and ADJ to work with stemmed terms. The extra positional information is likely to take up a significant amount of extra disk space so the default STEM_SOME is likely to be a better choice for most users.
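The difference between the two modes can be sketched with a toy in-memory index. This is not Xapian code: toy_stem, index_words and toy_near are hypothetical names, the stemmer merely strips a trailing "s", and the window semantics are simplified. It assumes, as the description above implies, that plain STEM_SOME records positions only for unstemmed terms, with stemmed terms carrying Xapian's usual "Z" prefix.

```cpp
#include <map>
#include <string>
#include <vector>

using Positions = std::vector<unsigned>;
using PositionIndex = std::map<std::string, Positions>;

// Hypothetical toy stemmer: strips a single trailing 's'.
std::string toy_stem(const std::string& word) {
    if (word.size() > 1 && word.back() == 's')
        return word.substr(0, word.size() - 1);
    return word;
}

// Index words at positions 1, 2, 3, ...  Unstemmed terms always get
// positions; "Z"-prefixed stemmed terms get them only when full_pos is set
// (the assumed difference between STEM_SOME and STEM_SOME_FULL_POS).
PositionIndex index_words(const std::vector<std::string>& words, bool full_pos) {
    PositionIndex index;
    unsigned pos = 0;
    for (const std::string& word : words) {
        ++pos;
        index[word].push_back(pos);
        Positions& stemmed = index["Z" + toy_stem(word)];  // term exists either way
        if (full_pos) stemmed.push_back(pos);
    }
    return index;
}

// Simplified NEAR: true if some occurrence of a and some occurrence of b
// are at most `window` positions apart, in either order.
bool toy_near(const PositionIndex& index, const std::string& a,
              const std::string& b, unsigned window) {
    auto ia = index.find(a);
    auto ib = index.find(b);
    if (ia == index.end() || ib == index.end()) return false;
    for (unsigned pa : ia->second) {
        for (unsigned pb : ib->second) {
            unsigned gap = pa > pb ? pa - pb : pb - pa;
            if (gap != 0 && gap <= window) return true;
        }
    }
    return false;
}
```

With the words "cats chase dogs", a proximity check on "Zcat" and "Zdog" can only succeed in the full-positions index, at the cost of storing a second position list for every word.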

glass backend:

  • Revert change made in 1.4.6:

Enable glass's "open_nearby_postlist" optimisation (which especially helps large wildcard queries) for writable databases without any uncommitted changes as well.

The amended check isn't conservative enough as there may be postlist changes in the inverter while the table is unmodified. This breaks testcase T150-tagging.sh in notmuch's testsuite, reported by David Bremner.

build system:

  • New --enable-64bit-termpos configure option which makes Xapian::termpos a 64-bit type and enables support for storing 64-bit termpos values in the glass backend in an upwardly compatible way. Few people will actually want to index documents more than 4 billion words long, but the extra numbering space can be helpful if you want to use term positions in "interesting" ways.
  • xapian-pos: New tool to show term position info to help debugging when using positional information in more complex ways.
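To put the "4 billion words" figure in context, the sketch below compares the largest position an unsigned 32-bit and 64-bit integer can hold. It assumes the default Xapian::termpos is a 32-bit unsigned type, which the text above implies but does not state; termpos32, termpos64 and needs_64bit are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for the two possible widths of Xapian::termpos.
using termpos32 = std::uint32_t;  // assumed default width
using termpos64 = std::uint64_t;  // assumed width with --enable-64bit-termpos

constexpr std::uint64_t max_termpos32 = std::numeric_limits<termpos32>::max();
constexpr std::uint64_t max_termpos64 = std::numeric_limits<termpos64>::max();

// True if `pos` is too large to be stored in the narrower type.
constexpr bool needs_64bit(std::uint64_t pos) {
    return pos > max_termpos32;
}

// 2^32 - 1: roughly 4.29 billion distinct positions.
static_assert(max_termpos32 == 4294967295ULL, "32-bit position limit");
```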

portability:

  • Fix undefined behaviour from C++ ODR violation due to using the same name for two different non-static inline functions. It seems that with current GCC versions the desired function always ends up being used, but with current clang the other function is sometimes used, resulting in database corruption when using value slots in docid 16384 or higher with the default glass backend. Patch from Germán M. Bravo.
  • Avoid throwing and handling an exception in replace_document() when adding a document with a specified docid which is <= last_docid but currently unused.
  • Use our portable code for handling UUIDs on all platforms, and only use platform-specific code for generating a new UUID. This fixes a bug with converting UUIDs to and from string representation on FreeBSD, NetBSD and OpenBSD on little-endian platforms which resulted in reversed byte order in the first three components, so the same database would report a different UUID on these platforms compared to other platforms. With this fix, the UUIDs of existing databases will appear to change on these platforms (except in rare "palindromic" cases). Reported by Germán M. Bravo.
  • Fix to build with a C++17 compiler. Previously we used a "byte" type internally which clashed with std::byte in source files which use using namespace std;. Fixes #768, reported by Laurent Stacul.
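The UUID byte-order issue above can be illustrated with a small sketch. This is not Xapian's implementation: UuidBytes, uuid_to_string and swap_first_three_components are hypothetical names. It also assumes the reversal applied to the bytes within each of the first three components (4, 2 and 2 bytes) of the usual 8-4-4-4-12 text form.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

using UuidBytes = std::array<std::uint8_t, 16>;

// Format 16 raw bytes as 8-4-4-4-12 lowercase hex, strictly in byte order,
// so the result does not depend on the host's endianness.
std::string uuid_to_string(const UuidBytes& b) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    out.reserve(36);
    for (std::size_t i = 0; i < b.size(); ++i) {
        if (i == 4 || i == 6 || i == 8 || i == 10) out += '-';
        out += hex[b[i] >> 4];
        out += hex[b[i] & 0x0f];
    }
    return out;
}

// Reverse the bytes within each of the first three components, mimicking
// the (assumed) effect of the platform-specific conversion bug.
UuidBytes swap_first_three_components(UuidBytes b) {
    std::reverse(b.begin(), b.begin() + 4);
    std::reverse(b.begin() + 4, b.begin() + 6);
    std::reverse(b.begin() + 6, b.begin() + 8);
    return b;
}
```

For the bytes 00 11 22 ... ff, the portable form is 00112233-4455-6677-8899-aabbccddeeff while the swapped form starts 33221100-5544-7766. A UUID whose first three components read the same in both byte orders is unaffected, which is the "palindromic" case.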

omega

indexers:

  • omindex:
  • Improve date handling in .eml files. We now handle a "Date:" header without the day of the week, which is allowed by RFC822 and RFC2822 (though seems rare in practice). If the date can't be parsed, we now just omit the date information rather than failing to process the file.
  • Add support for indexing Apple iWork documents (Keynote (.key), Numbers (.numbers) and Pages (.pages)) using libetonyek. Currently only the file variants are handled since omindex doesn't support indexing a directory as a document.
  • Index Visio files using vsd2xhtml.
  • Extend --filter to support filters which produce SVG as output.
  • Handle SVG embedded in XML with svg: namespace prefix.
  • Add --read-filters option to read a list of filters from a file, each line of which is a rule as passed to --filter. Based on a patch from Gaurav Arora.
  • Add new --mime-type-match option which allows specifying a MIME Content-Type for a given shell filename pattern (with the special Content-Type values "ignore" and "skip" supported, as for --mime-type).
  • Remove failed entries for ignored files. If a file is mapped to the pseudo-mimetype "ignore" then we now remove any existing failure record for it, so that we don't potentially end up with a lot of cruft failure records for files we are no longer trying to index.
  • If a file fails to index due to failing to allocate enough memory we now try to flag it as failed to index so it will be skipped by default on future runs. This should help to avoid indexing getting stuck on problematic files.
  • Add a "pages" field with the number of pages in the document where we know how to determine this (currently only for PDF files for which pdfinfo reports this information).
  • scriptindex:
  • Improve scriptindex diagnostic messages. All diagnostics are now labelled as "error", "warning" or "note" as appropriate, and we now consistently report "FILE:LINE:" (and also "COLUMN:" in most cases) to make it clearer where the problem lies.
  • Add new "split" action which splits the text on a specified delimiter and executes the following actions for each piece. Based on a patch by Gaurav Arora.
  • omega:
  • Value-based date range filters can now be specified via CGI parameters START.N, END.N and/or SPAN.N where N is a value slot number, allowing multiple concurrent filters on different slots to be specified.
  • Support YYYY and YYYYMM limits in term-based date ranges. Previously value-based date ranges supported these as limits, but term-based date ranges gave an error.
  • Add stem_strategy option and deprecate existing stem_all option in favour of this new more versatile option.
  • Support "natural" $sort option via new flag "#" which sorts embedded natural numbers in numerical order.
  • Support numeric $sort option via new flag "n", similar to GNU sort -n.
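The "natural" ordering described for the "#" flag can be sketched as a comparison function. This is an illustration of the general technique rather than omega's actual code: natural_less and natural_sorted are hypothetical names, and the handling of leading zeros (ignored when comparing numbers) is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Return true if a should sort before b, comparing embedded runs of
// digits by numeric value and everything else byte by byte.
bool natural_less(const std::string& a, const std::string& b) {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        unsigned char ca = a[i], cb = b[j];
        if (std::isdigit(ca) && std::isdigit(cb)) {
            // Skip leading zeros so "007" and "7" compare as equal numbers.
            while (i < a.size() && a[i] == '0') ++i;
            while (j < b.size() && b[j] == '0') ++j;
            std::size_t si = i, sj = j;
            while (i < a.size() && std::isdigit(static_cast<unsigned char>(a[i]))) ++i;
            while (j < b.size() && std::isdigit(static_cast<unsigned char>(b[j]))) ++j;
            std::size_t la = i - si, lb = j - sj;
            // A longer digit run (without leading zeros) is a larger number.
            if (la != lb) return la < lb;
            // Same length: lexicographic order matches numeric order.
            int c = a.compare(si, la, b, sj, lb);
            if (c != 0) return c < 0;
        } else {
            if (ca != cb) return ca < cb;
            ++i;
            ++j;
        }
    }
    // A string that ran out first sorts before the longer one.
    return i == a.size() && j != b.size();
}

// Return a copy of the input sorted in natural order.
std::vector<std::string> natural_sorted(std::vector<std::string> items) {
    std::sort(items.begin(), items.end(), natural_less);
    return items;
}
```

Under this comparison "item2" sorts before "item10", whereas a plain string comparison would place "item10" first.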
Last modified on 10/27/18 07:56:41