
Release Overview for 1.0.2

This page contains a high-level description of the more notable changes made for release 1.0.2. For full details, see the NEWS files in each module.

Spelling correction

Xapian now offers spelling correction for searches, based on a dynamically maintained list of spelling "target" words. The TermGenerator has support for updating this table from the words in the text supplied to it, or the words can be added directly to the database. The spelling correction data is stored in a new Btree table in the Xapian database.
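The idea of a dynamically maintained table of target words can be pictured with a toy model. This is only a sketch: it is not Xapian's algorithm or API, and the `SpellingTable` class, its methods and the `max_distance` parameter are hypothetical names chosen for illustration. It uses plain Levenshtein edit distance, preferring closer and then more frequent targets.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class SpellingTable:
    """Hypothetical stand-in for a table of spelling target words."""

    def __init__(self) -> None:
        self.freq = Counter()

    def add_text(self, text: str) -> None:
        # Models updating the table from the words in indexed text.
        self.freq.update(text.lower().split())

    def add_word(self, word: str, count: int = 1) -> None:
        # Models adding a target word directly.
        self.freq[word.lower()] += count

    def suggest(self, word: str, max_distance: int = 2) -> str:
        """Return the best correction, or '' if none is close enough."""
        word = word.lower()
        if word in self.freq:
            return ""  # already a known target word
        best, best_key = "", None
        for target, count in self.freq.items():
            d = edit_distance(word, target)
            if d > max_distance:
                continue
            key = (d, -count, target)  # nearest first, then most frequent
            if best_key is None or key < best_key:
                best, best_key = target, key
        return best


table = SpellingTable()
table.add_text("xapian offers spelling correction for searches")
table.add_word("database")
print(table.suggest("serches"))  # -> searches
```

A real implementation would avoid scanning every target word for each lookup; the linear scan here just keeps the sketch short.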

Synonym expansion

The QueryParser now has support for performing synonym expansion, based on a table of synonyms. These may be single or multi-word synonyms.
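As a rough illustration of table-driven expansion with single and multi-word keys, the sketch below groups each query term (or multi-word phrase) with its synonyms. It does not use the QueryParser API; the function `expand_synonyms`, the `max_words` parameter and the sample table are assumptions made for this example.

```python
# Hypothetical synonym table: keys may be one word or several.
SYNONYMS = {
    "new york": ["nyc", "big apple"],
    "hotel": ["inn"],
}


def expand_synonyms(query, synonyms, max_words=3):
    """Return one group per query unit: the unit followed by its synonyms."""
    words = query.lower().split()
    groups = []
    i = 0
    while i < len(words):
        # Try the longest multi-word key first, then shorter ones.
        # A single word always matches, so the loop always advances.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if n == 1 or phrase in synonyms:
                groups.append([phrase] + synonyms.get(phrase, []))
                i += n
                break
    return groups


print(expand_synonyms("new york hotel", SYNONYMS))
# -> [['new york', 'nyc', 'big apple'], ['hotel', 'inn']]
```

Each inner list could then be combined as alternatives for that position in the query.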

Remote protocol version increased

The remote protocol version has been increased to allow access to new features. However, thanks to changes in the last release, we have been able to make this change compatibly: old clients can safely connect to the new server. To upgrade a live system, first upgrade your server, and then upgrade your clients.

Optional Btree tables

The Flint Btree manager has been enhanced so that it doesn't create position list or value tables if no entries are added to them. This reduces the number of file handles used in this case, and the time required to open the database. It also means that a database with positional information can be converted to one without positional information simply by deleting the position table, which may be useful for testing performance with and without positional information.

The new spelling and synonym tables are also optional tables: they may safely be removed without affecting the performance of the rest of the database, and will only be created if data is put into them.

The version number of the Btree format has been incremented to support this change, but 1.0.2 will happily read and modify databases created by 1.0.0 or 1.0.1. However, databases which have been created or modified by 1.0.2 cannot be read by 1.0.0 or 1.0.1.

QueryParser improvements

The precedence of the boolean operators has been adjusted to match their usual precedence in mathematics and programming languages. "NOT" now binds as tightly as "AND", and "XOR" now binds more tightly than "OR", but less tightly than "AND".
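The effect of the new precedence can be seen with a small precedence-climbing parser. This is a sketch, not the QueryParser's implementation: it assumes "NOT" is used as a binary operator ("a NOT b"), assumes left associativity, and handles only bare terms and the four operators (no brackets, '+' or '-'). The names `PRECEDENCE`, `parse` and `group` are made up for this example.

```python
# Higher number = binds more tightly. NOT shares AND's level, and XOR
# sits between OR and AND, as described above.
PRECEDENCE = {"OR": 1, "XOR": 2, "AND": 3, "NOT": 3}


def parse(tokens, min_prec=1):
    """Precedence-climbing parser; consumes tokens from the front of the list."""
    left = tokens.pop(0)  # a plain term
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # Left-associative: the right operand may only contain
        # operators that bind more tightly than op.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = f"({left} {op} {right})"
    return left


def group(query):
    """Show how a query string groups under the precedence table."""
    return parse(query.split())


print(group("a OR b AND c"))        # -> (a OR (b AND c))
print(group("a XOR b AND c OR d"))  # -> ((a XOR (b AND c)) OR d)
print(group("a AND b NOT c"))       # -> ((a AND b) NOT c)
```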

Also, '+' and '-' on bracketed subexpressions did not behave as documented; the behaviour has been fixed to match the documentation.

Finally, if the stemmer is set to "none", we no longer put a Z prefix on terms; this matches the output of TermGenerator, and should fix some reported cases of no results being returned when stemming was disabled.

New functions to assist with numeric sorting and ranges

Two new functions, Xapian::sortable_serialise() and Xapian::sortable_unserialise(), have been added. These convert between floating point numbers (as doubles) and strings, such that the sorting order of the strings is the same as the sorting order of the doubles.
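To show what "sorting order of the strings matches sorting order of the doubles" means, here is one well-known way to get that property: reinterpret the IEEE 754 bits of the double and flip them so that byte-wise comparison agrees with numeric comparison. This is not the encoding Xapian uses, and `sortable_encode` and `sortable_decode` are hypothetical names; NaN is not handled.

```python
import struct

SIGN_BIT = 1 << 63
ALL_BITS = 0xFFFFFFFFFFFFFFFF


def sortable_encode(value: float) -> bytes:
    """Encode a double so that byte-wise order matches numeric order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    if bits & SIGN_BIT:
        bits ^= ALL_BITS   # negative: flip every bit
    else:
        bits ^= SIGN_BIT   # non-negative: set the top bit
    return struct.pack(">Q", bits)


def sortable_decode(data: bytes) -> float:
    """Inverse of sortable_encode."""
    (bits,) = struct.unpack(">Q", data)
    if bits & SIGN_BIT:
        bits ^= SIGN_BIT   # was non-negative
    else:
        bits ^= ALL_BITS   # was negative
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


values = [3.5, -2.0, 0.0, 1e10, -1e-3, 100.0, -7.25]
by_bytes = sorted(values, key=sortable_encode)
print(by_bytes == sorted(values))  # -> True
```

Because the encoded form is a byte string, values stored this way can be sorted or range-compared without decoding them first.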

Fixed NumberValueRangeProcessor

The NumberValueRangeProcessor class has been fixed to use the new numeric sorting functions: it assumes that the value being range filtered has been stored as a string converted from a double using Xapian::sortable_serialise(). This allows the NumberValueRangeProcessor class to work with arbitrary numeric values, rather than being restricted to positive integers in a fixed length representation.
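The old restriction follows from how strings compare. The sketch below assumes values are compared as plain strings and shows that decimal text only sorts numerically when it is zero-padded to a fixed width, and even then only for non-negative integers. The `WIDTH` constant and `in_range` helper are hypothetical and are not part of the NumberValueRangeProcessor API.

```python
values = [9, 10, 100, 25]

# Plain decimal strings compare character by character, so "10" < "9".
as_text = sorted(str(v) for v in values)
print(as_text)  # -> ['10', '100', '25', '9']

# Zero-padding to a fixed width restores numeric order, but only for
# non-negative integers that fit in the chosen width.
WIDTH = 5  # hypothetical fixed width
padded = sorted(str(v).zfill(WIDTH) for v in values)
print(padded)   # -> ['00009', '00010', '00025', '00100']


def in_range(stored: str, low: int, high: int) -> bool:
    """String-comparison range check on zero-padded values."""
    return str(low).zfill(WIDTH) <= stored <= str(high).zfill(WIDTH)


print(in_range("00025", 10, 100))  # -> True

# Negative numbers break the scheme: -3 sorts before -10 as text.
negatives = sorted(str(v).zfill(WIDTH) for v in (-10, -3))
print(negatives)  # -> ['-0003', '-0010']
```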

Fixed bugs in the matcher

Two bugs related to the matcher were fixed: firstly, if the check_at_least parameter was supplied to get_mset(), the resulting MSet could contain check_at_least items, instead of the (lower) maximum number of requested items. The efficiency of a match using check_at_least has also been improved - no extra memory is now needed to support check_at_least.

Secondly, if a search involving a MatchAll query was performed, the wrong statistics would be used, and assertions could fail.

Fixed bugs in delete_document()

In previous releases, WritableDatabase::delete_document() would cancel all pending changes if the document ID specified didn't exist (because the implementation assumed that any exception was a problem which should result in the transaction being aborted). This wasn't intended, and isn't particularly helpful behaviour, so it's been changed: in this situation, the pending changes will no longer be discarded.
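The difference can be modelled with a toy class that buffers changes until commit. Everything here is hypothetical (`ToyWritableDatabase`, `DocNotFound`, the `cancel_on_missing` flag) and only mimics the behaviour described above; it is not how WritableDatabase is implemented, and it assumes that deleting a missing document still raises an error.

```python
class DocNotFound(KeyError):
    """Hypothetical error for a missing document ID."""


class ToyWritableDatabase:
    def __init__(self, cancel_on_missing: bool) -> None:
        self.committed = {}
        self.pending = {}  # docid -> text, or None for a pending delete
        self.cancel_on_missing = cancel_on_missing

    def add_document(self, docid: int, text: str) -> None:
        self.pending[docid] = text

    def _exists(self, docid: int) -> bool:
        if docid in self.pending:
            return self.pending[docid] is not None
        return docid in self.committed

    def delete_document(self, docid: int) -> None:
        if not self._exists(docid):
            if self.cancel_on_missing:
                self.pending.clear()  # models the pre-1.0.2 behaviour
            raise DocNotFound(docid)
        self.pending[docid] = None

    def commit(self) -> None:
        for docid, text in self.pending.items():
            if text is None:
                self.committed.pop(docid, None)
            else:
                self.committed[docid] = text
        self.pending.clear()


def run(cancel_on_missing: bool) -> dict:
    db = ToyWritableDatabase(cancel_on_missing)
    db.add_document(1, "first")
    try:
        db.delete_document(99)  # no such document
    except DocNotFound:
        pass
    db.commit()
    return db.committed


print(run(cancel_on_missing=True))   # -> {}
print(run(cancel_on_missing=False))  # -> {1: 'first'}
```

With the old behaviour the added document is silently lost; with the new behaviour it survives the failed delete and is committed.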

Fixed bug in exception handling during commit

A similar bug in the handling of exceptions during commit() was found and fixed. It could have left tables out of sync, resulting in a corrupt database. This hadn't been directly reported before, but one or two unverifiable reports of errors during a session leading to corruption may have been caused by this problem.

Last modified on 26/01/16 10:10:43