
Release Overview for 1.0.3

This page contains a high-level description of the more notable changes made for release 1.0.3. For full details, see the NEWS files in each module.

Tar Format

Distribution tarballs are now in the POSIX "ustar" format. This supports pathnames longer than 99 characters (which we now have a few instances of in the doxygen generated documentation) and also results in a xapian-core tarball that is about half the size! This format should be readable by any tar program in current use - if your tar program doesn't support it, we'd like to know (but note that the GNU tar tarball is smaller than the size reduction in the xapian-core tarball...)
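As a rough illustration of the long-pathname point, the sketch below uses Python's standard tarfile module to write a ustar archive in memory and read it back. The pathname is made up for the example and is not a real file from the xapian-core tarball; it is simply longer than 99 characters, which ustar can store by splitting it at a directory separator.

```python
import io
import tarfile


def roundtrip_ustar(path, payload):
    """Write one member to an in-memory ustar archive and return the names read back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=path)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()


# Hypothetical doxygen-style pathname, 125 characters long.
long_path = (
    "xapian-core-1.0.3/docs/apidoc/html/"
    + "classXapian_1_1" + "a" * 70 + ".html"
)

if __name__ == "__main__":
    print(len(long_path), roundtrip_ustar(long_path, b"<html></html>"))
```

Note that ustar still has limits: the pathname must be splittable at a "/" into a prefix of at most 155 bytes and a name of at most 100 bytes.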

User Metadata

Xapian now allows you to store arbitrary amounts of "metadata" in the database (currently the flint and inmemory backends support this). The metadata is versioned along with the other database contents.

Matcher Fixes and Enhancements

The lower bound on the number of matching documents for an AND query has been improved in the case where a lot of documents match either side.
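To see why a better lower bound is possible when both sides match many documents, consider a counting argument: if the two sides together match more documents than the database holds, some documents must match both. The sketch below shows that argument only. The function name and parameters are invented for the example, and whether the matcher uses exactly this formula is an assumption rather than something stated here.

```python
def and_lower_bound(left_min, right_min, doc_count):
    """Lower bound on |A AND B| given lower bounds on |A| and |B|.

    If left_min + right_min exceeds doc_count, the two sets cannot be
    disjoint, so at least the excess must be in both.
    """
    return max(0, left_min + right_min - doc_count)


if __name__ == "__main__":
    # 800 and 700 matches in a 1000-document database must overlap.
    print(and_lower_bound(800, 700, 1000))
    # 100 and 200 matches could be entirely disjoint.
    print(and_lower_bound(100, 200, 1000))
```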

If the checkatleast parameter to Enquire::get_mset() is used, but there are fewer results than checkatleast then MSet::get_matches_lower_bound() and MSet::get_matches_upper_bound() are now always reported as equal.

When sorting by value, and using the checkatleast parameter to Enquire::get_mset(), some potential matches weren't being counted. This has been fixed.

Flint backend

The Flint database format has been extended to support user metadata, and each termlist entry is now a byte shorter (before compression). As a result, Xapian 1.0.2 and earlier won't be able to read Xapian 1.0.3 databases. However, Xapian 1.0.3 can read older databases. If you open an older flint database for writing with Xapian 1.0.3, it will be upgraded such that it cannot then be read by Xapian 1.0.2 and earlier.

Zlib compression wasn't being used for the spelling or synonym tables, due to a typo (Z_DEFAULT_COMPRESSION where it should have been Z_DEFAULT_STRATEGY). This has been fixed.

Flint can now write out changes during a transaction without actually committing them, which means it should no longer be possible to run out of memory by doing an enormous number of operations as a transaction.

We now check that the length of new terms is at most 245 bytes in WritableDatabase::add_document() and WritableDatabase::replace_document() so you get an error right away rather than when flush() is (explicitly or implicitly) called.
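An application that wants to reject over-long terms before handing a document to Xapian could apply a similar check itself. The sketch below is a hypothetical helper, not part of the Xapian API, and it assumes terms are encoded as UTF-8. The point it illustrates is that the limit is in bytes, so a term with non-ASCII characters reaches it in fewer characters.

```python
MAX_TERM_BYTES = 245  # limit quoted in the release notes


def check_term_length(term):
    """Return the UTF-8 encoding of term, or raise ValueError if it is too long."""
    encoded = term.encode("utf-8")
    if len(encoded) > MAX_TERM_BYTES:
        raise ValueError(
            "term is %d bytes, limit is %d" % (len(encoded), MAX_TERM_BYTES)
        )
    return encoded


if __name__ == "__main__":
    check_term_length("a" * 245)        # exactly at the limit
    try:
        check_term_length("é" * 123)    # 123 characters, but 246 bytes
    except ValueError as exc:
        print(exc)
```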

Flint used to read the value of the environment variable XAPIAN_FLUSH_THRESHOLD when the first WritableDatabase was opened and would then cache this value. However, the program using Xapian may have changed it since, so we now reread it each time a WritableDatabase is opened.
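The difference between caching the variable once and rereading it on every open can be modelled in a few lines. This is a toy model only: the class and function names are invented, the default of 10000 is an assumption about Xapian's default threshold, and the handling of invalid values is a guess rather than Flint's actual behaviour.

```python
import os

DEFAULT_FLUSH_THRESHOLD = 10000  # assumed default, not taken from this page


def read_flush_threshold(environ=os.environ):
    """Parse XAPIAN_FLUSH_THRESHOLD, falling back to the default if unset or invalid."""
    try:
        value = int(environ.get("XAPIAN_FLUSH_THRESHOLD", ""))
    except ValueError:
        return DEFAULT_FLUSH_THRESHOLD
    return value if value > 0 else DEFAULT_FLUSH_THRESHOLD


class WritableDatabaseModel:
    """Stand-in for a writable database that rereads the threshold on each open."""

    def __init__(self, environ=os.environ):
        self.flush_threshold = read_flush_threshold(environ)


if __name__ == "__main__":
    env = {}
    first = WritableDatabaseModel(env)
    env["XAPIAN_FLUSH_THRESHOLD"] = "500"   # changed by the program after the first open
    second = WritableDatabaseModel(env)
    print(first.flush_threshold, second.flush_threshold)
```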

Remote Backend

MSet::get_matches_lower_bound() now gives a correct answer when using the checkatleast parameter to Enquire::get_mset().

Build System

xapian-config --libs now gives the correct output when shared libraries are disabled.

Documentation

We now have a glossary, and a number of other documents have been improved in minor ways.

Omega

omindex now supports indexing AbiWord documents and TeX DVI files.

omindex now imposes a 5 minute CPU time limit on external filter programs to prevent indexing from stalling if an external filter goes into an infinite loop (e.g. on malformed input).
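The same kind of guard can be sketched for any program that shells out to external filters. The example below is not how omindex is implemented (that is not described here); it is a minimal Unix-only sketch that uses the POSIX CPU resource limit so the kernel terminates a child process that uses more than 5 minutes of CPU time. The function names are invented for the example.

```python
import resource
import subprocess
import sys

CPU_LIMIT_SECONDS = 5 * 60


def _limit_cpu():
    """Runs in the child just before exec: lower the soft CPU limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_CPU)
    if hard == resource.RLIM_INFINITY:
        soft = CPU_LIMIT_SECONDS
    else:
        soft = min(CPU_LIMIT_SECONDS, hard)
    resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))


def run_filter(argv):
    """Run an external filter with the CPU limit applied and return its stdout."""
    proc = subprocess.run(
        argv, stdout=subprocess.PIPE, preexec_fn=_limit_cpu, check=True
    )
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for a real filter program.
    print(run_filter([sys.executable, "-c", "print('ok')"]))
```

A CPU time limit differs from a wall-clock timeout: a filter that is blocked waiting on I/O does not consume its budget, while one spinning in an infinite loop does.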

scriptindex now correctly reports line numbers for format errors in dump files.

Added $muldiv{A,B,C}, which calculates int(A*B/C).
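For readers unfamiliar with the notation, the arithmetic can be sketched as below. This is an illustration of the formula, not Omega's code: it assumes integer arguments and that int() truncates toward zero, and how Omega treats negative values or C = 0 is not covered here.

```python
def muldiv(a, b, c):
    """Compute int(a*b/c) using integer arithmetic, truncating toward zero."""
    product = a * b
    quotient = abs(product) // abs(c)   # raises ZeroDivisionError if c == 0
    # Restore the sign: negative only when product and c have opposite signs.
    return quotient if (product < 0) == (c < 0) else -quotient


if __name__ == "__main__":
    # e.g. scaling 7 out of 3 to a percentage-style figure
    print(muldiv(7, 100, 3))
```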

Fixed a bug in the decimal fraction in $size for files >= 1M in size.

The query template has been overhauled and improved.

xapian-bindings

make uninstall now removes the loadable module we install for each of the bindings, and make distcheck now works. make clean now removes class files for Java inner classes and testsuite.pyc.

PHP

Fixed the wrapping of Enquire::set_cutoff(): previously this would only work if the third parameter was specified and was a floating point number (e.g. 0.0). Fixed errors in the example code in the PHP bindings documentation.

Python

ValueRangeProcessor::operator() is now wrapped as a __call__ method in Python which takes two strings and returns a 3-tuple (value_number, modified_begin, modified_end). Previously this always failed with a type error, so this doesn't break existing code. The issues with mod_python are now documented.

Ruby

configure now checks for RUBY_INC, RUBY_LIB, and RUBY_LIB_ARCH in the environment or on the command-line. The defaults for RUBY_LIB and RUBY_LIB_ARCH are now the site-specific directories, which is more correct when building from source.

Tcl

configure now checks for TCL_LIB in the environment or on the command-line to allow installing without root access more cleanly.

Last modified on 26/01/16 10:10:43