Release Overview for 1.0.3
This page contains a high-level description of the more notable changes made for release 1.0.3. For full details, see the NEWS files in each module.
Tar Format
Distribution tarballs are now in the POSIX "ustar" format. This supports pathnames longer than 99 characters (which we now have a few instances of in the doxygen-generated documentation) and also results in a xapian-core tarball that is about half the size! This format should be readable by any tar program in current use. If your tar program doesn't support it, we'd like to know (but note that the GNU tar tarball is smaller than the size reduction in the xapian-core tarball...)
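As a rough illustration of the pathname point, the sketch below uses Python's standard tarfile module (which is not part of the Xapian build system) to write a ustar archive in memory and read it back. The long path and the payload are made up for the example.

```python
import io
import tarfile


def make_ustar(path, payload):
    """Return the bytes of a ustar archive holding one file at `path`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=path)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# Hypothetical path, loosely modelled on doxygen-style file names.
long_path = "docs/apidoc/html/" + "classXapian_1_1" + "x" * 70 + ".html"

archive = make_ustar(long_path, b"<html></html>")

# Read the archive back and list the member names it contains.
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    names = tar.getnames()
```

ustar stores a long pathname as a prefix plus a name, so the path must contain a "/" at a point where both parts fit their fields.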
User Metadata
Xapian now allows you to store arbitrary amounts of "metadata" in the database (currently the flint and inmemory backends support this). The metadata is versioned along with the other database contents.
Matcher Fixes and Enhancements
The lower bound on the number of matching documents for an AND query has been improved in the case where a lot of documents match either side.
If the checkatleast parameter to Enquire::get_mset() is used, but there are fewer results than checkatleast, then MSet::get_matches_lower_bound() and MSet::get_matches_upper_bound() are now always reported as equal.
When sorting by value, and using the checkatleast parameter to Enquire::get_mset(), some potential matches weren't being counted.
Flint Backend
The Flint database format has been extended to support user metadata, and each termlist entry is now a byte shorter (before compression). As a result, Xapian 1.0.2 and earlier won't be able to read Xapian 1.0.3 databases. However, Xapian 1.0.3 can read older databases. If you open an older flint database for writing with Xapian 1.0.3, it will be upgraded such that it cannot then be read by Xapian 1.0.2 and earlier.
Zlib compression wasn't being used for the spelling or synonym tables (due to a typo - Z_DEFAULT_COMPRESSION where it should be Z_DEFAULT_STRATEGY).
Flint can now write out changes during a transaction without actually committing them, which means it should no longer be possible to run out of memory by doing an enormous number of operations as a transaction.
We now check that the length of new terms is at most 245 bytes in WritableDatabase::add_document() and WritableDatabase::replace_document() so you get an error right away rather than when flush() is (explicitly or implicitly) called.
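An application that wants to catch over-long terms even earlier could run a similar check itself. The following is a minimal sketch, not Xapian code: oversized_terms is a hypothetical helper, and it assumes terms are passed to Xapian as UTF-8, so the limit is counted in encoded bytes rather than characters.

```python
MAX_TERM_BYTES = 245  # limit quoted in the paragraph above


def oversized_terms(terms, limit=MAX_TERM_BYTES):
    """Return the terms whose UTF-8 encoding is longer than `limit` bytes."""
    return [t for t in terms if len(t.encode("utf-8")) > limit]


# 245 ASCII characters fit; 246 do not; 123 two-byte characters are 246 bytes.
terms = ["xapian", "a" * 245, "b" * 246, "\u00e9" * 123]
too_long = oversized_terms(terms)
```

Note that a term can exceed the limit with fewer than 245 characters if it contains multi-byte characters.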
Flint used to read the value of the environment variable XAPIAN_FLUSH_THRESHOLD when the first WritableDatabase was opened and would then cache this value. However, the program using Xapian may have changed it, so we now reread it each time a WritableDatabase is opened.
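The difference between caching the value and rereading it can be sketched as follows. This is an illustration only: FakeWritableDatabase and read_flush_threshold are hypothetical names, the fallback of 10000 is an assumed default, and the environment is passed in as a plain dictionary so the example doesn't modify the real process environment.

```python
import os

DEFAULT_THRESHOLD = 10000  # assumed default, used here only for illustration


def read_flush_threshold(environ=os.environ):
    """Look up XAPIAN_FLUSH_THRESHOLD afresh on every call (no caching)."""
    raw = environ.get("XAPIAN_FLUSH_THRESHOLD", "")
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    return DEFAULT_THRESHOLD


class FakeWritableDatabase:
    """Hypothetical stand-in for a writable database; not the Xapian class."""

    def __init__(self, environ=os.environ):
        # Reread at open time so a change made by the program takes effect.
        self.flush_threshold = read_flush_threshold(environ)


env = {"XAPIAN_FLUSH_THRESHOLD": "500"}
first = FakeWritableDatabase(env)
env["XAPIAN_FLUSH_THRESHOLD"] = "2000"  # the program changes the setting
second = FakeWritableDatabase(env)      # the new value is picked up
```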
Remote Backend
MSet::get_matches_lower_bound() now gives a correct answer when using the checkatleast parameter to Enquire::get_mset().
Build System
xapian-config --libs now gives the correct output when shared libraries are disabled.
Documentation
We now have a glossary, and a number of other documents have been improved in minor ways.
Omega
omindex now supports indexing AbiWord documents and TeX DVI files.
omindex now imposes a 5 minute CPU time limit on external filter programs to prevent indexing from stalling if an external filter goes into an infinite loop (e.g. on malformed input).
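One common way to impose such a limit on POSIX systems is the RLIMIT_CPU resource limit. The sketch below shows that approach in Python; it is an assumption that this resembles what omindex does internally, run_filter is a hypothetical helper, and the command being run is a stand-in for a real filter program.

```python
import resource
import subprocess
import sys

CPU_LIMIT_SECONDS = 5 * 60  # the 5 minute limit described above


def run_filter(argv, cpu_seconds=CPU_LIMIT_SECONDS):
    """Run an external command with a cap on its CPU time (POSIX only)."""

    def limit_cpu():
        # Runs in the child process just before the command is executed.
        # Only the soft limit is changed, and never raised above the hard limit.
        _soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
        if hard != resource.RLIM_INFINITY:
            new_soft = min(cpu_seconds, hard)
        else:
            new_soft = cpu_seconds
        resource.setrlimit(resource.RLIMIT_CPU, (new_soft, hard))

    return subprocess.run(argv, preexec_fn=limit_cpu,
                          capture_output=True, text=True)


# Stand-in for a filter: a child Python process that prints some text.
result = run_filter([sys.executable, "-c", "print('filtered text')"])
```

The limit counts CPU time rather than elapsed time, so a filter that is merely waiting on I/O is not affected by it.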
scriptindex now correctly reports line numbers for format errors in dump files.
Added $muldiv{A,B,C}, which calculates int(A*B/C).
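The arithmetic can be sketched in Python as below. This is not OmegaScript: muldiv is a hypothetical function, it assumes integer arguments, and it reads int() as truncation toward zero. How Omega itself treats C = 0 is not covered here; this sketch simply raises an exception.

```python
def muldiv(a, b, c):
    """Return a*b/c truncated toward zero, using integer arithmetic only."""
    product = a * b
    quotient = abs(product) // abs(c)  # raises ZeroDivisionError if c == 0
    # Restore the sign: negative when product and divisor differ in sign.
    return quotient if (product >= 0) == (c > 0) else -quotient


# For example, expressing 1 out of 3 as a whole percentage.
percent = muldiv(1, 100, 3)
```

Multiplying before dividing keeps the precision that would be lost by computing A/C first in integer arithmetic.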
Fixed a bug in the decimal fraction in $size for files >= 1M in size.
The query template has been overhauled and improved.
xapian-bindings
make uninstall now removes the loadable module we install for each of the bindings, and make distcheck now works. make clean now removes class files for Java inner classes and testsuite.pyc.
PHP
Fixed wrapping of Enquire::set_cutoff() - previously this would only work if the third parameter was specified and was a floating point number (e.g. 0.0).
Fixed errors in example code in the PHP bindings documentation.
Python
ValueRangeProcessor::operator() is now wrapped as a __call__ method in Python which takes two strings and returns a 3-tuple (value_number, modified_begin, modified_end). Previously this always failed with a type error, so this doesn't break existing code.
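The call shape described above (two strings in, a 3-tuple out) can be illustrated with a toy pure-Python class. PrefixRangeProcessor and the NOT_HANDLED sentinel are hypothetical and are not part of the Xapian bindings; the sketch only mirrors the calling convention.

```python
NOT_HANDLED = -1  # hypothetical sentinel meaning "range not recognised"


class PrefixRangeProcessor:
    """Toy callable with the same call shape; not a Xapian class."""

    def __init__(self, value_number, prefix):
        self.value_number = value_number
        self.prefix = prefix

    def __call__(self, begin, end):
        # Only handle ranges whose start carries the expected prefix.
        if not begin.startswith(self.prefix):
            return (NOT_HANDLED, begin, end)
        begin = begin[len(self.prefix):]
        if end.startswith(self.prefix):
            end = end[len(self.prefix):]
        return (self.value_number, begin, end)


proc = PrefixRangeProcessor(4, "$")
# Calling the object invokes __call__ and unpacks the 3-tuple.
value_number, modified_begin, modified_end = proc("$10", "$25")
```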
The issues with mod_python are now documented.
Ruby
configure now checks for RUBY_INC, RUBY_LIB, and RUBY_LIB_ARCH in the environment or on the command-line.
The defaults for RUBY_LIB and RUBY_LIB_ARCH are now the site-specific directories, which is more correct when building from source.
Tcl
configure now checks for TCL_LIB in the environment or on the command-line to allow installing without root access more cleanly.