Release Overview for 1.0.0
This page gives a high-level description of the changes made for release 1.0.0. For full details, see the NEWS file in each module.
Unicode Support
Much of Xapian just treats strings as opaque data, but Xapian::Stem, Xapian::QueryParser and the new Xapian::TermGenerator class need to know what text the data represents. These classes now assume Unicode data encoded as UTF-8. The full range of Unicode 5.0 (U+0000 to U+10FFFF) is supported.
It's still possible to use other encodings for text with Xapian, but you won't be able to make use of the above classes (except by transforming inputs to UTF-8 and outputs back from UTF-8).
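For text held in another encoding, the transformation mentioned above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the helper names `to_utf8` and `from_utf8`, and the choice of ISO-8859-1 as the other encoding, are assumptions made for the example and are not part of Xapian.

```python
def to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Transcode bytes from another encoding to UTF-8 (hypothetical helper)."""
    return data.decode(source_encoding).encode("utf-8")


def from_utf8(data: bytes, target_encoding: str = "iso-8859-1") -> bytes:
    """Transcode UTF-8 bytes back to the other encoding (hypothetical helper)."""
    return data.decode("utf-8").encode(target_encoding)


# Example input: "café" stored as ISO-8859-1 (assumed encoding).
legacy = "café".encode("iso-8859-1")
# Convert before handing the text to a UTF-8-only component.
utf8 = to_utf8(legacy)
# Convert any UTF-8 output back for the rest of the application.
restored = from_utf8(utf8)
```

Text that cannot be represented in the target encoding raises `UnicodeEncodeError` on the way back, so an application taking this route would need to decide how to handle such characters.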
Omega now works in UTF-8, with omindex converting document text to UTF-8 if it isn't already.
The bindings for other languages have been updated - where the language uses a standard encoding internally, we convert this to/from UTF-8 automatically if it isn't already UTF-8. Some languages don't have a standard internal representation - for these we've just documented the fact, with a note about how to transform text to/from UTF-8. See the new "Unicode" section of the appropriate language-specific documentation for more details.
Updated Stemmers
The stemmers have been updated to the latest version of the Snowball stemmers. This means that a small number of words produce different (and generally better) stems. The stemmers are now generated to work in UTF-8.
Some new stemmers are now included: german2 (like german but normalises umlauts), hungarian, kraaij_pohlmann (a different Dutch stemmer), romanian, and turkish.
Removal of Deprecated Features
Any features deprecated since 0.9.0 or earlier have been removed. There's a new document (xapian-core/docs/deprecation.html) which details these, along with suggested replacements. Features which are currently deprecated but still present are also listed, along with an intended schedule for removal.
New Indexing Strategy
We've thoroughly reviewed how we generate terms from a piece of text, and how we generate terms from a query string, to produce a new indexing strategy which is implemented by the new Xapian::TermGenerator class. The Xapian::QueryParser class has been updated to match.
The new strategy is documented in xapian-core/docs/termgenerator.html.
Remote Backend Improvements
The remote backend now supports all the features which local backends do, and some operations now require less data to be transferred across the link. It also now works on Microsoft Windows.
New QueryParser Features
The QueryParser now supports:
- Value ranges, e.g. $50..100, 10..20kg, 01/02/2007..03/04/2007
- Pure NOT queries, e.g. NOT apples
- Partial queries - wildcarding the last term in a query to support "search as you type".
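To make the value range syntax concrete, here is a rough Python sketch of how a token of the form START..END, with an optional currency-style prefix or unit-style suffix, might be recognised. It is not Xapian's implementation: the function names `split_range` and `parse_number_range` are hypothetical, open-ended ranges are not handled, and date ranges are only split, not interpreted, because the day/month order depends on configuration.

```python
def split_range(token):
    """Split 'START..END' into (start, end); return None if it is not a range."""
    start, sep, end = token.partition("..")
    if not sep or not start or not end:
        return None
    return start, end


def parse_number_range(token, prefix="", suffix=""):
    """Parse a numeric range with an optional prefix on the start value
    or suffix on the end value (hypothetical helper, not Xapian's API)."""
    parts = split_range(token)
    if parts is None:
        return None
    start, end = parts
    if prefix:
        if not start.startswith(prefix):
            return None
        start = start[len(prefix):]
    if suffix:
        if not end.endswith(suffix):
            return None
        end = end[:-len(suffix)]
    try:
        return float(start), float(end)
    except ValueError:
        # One of the endpoints is not a plain number.
        return None


price = parse_number_range("$50..100", prefix="$")
weight = parse_number_range("10..20kg", suffix="kg")
dates = split_range("01/02/2007..03/04/2007")
not_a_range = split_range("apples")
```

In this sketch the prefix is expected on the start of the range and the suffix on the end, matching the examples in the list above.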
Backend changes
Flint is now the default backend, and uses zlib to compress tags in the record and termlist tables. Quartz is still supported, but is deprecated and likely to be removed in Xapian 1.1.0. The Muscat36 backend has now been removed completely.
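As a loose illustration of what zlib compression of a stored tag involves, the following Python sketch compresses a byte string and keeps the compressed form only when it is smaller. The helper names `pack_tag` and `unpack_tag` are hypothetical, and the "keep only if smaller" check is an assumption made for the example; this page does not describe how Flint decides when to compress or how it lays out data on disk.

```python
import zlib


def pack_tag(tag):
    """Return (is_compressed, payload) for a tag (hypothetical helper)."""
    compressed = zlib.compress(tag)
    # Assumption for this sketch: store the compressed form only if it saves space.
    if len(compressed) < len(tag):
        return True, compressed
    return False, tag


def unpack_tag(is_compressed, payload):
    """Reverse pack_tag, decompressing only when needed."""
    return zlib.decompress(payload) if is_compressed else payload


record = b"sample record data " * 20    # repetitive data compresses well
flag, payload = pack_tag(record)
roundtrip = unpack_tag(flag, payload)    # equals the original record
```

Very short tags tend not to benefit, since the zlib header and checksum add a few bytes of overhead.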
Python improvements
The Python bindings have been improved in many ways in this release, and now present a more "pythonic" API in many places. They also release the GIL during calls to Xapian, so they will work better in a threaded Python application. Docstrings are now attached to most functions, methods and classes, so it should be easier to find your way around; further improvements to these docstrings are planned.
Smaller and faster to build
We've made a number of improvements to the build system to make Xapian quicker to build and use less disk space. We also now make use of the symbol visibility support in GCC 4.0 and later, and of -Bsymbolic-functions where available, to produce a smaller shared library which loads more quickly and even runs a bit faster.
Better Documentation
We've improved the documentation for this release; further improvements are in the pipeline.