.. Copyright (C) 2007 Olly Betts

======================
Xapian Synonym Support
======================

.. contents:: Table of contents

Introduction
============

Xapian provides support for storing a synonym dictionary, or thesaurus. The
Xapian::QueryParser class can use this to expand terms in user query strings,
either automatically or when the user explicitly requests it with the synonym
operator (``~``).

Note that Xapian doesn't offer automated generation of the synonym dictionary;
you have to build it yourself.

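
Because the dictionary has to be populated by your own code, here is a minimal
sketch of doing that with a ``Xapian::WritableDatabase``. It assumes the
``add_synonym()``, ``synonyms_begin()``/``synonyms_end()`` and ``commit()``
methods (``commit()`` was called ``flush()`` before Xapian 1.1). The helper
name ``build_and_read_synonyms`` and any database path you pass to it are
hypothetical. Compile with the flags reported by
``xapian-config --cxxflags --libs``.

.. code-block:: cpp

    #include <xapian.h>

    #include <cassert>
    #include <string>
    #include <vector>

    // Hypothetical helper: stores a few synonyms in an on-disk database at
    // `path`, then reads back the synonyms recorded for `term`.
    std::vector<std::string> build_and_read_synonyms(const std::string& path,
                                                     const std::string& term) {
        assert(!path.empty());
        // Use a disk-based backend: synonyms aren't supported by InMemory.
        Xapian::WritableDatabase db(path, Xapian::DB_CREATE_OR_OVERWRITE);

        db.add_synonym("truck", "lorry");
        db.add_synonym("truck", "van");
        db.commit();  // flush() in Xapian releases before 1.1

        std::vector<std::string> synonyms;
        for (Xapian::TermIterator t = db.synonyms_begin(term);
             t != db.synonyms_end(term); ++t) {
            synonyms.push_back(*t);
        }
        return synonyms;
    }

Asking for the synonyms of a term that has no entry simply yields an empty
list.
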
Model
=====

In the synonym dictionary, a term or a group of consecutive terms can have one
or more synonym terms. A group of consecutive terms is specified by joining the
terms with a single space between each one, so the group made of ``stock``
followed by ``market`` is stored as ``stock market``.

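
To make the key format concrete, here is a small sketch. ``make_synonym_key``
is a hypothetical helper, not part of Xapian, which joins a group of terms the
way the dictionary expects.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical helper (not part of the Xapian API): builds the dictionary
    // key for a group of consecutive terms by joining them with single spaces.
    std::string make_synonym_key(const std::vector<std::string>& terms) {
        std::string key;
        for (std::size_t i = 0; i < terms.size(); ++i) {
            // A term containing a space would make the joined key ambiguous.
            assert(terms[i].find(' ') == std::string::npos);
            if (i != 0) key += ' ';
            key += terms[i];
        }
        return key;
    }

The resulting string is what you would pass as the first (term) argument of
``WritableDatabase::add_synonym()``, e.g.
``db.add_synonym(make_synonym_key({"stock", "market"}), "bourse")``.
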
QueryParser Integration
=======================

For any of the QueryParser's synonym features to work, you must call
``QueryParser::set_database()`` to specify the database whose synonym
dictionary should be used.

If ``FLAG_SYNONYM`` is passed to ``QueryParser::parse_query()``, the
QueryParser will recognise ``~`` in front of a term as a request for synonym
expansion. If ``FLAG_LOVEHATE`` is also specified, you can put ``+`` or ``-``
before the ``~`` (as in ``+~truck`` or ``-~truck``) to indicate that you love
or hate the synonym-expanded expression.

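
A minimal sketch of wiring this together, assuming a disk-based database that
the sketch itself creates. The helper name ``parse_with_explicit_synonyms`` and
the database path are hypothetical; ``set_database()``, ``parse_query()`` and
the ``FLAG_SYNONYM``/``FLAG_LOVEHATE`` flags are the QueryParser features
described above.

.. code-block:: cpp

    #include <xapian.h>

    #include <cassert>
    #include <string>

    // Hypothetical helper: builds a small synonym dictionary at `path`, then
    // parses `query_string` with the explicit synonym operator enabled and
    // returns the parsed query's description.
    std::string parse_with_explicit_synonyms(const std::string& path,
                                             const std::string& query_string) {
        assert(!path.empty());
        Xapian::WritableDatabase db(path, Xapian::DB_CREATE_OR_OVERWRITE);
        db.add_synonym("truck", "lorry");
        db.add_synonym("truck", "van");
        db.commit();  // flush() in Xapian releases before 1.1

        Xapian::QueryParser qp;
        // Without this call the QueryParser has no synonym dictionary to use.
        qp.set_database(db);

        // FLAG_SYNONYM enables "~term"; FLAG_LOVEHATE enables "+~term"/"-~term".
        unsigned flags = Xapian::QueryParser::FLAG_SYNONYM |
                         Xapian::QueryParser::FLAG_LOVEHATE;
        Xapian::Query query = qp.parse_query(query_string, flags);
        return query.get_description();
    }

Passing ``~truck`` should give a query mentioning ``truck``, ``lorry`` and
``van``, while plain ``truck`` is left unexpanded because no ``~`` was given.
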
A synonym-expanded term becomes the term itself OR-ed with any listed synonyms,
so ``~truck`` might expand to ``truck OR lorry OR van``. A group of terms is
handled in much the same way.

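
The expansion rule can be modelled with plain strings. This is an illustrative
sketch only: ``SynonymDict`` is a hypothetical in-memory stand-in for the
dictionary and ``expand_term`` a hypothetical helper, whereas the real
QueryParser builds ``Xapian::Query`` objects rather than strings.

.. code-block:: cpp

    #include <cassert>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical in-memory stand-in for the synonym dictionary (the real
    // dictionary lives in the database).
    using SynonymDict = std::map<std::string, std::vector<std::string>>;

    // Models the expansion: the term itself OR-ed with each of its listed
    // synonyms. A term with no entry is returned unchanged.
    std::string expand_term(const SynonymDict& dict, const std::string& term) {
        assert(!term.empty());
        std::string expr = term;
        auto it = dict.find(term);
        if (it != dict.end()) {
            for (const std::string& synonym : it->second) {
                expr += " OR " + synonym;
            }
        }
        return expr;
    }

With ``truck`` mapped to ``lorry`` and ``van``, ``expand_term`` returns
``truck OR lorry OR van``; a group key such as ``stock market`` is expanded in
exactly the same way.
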
If a term to be synonym-expanded will also be stemmed by the QueryParser,
synonyms are checked for the unstemmed form first and then for the stemmed
form. This means you can provide different synonyms for particular unstemmed
forms if you want to.

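
The lookup order can be modelled as follows. ``lookup_synonyms``, the
in-memory ``SynonymDict`` and the caller-supplied stemmer are hypothetical
stand-ins. The sketch also assumes that the stemmed form is only consulted when
the unstemmed form has no entry, which is how it lets an unstemmed entry
override the stemmed one; it is not a description of Xapian's internals.

.. code-block:: cpp

    #include <cassert>
    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    using SynonymDict = std::map<std::string, std::vector<std::string>>;

    // Hypothetical model of the lookup order: try the unstemmed form first and
    // (an assumption of this sketch) fall back to the stemmed form only if the
    // unstemmed form has no entry.
    std::vector<std::string> lookup_synonyms(
        const SynonymDict& dict, const std::string& unstemmed,
        const std::function<std::string(const std::string&)>& stem) {
        assert(stem);  // a stemmer must be supplied
        auto it = dict.find(unstemmed);
        if (it != dict.end()) return it->second;
        it = dict.find(stem(unstemmed));
        if (it != dict.end()) return it->second;
        return {};
    }

For example, with a toy stemmer that strips a trailing ``s``, an entry for
``trucks`` wins over the entry for ``truck``, while ``vans`` (which has no
entry of its own) falls back to the synonyms of ``van``.
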
If ``FLAG_AUTO_SYNONYMS`` is passed to ``QueryParser::parse_query()`` then the
QueryParser will automatically expand any term which has synonyms, unless the
term is in a phrase or similar.

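
A minimal sketch of automatic expansion, under the same assumptions as an
explicit ``~`` request (a disk-based database created by the sketch). The
helper name ``parse_with_auto_synonyms`` and the database path are
hypothetical; ``FLAG_PHRASE`` is enabled as well so that quoted phrases are
recognised.

.. code-block:: cpp

    #include <xapian.h>

    #include <cassert>
    #include <string>

    // Hypothetical helper: every term with synonyms is expanded without the
    // user typing "~"; terms inside a quoted phrase are not expanded.
    std::string parse_with_auto_synonyms(const std::string& path,
                                         const std::string& query_string) {
        assert(!path.empty());
        Xapian::WritableDatabase db(path, Xapian::DB_CREATE_OR_OVERWRITE);
        db.add_synonym("truck", "lorry");
        db.commit();  // flush() in Xapian releases before 1.1

        Xapian::QueryParser qp;
        qp.set_database(db);
        unsigned flags = Xapian::QueryParser::FLAG_PHRASE |
                         Xapian::QueryParser::FLAG_AUTO_SYNONYMS;
        return qp.parse_query(query_string, flags).get_description();
    }

With this setup, ``red truck`` should have ``truck`` expanded to include
``lorry``, whereas the phrase ``"red truck"`` should be left unexpanded.
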
If ``FLAG_AUTO_MULTIWORD_SYNONYMS`` is passed to ``QueryParser::parse_query()``,
the QueryParser will look at groups of terms separated only by whitespace and
try to expand them as term groups. This is done in a "greedy" fashion: the
first term which can start a group is expanded first, using the longest group
starting with that term. After expansion, the QueryParser looks for further
possible expansions starting with the term after the last term in the expanded
group.

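
The greedy strategy can be sketched as below. This is a model of the scanning
order only, not Xapian's implementation: ``SynonymDict`` is a hypothetical
in-memory stand-in for the database, ``expand_greedy`` is a hypothetical
helper, and strings stand in for ``Xapian::Query`` objects.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <map>
    #include <string>
    #include <vector>

    using SynonymDict = std::map<std::string, std::vector<std::string>>;

    // Hypothetical model of greedy multi-word expansion over a run of terms
    // separated only by whitespace. Each result element is either an expanded
    // group "(key OR synonym ...)" or a term left unexpanded.
    std::vector<std::string> expand_greedy(const SynonymDict& dict,
                                           const std::vector<std::string>& terms) {
        std::vector<std::string> result;
        std::size_t i = 0;
        while (i < terms.size()) {
            // Find the longest dictionary key that starts at terms[i].
            std::string key;
            std::size_t best_len = 0;
            SynonymDict::const_iterator best = dict.end();
            for (std::size_t j = i; j < terms.size(); ++j) {
                if (j != i) key += ' ';
                key += terms[j];
                auto it = dict.find(key);
                if (it != dict.end()) {
                    best_len = j - i + 1;
                    best = it;
                }
            }
            if (best_len == 0) {
                // No group starts here: keep the term and move on.
                result.push_back(terms[i]);
                ++i;
                continue;
            }
            std::string expr = "(" + best->first;
            for (const std::string& synonym : best->second) expr += " OR " + synonym;
            result.push_back(expr + ")");
            // Resume with the term after the last term of the expanded group.
            i += best_len;
        }
        return result;
    }

For example, with keys ``stock`` and ``stock market`` both present, the terms
``stock market crash`` expand ``stock market`` as one group (the longest match
starting at ``stock``) and leave ``crash`` alone.
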
Current Limitations
===================

Explicit multi-word synonyms
----------------------------

There ought to be a way to explicitly request expansion of multi-term synonyms,
probably with the syntax ``~"stock market"``. This hasn't been implemented
yet though.

Backend Support
---------------

Currently synonyms are only supported by flint databases. They work with a
single database or multiple databases (use ``Database::add_database()`` as
usual). We've no plans to support them for the deprecated Quartz backend, nor
for InMemory, but we do intend to support them for the remote backend in the
future.