| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
|---|
| 2 | <HTML> |
|---|
| 3 | <HEAD> |
|---|
| 4 | <TITLE>Xapian::QueryParser Syntax</TITLE> |
|---|
| 5 | </HEAD> |
|---|
| 6 | <BODY BGCOLOR="white" TEXT="black"> |
|---|
| 7 | |
|---|
| 8 | <H1>Xapian::QueryParser Syntax</H1> |
|---|
| 9 | |
|---|
| 10 | <P>This document describes the query syntax supported by the |
|---|
| 11 | Xapian::QueryParser class. The syntax is designed to be similar to that of other |
|---|
| 12 | web-based search engines, so that users familiar with them don't have to learn |
|---|
| 13 | a whole new syntax. |
|---|
| 14 | |
|---|
| 15 | <H2>Operators</H2> |
|---|
| 16 | |
|---|
| 17 | <H3>AND</H3> |
|---|
| 18 | |
|---|
| 19 | <P><i>expression</i> AND <i>expression</i> matches documents which are matched |
|---|
| 20 | by both of the subexpressions. |
|---|
| 21 | |
|---|
| 22 | <H3>OR</H3> |
|---|
| 23 | |
|---|
| 24 | <P><i>expression</i> OR <i>expression</i> matches documents which are matched |
|---|
| 25 | by either of the subexpressions. |
|---|
| 26 | |
|---|
| 27 | <H3>NOT</H3> |
|---|
| 28 | |
|---|
| 29 | <P><i>expression</i> NOT <i>expression</i> matches documents which are matched |
|---|
| 30 | by only the first subexpression. This can also be written as |
|---|
| 31 | <i>expression</i> AND NOT <i>expression</i>. |
|---|
| 32 | If <code>FLAG_PURE_NOT</code> is enabled, then <code>NOT <i>expression</i></code> will |
|---|
| 33 | match documents which don't match the subexpression. |
|---|
| 34 | |
|---|
| 35 | <H3>XOR</H3> |
|---|
| 36 | |
|---|
| 37 | <P><i>expression</i> XOR <i>expression</i> matches documents which are matched |
|---|
| 38 | by one or other of the subexpressions, but not both. XOR is rarely needed |
|---|
| 39 | in practice. |
|---|
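<P>The four boolean operators above can be pictured as set operations on the documents each subexpression matches. The following is a minimal sketch of those semantics only, not of how Xapian evaluates queries; the document ids and the names <code>left</code>, <code>right</code> and <code>all_docs</code> are made up for illustration.

```python
# Hypothetical sets of document ids matched by each subexpression.
left = {1, 2, 3, 4}
right = {3, 4, 5}

and_docs = left & right   # AND: matched by both subexpressions
or_docs = left | right    # OR: matched by either subexpression
not_docs = left - right   # NOT / AND NOT: matched by only the first
xor_docs = left ^ right   # XOR: matched by one or the other, but not both

# A pure NOT (as with FLAG_PURE_NOT) has no left-hand side, so it is taken
# relative to every document in the (hypothetical) collection.
all_docs = {1, 2, 3, 4, 5, 6}
pure_not_docs = all_docs - right

for name, docs in [("AND", and_docs), ("OR", or_docs), ("NOT", not_docs),
                   ("XOR", xor_docs), ("pure NOT", pure_not_docs)]:
    print(name, sorted(docs))
```

<P>The pure NOT case shows why it is a separate option: it has to consider the whole collection rather than just the documents matched by another subexpression.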
| 40 | |
|---|
| 41 | <H3>Bracketed expressions</H3> |
|---|
| 42 | |
|---|
| 43 | <P>You can control the precedence of the boolean operators using brackets. |
|---|
| 44 | In the query <code>one OR two AND three</code> the AND takes precedence, |
|---|
| 45 | so this is the same as <code>one OR (two AND three)</code>. You can override |
|---|
| 46 | the precedence using <code>(one OR two) AND three</code>. |
|---|
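<P>The effect of the brackets can be checked by brute force over every combination of "does this term match?". The sketch below relies on the fact that Python's own <code>and</code> binds more tightly than <code>or</code>, which happens to mirror the precedence described above; the function names are illustrative only.

```python
from itertools import product

def default_grouping(one, two, three):
    # No brackets: `and` binds tighter than `or`.
    return one or two and three

def explicit_default(one, two, three):
    # Same query with the default grouping written out.
    return one or (two and three)

def overridden(one, two, three):
    # Brackets force the OR to be evaluated first.
    return (one or two) and three

cases = list(product([False, True], repeat=3))

# The unbracketed form agrees with one OR (two AND three) in every case.
same = all(default_grouping(*c) == explicit_default(*c) for c in cases)

# Cases where overriding the precedence changes whether a document matches.
differs = [c for c in cases if default_grouping(*c) != overridden(*c)]
print(same, differs)
```

<P>The two groupings disagree exactly for documents that contain <code>one</code> but not <code>three</code>.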
| 47 | |
|---|
| 48 | <H3>+ and -</H3> |
|---|
| 49 | |
|---|
| 50 | <P>A group of terms with some marked with + and - will match documents |
|---|
| 51 | containing all of the + terms, but none of the - terms. Terms |
|---|
| 52 | not marked with + or - contribute towards the document rankings. |
|---|
| 53 | You can also use + and - on phrases and on bracketed expressions. |
|---|
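<P>As a rough illustration of the rule above, the sketch below treats + terms as a filter, - terms as an exclusion, and unmarked terms as contributing only to a score. It assumes the query has at least one + term, and the score is a plain count standing in for real ranking weights; <code>plus_minus_match</code> and the sample documents are hypothetical.

```python
def plus_minus_match(doc_terms, required, excluded, optional):
    """Return None if the document is filtered out, else a crude score."""
    if not required <= doc_terms:   # every + term must be present
        return None
    if excluded & doc_terms:        # no - term may be present
        return None
    # Unmarked terms only influence ranking; here the score is just a count.
    return len(optional & doc_terms)

# Hypothetical query: +apple -pie crumble
required, excluded, optional = {"apple"}, {"pie"}, {"crumble"}

docs = {
    "d1": {"apple", "crumble"},
    "d2": {"apple", "pie"},
    "d3": {"crumble"},
    "d4": {"apple"},
}
scores = {name: plus_minus_match(terms, required, excluded, optional)
          for name, terms in docs.items()}
print(scores)
```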
| 54 | |
|---|
| 55 | <H3>NEAR</H3> |
|---|
| 56 | |
|---|
| 57 | <P><code>one NEAR two NEAR three</code> matches documents containing those |
|---|
| 58 | words within 10 words of each other. You can set the threshold to <i>n</i> |
|---|
| 59 | by using <code>NEAR/<i>n</i></code> like so: <code>one NEAR/6 two</code>. |
|---|
| 60 | |
|---|
| 61 | <H3>ADJ</H3> |
|---|
| 62 | |
|---|
| 63 | <P><code>ADJ</code> is like <code>NEAR</code> but only matches if the words |
|---|
| 64 | appear in the same order as in the query. So <code>one ADJ two ADJ |
|---|
| 65 | three</code> matches documents containing those three words in that order |
|---|
| 66 | and within 10 words of each other. You can set the threshold to <i>n</i> |
|---|
| 67 | by using <code>ADJ/<i>n</i></code> like so: <code>one ADJ/6 two</code>. |
|---|
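<P>The difference between <code>NEAR</code> and <code>ADJ</code> can be sketched using word positions. This is a simplified model, not Xapian's matcher: it assumes "within <i>n</i> words" means the first and last chosen positions are at most <i>n</i> apart, and Xapian's exact window arithmetic may differ. The functions <code>near</code> and <code>adj</code> are hypothetical helpers.

```python
from itertools import product

def near(positions, window=10):
    """positions holds one list of word positions per query term.
    True if some choice of one position per term spans at most `window`."""
    return any(max(combo) - min(combo) <= window
               for combo in product(*positions))

def adj(positions, window=10):
    """Like near(), but the chosen positions must also be strictly
    increasing, i.e. the words appear in the same order as in the query."""
    return any(
        combo[-1] - combo[0] <= window
        and all(a < b for a, b in zip(combo, combo[1:]))
        for combo in product(*positions)
    )

# Hypothetical document "three filler two one": positions of one, two, three.
reversed_doc = [[4], [3], [1]]
# Hypothetical document "one two three".
ordered_doc = [[1], [2], [3]]

print(near(reversed_doc), adj(reversed_doc))   # close together, wrong order
print(near(ordered_doc), adj(ordered_doc))     # close together, right order
print(near([[1], [9]], window=6))              # too far apart for NEAR/6
```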
| 68 | |
|---|
| 69 | <H3>Phrase searches</H3> |
|---|
| 70 | |
|---|
| 71 | <P>A phrase surrounded with double quotes ("") matches documents containing |
|---|
| 72 | that exact phrase. Hyphenated words are also treated as phrases, as are |
|---|
| 73 | cases such as filenames and email addresses (e.g. /etc/passwd or president@whitehouse.gov). |
|---|
| 74 | |
|---|
| 75 | <H3>Searching within a probabilistic field</H3> |
|---|
| 76 | |
|---|
| 77 | <P>If the database has been indexed with prefixes on probabilistic terms |
|---|
| 78 | from certain fields, you can set up a prefix map so that the user can |
|---|
| 79 | search within those fields. For example <code>author:dickens title:shop</code> |
|---|
| 80 | might find documents by Dickens with "shop" in the title. You can also specify a |
|---|
| 81 | prefix on a quoted phrase or on a bracketed expression. |
|---|
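<P>Conceptually, a prefix map translates the field name the user types into the term prefix that was used at index time. The sketch below shows only that translation and ignores stemming; the mapping of <code>author</code> to <code>A</code> and <code>title</code> to <code>S</code> is an assumption made for illustration and must match whatever prefixes the database was actually indexed with.

```python
# Hypothetical prefix map: user-visible field name -> term prefix in the index.
PREFIX_MAP = {"author": "A", "title": "S"}

def to_term(token, prefix_map=PREFIX_MAP):
    """Turn field:word into a prefixed term; leave other tokens unprefixed."""
    field, sep, word = token.partition(":")
    if sep and field in prefix_map:
        return prefix_map[field] + word.lower()
    return token.lower()

terms = [to_term(t) for t in "author:dickens title:shop curiosity".split()]
print(terms)
```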
| 82 | |
|---|
| 83 | <H3>Searching for proper names</H3> |
|---|
| 84 | |
|---|
| 85 | <P>If a query term is entered with a capitalised first letter, then it will |
|---|
| 86 | be searched for unstemmed. |
|---|
| 87 | |
|---|
| 88 | <H3>Range searches</H3> |
|---|
| 89 | |
|---|
| 90 | <P>The QueryParser can be configured to support range-searching using document |
|---|
| 91 | values. The syntax for a range search is <code><i>start</i>..<i>end</i></code> |
|---|
| 92 | - for example, <code>01/03/2007..04/04/2007</code>, <code>$10..100</code>, |
|---|
| 93 | <code>5..10kg</code>. |
|---|
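<P>A range query boils down to splitting the token on <code>..</code> and comparing a stored document value against the two ends. The sketch below handles only plain numbers; dates, currency prefixes such as <code>$</code> and unit suffixes such as <code>kg</code> would need extra handling that is not shown. The helpers <code>split_range</code> and <code>in_range</code> and the sample prices are hypothetical.

```python
def split_range(token):
    """Return (start, end) if token looks like start..end, else None."""
    start, sep, end = token.partition("..")
    if not sep or not start or not end:
        return None
    return start, end

def in_range(value, start, end):
    """Numeric comparison against both ends, inclusive."""
    return float(start) <= value <= float(end)

# Hypothetical per-document values (for example, a price stored in a value slot).
prices = {"doc1": 5.0, "doc2": 50.0, "doc3": 500.0}

start, end = split_range("10..100")
matches = [doc for doc, price in prices.items() if in_range(price, start, end)]
print(matches)
```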
| 94 | |
|---|
| 95 | <H3>Synonyms</H3> |
|---|
| 96 | |
|---|
| 97 | <P>The QueryParser can be configured to support synonyms, which can either |
|---|
| 98 | be used when explicitly specified (using the syntax <code>~<i>term</i></code>) |
|---|
| 99 | or implicitly (synonyms will be used for all terms or groups of terms for |
|---|
| 100 | which they have been specified). |
|---|
| 101 | |
|---|
| 102 | <H3>Wildcards</H3> |
|---|
| 103 | |
|---|
| 104 | <P>The QueryParser supports using a trailing '*' wildcard, which matches any |
|---|
| 105 | number of trailing characters, so <code>wildc*</code> would match wildcard, |
|---|
| 106 | wildcarded, wildcards, wildcat, wildcats, etc. This feature is disabled |
|---|
| 107 | by default - pass <code>Xapian::QueryParser::FLAG_WILDCARD</code> in the flags |
|---|
| 108 | argument of |
|---|
| 109 | <code>Xapian::QueryParser::parse_query(<i>query_string</i>, <i>flags</i>)</code> |
|---|
| 110 | to enable it, and tell the QueryParser which database to expand wildcards from |
|---|
| 111 | using the <code>Xapian::QueryParser::set_database(<i>database</i>)</code> method. |
|---|
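<P>The need for a database becomes clearer when wildcard expansion is viewed as a prefix lookup over the terms that actually exist in the index. The sketch below illustrates that idea on a sorted list of made-up terms; it is not Xapian's implementation, and <code>expand_wildcard</code> is a hypothetical helper.

```python
from bisect import bisect_left

def expand_wildcard(pattern, sorted_terms):
    """Expand a trailing-'*' pattern into every indexed term sharing the prefix."""
    if not pattern.endswith("*"):
        return [pattern]
    prefix = pattern[:-1]
    # In a sorted list, all terms sharing a prefix sit next to each other.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches

# Hypothetical list of terms present in the database.
index_terms = sorted(["wild", "wildcard", "wildcarded", "wildcards",
                      "wildcat", "wildcats", "wilderness"])

expanded = expand_wildcard("wildc*", index_terms)
print(expanded)
```

<P>Without the term list there is nothing to expand against, which is why the wildcard cannot be resolved from the query string alone.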
| 112 | |
|---|
| 113 | <H3>Partially entered query matching</H3> |
|---|
| 114 | |
|---|
| 115 | <P> |
|---|
| 116 | The QueryParser also supports performing a search with a query which has only |
|---|
| 117 | been partially entered. This is intended for use with "incremental search" |
|---|
| 118 | systems, which don't wait for the user to finish typing their search before |
|---|
| 119 | displaying an initial set of results. For example, in such a system a user |
|---|
| 120 | would enter a search, and the system would display a new set of results after |
|---|
| 121 | each letter, or whenever the user pauses for a short period of time (or some |
|---|
| 122 | other similar strategy). |
|---|
| 123 | |
|---|
| 124 | <P> |
|---|
| 125 | The problem with this kind of search is that the last word in a partially |
|---|
| 126 | entered query often has no semantic relation to the completed word. For |
|---|
| 127 | example, a search for "dynamic cat" would return a quite different set of |
|---|
| 128 | results to a search for "dynamic categorisation". This causes the displayed |
|---|
| 129 | results to flicker rapidly as each new character is entered. A much |
|---|
| 130 | smoother result can be obtained if the final word is treated as having an |
|---|
| 131 | implicit terminating wildcard, so that it matches all words starting with the |
|---|
| 132 | entered characters - thus, as each letter is entered, the set of results |
|---|
| 133 | displayed narrows down to the desired subject. |
|---|
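<P>The idea of an implicit terminating wildcard on the final word can be sketched as follows. Only the last word of the partially entered query is expanded, and earlier words are kept as typed. This is an illustration of the behaviour described above, not Xapian's implementation; <code>expand_partial</code> and the term list are hypothetical.

```python
def expand_partial(query, all_terms):
    """Return one list of candidate terms per query word; only the final
    word is treated as a prefix."""
    words = query.split()
    if not words:
        return []
    *complete, last = words
    completions = sorted(t for t in all_terms if t.startswith(last))
    # Fall back to the word as typed if nothing in the index starts with it.
    return [[w] for w in complete] + [completions or [last]]

# Hypothetical terms present in the database.
all_terms = {"dynamic", "cat", "catalogue", "categorisation", "dog"}

step1 = expand_partial("dynamic cat", all_terms)
step2 = expand_partial("dynamic categ", all_terms)
print(step1)
print(step2)
```

<P>As more letters are typed, the candidates for the final word shrink towards the intended term rather than jumping between unrelated words.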
| 134 | |
|---|
| 135 | <P> |
|---|
| 136 | A similar effect could be obtained simply by enabling the wildcard matching |
|---|
| 137 | option, and appending a "*" character to each query string. However, this |
|---|
| 138 | would be confused by searches which ended with punctuation or other |
|---|
| 139 | characters. |
|---|
| 140 | |
|---|
| 141 | <P> |
|---|
| 142 | This feature is disabled by default - pass the |
|---|
| 143 | <code>Xapian::QueryParser::FLAG_PARTIAL</code> |
|---|
| 144 | flag in the flags argument of |
|---|
| 145 | <code>Xapian::QueryParser::parse_query(<i>query_string</i>, <i>flags</i>)</code> |
|---|
| 146 | to enable it, and tell the QueryParser which database to expand wildcards from |
|---|
| 147 | using the <code>Xapian::QueryParser::set_database(<i>database</i>)</code> method. |
|---|
| 148 | </BODY> |
|---|
| 149 | </HTML> |
|---|