============
Value Ranges
============

.. contents:: Table of contents

Introduction
============

The ``Xapian::ValueRangeProcessor`` was introduced in Xapian 1.0.0. It
provides a powerful and flexible way to parse range queries in the users'
query string.

This document describes the ``Xapian::ValueRangeProcessor`` class and
its standard subclasses, how to create your own subclasses, and how
these classes are used with ``Xapian::QueryParser``.

``Xapian::ValueRangeProcessor`` is a virtual base class, so you need to
use a subclass of it. ``Xapian::QueryParser`` maintains a list of
``Xapian::ValueRangeProcessor`` objects which it tries in order for
each range search in the query until one accepts it, or all have been
tried (in which case an error is reported).

Each ``Xapian::ValueRangeProcessor`` is passed the start and end of the
range. If it doesn't understand the range, it should return
``Xapian::BAD_VALUENO``. If it does understand the range, it should return
the value number to use with ``Xapian::Query::OP_VALUE_RANGE`` and, if it
wants to, it can modify the start and end values (to convert them to the
correct format for the string comparison which ``OP_VALUE_RANGE`` uses).
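
The "try each processor in order" behaviour described above can be sketched without Xapian itself. This is only an illustration of the idea, not Xapian's implementation: ``ValueNo``, ``BAD_VALUENO``, ``RangeProcessor``, ``dispatch_range``, ``author_only`` and ``accept_all`` are all hypothetical names standing in for the real types.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for Xapian::valueno and Xapian::BAD_VALUENO.
using ValueNo = unsigned;
const ValueNo BAD_VALUENO = static_cast<ValueNo>(-1);

// A processor inspects (and may rewrite) the ends of the range, and returns
// either a value number or BAD_VALUENO if it doesn't understand the range.
using RangeProcessor = std::function<ValueNo(std::string&, std::string&)>;

// Try each processor in the order it was added; the first one to accept the
// range wins.  Returns BAD_VALUENO if every processor rejects it.
ValueNo dispatch_range(const std::vector<RangeProcessor>& procs,
                       std::string& begin, std::string& end) {
    for (const auto& proc : procs) {
        assert(proc != nullptr);  // an empty std::function can't be called
        ValueNo slot = proc(begin, end);
        if (slot != BAD_VALUENO) return slot;
    }
    return BAD_VALUENO;
}

// Accepts only ranges whose start has an "author:" prefix (value 4).
ValueNo author_only(std::string& begin, std::string&) {
    if (begin.compare(0, 7, "author:") != 0) return BAD_VALUENO;
    begin.erase(0, 7);
    return 4;
}

// Accepts any range unchanged (value 0), so it belongs last in the list.
ValueNo accept_all(std::string&, std::string&) { return 0; }
```

With ``{author_only, accept_all}`` as the list, a range starting ``author:`` is claimed by the first processor, and anything else falls through to the second.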

StringValueRangeProcessor
=========================

This is the simplest of the standard subclasses. It understands any range
passed (so it should always be the last ``ValueRangeProcessor``) and it
doesn't alter the range start or end.

For example, suppose you have stored author names in value number 4, and want
the user to be able to filter queries by specifying ranges of values such as::

    mars asimov..bradbury

To do this, you can use a ``StringValueRangeProcessor`` like so::

    Xapian::QueryParser qp;
    Xapian::StringValueRangeProcessor author_proc(4);
    qp.add_valuerangeprocessor(&author_proc);

The parsed query will use ``OP_VALUE_RANGE``, so ``query.get_description()``
would report::

    Xapian::Query(mars:(pos=1) FILTER (VALUE_RANGE 4 asimov bradbury))

The ``VALUE_RANGE`` subquery will only match documents where value 4 is
>= asimov and <= bradbury (using a string comparison).
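
To make the string comparison concrete, here is a minimal sketch of the test a range applies to each stored value. The helper ``in_string_range`` is a hypothetical name, and it assumes a plain byte-wise comparison as performed by ``std::string``.

```cpp
#include <cassert>
#include <string>

// Returns true if `value` lies in the closed range [begin, end] under plain
// byte-wise string comparison (std::string's operator<=).
bool in_string_range(const std::string& value,
                     const std::string& begin, const std::string& end) {
    return begin <= value && value <= end;
}
```

Under this comparison ``banks`` falls inside ``asimov..bradbury``, but ``Banks`` does not (upper-case letters sort before lower-case ones), and neither does ``bradbury, ray`` (a longer string sorts after its own prefix).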

DateValueRangeProcessor
=======================

This class allows you to implement date range searches. As well as the value
number to search, you can tell it whether to prefer US-style month/day/year
or European-style day/month/year, and specify the epoch year to use for
interpreting 2 digit years (the default is day/month/year with an epoch of
1970). The best choice of settings depends on the expectations of your users.
As these settings are only applied at search time, you can also easily offer
different versions of your search front-end with different settings if that is
useful.

For example, if your users are American and the dates present in your database
can extend a decade or so into the future, you might use something like this,
which prefers US-style dates and sets the epoch year to 1930 (so
02/01/29 is February 1st 2029 while 02/01/30 is February 1st 1930)::

    Xapian::QueryParser qp;
    Xapian::DateValueRangeProcessor date_proc(0, true, 1930);
    qp.add_valuerangeprocessor(&date_proc);

The dates are converted to the format YYYYMMDD, so the values you index
need to be in this format too - for example, if ``doc_time`` is a ``time_t``::

    // 8 digits for YYYYMMDD plus the terminating NUL.
    char buf[9];
    if (strftime(buf, sizeof(buf), "%Y%m%d", gmtime(&doc_time))) {
        doc.add_value(0, buf);
    }
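
The epoch year and date order settings can be illustrated with a small standalone sketch. It is not Xapian's parser: ``expand_year`` and ``to_yyyymmdd`` are hypothetical helpers, only ``/``-separated dates are handled, and the fallback to the other day/month order when the preferred one is impossible is an assumption about what "prefer" means.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

// Expand a two digit year using an epoch year: years which would fall before
// the epoch are moved forward a century.
int expand_year(int year, int epoch_year) {
    if (year < 100) {
        year += 1900;
        if (year < epoch_year) year += 100;
    }
    return year;
}

// Convert "a/b/y" to "YYYYMMDD".  Returns an empty string if the text does
// not look like a date.  prefer_mdy selects month/day/year over
// day/month/year.
std::string to_yyyymmdd(const std::string& text, bool prefer_mdy,
                        int epoch_year) {
    int a, b, y;
    char extra;
    // Exactly three numbers and nothing after them.
    if (std::sscanf(text.c_str(), "%d/%d/%d%c", &a, &b, &y, &extra) != 3)
        return std::string();
    int day = prefer_mdy ? b : a;
    int month = prefer_mdy ? a : b;
    // Assumption: fall back to the other order if the preferred one gives an
    // impossible month.
    if (month > 12 && day <= 12) std::swap(day, month);
    if (month < 1 || month > 12 || day < 1 || day > 31) return std::string();
    if (y < 0 || y > 9999) return std::string();
    y = expand_year(y, epoch_year);
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%04d%02d%02d", y, month, day);
    return buf;
}
```

With US-style preference and an epoch of 1930, ``02/01/29`` becomes ``20290201`` and ``02/01/30`` becomes ``19300201``, matching the example above.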

NumberValueRangeProcessor
=========================

.. note:: This class had a design flaw in Xapian 1.0.0 and 1.0.1 - you should
   avoid using it with releases of Xapian earlier than 1.0.2.

This class allows you to implement numeric range searches. The numbers used
may be any number which is representable as a double, but the stored values
to which the range is applied must have been converted to strings at index
time using the ``Xapian::sortable_serialise()`` function::

    Xapian::Document doc;
    doc.add_value(0, Xapian::sortable_serialise(price));

This function produces strings which sort in numeric order, so you can also
use it if you want to sort results by the value in numeric order.

The class allows a prefix or suffix to be specified which must be present on
the values, allowing multiple ``NumberValueRangeProcessor`` objects to be
active in the same ``QueryParser``. For example, this specifies that a prefix
of "$" must be present on the first value (and may optionally be present on
the second value)::

    Xapian::QueryParser qp;
    Xapian::NumberValueRangeProcessor numrange_proc(0, "$", true);
    qp.add_valuerangeprocessor(&numrange_proc);
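
The prefix rule (required on the first value, optional on the second) can be sketched as a small standalone function. ``strip_range_prefix`` is a hypothetical helper written for illustration, not part of the Xapian API, and it ignores the suffix case.

```cpp
#include <cassert>
#include <string>

// Sketch of the prefix rule: the prefix is required on the start of the
// range and optional on the end.  Returns false (leaving both strings
// untouched) if the start lacks the prefix.
bool strip_range_prefix(std::string& begin, std::string& end,
                        const std::string& prefix) {
    if (begin.compare(0, prefix.size(), prefix) != 0) return false;
    begin.erase(0, prefix.size());
    if (end.compare(0, prefix.size(), prefix) == 0)
        end.erase(0, prefix.size());
    return true;
}
```

So both ``$10..$20`` and ``$10..20`` would be accepted and reduced to ``10`` and ``20``, while ``10..$20`` would be rejected and left for another processor to try.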

Custom subclasses
=================

You can easily create your own subclasses of ``Xapian::ValueRangeProcessor``.
Your subclass needs to implement a method
``Xapian::valueno operator()(std::string &begin, std::string &end)``
so for example you could implement a better version of the author range
described above which only matches ranges with a prefix (e.g.
``author:asimov..bradbury``) and lower-cases the names::

    struct AuthorValueRangeProcessor : public Xapian::ValueRangeProcessor {
        AuthorValueRangeProcessor() {}
        ~AuthorValueRangeProcessor() {}

        Xapian::valueno operator()(std::string &begin, std::string &end) {
            // Only handle ranges which start with the "author:" prefix.
            if (begin.substr(0, 7) != "author:")
                return Xapian::BAD_VALUENO;
            begin.erase(0, 7);
            // Lower-case both ends of the range.
            begin = Xapian::Unicode::tolower(begin);
            end = Xapian::Unicode::tolower(end);
            return 4;
        }
    };

Using Several ValueRangeProcessors
==================================

If you want to allow the user to specify different types of ranges, you can
specify multiple ``ValueRangeProcessor`` objects to use. Just add them in
the order you want them to be checked::

    Xapian::QueryParser qp;
    AuthorValueRangeProcessor author_proc;
    qp.add_valuerangeprocessor(&author_proc);
    Xapian::DateValueRangeProcessor date_proc(0, false, 1930);
    qp.add_valuerangeprocessor(&date_proc);

And then you can parse queries such as
``mars author:Asimov..Bradbury 01/01/1960..31/12/1969`` successfully.