Ticket #280 (assigned enhancement)

Opened 5 months ago

Last modified 5 months ago

Review storage of parameters in Query

Reported by: richard Owned by: olly
Priority: normal Milestone: 1.1.0
Component: Library API Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By:
Operating System: All Blocking:

Description

Currently, Xapian::Query::Internal stores any "double" parameter value as a sortable_serialised string. There is a FIXME in the code for set_dbl_parameter() and get_dbl_parameter() (around line 976 of api/omqueryinternal.cc) saying: "FIXME: rework for 1.1.0". This hasn't been changed until now due to fear of breaking ABI compatibility.

Instead, we should store double parameters as doubles in Query::Internal.

While reorganising this, it might be worth making parameter storage a bit more general, and tidying it up. We currently have the following parameters stored in Query::Internal:

  • op: The operation to perform
  • subqs: A list of subqueries
  • parameter: A "termcount" - used by NEAR and PHRASE to be the window size, used by ELITE_SET to be the number of terms, and used by RANGE to be the value number to apply the range to. For the last of these, a "termcount" type isn't really appropriate (though it is probably the same storage size as "valueno" at present, so it probably works correctly).
  • tname: A string holding the term, for a leaf query. The start of the range, for a range query.
  • str_parameter: The end of the range for a range query. The result of sortable_serialise() on the multiplier for OP_SCALE_WEIGHT queries.
  • term_pos: The position of the term for leaf queries.
  • wqf: The within query frequency, for leaf queries.
  • external_source: The external source, for external source queries.

Two approaches seem plausible to me - firstly, we could define a union with the possible parameter types, and store the parameters in a list of these unions. Alternatively, we could subclass Query::Internal for each of the possible query types, and just store the appropriate parameters for each.

The latter approach seems cleaner to me, and more likely to be flexible for future expansion of the available query operators, but I've not thought about this much yet.

Attachments

patch (21.5 kB) - added by richard 5 months ago.
Patch moving Query::Internal out of query.h, by using an InternalPtr? class.

Change History

Changed 5 months ago by olly

  • status changed from new to assigned

I've actually looked at this already.

A union doesn't help as much as you might hope as you can only put POD types in it. So every object would still need to have a subqs member. If new operators need a new non-POD type, then every operator pays the price. Also, every object has to be the same size (unless you under-allocate space, which is really nasty). And a union is syntactically clumsy for the "shared" members (not significantly better than just using "parameter" I feel).

Subclassing is almost certainly the better option, though ideally I think we want to try to get the internal class out of the external header first. I've not figured out how that could be done yet though...

Changed 5 months ago by richard

I had just about come to the same conclusions as you, that subclassing is probably the way to go.

One approach to getting the internal class out of the header is to make an "InternalPtr?" class which wraps the reference counted pointer, and using a pointer to that instead. This leads to an extra redirection, which is annoying, but I think the benefit of not having the implementation in the public header would be worth it (and it's not in a particularly performance critical place).

I have an experimental patch which does this which I'll attach shortly.

Changed 5 months ago by richard

Patch moving Query::Internal out of query.h, by using an InternalPtr? class.

Note: See TracTickets for help on using tickets.