Opened 16 years ago

Closed 13 years ago

#280 closed enhancement (fixed)

Review storage of parameters in Query

Reported by: Richard Boulton Owned by: Olly Betts
Priority: normal Milestone: 1.3.0
Component: Library API Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description (last modified by Olly Betts)

Currently, Xapian::Query::Internal stores any "double" parameter value as a string using serialise_double(). There is a FIXME in the code for set_dbl_parameter() and get_dbl_parameter() (around line 976 of api/omqueryinternal.cc) saying: "FIXME: rework for 1.1.0".

This hasn't been changed until now due to fear of breaking ABI compatibility.

Instead, we should store double parameters as doubles in Query::Internal.

While reorganising this, it might be worth making parameter storage a bit more general, and tidying it up. We currently have the following parameters stored in Query::Internal:

  • op: The operation to perform
  • subqs: A list of subqueries
  • parameter: A "termcount" - used for the wqf of leaf queries, in NEAR and PHRASE it's the window size, used by ELITE_SET to be the number of terms, and used by RANGE to be the value number to apply the range to. For the last of these, a "termcount" type isn't really appropriate (though it is probably the same storage size as "valueno" at present, so it probably works correctly).
  • tname: A string holding the term, for a leaf query. The start of the range, for a range query.
  • str_parameter: The end of the range for a range query. The result of serialise_double() on the multiplier for OP_SCALE_WEIGHT queries.
  • term_pos: The position of the term for leaf queries.
  • external_source: The external source, for external source queries.

Two approaches seem plausible to me - firstly, we could define a union with the possible parameter types, and store the parameters in a list of these unions. Alternatively, we could subclass Query::Internal for each of the possible query types, and just store the appropriate parameters for each.

The latter approach seems cleaner to me, and more likely to be flexible for future expansion of the available query operators, but I've not thought about this much yet.

Attachments (1)

patch (21.5 KB ) - added by Richard Boulton 16 years ago.
Patch moving Query::Internal out of query.h, by using an InternalPtr class.

Download all attachments as: .zip

Change History (11)

comment:1 by Olly Betts, 16 years ago

Status: newassigned

I've actually looked at this already.

A union doesn't help as much as you might hope as you can only put POD types in it. So every object would still need to have a subqs member. If new operators need a new non-POD type, then every operator pays the price. Also, every object has to be the same size (unless you under-allocate space, which is really nasty). And a union is syntactically clumsy for the "shared" members (not significantly better than just using "parameter" I feel).

Subclassing is almost certainly the better option, though ideally I think we want to try to get the internal class out of the external header first. I've not figured out how that could be done yet though...

comment:2 by Richard Boulton, 16 years ago

I had just about come to the same conclusions as you, that subclassing is probably the way to go.

One approach to getting the internal class out of the header is to make an "InternalPtr" class which wraps the reference counted pointer, and using a pointer to that instead. This leads to an extra redirection, which is annoying, but I think the benefit of not having the implementation in the public header would be worth it (and it's not in a particularly performance critical place).

I have an experimental patch which does this which I'll attach shortly.

by Richard Boulton, 16 years ago

Attachment: patch added

Patch moving Query::Internal out of query.h, by using an InternalPtr class.

comment:3 by Olly Betts, 16 years ago

Milestone: 1.1.01.1.1

This is an API compatible change, so bumping to 1.1.1.

comment:4 by Olly Betts, 16 years ago

Milestone: 1.1.11.1.7

Triaging milestone:1.1.1 bugs.

comment:5 by Olly Betts, 15 years ago

Priority: normallow

Not madly keen on the idea of adding another layer of indirection and another small memory allocation per Xapian::Query object - we can end up creating a lot of temporary ones as we parse queries (so I'm not sold on this being not performance critical with benchmarking). Nice to move all the innards out of the header though.

This is really an internal clean-up rather than being directly beneficial to users. Being able to modify the innards and keep the ABI compatible is an indirect benefit, but the indirection means there's a penalty to pay.

Marking as low priority for now.

comment:6 by Olly Betts, 15 years ago

Description: modified (diff)

Fix misinformation in description - it's serialise_double() not sortable_serialise() which is used here.

comment:7 by Olly Betts, 15 years ago

Description: modified (diff)

In trunk r13095, I've removed wqf - we now store the wqf in the parameter member. Updated the description to match this new scheme. This should save us 4 bytes per Query object.

comment:8 by Olly Betts, 15 years ago

Milestone: 1.1.71.3.0

While Query::Internal is a bit of an ugly mess overall, it does work currently (except for some remaining issues with the O() behaviour of pairwise construction which affects unusually large queries). So a rewrite would be good, but is potentially destabilising at a point where we really want to avoid that, and the only real direct benefit to users would be that Query objects would need less memory.

Added a layer of indirection would allow us to tidy up during 1.2.x, but there's a small time and space penalty for that change right now, and even with smaller internals, there's still the time and code overhead of the indirection, and an extra memory allocation per Query object for the indirecting object.

Adding a double member to house the scaling factor for OP_SCALE_WEIGHT would add a space overhead for every Query object, but make use of OP_SCALE_WEIGHT a little faster (though note that if you use the remote backend, you don't save any time as then we're essentially eagerly converting the scale factor to its serialised form).

So I think the "least worst" action is probably to postpone this ticket to the next development series - I've added a new 1.3.0 milestone for this purpose.

comment:9 by Olly Betts, 13 years ago

Priority: lownormal

I've started to work on this.

My plan is to have a virtual base class for Xapian::Internal, with a subclass for each "type" of operation. Not sure whether to handle AND/OR/XOR/... together or not currently, etc. Perhaps they'll be mostly handled by a common parent class, or perhaps we'll just store the op in the derived class.

Anyway, that should mean only the virtual base class needs to go in the external header, and perhaps not even that.

comment:10 by Olly Betts, 13 years ago

Resolution: fixed
Status: assignedclosed

Fixed by merge of new query internals in r16236.

Note: See TracTickets for help on using tickets.