Opened 16 years ago
Closed 14 years ago
#350 closed enhancement (fixed)
Stopping wildcard expansion at some point
Reported by: | Adam Sjøgren | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.2.6 |
Component: | QueryParser | Version: | |
Severity: | normal | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Operating System: | All |
Description (last modified by )
If there are many (many) terms with the same prefix and a wildcard search matching those is performed, the QueryParser tries to create a huge query, exhausting RAM even on reasonably sized machines (8GB).
Attached is a patch that makes it possible to ask the QueryParser to throw an exception if a wildcard expands to more than a configurable number of terms.
Some discussion on the mailinglist:
Attachments (5)
Change History (21)
by , 16 years ago
Attachment: | 0001-Add-set_max_wildcard_expansion-method-to-the-queryp2.patch added |
---|
comment:1 by , 16 years ago
Description: | modified (diff) |
---|---|
Milestone: | → 1.1.1 |
Status: | new → assigned |
Adding to the 1.1.1 bucket...
comment:4 by , 14 years ago
Just done a brief read-through of the patch - it looks pretty reasonable, though in xapian-core/queryparser/queryparser_internal.h I think the max_wildcard_expansion member should be initialised in the constructor, rather than assigned to. ie:
default_op(Query::OP_OR), errmsg(NULL): max_wildcard_expansion(0) { }
The general functionality of this seems useful - though perhaps it should be a query-wide setting: the exception would be raised if a query would consist of more than the specified number of terms. This would catch cases where a query contained multiple wildcards, each individual wildcard being composed of sufficiently few terms.
comment:5 by , 14 years ago
Thanks for the comments! I am attaching an updated patch that includes your change, and also displays the max number of terms in the exception message.
I haven't looked into making it a query-wide setting; I though I would start by putting the updated patch up here. In my usage a low factor N - i.e. how many wildcards a user is likely to enter, won't make a difference, but I am willing to give it a shot if needed.
by , 14 years ago
Attachment: | 0001-Add-set_max_wildcard_expansion-method-to-the-queryp3.patch added |
---|
Patch updated as per richards comments
comment:6 by , 14 years ago
Just to note the idea I mentioned on IRC - the issue of wanting to be able to push the wildcard expansion from the QueryParser to the matcher can be easily be addressed by documenting that the exception may be thrown by !QueryParser::parse_query(), or later when !Enquire is actually handling the query.
comment:7 by , 14 years ago
If I were to try and tackle adding documentation to accompany the code in the patch, where should the documentation go?
Thanks!
comment:8 by , 14 years ago
I think xapian-core/docs/queryparser.html is the appropriate file (the "wildcards" section).
Olly - any other place you can think of?
by , 14 years ago
Attachment: | 0001-Add-set_max_wildcard_expansion-method-to-the-queryp4.patch added |
---|
Tried adding a paragraph of documentation to queryparser.html
comment:9 by , 14 years ago
Milestone: | 1.2.x → 1.2.5 |
---|
I doubt further thinking time will magically find a perfect solution, so marking for 1.2.5.
by , 14 years ago
Attachment: | Add-set_max_wildcard_expansion-method-to-the-queryparser-updated.patch added |
---|
xapian-core patch, updated for trunk and tweaked
by , 14 years ago
perl parts of patch, tweaked slightly
follow-up: 12 comment:10 by , 14 years ago
Hmm, it seems my last comment (before attaching the patches) didn't make it.
Roughly speaking, I made some tweaks to the patches, like updating to use str() which has since replaced om_tostring(), but then realised we hadn't checked (at least that I can recall) that Adam has agreed to the patch licensing requirements:
http://trac.xapian.org/browser/trunk/xapian-core/HACKING#L1201
It would also be good to have test coverage for this new feature.
And also, we should document that the exception may get thrown later in the future.
follow-up: 13 comment:11 by , 14 years ago
Milestone: | 1.2.5 → 1.2.6 |
---|
I don't want to delay 1.2.5 further, so bumping.
comment:12 by , 14 years ago
Replying to olly:
Roughly speaking, I made some tweaks to the patches, like updating to use str() which has since replaced om_tostring(), but then realised we hadn't checked (at least that I can recall) that Adam has agreed to the patch licensing requirements:
http://trac.xapian.org/browser/trunk/xapian-core/HACKING#L1201
Thanks for the tweaks!
I agree to the patch licensing requirements.
It would also be good to have test coverage for this new feature.
Yeah, I don't have much time currently, but I will try to provide some when I can.
And also, we should document that the exception may get thrown later in the future.
This is in the patch from October 2010: "The exception may be thrown by the QueryParser, or later when when Enquire handles the query." - but maybe it needs to be clarified.
Best regards, Adam
comment:13 by , 14 years ago
Replying to olly:
I don't want to delay 1.2.5 further, so bumping.
Ok (this ticket is more than two years old, so a little more time is fine :-))
I didn't get any emails from trac when you added the two tweaked patches 11 days ago, though, which was a surprise to me - I would have expected to be notified as reporter on the ticket?
Best regards, Adam
follow-up: 15 comment:14 by , 14 years ago
I've now implemented feature tests for the exception getting thrown in appropriate cases, and not getting thrown when we don't want it to be.
Adam: what copyright attribution should I use for this?
comment:15 by , 14 years ago
Replying to olly:
I've now implemented feature tests for the exception getting thrown in appropriate cases, and not getting thrown when we don't want it to be.
Thanks!
Adam: what copyright attribution should I use for this?
"Adam Sjøgren <asjo@…>" should be sufficient, I think?
Best regards, Adam
comment:16 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Patch to add set_max_wildcard_expansion() to the QueryParser