Opened 13 years ago

Closed 10 years ago

#350 closed enhancement (fixed)

Stopping wildcard expansion at some point

Reported by: Adam Sjøgren Owned by: Olly Betts
Priority: normal Milestone: 1.2.6
Component: QueryParser Version:
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description (last modified by Olly Betts)

If there are many (many) terms with the same prefix and a wildcard search matching those is performed, the QueryParser tries to create a huge query, exhausting RAM even on reasonably sized machines (8GB).

Attached is a patch that makes it possible to ask the QueryParser to throw an exception if a wildcard expands to more than a configurable number of terms.

Some discussion on the mailinglist:

Attachments (5)

0001-Add-set_max_wildcard_expansion-method-to-the-queryp2.patch (4.1 KB ) - added by Adam Sjøgren 13 years ago.
Patch to add set_max_wildcard_expansion() to the QueryParser
0001-Add-set_max_wildcard_expansion-method-to-the-queryp3.patch (5.0 KB ) - added by Adam Sjøgren 11 years ago.
Patch updated as per richards comments
0001-Add-set_max_wildcard_expansion-method-to-the-queryp4.patch (5.8 KB ) - added by Adam Sjøgren 11 years ago.
Tried adding a paragraph of documentation to queryparser.html
Add-set_max_wildcard_expansion-method-to-the-queryparser-updated.patch (4.0 KB ) - added by Olly Betts 11 years ago.
xapian-core patch, updated for trunk and tweaked
Add-set_max_wildcard_expansion-method-to-the-queryparser-perl-updated.patch (824 bytes ) - added by Olly Betts 11 years ago.
perl parts of patch, tweaked slightly

Download all attachments as: .zip

Change History (21)

by Adam Sjøgren, 13 years ago

Patch to add set_max_wildcard_expansion() to the QueryParser

comment:1 by Olly Betts, 13 years ago

Description: modified (diff)
Milestone: 1.1.1
Status: newassigned

Adding to the 1.1.1 bucket...

comment:2 by Olly Betts, 13 years ago

Milestone: 1.1.11.1.4

Triaging milestone:1.1.1 bugs.

comment:3 by Olly Betts, 12 years ago

Milestone: 1.1.41.2.0

API addition, so punting to 1.2.x.

comment:4 by Richard Boulton, 11 years ago

Just done a brief read-through of the patch - it looks pretty reasonable, though in xapian-core/queryparser/queryparser_internal.h I think the max_wildcard_expansion member should be initialised in the constructor, rather than assigned to. ie:

default_op(Query::OP_OR), errmsg(NULL): max_wildcard_expansion(0) { } 

The general functionality of this seems useful - though perhaps it should be a query-wide setting: the exception would be raised if a query would consist of more than the specified number of terms. This would catch cases where a query contained multiple wildcards, each individual wildcard being composed of sufficiently few terms.

comment:5 by Adam Sjøgren, 11 years ago

Thanks for the comments! I am attaching an updated patch that includes your change, and also displays the max number of terms in the exception message.

I haven't looked into making it a query-wide setting; I though I would start by putting the updated patch up here. In my usage a low factor N - i.e. how many wildcards a user is likely to enter, won't make a difference, but I am willing to give it a shot if needed.

by Adam Sjøgren, 11 years ago

Patch updated as per richards comments

comment:6 by Olly Betts, 11 years ago

Just to note the idea I mentioned on IRC - the issue of wanting to be able to push the wildcard expansion from the QueryParser to the matcher can be easily be addressed by documenting that the exception may be thrown by !QueryParser::parse_query(), or later when !Enquire is actually handling the query.

comment:7 by Adam Sjøgren, 11 years ago

If I were to try and tackle adding documentation to accompany the code in the patch, where should the documentation go?

Thanks!

comment:8 by Richard Boulton, 11 years ago

I think xapian-core/docs/queryparser.html is the appropriate file (the "wildcards" section).

Olly - any other place you can think of?

by Adam Sjøgren, 11 years ago

Tried adding a paragraph of documentation to queryparser.html

comment:9 by Olly Betts, 11 years ago

Milestone: 1.2.x1.2.5

I doubt further thinking time will magically find a perfect solution, so marking for 1.2.5.

by Olly Betts, 11 years ago

xapian-core patch, updated for trunk and tweaked

by Olly Betts, 11 years ago

perl parts of patch, tweaked slightly

comment:10 by Olly Betts, 11 years ago

Hmm, it seems my last comment (before attaching the patches) didn't make it.

Roughly speaking, I made some tweaks to the patches, like updating to use str() which has since replaced om_tostring(), but then realised we hadn't checked (at least that I can recall) that Adam has agreed to the patch licensing requirements:

http://trac.xapian.org/browser/trunk/xapian-core/HACKING#L1201

It would also be good to have test coverage for this new feature.

And also, we should document that the exception may get thrown later in the future.

comment:11 by Olly Betts, 11 years ago

Milestone: 1.2.51.2.6

I don't want to delay 1.2.5 further, so bumping.

in reply to:  10 comment:12 by Adam Sjøgren, 11 years ago

Replying to olly:

Roughly speaking, I made some tweaks to the patches, like updating to use str() which has since replaced om_tostring(), but then realised we hadn't checked (at least that I can recall) that Adam has agreed to the patch licensing requirements:

http://trac.xapian.org/browser/trunk/xapian-core/HACKING#L1201

Thanks for the tweaks!

I agree to the patch licensing requirements.

It would also be good to have test coverage for this new feature.

Yeah, I don't have much time currently, but I will try to provide some when I can.

And also, we should document that the exception may get thrown later in the future.

This is in the patch from October 2010: "The exception may be thrown by the QueryParser, or later when when Enquire handles the query." - but maybe it needs to be clarified.

Best regards, Adam

in reply to:  11 comment:13 by Adam Sjøgren, 11 years ago

Replying to olly:

I don't want to delay 1.2.5 further, so bumping.

Ok (this ticket is more than two years old, so a little more time is fine :-))

I didn't get any emails from trac when you added the two tweaked patches 11 days ago, though, which was a surprise to me - I would have expected to be notified as reporter on the ticket?

Best regards, Adam

comment:14 by Olly Betts, 10 years ago

I've now implemented feature tests for the exception getting thrown in appropriate cases, and not getting thrown when we don't want it to be.

Adam: what copyright attribution should I use for this?

in reply to:  14 comment:15 by Adam Sjøgren, 10 years ago

Replying to olly:

I've now implemented feature tests for the exception getting thrown in appropriate cases, and not getting thrown when we don't want it to be.

Thanks!

Adam: what copyright attribution should I use for this?

"Adam Sjøgren <asjo@…>" should be sufficient, I think?

Best regards, Adam

comment:16 by Olly Betts, 10 years ago

Resolution: fixed
Status: assignedclosed

Adam: what copyright attribution should I use for this?

"Adam Sjøgren < asjo@…>" should be sufficient, I think?

OK, I wasn't sure if it would be your employer or something. Thanks for promptly clarifying.

Core parts committed in r15556, and the Search::Xapian parts in r15557.

Note: See TracTickets for help on using tickets.