Opened 11 years ago

Closed 8 years ago

#350 closed enhancement (fixed)

Stopping wildcard expansion at some point

Reported by: asjo Owned by: olly
Priority: normal Milestone: 1.2.6
Component: QueryParser Version:
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description (last modified by olly)

If there are many (many) terms with the same prefix and a wildcard search matching those is performed, the QueryParser tries to create a huge query, exhausting RAM even on reasonably sized machines (8GB).

Attached is a patch that makes it possible to ask the QueryParser to throw an exception if a wildcard expands to more than a configurable number of terms.

Some discussion on the mailinglist:

Attachments (5)

0001-Add-set_max_wildcard_expansion-method-to-the-queryp2.patch (4.1 KB) - added by asjo 11 years ago.
Patch to add set_max_wildcard_expansion() to the QueryParser?
0001-Add-set_max_wildcard_expansion-method-to-the-queryp3.patch (5.0 KB) - added by asjo 9 years ago.
Patch updated as per richards comments
0001-Add-set_max_wildcard_expansion-method-to-the-queryp4.patch (5.8 KB) - added by asjo 9 years ago.
Tried adding a paragraph of documentation to queryparser.html
Add-set_max_wildcard_expansion-method-to-the-queryparser-updated.patch (4.0 KB) - added by olly 9 years ago.
xapian-core patch, updated for trunk and tweaked
Add-set_max_wildcard_expansion-method-to-the-queryparser-perl-updated.patch (824 bytes) - added by olly 9 years ago.
perl parts of patch, tweaked slightly

Download all attachments as: .zip

Change History (21)

Changed 11 years ago by asjo

Patch to add set_max_wildcard_expansion() to the QueryParser?

comment:1 Changed 10 years ago by olly

  • Description modified (diff)
  • Milestone set to 1.1.1
  • Status changed from new to assigned

Adding to the 1.1.1 bucket...

comment:2 Changed 10 years ago by olly

  • Milestone changed from 1.1.1 to 1.1.4

Triaging milestone:1.1.1 bugs.

comment:3 Changed 10 years ago by olly

  • Milestone changed from 1.1.4 to 1.2.0

API addition, so punting to 1.2.x.

comment:4 Changed 9 years ago by richard

Just done a brief read-through of the patch - it looks pretty reasonable, though in xapian-core/queryparser/queryparser_internal.h I think the max_wildcard_expansion member should be initialised in the constructor, rather than assigned to. ie:

default_op(Query::OP_OR), errmsg(NULL): max_wildcard_expansion(0) { } 

The general functionality of this seems useful - though perhaps it should be a query-wide setting: the exception would be raised if a query would consist of more than the specified number of terms. This would catch cases where a query contained multiple wildcards, each individual wildcard being composed of sufficiently few terms.

comment:5 Changed 9 years ago by asjo

Thanks for the comments! I am attaching an updated patch that includes your change, and also displays the max number of terms in the exception message.

I haven't looked into making it a query-wide setting; I though I would start by putting the updated patch up here. In my usage a low factor N - i.e. how many wildcards a user is likely to enter, won't make a difference, but I am willing to give it a shot if needed.

Changed 9 years ago by asjo

Patch updated as per richards comments

comment:6 Changed 9 years ago by olly

Just to note the idea I mentioned on IRC - the issue of wanting to be able to push the wildcard expansion from the QueryParser to the matcher can be easily be addressed by documenting that the exception may be thrown by !QueryParser::parse_query(), or later when !Enquire is actually handling the query.

comment:7 Changed 9 years ago by asjo

If I were to try and tackle adding documentation to accompany the code in the patch, where should the documentation go?

Thanks!

comment:8 Changed 9 years ago by richard

I think xapian-core/docs/queryparser.html is the appropriate file (the "wildcards" section).

Olly - any other place you can think of?

Changed 9 years ago by asjo

Tried adding a paragraph of documentation to queryparser.html

comment:9 Changed 9 years ago by olly

  • Milestone changed from 1.2.x to 1.2.5

I doubt further thinking time will magically find a perfect solution, so marking for 1.2.5.

Changed 9 years ago by olly

xapian-core patch, updated for trunk and tweaked

Changed 9 years ago by olly

perl parts of patch, tweaked slightly

comment:10 follow-up: Changed 9 years ago by olly

Hmm, it seems my last comment (before attaching the patches) didn't make it.

Roughly speaking, I made some tweaks to the patches, like updating to use str() which has since replaced om_tostring(), but then realised we hadn't checked (at least that I can recall) that Adam has agreed to the patch licensing requirements:

http://trac.xapian.org/browser/trunk/xapian-core/HACKING#L1201

It would also be good to have test coverage for this new feature.

And also, we should document that the exception may get thrown later in the future.

comment:11 follow-up: Changed 9 years ago by olly

  • Milestone changed from 1.2.5 to 1.2.6

I don't want to delay 1.2.5 further, so bumping.

comment:12 in reply to: ↑ 10 Changed 9 years ago by asjo

Replying to olly:

Roughly speaking, I made some tweaks to the patches, like updating to use str() which has since replaced om_tostring(), but then realised we hadn't checked (at least that I can recall) that Adam has agreed to the patch licensing requirements:

http://trac.xapian.org/browser/trunk/xapian-core/HACKING#L1201

Thanks for the tweaks!

I agree to the patch licensing requirements.

It would also be good to have test coverage for this new feature.

Yeah, I don't have much time currently, but I will try to provide some when I can.

And also, we should document that the exception may get thrown later in the future.

This is in the patch from October 2010: "The exception may be thrown by the QueryParser?, or later when when Enquire handles the query." - but maybe it needs to be clarified.

Best regards, Adam

comment:13 in reply to: ↑ 11 Changed 9 years ago by asjo

Replying to olly:

I don't want to delay 1.2.5 further, so bumping.

Ok (this ticket is more than two years old, so a little more time is fine :-))

I didn't get any emails from trac when you added the two tweaked patches 11 days ago, though, which was a surprise to me - I would have expected to be notified as reporter on the ticket?

Best regards, Adam

comment:14 follow-up: Changed 8 years ago by olly

I've now implemented feature tests for the exception getting thrown in appropriate cases, and not getting thrown when we don't want it to be.

Adam: what copyright attribution should I use for this?

comment:15 in reply to: ↑ 14 Changed 8 years ago by asjo

Replying to olly:

I've now implemented feature tests for the exception getting thrown in appropriate cases, and not getting thrown when we don't want it to be.

Thanks!

Adam: what copyright attribution should I use for this?

"Adam Sjøgren <asjo@…>" should be sufficient, I think?

Best regards, Adam

comment:16 Changed 8 years ago by olly

  • Resolution set to fixed
  • Status changed from assigned to closed

Adam: what copyright attribution should I use for this?

"Adam Sjøgren < asjo@…>" should be sufficient, I think?

OK, I wasn't sure if it would be your employer or something. Thanks for promptly clarifying.

Core parts committed in r15556, and the Search::Xapian parts in r15557.

Note: See TracTickets for help on using tickets.