Opened 17 years ago
Closed 14 years ago
#245 closed defect (fixed)
All-stopword queries with two or more terms should ignore stopword list
Reported by: | Richard Boulton | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.2.3 |
Component: | QueryParser | Version: | SVN trunk |
Severity: | normal | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Operating System: | All |
Description (last modified by )
Currently, if a single word query is parsed, and that word is a stopword, the stopwording is ignored. However, if a multiple word query is parsed, and all words are stopwords, the stopwording is applied (resulting in an empty query).
If all the words in the query are stopwords, I think it may make sense to ignore the stopwording. However, even if we decide to apply the stopwording in this case, we should be consistent in our behaviour.
Some examples, in python:
>>> import xapian >>> s=xapian.SimpleStopper() >>> s.add('foo') >>> s.add('bar') >>> qp=xapian.QueryParser() >>> qp.set_stopper(s) >>> str(qp.parse_query('foo')) 'Xapian::Query(foo:(pos=1))' >>> str(qp.parse_query('foo foo')) 'Xapian::Query()' >>> str(qp.parse_query('foo bar')) 'Xapian::Query()'
Either the first parse_query() call should return Xapian::Query(), or the later ones should return non-empty queries.
Attachments (2)
Change History (11)
comment:1 by , 17 years ago
Status: | new → assigned |
---|---|
Summary: | Inconsistent behaviour if all words in the query are stopwords. → All-stopword queries with two or more terms should ignore stopword list |
comment:2 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:3 by , 17 years ago
Operating System: | → All |
---|---|
Status: | new → assigned |
comment:6 by , 14 years ago
Milestone: | 1.2.x → 1.2.3 |
---|
The approach in the patch seems reasonable to me, so I'm going to take another look at this and if I don't see a better fix, I'll update and apply it for 1.2.3.
comment:7 by , 14 years ago
There's a case the patch doesn't handle: the AND an
This doesn't work currently either, so the patch is an improvement, but it would be better to solve this whole issue in one go if we can.
comment:8 by , 14 years ago
Actually the AND an
works - the failing testcase is: the AND an a
Which is pretty unlikely to be hit in practice unless the stopword list is rather long, but the error message is confusing.
comment:9 by , 14 years ago
New patch seems to fix all the cases I've checked without breaking more, but feels rather clumsy in places. It's bedtime now, but I'll have a look when I'm more awake and see if it can be done more elegantly.
comment:10 by , 14 years ago
Description: | modified (diff) |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
I committed the patch (with comment improvements) in trunk r14845. Maybe there's a neater way, but it does at least work.
I think this probably isn't worth backporting to 1.0 at this point - I've not seen any feedback from end users on this (unless that was what triggered your report?)
The bug is that all-stopword queries with two or more terms aren't handled specially - the single term case was easy to handle, while the multi-term case seemed trickier for some reason, and I only handled the easy one so far.
There's actually a FIXME comment about this in queryparser.lemony which probably shows where the fix should go: