Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#484 closed defect (fixed)

QueryParser does not expand wildcarded terms in some cases

Reported by: Daniel Ménard
Owned by: Olly Betts
Priority: normal
Milestone: 1.0.21
Component: QueryParser
Version: 1.0.2
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

Hi,

It seems that the query parser does not expand wildcards if the query contains at least 3 terms and the wildcarded term is in the middle of the query.

Examples:

OK : test* xapian user -> testable OR tester OR xapian OR user.
OK : xapian user test* -> xapian OR user OR testable OR tester.
NOT OK : xapian test* user -> xapian OR test OR user.
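
To make the expected behaviour concrete, here is a minimal sketch in Python. It is a toy model, not Xapian's implementation: `expand_query` and `DB_TERMS` are hypothetical names, and the function simply replaces each trailing-`*` token with every indexed term sharing that prefix, wherever the token appears in the query.

```python
# Toy model of the behaviour a user would expect; NOT Xapian's implementation.
# expand_query and DB_TERMS are hypothetical names used for illustration only.

def expand_query(query, db_terms):
    """Return the flat list of terms that the query should be OR-ed into."""
    expanded = []
    for token in query.split():
        if token.endswith("*"):
            prefix = token[:-1]
            # A wildcard stands for every indexed term that shares the prefix.
            expanded.extend(sorted(t for t in db_terms if t.startswith(prefix)))
        else:
            expanded.append(token)
    return expanded


# Same terms as the test database described in this ticket.
DB_TERMS = {"xapian", "tester", "testable", "user", "query"}

for q in ("test* xapian user", "xapian user test*", "xapian test* user"):
    print(q, "->", " OR ".join(expand_query(q, DB_TERMS)))
```

In this model the position of the wildcarded token makes no difference, which is the property the third example above violates.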

Attached is a small PHP script that reproduces the problem.

On my machine (Windows XP, Xapian PHP bindings 1.2.0 binaries from flax.co.uk, PHP 5.2.13), this script produces the following output:

PHP version 5.2.13, Xapian version 1.2.0
 
Creating a new db containing one document with terms : xapian, tester, testable, user, query, 
 
query:  test* xapian
result: Xapian::Query(((testable:(pos=1) SYNONYM tester:(pos=1)) OR xapian:(pos=2)))
OK.
 
query:  xapian test*
result: Xapian::Query((xapian:(pos=1) OR (testable:(pos=2) SYNONYM tester:(pos=2))))
OK.
 
query:  test* xapian user
result: Xapian::Query(((testable:(pos=1) SYNONYM tester:(pos=1)) OR xapian:(pos=2) OR user:(pos=3)))
OK.
 
query:  xapian user test*
result: Xapian::Query((xapian:(pos=1) OR user:(pos=2) OR (testable:(pos=3) SYNONYM tester:(pos=3))))
OK.
 
query:  xapian test* user
result: Xapian::Query((xapian:(pos=1) OR test:(pos=2) OR user:(pos=3)))
expect: Xapian::Query((xapian:(pos=1) OR (testable:(pos=2) SYNONYM tester:(pos=2)) OR user:(pos=3)))
FAILS.
 
query:  xapian query test* user
result: Xapian::Query((xapian:(pos=1) OR query:(pos=2) OR test:(pos=3) OR user:(pos=4)))
expect: Xapian::Query((xapian:(pos=1) OR query:(pos=2) OR (testable:(pos=3) SYNONYM tester:(pos=3)) OR user:(pos=4)))
FAILS.
 
query:  xapian que* test* user
result: Xapian::Query((xapian:(pos=1) OR que:(pos=2) OR test:(pos=3) OR user:(pos=4)))
expect: Xapian::Query((xapian:(pos=1) OR query:(pos=2) OR (testable:(pos=3) SYNONYM tester:(pos=3)) OR user:(pos=4)))
FAILS.
 
query:  xapian que* user test*
result: Xapian::Query((xapian:(pos=1) OR que:(pos=2) OR user:(pos=3) OR test:(pos=4)))
expect: Xapian::Query((xapian:(pos=1) OR query:(pos=2) OR user:(pos=3) OR (testable:(pos=4) SYNONYM tester:(pos=4))))
FAILS.

I tried with various older versions of Xapian and I think that the problem was introduced in Xapian 1.0.2 because my tests pass with Xapian 1.0.1 but not with more recent versions:

PHP version 5.2.13, Xapian version 1.0.1
 
Creating a new db containing one document with terms : xapian, tester, testable, user, query, 
 
query:  test* xapian
result: Xapian::Query((testable:(pos=1) OR tester:(pos=1) OR xapian:(pos=2)))
OK.
 
query:  xapian test*
result: Xapian::Query((xapian:(pos=1) OR testable:(pos=2) OR tester:(pos=2)))
OK.
 
query:  test* xapian user
result: Xapian::Query((testable:(pos=1) OR tester:(pos=1) OR xapian:(pos=2) OR user:(pos=3)))
OK.
 
query:  xapian user test*
result: Xapian::Query((xapian:(pos=1) OR user:(pos=2) OR testable:(pos=3) OR tester:(pos=3)))
OK.
 
query:  xapian test* user
result: Xapian::Query((xapian:(pos=1) OR testable:(pos=2) OR tester:(pos=2) OR user:(pos=3)))
OK.
 
query:  xapian query test* user
result: Xapian::Query((xapian:(pos=1) OR query:(pos=2) OR testable:(pos=3) OR tester:(pos=3) OR user:(pos=4)))
OK.
 
query:  xapian que* test* user
result: Xapian::Query((xapian:(pos=1) OR query:(pos=2) OR testable:(pos=3) OR tester:(pos=3) OR user:(pos=4)))
OK.
 
query:  xapian que* user test*
result: Xapian::Query((xapian:(pos=1) OR query:(pos=2) OR user:(pos=3) OR testable:(pos=4) OR tester:(pos=4)))
OK.
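
Because 1.2.0 groups expanded terms with SYNONYM while 1.0.1 uses a flat OR, the two logs cannot be compared as plain strings. A minimal sketch, assuming only the `term:(pos=N)` notation visible in the output above, extracts the (term, position) pairs so that results from different versions can be compared directly. `terms_with_positions` is a hypothetical helper, not part of Xapian.

```python
import re

# Matches the "term:(pos=N)" fragments seen in the Xapian::Query descriptions.
TERM_RE = re.compile(r"(\w+):\(pos=(\d+)\)")


def terms_with_positions(description):
    """Extract the set of (term, position) pairs from a query description."""
    return {(term, int(pos)) for term, pos in TERM_RE.findall(description)}


# Strings copied from the logs in this ticket for the query "xapian test* user".
result_120 = "Xapian::Query((xapian:(pos=1) OR test:(pos=2) OR user:(pos=3)))"
expect_120 = ("Xapian::Query((xapian:(pos=1) OR (testable:(pos=2) SYNONYM "
              "tester:(pos=2)) OR user:(pos=3)))")
result_101 = ("Xapian::Query((xapian:(pos=1) OR testable:(pos=2) OR "
              "tester:(pos=2) OR user:(pos=3)))")

# The expected 1.2.0 output and the 1.0.1 output contain the same terms.
print(terms_with_positions(expect_120) == terms_with_positions(result_101))

# Terms missing from the actual 1.2.0 result, and the unexpanded extra term.
print(sorted(terms_with_positions(expect_120) - terms_with_positions(result_120)))
print(sorted(terms_with_positions(result_120) - terms_with_positions(expect_120)))
```

This ignores operator structure on purpose: it only checks which terms were generated, which is enough to spot an unexpanded wildcard.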

Sorry for only now spotting a three-year-old problem!

Attachments (1)

bug-wildcards.php (3.3 KB) - added by Daniel Ménard 14 years ago.
PHP script that reproduces the problem.

Change History (5)

by Daniel Ménard, 14 years ago

Attachment: bug-wildcards.php added

PHP script that reproduces the problem.

comment:1 by Olly Betts, 14 years ago

Milestone: 1.2.1
Version: 1.0.2

If this broke in 1.0.2, it was probably the addition of synonyms which did it.

Marking for 1.2.1, at least for now.

comment:2 by Daniel Ménard, 14 years ago

Not sure if it helps, but if the wildcarded term is followed by any character other than a space (e.g. a comma), the term is expanded correctly:

NOT OK: xapian test* user
OK: xapian test*, user
OK: xapian test* (user)
OK: (xapian test*) user

Inspiration from queryparser.lemony?rev=9085, line 1209: "GROUP_TERM is a query term which follows a TERM or another GROUP_TERM and is only separated by whitespace characters."

comment:3 by Olly Betts, 14 years ago

Milestone: 1.2.1 → 1.0.21
Resolution: → fixed
Status: new → closed

Fixed in trunk r14712, backported for 1.0.21 in r14713.

comment:4 by Daniel Ménard, 14 years ago

Wow, that was fast!

Thanks a lot, Olly, for fixing this.

Daniel