Opened 14 years ago

Closed 14 years ago

#475 closed defect (fixed)

Multi XOR can return inconsistent results

Reported by: Richard Boulton Owned by: Olly Betts
Priority: normal Milestone: 1.0.21
Component: Matcher Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description (last modified by Richard Boulton)

I've run the following query, on a database containing 1000 random documents, in which each document contains a sequence of terms starting at N1, and ending at Nx, where x is a random integer in the range 1 to 100 (inclusive). ie, in this database, if a document contains term Nx, it also contains N(x-1), for all x > 1. All wdfs are 1.

Xapian::Query((N48:(wqf=2) XOR (N70 AND_NOT N42) XOR (N99 OR N68 OR N10 OR N28) XOR N22 XOR N44))

When I ask for only the top result (get_mset(0, 1)), a document which contains all terms to N99 is returned.

When I ask for all the results (get_mset(0, 1000)), this document is not returned.

I believe this is because, when asking for only the top result, the postlist decays changing some of the XORs into AND_NOTs.

The decay of XOR to AND_NOT changes the semantics of a multi-XOR (which is to match if an odd number of the children match). Consider:

N99 XOR N44 XOR N22

N99 present implies N44 and N22 present, so a document containing N99 will be returned.

However, this can decay to

(N99 XOR N44) AND_NOT N22

N99 present implies all the other terms are present, so the XOR part of this never matches a document containing N99.

I think the upshot of this is that either:

  • we should disable the decay of XOR to AND_NOT, or
  • we should disallow more than 2 children being passed to an XOR query constructor (and disallow redistribution of terms of XOR queries next to each other in the tree), or
  • we should implement a MultiXORPostList which handles the decay correctly (though I've not thought hard about what that means).

Change History (8)

comment:1 by Richard Boulton, 14 years ago

Version: SVN trunk

This occurs in SVN trunk, but I suspect it's been present in Xapian for many years.

comment:2 by Richard Boulton, 14 years ago

Description: modified (diff)

Corrected typo in description

comment:3 by Olly Betts, 14 years ago

From discussion on IRC, it seems the problem is probably that XOR should pass min_weight of 0 to child posting lists, since it wants to know if either child matches since that match invalidates a match in the other child. Decay to AND_NOT should be safe still (the actual issue with the example being with the left XOR, not the decaying right one).

Richard: Are you working on this? Do you have a reduced test case already, or just the description of one?

comment:4 by Richard Boulton, 14 years ago

I committed a fix and a testcase in r14525 for the problem of XOR children decaying, causing the XOR to return documents which it shouldn't have returned.

However, I think there is still a problem with the weights returned from multi-XORs: if we have A XOR B XOR C, a document which contains A, B and C should be returned, and the weight should be that from A + B + C. However, it will actually be the weights of exactly one of A, B or C; worse, it's unpredictable which of those weights will be returned because it depends on what order the posting list tree was built in (which isn't under the users control). I think to fix this there's no other option than implementing a MultiXorPostList - I'm not sure it's worth the effort, though.

comment:5 by Olly Betts, 14 years ago

Status: newassigned

Do you have a testcase for the multi-way xor issue?

comment:6 by Olly Betts, 14 years ago

Status: assignednew

Backported the fix in r14525 to 1.0 as r14553.

I've been working on MultiXorPostList, so stealing this ticket.

comment:7 by Olly Betts, 14 years ago

Milestone: 1.2.11.0.21
Status: newassigned

MultiXorPostList implemetation added in r14608, which should fix this completely.

Marking for backport to 1.0.21.

comment:8 by Olly Betts, 14 years ago

Resolution: fixed
Status: assignedclosed

Backported for 1.0.21 as r14609.

Note: See TracTickets for help on using tickets.