Opened 17 years ago

Closed 8 years ago

#181 closed enhancement (fixed)

Optional Termlist Table

Reported by: Olly Betts
Owned by: Olly Betts
Priority: normal
Milestone: 1.3.2
Component: Backend-Glass
Version: SVN trunk
Severity: minor
Keywords:
Cc: Richard Boulton
Blocked By: #363
Blocking:
Operating System: All

Description (last modified by Olly Betts)

The termlist table should be optional. Without it, documents can't be deleted or replaced and query expansion can't work, but most other things could be made to work.

Things which will now work:

  • Database::alldocs_begin() - no longer uses the termlist.
  • Database::get_doclength() - uses the document lengths stored in the postlist.
  • Determining the percentage scores when the top document doesn't match all the query terms no longer uses the termlist (was #363).

Things which currently use it:

  • Enquire::matching_terms_begin() - we could record this information during the match, though it might be hard to do without a speed penalty.
  • WritableDatabase::delete_document() - we could allow this with inexact statistics, as Lucene does (#388).
  • WritableDatabase::replace_document() if the document exists already (again, possible with inexact statistics).
  • At least currently, chert stores the list of which values are used in the termlist table, so things like iterating the values in a document require it. Not sure if this argues for putting this data elsewhere or not.

Things which just wouldn't work:

  • Database::termlist_begin()
  • Document::termlist_begin()
  • Document::termlist_count()
  • Enquire::get_eset()

Change History (22)

comment:1 by Olly Betts, 17 years ago

Blocking: 120 added
Cc: richard@… added
Status: new → assigned

No ABI changes, and existing flint databases remain valid, so this could be done in 1.0.x.

comment:2 by Richard Boulton, 17 years ago

Operating System: All
Owner: changed from Richard Boulton to Not currently assigned
Status: assigned → new

I have no plans to work on this, so assigning to nobody.

comment:3 by Richard Boulton, 16 years ago

Currently, get_mset() uses the termlist for assigning percentages. This could be worked around by keeping track of which terms matched the current highest-relevance msetitem (actually, it would probably be easiest to track which Xapian::Weight::Internal objects were used for the best msetitem, since that will work for synonyms too). This information would need to be serialised along with the MSet, though, so will require a network protocol bump. We could probably manage to make this backwards compatible, so only a minor version bump, but it's easier to wait until we branch for 1.1.
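A minimal sketch of the bookkeeping described in this comment, written as a standalone toy rather than against Xapian's internals. `Candidate` and `BestTracker` are hypothetical names, and the sketch records plain term strings where the comment suggests tracking Xapian::Weight::Internal objects. It only shows that the set of matching terms for the best item can be captured during the match, without reading a termlist afterwards.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; these are not Xapian classes.
struct Candidate {
    unsigned docid;
    // (query term, weight it contributed) for each query term that matched.
    std::vector<std::pair<std::string, double>> matched;

    // Total weight of this candidate: the sum of its per-term contributions.
    double weight() const {
        double w = 0.0;
        for (const auto& m : matched) w += m.second;
        return w;
    }
};

struct BestTracker {
    bool have_best = false;
    double best_weight = 0.0;
    unsigned best_docid = 0;
    // Terms which matched the highest-weighted candidate seen so far.
    std::vector<std::string> best_terms;

    // Call once per candidate while the match is running.
    void consider(const Candidate& c) {
        // A candidate only exists because at least one query term matched it.
        assert(!c.matched.empty());
        double w = c.weight();
        // Keep the earlier candidate on ties.
        if (have_best && w <= best_weight) return;
        have_best = true;
        best_weight = w;
        best_docid = c.docid;
        best_terms.clear();
        for (const auto& m : c.matched) best_terms.push_back(m.first);
    }
};
```

In a remote setup, whatever ends up in `best_terms` is the state which would have to travel with the serialised MSet, which is why the comment expects a protocol bump.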

comment:5 by Richard Boulton, 16 years ago

Description: modified (diff)
Milestone: 1.1

comment:6 by Richard Boulton, 16 years ago

Blocking: 120 removed

(In #120) Remove the unfixed dependencies so we can close this bug - they're all marked for the 1.1.0 milestone.

comment:7 by Richard Boulton, 16 years ago

Component: Backend-Flint → Backend-Chert
Description: modified (diff)

Changed Component to Backend-Chert because flint is now "frozen", so we won't be working on this for flint.

comment:8 by Olly Betts, 15 years ago

Milestone: 1.1.0 → 1.1.1

Bumping milestone to 1.1.1 as there's no patch yet and this isn't an incompatible change.

comment:9 by Olly Betts, 15 years ago

Milestone: 1.1.1 → 1.1.4

Triaging milestone:1.1.1 bugs.

comment:10 by Olly Betts, 15 years ago

Owner: changed from Not currently assigned to Olly Betts
Status: new → assigned

Without a termlist table, we could allow "imperfect" deletion and replacement of documents, where the term statistics aren't correctly adjusted for the old document being removed. And xapian-compact could recompute these statistics pretty cheaply (we'd have to scan all the postlist chunks for a particular term before writing any of them in this case, but that's not a big deal). This is essentially how Lucene works AIUI.
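The idea can be sketched with a toy in-memory index. This is an illustration under stated assumptions, not Xapian code: `ToyIndex` and its members are hypothetical, a std::set of document ids stands in for the postlist chunks, and the only statistic modelled is the term frequency (the number of documents indexed by a term).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: none of these names come from Xapian.
using DocId = unsigned;

struct ToyIndex {
    // term -> ids of documents containing it (stands in for postlist chunks).
    std::map<std::string, std::set<DocId>> postlists;
    // term -> stored term frequency (number of documents indexed by the term).
    std::map<std::string, std::size_t> termfreq;
    // Ids of documents which have not been deleted.
    std::set<DocId> live;

    void add_document(DocId id, const std::vector<std::string>& terms) {
        assert(live.count(id) == 0);
        live.insert(id);
        for (const auto& t : terms) {
            // Count each term once per document.
            if (postlists[t].insert(id).second) ++termfreq[t];
        }
    }

    // "Imperfect" delete: with no termlist we can't tell which terms indexed
    // the document, so the postings and stored statistics are left untouched.
    void delete_document(DocId id) { live.erase(id); }

    // Compaction: scan every posting for a term, drop deleted documents, and
    // recompute the statistic from what is left.
    void compact() {
        for (auto it = postlists.begin(); it != postlists.end();) {
            std::set<DocId>& docs = it->second;
            for (auto d = docs.begin(); d != docs.end();) {
                d = live.count(*d) ? std::next(d) : docs.erase(d);
            }
            if (docs.empty()) {
                termfreq.erase(it->first);
                it = postlists.erase(it);
            } else {
                termfreq[it->first] = docs.size();
                ++it;
            }
        }
    }
};
```

After delete_document() the stored term frequencies still count the deleted document, so they are inexact until compact() runs. A real backend would also need to skip deleted documents when reading postlists at search time, which this toy leaves out.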

comment:11 by Richard Boulton, 15 years ago

Blocked By: 363 added

comment:12 by Richard Boulton, 15 years ago

I've created a separate ticket (#363) to track fixing the requirement for a termlist access in get_mset().

comment:13 by Olly Betts, 15 years ago

Description: modified (diff)

comment:14 by Olly Betts, 15 years ago

Priority: normal → high

comment:15 by Olly Betts, 15 years ago

Description: modified (diff)

#363 fixed.

comment:16 by Olly Betts, 15 years ago

Description: modified (diff)
Milestone: 1.1.4 → 1.2.0

I've added support for chert databases without a termlist table in r13488.

xapian-check handles them, as does xapian-compact (and trying to merge databases when some have termlists and some don't generates output without a termlist and a message explaining this).

Currently the only way to create such a database is to create a chert database and do "rm termlist.*".

There's also no explicit test coverage for this yet.

But with what is now in place, we can add better test coverage and support for generating such databases via the API in 1.2.x and they'll work with 1.2.0. So I'm updating the milestone.

comment:17 by Olly Betts, 13 years ago

Priority: high → normal

comment:18 by Olly Betts, 11 years ago

Component: Backend-Chert → Backend-Brass
Milestone: 1.2.x → 1.3.x

This isn't 1.2.x material now.

comment:19 by Olly Betts, 10 years ago

r17752 adds a Xapian::DB_NO_TERMLIST flag to provide an API for creating a database without a termlist (currently only supported for brass).

comment:20 by Olly Betts, 9 years ago

Component: Backend-Brass → Backend-Glass

comment:21 by Olly Betts, 9 years ago

This isn't worth holding up 1.4.0 for.

comment:22 by Olly Betts, 9 years ago

Milestone: 1.3.x → 1.4.x

comment:23 by Olly Betts, 8 years ago

Milestone: 1.4.x → 1.3.2
Resolution: fixed
Status: assigned → closed

I've created #700 for Enquire::matching_terms_begin(), and #701 for where to store which values are used in each document.

The rest is done, and we documented the restrictions along with the addition of DB_NO_TERMLIST in 1.3.2, so marking this as fixed in that version.
