Opened 17 years ago
Closed 9 years ago
#181 closed enhancement (fixed)
Optional Termlist Table
Reported by: | Olly Betts | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.3.2 |
Component: | Backend-Glass | Version: | SVN trunk |
Severity: | minor | Keywords: | |
Cc: | Richard Boulton | Blocked By: | #363 |
Blocking: | | Operating System: | All |
Description
The termlist table should be optional. Without it, documents can't be deleted or replaced and query expansion can't work, but most other things could be made to work.
Things which will now work:
- Database::alldocs_begin() - no longer uses the termlist.
- Database::get_doclength() - uses the document lengths stored in the postlist.
- Determining the percentage scores when the top document doesn't match all the query terms no longer uses the termlist (was #363).
Things which currently use it:
- Enquire::matching_terms_begin() - we could record this information during the match, though it might be hard to do without a speed penalty.
- WritableDatabase::delete_document() - we could allow this with inexact statistics, as Lucene does (#388).
- WritableDatabase::replace_document() if the document exists already (again, possible with inexact statistics).
- At least currently, chert stores the list of which values are used in the termlist table, so things like iterating the values in a document require it. Not sure if this argues for putting this data elsewhere or not.
Things which just wouldn't work:
- Database::termlist_begin()
- Document::termlist_begin()
- Document::termlist_count()
- Enquire::get_eset()
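To illustrate why the split above falls where it does, here is a minimal toy model in Python. It is not Xapian code: the class `PostlistOnlyIndex` and all of its methods are hypothetical names invented for this sketch. It assumes an index that keeps only postlists (term to documents) plus per-document lengths, and no per-document term list.

```python
from collections import defaultdict


class PostlistOnlyIndex:
    """Toy index with a postlist table but no termlist table (hypothetical)."""

    def __init__(self):
        # term -> {docid: within-document frequency}
        self.postlists = defaultdict(dict)
        # docid -> document length, stored alongside the postings
        self.doclengths = {}

    def add_document(self, docid, terms):
        for term in terms:
            self.postlists[term][docid] = self.postlists[term].get(docid, 0) + 1
        self.doclengths[docid] = len(terms)

    def alldocs(self):
        # Needs only the stored document lengths, not a termlist.
        return sorted(self.doclengths)

    def get_doclength(self, docid):
        # Answered directly from the stored lengths.
        return self.doclengths[docid]

    def termlist(self, docid):
        # With no per-document term list, the only fallback is to scan
        # every postlist, which is impractical on a real database.
        return sorted(t for t, pl in self.postlists.items() if docid in pl)


index = PostlistOnlyIndex()
index.add_document(1, ["apple", "banana", "apple"])
index.add_document(2, ["banana", "cherry"])
print(index.alldocs(), index.get_doclength(1), index.termlist(1))
```

In this sketch, listing documents and reading a document length touch only the stored lengths, while recovering one document's terms costs a pass over every term in the index.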
Change History (22)
comment:1 by , 17 years ago
Blocking: | 120 added |
---|---|
Cc: | added |
Status: | new → assigned |
comment:2 by , 17 years ago
Operating System: | → All |
---|---|
Owner: | changed from | to
Status: | assigned → new |
I have no plans to work on this, so assigning to nobody.
comment:3 by , 17 years ago
Currently, get_mset() uses the termlist for assigning percentages. This could be worked around by keeping track of which terms matched the current highest-relevance msetitem (actually, it would probably be easiest to track which Xapian::Weight::Internal objects were used for the best msetitem, since that will work for synonyms too). This information would need to be serialised along with the MSet, though, so will require a network protocol bump. We could probably manage to make this backwards compatible, so only a minor version bump, but it's easier to wait until we branch for 1.1.
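A rough sketch of the idea in this comment, written as a toy in Python rather than against Xapian's internals. The function `match_with_percentages` and its arguments are hypothetical, and the percentage formula is a simplification assumed for illustration, not necessarily the one get_mset() uses. The point is only that the set of terms matching the best item can be remembered during the match instead of being read back from a termlist.

```python
def match_with_percentages(query_max_weights, doc_term_weights):
    """Rank documents and assign percentages without consulting a termlist.

    query_max_weights: term -> maximum weight the term can contribute.
    doc_term_weights: docid -> {term: weight actually contributed}.
    """
    total_max = sum(query_max_weights.values())
    results = []
    for docid, term_weights in doc_term_weights.items():
        # Record which query terms matched while computing the weight.
        matched = frozenset(t for t in term_weights if t in query_max_weights)
        if matched:
            weight = sum(term_weights[t] for t in matched)
            results.append((weight, docid, matched))
    if not results:
        return []
    # Highest weight first; ties broken by docid.
    results.sort(key=lambda r: (-r[0], r[1]))
    best_weight, _, best_matched = results[0]
    # Share of the query's maximum weight covered by the terms that
    # matched the top document, known from the match itself.
    top_fraction = sum(query_max_weights[t] for t in best_matched) / total_max
    return [
        (docid, round(100 * top_fraction * weight / best_weight))
        for weight, docid, _ in results
    ]


percentages = match_with_percentages(
    {"a": 2.0, "b": 2.0},
    {1: {"a": 1.5}, 2: {"a": 0.5, "b": 0.25}},
)
print(percentages)
```

Here the top document matches only one of two equally weighted query terms, so it is capped at half of the scale and the other document is scaled relative to it.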
comment:5 by , 17 years ago
Description: | modified (diff) |
---|---|
Milestone: | → 1.1 |
comment:6 by , 17 years ago
Blocking: | 120 removed |
---|
(In #120) Remove the unfixed dependencies so we can close this bug - they're all marked for the 1.1.0 milestone.
comment:7 by , 17 years ago
Component: | Backend-Flint → Backend-Chert |
---|---|
Description: | modified (diff) |
Changed Component to Backend-Chert because flint is now "frozen", so we won't be working on this for flint.
comment:8 by , 16 years ago
Milestone: | 1.1.0 → 1.1.1 |
---|
Bumping milestone to 1.1.1 as there's no patch yet and this isn't an incompatible change.
comment:10 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Without a termlist table, we could allow "imperfect" deletion and replacement of documents, where the term statistics aren't correctly adjusted for the old document being removed. And xapian-compact could recompute these statistics pretty cheaply (we'd have to scan all the postlist chunks for a particular term before writing any of them in this case, but that's not a big deal). This is essentially how Lucene works AIUI.
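The "imperfect" deletion described here can be sketched with a toy model. This is an illustration in Python under stated assumptions, not how Xapian or Lucene is implemented: the class `LazyDeleteIndex`, its tombstone set, and its `compact()` method are all hypothetical names made up for the example.

```python
class LazyDeleteIndex:
    """Toy index where deletion leaves term statistics stale (hypothetical)."""

    def __init__(self):
        self.postlists = {}   # term -> set of docids
        self.termfreq = {}    # term -> stored document count (may go stale)
        self.deleted = set()  # docids removed without touching postlists

    def add_document(self, docid, terms):
        for term in set(terms):
            self.postlists.setdefault(term, set()).add(docid)
            self.termfreq[term] = self.termfreq.get(term, 0) + 1

    def delete_document(self, docid):
        # No termlist, so we cannot tell which postlists mention docid.
        # Record a tombstone only; stored term frequencies become inexact.
        self.deleted.add(docid)

    def matching_docs(self, term):
        # Tombstoned documents are filtered out at search time.
        return sorted(self.postlists.get(term, set()) - self.deleted)

    def compact(self):
        # Scan all postings for each term, drop tombstoned documents and
        # recompute exact term frequencies from what is left.
        for term in list(self.postlists):
            live = self.postlists[term] - self.deleted
            if live:
                self.postlists[term] = live
                self.termfreq[term] = len(live)
            else:
                del self.postlists[term]
                del self.termfreq[term]
        self.deleted.clear()


idx = LazyDeleteIndex()
idx.add_document(1, ["apple", "banana"])
idx.add_document(2, ["banana"])
idx.delete_document(1)
stale = idx.termfreq["banana"]      # still counts the deleted document
idx.compact()
exact = idx.termfreq["banana"]      # recomputed from live postings
print(stale, exact, idx.matching_docs("banana"))
```

Between the delete and the compaction, searches stay correct because of the tombstone filter, but any weighting based on the stored term frequency is slightly off, which is the trade-off the comment describes.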
comment:11 by , 16 years ago
Blocked By: | 363 added |
---|
comment:12 by , 16 years ago
I've created a separate ticket (#363) to track fixing the requirement for a termlist access in get_mset().
comment:13 by , 16 years ago
Description: | modified (diff) |
---|
comment:14 by , 15 years ago
Priority: | normal → high |
---|
comment:16 by , 15 years ago
Description: | modified (diff) |
---|---|
Milestone: | 1.1.4 → 1.2.0 |
I've added support for chert databases without a termlist table in r13488.
xapian-check handles them, as does xapian-compact (and trying to merge databases when some have termlists and some don't generates output without a termlist and a message explaining this).
Currently the only way to create such a database is to create a chert database and do "rm termlist.*".
There's also no explicit test coverage for this yet.
But with what is now in place, we can add better test coverage and support for generating such databases via the API in 1.2.x and they'll work with 1.2.0. So I'm updating the milestone.
comment:17 by , 14 years ago
Priority: | high → normal |
---|
comment:18 by , 12 years ago
Component: | Backend-Chert → Backend-Brass |
---|---|
Milestone: | 1.2.x → 1.3.x |
This isn't 1.2.x material now.
comment:19 by , 11 years ago
r17752 adds a Xapian::DB_NO_TERMLIST flag to provide an API for creating a database without a termlist (currently only supported for brass).
comment:20 by , 10 years ago
Component: | Backend-Brass → Backend-Glass |
---|
comment:22 by , 10 years ago
Milestone: | 1.3.x → 1.4.x |
---|
comment:23 by , 9 years ago
Milestone: | 1.4.x → 1.3.2 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
No ABI changes, and existing flint databases remain valid, so could be done in 1.0.x.