Ticket #198 (assigned defect)
Add support for multiple values in each value slot in a Document.
| Reported by: | richard | Owned by: | richard |
|---|---|---|---|
| Priority: | normal | Milestone: | 1.1.0 |
| Component: | Backend-Chert | Version: | SVN trunk |
| Severity: | normal | Keywords: | |
| Cc: | olly | Blocked By: | |
| Operating System: | All | Blocking: | #199 |
Description (last modified by richard)
Currently, the value stored in a slot in a Document is a single string. It would sometimes be useful to store multiple strings in a slot. For example, when a value slot holds the set of facets that a document is relevant to, a given document may be relevant to several facet values. Similarly, when storing the set of tags matching a document, for use when generating a tag cloud, we want to be able to store multiple tags for each document.
However, we also need to preserve the existing API, and ensure that database formats remain compatible.
Some discussion from IRC follows:
Richard Boulton: Do you think we could convert values as stored in databases currently to allow multiple values, without breaking backwards compatibility?
ojwb: probably
ojwb: if only by checking the flint version
Richard Boulton: Hmm - if an old flint version is used to create a database, and insert some values, it could be hard to then modify that database with a new version of flint.
Richard Boulton: Unless we have a pass through the whole database to rewrite the values.
ojwb: well, you could just disable the ability to add multiple values
Richard Boulton: Oh, no, it's easy to store this.
Richard Boulton: Each value entry consists of a list of "valueno, entry" items.
Richard Boulton: (serialised, of course)
ojwb: or start the newly encoded ones in a way which is invalid
ojwb: oh, just duplicate?
Richard Boulton: Yep, there doesn't seem to be anything to stop that.
ojwb: so the only question is if it's actually desirable!
Richard Boulton: They're kept in sorted order.
Richard Boulton: And the existing get_value() just returns the first of a particular valueno found.
ojwb: that's nice then
Richard Boulton: So it would even be backwards compatible for reading purposes. (Old versions of xapian just wouldn't see the duplicate values)
ojwb: rewriting would mess up a document with multiple values, wouldn't it?
Richard Boulton: Not entirely.
Richard Boulton: add_value adds on the values at the end of the list.
Richard Boulton: without checking them.
ojwb: but aren't they unserialised into a map in Xapian::Document?
Richard Boulton: Oh. Ah.
Richard Boulton: Yes, so getting a document out and then inserting it again would lose the duplicates.
Richard Boulton: But that's a pretty nice way to degrade.
ojwb: yeah, it's not too bad
Richard Boulton: It would be nicer if we'd named Document::add_value() as Document::set_value()
Richard Boulton: We can't change the behaviour of add_value() now, though: I suppose we could add Document::append_value()
Richard Boulton: And leave Document::remove_value() as removing all values with a given number.
Richard Boulton: Document::get_value() would return the first value for a given valueno.
Richard Boulton: And we could add Document::get_values() which gets a list of all the values for a given valueno.
Richard Boulton: Hmm - I wonder if the list of values for each valueno should be kept in insertion order. Or sorted in some way (binary sort, I would think).
ojwb: It shouldn't sort them I think
Richard Boulton: I think just in insertion order.
Richard Boulton: *snap*
ojwb: because you want a "primary version"
Richard Boulton: That's true.
ojwb: which is used for sorting, etc
ojwb: I'm not completely sure this is a good plan, but it seems to have merit
Richard Boulton: Yes. That's the main thing I was unhappy about StringListSerialiser for - you couldn't sensibly sort on the resulting values.
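The API shape discussed above can be sketched roughly as follows. This is only an illustration, not Xapian code: `SketchDocument` and `demo()` are hypothetical names, the in-memory `std::map` of vectors is an assumed representation, and `append_value()` and `get_values()` are the proposed additions from the discussion rather than existing methods. It also assumes that the existing `add_value()` replaces whatever the slot currently holds.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Xapian::Document; not the real class.
class SketchDocument {
    // Slot number -> values in insertion order. The first entry acts as the
    // "primary" value that get_value() (and hence sorting) would see.
    std::map<unsigned, std::vector<std::string>> slots;

  public:
    // Existing call, assumed semantics: replace whatever the slot holds.
    void add_value(unsigned slot, const std::string& value) {
        slots[slot].assign(1, value);
    }

    // Proposed call: keep existing values and add another at the end.
    void append_value(unsigned slot, const std::string& value) {
        slots[slot].push_back(value);
    }

    // Existing call: first value for the slot, or empty if the slot is unset.
    std::string get_value(unsigned slot) const {
        auto it = slots.find(slot);
        return it == slots.end() ? std::string() : it->second.front();
    }

    // Proposed call: every value for the slot, in insertion order.
    std::vector<std::string> get_values(unsigned slot) const {
        auto it = slots.find(slot);
        return it == slots.end() ? std::vector<std::string>() : it->second;
    }

    // Existing name: removes all values stored under the slot number.
    void remove_value(unsigned slot) { slots.erase(slot); }
};

// Small usage example with made-up tag strings.
void demo() {
    SketchDocument doc;
    doc.add_value(1, "colour:red");
    doc.append_value(1, "colour:blue");
    assert(doc.get_value(1) == "colour:red");  // primary value is unchanged
    assert(doc.get_values(1).size() == 2);
    doc.remove_value(1);
    assert(doc.get_values(1).empty());
}
```

Keeping each slot's values in insertion order, rather than sorting them, is what lets the first value serve as the primary one for code that only calls `get_value()`.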
