Opened 9 years ago

Closed 8 years ago

#709 closed enhancement (fixed)

Expose Database::get_revision_info()

Reported by: German M. Bravo Owned by: Olly Betts
Priority: high Milestone: 1.4.1
Component: Library API Version: git master
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description

Add get_revision_info() public method to Database class.

This is to allow implementations to get a representation of the internal description of the current revision of a database. This can be used to implement replication and/or sinchronization protocols among databases.

Attachments (3)

get_revision_info.diff (2.0 KB ) - added by German M. Bravo 9 years ago.
709-get_revision_info.diff (2.0 KB ) - added by German M. Bravo 9 years ago.
709-added_get_revision_info.diff (2.1 KB ) - added by German M. Bravo 9 years ago.
This tries to fix the issues Olly mentioned

Download all attachments as: .zip

Change History (17)

by German M. Bravo, 9 years ago

Attachment: get_revision_info.diff added

comment:1 by Olly Betts, 9 years ago

Simply using ':' to separate isn't really a sensible approach - I'm guessing you copied this code from get_uuid() but we know a UUID only contains hex digits and -. A revision info string is an opaque binary blob.

As things stand, revision_info encodings for chert and glass happen to encode their own length, so we can actually just concatenate them safely without any separator character.

We could just document that as a requirement. Alternatives I can see are to encode the length externally, or else escape the chosen separator. Escaping seems the messiest solution (both to encode and to decode, if we later need to unpick a revision_info string somewhere).

comment:2 by Olly Betts, 9 years ago

Component: Backend-GlassLibrary API

This is really "Library API".

comment:3 by German M. Bravo, 9 years ago

I absolutely agree. maybe it could be: <encoded_length><path_info>[<encoded_length><path_info>...]

comment:4 by Olly Betts, 9 years ago

Also, the internal revision_info() throws an exception for backends which don't support it, while the patch assumes they return an empty string (and then itself returns an empty string if any do). We should decide what's most useful in the public API, make sure the implementation matches, and document it.

by German M. Bravo, 9 years ago

Attachment: 709-get_revision_info.diff added

by German M. Bravo, 9 years ago

This tries to fix the issues Olly mentioned

comment:5 by German M. Bravo, 9 years ago

Passing all tests

comment:6 by Olly Betts, 9 years ago

Kronuz: Elsewhere ISTR you saying it might be nice if the returned values string compared the same as the revision numbers - I think that's not possible in the face of multiple databases though. Do you have a use in mind for that property?

Also, do you care what happens for backends which don't support reporting revision info? Or should I just pick some consistent handling for that case?

comment:7 by German M. Bravo, 9 years ago

Olly, we are using the revision number to compare if we need to send/receive changes and what changes during replication. We needed to implement a way to replicate the database as we couldn't use the current replication protocol, it didn't suit our needs as it is.

Basically we need to receive the revision string and we need a way to compare two revision strings to see which one is from an "older" revision. Currently we're converting the string back to the revision number; as we're only using glass that is a simple, but undesirable, way to check that between the two strings.

We're not using backends which don't support revision info, so I'm not entirely sure about if we want them all returning something or not.

comment:8 by Olly Betts, 8 years ago

So what you actually want to know is "has the revision increased" or something like that?

If so, I think we should consider alternative APIs that actually provide that, rather than just exposing this and then having you assume you know how the result is encoded.

Adding an API which doesn't really support the one use we have in mind for it seems a bad idea.

Where's the code you're using this in?

comment:9 by German M. Bravo, 8 years ago

This is where it's used: https://github.com/Kronuz/Xapiand/blob/ccadeabf7bba25c8a529ca69a2f3c11548f5a64e/src/database.cc#L65

The idea is to have a WAL file with all operations written in files. When Xapiand loads, it get the current revision number in the database and then it walks the file from that location till the end, executing all operations which have a revision number higher than the current one.

comment:10 by Olly Betts, 8 years ago

Milestone: 1.4.x1.4.1
Status: newassigned

OK, so you do need the revision number itself.

Than I think the cleanest way to resolve this is to add a method which just returns the revision number as a suitable integer type, and make it throw an exception for multi-databases and for non-disk based databases (or even for everything but glass maybe - what do you support WAL for?). Then Xapiand can just use that and not have to worry about the encoding of it, or tying down the encoding Xapian uses internally.

Longer term, perhaps it would make most sense to move the WAL support into Xapian, but trying to do that first will mostly ensure that nothing user-visible happens for some time...

I'd be tempted to mark the method as internal or similar, with a note encouraging people to talk to us before using it, so we're aware of the impact of changing or removing it (so if Xapiand remains the only user, we only need to coordinate future changes between the two projects).

For example, Richard wondered about ways to handle the revision counter hitting its limit gracefully - making it 64 bit would make this impossible in practice (if you managed to commit a revision every nanosecond, it'd still take 584 years to exhaust a 64-bit counter), or else invalidate revisions [0,range/2) when the counter hits range and wraps to 0, and invalidate [range/2,range) when the counter hits range/2 and use a custom comparison (this probably requires a linear scan of the database to discover any blocks with such revisions, but the amortised cost is low). Increasing the type size would require a change to users of this method, as would a custom comparison for any which want an ordering of revision numbers.

Thoughts?

comment:11 by Olly Betts, 8 years ago

Ping? I'm hoping to wrap up 1.4.1 soon and it would be good to resolve this in it.

comment:12 by German M. Bravo, 8 years ago

Yes, the revision number itself is what we actually need. We are supporting WAL only for Glass, however it should as well work with Chert (and maybe others). I'm all for it returning the revision number directly (we don't use this with multiple databases either, so it could also throw an exception for those)

Revision number could also be 64 bit, we also thought about it as well... 32 bit is rather low. Regarding this, We have implemented a serializer for long double (sortable_serialise()) which works with 64 bit numbers, it's in our Xapiand repository, it's based upon the sortable_serialise in Xapian, if you mind taking a peek.

comment:13 by Olly Betts, 8 years ago

double is almost universally 64 bit currently, long double is often larger than that...

comment:14 by Olly Betts, 8 years ago

Resolution: fixed
Status: assignedclosed

Implemented in [324700a967f6e1ab79bb785f95d9dae58d43ba64] in git master.

Backported to RELEASE/1.4 in [741d39d8044cd3ecd7c246207c3114517704ab6f].

Note: See TracTickets for help on using tickets.