Opened 9 years ago

Last modified 20 months ago

#687 assigned defect

64-bit docids in the bindings

Reported by: Olly Betts
Owned by: Olly Betts
Priority: normal
Milestone: 2.0.0
Component: Xapian-bindings
Version: git master
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

You can now use 64-bit docid (and termcount), so marking #385 fixed seems appropriate.

But we ought to add testcases to check that this works for each of the bindings - it's not something that will necessarily work automatically.

I wrote a simple testcase for PHP (patch attached), which passes with 64 bit docid (promising) but fails with 32 bit docid (as the C++ version does). I don't see any easy way to determine the type widths from the bindings as things are though, which makes adding tests problematic.

The failure mode isn't good either - you just get the docid quietly wrapping. If we fix that for C++ so that it throws an exception instead, then the testcases could at least check for it either working or throwing that exception. Or perhaps we expose the information about type widths through the bindings.
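To illustrate the failure mode described above, here is a minimal sketch of quiet wrapping versus a checked conversion. The names `docid32`, `docid64`, `narrow_unchecked` and `narrow_checked` are hypothetical stand-ins for illustration, not Xapian's typedefs or API, and the choice of `std::range_error` is only an assumption about what a fixed version might throw.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Stand-in types for illustration only (assumptions, not Xapian's typedefs).
using docid32 = std::uint32_t;
using docid64 = std::uint64_t;

// Quiet wrapping: the value is reduced modulo 2^32 with no diagnostic.
inline docid32 narrow_unchecked(docid64 did) {
    return static_cast<docid32>(did);
}

// Hypothetical checked alternative: refuse values that don't fit.
inline docid32 narrow_checked(docid64 did) {
    if (did > std::numeric_limits<docid32>::max())
        throw std::range_error("docid doesn't fit in 32 bits");
    return static_cast<docid32>(did);
}
```

With something like the checked version, a testcase could accept either outcome: the large docid round-trips, or the exception is raised.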

Attachments (1)

php-64bit-docid-test.patch (1.2 KB ) - added by Olly Betts 9 years ago.
Patch to add testcase for PHP

Change History (6)

by Olly Betts, 9 years ago

Attachment: php-64bit-docid-test.patch added

Patch to add testcase for PHP

comment:1 by Olly Betts, 9 years ago

I think we need some way to determine the largest possible docid.

There are actually two such values: what the type supports, and what the backend supports. Currently these are the same by default, but if 64-bit docid is enabled, the current backends don't support that directly.

So I'm not sure if this should be a static value (perhaps for the bindings only, as in C++ it's just numeric_limits<Xapian::docid>::max()), or a method such as Database::get_max_docid() (which would report the largest docid which can be used for this object, and would take multi-dbs into account). Or perhaps both.

For the bindings, there's potentially a third limit, if the type Xapian::docid is mapped to is narrower than the C++ type (e.g. if the language has no 64 bit type).
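As a rough sketch of how the three limits above combine, the usable maximum is the smallest of the C++ type's limit, the backend's limit, and the limit of the integer type the binding maps docid to. `effective_max_docid` is a hypothetical helper, not part of the Xapian API, and the template parameters stand in for Xapian::docid and the target language's integer type.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical helper (not Xapian API): combine the three limits.
//   DocidType   - stand-in for the C++ docid type
//   BindingInt  - stand-in for the integer type the binding maps docid to
//   backend_max - largest docid the backend can store (assumed to be known)
template <typename DocidType, typename BindingInt>
std::uint64_t effective_max_docid(std::uint64_t backend_max) {
    const std::uint64_t type_max = std::numeric_limits<DocidType>::max();
    const std::uint64_t binding_max =
        static_cast<std::uint64_t>(std::numeric_limits<BindingInt>::max());
    // The usable maximum is whichever limit is tightest.
    return std::min({type_max, backend_max, binding_max});
}
```

For example, a 64-bit docid type exposed through a language with only a signed 32-bit integer would be capped by the binding, whatever the backend supports.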

comment:2 by Olly Betts, 9 years ago

Status: new → assigned

The glass backend can now handle 64-bit document ids.

This probably isn't actually a 1.4.0 blocker, as any constants or methods which would be added would be API additions and not break the ABI, though it would be good to document what the status of 64-bit docid support is for each language.

comment:3 by Olly Betts, 9 years ago

Milestone: 1.3.5 → 1.4.x

On further reflection, I think we punt on this for 1.4.0 - 64 bit docids aren't the default, so you're choosing an ABI-incompatible build to start with.

It's also going to be significant work to check the consequences for all the languages we have bindings for. E.g. it looks like PHP's integer type can be 32 or 64 bits depending on the platform (I guess it maps to C long), and large values quietly turn into a floating point value (which presumably means that where PHP has a 32-bit integer type, integer values are effectively precisely representable up to where C double stops being able to represent consecutive integers). This all needs careful research (or existing in-depth knowledge), and careful construction of test cases.
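To make the "where C double stops being able to represent consecutive integers" point concrete, here is a small sketch that checks whether a docid survives a round trip through double. It assumes an IEEE 754 double with a 53-bit significand, which is typical but not guaranteed by C++, and `survives_double` is a hypothetical name used only for illustration.

```cpp
#include <cstdint>

// Returns true if `did` is unchanged after conversion to double and back.
// Assumes IEEE 754 double; under that assumption every integer up to
// 2^53 is exact, and 2^53 + 1 is the first one that is not.
inline bool survives_double(std::uint64_t did) {
    const double as_double = static_cast<double>(did);
    // Values that round up to 2^64 can't be converted back safely
    // (that conversion would be undefined behaviour), so reject them.
    if (as_double >= 18446744073709551616.0) return false;
    return static_cast<std::uint64_t>(as_double) == did;
}
```

Under that assumption, a binding which falls back to floating point could only be trusted for docids up to 2^53, which is something a per-language test would need to take into account.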

comment:4 by Olly Betts, 5 years ago

Component: Library API → Xapian-bindings
Summary: 64 bit docid follow-on → 64-bit docids in the bindings
Version: SVN trunk → git master

comment:5 by Olly Betts, 20 months ago

Milestone: 1.4.x → 2.0.0