Opened 8 years ago

Last modified 5 months ago

#714 new defect

Use C++ API optional reference counting in bindings

Reported by: Olly Betts Owned by: Olly Betts
Priority: normal Milestone: 2.0.0
Component: Xapian-bindings Version: git master
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description

[follow-on for #186]

The C++ API now has optional reference counting of the various functor classes which API objects keep references to.

We should make use of this in the bindings - several of the bindings (Python, PHP, Perl) already have separate tracking code so there will be no observable behaviour change for those (except for a small speed-up), while for the other languages this would fix cases which currently fail.means that cases which failed before now work, so it's reasonable as a change to make mid-1.4.x (and indeed the tracking code for Python and PHP wasn't added at the start of release series).

There's an attempt at a patch in #186, but it looks like this may need changes to SWIG.

This is a bug for the languages which don't currently have tracking, so setting type=defect.

Change History (8)

comment:1 by Olly Betts, 23 months ago

I've hit on this approach, which seems to work in tests with PHP so far, and can be neatly wrapped up in a macro (perhaps just done by the existing SUBCLASSABLE macro):

%typemap(in) (Xapian::FieldProcessor* proc) %{
$typemap(in, Xapian::FieldProcessor* DISOWN)
{
  Swig::Director* xapian_swig_director = dynamic_cast<Swig::Director*>($1);
  if (xapian_swig_director) xapian_swig_director->swig_disown();
}
$1->release();
%}

I'm not actually sure if the special handling for director classes is needed.

comment:2 by Olly Betts, 22 months ago

The special handling for director classes does seem to be needed.

comment:3 by Olly Betts, 19 months ago

This is now implemented for PHP as part of PHP8 support (and backported to 1.4.22).

I tried to extend it to cover all bindings on master, but it breaks Python3 and Perl.

Branch is: https://trac.xapian.org/browser/git/?rev=use-optional-ref-counting-for-all-bindings

comment:4 by Olly Betts, 19 months ago

The problem seems to be that after release() Xapian will delete the object once it no longer has any references to it, and then if the Python object still exists it points to a deleted C++ object. The Python3 pythontest.py testcase "matchspy" exercises this case.

I think we can use the ref() and unref() methods instead (which is more like Richard's earlier attempts in #186 in fact).

comment:5 by Olly Betts, 19 months ago

Having dug into this some more, I don't think this approach can work, at least for languages with directors. I can get a segfault out of the new PHP8 bindings with this patch to smoketest.php:

diff --git a/xapian-bindings/php/smoketest.php b/xapian-bindings/php/smoketest.php
index dbc8a6ce455d..eea7d4c9b995 100644
--- a/xapian-bindings/php/smoketest.php
+++ b/xapian-bindings/php/smoketest.php
@@ -281,7 +281,13 @@ class testfieldprocessor extends XapianFieldProcessor {
     }
 }
 
-$qp->add_prefix('spam', new testfieldprocessor);
+$fp = new testfieldprocessor;
+{
+    $qptmp = new XapianQueryParser;
+    $qptmp->add_prefix('spam', $fp);
+    unset($qptmp);
+}
+$qp->add_prefix('spam', $fp);
 $qp->add_boolean_prefix('filter', new testfieldprocessor);
 $query = $qp->parse_query('spam:ignored');
 if ($query->get_description() !== 'Query(spam)') {

I'll talk about this in the context of PHP and setting Xapian::FieldProcessor on Xapian::QueryParser, but I believe this applies to all Xapian optional-reference-counted classes in all SWIG target languages where we enable directors.

Consider the case of testfieldprocessor which is a PHP subclass of XapianFieldProcessor.

We need to do a "director disown" for a testfieldprocessor object when it gets set on a XapianQueryParser object so that the underlying C++ object owns the PHP proxy object because we need the PHP proxy to live on until Xapian deletes the underlying C++ object so that virtual method calls can be routed to it.

However with the patched testcase above, the underlying C++ object gets deleted at unset($qptmp) because the C++ side only knows about the C++ references to this object - it doesn't know that the PHP proxy object still exists, so $fp now has a live PHP proxy object pointing to a deleted C++ object, and when we try to set it again we're into C++ undefined behaviour.

SWIG's ref and unref features seem like them might allow us to address this, but they don't actually solve the problem completely. We can increment the reference count of the underlying C++ object while the PHP proxy object is alive, then decrement it when the PHP proxy object is destroyed, but this effectively creates a reference loop: the PHP proxy has a Xapian-specific C++ reference to the Xapian C++ object, and the swig subclass of that created to implement directors has a PHP reference to the PHP proxy object. Since this reference loop is partly outside the target language's reference tracking there's not even any hope that it can be detected by the target language garbage collector (if there even is one).

It seems fundamentally "director disown" assumes that a target language reference isn't kept (in Python a weak reference is even returned to assist with that).

We could potentially wrap these methods in a C++11 move-like way, where the target language object is deliberately left unusable (SWIG 4.1.0 adds explicit support for move semantics) but that's an incompatible change.

So I think we need to stay with keeping a PHP-reference to the currently set XapianFieldProcessor objects, (which for PHP specifically will need some custom typemaps now there's no longer a PHP wrapper script generated we can patch).

comment:6 by Olly Betts, 19 months ago

In languages where we don't use directors (usually, perhaps always, because SWIG doesn't support them yet) I think we can just use ref and unref. In this case the C++ object can live on beyond the target language proxy object because it doesn't need to be able to call back to it, so if the proxy object increments and decrements the C++ reference types it seems everything should work. I haven't tested this yet though.

comment:7 by Olly Betts, 18 months ago

which for PHP specifically will need some custom typemaps now there's no longer a PHP wrapper script generated we can patch

Implemented in 7445f6c1d5b15b946ca77724895aeadabb125131.

In languages where we don't use directors (usually, perhaps always, because SWIG doesn't support them yet) I think we can just use ref and unref.

It looks like those languages actually already work OK - at least smoketest.lua tests setting a Stopper on a TermGenerator works after the Stopper goes out of scope. I'm not immediately seeing how that works, so will test a bit more thoroughly.

comment:8 by Olly Betts, 5 months ago

Milestone: 1.4.x2.0.0

I'm going to postpone the rest of this until a future release series as this is now mainly about making use of a better mechanism for holding onto these references.

Quite possibly some or all of the work could be backported to the current stable series once done.

Note: See TracTickets for help on using tickets.