Opened 13 years ago

Closed 4 years ago

#186 closed defect (fixed)

User subclassable classes should be reference counted

Reported by: Richard Boulton Owned by: Olly Betts
Priority: normal Milestone: 1.3.5
Component: Library API Version: git master
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description (last modified by Olly Betts)

MatchDeciders should be subclasses of RefCntBase, so that they can be reference counted. This would allow classes like MultipleMatchDecider to keep a reference count to the match deciders in it, avoiding the risk of the deciders used in it being freed before the match decider is used.

This will involve and API and ABI change, so will have to wait for 1.1.0

Attachments (3)

patch.2 (29.1 KB ) - added by Richard Boulton 12 years ago.
Updated patch for subclassing Stem
patch (15.4 KB ) - added by Richard Boulton 12 years ago.
Patch which allows subclasses of Stem
ref-counted-subclassables.patch (10.8 KB ) - added by Olly Betts 7 years ago.
Proposed patch (updated)

Download all attachments as: .zip

Change History (42)

comment:1 by Richard Boulton, 13 years ago

Blocking: 160 added

Marking as blocking 1.1.0 so we don't forget.

comment:2 by Richard Boulton, 13 years ago

Status: newassigned

comment:3 by Olly Betts, 13 years ago

I don't think this will work (otherwise we'd already be doing it!)

If Xapian::MatchDecider just subclasses RefCntBase then it can be reference counted internally, but not externally. That just won't work. For example, the user might not allocate the MatchDecider explicitly:

    Xapian::MatchDecider foo;
    enq.set_matchdecider(&foo);

Or it could be allocated using new[].

Even if we restrict the user to using new, then the external pointer won't be reference counted. How do we know when the last pointer to the object has gone when there may be an unknown number of external pointers to it?

So Xapian::MatchDecider would have to be like most other API classes and contain a RefCntPtr<...> member pointing to an internal class. But then the user can't subclass usefully - sub-classing MatchDecider would add member variables to the pointer, not the object.

I agree a way to reference count MatchDecider and other similar classes would be nice, but the RefCntBase approach doesn't seem to work.

Last edited 5 years ago by Olly Betts (previous) (diff)

comment:4 by Olly Betts, 13 years ago

Blocking: 182 added

comment:5 by Olly Betts, 13 years ago

Cc: olly@… added

comment:6 by Richard Boulton, 13 years ago

Hmm - after reading your comments, the only way I can see that it could work would be to require the user to keep all references to the MatchDecider object in the form of RefCntPtr<...> wrappers. This is probably far too restrictive, and error-prone. So, unless anyone has any bright ideas, I suspect we'll have to drop this suggestion (and bug #182, as a result).

Last edited 5 years ago by Olly Betts (previous) (diff)

comment:7 by Richard Boulton, 13 years ago

Owner: changed from Richard Boulton to Not currently assigned
Status: assignednew

I don't know how to do this, so I'm not working on it, so assigning it to nobody.

comment:8 by Richard Boulton, 13 years ago

Just to note - ExpandDeciders should be too, for all the same reasons.

comment:9 by Olly Betts, 12 years ago

Also Sorter and Stopper objects.

by Richard Boulton, 12 years ago

Attachment: patch.2 added

Updated patch for subclassing Stem

by Richard Boulton, 12 years ago

Attachment: patch added

Patch which allows subclasses of Stem

comment:10 by Richard Boulton, 12 years ago

attachments.isobsolete: 01

comment:11 by Olly Betts, 12 years ago

This seems quite a fundamental change, so I'd quite like to have a good look at the patch before we apply it, but I won't be able to do so until next week.

comment:12 by Richard Boulton, 12 years ago

I've made a branch ("stemrefcnt") which contains an updated version of the patch. I've also added a "StemWithNoStemList" class on this branch, which is a stemmer with a list of exceptions which aren't stemmed.

I think the stemmer implementation in xapian-core on this branch is suitable for merging to HEAD. However, stemrefcnt/xapian-bindings needs updating accordingly. This isn't trivial, but I think some of the recent features added to swig may be sufficient to allow it to be wrapped.

comment:13 by Richard Boulton, 12 years ago

attachments.isobsolete: 01
Operating System: All

comment:16 by Richard Boulton, 12 years ago

Description: modified (diff)
Milestone: 1.1

comment:17 by Richard Boulton, 12 years ago

Blocking: 160 removed

comment:18 by Olly Betts, 12 years ago

ValueRangeProcessor is also affected.

comment:19 by Richard Boulton, 12 years ago

On reflection, I think the only way to make MatchDeciders, ValueRangeProcessors, and other user-defined subclasses reference counted is to expose the reference counting mechanism fully. Without this, I don't think there's any hope of making the bindings work correctly with the reference counting.

SWIG has support for reference counting (and other smart pointers) with some of its backends - but I'm not convinced it has support for all the backends which we require.

My favoured long-term solution would be to move xapian (at least, the publically exposed parts of xapian) over to using Boost::shared_ptr, and other smart pointers. This would make it easier to use xapian correctly from other Boost using C++ projects, and avoid us having to maintain our own smart pointer implementations, and make it easier to integrate with the SWIG reference counting support.

However, doing this is probably too major a change to be considered for 1.1, so I suggest postponing until a fictional future 2.0 release.

comment:20 by Olly Betts, 12 years ago

Milestone: 1.1.02.0.0

Yes, I think the boost implementations are probably the way to go, but indeed this is a big incompatible API change, so our release numbering means 2.0.0 is the time for that.

comment:21 by Olly Betts, 11 years ago

Summary: MatchDeciders should be reference countedUser subclassable classes should be reference counted

Just to note that the symptoms of the lack of C++ reference counting for the classes where this is currently visible (which doesn't include MatchDecider) is fixed for Python (#341), and other bindings could probably do something similar.

I'm going to retitle actually.

comment:22 by Olly Betts, 10 years ago

Blocking: 182 removed

Removing "blocking", since removing reciprocal "blocked-by" from #182 didn't seem to do so automatically.

comment:23 by Olly Betts, 7 years ago

Cc: Richard Boulton added; Olly Betts removed
Description: modified (diff)
Milestone: 2.0.01.3.x
Owner: changed from Not currently assigned to Olly Betts

I've had an idea about this - if we provide ref() and unref() methods, we can make this work for the bindings without having to expose how the ref counts are handled internally, and provide a nice simple API to the mechanism required.

And we can also provide a "release()" method (maybe there's a better name?) which allows C++ users to hand ownership of the object to the class it is set on (see the new testcase in the patch I'm about to attach).

The really nice thing is that if you don't call these methods, things work as they do already - i.e. this is API compatible with existing user code (but not ABI compatible). So it's 1.3.x material.

by Olly Betts, 7 years ago

Proposed patch (updated)

comment:24 by Olly Betts, 7 years ago

Cc: Richard Boulton removed

I replaced the patch with one which also removes the current workaround for Perl too.

comment:25 by Olly Betts, 7 years ago

Milestone: 1.3.x1.3.2
Status: newassigned

There are some issues still with the bindings, shown by the python tests.

Having poked it for a while, I think we need to "disown" (in SWIG terms) the wrapped subclassed objects when we pass them to C++, so that the C++ half holds a reference to the Python (or other language) half, and then sort out a few kinks so everything works smoothly with that done.

Marking this for 1.3.2.

comment:26 by Olly Betts, 7 years ago

Milestone: 1.3.21.3.3

Postpone until 1.3.3.

comment:27 by Olly Betts, 6 years ago

We now work around this issue for PHP too (see #614).

comment:28 by Olly Betts, 5 years ago

So some experimental hacking later, I have a hand-modified version which works for Python and VRPs.

To make this work we need to call vrproc.__disown__() just after we call the C++ add_valuerangeprocessor() method, and we need to unref() the C++ vrp object then too (otherwise both the C++ and Python objects will end up with one reference and neither will get deleted), but we need to only do this if the object hasn't already been disowned.

This is going to need a bit of work to get SWIG to generate code which makes this happen, but at least we now have a plan.

Last edited 5 years ago by Olly Betts (previous) (diff)

comment:29 by Olly Betts, 5 years ago

[5d10895593a6e277544640092b0e526d26ecb6be] adds optional reference counting of ValueRangeProcessor objects to the C++ API, and [07d1ffcd10d8bbe6111d2e35a2f99486f0147b08] updates the bindings to ignore the new release() method and suppress a new warning from SWIG due to the C++ change.

Other subclassable functors also need doing, and then the bindings need modifying to make use of this. I don't see how to do that at all cleanly without extending SWIG to support this, so this part isn't feasible for 1.3.3 (at least not without delaying 1.3.3 unreasonably) so updating the milestone.

comment:30 by Olly Betts, 5 years ago

Milestone: 1.3.31.3.4

comment:31 by Olly Betts, 5 years ago

Milestone: 1.3.41.3.5

Ticket retargeted after milestone closed

comment:32 by Olly Betts, 5 years ago

Version: SVN trunkgit master

I think that we should implement this for the other functor classes for C++ before 1.4.0, as that is easy to do.

However we shouldn't try to make use of this in the bindings, as that's more involved (and seems to need changes to SWIG, or more cunning hackery than I've so far managed to come up with). Several of the bindings (Python, PHP, Perl) already have separate tracking code so there will be no observable behaviour change, while for the other languages, this means that cases which failed before now work, so it's reasonable as a change to make mid-1.4.x - the tracking code for Python and PHPwasn't added at the start of release series after all.

comment:33 by Olly Betts, 5 years ago

Actually, it's only some functor classes that this is useful for - ones which just get passed to a method and used by it don't need it, only cases where the functor is stored by the object for later use.

Classes to do:

  • ErrorHandler (? might just get removed/replaced)
  • ExpandDecider (? lingering references exist via ExpandDeciderAnd)
  • LatLongMetric (? is this class just too much generality?)
  • KeyMaker
  • PostingSource (? has clone(), but bindings need to handle specially)
  • Stopper
  • Weight (? not wrappable for bindings it seems)

comment:35 by Olly Betts, 4 years ago

Weight isn't relevant here - the weight object passed to Enquire::set_weighting_scheme() doesn't need to live on after the call (internally that object is cloned, and the clone used further).

So updated list of classes still to deal with:

  • ErrorHandler (? might just get removed/replaced)
  • ExpandDecider (? lingering references exist via ExpandDeciderAnd)
  • LatLongMetric (? is this class just too much generality?)
  • KeyMaker
  • PostingSource (? has clone(), but bindings need to handle specially)
Last edited 4 years ago by Olly Betts (previous) (diff)

comment:36 by Olly Betts, 4 years ago

KeyMaker done in [c90a93c55f5d7f49c8bee47a9b9473a3e46d7c05/git].

Updated list of classes still to deal with:

  • ErrorHandler (#3 suggests keeping this class but changing how it is used - adding ref counting now means we can make that change in 1.4.x)
  • ExpandDecider (? lingering references exist via ExpandDeciderAnd)
  • LatLongMetric (? is this class just too much generality?)
  • PostingSource (? has clone(), but bindings need to handle specially)
Last edited 4 years ago by Olly Betts (previous) (diff)

comment:37 by Olly Betts, 4 years ago

ErrorHandler now is optionally referenced counted in [08f9f7fa2684d8171dc954aa6c92866b3946956f/git]. Nothing in the library yet uses this, but it's there for the future.

Updated list of classes still to deal with:

  • ExpandDecider (? lingering references exist via ExpandDeciderAnd)
  • LatLongMetric (? is this class just too much generality?)
  • PostingSource (? has clone(), but bindings need to handle specially)
Last edited 4 years ago by Olly Betts (previous) (diff)

comment:38 by Olly Betts, 4 years ago

ExpandDecider done in [6a76ceed1546e1f298b23f2a593eb66e7adb8d25/git].

Remaining classes to deal with:

  • LatLongMetric (? is this class just too much generality?)
  • PostingSource (? has clone(), but bindings need to handle specially)

comment:39 by Olly Betts, 4 years ago

LatLongMetric objects are currently cloned by the objects they are set on, so don't need reference counting as things stand.

I'm not entirely sure this cloning is a good idea - Weight objects support clone(), but they have to be cloned for each weighted subquery; PostingSource objects support clone(), but they need to be cloned for each subdatabase. It seems LatLongMetric only supports clone() as a way to avoid holding a pointer/reference to an external LatLongMetric object, and making it optionally reference counted is more consistent and probably makes it a little simpler to implement your own metric.

comment:40 by Olly Betts, 4 years ago

OK, so the reason why LatLongMetric has clone() is so that LatLongPostingSource::clone() can in turn clone the metric. Optionally reference counting the metric would provide the same ability with less object copying and better consistency, though this only happens once per sub database per query run, so really isn't a hot spot.

If we change that, it would make sense to change to passing a pointer instead of a reference. The bindings would also need an updating to keep a reference to the currently set metric.

Overall, I'm inclined to leave this as-is for 1.4.0. I think we may want to review the geospatial API at some future date anyway (and I've opened #712 for that), but more real world use would be useful to inform that.

So that leaves just PostingSource to consider...

comment:41 by Olly Betts, 4 years ago

Resolution: fixed
Status: assignedclosed

PostingSource done in [0660a9a99bcd09ff137e75e4581729a2beb0a5c2/git]. I've opened #714 to cover the work still needed on the bindings, so this ticket can be closed.

Note: See TracTickets for help on using tickets.