Opened 19 years ago

Last modified 11 months ago

#63 assigned enhancement

Improve visibility annotations for the library

Reported by: Olly Betts Owned by: Olly Betts
Priority: low Milestone: 2.0.0
Component: Library API Version: git master
Severity: minor Keywords:
Cc: Richard Boulton Blocked By:
Blocking: Operating System: Linux

Description (last modified by Olly Betts)

See http://gcc.gnu.org/wiki/Visibility

This should make the shared library much smaller, and a little faster!

Change History (28)

comment:1 by Olly Betts, 19 years ago

Status: newassigned

comment:2 by Olly Betts, 18 years ago

Priority: highnormal

comment:3 by Richard Boulton, 18 years ago

Blocking: 118 added

It would be lovely to do this for 1.0. Possibly not important enough to worry about right now, though.

comment:4 by Richard Boulton, 18 years ago

Cc: richard@… added

comment:5 by Olly Betts, 18 years ago

I've put this off previously as visibility annotations are only useful for GCC >= 4.0. However, such versions are now deployed by many linux distros, so I think the time has come for this, and it's not really something we can retrofit into the 1.0 releases. It shouldn't be a lot of work, but it would be good to do it now so any issues have a chance to be shaken out before release.

So I've just made a first pass. The library builds, as do simple utilities like "delve". Things like "quartzcheck" which hook into quartz don't build, and neither does the test harness, so it needs some more work still.

Currently (on x86_64) the stripped libxapian.so is 8% smaller (unstripped is 2.6% smaller), while the number of external symbols drops from 2851 to 508.

Last edited 11 years ago by Olly Betts (previous) (diff)

comment:6 by Olly Betts, 18 years ago

Hmm, what a mess!

The test harness uses a few bits of the library internals, which is a bit unclean but we can expose those for now at least.

Doing that, apitest builds and passes, which is a promising sign.

But quartztest, btreetest, and internaltest don't stand a chance without exposing shedloads of library internals.

Looking at quartztest, much of it can either just be removed or turned into API level tests where such tests don't already exist.

Not sure what to do about the rest. Unit tests are a good idea, so deleting them isn't really a good option (btreetest and quartztest are less important now flint is the default, but we ought to at least consider a flint btree-level test).

We could try to just link against the object files which are actually needed, but then we also need utility functions, debug functions, etc and this seems likely to be a maintenance headache.

Passing "-static" when linking these problematic binaries is a fix which works (though it's not ideal either). This only links libxapian statically at least - libstdc++ and the shared library aren't affected. Also, we only need to do this for GCC >= 4.0. If we go this route, I think we should merge all the affected tests into a single binary (or fewer binaries at least).

Any other ideas?

comment:7 by Richard Boulton, 18 years ago

Possibly not a helpful comment - but don't the utilities in xapian-core/bin/ also need to link to various internals? It would be nice if we could move sufficient amounts of their functionality into the API, so the utilties could be implemented with just the public API (with the side benefit that things like the python bindings could perform a check of a database without needing to exec a sub-process). However, that's not going to happen in the near future.

The -static approach seems the only way forward to me: if the symbols aren't exported by the library, there's no way that utilities are going to be able to link to it, however clever we are. So, the question is whether it's worth doing given that cost.

This isn't a particularly urgent fix, in my mind; though 8% would be a nice size improvement, I've not come across anywhere recently where size is a problem (though if someone wanted to use Xapian in an embedded system it might become one). The attraction of doing this is more to keep the API neat, and reduce the probability of symbol collisions. Perhaps we should defer this to 1.1, and try and move tests into fewer binaries and work on the utilities in the meantime.

comment:8 by Olly Betts, 18 years ago

The utilities in xapian-core/bin do indeed, but the API we need to expose there is fairly limited, so for now I've just marked those as visible (which isn't actually worse than just using them because they happen to be visible if you include the right header!) As you say, longer term it would be better to make those implementable through the API. In particular, being able to merge databases would be handy when writing an indexer.

Given I've got this far, I'm inclined to do a bit more work and commit this, even if it means marking all the classes and functions used by the testsuite as visible (at least some things will be hidden, which would be a step forward).

In the longer term, a coherent approach to allowing unit testing of things like the serialisation functions would be useful though...

comment:9 by Richard Boulton, 18 years ago

As you say, hiding even some bits of the internals is better than nothing at all, and getting the infrastructure in place to remove more when we are able to is also good. I hadn't thought of doing a partial implementation like that.

comment:10 by Olly Betts, 18 years ago

Blocking: 118 removed
Operating System: Linux
Priority: normallow
Summary: Add visibility annotations to the libraryImprove visibility annotations for the library

I've just committed changes to implement this.

For now, all the internals required by stuff in xapian-core/bin and xapian-core/tests is simply marked as visible. We can look at removing the need for some of these dependencies (e.g. by adding API methods to allow databases to be merged or checked) and think about a strategy to allow sane unit testing.

I've also not worried about marking internal members of visible classes as non-visible - the gain from that is likely to be fairly small, so that can wait until later too I think.

So I'm retitling this bug, and removing the blockage on 1.0.

comment:12 by Olly Betts, 16 years ago

Description: modified (diff)
Milestone: 1.1.1

Flag for reassessing during the 1.1.x series.

comment:13 by Olly Betts, 16 years ago

Milestone: 1.1.11.1.4

Triaging milestone:1.1.1 bugs.

comment:14 by Olly Betts, 15 years ago

Milestone: 1.1.41.2.0

I've removed three symbols which didn't need to be visible at all. All the rest seem to be currently used.

We could improve things further by having a "standalonetest" which does not link to libxapian, and move some of the internaltest testcases there.

But there are some obstacles:

  • testsuite.cc currently pulls in backendmanager stuff (which is shouldn't really anyway as this is meant to be a generic test harness).
  • testsuite.cc has a catch for Xapian::Error (again, it shouldn't really). Richard's idea to fix this is a call-back function - we can use the catch (...) and rethrow trick that xapian-bindings uses to rethrow the exception in the callback function and process it there.
  • Some of the internals tested aren't as standalone as perhaps they ought to be - e.g. mixing generic and Xapian-specific code in the same file, throwing Xapian::Error subclasses from otherwise generic code.

Also, if it is just a case of changing a non-API symbol from visible to hidden, that isn't an ABI breaking change, as no user code should be linked to that symbol. So we can address this in 1.2.x without issues.

So I'm going to bump the milestone now.

comment:15 by Olly Betts, 13 years ago

Milestone: 1.2.x1.3.x

BitWriter doesn't need public visibility - fixed in trunk r16265.

Moved tests of serialise_double() and unserialise_double() from internaltest to unittest so they no longer need public visibility in trunk r16266.

comment:16 by Olly Betts, 13 years ago

I've knocked off a few more:

  • decode_length() in r16280
  • file_exists() and dir_exists() in r16288

comment:17 by Olly Betts, 13 years ago

And serialise_document() and unserialise_document() (which are now available via the public API) in r16294.

comment:18 by Olly Betts, 13 years ago

In r16316 I've added an API for xapian-check's functionality and modified xapian-check to use that, allowing removal of XAPIAN_VISIBILITY_DEFAULT from a number of functions which aren't in the public API.

comment:19 by Olly Betts, 11 years ago

More progress on eliminating symbols has been made on trunk (these will all debut in 1.3.2):

A while ago, I reworked the 4 version inspection functions to be inline wrappers around a single function in the shared library which returns a pointer to a structure, which eliminates 3 external symbols.

I've also just deprecated 6 of the database factory functions in favour of flags passed to the Database and WritableDatabase constructors, and these deprecated functions are now implemented as inline wrappers which just call those constructors, so that's another 6 external symbols eliminated.

comment:20 by Olly Betts, 9 years ago

Milestone: 1.3.x1.3.5

[d56fc51f498b474001744cd76e53c9f6beef6264] makes the symbols for the various overloaded forms of str() private to the library.

Marking for 1.3.5 to at least check through the remaining symbols which are currently public but shouldn't really need to be.

comment:21 by Olly Betts, 9 years ago

Looking at where XAPIAN_VISIBILITY_DEFAULT is used outside of `include/xapian/' in xapian-core:

  • api/replication.h: Replication was originally an external API, but we weren't entirely happy with it. Perhaps whatever that was ought to be sorted out and these made external API features. DatabaseMaster and DatabaseReplica need to be visible for xapian-replicate and xapian-replicate-server. ReplicationInfo is entirely defined within the header, so probably doesn't need to be marked with XAPIAN_VISIBILITY_DEFAULT.
  • backends/chert/: ChertCursor and ChertTable used by xapian-inspect. The xapian-inspect tool is really more of a development aid than a user tool, so perhaps it shouldn't be installed, just built in-tree and statically linked to the bits of the backend it needs.
  • common/stringutils.h: Tables for C locale versions of isalpha(), etc, tolower() and toupper(). Perhaps these functions should be available in the API.
  • net/: RemoteServer, RemoteTcpServer: Used by xapian-progsrv and xapian-tcpsrv. Perhaps ought to be available via the API to allow other forms of remote server.
  • net/: ReplicateTcpClient, ReplicateTcpServer: Used by xapian-replicate and xapian-replicate-server. It seems silly we need these as well as the (originally public) API above - I guess it's for the TCP support code.
  • net/serialise.h: serialise_error() and unserialise_error() - used in testcases in tests/internaltest.cc. Perhaps could be moved to tests/unittest.cc to avoid needing these outside the library.
  • net/tcpserver.h: TcpServer, referenced in bin/xapian-tcpsrv.cc as a hook for setting a Registry (for adding custom weighting schemes, etc) but this is actually the wrong class! (EDIT: Also used via subclasses in bin/xapian-tcpsrv.cc and bin/xapian-replicate-server.cc)
Last edited 9 years ago by Olly Betts (previous) (diff)

comment:23 by Olly Betts, 9 years ago

Moved the tests of serialise_error() and unserialise_error() to tests/unittest.cc and made these symbols hidden in [63208048574983216aac07a101e04a609b4f0848].

Updated list of where XAPIAN_VISIBILITY_DEFAULT is used outside of `include/xapian/' in xapian-core:

  • api/replication.h: Replication was originally an external API, but we weren't entirely happy with it. Perhaps whatever that was ought to be sorted out and these made external API features. DatabaseMaster and DatabaseReplica need to be visible for xapian-replicate and xapian-replicate-server.
  • backends/chert/: ChertCursor and ChertTable used by xapian-inspect. The xapian-inspect tool is really more of a development aid than a user tool, so perhaps it shouldn't be installed, just built in-tree and statically linked to the bits of the backend it needs.
  • common/stringutils.h: Tables for C locale versions of isalpha(), etc, tolower() and toupper(). Perhaps these functions should be available in the API.
  • net/: RemoteServer, RemoteTcpServer: Used by xapian-progsrv and xapian-tcpsrv. Perhaps ought to be available via the API to allow other forms of remote server.
  • net/: ReplicateTcpClient, ReplicateTcpServer: Used by xapian-replicate and xapian-replicate-server. It seems silly we need these as well as the (originally public) API above - I guess it's for the TCP support code.
  • net/tcpserver.h: TcpServer used via subclasses in bin/xapian-tcpsrv.cc and bin/xapian-replicate-server.cc.

comment:24 by Olly Betts, 9 years ago

In [95a6ea015c939109cce30a4ee328b39f3a0c5216] I have folded the tables for isalpha() etc into one table, and put it into the constinfo structure. There's not a public API using this yet, but one could be added at any point.

comment:25 by Olly Betts, 9 years ago

Milestone: 1.3.51.4.x

And [4b3e73c87dac176437b985faebc50ebfecad566a] cleans up ChertCursor and ChertTable.

Updated list of where XAPIAN_VISIBILITY_DEFAULT is used outside of `include/xapian/' in xapian-core:

  • api/replication.h: Replication was originally an external API, but we weren't entirely happy with it. Perhaps whatever that was ought to be sorted out and these made external API features. DatabaseMaster and DatabaseReplica need to be visible for xapian-replicate and xapian-replicate-server. There was some discussion of the idea that replication should be split into creating changesets, transporting changesets and applying changesets, so you could transport the changeset files by other means (replication by sneakernet, etc), but I think that's out of scope for 1.3.x at this point.
  • net/: RemoteServer, RemoteTcpServer: Used by xapian-progsrv and xapian-tcpsrv. Perhaps ought to be available via the API to allow other forms of remote server.
  • net/: ReplicateTcpClient, ReplicateTcpServer: Used by xapian-replicate and xapian-replicate-server. It seems silly we need these as well as the (originally public) API above - I guess it's for the TCP support code.
  • net/tcpserver.h: TcpServer used via subclasses in bin/xapian-tcpsrv.cc and bin/xapian-replicate-server.cc.

I think we need to punt on the rest for this development cycle. The only quick change I can see is making them public APIs, which wouldn't actually reduce the symbol count.

comment:26 by Olly Betts, 5 years ago

Version: SVN trunkgit master

The current status is unchanged from the comment above.

comment:27 by Olly Betts, 5 years ago

Milestone: 1.4.x1.5.0

[c46ddd7eaa2c8bc3730e169255f87f97f4ce6438] moves RemoteTcpServer from libxapian to xapian-tcpsrv, since it's only used in the latter.

Updated list of where XAPIAN_VISIBILITY_DEFAULT is used outside of include/xapian/ in xapian-core:

  • api/replication.h: Replication was originally an external API, but we weren't entirely happy with it. Perhaps whatever that was ought to be sorted out and these made external API features. DatabaseMaster and DatabaseReplica need to be visible for xapian-replicate and xapian-replicate-server. There was some discussion of the idea that replication should be split into creating changesets, transporting changesets and applying changesets, so you could transport the changeset files by other means (replication by sneakernet, etc), but I think that's out of scope for 1.3.x at this point.
  • net/: RemoteServer: Used by xapian-progsrv and xapian-tcpsrv. Perhaps ought to be available via the API to allow other forms of remote server.
  • net/: ReplicateTcpClient, ReplicateTcpServer: Used by xapian-replicate and xapian-replicate-server. It seems silly we need these as well as the (originally public) API above - I guess it's for the TCP support code.
  • net/tcpserver.h: TcpServer used via subclasses in bin/xapian-tcpsrv.cc and bin/xapian-replicate-server.cc.

This isn't 1.4 material at this point.

Last edited 5 years ago by Olly Betts (previous) (diff)

comment:28 by Olly Betts, 18 months ago

We can't just move ReplicateTcpClient and link it only into xapian-replicate because it uses RemoteConnection so we'd need to expose that in the library ABI instead. Similarly for ReplicateTcpServer.

I tried this and then had to move various other things around and add ABI visibility to TcpClient as well as RemoteConnection.

In exchange, I removed ABI visibility from ReplicateTcpClient, ReplicateTcpServer and TcpServer.

I need to check if this is actually a win in terms of symbols in the ABI. TcpClient is just a single method in a namespace (so one symbol) but RemoteConnection has quite a lot of methods.

Maybe more refactoring could establish a cleaner split point.

comment:29 by Olly Betts, 11 months ago

Milestone: 1.5.02.0.0
Note: See TracTickets for help on using tickets.