Opened 8 years ago

Closed 19 months ago

#707 closed defect (fixed)

Problems on OS X El Capitan

Reported by: James Aylett Owned by: James Aylett
Priority: normal Milestone: 1.4.6
Component: Xapian-bindings Version:
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: Mac OS X

Description

This is a tracking ticket for WIP to resolve this as best we can. El Capitan introduces System Integrity Protection, which (amongst other things) prevents us from setting DYLD_LIBRARY_PATH for any 'protected' binary (in this case typically things in /bin and /usr/bin). We use this DYLD feature to build and test bindings against uninstalled builds of xapian-core.

There are a number distinct results, with different mitigations:

  1. You cannot test against system version of languages (perl, tcl8, php, ruby, python2) with an uninstalled xapian-core, period. This will need noting somewhere, probably in the HACKING / developer guide docs.

Ideally we'd have a buildbot for the stable tarball which builds, tests & installs xapian-core, then builds & tests xapian-bindings using system languages. Right now we don't have that (and it's difficult to get right, because in practice you have to install a couple of extra libraries, which you typically do via homebrew).

  1. We need to adjust the build system so that you can run tests with non-system languages (eg homebrew), because make will typically invoke /bin/sh for any line that sets an environment variable before running something. Typically this is done in a run-<language>-test script. This is generally only going to affect people building from a git checkout.
  1. The Python bindings go a little further, because they (a) import their own extension while building documentation, and (b) import the built .so and then build pyo/pyc files. ie to even build them against an uninstalled xapian-core they need changes. Again, this is only going to work with non-system python (and so should probably be fixed at some point, although only for python2 as OS X 10.11 doesn't ship with python3). (You should still be able to install xapian-core, then build and install the python2 bindings, using the system python2.)
  1. Tests for the Perl bindings use prove, which you have to manually pick up as well as the perl binary itself. Homebrew doesn't link perl into /usr/local because it's a core piece of the system that could upset all sorts of other things, so you have to do something like:
configure PERL=/usr/local/Cellar/perl/5.22.1/bin/perl \
  PROVE=/usr/local/Cellar/perl/5.22.1/bin/prove

As yet untested: Java. There's no system Java as of OS X 10.11, and hopefully Oracle's JDK installs in a sensible place that doesn't get hit by SIP, but it's possible that the tests will have to be run slightly differently to work.

  1. Omega's omegatest has an optional test which uses faketime; if this is installed via homebrew, the test fails. Otherwise it is skipped.

Attachments (1)

bindings.patch (9.6 KB ) - added by James Aylett 8 years ago.
Build and test against non-system languages on OS X; untested for breaking system or non-OS X

Download all attachments as: .zip

Change History (9)

by James Aylett, 8 years ago

Attachment: bindings.patch added

Build and test against non-system languages on OS X; untested for breaking system or non-OS X

comment:1 by Olly Betts, 8 years ago

[2f535671ef3d74bf74abb945f1170ef973769110/git] should address point 5.

I wonder if setting DYLD_LIBRARY_PATH everywhere could be problematic. Making that all conditional is going to be rather ugly though.

I'd suggest we exec at the end of the run-XXX-test scripts, which should at least avoid an extra fork().

We could avoid using prove by making run-perl-test a perl script using App::Prove. The actually code content of the prove script is just a handful of lines which create an App::Prove object and use it, which neatly avoids the whole minefield of finding the prove which matches the perl we're using.

I also think at some point it may be sanest to stop trying to work around this and just document what doesn't work on OS X, and point out that this is Apple's fault not ours.

comment:2 by James Aylett, 8 years ago

We don't set DYLD_LIBRARY_PATH indiscriminately; it's controlled by NEED_INTREE_DYLD, which is only set on darwin*. However under some circumstances under the patch we'll _clear_ DYLD_LIBRARY_PATH on other platforms, although I don't believe that will cause any problems. The SET_DYLD_LIBRARY_PATH_IF_NEEDED approach doesn't have this problem. We could perhaps let that variable (which expands to DYLD_LIBRARY_PATH=… on darwin*) be exported down to shell invocations, and there do export $SET_DYLD_LIBRARY_PATH_IF_NEEDED, which I believe works although may be unportable, and is certainly unreadable.

+1 on using exec to roll up the ends of shell scripts.

I'm not sure whether re-implementing prove is wise…is it likely to change? If not it's moderately safe, at least. But I think we can just document this somewhere; it only really applies to people who are developing Xapian anyway.

I'm certainly in favour of drawing a boundary to what we're going to bother doing. I think if you can install core, and build & test bindings against system languages; and if you can build & test bindings against non-system languages with an uninstalled core (eg from a bootstrapped repo) then that's all we need, and we just document the rest. The former should still work. The latter it's about ensuring that we don't directly set DYLD_LIBRARY_PATH in make lines that call through /bin/sh.

comment:3 by James Aylett, 8 years ago

Ah, export $SET_DYLD_LIBRARY_PATH_IF_NEEDED is unportable, because export NAME=VALUE is a bash extension. And we can't do $SET_DYLD_LIBRARY_PATH_IF_NEEDED on its own portable either. So I think the current approach, but with more exec, is probably the best we can manage.

in reply to:  1 comment:4 by James Aylett, 8 years ago

Replying to olly:

[2f535671ef3d74bf74abb945f1170ef973769110/git] should address point 5.

Seems to, thanks!

comment:5 by James Aylett, 8 years ago

Rather than continue to update a patch, this is now on a github branch: https://github.com/jaylett/xapian/tree/osx-el-capitan .

comment:6 by Olly Betts, 8 years ago

FreeBSD also seems to use DYLD_LIBRARY_PATH (https://www.freebsd.org/cgi/man.cgi?query=dyld&apropos=0&sektion=0&manpath=Darwin+8.0.1%2Fppc&format=html) so setting it to an empty value is potentially problematic.

Can we not just make the code to set and export DYLD_LIBRARY_PATH conditionally - something like:

if [ -n "INTREE_DYLD_PATH" ] ; then
  DYLD_LIBRARY_PATH=$INTREE_DYLD_PATH
  export DYLD_LIBRARY_PATH
fi

App::Prove is documented as "Implements the "prove" command" and "The "prove" command is a minimal wrapper around an instance of this module", so it seems a safe approach.

export foo=bar is actually specified by POSIX, but it's not portable to older shells - the autoconf manual notes:

Posix requires 'export' to honor assignments made as arguments, but older shells do not support this, including '/bin/sh' in Solaris 10.

And Solaris 10 is apparently still in support for another 5 years.

comment:7 by Olly Betts, 6 years ago

I think the patch on your github branch was since addressed (in a similar but slightly different way) by [4c60bf7fc0d8c90d2d65f5bc96acefcf5998cea9].

And I've just pushed [b0f8bfb073fe71dda699304312bb71b0fa0b3852] which runs prove via App::Prove so specifying PROVE is no longer necessary.

There's still stuff unaddressed here I think though - going through the list in the original report:

Point 1 - we don't seem to have documented that anywhere yet. However, it seems #732 may offer a way to make this work for system interpreters.

Point 2 (ensuring DYLD_PATH is specified so that at least non-system interpreters work) should be addressed by [4c60bf7fc0d8c90d2d65f5bc96acefcf5998cea9].

Point 3 (python generation of .pyc and .pyo) is I think also address by [4c60bf7fc0d8c90d2d65f5bc96acefcf5998cea9].

Point 4 (perl and prove) should be addressed by [b0f8bfb073fe71dda699304312bb71b0fa0b3852].

I think we don't know about Java yet.

Point 5 (faketime in omegatest) should be addressed by [2f535671ef3d74bf74abb945f1170ef973769110], though it might be better to test it on omega rather than on the date command. But I think it'll fail on the date command because that's protected by SIP, while it'll fail on the in-tree omega because that's actually a libtool shell-script wrapper, and gets SIP protection thanks to the shell being invoked to run it. I'm not sure I see a nice way to get this to actually work (best way to make it work I can see is to link a second version of omega with -no-install so there's no shell wrapper), but disabling tests that need faketime isn't a big deal really - there aren't many and the get exercised on other platforms. It's not nice that they get disabled for kind of the wrong reason though. If you installed coreutils from source in a non-SIP directory and put that on your path, then faketime date could work while faketime ./omega would still fail.

So if you get a chance it would be good to:

  • Test that the patches applied address the points I think they do.
  • See what the situation is with Java.

Beyond that:

  • We ought to document whatever cases remain that don't work, but perhaps that's better done once we've looked at #732 (at least if we can do that fairly soon).
  • Perhaps adjust faketime handling. The downside of testing if faketime works by running omega under it is that if omega is buggy these tests could be skipped because we think faketime doesn't work. I think I'm OK with this as it is overall, unless someone has a better plan.

comment:8 by Olly Betts, 19 months ago

Milestone: 1.4.x1.4.6
Resolution: fixed
Status: newclosed

The last four action points are:

Test that the patches applied address the points I think they do.

The relevant changes seem to have landed in 1.3.5, 1.4.2 and 1.4.6.

It has been more than 5 years so I think we can reasonably assume the patches are working, and any issues will have been raised.

See what the situation is with Java.

CI is testing system Java on macOS successfully.

We ought to document whatever cases remain that don't work, but perhaps that's better done once we've looked at #732 (at least if we can do that fairly soon).

I've moved point 1 to #732 as an alternative possible resolution for that ticket.

Perhaps adjust faketime handling. The downside of testing if faketime works by running omega under it is that if omega is buggy these tests could be skipped because we think faketime doesn't work. I think I'm OK with this as it is overall, unless someone has a better plan.

The current situation isn't perfect, but it tends to err towards disabling faketime unnecessarily, and that's the better way to err. Also the code which requires faketime to test is also tested on Linux so we have reasonable coverage anyway.

So I think we can close this ticket now (but if you (or anyone) find unfixed issues with SIP please open a new ticket for them).

Note: See TracTickets for help on using tickets.