Opened 6 months ago
Last modified 6 months ago
#832 new task
Compiling xapian with meson. Strange error in tests.
Reported by: | mgautier | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Build system | Version: | |
Severity: | normal | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Operating System: | All |
Description (last modified by )
Hi,
I am currently trying to compile xapian with meson build system. Once done, I plan to either provide a patch here (if you are interested) or publish the build system on https://mesonbuild.com/Wrapdb-projects.html
You can find the current status of xapian with meson here : https://github.com/openzim/xapian-meson
Here few status information (for context):
- We are interested on glass backend only so I concentrate on this only for now.
- Works only creating static library. Got undefined reference else (both linux and windows)
- Some changes are independents of meson and you probably want them anyway (you may find them on the github repository)
But for my issue, even if it seems to work pretty nicely on Linux (static lib) but on Windows, tests are broken.
Running test_api, I have some strange behavior. Some tests (qpmemoryleak1 and newfreelistblock1) fails sometime (with FAIL: DatabaseError: Commit failed (Permission denied)
) and sometime not. (with multi_glass or glass backend).
On top of that, the number of skip tests are not constant.
On libzim side (https://github.com/openzim/libzim) (which use xapian build on windows). Our test (xapian creation) always crashes (segfault) when compacting the backend database with flag Xapian::Compactor::FULLER
. It never fails without this flag, whatever xapian unittest are failing or not.
I am a bit stuck here. I hope that some of you would be able to help me.
---
For more context, we are already building xapian using autotools on windows on appveyor CI (https://github.com/kiwix/kiwix-build/blob/main/appveyor/build_xapian.sh). But we are in the process of homogenize our build system and I never succeed to build correctly again xapian using autotools locally. (spaces in path are a nightmare)
Thanks
Change History (8)
comment:1 by , 6 months ago
comment:2 by , 6 months ago
Works only creating static library. Got undefined reference else (both linux and windows)
It struck me that this is rather suspicious - there shouldn't be any undefined references (and there aren't with the current build system) so this suggests to me that you aren't putting the correct set of files into the library (or you're misbuilding them in some way that means symbols are missing, perhaps due to symbol visibility).
I'd suggest working on solving this as it may be the cause of the other problems and should be easier to debug (first compare the list of sources in the library with the current build system and yours; if the same, check the compilation options used match).
Also even if this is not the root of the problem, a shared library build would be a prerequisite for switching build systems so it needs resolving.
it would need to be targetted at the release series after next
BTW I should probably note that I'm expecting the release series after next to follow much sooner than 1.4.0 to now (or 1.2.0 to 1.4.0). The key thing to do for it is to complete the relicensing of libxapian.
comment:3 by , 6 months ago
Description: | modified (diff) |
---|
I'd been wondering about looking at switching to meson at some point so I'm open to the idea, but at this point in the release cycle I think it would need to be targetted at the release series after next
Sure, I was more looking for help as I'm a bit stuck here. My work is not yet ready for integration now.
It struck me that this is rather suspicious - there shouldn't be any undefined references (and there aren't with the current build system) so this suggests to me that you aren't putting the correct set of files into the library (or you're misbuilding them in some way that means symbols are missing, perhaps due to symbol visibility).
Thanks, I will search on this direction.
Aside of meson, I think the two commits https://github.com/openzim/xapian-meson/commit/4b302ec704f7b80860a0cf62b5c446341ace8e8b and https://github.com/openzim/xapian-meson/commit/88a068e42ec655853e1fb072295a8d9722c3b64b are commits you want to integrate. Can you directly apply them or you prefer I do separated PR(s) ?
comment:4 by , 6 months ago
Description: | modified (diff) |
---|
comment:5 by , 6 months ago
It struck me that this is rather suspicious - there shouldn't be any undefined references (and there aren't with the current build system) so this suggests to me that you aren't putting the correct set of files into the library (or you're misbuilding them in some way that means symbols are missing, perhaps due to symbol visibility).
Thanks, I will search on this direction
Well, it seems I was wrong with undefined references (and I would prefer being right as it doesn't seems simpler).
Dll need us to mark public (export) symbol with __declspec(dllexport)
but you never seems to do it. You have the XAPIAN_VISIBILITY_DEFAULT
defined in visibility.h
but it is only used for gcc whit symbol visibility support.
With autotools on windows it seems that libtool does the job for us in m4/libtool.m4
around line 4010. It seems to create a def file we can pass to compiler and avoid the __declspec
.
But meson doesn't do it for us. Strangely, I had the same issue on libzim and open a issue on meson at the time : https://github.com/mesonbuild/meson/issues/7838
Anyway, I have patched visibility.h
to define XAPIAN_VISIBILITY_DEFAULT
to __deplcspec(dllexport)
on msvc. But new problems coming. I have a lot of warning because (private) data member classes are not exported (which seems to be only a warning and we can ignore it https://stackoverflow.com/questions/2132747/warning-c4251-when-building-a-dll-that-exports-a-class-containing-an-atlcstrin#comment66255284_4563701)
But the real problem is that cl complains that intrusive_ptr (~intrusive_ptr
) uses undefined type Xapian::Compactor::Internal
. This is strange as we actually use ~intrusive_ptr
in Xapian::~Compactor
in compactor.cc
when Internal
is actually defined, so it should be ok (at least it is with gcc).
But cl complain as we use ~intrusive_ptr
in compactor.h
line 185. I suspect some strange rule about symbol definition in dll but I haven't found documentation confirming this.
I don't know why it is ok with libtool build. Maybe using a .def
file do not trigger the same things.
Anyway, using delete on incomplete type is UB (https://en.cppreference.com/w/cpp/language/delete) so I hope the windows&libtool version correctly call destructor from compactor.cc
and not silently generate UB.
---
Sorry if I spam you. I think I will use this issue to share my progress. Hope it is ok for you, I will find somewhere else if not.
comment:6 by , 6 months ago
Aside of meson, I think the two commits https://github.com/openzim/xapian-meson/commit/4b302ec704f7b80860a0cf62b5c446341ace8e8b
This isn't the right fix. The condition path && remote
is not true for any of the current test harness backends, so your patch effectively just disables these two testcases. The key point here is that remote
in the condition means that the test harness managed backend is a remote one, not that the remote backend is enabled in the library. We actually want to run these testcases only for non-remote test harness backends, for which we generate a database and put its path in a stub file to access using xapian-progsrv
, then try to check the database (which isn't implemented but we want to at least check it fails with an appropriate exception). Probably the best way to address potentially not having the remote backend is to check for FeatureUnavailableError
being thrown instead if XAPIAN_HAS_REMOTE_BACKEND
is not defined. I've pushed a change to do that, which should fix this for you.
Also these two testcases are pretty much the same testcase so one should just be removed. I've done that too.
and https://github.com/openzim/xapian-meson/commit/88a068e42ec655853e1fb072295a8d9722c3b64b are commits you want to integrate.
This one looks good (aside from whitespace) - applied.
Can you directly apply them or you prefer I do separated PR(s) ?
If you have further fixes PRs are definitely preferred (and against master unless the fix is only relevant for 1.4.x like the second one here). You seem to have imported just xapian-core into a fresh git repo, so I can't just cherry-pick commits across to the xapian repo, and generating and applying patches is fiddly because you seem to have generated files committed and be using different paths. PRs also get CI tested automatically.
comment:7 by , 6 months ago
Yes, libtool has a nice feature that avoids the need for all that tedious annotating of headers with __declspec
nonsense with MSVC - see the "Windows DLLs" section of the libtool manual:
https://www.gnu.org/software/libtool/manual/html_node/Windows-DLLs.html
If meson would require us to add and maintain all that I'm much less keen on it. I have better things to do with my time.
I'll take a look at the destructor issue, but not today.
comment:8 by , 6 months ago
The destructor of Compactor
looks OK to me - as you say the destructor is defined within the library where Compactor::Internal
is visible, so I don't see any undefined behaviour.
Anyway, I have patched
visibility.h
to defineXAPIAN_VISIBILITY_DEFAULT
to__deplcspec(dllexport)
on msvc.
I've seen suggestions to use the same set of macros for both like this out on the web, but I don't think the semantics of GCC's symbol visibility and MSVC's __declspec
for DLLs are identical so I'm dubious it'll just work.
I'd been wondering about looking at switching to meson at some point so I'm open to the idea, but at this point in the release cycle I think it would need to be targetted at the release series after next. We're getting close to starting the next release series (down to a handful of open tickets) and it's already been much too long since we started 1.4.x. I would still consider merging additional changes at this point, but changing the entire build system seems too much (and there isn't a working patch yet).
I really can't think of a reason for the failures you're seeing, but I'm afraid I just don't have the spare cycles to investigate at the moment.
I don't think I've seen any of those problems with the current build system. The only currently sometimes flaky tests are some of the timed ones (which you can set
export AUTOMATED_TESTING=1
to skip). We have sometimes hit platform-specific flakiness, but I think we now have fixed or worked around all known cases of that.We also have builds in our CI: