Opened 17 years ago

Closed 17 years ago

Last modified 16 years ago

#115 closed defect (released)

msvc failing some tests

Reported by: Mark Hammond Owned by: Charlie Hull
Priority: normal Milestone: 1.0.0
Component: MSVC makefiles Version: SVN trunk
Severity: normal Keywords:
Cc: Olly Betts, Richard Boulton, Sidnei da Silva, Mark Hammond Blocked By:
Blocking: #118 Operating System: All

Description (last modified by Richard Boulton)

I'm attaching a patch with the outstanding changes to the MSVC makefiles, necessary to get things building on Windows with the new remote/net port.

The patch includes the following:

  • win32_net.mak, a new makefile to allow the 'xapian-core/net' directory to

build on windows.

  • config.mak now attempts to use a debug Python library if xapian is built in

debug mode. Further, add support for building from a Python source-tree where 2 INCLUDE directories are necessary (the fact one does not exist for binary users should be fine).

  • win32_bindings_python.mak includes support for the 2 Python INCLUDE

directories, as well as adds the 'net' and 'remote' libraries.

Please note the patch has Windows line endings.

*

Attachments (4)

win32msvc.patch (3.8 KB ) - added by Mark Hammond 17 years ago.
as described
tests.log (23.3 KB ) - added by Charlie Hull 17 years ago.
Log output of 'nmake check' from Xapian HEAD
patch9.patch (313 bytes ) - added by Charlie Hull 17 years ago.
Patch to copy zlib.dll before running tests
tests.2.log (23.8 KB ) - added by Charlie Hull 17 years ago.
Log of make check on CH's computer

Download all attachments as: .zip

Change History (34)

by Mark Hammond, 17 years ago

Attachment: win32msvc.patch added

as described

comment:1 by Mark Hammond, 17 years ago

Cc: sidnei@… added

comment:2 by Charlie Hull, 17 years ago

I've now applied this patch to the latest SVN-HEAD and unfortunately it fails here:

C:\work\xapian-SVN\xapian-core\tests>apitest -b remotetcp Running anydb tests with remotetcp backend... Running test: zerodocid1... At this point both apitest.exe and xapian-tcpsrv.exe die with the usual '..has encountered a problem and needs to close'.

Is there anything else I need to set up to make the tests work?

comment:3 by Mark Hammond, 17 years ago

The crash seems to have been resolved - apitest now passes for me - stemtest fails, but that is a different story...

comment:4 by Charlie Hull, 17 years ago

Nope, still crashes for me. Mark's original patches are now applied to trunk.

comment:5 by Richard Boulton, 17 years ago

Blocking: 118 added

Charlie - is this still crashing for you? I've set this bug to block the tracker bug for releasing version 1.0, so we can be sure this works before we release - close the bug when it's resolved.

comment:6 by Charlie Hull, 17 years ago

Yep, still crashing for me. Haven't worked out a decent way to find out why though.

comment:7 by Olly Betts, 17 years ago

Cc: olly@… added
Status: newassigned

Is this a problem in the MSVC makefiles or the C++ code or unknown? The tests built with mingw all passed when I tried a few days ago FWIW.

comment:8 by Richard Boulton, 17 years ago

Cc: richard@… added

comment:9 by Olly Betts, 17 years ago

Summary: updated msvc makefilesmsvc failing some tests

The build (as seen on buildbot) wasn't working at all, but I've fixed up the @PERL@ substituting rules and got stemlangs1 passing. Lots of problems with remote and remotetcp tests, quite likely with the same cause though.

comment:10 by Olly Betts, 17 years ago

attachments.isobsolete: 01

(From update of attachment 51) Marking patch obsolete as it has been applied.

comment:11 by Olly Betts, 17 years ago

Any progress on this? I asked buildbot, but it's failing because of not having zlib installed at the moment.

Strictly speaking, I don't think this is a blocker for 1.0. The MSVC makefiles use static linking, so ABI breakage isn't an issue, and fixing this shouldn't require API changes. And since remote backend support for MSVC is a new feature this isn't a regression from a previous version.

It would definitely be better to release 1.0 with fully working MSVC remote backend support, but I don't think I'd want to hold up the release for it when it can easily be done in 1.0.1 or similar. I'd be amazed if we don't need a few "mop up" releases to fix issues found in 1.0.0 - so much has changed since 0.9.10.

comment:12 by Mark Hammond, 17 years ago

Charlie - are you using VS 2005? Sidnei found an issue with that compiler regarding the use of _osf_handle (sp?). I'm not in a position to test that compiler at the moment though.

comment:13 by Charlie Hull, 17 years ago

Nope, I'm using Visual C++ Express Edition for all my Xapian compilation now.

comment:14 by Sidnei da Silva, 17 years ago

Visual C++ Express Edition 2005 is what I'm using, so I believe it's the same?

comment:15 by Charlie Hull, 17 years ago

OK, here are the details of the tests that are failing for me; if there's something different happening at your end (Sidnei/Mark) then we should investigate. The log is from running nmake check on a HEAD revision by the way, and during the last line of the tests I get xapian-tcpsrv.exe and apitest.exe causin segfaults.

By the way, have I missed something or do we need to mod the tests makefile to copy in zlib.dll? This isn't a major change, I've attached a patch.

by Charlie Hull, 17 years ago

Attachment: tests.log added

Log output of 'nmake check' from Xapian HEAD

by Charlie Hull, 17 years ago

Attachment: patch9.patch added

Patch to copy zlib.dll before running tests

comment:16 by Olly Betts, 17 years ago

attachments.isobsolete: 01

(From update of attachment 71) Applied patch (though Mark was suggesting that zlib should just be linked into the xapian library, which probably makes more sense overall...)

comment:17 by Sidnei da Silva, 17 years ago

Yup, that's exactly the same issue I'm seeing here.

I've traced it down to, on xapian-tcpsrv.exe, RemoteConnection::send_message at:

HANDLE hout = (HANDLE)_get_osfhandle(fdout);

And for apitest.exe, it happens in RemoteConnection::read_at_least at:

HANDLE hin = (HANDLE)_get_osfhandle(fdin);

comment:18 by Olly Betts, 17 years ago

And Mark has worked out what the problem is with that line.

In the 2005 compiler, MS changed methods which receive invalid parameters to throw a structured exception rather than return an error code, but the WIN32-specific code in the remote backend relies on using such an error code to know if it has a file descriptor or a HANDLE. I think he's going to add an exception handler to work around this issue.

Here's a URL which discusses this change a bit:

http://boinc.berkeley.edu/app_debug_win.php#Common0xc000000d

comment:19 by Richard Boulton, 17 years ago

Cc: mhammond@… added

I've applied a patch to remoteconnection.cc which I hoped would fix this, but it fails to compile on windows. The error log is at:

http://buildbot.enfoldsystems.com/all/xapian-msvc/builds/130/step-compile/0

The relevant MSDN documentation pages are:

http://msdn2.microsoft.com/en-us/library/ks2530z6(vs.80).aspx and http://msdn2.microsoft.com/en-us/library/a9yf33zb(VS.80).aspx

I can't see why the build is failing, so I'll have to ask someone with direct access to a windows platform to take a look.

comment:20 by Olly Betts, 17 years ago

I believe this marvellous feature is new in MSVC 2005, so perhaps the buildbot is too old. The compile errors suggest that we've just not seen prototypes for the handler stuff. Hmm, except that if this were true, why is the buildbot failing?

Ah, a bit of googling turns up this:

http://mail.python.org/pipermail/python-checkins/2006-July/054456.html

So it looks like we actually need <crtdbg.h> not <stdlib.h> and to check the undocumented macro STDC_SECURE_LIB. I'll commit a suitable patch and prod the buildbot.

comment:21 by Olly Betts, 17 years ago

OK, <crtdbg> is for _CrtSetReportMode() (which it seems we also need). The problem with not finding _invalid_parameter_handler, etc was that <stdlib.h> was in "#ifndef WIN32" not "#ifdef WIN32"!

So buildbot now builds again, but the tests are still failing in much the same way...

comment:22 by Richard Boulton, 17 years ago

The buildbot test results look quite odd to me: only 8 tests failed, but lots display error messages of the form:

Running test: poslist3....flint\dbw\position.DB - The process cannot access the file because it is being used by another process. .flint\dbw\postlist.DB - The process cannot access the file because it is being used by another process.

ok

but then pass. Perhaps these errors are being displayed by the sub-process, but I'm not sure how the tests still manage to pass.

Of the 8 tests which failed, 5 are expandweights1, which is reporting failures with TEST_EQUAL_DOUBLE for values which are displayed as the same - Olly did some fixes in this direction recently; I'm guessing that the fixes don't work on windows.

So, the output relating to the 3 remaining test failures is:

Running test: longpositionlist1....flint\dbw\position.DB - The process cannot access the file because it is being used by another process. .flint\dbw\postlist.DB - The process cannot access the file because it is being used by another process.

NetworkErrorNetworkError exception: REMOTE:read failed; context was: .flint/dbw

(context:remote:prog(..\win32\Release\xapian-progsrv -t300000 --writable .flint/dbw)

Running test: postlist1... NetworkErrorNetworkError exception: REMOTE:read failed; context was: .flint/db=apitest_simpledata (context:remote:prog(..\win32\Release\xapian-progsrv -t300000 .flint/db=apitest_simpledata)

Running test: keepalive1... Expecting exception NetworkError .\api_db.cc:564: Expected "Xapian::NetworkError"

FAILED

comment:23 by Olly Betts, 17 years ago

Should this bug be assigned to Charlie still? He doesn't seem to be working on it very actively at all...

I hadn't spotted that "The process cannot access the file because it is being used by another process" wasn't a test failing. A quick poke around the web finds lots of people encountering the error (in various contexts) but not a lot in the way of helpful answers. But the top hit makes me wonder if this is an error written out by the CRT when you try to bind to a port already in use, so we're getting it when BackendManager fails to open a server on a particular port because the old server hasn't quite released it yet, so it tries the next port up until it finds one not in use. In which case it's annoying but essentially harmless, as this code is only used by the testsuite.

Incidentally, I think I've found the least helpful piece of documentation ever:

http://www.microsoft.com/technet/prodtechnol/windows2000serv/reskit/w2000Msgs/3729.mspx?mfr=true

expandweights1 is a misfire I think - the displayed weights are identical, so it's just down to rounding. We were testing against DBL_EPSILON which is too exacting a requirement after multiple FP calculations. I tried to make it test such that it would only complain if the displayed numbers would be different, but it looks like I goofed.

I looked at keepalive1 a few weeks ago, and it looks like it's the idle timeout failing to fire (keepalive checks both that a keepalive results in no timeout, and that no keepalive results in a timeout). Someone with access to MSVC really needs to investigate that, but it's probably a bug in the Windows specific code in net/remoteconnection.cc.

I don't know about longpositionlist1 or postlist1. Perhaps it's a problem with sending a lot of data?

comment:24 by Olly Betts, 17 years ago

Resolution: fixed
Status: assignedclosed

Mark tracked down the remaining two problems, and reports that all tests now pass (the buildbot is sulking though).

comment:25 by Richard Boulton, 17 years ago

Resolution: fixed
Status: closedreopened

Charlie reports that it's still not working for him. He and I are investigating.

by Charlie Hull, 17 years ago

Attachment: tests.2.log added

Log of make check on CH's computer

comment:26 by Olly Betts, 17 years ago

Sadly, the log doesn't seem to tell me anything useful.

I think we've probably reached the point where this is just something we're going to have to note as not currently working in the readme file for the nmake files and the release notes. This problem only appears to manifest for MSVC 2005 (e.g. the buildbot builds and passes all tests). I thought the fix we applied had actually fixed with for some MSVC 2005 users - I've asked on #enfold.

comment:27 by Charlie Hull, 17 years ago

Leave it assigned to me and I'll keep bashing away and identifying where it might be failing, but I agree that for 1.0 we'll have to mark the remote db as non-working if you compile using VC2005. Unfortunately I'm away from the 17th-30th as well so won't be able to look at it much more now...

comment:28 by Charlie Hull, 17 years ago

Resolution: fixed
Status: reopenedclosed

Better news; me and Richard had a go at this and managed to fix it by adding the "ignore invalid parameter handler" to do_close() - it seems that attempting to close an invalid handle was causing the crash in a similar way to the _get_osfhandle problem found earlier. Richard has patched HEAD and all tests pass on VC 2005 now.

comment:29 by Olly Betts, 17 years ago

Operating System: All
Resolution: fixedreleased

Fixed in 1.0.0 release.

comment:31 by Richard Boulton, 16 years ago

Description: modified (diff)
Milestone: 1.0.0
Note: See TracTickets for help on using tickets.