Opened 14 years ago
Closed 14 years ago
#487 closed defect (fixed)
ordecay1 fails on i386 architecture
Reported by: | Richard Boulton | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.0.21 |
Component: | Build system | Version: | SVN trunk |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Currently, test ordecay1 fails on an i386 box as follows:
Running test: ordecay1... FAILED docids differ at item 22 in range: 44 != 31
If I recompile with CXXFLAGS="-mfpmath=sse -msse", the test passes, so I'm pretty sure this is triggered (or at least exacerbated) by i386 excess precision. The matcher also seems a bit faster with these options. I measured 6% faster, but that was in a situation which is probably close to optimal for showing the difference.
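Here is why excess precision can reorder results. The matcher ranks documents by comparing double weights. Two weights that are mathematically equal, or nearly so, can compare differently depending on where intermediate results get rounded.

The sketch below is illustrative only. The term-weight values and the function names (sum_left, sum_right, show_sums) are hypothetical and are not Xapian's weighting code. It sums the same three contributions in two different orders:

- With strict IEEE double evaluation (SSE2 math), each addition rounds to a 53-bit significand, and the two sums differ in the last bit.
- With x87 code generation on i386, intermediates may instead be held at 80-bit precision and only rounded when spilled to memory. Whether two near-equal documents tie or swap places can then depend on register allocation.

```cpp
#include <cstdio>

// Hypothetical per-term weight contributions for one document.
// The two orders are mathematically equivalent, but each floating-point
// addition rounds to the evaluation format, so the results can differ.
double sum_left(double a, double b, double c) { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// Print both sums with 17 significant digits so a last-bit difference
// is visible, then report whether they compare equal.
void show_sums(double a, double b, double c) {
    double l = sum_left(a, b, c);
    double r = sum_right(a, b, c);
    std::printf("(a+b)+c = %.17g\na+(b+c) = %.17g\n%s\n", l, r,
                l == r ? "equal" : "differ");
}
```

On a platform that evaluates doubles strictly (e.g. x86-64), show_sums(0.1, 0.2, 0.3) reports 0.60000000000000009 for the first sum and 0.59999999999999998 for the second, so they "differ". Under x87 excess precision the outcome can change with optimisation level and surrounding code. That is the kind of instability which would make a ranking comparison like ordecay1's flaky.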
Configure should probably add the appropriate options to CXXFLAGS by default on i386 architectures. The Debian packages will then need to build a separate library for actual 486 processors, so there may need to be a configure flag to override this.
This is certainly a problem with trunk, and will also be a problem with the 1.0 branch if ordecay1 has been backported there.
Change History (4)
comment:1 by , 14 years ago
Component: | Other → Build system |
---|
comment:2 by , 14 years ago
Status: | new → assigned |
---|
Fixed in trunk r14687 to put the new flags in AM_CXXFLAGS, so we don't clobber user-specified CXXFLAGS or the default of -O2 -g.
comment:3 by , 14 years ago
Milestone: | 1.2.1 → 1.0.21 |
---|
-march=pentium4 doesn't seem to give a measurable speed-up (from Richard's tests). It also carries a small risk of introducing instructions which don't work on some obscure CPU which implements SSE2, so I've removed it in r14692.
So this is now sorted in trunk. Marking for backport to 1.0.21.
comment:4 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Backported for 1.0.21 in r14693.
Should be fixed on trunk by r14686, but I've not tested yet.
We're actually now using -mfpmath=sse -msse2 -mtune=generic -march=pentium4, which assumes a Pentium 4. SSE2 added double-precision FP instructions, and we use double a lot. The last two options mean that we'll generate code which will work on a Pentium 4 but is tuned to run fast on modern CPUs.
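One way to confirm that a given set of flags removes excess precision is to check the standard FLT_EVAL_METHOD macro from <cfloat>. This is a minimal sketch, not something Xapian's configure is known to do. The helper names and the commented-out static_assert guard are hypothetical.

With GCC on i386 the value is typically 2 by default (x87 code generation) and 0 when building with -mfpmath=sse -msse2. On x86-64, SSE2 math is already the default, so it is normally 0 there.

```cpp
#include <cfloat>

// FLT_EVAL_METHOD (C99, and C++11 via <cfloat>) reports how the compiler
// evaluates floating-point expressions:
//    0 -> each operation is evaluated in its own type (e.g. SSE2 math)
//    1 -> float and double operations are evaluated as double
//    2 -> all operations are evaluated as long double (x87 excess precision)
//   -1 -> indeterminate; other values are implementation-defined
const char* eval_method_description() {
    switch (FLT_EVAL_METHOD) {
        case 0: return "no excess precision";
        case 1: return "float evaluated as double";
        case 2: return "excess precision (evaluated as long double)";
        default: return "indeterminate or implementation-defined";
    }
}

// True when doubles may carry x87-style extra precision in intermediates.
bool has_excess_double_precision() { return FLT_EVAL_METHOD == 2; }

// Hypothetical build-time guard: uncomment to refuse to compile when
// doubles would be evaluated with x87 excess precision.
// static_assert(FLT_EVAL_METHOD != 2, "build with -mfpmath=sse -msse2");
```

A check like this only tells you what the compiler advertises. Running the actual test suite (as was done with ordecay1) is still the real confirmation.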