#194 closed defect (released)
Segfault with spelling suggestion
Reported by: | Fabrice Colin | Owned by: | Richard Boulton |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | QueryParser | Version: | 1.0.2 |
Severity: | normal | Keywords: | |
Cc: | Olly Betts | Blocked By: | |
Blocking: | | Operating System: | Linux |
Description
I am experimenting with spelling correction. I found one case where I get a segfault every single time.
My index is built with a spelling dictionary: documents are indexed with a TermGenerator on which set_database() has been called and the FLAG_SPELLING flag set. On the search side I use the QueryParser's FLAG_SPELLING_CORRECTION flag, again after calling set_database(). When I search for the Chinese character 不 (pinyin "bu"), the process segfaults somewhere in api/editdistance.cc. For instance, gdb gives this backtrace:
#0 edist_state<unsigned int>::edist_calc_f_kp (this=0x409ff340, k=-339, p=339) at api/editdistance.cc:76
#1 0x00002aaaaab05fa3 in edit_distance_unsigned (ptr1=<value optimized out>, len1=<value optimized out>, ptr2=<value optimized out>, len2=<value optimized out>) at api/editdistance.cc:190
#2 0x00002aaaaab0b2e8 in Xapian::Database::get_spelling_suggestion (this=<value optimized out>, word=@0x409ff810, max_edit_distance=2) at api/omdatabase.cc:414
#3 0x00002aaaaac45464 in Xapian::QueryParser::Internal::parse_query (this=0x2aaab800a8c0, qs=@0x409ffb30, flags=<value optimized out>, default_prefix=<value optimized out>) at /data/home/olly/tmp/xapian-svn-snapshot/tags/1.0.2/xapian/xapian-core/queryparser/queryparser.lemony:867
#4 0x00002aaaaac3c390 in Xapian::QueryParser::parse_query (this=0x409ffb60, query_string=@0x409ffb30, flags=191, default_prefix=@0x409ffd20) at queryparser/queryparser.cc:117
#5 0x00000000004e611c in XapianEngine::parseQuery (pIndex=0x8bd600, queryProps=@0xc76720, stemLanguage=@0x409ffe50, defaultOperator=SearchEngineInterface::DEFAULT_OP_AND, correctedFreeQuery=@0xc730c0, minimal=false) at XapianEngine.cpp:352
...
Strangely enough, the segfault occurs only on one of the boxes I have access to, a Pentium D box running Fedora 7/x86_64 with the latest updates :
$ uname -a
Linux rexor 2.6.22.4-65.fc7 #1 SMP Tue Aug 21 21:50:50 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
$ g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20070502 (Red Hat 4.1.2-12)
The indexes I have queried on that box don't have any document with this term, as far as I can tell. I have searched the same indexes as well as others (some of which had the term) on other machines. In all cases, the right thing happened.
Is there anything I could try to get to the root of this problem ?
Fabrice
Attachments (1)
Change History (8)
comment:1 by , 17 years ago
by , 17 years ago
Attachment: | patches-and-document.tgz added |
---|
Patches for simpleindex and simplesearch, and a sample document
comment:3 by , 17 years ago
Status: | new → assigned |
---|
Thanks for the details - I've reproduced this with your example, and will now investigate.
comment:4 by , 17 years ago
attachments.ispatch: | 0 → 1 |
---|---|
attachments.mimetype: | application/octet-stream → text/plain |
(From update of attachment 125) Marking patch as a patch...
comment:5 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I believe this is now fixed in SVN HEAD - there was an off-by-one error in the loops which initialised the working array in the editdistance calculation code.
With the fix, your patched simplesearch example works for me. I've also added a regression test to apitest (spell5).
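For readers unfamiliar with this class of bug, here is a minimal sketch of how an off-by-one in the loop that initialises an edit-distance working array can go wrong. It is not the code from api/editdistance.cc and does not use the same algorithm: it is a plain dynamic-programming Levenshtein distance over code points, and the name edit_distance_sketch is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Levenshtein distance between two sequences of Unicode code points.
// Illustrative stand-in only; NOT the algorithm used in api/editdistance.cc.
unsigned edit_distance_sketch(const std::vector<unsigned>& a,
                              const std::vector<unsigned>& b)
{
    const std::size_t n = b.size();

    // The working row needs n + 1 cells: one per prefix length 0..n of b.
    std::vector<unsigned> row(n + 1);

    // Initialise every cell, including the last one. Writing `j < n` here
    // would leave row[n] at 0 instead of n, which only gives wrong answers
    // for some inputs. With a raw array the same slip, or an allocation of
    // n cells instead of n + 1, reads or writes out of bounds.
    for (std::size_t j = 0; j <= n; ++j)
        row[j] = static_cast<unsigned>(j);

    for (std::size_t i = 1; i <= a.size(); ++i) {
        unsigned diag = row[0];              // distance for (i-1, j-1)
        row[0] = static_cast<unsigned>(i);
        for (std::size_t j = 1; j <= n; ++j) {
            const unsigned up = row[j];      // distance for (i-1, j)
            const unsigned cost = (a[i - 1] == b[j - 1]) ? 0u : 1u;
            row[j] = std::min({up + 1, row[j - 1] + 1, diag + cost});
            diag = up;
        }
    }

    // Sanity check: the distance never exceeds the longer input's length.
    assert(row[n] <= std::max(a.size(), b.size()));
    return row[n];
}
```

Calling edit_distance_sketch({0x4E0D}, {}) compares a single code point against an empty string, the sort of very short input where boundary cells of the working array dominate the calculation.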
comment:6 by , 17 years ago
Resolution: | fixed → verified |
---|
I can confirm it now works fine for me. Thanks for the quick fix !
Fabrice
comment:7 by , 17 years ago
Operating System: | → Linux |
---|---|
Resolution: | verified → released |
Fixed in 1.0.3
I have had more time to look into this. I was wrong when I said that the problem only happened on one machine. I tried searching that same index again on another box and got a segfault.
The other good news is that I have managed to reproduce the problem with simpleindex and simplesearch, which shows that my application was not at fault :-)
I will attach patches and a sample document.
Fabrice