Ticket #194 (closed defect: released)
Segfault with spelling suggestion
| Reported by: | fabrice.colin | Owned by: | richard |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | QueryParser | Version: | 1.0.2 |
| Severity: | normal | Keywords: | |
| Cc: | olly | Blocked By: | |
| Operating System: | Linux | Blocking: | |
Description
I am experimenting with spelling correction. I found one case where I get a segfault every single time.
My index is built with a spelling dictionary: documents are indexed with the TermGenerator, with set_database() called and the FLAG_SPELLING flag set. On the query side I use the QueryParser's FLAG_SPELLING_CORRECTION flag, again with set_database() called. When I search for the Chinese character 不 (pinyin "bu"), a segfault occurs somewhere in api/editdistance.cc. For instance, gdb gives this backtrace:
#0 edist_state<unsigned int>::edist_calc_f_kp (this=0x409ff340, k=-339, p=339) at api/editdistance.cc:76
#1 0x00002aaaaab05fa3 in edit_distance_unsigned (ptr1=<value optimized out>, len1=<value optimized out>, ptr2=<value optimized out>, len2=<value optimized out>) at api/editdistance.cc:190
#2 0x00002aaaaab0b2e8 in Xapian::Database::get_spelling_suggestion (this=<value optimized out>, word=@0x409ff810, max_edit_distance=2) at api/omdatabase.cc:414
#3 0x00002aaaaac45464 in Xapian::QueryParser::Internal::parse_query (this=0x2aaab800a8c0, qs=@0x409ffb30, flags=<value optimized out>, default_prefix=<value optimized out>) at /data/home/olly/tmp/xapian-svn-snapshot/tags/1.0.2/xapian/xapian-core/queryparser/queryparser.lemony:867
#4 0x00002aaaaac3c390 in Xapian::QueryParser::parse_query (this=0x409ffb60, query_string=@0x409ffb30, flags=191, default_prefix=@0x409ffd20) at queryparser/queryparser.cc:117
#5 0x00000000004e611c in XapianEngine::parseQuery (pIndex=0x8bd600, queryProps=@0xc76720, stemLanguage=@0x409ffe50, defaultOperator=SearchEngineInterface::DEFAULT_OP_AND, correctedFreeQuery=@0xc730c0, minimal=false) at XapianEngine.cpp:352
...
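The top two frames are in the edit distance code, and `edist_state<unsigned int>` suggests the comparison runs on 32-bit character values rather than raw bytes. A minimal standalone sketch for checking that side in isolation, assuming well-formed UTF-8 input: it decodes a string to code points and computes a classic Levenshtein distance. This is not Xapian's implementation in api/editdistance.cc; `decode_utf8` and `levenshtein` are hypothetical helper names, and the textbook dynamic-programming algorithm is swapped in purely for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: decode a UTF-8 string into code points.
// Simplified for illustration; assumes well-formed input and does no validation.
std::vector<unsigned> decode_utf8(const std::string& s) {
    std::vector<unsigned> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        unsigned cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else               { cp = c & 0x07; extra = 3; }
        ++i;
        for (std::size_t j = 0; j < extra && i < s.size(); ++j, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Hypothetical helper: classic Levenshtein distance using two DP rows.
// Not the algorithm used in Xapian; a reference point for comparison only.
unsigned levenshtein(const std::vector<unsigned>& a,
                     const std::vector<unsigned>& b) {
    std::vector<unsigned> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = static_cast<unsigned>(j);
    }
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<unsigned>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            unsigned cost = (a[i - 1] == b[j - 1]) ? 0u : 1u;
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}
```

For the query above, `decode_utf8("\xE4\xB8\x8D")` should yield the single code point U+4E0D, so a correct distance against any dictionary word is at most that word's length in code points.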
Strangely enough, the segfault occurs only on one of the boxes I have access to, a Pentium D box running Fedora 7/x86_64 with the latest updates:
$ uname -a
Linux rexor 2.6.22.4-65.fc7 #1 SMP Tue Aug 21 21:50:50 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
$ g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20070502 (Red Hat 4.1.2-12)
The indexes I have queried on that box don't have any document with this term, as far as I can tell. I have searched the same indexes as well as others (some of which had the term) on other machines. In all cases, the right thing happened.
Is there anything I could try to get to the root of this problem?
Fabrice
