Opened 17 years ago

Closed 17 years ago

Last modified 17 years ago

#194 closed defect (released)

Segfault with spelling suggestion

Reported by: Fabrice Colin
Owned by: Richard Boulton
Priority: normal
Milestone:
Component: QueryParser
Version: 1.0.2
Severity: normal
Keywords:
Cc: Olly Betts
Blocked By:
Blocking:
Operating System: Linux

Description

I am experimenting with spelling correction. I found one case where I get a segfault every single time.

My index is built with a spelling dictionary: documents are indexed with a TermGenerator on which set_database() has been called and the FLAG_SPELLING flag is set. On the search side, I call set_database() on the QueryParser and pass the FLAG_SPELLING_CORRECTION flag. When I search for the Chinese character 不 (pinyin "bu"), the process segfaults somewhere in api/editdistance.cc. For instance, gdb gives this backtrace:

#0 edist_state<unsigned int>::edist_calc_f_kp (this=0x409ff340, k=-339, p=339) at api/editdistance.cc:76
#1 0x00002aaaaab05fa3 in edit_distance_unsigned (ptr1=<value optimized out>, len1=<value optimized out>, ptr2=<value optimized out>, len2=<value optimized out>) at api/editdistance.cc:190
#2 0x00002aaaaab0b2e8 in Xapian::Database::get_spelling_suggestion (this=<value optimized out>, word=@0x409ff810, max_edit_distance=2) at api/omdatabase.cc:414
#3 0x00002aaaaac45464 in Xapian::QueryParser::Internal::parse_query (this=0x2aaab800a8c0, qs=@0x409ffb30, flags=<value optimized out>, default_prefix=<value optimized out>) at /data/home/olly/tmp/xapian-svn-snapshot/tags/1.0.2/xapian/xapian-core/queryparser/queryparser.lemony:867
#4 0x00002aaaaac3c390 in Xapian::QueryParser::parse_query (this=0x409ffb60, query_string=@0x409ffb30, flags=191, default_prefix=@0x409ffd20) at queryparser/queryparser.cc:117
#5 0x00000000004e611c in XapianEngine::parseQuery (pIndex=0x8bd600, queryProps=@0xc76720, stemLanguage=@0x409ffe50, defaultOperator=SearchEngineInterface::DEFAULT_OP_AND, correctedFreeQuery=@0xc730c0, minimal=false) at XapianEngine.cpp:352
...

Strangely enough, the segfault occurs on only one of the boxes I have access to, a Pentium D box running Fedora 7/x86_64 with the latest updates:

$ uname -a
Linux rexor 2.6.22.4-65.fc7 #1 SMP Tue Aug 21 21:50:50 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

$ g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20070502 (Red Hat 4.1.2-12)

The indexes I have queried on that box don't have any document with this term, as far as I can tell. I have searched the same indexes as well as others (some of which had the term) on other machines. In all cases, the right thing happened.

Is there anything I could try to get to the root of this problem?

Fabrice

Attachments (1)

patches-and-document.tgz (33.7 KB) - added by Fabrice Colin 17 years ago.
Patches for simpleindex and simplesearch, and a sample document

Change History (8)

comment:1 by Fabrice Colin, 17 years ago

I have had more time to look into this. I was wrong when I said that the problem only happened on one machine. I tried searching that same index again on another box and got a segfault.

The other good news is that I have managed to reproduce the problem with simpleindex and simplesearch, which shows that my application is not at fault :-)

I will attach patches and a sample document.

Fabrice

by Fabrice Colin, 17 years ago

Attachment: patches-and-document.tgz added

Patches for simpleindex and simplesearch, and a sample document

comment:2 by Fabrice Colin, 17 years ago

Cc: olly@… added

Adding Olly to CC list.

Fabrice

comment:3 by Richard Boulton, 17 years ago

Status: new → assigned

Thanks for the details - I've reproduced this with your example, and will now investigate.

comment:4 by Olly Betts, 17 years ago

attachments.ispatch: 0 → 1
attachments.mimetype: application/octet-stream → text/plain

(From update of attachment 125) Marking patch as a patch...

comment:5 by Richard Boulton, 17 years ago

Resolution: fixed
Status: assigned → closed

I believe this is now fixed in SVN HEAD - there was an off-by-one error in the loops which initialised the working array in the editdistance calculation code.

With the fix, your patched simplesearch example works for me. I've also added a regression test to apitest (spell5).
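To illustrate the kind of bug described above, here is a minimal sketch of an edit distance routine whose working array is initialised with inclusive loop bounds. It is not the code from api/editdistance.cc and makes no claim about the algorithm used there: it is the textbook dynamic-programming Levenshtein distance, and the name levenshtein_unsigned is hypothetical. It works on arrays of Unicode code points so that a single-character word such as 不 (U+4E0D) has length 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only, not Xapian's implementation.
// Computes the Levenshtein distance between two sequences of code points.
int levenshtein_unsigned(const unsigned* a, std::size_t len_a,
                         const unsigned* b, std::size_t len_b)
{
    const std::size_t cols = len_b + 1;
    // Working array of (len_a + 1) x (len_b + 1) cells, stored row by row.
    std::vector<int> d((len_a + 1) * cols);

    // Initialise the first column and the first row. Both bounds are
    // inclusive: stopping at `i < len_a` or `j < len_b` would skip the last
    // cell of the column or row, which the main loop reads later.
    for (std::size_t i = 0; i <= len_a; ++i) d[i * cols] = static_cast<int>(i);
    for (std::size_t j = 0; j <= len_b; ++j) d[j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= len_a; ++i) {
        for (std::size_t j = 1; j <= len_b; ++j) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            const int del = d[(i - 1) * cols + j] + 1;         // delete a[i-1]
            const int ins = d[i * cols + (j - 1)] + 1;         // insert b[j-1]
            const int sub = d[(i - 1) * cols + (j - 1)] + cost; // substitute
            d[i * cols + j] = std::min(del, std::min(ins, sub));
        }
    }
    return d[len_a * cols + len_b];
}
```

Very short inputs are a useful regression case for this class of bug, because the boundary cells of the working array then make up most of the cells that are read.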

comment:6 by Fabrice Colin, 17 years ago

Resolution: fixed → verified

I can confirm it now works fine for me. Thanks for the quick fix!

Fabrice

comment:7 by Olly Betts, 17 years ago

Operating System: Linux
Resolution: verified → released

Fixed in 1.0.3
