Ticket #194 (closed defect: released)

Opened 16 months ago

Last modified 16 months ago

Segfault with spelling suggestion

Reported by: fabrice.colin
Owned by: richard
Priority: normal
Milestone:
Component: QueryParser
Version: 1.0.2
Severity: normal
Keywords:
Cc: olly
Blocked By:
Operating System: Linux
Blocking:

Description

I am experimenting with spelling correction. I found one case where I get a segfault every single time.

My index is built with a spelling dictionary: documents are indexed with the TermGenerator, with set_database() called and the FLAG_SPELLING flag set. I use the QueryParser's FLAG_SPELLING_CORRECTION flag, and set_database() is also called on it. When I search for the Chinese character 不 (pinyin "bu"), a segfault occurs somewhere in api/editdistance.cc. For instance, gdb gives this backtrace:

#0 edist_state<unsigned int>::edist_calc_f_kp (this=0x409ff340, k=-339, p=339) at api/editdistance.cc:76
#1 0x00002aaaaab05fa3 in edit_distance_unsigned (ptr1=<value optimized out>, len1=<value optimized out>, ptr2=<value optimized out>, len2=<value optimized out>) at api/editdistance.cc:190
#2 0x00002aaaaab0b2e8 in Xapian::Database::get_spelling_suggestion (this=<value optimized out>, word=@0x409ff810, max_edit_distance=2) at api/omdatabase.cc:414
#3 0x00002aaaaac45464 in Xapian::QueryParser::Internal::parse_query (this=0x2aaab800a8c0, qs=@0x409ffb30, flags=<value optimized out>, default_prefix=<value optimized out>) at /data/home/olly/tmp/xapian-svn-snapshot/tags/1.0.2/xapian/xapian-core/queryparser/queryparser.lemony:867
#4 0x00002aaaaac3c390 in Xapian::QueryParser::parse_query (this=0x409ffb60, query_string=@0x409ffb30, flags=191, default_prefix=@0x409ffd20) at queryparser/queryparser.cc:117
#5 0x00000000004e611c in XapianEngine::parseQuery (pIndex=0x8bd600, queryProps=@0xc76720, stemLanguage=@0x409ffe50, defaultOperator=SearchEngineInterface::DEFAULT_OP_AND, correctedFreeQuery=@0xc730c0, minimal=false) at XapianEngine.cpp:352
...

Strangely enough, the segfault occurs only on one of the boxes I have access to, a Pentium D box running Fedora 7/x86_64 with the latest updates:

$ uname -a
Linux rexor 2.6.22.4-65.fc7 #1 SMP Tue Aug 21 21:50:50 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

$ g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20070502 (Red Hat 4.1.2-12)

The indexes I have queried on that box don't have any document with this term, as far as I can tell. I have searched the same indexes as well as others (some of which had the term) on other machines. In all cases, the right thing happened.

Is there anything I could try to get to the root of this problem?

Fabrice

Attachments

patches-and-document.tgz (33.7 kB) - added by fabrice.colin 16 months ago.
Patches for simpleindex and simplesearch, and a sample document

Change History

Changed 16 months ago by fabrice.colin

I have had more time to look into this. I was wrong when I said that the problem only happened on one machine. I tried searching that same index again on another box and got a segfault.

The other good news is that I have managed to reproduce the problem with simpleindex and simplesearch, which shows that my application was not at fault :-)

I will attach patches and a sample document.

Fabrice

Changed 16 months ago by fabrice.colin

Patches for simpleindex and simplesearch, and a sample document

Changed 16 months ago by fabrice.colin

  • cc olly@… added

Adding Olly to CC list.

Fabrice

Changed 16 months ago by richard

  • status changed from new to assigned

Thanks for the details - I've reproduced this with your example, and will now investigate.

Changed 16 months ago by olly

  • attachments.mimetype changed from application/octet-stream to text/plain
  • attachments.ispatch changed from 0 to 1

(From update of attachment 125) Marking patch as a patch...

Changed 16 months ago by richard

  • status changed from assigned to closed
  • resolution set to fixed

I believe this is now fixed in SVN HEAD - there was an off-by-one error in the loops which initialised the working array in the edit distance calculation code.

With the fix, your patched simplesearch example works for me. I've also added a regression test to apitest (spell5).
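
As a rough illustration of this class of bug, here is a minimal sketch of an edit distance routine whose working array must be fully initialised before use. It is an assumption-laden example, not the code from api/editdistance.cc: it uses the textbook Wagner-Fischer dynamic programme rather than whatever algorithm Xapian implements, it compares bytes rather than Unicode characters, and the function name levenshtein is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Textbook Levenshtein distance (Wagner-Fischer) using a single working row.
// Illustrative only: this is NOT the algorithm in Xapian's editdistance.cc.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    const std::size_t m = a.size();
    const std::size_t n = b.size();

    // One cell per prefix length of b, including the empty prefix,
    // so the row needs n + 1 cells rather than n.
    std::vector<std::size_t> row(n + 1);
    assert(row.size() == n + 1);

    // The initialisation bound must be inclusive. With `j < n` the last
    // cell, row[n], would keep its default value of 0 (or be uninitialised
    // in a raw array), which is a typical off-by-one slip.
    for (std::size_t j = 0; j <= n; ++j) row[j] = j;

    for (std::size_t i = 1; i <= m; ++i) {
        std::size_t diag = row[0];  // previous row, column j - 1
        row[0] = i;                 // distance from a[0..i) to the empty string
        for (std::size_t j = 1; j <= n; ++j) {
            const std::size_t above = row[j];  // previous row, column j
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            // Cheapest of deletion, insertion and substitution/match.
            row[j] = std::min({above + 1, row[j - 1] + 1, diag + cost});
            diag = above;
        }
    }
    return row[n];
}
```

For example, levenshtein("", "abc") depends entirely on the initialisation loop and returns 3; with an exclusive loop bound it would return 0 instead.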

Changed 16 months ago by fabrice.colin

  • resolution changed from fixed to verified

I can confirm it now works fine for me. Thanks for the quick fix!

Fabrice

Changed 16 months ago by olly

  • resolution changed from verified to released

Fixed in 1.0.3

Changed 16 months ago by trac

  • platform set to Linux