Ticket #46 (assigned defect)

Opened 4 years ago

Last modified 4 months ago

zero byte cleanliness in C# and Java bindings

Reported by: olly
Owned by: olly
Priority: normal
Milestone: 1.1.0
Component: Xapian-bindings
Version: SVN trunk
Severity: minor
Keywords:
Cc: richard
Blocked By:
Operating System: All
Blocking:

Description (last modified by richard)

Check for zero byte cleanliness wherever strings are used. There are a number of c_str() calls in the code, but I believe all those in the core library were harmless as of 2002-04-29. There may be other zero byte issues though. xapian-applications/dbtools also uses c_str() where it should probably use data() and length(). xapian-bindings hasn't been checked.
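To illustrate why c_str() is suspect here, the following is a minimal sketch and not code from Xapian. The helper names copy_via_c_str and copy_via_data are hypothetical. It shows that rebuilding a std::string from c_str() alone stops at the first zero byte, while data() plus length() keeps every byte.

```cpp
#include <string>

// Hypothetical helper: treats the buffer as a NUL-terminated C string,
// so everything after the first zero byte is silently dropped.
std::string copy_via_c_str(const std::string& s) {
    return std::string(s.c_str());
}

// Hypothetical helper: passes the pointer together with the stored length,
// so embedded zero bytes survive the copy.
std::string copy_via_data(const std::string& s) {
    return std::string(s.data(), s.length());
}
```

For a 7-byte value such as "foo", a zero byte, then "bar", the first helper returns only "foo" while the second returns the value unchanged. A c_str() call is harmless only where the receiver also gets the length, or where the value can never contain a zero byte.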

Change History

Changed 4 years ago by olly

  • status changed from new to assigned
  • severity changed from blocker to normal

Changed 4 years ago by olly

All c_str() calls in xapian-core are OK as of 2004-12-05.

Changed 4 years ago by olly

No calls to c_str in xapian-examples!

Changed 2 years ago by olly

  • rep_platform changed from Other to All
  • version changed from other to SVN HEAD
  • component changed from other to Other
  • op_sys changed from other to All

Most of the bindings now have a simple zero-byte cleanliness test.

The notable exception is C# which doesn't seem to easily support zero byte clean string passing currently.

Tcl encodes zero bytes as the overlong (non-shortest-form) UTF-8 sequence "\xc0\x80", so it's not actually possible to pass a zero byte into Xapian from Tcl...

Changed 21 months ago by olly

  • component changed from Other to Xapian-bindings

Changed 20 months ago by olly

  • priority changed from high to normal
  • severity changed from normal to minor
  • summary changed from zero byte cleanliness to zero byte cleanliness in C# bindings

Changed 20 months ago by richard

  • blocking set to 118

This should at least be investigated for the 1.0 release, though maybe it's too hard to fix in all bindings right now.

Changed 20 months ago by olly

  • summary changed from zero byte cleanliness in C# bindings to zero byte cleanliness in C# and Java bindings

It's already fixed for everything except C# (oh, hmm and Java too) - there are tests in smoketest.* for it.

Tcl is fixed by virtue of not being able to have literal zero bytes in strings, as they're encoded as overlong UTF-8. Strictly speaking that should be addressed by checking for and translating '\xc0\x80' <-> '\0', as '\xc0\x80' is not valid UTF-8 under current rules.
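The '\xc0\x80' <-> '\0' translation could look roughly like the sketch below. It is an illustration only, not what the bindings do. The function names decode_tcl_nul and encode_tcl_nul are hypothetical. It assumes the input is otherwise valid UTF-8, where the byte 0xC0 can never legitimately occur; arbitrary binary data containing that byte pair would be altered.

```cpp
#include <string>

// Hypothetical helper for the Tcl -> Xapian direction: replace each
// two-byte sequence 0xC0 0x80 with a single real zero byte.
std::string decode_tcl_nul(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\xc0' && i + 1 < in.size() && in[i + 1] == '\x80') {
            out += '\0';
            ++i;  // skip the second byte of the pair
        } else {
            out += in[i];
        }
    }
    return out;
}

// Hypothetical helper for the Xapian -> Tcl direction: replace each
// real zero byte with the two-byte sequence 0xC0 0x80.
std::string encode_tcl_nul(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        if (ch == '\0') {
            out.append("\xc0\x80", 2);
        } else {
            out += ch;
        }
    }
    return out;
}
```

The two functions are inverses on any string that contains no zero byte on the Tcl side and no 0xC0 0x80 pair on the Xapian side, so a value should round-trip unchanged.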

I asked on the SWIG list about C# ages ago and William Fulton seemed to suggest it was hard to fix for C# - the P/Invoke stuff is built around C strings, it seems.

Java is probably best fixed by transitioning to SWIG rather than expending effort on doomed code, but I'd rather leave moving it to SWIG until after 1.0. I don't think it's a huge amount of work, but it is a fundamental change that's potentially disruptive.

Changed 20 months ago by richard

  • cc richard@… added

Changed 20 months ago by richard

Okay - since fixing this isn't going to break anything which already works, I think it's a prime candidate for postponing to after 1.0.

Changed 20 months ago by olly

  • blocking deleted

For Tcl, it would change what went in the database if you sent a "tcl nul", but if we ignore "\xc0\x80" in what comes out, it'll still essentially work OK (and I don't expect a lot of people put zero bytes in Xapian databases from Tcl anyway).

So I agree (removed block).

Changed 20 months ago by richard

  • blocking set to 120

Changed 20 months ago by trac

  • platform set to All

Changed 7 months ago by richard

  • description modified
  • milestone set to 1.1

Changed 7 months ago by richard

  • blocking deleted

(In #120) Remove the unfixed dependencies so we can close this bug - they're all marked for the 1.1.0 milestone.

Changed 4 months ago by olly

As of 1.0.7, passing strings from Java to C++ is now zero byte safe.
