Opened 9 years ago
Last modified 5 months ago
#665 assigned defect
Avoid C library functions not guaranteed to be thread-safe
Reported by: | Olly Betts | Owned by: | Olly Betts |
---|---|---|---|
Priority: | highest | Milestone: | 1.5.0 |
Component: | Library API | Version: | git master |
Severity: | normal | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Operating System: | All |
Description
Currently we use environmental variables to control a few things (default backend for a newly created DB, flush threshold, replication stuff, cjk ngram, debug logging) but getenv()
isn't necessarily thread safe.
Adding a mutex which we lock while calling it wouldn't help as it isn't Xapian potentially calling setenv()
/putenv()
.
In general we'll be unlucky to hit problems, as we don't call getenv()
very often, but that's not very reassuring.
I don't see a good fix, aside from not reading environmental variables. Debug logging is a special case, but aside from that I think we should probably try to phase them out - being able to change the default backend without changing the code is kind of nice, but otherwise these things would arguably be better specified in the code.
Change History (21)
comment:1 by , 9 years ago
Summary: | getenv() not guaranteed to be thread-safe → Avoid C library functions not guaranteed to be thread-safe |
---|
comment:2 by , 9 years ago
GCC supports "poisoning" identifiers, e.g. #pragma GCC poison inet_ntoa
:
https://gcc.gnu.org/onlinedocs/gcc-4.9.0/cpp/Pragmas.html
Less portable, but perhaps less likely to cause issues - if a bad function is used as a method name or a variable, the pragma will at worst prevent compilation, whereas the #define approach could, for example, break the ABI of the built library.
comment:3 by , 9 years ago
Functions which POSIX says are not required to be thread-safe which we're actually using in the library code in current master:
getenv()
- used to read things likeXAPIAN_FLUSH_THRESHOLD
.inet_ntoa()
- used in the remote server code - while that's in the library, it's only actually used in binaries we build currently.readdir()
- used in replication code to delete the old replica - again, in the library but only used in binaries we build
And strerror()
is used, but only if neither strerror_r()
nor sys_errlist
are available.
There's also at least atoi()
and strtod()
which the Linux man pages note aren't thread-safe if another thread calls setlocale()
while they are executing.
comment:4 by , 9 years ago
[ada3bd5f12a444d778d57c0c75cccacc077d5051] updates master to use inet_ntop()
instead of inet_ntoa()
.
comment:5 by , 9 years ago
POSIX requires that readdir()
be thread-safe if the DIR*
pointer is only used in a single thread, which is how we always use it:
The returned pointer, and pointers within the structure, might be invalidated or the structure or the storage areas might be overwritten by a subsequent call to readdir() on the same directory stream. They shall not be affected by a call to readdir() on a different directory stream.
(my emphasis)
http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html
comment:6 by , 8 years ago
Milestone: | 1.3.x → 1.4.x |
---|
So the list is now:
getenv()
- used to read things likeXAPIAN_FLUSH_THRESHOLD
.strerror()
(but only if there's no thread-safe alternative).atoi()
- quite a few uses.strtod()
- only use is inNumberValueRangeProcessor::operator()
- also means that behaviour depends on the locale, which is probably not always desirable.
We should have a section in the developer docs about this too.
None of this directly affects the ABI, though eliminating uses of getenv()
might need some API additions. I don't think this should block 1.4.0 though.
comment:7 by , 5 years ago
[60bd2fb4e2a559c214f3569d1f5f64f85ba1766f] replaced 17 uses of atoi()
. That leaves just 6 in libxapian.
comment:8 by , 4 years ago
c31aa9b56eb858c9acce410d5465e91007d997b4 removes 2 more uses of atoi()
from libxapian.
comment:9 by , 4 years ago
Current status:
getenv()
- used to read things likeXAPIAN_FLUSH_THRESHOLD
.strerror()
(but only if there's no thread-safe alternative).atoi()
-api/valuerangeproc.cc
(DateRangeProcessor
),backends/databasehelpers.h
(parsing port number in remote stub lines),strtod()
-weight/weightinternal.h
parsing double parameters,api/valuerangeproc.cc
NumberRangeProcessor::operator()
- also means that behaviour depends on the locale, which is probably not always desirable.strtol()
-weight/lmweight.cc
And as noted before, we should have a section in the developer docs about this too. Also maybe poison some of these to stop them being reintroduced (at least strtol()
seems to have appeared since I last checked). Not sure how easy that is to do just for the library, and we don't need to worry about these in the tools, examples, or testsuite.
comment:10 by , 4 years ago
Can we do the poisoning in config.h
so it only applies during library compilation? We could then put bless blocks around any uses we want to whitelist, such as in examples and tools. (Clang seems to support GCC poison pragmas, but looks like it doesn't handle blessing. However we could enable this for GCC only, because we really just want CI to fail somewhere. Doesn't have to fail everywhere.)
comment:11 by , 4 years ago
I hadn't considered we could unpoison things - that might work.
Another option that might work is to have a #ifdef XAPIAN_NO_POISON
or similar around them in config.h
, and just define that before #include <config.h>
where we want to whitelist.
I agree we only need to catch this in CI, but it's more helpful to people submitting patches and using clang to have this flagged during early development rather than only after they've submitted a patch.
comment:12 by , 4 years ago
Actually it seems we can't do this in <config.h>
because that should be included before any other headers, and:
$ cat poison.cc #pragma GCC poison atoi #include <stdlib.h> $ g++ -Wall -W -O2 -c poison.cc In file included from /usr/include/c++/9/cstdlib:75, from /usr/include/c++/9/stdlib.h:36, from poison.cc:2: /usr/include/stdlib.h:104:12: error: attempt to use poisoned "atoi" 104 | extern int atoi (const char *__nptr) | ^ In file included from /usr/include/c++/9/cstdlib:75, from /usr/include/c++/9/stdlib.h:36, from poison.cc:2: /usr/include/stdlib.h:361:8: error: attempt to use poisoned "atoi" 361 | __NTH (atoi (const char *__nptr)) | ^ In file included from /usr/include/c++/9/stdlib.h:36, from poison.cc:2: /usr/include/c++/9/cstdlib:91:8: error: attempt to use poisoned "atoi" 91 | #undef atoi | ^ /usr/include/c++/9/cstdlib:141:11: error: attempt to use poisoned "atoi" 141 | using ::atoi; | ^ In file included from poison.cc:2: /usr/include/c++/9/stdlib.h:56:12: error: attempt to use poisoned "atoi" 56 | using std::atoi; | ^
So we'd have to poison after we include headers which might reference the identifier - we can't poison atoi
and then #include <stdlib.h>
, nor any header which might indirectly #include <stdlib.h>
.
Perhaps this is better checked by xapian-check-patch
- we can easily control which directories that applies a check to.
comment:13 by , 4 years ago
Does GCC actually support a bless
pragma? With GCC 9.2.1 I get:
poison.cc:2: warning: ignoring #pragma GCC bless [-Wunknown-pragmas] 2 | #pragma GCC bless atoi
The only references I found seem to be copies of a blog post wishing it existed.
comment:14 by , 4 years ago
Yeah, I didn't read the blog post closely enough in the first place. Having something in xapian-check-patch
seems a good plan, and at least the ones we are avoiding at the moment are symbols that won't be used elsewhere (except possibly in comments, which we could cater for).
comment:15 by , 4 years ago
We could also define the symbols to something else and get a link error which would be un-poisonable, but it's not going to be as clear an error as we can manage in xapian-check-patch
.
comment:16 by , 4 years ago
xapian-check-patch
already strips comment content so that shouldn't be an issue.
I think the #define inet_ntoa DO_NOT_USE_inet_ntoa
approach would (like the poison pragma) require something like #include "poison.h"
to be included after any other header in every source which is part of the library. But that's easy to fail to do (and easy for someone to merge that include into a block of others thinking it's out of place or add an include after it which might work on one platform but fail on another depending which system headers implicitly include others), so we'd really need xapian-check-patch
to check it's suitably included where it should be. Probably simpler to just check for a list of bad functions in xapian-check-patch
.
Another benefit of checking from a script is we can be a little more nuanced - e.g. strerror()
is used in the library, but only as a fallback option if we don't have strerror_r()
or sys_errlist
or similar - with a script we can allow the use in common/errno_to_string.cc
.
Also the poison pragma is based only on identifiers, so e.g. would complain about a class with an atoi()
method. I'm not sure if any of the identifiers we would want to poison would be likely method names though.
comment:17 by , 4 years ago
I've now eliminated all uses of atoi()
and strtol()
in the library itself.
Current status:
getenv()
- used to read things likeXAPIAN_FLUSH_THRESHOLD
.strerror()
(but only if there's no thread-safe alternative).strtod()
-weight/weightinternal.h
parsing double parameters,api/valuerangeproc.cc
NumberRangeProcessor::operator()
- also means that behaviour depends on the locale, which is probably not always desirable. C++17 provides std::from_chars() which we could use if available. Do we want to provide a way to allow,
as decimal point though?
Also to do:
- Add section to the developer docs about this
- Add checks to
xapian-check-patch
to prevent more calls to problematic functions from being introduced into the library code
comment:18 by , 4 years ago
Milestone: | 1.4.x → 1.5.0 |
---|---|
Status: | new → assigned |
Version: | 1.3.2 → git master |
I think we should aim to address this for 1.5.0 (or 1.5.x at least), at least for mainstream platforms (for example, the requirements might be an alternative to strerror()
and a C++17 compiler).
comment:19 by , 4 years ago
https://en.cppreference.com/w/cpp/compiler_support seems to suggest that no compiler supports std::from_chars()
for FP types yet, and looking at /usr/include/c++/9/charconv
from GCC 9 seems to confirm that, so that's no help.
We could write our own or find an existing implementation, but I think it's quite subtle to get completely right.
comment:20 by , 4 years ago
That page now suggests that GCC11 and a recent MSVC version support std::from_chars()
for floating point types (search for "Elementary string conversions" to find the relevant entry) so probably using this if available is the best option overall, and will get better with time.
comment:21 by , 5 months ago
Priority: | normal → highest |
---|
OK, let's use std::from_chars()
where it supports double
and postpone the rest. With a suitable platform and toolchain that means only getenv()
is a concern (and it seems perhaps it's fine on many platforms, it's just that POSIX allows it to return its value in a static buffer; or possibly concurrent setenv()
might be problematic too?)
I think we should broaden this to cover any library functions which aren't necessarily thread-safe.
According to the Linux man pages, there are potential issues with
atoi()
(and related functions likeatol()
,strtoul()
, etc which we don't currently use in the library code):I can't see anything in POSIX about this potential issue, so not sure if Linux is failing to conform here. It's not likely to bite in practice, but perhaps we should aim to avoid these functions in the library code.
We use
inet_ntoa()
(which isn't thread-safe as it has a static return buffer) in one place, to report where the connection is from in net/tcpserver.cc. That'll change when IPv6 support is added.I'm thinking we should see if we can compile a list of library functions to avoid, and blacklist them somehow (perhaps a header which does
#define inet_ntoa DO_NOT_USE_inet_ntoa
which gets included in all library code sources).