{5} Assigned, Active Tickets by Owner (Full Description) (58 matches)
List assigned tickets, grouped by ticket owner. This report demonstrates the use of full-row display.
olly (6 matches)
| Ticket | Summary | Component | Milestone | Type | Created |
|---|---|---|---|---|---|
| Description |
| #3 | Get multierrhandler1 working again | Test Suite | | enhancement | 2003-03-27 |
|
Redo machinery in the InMemory backend to allow multierrhandler1 to work. Probably leave until user database backends are possible, then do it by subclassing InMemory... |
| #51 | Nightly snapshots | Website | | enhancement | 2004-09-09 |
|
Get nightly snapshot builds set up again - essentially, take the current SVN snapshots and "bless" them if they "look good" according to the tinderbox. The problem is that we can't easily tie a row of green lights in tinderbox to a particular snapshot - switching to buildbot will help this I believe. |
| #53 | Xapian::Fields | Library API | | enhancement | 2004-09-09 |
|
Implement a Xapian::Fields class to serialise/unserialise name=value pairs to/from Document data field. |
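As a rough illustration of the idea (not the proposed Xapian::Fields API), name=value pairs could be serialised one per line into the document data string. The function names and the newline-separated format are assumptions made for this sketch.

```python
def serialise_fields(fields):
    """Serialise a dict of name -> value into newline-separated name=value lines."""
    lines = []
    for name, value in fields.items():
        # Assumed restriction: "=" ends the name and a newline ends the pair.
        if "=" in name or "\n" in name or "\n" in value:
            raise ValueError("unsupported character in field name or value")
        lines.append(f"{name}={value}")
    return "\n".join(lines)


def unserialise_fields(data):
    """Parse the output of serialise_fields() back into a dict."""
    fields = {}
    for line in data.split("\n"):
        if not line:
            continue
        # Split at the first "=" only, so values may themselves contain "=".
        name, _, value = line.partition("=")
        fields[name] = value
    return fields
```

Splitting at the first "=" keeps values such as URLs with query strings intact.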
| #62 | How to use the Tcl binding so cleanup works | Xapian-bindings | | defect | 2005-06-01 |
|
The current Tcl binding has problems with cleanup: sometimes the destructor does not get called, along with other nuisances. I did some small experiments with the binding and found that the constructor gets called in some cases and not in others.
This works:
xapian::WritableDatabase xapiandb testdir $::xapian::DB_CREATE_OR_OVERWRITE
rename xapiandb ""
This does not seem to:
xapian::WritableDatabase xapiandb testdir $::xapian::DB_CREATE_OR_OVERWRITE
set db xapiandb
$db -delete
Neither does this:
set db [xapian::WritableDatabase xapiandb testdir $::xapian::DB_CREATE_OR_OVERWRITE]
$db -delete
Or this:
set db [xapian::WritableDatabase xapiandb testdir $::xapian::DB_CREATE_OR_OVERWRITE]
rename $db ""
I'm not sure if it is a problem with the SWIG wrapping, but I think there are some subtle problems somewhere in there.
Michael |
| #40 | Alternative approach to tracking free blocks in btrees | Backend-Chert | | enhancement | 2004-09-09 |
|
Use chains of free blocks rather than a bitmap - then we can store many old revisions more cheaply (just the space they actually need, not a whole bitmap for each one too). Then readers use fcntl locking on a single byte corresponding to the revision they're using (bytes off the end of the file can be locked, and shared locks on read-only files are ok). Then a writer would only delete old revisions for which it could obtain an exclusive lock (otherwise it would preserve them). The Btree manager is generally written with multiple old revisions in mind, so this shouldn't be a huge project. |
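The locking part of this proposal can be sketched with POSIX record locks. This is only an illustration under stated assumptions: the function names are hypothetical, the byte offset is taken to be the revision number itself, and the example is POSIX-only.

```python
import fcntl
import os


def lock_revision_shared(fd, revision):
    """Reader: take a shared lock on the single byte at offset `revision`.

    The byte may lie beyond the end of the file; POSIX allows locking it.
    """
    fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB, 1, revision, os.SEEK_SET)


def try_lock_revision_exclusive(fd, revision):
    """Writer: try to lock the revision's byte exclusively without blocking.

    Returns True if the lock was obtained (no other process is reading that
    revision, so it could be deleted), False if it should be preserved.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, revision, os.SEEK_SET)
    except OSError:
        return False
    return True


def unlock_revision(fd, revision):
    """Release whatever lock this process holds on the revision's byte."""
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, revision, os.SEEK_SET)
```

POSIX record locks belong to a process, so the shared and exclusive locks only conflict when the reader and the writer are different processes.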
| #63 | Improve visibility annotations for the library | Library API | | enhancement | 2005-06-11 |
|
See http://gcc.gnu.org/wiki/Visibility. This should make the shared library much smaller, and a little faster! |
richard (4 matches)
| Ticket | Summary | Component | Milestone | Type | Created |
|---|---|---|---|---|---|
| Description |
| #48 | RangePostList | Library API | | enhancement | 2004-09-09 |
|
Provide explicit support for range searches, such as "RangePostList" - combine a sequence of adjacent terms... |
| #50 | SynonymPostList | Library API | 1.1.0 | enhancement | 2004-09-09 |
|
Add synonym postlists, which represent a set of postlists merged together such that each document that occurs in any of the sublists occurs in the synonym list. The termfrequency should ideally be the number of documents that one or more of the terms occurs in, but that's too expensive to find, so we'll need to estimate. Need to be able to take underlying postlists which aren't necessarily just postlists for single terms too. |
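The merge and the estimate described above might look like the following sketch. It is not Xapian's implementation: postlists are modelled as sorted lists of docids, and the estimate assumes the terms occur independently, which is only one possible choice.

```python
import heapq


def synonym_postlist(postlists):
    """Yield each docid occurring in any sorted sub-postlist, once, in order."""
    previous = None
    for docid in heapq.merge(*postlists):
        if docid != previous:
            yield docid
            previous = docid


def estimate_termfreq(termfreqs, doc_count):
    """Estimate how many documents contain at least one of the terms.

    Assumes independence between terms, then clamps the result to the
    bounds that always hold: at least the largest single term frequency,
    at most the sum of frequencies or the collection size.
    """
    if doc_count == 0:
        return 0
    p_none = 1.0
    for tf in termfreqs:
        p_none *= 1.0 - tf / doc_count
    estimate = round(doc_count * (1.0 - p_none))
    lower = max(termfreqs, default=0)
    upper = min(sum(termfreqs), doc_count)
    return min(max(estimate, lower), upper)
```

The clamping guarantees a sane answer even when the independence assumption is badly wrong, for example for true synonyms that rarely co-occur.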
| #58 | Convert from tinderbox to buildbot | Buildbot | | enhancement | 2004-11-25 |
|
Tinderbox isn't cleanly configurable in the way we want, so I've had to hack it around a lot. Buildbot looks a much better bet, as it's designed to allow modification by subclassing. |
| #138 | Tidy up output of epydoc when processing xapian python bindings | Xapian-bindings | 1.1.0 | enhancement | 2007-04-23 |
|
Now that the python bindings have documentation strings, I've tried running epydoc on the xapian module to generate an HTML format version of the documentation. This does a pretty good job (good enough that we should include it on the xapian website for each released version). However, there are several issues that could do with being tidied up, so I'll list them here.
1. epydoc seems to consider some of the methods (eg Document.termlist) which are added to the classes by "extra.i" as "private", and therefore doesn't display them by default. This should be changed so that they're visible by default.
2. Methods which aren't intended to be called externally should be hidden so that epydoc considers them "private". This could be done by renaming them. For example, Document.termlist_begin() shouldn't be public; renaming it to Document._termlist_begin() would make epydoc consider it to be private, and would prevent users calling it and not knowing how to use the returned iterator. In particular, this would reduce the likelihood of confusion between classes like MSetIter and MSetIterator, since it would be impossible to get an instance of MSetIterator without accessing a private method or attribute.
3. "epydoc xapian" reports several errors, due to markup in the documentation comments being invalid reStructuredText. This should be fixed - in many cases the fix will lie in doxy2swig.py, but in some cases the documentation comments in the C++ headers could do with fixing up. |
olly (28 matches)
| Ticket | Summary | Component | Milestone | Type | Created |
|---|---|---|---|---|---|
| Description |
| #46 | zero byte cleanliness in C# and Java bindings | Xapian-bindings | 1.1.0 | defect | 2004-09-09 |
|
Check for zero byte cleanness wherever strings are used. There are a number of c_str()s in the code, but I believe all in the core library are harmless at 2002-04-29. There may be other zero byte issues though. xapian-applications/dbtools also uses c_str() where it should probably use data() and length(). xapian-bindings hasn't been checked. |
| #158 | Query::MatchNothing and Query::MatchAll aren't wrapped | Xapian-bindings | 1.1.0 | defect | 2007-05-26 |
|
The obvious patch for this (below) doesn't work - in Python, you get a property of xapian.Query() added, which means that you have to instantiate xapian.Query to get at MatchNothing (ie, xapian.Query().MatchNothing works, but xapian.Query.MatchNothing doesn't). It should be easy enough to fix this with a python specific workaround though. PHP also doesn't work; I can't seem to access the resulting function at all, but this may be more due to my lack of PHP knowledge. I've not tested for other languages yet.
Index: xapian.i
===================================================================
--- xapian.i (revision 8676)
+++ xapian.i (working copy)
@@ -871,6 +871,9 @@
+ static Xapian::Query MatchAll;
+ static Xapian::Query MatchNothing;
+
|
| #170 | Windows structured exceptions produce RuntimeError with some MSVC versions | Xapian-bindings | | defect | 2007-06-19 |
|
We are having an issue on a testing machine. We are running stress tests on it, and Xapian eventually raises a RuntimeError, "unknown Xapian error". This is on Windows, using the Xapian Python bindings. Richard mentioned that exception.i has a catch(...) that catches all the unknown exceptions. He also mentioned that this might have something to do with Windows Structured Exceptions, and that Mark had investigated this previously so he thought it had been fixed. |
| #175 | xapian-compact functionality should be available from the C++ API | Library API | | defect | 2007-06-26 |
|
The ability to merge and compact databases efficiently would be a useful addition to the C++ API (and the language bindings), so it would be good to move most of the implementation of xapian-compact into the core, and change xapian-compact to just be a simple interface to this. The first step is probably to refactor xapian-compact, such that it's not mainly a single massive function: I've made a start on this, and the patch will be attached to this bug shortly. I'm happy to work on this, and don't think it's very much work, but Olly says that there are a few outstanding issues he needs to fix in xapian-compact, so I'll leave this bug assigned to him until then. |
| #193 | NumberValueRangeProcessor_apply not working in the PHP-bindings | Xapian-bindings | 1.1.0 | defect | 2007-08-20 |
|
The following returns an error, even though the right arguments have been passed:
$vrp = new XapianNumberValueRangeProcessor(0, "\$", true);
$vrp->apply((string)"240", (string)"500");
The error returned is:
Fatal error: Type error in argument 2 of NumberValueRangeProcessor_apply. Expected SWIGTYPE_p_std__string in xapian.php on line 1217
line 1217 being:
return NumberValueRangeProcessor_apply($this->_cPtr,$begin,$end);
Have tried changing that line to say (string)$begin and (string)$end, same result. |
| #195 | Flint writable databases should take a parameter indicating flush threshold. | Backend-Flint | | defect | 2007-09-11 |
|
Possibly, this should be a global parameter (ie, applies to all databases), or maybe it should be a database specific parameter (ie, set as a parameter to the "open" method for flint writable databases). In any case, the current way of setting a flush threshold (ie, setting an environment variable) is unsatisfactory, due to being difficult to set in some circumstances (or on some OSes), and it being easy for users to forget to export the variable, resulting in bogus bug reports. A parameter to a Xapian function would be a cleaner API for this. However, we intend to improve the handling of automatic flushes in future, such that the count of added documents won't be the crucial factor; instead, amount of memory used will be. We need to ensure we don't add a parameter to the API which will shortly become meaningless. |
| #201 | Attempting to create a NEAR search with two OR nodes gives assertion error | Library API | 1.1.0 | defect | 2007-09-21 |
|
I've observed this from python, but I expect it can occur from C++ too. The following python script gives an assertion error:
import xapian
a=xapian.Query('A')
b=xapian.Query('B')
c=xapian.Query(xapian.Query.OP_OR, a, b)
d=xapian.Query(xapian.Query.OP_NEAR, c, c)
The error is:
xapian.AssertionError: /home/richard/private/Working/xapian/xapian-core/api/omqueryinternal.cc:770: op == Xapian::Query::OP_NEAR || op == Xapian::Query::OP_PHRASE
This is because an attempt is made to flatten the query "(A OR B) NEAR (A OR B)", which isn't supported (I believe). It would be nice to fix this by supporting such searches, but meanwhile we shouldn't raise AssertionError from the API; a more explanatory exception should be returned instead. |
| #216 | Inconsistent return values for percentage weights | Matcher | 1.0.9 | defect | 2007-11-27 |
|
When results are being sorted primarily by an order other than relevance (e.g. sort_by_value()), the percentage values returned by the MSet object may be incorrect because they are calculated based on the document in the portion of the MSet requested which has the highest weight, instead of the document matching the query which has the highest weight. This issue has existed in all previous Xapian releases, as far as we can tell. There is currently no fix in progress, since it is probably not possible to fix without significant loss of efficiency, which would adversely affect users who aren't interested in the percentage scores. If you really need percentage scores in this situation, one workaround would be to first run the search using relevance order, asking for only the top document, and to remember the weight and percentage assigned to that document. Then, re-run the search in sorted order, and calculate the percentages yourself from the weights assigned to the results, using this information. A testcase demonstrating this is attached to this ticket. The issue is that in multimatch.cc, we calculate "best" by looking for the highest weighted document in the candidate mset, but when sorting by anything other than relevance, the highest weighted document may have been discarded already. It is hard to see how to fix this - one obvious approach would be to check every candidate document's weight before discarding it during the match process, and keep track of the docid of the document with the highest weight seen so far. However, we currently don't calculate the weight for all the documents we see (because we first check the document against the lowest document in the mset using mcmp), so this would force us to calculate the weights on documents we wouldn't otherwise need to calculate it for. Since the percentages aren't necessarily even wanted, this seems a shame. 
Perhaps a reasonable approach would be to add a flag on enquire which governed whether percentages were wanted or not; it would then be more reasonable to go to extra effort to keep track of the highest weighted document if the percentages were actually desired. |
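The arithmetic of the two-pass workaround described above can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes percentages scale linearly with weight; `top_weight` and `top_percent` would come from the first (relevance-ordered) search and `weights` from the second (sorted) search.

```python
def rescale_percentages(weights, top_weight, top_percent):
    """Convert raw weights into percentages relative to the overall best match.

    weights     -- weights of the results from the sorted search
    top_weight  -- weight of the top document from the relevance-ordered search
    top_percent -- percentage assigned to that top document
    """
    if top_weight <= 0:
        # No meaningful scale (e.g. a pure boolean query): report zero.
        return [0 for _ in weights]
    return [round(w / top_weight * top_percent) for w in weights]
```

For example, with a top weight of 4.0 scored at 100%, a result with weight 1.0 comes out at 25%.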
| #228 | Trying to build xapian package for dapper fails during fakeroot apt-get source -b xapian-core | Build system | | defect | 2008-01-18 |
|
Building ubuntu dapper or gutsy gives a similar failure for command fakeroot apt-get source -b xapian-core.
-I../languages -Ilanguages -I../queryparser -Wall -W -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wno-multichar -Wno-long-long -fno-gnu-keywords -Wundef -Wshadow -fvisibility=hidden -O2 -c ../api/editdistance.cc -o api/editdistance.o >/dev/null 2>&1 /bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I.. -I../common -I../include -I./include -I../languages -Ilanguages -I../queryparser
-Wno-multichar -Wno-long-long -fno-gnu-keywords -Wundef -Wshadow -fvisibility=hidden -O2 -c -o api/error.lo ../api/error.cc
-I../languages -Ilanguages -I../queryparser -Wall -W -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wno-multichar -Wno-long-long -fno-gnu-keywords -Wundef -Wshadow -fvisibility=hidden -O2 -c ../api/error.cc -fPIC -DPIC -o api/.libs/error.o
In file included from ../api/error.cc:25:
../common/safeerrno.h:25:3: error: #error You must #include <config.h> before #include "safeerrno.h"
make[3]: *** [api/error.lo] Error 1
make[3]: Leaving directory `/home/rhatch/xapian-core-1.0.5/xapian-core-1.0.5/build'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/rhatch/xapian-core-1.0.5/xapian-core-1.0.5/build'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/rhatch/xapian-core-1.0.5/xapian-core-1.0.5/build'
make: *** [build-stamp] Error 2
Build command 'cd xapian-core-1.0.5 && dpkg-buildpackage -b -uc' failed.
E: Child process failed
Prior to this failure the command below does not seem to work.
sudo apt-get build-dep xapian-core
Reading package lists... Done
Building dependency tree... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded. |
| #230 | C++-exceptions are not wrapped for Perl | Search::Xapian | 1.1.0 | defect | 2008-01-30 |
|
This results in exceptions being uncatchable, or catchable as string-exceptions. |
| #245 | All-stopword queries with two or more terms should ignore stopword list | QueryParser | | defect | 2008-03-07 |
|
Currently, if a single word query is parsed, and that word is a stopword, the stopwording is ignored. However, if a multiple word query is parsed, and all words are stopwords, the stopwording is applied (resulting in an empty query). If all the words in the query are stopwords, I think it may make sense to ignore the stopwording. However, even if we decide to apply the stopwording in this case, we should be consistent in our behaviour. Some examples, in python:
'Xapian::Query(foo:(pos=1))'
'Xapian::Query()'
'Xapian::Query()' Either the first parse_query() call should return Xapian::Query(), or the later ones should return non-empty queries. |
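One way to make the behaviour consistent, following the suggestion above to ignore the stopword list when every term is a stopword, is sketched below. This is plain Python over lists of terms, not the QueryParser implementation, and the function name is hypothetical.

```python
def apply_stopwords(terms, stopwords):
    """Drop stopwords from terms, unless that would leave nothing at all.

    The same rule is applied to one-term and multi-term queries, so an
    all-stopword query is never reduced to an empty query.
    """
    kept = [term for term in terms if term not in stopwords]
    return kept if kept else list(terms)
```

With this rule a query consisting only of stopwords keeps all of its terms, whether it has one word or several.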
| #254 | Setting QueryParser default_op to OP_NEAR doesn't set an explicit window size | QueryParser | 1.1.0 | defect | 2008-04-24 |
|
When searching with more than two parameters using the Boolean operator NEAR, it throws an error: Exception: Can't use NEAR/PHRASE with a subexpression containing NEAR or PHRASE Test case: http://myhealthcare.com/cgi-bin/search?q=american+actor+kevin&bool=near -Kevin Duraj |
| #284 | occasional DatabaseModifiedErrors | Backend-Flint | 1.0.9 | defect | 2008-07-23 |
|
I use xapian-core-1.0.7 with the corresponding perl bindings. I run a 1 writer/N reader setup, and I do reopen() a database-handle before each query. Nevertheless I occasionally get DatabaseModifiedErrors. This is what I found out so far:
* The error occurs after explicitly flushing my most frequently used index. It occurs less often if I do a sleep(1) after each explicit flush() before applying new changes (without flush) to the index, and it has never occurred so far with a sleep(4). This is my workaround.
* I already set XAPIAN_FLUSH_THRESHOLD to a large value (100000).
* I patched the xapian-core lib to log all calls of FlintDatabase::set_revision_number(), and the throw-points of the XapianModifiedErrors, which showed that the exception gets thrown in FlintTable::set_overwritten().
* I patched again to get the caller and found out that set_overwritten() got called by FlintTable::block_to_cursor(), which I patched again to expose the conditions:
if (REVISION(p) > REVISION(C_[j + 1].p)) {
fprintf(stderr, "set_overwritten: from block_to_cursor() %d > %d\n", REVISION(p), REVISION(C_[j + 1].p));
set_overwritten();
return;
}
and it turned out:
set_overwritten: from block_to_cursor() 10194 > 10192
terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'
(...)
set_overwritten: from block_to_cursor() 10195 > 10193
terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'
set_overwritten: from block_to_cursor() 10195 > 10193
terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'
(...)
set_overwritten: from block_to_cursor 10199 > 10197
terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'
set_overwritten: from block_to_cursor 10199 > 10197
terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'
I originally tested this with xapian-1.0.6, but it also occurs in 1.0.7. I run xapian on Ubuntu Linux 8.04 (Hardy) with a 2.6.24-19-server kernel and an ext3 filesystem. The machine is an IBM x3650 with 40 GB RAM, and a ServeRAID-8k Controller running a RAID 10 over 6 SAS disks. My most frequently used index (the one that throws the exceptions) contains about 850.000 documents, needs 11 GB of disk space, gets 5-15 updates per second, and about 20-25 search hits per second. I flush() this index every 10 minutes (which takes about 60-100 seconds + 4 seconds workaround delay ;-) |
| #288 | Use F_FULLFSYNC ioctl where supported | Backend-Chert | 1.1.1 | defect | 2008-08-07 |
|
I've recently noticed that, when performing an fsync, sqlite and mysql use a special ioctl on OS X which makes an effort to ensure that the disk's internal write buffers are flushed to the platters. Perhaps we should be using this ioctl too. http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html has some details about why this is needed. http://www.sqlite.org/cvstrac/fileview?f=sqlite/src/os_unix.c&v=1.195 contains the sqlite implementation; search for the "full_fsync" function. |
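A minimal sketch of the "use it where supported" pattern, written in Python rather than the C++ the library would need. On OS X the operation is exposed as the fcntl() command F_FULLFSYNC; the helper name `full_fsync` and the fallback policy are assumptions of this example.

```python
import fcntl
import os


def full_fsync(fd):
    """Flush fd to stable storage, preferring F_FULLFSYNC where available."""
    full = getattr(fcntl, "F_FULLFSYNC", None)
    if full is not None:
        try:
            # Asks the drive to flush its internal write buffers as well.
            fcntl.fcntl(fd, full)
            return
        except OSError:
            # Assumed policy: if the filesystem rejects the command,
            # fall back to a plain fsync rather than failing.
            pass
    os.fsync(fd)
```

On platforms whose fcntl module has no F_FULLFSYNC constant, the function simply behaves like os.fsync().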
| #22 | Eliminate common cases which cause a slow phrase search | QueryParser | 1.1.0 | enhancement | 2004-03-15 |
|
Some common punctuation (notably -) is treated as a word break when indexing, and as a phrase generator when searching. The problem is that many common cases end up creating phrase searches with one or two character terms which are very common, and these searches are slow with a big database. Examples include: e-mail, cd-r, d-i-y. This could perhaps be addressed by a smarter word identifying algorithm. When indexing and searching, we could decide never to generate a single character term in certain circumstances (maybe also apply the same rules for two character terms). So "e-mail" would be indexed as "email" not "e" and "mail". And similarly for searching. In general the extra conflation this gives seems useful (although email is apparently Dutch for enamel...) The query parser probably wouldn't apply this rule to quoted phrase searches - otherwise searching for "o freddled gruntbuggly" would search for "ofreddled gruntbuggly" and tragically not find any matches (I'm sure there are less esoteric examples - a search for "i robot" say...) |
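The rule suggested above can be sketched as follows. The function name and the exact trigger (any single-character component causes the whole hyphenated word to be joined) are assumptions; the ticket leaves the precise circumstances open, and quoted phrases would bypass this step.

```python
def split_hyphenated(word):
    """Turn a hyphenated word into index terms.

    If any hyphen-separated component is a single character, join all the
    components into one term instead of emitting very common short terms.
    """
    parts = [part for part in word.split("-") if part]
    if any(len(part) == 1 for part in parts):
        return ["".join(parts)]
    return parts
```

Applied to the examples from the ticket, "e-mail", "cd-r" and "d-i-y" each become a single term, while a word such as "well-known" still yields two terms.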
| #52 | Running postlists backwards | Backend-Flint | | enhancement | 2004-09-09 |
|
Ability to run a postlist backwards - it's chunked, so this is feasible (with a small change we can even decode the current encoding backwards!) This is useful as we can add articles in date order and do a boolean search running the posting lists backwards to do "sort by date" (which is good as it can terminate once we've enough matches). Need this for gmane. |
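The early-termination benefit can be illustrated with a small sketch. It models a postlist as an ascending list of docids added in date order and the boolean query as a predicate; both are simplifications assumed for this example, and the function name is hypothetical.

```python
def newest_matches(docids, matches, wanted):
    """Return up to `wanted` matching docids, newest first.

    docids  -- ascending docids, where a higher docid means a newer article
    matches -- predicate standing in for the boolean query
    wanted  -- number of results required
    """
    results = []
    if wanted <= 0:
        return results
    for docid in reversed(docids):
        if matches(docid):
            results.append(docid)
            if len(results) == wanted:
                break  # enough matches: older postings are never visited
    return results
```

Because iteration starts from the newest posting, the loop stops as soon as enough matches are found instead of scanning the whole list and sorting afterwards.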
| #59 | Compress chert postlist changes buffered in memory | Backend-Chert | 1.1.0 | enhancement | 2004-11-26 |
|
If we could somehow reduce the memory used by the postlist changes chert buffers, we could buffer more and/or let the OS have more spare memory for buffering disk blocks. That should allow indexing to run faster. However we need to compress in such a way that we can still implement Xapian::Database methods including the effects of the buffered changes. |
| #113 | QueryParser limitation/inconsistency | QueryParser | 1.1.0 | enhancement | 2007-03-15 |
|
Hi,
that the exquisite QueryParser (no irony intended) imposes some serious limitations for certain queries, as it does treat some characters specially, even when flags does not contain FLAG_PHRASE.
for we have a mixed setup of html documents and code. This includes several references to text in the word_word format. Unfortunately the QueryParser treats underscore as phrase generator, making impossible to search for terms indexed using whitespace separators, even when allterms() shows the term exists on the database.
as it does not matter what flags are used, in such cases where the query string contains any of the characters defined in is_phrase_generator(), the query will be automatically converted to a phrase search (note that these characters can't be changed).
(using " or any other previously defined character) and if this is not the case the QueryParser should not try to convert the query to anything else (except for the defined operations, OR, AND, etc).
characters) and treat every part of the word_word as a separate term, but that would also mean that "word word" would match as well, when it's not what you wanted.
further details or I can clarify anything else.
|
| #114 | Use libmagic or libextractor instead of own MIME mappings and extractions | Omega | | enhancement | 2007-03-29 |
|
Hello, I locally first modified omindex to use libmagic's MIME database, instead of hard coding the MIME type to file extension mapping. This ensures that the internally used MIME types are more consistent with accepted standard types. Then I went further and instead of using file extensions to determine type, used libmagic to fingerprint the files. This is slower, but ensures that the file actually is identified correctly even if the extension is wrong. Now I am using libextractor to actually extract the metadata from the file, instead of calling these external programs inside omindex based on the MIME type. Using libextractor greatly simplifies omindex. Is anyone interested in these modifications? |
| #145 | remote connection should pass 'writable' flag | Backend-Remote | 1.1.0 | enhancement | 2007-05-06 |
|
When a client application uses (say) xapian.remote_open() to connect to a server running in 'writable' mode, the server still opens the database for the connection in 'writable' mode, even though this was not requested by the caller.
writable and one for read-only - as usage of the writable server will lock out all other read-only requests, which would be unacceptable in some environments. The fix is not trivial as the protocol doesn't provide a way of providing connection-specific options. A solution would be to have the client send a MSG_KNOCK (?) message at connection with options (just this flag in the first instance) and the server could respond with its REPLY_GREETING if all is well. I understand this isn't going to make 1.0 though (well, unless you are really keen and would accept a patch if I could make one :) |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #150 | Enhancements to Unicode support | QueryParser | 1.1.0 | enhancement | 2007-05-13 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This bug is intended to just gather together enhancements we'd like to make to our Unicode support. Currently I'm aware of two: * Special cases for case conversion: http://www.unicode.org/Public/5.0.0/ucd/UCD.html#Case_Mappings and in particular: http://www.unicode.org/Public/5.0.0/ucd/SpecialCasing.txt * Normalisation (mostly combining accents): http://www.unicode.org/Public/5.0.0/ucd/UCD.html#Decompositions_and_Normalization I'd imagine we would probably want to target most such changes at 1.1.0, for reasons of database compatibility. There are probably cases where it would be reasonable to implement such changes sooner though - if we build a different database in a case where the existing behaviour is poor, or the difference isn't problematic for some other reason, say. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #151 | Use function attributes to mark functions as "const", "pure", and "nothrow" | Other | 1.1.0 | enhancement | 2007-05-13 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
GCC allows functions to be annotated with __attribute__((const)) if they "do not examine any values except their arguments, and have no effects except the return value", which allows the compiler to use CSE to eliminate calls to them with identical arguments. This would probably be very useful for Xapian::Unicode::get_category() for example. URL: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Function-Attributes.html#Function-Attributes |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #167 | Add mode to query parser to search for both stemmed and unstemmed forms | QueryParser | enhancement | 2007-06-13 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Now that we store both the stemmed and unstemmed forms of each word in the database, it might be nice to add a new stemming mode to the query parser which takes each word in the query and generates an "OR" query for it, with two parts; one being the unstemmed form and one being the stemmed form. This would mean that each query would match any document with words which match the stemmed form, but would give documents with the unstemmed form a higher weight. We might call this option "STEM_BOTH", or some better name that someone other than me can think of. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
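A minimal sketch of the "STEM_BOTH" idea from ticket #167, written as plain Python rather than against the Xapian API. The names `stem_both` and `toy_stem` are hypothetical, the tuple representation of a query is only for illustration, and the "Z" prefix on stemmed terms is an assumption about how stemmed forms are distinguished in the index.

```python
def stem_both(words, stem):
    """Build one OR pair per query word: (unstemmed form, stemmed form)."""
    query = []
    for word in words:
        exact = word.lower()
        stemmed = "Z" + stem(exact)  # assumption: 'Z' marks stemmed terms
        query.append(("OR", exact, stemmed))
    return query


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strips a trailing "ing" or "s".
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


print(stem_both(["Running", "cats"], toy_stem))
```

A document containing the exact word matches both branches of its OR pair, which is how the unstemmed form would end up with the higher weight described in the ticket.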
| #222 | omindex should make use of O_NOATIME where available | Omega | enhancement | 2007-12-18 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
On Linux >= 2.6.8, open() accepts an O_NOATIME flag which is intended for use by "indexing or backup programs". That means us! I have a patch for this, which I'll attach shortly. There's a wrinkle though - in some cases O_NOATIME will cause open to fail with EPERM and you need to retry the open call without O_NOATIME:
So for example, if we're indexing /usr/share/doc as a non-root user, we incur an extra syscall for each file - in this case it would be more efficient not to use O_NOATIME at all. We need to quantify this overhead, and (if it's an issue) look at how to reduce it. One thought I had was, on a per-directory basis, to give up on using O_NOATIME if we failed to open a file using it. Then we only incur one syscall per directory for a read-only tree. Various tweaks to this are possible - e.g. give up for this directory and all subdirectories. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
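The per-directory fallback suggested in ticket #222 could look roughly like the sketch below. It is written in Python for brevity and is not the patch mentioned in the ticket; `open_for_indexing` and the `_noatime_disabled` set are hypothetical names, and the code assumes EPERM is the only error worth retrying on.

```python
import errno
import os

# O_NOATIME only exists on Linux; fall back to 0 (no extra flag) elsewhere.
O_NOATIME = getattr(os, "O_NOATIME", 0)

# Directories where O_NOATIME has already failed (hypothetical cache).
_noatime_disabled = set()


def open_for_indexing(path):
    """Open path read-only, using O_NOATIME unless it already failed in this directory."""
    directory = os.path.dirname(os.path.abspath(path))
    if O_NOATIME and directory not in _noatime_disabled:
        try:
            return os.open(path, os.O_RDONLY | O_NOATIME)
        except OSError as e:
            if e.errno != errno.EPERM:
                raise
            # Probably not the file's owner: stop trying in this directory.
            _noatime_disabled.add(directory)
    # Plain open: either the retry after EPERM, or O_NOATIME is unavailable/disabled.
    return os.open(path, os.O_RDONLY)
```

With this shape, a read-only tree costs one failed open() per directory rather than one per file. Extending the cache to cover subdirectories would be a small change to the lookup.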
| #234 | add an option to specify whether filter terms of a given prefix should be ORed or ANDed together | Omega | enhancement | 2008-02-01 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi, the patch at http://people.debian.org/~tviehmann/list-search/xapian_omega_add_option_filter_defaultop.diff adds an option map to allow overriding the filter behaviour from OR to AND among the terms of a given prefix. For example, if first and last name are indexed with prefix A, I would add
to the query template in order to be able to handle first, last, or first and last name entered into the appropriate fields. Kind regards Thomas URL: http://people.debian.org/~tviehmann/list-search/xapian_omega_add_option_filter_defaultop.diff |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #235 | store the sort specification in the option map instead of separate variables | Omega | 1.1.0 | enhancement | 2008-02-01 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi, the patch at http://people.debian.org/~tviehmann/list-search/xapian_omega_make_sortstuff_options.diff moves the sort specification into the option map and removes the sort_* variables in omega.h. This makes the sort specification better accessible from omegascript. Quite possibly, it could be improved by doing the same to docid_order. The goal immediately at hand is to reduce the amount of changes to omega in the gmane/Debian list search patches, here to move the sort handling into the query templates. Kind regards Thomas URL: http://people.debian.org/~tviehmann/list-search/xapian_omega_make_sortstuff_options.diff |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #280 | Review storage of parameters in Query | Library API | 1.1.0 | enhancement | 2008-06-27 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Currently, Xapian::Query::Internal stores any "double" parameter value as a sortable_serialised string. There is a FIXME in the code for set_dbl_parameter() and get_dbl_parameter() (around line 976 of api/omqueryinternal.cc) saying: "FIXME: rework for 1.1.0". This hasn't been changed until now due to fear of breaking ABI compatibility. Instead, we should store double parameters as doubles in Query::Internal. While reorganising this, it might be worth making parameter storage a bit more general, and tidying it up. We currently have the following parameters stored in Query::Internal:
Two approaches seem plausible to me - firstly, we could define a union with the possible parameter types, and store the parameters in a list of these unions. Alternatively, we could subclass Query::Internal for each of the possible query types, and just store the appropriate parameters for each. The latter approach seems cleaner to me, and more likely to be flexible for future expansion of the available query operators, but I've not thought about this much yet. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #290 | Omega support for Office 2007 Word and Excel Documents | Omega | 1.1.0 | enhancement | 2008-08-26 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This patch uses the xmlparser and unzip to extract and process strings from *.xlsx and *.docx files. P.S. First time I have used svn to create a diff or Trac so forgive me if I've screwed something up :) |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
richard (20 matches)
| Ticket | Summary | Component | Milestone | Type | Created | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Description | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #169 | Standard build system should support windows with MSVC | Build system | defect | 2007-06-19 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The standard build system (ie, configure, autotools, etc) should be able to detect and use MSVC on a windows system. This will require some unix support stuff to be installed, but MSys & Mingw should suffice (ie, a full cygwin installation shouldn't be needed). Mark has made some progress in this direction, and Richard made a start at looking at it, so this bug is intended as a place to collaborate. Olly says: "I don't know if libtool's support has bitrotted though. If it has, there's a (very poorly named) wrapper called "wgcc" which translated gcc options to msvc" |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #178 | No remote backend support for: spelling correction, synonyms, metadata | Backend-Remote | 1.1.0 | defect | 2007-07-04 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The remote database was briefly feature complete, but it's fallen behind again - it doesn't support spelling correction, or synonym expansion. It may also not support the new matchspy stuff. We should add these in to it at some point. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #180 | Add support for CJK text to queryparser and termgenerator | QueryParser | defect | 2007-07-05 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Some code to do this kind of tokenisation is now available at http://code.google.com/p/cjk-tokenizer/ which should probably be used as the basis for supporting this in Xapian. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #182 | Match decider should be set on enquire object, not as get_mset() param | Library API | defect | 2007-07-06 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Currently, match deciders (and match spies) are specified by passing them as get_mset() parameters. It would be neater, and reduce the excessive number of parameters passed to get_mset(), if there was a "set_match_decider" function, instead of these parameters. We could also use this style of API to support things like multiple match deciders, where each would be called in sequence, allowing only those documents which pass all deciders to be returned. This would be useful if only a limited set of predefined match deciders were available (for example, in a remote search, or when calling from Python), and a combination of restrictions was desired. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
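The "multiple match deciders called in sequence" idea from ticket #182 can be sketched as a small combinator. This is plain Python, not the Xapian API: `AllOf` is a hypothetical name, deciders are modelled as callables taking a document and returning a bool, and documents are modelled as dicts.

```python
class AllOf:
    """Combine several deciders; a document passes only if every decider accepts it."""

    def __init__(self, *deciders):
        self.deciders = deciders

    def __call__(self, doc):
        # all() stops at the first decider that rejects the document.
        return all(decider(doc) for decider in self.deciders)


# Two hypothetical predefined deciders.
def is_english(doc):
    return doc.get("lang") == "en"


def is_recent(doc):
    return doc.get("year", 0) >= 2007


decider = AllOf(is_english, is_recent)
docs = [{"lang": "en", "year": 2008}, {"lang": "en", "year": 2001}, {"lang": "fr", "year": 2008}]
print([d for d in docs if decider(d)])
```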
| #183 | Remote backend should support use of Xapian::MatchDecider | Backend-Remote | defect | 2007-07-06 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Currently, Enquire::register_match_decider() simply stores the values passed to it in the internals of the Enquire object. These values never get used. Either register_match_decider() should be removed, or (more probably) the values should be used in the remote match case to allow match deciders registered with the server to be used. For now, I've added a note in the documentation comment that this method effectively does nothing. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #185 | Deadlocks with apache mod_python and mod_wsgi | Xapian-bindings | defect | 2007-07-11 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Summary of current known status
mod_python: Calling any Xapian methods or functions is likely to cause dead-lock unless you set this option in the Apache configuration section for all mod_python scripts which use Xapian: PythonInterpreter main_interpreter
You may also need to use Python >= 2.4 (due to problems in Python 2.3 with the APIs the code uses, see http://issues.apache.org/jira/browse/MODPYTHON-217). Even with main_interpreter and Python >= 2.4, calling from Xapian's C++ code back to Python code won't work properly (this means that you can't subclass Xapian objects in Python). This is apparently an issue with mod_python.
mod_wsgi: You'll need to set: WSGIApplicationGroup %{GLOBAL}
For details see: http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIApplicationGroup and http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Python_Simplified_GIL_State_API The mod_wsgi developers say this should be sufficient, and you should be able to subclass Xapian objects in Python. If you encounter problems, please talk to us or the mod_wsgi developers so we can investigate. Originally reported on the mailing list: http://thread.gmane.org/gmane.comp.search.xapian.general/4486 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #191 | Possible license conflict with the PHP bindings | Xapian-bindings | defect | 2007-08-17 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I am reporting this on behalf of Adel Gadllah <adel.gadllah@…>, who is looking into packaging the bindings for Fedora 7. The PHP license and the GPL aren't compatible but xapian-bindings links PHP licensed and GPL licensed code. Quotes from the conversation on IRC with Fedora developers: "the problem i'm seeing is that xapian-bindings has bits of code that are GPLv2+ and PHP" "and it is merging them together into one .cc file and compiling _that_" "except, the GPLv2 and PHP are incompatible" "BOOM" "tell upstream that they can't compile PHP code with GPL* code" We need this solved first before continuing with building the other bindings in Fedora. Fabrice |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #198 | Add support for multiple values in each value slot in a Document. | Backend-Flint | 1.1.0 | defect | 2007-09-17 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Currently, the value stored in a slot in a Document is a single string. It would sometimes be useful to be able to store multiple strings in the slot. For example, when using a value slot to store the set of facets that a document is relevant to, a given document may be relevant to multiple values. Also, if storing the set of tags matching a document, for use when generating a tag cloud, we want to be able to store multiple tags for each document. However, we also need to preserve the existing API, and ensure that database formats are compatible. Some discussion from IRC follows:
Richard Boulton: Do you think we could convert values as stored in databases currently to allow multiple values, without breaking backwards compatibility?
ojwb: probably
ojwb: if only by checking the flint version
Richard Boulton: Hmm - if an old flint version is used to create a database, and insert some values, it could be hard to then modify that database with a new version of flint.
Richard Boulton: Unless we have a pass through the whole database to rewrite the values.
ojwb: well, you could just disable the ability to add multiple values
Richard Boulton: Oh, no, its easy to store this.
Richard Boulton: Each value entry consists of a list of "valueno, entry" items.
Richard Boulton: (serialised, of course)
ojwb: or start the newly encoded ones in a way which is invalid
ojwb: oh, just duplicate?
Richard Boulton: Yep, there doesn't seem to be any thing to stop that.
ojwb: so the only question is if it's actually desirable!
Richard Boulton: They're kept in sorted order.
Richard Boulton: And the existing get_value() just returns the first of a particular valueno found.
ojwb: that's nice then
Richard Boulton: So it would even be backwards compatible for reading purposes. (Old versions of xapian just wouldn't see the duplicate values)
ojwb: rewriting would mess up a document with multiple values, wouldn't it?
Richard Boulton: Not entirely.
Richard Boulton: add_value adds on the values at the end of the list.
Richard Boulton: without checking them.
ojwb: but aren't the unserialised into a map in Xapian::Document?
Richard Boulton: Oh. Ah.
Richard Boulton: Yes, so getting a document out and then inserting it again would lose the duplicates.
Richard Boulton: But that's a pretty nice way to degrade.
ojwb: yeah, it's not too bad
Richard Boulton: It would be nicer if we'd named Document::add_value() as document::set_value()
Richard Boulton: We can't change the behaviour of add_value() now, though: I suppose we could add Document::append_value()
Richard Boulton: And leave Document::remove_value() as removing all values with a given number.
Richard Boulton: Document::get_value() would return the first value for a given valueno.
Richard Boulton: And we could add Document::get_values() which gets a list of all the values for a given valueno.
Richard Boulton: Hmm - I wonder if the list of values for each valueno should be kept in insertion order. Or sorted in some way (binary sort, I would think).
ojwb: It shouldn't sort them I think
Richard Boulton: I think just in insertion order.
Richard Boulton: *snap*
ojwb: because you want a "primary version"
Richard Boulton: That's true.
ojwb: which is used for sorting, etc
ojwb: I'm not completely sure this is a good plan, but it seems to have merit
Richard Boulton: Yes. That's the main thing I was unhappy about StringListSerialiser for - you couldn't sensibly sort on the resulting values. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #213 | Expose statistics to user defined Xapian::Weight subclasses | Library API | 1.1.0 | defect | 2007-11-24 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Currently, the Xapian::Weight::Internal class (which is, as of last night, the class holding the statistics for the whole collection used by the weight objects) is not publicly visible. This means that it would be impossible, for example, for a user to write a weighting class equivalent to, say, the BM25Weight class, using the public API, because the statistics aren't available.
Xapian::Weight::Internal class is now nearly clean enough that it could reasonably be made public, allowing custom weighting classes access to all the statistics currently available. We might want to make the termfreq and reltermfreq members private, since they're likely to be accessed mainly through the accessor functions anyway. Also we might want to combine them into a single map with entries holding both the termfreq and the reltermfreq, since it's usual to want to access both the termfreq and the reltermfreq for a particular term at the same time. Also, we might want to call the class Xapian::Stats, instead of Xapian::Weight::Internal, to reflect the Stats being part of the public API, but this would require an ABI change, so would have to wait for 1.1.0. (We could keep the API compatible by making Xapian::Weight::Internal a typedef for Xapian::Stats, I think; currently Stats (with no namespace) is a typedef for Xapian::Weight::Internal). |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #229 | Stub databases should be read with msvc_posix_open | Other | defect | 2008-01-28 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Currently, stub databases are read using a standard C++ ifstream. (See backends/database.cc, function open_stub()) This works fine, except that if a user (or the database replication code) tries, on Windows, to atomically rename a new stub db file over an existing one, it will receive an error if the old stub DB file was open. This can be avoided if we instead use msvc_posix_open() (or just open() on unix) in open_stub() to get a file handle for the stub database, and access it using C file-handling routines. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #236 | Implement automated tests of concurrent db replication and modification | Backend-Flint | defect | 2008-02-05 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Currently, there is no automated test of the behaviour when the replication function is doing a full copy of a database which gets modified while the copy is in progress. I've done a manual test of this, so I'm moderately confident it works right, but I can't work out how to do a reliable automated test of it... at least, not without hacking a big sleep (or even a condition) into the database copying code, to allow me to be sure of getting some modifications done in the middle of it. Any suggestions appreciated. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #243 | common/fileutils.cc needs tests | Test Suite | 1.1.0 | defect | 2008-03-05 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This file was added for use by the replication stuff, and handles parsing and some simple manipulation of paths. This is particularly tricky for windows paths, unfortunately, and needs proper testing. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #268 | Review ValueWeightPostingSource, possibly replace with a query operator | Library API | 1.1.0 | defect | 2008-05-12 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
External PostingSources have at least two annoying limitations (don't work with remote databases, don't work well with multi databases). The newly added ValueWeightPostingSource simply reads a value slot, returns documents with a non-empty value, and returns the weight obtained by applying sortable_unserialise to the slot. Therefore, it could be implemented instead by a query operator, which would be similar to the existing OP_VALUE_... operators. This would make the feature available with remote databases and multi databases. There may be a cleaner alternative which we haven't thought of yet, too. Marking this for 1.1.0, since ValueWeightPostingSource isn't yet in the API for any release, and we should remove it before making a release if we're going to remove it at all. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #278 | When changesets are being generated, old changesets aren't cleaned up | Backend-Chert | 1.1.0 | defect | 2008-06-23 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Currently, changesets are generated when the "XAPIAN_MAX_CHANGESETS" environment variable is set to a non-empty value. However, they are never removed. Whenever a changeset is generated, the number of changesets around should be checked, and old changesets should be removed if too many old changesets exist. Alternatively (or as well), a different criterion might be useful for the changesets: it might be useful to be able to set an absolute limit on the total size of the changesets, or perhaps, a limit on the total size of the changesets as a proportion of the total database size. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
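The two cleanup criteria from ticket #278 (a maximum count and a cap on total size) can be combined in one pruning pass. The sketch below is plain Python under stated assumptions: `changesets_to_remove` is a hypothetical helper, changesets are modelled as `(revision, size_in_bytes)` pairs, and it assumes that once one changeset is dropped all older ones should go too, so that the kept changesets form an unbroken run of the newest revisions.

```python
def changesets_to_remove(changesets, max_count=None, max_total_size=None):
    """Return the revisions to delete, oldest first.

    changesets: iterable of (revision, size_in_bytes) pairs.
    """
    remove = []
    kept_count = 0
    kept_size = 0
    # Walk from newest to oldest, keeping changesets until a limit is hit.
    for revision, size in sorted(changesets, reverse=True):
        over_count = max_count is not None and kept_count >= max_count
        over_size = max_total_size is not None and kept_size + size > max_total_size
        if remove or over_count or over_size:
            # Once a limit is hit, everything older is removed as well.
            remove.append(revision)
        else:
            kept_count += 1
            kept_size += size
    return sorted(remove)


history = [(1, 100), (2, 100), (3, 100), (4, 100)]
print(changesets_to_remove(history, max_count=2))
print(changesets_to_remove(history, max_total_size=250))
```

A limit expressed as a proportion of the database size would reduce to the same function by computing `max_total_size` from the current database size before the call.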
| #104 | Wildcard queries should use synonym instead of OR | QueryParser | 1.1.0 | enhancement | 2006-12-13 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
When the synonym query operator, and synonym postlists, are implemented, the queryparser should build wildcard queries using the synonym operator instead of the OR operator. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #107 | We should have an automated performance test suite | Test Suite | enhancement | 2007-01-02 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We need to be able to keep track of how changes to the code affect the performance (ie, speed / resource usage) of Xapian. In particular, we should be able to test how fast a standard set of data is indexed and searched, simply by running a single command (ideally, integrated into the build system - eg, "make speedcheck"). I have the beginnings of such a system, in the shape of some python code which builds a wikipedia index. I'm starting this bug to keep track of progress on building this. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #128 | Allow queryparser to treat some prefixes as literal text | QueryParser | 1.1.0 | enhancement | 2007-04-12 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
By default, the query parser splits words at spaces and applies lower-casing, stemming, and other normalisation to generate terms. I believe that it should be possible to override the query parser's default behaviour for fields with a given set of prefixes, such that the query parser will treat some terms as literal text, allowing any character to occur in the term (including spaces and quotes), and not applying stemming or other normalisation to the term. My thinking is that this can be implemented by adding a third prefix type (which I've called "EXACT_TEXT" for want of a better name), which causes the query parser to put all the characters following the prefix until the next space or ')' into the term (like terms with a "BOOL_FILTER" prefix type). The terms so generated are then included in the query structure in the same way as "FREE_TEXT" terms - ie, they obey surrounding boolean operators, and '+' and '-' prefixes. In order to allow spaces (and ')' characters) in the terms, the query parser should support basic backslash escaping for the contents of such fields. I have a patch which implements this that I'll attach to this bug report shortly. The patch has a few test cases (but more are needed for such a new feature), and I've not written any documentation for it yet. I know that Sidnei needs this for something he's working on, and I'd be delighted if we managed to get this into 1.0 since I'm going to have to maintain it until it gets committed, but it needs thorough review before being committed and timescales for 1.0 may not allow this. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
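The term-reading rule described in ticket #128 (consume characters up to the next unescaped space or ')', with backslash escaping) can be illustrated with a short scanner. This is a plain Python sketch, not the patch mentioned in the ticket; `read_exact_term` is a hypothetical name, and treating a trailing lone backslash as a literal character is an assumption.

```python
def read_exact_term(text, pos=0):
    """Return (term, next_pos), reading from pos up to an unescaped space or ')'."""
    chars = []
    i = pos
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Backslash escape: take the next character literally.
            chars.append(text[i + 1])
            i += 2
            continue
        if ch in " )":
            break  # unescaped terminator ends the term
        chars.append(ch)
        i += 1
    return "".join(chars), i


print(read_exact_term(r"New\ York) AND x"))
```

The returned position points at the terminator, so a surrounding parser can carry on with its normal handling of ')' or whitespace.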
| #173 | Bindings should have an explicit WritableDatabase::close() method | Xapian-bindings | enhancement | 2007-06-22 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
In garbage collected languages, it is hard to ensure that a WritableDatabase object has been closed, because this requires ensuring that no objects still hold a reference to it. To make this easier, WritableDatabases should have an explicit close() method, which would delete the underlying C++ object. After this method has been called, all other methods on the WritableDatabase object in the bindings would be invalid. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #227 | Implement database replication system | Other | 1.1.0 | enhancement | 2008-01-18 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have a setup where I would like to be able to perform index updates on one master database, and then replicate this database to multiple client machines for searching. I've experimented with using an NFS setup for this, with the database kept local on the index server and mounted remotely on the search clients, hoping that the client machines would keep enough of the database cached that the network traffic would not slow down searches too much. However, this method doesn't work satisfactorily because the NFS protocol doesn't allow NFS clients to get information about file updates other than by polling the mtime of a file: therefore, whenever the index is updated, any cached pages from the database are discarded. This leads to many very slow searches. For now, I'm looking at setting up a system to take snapshots of databases using filesystem features (eg, the snapshot functionality provided by ZFS) and then using xdelta to calculate the differences between the databases, transferring the differences manually, and then applying the differences to the database on the search machines. However, this approach has two major drawbacks: firstly, it depends on filesystem specific features (to take filesystem snapshots - a standard file copy could be used, but this would have poor cache performance, which is exactly what we're trying to avoid). Secondly, it requires the whole database to be traversed on the index machine to calculate the binary diffs. This is undesirable because it imposes unnecessary load on the index machine. Instead, I would like to have a hook into flint which writes out a list of the modified btree pages, so that these can then be distributed to the search servers. If this information was written to a log file, together with the points at which fsync were called, and with details of the changes made to the base files, this log file could be transferred to the search machines, and could be replayed there, with minimal work required there. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #189 | Add a place for translations of the documentation to the source tree | Other | task | 2007-08-09 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yung-chung Lin has translated the intro_ir.html document into zh_TW. It would be good to have a place in the source tree to put such documents. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
