#491 closed defect (invalid)
calling replace_document() twice doubles the size of the database
| Reported by: | maad | Owned by: | Olly Betts |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | Other | Version: | |
| Severity: | major | Keywords: | |
| Cc: | | Blocked By: | |
| Blocking: | | Operating System: | All |
Description
Calling replace_document() twice with the same key doubles the size of the database on disk (unexpected). However, a search returns only one document (expected), and subsequent replace_document() calls do not change the database size (expected). This is the code to reproduce it:
    <?php
    include "xapian.php";

    $value = "thisistest";
    myreplace($value);
    passthru("ls -l replace.db");
    myreplace($value);
    passthru("ls -l replace.db");
    myreplace($value);
    passthru("ls -l replace.db");

    function myreplace($value) {
        try {
            // Open the database for update, creating a new database if necessary.
            $database = new XapianWritableDatabase("replace.db", Xapian::DB_CREATE_OR_OPEN);
            $indexer = new XapianTermGenerator();
            $stemmer = new XapianStem("english");
            $indexer->set_stemmer($stemmer);
            $doc = new XapianDocument();
            $doc->add_term("Q$value");
            $indexer->set_document($doc);
            $indexer->index_text($value);
            //$database->add_document($doc);
            $database->replace_document("Q$value", $doc);
            $database = Null;
        } catch (Exception $e) {
            print $e->getMessage() . "\n";
            exit(1);
        }
    }
    ?>
Output:
    total 84
    -rw-r--r-- 1 alex alex     0 2010-06-28 20:30 flintlock
    -rw-r--r-- 1 alex alex    12 2010-06-28 20:30 iamflint
    -rw-r--r-- 1 alex alex    13 2010-06-28 20:30 position.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 position.baseB
    -rw-r--r-- 1 alex alex  8192 2010-06-28 20:30 position.DB
    -rw-r--r-- 1 alex alex    13 2010-06-28 20:30 postlist.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 postlist.baseB
    -rw-r--r-- 1 alex alex  8192 2010-06-28 20:30 postlist.DB
    -rw-r--r-- 1 alex alex    13 2010-06-28 20:30 record.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 record.baseB
    -rw-r--r-- 1 alex alex  8192 2010-06-28 20:30 record.DB
    -rw-r--r-- 1 alex alex    13 2010-06-28 20:30 termlist.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 termlist.baseB
    -rw-r--r-- 1 alex alex  8192 2010-06-28 20:30 termlist.DB
    -rw-r--r-- 1 alex alex    13 2010-06-28 20:30 value.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 value.baseB
    -rw-r--r-- 1 alex alex  8192 2010-06-28 20:30 value.DB
    total 124
    -rw-r--r-- 1 alex alex     0 2010-06-28 20:30 flintlock
    -rw-r--r-- 1 alex alex    12 2010-06-28 20:30 iamflint
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 position.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 position.baseB
    -rw-r--r-- 1 alex alex 16384 2010-06-28 20:30 position.DB
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 postlist.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 postlist.baseB
    -rw-r--r-- 1 alex alex 16384 2010-06-28 20:30 postlist.DB
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 record.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 record.baseB
    -rw-r--r-- 1 alex alex 16384 2010-06-28 20:30 record.DB
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 termlist.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 termlist.baseB
    -rw-r--r-- 1 alex alex 16384 2010-06-28 20:30 termlist.DB
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 value.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 value.baseB
    -rw-r--r-- 1 alex alex 16384 2010-06-28 20:30 value.DB
    total 124
    -rw-r--r-- 1 alex alex     0 2010-06-28 20:30 flintlock
    -rw-r--r-- 1 alex alex    12 2010-06-28 20:30 iamflint
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 position.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 position.baseB
    -rw-r--r-- 1 alex alex 16384 2010-06-28 20:30 position.DB
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 postlist.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 postlist.baseB
    -rw-r--r-- 1 alex alex 16384 2010-06-28 20:30 postlist.DB
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 record.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 record.baseB
    -rw-r--r-- 1 alex alex 16384 2010-06-28 20:30 record.DB
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 termlist.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 termlist.baseB
    -rw-r--r-- 1 alex alex 16384 2010-06-28 20:30 termlist.DB
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 value.baseA
    -rw-r--r-- 1 alex alex    14 2010-06-28 20:30 value.baseB
    -rw-r--r-- 1 alex alex 16384 2010-06-28 20:30 value.DB
Change History (3)
comment:1 by , 14 years ago
| Resolution: | → invalid |
|---|---|
| Status: | new → closed |
comment:2 by , 14 years ago
Wouldn't it be possible to check whether the data is exactly the same and then do nothing? It is kind of awkward behavior, at least in the case when the record does not change at all.
comment:3 by , 14 years ago
We do actually check that. But to commit the table, we need a new root block. So if you replace a million documents with themselves, you should see the table sizes increase by a single block (8KB by default). So it only "doubles the size" if your database tables are only 8KB to start with.
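To put the "single block" point in numbers, here is a small arithmetic sketch in plain PHP. It does not call Xapian at all; `growth_percent()` is a hypothetical helper, and the only figure taken from this ticket is the 8KB default block size.

```php
<?php
// Toy arithmetic only: this is not Xapian code.
const BLOCK_SIZE = 8192; // default block size mentioned in this ticket (8KB)

// Hypothetical helper: percentage growth of a table file that currently
// occupies $blocks blocks when exactly one more block is appended.
function growth_percent(int $blocks): float
{
    $before = $blocks * BLOCK_SIZE;
    $after = ($blocks + 1) * BLOCK_SIZE;
    return ($after - $before) / $before * 100.0;
}

// A one-block table doubles; larger tables barely change.
foreach ([1, 100, 100000] as $blocks) {
    printf("%6d blocks: +%d bytes (%.3f%%)\n",
        $blocks, BLOCK_SIZE, growth_percent($blocks));
}
```

The one-block case is the situation in the report: a table that starts at 8KB grows by 100%, while a table of any realistic size grows by a negligible fraction.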
The new Btree manager I'm working on (see browser:branches/brass-btree for the code) reuses the existing root block in this case.
It would be nice to reuse the root block in the current backends too, but when I looked briefly at trying to avoid committing unmodified tables some years ago, the obvious simple change didn't work, so I think something somewhere assumes that the revision number stored in the root block is that of the latest revision.
If you want to work on a patch to achieve this for the current chert backend, feel free - chert_table.cc and chert_database.cc are the files to look at. If the patch looks sane and the test coverage for it is good, I would support applying it.
This isn't a bug, just how the Btree files work.
Modifications are made by copying modified blocks so that we can roll back changes. So after the second operation, each table contains 2 blocks, one of them unused. Subsequent repetitions reuse the spare block. If you want to eliminate the unused space, xapian-compact will make a copy of the table with unused space minimised.
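The copy-then-reuse behaviour described here can be illustrated with a toy model in plain PHP. This is only a sketch under simplifying assumptions (one block per table, one spare-block list), not Xapian's actual Btree code; the `ToyTable` class and its method names are hypothetical.

```php
<?php
// Toy model of copy-on-write commits in a single table file.
// Not Xapian's implementation: class and method names are hypothetical.
final class ToyTable
{
    const BLOCK_SIZE = 8192;

    /** @var int number of blocks the file has grown to */
    private $blocksInFile = 0;
    /** @var int[] block numbers that exist in the file but are unused */
    private $freeBlocks = [];
    /** @var int|null block holding the current root, null before first commit */
    private $root = null;

    // Commit a new revision: the new root goes into a different block so
    // that the previous revision stays intact and can be rolled back to.
    public function commit(): void
    {
        $old = $this->root;
        // Reuse a spare block if one exists, otherwise extend the file.
        $this->root = $this->freeBlocks
            ? array_pop($this->freeBlocks)
            : $this->blocksInFile++;
        if ($old !== null) {
            $this->freeBlocks[] = $old; // old root becomes the spare block
        }
    }

    public function fileSize(): int
    {
        return $this->blocksInFile * self::BLOCK_SIZE;
    }
}

$table = new ToyTable();
$sizes = [];
for ($i = 0; $i < 3; $i++) {
    $table->commit();
    $sizes[] = $table->fileSize();
}
echo implode(", ", $sizes), "\n"; // 8192, 16384, 16384
```

In this model the file grows only on the second commit and then stays put, which is the same 8192, 16384, 16384 pattern the .DB files show in the `ls -l` output of the report.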