Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#688 closed defect (invalid)

store value

Reported by: matf
Owned by: Olly Betts
Priority: normal
Milestone:
Component: Other
Version: 1.2.21
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

backends/chert/chert_values.cc:

    static const size_t CHUNK_SIZE_THRESHOLD = 2000;

    if (tag.size() >= CHUNK_SIZE_THRESHOLD) write_tag();

    #define CHERT_DEFAULT_BLOCK_SIZE 8192

Can I modify the function append_to_stream as follows?

    void append_to_stream(Xapian::docid did, const string & value) {
        Assert(did);
        if (tag.size()+sizeof(did)+1+value.size() >= table->block_size) write_tag();

        if (tag.empty()) {
            new_first_did = did;
        } else {
            AssertRel(did,>,prev_did);
            pack_uint(tag, did - prev_did - 1);
        }
        prev_did = did;
        pack_string(tag, value);
    }

The utilization ratio of the value table is lower than that of the term table:

    454M termlist.DB
    342M termlist.DB.tar
    748M postlist.DB
    218M posilist.DB.tar
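For readers without the chert source at hand, the chunking that append_to_stream performs can be sketched in isolation. This is only a simplified stand-in: the names ValueChunker, Chunk, append, flush and chunks are hypothetical, the base-128 varint is a generic encoding assumed for illustration rather than a copy of Xapian's pack_uint and pack_string, and nothing here touches a real B-tree table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one stored value chunk.
struct Chunk {
    unsigned first_did;  // docid of the first entry (kept outside the tag)
    std::string tag;     // encoded (docid gap, value) entries
};

// Hypothetical, simplified model of the chunking in append_to_stream.
class ValueChunker {
    std::size_t limit;        // flush once the pending tag reaches this size
    unsigned first_did = 0;
    unsigned prev_did = 0;
    std::string tag;
    std::vector<Chunk> chunks_;

    // Generic base-128 varint (an assumption made for this sketch).
    static void pack_uint(std::string& s, std::size_t v) {
        while (v >= 128) {
            s += static_cast<char>((v & 0x7f) | 0x80);
            v >>= 7;
        }
        s += static_cast<char>(v);
    }

  public:
    explicit ValueChunker(std::size_t flush_limit) : limit(flush_limit) {}

    void append(unsigned did, const std::string& value) {
        assert(did > prev_did);  // docids are non-zero and strictly increasing
        // The size check runs before the new entry is added.
        if (tag.size() >= limit) flush();
        if (tag.empty()) {
            first_did = did;
        } else {
            pack_uint(tag, did - prev_did - 1);  // store the gap, not the docid
        }
        prev_did = did;
        pack_uint(tag, value.size());  // length prefix, then the raw bytes
        tag += value;
    }

    // Emit the pending chunk, if there is one.
    void flush() {
        if (tag.empty()) return;
        chunks_.push_back({first_did, tag});
        tag.clear();
    }

    const std::vector<Chunk>& chunks() const { return chunks_; }
};
```

Constructing `ValueChunker` with 2000 models the current CHUNK_SIZE_THRESHOLD. Because the check runs before the new entry is appended, a chunk can end up somewhat larger than the limit. The proposal above amounts to replacing that fixed limit with a test against table->block_size.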

Change History (3)

comment:1 by matf, 9 years ago

Sorry, maybe I was wrong. CHUNK_SIZE_THRESHOLD = 2000 is there to improve value lookup speed.

in reply to:  description comment:2 by matf, 9 years ago

Resolution: invalid
Status: new → closed

comment:3 by Olly Betts, 9 years ago

If I follow what you had in mind, I don't think it really works.

B-tree items larger than 1/4 of the block size will get split by the lower level B-tree code.
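To put rough numbers on this for the 8192-byte default block size quoted in the description, here is a small sketch of the arithmetic. It assumes the limit is exactly block_size / 4; the real B-tree code also has per-item overhead and the key to account for, so the true limit for the tag is somewhat lower. The helper names max_unsplit_item_size and pieces_needed are hypothetical and not part of Xapian.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: largest item stored without being split, under the
// simplifying assumption that the limit is exactly a quarter of the block.
constexpr std::size_t max_unsplit_item_size(std::size_t block_size) {
    return block_size / 4;
}

// Hypothetical helper: how many pieces an item of item_size bytes would be
// stored as under that same simplified limit.
inline std::size_t pieces_needed(std::size_t block_size, std::size_t item_size) {
    assert(block_size >= 4);
    const std::size_t limit = max_unsplit_item_size(block_size);
    if (item_size == 0) return 1;
    return (item_size + limit - 1) / limit;  // ceiling division
}
```

Under this assumption the limit for an 8192-byte block is 2048 bytes, so a chunk of about 2000 bytes stays in one piece, whereas a tag allowed to grow to nearly a full block would be stored as about four pieces.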

Also, tailoring the chunk size to exactly the remaining space in the block when it is first added doesn't help in the face of future changes to the block. It might be good for the chunking to be aware of the remaining space in the block, but I think it'll have to be more sophisticated than this to really work well in general.

> the value table utilization ratio of less than the term table.
> 454M termlist.DB
> 342M termlist.DB.tar
> 748M postlist.DB
> 218M posilist.DB.tar

I'm not clear what you think the size of a tar file indicates here, but if you want to see the actual block utilisation, you can get that from xapian-check t.

Also, the postlist table stores chunks for both postlists and values.
