#543 closed defect (incomplete)
excessive postlist.DB reads on flush
Reported by: | fulltext | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Backend-Chert | Version: | 1.2.5 |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Problem: as the database grows, commits become unbearably slow, with iowait maxing out - most likely due to postlist.DB reads.
The issue has been tested on:
- 2.8GHz P4 HT, 1GB RAM
- sda: Ubuntu 10 + Chert database except postlist.DB
- sdb: postlist.DB file only (symlinked)
Test results, including iostat output, follow:
Starting building the database from scratch (nice write speed, no reads in sdb at all):

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  21.50  0.00  3552.00  0  7104
sdb  133.00  0.00  23060.00  0  46120

First appearance of reads in sdb:

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  11.00  0.00  2416.00  0  4832
sdb  188.50  72.00  5368.00  144  10736

number of documents = 656208
average document length = 17
document length lower bound = 3
document length upper bound = 52
highest document id ever used = 656208
172680

More reads (now matching or exceeding writes):

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  4.00  0.00  124.00  0  248
sdb  223.50  3200.00  1512.00  6400  3024

number of documents = 1344294
average document length = 18
document length lower bound = 3
document length upper bound = 52
highest document id ever used = 1344294
363296

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  356.50  4504.00  4136.00  9008  8272

number of documents = 2646571
average document length = 15
document length lower bound = 2
document length upper bound = 52
highest document id ever used = 2646571
641200

Finally it becomes really slow (single 10000-record flush, 4 sec intervals iostat):

$ iostat 4
Linux 2.6.35-28-generic (GX280)  11-04-08  _i686_  (2 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          12.82  0.12  1.14  37.96  0.00  47.96

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  3.06  124.24  196.06  6726523  10614866
sdb  208.38  1934.57  3506.29  104741834  189837952

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          5.77  0.00  1.06  40.28  0.00  52.89

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  3.00  10.00  142.00  40  568
sdb  165.25  5570.00  0.00  22280  0

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          6.74  0.00  0.70  39.42  0.00  53.14

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.75  0.00  12.00  0  48
sdb  166.50  5020.00  0.00  20080  0

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          7.08  0.00  1.28  38.17  0.00  53.48

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  1.00  0.00  20.00  0  80
sdb  159.25  4888.00  0.00  19552  0

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          3.17  0.00  1.58  44.95  0.00  50.30

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  3.25  0.00  634.00  0  2536
sdb  225.25  6712.00  0.00  26848  0

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          2.45  0.00  2.33  59.19  0.00  36.03

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  1.00  0.00  22.00  0  88
sdb  249.75  8794.00  6.00  35176  24

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.60  0.00  1.35  64.66  0.00  32.39

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  301.50  5224.00  2350.00  20896  9400

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          0.74  0.00  1.11  54.21  0.00  43.94

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  329.50  5614.00  2966.00  22456  11864

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.84  0.00  1.35  50.49  0.00  46.31

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  374.25  6438.00  3468.00  25752  13872

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.47  0.00  1.22  54.77  0.00  42.54

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  355.00  6832.00  3302.00  27328  13208

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.47  0.00  1.84  54.60  0.00  42.09

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  5.25  152.00  22.00  608  88
sdb  398.00  7882.00  3402.00  31528  13608

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.35  0.00  1.84  58.90  0.00  37.91

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  26.00  1150.00  0.00  4600  0
sdb  404.50  8594.00  3174.00  34376  12696

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.35  0.00  1.84  65.40  0.00  31.41

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  413.25  9080.00  3662.00  36320  14648

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.24  0.00  1.11  48.95  0.00  48.70

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  1.00  0.00  24.00  0  96
sdb  408.00  6694.00  4552.00  26776  18208

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.47  0.00  1.23  61.64  0.00  35.66

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  414.25  7566.00  4086.00  30264  16344

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.24  0.00  1.49  52.48  0.00  44.79

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  438.25  6764.00  4974.00  27056  19896

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          6.92  0.00  2.35  66.38  0.00  24.35

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  13.50  724.00  1350.00  2896  5400
sdb  415.75  2916.00  7010.00  11664  28040

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          0.00  0.00  1.12  98.13  0.00  0.75

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  437.50  0.00  9580.00  0  38320

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          0.12  0.00  0.62  66.38  0.00  32.88

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  476.75  0.00  11138.00  0  44552

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          0.86  0.00  0.62  48.15  0.00  50.37

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  4.25  0.00  266.00  0  1064
sdb  346.25  1604.00  13242.00  6416  52968

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          5.67  0.00  0.59  41.09  0.00  52.66

Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda  0.00  0.00  0.00  0  0
sdb  191.50  6212.00  0.00  24848  0

number of documents = 4001242
average document length = 16
document length lower bound = 1
document length upper bound = 56
highest document id ever used = 4001242
1022248
So why is it reading from postlist.DB before writing, and then again during the writing? The size of the database at this point is only 999MB on sda plus a 1995MB postlist.DB - about 3GB in total.
Change History (4)
comment:1 by , 14 years ago
Keywords: | iowait flush postlist removed |
---|---|
Milestone: | → 1.2.x |
Status: | new → assigned |
comment:3 by , 13 years ago
Resolution: | → incomplete |
---|---|
Status: | assigned → closed |
No response from submitter for 3 months, so resolving as incomplete.
Additional information is still most welcome - feel free to provide some and reopen the ticket.
Thanks for your report.
Xapian merges the new postlist entries with the existing ones, so it has to read some of the old entries to do this. The best way to make this more efficient (if you have spare RAM) is to increase the batch size: set XAPIAN_FLUSH_THRESHOLD in the environment to control this (and make sure it is exported if you set it in the shell). It is counted in documents changed, and the default is 10000.
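For example, something along these lines (an untested sketch - the path and the 100000 threshold are just illustrative; exporting XAPIAN_FLUSH_THRESHOLD in the shell before running your indexer works just as well):

```cpp
#include <xapian.h>

#include <cstdlib>  // for POSIX setenv()

int main() {
    // Flush every 100000 changed documents instead of the default 10000.
    // Set it before opening the WritableDatabase so the new threshold is
    // picked up (or export XAPIAN_FLUSH_THRESHOLD in the shell instead).
    setenv("XAPIAN_FLUSH_THRESHOLD", "100000", 1);

    Xapian::WritableDatabase db("db", Xapian::DB_CREATE_OR_OPEN);

    // ... add_document() calls go here; the batch is flushed
    //     automatically once 100000 documents have been changed ...

    // Or ignore the threshold entirely and batch by hand, e.g.
    //     if (++added % 100000 == 0) db.commit();

    db.commit();  // persist the final partial batch
    return 0;
}
```

Each flush still has to merge its batch into the existing postlists, but fewer, larger batches mean each existing block gets read and rewritten fewer times overall.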
You can also build a number of smaller databases and merge them with xapian-compact.
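For instance (a rough sketch - the four-way split and the part0..part3/combined names are made up):

```cpp
#include <xapian.h>

int main() {
    // Build several smaller databases instead of one large one, then merge
    // them offline afterwards with something like:
    //     xapian-compact part0 part1 part2 part3 combined
    Xapian::WritableDatabase parts[4] = {
        Xapian::WritableDatabase("part0", Xapian::DB_CREATE_OR_OPEN),
        Xapian::WritableDatabase("part1", Xapian::DB_CREATE_OR_OPEN),
        Xapian::WritableDatabase("part2", Xapian::DB_CREATE_OR_OPEN),
        Xapian::WritableDatabase("part3", Xapian::DB_CREATE_OR_OPEN),
    };

    // Spread documents across the parts, e.g. round-robin, so each
    // postlist table stays small and flushes stay cheap:
    //     parts[docno % 4].add_document(doc);

    for (int i = 0; i < 4; ++i) parts[i].commit();
    return 0;
}
```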
You don't say anything about the indexing you're doing. If you're only appending new documents, i.e. only calling add_document(), that's significantly more efficient than replacing existing documents, as we then only need to read the first and last block of each postlist. If you're replacing documents, looking up the unique term will also require some additional reads.
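To make the difference concrete, a sketch of the two styles (the "Q"-prefixed unique ID term is just the usual convention; nothing here is taken from your code):

```cpp
#include <xapian.h>

#include <string>

// Append-only indexing: the cheapest case, as only the first and last
// block of each postlist being extended needs to be read.
void index_append(Xapian::WritableDatabase& db, const Xapian::Document& doc) {
    db.add_document(doc);
}

// Update-in-place indexing: the unique ID term has to be looked up first
// (extra postlist reads), and any existing entries for the document have
// to be merged out of and back into the postlists.
void index_update(Xapian::WritableDatabase& db,
                  const std::string& record_id,  // your own ID, e.g. a URL
                  const Xapian::Document& doc) {
    db.replace_document("Q" + record_id, doc);
}
```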
I don't have a good explanation for the times where it reads significantly more than it writes though. Unless you're causing significant postlist reads in your indexing code, it sounds like a bug in Xapian, but perhaps the system was running other stuff during that batch which looked at sdb, e.g. background stuff from cron like the locate db updater or the tracker file search thing?
Measuring I/O for just the Xapian process would rule out other things reading from sdb. If it is a Xapian bug, it's probably going to be hard to track down without more information - ideally some code which reproduces this.
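On Linux, the per-process counters in /proc give exactly this - a sketch, assuming a kernel with per-task I/O accounting (the 2.6.35 kernel in your iostat header should have it); these are the same counters that tools like pidstat -d report:

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Dump this process's cumulative storage I/O from /proc/self/io.
// read_bytes/write_bytes only count I/O which actually reached the block
// layer, so page-cache hits are excluded.  Call it before and after each
// flush and diff the numbers.
void dump_process_io() {
    std::ifstream io("/proc/self/io");
    std::string line;
    while (std::getline(io, line)) {
        if (line.compare(0, 11, "read_bytes:") == 0 ||
            line.compare(0, 12, "write_bytes:") == 0) {
            std::cout << line << '\n';
        }
    }
}
```

If those counters stay small while iostat shows heavy reads on sdb, then something else is hitting that disk.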
If that's difficult, checking with the flint backend might be informative. If flint doesn't exhibit this behaviour, the problem is related to something which changed between the two backends, which would narrow it down quite a lot.
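A sketch of how a flint run could be set up alongside the chert one, assuming a 1.2.x build where the (deprecated) Xapian::Flint::open() factory is still available - the directory names are made up:

```cpp
#include <xapian.h>

int main() {
    // Two databases in separate directories, one per backend, so the same
    // indexing run can be profiled under iostat against each in turn.
    Xapian::WritableDatabase chert_db("testdb-chert",
                                      Xapian::DB_CREATE_OR_OPEN);  // chert: 1.2 default
    Xapian::WritableDatabase flint_db =
        Xapian::Flint::open("testdb-flint", Xapian::DB_CREATE_OR_OPEN);

    // ... feed identical documents to each database (ideally in two
    //     separate runs, so their I/O doesn't get mixed together) ...

    chert_db.commit();
    flint_db.commit();
    return 0;
}
```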