#543 closed defect (incomplete)
excessive postlist.DB reads on flush
| Reported by: | fulltext | Owned by: | Olly Betts |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | Backend-Chert | Version: | 1.2.5 |
| Severity: | normal | Keywords: | |
| Cc: | | Blocked By: | |
| Blocking: | | Operating System: | All |
Description
Problem: as the database grows, commits become unbearably slow, with iowait maxing out - most likely due to postlist.DB reads.
The issue has been tested on the following setup:
- 2.8 GHz P4 HT, 1 GB RAM
- sda: Ubuntu 10 + the Chert database, except postlist.DB
- sdb: the postlist.DB file only (symlinked)
Test results, including iostat output, follow:
Starting to build the database from scratch (nice write speed, no reads on sdb at all):
```
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 21.50 0.00 3552.00 0 7104
sdb 133.00 0.00 23060.00 0 46120
```
First appearance of reads on sdb:
```
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 11.00 0.00 2416.00 0 4832
sdb 188.50 72.00 5368.00 144 10736
number of documents = 656208
average document length = 17
document length lower bound = 3
document length upper bound = 52
highest document id ever used = 656208
172680
```
More reads (now matching or exceeding writes):
```
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 4.00 0.00 124.00 0 248
sdb 223.50 3200.00 1512.00 6400 3024
number of documents = 1344294
average document length = 18
document length lower bound = 3
document length upper bound = 52
highest document id ever used = 1344294
363296
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 356.50 4504.00 4136.00 9008 8272
number of documents = 2646571
average document length = 15
document length lower bound = 2
document length upper bound = 52
highest document id ever used = 2646571
641200
```
Finally it becomes really slow (a single 10000-record flush, with iostat at 4-second intervals):
```
$ iostat 4
Linux 2.6.35-28-generic (GX280) 11-04-08 _i686_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
12.82 0.12 1.14 37.96 0.00 47.96
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 3.06 124.24 196.06 6726523 10614866
sdb 208.38 1934.57 3506.29 104741834 189837952
avg-cpu: %user %nice %system %iowait %steal %idle
5.77 0.00 1.06 40.28 0.00 52.89
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 3.00 10.00 142.00 40 568
sdb 165.25 5570.00 0.00 22280 0
avg-cpu: %user %nice %system %iowait %steal %idle
6.74 0.00 0.70 39.42 0.00 53.14
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.75 0.00 12.00 0 48
sdb 166.50 5020.00 0.00 20080 0
avg-cpu: %user %nice %system %iowait %steal %idle
7.08 0.00 1.28 38.17 0.00 53.48
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.00 0.00 20.00 0 80
sdb 159.25 4888.00 0.00 19552 0
avg-cpu: %user %nice %system %iowait %steal %idle
3.17 0.00 1.58 44.95 0.00 50.30
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 3.25 0.00 634.00 0 2536
sdb 225.25 6712.00 0.00 26848 0
avg-cpu: %user %nice %system %iowait %steal %idle
2.45 0.00 2.33 59.19 0.00 36.03
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.00 0.00 22.00 0 88
sdb 249.75 8794.00 6.00 35176 24
avg-cpu: %user %nice %system %iowait %steal %idle
1.60 0.00 1.35 64.66 0.00 32.39
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 301.50 5224.00 2350.00 20896 9400
avg-cpu: %user %nice %system %iowait %steal %idle
0.74 0.00 1.11 54.21 0.00 43.94
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 329.50 5614.00 2966.00 22456 11864
avg-cpu: %user %nice %system %iowait %steal %idle
1.84 0.00 1.35 50.49 0.00 46.31
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 374.25 6438.00 3468.00 25752 13872
avg-cpu: %user %nice %system %iowait %steal %idle
1.47 0.00 1.22 54.77 0.00 42.54
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 355.00 6832.00 3302.00 27328 13208
avg-cpu: %user %nice %system %iowait %steal %idle
1.47 0.00 1.84 54.60 0.00 42.09
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 5.25 152.00 22.00 608 88
sdb 398.00 7882.00 3402.00 31528 13608
avg-cpu: %user %nice %system %iowait %steal %idle
1.35 0.00 1.84 58.90 0.00 37.91
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 26.00 1150.00 0.00 4600 0
sdb 404.50 8594.00 3174.00 34376 12696
avg-cpu: %user %nice %system %iowait %steal %idle
1.35 0.00 1.84 65.40 0.00 31.41
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 413.25 9080.00 3662.00 36320 14648
avg-cpu: %user %nice %system %iowait %steal %idle
1.24 0.00 1.11 48.95 0.00 48.70
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.00 0.00 24.00 0 96
sdb 408.00 6694.00 4552.00 26776 18208
avg-cpu: %user %nice %system %iowait %steal %idle
1.47 0.00 1.23 61.64 0.00 35.66
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 414.25 7566.00 4086.00 30264 16344
avg-cpu: %user %nice %system %iowait %steal %idle
1.24 0.00 1.49 52.48 0.00 44.79
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 438.25 6764.00 4974.00 27056 19896
avg-cpu: %user %nice %system %iowait %steal %idle
6.92 0.00 2.35 66.38 0.00 24.35
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 13.50 724.00 1350.00 2896 5400
sdb 415.75 2916.00 7010.00 11664 28040
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.12 98.13 0.00 0.75
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 437.50 0.00 9580.00 0 38320
avg-cpu: %user %nice %system %iowait %steal %idle
0.12 0.00 0.62 66.38 0.00 32.88
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 476.75 0.00 11138.00 0 44552
avg-cpu: %user %nice %system %iowait %steal %idle
0.86 0.00 0.62 48.15 0.00 50.37
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 4.25 0.00 266.00 0 1064
sdb 346.25 1604.00 13242.00 6416 52968
avg-cpu: %user %nice %system %iowait %steal %idle
5.67 0.00 0.59 41.09 0.00 52.66
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 191.50 6212.00 0.00 24848 0
number of documents = 4001242
average document length = 16
document length lower bound = 1
document length upper bound = 56
highest document id ever used = 4001242
1022248
```
So why is it reading from postlist.DB before writing, and then again during the writing? The size of the database at this point is only 999 MB on sda plus 1995 MB of postlist.DB - about 3 GB in total.
Change History (4)
comment:1 by , 15 years ago
| Keywords: | iowait flush postlist removed |
|---|---|
| Milestone: | → 1.2.x |
| Status: | new → assigned |
Thanks for your report.
Xapian is merging the new postlist entries with the existing ones, so it has to read some of the old entries to do this. The best way to make this more efficient (if you have spare RAM) is to increase the batch size: set XAPIAN_FLUSH_THRESHOLD in the environment to control this (and make sure it is exported if you set it in the shell). It is counted in documents changed, and the default is 10000.
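For illustration, a minimal sketch of this with the Python bindings - the database path, the threshold value and the toy documents below are placeholders, not taken from this ticket:

```python
import os
import xapian

# Placeholder value: raise the batch size from the default of 10000 changed
# documents to 100000, trading extra RAM for fewer merge passes on flush.
# Set it before the writer is opened so the backend is sure to pick it up.
os.environ["XAPIAN_FLUSH_THRESHOLD"] = "100000"

db = xapian.WritableDatabase("/path/to/db", xapian.DB_CREATE_OR_OPEN)

for text in ("first toy document", "second toy document"):
    doc = xapian.Document()
    doc.set_data(text)
    for term in text.split():
        doc.add_term(term)
    db.add_document(doc)

db.commit()  # flush() is the older name for this in the 1.x API
```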
You can also build a number of smaller databases and merge them with xapian-compact.
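A rough sketch of that approach, assuming placeholder paths and eliding the actual indexing loop:

```python
import subprocess
import xapian

# Build several smaller databases independently, then merge them once.
parts = ["/tmp/db-part-0", "/tmp/db-part-1", "/tmp/db-part-2"]

for path in parts:
    db = xapian.WritableDatabase(path, xapian.DB_CREATE_OR_OPEN)
    # ... index one chunk of the input into `db` ...
    db.commit()

# xapian-compact takes one or more source databases followed by a destination.
subprocess.check_call(["xapian-compact"] + parts + ["/tmp/db-merged"])
```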
You don't say anything about the indexing you're doing. If you're only appending new documents, i.e. only calling add_document(), that's significantly more efficient than replacing existing documents (as we only need to read the first and last block of each postlist). Also, looking up a unique term will require some additional reads.
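For comparison, a sketch of the two indexing styles; the uid argument and the conventional "Q" id-term prefix are illustrative choices rather than anything required by the API:

```python
import xapian

def index_record(writer, uid, text, update=False):
    doc = xapian.Document()
    doc.set_data(text)
    for term in text.split():
        doc.add_term(term)
    id_term = "Q" + uid              # conventional unique-id term
    doc.add_term(id_term)

    if update:
        # Update path: replace whichever document carries the unique id term.
        # This is what needs the extra postlist lookups mentioned above.
        writer.replace_document(id_term, doc)
    else:
        # Append-only path: no lookup of existing documents is needed.
        writer.add_document(doc)

db = xapian.WritableDatabase("/path/to/db", xapian.DB_CREATE_OR_OPEN)
index_record(db, "record-1", "some text")                  # append
index_record(db, "record-1", "revised text", update=True)  # later update
db.commit()
```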
I don't have a good explanation for the periods where it reads significantly more than it writes, though. Unless your indexing code is causing significant postlist reads, that sounds like a bug in Xapian - but perhaps the system was running something else during that batch which touched sdb, e.g. background jobs from cron such as the locate database updater or the tracker file-search indexer?
Measuring I/O for just the Xapian process would rule out the possibility that something else is reading from sdb. If it is a Xapian bug, it will probably be hard to track down without more information - ideally some code which reproduces the problem.
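On Linux, one way to get per-process figures (as an alternative to tools like pidstat -d or iotop) is to sample /proc/&lt;pid&gt;/io, whose read_bytes/write_bytes fields count actual block-device traffic. A sketch with a placeholder PID:

```python
import time

def sample_io(pid):
    """Return (read_bytes, write_bytes) for one process from /proc/<pid>/io."""
    stats = {}
    with open("/proc/%d/io" % pid) as f:
        for line in f:
            key, value = line.split(":")
            stats[key.strip()] = int(value)
    return stats["read_bytes"], stats["write_bytes"]

indexer_pid = 12345              # placeholder: PID of the indexing process
prev = sample_io(indexer_pid)
while True:
    time.sleep(4)                # match the 4-second iostat interval above
    cur = sample_io(indexer_pid)
    print("read %.0f kB/s, write %.0f kB/s" % (
        (cur[0] - prev[0]) / 4.0 / 1024,
        (cur[1] - prev[1]) / 4.0 / 1024))
    prev = cur
```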
If that's difficult, checking with the flint backend might be informative. If flint doesn't exhibit this behaviour, the cause is something which changed between the two backends, which would narrow it down quite a lot.
comment:3 by , 14 years ago
| Resolution: | → incomplete |
|---|---|
| Status: | assigned → closed |
No response from the submitter for 3 months, so resolving as incomplete.
Additional information is still most welcome - feel free to provide some and reopen the ticket.