Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#543 closed defect (incomplete)

excessive postlist.DB reads on flush

Reported by: fulltext
Owned by: Olly Betts
Priority: normal
Milestone:
Component: Backend-Chert
Version: 1.2.5
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

Problem: as the database grows, commits become unbearably slow, with iowait maxing out - most likely due to postlist.DB reads.

The issue was observed on the following setup:

2.8GHz P4 HT, 1GB RAM
sda: Ubuntu 10 plus the Chert database, except postlist.DB
sdb: the postlist.DB file only (symlinked)

Test results, including iostat output, follow:

Building the database from scratch (good write speed, no reads on sdb at all):
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              21.50         0.00      3552.00          0       7104
sdb             133.00         0.00     23060.00          0      46120

First appearance of reads on sdb:
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              11.00         0.00      2416.00          0       4832
sdb             188.50        72.00      5368.00        144      10736

number of documents = 656208
average document length = 17
document length lower bound = 3
document length upper bound = 52
highest document id ever used = 656208
172680

More reads (now matching or exceeding writes):
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               4.00         0.00       124.00          0        248
sdb             223.50      3200.00      1512.00       6400       3024

number of documents = 1344294
average document length = 18
document length lower bound = 3
document length upper bound = 52
highest document id ever used = 1344294
363296

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             356.50      4504.00      4136.00       9008       8272

number of documents = 2646571
average document length = 15
document length lower bound = 2
document length upper bound = 52
highest document id ever used = 2646571
641200 

Finally it becomes really slow (a single 10000-document flush; iostat at 4-second intervals):
$ iostat 4
Linux 2.6.35-28-generic (GX280)         11-04-08        _i686_  (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.82    0.12    1.14   37.96    0.00   47.96

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               3.06       124.24       196.06    6726523   10614866
sdb             208.38      1934.57      3506.29  104741834  189837952

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.77    0.00    1.06   40.28    0.00   52.89

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               3.00        10.00       142.00         40        568
sdb             165.25      5570.00         0.00      22280          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.74    0.00    0.70   39.42    0.00   53.14

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.75         0.00        12.00          0         48
sdb             166.50      5020.00         0.00      20080          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.08    0.00    1.28   38.17    0.00   53.48

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.00         0.00        20.00          0         80
sdb             159.25      4888.00         0.00      19552          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.17    0.00    1.58   44.95    0.00   50.30

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               3.25         0.00       634.00          0       2536
sdb             225.25      6712.00         0.00      26848          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.45    0.00    2.33   59.19    0.00   36.03

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.00         0.00        22.00          0         88
sdb             249.75      8794.00         6.00      35176         24

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.60    0.00    1.35   64.66    0.00   32.39

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             301.50      5224.00      2350.00      20896       9400

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.74    0.00    1.11   54.21    0.00   43.94

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             329.50      5614.00      2966.00      22456      11864

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.84    0.00    1.35   50.49    0.00   46.31

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             374.25      6438.00      3468.00      25752      13872

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.47    0.00    1.22   54.77    0.00   42.54

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             355.00      6832.00      3302.00      27328      13208

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.47    0.00    1.84   54.60    0.00   42.09

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               5.25       152.00        22.00        608         88
sdb             398.00      7882.00      3402.00      31528      13608

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.35    0.00    1.84   58.90    0.00   37.91

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              26.00      1150.00         0.00       4600          0
sdb             404.50      8594.00      3174.00      34376      12696

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.35    0.00    1.84   65.40    0.00   31.41

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             413.25      9080.00      3662.00      36320      14648

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.24    0.00    1.11   48.95    0.00   48.70

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.00         0.00        24.00          0         96
sdb             408.00      6694.00      4552.00      26776      18208

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.47    0.00    1.23   61.64    0.00   35.66

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             414.25      7566.00      4086.00      30264      16344

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.24    0.00    1.49   52.48    0.00   44.79

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             438.25      6764.00      4974.00      27056      19896

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.92    0.00    2.35   66.38    0.00   24.35

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              13.50       724.00      1350.00       2896       5400
sdb             415.75      2916.00      7010.00      11664      28040

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.12   98.13    0.00    0.75

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             437.50         0.00      9580.00          0      38320

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.12    0.00    0.62   66.38    0.00   32.88

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             476.75         0.00     11138.00          0      44552

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.86    0.00    0.62   48.15    0.00   50.37

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               4.25         0.00       266.00          0       1064
sdb             346.25      1604.00     13242.00       6416      52968

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.67    0.00    0.59   41.09    0.00   52.66

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             191.50      6212.00         0.00      24848          0

number of documents = 4001242
average document length = 16
document length lower bound = 1
document length upper bound = 56
highest document id ever used = 4001242
1022248

So why is it reading from postlist.DB before writing, and then again during the writing? The total size of the database at this point is only about 3GB: 999MB on sda plus a 1995MB postlist.DB.

Change History (4)

comment:1 by Olly Betts, 13 years ago

Keywords: iowait flush postlist removed
Milestone: 1.2.x
Status: new → assigned

Thanks for your report.

Xapian is merging the new postlist entries with the existing ones, so it has to read some of the old entries to do this. The best way to make this more efficient (if you have spare RAM) is to increase the batch size: set XAPIAN_FLUSH_THRESHOLD in the environment to control this (and make sure it is exported if you set it in the shell). The threshold counts changed documents, and the default is 10000.
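For illustration only, a minimal C++ sketch of both ways to batch more coarsely; the database path and the threshold of 100000 are placeholders chosen purely as an example:

#include <cstdlib>
#include <xapian.h>

int main() {
    // Raise the automatic flush threshold (counted in changed documents;
    // the default is 10000).  Set or export it before the
    // WritableDatabase is opened.
    setenv("XAPIAN_FLUSH_THRESHOLD", "100000", 1);

    Xapian::WritableDatabase db("/path/to/db", Xapian::DB_CREATE_OR_OPEN);

    // Alternatively, batch explicitly: add many documents, then commit
    // once, so the postlist merge happens once per large batch.
    for (int i = 0; i < 100000; ++i) {
        Xapian::Document doc;
        doc.add_term("example");
        db.add_document(doc);
    }
    db.commit();
    return 0;
}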

You can also build a number of smaller databases and merge them with xapian-compact.
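For reference, the merge step is a single xapian-compact run listing the source databases followed by the destination, along the lines of (placeholder names):

xapian-compact part1 part2 part3 combined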

You don't say anything about the indexing you're doing. If you're only appending new documents, i.e. only calling add_document(), that's significantly more efficient than replacing existing documents (as we only need to read the first and last block of each postlist). Also, looking up a unique term will require some additional reads.
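To make that distinction concrete, a rough C++ sketch of the two indexing patterns; the helper names and the "Q"-prefixed unique ID term are illustrative conventions only, not taken from the reporter's code:

#include <string>
#include <xapian.h>

// Append-only: add_document() only extends each postlist at its end.
void index_append_only(Xapian::WritableDatabase& db, const std::string& term) {
    Xapian::Document doc;
    doc.add_term(term);
    db.add_document(doc);
}

// Update-in-place: replace_document() keyed on a unique ID term has to
// look that term up and merge with existing entries, which costs extra
// postlist.DB reads.
void index_upsert(Xapian::WritableDatabase& db, const std::string& id,
                  const std::string& term) {
    Xapian::Document doc;
    doc.add_term(term);
    doc.add_boolean_term("Q" + id);
    db.replace_document("Q" + id, doc);
}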

I don't have a good explanation for the periods where it reads significantly more than it writes, though. Unless your indexing code is causing significant postlist reads, that sounds like a bug in Xapian - but perhaps the system was running something else during that batch which touched sdb, e.g. background jobs from cron such as the locate database updater or the tracker file-search indexer?

Measuring I/O for just the Xapian process would rule out other processes reading from sdb. If it is a Xapian bug, it's probably going to be hard to track down without more information - ideally some code which reproduces it.
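One way to get per-process numbers on Linux is /proc/<pid>/io; here is a minimal sketch, assuming it is called from inside the indexer around a flush (the label argument and the call sites are illustrative):

#include <fstream>
#include <iostream>
#include <string>

// Print this process's I/O counters (rchar, wchar, read_bytes,
// write_bytes, ...) so traffic from other processes hitting sdb is
// excluded from the measurement.
static void dump_self_io(const char* label) {
    std::ifstream io("/proc/self/io");
    std::string line;
    std::cout << "--- " << label << " ---\n";
    while (std::getline(io, line))
        std::cout << line << '\n';
}

// e.g. dump_self_io("before commit"); db.commit(); dump_self_io("after commit");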

If that's difficult, checking with the flint backend might be informative. If flint doesn't exhibit this behaviour, the cause is something which changed between the two backends, which would narrow it down quite a lot.

comment:2 by Olly Betts, 13 years ago

This ticket is currently waiting on feedback from the submitter...

comment:3 by Olly Betts, 13 years ago

Resolution: incomplete
Status: assigned → closed

No response from submitter for 3 months, so resolving as incomplete.

Additional information is still most welcome - feel free to provide some and reopen the ticket.

comment:4 by Olly Betts, 13 years ago

Milestone: 1.2.x removed

Unsetting milestone.
