Opened 10 years ago

Closed 10 years ago

#664 closed defect (notabug)

omindex hangs on indexing 10G database

Reported by: Hubert J.
Owned by: Olly Betts
Priority: normal
Milestone:
Component: Omega
Version: 1.2.16
Severity: normal
Keywords: hang
Cc:
Blocked By:
Blocking:
Operating System: Linux

Description (last modified by Hubert J.)

I have been running omindex on a 10G database. It hangs at some point without warning. There is nothing special in the log, and it does not always hang at the same point in the data. At that moment the process is using 3.5G of memory, of which 2.76G is resident. The figure changes slightly, but stays there for hours. How can I troubleshoot the issue? Thanks for any help or suggestions, and have a great day, Hubert

Platform: Linux (Ubuntu 14.04.1), package xapian-omega version 1.2.16-1

Closed as user error. See comment:4 for details.

Change History (4)

comment:1 by Olly Betts, 10 years ago

It sounds to me like it is probably flushing changes to disk when it appears to hang. Xapian batches up changes to the postlist table, and every 10000 documents changed (by default) it flushes them to disk. This takes a while, especially with a big database. It ought not to take hours, but if the batched changes stop fitting in memory the system may start swapping, in which case it probably could.
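To make the trade-off concrete, here is a toy model in POSIX sh. It is not Xapian code: `flush_model` is a hypothetical helper that assumes a flush happens exactly every THRESHOLD documents, plus one final commit of whatever is still pending at the end of the run. A lower threshold means more flushes, but fewer documents' worth of changes held in memory at once (and the memory per document grows with the number of distinct terms it contains).

```shell
#!/bin/sh
# Toy model (hypothetical helper, not part of Xapian or omindex).
# $1 = total documents indexed, $2 = flush threshold.
flush_model() {
    total=$1
    threshold=$2
    # Automatic flushes plus a final commit of any remainder = ceil(total / threshold).
    flushes=$(( (total + threshold - 1) / threshold ))
    # At most `threshold` documents' worth of changes are ever pending at once.
    peak=$(( total < threshold ? total : threshold ))
    echo "flushes=$flushes peak_pending_docs=$peak"
}

flush_model 25000 10000   # flushes=3 peak_pending_docs=10000
flush_model 25000 1000    # flushes=25 peak_pending_docs=1000
```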

I would check how much swapping and disk I/O is happening. This will show a reading every 5 seconds until you hit Ctrl+C:

vmstat 5

Look at the si/so columns (memory swapped in from and out to disk per second) and the bi/bo columns (blocks read from and written to block devices per second). I'd expect you'll see quite a lot of both.
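If reading the columns by eye is tedious, a small awk filter can average them. This is a sketch: `summarize_vmstat` is a hypothetical helper that assumes the standard procps layout (si, so, bi, bo and wa in fields 7, 8, 9, 10 and 16). It skips the two header lines and the first sample, because vmstat's first report is an average since boot.

```shell
#!/bin/sh
# Hypothetical helper: average swap (si/so), block I/O (bi/bo) and
# I/O-wait (wa) columns of vmstat output read from stdin.
summarize_vmstat() {
    awk 'NR > 3 {
            si += $7; so += $8; bi += $9; bo += $10; wa += $16; n++
         }
         END {
            if (n == 0) { print "no samples"; exit }
            printf "samples=%d si=%.1f so=%.1f bi=%.1f bo=%.1f wa=%.1f\n",
                   n, si / n, so / n, bi / n, bo / n, wa / n
         }'
}

# Usage: 13 readings 5 seconds apart (the since-boot one is discarded):
#   vmstat 5 13 | summarize_vmstat
```

Sustained non-zero si/so points at swapping; large bi/bo together with a high wa means the process is mostly waiting on disk.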

You can lower the threshold by setting XAPIAN_FLUSH_THRESHOLD in the environment (and exporting it so that subprocesses actually see the value). For example, to reduce it to 1000 try:

export XAPIAN_FLUSH_THRESHOLD=1000
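To see why the export matters, here is a small POSIX sh sketch: a plain assignment stays inside the current shell, while an exported variable is copied into the environment of child processes such as omindex. The omindex line at the end is only a comment; its paths are placeholders, and it assumes the usual `--db`/`--url` options.

```shell
#!/bin/sh
# Start from a clean slate so the demonstration is deterministic.
unset XAPIAN_FLUSH_THRESHOLD

XAPIAN_FLUSH_THRESHOLD=1000    # plain assignment: visible to this shell only
before=$(sh -c 'echo "${XAPIAN_FLUSH_THRESHOLD:-unset}"')

export XAPIAN_FLUSH_THRESHOLD  # now copied into every child's environment
after=$(sh -c 'echo "${XAPIAN_FLUSH_THRESHOLD:-unset}"')

echo "child before export: $before"   # child before export: unset
echo "child after export: $after"     # child after export: 1000

# For a single run, prefixing the command sets it for that process only
# (placeholder paths):
#   XAPIAN_FLUSH_THRESHOLD=1000 omindex --db /path/to/db --url / /path/to/docs
```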

Ideally this should auto-adjust based on the amount of memory needed to batch the data compared to what's available, but it doesn't currently.

comment:2 by Hubert J., 10 years ago

Olly, you were right! Changing the threshold to 1000 allowed the job to finish. I did not check the vmstat values, but the process had taken a lot of memory and it was certainly swapping. I will try a couple of other settings, publish the results, and close the ticket. Thanks so much for your help! Hubert J.

comment:3 by Hubert J., 10 years ago

Olly, any other ideas about why it would still hang? I reduced XAPIAN_FLUSH_THRESHOLD to 500 but still observe the same symptom. As the vmstat output below shows, the system is not swapping, but it is doing I/O. When I run lsof -p <pid>, the file offsets are not changing, so I'm wondering what I/O it is doing. Essentially, the process takes all available memory (2.4G resident + 3.2G swapped).

Any further suggestions would be greatly appreciated. Thanks a lot, Hubert

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1   3034    134     19     19    2    1  1890   891  749  908  0  1 79 20  0
 0  2   3032    122     19     19    2    0  2729    34  315  509  0  0 83 17  0
 0  1   3029    106     20     20    3    0  3026    87  538  809  0  1 81 18  0
 1  2   3031    115     20     19    1    1  1590   959  520  650  0  0 77 23  0
 0  1   3030    107     21     20    2    0  2005   170  637  967  0  1 74 25  0
 0  1   3030    116     21     20    2    0  2162   557  610  783  1  1 72 25  0
 0  1   3027    104     21     21    2    0  2654    64  547  917  0  0 85 14  0
 0  2   3032    113     21     21    1    1  1047  1608  648  706  0  1 75 24  0
 0  1   3033    114     22     21    1    0  1684   517  564  731  0  1 71 28  0
 0  1   3032    112     22     21    2    0  2488   434  708 1009  0  1 81 19  0
 0  1   3029    100     22     21    2    0  2637   212  452  779  0  0 83 16  0
 0  1   3033    112     21     22    1    1   811  1212  591  668  0  1 64 35  0
 0  1   3030     99     21     22    3    0  2830    21  370  661  0  0 83 16  0
 0  1   3034    112     20     22    1    1   980  1178  531  542  0  1 64 35  0
 0  1   3031    100     20     22    3    0  2741   255  435  748  0  0 84 15  0
 0  1   3034    123     15     20    1    1  1244   951  501  555  0  1 70 30  0
 0  1   3031    110     15     20    3    0  2805    22  381  690  0  0 84 15  0
 0  2   3030    102     14     20    2    0  2385   590  465  671  0  1 82 17  0
 0  1   3031    112     14     20    1    0  1515   415  445  599  0  0 71 29  0
 0  2   3029    100     14     21    3    0  2730   236  395  653  0  0 84 15  0
 1  1   3035    121     14     20    0    1    60  1415  549  489  0  0 48 52  0

The last line on stdout is:

no text extracted from document body, but indexing metadata anyway

comment:4 by Hubert J., 10 years ago

Description: modified
Resolution: notabug
Status: new → closed

Closing the ticket as user error. I was passing Xapian a dirty binary file that was labelled as HTML. When indexing it, the random bytes made Xapian's vocabulary explode and fill memory. The key to the analysis was using delve, which showed that the vocabulary was largely composed of random binary words. The solution is to exclude such files from indexing, or to label their type correctly.
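A rough way to repeat that check is to dump the term list and count suspicious terms. This sketch assumes `delve -1 -a DATABASE` prints every term one per line (check `delve --help` for your version; a header line may be printed too). `binary_term_report` is a hypothetical helper, and flagging terms that contain bytes outside printable ASCII is a crude heuristic: legitimate non-English words are counted as well, so look at the ratio and at a sample of the flagged terms rather than trusting the number alone.

```shell
#!/bin/sh
# Hypothetical helper: read terms (one per line) on stdin and count how
# many contain bytes outside printable ASCII (0x20-0x7E), comparing
# byte values in the C locale.
binary_term_report() {
    LC_ALL=C awk '
        { total++ }
        /[^ -~]/ { odd++ }
        END { printf "terms=%d non_ascii=%d\n", total, odd }'
}

# Usage (assumed delve options, placeholder path):
#   delve -1 -a /path/to/db | binary_term_report
```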
