Opened 10 years ago
Closed 10 years ago
#664 closed defect (notabug)
omindex hangs when indexing 10G database
Reported by: | Hubert J. | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Omega | Version: | 1.2.16 |
Severity: | normal | Keywords: | hang |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | Linux |
Description (last modified by )
I have been running omindex on a 10G database. It hangs at a certain point without warning. There is nothing special in the log, and it does not always hang at the same point in the data. At that moment the process is taking 3.5G of memory, of which 2.76G is resident. This changes slightly, but stays there for hours. How can I troubleshoot the issue? Thanks for any help or suggestions, and have a great day, hubert
Platform: Linux Ubuntu 14.04.1, package xapian-omega version 1.2.16-1
Closed as user error. See comment:4 for details.
Change History (4)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
Olly, you were right! Changing the threshold to 1000 allowed the job to finish. I did not check the vmstat values, but the process had taken a lot of memory and it was certainly swapping. I will try adjusting a couple of other settings, publish the results, and close the ticket. Thanks so much for your help! Hubert J.
comment:3 by , 10 years ago
Olly, any other ideas about why it would still hang? I reduced XAPIAN_FLUSH_THRESHOLD to 500 but still observe the same symptom. As shown below, the system is not swapping, but is performing I/O. When I run lsof -p <pid>, the file offsets are not changing, so I'm wondering what I/O it is doing. Essentially, the process takes all the available memory (2.4G resident + 3.2G swapped).
Any further suggestions would be greatly appreciated. Thanks a lot, Hubert
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
 0  1   3034    134     19     19    2    1  1890   891  749   908  0  1 79 20  0
 0  2   3032    122     19     19    2    0  2729    34  315   509  0  0 83 17  0
 0  1   3029    106     20     20    3    0  3026    87  538   809  0  1 81 18  0
 1  2   3031    115     20     19    1    1  1590   959  520   650  0  0 77 23  0
 0  1   3030    107     21     20    2    0  2005   170  637   967  0  1 74 25  0
 0  1   3030    116     21     20    2    0  2162   557  610   783  1  1 72 25  0
 0  1   3027    104     21     21    2    0  2654    64  547   917  0  0 85 14  0
 0  2   3032    113     21     21    1    1  1047  1608  648   706  0  1 75 24  0
 0  1   3033    114     22     21    1    0  1684   517  564   731  0  1 71 28  0
 0  1   3032    112     22     21    2    0  2488   434  708  1009  0  1 81 19  0
 0  1   3029    100     22     21    2    0  2637   212  452   779  0  0 83 16  0
 0  1   3033    112     21     22    1    1   811  1212  591   668  0  1 64 35  0
 0  1   3030     99     21     22    3    0  2830    21  370   661  0  0 83 16  0
 0  1   3034    112     20     22    1    1   980  1178  531   542  0  1 64 35  0
 0  1   3031    100     20     22    3    0  2741   255  435   748  0  0 84 15  0
 0  1   3034    123     15     20    1    1  1244   951  501   555  0  1 70 30  0
 0  1   3031    110     15     20    3    0  2805    22  381   690  0  0 84 15  0
 0  2   3030    102     14     20    2    0  2385   590  465   671  0  1 82 17  0
 0  1   3031    112     14     20    1    0  1515   415  445   599  0  0 71 29  0
 0  2   3029    100     14     21    3    0  2730   236  395   653  0  0 84 15  0
 1  1   3035    121     14     20    0    1    60  1415  549   489  0  0 48 52  0
The last line on standard output is: "no text extracted from document body, but indexing metadata anyway"
comment:4 by , 10 years ago
Description: | modified |
---|---|
Resolution: | → notabug |
Status: | new → closed |
Closing the ticket as user error. I was passing Xapian a dirty binary file that was labelled as HTML. When it was indexed, the random bytes were turned into terms, so Xapian's vocabulary exploded and filled memory. The key to the analysis was using delve, which showed that the vocabulary was largely made up of random binary "words". The solution is to exclude such files from indexing, or to give them the correct type.
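One way to keep such files out of the index is to look for them before running omindex. As a minimal sketch, assuming the documents are plain files whose type is taken from an .html/.htm extension, the Python below flags files that carry an HTML suffix but whose first few kilobytes look binary, using a common heuristic (any NUL byte, or more than 30% control characters). The thresholds and the names looks_binary and find_suspect_html are hypothetical choices for illustration, not anything omindex provides.

```python
from pathlib import Path

# Heuristic parameters (assumptions, not omindex behaviour): inspect the first
# 8000 bytes and call the data binary if over 30% of them are control bytes.
SNIFF_BYTES = 8000
MAX_CONTROL_RATIO = 0.30

# Bytes accepted as text: printable ASCII, common whitespace controls, and every
# byte >= 0x80 so that UTF-8 or 8-bit encoded HTML is not penalised.
_TEXT_BYTES = bytes(range(0x20, 0x7F)) + b"\t\n\r\f\b" + bytes(range(0x80, 0x100))


def looks_binary(head: bytes) -> bool:
    """Guess whether the leading bytes of a file are binary data."""
    sample = head[:SNIFF_BYTES]
    if not sample:
        return False  # treat empty files as text
    if b"\x00" in sample:
        return True  # NUL bytes practically never occur in real HTML
    # Delete all accepted bytes; whatever remains is a suspicious control byte.
    control = sample.translate(None, _TEXT_BYTES)
    return len(control) / len(sample) > MAX_CONTROL_RATIO


def find_suspect_html(root, suffixes=(".html", ".htm")):
    """Return files under root that have an HTML suffix but look binary."""
    suspects = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            with path.open("rb") as f:
                if looks_binary(f.read(SNIFF_BYTES)):
                    suspects.append(path)
    return suspects
```

For example, calling find_suspect_html on the document tree (the path is yours to supply) lists candidates to move aside or rename before re-running omindex. A heuristic like this can miss binary data that happens to contain no NUL bytes, so delve remains the way to confirm the resulting vocabulary looks sane.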
It sounds to me like it is probably flushing changes to disk when it appears to hang. Xapian batches up changes to the postlist table, and after every 10000 changed documents (by default) it flushes them to disk. This takes a while, especially with a big database. It ought not to take hours, but if things stop fitting in memory it may start swapping, in which case it probably could.
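To make the batching behaviour concrete, here is a toy Python model. It is not Xapian's code: posting-list changes accumulate in an in-memory dict and are discarded (standing in for the write to disk) every threshold documents, with the threshold read from XAPIAN_FLUSH_THRESHOLD and defaulting to 10000 as described above. The class name ToyPostlistBuffer is hypothetical. It shows why a batch of documents that each contribute many unique terms, such as binary junk, makes the in-memory buffer balloon before the next flush.

```python
import os
from collections import defaultdict

# Default batch size described in the comment above.
DEFAULT_FLUSH_THRESHOLD = 10000


class ToyPostlistBuffer:
    """Toy model of batching postlist changes in memory until a flush.

    NOT Xapian's implementation: it only illustrates that pending changes
    grow with the number of distinct terms seen since the last flush.
    """

    def __init__(self, threshold=None):
        if threshold is None:
            # Mimics reading the batch size from the environment.
            threshold = int(os.environ.get("XAPIAN_FLUSH_THRESHOLD",
                                           DEFAULT_FLUSH_THRESHOLD))
        self.threshold = max(1, threshold)
        self.pending = defaultdict(list)  # term -> doc ids added since last flush
        self.docs_since_flush = 0
        self.flushes = 0

    def add_document(self, docid, terms):
        for term in set(terms):
            self.pending[term].append(docid)
        self.docs_since_flush += 1
        if self.docs_since_flush >= self.threshold:
            self.flush()

    def flush(self):
        # A real backend would merge the pending postings into the on-disk
        # postlist table here; the toy model simply drops them.
        self.pending.clear()
        self.docs_since_flush = 0
        self.flushes += 1


if __name__ == "__main__":
    buf = ToyPostlistBuffer(threshold=3)
    for docid in range(1, 8):
        # Two shared terms plus one "junk" term unique to each document.
        buf.add_document(docid, ["foo", "bar", f"junk{docid}"])
    print(buf.flushes, len(buf.pending))  # prints: 2 3
```

With threshold=3 and seven documents, the buffer flushes after documents 3 and 6, leaving the three terms from document 7 pending. A lower threshold caps how many documents' worth of changes can pile up in memory, which is consistent with lowering XAPIAN_FLUSH_THRESHOLD to 1000 letting the job finish.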
I would check how much swapping and disk I/O is happening. Running this will show a reading every 5 seconds until you hit Ctrl+C:

vmstat 5

Look at the columns si/so (memory swapped in from and out to disk, per second) and bi/bo (blocks per second read from and written to block devices). I'd expect you'll see quite a lot of both.
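Reading these columns by eye is fine for a few samples; for a longer capture, a minimal Python sketch like the following averages them. It assumes the classic 17-column procps layout seen in the output posted in comment:3 (newer vmstat releases may add columns, and such rows are simply skipped). The names parse_vmstat and summarise are illustrative, and SAMPLE holds the first three rows from comment:3.

```python
# Column order of the classic procps vmstat layout (as in comment:3).
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]


def parse_vmstat(text):
    """Turn vmstat output into a list of {column: int} dicts, one per sample."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Header lines contain non-numeric fields; skip them and malformed rows.
        if len(fields) != len(VMSTAT_COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(VMSTAT_COLUMNS, (int(f) for f in fields))))
    return rows


def summarise(rows, columns=("si", "so", "bi", "bo", "wa")):
    """Average the swap, block-I/O and I/O-wait columns over all samples."""
    if not rows:
        raise ValueError("no vmstat samples found")
    return {col: sum(row[col] for row in rows) / len(rows) for col in columns}


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
 0  1   3034    134     19     19    2    1  1890   891  749   908  0  1 79 20  0
 0  2   3032    122     19     19    2    0  2729    34  315   509  0  0 83 17  0
 0  1   3029    106     20     20    3    0  3026    87  538   809  0  1 81 18  0
"""

if __name__ == "__main__":
    for col, avg in summarise(parse_vmstat(SAMPLE)).items():
        print(f"{col}: {avg:.1f}")
```

In the full comment:3 capture, si and so never exceed 3 while bi is above 1000 in most samples, which fits the observation there that the process was doing block I/O rather than swapping heavily.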
You can lower the threshold by setting XAPIAN_FLUSH_THRESHOLD in the environment (and exporting it so that subprocesses actually see the value). For example, to reduce it to 1000, try:

export XAPIAN_FLUSH_THRESHOLD=1000

Ideally this should auto-adjust based on the amount of memory needed to batch the data compared to what's available, but currently it doesn't.
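If omindex is launched from a script rather than an interactive shell, the same rule applies: the variable must be in the environment the child process inherits. A minimal Python sketch, assuming Python 3.7+: run_with_flush_threshold (a hypothetical helper) copies the current environment, sets XAPIAN_FLUSH_THRESHOLD, and passes it to subprocess.run. The demo uses a Python one-liner as a stand-in child instead of omindex, since the real omindex arguments depend on your setup.

```python
import os
import subprocess
import sys


def run_with_flush_threshold(cmd, threshold, **run_kwargs):
    """Run cmd with XAPIAN_FLUSH_THRESHOLD set in the child's environment.

    For the child this has the same effect as `export XAPIAN_FLUSH_THRESHOLD=...`
    in the shell; the parent's own os.environ is left unchanged.
    """
    env = dict(os.environ)
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return subprocess.run(cmd, env=env, check=True, **run_kwargs)


if __name__ == "__main__":
    # Stand-in for omindex: a child process that reports the value it sees.
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('XAPIAN_FLUSH_THRESHOLD'))"]
    result = run_with_flush_threshold(probe, 1000, capture_output=True, text=True)
    print(result.stdout.strip())  # prints: 1000
```

For a real indexing run you would normally leave output uncaptured (omit capture_output) so omindex's progress messages go straight to the terminal or log.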