Opened 16 years ago

Closed 15 years ago

#358 closed defect (fixed)

Omega: omindex eating up all available physical memory

Reported by: Eric Voisard
Owned by: Olly Betts
Priority: normal
Milestone: 1.0.17
Component: Omega
Version: 1.0.12
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: Linux

Description

I'm having a recurring problem with Omega's indexing with omindex.

When I run omindex, it sometimes behaves as if it fails to recognize the extension of certain .doc and .pdf files, and it skips them with an "Unknown extension ... - skipping" message. In the same run, omindex is otherwise perfectly able to index other files with the same extensions.

If I manually run antiword on a .doc file that failed previously, it works. If I narrow down the directory structure to make the recursion and indexing lighter and then run omindex again, it works on files that previously failed. It never seems to fail with HTML and plain text files (the built-in formats).

Each time a failure occurs and a file is skipped, a kernel error like the following one is recorded in /var/log/messages:

Apr 21 14:10:12 zen kernel: sh[4153]: segfault at ffffffffffffffff rip 00002ac7e7c4581f rsp 00007fffc3452de0 error 4

As reported by some other users who had the same problem, it may be that my system is running low on memory and omindex is then unable to run the external converter.

I ran omindex while monitoring the system's memory usage. The system (SLES10) has 1GB of RAM, roughly two thirds of which was used by other processes. omindex gradually consumed the remaining third until only about 10MB was left free. At that point memory usage stabilized and the failures began. Remember that it doesn't fail on every subsequent .doc or .pdf, but only on some of them.
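(A small loop like the following can record the same numbers over time; the pgrep/ps invocation here is just an illustration, not the command used for the measurements below, which were taken with top and ps:)

# Poll omindex's resident and virtual size once a second until it exits.
while pid=$(pgrep -o omindex); do
    ps -o rss=,vsz= -p "$pid"   # RSS and VSZ in kB
    sleep 1
done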

--- Before running omindex:

top - 13:44:50 up 187 days, 21:46,  6 users,  load average: 2.02, 2.03, 2.08
Tasks: 149 total,   2 running, 147 sleeping,   0 stopped,   0 zombie
Cpu(s): 50.0%us,  0.0%sy,  0.0%ni, 50.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1027164k total,   654256k used,   372908k free,     7760k buffers
Swap:  4200988k total,   258284k used,  3942704k free,   404760k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3268 root      34  19  261m 4656 2440 S  100  0.5 270524:02 zmd
    1 root      16   0   796   72   40 S    0  0.0   0:00.96 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.18 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:00.14 migration/1
    5 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1

--- Beginning (omindex using 9860 kB, or less than 1% of RAM):

top - 13:45:35 up 187 days, 21:47,  6 users,  load average: 2.16, 2.05, 2.09
Tasks: 152 total,   1 running, 151 sleeping,   0 stopped,   0 zombie
Cpu(s): 63.7%us,  6.5%sy,  0.0%ni, 29.4%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   1027164k total,   678548k used,   348616k free,     8224k buffers
Swap:  4200988k total,   258284k used,  3942704k free,   421772k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3268 root      34  19  261m 4656 2440 S   93  0.5 270524:47 zmd
  829 root      17   0 16864 6520 1788 D   20  0.6   0:00.64 omindex
28316 root      10  -5     0    0    0 S    1  0.0   1:12.45 cifsd
31328 root      16   0  5656 1260  876 R    1  0.1   0:14.33 top
    1 root      16   0   796   72   40 S    0  0.0   0:00.96 init


[evoisard@zen]/home/evoisard > date ; ps aux | grep omindex
Tue Apr 21 13:45:40 CEST 2009
root       829  8.7  0.9  20052  9860 pts/5    D+   13:45   0:01 \
   /usr/local/bin/omindex --db /srv/xapian/test --follow --url /docs/test/ /srv/xapian/targets/test

--- During runtime, still working fine (omindex using 103200 kB, or 10% of RAM):

top - 13:56:14 up 187 days, 21:58,  6 users,  load average: 3.06, 2.98, 2.58
Tasks: 153 total,   1 running, 152 sleeping,   0 stopped,   0 zombie
Cpu(s): 60.7%us,  3.0%sy,  0.0%ni, 35.8%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   1027164k total,   824360k used,   202804k free,    10340k buffers
Swap:  4200988k total,   258284k used,  3942704k free,   464760k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3268 root      34  19  261m 4656 2440 S   99  0.5 270535:07 zmd
  829 root      17   0  110m 100m 1820 S   25 10.0   0:55.32 omindex
31328 root      16   0  5656 1260  876 R    1  0.1   0:16.99 top
    1 root      16   0   796   72   40 S    0  0.0   0:00.96 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.18 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0


[evoisard@zen]/home/evoisard > date ; ps aux | grep omindex
Tue Apr 21 13:56:17 CEST 2009
root       829  8.5 10.0 113396 103200 pts/5   D+   13:45   0:55 \
   /usr/local/bin/omindex --db /srv/xapian/test --follow --url /docs/test/ /srv/xapian/targets/test

--- Close to the end, with documents being skipped and segfaults occurring (omindex using 369340 kB, or 36% of RAM):

top - 14:10:23 up 187 days, 22:12,  6 users,  load average: 3.19, 3.22, 2.96
Tasks: 152 total,   2 running, 150 sleeping,   0 stopped,   0 zombie
Cpu(s): 94.6%us,  1.5%sy,  0.0%ni,  0.0%id,  3.0%wa,  0.0%hi,  1.0%si,  0.0%st
Mem:   1027164k total,  1017024k used,    10140k free,      996k buffers
Swap:  4200988k total,   258284k used,  3942704k free,   401204k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3268 root      34  19  261m 4656 2440 S  100  0.5 270549:04 zmd
  829 root      18   0  370m 360m 1920 D   93 36.0   5:05.18 omindex
  154 root      15   0     0    0    0 S    1  0.0   0:08.67 kswapd0
31328 root      16   0  5656 1260  876 R    1  0.1   0:20.52 top
    1 root      16   0   796   72   40 S    0  0.0   0:00.96 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.18 migration/0


[evoisard@zen]/home/evoisard > date ; ps aux | grep omindex
Tue Apr 21 14:10:28 CEST 2009
root       829 20.5 35.9 379324 369340 pts/5   D+   13:45   5:08 \
   /usr/local/bin/omindex --db /srv/xapian/test --follow --url /docs/test/ /srv/xapian/targets/test

When omindex terminates, all reserved resources are freed.

So it looks like omindex is somehow not releasing all the memory it uses at runtime. Sure, I could add more memory to the system, but wouldn't omindex eat up all the extra memory too? And wouldn't the same problem come back if the directories to index grow larger?

I don't know whether this memory is needed for handling the database itself, or whether it's used by the runtime and the filtering/indexing jobs.

I don't know if this behavior should be considered a bug, or if the process could be optimized. I'll let the Xapian masters decide...

Anyway, many thanks for the wonderful work! Eric

Change History (5)

comment:1 by Eric Voisard, 16 years ago

Version: 1.0.12

I forgot to mention the version of Xapian/Omega I'm using. Eric

comment:2 by Olly Betts, 15 years ago

Assuming you're using GCC 3.4 or newer, could you try:

GLIBCXX_FORCE_NEW=1
export GLIBCXX_FORCE_NEW

And then run omindex.

This tells the C++ STL allocator not to hoard memory it has previously allocated, which might be at least part of the issue here.
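For example, as a one-shot setting on the same command line used in the description:

GLIBCXX_FORCE_NEW=1 /usr/local/bin/omindex --db /srv/xapian/test \
    --follow --url /docs/test/ /srv/xapian/targets/test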

comment:3 by Olly Betts, 15 years ago

Milestone: 1.0.17
Status: new → assigned

I think this Debian bug explains the issue here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=548987

That's fixed in trunk r13572.

The other factor is, I believe, the C++ STL hoarding released memory, as I suggested in comment:2. In the absence of any feedback on that, I plan to backport the _SC_PHYS_PAGES change from r13572 for 1.0.17 and then close this ticket. If you (or anyone else) are still seeing issues after that, please supply the requested information and reopen this ticket.
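(For context: the general idea of that change, as I understand it, is to derive the resource limit omindex applies to external filter subprocesses from the machine's physical RAM, which sysconf(_SC_PHYS_PAGES) reports. A rough shell analogue of the idea follows; the one-third fraction and the .doc filename are purely illustrative, and the real logic lives in omindex's C++ source:)

# Total physical RAM in kB, from the same values _SC_PHYS_PAGES exposes.
phys_kb=$(( $(getconf _PHYS_PAGES) * $(getconf PAGE_SIZE) / 1024 ))
# Run the filter in a subshell with its address space capped.
( ulimit -v $((phys_kb / 3)); antiword example.doc )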

comment:4 by Eric Voisard, 15 years ago

I added 1GB (=>2GB) to this system, and now it runs fine. As it's now in production, it won't be easy to remove the memory and redo the tests. Hopefully I'll have time for this next week...

Thanks, Eric

comment:5 by Olly Betts, 15 years ago

Resolution: fixed
Status: assigned → closed

Backported for 1.0.17 in r13600.

I think it makes sense to close this ticket now. If the memory usage issue isn't STL hoarding, then please open a new ticket for it.
