#637 closed defect (worksforme)
Potential memory leak when assigning MSetItem values
Reported by: | Jeff Rand | Owned by: | Richard Boulton |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Xapian-bindings (Python) | Version: | 1.2.15 |
Severity: | normal | Keywords: | Memory leak |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | Linux |
Description
I've traced a memory leak to a statement that copies values from an MSetItem into a dictionary, which is then appended to a list in Python. We're running Python 2.7.3, xapian-core 1.2.15 and xapian-bindings 1.2.15. The example below reproduces the behavior. It prints its PID and pauses for input at a few points to make the behavior easier to observe.
Run the following code and monitor the PID's memory usage in top or a similar program. I've observed the resident memory for this example go from 18m to 52m, and it stays there even after deleting the objects and running garbage collection.
I think the MSetItems are kept in memory and are not being garbage collected correctly, possibly because of a lingering reference to the MSet or MSetIterator.
    import os
    import simplejson as json
    import xapian as x
    import shutil
    import gc


    def make_db(path, num_docs=100000):
        try:
            shutil.rmtree(path)
        except OSError, e:
            if e.errno != 2:
                raise

        db = x.WritableDatabase(path, x.DB_CREATE)

        for i in xrange(1, num_docs):
            doc = x.Document()
            doc.set_data(json.dumps({
                'id': i,
                'enabled': True
            }))
            doc.add_term('XTYPA')
            db.add_document(doc)

        return db


    def run_query(db, num_docs=100000):
        e = x.Enquire(db)
        e.set_query(x.Query('XTYPA'))
        m = e.get_mset(0, num_docs, True, None)

        # Store the MSetItem's data, which causes a memory leak
        data = []
        for i in m:
            data.append({
                'data': i.document.get_data(),
                'id': i.docid,
            })

        # Make sure I'm not crazy
        del num_docs, db, i, e, m, data
        gc.collect()


    def main():
        # print the PID to monitor
        print 'PID to monitor: {}'.format(os.getpid())
        db = make_db('/tmp/test.db')
        raw_input("database is done, ready?")
        run_query(db, 100000)
        raw_input('done?')


    if __name__ == '__main__':
        main()
Attachments (1)
Change History (4)
comment:1 by , 11 years ago
Description: | modified (diff) |
---|
comment:2 by , 11 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
No further info for 6 weeks, so closing as "worksforme".
If anyone can show evidence that there's actually a leak here (rather than just memory pooling by C++), please reopen.
If you're using GCC >= 3.4, you can export GLIBCXX_FORCE_NEW=1 before running your code to stop it doing this, which might help to determine if this is the cause of what you're seeing.
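As a rough sketch of how that check could be scripted, the snippet below starts a command with GLIBCXX_FORCE_NEW=1 set in the child's environment only. The helper name `run_with_force_new` is an illustrative assumption, not something taken from this ticket or from Xapian.

```python
import os
import subprocess


def run_with_force_new(cmd):
    """Run cmd with GLIBCXX_FORCE_NEW=1 set and return its exit code.

    Hypothetical helper for illustration. The variable only matters for
    code built against GNU libstdc++ and only affects its pooling
    allocators.
    """
    env = dict(os.environ)           # copy, so the parent environment is untouched
    env['GLIBCXX_FORCE_NEW'] = '1'   # same effect as `export GLIBCXX_FORCE_NEW=1`
    # stdin/stdout are inherited, so interactive prompts in the child still work
    return subprocess.call(cmd, env=env)
```

For example, with the script from the description saved as `leak_test.py` (a hypothetical file name), `run_with_force_new(['python', 'leak_test.py'])` would run it with pooling disabled, and resident memory can then be watched in top as before.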
comment:3 by , 11 years ago
Milestone: | 1.2.x |
---|
If you ask the Python gc module how many objects are allocated, the count doesn't increase. The attached slightly modified version of your script shows this (note that calling gc.collect() more than once sometimes seems to be necessary to actually collect all objects; I'm not sure why).
On trunk:
And HEAD of 1.2 branch:
So I don't see how this can be Python hanging on to objects.
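The kind of check described here can be sketched as follows. This is not the attached script, only a minimal illustration of counting gc-tracked objects before and after a workload. The names `live_object_count`, `object_growth` and `build_and_discard` are made up for this example, and `build_and_discard` is a pure-Python stand-in for the query loop.

```python
import gc


def live_object_count(passes=3):
    """Collect a few times, then return the number of gc-tracked objects."""
    for _ in range(passes):
        gc.collect()  # repeated, since one pass may not collect everything
    return len(gc.get_objects())


def object_growth(workload, passes=3):
    """Return how many more gc-tracked objects exist after workload() runs."""
    before = live_object_count(passes)
    workload()
    after = live_object_count(passes)
    return after - before


def build_and_discard(n=10000):
    """Stand-in workload: build a list of dicts, then drop it."""
    data = []
    for i in range(n):
        data.append({'data': str(i), 'id': i})
    del data
```

If `object_growth(build_and_discard)` stays close to zero, Python itself is not holding on to the objects. To test the real case, one would pass a function wrapping `run_query(db)` instead.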
I think this is just due to C++'s allocator hanging on to memory. As I said in my reply to the mailing list, this memory should just get reused by later operations (like the next query you run).
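One way to probe the reuse claim is to repeat the same workload several times and record the process's peak resident memory after each run. The sketch below assumes a Unix system such as the Linux box in this report, where the standard `resource` module is available and `ru_maxrss` is reported in kilobytes. The function names are illustrative, and `allocate_and_release` is only a stand-in for running the query.

```python
import resource


def peak_rss_after_each_run(workload, runs=5):
    """Call workload() repeatedly; return the peak RSS seen after each call."""
    samples = []
    for _ in range(runs):
        workload()
        usage = resource.getrusage(resource.RUSAGE_SELF)
        samples.append(usage.ru_maxrss)  # peak so far; kilobytes on Linux
    return samples


def allocate_and_release(n=50000):
    """Stand-in workload: allocate many small strings, then drop them."""
    data = [str(i) * 10 for i in range(n)]
    del data
```

If freed memory is being reused, the samples should level off after the first run or two. A peak that keeps climbing by a similar amount on every run would point towards a genuine leak instead.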