Logging memory usage in Python

Lately I’ve had to optimize machine learning code in Python, and there are many modules that will give you a time breakdown per function or per line, such as cProfile or kernprof. However, I also have memory problems (as in MemoryError, RAM maxed out), and I suspect I sometimes get spikes of memory usage that I can’t see from ‘top’.
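For the time side of things, cProfile ships with Python; a minimal in-process run might look like this (main() here is just a stand-in for your own entry point):

import cProfile

def main():
    ...  # the code you want a per-function time breakdown for

cProfile.run("main()", sort="cumulative")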

Here’s a little utility class I wrote to log memory usage after certain calls, to track what memory is allocated at chosen points in the program. I create only one such MemoryUsageLogger, a global one, and then it’s very easy to track usage at a new spot: once “global_logger” is created, I can simply call “global_logger.log_current_usage()” and I’ll get lines such as:

5292032    memusage.py:52 (test_MemoryUsageLogger)    2012-08-31 16:36:50.517296
5324800    memusage.py:52 (test_MemoryUsageLogger)    2012-08-31 16:36:50.517529
5332992    memusage.py:52 (test_MemoryUsageLogger)    2012-08-31 16:36:50.517768
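The simplest wiring is a single shared instance in a small module of your own (module and file names below are just placeholders):

# memlog.py (hypothetical module holding the shared logger)
from memusage import MemoryUsageLogger

global_logger = MemoryUsageLogger("memusage.log")

# elsewhere in the program:
#     from memlog import global_logger
#     global_logger.log_current_usage()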

Here’s the code (link to raw file), which relies on the psutil module, with a fallback to Linux’s procfs if psutil isn’t available (here’s a statm format description).

import os
import traceback
import datetime

# this is a fallback if psutil is not available; it won't work on systems
# that don't have procfs or the statm file, or that use a different statm
# format (if such a thing even exists, just sayin')
def get_resident_memory_from_procfs():
    # statm fields are page counts: size, resident, shared, text, lib,
    # data, dt; the second one is the resident set size
    with open("/proc/%d/statm" % os.getpid()) as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")

class MemoryUsageLogger(object):
    def __init__(self, log_file):
        self.log_file = log_file

        with open(self.log_file, "w") as f:
            f.write("Format: resident memory (bytes), location of "
                    "call in code, timestamp\n")

        try:
            import psutil
            process = psutil.Process(os.getpid())
            # rss (resident set size) is reported in bytes
            self.get_resident_memory = lambda: process.memory_info().rss
        except ImportError:
            self.get_resident_memory = get_resident_memory_from_procfs

    def log_current_usage(self):
        # extract_stack(limit=2) returns [caller, this function],
        # so the first entry is the caller's frame
        st = traceback.extract_stack(limit=2)[0]
        caller_location = "%s:%s (%s)" % (st.filename, st.lineno, st.name)

        line = "%s\t%s\t%s\n" % (self.get_resident_memory(),
                                 caller_location,
                                 datetime.datetime.now())
        with open(self.log_file, "a") as f:
            f.write(line)

def test_MemoryUsageLogger():
    import sys, tempfile

    f = tempfile.NamedTemporaryFile(delete=False)
    f_name = f.name
    f.close()

    logger = MemoryUsageLogger(f_name)

    a = []
    for i in range(100):
        # grow memory usage by appending a new 1000-element list each time
        a.append([42 for _ in range(1000)])
        logger.log_current_usage()

    # now read the log back and check it
    with open(f_name, "r") as f:
        all_lines = f.readlines()

        # comment these two lines to make the test silent
        for line in all_lines:
            sys.stdout.write(line)

        # skipping the first line, which shows the format
        usages = [int(line.split("\t")[0]) for line in all_lines[1:]]
        # RSS grows in page- and arena-sized chunks, so consecutive
        # samples can be equal; check it never decreases and grew overall
        for u_idx in range(len(usages) - 1):
            assert usages[u_idx] <= usages[u_idx + 1]
        assert usages[0] < usages[-1]

    os.unlink(f_name) 

if __name__ == '__main__':
    test_MemoryUsageLogger()
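
Once a run has produced a log, the spikes I mentioned above show up as large deltas between consecutive samples. Here’s a quick sketch (not part of the class, just something to run over the log afterwards) that lists the biggest jumps:

def largest_jumps(log_file, n=5):
    # skip the format header, then pair each sample with its call site
    with open(log_file) as f:
        lines = f.readlines()[1:]
    samples = [(int(l.split("\t")[0]), l.split("\t")[1]) for l in lines]
    # delta between consecutive samples, attributed to the later call site
    jumps = [(samples[i + 1][0] - samples[i][0], samples[i + 1][1])
             for i in range(len(samples) - 1)]
    return sorted(jumps, reverse=True)[:n]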

Please note:

  • The ‘statm’ version converts page counts to bytes using the system page size, read via os.sysconf("SC_PAGE_SIZE") (typically 4096 bytes).
  • I’m not sure whether the information provided by /proc/###/statm and/or psutil is updated with a delay (i.e. if you create a large Python object and then read statm immediately, will the numbers already reflect it?). From my own limited testing, and from quick Google searches turning up nothing, I’ll assume it reflects the current state, but if you know better, please let me know. A quick sanity check is sketched below.
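
One crude way to check is to allocate something large and log right before and after; the jump should show up between two consecutive lines (100 MB here is an arbitrary size):

logger = MemoryUsageLogger("delay_check.log")
logger.log_current_usage()
# bytearray(n) zero-fills its buffer, so the pages are actually touched
# and should count towards resident memory immediately
blob = bytearray(100 * 1024 * 1024)
logger.log_current_usage()  # expect roughly a 100 MB jump over the line above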
