Lately I’ve had to optimize machine learning code in Python, and there are many modules that will give you a time breakdown per function or per line, such as cProfile or kernprof. However, I also have memory usage issues (as in MemoryError, RAM maxed out), and I suspect I sometimes have spikes of memory usage that I can’t see from ‘top’.
Here’s a little utility class I wrote to log usage after certain calls, to track what memory was allocated at chosen points in the program. I only create one such MemoryUsageLogger, a global one, and then it’s very easy to track usage at a new spot. E.g. once “global_logger” is created, I can simply do “global_logger.log_current_usage()” and I’ll get lines such as:
5292032	memusage.py:52 (test_MemoryUsageLogger)	2012-08-31 16:36:50.517296
5324800	memusage.py:52 (test_MemoryUsageLogger)	2012-08-31 16:36:50.517529
5332992	memusage.py:52 (test_MemoryUsageLogger)	2012-08-31 16:36:50.517768
import os
import traceback
import datetime


# this is a fallback if psutil is not available, but it won't work
# on systems which don't have procfs, the statm file, or a different statm
# file format (if such a thing even exists, just sayin')
def get_resident_memory_from_procfs():
    pid = os.getpid()
    with open("/proc/" + str(pid) + "/statm", "r") as f:
        statm = f.read()
    # the second field of statm is the resident set size, in pages
    resident = int(statm.split(" ")[1]) * 4096
    return resident


class MemoryUsageLogger(object):
    def __init__(self, log_file):
        self.log_file = log_file
        with open(self.log_file, "w") as f:
            print >>f, "Format: resident memory (bytes), location of "\
                "call in code, timestamp"
        try:
            import psutil
            # get_memory_info() returns (rss, vms); we want the resident size
            self.get_resident_memory = \
                lambda: psutil.Process(os.getpid()).get_memory_info()[0]
        except ImportError:
            self.get_resident_memory = get_resident_memory_from_procfs

    def log_current_usage(self):
        # limit=2 keeps the caller's frame and this one; st[0] is the caller
        st = traceback.extract_stack(limit=2)
        caller_location = "%s:%s (%s)" % (st[0][0], st[0][1], st[0][2])
        line = "%s\t%s\t%s\n" % (self.get_resident_memory(),
                                 caller_location,
                                 str(datetime.datetime.now()))
        with open(self.log_file, "a") as f:
            f.write(line)


def test_MemoryUsageLogger():
    import sys, tempfile
    f = tempfile.NamedTemporaryFile(delete=False)
    f_name = f.name
    f.close()

    logger = MemoryUsageLogger(f_name)
    a = []
    for i in range(100):
        # increasing memory usage with a list of 1000 elements each time
        a.append([42 for _ in range(1000)])
        logger.log_current_usage()

    # now read the log back and check it
    with open(f_name, "r") as f:
        all_lines = f.readlines()
    # comment these two lines to make the test silent
    for line in all_lines:
        sys.stdout.write(line)
    # skipping first line, which shows format
    usages = [int(line.split("\t")[0]) for line in all_lines[1:]]
    for u_idx in range(len(usages) - 1):
        assert usages[u_idx] < usages[u_idx + 1]
    os.unlink(f_name)


if __name__ == '__main__':
    test_MemoryUsageLogger()
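A couple of notes. If you suspect spikes that happen *between* two log calls, the standard library’s resource module gives you the peak resident size of the process so far, which you can cross-check against the current size from the logger. A minimal sketch (not part of the class above; the function name is mine, and note that ru_maxrss units differ by platform):

import resource
import sys

def get_peak_resident_memory():
    # ru_maxrss is the peak resident set size of the process so far:
    # kilobytes on Linux, bytes on OS X, so normalize to bytes
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak
    return peak * 1024

Since the peak can only grow, logging it next to the current resident size tells you whether a spike occurred since the previous log line even if you missed it.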
- The ‘statm’ version assumes a page size of 4096 bytes.
- I’m not sure if the information provided by /proc/###/statm and/or psutil has a delay associated with it (i.e. if you create a large Python object and then read statm immediately, will the information have been updated?). From my own limited testing, and from quick Google searches turning out nada, I’ll assume it reflects the current state, but if you know better, please let me know.
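On the first point, you don’t actually have to hardcode 4096: the OS will tell you its page size. A sketch of the same statm fallback with that change (same procfs assumptions as above):

import os
import resource

# ask the OS for the real page size instead of assuming 4096
PAGE_SIZE = resource.getpagesize()

def get_resident_memory_from_procfs():
    with open("/proc/%d/statm" % os.getpid()) as f:
        # the second field of statm is the resident set size, in pages
        resident_pages = int(f.read().split()[1])
    return resident_pages * PAGE_SIZE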