Mind Your PyO3 Allocations!
WARNING: This post covers the old pre-v0.21.0 PyO3 API. For more about the new "Bound" API see this post by David Hewitt. Thanks to @Dr_Emann for alerting me to this.
I was recently messing around with the PyO3 Rust library. It lets you write Rust code that either calls Python code or is itself called by Python code. In my use case (processing lots of lines of a JSON file and calling a Python function), I ran into a problem where the program's memory usage grew without bound.
The problem arises from misunderstanding how PyO3's memory management works. The gist below shows two versions of the same function, process_file. The second version, in pyo3_unbounded_mem.rs, has unbounded memory usage: every iteration of the innermost loop allocates a PyDict object and a PyTuple object, and those objects stick around until the Python process terminates.
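Roughly, the unbounded version looks like the sketch below. This isn't the gist itself: it assumes pre-0.21 PyO3, and the callback argument, the dict contents, and the function signature are all just for illustration.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

use pyo3::prelude::*;
use pyo3::types::{PyDict, PyTuple};

// Unbounded variant: one GIL scope wraps the entire file.
fn process_file(path: &str, callback: &PyObject) -> PyResult<()> {
    let reader = BufReader::new(File::open(path)?);

    Python::with_gil(|py| -> PyResult<()> {
        for line in reader.lines() {
            let line = line?;
            // Each iteration allocates a PyDict and a PyTuple. Their owned
            // references are registered in the GILPool of the enclosing
            // `with_gil`, which isn't released until the whole file has been
            // processed (for a long job, effectively the life of the process).
            let dict = PyDict::new(py);
            dict.set_item("line", line.as_str())?;
            let args = PyTuple::new(py, [dict]);
            callback.call1(py, args)?;
        }
        Ok(())
    })
}
```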
Meanwhile, the first version, in pyo3_bounded_mem.rs, wraps the inner loop in a call to Python::with_gil, which lets us create a new pool via py.new_pool(). This pool holds all of the Python objects allocated inside the inner loop, so when the Python::with_gil block finishes, the pool is dropped, the reference counts of those objects are decremented, and the Python GC can clean them up.
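Again as a rough sketch, under the same assumptions as above (pre-0.21 PyO3, made-up names, an arbitrary batch size, not the gist's exact code), the bounded version looks something like this:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

use pyo3::prelude::*;
use pyo3::types::{PyDict, PyTuple};

// Bounded variant: each batch of lines gets its own short-lived pool.
fn process_file(path: &str, callback: &PyObject) -> PyResult<()> {
    let reader = BufReader::new(File::open(path)?);
    // Collect the lines up front just to keep the sketch short; the batch
    // size of 10_000 is arbitrary.
    let lines: Vec<String> = reader.lines().collect::<Result<_, _>>()?;

    for batch in lines.chunks(10_000) {
        Python::with_gil(|py| -> PyResult<()> {
            // Create a pool for this batch. In pre-0.21 PyO3, `new_pool` is
            // unsafe because owned references created from it must not be
            // used after the pool is dropped.
            let pool = unsafe { py.new_pool() };
            let py = pool.python();

            // Inner loop: the PyDict and PyTuple allocated here are owned by
            // `pool` rather than by a pool that lives as long as the process.
            for line in batch {
                let dict = PyDict::new(py);
                dict.set_item("line", line.as_str())?;
                let args = PyTuple::new(py, [dict]);
                callback.call1(py, args)?;
            }

            // When this closure returns, `pool` is dropped, the reference
            // counts of everything allocated in the inner loop drop back
            // down, and Python can reclaim the memory.
            Ok(())
        })?;
    }
    Ok(())
}
```

Because every batch gets its own pool, the number of live Python objects is bounded by the batch size instead of growing with the size of the file.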
Incidentally, this Rust code isn't a smart way to do this. It's usually just faster to do this kind of processing directly in Python. As always, make sure you read the docs :) The PyO3 memory management section has way more detail on this topic.
P.S. Thanks to @adimancv for the gist darkmode theme.