CPython's garbage collection relies on each object's reference count. Every object carries its own count: whenever something takes a new reference to the object, it must increase the count with the Py_INCREF macro, and when the referencer no longer needs the object, it must decrease the count with the Py_DECREF macro. When an object's reference count drops to 0, CPython reclaims it.
    /* Nothing is actually declared to be a PyObject, but every pointer to
     * a Python object can be cast to a PyObject*.  This is inheritance built
     * by hand.  Similarly every pointer to a variable-size Python object can,
     * in addition, be cast to PyVarObject*.
     */
    typedef struct _object {
        _PyObject_HEAD_EXTRA
        Py_ssize_t ob_refcnt;
        struct _typeobject *ob_type;
    } PyObject;
The problem arises when a programmer does not manage an object's reference count correctly: a reference leaks, which means the object is never collected, because its reference count never drops to 0.
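To make the idea concrete, here is a minimal sketch that simulates a leaked reference from pure Python. It assumes CPython, where ctypes.pythonapi exposes Py_IncRef and Py_DecRef (the function forms of the two macros). The Payload class and the simulate_leak helper are made up for this illustration.

```python
import ctypes
import weakref


class Payload:
    """Plain object used only for this demonstration (hypothetical)."""


# Py_IncRef / Py_DecRef are the function forms of Py_INCREF / Py_DECREF.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


def simulate_leak():
    """Return (alive_while_leaked, freed_after_fix)."""
    obj = Payload()
    probe = weakref.ref(obj)    # a weak reference does not keep obj alive

    _incref(obj)                # extra reference that nobody will release
    del obj                     # drop the only reference Python code holds
    alive_while_leaked = probe() is not None

    survivor = probe()          # take a real reference again
    _decref(survivor)           # the "forgotten" Py_DECREF
    del survivor                # count reaches 0, the object is reclaimed
    freed_after_fix = probe() is None
    return alive_while_leaked, freed_after_fix


print(simulate_leak())
```

On CPython the first value should show that the object outlives its last Python-level reference while the extra count is outstanding, and the second that it goes away once the missing decrement is applied.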
How to diagnose reference leaks?
CPython ships with a module called "test", whose -R option checks for reference leaks (it relies on sys.gettotalrefcount(), which exists only in debug builds of CPython):
-R runs each test several times and examines sys.gettotalrefcount() to see if the test appears to be leaking references. The argument should be of the form stab:run:fname where ‘stab’ is the number of times the test is run to let gettotalrefcount settle down, ‘run’ is the number of times further it is run and ‘fname’ is the name of the file the reports are written to. These parameters all have defaults (5, 4 and “reflog.txt” respectively), and the minimal invocation is ‘-R :’.
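The idea behind -R can be sketched in a few lines. This is not regrtest's actual implementation, only an illustration of the warm-up and measure loop under simplifying assumptions: instead of sys.gettotalrefcount() (debug builds only), it watches the reference count of a single hypothetical SENTINEL object, so it runs on any build. The names leaky, clean, watch_sentinel and refleak_deltas are invented for the example.

```python
import sys

SENTINEL = object()   # hypothetical object whose count we watch
_CACHE = []           # hypothetical global that never releases its items


def leaky():
    _CACHE.append(SENTINEL)      # keeps one more reference on every call


def clean():
    tmp = [SENTINEL]             # reference is dropped when tmp goes away
    return len(tmp)


def watch_sentinel():
    return sys.getrefcount(SENTINEL)


def refleak_deltas(func, warmups=3, runs=3, counter=watch_sentinel):
    """Call func warmups + runs times; return the deltas of the last runs."""
    deltas = []
    for i in range(warmups + runs):
        before = counter()
        func()
        delta = counter() - before
        if i >= warmups:         # ignore the runs used to settle down
            deltas.append(delta)
    return deltas


for func in (leaky, clean):
    deltas = refleak_deltas(func)
    if any(deltas):
        print(f"{func.__name__} leaked {deltas} references, sum={sum(deltas)}")
    else:
        print(f"{func.__name__}: no leak detected")
```

On a debug build, passing counter=sys.gettotalrefcount would be closer to what the help text describes, though the totals are noisier than a single watched object.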
For example, we can run all of CPython's unit tests to check for reference leaks:
    $ ./python -m test -R 3:3
    == CPython 3.7.0a0 (heads/master:ff48739ed0, Jun 7 2017, 10:55:13) [GCC 6.3.1 20170306]
    == Linux-4.11.3-1-ARCH-x86_64-with-arch little-endian
    == hash algorithm: siphash24 64bit
    == cwd: /home/grd/Python/cpython/build/test_python_19384
    == CPU count: 4
    == encodings: locale=UTF-8, FS=utf-8
    Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
    Run tests sequentially
    0:00:00 load avg: 0.77 [ 1/405] test_grammar
    beginning 6 repetitions
    123456
    ......
    0:00:00 load avg: 0.77 [ 2/405] test_opcodes
    beginning 6 repetitions
    123456
    ......
    0:00:00 load avg: 0.77 [ 3/405] test_dict
Or run only a single test file:
    $ ./python -m test -R 3:3 test_threading
    Run tests sequentially
    0:00:00 load avg: 0.29 [1/1] test_threading
    beginning 6 repetitions
    123456
    ..
Or use the -m option to run only the test methods whose names match:
    $ ./python -m test -R 3:3 test_threading -m test_various_ops
    Run tests sequentially
    0:00:00 load avg: 0.47 [1/1] test_threading
    beginning 6 repetitions
    123456
    ......
    1 test OK.

    Total duration: 189 ms
    Tests result: SUCCESS
When a test leaks, the run fails and the output shows the leaked reference counts:
    $ ./python -m test -R 3:3 test_threading -m test_threads_join_2
    Run tests sequentially
    0:00:00 load avg: 0.24 [1/1] test_threading
    beginning 6 repetitions
    123456
    ......
    test_threading leaked [3, 3, 3] references, sum=9
    test_threading failed

    1 test failed:
        test_threading

    Total duration: 1 sec
    Tests result: FAILURE
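If you want to pick the leak report out of a long log automatically, a small parser is enough. The pattern below is modelled only on the report line shown above; the exact wording may differ between CPython versions, so treat it as an assumption. The name parse_leak_line is invented for this sketch.

```python
import re

# Assumption: the report looks like
#   "<test> leaked [n, n, n] <kind>, sum=<total>"
LEAK_RE = re.compile(
    r"^(?P<test>\S+) leaked \[(?P<counts>[-\d, ]+)\] "
    r"(?P<kind>.+?), sum=(?P<total>-?\d+)$"
)


def parse_leak_line(line):
    """Return (test, counts, kind, total), or None if the line is no report."""
    match = LEAK_RE.match(line.strip())
    if match is None:
        return None
    counts = [int(n) for n in match.group("counts").split(",")]
    return (match.group("test"), counts,
            match.group("kind"), int(match.group("total")))


print(parse_leak_line("test_threading leaked [3, 3, 3] references, sum=9"))
```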
How to fix: the leak in test.support.run_in_subinterp
The leak was reported by Victor Stinner on the Python core-mentorship mailing list, in "New easy C issues: reference leaks", with bpo-30536 and bpo-30547. If you want to reproduce the leaks, check out commit 65ece7ca2366308fa91a39a8dfa255e6bdce3cca.
The strategy for fixing a reference leak in CPython has two main steps. First, comment out as much code as possible to get the minimal code that still reproduces the leak. Second, once the leak point is identified, use git bisect to find the first bad commit.
Let us apply the methodology.
First: comment out as much code as possible
test_threads_join_2 lives in test_threading.py; here is the full test:
    def test_threads_join_2(self):
        # Same as above, but a delay gets introduced after the thread's
        # Python code returned but before the thread state is deleted.
        # To achieve this, we register a thread-local object which sleeps
        # a bit when deallocated.
        r, w = os.pipe()
        self.addCleanup(os.close, r)
        self.addCleanup(os.close, w)
        code = r"""if 1:
            import os
            import threading
            import time

            class Sleeper:
                def __del__(self):
                    time.sleep(0.05)

            tls = threading.local()

            def f():
                # Sleep a bit so that the thread is still running when
                # Py_EndInterpreter is called.
                time.sleep(0.05)
                tls.x = Sleeper()
                os.write(%d, b"x")
            threading.Thread(target=f).start()
            """ % (w,)
        ret = test.support.run_in_subinterp(code)
        self.assertEqual(ret, 0)
        # The thread was joined properly.
        self.assertEqual(os.read(r, 1), b"x")
To comment out as much code as possible, we can comment out everything first, then uncomment from top to bottom. After a few tries, we find that the leak comes from this line:
    ret = test.support.run_in_subinterp(code)
Digging into it, we apply the same strategy to that function:
    def run_in_subinterp(code):
        """
        Run code in a subinterpreter. Raise unittest.SkipTest if the tracemalloc
        module is enabled.
        """
        # Issue #10915, #15751: PyGILState_*() functions don't work with
        # sub-interpreters, the tracemalloc module uses these functions internally
        try:
            import tracemalloc
        except ImportError:
            pass
        else:
            if tracemalloc.is_tracing():
                raise unittest.SkipTest("run_in_subinterp() cannot be used "
                                         "if tracemalloc module is tracing "
                                         "memory allocations")
        import _testcapi
        return _testcapi.run_in_subinterp(code)
The critical line is here:
    return _testcapi.run_in_subinterp(code)
It calls into the _testcapi C extension module, whose source can be found at Modules/_testcapimodule.c.
Another way to verify that the leak is in run_in_subinterp is to find other tests that use the same function and leak too. We can check this with test_atexit, test_capi and test_threading, which all leak in the same place.
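Finding those other users by hand gets tedious, so a short search script can help. The sketch below scans a directory for lines that mention run_in_subinterp. The find_users helper is invented here, and the Lib/test path assumes you run it from the root of a CPython checkout.

```python
from pathlib import Path


def find_users(root, needle="run_in_subinterp"):
    """Return (file name, line number, line) for each line containing needle."""
    root = Path(root)
    hits = []
    if not root.is_dir():
        return hits
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.name, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumption: the current directory is the root of a CPython checkout.
    for name, lineno, line in find_users("Lib/test"):
        print(f"{name}:{lineno}: {line}")
```

Each test file it reports is a candidate to run with -R and compare against the leak seen in test_threading.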
Second: git bisect to find the bad commit
Keep one thing in mind: the bug may not have been introduced in run_in_subinterp in _testcapimodule.c itself; it may have been introduced in a function that it calls, somewhere else in the code base.
So, use git log Modules/_testcapimodule.c to pick a commit far back in history, check it out, build it, and rerun the test to see whether that commit is good:
    # I choose commit 13e602ea0f08e8c04d635356375d1d2ab5a9b964
    $ git checkout 13e602ea0f08e8c04d635356375d1d2ab5a9b964
    $ cp Modules/Setup.dist Modules/Setup
    $ make -j8
    $ ./python -m test -R 3:3 test_threading -m test_threads_join_2
    Run tests sequentially
    0:00:00 load avg: 0.49 [1/1] test_threading
    beginning 6 repetitions
    123456
    ......
    1 test OK.

    Total duration: 1 sec
    Tests result: SUCCESS
    $
Great, this is a good commit, so we can use git bisect to find the first bad commit:
    $ git bisect start
    $ git bisect bad master
    $ git bisect good 13e602ea0f08e8c04d635356375d1d2ab5a9b964
    Bisecting: 3781 revisions left to test after this (roughly 12 steps)
    [4b9abf3a27185aaceb6db39ef1e1fa784f420b4f] merge 3.5
    $
At each step we rebuild, rerun the test and check whether this build has the refleak. If it leaks, type git bisect bad; otherwise type git bisect good. At the end, you will get the first bad commit. (Remember to run git bisect reset when you are done.)
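With roughly a dozen rebuilds to go through, it can be worth letting git bisect run drive the loop. The script below is a sketch rather than a tested recipe. The build and test commands are copied from the session above, while the file name bisect_refleak.py, the verdict helper and the "run" argument are made up for this example. It relies on the exit codes git bisect run understands: 0 for good, 125 for "skip this commit", and other values from 1 to 127 for bad.

```python
import subprocess
import sys

# Commands copied from the manual session; adjust them for your checkout
# (older commits may also need the Modules/Setup copy step).
BUILD_CMD = ["make", "-j8"]
TEST_CMD = ["./python", "-m", "test", "-R", "3:3",
            "test_threading", "-m", "test_threads_join_2"]

GOOD, BAD, SKIP = 0, 1, 125   # exit codes understood by `git bisect run`


def verdict(build_rc, test_rc, test_output):
    """Map one build-and-test attempt to a `git bisect run` exit code."""
    if build_rc != 0:
        return SKIP            # cannot judge a commit that does not build
    if " leaked [" in test_output:
        return BAD             # the refleak report is present
    if test_rc == 0:
        return GOOD
    return SKIP                # failed for some unrelated reason


def main():
    build = subprocess.run(BUILD_CMD)
    if build.returncode != 0:
        return verdict(build.returncode, None, "")
    test = subprocess.run(TEST_CMD, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return verdict(0, test.returncode, test.stdout)


if __name__ == "__main__" and sys.argv[1:] == ["run"]:
    sys.exit(main())
```

Saved as an untracked file, it would be started after git bisect start, bad and good with something like `git bisect run python3 bisect_refleak.py run`. Skipping commits that fail to build keeps an unrelated breakage from being blamed for the leak.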
Two commits introduced the refleaks:
    6b4be195cd8868b76eb6fbe166acc39beee8ce36 bad
    f9169ce6b48c7cc7cc62d9eb5e4ee1ac7066d14b good

    1abcf6700b4da6207fe859de40c6c1bada6b4fec bad
    c842efc6aedf979b827a9473192f46cec53d58db good
Check whether you got the same result!
Third: find where the refleak is
Since the CPython workflow has migrated to GitHub, we can see the changes on GitHub. The refleak points are at initexternalimport, and here. (How to find them? Apply the first step: comment out as much code as possible until the leak point is isolated.)
This is the hardest part, because you need to test in several places, and the changes in a pull request can be huge, like this one.
After that, rebuild, rerun the leaking tests, then run the full test suite to check that nothing else is affected!
Conclusion
The full fix for these refleaks was sent by @matrixise in #1995, Co-Authored-By: Victor Stinner and Louie Lu.
If you have any questions about fixing reference leaks, tweet me or leave a reply.