Diagnosing and Fixing Reference Leaks in CPython

CPython's memory management relies on reference counting. Every object has its own reference count. When something else takes a reference to the object, the count must be increased with the Py_INCREF macro. Conversely, when the referrer no longer needs the object, it must decrease the count with the Py_DECREF macro. When the count drops to 0, CPython deallocates the object immediately; the separate cyclic garbage collector only exists to clean up reference cycles.
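The counting can be observed from Python itself. The following is a minimal sketch, assuming CPython: sys.getrefcount() reports an object's count (the absolute value is an implementation detail, so only differences are compared), and a weak reference is used purely to observe when the object is freed. The Node class and the variable names are made up for illustration.

```python
import sys
import weakref


class Node:
    """Throwaway class; plain object() instances cannot be weakly referenced."""


def demo():
    node = Node()
    base = sys.getrefcount(node)      # absolute value is an implementation detail
    alias = node                      # a new reference: C code would Py_INCREF here
    after_alias = sys.getrefcount(node)
    del alias                         # reference dropped: C code would Py_DECREF here
    after_del = sys.getrefcount(node)

    probe = weakref.ref(node)         # a weak reference does not bump the count
    del node                          # last strong reference gone: count reaches 0
    return base, after_alias, after_del, probe() is None


base, after_alias, after_del, freed = demo()
print(after_alias - base, after_del - base, freed)
```

The weak reference returns None right after the last strong reference is deleted, which shows that no separate collection pass was needed to free the object.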

The problem is that when the programmer does not manage an object's reference count correctly, the reference leaks: the object is never freed, because its reference count never reaches 0.
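To make the failure mode concrete, a leak can be simulated from Python. This sketch assumes CPython and uses ctypes.pythonapi to call Py_IncRef and Py_DecRef, the function forms of the two macros; the stray increment stands in for C code that forgot a Py_DECREF. The Payload class and forgetful_c_function are hypothetical names.

```python
import ctypes
import weakref

# Function forms of Py_INCREF / Py_DECREF exported by the C API.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


class Payload:
    """Hypothetical object that some C code forgets to release."""


def forgetful_c_function(obj):
    # Stands in for C code that takes a reference and never gives it back.
    _incref(obj)


payload = Payload()
probe = weakref.ref(payload)
forgetful_c_function(payload)

del payload                       # our own reference is gone...
leaked = probe() is not None      # ...but the object is still alive: a leak

# Balancing the stray increment lets the object be freed again.
survivor = probe()
_decref(survivor)
del survivor
freed_after_fix = probe() is None
print(leaked, freed_after_fix)
```

Without the balancing decrement, the Payload instance would stay allocated until the interpreter exits, even though no Python code can reach it through a strong reference.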

How to diagnose reference leaks?

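CPython ships a package called "test" (run as python -m test) whose -R option checks for reference leaks. It needs a debug build of CPython (configured with --with-pydebug), because sys.gettotalrefcount() only exists in debug builds. The help text describes the option as follows: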

-R runs each test several times and examines sys.gettotalrefcount() to see if the test appears to be leaking references. The argument should be of the form stab:run:fname where ‘stab’ is the number of times the test is run to let gettotalrefcount settle down, ‘run’ is the number of times further it is run and ‘fname’ is the name of the file the reports are written to. These parameters all have defaults (5, 4 and “reflog.txt” respectively), and the minimal invocation is ‘-R :’.
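As a reading aid for the stab:run:fname format, the sketch below parses such an argument using the defaults quoted above. It is an illustration written for this article, not the parser that the test package actually uses; parse_huntrleaks is a hypothetical name.

```python
def parse_huntrleaks(spec, defaults=(5, 4, "reflog.txt")):
    """Split a 'stab:run:fname' string, filling empty fields with the defaults."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError("expected stab:run or stab:run:fname, got %r" % spec)
    stab = int(parts[0]) if parts[0] else defaults[0]
    run = int(parts[1]) if parts[1] else defaults[1]
    fname = parts[2] if len(parts) == 3 and parts[2] else defaults[2]
    return stab, run, fname


print(parse_huntrleaks(":"))              # minimal invocation, all defaults
print(parse_huntrleaks("3:3"))            # shorter run, default report file
print(parse_huntrleaks("3:3:leaks.txt"))  # everything spelled out
```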

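For example, we can run every unit test in CPython to check for reference leaks with ./python -m test -R : (here ./python is the interpreter built in the source tree).

Or run a single test file by appending its name, for example ./python -m test -R : test_threading.

Or use the -m option to run only the tests whose names match, for example ./python -m test -R : test_threading -m test_threads_join_2.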
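The three invocations can also be assembled programmatically, for instance to drive them from a script. The helper below is a sketch: refleak_argv is a hypothetical name, ./python is assumed to be the interpreter built in the source tree, and the test names are the ones used later in this article. It only builds the argument list; nothing is executed.

```python
def refleak_argv(test=None, match=None, spec=":", python="./python"):
    """Build the argv for a refleak run (-R) of the test package, without running it."""
    argv = [python, "-m", "test", "-R", spec]
    if test is not None:
        argv.append(test)             # limit the run to one test file
    if match is not None:
        argv += ["-m", match]         # limit the run to matching test names
    return argv


print(" ".join(refleak_argv()))
print(" ".join(refleak_argv("test_threading")))
print(" ".join(refleak_argv("test_threading", "test_threads_join_2")))
```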

When a test leaks, the run fails and reports how many references were leaked in each measured run.
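Those per-run numbers are differences in a counter sampled around each repetition of the test. The sketch below imitates the idea; it is not the test package's implementation. hunt_leaks and its parameters are hypothetical, and the counter is injectable so that the example also runs on release builds, where sys.gettotalrefcount() does not exist.

```python
import sys


def hunt_leaks(test, stab=5, run=4, counter=None):
    """Return the counter delta observed across each of the `run` measured runs."""
    if counter is None:
        # Only debug builds of CPython provide sys.gettotalrefcount().
        counter = getattr(sys, "gettotalrefcount", None)
        if counter is None:
            raise RuntimeError("needs a debug build or an explicit counter")
    for _ in range(stab):             # warm-up runs let caches settle down
        test()
    deltas = []
    for _ in range(run):
        before = counter()
        test()
        deltas.append(counter() - before)
    return deltas


# Stand-in for a leak: every call parks one more object in a global list.
parked = []


def leaky_test():
    parked.append(object())


def clean_test():
    pass


def count_parked():
    return len(parked)


leaky_deltas = hunt_leaks(leaky_test, counter=count_parked)
clean_deltas = hunt_leaks(clean_test, counter=count_parked)
print(leaky_deltas, clean_deltas)
```

A test that keeps producing the same non-zero delta after the warm-up runs is the pattern to look for; a clean test settles at zero.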

How to fix: the leak in test.support.run_in_subinterp

The leak was reported by Victor Stinner on the Python core-mentorship mailing list, in New easy C issues: reference leaks, with bpo-30536 and bpo-30547. Anyone who wants to reproduce the leaks can check out commit 65ece7ca2366308fa91a39a8dfa255e6bdce3cca.

The strategy for fixing a reference leak in CPython has three steps. First, comment out as much code as possible to get the minimal code that reproduces the leak. Second, once the leak point is identified, use git bisect to find the first bad commit. Third, inspect that commit to find where the reference is leaked.

Let us apply the methodology.

First: comment out as much code as possible

The full code of test_threads_join_2 is in test_threading.py.

To comment out as much code as possible, we can comment out all of it first, then uncomment it from top to bottom. After a few tries, we find that the leak comes from the call to test.support.run_in_subinterp.
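The comment-everything-then-uncomment procedure is a linear search for the first statement whose presence makes the leak appear. Here is a sketch of that search, under the assumption that a test can be modelled as a list of steps and that some leak check is available for a partial test (in practice, a -R run of the partially commented test). All names, including the step names, are hypothetical.

```python
def first_leaking_step(steps, leaks):
    """Enable steps one by one, top to bottom; return the index that first leaks."""
    for end in range(1, len(steps) + 1):
        if leaks(steps[:end]):        # rerun with only the first `end` steps enabled
            return end - 1
    return None                       # nothing leaked


# Toy model: made-up step names instead of real statements, and a fake leak check.
steps = ["setup", "start_thread", "run_in_subinterp", "join"]


def fake_check(enabled):
    return "run_in_subinterp" in enabled


culprit = first_leaking_step(steps, fake_check)
print(culprit, steps[culprit])
```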

Digging into that function and applying the same strategy to its code, the critical line turns out to be the call to _testcapi.run_in_subinterp.

It calls into the C extension module _testcapi, which can be found at Modules/_testcapimodule.c.

Another way to verify that the leak is in the function run_in_subinterp is to find other tests that use the same function and leak too. We can verify this in test_atexit, test_capi, and test_threading, which all leak in the same place.

Second: use git bisect to find the bad commit

One thing to keep in mind: the bug may not have been introduced in run_in_subinterp in _testcapimodule.c itself. It may have been introduced in a function that it calls, somewhere else in the code base.

So, use git log Modules/_testcapimodule.c to find a sufficiently old commit, check it out, build it, and rerun the test to see whether that commit is good or not.

If it is a good commit, we can use git bisect to find the first bad commit: run git bisect start, mark the leaking commit with git bisect bad, and mark the old commit with git bisect good.

At each step we need to rebuild, run the test, and check whether the build leaks references. If it leaks, type git bisect bad; otherwise type git bisect good. At the end, you will get the first bad commit. (Remember to run git bisect reset when you are done.)
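The rebuild, test, and mark loop can be automated with git bisect run, which marks a commit good when the given command exits with 0, bad on most other codes, and skips it on 125. The following is a sketch of such a helper, assuming a build driven by make, the ./python binary in the source tree, and that the test run exits with a non-zero status when -R reports a leak. bisect_step, the -j4 flag, and the 3:3 setting are illustrative choices, not the procedure used in this article.

```python
import subprocess

BUILD = ["make", "-j4"]
TEST = ["./python", "-m", "test", "-R", "3:3",
        "test_threading", "-m", "test_threads_join_2"]


def bisect_step(call=subprocess.call):
    """Return the exit code git bisect run expects for the checked-out commit."""
    if call(BUILD) != 0:
        return 125                    # cannot build: ask git bisect to skip
    if call(TEST) != 0:
        return 1                      # test failed or leaked: commit is bad
    return 0                          # clean run: commit is good


# When saved as a script, finish with:  raise SystemExit(bisect_step())
```

Saved outside the repository with that last line enabled, the script would be passed to git bisect run after one good and one bad commit have been marked. The call parameter exists only so the decision logic can be exercised without building anything.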

Two commits introduced the refleaks. Check whether your bisect found them!

Third: find where the refleak is

Thanks to the CPython workflow having migrated to GitHub, we can view the offending changes there. The refleak points are in initexternalimport and in one other location. (How do we find them? Apply the first step again: comment out as much code as possible until the leak point is isolated.)

This is the hardest part, because you need to test in several different places, and the changes in a pull request may be huge, as they were in this one.

After that, rebuild, rerun the test that leaked, and then run the full test suite to check that nothing else was affected!

Conclusion

The full fix for these refleaks was sent by @matrixise in #1995, co-authored by Victor Stinner and Louie Lu.

If you have any questions about fixing reference leaks, tweet me or leave a reply.
