Python’s Reference Counting Has Changed (And You Probably Missed It)
Well, that’s not entirely accurate — I actually spent most of last Tuesday staring at a flame graph that absolutely refused to make sense. (I was profiling a high-throughput data ingestion service running on Python 3.14.1, and the CPU usage was pinned.) But here’s the kicker: the application logic wasn’t doing anything heavy.
No complex math, no image processing, no crypto. Just moving JSON objects from a socket to a queue. Yet the profiler claimed we were spending 30% of our time in _Py_DecRef and _Py_IncRef.
And if you’ve been following the CPython internals saga since the “No-GIL” (PEP 703) work started landing in 3.13, you know exactly where I’m going with this. We aren’t just dealing with simple reference counting anymore. The rules of memory management have shifted under our feet, and if you’re writing high-performance Python in 2026, you can’t treat the interpreter like a black box anymore.
The Atomic Tax
Back in the day—and by that, I mean like, 2023—Python’s Global Interpreter Lock (GIL) protected us from race conditions on reference counts. Since only one thread could run Python bytecode at a time, updating an object’s reference count was a cheap, non-atomic operation. It was just ob_refcnt++.
But then we got greedy. We wanted real parallelism. We got free-threading.
The trade-off was brutal initially. To make Python thread-safe without the GIL, every single reference count update had to become an atomic operation. If you’ve ever done low-level systems programming, you know that atomic instructions (like compare-and-swap) are expensive. They force CPU cores to synchronize caches, killing performance.
This is why my flame graph looked like a crime scene. My threads were fighting over the reference counts of shared objects.
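You can reproduce that contention with a rough micro-benchmark. This is my own sketch (the function names are mine, not from any profiler output), and the gap only really shows up on a free-threaded build; on a GIL build the two cases look nearly identical because only one thread runs at a time anyway.

```python
import threading
import time

SHARED = object()  # one object that every thread keeps touching

def hammer_shared(n):
    # Each iteration takes and drops a reference to the shared object,
    # generating cross-thread refcount traffic on a single object.
    for _ in range(n):
        x = SHARED
        del x

def hammer_private(n):
    # Each thread gets its own object, so refcount updates stay local.
    obj = object()
    for _ in range(n):
        x = obj
        del x

def bench(fn, n=200_000, threads=4):
    workers = [threading.Thread(target=fn, args=(n,)) for _ in range(threads)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"shared object : {bench(hammer_shared):.3f}s")
    print(f"private object: {bench(hammer_private):.3f}s")
```

Treat the absolute numbers with suspicion (thread startup cost dominates at small n), but the relative gap between the two runs is the part that tracks the atomic tax.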
Enter Deferred Reference Counting
Actually, I should clarify — this is where things get interesting. The core devs didn’t just accept this performance regression. They introduced deferred reference counting (DRC). I was skeptical when I first read the proposal, but seeing it in action on our staging cluster changed my mind.
The concept is deceptively simple: Not all references are created equal.
Most objects in Python are short-lived and only accessed by the thread that created them. Why pay the atomic tax for those? With deferred reference counting, the interpreter can skip the INCREF/DECREF dance for references held on the interpreter stack. It assumes the object is alive because the stack frame is active.
Let’s look at what this actually looks like under the hood. I wrote a quick script to inspect the raw memory of a Python object to see how the refcounts behave differently now compared to older versions.
import sys
import ctypes

# Define the PyObject structure for CPython 3.14+ (default build).
# Note: the structure layout can change between versions, and
# free-threaded builds split the count into ob_ref_local/ob_ref_shared!
class PyObject(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]

def get_refcount_internal(obj):
    """
    Reads the ob_refcnt field directly from memory.
    This avoids the extra refcount that sys.getrefcount() adds.
    """
    address = id(obj)
    return PyObject.from_address(address).ob_refcnt

def demonstrate_refcounting():
    # Build the string at runtime so the compiler can't constant-fold
    # it into an interned (potentially immortal) literal
    s = "".join(["hello", "_", "world"])
    print(f"Initial refcount (sys): {sys.getrefcount(s)}")
    print(f"Initial refcount (raw): {get_refcount_internal(s)}")

    # Create a reference
    y = s
    print(f"After y = s (raw): {get_refcount_internal(s)}")

    # Create a list containing s
    container = [s]
    print(f"After container = [s] (raw): {get_refcount_internal(s)}")

    # In a deferred refcounting world (depending on build flags),
    # stack references might behave differently than heap references.
    del y
    del container
    print(f"After cleanup (raw): {get_refcount_internal(s)}")

if __name__ == "__main__":
    demonstrate_refcounting()
If you run this on a standard build versus a free-threaded build with deferred counting enabled, the numbers tell a story. In the deferred model, simply passing objects between functions doesn’t necessarily trigger the cache-thrashing atomic updates we used to fear.
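Before trusting any of these numbers, it's worth checking which world your interpreter actually lives in. A quick probe, using the `Py_GIL_DISABLED` config variable and `sys._is_gil_enabled()` (added in 3.13), might look like this:

```python
import sys
import sysconfig

def describe_build():
    # Py_GIL_DISABLED is 1 on free-threaded ("t") builds, 0/None otherwise.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    print(f"Free-threaded build: {free_threaded}")
    # sys._is_gil_enabled() reports whether the GIL is active right now --
    # it can be re-enabled at runtime even on a free-threaded build.
    if hasattr(sys, "_is_gil_enabled"):
        print(f"GIL currently enabled: {sys._is_gil_enabled()}")
    else:
        print("sys._is_gil_enabled() not available (pre-3.13 interpreter)")

describe_build()
```

The distinction between "free-threaded build" and "GIL currently enabled" matters: extensions that don't declare free-threading support can cause the runtime to quietly turn the GIL back on.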
The Garbage Collector’s New Job
Deferred reference counting introduces a problem, though. If we aren’t tracking every single reference immediately, how do we know when to delete an object?
We can’t rely solely on the refcount reaching zero instantly. The Garbage Collector (GC) has to pick up the slack. In previous versions of Python, the GC was mostly there to clean up reference cycles (e.g., Object A references Object B, and B references A). Now, it plays a more active role in reconciling these deferred counts.
I ran into a weird edge case last month where memory usage was creeping up because the GC wasn’t running frequently enough for our specific workload. We were creating millions of tiny, short-lived objects in a tight loop.
import gc
import time

class MemoryHog:
    def __init__(self, size_mb):
        self.data = bytearray(size_mb * 1024 * 1024)
        # Create a cycle just to force GC involvement
        self.cycle = self

def monitor_gc_stats():
    # Enable debug stats
    gc.set_debug(gc.DEBUG_STATS)
    print(f"GC Thresholds: {gc.get_threshold()}")

    # Create some pressure
    objs = []
    start_time = time.time()
    for i in range(50):
        # Create 10MB objects
        o = MemoryHog(10)
        objs.append(o)
        # Explicitly break reference to older objects to allow collection
        if len(objs) > 5:
            objs.pop(0)

    print(f"Time taken: {time.time() - start_time:.4f}s")
    print(f"GC Counts: {gc.get_count()}")

    # Force a collection and see what happens
    n = gc.collect()
    print(f"Unreachable objects collected: {n}")

if __name__ == "__main__":
    monitor_gc_stats()
Immortal Objects: The Unsung Hero
Another piece of this puzzle that doesn’t get enough love is PEP 683 (Immortal Objects). It landed back in 3.12, but it’s crucial for the deferred refcounting strategy to work efficiently.
Objects like None, True, False, and small integers are “immortal.” Their reference count is set to a special value that tells the interpreter: “Don’t bother counting me. I live forever.”
This seems trivial, but think about how many times your code references None. In a multi-threaded environment, updating the refcount for None every single time would be a massive bottleneck. By marking it immortal, we bypass the atomic overhead entirely for these frequent objects.
You can actually see this if you check the refcount of None. It looks like a garbage number, but it’s really a fixed sentinel value with the high bits set.
import sys

def check_immortality():
    # In Python 3.11 and earlier this was an ordinary count in the low
    # thousands; in 3.12+ it's the fixed immortal sentinel
    # (4294967295 on typical 64-bit builds)
    ref_count = sys.getrefcount(None)
    # The exact value depends on the 32/64-bit implementation,
    # but it should be massive.
    print(f"Refcount of None: {ref_count}")

    # Let's try to verify that it never changes
    x = None
    y = None
    z = [None] * 1000
    new_ref_count = sys.getrefcount(None)
    if new_ref_count == ref_count:
        print("Confirmed: None is immortal (refcount didn't change)")
    else:
        print(f"Refcount changed! Old: {ref_count}, New: {new_ref_count}")

check_immortality()
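None isn’t special-cased alone: the booleans and small integers get the same treatment. A quick probe (assuming the 3.12+ behavior where every immortal object reports the same sentinel; on older interpreters these all read as ordinary, varying counts):

```python
import sys

def probe_immortals():
    # If an object's reported refcount matches None's, it's almost
    # certainly carrying the immortal sentinel rather than a real count.
    sentinel = sys.getrefcount(None)
    for obj in (0, 1, 256, True, False):
        rc = sys.getrefcount(obj)
        tag = "looks immortal" if rc == sentinel else "ordinary count"
        print(f"{obj!r:>6}: refcount={rc} ({tag})")
    # A freshly computed large int is a normal, mortal heap object.
    big = 10 ** 6
    print(f" 10**6: refcount={sys.getrefcount(big)}")

probe_immortals()
```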
Why This Matters for Your Code
So, why should you care about any of this? (You’re just writing API endpoints, right?)
Because abstraction leaks. When you understand that Python is deferring reference counts on the stack, you start to write code differently. You realize that keeping objects inside a function scope is significantly cheaper than storing them in global state or shared heap objects where atomic operations are mandatory.
It also changes how you debug memory leaks. If you see memory usage spiking, it might not be a leak in the traditional sense—it might just be the deferred mechanism waiting for a collection cycle. Calling gc.collect() manually during debugging is more important than ever to distinguish between “deferred garbage” and “actual leaks.”
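One way I sanity-check this is with tracemalloc: measure traced memory before and after a forced collection. Whatever disappears was just pending garbage; whatever remains is a candidate real leak. A minimal sketch (the object counts are arbitrary, and I disable automatic GC so the cyclic junk deterministically survives until the explicit collect):

```python
import gc
import tracemalloc

def measure_pending_garbage():
    gc.disable()
    tracemalloc.start()
    try:
        junk = []
        for i in range(20_000):
            d = {"i": i}
            d["self"] = d        # cycle: plain refcounting can't free this
            junk.append(d)
        junk.clear()             # drop our references; the cycles linger
        before, _ = tracemalloc.get_traced_memory()
        gc.collect()             # now the cycles actually get reclaimed
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
        gc.enable()
    print(f"traced before collect: {before / 1e6:.2f} MB")
    print(f"traced after collect:  {after / 1e6:.2f} MB")
    return before, after

measure_pending_garbage()
```

If the "after" number stays stubbornly close to "before" across repeated collections, that's when you start hunting for a genuine leak.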
I’m still getting used to the quirks of Python 3.14. It feels faster, sure, but it also feels a bit more unpredictable in terms of memory footprint. But hey, if it means I can finally run threads without the GIL choking my CPU, I’ll take the trade-off.
