Python Concurrency Part 3 (threading): The GIL, ThreadPoolExecutor & run_in_executor


Python threads are OS-level threads: real, preemptively scheduled units of execution that can run on any CPU core. But there's a catch: the Global Interpreter Lock (GIL) means only one thread executes Python bytecode at a time. So threads in Python are concurrent but not parallel for CPU-bound work.

This is Part 3 of a six-part series on Python concurrency. We've covered asyncio (Part 1) and subprocess (Part 2). Now we cover threading, the right tool when you need to run synchronous, blocking code alongside an async program.

Full project code: research-agent

1. The Global Interpreter Lock (GIL)

The GIL is a mutex inside CPython (the standard Python interpreter) that ensures only one thread executes Python bytecode at any given moment. It exists because CPython's memory management (reference counting) is not thread-safe; the GIL protects internal data structures from corruption.

The practical consequence: even if you have 8 CPU cores and 8 threads, CPython only ever runs one thread's Python code at a time. For CPU-bound work (computation, parsing, tight loops written in pure Python) this makes threading useless for parallelism.

The GIL is released during I/O operations: network calls, disk reads, time.sleep(), and inside C extensions that do their own work (numpy, sqlite3). This is why threading works well for I/O-bound code even in Python: threads interleave during the wait periods.

If your app downloads files from five vendors, each request spends most of its time waiting on the network. Threads help because one request can wait while another makes progress. If the job is pure number crunching, the GIL becomes the limiting factor instead.
| Workload type | Threads help? | Why |
| --- | --- | --- |
| Network I/O, HTTP calls | Yes | GIL released during wait |
| Database queries (sqlite3, psycopg2) | Yes | GIL released during I/O |
| File read/write | Yes | GIL released during I/O |
| Pure Python number crunching | No | GIL blocks true parallelism |
| numpy / scipy heavy math | Partially | numpy releases the GIL during C operations |
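
A quick way to see the GIL's release during I/O: two sleeping threads overlap, so wall time tracks the longest wait rather than the sum. A minimal sketch (timings are approximate):

```python
import threading
import time

def io_bound(delay: float) -> None:
    time.sleep(delay)    # GIL released for the whole sleep

# Two 0.5s waits overlap, so wall time is ~0.5s, not ~1.0s
start = time.perf_counter()
threads = [threading.Thread(target=io_bound, args=(0.5,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
io_time = time.perf_counter() - start
print(f"two 0.5s I/O waits in parallel: {io_time:.2f}s")
```

Replace time.sleep with a pure-Python loop and the two threads take roughly as long as running serially, because the loop never releases the GIL.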

2. threading.Thread: The Low-Level API

threading.Thread is the base class for creating threads. You subclass it or pass a target function. Threads start with .start() and you wait for them with .join().

import threading
import time

def download(name: str, delay: float):
    print(f"{name}: starting download")
    time.sleep(delay)           # GIL released here; other threads can run
    print(f"{name}: done")

# Create three threads
t1 = threading.Thread(target=download, args=("File A", 2))
t2 = threading.Thread(target=download, args=("File B", 1))
t3 = threading.Thread(target=download, args=("File C", 3))

t1.start(); t2.start(); t3.start()
t1.join();  t2.join();  t3.join()

print("All downloads complete")
# All three start concurrently:
File A: starting download
File B: starting download
File C: starting download
File B: done          # delay=1
File A: done          # delay=2
File C: done          # delay=3
All downloads complete

Total wall time ≈ 3 seconds (the longest), not 6 seconds (sum). The GIL releases during time.sleep(), so all three threads run concurrently.

This pattern is useful for simple background jobs like downloading multiple reports, checking several servers, or reading files from different locations without redesigning the whole program around async.
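
One wrinkle with raw threads: the target function's return value is discarded. A common workaround is appending results to a shared list, as in this sketch (the download helper here is a stand-in, not from the project code):

```python
import threading
import time

results = []
results_lock = threading.Lock()

def download(name: str, delay: float) -> None:
    time.sleep(delay)              # simulated network wait (GIL released)
    with results_lock:             # guard the shared list
        results.append(f"{name}: done")

threads = [threading.Thread(target=download, args=(name, 0.1))
           for name in ("File A", "File B", "File C")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

After join() returns, every thread has finished, so the list is complete. ThreadPoolExecutor, covered next, removes this boilerplate by returning results directly.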

3. ThreadPoolExecutor: The Modern API

concurrent.futures.ThreadPoolExecutor is the recommended modern interface. It manages a pool of worker threads and provides a cleaner submit() / map() API. It also integrates seamlessly with asyncio via run_in_executor().

from concurrent.futures import ThreadPoolExecutor
import time

def fetch_sync(url: str) -> str:
    """A synchronous blocking call  simulates requests.get()"""
    time.sleep(1)
    return f"Result from {url}"

urls = ["https://api.source-a.com", "https://api.source-b.com", "https://api.source-c.com"]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() returns results in input order
    results = list(executor.map(fetch_sync, urls))

for r in results:
    print(r)
# All three run concurrently; total time ~1s, not 3s
Result from https://api.source-a.com
Result from https://api.source-b.com
Result from https://api.source-c.com

For finer control (individual tasks, callbacks, or exception handling), use submit(), which returns a Future.

If you need to fetch prices from ten supplier APIs and continue as each one finishes, ThreadPoolExecutor gives you a cleaner way to parallelise that blocking I/O than manually starting and tracking raw threads:

from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(fetch_sync, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            result = future.result()
            print(f"{url}: {result}")
        except Exception as e:
            print(f"{url} failed: {e}")

4. loop.run_in_executor(): Bridging Sync into Async

This is the most important threading pattern in an async codebase. Many libraries (sqlite3, psycopg2, requests, legacy SDKs) are synchronous. Calling them directly inside an async def function blocks the event loop and freezes every other coroutine.

loop.run_in_executor(executor, func, *args) runs func in a thread pool and returns an awaitable. The event loop is free while the thread is running.

import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# The synchronous function: runs in a thread
def save_to_db_sync(session_id: str, message: str, db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "INSERT INTO messages (session_id, content) VALUES (?, ?)",
        (session_id, message)
    )
    conn.commit()
    conn.close()

_db_executor = ThreadPoolExecutor(max_workers=2)

# The async wrapper: what the rest of your async code calls
async def save_message(session_id: str, message: str) -> None:
    loop = asyncio.get_running_loop()
    # run_in_executor pushes the blocking call to a thread;
    # the event loop is free during the DB write
    await loop.run_in_executor(
        _db_executor,
        save_to_db_sync,
        session_id, message, "app.db"
    )

From the caller's perspective, await save_message(...) looks identical to any other async call. But under the hood the blocking DB write happens in a worker thread, leaving the event loop free to run other coroutines.
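
Since Python 3.9 there's also a shorthand, asyncio.to_thread(), which runs the function in the loop's default thread pool with no explicit executor. A sketch with a stand-in blocking function:

```python
import asyncio
import time

def blocking_io(x: int) -> int:
    time.sleep(0.1)          # stands in for a blocking library call
    return x * 2

async def main() -> int:
    # runs blocking_io in the default thread pool; the loop stays free
    return await asyncio.to_thread(blocking_io, 21)

result = asyncio.run(main())
print(result)  # 42
```

A dedicated ThreadPoolExecutor, as above, is still worth it when you want to bound or name the threads for a particular subsystem.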

If an async API endpoint saves audit logs with a blocking library, calling that library directly can freeze unrelated requests. run_in_executor() moves the slow sync call off the event loop so the rest of the app stays responsive.

Fire-and-Forget Pattern

For non-critical side effects (like syncing to a cache or logging), you can wrap the executor call in asyncio.create_task() so the caller doesn't wait for it at all:

async def handle_request(question: str):
    answer = await llm.generate(question)

    # Don't wait for the DB write: fire and forget
    asyncio.create_task(save_message("session-1", answer))

    return answer    # returns immediately
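
One caveat with this pattern: the event loop holds only a weak reference to tasks, so an otherwise-unreferenced task can be garbage-collected before it finishes. The asyncio docs recommend keeping a reference until the task completes. A sketch (save_message here is a stand-in coroutine):

```python
import asyncio

background_tasks = set()

async def save_message(session_id: str, message: str) -> None:
    await asyncio.sleep(0)    # stand-in for the real DB write

async def handle_request(answer: str) -> str:
    task = asyncio.create_task(save_message("session-1", answer))
    background_tasks.add(task)                        # strong reference keeps it alive
    task.add_done_callback(background_tasks.discard)  # drop it once done
    return answer                                     # returns without awaiting the save

result = asyncio.run(handle_request("hello"))
print(result)
```

The set-plus-discard dance looks fussy, but it guarantees the write survives until it runs while still letting completed tasks be collected.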

5. Thread Safety: Lock and RLock

Threads share memory. If two threads read and write the same object without coordination, you get race conditions: one thread overwrites another's work, or reads a half-updated value. Use threading.Lock to protect shared state.

import threading

counter = 0
lock = threading.Lock()

def increment(n: int):
    global counter
    for _ in range(n):
        with lock:          # acquire lock, do work, release
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()

print(counter)    # Always 500000, never a race-corrupted value

Without the lock, counter += 1 (which is actually three operations: read, add, write) can be interrupted between steps, causing lost updates.

Think of two threads updating the same inventory count after orders arrive at nearly the same time. Without a Lock, one write can overwrite the other and leave stock numbers wrong.
RLock: The Reentrant Lock

A regular Lock will deadlock if the same thread tries to acquire it twice. Use threading.RLock (reentrant lock) when a function that holds a lock needs to call another function that also acquires the same lock.

This happens in real code when a high-level "update account" function grabs a lock and then calls a lower-level helper that also protects the same account object. RLock prevents that nested call from deadlocking the thread against itself.

rlock = threading.RLock()

def outer():
    with rlock:
        inner()           # safe: the same thread can re-acquire

def inner():
    with rlock:           # would deadlock with regular Lock
        print("inner")
threading.local() Per-Thread State

threading.local() creates a storage object where each thread gets its own independent copy. Useful for database connections: you want one connection per thread, not shared across threads.

If each worker thread handles a different customer request, thread-local storage lets each one keep its own DB connection or request context without accidentally leaking that state into another request.

import threading
import sqlite3

_thread_local = threading.local()

def get_connection(db_path: str) -> sqlite3.Connection:
    if not hasattr(_thread_local, "conn"):
        _thread_local.conn = sqlite3.connect(db_path)
    return _thread_local.conn   # each thread gets its own connection
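
To see the isolation, have several threads write the same attribute: each reads back only its own value. A minimal sketch:

```python
import threading

_local = threading.local()
observed = []
observed_lock = threading.Lock()

def worker(name: str) -> None:
    _local.name = name                # each thread sets its own copy
    with observed_lock:
        observed.append(_local.name)  # always this thread's value, never another's

threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2", "t3")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(observed))
```

With a plain module-level variable instead of threading.local(), the last writer would win and every thread could observe someone else's name.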

6. Threading in the Research Agent

In the Research Agent, sqlite3 is used for long-term memory (persisting conversation history between sessions). Because sqlite3 has no async API, calling it directly from the event loop would block it. The solution is a dedicated ThreadPoolExecutor:

Full project code: agents/threaded_memory/memory_agent.py

# agents/threaded_memory/memory_agent.py (excerpt)
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor

_db_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="memory-agent")

def _save_state_sync(state, db_path: str) -> None:
    """Pure sync  runs in a worker thread, never on the event loop."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        INSERT OR REPLACE INTO sessions (session_id, summary, updated_at)
        VALUES (?, ?, ?)
    """, (state.session_id, state.summary, "now"))
    conn.executemany(
        "INSERT INTO messages (session_id, role, content) VALUES (?,?,?)",
        [(state.session_id, m.role, m.content) for m in state.messages]
    )
    conn.commit()
    conn.close()

async def save_state(state) -> None:
    """Async wrapper  event loop stays free during the DB write."""
    loop = asyncio.get_event_loop()

    async def _do():
        await loop.run_in_executor(_db_executor, _save_state_sync, state, "agent_memory.db")

    asyncio.create_task(_do())    # fire-and-forget: caller doesn't wait

The two-layer design is intentional: _save_state_sync is a pure synchronous function with no asyncio imports, so it's safe to call from a thread. The save_state async wrapper handles scheduling and is the only interface the rest of the codebase sees.

In a chat app, saving memory to SQLite after each reply should not delay the answer reaching the user. Offloading that write to a thread keeps the conversation feeling instant while persistence happens in the background.

7. Threading vs asyncio vs ProcessPoolExecutor

| Scenario | Best choice | Why |
| --- | --- | --- |
| Async-native I/O (aiohttp, asyncpg) | asyncio | No threads needed; the library handles it |
| Sync I/O library (sqlite3, requests, psycopg2) | threading + run_in_executor | GIL releases during I/O; bridges to async |
| CPU-bound Python computation | ProcessPoolExecutor | Separate processes bypass the GIL entirely |
| Running isolated/untrusted code | subprocess | Full process isolation with a killable handle |
| Many short-lived concurrent tasks (>1000) | asyncio | Threads have memory overhead; coroutines don't |
| Legacy sync codebase integration | threading | No async refactor needed |
If your problem is "this library blocks but I don't want to rewrite everything," threading is often the practical answer. If the problem is CPU-heavy work or hard isolation, a process-based solution is usually the better fit.
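
For the CPU-bound row, ProcessPoolExecutor offers the same submit()/map() interface as ThreadPoolExecutor, but each worker is a separate process with its own GIL. A sketch (the __main__ guard matters on platforms that spawn worker processes):

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n: int) -> int:
    return sum(range(n))        # pure-Python computation, one process per task

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        totals = list(pool.map(cpu_bound, [1_000, 10_000]))
    print(totals)  # [499500, 49995000]
```

The trade-off is overhead: arguments and results are pickled between processes, so this only pays off when the computation dwarfs the serialization cost.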

8. Conclusion

Python threading is the adapter between the synchronous world and the async world. Its key strength is that you can take any blocking, synchronous library and run it in a thread pool with run_in_executor(), making it compatible with an async event loop without rewriting the library.

With all three concurrency tools now understood, the next three posts will build the complete Research Agent from scratch, watching exactly where and why each tool is applied in a real production-style codebase.

Threading isn't broken; it's just often the wrong tool. Know the GIL, pick the right primitive, and write code that is fast where it needs to be and safe everywhere else.

← Part 2: Subprocess | Next: Building the Research Agent →