Python threads are OS-level threads: real, preemptively scheduled units of execution that can run on any CPU core. But there's a catch: the Global Interpreter Lock (GIL) means only one thread executes Python bytecode at a time. So threads in Python are concurrent but not parallel for CPU-bound work.
This is Part 3 of a six-part series on Python concurrency. We've covered asyncio (Part 1) and subprocess (Part 2). Now we cover threading: the right tool when you need to run synchronous, blocking code alongside an async program. Topics:
- threading.Thread: the low-level API
- ThreadPoolExecutor: the modern high-level API
- loop.run_in_executor(): bridging sync code into an async program
- Lock, RLock, and ThreadLocal
- Full project code: research-agent
The GIL is a mutex inside CPython (the standard Python interpreter) that ensures only one thread executes Python bytecode at any given moment. It exists because CPython's memory management (reference counting) is not thread-safe; the GIL protects internal data structures from corruption.
The practical consequence: even if you have 8 CPU cores and 8 threads, CPython only ever runs one thread's Python code at a time. For CPU-bound work in pure Python (computation, parsing, element-by-element loops over numpy arrays), this makes threading useless for parallelism.
The GIL is released during I/O operations: network calls, disk reads, time.sleep(), and calls into C extensions that release it during their own work (numpy, sqlite3). This is why threading works well for I/O-bound code even in Python: threads interleave during the wait periods.
| Workload type | Threads help? | Why |
|---|---|---|
| Network I/O, HTTP calls | Yes | GIL released during wait |
| Database queries (sqlite3, psycopg2) | Yes | GIL released during I/O |
| File read/write | Yes | GIL released during I/O |
| Pure Python number crunching | No | GIL blocks true parallelism |
| numpy / scipy heavy math | Partially | numpy releases GIL during C operations |
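The first rows of the table are easy to verify. A minimal sketch, using time.sleep() as a stand-in for network I/O: two sleeping threads overlap, so wall time tracks the longest wait rather than the sum.

```python
import threading
import time

def fake_io(delay: float, results: list) -> None:
    time.sleep(delay)  # GIL released while sleeping, like real I/O
    results.append(delay)

results: list = []
t1 = threading.Thread(target=fake_io, args=(0.2, results))
t2 = threading.Thread(target=fake_io, args=(0.3, results))

start = time.perf_counter()
t1.start(); t2.start()
t1.join(); t2.join()
elapsed = time.perf_counter() - start

# Wall time is ~0.3s (the longest sleep), not 0.5s (the sum),
# because both threads wait concurrently while the GIL is released
print(f"elapsed: {elapsed:.2f}s")
```

Swap the sleeps for a pure-Python loop and the speedup disappears, which is exactly what the last two table rows predict.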
threading.Thread is the base class for creating threads. You subclass it or pass a target function. Threads start with .start() and you wait for them with .join().
import threading
import time
def download(name: str, delay: float):
    print(f"{name}: starting download")
    time.sleep(delay)  # GIL released here; other threads can run
    print(f"{name}: done")
# Create three threads
t1 = threading.Thread(target=download, args=("File A", 2))
t2 = threading.Thread(target=download, args=("File B", 1))
t3 = threading.Thread(target=download, args=("File C", 3))
t1.start(); t2.start(); t3.start()
t1.join(); t2.join(); t3.join()
print("All downloads complete")
# All three start concurrently:
File A: starting download
File B: starting download
File C: starting download
File B: done # delay=1
File A: done # delay=2
File C: done # delay=3
All downloads complete
Total wall time ≈ 3 seconds (the longest), not 6 seconds (sum). The GIL releases during time.sleep(), so all three threads run concurrently.
concurrent.futures.ThreadPoolExecutor is the recommended modern interface. It manages a pool of worker threads and provides a cleaner submit() / map() API. It also integrates seamlessly with asyncio via run_in_executor().
from concurrent.futures import ThreadPoolExecutor
import time
def fetch_sync(url: str) -> str:
    """A synchronous blocking call; simulates requests.get()."""
    time.sleep(1)
    return f"Result from {url}"
urls = ["https://api.source-a.com", "https://api.source-b.com", "https://api.source-c.com"]
with ThreadPoolExecutor(max_workers=3) as executor:
    # map() returns results in input order
    results = list(executor.map(fetch_sync, urls))

for r in results:
    print(r)
# All three run concurrently; total time ~1s, not 3s
Result from https://api.source-a.com
Result from https://api.source-b.com
Result from https://api.source-c.com
For finer control (individual tasks, callbacks, or exception handling), use submit(), which returns a Future:
ThreadPoolExecutor gives you a cleaner way to parallelise blocking I/O than manually starting and tracking raw threads.

from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(fetch_sync, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            result = future.result()
            print(f"{url}: {result}")
        except Exception as e:
            print(f"{url} failed: {e}")
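Futures also support completion callbacks via add_done_callback(), which fire in a worker thread as soon as each call finishes. A minimal sketch (the URLs are placeholders, and the lock guards the shared results list because callbacks run in worker threads):

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

def fetch_sync(url: str) -> str:
    time.sleep(0.1)  # stand-in for a blocking HTTP call
    return f"Result from {url}"

results = []
results_lock = threading.Lock()

def on_done(future) -> None:
    # Invoked in the worker thread that completed the call
    with results_lock:
        results.append(future.result())

with ThreadPoolExecutor(max_workers=2) as executor:
    for url in ["https://api.source-a.com", "https://api.source-b.com"]:
        executor.submit(fetch_sync, url).add_done_callback(on_done)
# Exiting the with-block waits for all workers to finish

print(results)
```

Callbacks suit side effects like logging or metrics; for collecting results in the main thread, as_completed() above usually reads more clearly.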
This is the most important threading pattern in an async codebase. Many libraries (sqlite3, psycopg2, requests, legacy SDKs) are synchronous. Calling them directly inside an async def function blocks the event loop and freezes every other coroutine.
loop.run_in_executor(executor, func, *args) runs func in a thread pool and returns an awaitable. The event loop is free while the thread is running.
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor
# The synchronous function runs in a thread
def save_to_db_sync(session_id: str, message: str, db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "INSERT INTO messages (session_id, content) VALUES (?, ?)",
        (session_id, message)
    )
    conn.commit()
    conn.close()
_db_executor = ThreadPoolExecutor(max_workers=2)
# The async wrapper: what the rest of your async code calls
async def save_message(session_id: str, message: str) -> None:
    loop = asyncio.get_running_loop()
    # run_in_executor pushes the blocking call to a thread;
    # the event loop is free during the DB write
    await loop.run_in_executor(
        _db_executor,
        save_to_db_sync,
        session_id, message, "app.db"
    )
From the caller's perspective, await save_message(...) looks identical to any other async call. But under the hood the blocking DB write happens in a worker thread, leaving the event loop free to run other coroutines.
run_in_executor() moves the slow sync call off the event loop so the rest of the app stays responsive.
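Since Python 3.9 there is also a shorthand for this bridge: asyncio.to_thread() runs a sync function on the default thread pool and returns an awaitable. A minimal sketch, with a stand-in blocking function:

```python
import asyncio
import time

def blocking_fetch(name: str) -> str:
    time.sleep(0.1)  # stand-in for any blocking library call
    return f"data:{name}"

async def main() -> list:
    # Both blocking calls run in worker threads; the event loop stays free
    return await asyncio.gather(
        asyncio.to_thread(blocking_fetch, "a"),
        asyncio.to_thread(blocking_fetch, "b"),
    )

results = asyncio.run(main())
print(results)
```

run_in_executor() is still the right choice when you want your own dedicated pool (as in the database example above); to_thread() always uses the loop's default executor.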
For non-critical side effects (like syncing to a cache or logging), you can wrap the executor call in asyncio.create_task() so the caller doesn't wait for it at all:
async def handle_request(question: str):
    answer = await llm.generate(question)
    # Don't wait for the DB write; fire and forget
    asyncio.create_task(save_message("session-1", answer))
    return answer  # returns immediately
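One caveat with fire-and-forget: the event loop keeps only a weak reference to tasks, so a task created with a bare create_task() can be garbage-collected before it finishes. The asyncio docs recommend holding a reference until the task completes; a small helper sketch (fire_and_forget and side_effect are illustrative names, not project code):

```python
import asyncio

_background_tasks: set = set()

def fire_and_forget(coro) -> None:
    """Schedule coro without awaiting it, keeping a strong reference."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)                        # prevent premature GC
    task.add_done_callback(_background_tasks.discard)  # drop it when done

async def side_effect(log: list) -> None:
    await asyncio.sleep(0.05)
    log.append("saved")

async def main() -> list:
    log: list = []
    fire_and_forget(side_effect(log))
    await asyncio.sleep(0.1)  # demo only: give the background task time to finish
    return log

print(asyncio.run(main()))
```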
Threads share memory. If two threads read and write the same object without coordination, you get race conditions: one thread overwrites another's work, or reads a half-updated value. Use threading.Lock to protect shared state.
import threading
counter = 0
lock = threading.Lock()
def increment(n: int):
    global counter
    for _ in range(n):
        with lock:  # acquire lock, do work, release
            counter += 1
threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # Always 500000, never a race-corrupted value
Without the lock, counter += 1 (which is actually three operations: read, add, write) can be interrupted between steps, causing lost updates.
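You can see those separate steps in the bytecode. A quick look with the standard dis module (on a simplified increment function) shows the load and store as distinct instructions, with room for a thread switch in between:

```python
import dis
import io

counter = 0

def unsafe_increment() -> None:
    global counter
    counter += 1  # read, add, write: separate bytecode instructions

# Capture the disassembly as text
buf = io.StringIO()
dis.dis(unsafe_increment, file=buf)
bytecode = buf.getvalue()
print(bytecode)
```

If another thread runs between the LOAD_GLOBAL and the STORE_GLOBAL, its update to counter is silently lost.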
A regular Lock will deadlock if the same thread tries to acquire it twice. Use threading.RLock (reentrant lock) when a function that holds a lock needs to call another function that also acquires the same lock.
RLock prevents that nested call from deadlocking the thread against itself.

rlock = threading.RLock()
def outer():
    with rlock:
        inner()  # safe: same thread can re-acquire

def inner():
    with rlock:  # would deadlock with a regular Lock
        print("inner")
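The difference is easy to verify without actually deadlocking anything: acquire(blocking=False) returns immediately instead of waiting, so a sketch can probe whether a second acquire from the same thread succeeds.

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

plain.acquire()
# Second acquire from the SAME thread: a plain Lock refuses
# (with blocking=True this would deadlock)
got_plain = plain.acquire(blocking=False)
plain.release()

reentrant.acquire()
# An RLock counts acquisitions per owning thread, so this succeeds
got_rlock = reentrant.acquire(blocking=False)
reentrant.release()
reentrant.release()  # must release once per acquire

print(got_plain, got_rlock)
```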
threading.local() creates a storage object where each thread gets its own independent copy. Useful for database connections: you want one connection per thread, not one shared across threads.
import threading
import sqlite3
_thread_local = threading.local()
def get_connection(db_path: str) -> sqlite3.Connection:
    if not hasattr(_thread_local, "conn"):
        _thread_local.conn = sqlite3.connect(db_path)
    return _thread_local.conn  # each thread gets its own connection
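A quick check that this works as advertised: run get_connection() from two threads and compare. Within a thread, repeated calls return the same connection; across threads, the connections are distinct objects (this sketch uses ":memory:" databases purely for illustration).

```python
import threading
import sqlite3

_thread_local = threading.local()

def get_connection(db_path: str) -> sqlite3.Connection:
    if not hasattr(_thread_local, "conn"):
        _thread_local.conn = sqlite3.connect(db_path)
    return _thread_local.conn

records = []
records_lock = threading.Lock()

def worker() -> None:
    conn = get_connection(":memory:")
    again = get_connection(":memory:")  # same thread: same cached connection
    with records_lock:
        records.append((conn, conn is again))

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()

# Each thread cached its own connection; the two objects differ
print(records)
```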
In the Research Agent, sqlite3 is used for long-term memory (persisting conversation history between sessions). Because sqlite3 has no async API, calling it directly from the event loop would block it. The solution is a dedicated ThreadPoolExecutor:
Full project code: agents/threaded_memory/memory_agent.py
# agents/threaded_memory/memory_agent.py (excerpt)
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor
_db_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="memory-agent")
def _save_state_sync(state, db_path: str) -> None:
    """Pure sync: runs in a worker thread, never on the event loop."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        INSERT OR REPLACE INTO sessions (session_id, summary, updated_at)
        VALUES (?, ?, ?)
    """, (state.session_id, state.summary, "now"))
    conn.executemany(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        [(state.session_id, m.role, m.content) for m in state.messages]
    )
    conn.commit()
    conn.close()
async def save_state(state) -> None:
    """Async wrapper: event loop stays free during the DB write."""
    loop = asyncio.get_event_loop()

    async def _do():
        await loop.run_in_executor(_db_executor, _save_state_sync, state, "agent_memory.db")

    asyncio.create_task(_do())  # fire-and-forget: caller doesn't wait
The two-layer design is intentional: _save_state_sync is a pure synchronous function with no asyncio imports, so it's safe to call from a thread. The save_state async wrapper handles scheduling and is the only interface the rest of the codebase sees.
| Scenario | Best choice | Why |
|---|---|---|
| Async-native I/O (aiohttp, asyncpg) | asyncio | No threads needed; the library handles it |
| Sync I/O library (sqlite3, requests, psycopg2) | threading + run_in_executor | GIL releases during I/O; bridges to async |
| CPU-bound Python computation | ProcessPoolExecutor | Separate process bypasses GIL entirely |
| Running isolated/untrusted code | subprocess | Full process isolation with killable handle |
| Many short-lived concurrent tasks (>1000) | asyncio | Threads have memory overhead; coroutines don't |
| Legacy sync codebase integration | threading | No async refactor needed |
Python threading is the adapter between the synchronous world and the async world. Its key strength is that you can take any blocking, synchronous library and run it in a thread pool with run_in_executor(), making it compatible with an async event loop without rewriting the library.
- ThreadPoolExecutor is the modern way to manage thread pools, cleaner than raw Thread objects.
- run_in_executor() is the bridge: push any sync blocking call into a thread, get back an awaitable.
- Lock protects shared state when multiple threads read and write the same object.

With all three concurrency tools now understood, the next three posts will build the complete Research Agent from scratch, watching exactly where and why each tool is applied in a real production-style codebase.
Threading isn't broken; it's just often the wrong tool. Know the GIL, pick the right primitive, and write code that is fast where it needs to be and safe everywhere else.