Step-by-Step Tutorial: Writing Python Extensions in Rust With PyO3
I hit a massive performance wall last Tuesday. I was tasked with parsing a 50GB dataset of nested JSON logs for a cybersecurity client doing malware analysis. Pure Python choked. Multiprocessing helped slightly, but the IPC (Inter-Process Communication) overhead destroyed most of the gains. Even the vectorized speedups in recent Pandas and NumPy releases didn’t help, because my custom business logic was too convoluted for simple matrix operations.
I needed raw, unadulterated speed. Historically, this meant diving into CPython internals and writing a C extension. If you’ve ever done that, you know the pain: manual reference counting, memory leaks, and mysterious segmentation faults that keep you awake at 3 AM. While the upcoming Python JIT and the buzz around the Mojo language offer exciting future promises for performance, I had a deadline this week. I needed a production-ready solution now.
That’s when I decided to write a Python extension in Rust. If you look at the modern Python ecosystem, Rust is quietly taking over. The blazing-fast Ruff linter, the Polars dataframe library, the uv installer, the Rye project manager, and pydantic-core are all powered by Rust. Rust gives you C-level performance with compile-time guarantees against memory errors. In this tutorial, I will show you exactly how to build, compile, and distribute your own Rust-powered Python modules using PyO3 and Maturin.
Why Write a Python Extension in Rust?
Before we touch the code, let’s establish why Rust is the definitive choice for extending Python today. You might be tempted to use Cython or C++, but Rust Python integration offers distinct architectural advantages.
- Fearless Concurrency: Python’s Global Interpreter Lock (GIL) prevents true multithreading. While the community is excited about PEP 703 (GIL removal) and Free threading coming in Python 3.13+, Rust allows you to safely release the GIL today. You can spin up native OS threads in Rust, crunch your data across all CPU cores, and return the result to Python.
- Memory Safety: CPython’s C API requires you to manually call Py_INCREF and Py_DECREF. One mistake, and your FastAPI or async Django backend crashes the entire server. Rust’s ownership model guarantees memory safety at compile time.
- Ecosystem Tooling: Cargo (Rust’s package manager) is a joy to use. Integrating third-party Rust crates for cryptography, numerical algorithms, or machine-learning inference is as simple as adding a single line to a config file.
Whether you are optimizing a local LLM pipeline built with LangChain or LlamaIndex, or building high-frequency algo trading bots, Rust provides the exact tools you need to bypass Python’s bottlenecks.
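To see the GIL bottleneck concretely before we fix it, here is a minimal pure-Python sketch (the cpu_bound helper is my own illustration, not part of the project we build below). Two Python threads doing CPU-bound work contend for the GIL, so the wall-clock time barely improves over running serially:

```python
import threading
import time

def cpu_bound(n: int) -> int:
    # A pure-Python busy loop; it holds the GIL the entire time it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    n = 2_000_000

    start = time.perf_counter()
    cpu_bound(n)
    cpu_bound(n)
    serial = time.perf_counter() - start

    start = time.perf_counter()
    threads = [threading.Thread(target=cpu_bound, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    threaded = time.perf_counter() - start

    # On CPython with the GIL, the threaded run is roughly as slow as the
    # serial one. A Rust extension that releases the GIL scales instead.
    print(f"serial: {serial:.2f}s, two threads: {threaded:.2f}s")
```

On a typical CPython build, both timings come out nearly identical, which is exactly the ceiling the rest of this tutorial breaks through.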
Setting Up Your Rust and Python Toolchain
We are going to use Maturin. Maturin is a build system specifically designed for Rust Python packages. It handles the nightmare of cross-compiling and linking against the correct CPython headers automatically.
First, install the Rust compiler using rustup. Open your terminal and run:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Next, you need a modern Python environment. I highly recommend moving beyond plain pip for project management and using uv or Rye. For this tutorial, we will create an isolated virtual environment and install Maturin.
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install maturin
pip install maturin
If you prefer standardizing your team’s workflow, Maturin also integrates with Hatch and PDM via PEP 517 build hooks. But for our direct tutorial, the Maturin CLI is the fastest path to working code.
Project Scaffolding: Creating the Extension
Navigate to your workspace and let Maturin scaffold the project. We will call our extension fast_crunch.
maturin new fast_crunch
When prompted, select pyo3 as your binding. Maturin generates a hybrid repository containing both a Cargo.toml for Rust and a pyproject.toml for Python.
Open Cargo.toml. You will see something like this:
[package]
name = "fast_crunch"
version = "0.1.0"
edition = "2021"
[lib]
name = "fast_crunch"
crate-type = ["cdylib"]
[dependencies]
pyo3 = { version = "0.20.0", features = ["extension-module"] }
The cdylib crate type is crucial. It tells the Rust compiler to produce a dynamic library (a .so on Linux and macOS, or a .pyd on Windows) that CPython can load natively at runtime.
Writing Your First Rust Function for Python

Open src/lib.rs. This is where the magic happens. PyO3 uses Rust macros to generate the C API boilerplate. Let’s write a function that performs a heavy mathematical operation—something that would typically block a Python web server.
use pyo3::prelude::*;
/// A computationally heavy function that calculates the nth Fibonacci number.
#[pyfunction]
fn calculate_fibonacci(n: u64) -> PyResult<u64> {
    if n == 0 {
        return Ok(0);
    }
    let mut a = 0;
    let mut b = 1;
    for _ in 1..n {
        let temp = a;
        a = b;
        b = temp + b;
    }
    Ok(b)
}
/// A Python module implemented in Rust. The name of this function must match
/// the lib.name setting in the Cargo.toml, else Python will not be able to
/// import the module.
#[pymodule]
fn fast_crunch(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(calculate_fibonacci, m)?)?;
    Ok(())
}
Here is exactly what is happening in this code:
- #[pyfunction]: This procedural macro parses our Rust function and generates a C-compatible wrapper. It handles the conversion of Python objects (like integers) into native Rust types (u64).
- PyResult: We return a PyResult. If our Rust code encounters an error, PyO3 automatically translates it into a Python exception (like ValueError or RuntimeError). This ensures your pytest suites can catch errors naturally.
- #[pymodule]: This macro defines the entry point of the extension. When you run import fast_crunch in Python, CPython looks for an initialization function matching the module name. We register our calculate_fibonacci function inside this module.
To compile and install this directly into your active virtual environment, run:
maturin develop --release
The --release flag is non-negotiable for performance testing. Rust compiled in debug mode is incredibly slow. Once the build finishes, open a Python REPL:
import fast_crunch
# Calculates the 90th Fibonacci number instantly
result = fast_crunch.calculate_fibonacci(90)
print(f"Result from Rust: {result}")
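If you want parity tests for the extension, a pure-Python reference that mirrors the Rust loop makes the expected semantics explicit (py_fibonacci is my own test-only sketch, not part of the generated project):

```python
def py_fibonacci(n: int) -> int:
    # Mirrors the Rust implementation: F(0) = 0, F(1) = 1, iterative O(n).
    if n == 0:
        return 0
    a, b = 0, 1
    for _ in range(1, n):
        a, b = b, a + b
    return b

# In a pytest file you could then assert, for example:
# assert fast_crunch.calculate_fibonacci(90) == py_fibonacci(90)
```

This kind of cross-check catches subtle off-by-one disagreements between the two implementations long before they reach production.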
Handling Complex Data Types and Vectorization
Simple integers are easy. But real-world data engineering involves heavy arrays. Suppose you are building a custom indicator for algo trading. You have a massive list of closing prices, and you need to calculate an exponential moving average (EMA). Passing data between Python and Rust incurs overhead. You must minimize the boundary crossings.
Let’s write a Rust function that accepts a list of floats and returns a new list of floats. We will use PyO3’s built-in type conversions.
use pyo3::prelude::*;
use pyo3::types::PyList;
#[pyfunction]
fn calculate_ema(py: Python, prices: Vec<f64>, window: usize) -> PyResult<PyObject> {
    if prices.is_empty() || window == 0 {
        // Return an empty Python list
        return Ok(PyList::empty(py).into());
    }

    let mut ema_values = Vec::with_capacity(prices.len());
    let multiplier = 2.0 / (window as f64 + 1.0);

    // Initial EMA is just the first price
    let mut current_ema = prices[0];
    ema_values.push(current_ema);

    for &price in prices.iter().skip(1) {
        current_ema = (price - current_ema) * multiplier + current_ema;
        ema_values.push(current_ema);
    }

    // Convert the Rust Vec back to a Python list
    let py_list = PyList::new(py, ema_values);
    Ok(py_list.into())
}
Notice that we accept Vec<f64>. PyO3 automatically iterates over the incoming Python list and allocates a Rust vector. While this is convenient, it involves copying the data.
If you are running PyArrow, DuckDB, or Ibis over massive ETL pipelines, copying memory is a sin. For true zero-copy data sharing, you would use the rust-numpy crate, which allows Rust to directly access the raw memory buffers of NumPy arrays. But for web requests, scraping pipelines built on Scrapy, or Playwright interceptors passing moderate payloads, standard PyO3 type conversion is exceptionally fast and perfectly adequate.
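To keep the Rust and Python sides honest, a reference EMA in plain Python (a test-only sketch of my own, not shipped with the extension) lets you compare outputs within floating-point tolerance:

```python
def py_ema(prices: list[float], window: int) -> list[float]:
    # Same recurrence as the Rust version:
    # ema[0] = prices[0]; ema[i] = (p - ema[i-1]) * k + ema[i-1], with k = 2 / (window + 1)
    if not prices or window == 0:
        return []
    k = 2.0 / (window + 1.0)
    out = [prices[0]]
    for price in prices[1:]:
        out.append((price - out[-1]) * k + out[-1])
    return out
```

In a pytest parity test you would zip the two result lists and assert each pair agrees to within something like 1e-9, since both sides do the same f64 arithmetic.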
Managing State with Custom Python Classes in Rust
Functions are great, but modern Python relies heavily on object-oriented programming. If you are building a backend for a Reflex app, a Flet dashboard, or managing WebSocket connections in a Litestar API, you need stateful objects.
Let’s build a high-performance, thread-safe LRU cache in Rust and expose it as a Python class. This is exactly the kind of architecture you use for caching local LLM context or tracking state in edge inference services.
Add the lru crate to your Cargo.toml:
cargo add lru
Now, update your src/lib.rs to define a Python class using #[pyclass] and #[pymethods].
use pyo3::prelude::*;
use std::num::NonZeroUsize;
use lru::LruCache;
use std::sync::Mutex;
/// A thread-safe LRU Cache exposed to Python
#[pyclass]
struct FastCache {
    // We wrap the cache in a Mutex because PyO3 requires classes
    // to be Send + Sync if they are to be shared across Python threads.
    cache: Mutex<LruCache<String, String>>,
}

#[pymethods]
impl FastCache {
    /// The __init__ method for the Python class
    #[new]
    fn new(capacity: usize) -> Self {
        let cap = NonZeroUsize::new(capacity).unwrap_or(NonZeroUsize::new(1).unwrap());
        FastCache {
            cache: Mutex::new(LruCache::new(cap)),
        }
    }

    fn put(&self, key: String, value: String) {
        let mut cache = self.cache.lock().unwrap();
        cache.put(key, value);
    }

    fn get(&self, key: String) -> Option<String> {
        let mut cache = self.cache.lock().unwrap();
        // Clone the string to return it to Python
        cache.get(&key).cloned()
    }
}
#[pymodule]
fn fast_crunch(_py: Python, m: &PyModule) -> PyResult<()> {
    // Keep the functions from earlier in this file registered alongside the class.
    m.add_function(wrap_pyfunction!(calculate_fibonacci, m)?)?;
    m.add_class::<FastCache>()?;
    Ok(())
}
Compile this again with maturin develop --release. You have just built a stateful, compiled object that behaves exactly like a native Python class.
from fast_crunch import FastCache
# Instantiate the Rust struct from Python
cache = FastCache(1000)
cache.put("user_1", "authenticated")
print(cache.get("user_1")) # Outputs: authenticated
print(cache.get("user_2")) # Outputs: None
This pattern is devastatingly effective. You get the ergonomic API of Python with the memory layout and execution speed of Rust. Furthermore, by using a Mutex, this cache is completely thread-safe. As your Python automation scripts and Selenium bots scale up their multithreading, your Rust backend will never suffer from race conditions.
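If you want to unit-test the eviction contract on CI machines without a Rust toolchain, a pure-Python stand-in with the same put/get semantics is handy. FastCacheReference below is a hypothetical helper of my own, built on OrderedDict and a lock to mimic the Rust Mutex:

```python
import threading
from collections import OrderedDict

class FastCacheReference:
    """Pure-Python stand-in mirroring the Rust FastCache's put/get semantics."""

    def __init__(self, capacity: int) -> None:
        # The Rust version clamps a zero capacity to 1; mirror that here.
        self.capacity = max(capacity, 1)
        self._lock = threading.Lock()
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def put(self, key: str, value: str) -> None:
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)  # refresh recency on overwrite
            self._data[key] = value
            if len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict least recently used

    def get(self, key: str):
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # a read also counts as a use
            return self._data[key]
```

Running the same test suite against both FastCacheReference and the compiled FastCache is a cheap way to pin down the behavior your callers depend on.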
Defeating the GIL: True Multithreading in Python

This is the holy grail. The primary reason senior engineers write Python extensions in Rust is to bypass the Global Interpreter Lock. If you are executing heavy cryptography, image processing, or running Monte Carlo simulations for algorithmic trading, you want to peg all 16 cores of your CPU at 100%.
To do this, we combine PyO3 with Rayon, Rust’s premier data-parallelism library. Add Rayon to your project:
cargo add rayon
Let’s write a function that takes a massive list of strings (perhaps scraped web text or raw documents for a LlamaIndex ingestion pipeline) and hashes them with SHA-256. We will release the Python GIL and process the array in parallel across all CPU cores.
use pyo3::prelude::*;
use rayon::prelude::*;
use sha2::{Sha256, Digest}; // Requires cargo add sha2
#[pyfunction]
fn parallel_hash(py: Python, documents: Vec<String>) -> PyResult<Vec<String>> {
    // Release the GIL here!
    let hashed_docs = py.allow_threads(move || {
        // We are now outside the GIL. Python threads can continue executing.
        // Rayon's par_iter distributes the workload across all CPU cores.
        documents
            .par_iter()
            .map(|doc| {
                let mut hasher = Sha256::new();
                hasher.update(doc.as_bytes());
                format!("{:x}", hasher.finalize())
            })
            .collect()
    });
    // The GIL is automatically re-acquired when allow_threads finishes.
    Ok(hashed_docs)
}
The py.allow_threads() closure is the most powerful construct in PyO3. While your Rust code is spinning on all CPU cores inside that closure, the Python interpreter is free to handle other tasks. Your Django async server can continue serving HTTP requests. Your FastAPI event loop keeps ticking. You have achieved true parallelism.
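For parity testing, Python's hashlib produces the same lowercase hex digests as the Rust format!("{:x}", ...) output, so a reference helper like this (my own sketch) lets you assert equality against parallel_hash on real data:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def py_parallel_hash(documents: list[str]) -> list[str]:
    # hashlib's SHA-256 hexdigest matches the Rust output byte-for-byte.
    # Note: CPython only drops the GIL inside hashlib for large buffers,
    # so this threaded version will not scale like the Rayon one; it exists
    # for correctness checks, not performance.
    with ThreadPoolExecutor() as pool:
        return list(
            pool.map(lambda doc: hashlib.sha256(doc.encode("utf-8")).hexdigest(), documents)
        )
```

A pytest parity test can then feed both functions the same document list and assert the returned lists are identical, which also guards against accidental reordering in the parallel version (pool.map and par_iter both preserve input order).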
Tooling, Type Hints, and Distribution
Writing the code is only half the battle. If you want your team to actually use your extension, it needs to behave like a good Python citizen.
First, type hints. Modern Python relies heavily on MyPy and the Ruff linter for static analysis. Because Rust extensions are compiled binaries (.so or .pyd files), Python linters cannot read their source code to infer types. You must provide a .pyi stub file.
Create a file named fast_crunch.pyi in the root of your Python module directory:
from typing import List, Optional
def calculate_fibonacci(n: int) -> int: ...
def calculate_ema(prices: List[float], window: int) -> List[float]: ...
class FastCache:
    def __init__(self, capacity: int) -> None: ...
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...
MyPy, Ruff, and your IDE will now correctly interpret your Rust extension’s API, giving your users autocomplete and type checking out of the box.
Finally, publishing. Maturin makes publishing to PyPI trivial. When you are ready to distribute your wheel, run:

maturin build --release
This generates a standard .whl file in the target/wheels directory. You can upload this directly to PyPI using Twine. For multi-platform support (Linux, macOS, Windows), you should set up GitHub Actions using the maturin-action. It will automatically spin up CI runners, cross-compile your Rust code for every operating system, and publish the wheels to PyPI. Your end users will just run pip install fast_crunch and get a pre-compiled binary instantly—no Rust toolchain required on their end.
FAQ: Writing Python Extensions in Rust
How hard is it to learn Rust if I only know Python?
The learning curve is steep but manageable. Python developers usually struggle most with Rust’s borrow checker and strict typing. However, because PyO3 abstracts away the complex C-bindings, you can start by writing simple, procedural Rust functions and gradually learn advanced concepts like lifetimes as your extensions grow.
Can I distribute Rust Python extensions safely on PyPI?
Yes. Maturin compiles your Rust code into standard Python wheels. By using CI/CD pipelines (like GitHub Actions) to build wheels for Windows, macOS, and Linux, your end users simply run pip install your-package. They do not need the Rust compiler installed on their machines, ensuring high PyPI safety and ease of use.
