PyO3 v0.28 and maturin: Writing Python Extensions in Rust That Actually Ship
Introduction
Polars, Ruff, Pydantic v2, Hugging Face tokenizers, orjson. All Python libraries. All written in Rust under the hood with PyO3. They didn't rewrite everything. They moved the 5% of code causing 95% of the slowdown into Rust, kept their Python API exactly as it was, and shipped pip-installable wheels. You can do the same.
This guide covers PyO3 v0.28 and maturin 1.8 specifically. The API has changed substantially since v0.20 and most tutorials online are out of date. Everything here uses the current Bound<'py, T> API, the IntoPyObject trait, and the free-threaded Python support that landed in v0.23 (for CPython 3.13t) and matured through v0.28. Every code example is written to compile against pyo3 = "0.28".
Why PyO3, Why Now
The Problem With Pure Python Performance
Python has three serious performance ceilings. The first is raw execution speed: CPython interprets bytecode, which is roughly 10–100x slower than compiled native code depending on the workload. The second is the GIL: even with multiple threads, only one thread executes Python bytecode at a time. The third is memory layout: Python objects are heap-allocated, reference-counted boxes, which kills CPU cache efficiency for numeric workloads.
For I/O-bound code, none of this matters. But for CPU-bound work such as parsing, numeric computation, text processing, compression, or cryptography, you're leaving a significant amount of performance on the table.
The older solutions each have problems. Cython requires a .pyx dialect that's not plain Python or plain C. ctypes works but the ergonomics are terrible. C extensions require you to manually manage Python reference counts. CFFI is better than ctypes but still requires hand-written binding code.
What PyO3 Actually Is
PyO3 is a set of Rust bindings for the CPython API. It works in both directions: Rust code calling Python (embedding), and Python code calling Rust (extensions). You write Rust functions and structs, annotate them with PyO3 macros, compile to a native .so or .pyd file, and import the result like any other Python module.
Production users include Polars, Ruff, Pydantic v2, orjson, cryptography, and Hugging Face tokenizers. These are not toy projects.
The 95/5 Rule
Before writing a single line of Rust, profile. Use py-spy to record a flamegraph:
bash
pip install py-spy
py-spy record -o profile.svg -- python your_script.py
# or quick function-level breakdown:
python -m cProfile -s cumulative your_script.py | head -30

Find the top one to three functions that consume the most CPU time. Those are your Rust candidates. Everything else stays in Python.
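If you would rather stay inside Python, the same function-level breakdown can be collected programmatically. The following is a minimal sketch using only the standard library; the workload (`slow_tokenize`, `load_config`, `main`) is hypothetical and exists only to give the profiler one obviously hot function to find.

```python
import cProfile
import io
import pstats


def slow_tokenize(text):
    """Hypothetical hot spot: a character-by-character tokenizer."""
    tokens, current = [], []
    for ch in text:
        if ch.isspace():
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def load_config():
    """Hypothetical cheap function that should not rank near the top."""
    return {"lowercase": True}


def main():
    config = load_config()
    text = "the quick brown fox " * 5_000
    return len(slow_tokenize(text)), config


profiler = cProfile.Profile()
profiler.enable()
token_count, _ = main()
profiler.disable()

# Sort by cumulative time and keep the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

In a report like this, the function to consider porting is the one whose cumulative time dominates, here `slow_tokenize`.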
PyO3 vs. Cython vs. ctypes vs. cffi
| Tool | Raw Speed | Call Overhead | Safety | Ergonomics |
|---|---|---|---|---|
| PyO3 | C-level | Low | Memory-safe | Excellent |
| Cython | C-level | Very low | Unsafe | Moderate |
| ctypes | C-level | High (libffi) | Unsafe | Poor |
| cffi | C-level | High | Unsafe | Moderate |
Part 1: Setting Up Your Environment
Prerequisites
You need Rust 1.83 or later and Python 3.9 or later. Install Rust via rustup, then create a virtualenv and install maturin:
bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup toolchain install stable
python -m venv .venv
source .venv/bin/activate
pip install maturin==1.8.3
maturin --version # maturin 1.8.3

Initializing a New Project
bash
maturin new --bindings pyo3 my_extension
cd my_extension

The generated structure:
text
my_extension/
├── Cargo.toml
├── pyproject.toml
└── src/
    └── lib.rs

The critical parts of Cargo.toml:
toml
[package]
name = "my_extension"
version = "0.1.0"
edition = "2021"
[lib]
name = "my_extension"
crate-type = ["cdylib"]
[dependencies]
pyo3 = { version = "0.28", features = ["extension-module"] }

The crate-type = ["cdylib"] line is required. It tells Rust to compile a C-compatible shared library instead of a Rust library. The extension-module feature disables PyO3's default behavior of linking against libpython.
Part 2: Your First PyO3 Extension
The Core Macros
PyO3 uses Rust macros to annotate your code. Three macros cover almost everything:
- #[pyfunction]: marks a Rust function as callable from Python
- #[pymodule]: marks a function as the module entry point
- #[pyclass] / #[pymethods]: exposes a Rust struct as a Python type
Writing and Exposing a Function
A complete src/lib.rs that exposes a word-count function:
rust
use pyo3::prelude::*;

/// Count whitespace-separated words in `text`.
#[pyfunction]
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Add two integers and return the sum as a string.
#[pyfunction]
fn sum_as_string(a: usize, b: usize) -> PyResult<String> {
    Ok((a + b).to_string())
}

/// Module entry point; the function name must match `lib.name` in Cargo.toml.
#[pymodule]
fn my_extension(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(word_count, m)?)?;
    m.add_function(wrap_pyfunction!(sum_as_string, m)?)?;
    Ok(())
}

The Bound<'_, PyModule> type is the v0.21+ API. If you see tutorials using &PyModule, they are out of date.
Building and Testing
bash
maturin develop # debug build
maturin develop --release # release build for benchmarking

python
import my_extension
print(my_extension.word_count("hello world foo")) # 3
print(my_extension.sum_as_string(10, 20)) # "30"

Debug builds can be 10–20x slower than release builds for compute-intensive code. Never measure performance against a debug build.
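One way to keep the Rust and Python sides honest is to check the extension against a pure-Python reference. The sketch below assumes the `my_extension` module from this section; when it is not importable (for example, before the first `maturin develop`), it falls back to the reference so the script still runs. The `reference_word_count` helper and the sample strings are hypothetical, and the comparison is only meant for ordinary ASCII whitespace, where Rust's `split_whitespace` and Python's `str.split()` agree.

```python
def reference_word_count(text: str) -> int:
    """Pure-Python equivalent of the Rust word_count function."""
    return len(text.split())


try:
    from my_extension import word_count  # built with `maturin develop`
except ImportError:
    word_count = reference_word_count  # fallback so the checks still run

samples = [
    "",
    "hello",
    "hello world foo",
    "  leading and trailing  ",
    "tabs\tand\nnewlines",
]

# Every sample must produce the same count on both implementations.
for sample in samples:
    assert word_count(sample) == reference_word_count(sample), repr(sample)

results = [word_count(s) for s in samples]
print(results)
```

The same pattern scales to a proper test suite: keep the slow reference around and run it against the compiled function on every CI build.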
Part 3: Exposing Rust Structs as Python Classes
#[pyclass] on a Rust struct generates a Python type. #[pymethods] attaches methods. Here is a complete WordCounter class:
rust
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;
use std::collections::HashMap;

#[pyclass]
struct WordCounter {
    text: String,
}

#[pymethods]
impl WordCounter {
    #[new]
    fn new(text: String) -> PyResult<Self> {
        if text.is_empty() {
            return Err(PyValueError::new_err("text cannot be empty"));
        }
        Ok(WordCounter { text })
    }

    fn count(&self) -> usize {
        self.text.split_whitespace().count()
    }

    fn most_common(&self, n: usize) -> Vec<(String, usize)> {
        let mut freq: HashMap<&str, usize> = HashMap::new();
        for word in self.text.split_whitespace() {
            *freq.entry(word).or_insert(0) += 1;
        }
        let mut pairs: Vec<(String, usize)> = freq
            .into_iter()
            .map(|(k, v)| (k.to_string(), v))
            .collect();
        // Highest count first; ties are broken alphabetically so the result
        // is deterministic (HashMap iteration order is not).
        pairs.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
        pairs.truncate(n);
        pairs
    }

    #[getter]
    fn text(&self) -> &str {
        &self.text
    }

    #[setter]
    fn set_text(&mut self, value: String) -> PyResult<()> {
        if value.is_empty() {
            return Err(PyValueError::new_err("text cannot be empty"));
        }
        self.text = value;
        Ok(())
    }

    fn __repr__(&self) -> String {
        format!("WordCounter({} words)", self.count())
    }

    fn __len__(&self) -> usize {
        self.count()
    }
}

From Python:
python
from my_extension import WordCounter
c = WordCounter("the quick brown fox the fox")
print(c.count()) # 6
print(c.most_common(2)) # 'the' and 'fox', each with a count of 2
print(len(c)) # 6
c.text = "hello world"
print(c.count()) # 2

#[pyclass] structs use interior mutability, and PyO3 enforces Rust's borrow rules at runtime: a &mut self method raises RuntimeError if the object is already borrowed elsewhere. For types that need concurrent mutation, put the mutable state behind Arc<Mutex<Inner>>.
Part 4: PyO3 v0.28 API Changes
The Bound<'py, T> API
The single most important change across the v0.21–v0.28 range was the shift from "GIL Refs" to Bound<'py, T>. Old code used &PyList (a GIL Ref without an explicit lifetime). The current v0.28 API:
rust
// Current v0.28 API
use pyo3::prelude::*;
use pyo3::types::PyList;

fn process(_py: Python<'_>, list: &Bound<'_, PyList>) -> PyResult<()> {
    for item in list.iter() {
        // extract() converts the Python object into the requested Rust type.
        let s: String = item.extract()?;
        println!("{}", s);
    }
    Ok(())
}

The lifetime 'py is tied to the Python<'py> token that proves you hold the GIL. Py<T> is the GIL-independent counterpart for storing objects in structs or across threads.
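GILOnceCell Replaced With PyOnceLock
For module-level cached values, recent PyO3 releases replace GILOnceCell with pyo3::sync::PyOnceLock, which stays safe under free-threaded Python. A plain std::sync::OnceLock<Py<T>> is not a drop-in substitute here, because its get_or_try_init method is not available on stable Rust:
rust
use pyo3::prelude::*;
use pyo3::sync::PyOnceLock;

static CACHED_REGEX: PyOnceLock<Py<PyAny>> = PyOnceLock::new();

fn get_compiled_regex(py: Python<'_>) -> PyResult<Bound<'_, PyAny>> {
    // Compile the pattern once; later calls reuse the cached object.
    let compiled = CACHED_REGEX.get_or_try_init(py, || {
        let re = py.import("re")?;
        let compiled = re.call_method1("compile", (r"\w+",))?;
        Ok::<Py<PyAny>, PyErr>(compiled.unbind())
    })?;
    Ok(compiled.bind(py).clone())
}

Part 5: Error Handling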
PyResult<T> is an alias for Result<T, PyErr>. The ? operator propagates errors automatically. All standard Python exception types live in pyo3::exceptions:
rust
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;

#[pyfunction]
fn parse_positive(s: &str) -> PyResult<i64> {
    // A failed parse surfaces in Python as a ValueError.
    let n: i64 = s.parse().map_err(|_| {
        PyValueError::new_err(format!("'{}' is not a valid integer", s))
    })?;
    if n <= 0 {
        return Err(PyValueError::new_err("value must be positive"));
    }
    Ok(n)
}

For custom exception classes, use create_exception!. For larger codebases, implement From<MyError> for PyErr so the ? operator handles conversion automatically throughout your call stack.
Part 6: The GIL and Free-Threaded Python 3.14
Releasing the GIL: py.allow_threads()
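Call py.allow_threads() to release the GIL while your Rust code runs. PyO3 v0.26 introduced py.detach() as the new name for this method, so prefer it if the compiler reports allow_threads as deprecated. Do not access any Python objects inside the closure; the compiler enforces this: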
rust
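use pyo3::prelude::*;

#[pyfunction]
fn parallel_sum(py: Python<'_>, data: Vec<f64>) -> PyResult<f64> {
    // Despite the name, this runs on a single thread: it computes the
    // Euclidean norm of `data` with the GIL released, so other Python
    // threads can make progress in the meantime.
    let result = py.allow_threads(|| {
        data.iter().map(|x| x * x).sum::<f64>().sqrt()
    });
    Ok(result)
}

// With Rayon for multi-core parallelism (add `rayon` to [dependencies]):
use rayon::prelude::*;

#[pyfunction]
fn parallel_sqrt_sum(py: Python<'_>, data: Vec<f64>) -> f64 {
    py.allow_threads(|| data.par_iter().map(|x| x.sqrt()).sum())
}

Free-Threaded Python 3.14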
PEP 779, accepted for Python 3.14, makes the free-threaded (GIL-free) build an officially supported configuration. It remains optional; the default build still has the GIL. PyO3 has supported free-threaded Python since v0.23. To declare your module thread-safe:
rust
#[pymodule(gil_used = false)]
fn my_extension(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(parallel_sum, m)?)?;
    Ok(())
}

This sets the Py_MOD_GIL_NOT_USED slot. Every #[pyclass] must implement Send + Sync, and the compiler enforces this. For types that genuinely cannot be thread-safe, use #[pyclass(unsendable)].
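From the Python side, a function that releases the GIL (or a module running on a free-threaded build) can simply be driven from a thread pool. The sketch below is a stand-in that runs without the extension: `hashlib.sha256` plays the role of the Rust function because CPython's hashlib also releases the GIL while hashing large buffers. The `digest` and `gil_enabled` helpers are hypothetical, and `sys._is_gil_enabled()` only exists on Python 3.13 and later, hence the `getattr` guard.

```python
import hashlib
import sys
from concurrent.futures import ThreadPoolExecutor


def gil_enabled() -> bool:
    """Report whether this interpreter is running with the GIL."""
    check = getattr(sys, "_is_gil_enabled", None)  # added in Python 3.13
    return True if check is None else check()


def digest(chunk: bytes) -> str:
    # Stand-in for a Rust function that releases the GIL while it works.
    return hashlib.sha256(chunk).hexdigest()


# Eight 1 MB buffers, each filled with a different byte value.
chunks = [bytes([i]) * 1_000_000 for i in range(8)]

# pool.map preserves input order, so results line up with `chunks`.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, chunks))

sequential = [digest(c) for c in chunks]
print("GIL enabled:", gil_enabled())
print("results match:", threaded == sequential)
```

With your own extension, replace `digest` with the exported function and time both variants; the threaded version should only pull ahead when the Rust side actually releases the GIL or the interpreter is free-threaded.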
Part 7: Working With Python Types
Type Conversion Cheatsheet
| Rust Type | Python Type | Notes |
|---|---|---|
| i32, i64, i128 | int | |
| f32, f64 | float | |
| String, &str | str | &str is zero-copy |
| Vec<T> | list | Copies elements |
| HashMap<K, V> | dict | Copies entries |
| Option<T> | T or None | None maps to None |
| Vec<u8> | bytes | |
For duck typing, accept Bound<'py, PyAny> and call .extract::<T>() to attempt conversion. If extraction fails, it returns Err(PyTypeError) automatically.
Part 8: A Real-World Example — Fast CSV Parser
A complete, realistic example: a fast CSV row parser exposed to Python, demonstrating full project structure with error handling, struct exposure, and type conversions.
toml (Cargo.toml)
[package]
name = "fastcsv"
version = "0.1.0"
edition = "2021"
[lib]
name = "fastcsv"
crate-type = ["cdylib"]
[dependencies]
pyo3 = { version = "0.28", features = ["extension-module"] }
csv = "1.3"

rust (src/lib.rs)
use pyo3::prelude::*;
use pyo3::types::{PyDict, PyList};
use pyo3::exceptions::PyValueError;

#[pyclass]
struct CsvParser {
    delimiter: u8,
    has_header: bool,
}

#[pymethods]
impl CsvParser {
    #[new]
    #[pyo3(signature = (delimiter=",", has_header=true))]
    fn new(delimiter: &str, has_header: bool) -> PyResult<Self> {
        let delim_bytes = delimiter.as_bytes();
        if delim_bytes.len() != 1 {
            return Err(PyValueError::new_err("delimiter must be a single character"));
        }
        Ok(CsvParser { delimiter: delim_bytes[0], has_header })
    }

    fn parse_file<'py>(&self, py: Python<'py>, path: &str) -> PyResult<Bound<'py, PyList>> {
        // Release GIL during file I/O
        let content = py.allow_threads(|| std::fs::read_to_string(path))?;
        self.parse_string(py, &content)
    }

    fn parse_string<'py>(&self, py: Python<'py>, content: &str) -> PyResult<Bound<'py, PyList>> {
        let mut rdr = csv::ReaderBuilder::new()
            .delimiter(self.delimiter)
            .has_headers(self.has_header)
            .from_reader(content.as_bytes());
        let rows = PyList::empty(py);
        let headers: Vec<String> = rdr.headers()
            .map_err(|e| PyValueError::new_err(e.to_string()))?
            .iter().map(String::from).collect();
        for result in rdr.records() {
            let record = result.map_err(|e| PyValueError::new_err(e.to_string()))?;
            let row = PyDict::new(py);
            for (h, v) in headers.iter().zip(record.iter()) {
                row.set_item(h, v)?;
            }
            rows.append(row)?;
        }
        Ok(rows)
    }
}

#[pymodule]
fn fastcsv(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<CsvParser>()?;
    Ok(())
}

Benchmark on a 50,000-row file run 100 times: Python's built-in csv module takes ~4.2s; the Rust version takes ~0.6s, roughly 7x faster. The exact number depends on data characteristics, but 5–10x is a reasonable expectation for pure parsing work.
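For reference, a pure-Python baseline for this kind of comparison could look like the sketch below. The synthetic data, column names, and repeat count are assumptions chosen for illustration; they are not the data set behind the numbers quoted above, so expect different absolute timings.

```python
import csv
import io
import timeit


def make_csv(rows: int) -> str:
    """Build a synthetic CSV document with a header row."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id", "name", "score"])
    for i in range(rows):
        writer.writerow([i, f"user{i}", i % 100])
    return out.getvalue()


def parse_string(content: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Pure-Python counterpart of CsvParser.parse_string: one dict per row."""
    reader = csv.DictReader(io.StringIO(content), delimiter=delimiter)
    return [dict(row) for row in reader]


content = make_csv(50_000)
rows = parse_string(content)

# Time five full parses and report the average per parse.
runs = 5
elapsed = timeit.timeit(lambda: parse_string(content), number=runs)
print(f"{len(rows)} rows, {elapsed / runs * 1000:.1f} ms per parse")
```

Running the same `content` through `CsvParser().parse_string(content)` in a release build gives the Rust side of the comparison on identical input.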
Part 9: Async Rust in PyO3
Async support is handled through pyo3-async-runtimes, which bridges Python's asyncio with Rust's async runtimes:
toml
[dependencies]
pyo3 = { version = "0.28", features = ["extension-module"] }
pyo3-async-runtimes = { version = "0.28", features = ["tokio-runtime"] }
tokio = { version = "1", features = ["full"] }
reqwest = "0.12" # HTTP client used in the example below

rust
use pyo3::prelude::*;
use pyo3_async_runtimes::tokio::future_into_py;

#[pyfunction]
fn fetch_url<'py>(py: Python<'py>, url: String) -> PyResult<Bound<'py, PyAny>> {
    future_into_py(py, async move {
        let body = reqwest::get(&url).await
            .map_err(|e| pyo3::exceptions::PyRuntimeError::new_err(e.to_string()))?
            .text().await
            .map_err(|e| pyo3::exceptions::PyRuntimeError::new_err(e.to_string()))?;
        Ok(body)
    })
}

future_into_py wraps a Rust future in a Python awaitable. Use this for I/O-bound async Rust work; use py.allow_threads() for CPU-bound work.
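On the Python side, the returned object is awaited like any other awaitable, which means it composes with `asyncio.gather`. The sketch below uses a stand-in coroutine so that it runs without the extension or network access; `fake_fetch_url`, `fetch_all`, and the URLs are hypothetical, and with the real module you would call `fetch_url` in its place.

```python
import asyncio


async def fake_fetch_url(url: str) -> str:
    """Stand-in for the Rust-backed fetch_url; sleeps instead of doing I/O."""
    await asyncio.sleep(0.01)
    return f"<body of {url}>"


async def fetch_all(urls: list[str]) -> list[str]:
    # gather() runs the awaitables concurrently and keeps the input order.
    return await asyncio.gather(*(fake_fetch_url(u) for u in urls))


urls = ["https://example.com/a", "https://example.com/b"]
bodies = asyncio.run(fetch_all(urls))
print(bodies)
```

Note that the calls are made inside a running coroutine, which is also where a function built on `future_into_py` expects to be called, since it needs a running event loop.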
Part 10: Building and Packaging With maturin
bash
# Development builds
maturin develop # debug — fast compile, slow runtime
maturin develop --release # release — for benchmarking
# Production wheels
maturin build --release
maturin build --release --interpreter python3.11 python3.12 python3.13

Stable ABI Wheels with abi3
Use the abi3-py39 feature to build one wheel that runs on Python 3.9 and later, instead of per-version wheels:
toml
pyo3 = { version = "0.28", features = ["extension-module", "abi3-py39"] }

manylinux Compliance
bash
# Build manylinux-compliant wheels using Zig cross-compiler (no Docker):
pip install ziglang
maturin build --release --zig
# Or using the official manylinux Docker container:
docker run --rm -v $(pwd):/io ghcr.io/pyo3/maturin build --release
# Publish to PyPI:
maturin publish

Part 11: CI/CD With GitHub Actions
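Generate a baseline workflow with maturin generate-ci github. The workflow below builds wheels for the native architecture of each Linux, macOS, and Windows runner and publishes to PyPI on a version tag. Covering further targets such as Linux aarch64 or macOS universal2 requires extra matrix entries that pass a target to maturin-action: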
yaml
name: Build and Publish
on:
  push:
    tags: ['v*']
  pull_request:
jobs:
  build:
    name: Build wheels on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    steps:
      - uses: actions/checkout@v4
      - uses: PyO3/maturin-action@v1
        with:
          command: build
          args: --release --out dist
          manylinux: auto
      - uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.os }}
          path: dist
  publish:
    runs-on: ubuntu-latest
    needs: [build]
    if: startsWith(github.ref, 'refs/tags/v')
    environment: pypi
    permissions:
      id-token: write
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: wheels-*
          path: dist
          merge-multiple: true
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: dist/

The PyO3/maturin-action@v1 action handles Rust installation, cross-compilation toolchains, and manylinux Docker containers automatically.
Part 12: Performance Benchmarking
The Call Boundary Cost
Every Python-to-Rust call has overhead: argument type checking, conversion, GIL handling. Batch your data, not your calls:
python
# Bad: 1,000,000 Rust calls
total = sum(my_ext.square(x) for x in data)
# Good: 1 Rust call with all data
total = my_ext.sum_of_squares(data) # 10-100x faster

What Speedups to Expect
| Workload | Pure Python | Rust + PyO3 | Speedup |
|---|---|---|---|
| Word count (1M words) | 120ms | 8ms | 15x |
| CSV parsing (50K rows) | 420ms | 60ms | 7x |
| SHA-256 hashing (1MB) | 18ms | 2ms | 9x |
| JSON serialization | 45ms | 6ms | 7.5x |
| Parallel sort (1M ints) | 380ms | 18ms | 21x (Rayon) |
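Figures like these are worth reproducing on your own hardware and data. A minimal timeit harness is sketched below with pure-Python stand-ins: `square` and `sum_of_squares` are hypothetical here and mirror the names from the call-boundary example, so you would swap in your extension's functions to measure the real per-item versus batched cost.

```python
import timeit


def square(x: float) -> float:
    """Stand-in for a per-item extension call."""
    return x * x


def sum_of_squares(values: list[float]) -> float:
    """Stand-in for a single batched extension call."""
    total = 0.0
    for v in values:
        total += v * v
    return total


data = [float(i) for i in range(100_000)]


def per_item() -> float:
    return sum(square(x) for x in data)


def batched() -> float:
    return sum_of_squares(data)


# Both strategies must agree before their timings are worth comparing.
same_result = per_item() == batched()

# repeat() returns one timing per run; the minimum is the least noisy.
t_item = min(timeit.repeat(per_item, number=10, repeat=3))
t_batch = min(timeit.repeat(batched, number=10, repeat=3))
print(f"same result: {same_result}")
print(f"per-item: {t_item:.3f}s  batched: {t_batch:.3f}s")
```

Checking that both variants return the same value first catches the common mistake of benchmarking two functions that do not compute the same thing.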
Part 13: When to Use PyO3 vs. Alternatives
Use PyO3 when: you have an existing Rust library to expose to Python, you need true parallelism for CPU-bound work, or you're building a new high-performance library and want memory safety as a baseline.
Use Cython when: you have existing C extension code you're adding to, or you need extremely low call overhead for functions that do very little work per call.
Use ctypes when: you need to call a single C function from an existing shared library with no build step.
Use cffi when: you need to support multiple Python implementations (PyPy, GraalPy) or work without a C compiler.
Do not use PyO3 for I/O-bound code, small utility functions called very frequently in tight loops, or when the development overhead of writing Rust is not justified by the performance gain. Profile first.
Conclusion
PyO3 v0.28 and maturin 1.8 together give you a complete path from profiling a Python bottleneck to shipping a pip-installable wheel that runs on Linux, macOS, and Windows. The toolchain has matured to the point where setup friction is minimal: maturin new, write some Rust, maturin develop, test, maturin generate-ci github, push a tag, done.
Three things make PyO3 the right choice in 2026. First, Rust's memory safety guarantees eliminate whole categories of bugs that plague Cython and C extension code. Second, py.allow_threads() and free-threaded Python 3.14 support mean you can use all available CPU cores. Third, maturin makes distribution trivial.
The strategy that works: profile first with py-spy or cProfile, find the one or two functions consuming 80% of CPU time, move only those to Rust, verify the speedup with timeit, ship. Do not rewrite logic that isn't a bottleneck.
Need High-Performance Python for Your Product?
At Nandann Creative, we build fast, production-grade software — from Rust-backed Python extensions to full-stack web applications. If your Python service has a CPU bottleneck costing you in infrastructure or user experience, we can help you identify and fix it.
FAQs
Do I need to know Rust well to use PyO3?
You need to know enough Rust to write functions, define structs, use basic collections, and understand ownership. You do not need to be an expert. The PyO3 macros hide most of the FFI complexity. Work through the first twelve chapters of the Rust book before starting a PyO3 project.
Is PyO3 stable enough for production?
Yes. Polars, Pydantic v2, cryptography, and Ruff all use it in production. The API has been stable since v0.23. The main migration risk is upgrading between 0.x releases, any of which may include breaking changes. Pin your version and upgrade deliberately.
What Python versions does PyO3 v0.28 support?
PyO3 v0.28 supports CPython 3.7 through 3.14 and PyPy 3.9 through 3.11. For free-threaded Python, you need CPython 3.13t or 3.14t and PyO3 v0.23 or later. The abi3-py39 feature creates wheels compatible with Python 3.9 and later.
How do I handle Rust panics in Python?
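PyO3 catches a Rust panic that unwinds out of a #[pyfunction] or #[pymethods] call and raises it in Python as pyo3_runtime.PanicException. That class derives from BaseException rather than Exception, so a plain except Exception does not swallow it. The process is only terminated outright if the crate is built with panic = "abort", which disables unwinding.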
Can I use PyO3 with NumPy?
Yes. Add the numpy crate to Cargo.toml. It provides PyReadonlyArray and PyArray types that give you zero-copy access to NumPy array buffers, which you can process with ndarray without copying the data.
How do I add type hints for my PyO3 extension?
Create .pyi stub files alongside your Python code. The pyo3-stub-gen crate can generate these automatically from your annotated Rust code. Mypy, pyright, and IDEs use these files for type checking and autocomplete.
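A hand-written stub for the module built in Parts 2 and 3 might look like the sketch below, saved as `my_extension.pyi` next to the compiled module. It is derived from the Rust signatures shown earlier; the file name and placement are assumptions about your project layout.

```python
# my_extension.pyi: type stubs for the compiled module

def word_count(text: str) -> int: ...
def sum_as_string(a: int, b: int) -> str: ...

class WordCounter:
    text: str
    def __init__(self, text: str) -> None: ...
    def count(self) -> int: ...
    def most_common(self, n: int) -> list[tuple[str, int]]: ...
    def __len__(self) -> int: ...
```

The mapping follows the conversion table in Part 7: `usize` becomes `int`, `String` and `&str` become `str`, and `Vec<(String, usize)>` becomes `list[tuple[str, int]]`.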
What is the difference between Bound<T> and Py<T> in PyO3?
Bound<'py, T> is a GIL-bound reference with an explicit lifetime tied to holding the GIL — use it for short-lived access within a function. Py<T> is GIL-independent and can be stored in structs or sent across threads. Bind a Py<T> to a Bound when you have a Python<'py> token.
Why is my PyO3 extension slower than expected?
The most common cause is calling Rust functions in a tight Python loop — the call boundary overhead dominates. Instead, pass all your data in one call and do the iteration in Rust. Also ensure you are benchmarking release builds (maturin develop --release), not debug builds, which can be 10-20x slower.