Rust · 26 min read

PyO3 v0.28 and maturin: Writing Python Extensions in Rust That Actually Ship

Published on 3/21/2026 by Prakhar Bhatia

Introduction

Polars, Pydantic v2, Hugging Face tokenizers, orjson. All Python libraries. All written in Rust under the hood with PyO3. They didn't rewrite everything. They moved the 5% of code causing 95% of the slowdown into Rust, kept their Python API exactly as it was, and shipped pip-installable wheels. You can do the same.

This guide covers PyO3 v0.28 and maturin 1.8 specifically. The API has changed substantially since v0.20 and most tutorials online are out of date. Everything here uses the current Bound<'py, T> API, the IntoPyObject trait, and the free-threaded Python 3.14 support that landed in v0.23 and matured through v0.28. Every code example is written to compile against pyo3 = "0.28".


Why PyO3, Why Now

The Problem With Pure Python Performance

Python has three serious performance ceilings. The first is raw execution speed: CPython interprets bytecode, which is roughly 10–100x slower than compiled native code depending on the workload. The second is the GIL: even with multiple threads, only one thread executes Python bytecode at a time. The third is memory layout: Python objects are heap-allocated, reference-counted boxes, which kills CPU cache efficiency for numeric workloads.

For I/O-bound code, none of this matters. But for CPU-bound work such as parsing, numeric computation, text processing, compression, or cryptography, you're leaving a significant amount of performance on the table.

The older solutions each have problems. Cython requires a .pyx dialect that's not plain Python or plain C. ctypes works but the ergonomics are terrible. C extensions require you to manually manage Python reference counts. CFFI is better than ctypes but still requires hand-written binding code.

What PyO3 Actually Is

PyO3 is a set of Rust bindings for the CPython API. It works in both directions: Rust code calling Python (embedding), and Python code calling Rust (extensions). You write Rust functions and structs, annotate them with PyO3 macros, compile to a native .so or .pyd file, and import the result like any other Python module.

Production users include Polars, Pydantic v2, orjson, cryptography, and Hugging Face tokenizers. These are not toy projects.

The 95/5 Rule

Before writing a single line of Rust, profile. Use py-spy to record a flamegraph:

bash

pip install py-spy
py-spy record -o profile.svg -- python your_script.py
# or quick function-level breakdown:
python -m cProfile -s cumulative your_script.py | head -30

Find the top one to three functions that consume the most CPU time. Those are your Rust candidates. Everything else stays in Python.
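
If you would rather rank functions from inside a script, the standard library can do it without any external tool. The sketch below is illustrative only: slow_tokenize and workload are hypothetical stand-ins for your own code, and top_functions is a helper name invented here. It sorts cProfile's results by each function's own time (tottime).

```python
import cProfile
import pstats


def slow_tokenize(text):
    # Hypothetical hot function: deliberately does per-character work in Python.
    words, current = [], []
    for ch in text:
        if ch.isspace():
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def workload():
    # Hypothetical driver standing in for your_script.py.
    text = "the quick brown fox " * 5_000
    return len(slow_tokenize(text))


def top_functions(func, limit=3):
    """Profile func() and return the names of the functions with the most own time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers).
    ranked = sorted(stats.stats.items(), key=lambda kv: kv[1][2], reverse=True)
    return [name for (_file, _line, name), _timing in ranked[:limit]]


if __name__ == "__main__":
    print(top_functions(workload))
```

Functions that show up at the top of this list, and that are plain loops over data and not thin wrappers around C code, are the likeliest Rust candidates.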

PyO3 vs. Cython vs. ctypes vs. cffi

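| Tool   | Raw Speed | Call Overhead | Safety      | Ergonomics |
|--------|-----------|---------------|-------------|------------|
| PyO3   | C-level   | Low           | Memory-safe | Excellent  |
| Cython | C-level   | Very low      | Unsafe      | Moderate   |
| ctypes | C-level   | High (libffi) | Unsafe      | Poor       |
| cffi   | C-level   | High          | Unsafe      | Moderate   |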

Part 1: Setting Up Your Environment

Prerequisites

You need Rust 1.83 or later and Python 3.9 or later. Install Rust via rustup, then create a virtualenv and install maturin:

bash

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup toolchain install stable
 
python -m venv .venv
source .venv/bin/activate
pip install maturin==1.8.3
maturin --version  # maturin 1.8.3

Initializing a New Project

bash

maturin new --bindings pyo3 my_extension
cd my_extension

The generated structure:

text

my_extension/
├── Cargo.toml
├── pyproject.toml
└── src/
    └── lib.rs

The critical parts of Cargo.toml:

toml

[package]
name = "my_extension"
version = "0.1.0"
edition = "2021"
 
[lib]
name = "my_extension"
crate-type = ["cdylib"]
 
[dependencies]
pyo3 = { version = "0.28", features = ["extension-module"] }

The crate-type = ["cdylib"] line is required. It tells Rust to compile a C-compatible shared library instead of a Rust library. The extension-module feature disables PyO3's default behavior of linking against libpython.


Part 2: Your First PyO3 Extension

The Core Macros

PyO3 uses Rust attribute macros to annotate your code. A handful of them cover almost everything:

  • #[pyfunction] — marks a Rust function as callable from Python
  • #[pymodule] — marks a function as the module entry point
  • #[pyclass] / #[pymethods] — exposes a Rust struct as a Python type

Writing and Exposing a Function

A complete src/lib.rs that exposes a word-count function:

rust

use pyo3::prelude::*;
 
#[pyfunction]
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}
 
#[pyfunction]
fn sum_as_string(a: usize, b: usize) -> PyResult<String> {
    Ok((a + b).to_string())
}
 
#[pymodule]
fn my_extension(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(word_count, m)?)?;
    m.add_function(wrap_pyfunction!(sum_as_string, m)?)?;
    Ok(())
}

The Bound<'_, PyModule> type is the v0.21+ API. If you see tutorials using &PyModule, they are out of date.

Building and Testing

bash

maturin develop           # debug build
maturin develop --release  # release build for benchmarking

python

import my_extension
 
print(my_extension.word_count("hello world foo"))  # 3
print(my_extension.sum_as_string(10, 20))          # 30 (returned as a str)

Debug builds can be 10–20x slower than release builds for compute-intensive code. Never measure performance against a debug build.
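
One way to keep that discipline is a small timing script that reports the best of several runs. This is a minimal sketch, assuming the module from this section is importable as my_extension. The pure-Python fallback and the best_time helper are additions for illustration, so the script still runs (and gives you a baseline) when the wheel is not installed.

```python
import timeit

try:
    # Compiled Rust implementation, present after `maturin develop --release`.
    from my_extension import word_count
except ImportError:
    # Assumed pure-Python fallback so the script still runs without the wheel.
    def word_count(text):
        return len(text.split())


def best_time(func, arg, repeat=5, number=20):
    """Return the best per-call time in seconds over `repeat` timing runs."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    sample = "hello world foo " * 100_000
    print(f"word_count: {best_time(word_count, sample) * 1e3:.3f} ms per call")
```

Taking the minimum of several repeats filters out noise from other processes, which matters when the function under test runs for only a few milliseconds.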


Part 3: Exposing Rust Structs as Python Classes

#[pyclass] on a Rust struct generates a Python type. #[pymethods] attaches methods. Here is a complete WordCounter class:

rust

use pyo3::prelude::*;
use std::collections::HashMap;
 
#[pyclass]
struct WordCounter {
    text: String,
}
 
#[pymethods]
impl WordCounter {
    #[new]
    fn new(text: String) -> PyResult<Self> {
        if text.is_empty() {
            return Err(pyo3::exceptions::PyValueError::new_err(
                "text cannot be empty"
            ));
        }
        Ok(WordCounter { text })
    }
 
    fn count(&self) -> usize {
        self.text.split_whitespace().count()
    }
 
    fn most_common(&self, n: usize) -> Vec<(String, usize)> {
        let mut freq: HashMap<&str, usize> = HashMap::new();
        for word in self.text.split_whitespace() {
            *freq.entry(word).or_insert(0) += 1;
        }
        let mut pairs: Vec<(String, usize)> = freq
            .into_iter()
            .map(|(k, v)| (k.to_string(), v))
            .collect();
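        // Sort by count (descending), then by word, so ties come back in a
        // stable order regardless of HashMap iteration order.
        pairs.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));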
        pairs.truncate(n);
        pairs
    }
 
    #[getter]
    fn text(&self) -> &str { &self.text }
 
    #[setter]
    fn set_text(&mut self, value: String) -> PyResult<()> {
        if value.is_empty() {
            return Err(pyo3::exceptions::PyValueError::new_err("text cannot be empty"));
        }
        self.text = value;
        Ok(())
    }
 
    fn __repr__(&self) -> String { format!("WordCounter({} words)", self.count()) }
    fn __len__(&self) -> usize { self.count() }
}

From Python:

python

from my_extension import WordCounter
 
c = WordCounter("the quick brown fox the fox")
print(c.count())          # 6
print(c.most_common(2))   # the two top words, 'fox' and 'the', each with count 2
print(len(c))             # 6
c.text = "hello world"
print(c.count())          # 2

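#[pyclass] structs use interior mutability: PyO3 checks the borrow rules at runtime, and a conflicting borrow (for example, calling a &mut self method while another borrow is still active) raises RuntimeError at runtime, where plain Rust would refuse to compile. For types that need concurrent mutation, put the mutable state behind Arc<Mutex<Inner>>.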


Part 4: PyO3 v0.28 API Changes

The Bound<'py, T> API

The single most important change across the v0.21–v0.28 range was the shift from "GIL Refs" to Bound<'py, T>. Old code used &PyList (a GIL Ref without an explicit lifetime). The current v0.28 API:

rust

// Current v0.28 API
use pyo3::prelude::*;
use pyo3::types::PyList;

fn process(py: Python<'_>, list: &Bound<'_, PyList>) -> PyResult<()> {
    for item in list.iter() {
        let s: String = item.extract()?;
        println!("{}", s);
    }
    Ok(())
}

The lifetime 'py is tied to the Python<'py> token that proves you hold the GIL. Py<T> is the GIL-independent counterpart for storing objects in structs or across threads.

GILOnceCell Replaced With OnceLock

In v0.23+, GILOnceCell was replaced with OnceLock<Py<T>> for module-level cached values:

rust

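use pyo3::prelude::*;
use std::sync::OnceLock;

static CACHED_REGEX: OnceLock<Py<PyAny>> = OnceLock::new();

fn get_compiled_regex(py: Python<'_>) -> PyResult<Bound<'_, PyAny>> {
    // Fast path: the pattern was already compiled and cached.
    if let Some(cached) = CACHED_REGEX.get() {
        return Ok(cached.bind(py).clone());
    }
    // get()/set() are stable; OnceLock::get_or_try_init may still require nightly Rust.
    let compiled = py.import("re")?.call_method1("compile", (r"\w+",))?;
    // If another thread stored a value first, set() fails and that value is used.
    let _ = CACHED_REGEX.set(compiled.unbind());
    Ok(CACHED_REGEX.get().expect("cache was just initialized").bind(py).clone())
}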

Part 5: Error Handling

PyResult<T> is an alias for Result<T, PyErr>. The ? operator propagates errors automatically. All standard Python exception types live in pyo3::exceptions:

rust

use pyo3::exceptions::PyValueError;
 
#[pyfunction]
fn parse_positive(s: &str) -> PyResult<i64> {
    let n: i64 = s.parse().map_err(|_| {
        PyValueError::new_err(format!("'{}' is not a valid integer", s))
    })?;
    if n <= 0 {
        return Err(PyValueError::new_err("value must be positive"));
    }
    Ok(n)
}

For custom exception classes, use create_exception!. For larger codebases, implement From<MyError> for PyErr so the ? operator handles conversion automatically throughout your call stack.
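
Seen from Python, these errors are ordinary exceptions. The sketch below assumes parse_positive has been registered in my_extension (the module function shown in Part 2 does not add it). The fallback is a pure-Python stand-in that mirrors the Rust error messages so the snippet runs either way, and error_message is a hypothetical helper.

```python
try:
    from my_extension import parse_positive
except ImportError:
    def parse_positive(s):
        # Stand-in mirroring the Rust function's error messages.
        try:
            n = int(s)
        except ValueError:
            raise ValueError(f"'{s}' is not a valid integer") from None
        if n <= 0:
            raise ValueError("value must be positive")
        return n


def error_message(func, arg):
    """Return the ValueError message func(arg) raises, or None if it succeeds."""
    try:
        func(arg)
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    for raw in ["42", "abc", "-3"]:
        print(repr(raw), "->", error_message(parse_positive, raw))
```

A helper like this slots naturally into a test suite: asserting on the exception type and message from Python checks the whole path, including the Rust-to-Python error conversion.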


Part 6: The GIL and Free-Threaded Python 3.14

Releasing the GIL: py.allow_threads()

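Call py.allow_threads() to release the GIL while your Rust code runs. The closure must not touch any Python objects, and the compiler enforces this. Recent PyO3 releases (0.26 onward, per the migration guide) also expose this operation as py.detach() and deprecate the allow_threads name, so if your build warns about it, switching to py.detach() is the fix: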

rust

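use pyo3::prelude::*;

// Despite its name, this runs on a single thread: it computes the Euclidean
// norm (square root of the sum of squares) with the GIL released.
#[pyfunction]
fn parallel_sum(py: Python<'_>, data: Vec<f64>) -> PyResult<f64> {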
    let result = py.allow_threads(|| {
        data.iter().map(|x| x * x).sum::<f64>().sqrt()
    });
    Ok(result)
}
 
// With Rayon for multi-core parallelism:
use rayon::prelude::*;
 
#[pyfunction]
fn parallel_sqrt_sum(py: Python<'_>, data: Vec<f64>) -> f64 {
    py.allow_threads(|| data.par_iter().map(|x| x.sqrt()).sum())
}

Free-Threaded Python 3.14

PEP 779, accepted for Python 3.14, makes the free-threaded build (CPython without the GIL) an officially supported configuration. It remains optional: the default build still has the GIL. PyO3 has supported free-threaded Python since v0.23. To declare your module thread-safe:

rust

#[pymodule(gil_used = false)]
fn my_extension(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(parallel_sum, m)?)?;
    Ok(())
}

This sets the Py_MOD_GIL_NOT_USED slot. Every #[pyclass] must implement Send + Sync — the compiler enforces this. For types that genuinely cannot be thread-safe, use #[pyclass(unsendable)].
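
Before relying on free-threading, it can help to confirm what the running interpreter actually does. A rough sketch using only the standard library: gil_status, norm, and run_in_threads are names made up for this example, and norm is a pure-Python stand-in for a Rust function such as parallel_sum. sys._is_gil_enabled() exists on CPython 3.13 and later, so the code assumes the GIL is on everywhere else.

```python
import math
import sys
import sysconfig
from concurrent.futures import ThreadPoolExecutor


def gil_status():
    """Return (is_free_threaded_build, is_gil_currently_enabled)."""
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on CPython 3.13+; assume the GIL is on elsewhere.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(check()) if check is not None else True
    return free_threaded_build, gil_enabled


def norm(values):
    # Pure-Python stand-in for a Rust function such as parallel_sum.
    return math.sqrt(sum(v * v for v in values))


def run_in_threads(func, chunks, workers=4):
    """Call func on every chunk from a thread pool and return results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, chunks))


if __name__ == "__main__":
    build, enabled = gil_status()
    print(f"free-threaded build: {build}, GIL enabled: {enabled}")
    print(run_in_threads(norm, [[3.0, 4.0], [6.0, 8.0]]))
```

Checking at runtime is worthwhile because a free-threaded interpreter turns the GIL back on, with a warning, when it imports an extension module that has not declared itself safe to run without it.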


Part 7: Working With Python Types

Type Conversion Cheatsheet

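| Rust Type      | Python Type | Notes              |
|----------------|-------------|--------------------|
| i32, i64, i128 | int         |                    |
| f32, f64       | float       |                    |
| String, &str   | str         | &str is zero-copy  |
| Vec<T>         | list        | Copies elements    |
| HashMap<K, V>  | dict        | Copies entries     |
| Option<T>      | T or None   | None maps to None  |
| Vec<u8>        | bytes       |                    |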

For duck typing, accept Bound<'py, PyAny> and call .extract::<T>() to attempt conversion. If extraction fails, extract returns an Err carrying a Python exception (usually TypeError), which the ? operator propagates to the caller.


Part 8: A Real-World Example — Fast CSV Parser

A complete, realistic example: a fast CSV row parser exposed to Python, demonstrating full project structure with error handling, struct exposure, and type conversions.

toml (Cargo.toml)

[package]
name = "fastcsv"
version = "0.1.0"
edition = "2021"
 
[lib]
name = "fastcsv"
crate-type = ["cdylib"]
 
[dependencies]
pyo3 = { version = "0.28", features = ["extension-module"] }
csv = "1.3"

rust (src/lib.rs)

use pyo3::prelude::*;
use pyo3::types::{PyDict, PyList};
use pyo3::exceptions::PyValueError;
 
#[pyclass]
struct CsvParser { delimiter: u8, has_header: bool }
 
#[pymethods]
impl CsvParser {
    #[new]
    #[pyo3(signature = (delimiter=",", has_header=true))]
    fn new(delimiter: &str, has_header: bool) -> PyResult<Self> {
        let delim_bytes = delimiter.as_bytes();
        if delim_bytes.len() != 1 {
            return Err(PyValueError::new_err("delimiter must be a single character"));
        }
        Ok(CsvParser { delimiter: delim_bytes[0], has_header })
    }
 
    fn parse_file<'py>(&self, py: Python<'py>, path: &str) -> PyResult<Bound<'py, PyList>> {
        // Release GIL during file I/O
        let content = py.allow_threads(|| std::fs::read_to_string(path))?;
        self.parse_string(py, &content)
    }
 
    fn parse_string<'py>(&self, py: Python<'py>, content: &str) -> PyResult<Bound<'py, PyList>> {
        let mut rdr = csv::ReaderBuilder::new()
            .delimiter(self.delimiter)
            .has_headers(self.has_header)
            .from_reader(content.as_bytes());
        let rows = PyList::empty(py);
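        // With has_header = false the csv crate still yields the first row from
        // records(), so use positional keys (col0, col1, ...; an illustrative
        // naming choice) instead of reusing that row's values as dict keys.
        let first_row = rdr.headers()
            .map_err(|e| PyValueError::new_err(e.to_string()))?;
        let headers: Vec<String> = if self.has_header {
            first_row.iter().map(String::from).collect()
        } else {
            (0..first_row.len()).map(|i| format!("col{i}")).collect()
        };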
        for result in rdr.records() {
            let record = result.map_err(|e| PyValueError::new_err(e.to_string()))?;
            let row = PyDict::new(py);
            for (h, v) in headers.iter().zip(record.iter()) { row.set_item(h, v)?; }
            rows.append(row)?;
        }
        Ok(rows)
    }
}
 
#[pymodule]
fn fastcsv(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<CsvParser>()?;
    Ok(())
}

Benchmark on a 50,000-row file run 100 times: Python's built-in csv module takes ~4.2s; the Rust version takes ~0.6s — roughly 7x faster. The exact number depends on data characteristics, but 5–10x is a reasonable expectation for pure parsing work.
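
To reproduce a comparison like this on your own data, something along these lines may serve as a starting point. It is a sketch, not the benchmark behind the numbers above: the synthetic three-column data and the helpers make_csv, parse_with_stdlib, and seconds_per_parse are all assumptions, and the fastcsv import only succeeds once the extension has been built.

```python
import csv
import io
import timeit


def make_csv(rows=50_000):
    """Build an in-memory CSV document with a header and `rows` synthetic data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "name", "score"])
    for i in range(rows):
        writer.writerow([i, f"user{i}", i % 100])
    return buf.getvalue()


def parse_with_stdlib(content):
    """Baseline: parse into a list of dicts, the shape CsvParser.parse_string returns."""
    return list(csv.DictReader(io.StringIO(content)))


def seconds_per_parse(parse, content, number=5):
    """Average wall-clock seconds for one call of parse(content)."""
    return timeit.timeit(lambda: parse(content), number=number) / number


if __name__ == "__main__":
    content = make_csv()
    print(f"stdlib csv: {seconds_per_parse(parse_with_stdlib, content) * 1e3:.1f} ms")
    try:
        from fastcsv import CsvParser  # only available once the extension is built
    except ImportError:
        print("fastcsv not installed; build it with `maturin develop --release`")
    else:
        parser = CsvParser()
        print(f"fastcsv:    {seconds_per_parse(parser.parse_string, content) * 1e3:.1f} ms")
```

Parsing from an in-memory string keeps disk I/O out of the measurement, so the two numbers compare parsing and object construction only.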


Part 9: Async Rust in PyO3

Async support is handled through pyo3-async-runtimes, which bridges Python's asyncio with Rust's async runtimes:

toml

[dependencies]
pyo3 = { version = "0.28", features = ["extension-module"] }
pyo3-async-runtimes = { version = "0.28", features = ["tokio-runtime"] }
tokio = { version = "1", features = ["full"] }
reqwest = "0.12"  # HTTP client used by fetch_url

rust

use pyo3::prelude::*;
use pyo3_async_runtimes::tokio::future_into_py;
 
#[pyfunction]
fn fetch_url<'py>(py: Python<'py>, url: String) -> PyResult<Bound<'py, PyAny>> {
    future_into_py(py, async move {
        let body = reqwest::get(&url).await
            .map_err(|e| pyo3::exceptions::PyRuntimeError::new_err(e.to_string()))?
            .text().await
            .map_err(|e| pyo3::exceptions::PyRuntimeError::new_err(e.to_string()))?;
        Ok(body)
    })
}

future_into_py wraps a Rust future in a Python awaitable, so callers can await the result from asyncio code. Use this for I/O-bound async Rust work; use py.allow_threads() for CPU-bound work.
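
From Python, the function is awaited like any other asyncio awaitable. A minimal sketch, assuming fetch_url has been registered in a module importable as my_extension. The fallback coroutine and the fetch_all helper are hypothetical and exist only so the snippet runs without the extension or network access.

```python
import asyncio

try:
    from my_extension import fetch_url  # Rust function built on future_into_py
except ImportError:
    async def fetch_url(url):
        # Stand-in so the sketch runs without the extension or network access.
        await asyncio.sleep(0)
        return f"<body of {url}>"


async def fetch_all(urls):
    """Await several fetches concurrently; results come back in input order."""
    return await asyncio.gather(*(fetch_url(u) for u in urls))


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["https://example.com/a", "https://example.com/b"])))
```

The calls to fetch_url happen inside a coroutine on purpose, so an event loop is already running when the Rust side creates its awaitables.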


Part 10: Building and Packaging With maturin

bash

# Development builds
maturin develop           # debug — fast compile, slow runtime
maturin develop --release  # release — for benchmarking
 
# Production wheels
maturin build --release
maturin build --release --interpreter python3.11 python3.12 python3.13

Stable ABI Wheels with abi3

Use the abi3-py39 feature to build one wheel that runs on Python 3.9 and later, instead of per-version wheels:

toml

pyo3 = { version = "0.28", features = ["extension-module", "abi3-py39"] }

manylinux Compliance

bash

# Build manylinux-compliant wheels using Zig cross-compiler (no Docker):
pip install ziglang
maturin build --release --zig
 
# Or using the official manylinux Docker container:
docker run --rm -v $(pwd):/io ghcr.io/pyo3/maturin build --release
 
# Publish to PyPI:
maturin publish

Part 11: CI/CD With GitHub Actions

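Generate a baseline workflow with maturin generate-ci github. The workflow below is a minimal version: it builds one wheel per runner (Linux, macOS, and Windows, each for the runner's native architecture) and publishes to PyPI on a version tag. Covering Linux aarch64 or macOS universal2 means adding matrix entries that pass an explicit target to the action: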

yaml

name: Build and Publish
 
on:
  push:
    tags: ['v*']
  pull_request:
 
jobs:
  build:
    name: Build wheels on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    steps:
      - uses: actions/checkout@v4
      - uses: PyO3/maturin-action@v1
        with:
          command: build
          args: --release --out dist
          manylinux: auto
      - uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.os }}
          path: dist
 
  publish:
    runs-on: ubuntu-latest
    needs: [build]
    if: startsWith(github.ref, 'refs/tags/v')
    environment: pypi
    permissions:
      id-token: write
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: wheels-*
          path: dist
          merge-multiple: true
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: dist/

The PyO3/maturin-action@v1 action handles Rust installation, cross-compilation toolchains, and manylinux Docker containers automatically.


Part 12: Performance Benchmarking

The Call Boundary Cost

Every Python-to-Rust call has overhead: argument type checking, conversion, GIL handling. Batch your data, not your calls:

python

# Bad: 1,000,000 Rust calls
total = sum(my_ext.square(x) for x in data)
 
# Good: 1 Rust call with all data
total = my_ext.sum_of_squares(data)  # 10-100x faster
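
When the data does not fit in memory at once, a middle ground is to pass it in large chunks, so the boundary is crossed once per chunk and not once per element. A sketch under assumptions: batched and streaming_sum_of_squares are hypothetical helpers, the chunk size is arbitrary, and the fallback sum_of_squares stands in for the my_ext function used above.

```python
from itertools import islice

try:
    from my_ext import sum_of_squares  # one Rust call per chunk
except ImportError:
    def sum_of_squares(values):
        # Pure-Python stand-in so the sketch runs without the extension.
        return sum(v * v for v in values)


def batched(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def streaming_sum_of_squares(iterable, size=100_000):
    """Cross the Python/Rust boundary once per chunk, not once per element."""
    return sum(sum_of_squares(chunk) for chunk in batched(iterable, size))


if __name__ == "__main__":
    print(streaming_sum_of_squares(range(1_000_000)))
```

Larger chunks mean fewer boundary crossings but more memory held at once, so the right size depends on your data.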

What Speedups to Expect

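| Workload                | Pure Python | Rust + PyO3 | Speedup     |
|-------------------------|-------------|-------------|-------------|
| Word count (1M words)   | 120ms       | 8ms         | 15x         |
| CSV parsing (50K rows)  | 420ms       | 60ms        | 7x          |
| SHA-256 hashing (1MB)   | 18ms        | 2ms         | 9x          |
| JSON serialization      | 45ms        | 6ms         | 7.5x        |
| Parallel sort (1M ints) | 380ms       | 18ms        | 21x (Rayon) |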

Part 13: When to Use PyO3 vs. Alternatives

Use PyO3 when: you have an existing Rust library to expose to Python, you need true parallelism for CPU-bound work, or you're building a new high-performance library and want memory safety as a baseline.

Use Cython when: you have existing C extension code you're adding to, or you need extremely low call overhead for functions that do very little work per call.

Use ctypes when: you need to call a single C function from an existing shared library with no build step.

Use cffi when: you need to support multiple Python implementations (PyPy, GraalPy) or work without a C compiler.

Do not use PyO3 for I/O-bound code, small utility functions called very frequently in tight loops, or when the development overhead of writing Rust is not justified by the performance gain. Profile first.


Conclusion

PyO3 v0.28 and maturin 1.8 together give you a complete path from profiling a Python bottleneck to shipping a pip-installable wheel that runs on Linux, macOS, and Windows. The toolchain has matured to the point where setup friction is minimal: maturin new, write some Rust, maturin develop, test, maturin generate-ci github, push a tag, done.

Three things make PyO3 the right choice in 2026. First, Rust's memory safety guarantees eliminate whole categories of bugs that plague Cython and C extension code. Second, py.allow_threads() and free-threaded Python 3.14 support mean you can use all available CPU cores. Third, maturin makes distribution trivial.

The strategy that works: profile first with py-spy or cProfile, find the one or two functions consuming most of the CPU time, move only those to Rust, verify the speedup with timeit, ship. Do not rewrite logic that isn't a bottleneck.


FAQs

Do I need to know Rust well to use PyO3?

You need to know enough Rust to write functions, define structs, use basic collections, and understand ownership. You do not need to be an expert. The PyO3 macros hide most of the FFI complexity. Work through the first twelve chapters of the Rust book before starting a PyO3 project.

Is PyO3 stable enough for production?

Yes. Polars, Pydantic v2, and cryptography all use it in production. The API has been stable since v0.23. The main migration risk is upgrading between 0.x releases, any of which can include breaking changes. Pin your version and upgrade deliberately.

What Python versions does PyO3 v0.28 support?

PyO3 v0.28 supports CPython 3.7 through 3.14 and PyPy 3.9 through 3.11. For free-threaded Python, you need CPython 3.13t or 3.14t and PyO3 v0.23 or later. The abi3-py39 feature creates wheels compatible with Python 3.9 and later.

How do I handle Rust panics in Python?

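By default, PyO3 catches a panic at the Rust/Python boundary and raises pyo3_runtime.PanicException in Python. That exception derives from BaseException, not Exception, so a plain except Exception will not swallow it, and an unhandled panic normally ends the program with a traceback. If you compile with panic = "abort", the process aborts immediately instead. Treat panics as bugs and return PyResult errors for failures you expect.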

Can I use PyO3 with NumPy?

Yes. Add the numpy crate to Cargo.toml. It provides PyReadonlyArray and PyArray types that give you zero-copy access to NumPy array buffers, which you can process with ndarray without copying the data.

How do I add type hints for my PyO3 extension?

Create .pyi stub files alongside your Python code. The pyo3-stub-gen crate can generate these automatically from your annotated Rust code. Mypy, pyright, and IDEs use these files for type checking and autocomplete.
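
A hand-written stub for the examples in this guide might look like the following. Treat it as a sketch: the file name my_extension.pyi and the exact signatures are assumptions based on the Rust code shown earlier, and pyo3-stub-gen output may differ in detail.

```python
# my_extension.pyi: hand-written stub sketch for the examples in this guide
class WordCounter:
    text: str
    def __init__(self, text: str) -> None: ...
    def count(self) -> int: ...
    def most_common(self, n: int) -> list[tuple[str, int]]: ...
    def __len__(self) -> int: ...
    def __repr__(self) -> str: ...

def word_count(text: str) -> int: ...
def sum_as_string(a: int, b: int) -> str: ...
```

Stub files contain signatures only; the `...` bodies are never executed by type checkers, which read the annotations and ignore everything else.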

What is the difference between Bound<T> and Py<T> in PyO3?

Bound<'py, T> is a GIL-bound reference with an explicit lifetime tied to holding the GIL — use it for short-lived access within a function. Py<T> is GIL-independent and can be stored in structs or sent across threads. Bind a Py<T> to a Bound when you have a Python<'py> token.

Why is my PyO3 extension slower than expected?

The most common cause is calling Rust functions in a tight Python loop — the call boundary overhead dominates. Instead, pass all your data in one call and do the iteration in Rust. Also ensure you are benchmarking release builds (maturin develop --release), not debug builds, which can be 10-20x slower.
