Recently, gdb blessed rust as a language for agents.
Certainly, I am no gdb. But as the CTO of an ML company, I do have a large number of responsibilities and a decent APM. Back in November, codex+rust left me genuinely astonished.
Once our beautiful boy opus 4.5 got released, working in claude code became an almost pathologically fun experience.
Some people thought gdb's tweet was dumb. They were wrong, though.
Until you try it (even without rust experience!), it's hard to appreciate how useful it can be. Since November, I've churned out God knows how many rust tools to solve problems that had nagged me for months if not years but were never worth attending to. And since it's less discussed, I want to share one useful application of this pattern (claude code + rust) here: claude code + python + rust.
To set the stage, I recently had a somewhat convoluted problem. Our company has a large number of archived images in S3. We don't use anything like webdataset because the image archives are meant to mirror the original sources. Usually when training models over this data, we just use FSx and there is no problem. But recently, I wanted to evaluate a number of architectures over a much larger set than we mount with FSx. This led to two clear options: 1) mount it all anyway and pay AWS its ransom, or 2) fetch the objects on the fly and train at a glacial pace. But here's the thing: for instances running in the same region as the S3 bucket, throughput is definitely not the problem. It's a latency issue. Each GET spends tens of milliseconds on the round trip before the first byte arrives, so a serial fetch loop crawls no matter how much bandwidth the instance has; enough concurrent requests hide that latency entirely.
My first quick attempt at a solution: explain the problem to claude, tell it to define the shuffle plans ahead of time, and then do a sort of mark-and-fetch with boto3 and threading. I spent maybe 25 minutes getting claude to build an initial implementation in python and benchmarking it on a CSV. The performance was not great. Claude and I had a brief dialog, and I decided it was mostly the omnipresent problem of Python threading being annoying (especially when interacting with the file system).
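That first pass looked roughly like the sketch below. This is a minimal illustration of the idea, not the code Claude actually wrote; the bucket name and helper are placeholders:
import concurrent.futures as cf

import boto3
from botocore.config import Config

# Placeholder bucket; boto3's default pool is 10 connections, so widen it
# to match the thread count.
s3 = boto3.client("s3", config=Config(max_pool_connections=128))
BUCKET = "my-training-archive"

def prefetch(plan_keys, dest_dir, threads=128):
    # The shuffle plan is fixed up front, so the fetcher always knows
    # which objects the trainer will ask for next.
    def fetch_one(key):
        dest = f"{dest_dir}/{key.replace('/', '_')}"
        s3.download_file(BUCKET, key, dest)
        return dest

    with cf.ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch_one, plan_keys))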
The solution: "Claude, I think the issue here is that Python multithreading can't sustain 128 threads. Please use maturin to create an optional rust version of the fetching machinery to accelerate this." Over my example csv file, I got somewhere between a 10x and 50x speedup over the python version. I did not write any of it; Claude close-to-one-shotted the extension. You can view the full repo here, or just skim the code example below:
import numpy as np
from PIL import Image
import planfetch as pf

# Load CSV index
train_uris, train_labels = pf.load_index_csv(
    "train.csv",
    s3_uri_col="s3_path",
    class_col="class_id",
    pack_uris=True,
)

# Configure NVMe cache
nvme_bytes = 1_000_000_000_000  # 1 TB
cache = pf.SharedDiskCache(
    "/mnt/nvme/s3cache_train",
    max_bytes=int(0.8 * nvme_bytes),
    download_threads=64,
    s3_max_pool_connections=256,
)

# Create dataset with cache integration
def image_loader(path: str):
    return Image.open(path).convert("RGB")

train_ds = pf.S3PathDataset(
    train_uris,
    train_labels,
    cache=cache,
    image_loader=image_loader,
    transform=None,
)

# Configure batch planning
batch_size = 256
plan_provider = pf.StreamingStratifiedBatchPlanProvider(
    np.asarray(train_labels),
    batch_size=batch_size,
    epoch_samples=2_000_000,
    seed=123,
)

# Build prefetching DataLoader
train_loader, train_controller = pf.build_prefetching_dataloader(
    dataset=train_ds,
    s3_uris=train_uris,
    plan_provider=plan_provider,
    cache=cache,
    num_workers=8,
    batches_ahead=200,
    batches_back=4,
    prune_every=10,
    prune_mode="window",
    prune_backend="scan",
    persistent_workers=True,
    pin_memory=True,
)

# Minimal training loop
for epoch in range(10):
    train_controller.set_epoch(epoch)
    for images, y in train_loader:
        # Your training step here
        pass
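As an aside on the "optional" part of that prompt: the usual wiring is a guarded import with a pure-python fallback. Here's a hypothetical sketch, with module and function names invented for illustration rather than taken from planfetch's actual API:
# Hypothetical wiring for an optional maturin-built backend; the names
# below are illustrative, not planfetch's actual API.
try:
    from planfetch import _native  # compiled Rust extension, if built
except ImportError:
    _native = None

def fetch_batch(uris, cache_dir, threads=128):
    if _native is not None:
        # The Rust path releases the GIL, so the worker threads can
        # genuinely overlap S3 round trips and filesystem writes.
        return _native.fetch_batch(uris, cache_dir, threads)
    # Pure-python fallback (e.g. the boto3 sketch above): correct, but
    # throttled by the GIL around the I/O glue.
    return prefetch(uris, cache_dir, threads)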
You can argue this is convoluted. But without this little hack I had no good solution, so I probably would have just done nothing. The incredible unlock everyone is experiencing is that lots of little "this would be good, but I can't spend my time writing it now" side quests become the responsibility of your army of juniors (who are fast becoming your betters).
Even more specifically, if you spend time in ML, there are a ridiculous number of opportunities to use python+rust in place of plain python to heavily optimize your pipelines. That's the pattern I have fallen in love with recently. To be honest, I still don't really trust codex or claude code (even Opus 4.5!) with core research tasks. They still make silly mistakes, and those are still hard to catch without serious attention. But for "optimize this with a rust extension" tasks, you usually have the benefit of an existing python implementation that can serve as the unit test for the rust version. You have something verifiable, and then you have something fast and verified with little additional effort.
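In practice that verification step tends to collapse into a parity test: run both backends on the same inputs and assert identical outputs. Here's a hedged sketch, with the compiled module and function names made up for illustration:
import random

import pytest

def plan_batches_python(labels, batch_size, seed):
    # Reference implementation: a deterministic seeded shuffle cut into batches.
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]

def test_rust_plan_matches_python_reference():
    # Hypothetical compiled module; skip cleanly if it isn't built. To pass,
    # the rust side has to mirror the reference PRNG exactly.
    native = pytest.importorskip("planfetch_native")
    labels = [i % 10 for i in range(1_000)]
    got = native.plan_batches(labels, batch_size=32, seed=7)
    want = plan_batches_python(labels, batch_size=32, seed=7)
    assert got == want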
Long Live Claude Code.