Zero Drift
Promote validated runtimes from eval to serving by switching a single immutable reference.
Deterministic, reproducible CUDA ML/AI runtimes on NVIDIA GPUs. Define ML/AI runtimes as code. Promote by switching a pinned reference. Revert the same way.
The CUDA Kickstart Program ships a reference architecture and validated Flox environments for common serving patterns (diffusion, LLM inference, Triton, PyTorch inference), plus GPU-architecture-specific builds of PyTorch, torchvision, torchaudio, vLLM, ONNX Runtime, and other core frameworks. Teams ship these as slim OCI images or run directly on Kubernetes from a pinned Flox environment reference.
Deterministic SBOMs and provenance computed from each runtime environment's dependency graph.
Smaller artifacts and faster rollouts via GPU-specific builds, with the option to generate optimized OCI images.

If you’re completely new to Flox, start with our Flox in 5 minutes guide.
Get access to validated Flox environments for common GPU serving patterns, GPU-specific builds of ML frameworks like PyTorch, ONNX Runtime, and other resources.
Get concrete recommendations for your stack, deployment targets, and GPU architectures, along with guidance on adopting Flox in your environment.
CUDA-accelerated ML workloads depend on a fragile matrix of tightly coupled CUDA user-space libraries, Python runtimes, native libraries, and serving frameworks. Teams have historically used OCI images to isolate these dependencies, but the container rebuild → push → pull → test loop slows them down. Flox gives teams a declarative, reproducible alternative that eliminates image rebuild loops and provides a reviewable diff across OS, Python, and CUDA dependencies, simplifying CVE patching, runtime validation, and audits.
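A runtime's dependencies are declared in the environment's manifest. The fragment below is an illustrative sketch only: the package names and attribute paths are placeholders, and the authoritative schema is in the Flox documentation. What matters is that a dependency change is an edit to this file, reviewable as a plain diff.

```toml
# .flox/env/manifest.toml — illustrative sketch, not a validated environment
version = 1

[install]
# Package names here are placeholders; real attribute paths come from the
# Flox catalog and are pinned exactly in the accompanying lockfile.
python3.pkg-path = "python3"
cudatoolkit.pkg-path = "cudatoolkit"

[vars]
CUDA_VISIBLE_DEVICES = "0"

[options]
systems = ["x86_64-linux"]
```

Reviewing a dependency change then means diffing this manifest and its lockfile, rather than inspecting a rebuilt image layer.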
A deployment blueprint for deterministic CUDA ML environments across GPU fleets: promotion gates, rollback mechanics, and reproducible runtime definitions.
Ready-to-run environments for common serving patterns: diffusion, LLM inference, Triton, PyTorch inference, and more.
Deterministic, reproducible builds for GPU-architecture-specific artifacts (PyTorch, vLLM, ONNX Runtime, llama.cpp, and more) to reduce artifact size and rollout time across GPU fleets.
How declarative, deterministic runtime environments help highly regulated, latency-sensitive orgs move CUDA ML/AI workloads from R&D to production. Dependency changes are atomic edits to an environment definition. Promotion and rollback are reference switches, not container rebuilds. Deterministic SBOMs accelerate CVE triage and response.
Walkthroughs and implementation notes for operating reproducible CUDA stacks.
Each Flox environment resolves from a pinned environment definition and its lockfile to an immutable, hash-addressed dependency set. That yields a tamper-evident chain from what you declared (the environment definition and lockfile) to what runs (the realized runtime packages). SBOMs are derived from the dependency graph itself, which makes it easier to map CVEs to what's actually running in production; remediation becomes an edit to the pinned runtime definition plus a reference promotion.
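Because the lockfile pins exact package versions, CVE triage can be a mechanical lookup against it. The sketch below uses a simplified, hypothetical lockfile shape (the real Flox lockfile is richer); `locked_packages` and `affected` are illustrative helper names, not Flox APIs.

```python
# Sketch: map a CVE advisory to what's actually pinned in a runtime's
# lockfile. The lockfile schema here is simplified for illustration.
import json

def locked_packages(lockfile_text: str) -> dict:
    """Return {package-name: version} from a (simplified) lockfile."""
    lock = json.loads(lockfile_text)
    return {p["name"]: p["version"] for p in lock["packages"]}

def affected(lock_pkgs: dict, advisory: dict) -> bool:
    """True if the advisory's package@version is pinned in this runtime."""
    return lock_pkgs.get(advisory["package"]) == advisory["version"]

example_lock = json.dumps({
    "packages": [
        {"name": "openssl", "version": "3.0.12"},
        {"name": "pytorch", "version": "2.3.1"},
    ]
})

pkgs = locked_packages(example_lock)
print(affected(pkgs, {"package": "openssl", "version": "3.0.12"}))  # True
print(affected(pkgs, {"package": "openssl", "version": "3.1.0"}))   # False
```

Since every serving node resolves from the same hash-addressed set, a "not affected" answer from the lockfile holds for the whole fleet, not just one host.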
Deterministic SBOMs per runtime reference/generation
Faster CVE triage by mapping alerts to environment refs/hashes
Patch by editing the declared runtime, promote by switching the ref
Roll back instantly by reverting the ref
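The promote/rollback mechanic above is just an atomic pointer switch. In practice the pointer is whatever your deployment reads to pick a pinned Flox environment reference (for example a git ref or registry tag); the file-based pointer and function name below are hypothetical, a minimal sketch of the mechanic rather than Flox tooling.

```python
# Sketch: promotion and rollback as one atomic ref switch.
import json
import os
import tempfile

def switch_ref(pointer_path: str, new_ref: str):
    """Atomically point serving at new_ref; return the previous ref."""
    previous = None
    if os.path.exists(pointer_path):
        with open(pointer_path) as f:
            previous = json.load(f)["ref"]
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(pointer_path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"ref": new_ref}, f)
    os.replace(tmp, pointer_path)  # atomic rename: readers never see a partial write
    return previous

switch_ref("serving.json", "env-v1")         # initial deploy
prev = switch_ref("serving.json", "env-v2")  # promote validated runtime
switch_ref("serving.json", prev)             # rollback is the same switch
```

Because both directions are the same operation on an immutable reference, rollback carries no rebuild step and no risk of reconstructing a slightly different runtime.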
See how financial services teams use Flox and Nix to ship CUDA ML stacks with reproducible environments, atomic rollbacks, and provable software supply chains.
“Flox removes the risk of environment drift by letting you replicate your exact production environment during development, regardless of architecture differences between OSes.”