
Flox CUDA
Kickstart Program

Deterministic, reproducible CUDA ML/AI runtimes on NVIDIA GPUs. Define ML/AI runtimes as code. Promote by switching a pinned reference. Revert the same way.

Request Access

From Model to Production:
CUDA Environments, Pre-Validated

The CUDA Kickstart Program ships a reference architecture and validated Flox environments for common serving patterns (diffusion, LLM inference, Triton, PyTorch inference) plus GPU-architecture-specific builds of PyTorch, torchvision, torchaudio, vLLM, ONNX Runtime, and other core frameworks. Teams ship as slim OCI images or run directly on Kubernetes from a pinned Flox environment reference.

Diagram: four common CUDA stacks, their philosophies, and aha moments (four panels).

Zero Drift

Promote validated runtimes from eval to serving by switching a single immutable reference.

Zero Trust

Deterministic SBOMs and provenance computed from each runtime environment's dependency graph.

Zero Fat

Smaller artifacts and faster rollouts via GPU-specific builds, with the option to generate optimized OCI images.


How it works

CUDA stacks move fast.
Production has to stay stable.

CUDA-accelerated ML workloads depend on a fragile matrix of tightly coupled CUDA user-space libraries, Python runtimes, native libraries, and serving frameworks. Teams have historically used OCI images to isolate these dependencies, but container rebuild → push → pull → test loops slow them down. Flox gives teams a declarative, reproducible alternative that eliminates image rebuild loops and provides a reviewable diff across OS, Python, and CUDA dependencies, simplifying CVE patching, runtime validation, and audits.
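Concretely, "runtimes as code" means the environment lives in a declarative manifest alongside its lockfile. A minimal sketch of what that can look like (package names, versions, and attribute paths below are illustrative assumptions, not the Kickstart Program's actual contents):

```toml
# .flox/env/manifest.toml — illustrative sketch only; package names and
# attribute paths are assumptions, not the Kickstart Program's contents.
version = 1

[install]
python3.pkg-path = "python3"
cudatoolkit.pkg-path = "cudatoolkit"        # CUDA user-space libraries
torch.pkg-path = "python3Packages.torch"    # framework pinned with the runtime

[vars]
# Environment variables travel with the runtime definition.
CUDA_MODULE_LOADING = "LAZY"
```

Because the manifest and its lockfile are plain files, a dependency change is an ordinary reviewable diff, and the resolved closure is the same on every host that activates the environment.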

Diagram: host OS variations flow through Flox/Nix to stable binary execution regardless of host (three panels).

What's included in the
CUDA Kickstart Program

Reference Architecture

A deployment blueprint for deterministic CUDA ML environments across GPU fleets: promotion gates, rollback mechanics, and reproducible runtime definitions.

Validated Environments

Ready-to-run environments for common serving patterns: diffusion, LLM inference, Triton, PyTorch inference, and more.

Build Recipes for GPU-Specific Frameworks

Deterministic, reproducible builds for GPU-architecture-specific artifacts (PyTorch, vLLM, ONNX Runtime, llama.cpp, and more) to reduce artifact size and rollout time across GPU fleets.

Technical Case Study (Capital Markets)

How declarative, deterministic runtime environments help highly regulated, latency-sensitive orgs move CUDA ML/AI workloads from R&D to production. Dependency changes are atomic edits to an environment definition. Promotion and rollback are reference switches, not container rebuilds. Deterministic SBOMs accelerate CVE triage and response.

Demos and Blogs

Walkthroughs and implementation notes for operating reproducible CUDA stacks.

How it works

A secure, repeatable path from R&D to production

A Flox environment resolves from a pinned environment definition and its lockfile to an immutable, hash-addressed dependency set. That yields a tamper-evident chain from what you declared (the environment definition and lockfile) to what runs (the realized runtime packages). SBOMs are derived from the dependency graph itself, which makes it easier to map CVEs to what's actually running in production; remediation becomes an edit to a pinned runtime definition plus a reference promotion.

  • Deterministic SBOMs per runtime reference/generation

  • Faster CVE triage by mapping alerts to environment refs/hashes

  • Patch by editing the declared runtime, promote by switching the ref

  • Roll back instantly by reverting the ref
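The promote-by-pointer pattern behind these steps can be sketched generically. The following is a minimal illustration of reference-switch promotion and rollback, not Flox's actual mechanism: Flox uses pinned, hash-addressed environment references, which the symlink below merely stands in for (directory names and package versions are invented for the example).

```shell
# Minimal sketch of reference-switch promotion and rollback.
# NOTE: generic illustration only — the symlink models a pinned reference.
set -eu
mkdir -p runtimes/v1 runtimes/v2
echo "torch==2.1 cuda==12.1" > runtimes/v1/lock   # validated runtime A (versions illustrative)
echo "torch==2.4 cuda==12.4" > runtimes/v2/lock   # validated runtime B
ln -sfn runtimes/v1 production    # serving points at runtime A
ln -sfn runtimes/v2 production    # promote: one atomic pointer switch, no rebuild
cat production/lock               # prints: torch==2.4 cuda==12.4
ln -sfn runtimes/v1 production    # roll back: revert the pointer
cat production/lock               # prints: torch==2.1 cuda==12.1
```

The point of the pattern is that both runtimes exist immutably side by side; promotion and rollback never mutate an artifact, they only move one reference.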

Free e-book

Deterministic ML Infrastructure for Capital Markets

See how financial services teams use Flox and Nix to ship CUDA ML stacks with reproducible environments, atomic rollbacks, and provable software supply chains.

Download for free

Explore Flox for
your organization

Get concrete recommendations for your stack, deployment targets, and GPU architectures, along with guidance on adopting Flox in your environment.

Trusted by teams building the future

Weaviate
Fellow
NVIDIA
PostHog
D.E. Shaw & Co
Neo4j
"Flox removes the risk of environment drift by letting you replicate your exact production environment during development, regardless of architecture differences between OSes."

Priya Ananthasankar

Principal Software Engineer at Microsoft