Flox, the Nix Foundation, and NVIDIA are teaming up to make it easier for teams working with NVIDIA’s CUDA GPU compute framework to build and ship CUDA-accelerated stacks.
Teams can now easily build and deploy portable CUDA runtimes, including software like the CUDA Toolkit, TensorFlow, TensorRT, PyTorch, OpenCV, and other packages, that run anywhere, at any time. Not only that, they can easily install selected historical versions of CUDA packages (essential for supporting existing production workloads) along with conflicting versions of CUDA packages in the same runtime environment, at the same time.
The best part? All of this just works. Without changes to your global system state. Without containers or VMs.
If this sounds a lot like Arthur C. Clarke’s definition of magic, let me introduce you to the “magic” of Nix, the open source package and environment manager that doubles as a deterministic build system.
Or the “magic” of Flox, open source software that gives you access to the same software—plus the same portability and reproducibility guarantees—as Nix, with intuitive semantics and a familiar, easy-to-use UI.
Both enable teams to ship the GPU-accelerated stacks they build and test locally to CI and production. Using either Nix or Flox, teams can build GPU-accelerated software on Linux or macOS—for CUDA or Metal—share what they build with one another, push their environments to CI, and even deploy them to prod.
If this sounds incredible to you, read on to learn more about Nix, Flox, and the promise of reproducible, deterministic environments—not just for CUDA, but for any language, toolchain, or framework you work with.
What’s Happening
With a huge assist from the Nix CUDA team and the Nix community as a whole, NVIDIA CUDA is coming to the Flox Catalog!
Flox is one of the first commercial open source vendors licensed to redistribute precompiled, prepatched packages that include NVIDIA’s proprietary CUDA binaries and libraries. Flox’s CUDA-accelerated packages are built using canonical recipes defined and maintained by the Nix CUDA team. For users of Flox and Nix, packages with CUDA dependencies will be available for use as soon as they’re downloaded: they no longer need to be built from source.
For packages like cudaPackages.tensorrt, opencv, or even ffmpeg-full with CUDA acceleration, building from source could take hours. And building a full CUDA-accelerated machine learning (ML) stack, with dependencies like tensorrt, pytorch, opencv, xgboost, cudnn, and other packages, could take days.
That’s why we’re glad to be partnering with NVIDIA and the Nix Foundation to make this possible.
Being able to download and install precompiled, prepackaged CUDA dependencies addresses an acute pain point for open source users. Not only does it make it easier and faster to download and run CUDA-accelerated packages, but—with Nix and Flox—it unlocks the ability to build and ship reproducible CUDA stacks. Nix and Flox build packages so that they’re always deterministic products of the declarative inputs used to create them. This means the same inputs always produce the same package. Anytime. Anywhere.
Let’s briefly explore what’s happening, how it’s different from the status quo, and why it matters.
What This Is—and Why It’s Different
Today Nix users leverage the Nix binary cache to install pre-built packages, saving time and resources otherwise spent building software from source. However, due to licensing restrictions, the Nix community could not redistribute pre-compiled proprietary binaries and libraries, forcing users to build these when needed.
The Flox Catalog uses Nixpkgs as its upstream, sampling it once a day to capture and build any packages that have changed. These built packages live in the Flox Catalog’s binary cache. Previously, when Flox clients installed packages with CUDA dependencies, they, like Nix clients, had to build those packages from source or patch binaries that linked against CUDA, due to the same licensing restrictions; they couldn’t get them prebuilt and prepatched from the Flox Catalog’s binary cache.
Now when a Flox CLI client fetches a CUDA-accelerated package, like TensorRT, it gets a prebuilt package that includes NVIDIA’s proprietary bits. Even better, Nix clients can add Flox’s binary cache to their list of substituters and get the same experience. The CUDA-accelerated packages you get from Flox’s binary cache are built from the same upstream definitions in Nixpkgs; the only difference is that Flox is authorized to redistribute them, so users no longer have to build from source.
What This Isn’t—and What’s Still the Same
If you’re a user of the Flox CLI, you can pull CUDA-accelerated user space packages from the Flox Catalog.
To run them, you’ll need to install NVIDIA’s official CUDA driver stack on Linux. (On Windows with WSL2, installing NVIDIA’s WDDM display driver typically unlocks CUDA acceleration.) Once done, you can pull any CUDA-accelerated packages you need, including the CUDA Toolkit, from the Flox Catalog, and they’ll run against the driver version currently registered with your kernel.
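For example, with the driver stack in place, getting a working environment can be as simple as the following sketch. The package name cudatoolkit is illustrative; search the Flox Catalog for the exact names you need:

```sh
# Create a Flox environment and add a CUDA-accelerated package to it
flox init                  # initialize an environment in the current directory
flox install cudatoolkit   # illustrative name; check the Catalog for exact names
flox activate              # enter a subshell with the environment's packages

# Inside the activated environment, confirm toolkit and driver agree
nvcc --version             # CUDA Toolkit version from the Flox Catalog
nvidia-smi                 # driver version currently registered with the kernel
```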
If you use the Nix package manager (which is different from NixOS), you can pull CUDA-accelerated packages by adding Flox’s binary cache to your extra-substituters in nix.conf. Bear in mind that the same limitation applies: you’ll get user-space packages from Flox, but you’ll still need to install NVIDIA’s official driver stack separately.
Just drop the following into nix.conf:
extra-trusted-substituters = https://cache.flox.dev
extra-trusted-public-keys = flox-cache-public-1:7F4OyH7ZCnFhcze3fJdfyXYLQw/aV7GEed86nQ7IsOs=
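With those two lines in place, Nix consults cache.flox.dev in addition to its default caches. As a quick sanity check, something like this should fetch a prebuilt toolkit rather than building it (a sketch that assumes flakes are enabled; the attribute name is illustrative):

```sh
# CUDA packages are marked unfree in Nixpkgs, so they must be allowed
# explicitly; --impure lets the flake read the environment variable.
NIXPKGS_ALLOW_UNFREE=1 nix build --impure nixpkgs#cudaPackages.cudatoolkit
```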
Why use NVIDIA’s official drivers? Your Linux distribution may ship CUDA-enabled drivers in its repos, but these often trail far behind the current stable releases. For workloads like ML, AI, multimedia, or HPC, you usually need the latest CUDA Toolkit and CUDA driver releases. The Flox Catalog gives you access to the latest CUDA-accelerated applications and libraries, but they’ll only run if your CUDA stack supports them.
One Difference: CUDA Drivers on NixOS
If you’re using NixOS, you can also pull prebuilt, prepatched CUDA display drivers from Flox’s binary cache. To get both CUDA user-space packages and display drivers, you just need to add cache.flox.dev as an extra-substituter in your configuration.nix file. This way cache.nixos.org remains first in the lookup order, and cache.flox.dev only gets checked if the specified package isn’t available upstream.
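In configuration.nix, that might look something like this minimal sketch (the option names follow the nix.settings module on current NixOS; the public key is the one shown in the nix.conf snippet above):

```nix
# configuration.nix (sketch): consult Flox's cache after the defaults;
# cache.nixos.org remains first in the lookup order.
nix.settings = {
  extra-substituters = [ "https://cache.flox.dev" ];
  extra-trusted-public-keys = [
    "flox-cache-public-1:7F4OyH7ZCnFhcze3fJdfyXYLQw/aV7GEed86nQ7IsOs="
  ];
};
```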
If you want Flox’s user-space CUDA packages but prefer to get your display drivers from cache.nixos.org, no problem: because Flox’s binary cache is defined as an extra-substituter, user-space packages and drivers come from upstream Nixpkgs whenever they exist there, and Flox’s cache is only consulted when upstream doesn’t provide them. By the same token, to get CUDA drivers from cache.nixos.org, you must declare a CUDA driver package that Hydra actually builds: the Nix binary cache only publishes the driver branches that NixOS maintainers include in Hydra’s jobsets, and drops them once those jobsets move on.
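Concretely, declaring a driver branch on NixOS looks something like the sketch below; whether cache.nixos.org serves it prebuilt depends on which branches are in Hydra’s current jobsets:

```nix
# Sketch: pin an NVIDIA driver branch on NixOS. nvidiaPackages also
# exposes branches like .beta or .legacy_470; only branches built by
# Hydra are served prebuilt from cache.nixos.org.
services.xserver.videoDrivers = [ "nvidia" ];
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
```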
To sum up:
- Flox or Nix (non-NixOS) users can pull CUDA-accelerated user-space packages from Flox’s binary cache, but must install NVIDIA’s official driver stack separately.
- NixOS users can pull CUDA-accelerated user-space packages from Flox’s binary cache by adding it as an extra-substituter. NixOS users can get NVIDIA’s CUDA display drivers from Flox’s binary cache, too.
About the Flox Binary Cache
Flox maintains its own binary cache consisting of packages built from the official Nixpkgs repository’s build recipes (i.e., Nix expressions). These are used to produce Nix Archive (.nar) files, analogous to the .deb and .rpm packages used by the Debian and Fedora projects, respectively, along with their associated metadata. The Flox Catalog consumes this metadata and gives Flox CLI users a way to discover, explore, and install pre-built packages from the Flox binary cache. In other words, Flox operates its own mirror of Nixpkgs, but this is for the purposes of build automation; Flox does not modify canonical upstream Nixpkgs definitions.
This is a “fork” only in the most pedantic sense. The purpose of mirroring and refreshing from Nixpkgs is to allow the Flox Catalog and Flox binary cache to stay current with upstream changes. Plus, by maintaining its own binary cache, Flox can build and distribute both current and historical versions of Nix packages—not just fully open source software, but also proprietary packages (like CUDA) subject to licensing restrictions. Flox users can then easily search for and install them alongside current ones.
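In day-to-day use, that discovery and installation happens through the Flox CLI, along these lines (package names and available versions are illustrative and depend on what the Catalog currently carries):

```sh
flox search cuda              # discover CUDA-related packages in the Catalog
flox show cudatoolkit         # list the versions available for a package
flox install cudatoolkit@12   # pin a version, current or historical
```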
Why Not Use the Nix Binary Cache?
If the recipes Flox uses to build CUDA-accelerated packages come from upstream Nixpkgs, why can’t the Nix community build and package these to begin with? Why can’t the Nix binary cache serve up pre-built CUDA-accelerated packages? The answer lies in the licensing restrictions that commercial vendors tend to place on proprietary software when it’s used with open source projects.
NVIDIA’s licensing terms require an unambiguous, legally accountable entity to sign agreements and manage compliance. Flox provides the corporate structure, legal resources, and liability coverage necessary for this kind of partnership. The Nix Foundation, as a mission-oriented, community-driven non-profit, has a structure that—at this point in time—can’t easily be brought into alignment with NVIDIA’s legal requirements.
Making CUDA Development Portable, Reproducible, and Repeatable
If you work with CUDA, NVIDIA’s market-leading GPU compute framework, you’ve probably had to deal with:
- Reproducibility failures. CUDA stacks built on one system don’t always run on others;
- Conflicting CUDA versions. Different frameworks require different CUDA toolkits, which normally can’t coexist;
- Broken setups. Installing CUDA dependencies or debugging version conflicts can break your setup;
- Heavy-duty rebuilds. Keeping CUDA-capable CI runners up to date is challenging;
- Stalls in dev velocity. Until builds succeed, GPU-accelerated workloads can’t run.
Issues like these cause teams to lose productivity, iteration cycles to stretch out, and GPU resources to sit idle because workloads can’t run. These headaches aren’t in any sense unique to CUDA development, however; they’re a fact of life in modern software engineering, spanning all languages and toolchains.
But what if there were a mature, proven way to eliminate these and similar problems?
There is! Two ways actually: Nix and Flox. Nix lets you build portable dev environments and packages that run the same anywhere. Flox offers the same reproducibility guarantees as Nix, but with intuitive semantics and a familiar, easy-to-use interface. You can use either to create and ship CUDA stacks that just work anywhere.
Now you can get prebuilt CUDA-accelerated software packages—including NVIDIA’s CUDA Toolkit—from Flox. Even better, you can install separate, conflicting versions of CUDA-accelerated packages on the same system, at the same time. You can even use conflicting packages in the same runtime environment.
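One way Flox expresses this is with package groups in an environment’s manifest.toml: packages in different groups can resolve to different Nixpkgs revisions, which is what lets otherwise conflicting versions coexist. A sketch with illustrative names and versions:

```toml
# manifest.toml (sketch): two CUDA toolchains side by side.
[install]
cuda-legacy.pkg-path = "cudatoolkit"
cuda-legacy.version = "11.8"        # hypothetical historical version
cuda-legacy.pkg-group = "legacy"

cuda-current.pkg-path = "cudatoolkit"
cuda-current.version = "12.4"       # hypothetical current version
cuda-current.pkg-group = "current"
```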
Here’s why this is a big deal:
- Developer velocity. No more hours-long local or CI rebuilds: teams can pull prebuilt, patched CUDA packages directly from Flox’s binary cache and start working or running jobs immediately;
- Reproducibility. Packages come from the same upstream definitions in Nixpkgs, so builds are always deterministic and versioned, whether you’re working in local dev, CI, or production;
- AI/ML acceleration. CUDA-enabled frameworks (PyTorch, TensorFlow, TensorRT, etc.) become drop-in ready across environments, accelerating experimentation with and deployment of GPU-accelerated workloads.