Blog
Reproducible, Auditable ML/AI for Capital Markets with Flox
Steve Swoyer | 16 March 2026

tl;dr
For firms in capital markets, Flox enables:
- Shorter delivery cycles for ML/AI workloads, reduced rebuild-and-distribute overhead
- Safer upgrades and reversions via versioned environment definitions
- A declarative environment spec as the unit of promotion across the SDLC, from local dev to prod
- Better visibility for security, compliance, and audit into what runs in production
- A more resilient, adaptable operational posture as market conditions and models change
Capital Markets and AI/ML
Firms in capital markets train custom ML/AI models for quantitative research, model-driven trading, risk management, and other operations. ML/AI workloads depend on a fragile matrix of compatible NVIDIA CUDA, Python, native libraries, and other dependencies. This fragility makes it challenging to ship production-grade releases quickly while controlling for operational, compliance, and supply-chain risk.

Firms trust Flox to provide a fast, secure, repeatable path from R&D to production for ML/AI workloads. Flox is a cross-platform, cross-toolchain package manager and build system, based on open source Nix. It provides traceable, auditable provenance; generates deterministic SBOMs; simplifies CVE triage and remediation; and permits controlled, atomic upgrades (or rollbacks) without disrupting operations. Firms in capital markets rely on Flox to ensure their ML/AI workloads run reproducibly everywhere.
The challenge: Moving fast without breaking things
To enable that fast, secure path from R&D to production, firms need to tighten their ML/AI → build → validate → release loops. At a minimum, this means each ML/AI release needs to meet four basic requirements:
- Compatibility with existing systems and workflows. New workloads can't break older ones; multiple generations of models and tooling need to coexist on the same machine/infrastructure.
- Repeatable results. Different teams must be able to reproducibly run the same workloads in research, testing, and production. ML/AI models that run in R&D and testing must run the same way in prod.
- Predictable, controlled change management. Leaders and auditors need clear visibility into what changed and what's being deployed, as well as a fast, reliable path to roll back when something goes wrong.
- Improved visibility across the software supply chain. Security and compliance teams need an accurate, up-to-date view of what software is running, where it came from, and how to respond when CVEs drop.
Many organizations attempt to address these requirements with virtual machine (VM) or container images. These approaches work well for conventional workloads, but for GPU-accelerated ML/AI workloads, where toolkits, frameworks, and libraries must exactly line up, they come with several unacceptable drawbacks.
For ML/AI stacks, containers aren't always an obvious answer
Both VM and container patterns isolate ML/AI dependencies by packaging them into image files. This quarantines CUDA, Python, and other dependencies and prevents them from conflicting with one another. The VM or container image also doubles as a portable artifact for shipping the complete ML/AI stack.
The problem with this pattern is that the multi-gigabyte VM or container image itself becomes the unit of promotion. This is less of an issue with VM images than with container (i.e., OCI) images, which routinely need to be rebuilt when patching bugs, or when CUDA, Python, native libraries, or base images change.
In ML/AI, the typical container development loop looks like this: change a dependency, rebuild the multi-gigabyte image, push it to a registry, pull it onto the target host, and re-test.
This has the following real-world effects:
- Container images are time-consuming to build, test, and change.
- Container-based workflows do not neatly align with every firm's SDLC practices and patterns.
- The container model makes specific assumptions about how ML/AI research teams work, as well as the types of hardware teams have access to in local dev, eval, CI, and production.
- ML/AI stacks produce very large OCI images: usually multiple gigabytes, sometimes tens of gigabytes.
Flox makes software change predictable—and reversible
Flox was spun out of the D. E. Shaw group, a hedge fund that engineered an early Flox-like prototype to serve as a user-friendly frontend for Nix. Even though Nix is a hugely powerful solution for building, packaging, and running software, many non-programmers find it intimidating to learn and use.
Flox uses TOML-defined environments in place of the Nix language. It surfaces an intuitive developer workflow, with familiar CLI semantics similar to git or npm. It combines an imperative CLI frontend with FloxHub, a central registry that teams use to version, manage, and share Flox environments, as well as generate deterministic SBOMs. The curated Flox Catalog indexes more than five years of package-version combinations, so teams can easily discover and install new and historical versions of packages. Like Nix, Flox supports reproducible builds, with results teams can publish to a private catalog and install anywhere.
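The day-to-day workflow follows those familiar CLI semantics. The sketch below is illustrative, not a transcript: the command names follow the Flox CLI, but flags, package names, and output vary by version.

```shell
# Hypothetical workflow sketch; package names are assumptions.
flox init                 # create a .flox/ environment in the current repo
flox install python312    # add a package resolved from the Flox Catalog
flox activate             # enter a subshell with the environment on PATH
flox push                 # share the versioned environment via FloxHub
```

A teammate on a different machine or architecture can then pull and activate the same environment, resolving the identical dependency set.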
Flox environments address two distinct SDLC requirements:
Flox runtime environments. A declared set of everything required to reproducibly run ML/AI workloads, including CUDA and Python dependencies, performance-accelerated libraries, ML/AI models, etc.
Flox build environments. A declared set of everything required to reproducibly build and package ML/AI software—including checkpoint models, signals, and datasets—into versioned, immutable packages.
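Both kinds of environment are declared in a TOML manifest. The fragment below is a sketch, not a canonical schema: the package names, pins, and build command are assumptions, and the exact manifest fields may differ across Flox versions.

```toml
version = 1

[install]
# Runtime dependencies, declared so every machine resolves the same set
python312.pkg-path = "python312"
cudatoolkit.pkg-path = "cudaPackages.cudatoolkit"

[vars]
# Hypothetical pointer to a versioned model artifact
MODEL_CHECKPOINT = "risk-model-v7.ckpt"

[build.model-server]
# Hypothetical build step packaging the workload into a versioned artifact
command = "pip install . --target $out"

[options]
systems = ["x86_64-linux", "aarch64-darwin"]
```

The same file serves research, CI, and production: it is the reviewable artifact that changes when the stack changes.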
Flox gives teams a controlled, repeatable path from prototype to production. Teams use Flox to create declarative environments that pull in the software dependencies their ML/AI workloads need to build or run. Flox runtime environments execute in a partially isolated context, so CUDA, Python, and other dependencies that would otherwise conflict can coexist. Flox build environments run in a completely isolated sandbox.
With Flox, the declarative environment, not the OCI image, becomes the artifact that teams version, review, and promote. This means the same Flox environments that ML/AI researchers create and use in training can travel across the SDLC: through eval and CI, hardening by MLOps teams, to production deployment—and back again. Flox environments work on Apple, Linux, and Windows (with WSL2) systems, on bare metal or in VMs. They run natively on x86 or ARM chips, and take advantage of GPU acceleration (NVIDIA CUDA and Apple Metal). MLOps teams can use them as declarative recipes to build minimal container images for testing or prod; optionally, Flox environments can run without containers on Kubernetes GPU clusters.
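When a container is still required, the environment itself can serve as the recipe. The commands below are a sketch: `flox containerize` exists in the Flox CLI, but its flags and output format vary by version, and the file name here is hypothetical.

```shell
# Hypothetical: derive a minimal OCI image from the declarative environment,
# rather than hand-maintaining a Dockerfile.
flox containerize -f ./model-server.tar
docker load -i ./model-server.tar
```

Because the image is generated from the same manifest used in dev and eval, what runs in the container matches what researchers tested.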
Key benefits for capital markets firms
By adopting Flox, firms in capital markets report the following concrete benefits:
1. Faster delivery of time-critical ML/AI assets
Teams can iterate without constantly rebuilding and redistributing massive image artifacts to test dependency changes. This tightens feedback loops and speeds the path from validated PoC to production rollout.
2. Conflicting software stacks easily coexist
New and legacy ML/AI workloads can run side by side on shared infrastructure, each getting access to the dependency set it requires. Researchers iterate quickly and ship what they build without breaking production.
3. Fits into existing workflows + simplifies rollback
Flox defines the environment and the software used to run an ML/AI workload as a versioned, declarative specification, so leaders and reviewers can see what changed before and after a deployment. If a rollout causes issues, reverting is an atomic switch to a prior version—e.g., editing the reference to a Git commit.
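Because the environment definition is ordinary versioned text, rollback can ride on existing Git workflows. This is a hedged sketch: the repo layout and serve script are hypothetical, and the `flox activate` flags may vary by version.

```shell
# Hypothetical rollback: the manifest lives in Git, so reverting is an
# ordinary commit plus a re-activation under the prior definition.
git -C infra/envs revert --no-edit HEAD    # undo the manifest change
flox activate -d infra/envs -- ./serve.sh  # relaunch with the known-good environment
```

The switch is atomic: the workload either runs under the prior environment in full or not at all, with no half-patched state in between.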
4. Improved security + a stronger, more resilient audit posture
Flox makes it possible to provide authoritative answers to "what's running?" and "where did it come from?" because the complete dependency tree is explicit and traceable from any built package's outputs. This supports software inventory, improves vulnerability response, and simplifies audit workflows.
The bottom line
For firms in capital markets, Flox is the key to delivering and operationalizing cutting-edge ML/AI software.
With Flox, ML/AI teams can iterate much more rapidly and ship much more reliably. With Flox, organizations get SBOMs by default, so infosec and compliance teams always know which versions of which software are running in production. With Flox, organizations get a deterministic audit trail that records what changed, when, by whom, and how. With Flox, finally, organizations are able to move fast without breaking anything.
Flox is battle-tested in capital markets, where teams use it to build, train, harden, and run production ML and inference workflows with NVIDIA Triton Inference Server, vLLM, llama.cpp, PyTorch, ONNX Runtime, and other ML frameworks. To learn more about the Flox CUDA Kickstart Program, visit our resource site; you can also clone and run public examples from the flox-cuda GitHub repo.


