Commodity Technology Systems

The Commodity Technology Systems (CTSs) currently sited and in use at LLNL were planned, researched, developed, procured, tested, integrated, and deployed to unify computing efforts across the National Nuclear Security Administration (NNSA) defense complex. These systems leverage industry advances and open-source software to build, field, and integrate high-performance clusters of various sizes into production service. The Advanced Simulation and Computing (ASC) programmatic objective is to substantially reduce the total cost of ownership of these CTSs relative to Linux cluster deployments today. Ultimately, the goal is to make these systems robust and useful as quickly as possible to support ASC's scientific simulation capacity workloads.

Picture of Bengal CTS-2 machine installed at Livermore.

NNSA's latest capacity computing systems, collectively called CTS-2 systems, are the fourth joint procurement under the ASC program. They will replace the current CTS-1 machines, which were sourced under the 2015 CTS-1 contract and are now nearing retirement. LLNL’s first CTS-2 machines (Mutt, RZWhippet, and Poodle) arrived in mid-2022. Deployment of additional machines, including Dane, RZHound, and Bengal, is ongoing, with further CTS-2 system deployments planned through 2025.

The CTS-2 contract is funded primarily by NNSA’s ASC program and will provide at least $70 million for more than 70 petaflops of computing capacity to the NNSA Tri-Labs (LLNL, LANL, and SNL). These systems will be deployed at the Tri-Labs in building blocks called “scalable units” (SUs), with each SU representing approximately 1.4 petaflops of computing power. The CTS-2 contract will also be available to other defense and institutional computing programs.
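As a rough illustration of how these figures relate, the short Python sketch below works through the arithmetic. The constants are the rounded bounds quoted above ("at least," "more than," "approximately"), not exact contract values, and the function names and the four-SU example are hypothetical, chosen only for illustration.

```python
# Back-of-envelope arithmetic using the rounded figures quoted in the text.
# These are stated bounds, not exact contract values.
CONTRACT_USD_MILLIONS = 70.0   # "at least $70 million"
TOTAL_PETAFLOPS = 70.0         # "more than 70 petaflops"
PETAFLOPS_PER_SU = 1.4         # "approximately 1.4 petaflops" per scalable unit


def estimate_su_count(total_pf: float, pf_per_su: float) -> int:
    """Approximate number of scalable units needed to reach total_pf."""
    return round(total_pf / pf_per_su)


def system_peak_pf(su_count: int, pf_per_su: float = PETAFLOPS_PER_SU) -> float:
    """Approximate peak of a multi-SU system, assuming peaks simply add."""
    return su_count * pf_per_su


if __name__ == "__main__":
    # Roughly how many SUs the quoted total capacity corresponds to.
    print("Estimated SUs:", estimate_su_count(TOTAL_PETAFLOPS, PETAFLOPS_PER_SU))
    # Rough cost per petaflop, in millions of dollars.
    print("USD millions per petaflop:", CONTRACT_USD_MILLIONS / TOTAL_PETAFLOPS)
    # Hypothetical example: a system assembled from four SUs.
    print("Four-SU system peak (PF):", system_peak_pf(4))
```

Under these rounded figures, the quoted capacity corresponds to about 50 SUs across the three labs, at roughly $1 million per petaflop. Actual counts and costs depend on the final contract options each lab exercises.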

The CTS-2 SU hardware will couple with the LLNL-led Tri-Laboratory Operating System Software (TOSS) and provide the foundation for the common ASC user simulation environment. The CTS-2 SUs will incorporate next-generation Intel Xeon Scalable processors (code-named “Sapphire Rapids”) and DDR5 memory with Dell PowerEdge C6620 and R760 servers. Each laboratory will configure the SUs into more powerful multiple-SU systems according to specific mission needs. The CPUs will use CoolIT direct-to-chip liquid cooling technology. Dell servers that incorporate graphics processing units (GPUs) and high-bandwidth memory (HBM) will also be available under the CTS-2 contract.

This Tri-Lab procurement model reduces costs through economies of scale based on standardized hardware and software environments at the three labs. Scientists are now using the CTS-1 and CTS-2 computers for programmatic simulations.

Ruby Supercomputer, with tinted purple light
Ruby is being used in the fight against COVID-19.

NNSA's preceding generation of capacity computing systems, called the Commodity Technology Systems-1 (CTS-1), was its third joint procurement under the ASC program. These computing clusters provide the needed computing capacity for NNSA's day-to-day scientific work at the three labs managing the nation's nuclear deterrent.

Under the CTS-1 contract, Penguin Computing—a Silicon Valley–based developer of high-performance Linux cluster computing systems—furnishes the labs with multiple systems, ranging in size from a few hundred to several thousand nodes.

The procurement before CTS-1, the Tri-Lab Linux Capacity Cluster (TLCC2), was a multimillion-dollar, multiyear contract that offered multiple procurement options exceeding 3 petaFLOP/s in commodity technology systems. Under the terms of the contract, computing clusters built of scalable units (SUs) were delivered to Lawrence Livermore, Los Alamos, and Sandia National Laboratories between October 2011 and June 2012. Each SU represented 50 teraFLOP/s of peak computing power and was designed to be interconnected to create more powerful systems. The SUs were divided among the three labs, with each lab configuring the SUs into clusters according to mission needs.

In October 2011, LLNL received the first of 18 SUs, which were combined into a single classified cluster named Zin, with a peak speed of 970 teraFLOP/s. Additional SUs were combined to create the single unclassified cluster named Cab, which has a peak speed of 431 teraFLOP/s. Cab is in the "collaboration zone," where users in the new High Performance Computing Innovation Center (HPCIC) can access the machine. A third cluster, Merl, is a small resource shared by LLNL and the ASC Program for small to moderate parallel jobs. The names of the three supercomputers were inspired by the Livermore area wine country.

This tri-lab procurement model reduced costs through economies of scale based on standardized hardware and software environments at the three labs. Scientists used the TLCC2 computers for programmatic simulations.