El Capitan: NNSA’s first exascale machine

 

Image showing the different scales of the systems in comparison to exascale, many of which were ranked #1 on the Top500 list at the time they were built.

Continuing its tradition as a dominant Top500 high-performance computing (HPC) site, LLNL began installing components in May 2023 for NNSA’s first exascale supercomputer, El Capitan. An exascale supercomputer can perform at least one quintillion (1,000,000,000,000,000,000) double-precision (64-bit) floating-point operations per second (1 exaflop). Deployed in 2024, El Capitan is ranked as the world’s most powerful supercomputer, capable of more than 2 exaflops.
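To make the scale concrete, here is a minimal Python sketch of the arithmetic behind the exaflop definition. The workload size (10^21 operations) and the function name `seconds_at_rate` are hypothetical, chosen only for illustration; real applications rarely sustain a machine's full rate.

```python
# One exaflop = one quintillion (10**18) floating-point operations per second.
EXA = 10**18


def seconds_at_rate(total_operations: int, flops: int) -> float:
    """Idealized run time for a workload at a sustained rate (an assumption)."""
    return total_operations / flops


# Hypothetical workload of 10**21 operations, for illustration only.
workload = 10**21

print(seconds_at_rate(workload, 1 * EXA))  # at exactly 1 exaflop
print(seconds_at_rate(workload, 2 * EXA))  # at 2 exaflops
```

At a sustained 1 exaflop the hypothetical workload takes 1,000 seconds, and doubling the rate halves the time.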

El Capitan’s purpose

Funded by NNSA’s ASC program, El Capitan was a collaboration among the three NNSA labs—Livermore, Los Alamos, and Sandia. El Capitan's capabilities help researchers ensure the safety, security, and reliability of the nation’s nuclear stockpile in the absence of underground testing. The machine is essential for the design and stewardship of a modernized stockpile and other critical national security missions. Research performed on El Capitan also supports unclassified mission areas of interest to national security, including material discovery, high-energy-density physics, nuclear data, material equations of state, and conventional weapon design.

To ensure the system achieves its full computing potential, LLNL is investing in cognitive simulation capabilities such as artificial intelligence (AI) and machine learning (ML) techniques that will benefit both unclassified and classified missions.

El Capitan’s early access systems

Three of El Capitan’s smaller early access systems—Tenaya, Tioga, and RZVernal—currently rank among the Top500 supercomputers in the world.

Tuolumne and RZAdams

Research performed on El Capitan’s largest “sibling” system, Tuolumne, supports other unclassified projects in energy security, climate change, cancer drug discovery, and other areas of public interest. Like Tuolumne, El Capitan’s other smaller, unclassified “sibling” system, RZAdams, supports both weapons and non-weapons missions. Both systems were purchased under the El Capitan contract and arrived in 2024.

El Capitan’s novel system software strategy

El Capitan is the first ASC Advanced Technology System (the class that includes ASC’s largest systems) to use TOSS, the Tri-Lab Operating System Software. TOSS is the same operating system and environment that ASC’s commodity technology machines use. Sharing it simplifies system administration and improves the user experience.

Siting El Capitan

Installing the HPE/AMD system required the efforts of hundreds of people as well as public and private partnerships. Many years of careful planning and preparation have paved the way for its successful arrival, including a massive construction upgrade of power and water to LLNL’s HPC facility.

While it is one of the world’s most energy-efficient supercomputers, El Capitan requires about 30 megawatts (MW) of power to run at peak, enough to run a mid-size city.

See our Exascale Computing Facility Modernization (ECFM) fact sheet, or watch this video for more information.

For more details about Livermore’s journey to exascale, watch the Building of El Capitan YouTube video and read our multi-part article series “The Road to El Capitan.”

El Capitan’s details and distinguishing features

  • Funded by NNSA’s ASC program
  • Siting complete in 2024
  • Peak performance 2.79 exaflops
  • Peak power ~35 MW
  • AMD MI300 accelerated processing unit (APU) with a 3D chiplet design, which tightly couples a central processing unit (CPU) and a graphics processing unit (GPU) in one processing unit
  • Slingshot interconnect
  • Innovative Rabbit near-node local storage
  • Used by Tri-Labs (Livermore, Los Alamos, and Sandia)
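
As a rough illustration, the peak performance and peak power figures listed above can be combined into a performance-per-watt ratio. This is a minimal sketch: the function name `gigaflops_per_watt` is hypothetical, and the result is a simple ratio of the two listed peak values, not a measured efficiency figure (published efficiency rankings are typically based on measured benchmark runs).

```python
# Figures taken from the list above.
PEAK_FLOPS = 2.79e18       # 2.79 exaflops peak performance
PEAK_POWER_WATTS = 35e6    # ~35 MW peak power


def gigaflops_per_watt(flops: float, watts: float) -> float:
    """Ratio of performance to power, expressed in gigaflops per watt."""
    return flops / watts / 1e9


# Peak-to-peak ratio only; an illustrative estimate, not a benchmark result.
ratio = gigaflops_per_watt(PEAK_FLOPS, PEAK_POWER_WATTS)
print(f"{ratio:.1f} gigaflops per watt")
```

With the listed values, the ratio comes to roughly 80 gigaflops per watt.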