What is CORAL?
CORAL is a first-of-its-kind U.S. Department of Energy (DOE) collaboration between the National Nuclear Security Administration's (NNSA's) ASC Program and the Office of Science's Advanced Scientific Computing Research (ASCR) program that will culminate in three ultra-high-performance supercomputers at Lawrence Livermore, Oak Ridge, and Argonne national laboratories. The systems, to be delivered in the 2017 timeframe, will run the most demanding scientific and national security simulation and modeling applications and will enable continued U.S. leadership in computing. The Livermore system resulting from CORAL will be named Sierra.
CORAL is the next major phase in the U.S. Department of Energy’s scientific computing roadmap and path to exascale computing. The procurements resulting from CORAL will influence the modernization of future generations of computing throughout the NNSA complex.
Consult the summary of the CORAL benchmarking process [PDF] presented at the May 31 CORAL vendor meeting (updated September 24).
Figures of Merit (FOMs) for baseline calculations, scaling data, and initial weights [XLSX] [PDF] are subject to change until issuance of the final RFP.
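To make the role of the weights concrete, the sketch below combines per-benchmark FOMs into a single weighted aggregate. This is a hypothetical illustration only: the speedup-times-weight aggregation, the equal weights, and all numbers are assumptions for demonstration, not the formula or values from the RFP; the authoritative definitions are in the spreadsheet above.

```c
/* Hypothetical sketch: aggregating per-benchmark Figures of Merit.
 * Assumes each FOM is a rate (work per second), so proposed/baseline
 * is a speedup; the weighted-sum aggregation is illustrative only,
 * not the formula from the CORAL RFP. */
#include <stdio.h>

struct fom_entry {
    const char *name;
    double baseline;  /* FOM measured on the baseline system */
    double proposed;  /* FOM projected for the proposed system */
    double weight;    /* initial weight (subject to change per the RFP) */
};

int main(void) {
    /* All numbers below are made up for illustration. */
    struct fom_entry entries[] = {
        { "LSMS",    1.0e3, 4.2e4, 0.25 },
        { "QBOX",    2.0e2, 7.1e3, 0.25 },
        { "HACC",    5.0e5, 1.9e7, 0.25 },
        { "Nekbone", 3.0e4, 9.8e5, 0.25 },
    };
    double aggregate = 0.0;
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; ++i) {
        double speedup = entries[i].proposed / entries[i].baseline;
        aggregate += entries[i].weight * speedup;
        printf("%-8s speedup %8.1f\n", entries[i].name, speedup);
    }
    printf("weighted aggregate speedup: %.1f\n", aggregate);
    return 0;
}
```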
CORAL Benchmarks
The CORAL benchmarks provided here represent their state as of February 2014. Aside from minor bug fixes and build issues, these versions will not be updated; use the links provided to reach the official benchmark home pages for updated and maintained versions.
Supplemental Information | Change Log
Scalable Science Benchmarks

| Benchmark | Priority Level | Lines of Code | Parallelism | Language(s) | Code Description/Notes |
| --- | --- | --- | --- | --- | --- |
| LSMS | TR-1 | 200,000 | MPI, OpenMP/Pthreads | Fortran, C++ | Floating-point performance, point-to-point communication scaling. |
| QBOX | TR-1 | 47,000 | MPI, OpenMP/Pthreads | C++ | Quantum molecular dynamics. Memory bandwidth, high floating-point intensity, collectives (alltoallv, allreduce, bcast). |
| HACC | TR-1 | 35,000 | MPI, OpenMP/Pthreads | C++ | Compute intensity, random memory access, all-to-all communication. |
| Nekbone | TR-1 | 48,000 | MPI | Fortran, C | Compute intensity, small messages, allreduce. |

Throughput Benchmarks

| Benchmark | Priority Level | Lines of Code | Parallelism | Language(s) | Code Description/Notes |
| --- | --- | --- | --- | --- | --- |
| CAM-SE | TR-1 | 150,000 | MPI, OpenMP/Pthreads | Fortran, C | Memory bandwidth, strong scaling, MPI latency. |
| UMT2013 | TR-1 | 51,000 | MPI, OpenMP/Pthreads | Fortran, Python, C, C++ | Single physics package code. Unstructured-mesh deterministic radiation transport. Memory bandwidth, compute intensity, large messages, Python. |
| AMG2013 | TR-1 | 75,000 | MPI, OpenMP/Pthreads | C | Algebraic multigrid linear system solver for unstructured-mesh physics packages. |
| MCB | TR-1 | 13,000 | MPI, OpenMP/Pthreads | C++ | Monte Carlo transport. Non-floating-point intensive, branching, load balancing. |
| QMCPACK | TR-2 | 200,000 | MPI, OpenMP/Pthreads | C, C++ | Memory bandwidth, thread efficiency, compilers. |
| NAMD | TR-2 | 180,000 | MPI | C, C++ | Classical molecular dynamics. Compute intensity, random memory access, small messages, all-to-all communications. |
| LULESH | TR-2 | 5,000 | MPI, OpenMP/Pthreads | C++ | Shock hydrodynamics for unstructured meshes. Fine-grained loop-level threading. |
| SNAP | TR-2 | 3,000 | MPI, OpenMP/Pthreads | Fortran | Deterministic radiation transport for structured meshes. |
| miniFE | TR-2 | 50,000 | MPI, OpenMP/Pthreads | C++ | Finite element code. |

Data-Centric Benchmarks

| Benchmark | Priority Level | Lines of Code | Parallelism | Language(s) | Code Description/Notes |
| --- | --- | --- | --- | --- | --- |
| Graph500 | TR-1 | — | MPI | C | Scalable breadth-first search of a large undirected graph. |
| Integer Sort | TR-1 | 2,000 | MPI, OpenMP/Pthreads | C | Parallel integer sort. |
| Hash | TR-1 | — | MPI, OpenMP/Pthreads | C | Parallel hash benchmark. |
| SPECint2006 "peak" | TR-2 | — | — | C, C++ | CPU integer processor benchmark; report peak results or estimates. |

Skeleton Benchmarks

| Benchmark | Priority Level | Lines of Code | Parallelism | Language(s) | Code Description/Notes |
| --- | --- | --- | --- | --- | --- |
| CLOMP | TR-1 | — | OpenMP/Pthreads | C | Measures OpenMP overheads and other performance impacts due to threading. |
| IOR | TR-1 | 4,000 | MPI | C | Interleaved or Random I/O benchmark. Used for testing the performance of parallel file systems and burst buffers using various interfaces and access patterns. |
| CORAL MPI benchmarks | TR-1 | 1,000 | MPI | C | Subsystem functionality and performance tests. Collection of independent MPI benchmarks that measure various aspects of MPI performance, including interconnect messaging rate, latency, aggregate bandwidth, and collective latencies. |
| Memory benchmarks (STREAM, STRIDE) | TR-1 | 1,500 | OpenMP/Pthreads | C | Memory subsystem functionality and performance tests. Collection of STREAM and STRIDE memory benchmarks that measure the memory subsystem under a variety of access patterns. A minimal triad sketch appears after this table. |
| LCALS | TR-1 | 5,000 | OpenMP/Pthreads | C++ | Single node. Application loops that test the performance of SIMD vectorization. |
| Pynamic | TR-2 | 12,000 | MPI | Python, C | Subsystem functionality and performance test. Dummy application that closely models the footprint of an important Python-based multi-physics ASC code. |
| HACC IO | TR-2 | 2,000 | MPI | C++ | Application-centric I/O benchmark tests. |
| FTQ | TR-2 | 1,000 | — | C | Fixed Time Quantum test. Measures operating system noise. A sketch appears after this table. |
| XSBench (mini OpenMC) | TR-2 | 1,000 | OpenMP/Pthreads | C | Monte Carlo neutron transport. Stresses the system through memory capacity (including potential NVRAM), random memory access, memory latency, threading, and memory contention. |
| MiniMADNESS | TR-2 | 10,000 | MPI, OpenMP/Pthreads | C++ | Vector FPU, threading, active messages. |

Microkernel Benchmarks

| Benchmark | Priority Level | Lines of Code | Parallelism | Language(s) | Code Description/Notes |
| --- | --- | --- | --- | --- | --- |
| NEKbonemk | TR-3 | 2,000 | — | Fortran | Single node. NEKbone microkernel and SIMD compiler challenge. |
| HACCmk | TR-3 | 250 | OpenMP/Pthreads | C++ | Single-core optimization and SIMD compiler challenge; compute intensity. |
| UMTmk | TR-3 | 550 | — | Fortran | Single-node UMT microkernel. |
| AMGmk | TR-3 | 1,800 | OpenMP/Pthreads | C | Three compute-intensive kernels from AMG. |
| MILCmk | TR-3 | 5,000 | OpenMP/Pthreads | C | Compute intensity and memory performance. |
| GFMCmk | TR-3 | 150 | OpenMP/Pthreads | Fortran | Random memory access; single node. |
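As a concrete illustration of the access pattern the memory benchmarks exercise, the fragment below is a minimal STREAM-style triad kernel. The array size, timing approach, and reporting are simplifications relative to the real STREAM benchmark; only the a[i] = b[i] + scalar * c[i] pattern itself is standard.

```c
/* Minimal STREAM-style triad sketch (compile with -fopenmp).
 * Sizes and timing are simplified relative to the real benchmark;
 * arrays must be large enough to defeat the caches. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 24)  /* ~16M doubles (128 MB) per array */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; ++i) { b[i] = 1.0; c[i] = 2.0; }

    const double scalar = 3.0;
    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; ++i)
        a[i] = b[i] + scalar * c[i];
    double t1 = omp_get_wtime();

    /* Triad touches 3 arrays (2 reads + 1 write) per iteration. */
    double gbytes = 3.0 * N * sizeof(double) / 1.0e9;
    printf("triad bandwidth: %.1f GB/s\n", gbytes / (t1 - t0));
    free(a); free(b); free(c);
    return 0;
}
```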
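Similarly, the fixed-time-quantum idea behind FTQ can be conveyed in a few lines: repeatedly count how much trivial work completes in each fixed-length quantum; on a quiet node the counts are nearly constant, and dips reveal operating system interference. The quantum length, sample count, and work unit below are arbitrary choices for illustration, not the parameters of the actual FTQ benchmark.

```c
/* Fixed-time-quantum sketch: count work units completed per quantum.
 * Variability in the per-quantum counts indicates OS noise. Parameters
 * here are arbitrary, unlike the real FTQ benchmark. */
#include <stdio.h>
#include <time.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    const double quantum = 0.001;  /* 1 ms per sample */
    enum { SAMPLES = 1000 };
    long counts[SAMPLES];

    for (int s = 0; s < SAMPLES; ++s) {
        long n = 0;
        double end = now_sec() + quantum;
        while (now_sec() < end)
            ++n;                   /* trivial unit of work */
        counts[s] = n;
    }

    long min = counts[0], max = counts[0];
    for (int s = 1; s < SAMPLES; ++s) {
        if (counts[s] < min) min = counts[s];
        if (counts[s] > max) max = counts[s];
    }
    /* A wide max/min spread suggests interference during some quanta. */
    printf("work per quantum: min %ld, max %ld\n", min, max);
    return 0;
}
```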
Scalable Science Benchmarks
- HACC
Throughput Benchmarks
- NAMD home page
- NAMD summary
- NAMD tar file
- Inputs: small, 1M, 3M
- miniFE
Data-Centric Benchmarks
- Vendors are encouraged to use their own SPEC CINT2006 license to run the benchmark. If you do not have a copy, please send e-mail to coral-apps@lists.llnl.gov.
Skeleton Benchmarks
- XSBench
Microkernel Benchmarks
Supplemental Information
The following content is provided as supplemental information for vendors. It is included to help offerors better understand our application requirements. It is not part of the formal RFP process or technical requirements, it should not take precedence over the formal RFP technical requirements, and it is not intended to be addressed by offerors as part of a formal response.
- Performance Characteristics of HYDRA, a multi-physics simulation code from LLNL
- Use Cases for Large Memory Appliance/Burst Buffer, an LLNL perspective
Change Log
02/05 - UMT2013 source was updated to address OpenMP thread-safety issues. The files Teton/transport/Teton/mods/[ZoneData,Quadrature,Boundary]_mod.F90 now include threadprivate() directives. Details are in the updated README file. Due to the late date of this fix, offerors need not recalculate FOMs if tests have already been completed.
01/31 - The LCALSSuite.cxx file was changed to set array allocation lengths to the maximum array index accessed over all suite kernels. This fixes an issue where the VOL3D_CALC kernel was accessing data beyond the end of some of the arrays it was using. (Ref. CORAL RFP Q&A 18)
01/23 - The SNAP summary file and example inputs in the source distribution were updated to clarify how to run problems.
01/22 - Updated the Nekbone summary file to reflect a relaxation of the allowable deviation in spectral element count from 5% to 15% (see CORAL RFP Q&A for more detail).
01/10 - Updated UMT build to handle preprocessing of .F90 files correctly. Updated UMT README to correct definition of FOM to be consistent with source code and summary file.
01/08 - Updated the KMI Hash benchmark to support > 2 GB per MPI rank with MPI-2.
01/06 - Updated SNAP summary file with command line parameters to scale to various problem sizes.
Questions or comments about the benchmarks should be directed to coral-apps@lists.llnl.gov.
Last modified on June 19, 2014
LLNL-WEB-637074