Benchmark Codes


Refer to the Draft CORAL Statement of Work for additional information about CORAL application benchmarks.

Consult the summary of the CORAL benchmarking process [PDF] presented at the May 31 CORAL vendor meeting. Updated September 24.

Figures of Merit (FOMs) for baseline calculations, scaling data, and initial weights [XLSX] [PDF] are subject to change until issuance of the final RFP.


The CORAL benchmarks provided here reflect their state as of February 2014. Apart from minor bug fixes and corrections to build issues, these versions will not be updated. Use the links provided to reach the official benchmark home pages for updated and maintained versions.


CORAL Benchmarks
| Scalable Science Benchmarks | Priority Level | Lines of Code | MPI | OpenMP/Pthreads | Fortran | Python | C | C++ | Code Description/Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LSMS | TR-1 | 200,000 | X | X | X |  |  | X | Floating point performance, point-to-point communication scaling. |
| QBOX | TR-1 | 47,000 | X | X |  |  |  | X | Quantum molecular dynamics. Memory bandwidth, high floating-point intensity, collectives (alltoallv, allreduce, bcast). |
| HACC | TR-1 | 35,000 | X | X |  |  |  | X | Compute intensity, random memory access, all-to-all communication. |
| Nekbone | TR-1 | 48,000 | X |  | X |  | X |  | Compute intensity, small messages, allreduce. |
 
| Throughput Benchmarks | Priority Level | Lines of Code | MPI | OpenMP/Pthreads | Fortran | Python | C | C++ | Code Description/Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CAM-SE | TR-1 | 150,000 | X | X | X |  | X |  | Memory bandwidth, strong scaling, MPI latency. |
| UMT2013 | TR-1 | 51,000 | X | X | X | X | X | X | Single physics package code. Unstructured Mesh deterministic radiation Transport. Memory bandwidth, compute intensity, large messages, Python. |
| AMG2013 | TR-1 | 75,000 | X | X |  |  | X |  | Algebraic Multi-Grid linear system solver for unstructured mesh physics packages. |
| MCB | TR-1 | 13,000 | X | X |  |  | X |  | Monte Carlo transport. Non-floating-point intensive, branching, load balancing. |
| QMCPACK | TR-2 | 200,000 | X | X |  |  | X | X | Memory bandwidth, thread efficiency, compilers. |
| NAMD | TR-2 | 180,000 | X | X |  |  |  | X | Classical molecular dynamics. Compute intensity, random memory access, small messages, all-to-all communications. |
| LULESH | TR-2 | 5,000 | X | X |  |  | X |  | Shock hydrodynamics for unstructured meshes. Fine-grained loop level threading. |
| SNAP | TR-2 | 3,000 | X | X | X |  |  |  | Deterministic radiation transport for structured meshes. |
| miniFE | TR-2 | 50,000 | X | X |  |  |  | X | Finite element code. |
 
| Data-Centric Benchmarks | Priority Level | Lines of Code | MPI | OpenMP/Pthreads | Fortran | Python | C | C++ | Code Description/Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Graph500 | TR-1 |  |  | X |  |  | X |  | Scalable breadth-first search of a large undirected graph. |
| Integer Sort | TR-1 | 2,000 | X |  | X |  | X |  | Parallel integer sort. |
| Hash | TR-1 |  | X |  | X |  | X |  | Parallel hash benchmark. |
| SPECint2006 "peak" | TR-2 |  |  |  | X |  | X | X | CPU integer processor benchmark; report peak results or estimates. |
 
| Skeleton Benchmarks | Priority Level | Lines of Code | MPI | OpenMP/Pthreads | Fortran | Python | C | C++ | Code Description/Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CLOMP | TR-1 |  |  | X |  |  | X |  | Measure OpenMP overheads and other performance impacts due to threading. |
| IOR | TR-1 | 4,000 | X |  |  |  | X |  | Interleaved or Random I/O benchmark. Used for testing the performance of parallel filesystems and burst buffers using various interfaces and access patterns. |
| CORAL MPI benchmarks | TR-1 | 1,000 | X |  |  |  | X |  | Subsystem functionality and performance tests. Collection of independent MPI benchmarks to measure various aspects of MPI performance including interconnect messaging rate, latency, aggregate bandwidth, and collective latencies. |
| Memory benchmarks (STREAM, STRIDE) | TR-1 | 1,500 |  |  | X |  | X |  | Memory subsystem functionality and performance tests. Collection of STREAM and STRIDE memory benchmarks to measure the memory subsystem under a variety of memory access patterns. |
| LCALS | TR-1 | 5,000 |  | X |  |  |  | X | Single node. Application loops to test the performance of SIMD vectorization. |
| Pynamic | TR-2 | 12,000 | X |  |  | X |  | X | Subsystem functionality and performance test. Dummy application that closely models the footprint of an important Python-based multi-physics ASC code. |
| HACC IO | TR-2 | 2,000 | X |  |  |  |  | X | Application centric I/O benchmark tests. |
| FTQ | TR-2 | 1,000 |  |  |  |  | X |  | Fixed Time Quantum test. Measures operating system noise. |
| XSBench (mini OpenMC) | TR-2 | 1,000 |  | X |  |  | X |  | Monte Carlo Neutron Transport. Stresses system through memory capacity (including potential NVRAM), random memory access, memory latency, threading, and memory contention. |
| MiniMADNESS | TR-2 | 10,000 | X | X |  |  |  | X | Vector FPU, threading, active-messages. |
 
| Microkernel Benchmarks | Priority Level | Lines of Code | MPI | OpenMP/Pthreads | Fortran | Python | C | C++ | Code Description/Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NEKbonemk | TR-3 | 2,000 |  |  | X |  |  |  | Single node. NEKbone micro-kernel and SIMD compiler challenge. |
| HACCmk | TR-3 | 250 |  | X |  |  |  | X | Single core optimization and SIMD compiler challenge, compute intensity. |
| UMTmk | TR-3 | 550 |  |  | X |  |  |  | Single node UMT microkernel. |
| AMGmk | TR-3 | 1,800 |  | X |  |  | X |  | Three compute intensive kernels from AMG. |
| MILCmk | TR-3 | 5,000 |  | X |  |  | X |  | Compute intensity and memory performance. |
| GFMCmk | TR-3 | 150 |  | X | X |  |  |  | Random memory access, single node. |

 



Scalable Science Benchmarks

LSMS

QBOX

HACC

Nekbone

Throughput Benchmarks

CAM-SE

UMT2013

AMG2013

MCB

QMCPACK

NAMD

LULESH

SNAP

miniFE

Data-Centric Benchmarks

Graph500

Integer Sort

Hash

SPECint2006 "peak"

Skeleton Benchmarks

CLOMP

IOR

CORAL MPI Benchmarks

Memory Benchmarks

LCALS

Pynamic

HACC IO

FTQ

XSBench

MiniMADNESS

Microkernel Benchmarks

NEKbonemk

HACCmk

UMTmk

AMGmk

MILCmk

GFMCmk

Supplemental Information

The following content is provided as supplemental information for vendors. It is included to help offerers better understand our application requirements. It is not part of the formal RFP process or technical requirements, should not take precedence over formal RFP technical requirements, and is not intended to be addressed by offerers as part of a formal response.

Change Log

02/05 - UMT2013 source was updated to address OpenMP thread safety issues. The files Teton/transport/Teton/mods/[ZoneData,Quadrature,Boundary]_mod.F90 now include threadprivate() directives. Details are in the updated README file. Because this fix came late, offerers need not recalculate FOMs if tests have already been completed.
01/31 - The LCALSSuite.cxx file was changed to set array allocation lengths to the maximum array index accessed over all suite kernels. This fixes an issue where the VOL3D_CALC kernel was accessing data beyond the end of some of the arrays it was using. (Ref. CORAL RFP Q&A 18)
01/23 - The SNAP summary file and example inputs in the source distribution were updated to clarify how to run problems.
01/22 - Updated the Nekbone summary file to reflect a relaxation of the allowable deviation in spectral element count from 5% to 15% (see CORAL RFP Q&A for more detail).
01/10 - Updated UMT build to handle preprocessing of .F90 files correctly. Updated UMT README to correct definition of FOM to be consistent with source code and summary file.
01/08 - Updated the KMI Hash benchmark to support > 2 GB per MPI rank with MPI-2.
01/06 - Updated SNAP summary file with command line parameters to scale to various problem sizes.
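
As a rough illustration of the 01/22 Nekbone entry above, the sketch below checks whether a spectral element count falls within an allowed percentage of a target count. The function name, the symmetric tolerance around the target, and the example counts are assumptions made for illustration; the Nekbone summary file and the CORAL RFP Q&A define the actual rule.

```python
def within_deviation(actual_count, target_count, max_percent):
    """Return True if actual_count is within max_percent of target_count.

    Hypothetical helper: assumes the deviation is measured symmetrically,
    relative to the target count. Integer arithmetic avoids rounding error.
    """
    return abs(actual_count - target_count) * 100 <= max_percent * target_count


if __name__ == "__main__":
    target = 1000   # illustrative target element count (assumed value)
    actual = 1100   # illustrative achieved element count (assumed value)
    print(within_deviation(actual, target, 5))    # False: 10% off exceeds 5%
    print(within_deviation(actual, target, 15))   # True: 10% off is within 15%
```

Under these assumptions, a run that misses the target count by 10% would have failed the earlier 5% limit but passes the relaxed 15% limit.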

Pre-RFP Release Change Log

