GPU Versions and Other Supplementary Material

This page collects information about GPU ports and other supplementary material in a single place for the Tier-1 benchmarks. The information here is meant to help vendors better understand how various applications scale and how they have been ported to GPUs in the past. All information and code on this site is provided as is and there is no warranty that it will work as advertised.

CORAL 2 Memory Requirement Estimates (.xlsx)

Nekbone

There is no "official" GPU version of Nekbone; however, there is a GPU branch here:

This version and the CORAL2 code are on different branches, so there are differences beyond the GPU code itself, and the GPU implementation is not optimal. GPU code from the GPU branch could likely be ported to the CORAL/OpenMP branch without large-scale effort; the optimization work required is primarily for the local_grad3 and local_grad3_t kernels.

HACC

The "experimental" version of the HACC CUDA implementation can be found at the following link:
https://anl.box.com/s/ilz0yi0bni19iz60ao3endehr4jq01l0

The README file has not yet been updated; this will be done later.

QMCPACK

The base benchmark version of QMCPACK already contains GPU support.

LAMMPS

Summary slides for LAMMPS presented in the deep dives

The output script from the benchmark run on Sequoia

Benchmarking data for LAMMPS ReaxFF

AMG

There is no GPU support in AMG. There is GPU support in hypre, which is the parent code to AMG. The hypre release v. 2.13.0 with GPU code is available at https://github.com/hypre-space/hypre. It requires a few special settings to get full GPU support on the solve cycle (<> stands for the BoomerAMG solver object):

HYPRE_BoomerAMGSetRelaxType (<>, 18);

HYPRE_BoomerAMGSetKeepTranspose(<>,1);

hypre also needs to be configured in a special way. Both configure lines are for P8+Pascal systems at LLNL.

./configure --with-nvcc CFLAGS="-O2 -qmaxmem=-1 -I /usr/local/cuda/include" CXXFLAGS="-O2 -qmaxmem=-1 -I /usr/local/cuda/include"

To also enable OpenMP:

./configure --with-nvcc --enable-persistent --with-openmp --enable-hopscotch CFLAGS="-O2 -qmaxmem=-1 -qsmp=omp -I /usr/local/cuda/include" CXXFLAGS="-O2 -qmaxmem=-1 -qsmp=omp -I /usr/local/cuda/include" LDFLAGS="-qsmp=omp"
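
Because both configure lines repeat the same flags in CFLAGS and CXXFLAGS, it can be convenient to assemble them from shared variables. The following is a minimal sketch, not part of hypre: the names CUDA_INC, BASE_FLAGS, and hypre_configure_cmd are hypothetical, and the flag values are simply those quoted above for the P8+Pascal systems. It only prints the commands (a dry run), so paths can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch: build the hypre configure command from shared flag variables.
# CUDA_INC, BASE_FLAGS and hypre_configure_cmd are hypothetical names
# introduced for illustration; the flags are the ones quoted in the text.

CUDA_INC="/usr/local/cuda/include"   # CUDA header path; adjust per system
BASE_FLAGS="-O2 -qmaxmem=-1"         # compiler flags shared by C and C++

# Print the configure command. Pass "omp" for the OpenMP variant.
hypre_configure_cmd() {
    if [ "$1" = "omp" ]; then
        flags="$BASE_FLAGS -qsmp=omp -I $CUDA_INC"
        printf '%s\n' "./configure --with-nvcc --enable-persistent --with-openmp --enable-hopscotch CFLAGS=\"$flags\" CXXFLAGS=\"$flags\" LDFLAGS=\"-qsmp=omp\""
    else
        flags="$BASE_FLAGS -I $CUDA_INC"
        printf '%s\n' "./configure --with-nvcc CFLAGS=\"$flags\" CXXFLAGS=\"$flags\""
    fi
}

# Dry run: show both commands without executing them.
hypre_configure_cmd
hypre_configure_cmd omp
```

On a system with a different CUDA installation, only CUDA_INC should need to change; the printed command can then be pasted into the shell from the hypre source directory.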

Hypre currently requires unified memory to work correctly.

While there are some differences between AMG and AMG2013, most of the underlying code and algorithms are similar, and the references at https://computing.llnl.gov/projects/co-design/download/amg2013.tgz provide more details for readers interested in the performance and scalability of the code.

Kripke

In addition to the information about Kripke that can be found on the Kripke website, including an overview paper of the code, we provide some supplementary material.

A CUDA port of Kripke can be found on GitHub. The code is a research variant and may be hard to work with and understand. It is provided as is.

Quicksilver

The Quicksilver benchmark code contains a CUDA and an OpenMP 4.5 GPU port. Instructions to build these ports are given in the makefile. Paths may need to be changed on some systems. Unified memory is assumed.

The code is of late beta quality with no currently known bugs, but there is no promise that none exist. Additionally, performance of the code is likely sub-optimal, as little work has been done to tune the code for GPUs.

A paper showing performance on modern hardware, discussing how representative Quicksilver is of its parent code Mercury, and describing the changes needed to port the original version of Quicksilver to GPUs can be found at doi.org. The related slides are here; the second half of them contains performance data.

Pennant

A paper describing Pennant, how it has been parallelized for various architectures, including GPUs, and its performance results can be found here. Here is the source code for a single-node GPU/CUDA implementation.

Big Data Analytics Suite

An open-source port of the algorithms in this suite can be found here.