LAMMPS has several accelerator options, implemented via five accelerator packages. Some of the packages support multiple hardware options and precision options (double, mixed, single). These are the package abbreviations used in the plots and tables below.
For acceleration on a CPU: OPT (OPT package), OMP (USER-OMP package), Intel/CPU (USER-INTEL package), Kokkos/OMP (KOKKOS package with the OpenMP backend)
For acceleration on an Intel KNL: Intel/KNL (USER-INTEL package), Kokkos/KNL (KOKKOS package with the OpenMP backend)
For acceleration on an NVIDIA GPU: GPU (GPU package), Kokkos/Cuda (KOKKOS package with the Cuda backend)
Note that if you browse the alphabetized listing of pair, fix, compute, etc. commands in Section 3.5 (commands) of the manual, many commands are followed by letters in parentheses: "g" = GPU, "i" = Intel, "k" = Kokkos, "o" = OMP, and "t" = OPT. This indicates which accelerator packages support that command.
Benchmarks were run on the following machines and node hardware.
mutrino = Intel Haswell CPUs or Intel KNLs
ride80 = IBM Power8 CPUs with NVIDIA K80 GPUs
ride100 = IBM Power8 CPUs with NVIDIA P100 GPUs
This table shows which accelerator packages were used on which machines:
Machine | Hardware | CPU | OPT | OMP | GPU | Intel/CPU | Intel/KNL | Kokkos/OMP | Kokkos/KNL | Kokkos/Cuda |
mutrino | Haswell/KNL | yes | yes | yes | no | yes | yes | yes | yes | no |
ride80 | K80 | no | no | no | yes | no | no | no | no | yes |
ride100 | P100 | no | no | no | yes | no | no | no | no | yes |
These are the software environments on each machine and the Makefiles used to build LAMMPS with different accelerator packages.
mutrino
ride80
ride100
Some of the Makefiles were used to build LAMMPS with multiple accelerator packages and options included, specifically the "cpu" and "knl" makefiles:
Makefile suffix | Accelerator options |
cpu | CPU, OPT, OMP, Intel/CPU |
kokkos_omp | Kokkos/OMP |
kokkos_serial | Kokkos/serial |
knl | CPU/KNL, OPT/KNL, OMP/KNL, Intel/KNL |
kokkos_knl | Kokkos/KNL |
kokkos_knl_serial | Kokkos/KNL/serial |
gpu | GPU |
kokkos_cuda | Kokkos/Cuda |
If a specific benchmark requires a build with additional package(s) installed, it is noted with the benchmark info below.
With the software environment initialized (e.g. modules loaded) and the machine Makefiles copied into src/MAKE/MINE, building LAMMPS is straightforward:
cp Makefile.serrano_cpu lammps/src/MAKE/MINE   # for example
cd lammps/src
make yes-manybody                              # install any packages the benchmark requires
make yes-opt yes-user-intel ...                # install accelerator package(s) supported by the Makefile
make serrano_cpu                               # target = suffix of Makefile.machine
This should produce an executable named lmp_machine, e.g. lmp_serrano_cpu. If desired, you can copy the executable to a directory where you run the benchmark.
Note that if the GPU package is being included in the build, these steps should be done before the LAMMPS build:
cp Makefile.gpulib.ride100.double lammps/lib/gpu   # for example
cd lammps/lib/gpu
make -f Makefile.gpulib.ride100.double clean
make -f Makefile.gpulib.ride100.double
This should produce the file lammps/lib/gpu/libgpu.a.
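With libgpu.a built, the main LAMMPS build then proceeds as above with the GPU package installed. A minimal sketch; the Makefile name Makefile.ride100_gpu is an assumption, so use whichever "gpu" machine Makefile applies to your system:

cp Makefile.ride100_gpu lammps/src/MAKE/MINE   # assumed name; use your machine's "gpu" Makefile
cd lammps/src
make yes-gpu                                   # install the GPU package so LAMMPS links against libgpu.a
make yes-manybody                              # plus any packages the benchmark requires
make ride100_gpu                               # target = suffix of Makefile.machine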
IMPORTANT NOTE: Achieving the best performance for the benchmarks (or your own input script) on a particular machine with a particular accelerator option requires attention to the following issues.
All of the plots below include a link to a table with details on all of these issues. The table shows the mpirun (or equivalent) command used to produce each data point on each curve in the plot, the LAMMPS command-line arguments used to get best performance with a particular package on that hardware, and a link to the logfile produced by the benchmark run.
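As a hedged illustration of what those table entries look like (the exact commands for each data point are only in the linked tables; the node counts, thread counts, executable names, and input script name below are placeholders), an accelerated run typically combines an mpirun launch with the LAMMPS -sf (suffix) and -pk (package) command-line switches:

# OMP on a CPU node: 16 MPI tasks x 2 OpenMP threads per task (placeholder counts)
mpirun -np 16 ./lmp_mutrino_cpu -in in.benchmark -sf omp -pk omp 2

# Intel/CPU on the same node, mixed precision
mpirun -np 32 ./lmp_mutrino_cpu -in in.benchmark -sf intel -pk intel 0 mode mixed

# GPU package on a GPU node, 1 GPU shared by all MPI tasks
mpirun -np 16 ./lmp_ride100_gpu -in in.benchmark -sf gpu -pk gpu 1

# Kokkos/Cuda, 1 MPI task driving 1 GPU
mpirun -np 1 ./lmp_ride100_kokkos_cuda -k on g 1 -sf kk -in in.benchmark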
All the plots below have atoms or nodes on the x-axis and performance on the y-axis; in every plot, better performance is up and worse performance is down. For all the plots:
Per-core and per-node plots:
Strong-scaling and weak-scaling plots:
Additional packages needed for this benchmark: USER-REAXC
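For example (a sketch only; the Makefile target below is a placeholder), this package is installed before the build step shown above:

cd lammps/src
make yes-user-reaxc        # installs pair_style reax/c used by this benchmark
make yes-user-omp          # plus whichever accelerator package(s) the Makefile supports
make mutrino_cpu           # placeholder target = suffix of your Makefile.machine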
Comments:
ReaxFF single core and single node performance:
Best timings for any accelerator option as a function of problem size, running either on a single CPU or KNL core (per-core plots) or on a single CPU, KNL, or GPU node (per-node plots). Double precision only.
ReaxFF strong and weak scaling:
Fastest timings for any accelerator option running on multiple CPU, KNL, or GPU nodes, as a function of node count. Strong scaling is shown for 3 problem sizes (64K, 512K, and 4M atoms); weak scaling is shown for 2 problem sizes (64K and 1M atoms/node). Only a single GPU per node is used; double precision only.
Strong scaling means the same size problem is run on successively more nodes. Weak scaling means the problem size doubles each time the node count doubles. See a fuller description here of how to interpret these plots.
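As a concrete sketch of the difference (the node counts, tasks/node, and -v x/y/z replication variables are assumptions about the input script, not the exact benchmark commands):

# Strong scaling: the same 64K-atom problem on 1, 2, 4, ... nodes (16 MPI tasks/node assumed)
for nodes in 1 2 4 8; do
  mpirun -np $((16 * nodes)) ./lmp_mutrino_cpu -in in.benchmark -sf omp -pk omp 2
done

# Weak scaling: the box is replicated as the node count doubles, keeping atoms/node fixed
mpirun -np 16 ./lmp_mutrino_cpu -in in.benchmark -v x 1 -v y 1 -v z 1 -sf omp -pk omp 2
mpirun -np 32 ./lmp_mutrino_cpu -in in.benchmark -v x 2 -v y 1 -v z 1 -sf omp -pk omp 2
mpirun -np 64 ./lmp_mutrino_cpu -in in.benchmark -v x 2 -v y 2 -v z 1 -sf omp -pk omp 2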
ReaxFF performance details:
Mode | LAMMPS Version | Hardware | Machine | Size | Plot | Table |
core | 17Jan18 | Haswell | mutrino | 1K-1M | plot | table |
core | 17Jan18 | KNL | mutrino | 1K-1M | plot | table |
node | 17Jan18 | Haswell | mutrino | 1K-4M | plot | table |
node | 17Jan18 | KNL | mutrino | 1K-4M | plot | table |
node | 17Jan18 | K80 | ride80 | 1K-4M | plot | table |
node | 17Jan18 | P100 | ride100 | 1K-4M | plot | table |
strong | 17Jan18 | Haswell | mutrino | 64K | plot | table |
strong | 17Jan18 | KNL | mutrino | 64K | plot | table |
strong | 17Jan18 | K80 | ride80 | 64K | plot | table |
strong | 17Jan18 | P100 | ride100 | 64K | plot | table |
strong | 17Jan18 | Haswell | mutrino | 512K | plot | table |
strong | 17Jan18 | KNL | mutrino | 512K | plot | table |
strong | 17Jan18 | K80 | ride80 | 512K | plot | table |
strong | 17Jan18 | P100 | ride100 | 512K | plot | table |
strong | 17Jan18 | Haswell | mutrino | 4M | plot | table |
strong | 17Jan18 | KNL | mutrino | 4M | plot | table |
strong | 17Jan18 | K80 | ride80 | 4M | plot | table |
strong | 17Jan18 | P100 | ride100 | 4M | plot | table |
weak | 17Jan18 | Haswell | mutrino | 128K/node | plot | table |
weak | 17Jan18 | KNL | mutrino | 128K/node | plot | table |
weak | 17Jan18 | K80 | ride80 | 128K/node | plot | table |
weak | 17Jan18 | P100 | ride100 | 128K/node | plot | table |