Single node performance, ReaxFF HNS benchmark, P100
Performance in millions of atom-timesteps / second

Natoms    Kokkos/Cuda-1 (mpi)    Kokkos/Cuda-2 (mpi)    Kokkos/Cuda-4 (mpi)
3648      0.151 (1)              0.151 (2)              0.1324 (4)
7296      0.2771 (1)             0.248 (2)              0.2494 (4)
14592     0.4207 (1)             0.5167 (2)             0.4631 (4)
29184     0.5183 (1)             0.7664 (2)             0.9207 (4)
58368     0.5972 (1)             0.9805 (2)             1.37 (4)
116736    0.6654 (1)             1.162 (2)              1.903 (4)
233472    0.7046 (1)             1.287 (2)              2.263 (4)
466944    0.7313 (1)             1.381 (2)              2.504 (4)
933888    None                   1.437 (2)              2.704 (4)
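
The multi-GPU columns are easier to compare as speedup over the 1-GPU column. The following is a minimal sketch, not part of the benchmark itself: the values are copied from the table above, and the names `PERF`, `GPUS`, and `speedup_and_efficiency` are hypothetical helpers introduced only for illustration. Parallel efficiency is taken here as speedup divided by GPU count.

```python
# Performance in millions of atom-timesteps / second, copied from the table.
# Tuple order follows the columns: 1 GPU, 2 GPUs, 4 GPUs.
# None marks the 1-GPU run at 933888 atoms, for which no result is listed.
PERF = {
    3648: (0.151, 0.151, 0.1324),
    7296: (0.2771, 0.248, 0.2494),
    14592: (0.4207, 0.5167, 0.4631),
    29184: (0.5183, 0.7664, 0.9207),
    58368: (0.5972, 0.9805, 1.37),
    116736: (0.6654, 1.162, 1.903),
    233472: (0.7046, 1.287, 2.263),
    466944: (0.7313, 1.381, 2.504),
    933888: (None, 1.437, 2.704),
}
GPUS = (1, 2, 4)


def speedup_and_efficiency(natoms):
    """Return [(gpus, speedup, efficiency), ...] relative to the 1-GPU run.

    Returns None when the 1-GPU baseline is missing for that system size.
    """
    base = PERF[natoms][0]
    if base is None:
        return None
    return [(g, p / base, p / base / g) for g, p in zip(GPUS, PERF[natoms])]


if __name__ == "__main__":
    for natoms in PERF:
        result = speedup_and_efficiency(natoms)
        if result is None:
            print(f"{natoms:>7}  no 1-GPU baseline")
            continue
        cells = "  ".join(f"{g} GPU: {s:.2f}x ({e:.0%})" for g, s, e in result)
        print(f"{natoms:>7}  {cells}")
```

Read this way, the table shows that the smallest systems gain nothing from extra GPUs (the 3648-atom run is no faster on 2 GPUs and slower on 4), while at 466944 atoms the 4-GPU run is roughly 3.4 times faster than the 1-GPU run.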

Run commands and logfile links for column Kokkos/Cuda-1

3648 mpirun -np 1 --npernode 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 1 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 2 -v y 2 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=4K.node=1.mpi=1.gpu=1
7296 mpirun -np 1 --npernode 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 1 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 2 -v y 4 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=8K.node=1.mpi=1.gpu=1
14592 mpirun -np 1 --npernode 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 1 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 4 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=16K.node=1.mpi=1.gpu=1
29184 mpirun -np 1 --npernode 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 1 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 4 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=32K.node=1.mpi=1.gpu=1
58368 mpirun -np 1 --npernode 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 1 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 8 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=64K.node=1.mpi=1.gpu=1
116736 mpirun -np 1 --npernode 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 1 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 8 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=128K.node=1.mpi=1.gpu=1
233472 mpirun -np 1 --npernode 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 1 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 8 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=256K.node=1.mpi=1.gpu=1
466944 mpirun -np 1 --npernode 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 1 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 16 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=512K.node=1.mpi=1.gpu=1
933888 None

Run commands and logfile links for column Kokkos/Cuda-2

3648 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 2 -v y 2 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=4K.node=1.mpi=2.gpu=2
7296 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 2 -v y 4 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=8K.node=1.mpi=2.gpu=2
14592 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 4 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=16K.node=1.mpi=2.gpu=2
29184 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 4 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=32K.node=1.mpi=2.gpu=2
58368 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 8 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=64K.node=1.mpi=2.gpu=2
116736 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 8 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=128K.node=1.mpi=2.gpu=2
233472 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 8 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=256K.node=1.mpi=2.gpu=2
466944 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 16 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=512K.node=1.mpi=2.gpu=2
933888 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 16 -v y 16 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=1M.node=1.mpi=2.gpu=2

Run commands and logfile links for column Kokkos/Cuda-4

3648 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 2 -v y 2 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=4K.node=1.mpi=4.gpu=4
7296 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 2 -v y 4 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=8K.node=1.mpi=4.gpu=4
14592 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 4 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=16K.node=1.mpi=4.gpu=4
29184 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 4 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=32K.node=1.mpi=4.gpu=4
58368 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 8 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=64K.node=1.mpi=4.gpu=4
116736 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 8 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=128K.node=1.mpi=4.gpu=4
233472 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 8 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=256K.node=1.mpi=4.gpu=4
466944 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 16 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=512K.node=1.mpi=4.gpu=4
933888 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride100_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 16 -v y 16 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda.kind=node.size=1M.node=1.mpi=4.gpu=4
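
The command lines above differ only in the replication factors (`-v x`, `-v y`, `-v z`), the MPI task and GPU count, the process placement flag, and the size label in the log name. The sketch below rebuilds one of them from those inputs. It is an illustration under stated assumptions: `ATOMS_PER_CELL = 304` is inferred from the listed data (3648 / (2*2*3), and the same ratio holds for every row), and `PLACEMENT`, `natoms`, and `run_command` are hypothetical names, not part of LAMMPS or of the benchmark scripts.

```python
# Inferred from the listings: Natoms / (x * y * z) = 304 for every row.
ATOMS_PER_CELL = 304

# Placement flag used in the listings for each GPU count on this node.
PLACEMENT = {1: "--npernode 1", 2: "--npersocket 1", 4: "--npersocket 2"}


def natoms(x, y, z):
    """Atom count for a given replication of the base cell."""
    return ATOMS_PER_CELL * x * y * z


def run_command(x, y, z, gpus, size_label):
    """Rebuild a command line in the same form as the listings.

    size_label is the string used in the log file name (for example "4K");
    it is passed in rather than derived, since the listings round it by hand.
    """
    return (
        f"mpirun -np {gpus} {PLACEMENT[gpus]} --bind-to core "
        "lmp_ride100_kokkos_cuda -sf kk "
        f"-k on g {gpus} "
        "-pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 "
        f"-v x {x} -v y {y} -v z {z} -v t 100 "
        "-in in.reaxc.hns.kokkos_cuda.steps -nocite "
        "-log log.lammps.date=17Jan18.model=hns.machine=ride100.pkg=kokkos_cuda."
        f"kind=node.size={size_label}.node=1.mpi={gpus}.gpu={gpus}"
    )


if __name__ == "__main__":
    print(natoms(16, 16, 12))
    print(run_command(16, 16, 12, 4, "1M"))
```

Each run uses one MPI task per GPU, so a single `gpus` argument sets `-np`, `-k on g`, and the `mpi=` and `gpu=` fields of the log name together.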