Single node performance, ReaxFF HNS benchmark, K80
Performance in millions of atom-timesteps / second

Natoms    Kokkos/Cuda-1 (mpi)    Kokkos/Cuda-2 (mpi)
3648      0.08815 (2)            0.08409 (4)
7296      0.1433 (2)             0.155 (4)
14592     0.2303 (2)             0.2672 (4)
29184     0.2872 (2)             0.4319 (4)
58368     0.314 (2)              0.5405 (4)
116736    0.3413 (2)             0.6136 (4)
233472    0.3506 (2)             0.679 (4)
466944    0.3564 (2)             0.6914 (4)
933888    0.3701 (2)             0.7094 (4)
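
The table gives throughput in millions of atom-timesteps per second, so two derived quantities are easy to read off: the ratio between the two columns and the wall-clock timestep rate for a given system size. The following is a minimal sketch of that arithmetic; the values are copied from the table, while the names PERF, speedup and timesteps_per_second are illustrative and not part of the benchmark scripts.

```python
# Performance in millions of atom-timesteps / second, copied from the table.
# Each entry: natoms -> (Kokkos/Cuda-1 value, Kokkos/Cuda-2 value)
PERF = {
    3648: (0.08815, 0.08409),
    7296: (0.1433, 0.155),
    14592: (0.2303, 0.2672),
    29184: (0.2872, 0.4319),
    58368: (0.314, 0.5405),
    116736: (0.3413, 0.6136),
    233472: (0.3506, 0.679),
    466944: (0.3564, 0.6914),
    933888: (0.3701, 0.7094),
}


def speedup(natoms):
    """Ratio of the Kokkos/Cuda-2 column to the Kokkos/Cuda-1 column."""
    one, two = PERF[natoms]
    return two / one


def timesteps_per_second(natoms, column):
    """Convert millions of atom-timesteps/second into timesteps/second.

    column is 0 for Kokkos/Cuda-1 and 1 for Kokkos/Cuda-2.
    """
    return PERF[natoms][column] * 1.0e6 / natoms


if __name__ == "__main__":
    for n in PERF:
        print(f"{n:>7d}  Cuda-2/Cuda-1 = {speedup(n):.2f}  "
              f"steps/s (Cuda-2) = {timesteps_per_second(n, 1):.3f}")
```

At the smallest size the ratio falls below 1 (the extra GPUs do not pay off), while at the largest size it approaches 2.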

Run commands and logfile links for column Kokkos/Cuda-1 (2 MPI tasks, 2 GPUs)

3648 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 2 -v y 2 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=4K.node=1.mpi=2.gpu=2
7296 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 2 -v y 4 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=8K.node=1.mpi=2.gpu=2
14592 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 4 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=16K.node=1.mpi=2.gpu=2
29184 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 4 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=32K.node=1.mpi=2.gpu=2
58368 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 8 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=64K.node=1.mpi=2.gpu=2
116736 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 8 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=128K.node=1.mpi=2.gpu=2
233472 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 8 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=256K.node=1.mpi=2.gpu=2
466944 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 16 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=512K.node=1.mpi=2.gpu=2
933888 mpirun -np 2 --npersocket 1 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 16 -v y 16 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=1M.node=1.mpi=2.gpu=2

Run commands and logfile links for column Kokkos/Cuda-2 (4 MPI tasks, 4 GPUs)

3648 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 2 -v y 2 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=4K.node=1.mpi=4.gpu=4
7296 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 2 -v y 4 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=8K.node=1.mpi=4.gpu=4
14592 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 4 -v z 3 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=16K.node=1.mpi=4.gpu=4
29184 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 4 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=32K.node=1.mpi=4.gpu=4
58368 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 4 -v y 8 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=64K.node=1.mpi=4.gpu=4
116736 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 8 -v z 6 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=128K.node=1.mpi=4.gpu=4
233472 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 8 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=256K.node=1.mpi=4.gpu=4
466944 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 8 -v y 16 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=512K.node=1.mpi=4.gpu=4
933888 mpirun -np 4 --npersocket 2 --bind-to core lmp_ride80_kokkos_cuda -sf kk -k on g 4 -pk kokkos neigh half neigh/qeq full newton on comm device binsize 11.0 -v x 16 -v y 16 -v z 12 -v t 100 -in in.reaxc.hns.kokkos_cuda.steps -nocite -log log.lammps.date=17Jan18.model=hns.machine=ride80.pkg=kokkos_cuda.kind=node.size=1M.node=1.mpi=4.gpu=4
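
The run commands above differ only in the replication factors (-v x, -v y, -v z), the MPI task and GPU counts, and the size label in the log file name. The sketch below regenerates them from those inputs. It rests on assumptions inferred from the listings rather than stated in them: 304 atoms per replicated cell (3648 / (2*2*3)), a two-socket node (so --npersocket is half the task count), and one GPU per MPI task. The function names are hypothetical.

```python
# Inferred from the listings: 3648 atoms at x=2, y=2, z=3 gives 304 per cell.
ATOMS_PER_CELL = 304


def natoms(x, y, z):
    """Atom count for a given replication of the base cell."""
    return ATOMS_PER_CELL * x * y * z


def size_label(x, y, z):
    """Nominal size label used in the log file names (4K ... 512K, 1M)."""
    nominal_k = 4 * (x * y * z) // 12  # the 2x2x3 case is labelled 4K
    return "1M" if nominal_k == 1024 else f"{nominal_k}K"


def build_command(x, y, z, nranks):
    """Assemble one mpirun line in the same form as the listings."""
    npersocket = nranks // 2  # assumption: two sockets per node
    log = ("log.lammps.date=17Jan18.model=hns.machine=ride80."
           f"pkg=kokkos_cuda.kind=node.size={size_label(x, y, z)}."
           f"node=1.mpi={nranks}.gpu={nranks}")
    return (f"mpirun -np {nranks} --npersocket {npersocket} --bind-to core "
            f"lmp_ride80_kokkos_cuda -sf kk -k on g {nranks} "
            "-pk kokkos neigh half neigh/qeq full newton on comm device "
            "binsize 11.0 "
            f"-v x {x} -v y {y} -v z {z} -v t 100 "
            "-in in.reaxc.hns.kokkos_cuda.steps -nocite "
            f"-log {log}")


if __name__ == "__main__":
    # Largest case from the Kokkos/Cuda-2 listing.
    print(natoms(16, 16, 12), build_command(16, 16, 12, 4))
```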