on logically rectangular grids. The code solves both 2D and 3D problems with discretization stencils of up to 9-point in 2D and up to 27-point in 3D. See the following paper for details on the algorithm and its parallel implementation/performance:
P. N. Brown, R. D. Falgout, and J. E. Jones, "Semicoarsening multigrid on distributed memory machines." SIAM Journal on Scientific Computing, 21 (2000), pp. 1823-1834. Also available as Lawrence Livermore National Laboratory technical report UCRL-JC-130720.
The driver provided with SMG2000 builds linear systems for the special case of the above equation,
with Dirichlet boundary conditions of u = 0, where h is the mesh spacing in each direction. Standard finite differences are used to discretize the equations, yielding 5-point and 7-point stencils in 2D and 3D, respectively.
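As a rough illustration of where the 5-point and 7-point stencils come from, the sketch below builds the standard second-order central-difference stencil for an anisotropic diffusion operator. It assumes the operator has the form -cx u_xx - cy u_yy - cz u_zz, with the coefficients being the ones set by the driver's -c option; the function name `diffusion_stencil` is hypothetical and is not part of SMG2000.

```python
def diffusion_stencil(coeffs, h=1.0):
    """Return {offset: weight} for the central-difference discretization of
    -sum_d c_d * d^2u/dx_d^2 on a uniform mesh with spacing h.

    Assumption: one diffusion coefficient per axis, as set by the -c option.
    """
    dim = len(coeffs)
    center = (0,) * dim
    stencil = {center: 0.0}
    for axis, c in enumerate(coeffs):
        # Each axis contributes two off-center neighbors with weight -c/h^2 ...
        for step in (-1, 1):
            offset = tuple(step if a == axis else 0 for a in range(dim))
            stencil[offset] = -c / h**2
        # ... and adds 2c/h^2 to the center weight.
        stencil[center] += 2.0 * c / h**2
    return stencil


stencil_2d = diffusion_stencil((1.0, 1.0))
stencil_3d = diffusion_stencil((2.0, 3.0, 40.0))
print(len(stencil_2d), len(stencil_3d))  # prints: 5 7
```

Each axis adds two neighbors to the single center point, which gives 1 + 2*2 = 5 entries in 2D and 1 + 2*3 = 7 entries in 3D.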
To determine when the solver has converged, the driver currently uses the relative-residual stopping criterion,
This solver can serve as a key component for achieving scalability in radiation diffusion simulations.
The following files are included in the smg2000 directory:
COPYRIGHT_and_DISCLAIMER
The following subdirectories are included in the smg2000 directory:
docs
The following file is included in the 'docs' directory:
smg2000.readme
The following files are included in the 'krylov' directory:
HYPRE_pcg.c
The following files are included in the 'utilities' directory:
HYPRE_utilities.h
The following files are included in the 'struct_mv' directory:
HYPRE_struct_grid.c
The following files are included in the 'struct_ls' directory:
HYPRE_struct_ls.h
The following files are included in the 'test' directory:
krylov/Makefile
To build the code, first modify the 'Makefile.include' file appropriately, then type (in the smg2000 directory)
make
This will produce an executable file 'smg2000' in the 'test' directory. Other available targets are
make clean (deletes .o files)
make veryclean (deletes .o files, libraries, and executables)
To configure the code to run with:
|Blue-Pacific|up to 1000 procs|
|Red|up to 3150 procs|
|Compaq cluster|up to 64 procs|
|Sun Sparc Ultra 10s|up to 4 machines|
Consider increasing both problem size and number of processors in tandem. On scalable architectures, time-to-solution for SMG2000 will initially increase, then level off at a modest number of processors, remaining roughly constant for larger numbers of processors. Iteration counts will also increase slightly for small to moderately sized problems, then level off at a roughly constant number for larger problem sizes.
For example, we get the following results for a 3D problem with cx = 0.1, cy = 1.0, and cz = 10.0, for a problem distributed on a logical P x Q x R processor topology, with fixed local problem size per processor given as 35x35x35:
These results were obtained on ASCI Red.
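A scaled study of this kind can be set up by generating one command line per topology while holding the local problem size fixed. The sketch below is only an illustration: the helper `weak_scaling_commands` and the list of topologies are hypothetical, and it assumes one MPI process per entry of the P x Q x R topology. The -n, -P, and -c options are the ones documented in the driver's usage message.

```python
def weak_scaling_commands(topologies, local_size=(35, 35, 35),
                          coeffs=(0.1, 1.0, 10.0)):
    """Build one smg2000 command line per (P, Q, R) processor topology.

    The local problem size stays fixed, so the global problem grows with the
    number of processes (a scaled, or weak-scaling, study).
    """
    n = " ".join(str(v) for v in local_size)
    c = " ".join(str(v) for v in coeffs)
    commands = []
    for P, Q, R in topologies:
        nprocs = P * Q * R  # assumption: one MPI process per topology entry
        commands.append(
            f"mpirun -np {nprocs} smg2000 -n {n} -P {P} {Q} {R} -c {c}"
        )
    return commands


# Hypothetical sequence of topologies; choose ones that fit your machine.
cmds = weak_scaling_commands([(1, 1, 1), (2, 2, 2), (4, 4, 4)])
for cmd in cmds:
    print(cmd)
```

Comparing wall clock times and iteration counts across such runs shows where time-to-solution levels off.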
Run
mpirun -np 1 smg2000 -help
to get usage information. This prints out the following:
Usage: .../smg2000/test/smg2000 [<options>]
  -n <nx> <ny> <nz>    : problem size per block
  -P <Px> <Py> <Pz>    : processor topology
  -b <bx> <by> <bz>    : blocking per processor
  -c <cx> <cy> <cz>    : diffusion coefficients
  -v <n_pre> <n_post>  : number of pre and post relaxations
  -d <dim>             : problem dimension (2 or 3)
  -solver <ID>         : solver ID(default = 0)
                         0 - SMG
                         1 - CG with SMG precond
                         2 - CG with diagonal scaling
                         3 - CG
All of the arguments are optional. The most important options for SMG2000 are the -n and -P options. The -n option specifies the local problem size per processor, and the -P option specifies the processor topology to run on. The global problem size will be <Px>*<nx> by <Py>*<ny> by <Pz>*<nz>.
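The global problem size formula can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical helper names; it assumes the default blocking of 1 in each direction (the -b option is ignored here).

```python
def global_problem_size(n, P):
    """Global grid dimensions: <Px>*<nx> by <Py>*<ny> by <Pz>*<nz>."""
    return tuple(p * m for p, m in zip(P, n))


def total_unknowns(n, P):
    """Total number of grid points in the global problem."""
    total = 1
    for extent in global_problem_size(n, P):
        total *= extent
    return total


local = (35, 35, 35)   # values passed to -n
topology = (2, 2, 2)   # values passed to -P
print(global_problem_size(local, topology))  # prints: (70, 70, 70)
print(total_unknowns(local, topology))       # prints: 343000
```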
When running with OpenMP, the number of threads used per MPI process is controlled via the OMP_NUM_THREADS environment variable.
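When launching from a script, the thread count can be set in the environment passed to the launcher. The sketch below only builds the command and environment; `hybrid_launch` is a hypothetical helper, and whether `mpirun` forwards environment variables to all ranks depends on the MPI implementation, so treat that as an assumption to verify on your system.

```python
import os


def hybrid_launch(nprocs, threads, args):
    """Return (command, environment) for an MPI + OpenMP run of smg2000.

    Assumption: the executable was built with OpenMP enabled and the MPI
    launcher forwards OMP_NUM_THREADS to every rank.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    cmd = ["mpirun", "-np", str(nprocs), "smg2000", *args]
    return cmd, env


cmd, env = hybrid_launch(8, 4, ["-n", "35", "35", "35", "-P", "2", "2", "2"])
print(" ".join(cmd), "with OMP_NUM_THREADS =", env["OMP_NUM_THREADS"])
# To actually run it: subprocess.run(cmd, env=env, check=True)
```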
Consider the following run:
mpirun -np 1 smg2000 -n 12 12 12 -c 2.0 3.0 40
This is what SMG2000 prints out:
Running with these driver parameters:
  (nx, ny, nz)    = (12, 12, 12)
  (Px, Py, Pz)    = (1, 1, 1)
  (bx, by, bz)    = (1, 1, 1)
  (cx, cy, cz)    = (2.000000, 3.000000, 40.000000)
  (n_pre, n_post) = (1, 1)
  dim             = 3
  solver ID       = 0
=============================================
Struct Interface:
=============================================
Struct Interface:
  wall clock time = 0.005627 seconds
  cpu clock time  = 0.010000 seconds
=============================================
Setup phase times:
=============================================
SMG Setup:
  wall clock time = 0.330096 seconds
  cpu clock time  = 0.330000 seconds
=============================================
Solve phase times:
=============================================
SMG Solve:
  wall clock time = 0.686244 seconds
  cpu clock time  = 0.480000 seconds
Iterations = 4
Final Relative Residual Norm = 8.972097e-07
The relative residual norm may differ slightly from machine to machine or compiler to compiler, but should only differ very slightly (say, the 6th or 7th decimal place). Also, the code should generate nearly identical results for a given problem, independent of the data distribution. The only part of the code that does not guarantee bitwise identical results is the inner product used to compute norms. In practice, the above residual norm has remained the same.
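One way to automate this comparison is to parse the last two lines of the output and check the residual against the reference value with a small tolerance. This is a sketch only: the function names are hypothetical, and the relative tolerance of 1e-5 is an assumed reading of "the 6th or 7th decimal place", not a value taken from SMG2000.

```python
import re


def parse_smg_output(text):
    """Extract (iterations, final relative residual norm) from driver output."""
    iters = re.search(r"Iterations\s*=\s*(\d+)", text)
    resid = re.search(r"Final Relative Residual Norm\s*=\s*([0-9.eE+-]+)", text)
    if iters is None or resid is None:
        raise ValueError("expected result lines not found in output")
    return int(iters.group(1)), float(resid.group(1))


def matches_reference(residual, reference=8.972097e-07, rel_tol=1e-5):
    """True if the residual agrees with the reference to within rel_tol.

    rel_tol is an assumed tolerance for machine-to-machine variation.
    """
    return abs(residual - reference) <= rel_tol * abs(reference)


sample = "Iterations = 4\nFinal Relative Residual Norm = 8.972097e-07\n"
iterations, residual = parse_smg_output(sample)
print(iterations, residual, matches_reference(residual))
```

A check like this tolerates the small differences that the inner product can introduce while still flagging a run whose convergence behavior has changed.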
Last modified on September
For information about this page contact:
Brian Carnes, email@example.com