SMG2000 Semicoarsening Multigrid Solver

Code Description

A. General Description

SMG2000 is a parallel semicoarsening multigrid solver for the linear systems arising from finite difference, finite volume, or finite element discretizations of the diffusion equation

   -div( D grad u ) + sigma * u = f

on logically rectangular grids. The code solves both 2D and 3D problems with discretization stencils of up to 9-point in 2D and up to 27-point in 3D. See the following paper for details on the algorithm and its parallel implementation/performance:

P. N. Brown, R. D. Falgout, and J. E. Jones, "Semicoarsening multigrid on distributed memory machines." SIAM Journal on Scientific Computing, 21 (2000), pp. 1823-1834. Also available as Lawrence Livermore National Laboratory technical report UCRL-JC-130720.

The driver provided with SMG2000 builds linear systems for the special case of the above equation with constant diffusion coefficients,

   -cx * u_xx - cy * u_yy - cz * u_zz = f,

with Dirichlet boundary conditions of u = 0. Standard finite differences on a mesh with spacing h in each direction are used to discretize the equations, yielding 5-point and 7-point stencils in 2D and 3D, respectively.
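
For reference, discretizing the 3D special case above with standard central differences yields the familiar 7-point stencil, sketched below in its unscaled textbook form (the exact scaling applied by the driver is not reproduced here):

   center (i,j,k):            ( 2*cx + 2*cy + 2*cz ) / h^2
   west/east   (i-1, i+1):     -cx / h^2
   south/north (j-1, j+1):     -cy / h^2
   down/up     (k-1, k+1):     -cz / h^2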

To determine when the solver has converged, the driver currently uses the relative-residual stopping criterion

   || b - A x ||_2 / || b ||_2  <  10^-6.

This solver can serve as a key component for achieving scalability in radiation diffusion simulations.

B. Coding

SMG2000 is written in ISO-C. It is an SPMD code that uses MPI. Parallelism is achieved by data decomposition. The driver provided with SMG2000 achieves this decomposition by simply subdividing the grid into logical P x Q x R (in 3D) chunks of equal size.
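
To illustrate the idea, the sketch below (illustrative only, not code from the distribution; the grid and topology sizes are hypothetical) shows how each MPI process can derive its logical position in the P x Q x R topology and the index range of its chunk from its rank:

   #include <mpi.h>
   #include <stdio.h>

   /* Illustrative only: split a global nx x ny x nz grid into P x Q x R
      equal chunks and print the index range owned by this process.     */
   int main(int argc, char *argv[])
   {
      int rank, P = 2, Q = 2, R = 2;        /* processor topology */
      int nx = 70, ny = 70, nz = 70;        /* global grid size   */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      int p =  rank %  P;                   /* logical coordinates of this chunk */
      int q = (rank /  P) % Q;
      int r =  rank / (P * Q);

      /* Equal-size chunks (sizes assumed evenly divisible for simplicity). */
      int ilo = p * (nx / P), ihi = ilo + nx / P - 1;
      int jlo = q * (ny / Q), jhi = jlo + ny / Q - 1;
      int klo = r * (nz / R), khi = klo + nz / R - 1;

      printf("rank %d owns (%d:%d, %d:%d, %d:%d)\n",
             rank, ilo, ihi, jlo, jhi, klo, khi);

      MPI_Finalize();
      return 0;
   }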

C. Parallelization

SMG2000 is a highly synchronous code. The communication and computation patterns exhibit the surface-to-volume relationship common to many parallel scientific codes: for an n x n x n chunk, communication scales with the chunk's surface (order n^2) while computation scales with its volume (order n^3), so larger chunks amortize communication costs better. Hence, parallel efficiency is largely determined by the size of the data "chunks" mentioned above and by the speed of communication and computation on the machine. SMG2000 is also memory-access bound, doing only about 1-2 computations per memory access, so memory-access speed will also have a large impact on performance.

Files in this Distribution

Note: The SMG2000 code is derived directly from the hypre library, a large linear solver library that is being developed in the Center for Applied Scientific Computing (CASC) at LLNL.

The following files are included in the smg2000 directory:

COPYRIGHT_and_DISCLAIMER
HYPRE_config.h
Makefile
Makefile.include
The following subdirectories are included in the smg2000 directory:
docs
krylov
struct_ls
struct_mv
test
utilities
The following file is included in the 'docs' directory:
smg2000.readme
The following files are included in the 'krylov' directory:
HYPRE_pcg.c
Makefile
krylov.h
pcg.c
The following files are included in the 'utilities' directory:
HYPRE_utilities.h
Makefile
general.h
hypre_smp_forloop.h
memory.c
memory.h
mpistubs.c
mpistubs.h
random.c
threading.c
threading.h
timer.c
timing.c
timing.h
utilities.h
version
The following files are included in the 'struct_mv' directory:
HYPRE_struct_grid.c
HYPRE_struct_matrix.c
HYPRE_struct_mv.h
HYPRE_struct_stencil.c
HYPRE_struct_vector.c
Makefile
box.c
box_algebra.c
box_alloc.c
box_neighbors.c
communication.c
communication_info.c
computation.c
grow.c
headers.h
hypre_box_smp_forloop.h
project.c
struct_axpy.c
struct_copy.c
struct_grid.c
struct_innerprod.c
struct_io.c
struct_matrix.c
struct_matrix_mask.c
struct_matvec.c
struct_mv.h
struct_scale.c
struct_stencil.c
struct_vector.c
The following files are included in the 'struct_ls' directory:
HYPRE_struct_ls.h
HYPRE_struct_pcg.c
HYPRE_struct_smg.c
Makefile
coarsen.c
cyclic_reduction.c
general.c
headers.h
pcg_struct.c
point_relax.c
semi_interp.c
semi_restrict.c
smg.c
smg.h
smg2_setup_rap.c
smg3_setup_rap.c
smg_axpy.c
smg_relax.c
smg_residual.c
smg_setup.c
smg_setup_interp.c
smg_setup_rap.c
smg_setup_restrict.c
smg_solve.c
struct_ls.h
The following files are included in the 'test' directory:
Makefile
smg2000.c

Building the Code

SMG2000 uses a simple Makefile system for building the code. All compiler and link options are set by modifying the file 'smg2000/Makefile.include' appropriately. This file is then included in each of the following makefiles:
krylov/Makefile
struct_ls/Makefile
struct_mv/Makefile
test/Makefile
utilities/Makefile
To build the code, first modify the 'Makefile.include' file appropriately, then type (in the smg2000 directory)
make
This will produce an executable file 'smg2000' in the 'test' directory. Other available targets are
make clean    (deletes .o files)
make veryclean    (deletes .o files, libraries, and executables)
To configure the code to run with MPI only, or with both MPI and OpenMP, adjust the compiler and compile options in 'Makefile.include' accordingly.
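
For example, a complete MPI build and a quick check of the result might look like this (assuming 'Makefile.include' has already been edited for the target machine):

   cd smg2000
   make
   ls test/smg2000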

Optimization and Improvement Challenges

This code is memory-access bound. We believe it would be very difficult to obtain "good" cache reuse with an optimized version of the code.

Parallelism and Scalability Expectations

SMG2000 has been run on the following platforms:
 
Blue-Pacific, up to 1000 processors
ASCI Red, up to 3150 processors
Compaq cluster, up to 64 processors
Sun Sparc Ultra 10s, up to 4 machines

Consider increasing both problem size and number of processors in tandem. On scalable architectures, time-to-solution for SMG2000 will initially increase, then level off at a modest number of processors, remaining roughly constant for larger numbers of processors. Iteration counts will also increase slightly for small to modest problem sizes, then level off at a roughly constant number for larger problem sizes.

For example, we get the following results for a 3D problem with cx = 0.1, cy = 1.0, and cz = 10.0, for a problem distributed on a logical P x Q x R processor topology, with fixed local problem size per processor given as 35x35x35:
 
"P x Q x R"
P
"iters"
"setup time"
"solve time"
  1x1x1
  2x2x2
  3x3x3
  6x6x6
  8x8x8
10x10x10
14x15x15
1
8
27
216
512
1000
3150
6
6
6
7
7
7
8
  1.681680
  3.738600
  6.601194
12.310776
18.968893
18.890876
30.635085
11.114720
16.419849
19.874300
27.370993
34.317915
31.921784
42.171934

These results were obtained on ASCI Red.


Running the Code

The driver for SMG2000 is called 'smg2000', and is located in the smg2000/test subdirectory. Type
mpirun -np 1 smg2000 -help
to get usage information. This prints out the following:
Usage: .../smg2000/test/smg2000    [<options>]
    
     -n <nx> <ny> <nz>    : problem size per block
     -P <Px> <Py> <Pz>    : processor topology
     -b <bx> <by> <bz>    : blocking per processor
     -c <cx> <cy> <cz>    : diffusion coefficients
     -v <n_pre> <n_post>  : number of pre and post relaxations
     -d <dim>                     : problem dimension (2 or 3)
     -solver <ID>                 : solver ID(default = 0)
                                         0 - SMG
                                         1 - CG with SMG precond
                                         2 - CG with diagonal scaling
                                         3 - CG
All of the arguments are optional. The most important options for SMG2000 are the -n and -P options: the -n option specifies the local problem size per processor, and the -P option specifies the processor topology to run on. The global problem size will be <Px>*<nx> by <Py>*<ny> by <Pz>*<nz>.
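
For example, the following hypothetical run uses 8 MPI processes arranged in a 2x2x2 topology, each owning a 35x35x35 chunk, giving a global problem of size 70x70x70 (the number of MPI processes launched should match <Px>*<Py>*<Pz>):

   mpirun -np 8 smg2000 -n 35 35 35 -P 2 2 2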

When running with OpenMP, the number of threads used per MPI process is controlled via the OMP_NUM_THREADS environment variable.
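
For example, in a Bourne-style shell (the exact launch syntax depends on the local MPI installation):

   OMP_NUM_THREADS=4
   export OMP_NUM_THREADS
   mpirun -np 2 smg2000 -n 35 35 35 -P 2 1 1

This runs 2 MPI processes with 4 OpenMP threads each.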


Timing Issues

If using MPI, the whole code is timed using the MPI timers. If not using MPI, standard system timers are used.  Timing results are printed to standard out, and are divided into "Setup Phase" times and "Solve Phase" times.
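
For reference, the usual MPI wall-clock timing pattern looks like the sketch below (illustrative only; the distribution's own timing routines are in 'utilities/timer.c' and 'utilities/timing.c'):

   #include <mpi.h>
   #include <stdio.h>

   /* Illustrative only: time a phase with the MPI wall-clock timer. */
   int main(int argc, char *argv[])
   {
      MPI_Init(&argc, &argv);

      double t0 = MPI_Wtime();
      /* ... a setup or solve phase would run here ... */
      double t1 = MPI_Wtime();

      printf("  wall clock time = %f seconds\n", t1 - t0);

      MPI_Finalize();
      return 0;
   }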

Memory Needed

SMG2000 is a memory-intensive code, and its memory needs are somewhat complicated to describe. For the 3D problems discussed in this document, memory requirements are roughly 54 times the local grid size times the size of a double plus some overhead for storing ghost points, etc. in the code. The overhead required by this version of the SMG code grows essentially like the logarithm of the problem size.
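
For example, for the 35x35x35 local grids used in the table above, this estimate gives about 54 * 35^3 * 8 bytes, i.e., roughly 18.5 MB per process, before the additional overhead.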

About the Data

SMG2000 does not read in any data. All control is through command-line arguments.

Expected Results

This version of SMG2000 is self-checking. The results of the matrix solve are not saved. Results of the code are checked by examining the last line of the output, which reports the relative residual norm. See further discussion below.

Consider the following run:

mpirun -np 1 smg2000 -n 12 12 12 -c 2.0 3.0 40
This is what SMG2000 prints out:
   Running with these driver parameters:
     (nx, ny, nz)    = (12, 12, 12)
     (Px, Py, Pz)    = (1, 1, 1)
     (bx, by, bz)    = (1, 1, 1)
     (cx, cy, cz)    = (2.000000, 3.000000, 40.000000)
     (n_pre, n_post) = (1, 1)
     dim             = 3
     solver ID       = 0
   =============================================
   Struct Interface:
   =============================================
   Struct Interface:
     wall clock time = 0.005627 seconds
     cpu clock time  = 0.010000 seconds

   =============================================
   Setup phase times:
   =============================================
   SMG Setup:
     wall clock time = 0.330096 seconds
     cpu clock time  = 0.330000 seconds

   =============================================
   Solve phase times:
   =============================================
   SMG Solve:
     wall clock time = 0.686244 seconds
     cpu clock time  = 0.480000 seconds


   Iterations = 4
   Final Relative Residual Norm = 8.972097e-07
The relative residual norm may differ slightly from machine to machine or compiler to compiler, but only very slightly (say, in the 6th or 7th decimal place). Also, the code should generate nearly identical results for a given problem, independent of the data distribution. The only part of the code that does not guarantee bitwise-identical results is the inner product used to compute norms. In practice, the above residual norm has remained the same.
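
As a quick check, the final norm can be extracted directly from the output, for example:

   mpirun -np 1 smg2000 -n 12 12 12 -c 2.0 3.0 40 | grep "Final Relative Residual Norm"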


Last modified on September 19, 2001
For information about this page contact:
Brian Carnes, bcarnes@llnl.gov

UCRL-MI-144211

UCRL-CODE-2000-022