DOE Centers of Excellence Performance Portability Meeting

April 19–21, 2016 - Glendale, Arizona

Overview

The Department of Energy (DOE) Centers of Excellence (COEs) Performance Portability meeting is an opportunity for the five COEs to share ideas, progress, and challenges toward the goal of performance portability across DOE's large upcoming advanced-architecture supercomputer procurements. The need for applications to run effectively on multiple vendors' advanced-architecture solutions (as well as on standard "cluster" technology) is pervasive across application teams within DOE and is a specified goal of DOE's exascale plans for risk mitigation. The two primary goals of this meeting are to:

Inform application teams and tool developers of activities and methodologies being used across the COEs, and foster informal relationships that can help DOE participants benefit from activities beyond their own COE.
Identify major challenges toward the goal of performance portability, and work with vendors and tool providers to determine implementations and solutions that meet their own performance criteria without inadvertently impairing performance elsewhere.

Talks

Tuesday, April 19, 2016

Overviews

Rob Neely, LLNL: Welcome/Kickoff
Tjerk Straatsma, ORNL: Summit COE/CAAR Overview
Jack Deslippe, LBL: NERSC-8 COE/NESAP Overview
Rob Neely, LLNL: Sierra COE Overview
Hai Ah Nam, LANL: Trinity COE Multi-Lab Overview
Kalyan Kumaran, ANL: ANL COE Overview
Nick Romero, ANL: HPCOR Workshop Recap
Bert Still, LLNL/Multi-Lab: ECP Application Overview and Criteria
NVIDIA: New Programming Model Features

Applications/Optimizations/Algorithms

Jae-Seung Yeom, LLNL: Data-Dependent Performance Modeling of Linear Solvers for Sparse Matrices
Charles Ferenbaugh, LANL: Coarse versus Fine-Level Threading in the PENNANT Mini-App
Scott Parker, ANL: Performance Optimization and Portability of the Nekbone Mini-App
Kris Garrett, LANL: A First Look at Optimizing Performance on the KNL
Vitali Morozov, ANL: Portability of HACC—a Highly Tuned Cosmology Application
Kristopher Keipert, ANL: Experiences and Challenges while Modernizing GAMESS for Theta and Aurora
Steve Rennich, NVIDIA: GPU Performance Optimization of the Sweep Operation in Kripke
Balint Joo, JLab/ANL/LBL: Experiences and Challenges for Performance Portability in Lattice QCD
Alvaro Vazquez-Mayagoitia, ANL: Many-Core and GPU Developments in the Parallel ELectronic Structure Infrastructure Library (ELSI)

Performance Portable Abstractions

Tan Nguyen, LBL: Portable Data Locality Management with High-Level Programming Abstractions
Jeff Vetter, ORNL: Understanding Portability of a High-Level Programming Model on Diverse HPC Architectures
Christian Trott, SNL: Kokkos—Performance Portability Today
Rich Hornung, LLNL: The RAJA Encapsulation Model for Architecture Portability
Arpith Jacob, IBM: Towards Performance Portable GPU Programming with RAJA

Wednesday, April 20, 2016

Managing the Memory Hierarchy

David Poliakoff, LLNL: Copy Hiding Application Interface (CHAI)—Hiding Data Motion for Performance Portability
Nikolai Sakharnykh, NVIDIA: Harnessing Performance of Geometric Multi-Grid Methods by Using LOC and TOC Architectures
Fabian Delalondre, ANL: Leveraging Heterogeneous Systems and Deep Memory Hierarchies for Brain Tissue Modeling
Luiz DeRose, Cray: Cray’s Programming Environment for Portable Performance and Programmability on Systems with High-Bandwidth Memory
Ian Karlin, LLNL/Multi-Lab: Quad Lab Proposal of Fundamental Cross Architecture Multi-Level Memory Support

Application Experience with Performance Portable Abstractions

Changhoan Kim, IBM: An Abstraction for Unstructured Mesh Problems
Adam Kunen, LLNL: Nested Loop RAJA for Performance Portability
Stan Moore, SNL: Obtaining Threading Performance Portability in SPARTA Using Kokkos
David Beckingsale, LLNL: Lightweight Models for Dynamically Tuning Data-Dependent Code
Geoff Womeldorff, LANL: Kokkos and Legion Implementations of the SNAP Proxy Application
Ryan Bleile, LLNL: Investigation of Portable Event-Based Monte Carlo Transport
Matt Martineau, UK: Investigating the Performance Portability Capabilities of OpenMP 4, Kokkos, and RAJA
Leopold Grinberg, IBM: Performance Portable Single Source-Code Implementation of Sparse Linear Algebra Operations on CPUs and GPUs
Slaven Peles, LLNL: Investigating Interoperability and Performance Portability of Select LLNL Numerical Libraries

Experience with OpenMP and Recommendations on Guiding Future Standards

John Pennycook, Intel: Performance Portability of Kernel-based Abstractions
John Pennycook, Intel: Generalizing a DSL for Structured Dependency (Stencil-Like) Codes to OpenMP Loops
John Levesque, Cray: How We Can Get Hybrid OpenMP/MPI to Out-Perform All-MPI
Carlo Bertolli, IBM: Performance Portability with OpenMP on NVIDIA GPUs
Jeff Larkin, NVIDIA: Performance Portability through Descriptive Parallelism
David Appelhans, IBM: Performance Portability Experience with LLVM, OpenMP 4, and Kripke
Kevin O’Brien, IBM: OpenMP Specifications for Portability
Oscar Hernandez, ORNL: Experiences with High-Level Programming Directives for Porting SPEC ACCEL on Multiple Architectures
Tom Scogland, LLNL: Performance Portability with OpenMP: Experiences with 4.5 and Looking toward 5.0

Thursday, April 21, 2016

Tools for Performance Portability and Analysis

Jeanine Cook, SNL: The Importability of Performance Tools
Juan Gonzalez Garcia, IBM: Next-Gen Profiling-Infrastructure for Supercomputers Based on Hybrid Nodes
Ignacio Laguna, LLNL: STATuner—Tuning CUDA Kernels via Compiler Analysis and Machine Learning
Si Hammond, SNL: Profiling Interfaces for Parallel C++ Abstractions - KokkosP
Protonu Basu, LBL: Leveraging Compiler-Based Tools for Performance-Portability
Heidi Poxon, Cray: Adding Parallelism to HPC Applications Using Reveal

The Input/Output Bottleneck and Use of Burst Buffers

Mark Miller, LLNL: Probing Portable Performance of Parallel I/O Paradigms Using MACSio
Andrey Ovsyannikov, LBL: ChomboCrunch and VisIt for Carbon Sequestration and In-Transit Data Analysis Using Burst Buffers
Kathryn Mohror, LLNL: Performance Portability for Burst Buffers with the Scalable Checkpoint/Restart Library (SCR)

Use of Domain-Specific Languages for Performance Portability

David Richards, LLNL: Portable Performance in Real Applications Using Generated Code
Brian Van Straalen, LBL: AMRStencil—An Embedded DSL for Expressing Structured Adaptive Mesh Refinement Algorithms

Breakout Sessions

During the meeting, several breakout sessions were held to gather participant input on the four topic areas outlined below. For each topic, two groups independently discussed a set of questions that the breakout moderators had prepared together before the meeting. Summaries of the discussions, presented during the meeting as out-briefs, are listed below:

Managing the Memory Hierarchy: Session A, Session B
Performance-Portable Abstractions: Session C, Session D
OpenMP Futures: Session E, Session F
Tools/Compiler/System Software Requirements: Session G, Session H

Agenda

Download the final agenda (updated 4/14/16).

Call for Abstracts/Talks (CLOSED)

Meeting organizers put out an open call for speakers to give short talks on progress, ideas, and/or challenges in the following topical areas:

Algorithmic and application work aimed at addressing trends in advanced architectures (e.g. reduced data motion, increased concurrency, use of burst buffers, etc.)
Software tools, libraries, abstractions, and standards intended to help applications address performance portability
Early application experiences (both positive and negative) attempting to run portably across diverse platforms
Portable approaches that application and library teams are taking to "manage the memory hierarchy" and optimize data placement and movement

Speakers were chosen from the submissions received by February 29, 2016, and are reflected in the final agenda.

Speaker and Participant Guidance

The following "ground rules" have been established for participants and speakers:

Talks and discussions must refrain from discussing information held under non-disclosure agreements. Contact your steering committee representative (below) if you need specific guidance.
In the spirit of the meeting, talks and discussions should address general challenges to the goal of performance portability and approaches that might be applied to overcome those challenges, rather than identifying and comparing state-of-play at a particular point in time.
Talks and discussions should not compare performance across specific platforms. Talks and discussions can address performance improvements on a given platform due to programming approaches or can address performance achieved relative to a theoretical performance model.
The focus of talks and discussions should be on portable, non-vendor-specific solutions as seen from the application developer perspective (that is, abstractions that hide vendor-specific solutions are acceptable). It is expected that a particular focus of the meeting will be to address possible evolutions of current standards (for example, OpenMP and C++) to better support performance portability.
Projections to future machines should not be presented.
Talks and discussions must be unclassified and non-sensitive in nature.
Speakers and participants (both labs and vendors) should accept that DOE will have multiple target platforms as part of its national strategy and join the discussion in the spirit of cooperation. All COEs are working toward the goal of making these platforms as useful and high-performing as possible without the threat of "vendor lock-in."

Steering Committee

james.r.reinders [at] intel.com (James Reinders), Intel/Trinity-Cori
mwglass [at] sandia.gov (Mike Glass), SNL/Trinity
rjhartmanbaker [at] lbl.gov (Rebecca Hartman-Baker), LBNL/Cori
levesque [at] cray.com (John Levesque), Cray/Trinity-Cori
hnam [at] lanl.gov (Hai Ah Nam), LANL/Trinity

neely4 [at] llnl.gov (Rob Neely) (chair), LLNL/Sierra
sextonjc [at] us.ibm.com (Jim Sexton), IBM/Sierra-Summit
straatsmatp [at] ornl.gov (Tjerk Straatsma), ORNL/Summit
zippy [at] alcf.anl.gov (Tim Williams), ANL/Aurora
czeller [at] nvidia.com (Cyril Zeller), NVIDIA/Sierra-Summit
