April 19–21, 2016 - Glendale, Arizona
Post-meeting report now available
Overview
The Department of Energy (DOE) Centers of Excellence (COEs) Performance Portability meeting is an opportunity for the five COEs to share ideas, progress, and challenges toward the goal of performance portability across DOE's large upcoming advanced architecture supercomputer procurements. The need for applications to run effectively on multiple vendor advanced architecture solutions (as well as on standard "cluster" technology) is pervasive across application teams within DOE and is a specified goal of the DOE's exascale plans for risk mitigation. The two primary goals of this meeting are to:
- Inform application teams and tool developers of activities and methodologies being used across the COEs, and foster informal relationships that can help DOE participants benefit from activities beyond their own COE.
- Identify major challenges toward the goal of performance portability, and work with the vendors and tool providers on determining implementations and solutions that will meet their own performance criteria without inadvertently impairing performance results elsewhere.
Talks
Tuesday, April 19, 2016
Overviews
- Rob Neely, LLNL: Welcome/Kickoff
- Tjerk Straatsma, ORNL: Summit COE/CAAR Overview
- Jack Deslippe, LBL: NERSC-8 COE/NESAP Overview
- Rob Neely, LLNL: Sierra COE Overview
- Hai Ah Nam, LANL: Trinity COE Multi-Lab Overview
- Kalyan Kumaran, ANL: ANL COE Overview
- Nick Romero, ANL: HPCOR Workshop Recap
- Bert Still, LLNL/Multi-Lab: ECP Application Overview and Criteria
- NVIDIA: New Programming Model Features
Applications/Optimizations/Algorithms
- Jae-Seung Yeom, LLNL: Data-Dependent Performance Modeling of Linear Solvers for Sparse Matrices
- Charles Ferenbaugh, LANL: Coarse versus Fine-Level Threading in the PENNANT Mini-App
- Scott Parker, ANL: Performance Optimization and Portability of the Nekbone Mini-App
- Kris Garrett, LANL: A First Look at Optimizing Performance on the KNL
- Vitali Morozov, ANL: Portability of HACC—a Highly Tuned Cosmology Application
- Kristopher Keipert, ANL: Experiences and Challenges while Modernizing GAMESS for Theta and Aurora
- Steve Rennich, NVIDIA: GPU Performance Optimization of the Sweep Operation in Kripke
- Balint Joo, JLab/ANL/LBL: Experiences and Challenges for Performance Portability in Lattice QCD
- Alvaro Vazquez-Mayagoitia, ANL: Many-Core and GPU Developments in the Parallel ELectronic Structure Infrastructure Library (ELSI)
Performance Portable Abstractions
- Tan Nguyen, LBL: Portable Data Locality Management with High-Level Programming Abstractions
- Jeff Vetter, ORNL: Understanding Portability of a High-Level Programming Model on Diverse HPC Architectures
- Christian Trott, SNL: Kokkos—Performance Portability Today
- Rich Hornung, LLNL: The RAJA Encapsulation Model for Architecture Portability
- Arpith Jacob, IBM: Towards Performance Portable GPU Programming with RAJA
Wednesday, April 20, 2016
Managing the Memory Hierarchy
- David Poliakoff, LLNL: Copy Hiding Application Interface (CHAI)—Hiding Data Motion for Performance Portability
- Nikolai Sakharnykh, NVIDIA: Harnessing Performance of Geometric Multi-Grid Methods by Using LOC and TOC Architectures
- Fabian Delalondre, ANL: Leveraging Heterogeneous Systems and Deep Memory Hierarchies for Brain Tissue Modeling
- Luiz DeRose, Cray: Cray’s Programming Environment for Portable Performance and Programmability on Systems with High-Bandwidth Memory
- Ian Karlin, LLNL/Multi-Lab: Quad Lab Proposal of Fundamental Cross Architecture Multi-Level Memory Support
Application Experience with Performance Portable Abstractions
- Changhoan Kim, IBM: An Abstraction for Unstructured Mesh Problems
- Adam Kunen, LLNL: Nested Loop RAJA for Performance Portability
- Stan Moore, SNL: Obtaining Threading Performance Portability in SPARTA Using Kokkos
- David Beckingsale, LLNL: Lightweight Models for Dynamically Tuning Data-Dependent Code
- Geoff Womeldorff, LANL: Kokkos and Legion Implementations of the SNAP Proxy Application
- Ryan Bleile, LLNL: Investigation of Portable Event-Based Monte Carlo Transport
- Matt Martineau, UK: Investigating the Performance Portability Capabilities of OpenMP 4, Kokkos, and RAJA
- Leopold Grinberg, IBM: Performance Portable Single Source-Code Implementation of Sparse Linear Algebra Operations on CPUs and GPUs
- Slaven Peles, LLNL: Investigating Interoperability and Performance Portability of Select LLNL Numerical Libraries
Experience with OpenMP and Recommendations on Guiding Future Standards
- John Pennycook, Intel: Performance Portability of Kernel-based Abstractions
- John Pennycook, Intel: Generalizing a DSL for Structured Dependency (Stencil-Like) Codes to OpenMP Loops
- John Levesque, Cray: How We Can Get Hybrid OpenMP/MPI to Out-Perform All-MPI
- Carlo Bertolli, IBM: Performance Portability with OpenMP on Nvidia GPUs
- Jeff Larkin, NVIDIA: Performance Portability through Descriptive Parallelism
- David Appelhans, IBM: Performance Portability Experience with LLVM, OpenMP 4, and Kripke
- Kevin O’Brien, IBM: OpenMP Specifications for Portability
- Oscar Hernandez, ORNL: Experiences with High-Level Programming Directives for Porting SPEC ACCEL on Multiple Architectures
- Tom Scogland, LLNL: Performance Portability with OpenMP: Experiences with 4.5 and Looking toward 5.0
Thursday, April 21, 2016
Tools for Performance Portability and Analysis
- Jeanine Cook, SNL: The Importability of Performance Tools
- Juan Gonzalez Garcia, IBM: Next-Gen Profiling-Infrastructure for Supercomputers Based on Hybrid Nodes
- Ignacio Laguna, LLNL: STATuner—Tuning CUDA Kernels via Compiler Analysis and Machine Learning
- Si Hammond, SNL: Profiling Interfaces for Parallel C++ Abstractions - KokkosP
- Protonu Basu, LBL: Leveraging Compiler-Based Tools for Performance-Portability
- Heidi Poxon, Cray: Adding Parallelism to HPC Applications Using Reveal
The Input/Output Bottleneck and Use of Burst Buffers
- Mark Miller, LLNL: Probing Portable Performance of Parallel I/O Paradigms Using MACSio
- Andrey Ovsyannikov, LBL: ChomboCrunch and VisIt for Carbon Sequestration and In-Transit Data Analysis Using Burst Buffers
- Kathryn Mohror, LLNL: Performance Portability for Burst Buffers with the Scalable Checkpoint/Restart Library (SCR)
Use of Domain-Specific Languages for Performance Portability
- David Richards, LLNL: Portable Performance in Real Applications Using Generated Code
- Brian Van Straalen, LBL: AMRStencil—An Embedded DSL for Expressing Structured Adaptive Mesh Refinement Algorithms
Breakout Sessions
During the meeting, several breakout sessions were held to gather participant input on the four topic areas outlined below. For each topic, two groups independently discussed a set of questions that the breakout moderators had prepared together before the meeting. Below are the summaries of the discussions presented during the meeting as out-briefs:
- Managing the Memory Hierarchy: Session A, Session B
- Performance-Portable Abstractions: Session C, Session D
- OpenMP Futures: Session E, Session F
- Tools/Compiler/System Software Requirements: Session G, Session H
Agenda
Download the final agenda (updated 4/14/16).
Call for Abstracts/Talks (CLOSED)
Meeting organizers put out an open call for speakers to give short talks on progress, ideas, and/or challenges in the following topical areas:
- Algorithmic and application work aimed at addressing trends in advanced architectures (e.g., reduced data motion, increased concurrency, use of burst buffers)
- Software tools, libraries, abstractions, and standards intended to help applications address performance portability
- Early application experiences (both positive and negative) attempting to run portably across diverse platforms
- Portable approaches that application and library teams are taking to "manage the memory hierarchy" and optimize data placement and movement
Speakers were chosen from the submissions received by Feb 29, 2016, and are reflected in the final agenda.
Speaker and Participant Guidance
The following "ground rules" have been established for participants and speakers:
- Talks and discussions must refrain from discussing information held under non-disclosure agreements. Contact your steering committee representative (below) if you need specific guidance.
- In the spirit of the meeting, talks and discussions should address general challenges to the goal of performance portability and approaches that might be applied to overcome those challenges, rather than identifying and comparing state-of-play at a particular point in time.
- Talks and discussions should not compare performance across specific platforms. Talks and discussions can address performance improvements on a given platform due to programming approaches or can address performance achieved relative to a theoretical performance model.
- The focus of talks and discussions should be on portable, non-vendor-specific solutions as seen from the application developer perspective (that is, abstractions that hide vendor-specific solutions are acceptable). It is expected that a particular focus of the meeting will be to address possible evolutions of current standards (for example, OpenMP and C++) to better support performance portability.
- Projections to future machines should not be presented.
- Talks and discussions must be unclassified and non-sensitive in nature.
- Speakers and participants (both labs and vendors) should accept that DOE will have multiple target platforms as part of its national strategy and join the discussion in the spirit of cooperation. All COEs are working toward the goal of making these platforms as useful and high-performing as they can be without the threat of "vendor lock-in."
Steering Committee
james.r.reinders [at] intel.com (James Reinders), Intel/Trinity-Cori
mwglass [at] sandia.gov (Mike Glass), SNL/Trinity
rjhartmanbaker [at] lbl.gov (Rebecca Hartman-Baker), LBNL/Cori
levesque [at] cray.com (John Levesque), Cray/Trinity-Cori
hnam [at] lanl.gov (Hai Ah Nam), LANL/Trinity
neely4 [at] llnl.gov (Rob Neely) (chair), LLNL/Sierra
sextonjc [at] us.ibm.com (Jim Sexton), IBM/Sierra-Summit
straatsmatp [at] ornl.gov (Tjerk Straatsma), ORNL/Summit
zippy [at] alcf.anl.gov (Tim Williams), ANL/Aurora
czeller [at] nvidia.com (Cyril Zeller), NVIDIA/Sierra-Summit