The aim of the summer school is to give a thorough introduction to tools, languages and execution models for efficient programming of heterogeneous multicore architectures with accelerators. Three lectures will describe the existing programming levels for accelerators, as well as techniques for simultaneously using multicores and accelerators within a single application while trying to maintain performance portability. All courses consist of lectures and hands-on sessions where everyone can try out the tools on several exercises using different parallel computing hardware. No prior knowledge of accelerator programming is required, but basic knowledge of parallel programming is recommended.
* Lecture 1:
High-Level Directive-Based Programming for Heterogeneous Architectures
Michael WOLFE (The Portland Group, Inc.)
The lecture will introduce high-level, directive-based programming for heterogeneous architectures, such as CPUs connected to GPUs or Intel Xeon Phi coprocessors. We will discuss which kinds of programs are amenable to acceleration on these architectures, and the advantages and penalties of high-level programming. We will go into the details of OpenACC, compare it to OpenMP and the Intel Language Extensions for Offload, and use the PGI compilers for demonstration.
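To give a flavor of the directive-based style this lecture covers, here is a minimal OpenACC sketch (a generic vector addition written for this page, not an excerpt from the course material; the function name is ours): a single pragma asks the compiler to parallelize the loop and manage the host/device data movement.

    // Minimal OpenACC sketch: one directive offloads the loop to the
    // accelerator; copyin/copyout describe the required data movement.
    void vec_add(int n, const float *restrict a,
                 const float *restrict b, float *restrict c)
    {
        #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }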
Michael WOLFE has over 35 years of experience developing languages and compilers for high performance and parallel computers in industry and academia. He joined The Portland Group in 1996, where his responsibilities and interests include deep compiler analysis and optimizations, ranging from improving the efficiency of programs on parallel clusters to designing and implementing features for high-level GPU programming. He was an associate professor at the Oregon Graduate Institute from 1988 until 1996, and before that was a cofounder and lead compiler engineer at Kuck and Associates, Inc. He has published a textbook, "High Performance Compilers for Parallel Computing," a monograph, "Optimizing Supercompilers for Supercomputers," and a number of technical papers, and holds one patent.
Reading material:
- Easy GPU Parallelism with OpenACC (June 11, 2012)
- The OpenACC Execution Model (August 27, 2012)
- Creating and Using Libraries with OpenACC (October 29, 2012)
- The PGI OpenACC Getting Started Guide
- Tips for Maximizing Performance with OpenACC in Fortran
* Lecture 2:
Programming Massively Parallel Processors Using CUDA and C++ AMP
Wen-Mei HWU (University of Illinois at Urbana-Champaign)
This course introduces the principles and basic techniques for programming a CPU+GPU heterogeneous parallel computing system. We will start with CUDA, a low-level programming interface where the programmer must explicitly specify the details of data movement, the mapping from thread indices to data indices, and the use of scratchpad memory. Once students are familiar with these details, we will also cover C++ AMP, a higher-level programming interface where the compiler takes care of a good portion of the details, improving code maintainability; the programmer, however, still needs to go through the same thought process to achieve performance goals. We will cover some important parallel computation patterns in both CUDA and C++ AMP.
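As an illustration of the explicit style described above, here is a minimal CUDA sketch (a generic vector addition, not taken from the book; the function names are ours): the programmer allocates device memory, copies the data, and computes each thread's data index by hand.

    // Minimal CUDA sketch: the thread-index-to-data-index mapping and all
    // host<->device data movement are spelled out explicitly.
    #include <cuda_runtime.h>

    __global__ void vec_add(int n, const float *a, const float *b, float *c)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global data index
        if (i < n)
            c[i] = a[i] + b[i];
    }

    void run_vec_add(int n, const float *a, const float *b, float *c)
    {
        size_t bytes = n * sizeof(float);
        float *da, *db, *dc;
        cudaMalloc((void **)&da, bytes);            // allocate device memory
        cudaMalloc((void **)&db, bytes);
        cudaMalloc((void **)&dc, bytes);
        cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);  // host -> device
        cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
        int threads = 256;                          // threads per block
        int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
        vec_add<<<blocks, threads>>>(n, da, db, dc);
        cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);  // device -> host
        cudaFree(da); cudaFree(db); cudaFree(dc);
    }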
"Programming Massively Parallel Processors - A Hands-on Approach", Kirk and Hwu, Morgan-Kaufmann Publisher, 2012.
* Lecture 3:
Implicit and Task-Based Approaches to Heterogeneous Parallel Programming
Josef WEIDENDORFER (Technical University of Munich)
After introducing general-purpose processors and their limitations, the lecture will detail the design space of accelerators and explain why some codes are good or bad candidates for these architectures. We describe ways to manage heterogeneity, such as work partitioning, granularity, load balancing, scheduling, and data transfers. We also explain how separate memory address spaces across devices and/or nodes can be managed, before presenting existing programming models and runtime systems that address these issues using task-based models.
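To illustrate the dependency-driven task style, here is a minimal sketch using standard OpenMP task dependences, which work in the same spirit as the StarSs/OmpSs in/out annotations (the example is ours, not from the lecture): the runtime derives a task graph from the declared accesses and schedules ready tasks onto the available resources.

    // Minimal task-based sketch with OpenMP 4.0 task dependences (same
    // spirit as StarSs/OmpSs in/out clauses): the runtime builds the task
    // graph from the declared accesses and schedules tasks accordingly.
    #include <stdio.h>

    int main(void)
    {
        int x = 0, y = 0;
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(out: x)                // task A: produces x
            x = 40;
            #pragma omp task depend(in: x) depend(out: y)  // task B: runs after A
            y = x + 2;
            #pragma omp task depend(in: y)                 // task C: runs after B
            printf("y = %d\n", y);
        }   // implicit barrier: all tasks have completed here
        return 0;
    }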
Barcelona Supercomputing Center: Programming with StarSs