Michael Wolfe, HPCWire, Compilers and More: MPI+X, here.
At ISC’14, there was intense and continuing interest in the choice of a standard approach for programming the next generation HPC systems. While not guaranteed, many of these systems are likely to be large clusters of nodes with multicore CPUs and some sort of attached accelerators. A standard programming approach is necessary to convince developers, and particularly ISVs, to start adoption now in preparation for this coming generation of systems. John Barr raised the same question in an article at Scientific Computing World from a more philosophical point of view. Here I address this question from a deeper technical perspective.
HPC programming is currently dominated by either a flat model with MPI across nodes as well as cores within a node, or a hybrid model with MPI across the nodes and OpenMP shared memory parallelism across the cores in a node. The advantage of flat MPI is a simpler programming model, only one level of parallelism and only one API. The disadvantage is it doesn’t take advantage of the shared data across the ranks on the same node, requiring message and buffer management across all ranks. MPI+OpenMP roughly inverts those advantages and disadvantages.
The reason MPI and MPI+OpenMP have worked so well over the past 20 years now is that most HPC systems are roughly isomorphic, with some differences in instruction set, node topology, and performance profiles. The system is a network of nodes, the nodes have one or more processors, the processors have one or more cores. The cores on a node share virtual and physical memory, with hardware cache coherence to make shared memory programming relatively safe. There are some outliers, like the big SGI shared memory systems, which have some programming model and performance advantages for certain applications.