John Mellor-Crummey, Rice, Current Research, here.
It is increasingly difficult for application developers writing complex scientific programs to attain a significant fraction of peak performance on modern microprocessor-based computer systems. Largely, this problem stems from the difficulty of expressing the application in a form that can effectively exploit the high degree of instruction-level parallelism and deep memory hierarchies present in these systems. Furthermore, the complexity of these systems makes it difficult to pinpoint performance bottlenecks.
To address this issue, we have developed HPCToolkit — a novel suite of multi-platform tools for performance analysis of sequential and parallel programs.