Mike Pearce, Intel, What is Code Modernization? here.
Modern high performance computers are built with a combination of resources including: multi-core processors, many core processors, large caches, high speed memory, high bandwidth inter-processor communications fabric, and high speed I/O capabilities. High performance software needs to be designed to take full advantage of these wealth of resources. Whether re-architecting and/or tuning existing applications for maximum performance or architecting new applications for existing or future machines, it is critical to be aware of the interplay between programming models and the efficient use of these resources. Consider this a starting point for information regarding Code Modernization. When it comes to performance, your code matters!
Building parallel versions of software can enable applications to run a given data set in less time, run multiple data sets in a fixed amount of time, or run large-scale data sets that are prohibitive with un-optimized software. The success of parallelization is typically quantified by measuring the speedup of the parallel version relative to the serial version. In addition to that comparison, however, it is also useful to compare that speedup relative to the upper limit of the potential speedup. That issue can be addressed using Amdahl’s Law and Gustafson’s Law.
Good code design takes into consideration several levels of parallelism.
The first level of parallelism is Vector parallelism (within a core) where identical computational instructions are performed on large chunks of data. Both scalar and parallel portions of code will benefit from the efficient use of vector computing.
A second level of parallelism called Thread parallelism, is characterized by a number of cooperating threads of a single process, communicating via shared memory and collectively cooperating on a given task.
The third level is when many codes have been developed in the style of independent cooperating processes, communicating with each other via some message passage system. This is called distributed memory Rank parallelism, so named as each process is given a unique rank number.