Nicole Hemsoth, HPC Wire, Intel Firms Up Fortran, Steps to Standards, here. Intel kind of has to make the case that the programming model is not going to change all that much to get to Phi. I do not know the state of the art, but this could be a debatable point. You have to reprogram and rework your program architecture just to get at the SIMD parallelism in AVX2. Anyone believing otherwise is mistaken or doesn't actually need the FP juice. The idea that moving to 50 cores with AVX-512 is not going to completely fuck with your current program architecture and algorithm selection is the rarest level of optimism.
“The key for us with Phi has been to give the ability to program in a common manner between Xeon and Xeon phi. We have some features now that show how we’re going to be able to bridge the current generation of Phi to the future Knights Landing chips, which are in development now. Some of the functionality starts to show up in our tools this year, so in terms of things like AVX-512, our tools are ready. There will be some updates in October that bring it into full force in our suite, but for the first time we’re talking about that being part of our products.”
Rich Brueckner, Inside HPC, Benchmarks: Intel Xeon Phi vs. NVIDIA Tesla GPU, here.
We haven’t seen many side-by-side application performance comparisons of Intel Xeon Phi vs. Nvidia Kepler, but that is starting to change. Over at the Xcelerit Blog, Jörg Lotze writes that a recent test shows it is a close race on a benchmark called the Monte-Carlo LIBOR Swaption Portfolio Pricer.
Jörg Lotze, Xcelerit, Benchmarks: Intel Xeon Phi vs. NVIDIA Tesla GPU, here. This looks really interesting. I probably like this blog, but let me note my complaints up front. I don’t really care about Sandy Bridge as the baseline for a benchmark at the end of 2013. Obviously, use Haswell; even with its smaller core count, it probably beats Tesla and Phi on the American Options test shown later in the article. I like the part at the end where you’re gonna buy an SDK to make your code go fast. Maybe there is an SDK to help you stick with Uncle Drew on D. It could happen, you don’t know.
Performance
The algorithm has been executed on all three platforms, in double precision, for varying numbers of paths. The portfolio consisted of 15 swaptions, simulated for 40 time steps. For reference, we’ve also included a straightforward sequential C++ implementation, running on a single core of the Sandy-Bridge CPU. The results are listed in the table below:
Paths | Sequential | Sandy-Bridge CPU [1,2] | Xeon Phi [1,2] | Tesla GPU [2]
128K | 13,062 ms | 694 ms | 603 ms | 146 ms
256K | 26,106 ms | 1,399 ms | 795 ms | 280 ms
512K | 52,223 ms | 2,771 ms | 1,200 ms | 543 ms
[1] The Sandy-Bridge and Phi implementations make use of SIMD vector intrinsics.
[2] The MRG32k3a random number generator from the cuRAND library (GPU) and the MKL library (Sandy-Bridge/Phi) were used.