Anand Lal Shimpi, AnandTech, The Mac Pro Review (Late 2013), here. This is why there is an opening for DynaRack if you have a latency-sensitive SIMD application. If you are running a low-latency ALGO in 2014 and it’s not a SIMD application, then you need to take a deep breath and figure out why it’s not a SIMD application. That’s kind of what’s on the menu in 2014, so to speak. User-level async/context-switching code takes a toll. Danny Hillis makes clocks for Disney with a second hand that moves once every 10,000 years now; the Connection Machine II, not so much.
The iMac Haswell is much faster for certain computations than the Mac Pro Ivy Bridge, even though the Mac Pro is a much beefier and more expensive system. If your ALGO computation falls into the group of computations that can use AVX2, then you might run significantly faster on an iMac. Get a NucRack of Haswells and clean up in the HFT arms race? When they say there is a lag of a generation, you can read that as a factor of 2x performance differential for optimized inner loops between the laggard (Ivy Bridge) and the current generation silicon (Haswell), even though their benchmarks (here at AnandTech) smooth out the performance differential. The AnandTech folks are probably not classical Numerical Analysis types, so they use more general-purpose benchmarks, i.e., they live in a performance space with a different measure. If you want to see a benchmark display the difference of a microprocessor generation, then go look at Hager’s Blaze lib versus Eigen benchmarks that we reviewed earlier. That’s one of the reasons folks optimize their code: when Moore’s Law and Intel give you a factor of 2x performance improvement with a new generation of silicon, your optimized code (inner loop) performance goes up by… a factor of 2x. Or think of it this way: you just purchased a 2x boost in performance, but if you are running the performance equivalent of Johnson’s code (aka not optimized), you are getting almost nothing of what you just paid for. Johnson’s code just ate your 2x bump. In about 6 to 9 months we are going to start to ask for Broadwell performance numbers, because the Haswell numbers will be getting too old and crusty for use as competitive silicon benchmarks. Alternatively, Johnson’s code is getting hungry and you need to feed it?
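For the back-of-the-envelope crowd, here is a rough sketch of where a 2x figure for optimized inner loops can come from. It counts peak double-precision FLOPs per cycle per core, assuming Ivy Bridge issues one 256-bit AVX add and one 256-bit AVX multiply per cycle, while Haswell issues two 256-bit fused multiply-adds (FMA) per cycle. The function and variable names are made up for illustration, and this is a theoretical peak, not a measured benchmark; real code only gets near it when the inner loop is vectorized and the data is sitting in cache.

```python
# Theoretical peak double-precision FLOPs per cycle per core.
# Assumptions (not measurements):
#   Ivy Bridge: one 256-bit AVX add unit + one 256-bit AVX multiply unit.
#   Haswell:    two 256-bit FMA units (each FMA counts as 2 FLOPs per lane).

VECTOR_BITS = 256
DOUBLE_BITS = 64
LANES = VECTOR_BITS // DOUBLE_BITS  # 4 doubles per 256-bit register


def peak_dp_flops_per_cycle(units, flops_per_lane):
    """Peak FLOPs/cycle = execution units * SIMD lanes * FLOPs per lane."""
    return units * LANES * flops_per_lane


# Hypothetical names, chosen only for this sketch.
ivy_bridge = peak_dp_flops_per_cycle(units=2, flops_per_lane=1)  # add + mul
haswell = peak_dp_flops_per_cycle(units=2, flops_per_lane=2)     # 2 x FMA

speedup = haswell / ivy_bridge

print("Ivy Bridge peak DP FLOPs/cycle:", ivy_bridge)
print("Haswell peak DP FLOPs/cycle:   ", haswell)
print("Generation-to-generation ratio:", speedup)
```

The ratio only shows up if your inner loop can actually use fused multiply-adds. A general-purpose benchmark that is not dominated by that kind of loop will see a much smaller gap, which is consistent with the handful of percent in the review.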
If there’s one graph that tells the story of why Intel’s workstation roadmap is ridiculous, it’s this one. The Mac Pro follows Intel’s workstation roadmap, which ends up being cut down versions of Intel’s server silicon, which happens to be a generation behind what you can get on the desktop. So while the latest iMac and MacBook Pro ship with Intel’s latest Haswell cores, the Mac Pro uses what those machines had a year ago: Ivy Bridge. Granted everything else around the CPU cores is beefed up (there’s more cache, many more PCIe lanes, etc…), but single threaded performance does suffer as a result.
Now part of this is exaggerated by the fact that I’m reviewing the 2.7GHz 12-core Mac Pro configuration. Single core turbo tops out at 3.5GHz vs. 3.9GHz for the rest of the parts. I suspect if you had one of the 8-core models you’d see peak single threaded performance similar to what the 2012 27-inch iMac delivers. The 2013 27-inch iMac with its fastest CPU should still be quicker though. We’re not talking about huge margins of victory here, a matter of a handful of percent, but as a much more expensive machine it’s frustrating to not see huge performance leadership in all areas.