Ryan Smith, AnandTech, Intel Broadwell Architecture Preview: A Glimpse into Core M, here. Floating point on your iPad running at Tiger Noodles is going to smoke JPM/Maxeler’s million-dollar FPGA supercomputer for credit derivative valuation. However, the barrier to entry is high: you would have to learn to compile a program on an x86.
Of course efficiency increases can only take you so far, so along with the above changes Intel is also making some more fundamental improvements to Broadwell’s math performance. Both multiplication and division are receiving a performance boost thanks to performance improvements in their respective hardware. Floating point multiplication is seeing a sizable reduction in instruction latency from 5 cycles to 3 cycles, and meanwhile division performance is being improved by the use of an even larger Radix-1024 (10bit) divider. Even vector operations will see some improvements here, with Broadwell implementing a faster version of the vector Gather instruction.
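To put the 5-to-3 cycle figure in perspective, here is a back-of-the-envelope sketch. It assumes a purely latency-bound workload, where each multiply has to wait for the previous result, as in a compounding loop. The function names, the chain length `N`, and the first-order model (cycles = multiplies × latency) are illustrative assumptions and not anything from the article. Code that keeps several independent multiplies in flight is limited by throughput instead and would not see this gain.

```python
def chain_latency_cycles(n_multiplies: int, latency_cycles: int) -> int:
    """First-order cycle estimate for a chain of dependent multiplies.

    Assumes each multiply must wait for the previous result, so the
    instruction latency (not the throughput) sets the pace.
    """
    return n_multiplies * latency_cycles


def compound(rate: float, periods: int) -> float:
    """A dependent multiply chain: each step needs the prior factor."""
    factor = 1.0
    for _ in range(periods):
        factor *= 1.0 + rate
    return factor


N = 1_000_000  # hypothetical chain length, chosen only for illustration

before = chain_latency_cycles(N, 5)  # 5-cycle multiply latency
after = chain_latency_cycles(N, 3)   # 3-cycle latency quoted for Broadwell
speedup = before / after             # ratio of the two estimates, 5/3

print(f"{before} cycles -> {after} cycles, about {speedup:.2f}x on the chain")
```

Under these assumptions the best case for the multiply change alone is a ratio of 5/3 on the dependent chain. Real code mixes in loads, adds, and independent work, so the observed gain would normally be smaller.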