**Rik Myslewski**, The Register, AMD’s ‘Revolution’ will be televised … if its CPU-GPU frankenchip Kaveri is a hit, here. Need some FP juice.

AMD has released its long-awaited Kaveri processor, the first accelerated processing unit (APU) to incorporate both on-die CPU and GPU cores in a heterogeneous system architecture (HSA) with a shared memory architecture.

To simulate the entire balance sheet of a large bank, it appears that you need about one second of execution time on a Haswell core to get the Net Interest Margin at the end of year one. If you then want to run a 10K-path Monte Carlo simulation of the balance sheet to get the expected Net Interest Margin at the end of year one, you need 10K seconds on that Haswell core, or you start to parallelize. Let’s just hold the parallelization for a bit until we exhaust all the sources of sub-10-picosecond (on average) FP arithmetic on core. We can always parallelize later, but if you don’t catch that sub-10-picosecond arithmetic, it’s gone forever. Living in the Golden Age of FP computing, I am now getting two FMA units issuing on each 3GHz clock cycle. That’s nice, but is there more FP execution available on chip before I wander off into parallelization? With Kaveri, apparently someone will give me an on-chip GPU. I guess what I would like is to put the accrual portfolio balance/cashflow generation in the FMA units and the 10K-path Monte Carlo in the GPU, all on chip. Then I can use 100K of these chips to implement the quasi-Newton NLP for the risk-adjusted optimal Net Interest Margin.
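To make the shape of that Monte Carlo concrete, here is a minimal sketch in plain Python. Everything in it is a toy assumption and not the real balance-sheet model: a single block of assets at a fixed yield, funded by liabilities whose rate follows a Gaussian random walk, with hypothetical names (`simulate_nim`, `expected_nim`) and made-up parameters. It only shows where the 10K independent paths come from and why they are an obvious candidate for offload.

```python
import random
import statistics


def simulate_nim(rng, months=12, assets=100.0, asset_yield=0.045,
                 funding_rate=0.015, rate_vol=0.002):
    """One path: Net Interest Margin at the end of year one (toy model).

    All parameters are illustrative assumptions, not real bank data.
    """
    income = 0.0
    expense = 0.0
    rate = funding_rate
    for _ in range(months):
        # Assumed dynamics: monthly Gaussian shock to the funding rate, floored at zero.
        rate = max(0.0, rate + rng.gauss(0.0, rate_vol))
        income += assets * asset_yield / 12   # accrued interest income this month
        expense += assets * rate / 12         # accrued interest expense this month
    return (income - expense) / assets


def expected_nim(paths=10_000, seed=0):
    """Average NIM over independent paths, plus the standard error of that mean."""
    rng = random.Random(seed)
    samples = [simulate_nim(rng) for _ in range(paths)]
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / paths ** 0.5
    return mean, std_err


if __name__ == "__main__":
    mean, std_err = expected_nim()
    print(f"expected NIM: {mean:.4%} +/- {std_err:.4%}")
```

Each path is independent of the others, so the inner accrual loop is the part that wants the FMA units and the outer loop over paths is the part that wants the GPU.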

As it stands, assuming $1 million buys 2K microprocessors and I have a 24-hour budget to find the optimized Net Interest Margin, each nonlinear programming (NLP) function evaluation costs between $10 and $100. That means I need a machine that costs $10 billion to $100 billion to get the optimal Net Interest Margin, if NLP execution time constants are close to one (and everything scales nicely). Since the whole NIM optimization market is projected to be like $100 billion a year, I am going to need some improvements to get to breakeven on the hardware costs, right? Let’s say over the next 5 years the FMAs somehow get me 32x vectorized FP execution performance. That gets my NLP function evaluation cost to, call it, between 30 cents and $3. If I could offload the Monte Carlo onto the on-chip GPU and get another 10x, my NLP function evaluation cost is down to between 3 cents and 30 cents. That is better: the machine I need would then be about $30 million at the low end, and I have not had to do anything smart algorithmically.
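The back-of-envelope numbers above can be reproduced with a few lines of Python. This is a sketch under the stated assumptions only: one second per balance-sheet simulation, 10K paths per function evaluation, $1 million for 2K processors, and a 24-hour budget. The function names are hypothetical, and the evaluation count passed to `machine_cost` is not given in the text; roughly 10^9 is what the $10 billion to $100 billion machine price implies.

```python
SECONDS_PER_PATH = 1.0               # one balance-sheet simulation on one core
PATHS_PER_EVALUATION = 10_000        # Monte Carlo paths per NLP function evaluation
BUDGET_SECONDS = 24 * 3600           # 24-hour optimization budget
COST_PER_CPU = 1_000_000 / 2_000     # $1 million buys 2K microprocessors


def cost_per_evaluation(speedup=1.0):
    """Hardware dollars tied up per NLP function evaluation within the budget."""
    seconds_per_evaluation = SECONDS_PER_PATH * PATHS_PER_EVALUATION / speedup
    evaluations_per_cpu = BUDGET_SECONDS / seconds_per_evaluation
    return COST_PER_CPU / evaluations_per_cpu


def machine_cost(n_evaluations, speedup=1.0):
    """Total hardware cost if every evaluation must finish inside the budget.

    n_evaluations is an assumption supplied by the caller, not a figure from the text.
    """
    return n_evaluations * cost_per_evaluation(speedup)


if __name__ == "__main__":
    for label, speedup in [("today", 1), ("32x FMA", 32), ("32x FMA + 10x GPU", 320)]:
        print(f"{label:>18}: ${cost_per_evaluation(speedup):,.2f} per evaluation")
    print(f"machine at 1e9 evaluations, today: ${machine_cost(1e9):,.0f}")
```

With these inputs the per-evaluation cost works out to roughly $58 today, about $1.81 with the 32x vectorization gain, and about 18 cents with the further 10x GPU offload, each inside the ranges quoted above.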

Next, go talk with Nemirovski, Ruszczyński, or Bertsekas and find out how to get a competitive parallel NLP solver for Net Interest Margin optimization. I am looking for something like that parallel Nelder-Mead optimizer. If anyone has any ideas about parallel NLP, there is a non-trivial bucket of money here; just take a second to think about it. US bank balance sheets hold something like $15 trillion of assets at a 30-year low of 300 basis points of decidedly unoptimized Net Interest Margin. Bunch of folks driving rockets with joysticks.