What is there so far? The code is simply a main() with a case statement that exercises portions of the classes referred to previously, in unoptimized gcc-generated x86 code. We test and measure the elapsed run time of holding the balance sheet flat as a baseline. That more or less tracks the performance you would see in a naive implementation of memcpy() or memset(). You basically just want to see how much of the L1 cache bandwidth you can use with the detuned x86 code gcc issues at -O2 or worse. In my hazy recollection you get about 50 GB/s out of the L1 cache if the microprocessor is not the worst x86 out there in 2016. So for single precision floats (4 bytes each) you can get about ten billion of them into the registers for execution and you can write back about ten billion floats every second. The L2 and L3 bandwidth are not as high as the L1 bandwidth, but the whole system is architected to give you memory consistency. Once L1 commits to the write by returning control to your program you are effectively done, even if the L3 write-backs are still in flight.
So for example, assume you are discounting 512K (half a million) cash flow time series, each with 60 monthly periods. That is, for every account in your balance sheet, you are going to discount all future cash flows to the current as-of date. That would imply reading about 30 million floats for the cash flows and 30 million floats (worst case) for the discounts (it is probably closer to 10K floats), and summing the products of the cash flows and the discounts. You are going to write back half a million values. That discounting is done with optimized code in about 0.01 seconds. If the code is not optimized, as when it is written naively by someone who has never written a program before and copies code out of books, it is done in 0.2 seconds on a single core of a single processor.
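To make the discounting example concrete, here is a minimal sketch of the naive version: one sum of products per account. The function name `discount_accounts`, the row-major layout, and the single shared discount curve are all assumptions of this sketch, not the actual code. At 512K accounts and 60 periods the cash flow array alone is roughly 30 million floats, or about 120 MB at 4 bytes each.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout (an assumption of this sketch):
//   cashflows[a * n_periods + t] is the cash flow of account a in period t,
//   discounts[t] is the discount factor for period t, shared by all accounts.
std::vector<float> discount_accounts(const std::vector<float>& cashflows,
                                     const std::vector<float>& discounts,
                                     std::size_t n_accounts,
                                     std::size_t n_periods) {
    assert(cashflows.size() == n_accounts * n_periods);
    assert(discounts.size() == n_periods);

    std::vector<float> present_value(n_accounts, 0.0f);
    for (std::size_t a = 0; a < n_accounts; ++a) {
        // Pointer to the start of this account's cash flow series.
        const float* cf = cashflows.data() + a * n_periods;
        float sum = 0.0f;
        // Sum of products of cash flows and discount factors.
        for (std::size_t t = 0; t < n_periods; ++t) {
            sum += cf[t] * discounts[t];
        }
        // One value written back per account.
        present_value[a] = sum;
    }
    return present_value;
}
```

Calling `discount_accounts(cashflows, discounts, 512 * 1024, 60)` would correspond to the sizes discussed above. With a shared curve the discounts stay resident in L1, so the loop is dominated by streaming the cash flows.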
We simulate the balance sheet in a tight but unoptimized loop, simulating each of the products with a regression implemented as a C++ function (one that I do not expect the compiler to inline). There are two fundamental modes of static balance sheet simulation:
- Simulate the balances and rates in the balance sheet itself, effectively ignoring the securities that are mapped to that account in the balance sheet, and
- Simulate all the securities where the Reference Server has a calibrated model.
Right now the regressions are simply placeholders. They do not fit any historical data. They are just there to make the x86 FP execution unit do something that takes a bunch of clock cycles to finish.
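A placeholder regression of this kind might look like the sketch below. Everything here is hypothetical: the `RegressionCoefficients` struct, the linear form, the `NOINLINE` macro, and the function names are illustrations, and the coefficients fit no data. The gcc/clang `noinline` attribute is used only to force the function call that the text expects the compiler not to inline anyway.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper macro: keep the regression out of line on gcc/clang.
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

// Made-up coefficients; they are not fitted to any historical data.
struct RegressionCoefficients {
    float intercept;
    float beta_balance;
    float beta_rate;
};

// Placeholder regression: next balance as a linear function of the
// prior balance and the period's rate. It exists only to burn FP cycles.
NOINLINE float placeholder_regression(const RegressionCoefficients& c,
                                      float prior_balance, float rate) {
    return c.intercept + c.beta_balance * prior_balance + c.beta_rate * rate;
}

// Simulate one product in place. path[0] holds the starting balance and
// path[t + 1] receives the simulated balance after applying rates[t].
void simulate_product(const RegressionCoefficients& c,
                      const std::vector<float>& rates,
                      std::vector<float>& path) {
    assert(path.size() == rates.size() + 1);
    for (std::size_t t = 0; t < rates.size(); ++t) {
        path[t + 1] = placeholder_regression(c, path[t], rates[t]);
    }
}
```

With coefficients `{0.0f, 1.0f, 0.0f}` the path stays at its starting balance, which is one way to reproduce the flat balance sheet baseline through the same code path.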
Why do we want these two simulation modes? In the event that the Reference Server has suitable models for the products, it is preferable from an efficiency standpoint to simply simulate the products directly, since there are not likely to be that many models, even at a GSIB. If there are many more than a couple thousand models, there is no one person who really grasps the entire status of the model implementation. The models may be parameterized, so that there is automated generation of a specific set of models. But it seems unlikely that there will be several hundred thousand different non-parameterized models even if there are a billion individual contracts/securities. So simulation at the product model level seems likely to be the most efficient mode of simulation. If we get into a situation where the Reference Server does not have a model to evaluate for some portion of the balance sheet, it might make sense to have the user supply the static balances directly. In this case, we want to simulate the balance sheet rather than just the individual product models. In either case you probably want to simulate the products and then deal out the simulated values as dictated by the balance sheet mapping implied by the user’s accrual portfolio input to the Reference Server.
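The "simulate the products, then deal out the values" step can be sketched as a scatter-add over a mapping table. The `Mapping` struct, its `weight` field, and the function name `deal_out` are assumptions made for illustration; the real mapping comes from the user's accrual portfolio input and its shape is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mapping entry: `weight` is the share of one product's
// simulated value that belongs to one balance sheet account.
struct Mapping {
    std::size_t account;
    std::size_t product;
    float weight;
};

// Deal simulated product values for one period out to the accounts.
std::vector<float> deal_out(const std::vector<float>& product_values,
                            const std::vector<Mapping>& mappings,
                            std::size_t n_accounts) {
    std::vector<float> account_values(n_accounts, 0.0f);
    for (const Mapping& m : mappings) {
        assert(m.account < n_accounts);
        assert(m.product < product_values.size());
        // Accumulate, since several products may map to one account.
        account_values[m.account] += m.weight * product_values[m.product];
    }
    return account_values;
}
```

The point of the design is that the expensive part (the model evaluation) scales with the number of products, while the dealing-out step is a single cheap pass over the mapping entries.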
Most of the coding effort has been simply to establish the elapsed run-time baseline for the largest expected GSIB balance sheet. The elapsed run time is measured with a microsecond-resolution clock, and the typical elapsed time for a 512K-account balance sheet with 60 simulation periods is under a couple of seconds. I have assumed 512K is the high end of the number of accounts we will see at a GSIB. If we simulate all the GSIBs or all the commercial banks we might get several million balance sheet accounts, but it is probably better to simulate by product/security for those large aggregates.
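For reference, a baseline measurement of this sort can be sketched with `std::chrono::steady_clock`. The helper names `elapsed_microseconds` and `hold_flat` and the period-major layout are assumptions of this sketch rather than the actual harness, and the clock's real resolution depends on the platform.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Run `work` once and return the elapsed wall time in microseconds.
template <typename Work>
long long elapsed_microseconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}

// Flat baseline: carry every account balance forward unchanged.
// Hypothetical layout: balances[t * n_accounts + a], with period 0
// holding the starting balances.
void hold_flat(std::vector<float>& balances,
               std::size_t n_accounts,
               std::size_t n_periods) {
    assert(balances.size() == n_accounts * (n_periods + 1));
    for (std::size_t t = 1; t <= n_periods; ++t) {
        const float* prev = balances.data() + (t - 1) * n_accounts;
        float* cur = balances.data() + t * n_accounts;
        for (std::size_t a = 0; a < n_accounts; ++a) {
            cur[a] = prev[a];  // effectively a naive memcpy per period
        }
    }
}
```

A call such as `elapsed_microseconds([&] { hold_flat(balances, 512 * 1024, 60); })` would time the 512K-account, 60-period case, with `balances` sized to `512 * 1024 * 61` floats (about 128 MB).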