Felix von Leitner, Code Blau GmbH, Source Code Optimization, here. Top shelf presentation. I would adjust this money shot, though, for the competitive FinQuant folks. Yes, optimizing memory hierarchy performance is a really important goal. But simply optimizing your code's memory performance, without addressing the fact that you are running on a superscalar, superpipelined parallel machine, may set you apart from the apathetic coders out there; it doesn't win you many competitive races. There are two primary optimization targets for FinQuant code that needs to exhibit competitive performance:
1. Instruction Level Parallelism (ILP) and
2. Multi level Cache Hierarchies
If you miss the ILP, you can lose close to an order of magnitude of execution performance. If your code is needlessly missing in L2 or L3, you could be dumping multiple orders of magnitude of execution performance. If you get both of these right in FinQuant analytics, and you have the right algorithms and top-of-the-line off-the-shelf microprocessors, then no competitor runs significantly faster than you (in any race you care about). Where does this fail to hold? Something like session-oriented protocol processing for exchange order routing: custom hardware with FPGAs can demonstrate much better latency than off-the-shelf servers running software, no matter how cleverly you optimize (say an order of magnitude improvement, 10 µs down to 400 ns). Where does this hold? Standard FinQuant applications like Monte Carlo simulation, most Fixed Income P&L and Risk, even something as simple as Black-Scholes.
• Only important optimization goal these days.
• Using mul instead of shift: 5 cycle penalty.
• Mispredicted conditional branch: 10 cycles.
• Cache miss to main memory: 250 cycles.