Category Archives: Analytics
Original Dynamic Pie Charts
Original Dynapi, here. Those CDs sure did well for a while.
Floating Point and Roundoff Error Analysis
David Bindel, 2001, CS 279 Annotated Course Bibliography, here. Rereading this – very nice summary.
This annotated bibliography is a complement to a set of course notes for CS 279, System Support for Scientific Computation, taught in Spring 2001 by W. Kahan at UC Berkeley. It is meant to be representative, not comprehensive, and includes pointers to survey works and online bibliographies which cover the literature in various areas in more depth. In the hope of making the bibliography simpler to use, the notes are partitioned by topic. The categorization is far from precise, but where a reference provides pertinent coverage of multiple topics, I have tried to provide appropriate cross-references in the section text.
This bibliography is based largely on the bibliography assembled by Judy Hu for the Spring 1992 version of this course. Other major sources for annotated entries include the bibliography of the Apple Numerics Manual and bibliographies from the papers of W. Kahan. Annotations attributable to those sources are clearly labeled.
Muller et al., 2009, Handbook of Floating-Point Arithmetic, here. Just starting to read through this morning. Seems to have some prospect of covering roundoff analysis on contemporary microprocessors.
3.5.2 Fused multiply-add
The IBM Power/PowerPC, HP/Intel IA-64, and HAL/Fujitsu SPARC64 VI instruction sets define a fused multiply-add (FMA) instruction, which performs the operation a × b + c with only one rounding error with respect to the exact result (see Section 2.8 page 51). This is actually a family of instructions that includes useful variations such as fused multiply-subtract.
These operations are compatible with the FMA defined by IEEE 754-2008. As far as this operator is concerned, IEEE 754-2008 standardized already existing practice.
The processors implementing these instruction sets (the IBM POWER family, PowerPC processors from various vendors, the HP/Intel Itanium family for IA-64) provide hardware FMA operators, with latencies comparable to classical + and × operators. For illustration, the FMA latency is 4 cycles in Itanium2, 7 cycles on Power6, and both processors are capable of launching 2 FMA operations at each cycle.
There should soon be FMA hardware in the processors implementing the IA-32 instruction set: they are defined in the SSE5 extensions announced by AMD and in the AVX extensions announced by Intel.
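For a concrete feel of the one-rounding property (my example, not from the Handbook): C99 exposes the fused operation as fma() in math.h, and a single well-chosen case shows the difference between one rounding and two.

#include <math.h>
#include <stdio.h>

/* Illustration of single-rounding FMA: a*b is exactly 1 - 2^-54, which a
   plain double multiply rounds to 1.0, so the naive expression returns 0,
   while fma() rounds a*b + c only once and returns -2^-54. */
int main(void)
{
    double a = 1.0 + 0x1p-27;            /* 1 + 2^-27 */
    double b = 1.0 - 0x1p-27;            /* 1 - 2^-27 */
    double c = -1.0;

    double two_roundings = a * b + c;    /* 0.0 (unless the compiler contracts it into an FMA) */
    double one_rounding  = fma(a, b, c); /* -2^-54, the exact result                           */

    printf("a*b + c      = %a\n", two_roundings);
    printf("fma(a, b, c) = %a\n", one_rounding);
    return 0;
}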
Kahan, 1996, The Improbability of Probabilistic Error Analysis for Numerical Computations, here. Treasure
Roundoff in Floating-Point Arithmetic:
Suppose the program asks the computer to calculate W := X·Y + Z ;
what the computer actually calculates is w = ( (x·y)·(1 + ß) + z )·(1 + μ)
in which ß and μ stand for rounding errors, tiny values for which we know a priori bounds like, say,
| ß | < 2^-53 , | μ | < 2^-53 ; 2^-53 ≈ 10^-16 . (These bounds suit Double Precision (REAL*8) on most computers nowadays.)
The simplest model of roundoff assumes that nothing more can be known about ß and μ .
The simplest probabilistic model of roundoff assumes that ß and μ are independent random variates distributed Uniformly
between their bounds ±2^-53. Both models merely approximate the truth.
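Here is a quick check of that a priori bound for a single product (my sketch, not Kahan's): since fma() can recover the exact rounding error of a double multiply, you can measure ß directly and watch it sit below 2^-53.

#include <math.h>
#include <stdio.h>

/* With p the rounded product x*y, fma(x, y, -p) returns the residual
   x*y - p exactly, so beta in p = (x*y)*(1 + beta) can be measured and
   compared against the a priori bound 2^-53. */
int main(void)
{
    double x = 1.0 / 3.0, y = 3.141592653589793;
    double p    = x * y;                  /* rounded product         */
    double err  = fma(x, y, -p);          /* exact residual x*y - p  */
    double beta = -err / p;               /* relative rounding error */

    printf("|beta| = %.3e   2^-53 = %.3e\n", fabs(beta), ldexp(1.0, -53));
    return 0;
}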
Navier Stokes Solution Claimed and Twitter Algos
Soulskill, Slashdot, Kazakh Professor Claims Solution of Another Millennium Prize Problem, here. Did time as a first year grad student in the Navier Stokes mines, coding and debugging (mostly debugging) numerical PDE solver schemes for Wavy Vortex flow on an IBM mainframe. Can’t parse the paper in Russian though. I hope Tao can read Russian, otherwise confirmation could take a while.
An anonymous reader writes “Kazakh news site BNews.kz reports that Mukhtarbay Otelbaev, Director of the Eurasian Mathematical Institute of the Eurasian National University, is claiming to have found the solution to another Millennium Prize Problem. His paper, which is called ‘Existence of a strong solution of the Navier-Stokes equations’ and is freely available online (PDF in Russian), may present a solution to the fundamental partial differential equations that describe the flow of incompressible fluids for which, until now, only a subset of specific solutions have been found.”
Otelbaev M., EXISTENCE OF A STRONG SOLUTION OF THE NAVIER-STOKES EQUATION, here. 100+ pages and the last page is in English. Confirmation complexity compared to the ABC Conjecture should be a layup, unless of course the result depends on Inter-Universal Teichmüller Theory. How do you say “Inter-Universal Teichmüller Theory” in Russian? Here is an earlier publication, M. Otelbaev et al., 2006, in English – Existence Conditions for a Global Strong Solution to One Class of Nonlinear Evolution Equations in a Hilbert Space, here.
otelbaev.com, Activities, here. Work summary.
His main works are grouped around the following fields:
I. Spectral theory of differential operators.
M. Otelbaev developed new methods for studying the spectral properties of differential operators, which are the result of a consistent and skilled implementation of the general idea of the localization of the problems under consideration. In particular, he invented a construction of averaging coefficients well describing those features of their behaviour which influence the spectral properties of a differential operator. This construction known under the name made it possible to answer many of the hitherto open questions of the spectral theory of the Schrödinger operator and its generalizations.
The function and its different variants have a number of remarkable properties, which made it possible to apply this function to a wide range of problems. Here we note some problems solved for the first time by M. Otelbaev using the function, on the basis of a sophisticated analysis of the properties of differential operators.
1) A criterion for belonging of the resolvent of the Schrödinger type operator with a non-negative potential to the class was found (previously only a criterion for belonging to was known) and two-sided estimates for the eigenvalues of this operator were obtained with the minimal assumptions of the smoothness of the coefficients.
2) The general localization principle was proved for the problems of selfadjointness and of the maximal dissipativity (simultaneously with the American mathematician P. Chernov) which provided significant progress in this area.
3) Examples were given showing that the classical Carleman-Titchmarsh formula for the distribution function of the eigenvalues of the Sturm-Liouville operator is not always correct, even in the class of monotonic potentials, and a new formula was found valid for all monotonic potentials.
4) The following result of M. Otelbaev is principally important: for there is no universal asymptotic formula.
5) From the time of Carleman, who found the asymptotics for and, by using it, the asymptotics of the eigenvalues themselves, all mathematicians started with finding the asymptotics for and as a result they could not get rid of the so-called Tauberian conditions. M. Otelbaev was the first who, when looking for the asymptotics of the eigenvalues, omitted the interim step of finding the asymptotics for , which allowed getting rid of all non-essential conditions for the problem including Tauberian conditions.
6) The two-sided asymptotics for for the Dirac operator was for the first time found when and are not equivalent.
The results of M. Otelbaev on spectral theory were included as separate chapters in the monographs of B.M. Levitan and I.S. Sargsyan, “Sturm-Liouville and Dirac Operators” (Moscow: Nauka, 1985), and of A.G. Kostyuchenko and I.S. Sargsyan, “Distribution of Eigenvalues” (Moscow: Nauka, 1979), which have become classics.
Tyler Durden, Zerohedge, How Twitter Algos Determine Who Is Market-Moving And Who Isn’t, here.
Now that even Bridgewater has joined the Twitter craze and is using user-generated content for real-time economic modelling, and who knows what else, the scramble to determine who has the most market-moving, and actionable, Twitter stream is on. Because with HFT algos having camped out at all the usual newswire sources: Bloomberg, Reuters, Dow Jones, etc. the scramble to find a “content edge” for market moving information has never been higher. However, that opens up a far trickier question: whose information on the fastest growing social network, one which many say may surpass Bloomberg in terms of news propagation and functionality, is credible and by implication: whose is not? Indeed, that is the $64K question. Luckily, there is an algo for that.
Black Scholes 2014
Here is the Black Scholes equation; competitive code in 2014 on a single off-the-shelf microprocessor can execute this computation 150 to 200 million times a second. Who does this matter to? If you are running an algorithmic trading/SOR system and Black Scholes is in your inner loop between processing market data/order book and executing via the exchange gateway/SOR, then you probably do not want to run non-competitive code. There are folks who say microseconds count. If you are running a Monte Carlo simulator with Black Scholes code evaluating the randomly generated scenarios across a large inventory, you probably need competitive code just to get the simulation to stabilize and converge.
Anyway here is the Black Scholes closed form equation for a call option:
call price = S*N(d1) – K*exp(-r*(T-t))*N(d2)
d1 = (ln(S/K) + (r + sigma**2 * 0.5)*(T-t)) / (sigma*sqrt(T-t))
d2 = d1 – sigma * sqrt(T-t)
• N() is the cumulative distribution function of the standard normal distribution
• T – t is the time to maturity
• S is the spot price of the underlying asset
• K is the strike price
• r is the risk free rate (annual rate, expressed in terms of continuous compounding)
• sigma is the volatility of returns of the underlying asset
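Before counting cycles, here is a minimal scalar C sketch of the closed-form call price; the names are illustrative and N() is built from erf() via N(x) = 0.5*(1 + erf(x/sqrt(2))), the same identity the Intel code below uses.

#include <math.h>
#include <stdio.h>

/* Minimal scalar sketch of the closed-form Black Scholes call price.
   tau is the time to maturity T - t; names are illustrative only. */
static double norm_cdf(double x)
{
    return 0.5 * (1.0 + erf(x / sqrt(2.0)));
}

double black_scholes_call(double S, double K, double r, double sigma, double tau)
{
    double sig_sqrt_tau = sigma * sqrt(tau);
    double d1 = (log(S / K) + (r + 0.5 * sigma * sigma) * tau) / sig_sqrt_tau;
    double d2 = d1 - sig_sqrt_tau;
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2);
}

int main(void)
{
    /* Made-up inputs: S = 100, K = 100, r = 1%, sigma = 20%, one year. */
    printf("call = %f\n", black_scholes_call(100.0, 100.0, 0.01, 0.20, 1.0));
    return 0;
}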
What is competitive performance for Black Scholes valuation in late 2013? How do you generate a reasonable estimate, so you know when to stop optimizing/tinkering? The most straightforward way to get a competitive execution time estimate is to count the required multiplies and additions. We could do that directly from the equations above, making some assumptions about common subexpression elimination and caching. But we can simply analyze Intel’s code for Black Scholes (below); that’s easier for now. Once we know the number of multiplies and additions (or equivalently the instruction cycle count) we can estimate how many execution cycles we will need on a given microprocessor executing this code. Since we can look up the clock speed of the given microprocessor we can back into an estimate of the time per Black Scholes valuation. The last time we did this analysis, in 2009, here, we found that on a 2007 vintage microprocessor (IBM POWER6) we needed 170 cycles for the valuation and input sensitivities. That was about 36 nanoseconds per valuation or about 30 million Black Scholes valuations per second. What has changed in competitive Black Scholes performance in the past five years?
Here is Intel code implementing the Black Scholes equation valuation.
void BlackScholesFormula( int nopt, tfloat r, tfloat sig,
                          tfloat s0[], tfloat x[], tfloat t[],
                          tfloat vcall[], tfloat vput[] )
{
    vmlSetMode( VML_EP );

    DIV(s0, x, Div);                 // vector S0/X   (VML)
    LOG(Div, Log);                   // vector ln(S0/X)

    for ( j = 0; j < nopt; j++ ) {   // loop 1
        tr[j]    = t[j] * r;
        tss[j]   = t[j] * sig_2;
        tss05[j] = tss[j] * HALF;
        mtr[j]   = -tr[j];
    }

    EXP(mtr, Exp);                   // vector exp(-r*t)
    INVSQRT(tss, InvSqrt);           // vector 1/(sig*sqrt(t))

    for ( j = 0; j < nopt; j++ ) {   // loop 2: d1/sqrt(2) and d2/sqrt(2)
        w1[j] = (Log[j] + tr[j] + tss05[j]) * InvSqrt[j] * INV_SQRT2;
        w2[j] = (Log[j] + tr[j] - tss05[j]) * InvSqrt[j] * INV_SQRT2;
    }

    ERF(w1, w1);                     // vector erf()
    ERF(w2, w2);

    for ( j = 0; j < nopt; j++ ) {   // loop 3
        w1[j] = HALF + HALF * w1[j];                   // N(d1)
        w2[j] = HALF + HALF * w2[j];                   // N(d2)
        vcall[j] = s0[j] * w1[j] - x[j] * Exp[j] * w2[j];
        vput[j]  = vcall[j] - s0[j] + x[j] * Exp[j];   // put via put-call parity
    }
}
Let’s estimate the theoretical competitive performance of this code. Notice a couple of things in the Intel code that shave off a few cycles. The risk free rate and the volatility are assumed to be scalars; they don’t vary with the portfolio of call options. On the other hand, the code always computes both a put and a call option price as opposed to a single portfolio position. We will assume double precision although single precision would be significantly faster. We will assume LA rather than EP for VML execution. We will add some cycles to the estimates coming from this code to account for input sensitivities, but otherwise leave the code as is. That will make these estimates directly comparable to the 2009 estimates. We need some current cycle counts from Intel VML, here. We use the counts labeled Intel® Core™ i5-4670T Processor, base clocked at 2.3 GHz, turbo to 3.3 GHz.
Code | Cycles/Element | Accuracy (ULP)
Div() | 3.02 | 3.10
Log() | 6.16 | 0.80
loop 1 | 2 |
Exp() | 3.65 | 1.98
InvSqrt() | 3.70 | 1.42
loop 2 | 3 |
ERF() x 2 | 6.05 * 2 | 1.33
loop 3 | 4 |
Total | 37.63 |
The sensitivities drop out from differentiating the Black Scholes formula. We have lots of common subexpressions:
delta = N(d1),
gamma = N'(d1)/(S*sigma*sqrt(t)),
vega = S*N'(d1)*sqrt(t),
theta = -(S*N'(d1)*sigma)/(2*sqrt(t)) – r*K*exp(-r*t)*N(d2), and
rho = K*t*exp(-r*t)*N(d2),
where N'() is the standard normal density and t here is shorthand for the time to maturity T – t.
This looks like 6 or maybe 7 cycles of computation; remember you have two FMA execution units running on each clock tick. I don’t see a way to argue away the 3 cycles for the divide in the gamma. Let’s say 44 cycles all in for this estimate on a single core. So, a full Black Scholes valuation is executed every 13 nanoseconds @ 3.3 GHz and about 19 nanoseconds @ 2.3 GHz if there is no L1/L2 cache pressure on the core. This code is not going to generate many extra cache misses.
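For concreteness, here is a scalar sketch (mine, separate from the Intel vector code) of how the sensitivities reuse the subexpressions already computed for the price; N'() is the standard normal density, tau is the time to maturity, and the struct and names are illustrative.

#include <math.h>

/* Sketch: Black Scholes call price and Greeks sharing the subexpressions
   d1, d2, exp(-r*tau), and sqrt(tau).  Illustrative names only. */
typedef struct { double price, delta, gamma, vega, theta, rho; } bs_greeks;

static double norm_pdf(double x) { return exp(-0.5 * x * x) * 0.3989422804014327; }
static double norm_cdf(double x) { return 0.5 * (1.0 + erf(x / sqrt(2.0))); }

bs_greeks black_scholes_call_greeks(double S, double K, double r,
                                    double sigma, double tau)
{
    double sqrt_tau     = sqrt(tau);
    double sig_sqrt_tau = sigma * sqrt_tau;
    double df  = exp(-r * tau);                   /* shared discount factor */
    double d1  = (log(S / K) + (r + 0.5 * sigma * sigma) * tau) / sig_sqrt_tau;
    double d2  = d1 - sig_sqrt_tau;
    double Nd1 = norm_cdf(d1), Nd2 = norm_cdf(d2), nd1 = norm_pdf(d1);

    bs_greeks g;
    g.price = S * Nd1 - K * df * Nd2;
    g.delta = Nd1;
    g.gamma = nd1 / (S * sig_sqrt_tau);           /* the divide noted above */
    g.vega  = S * nd1 * sqrt_tau;
    g.theta = -(S * nd1 * sigma) / (2.0 * sqrt_tau) - r * K * df * Nd2;
    g.rho   = K * tau * df * Nd2;
    return g;
}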
The i5-4670T has four cores; let’s assume you can use three of them trivially @ 2.3 GHz and the OS uses the 4th. You can get to 150 million full Black Scholes valuations per second on one i5-4670T, and if you can somehow get to execute on that 4th core you could crack 200 million per second. It does not look like Turbo Boost is going to get you 4 cores running at 3.3 GHz. Maybe over-clocking the 4670T will let you crack 200 million full Black Scholes valuations per second. I’d estimate competitive performance in late 2013 is 150 to 200 million full Black Scholes per second on a single i5-4670T at $213 a pop, up from 30 million per second in 2009. More or less what you would expect from code tracking Moore’s Law.
Source Code Optimization
Felix von Leitner, Code Blau GmbH, Source Code Optimization, here. Top shelf presentation. I would adjust this money shot though for the competitive FinQuant folks. Yes, the optimization of the memory hierarchy performance is a really important goal. But simply optimizing your code’s memory performance without addressing the fact that you are running on a little superscalar, superpipelined parallel machine may set you apart from the apathetic coders out there, but it doesn’t win you many competitive races. There are two primary optimization targets for FinQuant code that needs to exhibit competitive performance:
1. Instruction Level Parallelism (ILP) and
2. Multi level Cache Hierarchies
If you miss the ILP, you can lose close to an order of magnitude of execution performance. If your code is missing in L2 or L3 needlessly, you could be dumping multiple orders of magnitude of execution performance. If you get both of these right in FinQuant analytics, and you have the right algorithms, and top of the line off-the-shelf microprocessors, then there is no competitor running significantly faster than you (in any race you care about). Where does this fail to hold? Something like session oriented protocol processing for exchange order routing – custom hardware with FPGAs can demonstrate much better latency performance (say an order of magnitude improvement, 10 μs to 400 ns) than off-the-shelf servers running software, no matter how cleverly you optimize. Where does this hold? Standard FinQuant applications like Monte Carlo simulation, most Fixed Income P&L and Risk, even something as simple as Black Scholes.
Memory Hierarchy
• Only important optimization goal these days
• Use mul instead of shift: 5 cycles penalty.
• Conditional branch mispredicted: 10 cycles.
• Cache miss to main memory: 250 cycles.
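To make the ILP point concrete (my toy example, not from the slides): a single accumulator makes a sum one long add dependency chain, while a few independent accumulators let a superscalar core overlap the adds, at the price of reassociating the floating-point sum.

/* One dependency chain: every add waits on the previous one. */
double sum_serial(const double *a, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent chains the core can issue in parallel (note this
   reassociates the FP sum, so rounding can differ slightly). */
double sum_ilp(const double *a, int n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* remainder */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}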
Algorithmic and High-frequency trading: an overview
Marco Avellaneda, NYU, Algorithmic and High-frequency trading: an overview, here.
Algorithmic trading: the use of programs and computers to generate and execute (large) orders in markets with electronic access.
Almgren and Chriss, NYU, Dec 2000, Optimal Execution of Portfolio Transactions, here.
Abstract
We consider the execution of portfolio transactions with the aim of minimizing a combination of volatility risk and transaction costs arising from permanent and temporary market impact. For a simple linear cost model, we explicitly construct the efficient frontier in the space of time-dependent liquidation strategies, which have minimum expected cost for a given level of uncertainty. We may then select optimal strategies either by minimizing a quadratic utility function, or by minimizing Value at Risk. The latter choice leads to the concept of Liquidity-adjusted VAR, or L-VaR, that explicitly considers the best tradeoff between volatility risk and liquidation costs.
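The abstract does not spell the solution out, but for a feel of its shape: the continuous-time solution usually quoted from this paper, for linear temporary impact eta, volatility sigma, and risk aversion lambda, has holdings decay as x(t) = X*sinh(kappa*(T-t))/sinh(kappa*T) with kappa = sqrt(lambda*sigma^2/eta). A toy sketch with made-up parameters:

#include <math.h>
#include <stdio.h>

/* Tabulate the continuous-time Almgren-Chriss trajectory
   x(t) = X * sinh(kappa*(T - t)) / sinh(kappa*T).
   All parameter values below are invented for illustration. */
int main(void)
{
    double X      = 1.0e6;    /* shares to liquidate          */
    double T      = 5.0;      /* trading horizon, days        */
    double sigma  = 0.95;     /* daily volatility, $/share    */
    double eta    = 2.5e-6;   /* temporary impact coefficient */
    double lambda = 2.0e-6;   /* risk aversion                */
    double kappa  = sqrt(lambda * sigma * sigma / eta);

    for (int k = 0; k <= 10; k++) {
        double t = T * k / 10.0;
        printf("t = %4.2f  holdings = %12.1f\n",
               t, X * sinh(kappa * (T - t)) / sinh(kappa * T));
    }
    return 0;
}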
Counterparty Valuation Adjustment Analytics 101
Found some old prose for a study I wrote on Counterparty Valuation Adjustment for OTC derivatives. I was thinking to pull this together for publication back in the day and then saw Jon Gregory’s CVA book was already out on the shelves, so this prose kind of got lost. I had not looked at it for a couple of years and the details on the per trade mitigants were not foremost in my memory when I reread this. It isn’t bad. The Analytics are really quite straightforward. Getting the client data clean and lined up and the grid/cache layout of the Monte Carlo grid execution are really the whole game here.
Counterparty Valuation Adjustment Analytics
Once you have the Reference Data and Trade Inventory in hand, the problem is to apply suitable CVA models allowing a production CVA computation to produce valuation, sensitivity, and explanatory information in a timely fashion for the desk and P&L reporting. Each desk’s product group maintains a series of models coded in an analytics library for the P&L valuation of each of their particular products. For each trade the analytics compute the expected fair value of the position given a suitable series of underlying prices, spreads, and market levels for that particular product. As part of their daily P&L batch they retrieve market data and compute the MTMs, risk sensitivities, and PAA for each active trade in their inventory on the current business day. Now for CVA, the product group and research develop a model for estimating the expected future MTM exposure, not for just a single product trade (and its hedges), but for a P&L batch trade portfolio corresponding to a particular counterparty or even a specific Master Agreement signed with a specific counterparty. In this case the model evolves a set of relevant prices, spreads, and market levels forward in time in order to calculate an estimate of the total MTM exposure of multiple products simultaneously. Whereas the product group’s P&L pricing model has to get the expected fair value for product A and be well behaved with respect to other products used to hedge product A, the CVA model sort of has to get a fair price for all the products in a portfolio simultaneously by evolving a larger correlated set of underlying prices, spreads, and market levels forward in time. Remember someone is going to try to hedge the resulting CVA risk.
The Asset Charge and Liability Benefit are the main valuation components of the CVA numbers used to describe the credit exposure due to fluctuations observed in the credit spreads of OTC derivative counterparties. One important distinction to note upfront is that the CVA numbers are treated as portfolio level P&L quantities; unlike MTM P&L, the CVA numbers do not generally get broken down to the trade level. We will see why that is by first describing how CVA numbers are computed on a single trade and then examining how CVA numbers are generally computed for a portfolio of trades subject to the same ISDA Master Agreement (or, absent that, the same Legal Entity x counterparty).
Counterparty risk exists on those individual trades for which the current or potential future replacement cost from the Dealer’s perspective may be non-zero. Think of two trades entered into by Party A, a dealer. The first trade is a 5Y payor swap sales trade with Party B, a client of the Dealer. Party B default protection is currently trading at 320 bps over USD Libor. The second trade is a 5Y receiver swap, exactly offsetting the first trade, between Party A and Party C. Party C default protection is currently trading at 165 bps over USD Libor. The first trade is marked to market at $100 today and the second trade at -$100. From Party A’s perspective there is no market risk on the hedged position; however, since Party B and Party C default protection trades differently (note the different spreads over Libor) there is counterparty risk for Party A. The trade currently marked at $100 is unlikely to exactly offset the second trade with Party C because the market expects that Party B has a significantly greater chance to default on its payment obligations than Party C. Notice there is nothing special about the underlying swap – it could as well be an FX option, commodity swap, or a local market swaption.
Looking more closely at the first trade between Party A and Party B, we see that the positive MTM trade is slightly less positive when you consider Party B’s market implied expected capacity to pay – this is the CVA quantity called the Asset Charge. Similarly, Party B, in the event of its own default, will only pay Recovery Rate * $100 to Party A, so the liability arising from the first trade’s MTM is, once again, slightly less negative (from Party B’s perspective) – the Liability Benefit to Party B.
Asset Charge (Party A) = – Liability Benefit (Party B)
One method to quantify the magnitude of the CVA adjustment (e.g., the Asset Charge) to the MTM to reflect the counterparty risk is to imagine that, in addition to printing the 5Y payor swap, Party A implicitly sold Party B an option to knock out the 5Y payor swap on default, with a termination fee of Recovery Rate * the MTM of the 5Y payor swap at the time Party B defaults. The value of the CVA Asset Charge is identically the value of the knock-out option and therefore also the magnitude of the Liability Benefit.
So how does one compute the Asset CVA on a single 5Y Payor Swap? The issue in determining CVA is that, unlike coupons in fixed rate bonds, swap cash flows are not certain and future values can be positive or negative with asymmetric outcomes in the event of default by either counterparty. Let’s assume we employ a Monte Carlo analytic framework even though in certain cases closed form solutions or other faster numerical approximations may be used in practice. One important factor in the Monte Carlo framework is the ability to use the Front Office valuation for the day zero mark to market. The mark from which the process evolves is the P&L mark so one might expect that the major approximation error to monitor is the Monte Carlo convergence in the context of the market factor evolution/diffusion model.
The CVA Monte Carlo framework dictates that we compute PVs along a set of paths of future interest rates using current market (forward) rates and implied volatilities indexed by a series of tenors. This is likely to be the dominant part of the runtime of the entire CVA computation. Any good runtime estimates need more information on the specifics of the quantitative model and numerical approximation frameworks selected by the product groups and Research.
The estimated data requirement for 2000 paths and 300 tenors at 8 bytes a double is, worst case, 4.8 MB per trade (presumably we are running 300 tenors out to 30Y, so not all tenors will be needed for a 5Y swap). Assuming 1MM trades in scope we are looking at 4.8 TB (worst case) of intermediate storage for PVs. From a computational runtime perspective I think you already know that the computation needs to reside entirely in a single process memory address space (in, say, a single grid processor after the embarrassingly parallel MC computation is load balanced across a grid) to avoid absolutely egregiously bad memory performance.
The tenors cover the remaining time to maturity of the 5Y payor swap. For a given set of N simulations let PVi(t) denote the mark to market exposure at simulation path i and tenor t. Additionally, we need to segregate the positive and negative contributions to the expected exposure by defining posPV and negPV. Note that when we take the max() and min() we lose the expectation linearity past this point in the process of computing CVA.
posPVi(t) = max(0, PVi(t))
and
negPVi(t) = min(0, PVi(t))
The computation requires the valuation of the swap at a sequence of tenors out to the maturity of the swap, at which point the mark to market is zero. Additionally the computation determines the EPE and ENE, the average positive mark to market (posPV) and the average negative mark to market (negPV). The data volume at this step is proportional to the number of counterparties times the number of tenors.
EPE(t) = 1/N * ∑(i=1,N) posPVi(t)
ENE(t) = 1/N * ∑(i=1,N) negPVi(t)
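A minimal sketch of this aggregation, assuming the simulated PVs are already laid out as a dense N-path by T-tenor array; names and layout are illustrative, not a production API.

#include <stddef.h>

/* posPV/negPV/EPE/ENE per the formulas above: pv[i*T + t] is PV_i(t). */
void exposure_profiles(const double *pv, int N, int T,
                       double *epe, double *ene)   /* outputs, length T */
{
    for (int t = 0; t < T; t++) { epe[t] = 0.0; ene[t] = 0.0; }

    /* Accumulate path by path so the PV array is walked contiguously and
       the T-length accumulators stay hot in L1/L2. */
    for (int i = 0; i < N; i++) {
        const double *row = pv + (size_t)i * T;
        for (int t = 0; t < T; t++) {
            double v = row[t];
            if (v > 0.0) epe[t] += v;   /* posPV_i(t) = max(0, PV_i(t)) */
            else         ene[t] += v;   /* negPV_i(t) = min(0, PV_i(t)) */
        }
    }
    for (int t = 0; t < T; t++) { epe[t] /= N; ene[t] /= N; }
}

Accumulating path by path is the partial-sum layout referred to in the runtime discussion further down.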
Compute the Unilateral CVA, or Asset CVA, by discounting the loss on default * EPE(t) termwise per tenor with the risky discount factor to obtain a scalar value. The Liability CVA is similarly discounted by the Firm’s loss on default and risky discount factors. Typically, a Unilateral CVA has been applied to trades where the expected counterparty exposure is positive (i.e., a receivable from the counterparty), occasionally referred to as Asset CVA or Asset Charge.
Asset CVA= ∑(t=1, T) delta t * EPE(t) * (1 – Recovery Rate) * CP.rdf(t)
Liability CVA = ∑(t=1, T) delta t * ENE(t) * (1 – Dealer Recovery Rate) * Dealer.rdf (t)
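And a sketch of the discounted sums, assuming the tenor accrual fractions, recovery rates, and risky discount factor curves are handed in from the market data layer (names are illustrative).

/* Asset CVA     = sum_t delta_t(t) * EPE(t) * (1 - CP Recovery Rate)     * CP.rdf(t)
   Liability CVA = sum_t delta_t(t) * ENE(t) * (1 - Dealer Recovery Rate) * Dealer.rdf(t) */
double asset_cva(const double *epe, const double *delta_t,
                 const double *cp_rdf, double cp_recovery, int T)
{
    double cva = 0.0;
    for (int t = 0; t < T; t++)
        cva += delta_t[t] * epe[t] * (1.0 - cp_recovery) * cp_rdf[t];
    return cva;
}

double liability_cva(const double *ene, const double *delta_t,
                     const double *dealer_rdf, double dealer_recovery, int T)
{
    double cva = 0.0;
    for (int t = 0; t < T; t++)
        cva += delta_t[t] * ene[t] * (1.0 - dealer_recovery) * dealer_rdf[t];
    return cva;
}

Bilateral CVA, defined next, is then just the sum of the two.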
Unilateral CVA can be defined as the difference between the risk-free portfolio value and the true portfolio value that takes into account the possibility of a counterparty default. Bilateral CVA requires adding a term to the Unilateral CVA that accounts for the risk the Dealer Legal Entity, as a party to the trade, poses to the counterparty.
Bilateral CVA = Asset CVA + Liability CVA
The Firm’s current CVA exposure (Asset CVA) to the counterparty is a function of current MTMs (as we just discussed) as well as the effect of any legally enforceable exposure mitigations. We are going to review the exposure mitigants at the single contract level now. Ignoring credit derivative hedges, at the individual trade level the applicable exposure mitigants include:
- Margin/Collateral
- Stop Loss and Recouponing
- Guarantees
- Optional Early Termination and
- Reverse Walk Away Clauses and Extinguishing Derivatives.
Margin and collateral are applicable for discussion at the single trade level but really get implemented at the counterparty account level. Typically, an investment bank has an entire independent department of folks running margin for the Firm based on a centralized database of accounts opened for counterparties and their trading/legal entities at the inception of the trading relationship. Historically, the challenge for the desk is almost always the integration of views between the counterparty account level collateral-on-hand reports from Margin and the P&L book level MTM and risk reports from the desk for a specific trader. Margin happens to be a portfolio exposure mitigant that can be discussed at the single trade level, so we will take advantage of that and cover it here before we proceed to the portfolio CVA computation.
Margin agreements for OTC derivatives are typically in the form of an ISDA Credit Support Annex to the main ISDA Master Agreement. Among other things the CSA sets a ratings based schedule of Collateral thresholds. In a vastly simplified description, if at a given time a counterparty has a Moody’s rating of A the schedule is used to determine a dollar (in general, some stipulated currency) level for the value of the collateral that must be held in the margin account. The value of the MTM minus the posted collateral must be less than the threshold. If the threshold is breached the Margin folks call up the counterparty and request no less than a minimum transfer amount (also in the CSA) of collateral to restore the threshold integrity. So, in effect, the greater the Firm’s CVA exposure to a counterparty, on a particular trade, the more collateral the Margin folks are holding. Think of the positive MTM of the trade as a loan to the counterparty and the margin folks as maintaining a dynamic portfolio of cash-like assets in a counterparty account securing the loan. The CVA computation can model margin by bounding the expected PV of the trade by the residue after the counterparty margin call. Below we show the residue as the remainder after dividing the PV by the ratings based threshold (we are ignoring the Minimum Transfer Amount here) assuming that the margin agreement is bilateral (i.e., both the counterparty and Dealer post collateral per the CSA).
posPVi(t) = max(0, res(PVi(t), CP Threshold))
and
negPVi(t) = min(0, res(PVi(t), Dealer Threshold))
Then use posPV and negPV to compute EPE and ENE (and CVA) as above. We are just describing the tip of the iceberg that is Margin and Collateral modelling. From a computational runtime perspective Margin has a non-trivial cost, which we will account for in estimates presented later in the discussion.
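A sketch of the collateralized exposures, reading res() as the remainder after dividing the PV by the ratings-based threshold (fmod) and ignoring the Minimum Transfer Amount, as in the text above.

#include <math.h>

/* res(PV, threshold): remainder after dividing the PV by the threshold. */
static double res(double pv, double threshold)
{
    return fmod(pv, threshold);
}

/* posPV_i(t) = max(0, res(PV_i(t), CP Threshold))
   negPV_i(t) = min(0, res(PV_i(t), Dealer Threshold)) */
static double pos_pv_collat(double pv, double cp_threshold)
{
    return fmax(0.0, res(pv, cp_threshold));
}

static double neg_pv_collat(double pv, double dealer_threshold)
{
    return fmin(0.0, res(pv, dealer_threshold));
}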
In several trading jurisdictions where legal concerns about the enforceability of margin agreements have arisen, a trade may include specific contractual clauses to limit distressed counterparty exposures as well as loss limits in the event of defaults. A stop loss agreement is an MTM threshold trigger at which the contract unwinds. The parties to a stop loss exercise change both their market and counterparty risk profiles. Recouponing is an MTM threshold triggered unwind followed by reprinting the terms of the unwound trade as a par trade. So after the unwind fee is paid the parties to the recouponed trade exercise maintain the same market risk but not the same counterparty credit risk. Third party guarantees may be used to offset counterparty default risk. Think of the once popular monoline wraps of synthetic CDO tranches, for example. The credit risk of the guaranteed tranche is dependent on the correlated default probability of the counterparty and its guarantor. Optional Early Termination is negotiated on a per trade basis allowing Party A and Party B to terminate and settle a trade at a predefined date (or schedule of dates) prior to the nominal maturity date. Finally, reverse walk away clauses or extinguishing derivatives are gaining some traction. These are trade clauses, invoked at the time of a counterparty default, stating that the Firm (as Party A to the trade) has no claim for the MTM of the trade. The Firm is short a binary credit default option and is exposed to recovery rate market risk. Analysis of the possibility of the Firm actually exercising these clauses against a counterparty in distress must evaluate the value of the trading relationship. The simple observation that the Firm is in effect terminating the trading relationship with the distressed counterparty by invoking these clauses, causing additional economic hardship to the client when they can least afford it, limits the exercise of these options. In the case of a counterparty default, or the counterparty exercising against the Firm, the trading relationship is presumably concluded in any case, making exercise of these trade-by-trade negotiated options more plausible. From a computational runtime perspective the additional cost of these clauses, beyond the inventory lookup runtime already enumerated, is small. The evaluation runtime cost seems likely to be small as well.
Some of the most significant counterparty portfolio exposure mitigants are the close-out netting provisions defined in the ISDA Master Agreements. The general idea is that a Firm should not have to recognize a given trade’s potential counterparty exposure if that exposure is adequately hedged by another trade (or set of trades). To discuss netting we need to extend the CVA computation discussion from the single trade case to the portfolio context.
For a given set of N simulations let PVij(t) denote the mark to market exposure at simulation path i for the jth trade in the netting group at tenor t. Netting at the PV level lets you choose the order of summation since expectation linearity is, so far, preserved. In this case we choose to net at the PV level and maintain the respective margin thresholds:
posPVi(t) = max(0, res(∑(j=1,J)PVij(t), CP Threshold))
and
negPVi(t) = min(0, res(∑(j=1,J)PVij(t), Dealer Threshold))
The average of all positive MTM values is called the “Expected Positive Exposure” or EPE. Similarly we can calculate “Expected Negative Exposure” or ENE. EPE (and ENE) use market rates and implied volatilities rather than historic vols. Assume that all the trade-by-trade mitigants are handled separately and aggregated appropriately to get an accurate CVA result. The data volume at this step is proportional to the number of counterparties times the tenors.
EPE(t) = 1/N * ∑(1,N) posPVi(t)
ENE(t) = 1/N * ∑(1,N) negPVi(t)
As in the single trade case,
Asset CVA= ∑(t=1, T) delta t * EPE(t) * (1 – Recovery Rate) * CP.rdf(t)
Liability CVA = ∑(t=1, T) delta t * ENE(t) * (1 – Dealer Recovery Rate) * Dealer.rdf (t)
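A sketch of the netted aggregation: sum the per-trade PVs in the netting set on each path and tenor, apply the thresholds and the max/min, and the EPE/ENE and CVA sums then go through exactly as in the single trade case. Array layout and names are illustrative.

#include <math.h>
#include <stddef.h>

/* pv_trade[j] points to trade j's N x T PV array; epe/ene have length T. */
void netted_exposure_profiles(const double *const *pv_trade, int J,
                              int N, int T,
                              double cp_threshold, double dealer_threshold,
                              double *epe, double *ene)
{
    for (int t = 0; t < T; t++) { epe[t] = 0.0; ene[t] = 0.0; }

    for (int i = 0; i < N; i++) {
        for (int t = 0; t < T; t++) {
            double net = 0.0;
            for (int j = 0; j < J; j++)        /* netting: sum over trades */
                net += pv_trade[j][(size_t)i * T + t];
            epe[t] += fmax(0.0, fmod(net, cp_threshold));      /* posPV with margin */
            ene[t] += fmin(0.0, fmod(net, dealer_threshold));  /* negPV with margin */
        }
    }
    for (int t = 0; t < T; t++) { epe[t] /= N; ene[t] /= N; }
}

In production you would fold the netted sums in as partial sums alongside the original PV computation rather than re-walking a multi-terabyte PV array, which is exactly the accounting used in the runtime estimate below.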
This worst case data volume can be brought down a bit by collecting the trades per Master Agreement or counterparty: ~50K master agreements versus perhaps 1MM trades. From a runtime perspective, even with the reduction in the data volume it seems likely the computation needs to reside in a single memory address space (a single grid process) from computation inception to completion. All in, the Master Agreement portion of the CVA computation costs the runtime for the production of the PVs (the dominant part of the runtime) plus the computation for the posPV, negPV, EPE, ENE, Asset CVA, and Liability CVA. posPV and negPV should be computed as partial sums alongside the initial PV computation. Let’s allocate 1000 cycles for cache misses for each of the N paths. The min() and max() compare costs 10 cycles per path. Let’s assume all million trades are in the netting group, but since we accumulate them as partial sums in the original PV calculation we only need to account for one incremental addition more or less hitting the L1 caches, or 4MM cycles per path. The residue will cost you a divide, multiply, and subtraction – call it 50 cycles (the divide is probably dominant at ~25 cycles). So the posPV and negPV look like 2000 paths * ~4MM cycles (all in) = 8 billion cycles at 3 GHz, or about 3 seconds of runtime. The runtime of the EPE and ENE computation is 4000 adds and a couple of multiplies and is completely dominated by the posPV and negPV runtime even if the execution suffers L1 cache misses on every single one of the 4000 adds. Ditto the Asset CVA and Liability CVA computations, so all in there is about 3 seconds of aggregate runtime (with a contemporary microprocessor and competitive code) for all the Master Agreements on top of the product group’s PV computation.
There is a demand for computing Economic CVA and measures for reporting Cost of Funding as well as Asset and Liability CVA. Risk sensitivities, distressed market Scenarios, and PAA are required at a small multiplicative runtime cost over that already outlined in the preceding CVA valuation discussion.
Conjugate Gradient FP Benchmark and Common Knowledge
Nicole Hemsoth, HPCWire, LINPACK Creator Sheds Light on Emerging HPC Benchmark, here. Interestingly both LINPACK and HPCG are of limited relevance to a major chunk of Wall Street analytics. On The Street if you ain’t a Magoo and you’re missing in L2, you’re doing it wrong. Efficient use of the NUMA interconnect by the Top Gun Erlang Low Latency Analytics boys, surely Sir you jest?
Back in June during the International Supercomputing Conference (ISC), we discussed the need for a potential alternative to the current LINPACK benchmark, which is the sturdy yardstick by which supercomputing might is measured, with its creator, Dr. Jack Dongarra.
At that time, he discussed a new benchmarking effort that is taking shape with the input of several collaborators, called the high performance conjugate gradient (HPCG) benchmark. The news about this effort drew a great deal of positive reaction from the scientific computing community in particular as it is more in tune with the types of modern and future simulations that are actually running on LINPACK top-ranked systems on the Top500. This new benchmark will be announced in further detail tomorrow (Tuesday) during the Top500 announcement and will be made available to be tested across a wider array of systems.
W. Ben Hunt, Epsilon Theory, A Game of Sentiment, here. So Sally Kellerman reads for MASH, and Altman tells her she has the best role in the movie, and she’s not sure the part has enough lines. 30+ min of burn for Bogut, Z-Bo 26-15 – 2 stls and good %s, but Pierce sucked hard and I had to watch it on TeeVee because Joe Johnson is now on the wire. Farmers ended up pulling Mo Williams off the wire and leaving Iso Joe even though Mo barely gets 30+ burn in the loaded Blazer backcourt. Johnson cannot be this bad the entire year. Prokhorov must be losing his shit realizing that at this point in KG’s career, Prokhorov can probably score on him, if you give him ten tries standing directly under the basket, with the ball, and his dribble.
But the most interesting aspect of the CK game played on the Island of the Green-Eyed Tribe is the role of the Missionary. It is the public statement of information, not the prevalence of private information or beliefs, that forces movement in the CK game. The public statement is what creates Common Knowledge, even if all of that knowledge was already there privately. Everyone must see that everyone else sees the same thing in order to unlock that privately held information and drive individual decisions and behavior.
QE’s portfolio rebalancing effect has been underestimated
Cardiff Garcia, FT Alphaville, GS: QE’s portfolio rebalancing effect has been underestimated, here. Index pricing looks interesting with the limited corp bond liquidity.
Here is the explanation from Edgerton:
Unlike equities, many corporate bonds trade infrequently. In the current iBoxx US investment grade index, for example, about 15% of bonds in the index trade less than once a month, and only about 65% trade on any given day. Even fewer bonds trade multiple times per day in substantial sizes. Thus prices and yields for a large fraction of the bonds that are aggregated into published indices must be estimated by the providers of the index data each day.
Unfortunately, it appears that the procedures used to estimate these prices do not incorporate all information available on each day, because future movements in bond indices are easily forecastable well into the future. To illustrate, we regress daily changes from 2010 to present in the Bank of America-Merrill Lynch BBB index yield on contemporaneous and lagged daily changes in 5-year Treasury rates and daily changes in spreads on the 5-yr CDX index of corporate default swaps, a more liquid credit market instrument. Exhibit 1 graphs the cumulative effect over time of a 1 bp increase in 5-year Treasury yields and a 1 bp increase in the CDX index spread on BBB index yields.
but Patrick Beverley
Kurt Helin, Rotoworld, Patrick Beverley Latest News, here. Ongoing study of what Happiness really is. Wired McBob and Nene, Pulled ZBo and Nash from the wire. Lead Farmers are going to try and make a go of it with a bunch of players who only do a couple things well. Ibaka, Aston, KMart, Zbo, Varejao, and Bogut. I don’t understand where Ibaka has disappeared to, it’s messing up the plan. Patrick Beverley looks good in warmups, feels improved, and increased his standing with the Rockets by not playing.
Patrick Beverley reportedly “looked good” in Monday’s pregame warmups according to Pro Basketball Talk’s Kurt Helin, and also said that he “feels improved.” The Rockets got torched defensively tonight, in particular Jeremy Lin and the perimeter group, leaving Rockets beat writers to bemoan the absence of Beverley at regular intervals throughout the night. He actually gained value on a night he didn’t play, and while we’re not saying it will happen it wouldn’t be surprising if he accelerated his timetable given the totality of the situation. Given Beverley’s likely stat output and the chance he could eventually start for one of the NBA’s best fantasy teams, he shouldn’t be on waiver wires. Nov 5 – 1:45 AM
Chris Paul scored 23 points on 7-of-13 shooting (1-of-3 from deep, 8-of-8 from the line) with three rebounds, 17 assists and two steals in the Clippers’ 137-118 win over the Rockets on Monday.
Fantasy’s No. 1 play scoffs at your James Harden and Steph Curry selections as he has taken Doc Rivers’ suggestion to get more aggressive under complete consideration.