Vlad Tenev, Forbes, Quora, What Is The Technology Stack Like Behind A High-Frequency Platform, here. Odd that this comes out in Forbes, right? Ten-microsecond one-way gateway latencies on vanilla x86 gear with kernel bypass are standard with Linux. This year (2014) you will see, as standard, one-way sub-microsecond native-protocol gateways hooked to 10 Gb Arista ports. Minimizing the cash-equity latencies between colos is still a challenge: look for Weehawken to Mahwah at ~140 mics, Weehawken to Carteret at ~100 mics, Carteret to Mahwah at ~210 mics, and Aurora to Carteret at sub-5 millis. Strike Technologies is reporting 4.3 millis Aurora to Carteret. Mr. Tenev is spot on about the servers being on par with nice, inexpensive gaming rigs – hence the DynaRack opportunity. Maybe that will heat up after folks get their colo-to-colo latencies nailed down and deterministic. Once you know how you want your optimized ALGO to execute, you will have a pretty good idea of how to configure DynaRack to run the optimized ALGO code. It would not be shocking if there were a patent or two you could push through on the DynaRack buildout and configuration.
How about a service where, prior to the market open, you are offered the chance to test any of 1,000 randomly selected sessions to, say, the BATS matching engines? If you can detect any variation in latency across those sessions, you can then bid for exclusive use of a session for the remainder of the trading day. Nominally all the exchange sessions should have similar latency performance, but the exchange may be rolling out upgrades or new hardware, so let's just check to be sure. I'll bet there is some reproducible latency variation. This new service needs a name; call it DynaSession:
Hey, Joe, how’d you do in the DynaSession auction this morning?
Oh, not so bad, got six sessions with -2.3 mics to the mean apiece for 100 bucks.
For the lowest possible latency, you would plug the physical cross-connect directly into a network card on your trading server. However, this is seldom practical, as you will likely want multiple machines for failover/warehousing and will often have dozens of exchange cross-connects. So we recommended a low-latency switch, such as the 24-port Arista, which added about 350 nanoseconds of port-to-port latency. There was also a Blade Networks (now IBM) switch that used the same Fulcrum ASIC and was lower priced.
Next, your server would need a low-latency network card with a kernel bypass driver. We recommended Myricom (10G-PCIE2-8C2-2S) for UDP traffic and the Solarflare Flareon Ultra SFN7122F (a dual-port 10GbE PCIe 3.0 server adapter) for TCP.
Both of these cards provide kernel bypass drivers that allow you to send/receive data via TCP and UDP in userspace. Context switching is an expensive (high-latency) operation that is to be avoided, so you will want all critical processing to happen in user space (or kernel-space if so inclined).
We compiled and provided our clients with a lightweight custom build of Linux (Gentoo), which we maintained in-house.
We pinned our trading software to dedicated cores and ensured other processes didn’t run on those cores (again to avoid context switches). We also disabled local timer and other interrupts on those cores.
The platform logic code was all event-based and written in C. We developed our own lock-free data structures (FIFOs and ring buffers), and each platform thread basically ran as an infinite loop polling its own input FIFO.
The end result was a system with less than 15 microseconds of latency (wire to wire). In terms of cost, the low latency switch ran about $15k, and you could put the server together for less than $5k, on par with a nice gaming rig.