High Performance Trading – Deployment of Leading Technology – Electronic design services with Argon Design Ltd.

Skill sets used

High performance trading, FPGA logic, Business advantage

Since completing this proof-of-concept work in 2012, we have continued to develop our technology and have completed several customer projects. Our current best latency is 35ns.

The Challenge

High performance automated trading as a niche requires constant innovation at the leading edge of technology deployment.

Typically, technology invention runs ahead of the market's ability to deploy it. Firms are held back by a shortage of the skills needed to programme new code or to move existing functions to new platforms.

For high frequency strategies seeking arbitrage or market making opportunities, the technology in use can be very close to the latest frontiers of R&D, because these strategies require the lowest possible latency and high throughput. FPGA, GPU, many-core and x86 are the technologies to be harnessed, but keeping pace with the latest engineering thinking and capability is an intellectual and human stretch for all firms, whether large Tier-1s or small proprietary traders.

The vast majority of existing trading code is written for Linux on x86 processor architectures. A heterogeneous approach to processors means migrating or re-writing that code, which is expensive in resources and depends on leading-edge skills that are scarce in the market. Hence, while ground-breaking performance improvements are in sight, they come with cost and operational risks, which Argon is addressing.

Our Approach

We believe in a heterogeneous multi-layered solution of different processor architectures that uses FPGAs at the lowest level to handle tasks away from mainstream x86 components.

As an example of this approach, we have invested over a man-year in developing a prototype system in which market data feed analysis and fast-path trade execution are performed directly on the switch, under rules determined in parallel on “traditional” processors. A video accompanying this case study shows this split in tasks and its impact.

The system uses the Arista 7124FX switch. This switch is unconventional because it contains a large Altera FPGA with hardware-level access to 8 of its 24 10Gb Ethernet ports. This direct FPGA access allows data feeds to be parsed and analysed as close as possible to the feed network endpoints. Similarly, the heterogeneous processor mix in the switch enables other related functions to be undertaken and orders to be executed back onto the wire. Deployed in co-location (CoLo) at the trading venues, alongside the day-to-day mix of technology already found in the racks, this approach can raise both the design and the performance of trading functionality.

The Finteligent Trading Technology Community was set up informally in 2012 (led by Intel, OnX Enterprise Solutions and others) to test the various layers of the trading stack. For consistency, we used the Finteligent test harness to test the Argon Design pattern and measure the impact of our heterogeneous architecture.

The Finteligent reference set-up is described in their October 2012 report titled “Research Report: 10GbE Low Latency Networking Technology Review”. In that report, and in subsequent update reports on individual technologies, the impact on trading execution performance was measured for various switch, server, CPU and OS combinations. The collaboration has created the basis of de facto benchmarks.

The best performance for a traditional all-x86 design was 4,600ns. The Argon system, for the same measured leg of trading functions, achieved an execution latency of 170ns, which is roughly 27 times lower.
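
To put these figures in context, the short sketch below works through the arithmetic. It uses the two latencies quoted above and the 10Gb/s port speed of the switch. The helper name `wire_time_ns` is our own, and the calculation counts only raw serialisation time on the wire, ignoring preamble and inter-frame gap.

```python
# Worked arithmetic for the latencies quoted in the text.
# Assumption: a 10 Gb/s link, counting only raw byte serialisation time.

LINE_RATE_GBPS = 10          # 10Gb Ethernet ports on the switch
X86_LATENCY_NS = 4600        # best all-x86 result quoted above
FPGA_LATENCY_NS = 170        # Argon prototype result quoted above


def wire_time_ns(num_bytes: int) -> float:
    """Time to serialise num_bytes onto the wire, in nanoseconds."""
    return num_bytes * 8 / LINE_RATE_GBPS


# How many times lower the FPGA latency is.
speedup = X86_LATENCY_NS / FPGA_LATENCY_NS

# How many byte times fit into each latency figure.
fpga_byte_times = FPGA_LATENCY_NS / wire_time_ns(1)
x86_byte_times = X86_LATENCY_NS / wire_time_ns(1)

if __name__ == "__main__":
    print(f"speed-up: {speedup:.1f}x")
    print(f"170ns in byte times at 10Gb/s: {fpga_byte_times:.1f}")
    print(f"4,600ns in byte times at 10Gb/s: {x86_byte_times:.1f}")
```

At 10Gb/s one byte occupies 0.8ns on the wire, so 170ns corresponds to only a couple of hundred byte times. That is why decisions have to be taken while a frame is still arriving rather than after it has been fully received.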

Our system uses three key approaches:

  • Rule-based operation. A higher level x86 system provides rules that, when matched, trigger the fast path execution.
  • Inline parsing and matching. An Ethernet frame is parsed as it arrives at the switch. This allows partial information to be extracted before the whole frame has been received.
  • Pre-emption of market orders. Instead of waiting until the end of a potential triggering input packet, we start sending the overhead part of a response, which contains the Ethernet, IP, TCP and FIX headers. This allows a populated response to be sent right after a match is made. If a match is not made, the Ethernet frame is “poisoned” by giving it an invalid frame check sequence (FCS). The aborted Ethernet frame will then be discarded by the network device at the other end of the cable.
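
The three approaches above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the FPGA implementation: the `Rule` class, the `fast_path` function, the byte offsets and the placeholder header bytes are all hypothetical. The FCS calculation uses Python's `zlib.crc32`, which implements the same CRC-32 as the Ethernet FCS.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule supplied by the slower x86 layer."""
    offset: int        # byte offset of the field within the incoming frame
    pattern: bytes     # value that must appear at that offset
    order_body: bytes  # pre-built order payload to send on a match


def ethernet_fcs(frame: bytes) -> bytes:
    """CRC-32 of the frame, in the byte order used for the Ethernet FCS."""
    return zlib.crc32(frame).to_bytes(4, "little")


def fcs_is_valid(frame_with_fcs: bytes) -> bool:
    """Check the trailing 4-byte FCS, as a receiving device would."""
    return ethernet_fcs(frame_with_fcs[:-4]) == frame_with_fcs[-4:]


def fast_path(incoming: bytes, rule: Rule, headers: bytes):
    """Model of the fast path for one incoming frame.

    Returns (outgoing_frame, matched, decided_at), where decided_at is the
    index of the incoming byte at which the decision was taken, or None if
    the frame ended before the field was complete.
    """
    end = rule.offset + len(rule.pattern)
    window = bytearray()
    matched = False
    decided_at = None

    # Inline parsing: consume the frame byte by byte and decide as soon as
    # the field of interest is complete, without waiting for the frame end.
    for i, byte in enumerate(incoming):
        window.append(byte)
        if len(window) == end:
            matched = bytes(window[rule.offset:end]) == rule.pattern
            decided_at = i
            break

    # Pre-emption: in hardware the response headers are already being sent
    # while the loop above runs; here they are simply prepended.
    outgoing = headers + (rule.order_body if matched else b"")

    fcs = ethernet_fcs(outgoing)
    if not matched:
        # Poison the frame: invert the FCS so the receiver discards it.
        fcs = bytes(b ^ 0xFF for b in fcs)
    return outgoing + fcs, matched, decided_at


if __name__ == "__main__":
    headers = bytes(54)  # placeholder for Ethernet, IP, TCP and FIX headers
    rule = Rule(offset=10, pattern=b"BUY", order_body=b"hypothetical-order")
    hit = bytes(10) + b"BUY" + bytes(50)
    frame, matched, decided_at = fast_path(hit, rule, headers)
    print(matched, decided_at, fcs_is_valid(frame))
```

In this model a matching frame produces a response with a valid FCS, and the decision is taken at the last byte of the matched field rather than at the end of the frame. A non-matching frame produces a response whose FCS fails the check, so the receiving device drops it.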

The overall effect is a dramatic reduction in latency, bringing it close to the theoretical minimum. The key factor is the selection and separation of tasks between hardware and software processing.
