
Report: High Performance Trading FIX Messaging Testing for Low Latency

Abstract: FIX is the de facto standard protocol used extensively for electronic communication between buy-side and sell-side firms and execution venues, where the performance requirements of algorithmic and high frequency trading are extreme and/or the benefits of STP (straight-through processing) are sought from electronic connectivity.

January 2012 WITH THANKS TO THE TEAM AT INTEL FASTERLAB UK

STATEMENT OF CONFIDENTIALITY / DISCLAIMER This document has been prepared by the consortium of companies described herein. No part of this document shall be reproduced without the consultation of these parties and acknowledgement of its source. Contact can be made to FIX@onx.com.


FIX Messaging Low Latency Testing

TABLE OF CONTENTS

1. SUMMARY
2. INTRODUCTION
   2.1 PURPOSE
   2.2 ROLES AND RESPONSIBILITIES
   2.3 CONDUCT AND PRESENTATION
3. METHOD
   3.1 TEST HARNESS – SOFTWARE DESIGN
   3.2 TEST HARNESS – HARDWARE DESIGN
   3.3 THE MESSAGE PASSING PROCESS
   3.4 TIMINGS
   3.5 POST-TEST DATA PROCESSING
   3.6 TEST SCENARIOS
4. RESULTS AND OBSERVATIONS
   4.1 EFFECTIVENESS OF KERNEL BYPASS
   4.2 MAIN RESULTS
5. DISCUSSION
   5.1 VALUE OF THE EXERCISE TO THE ELECTRONIC FINANCIAL TRADING COMMUNITY
   5.2 PERFORMANCE OF THE TEST RIG
   5.3 RAISING THE TEST RIG TO PRODUCTION STANDARD
   5.4 EXPLOITING THE RESULTS
6. CONCLUSION
APPENDICES
   1A TECHNOLOGY MEMBERS

January 2012



1. Summary

This briefing paper reports on the activity of a consortium of leading IT vendors that have joined forces to create demonstrable high performance solution stacks addressing common business requirements in financial trading. The initial focus of the consortium is a referenceable technology stack of products and services to support FIX protocol communication functions. The paper describes the test environment, documents a set of benchmark tests performed on both commercial and open source FIX engine offerings, and details and interprets the representative latency and throughput figures achieved.

The objective is to create transparency in, and capability around, comparing performance statistics for key functions along the trading life cycle. The tests used business workloads and were deliberately aligned to reflect the market's current interest in the measurement of interparty latency across the trade life cycle, using FIX formatted messages for defined legs. An ongoing objective is to provide the market with useful data to support decisions in technology investment; a range of technologies and application software has therefore been addressed. Approaches were made to a number of application vendors, with ultimate agreement to test FIX engines covering C++ and Java implementations from EPAM Systems' B2BITS unit and Rapid Addition, respectively. As a datum point for comparison, the open source QuickFIX engine was used in both its C++ and Java variants.

OnX Enterprise Solutions Ltd is leading a consortium whose charter members include Intel, Dell, Arista Networks and Solarflare Communications, with additional services provided by Edge Technology Group, GreySpark Partners and Equinix. The foundation objective is transparent, comparative performance statistics for key functions along the trading life cycle using business workloads – FIX being used on a number of legs of the typical trade life cycle.
A series of tests was undertaken to demonstrate the value of commercial software (versus open source) and of specialist technologies in a low latency infrastructure. The consortium approach recognises that the creation of high performance solutions requires the interaction of many leading edge technologies and the integration of components from several vendors. These parties must work together to specify the correct parts and then tune them together so that a complete and reliable solution is available through a collective single channel.

Results showed that both B2BITS' and Rapid Addition's commercial FIX engines outperformed the open source QuickFIX offerings (C++ and Java) across a range of tests, being between 4 and 16 times faster in generating messages during a standardised simulated trade. The average latency for the commercial engines was 11 to 12 microseconds, whereas the open source engines were between 45 and 180 microseconds. The variation in results was equally stark: the frequency distributions from the commercial engines were bell curved, while the open source results had a long, fat tail. This indicated that the commercial solutions significantly reduced the effect of network jitter, and with it the undesired variance in performance.

B2BITS' FIX Antenna engine was a C++ implementation; Rapid Addition's Cheetah engine was Java. Both demonstrated similar performance characteristics over a range of tests and workloads. The similarity of results between the commercial C++ and Java engines stood in contrast to the open source equivalents, demonstrating that Java can perform as well as C++ code when implemented in an optimised fashion.


2. Introduction

In the online and co-located financial trading markets, performance, in terms of both latency and throughput, is paramount. It is the difference between a firm being 'in the market' or not. Complete trading systems are built from many complex elements, including market data capture, trading algorithms, trade execution and in-flow risk analysis. These elements run on critical infrastructure components – hardware, software, network and connectivity – all of which must interoperate with each other. Today there is a lack of industry-recognised benchmarks that can demonstrate to designers that solutions have 'high performance' characteristics. To achieve performance and agility, with low up-front and ongoing operating costs, trade infrastructure implementation teams need to source the best available components from different innovative specialist vendors, integrate them and tune their interoperability. FIX is the de facto standard protocol used extensively for electronic communication between buy-side and sell-side firms and execution venues, where the performance requirements of algorithmic and high frequency trading are extreme and/or the benefits of STP (straight-through processing) are sought from electronic connectivity.

2.1 Purpose

FIX message generation is an increasingly important leg in automated trading and can be a source of significant latency and jitter, which can adversely impact the success of business and trading strategies. As trading strategies require access to a greater diversity of execution venues, communication over the standard FIX protocol is more cost effective than accessing markets via the diverse proprietary protocols of the various venues. Infrastructure deployment teams have to select appropriate components, integrate them, commission them and deploy them for maximum performance, which can be an extreme challenge. It requires a combination of knowledge, skills, experience and deployment ability that is today scarce and expensive in the market. The testing undertaken by OnX in the Intel lab, with support from the consortium, was to investigate these assertions:

1. Using commercial FIX engines would achieve lower latency and less jitter.

2. Using specialist low latency network techniques would have a significant impact on latency.

Full results for each environment and latency improvement are available on request.

2.2 Roles and Responsibilities

OnX Enterprise Solutions, as a "solution facilitator", led a collaborative approach through the creation of a consortium of IT vendors focused on high performance infrastructure designs specifically for financial trading systems. OnX consultants provided input into the hardware selection, the conduct of the tests and the post-test analysis.


The benchmarks were conducted at Intel's fasterLAB in the UK. Intel engineers screened hardware and software performance for optimisation, performed the tests, recorded the results and carried out post-test processing to produce the averages tables and graph outputs. Software suppliers Rapid Addition and B2BITS EPAM Systems provided their FIX engines. The test harness was designed by Rapid Addition, and the open source implementations in Java and C++ were supplied by Rapid Addition and B2BITS respectively.

2.2.1 Consortium Members

A number of technology and services providers have invested as charter members of the consortium. However, the initiative is open, and further participant members may be added in the future. Between them, these members provide a complete infrastructure capability and created the reference architecture, each drawing on specific expertise, while OnX provided the integration and build capability. The charter group members directly involved in building the initial technology stack and in the performance benchmark testing comprise:

Lead:
- OnX Enterprise Solutions – product procurement and architecture design

Infrastructure component providers:
- Arista Networks – network switch
- Dell – x86 servers
- Intel – Intel® Xeon® processors, and lab environment
- Solarflare Communications – network interface card

Implementation and deployment services:
- Edge Technology Group – buy-side solutions
- Equinix – trading ecosystem hosting
- GreySpark Partners – capital markets business, management and technology consulting services

Applications under test:
- Rapid Addition – FIX engine
- B2BITS EPAM – FIX engine
- QuickFIX – open source FIX engine

2.3 Conduct and Presentation

Tests were performed by Intel engineers, and preliminary results were shared with the software suppliers, who were then given an opportunity to optimize their code. A second round of testing was then conducted, the results of which were used in the preparation of this paper. The software houses had access to the test harness prior to testing, in order to agree and finalize the methodology, but no access or amendment was allowed during the test runs. All results were captured by Intel and shared with OnX. Only Rapid Addition's results were shared with Rapid Addition; likewise, B2BITS EPAM's results were shared only with B2BITS EPAM.


3. Method

3.1 Test Harness – Software Design

The test harness used to perform the benchmarks was designed by Rapid Addition (audited by B2BITS EPAM), with implementations for C++ and Java written by B2BITS EPAM and Rapid Addition, respectively. The test software was implemented across two servers. One ran simulators for a market data (MD) feed and an execution venue (EV); the other represented a typical software implementation of a real-life algorithmic trading system, with all applications run on a single server to minimise latency, and with 'in-process' linkage of the algorithmic application logic and the FIX engine under test.

The tests measured recognisable legs in the trading life cycle, mapping to real-life workflow scenarios and reflecting current industry interest in the measurement of interparty latency over discrete legs of a trading cycle. The deliberately simple logic of the simulated algorithmic trading component minimises latency and jitter, allowing the benchmark to focus on the FIX engines themselves. The benchmarking focused on the engines' ability to process (a) FIX-formatted market data and (b) order processing messages for two defined stages in the trade cycle, at different throughput rates, over both burst and prolonged periods.

The companies under scrutiny were given controlled access to the test rig with the ability to run tests, analyse results, tune and re-test. This activity was supported by skilled Intel engineers, who were also available to assist the companies in optimising their code for the target hardware stack. The diagram below illustrates the test harness with its simulated market data feed and execution venue.

Figure 1: Test Harness Overview


3.2 Test Harness - Hardware Design In order to conduct benchmark tests on the FIX engines, the reference architecture was specified and built by OnX at the fasterLAB in the UK. OnX also analysed and interpreted the benchmarks, and provided an independent audit of the test activities by the FIX engine vendors. These vendors accessed the test rig via remote access under pre-approved and agreed conditions. The main components of the reference architecture are shown below:

Figure 2: Reference Architecture Components

3.2.1 CPU and Servers

The test harness used Dell PowerEdge R710 servers. The R710 occupies 2U of rack space and incorporates energy efficient technologies to reduce power consumption and cooling; such servers are typically deployed in co-location environments, where space and power can be limited. The market data simulator and execution venue server included 2 x Intel® Xeon® processor X5677, each with 4 cores at 3.47GHz, and 16GB of RAM, running Microsoft Windows Server 2008. This configuration was sufficient for the test harness task of generating a suitable trading workload. Dell also provided the monitoring server that hosted the network monitoring service. This comprised an Endace network monitor; timings were uploaded to the operating system only for post-test processing.


The test harness server housed the algorithmic trading system simulator and the FIX engine under test. This server included a single Intel® Xeon® processor X5698 (dual-core), clocked at 4.4GHz with 12MB of L3 cache, and 96GB of RAM (12 x 8GB), running Red Hat Enterprise Linux (RHEL 6.0). This processor was designed based on feedback directly from Intel's field teams close to financial trading, for applications where the fastest single-thread instruction execution is required. Performance increases of more than 20% compared to other Intel® Xeon® processor X5600 series parts were noted.

Preliminary tests were undertaken to select the most appropriate processor for the workload by comparing the Intel® Xeon® processor X5698 (4.4GHz) against the Intel® Xeon® processor X5680 (3.33GHz). The preliminary test, using a message rate of 100,000 messages a second, showed the X5698 to have 36% better latency performance than the X5680. The clock speed difference between the processors was 32%, indicating that the X5698 was exhibiting better than linear scalability under test and was better suited to the FIX engine workload. On the basis of this preliminary test, the Intel® Xeon® processor X5698 was selected for the test environment.
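The 32% speed difference quoted above is simply the clock ratio of the two processors. The arithmetic below is illustrative only (not part of the original test procedure), using the clock speeds stated in the text:

```python
# Clock speeds as stated in the text above.
x5698_ghz = 4.40
x5680_ghz = 3.33

# Relative speed advantage of the X5698 over the X5680.
speed_diff_pct = (x5698_ghz / x5680_ghz - 1) * 100
print(f"{speed_diff_pct:.0f}%")  # ~32%, versus the 36% latency improvement observed
```

Since the observed latency improvement (36%) exceeded the clock advantage (32%), the gain was better than what clock speed alone would predict.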

3.2.2 Network Design

The test harness network used a network switch design rather than incorporating network taps at the points of measurement. Network taps are often deployed to measure latency across certain trade processing legs; however, they can introduce instability and unreliability into the network. Instead, port mirroring was used to forward packet data to the Endace network monitor, a much more common implementation in production trading environments. Two types of network switch were considered:

1. Cut-through switch. This design starts forwarding a network packet before the whole packet has been received, normally as soon as the destination address has been processed. This reduces latency at the switch but decreases reliability, as corrupted packets may be forwarded.

2. Store and forward switch. This design buffers the whole packet before processing it, which enables the switch to validate the integrity of the packet before forwarding it. The buffering introduces a consequential delay, which increases latency.

Knowing that timings were likely to be in the range of 5 to 300 microseconds, the low latency cut-through switch design was selected. The delay of switching packets using a cut-through switch is of the order of 300 to 1,000 nanoseconds, depending on the manufacturer; the delay of store and forward switching is between 500 and 1,000 microseconds, again depending on the manufacturer. The switch built into the stack was therefore the cut-through Arista Networks 7124SX. The 7124SX uses a low latency application-specific integrated circuit (ASIC), switching at a 250 nanosecond rate; the ASIC is from Fulcrum Microsystems (an Intel company). This network switch runs an extensible operating system (EOS), which can support additional features such as PTP (Precision Time Protocol) and Arista's latency analyser utility, known as LANZ.
The switched design depended on port mirroring, a feature used for monitoring traffic by copying the packets seen on a specified physical port to another interface. In this case it was essential that port mirroring copied both received and transmitted packets from the mirror source to the mirror destination. In this configuration, the source was the port connecting the FIX engine server, and the destination was the server hosting the Endace network monitor card.


Low latency network interface cards (NICs) were selected for all servers. With a non-specialized network card, latencies are around 20 microseconds. Empirical evidence from Solarflare Communications indicates that this can be reduced by 50% by using a specialized low latency network card, and by a further 50% using a technique referred to as 'kernel bypass'. Solarflare is a recognized provider of low latency NICs offering kernel bypass support for both Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) traffic. Typically, market data is broadcast using stateless UDP and trade execution uses TCP. The model selected was the dual-interface SFN5122-F.
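The successive halvings quoted above can be laid out as simple arithmetic. This is a sketch of the stated percentages, not a measurement:

```python
# Indicative NIC latency figures as stated in the text above.
standard_nic_us = 20.0
low_latency_nic_us = standard_nic_us * 0.5   # specialised low latency NIC: ~50% reduction
kernel_bypass_us = low_latency_nic_us * 0.5  # kernel bypass: a further ~50% reduction

print(f"{standard_nic_us} -> {low_latency_nic_us} -> {kernel_bypass_us} microseconds")
```

That is, roughly 20 microseconds with a standard card, 10 with a specialised low latency NIC, and 5 with kernel bypass enabled.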

3.3 The Message Passing Process

The diagram below illustrates how messages pass through the test harness: the MD simulator feeding the FIX engine under test and its algo simulator over Session-1, the EV simulator filling orders over Session-2, and the timing points at the network card.

Figure 3: Execution Flow of Messages Through the Test Harness

Referring to Figure 3 above, the message flow in detail is as follows:

1. The market data simulator created Market Data incremental refresh messages (tag 35=X), assigning an MD Entry Price (tag 270) that was incremented in a sawtooth pattern from 0.001 in 0.001 increments, cycling through a small in-memory list of stocks (tag 55); with each cycle of the sawtooth pattern the integer part of the price was incremented.


2. The FIX engine listened to this stream of messages on a single FIX session (Session-1) and handed each message up to the algorithmic trading simulator.

3. The algo simulator interrogated the data and, when a bid (tag 269=0) had an MD Entry Price ending in ".000" (e.g. 270=56.000), instructed the FIX engine to create and send a new Order Single (tag 35=D) message to buy 100 lots (tag 38=100) of the symbol (tag 55) to the execution venue simulator on a second FIX session (Session-2). Since each market data message had a unique price, market data messages could be correlated with the order messages they triggered.

4. The execution venue simulator automatically filled the order by creating two Execution Reports (tag 35=8). The first had an Order Status of "New" (tag 39=0); the second, "Filled" (tag 39=2). These were returned on the same FIX session (Session-2).

5. On receipt of the fill (tag 35=8; tag 39=2), the algo simulator instructed the FIX engine to send another Order Single (tag 35=D) to sell 100 lots (tag 38=100) of the same symbol (tag 55).

6. Again, the execution venue simulator automatically filled the order by creating two Execution Reports (tag 35=8): the first with an Order Status of "New" (tag 39=0); the second, "Filled" (tag 39=2).

Note: tests were performed without the use of persistent storage.
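The decision logic in steps 2 to 5 can be sketched as follows. This is a hypothetical simplification, not the harness code: the helper names are invented, '|' stands in for the FIX SOH (0x01) field delimiter, tag 54 (Side) is an assumed addition, and only the tags mentioned above are modelled.

```python
def parse_fix(msg: str) -> dict:
    """Split a FIX-style message into a tag -> value dict.
    Real FIX uses the SOH (0x01) delimiter; '|' is used here for readability."""
    return dict(field.split("=", 1) for field in msg.split("|") if field)

def on_market_data(msg: str):
    """Step 3: when a bid's MD Entry Price ends in .000, emit a buy Order Single."""
    f = parse_fix(msg)
    if f.get("35") != "X" or f.get("269") != "0":   # incremental refresh, bid side only
        return None
    if not f["270"].endswith(".000"):               # trigger condition
        return None
    # 35=D Order Single; 54=1 (buy) is assumed; 38=100 lots; symbol carried over
    return f"35=D|54=1|55={f['55']}|38=100|44={f['270']}"

def on_execution_report(msg: str):
    """Step 5: on a fill (39=2) of the buy, emit the matching sell (54=2 assumed)."""
    f = parse_fix(msg)
    if f.get("35") == "8" and f.get("39") == "2":
        return f"35=D|54=2|55={f['55']}|38=100"
    return None

print(on_market_data("35=X|269=0|270=56.000|55=VOD"))
```

Because the trigger price is carried on the order (tag 44 here), each order can be correlated back to the unique market data message that produced it, as described in step 3.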

3.4 Timings

Since timestamps within the test harness hardware components lacked microsecond accuracy, timings were recorded on an Endace network monitor. Three timestamps were recorded for each benchmark cycle:

1. Receipt of the market data message from the market data simulator (T1).

2. Transmission of each of the two single order messages to the execution venue simulator (T2).

3. Receipt of the execution confirmation from the execution venue simulator (T3).

3.5 Post-Test Data Processing

The timings collected on the Endace card were uploaded to a Unix workstation for post-processing. Depending on the test run parameters and duration, between 3,000 and 24 million timings were recorded (generating 24MB of test result data per second). Scripts were used to analyze the timestamps to give two performance measurements:

1. The FIX engine's ability to process market data messages and create single order messages as a result, calculated as T2 - T1 for each buy order.

2. The FIX engine's ability to generate single order messages after receipt of an order filled message, calculated as T3 - T2 for each sell order.
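The two measurements can be expressed directly over (T1, T2, T3) triples. The sample values below are invented for illustration, and this helper is a sketch rather than the actual post-processing script:

```python
from statistics import mean

# Hypothetical (T1, T2, T3) timestamps per benchmark cycle, in microseconds.
samples = [
    (0.0, 11.5, 62.0),
    (100.0, 112.1, 161.8),
    (200.0, 210.9, 262.3),
]

# Measurement 1: market data in -> buy order out (T2 - T1).
md_to_order = [t2 - t1 for t1, t2, _ in samples]
# Measurement 2: fill received -> sell order out (T3 - T2).
fill_to_sell = [t3 - t2 for _, t2, t3 in samples]

print(f"mean T2-T1: {mean(md_to_order):.2f} us")
print(f"mean T3-T2: {mean(fill_to_sell):.2f} us")
```

The real scripts computed these differences per order over millions of samples before producing the averages tables and distribution graphs discussed later.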


3.6 Test Scenarios

The benchmark process was repeated across 3 different test cycles covering short to extended duration periods. Each set of benchmark cycles was repeated 3 times, in order to establish mean latency figures for the FIX engines. The 3 test cycles were:

1. Burst test – 50,000 market data messages per second were generated by the market data simulator for a period of 5 minutes.

2. Sustained test – market data message rates were increased from 10,000 to 100,000 per second, by 10,000 every 4 minutes, for a total of 40 minutes.

3. Extended sustained test – market data rates were increased from 10,000 to 50,000 per second, by 10,000 every 10 minutes, and then held at 50,000 for a total time of 4 hours.

At 50,000 market data messages per second, 50 orders per second were generated by the algo simulator; at 100,000 market data messages per second, 100 orders per second were generated. Different execution venue simulator delays were tested, since the matching engines at different exchanges are not all equal. The delays were varied for the burst test as follows:

Delay (microseconds)    Packets per second
10                      100,000
14                      71,429
20                      50,000
50                      20,000
100                     10,000
200                     5,000
1,000                   1,000
2,000                   500
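The "Packets per second" column follows directly from the delay: each value is 1,000,000 divided by the delay in microseconds, rounded to the nearest whole packet. A quick check (illustration only):

```python
# Delays from the burst test table, in microseconds.
delays_us = [10, 14, 20, 50, 100, 200, 1_000, 2_000]

# One packet per delay interval: 1,000,000 us per second / delay per packet.
pps = [round(1_000_000 / d) for d in delays_us]
print(pps)  # [100000, 71429, 50000, 20000, 10000, 5000, 1000, 500]
```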


4. Results and Observations

A total of 84 test runs were conducted and analyzed, out of an anticipated total of 96, across the FIX engines tested: B2BITS EPAM and QuickFIX in C++, and Rapid Addition and QuickFIX in Java. Performance issues with the QuickFIX C++ engine meant it could not operate when the execution venue's matching engine was set to respond faster than 50 microseconds.

4.1 Effectiveness of Kernel Bypass

A preliminary design test was conducted, and the results indicated that the use of kernel bypass (Solarflare's OpenOnload product) had an impact on the commercial FIX engines from B2BITS and Rapid Addition across all tests. No observable impact was recorded when testing the open source variants. With this observation established, it was decided that kernel bypass would be enabled for all tests, irrespective of whether the application design was capable of taking advantage of it.

4.2 Main Results

The results of the tests showed varying performance characteristics between the commercial Java and C++ engines and the open source code streams. This included outright latency when delivering message workloads, and the level of jitter displayed by the engines as they performed their tasks over the period of the test workloads. The graphs below show a selection of performance characteristics. Full detailed figures for each environment can be seen in the C++ and Java results reports respectively, where each commercial engine is compared with its open source equivalent.


4.2.1 Test Results and Observations

Buy Orders – Execution Venue simulating a 50 microsecond order matching delay

The two graphs above show the latency of the workload completion over a 300 microsecond range, comparing open source against the commercially available Java and C++ FIX engines, respectively.


The two graphs above show the same results over a 60 microsecond range, comparing open source against the commercially available Java and C++ FIX engines, respectively.


Buy Orders – Execution Venue simulating a 14 microsecond order matching delay

The two graphs above show the latency of the workload completion over a 300 microsecond range, comparing open source against the commercially available Java and C++ FIX engines, respectively. Note the absence of performance test results from the open source C++ engine under these test conditions.


The two graphs above show the same results over a 60 microsecond range, comparing open source against the commercially available Java and C++ FIX engines, respectively.

1. The commercial FIX engines completed the messaging tasks between 30 and 50 microseconds more quickly than the QuickFIX engines.

2. The QuickFIX engines had outlying results out to 300 microseconds (they did not complete their task inside this time), a source of jitter (unpredictability).

3. QuickFIX C++ was unable to perform with the exchange simulator set at 14 microseconds.

4. Across the range of tests, each commercial engine exhibited different characteristics, with differences in outright latency and jitter that showed no common theme and are hence considered to be within experimental error. This assertion is borne out by examination of the whole result set.

5. The open source Java and C++ QuickFIX engines showed random variation between themselves; the C++ version could not perform at the 14 microsecond load level.

6. The commercial FIX engines were consistent and deterministic throughout the tests. They showed a normal distribution pattern, and calculations of standard deviation were undertaken. The results for the QuickFIX engines contained a large number of outliers (which translates to poor reliability in handling trading workloads) and did not fit the normal distribution model.


The commercial FIX engines showed a much tighter distribution, with a range of 4 microseconds as opposed to 50 microseconds for the open source engines.

The sample run below illustrates the point. Note the difference in microsecond range on the X axis of each graph.

Figure: Latency distribution histograms (number of samples against time in µs) for the open source QuickFIX engine and a sample commercial engine.
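The contrast between the two histograms can be summarised with simple distribution statistics. The latency samples below are synthetic, chosen only to mimic the shapes described (a tight bell curve versus a long fat tail); they are not the measured results:

```python
from statistics import mean, stdev

# Synthetic latency samples in microseconds, illustrative only.
commercial = [10, 11, 11, 12, 12, 12, 13, 13, 14]      # tight, ~4 us spread
open_source = [45, 48, 50, 52, 55, 60, 80, 120, 300]   # long fat tail of outliers

for name, data in (("commercial", commercial), ("open source", open_source)):
    print(f"{name}: mean={mean(data):.1f} us, stdev={stdev(data):.1f} us, "
          f"range={max(data) - min(data)} us")
```

For a tight, normally distributed sample the mean and standard deviation characterise performance well; for a long-tailed sample they do not, which is why the QuickFIX results could not be meaningfully fitted to the normal distribution model.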


5. Discussion

5.1 Value of the Exercise to the Electronic Financial Trading Community

The testing exercise has informed the debate among practitioners who look to quantify the benefits of commercial FIX engines over their open source counterparts. It is clear that the commercial engines outperform the open source versions by an order of magnitude, and also have significantly higher consistency in performance, an essential feature for the execution of certain trading strategies. While the open source model is widely successful as a driver for innovation, in the case of FIX it is clearly important to select software products based on the required workload and performance characteristics. The Java based FIX engine closely matched the native C++ code, with each engine showing individual characteristics. Finally, the exercise has demonstrated the value of optimized high performance infrastructure when deploying automated electronic trading systems.

5.2 Performance of the Test Rig

The test rig in the Intel fasterLAB did not prove to be a limiting factor in the testing process. The infrastructure showed itself to be reliable (with no failed components over the test cycle) when running at the extremes of performance, including running the CPUs at 100% capacity for prolonged periods.

An enhancement being pursued for future tests is to implement the Precision Time Protocol (PTP), which is accurate to 500 nanoseconds. PTP-enabled NICs will be tested, including Solarflare's SFN5322F, which has an accurate oscillator that can act as a grand master clock. Other network components are then synchronized, provided they implement the PTP network daemon, which is available for both Red Hat and the Arista EOS switch operating system. Latency can be measured at the switch using the LANZ feature from Arista. This will reduce the number of components required to accurately timestamp network packets generated during the trade cycle, and will help ensure the integrated trading test suite remains at the leading edge of network component innovation.

5.3 Raising the Test Rig to Production Standard

5.3.1 Deploying Production-Quality Infrastructure

A focus of this series of tests has been to illustrate the importance of design in the technical infrastructure and its direct and positive impact on performance. Moving from a lab experiment to a stable production system, which can support live trading execution strategies that rely on speed and reliability, can be expensive and time consuming. Deploying high performance infrastructure requires prudent engineering discipline, which has to be accommodated in any implementation plan. This is characterized by the non-functional requirements listed below:

 Reliability – the stability of a system to reproduce the same results under the same conditions on an on-going basis, requiring minimal intervention.
 Availability – the ability to continue operation, with failover/disaster recovery, when one or more components fail.
 Testability – the ability to scrutinize and assert the integrity of the system as fit for purpose, as planned and required.
 Manageability – control of the system: start, stop and vary the control parameters, using planned resources, be they in-house or outsourced to a service provider.
 Performance – closely aligned to reliability – the ability of the system to work within the required functional constraints and meet operational expectations.
 Security – ensure the access control, audit and privacy of the system are maintained – preserving required audit trails – and access to information for internal prudence and external compliance.
 Scalability – the system can maintain performance requirements and/or accommodate spikes in demand as workloads increase within defined boundaries.
 Extensibility – the ease of change of a component without consequential change to adjacent components – the ability to extend the scope of the system to support additional business functions, e.g. adjacent and/or new roles such as risk reporting, compliance, etc.

Project governance is required across the implementation of a high performance trading infrastructure. This begins with an analysis of the current environment, whether it is a green-field deployment or a complete replacement of existing systems. A critical component is to ensure that any new system can integrate effectively with existing systems (SOR, risk, market data, etc.). Across the financial services technology landscape, these skills and competencies are typically spread across multiple parties with differing and often overlapping areas of competence and responsibility. This can introduce variance in the effectiveness of the trading infrastructure, which can impact the overall effectiveness of the deployment project. The consortium has been assembled to create teaming amongst parties that can carry the resource loads of planning and designing suitable infrastructure within the context of each firm's current and ongoing environment. This approach can be applied equally whether the deployment is in-house or at a colocation facility in proximity to market liquidity.


5.3.2 Implementing Commercial FIX Engines

Implementing a FIX engine is a non-trivial exercise, which can be split into two parts.

1. Application Integration: The market data, algorithm and trade execution components of a trading platform need to be connected to the FIX engine through its linked libraries. This requires a level of programming skill that depends on the complexity of the trading platform, the level and quality of the FIX engine documentation and the number of FIX engine touch points to the trading application. Even the simplified test rig had three implementation stages: 1) planning the application integration; 2) executing and optimizing the applications and infrastructure for optimum performance; and 3) commissioning and deploying the infrastructure. These stages can be accelerated, while identifying and containing elements of risk, by engaging a suitable specialist systems integrator, such as GreySpark Partners.

2. Execution Venue Integration: Each execution venue will have its own rules and tests to allow market participants to join the market. These tests typically require validation within test environments with a prescribed test schedule. Passing the venue integration test requires planning and logistical rigor.
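To illustrate what any FIX engine, commercial or open source, must produce on the wire, the sketch below builds a minimal tag=value message with the standard BodyLength (tag 9) and CheckSum (tag 10) fields. It is a simplified illustration, not any vendor's implementation; the comp IDs and symbol are invented, and a real NewOrderSingle carries many more required tags.

```python
# Minimal sketch of the FIX tag=value wire format; field values are hypothetical.
SOH = "\x01"  # FIX field delimiter

def build_fix(msg_type, body_fields):
    """Assemble a FIX 4.2 message with correct BodyLength (9) and CheckSum (10)."""
    body = f"35={msg_type}{SOH}" + "".join(f"{t}={v}{SOH}" for t, v in body_fields)
    head = f"8=FIX.4.2{SOH}9={len(body)}{SOH}"
    # CheckSum: byte sum of everything before tag 10, modulo 256, three digits.
    checksum = sum((head + body).encode()) % 256
    return f"{head}{body}10={checksum:03d}{SOH}"

# A skeletal NewOrderSingle (35=D) with invented sender/target and symbol.
msg = build_fix("D", [("49", "BUYSIDE"), ("56", "VENUE"), ("55", "VOD.L"), ("54", "1")])
print(msg.replace(SOH, "|"))  # '|' substituted for SOH for readability
```

Getting these framing fields right is table stakes; the engineering effort in a production engine goes into session management, recovery and the latency of exactly this encode/decode path.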

5.3.3 Implementing Production-Ready Networks

Building the network infrastructure to production standard is a pre-requisite of the integration work. Four areas of infrastructure design and operation need to be considered.

1. Assessment and elimination of single points of failure: the test rig had one network link to a single switch. Building redundant links between servers and having a redundant switch is common practice, and is recommended when deploying this type of infrastructure. In network terms this is "Multi-chassis Link Aggregation Groups" or MLAG. See figure 4 below.

2. Application failover: failover via the network enables the software components of a system to restart on a standby server (for High Availability). Using clustering techniques reduces the amount of time failover takes to complete, and requires business input to assess the relative cost of the outage period in order to determine the complexity of the clustering solution.

3. Backup and Restore: every solution should have a tested backup and restore mechanism to protect the business from system failure. Since some trading platforms tend to be stateless, the restore mechanism will resemble the original commissioning steps (having recorded configuration details and files). Designing a backup and restoration mechanism requires the same business input as application failover, specifically guided by the cost of outage.

4. Operational Management: encompasses all aspects of change and configuration management, systems monitoring and maintenance of operational integrity.
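Application failover (point 2 above) can be illustrated at its simplest as a client trying redundant endpoints in turn. This is a hedged sketch under stated assumptions: the host names are hypothetical, and a production deployment would rely on clustering software and session-state recovery rather than ad-hoc reconnect logic.

```python
# Hedged sketch of client-side failover across redundant endpoints; hosts are hypothetical.
import socket

ENDPOINTS = [("fix-primary.example.com", 9001), ("fix-standby.example.com", 9001)]

def connect_with_failover(endpoints, timeout=2.0):
    """Try each endpoint in order; return the first live socket, else raise."""
    last_err = None
    for host, port in endpoints:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as err:
            last_err = err  # primary unreachable: fall through to the standby
    raise ConnectionError(f"all endpoints failed: {last_err}")
```

The business input mentioned above enters here as the timeout and retry policy: how long the strategy can tolerate being disconnected determines how aggressive (and how complex) the failover machinery must be.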


Figure 4: Reliable Network Connectivity – an MLAG switch pair joined by an MLAG peer link, with private cluster links between the servers.

5.4 Exploiting the Results

The choice of application software for financial services is extensive. Making an appropriate selection is a challenge for the business – be it buy-side, sell-side or execution venue. For applications addressing the trading functions there is a lack of transparency and consistency in measuring and assessing the performance of solutions from individual vendors, or of solution sets of interoperating elements. This consortium-based testing program is an exercise in collaboration to achieve operating solutions, with FIX messaging as the initial focus. The composition of the group models the reality facing trading firms: production systems come from the assembly of many parts from many entities, with resources from different sources. This exercise has provided a basis for consistent testing and comparison of how technologies handle the (FIX) business workloads. Its success in achieving a granular and detailed set of results comes in major part from the facilitation that OnX Enterprise Solutions brings through its product distribution and architecture design capabilities – and Intel's objective to support testing of solution scenarios on its Xeon processors. Across the team, each party to the consortium has volunteered its core capability, whether product or service IP. The combined resources effectively anticipate the exercise that trading firms would have to address in the selection, procurement and commissioning of systems. Remarkable levels of co-operation and open sharing have been displayed, with visible, useful results. This output can feed directly into the technology selection processes of firms.


6. Conclusion

The major result from the testing exercise was the collaboration between parties to create a robust and representative testing environment, which was able to produce results simulating real-life conditions and their effect on the key function of FIX message transmission. The commercial FIX engines were between 4 and 16 times faster (depending on load) than the open source QuickFIX equivalent engines, with an average latency test result of 11 microseconds, as opposed to 180 microseconds. This was even more evident when the performance of the execution venue was increased to reflect faster matching (sub 50 microseconds).

Under different stress conditions, each engine exhibited distinct performance characteristics, and the commercial engines' performance was vastly superior to that of the open source models: the standard deviation from the mean for a commercial engine was only 1 microsecond. The open source software exhibited results which, translated into the real production world, would not be considered sufficiently robust to support automated trading strategies. The major factors affecting the open source variants are poor performance under high load, higher levels of network jitter and trade execution outliers of up to 300 microseconds.

Tuning the Network Interface Cards with kernel bypass technology improved the performance of both commercial engines, demonstrating a 50% reduction in latency. This translated into a round-trip saving in latency which would have a material impact on the trading strategy being executed. Engineering an integrated trading platform was proven to deliver incremental benefits in reducing overall latency. Both Java and C++ environments, in open source and commercial form, exhibited individual characteristics across the various code streams in the applications. This indicates the on-going scope for improvement in the software, which can lead to improvements in overall performance.

The test results demonstrated that trading strategies which rely on minimising response times should be deployed on a high performance infrastructure. This is integral to obtaining enhanced levels of performance and reliability. Each layer in the technology stack has a role to play, with incremental enhancements possible when implementing options such as kernel bypass.


Appendices

1. Consortium

The consortium comprises a group of companies whose combined capability maps to the provision of trading technology solutions. This is not a closed group – it is fully open to inputs from additional parties on an on-going basis.

1a Technology Members

A number of technology and services providers have invested as charter members of the consortium. However, the initiative is open, and further participant members may be added in the future. Between them, these members provide a complete infrastructure capability and created the reference architecture, each drawing on specific expertise, while OnX provided the integration and build capability. The charter group members whose technology products were directly involved in building the rig and in the performance benchmark testing comprise:

OnX Enterprise Solutions – As consortium lead, OnX selected vendors for the benchmark test stack, built the test rig by integrating the product components and interpreted the results of the tests.

Arista Networks – Provided its 7124SX network switch to connect servers for the benchmark and its LANZ (Latency Analyzer) capability for tuning.

Dell – The benchmark was run on two Dell PowerEdge R710 servers, one of which was equipped with an Intel® Xeon® processor X5698.

Intel – The benchmarks were conducted at Intel's fasterLAB in the UK. Intel® Xeon® processors X5677 and X5698 were installed in the Dell servers. Intel engineers screened hardware and software performance for optimum utilisation of IA (Intel Architecture) features, including use of the Intel Compiler.

Solarflare Communications – SFN5122F 10 gigabit Ethernet network adaptors were installed in each of the Dell servers, offering kernel-bypass communications.


Other consortium members, which can provide services for deployment in real-life production scenarios – be they co-location, on-site or other – include:

Edge Technology Group – Provides integration and managed services, in particular for buy-side participants.

Equinix – Runs financial services data centres around the globe supporting high-performance trading across multiple asset classes on a deep mix of trading venues. Trading participants are connected inside the data centre using cross-connects to reduce network latency and enable price discovery, order routing and execution at the highest possible performance levels.

GreySpark Partners – Provides 'top down' trading strategy and technology consulting, and integration services, with a focus on assessing requirements and designing 'technology bundles' for high performance.

1b Application Providers Being Tested

Rapid Addition – FIX engine "Cheetah", in Java, and QuickFIX Java harness.

B2BITS EPAM – FIX engine "FIX Antenna" 2.7, in C++, and QuickFIX C++ harness.

QuickFIX – Open source FIX engine in C++ and Java.
