Deploying an effective disaster recovery strategy for forex brokerage operations by Fair Trading Technology

Business process and outsourcing

Deploying an effective Disaster Recovery strategy for forex brokerage operations By Gerry O’kane

Business continuity in the face of disaster or bad luck is crucial for any business, but in the fast-moving world of FX, brokers will find that without adequate plans they could be put out of business. Keeping 100% of a broker’s operation running at all times can at least treble IT costs, but cheaper solutions can bring a broker back online within 10 minutes. Companies need to identify their pinch points, recognise risk and not allow a strategy to be dominated by a focus on short-term margins. The miserable truth is that many FX brokers believe themselves protected but have untested systems and weak protection, and may even be in breach of regulations.

As the past few years have shown, risk comes in many forms: central bank intervention, crumbling credit lines, terrorist attacks or natural hazards. Making sure your business continues operating in the face of any of these adversities is the difference between having a company or not.

The business of disaster recovery (DR) has been around for as long as companies have been using computer systems. Essentially it is the corporate response to something that knocks over its computer systems: getting the business up and running again. In the FX world, all but the biggest forex firms use third-party DR solutions because of cost. “It’s not an option to have nothing, but I’d say most of the industry is not aware of the range of issues in having a proper DR system – they’re led by sales rather than IT,” warns Tom Higgins, chief executive officer (CEO) of UK-based Gold-i, a brokerage systems specialist.

“The question is ‘how long can you afford a full stop’. For FX brokers in most circumstances you can’t!” warns technology provider Fair Trading Technology’s CEO, Tim Haman.

The problem is getting the brokers to recognise the full range of threats they face, understand the technical implications of one solution over another and assess the costs versus the risk.

Paul Pocock, managing consultant, business continuity at IBM, agrees. “If something goes wrong and the brokerage is out of the market for any length of time, clients will go to the competitors, so it’s critical for businesses to be able to access a back-up structure.”

That many FX brokers still need both to identify their weak spots and to understand the realities of DR becomes abundantly clear when even the simple process of backing up computer data is something that a significant minority fail to do correctly. “You need regular backups and all broker companies have some variation on this, but we’ve found only about 10% test them, and we’ve been called in after a crisis to find the backup has been corrupted,” says Haman.
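Haman’s warning is that an untested backup offers only assumed protection. As a minimal sketch of what testing can mean, assuming backups are gzip-compressed tar archives and that a SHA-256 manifest is recorded when each backup is taken (both assumptions for illustration, not a description of any broker’s or vendor’s tooling), a restore test can read every file back out of the archive and compare it with the manifest. The function names are hypothetical.

```python
import hashlib
import tarfile
import zlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_backup(source_dir: Path, archive: Path) -> dict[str, str]:
    """Archive every file under source_dir and return {relative path: SHA-256}."""
    manifest: dict[str, str] = {}
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                name = path.relative_to(source_dir).as_posix()
                manifest[name] = sha256_of(path)
                tar.add(path, arcname=name)
    return manifest


def verify_backup(archive: Path, manifest: dict[str, str]) -> list[str]:
    """Read every file back out of the archive and compare it with the manifest.

    Returns a list of problems; an empty list means the test restore matched.
    """
    restored: dict[str, str] = {}
    try:
        with tarfile.open(archive, "r:gz") as tar:
            for member in tar.getmembers():
                if member.isfile():
                    data = tar.extractfile(member).read()
                    restored[member.name] = hashlib.sha256(data).hexdigest()
    except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
        return [f"archive unreadable: {exc}"]

    problems = []
    for name, expected in manifest.items():
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != expected:
            problems.append(f"corrupted: {name}")
    return problems
```

A check like this only shows that the archive is readable and intact; a full DR test would also restore onto standby machines and bring the trading applications up.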

Compliance in spirit not practice

This sort of chaotic approach to DR continues to fly in the face of regulatory requirements. Since the terrorist attacks of 9/11, the US National Futures Association (NFA) “has devoted increased attention to issues relating to disaster recovery plans”.


But assessing risk is not something firms tend to be very good at, especially when there is a cost involved in mitigating it. The financial sector, as it continues to reel from the collapse of sub-prime debt and its domino effects, has tended to see operational failure and disaster recovery as an old chestnut that has been well and truly roasted. Yet the worrying truth is that many brokers, including those in the FX world, are only paying lip service to the necessity. “Look, it’s required by regulators to have a disaster recovery solution,” says Higgins.

74 | Institutional FX Services - The Brokers Handbook 2012/2013


Under NFA Compliance Rule 2-38, a disaster recovery plan is required: “The plan shall be reasonably designed to enable the Member to continue operating, to reestablish operations, or to transfer its business to another Member with minimal disruption to its customers, other Members, and the commodity futures markets”.

Like the UK’s FSA, the NFA compliance rule is an interpretive notice and only provides guidance. Indeed, while there is a licensing requirement for disaster recovery plans in all the major global trading centres, the forex industry frequently falls short in practice, according to industry insiders. “I don’t agree that the FSA, for example, should regulate technology per se as it changes too quickly; instead they should demand that the broker show its methodology and that the systems can be proved,” says Haman.

He adds: “I’d say in our experience that in 20% of the cases, true DR is not there, the system offers false protection and it hasn’t even been tested.”

“The fact is that fast recovery in a high-availability system comes at a high cost. If you want it up and running 100% of the time it’ll cost millions, but if you can cope with an outage of 15 minutes then a solution costs thousands – it’s a risk calculation on the effects on your business,” explains Higgins. “But having no DR, as some people do, is obviously suicide. You have nowhere to go.”

Planning a disaster recovery strategy for FX brokers is not altogether straightforward: the broker’s fiduciary responsibilities to clients, the speed and complexity of its trading structure and its need for fast access to liquidity providers all make a DR solution complicated. “The FX broker needs to articulate its strategy in the event of something going wrong: what do they need to prioritise – do they only continue to serve certain clients, black box trading, certain currency pairs, whatever. I can say from the outset that trying to have high capabilities in every area will mean an expensive solution,” says Pocock. The message is that brokers need to balance cost with what they evaluate as being critical for the company to continue business. The company needs to assess at what point in the trade process it must be able to recover and how fast it needs to be up and running again: the recovery point objective (RPO) and recovery time objective (RTO).

Because of the industry’s time-sensitivity, both the RPO and the RTO need to be short. In 2011 Alpari US reported the results of a DR test (information rarely made public). It said it had anticipated that the downtime for its Live server would be approximately 5 minutes; in the event, all system data was wiped out and then fully restored in less than 2 minutes.
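The RPO/RTO distinction can be made concrete as a pass/fail check on a DR drill. The sketch below is hypothetical: the 5-minute RTO is the downtime Alpari says it anticipated and the 2-minute outage takes its “less than 2 minutes” at the upper bound, while the zero-second RPO (no data loss tolerated) and the second, failing drill are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjectives:
    rto_seconds: float  # recovery time objective: longest tolerable outage
    rpo_seconds: float  # recovery point objective: longest tolerable window of lost data


@dataclass(frozen=True)
class DrTestResult:
    outage_seconds: float     # from the simulated failure until service resumed
    data_lost_seconds: float  # span of recent data the restore could not recover


def breaches(result: DrTestResult, objectives: RecoveryObjectives) -> list[str]:
    """Return the objectives a DR drill failed to meet; an empty list means it passed."""
    failed = []
    if result.outage_seconds > objectives.rto_seconds:
        failed.append(f"RTO missed: {result.outage_seconds:.0f}s > {objectives.rto_seconds:.0f}s")
    if result.data_lost_seconds > objectives.rpo_seconds:
        failed.append(f"RPO missed: {result.data_lost_seconds:.0f}s > {objectives.rpo_seconds:.0f}s")
    return failed


objectives = RecoveryObjectives(rto_seconds=5 * 60, rpo_seconds=0)  # RPO of 0 is assumed
alpari_drill = DrTestResult(outage_seconds=120, data_lost_seconds=0)  # "less than 2 minutes", fully restored
slow_drill = DrTestResult(outage_seconds=900, data_lost_seconds=60)   # hypothetical failing drill
print(breaches(alpari_drill, objectives))  # []
print(breaches(slow_drill, objectives))    # ['RTO missed: 900s > 300s', 'RPO missed: 60s > 0s']
```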

Paul Pocock’s recommendations
• What does the FX broker want to achieve – how long can he be offline?
• Do risk and impact assessments on business lines – what do they need to keep going?
• Have a crisis management process to effectively respond to issues
• Test it and validate it

IT structures
• High network bandwidth to stop denial of service attacks
• Fast network for synchronous or fast backups
• Clustered, hot primary system
• Backup servers
• Backup liquidity providers
• Backup Internet services

Amongst these considerations must also be an understanding that looking at DR through the prism of keeping business margins high, in other words simply going for the cheapest solution, is a short-term view and one likely to do long-term damage to the company. The next step is to determine the business impact of longer versus shorter recovery times for the company’s key business processes.
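Higgins’s “risk calculation” can be sketched as a comparison of each option’s running cost with the business it would expect to lose to outages. Every figure and name below is hypothetical, chosen only to echo the millions-versus-thousands contrast, and the model assumes losses accrue linearly per minute of downtime, which real outages need not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrOption:
    name: str
    annual_cost: float     # yearly cost of running the solution
    outage_minutes: float  # expected downtime per incident (the option's RTO)


def expected_annual_loss(option: DrOption, incidents_per_year: float, loss_per_minute: float) -> float:
    """Expected business lost to outages in a year under this option."""
    return incidents_per_year * option.outage_minutes * loss_per_minute


def best_option(options: list[DrOption], incidents_per_year: float, loss_per_minute: float) -> DrOption:
    """Pick the option with the lowest running cost plus expected outage loss."""
    return min(
        options,
        key=lambda o: o.annual_cost + expected_annual_loss(o, incidents_per_year, loss_per_minute),
    )


# All figures are hypothetical.
options = [
    DrOption("no DR", annual_cost=0, outage_minutes=2 * 24 * 60),
    DrOption("15-minute recovery", annual_cost=20_000, outage_minutes=15),
    DrOption("always-on redundancy", annual_cost=2_000_000, outage_minutes=0),
]
print(best_option(options, incidents_per_year=1, loss_per_minute=1_000).name)    # 15-minute recovery
print(best_option(options, incidents_per_year=1, loss_per_minute=200_000).name)  # always-on redundancy
```

The same three options give different answers as the value of a minute of trading changes, which is why how fast to recover ends up as a business decision.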

What’s the business priority?

The natural reaction to assessing business priorities is that everything is a priority. In fact, not everything is. Management must identify its business process areas and then how and why they fit into the IT infrastructure. These can include:

• liquidity
• phone systems
• trading system
• marketing information
• customer relationship management system
• their customer interface, now often web sites.

“But they must also ask the critical question of what their core competencies are – are they an IT company or an FX company? For all but the biggest players in the market, the complexity and the cost mean that outsourcing DR and systems to a third party is far cheaper than doing it in-house,” advises Higgins. But here too he warns that brokers need to fully understand what they are signing up to. “One thing to note is that businesses often mix up high-availability computing with DR. High availability is ‘I want to keep it running all the time’, but it won’t cope with a disaster,” says Higgins. “Likewise, just because you’re using a hosting service doesn’t mean you have DR.”

“In the first instance the minimum requirement for any broker is that all his data is replicated off-site, and then he has to consider which solution answers the critical time question – how long can I afford to be down,” outlines Pocock. Haman also points out that in a full disaster scenario the company needs to have both its applications and data ready to run on back-up machines. “In order to provide DR we need hot standby solutions and even triple redundancy on parts of the system.”

While synchronous replication has benefits, easing the complexity of handling transactions caught mid-flight when disaster strikes, it has costs. High-speed networks and fast storage systems are critical to its success.

There is also another issue that faces any DR solution for forex brokers, and this is latency: even over fibre optic cable, the time it takes to transfer data grows with distance. “The fact is that synchronous replication, for example, beyond a distance of 60 kilometres isn’t possible and this can be a consideration in the DR strategy,” warns Pocock.

For example, while many brokers from Australia and New Zealand have their computer operations in the US, their response and trading times can be up to 500 milliseconds. “Now that doesn’t sound like much, but when you consider Chicago traders have latency of only 5 milliseconds, it can have a trading impact,” says Haman.
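The latency point has a simple physical floor. Assuming light in optical fibre covers roughly 200 km per millisecond and a hypothetical 0.5 ms local commit time, the sketch below shows why synchronous replication, which waits for the remote copy to confirm each write, gets steadily more expensive with distance, while asynchronous replication keeps writes fast but accepts a window of unreplicated data. Round trips computed this way are lower bounds; the roughly 56 ms it gives for London to New York sits below the 100 ms Haman cites because real cable routes and equipment add delay.

```python
# Light in optical fibre travels at roughly two-thirds of its vacuum speed,
# about 200 km per millisecond. This is a physical floor: real routes,
# switches and protocol overheads only add to it.
FIBRE_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float, route_factor: float = 1.0) -> float:
    """Lower bound on round-trip time between two sites.

    route_factor > 1 models cables that do not run in a straight line.
    """
    return 2 * distance_km * route_factor / FIBRE_KM_PER_MS


def synchronous_write_ms(local_commit_ms: float, distance_km: float, route_factor: float = 1.0) -> float:
    """A synchronous write is acknowledged only after the remote copy confirms it,
    so every write pays at least one round trip on top of the local commit."""
    return local_commit_ms + min_round_trip_ms(distance_km, route_factor)


def asynchronous_write_ms(local_commit_ms: float) -> float:
    """An asynchronous write is acknowledged after the local commit; the replica
    catches up later, so the cost moves into a non-zero recovery point (RPO)."""
    return local_commit_ms


for label, km in [("60 km metro link", 60), ("London-New York (~5,570 km)", 5_570)]:
    print(f"{label}: round trip >= {min_round_trip_ms(km):.1f} ms, "
          f"sync write >= {synchronous_write_ms(0.5, km):.1f} ms, "
          f"async write = {asynchronous_write_ms(0.5):.1f} ms")
```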

Indeed, it also highlights another trade-off. “So you have to look at where your backup system might be, ideally at a distance unlikely to be affected by the same event – you must assess your location risk and concentration risk,” explains Pocock. While this might seem obvious in a discussion about DR, it remains a huge issue. “Look at Hurricane Sandy on the eastern seaboard last year. Many brokers had their primary data centre in street A in New Jersey, while the backup was in street B in New Jersey. In fact they were only blocks away from each other and their disaster recovery plans didn’t work – the primary computer service was flooded, while the other was without power for days,” observes Haman. “Ideally it is often better to have a server in London and another in Zurich and so on, a solution used by the bigger players. But then again there is the problem that latency times will be longer – London to New York is 100 milliseconds, for example. Put it like this: Hurricane Sandy covered an area the size of northern Europe, so where is a safe distance?” adds Haman. These are the sorts of issues that FX brokers need to assess, balancing risk mitigation with costs. “A helpful rule of thumb is that the more immediate and full a solution you want, the more expensive it can be,” says Pocock.

“The most common structure of backup is leasing server systems from data centre companies because it’s cost-effective, and we’d use two data centre providers, a main and a backup, in different locations. While MetaTrader has a built-in backup facility called Watchdog, we don’t find it good enough and use Double-Take. It’s real-time replication to your back-up service, covering applications and software updates,” explains Higgins.
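Pocock’s location and concentration risk can be checked by measuring how far apart primary and backup sites actually are. This sketch uses the haversine great-circle formula; the London and Zurich coordinates are approximate city centres, the New Jersey coordinates are hypothetical, and the 500 km minimum is an assumed policy value since, as Haman notes, no distance is guaranteed safe.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
FIBRE_KM_PER_MS = 200.0   # approximate speed of light in fibre


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def site_review(primary: tuple[float, float], backup: tuple[float, float], min_separation_km: float) -> dict:
    """Report separation, the latency floor it implies, and whether it breaches the chosen minimum."""
    distance = great_circle_km(*primary, *backup)
    return {
        "distance_km": round(distance, 1),
        "min_round_trip_ms": round(2 * distance / FIBRE_KM_PER_MS, 2),
        "concentration_risk": distance < min_separation_km,
    }


london, zurich = (51.5074, -0.1278), (47.3769, 8.5417)
street_a, street_b = (40.7357, -74.1724), (40.7380, -74.1690)  # hypothetical sites blocks apart

print(site_review(london, zurich, min_separation_km=500))      # roughly 776 km apart
print(site_review(street_a, street_b, min_separation_km=500))  # under 1 km apart: flagged
```

Separation buys protection from a shared disaster at the price of a higher latency floor, which is the same trade-off Haman describes.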


The choice of hosting also raises another issue with how companies structure their IT. “If you’re using a shared or syndicated service, you must be comfortable with the rates and confident that the supplier’s design for sharing servers and networks will fulfil your needs,” says Pocock. This goes for white label structures too.

High frequency trading comes at a cost

One development that also poses problems for a DR strategy is the advent of high frequency trading. “The impact on your infrastructure is huge when you have algorithmic trading; they’re more expensive to run,” warns Haman. “A sole trader might do 2,000 trades per month; a high frequency trader can do 2,000 trades a minute, and this costs far more in bandwidth and trading queues.” Whether servicing these sorts of traders in the immediate aftermath of a disaster is a necessity can only be based on what the company perceives to be a business imperative, albeit a more expensive one.

On top of having backup systems ready to go with data and application sets in place, a growing area of disaster recovery is linked to security. “The fact is that hardware failure very rarely happens, and software failure is only slightly more common, as is network failure, but cyber attacks are becoming a big problem and fitting your security processes into your DR strategy is critical,” says Higgins. He has had clients who have been attacked after blackmail attempts for up to $1 million, and even one example of hackers entering the system to manipulate prices.

A report by Florida-based Prolexic, a distributed denial of service (DDoS) protection service, says that Layer 7 DDoS attacks, the most serious kind, increased steadily from 17% in Q3 2011 to 21% in Q4 2011 to 27% in Q1 2012. Last year Global eSolutions (Hong Kong) Limited had one FX client headquartered in the UK become a target after management did not respond to a ransom demand from cybercriminals. The first attack flooded the site with DDoS requests, interrupting site availability for four hours. A second, more damaging attack occurred three weeks later, rendering the trading platform almost inaccessible to online traders. “The problem has become more important since most FX companies have their web site connected to CRM systems and operations are critical – you need to have sufficient bandwidth to be able to stop a DDoS attack,” says Haman. That can be an additional cost too.

For Higgins, there is also an area often overlooked in FX disaster recovery. “Your liquidity provider is critical to your business and people don’t think about it enough – backup for liquidity provider,” he says. “Your price feed and execution might be done through a provider who might have a problem, and so it doesn’t matter what you’ve done, you need another provider. You also need to find out how your prime brokers will cope with their liquidity providers and make sure there’s not a single point of failure.” Haman agrees it is an area often overlooked. “If you lose your only provider, you have a choice to either stop trading or take all the risk yourself.” Of course this creates a huge risk premium and may be in breach of regulatory requirements on risk and liquidity.
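The priority-ordered fallback Higgins and Haman describe can be sketched as follows. The provider and prime broker names are hypothetical, and the `healthy` flag stands in for whatever price-feed or execution monitoring a broker actually runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LiquidityProvider:
    name: str
    prime_broker: str     # who this provider's flow clears through
    healthy: bool = True  # would be set by a price-feed/execution monitor (hypothetical)


def route_to(providers: list[LiquidityProvider]) -> Optional[LiquidityProvider]:
    """Return the first healthy provider in priority order, or None if none is left.

    None is the point at which a broker must stop trading or carry the risk itself.
    """
    for provider in providers:
        if provider.healthy:
            return provider
    return None


def single_points_of_failure(providers: list[LiquidityProvider]) -> list[str]:
    """Flag configurations where one failure takes out every source of liquidity."""
    issues = []
    if len(providers) < 2:
        issues.append("fewer than two liquidity providers configured")
    if len({p.prime_broker for p in providers}) < 2:
        issues.append("all providers depend on the same prime broker")
    return issues


lps = [LiquidityProvider("LP-A", "PB-1"), LiquidityProvider("LP-B", "PB-1")]
lps[0].healthy = False
print(route_to(lps).name)             # LP-B
print(single_points_of_failure(lps))  # ['all providers depend on the same prime broker']
```

Having a second provider is not enough on its own if both clear through the same prime broker, which is the single point of failure Higgins warns about.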

White label solutions usually fall short

As with hosting services and high-availability servers, FX brokers often seem to believe a DR process is built into a white label solution. According to Haman, it is rare to see a fully-fledged DR system in place. “There’s an intrinsic issue with white label providers, who generally operate with high load ratios on their computers to make their business cost-effective,” he says. He feels it is critical to get the white label provider to explain its procedures and DR plans in detail. “It’s also important that your white label provider sends you a backup every 12 or 24 hours; that way you can switch to a near-complete restore with another provider in 30 seconds,” he adds.
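Haman’s 12- or 24-hour backup cadence also fixes the worst-case data loss: restoring from a white label provider’s last backup loses everything since it was taken. A minimal monitor, assuming the provider reports a timestamp for each backup and using a hypothetical 30-minute grace period and example date, might flag overdue backups:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_status(last_backup_at: datetime, agreed_interval: timedelta,
                  now: Optional[datetime] = None,
                  grace: timedelta = timedelta(minutes=30)) -> dict:
    """Check a white label provider's latest backup against the agreed schedule.

    The backup's age is also the worst-case data loss if you had to restore now.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_backup_at
    return {
        "age_hours": round(age.total_seconds() / 3600, 2),
        "overdue": age > agreed_interval + grace,
    }


now = datetime(2013, 1, 15, 12, 0, tzinfo=timezone.utc)  # hypothetical check time
print(backup_status(now - timedelta(hours=11), timedelta(hours=12), now))  # {'age_hours': 11.0, 'overdue': False}
print(backup_status(now - timedelta(hours=13), timedelta(hours=12), now))  # {'age_hours': 13.0, 'overdue': True}
```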

Both Haman and Higgins also warn that using cloud structures for DR comes with serious risks. “I’ve seen some brokers use public cloud-based systems like Amazon, and when someone on it starts over-using it, it creates havoc with their own operations,” warns Haman. He advocates that use of cloud or shared systems should not exceed 20% of total cloud capacity, leaving everyone on the system operational headroom even when usage peaks. “The cloud is a lovely idea but it has hideous repercussions unless it’s been designed to do it. If only part of a common shared system goes down, it’ll have an effect on everyone unless you’ve designed it properly,” says Higgins.
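Haman’s 20% figure can be read more than one way; the sketch below takes it as a cap on any one user’s share of a shared system’s capacity, and adds an assumed peak multiplier to test whether everyone still fits when usage spikes. Tenant names, loads and the 1.5× multiplier are all hypothetical.

```python
def capacity_review(usage: dict[str, float], capacity: float,
                    per_user_ceiling: float = 0.20,
                    peak_multiplier: float = 1.5) -> dict:
    """Flag users above the ceiling share and test whether peak load still fits.

    per_user_ceiling reads Haman's 20% as a cap on any one user's share;
    peak_multiplier is an assumed factor for how far load can spike.
    """
    over = sorted(name for name, load in usage.items() if load / capacity > per_user_ceiling)
    peak_load = sum(usage.values()) * peak_multiplier
    return {"over_ceiling": over, "peak_fits": peak_load <= capacity}


usage = {"broker-a": 15.0, "broker-b": 30.0, "tenant-c": 10.0}
print(capacity_review(usage, capacity=100.0))  # {'over_ceiling': ['broker-b'], 'peak_fits': True}
usage["broker-b"] = 60.0  # a neighbour starts over-using the shared system
print(capacity_review(usage, capacity=100.0))  # {'over_ceiling': ['broker-b'], 'peak_fits': False}
```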

Conclusion

“Brokers need to get their DR providers to articulate what they’ll do in the event of a crisis and recognise that having top capabilities in every area is an expensive business,” outlines Pocock. “How fast you want to recover is a business decision, but there remain basic structures for any DR policy – regular and fast backups, security, at least primary and secondary servers in separate places, even triple redundancy. Remember too that a good DR strategy does not have a short-term, margin-driven foundation; it needs to be risk-driven.”
