All industries rely on technology infrastructure to function, but businesses in the world of finance need to be agile enough to respond to the world market without interruption. The volume and speed of transactions play a crucial role in the success of a hedge fund, and technology outages can have a devastating impact. Firms invest in infrastructure that needs to be reliable, whether it is located on-site or part of a cloud services offering.
But what happens in the case of an unavoidable emergency? Traders, managers and executives alike know the possibility of downtime is very real, but often don’t have a thorough answer for what to do if a crisis strikes. With a high concentration of financial firms located in the New York Metro area, many hedge funds had their disaster recovery solutions put to the test in the wake of Hurricane Sandy. It was a wake-up call for many, proving that firms of all sizes need a disaster recovery (DR) solution for their systems as well as a business continuity plan (BCP) so employees know where to go and what to do in an emergency.
Implementing a fully-functional DR/BCP strategy can seem overwhelming, especially for startup firms that may have limited in-house technology resources. Firms may overlook the need for a complete business continuity plan that accounts for all types of technology outages. The right DR/BCP strategy will account for each element of the business ecosystem, from which elements of infrastructure should be covered by DR to the acceptable amount of downtime for systems and employees.
When formulating their DR/BCP strategies, firms can begin with the following six best-practice recommendations to create a process that ensures they can continue to do business through a disruption:
1. Think About Scale
Make sure the business continuity plan takes into account all elements of the firm’s technology and realistically approaches how employees conduct business. A five-person startup may be able to successfully implement a plan that won’t work for an established fifty-person firm. Acknowledgement of scale is crucial. It’s not just office scale that needs to be accounted for, but also the scale of the potential business disruption. For example, during Hurricane Sandy, power outages across the New York Metro area meant that the traditional work-from-home approach to unusable office facilities was not a scalable option.
2. Real-time Replication of the Production Environment
The best DR solutions provide the ability to seamlessly connect to a duplicate of the production environment that represents the most recent versions of production files, applications and server configurations. Traders and fund managers should be confident that they can re-connect to the most recent versions of mission-critical data and applications when there is an outage or a displacement from the office. Without real-time replication, the achievable recovery time and recovery point both fall well short of a tight RTO (recovery time objective, the target time to restore service) and RPO (recovery point objective, the maximum acceptable window of data loss), meaning the most recent critical information will be missing from DR.
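To make the RPO point concrete, the sketch below estimates the worst-case age of the DR copy under a deliberately simplified model: replication starts at a fixed interval and each transfer takes a fixed time. The function names and all of the figures are illustrative assumptions, not measurements from any firm or product.

```python
from datetime import timedelta


def worst_case_data_loss(replication_interval: timedelta,
                         transfer_time: timedelta) -> timedelta:
    """Upper bound on how stale the DR copy can be when production fails.

    Simplified assumption: a failure just before a transfer completes
    leaves DR holding the snapshot from the start of the previous cycle,
    so up to one full interval plus one transfer time of data is missing.
    """
    return replication_interval + transfer_time


def meets_rpo(replication_interval: timedelta,
              transfer_time: timedelta,
              rpo: timedelta) -> bool:
    """True if the worst-case data loss stays within the stated RPO."""
    return worst_case_data_loss(replication_interval, transfer_time) <= rpo


# Hypothetical figures for comparison only.
nightly = worst_case_data_loss(timedelta(hours=24), timedelta(hours=2))
near_real_time = worst_case_data_loss(timedelta(seconds=5), timedelta(seconds=1))

print("Nightly replication, worst-case loss:", nightly)
print("Near-real-time replication, worst-case loss:", near_real_time)
print("Nightly meets a 1-hour RPO:",
      meets_rpo(timedelta(hours=24), timedelta(hours=2), timedelta(hours=1)))
```

Under these assumed numbers, a nightly cycle can leave more than a full trading day of activity out of DR, while a replication cycle measured in seconds keeps the gap within a few seconds.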
3. Replication Method Suits the Production Environment
Firms need to tailor their DR strategy, as DR functions differently depending on whether the business runs on traditional physical servers or virtual machines. In much the same way that virtual machines have allowed firms to take advantage of dramatic increases in computing power and flexibility, the VM environment offers some unique advantages when implementing DR. Matching VM production environments to virtualized DR environments means businesses can utilize the simplicity and efficiency that virtual computing offers.
4. Ease of Fail-over
Unlike backups, which are typically executed on a daily basis and can take days to restore to servers, DR is a duplicate of production that replicates in real time. As such, it should be possible to bring it online within a very short time frame. How short depends on the volume of data and services, and these aspects need to be considered when determining the RTO for the DR environment. Invoking backups is an often-tedious process that can involve securing physical media from off-site storage, rebuilding the environment and reinstalling data. DR should be invoked through fail-over, a process by which the environment switches from the now-unavailable production to its off-site replica. The fail-over process should involve minimal stress and leave few questions as to how data will behave in these scenarios. A flexible DR environment should be able to be invoked in the case of interruptions large and small.
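As a rough illustration of the fail-over logic described above, the sketch below switches the active site from production to DR after a run of consecutive failed health checks. The class name, the threshold of three checks and the site labels are all hypothetical; real DR platforms have their own fail-over triggers, and many firms prefer a manual declaration of disaster.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks production health checks and decides when to fail over.

    Hypothetical sketch: the threshold avoids failing over on a single
    transient blip, and fail-back to production is left as a deliberate
    manual step, so the monitor never switches back on its own.
    """
    failure_threshold: int = 3
    active_site: str = "production"
    consecutive_failures: int = 0

    def record_check(self, production_healthy: bool) -> str:
        """Record one health-check result and return the active site."""
        if self.active_site == "dr":
            return self.active_site  # already failed over; stay on DR
        if production_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active_site = "dr"
        return self.active_site


# Simulated sequence of health checks (True means production responded).
checks = [True, False, False, True, False, False, False, True]
monitor = FailoverMonitor(failure_threshold=3)
results = [monitor.record_check(healthy) for healthy in checks]
print(results)
```

In this simulated run the two isolated failures do not trigger fail-over, while the later run of three consecutive failures does. The environment then stays on DR even after production responds again, which reflects the assumption that returning to production should be a planned decision.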
5. SOC Audit and Compliance with Highest Industry Regulations
Due to the extremely confidential nature of the data that passes through hedge fund systems, firms need to ensure that their DR environment resides in a fully-audited data center. DR servers should never be accessed by unauthorized personnel and should be subject to the most stringent quality controls and maintenance procedures. Best practices recommend that vendors pass the SSAE 16 Type II audit as outlined by the AICPA (a standard since superseded by SSAE 18). These standards for controls and processes ensure that data centers are staffed and maintained with the highest degree of security in mind.
6. Hot Site Offices
What is the firm’s solution in cases where it’s not just the production environment that is rendered unusable, but also the physical offices where business is conducted? A wide-scale power outage can also disrupt internet connectivity in residential areas, which rules out working from home. To ensure a fully-functional office atmosphere, critical hedge fund personnel identified in the firm’s BCP should have access to hot site offices that have reliable, redundant power and internet. These offices represent a duplicate of the main office, complete with access to the firm’s DR environment and mission-critical resources like printers, scanners and meeting rooms. Firms need to be sure that employees are aware that they have access to the hot site office resources and should include guidelines on when to use these services in their business continuity plans.
With thorough planning and reliable technology, firms can weather interruptions of all sizes. A proper DR/BCP strategy backed by a comprehensive test program will relieve much of the stress and the “what ifs” experienced during a service interruption, whether it is a power or equipment failure, a severe weather event or simply localized maintenance.