
Business Continuity in a heterogeneous environment

Horia Constantinescu, Presales Manager, EMC
Bucharest, September 2009

© Copyright 2009 EMC Corporation. All rights reserved.



Business Continuity Objectives

Drivers and Objectives

• Driver: Service levels becoming more stringent. Over 50% of customers surveyed have recovery times (RTOs) of less than 4 hours and maximum potential data loss (RPOs) of less than 4 hours (Source: Enterprise Strategy Group)
  Objective: Continuous availability

• Driver: Expanded regulatory requirements (SOX, HIPAA, SEC, etc.)
  Objective: Documented business continuity processes, controls, and test results required for defined records

• Driver: Global business processes (Supply Chain, Customer Service, etc.)
  Objective: 24-hour application-availability expectations

• Driver: Application consolidations. 75% of downtime occurrences are caused by poor technology in the network and application infrastructure (Source: IDC)
  Objective: Increased business impact of application outage driving increased availability expectations

• Driver: An average company incurs over $1 million of revenue loss per hour of downtime (Source: Meta Group)
  Objective: Increased cost of downtime driving increased availability expectations


A Business-Oriented Approach…

• Determine requirements/service levels
  – Determine system/application mapping
• Validate ability to achieve service-level agreements
  – Evaluate costs/tradeoffs of technologies to meet service levels
• Create right level of protection for your specific business and application requirements
• Tie it all together:
  – Across storage platforms
  – Across infrastructure (storage, servers, networks, applications)
  – Across data centers and geographic locations
  – Simplify management overhead and implementation risk by working with vendors who can manage the whole project

…Resulting in improved protection of information:

• Business continuity solution that meets your particular business needs
• End-to-end solutions across storage, network, application, and server infrastructure


Business Continuity Framework

Phases: Plan, Build, Manage

Activities:
• Assess Program/Service Levels
• Define Business Requirements
• Evaluate Availability and Recovery Alternatives
• Design Infrastructure
• Conduct Implementation Planning
• Test and Implement Technologies
• Develop Recovery/Failover Plans
• Conduct Recovery Testing
• Develop/Update Program Definition
• Manage Resources, Improvements, Measurement

PROGRAM MANAGEMENT AND INTEGRATION


Assess Program/Service Levels

Conduct high-level review of current recovery program

EXAMPLE RECOVERABILITY MATRIX

Failure Scenarios
• Data Center: Catastrophic data center failure
• Hardware: Catastrophic application hardware failure, loss of redundancy
• Software: Application failure due to virus or worm; data is unaffected
• Data Corruption: Data or database corruption that proliferates through replication
• Network: Loss of network connectivity to primary site

All RTO and RPO values are in hours.

Business Function       Data Center    Hardware       Software       Data Corruption   Network   Manual Process
                        RTO    RPO     RTO    RPO     RTO    RPO     RTO    RPO        RTO       Rating
(label not preserved)   188    36      168    24      168    24      168    24         8         0
Process A               188    36      168    24      168    24      168    24         8         2
Application A           188    36      168    24      168    24      168    24         8         4
Application B           1      0       0.5    2       0.5    2       0.5    2          8         2
Process B               72     36      24     2       24     2       24     2          4         0

Manual Process Rating Legend
4 = Process exists, and is sustainable with acceptable productivity levels
2 = Process exists, but has limited sustainability due to productivity impacts
0 = Process does not exist or is not sustainable

Key Deliverables
• Executive-level presentation describing the relative strengths and weaknesses of the business continuity program, including the ability to:
  – Validate current RTO/RPO service levels
  – Meet business requirements (RPO/RTO)
  – Recover using existing plans


Define Business Requirements

Analyze the criticality of key business processes and applications

Figure: SYSTEM APPLICATION MAPPING showing business-process, application, and infrastructure interdependencies

Key Deliverables
• Business-process diagram listing key business processes, sub-processes, cycle high points, and associated support applications
• Financial and operational impacts associated with downtime or data loss
• Scorecard identifying gaps between required and current recovery capabilities
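The gap scorecard named in the deliverables can be pictured with a minimal sketch. The `RecoveryTarget` class, the `gap_scorecard` function, and all numbers below are hypothetical and chosen only for illustration; they are not taken from an EMC tool or from a real assessment.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTarget:
    """RTO/RPO pair in hours (hypothetical structure for illustration)."""
    rto_hours: float
    rpo_hours: float


def gap_scorecard(required, current):
    """Return per-application gaps in hours.

    A positive gap means the current capability misses the requirement.
    """
    scorecard = {}
    for name, req in required.items():
        cur = current[name]
        scorecard[name] = {
            "rto_gap": max(0.0, cur.rto_hours - req.rto_hours),
            "rpo_gap": max(0.0, cur.rpo_hours - req.rpo_hours),
        }
    return scorecard


# Illustrative values only.
required = {
    "Application A": RecoveryTarget(24, 4),
    "Application B": RecoveryTarget(4, 2),
}
current = {
    "Application A": RecoveryTarget(168, 24),
    "Application B": RecoveryTarget(0.5, 2),
}

for app, gaps in gap_scorecard(required, current).items():
    status = "GAP" if gaps["rto_gap"] or gaps["rpo_gap"] else "OK"
    print(app, gaps, status)
```

In this sketch a gap is reported only when the current recovery time or data loss is worse than the requirement; over-achievement is clamped to zero so it does not mask gaps elsewhere.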


Evaluate Availability and Recovery Alternatives

Recommend an availability strategy

Figure: ALTERNATIVE ANALYSIS, based on recovery requirements and financial validation. The chart plots Hours of Lost Transactions (RPO) and Hours Required to Resume Business (RTO) on a scale of 0 to 84 hours against Cost per Month (20K, 30K, 40K, 60K, 90K, 150K, 250K). Alternatives labeled on the chart: Full Volume Tape Back up Nightly, Tape Vaulting, Database Journaling, Consistent Recovery Restart, Asynchronous Point in Time Copy, Continuous Asynchronous, Synchronous Mirror. Timeline segments: Transactions Not Captured, Declaration, Data Retrieval, Transit, System Restore, IPL & Network, Database Restore, Transaction Recreation.

Key Deliverables
• Executive-level cost/benefit analysis of alternatives, the recommended alternative, and a high-level implementation plan
• Technical architectures, high-level cost and benefits for each alternative
• Availability-, disaster-, and operation-recovery service catalog, including:
  – Application-recovery tier definitions
  – Technical and operational support requirements
  – Reference architecture and total cost of ownership model


Design Infrastructure

Develop detailed architectural design for recovery technologies

Figure: BANDWIDTH USAGE. Analysis of network-bandwidth requirement for recovery against current capacity, using EMC Business Continuity Design Tool. The chart plots Compressed Bandwidth (0 to 35 MB/s) per Time Interval against the Bandwidth Limit, with peak and non-peak workload periods marked.

Key Deliverables
• Recovery-design documentation
  – Detailed solution design
  – Scope and objectives
  – Solution recommendations and resources by technology, location, RTO, etc.
  – Assumptions
  – List of constraints and potential solutions
  – Financial alternatives
  – Implementation and ongoing cost estimates
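The bandwidth analysis described on this slide compares the compressed replication bandwidth in each time interval with the available link capacity. The sketch below shows that comparison under simple assumptions: a fixed compression ratio and hypothetical per-interval change rates. It is not the method used by the EMC Business Continuity Design Tool, and the function names are invented for illustration.

```python
def required_bandwidth_mb_s(change_rate_mb_s, compression_ratio):
    """Compressed bandwidth needed to keep up with a given write (change) rate."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return change_rate_mb_s / compression_ratio


def intervals_over_limit(change_rates_mb_s, compression_ratio, limit_mb_s):
    """Indices of time intervals whose compressed bandwidth exceeds the link limit."""
    return [
        i
        for i, rate in enumerate(change_rates_mb_s)
        if required_bandwidth_mb_s(rate, compression_ratio) > limit_mb_s
    ]


# Hypothetical change rates in MB/s, one value per time interval.
rates = [5, 10, 30, 60, 20, 5]
over = intervals_over_limit(rates, compression_ratio=2.0, limit_mb_s=12.5)
print(over)  # [2, 3]
```

Intervals that exceed the limit are the ones where replication would fall behind, so they indicate either a need for more bandwidth or a looser RPO during peak workload.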


Conduct Implementation Planning

Plan recovery-infrastructure build-out

Figure: DETAILED TASKS, TIMELINE, RESOURCES, AND COSTS FOR SELECTED SOLUTION. Timeline template covering Month 1 through Month 6, with rows for Activities (phase names), Milestones (milestone and success criteria), and CSFs (critical success factors).

Cost by infrastructure type

                              Core Infrastructure   Application Specific Infrastructure
Software ($)                  $25,987               $110,833
Server ($)                    $197,250              $1,121,382
SAN ($)                       $11,461               $25,786
Storage ($)                   $58,141               $344,879
Implementation Services ($)   $25,000               $139,000
Hardware Subtotals            $317,839              $1,741,880

Cost summary

                                             Capital Expense   Annual Operating Expense   Total
Software ($)                                 $179,405          $21,529                    $200,934
Server ($)                                   $2,436,990        $292,439                   $2,729,429
SAN ($)                                      $88,819           $10,658                    $99,477
Storage ($)                                  $755,407          $90,649                    $846,056
Implementation Services ($)                  $273,269          $0                         $273,269
Hardware Subtotals                           $3,733,890        $415,274                   $4,149,164
Shared Infrastructure & Services Subtotals   $2,049,031        ($427,325)                 $1,621,705
Project & Management Cost Subtotal           $452,000          $235,000                   $687,000
GRAND TOTAL                                  $6,234,920        $222,949                   $6,457,869

Key Deliverables
• Detailed implementation plan for technology architecture, including:
  – Tasks
  – Timelines
  – Dependencies
  – Resources
  – Milestones
  – Deliverables
  – Costs


Test and Implement Technologies

Implement recovery solution

Figure: COORDINATED INSTALLATION, INTEGRATION, AND TESTING OF RECOVERY SOLUTION (hot-site network diagram). Components shown: ATM Transaction Encryption Device (2), Ethernet Switches, Router, ATM Router, ATMs, FedLine PC Terminal (ACH files received and sent), PACE Controller, UnifiLT, UnifiLC, MISER Core Banking and ATM Authorization and Routing, Disk Storage Subsystem, Internet Firewalls, Vendor Zone Firewall, E-Commerce Banking Server (Bank by Internet), Internet Customers, PCI/Reports Cold Server.

Notes on the diagram:
• Add T-1s or ask if carrier can swing from data center
• WAN point-to-point connectivity to branches with Teller Operations, Loan Officers, and ATM. Also connectivity to Loan Centers. Have carrier swing DS3s from data center to hot site.
• Disk storage subsystem connectivity would be contingent on NAS or SAN
• Internet and firewall capabilities would have to be replicated at hot site
• Current data would be available to systems
• High-speed data mirroring to/from disk subsystem at data center (DS3, OC3, or DWDM over dark fiber)
• Ideally, all of the systems would utilize NAS or SAN to maintain data on disk subsystem
• Fractional T1 or ISDN for management
• PCI/Reports Cold Server: all branches and back office personnel view reports at their WS on this server
• This is minimal amount of equipment to support critical business operations

Key Deliverables
• Recovery-architecture implementation
• Implementation of recovery software and hardware
• Migration of recovery-group applications to new architecture
• Technical sizing, tuning, and unit-testing results


Develop Recovery/Failover Plans

Create procedures to recover from primary to alternate sites

Figure: RECOVERY ORGANIZATION AND TIMELINE, defining the execution of recovery solution. Organization chart labels: Emergency Management Team, Administrative Council, Business Continuity Coordinator, Business Units, Admin. Team, Facilities, Technology Services, Help Desk, Network Operations, Data Center. Flow labels: Decision/Direction, Authorization, Event Reporting Process, Task/Coordination. Functions listed: Business Operations, Facilities, Security, Human Resources, Information Tech., Finance, Accounting, Clerical Support, Supplies, Purchasing, Travel, Insurance.

Key Deliverables
• Completed recovery and/or failover/failback plans
• Plan administration-process definition
• Plan development training
• Plan acceptance testing
• Supporting documentation (optional)
• Plan automation-software installation (optional)
• Plan automation-software training (optional)


Conduct Recovery Testing

Systematically test recovery capability

Key Deliverables
• Documented test results
• Testing guidelines, including goals, budget, and audit procedures
• Annual test plan
• Training materials for each testing scenario


Develop/Update Program Definition

Establish program goals, policies, and metrics

Key Deliverables
• Program plan, including goals, policies, and metrics

TYPICAL PROGRAM DEFINITIONS

Objectives
Fundamental objectives of disaster recovery plan are:
• To protect IT employees of the company
• To provide a plan structure that, when executed, has the ability to recover normal daily operations across the in-scope applications following a catastrophic event at the Company X data center
• To guarantee continued availability of critical services and processes to Company X customers
• To provide sufficient procedural details to allow execution by other Company X trained …

Scenario
A disaster may be declared when a disruption to normal processing operations occurs and the expected time for returning to normal operations would exceed predetermined timeframes established by IT for the in-scope applications. Company X’s recovery and restoration program is designed to support a recovery effort where Company X’s IT staff would not have access to its primary data center at the onset of the emergency condition.

Approach
The Disaster Recovery Planning approach is to:
• Prevent disruptive events through pre-emptive technical and administrative controls and heightened employee awareness
• Pre-assign and define recovery responsibilities by team and task to control disaster response
• Prudently maintain the plan at regular intervals

Assumptions
The Company X Disaster Recovery Plan was developed under certain assumptions in order to address the disaster scenario stated in Section 1.7 above. The recovery strategy for the five critical applications operating in Company X’s primary data center is dependent on the following assumptive statements.


Manage Resources, Improvements, Measurements

Support program operations and continuous improvement

ELEMENTS OF TYPICAL BUSINESS CONTINUITY IMPROVEMENT PLAN

Business Profile
• Business continuity strategy and objectives
• Organization responsibilities
• Client profile
• Strategic plans
• Business and technology plans

Strategy
• Vital records
• Data recovery and synchronization
• Alternate facilities
• Voice/data network
• Hardware/software
• Server recovery

Process and Results
• Impact and risk assessment
• Plans and procedures
• Testing
• Maintenance
• Interdependencies

Key Deliverables
• Program-review report and presentation
• Resource plan
• Improvement plan
• Measurement plan
• Regularly scheduled presentation to management


EMC RecoverPoint Family Replication for Operational and Disaster Recovery



RECOVERPOINT OPERATIONS

RecoverPoint Replication

Figure labels: PRODUCTION SITE (cluster active node, cluster passive node, SAN, RecoverPoint appliance, production LUNs, CDP copy) connected over SAN/WAN to DISASTER RECOVERY SITE (RecoverPoint appliance, standby disaster recovery server, SAN, tape backup manager, CRR copy, tape library).

RecoverPoint Replication Services
• Local and CDP journals
• Production data available during replication
• Initial synch via network, tape, or additional array
• Compresses and sends only changed data over the wire
• Local and/or remote replication
• Application-consistent replication with CDP
• Integration with Exchange and SQL and other applications
• Replication of Fibre Channel and iSCSI LUNs
• Enhanced support with EMC Replication Manager and EMC NetWorker
• Server-consistent replication and recovery
• Supports federated collection of servers and storage arrays
• Asynchronous crash and application consistent data recovery
• Any copy can be made available as read/write
• Changes to the copy can be incrementally reapplied to the primary


RecoverPoint Remote Protection Process: CRR

1. Data is split and sent to the RecoverPoint appliance in one of three ways
2a. Host splitter
2b. Intelligent-fabric splitter
2c. CLARiiON splitter
3. Writes are acknowledged back from the RecoverPoint appliance
4. Appliance functions
   • Fibre Channel-IP conversion
   • Replication
   • Data reduction and compression
   • Monitoring and management
5. Data is sequenced, checksummed, compressed, and replicated to the remote RecoverPoint appliances over IP or SAN
6. Data is received, uncompressed, sequenced, and checksummed
7. Data is written to the journal volume
8. Consistent data is distributed to the remote volumes

Figure labels: Local site volumes /A, /B, /C; Remote site volumes rA, rB, rC; Journal volume.
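The sending and receiving steps of the CRR process (sequence, checksum, compress, then uncompress, verify, and reorder before journaling) can be illustrated with a toy sketch. It uses Python's standard `zlib` module only to stand in for whatever checksum and compression the appliances actually use; `package_write` and `receive_writes` are hypothetical names and do not represent RecoverPoint interfaces.

```python
import zlib


def package_write(seq, payload):
    """Local appliance side: sequence, checksum, and compress one write."""
    return {"seq": seq, "crc": zlib.crc32(payload), "data": zlib.compress(payload)}


def receive_writes(packets):
    """Remote appliance side: decompress, verify checksums, restore write order."""
    journal = []
    for packet in sorted(packets, key=lambda p: p["seq"]):
        payload = zlib.decompress(packet["data"])
        if zlib.crc32(payload) != packet["crc"]:
            raise ValueError(f"checksum mismatch for write {packet['seq']}")
        journal.append((packet["seq"], payload))
    return journal


writes = [b"block-A v1", b"block-B v1", b"block-A v2"]
packets = [package_write(i, w) for i, w in enumerate(writes)]
packets.reverse()  # simulate out-of-order arrival over the WAN
crr_journal = receive_writes(packets)
print([payload for _, payload in crr_journal])
```

Restoring the original write order before journaling is what lets the remote copy be crash-consistent: writes are applied in the sequence the production host issued them.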


RECOVERPOINT OPERATIONS

Replication Source Objects

• LUNs
  – Used for content distribution, backup, and application testing
  – One-time copy also an option
  – Resides on any array supported by RecoverPoint
  – iSCSI LUNs residing on CLARiiON CX3 or CX4 array
• Consistency groups
  – LUNs belonging to a specific application reside in a RecoverPoint consistency group
  – Each consistency group has one or more replication sets
  – Each replication set has the production LUN and a local and/or remote LUN
  – All replication and recovery is performed at the consistency group level
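The relationship between consistency groups, replication sets, and LUNs described above can be sketched as a small data model. The class and method names are hypothetical and only mirror the wording of the slide; they are not RecoverPoint objects or API calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReplicationSet:
    """One production LUN plus a local (CDP) and/or remote (CRR) copy."""
    production_lun: str
    local_lun: Optional[str] = None
    remote_lun: Optional[str] = None

    def __post_init__(self):
        if self.local_lun is None and self.remote_lun is None:
            raise ValueError("a replication set needs a local and/or remote LUN")


@dataclass
class ConsistencyGroup:
    """All LUNs of one application, replicated and recovered together."""
    name: str
    replication_sets: List[ReplicationSet] = field(default_factory=list)

    def luns_to_recover(self, copy):
        """Recovery acts on the whole group: return every LUN of the chosen copy."""
        attr = {"local": "local_lun", "remote": "remote_lun"}[copy]
        return [getattr(rs, attr) for rs in self.replication_sets if getattr(rs, attr)]


group = ConsistencyGroup("exchange", [
    ReplicationSet("prod_db", local_lun="cdp_db", remote_lun="crr_db"),
    ReplicationSet("prod_log", remote_lun="crr_log"),
])
print(group.luns_to_recover("remote"))  # ['crr_db', 'crr_log']
```

Grouping at this level is the point of the model: a recovery request never names a single LUN, so the database and log volumes of one application are always rolled to the same point together.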


RECOVERPOINT OPERATIONS

Defining Replication Parameters

• Consistency group
  – Multiple replication sets
    • Source LUN
    • Local (CDP) LUN
    • Remote (CRR) LUN
• Compression
• Optimization (lag or bandwidth)
• Resource prioritization

Specify RPO for remote replication using size, number of writes, or time
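Expressing the RPO as a size, a number of writes, or a time means that replication lag can be checked against whichever limit the administrator chose. The sketch below assumes a hypothetical `RpoPolicy` object with one optional limit per unit; the field names are illustrative and are not RecoverPoint settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RpoPolicy:
    """Hypothetical policy object: set whichever limit applies."""
    max_lag_bytes: Optional[int] = None
    max_lag_writes: Optional[int] = None
    max_lag_seconds: Optional[float] = None


def rpo_exceeded(policy, lag_bytes, lag_writes, lag_seconds):
    """True when the measured replication lag is beyond a configured limit."""
    checks = [
        (policy.max_lag_bytes, lag_bytes),
        (policy.max_lag_writes, lag_writes),
        (policy.max_lag_seconds, lag_seconds),
    ]
    return any(limit is not None and lag > limit for limit, lag in checks)


six_minutes = RpoPolicy(max_lag_seconds=360)
print(rpo_exceeded(six_minutes, lag_bytes=0, lag_writes=0, lag_seconds=400))  # True
```

Limits left unset are ignored, so a time-based policy is not tripped by a large backlog in bytes as long as the remote copy is recent enough.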


Synchronous Replication: Dynamic Switching Between Synchronous and Asynchronous

• Asynchronous or synchronous replication is a policy set for each consistency group
  – Dynamic switching by latency and by throughput can be set and later updated
  – Checking the “Allow Regulation” option will throttle the application and is required for an RPO of zero

Screenshot callouts: Check for synchronous CRR; Monitor latency (0–4 ms); Monitor throughput; Check for true synchronous.
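Dynamic switching can be read as a threshold rule: stay synchronous while the link is fast and the write load is modest, and fall back to asynchronous otherwise. The sketch below is an assumption about that rule, not the product's algorithm. The 4 ms default echoes the latency range on the slide, while the throughput threshold and the function name are hypothetical.

```python
def choose_mode(latency_ms, throughput_mb_s,
                max_latency_ms=4.0, max_throughput_mb_s=50.0):
    """Return 'sync' while both measurements stay within thresholds, else 'async'."""
    if latency_ms <= max_latency_ms and throughput_mb_s <= max_throughput_mb_s:
        return "sync"
    return "async"


print(choose_mode(latency_ms=2.0, throughput_mb_s=10.0))  # sync
print(choose_mode(latency_ms=6.0, throughput_mb_s=10.0))  # async
```

A production policy would normally use separate thresholds for leaving and re-entering synchronous mode to avoid flapping; that refinement is omitted here to keep the rule visible.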


RECOVERPOINT OPERATIONS

RecoverPoint Bandwidth Reduction

• Administrator sets policies for importance and RPO
• Administrator optionally specifies bandwidth policy
• RecoverPoint monitors bandwidth and optimizes resource usage

Figure labels: timeline from 12:00 a.m. through 6:00 a.m. and 6:00 p.m. to 12:00 a.m.; source array with updates to consistency groups CG1 and CG2; link bandwidth of 10 Mb/s, reduced to 2 Mb/s by external traffic shaping tools, then 10 Mb/s again; target array with Target CG1 (20-minute RPO) and Target CG2 (6-minute RPO).


RECOVERPOINT V3.1 FEATURES

RecoverPoint Enhancements New in V3.1

• RecoverPoint/Cluster Enabler
  – Integrates with Microsoft clusters to enhance application availability
• Snapshot consolidation
  – Enables longer-term recovery with same storage consumption
• Stretched CDP
  – Provides synchronous replication up to 30 kilometers
  – Enables cascaded RecoverPoint for three-site multi-hop disaster recovery configurations
• Virtual Provisioning support
  – Supports CLARiiON CX4 and Symmetrix DMX
  – Replication of thin LUNs preserves storage allocation policies
• Replication over Fibre Channel
  – Preserves existing financial investments
• Performance and scalability improvements
  – Protects more applications with existing investments
  – Protects more applications quicker


RecoverPoint Use Cases

• Operational recovery
  – Local journal allows any-point-in-time rollback for quick recovery from data corruption
  – Quickly access and mount local replica to any server at local site
  – Recover data manually or use RecoverPoint wizards to rebuild production
• Disaster recovery
  – Duplicate copy of data at a remote location
  – Remote journal allows point-in-time rollback
  – Aggregate to single array at remote location
  – Utilize different array family or vendor at remote location
  – Integrate with VMware Site Recovery Manager
  – Integrate with Microsoft Failover Clusters for Windows Server 2003/2008
• Backup, decision support, testing
  – Local and/or remote replicas
  – Copy of data used for backups
  – Copy of a database used for data mining
  – Copy of data used to test software upgrades
• Data center migrations
  – Move data off one site to a new location


USE CASE

Operational Recovery

• Local replication with CDP from production volumes to local target
  – Takes a snapshot for every write
  – Near synchronous, with at most a lag of a single write between production and CDP replica
• Journal compression: keep more snapshots in existing space
• Snapshot consolidation by policy: keeps snapshots on disk for longer periods

Figure labels: SOURCE SITE with UNIX and Windows hosts on a SAN; Production, Target, and Journals volumes.


USE CASE

Cascaded Replication for Disaster Recovery

• CDP replication from production to bunker
• CRR replication over IP or Fibre Channel from production, with a managed lag, to remote site
• If source site is lost, production can continue from bunker or remote site
• If remote site is lost, replication continues from source to bunker
• If bunker is lost, replication stops

Figure labels: SOURCE SITE (UNIX and Windows hosts, SAN, Prod volumes, journals); stretched Fibre Channel link with policy "no lag" to DISASTER RECOVERY BUNKER SITE (UNIX and Windows hosts, SAN, CDP copy, journal); IP or Fibre Channel link with asynchronous policy-based replication and RPO policy "managed lag" to DISASTER RECOVERY REMOTE SITE (UNIX and Windows hosts, SAN, CRR copy, journal).


USE CASE

Data Distribution

Easily Seed Data for Multiple Servers

• Near instant point-in-time (PIT) rollback
  – Limited only by journal size
• Efficiently clone images from production data to alternate servers
• Provide more timely access to information
• Leverage data for:
  – Optimized local access
  – Test and development
  – Backups

Figure labels: SOURCE SITE with production CRM updates captured in a CDP copy as PIT1, PIT2, PIT3, and PIT4; Server 1 (CRM PIT1), Server 2 (CRM PIT2), Server 3 (CRM PIT3), Server 4 (CRM PIT4).


USE CASE

Backup, Testing, Decision Support

Replicate to Avoid Affecting the Production Application

• Local copy for data mining
• Remote copy for backup
• Writable snap for testing

Figure labels: production application; production volumes; production array; remote array; synchronous replication; asynchronous replication; local CDP image; remote CRR image; array snap; writable snap; backup; decision-support tools; report generation; software upgrade test.


Policy-Based Management

• Group policies are used to minimize lag between sites or bandwidth utilized, allowing capping of lag or bandwidth per group
• Differing policies can be set for local copy and remote copy
  – Enables separate recovery point objectives
• RecoverPoint optimizes resources as necessary to meet policies
• Alerts are raised when policies are exceeded


Recovery to Any Point in Time

• Instant recovery of any image
  – Recover any-point-in-time image
  – Mount image to any host in SAN
  – Full read/write access to image without protection loss
• Use recovered image for a variety of purposes
  – Backup and recovery
  – Testing, development, and training
  – Surgical recovery of files and folders
  – Seeding data mining farm
  – Cloning a federated environment

Figure labels: Applications, SAN, Appliance, Source, Replica, Journal, Virtual LUN, Physical storage.


Journaling for Consistent Recovery

Journal Includes Data Plus User or System Metadata

• Time/date
  – Identifies time image was captured
• Bookmarks
  – System-generated group bookmarks, e.g., Volume Shadow Copy Service (VSS) backup
  – User-generated bookmarks, e.g., Pre- and Post-Patch
  – Other EMC products, e.g., EMC Replication Manager
  – Cross-tagging, e.g., Exchange and SQL Restart Point
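A journal that stores timestamped writes plus named bookmarks is what makes recovery to a chosen point possible. The toy model below rebuilds an image by replaying writes up to a timestamp or bookmark. It is a simplified sketch under that replay assumption, with hypothetical class and method names, and does not describe how the RecoverPoint journal is implemented internally.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Journal:
    """Toy journal: timestamped writes plus named bookmarks."""
    entries: List[Tuple[float, str, bytes]] = field(default_factory=list)
    bookmarks: Dict[str, float] = field(default_factory=dict)

    def record(self, timestamp, block, data):
        # Assumes writes are recorded with increasing timestamps.
        self.entries.append((timestamp, block, data))

    def bookmark(self, name, timestamp):
        self.bookmarks[name] = timestamp

    def image_at(self, timestamp):
        """Replay writes up to and including `timestamp` to rebuild that image."""
        times = [t for t, _, _ in self.entries]
        upto = bisect.bisect_right(times, timestamp)
        image = {}
        for _, block, data in self.entries[:upto]:
            image[block] = data
        return image


demo_journal = Journal()
demo_journal.record(1.0, "blk0", b"v1")
demo_journal.bookmark("pre-patch", 1.5)
demo_journal.record(2.0, "blk0", b"v2-corrupt")
print(demo_journal.image_at(demo_journal.bookmarks["pre-patch"]))  # {'blk0': b'v1'}
```

The bookmark simply names a timestamp, which is why a pre-patch or VSS bookmark can be used later to select a known-good image without knowing the exact time.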


Replication Manager Support for RecoverPoint

REPLICATION MANAGER ORCHESTRATES REPLICAS FROM THE CONTEXT OF THE APPLICATION

• Replication Manager simplifies management of RecoverPoint CDP, CRR, or CLR
• Automates the creation, management, and usage of RecoverPoint consistency group images for application-aware usage
• Maps applications on the host to RecoverPoint infrastructure
• Enables Storage Managers to delegate replication tasks to multiple human resources
• Auto-discovery of applications and their replication configuration during each replica cycle
• Built-in intelligence places applications into proper state for consistent restart versus crash recovery
  – VSS for Exchange and VDI for SQL Server
  – Supports Exchange log management and ESEUTIL checks

Figure labels: application-aware replicas of production data (SQL production and SQL copy, Exchange production and Exchange copy), used for continuity, backup, repurpose, and ILM.


Replication Manager Support for RecoverPoint (Continued)

• Transaction-level CDP data recovery
• Local CDP replica management
• True CDP (any point in time)
• Host-based splitter version
• Out-of-band network-based architecture
• Supported on CLARiiON and Symmetrix in Windows environments
• Application bookmarks for local recovery
• Supported on RecoverPoint

Figure labels: RecoverPoint CDP; application servers, database servers, file and print servers; SAN; EMC arrays.


Replication CDP Integration with EMC NetWorker PowerSnap

• Supports RecoverPoint image tracking within the NetWorker catalog
  – Recover directly from CDP images (daily/weekly…)
  – Allows use of CDP images for backup to disk or tape targets for longer-term protection
  – Centralizes management through NetWorker Management Console
  – Supports Windows File Systems and Microsoft SQL Server
  – Requires RecoverPoint CDP V2.4 and higher

Figure labels: application servers, Microsoft SQL Server, file/print servers; NetWorker PowerSnap; SAN; NetWorker; tape library; local CDP journal; CLARiiON or Symmetrix systems.

