Disaster Recovery Procedures
Purpose
This document defines the technical disaster recovery (DR) procedures for the Bank's critical IT systems and infrastructure. It provides step-by-step guidance for executing system failover, data restoration, and service validation to ensure the Bank can resume technology-dependent operations within defined Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs).
Scope
These procedures cover all Priority 1 and Priority 2 systems as classified in the Business Continuity Plan, including the Core Banking System (CBS), SWIFT Alliance, payment processing platforms, online and mobile banking, the data warehouse, and supporting infrastructure (network, storage, authentication services).
Infrastructure Overview
| Component | Primary Site | DR Site | Replication |
|---|---|---|---|
| Core Banking System | Data Centre A (City Centre) | Data Centre B (Regional Hub, 150 km from Data Centre A) | Synchronous (real-time) |
| SWIFT Alliance | Data Centre A | Data Centre B | Synchronous |
| Payment Processing Platform | Data Centre A | Data Centre B | Synchronous |
| Online / Mobile Banking | Cloud (Region 1) | Cloud (Region 2) | Active-active (real-time) |
| Data Warehouse / Reporting | Data Centre A | Data Centre B | Asynchronous (1-hour lag) |
| Email and Collaboration | Cloud (multi-region) | Cloud (multi-region) | Active-active |
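The table above can also be held in machine-readable form so that runbooks and automated checks read from a single source. The following is a minimal sketch only: the `Component` record, its field names, and the helper functions are hypothetical and do not describe an existing Bank system. The values are copied from the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    # Hypothetical record type; field names are illustrative.
    name: str
    primary_site: str
    dr_site: str
    replication: str                   # "synchronous", "asynchronous" or "active-active"
    lag_minutes: Optional[int] = None  # only stated in the table for asynchronous replication


INVENTORY = [
    Component("Core Banking System", "Data Centre A", "Data Centre B", "synchronous"),
    Component("SWIFT Alliance", "Data Centre A", "Data Centre B", "synchronous"),
    Component("Payment Processing Platform", "Data Centre A", "Data Centre B", "synchronous"),
    Component("Online / Mobile Banking", "Cloud (Region 1)", "Cloud (Region 2)", "active-active"),
    Component("Data Warehouse / Reporting", "Data Centre A", "Data Centre B", "asynchronous", lag_minutes=60),
    Component("Email and Collaboration", "Cloud (multi-region)", "Cloud (multi-region)", "active-active"),
]


def components_at(site: str) -> List[str]:
    """Names of components whose primary site is `site`."""
    return [c.name for c in INVENTORY if c.primary_site == site]


def components_with_lag() -> List[str]:
    """Names of components whose replica may trail the primary."""
    return [c.name for c in INVENTORY if c.lag_minutes]
```

For example, `components_at("Data Centre A")` lists the systems affected by a loss of the primary data centre, and `components_with_lag()` lists the systems where recent data may be missing at the DR site after failover.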
DR Activation Criteria
Disaster Recovery procedures are activated when:
- The primary data centre (Data Centre A) becomes unavailable due to a physical event (fire, flood, power failure, structural damage).
- A critical system outage at the primary site exceeds two (2) hours and cannot be resolved through standard incident management.
- The Crisis Management Team (CMT) determines that failover to the DR site is necessary to meet RTO commitments.
The decision to activate DR is made by the Chief Information Officer (CIO) or the designated Technology Recovery Coordinator, in consultation with the CMT.
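The three activation criteria can be expressed as a simple check that reports which of them currently apply. This is a sketch under stated assumptions: `OutageStatus` and `activation_criteria_met` are hypothetical names, and the inputs are assumed to be supplied by the DR team rather than collected automatically.

```python
from dataclasses import dataclass
from typing import List

OUTAGE_THRESHOLD_HOURS = 2.0  # from the second activation criterion


@dataclass
class OutageStatus:
    # Hypothetical input record; values are entered by the DR team.
    primary_site_unavailable: bool     # physical event at Data Centre A
    critical_outage_hours: float       # duration of a critical outage at the primary site
    resolvable_by_incident_mgmt: bool  # standard incident management can resolve it
    cmt_requires_failover: bool        # CMT has determined failover is necessary


def activation_criteria_met(status: OutageStatus) -> List[str]:
    """Return the activation criteria that currently apply (empty list if none)."""
    reasons = []
    if status.primary_site_unavailable:
        reasons.append("primary site unavailable")
    if (status.critical_outage_hours > OUTAGE_THRESHOLD_HOURS
            and not status.resolvable_by_incident_mgmt):
        reasons.append("critical outage exceeds 2 hours")
    if status.cmt_requires_failover:
        reasons.append("CMT determination")
    return reasons
```

A non-empty result is an input to the activation decision, not a substitute for it: the decision remains with the CIO or the Technology Recovery Coordinator.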
Failover Procedures
Phase 1: Assessment and Decision (Target: 30 minutes)
- The Technology Recovery Coordinator convenes the DR team (on-site or via conference bridge) and confirms the nature and extent of the outage.
- The team assesses whether the primary site can be restored within RTO or whether failover to DR is required.
- The CIO approves the failover decision and notifies the CMT.
Phase 2: System Failover (Target: 2 hours for Priority 1 systems)
- Network failover: The network team redirects DNS and routing to Data Centre B. VPN concentrators at the DR site are activated for remote access.
- Database failover: The database team promotes the replicated database instances at Data Centre B to primary status and performs data integrity checks on the promoted instances.
- Application failover: Application servers at Data Centre B are brought online in the prescribed sequence: (1) authentication services, (2) Core Banking System, (3) SWIFT Alliance, (4) payment processing platform, (5) client-facing channels.
- Connectivity validation: External connectivity is verified, including SWIFT network, RTGS/ACH clearing systems, correspondent bank links, and internet-facing services.
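The prescribed application start-up sequence can be sketched as an ordered loop that stops at the first service that fails to start, so that later services are not started on top of a failed dependency. The stop-on-failure behaviour and the `start` callables are assumptions made for illustration; the actual start-up commands are system-specific and are not shown here.

```python
from typing import Callable, Dict, List

# Order taken from the application failover step above.
STARTUP_SEQUENCE = [
    "authentication services",
    "Core Banking System",
    "SWIFT Alliance",
    "payment processing platform",
    "client-facing channels",
]


def run_failover_sequence(start: Dict[str, Callable[[], bool]]) -> List[str]:
    """Start each service in the prescribed order.

    `start` maps a service name to a hypothetical callable that returns
    True when the service came up. The loop stops at the first failure
    and returns the services that were started successfully.
    """
    started = []
    for name in STARTUP_SEQUENCE:
        if not start[name]():
            break  # assumption: do not start later services after a failure
        started.append(name)
    return started
```

If the returned list is shorter than `STARTUP_SEQUENCE`, the next service in the sequence is the one that requires remediation before the failover continues.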
Phase 3: Service Validation (Target: 1 hour post-failover)
- The DR team executes the Service Validation Checklist, which includes predefined test transactions for each critical system.
- Validation tests include: CBS account enquiry and transaction posting, SWIFT message send/receive, payment initiation and approval workflow, online banking login and transaction, ATM network connectivity.
- Results are documented and reported to the Technology Recovery Coordinator. Systems that fail validation are subject to immediate remediation.
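The Service Validation Checklist lends itself to a small runner that executes each predefined test and groups the failures by system for the report to the Technology Recovery Coordinator. The sketch below is illustrative: `run_validation` and the shape of the `checks` list are hypothetical, and the test functions stand in for the real test transactions.

```python
from typing import Callable, Dict, List, Tuple

Check = Tuple[str, str, Callable[[], bool]]  # (system, test name, test function)


def run_validation(checks: List[Check]) -> Dict[str, List[str]]:
    """Run every check and return the failed test names grouped by system."""
    failures: Dict[str, List[str]] = {}
    for system, test_name, test in checks:
        try:
            passed = test()
        except Exception:
            passed = False  # a test that raises is recorded as a failure
        if not passed:
            failures.setdefault(system, []).append(test_name)
    return failures
```

An empty result means every test passed. Any system that appears in the result is subject to immediate remediation, as described above.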
Phase 4: Stabilisation and Monitoring
- Once all Priority 1 systems are operational at the DR site, the DR team transitions to continuous monitoring mode.
- Enhanced monitoring thresholds are applied to all DR systems for the first 48 hours after failover.
- Priority 2 systems, then Priority 3 systems (which are outside the scope of these procedures), are brought online in sequence according to their RTO targets.
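The 48-hour enhanced monitoring period can be sketched as a time-window check that selects which alert threshold applies. Two assumptions are made here: the window is taken to start when the Priority 1 systems become operational at the DR site, and the threshold values passed in are purely illustrative. `alert_threshold` is a hypothetical helper.

```python
from datetime import datetime, timedelta

ENHANCED_WINDOW = timedelta(hours=48)  # from the stabilisation step above


def alert_threshold(now: datetime, failover_completed_at: datetime,
                    normal: float, enhanced: float) -> float:
    """Return the enhanced threshold during the first 48 hours, else the normal one."""
    window_end = failover_completed_at + ENHANCED_WINDOW
    if failover_completed_at <= now < window_end:
        return enhanced
    return normal
```

For example, with an illustrative normal CPU alert threshold of 85% and an enhanced threshold of 70%, the function returns 70 for any time inside the window and 85 from the 48-hour mark onward.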
Failback Procedures
Once the primary site is restored, failback to Data Centre A is executed during a planned maintenance window (typically a weekend). The failback follows the same phased approach as failover, with full service validation before client traffic is redirected back to the primary site. Failback must be approved by the CIO and communicated to all stakeholders at least 48 hours before the maintenance window begins.
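The two failback preconditions, CIO approval and 48 hours of stakeholder notice, can be checked before a maintenance window is confirmed. This is a sketch only: `failback_blockers` is a hypothetical helper, and measuring the notice period against the start of the maintenance window is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Optional

MIN_NOTICE = timedelta(hours=48)  # minimum stakeholder notice for failback


def failback_blockers(window_start: datetime,
                      notified_at: Optional[datetime],
                      cio_approved: bool) -> List[str]:
    """Return the reasons the planned failback cannot yet proceed (empty if none)."""
    blockers = []
    if not cio_approved:
        blockers.append("CIO approval missing")
    if notified_at is None or window_start - notified_at < MIN_NOTICE:
        blockers.append("stakeholders need at least 48 hours' notice")
    return blockers
```

The weekend timing is described as typical rather than mandatory, so it is deliberately not enforced in this check.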
DR Testing
| Test | Frequency | Description |
|---|---|---|
| Component failover test | Monthly | Failover of individual system components (database, application server) to verify replication integrity |
| Full failover test | Semi-annually | Complete failover of all Priority 1 systems to Data Centre B, including service validation |
| End-to-end DR simulation | Annually | Full DR activation exercise, including CMT mobilisation, staff notification, and client communication |
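The test frequencies in the table can be turned into due dates for tracking purposes. The sketch below reads each frequency as a maximum interval since the last run (monthly = 1 month, semi-annually = 6 months, annually = 12 months), which is an assumption; `add_months` and `next_due` are hypothetical helpers.

```python
import calendar
from datetime import date

# Interval in months for each test in the table above.
INTERVAL_MONTHS = {
    "Component failover test": 1,
    "Full failover test": 6,
    "End-to-end DR simulation": 12,
}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_due(test_name: str, last_run: date) -> date:
    """Latest date by which the named test should be run again."""
    return add_months(last_run, INTERVAL_MONTHS[test_name])
```

Clamping to the end of the month means that a test last run on 31 January falls due on the last day of February rather than producing an invalid date.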
Related Documents
- Business Continuity Plan Overview
- Incident Escalation Matrix
- Technology Incident Management Policy
- Data Backup and Retention Policy