Cost optimized SAP HANA DR options on Google Cloud

anilnarang

Abstract

Business continuity is of utmost importance for any organization. A well-defined High
Availability (HA) and Disaster Recovery (DR) strategy keeps business-critical SAP
applications accessible during any planned or unplanned outage. The SAP HANA database,
being a central component of an SAP application, is configured with an appropriate HA/DR
setup so that business data is available at a secondary node or site.

HA for the HANA database provides fault tolerance, mostly against infrastructure-related
failures: the HANA database fails over to a secondary hot standby node deployed in cluster
mode within a single region. RPO and RTO are close to zero in this case because most of
the steps are automated by the cluster manager. Synchronous HANA replication across
zones of the same region keeps the secondary HANA node in sync with the primary node
at all times.

In case of complete primary site loss, where the primary node and its HA hot standby are
both unavailable, a DR solution in a separate geographical location, termed the secondary
or DR region in public cloud, acts as a safety net to ensure business continuity.
Asynchronous HANA replication continuously ships data from the primary node to the
standby node in the secondary region. A manual failover is required to promote the DR
node to primary.

Let's discuss some SAP HANA DR options along with their pros and cons.

> Performance optimized DR setup (cost challenge)
For mission-critical applications that require a minimum RTO (a few minutes to hours)
to have the system up and running in the secondary region with all the business data, a
performance optimized DR setup using SAP HANA System Replication (HSR) is deployed.
In this setup, the computing capacity of the DR HANA node is kept the same as that of the
primary HANA node, and the full data set is loaded into the memory of the DR node when
replication is set up. After the initial load, all delta data committed on the primary is
continuously replicated to the secondary by shipping the redo logs.

As depicted in the diagram below, the CPU and memory configuration of the secondary node
is kept the same as that of the primary node, so the major chunk of data is already loaded
in the memory of the secondary HANA node. In a disaster scenario, this configuration
allows the secondary node to take over as primary in the minimum possible time with
almost no data loss. However, maintaining PRD-equivalent hardware at the secondary site
for the DR node adds significant cost.

> Cost optimized DR setup (higher RTO)
To overcome the cost challenge of the performance optimized DR setup, we can consider
the following cost optimized SAP HANA DR options. The trade-off with these setups is a
higher RTO in exchange for a lower DR cost.

(i) Shared HANA DR node
In a DR setup where the secondary node sizing is kept identical to the PRD primary,
the resources on the secondary node generally cannot be used for anything else until
the takeover takes place. In this shared DR setup, PRD data is not loaded into memory
but kept on disk at the secondary site, so the resources can be shared with another
non-PRD SAP HANA instance (e.g. QAS or TEST) on the same node. To achieve this, the
memory allocation of the PRD secondary instance is restricted and the rest of the
memory is allocated to the non-PRD (QAS/TEST) instance.
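
The memory split described above can be sketched as a small calculation. This is only an
illustration, not SAP sizing guidance: the function name, the OS reserve and the example
figures are hypothetical. The resulting values would typically feed a per-instance memory
limit such as HANA's global_allocation_limit (expressed in MB), which should be verified
against SAP documentation for your release.

```python
def split_shared_node_memory(node_gib, prd_secondary_gib, os_reserve_gib=0):
    """Return per-instance memory limits in MB for a shared DR node.

    All inputs are in GiB. The names and the OS reserve are illustrative
    assumptions, not values taken from SAP guidance.
    """
    if prd_secondary_gib <= 0:
        raise ValueError("PRD secondary needs a positive memory limit")
    non_prd_gib = node_gib - os_reserve_gib - prd_secondary_gib
    if non_prd_gib <= 0:
        raise ValueError("no memory left for the non-PRD instance")
    return {
        "prd_secondary_mb": int(prd_secondary_gib * 1024),
        "non_prd_mb": int(non_prd_gib * 1024),
    }


# Hypothetical 1024 GiB node: 128 GiB for the PRD secondary, 16 GiB for the OS.
limits = split_shared_node_memory(1024, 128, os_reserve_gib=16)
print(limits)
```

At takeover the split is reversed: the non-PRD instance is stopped and the restriction on
the PRD instance is lifted so that it can use the full node.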

In a PRD primary disaster scenario, the non-PRD system (QAS/TEST) must be stopped,
full memory/resources allocated to the PRD standby instance, and the data loaded from
disk to memory before it is brought up as primary. These steps increase the recovery
time, but the setup has the advantage of a low DR cost because the same DR node also
hosts the QAS/TEST instance.

(ii) Lean HANA DR node
Compared to the shared DR setup, here we opt for a bare minimum memory configuration
for the PRD secondary instance, just enough for the replicated PRD data to keep being
written to disk. Thus the DR node does not need to match the sizing/memory
configuration of the PRD primary. As in the shared DR setup, preload of column tables
into the memory of the standby HANA node is disabled by setting the database
parameter “preload_column_tables” to false.

In a PRD primary disaster scenario, the DR node must be stopped and upgraded to a
configuration matching the PRD primary (a Google VM type approved by SAP for PRD
use), with full memory/resources allocated to the PRD standby instance. The database
parameter “preload_column_tables” must be changed back to its default value of true so
that the complete data, including the column tables, is loaded into memory. Compared
to the shared mode, this setup significantly reduces the DR cost because the standby
node's compute/memory is kept to the bare minimum needed to support data replication
from the primary node.
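
The parameter toggle described above can be sketched as a generated configuration
fragment. This is a minimal sketch under assumptions: it assumes the parameter is
maintained in global.ini under a [system_replication] section, and the helper name is
hypothetical. Confirm the exact file, section and activation procedure against SAP Note
1999880 and the SAP HANA documentation before using it.

```python
import configparser
import io


def render_preload_setting(preload: bool) -> str:
    """Render a global.ini fragment that toggles column-table preload."""
    cfg = configparser.ConfigParser()
    # Assumption: the parameter lives in the [system_replication] section.
    cfg["system_replication"] = {
        "preload_column_tables": "true" if preload else "false"
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Lean DR node during normal operation: column tables are not preloaded.
print(render_preload_setting(False))
# After resizing the DR node for takeover: restore the default.
print(render_preload_setting(True))
```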

The minimum memory configuration/Google VM type needed for the cost optimized lean
DR secondary node should be calculated as per SAP guidelines and supporting
documentation (SAP Note 1999880, FAQ: SAP HANA System Replication). It is
advisable to run a pilot/PoC to determine the exact memory and sizing requirements
for the lean DR node and to catch any unforeseen issues upfront.

(iii) Backup-Restore HANA DR
In this most cost-effective HANA DR solution, no dedicated secondary HANA node is
deployed and no real-time data replication from the primary HANA node takes place.
However, we need to ensure that backups (HANA database data and logs, plus
application/file systems) are stored in dual-region or multi-region mode so that they
are available in another region identified as the DR site. The RTO to bring up the
primary instance at the DR site will be quite high because, if the primary region
becomes unavailable, one needs to set up the servers in the DR region from scratch,
install the application along with the database, and then restore the database from
the backup.

We must also reserve the needed computing capacity in the DR region so that the
required VMs can be deployed quickly at the DR site in a disaster scenario, and
ensure network connectivity (VPN) to the DR site to access the applications.
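
Because the recovery steps in this option run one after another, the RTO can be roughly
estimated as the sum of their durations. The sketch below is only a back-of-the-envelope
aid: the function names, the step durations, the backup size and the restore throughput
are all hypothetical inputs that should be replaced with figures measured in a DR drill.

```python
from datetime import timedelta


def restore_duration(backup_gib: float, throughput_mib_s: float) -> timedelta:
    """Time to restore a backup of the given size at a sustained throughput."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    return timedelta(seconds=backup_gib * 1024 / throughput_mib_s)


def estimate_rto(provision: timedelta, install: timedelta,
                 backup_gib: float, throughput_mib_s: float) -> timedelta:
    """Sum the sequential steps: provision servers, install, restore."""
    return provision + install + restore_duration(backup_gib, throughput_mib_s)


# Hypothetical figures: 45 min to provision VMs, 90 min to install the
# application and database, 1800 GiB of backup restored at 512 MiB/s.
rto = estimate_rto(timedelta(minutes=45), timedelta(minutes=90),
                   backup_gib=1800, throughput_mib_s=512)
print(rto)
```

The estimate ignores log replay and application checks after the restore, so a real RTO
is likely to be higher.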

Conclusion
The performance optimized HANA DR setup is the preferred one for minimum RTO and
RPO, as the business would like to have critical SAP applications up and running
on a secondary site in the shortest possible time. But keeping the hardware
configuration the same as production is a cost overhead.

All the cost optimized HANA DR options discussed here save cost compared to the
performance optimized HANA DR setup. However, one must accept a longer time to
stand up a functional DR system in the secondary region. Depending on the acceptable
RTO and the criticality of the HANA-based SAP application in a DR scenario, the
appropriate cost optimized HANA DR setup can be deployed.
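
The selection logic can be sketched as picking the cheapest option whose estimated RTO
still meets the business target. The cost ordering follows the comparison in this blog;
the function name and the RTO estimates in the example are hypothetical and would come
from your own PoC or DR drill.

```python
from typing import Mapping, Optional

# Ordered from lowest to highest DR cost, following this blog's comparison.
OPTIONS_BY_COST = ("backup_restore", "lean", "shared", "performance_optimized")


def cheapest_option(rto_estimates_min: Mapping[str, float],
                    max_rto_min: float) -> Optional[str]:
    """Return the lowest-cost option whose estimated RTO meets the target."""
    for name in OPTIONS_BY_COST:
        estimate = rto_estimates_min.get(name)
        if estimate is not None and estimate <= max_rto_min:
            return name
    return None  # no option meets the target


# Hypothetical RTO estimates in minutes.
estimates = {
    "backup_restore": 480,
    "lean": 120,
    "shared": 60,
    "performance_optimized": 15,
}
print(cheapest_option(estimates, max_rto_min=90))
```

With these example figures and a 90-minute target, the shared DR node is selected, since
the two cheaper options exceed the target.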

Hope this technical blog is helpful

Cheers !!
