Explore Business Continuity Options for SAP workload using AWS Elastic Disaster Recovery Service (DRS)

saikumarjaddu

Introduction:

To handle disasters such as natural events (floods, earthquakes, etc.), power outages, application outages, and accidental data loss due to human error, Business Continuity Planning (BCP) is part of every customer's IT strategy. Disaster recovery is a key element of BCP and is critical for organizations running mission-critical workloads such as SAP. However, managing disaster recovery on-premises is cost-, time-, and resource-intensive, and it can also be complex to implement and test.

AWS Elastic Disaster Recovery (also known as AWS DRS) can help address these challenges without compromising the reliability of your applications.

AWS Elastic Disaster Recovery simplifies and automates disaster recovery by leveraging AWS as a scalable and cost-effective recovery site for your on-premises and cloud-based applications. You can quickly recover your on-premises and cloud-based SAP applications using AWS Elastic Disaster Recovery. It reduces costs by removing idle recovery site resources, and you pay for your fully provisioned disaster recovery site on AWS only when you need it for recovery or drills.

For SAP workloads already running on AWS, you can use AWS Elastic Disaster Recovery to recover applications in another AWS Region, which can help you meet resilience and availability goals for these applications.

In this blog post, we walk you through the key benefits, features, disaster recovery scenarios, and lessons learned from our internal proof of concept (POC) for failover and failback of SAP workloads using AWS Elastic Disaster Recovery.

Key benefits and features:

AWS DRS is application-agnostic and replicates data at the block level.

Its key benefits and features are:

Flexible:

AWS DRS is easy to set up and configure. You can recover any on-premises or cloud-based source application that runs on a supported operating system, including common databases (such as SQL Server and Oracle), mission-critical applications such as SAP, and homegrown applications, which makes it adaptable to your specific needs.

Elastic Disaster Recovery automatically provisions and manages the replication resources in your staging area subnet. The staging area uses cost-effective AWS resources to receive your replicated data, which helps keep ongoing costs low.

Reliable:

DRS continuously replicates your workloads at the block level. Continuous replication can help you meet the recovery point objective (RPO) and recovery time objective (RTO) targets in your SLAs.

For events like ransomware attacks, data corruption, accidental user error, or bad patches, you can use DRS to recover your servers on AWS from a previous point in time.

Simple & highly automated:

You need only minimal skills to set up and operate Elastic Disaster Recovery. A single, streamlined process covers testing, recovery to AWS, and failback for all your applications.

You can also conduct frequent, nondisruptive drills at any time to help maintain disaster readiness.

Prerequisites:

To use AWS Elastic Disaster Recovery, you must first initialize the service in each AWS Region in which you plan to use it.

In the AWS account where you want to set up disaster recovery, create VPCs for the staging area and the recovery area, and then create the staging area subnet and the recovery area subnet in those VPCs, respectively.
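If you prefer to script these prerequisites instead of using the console, the Region initialization step can be automated. The following is a minimal sketch, assuming the boto3 `drs` client, whose `initialize_service()` operation takes no arguments; the `client_factory` hook and the function name are our own illustrative choices, not part of any AWS tool.

```python
from typing import Callable, Iterable, List


def initialize_drs_regions(regions: Iterable[str],
                           client_factory: Callable[[str], object]) -> List[str]:
    """Call the DRS InitializeService operation once in every Region listed.

    client_factory is a hypothetical hook; in real use it could be
    lambda r: boto3.client("drs", region_name=r).
    Returns the Regions that were initialized, in order.
    """
    initialized: List[str] = []
    for region in regions:
        client = client_factory(region)
        # InitializeService takes no parameters; it prepares DRS in this Region.
        client.initialize_service()
        initialized.append(region)
    return initialized
```

For Scenario 1 below, where the recovery Region is us-west-2, a call might look like `initialize_drs_regions(["us-west-2"], lambda r: boto3.client("drs", region_name=r))`. Passing the factory in keeps the function testable without AWS credentials.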

For this POC, we created two IAM users, each with the AWS managed policy shown:

DRSAgentUser: AWSElasticDisasterRecoveryAgentInstallationPolicy
Failback: AWSElasticDisasterRecoveryFailbackInstallationPolicy
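The two IAM users can also be created programmatically. This sketch assumes a boto3 `iam` client passed in by the caller and that the AWS managed policies live under the standard `arn:aws:iam::aws:policy/` path (please verify the exact ARNs in the IAM console); the function and constant names are hypothetical.

```python
from typing import Dict

# User name -> AWS managed policy, as used in our POC.
POC_USERS: Dict[str, str] = {
    "DRSAgentUser": "AWSElasticDisasterRecoveryAgentInstallationPolicy",
    "Failback": "AWSElasticDisasterRecoveryFailbackInstallationPolicy",
}


def managed_policy_arn(policy_name: str, partition: str = "aws") -> str:
    """AWS managed policies use the account alias 'aws' in their ARN."""
    return f"arn:{partition}:iam::aws:policy/{policy_name}"


def create_poc_users(iam, users: Dict[str, str] = POC_USERS) -> Dict[str, Dict[str, str]]:
    """Create each user, attach its managed policy, and create an access key.

    Returns user -> {"AccessKeyId", "SecretAccessKey"}. IAM returns the secret
    only once, so store it in a secrets store rather than logging it.
    """
    keys: Dict[str, Dict[str, str]] = {}
    for user, policy in users.items():
        iam.create_user(UserName=user)
        iam.attach_user_policy(UserName=user, PolicyArn=managed_policy_arn(policy))
        access_key = iam.create_access_key(UserName=user)["AccessKey"]
        keys[user] = {
            "AccessKeyId": access_key["AccessKeyId"],
            "SecretAccessKey": access_key["SecretAccessKey"],
        }
    return keys
```

The agent installer needs the DRSAgentUser key pair, and the Failback Client needs the Failback key pair.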

Security groups: If the default security group is too permissive, you can apply a different security group with the relevant rules and clear the default security group option. Note that an incorrectly configured security group can prevent replication from working.

Supported operating systems: With AWS Elastic Disaster Recovery, you can replicate on-premises, virtual, or cloud-based servers running a variety of operating systems. For the full list and additional considerations, refer to the supported operating systems page in the AWS Elastic Disaster Recovery documentation.

To prepare your network for Elastic Disaster Recovery, open the following firewall ports:

Source servers and the replication servers launched by AWS Elastic Disaster Recovery in your staging area subnet need to be able to send data over TCP port 443 to the AWS Elastic Disaster Recovery API endpoint.
The source servers on which the AWS Replication Agent is installed need to be able to send data over TCP port 1500 to the replication servers in the staging area subnet.
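A quick way to confirm these firewall rules before installing the agent is to attempt TCP connections from a source server. The sketch below uses only the Python standard library; the endpoint name and replication server address in the usage note are assumptions for illustration, and the port 1500 check is only meaningful once a replication server is running in the staging area subnet.

```python
import socket
from typing import Dict, Tuple


def check_tcp_ports(targets: Dict[str, Tuple[str, int]], timeout: float = 3.0) -> Dict[str, bool]:
    """Try a TCP connection to each (host, port) and report which ones succeed."""
    results: Dict[str, bool] = {}
    for name, (host, port) in targets.items():
        try:
            # create_connection resolves the name and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:  # refused, timed out, or name resolution failed
            results[name] = False
    return results
```

For example, `check_tcp_ports({"DRS API": ("drs.us-east-1.amazonaws.com", 443), "replication server": ("10.0.1.25", 1500)})` would report both paths; `10.0.1.25` is a hypothetical staging-subnet address.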

Check the following system settings on each source server:

Root directory – Verify that your source server has at least 4 GB of free disk space on the root directory (/).
RAM – Verify that your source server has at least 300 MB of free RAM to run the AWS Replication Agent.
Python – Verify that Python 2 (2.4 or above) or Python 3 (3.0 or above) is installed on the server.
/tmp free space – For the duration of the installation process only, verify that you have at least 500 MB of free disk space on the /tmp directory.
Bootloader – Verify that the active bootloader software is GRUB 1 or 2.
/tmp mount options – Ensure that /tmp is mounted as read+write+exec.
Root access – Ensure that you have root user access.
dhclient – Ensure that the dhclient package is installed.
Kernel headers – Verify that the kernel-devel/linux-headers packages installed match the version of the running kernel.
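The numeric checks above are easy to script. This is a minimal sketch for a Linux source server using only the Python standard library; it treats "GB/MB" as GiB/MiB, uses `MemAvailable` from /proc/meminfo as "free RAM", and looks for kernel headers under /lib/modules/<release>/build, all of which are our assumptions. It does not check the bootloader, the /tmp mount options, or root access, so verify those separately.

```python
import os
import platform
import shutil
from typing import Dict, List

GIB = 1024 ** 3
MIB = 1024 ** 2


def evaluate_prereqs(facts: Dict[str, object]) -> List[str]:
    """Compare gathered facts with the minimums listed above; return the failures."""
    failures: List[str] = []
    if facts["root_free_bytes"] < 4 * GIB:
        failures.append("less than 4 GB free on /")
    if facts["tmp_free_bytes"] < 500 * MIB:
        failures.append("less than 500 MB free on /tmp")
    if facts["free_ram_bytes"] < 300 * MIB:
        failures.append("less than 300 MB free RAM")
    if not facts["dhclient_present"]:
        failures.append("dhclient not found")
    if not facts["kernel_headers_present"]:
        failures.append("no kernel headers matching the running kernel")
    return failures


def gather_facts() -> Dict[str, object]:
    """Collect the facts on a Linux source server (run as root so sbin is on PATH)."""
    free_ram = 0
    with open("/proc/meminfo") as meminfo:
        for line in meminfo:
            if line.startswith("MemAvailable:"):
                free_ram = int(line.split()[1]) * 1024  # value is reported in kB
    # platform.release() returns the same string as `uname -r`.
    headers = os.path.isdir(f"/lib/modules/{platform.release()}/build")
    return {
        "root_free_bytes": shutil.disk_usage("/").free,
        "tmp_free_bytes": shutil.disk_usage("/tmp").free,
        "free_ram_bytes": free_ram,
        "dhclient_present": shutil.which("dhclient") is not None,
        "kernel_headers_present": headers,
    }
```

On a source server, `print(evaluate_prereqs(gather_facts()) or "all checks passed")` gives a quick pass/fail summary.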

Scenarios:

The following scenarios were tested in our POC.

Scenario 1: Cross-Region

The application and database run on the same source server, which is replicated using DRS.

In this scenario, the SAP workload runs in the us-east-1 Region. Failover was tested to the us-west-2 Region using AWS Elastic Disaster Recovery (DRS).

Before the actual failover, we performed a failover drill.

After the actual failover to the us-west-2 Region, we accessed the SAP system from an RDP host (a Windows server) and performed the required SAP application testing and validation.

Scenario 2: On-premises Source

The application and database run on the same source server, which is replicated using DRS.

In this scenario, we tested the failover of an SAP workload running on-premises using AWS DRS. The DR site is the AWS us-east-1 Region.

Scenario 3: On-premises Source (Distributed)

The database server is separate from the application server. The database is replicated using SAP HANA System Replication (HSR), and the application server is replicated using DRS.

In this scenario, the SAP workload runs on-premises, with the SAP application and database on separate hosts. The HANA database is replicated to the DR site (us-east-1) using HSR, and the SAP application is replicated using AWS DRS.

Steps to set up DR using DRS:

Set up replication servers:

Set up the replication servers in the AWS Region where you want to recover your servers.
Choose Set default replication settings on the AWS Elastic Disaster Recovery landing page. These default settings are applied to every source server that you add to AWS Elastic Disaster Recovery; you can change them later for individual servers.
Choose the subnet in which the replication servers will be launched.
Choose the replication server instance type (the default is t3.small).

Specify volumes and security groups:

The Amazon EBS volume type is an important setting. Any source disk smaller than 500 GiB uses the standard volume type, which is a low-cost Amazon EBS volume. For disks larger than 500 GiB, you can choose a faster general purpose (gp2) volume or a slower but much cheaper throughput-optimized (st1) volume. If your source disks are not I/O-intensive, the cheaper option might suit you. If your source is busy, however, consider the faster volumes so that Amazon EBS does not become a bottleneck for replication. Next, decide whether the replication server EBS volumes should be encrypted at rest. If so, you can use the default Amazon EBS encryption key or a customer managed key (CMK).
Our source disks are larger than 500 GiB but not I/O-intensive, so we kept the default selection.
Select a custom security group that allows inbound TCP port 1500 so that the replication servers can receive data from the source servers.
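The volume-type guidance above boils down to a simple rule per disk. The sketch below only restates that rule of thumb; the `io_intensive` flag is a judgment you supply, treating exactly 500 GiB as "large" is our assumption, and no DRS API is called.

```python
from typing import Dict, List, Tuple


def staging_volume_type(size_gib: int, io_intensive: bool) -> str:
    """Return the staging EBS volume type suggested by the rule of thumb above."""
    if size_gib < 500:
        # Small disks: the low-cost standard type is selected for you.
        return "standard"
    # Large disks: pay for gp2 only when the source disk is busy.
    return "gp2" if io_intensive else "st1"


def plan_volumes(disks: List[Tuple[str, int, bool]]) -> Dict[str, str]:
    """Map device -> suggested type for (device, size_gib, io_intensive) tuples."""
    return {device: staging_volume_type(size, busy) for device, size, busy in disks}
```

For instance, a hypothetical SAP host with a 100 GiB OS disk, a quiet 1 TiB backup disk, and a busy 2 TiB data disk would get standard, st1, and gp2 respectively.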

Configure additional replication settings:

By default, the AWS Replication Agent installed on your source server communicates with the replication server through its public IP address. If you have private connectivity to AWS, such as a VPN or AWS Direct Connect, you can configure communication over a private IP address instead. (In our case, we used AWS Direct Connect.)
You can also throttle network bandwidth. By default, data replication will attempt to use all available network bandwidth, but this can be limited if needed. The throttling value is in megabits per second.
The replication settings also include the point-in-time snapshot policy. By default, Elastic Disaster Recovery will maintain 7 days of point-in-time snapshots for each of your source servers. You can modify the days of retention based on your own requirements.

You can also edit the replication settings for individual source servers.
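Because the bandwidth throttling value is expressed in megabits per second while disks are usually sized in GiB, a back-of-the-envelope calculation helps when choosing a limit. This helper is our own sketch, not a DRS feature; it assumes the network link is the only bottleneck and ignores protocol overhead and data written during the sync, so treat the result as a lower bound.

```python
def initial_sync_hours(total_gib: float, throttle_mbps: float) -> float:
    """Lower-bound estimate of initial sync time at a given bandwidth limit.

    1 GiB = 8 * 1024**3 bits; 1 Mbps = 10**6 bits per second.
    """
    if throttle_mbps <= 0:
        raise ValueError("throttle_mbps must be positive")
    bits = total_gib * 8 * 1024 ** 3
    return bits / (throttle_mbps * 10 ** 6) / 3600
```

For example, 100 GiB at a 100 Mbps limit works out to roughly 2.4 hours, and 1 TiB (1024 GiB) at the same limit to roughly 24.4 hours.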

Launch Settings:

In the default DRS launch settings, you can change a variety of options under General launch settings, including:

Instance type right sizing.
Start instance upon launch.
Copy private IP.
Transfer server tags.
OS licensing.

Set default EC2 launch template.

The default EC2 launch template sets the default values that will be copied to EC2 templates created for newly added source servers. This template defines how drill, recovery, or failback instances are launched.
We kept the default settings here, as we wanted to configure launch templates individually for every source server.
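If you manage launch settings through the API rather than the console, the General launch settings above correspond, as we understand the DRS `UpdateLaunchConfiguration` operation, to the fields `targetInstanceTypeRightSizingMethod`, `launchDisposition`, `copyPrivateIp`, `copyTags`, and `licensing`. The sketch below only builds the keyword arguments for one source server; the helper name and its defaults (right sizing off, as in our POC) are our own, so verify the field names against the current API reference.

```python
from typing import Dict


def launch_settings(source_server_id: str, right_sizing: bool = False,
                    start_on_launch: bool = True, copy_private_ip: bool = False,
                    transfer_tags: bool = True, byol: bool = False) -> Dict[str, object]:
    """Map the console's General launch settings to UpdateLaunchConfiguration arguments."""
    return {
        "sourceServerID": source_server_id,
        # NONE corresponds to "Inactive" in the console: use the launch template's type.
        "targetInstanceTypeRightSizingMethod": "BASIC" if right_sizing else "NONE",
        "launchDisposition": "STARTED" if start_on_launch else "STOPPED",
        "copyPrivateIp": copy_private_ip,
        "copyTags": transfer_tags,
        "licensing": {"osByol": byol},
    }
```

A call might look like `boto3.client("drs", region_name="us-west-2").update_launch_configuration(**launch_settings("s-1234567890abcdef0"))`, where the source server ID is hypothetical.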

Adding source servers:

Download the agent installer aws-replication-installer-init.py onto your Linux source server.
Ex: wget -O ./aws-replication-installer-init.py https://aws-elastic-disaster-recovery-us-east-1.s3.us-east-1.amazonaws.com/latest/linux/aws-replicat...
Run the installation script: sudo python3 aws-replication-installer-init.py
Enter your AWS Region name and the AWS access key ID and secret access key that you generated earlier for the agent installation IAM user (DRSAgentUser in our POC).
Choose the disks you want to replicate; press enter if you want to replicate all disks.
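For many servers, an unattended installation is handy. The sketch below builds the installer command line using the `--region`, `--aws-access-key-id`, `--aws-secret-access-key`, `--no-prompt`, and `--devices` options as we understand them from the DRS installation documentation; please confirm them against your installer version. Credentials passed on the command line are visible in the process list, so prefer short-lived credentials.

```python
import subprocess
from typing import List, Optional, Sequence


def installer_command(region: str, access_key_id: str, secret_access_key: str,
                      devices: Optional[Sequence[str]] = None,
                      script: str = "./aws-replication-installer-init.py") -> List[str]:
    """Build an unattended AWS Replication Agent installer command line."""
    cmd = [
        "sudo", "python3", script,
        "--region", region,
        "--aws-access-key-id", access_key_id,
        "--aws-secret-access-key", secret_access_key,
        "--no-prompt",
    ]
    if devices:
        # Comma-separated device list; omit it to replicate all disks.
        cmd += ["--devices", ",".join(devices)]
    return cmd


def run_installer(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if the installer exits non-zero.
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to review (with the secret masked) before executing on each host.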

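You can now follow the replication progress in real time on the source server's page in the DRS console. The initial sync can take a considerable amount of time, depending on the volume of data and the available bandwidth.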
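Initial sync progress can also be polled from a script. This sketch assumes the boto3 `drs` client's `describe_source_servers` call with a `sourceServerIDs` filter and reads each item's `dataReplicationInfo.dataReplicationState`, treating `CONTINUOUS` as "initial sync finished"; pagination is ignored for brevity, and the `sleep` parameter exists only so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict, Iterable, List

DONE_STATE = "CONTINUOUS"  # continuous replication means the initial sync is complete


def replication_states(drs, source_server_ids: Iterable[str]) -> Dict[str, str]:
    """Return sourceServerID -> dataReplicationState (first page of results only)."""
    resp = drs.describe_source_servers(filters={"sourceServerIDs": list(source_server_ids)})
    return {
        item["sourceServerID"]: item.get("dataReplicationInfo", {}).get("dataReplicationState", "UNKNOWN")
        for item in resp["items"]
    }


def wait_for_initial_sync(drs, source_server_ids: Iterable[str], poll_seconds: float = 300,
                          timeout_seconds: float = 48 * 3600,
                          sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Poll until every listed server reports CONTINUOUS, or raise TimeoutError."""
    ids: List[str] = list(source_server_ids)
    waited = 0.0
    while True:
        states = replication_states(drs, ids)
        if len(states) == len(ids) and all(s == DONE_STATE for s in states.values()):
            return states
        if waited >= timeout_seconds:
            raise TimeoutError(f"initial sync not finished after {waited:.0f}s: {states}")
        sleep(poll_seconds)
        waited += poll_seconds
```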

Drill/Test

Configuring launch settings:

After you have added your source servers to the AWS Elastic Disaster Recovery console, configure the launch settings for each server; these determine how your drill and recovery instances are launched in AWS.
Configure the launch settings according to the requirements of each source server before you launch drill or recovery instances.
Instance type right sizing: INACTIVE. We selected Inactive because, with right sizing enabled, AWS DRS automatically picked an instance type based on the source server's configuration that was much larger and more expensive than we needed. With right sizing inactive, AWS Elastic Disaster Recovery launches the instance type configured in the EC2 launch template.
An EC2 launch template is created automatically every time you add a source server to AWS Elastic Disaster Recovery.
Modify the launch template settings as required, save them as a new template version, and set that version as the default for the instances.
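Creating a new launch template version and making it the default can also be scripted with the EC2 API. This sketch assumes a boto3 `ec2` client and uses `describe_launch_templates`, `create_launch_template_version`, and `modify_launch_template`; it changes only the instance type and inherits everything else from the current default version. The helper name is hypothetical, and you can find each source server's template ID in its launch settings in the DRS console.

```python
def set_instance_type_as_default(ec2, launch_template_id: str, instance_type: str) -> str:
    """Create a template version with a new instance type and make it the default.

    Returns the new version number as a string.
    """
    current = ec2.describe_launch_templates(LaunchTemplateIds=[launch_template_id])
    base_version = str(current["LaunchTemplates"][0]["DefaultVersionNumber"])
    created = ec2.create_launch_template_version(
        LaunchTemplateId=launch_template_id,
        SourceVersion=base_version,  # inherit all other settings from the current default
        LaunchTemplateData={"InstanceType": instance_type},
        VersionDescription=f"DRS launch: {instance_type}",
    )
    new_version = str(created["LaunchTemplateVersion"]["VersionNumber"])
    ec2.modify_launch_template(LaunchTemplateId=launch_template_id, DefaultVersion=new_version)
    return new_version
```

A call might look like `set_instance_type_as_default(boto3.client("ec2", region_name="us-west-2"), "lt-0123456789abcdef0", "r5.2xlarge")`; the template ID and instance type are placeholders.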

Launching a drill instance:

After you have added all your source servers and configured their launch settings, you are ready to launch a drill instance.
Once the source server is ready for recovery, initiate the recovery drill.
Select the point-in-time snapshot from which to launch the instance for the selected source server, and then choose Initiate drill.
When the drill starts, the AWS Elastic Disaster Recovery console displays "Recovery job is creating drill instance for X source servers".
When the recovery job succeeds, the recovery instance ID is displayed.
Test the launched drill instance at both the OS level and the SAP GUI level.
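The drill steps map onto two DRS API calls. This sketch assumes the boto3 `drs` client: `describe_recovery_snapshots(sourceServerID=...)` returns items whose `snapshotID` and `timestamp` fields we read (assuming ISO 8601 timestamps), and `start_recovery(..., isDrill=True)` launches the drill. As we understand the API, omitting `recoverySnapshotID` uses the latest state. `pick_snapshot` and `start_drill` are hypothetical helper names.

```python
from datetime import datetime
from typing import Dict, List, Optional


def _parse(ts: str) -> datetime:
    # Older Python versions' fromisoformat does not accept a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pick_snapshot(snapshots: List[Dict[str, str]], not_after: datetime) -> Optional[str]:
    """Return the ID of the newest snapshot taken at or before not_after (timezone-aware)."""
    candidates = [s for s in snapshots if _parse(s["timestamp"]) <= not_after]
    if not candidates:
        return None
    return max(candidates, key=lambda s: _parse(s["timestamp"]))["snapshotID"]


def start_drill(drs, source_server_id: str, snapshot_id: Optional[str] = None) -> Dict:
    """Launch a drill instance from a snapshot, or from the latest state if none is given."""
    server = {"sourceServerID": source_server_id}
    if snapshot_id:
        server["recoverySnapshotID"] = snapshot_id
    return drs.start_recovery(sourceServers=[server], isDrill=True)
```

Picking "the newest snapshot before time T" is what you want after, for example, a bad patch applied at T.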

Actual Failover

Once you have finished testing with drill instances, you are ready for recovery.
Bring the source server down, then initiate recovery.
Select the source servers, choose Initiate recovery, and follow the same process as for a drill.
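Because an actual recovery differs from a drill mainly in the `isDrill` flag, a script can add a safety check that the source really is down before starting it. This is a minimal sketch assuming the boto3 `drs` client's `start_recovery` call; using a failed TCP connection to port 22 as the "source is down" signal is purely our assumption, and the helper names are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable


def source_is_down(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Treat a failed TCP connection to host:port as 'source server is down'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False
    except OSError:
        return True


def start_failover(drs, source_server_ids: Iterable[str], source_hosts: Iterable[str],
                   is_down: Callable[[str], bool] = source_is_down) -> Dict:
    """Start a real (non-drill) recovery only if every source host is unreachable."""
    still_up = [host for host in source_hosts if not is_down(host)]
    if still_up:
        raise RuntimeError(f"source servers still reachable, not failing over: {still_up}")
    # No recoverySnapshotID: recover from the latest state (per our reading of the API).
    return drs.start_recovery(
        sourceServers=[{"sourceServerID": sid} for sid in source_server_ids],
        isDrill=False,
    )
```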

Failback

Once the disaster is over, you can fail back to your original source server, or to any other server that meets the failback prerequisites, by using the AWS Elastic Disaster Recovery Failback Client.
In our POC, we performed a failback from the AWS recovery instance to our on-premises server.
To start the failback process, boot the on-premises server from the Failback Client ISO image in BIOS mode.
Using the Failback Client: failback replication is performed by booting the Failback Client on the server into which you want to replicate your data from AWS. To use the Failback Client, you must meet the failback prerequisites and generate failback AWS credentials, as described in the failback prerequisites section of the AWS DRS User Guide.
The AWS Elastic Disaster Recovery console lets you track the progress of failback replication on the Recovery instances page.
After failback, restart the OS of your source server and proceed with validations.
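Besides the console, failback replication progress can be read from the API. This sketch assumes the boto3 `drs` client's `describe_recovery_instances` call with a `recoveryInstanceIDs` filter and reads each item's `failback.state` (for example `FAILBACK_IN_PROGRESS` or `FAILBACK_ERROR`, as we understand the state names). It only reports status and does not complete the failback; pagination is ignored.

```python
from typing import Dict, Iterable, List


def failback_states(drs, recovery_instance_ids: Iterable[str]) -> Dict[str, str]:
    """Return recoveryInstanceID -> failback state (first page of results only)."""
    resp = drs.describe_recovery_instances(filters={"recoveryInstanceIDs": list(recovery_instance_ids)})
    return {
        item["recoveryInstanceID"]: item.get("failback", {}).get("state", "UNKNOWN")
        for item in resp["items"]
    }


def failed_instances(states: Dict[str, str]) -> List[str]:
    """List recovery instances whose failback replication reported an error."""
    return sorted(rid for rid, state in states.items() if state == "FAILBACK_ERROR")
```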

Lessons Learned/ Best practices:

The AWS Replication Agent installation failed because the agent could not add the newly created aws-replication user to sudoers. On checking at the OS level, we found that the sudoers configuration was being reset, so the user could no longer run sudo commands. Working with our security team, we manually added the aws-replication user to sudoers on all source servers to resolve the issue.
To access the SAP application using SAP GUI at the recovery site, make sure the appropriate firewall ports are open and the corresponding security group rules are added in the recovery instance's VPC. For port and network setup details, refer to the network requirements documentation.
During our test, we observed that the recovery instance's IP address is not added to the /etc/hosts file by default; make sure to check the file and add the entries.
To test the recovery instance, we launched a free-tier Windows instance as a terminal host and launched SAP GUI from it successfully.
At the time of our testing, cross-account failover was not supported by AWS DRS. Based on feedback, AWS has since made this feature generally available to all customers and partners.
For the recovery phase, AWS DRS lets you recover your entire environment or one or more specific servers. Select a recovery point from which to launch the servers, using either their latest state or a previous point in time.
After you have failed over to AWS, evaluate whether to continue operating your workload as primary on AWS or to fail back to your original source. If you choose to fail back, start replicating your data back to your original source environment as soon as it is back online. Choose whether to replicate back to the original servers or to new servers, and whether to replicate only the data volumes or the entire server.
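For SAP GUI access at the recovery site, the ports follow SAP's standard numbering based on the two-digit instance number NN: 32NN for the dispatcher, 33NN for the gateway, and 36NN for the message server (needed when logging on through logon groups). The sketch below derives those ports and builds `IpPermissions` entries in the shape expected by the EC2 `authorize_security_group_ingress` call. The client CIDR and instance number in the usage note are hypothetical, and your landscape may need more rules (for example, when the message server runs on an ASCS instance with a different instance number).

```python
from typing import Dict, List


def sap_gui_ports(instance_number: str) -> Dict[str, int]:
    """Derive standard SAP ports from a two-digit instance number NN."""
    if len(instance_number) != 2 or not instance_number.isdigit():
        raise ValueError("instance number must be two digits, e.g. '00'")
    nn = int(instance_number)
    return {"dispatcher": 3200 + nn, "gateway": 3300 + nn, "message_server": 3600 + nn}


def ingress_rules(instance_number: str, client_cidr: str) -> List[Dict[str, object]]:
    """Build IpPermissions entries allowing SAP GUI clients in client_cidr to reach those ports."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": client_cidr, "Description": f"SAP {name} {instance_number}"}],
        }
        for name, port in sap_gui_ports(instance_number).items()
    ]
```

A call might look like `ec2.authorize_security_group_ingress(GroupId="sg-0123456789abcdef0", IpPermissions=ingress_rules("00", "10.20.0.0/24"))`.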
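Because the recovery instance's IP address is not added to /etc/hosts automatically, a small idempotent helper can be part of the post-recovery checklist. This sketch operates on the file's text only (read /etc/hosts, back it up, and write the result back as root); the hostnames and IP addresses in the test values are hypothetical.

```python
import ipaddress
from typing import List


def ensure_hosts_entry(hosts_text: str, ip: str, fqdn: str, short: str) -> str:
    """Return hosts_text with fqdn/short mapped only to ip (safe to run repeatedly)."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    names = {fqdn, short}
    kept: List[str] = []
    for line in hosts_text.splitlines():
        body, sep, comment = line.partition("#")
        fields = body.split()
        if len(fields) >= 2 and names.intersection(fields[1:]):
            # Remove our hostnames from stale mappings but keep any other aliases.
            others = [name for name in fields[1:] if name not in names]
            if not others:
                continue
            line = "\t".join([fields[0], *others]) + (f"  #{comment}" if sep else "")
        kept.append(line)
    kept.append(f"{ip}\t{fqdn}\t{short}")
    return "\n".join(kept) + "\n"
```

Running it twice with the same arguments leaves the file unchanged, so it is safe to include in a repeatable drill runbook.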

These observations are based on our POC. For detailed troubleshooting guidance, refer to the AWS DRS troubleshooting guide.

Conclusion

With AWS Elastic Disaster Recovery (DRS), organizations can set up disaster recovery sites in the AWS Cloud to meet their RTO and RPO requirements. It minimizes downtime and data loss with fast, reliable recovery of on-premises and cloud-based applications using affordable storage, minimal compute, and point-in-time recovery.

In this blog post, we shared information from our POC across the scenarios described above. The steps and lessons learned are based purely on an internally built POC environment; results can vary significantly for your disaster recovery setup, depending on your specific RTO and RPO needs. For complete details about Elastic Disaster Recovery, refer to the AWS Elastic Disaster Recovery User Guide and to Disaster recovery for SAP workloads on AWS using AWS Elastic Disaster Recovery in the AWS documentation.
