Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
SeungjoonLee
Advisor
Update on September 15, 2023:
  • Replacing the statements for manual reactivation.
  • Making minor adjustments.
Update on June 29, 2022:
  • Adjusting to make the supported scenarios clearer.
  • Replacing 'Other Limitations' with the official help documentation link.
Update on January 21, 2021:
  • Hybrid scenario with SAP Cloud Connector.

The SAP HANA database in SAP HANA Cloud supports a new remote table replication feature that allows tables to be replicated natively across different SAP HANA systems. With this feature, users can easily replicate remote tables by sending DDL and DML logs directly from a source system to a target system through an SAP HANA smart data access (SDA) ODBC connection.


In a nutshell, the new remote table replication was developed with the following five key characteristics.

  • Transient (in-memory) log-based replication: Since this new feature is log-based at the transactional layer, it can provide better performance with less overhead to a source system compared to trigger-based replication.
  • ODBC connection: It ensures optimized performance between SAP HANA systems because it uses an SDA ODBC connection with the hanaodbc adapter for its log/data transfer.
  • Transactional consistency: If multiple source tables are updated within one transaction, the same behavior is applied to the target tables to ensure database-level transactional consistency. However, transactional consistency is only guaranteed when the transaction is handled at the database level; it is not guaranteed at the application level (for example, in ABAP).
  • Near real-time replication: It continues to replicate changes as they occur (change data capture).
  • Auto reactivation for failures: When an indexserver shuts down for any reason, such as a crash or an upgrade, all replications are disabled. When the indexserver restarts, the replications are reactivated and resume replicating.

Getting started with the new remote table replication

First of all, this feature is supported between SAP HANA Cloud systems. It is also supported when replicating tables from SAP HANA Platform 2.0 SPS05 or later (source) to SAP HANA Cloud (target) as a hybrid scenario. In this hybrid scenario, the SAP Cloud Connector is needed to create SDA remote sources to SAP HANA Platform 2.0 SPS05 or later, which is protected by a firewall.

Please also refer to the table below for the supported scenarios.

Scenario | Supported?
Replication between SAP HANA Cloud systems | Yes
Replication from SAP HANA Platform 2.0 SPS05 or later (source) to SAP HANA Cloud (target) | Yes
Replication from SAP HANA Platform 2.0 SPS04 or earlier (source) to SAP HANA Cloud (target) | No
Replication from SAP HANA Cloud (source) to SAP HANA Platform 2.0 (target) | No
Replication between SAP HANA Platform 2.0 systems | No

That is, remote table replication is only supported when the target is the SAP HANA database in SAP HANA Cloud, as the feature is exclusively available in SAP HANA Cloud. Therefore, it cannot be used for replicating tables between SAP HANA Platform 2.0 (on-premise) systems; for that scenario, SAP HANA smart data integration (SDI) is the right solution.

Configuration and activation

To use remote table replication, the first step is to create a remote source with SDA in the target SAP HANA database in SAP HANA Cloud. SDA supports three credential modes for accessing a remote source with SAP HANA Cloud: technical user, secondary credentials, and single sign-on (SSO) with JSON Web Tokens (JWT). Currently, this feature only supports the technical user mode, and the remote source property 'DML Mode' must be set to 'readwrite'.

'readwrite' is the default value for the 'DML Mode' remote source property in SAP HANA Database Explorer, but please make sure that the created remote source is configured correctly. Also, the remote source must be named in UPPERCASE to initialize remote table replication.
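As a minimal sketch, such a remote source could be created with a statement like the following. The host, port, user, and password are placeholders, and the exact configuration string depends on your landscape; please verify the details against the official documentation.

```sql
-- create an SDA remote source (name in UPPERCASE) using the hanaodbc adapter
-- <source-host>, <source-port>, <technical_user>, <password> are placeholders
CREATE REMOTE SOURCE REMOTE_REP ADAPTER "hanaodbc"
  CONFIGURATION 'Driver=libodbcHDB.so;ServerNode=<source-host>:<source-port>;'
  WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=<technical_user>;password=<password>';
```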

For more details on creating an SAP HANA Cloud remote source, please refer to the official documentation below.

For creating an SAP HANA on-premise remote source in a hybrid scenario, which is becoming increasingly important, please refer to the official documentation below.

The next step is to create a virtual table in the target system. The following statement will create a virtual table called V_CUSTOMER in the MARKETING schema of the target system. This virtual table will be based on the CUSTOMER table in the SALES schema of the source system.

 

CREATE VIRTUAL TABLE MARKETING.V_CUSTOMER AT "REMOTE_REP"."<NULL>"."SALES"."CUSTOMER";

Then to create a replica table and remote subscription, the following steps are needed.

-- create a replica table
CREATE COLUMN TABLE MARKETING.R_CUSTOMER LIKE MARKETING.V_CUSTOMER;

-- create a remote subscription
CREATE REMOTE SUBSCRIPTION SUB ON MARKETING.V_CUSTOMER TARGET TABLE MARKETING.R_CUSTOMER;

Since the replica table is created as COLUMN LOADABLE by default, all columns are fully loaded into memory for optimal performance. If necessary, the user can switch it to PAGE LOADABLE, which loads column data page by page into the buffer cache, reducing memory usage. To switch, use the following statements.

-- switch to page loadable
ALTER TABLE MARKETING.R_CUSTOMER PAGE LOADABLE;

-- switch to column loadable
ALTER TABLE MARKETING.R_CUSTOMER COLUMN LOADABLE;

That completes the configuration for setting up remote table replication. The following statements can be used to activate, deactivate, or drop the replication.

-- activate the replication
ALTER REMOTE SUBSCRIPTION SUB DISTRIBUTE;

-- deactivate the replication
ALTER REMOTE SUBSCRIPTION SUB RESET;

-- drop the replication
DROP REMOTE SUBSCRIPTION SUB;

If users drop the replication and want to reactivate it, they first need to either truncate or recreate the replica table. Please find further details in the links below.
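Putting the pieces together, a hedged sketch of reactivating a dropped replication, reusing the example objects from above, could look like this:

```sql
-- after DROP REMOTE SUBSCRIPTION, truncate (or recreate) the replica table first
TRUNCATE TABLE MARKETING.R_CUSTOMER;

-- then recreate the subscription and activate it again
CREATE REMOTE SUBSCRIPTION SUB ON MARKETING.V_CUSTOMER TARGET TABLE MARKETING.R_CUSTOMER;
ALTER REMOTE SUBSCRIPTION SUB DISTRIBUTE;
```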

Auto reactivation

As mentioned, remote table replication supports auto reactivation after failures. During reactivation, each pair of source and target tables is first synchronized, and then their replication status changes to ENABLED. Users can also trigger reactivation manually with the following statements.

ALTER REMOTE SOURCE <remote_source_name> SUSPEND CAPTURE;

ALTER REMOTE SOURCE <remote_source_name> RESUME CAPTURE;

Please also refer to the following official documentation.

Differences from the replication with SDI or SLT

SAP HANA smart data integration (SDI) with HanaAdapter can be used for replicating tables between SAP HANA Cloud systems. However, this new feature with SDA hanaodbc adapter provides better performance with log-based replication and a simplified approach without the need to deploy the Data Provisioning (DP) Agent.
Although it supports near real-time change data capture like SDI, there are some limitations. For example, the schemas of the source and target tables must be identical, with no additional filters or different partitions. If you would like to know more about the limitations or a comparison between SDI and SDA, please refer to the link below.

SAP Landscape Transformation (SLT) Replication Server can also be used for replicating tables between SAP HANA systems, but it is trigger-based and requires an additional installation. For these reasons, remote table replication provides the benefits of log-based replication with a simplified approach compared to SLT.

Furthermore, the key differentiator compared to SDI and SLT, as explained, is that when multiple source tables are updated within one transaction, the same behavior is applied to the target tables to ensure database-level transactional consistency. This does not mean that SDI does not support transactional consistency; the difference is that remote table replication ensures transactional commit order based on the original transaction commit ID.

Other limitations

For other limitations, please refer to the link below.

Hybrid scenario and futures

As briefly mentioned above, the hybrid scenario is growing in importance, and remote table replication will play a crucial role as one of the key enablers for it. In parallel, the feature will continue to evolve and be augmented with more capabilities and better supportability. So, please stay tuned for our next blog series.

23 Comments
werner_daehn
Active Contributor
Nice!

A couple of thoughts when reading. Appreciate your input.

  1. One other odd limitation is that you cannot create subscriptions on Hana local tables. You always have to create a virtual table that reads your own database via the adapter. With this feature, it should be possible to resolve this. Granted, it is not the most important use case, especially in the light of no-data-duplication, but it is required sometimes.

  2. You are saying the HanaAdapter of SDI does not support transactional consistency. Are you sure about that?

  3. Are all subscription writer types supported, insert-only, upsert, with change_type and change_time?

  4. Why the limitation of "source and target have the same partitioning"? I assume because of truncate statements? If that is the only reason, truncate table on production data happens less often than partition changes.

  5. What happens with the replication when the source does change the partitioning of a table?

  6. Is the adaptive schema supported, meaning when the source gets an additional column via alter-table-add does the target as well?

  7. The reason SDI does use triggers is because there is no API to read the Hana transaction log, not even an SAP internal one. It seems there is one now. Can this API be used for other consumers as well, e.g. SAP Data Services reading the change data or SAP BW Extractors having an easy way to identify changes or 3rd party?

  8. Will this solution make the SDI Adapter obsolete some time in the future as it can do everything the SDI Adapter can? For the supported Hana versions and deployments at least? Or asking the other way around, other than dpagent and no support for on-prem Hana yet, what are other limitations that might prevent using this new adapter?

Taesuk
Product and Topic Expert

Hi Werner,

 

  1. If the replication is about replicating within the same instance, it should be handled by OSTR (for scale-out). So, no need to use virtual tables for replication. I think the limitation you’re mentioning is about DP server trying to replicate local tables, isn’t it?
  2. In your blog (https://blogs.sap.com/2016/03/26/hana-smart-data-integration-transactional-consistency-and-delta-han...), you’ve explained that SDI supports this as well. But the difference is that it is based on time changes: there can be multiple transactions committed in the same millisecond, whereas this respects the original commit transactions.
  3. The underlying technology is different from SDI, so the other writer types are not supported.
  4. As the replication is done at the transactional layer and not the session layer, internal dependencies exist.
  5. The DDL changes are forwarded to the target
  6. Yes, any DDL changes are also applied to target
  7. This is limited to HANA to HANA only and cannot be consumed by other tools.
  8. There are still differences between SDI such as the writer types you mentioned above and support for flowgraphs

@Lee, Seungjoon – Please correct if needed

Like smart phone data transfer ...
SeungjoonLee
Advisor

Thank you Tae-Suk!

Hello Werner,

Thank you for your feedback and questions.

As most of the questions are already covered by Tae-Suk, let me add a few points below.

2. My explanation might have misled readers. I didn’t say that SDI does not support transactional consistency, and that was not my intention. As Tae-Suk commented, it is about the difference: this new remote table replication ensures transaction commit order by leveraging the original transaction commit ID.

I added the below sentence to the blog to avoid any further misleading.

“However, it doesn’t mean that SDI does not support transactional consistency. The difference is that this new remote table replication ensures transactional commit order based on original transaction commit ID.”

3. Even though it shares the SQL interface of SDI, its underlying technology is different, as Tae-Suk mentioned. It only supports real-time replication with schema changes, and users will see ‘feature not supported’ for other writer types.

5. More precisely, it supports adding empty partitions or dropping existing partitions, but it doesn’t support splitting or shuffling partitions.

8. No, there is no plan to make the SDI adapter obsolete. However, many customers expect that replicating tables between SAP HANA Cloud systems should be possible natively, without any further installation or customer-managed application. And it is also natural to extend its scope, such as to hybrid scenarios, as long as it is technically feasible and provides benefits. So, between SAP HANA systems, this is the recommended option as it gives better optimizations, but customers can still prefer SDI as it has richer features for data movement.

lbreddemann
Active Contributor
Thanks for this outlook on (yet another) data replication technology for SAP HANA.

Just as with the already existing products (SDI, SDA, SLT, Sybase Replication, RDSync, ...) it is not quite clear, when to use what and what the implications of each of the different options are.

Notably, those products focus on a single major data "federation" scenario: data is brought into HANA. While I am a fan of HANA, I can tell that this is not the single best scenario for many customers. This is especially true nowadays, where sufficiently "smart" data storage is available from the diverse cloud providers for a fraction of what HANA costs.

So, why is there the perception that the SAP user base needs/wants yet another "bring-it-into-HANA" product? Surely the technology could have just been implemented within the existing products if it was simply about improving on the shortcomings of previous developments.

Is there a specific use-case where this scenario really shines? What is it?

Another question that I can't quite figure out myself is: why package the replication of HANA internal data structures in ODBC? I understand that in order to operate "in the cloud" one cannot use arbitrary low-level constructs (i.e. using TCP/IP | UDP is the common denominator for network communication; implementing your own wire-protocol on top of ethernet or building your very own network interconnect is simply not an option on public IaaS infrastructure). But ODBC, with its demands for type casting and memory-copying, doesn't seem to lend itself to replicating data structures between two instances of the same software.

Can you provide some insight into this choice?

Final question (for now): how does the "re-sync" work in case the replication had been interrupted? Where is the "last-known-good" sync point kept? As the virtual tables support read-/write access there must be at least two change-sequence-histories that need to be merged and kept in sync.

How does that work and how can one monitor this?

Cheers,

Lars
SeungjoonLee
Advisor

Hello Lars,

Regarding when to use what and a specific use case, please consider this new approach as a preferred option for replicating tables between SAP HANA Cloud instances.

As you know, SAP HANA Cloud doesn’t support MDC, so customers cannot create multiple tenants within the same SAP HANA Cloud instance. In the on-premise world, MDC is used to overcome hardware limitations or for business reasons. In the cloud, however, as there are no such hardware limitations and cloud elasticity can be leveraged, having separate instances provides more benefits.

In that sense, if customers want to have multiple tenants with SAP HANA Cloud as they did in on-premise, provisioning multiple SAP HANA Cloud instances is needed.

And if they want to replicate tables between those SAP HANA Cloud instances, this is the recommended option, as it is optimized for this use case and there is no need to deploy or operate an additional application.

Regarding your second question, ODBC is just used for transferring internal logs and replication requests. And SAP has been investing in SDA for further optimizations and performance improvements.

Regarding the re-sync, this new approach is not based on a sync-point mechanism. For monitoring, please refer to the link above (SAP HANA Cloud Administration Guide: System Views for Monitoring Remote Table Replication).
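As a sketch of such a monitoring query, assuming the subscription monitoring view documented in the administration guide (view and column names may differ slightly by version), the replication state could be checked on the target like this:

```sql
-- check subscription states on the target system
SELECT SCHEMA_NAME, SUBSCRIPTION_NAME, STATE
  FROM M_REMOTE_SUBSCRIPTIONS;
```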

former_member623636
Discoverer

Nice blog! Thanks for sharing! 

Hi seungjoonlee and taesuk.son

Nice blog seungjoonlee! For the past few months I have been working on SDI, connecting an on-premise HDB to a HANA DB on Cloud Foundry. I was able to connect successfully through the HANAAdapter, and the virtual tables have the live data. I then:
Created replicated tables like the virtual tables.
Created a remote subscription for the replication from the virtual table to the replicated table.
Inserted the initial load of records from the virtual table to the replicated table.
Then altered the subscription for QUEUE and then DISTRIBUTE.

The replicated tables showed live data for a couple of days, then suddenly replication stopped working. Have you faced this scenario before? Am I missing any step? Could you please guide me on any troubleshooting steps I need to carry out?
Also, there are no exceptions recorded in either of the systems, hence I am not sure what I can check.
Have also tried few troubleshooting steps following this blog:

https://blogs.sap.com/2017/12/09/hana-sdi-smart-data-integration-2.0-h2h-real-time-replication-lesso...


I was not able to change the value for 'DDL scan interval in minutes' as the remote source is running, and other than the queries on _SYS_TASK (blocked due to insufficient privileges), I did not find any anomalies!

Hope to hear from you soon!

Thanks and Regards,

Priyanka
SeungjoonLee
Advisor
Hi Priyanka,

Thank you! By the way, as I'm a Product Manager for SDA only, unfortunately I cannot answer your question. I would recommend asking SDI experts.

Best,
Seungjoon
Hi seungjoonlee

No worries! Thank you for your prompt response!

Regards,

Priyanka
David_B
Explorer
Hi,

I'm using HANA Services on BTP.
I'm already using SDI to replicate tables into HANA.

Can I use this solution to replicate data in realtime across various tables with small transformations ?

Regards,

 
SeungjoonLee
Advisor
Hi David,

No, unfortunately, this feature is available only in 'SAP HANA Cloud, SAP HANA database' (container-based).

In 'SAP Cloud Platform, SAP HANA Service' (VM-based), SDI is the right solution for replicating tables.

Please also refer to the blog from Denys, SAP (HANA) Cheat Sheet for the differences.

Best,
Seungjoon
Cocquerel
Active Contributor
Are there options to improve the duration of the initial load for a large table?
In my case, I'm trying to replicate a table that has 201 columns and 356 million rows, with a size of 20 GB.

The initial load is estimated at 12 hours, and my issue is that the tunnel connection between SAP Cloud Connector and HANA Cloud always gets interrupted for a few milliseconds before the end of the initial load. This issue is described in note 2682913 - SAP Cloud Connector - Tunnel connection is broken - Connection reset by peer. My network team investigated the root cause of those intermittent connection cuts but did not find anything. That's why I'm looking at how to improve the initial load duration, to minimize the risk that it gets interrupted.
Looking at "SYS"."M_REMOTE_STATEMENTS", I have observed that the initial load is split into several small selects of 200,000 records max, and each single statement runs for around 30 seconds. Those statements run in sequence, one after the other. Is there an option to run those statements in parallel? Is there an option to increase the packet size?
SeungjoonLee
Advisor
Hello Michael,

I saw the separate ongoing email discussions, but sorry for the delayed response here.

Regarding your questions: in general, if you can partition the source table (e.g., into 8 partitions; the target table should be identical to the source table), parallelization can be achieved to some extent, and improved performance can be expected with this approach. However, please understand that performance will not improve in proportion to the number of partitions. And as there are many other factors, a performance improvement via partitioning cannot always be guaranteed.

Apart from that, since we're aware of the difficulties caused by the initial loading of a huge table, we're continuously investing in this area. So, I expect to come back with some progress in the future.
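As a hedged illustration of the partitioning suggestion above, the source table from the earlier example could be hash-partitioned like this. CUSTOMER_ID is a hypothetical column chosen for illustration, and the same partitioning must be applied to the target table:

```sql
-- partition the source table to allow some parallelism during the initial load
ALTER TABLE SALES.CUSTOMER PARTITION BY HASH (CUSTOMER_ID) PARTITIONS 8;
```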
Cocquerel
Active Contributor
Thanks, I will try this method
Cocquerel
Active Contributor
I have dropped a table replication subscription but, for an unknown reason, on the on-premise source system the related entry in view "SYS"."M_REMOTE_TABLE_REPLICAS" has not been dropped.
Now, when I try to create a new subscription for the same table, it fails with the error: "Error: (dberror) [403]: internal error: Could not execute a SELECT statement on a source table (detail: [SAP AG][LIBODBCHDB SO][HDBODBC] General error;129 transaction rolled back by an internal error: Could not add replica due to duplicate replication status entry)". The same error also appears in the on-premise source system's indexserver trace.
How can I clean up "SYS"."M_REMOTE_TABLE_REPLICAS" entries?
Cocquerel
Active Contributor
Hello,

I have made two tests: one with 10 partitions, one with 20 partitions. In both cases, I observed in "SYS"."M_REMOTE_STATEMENTS" a maximum of 8 statements running in parallel. Is that a limit I can change?
Cocquerel
Active Contributor
Is the new connection property allowReconnectOnSelect coming with HANA Client 2.16 (see https://help.sap.com/docs/SAP_HANA_CLIENT/f1b440ded6144a54ada97ff95dac7adf/197dc47daa1d43efa5a77d314... ) also available for an SDA remote source from HANA Cloud to HANA on-premise? Would it help with my issue that the tunnel communication from SAP Cloud Connector to HANA Cloud is regularly interrupted by the proxy, and would it avoid having to restart the table's initial load from scratch when the issue occurs?
SeungjoonLee
Advisor

No, the degree of parallelization is not configurable. This is one of the reasons why I mentioned 8 partitions as an example, and also why I mentioned that performance will not improve in proportion to the number of partitions.

SeungjoonLee
Advisor
No, I don't think allowReconnectOnSelect helps, as it applies to SQLDBC and JDBC. As you know, SDA-based remote table replication is based on ODBC.
SeungjoonLee
Advisor

I'm not sure what happened here, but for some unknown reasons, it appears that my previous reply was removed. Here's my response again.

This is a known issue, and it was improved with the following combinations.

  • SAP HANA Platform 2.0 SPS06 Rev63 or later (source)
  • SAP HANA Cloud QRC 02/2022 or later (target)

That is, with the above combinations, executing DROP REMOTE SUBSCRIPTION and then CREATE REMOTE SUBSCRIPTION will automatically detect those inconsistencies and clean up the garbage information, so you will not see this error anymore.

However, unfortunately, if the versions of SAP HANA Platform and SAP HANA Cloud are lower than the above, one possible workaround is cleaning up all of the remote-table-replication-related tables by dropping the remote sources on both the source and target systems, then creating a new remote source with a different name and establishing the remote table replication again. If you don't want this workaround and just want to clean up the duplicate entries without dropping and creating a new remote source, then I have to say that the only option is raising an incident via SAP Support.

ravicondamoor
Advisor

Hi,

I am replicating data from HANA 2.0 SPS06 to SAP Datasphere using the SDA connection option in Datasphere (no DP Agent, only Cloud Connector). Is this replication done via log replication, as with HANA Cloud as a target?

Thanks

-ravi

SeungjoonLee
Advisor

@ravicondamoor Since I am responsible for SAP HANA Cloud, unfortunately I am not the right person to confirm the capabilities of SAP Datasphere. As far as I know, remote tables can be used to replicate data from SAP HANA on-premise, but I would recommend reaching out to the SAP Datasphere team.